LibreChat/api/server/routes/agents/__tests__/responses.spec.js
Danny Avila 6279ea8dd7
🛸 feat: Remote Agent Access with External API Support (#11503)
* 🪪 feat: Microsoft Graph Access Token Placeholder for MCP Servers (#10867)

* feat: MCP Graph Token env var

* Addressing Copilot remarks

* Addressed Copilot review remarks

* Fixed graphtokenservice mock in MCP test suite

* fix: remove unnecessary type check and cast in resolveGraphTokensInRecord

* ci: add Graph Token integration tests in MCPManager

* refactor: update user type definitions to use Partial<IUser> in multiple functions

* test: enhance MCP tests for graph token processing and user placeholder resolution

- Added comprehensive tests to validate the interaction between preProcessGraphTokens and processMCPEnv.
- Ensured correct resolution of graph tokens and user placeholders in various configurations.
- Mocked OIDC utilities to facilitate testing of token extraction and validation.
- Verified that original options remain unchanged after processing.

* chore: import order

* chore: imports

---------

Co-authored-by: Danny Avila <danny@librechat.ai>

* WIP: OpenAI-compatible API for LibreChat agents

- Added OpenAIChatCompletionController for handling chat completions.
- Introduced ListModelsController and GetModelController for listing and retrieving agent details.
- Created routes for OpenAI API endpoints, including /v1/chat/completions and /v1/models.
- Developed event handlers for streaming responses in OpenAI format.
- Implemented request validation and error handling for API interactions.
- Integrated content aggregation and response formatting to align with OpenAI specifications.

This commit establishes a foundational API for interacting with LibreChat agents in a manner compatible with OpenAI's chat completion interface.
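
As a rough illustration of the streaming format mentioned above, the sketch below serializes payloads as Server-Sent Events frames and ends the stream with the `[DONE]` sentinel used by OpenAI-style streaming. The `formatSSEData` helper and the sample chunk are hypothetical; they do not reflect the actual handler names or payloads in this commit.

```javascript
// Hypothetical helper (not from the LibreChat codebase): serialize one payload
// as a Server-Sent Events frame, i.e. a `data:` line followed by a blank line.
function formatSSEData(payload) {
  const body = typeof payload === 'string' ? payload : JSON.stringify(payload);
  return `data: ${body}\n\n`;
}

// Illustrative chunk, loosely shaped like an OpenAI chat completion chunk.
const sampleChunk = {
  object: 'chat.completion.chunk',
  choices: [{ index: 0, delta: { content: 'Hello' }, finish_reason: null }],
};

// One content frame followed by the end-of-stream sentinel.
const frames = [formatSSEData(sampleChunk), formatSSEData('[DONE]')];
process.stdout.write(frames.join(''));
```

A client would split the stream on blank lines, strip the `data:` prefix, and stop reading once it sees `[DONE]`.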

* refactor: OpenAI-spec content aggregation for improved performance and clarity

* fix: OpenAI chat completion controller with safe user handling for correct tool loading

* refactor: Remove conversation ID from OpenAI response context and related handlers

* refactor: OpenAI chat completion handling with streaming support

- Introduced a lightweight tracker for streaming responses, allowing for efficient tracking of emitted content and usage metadata.
- Updated the OpenAIChatCompletionController to utilize the new tracker, improving the handling of streaming and non-streaming responses.
- Refactored event handlers to accommodate the new streaming logic, ensuring proper management of tool calls and content aggregation.
- Adjusted response handling to streamline error reporting during streaming sessions.

* WIP: Open Responses API with core service, types, and handlers

- Added Open Responses API module with comprehensive types and enums.
- Implemented core service for processing requests, including validation and input conversion.
- Developed event handlers for streaming responses and non-streaming aggregation.
- Established response building logic and error handling mechanisms.
- Created detailed types for input and output content, ensuring compliance with Open Responses specification.

* feat: Implement response storage and retrieval in Open Responses API

- Added functionality to save user input messages and assistant responses to the database when the `store` flag is set to true.
- Introduced a new endpoint to retrieve stored responses by ID, allowing users to access previous interactions.
- Enhanced the response creation process to include database operations for conversation and message storage.
- Implemented tests to validate the storage and retrieval of responses, ensuring correct behavior for both existing and non-existent response IDs.

* refactor: Open Responses API with additional token tracking and validation

- Added support for tracking cached tokens in response usage, improving token management.
- Updated response structure to include new properties for top log probabilities and detailed usage metrics.
- Enhanced tests to validate the presence and types of new properties in API responses, ensuring compliance with updated specifications.
- Refactored response handling to accommodate new fields and improve overall clarity and performance.

* refactor: Update reasoning event handlers and types for consistency

- Renamed reasoning text events to simplify naming conventions, changing `emitReasoningTextDelta` to `emitReasoningDelta` and `emitReasoningTextDone` to `emitReasoningDone`.
- Updated event types in the API to reflect the new naming, ensuring consistency across the codebase.
- Added `logprobs` property to output events for enhanced tracking of log probabilities.
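
To make the event shape concrete, here is a minimal sketch of a `response.output_text.delta` payload carrying the new `logprobs` property. The field names mirror the ones checked by the streaming validation in the integration tests; the `buildOutputTextDelta` helper, the sample values, and the choice of an empty `logprobs` array are assumptions for illustration only.

```javascript
// Hypothetical builder (not from the LibreChat codebase) for a
// `response.output_text.delta` streaming event payload.
function buildOutputTextDelta({ sequenceNumber, itemId, outputIndex, contentIndex, delta }) {
  return {
    type: 'response.output_text.delta',
    sequence_number: sequenceNumber,
    item_id: itemId,
    output_index: outputIndex,
    content_index: contentIndex,
    delta,
    // Assumption: an empty array is sent when no log probabilities are available.
    logprobs: [],
  };
}

// Sample values are made up for the example.
const sampleDelta = buildOutputTextDelta({
  sequenceNumber: 3,
  itemId: 'msg_example',
  outputIndex: 0,
  contentIndex: 0,
  delta: 'Hel',
});

console.log(JSON.stringify(sampleDelta));
```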

* feat: Add validation for streaming events in Open Responses API tests

* feat: Implement response.created event in Open Responses API

- Added emitResponseCreated function to emit the response.created event as the first event in the streaming sequence, adhering to the Open Responses specification.
- Updated createResponse function to emit response.created followed by response.in_progress.
- Enhanced tests to validate the order of emitted events, ensuring response.created is triggered before response.in_progress.
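
The ordering rule above can be sketched as a small check over the list of emitted event types. The `appearsBefore` helper and the sample sequence are hypothetical; only the event names come from the changes described here.

```javascript
// Hypothetical helper: true only when both event types are present and
// the first occurrence of `earlier` precedes the first occurrence of `later`.
function appearsBefore(eventTypes, earlier, later) {
  const earlierIdx = eventTypes.indexOf(earlier);
  const laterIdx = eventTypes.indexOf(later);
  return earlierIdx !== -1 && laterIdx !== -1 && earlierIdx < laterIdx;
}

// Illustrative event sequence, not captured from a real stream.
const sampleEventTypes = [
  'response.created',
  'response.in_progress',
  'response.output_text.delta',
  'response.completed',
];

const createdFirst = appearsBefore(sampleEventTypes, 'response.created', 'response.in_progress');
console.log(createdFirst);
```

Checking for `-1` explicitly matters here: a bare index comparison would also pass when `response.created` is missing entirely, because `indexOf` would return `-1`.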

* feat: Responses API with attachment event handling

- Introduced `createResponsesToolEndCallback` to handle attachment events in the Responses API, emitting `librechat:attachment` events as per the Open Responses extension specification.
- Updated the `createResponse` function to utilize the new callback for processing tool outputs and emitting attachments during streaming.
- Added helper functions for writing attachment events and defined types for attachment data, ensuring compatibility with the Open Responses protocol.
- Enhanced tests to validate the integration of attachment events within the Responses API workflow.

* WIP: remote agent auth

* fix: Improve loading state handling in AgentApiKeys component

- Updated the rendering logic to conditionally display loading spinner and API keys based on the loading state.
- Removed unnecessary imports and streamlined the component for better readability.

* refactor: Update API key access handling in routes

- Replaced `checkAccess` with `generateCheckAccess` for improved access control.
- Consolidated access checks into a single `checkApiKeyAccess` function, enhancing code readability and maintainability.
- Streamlined route definitions for creating, listing, retrieving, and deleting API keys.

* fix: Add permission handling for REMOTE_AGENT resource type

* feat: Enhance permission handling for REMOTE_AGENT resources

- Updated the deleteAgent and deleteUserAgents functions to handle permissions for both AGENT and REMOTE_AGENT resource types.
- Introduced new functions to enrich REMOTE_AGENT principals and backfill permissions for AGENT owners.
- Modified createAgentHandler and duplicateAgentHandler to grant permissions for REMOTE_AGENT alongside AGENT.
- Added utility functions for retrieving effective permissions for REMOTE_AGENT resources, ensuring consistent access control across the application.

* refactor: Rename and update roles for remote agent access

- Changed role name from API User to Editor in translation files for clarity.
- Updated default editor role ID from REMOTE_AGENT_USER to REMOTE_AGENT_EDITOR in resource configurations.
- Adjusted role localization to reflect the new Editor role.
- Modified access permissions to align with the updated role definitions across the application.

* feat: Introduce remote agent permissions and update access handling

- Added support for REMOTE_AGENTS in permission schemas, including use, create, share, and share_public permissions.
- Updated the interface configuration to include remote agent settings.
- Modified middleware and API key access checks to align with the new remote agent permission structure.
- Enhanced role defaults to incorporate remote agent permissions, ensuring consistent access control across the application.

* refactor: Update AgentApiKeys component and permissions handling

- Refactored the AgentApiKeys component to improve structure and readability, including the introduction of ApiKeysContent for better separation of concerns.
- Updated CreateKeyDialog to accept an onKeyCreated callback, enhancing its functionality.
- Adjusted permission checks in Data component to use REMOTE_AGENTS and USE permissions, aligning with recent permission schema changes.
- Enhanced loading state handling and dialog management for a smoother user experience.

* refactor: Update remote agent access checks in API routes

- Replaced existing access checks with `generateCheckAccess` for remote agents in the API keys and agents routes.
- Introduced specific permission checks for creating, listing, retrieving, and deleting API keys, enhancing access control.
- Improved code structure by consolidating permission handling for remote agents across multiple routes.

* fix: Correct query parameters in ApiKeysContent component

- Updated the useGetAgentApiKeysQuery call to include an object for the enabled parameter, ensuring proper functionality when the component is open.
- This change improves the handling of API key retrieval based on the component's open state.

* feat: Implement remote agents permissions and update API routes

- Added new API route for updating remote agents permissions, enhancing role management capabilities.
- Introduced remote agents permissions handling in the AgentApiKeys component, including a dedicated settings dialog.
- Updated localization files to include new remote agents permission labels for better user experience.
- Refactored data provider to support remote agents permissions updates, ensuring consistent access control across the application.

* feat: Add remote agents permissions to role schema and interface

- Introduced new permissions for REMOTE_AGENTS in the role schema, including USE, CREATE, SHARE, and SHARE_PUBLIC.
- Updated the IRole interface to reflect the new remote agents permissions structure, enhancing role management capabilities.

* feat: Add remote agents settings button to API keys dialog

* feat: Update AgentFooter to include remote agent sharing permissions

- Refactored access checks to incorporate permissions for sharing remote agents.
- Enhanced conditional rendering logic to allow sharing by users with remote agent permissions.
- Improved loading state handling for remote agent permissions, ensuring a smoother user experience.

* refactor: Update API key creation access check and localization strings

- Replaced the access check for creating API keys to use the existing remote agents access check.
- Updated localization strings to correct the descriptions for remote agent permissions, ensuring clarity in user interface.

* fix: resource permission mapping to include remote agents

- Changed the resourceToPermissionMap to use a Partial<Record> for better flexibility.
- Added mapping for REMOTE_AGENT permissions, enhancing the sharing capabilities for remote agents.

* feat: Implement remote access checks for agent models

- Enhanced ListModelsController and GetModelController to include checks for user permissions on remote agents.
- Integrated findAccessibleResources to filter agents based on VIEW permission for REMOTE_AGENT.
- Updated response handling to ensure users can only access agents they have permissions for, improving security and access control.
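
A minimal sketch of the kind of bitmask check that filtering by VIEW permission implies. The numeric bit values, the `hasPermission` helper, and the sample ACL entries are assumptions for illustration; the real `PermissionBits` enum and `findAccessibleResources` implementation may differ.

```javascript
// Assumed bit values for illustration; the real PermissionBits enum may differ.
const PermissionBits = { VIEW: 1, EDIT: 2, DELETE: 4, SHARE: 8 };

// True when every bit in `required` is also set in `permBits`.
function hasPermission(permBits, required) {
  return (permBits & required) === required;
}

// Hypothetical ACL entries mapping agent IDs to the bits granted to one user.
const aclEntries = [
  { agentId: 'agent_a', permBits: PermissionBits.VIEW | PermissionBits.EDIT },
  { agentId: 'agent_b', permBits: 0 },
  { agentId: 'agent_c', permBits: PermissionBits.VIEW },
];

// Keep only the agents the user may view.
const viewableAgentIds = aclEntries
  .filter((entry) => hasPermission(entry.permBits, PermissionBits.VIEW))
  .map((entry) => entry.agentId);

console.log(viewableAgentIds); // [ 'agent_a', 'agent_c' ]
```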

* fix: Update user parameter type in processUserPlaceholders function

- Changed the user parameter type in the processUserPlaceholders function from Partial<Partial<IUser>> to Partial<IUser> for improved type clarity and consistency.

* refactor: Simplify integration test structure by removing conditional describe

- Replaced conditional describeWithApiKey with a standard describe for all integration tests in responses.spec.js.
- This change enhances test clarity and ensures all tests are executed consistently, regardless of the SKIP_INTEGRATION_TESTS flag.

* test: Update AgentFooter tests to reflect new grant access dialog ID

- Changed test IDs for the grant access dialog in AgentFooter tests to include the resource type, ensuring accurate identification in the test cases.
- This update improves test clarity and aligns with recent changes in the component's implementation.

* test: Enhance integration tests for Open Responses API

- Updated integration tests in responses.spec.js to utilize an authRequest helper for consistent authorization handling across all test cases.
- Introduced a test user and API key creation to improve test setup and ensure proper permission checks for remote agents.
- Added checks for existing access roles and created necessary roles if they do not exist, enhancing test reliability and coverage.

* feat: Extend accessRole schema to include remoteAgent resource type

- Updated the accessRole schema to add 'remoteAgent' to the resourceType enum, enhancing the flexibility of role assignments and permissions management.

* test: Refactor test setup to create a minimal Express app for responses routes, improving test structure and maintainability

* test: Enhance abort.spec.js by mocking additional modules for improved test isolation

- Updated the test setup in abort.spec.js to include actual implementations of '@librechat/data-schemas' and '@librechat/api' while maintaining mock functionality.
- This change improves test reliability and ensures that the tests are more representative of the actual module behavior.

* refactor: Update conversation ID generation to use UUID

- Replaced the nanoid with uuidv4 for generating conversation IDs in the createResponse function, enhancing uniqueness and consistency in ID generation.
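
For reference, a dependency-free sketch of generating a version 4 UUID. It uses Node's built-in `crypto.randomUUID()` as a stand-in for `uuidv4()` from the `uuid` package so the example runs on its own; the `conversationId` variable name is illustrative.

```javascript
const { randomUUID } = require('node:crypto');

// Stand-in for `uuidv4()` from the `uuid` package: both return a random
// version 4 UUID string in the 8-4-4-4-12 hexadecimal layout.
const conversationId = randomUUID();

console.log(conversationId);
```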

* test: Add remote agent access roles to AccessRole model tests

- Included additional access roles for remote agents (REMOTE_AGENT_EDITOR, REMOTE_AGENT_OWNER, REMOTE_AGENT_VIEWER) in the AccessRole model tests to ensure comprehensive coverage of role assignments and permissions management.

* chore: Add deletion of user agent API keys in user deletion process

- Updated the user deletion process in UserController and delete-user.js to include the removal of user agent API keys, ensuring comprehensive cleanup of user data upon account deletion.

* test: Add remote agents permissions to permissions.spec.ts

- Enhanced the permissions tests by including comprehensive permission settings for remote agents across various scenarios, ensuring accurate validation of access controls for remote agent roles.

* chore: Update remote agents translations for clarity and consistency

- Removed outdated remote agents translation entries and added revised entries to improve clarity on API key creation and sharing permissions for remote agents. This enhances user understanding of the available functionalities.

* feat: Add indexing and TTL for agent API keys

- Introduced an index on the `key` field for improved query performance.
- Added a TTL index on the `expiresAt` field to enable automatic cleanup of expired API keys, ensuring efficient management of stored keys.
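
A minimal sketch of the two indexes described above, written as plain MongoDB index specifications (key pattern plus options) so that it runs without a database. Setting `expireAfterSeconds: 0` on a date field is the usual way to have MongoDB remove each document once the time stored in that field has passed; whether the actual schema uses exactly these options is an assumption. The `isExpired` helper is hypothetical and only mimics the condition the TTL monitor evaluates.

```javascript
// Index specifications in the (keys, options) shape accepted by createIndex.
// Option values are assumptions, not copied from the LibreChat schema.
const agentApiKeyIndexes = [
  { keys: { key: 1 }, options: {} },
  { keys: { expiresAt: 1 }, options: { expireAfterSeconds: 0 } },
];

// Hypothetical helper mirroring the TTL rule: a document is eligible for
// removal once (indexed date + expireAfterSeconds) is at or before `now`.
function isExpired(doc, ttlIndex, now = new Date()) {
  const field = Object.keys(ttlIndex.keys)[0];
  const value = doc[field];
  // Documents without a Date in the indexed field are never expired.
  if (!(value instanceof Date)) {
    return false;
  }
  return value.getTime() + ttlIndex.options.expireAfterSeconds * 1000 <= now.getTime();
}

const ttlIndex = agentApiKeyIndexes[1];
const now = new Date('2026-01-28T00:00:00Z');
const expiredKey = { name: 'old key', expiresAt: new Date('2026-01-27T00:00:00Z') };
const activeKey = { name: 'new key', expiresAt: new Date('2026-02-28T00:00:00Z') };
const permanentKey = { name: 'no expiry' };

console.log(isExpired(expiredKey, ttlIndex, now)); // true
console.log(isExpired(activeKey, ttlIndex, now)); // false
console.log(isExpired(permanentKey, ttlIndex, now)); // false
```

Note that MongoDB's TTL cleanup runs as a periodic background task, so expired keys may remain in the collection for a short time after `expiresAt`.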

* chore: Update API route documentation for clarity

- Revised comments in the agents route file to clarify the handling of API key authentication.
- Removed outdated endpoint listings to streamline the documentation and focus on current functionality.

---------

Co-authored-by: Max Sanna <max@maxsanna.com>
2026-01-28 17:44:33 -05:00

1125 lines
38 KiB
JavaScript

/**
* Open Responses API Integration Tests
*
* Runs the /v1/responses endpoint through the Open Responses specification
* compliance tests. Uses the real Anthropic API for LLM calls.
*
* @see https://openresponses.org/specification
* @see https://github.com/openresponses/openresponses/blob/main/src/lib/compliance-tests.ts
*/
// Load environment variables from root .env file for API keys
require('dotenv').config({ path: require('path').resolve(__dirname, '../../../../../.env') });
const originalEnv = {
CREDS_KEY: process.env.CREDS_KEY,
CREDS_IV: process.env.CREDS_IV,
};
process.env.CREDS_KEY = '0123456789abcdef0123456789abcdef';
process.env.CREDS_IV = '0123456789abcdef';
/** Skip tests if ANTHROPIC_API_KEY is not available */
const SKIP_INTEGRATION_TESTS = !process.env.ANTHROPIC_API_KEY;
if (SKIP_INTEGRATION_TESTS) {
console.warn('ANTHROPIC_API_KEY not found - skipping integration tests');
}
jest.mock('meilisearch', () => ({
MeiliSearch: jest.fn().mockImplementation(() => ({
getIndex: jest.fn().mockRejectedValue(new Error('mocked')),
index: jest.fn().mockReturnValue({
getRawInfo: jest.fn().mockResolvedValue({ primaryKey: 'id' }),
updateSettings: jest.fn().mockResolvedValue({}),
addDocuments: jest.fn().mockResolvedValue({}),
updateDocuments: jest.fn().mockResolvedValue({}),
deleteDocument: jest.fn().mockResolvedValue({}),
}),
})),
}));
jest.mock('~/server/services/Config', () => ({
loadCustomConfig: jest.fn(() => Promise.resolve({})),
getAppConfig: jest.fn().mockResolvedValue({
paths: {
uploads: '/tmp',
dist: '/tmp/dist',
fonts: '/tmp/fonts',
assets: '/tmp/assets',
},
fileStrategy: 'local',
imageOutputType: 'PNG',
endpoints: {
agents: {
allowedProviders: ['anthropic', 'openAI'],
},
},
}),
setCachedTools: jest.fn(),
getCachedTools: jest.fn(),
getMCPServerTools: jest.fn().mockReturnValue([]),
}));
jest.mock('~/app/clients/tools', () => ({
createOpenAIImageTools: jest.fn(() => []),
createYouTubeTools: jest.fn(() => []),
manifestToolMap: {},
toolkits: [],
}));
jest.mock('~/config', () => ({
createMCPServersRegistry: jest.fn(),
createMCPManager: jest.fn().mockResolvedValue({
getAppToolFunctions: jest.fn().mockResolvedValue({}),
}),
}));
const express = require('express');
const request = require('supertest');
const mongoose = require('mongoose');
const { v4: uuidv4 } = require('uuid');
const { MongoMemoryServer } = require('mongodb-memory-server');
const { hashToken, getRandomValues, createModels } = require('@librechat/data-schemas');
const {
SystemRoles,
ResourceType,
AccessRoleIds,
PrincipalType,
PrincipalModel,
PermissionBits,
EModelEndpoint,
} = require('librechat-data-provider');
/** @type {import('mongoose').Model} */
let Agent;
/** @type {import('mongoose').Model} */
let AgentApiKey;
/** @type {import('mongoose').Model} */
let User;
/** @type {import('mongoose').Model} */
let AclEntry;
/** @type {import('mongoose').Model} */
let AccessRole;
/**
* Parse SSE stream into events
* @param {string} text - Raw SSE text
* @returns {Array<{event: string, data: unknown}>}
*/
function parseSSEEvents(text) {
const events = [];
const lines = text.split('\n');
let currentEvent = '';
let currentData = '';
for (const line of lines) {
if (line.startsWith('event:')) {
currentEvent = line.slice(6).trim();
} else if (line.startsWith('data:')) {
currentData = line.slice(5).trim();
} else if (line === '' && currentData) {
if (currentData === '[DONE]') {
events.push({ event: 'done', data: '[DONE]' });
} else {
try {
const parsed = JSON.parse(currentData);
events.push({
event: currentEvent || parsed.type || 'unknown',
data: parsed,
});
} catch {
// Skip unparseable data
}
}
currentEvent = '';
currentData = '';
}
}
return events;
}
/**
* Valid streaming event types per Open Responses specification
* @see https://github.com/openresponses/openresponses/blob/main/src/lib/sse-parser.ts
*/
const VALID_STREAMING_EVENT_TYPES = new Set([
// Standard Open Responses events
'response.created',
'response.queued',
'response.in_progress',
'response.completed',
'response.failed',
'response.incomplete',
'response.output_item.added',
'response.output_item.done',
'response.content_part.added',
'response.content_part.done',
'response.output_text.delta',
'response.output_text.done',
'response.refusal.delta',
'response.refusal.done',
'response.function_call_arguments.delta',
'response.function_call_arguments.done',
'response.reasoning_summary_part.added',
'response.reasoning_summary_part.done',
'response.reasoning.delta',
'response.reasoning.done',
'response.reasoning_summary_text.delta',
'response.reasoning_summary_text.done',
'response.output_text.annotation.added',
'error',
// LibreChat extension events (prefixed per Open Responses spec)
// @see https://openresponses.org/specification#extending-streaming-events
'librechat:attachment',
]);
/**
* Validate a streaming event against Open Responses spec
* @param {Object} event - Parsed event with data
* @returns {string[]} Array of validation errors
*/
function validateStreamingEvent(event) {
const errors = [];
const data = event.data;
if (!data || typeof data !== 'object') {
return errors; // Skip non-object data (e.g., [DONE])
}
const eventType = data.type;
// Check event type is valid
if (!VALID_STREAMING_EVENT_TYPES.has(eventType)) {
errors.push(`Invalid event type: ${eventType}`);
return errors;
}
// Validate required fields based on event type
switch (eventType) {
case 'response.output_text.delta':
if (typeof data.sequence_number !== 'number') {
errors.push('response.output_text.delta: missing sequence_number');
}
if (typeof data.item_id !== 'string') {
errors.push('response.output_text.delta: missing item_id');
}
if (typeof data.output_index !== 'number') {
errors.push('response.output_text.delta: missing output_index');
}
if (typeof data.content_index !== 'number') {
errors.push('response.output_text.delta: missing content_index');
}
if (typeof data.delta !== 'string') {
errors.push('response.output_text.delta: missing delta');
}
if (!Array.isArray(data.logprobs)) {
errors.push('response.output_text.delta: missing logprobs array');
}
break;
case 'response.output_text.done':
if (typeof data.sequence_number !== 'number') {
errors.push('response.output_text.done: missing sequence_number');
}
if (typeof data.item_id !== 'string') {
errors.push('response.output_text.done: missing item_id');
}
if (typeof data.output_index !== 'number') {
errors.push('response.output_text.done: missing output_index');
}
if (typeof data.content_index !== 'number') {
errors.push('response.output_text.done: missing content_index');
}
if (typeof data.text !== 'string') {
errors.push('response.output_text.done: missing text');
}
if (!Array.isArray(data.logprobs)) {
errors.push('response.output_text.done: missing logprobs array');
}
break;
case 'response.reasoning.delta':
if (typeof data.sequence_number !== 'number') {
errors.push('response.reasoning.delta: missing sequence_number');
}
if (typeof data.item_id !== 'string') {
errors.push('response.reasoning.delta: missing item_id');
}
if (typeof data.output_index !== 'number') {
errors.push('response.reasoning.delta: missing output_index');
}
if (typeof data.content_index !== 'number') {
errors.push('response.reasoning.delta: missing content_index');
}
if (typeof data.delta !== 'string') {
errors.push('response.reasoning.delta: missing delta');
}
break;
case 'response.reasoning.done':
if (typeof data.sequence_number !== 'number') {
errors.push('response.reasoning.done: missing sequence_number');
}
if (typeof data.item_id !== 'string') {
errors.push('response.reasoning.done: missing item_id');
}
if (typeof data.output_index !== 'number') {
errors.push('response.reasoning.done: missing output_index');
}
if (typeof data.content_index !== 'number') {
errors.push('response.reasoning.done: missing content_index');
}
if (typeof data.text !== 'string') {
errors.push('response.reasoning.done: missing text');
}
break;
case 'response.in_progress':
case 'response.completed':
case 'response.failed':
if (!data.response || typeof data.response !== 'object') {
errors.push(`${eventType}: missing response object`);
}
break;
case 'response.output_item.added':
case 'response.output_item.done':
if (typeof data.output_index !== 'number') {
errors.push(`${eventType}: missing output_index`);
}
if (!data.item || typeof data.item !== 'object') {
errors.push(`${eventType}: missing item object`);
}
break;
}
return errors;
}
/**
* Validate all streaming events and return errors
* @param {Array} events - Array of parsed events
* @returns {string[]} Array of all validation errors
*/
function validateAllStreamingEvents(events) {
const allErrors = [];
for (const event of events) {
const errors = validateStreamingEvent(event);
allErrors.push(...errors);
}
return allErrors;
}
/**
* Create a test agent with Anthropic provider
* @param {Object} overrides
* @returns {Promise<Object>}
*/
async function createTestAgent(overrides = {}) {
const timestamp = new Date();
const agentData = {
id: `agent_${uuidv4().replace(/-/g, '').substring(0, 21)}`,
name: 'Test Anthropic Agent',
description: 'An agent for testing Open Responses API',
instructions: 'You are a helpful assistant. Be concise.',
provider: EModelEndpoint.anthropic,
model: 'claude-sonnet-4-5-20250929',
author: new mongoose.Types.ObjectId(),
tools: [],
model_parameters: {},
...overrides,
};
const versionData = { ...agentData };
delete versionData.author;
const initialAgentData = {
...agentData,
versions: [
{
...versionData,
createdAt: timestamp,
updatedAt: timestamp,
},
],
category: 'general',
};
return (await Agent.create(initialAgentData)).toObject();
}
/**
* Create an agent with extended thinking enabled
* @param {Object} overrides
* @returns {Promise<Object>}
*/
async function createThinkingAgent(overrides = {}) {
return createTestAgent({
name: 'Test Thinking Agent',
description: 'An agent with extended thinking enabled',
model_parameters: {
thinking: {
type: 'enabled',
budget_tokens: 5000,
},
},
...overrides,
});
}
const describeWithApiKey = SKIP_INTEGRATION_TESTS ? describe.skip : describe;
describeWithApiKey('Open Responses API Integration Tests', () => {
// Increase timeout for real API calls
jest.setTimeout(120000);
let mongoServer;
let app;
let testAgent;
let thinkingAgent;
let testUser;
let testApiKey; // The raw API key for Authorization header
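afterAll(() => {
// Restore the original values. Assigning undefined to process.env would store
// the string 'undefined', so keys that were originally unset are deleted instead.
for (const [key, value] of Object.entries(originalEnv)) {
if (value === undefined) {
delete process.env[key];
} else {
process.env[key] = value;
}
}
});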
beforeAll(async () => {
// Start MongoDB Memory Server
mongoServer = await MongoMemoryServer.create();
const mongoUri = mongoServer.getUri();
// Connect to MongoDB
await mongoose.connect(mongoUri);
// Register all models
const models = createModels(mongoose);
// Get models
Agent = models.Agent;
AgentApiKey = models.AgentApiKey;
User = models.User;
AclEntry = models.AclEntry;
AccessRole = models.AccessRole;
// Create minimal Express app with just the responses routes
app = express();
app.use(express.json());
// Mount the responses routes
const responsesRoutes = require('~/server/routes/agents/responses');
app.use('/api/agents/v1/responses', responsesRoutes);
// Create test user
testUser = await User.create({
name: 'Test API User',
username: 'testapiuser',
email: 'testapiuser@test.com',
emailVerified: true,
provider: 'local',
role: SystemRoles.ADMIN,
});
// Create REMOTE_AGENT access roles (if they don't exist)
const existingRoles = await AccessRole.find({
accessRoleId: {
$in: [
AccessRoleIds.REMOTE_AGENT_VIEWER,
AccessRoleIds.REMOTE_AGENT_EDITOR,
AccessRoleIds.REMOTE_AGENT_OWNER,
],
},
});
if (existingRoles.length === 0) {
await AccessRole.create([
{
accessRoleId: AccessRoleIds.REMOTE_AGENT_VIEWER,
name: 'API Viewer',
description: 'Can query the agent via API',
resourceType: ResourceType.REMOTE_AGENT,
permBits: PermissionBits.VIEW,
},
{
accessRoleId: AccessRoleIds.REMOTE_AGENT_EDITOR,
name: 'API Editor',
description: 'Can view and modify the agent via API',
resourceType: ResourceType.REMOTE_AGENT,
permBits: PermissionBits.VIEW | PermissionBits.EDIT,
},
{
accessRoleId: AccessRoleIds.REMOTE_AGENT_OWNER,
name: 'API Owner',
description: 'Full API access + can grant remote access to others',
resourceType: ResourceType.REMOTE_AGENT,
permBits:
PermissionBits.VIEW |
PermissionBits.EDIT |
PermissionBits.DELETE |
PermissionBits.SHARE,
},
]);
}
// Generate and create an API key for the test user
const rawKey = `sk-${await getRandomValues(32)}`;
const keyHash = await hashToken(rawKey);
const keyPrefix = rawKey.substring(0, 8);
await AgentApiKey.create({
userId: testUser._id,
name: 'Test API Key',
keyHash,
keyPrefix,
});
testApiKey = rawKey;
// Create test agents with the test user as author
testAgent = await createTestAgent({ author: testUser._id });
thinkingAgent = await createThinkingAgent({ author: testUser._id });
// Grant REMOTE_AGENT permissions for the test agents
await AclEntry.create([
{
principalType: PrincipalType.USER,
principalModel: PrincipalModel.USER,
principalId: testUser._id,
resourceType: ResourceType.REMOTE_AGENT,
resourceId: testAgent._id,
accessRoleId: AccessRoleIds.REMOTE_AGENT_OWNER,
permBits:
PermissionBits.VIEW | PermissionBits.EDIT | PermissionBits.DELETE | PermissionBits.SHARE,
},
{
principalType: PrincipalType.USER,
principalModel: PrincipalModel.USER,
principalId: testUser._id,
resourceType: ResourceType.REMOTE_AGENT,
resourceId: thinkingAgent._id,
accessRoleId: AccessRoleIds.REMOTE_AGENT_OWNER,
permBits:
PermissionBits.VIEW | PermissionBits.EDIT | PermissionBits.DELETE | PermissionBits.SHARE,
},
]);
}, 60000);
afterAll(async () => {
await mongoose.disconnect();
await mongoServer.stop();
});
beforeEach(async () => {
// Clean up any test data between tests if needed
});
/* ===========================================================================
* COMPLIANCE TESTS
* Based on: https://github.com/openresponses/openresponses/blob/main/src/lib/compliance-tests.ts
* =========================================================================== */
/** Helper to add auth header to requests */
const authRequest = () => ({
post: (url) => request(app).post(url).set('Authorization', `Bearer ${testApiKey}`),
get: (url) => request(app).get(url).set('Authorization', `Bearer ${testApiKey}`),
});
describe('Compliance Tests', () => {
describe('basic-response', () => {
it('should return a valid ResponseResource for a simple text request', async () => {
const response = await authRequest()
.post('/api/agents/v1/responses')
.send({
model: testAgent.id,
input: [
{
type: 'message',
role: 'user',
content: 'Say hello in exactly 3 words.',
},
],
});
expect(response.status).toBe(200);
expect(response.body).toBeDefined();
// Validate ResponseResource schema
const body = response.body;
expect(body.id).toMatch(/^resp_/);
expect(body.object).toBe('response');
expect(typeof body.created_at).toBe('number');
expect(body.status).toBe('completed');
expect(body.model).toBe(testAgent.id);
// Validate output
expect(Array.isArray(body.output)).toBe(true);
expect(body.output.length).toBeGreaterThan(0);
// Should have at least one message item
const messageItem = body.output.find((item) => item.type === 'message');
expect(messageItem).toBeDefined();
expect(messageItem.role).toBe('assistant');
expect(messageItem.status).toBe('completed');
expect(Array.isArray(messageItem.content)).toBe(true);
});
});
describe('streaming-response', () => {
it('should return valid SSE streaming events', async () => {
const response = await authRequest()
.post('/api/agents/v1/responses')
.send({
model: testAgent.id,
input: [
{
type: 'message',
role: 'user',
content: 'Count from 1 to 5.',
},
],
stream: true,
})
.buffer(true)
.parse((res, callback) => {
let data = '';
res.on('data', (chunk) => {
data += chunk.toString();
});
res.on('end', () => {
callback(null, data);
});
});
expect(response.status).toBe(200);
expect(response.headers['content-type']).toMatch(/text\/event-stream/);
const events = parseSSEEvents(response.body);
expect(events.length).toBeGreaterThan(0);
// Validate all streaming events against Open Responses spec
// This catches issues like:
// - Invalid event types (e.g., response.reasoning_text.delta instead of response.reasoning.delta)
// - Missing required fields (e.g., logprobs on output_text events)
const validationErrors = validateAllStreamingEvents(events);
if (validationErrors.length > 0) {
console.error('Streaming event validation errors:', validationErrors);
}
expect(validationErrors).toEqual([]);
// Validate streaming event types
const eventTypes = events.map((e) => e.event);
// Should have response.created first (per Open Responses spec)
expect(eventTypes).toContain('response.created');
// Should have response.in_progress
expect(eventTypes).toContain('response.in_progress');
// response.created should come before response.in_progress
const createdIdx = eventTypes.indexOf('response.created');
const inProgressIdx = eventTypes.indexOf('response.in_progress');
expect(createdIdx).toBeLessThan(inProgressIdx);
// Should have response.completed or response.failed
expect(eventTypes.some((t) => t === 'response.completed' || t === 'response.failed')).toBe(
true,
);
// Should have [DONE]
expect(eventTypes).toContain('done');
// Validate response.completed has full response
const completedEvent = events.find((e) => e.event === 'response.completed');
if (completedEvent) {
expect(completedEvent.data.response).toBeDefined();
expect(completedEvent.data.response.status).toBe('completed');
expect(completedEvent.data.response.output.length).toBeGreaterThan(0);
}
});
it('should emit valid event types per Open Responses spec', async () => {
const response = await authRequest()
.post('/api/agents/v1/responses')
.send({
model: testAgent.id,
input: [
{
type: 'message',
role: 'user',
content: 'Say hi.',
},
],
stream: true,
})
.buffer(true)
.parse((res, callback) => {
let data = '';
res.on('data', (chunk) => {
data += chunk.toString();
});
res.on('end', () => {
callback(null, data);
});
});
expect(response.status).toBe(200);
const events = parseSSEEvents(response.body);
// Check all event types are valid
for (const event of events) {
if (event.data && typeof event.data === 'object' && event.data.type) {
expect(VALID_STREAMING_EVENT_TYPES.has(event.data.type)).toBe(true);
}
}
});
it('should include logprobs array in output_text events', async () => {
const response = await authRequest()
.post('/api/agents/v1/responses')
.send({
model: testAgent.id,
input: [
{
type: 'message',
role: 'user',
content: 'Say one word.',
},
],
stream: true,
})
.buffer(true)
.parse((res, callback) => {
let data = '';
res.on('data', (chunk) => {
data += chunk.toString();
});
res.on('end', () => {
callback(null, data);
});
});
expect(response.status).toBe(200);
const events = parseSSEEvents(response.body);
// Find output_text delta/done events and verify logprobs
const textDeltaEvents = events.filter(
(e) => e.data && e.data.type === 'response.output_text.delta',
);
const textDoneEvents = events.filter(
(e) => e.data && e.data.type === 'response.output_text.done',
);
// Should have at least one output_text event
expect(textDeltaEvents.length + textDoneEvents.length).toBeGreaterThan(0);
// All output_text.delta events must have logprobs array
for (const event of textDeltaEvents) {
expect(Array.isArray(event.data.logprobs)).toBe(true);
}
// All output_text.done events must have logprobs array
for (const event of textDoneEvents) {
expect(Array.isArray(event.data.logprobs)).toBe(true);
}
});
});
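// Sketch, not part of the original suite: the `events.filter(...)` pattern on
// `data.type` used above could be factored into a small helper.
// `filterEventsByType` is a hypothetical name; it assumes each parsed event is
// shaped like `{ event, data }`, where `data` may be missing or a non-object.
function filterEventsByType(events, type) {
  return events.filter((e) => Boolean(e.data) && e.data.type === type);
}
// Possible usage: filterEventsByType(events, 'response.output_text.delta')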
describe('system-prompt', () => {
it('should handle developer role messages in input (as system)', async () => {
// Note: For Anthropic, system messages must be first and there can only be one.
// Since the agent already has instructions, this test does not send a separate
// 'developer' message (which would be merged into the system prompt). Instead,
// it sends a plain user message that instructs the behavior.
const response = await authRequest()
.post('/api/agents/v1/responses')
.send({
model: testAgent.id,
input: [
{
type: 'message',
role: 'user',
content: 'Pretend you are a pirate and say hello in pirate speak.',
},
],
});
expect(response.status).toBe(200);
expect(response.body.status).toBe('completed');
expect(response.body.output.length).toBeGreaterThan(0);
// The pirate persona itself is not asserted (model output varies);
// only verify that a non-empty message item is returned
const messageItem = response.body.output.find((item) => item.type === 'message');
expect(messageItem).toBeDefined();
expect(messageItem.content.length).toBeGreaterThan(0);
});
});
describe('multi-turn', () => {
it('should handle multi-turn conversation history', async () => {
const response = await authRequest()
.post('/api/agents/v1/responses')
.send({
model: testAgent.id,
input: [
{
type: 'message',
role: 'user',
content: 'My name is Alice.',
},
{
type: 'message',
role: 'assistant',
content: 'Hello Alice! Nice to meet you. How can I help you today?',
},
{
type: 'message',
role: 'user',
content: 'What is my name?',
},
],
});
expect(response.status).toBe(200);
expect(response.body.status).toBe('completed');
// The response should reference "Alice"
const messageItem = response.body.output.find((item) => item.type === 'message');
expect(messageItem).toBeDefined();
const textContent = messageItem.content.find((c) => c.type === 'output_text');
expect(textContent).toBeDefined();
expect(textContent.text.toLowerCase()).toContain('alice');
});
});
// Note: tool-calling test requires tool setup which may need additional configuration
// Note: image-input test requires vision-capable model
describe('string-input', () => {
it('should accept simple string input', async () => {
const response = await authRequest().post('/api/agents/v1/responses').send({
model: testAgent.id,
input: 'Hello!',
});
expect(response.status).toBe(200);
expect(response.body.status).toBe('completed');
expect(response.body.output.length).toBeGreaterThan(0);
});
});
});
/* ===========================================================================
* EXTENDED THINKING TESTS
* Tests reasoning output from Claude models with extended thinking enabled
* =========================================================================== */
describe('Extended Thinking', () => {
it('should return reasoning output when thinking is enabled', async () => {
const response = await authRequest()
.post('/api/agents/v1/responses')
.send({
model: thinkingAgent.id,
input: [
{
type: 'message',
role: 'user',
content: 'What is 15 * 7? Think step by step.',
},
],
});
expect(response.status).toBe(200);
expect(response.body.status).toBe('completed');
// Check for reasoning item in output
const reasoningItem = response.body.output.find((item) => item.type === 'reasoning');
// If reasoning is present, validate its structure per Open Responses spec
// Note: reasoning items do NOT have a 'status' field per the spec
// @see https://github.com/openresponses/openresponses/blob/main/src/generated/kubb/zod/reasoningBodySchema.ts
if (reasoningItem) {
expect(reasoningItem).toHaveProperty('id');
expect(reasoningItem).toHaveProperty('type', 'reasoning');
// Note: 'status' is NOT a field on reasoning items per the spec
expect(reasoningItem).toHaveProperty('summary');
expect(Array.isArray(reasoningItem.summary)).toBe(true);
// Validate content items
if (reasoningItem.content && reasoningItem.content.length > 0) {
const reasoningContent = reasoningItem.content[0];
expect(reasoningContent).toHaveProperty('type', 'reasoning_text');
expect(reasoningContent).toHaveProperty('text');
}
}
const messageItem = response.body.output.find((item) => item.type === 'message');
expect(messageItem).toBeDefined();
});
it('should stream reasoning events when thinking is enabled', async () => {
const response = await authRequest()
.post('/api/agents/v1/responses')
.send({
model: thinkingAgent.id,
input: [
{
type: 'message',
role: 'user',
content: 'What is 12 + 8? Think step by step.',
},
],
stream: true,
})
.buffer(true)
.parse((res, callback) => {
let data = '';
res.on('data', (chunk) => {
data += chunk.toString();
});
res.on('end', () => {
callback(null, data);
});
});
expect(response.status).toBe(200);
const events = parseSSEEvents(response.body);
// Validate all events against Open Responses spec
const validationErrors = validateAllStreamingEvents(events);
if (validationErrors.length > 0) {
console.error('Reasoning streaming event validation errors:', validationErrors);
}
expect(validationErrors).toEqual([]);
// Check for reasoning-related events using correct event types per Open Responses spec
// Note: The spec uses response.reasoning.delta NOT response.reasoning_text.delta
const reasoningDeltaEvents = events.filter(
(e) => e.data && e.data.type === 'response.reasoning.delta',
);
const reasoningDoneEvents = events.filter(
(e) => e.data && e.data.type === 'response.reasoning.done',
);
// If reasoning events are present, validate their structure
if (reasoningDeltaEvents.length > 0) {
const deltaEvent = reasoningDeltaEvents[0];
expect(deltaEvent.data).toHaveProperty('item_id');
expect(deltaEvent.data).toHaveProperty('delta');
expect(deltaEvent.data).toHaveProperty('output_index');
expect(deltaEvent.data).toHaveProperty('content_index');
expect(deltaEvent.data).toHaveProperty('sequence_number');
}
if (reasoningDoneEvents.length > 0) {
const doneEvent = reasoningDoneEvents[0];
expect(doneEvent.data).toHaveProperty('item_id');
expect(doneEvent.data).toHaveProperty('text');
expect(doneEvent.data).toHaveProperty('output_index');
expect(doneEvent.data).toHaveProperty('content_index');
expect(doneEvent.data).toHaveProperty('sequence_number');
}
// Verify stream completed properly
const eventTypes = events.map((e) => e.event);
expect(eventTypes).toContain('response.completed');
});
});
/* ===========================================================================
* SCHEMA VALIDATION TESTS
* Verify response schema compliance
* =========================================================================== */
describe('Schema Validation', () => {
it('should include all required fields in response', async () => {
const response = await authRequest().post('/api/agents/v1/responses').send({
model: testAgent.id,
input: 'Test',
});
expect(response.status).toBe(200);
const body = response.body;
// Required fields per Open Responses spec
expect(body).toHaveProperty('id');
expect(body).toHaveProperty('object', 'response');
expect(body).toHaveProperty('created_at');
expect(body).toHaveProperty('completed_at');
expect(body).toHaveProperty('status');
expect(body).toHaveProperty('model');
expect(body).toHaveProperty('output');
expect(body).toHaveProperty('tools');
expect(body).toHaveProperty('tool_choice');
expect(body).toHaveProperty('truncation');
expect(body).toHaveProperty('parallel_tool_calls');
expect(body).toHaveProperty('text');
expect(body).toHaveProperty('temperature');
expect(body).toHaveProperty('top_p');
expect(body).toHaveProperty('presence_penalty');
expect(body).toHaveProperty('frequency_penalty');
expect(body).toHaveProperty('top_logprobs');
expect(body).toHaveProperty('store');
expect(body).toHaveProperty('background');
expect(body).toHaveProperty('service_tier');
expect(body).toHaveProperty('metadata');
// top_logprobs must be a number (not null)
expect(typeof body.top_logprobs).toBe('number');
// Usage must have required detail fields
expect(body).toHaveProperty('usage');
expect(body.usage).toHaveProperty('input_tokens');
expect(body.usage).toHaveProperty('output_tokens');
expect(body.usage).toHaveProperty('total_tokens');
expect(body.usage).toHaveProperty('input_tokens_details');
expect(body.usage).toHaveProperty('output_tokens_details');
expect(body.usage.input_tokens_details).toHaveProperty('cached_tokens');
expect(body.usage.output_tokens_details).toHaveProperty('reasoning_tokens');
});
it('should have valid message item structure', async () => {
const response = await authRequest().post('/api/agents/v1/responses').send({
model: testAgent.id,
input: 'Hello',
});
expect(response.status).toBe(200);
const messageItem = response.body.output.find((item) => item.type === 'message');
expect(messageItem).toBeDefined();
// Message item required fields
expect(messageItem).toHaveProperty('type', 'message');
expect(messageItem).toHaveProperty('id');
expect(messageItem).toHaveProperty('status');
expect(messageItem).toHaveProperty('role', 'assistant');
expect(messageItem).toHaveProperty('content');
expect(Array.isArray(messageItem.content)).toBe(true);
// Content part structure - verify all required fields
if (messageItem.content.length > 0) {
const textContent = messageItem.content.find((c) => c.type === 'output_text');
if (textContent) {
expect(textContent).toHaveProperty('type', 'output_text');
expect(textContent).toHaveProperty('text');
expect(textContent).toHaveProperty('annotations');
expect(textContent).toHaveProperty('logprobs');
expect(Array.isArray(textContent.annotations)).toBe(true);
expect(Array.isArray(textContent.logprobs)).toBe(true);
}
}
// Verify reasoning item has required summary field
const reasoningItem = response.body.output.find((item) => item.type === 'reasoning');
if (reasoningItem) {
expect(reasoningItem).toHaveProperty('type', 'reasoning');
expect(reasoningItem).toHaveProperty('id');
expect(reasoningItem).toHaveProperty('summary');
expect(Array.isArray(reasoningItem.summary)).toBe(true);
}
});
});
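// Sketch, not part of the original suite: the top-level required-field checks
// in the Schema Validation tests above could be driven from a list, so that a
// failure reports every missing field at once. `REQUIRED_RESPONSE_FIELDS` and
// `findMissingFields` are hypothetical names; the field list is copied from
// the `toHaveProperty` assertions above.
const REQUIRED_RESPONSE_FIELDS = [
  'id',
  'object',
  'created_at',
  'completed_at',
  'status',
  'model',
  'output',
  'tools',
  'tool_choice',
  'truncation',
  'parallel_tool_calls',
  'text',
  'temperature',
  'top_p',
  'presence_penalty',
  'frequency_penalty',
  'top_logprobs',
  'store',
  'background',
  'service_tier',
  'metadata',
];
// Returns the names in `fields` that are not own properties of `obj`.
function findMissingFields(obj, fields) {
  return fields.filter((field) => !Object.prototype.hasOwnProperty.call(obj, field));
}
// Possible usage: expect(findMissingFields(body, REQUIRED_RESPONSE_FIELDS)).toEqual([]);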
/* ===========================================================================
* RESPONSE STORAGE TESTS
* Tests for store: true and GET /v1/responses/:id
* =========================================================================== */
describe('Response Storage', () => {
it('should store response when store: true and retrieve it', async () => {
// Create a stored response
const createResponse = await authRequest().post('/api/agents/v1/responses').send({
model: testAgent.id,
input: 'Remember this: The answer is 42.',
store: true,
});
expect(createResponse.status).toBe(200);
expect(createResponse.body.status).toBe('completed');
const responseId = createResponse.body.id;
expect(responseId).toMatch(/^resp_/);
// Small delay to give the database write time to complete (a fixed wait, not a guarantee)
await new Promise((resolve) => setTimeout(resolve, 500));
// Retrieve the stored response
const getResponseResult = await authRequest().get(`/api/agents/v1/responses/${responseId}`);
// Note: The response might be stored under conversationId, not responseId.
// A 404 is therefore tolerated for now; any other non-200 status is a failure.
expect([200, 404]).toContain(getResponseResult.status);
if (getResponseResult.status === 200) {
expect(getResponseResult.body.object).toBe('response');
expect(getResponseResult.body.status).toBe('completed');
expect(getResponseResult.body.output.length).toBeGreaterThan(0);
}
});
it('should return 404 for non-existent response', async () => {
const response = await authRequest().get('/api/agents/v1/responses/resp_nonexistent123');
expect(response.status).toBe(404);
expect(response.body.error).toBeDefined();
});
});
/* ===========================================================================
* ERROR HANDLING TESTS
* =========================================================================== */
describe('Error Handling', () => {
it('should return error for missing model', async () => {
const response = await authRequest().post('/api/agents/v1/responses').send({
input: 'Hello',
});
expect(response.status).toBe(400);
expect(response.body.error).toBeDefined();
});
it('should return error for missing input', async () => {
const response = await authRequest().post('/api/agents/v1/responses').send({
model: testAgent.id,
});
expect(response.status).toBe(400);
expect(response.body.error).toBeDefined();
});
it('should return error for non-existent agent', async () => {
const response = await authRequest().post('/api/agents/v1/responses').send({
model: 'agent_nonexistent123456789',
input: 'Hello',
});
expect(response.status).toBe(404);
expect(response.body.error).toBeDefined();
});
});
/* ===========================================================================
* MODELS ENDPOINT TESTS
* =========================================================================== */
describe('GET /v1/responses/models', () => {
it('should list available agents as models', async () => {
const response = await authRequest().get('/api/agents/v1/responses/models');
expect(response.status).toBe(200);
expect(response.body.object).toBe('list');
expect(Array.isArray(response.body.data)).toBe(true);
// Should include our test agent
const foundAgent = response.body.data.find((m) => m.id === testAgent.id);
expect(foundAgent).toBeDefined();
expect(foundAgent.object).toBe('model');
expect(foundAgent.name).toBe(testAgent.name);
});
});
});