LibreChat

mirror of https://github.com/danny-avila/LibreChat.git synced 2026-04-07 08:25:23 +02:00

Author	SHA1	Message	Date
Danny Avila	a6d771d362	fix: correct stale-client cleanup in both OAuth paths - Blocking path: remove result?.clientInfo guard that made cleanup unreachable (handleOAuthRequired returns null on failure, so result?.clientInfo was always false in the failure branch) - returnOnOAuth path: only clear stored client when the prior flow status is FAILED, not on COMPLETED or PENDING flows, to avoid deleting valid registrations during normal flow replacement	2026-04-03 19:28:52 -04:00
Danny Avila	874f2a03fc	fix: address minor review findings N3, N5, N6 - N3: Type deleteClientRegistration param as TokenMethods['deleteTokens'] instead of Promise<unknown> - N5: Elevate deletion failure logging from debug to warn for operator visibility when stale client cleanup fails - N6: Use getLogPrefix() instead of hardcoded log prefix to respect system-user privacy convention	2026-04-03 19:28:52 -04:00
Danny Avila	978ce2b4eb	fix: validate auth server identity and target cleanup to reused clients - Gate client reuse on authorization server identity: compare stored issuer against freshly discovered metadata before reusing, preventing wrong-client reuse when the MCP server switches auth providers - Add reusedStoredClient flag to MCPOAuthFlowMetadata so cleanup only runs when the failed flow actually reused a stored registration, not on unrelated failures (timeouts, user-denied consent, etc.) - Add cleanup in returnOnOAuth path: when a prior flow that reused a stored client is detected as failed, clear the stale registration before re-initiating - Add tests for issuer mismatch and reusedStoredClient flag assertions	2026-04-03 19:28:52 -04:00
Danny Avila	d355be7dd0	fix: clear stale client registration on OAuth flow failure When a stored client_id is no longer recognized by the OAuth server, the flow fails but the stale client stays in MongoDB, causing every retry to reuse the same invalid registration in an infinite loop. On OAuth failure, clear the stored client registration so the next attempt falls through to fresh Dynamic Client Registration. - Add MCPTokenStorage.deleteClientRegistration() for targeted cleanup - Call it from MCPConnectionFactory's OAuth failure path - Add integration test proving recovery from stale client reuse	2026-04-03 19:28:52 -04:00
Danny Avila	20a08e1904	fix: address follow-up review findings R1, R2, R3 - R1: Move `import type { TokenMethods }` to the type-imports section, before local types, per CLAUDE.md import order rules - R2: Add unit test for empty redirect_uris in handler.test.ts to verify the inverted condition triggers re-registration - R3: Use delete for process.env.DOMAIN_SERVER restoration when the original value was undefined to avoid coercion to string "undefined"	2026-04-03 19:28:52 -04:00
Danny Avila	83ba37853b	fix: resolve type check errors for OAuthClientInformation redirect_uris The SDK's OAuthClientInformation type lacks redirect_uris (only on OAuthClientInformationFull). Cast to the local OAuthClientInformation type in handler.ts when accessing deserialized client info from DB, and use intersection types in tests for clientInfo with redirect_uris.	2026-04-03 19:28:52 -04:00
Danny Avila	ca60c83aa3	fix: address review findings for client registration reuse - Fix empty redirect_uris bug: invert condition so missing/empty redirect_uris triggers re-registration instead of silent reuse - Revert undocumented config?.redirect_uri in auto-discovery path - Change DB error logging from debug to warn for operator visibility - Fix import order: move package type import to correct section - Remove redundant type cast and misleading JSDoc comment - Test file: remove dead imports, restore process.env.DOMAIN_SERVER, rename describe blocks, add empty redirect_uris edge case test, add concurrent reconnection test with pre-seeded token, scope documentation to reconnection stabilization	2026-04-03 19:28:51 -04:00
Danny Avila	e22c4675e8	test: add client registration reuse tests for horizontal scaling race condition Reproduces the client_id mismatch bug that occurs in multi-replica deployments where concurrent initiateOAuthFlow calls each register a new OAuth client. Tests verify that the findToken-based client reuse prevents re-registration.	2026-04-03 19:28:51 -04:00
Denis Palnitsky	7e0fffca25	Add undefined fields for logo_uri and tos_uri in OAuth metadata tests	2026-04-03 19:28:51 -04:00
Denis Palnitsky	016e96849e	Handle re-registration of OAuth clients when redirect_uri changes	2026-04-03 19:28:51 -04:00
Denis Palnitsky	2fcf8c5419	fix: reuse existing OAuth client registrations to prevent client_id mismatch When using auto-discovered OAuth (DCR), LibreChat calls /register on every flow initiation, getting a new client_id each time. When concurrent connections or reconnections happen, the client_id used during /authorize differs from the one used during /token, causing the server to reject the exchange. Before registering a new client, check if a valid client registration already exists in the database and reuse it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 19:28:51 -04:00
Danny Avila	fda72ac621	🏗️ refactor: Remove Redundant Caching, Migrate Config Services to TypeScript (#12466 ) * ♻️ refactor: Remove redundant scopedCacheKey caching, support user-provided key model fetching Remove redundant cache layers that used `scopedCacheKey()` (tenant-only scoping) on top of `getAppConfig()` which already caches per-principal (role+user+tenant). This caused config overrides for different principals within the same tenant to be invisible due to stale cached data. Changes: - Add `requireJwtAuth` to `/api/endpoints` route for proper user context - Remove ENDPOINT_CONFIG, STARTUP_CONFIG, PLUGINS, TOOLS, and MODELS_CONFIG cache layers — all derive from `getAppConfig()` with cheap computation - Enhance MODEL_QUERIES cache: hash(baseURL+apiKey) keys, 2-minute TTL, caching centralized in `fetchModels()` base function - Support fetching models with user-provided API keys in `loadConfigModels` via `getUserKeyValues` lookup (no caching for user keys) - Update all affected tests Closes #1028 * ♻️ refactor: Migrate config services to TypeScript in packages/api Move core config logic from CJS /api wrappers to typed TypeScript in packages/api using dependency injection factories: - `createEndpointsConfigService` — endpoint config merging + checkCapability - `createLoadConfigModels` — custom endpoint model loading with user key support - `createMCPToolCacheService` — MCP tool cache operations (update, merge, cache) /api files become thin wrappers that wire dependencies (getAppConfig, loadDefaultEndpointsConfig, getUserKeyValues, getCachedTools, etc.) into the typed factories. Also moves existing `endpoints/config.ts` → `endpoints/config/providers.ts` to accommodate the new `config/` directory structure. * 🔄 fix: Invalidate models query when user API key is set or revoked Without this, users had to refresh the page after entering their API key to see the updated model list fetched with their credentials. - Invalidate QueryKeys.models in useUpdateUserKeysMutation onSuccess - Invalidate QueryKeys.models in useRevokeUserKeyMutation onSuccess - Invalidate QueryKeys.models in useRevokeAllUserKeysMutation onSuccess * 🗺️ fix: Remap YAML-level override keys to AppConfig equivalents in mergeConfigOverrides Config overrides stored in the DB use YAML-level keys (TCustomConfig), but they're merged into the already-processed AppConfig where some fields have been renamed by AppService. This caused mcpServers overrides to land on a nonexistent key instead of mcpConfig, so config-override MCP servers never appeared in the UI. - Add OVERRIDE_KEY_MAP to remap mcpServers→mcpConfig, interface→interfaceConfig - Apply remapping before deep merge in mergeConfigOverrides - Add test for YAML-level key remapping behavior - Update existing tests to use AppConfig field names in assertions * 🧪 test: Update service.spec to use AppConfig field names after override key remapping * 🛡️ fix: Address code review findings — reliability, types, tests, and performance - Pass tenant context (getTenantId) in importers.js getEndpointsConfig call - Add 5 tests for user-provided API key model fetching (key found, no key, DB error, missing userId, apiKey-only with fixed baseURL) - Distinguish NO_USER_KEY (debug) from infrastructure errors (warn) in catch - Switch fetchPromisesMap from Promise.all to Promise.allSettled so one failing provider doesn't kill the entire model config - Parallelize getUserKeyValues DB lookups via batched Promise.allSettled instead of sequential awaits in the loop - Hoist standardCache instance in fetchModels to avoid double instantiation - Replace Record<string, unknown> types with Partial<TConfig>-based types; remove as unknown as T double-cast in endpoints config - Narrow Bedrock availableRegions to typed destructure - Narrow version field from string\|number\|undefined to string\|undefined - Fix import ordering in mcp/tools.ts and config/models.ts per AGENTS.md - Add JSDoc to getModelsConfig alias clarifying caching semantics * fix: Guard against null getCachedTools in mergeAppTools * 🔍 fix: Address follow-up review — deduplicate extractEnvVariable, fix error discrimination, add log-level tests - Deduplicate extractEnvVariable calls: resolve apiKey/baseURL once, reuse for both the entry and isUserProvided checks (Finding A) - Move ResolvedEndpoint interface from function closure to module scope (Finding B) - Replace fragile msg.includes('NO_USER_KEY') with ErrorTypes.NO_USER_KEY enum check against actual error message format (Finding C). Also handle ErrorTypes.INVALID_USER_KEY as an expected "no key" case. - Add test asserting logger.warn is called for infra errors (not debug) - Add test asserting logger.debug is called for NO_USER_KEY errors (not warn) * fix: Preserve numeric assistants version via String() coercion * 🐛 fix: Address secondary review — Ollama cache bypass, cache tests, type safety - Fix Ollama success path bypassing cache write in fetchModels (CRITICAL): store result before returning so Ollama models benefit from 2-minute TTL - Add 4 fetchModels cache behavior tests: cache write with TTL, cache hit short-circuits HTTP, skipCache bypasses read+write, empty results not cached - Type-safe OVERRIDE_KEY_MAP: Partial<Record<keyof TCustomConfig, keyof AppConfig>> so compiler catches future field rename mismatches - Fix import ordering in config/models.ts (package types longest→shortest) - Rename ToolCacheDeps → MCPToolCacheDeps for naming consistency - Expand getModelsConfig JSDoc to explain caching granularity * fix: Narrow OVERRIDE_KEY_MAP index to satisfy strict tsconfig * 🧩 fix: Add allowedProviders to TConfig, remove Record<string, unknown> from PartialEndpointEntry The agents endpoint config includes allowedProviders (used by the frontend AgentPanel to filter available providers), but it was missing from TConfig. This forced PartialEndpointEntry to use & Record<string, unknown> as an escape hatch, violating AGENTS.md type policy. - Add allowedProviders?: (string \| EModelEndpoint)[] to TConfig - Remove Record<string, unknown> from PartialEndpointEntry — now just Partial<TConfig> * 🛡️ fix: Isolate Ollama cache write from fetch try-catch, add Ollama cache tests - Separate Ollama fetch and cache write into distinct scopes so a cache failure (e.g., Redis down) doesn't misattribute the error as an Ollama API failure and fall through to the OpenAI-compatible path (Issue A) - Add 2 Ollama-specific cache tests: models written with TTL on fetch, cached models returned without hitting server (Issue B) - Replace hardcoded 120000 with Time.TWO_MINUTES constant in cache TTL test assertion (Issue C) - Fix OVERRIDE_KEY_MAP JSDoc to accurately describe runtime vs compile-time type enforcement (Issue D) - Add global beforeEach for cache mock reset to prevent cross-test leakage * 🧪 fix: Address third review — DI consistency, cache key width, MCP tests - Inject loadCustomEndpointsConfig via EndpointsConfigDeps with default fallback, matching loadDefaultEndpointsConfig DI pattern (Finding 3) - Widen modelsCacheKey from 64-bit (.slice(0,16)) to 128-bit (.slice(0,32)) for collision-sensitive cross-credential cache key (Finding 4) - Add fetchModels.mockReset() in loadConfigModels.spec beforeEach to prevent mock implementation leaks across tests (Finding 5) - Add 11 unit tests for createMCPToolCacheService covering all three functions: null/empty input, successful ops, error propagation, cold-cache merge (Finding 2) - Simplify getModelsConfig JSDoc to @see reference (Finding 10) * ♻️ refactor: Address remaining follow-ups from reviews OVERRIDE_KEY_MAP completeness: - Add missing turnstile→turnstileConfig mapping - Add exhaustiveness test verifying all three renamed keys are remapped and original YAML keys don't leak through Import role context: - Pass userRole through importConversations job → importLibreChatConvo so role-based endpoint overrides are honored during conversation import - Update convos.js route to include req.user.role in the job payload createEndpointsConfigService unit tests: - Add 8 tests covering: default+custom merge, Azure/AzureAssistants/ Anthropic Vertex/Bedrock config enrichment, assistants version coercion, agents allowedProviders, req.config bypass Plugins/tools efficiency: - Use Set for includedTools/filteredTools lookups (O(1) vs O(n) per plugin) - Combine auth check + filter into single pass (eliminates intermediate array) - Pre-compute toolDefKeys Set for O(1) tool definition lookups * fix: Scope model query cache by user when userIdQuery is enabled * fix: Skip model cache for userIdQuery endpoints, fix endpoints test types - When userIdQuery is true, skip caching entirely (like user_provided keys) to avoid cross-user model list leakage without duplicating cache data - Fix AgentCapabilities type error in endpoints.spec.ts — use enum values and appConfig() helper for partial mock typing * 🐛 fix: Restore filteredTools+includedTools composition, add checkCapability tests - Fix filteredTools regression: whitelist and blacklist are now applied independently (two flat guards), matching original behavior where includedTools=['a','b'] + filteredTools=['b'] produces ['a'] (Finding A) - Fix Set spread in toolkit loop: pre-compute toolDefKeysList array once alongside the Set, reuse for .some() without per-plugin allocation (Finding B) - Add 2 filteredTools tests: blacklist-only path and combined whitelist+blacklist composition (Finding C) - Add 3 checkCapability tests: capability present, capability absent, fallback to defaultAgentCapabilities for non-agents endpoints (Finding D) * 🔑 fix: Include config-override MCP servers in filterAuthorizedTools Config-override MCP servers (defined via admin config overrides for roles/groups) were rejected by filterAuthorizedTools because it called getAllServerConfigs(userId) without the configServers parameter. Only YAML and DB-backed user servers were included in the access check. - Add configServers parameter to filterAuthorizedTools - Resolve config servers via resolveConfigServers(req) at all 4 callsites (create, update, duplicate, revert) using parallel Promise.all - Pass configServers through to getAllServerConfigs(userId, configServers) so the registry merges config-source servers into the access check - Update filterAuthorizedTools.spec.js mock for resolveConfigServers * fix: Skip model cache for userIdQuery endpoints, fix endpoints test types For user-provided key endpoints (userProvide: true), skip the full model list re-fetch during message validation — the user already selected from a list we served them, and re-fetching with skipCache:true on every message send is both slow and fragile (5s provider timeout = rejected model). Instead, validate the model string format only: - Must be a string, max 256 chars - Must match [a-zA-Z0-9][a-zA-Z0-9_.:\-/@+ ]* (covers all known provider model ID formats while rejecting injection attempts) System-configured endpoints still get full model list validation as before. * 🧪 test: Add regression tests for filterAuthorizedTools configServers and validateModel filterAuthorizedTools: - Add test verifying configServers is passed to getAllServerConfigs and config-override server tools are allowed through - Guard resolveConfigServers in createAgentHandler to only run when MCP tools are present (skip for tool-free agent creates) validateModel (12 new tests): - Format validation: missing model, non-string, length overflow, leading special char, script injection, standard model ID acceptance - userProvide early-return: next() called immediately, getModelsConfig not invoked (regression guard for the exact bug this fixes) - System endpoint list validation: reject unknown model, accept known model, handle null/missing models config Also fix unnecessary backslash escape in MODEL_PATTERN regex. * 🧹 fix: Remove space from MODEL_PATTERN, trim input, clean up nits - Remove space character from MODEL_PATTERN regex — no real model ID uses spaces; prevents spurious violation logs from whitespace artifacts - Add model.trim() before validation to handle accidental whitespace - Remove redundant filterUniquePlugins call on already-deduplicated output - Add comment documenting intentional whitelist+blacklist composition - Add getUserKeyValues.mockReset() in loadConfigModels.spec beforeEach - Remove narrating JSDoc from getModelsConfig one-liner - Add 2 tests: trim whitespace handling, reject spaces in model ID * fix: Match startup tool loader semantics — includedTools takes precedence over filteredTools The startup tool loader (loadAndFormatTools) explicitly ignores filteredTools when includedTools is set, with a warning log. The PluginController was applying both independently, creating inconsistent behavior where the same config produced different results at startup vs plugin listing time. Restored mutually exclusive semantics: when includedTools is non-empty, filteredTools is not evaluated. * 🧹 chore: Simplify validateModel flow, note auth requirement on endpoints route - Separate missing-model from invalid-model checks cleanly: type+presence guard first, then trim+format guard (reviewer NIT) - Add route comment noting auth is required for role/tenant scoping * fix: Write trimmed model back to req.body.model for downstream consumers	2026-03-30 16:49:48 -04:00
Danny Avila	0d94881c2d	🧹 refactor: Tighten Config Schema Typing and Remove Deprecated Fields (#12452 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * refactor: Remove deprecated and unused fields from endpoint schemas - Remove summarize, summaryModel from endpointSchema and azureEndpointSchema - Remove plugins from azureEndpointSchema - Remove customOrder from endpointSchema and azureEndpointSchema - Remove baseURL from all and agents endpoint schemas - Type paramDefinitions with full SettingDefinition-based schema - Clean up summarize/summaryModel references in initialize.ts and config.spec.ts * refactor: Improve MCP transport schema typing - Add defaults to transport type discriminators (stdio, websocket, sse) - Type stderr field as IOType union instead of z.any() * refactor: Add narrowed preset schema for model specs - Create tModelSpecPresetSchema omitting system/DB/deprecated fields - Update tModelSpecSchema to use the narrowed preset schema * test: Add explicit type field to MCP test fixtures Add transport type discriminator to test objects that construct MCPOptions/ParsedServerConfig directly, required after type field changed from optional to default in schema definitions. * chore: Bump librechat-data-provider to 0.8.404 * refactor: Tighten z.record(z.any()) fields to precise value types - Type headers fields as z.record(z.string()) in endpoint, assistant, and azure schemas - Type addParams as z.record(z.union([z.string(), z.number(), z.boolean(), z.null()])) - Type azure additionalHeaders as z.record(z.string()) - Type memory model_parameters as z.record(z.union([z.string(), z.number(), z.boolean()])) - Type firecrawl changeTrackingOptions.schema as z.record(z.string()) * refactor: Type supportedMimeTypes schema as z.array(z.string()) Replace z.array(z.any()).refine() with z.array(z.string()) since config input is always strings that get converted to RegExp via convertStringsToRegex() after parsing. Destructure supportedMimeTypes from spreads to avoid string[]/RegExp[] type mismatch. * refactor: Tighten enum, role, and numeric constraint schemas - Type engineSTT as enum ['openai', 'azureOpenAI'] - Type engineTTS as enum ['openai', 'azureOpenAI', 'elevenlabs', 'localai'] - Constrain playbackRate to 0.25–4 range - Type titleMessageRole as enum ['system', 'user', 'assistant'] - Add int().nonnegative() to MCP timeout and firecrawl timeout * chore: Bump librechat-data-provider to 0.8.405 * fix: Accept both string and RegExp in supportedMimeTypes schema The schema must accept both string[] (config input) and RegExp[] (post-merge runtime) since tests validate merged output against the schema. Use z.union([z.string(), z.instanceof(RegExp)]) to handle both. * refactor: Address review findings for schema tightening PR - Revert changeTrackingOptions.schema to z.record(z.unknown()) (JSON Schema is nested, not flat strings) - Remove dead contextStrategy code from BaseClient.js and cleanup.js - Extract paramDefinitionSchema to named exported constant - Add .int() constraint to columnSpan and columns - Apply consistent .int().nonnegative() to initTimeout, sseReadTimeout, scraperTimeout - Update stale stderr JSDoc to match actual accepted types - Add comprehensive tests for paramDefinitionSchema, tModelSpecPresetSchema, endpointSchema deprecated field stripping, and azureEndpointSchema * fix: Address second review pass findings - Revert supportedMimeTypesSchema to z.array(z.string()) and remove as string[] casts — fix tests to not validate merged RegExp[] output against the config input schema - Remove unused tModelSpecSchema import from test file - Consolidate duplicate '../src/schemas' imports - Add expiredAt coverage to tModelSpecPresetSchema test - Assert plugins is absent in azureEndpointSchema test - Add sync comments for engineSTT/engineTTS enum literals * refactor: Omit preset-management fields from tModelSpecPresetSchema Omit conversationId, presetId, title, defaultPreset, and order from the model spec preset schema — these are preset-management fields that don't belong in model spec configuration.	2026-03-29 01:10:57 -04:00
Danny Avila	fda1bfc3cc	🔬 ci: Add TypeScript Type Checks to Backend Workflow and Fix All Type Errors (#12451 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * fix(data-schemas): resolve TypeScript strict type check errors in source files - Constrain ConfigSection to string keys via `string & keyof TCustomConfig` - Replace broken `z` import from data-provider with TCustomConfig derivation - Add `_id: Types.ObjectId` to IUser matching other Document interfaces - Add `federatedTokens` and `openidTokens` optional fields to IUser - Type mongoose model accessors as `Model<IRole>` and `Model<IUser>` - Widen `getPremiumRate` param to accept `number \| null` - Widen `bulkWriteAclEntries` ops to untyped `AnyBulkWriteOperation[]` - Fix `getUserPrincipals` return type to use `PrincipalType` enum - Add non-null assertions for `connection.db` in migration files - Import DailyRotateFile constructor directly instead of relying on broken module augmentation across mismatched node_modules trees - Add winston-daily-rotate-file as devDependency for type resolution * fix(data-schemas): resolve TypeScript type errors in test files - Replace arbitrary test keys with valid TCustomConfig properties in config.spec - Use non-null assertions for permission objects in role.methods.spec - Replace `.SHARED_GLOBAL` access with `.not.toHaveProperty()` for legacy field - Add non-null assertions for balance, writeRate, readRate in spendTokens.spec - Update mock user _id to use ObjectId in user.test - Remove unused Schema import in tenantIndexes.spec * fix(api): resolve TypeScript strict type check errors across source and test files - Widen getUserPrincipals dep type in capabilities middleware - Fix federatedTokens type in createSafeUser return - Use proper mock req type for read-only properties in preAuthTenant.spec - Replace `as IUser` casts with ObjectId-typed mocks in openid/oidc specs - Use TokenExchangeMethodEnum values instead of string literals in MCP specs - Fix SessionStore type compatibility in sessionCache specs - Replace `catch (error: any)` with `(error as Error)` in redis specs - Remove invalid properties from test data in initialize and MCP specs - Add String.prototype.isWellFormed declaration for sanitizeTitle spec * fix(client): resolve TypeScript type errors in shared client components - Add default values for destructured bindings in OGDialogTemplate - Replace broken ExtendedFile import with inline type in FileIcon * ci: add TypeScript type-check job to backend review workflow Add a `typecheck` job that runs `tsc --noEmit` on all four TypeScript workspaces (data-provider, data-schemas, @librechat/api, @librechat/client) after the build step. Catches type errors that rollup builds may miss. * fix(data-schemas): add local type declaration for DailyRotateFile transport The `winston-daily-rotate-file` package ships a module augmentation for `winston/lib/winston/transports`, but it fails when winston and winston-daily-rotate-file resolve from different node_modules trees (which happens in this monorepo due to npm hoisting). Add a local `.d.ts` declaration that augments the same module path from within data-schemas' compilation unit, so `tsc --noEmit` passes while keeping the original runtime pattern (`new winston.transports.DailyRotateFile`). * fix: address code review findings from PR #12451 - Restore typed `AnyBulkWriteOperation<AclEntry>[]` on bulkWriteAclEntries, cast to untyped only at the tenantSafeBulkWrite call site (Finding 1) - Type `findUser` model accessor consistently with `findUsers` (Finding 2) - Replace inline `import('mongoose').ClientSession` with top-level import type - Use `toHaveLength` for spy assertions in playwright-expect spec file - Replace numbered Record casts with `.not.toHaveProperty()` in role.methods.spec for SHARED_GLOBAL assertions - Use per-test ObjectIds instead of shared testUserId in openid.spec - Replace inline `import()` type annotations with top-level SessionData import in sessionCache spec - Remove extraneous blank line in user.ts searchUsers * refactor: address remaining review findings (4–7) - Extract OIDCTokens interface in user.ts; deduplicate across IUser fields and oidc.ts FederatedTokens (Finding 4) - Move String.isWellFormed declaration from spec file to project-level src/types/es2024-string.d.ts (Finding 5) - Replace verbose `= undefined` defaults in OGDialogTemplate with null coalescing pattern (Finding 6) - Replace `Record<string, unknown>` TestConfig with named interface containing explicit test fields (Finding 7)	2026-03-28 21:06:39 -04:00
Danny Avila	877c2efc85	🏗️ feat: bulkWrite isolation, pre-auth context, strict-mode fixes (#12445 ) * fix: wrap seedDatabase() in runAsSystem() for strict tenant mode seedDatabase() was called without tenant context at startup, causing every Mongoose operation inside it to throw when TENANT_ISOLATION_STRICT=true. Wrapping in runAsSystem() gives it the SYSTEM_TENANT_ID sentinel so the isolation plugin skips filtering, matching the pattern already used for performStartupChecks and updateInterfacePermissions. * fix: chain tenantContextMiddleware in optionalJwtAuth optionalJwtAuth populated req.user but never established ALS tenant context, unlike requireJwtAuth which chains tenantContextMiddleware after successful auth. Authenticated users hitting routes with optionalJwtAuth (e.g. /api/banner) had no tenant isolation. * feat: tenant-safe bulkWrite wrapper and call-site migration Mongoose's bulkWrite() does not trigger schema-level middleware hooks, so the applyTenantIsolation plugin cannot intercept it. This adds a tenantSafeBulkWrite() utility that injects the current ALS tenant context into every operation's filter/document before delegating to native bulkWrite. Migrates all 8 runtime bulkWrite call sites: - agentCategory (seedCategories, ensureDefaultCategories) - conversation (bulkSaveConvos) - message (bulkSaveMessages) - file (batchUpdateFiles) - conversationTag (updateTagsForConversation, bulkIncrementTagCounts) - aclEntry (bulkWriteAclEntries) systemGrant.seedSystemGrants is intentionally not migrated — it uses explicit tenantId: { $exists: false } filters and is exempt from the isolation plugin. * feat: pre-auth tenant middleware and tenant-scoped config cache Adds preAuthTenantMiddleware that reads X-Tenant-Id from the request header and wraps downstream in tenantStorage ALS context. Wired onto /oauth, /api/auth, /api/config, and /api/share — unauthenticated routes that need tenant scoping before JWT auth runs. The /api/config cache key is now tenant-scoped (STARTUP_CONFIG:${tenantId}) so multi-tenant deployments serve the correct login page config per tenant. The middleware is intentionally minimal — no subdomain parsing, no OIDC claim extraction. The private fork's reverse proxy or auth gateway sets the header. * feat: accept optional tenantId in updateInterfacePermissions When tenantId is provided, the function re-enters inside tenantStorage.run({ tenantId }) so all downstream Mongoose queries target that tenant's roles instead of the system context. This lets the private fork's tenant provisioning flow call updateInterfacePermissions per-tenant after creating tenant-scoped ADMIN/USER roles. * fix: tenant-filter $lookup in getPromptGroup aggregation The $lookup stage in getPromptGroup() queried the prompts collection without tenant filtering. While the outer PromptGroup aggregate is protected by the tenantIsolation plugin's pre('aggregate') hook, $lookup runs as an internal MongoDB operation that bypasses Mongoose hooks entirely. Converts from simple field-based $lookup to pipeline-based $lookup with an explicit tenantId match when tenant context is active. * fix: replace field-level unique indexes with tenant-scoped compounds Field-level unique:true creates a globally-unique single-field index in MongoDB, which would cause insert failures across tenants sharing the same ID values. - agent.id: removed field-level unique, added { id, tenantId } compound - convo.conversationId: removed field-level unique (compound at line 50 already exists: { conversationId, user, tenantId }) - message.messageId: removed field-level unique (compound at line 165 already exists: { messageId, user, tenantId }) - preset.presetId: removed field-level unique, added { presetId, tenantId } compound * fix: scope MODELS_CONFIG, ENDPOINT_CONFIG, PLUGINS, TOOLS caches by tenant These caches store per-tenant configuration (available models, endpoint settings, plugin availability, tool definitions) but were using global cache keys. In multi-tenant mode, one tenant's cached config would be served to all tenants. Appends :${tenantId} to cache keys when tenant context is active. Falls back to the unscoped key when no tenant context exists (backward compatible for single-tenant OSS deployments). Covers all read, write, and delete sites: - ModelController.js: get/set MODELS_CONFIG - PluginController.js: get/set PLUGINS, get/set TOOLS - getEndpointsConfig.js: get/set/delete ENDPOINT_CONFIG - app.js: delete ENDPOINT_CONFIG (clearEndpointConfigCache) - mcp.js: delete TOOLS (updateMCPTools, mergeAppTools) - importers.js: get ENDPOINT_CONFIG * fix: add getTenantId to PluginController spec mock The data-schemas mock was missing getTenantId, causing all PluginController tests to throw when the controller calls getTenantId() for tenant-scoped cache keys. * fix: address review findings — migration, strict-mode, DRY, types Addresses all CRITICAL, MAJOR, and MINOR review findings: F1 (CRITICAL): Add agents, conversations, messages, presets to SUPERSEDED_INDEXES in tenantIndexes.ts so dropSupersededTenantIndexes() drops the old single-field unique indexes that block multi-tenant inserts. F2 (CRITICAL): Unknown bulkWrite op types now throw in strict mode instead of silently passing through without tenant injection. F3 (MAJOR): Replace wildcard export with named export for tenantSafeBulkWrite, hiding _resetBulkWriteStrictCache from the public package API. F5 (MAJOR): Restore AnyBulkWriteOperation<IAclEntry>[] typing on bulkWriteAclEntries — the unparameterized wrapper accepts parameterized ops as a subtype. F7 (MAJOR): Fix config.js tenant precedence — JWT-derived req.user.tenantId now takes priority over the X-Tenant-Id header for authenticated requests. F8 (MINOR): Extract scopedCacheKey() helper into tenantContext.ts and replace all 11 inline occurrences across 7 files. F9 (MINOR): Use simple localField/foreignField $lookup for the non-tenant getPromptGroup path (more efficient index seeks). F12 (NIT): Remove redundant BulkOp type alias. F13 (NIT): Remove debug log that leaked raw tenantId. * fix: add new superseded indexes to tenantIndexes test fixture The test creates old indexes to verify the migration drops them. Missing fixture entries for agents.id_1, conversations.conversationId_1, messages.messageId_1, and presets.presetId_1 caused the count assertion to fail (expected 22, got 18). * fix: restore logger.warn for unknown bulk op types in non-strict mode * fix: block SYSTEM_TENANT_ID sentinel from external header input CRITICAL: preAuthTenantMiddleware accepted any string as X-Tenant-Id, including '__SYSTEM__'. The tenantIsolation plugin treats SYSTEM_TENANT_ID as an explicit bypass — skipping ALL query filters. A client sending X-Tenant-Id: __SYSTEM__ to pre-auth routes (/api/share, /api/config, /api/auth, /oauth) would execute Mongoose operations without tenant isolation. Fixes: - preAuthTenantMiddleware rejects SYSTEM_TENANT_ID in header - scopedCacheKey returns the base key (not key:__SYSTEM__) in system context, preventing stale cache entries during runAsSystem() - updateInterfacePermissions guards tenantId against SYSTEM_TENANT_ID - $lookup pipeline separates $expr join from constant tenantId match for better index utilization - Regression test for sentinel rejection in preAuthTenant.spec.ts - Remove redundant getTenantId() call in config.js * test: add missing deleteMany/replaceOne coverage, fix vacuous ALS assertions bulkWrite spec: - deleteMany: verifies tenant-scoped deletion leaves other tenants untouched - replaceOne: verifies tenantId injected into both filter and replacement - replaceOne overwrite: verifies a conflicting tenantId in the replacement document is overwritten by the ALS tenant (defense-in-depth) - empty ops array: verifies graceful handling preAuthTenant spec: - All negative-case tests now use the capturedNext pattern to verify getTenantId() inside the middleware's execution context, not the test runner's outer frame (which was always undefined regardless) * feat: tenant-isolate MESSAGES cache, FLOWS cache, and GenerationJobManager MESSAGES cache (streamAudio.js): - Cache key now uses scopedCacheKey(messageId) to prefix with tenantId, preventing cross-tenant message content reads during TTS streaming. FLOWS cache (FlowStateManager): - getFlowKey() now generates ${type}:${tenantId}:${flowId} when tenant context is active, isolating OAuth flow state per tenant. GenerationJobManager: - tenantId added to SerializableJobData and GenerationJobMetadata - createJob() captures the current ALS tenant context (excluding SYSTEM_TENANT_ID) and stores it in job metadata - SSE subscription endpoint validates job.metadata.tenantId matches req.user.tenantId, blocking cross-tenant stream access - Both InMemoryJobStore and RedisJobStore updated to accept tenantId * fix: add getTenantId and SYSTEM_TENANT_ID to MCP OAuth test mocks FlowStateManager.getFlowKey() now calls getTenantId() for tenant-scoped flow keys. The 4 MCP OAuth test files mock @librechat/data-schemas without these exports, causing TypeError at runtime. * fix: correct import ordering per AGENTS.md conventions Package imports sorted shortest to longest line length, local imports sorted longest to shortest — fixes ordering violations introduced by our new imports across 8 files. * fix: deserialize tenantId in RedisJobStore — cross-tenant SSE guard was no-op in Redis mode serializeJob() writes tenantId to the Redis hash via Object.entries, but deserializeJob() manually enumerates fields and omitted tenantId. Every getJob() from Redis returned tenantId: undefined, causing the SSE route's cross-tenant guard to short-circuit (undefined && ... → false). * test: SSE tenant guard, FlowStateManager key consistency, ALS scope docs SSE stream tenant tests (streamTenant.spec.js): - Cross-tenant user accessing another tenant's stream → 403 - Same-tenant user accessing own stream → allowed - OSS mode (no tenantId on job) → tenant check skipped FlowStateManager tenant tests (manager.tenant.spec.ts): - completeFlow finds flow created under same tenant context - completeFlow does NOT find flow under different tenant context - Unscoped flows are separate from tenant-scoped flows Documentation: - JSDoc on getFlowKey documenting ALS context consistency requirement - Comment on streamAudio.js scopedCacheKey capture site * fix: SSE stream tests hang on success path, remove internal fork references The success-path tests entered the SSE streaming code which never closes, causing timeout. Mock subscribe() to end the response immediately. Restructured assertions to verify non-403/non-404. Removed "private fork" and "OSS" references from code and test descriptions — replaced with "deployment layer", "multi-tenant deployments", and "single-tenant mode". * fix: address review findings — test rigor, tenant ID validation, docs F1: SSE stream tests now mock subscribe() with correct signature (streamId, writeEvent, onDone, onError) and assert 200 status, verifying the tenant guard actually allows through same-tenant users. F2: completeFlow logs the attempted key and ALS tenantId when flow is not found, so reverse proxy misconfiguration (missing X-Tenant-Id on OAuth callback) produces an actionable warning. F3/F10: preAuthTenantMiddleware validates tenant ID format — rejects colons, special characters, and values exceeding 128 chars. Trims whitespace. Prevents cache key collisions via crafted headers. F4: Documented cache invalidation scope limitation in clearEndpointConfigCache — only the calling tenant's key is cleared; other tenants expire via TTL. F7: getFlowKey JSDoc now lists all 8 methods requiring consistent ALS context. F8: Added dedicated scopedCacheKey unit tests — base key without context, base key in system context, scoped key with tenant, no ALS leakage across scope boundaries. * fix: revert flow key tenant scoping, fix SSE test timing FlowStateManager: Reverts tenant-scoped flow keys. OAuth callbacks arrive without tenant ALS context (provider redirects don't carry X-Tenant-Id), so completeFlow/failFlow would never find flows created under tenant context. Flow IDs are random UUIDs with no collision risk, and flow data is ephemeral (TTL-bounded). SSE tests: Use process.nextTick for onDone callback so Express response headers are flushed before res.write/res.end are called. * fix: restore getTenantId import for completeFlow diagnostic log * fix: correct completeFlow warning message, add missing flow test The warning referenced X-Tenant-Id header consistency which was only relevant when flow keys were tenant-scoped (since reverted). Updated to list actual causes: TTL expiry, missing flow, or routing to a different instance without shared Keyv storage. Removed the getTenantId() call and import — no longer needed since flow keys are unscoped. Added test for the !flowState branch in completeFlow — verifies return false and logger.warn on nonexistent flow ID. * fix: add explicit return type to recursive updateInterfacePermissions The recursive call (tenantId branch calls itself without tenantId) causes TypeScript to infer circular return type 'any'. Adding explicit Promise<void> satisfies the rollup typescript plugin. * fix: update MCPOAuthRaceCondition test to match new completeFlow warning * fix: clearEndpointConfigCache deletes both scoped and unscoped keys Unauthenticated /api/endpoints requests populate the unscoped ENDPOINT_CONFIG key. Admin config mutations clear only the tenant-scoped key, leaving the unscoped entry stale indefinitely. Now deletes both when in tenant context. * fix: tenant guard on abort/status endpoints, warn logs, test coverage F1: Add tenant guard to /chat/status/:conversationId and /chat/abort matching the existing guard on /chat/stream/:streamId. The status endpoint exposes aggregatedContent (AI response text) which requires tenant-level access control. F2: preAuthTenantMiddleware now logs warn for rejected __SYSTEM__ sentinel and malformed tenant IDs, providing observability for bypass probing attempts. F3: Abort fallback path (getActiveJobIdsForUser) now has tenant check after resolving the job. F4: Test for strict mode + SYSTEM_TENANT_ID — verifies runAsSystem bypasses tenantSafeBulkWrite without throwing in strict mode. F5: Test for job with tenantId + user without tenantId → 403. F10: Regex uses idiomatic hyphen-at-start form. F11: Test descriptions changed from "rejects" to "ignores" since middleware calls next() (not 4xx). Also fixes MCPOAuthRaceCondition test assertion to match updated completeFlow warning message. * fix: test coverage for logger.warn, status/abort guards, consistency A: preAuthTenant spec now mocks logger and asserts warn calls for __SYSTEM__ sentinel, malformed characters, and oversized headers. B: streamTenant spec expanded with status and abort endpoint tests — cross-tenant status returns 403, same-tenant returns 200 with body, cross-tenant abort returns 403. C: Abort endpoint uses req.user.tenantId (not req.user?.tenantId) matching stream/status pattern — requireJwtAuth guarantees req.user. D: Malformed header warning now includes ip in log metadata, matching the sentinel warning for consistent SOC correlation. * fix: assert ip field in malformed header warn tests * fix: parallelize cache deletes, document tenant guard, fix import order - clearEndpointConfigCache uses Promise.all for independent cache deletes instead of sequential awaits - SSE stream tenant guard has inline comment explaining backward-compat behavior for untenanted legacy jobs - conversation.ts local imports reordered longest-to-shortest per AGENTS.md * fix: tenant-qualify userJobs keys, document tenant guard backward-compat Job store userJobs keys now include tenantId when available: - Redis: stream:user:{tenantId:userId}:jobs (falls back to stream:user:{userId}:jobs when no tenant) - InMemory: composite key tenantId:userId in userJobMap getActiveJobIdsByUser/getActiveJobIdsForUser accept optional tenantId parameter, threaded through from req.user.tenantId at all call sites (/chat/active and /chat/abort fallback). Added inline comments on all three SSE tenant guards explaining the backward-compat design: untenanted legacy jobs remain accessible when the userId check passes. * fix: parallelize cache deletes, document tenant guard, fix import order Fix InMemoryJobStore.getActiveJobIdsByUser empty-set cleanup to use the tenant-qualified userKey instead of bare userId — prevents orphaned empty Sets accumulating in userJobMap for multi-tenant users. Document cross-tenant staleness in clearEndpointConfigCache JSDoc — other tenants' scoped keys expire via TTL, not active invalidation. * fix: cleanup userJobMap leak, startup warning, DRY tenant guard, docs F1: InMemoryJobStore.cleanup() now removes entries from userJobMap before calling deleteJob, preventing orphaned empty Sets from accumulating with tenant-qualified composite keys. F2: Startup warning when TENANT_ISOLATION_STRICT is active — reminds operators to configure reverse proxy to control X-Tenant-Id header. F3: mergeAppTools JSDoc documents that tenant-scoped TOOLS keys are not actively invalidated (matching clearEndpointConfigCache pattern). F5: Abort handler getActiveJobIdsForUser call uses req.user.tenantId (not req.user?.tenantId) — consistent with stream/status handlers. F6: updateInterfacePermissions JSDoc clarifies SYSTEM_TENANT_ID behavior — falls through to caller's ALS context. F7: Extracted hasTenantMismatch() helper, replacing three identical inline tenant guard blocks across stream/status/abort endpoints. F9: scopedCacheKey JSDoc documents both passthrough cases (no context and SYSTEM_TENANT_ID context). * fix: clean userJobMap in evictOldest — same leak as cleanup()	2026-03-28 16:43:50 -04:00
Danny Avila	935288f841	🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring - Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with `enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains` - Add `MCPServerSource` type ('yaml' \| 'config' \| 'user') and `source` field to `ParsedServerConfig` - Change `dbSourced` determination from `!!config.dbId` to `config.source === 'user'` across MCPManager, ConnectionsRepository, UserConnectionManager, and MCPServerInspector - Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB * feat: three-layer MCPServersRegistry with config cache and lazy init - Add `configCacheRepo` as third repository layer between YAML cache and DB for admin-defined config-source MCP servers - Implement `ensureConfigServers()` that identifies config-override servers from resolved `getAppConfig()` mcpConfig, lazily inspects them, and caches parsed configs with `source: 'config'` - Add `lazyInitConfigServer()` with timeout, stub-on-failure, and concurrent-init deduplication via `pendingConfigInits` map - Extend `getAllServerConfigs()` with optional `configServers` param for three-way merge: YAML → Config → User - Add `getServerConfig()` lookup through config cache layer - Add `invalidateConfigCache()` for clearing config-source inspection results on admin config mutations - Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on DB-stored servers in `addServer()` and `addServerStub()` * feat: wire tenant context into MCP controllers, services, and cache invalidation - Resolve config-source servers via `getAppConfig({ role, tenantId })` in `getMCPTools()` and `getMCPServersList()` controllers - Pass `ensureConfigServers()` results through `getAllServerConfigs()` for three-way merge of YAML + Config + User servers - Add tenant/role context to `getMCPSetupData()` and connection status routes via `getTenantId()` from ALS - Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin config mutations trigger re-inspection of config-source MCP servers * feat: enforce tenantMcpPolicy on admin config mcpServers mutations - Add `validateMcpServerPolicy()` helper that checks mcpServers against operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant, allowedTransports, allowedDomains) - Wire validation into `upsertConfigOverrides` and `patchConfigField` handlers — rejects with 403 when policy is violated - Infer transport type from config shape (command → stdio, url protocol → websocket/sse, type field → streamable-http) - Validate server domains against policy allowlist when configured * revert: remove tenantMcpPolicy schema and enforcement The existing admin config CRUD routes already provide the mechanism for granular MCP server prepopulation (groups, roles, users). The tenantMcpPolicy gating adds unnecessary complexity that can be revisited if needed in the future. - Remove tenantMcpPolicy from mcpSettings Zod schema - Remove validateMcpServerPolicy helper and TenantMcpPolicy interface - Remove policy enforcement from upsertConfigOverrides and patchConfigField handlers * test: update test assertions for source field and config-server wiring - Use objectContaining in MCPServersRegistry reset test to account for new source: 'yaml' field on CACHE-stored configs - Add getTenantId and ensureConfigServers mocks to MCP route tests - Add getAppConfig mock to route test Config service mock - Update getMCPSetupData assertion to expect second options argument - Update getAllServerConfigs assertions for new configServers parameter * fix: disconnect active connections when config-source servers are evicted When admin config overrides change and config-source MCP servers are removed, the invalidation now proactively disconnects active connections for evicted servers instead of leaving them lingering until timeout. - Return evicted server names from invalidateConfigCache() - Disconnect app-level connections for evicted servers in clearMcpConfigCache() via MCPManager.appConnections.disconnect() * fix: address code review findings (CRITICAL, MAJOR, MINOR) CRITICAL fixes: - Scope configCacheRepo keys by config content hash to prevent cross-tenant cache poisoning when two tenants define the same server name with different configurations - Change dbSourced checks from `source === 'user'` to `source !== 'yaml' && source !== 'config'` so undefined source (pre-upgrade cached configs) fails closed to restricted mode MAJOR fixes: - Derive OAuth servers from already-computed mcpConfig instead of calling getOAuthServers() separately — config-source OAuth servers are now properly detected - Add parseInt radix (10) and NaN guard with fallback to 30_000 for CONFIG_SERVER_INIT_TIMEOUT_MS - Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in ServerConfigsCacheFactory to avoid SCAN-based Redis stalls - Remove `if (role \|\| tenantId)` guard in getMCPSetupData — config servers now always resolve regardless of tenant context MINOR fixes: - Extract resolveAllMcpConfigs() helper in mcp controller to eliminate 3x copy-pasted config resolution boilerplate - Distinguish "not initialized" from real errors in clearMcpConfigCache — log actual failures instead of swallowing - Remove narrative inline comments per style guide - Remove dead try/catch inside Promise.allSettled in ensureConfigServers (inner method never throws) - Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll() calls per request Test updates: - Add ensureConfigServers mock to registry test fixtures - Update getMCPSetupData assertions for inline OAuth derivation * fix: address code review findings (CRITICAL, MAJOR, MINOR) CRITICAL fixes: - Break circular dependency: move CONFIG_CACHE_NAMESPACE from MCPServersRegistry to ServerConfigsCacheFactory - Fix dbSourced fail-closed: use source field when present, fall back to legacy dbId check when absent (backward-compatible with pre-upgrade cached configs that lack source field) MAJOR fixes: - Add CONFIG_CACHE_NAMESPACE to aggregate-key set in ServerConfigsCacheFactory to avoid SCAN-based Redis stalls - Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests) covering lazy init, stub-on-failure, cross-tenant isolation via config hash keys, concurrent deduplication, merge order, and cache invalidation MINOR fixes: - Update MCPServerInspector test assertion for dbSourced change * fix: restore getServerConfig lookup for config-source servers (NEW-1) Add configNameToKey map that indexes server name → hash-based cache key for O(1) lookup by name in getServerConfig. This restores the config cache layer that was dropped when hash-based keys were introduced. Without this fix, config-source servers appeared in tool listings (via getAllServerConfigs) but getServerConfig returned undefined, breaking all connection and tool call paths. - Populate configNameToKey in ensureSingleConfigServer - Clear configNameToKey in invalidateConfigCache and reset - Clear stale read-through cache entries after lazy init - Remove dead code in invalidateConfigCache (config.title, key parsing) - Add getServerConfig tests for config-source server lookup * fix: eliminate configNameToKey race via caller-provided configServers param Replace the process-global configNameToKey map (last-writer-wins under concurrent multi-tenant load) with a configServers parameter on getServerConfig. Callers pass the pre-resolved config servers map directly — no shared mutable state, no cross-tenant race. - Add optional configServers param to getServerConfig; when provided, returns matching config directly without any global lookup - Remove configNameToKey map entirely (was the source of the race) - Extract server names from cache keys via lastIndexOf in invalidateConfigCache (safe for names containing colons) - Use mcpConfig[serverName] directly in getMCPTools instead of a redundant getServerConfig call - Add cross-tenant isolation test for getServerConfig * fix: populate read-through cache after config server lazy init After lazyInitConfigServer succeeds, write the parsed config to readThroughCache keyed by serverName so that getServerConfig calls from ConnectionsRepository, UserConnectionManager, and MCPManager.callTool find the config without needing configServers. Without this, config-source servers appeared in tool listings but every connection attempt and tool call returned undefined. * fix: user-scoped getServerConfig fallback to server-only cache key When getServerConfig is called with a userId (e.g., from callTool or UserConnectionManager), the cache key is serverName::userId. Config-source servers are cached under the server-only key (no userId). Add a fallback so user-scoped lookups find config-source servers in the read-through cache. * fix: configCacheRepo fallback, isUserSourced DRY, cross-process race CRITICAL: Add findInConfigCache fallback in getServerConfig so config-source servers remain reachable after readThroughCache TTL expires (5s). Without this, every tool call after 5s returned undefined for config-source servers. MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace all 5 inline dbSourced ternary expressions (MCPManager x2, ConnectionsRepository, UserConnectionManager, MCPServerInspector). MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when configCacheRepo.add throws (key exists from another process), fall back to reading the existing entry instead of returning undefined. MINOR: Parallelize invalidateConfigCache awaits with Promise.all. Remove redundant .catch(() => {}) inside Promise.allSettled. Tighten dedup test assertion to toBe(1). Add TTL-expiry tests for getServerConfig (with and without userId). * feat: thread configServers through getAppToolFunctions and formatInstructionsForContext Add optional configServers parameter to getAppToolFunctions, getInstructions, and formatInstructionsForContext so config-source server tools and instructions are visible to agent initialization and context injection paths. Existing callers (boot-time init, tests) pass no argument and continue to work unchanged. Agent runtime paths can now thread resolved config servers from request context. * fix: stale failure stubs retry after 5 min, upsert for cross-process races - Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried instead of permanently disabling config-source servers after transient errors (DNS outage, cold-start race) - Extract upsertConfigCache() helper that tries add then falls back to update, preventing cross-process Redis races where a second instance's successful inspection result was discarded - Add test for stale-stub retry after CONFIG_STUB_RETRY_MS * fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup - Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs were always considered stale (updatedAt ?? 0 → epoch → always expired) - Add null guard for rawConfig in MCPManager.callTool before passing to preProcessGraphTokens — prevents unsafe `as` cast on undefined - Log double-failure in upsertConfigCache instead of silently swallowing - Replace module-scope Date.now monkey-patch with jest.useFakeTimers / jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests * fix: server-only readThrough fallback only returns truthy values Prevents a cached undefined from a prior no-userId lookup from short-circuiting the DB query on a subsequent userId-scoped lookup. * fix: remove findInConfigCache to eliminate cross-tenant config leakage The findInConfigCache prefix scan (serverName:) could return any tenant's config after readThrough TTL expires, violating tenant isolation. Config-source servers are now ONLY resolvable through: 1. The configServers param (callers with tenant context from ALS) 2. The readThrough cache (populated by ensureSingleConfigServer, 5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs) Connection/tool-call paths without tenant context rely exclusively on the readThrough cache. If it expires before the next HTTP request repopulates it, the server is not found — which is correct because there is no tenant context to determine which config to return. - Remove findInConfigCache method and its call in getServerConfig - Update server-only readThrough fallback to only return truthy values (prevents cached undefined from short-circuiting user-scoped DB lookup) - Update tests to document tenant isolation behavior after cache expiry style: fix import order per AGENTS.md conventions Sort package imports shortest-to-longest, local imports longest-to-shortest across MCPServersRegistry, ConnectionsRepository, MCPManager, UserConnectionManager, and MCPServerInspector. * fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures Thread pre-resolved serverConfig from tool creation context into callTool, removing dependency on the readThrough cache for config-source servers. This fixes two issues: - Cross-tenant contamination: the readThrough cache key was unscoped (just serverName), so concurrent multi-tenant requests for same-named servers would overwrite each other's entries - TTL expiry: tool calls happening >5s after config resolution would fail with "Configuration not found" because the readThrough entry had expired Changes: - Add optional serverConfig param to MCPManager.callTool — uses provided config directly, falling back to getServerConfig lookup for YAML/user servers - Thread serverConfig from createMCPTool through createToolInstance closure to callTool - Remove readThrough write from ensureSingleConfigServer — config-source servers are only accessible via configServers param (tenant-scoped) - Remove server-only readThrough fallback from getServerConfig - Increase config cache hash from 8 to 16 hex chars (64-bit) - Add isUserSourced boundary tests for all source/dbId combinations - Fix double Object.keys call in getMCPTools controller - Update test assertions for new getServerConfig behavior * fix: cache base configs for config-server users; narrow upsertConfigCache error handling - Refactor getAllServerConfigs to separate base config fetch (YAML + DB) from config-server layering. Base configs are cached via readThroughCacheAll regardless of whether configServers is provided, eliminating uncached MongoDB queries per request for config-server users - Narrow upsertConfigCache catch to duplicate-key errors only; infrastructure errors (Redis timeouts, network failures) now propagate instead of being silently swallowed, preventing inspection storms during outages * fix: restore correct merge order and document upsert error matching - Restore YAML → Config → User DB precedence in getAllServerConfigs (user DB servers have highest precedence, matching the JSDoc contract) - Add source comment on upsertConfigCache duplicate-key detection linking to the two cache implementations that define the error message * feat: complete config-source server support across all execution paths Wire configServers through the entire agent execution pipeline so config-source MCP servers are fully functional — not just visible in listings but executable in agent sessions. - Thread configServers into handleTools.js agent tool pipeline: resolve config servers from tenant context before MCP tool iteration, pass to getServerConfig, createMCPTools, and createMCPTool - Thread configServers into agent instructions pipeline: applyContextToAgent → getMCPInstructionsForServers → formatInstructionsForContext, resolved in client.js before agent context application - Add configServers param to createMCPTool and createMCPTools for reconnect path fallback - Add source field to redactServerSecrets allowlist for client UI differentiation of server tiers - Narrow invalidateConfigCache to only clear readThroughCacheAll (merged results), preserving YAML individual-server readThrough entries - Update context.spec.ts assertions for new configServers parameter * fix: add missing mocks for config-source server dependencies in client.test.js Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added to client.js but not reflected in the test file's jest.mock declarations. * fix: update formatInstructionsForContext assertions for configServers param The test assertions expected formatInstructionsForContext to be called with only the server names array, but it now receives configServers as a second argument after the config-source server feature wiring. * fix: move configServers resolution before MCP tool loop to avoid TDZ configServers was declared with `let` after the first tool loop but referenced inside it via getServerConfig(), causing a ReferenceError temporal dead zone. Move declaration and resolution before the loop, using tools.some(mcpToolPattern) to gate the async resolution. * fix: address review findings — cache bypass, discoverServerTools gap, DRY - #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via readThroughCacheAll) instead of bypassing it when configServers is present. Extracts user-DB entries from cached base by diffing against YAML keys to maintain YAML → Config → User DB merge order without extra MongoDB calls. - #3: Add configServers param to ToolDiscoveryOptions and thread it through discoverServerTools → getServerConfig so config-source servers are discoverable during OAuth reconnection flows. - #6: Replace inline import() type annotations in context.ts with proper import type { ParsedServerConfig } per AGENTS.md conventions. - #7: Extract resolveConfigServers(req) helper in MCP.js and use it from handleTools.js and client.js, eliminating the duplicated 6-line config resolution pattern. - #10: Restore removed "why" comment explaining getLoaded() vs getAll() choice in getMCPSetupData — documents non-obvious correctness constraint. - #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs. * fix: consolidate imports, reorder constants, fix YAML-DB merge edge case - Merge duplicate @librechat/data-schemas requires in MCP.js into one - Move resolveConfigServers after module-level constants - Fix getAllServerConfigs edge case where user-DB entry overriding a YAML entry with the same name was excluded from userDbConfigs; now uses reference equality check to detect DB-overwritten YAML keys * fix: replace fragile string-match error detection with proper upsert method Add upsert() to IServerConfigsRepositoryInterface and all implementations (InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle error message string match ('already exists in cache') in upsertConfigCache that was the only thing preventing cross-process init races from silently discarding inspection results. Each implementation handles add-or-update atomically: - InMemory: direct Map.set() - Redis: direct cache.set() - RedisAggregateKey: read-modify-write under write lock - DB: delegates to update() (DB servers use explicit add() with ACL setup) * fix: wire configServers through remaining HTTP endpoints - getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig - reinitialize route: resolve configServers before getServerConfig - auth-values route: resolve configServers before getServerConfig - getOAuthHeaders: accept configServers param, thread from callers - Update mcp.spec.js tests to mock getAllServerConfigs for GET by name * fix: thread serverConfig through getConnection for config-source servers Config-source servers exist only in configCacheRepo, not in YAML cache or DB. When callTool → getConnection → getUserConnection → getServerConfig runs without configServers, it returns undefined and throws. Fix by threading the pre-resolved serverConfig (providedConfig) from callTool through getConnection → getUserConnection → createUserConnectionInternal, using it as a fallback before the registry lookup. * fix: thread configServers through reinit, reconnect, and tool definition paths Wire configServers through every remaining call chain that creates or reconnects MCP server connections: - reinitMCPServer: accepts serverConfig and configServers, uses them for getServerConfig fallback, getConnection, and discoverServerTools - reconnectServer: accepts and passes configServers to reinitMCPServer - createMCPTools/createMCPTool: pass configServers to reconnectServer - ToolService.loadToolDefinitionsWrapper: resolves configServers from req, passes to both reinitMCPServer call sites - reinitialize route: passes serverConfig and configServers to reinitMCPServer * fix: address review findings — simplify merge, harden error paths, fix log labels - Simplify getAllServerConfigs merge: replace fragile reference-equality loop with direct spread { ...yamlConfigs, ...configServers, ...base } - Guard upsertConfigCache in lazyInitConfigServer catch block so cache failures don't mask the original inspection error - Deduplicate getYamlServerNames cold-start with promise dedup pattern - Remove dead `if (!mcpConfig)` guard in getMCPSetupData - Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error messages — now uses this.namespace for correct Config/App labeling - Remove misleading OAuth callback comment about readThrough cache - Move resolveConfigServers after module-level constants in MCP.js * fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label - Clear yamlServerNamesPromise on rejection so transient cache errors don't permanently prevent ensureConfigServers from working - Skip reinspectServer for config-source servers (source: 'config') in reinitMCPServer — they lack a CACHE/DB storage location; retry is handled by CONFIG_STUB_RETRY_MS in ensureConfigServers - Use source field instead of dbId for storageLocation derivation - Fix remaining hardcoded "App" in reset() leaderCheck message * fix: persist oauthHeaders in flow state for config-source OAuth servers The OAuth callback route has no JWT auth context and cannot resolve config-source server configs. Previously, getOAuthHeaders would silently return {} for config-source servers, dropping custom token exchange headers. Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow initiation (which has auth context), and the callback reads them from the stored flow state with a fallback to the registry lookup for YAML/user-DB servers. * fix: update tests for getMCPSetupData null guard removal and ToolService mock - MCP.spec.js: update test to expect graceful handling of null mcpConfig instead of a throw (getAllServerConfigs always returns an object) - MCP.js: add defensive \|\| {} for Object.entries(mcpConfig) in case of null from test mocks - ToolService.spec.js: add missing mock for ~/server/services/MCP (resolveConfigServers) * fix: address review findings — DRY, naming, logging, dead code, defensive guards - #1: Simplify getAllServerConfigs to single getBaseServerConfigs call, eliminating redundant double-fetch of cacheConfigsRepo.getAll() - #2: Add warning log when oauthHeaders absent from OAuth callback flow state - #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller imports shared helper instead of reimplementing - #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider in createToolInstance — these are actively used, not unused - #5: Log rejected results from ensureConfigServers Promise.allSettled so cache errors are visible instead of silently dropped - #6: Remove dead 'MCP config not found' error handlers from routes - #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache - #8: Remove logger.error from withTimeout to prevent double-logging timeouts - #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message - #12: Use spread instead of mutation in addServer for immutability consistency - Add upsert mock to ensureConfigServers.test.ts DB mock - Update route tests for resolveAllMcpConfigs import change * fix: restore correct merge priority, use immutable spread, fix test mock - getAllServerConfigs: { ...configServers, ...base } so userDB wins over configServers, matching documented "User DB (highest)" priority - lazyInitConfigServer: use immutable spread instead of direct mutation for parsedConfig.source, consistent with addServer fix - Fix test to mock getAllServerConfigs as {} instead of null, remove unnecessary \|\| {} defensive guard in getMCPSetupData * fix: error handling, stable hashing, flatten nesting, remove dead param - Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with graceful {} fallback so transient DB/cache errors don't crash tool pipeline - Sort keys in configCacheKey JSON.stringify for deterministic hashing regardless of object property insertion order - Flatten clearMcpConfigCache from 3 nested try-catch to early returns; document that user connections are cleaned up lazily (accepted tradeoff) - Remove dead configServers param from getAppToolFunctions (never passed) - Add security rationale comment for source field in redactServerSecrets * fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision The array replacer in JSON.stringify acts as a property allowlist at every nesting depth, silently dropping nested keys like headers['X-API-Key'], oauth.client_secret, etc. Two configs with different nested values but identical top-level structure produced the same hash, causing cross-tenant cache hits and potential credential contamination. Switch to a function replacer that recursively sorts keys at all depths without dropping any properties. Also document the known gap in getOAuthServers: config-source OAuth servers are not covered by auto-reconnection or uninstall cleanup because callers lack request context. * fix: move clearMcpConfigCache to packages/api to eliminate circular dependency The function only depends on MCPServersRegistry and MCPManager, both of which live in packages/api. Import it directly from @librechat/api in the CJS layer instead of using dynamic require('~/config'). * chore: imports/fields ordering * fix: address review findings — error handling, targeted lookup, test gaps - Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so getAppConfig/getAllServerConfigs failures propagate instead of masking infrastructure errors as empty server lists. - Use targeted getServerConfig in getMCPServerById instead of fetching all server configs for a single-server lookup. - Forward configServers to inner createMCPTool calls so reconnect path works for config-source servers. - Update getAllServerConfigs JSDoc to document disjoint-key design. - Add OAuth callback oauthHeaders fallback tests (flow state present vs registry fallback). - Add resolveConfigServers/resolveAllMcpConfigs unit tests covering happy path and error propagation. * fix: add getOAuthReconnectionManager mock to OAuth callback tests * chore: imports ordering	2026-03-28 10:36:43 -04:00
Danny Avila	5e3b7bcde3	🌊 refactor: Local Snapshot for Aggregate Key Cache to Avoid Redundant Redis GETs (#12422 ) * perf: Add local snapshot to aggregate key cache to avoid redundant Redis GETs getAll() was being called 20+ times per chat request (once per tool, per server config lookup, per connection check). Each call hit Redis even though the data doesn't change within a request cycle. Add an in-memory snapshot with 5s TTL that collapses all reads within the window into a single Redis GET. Writes (add/update/remove/reset) invalidate the snapshot immediately so mutations are never stale. Also removes the debug logger that was producing noisy per-call logs. * fix: Prevent snapshot mutation and guarantee cleanup on write failure - Never mutate the snapshot object in-place during writes. Build a new object (spread) so concurrent readers never observe uncommitted state. - Move invalidateLocalSnapshot() into withWriteLock's finally block so cleanup is guaranteed even when successCheck throws on Redis failure. - After successful writes, populate the snapshot with the committed state to avoid an unnecessary Redis GET on the next read. - Use Date.now() after the await in getAll() so the TTL window isn't shortened by Redis latency. - Strengthen tests: spy on underlying Keyv cache to verify N getAll() calls collapse into 1 Redis GET, verify snapshot reference immutability. * fix: Remove dead populateLocalSnapshot calls from write callbacks populateLocalSnapshot was called inside withWriteLock callbacks, but the finally block in withWriteLock always calls invalidateLocalSnapshot immediately after — undoing the populate on every execution path. Remove the dead method and its three call sites. The snapshot is correctly cleared by finally on both success and failure paths. The next getAll() after a write hits Redis once to fetch the committed state, which is acceptable since writes only occur during init and rare manual reinspection. * fix: Derive local snapshot TTL from MCP_REGISTRY_CACHE_TTL config Use cacheConfig.MCP_REGISTRY_CACHE_TTL (default 5000ms) instead of a hardcoded 5s constant. When TTL is 0 (operator explicitly wants no caching), the snapshot is disabled entirely — every getAll() hits Redis. * fix: Add TTL expiry test, document 2×TTL staleness, clarify comments - Add missing test for snapshot TTL expiry path (force-expire via localSnapshotExpiry mutation, verify Redis is hit again) - Document 2×TTL max cross-instance staleness in localSnapshot JSDoc - Document reset() intentionally bypasses withWriteLock - Add inline comments explaining why early invalidateLocalSnapshot() in write callbacks is distinct from the finally-block cleanup - Update cacheConfig.MCP_REGISTRY_CACHE_TTL JSDoc to reflect both use sites and the staleness implication - Rename misleading test name for snapshot reference immutability - Add epoch sentinel comment on localSnapshotExpiry initialization	2026-03-26 16:39:09 -04:00
Danny Avila	8e2721011e	🔑 fix: Robust MCP OAuth Detection in Tool-Call Flow (#12418 ) * fix(api): add buildOAuthToolCallName utility for MCP OAuth flows Extract a shared utility that builds the synthetic tool-call name used during MCP OAuth flows (oauth_mcp_{normalizedServerName}). Uses startsWith on the raw serverName (not the normalized form) to guard against double-wrapping, so names that merely normalize to start with oauth_mcp_ (e.g., oauth@mcp@server) are correctly prefixed while genuinely pre-wrapped names are left as-is. Add 8 unit tests covering normal names, pre-wrapped names, _mcp_ substrings, special characters, non-ASCII, and empty string inputs. * fix(backend): use buildOAuthToolCallName in MCP OAuth flows Replace inline tool-call name construction in both reconnectServer (MCP.js) and createOAuthEmitter (ToolService.js) with the shared buildOAuthToolCallName utility. Remove unused normalizeServerName import from ToolService.js. Fix import ordering in both files. This ensures the oauth_mcp_ prefix is consistently applied so the client correctly identifies MCP OAuth flows and binds the CSRF cookie to the right server. * fix(client): robust MCP OAuth detection and split handling in ToolCall - Fix split() destructuring to preserve tail segments for server names containing _mcp_ (e.g., foo_mcp_bar no longer truncated to foo). - Add auth URL redirect_uri fallback: when the tool-call name lacks the _mcp_ delimiter, parse redirect_uri for the MCP callback path. Set function_name to the extracted server name so progress text shows the server, not the raw tool-call ID. - Display server name instead of literal "oauth" as function_name, gated on auth presence to avoid misidentifying real tools named "oauth". - Consolidate three independent new URL(auth) parses into a single parsedAuthUrl useMemo shared across detection, actionId, and authDomain hooks. - Replace any type on ProgressText test mock with structural type. - Add 8 tests covering delimiter detection, multi-segment names, function_name display, redirect_uri fallback, normalized _mcp_ server names, and non-MCP action auth exclusion. * chore: fix import order in utils.test.ts * fix(client): drop auth gate on OAuth displayName so completed flows show server name The createOAuthEnd handler re-emits the toolCall delta without auth, so auth is cleared on the client after OAuth completes. Gating displayName on `func === 'oauth' && auth` caused completed OAuth steps to render "Completed oauth" instead of "Completed my-server". Remove the `&& auth` gate — within the MCP delimiter branch the func="oauth" check alone is sufficient. Also remove `auth` from the useMemo dep array since only `parsedAuthUrl` is referenced. Update the test to assert correct post-completion display.	2026-03-26 14:45:13 -04:00
Danny Avila	359cc63b41	⚡ refactor: Use in-memory cache for App MCP configs to avoid Redis SCAN (#12410 ) * ⚡ perf: Use in-memory cache for App MCP configs to avoid Redis SCAN The 'App' namespace holds static YAML-loaded configs identical on every instance. Storing them in Redis and retrieving via SCAN + batch-GET caused 60s+ stalls under concurrent load (#11624). Since these configs are already loaded into memory at startup, bypass Redis entirely by always returning ServerConfigsCacheInMemory for the 'App' namespace. * ♻️ refactor: Extract APP_CACHE_NAMESPACE constant and harden tests - Extract magic string 'App' to a shared `APP_CACHE_NAMESPACE` constant used by both ServerConfigsCacheFactory and MCPServersRegistry - Document that `leaderOnly` is ignored for the App namespace - Reset `cacheConfig.USE_REDIS` in test `beforeEach` to prevent ordering-dependent flakiness - Fix import order in test file (longest to shortest) * 🐛 fix: Populate App cache on follower instances in cluster mode In cluster deployments, only the leader runs MCPServersInitializer to inspect and cache MCP server configs. Followers previously read these from Redis, but with the App namespace now using in-memory storage, followers would have an empty cache. Add populateLocalCache() so follower processes independently initialize their own in-memory App cache from the same YAML configs after the leader signals completion. The method is idempotent — if the cache is already populated (leader case), it's a no-op. * 🐛 fix: Use static flag for populateLocalCache idempotency Replace getAllServerConfigs() idempotency check with a static localCachePopulated flag. The previous check merged App + DB caches, causing false early returns in deployments with publicly shared DB configs, and poisoned the TTL read-through cache with stale results. The static flag is zero-cost (no async/Redis/DB calls), immune to DB config interference, and is reset alongside hasInitializedThisProcess in resetProcessFlag() for test teardown. Also set localCachePopulated=true after leader initialization completes, so subsequent calls on the leader don't redundantly re-run populateLocalCache. * 📝 docs: Document process-local reset() semantics for App cache With the App namespace using in-memory storage, reset() only clears the calling process's cache. Add JSDoc noting this behavioral change so callers in cluster deployments know each instance must reset independently. * ✅ test: Add follower cache population tests for MCPServersInitializer Cover the populateLocalCache code path: - Follower populates its own App cache after leader signals completion - localCachePopulated flag prevents redundant re-initialization - Fresh follower process independently initializes all servers * 🧹 style: Fix import order to longest-to-shortest convention * 🔬 test: Add Redis perf benchmark to isolate getAll() bottleneck Benchmarks that run against a live Redis instance to measure: 1. SCAN vs batched GET phases independently 2. SCAN cost scaling with total keyspace size (noise keys) 3. Concurrent getAll() at various concurrency levels (1/10/50/100) 4. Alternative: single aggregate key vs SCAN+GET 5. Alternative: raw MGET vs Keyv batch GET (serialization overhead) Run with: npx jest --config packages/api/jest.config.mjs \ --testPathPatterns="perf_benchmark" --coverage=false * ⚡ feat: Add aggregate-key Redis cache for MCP App configs ServerConfigsCacheRedisAggregateKey stores all configs under a single Redis key, making getAll() a single GET instead of SCAN + N GETs. This eliminates the O(keyspace_size) SCAN that caused 60s+ stalls in large deployments while preserving cross-instance visibility — all instances read/write the same Redis key, so reinspection results propagate automatically after readThroughCache TTL expiry. * ♻️ refactor: Use aggregate-key cache for App namespace in factory Update ServerConfigsCacheFactory to return ServerConfigsCacheRedisAggregateKey for the App namespace when Redis is enabled, instead of ServerConfigsCacheInMemory. This preserves cross-instance visibility (reinspection results propagate through Redis) while eliminating SCAN. Non-App namespaces still use the standard per-key ServerConfigsCacheRedis. * 🗑️ revert: Remove populateLocalCache — no longer needed with aggregate key With App configs stored under a single Redis key (aggregate approach), followers read from Redis like before. The populateLocalCache mechanism and its localCachePopulated flag are no longer necessary. Also reverts the process-local reset() JSDoc since reset() is now cluster-wide again via Redis. * 🐛 fix: Add write mutex to aggregate cache and exclude perf benchmark from CI - Add promise-based write lock to ServerConfigsCacheRedisAggregateKey to prevent concurrent read-modify-write races during parallel initialization (Promise.allSettled runs multiple addServer calls concurrently, causing last-write-wins data loss on the aggregate key) - Rename perf benchmark to cache_integration pattern so CI skips it (requires live Redis) * 🔧 fix: Rename perf benchmark to .manual.spec.ts to exclude from all CI The cache_integration pattern is picked up by test:cache-integration:mcp in CI. Rename to .manual.spec.ts which isn't matched by any CI runner. * ✅ test: Add cache integration tests for ServerConfigsCacheRedisAggregateKey Tests against a live Redis instance covering: - CRUD operations (add, get, update, remove) - getAll with empty/populated cache - Duplicate add rejection, missing update/remove errors - Concurrent write safety (20 parallel adds without data loss) - Concurrent read safety (50 parallel getAll calls) - Reset clears all configs * 🔧 fix: Rename perf benchmark to .manual.spec.ts to exclude from all CI The perf benchmark file was renamed to .manual.spec.ts but no testPathIgnorePatterns existed for that convention. Add .manual\.spec\. to both test and test:ci scripts, plus jest.config.mjs, so manual-only tests never run in CI unit test jobs. fix: Address review findings for aggregate key cache - Add successCheck() to all write paths (add/update/remove) so Redis SET failures throw instead of being silently swallowed - Override reset() to use targeted cache.delete(AGGREGATE_KEY) instead of inherited SCAN-based cache.clear() — consistent with eliminating SCAN operations - Document cross-instance write race invariant in class JSDoc: the promise-based writeLock is process-local only; callers must enforce single-writer semantics externally (leader-only init) - Use definite-assignment assertion (let resolve!:) instead of non-null assertion at call site - Fix import type convention in integration test - Verify Promise.allSettled rejections explicitly in concurrent write test - Fix broken run command in benchmark file header * style: Fix import ordering per AGENTS.md convention Local/project imports sorted longest to shortest. * chore: Update import ordering and clean up unused imports in MCPServersRegistry.ts * chore: import order * chore: import order	2026-03-26 14:44:31 -04:00
Danny Avila	221e49222d	⚡ refactor: Fast-Fail MCP Tool Discovery on 401 for Non-OAuth Servers (#12395 ) * fix: fast-fail MCP discovery for non-OAuth servers on auth errors Always attach oauthHandler in discoverToolsInternal regardless of useOAuth flag. Previously, non-OAuth servers hitting 401 would hang for 30s because connectClient's oauthHandledPromise had no listener to emit oauthFailed, waiting until withTimeout killed it. * chore: import order	2026-03-25 13:18:02 -04:00
Danny Avila	7829fa9eca	🪄 refactor: Simplify MCP Tool Content Formatting to Unified String Output (#12352 ) * refactor: Simplify content formatting in MCP service and parser - Consolidated content handling in `formatToolContent` to return a plain-text string instead of an array for all providers, enhancing clarity and consistency. - Removed unnecessary checks for content array providers, streamlining the logic for handling text and image artifacts. - Updated related tests to reflect changes in expected output format, ensuring comprehensive coverage for the new implementation. * fix: Return empty string for image-only tool responses instead of '(No response)' When artifacts exist (images/UI resources) but no text content is present, return an empty string rather than the misleading '(No response)' fallback. Adds missing test assertions for image-only content and standardizes length checks to explicit `> 0` comparisons.	2026-03-21 14:28:56 -04:00
Danny Avila	8ba2bde5c1	📦 refactor: Consolidate DB models, encapsulating Mongoose usage in `data-schemas` (#11830 ) * chore: move database model methods to /packages/data-schemas * chore: add TypeScript ESLint rule to warn on unused variables * refactor: model imports to streamline access - Consolidated model imports across various files to improve code organization and reduce redundancy. - Updated imports for models such as Assistant, Message, Conversation, and others to a unified import path. - Adjusted middleware and service files to reflect the new import structure, ensuring functionality remains intact. - Enhanced test files to align with the new import paths, maintaining test coverage and integrity. * chore: migrate database models to packages/data-schemas and refactor all direct Mongoose Model usage outside of data-schemas * test: update agent model mocks in unit tests - Added `getAgent` mock to `client.test.js` to enhance test coverage for agent-related functionality. - Removed redundant `getAgent` and `getAgents` mocks from `openai.spec.js` and `responses.unit.spec.js` to streamline test setup and reduce duplication. - Ensured consistency in agent mock implementations across test files. * fix: update types in data-schemas * refactor: enhance type definitions in transaction and spending methods - Updated type definitions in `checkBalance.ts` to use specific request and response types. - Refined `spendTokens.ts` to utilize a new `SpendTxData` interface for better clarity and type safety. - Improved transaction handling in `transaction.ts` by introducing `TransactionResult` and `TxData` interfaces, ensuring consistent data structures across methods. - Adjusted unit tests in `transaction.spec.ts` to accommodate new type definitions and enhance robustness. * refactor: streamline model imports and enhance code organization - Consolidated model imports across various controllers and services to a unified import path, improving code clarity and reducing redundancy. - Updated multiple files to reflect the new import structure, ensuring all functionalities remain intact. - Enhanced overall code organization by removing duplicate import statements and optimizing the usage of model methods. * feat: implement loadAddedAgent and refactor agent loading logic - Introduced `loadAddedAgent` function to handle loading agents from added conversations, supporting multi-convo parallel execution. - Created a new `load.ts` file to encapsulate agent loading functionalities, including `loadEphemeralAgent` and `loadAgent`. - Updated the `index.ts` file to export the new `load` module instead of the deprecated `loadAgent`. - Enhanced type definitions and improved error handling in the agent loading process. - Adjusted unit tests to reflect changes in the agent loading structure and ensure comprehensive coverage. * refactor: enhance balance handling with new update interface - Introduced `IBalanceUpdate` interface to streamline balance update operations across the codebase. - Updated `upsertBalanceFields` method signatures in `balance.ts`, `transaction.ts`, and related tests to utilize the new interface for improved type safety. - Adjusted type imports in `balance.spec.ts` to include `IBalanceUpdate`, ensuring consistency in balance management functionalities. - Enhanced overall code clarity and maintainability by refining type definitions related to balance operations. * feat: add unit tests for loadAgent functionality and enhance agent loading logic - Introduced comprehensive unit tests for the `loadAgent` function, covering various scenarios including null and empty agent IDs, loading of ephemeral agents, and permission checks. - Enhanced the `initializeClient` function by moving `getConvoFiles` to the correct position in the database method exports, ensuring proper functionality. - Improved test coverage for agent loading, including handling of non-existent agents and user permissions. * chore: reorder memory method exports for consistency - Moved `deleteAllUserMemories` to the correct position in the exported memory methods, ensuring a consistent and logical order of method exports in `memory.ts`.	2026-03-21 14:28:53 -04:00
crossagent	290984c514	🔑 fix: Type-Safe User Context Forwarding for Non-OAuth Tool Discovery (#12348 ) Some checks failed Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Has been cancelled Details Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Has been cancelled Details Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Has been cancelled Details Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Has been cancelled Details * fix(mcp): pass missing customUserVars and user during unauthenticated tool discovery * fix(mcp): type-safe user context forwarding for non-OAuth tool discovery Extract UserConnectionContext from OAuthConnectionOptions to properly model the non-OAuth case where user/customUserVars/requestBody need placeholder resolution without requiring OAuth-specific fields. - Remove prohibited `as unknown as` double-cast - Forward requestBody and connectionTimeout (previously omitted) - Add unit tests for argument forwarding at Manager and Factory layers - Add integration test exercising real processMCPEnv substitution --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-21 12:46:23 -04:00
Danny Avila	c68066a636	🪝 fix: MCP Refresh token on OAuth Discovery Failure (#12266 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * 🔒 fix: Prevent token leaks to MCP server on OAuth discovery failure When OAuth metadata discovery fails, refresh logic was falling back to POSTing refresh tokens to /token on the MCP resource server URL instead of the authorization server. A malicious MCP server could exploit this by blocking .well-known discovery to harvest refresh tokens. Changes: - Replace unsafe /token fallback with hard error in both refresh paths - Thread stored token_endpoint (SSRF-validated during initial flow) through the refresh chain so legacy servers without .well-known still work after the first successful auth - Fix revokeOAuthToken to always SSRF-validate the revocation URL, including the /revoke fallback path - Redact refresh token and credentials from debug-level log output - Split branch 2 compound condition for consistent error messages * ✅ test: Add stored endpoint fallback tests and improve refresh coverage - Add storedTokenEndpoint fallback tests for both refresh branches - Add missing test for branch 2 metadata-without-token_endpoint case - Rename misleading test name to match actual mock behavior - Split auto-discovered throw test into undefined vs missing-endpoint - Remove redundant afterEach mockFetch.mockClear() calls (already covered by jest.clearAllMocks() in beforeEach)	2026-03-16 09:31:01 -04:00
Danny Avila	acd07e8085	🗝️ fix: Exempt Admin-Trusted Domains from MCP OAuth Validation (#12255 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * fix: exempt allowedDomains from MCP OAuth SSRF checks (#12254) The SSRF guard in validateOAuthUrl was context-blind — it blocked private/internal OAuth endpoints even for admin-trusted MCP servers listed in mcpSettings.allowedDomains. Add isHostnameAllowed() to domain.ts and skip SSRF checks in validateOAuthUrl when the OAuth endpoint hostname matches an allowed domain. * refactor: thread allowedDomains through MCP connection stack Pass allowedDomains from MCPServersRegistry through BasicConnectionOptions, MCPConnectionFactory, and into MCPOAuthHandler method calls so the OAuth layer can exempt admin-trusted domains from SSRF validation. * test: add allowedDomains bypass tests and fix registry mocks Add isHostnameAllowed unit tests (exact, wildcard, case-insensitive, private IPs). Add MCPOAuthSecurity tests covering the allowedDomains bypass for initiateOAuthFlow, refreshOAuthTokens, and revokeOAuthToken. Update registry mocks to include getAllowedDomains. * fix: enforce protocol/port constraints in OAuth allowedDomains bypass Replace isHostnameAllowed (hostname-only check) with isOAuthUrlAllowed which parses the full OAuth URL and matches against allowedDomains entries including protocol and explicit port constraints — mirroring isDomainAllowedCore's allowlist logic. Prevents a port-scoped entry like 'https://auth.internal:8443' from also exempting other ports. * test: cover auto-discovery and branch-3 refresh paths with allowedDomains Add three new integration tests using a real OAuth test server: - auto-discovered OAuth endpoints allowed when server IP is in allowedDomains - auto-discovered endpoints rejected when allowedDomains doesn't match - refreshOAuthTokens branch 3 (no clientInfo/config) with allowedDomains bypass Also rename describe block from ephemeral issue number to durable name. * docs: explain intentional absence of allowedDomains in completeOAuthFlow Prevents future contributors from assuming a missing parameter during security audits — URLs are pre-validated during initiateOAuthFlow. * test: update initiateOAuthFlow assertion for allowedDomains parameter * perf: avoid redundant URL parse for admin-trusted OAuth endpoints Move isOAuthUrlAllowed check before the hostname extraction so admin-trusted URLs short-circuit with a single URL parse instead of two. The hostname extraction (new URL) is now deferred to the SSRF-check path where it's actually needed.	2026-03-15 23:03:12 -04:00
Danny Avila	8318446704	💁 refactor: Better Config UX for MCP STDIO with `customUserVars` (#12226 ) * refactor: Better UX for MCP stdio with Custom User Variables - Updated the ConnectionsRepository to prevent connections when customUserVars are defined, improving security and access control. - Modified the MCPServerInspector to skip capabilities fetch when customUserVars are present, streamlining server inspection. - Added tests to validate connection restrictions with customUserVars, ensuring robust handling of various server configurations. This change enhances the overall integrity of the connection management process by enforcing stricter rules around custom user variables. * fix: guard against empty customUserVars and add JSDoc context - Extract `hasCustomUserVars()` helper to guard against truthy `{}` (Zod's `.record().optional()` yields `{}` on empty input, not `undefined`) - Add JSDoc to `isAllowedToConnectToServer` explaining why customUserVars servers are excluded from app-level connections * test: improve customUserVars test coverage and fixture hygiene - Add no-connection-provided test for MCPServerInspector (production path) - Fix test descriptions to match actual fixture values - Replace real package name with fictional @test/mcp-stdio-server	2026-03-14 21:22:25 -04:00
Danny Avila	c6982dc180	🛡️ fix: Agent Permission Check on Image Upload Route (#12219 ) * fix: add agent permission check to image upload route * refactor: remove unused SystemRoles import and format test file for clarity * fix: address review findings for image upload agent permission check * refactor: move agent upload auth logic to TypeScript in packages/api Extract pure authorization logic from agentPermCheck.js into checkAgentUploadAuth() in packages/api/src/files/agentUploadAuth.ts. The function returns a structured result ({ allowed, status, error }) instead of writing HTTP responses directly, eliminating the dual responsibility and confusing sentinel return value. The JS wrapper in /api is now a thin adapter that translates the result to HTTP. * test: rewrite image upload permission tests as integration tests Replace mock-heavy images-agent-perm.spec.js with integration tests using MongoMemoryServer, real models, and real PermissionService. Follows the established pattern in files.agents.test.js. Moves test to sibling location (images.agents.test.js) matching backend convention. Adds temp file cleanup assertions on 403/404 responses and covers message_file exemption paths (boolean true, string "true", false). * fix: widen AgentUploadAuthDeps types to accept ObjectId from Mongoose The injected getAgent returns Mongoose documents where _id and author are Types.ObjectId at runtime, not string. Widen the DI interface to accept string \| Types.ObjectId for _id, author, and resourceId so the contract accurately reflects real callers. * chore: move agent upload auth into files/agents/ subdirectory * refactor: delete agentPermCheck.js wrapper, move verifyAgentUploadPermission to packages/api The /api-only dependencies (getAgent, checkPermission) are now passed as object-field params from the route call sites. Both images.js and files.js import verifyAgentUploadPermission from @librechat/api and inject the deps directly, eliminating the intermediate JS wrapper. * style: fix import type ordering in agent upload auth * fix: prevent token TTL race in MCPTokenStorage.storeTokens When expires_in is provided, use it directly instead of round-tripping through Date arithmetic. The previous code computed accessTokenExpiry as a Date, then after an async encryptV2 call, recomputed expiresIn by subtracting Date.now(). On loaded CI runners the elapsed time caused Math.floor to truncate to 0, triggering the 1-year fallback and making the token appear permanently valid — so refresh never fired.	2026-03-14 02:57:56 -04:00
Danny Avila	fa9e1b228a	🪪 fix: MCP API Responses and OAuth Validation (#12217 ) * 🔒 fix: Validate MCP Configs in Server Responses * 🔒 fix: Enhance OAuth URL Validation in MCPOAuthHandler - Introduced validation for OAuth URLs to ensure they do not target private or internal addresses, enhancing security against SSRF attacks. - Updated the OAuth flow to validate both authorization and token URLs before use, ensuring compliance with security standards. - Refactored redirect URI handling to streamline the OAuth client registration process. - Added comprehensive error handling for invalid URLs, improving robustness in OAuth interactions. * 🔒 feat: Implement Permission Checks for MCP Server Management - Added permission checkers for MCP server usage and creation, enhancing access control. - Updated routes for reinitializing MCP servers and retrieving authentication values to include these permission checks, ensuring only authorized users can access these functionalities. - Refactored existing permission logic to improve clarity and maintainability. * 🔒 fix: Enhance MCP Server Response Validation and Redaction - Updated MCP route tests to use `toMatchObject` for better validation of server response structures, ensuring consistency in expected properties. - Refactored the `redactServerSecrets` function to streamline the removal of sensitive information, ensuring that user-sourced API keys are properly redacted while retaining their source. - Improved OAuth security tests to validate rejection of private URLs across multiple endpoints, enhancing protection against SSRF vulnerabilities. - Added comprehensive tests for the `redactServerSecrets` function to ensure proper handling of various server configurations, reinforcing security measures. * chore: eslint * 🔒 fix: Enhance OAuth Server URL Validation in MCPOAuthHandler - Added validation for discovered authorization server URLs to ensure they meet security standards. - Improved logging to provide clearer insights when an authorization server is found from resource metadata. - Refactored the handling of authorization server URLs to enhance robustness against potential security vulnerabilities. * 🔒 test: Bypass SSRF validation for MCP OAuth Flow tests - Mocked SSRF validation functions to allow tests to use real local HTTP servers, facilitating more accurate testing of the MCP OAuth flow. - Updated test setup to ensure compatibility with the new mocking strategy, enhancing the reliability of the tests. * 🔒 fix: Add Validation for OAuth Metadata Endpoints in MCPOAuthHandler - Implemented checks for the presence and validity of registration and token endpoints in the OAuth metadata, enhancing security by ensuring that these URLs are properly validated before use. - Improved error handling and logging to provide better insights during the OAuth metadata processing, reinforcing the robustness of the OAuth flow. * 🔒 refactor: Simplify MCP Auth Values Endpoint Logic - Removed redundant permission checks for accessing the MCP server resource in the auth-values endpoint, streamlining the request handling process. - Consolidated error handling and response structure for improved clarity and maintainability. - Enhanced logging for better insights during the authentication value checks, reinforcing the robustness of the endpoint. * 🔒 test: Refactor LeaderElection Integration Tests for Improved Cleanup - Moved Redis key cleanup to the beforeEach hook to ensure a clean state before each test. - Enhanced afterEach logic to handle instance resignations and Redis key deletion more robustly, improving test reliability and maintainability.	2026-03-13 23:18:56 -04:00
Danny Avila	fcb344da47	🛂 fix: MCP OAuth Race Conditions, CSRF Fallback, and Token Expiry Handling (#12171 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * fix: Implement race conditions in MCP OAuth flow - Added connection mutex to coalesce concurrent `getUserConnection` calls, preventing multiple simultaneous attempts. - Enhanced flow state management to retry once when a flow state is missing, improving resilience against race conditions. - Introduced `ReauthenticationRequiredError` for better error handling when access tokens are expired or missing. - Updated tests to cover new race condition scenarios and ensure proper handling of OAuth flows. * fix: Stale PENDING flow detection and OAuth URL re-issuance PENDING flows in handleOAuthRequired now check createdAt age — flows older than 2 minutes are treated as stale and replaced instead of joined. Fixes the case where a leftover PENDING flow from a previous session blocks new OAuth initiation. authorizationUrl is now stored in MCPOAuthFlowMetadata so that when a second caller joins an active PENDING flow (e.g., the SSE-emitting path in ToolService), it can re-issue the URL to the user via oauthStart. * fix: CSRF fallback via active PENDING flow in OAuth callback When the OAuth callback arrives without CSRF or session cookies (common in the chat/SSE flow where cookies can't be set on streaming responses), fall back to validating that a PENDING flow exists for the flowId. This is safe because the flow was created server-side after JWT authentication and the authorization code is PKCE-protected. * test: Extract shared OAuth test server helpers Move MockKeyv, getFreePort, trackSockets, and createOAuthMCPServer into a shared helpers/oauthTestServer module. Enhance the test server with refresh token support, token rotation, metadata discovery, and dynamic client registration endpoints. Add InMemoryTokenStore for token storage tests. Refactor MCPOAuthRaceCondition.test.ts to import from shared helpers. * test: Add comprehensive MCP OAuth test modules MCPOAuthTokenStorage — 21 tests for storeTokens/getTokens with InMemoryTokenStore: encrypt/decrypt round-trips, expiry calculation, refresh callback wiring, ReauthenticationRequiredError paths. MCPOAuthFlow — 10 tests against real HTTP server: token refresh with stored client info, refresh token rotation, metadata discovery, dynamic client registration, full store/retrieve/expire/refresh lifecycle. MCPOAuthConnectionEvents — 5 tests for MCPConnection OAuth event cycle with real OAuth-gated MCP server: oauthRequired emission on 401, oauthHandled reconnection, oauthFailed rejection, token expiry detection. MCPOAuthTokenExpiry — 12 tests for the token expiry edge case: refresh success/failure paths, ReauthenticationRequiredError, PENDING flow CSRF fallback, authorizationUrl metadata storage, full re-auth cycle after refresh failure, concurrent expired token coalescing, stale PENDING flow detection. * test: Enhance MCP OAuth connection tests with cooldown reset Added a `beforeEach` hook to clear the cooldown for `MCPConnection` before each test, ensuring a clean state. Updated the race condition handling in the tests to properly clear the timeout, improving reliability in the event data retrieval process. * refactor: PENDING flow management and state recovery in MCP OAuth - Introduced a constant `PENDING_STALE_MS` to define the age threshold for PENDING flows, improving the handling of stale flows. - Updated the logic in `MCPConnectionFactory` and `FlowStateManager` to check the age of PENDING flows before joining or reusing them. - Modified the `completeFlow` method to return false when the flow state is deleted, ensuring graceful handling of race conditions. - Enhanced tests to validate the new behavior and ensure robustness against state recovery issues. * refactor: MCP OAuth flow management and testing - Updated the `completeFlow` method to log warnings when a tool flow state is not found during completion, improving error handling. - Introduced a new `normalizeExpiresAt` function to standardize expiration timestamp handling across the application. - Refactored token expiration checks in `MCPConnectionFactory` to utilize the new normalization function, ensuring consistent behavior. - Added a comprehensive test suite for OAuth callback CSRF fallback logic, validating the handling of PENDING flows and their staleness. - Enhanced existing tests to cover new expiration normalization logic and ensure robust flow state management. * test: Add CSRF fallback tests for active PENDING flows in MCP OAuth - Introduced new tests to validate CSRF fallback behavior when a fresh PENDING flow exists without cookies, ensuring successful OAuth callback handling. - Added scenarios to reject requests when no PENDING flow exists, when only a COMPLETED flow is present, and when a PENDING flow is stale, enhancing the robustness of flow state management. - Improved overall test coverage for OAuth callback logic, reinforcing the handling of CSRF validation failures. * chore: imports order * refactor: Update UserConnectionManager to conditionally manage pending connections - Modified the logic in `UserConnectionManager` to only set pending connections if `forceNew` is false, preventing unnecessary overwrites. - Adjusted the cleanup process to ensure pending connections are only deleted when not forced, enhancing connection management efficiency. * refactor: MCP OAuth flow state management - Introduced a new method `storeStateMapping` in `MCPOAuthHandler` to securely map the OAuth state parameter to the flow ID, improving callback resolution and security against forgery. - Updated the OAuth initiation and callback handling in `mcp.js` to utilize the new state mapping functionality, ensuring robust flow management. - Refactored `MCPConnectionFactory` to store state mappings during flow initialization, enhancing the integrity of the OAuth process. - Adjusted comments to clarify the purpose of state parameters in authorization URLs, reinforcing code readability. * refactor: MCPConnection with OAuth recovery handling - Added `oauthRecovery` flag to manage OAuth recovery state during connection attempts. - Introduced `decrementCycleCount` method to reduce the circuit breaker's cycle count upon successful reconnection after OAuth recovery. - Updated connection logic to reset the `oauthRecovery` flag after handling OAuth, improving state management and connection reliability. * chore: Add debug logging for OAuth recovery cycle count decrement - Introduced a debug log statement in the `MCPConnection` class to track the decrement of the cycle count after a successful reconnection during OAuth recovery. - This enhancement improves observability and aids in troubleshooting connection issues related to OAuth recovery. * test: Add OAuth recovery cycle management tests - Introduced new tests for the OAuth recovery cycle in `MCPConnection`, validating the decrement of cycle counts after successful reconnections. - Added scenarios to ensure that the cycle count is not decremented on OAuth failures, enhancing the robustness of connection management. - Improved test coverage for OAuth reconnect scenarios, ensuring reliable behavior under various conditions. * feat: Implement circuit breaker configuration in MCP - Added circuit breaker settings to `.env.example` for max cycles, cycle window, and cooldown duration. - Refactored `MCPConnection` to utilize the new configuration values from `mcpConfig`, enhancing circuit breaker management. - Improved code maintainability by centralizing circuit breaker parameters in the configuration file. * refactor: Update decrementCycleCount method for circuit breaker management - Changed the visibility of the `decrementCycleCount` method in `MCPConnection` from private to public static, allowing it to be called with a server name parameter. - Updated calls to `decrementCycleCount` in `MCPConnectionFactory` to use the new static method, improving clarity and consistency in circuit breaker management during connection failures and OAuth recovery. - Enhanced the handling of circuit breaker state by ensuring the method checks for the existence of the circuit breaker before decrementing the cycle count. * refactor: cycle count decrement on tool listing failure - Added a call to `MCPConnection.decrementCycleCount` in the `MCPConnectionFactory` to handle cases where unauthenticated tool listing fails, improving circuit breaker management. - This change ensures that the cycle count is decremented appropriately, maintaining the integrity of the connection recovery process. * refactor: Update circuit breaker configuration and logic - Enhanced circuit breaker settings in `.env.example` to include new parameters for failed rounds and backoff strategies. - Refactored `MCPConnection` to utilize the updated configuration values from `mcpConfig`, improving circuit breaker management. - Updated tests to reflect changes in circuit breaker logic, ensuring accurate validation of connection behavior under rapid reconnect scenarios. * feat: Implement state mapping deletion in MCP flow management - Added a new method `deleteStateMapping` in `MCPOAuthHandler` to remove orphaned state mappings when a flow is replaced, preventing old authorization URLs from resolving after a flow restart. - Updated `MCPConnectionFactory` to call `deleteStateMapping` during flow cleanup, ensuring proper management of OAuth states. - Enhanced test coverage for state mapping functionality to validate the new deletion logic.	2026-03-10 21:15:01 -04:00
Danny Avila	6167ce6e57	🧪 chore: MCP Reconnect Storm Follow-Up Fixes and Integration Tests (#12172 ) * 🧪 test: Add reconnection storm regression tests for MCPConnection Introduced a comprehensive test suite for reconnection storm scenarios, validating circuit breaker, throttling, cooldown, and timeout fixes. The tests utilize real MCP SDK transports and a StreamableHTTP server to ensure accurate behavior under rapid connect/disconnect cycles and error handling for SSE 400/405 responses. This enhances the reliability of the MCPConnection by ensuring proper handling of reconnection logic and circuit breaker functionality. * 🔧 fix: Update createUnavailableToolStub to return structured response Modified the `createUnavailableToolStub` function to return an array containing the unavailable message and a null value, enhancing the response structure. Additionally, added a debug log to skip tool creation when the result is null, improving the handling of reconnection scenarios in the MCP service. * 🧪 test: Enhance MCP tool creation tests for cache and throttle interactions Added new test cases for the `createMCPTool` function to validate the caching behavior when tools are unavailable or throttled. The tests ensure that tools are correctly cached as missing and prevent unnecessary reconnects across different users, improving the reliability of the MCP service under concurrent usage scenarios. Additionally, introduced a test for the `createMCPTools` function to verify that it returns an empty array when reconnect is throttled, ensuring proper handling of throttling logic. * 📝 docs: Update AGENTS.md with testing philosophy and guidelines Expanded the testing section in AGENTS.md to emphasize the importance of using real logic over mocks, advocating for the use of spies and real dependencies in tests. Added specific recommendations for testing with MongoDB and MCP SDK, highlighting the need to mock only uncontrollable external services. This update aims to improve testing practices and encourage more robust test implementations. * 🧪 test: Enhance reconnection storm tests with socket tracking and SSE handling Updated the reconnection storm test suite to include a new socket tracking mechanism for better resource management during tests. Improved the handling of SSE 400/405 responses by ensuring they are processed in the same branch as 404 errors, preventing unhandled cases. This enhances the reliability of the MCPConnection under rapid reconnect scenarios and ensures proper error handling. * 🔧 fix: Implement cache eviction for stale reconnect attempts and missing tools Added an `evictStale` function to manage the size of the `lastReconnectAttempts` and `missingToolCache` maps, ensuring they do not exceed a maximum cache size. This enhancement improves resource management by removing outdated entries based on a specified time-to-live (TTL), thereby optimizing the MCP service's performance during reconnection scenarios.	2026-03-10 17:44:13 -04:00
Danny Avila	c0e876a2e6	🔄 refactor: OAuth Metadata Discovery with Origin Fallback (#12170 ) * 🔄 refactor: OAuth Metadata Discovery with Origin Fallback Updated the `discoverWithOriginFallback` method to improve the handling of OAuth authorization server metadata discovery. The method now retries with the origin URL when discovery fails for a path-based URL, ensuring consistent behavior across `discoverMetadata` and token refresh flows. This change reduces code duplication and enhances the reliability of the OAuth flow by providing a unified implementation for origin fallback logic. * 🧪 test: Add tests for OAuth Token Refresh with Origin Fallback Introduced new tests for the `refreshOAuthTokens` method in `MCPOAuthHandler` to validate the retry mechanism with the origin URL when path-based discovery fails. The tests cover scenarios where the first discovery attempt throws an error and the subsequent attempt succeeds, as well as cases where the discovery fails entirely. This enhances the reliability of the OAuth token refresh process by ensuring proper handling of discovery failures. * chore: imports order * fix: Improve Base URL Logging and Metadata Discovery in MCPOAuthHandler Updated the logging to use a consistent base URL object when handling discovery failures in the MCPOAuthHandler. This change enhances error reporting by ensuring that the base URL is logged correctly, and it refines the metadata discovery process by returning the result of the discovery attempt with the base URL, improving the reliability of the OAuth flow.	2026-03-10 16:19:07 -04:00
Oreon Lothamer	eb6328c1d9	🛤️ fix: Base URL Fallback for Path-based OAuth Discovery in Token Refresh (#12164 ) * fix: add base URL fallback for path-based OAuth discovery in token refresh The two `refreshOAuthTokens` paths in `MCPOAuthHandler` were missing the origin-URL fallback that `initiateOAuthFlow` already had. With MCP SDK 1.27.1, `buildDiscoveryUrls` appends the server path to the `.well-known` URL (e.g. `/.well-known/oauth-authorization-server/mcp`), which returns 404 for servers like Sentry that only expose the root discovery endpoint (`/.well-known/oauth-authorization-server`). Without the fallback, discovery returns null during refresh, the token endpoint resolves to the wrong URL, and users are prompted to re-authenticate every time their access token expires instead of the refresh token being exchanged silently. Both refresh paths now mirror the `initiateOAuthFlow` pattern: if discovery fails and the server URL has a non-root path, retry with just the origin URL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: extract discoverWithOriginFallback helper; add tests Extract the duplicated path-based URL retry logic from both `refreshOAuthTokens` branches into a single private static helper `discoverWithOriginFallback`, reducing the risk of the two paths drifting in the future. Add three tests covering the new behaviour: - stored clientInfo path: asserts discovery is called twice (path then origin) and that the token endpoint from the origin discovery is used - auto-discovered path: same assertions for the branchless path - root URL: asserts discovery is called only once when the server URL already has no path component Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: use discoverWithOriginFallback in discoverMetadata too Remove the inline duplicate of the origin-fallback logic from `discoverMetadata` and replace it with a call to the shared `discoverWithOriginFallback` helper, giving all three discovery sites a single implementation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test: use mock.calls + .href/.toString() for URL assertions Replace brittle `toHaveBeenNthCalledWith(new URL(...))` comparisons with `expect.any(URL)` matchers and explicit `.href`/`.toString()` checks on the captured call args, consistent with the existing mock.calls pattern used throughout handler.test.ts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 15:04:35 -04:00
matt burnett	ad5c51f62b	⛈️ fix: MCP Reconnection Storm Prevention with Circuit Breaker, Backoff, and Tool Stubs (#12162 ) * fix: MCP reconnection stability - circuit breaker, throttling, and cooldown retry * Comment and logging cleanup * fix broken tests	2026-03-10 14:21:36 -04:00
Danny Avila	32cadb1cc5	🩹 fix: MCP Server Recovery from Startup Inspection Failures (#12145 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run Details Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run Details Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run Details Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions Details * feat: MCP server reinitialization recovery mechanism - Added functionality to store a stub configuration for MCP servers that fail inspection at startup, allowing for recovery via reinitialization. - Introduced `reinspectServer` method in `MCPServersRegistry` to handle reinspection of previously failed servers. - Enhanced `MCPServersInitializer` to log and manage server initialization failures, ensuring proper handling of inspection failures. - Added integration tests to verify the recovery process for unreachable MCP servers, ensuring that stub configurations are stored and can be reinitialized successfully. - Updated type definitions to include `inspectionFailed` flag in server configurations for better state management. * fix: MCP server handling for inspection failures - Updated `reinitMCPServer` to return a structured response when the server is unreachable, providing clearer feedback on the failure. - Modified `ConnectionsRepository` to prevent connections to servers marked as inspection failed, improving error handling. - Adjusted `MCPServersRegistry` methods to ensure proper management of server states, including throwing errors for non-failed servers during reinspection. - Enhanced integration tests to validate the behavior of the system when dealing with unreachable MCP servers and inspection failures, ensuring robust recovery mechanisms. * fix: Clear all cached server configurations in MCPServersRegistry - Added a comment to clarify the necessity of clearing all cached server configurations when updating a server's configuration, as the cache is keyed by userId without a reverse index for enumeration. * fix: Update integration test for file_tools_server inspection handling - Modified the test to verify that the `file_tools_server` is stored as a stub when inspection fails, ensuring it can be reinitialized correctly. - Adjusted expectations to confirm that the `inspectionFailed` flag is set to true for the stub configuration, enhancing the robustness of the recovery mechanism. * test: Add unit tests for reinspecting servers in MCPServersRegistry - Introduced tests for the `reinspectServer` method to validate error handling when called on a healthy server and when the server does not exist. - Ensured that appropriate exceptions are thrown for both scenarios, enhancing the robustness of server state management. * test: Add integration test for concurrent reinspectServer calls - Introduced a new test to validate that multiple concurrent calls to reinspectServer do not crash or corrupt the server state. - Ensured that at least one call succeeds and any failures are due to the server not being in a failed state, enhancing the reliability of the reinitialization process. * test: Enhance integration test for concurrent MCP server reinitialization - Added a new test to validate that concurrent calls to reinitialize the MCP server do not crash or corrupt the server state. - Ensured that at least one call succeeds and that failures are handled gracefully, improving the reliability of the reinitialization process. - Reset MCPManager instance after each test to maintain a clean state for subsequent tests.	2026-03-08 21:49:04 -04:00
Danny Avila	4a8a5b5994	🔒 fix: Hex-normalized IPv4-mapped IPv6 in Domain Validation (#12130 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run Details Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run Details Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run Details Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions Details * 🔒 fix: handle hex-normalized IPv4-mapped IPv6 in domain validation * fix: Enhance IPv6 private address detection in domain validation - Added tests for detecting IPv4-compatible, 6to4, NAT64, and Teredo addresses. - Implemented `extractEmbeddedIPv4` function to identify private IPv4 addresses within various IPv6 formats. - Updated `isPrivateIP` function to utilize the new extraction logic for improved accuracy in address validation. * fix: Update private IPv4 detection logic in domain validation - Enhanced the `isPrivateIPv4` function to accurately identify additional private and non-routable IPv4 ranges. - Adjusted the return logic in `resolveHostnameSSRF` to utilize the updated private IP detection for improved hostname validation. * test: Expand private IP detection tests in domain validation - Added tests for additional private IPv4 ranges including 0.0.0.0/8, 100.64.0.0/10, 192.0.0.0/24, and 198.18.0.0/15. - Updated existing tests to ensure accurate detection of private and multicast IP addresses in the `isPrivateIP` function. - Enhanced `resolveHostnameSSRF` to correctly identify private literal IPv4 addresses without DNS lookup. * refactor: Rename and enhance embedded IPv4 detection in IPv6 addresses - Renamed `extractEmbeddedIPv4` to `hasPrivateEmbeddedIPv4` for clarity on its purpose. - Updated logic to accurately check for private IPv4 addresses embedded in Teredo, 6to4, and NAT64 IPv6 formats. - Improved the `isPrivateIP` function to utilize the new naming and logic for better readability and accuracy. - Enhanced documentation for clarity on the functionality of the updated methods. * feat: Enhance private IPv4 detection in embedded IPv6 addresses - Added additional checks in `hasPrivateEmbeddedIPv4` to ensure only valid private IPv4 formats are recognized. - Improved the logic for identifying private IPv4 addresses embedded within various IPv6 formats, enhancing overall accuracy. * test: Add additional test for hostname resolution in SSRF detection - Included a new test case in `resolveHostnameSSRF` to validate the detection of private IPv4 addresses embedded in IPv6 formats for the hostname 'meta.example.com'. - Enhanced existing tests to ensure comprehensive coverage of hostname resolution scenarios. * fix: Set redirect option to 'manual' in undiciFetch calls - Updated undiciFetch calls in MCPConnection to include the redirect option set to 'manual' for better control over HTTP redirects. - Added documentation comments regarding SSRF pre-checks for WebSocket connections, highlighting the limitations of the current SDK regarding DNS resolution. * test: Add integration tests for MCP SSRF protections - Introduced a new test suite for MCP SSRF protections, verifying that MCPConnection does not follow HTTP redirects to private IPs and blocks WebSocket connections to private IPs when SSRF protection is enabled. - Implemented tests to ensure correct behavior of the connection under various scenarios, including redirect handling and WebSocket DNS resolution. * refactor: Improve SSRF protection logic for WebSocket connections - Enhanced the SSRF pre-check for WebSocket connections to validate resolved IPs, ensuring that allowlisting a domain does not grant trust to its resolved IPs at runtime. - Updated documentation comments to clarify the limitations of the current SDK regarding DNS resolution and the implications for SSRF protection. * test: Enhance MCP SSRF protection tests for redirect handling and WebSocket connections - Updated tests to ensure that MCPConnection does not follow HTTP redirects to private IPs, regardless of SSRF protection settings. - Added checks to verify that WebSocket connections to hosts resolving to private IPs are blocked, even when SSRF protection is disabled. - Improved documentation comments for clarity on the behavior of the tests and the implications for SSRF protection. * test: Refactor MCP SSRF protection test for WebSocket connection errors - Updated the test to use `await expect(...).rejects.not.toThrow(...)` for better readability and clarity. - Simplified the error handling logic while ensuring that SSRF rejections are correctly validated during connection failures.	2026-03-07 20:13:52 -05:00
Danny Avila	6ebee069c7	🤝 fix: Respect Server Token Endpoint Auth Method Preference in MCP OAuth (#12052 ) Some checks are pending Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run Details Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run Details Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run Details Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions Details * fix(mcp): respect server's token endpoint auth method preference order * fix(mcp): update token endpoint auth method to client_secret_basic * fix(mcp): correct auth method to client_secret_basic in OAuth handler * test(mcp): add tests for OAuth client registration method selection based on server preferences * refactor(mcp): extract and implement token endpoint auth methods into separate utility functions - Moved token endpoint authentication method logic from the MCPOAuthHandler to new utility functions in methods.ts for better organization and reusability. - Added tests for the new methods to ensure correct behavior in selecting and resolving authentication methods based on server preferences and token exchange methods. - Updated MCPOAuthHandler to utilize the new utility functions, improving code clarity and maintainability. * chore(mcp): remove redundant comments in OAuth handler - Cleaned up the MCPOAuthHandler by removing unnecessary comments related to authentication methods, improving code readability and maintainability. * refactor(mcp): update supported auth methods to use ReadonlySet for better performance - Changed the SUPPORTED_AUTH_METHODS from an array to a ReadonlySet for improved lookup efficiency. - Enhanced the logic in selectRegistrationAuthMethod to prioritize credential-based methods and handle cases where the server advertises 'none' correctly, ensuring compliance with RFC 7591. * test(mcp): add tests for selectRegistrationAuthMethod to handle 'none' and empty array cases - Introduced new test cases to ensure selectRegistrationAuthMethod correctly prioritizes credential-based methods over 'none' when listed first or before other methods. - Added a test to verify that an empty token_endpoint_auth_methods_supported returns undefined, adhering to RFC 8414. * refactor(mcp): streamline authentication method handling in OAuth handler - Simplified the logic for determining the authentication method by consolidating checks into a single function call. - Removed redundant checks for supported auth methods, enhancing code clarity and maintainability. - Updated the request header and body handling based on the resolved authentication method. * fix(mcp): ensure compliance with RFC 6749 by removing credentials from body when using client_secret_basic - Updated the MCPOAuthHandler to delete client_id and client_secret from body parameters when using the client_secret_basic authentication method, ensuring adherence to RFC 6749 §2.3.1. * test(mcp): add tests for OAuth flow handling of client_secret_basic and client_secret_post methods - Introduced new test cases to verify that the MCPOAuthHandler correctly removes client_id and client_secret from the request body when using client_secret_basic. - Added tests to ensure proper handling of client_secret_post and none authentication methods, confirming that the correct parameters are included or excluded based on the specified method. - Enhanced the test suite for completeOAuthFlow to cover various scenarios, ensuring compliance with OAuth 2.0 specifications. * test(mcp): enhance tests for selectRegistrationAuthMethod and resolveTokenEndpointAuthMethod - Added new test cases to verify the selection of the first supported credential method from a mixed list in selectRegistrationAuthMethod. - Included tests to ensure resolveTokenEndpointAuthMethod correctly ignores unsupported preferred methods and handles empty tokenAuthMethods, returning undefined as expected. - Improved test coverage for various scenarios in the OAuth flow, ensuring compliance with relevant specifications. --------- Co-authored-by: Dustin Healy <54083382+dustinhealy@users.noreply.github.com>	2026-03-03 22:44:13 -05:00
Danny Avila	d3c06052d7	🗝️ feat: Credential Variables for DB-Sourced MCP Servers (#12044 ) * feat: Allow Credential Variables in Headers for DB-sourced MCP Servers - Removed the hasCustomUserVars check from ToolService.js, directly retrieving userMCPAuthMap. - Updated MCPConnectionFactory and related classes to include a dbSourced flag for better handling of database-sourced configurations. - Added integration tests to ensure proper behavior of dbSourced servers, verifying that sensitive placeholders are not resolved while allowing customUserVars. - Adjusted various MCP-related files to accommodate the new dbSourced logic, ensuring consistent handling across the codebase. * chore: MCPConnectionFactory Tests with Additional Flow Metadata for typing - Updated MCPConnectionFactory tests to include new fields in flowMetadata: serverUrl and state. - Enhanced mockFlowData in multiple test cases to reflect the updated structure, ensuring comprehensive coverage of the OAuth flow scenarios. - Added authorization_endpoint to metadata in the test setup for improved validation of the OAuth process. * refactor: Simplify MCPManager Configuration Handling - Removed unnecessary type assertions and streamlined the retrieval of server configuration in MCPManager. - Enhanced the handling of OAuth and database-sourced flags for improved clarity and efficiency. - Updated tests to reflect changes in user object structure and ensure proper processing of MCP environment variables. * refactor: Optimize User MCP Auth Map Retrieval in ToolService - Introduced conditional loading of userMCPAuthMap based on the presence of MCP-delimited tools, improving efficiency by avoiding unnecessary calls. - Updated the loadToolDefinitionsWrapper and loadAgentTools functions to reflect this change, enhancing overall performance and clarity. * test: Add userMCPAuthMap gating tests in ToolService - Introduced new tests to validate the logic for determining if MCP tools are present in the agent's tool list. - Implemented various scenarios to ensure accurate detection of MCP tools, including edge cases for empty, undefined, and null tool lists. - Enhanced clarity and coverage of the ToolService capability checking logic. * refactor: Enhance MCP Environment Variable Processing - Simplified the handling of the dbSourced parameter in the processMCPEnv function. - Introduced a failsafe mechanism to derive dbSourced from options if not explicitly provided, improving robustness and clarity in MCP environment variable processing. * refactor: Update Regex Patterns for Credential Placeholders in ServerConfigsDB - Modified regex patterns to include additional credential/env placeholders that should not be allowed in user-provided configurations. - Clarified comments to emphasize the security risks associated with credential exfiltration when MCP servers are shared between users. * chore: field order * refactor: Clean Up dbSourced Parameter Handling in processMCPEnv - Reintroduced the failsafe mechanism for deriving the dbSourced parameter from options, ensuring clarity and robustness in MCP environment variable processing. - Enhanced code readability by maintaining consistent comment structure. * refactor: Update MCPOptions Type to Include Optional dbId - Modified the processMCPEnv function to extend the MCPOptions type, allowing for an optional dbId property. - Simplified the logic for deriving the dbSourced parameter by directly checking the dbId property, enhancing code clarity and maintainability.	2026-03-03 18:02:37 -05:00
Jón Levy	f7ac449ca4	🔌 fix: Resolve MCP OAuth flow state race condition (#11941 ) * 🔌 fix: Resolve MCP OAuth flow state race condition The OAuth callback arrives before the flow state is stored because `createFlow()` returns a long-running Promise that only resolves on flow COMPLETION, not when the initial PENDING state is persisted. Calling it fire-and-forget with `.catch(() => {})` meant the redirect happened before the state existed, causing "Flow state not found" errors. Changes: - Add `initFlow()` to FlowStateManager that stores PENDING state and returns immediately, decoupling state persistence from monitoring - Await `initFlow()` before emitting the OAuth redirect so the callback always finds existing state - Keep `createFlow()` in the background for monitoring, but log warnings instead of silently swallowing errors - Increase FLOWS cache TTL from 3 minutes to 10 minutes to give users more time to complete OAuth consent screens Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * 🔌 refactor: Revert FLOWS cache TTL change The race condition fix (initFlow) is sufficient on its own. TTL configurability should be a separate enhancement via librechat.yaml mcpSettings rather than a hardcoded increase. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * 🔌 fix: Address PR review — restore FLOWS TTL, fix blocking-path race, clean up dead args - Restore FLOWS cache TTL to 10 minutes (was silently dropped back to 3) - Add initFlow before oauthStart in blocking handleOAuthRequired path to guarantee state persistence before any redirect - Pass {} to createFlow metadata arg (dead after initFlow writes state) - Downgrade background monitor .catch from logger.warn to logger.debug - Replace process.nextTick with Promise.resolve in test (correct semantics) - Add initFlow TTL assertion test - Add blocking-path ordering test (initFlow → oauthStart → createFlow) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 19:27:36 -05:00
Danny Avila	a0a1749151	🔗 fix: Normalize MCP OAuth `resource` parameter to match token exchange (#12018 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * 🔗 fix: Normalize MCP OAuth `resource` parameter to match token exchange The authorization request used the raw resource string from metadata while the token exchange normalized it through `new URL().href`, causing a trailing-slash mismatch that Cloudflare's auth server rejected. Canonicalize the resource URL in both paths so they match. * 🔧 test: Simplify LeaderElection integration tests for Redis Refactored the integration tests for LeaderElection with Redis by reducing the number of instances from 100 to 1, streamlining the leadership election process. Updated assertions to verify leadership status and UUID after resignation, improving test clarity and performance. Adjusted timeout to 15 seconds for the single instance scenario. * 🔧 test: Update LeaderElection test case description for clarity Modified the description of the test case for leader resignation in the LeaderElection integration tests to better reflect the expected behavior, enhancing clarity and understanding of the test's purpose. * refactor: `resource` parameter in MCP OAuth authorization URL Updated the `MCPOAuthHandler` to ensure the `resource` parameter is added to the authorization URL even when an error occurs while retrieving it from metadata. This change improves the handling of invalid resource URLs by using the raw value as a fallback, enhancing the robustness of the authorization process.	2026-03-02 15:52:29 -05:00
Danny Avila	1f82fb8692	🪵 refactor: `onmessage` Handler and Restructure MCP Debug Logging (#12004 ) Some checks are pending Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run Details Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run Details Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run Details Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions Details * 🪵 refactor: Simplify MCP Transport Log Messages - Updated the logging in MCPConnection to provide clearer output by explicitly logging the method and ID of messages received and sent, improving traceability during debugging. - This change replaces the previous JSON stringification of messages with a more structured log format, enhancing readability and understanding of the transport interactions. * 🔧 refactor: Streamline MCPConnection Message Handling - Removed redundant onmessage logging in MCPConnection to simplify the codebase. - Introduced a dedicated setupTransportOnMessageHandler method to centralize message handling and improve clarity in transport interactions. - Enhanced logging to provide clearer output for received messages, ensuring better traceability during debugging. * 🔧 refactor: Rename setupTransportDebugHandlers to patchTransportSend - Updated the MCPConnection class to rename the setupTransportDebugHandlers method to patchTransportSend for improved clarity. - Adjusted the method call in the connection setup process to reflect the new naming, enhancing code readability and maintainability.	2026-03-01 19:23:45 -05:00
Danny Avila	a0f9782e60	🪣 fix: Prevent Memory Retention from AsyncLocalStorage Context Propagation (#11942 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * fix: store hide_sequential_outputs before processStream clears config processStream now clears config.configurable after completion to break memory retention chains. Save hide_sequential_outputs to a local variable before calling runAgents so the post-stream filter still works. * feat: memory diagnostics * chore: expose garbage collection in backend inspect command Updated the backend inspect command in package.json to include the --expose-gc flag, enabling garbage collection diagnostics for improved memory management during development. * chore: update @librechat/agents dependency to version 3.1.52 Bumped the version of @librechat/agents in package.json and package-lock.json to ensure compatibility and access to the latest features and fixes. * fix: clear heavy config state after processStream to prevent memory leaks Break the reference chain from LangGraph's internal __pregel_scratchpad through @langchain/core RunTree.extra[lc:child_config] into the AsyncLocalStorage context captured by timers and I/O handles. After stream completion, null out symbol-keyed scratchpad properties (currentTaskInput), config.configurable, and callbacks. Also call Graph.clearHeavyState() to release config, signal, content maps, handler registry, and tool sessions. * chore: fix imports for memory utils * chore: add circular dependency check in API build step Enhanced the backend review workflow to include a check for circular dependencies during the API build process. If a circular dependency is detected, an error message is displayed, and the process exits with a failure status. * chore: update API build step to include circular dependency detection Modified the backend review workflow to rename the API package installation step to reflect its new functionality, which now includes detection of circular dependencies during the build process. * chore: add memory diagnostics option to .env.example Included a commented-out configuration option for enabling memory diagnostics in the .env.example file, which logs heap and RSS snapshots every 60 seconds when activated. * chore: remove redundant agentContexts cleanup in disposeClient function Streamlined the disposeClient function by eliminating duplicate cleanup logic for agentContexts, ensuring efficient memory management during client disposal. * refactor: move runOutsideTracing utility to utils and update its usage Refactored the runOutsideTracing function by relocating it to the utils module for better organization. Updated the tool execution handler to utilize the new import, ensuring consistent tracing behavior during tool execution. * refactor: enhance connection management and diagnostics Added a method to ConnectionsRepository for retrieving the active connection count. Updated UserConnectionManager to utilize this new method for app connection count reporting. Refined the OAuthReconnectionTracker's getStats method to improve clarity in diagnostics. Introduced a new tracing utility in the utils module to streamline tracing context management. Additionally, added a safeguard in memory diagnostics to prevent unnecessary snapshot collection for very short intervals. * refactor: enhance tracing utility and add memory diagnostics tests Refactored the runOutsideTracing function to improve warning logic when the AsyncLocalStorage context is missing. Added tests for memory diagnostics and tracing utilities to ensure proper functionality and error handling. Introduced a new test suite for memory diagnostics, covering snapshot collection and garbage collection behavior.	2026-02-25 17:41:23 -05:00
Danny Avila	9a8a5d66d7	⏱️ fix: Separate MCP GET SSE Stream Timeout from POST and Suppress SDK-Internal Recovery Errors (#11936 ) * fix: Separate MCP GET SSE body timeout from POST and suppress SDK-internal stream recovery - Add a dedicated GET Agent with a configurable `sseReadTimeout` (default 5 min, matching the Python MCP SDK) so idle SSE streams time out independently of POST requests, preventing the reconnect-loop log flood described in Discussion #11230. - Suppress "SSE stream disconnected" and "Failed to reconnect SSE stream" errors in setupTransportErrorHandlers — these are SDK-internal recovery events, not transport failures. "Maximum reconnection attempts exceeded" still escalates. - Add optional `sseReadTimeout` to BaseOptionsSchema for per-server configuration. - Add 6 tests: agent timeout separation, custom sseReadTimeout, SSE disconnect suppression (3 unit), and a real-server integration test proving the GET stream recovers without a full transport rebuild. * fix: Refactor MCP connection timeouts and error handling - Updated the `DEFAULT_SSE_READ_TIMEOUT` to use a constant for better readability. - Introduced internal error message constants for SSE stream disconnection and reconnection failures to improve maintainability. - Enhanced type safety in tests by ensuring the options symbol is defined before usage. - Updated the `sseReadTimeout` in `BaseOptionsSchema` to enforce positive values, ensuring valid configurations. * chore: Update SSE read timeout documentation format in BaseOptionsSchema - Changed the default timeout value comment in BaseOptionsSchema to use an underscore for better readability, aligning with common formatting practices.	2026-02-24 21:05:58 -05:00
Danny Avila	8c3c326440	🔌 fix: Reuse Undici Agents Per Transport and Close on Disconnect (#11935 ) * fix: error handling for transient HTTP request failures in MCP connection - Added specific handling for the "fetch failed" TypeError, indicating that the request was aborted likely due to a timeout, while the connection remains usable. - Updated the error message to provide clearer context for users regarding the transient nature of the error. * refactor: MCPConnection with Agent Lifecycle Management - Introduced an array to manage undici Agents, ensuring they are reused across requests and properly closed during disconnection. - Updated the custom fetch and SSE connection methods to utilize the new Agent management system. - Implemented error handling for SSE 404 responses based on session presence, improving connection stability. - Added integration tests to validate the Agent lifecycle, ensuring agents are reused and closed correctly. * fix: enhance error handling and connection management in MCPConnection - Updated SSE connection timeout handling to use nullish coalescing for better defaulting. - Improved the connection closure process by ensuring agents are properly closed and errors are logged non-fatally. - Added tests to validate handling of "fetch failed" errors, marking them as transient and providing clearer messaging for users. * fix: update timeout handling in MCPConnection for improved defaulting - Changed timeout handling in MCPConnection to use logical OR instead of nullish coalescing for better default value assignment. - Ensured consistent timeout behavior for both standard and SSE connections, enhancing reliability in connection management.	2026-02-24 19:06:06 -05:00
Danny Avila	3bf715e05e	♻️ refactor: On-demand MCP connections: remove proactive reconnect, default to available (#11839 ) * feat: Implement reconnection staggering and backoff jitter for MCP connections - Enhanced the reconnection logic in OAuthReconnectionManager to stagger reconnection attempts for multiple servers, reducing the risk of connection storms. - Introduced a backoff delay with random jitter in MCPConnection to improve reconnection behavior during network issues. - Updated the ConnectionsRepository to handle multiple server connections concurrently with a defined concurrency limit. Added tests to ensure the new reconnection strategy works as intended. * refactor: Update MCP server query configuration for improved data freshness - Reduced stale time from 5 minutes to 30 seconds to ensure quicker updates on server initialization. - Enabled refetching on window focus and mount to enhance data accuracy during user interactions. * ♻️ refactor: On-demand MCP connections; remove proactive reconnection, default to available - Remove reconnectServers() from refresh controller (connection storm root cause) - Stop gating server selection on connection status; add to selection immediately - Render agent panel tools from DB cache, not live connection status - Proceed to cached tools on init failure (only gate on OAuth) - Remove unused batchToggleServers() - Reduce useMCPServersQuery staleTime from 5min to 30s, enable refetchOnMount/WindowFocus * refactor: Optimize MCP tool initialization and server connection logic - Adjusted tool initialization to only occur if no cached tools are available, improving efficiency. - Updated comments for clarity on server connection and tool fetching processes. - Removed unnecessary connection status checks during server selection to streamline the user experience.	2026-02-17 22:33:57 -05:00
Danny Avila	2ea72a0f87	🎛️ fix: Google JSON Schema Normalization/Resolution Logic (#11804 ) - Updated `resolveJsonSchemaRefs` to prevent `` and `definitions` from appearing in the resolved output, ensuring compatibility with LLM APIs. - Improved `normalizeJsonSchema` to strip vendor extension fields (e.g., `x-*` prefixed keys) and leftover ``/`definitions` blocks, enhancing schema normalization for Google/Gemini API. - Added comprehensive tests to validate the stripping of ``, vendor extensions, and proper normalization across various schema structures.	2026-02-15 21:31:16 -05:00
Danny Avila	ccbf9dc093	🧰 fix: Convert `const` to `enum` in MCP Schemas for Gemini Compatibility (#11784 ) * fix: Convert `const` to `enum` in MCP tool schemas for Gemini/Vertex AI compatibility Gemini/Vertex AI rejects the JSON Schema `const` keyword in function declarations with a 400 error. Previously, the Zod conversion layer accidentally stripped `const`, but after migrating to pass raw JSON schemas directly to providers, the unsupported keyword now reaches Gemini verbatim. Add `normalizeJsonSchema` to recursively convert `const: X` → `enum: [X]`, which is semantically equivalent per the JSON Schema spec and supported by all providers. * fix: Update secure cookie handling in AuthService to use dynamic secure flag Replaced the static `secure: isProduction` with a call to `shouldUseSecureCookie()` in the `setOpenIDAuthTokens` function. This change ensures that the secure cookie setting is evaluated at runtime, improving cookie handling in development environments while maintaining security in production. * refactor: Simplify MCP tool key formatting and remove unused mocks in tests - Updated MCP test suite to replace static tool key formatting with a dynamic delimiter from Constants, enhancing consistency and maintainability. - Removed unused mock implementations for `@langchain/core/tools` and `@librechat/agents`, streamlining the test setup. - Adjusted related test cases to reflect the new tool key format, ensuring all tests remain functional. * chore: import order	2026-02-13 13:33:25 -05:00
Danny Avila	599f4a11f1	🛡️ fix: Secure MCP/Actions OAuth Flows, Resolve Race Condition & Tool Cache Cleanup (#11756 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * 🔧 fix: Update OAuth error message for clarity - Changed the default error message in the OAuth error route from 'Unknown error' to 'Unknown OAuth error' to provide clearer context during authentication failures. * 🔒 feat: Enhance OAuth flow with CSRF protection and session management - Implemented CSRF protection for OAuth flows by introducing `generateOAuthCsrfToken`, `setOAuthCsrfCookie`, and `validateOAuthCsrf` functions. - Added session management for OAuth with `setOAuthSession` and `validateOAuthSession` middleware. - Updated routes to bind CSRF tokens for MCP and action OAuth flows, ensuring secure authentication. - Enhanced tests to validate CSRF handling and session management in OAuth processes. * 🔧 refactor: Invalidate cached tools after user plugin disconnection - Added a call to `invalidateCachedTools` in the `updateUserPluginsController` to ensure that cached tools are refreshed when a user disconnects from an MCP server after a plugin authentication update. This change improves the accuracy of tool data for users. * chore: imports order * fix: domain separator regex usage in ToolService - Moved the declaration of `domainSeparatorRegex` to avoid redundancy in the `loadActionToolsForExecution` function, improving code clarity and performance. * chore: OAuth flow error handling and CSRF token generation - Enhanced the OAuth callback route to validate the flow ID format, ensuring proper error handling for invalid states. - Updated the CSRF token generation function to require a JWT secret, throwing an error if not provided, which improves security and clarity in token generation. - Adjusted tests to reflect changes in flow ID handling and ensure robust validation across various scenarios.	2026-02-12 14:22:05 -05:00
Danny Avila	924be3b647	🛡️ fix: Implement TOCTOU-Safe SSRF Protection for Actions and MCP (#11722 ) * refactor: better SSRF Protection in Action and Tool Services - Added `createSSRFSafeAgents` function to create HTTP/HTTPS agents that block connections to private/reserved IP addresses, enhancing security against SSRF attacks. - Updated `createActionTool` to accept a `useSSRFProtection` parameter, allowing the use of SSRF-safe agents during tool execution. - Modified `processRequiredActions` and `loadAgentTools` to utilize the new SSRF protection feature based on allowed domains configuration. - Introduced `resolveHostnameSSRF` function to validate resolved IPs against private ranges, preventing potential SSRF vulnerabilities. - Enhanced tests for domain resolution and private IP detection to ensure robust SSRF protection mechanisms are in place. * feat: Implement SSRF protection in MCP connections - Added `createSSRFSafeUndiciConnect` function to provide SSRF-safe DNS lookup options for undici agents. - Updated `MCPConnection`, `MCPConnectionFactory`, and `ConnectionsRepository` to include `useSSRFProtection` parameter, enabling SSRF protection based on server configuration. - Enhanced `MCPManager` and `UserConnectionManager` to utilize SSRF protection when establishing connections. - Updated tests to validate the integration of SSRF protection across various components, ensuring robust security measures are in place. * refactor: WS MCPConnection with SSRF protection and async transport construction - Added `resolveHostnameSSRF` to validate WebSocket hostnames against private IP addresses, enhancing SSRF protection. - Updated `constructTransport` method to be asynchronous, ensuring proper handling of SSRF checks before establishing connections. - Improved error handling for WebSocket transport to prevent connections to potentially unsafe addresses. * test: Enhance ActionRequest tests for SSRF-safe agent passthrough - Added tests to verify that httpAgent and httpsAgent are correctly passed to axios.create when provided in ActionRequest. - Included scenarios to ensure agents are not included when no options are specified. - Enhanced coverage for POST requests to confirm agent passthrough functionality. - Improved overall test robustness for SSRF protection in ActionRequest execution.	2026-02-11 22:09:58 -05:00
Danny Avila	5dc5799fc0	✈️ refactor: Single-Flight Deduplication for MCP Server Configs and Optimize Redis Batch Fetching (#11628 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * fix: Implement single-flight deduplication for getAllServerConfigs and optimize Redis getAll method - Added a pending promises map in MCPServersRegistry to handle concurrent calls to getAllServerConfigs, ensuring that multiple requests for the same userId are deduplicated. - Introduced a new fetchAllServerConfigs method to streamline the fetching process and improve performance. - Enhanced the getAll method in ServerConfigsCacheRedis to utilize MGET for batch fetching, significantly reducing Redis roundtrips and improving efficiency. - Added comprehensive tests for deduplication and performance optimizations, ensuring consistent results across concurrent calls and validating the new implementation. * refactor: Enhance logging in ServerConfigsCacheRedis for getAll method - Added debug logging to track the execution time and key retrieval in the getAll method of ServerConfigsCacheRedis. - Improved import organization by consolidating related imports for better clarity and maintainability. * test: Update MCPServersRegistry and ServerConfigsCacheRedis tests for call count assertions - Modified MCPServersRegistry integration tests to assert specific call counts for cache retrieval, ensuring accurate tracking of Redis interactions. - Refactored ServerConfigsCacheRedis integration tests to rename the test suite for clarity and improved focus on parallel fetching optimizations. - Enhanced the getAll method in ServerConfigsCacheRedis to utilize batching for improved performance during key retrieval. * chore: Simplify key extraction in ServerConfigsCacheRedis - Streamlined the key extraction logic in the getAll method of ServerConfigsCacheRedis by consolidating the mapping function into a single line, enhancing code readability and maintainability.	2026-02-04 16:25:22 +01:00
Danny Avila	d13037881a	🔐 fix: MCP OAuth Tool Discovery and Event Emission (#11599 ) * fix: MCP OAuth tool discovery and event emission in event-driven mode - Add discoverServerTools method to MCPManager for tool discovery when OAuth is required - Fix OAuth event emission to send both ON_RUN_STEP and ON_RUN_STEP_DELTA events - Fix hasSubscriber flag reset in GenerationJobManager for proper event buffering - Add ToolDiscoveryOptions and ToolDiscoveryResult types - Update reinitMCPServer to use new discovery method and propagate OAuth URLs * refactor: Update ToolService and MCP modules for improved functionality - Reintroduced Constants in ToolService for better reference management. - Enhanced loadToolDefinitionsWrapper to handle both response and streamId scenarios. - Updated MCP module to correct type definitions for oauthStart parameter. - Improved MCPConnectionFactory to ensure proper disconnection handling during tool discovery. - Adjusted tests to reflect changes in mock implementations and ensure accurate behavior during OAuth handling. * fix: Refine OAuth handling in MCPConnectionFactory and related tests - Updated the OAuth URL assignment logic in reinitMCPServer to prevent overwriting existing URLs. - Enhanced error logging to provide clearer messages when tool discovery fails. - Adjusted tests to reflect changes in OAuth handling, ensuring accurate detection of OAuth requirements without generating URLs in discovery mode. * refactor: Clean up OAuth URL assignment in reinitMCPServer - Removed redundant OAuth URL assignment logic in the reinitMCPServer function to streamline the tool discovery process. - Enhanced error logging for tool discovery failures, improving clarity in debugging and monitoring. * fix: Update response handling in ToolService for event-driven mode - Changed the condition in loadToolDefinitionsWrapper to check for writableEnded instead of headersSent, ensuring proper event emission when the response is still writable. - This adjustment enhances the reliability of event handling during tool execution, particularly in streaming scenarios.	2026-02-01 19:37:04 -05:00

1 2 3

150 commits