mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-04-03 14:27:20 +02:00
3 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
fda72ac621
|
🏗️ refactor: Remove Redundant Caching, Migrate Config Services to TypeScript (#12466)
* ♻️ refactor: Remove redundant scopedCacheKey caching, support user-provided key model fetching
Remove redundant cache layers that used `scopedCacheKey()` (tenant-only scoping) on top of `getAppConfig()` which already caches per-principal (role+user+tenant). This caused config overrides for different principals within the same tenant to be invisible due to stale cached data.
Changes:
- Add `requireJwtAuth` to `/api/endpoints` route for proper user context
- Remove ENDPOINT_CONFIG, STARTUP_CONFIG, PLUGINS, TOOLS, and MODELS_CONFIG cache layers — all derive from `getAppConfig()` with cheap computation
- Enhance MODEL_QUERIES cache: hash(baseURL+apiKey) keys, 2-minute TTL, caching centralized in `fetchModels()` base function
- Support fetching models with user-provided API keys in `loadConfigModels` via `getUserKeyValues` lookup (no caching for user keys)
- Update all affected tests
Closes #1028
* ♻️ refactor: Migrate config services to TypeScript in packages/api
Move core config logic from CJS /api wrappers to typed TypeScript in packages/api using dependency injection factories:
- `createEndpointsConfigService` — endpoint config merging + checkCapability
- `createLoadConfigModels` — custom endpoint model loading with user key support
- `createMCPToolCacheService` — MCP tool cache operations (update, merge, cache)
/api files become thin wrappers that wire dependencies (getAppConfig, loadDefaultEndpointsConfig, getUserKeyValues, getCachedTools, etc.) into the typed factories.
Also moves existing `endpoints/config.ts` → `endpoints/config/providers.ts` to accommodate the new `config/` directory structure.
* 🔄 fix: Invalidate models query when user API key is set or revoked
Without this, users had to refresh the page after entering their API key to see the updated model list fetched with their credentials.
- Invalidate QueryKeys.models in useUpdateUserKeysMutation onSuccess
- Invalidate QueryKeys.models in useRevokeUserKeyMutation onSuccess
- Invalidate QueryKeys.models in useRevokeAllUserKeysMutation onSuccess
* 🗺️ fix: Remap YAML-level override keys to AppConfig equivalents in mergeConfigOverrides
Config overrides stored in the DB use YAML-level keys (TCustomConfig), but they're merged into the already-processed AppConfig where some fields have been renamed by AppService. This caused mcpServers overrides to land on a nonexistent key instead of mcpConfig, so config-override MCP servers never appeared in the UI.
- Add OVERRIDE_KEY_MAP to remap mcpServers→mcpConfig, interface→interfaceConfig
- Apply remapping before deep merge in mergeConfigOverrides
- Add test for YAML-level key remapping behavior
- Update existing tests to use AppConfig field names in assertions
* 🧪 test: Update service.spec to use AppConfig field names after override key remapping
* 🛡️ fix: Address code review findings — reliability, types, tests, and performance
- Pass tenant context (getTenantId) in importers.js getEndpointsConfig call
- Add 5 tests for user-provided API key model fetching (key found, no key, DB error, missing userId, apiKey-only with fixed baseURL)
- Distinguish NO_USER_KEY (debug) from infrastructure errors (warn) in catch
- Switch fetchPromisesMap from Promise.all to Promise.allSettled so one failing provider doesn't kill the entire model config
- Parallelize getUserKeyValues DB lookups via batched Promise.allSettled instead of sequential awaits in the loop
- Hoist standardCache instance in fetchModels to avoid double instantiation
- Replace Record<string, unknown> types with Partial<TConfig>-based types; remove as unknown as T double-cast in endpoints config
- Narrow Bedrock availableRegions to typed destructure
- Narrow version field from string|number|undefined to string|undefined
- Fix import ordering in mcp/tools.ts and config/models.ts per AGENTS.md
- Add JSDoc to getModelsConfig alias clarifying caching semantics
* fix: Guard against null getCachedTools in mergeAppTools
* 🔍 fix: Address follow-up review — deduplicate extractEnvVariable, fix error discrimination, add log-level tests
- Deduplicate extractEnvVariable calls: resolve apiKey/baseURL once, reuse for both the entry and isUserProvided checks (Finding A)
- Move ResolvedEndpoint interface from function closure to module scope (Finding B)
- Replace fragile msg.includes('NO_USER_KEY') with ErrorTypes.NO_USER_KEY enum check against actual error message format (Finding C). Also handle ErrorTypes.INVALID_USER_KEY as an expected "no key" case.
- Add test asserting logger.warn is called for infra errors (not debug)
- Add test asserting logger.debug is called for NO_USER_KEY errors (not warn)
* fix: Preserve numeric assistants version via String() coercion
* 🐛 fix: Address secondary review — Ollama cache bypass, cache tests, type safety
- Fix Ollama success path bypassing cache write in fetchModels (CRITICAL): store result before returning so Ollama models benefit from 2-minute TTL
- Add 4 fetchModels cache behavior tests: cache write with TTL, cache hit short-circuits HTTP, skipCache bypasses read+write, empty results not cached
- Type-safe OVERRIDE_KEY_MAP: Partial<Record<keyof TCustomConfig, keyof AppConfig>> so compiler catches future field rename mismatches
- Fix import ordering in config/models.ts (package types longest→shortest)
- Rename ToolCacheDeps → MCPToolCacheDeps for naming consistency
- Expand getModelsConfig JSDoc to explain caching granularity
* fix: Narrow OVERRIDE_KEY_MAP index to satisfy strict tsconfig
* 🧩 fix: Add allowedProviders to TConfig, remove Record<string, unknown> from PartialEndpointEntry
The agents endpoint config includes allowedProviders (used by the frontend AgentPanel to filter available providers), but it was missing from TConfig. This forced PartialEndpointEntry to use & Record<string, unknown> as an escape hatch, violating AGENTS.md type policy.
- Add allowedProviders?: (string | EModelEndpoint)[] to TConfig
- Remove Record<string, unknown> from PartialEndpointEntry — now just Partial<TConfig>
* 🛡️ fix: Isolate Ollama cache write from fetch try-catch, add Ollama cache tests
- Separate Ollama fetch and cache write into distinct scopes so a cache failure (e.g., Redis down) doesn't misattribute the error as an Ollama API failure and fall through to the OpenAI-compatible path (Issue A)
- Add 2 Ollama-specific cache tests: models written with TTL on fetch, cached models returned without hitting server (Issue B)
- Replace hardcoded 120000 with Time.TWO_MINUTES constant in cache TTL test assertion (Issue C)
- Fix OVERRIDE_KEY_MAP JSDoc to accurately describe runtime vs compile-time type enforcement (Issue D)
- Add global beforeEach for cache mock reset to prevent cross-test leakage
* 🧪 fix: Address third review — DI consistency, cache key width, MCP tests
- Inject loadCustomEndpointsConfig via EndpointsConfigDeps with default fallback, matching loadDefaultEndpointsConfig DI pattern (Finding 3)
- Widen modelsCacheKey from 64-bit (.slice(0,16)) to 128-bit (.slice(0,32)) for collision-sensitive cross-credential cache key (Finding 4)
- Add fetchModels.mockReset() in loadConfigModels.spec beforeEach to prevent mock implementation leaks across tests (Finding 5)
- Add 11 unit tests for createMCPToolCacheService covering all three functions: null/empty input, successful ops, error propagation, cold-cache merge (Finding 2)
- Simplify getModelsConfig JSDoc to @see reference (Finding 10)
* ♻️ refactor: Address remaining follow-ups from reviews
OVERRIDE_KEY_MAP completeness:
- Add missing turnstile→turnstileConfig mapping
- Add exhaustiveness test verifying all three renamed keys are remapped and original YAML keys don't leak through
Import role context:
- Pass userRole through importConversations job → importLibreChatConvo so role-based endpoint overrides are honored during conversation import
- Update convos.js route to include req.user.role in the job payload
createEndpointsConfigService unit tests:
- Add 8 tests covering: default+custom merge, Azure/AzureAssistants/Anthropic Vertex/Bedrock config enrichment, assistants version coercion, agents allowedProviders, req.config bypass
Plugins/tools efficiency:
- Use Set for includedTools/filteredTools lookups (O(1) vs O(n) per plugin)
- Combine auth check + filter into single pass (eliminates intermediate array)
- Pre-compute toolDefKeys Set for O(1) tool definition lookups
* fix: Scope model query cache by user when userIdQuery is enabled
* fix: Skip model cache for userIdQuery endpoints, fix endpoints test types
- When userIdQuery is true, skip caching entirely (like user_provided keys) to avoid cross-user model list leakage without duplicating cache data
- Fix AgentCapabilities type error in endpoints.spec.ts — use enum values and appConfig() helper for partial mock typing
* 🐛 fix: Restore filteredTools+includedTools composition, add checkCapability tests
- Fix filteredTools regression: whitelist and blacklist are now applied independently (two flat guards), matching original behavior where includedTools=['a','b'] + filteredTools=['b'] produces ['a'] (Finding A)
- Fix Set spread in toolkit loop: pre-compute toolDefKeysList array once alongside the Set, reuse for .some() without per-plugin allocation (Finding B)
- Add 2 filteredTools tests: blacklist-only path and combined whitelist+blacklist composition (Finding C)
- Add 3 checkCapability tests: capability present, capability absent, fallback to defaultAgentCapabilities for non-agents endpoints (Finding D)
* 🔑 fix: Include config-override MCP servers in filterAuthorizedTools
Config-override MCP servers (defined via admin config overrides for roles/groups) were rejected by filterAuthorizedTools because it called getAllServerConfigs(userId) without the configServers parameter. Only YAML and DB-backed user servers were included in the access check.
- Add configServers parameter to filterAuthorizedTools
- Resolve config servers via resolveConfigServers(req) at all 4 callsites (create, update, duplicate, revert) using parallel Promise.all
- Pass configServers through to getAllServerConfigs(userId, configServers) so the registry merges config-source servers into the access check
- Update filterAuthorizedTools.spec.js mock for resolveConfigServers
* fix: Skip model cache for userIdQuery endpoints, fix endpoints test types
For user-provided key endpoints (userProvide: true), skip the full model list re-fetch during message validation — the user already selected from a list we served them, and re-fetching with skipCache:true on every message send is both slow and fragile (5s provider timeout = rejected model). Instead, validate the model string format only:
- Must be a string, max 256 chars
- Must match [a-zA-Z0-9][a-zA-Z0-9_.:\-/@+ ]* (covers all known provider model ID formats while rejecting injection attempts)
System-configured endpoints still get full model list validation as before.
* 🧪 test: Add regression tests for filterAuthorizedTools configServers and validateModel
filterAuthorizedTools:
- Add test verifying configServers is passed to getAllServerConfigs and config-override server tools are allowed through
- Guard resolveConfigServers in createAgentHandler to only run when MCP tools are present (skip for tool-free agent creates)
validateModel (12 new tests):
- Format validation: missing model, non-string, length overflow, leading special char, script injection, standard model ID acceptance
- userProvide early-return: next() called immediately, getModelsConfig not invoked (regression guard for the exact bug this fixes)
- System endpoint list validation: reject unknown model, accept known model, handle null/missing models config
Also fix unnecessary backslash escape in MODEL_PATTERN regex.
* 🧹 fix: Remove space from MODEL_PATTERN, trim input, clean up nits
- Remove space character from MODEL_PATTERN regex — no real model ID uses spaces; prevents spurious violation logs from whitespace artifacts
- Add model.trim() before validation to handle accidental whitespace
- Remove redundant filterUniquePlugins call on already-deduplicated output
- Add comment documenting intentional whitelist+blacklist composition
- Add getUserKeyValues.mockReset() in loadConfigModels.spec beforeEach
- Remove narrating JSDoc from getModelsConfig one-liner
- Add 2 tests: trim whitespace handling, reject spaces in model ID
* fix: Match startup tool loader semantics — includedTools takes precedence over filteredTools
The startup tool loader (loadAndFormatTools) explicitly ignores filteredTools when includedTools is set, with a warning log. The PluginController was applying both independently, creating inconsistent behavior where the same config produced different results at startup vs plugin listing time. Restored mutually exclusive semantics: when includedTools is non-empty, filteredTools is not evaluated.
* 🧹 chore: Simplify validateModel flow, note auth requirement on endpoints route
- Separate missing-model from invalid-model checks cleanly: type+presence guard first, then trim+format guard (reviewer NIT)
- Add route comment noting auth is required for role/tenant scoping
* fix: Write trimmed model back to req.body.model for downstream consumers |
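Note on the model ID format check described in the commit above: the following is a minimal sketch, not the repository's code. It assumes the final pattern is the character class quoted in the message with the space removed, anchored at both ends (the anchors are an assumption), and a 256-character limit. The function name `normalizeModelId` is hypothetical.

```typescript
// Sketch only: pattern derived from the commit message; the ^...$ anchors
// and the helper name are assumptions, not taken from the repository.
const MAX_MODEL_LENGTH = 256;
const MODEL_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9_.:\/@+-]*$/;

/** Returns the trimmed model ID when it passes the format check, else null. */
function normalizeModelId(model: unknown): string | null {
  // Type and presence guard first.
  if (typeof model !== 'string') {
    return null;
  }
  // Then trim and apply the length and format guards.
  const trimmed = model.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_MODEL_LENGTH) {
    return null;
  }
  return MODEL_PATTERN.test(trimmed) ? trimmed : null;
}
```

A caller could write the returned value back to the request body, which mirrors the "write trimmed model back" step mentioned in the last bullet.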
||
|
|
935288f841
|
🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435)
* feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring
- Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with
`enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains`
- Add `MCPServerSource` type ('yaml' | 'config' | 'user') and `source`
field to `ParsedServerConfig`
- Change `dbSourced` determination from `!!config.dbId` to
`config.source === 'user'` across MCPManager, ConnectionsRepository,
UserConnectionManager, and MCPServerInspector
- Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB
* feat: three-layer MCPServersRegistry with config cache and lazy init
- Add `configCacheRepo` as third repository layer between YAML cache and
DB for admin-defined config-source MCP servers
- Implement `ensureConfigServers()` that identifies config-override servers
from resolved `getAppConfig()` mcpConfig, lazily inspects them, and
caches parsed configs with `source: 'config'`
- Add `lazyInitConfigServer()` with timeout, stub-on-failure, and
concurrent-init deduplication via `pendingConfigInits` map
- Extend `getAllServerConfigs()` with optional `configServers` param for
three-way merge: YAML → Config → User
- Add `getServerConfig()` lookup through config cache layer
- Add `invalidateConfigCache()` for clearing config-source inspection
results on admin config mutations
- Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on
DB-stored servers in `addServer()` and `addServerStub()`
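Note on the concurrent-init deduplication mentioned above: a minimal sketch of the general technique, assuming a map from cache key to in-flight promise. Only the name `pendingConfigInits` comes from the commit message; `inspectServer`, `lazyInit`, and the `ParsedConfig` shape are hypothetical stand-ins.

```typescript
// Sketch only: everything except the map name is illustrative.
type ParsedConfig = { name: string; source: 'config' };

const pendingConfigInits = new Map<string, Promise<ParsedConfig>>();
let inspectCalls = 0;

// Stand-in for an expensive server inspection.
async function inspectServer(key: string): Promise<ParsedConfig> {
  inspectCalls += 1;
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return { name: key, source: 'config' };
}

function lazyInit(key: string): Promise<ParsedConfig> {
  const pending = pendingConfigInits.get(key);
  if (pending) {
    // A concurrent caller already started this init: share its promise.
    return pending;
  }
  const init = (async () => {
    try {
      return await inspectServer(key);
    } finally {
      // Clear the entry once settled so a later call can start a fresh init.
      pendingConfigInits.delete(key);
    }
  })();
  pendingConfigInits.set(key, init);
  return init;
}
```

Two overlapping calls such as `Promise.all([lazyInit('srv'), lazyInit('srv')])` trigger a single inspection in this sketch, because the second call finds the first call's promise in the map.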
* feat: wire tenant context into MCP controllers, services, and cache invalidation
- Resolve config-source servers via `getAppConfig({ role, tenantId })`
in `getMCPTools()` and `getMCPServersList()` controllers
- Pass `ensureConfigServers()` results through `getAllServerConfigs()`
for three-way merge of YAML + Config + User servers
- Add tenant/role context to `getMCPSetupData()` and connection status
routes via `getTenantId()` from ALS
- Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin
config mutations trigger re-inspection of config-source MCP servers
* feat: enforce tenantMcpPolicy on admin config mcpServers mutations
- Add `validateMcpServerPolicy()` helper that checks mcpServers against
operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant,
allowedTransports, allowedDomains)
- Wire validation into `upsertConfigOverrides` and `patchConfigField`
handlers — rejects with 403 when policy is violated
- Infer transport type from config shape (command → stdio, url protocol
→ websocket/sse, type field → streamable-http)
- Validate server domains against policy allowlist when configured
* revert: remove tenantMcpPolicy schema and enforcement
The existing admin config CRUD routes already provide the mechanism
for granular MCP server prepopulation (groups, roles, users). The
tenantMcpPolicy gating adds unnecessary complexity that can be
revisited if needed in the future.
- Remove tenantMcpPolicy from mcpSettings Zod schema
- Remove validateMcpServerPolicy helper and TenantMcpPolicy interface
- Remove policy enforcement from upsertConfigOverrides and
patchConfigField handlers
* test: update test assertions for source field and config-server wiring
- Use objectContaining in MCPServersRegistry reset test to account for
new source: 'yaml' field on CACHE-stored configs
- Add getTenantId and ensureConfigServers mocks to MCP route tests
- Add getAppConfig mock to route test Config service mock
- Update getMCPSetupData assertion to expect second options argument
- Update getAllServerConfigs assertions for new configServers parameter
* fix: disconnect active connections when config-source servers are evicted
When admin config overrides change and config-source MCP servers are
removed, the invalidation now proactively disconnects active connections
for evicted servers instead of leaving them lingering until timeout.
- Return evicted server names from invalidateConfigCache()
- Disconnect app-level connections for evicted servers in
clearMcpConfigCache() via MCPManager.appConnections.disconnect()
* fix: address code review findings (CRITICAL, MAJOR, MINOR)
CRITICAL fixes:
- Scope configCacheRepo keys by config content hash to prevent
cross-tenant cache poisoning when two tenants define the same
server name with different configurations
- Change dbSourced checks from `source === 'user'` to
`source !== 'yaml' && source !== 'config'` so undefined source
(pre-upgrade cached configs) fails closed to restricted mode
MAJOR fixes:
- Derive OAuth servers from already-computed mcpConfig instead of
calling getOAuthServers() separately — config-source OAuth servers
are now properly detected
- Add parseInt radix (10) and NaN guard with fallback to 30_000
for CONFIG_SERVER_INIT_TIMEOUT_MS
- Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in
ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Remove `if (role || tenantId)` guard in getMCPSetupData — config
servers now always resolve regardless of tenant context
MINOR fixes:
- Extract resolveAllMcpConfigs() helper in mcp controller to
eliminate 3x copy-pasted config resolution boilerplate
- Distinguish "not initialized" from real errors in
clearMcpConfigCache — log actual failures instead of swallowing
- Remove narrative inline comments per style guide
- Remove dead try/catch inside Promise.allSettled in
ensureConfigServers (inner method never throws)
- Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll()
calls per request
Test updates:
- Add ensureConfigServers mock to registry test fixtures
- Update getMCPSetupData assertions for inline OAuth derivation
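Note on the `CONFIG_SERVER_INIT_TIMEOUT_MS` parsing fix above: a minimal sketch of parsing with an explicit radix and a NaN guard that falls back to 30_000. The helper name `parseTimeoutMs` is hypothetical, and how the raw value is obtained is left to the caller.

```typescript
// Sketch only: the helper name is illustrative.
const DEFAULT_TIMEOUT_MS = 30_000;

function parseTimeoutMs(raw: string | undefined): number {
  // Radix 10 so the string is always read as a decimal number.
  const parsed = parseInt(raw ?? '', 10);
  // parseInt yields NaN for missing or non-numeric input: use the default.
  return Number.isNaN(parsed) ? DEFAULT_TIMEOUT_MS : parsed;
}
```

The raw string would typically come from an environment variable; an unset or malformed value then resolves to the default instead of propagating NaN into a timer.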
* fix: address code review findings (CRITICAL, MAJOR, MINOR)
CRITICAL fixes:
- Break circular dependency: move CONFIG_CACHE_NAMESPACE from
MCPServersRegistry to ServerConfigsCacheFactory
- Fix dbSourced fail-closed: use source field when present, fall back to
legacy dbId check when absent (backward-compatible with pre-upgrade
cached configs that lack source field)
MAJOR fixes:
- Add CONFIG_CACHE_NAMESPACE to aggregate-key set in
ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests)
covering lazy init, stub-on-failure, cross-tenant isolation via config
hash keys, concurrent deduplication, merge order, and cache invalidation
MINOR fixes:
- Update MCPServerInspector test assertion for dbSourced change
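Note on the `dbSourced` fallback described above: a minimal sketch of "use the source field when present, fall back to the legacy dbId check when absent". The `MCPServerSource` union and the helper name `isUserSourced` appear in the commit messages; the `ServerConfigLike` interface and the function body are assumptions.

```typescript
// Sketch only: the interface and body are illustrative.
type MCPServerSource = 'yaml' | 'config' | 'user';

interface ServerConfigLike {
  source?: MCPServerSource;
  dbId?: string;
}

function isUserSourced(config: ServerConfigLike): boolean {
  // Prefer the explicit source tag when the config carries one.
  if (config.source !== undefined) {
    return config.source === 'user';
  }
  // Configs cached before the upgrade have no source: use the legacy dbId check.
  return Boolean(config.dbId);
}
```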
* fix: restore getServerConfig lookup for config-source servers (NEW-1)
Add configNameToKey map that indexes server name → hash-based cache key
for O(1) lookup by name in getServerConfig. This restores the config
cache layer that was dropped when hash-based keys were introduced.
Without this fix, config-source servers appeared in tool listings
(via getAllServerConfigs) but getServerConfig returned undefined,
breaking all connection and tool call paths.
- Populate configNameToKey in ensureSingleConfigServer
- Clear configNameToKey in invalidateConfigCache and reset
- Clear stale read-through cache entries after lazy init
- Remove dead code in invalidateConfigCache (config.title, key parsing)
- Add getServerConfig tests for config-source server lookup
* fix: eliminate configNameToKey race via caller-provided configServers param
Replace the process-global configNameToKey map (last-writer-wins under
concurrent multi-tenant load) with a configServers parameter on
getServerConfig. Callers pass the pre-resolved config servers map
directly — no shared mutable state, no cross-tenant race.
- Add optional configServers param to getServerConfig; when provided,
returns matching config directly without any global lookup
- Remove configNameToKey map entirely (was the source of the race)
- Extract server names from cache keys via lastIndexOf in
invalidateConfigCache (safe for names containing colons)
- Use mcpConfig[serverName] directly in getMCPTools instead of a
redundant getServerConfig call
- Add cross-tenant isolation test for getServerConfig
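Note on the `lastIndexOf` extraction above: a minimal sketch, assuming cache keys have the form `<serverName>:<hash>` (the exact key layout is an assumption). Splitting on the last colon keeps names that themselves contain colons intact. The helper name `serverNameFromKey` is hypothetical.

```typescript
// Sketch only: key layout "<serverName>:<hash>" is assumed.
function serverNameFromKey(key: string): string {
  const separator = key.lastIndexOf(':');
  // No colon at all: treat the whole key as the name.
  return separator === -1 ? key : key.slice(0, separator);
}
```

Splitting on the first colon would truncate a name such as `team:search`, which is why the last occurrence is used.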
* fix: populate read-through cache after config server lazy init
After lazyInitConfigServer succeeds, write the parsed config to
readThroughCache keyed by serverName so that getServerConfig calls
from ConnectionsRepository, UserConnectionManager, and
MCPManager.callTool find the config without needing configServers.
Without this, config-source servers appeared in tool listings but
every connection attempt and tool call returned undefined.
* fix: user-scoped getServerConfig fallback to server-only cache key
When getServerConfig is called with a userId (e.g., from callTool or
UserConnectionManager), the cache key is serverName::userId. Config-source
servers are cached under the server-only key (no userId). Add a fallback
so user-scoped lookups find config-source servers in the read-through cache.
* fix: configCacheRepo fallback, isUserSourced DRY, cross-process race
CRITICAL: Add findInConfigCache fallback in getServerConfig so
config-source servers remain reachable after readThroughCache TTL
expires (5s). Without this, every tool call after 5s returned
undefined for config-source servers.
MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace
all 5 inline dbSourced ternary expressions (MCPManager x2,
ConnectionsRepository, UserConnectionManager, MCPServerInspector).
MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when
configCacheRepo.add throws (key exists from another process), fall
back to reading the existing entry instead of returning undefined.
MINOR: Parallelize invalidateConfigCache awaits with Promise.all.
Remove redundant .catch(() => {}) inside Promise.allSettled.
Tighten dedup test assertion to toBe(1).
Add TTL-expiry tests for getServerConfig (with and without userId).
* feat: thread configServers through getAppToolFunctions and formatInstructionsForContext
Add optional configServers parameter to getAppToolFunctions,
getInstructions, and formatInstructionsForContext so config-source
server tools and instructions are visible to agent initialization
and context injection paths.
Existing callers (boot-time init, tests) pass no argument and
continue to work unchanged. Agent runtime paths can now thread
resolved config servers from request context.
* fix: stale failure stubs retry after 5 min, upsert for cross-process races
- Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried
instead of permanently disabling config-source servers after transient
errors (DNS outage, cold-start race)
- Extract upsertConfigCache() helper that tries add then falls back to
update, preventing cross-process Redis races where a second instance's
successful inspection result was discarded
- Add test for stale-stub retry after CONFIG_STUB_RETRY_MS
* fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup
- Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so
CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs
were always considered stale (updatedAt ?? 0 → epoch → always expired)
- Add null guard for rawConfig in MCPManager.callTool before passing to
preProcessGraphTokens — prevents unsafe `as` cast on undefined
- Log double-failure in upsertConfigCache instead of silently swallowing
- Replace module-scope Date.now monkey-patch with jest.useFakeTimers /
jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests
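Note on the stale-stub retry window above: a minimal sketch of the staleness check, showing why a stub without `updatedAt` is always treated as expired. `CONFIG_STUB_RETRY_MS` and `updatedAt` come from the commit message; the `inspectionFailed` flag and the helper name `shouldRetryStub` are hypothetical.

```typescript
// Sketch only: the stub shape and helper name are illustrative.
const CONFIG_STUB_RETRY_MS = 5 * 60 * 1000;

interface ConfigStub {
  inspectionFailed: boolean;
  updatedAt?: number;
}

function shouldRetryStub(stub: ConfigStub, now: number): boolean {
  if (!stub.inspectionFailed) {
    return false;
  }
  // A missing updatedAt falls back to 0 (the epoch), so the stub always looks stale.
  return now - (stub.updatedAt ?? 0) >= CONFIG_STUB_RETRY_MS;
}
```

Stamping `updatedAt` when the stub is created is what makes the five-minute window meaningful; without it every lookup would trigger a retry.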
* fix: server-only readThrough fallback only returns truthy values
Prevents a cached undefined from a prior no-userId lookup from
short-circuiting the DB query on a subsequent userId-scoped lookup.
* fix: remove findInConfigCache to eliminate cross-tenant config leakage
The findInConfigCache prefix scan (serverName:*) could return any
tenant's config after readThrough TTL expires, violating tenant
isolation. Config-source servers are now ONLY resolvable through:
1. The configServers param (callers with tenant context from ALS)
2. The readThrough cache (populated by ensureSingleConfigServer,
5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs)
Connection/tool-call paths without tenant context rely exclusively on
the readThrough cache. If it expires before the next HTTP request
repopulates it, the server is not found — which is correct because
there is no tenant context to determine which config to return.
- Remove findInConfigCache method and its call in getServerConfig
- Update server-only readThrough fallback to only return truthy values
(prevents cached undefined from short-circuiting user-scoped DB lookup)
- Update tests to document tenant isolation behavior after cache expiry
* style: fix import order per AGENTS.md conventions
Sort package imports shortest-to-longest, local imports longest-to-shortest
across MCPServersRegistry, ConnectionsRepository, MCPManager,
UserConnectionManager, and MCPServerInspector.
* fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures
Thread pre-resolved serverConfig from tool creation context into
callTool, removing dependency on the readThrough cache for config-source
servers. This fixes two issues:
- Cross-tenant contamination: the readThrough cache key was unscoped
(just serverName), so concurrent multi-tenant requests for same-named
servers would overwrite each other's entries
- TTL expiry: tool calls happening >5s after config resolution would
fail with "Configuration not found" because the readThrough entry
had expired
Changes:
- Add optional serverConfig param to MCPManager.callTool — uses
provided config directly, falling back to getServerConfig lookup
for YAML/user servers
- Thread serverConfig from createMCPTool through createToolInstance
closure to callTool
- Remove readThrough write from ensureSingleConfigServer — config-source
servers are only accessible via configServers param (tenant-scoped)
- Remove server-only readThrough fallback from getServerConfig
- Increase config cache hash from 8 to 16 hex chars (64-bit)
- Add isUserSourced boundary tests for all source/dbId combinations
- Fix double Object.keys call in getMCPTools controller
- Update test assertions for new getServerConfig behavior
* fix: cache base configs for config-server users; narrow upsertConfigCache error handling
- Refactor getAllServerConfigs to separate base config fetch (YAML + DB)
from config-server layering. Base configs are cached via readThroughCacheAll
regardless of whether configServers is provided, eliminating uncached
MongoDB queries per request for config-server users
- Narrow upsertConfigCache catch to duplicate-key errors only;
infrastructure errors (Redis timeouts, network failures) now propagate
instead of being silently swallowed, preventing inspection storms
during outages
* fix: restore correct merge order and document upsert error matching
- Restore YAML → Config → User DB precedence in getAllServerConfigs
(user DB servers have highest precedence, matching the JSDoc contract)
- Add source comment on upsertConfigCache duplicate-key detection
linking to the two cache implementations that define the error message
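Note on the YAML → Config → User DB precedence above: a minimal sketch of a three-way merge in which later layers override earlier ones for the same server name. The function name `mergeServerConfigs` and the plain-object representation are assumptions; the real registry is likely to be asynchronous and cache-backed.

```typescript
// Sketch only: plain objects stand in for the three repository layers.
function mergeServerConfigs<T>(
  yaml: Record<string, T>,
  config: Record<string, T>,
  user: Record<string, T>,
): Record<string, T> {
  // Spread order sets precedence: user DB entries win, then config, then YAML.
  return { ...yaml, ...config, ...user };
}
```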
* feat: complete config-source server support across all execution paths
Wire configServers through the entire agent execution pipeline so
config-source MCP servers are fully functional — not just visible in
listings but executable in agent sessions.
- Thread configServers into handleTools.js agent tool pipeline: resolve
config servers from tenant context before MCP tool iteration, pass to
getServerConfig, createMCPTools, and createMCPTool
- Thread configServers into agent instructions pipeline:
applyContextToAgent → getMCPInstructionsForServers →
formatInstructionsForContext, resolved in client.js before agent
context application
- Add configServers param to createMCPTool and createMCPTools for
reconnect path fallback
- Add source field to redactServerSecrets allowlist for client UI
differentiation of server tiers
- Narrow invalidateConfigCache to only clear readThroughCacheAll (merged
results), preserving YAML individual-server readThrough entries
- Update context.spec.ts assertions for new configServers parameter
* fix: add missing mocks for config-source server dependencies in client.test.js
Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added
to client.js but not reflected in the test file's jest.mock declarations.
* fix: update formatInstructionsForContext assertions for configServers param
The test assertions expected formatInstructionsForContext to be called with
only the server names array, but it now receives configServers as a second
argument after the config-source server feature wiring.
* fix: move configServers resolution before MCP tool loop to avoid TDZ
configServers was declared with `let` after the first tool loop but
referenced inside it via getServerConfig(), causing a ReferenceError
temporal dead zone. Move declaration and resolution before the loop,
using tools.some(mcpToolPattern) to gate the async resolution.
* fix: address review findings — cache bypass, discoverServerTools gap, DRY
- #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via
readThroughCacheAll) instead of bypassing it when configServers is present.
Extracts user-DB entries from cached base by diffing against YAML keys
to maintain YAML → Config → User DB merge order without extra MongoDB calls.
- #3: Add configServers param to ToolDiscoveryOptions and thread it through
discoverServerTools → getServerConfig so config-source servers are
discoverable during OAuth reconnection flows.
- #6: Replace inline import() type annotations in context.ts with proper
import type { ParsedServerConfig } per AGENTS.md conventions.
- #7: Extract resolveConfigServers(req) helper in MCP.js and use it from
handleTools.js and client.js, eliminating the duplicated 6-line config
resolution pattern.
- #10: Restore removed "why" comment explaining getLoaded() vs getAll()
choice in getMCPSetupData — documents non-obvious correctness constraint.
- #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs.
* fix: consolidate imports, reorder constants, fix YAML-DB merge edge case
- Merge duplicate @librechat/data-schemas requires in MCP.js into one
- Move resolveConfigServers after module-level constants
- Fix getAllServerConfigs edge case where user-DB entry overriding a
YAML entry with the same name was excluded from userDbConfigs; now
uses reference equality check to detect DB-overwritten YAML keys
* fix: replace fragile string-match error detection with proper upsert method
Add upsert() to IServerConfigsRepositoryInterface and all implementations
(InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle
error message string match ('already exists in cache') in upsertConfigCache
that was the only thing preventing cross-process init races from silently
discarding inspection results.
Each implementation handles add-or-update atomically:
- InMemory: direct Map.set()
- Redis: direct cache.set()
- RedisAggregateKey: read-modify-write under write lock
- DB: delegates to update() (DB servers use explicit add() with ACL setup)
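The difference between a strict add() and an upsert() can be sketched for the in-memory case as follows. This is an illustrative class, not the real repository implementation: the class name, the `ServerConfig` shape, and the error text are assumptions. It only shows why a direct Map.set() removes the need to match on an error message.

```typescript
interface ServerConfig {
  url: string;
}

// Hypothetical in-memory repository, for illustration only.
class InMemoryServerConfigs {
  private readonly store = new Map<string, ServerConfig>();

  // Strict insert: fails when the key is already present.
  add(name: string, config: ServerConfig): void {
    if (this.store.has(name)) {
      throw new Error(`Server "${name}" already exists in cache`);
    }
    this.store.set(name, config);
  }

  // Add-or-update in one step: Map.set() overwrites an existing entry,
  // so a racing initializer cannot trigger an "already exists" error.
  upsert(name: string, config: ServerConfig): void {
    this.store.set(name, config);
  }

  get(name: string): ServerConfig | undefined {
    return this.store.get(name);
  }
}
```

With add() alone, the second of two racing writers has to catch the error and inspect its message to decide whether to retry as an update; with upsert() the second write simply wins.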
* fix: wire configServers through remaining HTTP endpoints
- getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig
- reinitialize route: resolve configServers before getServerConfig
- auth-values route: resolve configServers before getServerConfig
- getOAuthHeaders: accept configServers param, thread from callers
- Update mcp.spec.js tests to mock getAllServerConfigs for GET by name
* fix: thread serverConfig through getConnection for config-source servers
Config-source servers exist only in configCacheRepo, not in YAML cache or
DB. When callTool → getConnection → getUserConnection → getServerConfig
runs without configServers, it returns undefined and throws. Fix by
threading the pre-resolved serverConfig (providedConfig) from callTool
through getConnection → getUserConnection → createUserConnectionInternal,
using it as a fallback before the registry lookup.
* fix: thread configServers through reinit, reconnect, and tool definition paths
Wire configServers through every remaining call chain that creates or
reconnects MCP server connections:
- reinitMCPServer: accepts serverConfig and configServers, uses them for
getServerConfig fallback, getConnection, and discoverServerTools
- reconnectServer: accepts and passes configServers to reinitMCPServer
- createMCPTools/createMCPTool: pass configServers to reconnectServer
- ToolService.loadToolDefinitionsWrapper: resolves configServers from req,
passes to both reinitMCPServer call sites
- reinitialize route: passes serverConfig and configServers to reinitMCPServer
* fix: address review findings — simplify merge, harden error paths, fix log labels
- Simplify getAllServerConfigs merge: replace fragile reference-equality
loop with direct spread { ...yamlConfigs, ...configServers, ...base }
- Guard upsertConfigCache in lazyInitConfigServer catch block so cache
failures don't mask the original inspection error
- Deduplicate getYamlServerNames cold-start with promise dedup pattern
- Remove dead `if (!mcpConfig)` guard in getMCPSetupData
- Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error
messages — now uses this.namespace for correct Config/App labeling
- Remove misleading OAuth callback comment about readThrough cache
- Move resolveConfigServers after module-level constants in MCP.js
* fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label
- Clear yamlServerNamesPromise on rejection so transient cache errors
don't permanently prevent ensureConfigServers from working
- Skip reinspectServer for config-source servers (source: 'config') in
reinitMCPServer — they lack a CACHE/DB storage location; retry is
handled by CONFIG_STUB_RETRY_MS in ensureConfigServers
- Use source field instead of dbId for storageLocation derivation
- Fix remaining hardcoded "App" in reset() leaderCheck message
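The promise dedup pattern, together with the clear-on-rejection fix for yamlServerNamesPromise, can be sketched roughly as below. `createDedupedLoader` is a hypothetical helper; in this sketch a resolved promise stays memoized, which may differ from how the real code caches the result.

```typescript
// Hypothetical helper: concurrent callers share one in-flight promise.
function createDedupedLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | undefined;

  return (): Promise<T> => {
    if (inFlight !== undefined) {
      return inFlight;
    }
    const current: Promise<T> = load().catch((error: unknown) => {
      // Clear the memoized promise on rejection so a transient failure
      // does not permanently poison every later call.
      if (inFlight === current) {
        inFlight = undefined;
      }
      throw error;
    });
    inFlight = current;
    return current;
  };
}
```

Without the reset in the catch handler, one failed load would be returned to every subsequent caller, which is the failure mode the commit above describes.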
* fix: persist oauthHeaders in flow state for config-source OAuth servers
The OAuth callback route has no JWT auth context and cannot resolve
config-source server configs. Previously, getOAuthHeaders would silently
return {} for config-source servers, dropping custom token exchange headers.
Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow
initiation (which has auth context), and the callback reads them from
the stored flow state with a fallback to the registry lookup for
YAML/user-DB servers.
* fix: update tests for getMCPSetupData null guard removal and ToolService mock
- MCP.spec.js: update test to expect graceful handling of null mcpConfig
instead of a throw (getAllServerConfigs always returns an object)
- MCP.js: add defensive || {} for Object.entries(mcpConfig) in case of
null from test mocks
- ToolService.spec.js: add missing mock for ~/server/services/MCP
(resolveConfigServers)
* fix: address review findings — DRY, naming, logging, dead code, defensive guards
- #1: Simplify getAllServerConfigs to single getBaseServerConfigs call,
eliminating redundant double-fetch of cacheConfigsRepo.getAll()
- #2: Add warning log when oauthHeaders absent from OAuth callback flow state
- #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller
imports shared helper instead of reimplementing
- #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider
in createToolInstance — these are actively used, not unused
- #5: Log rejected results from ensureConfigServers Promise.allSettled
so cache errors are visible instead of silently dropped
- #6: Remove dead 'MCP config not found' error handlers from routes
- #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache
- #8: Remove logger.error from withTimeout to prevent double-logging timeouts
- #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message
- #12: Use spread instead of mutation in addServer for immutability consistency
- Add upsert mock to ensureConfigServers.test.ts DB mock
- Update route tests for resolveAllMcpConfigs import change
* fix: restore correct merge priority, use immutable spread, fix test mock
- getAllServerConfigs: { ...configServers, ...base } so userDB wins over
configServers, matching documented "User DB (highest)" priority
- lazyInitConfigServer: use immutable spread instead of direct mutation
for parsedConfig.source, consistent with addServer fix
- Fix test to mock getAllServerConfigs as {} instead of null, remove
unnecessary || {} defensive guard in getMCPSetupData
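The merge priority relies on object spread order: when two spreads contain the same key, the later one wins. A small sketch, with made-up server entries and a hypothetical `mergeServerConfigs` wrapper, assuming `base` holds the YAML and user-DB entries:

```typescript
type McpSource = 'yaml' | 'config' | 'user';

interface ServerEntry {
  url: string;
  source: McpSource;
}

type ServerMap = Record<string, ServerEntry>;

// Later spreads win on key collisions, so entries in `base`
// override same-named entries in `configServers`.
function mergeServerConfigs(configServers: ServerMap, base: ServerMap): ServerMap {
  return { ...configServers, ...base };
}

// Illustrative data only; these names do not come from the source.
const exampleConfigServers: ServerMap = {
  search: { url: 'https://config.example/search', source: 'config' },
  docs: { url: 'https://config.example/docs', source: 'config' },
};
const exampleBase: ServerMap = {
  search: { url: 'https://user.example/search', source: 'user' },
};

const merged = mergeServerConfigs(exampleConfigServers, exampleBase);
```

Here `merged['search']` is the user entry and `merged['docs']` is the config entry; swapping the spread order would invert that priority.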
* fix: error handling, stable hashing, flatten nesting, remove dead param
- Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with
graceful {} fallback so transient DB/cache errors don't crash tool pipeline
- Sort keys in configCacheKey JSON.stringify for deterministic hashing
regardless of object property insertion order
- Flatten clearMcpConfigCache from 3 nested try-catch to early returns;
document that user connections are cleaned up lazily (accepted tradeoff)
- Remove dead configServers param from getAppToolFunctions (never passed)
- Add security rationale comment for source field in redactServerSecrets
* fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision
The array replacer in JSON.stringify acts as a property allowlist at
every nesting depth, silently dropping nested keys like headers['X-API-Key'],
oauth.client_secret, etc. Two configs with different nested values but
identical top-level structure produced the same hash, causing cross-tenant
cache hits and potential credential contamination.
Switch to a function replacer that recursively sorts keys at all depths
without dropping any properties.
Also document the known gap in getOAuthServers: config-source OAuth
servers are not covered by auto-reconnection or uninstall cleanup
because callers lack request context.
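The replacer issue described in this fix can be reproduced in a few lines. The sketch below contrasts an array replacer with a recursive key-sorting function replacer; `stableStringify` and the sample configs are hypothetical, and hashing is left out so that the serialized strings can be compared directly.

```typescript
// Function replacer: returns a copy of each plain object with sorted keys.
// JSON.stringify then recurses into the copy, so nested objects are sorted too.
function sortKeysReplacer(_key: string, value: unknown): unknown {
  if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const k of Object.keys(source).sort()) {
      sorted[k] = source[k];
    }
    return sorted;
  }
  return value;
}

function stableStringify(value: unknown): string {
  return JSON.stringify(value, sortKeysReplacer);
}

// Illustrative configs that differ only in a nested header value.
const tenantA = { name: 'srv', headers: { 'X-API-Key': 'tenant-a' } };
const tenantB = { name: 'srv', headers: { 'X-API-Key': 'tenant-b' } };

// Array replacer: acts as an allowlist at EVERY depth, so the nested
// 'X-API-Key' property is dropped and both configs serialize identically.
const lossyA = JSON.stringify(tenantA, Object.keys(tenantA).sort());
const lossyB = JSON.stringify(tenantB, Object.keys(tenantB).sort());

// Function replacer: keeps every property, so the two configs stay distinct.
const stableA = stableStringify(tenantA);
const stableB = stableStringify(tenantB);
```

Any hash computed over `lossyA` and `lossyB` collides, while one computed over `stableA` and `stableB` does not, and key insertion order no longer affects the result.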
* fix: move clearMcpConfigCache to packages/api to eliminate circular dependency
The function only depends on MCPServersRegistry and MCPManager, both of
which live in packages/api. Import it directly from @librechat/api in
the CJS layer instead of using dynamic require('~/config').
* chore: imports/fields ordering
* fix: address review findings — error handling, targeted lookup, test gaps
- Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so
getAppConfig/getAllServerConfigs failures propagate instead of masking
infrastructure errors as empty server lists.
- Use targeted getServerConfig in getMCPServerById instead of fetching
all server configs for a single-server lookup.
- Forward configServers to inner createMCPTool calls so reconnect path
works for config-source servers.
- Update getAllServerConfigs JSDoc to document disjoint-key design.
- Add OAuth callback oauthHeaders fallback tests (flow state present
vs registry fallback).
- Add resolveConfigServers/resolveAllMcpConfigs unit tests covering
happy path and error propagation.
* fix: add getOAuthReconnectionManager mock to OAuth callback tests
* chore: imports ordering
|
||
|
|
9f6d8c6e93
|
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation
Introduces tenantContextMiddleware that propagates req.user.tenantId into
AsyncLocalStorage, activating the Mongoose applyTenantIsolation plugin for
all downstream DB queries within a request.
- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
* feat: register tenant middleware and wrap startup/auth in runAsSystem()
- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
before user context is established (LDAP, SAML, OpenID, social login,
AuthService)
* feat: thread tenantId through all getAppConfig callers
Pass tenantId from req.user to getAppConfig() across all callers that have
request context, ensuring correct per-tenant cache key resolution.
Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.
Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint
* feat: add config cache invalidation on admin mutations
- Add clearOverrideCache(tenantId?) to flush per-principal override caches
by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers (upsert,
patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)
* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering
The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin. Group
queries are now automatically scoped by tenantId via ALS.
* fix: replace runAsSystem with baseOnly for pre-tenant code paths
App configs are tenant-owned; runAsSystem() would bypass tenant isolation
and return cross-tenant DB overrides. Instead, add baseOnly option to
getAppConfig() that returns YAML-derived config only, with zero DB queries.
All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without touching
the Config collection.
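The tenant context middleware introduced at the start of this commit can be approximated with Node's AsyncLocalStorage. This is a sketch under assumptions: the request/response types are minimal stand-ins for Express types, the 403 response body is made up, and `createTenantContextMiddleware` and `getTenantId` are hypothetical names. Only the three behaviors (strict 403, non-strict pass-through, no-op when unauthenticated) come from the commit message.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface TenantContext {
  tenantId: string;
}

const tenantStorage = new AsyncLocalStorage<TenantContext>();

// Hypothetical accessor that downstream code (e.g. a DB plugin) could call.
function getTenantId(): string | undefined {
  return tenantStorage.getStore()?.tenantId;
}

// Minimal stand-ins for the Express request/response/next types.
interface Req {
  user?: { tenantId?: string };
}
interface Res {
  status(code: number): { json(body: unknown): void };
}
type Next = (err?: unknown) => void;

function createTenantContextMiddleware(strict: boolean) {
  return (req: Req, res: Res, next: Next): void => {
    // No-op for unauthenticated requests.
    if (!req.user) {
      next();
      return;
    }
    const tenantId = req.user.tenantId;
    if (!tenantId) {
      if (strict) {
        // Strict mode: reject requests that carry no tenant.
        res.status(403).json({ error: 'Tenant context required' });
        return;
      }
      // Non-strict mode: pass through for backward compatibility.
      next();
      return;
    }
    // Everything that runs inside next(), including async continuations,
    // observes this tenant through tenantStorage.getStore().
    tenantStorage.run({ tenantId }, () => next());
  };
}
```

Because the store is bound to the async call chain rather than to a global, concurrent requests from different tenants do not see each other's tenantId.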
* fix: address PR review findings: middleware ordering, types, cache safety
- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant
cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis
(Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export: exclude _resetTenantMiddlewareStrictCache
(Finding 5)
- Move isMainThread check to module level, remove per-request check
(Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)
* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly
- Restore runAsSystem() around performStartupChecks,
updateInterfacePermissions, initializeMCPs, and
initializeOAuthReconnectManager; these make Mongoose queries that need
system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)
* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests
- requireJwtAuth: 5 tests verifying ALS tenant context is set after
passport auth, isolated between concurrent requests, and not set when
user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
tenantId is threaded through, partial failure is handled gracefully, and
operations run in parallel via Promise.all (Finding 11)
* fix: address Copilot review: passport errors, namespaced cache keys, /base scoping
- Forward passport errors in requireJwtAuth before entering tenant
middleware, which prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache: stored keys are
namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...", so
override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig: /base should return tenant-scoped base
config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
* fix: address second review: cache safety, code quality, test reliability
- Decouple cache invalidation from mutation response: fire-and-forget with
logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import side
effect)
- Memoize process.env read in overrideCacheKey to avoid per-request env
lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock
* fix: move JSDoc to correct function after clearEndpointConfigCache extraction
* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry
Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the in-memory
store.keys() path handles the standard case. When APP_CONFIG is
Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default), an acceptable window for admin config mutations.
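The namespaced-prefix matching for override keys can be illustrated against a plain in-memory Map. This is a loose sketch, not the real clearOverrideCache: the commit only states that stored keys look like "APP_CONFIG:_OVERRIDE_:...", so the tenant segment placed directly after the prefix, the sample keys, and the `clearOverrideKeys` name are all assumptions.

```typescript
const NAMESPACE = 'APP_CONFIG';
const OVERRIDE_MARKER = '_OVERRIDE_:';

// Hypothetical helper operating on an in-memory store.
function clearOverrideKeys(store: Map<string, unknown>, tenantId?: string): number {
  // Stored keys carry the namespace, so the prefix must include it;
  // matching on "_OVERRIDE_:" alone with startsWith would never hit.
  // Assumption: tenant-scoped keys look like APP_CONFIG:_OVERRIDE_:<tenantId>:...
  const prefix =
    tenantId === undefined
      ? `${NAMESPACE}:${OVERRIDE_MARKER}`
      : `${NAMESPACE}:${OVERRIDE_MARKER}${tenantId}:`;

  // Collect first, then delete, to keep iteration and mutation separate.
  const doomed: string[] = [];
  store.forEach((_value, key) => {
    if (key.startsWith(prefix)) {
      doomed.push(key);
    }
  });
  doomed.forEach((key) => store.delete(key));
  return doomed.length;
}
```

Using startsWith rather than includes keeps a key that merely contains the marker somewhere in its body from being cleared by accident.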
* fix: remove return from tenantStorage.run to satisfy void middleware signature
* fix: address second review: cache safety, code quality, test reliability
- Switch invalidateConfigCaches from Promise.all to Promise.allSettled so
partial failures are logged individually instead of producing one
undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process flag
to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth; the
if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
clearAppConfigCache rejects (not just the swallowed endpoint error)
* chore: reorder imports in index.js and app.js for consistency
- Moved logger and runAsSystem imports to maintain a consistent import
order across files.
- Improved code readability by ensuring related imports are grouped together.
|
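The move from Promise.all to Promise.allSettled mentioned in this commit can be sketched as follows. `collectFailures`, `runInvalidations`, and the task labels are hypothetical; the point is only that each rejected task is reported on its own instead of the first rejection hiding the rest.

```typescript
interface InvalidationTask {
  label: string;
  run: () => Promise<unknown>;
}

// Pure helper: turn settled results into one message per failed task.
function collectFailures(
  labels: string[],
  results: PromiseSettledResult<unknown>[],
): string[] {
  const failures: string[] = [];
  results.forEach((result, index) => {
    if (result.status === 'rejected') {
      failures.push(`${labels[index]}: ${String(result.reason)}`);
    }
  });
  return failures;
}

// Hypothetical orchestrator: runs all tasks in parallel and never rejects.
async function runInvalidations(tasks: InvalidationTask[]): Promise<string[]> {
  // The async wrapper turns a synchronous throw inside run() into a rejection,
  // so allSettled sees it like any other failure.
  const results = await Promise.allSettled(tasks.map(async (task) => task.run()));
  return collectFailures(
    tasks.map((task) => task.label),
    results,
  );
}
```

A caller can log each returned message separately, which matches the stated goal of making partial failures individually visible.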