2025-08-26 12:10:18 -04:00
|
|
|
const { CacheKeys } = require('librechat-data-provider');
|
🏗️ feat: bulkWrite isolation, pre-auth context, strict-mode fixes (#12445)
* fix: wrap seedDatabase() in runAsSystem() for strict tenant mode
seedDatabase() was called without tenant context at startup, causing
every Mongoose operation inside it to throw when
TENANT_ISOLATION_STRICT=true. Wrapping in runAsSystem() gives it the
SYSTEM_TENANT_ID sentinel so the isolation plugin skips filtering,
matching the pattern already used for performStartupChecks and
updateInterfacePermissions.
* fix: chain tenantContextMiddleware in optionalJwtAuth
optionalJwtAuth populated req.user but never established ALS tenant
context, unlike requireJwtAuth which chains tenantContextMiddleware
after successful auth. Authenticated users hitting routes with
optionalJwtAuth (e.g. /api/banner) had no tenant isolation.
* feat: tenant-safe bulkWrite wrapper and call-site migration
Mongoose's bulkWrite() does not trigger schema-level middleware hooks,
so the applyTenantIsolation plugin cannot intercept it. This adds a
tenantSafeBulkWrite() utility that injects the current ALS tenant
context into every operation's filter/document before delegating to
native bulkWrite.
Migrates all 8 runtime bulkWrite call sites:
- agentCategory (seedCategories, ensureDefaultCategories)
- conversation (bulkSaveConvos)
- message (bulkSaveMessages)
- file (batchUpdateFiles)
- conversationTag (updateTagsForConversation, bulkIncrementTagCounts)
- aclEntry (bulkWriteAclEntries)
systemGrant.seedSystemGrants is intentionally not migrated — it uses
explicit tenantId: { $exists: false } filters and is exempt from the
isolation plugin.
* feat: pre-auth tenant middleware and tenant-scoped config cache
Adds preAuthTenantMiddleware that reads X-Tenant-Id from the request
header and wraps downstream in tenantStorage ALS context. Wired onto
/oauth, /api/auth, /api/config, and /api/share — unauthenticated
routes that need tenant scoping before JWT auth runs.
The /api/config cache key is now tenant-scoped
(STARTUP_CONFIG:${tenantId}) so multi-tenant deployments serve the
correct login page config per tenant.
The middleware is intentionally minimal — no subdomain parsing, no
OIDC claim extraction. The private fork's reverse proxy or auth
gateway sets the header.
* feat: accept optional tenantId in updateInterfacePermissions
When tenantId is provided, the function re-enters inside
tenantStorage.run({ tenantId }) so all downstream Mongoose queries
target that tenant's roles instead of the system context. This lets
the private fork's tenant provisioning flow call
updateInterfacePermissions per-tenant after creating tenant-scoped
ADMIN/USER roles.
* fix: tenant-filter $lookup in getPromptGroup aggregation
The $lookup stage in getPromptGroup() queried the prompts collection
without tenant filtering. While the outer PromptGroup aggregate is
protected by the tenantIsolation plugin's pre('aggregate') hook,
$lookup runs as an internal MongoDB operation that bypasses Mongoose
hooks entirely.
Converts from simple field-based $lookup to pipeline-based $lookup
with an explicit tenantId match when tenant context is active.
* fix: replace field-level unique indexes with tenant-scoped compounds
Field-level unique:true creates a globally-unique single-field index in
MongoDB, which would cause insert failures across tenants sharing the
same ID values.
- agent.id: removed field-level unique, added { id, tenantId } compound
- convo.conversationId: removed field-level unique (compound at line 50
already exists: { conversationId, user, tenantId })
- message.messageId: removed field-level unique (compound at line 165
already exists: { messageId, user, tenantId })
- preset.presetId: removed field-level unique, added { presetId, tenantId }
compound
* fix: scope MODELS_CONFIG, ENDPOINT_CONFIG, PLUGINS, TOOLS caches by tenant
These caches store per-tenant configuration (available models, endpoint
settings, plugin availability, tool definitions) but were using global
cache keys. In multi-tenant mode, one tenant's cached config would be
served to all tenants.
Appends :${tenantId} to cache keys when tenant context is active.
Falls back to the unscoped key when no tenant context exists (backward
compatible for single-tenant OSS deployments).
Covers all read, write, and delete sites:
- ModelController.js: get/set MODELS_CONFIG
- PluginController.js: get/set PLUGINS, get/set TOOLS
- getEndpointsConfig.js: get/set/delete ENDPOINT_CONFIG
- app.js: delete ENDPOINT_CONFIG (clearEndpointConfigCache)
- mcp.js: delete TOOLS (updateMCPTools, mergeAppTools)
- importers.js: get ENDPOINT_CONFIG
* fix: add getTenantId to PluginController spec mock
The data-schemas mock was missing getTenantId, causing all
PluginController tests to throw when the controller calls
getTenantId() for tenant-scoped cache keys.
* fix: address review findings — migration, strict-mode, DRY, types
Addresses all CRITICAL, MAJOR, and MINOR review findings:
F1 (CRITICAL): Add agents, conversations, messages, presets to
SUPERSEDED_INDEXES in tenantIndexes.ts so dropSupersededTenantIndexes()
drops the old single-field unique indexes that block multi-tenant inserts.
F2 (CRITICAL): Unknown bulkWrite op types now throw in strict mode
instead of silently passing through without tenant injection.
F3 (MAJOR): Replace wildcard export with named export for
tenantSafeBulkWrite, hiding _resetBulkWriteStrictCache from the
public package API.
F5 (MAJOR): Restore AnyBulkWriteOperation<IAclEntry>[] typing on
bulkWriteAclEntries — the unparameterized wrapper accepts parameterized
ops as a subtype.
F7 (MAJOR): Fix config.js tenant precedence — JWT-derived
req.user.tenantId now takes priority over the X-Tenant-Id header for
authenticated requests.
F8 (MINOR): Extract scopedCacheKey() helper into tenantContext.ts and
replace all 11 inline occurrences across 7 files.
F9 (MINOR): Use simple localField/foreignField $lookup for the
non-tenant getPromptGroup path (more efficient index seeks).
F12 (NIT): Remove redundant BulkOp type alias.
F13 (NIT): Remove debug log that leaked raw tenantId.
* fix: add new superseded indexes to tenantIndexes test fixture
The test creates old indexes to verify the migration drops them.
Missing fixture entries for agents.id_1, conversations.conversationId_1,
messages.messageId_1, and presets.presetId_1 caused the count assertion
to fail (expected 22, got 18).
* fix: restore logger.warn for unknown bulk op types in non-strict mode
* fix: block SYSTEM_TENANT_ID sentinel from external header input
CRITICAL: preAuthTenantMiddleware accepted any string as X-Tenant-Id,
including '__SYSTEM__'. The tenantIsolation plugin treats SYSTEM_TENANT_ID
as an explicit bypass — skipping ALL query filters. A client sending
X-Tenant-Id: __SYSTEM__ to pre-auth routes (/api/share, /api/config,
/api/auth, /oauth) would execute Mongoose operations without tenant
isolation.
Fixes:
- preAuthTenantMiddleware rejects SYSTEM_TENANT_ID in header
- scopedCacheKey returns the base key (not key:__SYSTEM__) in system
context, preventing stale cache entries during runAsSystem()
- updateInterfacePermissions guards tenantId against SYSTEM_TENANT_ID
- $lookup pipeline separates $expr join from constant tenantId match
for better index utilization
- Regression test for sentinel rejection in preAuthTenant.spec.ts
- Remove redundant getTenantId() call in config.js
* test: add missing deleteMany/replaceOne coverage, fix vacuous ALS assertions
bulkWrite spec:
- deleteMany: verifies tenant-scoped deletion leaves other tenants untouched
- replaceOne: verifies tenantId injected into both filter and replacement
- replaceOne overwrite: verifies a conflicting tenantId in the replacement
document is overwritten by the ALS tenant (defense-in-depth)
- empty ops array: verifies graceful handling
preAuthTenant spec:
- All negative-case tests now use the capturedNext pattern to verify
getTenantId() inside the middleware's execution context, not the
test runner's outer frame (which was always undefined regardless)
* feat: tenant-isolate MESSAGES cache, FLOWS cache, and GenerationJobManager
MESSAGES cache (streamAudio.js):
- Cache key now uses scopedCacheKey(messageId) to prefix with tenantId,
preventing cross-tenant message content reads during TTS streaming.
FLOWS cache (FlowStateManager):
- getFlowKey() now generates ${type}:${tenantId}:${flowId} when tenant
context is active, isolating OAuth flow state per tenant.
GenerationJobManager:
- tenantId added to SerializableJobData and GenerationJobMetadata
- createJob() captures the current ALS tenant context (excluding
SYSTEM_TENANT_ID) and stores it in job metadata
- SSE subscription endpoint validates job.metadata.tenantId matches
req.user.tenantId, blocking cross-tenant stream access
- Both InMemoryJobStore and RedisJobStore updated to accept tenantId
* fix: add getTenantId and SYSTEM_TENANT_ID to MCP OAuth test mocks
FlowStateManager.getFlowKey() now calls getTenantId() for tenant-scoped
flow keys. The 4 MCP OAuth test files mock @librechat/data-schemas
without these exports, causing TypeError at runtime.
* fix: correct import ordering per AGENTS.md conventions
Package imports sorted shortest to longest line length, local imports
sorted longest to shortest — fixes ordering violations introduced by
our new imports across 8 files.
* fix: deserialize tenantId in RedisJobStore — cross-tenant SSE guard was no-op in Redis mode
serializeJob() writes tenantId to the Redis hash via Object.entries,
but deserializeJob() manually enumerates fields and omitted tenantId.
Every getJob() from Redis returned tenantId: undefined, causing the
SSE route's cross-tenant guard to short-circuit (undefined && ... → false).
* test: SSE tenant guard, FlowStateManager key consistency, ALS scope docs
SSE stream tenant tests (streamTenant.spec.js):
- Cross-tenant user accessing another tenant's stream → 403
- Same-tenant user accessing own stream → allowed
- OSS mode (no tenantId on job) → tenant check skipped
FlowStateManager tenant tests (manager.tenant.spec.ts):
- completeFlow finds flow created under same tenant context
- completeFlow does NOT find flow under different tenant context
- Unscoped flows are separate from tenant-scoped flows
Documentation:
- JSDoc on getFlowKey documenting ALS context consistency requirement
- Comment on streamAudio.js scopedCacheKey capture site
* fix: SSE stream tests hang on success path, remove internal fork references
The success-path tests entered the SSE streaming code which never
closes, causing timeout. Mock subscribe() to end the response
immediately. Restructured assertions to verify non-403/non-404.
Removed "private fork" and "OSS" references from code and test
descriptions — replaced with "deployment layer", "multi-tenant
deployments", and "single-tenant mode".
* fix: address review findings — test rigor, tenant ID validation, docs
F1: SSE stream tests now mock subscribe() with correct signature
(streamId, writeEvent, onDone, onError) and assert 200 status,
verifying the tenant guard actually allows through same-tenant users.
F2: completeFlow logs the attempted key and ALS tenantId when flow
is not found, so reverse proxy misconfiguration (missing X-Tenant-Id
on OAuth callback) produces an actionable warning.
F3/F10: preAuthTenantMiddleware validates tenant ID format — rejects
colons, special characters, and values exceeding 128 chars. Trims
whitespace. Prevents cache key collisions via crafted headers.
F4: Documented cache invalidation scope limitation in
clearEndpointConfigCache — only the calling tenant's key is cleared;
other tenants expire via TTL.
F7: getFlowKey JSDoc now lists all 8 methods requiring consistent
ALS context.
F8: Added dedicated scopedCacheKey unit tests — base key without
context, base key in system context, scoped key with tenant, no
ALS leakage across scope boundaries.
* fix: revert flow key tenant scoping, fix SSE test timing
FlowStateManager: Reverts tenant-scoped flow keys. OAuth callbacks
arrive without tenant ALS context (provider redirects don't carry
X-Tenant-Id), so completeFlow/failFlow would never find flows
created under tenant context. Flow IDs are random UUIDs with no
collision risk, and flow data is ephemeral (TTL-bounded).
SSE tests: Use process.nextTick for onDone callback so Express
response headers are flushed before res.write/res.end are called.
* fix: restore getTenantId import for completeFlow diagnostic log
* fix: correct completeFlow warning message, add missing flow test
The warning referenced X-Tenant-Id header consistency which was only
relevant when flow keys were tenant-scoped (since reverted). Updated
to list actual causes: TTL expiry, missing flow, or routing to a
different instance without shared Keyv storage.
Removed the getTenantId() call and import — no longer needed since
flow keys are unscoped.
Added test for the !flowState branch in completeFlow — verifies
return false and logger.warn on nonexistent flow ID.
* fix: add explicit return type to recursive updateInterfacePermissions
The recursive call (tenantId branch calls itself without tenantId)
causes TypeScript to infer circular return type 'any'. Adding
explicit Promise<void> satisfies the rollup typescript plugin.
* fix: update MCPOAuthRaceCondition test to match new completeFlow warning
* fix: clearEndpointConfigCache deletes both scoped and unscoped keys
Unauthenticated /api/endpoints requests populate the unscoped
ENDPOINT_CONFIG key. Admin config mutations clear only the
tenant-scoped key, leaving the unscoped entry stale indefinitely.
Now deletes both when in tenant context.
* fix: tenant guard on abort/status endpoints, warn logs, test coverage
F1: Add tenant guard to /chat/status/:conversationId and /chat/abort
matching the existing guard on /chat/stream/:streamId. The status
endpoint exposes aggregatedContent (AI response text) which requires
tenant-level access control.
F2: preAuthTenantMiddleware now logs warn for rejected __SYSTEM__
sentinel and malformed tenant IDs, providing observability for
bypass probing attempts.
F3: Abort fallback path (getActiveJobIdsForUser) now has tenant
check after resolving the job.
F4: Test for strict mode + SYSTEM_TENANT_ID — verifies runAsSystem
bypasses tenantSafeBulkWrite without throwing in strict mode.
F5: Test for job with tenantId + user without tenantId → 403.
F10: Regex uses idiomatic hyphen-at-start form.
F11: Test descriptions changed from "rejects" to "ignores" since
middleware calls next() (not 4xx).
Also fixes MCPOAuthRaceCondition test assertion to match updated
completeFlow warning message.
* fix: test coverage for logger.warn, status/abort guards, consistency
A: preAuthTenant spec now mocks logger and asserts warn calls for
__SYSTEM__ sentinel, malformed characters, and oversized headers.
B: streamTenant spec expanded with status and abort endpoint tests —
cross-tenant status returns 403, same-tenant returns 200 with body,
cross-tenant abort returns 403.
C: Abort endpoint uses req.user.tenantId (not req.user?.tenantId)
matching stream/status pattern — requireJwtAuth guarantees req.user.
D: Malformed header warning now includes ip in log metadata,
matching the sentinel warning for consistent SOC correlation.
* fix: assert ip field in malformed header warn tests
* fix: parallelize cache deletes, document tenant guard, fix import order
- clearEndpointConfigCache uses Promise.all for independent cache
deletes instead of sequential awaits
- SSE stream tenant guard has inline comment explaining backward-compat
behavior for untenanted legacy jobs
- conversation.ts local imports reordered longest-to-shortest per
AGENTS.md
* fix: tenant-qualify userJobs keys, document tenant guard backward-compat
Job store userJobs keys now include tenantId when available:
- Redis: stream:user:{tenantId:userId}:jobs (falls back to
stream:user:{userId}:jobs when no tenant)
- InMemory: composite key tenantId:userId in userJobMap
getActiveJobIdsByUser/getActiveJobIdsForUser accept optional tenantId
parameter, threaded through from req.user.tenantId at all call sites
(/chat/active and /chat/abort fallback).
Added inline comments on all three SSE tenant guards explaining the
backward-compat design: untenanted legacy jobs remain accessible
when the userId check passes.
* fix: parallelize cache deletes, document tenant guard, fix import order
Fix InMemoryJobStore.getActiveJobIdsByUser empty-set cleanup to use
the tenant-qualified userKey instead of bare userId — prevents
orphaned empty Sets accumulating in userJobMap for multi-tenant users.
Document cross-tenant staleness in clearEndpointConfigCache JSDoc —
other tenants' scoped keys expire via TTL, not active invalidation.
* fix: cleanup userJobMap leak, startup warning, DRY tenant guard, docs
F1: InMemoryJobStore.cleanup() now removes entries from userJobMap
before calling deleteJob, preventing orphaned empty Sets from
accumulating with tenant-qualified composite keys.
F2: Startup warning when TENANT_ISOLATION_STRICT is active — reminds
operators to configure reverse proxy to control X-Tenant-Id header.
F3: mergeAppTools JSDoc documents that tenant-scoped TOOLS keys are
not actively invalidated (matching clearEndpointConfigCache pattern).
F5: Abort handler getActiveJobIdsForUser call uses req.user.tenantId
(not req.user?.tenantId) — consistent with stream/status handlers.
F6: updateInterfacePermissions JSDoc clarifies SYSTEM_TENANT_ID
behavior — falls through to caller's ALS context.
F7: Extracted hasTenantMismatch() helper, replacing three identical
inline tenant guard blocks across stream/status/abort endpoints.
F9: scopedCacheKey JSDoc documents both passthrough cases (no context
and SYSTEM_TENANT_ID context).
* fix: clean userJobMap in evictOldest — same leak as cleanup()
2026-03-28 16:43:50 -04:00
|
|
|
const { AppService, logger, scopedCacheKey } = require('@librechat/data-schemas');
|
🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435)
* feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring
- Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with
`enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains`
- Add `MCPServerSource` type ('yaml' | 'config' | 'user') and `source`
field to `ParsedServerConfig`
- Change `dbSourced` determination from `!!config.dbId` to
`config.source === 'user'` across MCPManager, ConnectionsRepository,
UserConnectionManager, and MCPServerInspector
- Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB
* feat: three-layer MCPServersRegistry with config cache and lazy init
- Add `configCacheRepo` as third repository layer between YAML cache and
DB for admin-defined config-source MCP servers
- Implement `ensureConfigServers()` that identifies config-override servers
from resolved `getAppConfig()` mcpConfig, lazily inspects them, and
caches parsed configs with `source: 'config'`
- Add `lazyInitConfigServer()` with timeout, stub-on-failure, and
concurrent-init deduplication via `pendingConfigInits` map
- Extend `getAllServerConfigs()` with optional `configServers` param for
three-way merge: YAML → Config → User
- Add `getServerConfig()` lookup through config cache layer
- Add `invalidateConfigCache()` for clearing config-source inspection
results on admin config mutations
- Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on
DB-stored servers in `addServer()` and `addServerStub()`
* feat: wire tenant context into MCP controllers, services, and cache invalidation
- Resolve config-source servers via `getAppConfig({ role, tenantId })`
in `getMCPTools()` and `getMCPServersList()` controllers
- Pass `ensureConfigServers()` results through `getAllServerConfigs()`
for three-way merge of YAML + Config + User servers
- Add tenant/role context to `getMCPSetupData()` and connection status
routes via `getTenantId()` from ALS
- Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin
config mutations trigger re-inspection of config-source MCP servers
* feat: enforce tenantMcpPolicy on admin config mcpServers mutations
- Add `validateMcpServerPolicy()` helper that checks mcpServers against
operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant,
allowedTransports, allowedDomains)
- Wire validation into `upsertConfigOverrides` and `patchConfigField`
handlers — rejects with 403 when policy is violated
- Infer transport type from config shape (command → stdio, url protocol
→ websocket/sse, type field → streamable-http)
- Validate server domains against policy allowlist when configured
* revert: remove tenantMcpPolicy schema and enforcement
The existing admin config CRUD routes already provide the mechanism
for granular MCP server prepopulation (groups, roles, users). The
tenantMcpPolicy gating adds unnecessary complexity that can be
revisited if needed in the future.
- Remove tenantMcpPolicy from mcpSettings Zod schema
- Remove validateMcpServerPolicy helper and TenantMcpPolicy interface
- Remove policy enforcement from upsertConfigOverrides and
patchConfigField handlers
* test: update test assertions for source field and config-server wiring
- Use objectContaining in MCPServersRegistry reset test to account for
new source: 'yaml' field on CACHE-stored configs
- Add getTenantId and ensureConfigServers mocks to MCP route tests
- Add getAppConfig mock to route test Config service mock
- Update getMCPSetupData assertion to expect second options argument
- Update getAllServerConfigs assertions for new configServers parameter
* fix: disconnect active connections when config-source servers are evicted
When admin config overrides change and config-source MCP servers are
removed, the invalidation now proactively disconnects active connections
for evicted servers instead of leaving them lingering until timeout.
- Return evicted server names from invalidateConfigCache()
- Disconnect app-level connections for evicted servers in
clearMcpConfigCache() via MCPManager.appConnections.disconnect()
* fix: address code review findings (CRITICAL, MAJOR, MINOR)
CRITICAL fixes:
- Scope configCacheRepo keys by config content hash to prevent
cross-tenant cache poisoning when two tenants define the same
server name with different configurations
- Change dbSourced checks from `source === 'user'` to
`source !== 'yaml' && source !== 'config'` so undefined source
(pre-upgrade cached configs) fails closed to restricted mode
MAJOR fixes:
- Derive OAuth servers from already-computed mcpConfig instead of
calling getOAuthServers() separately — config-source OAuth servers
are now properly detected
- Add parseInt radix (10) and NaN guard with fallback to 30_000
for CONFIG_SERVER_INIT_TIMEOUT_MS
- Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in
ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Remove `if (role || tenantId)` guard in getMCPSetupData — config
servers now always resolve regardless of tenant context
MINOR fixes:
- Extract resolveAllMcpConfigs() helper in mcp controller to
eliminate 3x copy-pasted config resolution boilerplate
- Distinguish "not initialized" from real errors in
clearMcpConfigCache — log actual failures instead of swallowing
- Remove narrative inline comments per style guide
- Remove dead try/catch inside Promise.allSettled in
ensureConfigServers (inner method never throws)
- Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll()
calls per request
Test updates:
- Add ensureConfigServers mock to registry test fixtures
- Update getMCPSetupData assertions for inline OAuth derivation
* fix: address code review findings (CRITICAL, MAJOR, MINOR)
CRITICAL fixes:
- Break circular dependency: move CONFIG_CACHE_NAMESPACE from
MCPServersRegistry to ServerConfigsCacheFactory
- Fix dbSourced fail-closed: use source field when present, fall back to
legacy dbId check when absent (backward-compatible with pre-upgrade
cached configs that lack source field)
MAJOR fixes:
- Add CONFIG_CACHE_NAMESPACE to aggregate-key set in
ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests)
covering lazy init, stub-on-failure, cross-tenant isolation via config
hash keys, concurrent deduplication, merge order, and cache invalidation
MINOR fixes:
- Update MCPServerInspector test assertion for dbSourced change
* fix: restore getServerConfig lookup for config-source servers (NEW-1)
Add configNameToKey map that indexes server name → hash-based cache key
for O(1) lookup by name in getServerConfig. This restores the config
cache layer that was dropped when hash-based keys were introduced.
Without this fix, config-source servers appeared in tool listings
(via getAllServerConfigs) but getServerConfig returned undefined,
breaking all connection and tool call paths.
- Populate configNameToKey in ensureSingleConfigServer
- Clear configNameToKey in invalidateConfigCache and reset
- Clear stale read-through cache entries after lazy init
- Remove dead code in invalidateConfigCache (config.title, key parsing)
- Add getServerConfig tests for config-source server lookup
* fix: eliminate configNameToKey race via caller-provided configServers param
Replace the process-global configNameToKey map (last-writer-wins under
concurrent multi-tenant load) with a configServers parameter on
getServerConfig. Callers pass the pre-resolved config servers map
directly — no shared mutable state, no cross-tenant race.
- Add optional configServers param to getServerConfig; when provided,
returns matching config directly without any global lookup
- Remove configNameToKey map entirely (was the source of the race)
- Extract server names from cache keys via lastIndexOf in
invalidateConfigCache (safe for names containing colons)
- Use mcpConfig[serverName] directly in getMCPTools instead of a
redundant getServerConfig call
- Add cross-tenant isolation test for getServerConfig
* fix: populate read-through cache after config server lazy init
After lazyInitConfigServer succeeds, write the parsed config to
readThroughCache keyed by serverName so that getServerConfig calls
from ConnectionsRepository, UserConnectionManager, and
MCPManager.callTool find the config without needing configServers.
Without this, config-source servers appeared in tool listings but
every connection attempt and tool call returned undefined.
* fix: user-scoped getServerConfig fallback to server-only cache key
When getServerConfig is called with a userId (e.g., from callTool or
UserConnectionManager), the cache key is serverName::userId. Config-source
servers are cached under the server-only key (no userId). Add a fallback
so user-scoped lookups find config-source servers in the read-through cache.
* fix: configCacheRepo fallback, isUserSourced DRY, cross-process race
CRITICAL: Add findInConfigCache fallback in getServerConfig so
config-source servers remain reachable after readThroughCache TTL
expires (5s). Without this, every tool call after 5s returned
undefined for config-source servers.
MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace
all 5 inline dbSourced ternary expressions (MCPManager x2,
ConnectionsRepository, UserConnectionManager, MCPServerInspector).
MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when
configCacheRepo.add throws (key exists from another process), fall
back to reading the existing entry instead of returning undefined.
MINOR: Parallelize invalidateConfigCache awaits with Promise.all.
Remove redundant .catch(() => {}) inside Promise.allSettled.
Tighten dedup test assertion to toBe(1).
Add TTL-expiry tests for getServerConfig (with and without userId).
* feat: thread configServers through getAppToolFunctions and formatInstructionsForContext
Add optional configServers parameter to getAppToolFunctions,
getInstructions, and formatInstructionsForContext so config-source
server tools and instructions are visible to agent initialization
and context injection paths.
Existing callers (boot-time init, tests) pass no argument and
continue to work unchanged. Agent runtime paths can now thread
resolved config servers from request context.
* fix: stale failure stubs retry after 5 min, upsert for cross-process races
- Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried
instead of permanently disabling config-source servers after transient
errors (DNS outage, cold-start race)
- Extract upsertConfigCache() helper that tries add then falls back to
update, preventing cross-process Redis races where a second instance's
successful inspection result was discarded
- Add test for stale-stub retry after CONFIG_STUB_RETRY_MS
* fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup
- Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so
CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs
were always considered stale (updatedAt ?? 0 → epoch → always expired)
- Add null guard for rawConfig in MCPManager.callTool before passing to
preProcessGraphTokens — prevents unsafe `as` cast on undefined
- Log double-failure in upsertConfigCache instead of silently swallowing
- Replace module-scope Date.now monkey-patch with jest.useFakeTimers /
jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests
* fix: server-only readThrough fallback only returns truthy values
Prevents a cached undefined from a prior no-userId lookup from
short-circuiting the DB query on a subsequent userId-scoped lookup.
* fix: remove findInConfigCache to eliminate cross-tenant config leakage
The findInConfigCache prefix scan (serverName:*) could return any
tenant's config after readThrough TTL expires, violating tenant
isolation. Config-source servers are now ONLY resolvable through:
1. The configServers param (callers with tenant context from ALS)
2. The readThrough cache (populated by ensureSingleConfigServer,
5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs)
Connection/tool-call paths without tenant context rely exclusively on
the readThrough cache. If it expires before the next HTTP request
repopulates it, the server is not found — which is correct because
there is no tenant context to determine which config to return.
- Remove findInConfigCache method and its call in getServerConfig
- Update server-only readThrough fallback to only return truthy values
(prevents cached undefined from short-circuiting user-scoped DB lookup)
- Update tests to document tenant isolation behavior after cache expiry
* style: fix import order per AGENTS.md conventions
Sort package imports shortest-to-longest, local imports longest-to-shortest
across MCPServersRegistry, ConnectionsRepository, MCPManager,
UserConnectionManager, and MCPServerInspector.
* fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures
Thread pre-resolved serverConfig from tool creation context into
callTool, removing dependency on the readThrough cache for config-source
servers. This fixes two issues:
- Cross-tenant contamination: the readThrough cache key was unscoped
(just serverName), so concurrent multi-tenant requests for same-named
servers would overwrite each other's entries
- TTL expiry: tool calls happening >5s after config resolution would
fail with "Configuration not found" because the readThrough entry
had expired
Changes:
- Add optional serverConfig param to MCPManager.callTool — uses
provided config directly, falling back to getServerConfig lookup
for YAML/user servers
- Thread serverConfig from createMCPTool through createToolInstance
closure to callTool
- Remove readThrough write from ensureSingleConfigServer — config-source
servers are only accessible via configServers param (tenant-scoped)
- Remove server-only readThrough fallback from getServerConfig
- Increase config cache hash from 8 to 16 hex chars (64-bit)
- Add isUserSourced boundary tests for all source/dbId combinations
- Fix double Object.keys call in getMCPTools controller
- Update test assertions for new getServerConfig behavior
* fix: cache base configs for config-server users; narrow upsertConfigCache error handling
- Refactor getAllServerConfigs to separate base config fetch (YAML + DB)
from config-server layering. Base configs are cached via readThroughCacheAll
regardless of whether configServers is provided, eliminating uncached
MongoDB queries per request for config-server users
- Narrow upsertConfigCache catch to duplicate-key errors only;
infrastructure errors (Redis timeouts, network failures) now propagate
instead of being silently swallowed, preventing inspection storms
during outages
* fix: restore correct merge order and document upsert error matching
- Restore YAML → Config → User DB precedence in getAllServerConfigs
(user DB servers have highest precedence, matching the JSDoc contract)
- Add source comment on upsertConfigCache duplicate-key detection
linking to the two cache implementations that define the error message
* feat: complete config-source server support across all execution paths
Wire configServers through the entire agent execution pipeline so
config-source MCP servers are fully functional — not just visible in
listings but executable in agent sessions.
- Thread configServers into handleTools.js agent tool pipeline: resolve
config servers from tenant context before MCP tool iteration, pass to
getServerConfig, createMCPTools, and createMCPTool
- Thread configServers into agent instructions pipeline:
applyContextToAgent → getMCPInstructionsForServers →
formatInstructionsForContext, resolved in client.js before agent
context application
- Add configServers param to createMCPTool and createMCPTools for
reconnect path fallback
- Add source field to redactServerSecrets allowlist for client UI
differentiation of server tiers
- Narrow invalidateConfigCache to only clear readThroughCacheAll (merged
results), preserving YAML individual-server readThrough entries
- Update context.spec.ts assertions for new configServers parameter
* fix: add missing mocks for config-source server dependencies in client.test.js
Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added
to client.js but not reflected in the test file's jest.mock declarations.
* fix: update formatInstructionsForContext assertions for configServers param
The test assertions expected formatInstructionsForContext to be called with
only the server names array, but it now receives configServers as a second
argument after the config-source server feature wiring.
* fix: move configServers resolution before MCP tool loop to avoid TDZ
configServers was declared with `let` after the first tool loop but
referenced inside it via getServerConfig(), causing a ReferenceError
temporal dead zone. Move declaration and resolution before the loop,
using tools.some(mcpToolPattern) to gate the async resolution.
* fix: address review findings — cache bypass, discoverServerTools gap, DRY
- #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via
readThroughCacheAll) instead of bypassing it when configServers is present.
Extracts user-DB entries from cached base by diffing against YAML keys
to maintain YAML → Config → User DB merge order without extra MongoDB calls.
- #3: Add configServers param to ToolDiscoveryOptions and thread it through
discoverServerTools → getServerConfig so config-source servers are
discoverable during OAuth reconnection flows.
- #6: Replace inline import() type annotations in context.ts with proper
import type { ParsedServerConfig } per AGENTS.md conventions.
- #7: Extract resolveConfigServers(req) helper in MCP.js and use it from
handleTools.js and client.js, eliminating the duplicated 6-line config
resolution pattern.
- #10: Restore removed "why" comment explaining getLoaded() vs getAll()
choice in getMCPSetupData — documents non-obvious correctness constraint.
- #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs.
* fix: consolidate imports, reorder constants, fix YAML-DB merge edge case
- Merge duplicate @librechat/data-schemas requires in MCP.js into one
- Move resolveConfigServers after module-level constants
- Fix getAllServerConfigs edge case where user-DB entry overriding a
YAML entry with the same name was excluded from userDbConfigs; now
uses reference equality check to detect DB-overwritten YAML keys
* fix: replace fragile string-match error detection with proper upsert method
Add upsert() to IServerConfigsRepositoryInterface and all implementations
(InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle
error message string match ('already exists in cache') in upsertConfigCache
that was the only thing preventing cross-process init races from silently
discarding inspection results.
Each implementation handles add-or-update atomically:
- InMemory: direct Map.set()
- Redis: direct cache.set()
- RedisAggregateKey: read-modify-write under write lock
- DB: delegates to update() (DB servers use explicit add() with ACL setup)
* fix: wire configServers through remaining HTTP endpoints
- getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig
- reinitialize route: resolve configServers before getServerConfig
- auth-values route: resolve configServers before getServerConfig
- getOAuthHeaders: accept configServers param, thread from callers
- Update mcp.spec.js tests to mock getAllServerConfigs for GET by name
* fix: thread serverConfig through getConnection for config-source servers
Config-source servers exist only in configCacheRepo, not in YAML cache or
DB. When callTool → getConnection → getUserConnection → getServerConfig
runs without configServers, it returns undefined and throws. Fix by
threading the pre-resolved serverConfig (providedConfig) from callTool
through getConnection → getUserConnection → createUserConnectionInternal,
using it as a fallback before the registry lookup.
* fix: thread configServers through reinit, reconnect, and tool definition paths
Wire configServers through every remaining call chain that creates or
reconnects MCP server connections:
- reinitMCPServer: accepts serverConfig and configServers, uses them for
getServerConfig fallback, getConnection, and discoverServerTools
- reconnectServer: accepts and passes configServers to reinitMCPServer
- createMCPTools/createMCPTool: pass configServers to reconnectServer
- ToolService.loadToolDefinitionsWrapper: resolves configServers from req,
passes to both reinitMCPServer call sites
- reinitialize route: passes serverConfig and configServers to reinitMCPServer
* fix: address review findings — simplify merge, harden error paths, fix log labels
- Simplify getAllServerConfigs merge: replace fragile reference-equality
loop with direct spread { ...yamlConfigs, ...configServers, ...base }
- Guard upsertConfigCache in lazyInitConfigServer catch block so cache
failures don't mask the original inspection error
- Deduplicate getYamlServerNames cold-start with promise dedup pattern
- Remove dead `if (!mcpConfig)` guard in getMCPSetupData
- Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error
messages — now uses this.namespace for correct Config/App labeling
- Remove misleading OAuth callback comment about readThrough cache
- Move resolveConfigServers after module-level constants in MCP.js
* fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label
- Clear yamlServerNamesPromise on rejection so transient cache errors
don't permanently prevent ensureConfigServers from working
- Skip reinspectServer for config-source servers (source: 'config') in
reinitMCPServer — they lack a CACHE/DB storage location; retry is
handled by CONFIG_STUB_RETRY_MS in ensureConfigServers
- Use source field instead of dbId for storageLocation derivation
- Fix remaining hardcoded "App" in reset() leaderCheck message
* fix: persist oauthHeaders in flow state for config-source OAuth servers
The OAuth callback route has no JWT auth context and cannot resolve
config-source server configs. Previously, getOAuthHeaders would silently
return {} for config-source servers, dropping custom token exchange headers.
Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow
initiation (which has auth context), and the callback reads them from
the stored flow state with a fallback to the registry lookup for
YAML/user-DB servers.
* fix: update tests for getMCPSetupData null guard removal and ToolService mock
- MCP.spec.js: update test to expect graceful handling of null mcpConfig
instead of a throw (getAllServerConfigs always returns an object)
- MCP.js: add defensive || {} for Object.entries(mcpConfig) in case of
null from test mocks
- ToolService.spec.js: add missing mock for ~/server/services/MCP
(resolveConfigServers)
* fix: address review findings — DRY, naming, logging, dead code, defensive guards
- #1: Simplify getAllServerConfigs to single getBaseServerConfigs call,
eliminating redundant double-fetch of cacheConfigsRepo.getAll()
- #2: Add warning log when oauthHeaders absent from OAuth callback flow state
- #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller
imports shared helper instead of reimplementing
- #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider
in createToolInstance — these are actively used, not unused
- #5: Log rejected results from ensureConfigServers Promise.allSettled
so cache errors are visible instead of silently dropped
- #6: Remove dead 'MCP config not found' error handlers from routes
- #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache
- #8: Remove logger.error from withTimeout to prevent double-logging timeouts
- #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message
- #12: Use spread instead of mutation in addServer for immutability consistency
- Add upsert mock to ensureConfigServers.test.ts DB mock
- Update route tests for resolveAllMcpConfigs import change
* fix: restore correct merge priority, use immutable spread, fix test mock
- getAllServerConfigs: { ...configServers, ...base } so userDB wins over
configServers, matching documented "User DB (highest)" priority
- lazyInitConfigServer: use immutable spread instead of direct mutation
for parsedConfig.source, consistent with addServer fix
- Fix test to mock getAllServerConfigs as {} instead of null, remove
unnecessary || {} defensive guard in getMCPSetupData
* fix: error handling, stable hashing, flatten nesting, remove dead param
- Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with
graceful {} fallback so transient DB/cache errors don't crash tool pipeline
- Sort keys in configCacheKey JSON.stringify for deterministic hashing
regardless of object property insertion order
- Flatten clearMcpConfigCache from 3 nested try-catch to early returns;
document that user connections are cleaned up lazily (accepted tradeoff)
- Remove dead configServers param from getAppToolFunctions (never passed)
- Add security rationale comment for source field in redactServerSecrets
* fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision
The array replacer in JSON.stringify acts as a property allowlist at
every nesting depth, silently dropping nested keys like headers['X-API-Key'],
oauth.client_secret, etc. Two configs with different nested values but
identical top-level structure produced the same hash, causing cross-tenant
cache hits and potential credential contamination.
Switch to a function replacer that recursively sorts keys at all depths
without dropping any properties.
Also document the known gap in getOAuthServers: config-source OAuth
servers are not covered by auto-reconnection or uninstall cleanup
because callers lack request context.
* fix: move clearMcpConfigCache to packages/api to eliminate circular dependency
The function only depends on MCPServersRegistry and MCPManager, both of
which live in packages/api. Import it directly from @librechat/api in
the CJS layer instead of using dynamic require('~/config').
* chore: imports/fields ordering
* fix: address review findings — error handling, targeted lookup, test gaps
- Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so
getAppConfig/getAllServerConfigs failures propagate instead of masking
infrastructure errors as empty server lists.
- Use targeted getServerConfig in getMCPServerById instead of fetching
all server configs for a single-server lookup.
- Forward configServers to inner createMCPTool calls so reconnect path
works for config-source servers.
- Update getAllServerConfigs JSDoc to document disjoint-key design.
- Add OAuth callback oauthHeaders fallback tests (flow state present
vs registry fallback).
- Add resolveConfigServers/resolveAllMcpConfigs unit tests covering
happy path and error propagation.
* fix: add getOAuthReconnectionManager mock to OAuth callback tests
* chore: imports ordering
2026-03-28 10:36:43 -04:00
|
|
|
const { createAppConfigService, clearMcpConfigCache } = require('@librechat/api');
|
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation
Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.
- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
* feat: register tenant middleware and wrap startup/auth in runAsSystem()
- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
before user context is established (LDAP, SAML, OpenID, social login, AuthService)
* feat: thread tenantId through all getAppConfig callers
Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.
Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.
Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint
* feat: add config cache invalidation on admin mutations
- Add clearOverrideCache(tenantId?) to flush per-principal override caches
by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
(upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)
* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering
The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.
* fix: replace runAsSystem with baseOnly for pre-tenant code paths
App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.
All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.
* fix: address PR review findings — middleware ordering, types, cache safety
- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)
* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly
- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)
* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests
- requireJwtAuth: 5 tests verifying ALS tenant context is set after
passport auth, isolated between concurrent requests, and not set
when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
tenantId is threaded through, partial failure is handled gracefully,
and operations run in parallel via Promise.all (Finding 11)
* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping
- Forward passport errors in requireJwtAuth before entering tenant
middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
* fix: address second review — cache safety, code quality, test reliability
- Decouple cache invalidation from mutation response: fire-and-forget
with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock
* fix: move JSDoc to correct function after clearEndpointConfigCache extraction
* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry
Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.
* fix: remove return from tenantStorage.run to satisfy void middleware signature
* fix: address second review — cache safety, code quality, test reliability
- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
so partial failures are logged individually instead of producing one
undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
clearAppConfigCache rejects (not just the swallowed endpoint error)
* chore: reorder imports in index.js and app.js for consistency
- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
2026-03-26 17:35:00 -04:00
|
|
|
const { setCachedTools, invalidateCachedTools } = require('./getCachedTools');
|
2025-10-05 06:37:57 -04:00
|
|
|
const { loadAndFormatTools } = require('~/server/services/start/tools');
|
|
|
|
|
const loadCustomConfig = require('./loadCustomConfig');
|
2025-08-26 12:10:18 -04:00
|
|
|
const getLogStores = require('~/cache/getLogStores');
|
2025-10-05 06:37:57 -04:00
|
|
|
const paths = require('~/config/paths');
|
🎛️ feat: DB-Backed Per-Principal Config System (#12354)
* ✨ feat: Add Config schema, model, and methods for role-based DB config overrides
Add the database foundation for principal-based configuration overrides
(user, group, role) in data-schemas. Includes schema with tenantId and
tenant isolation, CRUD methods, and barrel exports.
* 🔧 fix: Add shebang and enforce LF line endings for git hooks
The pre-commit hook was missing #!/bin/sh, and core.autocrlf=true was
converting it to CRLF, both causing "Exec format error" on Windows.
Add .gitattributes to force LF for .husky/* and *.sh files.
* ✨ feat: Add admin config API routes with section-level capability checks
Add /api/admin/config endpoints for managing per-principal config
overrides (user, group, role). Handlers in @librechat/api use DI pattern
with section-level hasConfigCapability checks for granular access control.
Supports full overrides replacement, per-field PATCH via dot-paths, field
deletion, toggle active, and listing.
* 🐛 fix: Move deleteConfigField fieldPath from URL param to request body
The path-to-regexp wildcard syntax (:fieldPath(*)) is not supported by
the version used in Express. Send fieldPath in the DELETE request body
instead, which also avoids URL-encoding issues with dotted paths.
* ✨ feat: Wire config resolution into getAppConfig with override caching
Add mergeConfigOverrides utility in data-schemas for deep-merging DB
config overrides into base AppConfig by priority order.
Update getAppConfig to query DB for applicable configs when role/userId
is provided, with short-TTL caching and a hasAnyConfigs feature flag
for zero-cost when no DB configs exist.
Also: add unique compound index on Config schema, pass userId from
config middleware, and signal config changes from admin API handlers.
* 🔄 refactor: Extract getAppConfig logic into packages/api as TS service
Move override resolution, caching strategy, and signalConfigChange from
api/server/services/Config/app.js into packages/api/src/app/appConfigService.ts
using the DI factory pattern (createAppConfigService). The JS file becomes
a thin wiring layer injecting loadBaseConfig, cache, and DB dependencies.
* 🧹 chore: Rename configResolution.ts to resolution.ts
* ✨ feat: Move admin types & capabilities to librechat-data-provider
Move SystemCapabilities, CapabilityImplications, and utility functions
(hasImpliedCapability, expandImplications) from data-schemas to
data-provider so they are available to external consumers like the
admin panel without a data-schemas dependency.
Add API-friendly admin types: TAdminConfig, TAdminSystemGrant,
TAdminAuditLogEntry, TAdminGroup, TAdminMember, TAdminUserSearchResult,
TCapabilityCategory, and CAPABILITY_CATEGORIES.
data-schemas re-exports these from data-provider and extends with
config-schema-derived types (ConfigSection, SystemCapability union).
Bump version to 0.8.500.
* feat: Add JSON-serializable admin config API response types to data-schemas
Add AdminConfig, AdminConfigListResponse, AdminConfigResponse, and
AdminConfigDeleteResponse types so both LibreChat API handlers and the
admin panel can share the same response contract. Bump version to 0.0.41.
* refactor: Move admin capabilities & types from data-provider to data-schemas
SystemCapabilities, CapabilityImplications, utility functions,
CAPABILITY_CATEGORIES, and admin API response types should not be in
data-provider as it gets compiled into the frontend bundle, exposing
the capability surface. Moved everything to data-schemas (server-only).
All consumers already import from @librechat/data-schemas, so no
import changes needed elsewhere. Consolidated duplicate AdminConfig
type (was in both config.ts and admin.ts).
* chore: Bump @librechat/data-schemas to 0.0.42
* refactor: Reorganize admin capabilities into admin/ and types/admin.ts
Split systemCapabilities.ts following data-schemas conventions:
- Types (BaseSystemCapability, SystemCapability, AdminConfig, etc.)
→ src/types/admin.ts
- Runtime code (SystemCapabilities, CapabilityImplications, utilities)
→ src/admin/capabilities.ts
Revert data-provider version to 0.8.401 (no longer modified).
* chore: Fix import ordering, rename appConfigService to service
- Rename app/appConfigService.ts → app/service.ts (directory provides context)
- Fix import order in admin/config.ts, types/admin.ts, types/config.ts
- Add naming convention to AGENTS.md
* feat: Add DB base config support (role/__base__)
- Add BASE_CONFIG_PRINCIPAL_ID constant for reserved base config doc
- getApplicableConfigs always includes __base__ in queries
- getAppConfig queries DB even without role/userId when DB configs exist
- Bump @librechat/data-schemas to 0.0.43
* fix: Address PR review issues for admin config
- Add listAllConfigs method; listConfigs endpoint returns all active
configs instead of only __base__
- Normalize principalId to string in all config methods to prevent
ObjectId vs string mismatch on user/group lookups
- Block __proto__ and all dunder-prefixed segments in field path
validation to prevent prototype pollution
- Fix configVersion off-by-one: default to 0, guard pre('save') with
!isNew, use $inc on findOneAndUpdate
- Remove unused getApplicableConfigs from admin handler deps
* fix: Enable tree-shaking for data-schemas, bump packages
- Switch data-schemas Rollup output to preserveModules so each source
file becomes its own chunk; consumers (admin panel) can now import
just the modules they need without pulling in winston/mongoose/etc.
- Add sideEffects: false to data-schemas package.json
- Bump data-schemas to 0.0.44, data-provider to 0.8.402
* feat: add capabilities subpath export to data-schemas
Adds `@librechat/data-schemas/capabilities` subpath export so browser
consumers can import BASE_CONFIG_PRINCIPAL_ID and capability constants
without pulling in Node.js-only modules (winston, async_hooks, etc.).
Bump version to 0.0.45.
* fix: include dist/ in data-provider npm package
Add explicit files field so npm includes dist/types/ in the published
package. Without this, the root .gitignore exclusion of dist/ causes
npm to omit type declarations, breaking TypeScript consumers.
* chore: bump librechat-data-provider to 0.8.403
* feat: add GET /api/admin/config/base for raw AppConfig
Returns the full AppConfig (YAML + DB base merged) so the admin panel
can display actual config field values and structure. The startup config
endpoint (/api/config) returns TStartupConfig which is a different shape
meant for the frontend app.
* chore: imports order
* fix: address code review findings for admin config
Critical:
- Fix clearAppConfigCache: was deleting from wrong cache store (CONFIG_STORE
instead of APP_CONFIG), now clears BASE and HAS_DB_CONFIGS keys
- Eliminate race condition: patchConfigField and deleteConfigField now use
atomic MongoDB $set/$unset with dot-path notation instead of
read-modify-write cycles, removing the lost-update bug entirely
- Add patchConfigFields and unsetConfigField atomic DB methods
Major:
- Reorder cache check before principal resolution in getAppConfig so
getUserPrincipals DB query only fires on cache miss
- Replace '' as ConfigSection with typed BROAD_CONFIG_ACCESS constant
- Parallelize capability checks with Promise.all instead of sequential
awaits in for loops
- Use loose equality (== null) for cache miss check to handle both null
and undefined returns from cache implementations
- Set HAS_DB_CONFIGS_KEY to true on successful config fetch
Minor:
- Remove dead pre('save') hook from config schema (all writes use
findOneAndUpdate which bypasses document hooks)
- Consolidate duplicate type imports in resolution.ts
- Remove dead deepGet/deepSet/deepUnset functions (replaced by atomic ops)
- Add .sort({ priority: 1 }) to getApplicableConfigs query
- Rename _impliedBy to impliedByMap
* fix: self-referencing BROAD_CONFIG_ACCESS constant
* fix: replace type-cast sentinel with proper null parameter
Update hasConfigCapability to accept ConfigSection | null where null
means broad access check (MANAGE_CONFIGS or READ_CONFIGS only).
Removes the '' as ConfigSection type lie from admin config handlers.
* fix: remaining review findings + add tests
- listAllConfigs accepts optional { isActive } filter so admin listing
can show inactive configs (#9)
- Standardize session application to .session(session ?? null) across
all config DB methods (#15)
- Export isValidFieldPath and getTopLevelSection for testability
- Add 38 tests across 3 spec files:
- config.spec.ts (api): path validation, prototype pollution rejection
- resolution.spec.ts: deep merge, priority ordering, array replacement
- config.spec.ts (data-schemas): full CRUD, ObjectId normalization,
atomic $set/$unset, configVersion increment, toggle, __base__ query
* fix: address second code review findings
- Fix cross-user cache contamination: overrideCacheKey now handles
userId-without-role case with its own cache key (#1)
- Add broad capability check before DB lookup in getConfig to prevent
config existence enumeration (#2/#3)
- Move deleteConfigField fieldPath from request body to query parameter
for proxy/load balancer compatibility (#5)
- Derive BaseSystemCapability from SystemCapabilities const instead of
manual string union (#6)
- Return 201 on upsert creation, 200 on update (#11)
- Remove inline narration comments per AGENTS.md (#12)
- Type overrides as Partial<TCustomConfig> in DB methods and handler
deps (#13)
- Replace double as-unknown-as casts in resolution.ts with generic
deepMerge<T> (#14)
- Make override cache TTL injectable via AppConfigServiceDeps (#16)
- Add exhaustive never check in principalModel switch (#17)
* fix: remaining review findings — tests, rename, semantics
- Rename signalConfigChange → markConfigsDirty with JSDoc documenting
the stale-window tradeoff and overrideCacheTtl knob
- Fix DEFAULT_OVERRIDE_CACHE_TTL naming convention
- Add createAppConfigService tests (14 cases): cache behavior, feature
flag, cross-user key isolation, fallback on error, markConfigsDirty
- Add admin handler integration tests (13 cases): auth ordering,
201/200 on create/update, fieldPath from query param, markConfigsDirty
calls, capability checks
* fix: global flag corruption + empty overrides auth bypass
- Remove HAS_DB_CONFIGS_KEY=false optimization: a scoped query returning
no configs does not mean no configs exist globally. Setting the flag
false from a per-principal query short-circuited all subsequent users.
- Add broad manage capability check before section checks in
upsertConfigOverrides: empty overrides {} no longer bypasses auth.
* test: add regression and invariant tests for config system
Regression tests:
- Bug 1: User A's empty result does not short-circuit User B's overrides
- Bug 2: Empty overrides {} returns 403 without MANAGE_CONFIGS
Invariant tests (applied across ALL handlers):
- All 5 mutation handlers call markConfigsDirty on success
- All 5 mutation handlers return 401 without auth
- All 5 mutation handlers return 403 without capability
- All 3 read handlers return 403 without capability
* fix: third review pass — all findings addressed
Service (service.ts):
- Restore HAS_DB_CONFIGS=false for base-only queries (no role/userId)
so deployments with zero DB configs skip DB queries (#1)
- Resolve cache once at factory init instead of per-invocation (#8)
- Use BASE_CONFIG_PRINCIPAL_ID constant in overrideCacheKey (#10)
- Add JSDoc to clearAppConfigCache documenting stale-window (#4)
- Fix log message to not say "from YAML" (#14)
Admin handlers (config.ts):
- Use configVersion===1 for 201 vs 200, eliminating TOCTOU race (#2)
- Add Array.isArray guard on overrides body (#5)
- Import CapabilityUser from capabilities.ts, remove duplicate (#6)
- Replace as-unknown-as cast with targeted type assertion (#7)
- Add MAX_PATCH_ENTRIES=100 cap on entries array (#15)
- Reorder deleteConfigField to validate principalType first (#12)
- Export CapabilityUser from middleware/capabilities.ts
DB methods (config.ts):
- Remove isActive:true from patchConfigFields to prevent silent
reactivation of disabled configs (#3)
Schema (config.ts):
- Change principalId from Schema.Types.Mixed to String (#11)
Tests:
- Add patchConfigField unsafe fieldPath rejection test (#9)
- Add base-only HAS_DB_CONFIGS=false test (#1)
- Update 201/200 tests to use configVersion instead of findConfig (#2)
* fix: add read handler 401 invariant tests + document flag behavior
- Add invariant: all 3 read handlers return 401 without auth
- Document on markConfigsDirty that HAS_DB_CONFIGS stays true after
all configs are deleted until clearAppConfigCache or restart
* fix: remove HAS_DB_CONFIGS false optimization entirely
getApplicableConfigs([]) only queries for __base__, not all configs.
A deployment with role/group configs but no __base__ doc gets the
flag poisoned to false by a base-only query, silently ignoring all
scoped overrides. The optimization is not safe without a comprehensive
Config.exists() check, which adds its own DB cost. Removed entirely.
The flag is now write-once-true (set when configs are found or by
markConfigsDirty) and only cleared by clearAppConfigCache/restart.
* chore: reorder import statements in app.js for clarity
* refactor: remove HAS_DB_CONFIGS_KEY machinery entirely
The three-state flag (false/null/true) was the source of multiple bugs
across review rounds. Every attempt to safely set it to false was
defeated by getApplicableConfigs querying only a subset of principals.
Removed: HAS_DB_CONFIGS_KEY constant, all reads/writes of the flag,
markConfigsDirty (now a no-op concept), notifyChange wrapper, and all
tests that seeded false manually.
The per-user/role TTL cache (overrideCacheTtl, default 60s) is the
sole caching mechanism. On cache miss, getApplicableConfigs queries
the DB. This is one indexed query per user per TTL window — acceptable
for the config override use case.
* docs: rewrite admin panel remaining work with current state
* perf: cache empty override results to avoid repeated DB queries
When getApplicableConfigs returns no configs for a principal, cache
baseConfig under their override key with TTL. Without this, every
user with no per-principal overrides hits MongoDB on every request
after the 60s cache window expires.
* fix: add tenantId to cache keys + reject PUBLIC principal type
- Include tenantId in override cache keys to prevent cross-tenant
config contamination. Single-tenant deployments (tenantId undefined)
use '_' as placeholder — no behavior change for them.
- Reject PrincipalType.PUBLIC in admin config validation — PUBLIC has
no PrincipalModel and is never resolved by getApplicableConfigs,
so config docs for it would be dead data.
- Config middleware passes req.user.tenantId to getAppConfig.
* fix: fourth review pass findings
DB methods (config.ts):
- findConfigByPrincipal accepts { includeInactive } option so admin
GET can retrieve inactive configs (#5)
- upsertConfig catches E11000 duplicate key on concurrent upserts and
retries without upsert flag (#2)
- unsetConfigField no longer filters isActive:true, consistent with
patchConfigFields (#11)
- Typed filter objects replace Record<string, unknown> (#12)
Admin handlers (config.ts):
- patchConfigField: serial broad capability check before Promise.all
to pre-warm ALS principal cache, preventing N parallel DB calls (#3)
- isValidFieldPath rejects leading/trailing dots and consecutive
dots (#7)
- Duplicate fieldPaths in patch entries return 400 (#8)
- DEFAULT_PRIORITY named constant replaces hardcoded 10 (#14)
- Admin getConfig and patchConfigField pass includeInactive to
findConfigByPrincipal (#5)
- Route import uses barrel instead of direct file path (#13)
Resolution (resolution.ts):
- deepMerge has MAX_MERGE_DEPTH=10 guard to prevent stack overflow
from crafted deeply nested configs (#4)
* fix: final review cleanup
- Remove ADMIN_PANEL_REMAINING.md (local dev notes with Windows paths)
- Add empty-result caching regression test
- Add tenantId to AdminConfigDeps.getAppConfig type
- Restore exhaustive never check in principalModel switch
- Standardize toggleConfigActive session handling to options pattern
* fix: validate priority in patchConfigField handler
Add the same non-negative number validation for priority that
upsertConfigOverrides already has. Without this, invalid priority
values could be stored via PATCH and corrupt merge ordering.
* chore: remove planning doc from PR
* fix: correct stale cache key strings in service tests
* fix: clean up service tests and harden tenant sentinel
- Remove no-op cache delete lines from regression tests
- Change no-tenant sentinel from '_' to '__default__' to avoid
collision with a real tenant ID when multi-tenancy is enabled
- Remove unused CONFIG_STORE from AppConfigServiceDeps
* chore: bump @librechat/data-schemas to 0.0.46
* fix: block prototype-poisoning keys in deepMerge
Skip __proto__, constructor, and prototype keys during config merge
to prevent prototype pollution via PUT /api/admin/config overrides.
2026-03-25 19:39:29 -04:00
|
|
|
const db = require('~/models');
|
2025-09-10 16:46:54 -06:00
|
|
|
|
2025-10-05 06:37:57 -04:00
|
|
|
const loadBaseConfig = async () => {
|
|
|
|
|
/** @type {TCustomConfig} */
|
|
|
|
|
const config = (await loadCustomConfig()) ?? {};
|
|
|
|
|
/** @type {Record<string, FunctionTool>} */
|
|
|
|
|
const systemTools = loadAndFormatTools({
|
|
|
|
|
adminFilter: config.filteredTools,
|
|
|
|
|
adminIncluded: config.includedTools,
|
|
|
|
|
directory: paths.structuredTools,
|
|
|
|
|
});
|
|
|
|
|
return AppService({ config, paths, systemTools });
|
|
|
|
|
};
|
|
|
|
|
|
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation
Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.
- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
* feat: register tenant middleware and wrap startup/auth in runAsSystem()
- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
before user context is established (LDAP, SAML, OpenID, social login, AuthService)
* feat: thread tenantId through all getAppConfig callers
Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.
Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.
Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint
* feat: add config cache invalidation on admin mutations
- Add clearOverrideCache(tenantId?) to flush per-principal override caches
by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
(upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)
* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering
The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.
* fix: replace runAsSystem with baseOnly for pre-tenant code paths
App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.
All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.
* fix: address PR review findings — middleware ordering, types, cache safety
- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)
* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly
- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)
* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests
- requireJwtAuth: 5 tests verifying ALS tenant context is set after
passport auth, isolated between concurrent requests, and not set
when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
tenantId is threaded through, partial failure is handled gracefully,
and operations run in parallel via Promise.all (Finding 11)
* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping
- Forward passport errors in requireJwtAuth before entering tenant
middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
* fix: address second review — cache safety, code quality, test reliability
- Decouple cache invalidation from mutation response: fire-and-forget
with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock
* fix: move JSDoc to correct function after clearEndpointConfigCache extraction
* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry
Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.
* fix: remove return from tenantStorage.run to satisfy void middleware signature
* fix: address second review — cache safety, code quality, test reliability
- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
so partial failures are logged individually instead of producing one
undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
clearAppConfigCache rejects (not just the swallowed endpoint error)
* chore: reorder imports in index.js and app.js for consistency
- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
2026-03-26 17:35:00 -04:00
|
|
|
const { getAppConfig, clearAppConfigCache, clearOverrideCache } = createAppConfigService({
|
🎛️ feat: DB-Backed Per-Principal Config System (#12354)
* ✨ feat: Add Config schema, model, and methods for role-based DB config overrides
Add the database foundation for principal-based configuration overrides
(user, group, role) in data-schemas. Includes schema with tenantId and
tenant isolation, CRUD methods, and barrel exports.
* 🔧 fix: Add shebang and enforce LF line endings for git hooks
The pre-commit hook was missing #!/bin/sh, and core.autocrlf=true was
converting it to CRLF, both causing "Exec format error" on Windows.
Add .gitattributes to force LF for .husky/* and *.sh files.
* ✨ feat: Add admin config API routes with section-level capability checks
Add /api/admin/config endpoints for managing per-principal config
overrides (user, group, role). Handlers in @librechat/api use DI pattern
with section-level hasConfigCapability checks for granular access control.
Supports full overrides replacement, per-field PATCH via dot-paths, field
deletion, toggle active, and listing.
* 🐛 fix: Move deleteConfigField fieldPath from URL param to request body
The path-to-regexp wildcard syntax (:fieldPath(*)) is not supported by
the version used in Express. Send fieldPath in the DELETE request body
instead, which also avoids URL-encoding issues with dotted paths.
* ✨ feat: Wire config resolution into getAppConfig with override caching
Add mergeConfigOverrides utility in data-schemas for deep-merging DB
config overrides into base AppConfig by priority order.
Update getAppConfig to query DB for applicable configs when role/userId
is provided, with short-TTL caching and a hasAnyConfigs feature flag
for zero-cost when no DB configs exist.
Also: add unique compound index on Config schema, pass userId from
config middleware, and signal config changes from admin API handlers.
* 🔄 refactor: Extract getAppConfig logic into packages/api as TS service
Move override resolution, caching strategy, and signalConfigChange from
api/server/services/Config/app.js into packages/api/src/app/appConfigService.ts
using the DI factory pattern (createAppConfigService). The JS file becomes
a thin wiring layer injecting loadBaseConfig, cache, and DB dependencies.
* 🧹 chore: Rename configResolution.ts to resolution.ts
* ✨ feat: Move admin types & capabilities to librechat-data-provider
Move SystemCapabilities, CapabilityImplications, and utility functions
(hasImpliedCapability, expandImplications) from data-schemas to
data-provider so they are available to external consumers like the
admin panel without a data-schemas dependency.
Add API-friendly admin types: TAdminConfig, TAdminSystemGrant,
TAdminAuditLogEntry, TAdminGroup, TAdminMember, TAdminUserSearchResult,
TCapabilityCategory, and CAPABILITY_CATEGORIES.
data-schemas re-exports these from data-provider and extends with
config-schema-derived types (ConfigSection, SystemCapability union).
Bump version to 0.8.500.
* feat: Add JSON-serializable admin config API response types to data-schemas
Add AdminConfig, AdminConfigListResponse, AdminConfigResponse, and
AdminConfigDeleteResponse types so both LibreChat API handlers and the
admin panel can share the same response contract. Bump version to 0.0.41.
* refactor: Move admin capabilities & types from data-provider to data-schemas
SystemCapabilities, CapabilityImplications, utility functions,
CAPABILITY_CATEGORIES, and admin API response types should not be in
data-provider as it gets compiled into the frontend bundle, exposing
the capability surface. Moved everything to data-schemas (server-only).
All consumers already import from @librechat/data-schemas, so no
import changes needed elsewhere. Consolidated duplicate AdminConfig
type (was in both config.ts and admin.ts).
* chore: Bump @librechat/data-schemas to 0.0.42
* refactor: Reorganize admin capabilities into admin/ and types/admin.ts
Split systemCapabilities.ts following data-schemas conventions:
- Types (BaseSystemCapability, SystemCapability, AdminConfig, etc.)
→ src/types/admin.ts
- Runtime code (SystemCapabilities, CapabilityImplications, utilities)
→ src/admin/capabilities.ts
Revert data-provider version to 0.8.401 (no longer modified).
* chore: Fix import ordering, rename appConfigService to service
- Rename app/appConfigService.ts → app/service.ts (directory provides context)
- Fix import order in admin/config.ts, types/admin.ts, types/config.ts
- Add naming convention to AGENTS.md
* feat: Add DB base config support (role/__base__)
- Add BASE_CONFIG_PRINCIPAL_ID constant for reserved base config doc
- getApplicableConfigs always includes __base__ in queries
- getAppConfig queries DB even without role/userId when DB configs exist
- Bump @librechat/data-schemas to 0.0.43
* fix: Address PR review issues for admin config
- Add listAllConfigs method; listConfigs endpoint returns all active
configs instead of only __base__
- Normalize principalId to string in all config methods to prevent
ObjectId vs string mismatch on user/group lookups
- Block __proto__ and all dunder-prefixed segments in field path
validation to prevent prototype pollution
- Fix configVersion off-by-one: default to 0, guard pre('save') with
!isNew, use $inc on findOneAndUpdate
- Remove unused getApplicableConfigs from admin handler deps
* fix: Enable tree-shaking for data-schemas, bump packages
- Switch data-schemas Rollup output to preserveModules so each source
file becomes its own chunk; consumers (admin panel) can now import
just the modules they need without pulling in winston/mongoose/etc.
- Add sideEffects: false to data-schemas package.json
- Bump data-schemas to 0.0.44, data-provider to 0.8.402
* feat: add capabilities subpath export to data-schemas
Adds `@librechat/data-schemas/capabilities` subpath export so browser
consumers can import BASE_CONFIG_PRINCIPAL_ID and capability constants
without pulling in Node.js-only modules (winston, async_hooks, etc.).
Bump version to 0.0.45.
* fix: include dist/ in data-provider npm package
Add explicit files field so npm includes dist/types/ in the published
package. Without this, the root .gitignore exclusion of dist/ causes
npm to omit type declarations, breaking TypeScript consumers.
* chore: bump librechat-data-provider to 0.8.403
* feat: add GET /api/admin/config/base for raw AppConfig
Returns the full AppConfig (YAML + DB base merged) so the admin panel
can display actual config field values and structure. The startup config
endpoint (/api/config) returns TStartupConfig which is a different shape
meant for the frontend app.
* chore: imports order
* fix: address code review findings for admin config
Critical:
- Fix clearAppConfigCache: was deleting from wrong cache store (CONFIG_STORE
instead of APP_CONFIG), now clears BASE and HAS_DB_CONFIGS keys
- Eliminate race condition: patchConfigField and deleteConfigField now use
atomic MongoDB $set/$unset with dot-path notation instead of
read-modify-write cycles, removing the lost-update bug entirely
- Add patchConfigFields and unsetConfigField atomic DB methods
Major:
- Reorder cache check before principal resolution in getAppConfig so
getUserPrincipals DB query only fires on cache miss
- Replace '' as ConfigSection with typed BROAD_CONFIG_ACCESS constant
- Parallelize capability checks with Promise.all instead of sequential
awaits in for loops
- Use loose equality (== null) for cache miss check to handle both null
and undefined returns from cache implementations
- Set HAS_DB_CONFIGS_KEY to true on successful config fetch
Minor:
- Remove dead pre('save') hook from config schema (all writes use
findOneAndUpdate which bypasses document hooks)
- Consolidate duplicate type imports in resolution.ts
- Remove dead deepGet/deepSet/deepUnset functions (replaced by atomic ops)
- Add .sort({ priority: 1 }) to getApplicableConfigs query
- Rename _impliedBy to impliedByMap
* fix: self-referencing BROAD_CONFIG_ACCESS constant
* fix: replace type-cast sentinel with proper null parameter
Update hasConfigCapability to accept ConfigSection | null where null
means broad access check (MANAGE_CONFIGS or READ_CONFIGS only).
Removes the '' as ConfigSection type lie from admin config handlers.
* fix: remaining review findings + add tests
- listAllConfigs accepts optional { isActive } filter so admin listing
can show inactive configs (#9)
- Standardize session application to .session(session ?? null) across
all config DB methods (#15)
- Export isValidFieldPath and getTopLevelSection for testability
- Add 38 tests across 3 spec files:
- config.spec.ts (api): path validation, prototype pollution rejection
- resolution.spec.ts: deep merge, priority ordering, array replacement
- config.spec.ts (data-schemas): full CRUD, ObjectId normalization,
atomic $set/$unset, configVersion increment, toggle, __base__ query
* fix: address second code review findings
- Fix cross-user cache contamination: overrideCacheKey now handles
userId-without-role case with its own cache key (#1)
- Add broad capability check before DB lookup in getConfig to prevent
config existence enumeration (#2/#3)
- Move deleteConfigField fieldPath from request body to query parameter
for proxy/load balancer compatibility (#5)
- Derive BaseSystemCapability from SystemCapabilities const instead of
manual string union (#6)
- Return 201 on upsert creation, 200 on update (#11)
- Remove inline narration comments per AGENTS.md (#12)
- Type overrides as Partial<TCustomConfig> in DB methods and handler
deps (#13)
- Replace double as-unknown-as casts in resolution.ts with generic
deepMerge<T> (#14)
- Make override cache TTL injectable via AppConfigServiceDeps (#16)
- Add exhaustive never check in principalModel switch (#17)
* fix: remaining review findings — tests, rename, semantics
- Rename signalConfigChange → markConfigsDirty with JSDoc documenting
the stale-window tradeoff and overrideCacheTtl knob
- Fix DEFAULT_OVERRIDE_CACHE_TTL naming convention
- Add createAppConfigService tests (14 cases): cache behavior, feature
flag, cross-user key isolation, fallback on error, markConfigsDirty
- Add admin handler integration tests (13 cases): auth ordering,
201/200 on create/update, fieldPath from query param, markConfigsDirty
calls, capability checks
* fix: global flag corruption + empty overrides auth bypass
- Remove HAS_DB_CONFIGS_KEY=false optimization: a scoped query returning
no configs does not mean no configs exist globally. Setting the flag
false from a per-principal query short-circuited all subsequent users.
- Add broad manage capability check before section checks in
upsertConfigOverrides: empty overrides {} no longer bypasses auth.
* test: add regression and invariant tests for config system
Regression tests:
- Bug 1: User A's empty result does not short-circuit User B's overrides
- Bug 2: Empty overrides {} returns 403 without MANAGE_CONFIGS
Invariant tests (applied across ALL handlers):
- All 5 mutation handlers call markConfigsDirty on success
- All 5 mutation handlers return 401 without auth
- All 5 mutation handlers return 403 without capability
- All 3 read handlers return 403 without capability
* fix: third review pass — all findings addressed
Service (service.ts):
- Restore HAS_DB_CONFIGS=false for base-only queries (no role/userId)
so deployments with zero DB configs skip DB queries (#1)
- Resolve cache once at factory init instead of per-invocation (#8)
- Use BASE_CONFIG_PRINCIPAL_ID constant in overrideCacheKey (#10)
- Add JSDoc to clearAppConfigCache documenting stale-window (#4)
- Fix log message to not say "from YAML" (#14)
Admin handlers (config.ts):
- Use configVersion===1 for 201 vs 200, eliminating TOCTOU race (#2)
- Add Array.isArray guard on overrides body (#5)
- Import CapabilityUser from capabilities.ts, remove duplicate (#6)
- Replace as-unknown-as cast with targeted type assertion (#7)
- Add MAX_PATCH_ENTRIES=100 cap on entries array (#15)
- Reorder deleteConfigField to validate principalType first (#12)
- Export CapabilityUser from middleware/capabilities.ts
DB methods (config.ts):
- Remove isActive:true from patchConfigFields to prevent silent
reactivation of disabled configs (#3)
Schema (config.ts):
- Change principalId from Schema.Types.Mixed to String (#11)
Tests:
- Add patchConfigField unsafe fieldPath rejection test (#9)
- Add base-only HAS_DB_CONFIGS=false test (#1)
- Update 201/200 tests to use configVersion instead of findConfig (#2)
* fix: add read handler 401 invariant tests + document flag behavior
- Add invariant: all 3 read handlers return 401 without auth
- Document on markConfigsDirty that HAS_DB_CONFIGS stays true after
all configs are deleted until clearAppConfigCache or restart
* fix: remove HAS_DB_CONFIGS false optimization entirely
getApplicableConfigs([]) only queries for __base__, not all configs.
A deployment with role/group configs but no __base__ doc gets the
flag poisoned to false by a base-only query, silently ignoring all
scoped overrides. The optimization is not safe without a comprehensive
Config.exists() check, which adds its own DB cost. Removed entirely.
The flag is now write-once-true (set when configs are found or by
markConfigsDirty) and only cleared by clearAppConfigCache/restart.
* chore: reorder import statements in app.js for clarity
* refactor: remove HAS_DB_CONFIGS_KEY machinery entirely
The three-state flag (false/null/true) was the source of multiple bugs
across review rounds. Every attempt to safely set it to false was
defeated by getApplicableConfigs querying only a subset of principals.
Removed: HAS_DB_CONFIGS_KEY constant, all reads/writes of the flag,
markConfigsDirty (now a no-op concept), notifyChange wrapper, and all
tests that seeded false manually.
The per-user/role TTL cache (overrideCacheTtl, default 60s) is the
sole caching mechanism. On cache miss, getApplicableConfigs queries
the DB. This is one indexed query per user per TTL window — acceptable
for the config override use case.
* docs: rewrite admin panel remaining work with current state
* perf: cache empty override results to avoid repeated DB queries
When getApplicableConfigs returns no configs for a principal, cache
baseConfig under their override key with TTL. Without this, every
user with no per-principal overrides hits MongoDB on every request
after the 60s cache window expires.
* fix: add tenantId to cache keys + reject PUBLIC principal type
- Include tenantId in override cache keys to prevent cross-tenant
config contamination. Single-tenant deployments (tenantId undefined)
use '_' as placeholder — no behavior change for them.
- Reject PrincipalType.PUBLIC in admin config validation — PUBLIC has
no PrincipalModel and is never resolved by getApplicableConfigs,
so config docs for it would be dead data.
- Config middleware passes req.user.tenantId to getAppConfig.
* fix: fourth review pass findings
DB methods (config.ts):
- findConfigByPrincipal accepts { includeInactive } option so admin
GET can retrieve inactive configs (#5)
- upsertConfig catches E11000 duplicate key on concurrent upserts and
retries without upsert flag (#2)
- unsetConfigField no longer filters isActive:true, consistent with
patchConfigFields (#11)
- Typed filter objects replace Record<string, unknown> (#12)
Admin handlers (config.ts):
- patchConfigField: serial broad capability check before Promise.all
to pre-warm ALS principal cache, preventing N parallel DB calls (#3)
- isValidFieldPath rejects leading/trailing dots and consecutive
dots (#7)
- Duplicate fieldPaths in patch entries return 400 (#8)
- DEFAULT_PRIORITY named constant replaces hardcoded 10 (#14)
- Admin getConfig and patchConfigField pass includeInactive to
findConfigByPrincipal (#5)
- Route import uses barrel instead of direct file path (#13)
Resolution (resolution.ts):
- deepMerge has MAX_MERGE_DEPTH=10 guard to prevent stack overflow
from crafted deeply nested configs (#4)
* fix: final review cleanup
- Remove ADMIN_PANEL_REMAINING.md (local dev notes with Windows paths)
- Add empty-result caching regression test
- Add tenantId to AdminConfigDeps.getAppConfig type
- Restore exhaustive never check in principalModel switch
- Standardize toggleConfigActive session handling to options pattern
* fix: validate priority in patchConfigField handler
Add the same non-negative number validation for priority that
upsertConfigOverrides already has. Without this, invalid priority
values could be stored via PATCH and corrupt merge ordering.
* chore: remove planning doc from PR
* fix: correct stale cache key strings in service tests
* fix: clean up service tests and harden tenant sentinel
- Remove no-op cache delete lines from regression tests
- Change no-tenant sentinel from '_' to '__default__' to avoid
collision with a real tenant ID when multi-tenancy is enabled
- Remove unused CONFIG_STORE from AppConfigServiceDeps
* chore: bump @librechat/data-schemas to 0.0.46
* fix: block prototype-poisoning keys in deepMerge
Skip __proto__, constructor, and prototype keys during config merge
to prevent prototype pollution via PUT /api/admin/config overrides.
2026-03-25 19:39:29 -04:00
|
|
|
loadBaseConfig,
|
|
|
|
|
setCachedTools,
|
|
|
|
|
getCache: getLogStores,
|
|
|
|
|
cacheKeys: CacheKeys,
|
|
|
|
|
getApplicableConfigs: db.getApplicableConfigs,
|
|
|
|
|
getUserPrincipals: db.getUserPrincipals,
|
|
|
|
|
});
|
2025-08-26 12:10:18 -04:00
|
|
|
|
🏗️ feat: bulkWrite isolation, pre-auth context, strict-mode fixes (#12445)
* fix: wrap seedDatabase() in runAsSystem() for strict tenant mode
seedDatabase() was called without tenant context at startup, causing
every Mongoose operation inside it to throw when
TENANT_ISOLATION_STRICT=true. Wrapping in runAsSystem() gives it the
SYSTEM_TENANT_ID sentinel so the isolation plugin skips filtering,
matching the pattern already used for performStartupChecks and
updateInterfacePermissions.
* fix: chain tenantContextMiddleware in optionalJwtAuth
optionalJwtAuth populated req.user but never established ALS tenant
context, unlike requireJwtAuth which chains tenantContextMiddleware
after successful auth. Authenticated users hitting routes with
optionalJwtAuth (e.g. /api/banner) had no tenant isolation.
* feat: tenant-safe bulkWrite wrapper and call-site migration
Mongoose's bulkWrite() does not trigger schema-level middleware hooks,
so the applyTenantIsolation plugin cannot intercept it. This adds a
tenantSafeBulkWrite() utility that injects the current ALS tenant
context into every operation's filter/document before delegating to
native bulkWrite.
Migrates all 8 runtime bulkWrite call sites:
- agentCategory (seedCategories, ensureDefaultCategories)
- conversation (bulkSaveConvos)
- message (bulkSaveMessages)
- file (batchUpdateFiles)
- conversationTag (updateTagsForConversation, bulkIncrementTagCounts)
- aclEntry (bulkWriteAclEntries)
systemGrant.seedSystemGrants is intentionally not migrated — it uses
explicit tenantId: { $exists: false } filters and is exempt from the
isolation plugin.
* feat: pre-auth tenant middleware and tenant-scoped config cache
Adds preAuthTenantMiddleware that reads X-Tenant-Id from the request
header and wraps downstream in tenantStorage ALS context. Wired onto
/oauth, /api/auth, /api/config, and /api/share — unauthenticated
routes that need tenant scoping before JWT auth runs.
The /api/config cache key is now tenant-scoped
(STARTUP_CONFIG:${tenantId}) so multi-tenant deployments serve the
correct login page config per tenant.
The middleware is intentionally minimal — no subdomain parsing, no
OIDC claim extraction. The private fork's reverse proxy or auth
gateway sets the header.
* feat: accept optional tenantId in updateInterfacePermissions
When tenantId is provided, the function re-enters inside
tenantStorage.run({ tenantId }) so all downstream Mongoose queries
target that tenant's roles instead of the system context. This lets
the private fork's tenant provisioning flow call
updateInterfacePermissions per-tenant after creating tenant-scoped
ADMIN/USER roles.
* fix: tenant-filter $lookup in getPromptGroup aggregation
The $lookup stage in getPromptGroup() queried the prompts collection
without tenant filtering. While the outer PromptGroup aggregate is
protected by the tenantIsolation plugin's pre('aggregate') hook,
$lookup runs as an internal MongoDB operation that bypasses Mongoose
hooks entirely.
Converts from simple field-based $lookup to pipeline-based $lookup
with an explicit tenantId match when tenant context is active.
* fix: replace field-level unique indexes with tenant-scoped compounds
Field-level unique:true creates a globally-unique single-field index in
MongoDB, which would cause insert failures across tenants sharing the
same ID values.
- agent.id: removed field-level unique, added { id, tenantId } compound
- convo.conversationId: removed field-level unique (compound at line 50
already exists: { conversationId, user, tenantId })
- message.messageId: removed field-level unique (compound at line 165
already exists: { messageId, user, tenantId })
- preset.presetId: removed field-level unique, added { presetId, tenantId }
compound
* fix: scope MODELS_CONFIG, ENDPOINT_CONFIG, PLUGINS, TOOLS caches by tenant
These caches store per-tenant configuration (available models, endpoint
settings, plugin availability, tool definitions) but were using global
cache keys. In multi-tenant mode, one tenant's cached config would be
served to all tenants.
Appends :${tenantId} to cache keys when tenant context is active.
Falls back to the unscoped key when no tenant context exists (backward
compatible for single-tenant OSS deployments).
Covers all read, write, and delete sites:
- ModelController.js: get/set MODELS_CONFIG
- PluginController.js: get/set PLUGINS, get/set TOOLS
- getEndpointsConfig.js: get/set/delete ENDPOINT_CONFIG
- app.js: delete ENDPOINT_CONFIG (clearEndpointConfigCache)
- mcp.js: delete TOOLS (updateMCPTools, mergeAppTools)
- importers.js: get ENDPOINT_CONFIG
* fix: add getTenantId to PluginController spec mock
The data-schemas mock was missing getTenantId, causing all
PluginController tests to throw when the controller calls
getTenantId() for tenant-scoped cache keys.
* fix: address review findings — migration, strict-mode, DRY, types
Addresses all CRITICAL, MAJOR, and MINOR review findings:
F1 (CRITICAL): Add agents, conversations, messages, presets to
SUPERSEDED_INDEXES in tenantIndexes.ts so dropSupersededTenantIndexes()
drops the old single-field unique indexes that block multi-tenant inserts.
F2 (CRITICAL): Unknown bulkWrite op types now throw in strict mode
instead of silently passing through without tenant injection.
F3 (MAJOR): Replace wildcard export with named export for
tenantSafeBulkWrite, hiding _resetBulkWriteStrictCache from the
public package API.
F5 (MAJOR): Restore AnyBulkWriteOperation<IAclEntry>[] typing on
bulkWriteAclEntries — the unparameterized wrapper accepts parameterized
ops as a subtype.
F7 (MAJOR): Fix config.js tenant precedence — JWT-derived
req.user.tenantId now takes priority over the X-Tenant-Id header for
authenticated requests.
F8 (MINOR): Extract scopedCacheKey() helper into tenantContext.ts and
replace all 11 inline occurrences across 7 files.
F9 (MINOR): Use simple localField/foreignField $lookup for the
non-tenant getPromptGroup path (more efficient index seeks).
F12 (NIT): Remove redundant BulkOp type alias.
F13 (NIT): Remove debug log that leaked raw tenantId.
* fix: add new superseded indexes to tenantIndexes test fixture
The test creates old indexes to verify the migration drops them.
Missing fixture entries for agents.id_1, conversations.conversationId_1,
messages.messageId_1, and presets.presetId_1 caused the count assertion
to fail (expected 22, got 18).
* fix: restore logger.warn for unknown bulk op types in non-strict mode
* fix: block SYSTEM_TENANT_ID sentinel from external header input
CRITICAL: preAuthTenantMiddleware accepted any string as X-Tenant-Id,
including '__SYSTEM__'. The tenantIsolation plugin treats SYSTEM_TENANT_ID
as an explicit bypass — skipping ALL query filters. A client sending
X-Tenant-Id: __SYSTEM__ to pre-auth routes (/api/share, /api/config,
/api/auth, /oauth) would execute Mongoose operations without tenant
isolation.
Fixes:
- preAuthTenantMiddleware rejects SYSTEM_TENANT_ID in header
- scopedCacheKey returns the base key (not key:__SYSTEM__) in system
context, preventing stale cache entries during runAsSystem()
- updateInterfacePermissions guards tenantId against SYSTEM_TENANT_ID
- $lookup pipeline separates $expr join from constant tenantId match
for better index utilization
- Regression test for sentinel rejection in preAuthTenant.spec.ts
- Remove redundant getTenantId() call in config.js
* test: add missing deleteMany/replaceOne coverage, fix vacuous ALS assertions
bulkWrite spec:
- deleteMany: verifies tenant-scoped deletion leaves other tenants untouched
- replaceOne: verifies tenantId injected into both filter and replacement
- replaceOne overwrite: verifies a conflicting tenantId in the replacement
document is overwritten by the ALS tenant (defense-in-depth)
- empty ops array: verifies graceful handling
preAuthTenant spec:
- All negative-case tests now use the capturedNext pattern to verify
getTenantId() inside the middleware's execution context, not the
test runner's outer frame (which was always undefined regardless)
* feat: tenant-isolate MESSAGES cache, FLOWS cache, and GenerationJobManager
MESSAGES cache (streamAudio.js):
- Cache key now uses scopedCacheKey(messageId) to prefix with tenantId,
preventing cross-tenant message content reads during TTS streaming.
FLOWS cache (FlowStateManager):
- getFlowKey() now generates ${type}:${tenantId}:${flowId} when tenant
context is active, isolating OAuth flow state per tenant.
GenerationJobManager:
- tenantId added to SerializableJobData and GenerationJobMetadata
- createJob() captures the current ALS tenant context (excluding
SYSTEM_TENANT_ID) and stores it in job metadata
- SSE subscription endpoint validates job.metadata.tenantId matches
req.user.tenantId, blocking cross-tenant stream access
- Both InMemoryJobStore and RedisJobStore updated to accept tenantId
* fix: add getTenantId and SYSTEM_TENANT_ID to MCP OAuth test mocks
FlowStateManager.getFlowKey() now calls getTenantId() for tenant-scoped
flow keys. The 4 MCP OAuth test files mock @librechat/data-schemas
without these exports, causing TypeError at runtime.
* fix: correct import ordering per AGENTS.md conventions
Package imports sorted shortest to longest line length, local imports
sorted longest to shortest — fixes ordering violations introduced by
our new imports across 8 files.
* fix: deserialize tenantId in RedisJobStore — cross-tenant SSE guard was no-op in Redis mode
serializeJob() writes tenantId to the Redis hash via Object.entries,
but deserializeJob() manually enumerates fields and omitted tenantId.
Every getJob() from Redis returned tenantId: undefined, causing the
SSE route's cross-tenant guard to short-circuit (undefined && ... → false).
* test: SSE tenant guard, FlowStateManager key consistency, ALS scope docs
SSE stream tenant tests (streamTenant.spec.js):
- Cross-tenant user accessing another tenant's stream → 403
- Same-tenant user accessing own stream → allowed
- OSS mode (no tenantId on job) → tenant check skipped
FlowStateManager tenant tests (manager.tenant.spec.ts):
- completeFlow finds flow created under same tenant context
- completeFlow does NOT find flow under different tenant context
- Unscoped flows are separate from tenant-scoped flows
Documentation:
- JSDoc on getFlowKey documenting ALS context consistency requirement
- Comment on streamAudio.js scopedCacheKey capture site
* fix: SSE stream tests hang on success path, remove internal fork references
The success-path tests entered the SSE streaming code which never
closes, causing timeout. Mock subscribe() to end the response
immediately. Restructured assertions to verify non-403/non-404.
Removed "private fork" and "OSS" references from code and test
descriptions — replaced with "deployment layer", "multi-tenant
deployments", and "single-tenant mode".
* fix: address review findings — test rigor, tenant ID validation, docs
F1: SSE stream tests now mock subscribe() with correct signature
(streamId, writeEvent, onDone, onError) and assert 200 status,
verifying the tenant guard actually allows through same-tenant users.
F2: completeFlow logs the attempted key and ALS tenantId when flow
is not found, so reverse proxy misconfiguration (missing X-Tenant-Id
on OAuth callback) produces an actionable warning.
F3/F10: preAuthTenantMiddleware validates tenant ID format — rejects
colons, special characters, and values exceeding 128 chars. Trims
whitespace. Prevents cache key collisions via crafted headers.
F4: Documented cache invalidation scope limitation in
clearEndpointConfigCache — only the calling tenant's key is cleared;
other tenants expire via TTL.
F7: getFlowKey JSDoc now lists all 8 methods requiring consistent
ALS context.
F8: Added dedicated scopedCacheKey unit tests — base key without
context, base key in system context, scoped key with tenant, no
ALS leakage across scope boundaries.
* fix: revert flow key tenant scoping, fix SSE test timing
FlowStateManager: Reverts tenant-scoped flow keys. OAuth callbacks
arrive without tenant ALS context (provider redirects don't carry
X-Tenant-Id), so completeFlow/failFlow would never find flows
created under tenant context. Flow IDs are random UUIDs with no
collision risk, and flow data is ephemeral (TTL-bounded).
SSE tests: Use process.nextTick for onDone callback so Express
response headers are flushed before res.write/res.end are called.
* fix: restore getTenantId import for completeFlow diagnostic log
* fix: correct completeFlow warning message, add missing flow test
The warning referenced X-Tenant-Id header consistency which was only
relevant when flow keys were tenant-scoped (since reverted). Updated
to list actual causes: TTL expiry, missing flow, or routing to a
different instance without shared Keyv storage.
Removed the getTenantId() call and import — no longer needed since
flow keys are unscoped.
Added test for the !flowState branch in completeFlow — verifies
return false and logger.warn on nonexistent flow ID.
* fix: add explicit return type to recursive updateInterfacePermissions
The recursive call (tenantId branch calls itself without tenantId)
causes TypeScript to infer circular return type 'any'. Adding
explicit Promise<void> satisfies the rollup typescript plugin.
* fix: update MCPOAuthRaceCondition test to match new completeFlow warning
* fix: clearEndpointConfigCache deletes both scoped and unscoped keys
Unauthenticated /api/endpoints requests populate the unscoped
ENDPOINT_CONFIG key. Admin config mutations clear only the
tenant-scoped key, leaving the unscoped entry stale indefinitely.
Now deletes both when in tenant context.
* fix: tenant guard on abort/status endpoints, warn logs, test coverage
F1: Add tenant guard to /chat/status/:conversationId and /chat/abort
matching the existing guard on /chat/stream/:streamId. The status
endpoint exposes aggregatedContent (AI response text) which requires
tenant-level access control.
F2: preAuthTenantMiddleware now logs warn for rejected __SYSTEM__
sentinel and malformed tenant IDs, providing observability for
bypass probing attempts.
F3: Abort fallback path (getActiveJobIdsForUser) now has tenant
check after resolving the job.
F4: Test for strict mode + SYSTEM_TENANT_ID — verifies runAsSystem
bypasses tenantSafeBulkWrite without throwing in strict mode.
F5: Test for job with tenantId + user without tenantId → 403.
F10: Regex uses idiomatic hyphen-at-start form.
F11: Test descriptions changed from "rejects" to "ignores" since
middleware calls next() (not 4xx).
Also fixes MCPOAuthRaceCondition test assertion to match updated
completeFlow warning message.
* fix: test coverage for logger.warn, status/abort guards, consistency
A: preAuthTenant spec now mocks logger and asserts warn calls for
__SYSTEM__ sentinel, malformed characters, and oversized headers.
B: streamTenant spec expanded with status and abort endpoint tests —
cross-tenant status returns 403, same-tenant returns 200 with body,
cross-tenant abort returns 403.
C: Abort endpoint uses req.user.tenantId (not req.user?.tenantId)
matching stream/status pattern — requireJwtAuth guarantees req.user.
D: Malformed header warning now includes ip in log metadata,
matching the sentinel warning for consistent SOC correlation.
* fix: assert ip field in malformed header warn tests
* fix: parallelize cache deletes, document tenant guard, fix import order
- clearEndpointConfigCache uses Promise.all for independent cache
deletes instead of sequential awaits
- SSE stream tenant guard has inline comment explaining backward-compat
behavior for untenanted legacy jobs
- conversation.ts local imports reordered longest-to-shortest per
AGENTS.md
* fix: tenant-qualify userJobs keys, document tenant guard backward-compat
Job store userJobs keys now include tenantId when available:
- Redis: stream:user:{tenantId:userId}:jobs (falls back to
stream:user:{userId}:jobs when no tenant)
- InMemory: composite key tenantId:userId in userJobMap
getActiveJobIdsByUser/getActiveJobIdsForUser accept optional tenantId
parameter, threaded through from req.user.tenantId at all call sites
(/chat/active and /chat/abort fallback).
Added inline comments on all three SSE tenant guards explaining the
backward-compat design: untenanted legacy jobs remain accessible
when the userId check passes.
* fix: parallelize cache deletes, document tenant guard, fix import order
Fix InMemoryJobStore.getActiveJobIdsByUser empty-set cleanup to use
the tenant-qualified userKey instead of bare userId — prevents
orphaned empty Sets accumulating in userJobMap for multi-tenant users.
Document cross-tenant staleness in clearEndpointConfigCache JSDoc —
other tenants' scoped keys expire via TTL, not active invalidation.
* fix: cleanup userJobMap leak, startup warning, DRY tenant guard, docs
F1: InMemoryJobStore.cleanup() now removes entries from userJobMap
before calling deleteJob, preventing orphaned empty Sets from
accumulating with tenant-qualified composite keys.
F2: Startup warning when TENANT_ISOLATION_STRICT is active — reminds
operators to configure reverse proxy to control X-Tenant-Id header.
F3: mergeAppTools JSDoc documents that tenant-scoped TOOLS keys are
not actively invalidated (matching clearEndpointConfigCache pattern).
F5: Abort handler getActiveJobIdsForUser call uses req.user.tenantId
(not req.user?.tenantId) — consistent with stream/status handlers.
F6: updateInterfacePermissions JSDoc clarifies SYSTEM_TENANT_ID
behavior — falls through to caller's ALS context.
F7: Extracted hasTenantMismatch() helper, replacing three identical
inline tenant guard blocks across stream/status/abort endpoints.
F9: scopedCacheKey JSDoc documents both passthrough cases (no context
and SYSTEM_TENANT_ID context).
* fix: clean userJobMap in evictOldest — same leak as cleanup()
2026-03-28 16:43:50 -04:00
|
|
|
/**
|
|
|
|
|
* Deletes ENDPOINT_CONFIG entries from CONFIG_STORE.
|
|
|
|
|
* Clears both the tenant-scoped key (if in tenant context) and the
|
|
|
|
|
* unscoped base key (populated by unauthenticated /api/endpoints calls).
|
|
|
|
|
* Other tenants' scoped keys are NOT actively cleared — they expire
|
|
|
|
|
* via TTL. Config mutations in one tenant do not propagate immediately
|
|
|
|
|
* to other tenants' endpoint config caches.
|
|
|
|
|
*/
|
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation
Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.
- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
* feat: register tenant middleware and wrap startup/auth in runAsSystem()
- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
before user context is established (LDAP, SAML, OpenID, social login, AuthService)
* feat: thread tenantId through all getAppConfig callers
Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.
Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.
Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint
* feat: add config cache invalidation on admin mutations
- Add clearOverrideCache(tenantId?) to flush per-principal override caches
by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
(upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)
* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering
The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.
* fix: replace runAsSystem with baseOnly for pre-tenant code paths
App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.
All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.
* fix: address PR review findings — middleware ordering, types, cache safety
- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)
* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly
- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)
* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests
- requireJwtAuth: 5 tests verifying ALS tenant context is set after
passport auth, isolated between concurrent requests, and not set
when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
tenantId is threaded through, partial failure is handled gracefully,
and operations run in parallel via Promise.all (Finding 11)
* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping
- Forward passport errors in requireJwtAuth before entering tenant
middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
* fix: address second review — cache safety, code quality, test reliability
- Decouple cache invalidation from mutation response: fire-and-forget
with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock
* fix: move JSDoc to correct function after clearEndpointConfigCache extraction
* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry
Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.
* fix: remove return from tenantStorage.run to satisfy void middleware signature
* fix: address second review — cache safety, code quality, test reliability
- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
so partial failures are logged individually instead of producing one
undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
clearAppConfigCache rejects (not just the swallowed endpoint error)
* chore: reorder imports in index.js and app.js for consistency
- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
2026-03-26 17:35:00 -04:00
|
|
|
async function clearEndpointConfigCache() {
|
|
|
|
|
try {
|
|
|
|
|
const configStore = getLogStores(CacheKeys.CONFIG_STORE);
|
🏗️ feat: bulkWrite isolation, pre-auth context, strict-mode fixes (#12445)
* fix: wrap seedDatabase() in runAsSystem() for strict tenant mode
seedDatabase() was called without tenant context at startup, causing
every Mongoose operation inside it to throw when
TENANT_ISOLATION_STRICT=true. Wrapping in runAsSystem() gives it the
SYSTEM_TENANT_ID sentinel so the isolation plugin skips filtering,
matching the pattern already used for performStartupChecks and
updateInterfacePermissions.
* fix: chain tenantContextMiddleware in optionalJwtAuth
optionalJwtAuth populated req.user but never established ALS tenant
context, unlike requireJwtAuth which chains tenantContextMiddleware
after successful auth. Authenticated users hitting routes with
optionalJwtAuth (e.g. /api/banner) had no tenant isolation.
* feat: tenant-safe bulkWrite wrapper and call-site migration
Mongoose's bulkWrite() does not trigger schema-level middleware hooks,
so the applyTenantIsolation plugin cannot intercept it. This adds a
tenantSafeBulkWrite() utility that injects the current ALS tenant
context into every operation's filter/document before delegating to
native bulkWrite.
Migrates all 8 runtime bulkWrite call sites:
- agentCategory (seedCategories, ensureDefaultCategories)
- conversation (bulkSaveConvos)
- message (bulkSaveMessages)
- file (batchUpdateFiles)
- conversationTag (updateTagsForConversation, bulkIncrementTagCounts)
- aclEntry (bulkWriteAclEntries)
systemGrant.seedSystemGrants is intentionally not migrated — it uses
explicit tenantId: { $exists: false } filters and is exempt from the
isolation plugin.
* feat: pre-auth tenant middleware and tenant-scoped config cache
Adds preAuthTenantMiddleware that reads X-Tenant-Id from the request
header and wraps downstream in tenantStorage ALS context. Wired onto
/oauth, /api/auth, /api/config, and /api/share — unauthenticated
routes that need tenant scoping before JWT auth runs.
The /api/config cache key is now tenant-scoped
(STARTUP_CONFIG:${tenantId}) so multi-tenant deployments serve the
correct login page config per tenant.
The middleware is intentionally minimal — no subdomain parsing, no
OIDC claim extraction. The private fork's reverse proxy or auth
gateway sets the header.
* feat: accept optional tenantId in updateInterfacePermissions
When tenantId is provided, the function re-enters inside
tenantStorage.run({ tenantId }) so all downstream Mongoose queries
target that tenant's roles instead of the system context. This lets
the private fork's tenant provisioning flow call
updateInterfacePermissions per-tenant after creating tenant-scoped
ADMIN/USER roles.
* fix: tenant-filter $lookup in getPromptGroup aggregation
The $lookup stage in getPromptGroup() queried the prompts collection
without tenant filtering. While the outer PromptGroup aggregate is
protected by the tenantIsolation plugin's pre('aggregate') hook,
$lookup runs as an internal MongoDB operation that bypasses Mongoose
hooks entirely.
Converts from simple field-based $lookup to pipeline-based $lookup
with an explicit tenantId match when tenant context is active.
* fix: replace field-level unique indexes with tenant-scoped compounds
Field-level unique:true creates a globally-unique single-field index in
MongoDB, which would cause insert failures across tenants sharing the
same ID values.
- agent.id: removed field-level unique, added { id, tenantId } compound
- convo.conversationId: removed field-level unique (compound at line 50
already exists: { conversationId, user, tenantId })
- message.messageId: removed field-level unique (compound at line 165
already exists: { messageId, user, tenantId })
- preset.presetId: removed field-level unique, added { presetId, tenantId }
compound
* fix: scope MODELS_CONFIG, ENDPOINT_CONFIG, PLUGINS, TOOLS caches by tenant
These caches store per-tenant configuration (available models, endpoint
settings, plugin availability, tool definitions) but were using global
cache keys. In multi-tenant mode, one tenant's cached config would be
served to all tenants.
Appends :${tenantId} to cache keys when tenant context is active.
Falls back to the unscoped key when no tenant context exists (backward
compatible for single-tenant OSS deployments).
Covers all read, write, and delete sites:
- ModelController.js: get/set MODELS_CONFIG
- PluginController.js: get/set PLUGINS, get/set TOOLS
- getEndpointsConfig.js: get/set/delete ENDPOINT_CONFIG
- app.js: delete ENDPOINT_CONFIG (clearEndpointConfigCache)
- mcp.js: delete TOOLS (updateMCPTools, mergeAppTools)
- importers.js: get ENDPOINT_CONFIG
* fix: add getTenantId to PluginController spec mock
The data-schemas mock was missing getTenantId, causing all
PluginController tests to throw when the controller calls
getTenantId() for tenant-scoped cache keys.
* fix: address review findings — migration, strict-mode, DRY, types
Addresses all CRITICAL, MAJOR, and MINOR review findings:
F1 (CRITICAL): Add agents, conversations, messages, presets to
SUPERSEDED_INDEXES in tenantIndexes.ts so dropSupersededTenantIndexes()
drops the old single-field unique indexes that block multi-tenant inserts.
F2 (CRITICAL): Unknown bulkWrite op types now throw in strict mode
instead of silently passing through without tenant injection.
F3 (MAJOR): Replace wildcard export with named export for
tenantSafeBulkWrite, hiding _resetBulkWriteStrictCache from the
public package API.
F5 (MAJOR): Restore AnyBulkWriteOperation<IAclEntry>[] typing on
bulkWriteAclEntries — the unparameterized wrapper accepts parameterized
ops as a subtype.
F7 (MAJOR): Fix config.js tenant precedence — JWT-derived
req.user.tenantId now takes priority over the X-Tenant-Id header for
authenticated requests.
F8 (MINOR): Extract scopedCacheKey() helper into tenantContext.ts and
replace all 11 inline occurrences across 7 files.
F9 (MINOR): Use simple localField/foreignField $lookup for the
non-tenant getPromptGroup path (more efficient index seeks).
F12 (NIT): Remove redundant BulkOp type alias.
F13 (NIT): Remove debug log that leaked raw tenantId.
* fix: add new superseded indexes to tenantIndexes test fixture
The test creates old indexes to verify the migration drops them.
Missing fixture entries for agents.id_1, conversations.conversationId_1,
messages.messageId_1, and presets.presetId_1 caused the count assertion
to fail (expected 22, got 18).
* fix: restore logger.warn for unknown bulk op types in non-strict mode
* fix: block SYSTEM_TENANT_ID sentinel from external header input
CRITICAL: preAuthTenantMiddleware accepted any string as X-Tenant-Id,
including '__SYSTEM__'. The tenantIsolation plugin treats SYSTEM_TENANT_ID
as an explicit bypass — skipping ALL query filters. A client sending
X-Tenant-Id: __SYSTEM__ to pre-auth routes (/api/share, /api/config,
/api/auth, /oauth) would execute Mongoose operations without tenant
isolation.
Fixes:
- preAuthTenantMiddleware rejects SYSTEM_TENANT_ID in header
- scopedCacheKey returns the base key (not key:__SYSTEM__) in system
context, preventing stale cache entries during runAsSystem()
- updateInterfacePermissions guards tenantId against SYSTEM_TENANT_ID
- $lookup pipeline separates $expr join from constant tenantId match
for better index utilization
- Regression test for sentinel rejection in preAuthTenant.spec.ts
- Remove redundant getTenantId() call in config.js
* test: add missing deleteMany/replaceOne coverage, fix vacuous ALS assertions
bulkWrite spec:
- deleteMany: verifies tenant-scoped deletion leaves other tenants untouched
- replaceOne: verifies tenantId injected into both filter and replacement
- replaceOne overwrite: verifies a conflicting tenantId in the replacement
document is overwritten by the ALS tenant (defense-in-depth)
- empty ops array: verifies graceful handling
preAuthTenant spec:
- All negative-case tests now use the capturedNext pattern to verify
getTenantId() inside the middleware's execution context, not the
test runner's outer frame (which was always undefined regardless)
* feat: tenant-isolate MESSAGES cache, FLOWS cache, and GenerationJobManager
MESSAGES cache (streamAudio.js):
- Cache key now uses scopedCacheKey(messageId) to prefix with tenantId,
preventing cross-tenant message content reads during TTS streaming.
FLOWS cache (FlowStateManager):
- getFlowKey() now generates ${type}:${tenantId}:${flowId} when tenant
context is active, isolating OAuth flow state per tenant.
GenerationJobManager:
- tenantId added to SerializableJobData and GenerationJobMetadata
- createJob() captures the current ALS tenant context (excluding
SYSTEM_TENANT_ID) and stores it in job metadata
- SSE subscription endpoint validates job.metadata.tenantId matches
req.user.tenantId, blocking cross-tenant stream access
- Both InMemoryJobStore and RedisJobStore updated to accept tenantId
* fix: add getTenantId and SYSTEM_TENANT_ID to MCP OAuth test mocks
FlowStateManager.getFlowKey() now calls getTenantId() for tenant-scoped
flow keys. The 4 MCP OAuth test files mock @librechat/data-schemas
without these exports, causing TypeError at runtime.
* fix: correct import ordering per AGENTS.md conventions
Package imports sorted shortest to longest line length, local imports
sorted longest to shortest — fixes ordering violations introduced by
our new imports across 8 files.
* fix: deserialize tenantId in RedisJobStore — cross-tenant SSE guard was no-op in Redis mode
serializeJob() writes tenantId to the Redis hash via Object.entries,
but deserializeJob() manually enumerates fields and omitted tenantId.
Every getJob() from Redis returned tenantId: undefined, causing the
SSE route's cross-tenant guard to short-circuit (undefined && ... → false).
* test: SSE tenant guard, FlowStateManager key consistency, ALS scope docs
SSE stream tenant tests (streamTenant.spec.js):
- Cross-tenant user accessing another tenant's stream → 403
- Same-tenant user accessing own stream → allowed
- OSS mode (no tenantId on job) → tenant check skipped
FlowStateManager tenant tests (manager.tenant.spec.ts):
- completeFlow finds flow created under same tenant context
- completeFlow does NOT find flow under different tenant context
- Unscoped flows are separate from tenant-scoped flows
Documentation:
- JSDoc on getFlowKey documenting ALS context consistency requirement
- Comment on streamAudio.js scopedCacheKey capture site
* fix: SSE stream tests hang on success path, remove internal fork references
The success-path tests entered the SSE streaming code which never
closes, causing timeout. Mock subscribe() to end the response
immediately. Restructured assertions to verify non-403/non-404.
Removed "private fork" and "OSS" references from code and test
descriptions — replaced with "deployment layer", "multi-tenant
deployments", and "single-tenant mode".
* fix: address review findings — test rigor, tenant ID validation, docs
F1: SSE stream tests now mock subscribe() with correct signature
(streamId, writeEvent, onDone, onError) and assert 200 status,
verifying the tenant guard actually allows through same-tenant users.
F2: completeFlow logs the attempted key and ALS tenantId when flow
is not found, so reverse proxy misconfiguration (missing X-Tenant-Id
on OAuth callback) produces an actionable warning.
F3/F10: preAuthTenantMiddleware validates tenant ID format — rejects
colons, special characters, and values exceeding 128 chars. Trims
whitespace. Prevents cache key collisions via crafted headers.
F4: Documented cache invalidation scope limitation in
clearEndpointConfigCache — only the calling tenant's key is cleared;
other tenants expire via TTL.
F7: getFlowKey JSDoc now lists all 8 methods requiring consistent
ALS context.
F8: Added dedicated scopedCacheKey unit tests — base key without
context, base key in system context, scoped key with tenant, no
ALS leakage across scope boundaries.
* fix: revert flow key tenant scoping, fix SSE test timing
FlowStateManager: Reverts tenant-scoped flow keys. OAuth callbacks
arrive without tenant ALS context (provider redirects don't carry
X-Tenant-Id), so completeFlow/failFlow would never find flows
created under tenant context. Flow IDs are random UUIDs with no
collision risk, and flow data is ephemeral (TTL-bounded).
SSE tests: Use process.nextTick for onDone callback so Express
response headers are flushed before res.write/res.end are called.
* fix: restore getTenantId import for completeFlow diagnostic log
* fix: correct completeFlow warning message, add missing flow test
The warning referenced X-Tenant-Id header consistency which was only
relevant when flow keys were tenant-scoped (since reverted). Updated
to list actual causes: TTL expiry, missing flow, or routing to a
different instance without shared Keyv storage.
Removed the getTenantId() call and import — no longer needed since
flow keys are unscoped.
Added test for the !flowState branch in completeFlow — verifies
return false and logger.warn on nonexistent flow ID.
* fix: add explicit return type to recursive updateInterfacePermissions
The recursive call (tenantId branch calls itself without tenantId)
causes TypeScript to infer circular return type 'any'. Adding
explicit Promise<void> satisfies the rollup typescript plugin.
* fix: update MCPOAuthRaceCondition test to match new completeFlow warning
* fix: clearEndpointConfigCache deletes both scoped and unscoped keys
Unauthenticated /api/endpoints requests populate the unscoped
ENDPOINT_CONFIG key. Admin config mutations clear only the
tenant-scoped key, leaving the unscoped entry stale indefinitely.
Now deletes both when in tenant context.
* fix: tenant guard on abort/status endpoints, warn logs, test coverage
F1: Add tenant guard to /chat/status/:conversationId and /chat/abort
matching the existing guard on /chat/stream/:streamId. The status
endpoint exposes aggregatedContent (AI response text) which requires
tenant-level access control.
F2: preAuthTenantMiddleware now logs warn for rejected __SYSTEM__
sentinel and malformed tenant IDs, providing observability for
bypass probing attempts.
F3: Abort fallback path (getActiveJobIdsForUser) now has tenant
check after resolving the job.
F4: Test for strict mode + SYSTEM_TENANT_ID — verifies runAsSystem
bypasses tenantSafeBulkWrite without throwing in strict mode.
F5: Test for job with tenantId + user without tenantId → 403.
F10: Regex uses idiomatic hyphen-at-start form.
F11: Test descriptions changed from "rejects" to "ignores" since
middleware calls next() (not 4xx).
Also fixes MCPOAuthRaceCondition test assertion to match updated
completeFlow warning message.
* fix: test coverage for logger.warn, status/abort guards, consistency
A: preAuthTenant spec now mocks logger and asserts warn calls for
__SYSTEM__ sentinel, malformed characters, and oversized headers.
B: streamTenant spec expanded with status and abort endpoint tests —
cross-tenant status returns 403, same-tenant returns 200 with body,
cross-tenant abort returns 403.
C: Abort endpoint uses req.user.tenantId (not req.user?.tenantId)
matching stream/status pattern — requireJwtAuth guarantees req.user.
D: Malformed header warning now includes ip in log metadata,
matching the sentinel warning for consistent SOC correlation.
* fix: assert ip field in malformed header warn tests
* fix: parallelize cache deletes, document tenant guard, fix import order
- clearEndpointConfigCache uses Promise.all for independent cache
deletes instead of sequential awaits
- SSE stream tenant guard has inline comment explaining backward-compat
behavior for untenanted legacy jobs
- conversation.ts local imports reordered longest-to-shortest per
AGENTS.md
* fix: tenant-qualify userJobs keys, document tenant guard backward-compat
Job store userJobs keys now include tenantId when available:
- Redis: stream:user:{tenantId:userId}:jobs (falls back to
stream:user:{userId}:jobs when no tenant)
- InMemory: composite key tenantId:userId in userJobMap
getActiveJobIdsByUser/getActiveJobIdsForUser accept optional tenantId
parameter, threaded through from req.user.tenantId at all call sites
(/chat/active and /chat/abort fallback).
Added inline comments on all three SSE tenant guards explaining the
backward-compat design: untenanted legacy jobs remain accessible
when the userId check passes.
* fix: parallelize cache deletes, document tenant guard, fix import order
Fix InMemoryJobStore.getActiveJobIdsByUser empty-set cleanup to use
the tenant-qualified userKey instead of bare userId — prevents
orphaned empty Sets accumulating in userJobMap for multi-tenant users.
Document cross-tenant staleness in clearEndpointConfigCache JSDoc —
other tenants' scoped keys expire via TTL, not active invalidation.
* fix: cleanup userJobMap leak, startup warning, DRY tenant guard, docs
F1: InMemoryJobStore.cleanup() now removes entries from userJobMap
before calling deleteJob, preventing orphaned empty Sets from
accumulating with tenant-qualified composite keys.
F2: Startup warning when TENANT_ISOLATION_STRICT is active — reminds
operators to configure reverse proxy to control X-Tenant-Id header.
F3: mergeAppTools JSDoc documents that tenant-scoped TOOLS keys are
not actively invalidated (matching clearEndpointConfigCache pattern).
F5: Abort handler getActiveJobIdsForUser call uses req.user.tenantId
(not req.user?.tenantId) — consistent with stream/status handlers.
F6: updateInterfacePermissions JSDoc clarifies SYSTEM_TENANT_ID
behavior — falls through to caller's ALS context.
F7: Extracted hasTenantMismatch() helper, replacing three identical
inline tenant guard blocks across stream/status/abort endpoints.
F9: scopedCacheKey JSDoc documents both passthrough cases (no context
and SYSTEM_TENANT_ID context).
* fix: clean userJobMap in evictOldest — same leak as cleanup()
2026-03-28 16:43:50 -04:00
|
|
|
const scoped = scopedCacheKey(CacheKeys.ENDPOINT_CONFIG);
|
|
|
|
|
const keys = [scoped];
|
|
|
|
|
if (scoped !== CacheKeys.ENDPOINT_CONFIG) {
|
|
|
|
|
keys.push(CacheKeys.ENDPOINT_CONFIG);
|
|
|
|
|
}
|
|
|
|
|
await Promise.all(keys.map((k) => configStore.delete(k)));
|
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation
Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.
- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
* feat: register tenant middleware and wrap startup/auth in runAsSystem()
- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
before user context is established (LDAP, SAML, OpenID, social login, AuthService)
* feat: thread tenantId through all getAppConfig callers
Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.
Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.
Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint
* feat: add config cache invalidation on admin mutations
- Add clearOverrideCache(tenantId?) to flush per-principal override caches
by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
(upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)
* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering
The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.
* fix: replace runAsSystem with baseOnly for pre-tenant code paths
App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.
All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.
* fix: address PR review findings — middleware ordering, types, cache safety
- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)
* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly
- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)
* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests
- requireJwtAuth: 5 tests verifying ALS tenant context is set after
passport auth, isolated between concurrent requests, and not set
when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
tenantId is threaded through, partial failure is handled gracefully,
and operations run in parallel via Promise.all (Finding 11)
* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping
- Forward passport errors in requireJwtAuth before entering tenant
middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
* fix: address second review — cache safety, code quality, test reliability
- Decouple cache invalidation from mutation response: fire-and-forget
with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock
* fix: move JSDoc to correct function after clearEndpointConfigCache extraction
* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry
Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.
* fix: remove return from tenantStorage.run to satisfy void middleware signature
* fix: address second review — cache safety, code quality, test reliability
- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
so partial failures are logged individually instead of producing one
undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
clearAppConfigCache rejects (not just the swallowed endpoint error)
* chore: reorder imports in index.js and app.js for consistency
- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
2026-03-26 17:35:00 -04:00
|
|
|
} catch {
|
|
|
|
|
// CONFIG_STORE or ENDPOINT_CONFIG may not exist — not critical
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
* Invalidate all config-related caches after an admin config mutation.
|
|
|
|
|
* Clears the base config, per-principal override caches, tool caches,
|
🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435)
* feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring
- Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with
`enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains`
- Add `MCPServerSource` type ('yaml' | 'config' | 'user') and `source`
field to `ParsedServerConfig`
- Change `dbSourced` determination from `!!config.dbId` to
`config.source === 'user'` across MCPManager, ConnectionsRepository,
UserConnectionManager, and MCPServerInspector
- Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB
* feat: three-layer MCPServersRegistry with config cache and lazy init
- Add `configCacheRepo` as third repository layer between YAML cache and
DB for admin-defined config-source MCP servers
- Implement `ensureConfigServers()` that identifies config-override servers
from resolved `getAppConfig()` mcpConfig, lazily inspects them, and
caches parsed configs with `source: 'config'`
- Add `lazyInitConfigServer()` with timeout, stub-on-failure, and
concurrent-init deduplication via `pendingConfigInits` map
- Extend `getAllServerConfigs()` with optional `configServers` param for
three-way merge: YAML → Config → User
- Add `getServerConfig()` lookup through config cache layer
- Add `invalidateConfigCache()` for clearing config-source inspection
results on admin config mutations
- Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on
DB-stored servers in `addServer()` and `addServerStub()`
* feat: wire tenant context into MCP controllers, services, and cache invalidation
- Resolve config-source servers via `getAppConfig({ role, tenantId })`
in `getMCPTools()` and `getMCPServersList()` controllers
- Pass `ensureConfigServers()` results through `getAllServerConfigs()`
for three-way merge of YAML + Config + User servers
- Add tenant/role context to `getMCPSetupData()` and connection status
routes via `getTenantId()` from ALS
- Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin
config mutations trigger re-inspection of config-source MCP servers
* feat: enforce tenantMcpPolicy on admin config mcpServers mutations
- Add `validateMcpServerPolicy()` helper that checks mcpServers against
operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant,
allowedTransports, allowedDomains)
- Wire validation into `upsertConfigOverrides` and `patchConfigField`
handlers — rejects with 403 when policy is violated
- Infer transport type from config shape (command → stdio, url protocol
→ websocket/sse, type field → streamable-http)
- Validate server domains against policy allowlist when configured
* revert: remove tenantMcpPolicy schema and enforcement
The existing admin config CRUD routes already provide the mechanism
for granular MCP server prepopulation (groups, roles, users). The
tenantMcpPolicy gating adds unnecessary complexity that can be
revisited if needed in the future.
- Remove tenantMcpPolicy from mcpSettings Zod schema
- Remove validateMcpServerPolicy helper and TenantMcpPolicy interface
- Remove policy enforcement from upsertConfigOverrides and
patchConfigField handlers
* test: update test assertions for source field and config-server wiring
- Use objectContaining in MCPServersRegistry reset test to account for
new source: 'yaml' field on CACHE-stored configs
- Add getTenantId and ensureConfigServers mocks to MCP route tests
- Add getAppConfig mock to route test Config service mock
- Update getMCPSetupData assertion to expect second options argument
- Update getAllServerConfigs assertions for new configServers parameter
* fix: disconnect active connections when config-source servers are evicted
When admin config overrides change and config-source MCP servers are
removed, the invalidation now proactively disconnects active connections
for evicted servers instead of leaving them lingering until timeout.
- Return evicted server names from invalidateConfigCache()
- Disconnect app-level connections for evicted servers in
clearMcpConfigCache() via MCPManager.appConnections.disconnect()
* fix: address code review findings (CRITICAL, MAJOR, MINOR)
CRITICAL fixes:
- Scope configCacheRepo keys by config content hash to prevent
cross-tenant cache poisoning when two tenants define the same
server name with different configurations
- Change dbSourced checks from `source === 'user'` to
`source !== 'yaml' && source !== 'config'` so undefined source
(pre-upgrade cached configs) fails closed to restricted mode
MAJOR fixes:
- Derive OAuth servers from already-computed mcpConfig instead of
calling getOAuthServers() separately — config-source OAuth servers
are now properly detected
- Add parseInt radix (10) and NaN guard with fallback to 30_000
for CONFIG_SERVER_INIT_TIMEOUT_MS
- Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in
ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Remove `if (role || tenantId)` guard in getMCPSetupData — config
servers now always resolve regardless of tenant context
MINOR fixes:
- Extract resolveAllMcpConfigs() helper in mcp controller to
eliminate 3x copy-pasted config resolution boilerplate
- Distinguish "not initialized" from real errors in
clearMcpConfigCache — log actual failures instead of swallowing
- Remove narrative inline comments per style guide
- Remove dead try/catch inside Promise.allSettled in
ensureConfigServers (inner method never throws)
- Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll()
calls per request
Test updates:
- Add ensureConfigServers mock to registry test fixtures
- Update getMCPSetupData assertions for inline OAuth derivation
* fix: address code review findings (CRITICAL, MAJOR, MINOR)
CRITICAL fixes:
- Break circular dependency: move CONFIG_CACHE_NAMESPACE from
MCPServersRegistry to ServerConfigsCacheFactory
- Fix dbSourced fail-closed: use source field when present, fall back to
legacy dbId check when absent (backward-compatible with pre-upgrade
cached configs that lack source field)
MAJOR fixes:
- Add CONFIG_CACHE_NAMESPACE to aggregate-key set in
ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests)
covering lazy init, stub-on-failure, cross-tenant isolation via config
hash keys, concurrent deduplication, merge order, and cache invalidation
MINOR fixes:
- Update MCPServerInspector test assertion for dbSourced change
* fix: restore getServerConfig lookup for config-source servers (NEW-1)
Add configNameToKey map that indexes server name → hash-based cache key
for O(1) lookup by name in getServerConfig. This restores the config
cache layer that was dropped when hash-based keys were introduced.
Without this fix, config-source servers appeared in tool listings
(via getAllServerConfigs) but getServerConfig returned undefined,
breaking all connection and tool call paths.
- Populate configNameToKey in ensureSingleConfigServer
- Clear configNameToKey in invalidateConfigCache and reset
- Clear stale read-through cache entries after lazy init
- Remove dead code in invalidateConfigCache (config.title, key parsing)
- Add getServerConfig tests for config-source server lookup
* fix: eliminate configNameToKey race via caller-provided configServers param
Replace the process-global configNameToKey map (last-writer-wins under
concurrent multi-tenant load) with a configServers parameter on
getServerConfig. Callers pass the pre-resolved config servers map
directly — no shared mutable state, no cross-tenant race.
- Add optional configServers param to getServerConfig; when provided,
returns matching config directly without any global lookup
- Remove configNameToKey map entirely (was the source of the race)
- Extract server names from cache keys via lastIndexOf in
invalidateConfigCache (safe for names containing colons)
- Use mcpConfig[serverName] directly in getMCPTools instead of a
redundant getServerConfig call
- Add cross-tenant isolation test for getServerConfig
* fix: populate read-through cache after config server lazy init
After lazyInitConfigServer succeeds, write the parsed config to
readThroughCache keyed by serverName so that getServerConfig calls
from ConnectionsRepository, UserConnectionManager, and
MCPManager.callTool find the config without needing configServers.
Without this, config-source servers appeared in tool listings but
every connection attempt and tool call returned undefined.
* fix: user-scoped getServerConfig fallback to server-only cache key
When getServerConfig is called with a userId (e.g., from callTool or
UserConnectionManager), the cache key is serverName::userId. Config-source
servers are cached under the server-only key (no userId). Add a fallback
so user-scoped lookups find config-source servers in the read-through cache.
* fix: configCacheRepo fallback, isUserSourced DRY, cross-process race
CRITICAL: Add findInConfigCache fallback in getServerConfig so
config-source servers remain reachable after readThroughCache TTL
expires (5s). Without this, every tool call after 5s returned
undefined for config-source servers.
MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace
all 5 inline dbSourced ternary expressions (MCPManager x2,
ConnectionsRepository, UserConnectionManager, MCPServerInspector).
MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when
configCacheRepo.add throws (key exists from another process), fall
back to reading the existing entry instead of returning undefined.
MINOR: Parallelize invalidateConfigCache awaits with Promise.all.
Remove redundant .catch(() => {}) inside Promise.allSettled.
Tighten dedup test assertion to toBe(1).
Add TTL-expiry tests for getServerConfig (with and without userId).
* feat: thread configServers through getAppToolFunctions and formatInstructionsForContext
Add optional configServers parameter to getAppToolFunctions,
getInstructions, and formatInstructionsForContext so config-source
server tools and instructions are visible to agent initialization
and context injection paths.
Existing callers (boot-time init, tests) pass no argument and
continue to work unchanged. Agent runtime paths can now thread
resolved config servers from request context.
* fix: stale failure stubs retry after 5 min, upsert for cross-process races
- Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried
instead of permanently disabling config-source servers after transient
errors (DNS outage, cold-start race)
- Extract upsertConfigCache() helper that tries add then falls back to
update, preventing cross-process Redis races where a second instance's
successful inspection result was discarded
- Add test for stale-stub retry after CONFIG_STUB_RETRY_MS
* fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup
- Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so
CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs
were always considered stale (updatedAt ?? 0 → epoch → always expired)
- Add null guard for rawConfig in MCPManager.callTool before passing to
preProcessGraphTokens — prevents unsafe `as` cast on undefined
- Log double-failure in upsertConfigCache instead of silently swallowing
- Replace module-scope Date.now monkey-patch with jest.useFakeTimers /
jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests
* fix: server-only readThrough fallback only returns truthy values
Prevents a cached undefined from a prior no-userId lookup from
short-circuiting the DB query on a subsequent userId-scoped lookup.
* fix: remove findInConfigCache to eliminate cross-tenant config leakage
The findInConfigCache prefix scan (serverName:*) could return any
tenant's config after readThrough TTL expires, violating tenant
isolation. Config-source servers are now ONLY resolvable through:
1. The configServers param (callers with tenant context from ALS)
2. The readThrough cache (populated by ensureSingleConfigServer,
5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs)
Connection/tool-call paths without tenant context rely exclusively on
the readThrough cache. If it expires before the next HTTP request
repopulates it, the server is not found — which is correct because
there is no tenant context to determine which config to return.
- Remove findInConfigCache method and its call in getServerConfig
- Update server-only readThrough fallback to only return truthy values
(prevents cached undefined from short-circuiting user-scoped DB lookup)
- Update tests to document tenant isolation behavior after cache expiry
* style: fix import order per AGENTS.md conventions
Sort package imports shortest-to-longest, local imports longest-to-shortest
across MCPServersRegistry, ConnectionsRepository, MCPManager,
UserConnectionManager, and MCPServerInspector.
* fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures
Thread pre-resolved serverConfig from tool creation context into
callTool, removing dependency on the readThrough cache for config-source
servers. This fixes two issues:
- Cross-tenant contamination: the readThrough cache key was unscoped
(just serverName), so concurrent multi-tenant requests for same-named
servers would overwrite each other's entries
- TTL expiry: tool calls happening >5s after config resolution would
fail with "Configuration not found" because the readThrough entry
had expired
Changes:
- Add optional serverConfig param to MCPManager.callTool — uses
provided config directly, falling back to getServerConfig lookup
for YAML/user servers
- Thread serverConfig from createMCPTool through createToolInstance
closure to callTool
- Remove readThrough write from ensureSingleConfigServer — config-source
servers are only accessible via configServers param (tenant-scoped)
- Remove server-only readThrough fallback from getServerConfig
- Increase config cache hash from 8 to 16 hex chars (64-bit)
- Add isUserSourced boundary tests for all source/dbId combinations
- Fix double Object.keys call in getMCPTools controller
- Update test assertions for new getServerConfig behavior
* fix: cache base configs for config-server users; narrow upsertConfigCache error handling
- Refactor getAllServerConfigs to separate base config fetch (YAML + DB)
from config-server layering. Base configs are cached via readThroughCacheAll
regardless of whether configServers is provided, eliminating uncached
MongoDB queries per request for config-server users
- Narrow upsertConfigCache catch to duplicate-key errors only;
infrastructure errors (Redis timeouts, network failures) now propagate
instead of being silently swallowed, preventing inspection storms
during outages
* fix: restore correct merge order and document upsert error matching
- Restore YAML → Config → User DB precedence in getAllServerConfigs
(user DB servers have highest precedence, matching the JSDoc contract)
- Add source comment on upsertConfigCache duplicate-key detection
linking to the two cache implementations that define the error message
* feat: complete config-source server support across all execution paths
Wire configServers through the entire agent execution pipeline so
config-source MCP servers are fully functional — not just visible in
listings but executable in agent sessions.
- Thread configServers into handleTools.js agent tool pipeline: resolve
config servers from tenant context before MCP tool iteration, pass to
getServerConfig, createMCPTools, and createMCPTool
- Thread configServers into agent instructions pipeline:
applyContextToAgent → getMCPInstructionsForServers →
formatInstructionsForContext, resolved in client.js before agent
context application
- Add configServers param to createMCPTool and createMCPTools for
reconnect path fallback
- Add source field to redactServerSecrets allowlist for client UI
differentiation of server tiers
- Narrow invalidateConfigCache to only clear readThroughCacheAll (merged
results), preserving YAML individual-server readThrough entries
- Update context.spec.ts assertions for new configServers parameter
* fix: add missing mocks for config-source server dependencies in client.test.js
Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added
to client.js but not reflected in the test file's jest.mock declarations.
* fix: update formatInstructionsForContext assertions for configServers param
The test assertions expected formatInstructionsForContext to be called with
only the server names array, but it now receives configServers as a second
argument after the config-source server feature wiring.
* fix: move configServers resolution before MCP tool loop to avoid TDZ
configServers was declared with `let` after the first tool loop but
referenced inside it via getServerConfig(), causing a ReferenceError
temporal dead zone. Move declaration and resolution before the loop,
using tools.some(mcpToolPattern) to gate the async resolution.
* fix: address review findings — cache bypass, discoverServerTools gap, DRY
- #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via
readThroughCacheAll) instead of bypassing it when configServers is present.
Extracts user-DB entries from cached base by diffing against YAML keys
to maintain YAML → Config → User DB merge order without extra MongoDB calls.
- #3: Add configServers param to ToolDiscoveryOptions and thread it through
discoverServerTools → getServerConfig so config-source servers are
discoverable during OAuth reconnection flows.
- #6: Replace inline import() type annotations in context.ts with proper
import type { ParsedServerConfig } per AGENTS.md conventions.
- #7: Extract resolveConfigServers(req) helper in MCP.js and use it from
handleTools.js and client.js, eliminating the duplicated 6-line config
resolution pattern.
- #10: Restore removed "why" comment explaining getLoaded() vs getAll()
choice in getMCPSetupData — documents non-obvious correctness constraint.
- #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs.
* fix: consolidate imports, reorder constants, fix YAML-DB merge edge case
- Merge duplicate @librechat/data-schemas requires in MCP.js into one
- Move resolveConfigServers after module-level constants
- Fix getAllServerConfigs edge case where user-DB entry overriding a
YAML entry with the same name was excluded from userDbConfigs; now
uses reference equality check to detect DB-overwritten YAML keys
* fix: replace fragile string-match error detection with proper upsert method
Add upsert() to IServerConfigsRepositoryInterface and all implementations
(InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle
error message string match ('already exists in cache') in upsertConfigCache
that was the only thing preventing cross-process init races from silently
discarding inspection results.
Each implementation handles add-or-update atomically:
- InMemory: direct Map.set()
- Redis: direct cache.set()
- RedisAggregateKey: read-modify-write under write lock
- DB: delegates to update() (DB servers use explicit add() with ACL setup)
* fix: wire configServers through remaining HTTP endpoints
- getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig
- reinitialize route: resolve configServers before getServerConfig
- auth-values route: resolve configServers before getServerConfig
- getOAuthHeaders: accept configServers param, thread from callers
- Update mcp.spec.js tests to mock getAllServerConfigs for GET by name
* fix: thread serverConfig through getConnection for config-source servers
Config-source servers exist only in configCacheRepo, not in YAML cache or
DB. When callTool → getConnection → getUserConnection → getServerConfig
runs without configServers, it returns undefined and throws. Fix by
threading the pre-resolved serverConfig (providedConfig) from callTool
through getConnection → getUserConnection → createUserConnectionInternal,
using it as a fallback before the registry lookup.
* fix: thread configServers through reinit, reconnect, and tool definition paths
Wire configServers through every remaining call chain that creates or
reconnects MCP server connections:
- reinitMCPServer: accepts serverConfig and configServers, uses them for
getServerConfig fallback, getConnection, and discoverServerTools
- reconnectServer: accepts and passes configServers to reinitMCPServer
- createMCPTools/createMCPTool: pass configServers to reconnectServer
- ToolService.loadToolDefinitionsWrapper: resolves configServers from req,
passes to both reinitMCPServer call sites
- reinitialize route: passes serverConfig and configServers to reinitMCPServer
* fix: address review findings — simplify merge, harden error paths, fix log labels
- Simplify getAllServerConfigs merge: replace fragile reference-equality
loop with direct spread { ...yamlConfigs, ...configServers, ...base }
- Guard upsertConfigCache in lazyInitConfigServer catch block so cache
failures don't mask the original inspection error
- Deduplicate getYamlServerNames cold-start with promise dedup pattern
- Remove dead `if (!mcpConfig)` guard in getMCPSetupData
- Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error
messages — now uses this.namespace for correct Config/App labeling
- Remove misleading OAuth callback comment about readThrough cache
- Move resolveConfigServers after module-level constants in MCP.js
* fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label
- Clear yamlServerNamesPromise on rejection so transient cache errors
don't permanently prevent ensureConfigServers from working
- Skip reinspectServer for config-source servers (source: 'config') in
reinitMCPServer — they lack a CACHE/DB storage location; retry is
handled by CONFIG_STUB_RETRY_MS in ensureConfigServers
- Use source field instead of dbId for storageLocation derivation
- Fix remaining hardcoded "App" in reset() leaderCheck message
* fix: persist oauthHeaders in flow state for config-source OAuth servers
The OAuth callback route has no JWT auth context and cannot resolve
config-source server configs. Previously, getOAuthHeaders would silently
return {} for config-source servers, dropping custom token exchange headers.
Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow
initiation (which has auth context), and the callback reads them from
the stored flow state with a fallback to the registry lookup for
YAML/user-DB servers.
* fix: update tests for getMCPSetupData null guard removal and ToolService mock
- MCP.spec.js: update test to expect graceful handling of null mcpConfig
instead of a throw (getAllServerConfigs always returns an object)
- MCP.js: add defensive || {} for Object.entries(mcpConfig) in case of
null from test mocks
- ToolService.spec.js: add missing mock for ~/server/services/MCP
(resolveConfigServers)
* fix: address review findings — DRY, naming, logging, dead code, defensive guards
- #1: Simplify getAllServerConfigs to single getBaseServerConfigs call,
eliminating redundant double-fetch of cacheConfigsRepo.getAll()
- #2: Add warning log when oauthHeaders absent from OAuth callback flow state
- #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller
imports shared helper instead of reimplementing
- #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider
in createToolInstance — these are actively used, not unused
- #5: Log rejected results from ensureConfigServers Promise.allSettled
so cache errors are visible instead of silently dropped
- #6: Remove dead 'MCP config not found' error handlers from routes
- #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache
- #8: Remove logger.error from withTimeout to prevent double-logging timeouts
- #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message
- #12: Use spread instead of mutation in addServer for immutability consistency
- Add upsert mock to ensureConfigServers.test.ts DB mock
- Update route tests for resolveAllMcpConfigs import change
* fix: restore correct merge priority, use immutable spread, fix test mock
- getAllServerConfigs: { ...configServers, ...base } so userDB wins over
configServers, matching documented "User DB (highest)" priority
- lazyInitConfigServer: use immutable spread instead of direct mutation
for parsedConfig.source, consistent with addServer fix
- Fix test to mock getAllServerConfigs as {} instead of null, remove
unnecessary || {} defensive guard in getMCPSetupData
* fix: error handling, stable hashing, flatten nesting, remove dead param
- Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with
graceful {} fallback so transient DB/cache errors don't crash tool pipeline
- Sort keys in configCacheKey JSON.stringify for deterministic hashing
regardless of object property insertion order
- Flatten clearMcpConfigCache from 3 nested try-catch to early returns;
document that user connections are cleaned up lazily (accepted tradeoff)
- Remove dead configServers param from getAppToolFunctions (never passed)
- Add security rationale comment for source field in redactServerSecrets
* fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision
The array replacer in JSON.stringify acts as a property allowlist at
every nesting depth, silently dropping nested keys like headers['X-API-Key'],
oauth.client_secret, etc. Two configs with different nested values but
identical top-level structure produced the same hash, causing cross-tenant
cache hits and potential credential contamination.
Switch to a function replacer that recursively sorts keys at all depths
without dropping any properties.
Also document the known gap in getOAuthServers: config-source OAuth
servers are not covered by auto-reconnection or uninstall cleanup
because callers lack request context.
* fix: move clearMcpConfigCache to packages/api to eliminate circular dependency
The function only depends on MCPServersRegistry and MCPManager, both of
which live in packages/api. Import it directly from @librechat/api in
the CJS layer instead of using dynamic require('~/config').
* chore: imports/fields ordering
* fix: address review findings — error handling, targeted lookup, test gaps
- Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so
getAppConfig/getAllServerConfigs failures propagate instead of masking
infrastructure errors as empty server lists.
- Use targeted getServerConfig in getMCPServerById instead of fetching
all server configs for a single-server lookup.
- Forward configServers to inner createMCPTool calls so reconnect path
works for config-source servers.
- Update getAllServerConfigs JSDoc to document disjoint-key design.
- Add OAuth callback oauthHeaders fallback tests (flow state present
vs registry fallback).
- Add resolveConfigServers/resolveAllMcpConfigs unit tests covering
happy path and error propagation.
* fix: add getOAuthReconnectionManager mock to OAuth callback tests
* chore: imports ordering
2026-03-28 10:36:43 -04:00
|
|
|
* the endpoints config cache, and the MCP config-source server cache.
|
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation
Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.
- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
* feat: register tenant middleware and wrap startup/auth in runAsSystem()
- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
before user context is established (LDAP, SAML, OpenID, social login, AuthService)
* feat: thread tenantId through all getAppConfig callers
Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.
Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.
Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint
* feat: add config cache invalidation on admin mutations
- Add clearOverrideCache(tenantId?) to flush per-principal override caches
by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
(upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)
* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering
The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.
* fix: replace runAsSystem with baseOnly for pre-tenant code paths
App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.
All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.
* fix: address PR review findings — middleware ordering, types, cache safety
- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)
* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly
- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)
* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests
- requireJwtAuth: 5 tests verifying ALS tenant context is set after
passport auth, isolated between concurrent requests, and not set
when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
tenantId is threaded through, partial failure is handled gracefully,
and operations run in parallel via Promise.all (Finding 11)
* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping
- Forward passport errors in requireJwtAuth before entering tenant
middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
* fix: address second review — cache safety, code quality, test reliability
- Decouple cache invalidation from mutation response: fire-and-forget
with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock
* fix: move JSDoc to correct function after clearEndpointConfigCache extraction
* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry
Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.
* fix: remove return from tenantStorage.run to satisfy void middleware signature
* fix: address second review — cache safety, code quality, test reliability
- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
so partial failures are logged individually instead of producing one
undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
clearAppConfigCache rejects (not just the swallowed endpoint error)
* chore: reorder imports in index.js and app.js for consistency
- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
2026-03-26 17:35:00 -04:00
|
|
|
* @param {string} [tenantId] - Optional tenant ID to scope override cache clearing.
|
|
|
|
|
*/
|
|
|
|
|
async function invalidateConfigCaches(tenantId) {
|
|
|
|
|
const results = await Promise.allSettled([
|
|
|
|
|
clearAppConfigCache(),
|
|
|
|
|
clearOverrideCache(tenantId),
|
|
|
|
|
invalidateCachedTools({ invalidateGlobal: true }),
|
|
|
|
|
clearEndpointConfigCache(),
|
🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435)
* feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring
- Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with
`enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains`
- Add `MCPServerSource` type ('yaml' | 'config' | 'user') and `source`
field to `ParsedServerConfig`
- Change `dbSourced` determination from `!!config.dbId` to
`config.source === 'user'` across MCPManager, ConnectionsRepository,
UserConnectionManager, and MCPServerInspector
- Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB
* feat: three-layer MCPServersRegistry with config cache and lazy init
- Add `configCacheRepo` as third repository layer between YAML cache and
DB for admin-defined config-source MCP servers
- Implement `ensureConfigServers()` that identifies config-override servers
from resolved `getAppConfig()` mcpConfig, lazily inspects them, and
caches parsed configs with `source: 'config'`
- Add `lazyInitConfigServer()` with timeout, stub-on-failure, and
concurrent-init deduplication via `pendingConfigInits` map
- Extend `getAllServerConfigs()` with optional `configServers` param for
three-way merge: YAML → Config → User
- Add `getServerConfig()` lookup through config cache layer
- Add `invalidateConfigCache()` for clearing config-source inspection
results on admin config mutations
- Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on
DB-stored servers in `addServer()` and `addServerStub()`
* feat: wire tenant context into MCP controllers, services, and cache invalidation
- Resolve config-source servers via `getAppConfig({ role, tenantId })`
in `getMCPTools()` and `getMCPServersList()` controllers
- Pass `ensureConfigServers()` results through `getAllServerConfigs()`
for three-way merge of YAML + Config + User servers
- Add tenant/role context to `getMCPSetupData()` and connection status
routes via `getTenantId()` from ALS
- Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin
config mutations trigger re-inspection of config-source MCP servers
* feat: enforce tenantMcpPolicy on admin config mcpServers mutations
- Add `validateMcpServerPolicy()` helper that checks mcpServers against
operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant,
allowedTransports, allowedDomains)
- Wire validation into `upsertConfigOverrides` and `patchConfigField`
handlers — rejects with 403 when policy is violated
- Infer transport type from config shape (command → stdio, url protocol
→ websocket/sse, type field → streamable-http)
- Validate server domains against policy allowlist when configured
* revert: remove tenantMcpPolicy schema and enforcement
The existing admin config CRUD routes already provide the mechanism
for granular MCP server prepopulation (groups, roles, users). The
tenantMcpPolicy gating adds unnecessary complexity that can be
revisited if needed in the future.
- Remove tenantMcpPolicy from mcpSettings Zod schema
- Remove validateMcpServerPolicy helper and TenantMcpPolicy interface
- Remove policy enforcement from upsertConfigOverrides and
patchConfigField handlers
* test: update test assertions for source field and config-server wiring
- Use objectContaining in MCPServersRegistry reset test to account for
new source: 'yaml' field on CACHE-stored configs
- Add getTenantId and ensureConfigServers mocks to MCP route tests
- Add getAppConfig mock to route test Config service mock
- Update getMCPSetupData assertion to expect second options argument
- Update getAllServerConfigs assertions for new configServers parameter
* fix: disconnect active connections when config-source servers are evicted
When admin config overrides change and config-source MCP servers are
removed, the invalidation now proactively disconnects active connections
for evicted servers instead of leaving them lingering until timeout.
- Return evicted server names from invalidateConfigCache()
- Disconnect app-level connections for evicted servers in
clearMcpConfigCache() via MCPManager.appConnections.disconnect()
* fix: address code review findings (CRITICAL, MAJOR, MINOR)
CRITICAL fixes:
- Scope configCacheRepo keys by config content hash to prevent
cross-tenant cache poisoning when two tenants define the same
server name with different configurations
- Change dbSourced checks from `source === 'user'` to
`source !== 'yaml' && source !== 'config'` so undefined source
(pre-upgrade cached configs) fails closed to restricted mode
MAJOR fixes:
- Derive OAuth servers from already-computed mcpConfig instead of
calling getOAuthServers() separately — config-source OAuth servers
are now properly detected
- Add parseInt radix (10) and NaN guard with fallback to 30_000
for CONFIG_SERVER_INIT_TIMEOUT_MS
- Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in
ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Remove `if (role || tenantId)` guard in getMCPSetupData — config
servers now always resolve regardless of tenant context
MINOR fixes:
- Extract resolveAllMcpConfigs() helper in mcp controller to
eliminate 3x copy-pasted config resolution boilerplate
- Distinguish "not initialized" from real errors in
clearMcpConfigCache — log actual failures instead of swallowing
- Remove narrative inline comments per style guide
- Remove dead try/catch inside Promise.allSettled in
ensureConfigServers (inner method never throws)
- Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll()
calls per request
Test updates:
- Add ensureConfigServers mock to registry test fixtures
- Update getMCPSetupData assertions for inline OAuth derivation
* fix: address code review findings (CRITICAL, MAJOR, MINOR)
CRITICAL fixes:
- Break circular dependency: move CONFIG_CACHE_NAMESPACE from
MCPServersRegistry to ServerConfigsCacheFactory
- Fix dbSourced fail-closed: use source field when present, fall back to
legacy dbId check when absent (backward-compatible with pre-upgrade
cached configs that lack source field)
MAJOR fixes:
- Add CONFIG_CACHE_NAMESPACE to aggregate-key set in
ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests)
covering lazy init, stub-on-failure, cross-tenant isolation via config
hash keys, concurrent deduplication, merge order, and cache invalidation
MINOR fixes:
- Update MCPServerInspector test assertion for dbSourced change
* fix: restore getServerConfig lookup for config-source servers (NEW-1)
Add configNameToKey map that indexes server name → hash-based cache key
for O(1) lookup by name in getServerConfig. This restores the config
cache layer that was dropped when hash-based keys were introduced.
Without this fix, config-source servers appeared in tool listings
(via getAllServerConfigs) but getServerConfig returned undefined,
breaking all connection and tool call paths.
- Populate configNameToKey in ensureSingleConfigServer
- Clear configNameToKey in invalidateConfigCache and reset
- Clear stale read-through cache entries after lazy init
- Remove dead code in invalidateConfigCache (config.title, key parsing)
- Add getServerConfig tests for config-source server lookup
* fix: eliminate configNameToKey race via caller-provided configServers param
Replace the process-global configNameToKey map (last-writer-wins under
concurrent multi-tenant load) with a configServers parameter on
getServerConfig. Callers pass the pre-resolved config servers map
directly — no shared mutable state, no cross-tenant race.
- Add optional configServers param to getServerConfig; when provided,
returns matching config directly without any global lookup
- Remove configNameToKey map entirely (was the source of the race)
- Extract server names from cache keys via lastIndexOf in
invalidateConfigCache (safe for names containing colons)
- Use mcpConfig[serverName] directly in getMCPTools instead of a
redundant getServerConfig call
- Add cross-tenant isolation test for getServerConfig
* fix: populate read-through cache after config server lazy init
After lazyInitConfigServer succeeds, write the parsed config to
readThroughCache keyed by serverName so that getServerConfig calls
from ConnectionsRepository, UserConnectionManager, and
MCPManager.callTool find the config without needing configServers.
Without this, config-source servers appeared in tool listings but
every connection attempt and tool call returned undefined.
* fix: user-scoped getServerConfig fallback to server-only cache key
When getServerConfig is called with a userId (e.g., from callTool or
UserConnectionManager), the cache key is serverName::userId. Config-source
servers are cached under the server-only key (no userId). Add a fallback
so user-scoped lookups find config-source servers in the read-through cache.
* fix: configCacheRepo fallback, isUserSourced DRY, cross-process race
CRITICAL: Add findInConfigCache fallback in getServerConfig so
config-source servers remain reachable after readThroughCache TTL
expires (5s). Without this, every tool call after 5s returned
undefined for config-source servers.
MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace
all 5 inline dbSourced ternary expressions (MCPManager x2,
ConnectionsRepository, UserConnectionManager, MCPServerInspector).
MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when
configCacheRepo.add throws (key exists from another process), fall
back to reading the existing entry instead of returning undefined.
MINOR: Parallelize invalidateConfigCache awaits with Promise.all.
Remove redundant .catch(() => {}) inside Promise.allSettled.
Tighten dedup test assertion to toBe(1).
Add TTL-expiry tests for getServerConfig (with and without userId).
* feat: thread configServers through getAppToolFunctions and formatInstructionsForContext
Add optional configServers parameter to getAppToolFunctions,
getInstructions, and formatInstructionsForContext so config-source
server tools and instructions are visible to agent initialization
and context injection paths.
Existing callers (boot-time init, tests) pass no argument and
continue to work unchanged. Agent runtime paths can now thread
resolved config servers from request context.
* fix: stale failure stubs retry after 5 min, upsert for cross-process races
- Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried
instead of permanently disabling config-source servers after transient
errors (DNS outage, cold-start race)
- Extract upsertConfigCache() helper that tries add then falls back to
update, preventing cross-process Redis races where a second instance's
successful inspection result was discarded
- Add test for stale-stub retry after CONFIG_STUB_RETRY_MS
* fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup
- Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so
CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs
were always considered stale (updatedAt ?? 0 → epoch → always expired)
- Add null guard for rawConfig in MCPManager.callTool before passing to
preProcessGraphTokens — prevents unsafe `as` cast on undefined
- Log double-failure in upsertConfigCache instead of silently swallowing
- Replace module-scope Date.now monkey-patch with jest.useFakeTimers /
jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests
* fix: server-only readThrough fallback only returns truthy values
Prevents a cached undefined from a prior no-userId lookup from
short-circuiting the DB query on a subsequent userId-scoped lookup.
* fix: remove findInConfigCache to eliminate cross-tenant config leakage
The findInConfigCache prefix scan (serverName:*) could return any
tenant's config after readThrough TTL expires, violating tenant
isolation. Config-source servers are now ONLY resolvable through:
1. The configServers param (callers with tenant context from ALS)
2. The readThrough cache (populated by ensureSingleConfigServer,
5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs)
Connection/tool-call paths without tenant context rely exclusively on
the readThrough cache. If it expires before the next HTTP request
repopulates it, the server is not found — which is correct because
there is no tenant context to determine which config to return.
- Remove findInConfigCache method and its call in getServerConfig
- Update server-only readThrough fallback to only return truthy values
(prevents cached undefined from short-circuiting user-scoped DB lookup)
- Update tests to document tenant isolation behavior after cache expiry
* style: fix import order per AGENTS.md conventions
Sort package imports shortest-to-longest, local imports longest-to-shortest
across MCPServersRegistry, ConnectionsRepository, MCPManager,
UserConnectionManager, and MCPServerInspector.
* fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures
Thread pre-resolved serverConfig from tool creation context into
callTool, removing dependency on the readThrough cache for config-source
servers. This fixes two issues:
- Cross-tenant contamination: the readThrough cache key was unscoped
(just serverName), so concurrent multi-tenant requests for same-named
servers would overwrite each other's entries
- TTL expiry: tool calls happening >5s after config resolution would
fail with "Configuration not found" because the readThrough entry
had expired
Changes:
- Add optional serverConfig param to MCPManager.callTool — uses
provided config directly, falling back to getServerConfig lookup
for YAML/user servers
- Thread serverConfig from createMCPTool through createToolInstance
closure to callTool
- Remove readThrough write from ensureSingleConfigServer — config-source
servers are only accessible via configServers param (tenant-scoped)
- Remove server-only readThrough fallback from getServerConfig
- Increase config cache hash from 8 to 16 hex chars (64-bit)
- Add isUserSourced boundary tests for all source/dbId combinations
- Fix double Object.keys call in getMCPTools controller
- Update test assertions for new getServerConfig behavior
* fix: cache base configs for config-server users; narrow upsertConfigCache error handling
- Refactor getAllServerConfigs to separate base config fetch (YAML + DB)
from config-server layering. Base configs are cached via readThroughCacheAll
regardless of whether configServers is provided, eliminating uncached
MongoDB queries per request for config-server users
- Narrow upsertConfigCache catch to duplicate-key errors only;
infrastructure errors (Redis timeouts, network failures) now propagate
instead of being silently swallowed, preventing inspection storms
during outages
* fix: restore correct merge order and document upsert error matching
- Restore YAML → Config → User DB precedence in getAllServerConfigs
(user DB servers have highest precedence, matching the JSDoc contract)
- Add source comment on upsertConfigCache duplicate-key detection
linking to the two cache implementations that define the error message
* feat: complete config-source server support across all execution paths
Wire configServers through the entire agent execution pipeline so
config-source MCP servers are fully functional — not just visible in
listings but executable in agent sessions.
- Thread configServers into handleTools.js agent tool pipeline: resolve
config servers from tenant context before MCP tool iteration, pass to
getServerConfig, createMCPTools, and createMCPTool
- Thread configServers into agent instructions pipeline:
applyContextToAgent → getMCPInstructionsForServers →
formatInstructionsForContext, resolved in client.js before agent
context application
- Add configServers param to createMCPTool and createMCPTools for
reconnect path fallback
- Add source field to redactServerSecrets allowlist for client UI
differentiation of server tiers
- Narrow invalidateConfigCache to only clear readThroughCacheAll (merged
results), preserving YAML individual-server readThrough entries
- Update context.spec.ts assertions for new configServers parameter
* fix: add missing mocks for config-source server dependencies in client.test.js
Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added
to client.js but not reflected in the test file's jest.mock declarations.
* fix: update formatInstructionsForContext assertions for configServers param
The test assertions expected formatInstructionsForContext to be called with
only the server names array, but it now receives configServers as a second
argument after the config-source server feature wiring.
* fix: move configServers resolution before MCP tool loop to avoid TDZ
configServers was declared with `let` after the first tool loop but
referenced inside it via getServerConfig(), causing a ReferenceError
temporal dead zone. Move declaration and resolution before the loop,
using tools.some(mcpToolPattern) to gate the async resolution.
* fix: address review findings — cache bypass, discoverServerTools gap, DRY
- #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via
readThroughCacheAll) instead of bypassing it when configServers is present.
Extracts user-DB entries from cached base by diffing against YAML keys
to maintain YAML → Config → User DB merge order without extra MongoDB calls.
- #3: Add configServers param to ToolDiscoveryOptions and thread it through
discoverServerTools → getServerConfig so config-source servers are
discoverable during OAuth reconnection flows.
- #6: Replace inline import() type annotations in context.ts with proper
import type { ParsedServerConfig } per AGENTS.md conventions.
- #7: Extract resolveConfigServers(req) helper in MCP.js and use it from
handleTools.js and client.js, eliminating the duplicated 6-line config
resolution pattern.
- #10: Restore removed "why" comment explaining getLoaded() vs getAll()
choice in getMCPSetupData — documents non-obvious correctness constraint.
- #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs.
* fix: consolidate imports, reorder constants, fix YAML-DB merge edge case
- Merge duplicate @librechat/data-schemas requires in MCP.js into one
- Move resolveConfigServers after module-level constants
- Fix getAllServerConfigs edge case where user-DB entry overriding a
YAML entry with the same name was excluded from userDbConfigs; now
uses reference equality check to detect DB-overwritten YAML keys
* fix: replace fragile string-match error detection with proper upsert method
Add upsert() to IServerConfigsRepositoryInterface and all implementations
(InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle
error message string match ('already exists in cache') in upsertConfigCache
that was the only thing preventing cross-process init races from silently
discarding inspection results.
Each implementation handles add-or-update atomically:
- InMemory: direct Map.set()
- Redis: direct cache.set()
- RedisAggregateKey: read-modify-write under write lock
- DB: delegates to update() (DB servers use explicit add() with ACL setup)
* fix: wire configServers through remaining HTTP endpoints
- getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig
- reinitialize route: resolve configServers before getServerConfig
- auth-values route: resolve configServers before getServerConfig
- getOAuthHeaders: accept configServers param, thread from callers
- Update mcp.spec.js tests to mock getAllServerConfigs for GET by name
* fix: thread serverConfig through getConnection for config-source servers
Config-source servers exist only in configCacheRepo, not in YAML cache or
DB. When callTool → getConnection → getUserConnection → getServerConfig
runs without configServers, it returns undefined and throws. Fix by
threading the pre-resolved serverConfig (providedConfig) from callTool
through getConnection → getUserConnection → createUserConnectionInternal,
using it as a fallback before the registry lookup.
* fix: thread configServers through reinit, reconnect, and tool definition paths
Wire configServers through every remaining call chain that creates or
reconnects MCP server connections:
- reinitMCPServer: accepts serverConfig and configServers, uses them for
getServerConfig fallback, getConnection, and discoverServerTools
- reconnectServer: accepts and passes configServers to reinitMCPServer
- createMCPTools/createMCPTool: pass configServers to reconnectServer
- ToolService.loadToolDefinitionsWrapper: resolves configServers from req,
passes to both reinitMCPServer call sites
- reinitialize route: passes serverConfig and configServers to reinitMCPServer
* fix: address review findings — simplify merge, harden error paths, fix log labels
- Simplify getAllServerConfigs merge: replace fragile reference-equality
loop with direct spread { ...yamlConfigs, ...configServers, ...base }
- Guard upsertConfigCache in lazyInitConfigServer catch block so cache
failures don't mask the original inspection error
- Deduplicate getYamlServerNames cold-start with promise dedup pattern
- Remove dead `if (!mcpConfig)` guard in getMCPSetupData
- Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error
messages — now uses this.namespace for correct Config/App labeling
- Remove misleading OAuth callback comment about readThrough cache
- Move resolveConfigServers after module-level constants in MCP.js
* fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label
- Clear yamlServerNamesPromise on rejection so transient cache errors
don't permanently prevent ensureConfigServers from working
- Skip reinspectServer for config-source servers (source: 'config') in
reinitMCPServer — they lack a CACHE/DB storage location; retry is
handled by CONFIG_STUB_RETRY_MS in ensureConfigServers
- Use source field instead of dbId for storageLocation derivation
- Fix remaining hardcoded "App" in reset() leaderCheck message
* fix: persist oauthHeaders in flow state for config-source OAuth servers
The OAuth callback route has no JWT auth context and cannot resolve
config-source server configs. Previously, getOAuthHeaders would silently
return {} for config-source servers, dropping custom token exchange headers.
Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow
initiation (which has auth context), and the callback reads them from
the stored flow state with a fallback to the registry lookup for
YAML/user-DB servers.
* fix: update tests for getMCPSetupData null guard removal and ToolService mock
- MCP.spec.js: update test to expect graceful handling of null mcpConfig
instead of a throw (getAllServerConfigs always returns an object)
- MCP.js: add defensive || {} for Object.entries(mcpConfig) in case of
null from test mocks
- ToolService.spec.js: add missing mock for ~/server/services/MCP
(resolveConfigServers)
* fix: address review findings — DRY, naming, logging, dead code, defensive guards
- #1: Simplify getAllServerConfigs to single getBaseServerConfigs call,
eliminating redundant double-fetch of cacheConfigsRepo.getAll()
- #2: Add warning log when oauthHeaders absent from OAuth callback flow state
- #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller
imports shared helper instead of reimplementing
- #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider
in createToolInstance — these are actively used, not unused
- #5: Log rejected results from ensureConfigServers Promise.allSettled
so cache errors are visible instead of silently dropped
- #6: Remove dead 'MCP config not found' error handlers from routes
- #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache
- #8: Remove logger.error from withTimeout to prevent double-logging timeouts
- #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message
- #12: Use spread instead of mutation in addServer for immutability consistency
- Add upsert mock to ensureConfigServers.test.ts DB mock
- Update route tests for resolveAllMcpConfigs import change
* fix: restore correct merge priority, use immutable spread, fix test mock
- getAllServerConfigs: { ...configServers, ...base } so userDB wins over
configServers, matching documented "User DB (highest)" priority
- lazyInitConfigServer: use immutable spread instead of direct mutation
for parsedConfig.source, consistent with addServer fix
- Fix test to mock getAllServerConfigs as {} instead of null, remove
unnecessary || {} defensive guard in getMCPSetupData
* fix: error handling, stable hashing, flatten nesting, remove dead param
- Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with
graceful {} fallback so transient DB/cache errors don't crash tool pipeline
- Sort keys in configCacheKey JSON.stringify for deterministic hashing
regardless of object property insertion order
- Flatten clearMcpConfigCache from 3 nested try-catch to early returns;
document that user connections are cleaned up lazily (accepted tradeoff)
- Remove dead configServers param from getAppToolFunctions (never passed)
- Add security rationale comment for source field in redactServerSecrets
* fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision
The array replacer in JSON.stringify acts as a property allowlist at
every nesting depth, silently dropping nested keys like headers['X-API-Key'],
oauth.client_secret, etc. Two configs with different nested values but
identical top-level structure produced the same hash, causing cross-tenant
cache hits and potential credential contamination.
Switch to a function replacer that recursively sorts keys at all depths
without dropping any properties.
Also document the known gap in getOAuthServers: config-source OAuth
servers are not covered by auto-reconnection or uninstall cleanup
because callers lack request context.
* fix: move clearMcpConfigCache to packages/api to eliminate circular dependency
The function only depends on MCPServersRegistry and MCPManager, both of
which live in packages/api. Import it directly from @librechat/api in
the CJS layer instead of using dynamic require('~/config').
* chore: imports/fields ordering
* fix: address review findings — error handling, targeted lookup, test gaps
- Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so
getAppConfig/getAllServerConfigs failures propagate instead of masking
infrastructure errors as empty server lists.
- Use targeted getServerConfig in getMCPServerById instead of fetching
all server configs for a single-server lookup.
- Forward configServers to inner createMCPTool calls so reconnect path
works for config-source servers.
- Update getAllServerConfigs JSDoc to document disjoint-key design.
- Add OAuth callback oauthHeaders fallback tests (flow state present
vs registry fallback).
- Add resolveConfigServers/resolveAllMcpConfigs unit tests covering
happy path and error propagation.
* fix: add getOAuthReconnectionManager mock to OAuth callback tests
* chore: imports ordering
2026-03-28 10:36:43 -04:00
|
|
|
clearMcpConfigCache(),
|
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation
Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.
- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
* feat: register tenant middleware and wrap startup/auth in runAsSystem()
- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
before user context is established (LDAP, SAML, OpenID, social login, AuthService)
* feat: thread tenantId through all getAppConfig callers
Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.
Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.
Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint
* feat: add config cache invalidation on admin mutations
- Add clearOverrideCache(tenantId?) to flush per-principal override caches
by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
(upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)
* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering
The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.
* fix: replace runAsSystem with baseOnly for pre-tenant code paths
App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.
All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.
* fix: address PR review findings — middleware ordering, types, cache safety
- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)
* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly
- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)
* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests
- requireJwtAuth: 5 tests verifying ALS tenant context is set after
passport auth, isolated between concurrent requests, and not set
when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
tenantId is threaded through, partial failure is handled gracefully,
and operations run in parallel via Promise.all (Finding 11)
* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping
- Forward passport errors in requireJwtAuth before entering tenant
middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
* fix: address second review — cache safety, code quality, test reliability
- Decouple cache invalidation from mutation response: fire-and-forget
with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock
* fix: move JSDoc to correct function after clearEndpointConfigCache extraction
* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry
Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.
* fix: remove return from tenantStorage.run to satisfy void middleware signature
* fix: address second review — cache safety, code quality, test reliability
- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
so partial failures are logged individually instead of producing one
undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
clearAppConfigCache rejects (not just the swallowed endpoint error)
* chore: reorder imports in index.js and app.js for consistency
- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
2026-03-26 17:35:00 -04:00
|
|
|
]);
|
|
|
|
|
const labels = [
|
|
|
|
|
'clearAppConfigCache',
|
|
|
|
|
'clearOverrideCache',
|
|
|
|
|
'invalidateCachedTools',
|
|
|
|
|
'clearEndpointConfigCache',
|
🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435)
* feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring
- Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with
`enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains`
- Add `MCPServerSource` type ('yaml' | 'config' | 'user') and `source`
field to `ParsedServerConfig`
- Change `dbSourced` determination from `!!config.dbId` to
`config.source === 'user'` across MCPManager, ConnectionsRepository,
UserConnectionManager, and MCPServerInspector
- Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB
* feat: three-layer MCPServersRegistry with config cache and lazy init
- Add `configCacheRepo` as third repository layer between YAML cache and
DB for admin-defined config-source MCP servers
- Implement `ensureConfigServers()` that identifies config-override servers
from resolved `getAppConfig()` mcpConfig, lazily inspects them, and
caches parsed configs with `source: 'config'`
- Add `lazyInitConfigServer()` with timeout, stub-on-failure, and
concurrent-init deduplication via `pendingConfigInits` map
- Extend `getAllServerConfigs()` with optional `configServers` param for
three-way merge: YAML → Config → User
- Add `getServerConfig()` lookup through config cache layer
- Add `invalidateConfigCache()` for clearing config-source inspection
results on admin config mutations
- Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on
DB-stored servers in `addServer()` and `addServerStub()`
* feat: wire tenant context into MCP controllers, services, and cache invalidation
- Resolve config-source servers via `getAppConfig({ role, tenantId })`
in `getMCPTools()` and `getMCPServersList()` controllers
- Pass `ensureConfigServers()` results through `getAllServerConfigs()`
for three-way merge of YAML + Config + User servers
- Add tenant/role context to `getMCPSetupData()` and connection status
routes via `getTenantId()` from ALS
- Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin
config mutations trigger re-inspection of config-source MCP servers
* feat: enforce tenantMcpPolicy on admin config mcpServers mutations
- Add `validateMcpServerPolicy()` helper that checks mcpServers against
operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant,
allowedTransports, allowedDomains)
- Wire validation into `upsertConfigOverrides` and `patchConfigField`
handlers — rejects with 403 when policy is violated
- Infer transport type from config shape (command → stdio, url protocol
→ websocket/sse, type field → streamable-http)
- Validate server domains against policy allowlist when configured
* revert: remove tenantMcpPolicy schema and enforcement
The existing admin config CRUD routes already provide the mechanism
for granular MCP server prepopulation (groups, roles, users). The
tenantMcpPolicy gating adds unnecessary complexity that can be
revisited if needed in the future.
- Remove tenantMcpPolicy from mcpSettings Zod schema
- Remove validateMcpServerPolicy helper and TenantMcpPolicy interface
- Remove policy enforcement from upsertConfigOverrides and
patchConfigField handlers
* test: update test assertions for source field and config-server wiring
- Use objectContaining in MCPServersRegistry reset test to account for
new source: 'yaml' field on CACHE-stored configs
- Add getTenantId and ensureConfigServers mocks to MCP route tests
- Add getAppConfig mock to route test Config service mock
- Update getMCPSetupData assertion to expect second options argument
- Update getAllServerConfigs assertions for new configServers parameter
* fix: disconnect active connections when config-source servers are evicted
When admin config overrides change and config-source MCP servers are
removed, the invalidation now proactively disconnects active connections
for evicted servers instead of leaving them lingering until timeout.
- Return evicted server names from invalidateConfigCache()
- Disconnect app-level connections for evicted servers in
clearMcpConfigCache() via MCPManager.appConnections.disconnect()
* fix: address code review findings (CRITICAL, MAJOR, MINOR)
CRITICAL fixes:
- Scope configCacheRepo keys by config content hash to prevent
cross-tenant cache poisoning when two tenants define the same
server name with different configurations
- Change dbSourced checks from `source === 'user'` to
`source !== 'yaml' && source !== 'config'` so undefined source
(pre-upgrade cached configs) fails closed to restricted mode
MAJOR fixes:
- Derive OAuth servers from already-computed mcpConfig instead of
calling getOAuthServers() separately — config-source OAuth servers
are now properly detected
- Add parseInt radix (10) and NaN guard with fallback to 30_000
for CONFIG_SERVER_INIT_TIMEOUT_MS
- Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in
ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Remove `if (role || tenantId)` guard in getMCPSetupData — config
servers now always resolve regardless of tenant context
MINOR fixes:
- Extract resolveAllMcpConfigs() helper in mcp controller to
eliminate 3x copy-pasted config resolution boilerplate
- Distinguish "not initialized" from real errors in
clearMcpConfigCache — log actual failures instead of swallowing
- Remove narrative inline comments per style guide
- Remove dead try/catch inside Promise.allSettled in
ensureConfigServers (inner method never throws)
- Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll()
calls per request
Test updates:
- Add ensureConfigServers mock to registry test fixtures
- Update getMCPSetupData assertions for inline OAuth derivation
* fix: address code review findings (CRITICAL, MAJOR, MINOR)
CRITICAL fixes:
- Break circular dependency: move CONFIG_CACHE_NAMESPACE from
MCPServersRegistry to ServerConfigsCacheFactory
- Fix dbSourced fail-closed: use source field when present, fall back to
legacy dbId check when absent (backward-compatible with pre-upgrade
cached configs that lack source field)
MAJOR fixes:
- Add CONFIG_CACHE_NAMESPACE to aggregate-key set in
ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests)
covering lazy init, stub-on-failure, cross-tenant isolation via config
hash keys, concurrent deduplication, merge order, and cache invalidation
MINOR fixes:
- Update MCPServerInspector test assertion for dbSourced change
* fix: restore getServerConfig lookup for config-source servers (NEW-1)
Add configNameToKey map that indexes server name → hash-based cache key
for O(1) lookup by name in getServerConfig. This restores the config
cache layer that was dropped when hash-based keys were introduced.
Without this fix, config-source servers appeared in tool listings
(via getAllServerConfigs) but getServerConfig returned undefined,
breaking all connection and tool call paths.
- Populate configNameToKey in ensureSingleConfigServer
- Clear configNameToKey in invalidateConfigCache and reset
- Clear stale read-through cache entries after lazy init
- Remove dead code in invalidateConfigCache (config.title, key parsing)
- Add getServerConfig tests for config-source server lookup
* fix: eliminate configNameToKey race via caller-provided configServers param
Replace the process-global configNameToKey map (last-writer-wins under
concurrent multi-tenant load) with a configServers parameter on
getServerConfig. Callers pass the pre-resolved config servers map
directly — no shared mutable state, no cross-tenant race.
- Add optional configServers param to getServerConfig; when provided,
returns matching config directly without any global lookup
- Remove configNameToKey map entirely (was the source of the race)
- Extract server names from cache keys via lastIndexOf in
invalidateConfigCache (safe for names containing colons)
- Use mcpConfig[serverName] directly in getMCPTools instead of a
redundant getServerConfig call
- Add cross-tenant isolation test for getServerConfig
* fix: populate read-through cache after config server lazy init
After lazyInitConfigServer succeeds, write the parsed config to
readThroughCache keyed by serverName so that getServerConfig calls
from ConnectionsRepository, UserConnectionManager, and
MCPManager.callTool find the config without needing configServers.
Without this, config-source servers appeared in tool listings but
every connection attempt and tool call returned undefined.
* fix: user-scoped getServerConfig fallback to server-only cache key
When getServerConfig is called with a userId (e.g., from callTool or
UserConnectionManager), the cache key is serverName::userId. Config-source
servers are cached under the server-only key (no userId). Add a fallback
so user-scoped lookups find config-source servers in the read-through cache.
* fix: configCacheRepo fallback, isUserSourced DRY, cross-process race
CRITICAL: Add findInConfigCache fallback in getServerConfig so
config-source servers remain reachable after readThroughCache TTL
expires (5s). Without this, every tool call after 5s returned
undefined for config-source servers.
MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace
all 5 inline dbSourced ternary expressions (MCPManager x2,
ConnectionsRepository, UserConnectionManager, MCPServerInspector).
MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when
configCacheRepo.add throws (key exists from another process), fall
back to reading the existing entry instead of returning undefined.
MINOR: Parallelize invalidateConfigCache awaits with Promise.all.
Remove redundant .catch(() => {}) inside Promise.allSettled.
Tighten dedup test assertion to toBe(1).
Add TTL-expiry tests for getServerConfig (with and without userId).
* feat: thread configServers through getAppToolFunctions and formatInstructionsForContext
Add optional configServers parameter to getAppToolFunctions,
getInstructions, and formatInstructionsForContext so config-source
server tools and instructions are visible to agent initialization
and context injection paths.
Existing callers (boot-time init, tests) pass no argument and
continue to work unchanged. Agent runtime paths can now thread
resolved config servers from request context.
* fix: stale failure stubs retry after 5 min, upsert for cross-process races
- Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried
instead of permanently disabling config-source servers after transient
errors (DNS outage, cold-start race)
- Extract upsertConfigCache() helper that tries add then falls back to
update, preventing cross-process Redis races where a second instance's
successful inspection result was discarded
- Add test for stale-stub retry after CONFIG_STUB_RETRY_MS
* fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup
- Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so
CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs
were always considered stale (updatedAt ?? 0 → epoch → always expired)
- Add null guard for rawConfig in MCPManager.callTool before passing to
preProcessGraphTokens — prevents unsafe `as` cast on undefined
- Log double-failure in upsertConfigCache instead of silently swallowing
- Replace module-scope Date.now monkey-patch with jest.useFakeTimers /
jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests
* fix: server-only readThrough fallback only returns truthy values
Prevents a cached undefined from a prior no-userId lookup from
short-circuiting the DB query on a subsequent userId-scoped lookup.
* fix: remove findInConfigCache to eliminate cross-tenant config leakage
The findInConfigCache prefix scan (serverName:*) could return any
tenant's config after readThrough TTL expires, violating tenant
isolation. Config-source servers are now ONLY resolvable through:
1. The configServers param (callers with tenant context from ALS)
2. The readThrough cache (populated by ensureSingleConfigServer,
5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs)
Connection/tool-call paths without tenant context rely exclusively on
the readThrough cache. If it expires before the next HTTP request
repopulates it, the server is not found — which is correct because
there is no tenant context to determine which config to return.
- Remove findInConfigCache method and its call in getServerConfig
- Update server-only readThrough fallback to only return truthy values
(prevents cached undefined from short-circuiting user-scoped DB lookup)
- Update tests to document tenant isolation behavior after cache expiry
* style: fix import order per AGENTS.md conventions
Sort package imports shortest-to-longest, local imports longest-to-shortest
across MCPServersRegistry, ConnectionsRepository, MCPManager,
UserConnectionManager, and MCPServerInspector.
* fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures
Thread pre-resolved serverConfig from tool creation context into
callTool, removing dependency on the readThrough cache for config-source
servers. This fixes two issues:
- Cross-tenant contamination: the readThrough cache key was unscoped
(just serverName), so concurrent multi-tenant requests for same-named
servers would overwrite each other's entries
- TTL expiry: tool calls happening >5s after config resolution would
fail with "Configuration not found" because the readThrough entry
had expired
Changes:
- Add optional serverConfig param to MCPManager.callTool — uses
provided config directly, falling back to getServerConfig lookup
for YAML/user servers
- Thread serverConfig from createMCPTool through createToolInstance
closure to callTool
- Remove readThrough write from ensureSingleConfigServer — config-source
servers are only accessible via configServers param (tenant-scoped)
- Remove server-only readThrough fallback from getServerConfig
- Increase config cache hash from 8 to 16 hex chars (64-bit)
- Add isUserSourced boundary tests for all source/dbId combinations
- Fix double Object.keys call in getMCPTools controller
- Update test assertions for new getServerConfig behavior
* fix: cache base configs for config-server users; narrow upsertConfigCache error handling
- Refactor getAllServerConfigs to separate base config fetch (YAML + DB)
from config-server layering. Base configs are cached via readThroughCacheAll
regardless of whether configServers is provided, eliminating uncached
MongoDB queries per request for config-server users
- Narrow upsertConfigCache catch to duplicate-key errors only;
infrastructure errors (Redis timeouts, network failures) now propagate
instead of being silently swallowed, preventing inspection storms
during outages
* fix: restore correct merge order and document upsert error matching
- Restore YAML → Config → User DB precedence in getAllServerConfigs
(user DB servers have highest precedence, matching the JSDoc contract)
- Add source comment on upsertConfigCache duplicate-key detection
linking to the two cache implementations that define the error message
* feat: complete config-source server support across all execution paths
Wire configServers through the entire agent execution pipeline so
config-source MCP servers are fully functional — not just visible in
listings but executable in agent sessions.
- Thread configServers into handleTools.js agent tool pipeline: resolve
config servers from tenant context before MCP tool iteration, pass to
getServerConfig, createMCPTools, and createMCPTool
- Thread configServers into agent instructions pipeline:
applyContextToAgent → getMCPInstructionsForServers →
formatInstructionsForContext, resolved in client.js before agent
context application
- Add configServers param to createMCPTool and createMCPTools for
reconnect path fallback
- Add source field to redactServerSecrets allowlist for client UI
differentiation of server tiers
- Narrow invalidateConfigCache to only clear readThroughCacheAll (merged
results), preserving YAML individual-server readThrough entries
- Update context.spec.ts assertions for new configServers parameter
* fix: add missing mocks for config-source server dependencies in client.test.js
Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added
to client.js but not reflected in the test file's jest.mock declarations.
* fix: update formatInstructionsForContext assertions for configServers param
The test assertions expected formatInstructionsForContext to be called with
only the server names array, but it now receives configServers as a second
argument after the config-source server feature wiring.
* fix: move configServers resolution before MCP tool loop to avoid TDZ
configServers was declared with `let` after the first tool loop but
referenced inside it via getServerConfig(), causing a ReferenceError
temporal dead zone. Move declaration and resolution before the loop,
using tools.some(mcpToolPattern) to gate the async resolution.
* fix: address review findings — cache bypass, discoverServerTools gap, DRY
- #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via
readThroughCacheAll) instead of bypassing it when configServers is present.
Extracts user-DB entries from cached base by diffing against YAML keys
to maintain YAML → Config → User DB merge order without extra MongoDB calls.
- #3: Add configServers param to ToolDiscoveryOptions and thread it through
discoverServerTools → getServerConfig so config-source servers are
discoverable during OAuth reconnection flows.
- #6: Replace inline import() type annotations in context.ts with proper
import type { ParsedServerConfig } per AGENTS.md conventions.
- #7: Extract resolveConfigServers(req) helper in MCP.js and use it from
handleTools.js and client.js, eliminating the duplicated 6-line config
resolution pattern.
- #10: Restore removed "why" comment explaining getLoaded() vs getAll()
choice in getMCPSetupData — documents non-obvious correctness constraint.
- #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs.
* fix: consolidate imports, reorder constants, fix YAML-DB merge edge case
- Merge duplicate @librechat/data-schemas requires in MCP.js into one
- Move resolveConfigServers after module-level constants
- Fix getAllServerConfigs edge case where user-DB entry overriding a
YAML entry with the same name was excluded from userDbConfigs; now
uses reference equality check to detect DB-overwritten YAML keys
* fix: replace fragile string-match error detection with proper upsert method
Add upsert() to IServerConfigsRepositoryInterface and all implementations
(InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle
error message string match ('already exists in cache') in upsertConfigCache
that was the only thing preventing cross-process init races from silently
discarding inspection results.
Each implementation handles add-or-update atomically:
- InMemory: direct Map.set()
- Redis: direct cache.set()
- RedisAggregateKey: read-modify-write under write lock
- DB: delegates to update() (DB servers use explicit add() with ACL setup)
* fix: wire configServers through remaining HTTP endpoints
- getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig
- reinitialize route: resolve configServers before getServerConfig
- auth-values route: resolve configServers before getServerConfig
- getOAuthHeaders: accept configServers param, thread from callers
- Update mcp.spec.js tests to mock getAllServerConfigs for GET by name
* fix: thread serverConfig through getConnection for config-source servers
Config-source servers exist only in configCacheRepo, not in YAML cache or
DB. When callTool → getConnection → getUserConnection → getServerConfig
runs without configServers, it returns undefined and throws. Fix by
threading the pre-resolved serverConfig (providedConfig) from callTool
through getConnection → getUserConnection → createUserConnectionInternal,
using it as a fallback before the registry lookup.
* fix: thread configServers through reinit, reconnect, and tool definition paths
Wire configServers through every remaining call chain that creates or
reconnects MCP server connections:
- reinitMCPServer: accepts serverConfig and configServers, uses them for
getServerConfig fallback, getConnection, and discoverServerTools
- reconnectServer: accepts and passes configServers to reinitMCPServer
- createMCPTools/createMCPTool: pass configServers to reconnectServer
- ToolService.loadToolDefinitionsWrapper: resolves configServers from req,
passes to both reinitMCPServer call sites
- reinitialize route: passes serverConfig and configServers to reinitMCPServer
* fix: address review findings — simplify merge, harden error paths, fix log labels
- Simplify getAllServerConfigs merge: replace fragile reference-equality
loop with direct spread { ...yamlConfigs, ...configServers, ...base }
- Guard upsertConfigCache in lazyInitConfigServer catch block so cache
failures don't mask the original inspection error
- Deduplicate getYamlServerNames cold-start with promise dedup pattern
- Remove dead `if (!mcpConfig)` guard in getMCPSetupData
- Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error
messages — now uses this.namespace for correct Config/App labeling
- Remove misleading OAuth callback comment about readThrough cache
- Move resolveConfigServers after module-level constants in MCP.js
* fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label
- Clear yamlServerNamesPromise on rejection so transient cache errors
don't permanently prevent ensureConfigServers from working
- Skip reinspectServer for config-source servers (source: 'config') in
reinitMCPServer — they lack a CACHE/DB storage location; retry is
handled by CONFIG_STUB_RETRY_MS in ensureConfigServers
- Use source field instead of dbId for storageLocation derivation
- Fix remaining hardcoded "App" in reset() leaderCheck message
* fix: persist oauthHeaders in flow state for config-source OAuth servers
The OAuth callback route has no JWT auth context and cannot resolve
config-source server configs. Previously, getOAuthHeaders would silently
return {} for config-source servers, dropping custom token exchange headers.
Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow
initiation (which has auth context), and the callback reads them from
the stored flow state with a fallback to the registry lookup for
YAML/user-DB servers.
* fix: update tests for getMCPSetupData null guard removal and ToolService mock
- MCP.spec.js: update test to expect graceful handling of null mcpConfig
instead of a throw (getAllServerConfigs always returns an object)
- MCP.js: add defensive || {} for Object.entries(mcpConfig) in case of
null from test mocks
- ToolService.spec.js: add missing mock for ~/server/services/MCP
(resolveConfigServers)
* fix: address review findings — DRY, naming, logging, dead code, defensive guards
- #1: Simplify getAllServerConfigs to single getBaseServerConfigs call,
eliminating redundant double-fetch of cacheConfigsRepo.getAll()
- #2: Add warning log when oauthHeaders absent from OAuth callback flow state
- #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller
imports shared helper instead of reimplementing
- #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider
in createToolInstance — these are actively used, not unused
- #5: Log rejected results from ensureConfigServers Promise.allSettled
so cache errors are visible instead of silently dropped
- #6: Remove dead 'MCP config not found' error handlers from routes
- #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache
- #8: Remove logger.error from withTimeout to prevent double-logging timeouts
- #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message
- #12: Use spread instead of mutation in addServer for immutability consistency
- Add upsert mock to ensureConfigServers.test.ts DB mock
- Update route tests for resolveAllMcpConfigs import change
* fix: restore correct merge priority, use immutable spread, fix test mock
- getAllServerConfigs: { ...configServers, ...base } so userDB wins over
configServers, matching documented "User DB (highest)" priority
- lazyInitConfigServer: use immutable spread instead of direct mutation
for parsedConfig.source, consistent with addServer fix
- Fix test to mock getAllServerConfigs as {} instead of null, remove
unnecessary || {} defensive guard in getMCPSetupData
* fix: error handling, stable hashing, flatten nesting, remove dead param
- Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with
graceful {} fallback so transient DB/cache errors don't crash tool pipeline
- Sort keys in configCacheKey JSON.stringify for deterministic hashing
regardless of object property insertion order
- Flatten clearMcpConfigCache from 3 nested try-catch to early returns;
document that user connections are cleaned up lazily (accepted tradeoff)
- Remove dead configServers param from getAppToolFunctions (never passed)
- Add security rationale comment for source field in redactServerSecrets
* fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision
The array replacer in JSON.stringify acts as a property allowlist at
every nesting depth, silently dropping nested keys like headers['X-API-Key'],
oauth.client_secret, etc. Two configs with different nested values but
identical top-level structure produced the same hash, causing cross-tenant
cache hits and potential credential contamination.
Switch to a function replacer that recursively sorts keys at all depths
without dropping any properties.
Also document the known gap in getOAuthServers: config-source OAuth
servers are not covered by auto-reconnection or uninstall cleanup
because callers lack request context.
* fix: move clearMcpConfigCache to packages/api to eliminate circular dependency
The function only depends on MCPServersRegistry and MCPManager, both of
which live in packages/api. Import it directly from @librechat/api in
the CJS layer instead of using dynamic require('~/config').
* chore: imports/fields ordering
* fix: address review findings — error handling, targeted lookup, test gaps
- Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so
getAppConfig/getAllServerConfigs failures propagate instead of masking
infrastructure errors as empty server lists.
- Use targeted getServerConfig in getMCPServerById instead of fetching
all server configs for a single-server lookup.
- Forward configServers to inner createMCPTool calls so reconnect path
works for config-source servers.
- Update getAllServerConfigs JSDoc to document disjoint-key design.
- Add OAuth callback oauthHeaders fallback tests (flow state present
vs registry fallback).
- Add resolveConfigServers/resolveAllMcpConfigs unit tests covering
happy path and error propagation.
* fix: add getOAuthReconnectionManager mock to OAuth callback tests
* chore: imports ordering
2026-03-28 10:36:43 -04:00
|
|
|
'clearMcpConfigCache',
|
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation
Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.
- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
* feat: register tenant middleware and wrap startup/auth in runAsSystem()
- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
before user context is established (LDAP, SAML, OpenID, social login, AuthService)
* feat: thread tenantId through all getAppConfig callers
Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.
Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.
Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint
* feat: add config cache invalidation on admin mutations
- Add clearOverrideCache(tenantId?) to flush per-principal override caches
by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
(upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)
* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering
The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.
* fix: replace runAsSystem with baseOnly for pre-tenant code paths
App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.
All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.
* fix: address PR review findings — middleware ordering, types, cache safety
- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)
* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly
- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)
* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests
- requireJwtAuth: 5 tests verifying ALS tenant context is set after
passport auth, isolated between concurrent requests, and not set
when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
tenantId is threaded through, partial failure is handled gracefully,
and operations run in parallel via Promise.all (Finding 11)
* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping
- Forward passport errors in requireJwtAuth before entering tenant
middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
* fix: address second review — cache safety, code quality, test reliability
- Decouple cache invalidation from mutation response: fire-and-forget
with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock
* fix: move JSDoc to correct function after clearEndpointConfigCache extraction
* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry
Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.
* fix: remove return from tenantStorage.run to satisfy void middleware signature
* fix: address second review — cache safety, code quality, test reliability
- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
so partial failures are logged individually instead of producing one
undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
clearAppConfigCache rejects (not just the swallowed endpoint error)
* chore: reorder imports in index.js and app.js for consistency
- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
2026-03-26 17:35:00 -04:00
|
|
|
];
|
|
|
|
|
for (let i = 0; i < results.length; i++) {
|
|
|
|
|
if (results[i].status === 'rejected') {
|
|
|
|
|
logger.error(`[invalidateConfigCaches] ${labels[i]} failed:`, results[i].reason);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2025-08-26 12:10:18 -04:00
|
|
|
module.exports = {
|
|
|
|
|
getAppConfig,
|
|
|
|
|
clearAppConfigCache,
|
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation
Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.
- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
* feat: register tenant middleware and wrap startup/auth in runAsSystem()
- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
before user context is established (LDAP, SAML, OpenID, social login, AuthService)
* feat: thread tenantId through all getAppConfig callers
Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.
Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.
Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint
* feat: add config cache invalidation on admin mutations
- Add clearOverrideCache(tenantId?) to flush per-principal override caches
by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
(upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)
* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering
The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.
* fix: replace runAsSystem with baseOnly for pre-tenant code paths
App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.
All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.
* fix: address PR review findings — middleware ordering, types, cache safety
- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)
* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly
- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)
* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests
- requireJwtAuth: 5 tests verifying ALS tenant context is set after
passport auth, isolated between concurrent requests, and not set
when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
tenantId is threaded through, partial failure is handled gracefully,
and operations run in parallel via Promise.all (Finding 11)
* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping
- Forward passport errors in requireJwtAuth before entering tenant
middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
* fix: address second review — cache safety, code quality, test reliability
- Decouple cache invalidation from mutation response: fire-and-forget
with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock
* fix: move JSDoc to correct function after clearEndpointConfigCache extraction
* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry
Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.
* fix: remove return from tenantStorage.run to satisfy void middleware signature
* fix: address second review — cache safety, code quality, test reliability
- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
so partial failures are logged individually instead of producing one
undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
clearAppConfigCache rejects (not just the swallowed endpoint error)
* chore: reorder imports in index.js and app.js for consistency
- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
2026-03-26 17:35:00 -04:00
|
|
|
invalidateConfigCaches,
|
2025-08-26 12:10:18 -04:00
|
|
|
};
|