Commit graph

3983 commits

Author SHA1 Message Date
Danny Avila
6ecd1b510f
📎 fix: Route Unrecognized File Types via supportedMimeTypes Config (#12508)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
* fix: check supportedMimeTypes before routing unrecognized file types

In processAttachments, files not matching the hardcoded mime type
categories (image, PDF, video, audio) were silently dropped. Now
resolves the endpoint's file config and checks the file type against
supportedMimeTypes before routing to the documents pipeline. Files
not matching any config are still skipped (original behavior).

Closes #12482

* feat: encode generic document types for supported providers

Remove restrictive mime type filter in encodeAndFormatDocuments that
only allowed PDFs and application/* types. Add a generic encoding
path for non-PDF, non-Bedrock files using the provider's native
format (Anthropic base64 document, OpenAI file block, Google media
block). Files are already validated upstream by supportedMimeTypes.

* fix: guard file.type and cache file config in processAttachments

- Add file.type truthiness check before checkType to prevent
  coercion of null/undefined to string 'null'/'undefined'
- Cache mergedFileConfig and endpointFileConfig on the instance
  so addPreviousAttachments doesn't recompute per message

* refactor: harden generic document encoding with validation and tests

- Extract formatDocumentBlock helper to eliminate ~30 lines of
  duplicate provider-dispatch code between PDF and generic paths
- Add size validation in generic encoding path using
  configuredFileSizeLimit (was fetched but unused)
- Guard Bedrock from generic path — non-bedrockDocumentFormats
  types are now skipped instead of silently tracking metadata
- Only push metadata to result.files when a document block was
  actually created, preventing silent inconsistent state
- Enable Anthropic citations for text/plain, text/html,
  text/markdown (supported by Anthropic's document API)
- Fix != to !== for Providers.AZURE comparison
- Add 9 tests covering all four provider branches, Bedrock
  exclusion, size limit enforcement, and unhandled provider

* fix: resolve filename type mismatch in formatDocumentBlock

filename parameter is string | undefined but OpenAIFileBlock and
OpenAIInputFileBlock require string. Default to 'document' when
filename is undefined.

* fix: use endpoint name for file config lookup in processAttachments

Agent runs can have agent.provider set to a base provider (e.g.,
openAI) while agent.endpoint is a custom endpoint name. Using
provider for the getEndpointFileConfig lookup bypassed custom
endpoint supportedMimeTypes config. Now uses agent.endpoint,
matching the pattern in addDocuments.

* perf: filter non-Bedrock files before fetching streams

Bedrock only supports types in bedrockDocumentFormats. Previously,
getFileStream was called for all files and unsupported types were
discarded after download. Now pre-filters the file list for Bedrock
to avoid unnecessary network and memory overhead for large
unsupported attachments.

* refactor: clean up processAttachments file config handling

- Remove redundant ?? null intermediaries; use instance properties
  directly in the else-if condition
- Add JSDoc @type annotations for _mergedFileConfig and
  _endpointFileConfig in the constructor

* refactor: harden document encoding and add routing tests

- Hoist configuredFileSizeLimit above the loop to avoid recomputing
  mergeFileConfig per file
- Replace Buffer.from decode with base64 length formula in the
  generic size check to avoid unnecessary heap allocation
- Use nullish coalescing (??) for filename fallback
- Clean up test: remove unnecessary type cast, use createMockRequest
  helper for size-limit test
- Add 14 tests for processAttachments categorization logic covering
  supportedMimeTypes routing, null/undefined guards, standard type
  passthrough, and edge cases

* fix: use optional chaining for checkType in routing tests

FileConfig.checkType is typed as optional. Use optional chaining
to satisfy strict type checking.

* fix: skip stream fetches for unsupported providers, block Bedrock generic routing

- Return early from encodeAndFormatDocuments when the provider is
  neither document-supported nor Bedrock, avoiding unnecessary
  getFileStream calls for providers that would discard all results
- Add !isBedrock guard to the supportedMimeTypes fallback branch in
  processAttachments so permissive patterns like '.*' don't route
  non-Bedrock types into documents that would be silently dropped
- Add test for Bedrock + non-Bedrock-document-type skipping

* fix: respect supportedMimeTypes config for Bedrock endpoints

Remove !isBedrock guard from the generic supportedMimeTypes routing
branch. If a user configures permissive supportedMimeTypes for a
Bedrock endpoint, the upload validation already accepted the file.
The encoding layer pre-filters to Bedrock-supported types before
fetching streams, so unsupported types are handled there without
silently dropping files the user explicitly allowed.
2026-04-01 23:04:43 -04:00
Danny Avila
275af48592
🎯 fix: MCP Tool Misclassification from Action Delimiter Collision (#12512)
* fix: prevent MCP tools with `_action` in name from being misclassified as OpenAPI action tools

Add `isActionTool()` helper that checks for the `_action_` delimiter
while guarding against cross-delimiter collision with `_mcp_`. Replace
all `includes(actionDelimiter)` classification checks with the new
helper across backend and frontend.

* test: add coverage for MCP/action cross-delimiter collision

Verify that `isActionTool` correctly rejects MCP tool names containing
`_action` and that `loadAgentTools` does not filter them based on
`actionsEnabled`. Add ToolIcon and definitions test cases.

* fix: simplify isActionTool to handle all MCP name patterns

- Use `!toolName.includes('_mcp_')` instead of checking only after the
  first `_action_` occurrence, which missed MCP tools with `_action_` in
  the middle of their name (e.g. `get_action_data_mcp_myserver`).
- Reference `Constants.mcp_delimiter` value via a local const to avoid
  circular import from config.ts, with a comment explaining why.
- Remove dead `actionDelimiter` import from definitions.ts.
- Replace double-filter with single-pass partition in loadToolsForExecution.
- Add test for mid-name `_action_` collision case.

* fix: narrow MCP exclusion to delimiter position in isActionTool

Only reject when `_mcp_` appears after `_action_` (the MCP suffix
position). `_mcp_` before `_action_` is part of the operationId and
is valid — e.g. `sync_mcp_state_action_api---example---com` is a
legitimate action tool whose operationId happens to contain `_mcp_`.

* fix: document positional _mcp_ guard and known RFC-invalid domain limitation

Expand JSDoc on isActionTool to explain the action/MCP format
disambiguation and the theoretical false negative for non-RFC-compliant
domains containing `_mcp_`. Add test documenting this known edge case.
2026-04-01 22:36:21 -04:00
Danny Avila
611a1ef5dc
🏖️ fix: Sandpack ExternalResources for Static HTML Artifact Previews (#12509)
* fix: omit externalResources for static Sandpack previews

The Tailwind CDN URL lacks a file extension, causing Sandpack's
static template to throw a runtime injection error. Static previews
already load Tailwind via a script tag in the shared index.html,
so externalResources is unnecessary for them.

Closes #12507

* refactor: extract buildSandpackOptions and add tests

- Surgically omit only externalResources for static templates
  instead of discarding all sharedOptions, preventing future
  regression if new template-agnostic options are added.
- Extract options logic into a pure, testable helper function.
- Add unit tests covering all template/config combinations.

* chore: fix import order and pin test assertions

* fix: use URL fragment hint instead of omitting externalResources

Sandpack's static template regex detects resource type from the URL's
last file extension. The versioned CDN path (/3.4.17) matched ".17"
instead of ".js", throwing "Unable to determine file type". Rather
than omitting externalResources for static templates (which would
remove the only Tailwind injection path for HTML artifacts that don't
embed their own script tag), append a #tailwind.js fragment hint so
the regex matches ".js". Fragments are not sent to the server, so
the CDN response is unchanged.
2026-04-01 22:06:42 -04:00
Danny Avila
cb41ba14b2
🔁 fix: Pass recursionLimit to OpenAI-Compatible Agents API Endpoint (#12510)
* fix: pass recursionLimit to processStream in OpenAI-compatible agents API

The OpenAI-compatible endpoint never passed recursionLimit to LangGraph's
processStream(), silently capping all API-based agent calls at the default
25 steps. Mirror the 3-step cascade already used by the UI path (client.js):
yaml config default → per-agent DB override → max cap.

* refactor: extract resolveRecursionLimit into shared utility

Extract the 3-step recursion limit cascade into a shared
resolveRecursionLimit() function in @librechat/api. Both openai.js and
client.js now call this single source of truth.

Also fixes falsy-guard edge cases where recursion_limit=0 or
maxRecursionLimit=0 would silently misbehave, by using explicit
typeof + positive checks.

Includes unit tests covering all cascade branches and edge cases.

* refactor: use resolveRecursionLimit in openai.js and client.js

Replace duplicated cascade logic in both controllers with the shared
resolveRecursionLimit() utility from @librechat/api.

In openai.js: hoist agentsEConfig to avoid double property walk,
remove displaced comment, add integration test assertions.

In client.js: remove inline cascade that was overriding config
after initial assignment.

* fix: hoist processStream mock for test accessibility

The processStream mock was created inline inside mockResolvedValue,
making it inaccessible via createRun.mock.results (which returns
the Promise, not the resolved value). Hoist it to a module-level
variable so tests can assert on it directly.

* test: improve test isolation and boundary coverage

Use mockReturnValueOnce instead of mockReturnValue to prevent mock
leaking across test boundaries. Add boundary tests for downward
agent override and exact-match maxRecursionLimit.
2026-04-01 21:13:07 -04:00
Danny Avila
aa575b274b
🛡️ refactor: Self-Healing Tenant Isolation Update Guard (#12506)
* refactor: self-healing tenant isolation update guard

Replace the strict throw-on-any-tenantId guard with a
strip-or-throw approach:

- $set/$setOnInsert: strip when value matches current tenant
  or no context is active; throw only on cross-tenant mutations
- $unset/$rename: always strip (unsetting/renaming tenantId
  is never valid)
- Top-level tenantId: same logic as $set

This eliminates the entire class of "tenantId in update payload"
bugs at the plugin level while preserving the cross-tenant
security invariant.

* test: update mutation guard tests for self-healing behavior

- Convert same-tenant $set/$setOnInsert tests to expect silent
  stripping instead of throws
- Convert $unset test to expect silent stripping
- Add cross-tenant throw tests for $set, $setOnInsert, top-level
- Add same-tenant stripping tests for $set, $setOnInsert, top-level
- Add $rename stripping test
- Add no-context stripping test
- Update error message assertions to match new cross-tenant message

* revert: remove call-site tenantId stripping patches

Revert the per-call-site tenantId stripping from #12498 and
the excludedKeys patch from #12501. These are no longer needed
since the self-healing guard handles tenantId in update payloads
at the plugin level.

Reverted patches:
- conversation.ts: delete update.tenantId in saveConvo(),
  tenantId destructuring in bulkSaveConvos()
- message.ts: delete update.tenantId in saveMessage() and
  recordMessage(), tenantId destructuring in bulkSaveMessages()
  and updateMessage()
- config.ts: tenantId in excludedKeys Set
- config.spec.ts: tenantId in excludedKeys test assertion

* fix: strip tenantId from update documents in tenantSafeBulkWrite

Mongoose middleware does not fire for bulkWrite, so the plugin-level
guard never sees update payloads in bulk operations. Extend
injectTenantId() to strip tenantId from update documents for
updateOne/updateMany operations, preventing cross-tenant overwrites.

* refactor: rename guard, add empty-op cleanup and strict-mode warning

- Rename assertNoTenantIdMutation to sanitizeTenantIdMutation
- Remove empty operator objects after stripping to avoid MongoDB errors
- Log warning in strict mode when stripping tenantId without context
- Fix $setOnInsert test to use upsert:true with non-matching filter

* test: fix bulk-save tests and add negative excludedKeys assertion

- Wrap bulkSaveConvos/bulkSaveMessages tests in tenantStorage.run()
  to exercise the actual multi-tenant stripping path
- Assert tenantId equals the real tenant, not undefined
- Add negative assertion: excludedKeys must NOT contain tenantId

* fix: type-safe tenantId stripping in tenantSafeBulkWrite

- Fix TS2345 error: replace conditional type inference with
  UpdateQuery<Record<string, unknown>> for stripTenantIdFromUpdate
- Handle empty updates after stripping (e.g., $set: { tenantId } as
  sole field) by filtering null ops from the bulk array
- Add 4 tests for bulk update tenantId stripping: plain-object update,
  $set stripping, $unset stripping, and sole-field-in-$set edge case

* fix: resolve TS2345 in stripTenantIdFromUpdate parameter type

Use Record<string, unknown> instead of UpdateQuery<> to avoid
type incompatibility with Mongoose's AnyObject-based UpdateQuery
resolution in CI.

* fix: strip tenantId from bulk updates unconditionally

Separate sanitization from injection in tenantSafeBulkWrite:
tenantId is now stripped from all update documents before any
tenant-context checks, closing the gap where no-context and
system-context paths passed caller-supplied tenantId through
to MongoDB unmodified.

* refactor: address review findings in tenant isolation

- Fix early-return gap in stripTenantIdFromUpdate that skipped
  operator-level tenantId when top-level was also present
- Lazy-allocate copy in stripTenantIdFromUpdate (no allocation
  when no tenantId is present)
- Document behavioral asymmetry: plugin throws on cross-tenant,
  bulkWrite strips silently (intentional, documented in JSDoc)
- Remove double JSDoc on injectTenantId
- Remove redundant cast in stripTenantIdFromUpdate
- Use shared frozen EMPTY_BULK_RESULT constant
- Remove Record<string, unknown> annotation in recordMessage
- Isolate bulkSave* tests: pre-create docs then update with
  cross-tenant payload, read via runAsSystem to prove stripping
  is independent of filter injection

* fix: no-op empty updates after tenantId sanitization

When tenantId is the sole field in an update (e.g., { $set: { tenantId } }),
sanitization leaves an empty update object that would fail with
"Update document requires atomic operators." The updateGuard now
detects this and short-circuits the query by adding an unmatchable
filter condition and disabling upsert, matching the bulk-write
handling that filters out null ops.

* refactor: remove dead logger.warn branches, add mixed-case test

- Remove unreachable logger.warn calls in sanitizeTenantIdMutation:
  queryMiddleware throws before updateGuard in strict+no-context,
  and isStrict() is false in non-strict+no-context
- Add test for combined top-level + operator-level tenantId stripping
  to lock in the early-return fix

* feat: ESLint rule to ban raw bulkWrite and collection.* in data-schemas

Add no-restricted-syntax rules to the data-schemas ESLint config that
flag direct Model.bulkWrite() and Model.collection.* calls. These
bypass Mongoose middleware and the tenant isolation plugin — all bulk
writes must use tenantSafeBulkWrite() instead.

Test files are excluded since they intentionally use raw driver calls
for fixture setup.

Also migrate the one remaining raw bulkWrite in seedSystemGrants() to
use tenantSafeBulkWrite() for consistency.

* test: add findByIdAndUpdate coverage to mutation guard tests

* fix: keep tenantSafeBulkWrite in seedSystemGrants, fix ESLint config

- Revert to tenantSafeBulkWrite in seedSystemGrants (always runs
  under runAsSystem, so the wrapper passes through correctly)
- Split data-schemas ESLint config: shared TS rules for all files,
  no-restricted-syntax only for production non-wrapper files
- Fix unused destructure vars to use _tenantId pattern
2026-04-01 19:07:52 -04:00
Danny Avila
7b368916d5
🔑 fix: Auth-Aware Startup Config Caching for Fresh Sessions (#12505)
* fix: auth-aware config caching for fresh sessions

- Add auth state to startup config query key via shared `startupConfigKey`
  builder so login (unauthenticated) and chat (authenticated) configs are
  cached independently
- Disable queries during login onMutate to prevent premature unauthenticated
  refetches after cache clear
- Re-enable queries in setUserContext only after setTokenHeader runs, with
  positive-only guard to avoid redundant disable on logout
- Update all getQueryData call sites to use the shared key builder
- Fall back to getConfigDefaults().interface in useEndpoints, hoisted to
  module-level constant to avoid per-render recomputation

* fix: address review findings for auth-aware config caching

- Move defaultInterface const after all imports in ModelSelector.tsx
- Remove dead QueryKeys import, use import type for TStartupConfig
  in ImportConversations.tsx
- Spread real exports in useQueryParams.spec.ts mock to preserve
  startupConfigKey, fixing TypeError in all 6 tests

* chore: import order

* fix: re-enable queries on login failure

When login fails, onSuccess never fires so queriesEnabled stays
false. Re-enable in onError so the login page can re-fetch config
(needed for LDAP username validation and social login options).
2026-04-01 17:20:39 -04:00
Danny Avila
c4b5dedb77
🔒 fix: Exclude Unnecessary fields from Conversation $unset (#12501)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
`BaseClient.js` iterates existing conversation keys to build `unsetFields`
for removal when `endpointOptions` doesn't include them. When tenant
isolation stamps `tenantId` on the document, it gets swept into `$unset`,
triggering `assertNoTenantIdMutation`. Adding `tenantId` to `excludedKeys`
prevents this — it's a system field, not an endpoint option.
2026-04-01 13:01:02 -04:00
Danny Avila
5e789f589f
🔏 fix: Strip Unnecessary Fields Across Write Paths in Conversation & Message Methods (#12498)
* fix: Exclude field from conversation and message updates

* fix: Remove tenantId from conversation and message update objects to prevent unintended data exposure.
* refactor: Adjust update logic in createConversationMethods and createMessageMethods to ensure tenantId is not included in the updates, maintaining data integrity.

* fix: Strip tenantId from all write paths in conversation and message methods

Extends the existing tenantId stripping to bulkSaveConvos, bulkSaveMessages,
recordMessage, and updateMessage — all of which previously passed caller-supplied
tenantId straight through to the update document. Renames discard alias from _t
to _tenantId for clarity. Adds regression tests for all six write paths.

* fix: Eliminate double-copy overhead and strengthen test assertions

Replace destructure-then-spread with spread-once-then-delete for saveConvo,
saveMessage, and recordMessage — removes one O(n) copy per call on hot paths.
Add missing not-null and positive data assertions to tenantId stripping tests.

* test: Add positive data assertions to bulkSaveMessages and recordMessage tests
2026-04-01 11:16:39 -04:00
Danny Avila
419613fdaf 📋 chore: Move project instructions from AGENTS.md to CLAUDE.md
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
2026-03-31 21:50:38 -04:00
Danny Avila
f8405e731b
🗂️ fix: Allow Empty-Overrides Scope Creation in Admin Config (#12492)
* fix: Allow empty-overrides scope creation when priority is provided

The upsertConfigOverrides handler short-circuited when overrides was
empty, returning a plain message instead of creating the config document.
This broke the admin panel's "create blank scope" flow which sends
`{ overrides: {}, priority: N }` — the missing `config` property in the
response caused an `_id` error on the client.

The early return now only triggers when both overrides are empty and no
priority is provided. Per-section permission checks are scoped to cases
where override sections are actually present.

* test: Add tests for empty-overrides scope creation with priority

* test: Address review nits for empty-overrides scope tests

- Add res.statusCode/res.body assertions to capability-check test
- Add 403/401 tests for empty overrides + priority path
- Use mockResolvedValue(null) for consistency on bare jest.fn()
- Remove narrating comment; fold intent into test name
2026-03-31 21:46:48 -04:00
Dustin Healy
2451bf54cf
🛡️ fix: Restrict System Grants to Role Principals (#12491)
* 🛡️ fix: restrict system grants to role principals only

Narrows GrantPrincipalType to PrincipalType.ROLE, rejecting GROUP and
USER with 400. Removes grant cascade cleanup from group/user deletion
handlers and their route wiring since only roles can hold grants.

* 🛡️ fix: address review findings for grants roles-only restriction

Add missing GROUP rejection test for revokeGrant (symmetric with
getPrincipalGrants and assignGrant coverage), add extensibility comment
to GrantPrincipalType, and document the checkRoleExists guard.
2026-03-31 19:25:14 -04:00
Danny Avila
2e706ebcb3
⚖️ refactor: Split Config Route into Unauthenticated and Authenticated Paths (#12490)
* refactor: split /api/config into unauthenticated and authenticated response paths

- Replace preAuthTenantMiddleware with optionalJwtAuth on the /api/config
  route so the handler can detect whether the request is authenticated
- When unauthenticated: call getAppConfig({ baseOnly: true }) for zero DB
  queries, return only login-relevant fields (social logins, turnstile,
  privacy policy / terms of service from interface config)
- When authenticated: call getAppConfig({ role, userId, tenantId }) to
  resolve per-user DB overrides (USER + ROLE + GROUP + PUBLIC principals),
  return full payload including modelSpecs, balance, webSearch, etc.
- Extract buildSharedPayload() and addWebSearchConfig() helpers to avoid
  duplication between the two code paths
- Fixes per-user balance overrides not appearing in the frontend because
  userId was never passed to getAppConfig (follow-up to #12474)

* test: rewrite config route tests for unauthenticated vs authenticated paths

- Replace the previously-skipped supertest tests with proper mocked tests
- Cover unauthenticated path: baseOnly config call, minimal payload,
  interface subset (privacyPolicy/termsOfService only), exclusion of
  authenticated-only fields
- Cover authenticated path: getAppConfig called with userId, full payload
  including modelSpecs/balance/webSearch, per-user balance override merging

* fix: address review findings — restore multi-tenant support, improve tests

- Chain preAuthTenantMiddleware back before optionalJwtAuth on /api/config
  so unauthenticated requests in multi-tenant deployments still get
  tenant-scoped config via X-Tenant-Id header (Finding #1)
- Use getAppConfig({ tenantId }) instead of getAppConfig({ baseOnly: true })
  when a tenant context is present; fall back to baseOnly for single-tenant
- Fix @type annotation: unauthenticated payload is Partial<TStartupConfig>
- Refactor addWebSearchConfig into pure buildWebSearchConfig that returns a
  value instead of mutating the payload argument
- Hoist isBirthday() to module level
- Remove inline narration comments
- Assert tenantId propagation in tests, including getTenantId fallback and
  user.tenantId preference
- Add error-path tests for both unauthenticated and authenticated branches
- Expand afterEach env var cleanup for proper test isolation

* test: fix mock isolation and add tenant-scoped response test

- Replace jest.clearAllMocks() with jest.resetAllMocks() so
  mockReturnValue implementations don't leak between tests
- Add test verifying tenant-scoped socialLogins and turnstile are
  correctly mapped in the unauthenticated response

* fix: add optionalJwtAuth to /api/config in experimental.js

Without this middleware, req.user is never populated in the experimental
cluster entrypoint, so authenticated users always receive the minimal
unauthenticated config payload.
2026-03-31 19:22:51 -04:00
Danny Avila
7181174c3b
🖼️ fix: Message Icon Flickering from Context-Triggered Re-renders (#12489)
* perf: add custom memo comparator to MessageIcon for stable re-render gating

MessageIcon receives full `agent` and `assistant` objects as props from
useMessageActions, which recomputes them when AgentsMapContext or
AssistantsMapContext update (e.g., react-query refetch on window focus).
These context-triggered re-renders bypass MessageRender's React.memo,
producing new object references for agent/assistant even when the
underlying data is unchanged. The default shallow comparison in
MessageIcon's memo then fails, causing unnecessary re-renders that
manifest as visible icon flickering.

Add arePropsEqual comparator that checks only the fields MessageIcon
actually uses (name, avatar filepath, metadata avatar) instead of
object identity, so the component correctly bails out when icon-relevant
data hasn't changed.

* refactor: export arePropsEqual, drop redundant useMemos, add JSDoc

- Export arePropsEqual so it can be tested in isolation
- Add JSDoc documenting which fields are intentionally omitted (id)
  and why iconData uses reference equality
- Replace five trivial useMemo calls (agent?.name ?? '', etc.) with
  direct computed values — the custom comparator already gates
  re-renders, so these memos only add closure/dep-array overhead
  without ever providing cache hits
- Remove unused React import

* test: add unit tests for MessageIcon arePropsEqual comparator

Exercise the custom memo comparator to ensure:
- New object references with same display fields are treated as equal
- Changed name or avatar filepath triggers re-render
- iconData reference change triggers re-render
- undefined→defined agent with undefined fields is treated as equal

* fix: replace nested ternary with if-else for eslint compliance

* test: add comment on subtle equality invariant and defined→undefined case

* perf: compare iconData by field values instead of reference

iconData is a useMemo'd object from the parent, but comparing by
reference still causes unnecessary re-renders when the parent
recomputes the memo with identical primitive values. Compare all
five fields individually (endpoint, model, iconURL, modelLabel,
isCreatedByUser) for consistency with how agent/assistant are handled.
2026-03-31 18:24:18 -04:00
Danny Avila
aa7e5ba051
📦 chore: bump axios to exact v1.13.6, @librechat/agents to v3.1.63, @aws-sdk/client-bedrock-runtime to v3.1013.0 (#12488)
Some checks failed
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
Publish `@librechat/client` to NPM / build-and-publish (push) Has been cancelled
Publish `librechat-data-provider` to NPM / build (push) Has been cancelled
Publish `@librechat/data-schemas` to NPM / build-and-publish (push) Has been cancelled
Publish `librechat-data-provider` to NPM / publish-npm (push) Has been cancelled
* 🔧 chore: bump @librechat/agents to v3.1.63

* 🔧 chore: update axios dependency to exact version 1.13.6

* 🔧 chore: update @aws-sdk/client-bedrock-runtime to version 3.1013.0 in package.json and package-lock.json

- Bump the version of @aws-sdk/client-bedrock-runtime across package.json files in api and packages/api to ensure compatibility with the latest features and fixes.
- Reflect the updated version in package-lock.json to maintain consistency in dependency resolution.

* 🔧 chore: update axios dependency to version 1.13.6 across multiple package.json and package-lock.json files

- Bump axios version from ^1.13.5 to 1.13.6 in package.json and package-lock.json for improved performance and security.
- Ensure consistency in dependency resolution across the project by updating all relevant files.
2026-03-31 14:49:31 -04:00
Danny Avila
d9f216c11a
📦 chore: bump dependabot packages (#12487)
* chore: Update Handlebars and package versions in package-lock.json and package.json

- Upgrade Handlebars from version 4.7.7 to 4.7.9 in both package-lock.json and package.json for improved performance and security.
- Update librechat-data-provider version from 0.8.401 to 0.8.406 in package-lock.json.
- Update @librechat/data-schemas version from 0.0.40 to 0.0.48 in package-lock.json.

* chore: Upgrade @happy-dom/jest-environment and happy-dom versions in package-lock.json and package.json

- Update @happy-dom/jest-environment from version 20.8.3 to 20.8.9 for improved compatibility.
- Upgrade happy-dom from version 20.8.3 to 20.8.9 to ensure consistency across dependencies.

* chore: Upgrade @rollup/plugin-terser to version 1.0.0 in package-lock.json and package.json

- Update @rollup/plugin-terser from version 0.4.4 to 1.0.0 in both package-lock.json and package.json for improved performance and compatibility.
- Reflect the new version in the dependencies of data-provider and data-schemas packages.

* chore: Upgrade rollup-plugin-typescript2 to version 0.37.0 in package-lock.json and package.json

- Update rollup-plugin-typescript2 from version 0.35.0 to 0.37.0 in package-lock.json and all relevant package.json files for improved compatibility and performance.
- Adjust dependencies for semver and tslib to their latest versions in line with the rollup-plugin-typescript2 upgrade.

* chore: Upgrade nodemailer to version 8.0.4 in package-lock.json and package.json

- Update nodemailer from version 7.0.11 to 8.0.4 in both package-lock.json and package.json to enhance functionality and security.

* chore: Upgrade picomatch, yaml, brace-expansion versions in package-lock.json

- Update picomatch from version 4.0.3 to 4.0.4 across multiple dependencies for improved functionality.
- Upgrade brace-expansion from version 2.0.2 to 2.0.3 and from 5.0.3 to 5.0.5 to enhance compatibility and performance.
- Update yaml from version 1.10.2 to 1.10.3 for better stability.
2026-03-31 13:36:20 -04:00
Danny Avila
c0ce7fee91
🚫 refactor: Remove Interface Config from Override Processing (#12473)
Add INTERFACE_PERMISSION_FIELDS set defining the interface fields that
seed role permissions at startup (prompts, agents, marketplace, etc.).
These fields are now stripped from DB config overrides in the merge
layer because updateInterfacePermissions() only runs at boot — DB
overrides for these fields create a client/server permission mismatch.

Pure UI fields (endpointsMenu, modelSelect, parameters, presets,
sidePanel, customWelcome, etc.) continue to work in overrides as
before.

YAML startup path is completely unaffected.
2026-03-31 11:07:31 -04:00
Dustin Healy
3d1b883e9d
👨‍👨‍👦‍👦 feat: Admin Users API Endpoints (#12446)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
* feat: add admin user management endpoints

Add /api/admin/users with list, search, and delete handlers gated by
ACCESS_ADMIN + READ_USERS/MANAGE_USERS system grants. Handler factory
in packages/api uses findUsers, countUsers, and deleteUserById from
data-schemas.

* fix: address convention violations in admin users handlers

* fix: add pagination, self-deletion guard, and DB-level search limit

- listUsers now uses parsePagination + countUsers for proper pagination
  matching the roles/groups pattern
- findUsers extended with optional limit/offset options
- deleteUser returns 403 when caller tries to delete own account
- searchUsers passes limit to DB query instead of fetching all and
  slicing in JS
- Fix import ordering per CLAUDE.md, complete logger mock
- Replace fabricated date fallback with undefined

* fix: deterministic sort, null-safe pagination, consistent search filter

- Add sort option to findUsers; listUsers sorts by createdAt desc for
  deterministic pagination
- Use != null guards for offset/limit to handle zero values correctly
- Remove username from search filter since it is not in the projection
  or AdminUserSearchResult response type

* fix: last-admin deletion guard and search query max-length

- Prevent deleting the last admin user (look up target role, count
  admins, reject with 400 if count <= 1)
- Cap search query at 200 characters to prevent regex DoS
- Add tests for both guards

* fix: include missing capability name in 403 Forbidden response

* fix: cascade user deletion cleanup, search username, parallel capability checks

- Cascade Config, AclEntry, and SystemGrant cleanup on user deletion
  (matching the pattern in roles/groups handlers)
- Add username to admin search $or filter for parity with searchUsers
- Parallelize READ_* capability checks in listAllGrants with Promise.all

* fix: TOCTOU safety net, capability info leak, DRY/style cleanup, data-layer tests

- Add post-delete admin recount with CRITICAL log if race leaves 0 admins
- Revert capability name from 403 response to server-side log only
- Document thin deleteUserById limitation (full cascade is a future task)
- DRY: extract query.trim() to local variable in searchUsersHandler
- Add username to search projection, response type, and AdminUserSearchResult
- Functional filter/map in grants.ts parallel capability check
- Consistent null guards and limit>0 guard in findUsers options
- Fallback for empty result.message on delete response
- Fix mockUser() to generate unique _id per call
- Break long destructuring across multiple lines
- Assert countUsers filter and non-admin skip in delete tests
- Add data-layer tests for findUsers limit, offset, sort, and pagination

* chore: comment out admin delete user endpoint (out of scope)

* fix: cast USER principalId to ObjectId for ACL entry cleanup

ACL entries store USER principalId as ObjectId (via grantPermission casting),
but deleteAclEntries is a raw deleteMany that passes the filter through.
Passing a string won't match stored ObjectIds, leaving orphaned entries.

* chore: comment out unused requireManageUsers alongside disabled delete route

* fix: add missing logger.warn mock in capabilities test

* fix: harden admin users handlers — type safety, response consistency, test coverage

- Unify response shape: AdminUserSearchResult.userId → id, add AdminUserListItem type
- Fix unsafe req.query type assertion in searchUsersHandler (typeof guards)
- Anchor search regex with ^ for prefix matching (enables index usage)
- Add total/capped to search response for truncation signaling
- Add parseInt radix, remove redundant new Date() wrap
- Add tests: countUsers throw, countUsers call args, array query param, capped flag

* fix: scope deleteGrantsForPrincipal to tenant, deterministic search sort, align test mocks

- Add tenantId option to AdminUsersDeps.deleteGrantsForPrincipal and
  pass req.user.tenantId at the call site, matching the pattern already
  used by the roles and groups handlers
- Add sort: { name: 1 } to searchUsersHandler for deterministic results
- Align test mock deleteUserById messages with production output
  ('User was deleted successfully.')
- Make capped-results test explicitly set limit: '20' instead of
  relying on the implicit default

* test: add tenantId propagation test for deleteGrantsForPrincipal

Add tenantId to createReqRes user type and test that a non-undefined
tenantId is threaded through to deleteGrantsForPrincipal.

* test: remove redundant deleteUserById override in tenantId test

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-30 23:06:50 -04:00
Danny Avila
fd01dfc083
💰 fix: Lazy-Initialize Balance Record at Check Time for Overrides (#12474)
* fix: Lazy-initialize balance record when missing at check time

When balance is configured via admin panel DB overrides, users with
existing sessions never pass through the login middleware that creates
their balance record. This causes checkBalanceRecord to find no record
and return balance: 0, blocking the user.

Add optional balanceConfig and upsertBalanceFields deps to
CheckBalanceDeps. When no balance record exists but startBalance is
configured, lazily create the record instead of returning canSpend: false.

Pass the new deps from BaseClient, chatV1, and chatV2 callers.

* test: Add checkBalance lazy initialization tests

Cover lazy balance init scenarios: successful init with startBalance,
insufficient startBalance, missing config fallback, undefined
startBalance, missing upsertBalanceFields dep, and startBalance of 0.

* fix: Address review findings for lazy balance initialization

- Use canonical BalanceConfig and IBalanceUpdate types from
  @librechat/data-schemas instead of inline type definitions
- Include auto-refill fields (autoRefillEnabled, refillIntervalValue,
  refillIntervalUnit, refillAmount, lastRefill) during lazy init,
  mirroring the login middleware's buildUpdateFields logic
- Add try/catch around upsertBalanceFields with graceful fallback to
  canSpend: false on DB errors
- Read balance from DB return value instead of raw startBalance constant
- Fix misleading test names to describe observable throw behavior
- Add tests: upsertBalanceFields rejection, auto-refill field inclusion,
  DB-returned balance value, and logViolation assertions

* fix: Address second review pass findings

- Fix import ordering: package type imports before local type imports
- Remove misleading comment on DB-fallback test, rename for clarity
- Add logViolation assertion to insufficient-balance lazy-init test
- Add test for partial auto-refill config (autoRefillEnabled without
  required dependent fields)

* refactor: Replace createMockReqRes factory with describe-scoped consts

Replace zero-argument factory with plain const declarations using
direct type casts instead of double-cast through unknown.

* fix: Sort local type imports longest-first, add missing logViolation assertion

- Reorder local type imports in spec file per AGENTS.md (longest to
  shortest within sub-group)
- Add logViolation assertion to startBalance: 0 test for consistent
  violation payload coverage across all throw paths
2026-03-30 22:51:07 -04:00
Danny Avila
4f37e8adb9
🔐 feat: Admin Auth Support for SAML and Social OAuth Providers (#12472)
* refactor: Add existingUsersOnly support to social and SAML auth callbacks

- Add `existingUsersOnly` option to the `socialLogin` handler factory to
  reject unknown users instead of creating new accounts
- Refactor SAML strategy callback into `createSamlCallback(existingUsersOnly)`
  factory function, mirroring the OpenID `createOpenIDCallback` pattern
- Extract shared SAML config into `getBaseSamlConfig()` helper
- Register `samlAdmin` passport strategy with `existingUsersOnly: true` and
  admin-specific callback URL, called automatically from `setupSaml()`

* feat: Register admin OAuth strategy variants for all social providers

- Add admin strategy exports to Google, GitHub, Discord, Facebook, and
  Apple strategy files with admin callback URLs and existingUsersOnly
- Extract provider configs into reusable helpers to avoid duplication
  between regular and admin strategy constructors
- Re-export all admin strategy factories from strategies/index.js
- Register admin passport strategies (googleAdmin, githubAdmin, etc.)
  alongside regular ones in socialLogins.js when env vars are present

* feat: Add admin auth routes for SAML and social OAuth providers

- Add initiation and callback routes for SAML, Google, GitHub, Discord,
  Facebook, and Apple to the admin auth router
- Each provider follows the exchange code + PKCE pattern established by
  OpenID admin auth: store PKCE challenge on initiation, retrieve on
  callback, generate exchange code for the admin panel
- SAML and Apple use POST callbacks with state extracted from
  req.body.RelayState and req.body.state respectively
- Extract storePkceChallenge(), retrievePkceChallenge(), and
  generateState() helpers; refactor existing OpenID routes to use them
- All callback chains enforce requireAdminAccess, setBalanceConfig,
  checkDomainAllowed, and the shared createOAuthHandler
- No changes needed to the generic POST /oauth/exchange endpoint

* fix: Update SAML strategy test to handle dual strategy registration

setupSaml() now registers both 'saml' and 'samlAdmin' strategies,
causing the SamlStrategy mock to be called twice. The verifyCallback
variable was getting overwritten with the admin callback (which has
existingUsersOnly: true), making all new-user tests fail.

Fix: capture only the first callback per setupSaml() call and reset
between tests.

* fix: Address review findings for admin OAuth strategy changes

- Fix existingUsersOnly rejection in socialLogin.js to use
  cb(null, false, { message }) instead of cb(error), ensuring
  passport's failureRedirect fires correctly for admin flows
- Consolidate duplicate require() calls in strategies/index.js by
  destructuring admin exports from the already-imported default export
- Pass pre-parsed baseConfig to setupSamlAdmin() to avoid redundant
  certificate file I/O at startup
- Extract getGoogleConfig() helper in googleStrategy.js for consistency
  with all other provider strategy files
- Replace randomState() (openid-client) with generateState() (crypto)
  in the OpenID admin route for consistency with all other providers,
  and remove the now-unused openid-client import

* Reorder import statements in auth.js
2026-03-30 22:49:44 -04:00
Danny Avila
2bf0f892d6
🛡️ fix: Add Origin Binding to Admin OAuth Exchange Codes (#12469)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
* fix(auth): add origin binding to admin OAuth exchange codes

Bind admin OAuth exchange codes to the admin panel's origin at
generation time and validate the origin on redemption. This prevents
an intercepted code (via referrer leakage, logs, or network capture)
from being redeemed by a different origin within the 30-second TTL.

- Store the admin panel origin alongside the exchange code in cache
- Extract the request origin (from Origin/Referer headers) on the
  exchange endpoint and pass it for validation
- Reject code redemption when the request origin does not match the
  stored origin (code is still consumed to prevent replay)
- Backward compatible: codes without a stored origin are accepted

* fix(auth): add PKCE proof-of-possession to admin OAuth exchange codes

Add a PKCE-like code_challenge/code_verifier flow to the admin OAuth
exchange so that intercepting the exchange code alone is insufficient
to redeem it. The admin panel generates a code_verifier (stored in its
HttpOnly session cookie) and sends sha256(verifier) as code_challenge
through the OAuth initiation URL. LibreChat stores the challenge keyed
by OAuth state and attaches it to the exchange code. On redemption, the
admin panel sends the verifier and LibreChat verifies the hash match.

- Add verifyCodeChallenge() helper using SHA-256
- Store code_challenge in ADMIN_OAUTH_EXCHANGE cache (pkce: prefix, 5min TTL)
- Capture OAuth state in callback middleware before passport processes it
- Accept code_verifier in exchange endpoint body
- Backward compatible: no challenge stored → PKCE check skipped

* fix(auth): harden PKCE and origin binding in admin OAuth exchange

- Await cache.set for PKCE challenge storage with error handling
- Use crypto.timingSafeEqual for PKCE hash comparison
- Drop case-insensitive flag from hex validation regexes
- Add code_verifier length validation (max 512 chars)
- Normalize Origin header via URL parsing in resolveRequestOrigin
- Add test for undefined requestOrigin rejection
- Clarify JSDoc: hex-encoded SHA-256, not RFC 7636 S256

* fix(auth): fail closed on PKCE callback cache errors, clean up origin/buffer handling

- Callback middleware now redirects to error URL on cache.get failure
  instead of silently continuing without PKCE challenge
- resolveRequestOrigin returns undefined (not raw header) on parse failure
- Remove dead try/catch around Buffer.from which never throws for string input

* chore(auth): remove narration comments, scope eslint-disable to lines

* chore(auth): narrow query.state to string, remove narration comments in exchange.ts

* fix(auth): address review findings — warn on missing PKCE challenge, validate verifier length, deduplicate URL parse

- Log warning when OAuth state is present but no PKCE challenge found
- Add minimum length check (>= 1) on code_verifier input validation
- Update POST /oauth/exchange JSDoc to document code_verifier param
- Deduplicate new URL(redirectUri) parse in createOAuthHandler
- Restore intent comment on pre-delete pattern in exchangeAdminCode

* test(auth): replace mock cache with real Keyv, remove all as-any casts

- Use real Keyv in-memory store instead of hand-rolled Map mock
- Replace jest.fn mocks with jest.spyOn on real Keyv instance
- Remove redundant store.has() assertion, use cache.get() instead
- Eliminate all eslint-disable and as-any suppressions
- User fixture no longer needs any cast (Keyv accepts plain objects)

* fix(auth): add IUser type cast for test fixture to satisfy tsc
2026-03-30 16:54:00 -04:00
Danny Avila
1455f15b7b
📄 feat: Model-Aware Bedrock Document Size Validation (#12467)
* 📄 fix: Model-Aware Bedrock Document Size Validation

Remove the hard 4.5MB clamp on Bedrock document uploads so that
Claude 4+ (PDF) and Nova (PDF/DOCX) models can accept larger files
per AWS documentation. The default 4.5MB limit is preserved for
other models/formats, and fileConfig can now override it in either
direction—consistent with every other provider.

* address review: restore Math.min for non-exempt docs, tighten regexes, add tests

- Restore Math.min clamp for non-exempt Bedrock documents (fileConfig can
  only lower the hard 4.5 MB API limit, not raise it); only exempt models
  (Claude 4+ PDF, Nova PDF/DOCX) use ?? to allow fileConfig override
- Replace copied isBedrockClaude4Plus regex with cleaner anchored pattern
  that correctly handles multi-digit version numbers (e.g. sonnet-40)
  and removes dead Alt 1 branch matching no real Bedrock model IDs
- Tighten isBedrockNova from includes() to startsWith() to prevent
  substring matching in unexpected positions
- Add integration test verifying model is threaded to validateBedrockDocument
- Add boundary tests for exempt + low configuredFileSizeLimit, non-exempt
  + high configuredFileSizeLimit, and exempt model accepting files up to 32 MB
- Revert two tests that were incorrectly inverted to prove wrong behavior
- Fix inaccurate JSDoc and misleading test name

* simplify: allow fileConfig to override Bedrock limit in either direction

Make Bedrock consistent with all other providers — fileConfig sets the
effective limit unconditionally via ?? rather than clamping with Math.min.
The model-aware defaults (4.5 MB for non-exempt, 32 MB for exempt) remain
as sensible fallbacks when no fileConfig is set.

* fix: handle cross-region inference profile IDs in Bedrock model matchers

Bedrock cross-region inference profiles prepend a region code to the
model ID (e.g. "us.amazon.nova-pro-v1:0", "eu.anthropic.claude-sonnet-4-...").
Both isBedrockNova and isBedrockClaude4Plus would miss these prefixed IDs,
silently falling back to the 4.5 MB default for eligible models.

Switch both matchers to use (?:^|\.) to anchor the vendor segment so the
pattern matches with or without a leading region prefix.
2026-03-30 16:50:10 -04:00
Danny Avila
fda72ac621
🏗️ refactor: Remove Redundant Caching, Migrate Config Services to TypeScript (#12466)
* ♻️ refactor: Remove redundant scopedCacheKey caching, support user-provided key model fetching

Remove redundant cache layers that used `scopedCacheKey()` (tenant-only scoping)
on top of `getAppConfig()` which already caches per-principal (role+user+tenant).
This caused config overrides for different principals within the same tenant to
be invisible due to stale cached data.

Changes:
- Add `requireJwtAuth` to `/api/endpoints` route for proper user context
- Remove ENDPOINT_CONFIG, STARTUP_CONFIG, PLUGINS, TOOLS, and MODELS_CONFIG
  cache layers — all derive from `getAppConfig()` with cheap computation
- Enhance MODEL_QUERIES cache: hash(baseURL+apiKey) keys, 2-minute TTL,
  caching centralized in `fetchModels()` base function
- Support fetching models with user-provided API keys in `loadConfigModels`
  via `getUserKeyValues` lookup (no caching for user keys)
- Update all affected tests

Closes #1028

* ♻️ refactor: Migrate config services to TypeScript in packages/api

Move core config logic from CJS /api wrappers to typed TypeScript in
packages/api using dependency injection factories:

- `createEndpointsConfigService` — endpoint config merging + checkCapability
- `createLoadConfigModels` — custom endpoint model loading with user key support
- `createMCPToolCacheService` — MCP tool cache operations (update, merge, cache)

/api files become thin wrappers that wire dependencies (getAppConfig,
loadDefaultEndpointsConfig, getUserKeyValues, getCachedTools, etc.)
into the typed factories.

Also moves existing `endpoints/config.ts` → `endpoints/config/providers.ts`
to accommodate the new `config/` directory structure.

* 🔄 fix: Invalidate models query when user API key is set or revoked

Without this, users had to refresh the page after entering their API key
to see the updated model list fetched with their credentials.

- Invalidate QueryKeys.models in useUpdateUserKeysMutation onSuccess
- Invalidate QueryKeys.models in useRevokeUserKeyMutation onSuccess
- Invalidate QueryKeys.models in useRevokeAllUserKeysMutation onSuccess

* 🗺️ fix: Remap YAML-level override keys to AppConfig equivalents in mergeConfigOverrides

Config overrides stored in the DB use YAML-level keys (TCustomConfig),
but they're merged into the already-processed AppConfig where some fields
have been renamed by AppService. This caused mcpServers overrides to land
on a nonexistent key instead of mcpConfig, so config-override MCP servers
never appeared in the UI.

- Add OVERRIDE_KEY_MAP to remap mcpServers→mcpConfig, interface→interfaceConfig
- Apply remapping before deep merge in mergeConfigOverrides
- Add test for YAML-level key remapping behavior
- Update existing tests to use AppConfig field names in assertions

* 🧪 test: Update service.spec to use AppConfig field names after override key remapping

* 🛡️ fix: Address code review findings — reliability, types, tests, and performance

- Pass tenant context (getTenantId) in importers.js getEndpointsConfig call
- Add 5 tests for user-provided API key model fetching (key found, no key,
  DB error, missing userId, apiKey-only with fixed baseURL)
- Distinguish NO_USER_KEY (debug) from infrastructure errors (warn) in catch
- Switch fetchPromisesMap from Promise.all to Promise.allSettled so one
  failing provider doesn't kill the entire model config
- Parallelize getUserKeyValues DB lookups via batched Promise.allSettled
  instead of sequential awaits in the loop
- Hoist standardCache instance in fetchModels to avoid double instantiation
- Replace Record<string, unknown> types with Partial<TConfig>-based types;
  remove as unknown as T double-cast in endpoints config
- Narrow Bedrock availableRegions to typed destructure
- Narrow version field from string|number|undefined to string|undefined
- Fix import ordering in mcp/tools.ts and config/models.ts per AGENTS.md
- Add JSDoc to getModelsConfig alias clarifying caching semantics

* fix: Guard against null getCachedTools in mergeAppTools

* 🔍 fix: Address follow-up review — deduplicate extractEnvVariable, fix error discrimination, add log-level tests

- Deduplicate extractEnvVariable calls: resolve apiKey/baseURL once, reuse
  for both the entry and isUserProvided checks (Finding A)
- Move ResolvedEndpoint interface from function closure to module scope (Finding B)
- Replace fragile msg.includes('NO_USER_KEY') with ErrorTypes.NO_USER_KEY
  enum check against actual error message format (Finding C). Also handle
  ErrorTypes.INVALID_USER_KEY as an expected "no key" case.
- Add test asserting logger.warn is called for infra errors (not debug)
- Add test asserting logger.debug is called for NO_USER_KEY errors (not warn)

* fix: Preserve numeric assistants version via String() coercion

* 🐛 fix: Address secondary review — Ollama cache bypass, cache tests, type safety

- Fix Ollama success path bypassing cache write in fetchModels (CRITICAL):
  store result before returning so Ollama models benefit from 2-minute TTL
- Add 4 fetchModels cache behavior tests: cache write with TTL, cache hit
  short-circuits HTTP, skipCache bypasses read+write, empty results not cached
- Type-safe OVERRIDE_KEY_MAP: Partial<Record<keyof TCustomConfig, keyof AppConfig>>
  so compiler catches future field rename mismatches
- Fix import ordering in config/models.ts (package types longest→shortest)
- Rename ToolCacheDeps → MCPToolCacheDeps for naming consistency
- Expand getModelsConfig JSDoc to explain caching granularity

* fix: Narrow OVERRIDE_KEY_MAP index to satisfy strict tsconfig

* 🧩 fix: Add allowedProviders to TConfig, remove Record<string, unknown> from PartialEndpointEntry

The agents endpoint config includes allowedProviders (used by the frontend
AgentPanel to filter available providers), but it was missing from TConfig.
This forced PartialEndpointEntry to use & Record<string, unknown> as an
escape hatch, violating AGENTS.md type policy.

- Add allowedProviders?: (string | EModelEndpoint)[] to TConfig
- Remove Record<string, unknown> from PartialEndpointEntry — now just Partial<TConfig>

* 🛡️ fix: Isolate Ollama cache write from fetch try-catch, add Ollama cache tests

- Separate Ollama fetch and cache write into distinct scopes so a cache
  failure (e.g., Redis down) doesn't misattribute the error as an Ollama
  API failure and fall through to the OpenAI-compatible path (Issue A)
- Add 2 Ollama-specific cache tests: models written with TTL on fetch,
  cached models returned without hitting server (Issue B)
- Replace hardcoded 120000 with Time.TWO_MINUTES constant in cache TTL
  test assertion (Issue C)
- Fix OVERRIDE_KEY_MAP JSDoc to accurately describe runtime vs compile-time
  type enforcement (Issue D)
- Add global beforeEach for cache mock reset to prevent cross-test leakage

* 🧪 fix: Address third review — DI consistency, cache key width, MCP tests

- Inject loadCustomEndpointsConfig via EndpointsConfigDeps with default
  fallback, matching loadDefaultEndpointsConfig DI pattern (Finding 3)
- Widen modelsCacheKey from 64-bit (.slice(0,16)) to 128-bit (.slice(0,32))
  for collision-sensitive cross-credential cache key (Finding 4)
- Add fetchModels.mockReset() in loadConfigModels.spec beforeEach to
  prevent mock implementation leaks across tests (Finding 5)
- Add 11 unit tests for createMCPToolCacheService covering all three
  functions: null/empty input, successful ops, error propagation,
  cold-cache merge (Finding 2)
- Simplify getModelsConfig JSDoc to @see reference (Finding 10)

* ♻️ refactor: Address remaining follow-ups from reviews

OVERRIDE_KEY_MAP completeness:
- Add missing turnstile→turnstileConfig mapping
- Add exhaustiveness test verifying all three renamed keys are remapped
  and original YAML keys don't leak through

Import role context:
- Pass userRole through importConversations job → importLibreChatConvo
  so role-based endpoint overrides are honored during conversation import
- Update convos.js route to include req.user.role in the job payload

createEndpointsConfigService unit tests:
- Add 8 tests covering: default+custom merge, Azure/AzureAssistants/
  Anthropic Vertex/Bedrock config enrichment, assistants version
  coercion, agents allowedProviders, req.config bypass

Plugins/tools efficiency:
- Use Set for includedTools/filteredTools lookups (O(1) vs O(n) per plugin)
- Combine auth check + filter into single pass (eliminates intermediate array)
- Pre-compute toolDefKeys Set for O(1) tool definition lookups

* fix: Scope model query cache by user when userIdQuery is enabled

* fix: Skip model cache for userIdQuery endpoints, fix endpoints test types

- When userIdQuery is true, skip caching entirely (like user_provided keys)
  to avoid cross-user model list leakage without duplicating cache data
- Fix AgentCapabilities type error in endpoints.spec.ts — use enum values
  and appConfig() helper for partial mock typing

* 🐛 fix: Restore filteredTools+includedTools composition, add checkCapability tests

- Fix filteredTools regression: whitelist and blacklist are now applied
  independently (two flat guards), matching original behavior where
  includedTools=['a','b'] + filteredTools=['b'] produces ['a'] (Finding A)
- Fix Set spread in toolkit loop: pre-compute toolDefKeysList array once
  alongside the Set, reuse for .some() without per-plugin allocation (Finding B)
- Add 2 filteredTools tests: blacklist-only path and combined
  whitelist+blacklist composition (Finding C)
- Add 3 checkCapability tests: capability present, capability absent,
  fallback to defaultAgentCapabilities for non-agents endpoints (Finding D)

* 🔑 fix: Include config-override MCP servers in filterAuthorizedTools

Config-override MCP servers (defined via admin config overrides for
roles/groups) were rejected by filterAuthorizedTools because it called
getAllServerConfigs(userId) without the configServers parameter. Only
YAML and DB-backed user servers were included in the access check.

- Add configServers parameter to filterAuthorizedTools
- Resolve config servers via resolveConfigServers(req) at all 4 callsites
  (create, update, duplicate, revert) using parallel Promise.all
- Pass configServers through to getAllServerConfigs(userId, configServers)
  so the registry merges config-source servers into the access check
- Update filterAuthorizedTools.spec.js mock for resolveConfigServers

* fix: Skip model cache for userIdQuery endpoints, fix endpoints test types

For user-provided key endpoints (userProvide: true), skip the full model
list re-fetch during message validation — the user already selected from
a list we served them, and re-fetching with skipCache:true on every
message send is both slow and fragile (5s provider timeout = rejected model).

Instead, validate the model string format only:
- Must be a string, max 256 chars
- Must match [a-zA-Z0-9][a-zA-Z0-9_.:\-/@+ ]* (covers all known provider
  model ID formats while rejecting injection attempts)

System-configured endpoints still get full model list validation as before.

* 🧪 test: Add regression tests for filterAuthorizedTools configServers and validateModel

filterAuthorizedTools:
- Add test verifying configServers is passed to getAllServerConfigs and
  config-override server tools are allowed through
- Guard resolveConfigServers in createAgentHandler to only run when
  MCP tools are present (skip for tool-free agent creates)

validateModel (12 new tests):
- Format validation: missing model, non-string, length overflow, leading
  special char, script injection, standard model ID acceptance
- userProvide early-return: next() called immediately, getModelsConfig
  not invoked (regression guard for the exact bug this fixes)
- System endpoint list validation: reject unknown model, accept known
  model, handle null/missing models config

Also fix unnecessary backslash escape in MODEL_PATTERN regex.

* 🧹 fix: Remove space from MODEL_PATTERN, trim input, clean up nits

- Remove space character from MODEL_PATTERN regex — no real model ID
  uses spaces; prevents spurious violation logs from whitespace artifacts
- Add model.trim() before validation to handle accidental whitespace
- Remove redundant filterUniquePlugins call on already-deduplicated output
- Add comment documenting intentional whitelist+blacklist composition
- Add getUserKeyValues.mockReset() in loadConfigModels.spec beforeEach
- Remove narrating JSDoc from getModelsConfig one-liner
- Add 2 tests: trim whitespace handling, reject spaces in model ID

* fix: Match startup tool loader semantics — includedTools takes precedence over filteredTools

The startup tool loader (loadAndFormatTools) explicitly ignores
filteredTools when includedTools is set, with a warning log. The
PluginController was applying both independently, creating inconsistent
behavior where the same config produced different results at startup
vs plugin listing time.

Restored mutually exclusive semantics: when includedTools is non-empty,
filteredTools is not evaluated.

* 🧹 chore: Simplify validateModel flow, note auth requirement on endpoints route

- Separate missing-model from invalid-model checks cleanly: type+presence
  guard first, then trim+format guard (reviewer NIT)
- Add route comment noting auth is required for role/tenant scoping

* fix: Write trimmed model back to req.body.model for downstream consumers
2026-03-30 16:49:48 -04:00
Dustin Healy
a4a17ac771
⛩️ feat: Admin Grants API Endpoints (#12438)
* feat: add System Grants handler factory with tests

Handler factory with 4 endpoints: getEffectiveCapabilities (expanded
capability set for authenticated user), getPrincipalGrants (list grants
for a specific principal), assignGrant, and revokeGrant. Write ops
dynamically check MANAGE_ROLES/GROUPS/USERS based on target principal
type. 31 unit tests covering happy paths, validation, 403, and errors.

* feat: wire System Grants REST routes

Mount /api/admin/grants with requireJwtAuth + ACCESS_ADMIN gate.
Add barrel export for createAdminGrantsHandlers and AdminGrantsDeps.

* fix: cascade grant cleanup on role deletion

Add deleteGrantsForPrincipal to AdminRolesDeps and call it in
deleteRoleHandler via Promise.allSettled after successful deletion,
matching the groups cleanup pattern. 3 tests added for cleanup call,
skip on 404, and resilience to cleanup failure.

* fix: simplify cascade grant cleanup on role deletion

Replace Promise.allSettled wrapper with a direct try/catch for the
single deleteGrantsForPrincipal call.

* fix: harden grant handlers with auth, validation, types, and RESTful revoke

- Add per-handler auth checks (401) and granular capability gates
  (READ_* for getPrincipalGrants, possession check for assignGrant)
- Extract validatePrincipal helper; rewrite validateGrantBody to use
  direct type checks instead of unsafe `as string` casts
- Align DI types with data layer (ResolvedPrincipal.principalType
  widened to string, getUserPrincipals role made optional)
- Switch revoke route from DELETE body to RESTful URL params
- Return 201 for assignGrant to match roles/groups create convention
- Handle null grantCapability return with 500
- Add comprehensive test coverage for new auth/validation paths

* fix: deduplicate ResolvedPrincipal, typed body, defensive auth checks

- Remove duplicate ResolvedPrincipal from capabilities.ts; import the
  canonical export from grants.ts
- Replace Record<string, unknown> with explicit GrantRequestBody interface
- Add defensive 403 when READ_CAPABILITY_BY_TYPE lookup misses
- Document revoke asymmetry (no possession check) with JSDoc
- Use _id only in resolveUser (avoid Mongoose virtual reliance)
- Improve null-grant error message
- Complete logger mock in tests

* refactor: move ResolvedPrincipal to shared types to fix circular dep

Extract ResolvedPrincipal from admin/grants.ts to types/principal.ts
so middleware/capabilities.ts imports from shared types rather than
depending upward on the admin handler layer.

* chore: remove dead re-export, align logger mocks across admin tests

- Remove unused ResolvedPrincipal re-export from grants.ts (canonical
  source is types/principal.ts)
- Align logger mocks in roles.spec.ts and groups.spec.ts to include
  all log levels (error, warn, info, debug) matching grants.spec.ts

* fix: cascade Config and AclEntry cleanup on role deletion

Add deleteConfig and deleteAclEntries to role deletion cascade,
matching the group deletion pattern. Previously only grants were
cleaned up, leaving orphaned config overrides and ACL entries.

* perf: single-query batch for getEffectiveCapabilities

Add getCapabilitiesForPrincipals (plural) to the data layer — a single
$or query across all principals instead of N+1 parallel queries. Wire
it into the grants handler so getEffectiveCapabilities hits the DB once
regardless of how many principals the user has.

* fix: defer SystemCapabilities access to factory call time

Move all SystemCapabilities usage (VALID_CAPABILITIES,
MANAGE_CAPABILITY_BY_TYPE, READ_CAPABILITY_BY_TYPE) inside the
createAdminGrantsHandlers factory. External test suites that mock
@librechat/data-schemas without providing SystemCapabilities crashed
at import time when grants.ts was loaded transitively.

* test: add data-layer and handler test coverage for review findings

- Add 6 mongodb-memory-server tests for getCapabilitiesForPrincipals:
  multi-principal batch, empty array, filtering, tenant scoping
- Add handler test: all principals filtered (only PUBLIC)
- Add handler test: granting an implied capability succeeds
- Add handler test: all cascade cleanup operations fail simultaneously
- Document platform-scope-only tenantId behavior in JSDoc

* fix: resolveUser fallback to user.id, early-return empty principals

- Match capabilities middleware pattern: _id?.toString() ?? user.id
  to handle JWT-deserialized users without Mongoose _id
- Move empty-array guard before principals.map() to skip unnecessary
  normalizePrincipalId calls
- Add comment explaining VALID_PRINCIPAL_TYPES module-scope asymmetry

* refactor: derive VALID_PRINCIPAL_TYPES from capability maps

Make MANAGE_CAPABILITY_BY_TYPE and READ_CAPABILITY_BY_TYPE
non-Partial Records over a shared GrantPrincipalType union, then
derive VALID_PRINCIPAL_TYPES from the map keys. This makes divergence
between the three data structures structurally impossible.

* feat: add GET /api/admin/grants list-all-grants endpoint

Add listAllGrants data-layer method and handler so the admin panel
can fetch all grants in a single request instead of fanning out
N+M calls per role and group. Response is filtered to only include
grants for principal types the caller has read access to.

* fix: update principalType to use GrantPrincipalType for consistency in grants handling

- Refactor principalType in createAdminGrantsHandlers to use GrantPrincipalType instead of PrincipalType for better type accuracy.
- Ensure type consistency across the grants handling logic in the API.

* fix: address admin grants review findings — tenantId propagation, capability validation, pagination, and test coverage

Propagate tenantId through all grant operations for multi-tenancy support.
Extract isValidCapability to accept full SystemCapability union (base, section,
assign) and reuse it in both Mongoose schema validation and handler input checks.
Replace listAllGrants with paginated listGrants + countGrants. Filter PUBLIC
principals from getCapabilitiesForPrincipals queries. Export getCachedPrincipals
from ALS store for fast-path principal resolution. Move DELETE capability param
to query string to avoid colon-in-URL issues. Remove dead code and add
comprehensive handler and data-layer test coverage.

* refactor: harden admin grants — FilterQuery types, auth-first ordering, DELETE path param, isValidCapability tests

Replace Record<string, unknown> with FilterQuery<ISystemGrant> across all
data-layer query filters. Refactor buildTenantFilter to a pure tenantCondition
function that returns a composable FilterQuery fragment, eliminating the $or
collision between tenant and principal queries. Move auth check before input
validation in getPrincipalGrantsHandler, assignGrantHandler, and
revokeGrantHandler to avoid leaking valid type names to unauthenticated callers.
Switch DELETE route from query param back to path param (/:capability) with
encodeURIComponent per project conventions. Add compound index for listGrants
sort. Type VALID_PRINCIPAL_TYPES as Set<GrantPrincipalType>. Remove unused
GetCachedPrincipalsFn type export. Add dedicated isValidCapability unit tests
and revokeGrant idempotency test.

* refactor: batch capability checks in listGrantsHandler via getHeldCapabilities

Replace 3 parallel hasCapabilityForPrincipals DB calls with a single
getHeldCapabilities query that returns the subset of capabilities any
principal holds. Also: defensive limit(0) clamp, parallelized assignGrant
auth checks, principalId type-vs-required error split, tenantCondition
hoisted to factory top, JSDoc on cascade deps, DELETE route encoding note.

* fix: normalize principalId and filter undefined in getHeldCapabilities

Add normalizePrincipalId + null guard to getHeldCapabilities, matching
the contract of getCapabilitiesForPrincipals. Simplify allCaps build
with flatMap, add no-tenantId cross-check and undefined-principalId
test cases.

* refactor: use concrete types in GrantRequestBody, rename encoding test

Replace unknown fields with explicit string types in GrantRequestBody,
matching the established pattern in roles/groups/config handlers. Rename
misleading 'encoded' test to 'with colons' since Express auto-decodes
req.params.

* fix: support hierarchical parent capabilities in possession checks

hasCapabilityForPrincipals and getHeldCapabilities now resolve parent
base capabilities for section/assignment grants. An admin holding
manage:configs can now grant manage:configs:<section> and transitively
read:configs:<section>. Fixes anti-escalation 403 blocking config
capability delegation.

* perf: use getHeldCapabilities in assignGrant to halve DB round-trips

assignGrantHandler was making two parallel hasCapabilityForPrincipals
calls to check manage + capability possession. getHeldCapabilities was
introduced in this PR specifically for this pattern. Replace with a
single batched call. Update corresponding spec assertions.

* fix: validate role existence before granting capabilities

Grants for non-existent role names were silently persisted, creating
orphaned grants that could surprise-activate if a role with that name
was later created. Add optional checkRoleExists dep to assignGrant and
wire it to getRoleByName in the route file.

* refactor: tighten principalType typing and use grantCapability in tests

Narrow getCapabilitiesForPrincipals parameter from string to
PrincipalType, removing the redundant cast. Replace direct
SystemGrant.create() calls in getCapabilitiesForPrincipals tests with
methods.grantCapability() to honor the schema's normalization invariant.
Add getHeldCapabilities extended capability tests.

* test: rename misleading cascade cleanup test name

The test only injects failure into deleteGrantsForPrincipal, not all
cascade operations. Rename from 'cascade cleanup fails' to 'grant
cleanup fails' to match the actual scope.

* fix: reorder role check after permission guard, add tenantId to index

Move checkRoleExists after the getHeldCapabilities permission check so
that a sub-MANAGE_ROLES admin cannot probe role name existence via
400 vs 403 response codes.

Add tenantId to the { principalType, capability } index so listGrants
queries in multi-tenant deployments can use a covering index instead
of post-scanning for tenant condition.

Add missing test for checkRoleExists throwing.

* fix: scope deleteGrantsForPrincipal to tenant on role deletion

deleteGrantsForPrincipal previously filtered only on principalType +
principalId, deleting grants across all tenants. Since the role schema
supports multi-tenancy (compound unique index on name + tenantId), two
tenants can share a role name like 'editor'. Deleting that role in one
tenant would wipe grants for identically-named roles in other tenants.

Add optional tenantId parameter to deleteGrantsForPrincipal. When
provided, scopes the delete to that tenant plus platform-level grants.
Propagate req.user.tenantId through the role deletion cascade.

* fix: scope grant cleanup to tenant on group deletion

Same cross-tenant gap as the role deletion path: deleteGroupHandler
called deleteGrantsForPrincipal without tenantId, so deleting a group
would wipe its grants across all tenants. Extract req.user.tenantId
and pass it through.

* test: add HTTP integration test for admin grants routes

Supertest-based test with real MongoMemoryServer exercising the full
Express wiring: route registration, injected auth middleware, handler
DI deps, and real DB round-trips.

Covers GET /, GET /effective, POST / + DELETE / lifecycle, role
existence validation, and 401 for unauthenticated callers.

Also documents the expandImplications scope: the /effective endpoint
returns base-level capabilities only; section-level resolution is
handled at authorization check time by getParentCapabilities.

* fix: use exact tenant match in deleteGrantsForPrincipal, normalize principalId, harden API

CRITICAL: deleteGrantsForPrincipal was using tenantCondition (a
read-query helper) for deleteMany, which includes the
{ tenantId: { $exists: false } } arm. This silently destroyed
platform-level grants when a tenant-scoped role/group deletion
occurred. Replace with exact { tenantId } match for deletes so
platform-level grants survive tenant-scoped cascade cleanup.

Refactor deleteGrantsForPrincipal signature from fragile positional
overload (sessionOrTenantId union + maybeSession) to a clean options
object: { tenantId?, session? }. Update all callers and test assertions.

Add normalizePrincipalId to hasCapabilityForPrincipals to match the
pattern already used by getHeldCapabilities — prevents string/ObjectId
type mismatch on USER/GROUP principal queries.

Also: export GrantPrincipalType from barrel, add upper-bound cap to
listGrants, document GROUP/USER existence check trade-off, add
integration tests for tenant-isolation property of deleteGrantsForPrincipal.

* fix: forward tenantId to getUserPrincipals in resolvePrincipals

resolvePrincipals had tenantId available from the caller but only
forwarded it to getCachedPrincipals (cache lookup). The DB fallback
via getUserPrincipals omitted it. While the Group schema's
applyTenantIsolation Mongoose plugin handles scoping via
AsyncLocalStorage in HTTP request context, explicitly passing tenantId
makes the contract visible and prevents silent cross-tenant group
resolution if called outside request context.

* fix: remove unused import and add assertion to 401 integration test

Remove unused SystemCapabilities import flagged by ESLint. Add explicit
body assertion to the 401 test so it has a jest expect() call.

* chore: hoist grant limit constants to scope, remove dead isolateModules

Move GRANTS_DEFAULT_LIMIT / GRANTS_MAX_LIMIT from inside listGrants
function body to createSystemGrantMethods scope so they are evaluated
once at module load. Remove dead jest.isolateModules + jest.doMock
block in integration test — the ~/models mock was never exercised
since handlers are built with explicit DI deps.

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-30 16:49:23 -04:00
github-actions[bot]
56d994e9ec
🌍 i18n: Update translation.json with latest translations (#12458)
Some checks failed
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Has been cancelled
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Has been cancelled
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-03-29 18:43:43 -04:00
Danny Avila
7e2b51697e
🪢 refactor: Eliminate Unnecessary Re-renders During Message Streaming (#12454)
Some checks failed
Publish `librechat-data-provider` to NPM / build (push) Has been cancelled
Publish `@librechat/data-schemas` to NPM / build-and-publish (push) Has been cancelled
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Has been cancelled
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Has been cancelled
Publish `librechat-data-provider` to NPM / publish-npm (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Has been cancelled
* refactor: add TMessageChatContext type for stable context passing

Defines a type for a stable context object that wrapper components
pass to memo'd message components, avoiding direct ChatContext
subscriptions that bypass React.memo during streaming.

* perf: remove ChatContext subscription from useMessageActions

useMessageActions previously called useChatContext() inside memo'd
components (MessageRender, ContentRender), bypassing React.memo
when isSubmitting changed during streaming. Now accepts a stable
chatContext param instead, using a ref for the isSubmitting guard
in regenerateMessage.

Also stabilizes handleScroll in useMessageProcess by using a ref
for isSubmitting instead of including it in useCallback deps.

* perf: pass stable chatContext to memo'd message components

Wrapper components (Message, MessageContent) now create a stable
chatContext object via useMemo with a getter-backed isSubmitting,
and compute effectiveIsSubmitting (false for non-latest messages).

This ensures MessageRender and ContentRender (both React.memo'd)
only re-render for the latest message during streaming, preventing
unnecessary re-renders of all prior messages and their SubRow,
HoverButtons, and SiblingSwitch children.

* perf: add custom memo comparators to prevent message reference re-renders

buildTree creates new message objects on every streaming update for
ALL messages, not just the changed one. This defeats React.memo's
default shallow comparison since the message prop has a new reference
even when the content hasn't changed.

Custom areEqual comparators now compare message by key fields
(messageId, text, error, depth, children length, etc.) instead of
reference equality, preventing unnecessary re-renders of SubRow,
Files, HoverButtons and other children for non-latest messages.

* perf: memoize ChatForm children to prevent streaming re-renders

- Wrap StopButton in React.memo
- Wrap AudioRecorder in React.memo, use ref for isSubmitting in
  onTranscriptionComplete callback to stabilize it
- Remove useChatContext() from FileFormChat (bypassed its memo
  during streaming), accept files/setFiles/setFilesLoading as
  props from ChatForm instead

* perf: stabilize ChatForm child props to prevent cascading re-renders

ChatForm re-renders frequently during streaming (ChatContext changes).
This caused StopButton and AttachFileChat/AttachFileMenu to re-render
despite being memo'd, because their props were new references each time.

- Wrap handleStopGenerating in a ref-based stable callback so StopButton
  always receives the same function reference
- Create stableConversation via useMemo keyed on rendering-relevant
  fields only (conversationId, endpoint, agent_id, etc.), so
  AttachFileChat and FileFormChat don't re-render from unrelated
  conversation metadata updates (e.g., title generation)

* perf: remove ChatContext subscription from AttachFileMenu and FileFormChat

Both components used useFileHandling() which internally calls
useChatContext(), bypassing their React.memo wrappers and causing
re-renders on every streaming chunk.

Switch to useFileHandlingNoChatContext() which accepts file state
as parameters. The state (files, setFiles, setFilesLoading,
conversation) is passed down from ChatForm → AttachFileChat →
AttachFileMenu as props, keeping the memo chain intact.

* fix: update imports and test mocks for useFileHandlingNoChatContext

- Re-export useFileHandlingNoChatContext from hooks barrel
- Import from ~/hooks instead of direct path for test compatibility
- Add useToastContext mock to @librechat/client in AttachFileMenu tests
  since useFileHandlingNoChatContext runs the core hook which needs it
- Add useFileHandlingNoChatContext to ~/hooks test mock

* perf: fix remaining ChatForm streaming re-renders

- Switch AttachFileMenu from useSharePointFileHandling (subscribes to
  ChatContext) to useSharePointFileHandlingNoChatContext with explicit
  file state props
- Memoize ChatForm textarea onFocus/onBlur handlers with useCallback
  to prevent TextareaAutosize re-renders (inline arrow functions and
  .bind() created new references on every ChatForm render)
- Update AttachFileMenu test mocks for new hook variants

* refactor: add displayName to ChatForm for React DevTools

* perf: prevent ChatForm re-renders during streaming via wrapper pattern

ChatForm was re-rendering on every streaming chunk because it subscribed
to useChatContext() internally, and the ChatContext value changed
frequently during streaming.

Extract context subscription into a ChatFormWrapper that:
- Subscribes to useChatContext() (re-renders on every chunk, cheap)
- Stabilizes conversation via selective useMemo
- Stabilizes handleStopGenerating via ref-based callback
- Passes individual stable values as props to ChatForm

ChatForm (memo'd) now receives context values as props instead of
subscribing directly. Since individual values (files, setFiles,
isSubmitting, etc.) are stable references during streaming, ChatForm's
memo prevents re-renders entirely — it only re-renders when isSubmitting
actually toggles (2x per stream: start/end).

* perf: stabilize newConversation prop and memoize CollapseChat

- Wrap newConversation in ref-based stable callback in ChatFormWrapper
  (was the remaining unstable prop causing ChatForm to re-render)
- Wrap CollapseChat in React.memo to prevent re-renders from parent

* perf: memoize useAddedResponse return value

useAddedResponse returned a new object literal on every render,
causing AddedChatContext.Provider to trigger re-renders of all
consumers (including ChatForm) on every streaming chunk. Wrap in
useMemo so the context value stays referentially stable.

* perf: memoize TextareaHeader to prevent re-renders from ChatForm

* perf: address review findings for streaming render optimization

Finding 1: Switch AttachFile.tsx from useFileHandling to
useFileHandlingNoChatContext, closing the optimization hole for
standard (non-agent) chat endpoints.

Finding 2: Replace content reference equality with length comparison
in both memo comparators — safer against buildTree array reconstruction.

Finding 3: Add conversation?.model to stableConversation deps in
ChatFormWrapper so file uploads use the correct model after switches.

Finding 4/14: Fix stableNewConversation to explicitly return the
underlying call's result instead of discarding it via `as` cast.

Finding 5/6: Extract useMemoizedChatContext hook shared by Message.tsx
and MessageContent.tsx — eliminates ~70 lines of duplication and
stabilizes chatContext.conversation via selective useMemo to prevent
post-stream metadata updates from re-rendering all messages.

Finding 8: Use TMessage type for regenerate param instead of
Record<string, unknown>.

Finding 9: Use FileSetter alias in FileFormChat instead of inline type.

Finding 11: Fix pre-existing broken throttle in useMessageProcess —
was creating a new throttle instance per call, providing zero
deduplication. Now retains the instance via useMemo.

Finding 12: Initialize isSubmittingRef with chatContext.isSubmitting
instead of false for consistency.

Finding 13: Add ChatFormWrapper displayName.

* fix: revert content comparison to reference equality in memo comparators

The length-based comparison (content?.length) missed updates within
existing content parts during streaming — text chunks update a part's
content without changing the array length, so the comparator returned
true and skipped re-renders for the latest message.

Reference equality (===) is correct here: buildTree preserves content
array references for unchanged messages via shallow spread, while
React Query gives the latest message a new reference when its content
updates during streaming.

* fix: cancel throttled handleScroll on unmount and remove unused import

* fix: use chatContext getter directly in regenerateMessage callback

The local isSubmittingRef was stale for non-latest messages (which
don't re-render during streaming by design). chatContext.isSubmitting
is a getter backed by the wrapper's ref, so reading it at call-time
always returns the current value regardless of whether the component
has re-rendered.

* fix: remove unused useCallback import from useMemoizedChatContext

* fix: pass global isSubmitting to HoverButtons for action gating

HoverButtons uses isSubmitting via useGenerationsByLatest to disable
regenerate and hide edit buttons during streaming. Passing the
effective value (false for non-latest messages) re-enabled those
actions mid-stream, risking overlapping edits/regenerations.

Use chatContext.isSubmitting (getter, always returns current value)
for HoverButtons while keeping the effective value for rendering-only
UI (cursor, placeholder, streaming indicator).

* fix: address second review — stale HoverButtons, messages dep, cleanup

- Add isSubmitting to chatContext useMemo deps in useMemoizedChatContext
  so HoverButtons correctly updates when streaming starts/ends (2 extra
  re-renders per session, belt-and-suspenders for post-stream state)
- Change conversation?.messages?.length dep to boolean in ChatFormWrapper
  stableConversation — only need 0↔1+ transition for landing page check,
  not exact count on every message addition
- Add defensive comment at chatContext destructuring point in
  useMessageActions explaining why isSubmitting must not be destructured
- Remove dead mockUseFileHandling.mockReturnValue from AttachFileMenu tests

* chore: remove dead useFileHandling mock artifacts from AttachFileMenu tests

* fix: resolve eslint warnings for useMemo dependencies

- Extract complex expression (conversation?.messages?.length ?? 0) > 0
  to hasMessages variable for static analysis in ChatFormWrapper
- Add eslint-disable for intentional isSubmitting dep in
  useMemoizedChatContext (forces new chatContext reference on streaming
  start/end so HoverButtons re-renders)
2026-03-29 17:05:12 -04:00
Danny Avila
0d94881c2d
🧹 refactor: Tighten Config Schema Typing and Remove Deprecated Fields (#12452)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
* refactor: Remove deprecated and unused fields from endpoint schemas

- Remove summarize, summaryModel from endpointSchema and azureEndpointSchema
- Remove plugins from azureEndpointSchema
- Remove customOrder from endpointSchema and azureEndpointSchema
- Remove baseURL from all and agents endpoint schemas
- Type paramDefinitions with full SettingDefinition-based schema
- Clean up summarize/summaryModel references in initialize.ts and config.spec.ts

* refactor: Improve MCP transport schema typing

- Add defaults to transport type discriminators (stdio, websocket, sse)
- Type stderr field as IOType union instead of z.any()

* refactor: Add narrowed preset schema for model specs

- Create tModelSpecPresetSchema omitting system/DB/deprecated fields
- Update tModelSpecSchema to use the narrowed preset schema

* test: Add explicit type field to MCP test fixtures

Add transport type discriminator to test objects that construct
MCPOptions/ParsedServerConfig directly, required after type field
changed from optional to default in schema definitions.

* chore: Bump librechat-data-provider to 0.8.404

* refactor: Tighten z.record(z.any()) fields to precise value types

- Type headers fields as z.record(z.string()) in endpoint, assistant, and azure schemas
- Type addParams as z.record(z.union([z.string(), z.number(), z.boolean(), z.null()]))
- Type azure additionalHeaders as z.record(z.string())
- Type memory model_parameters as z.record(z.union([z.string(), z.number(), z.boolean()]))
- Type firecrawl changeTrackingOptions.schema as z.record(z.string())

* refactor: Type supportedMimeTypes schema as z.array(z.string())

Replace z.array(z.any()).refine() with z.array(z.string()) since config
input is always strings that get converted to RegExp via
convertStringsToRegex() after parsing. Destructure supportedMimeTypes
from spreads to avoid string[]/RegExp[] type mismatch.

* refactor: Tighten enum, role, and numeric constraint schemas

- Type engineSTT as enum ['openai', 'azureOpenAI']
- Type engineTTS as enum ['openai', 'azureOpenAI', 'elevenlabs', 'localai']
- Constrain playbackRate to 0.25–4 range
- Type titleMessageRole as enum ['system', 'user', 'assistant']
- Add int().nonnegative() to MCP timeout and firecrawl timeout

* chore: Bump librechat-data-provider to 0.8.405

* fix: Accept both string and RegExp in supportedMimeTypes schema

The schema must accept both string[] (config input) and RegExp[]
(post-merge runtime) since tests validate merged output against the
schema. Use z.union([z.string(), z.instanceof(RegExp)]) to handle both.

* refactor: Address review findings for schema tightening PR

- Revert changeTrackingOptions.schema to z.record(z.unknown()) (JSON Schema is nested, not flat strings)
- Remove dead contextStrategy code from BaseClient.js and cleanup.js
- Extract paramDefinitionSchema to named exported constant
- Add .int() constraint to columnSpan and columns
- Apply consistent .int().nonnegative() to initTimeout, sseReadTimeout, scraperTimeout
- Update stale stderr JSDoc to match actual accepted types
- Add comprehensive tests for paramDefinitionSchema, tModelSpecPresetSchema,
  endpointSchema deprecated field stripping, and azureEndpointSchema

* fix: Address second review pass findings

- Revert supportedMimeTypesSchema to z.array(z.string()) and remove
  as string[] casts — fix tests to not validate merged RegExp[] output
  against the config input schema
- Remove unused tModelSpecSchema import from test file
- Consolidate duplicate '../src/schemas' imports
- Add expiredAt coverage to tModelSpecPresetSchema test
- Assert plugins is absent in azureEndpointSchema test
- Add sync comments for engineSTT/engineTTS enum literals

* refactor: Omit preset-management fields from tModelSpecPresetSchema

Omit conversationId, presetId, title, defaultPreset, and order from the
model spec preset schema — these are preset-management fields that don't
belong in model spec configuration.
2026-03-29 01:10:57 -04:00
Danny Avila
f82d4300a4
🧹 chore: Remove Deprecated Gemini 2.0 Models & Fix Mistral-Large-3 Context Window (#12453)
* chore: remove deprecated Gemini 2.0 models from default models list

Remove gemini-2.0-flash-001 and gemini-2.0-flash-lite from the Google
default models array, as they have been deprecated by Google.

Closes #12444

* fix: add mistral-large-3 max context tokens (256k)

Add mistral-large-3 with 255000 max context tokens to the mistralModels
map. Without this entry, the model falls back to the generic
mistral-large key (131k), causing context window errors when using
tools with Azure AI Foundry deployments.

Closes #12429

* test: add mistral-large-3 token resolution tests and fix key ordering

Add test coverage for mistral-large-3 context token resolution,
verifying exact match, suffixed variants, and longest-match precedence
over the generic mistral-large key. Reorder the mistral-large-3 entry
after mistral-large to follow the file's documented convention of
listing newer models last for reverse-scan performance.
2026-03-28 23:44:58 -04:00
Danny Avila
fda1bfc3cc
🔬 ci: Add TypeScript Type Checks to Backend Workflow and Fix All Type Errors (#12451)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
* fix(data-schemas): resolve TypeScript strict type check errors in source files

- Constrain ConfigSection to string keys via `string & keyof TCustomConfig`
- Replace broken `z` import from data-provider with TCustomConfig derivation
- Add `_id: Types.ObjectId` to IUser matching other Document interfaces
- Add `federatedTokens` and `openidTokens` optional fields to IUser
- Type mongoose model accessors as `Model<IRole>` and `Model<IUser>`
- Widen `getPremiumRate` param to accept `number | null`
- Widen `bulkWriteAclEntries` ops to untyped `AnyBulkWriteOperation[]`
- Fix `getUserPrincipals` return type to use `PrincipalType` enum
- Add non-null assertions for `connection.db` in migration files
- Import DailyRotateFile constructor directly instead of relying on
  broken module augmentation across mismatched node_modules trees
- Add winston-daily-rotate-file as devDependency for type resolution

* fix(data-schemas): resolve TypeScript type errors in test files

- Replace arbitrary test keys with valid TCustomConfig properties in config.spec
- Use non-null assertions for permission objects in role.methods.spec
- Replace `.SHARED_GLOBAL` access with `.not.toHaveProperty()` for legacy field
- Add non-null assertions for balance, writeRate, readRate in spendTokens.spec
- Update mock user _id to use ObjectId in user.test
- Remove unused Schema import in tenantIndexes.spec

* fix(api): resolve TypeScript strict type check errors across source and test files

- Widen getUserPrincipals dep type in capabilities middleware
- Fix federatedTokens type in createSafeUser return
- Use proper mock req type for read-only properties in preAuthTenant.spec
- Replace `as IUser` casts with ObjectId-typed mocks in openid/oidc specs
- Use TokenExchangeMethodEnum values instead of string literals in MCP specs
- Fix SessionStore type compatibility in sessionCache specs
- Replace `catch (error: any)` with `(error as Error)` in redis specs
- Remove invalid properties from test data in initialize and MCP specs
- Add String.prototype.isWellFormed declaration for sanitizeTitle spec

* fix(client): resolve TypeScript type errors in shared client components

- Add default values for destructured bindings in OGDialogTemplate
- Replace broken ExtendedFile import with inline type in FileIcon

* ci: add TypeScript type-check job to backend review workflow

Add a `typecheck` job that runs `tsc --noEmit` on all four TypeScript
workspaces (data-provider, data-schemas, @librechat/api, @librechat/client)
after the build step. Catches type errors that rollup builds may miss.

* fix(data-schemas): add local type declaration for DailyRotateFile transport

The `winston-daily-rotate-file` package ships a module augmentation for
`winston/lib/winston/transports`, but it fails when winston and
winston-daily-rotate-file resolve from different node_modules trees
(which happens in this monorepo due to npm hoisting).

Add a local `.d.ts` declaration that augments the same module path from
within data-schemas' compilation unit, so `tsc --noEmit` passes while
keeping the original runtime pattern (`new winston.transports.DailyRotateFile`).

* fix: address code review findings from PR #12451

- Restore typed `AnyBulkWriteOperation<AclEntry>[]` on bulkWriteAclEntries,
  cast to untyped only at the tenantSafeBulkWrite call site (Finding 1)
- Type `findUser` model accessor consistently with `findUsers` (Finding 2)
- Replace inline `import('mongoose').ClientSession` with top-level import type
- Use `toHaveLength` for spy assertions in playwright-expect spec file
- Replace numbered Record casts with `.not.toHaveProperty()` in
  role.methods.spec for SHARED_GLOBAL assertions
- Use per-test ObjectIds instead of shared testUserId in openid.spec
- Replace inline `import()` type annotations with top-level SessionData
  import in sessionCache spec
- Remove extraneous blank line in user.ts searchUsers

* refactor: address remaining review findings (4–7)

- Extract OIDCTokens interface in user.ts; deduplicate across IUser fields
  and oidc.ts FederatedTokens (Finding 4)
- Move String.isWellFormed declaration from spec file to project-level
  src/types/es2024-string.d.ts (Finding 5)
- Replace verbose `= undefined` defaults in OGDialogTemplate with null
  coalescing pattern (Finding 6)
- Replace `Record<string, unknown>` TestConfig with named interface
  containing explicit test fields (Finding 7)
2026-03-28 21:06:39 -04:00
Marco Beretta
d5c7d9f525
📝 docs: update deployment link for Railway in README and README.zh.md (#12449)
* docs: update deployment link for Railway in README

* docs: mirror Railway deploy link update in README.zh.md

Agent-Logs-Url: https://github.com/danny-avila/LibreChat/sessions/ea1b6e56-f93d-47a7-9d62-6157d824acff

Co-authored-by: berry-13 <81851188+berry-13@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-28 20:10:36 -04:00
Danny Avila
877c2efc85
🏗️ feat: bulkWrite isolation, pre-auth context, strict-mode fixes (#12445)
* fix: wrap seedDatabase() in runAsSystem() for strict tenant mode

seedDatabase() was called without tenant context at startup, causing
every Mongoose operation inside it to throw when
TENANT_ISOLATION_STRICT=true. Wrapping in runAsSystem() gives it the
SYSTEM_TENANT_ID sentinel so the isolation plugin skips filtering,
matching the pattern already used for performStartupChecks and
updateInterfacePermissions.

* fix: chain tenantContextMiddleware in optionalJwtAuth

optionalJwtAuth populated req.user but never established ALS tenant
context, unlike requireJwtAuth which chains tenantContextMiddleware
after successful auth. Authenticated users hitting routes with
optionalJwtAuth (e.g. /api/banner) had no tenant isolation.

* feat: tenant-safe bulkWrite wrapper and call-site migration

Mongoose's bulkWrite() does not trigger schema-level middleware hooks,
so the applyTenantIsolation plugin cannot intercept it. This adds a
tenantSafeBulkWrite() utility that injects the current ALS tenant
context into every operation's filter/document before delegating to
native bulkWrite.

Migrates all 8 runtime bulkWrite call sites:
- agentCategory (seedCategories, ensureDefaultCategories)
- conversation (bulkSaveConvos)
- message (bulkSaveMessages)
- file (batchUpdateFiles)
- conversationTag (updateTagsForConversation, bulkIncrementTagCounts)
- aclEntry (bulkWriteAclEntries)

systemGrant.seedSystemGrants is intentionally not migrated — it uses
explicit tenantId: { $exists: false } filters and is exempt from the
isolation plugin.

* feat: pre-auth tenant middleware and tenant-scoped config cache

Adds preAuthTenantMiddleware that reads X-Tenant-Id from the request
header and wraps downstream in tenantStorage ALS context. Wired onto
/oauth, /api/auth, /api/config, and /api/share — unauthenticated
routes that need tenant scoping before JWT auth runs.

The /api/config cache key is now tenant-scoped
(STARTUP_CONFIG:${tenantId}) so multi-tenant deployments serve the
correct login page config per tenant.

The middleware is intentionally minimal — no subdomain parsing, no
OIDC claim extraction. The private fork's reverse proxy or auth
gateway sets the header.

* feat: accept optional tenantId in updateInterfacePermissions

When tenantId is provided, the function re-enters inside
tenantStorage.run({ tenantId }) so all downstream Mongoose queries
target that tenant's roles instead of the system context. This lets
the private fork's tenant provisioning flow call
updateInterfacePermissions per-tenant after creating tenant-scoped
ADMIN/USER roles.

* fix: tenant-filter $lookup in getPromptGroup aggregation

The $lookup stage in getPromptGroup() queried the prompts collection
without tenant filtering. While the outer PromptGroup aggregate is
protected by the tenantIsolation plugin's pre('aggregate') hook,
$lookup runs as an internal MongoDB operation that bypasses Mongoose
hooks entirely.

Converts from simple field-based $lookup to pipeline-based $lookup
with an explicit tenantId match when tenant context is active.

* fix: replace field-level unique indexes with tenant-scoped compounds

Field-level unique:true creates a globally-unique single-field index in
MongoDB, which would cause insert failures across tenants sharing the
same ID values.

- agent.id: removed field-level unique, added { id, tenantId } compound
- convo.conversationId: removed field-level unique (compound at line 50
  already exists: { conversationId, user, tenantId })
- message.messageId: removed field-level unique (compound at line 165
  already exists: { messageId, user, tenantId })
- preset.presetId: removed field-level unique, added { presetId, tenantId }
  compound

* fix: scope MODELS_CONFIG, ENDPOINT_CONFIG, PLUGINS, TOOLS caches by tenant

These caches store per-tenant configuration (available models, endpoint
settings, plugin availability, tool definitions) but were using global
cache keys. In multi-tenant mode, one tenant's cached config would be
served to all tenants.

Appends :${tenantId} to cache keys when tenant context is active.
Falls back to the unscoped key when no tenant context exists (backward
compatible for single-tenant OSS deployments).

Covers all read, write, and delete sites:
- ModelController.js: get/set MODELS_CONFIG
- PluginController.js: get/set PLUGINS, get/set TOOLS
- getEndpointsConfig.js: get/set/delete ENDPOINT_CONFIG
- app.js: delete ENDPOINT_CONFIG (clearEndpointConfigCache)
- mcp.js: delete TOOLS (updateMCPTools, mergeAppTools)
- importers.js: get ENDPOINT_CONFIG

* fix: add getTenantId to PluginController spec mock

The data-schemas mock was missing getTenantId, causing all
PluginController tests to throw when the controller calls
getTenantId() for tenant-scoped cache keys.

* fix: address review findings — migration, strict-mode, DRY, types

Addresses all CRITICAL, MAJOR, and MINOR review findings:

F1 (CRITICAL): Add agents, conversations, messages, presets to
SUPERSEDED_INDEXES in tenantIndexes.ts so dropSupersededTenantIndexes()
drops the old single-field unique indexes that block multi-tenant inserts.

F2 (CRITICAL): Unknown bulkWrite op types now throw in strict mode
instead of silently passing through without tenant injection.

F3 (MAJOR): Replace wildcard export with named export for
tenantSafeBulkWrite, hiding _resetBulkWriteStrictCache from the
public package API.

F5 (MAJOR): Restore AnyBulkWriteOperation<IAclEntry>[] typing on
bulkWriteAclEntries — the unparameterized wrapper accepts parameterized
ops as a subtype.

F7 (MAJOR): Fix config.js tenant precedence — JWT-derived
req.user.tenantId now takes priority over the X-Tenant-Id header for
authenticated requests.

F8 (MINOR): Extract scopedCacheKey() helper into tenantContext.ts and
replace all 11 inline occurrences across 7 files.

F9 (MINOR): Use simple localField/foreignField $lookup for the
non-tenant getPromptGroup path (more efficient index seeks).

F12 (NIT): Remove redundant BulkOp type alias.
F13 (NIT): Remove debug log that leaked raw tenantId.

* fix: add new superseded indexes to tenantIndexes test fixture

The test creates old indexes to verify the migration drops them.
Missing fixture entries for agents.id_1, conversations.conversationId_1,
messages.messageId_1, and presets.presetId_1 caused the count assertion
to fail (expected 22, got 18).

* fix: restore logger.warn for unknown bulk op types in non-strict mode

* fix: block SYSTEM_TENANT_ID sentinel from external header input

CRITICAL: preAuthTenantMiddleware accepted any string as X-Tenant-Id,
including '__SYSTEM__'. The tenantIsolation plugin treats SYSTEM_TENANT_ID
as an explicit bypass — skipping ALL query filters. A client sending
X-Tenant-Id: __SYSTEM__ to pre-auth routes (/api/share, /api/config,
/api/auth, /oauth) would execute Mongoose operations without tenant
isolation.

Fixes:
- preAuthTenantMiddleware rejects SYSTEM_TENANT_ID in header
- scopedCacheKey returns the base key (not key:__SYSTEM__) in system
  context, preventing stale cache entries during runAsSystem()
- updateInterfacePermissions guards tenantId against SYSTEM_TENANT_ID
- $lookup pipeline separates $expr join from constant tenantId match
  for better index utilization
- Regression test for sentinel rejection in preAuthTenant.spec.ts
- Remove redundant getTenantId() call in config.js

* test: add missing deleteMany/replaceOne coverage, fix vacuous ALS assertions

bulkWrite spec:
- deleteMany: verifies tenant-scoped deletion leaves other tenants untouched
- replaceOne: verifies tenantId injected into both filter and replacement
- replaceOne overwrite: verifies a conflicting tenantId in the replacement
  document is overwritten by the ALS tenant (defense-in-depth)
- empty ops array: verifies graceful handling

preAuthTenant spec:
- All negative-case tests now use the capturedNext pattern to verify
  getTenantId() inside the middleware's execution context, not the
  test runner's outer frame (which was always undefined regardless)

* feat: tenant-isolate MESSAGES cache, FLOWS cache, and GenerationJobManager

MESSAGES cache (streamAudio.js):
- Cache key now uses scopedCacheKey(messageId) to prefix with tenantId,
  preventing cross-tenant message content reads during TTS streaming.

FLOWS cache (FlowStateManager):
- getFlowKey() now generates ${type}:${tenantId}:${flowId} when tenant
  context is active, isolating OAuth flow state per tenant.

GenerationJobManager:
- tenantId added to SerializableJobData and GenerationJobMetadata
- createJob() captures the current ALS tenant context (excluding
  SYSTEM_TENANT_ID) and stores it in job metadata
- SSE subscription endpoint validates job.metadata.tenantId matches
  req.user.tenantId, blocking cross-tenant stream access
- Both InMemoryJobStore and RedisJobStore updated to accept tenantId

* fix: add getTenantId and SYSTEM_TENANT_ID to MCP OAuth test mocks

FlowStateManager.getFlowKey() now calls getTenantId() for tenant-scoped
flow keys. The 4 MCP OAuth test files mock @librechat/data-schemas
without these exports, causing TypeError at runtime.

* fix: correct import ordering per AGENTS.md conventions

Package imports sorted shortest to longest line length, local imports
sorted longest to shortest — fixes ordering violations introduced by
our new imports across 8 files.

* fix: deserialize tenantId in RedisJobStore — cross-tenant SSE guard was no-op in Redis mode

serializeJob() writes tenantId to the Redis hash via Object.entries,
but deserializeJob() manually enumerates fields and omitted tenantId.
Every getJob() from Redis returned tenantId: undefined, causing the
SSE route's cross-tenant guard to short-circuit (undefined && ... → false).

* test: SSE tenant guard, FlowStateManager key consistency, ALS scope docs

SSE stream tenant tests (streamTenant.spec.js):
- Cross-tenant user accessing another tenant's stream → 403
- Same-tenant user accessing own stream → allowed
- OSS mode (no tenantId on job) → tenant check skipped

FlowStateManager tenant tests (manager.tenant.spec.ts):
- completeFlow finds flow created under same tenant context
- completeFlow does NOT find flow under different tenant context
- Unscoped flows are separate from tenant-scoped flows

Documentation:
- JSDoc on getFlowKey documenting ALS context consistency requirement
- Comment on streamAudio.js scopedCacheKey capture site

* fix: SSE stream tests hang on success path, remove internal fork references

The success-path tests entered the SSE streaming code which never
closes, causing timeout. Mock subscribe() to end the response
immediately. Restructured assertions to verify non-403/non-404.

Removed "private fork" and "OSS" references from code and test
descriptions — replaced with "deployment layer", "multi-tenant
deployments", and "single-tenant mode".

* fix: address review findings — test rigor, tenant ID validation, docs

F1: SSE stream tests now mock subscribe() with correct signature
(streamId, writeEvent, onDone, onError) and assert 200 status,
verifying the tenant guard actually allows through same-tenant users.

F2: completeFlow logs the attempted key and ALS tenantId when flow
is not found, so reverse proxy misconfiguration (missing X-Tenant-Id
on OAuth callback) produces an actionable warning.

F3/F10: preAuthTenantMiddleware validates tenant ID format — rejects
colons, special characters, and values exceeding 128 chars. Trims
whitespace. Prevents cache key collisions via crafted headers.

F4: Documented cache invalidation scope limitation in
clearEndpointConfigCache — only the calling tenant's key is cleared;
other tenants expire via TTL.

F7: getFlowKey JSDoc now lists all 8 methods requiring consistent
ALS context.

F8: Added dedicated scopedCacheKey unit tests — base key without
context, base key in system context, scoped key with tenant, no
ALS leakage across scope boundaries.

* fix: revert flow key tenant scoping, fix SSE test timing

FlowStateManager: Reverts tenant-scoped flow keys. OAuth callbacks
arrive without tenant ALS context (provider redirects don't carry
X-Tenant-Id), so completeFlow/failFlow would never find flows
created under tenant context. Flow IDs are random UUIDs with no
collision risk, and flow data is ephemeral (TTL-bounded).

SSE tests: Use process.nextTick for onDone callback so Express
response headers are flushed before res.write/res.end are called.

* fix: restore getTenantId import for completeFlow diagnostic log

* fix: correct completeFlow warning message, add missing flow test

The warning referenced X-Tenant-Id header consistency which was only
relevant when flow keys were tenant-scoped (since reverted). Updated
to list actual causes: TTL expiry, missing flow, or routing to a
different instance without shared Keyv storage.

Removed the getTenantId() call and import — no longer needed since
flow keys are unscoped.

Added test for the !flowState branch in completeFlow — verifies
return false and logger.warn on nonexistent flow ID.

* fix: add explicit return type to recursive updateInterfacePermissions

The recursive call (tenantId branch calls itself without tenantId)
causes TypeScript to infer circular return type 'any'. Adding
explicit Promise<void> satisfies the rollup typescript plugin.

* fix: update MCPOAuthRaceCondition test to match new completeFlow warning

* fix: clearEndpointConfigCache deletes both scoped and unscoped keys

Unauthenticated /api/endpoints requests populate the unscoped
ENDPOINT_CONFIG key. Admin config mutations clear only the
tenant-scoped key, leaving the unscoped entry stale indefinitely.
Now deletes both when in tenant context.

* fix: tenant guard on abort/status endpoints, warn logs, test coverage

F1: Add tenant guard to /chat/status/:conversationId and /chat/abort
matching the existing guard on /chat/stream/:streamId. The status
endpoint exposes aggregatedContent (AI response text) which requires
tenant-level access control.

F2: preAuthTenantMiddleware now logs warn for rejected __SYSTEM__
sentinel and malformed tenant IDs, providing observability for
bypass probing attempts.

F3: Abort fallback path (getActiveJobIdsForUser) now has tenant
check after resolving the job.

F4: Test for strict mode + SYSTEM_TENANT_ID — verifies runAsSystem
bypasses tenantSafeBulkWrite without throwing in strict mode.

F5: Test for job with tenantId + user without tenantId → 403.

F10: Regex uses idiomatic hyphen-at-start form.

F11: Test descriptions changed from "rejects" to "ignores" since
middleware calls next() (not 4xx).

Also fixes MCPOAuthRaceCondition test assertion to match updated
completeFlow warning message.

* fix: test coverage for logger.warn, status/abort guards, consistency

A: preAuthTenant spec now mocks logger and asserts warn calls for
__SYSTEM__ sentinel, malformed characters, and oversized headers.

B: streamTenant spec expanded with status and abort endpoint tests —
cross-tenant status returns 403, same-tenant returns 200 with body,
cross-tenant abort returns 403.

C: Abort endpoint uses req.user.tenantId (not req.user?.tenantId)
matching stream/status pattern — requireJwtAuth guarantees req.user.

D: Malformed header warning now includes ip in log metadata,
matching the sentinel warning for consistent SOC correlation.

* fix: assert ip field in malformed header warn tests

* fix: parallelize cache deletes, document tenant guard, fix import order

- clearEndpointConfigCache uses Promise.all for independent cache
  deletes instead of sequential awaits
- SSE stream tenant guard has inline comment explaining backward-compat
  behavior for untenanted legacy jobs
- conversation.ts local imports reordered longest-to-shortest per
  AGENTS.md

* fix: tenant-qualify userJobs keys, document tenant guard backward-compat

Job store userJobs keys now include tenantId when available:
- Redis: stream:user:{tenantId:userId}:jobs (falls back to
  stream:user:{userId}:jobs when no tenant)
- InMemory: composite key tenantId:userId in userJobMap

getActiveJobIdsByUser/getActiveJobIdsForUser accept optional tenantId
parameter, threaded through from req.user.tenantId at all call sites
(/chat/active and /chat/abort fallback).

Added inline comments on all three SSE tenant guards explaining the
backward-compat design: untenanted legacy jobs remain accessible
when the userId check passes.

* fix: parallelize cache deletes, document tenant guard, fix import order

Fix InMemoryJobStore.getActiveJobIdsByUser empty-set cleanup to use
the tenant-qualified userKey instead of bare userId — prevents
orphaned empty Sets accumulating in userJobMap for multi-tenant users.

Document cross-tenant staleness in clearEndpointConfigCache JSDoc —
other tenants' scoped keys expire via TTL, not active invalidation.

* fix: cleanup userJobMap leak, startup warning, DRY tenant guard, docs

F1: InMemoryJobStore.cleanup() now removes entries from userJobMap
before calling deleteJob, preventing orphaned empty Sets from
accumulating with tenant-qualified composite keys.

F2: Startup warning when TENANT_ISOLATION_STRICT is active — reminds
operators to configure reverse proxy to control X-Tenant-Id header.

F3: mergeAppTools JSDoc documents that tenant-scoped TOOLS keys are
not actively invalidated (matching clearEndpointConfigCache pattern).

F5: Abort handler getActiveJobIdsForUser call uses req.user.tenantId
(not req.user?.tenantId) — consistent with stream/status handlers.

F6: updateInterfacePermissions JSDoc clarifies SYSTEM_TENANT_ID
behavior — falls through to caller's ALS context.

F7: Extracted hasTenantMismatch() helper, replacing three identical
inline tenant guard blocks across stream/status/abort endpoints.

F9: scopedCacheKey JSDoc documents both passthrough cases (no context
and SYSTEM_TENANT_ID context).

* fix: clean userJobMap in evictOldest — same leak as cleanup()
2026-03-28 16:43:50 -04:00
Danny Avila
935288f841
🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
* feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring

- Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with
  `enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains`
- Add `MCPServerSource` type ('yaml' | 'config' | 'user') and `source`
  field to `ParsedServerConfig`
- Change `dbSourced` determination from `!!config.dbId` to
  `config.source === 'user'` across MCPManager, ConnectionsRepository,
  UserConnectionManager, and MCPServerInspector
- Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB

* feat: three-layer MCPServersRegistry with config cache and lazy init

- Add `configCacheRepo` as third repository layer between YAML cache and
  DB for admin-defined config-source MCP servers
- Implement `ensureConfigServers()` that identifies config-override servers
  from resolved `getAppConfig()` mcpConfig, lazily inspects them, and
  caches parsed configs with `source: 'config'`
- Add `lazyInitConfigServer()` with timeout, stub-on-failure, and
  concurrent-init deduplication via `pendingConfigInits` map
- Extend `getAllServerConfigs()` with optional `configServers` param for
  three-way merge: YAML → Config → User
- Add `getServerConfig()` lookup through config cache layer
- Add `invalidateConfigCache()` for clearing config-source inspection
  results on admin config mutations
- Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on
  DB-stored servers in `addServer()` and `addServerStub()`

* feat: wire tenant context into MCP controllers, services, and cache invalidation

- Resolve config-source servers via `getAppConfig({ role, tenantId })`
  in `getMCPTools()` and `getMCPServersList()` controllers
- Pass `ensureConfigServers()` results through `getAllServerConfigs()`
  for three-way merge of YAML + Config + User servers
- Add tenant/role context to `getMCPSetupData()` and connection status
  routes via `getTenantId()` from ALS
- Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin
  config mutations trigger re-inspection of config-source MCP servers

* feat: enforce tenantMcpPolicy on admin config mcpServers mutations

- Add `validateMcpServerPolicy()` helper that checks mcpServers against
  operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant,
  allowedTransports, allowedDomains)
- Wire validation into `upsertConfigOverrides` and `patchConfigField`
  handlers — rejects with 403 when policy is violated
- Infer transport type from config shape (command → stdio, url protocol
  → websocket/sse, type field → streamable-http)
- Validate server domains against policy allowlist when configured

* revert: remove tenantMcpPolicy schema and enforcement

The existing admin config CRUD routes already provide the mechanism
for granular MCP server prepopulation (groups, roles, users). The
tenantMcpPolicy gating adds unnecessary complexity that can be
revisited if needed in the future.

- Remove tenantMcpPolicy from mcpSettings Zod schema
- Remove validateMcpServerPolicy helper and TenantMcpPolicy interface
- Remove policy enforcement from upsertConfigOverrides and
  patchConfigField handlers

* test: update test assertions for source field and config-server wiring

- Use objectContaining in MCPServersRegistry reset test to account for
  new source: 'yaml' field on CACHE-stored configs
- Add getTenantId and ensureConfigServers mocks to MCP route tests
- Add getAppConfig mock to route test Config service mock
- Update getMCPSetupData assertion to expect second options argument
- Update getAllServerConfigs assertions for new configServers parameter

* fix: disconnect active connections when config-source servers are evicted

When admin config overrides change and config-source MCP servers are
removed, the invalidation now proactively disconnects active connections
for evicted servers instead of leaving them lingering until timeout.

- Return evicted server names from invalidateConfigCache()
- Disconnect app-level connections for evicted servers in
  clearMcpConfigCache() via MCPManager.appConnections.disconnect()

* fix: address code review findings (CRITICAL, MAJOR, MINOR)

CRITICAL fixes:
- Scope configCacheRepo keys by config content hash to prevent
  cross-tenant cache poisoning when two tenants define the same
  server name with different configurations
- Change dbSourced checks from `source === 'user'` to
  `source !== 'yaml' && source !== 'config'` so undefined source
  (pre-upgrade cached configs) fails closed to restricted mode

MAJOR fixes:
- Derive OAuth servers from already-computed mcpConfig instead of
  calling getOAuthServers() separately — config-source OAuth servers
  are now properly detected
- Add parseInt radix (10) and NaN guard with fallback to 30_000
  for CONFIG_SERVER_INIT_TIMEOUT_MS
- Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in
  ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Remove `if (role || tenantId)` guard in getMCPSetupData — config
  servers now always resolve regardless of tenant context

MINOR fixes:
- Extract resolveAllMcpConfigs() helper in mcp controller to
  eliminate 3x copy-pasted config resolution boilerplate
- Distinguish "not initialized" from real errors in
  clearMcpConfigCache — log actual failures instead of swallowing
- Remove narrative inline comments per style guide
- Remove dead try/catch inside Promise.allSettled in
  ensureConfigServers (inner method never throws)
- Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll()
  calls per request

Test updates:
- Add ensureConfigServers mock to registry test fixtures
- Update getMCPSetupData assertions for inline OAuth derivation

* fix: address code review findings (CRITICAL, MAJOR, MINOR)

CRITICAL fixes:
- Break circular dependency: move CONFIG_CACHE_NAMESPACE from
  MCPServersRegistry to ServerConfigsCacheFactory
- Fix dbSourced fail-closed: use source field when present, fall back to
  legacy dbId check when absent (backward-compatible with pre-upgrade
  cached configs that lack source field)

MAJOR fixes:
- Add CONFIG_CACHE_NAMESPACE to aggregate-key set in
  ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests)
  covering lazy init, stub-on-failure, cross-tenant isolation via config
  hash keys, concurrent deduplication, merge order, and cache invalidation

MINOR fixes:
- Update MCPServerInspector test assertion for dbSourced change

* fix: restore getServerConfig lookup for config-source servers (NEW-1)

Add configNameToKey map that indexes server name → hash-based cache key
for O(1) lookup by name in getServerConfig. This restores the config
cache layer that was dropped when hash-based keys were introduced.

Without this fix, config-source servers appeared in tool listings
(via getAllServerConfigs) but getServerConfig returned undefined,
breaking all connection and tool call paths.

- Populate configNameToKey in ensureSingleConfigServer
- Clear configNameToKey in invalidateConfigCache and reset
- Clear stale read-through cache entries after lazy init
- Remove dead code in invalidateConfigCache (config.title, key parsing)
- Add getServerConfig tests for config-source server lookup

* fix: eliminate configNameToKey race via caller-provided configServers param

Replace the process-global configNameToKey map (last-writer-wins under
concurrent multi-tenant load) with a configServers parameter on
getServerConfig. Callers pass the pre-resolved config servers map
directly — no shared mutable state, no cross-tenant race.

- Add optional configServers param to getServerConfig; when provided,
  returns matching config directly without any global lookup
- Remove configNameToKey map entirely (was the source of the race)
- Extract server names from cache keys via lastIndexOf in
  invalidateConfigCache (safe for names containing colons)
- Use mcpConfig[serverName] directly in getMCPTools instead of a
  redundant getServerConfig call
- Add cross-tenant isolation test for getServerConfig

* fix: populate read-through cache after config server lazy init

After lazyInitConfigServer succeeds, write the parsed config to
readThroughCache keyed by serverName so that getServerConfig calls
from ConnectionsRepository, UserConnectionManager, and
MCPManager.callTool find the config without needing configServers.

Without this, config-source servers appeared in tool listings but
every connection attempt and tool call returned undefined.

* fix: user-scoped getServerConfig fallback to server-only cache key

When getServerConfig is called with a userId (e.g., from callTool or
UserConnectionManager), the cache key is serverName::userId. Config-source
servers are cached under the server-only key (no userId). Add a fallback
so user-scoped lookups find config-source servers in the read-through cache.

* fix: configCacheRepo fallback, isUserSourced DRY, cross-process race

CRITICAL: Add findInConfigCache fallback in getServerConfig so
config-source servers remain reachable after readThroughCache TTL
expires (5s). Without this, every tool call after 5s returned
undefined for config-source servers.

MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace
all 5 inline dbSourced ternary expressions (MCPManager x2,
ConnectionsRepository, UserConnectionManager, MCPServerInspector).

MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when
configCacheRepo.add throws (key exists from another process), fall
back to reading the existing entry instead of returning undefined.

MINOR: Parallelize invalidateConfigCache awaits with Promise.all.
Remove redundant .catch(() => {}) inside Promise.allSettled.
Tighten dedup test assertion to toBe(1).
Add TTL-expiry tests for getServerConfig (with and without userId).

* feat: thread configServers through getAppToolFunctions and formatInstructionsForContext

Add optional configServers parameter to getAppToolFunctions,
getInstructions, and formatInstructionsForContext so config-source
server tools and instructions are visible to agent initialization
and context injection paths.

Existing callers (boot-time init, tests) pass no argument and
continue to work unchanged. Agent runtime paths can now thread
resolved config servers from request context.

* fix: stale failure stubs retry after 5 min, upsert for cross-process races

- Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried
  instead of permanently disabling config-source servers after transient
  errors (DNS outage, cold-start race)
- Extract upsertConfigCache() helper that tries add then falls back to
  update, preventing cross-process Redis races where a second instance's
  successful inspection result was discarded
- Add test for stale-stub retry after CONFIG_STUB_RETRY_MS

* fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup

- Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so
  CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs
  were always considered stale (updatedAt ?? 0 → epoch → always expired)
- Add null guard for rawConfig in MCPManager.callTool before passing to
  preProcessGraphTokens — prevents unsafe `as` cast on undefined
- Log double-failure in upsertConfigCache instead of silently swallowing
- Replace module-scope Date.now monkey-patch with jest.useFakeTimers /
  jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests

* fix: server-only readThrough fallback only returns truthy values

Prevents a cached undefined from a prior no-userId lookup from
short-circuiting the DB query on a subsequent userId-scoped lookup.

* fix: remove findInConfigCache to eliminate cross-tenant config leakage

The findInConfigCache prefix scan (serverName:*) could return any
tenant's config after readThrough TTL expires, violating tenant
isolation. Config-source servers are now ONLY resolvable through:

1. The configServers param (callers with tenant context from ALS)
2. The readThrough cache (populated by ensureSingleConfigServer,
   5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs)

Connection/tool-call paths without tenant context rely exclusively on
the readThrough cache. If it expires before the next HTTP request
repopulates it, the server is not found — which is correct because
there is no tenant context to determine which config to return.

- Remove findInConfigCache method and its call in getServerConfig
- Update server-only readThrough fallback to only return truthy values
  (prevents cached undefined from short-circuiting user-scoped DB lookup)
- Update tests to document tenant isolation behavior after cache expiry

* style: fix import order per AGENTS.md conventions

Sort package imports shortest-to-longest, local imports longest-to-shortest
across MCPServersRegistry, ConnectionsRepository, MCPManager,
UserConnectionManager, and MCPServerInspector.

* fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures

Thread pre-resolved serverConfig from tool creation context into
callTool, removing dependency on the readThrough cache for config-source
servers. This fixes two issues:

- Cross-tenant contamination: the readThrough cache key was unscoped
  (just serverName), so concurrent multi-tenant requests for same-named
  servers would overwrite each other's entries
- TTL expiry: tool calls happening >5s after config resolution would
  fail with "Configuration not found" because the readThrough entry
  had expired

Changes:
- Add optional serverConfig param to MCPManager.callTool — uses
  provided config directly, falling back to getServerConfig lookup
  for YAML/user servers
- Thread serverConfig from createMCPTool through createToolInstance
  closure to callTool
- Remove readThrough write from ensureSingleConfigServer — config-source
  servers are only accessible via configServers param (tenant-scoped)
- Remove server-only readThrough fallback from getServerConfig
- Increase config cache hash from 8 to 16 hex chars (64-bit)
- Add isUserSourced boundary tests for all source/dbId combinations
- Fix double Object.keys call in getMCPTools controller
- Update test assertions for new getServerConfig behavior

* fix: cache base configs for config-server users; narrow upsertConfigCache error handling

- Refactor getAllServerConfigs to separate base config fetch (YAML + DB)
  from config-server layering. Base configs are cached via readThroughCacheAll
  regardless of whether configServers is provided, eliminating uncached
  MongoDB queries per request for config-server users
- Narrow upsertConfigCache catch to duplicate-key errors only;
  infrastructure errors (Redis timeouts, network failures) now propagate
  instead of being silently swallowed, preventing inspection storms
  during outages

* fix: restore correct merge order and document upsert error matching

- Restore YAML → Config → User DB precedence in getAllServerConfigs
  (user DB servers have highest precedence, matching the JSDoc contract)
- Add source comment on upsertConfigCache duplicate-key detection
  linking to the two cache implementations that define the error message

* feat: complete config-source server support across all execution paths

Wire configServers through the entire agent execution pipeline so
config-source MCP servers are fully functional — not just visible in
listings but executable in agent sessions.

- Thread configServers into handleTools.js agent tool pipeline: resolve
  config servers from tenant context before MCP tool iteration, pass to
  getServerConfig, createMCPTools, and createMCPTool
- Thread configServers into agent instructions pipeline:
  applyContextToAgent → getMCPInstructionsForServers →
  formatInstructionsForContext, resolved in client.js before agent
  context application
- Add configServers param to createMCPTool and createMCPTools for
  reconnect path fallback
- Add source field to redactServerSecrets allowlist for client UI
  differentiation of server tiers
- Narrow invalidateConfigCache to only clear readThroughCacheAll (merged
  results), preserving YAML individual-server readThrough entries
- Update context.spec.ts assertions for new configServers parameter

* fix: add missing mocks for config-source server dependencies in client.test.js

Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added
to client.js but not reflected in the test file's jest.mock declarations.

* fix: update formatInstructionsForContext assertions for configServers param

The test assertions expected formatInstructionsForContext to be called with
only the server names array, but it now receives configServers as a second
argument after the config-source server feature wiring.

* fix: move configServers resolution before MCP tool loop to avoid TDZ

configServers was declared with `let` after the first tool loop but
referenced inside it via getServerConfig(), causing a ReferenceError
temporal dead zone. Move declaration and resolution before the loop,
using tools.some(mcpToolPattern) to gate the async resolution.

* fix: address review findings — cache bypass, discoverServerTools gap, DRY

- #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via
  readThroughCacheAll) instead of bypassing it when configServers is present.
  Extracts user-DB entries from cached base by diffing against YAML keys
  to maintain YAML → Config → User DB merge order without extra MongoDB calls.

- #3: Add configServers param to ToolDiscoveryOptions and thread it through
  discoverServerTools → getServerConfig so config-source servers are
  discoverable during OAuth reconnection flows.

- #6: Replace inline import() type annotations in context.ts with proper
  import type { ParsedServerConfig } per AGENTS.md conventions.

- #7: Extract resolveConfigServers(req) helper in MCP.js and use it from
  handleTools.js and client.js, eliminating the duplicated 6-line config
  resolution pattern.

- #10: Restore removed "why" comment explaining getLoaded() vs getAll()
  choice in getMCPSetupData — documents non-obvious correctness constraint.

- #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs.

* fix: consolidate imports, reorder constants, fix YAML-DB merge edge case

- Merge duplicate @librechat/data-schemas requires in MCP.js into one
- Move resolveConfigServers after module-level constants
- Fix getAllServerConfigs edge case where user-DB entry overriding a
  YAML entry with the same name was excluded from userDbConfigs; now
  uses reference equality check to detect DB-overwritten YAML keys

* fix: replace fragile string-match error detection with proper upsert method

Add upsert() to IServerConfigsRepositoryInterface and all implementations
(InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle
error message string match ('already exists in cache') in upsertConfigCache
that was the only thing preventing cross-process init races from silently
discarding inspection results.

Each implementation handles add-or-update atomically:
- InMemory: direct Map.set()
- Redis: direct cache.set()
- RedisAggregateKey: read-modify-write under write lock
- DB: delegates to update() (DB servers use explicit add() with ACL setup)

* fix: wire configServers through remaining HTTP endpoints

- getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig
- reinitialize route: resolve configServers before getServerConfig
- auth-values route: resolve configServers before getServerConfig
- getOAuthHeaders: accept configServers param, thread from callers
- Update mcp.spec.js tests to mock getAllServerConfigs for GET by name

* fix: thread serverConfig through getConnection for config-source servers

Config-source servers exist only in configCacheRepo, not in YAML cache or
DB. When callTool → getConnection → getUserConnection → getServerConfig
runs without configServers, it returns undefined and throws. Fix by
threading the pre-resolved serverConfig (providedConfig) from callTool
through getConnection → getUserConnection → createUserConnectionInternal,
using it as a fallback before the registry lookup.

* fix: thread configServers through reinit, reconnect, and tool definition paths

Wire configServers through every remaining call chain that creates or
reconnects MCP server connections:

- reinitMCPServer: accepts serverConfig and configServers, uses them for
  getServerConfig fallback, getConnection, and discoverServerTools
- reconnectServer: accepts and passes configServers to reinitMCPServer
- createMCPTools/createMCPTool: pass configServers to reconnectServer
- ToolService.loadToolDefinitionsWrapper: resolves configServers from req,
  passes to both reinitMCPServer call sites
- reinitialize route: passes serverConfig and configServers to reinitMCPServer

* fix: address review findings — simplify merge, harden error paths, fix log labels

- Simplify getAllServerConfigs merge: replace fragile reference-equality
  loop with direct spread { ...yamlConfigs, ...configServers, ...base }
- Guard upsertConfigCache in lazyInitConfigServer catch block so cache
  failures don't mask the original inspection error
- Deduplicate getYamlServerNames cold-start with promise dedup pattern
- Remove dead `if (!mcpConfig)` guard in getMCPSetupData
- Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error
  messages — now uses this.namespace for correct Config/App labeling
- Remove misleading OAuth callback comment about readThrough cache
- Move resolveConfigServers after module-level constants in MCP.js

* fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label

- Clear yamlServerNamesPromise on rejection so transient cache errors
  don't permanently prevent ensureConfigServers from working
- Skip reinspectServer for config-source servers (source: 'config') in
  reinitMCPServer — they lack a CACHE/DB storage location; retry is
  handled by CONFIG_STUB_RETRY_MS in ensureConfigServers
- Use source field instead of dbId for storageLocation derivation
- Fix remaining hardcoded "App" in reset() leaderCheck message

* fix: persist oauthHeaders in flow state for config-source OAuth servers

The OAuth callback route has no JWT auth context and cannot resolve
config-source server configs. Previously, getOAuthHeaders would silently
return {} for config-source servers, dropping custom token exchange headers.

Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow
initiation (which has auth context), and the callback reads them from
the stored flow state with a fallback to the registry lookup for
YAML/user-DB servers.

* fix: update tests for getMCPSetupData null guard removal and ToolService mock

- MCP.spec.js: update test to expect graceful handling of null mcpConfig
  instead of a throw (getAllServerConfigs always returns an object)
- MCP.js: add defensive || {} for Object.entries(mcpConfig) in case of
  null from test mocks
- ToolService.spec.js: add missing mock for ~/server/services/MCP
  (resolveConfigServers)

* fix: address review findings — DRY, naming, logging, dead code, defensive guards

- #1: Simplify getAllServerConfigs to single getBaseServerConfigs call,
  eliminating redundant double-fetch of cacheConfigsRepo.getAll()
- #2: Add warning log when oauthHeaders absent from OAuth callback flow state
- #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller
  imports shared helper instead of reimplementing
- #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider
  in createToolInstance — these are actively used, not unused
- #5: Log rejected results from ensureConfigServers Promise.allSettled
  so cache errors are visible instead of silently dropped
- #6: Remove dead 'MCP config not found' error handlers from routes
- #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache
- #8: Remove logger.error from withTimeout to prevent double-logging timeouts
- #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message
- #12: Use spread instead of mutation in addServer for immutability consistency
- Add upsert mock to ensureConfigServers.test.ts DB mock
- Update route tests for resolveAllMcpConfigs import change

* fix: restore correct merge priority, use immutable spread, fix test mock

- getAllServerConfigs: { ...configServers, ...base } so userDB wins over
  configServers, matching documented "User DB (highest)" priority
- lazyInitConfigServer: use immutable spread instead of direct mutation
  for parsedConfig.source, consistent with addServer fix
- Fix test to mock getAllServerConfigs as {} instead of null, remove
  unnecessary || {} defensive guard in getMCPSetupData

* fix: error handling, stable hashing, flatten nesting, remove dead param

- Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with
  graceful {} fallback so transient DB/cache errors don't crash tool pipeline
- Sort keys in configCacheKey JSON.stringify for deterministic hashing
  regardless of object property insertion order
- Flatten clearMcpConfigCache from 3 nested try-catch to early returns;
  document that user connections are cleaned up lazily (accepted tradeoff)
- Remove dead configServers param from getAppToolFunctions (never passed)
- Add security rationale comment for source field in redactServerSecrets

* fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision

The array replacer in JSON.stringify acts as a property allowlist at
every nesting depth, silently dropping nested keys like headers['X-API-Key'],
oauth.client_secret, etc. Two configs with different nested values but
identical top-level structure produced the same hash, causing cross-tenant
cache hits and potential credential contamination.

Switch to a function replacer that recursively sorts keys at all depths
without dropping any properties.

Also document the known gap in getOAuthServers: config-source OAuth
servers are not covered by auto-reconnection or uninstall cleanup
because callers lack request context.

* fix: move clearMcpConfigCache to packages/api to eliminate circular dependency

The function only depends on MCPServersRegistry and MCPManager, both of
which live in packages/api. Import it directly from @librechat/api in
the CJS layer instead of using dynamic require('~/config').

* chore: imports/fields ordering

* fix: address review findings — error handling, targeted lookup, test gaps

- Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so
  getAppConfig/getAllServerConfigs failures propagate instead of masking
  infrastructure errors as empty server lists.
- Use targeted getServerConfig in getMCPServerById instead of fetching
  all server configs for a single-server lookup.
- Forward configServers to inner createMCPTool calls so reconnect path
  works for config-source servers.
- Update getAllServerConfigs JSDoc to document disjoint-key design.
- Add OAuth callback oauthHeaders fallback tests (flow state present
  vs registry fallback).
- Add resolveConfigServers/resolveAllMcpConfigs unit tests covering
  happy path and error propagation.

* fix: add getOAuthReconnectionManager mock to OAuth callback tests

* chore: imports ordering
2026-03-28 10:36:43 -04:00
Danny Avila
77712c825f
🏢 feat: Tenant-Scoped App Config in Auth Login Flows (#12434)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
* feat: add resolveAppConfigForUser utility for tenant-scoped auth config

TypeScript utility in packages/api that wraps getAppConfig in
tenantStorage.run() when the user has a tenantId, falling back to
baseOnly for new users or non-tenant deployments. Uses DI pattern
(getAppConfig passed as parameter) for testability.

Auth flows apply role-level overrides only (userId not passed)
because user/group principal resolution is deferred to post-auth.

* feat: tenant-scoped app config in auth login flows

All auth strategies (LDAP, SAML, OpenID, social login) now use a
two-phase domain check consistent with requestPasswordReset:

1. Fast-fail with base config (memory-cached, zero DB queries)
2. DB user lookup
3. Tenant-scoped re-check via resolveAppConfigForUser (only when
   user has a tenantId; otherwise reuse base config)

This preserves the original fast-fail protection against globally
blocked domains while enabling tenant-specific config overrides.

OpenID error ordering preserved: AUTH_FAILED checked before domain
re-check so users with wrong providers get the correct error type.

registerUser unchanged (baseOnly, no user identity yet).

* test: add tenant-scoped config tests for auth strategies

Add resolveAppConfig.spec.ts in packages/api with 8 tests:
- baseOnly fallback for null/undefined/no-tenant users
- tenant-scoped config with role and tenantId
- ALS context propagation verified inside getAppConfig callback
- undefined role with tenantId edge case

Update strategy and AuthService tests to mock resolveAppConfigForUser
via @librechat/api. Tests verify two-phase domain check behavior:
fast-fail before DB, tenant re-check after. Non-tenant users reuse
base config without calling resolveAppConfigForUser.

* refactor: skip redundant domain re-check for non-tenant users

Guard the second isEmailDomainAllowed call with appConfig !== baseConfig
in SAML, OpenID, and social strategies. For non-tenant users the tenant
config is the same base config object, so the second check is a no-op.

Narrow eslint-disable in resolveAppConfig.spec.ts to the specific
require line instead of blanket file-level suppression.

* fix: address review findings — consistency, tests, and ordering

- Consolidate duplicate require('@librechat/api') in AuthService.js
- Add two-phase domain check to LDAP (base fast-fail before findUser),
  making all strategies consistent with PR description
- Add appConfig !== baseConfig guard to requestPasswordReset second
  domain check, consistent with SAML/OpenID/social strategies
- Move SAML provider check before tenant config resolution to avoid
  unnecessary resolveAppConfigForUser call for wrong-provider users
- Add tenant domain rejection tests to SAML, OpenID, and social specs
  verifying that tenant config restrictions actually block login
- Add error propagation tests to resolveAppConfig.spec.ts
- Remove redundant mockTenantStorage alias in resolveAppConfig.spec.ts
- Narrow eslint-disable to specific require line

* test: add tenant domain rejection test for LDAP strategy

Covers the appConfig !== baseConfig && !isEmailDomainAllowed path,
consistent with SAML, OpenID, and social strategy specs.

* refactor: rename resolveAppConfig to app/resolve per AGENTS.md

Rename resolveAppConfig.ts → resolve.ts and
resolveAppConfig.spec.ts → resolve.spec.ts to align with
the project's concise naming convention.

* fix: remove fragile reference-equality guard, add logging and docs

Remove appConfig !== baseConfig guard from all strategies and
requestPasswordReset. The guard relied on implicit cache-backend
identity semantics (in-memory Keyv returns same object reference)
that would silently break with Redis or cloned configs. The second
isEmailDomainAllowed call is a cheap synchronous check — always
running it is clearer and eliminates the coupling.

Add audit logging to requestPasswordReset domain blocks (base and
tenant), consistent with all auth strategies.

Extract duplicated error construction into makeDomainDeniedError().

Wrap resolveAppConfigForUser in requestPasswordReset with try/catch
to prevent DB errors from leaking to the client via the controller's
generic catch handler.

Document the dual tenantId propagation (ALS for DB isolation,
explicit param for cache key) in resolveAppConfigForUser JSDoc.

Add comment documenting the LDAP error-type ordering change
(cross-provider users from blocked domains now get 'domain not
allowed' instead of AUTH_FAILED).

Assert resolveAppConfigForUser is not called on LDAP provider
mismatch path.

* fix: return generic response for tenant domain block in password reset

Tenant-scoped domain rejection in requestPasswordReset now returns the
same generic "If an account with that email exists..." response instead
of an Error. This prevents user-enumeration: an attacker cannot
distinguish between "email not found" and "tenant blocks this domain"
by comparing HTTP responses.

The base-config fast-fail (pre-user-lookup) still returns an Error
since it fires before any user existence is revealed.

* docs: document phase 1 vs phase 2 domain check behavior in JSDoc

Phase 1 (base config, pre-findUser) intentionally returns Error/400
to reveal globally blocked domains without confirming user existence.
Phase 2 (tenant config, post-findUser) returns generic 200 to prevent
user-enumeration. This distinction is now explicit in the JSDoc.
2026-03-27 16:08:43 -04:00
Dustin Healy
5972a21479
🪪 feat: Admin Roles API Endpoints (#12400)
* feat: add createRole and deleteRole methods to role

* feat: add admin roles handler factory and Express routes

* fix: address convention violations in admin roles handlers

* fix: rename createRole/deleteRole to avoid AccessRole name collision

The existing accessRole.ts already exports createRole/deleteRole for the
AccessRole model. In createMethods index.ts, these are spread after
roleMethods, overwriting them. Renamed our Role methods to
createRoleByName/deleteRoleByName to match the existing pattern
(getRoleByName, updateRoleByName) and avoid the collision.

* feat: add description field to Role model

- Add description to IRole, CreateRoleRequest, UpdateRoleRequest types
- Add description field to Mongoose roleSchema (default: '')
- Wire description through createRoleHandler and updateRoleHandler
- Include description in listRoles select clause so it appears in list

* fix: address Copilot review findings in admin roles handlers

* test: add unit tests for admin roles and groups handlers

* test: add data-layer tests for createRoleByName, deleteRoleByName, listUsersByRole

* fix: allow system role updates when name is unchanged

The updateRoleHandler guard rejected any request where body.name matched
a system role, even when the name was not being changed. This blocked
editing a system role's description. Compare against the URL param to
only reject actual renames to reserved names.

* fix: address external review findings for admin roles

- Block renaming system roles (ADMIN/USER) and add user migration on rename
- Add input validation: name max-length, trim on update, duplicate name check
- Replace fragile String.includes error matching with prefix-based classification
- Catch MongoDB 11000 duplicate key in createRoleByName
- Add pagination (limit/offset/total) to getRoleMembersHandler
- Reverse delete order in deleteRoleByName — reassign users before deletion
- Add role existence check in removeRoleMember; drop unused createdAt select
- Add Array.isArray guard for permissions input; use consistent ?? coalescing
- Fix import ordering per AGENTS.md conventions
- Type-cast mongoose.models.User as Model<IUser> for proper TS inference
- Add comprehensive tests: rename guards, pagination, validation, 500 paths

* fix: address re-review findings for admin roles

- Gate deleteRoleByName on existence check — skip user reassignment and
  cache invalidation when role doesn't exist (fixes test mismatch)
- Reverse rename order: migrate users before renaming role so a migration
  failure leaves the system in a consistent state
- Add .sort({ _id: 1 }) to listUsersByRole for deterministic pagination
- Import shared AdminMember type from data-schemas instead of local copy;
  make joinedAt optional since neither groups nor roles populate it
- Change IRole.description from optional to required to match schema default
- Add data-layer tests for updateUsersByRole and countUsersByRole
- Add handler test verifying users-first rename ordering and migration
  failure safety

* fix: add rollback on rename failure and update PR description

- Roll back user migration if updateRoleByName returns null during a
  rename (race: role deleted between existence check and update)
- Add test verifying rollback calls updateUsersByRole in reverse
- Update PR #12400 description to reflect current test counts (56
  handler tests, 40 data-layer tests) and safety features

* fix: rollback on rename throw, description validation, delete/DRY cleanup

- Hoist isRename/trimmedName above try block so catch can roll back user
  migration when updateRoleByName throws (not just returns null)
- Add description type + max-length (2000) validation in create and update,
  consistent with groups handler
- Remove redundant getRoleByName existence check in deleteRoleHandler —
  use deleteRoleByName return value directly
- Skip no-op name write when body.name equals current name (use isRename)
- Extract getUserModel() accessor to DRY repeated Model<IUser> casts
- Use name.trim() consistently in createRoleByName error messages
- Add tests: rename-throw rollback, description validation (create+update),
  update delete test mocks to match simplified handler

* fix: guard spurious rollback, harden createRole error path, validate before DB calls

- Add migrationRan flag to prevent rollback of user migration that never ran
- Return generic message on 500 in createRoleHandler, specific only for 409
- Move description validation before DB queries in updateRoleHandler
- Return existing role early when update body has no changes
- Wrap cache.set in createRoleByName with try/catch to prevent masking DB success
- Add JSDoc on 11000 catch explaining compound unique index
- Add tests: spurious rollback guard, empty update body, description validation
  ordering, listUsersByRole pagination

* fix: validate permissions in create, RoleConflictError, rollback safety, cache consistency

- Add permissions type/array validation in createRoleHandler
- Introduce RoleConflictError class replacing fragile string-prefix matching
- Wrap rollback in !role null path with try/catch for correct 404 response
- Wrap deleteRoleByName cache.set in try/catch matching createRoleByName
- Narrow updateRoleHandler body type to { name?, description? }
- Add tests: non-string description in create, rollback failure logging,
  permissions array rejection, description max-length assertion fix

* feat: prevent removing the last admin user

Add guard in removeRoleMember that checks countUsersByRole before
demoting an ADMIN user, returning 400 if they are the last one.

* fix: move interleaved export below imports, add await to countUsersByRole

* fix: paginate listRoles, null-guard permissions handler, fix export ordering

- Add limit/offset/total pagination to listRoles matching the groups pattern
- Add countRoles data-layer method
- Omit permissions from listRoles select (getRole returns full document)
- Null-guard re-fetched role in updateRolePermissionsHandler
- Move interleaved export below all imports in methods/index.ts

* fix: address review findings — race safety, validation DRY, type accuracy, test coverage

- Add post-write admin count verification in removeRoleMember to prevent
  zero-admin race condition (TOCTOU → rollback if count hits 0)
- Make IRole.description optional; backfill in initializeRoles for
  pre-existing roles that lack the field (.lean() bypasses defaults)
- Extract parsePagination, validateNameParam, validateRoleName, and
  validateDescription helpers to eliminate duplicated validation
- Add validateNameParam guard to all 7 handlers reading req.params.name
- Catch 11000 in updateRoleByName and surface as 409 via RoleConflictError
- Add idempotent skip in addRoleMember when user already has target role
- Verify updateRolePermissions test asserts response body
- Add data-layer tests: listRoles sort/pagination/projection, countRoles,
  and createRoleByName 11000 duplicate key race

* fix: defensive rollback in removeRoleMember, type/style cleanup, test coverage

- Wrap removeRoleMember post-write admin rollback in try/catch so a
  transient DB failure cannot leave the system with zero administrators
- Replace double `as unknown[] as IRole[]` cast with `.lean<IRole[]>()`
- Type parsePagination param explicitly; extract DEFAULT/MAX page constants
- Preserve original error cause in updateRoleByName re-throw
- Add test for rollback failure path in removeRoleMember (returns 400)
- Add test for pre-existing roles missing description field (.lean())

* chore: bump @librechat/data-schemas to 0.0.47

* fix: stale cache on rename, extract renameRole helper, shared pagination, cleanup

- Fix updateRoleByName cache bug: invalidate old key and populate new key
  when updates.name differs from roleName (prevents stale cache after rename)
- Extract renameRole helper to eliminate mutable outer-scope state flags
  (isRename, trimmedName, migrationRan) in updateRoleHandler
- Unify system-role protection to 403 for both rename-from and rename-to
- Extract parsePagination to shared admin/pagination.ts; use in both
  roles.ts and groups.ts
- Extract name.trim() to local const in createRoleByName (was called 5×)
- Remove redundant findOne pre-check in deleteRoleByName
- Replace getUserModel closure with local const declarations
- Remove redundant description ?? '' in createRoleHandler (schema default)
- Add doc comment on updateRolePermissionsHandler noting cache dependency
- Add data-layer tests for cache rename behavior (old key null, new key set)

* fix: harden role guards, add User.role index, validate names, improve tests

- Add index on User.role field for efficient member queries at scale
- Replace fragile SystemRoles key lookup with value-based Set check (6 sites)
- Elevate rename rollback failure logging to CRITICAL (matches removeRoleMember)
- Guard removeRoleMember against non-ADMIN system roles (403 for USER)
- Fix parsePagination limit=0 gotcha: use parseInt + NaN check instead of ||
- Add control character and reserved path segment validation to role names
- Simplify validateRoleName: remove redundant casts and dead conditions
- Add JSDoc to deleteRoleByName documenting non-atomic window
- Split mixed value+type import in methods/index.ts per AGENTS.md
- Add 9 new tests: permissions assertion, combined rename+desc, createRole
  with permissions, pagination edge cases, control char/reserved name
  rejection, system role removeRoleMember guard

* fix: exact-case reserved name check, consistent validation, cleaner createRole

- Remove .toLowerCase() from reserved name check so only exact matches
  (members, permissions) are rejected, not legitimate names like "Members"
- Extract trimmed const in validateRoleName for consistent validation
- Add control char check to validateNameParam for parity with body validation
- Build createRole roleData conditionally to avoid passing description: undefined
- Expand deleteRoleByName JSDoc documenting self-healing design and no-op trade-off

* fix: scope rename rollback to only migrated users, prevent cross-role corruption

Capture user IDs before forward migration so the rollback path only
reverts users this request actually moved. Previously the rollback called
updateUsersByRole(newName, currentName) which would sweep all users with
the new role — including any independently assigned by a concurrent admin
request — causing silent cross-role data corruption.

Adds findUserIdsByRole and updateUsersRoleByIds to the data layer.
Extracts rollbackMigratedUsers helper to deduplicate rollback sites.

* fix: guard last admin in addRoleMember to prevent zero-admin lockout

Since each user has exactly one role, addRoleMember implicitly removes
the user from their current role. Without a guard, reassigning the sole
admin to a non-admin role leaves zero admins and locks out admin
management. Adds the same countUsersByRole check used in removeRoleMember.

* fix: wire findUserIdsByRole and updateUsersRoleByIds into roles route

The scoped rollback deps added in c89b5db were missing from the route
DI wiring, causing renameRole to call undefined and return a 500.

* fix: post-write admin guard in addRoleMember, compound role index, review cleanup

- Add post-write admin count check + rollback to addRoleMember to match
  removeRoleMember's two-phase TOCTOU protection (prevents zero-admin via
  concurrent requests)
- Replace single-field User.role index with compound { role: 1, tenantId: 1 }
  to align with existing multi-tenant index pattern (email, OAuth IDs)
- Narrow listRoles dep return type to RoleListItem (projected fields only)
- Refactor validateDescription to early-return style per AGENTS.md
- Remove redundant double .lean() in updateRoleByName
- Document rename snapshot race window in renameRole JSDoc
- Document cache null-set behavior in deleteRoleByName
- Add routing-coupling comment on RESERVED_ROLE_NAMES
- Add test for addRoleMember post-write rollback

* fix: review cleanup — system-role guard, type safety, JSDoc accuracy, tests

- Add system-role guard to addRoleMember: block direct assignment to
  non-ADMIN system roles (403), symmetric with removeRoleMember
- Fix RESERVED_ROLE_NAMES comment: explain semantic URL ambiguity, not
  a routing conflict (Express resolves single vs multi-segment correctly)
- Replace _id: unknown with Types.ObjectId | string per AGENTS.md
- Narrow listRoles data-layer return type to Pick<IRole, 'name' | 'description'>
  to match the actual .select() projection
- Move updateRoleHandler param check inside try/catch for consistency
- Include user IDs in all CRITICAL rollback failure logs for operator recovery
- Clarify deleteRoleByName JSDoc: replace "self-healing" with "idempotent",
  document that recovery requires caller retry
- Add tests: system-role guard, promote non-admin to ADMIN,
  findUserIdsByRole throw prevents migration

* fix: include _id in listRoles return type to match RoleListItem

Pick<IRole, 'name' | 'description'> omits _id, making it incompatible
with the handler dep's RoleListItem which requires _id.

* fix: case-insensitive system role guard, reject null permissions, check updateUser result

- System role name checks now use case-insensitive comparison via
  toUpperCase() — prevents creating 'admin' or 'user' which would
  collide with the legacy roles route that uppercases params
- Reject permissions: null in createRole (typeof null === 'object'
  was bypassing the validation)
- Check updateUser return in addRoleMember — return 404 if the user
  was deleted between the findUser and updateUser calls

* fix: check updateUser return in removeRoleMember for concurrent delete safety

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-27 15:44:47 -04:00
Dustin Healy
2e3d66cfe2
👥 feat: Admin Groups API Endpoints (#12387)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
* feat: add listGroups and deleteGroup methods to userGroup

* feat: add admin groups handler factory and Express routes

* fix: address convention violations in admin groups handlers

* fix: address Copilot review findings in admin groups handlers

- Escape regex in listGroups to prevent injection/ReDoS
- Validate ObjectId format in all handlers accepting id/userId params
- Replace N+1 findUser loop with batched findUsers query
- Remove unused findGroupsByMemberId from dep interface
- Map Mongoose ValidationError to 400 in create/update handlers
- Validate name in updateGroupHandler (reject empty/whitespace)
- Handle null updateGroupById result (race condition)
- Tighten error message matching in add/remove member handlers

* test: add unit tests for admin groups handlers

* fix: address code review findings for admin groups

Atomic delete/update handlers (single DB trip), pass through
idOnTheSource, add removeMemberById for non-ObjectId members,
deduplicate member results, fix error message exposure, add hard
cap/sort to listGroups, replace GroupListFilter with Pick of
GroupFilterOptions, validate memberIds as array, trim name in
update, fix import order, and improve test hygiene with fresh
IDs per test.

* fix: cascade cleanup, pagination, and test coverage for admin groups

Add deleteGrantsForPrincipal to systemGrant data layer and wire cascade
cleanup (Config, AclEntry, SystemGrant) into deleteGroupHandler. Add
limit/offset pagination to getGroupMembers. Guard empty PATCH bodies with
400. Remove dead type guard and unnecessary type cast. Add 11 new tests
covering cascade delete, idempotent member removal, empty update, search
filter, 500 error paths, and pagination.

* fix: harden admin groups with cascade resilience, type safety, and fallback removal

Wrap cascade cleanup in inner try/catch so partial failure logs but still
returns 200 (group is already deleted). Replace Record<string, unknown> on
deleteAclEntries with proper typed filter. Log warning for unmapped user
ObjectIds in createGroup memberIds. Add removeMemberById fallback when
removeUserFromGroup throws User not found for ObjectId-format userId.
Extract VALID_GROUP_SOURCES constant. Add 3 new tests (60 total).

* refactor: add countGroups, pagination, and projection type to data layer

Extract buildGroupQuery helper, add countGroups method, support
limit/offset/skip in listGroups, standardize session handling to
.session(session ?? null), and tighten projection parameter from
Record<string, unknown> to Record<string, 0 | 1>.

* fix: cascade resilience, pagination, validation, and error clarity for admin groups

- Use Promise.allSettled for cascade cleanup so all steps run even if
  one fails; log individual rejections
- Echo deleted group id in delete response
- Add countGroups dep and wire limit/offset pagination for listGroups
- Deduplicate memberIds before computing total in getGroupMembers
- Use { memberIds: 1 } projection in getGroupMembers
- Cap memberIds at 500 entries in createGroup
- Reject search queries exceeding 200 characters
- Clarify addGroupMember error for non-ObjectId userId
- Document deleted-user fallback limitation in removeGroupMember

* test: extend handler and DB-layer test coverage for admin groups

Handler tests: projection assertion, dedup total, memberIds cap,
search max length, non-ObjectId memberIds passthrough, cascade partial
failure resilience, dedup scenarios, echo id in delete response.

DB-layer tests: listGroups sort/filter/pagination, countGroups,
deleteGroup, removeMemberById, deleteGrantsForPrincipal.

* fix: cast group principalId to ObjectId for ACL entry cleanup

deleteAclEntries is a thin deleteMany wrapper with no type casting,
but grantPermission stores group principalId as ObjectId. Passing the
raw string from req.params would leave orphaned ACL entries on group
deletion.

* refactor: remove redundant pagination clamping from DB listGroups

Handler already clamps limit/offset at the API boundary. The DB
method is a general-purpose building block and should not re-validate.

* fix: add source and name validation, import order, and test coverage for admin groups

- Validate source against VALID_GROUP_SOURCES in createGroupHandler
- Cap name at 500 characters in both create and update handlers
- Document total as upper bound in getGroupMembers response
- Document ObjectId requirement for deleteAclEntries in cascade
- Fix import ordering in test file (local value after type imports)
- Add tests for updateGroup with description, email, avatar fields
- Add tests for invalid source and name max-length in both handlers

* fix: add field length caps, flatten nested try/catch, and fix logger level in admin groups

Add max-length validation for description, email, avatar, and
idOnTheSource in create/update handlers. Extract removeObjectIdMember
helper to flatten nested try/catch per never-nesting convention. Downgrade
unmapped-memberIds log from error to warn. Fix type import ordering and
add missing await in removeMemberById for consistency.
2026-03-26 17:36:18 -04:00
Danny Avila
9f6d8c6e93
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation

Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.

- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths

* feat: register tenant middleware and wrap startup/auth in runAsSystem()

- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
  before user context is established (LDAP, SAML, OpenID, social login, AuthService)

* feat: thread tenantId through all getAppConfig callers

Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.

Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.

Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint

* feat: add config cache invalidation on admin mutations

- Add clearOverrideCache(tenantId?) to flush per-principal override caches
  by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
  caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
  (upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)

* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering

The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.

* fix: replace runAsSystem with baseOnly for pre-tenant code paths

App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.

All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.

* fix: address PR review findings — middleware ordering, types, cache safety

- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
  instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)

* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly

- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
  initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
  queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)

* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests

- requireJwtAuth: 5 tests verifying ALS tenant context is set after
  passport auth, isolated between concurrent requests, and not set
  when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
  tenantId is threaded through, partial failure is handled gracefully,
  and operations run in parallel via Promise.all (Finding 11)

* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping

- Forward passport errors in requireJwtAuth before entering tenant
  middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
  are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
  so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
  base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing

* fix: address second review — cache safety, code quality, test reliability

- Decouple cache invalidation from mutation response: fire-and-forget
  with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
  side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
  env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock

* fix: move JSDoc to correct function after clearEndpointConfigCache extraction

* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry

Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.

* fix: remove return from tenantStorage.run to satisfy void middleware signature

* fix: address second review — cache safety, code quality, test reliability

- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
  so partial failures are logged individually instead of producing one
  undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
  flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
  if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
  clearAppConfigCache rejects (not just the swallowed endpoint error)

* chore: reorder imports in index.js and app.js for consistency

- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
2026-03-26 17:35:00 -04:00
Danny Avila
083042e56c
🪝 fix: Safe Hook Fallbacks for Tool-Call Components in Search Route (#12423)
* fix: add useOptionalMessagesOperations hook for context-safe message operations

Add a variant of useMessagesOperations that returns no-op functions
when MessagesViewProvider is absent instead of throwing, enabling
shared components to render safely outside the chat route.

* fix: use optional message operations in ToolCallInfo and UIResourceCarousel

Switch ToolCallInfo and UIResourceCarousel from useMessagesOperations
to useOptionalMessagesOperations so they no longer crash when rendered
in the /search route, which lacks MessagesViewProvider.

* fix: update test mocks to use useOptionalMessagesOperations

* fix: consolidate noops and narrow useMemo dependency in useOptionalMessagesOperations

- Replace three noop variants (noopAsync, noopReturn, noop) with a single
  `const noop = () => undefined` that correctly returns void/undefined
- Destructure individual fields from context before the useMemo so the
  dependency array tracks stable operation references, not the full
  context object (avoiding unnecessary re-renders on unrelated state changes)
- Add useOptionalMessagesConversation for components that need conversation
  data outside MessagesViewProvider

* fix: use optional hooks in MCPUIResource components to prevent search crash

MCPUIResource and MCPUIResourceCarousel render inside Markdown prose and
can appear in the /search route. Switch them from the strict
useMessagesOperations/useMessagesConversation hooks to the optional
variants that return safe defaults when MessagesViewProvider is absent.

* test: update test mocks for optional hook renames

* fix: update ToolCallInfo and UIResourceCarousel test mocks to useOptionalMessagesOperations

* fix: use optional message operations in useConversationUIResources

useConversationUIResources internally called the strict
useMessagesOperations(), which still threw when MCPUIResource rendered
outside MessagesViewProvider. Switch to useOptionalMessagesOperations
so the entire MCPUIResource render chain is safe in the /search route.

* style: fix import order per project conventions

* fix: replace as-unknown-as casts with typed NOOP_OPS stubs

- Define OptionalMessagesOps type and NOOP_OPS constant with properly
  typed no-op functions, eliminating all `as unknown as T` casts
- Use conversationId directly from useOptionalMessagesConversation
  instead of re-deriving it from conversation object
- Update JSDoc to reflect search route support

* test: add no-provider regression tests for optional message hooks

Verify useOptionalMessagesOperations and useOptionalMessagesConversation
return safe defaults when rendered outside MessagesViewProvider, covering
the core crash path this PR fixes.
2026-03-26 16:40:37 -04:00
Danny Avila
5e3b7bcde3
🌊 refactor: Local Snapshot for Aggregate Key Cache to Avoid Redundant Redis GETs (#12422)
* perf: Add local snapshot to aggregate key cache to avoid redundant Redis GETs

getAll() was being called 20+ times per chat request (once per tool,
per server config lookup, per connection check). Each call hit Redis
even though the data doesn't change within a request cycle.

Add an in-memory snapshot with 5s TTL that collapses all reads within
the window into a single Redis GET. Writes (add/update/remove/reset)
invalidate the snapshot immediately so mutations are never stale.

Also removes the debug logger that was producing noisy per-call logs.

* fix: Prevent snapshot mutation and guarantee cleanup on write failure

- Never mutate the snapshot object in-place during writes. Build a new
  object (spread) so concurrent readers never observe uncommitted state.
- Move invalidateLocalSnapshot() into withWriteLock's finally block so
  cleanup is guaranteed even when successCheck throws on Redis failure.
- After successful writes, populate the snapshot with the committed state
  to avoid an unnecessary Redis GET on the next read.
- Use Date.now() after the await in getAll() so the TTL window isn't
  shortened by Redis latency.
- Strengthen tests: spy on underlying Keyv cache to verify N getAll()
  calls collapse into 1 Redis GET, verify snapshot reference immutability.

* fix: Remove dead populateLocalSnapshot calls from write callbacks

populateLocalSnapshot was called inside withWriteLock callbacks, but
the finally block in withWriteLock always calls invalidateLocalSnapshot
immediately after — undoing the populate on every execution path.

Remove the dead method and its three call sites. The snapshot is
correctly cleared by finally on both success and failure paths. The
next getAll() after a write hits Redis once to fetch the committed
state, which is acceptable since writes only occur during init and
rare manual reinspection.

* fix: Derive local snapshot TTL from MCP_REGISTRY_CACHE_TTL config

Use cacheConfig.MCP_REGISTRY_CACHE_TTL (default 5000ms) instead of a
hardcoded 5s constant. When TTL is 0 (operator explicitly wants no
caching), the snapshot is disabled entirely — every getAll() hits Redis.

* fix: Add TTL expiry test, document 2×TTL staleness, clarify comments

- Add missing test for snapshot TTL expiry path (force-expire via
  localSnapshotExpiry mutation, verify Redis is hit again)
- Document 2×TTL max cross-instance staleness in localSnapshot JSDoc
- Document reset() intentionally bypasses withWriteLock
- Add inline comments explaining why early invalidateLocalSnapshot()
  in write callbacks is distinct from the finally-block cleanup
- Update cacheConfig.MCP_REGISTRY_CACHE_TTL JSDoc to reflect both
  use sites and the staleness implication
- Rename misleading test name for snapshot reference immutability
- Add epoch sentinel comment on localSnapshotExpiry initialization
2026-03-26 16:39:09 -04:00
Danny Avila
8e2721011e
🔑 fix: Robust MCP OAuth Detection in Tool-Call Flow (#12418)
* fix(api): add buildOAuthToolCallName utility for MCP OAuth flows

Extract a shared utility that builds the synthetic tool-call name
used during MCP OAuth flows (oauth_mcp_{normalizedServerName}).

Uses startsWith on the raw serverName (not the normalized form) to
guard against double-wrapping, so names that merely normalize to
start with oauth_mcp_ (e.g., oauth@mcp@server) are correctly
prefixed while genuinely pre-wrapped names are left as-is.

Add 8 unit tests covering normal names, pre-wrapped names, _mcp_
substrings, special characters, non-ASCII, and empty string inputs.

* fix(backend): use buildOAuthToolCallName in MCP OAuth flows

Replace inline tool-call name construction in both reconnectServer
(MCP.js) and createOAuthEmitter (ToolService.js) with the shared
buildOAuthToolCallName utility. Remove unused normalizeServerName
import from ToolService.js. Fix import ordering in both files.

This ensures the oauth_mcp_ prefix is consistently applied so the
client correctly identifies MCP OAuth flows and binds the CSRF
cookie to the right server.

* fix(client): robust MCP OAuth detection and split handling in ToolCall

- Fix split() destructuring to preserve tail segments for server names
  containing _mcp_ (e.g., foo_mcp_bar no longer truncated to foo).
- Add auth URL redirect_uri fallback: when the tool-call name lacks
  the _mcp_ delimiter, parse redirect_uri for the MCP callback path.
  Set function_name to the extracted server name so progress text
  shows the server, not the raw tool-call ID.
- Display server name instead of literal "oauth" as function_name,
  gated on auth presence to avoid misidentifying real tools named
  "oauth".
- Consolidate three independent new URL(auth) parses into a single
  parsedAuthUrl useMemo shared across detection, actionId, and
  authDomain hooks.
- Replace any type on ProgressText test mock with structural type.
- Add 8 tests covering delimiter detection, multi-segment names,
  function_name display, redirect_uri fallback, normalized _mcp_
  server names, and non-MCP action auth exclusion.

* chore: fix import order in utils.test.ts

* fix(client): drop auth gate on OAuth displayName so completed flows show server name

The createOAuthEnd handler re-emits the toolCall delta without auth,
so auth is cleared on the client after OAuth completes. Gating
displayName on `func === 'oauth' && auth` caused completed OAuth
steps to render "Completed oauth" instead of "Completed my-server".

Remove the `&& auth` gate — within the MCP delimiter branch the
func="oauth" check alone is sufficient. Also remove `auth` from the
useMemo dep array since only `parsedAuthUrl` is referenced. Update
the test to assert correct post-completion display.
2026-03-26 14:45:13 -04:00
Danny Avila
359cc63b41
refactor: Use in-memory cache for App MCP configs to avoid Redis SCAN (#12410)
*  perf: Use in-memory cache for App MCP configs to avoid Redis SCAN

The 'App' namespace holds static YAML-loaded configs identical on every
instance. Storing them in Redis and retrieving via SCAN + batch-GET
caused 60s+ stalls under concurrent load (#11624). Since these configs
are already loaded into memory at startup, bypass Redis entirely by
always returning ServerConfigsCacheInMemory for the 'App' namespace.

* ♻️ refactor: Extract APP_CACHE_NAMESPACE constant and harden tests

- Extract magic string 'App' to a shared `APP_CACHE_NAMESPACE` constant
  used by both ServerConfigsCacheFactory and MCPServersRegistry
- Document that `leaderOnly` is ignored for the App namespace
- Reset `cacheConfig.USE_REDIS` in test `beforeEach` to prevent
  ordering-dependent flakiness
- Fix import order in test file (longest to shortest)

* 🐛 fix: Populate App cache on follower instances in cluster mode

In cluster deployments, only the leader runs MCPServersInitializer to
inspect and cache MCP server configs. Followers previously read these
from Redis, but with the App namespace now using in-memory storage,
followers would have an empty cache.

Add populateLocalCache() so follower processes independently initialize
their own in-memory App cache from the same YAML configs after the
leader signals completion. The method is idempotent — if the cache is
already populated (leader case), it's a no-op.

* 🐛 fix: Use static flag for populateLocalCache idempotency

Replace getAllServerConfigs() idempotency check with a static
localCachePopulated flag. The previous check merged App + DB caches,
causing false early returns in deployments with publicly shared DB
configs, and poisoned the TTL read-through cache with stale results.

The static flag is zero-cost (no async/Redis/DB calls), immune to
DB config interference, and is reset alongside hasInitializedThisProcess
in resetProcessFlag() for test teardown.

Also set localCachePopulated=true after leader initialization completes,
so subsequent calls on the leader don't redundantly re-run populateLocalCache.

* 📝 docs: Document process-local reset() semantics for App cache

With the App namespace using in-memory storage, reset() only clears the
calling process's cache. Add JSDoc noting this behavioral change so
callers in cluster deployments know each instance must reset independently.

*  test: Add follower cache population tests for MCPServersInitializer

Cover the populateLocalCache code path:
- Follower populates its own App cache after leader signals completion
- localCachePopulated flag prevents redundant re-initialization
- Fresh follower process independently initializes all servers

* 🧹 style: Fix import order to longest-to-shortest convention

* 🔬 test: Add Redis perf benchmark to isolate getAll() bottleneck

Benchmarks that run against a live Redis instance to measure:
1. SCAN vs batched GET phases independently
2. SCAN cost scaling with total keyspace size (noise keys)
3. Concurrent getAll() at various concurrency levels (1/10/50/100)
4. Alternative: single aggregate key vs SCAN+GET
5. Alternative: raw MGET vs Keyv batch GET (serialization overhead)

Run with: npx jest --config packages/api/jest.config.mjs \
  --testPathPatterns="perf_benchmark" --coverage=false

*  feat: Add aggregate-key Redis cache for MCP App configs

ServerConfigsCacheRedisAggregateKey stores all configs under a single
Redis key, making getAll() a single GET instead of SCAN + N GETs.

This eliminates the O(keyspace_size) SCAN that caused 60s+ stalls in
large deployments while preserving cross-instance visibility — all
instances read/write the same Redis key, so reinspection results
propagate automatically after readThroughCache TTL expiry.

* ♻️ refactor: Use aggregate-key cache for App namespace in factory

Update ServerConfigsCacheFactory to return ServerConfigsCacheRedisAggregateKey
for the App namespace when Redis is enabled, instead of ServerConfigsCacheInMemory.

This preserves cross-instance visibility (reinspection results propagate
through Redis) while eliminating SCAN. Non-App namespaces still use the
standard per-key ServerConfigsCacheRedis.

* 🗑️ revert: Remove populateLocalCache — no longer needed with aggregate key

With App configs stored under a single Redis key (aggregate approach),
followers read from Redis like before. The populateLocalCache mechanism
and its localCachePopulated flag are no longer necessary.

Also reverts the process-local reset() JSDoc since reset() is now
cluster-wide again via Redis.

* 🐛 fix: Add write mutex to aggregate cache and exclude perf benchmark from CI

- Add promise-based write lock to ServerConfigsCacheRedisAggregateKey to
  prevent concurrent read-modify-write races during parallel initialization
  (Promise.allSettled runs multiple addServer calls concurrently, causing
  last-write-wins data loss on the aggregate key)
- Rename perf benchmark to cache_integration pattern so CI skips it
  (requires live Redis)

* 🔧 fix: Rename perf benchmark to *.manual.spec.ts to exclude from all CI

The cache_integration pattern is picked up by test:cache-integration:mcp
in CI. Rename to *.manual.spec.ts which isn't matched by any CI runner.

*  test: Add cache integration tests for ServerConfigsCacheRedisAggregateKey

Tests against a live Redis instance covering:
- CRUD operations (add, get, update, remove)
- getAll with empty/populated cache
- Duplicate add rejection, missing update/remove errors
- Concurrent write safety (20 parallel adds without data loss)
- Concurrent read safety (50 parallel getAll calls)
- Reset clears all configs

* 🔧 fix: Rename perf benchmark to *.manual.spec.ts to exclude from all CI

The perf benchmark file was renamed to *.manual.spec.ts but no
testPathIgnorePatterns existed for that convention. Add .*manual\.spec\.
to both test and test:ci scripts, plus jest.config.mjs, so manual-only
tests never run in CI unit test jobs.

* fix: Address review findings for aggregate key cache

- Add successCheck() to all write paths (add/update/remove) so Redis
  SET failures throw instead of being silently swallowed
- Override reset() to use targeted cache.delete(AGGREGATE_KEY) instead
  of inherited SCAN-based cache.clear() — consistent with eliminating
  SCAN operations
- Document cross-instance write race invariant in class JSDoc: the
  promise-based writeLock is process-local only; callers must enforce
  single-writer semantics externally (leader-only init)
- Use definite-assignment assertion (let resolve!:) instead of non-null
  assertion at call site
- Fix import type convention in integration test
- Verify Promise.allSettled rejections explicitly in concurrent write test
- Fix broken run command in benchmark file header

* style: Fix import ordering per AGENTS.md convention

Local/project imports sorted longest to shortest.

* chore: Update import ordering and clean up unused imports in MCPServersRegistry.ts

* chore: import order

* chore: import order
2026-03-26 14:44:31 -04:00
Marco Beretta
1123f96e6a
📝 docs: add UTM tracking parameters to Railway deployment links (#12228) 2026-03-26 13:43:33 -04:00
Danny Avila
df82f2e9b2
🏁 fix: Invalidate Message Cache on Stream 404 Instead of Showing Error (#12411)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
* fix: Invalidate message cache on STREAM_EXPIRED instead of showing error

When a 404 (stream expired) is received during SSE resume, the generation
has already completed and messages are persisted in the database. Instead
of injecting an error message into the cache, invalidate the messages query
so react-query refetches from the DB. Also clear stale stream status cache
and step maps to prevent retries and memory leaks.

* fix: Mark conversation as processed when no active job found

Prevents useResumeOnLoad from repeatedly re-checking the same
conversation when the stream status returns inactive. The ref
still resets on conversation change, so navigating away and back
will correctly re-check.

Also wait for background refetches to settle (isFetching) before
acting on inactive status, preventing stale cached active:false
from suppressing a valid resume.

* test: Update useResumableSSE spec for cache invalidation on 404

Verify message cache invalidation, stream status removal,
clearStepMaps, and setIsSubmitting(false) on the 404 path.

* fix: Resolve lint warnings from CI

Remove unused ErrorTypes import in test, add queryClient to
useCallback dependency array in useResumableSSE.

* Reorder import statements in useResumableSSE.ts
2026-03-26 12:27:31 -04:00
Danny Avila
4b6d68b3b5
🎛️ feat: DB-Backed Per-Principal Config System (#12354)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
*  feat: Add Config schema, model, and methods for role-based DB config overrides

Add the database foundation for principal-based configuration overrides
(user, group, role) in data-schemas. Includes schema with tenantId and
tenant isolation, CRUD methods, and barrel exports.

* 🔧 fix: Add shebang and enforce LF line endings for git hooks

The pre-commit hook was missing #!/bin/sh, and core.autocrlf=true was
converting it to CRLF, both causing "Exec format error" on Windows.
Add .gitattributes to force LF for .husky/* and *.sh files.

*  feat: Add admin config API routes with section-level capability checks

Add /api/admin/config endpoints for managing per-principal config
overrides (user, group, role). Handlers in @librechat/api use DI pattern
with section-level hasConfigCapability checks for granular access control.

Supports full overrides replacement, per-field PATCH via dot-paths, field
deletion, toggle active, and listing.

* 🐛 fix: Move deleteConfigField fieldPath from URL param to request body

The path-to-regexp wildcard syntax (:fieldPath(*)) is not supported by
the version used in Express. Send fieldPath in the DELETE request body
instead, which also avoids URL-encoding issues with dotted paths.

*  feat: Wire config resolution into getAppConfig with override caching

Add mergeConfigOverrides utility in data-schemas for deep-merging DB
config overrides into base AppConfig by priority order.

Update getAppConfig to query DB for applicable configs when role/userId
is provided, with short-TTL caching and a hasAnyConfigs feature flag
for zero-cost when no DB configs exist.

Also: add unique compound index on Config schema, pass userId from
config middleware, and signal config changes from admin API handlers.

* 🔄 refactor: Extract getAppConfig logic into packages/api as TS service

Move override resolution, caching strategy, and signalConfigChange from
api/server/services/Config/app.js into packages/api/src/app/appConfigService.ts
using the DI factory pattern (createAppConfigService). The JS file becomes
a thin wiring layer injecting loadBaseConfig, cache, and DB dependencies.

* 🧹 chore: Rename configResolution.ts to resolution.ts

*  feat: Move admin types & capabilities to librechat-data-provider

Move SystemCapabilities, CapabilityImplications, and utility functions
(hasImpliedCapability, expandImplications) from data-schemas to
data-provider so they are available to external consumers like the
admin panel without a data-schemas dependency.

Add API-friendly admin types: TAdminConfig, TAdminSystemGrant,
TAdminAuditLogEntry, TAdminGroup, TAdminMember, TAdminUserSearchResult,
TCapabilityCategory, and CAPABILITY_CATEGORIES.

data-schemas re-exports these from data-provider and extends with
config-schema-derived types (ConfigSection, SystemCapability union).

Bump version to 0.8.500.

* feat: Add JSON-serializable admin config API response types to data-schemas

Add AdminConfig, AdminConfigListResponse, AdminConfigResponse, and
AdminConfigDeleteResponse types so both LibreChat API handlers and the
admin panel can share the same response contract. Bump version to 0.0.41.

* refactor: Move admin capabilities & types from data-provider to data-schemas

SystemCapabilities, CapabilityImplications, utility functions,
CAPABILITY_CATEGORIES, and admin API response types should not be in
data-provider as it gets compiled into the frontend bundle, exposing
the capability surface. Moved everything to data-schemas (server-only).

All consumers already import from @librechat/data-schemas, so no
import changes needed elsewhere. Consolidated duplicate AdminConfig
type (was in both config.ts and admin.ts).

* chore: Bump @librechat/data-schemas to 0.0.42

* refactor: Reorganize admin capabilities into admin/ and types/admin.ts

Split systemCapabilities.ts following data-schemas conventions:
- Types (BaseSystemCapability, SystemCapability, AdminConfig, etc.)
  → src/types/admin.ts
- Runtime code (SystemCapabilities, CapabilityImplications, utilities)
  → src/admin/capabilities.ts

Revert data-provider version to 0.8.401 (no longer modified).

* chore: Fix import ordering, rename appConfigService to service

- Rename app/appConfigService.ts → app/service.ts (directory provides context)
- Fix import order in admin/config.ts, types/admin.ts, types/config.ts
- Add naming convention to AGENTS.md

* feat: Add DB base config support (role/__base__)

- Add BASE_CONFIG_PRINCIPAL_ID constant for reserved base config doc
- getApplicableConfigs always includes __base__ in queries
- getAppConfig queries DB even without role/userId when DB configs exist
- Bump @librechat/data-schemas to 0.0.43

* fix: Address PR review issues for admin config

- Add listAllConfigs method; listConfigs endpoint returns all active
  configs instead of only __base__
- Normalize principalId to string in all config methods to prevent
  ObjectId vs string mismatch on user/group lookups
- Block __proto__ and all dunder-prefixed segments in field path
  validation to prevent prototype pollution
- Fix configVersion off-by-one: default to 0, guard pre('save') with
  !isNew, use $inc on findOneAndUpdate
- Remove unused getApplicableConfigs from admin handler deps

* fix: Enable tree-shaking for data-schemas, bump packages

- Switch data-schemas Rollup output to preserveModules so each source
  file becomes its own chunk; consumers (admin panel) can now import
  just the modules they need without pulling in winston/mongoose/etc.
- Add sideEffects: false to data-schemas package.json
- Bump data-schemas to 0.0.44, data-provider to 0.8.402

* feat: add capabilities subpath export to data-schemas

Adds `@librechat/data-schemas/capabilities` subpath export so browser
consumers can import BASE_CONFIG_PRINCIPAL_ID and capability constants
without pulling in Node.js-only modules (winston, async_hooks, etc.).

Bump version to 0.0.45.

* fix: include dist/ in data-provider npm package

Add explicit files field so npm includes dist/types/ in the published
package. Without this, the root .gitignore exclusion of dist/ causes
npm to omit type declarations, breaking TypeScript consumers.

* chore: bump librechat-data-provider to 0.8.403

* feat: add GET /api/admin/config/base for raw AppConfig

Returns the full AppConfig (YAML + DB base merged) so the admin panel
can display actual config field values and structure. The startup config
endpoint (/api/config) returns TStartupConfig which is a different shape
meant for the frontend app.

* chore: imports order

* fix: address code review findings for admin config

Critical:
- Fix clearAppConfigCache: was deleting from wrong cache store (CONFIG_STORE
  instead of APP_CONFIG), now clears BASE and HAS_DB_CONFIGS keys
- Eliminate race condition: patchConfigField and deleteConfigField now use
  atomic MongoDB $set/$unset with dot-path notation instead of
  read-modify-write cycles, removing the lost-update bug entirely
- Add patchConfigFields and unsetConfigField atomic DB methods

Major:
- Reorder cache check before principal resolution in getAppConfig so
  getUserPrincipals DB query only fires on cache miss
- Replace '' as ConfigSection with typed BROAD_CONFIG_ACCESS constant
- Parallelize capability checks with Promise.all instead of sequential
  awaits in for loops
- Use loose equality (== null) for cache miss check to handle both null
  and undefined returns from cache implementations
- Set HAS_DB_CONFIGS_KEY to true on successful config fetch

Minor:
- Remove dead pre('save') hook from config schema (all writes use
  findOneAndUpdate which bypasses document hooks)
- Consolidate duplicate type imports in resolution.ts
- Remove dead deepGet/deepSet/deepUnset functions (replaced by atomic ops)
- Add .sort({ priority: 1 }) to getApplicableConfigs query
- Rename _impliedBy to impliedByMap

* fix: self-referencing BROAD_CONFIG_ACCESS constant

* fix: replace type-cast sentinel with proper null parameter

Update hasConfigCapability to accept ConfigSection | null where null
means broad access check (MANAGE_CONFIGS or READ_CONFIGS only).
Removes the '' as ConfigSection type lie from admin config handlers.

* fix: remaining review findings + add tests

- listAllConfigs accepts optional { isActive } filter so admin listing
  can show inactive configs (#9)
- Standardize session application to .session(session ?? null) across
  all config DB methods (#15)
- Export isValidFieldPath and getTopLevelSection for testability
- Add 38 tests across 3 spec files:
  - config.spec.ts (api): path validation, prototype pollution rejection
  - resolution.spec.ts: deep merge, priority ordering, array replacement
  - config.spec.ts (data-schemas): full CRUD, ObjectId normalization,
    atomic $set/$unset, configVersion increment, toggle, __base__ query

* fix: address second code review findings

- Fix cross-user cache contamination: overrideCacheKey now handles
  userId-without-role case with its own cache key (#1)
- Add broad capability check before DB lookup in getConfig to prevent
  config existence enumeration (#2/#3)
- Move deleteConfigField fieldPath from request body to query parameter
  for proxy/load balancer compatibility (#5)
- Derive BaseSystemCapability from SystemCapabilities const instead of
  manual string union (#6)
- Return 201 on upsert creation, 200 on update (#11)
- Remove inline narration comments per AGENTS.md (#12)
- Type overrides as Partial<TCustomConfig> in DB methods and handler
  deps (#13)
- Replace double as-unknown-as casts in resolution.ts with generic
  deepMerge<T> (#14)
- Make override cache TTL injectable via AppConfigServiceDeps (#16)
- Add exhaustive never check in principalModel switch (#17)

* fix: remaining review findings — tests, rename, semantics

- Rename signalConfigChange → markConfigsDirty with JSDoc documenting
  the stale-window tradeoff and overrideCacheTtl knob
- Fix DEFAULT_OVERRIDE_CACHE_TTL naming convention
- Add createAppConfigService tests (14 cases): cache behavior, feature
  flag, cross-user key isolation, fallback on error, markConfigsDirty
- Add admin handler integration tests (13 cases): auth ordering,
  201/200 on create/update, fieldPath from query param, markConfigsDirty
  calls, capability checks

* fix: global flag corruption + empty overrides auth bypass

- Remove HAS_DB_CONFIGS_KEY=false optimization: a scoped query returning
  no configs does not mean no configs exist globally. Setting the flag
  false from a per-principal query short-circuited all subsequent users.
- Add broad manage capability check before section checks in
  upsertConfigOverrides: empty overrides {} no longer bypasses auth.

* test: add regression and invariant tests for config system

Regression tests:
- Bug 1: User A's empty result does not short-circuit User B's overrides
- Bug 2: Empty overrides {} returns 403 without MANAGE_CONFIGS

Invariant tests (applied across ALL handlers):
- All 5 mutation handlers call markConfigsDirty on success
- All 5 mutation handlers return 401 without auth
- All 5 mutation handlers return 403 without capability
- All 3 read handlers return 403 without capability

* fix: third review pass — all findings addressed

Service (service.ts):
- Restore HAS_DB_CONFIGS=false for base-only queries (no role/userId)
  so deployments with zero DB configs skip DB queries (#1)
- Resolve cache once at factory init instead of per-invocation (#8)
- Use BASE_CONFIG_PRINCIPAL_ID constant in overrideCacheKey (#10)
- Add JSDoc to clearAppConfigCache documenting stale-window (#4)
- Fix log message to not say "from YAML" (#14)

Admin handlers (config.ts):
- Use configVersion===1 for 201 vs 200, eliminating TOCTOU race (#2)
- Add Array.isArray guard on overrides body (#5)
- Import CapabilityUser from capabilities.ts, remove duplicate (#6)
- Replace as-unknown-as cast with targeted type assertion (#7)
- Add MAX_PATCH_ENTRIES=100 cap on entries array (#15)
- Reorder deleteConfigField to validate principalType first (#12)
- Export CapabilityUser from middleware/capabilities.ts

DB methods (config.ts):
- Remove isActive:true from patchConfigFields to prevent silent
  reactivation of disabled configs (#3)

Schema (config.ts):
- Change principalId from Schema.Types.Mixed to String (#11)

Tests:
- Add patchConfigField unsafe fieldPath rejection test (#9)
- Add base-only HAS_DB_CONFIGS=false test (#1)
- Update 201/200 tests to use configVersion instead of findConfig (#2)

* fix: add read handler 401 invariant tests + document flag behavior

- Add invariant: all 3 read handlers return 401 without auth
- Document on markConfigsDirty that HAS_DB_CONFIGS stays true after
  all configs are deleted until clearAppConfigCache or restart

* fix: remove HAS_DB_CONFIGS false optimization entirely

getApplicableConfigs([]) only queries for __base__, not all configs.
A deployment with role/group configs but no __base__ doc gets the
flag poisoned to false by a base-only query, silently ignoring all
scoped overrides. The optimization is not safe without a comprehensive
Config.exists() check, which adds its own DB cost. Removed entirely.

The flag is now write-once-true (set when configs are found or by
markConfigsDirty) and only cleared by clearAppConfigCache/restart.

* chore: reorder import statements in app.js for clarity

* refactor: remove HAS_DB_CONFIGS_KEY machinery entirely

The three-state flag (false/null/true) was the source of multiple bugs
across review rounds. Every attempt to safely set it to false was
defeated by getApplicableConfigs querying only a subset of principals.

Removed: HAS_DB_CONFIGS_KEY constant, all reads/writes of the flag,
markConfigsDirty (now a no-op concept), notifyChange wrapper, and all
tests that seeded false manually.

The per-user/role TTL cache (overrideCacheTtl, default 60s) is the
sole caching mechanism. On cache miss, getApplicableConfigs queries
the DB. This is one indexed query per user per TTL window — acceptable
for the config override use case.

* docs: rewrite admin panel remaining work with current state

* perf: cache empty override results to avoid repeated DB queries

When getApplicableConfigs returns no configs for a principal, cache
baseConfig under their override key with TTL. Without this, every
user with no per-principal overrides hits MongoDB on every request
after the 60s cache window expires.

* fix: add tenantId to cache keys + reject PUBLIC principal type

- Include tenantId in override cache keys to prevent cross-tenant
  config contamination. Single-tenant deployments (tenantId undefined)
  use '_' as placeholder — no behavior change for them.
- Reject PrincipalType.PUBLIC in admin config validation — PUBLIC has
  no PrincipalModel and is never resolved by getApplicableConfigs,
  so config docs for it would be dead data.
- Config middleware passes req.user.tenantId to getAppConfig.

* fix: fourth review pass findings

DB methods (config.ts):
- findConfigByPrincipal accepts { includeInactive } option so admin
  GET can retrieve inactive configs (#5)
- upsertConfig catches E11000 duplicate key on concurrent upserts and
  retries without upsert flag (#2)
- unsetConfigField no longer filters isActive:true, consistent with
  patchConfigFields (#11)
- Typed filter objects replace Record<string, unknown> (#12)

Admin handlers (config.ts):
- patchConfigField: serial broad capability check before Promise.all
  to pre-warm ALS principal cache, preventing N parallel DB calls (#3)
- isValidFieldPath rejects leading/trailing dots and consecutive
  dots (#7)
- Duplicate fieldPaths in patch entries return 400 (#8)
- DEFAULT_PRIORITY named constant replaces hardcoded 10 (#14)
- Admin getConfig and patchConfigField pass includeInactive to
  findConfigByPrincipal (#5)
- Route import uses barrel instead of direct file path (#13)

Resolution (resolution.ts):
- deepMerge has MAX_MERGE_DEPTH=10 guard to prevent stack overflow
  from crafted deeply nested configs (#4)

* fix: final review cleanup

- Remove ADMIN_PANEL_REMAINING.md (local dev notes with Windows paths)
- Add empty-result caching regression test
- Add tenantId to AdminConfigDeps.getAppConfig type
- Restore exhaustive never check in principalModel switch
- Standardize toggleConfigActive session handling to options pattern

* fix: validate priority in patchConfigField handler

Add the same non-negative number validation for priority that
upsertConfigOverrides already has. Without this, invalid priority
values could be stored via PATCH and corrupt merge ordering.

* chore: remove planning doc from PR

* fix: correct stale cache key strings in service tests

* fix: clean up service tests and harden tenant sentinel

- Remove no-op cache delete lines from regression tests
- Change no-tenant sentinel from '_' to '__default__' to avoid
  collision with a real tenant ID when multi-tenancy is enabled
- Remove unused CONFIG_STORE from AppConfigServiceDeps

* chore: bump @librechat/data-schemas to 0.0.46

* fix: block prototype-poisoning keys in deepMerge

Skip __proto__, constructor, and prototype keys during config merge
to prevent prototype pollution via PUT /api/admin/config overrides.
2026-03-25 19:39:29 -04:00
Danny Avila
f277b32030
📸 fix: Snapshot Options to Prevent Mid-Await Client Disposal Crash (#12398)
Some checks failed
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Has been cancelled
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Has been cancelled
* 🐛 fix: Prevent crash when token balance exhaustion races with client disposal

Add null guard in saveMessageToDatabase to handle the case where
disposeClient nullifies this.options while a userMessagePromise is
still pending from a prior async save operation.

* 🐛 fix: Snapshot this.options to prevent mid-await disposal crash

The original guard at function entry was insufficient — this.options is
always valid at entry. The crash occurs after the first await
(db.saveMessage) when disposeClient nullifies client.options while the
promise is suspended.

Fix: capture this.options into a local const before any await. The local
reference is immune to client.options = null set by disposeClient.

Also add .catch on userMessagePromise in sendMessage to prevent
unhandled rejections when checkBalance throws before the promise is
awaited, and add two regression tests.
2026-03-25 14:18:32 -04:00
ESJavadex
abaf9b3e13
🗝️ fix: Resolve User-Provided API Key in Agents API Flow (#12390)
* fix: resolve user-provided API key in Agents API flow

When the Agents API calls initializeCustom, req.body follows the
OpenAI-compatible format (model, messages, stream) and does not
include the `key` field that the regular UI chat flow sends.

Previously, getUserKeyValues was only called when expiresAt
(from req.body.key) was truthy, causing the Agents API to always
fail with NO_USER_KEY for custom endpoints using
apiKey: "user_provided".

This fix decouples the key fetch from the expiry check:
- If expiresAt is present (UI flow): checks expiry AND fetches key
- If expiresAt is absent (Agents API): skips expiry check, still
  fetches key

Fixes #12389

* address review feedback from @danny-avila

- Flatten nested if into two sibling statements (never-nesting style)
- Add inline comment explaining why expiresAt may be absent
- Add negative assertion: checkUserKeyExpiry NOT called in Agents API flow
- Add regression test: expired key still throws EXPIRED_USER_KEY
- Add test for userProvidesURL=true variant in Agents API flow
- Remove unnecessary undefined cast in test params

* fix: CI failure + address remaining review items

- Fix mock leak: use mockImplementationOnce instead of mockImplementation
  to prevent checkUserKeyExpiry throwing impl from leaking into SSRF tests
  (clearAllMocks does not reset implementations)
- Use ErrorTypes.EXPIRED_USER_KEY constant instead of raw string
- Add test: system-defined key/URL should NOT call getUserKeyValues
2026-03-25 14:17:11 -04:00
Adi Singhal
6466483ae3
🐛 fix: Resolve MeiliSearch Startup Sync Failure from Model Loading Order (#12397)
* fix: resolve MeiliSearch startup sync failure from model loading order

`indexSync.js` captures `mongoose.models.Message` and
`mongoose.models.Conversation` at module load time (lines 9-10).
Since PR #11830 moved model registration into `createModels()`,
these references are always `undefined` because `require('./indexSync')`
ran before `createModels(mongoose)` in `api/db/index.js`.

Move the `require('./indexSync')` below `createModels(mongoose)` so
Mongoose models are registered before `indexSync.js` captures them.

* fix: defer model lookups in indexSync, add load-order regression test

- Move mongoose.models.Message and mongoose.models.Conversation lookups
  from module-level into performSync() and the indexSync() catch block,
  eliminating the fragile load-order dependency that caused the original
  regression
- Add guard assertion that throws a clear error if models are missing
- Add comment in api/db/index.js documenting the createModels ordering
  constraint
- Add api/db/index.spec.js regression test to prevent future re-introduction
  of the load-order bug

* fix: strengthen regression test to verify call order, add guard in setTimeout path

- Rewrite index.spec.js to track call sequence via callOrder array,
  ensuring createModels executes before indexSync module loads
  (verified: test fails if order is reversed)
- Add missing model guard assertion in the setTimeout fallback path
  in indexSync.js for consistency with performSync

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-25 14:02:44 -04:00
Danny Avila
734239346b
🔒 chore: Bump MongoDB from 8.0.17 to 8.0.20 in Docker Compose Files (#12399)
Addresses vulnerabilities disclosed in CERTFR-2026-AVI-0310:
- Improper object lifecycle management of MD5 hash state in core
  cryptographic operations (Blocker/P1)
- Use-after-free in ExpressionContext during pipeline cloning with
  nested $unionWith stages (Major/P3)
2026-03-25 13:56:43 -04:00
Danny Avila
221e49222d
refactor: Fast-Fail MCP Tool Discovery on 401 for Non-OAuth Servers (#12395)
* fix: fast-fail MCP discovery for non-OAuth servers on auth errors

Always attach oauthHandler in discoverToolsInternal regardless of
useOAuth flag. Previously, non-OAuth servers hitting 401 would hang
for 30s because connectClient's oauthHandledPromise had no listener
to emit oauthFailed, waiting until withTimeout killed it.

* chore: import order
2026-03-25 13:18:02 -04:00
Christopher Bennell
3f805d68a1
📬 docs: Add Forwarded Headers to Nginx SSL Proxy Template (#12379)
Add X-Forwarded-Proto, X-Forwarded-For and Host headers in nginx proxy config. Addresses #12378.
2026-03-25 13:04:19 -04:00
Marco Beretta
0c66823c26
🧩 feat: Redesign Tool Call UI with Contextual Icons, Smart Grouping, and Rich Output Rendering (#12163)
* feat: redesign tool call UI with type-specific icons, smart grouping, and rich output rendering

Replace the generic spinner/checkmark tool call UI with a modern, Cursor-inspired design:

- Add per-tool-type icons (Plug for MCP, Terminal for code, Globe for web search, etc.)
- Group 2+ consecutive tool calls into collapsible "Used N tools" sections
- Stack unique tool icons in grouped headers with overlapping circle design
- Replace raw JSON output with intelligent renderers (table, error, text)
- Restructure ToolCallInfo: output first, parameters collapsible at bottom
- Add shared useExpandCollapse hook for consistent animations
- Add CodeWindowHeader for ExecuteCode windowed view
- Remove FinishedIcon (purple checkmark) entirely

* feat: display custom MCP server icons in tool calls

Add useMCPIconMap hook to resolve MCP server names to their configured
icon paths. ToolIcon and StackedToolIcons now accept custom icon URLs,
showing actual server logos (e.g., Home Assistant, GitHub) instead of
the generic Plug icon for MCP tool calls.

* refactor: unify container styling across code blocks, mermaid, and tool output

Replace hardcoded gray colors with theme tokens throughout:
- CodeBlock: bg-gray-900/700 -> bg-surface-secondary/tertiary + border-border-light
- Mermaid dialog: bg-gray-700 -> bg-surface-secondary, text-gray-200 -> text-text-secondary
- Mermaid containers: rounded-xl -> rounded-lg, remove shadow-md for consistency
- ResultSwitcher: bg-gray-700 -> bg-surface-secondary with border separator
- RunCode: hover:bg-gray-700 -> hover:bg-surface-hover
- ErrorOutput: add border for visual consistency
- MermaidHeader/CodeWindowHeader: consistent focus outlines using border-heavy

* refactor: simplify tool output to plain text, remove custom renderers

Remove over-engineered tool output system (TableOutput, ErrorOutput,
detectOutputType) in favor of simple text extraction. Tool output now
extracts the text content from MCP content blocks and displays it as
clean readable text — no tables, no error styling, no JSON formatting.

Parameters only show key-value badges for simple objects; complex JSON
is hidden instead of dumped raw. Matches Cursor-style simplicity.

* fix: handle error messages and format JSON arrays in tool output

- Strip verbose MCP error prefixes (Error: [MCP][server][tool] tool call
  failed: Error POSTing...) and show just the meaningful error message
- Display errors in red text
- Format uniform JSON arrays as readable lists (name — path) instead
  of raw JSON dumps
- Format plain JSON objects as key: value lines

* feat: improve JSON display in tool call output and parameters

- Replace flat formatObject with recursive formatValue for proper
  indented display of nested JSON structures
- Add ComplexInput component for tool parameters with nested objects,
  arrays, or long strings (previously hidden)
- Broaden hasParams check to show parameters for all object types
- Add font-mono to output renderer for better alignment

* feat: add localization keys for tool errors, web search, and code UI

* refactor: move Mermaid components into dedicated directory module

* refactor: extract CodeBar, FloatingCodeBar, and copy utilities from CodeBlock

* feat: replace manual SVG icons with @icons-pack/react-simple-icons

Supports 50+ programming languages with tree-shaken brand icons
instead of hand-crafted SVGs for 19 languages.

* refactor: simplify code execution UI with persistent code toggle

* refactor: use useExpandCollapse hook in Thinking and Reasoning

* feat: improve tool call error states, subtitles, and group summaries

* feat: redesign web search with inline source display

* feat: improve agent handoff with keyboard accessibility

* feat: reorganize exports order in hooks and utils

* refactor: unify CopyCodeButton with animated icon transitions and iconOnly support

* feat: add run code state machine with animated success/error feedback

* refactor: improve ResultSwitcher with lucide icons and accessibility

* refactor: update CopyButton component

* refactor: replace CopyCodeButton with CopyButton component across multiple files

* test: add ImageGen test stubs

* test: add RetrievalCall test stubs

* feat: merge ImageGen with ToolIcon and localized progress text

* feat: modernize RetrievalCall with ToolIcon and collapsible output

* test: add getToolIconType action delimiter tests

* test: add ImageGen collapsible output tests

* feat: add action ToolIcon type with Zap icon

* fix: replace AgentHandoff div with semantic button

* feat: add aria-live regions to tool components

* feat: redesign execute_code tool UI with syntax highlighting and language icons

- Remove filename labels (script.py, main.rs) and line counter from CodeWindowHeader
- Replace generic FileCode icon with language-specific LangIcon
- Add syntax highlighting via highlight.js to code blocks
- Add SquareTerminal icon to ExecuteCode progress text
- Use shared CopyButton component in CodeWindowHeader
- Remove active:scale-95 press animation from CopyButton and RunCode

* feat: dynamic tool status text sizing based on markdown font-size variable

- Add tool-status-text CSS class using calc(0.9 * --markdown-font-size)
- Update progress-text-wrapper to use dynamic sizing instead of base size
- Apply tool-status-text to WebSearch, ToolCallGroup, AgentHandoff, ImageGen
- Replace hardcoded text-sm/text-xs with dynamic class across all tools
- Animate chevron rotation in ProgressText and ToolCallGroup
- Update subtitle text color from tertiary to secondary

* fix: consistent spacing and text styles across all tool components

- Standardize tool status row spacing to my-1/my-1.5 across all components
- Update ToolCallInfo text from tertiary to secondary, add vertical padding
- Animate ToolCallInfo parameters chevron rotation
- Update OutputRenderer link colors from tertiary to secondary

* feat: unify tool call grouping for all tool types

All consecutive tool calls (MCP, execute_code, web_search, image_gen,
file_search, code_interpreter) are now grouped under a single
collapsible "Used N tools" header instead of only grouping generic
tool calls.

- Remove SPECIAL_TOOL_NAMES blacklist from groupToolCalls
- Replace getToolCallData with getToolMeta to handle all tool types
- Use renderPart callback in ToolCallGroup for proper component routing
- Add file_search and code_interpreter mappings to getToolIconType

* feat: friendly tool group labels, more icons, and output copy button

- Show friendly names in group summary (Code, Web Search, Image
  Generation) instead of raw tool names
- Display MCP server names instead of individual function names
- Deduplicate labels and show up to 3 with +N overflow
- Increase stacked icons from 3 to 4
- Add icon-only copy button to tool output (OutputRenderer)

* fix: execute_code spacing and syntax-highlighted code visibility

Match ToolCall spacing by using my-1.5 on status line and moving my-2
inside overflow-hidden. Replace broken hljs.highlight() with lowlight
(same engine used by rehype-highlight for markdown code blocks) to
render syntax-highlighted code as React elements. Handle object args
in useParseArgs to support both string and Record arg formats.

* feat: replace showCode with auto-expand tools setting

Replace the execute_code-only "Always show code when using code
interpreter" global toggle with a new "Auto-expand tool details"
setting that controls all tool types. Each tool instance now uses
independent local state initialized from the setting, so expanding
one tool no longer affects others. Applies to ToolCall, ExecuteCode,
ToolCallGroup, and CodeAnalyze components.

* fix: apply auto-expand tools setting to WebSearch and RetrievalCall

* fix: only auto-expand tools when content is available

Defer auto-expansion until tool output or content arrives, preventing
empty bordered containers from showing while tools are still running.
Uses useEffect to expand when output becomes available during streaming.

* feat: redesign file_search tool output, citations, and file preview

- Redesign RetrievalCall with per-file cards using OutputRenderer
  (truncated content with show more/less, copy button) matching MCP
  tool pattern
- Route file_search tool calls from Agents API to RetrievalCall
  instead of generic ToolCall
- Add FilePreviewDialog for viewing files (PDF iframe, text content)
  with download option, opened from clickable filenames
- Redesign file citations: FileText icon in badge, relevance and
  page numbers in hovercard, click opens file preview instead of
  downloading
- Add file preview to message file attachments (Files.tsx)
- Fix hovercard animation to slide top-to-bottom and dismiss
  instantly on file click to prevent glitching over dialog
- Add localization keys for relevance, extracted content, preview
- Add top margin to ToolCallGroup

* chore: remove leftover .planning files

* fix: polish FilePreviewDialog, CodeBlock, LangIcon, and Sources

* fix: prevent keyboard focus on collapsed tool content

Add inert attribute to all expand/collapse wrapper divs so
collapsed content is removed from tab order and hidden from
assistive technology. Skip disabled ProgressText buttons from
tab order with tabIndex={-1}.

* feat: integrate file metadata into file_search UI

Pass fileType (MIME) and fileBytes from backend file records through
to the frontend. Add file-type-specific icons, file size display,
pages sorted by relevance, multi-snippet content per file, smart
preview detection by MIME type, and copy button in file preview dialog.

* fix: review fixes — inverted type check, wrong dimension, missing import, fail-open perms, timer leaks, dead code cleanup

* fix: update CodeBlock styling for improved visual consistency

* fix(chat): open composite file citations in preview

* fix(chat): restore file previews for parsed search results

* chore(git): ignore bg-shell artifacts

* fix(chat): restore readable code content in light theme

* style(chat): align code and output surfaces by theme

* chore(i18n): remove 6 unused translation keys

* fix(deps): replace private registry URL with public npm registry in lockfile

* fix: CI lint, build, and test failures

- Add missing scaleImage utility (fixes Vite build error)
- Export scaleImage from utils/index.ts
- Remove unused imports from Part.tsx (FunctionToolCall, CodeToolCall, Agents)
- Fix prettier formatting in Part.tsx (multi-line → single-line imports, conditions)
- Remove excess blank lines in Part.tsx
- Remove unused CodeEditorRef import from Artifacts.tsx
- Add useProgress mock to OpenAIImageGen.test.tsx
- Add scaleImage mock to OpenAIImageGen.test.tsx
- Update OpenAIImageGen tests to match redesigned component structure
- Remove dead collapsible output panel tests from ImageGen.test.tsx
- Add @icons-pack/react-simple-icons to Jest transformIgnorePatterns (ESM fix)

* refactor: reorganize imports order across multiple components for consistency

* fix: add scaleImage tests, delete dead ImageGen wrapper, wire up onUIAction in ToolCallInfo

- Add 7 unit tests for scaleImage utility covering null ref, scaling,
  no-upscale, height clamping, landscape, and panoramic images
- Delete unused Content/ImageGen.tsx re-export wrapper (ImageGen is
  imported from Parts/OpenAIImageGen via the Parts barrel)
- Wire up onUIAction in ToolCallInfo to use handleUIAction + ask from
  useMessagesOperations, matching UIResourceCarousel's behavior
  (was previously a silent no-op)

* refactor: optimize imports and enhance lazy loading for language icons

* fix: address review findings for tool call UI redesign

- Fix unstable array-index keys in ToolCallGroup (streaming state corruption)
- Add plain-text fallback in InputRenderer for non-JSON tool args
- Localize FRIENDLY_NAMES via translation keys instead of hardcoded English
- Guard autoCollapse against user-initiated manual expansion
- Fix CODE_INTERPRETER hasOutput to check actual outputs instead of hardcoding true
- Add logger.warn for Citations fail-closed behavior on permission errors
- Add Terminal icon to CodeAnalyze ProgressText for visual consistency
- Fix getMCPServerName to use indexOf instead of fragile split
- Use useLayoutEffect for inert attribute in useExpandCollapse (a11y)
- Memoize style object in useExpandCollapse to avoid defeating React.memo
- Memoize groupSequentialToolCalls in ContentParts to avoid recomputation
- Use source.link as stable key instead of array index in WebSearch
- Hoist rehypePlugins outside CodeMarkdown to prevent per-render recreation

* fix: revert useMemo after conditional returns in ContentParts

The useMemo placed after early returns violated React Rules of Hooks —
hook call count would change when transitioning between edit/view mode.
Reverted to the original plain forEach which is correct and equally
performant since content changes on every streaming token anyway.

* chore: remove unused com_ui_variables_info translation key

* fix: update tests and jest config for ESM compatibility after rebase

- Add ESM-only packages to transformIgnorePatterns (@dicebear, unified
  ecosystem, react-dnd, lowlight, etc.) to fix Jest parse failures
  introduced by dev rebase
- Update ToolCall.test.tsx to match new component API (CSS
  expand/collapse instead of conditional rendering, simplified props)
- Update ToolCallInfo.test.tsx to mock OutputRenderer (avoids ESM
  chain), align with current component interface (input/output/attachments)

* refactor: replace @icons-pack/react-simple-icons with inline SVGs

Inline the 51 Simple Icons SVG paths used by LangIcon directly into
langIconPaths.ts, eliminating the runtime dependency on
@icons-pack/react-simple-icons (which requires Node >= 24).

- LangIcon now renders a plain <svg> with the path data instead of
  lazy-loading React components from the package
- Removes Suspense/React.lazy overhead for code block language icons
- SVG paths sourced from Simple Icons (CC0 1.0 license)
- Package kept in package.json for now (will be removed separately)

* fix: replace Plug icon with Wrench for MCP tools, remove unused i18n keys

- MCP tools without a custom iconPath now show Wrench instead of Plug,
  matching the generic tool fallback and avoiding the "plugin" metaphor
- Remove unused translation keys: com_assistants_action_attempt,
  com_assistants_attempt_info, com_assistants_domain_info,
  com_ui_ui_resources

* fix: address second review findings

- Combine 3x getToolMeta loop into single toolMetadata pass (ToolCallGroup)
- Extract sortPagesByRelevance to shared util (was duplicated in
  FilePreviewDialog and RetrievalCall)
- Deduplicate AGENT_STYLE_TOOLS Set (export from OpenAIImageGen/index.ts)
- Localize "source/sources" in WebSearch aria-label
- Add autoExpand useEffect to CodeAnalyze for live setting changes
- Log download errors in FilePreviewDialog instead of silently swallowing
- Replace @ts-ignore with @ts-expect-error + explanation in Code.tsx
- Remove dead currentContent alias in CodeMarkdown

* chore: remove @icons-pack/react-simple-icons dependency from package.json and package-lock.json

- Deleted the @icons-pack/react-simple-icons entry from both package.json and package-lock.json, following the previous refactor to use inline SVGs for icons.

* fix: address triage audit findings

- Remove unused gIdx variable (ESLint error)
- Fix singular/plural in web search sources aria-label
- Separate inline type import in ToolCallGroup per AGENTS.md

* fix: remove invalid placeholderDimensions prop from Image component

* chore: import order

* chore: import order

* fix: resolve TypeScript errors in PR-touched files

- Remove non-existent placeholderDimensions prop from Image in Files.tsx
- Fix localize count param type (number, not string) in WebSearch.tsx
- Pass full resource object instead of partial in UIResourceCarousel.tsx
- Add 'as const' to toggleSwitchConfigs localizationKey in General.tsx
- Fix SearchResultData type in Citation.test.tsx
- Fix TAttachment and UIResource test fixture types across test files

* docs: document formatBytes difference in FilePreviewDialog

The local formatBytes returns a human-readable string with units
("1.5 MB"), while ~/utils/formatBytes returns a raw number. They
serve different purposes, so the local copy is retained with a
JSDoc comment explaining the distinction.

* fix: address remaining review items

- Replace cancelled IIFE with documented ternary in OpenAIImageGen,
  explaining the agent vs legacy path distinction
- Add .catch() fallback to loadLowlight() in useLazyHighlight — falls
  back to plain text if the chunk fails to load
- Fix import ordering in ToolCallGroup.tsx (type imports grouped before
  local value imports per AGENTS.md)

* fix: blob URL leak and useGetFiles over-fetch

- FilePreviewDialog: add cancelledRef guard to loadPreview so blob URLs
  are never created after the dialog closes (prevents orphaned object
  URLs on unmount during async PDF fetch)
- RetrievalCall: filter useGetFiles by fileIds from fileSources instead
  of fetching the entire user file corpus for display-only name matching

* chore: fix com_nav_auto_expand_tools alphabetical order in translation.json

* fix: render non-object JSON params instead of returning null in InputRenderer

* refactor: render JSON tool output as syntax-highlighted code block

Replace the custom YAML-ish formatValue/formatObjectArray rendering
with JSON.stringify + hljs language-json styling. Structured API
responses (like GitHub search results) now display as proper
syntax-highlighted JSON with indentation instead of a flat key-value
text dump.

- Remove formatValue, formatObjectArray, isUniformObjectArray helpers
- Add isJson flag to extractText return type
- JSON output rendered in <code class="hljs language-json"> block
- Text content blocks (type: "text") still extracted and rendered
  as plain text
- Error output unchanged

* fix: extract cancelled IIFE to named function in OpenAIImageGen

Replace nested ternary with a named computeCancelled() function that
documents the agent vs legacy path branching. Resolves eslint
no-nested-ternary warning.

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-25 12:31:39 -04:00
Danny Avila
5a373825a5
📐 style: Resolve Stale Active Sidebar Panel and Favorites Row Height (#12366)
Some checks failed
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Has been cancelled
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Has been cancelled
Publish `@librechat/client` to NPM / build-and-publish (push) Has been cancelled
Publish `librechat-data-provider` to NPM / build (push) Has been cancelled
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Has been cancelled
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Has been cancelled
Publish `librechat-data-provider` to NPM / publish-npm (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Has been cancelled
* 🐛 fix: Sidebar favorites row height stuck at stale measurement

Use a content-aware cache key for the favorites row in the virtualized
sidebar list. The CellMeasurerCache keyMapper now encodes
favorites.length, showAgentMarketplace, and isFavoritesLoading into the
key so the cache naturally invalidates when the content shape changes,
forcing CellMeasurer to re-measure from scratch instead of returning a
stale height from a transient render state.

Remove the onHeightChange callback and its useEffect from FavoritesList
since the content-aware key handles all height-affecting state
transitions.

* test: Add unit tests for Conversations component favorites height caching

Introduced a new test suite for the Conversations component to validate the behavior of the favorites CellMeasurerCache. The tests ensure that the cache correctly invalidates when the favorites count changes, loading state transitions occur, and marketplace visibility toggles. This enhances the reliability of the component's rendering logic and ensures proper height management in the virtualized list.

* fix: Validate active sidebar panel against available links

The `side:active-panel` localStorage key was shared with the old
right-side panel. On first load of the unified sidebar, a stale value
(e.g. 'hide-panel', or a conditional panel not currently available)
would match no link, leaving the expanded panel empty.

Add `resolveActivePanel` as derived state in both Nav and ExpandedPanel
so content and icon highlight always fall back to the first link when
the stored value doesn't match any available link.

* refactor: Remove redundant new-chat button from Agent Marketplace header

The sidebar icon strip already provides a new chat button, making the
sticky header in the marketplace page unnecessary.

* chore: import order and linting

* fix: Update dependencies in Conversations component to include marketplace visibility

Modified the useEffect dependency array in the Conversations component to include `showAgentMarketplace`, ensuring proper reactivity to changes in marketplace visibility. Additionally, updated the test suite to mock favorites state for accurate height caching validation when marketplace visibility changes.
2026-03-23 13:33:26 -04:00