Commit graph

514 commits

Author SHA1 Message Date
Danny Avila
fda72ac621
🏗️ refactor: Remove Redundant Caching, Migrate Config Services to TypeScript (#12466)
* ♻️ refactor: Remove redundant scopedCacheKey caching, support user-provided key model fetching

Remove redundant cache layers that used `scopedCacheKey()` (tenant-only scoping)
on top of `getAppConfig()` which already caches per-principal (role+user+tenant).
This caused config overrides for different principals within the same tenant to
be invisible due to stale cached data.

Changes:
- Add `requireJwtAuth` to `/api/endpoints` route for proper user context
- Remove ENDPOINT_CONFIG, STARTUP_CONFIG, PLUGINS, TOOLS, and MODELS_CONFIG
  cache layers — all derive from `getAppConfig()` with cheap computation
- Enhance MODEL_QUERIES cache: hash(baseURL+apiKey) keys, 2-minute TTL,
  caching centralized in `fetchModels()` base function
- Support fetching models with user-provided API keys in `loadConfigModels`
  via `getUserKeyValues` lookup (no caching for user keys)
- Update all affected tests

Closes #1028

* ♻️ refactor: Migrate config services to TypeScript in packages/api

Move core config logic from CJS /api wrappers to typed TypeScript in
packages/api using dependency injection factories:

- `createEndpointsConfigService` — endpoint config merging + checkCapability
- `createLoadConfigModels` — custom endpoint model loading with user key support
- `createMCPToolCacheService` — MCP tool cache operations (update, merge, cache)

/api files become thin wrappers that wire dependencies (getAppConfig,
loadDefaultEndpointsConfig, getUserKeyValues, getCachedTools, etc.)
into the typed factories.

Also moves existing `endpoints/config.ts` → `endpoints/config/providers.ts`
to accommodate the new `config/` directory structure.

* 🔄 fix: Invalidate models query when user API key is set or revoked

Without this, users had to refresh the page after entering their API key
to see the updated model list fetched with their credentials.

- Invalidate QueryKeys.models in useUpdateUserKeysMutation onSuccess
- Invalidate QueryKeys.models in useRevokeUserKeyMutation onSuccess
- Invalidate QueryKeys.models in useRevokeAllUserKeysMutation onSuccess

* 🗺️ fix: Remap YAML-level override keys to AppConfig equivalents in mergeConfigOverrides

Config overrides stored in the DB use YAML-level keys (TCustomConfig),
but they're merged into the already-processed AppConfig where some fields
have been renamed by AppService. This caused mcpServers overrides to land
on a nonexistent key instead of mcpConfig, so config-override MCP servers
never appeared in the UI.

- Add OVERRIDE_KEY_MAP to remap mcpServers→mcpConfig, interface→interfaceConfig
- Apply remapping before deep merge in mergeConfigOverrides
- Add test for YAML-level key remapping behavior
- Update existing tests to use AppConfig field names in assertions

* 🧪 test: Update service.spec to use AppConfig field names after override key remapping

* 🛡️ fix: Address code review findings — reliability, types, tests, and performance

- Pass tenant context (getTenantId) in importers.js getEndpointsConfig call
- Add 5 tests for user-provided API key model fetching (key found, no key,
  DB error, missing userId, apiKey-only with fixed baseURL)
- Distinguish NO_USER_KEY (debug) from infrastructure errors (warn) in catch
- Switch fetchPromisesMap from Promise.all to Promise.allSettled so one
  failing provider doesn't kill the entire model config
- Parallelize getUserKeyValues DB lookups via batched Promise.allSettled
  instead of sequential awaits in the loop
- Hoist standardCache instance in fetchModels to avoid double instantiation
- Replace Record<string, unknown> types with Partial<TConfig>-based types;
  remove as unknown as T double-cast in endpoints config
- Narrow Bedrock availableRegions to typed destructure
- Narrow version field from string|number|undefined to string|undefined
- Fix import ordering in mcp/tools.ts and config/models.ts per AGENTS.md
- Add JSDoc to getModelsConfig alias clarifying caching semantics
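
The switch from `Promise.all` to `Promise.allSettled` might look something like the sketch below. The function name `loadModelsByEndpoint`, the `ModelFetcher` type, and the choice to fall back to an empty list for a failed provider are assumptions made for illustration.

```typescript
// Hypothetical fetcher type: one async model lookup per endpoint.
type ModelFetcher = () => Promise<string[]>;

async function loadModelsByEndpoint(
  fetchers: Record<string, ModelFetcher>,
): Promise<Record<string, string[]>> {
  const names = Object.keys(fetchers);
  // Wrapping each call in an async arrow turns synchronous throws into rejections too.
  const settled = await Promise.allSettled(names.map(async (name) => fetchers[name]()));

  const result: Record<string, string[]> = {};
  settled.forEach((outcome, i) => {
    // Assumption: a failing provider yields an empty list instead of failing the whole config.
    result[names[i]] = outcome.status === 'fulfilled' ? outcome.value : [];
  });
  return result;
}
```

With `Promise.all`, the first rejection would reject the combined promise and discard the results of every healthy provider.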

* fix: Guard against null getCachedTools in mergeAppTools

* 🔍 fix: Address follow-up review — deduplicate extractEnvVariable, fix error discrimination, add log-level tests

- Deduplicate extractEnvVariable calls: resolve apiKey/baseURL once, reuse
  for both the entry and isUserProvided checks (Finding A)
- Move ResolvedEndpoint interface from function closure to module scope (Finding B)
- Replace fragile msg.includes('NO_USER_KEY') with ErrorTypes.NO_USER_KEY
  enum check against actual error message format (Finding C). Also handle
  ErrorTypes.INVALID_USER_KEY as an expected "no key" case.
- Add test asserting logger.warn is called for infra errors (not debug)
- Add test asserting logger.debug is called for NO_USER_KEY errors (not warn)

* fix: Preserve numeric assistants version via String() coercion

* 🐛 fix: Address secondary review — Ollama cache bypass, cache tests, type safety

- Fix Ollama success path bypassing cache write in fetchModels (CRITICAL):
  store result before returning so Ollama models benefit from 2-minute TTL
- Add 4 fetchModels cache behavior tests: cache write with TTL, cache hit
  short-circuits HTTP, skipCache bypasses read+write, empty results not cached
- Type-safe OVERRIDE_KEY_MAP: Partial<Record<keyof TCustomConfig, keyof AppConfig>>
  so compiler catches future field rename mismatches
- Fix import ordering in config/models.ts (package types longest→shortest)
- Rename ToolCacheDeps → MCPToolCacheDeps for naming consistency
- Expand getModelsConfig JSDoc to explain caching granularity

* fix: Narrow OVERRIDE_KEY_MAP index to satisfy strict tsconfig

* 🧩 fix: Add allowedProviders to TConfig, remove Record<string, unknown> from PartialEndpointEntry

The agents endpoint config includes allowedProviders (used by the frontend
AgentPanel to filter available providers), but it was missing from TConfig.
This forced PartialEndpointEntry to use & Record<string, unknown> as an
escape hatch, violating AGENTS.md type policy.

- Add allowedProviders?: (string | EModelEndpoint)[] to TConfig
- Remove Record<string, unknown> from PartialEndpointEntry — now just Partial<TConfig>

* 🛡️ fix: Isolate Ollama cache write from fetch try-catch, add Ollama cache tests

- Separate Ollama fetch and cache write into distinct scopes so a cache
  failure (e.g., Redis down) doesn't misattribute the error as an Ollama
  API failure and fall through to the OpenAI-compatible path (Issue A)
- Add 2 Ollama-specific cache tests: models written with TTL on fetch,
  cached models returned without hitting server (Issue B)
- Replace hardcoded 120000 with Time.TWO_MINUTES constant in cache TTL
  test assertion (Issue C)
- Fix OVERRIDE_KEY_MAP JSDoc to accurately describe runtime vs compile-time
  type enforcement (Issue D)
- Add global beforeEach for cache mock reset to prevent cross-test leakage

* 🧪 fix: Address third review — DI consistency, cache key width, MCP tests

- Inject loadCustomEndpointsConfig via EndpointsConfigDeps with default
  fallback, matching loadDefaultEndpointsConfig DI pattern (Finding 3)
- Widen modelsCacheKey from 64-bit (.slice(0,16)) to 128-bit (.slice(0,32))
  for collision-sensitive cross-credential cache key (Finding 4)
- Add fetchModels.mockReset() in loadConfigModels.spec beforeEach to
  prevent mock implementation leaks across tests (Finding 5)
- Add 11 unit tests for createMCPToolCacheService covering all three
  functions: null/empty input, successful ops, error propagation,
  cold-cache merge (Finding 2)
- Simplify getModelsConfig JSDoc to @see reference (Finding 10)
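
As a rough illustration of the cache key widening, the sketch below derives a key from the credentials and keeps 32 hex characters (128 bits). SHA-256 and the plain concatenation are assumptions; the commit only states that the key is a hash of baseURL and apiKey.

```typescript
import { createHash } from 'node:crypto';

/**
 * Hypothetical sketch of a credential-derived cache key.
 * Assumes SHA-256; 32 hex characters correspond to 128 bits.
 */
function modelsCacheKey(baseURL: string, apiKey: string): string {
  return createHash('sha256')
    .update(baseURL + apiKey)
    .digest('hex')
    .slice(0, 32);
}

const exampleKey = modelsCacheKey('https://api.example.invalid/v1', 'sk-example');
```

Each hex character carries 4 bits, so `.slice(0, 16)` keeps 64 bits and `.slice(0, 32)` keeps 128 bits of the digest.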

* ♻️ refactor: Address remaining follow-ups from reviews

OVERRIDE_KEY_MAP completeness:
- Add missing turnstile→turnstileConfig mapping
- Add exhaustiveness test verifying all three renamed keys are remapped
  and original YAML keys don't leak through

Import role context:
- Pass userRole through importConversations job → importLibreChatConvo
  so role-based endpoint overrides are honored during conversation import
- Update convos.js route to include req.user.role in the job payload

createEndpointsConfigService unit tests:
- Add 8 tests covering: default+custom merge, Azure/AzureAssistants/
  Anthropic Vertex/Bedrock config enrichment, assistants version
  coercion, agents allowedProviders, req.config bypass

Plugins/tools efficiency:
- Use Set for includedTools/filteredTools lookups (O(1) vs O(n) per plugin)
- Combine auth check + filter into single pass (eliminates intermediate array)
- Pre-compute toolDefKeys Set for O(1) tool definition lookups

* fix: Scope model query cache by user when userIdQuery is enabled

* fix: Skip model cache for userIdQuery endpoints, fix endpoints test types

- When userIdQuery is true, skip caching entirely (like user_provided keys)
  to avoid cross-user model list leakage without duplicating cache data
- Fix AgentCapabilities type error in endpoints.spec.ts — use enum values
  and appConfig() helper for partial mock typing

* 🐛 fix: Restore filteredTools+includedTools composition, add checkCapability tests

- Fix filteredTools regression: whitelist and blacklist are now applied
  independently (two flat guards), matching original behavior where
  includedTools=['a','b'] + filteredTools=['b'] produces ['a'] (Finding A)
- Fix Set spread in toolkit loop: pre-compute toolDefKeysList array once
  alongside the Set, reuse for .some() without per-plugin allocation (Finding B)
- Add 2 filteredTools tests: blacklist-only path and combined
  whitelist+blacklist composition (Finding C)
- Add 3 checkCapability tests: capability present, capability absent,
  fallback to defaultAgentCapabilities for non-agents endpoints (Finding D)

* 🔑 fix: Include config-override MCP servers in filterAuthorizedTools

Config-override MCP servers (defined via admin config overrides for
roles/groups) were rejected by filterAuthorizedTools because it called
getAllServerConfigs(userId) without the configServers parameter. Only
YAML and DB-backed user servers were included in the access check.

- Add configServers parameter to filterAuthorizedTools
- Resolve config servers via resolveConfigServers(req) at all 4 callsites
  (create, update, duplicate, revert) using parallel Promise.all
- Pass configServers through to getAllServerConfigs(userId, configServers)
  so the registry merges config-source servers into the access check
- Update filterAuthorizedTools.spec.js mock for resolveConfigServers

* fix: Skip model list re-fetch when validating models for user-provided key endpoints

For user-provided key endpoints (userProvide: true), skip the full model
list re-fetch during message validation — the user already selected from
a list we served them, and re-fetching with skipCache:true on every
message send is both slow and fragile (5s provider timeout = rejected model).

Instead, validate the model string format only:
- Must be a string, max 256 chars
- Must match [a-zA-Z0-9][a-zA-Z0-9_.:\-/@+ ]* (covers all known provider
  model ID formats while rejecting injection attempts)

System-configured endpoints still get full model list validation as before.

* 🧪 test: Add regression tests for filterAuthorizedTools configServers and validateModel

filterAuthorizedTools:
- Add test verifying configServers is passed to getAllServerConfigs and
  config-override server tools are allowed through
- Guard resolveConfigServers in createAgentHandler to only run when
  MCP tools are present (skip for tool-free agent creates)

validateModel (12 new tests):
- Format validation: missing model, non-string, length overflow, leading
  special char, script injection, standard model ID acceptance
- userProvide early-return: next() called immediately, getModelsConfig
  not invoked (regression guard for the exact bug this fixes)
- System endpoint list validation: reject unknown model, accept known
  model, handle null/missing models config

Also fix unnecessary backslash escape in MODEL_PATTERN regex.

* 🧹 fix: Remove space from MODEL_PATTERN, trim input, clean up nits

- Remove space character from MODEL_PATTERN regex — no real model ID
  uses spaces; prevents spurious violation logs from whitespace artifacts
- Add model.trim() before validation to handle accidental whitespace
- Remove redundant filterUniquePlugins call on already-deduplicated output
- Add comment documenting intentional whitelist+blacklist composition
- Add getUserKeyValues.mockReset() in loadConfigModels.spec beforeEach
- Remove narrating JSDoc from getModelsConfig one-liner
- Add 2 tests: trim whitespace handling, reject spaces in model ID
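
The format-only check after these changes (trim first, no spaces allowed) can be sketched as below. `normalizeModelId` is a hypothetical helper name, and the pattern is an approximation built from the character set listed in the commit message, not a copy of the project's `MODEL_PATTERN`.

```typescript
const MAX_MODEL_LENGTH = 256;

// Approximation of the described pattern: alphanumeric first character,
// then alphanumerics or _ . : / @ + - (no space).
const MODEL_PATTERN = new RegExp('^[a-zA-Z0-9][a-zA-Z0-9_.:/@+-]*$');

/** Returns the trimmed model ID when it passes the format check, otherwise null. */
function normalizeModelId(model: unknown): string | null {
  if (typeof model !== 'string') {
    return null;
  }
  const trimmed = model.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_MODEL_LENGTH) {
    return null;
  }
  return MODEL_PATTERN.test(trimmed) ? trimmed : null;
}
```

Returning the trimmed value lets a caller hand the cleaned ID to downstream code instead of the raw input.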

* fix: Match startup tool loader semantics — includedTools takes precedence over filteredTools

The startup tool loader (loadAndFormatTools) explicitly ignores
filteredTools when includedTools is set, with a warning log. The
PluginController was applying both independently, creating inconsistent
behavior where the same config produced different results at startup
vs plugin listing time.

Restored mutually exclusive semantics: when includedTools is non-empty,
filteredTools is not evaluated.
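
The restored precedence rule can be illustrated with a small sketch. `selectTools` is a hypothetical function operating on plain tool keys; the real controller works on plugin objects and also performs auth checks.

```typescript
/**
 * Hypothetical sketch: when includedTools is non-empty it acts as a whitelist
 * and filteredTools is ignored; otherwise filteredTools acts as a blacklist.
 */
function selectTools(
  allTools: string[],
  includedTools: string[] = [],
  filteredTools: string[] = [],
): string[] {
  if (includedTools.length > 0) {
    const included = new Set(includedTools); // O(1) membership checks
    return allTools.filter((tool) => included.has(tool));
  }
  const filtered = new Set(filteredTools);
  return allTools.filter((tool) => !filtered.has(tool));
}
```

Under these semantics, `includedTools=['a','b']` combined with `filteredTools=['b']` keeps both `a` and `b`, which differs from the independent composition applied in the earlier fix.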

* 🧹 chore: Simplify validateModel flow, note auth requirement on endpoints route

- Separate missing-model from invalid-model checks cleanly: type+presence
  guard first, then trim+format guard (reviewer NIT)
- Add route comment noting auth is required for role/tenant scoping

* fix: Write trimmed model back to req.body.model for downstream consumers
2026-03-30 16:49:48 -04:00
Dustin Healy
a4a17ac771
⛩️ feat: Admin Grants API Endpoints (#12438)
* feat: add System Grants handler factory with tests

Handler factory with 4 endpoints: getEffectiveCapabilities (expanded
capability set for authenticated user), getPrincipalGrants (list grants
for a specific principal), assignGrant, and revokeGrant. Write ops
dynamically check MANAGE_ROLES/GROUPS/USERS based on target principal
type. 31 unit tests covering happy paths, validation, 403, and errors.

* feat: wire System Grants REST routes

Mount /api/admin/grants with requireJwtAuth + ACCESS_ADMIN gate.
Add barrel export for createAdminGrantsHandlers and AdminGrantsDeps.

* fix: cascade grant cleanup on role deletion

Add deleteGrantsForPrincipal to AdminRolesDeps and call it in
deleteRoleHandler via Promise.allSettled after successful deletion,
matching the groups cleanup pattern. 3 tests added for cleanup call,
skip on 404, and resilience to cleanup failure.

* fix: simplify cascade grant cleanup on role deletion

Replace Promise.allSettled wrapper with a direct try/catch for the
single deleteGrantsForPrincipal call.

* fix: harden grant handlers with auth, validation, types, and RESTful revoke

- Add per-handler auth checks (401) and granular capability gates
  (READ_* for getPrincipalGrants, possession check for assignGrant)
- Extract validatePrincipal helper; rewrite validateGrantBody to use
  direct type checks instead of unsafe `as string` casts
- Align DI types with data layer (ResolvedPrincipal.principalType
  widened to string, getUserPrincipals role made optional)
- Switch revoke route from DELETE body to RESTful URL params
- Return 201 for assignGrant to match roles/groups create convention
- Handle null grantCapability return with 500
- Add comprehensive test coverage for new auth/validation paths

* fix: deduplicate ResolvedPrincipal, typed body, defensive auth checks

- Remove duplicate ResolvedPrincipal from capabilities.ts; import the
  canonical export from grants.ts
- Replace Record<string, unknown> with explicit GrantRequestBody interface
- Add defensive 403 when READ_CAPABILITY_BY_TYPE lookup misses
- Document revoke asymmetry (no possession check) with JSDoc
- Use _id only in resolveUser (avoid Mongoose virtual reliance)
- Improve null-grant error message
- Complete logger mock in tests

* refactor: move ResolvedPrincipal to shared types to fix circular dep

Extract ResolvedPrincipal from admin/grants.ts to types/principal.ts
so middleware/capabilities.ts imports from shared types rather than
depending upward on the admin handler layer.

* chore: remove dead re-export, align logger mocks across admin tests

- Remove unused ResolvedPrincipal re-export from grants.ts (canonical
  source is types/principal.ts)
- Align logger mocks in roles.spec.ts and groups.spec.ts to include
  all log levels (error, warn, info, debug) matching grants.spec.ts

* fix: cascade Config and AclEntry cleanup on role deletion

Add deleteConfig and deleteAclEntries to role deletion cascade,
matching the group deletion pattern. Previously only grants were
cleaned up, leaving orphaned config overrides and ACL entries.

* perf: single-query batch for getEffectiveCapabilities

Add getCapabilitiesForPrincipals (plural) to the data layer — a single
$or query across all principals instead of N+1 parallel queries. Wire
it into the grants handler so getEffectiveCapabilities hits the DB once
regardless of how many principals the user has.
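
A rough sketch of the single-query idea: instead of issuing one query per principal, one filter with an `$or` arm per principal is built and sent once. The `Principal` shape and the `buildPrincipalsFilter` helper are hypothetical; the real data-layer method also handles normalization and tenant scoping.

```typescript
// Hypothetical principal shape; principalId is absent for principals that have no ID.
interface Principal {
  principalType: string;
  principalId?: string;
}

type PrincipalArm = { principalType: string; principalId?: string };

/** Builds one MongoDB-style filter covering every principal, or null when there is nothing to query. */
function buildPrincipalsFilter(principals: Principal[]): { $or: PrincipalArm[] } | null {
  if (principals.length === 0) {
    return null; // caller can skip the DB round-trip entirely
  }
  return {
    $or: principals.map(({ principalType, principalId }) =>
      principalId === undefined ? { principalType } : { principalType, principalId },
    ),
  };
}
```

The resulting object would be passed to a single `find` call, so the number of DB round-trips stays at one regardless of how many principals are supplied.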

* fix: defer SystemCapabilities access to factory call time

Move all SystemCapabilities usage (VALID_CAPABILITIES,
MANAGE_CAPABILITY_BY_TYPE, READ_CAPABILITY_BY_TYPE) inside the
createAdminGrantsHandlers factory. External test suites that mock
@librechat/data-schemas without providing SystemCapabilities crashed
at import time when grants.ts was loaded transitively.

* test: add data-layer and handler test coverage for review findings

- Add 6 mongodb-memory-server tests for getCapabilitiesForPrincipals:
  multi-principal batch, empty array, filtering, tenant scoping
- Add handler test: all principals filtered (only PUBLIC)
- Add handler test: granting an implied capability succeeds
- Add handler test: all cascade cleanup operations fail simultaneously
- Document platform-scope-only tenantId behavior in JSDoc

* fix: resolveUser fallback to user.id, early-return empty principals

- Match capabilities middleware pattern: _id?.toString() ?? user.id
  to handle JWT-deserialized users without Mongoose _id
- Move empty-array guard before principals.map() to skip unnecessary
  normalizePrincipalId calls
- Add comment explaining VALID_PRINCIPAL_TYPES module-scope asymmetry

* refactor: derive VALID_PRINCIPAL_TYPES from capability maps

Make MANAGE_CAPABILITY_BY_TYPE and READ_CAPABILITY_BY_TYPE
non-Partial Records over a shared GrantPrincipalType union, then
derive VALID_PRINCIPAL_TYPES from the map keys. This makes divergence
between the three data structures structurally impossible.

* feat: add GET /api/admin/grants list-all-grants endpoint

Add listAllGrants data-layer method and handler so the admin panel
can fetch all grants in a single request instead of fanning out
N+M calls per role and group. Response is filtered to only include
grants for principal types the caller has read access to.

* fix: update principalType to use GrantPrincipalType for consistency in grants handling

- Refactor principalType in createAdminGrantsHandlers to use GrantPrincipalType instead of PrincipalType for better type accuracy.
- Ensure type consistency across the grants handling logic in the API.

* fix: address admin grants review findings — tenantId propagation, capability validation, pagination, and test coverage

Propagate tenantId through all grant operations for multi-tenancy support.
Extract isValidCapability to accept full SystemCapability union (base, section,
assign) and reuse it in both Mongoose schema validation and handler input checks.
Replace listAllGrants with paginated listGrants + countGrants. Filter PUBLIC
principals from getCapabilitiesForPrincipals queries. Export getCachedPrincipals
from ALS store for fast-path principal resolution. Move DELETE capability param
to query string to avoid colon-in-URL issues. Remove dead code and add
comprehensive handler and data-layer test coverage.

* refactor: harden admin grants — FilterQuery types, auth-first ordering, DELETE path param, isValidCapability tests

Replace Record<string, unknown> with FilterQuery<ISystemGrant> across all
data-layer query filters. Refactor buildTenantFilter to a pure tenantCondition
function that returns a composable FilterQuery fragment, eliminating the $or
collision between tenant and principal queries. Move auth check before input
validation in getPrincipalGrantsHandler, assignGrantHandler, and
revokeGrantHandler to avoid leaking valid type names to unauthenticated callers.
Switch DELETE route from query param back to path param (/:capability) with
encodeURIComponent per project conventions. Add compound index for listGrants
sort. Type VALID_PRINCIPAL_TYPES as Set<GrantPrincipalType>. Remove unused
GetCachedPrincipalsFn type export. Add dedicated isValidCapability unit tests
and revokeGrant idempotency test.

* refactor: batch capability checks in listGrantsHandler via getHeldCapabilities

Replace 3 parallel hasCapabilityForPrincipals DB calls with a single
getHeldCapabilities query that returns the subset of capabilities any
principal holds. Also: defensive limit(0) clamp, parallelized assignGrant
auth checks, principalId type-vs-required error split, tenantCondition
hoisted to factory top, JSDoc on cascade deps, DELETE route encoding note.

* fix: normalize principalId and filter undefined in getHeldCapabilities

Add normalizePrincipalId + null guard to getHeldCapabilities, matching
the contract of getCapabilitiesForPrincipals. Simplify allCaps build
with flatMap, add no-tenantId cross-check and undefined-principalId
test cases.

* refactor: use concrete types in GrantRequestBody, rename encoding test

Replace unknown fields with explicit string types in GrantRequestBody,
matching the established pattern in roles/groups/config handlers. Rename
misleading 'encoded' test to 'with colons' since Express auto-decodes
req.params.

* fix: support hierarchical parent capabilities in possession checks

hasCapabilityForPrincipals and getHeldCapabilities now resolve parent
base capabilities for section/assignment grants. An admin holding
manage:configs can now grant manage:configs:<section> and transitively
read:configs:<section>. Fixes anti-escalation 403 blocking config
capability delegation.
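
One way to picture the parent resolution is the sketch below. It assumes capabilities are colon-separated as `verb:resource[:section]` and that holding `manage:<resource>` also covers `read:<resource>:<section>`, as the example in this commit suggests. The helper names are hypothetical and the real resolution rules may differ.

```typescript
/** Hypothetical sketch: lists base capabilities that imply a section-level capability. */
function parentCapabilities(capability: string): string[] {
  const [verb, resource, ...section] = capability.split(':');
  if (!verb || !resource || section.length === 0) {
    return []; // base capabilities have no parent in this sketch
  }
  const parents = [`${verb}:${resource}`];
  if (verb === 'read') {
    // Assumption: managing a resource also permits reading its sections.
    parents.push(`manage:${resource}`);
  }
  return parents;
}

/** True when the capability itself or one of its parents is held. */
function holdsCapability(held: Set<string>, capability: string): boolean {
  return held.has(capability) || parentCapabilities(capability).some((p) => held.has(p));
}
```

With this shape, an admin holding only `manage:configs` passes the possession check for `manage:configs:<section>` and `read:configs:<section>`, while an admin holding only `read:configs` does not pass it for `manage:configs:<section>`.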

* perf: use getHeldCapabilities in assignGrant to halve DB round-trips

assignGrantHandler was making two parallel hasCapabilityForPrincipals
calls to check manage + capability possession. getHeldCapabilities was
introduced in this PR specifically for this pattern. Replace with a
single batched call. Update corresponding spec assertions.

* fix: validate role existence before granting capabilities

Grants for non-existent role names were silently persisted, creating
orphaned grants that could surprise-activate if a role with that name
was later created. Add optional checkRoleExists dep to assignGrant and
wire it to getRoleByName in the route file.

* refactor: tighten principalType typing and use grantCapability in tests

Narrow getCapabilitiesForPrincipals parameter from string to
PrincipalType, removing the redundant cast. Replace direct
SystemGrant.create() calls in getCapabilitiesForPrincipals tests with
methods.grantCapability() to honor the schema's normalization invariant.
Add getHeldCapabilities extended capability tests.

* test: rename misleading cascade cleanup test name

The test only injects failure into deleteGrantsForPrincipal, not all
cascade operations. Rename from 'cascade cleanup fails' to 'grant
cleanup fails' to match the actual scope.

* fix: reorder role check after permission guard, add tenantId to index

Move checkRoleExists after the getHeldCapabilities permission check so
that a sub-MANAGE_ROLES admin cannot probe role name existence via
400 vs 403 response codes.

Add tenantId to the { principalType, capability } index so listGrants
queries in multi-tenant deployments can use a covering index instead
of post-scanning for tenant condition.

Add missing test for checkRoleExists throwing.

* fix: scope deleteGrantsForPrincipal to tenant on role deletion

deleteGrantsForPrincipal previously filtered only on principalType +
principalId, deleting grants across all tenants. Since the role schema
supports multi-tenancy (compound unique index on name + tenantId), two
tenants can share a role name like 'editor'. Deleting that role in one
tenant would wipe grants for identically-named roles in other tenants.

Add optional tenantId parameter to deleteGrantsForPrincipal. When
provided, scopes the delete to that tenant plus platform-level grants.
Propagate req.user.tenantId through the role deletion cascade.

* fix: scope grant cleanup to tenant on group deletion

Same cross-tenant gap as the role deletion path: deleteGroupHandler
called deleteGrantsForPrincipal without tenantId, so deleting a group
would wipe its grants across all tenants. Extract req.user.tenantId
and pass it through.

* test: add HTTP integration test for admin grants routes

Supertest-based test with real MongoMemoryServer exercising the full
Express wiring: route registration, injected auth middleware, handler
DI deps, and real DB round-trips.

Covers GET /, GET /effective, POST / + DELETE / lifecycle, role
existence validation, and 401 for unauthenticated callers.

Also documents the expandImplications scope: the /effective endpoint
returns base-level capabilities only; section-level resolution is
handled at authorization check time by getParentCapabilities.

* fix: use exact tenant match in deleteGrantsForPrincipal, normalize principalId, harden API

CRITICAL: deleteGrantsForPrincipal was using tenantCondition (a
read-query helper) for deleteMany, which includes the
{ tenantId: { $exists: false } } arm. This silently destroyed
platform-level grants when a tenant-scoped role/group deletion
occurred. Replace with exact { tenantId } match for deletes so
platform-level grants survive tenant-scoped cascade cleanup.
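
The difference between the read-side condition and the delete-side condition can be sketched as plain filter builders. The function names and the `GrantPrincipal` shape are hypothetical; only the two filter forms (`$exists: false` arm versus exact match) come from the description above.

```typescript
interface GrantPrincipal {
  principalType: string;
  principalId: string;
}

/** Read filter: tenant-scoped grants plus platform-level grants (no tenantId field). */
function grantReadFilter(principal: GrantPrincipal, tenantId: string) {
  return {
    principalType: principal.principalType,
    principalId: principal.principalId,
    $or: [{ tenantId }, { tenantId: { $exists: false } }],
  };
}

/** Delete filter: exact tenant match only, so platform-level grants are left untouched. */
function grantDeleteFilter(principal: GrantPrincipal, tenantId: string) {
  return {
    principalType: principal.principalType,
    principalId: principal.principalId,
    tenantId,
  };
}
```

Passing the read filter to `deleteMany` would also match documents without a `tenantId`, which is the failure mode this fix addresses.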

Refactor deleteGrantsForPrincipal signature from fragile positional
overload (sessionOrTenantId union + maybeSession) to a clean options
object: { tenantId?, session? }. Update all callers and test assertions.

Add normalizePrincipalId to hasCapabilityForPrincipals to match the
pattern already used by getHeldCapabilities — prevents string/ObjectId
type mismatch on USER/GROUP principal queries.

Also: export GrantPrincipalType from barrel, add upper-bound cap to
listGrants, document GROUP/USER existence check trade-off, add
integration tests for tenant-isolation property of deleteGrantsForPrincipal.

* fix: forward tenantId to getUserPrincipals in resolvePrincipals

resolvePrincipals had tenantId available from the caller but only
forwarded it to getCachedPrincipals (cache lookup). The DB fallback
via getUserPrincipals omitted it. While the Group schema's
applyTenantIsolation Mongoose plugin handles scoping via
AsyncLocalStorage in HTTP request context, explicitly passing tenantId
makes the contract visible and prevents silent cross-tenant group
resolution if called outside request context.

* fix: remove unused import and add assertion to 401 integration test

Remove unused SystemCapabilities import flagged by ESLint. Add explicit
body assertion to the 401 test so it has a jest expect() call.

* chore: hoist grant limit constants to scope, remove dead isolateModules

Move GRANTS_DEFAULT_LIMIT / GRANTS_MAX_LIMIT from inside listGrants
function body to createSystemGrantMethods scope so they are evaluated
once at module load. Remove dead jest.isolateModules + jest.doMock
block in integration test — the ~/models mock was never exercised
since handlers are built with explicit DI deps.

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-30 16:49:23 -04:00
Danny Avila
877c2efc85
🏗️ feat: bulkWrite isolation, pre-auth context, strict-mode fixes (#12445)
* fix: wrap seedDatabase() in runAsSystem() for strict tenant mode

seedDatabase() was called without tenant context at startup, causing
every Mongoose operation inside it to throw when
TENANT_ISOLATION_STRICT=true. Wrapping in runAsSystem() gives it the
SYSTEM_TENANT_ID sentinel so the isolation plugin skips filtering,
matching the pattern already used for performStartupChecks and
updateInterfacePermissions.

* fix: chain tenantContextMiddleware in optionalJwtAuth

optionalJwtAuth populated req.user but never established ALS tenant
context, unlike requireJwtAuth which chains tenantContextMiddleware
after successful auth. Authenticated users hitting routes with
optionalJwtAuth (e.g. /api/banner) had no tenant isolation.

* feat: tenant-safe bulkWrite wrapper and call-site migration

Mongoose's bulkWrite() does not trigger schema-level middleware hooks,
so the applyTenantIsolation plugin cannot intercept it. This adds a
tenantSafeBulkWrite() utility that injects the current ALS tenant
context into every operation's filter/document before delegating to
native bulkWrite.

Migrates all 8 runtime bulkWrite call sites:
- agentCategory (seedCategories, ensureDefaultCategories)
- conversation (bulkSaveConvos)
- message (bulkSaveMessages)
- file (batchUpdateFiles)
- conversationTag (updateTagsForConversation, bulkIncrementTagCounts)
- aclEntry (bulkWriteAclEntries)

systemGrant.seedSystemGrants is intentionally not migrated — it uses
explicit tenantId: { $exists: false } filters and is exempt from the
isolation plugin.
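
The injection step can be pictured with the simplified sketch below. It uses a reduced, hypothetical `BulkOp` union and takes the tenant ID as a parameter; the real wrapper reads the tenant from ALS context, covers more operation types, and then delegates to the native bulkWrite.

```typescript
// Simplified, hypothetical operation shapes (a subset of what a bulk write accepts).
type Doc = Record<string, unknown>;
type BulkOp =
  | { insertOne: { document: Doc } }
  | { updateOne: { filter: Doc; update: Doc; upsert?: boolean } }
  | { deleteOne: { filter: Doc } };

/** Returns copies of the operations with tenantId added to each filter or document. */
function injectTenant(ops: BulkOp[], tenantId: string | undefined): BulkOp[] {
  if (!tenantId) {
    return ops; // no tenant context: leave operations untouched
  }
  return ops.map((op): BulkOp => {
    if ('insertOne' in op) {
      return { insertOne: { document: { ...op.insertOne.document, tenantId } } };
    }
    if ('updateOne' in op) {
      return { updateOne: { ...op.updateOne, filter: { ...op.updateOne.filter, tenantId } } };
    }
    if ('deleteOne' in op) {
      return { deleteOne: { filter: { ...op.deleteOne.filter, tenantId } } };
    }
    // Simplification: this sketch always rejects operation types it cannot scope.
    throw new Error('Unsupported bulk operation type');
  });
}
```

The input operations are copied instead of mutated, so a caller that reuses its operation array does not end up with a stale tenant baked in.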

* feat: pre-auth tenant middleware and tenant-scoped config cache

Adds preAuthTenantMiddleware that reads X-Tenant-Id from the request
header and wraps downstream in tenantStorage ALS context. Wired onto
/oauth, /api/auth, /api/config, and /api/share — unauthenticated
routes that need tenant scoping before JWT auth runs.

The /api/config cache key is now tenant-scoped
(STARTUP_CONFIG:${tenantId}) so multi-tenant deployments serve the
correct login page config per tenant.

The middleware is intentionally minimal — no subdomain parsing, no
OIDC claim extraction. The private fork's reverse proxy or auth
gateway sets the header.

* feat: accept optional tenantId in updateInterfacePermissions

When tenantId is provided, the function re-enters inside
tenantStorage.run({ tenantId }) so all downstream Mongoose queries
target that tenant's roles instead of the system context. This lets
the private fork's tenant provisioning flow call
updateInterfacePermissions per-tenant after creating tenant-scoped
ADMIN/USER roles.

* fix: tenant-filter $lookup in getPromptGroup aggregation

The $lookup stage in getPromptGroup() queried the prompts collection
without tenant filtering. While the outer PromptGroup aggregate is
protected by the tenantIsolation plugin's pre('aggregate') hook,
$lookup runs as an internal MongoDB operation that bypasses Mongoose
hooks entirely.

Converts from simple field-based $lookup to pipeline-based $lookup
with an explicit tenantId match when tenant context is active.

* fix: replace field-level unique indexes with tenant-scoped compounds

Field-level unique:true creates a globally-unique single-field index in
MongoDB, which would cause insert failures across tenants sharing the
same ID values.

- agent.id: removed field-level unique, added { id, tenantId } compound
- convo.conversationId: removed field-level unique (compound at line 50
  already exists: { conversationId, user, tenantId })
- message.messageId: removed field-level unique (compound at line 165
  already exists: { messageId, user, tenantId })
- preset.presetId: removed field-level unique, added { presetId, tenantId }
  compound

* fix: scope MODELS_CONFIG, ENDPOINT_CONFIG, PLUGINS, TOOLS caches by tenant

These caches store per-tenant configuration (available models, endpoint
settings, plugin availability, tool definitions) but were using global
cache keys. In multi-tenant mode, one tenant's cached config would be
served to all tenants.

Appends :${tenantId} to cache keys when tenant context is active.
Falls back to the unscoped key when no tenant context exists (backward
compatible for single-tenant OSS deployments).

Covers all read, write, and delete sites:
- ModelController.js: get/set MODELS_CONFIG
- PluginController.js: get/set PLUGINS, get/set TOOLS
- getEndpointsConfig.js: get/set/delete ENDPOINT_CONFIG
- app.js: delete ENDPOINT_CONFIG (clearEndpointConfigCache)
- mcp.js: delete TOOLS (updateMCPTools, mergeAppTools)
- importers.js: get ENDPOINT_CONFIG

* fix: add getTenantId to PluginController spec mock

The data-schemas mock was missing getTenantId, causing all
PluginController tests to throw when the controller calls
getTenantId() for tenant-scoped cache keys.

* fix: address review findings — migration, strict-mode, DRY, types

Addresses all CRITICAL, MAJOR, and MINOR review findings:

F1 (CRITICAL): Add agents, conversations, messages, presets to
SUPERSEDED_INDEXES in tenantIndexes.ts so dropSupersededTenantIndexes()
drops the old single-field unique indexes that block multi-tenant inserts.

F2 (CRITICAL): Unknown bulkWrite op types now throw in strict mode
instead of silently passing through without tenant injection.

F3 (MAJOR): Replace wildcard export with named export for
tenantSafeBulkWrite, hiding _resetBulkWriteStrictCache from the
public package API.

F5 (MAJOR): Restore AnyBulkWriteOperation<IAclEntry>[] typing on
bulkWriteAclEntries — the unparameterized wrapper accepts parameterized
ops as a subtype.

F7 (MAJOR): Fix config.js tenant precedence — JWT-derived
req.user.tenantId now takes priority over the X-Tenant-Id header for
authenticated requests.

F8 (MINOR): Extract scopedCacheKey() helper into tenantContext.ts and
replace all 11 inline occurrences across 7 files.

F9 (MINOR): Use simple localField/foreignField $lookup for the
non-tenant getPromptGroup path (more efficient index seeks).

F12 (NIT): Remove redundant BulkOp type alias.
F13 (NIT): Remove debug log that leaked raw tenantId.

* fix: add new superseded indexes to tenantIndexes test fixture

The test creates old indexes to verify the migration drops them.
Missing fixture entries for agents.id_1, conversations.conversationId_1,
messages.messageId_1, and presets.presetId_1 caused the count assertion
to fail (expected 22, got 18).

* fix: restore logger.warn for unknown bulk op types in non-strict mode

* fix: block SYSTEM_TENANT_ID sentinel from external header input

CRITICAL: preAuthTenantMiddleware accepted any string as X-Tenant-Id,
including '__SYSTEM__'. The tenantIsolation plugin treats SYSTEM_TENANT_ID
as an explicit bypass — skipping ALL query filters. A client sending
X-Tenant-Id: __SYSTEM__ to pre-auth routes (/api/share, /api/config,
/api/auth, /oauth) would execute Mongoose operations without tenant
isolation.

Fixes:
- preAuthTenantMiddleware rejects SYSTEM_TENANT_ID in header
- scopedCacheKey returns the base key (not key:__SYSTEM__) in system
  context, preventing stale cache entries during runAsSystem()
- updateInterfacePermissions guards tenantId against SYSTEM_TENANT_ID
- $lookup pipeline separates $expr join from constant tenantId match
  for better index utilization
- Regression test for sentinel rejection in preAuthTenant.spec.ts
- Remove redundant getTenantId() call in config.js

* test: add missing deleteMany/replaceOne coverage, fix vacuous ALS assertions

bulkWrite spec:
- deleteMany: verifies tenant-scoped deletion leaves other tenants untouched
- replaceOne: verifies tenantId injected into both filter and replacement
- replaceOne overwrite: verifies a conflicting tenantId in the replacement
  document is overwritten by the ALS tenant (defense-in-depth)
- empty ops array: verifies graceful handling

preAuthTenant spec:
- All negative-case tests now use the capturedNext pattern to verify
  getTenantId() inside the middleware's execution context, not the
  test runner's outer frame (which was always undefined regardless)

* feat: tenant-isolate MESSAGES cache, FLOWS cache, and GenerationJobManager

MESSAGES cache (streamAudio.js):
- Cache key now uses scopedCacheKey(messageId) to prefix with tenantId,
  preventing cross-tenant message content reads during TTS streaming.

FLOWS cache (FlowStateManager):
- getFlowKey() now generates ${type}:${tenantId}:${flowId} when tenant
  context is active, isolating OAuth flow state per tenant.

GenerationJobManager:
- tenantId added to SerializableJobData and GenerationJobMetadata
- createJob() captures the current ALS tenant context (excluding
  SYSTEM_TENANT_ID) and stores it in job metadata
- SSE subscription endpoint validates job.metadata.tenantId matches
  req.user.tenantId, blocking cross-tenant stream access
- Both InMemoryJobStore and RedisJobStore updated to accept tenantId

* fix: add getTenantId and SYSTEM_TENANT_ID to MCP OAuth test mocks

FlowStateManager.getFlowKey() now calls getTenantId() for tenant-scoped
flow keys. The 4 MCP OAuth test files mock @librechat/data-schemas
without these exports, causing TypeError at runtime.

* fix: correct import ordering per AGENTS.md conventions

Package imports sorted shortest to longest line length, local imports
sorted longest to shortest — fixes ordering violations introduced by
our new imports across 8 files.

* fix: deserialize tenantId in RedisJobStore — cross-tenant SSE guard was no-op in Redis mode

serializeJob() writes tenantId to the Redis hash via Object.entries,
but deserializeJob() manually enumerates fields and omitted tenantId.
Every getJob() from Redis returned tenantId: undefined, causing the
SSE route's cross-tenant guard to short-circuit (undefined && ... → falsy).

* test: SSE tenant guard, FlowStateManager key consistency, ALS scope docs

SSE stream tenant tests (streamTenant.spec.js):
- Cross-tenant user accessing another tenant's stream → 403
- Same-tenant user accessing own stream → allowed
- OSS mode (no tenantId on job) → tenant check skipped

FlowStateManager tenant tests (manager.tenant.spec.ts):
- completeFlow finds flow created under same tenant context
- completeFlow does NOT find flow under different tenant context
- Unscoped flows are separate from tenant-scoped flows

Documentation:
- JSDoc on getFlowKey documenting ALS context consistency requirement
- Comment on streamAudio.js scopedCacheKey capture site

* fix: SSE stream tests hang on success path, remove internal fork references

The success-path tests entered the SSE streaming code which never
closes, causing timeout. Mock subscribe() to end the response
immediately. Restructured assertions to verify non-403/non-404.

Removed "private fork" and "OSS" references from code and test
descriptions — replaced with "deployment layer", "multi-tenant
deployments", and "single-tenant mode".

* fix: address review findings — test rigor, tenant ID validation, docs

F1: SSE stream tests now mock subscribe() with correct signature
(streamId, writeEvent, onDone, onError) and assert 200 status,
verifying the tenant guard actually allows through same-tenant users.

F2: completeFlow logs the attempted key and ALS tenantId when flow
is not found, so reverse proxy misconfiguration (missing X-Tenant-Id
on OAuth callback) produces an actionable warning.

F3/F10: preAuthTenantMiddleware validates tenant ID format — rejects
colons, special characters, and values exceeding 128 chars. Trims
whitespace. Prevents cache key collisions via crafted headers.
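
A minimal sketch of this kind of header validation follows. The allowed character set, the constant names, and `parseTenantHeader` itself are assumptions for illustration; only the `__SYSTEM__` sentinel, the 128-character limit, the colon rejection, and the trimming come from the description above.

```typescript
const SYSTEM_TENANT_ID = '__SYSTEM__';
const MAX_TENANT_ID_LENGTH = 128;
// Assumed allowlist: letters, digits, underscore, dot, hyphen (hyphen first in the class).
const TENANT_ID_PATTERN = /^[-A-Za-z0-9_.]+$/;

// Returns the cleaned tenant ID, or undefined when the header should be ignored.
function parseTenantHeader(raw: string | undefined): string | undefined {
  if (raw === undefined) return undefined;
  const value = raw.trim();
  if (value.length === 0 || value.length > MAX_TENANT_ID_LENGTH) return undefined;
  // The sentinel passes the character check, so it needs its own rejection.
  if (value === SYSTEM_TENANT_ID) return undefined;
  // Colons would let a crafted header collide with `key:tenantId` cache keys.
  if (!TENANT_ID_PATTERN.test(value)) return undefined;
  return value;
}
```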

F4: Documented cache invalidation scope limitation in
clearEndpointConfigCache — only the calling tenant's key is cleared;
other tenants expire via TTL.

F7: getFlowKey JSDoc now lists all 8 methods requiring consistent
ALS context.

F8: Added dedicated scopedCacheKey unit tests — base key without
context, base key in system context, scoped key with tenant, no
ALS leakage across scope boundaries.
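
The three key cases covered by those tests can be sketched as below. This is a simplification: the helper described in this log reads the tenant from ALS, whereas the sketch takes `tenantId` as an explicit parameter so it stays self-contained.

```typescript
const SYSTEM_TENANT_ID = '__SYSTEM__';

// Simplified stand-in: the tenant is passed in rather than read from ALS.
function scopedCacheKey(baseKey: string, tenantId?: string): string {
  // No tenant context, or system context: fall through to the unscoped key.
  if (!tenantId || tenantId === SYSTEM_TENANT_ID) return baseKey;
  return `${baseKey}:${tenantId}`;
}
```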

* fix: revert flow key tenant scoping, fix SSE test timing

FlowStateManager: Reverts tenant-scoped flow keys. OAuth callbacks
arrive without tenant ALS context (provider redirects don't carry
X-Tenant-Id), so completeFlow/failFlow would never find flows
created under tenant context. Flow IDs are random UUIDs with no
collision risk, and flow data is ephemeral (TTL-bounded).

SSE tests: Use process.nextTick for onDone callback so Express
response headers are flushed before res.write/res.end are called.

* fix: restore getTenantId import for completeFlow diagnostic log

* fix: correct completeFlow warning message, add missing flow test

The warning referenced X-Tenant-Id header consistency which was only
relevant when flow keys were tenant-scoped (since reverted). Updated
to list actual causes: TTL expiry, missing flow, or routing to a
different instance without shared Keyv storage.

Removed the getTenantId() call and import — no longer needed since
flow keys are unscoped.

Added test for the !flowState branch in completeFlow — verifies
return false and logger.warn on nonexistent flow ID.

* fix: add explicit return type to recursive updateInterfacePermissions

The recursive call (tenantId branch calls itself without tenantId)
causes TypeScript to infer circular return type 'any'. Adding
explicit Promise<void> satisfies the rollup typescript plugin.

* fix: update MCPOAuthRaceCondition test to match new completeFlow warning

* fix: clearEndpointConfigCache deletes both scoped and unscoped keys

Unauthenticated /api/endpoints requests populate the unscoped
ENDPOINT_CONFIG key. Admin config mutations clear only the
tenant-scoped key, leaving the unscoped entry stale indefinitely.
Now deletes both when in tenant context.

* fix: tenant guard on abort/status endpoints, warn logs, test coverage

F1: Add tenant guard to /chat/status/:conversationId and /chat/abort
matching the existing guard on /chat/stream/:streamId. The status
endpoint exposes aggregatedContent (AI response text) which requires
tenant-level access control.

F2: preAuthTenantMiddleware now logs warn for rejected __SYSTEM__
sentinel and malformed tenant IDs, providing observability for
bypass probing attempts.

F3: Abort fallback path (getActiveJobIdsForUser) now has tenant
check after resolving the job.

F4: Test for strict mode + SYSTEM_TENANT_ID — verifies runAsSystem
bypasses tenantSafeBulkWrite without throwing in strict mode.

F5: Test for job with tenantId + user without tenantId → 403.

F10: Regex uses idiomatic hyphen-at-start form.

F11: Test descriptions changed from "rejects" to "ignores" since
middleware calls next() (not 4xx).

Also fixes MCPOAuthRaceCondition test assertion to match updated
completeFlow warning message.

* fix: test coverage for logger.warn, status/abort guards, consistency

A: preAuthTenant spec now mocks logger and asserts warn calls for
__SYSTEM__ sentinel, malformed characters, and oversized headers.

B: streamTenant spec expanded with status and abort endpoint tests —
cross-tenant status returns 403, same-tenant returns 200 with body,
cross-tenant abort returns 403.

C: Abort endpoint uses req.user.tenantId (not req.user?.tenantId)
matching stream/status pattern — requireJwtAuth guarantees req.user.

D: Malformed header warning now includes ip in log metadata,
matching the sentinel warning for consistent SOC correlation.

* fix: assert ip field in malformed header warn tests

* fix: parallelize cache deletes, document tenant guard, fix import order

- clearEndpointConfigCache uses Promise.all for independent cache
  deletes instead of sequential awaits
- SSE stream tenant guard has inline comment explaining backward-compat
  behavior for untenanted legacy jobs
- conversation.ts local imports reordered longest-to-shortest per
  AGENTS.md

* fix: tenant-qualify userJobs keys, document tenant guard backward-compat

Job store userJobs keys now include tenantId when available:
- Redis: stream:user:{tenantId:userId}:jobs (falls back to
  stream:user:{userId}:jobs when no tenant)
- InMemory: composite key tenantId:userId in userJobMap
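
The Redis key shape above could be produced by a helper along these lines. The function name is hypothetical, and the sketch treats the braces as literal characters in the key (as a Redis Cluster hash tag would be), which is an assumption about the notation.

```typescript
// Hypothetical helper; braces are emitted literally in the key.
function userJobsKey(userId: string, tenantId?: string): string {
  // Qualify the owner with the tenant when one is available.
  const owner = tenantId ? `${tenantId}:${userId}` : userId;
  return `stream:user:{${owner}}:jobs`;
}
```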

getActiveJobIdsByUser/getActiveJobIdsForUser accept optional tenantId
parameter, threaded through from req.user.tenantId at all call sites
(/chat/active and /chat/abort fallback).

Added inline comments on all three SSE tenant guards explaining the
backward-compat design: untenanted legacy jobs remain accessible
when the userId check passes.

* fix: parallelize cache deletes, document tenant guard, fix import order

Fix InMemoryJobStore.getActiveJobIdsByUser empty-set cleanup to use
the tenant-qualified userKey instead of bare userId — prevents
orphaned empty Sets accumulating in userJobMap for multi-tenant users.

Document cross-tenant staleness in clearEndpointConfigCache JSDoc —
other tenants' scoped keys expire via TTL, not active invalidation.

* fix: cleanup userJobMap leak, startup warning, DRY tenant guard, docs

F1: InMemoryJobStore.cleanup() now removes entries from userJobMap
before calling deleteJob, preventing orphaned empty Sets from
accumulating with tenant-qualified composite keys.

F2: Startup warning when TENANT_ISOLATION_STRICT is active — reminds
operators to configure reverse proxy to control X-Tenant-Id header.

F3: mergeAppTools JSDoc documents that tenant-scoped TOOLS keys are
not actively invalidated (matching clearEndpointConfigCache pattern).

F5: Abort handler getActiveJobIdsForUser call uses req.user.tenantId
(not req.user?.tenantId) — consistent with stream/status handlers.

F6: updateInterfacePermissions JSDoc clarifies SYSTEM_TENANT_ID
behavior — falls through to caller's ALS context.

F7: Extracted hasTenantMismatch() helper, replacing three identical
inline tenant guard blocks across stream/status/abort endpoints.
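
A sketch of what such a guard might look like, assuming it takes the two tenant IDs directly (the real signature is not shown in this log). It keeps the backward-compatible behavior described earlier: untenanted legacy jobs pass, while a tenanted job accessed by a user without a tenant is a mismatch.

```typescript
// Assumed signature; returns true when the request should get a 403.
function hasTenantMismatch(
  jobTenantId: string | undefined,
  userTenantId: string | undefined,
): boolean {
  // Untenanted legacy job: skip the tenant check entirely.
  if (!jobTenantId) return false;
  return jobTenantId !== userTenantId;
}
```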

F9: scopedCacheKey JSDoc documents both passthrough cases (no context
and SYSTEM_TENANT_ID context).

* fix: clean userJobMap in evictOldest — same leak as cleanup()
2026-03-28 16:43:50 -04:00
Danny Avila
935288f841
🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435)
* feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring

- Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with
  `enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains`
- Add `MCPServerSource` type ('yaml' | 'config' | 'user') and `source`
  field to `ParsedServerConfig`
- Change `dbSourced` determination from `!!config.dbId` to
  `config.source === 'user'` across MCPManager, ConnectionsRepository,
  UserConnectionManager, and MCPServerInspector
- Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB

* feat: three-layer MCPServersRegistry with config cache and lazy init

- Add `configCacheRepo` as third repository layer between YAML cache and
  DB for admin-defined config-source MCP servers
- Implement `ensureConfigServers()` that identifies config-override servers
  from resolved `getAppConfig()` mcpConfig, lazily inspects them, and
  caches parsed configs with `source: 'config'`
- Add `lazyInitConfigServer()` with timeout, stub-on-failure, and
  concurrent-init deduplication via `pendingConfigInits` map
- Extend `getAllServerConfigs()` with optional `configServers` param for
  three-way merge: YAML → Config → User
- Add `getServerConfig()` lookup through config cache layer
- Add `invalidateConfigCache()` for clearing config-source inspection
  results on admin config mutations
- Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on
  DB-stored servers in `addServer()` and `addServerStub()`
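
The concurrent-init deduplication mentioned above is commonly done by sharing one in-flight promise per key. The sketch below shows that general pattern; `initOnce` and the module-level `pending` map are illustrative names, not the registry's actual code.

```typescript
// In-flight initializations, keyed by an arbitrary cache key.
const pending = new Map<string, Promise<unknown>>();

// Runs `init` at most once per key while a previous call is still in flight.
function initOnce<T>(key: string, init: () => Promise<T>): Promise<T> {
  const existing = pending.get(key);
  if (existing) return existing as Promise<T>;

  const inFlight = init().then(
    (value) => {
      pending.delete(key); // allow a fresh init after success
      return value;
    },
    (error) => {
      pending.delete(key); // allow a retry after failure
      throw error;
    },
  );
  pending.set(key, inFlight);
  return inFlight;
}
```

Two callers that race on the same key receive the same promise, so the inspection runs once.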

* feat: wire tenant context into MCP controllers, services, and cache invalidation

- Resolve config-source servers via `getAppConfig({ role, tenantId })`
  in `getMCPTools()` and `getMCPServersList()` controllers
- Pass `ensureConfigServers()` results through `getAllServerConfigs()`
  for three-way merge of YAML + Config + User servers
- Add tenant/role context to `getMCPSetupData()` and connection status
  routes via `getTenantId()` from ALS
- Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin
  config mutations trigger re-inspection of config-source MCP servers

* feat: enforce tenantMcpPolicy on admin config mcpServers mutations

- Add `validateMcpServerPolicy()` helper that checks mcpServers against
  operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant,
  allowedTransports, allowedDomains)
- Wire validation into `upsertConfigOverrides` and `patchConfigField`
  handlers — rejects with 403 when policy is violated
- Infer transport type from config shape (command → stdio, url protocol
  → websocket/sse, type field → streamable-http)
- Validate server domains against policy allowlist when configured

* revert: remove tenantMcpPolicy schema and enforcement

The existing admin config CRUD routes already provide the mechanism
for granular MCP server prepopulation (groups, roles, users). The
tenantMcpPolicy gating adds unnecessary complexity that can be
revisited if needed in the future.

- Remove tenantMcpPolicy from mcpSettings Zod schema
- Remove validateMcpServerPolicy helper and TenantMcpPolicy interface
- Remove policy enforcement from upsertConfigOverrides and
  patchConfigField handlers

* test: update test assertions for source field and config-server wiring

- Use objectContaining in MCPServersRegistry reset test to account for
  new source: 'yaml' field on CACHE-stored configs
- Add getTenantId and ensureConfigServers mocks to MCP route tests
- Add getAppConfig mock to route test Config service mock
- Update getMCPSetupData assertion to expect second options argument
- Update getAllServerConfigs assertions for new configServers parameter

* fix: disconnect active connections when config-source servers are evicted

When admin config overrides change and config-source MCP servers are
removed, the invalidation now proactively disconnects active connections
for evicted servers instead of leaving them lingering until timeout.

- Return evicted server names from invalidateConfigCache()
- Disconnect app-level connections for evicted servers in
  clearMcpConfigCache() via MCPManager.appConnections.disconnect()

* fix: address code review findings (CRITICAL, MAJOR, MINOR)

CRITICAL fixes:
- Scope configCacheRepo keys by config content hash to prevent
  cross-tenant cache poisoning when two tenants define the same
  server name with different configurations
- Change dbSourced checks from `source === 'user'` to
  `source !== 'yaml' && source !== 'config'` so undefined source
  (pre-upgrade cached configs) fails closed to restricted mode

MAJOR fixes:
- Derive OAuth servers from already-computed mcpConfig instead of
  calling getOAuthServers() separately — config-source OAuth servers
  are now properly detected
- Add parseInt radix (10) and NaN guard with fallback to 30_000
  for CONFIG_SERVER_INIT_TIMEOUT_MS
- Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in
  ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Remove `if (role || tenantId)` guard in getMCPSetupData — config
  servers now always resolve regardless of tenant context
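
For the timeout parsing fix, a minimal sketch of the radix-plus-NaN-guard pattern is shown below. The helper name and the way the raw value is passed in are assumptions; only the radix of 10 and the 30_000 fallback come from the notes above.

```typescript
const DEFAULT_INIT_TIMEOUT_MS = 30_000;

// Hypothetical helper: parses an env-style string with an explicit radix.
function parseTimeoutMs(
  raw: string | undefined,
  fallback: number = DEFAULT_INIT_TIMEOUT_MS,
): number {
  const parsed = parseInt(raw ?? '', 10);
  // parseInt yields NaN for missing or non-numeric input.
  return Number.isNaN(parsed) ? fallback : parsed;
}
```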

MINOR fixes:
- Extract resolveAllMcpConfigs() helper in mcp controller to
  eliminate 3x copy-pasted config resolution boilerplate
- Distinguish "not initialized" from real errors in
  clearMcpConfigCache — log actual failures instead of swallowing
- Remove narrative inline comments per style guide
- Remove dead try/catch inside Promise.allSettled in
  ensureConfigServers (inner method never throws)
- Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll()
  calls per request

Test updates:
- Add ensureConfigServers mock to registry test fixtures
- Update getMCPSetupData assertions for inline OAuth derivation

* fix: address code review findings (CRITICAL, MAJOR, MINOR)

CRITICAL fixes:
- Break circular dependency: move CONFIG_CACHE_NAMESPACE from
  MCPServersRegistry to ServerConfigsCacheFactory
- Fix dbSourced fail-closed: use source field when present, fall back to
  legacy dbId check when absent (backward-compatible with pre-upgrade
  cached configs that lack source field)

MAJOR fixes:
- Add CONFIG_CACHE_NAMESPACE to aggregate-key set in
  ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests)
  covering lazy init, stub-on-failure, cross-tenant isolation via config
  hash keys, concurrent deduplication, merge order, and cache invalidation

MINOR fixes:
- Update MCPServerInspector test assertion for dbSourced change

* fix: restore getServerConfig lookup for config-source servers (NEW-1)

Add configNameToKey map that indexes server name → hash-based cache key
for O(1) lookup by name in getServerConfig. This restores the config
cache layer that was dropped when hash-based keys were introduced.

Without this fix, config-source servers appeared in tool listings
(via getAllServerConfigs) but getServerConfig returned undefined,
breaking all connection and tool call paths.

- Populate configNameToKey in ensureSingleConfigServer
- Clear configNameToKey in invalidateConfigCache and reset
- Clear stale read-through cache entries after lazy init
- Remove dead code in invalidateConfigCache (config.title, key parsing)
- Add getServerConfig tests for config-source server lookup

* fix: eliminate configNameToKey race via caller-provided configServers param

Replace the process-global configNameToKey map (last-writer-wins under
concurrent multi-tenant load) with a configServers parameter on
getServerConfig. Callers pass the pre-resolved config servers map
directly — no shared mutable state, no cross-tenant race.

- Add optional configServers param to getServerConfig; when provided,
  returns matching config directly without any global lookup
- Remove configNameToKey map entirely (was the source of the race)
- Extract server names from cache keys via lastIndexOf in
  invalidateConfigCache (safe for names containing colons)
- Use mcpConfig[serverName] directly in getMCPTools instead of a
  redundant getServerConfig call
- Add cross-tenant isolation test for getServerConfig
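
The `lastIndexOf` extraction can be sketched as follows, assuming cache keys of the form `<serverName>:<hash>` (the exact key layout is not shown here, so treat it as an assumption). Splitting on the last colon keeps server names that themselves contain colons intact.

```typescript
// Assumes keys shaped like `<serverName>:<hash>`.
function serverNameFromCacheKey(key: string): string {
  const separator = key.lastIndexOf(':');
  // No separator: treat the whole key as the name.
  return separator === -1 ? key : key.slice(0, separator);
}
```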

* fix: populate read-through cache after config server lazy init

After lazyInitConfigServer succeeds, write the parsed config to
readThroughCache keyed by serverName so that getServerConfig calls
from ConnectionsRepository, UserConnectionManager, and
MCPManager.callTool find the config without needing configServers.

Without this, config-source servers appeared in tool listings but
every connection attempt and tool call returned undefined.

* fix: user-scoped getServerConfig fallback to server-only cache key

When getServerConfig is called with a userId (e.g., from callTool or
UserConnectionManager), the cache key is serverName::userId. Config-source
servers are cached under the server-only key (no userId). Add a fallback
so user-scoped lookups find config-source servers in the read-through cache.

* fix: configCacheRepo fallback, isUserSourced DRY, cross-process race

CRITICAL: Add findInConfigCache fallback in getServerConfig so
config-source servers remain reachable after readThroughCache TTL
expires (5s). Without this, every tool call after 5s returned
undefined for config-source servers.

MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace
all 5 inline dbSourced ternary expressions (MCPManager x2,
ConnectionsRepository, UserConnectionManager, MCPServerInspector).
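
One plausible shape for such a helper, following the rule stated earlier in this log (use `source` when present, fall back to the legacy `dbId` check when absent). The `SourcedConfig` interface is a trimmed-down stand-in for `ParsedServerConfig`, and the body is a sketch rather than the code in mcp/utils.ts.

```typescript
type MCPServerSource = 'yaml' | 'config' | 'user';

// Trimmed-down stand-in for the real parsed config type.
interface SourcedConfig {
  source?: MCPServerSource;
  dbId?: string;
}

function isUserSourced(config: SourcedConfig): boolean {
  // Prefer the explicit source tag when the config carries one.
  if (config.source !== undefined) return config.source === 'user';
  // Pre-upgrade cached configs lack `source`; fall back to the dbId check.
  return !!config.dbId;
}
```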

MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when
configCacheRepo.add throws (key exists from another process), fall
back to reading the existing entry instead of returning undefined.

MINOR: Parallelize invalidateConfigCache awaits with Promise.all.
Remove redundant .catch(() => {}) inside Promise.allSettled.
Tighten dedup test assertion to toBe(1).
Add TTL-expiry tests for getServerConfig (with and without userId).

* feat: thread configServers through getAppToolFunctions and formatInstructionsForContext

Add optional configServers parameter to getAppToolFunctions,
getInstructions, and formatInstructionsForContext so config-source
server tools and instructions are visible to agent initialization
and context injection paths.

Existing callers (boot-time init, tests) pass no argument and
continue to work unchanged. Agent runtime paths can now thread
resolved config servers from request context.

* fix: stale failure stubs retry after 5 min, upsert for cross-process races

- Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried
  instead of permanently disabling config-source servers after transient
  errors (DNS outage, cold-start race)
- Extract upsertConfigCache() helper that tries add then falls back to
  update, preventing cross-process Redis races where a second instance's
  successful inspection result was discarded
- Add test for stale-stub retry after CONFIG_STUB_RETRY_MS

* fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup

- Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so
  CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs
  were always considered stale (updatedAt ?? 0 → epoch → always expired)
- Add null guard for rawConfig in MCPManager.callTool before passing to
  preProcessGraphTokens — prevents unsafe `as` cast on undefined
- Log double-failure in upsertConfigCache instead of silently swallowing
- Replace module-scope Date.now monkey-patch with jest.useFakeTimers /
  jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests
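
The staleness bug is easy to see in a small sketch. `FailureStub` and `isStubStale` are illustrative names; the 5-minute window and the `updatedAt ?? 0` fallback are the parts taken from the description above.

```typescript
const CONFIG_STUB_RETRY_MS = 5 * 60 * 1000;

interface FailureStub {
  updatedAt?: number;
}

// A stub is stale once the retry window has elapsed since it was stamped.
function isStubStale(stub: FailureStub, now: number): boolean {
  // Without a stamp, `?? 0` means "written at the epoch", so it is always stale.
  return now - (stub.updatedAt ?? 0) >= CONFIG_STUB_RETRY_MS;
}
```

With `updatedAt` stamped at failure time, a stub written one minute ago is no longer retried on every request.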

* fix: server-only readThrough fallback only returns truthy values

Prevents a cached undefined from a prior no-userId lookup from
short-circuiting the DB query on a subsequent userId-scoped lookup.

* fix: remove findInConfigCache to eliminate cross-tenant config leakage

The findInConfigCache prefix scan (serverName:*) could return any
tenant's config after readThrough TTL expires, violating tenant
isolation. Config-source servers are now ONLY resolvable through:

1. The configServers param (callers with tenant context from ALS)
2. The readThrough cache (populated by ensureSingleConfigServer,
   5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs)

Connection/tool-call paths without tenant context rely exclusively on
the readThrough cache. If it expires before the next HTTP request
repopulates it, the server is not found — which is correct because
there is no tenant context to determine which config to return.

- Remove findInConfigCache method and its call in getServerConfig
- Update server-only readThrough fallback to only return truthy values
  (prevents cached undefined from short-circuiting user-scoped DB lookup)
- Update tests to document tenant isolation behavior after cache expiry

* style: fix import order per AGENTS.md conventions

Sort package imports shortest-to-longest, local imports longest-to-shortest
across MCPServersRegistry, ConnectionsRepository, MCPManager,
UserConnectionManager, and MCPServerInspector.

* fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures

Thread pre-resolved serverConfig from tool creation context into
callTool, removing dependency on the readThrough cache for config-source
servers. This fixes two issues:

- Cross-tenant contamination: the readThrough cache key was unscoped
  (just serverName), so concurrent multi-tenant requests for same-named
  servers would overwrite each other's entries
- TTL expiry: tool calls happening >5s after config resolution would
  fail with "Configuration not found" because the readThrough entry
  had expired

Changes:
- Add optional serverConfig param to MCPManager.callTool — uses
  provided config directly, falling back to getServerConfig lookup
  for YAML/user servers
- Thread serverConfig from createMCPTool through createToolInstance
  closure to callTool
- Remove readThrough write from ensureSingleConfigServer — config-source
  servers are only accessible via configServers param (tenant-scoped)
- Remove server-only readThrough fallback from getServerConfig
- Increase config cache hash from 8 to 16 hex chars (64-bit)
- Add isUserSourced boundary tests for all source/dbId combinations
- Fix double Object.keys call in getMCPTools controller
- Update test assertions for new getServerConfig behavior

* fix: cache base configs for config-server users; narrow upsertConfigCache error handling

- Refactor getAllServerConfigs to separate base config fetch (YAML + DB)
  from config-server layering. Base configs are cached via readThroughCacheAll
  regardless of whether configServers is provided, eliminating uncached
  MongoDB queries per request for config-server users
- Narrow upsertConfigCache catch to duplicate-key errors only;
  infrastructure errors (Redis timeouts, network failures) now propagate
  instead of being silently swallowed, preventing inspection storms
  during outages

* fix: restore correct merge order and document upsert error matching

- Restore YAML → Config → User DB precedence in getAllServerConfigs
  (user DB servers have highest precedence, matching the JSDoc contract)
- Add source comment on upsertConfigCache duplicate-key detection
  linking to the two cache implementations that define the error message
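
The precedence rule amounts to a layered merge where later layers win on name collisions. A minimal sketch, with a simplified config type and a hypothetical function name:

```typescript
type ServerConfigs = Record<string, { source: 'yaml' | 'config' | 'user' }>;

// Later spreads override earlier ones: YAML, then Config, then User DB.
function mergeServerConfigs(
  yaml: ServerConfigs,
  config: ServerConfigs,
  userDb: ServerConfigs,
): ServerConfigs {
  return { ...yaml, ...config, ...userDb };
}
```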

* feat: complete config-source server support across all execution paths

Wire configServers through the entire agent execution pipeline so
config-source MCP servers are fully functional — not just visible in
listings but executable in agent sessions.

- Thread configServers into handleTools.js agent tool pipeline: resolve
  config servers from tenant context before MCP tool iteration, pass to
  getServerConfig, createMCPTools, and createMCPTool
- Thread configServers into agent instructions pipeline:
  applyContextToAgent → getMCPInstructionsForServers →
  formatInstructionsForContext, resolved in client.js before agent
  context application
- Add configServers param to createMCPTool and createMCPTools for
  reconnect path fallback
- Add source field to redactServerSecrets allowlist for client UI
  differentiation of server tiers
- Narrow invalidateConfigCache to only clear readThroughCacheAll (merged
  results), preserving YAML individual-server readThrough entries
- Update context.spec.ts assertions for new configServers parameter

* fix: add missing mocks for config-source server dependencies in client.test.js

Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added
to client.js but not reflected in the test file's jest.mock declarations.

* fix: update formatInstructionsForContext assertions for configServers param

The test assertions expected formatInstructionsForContext to be called with
only the server names array, but it now receives configServers as a second
argument after the config-source server feature wiring.

* fix: move configServers resolution before MCP tool loop to avoid TDZ

configServers was declared with `let` after the first tool loop but
referenced inside it via getServerConfig(), causing a ReferenceError
temporal dead zone. Move declaration and resolution before the loop,
using tools.some(mcpToolPattern) to gate the async resolution.

* fix: address review findings — cache bypass, discoverServerTools gap, DRY

- #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via
  readThroughCacheAll) instead of bypassing it when configServers is present.
  Extracts user-DB entries from cached base by diffing against YAML keys
  to maintain YAML → Config → User DB merge order without extra MongoDB calls.

- #3: Add configServers param to ToolDiscoveryOptions and thread it through
  discoverServerTools → getServerConfig so config-source servers are
  discoverable during OAuth reconnection flows.

- #6: Replace inline import() type annotations in context.ts with proper
  import type { ParsedServerConfig } per AGENTS.md conventions.

- #7: Extract resolveConfigServers(req) helper in MCP.js and use it from
  handleTools.js and client.js, eliminating the duplicated 6-line config
  resolution pattern.

- #10: Restore removed "why" comment explaining getLoaded() vs getAll()
  choice in getMCPSetupData — documents non-obvious correctness constraint.

- #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs.

* fix: consolidate imports, reorder constants, fix YAML-DB merge edge case

- Merge duplicate @librechat/data-schemas requires in MCP.js into one
- Move resolveConfigServers after module-level constants
- Fix getAllServerConfigs edge case where user-DB entry overriding a
  YAML entry with the same name was excluded from userDbConfigs; now
  uses reference equality check to detect DB-overwritten YAML keys

* fix: replace fragile string-match error detection with proper upsert method

Add upsert() to IServerConfigsRepositoryInterface and all implementations
(InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle
error message string match ('already exists in cache') in upsertConfigCache
that was the only thing preventing cross-process init races from silently
discarding inspection results.

Each implementation handles add-or-update atomically:
- InMemory: direct Map.set()
- Redis: direct cache.set()
- RedisAggregateKey: read-modify-write under write lock
- DB: delegates to update() (DB servers use explicit add() with ACL setup)
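
As a rough illustration of the InMemory case only, assuming a simplified synchronous interface (the real repositories are presumably async, and Repo/InMemoryRepo are hypothetical names):

```typescript
interface Repo<T> {
  add(name: string, value: T): void; // throws when the key already exists
  upsert(name: string, value: T): void; // add-or-update, never throws on conflict
  get(name: string): T | undefined;
}

class InMemoryRepo<T> implements Repo<T> {
  private readonly store = new Map<string, T>();

  add(name: string, value: T): void {
    if (this.store.has(name)) {
      throw new Error(`Server "${name}" already exists in cache`);
    }
    this.store.set(name, value);
  }

  // Callers no longer need to catch add() and inspect the error message.
  upsert(name: string, value: T): void {
    this.store.set(name, value);
  }

  get(name: string): T | undefined {
    return this.store.get(name);
  }
}
```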

* fix: wire configServers through remaining HTTP endpoints

- getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig
- reinitialize route: resolve configServers before getServerConfig
- auth-values route: resolve configServers before getServerConfig
- getOAuthHeaders: accept configServers param, thread from callers
- Update mcp.spec.js tests to mock getAllServerConfigs for GET by name

* fix: thread serverConfig through getConnection for config-source servers

Config-source servers exist only in configCacheRepo, not in YAML cache or
DB. When callTool → getConnection → getUserConnection → getServerConfig
runs without configServers, getServerConfig returns undefined and the call
throws. Fix by
threading the pre-resolved serverConfig (providedConfig) from callTool
through getConnection → getUserConnection → createUserConnectionInternal,
using it as a fallback before the registry lookup.

* fix: thread configServers through reinit, reconnect, and tool definition paths

Wire configServers through every remaining call chain that creates or
reconnects MCP server connections:

- reinitMCPServer: accepts serverConfig and configServers, uses them for
  getServerConfig fallback, getConnection, and discoverServerTools
- reconnectServer: accepts and passes configServers to reinitMCPServer
- createMCPTools/createMCPTool: pass configServers to reconnectServer
- ToolService.loadToolDefinitionsWrapper: resolves configServers from req,
  passes to both reinitMCPServer call sites
- reinitialize route: passes serverConfig and configServers to reinitMCPServer

* fix: address review findings — simplify merge, harden error paths, fix log labels

- Simplify getAllServerConfigs merge: replace fragile reference-equality
  loop with direct spread { ...yamlConfigs, ...configServers, ...base }
- Guard upsertConfigCache in lazyInitConfigServer catch block so cache
  failures don't mask the original inspection error
- Deduplicate getYamlServerNames cold-start with promise dedup pattern
- Remove dead `if (!mcpConfig)` guard in getMCPSetupData
- Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error
  messages — now uses this.namespace for correct Config/App labeling
- Remove misleading OAuth callback comment about readThrough cache
- Move resolveConfigServers after module-level constants in MCP.js

* fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label

- Clear yamlServerNamesPromise on rejection so transient cache errors
  don't permanently prevent ensureConfigServers from working
- Skip reinspectServer for config-source servers (source: 'config') in
  reinitMCPServer — they lack a CACHE/DB storage location; retry is
  handled by CONFIG_STUB_RETRY_MS in ensureConfigServers
- Use source field instead of dbId for storageLocation derivation
- Fix remaining hardcoded "App" in reset() leaderCheck message
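
The promise handling in the first bullet can be sketched as follows. This is an assumption-laden illustration: `pending` and `getNamesOnce` are hypothetical names, not the registry's actual fields.

```typescript
// Shared in-flight (or settled) promise; null means "not started".
let pending: Promise<string[]> | null = null;

function getNamesOnce(load: () => Promise<string[]>): Promise<string[]> {
  if (!pending) {
    pending = load().catch((err: unknown) => {
      // Without this reset, one transient failure would be cached forever.
      pending = null;
      throw err;
    });
  }
  return pending;
}
```

Concurrent callers share one load, and a rejected load is retried by the next caller instead of replaying the cached rejection.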

* fix: persist oauthHeaders in flow state for config-source OAuth servers

The OAuth callback route has no JWT auth context and cannot resolve
config-source server configs. Previously, getOAuthHeaders would silently
return {} for config-source servers, dropping custom token exchange headers.

Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow
initiation (which has auth context), and the callback reads them from
the stored flow state with a fallback to the registry lookup for
YAML/user-DB servers.

* fix: update tests for getMCPSetupData null guard removal and ToolService mock

- MCP.spec.js: update test to expect graceful handling of null mcpConfig
  instead of a throw (getAllServerConfigs always returns an object)
- MCP.js: add defensive || {} for Object.entries(mcpConfig) in case of
  null from test mocks
- ToolService.spec.js: add missing mock for ~/server/services/MCP
  (resolveConfigServers)

* fix: address review findings — DRY, naming, logging, dead code, defensive guards

- #1: Simplify getAllServerConfigs to single getBaseServerConfigs call,
  eliminating redundant double-fetch of cacheConfigsRepo.getAll()
- #2: Add warning log when oauthHeaders absent from OAuth callback flow state
- #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller
  imports shared helper instead of reimplementing
- #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider
  in createToolInstance — these are actively used, not unused
- #5: Log rejected results from ensureConfigServers Promise.allSettled
  so cache errors are visible instead of silently dropped
- #6: Remove dead 'MCP config not found' error handlers from routes
- #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache
- #8: Remove logger.error from withTimeout to prevent double-logging timeouts
- #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message
- #12: Use spread instead of mutation in addServer for immutability consistency
- Add upsert mock to ensureConfigServers.test.ts DB mock
- Update route tests for resolveAllMcpConfigs import change

* fix: restore correct merge priority, use immutable spread, fix test mock

- getAllServerConfigs: { ...configServers, ...base } so userDB wins over
  configServers, matching documented "User DB (highest)" priority
- lazyInitConfigServer: use immutable spread instead of direct mutation
  for parsedConfig.source, consistent with addServer fix
- Fix test to mock getAllServerConfigs as {} instead of null, remove
  unnecessary || {} defensive guard in getMCPSetupData
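
The priority follows from object spread semantics, where later spreads win on key collisions. A small sketch with made-up entries (the Cfg type and server names are hypothetical):

```typescript
type Cfg = { url: string; source: 'yaml' | 'config' | 'user' };

const configServers: Record<string, Cfg> = {
  shared: { url: 'https://config.example.invalid', source: 'config' },
  onlyConfig: { url: 'https://only-config.example.invalid', source: 'config' },
};

// `base` stands in for the cached YAML + user-DB entries.
const base: Record<string, Cfg> = {
  shared: { url: 'https://user.example.invalid', source: 'user' },
};

// `base` is spread last, so its entry for `shared` overrides configServers.
const merged: Record<string, Cfg> = { ...configServers, ...base };
```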

* fix: error handling, stable hashing, flatten nesting, remove dead param

- Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with
  graceful {} fallback so transient DB/cache errors don't crash tool pipeline
- Sort keys in configCacheKey JSON.stringify for deterministic hashing
  regardless of object property insertion order
- Flatten clearMcpConfigCache from 3 nested try-catch to early returns;
  document that user connections are cleaned up lazily (accepted tradeoff)
- Remove dead configServers param from getAppToolFunctions (never passed)
- Add security rationale comment for source field in redactServerSecrets

* fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision

The array replacer in JSON.stringify acts as a property allowlist at
every nesting depth, silently dropping nested keys like headers['X-API-Key'],
oauth.client_secret, etc. Two configs with different nested values but
identical top-level structure produced the same hash, causing cross-tenant
cache hits and potential credential contamination.

Switch to a function replacer that recursively sorts keys at all depths
without dropping any properties.
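
The difference between the two replacer forms can be seen in a standalone sketch. The sample configs and function names are hypothetical, and the hashing step applied to the resulting string is omitted:

```typescript
const tenantA = { url: 'https://x.example.invalid', headers: { 'X-API-Key': 'tenant-a' } };
const tenantB = { url: 'https://x.example.invalid', headers: { 'X-API-Key': 'tenant-b' } };

// Array replacer: the sorted top-level keys act as an allowlist at EVERY
// depth, so nested keys such as 'X-API-Key' are silently dropped.
function allowlistKey(value: object): string {
  return JSON.stringify(value, Object.keys(value).sort());
}

// Function replacer: rebuild each plain object with sorted keys and keep
// every property; arrays and primitives pass through unchanged.
function sortedKey(value: unknown): string {
  return JSON.stringify(value, (_key, val: unknown) => {
    if (val === null || typeof val !== 'object' || Array.isArray(val)) {
      return val;
    }
    const obj = val as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const k of Object.keys(obj).sort()) {
      sorted[k] = obj[k];
    }
    return sorted;
  });
}
```

With the array form both tenants serialize to the same string; with the function form they differ, while key insertion order still has no effect on the output.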

Also document the known gap in getOAuthServers: config-source OAuth
servers are not covered by auto-reconnection or uninstall cleanup
because callers lack request context.

* fix: move clearMcpConfigCache to packages/api to eliminate circular dependency

The function only depends on MCPServersRegistry and MCPManager, both of
which live in packages/api. Import it directly from @librechat/api in
the CJS layer instead of using dynamic require('~/config').

* chore: imports/fields ordering

* fix: address review findings — error handling, targeted lookup, test gaps

- Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so
  getAppConfig/getAllServerConfigs failures propagate instead of masking
  infrastructure errors as empty server lists.
- Use targeted getServerConfig in getMCPServerById instead of fetching
  all server configs for a single-server lookup.
- Forward configServers to inner createMCPTool calls so reconnect path
  works for config-source servers.
- Update getAllServerConfigs JSDoc to document disjoint-key design.
- Add OAuth callback oauthHeaders fallback tests (flow state present
  vs registry fallback).
- Add resolveConfigServers/resolveAllMcpConfigs unit tests covering
  happy path and error propagation.

* fix: add getOAuthReconnectionManager mock to OAuth callback tests

* chore: imports ordering
2026-03-28 10:36:43 -04:00
Dustin Healy
5972a21479
🪪 feat: Admin Roles API Endpoints (#12400)
* feat: add createRole and deleteRole methods to role

* feat: add admin roles handler factory and Express routes

* fix: address convention violations in admin roles handlers

* fix: rename createRole/deleteRole to avoid AccessRole name collision

The existing accessRole.ts already exports createRole/deleteRole for the
AccessRole model. In createMethods index.ts, these are spread after
roleMethods, overwriting them. Renamed our Role methods to
createRoleByName/deleteRoleByName to match the existing pattern
(getRoleByName, updateRoleByName) and avoid the collision.

* feat: add description field to Role model

- Add description to IRole, CreateRoleRequest, UpdateRoleRequest types
- Add description field to Mongoose roleSchema (default: '')
- Wire description through createRoleHandler and updateRoleHandler
- Include description in listRoles select clause so it appears in list

* fix: address Copilot review findings in admin roles handlers

* test: add unit tests for admin roles and groups handlers

* test: add data-layer tests for createRoleByName, deleteRoleByName, listUsersByRole

* fix: allow system role updates when name is unchanged

The updateRoleHandler guard rejected any request where body.name matched
a system role, even when the name was not being changed. This blocked
editing a system role's description. Compare against the URL param to
only reject actual renames to reserved names.

* fix: address external review findings for admin roles

- Block renaming system roles (ADMIN/USER) and add user migration on rename
- Add input validation: name max-length, trim on update, duplicate name check
- Replace fragile String.includes error matching with prefix-based classification
- Catch MongoDB 11000 duplicate key in createRoleByName
- Add pagination (limit/offset/total) to getRoleMembersHandler
- Reverse delete order in deleteRoleByName — reassign users before deletion
- Add role existence check in removeRoleMember; drop unused createdAt select
- Add Array.isArray guard for permissions input; use consistent ?? coalescing
- Fix import ordering per AGENTS.md conventions
- Type-cast mongoose.models.User as Model<IUser> for proper TS inference
- Add comprehensive tests: rename guards, pagination, validation, 500 paths

* fix: address re-review findings for admin roles

- Gate deleteRoleByName on existence check — skip user reassignment and
  cache invalidation when role doesn't exist (fixes test mismatch)
- Reverse rename order: migrate users before renaming role so a migration
  failure leaves the system in a consistent state
- Add .sort({ _id: 1 }) to listUsersByRole for deterministic pagination
- Import shared AdminMember type from data-schemas instead of local copy;
  make joinedAt optional since neither groups nor roles populate it
- Change IRole.description from optional to required to match schema default
- Add data-layer tests for updateUsersByRole and countUsersByRole
- Add handler test verifying users-first rename ordering and migration
  failure safety

* fix: add rollback on rename failure and update PR description

- Roll back user migration if updateRoleByName returns null during a
  rename (race: role deleted between existence check and update)
- Add test verifying rollback calls updateUsersByRole in reverse
- Update PR #12400 description to reflect current test counts (56
  handler tests, 40 data-layer tests) and safety features

* fix: rollback on rename throw, description validation, delete/DRY cleanup

- Hoist isRename/trimmedName above try block so catch can roll back user
  migration when updateRoleByName throws (not just returns null)
- Add description type + max-length (2000) validation in create and update,
  consistent with groups handler
- Remove redundant getRoleByName existence check in deleteRoleHandler —
  use deleteRoleByName return value directly
- Skip no-op name write when body.name equals current name (use isRename)
- Extract getUserModel() accessor to DRY repeated Model<IUser> casts
- Use name.trim() consistently in createRoleByName error messages
- Add tests: rename-throw rollback, description validation (create+update),
  update delete test mocks to match simplified handler

* fix: guard spurious rollback, harden createRole error path, validate before DB calls

- Add migrationRan flag to prevent rollback of user migration that never ran
- Return generic message on 500 in createRoleHandler, specific only for 409
- Move description validation before DB queries in updateRoleHandler
- Return existing role early when update body has no changes
- Wrap cache.set in createRoleByName with try/catch to prevent masking DB success
- Add JSDoc on 11000 catch explaining compound unique index
- Add tests: spurious rollback guard, empty update body, description validation
  ordering, listUsersByRole pagination

* fix: validate permissions in create, RoleConflictError, rollback safety, cache consistency

- Add permissions type/array validation in createRoleHandler
- Introduce RoleConflictError class replacing fragile string-prefix matching
- Wrap rollback in !role null path with try/catch for correct 404 response
- Wrap deleteRoleByName cache.set in try/catch matching createRoleByName
- Narrow updateRoleHandler body type to { name?, description? }
- Add tests: non-string description in create, rollback failure logging,
  permissions array rejection, description max-length assertion fix

* feat: prevent removing the last admin user

Add guard in removeRoleMember that checks countUsersByRole before
demoting an ADMIN user, returning 400 if they are the last one.

* fix: move interleaved export below imports, add await to countUsersByRole

* fix: paginate listRoles, null-guard permissions handler, fix export ordering

- Add limit/offset/total pagination to listRoles matching the groups pattern
- Add countRoles data-layer method
- Omit permissions from listRoles select (getRole returns full document)
- Null-guard re-fetched role in updateRolePermissionsHandler
- Move interleaved export below all imports in methods/index.ts

* fix: address review findings — race safety, validation DRY, type accuracy, test coverage

- Add post-write admin count verification in removeRoleMember to prevent
  zero-admin race condition (TOCTOU → rollback if count hits 0)
- Make IRole.description optional; backfill in initializeRoles for
  pre-existing roles that lack the field (.lean() bypasses defaults)
- Extract parsePagination, validateNameParam, validateRoleName, and
  validateDescription helpers to eliminate duplicated validation
- Add validateNameParam guard to all 7 handlers reading req.params.name
- Catch 11000 in updateRoleByName and surface as 409 via RoleConflictError
- Add idempotent skip in addRoleMember when user already has target role
- Verify updateRolePermissions test asserts response body
- Add data-layer tests: listRoles sort/pagination/projection, countRoles,
  and createRoleByName 11000 duplicate key race

* fix: defensive rollback in removeRoleMember, type/style cleanup, test coverage

- Wrap removeRoleMember post-write admin rollback in try/catch so a
  transient DB failure cannot leave the system with zero administrators
- Replace double `as unknown[] as IRole[]` cast with `.lean<IRole[]>()`
- Type parsePagination param explicitly; extract DEFAULT/MAX page constants
- Preserve original error cause in updateRoleByName re-throw
- Add test for rollback failure path in removeRoleMember (returns 400)
- Add test for pre-existing roles missing description field (.lean())

* chore: bump @librechat/data-schemas to 0.0.47

* fix: stale cache on rename, extract renameRole helper, shared pagination, cleanup

- Fix updateRoleByName cache bug: invalidate old key and populate new key
  when updates.name differs from roleName (prevents stale cache after rename)
- Extract renameRole helper to eliminate mutable outer-scope state flags
  (isRename, trimmedName, migrationRan) in updateRoleHandler
- Unify system-role protection to 403 for both rename-from and rename-to
- Extract parsePagination to shared admin/pagination.ts; use in both
  roles.ts and groups.ts
- Extract name.trim() to local const in createRoleByName (was called 5×)
- Remove redundant findOne pre-check in deleteRoleByName
- Replace getUserModel closure with local const declarations
- Remove redundant description ?? '' in createRoleHandler (schema default)
- Add doc comment on updateRolePermissionsHandler noting cache dependency
- Add data-layer tests for cache rename behavior (old key null, new key set)

* fix: harden role guards, add User.role index, validate names, improve tests

- Add index on User.role field for efficient member queries at scale
- Replace fragile SystemRoles key lookup with value-based Set check (6 sites)
- Elevate rename rollback failure logging to CRITICAL (matches removeRoleMember)
- Guard removeRoleMember against non-ADMIN system roles (403 for USER)
- Fix parsePagination limit=0 gotcha: use parseInt + NaN check instead of ||
- Add control character and reserved path segment validation to role names
- Simplify validateRoleName: remove redundant casts and dead conditions
- Add JSDoc to deleteRoleByName documenting non-atomic window
- Split mixed value+type import in methods/index.ts per AGENTS.md
- Add 9 new tests: permissions assertion, combined rename+desc, createRole
  with permissions, pagination edge cases, control char/reserved name
  rejection, system role removeRoleMember guard
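
The limit=0 gotcha comes from `||` treating 0 as falsy. A minimal sketch, assuming default and maximum values of 50 and 200 and a lower clamp of 1 (none of which are confirmed by this commit):

```typescript
const DEFAULT_LIMIT = 50; // assumed
const MAX_LIMIT = 200; // assumed

// Old style: an explicit "0" is falsy, so it silently becomes the default.
function parseLimitWithOr(raw: string | undefined): number {
  return Math.min(Number(raw) || DEFAULT_LIMIT, MAX_LIMIT);
}

// New style: only a missing or non-numeric value falls back to the default.
function parseLimit(raw: string | undefined): number {
  const parsed = parseInt(raw ?? '', 10);
  if (Number.isNaN(parsed)) {
    return DEFAULT_LIMIT;
  }
  return Math.min(Math.max(parsed, 1), MAX_LIMIT);
}
```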

* fix: exact-case reserved name check, consistent validation, cleaner createRole

- Remove .toLowerCase() from reserved name check so only exact matches
  (members, permissions) are rejected, not legitimate names like "Members"
- Extract trimmed const in validateRoleName for consistent validation
- Add control char check to validateNameParam for parity with body validation
- Build createRole roleData conditionally to avoid passing description: undefined
- Expand deleteRoleByName JSDoc documenting self-healing design and no-op trade-off

* fix: scope rename rollback to only migrated users, prevent cross-role corruption

Capture user IDs before forward migration so the rollback path only
reverts users this request actually moved. Previously the rollback called
updateUsersByRole(newName, currentName) which would sweep all users with
the new role — including any independently assigned by a concurrent admin
request — causing silent cross-role data corruption.

Adds findUserIdsByRole and updateUsersRoleByIds to the data layer.
Extracts rollbackMigratedUsers helper to deduplicate rollback sites.
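
A simplified in-memory sketch of the scoped rollback. The two helper names come from this commit, but their signatures and the synchronous array-backed store are assumptions:

```typescript
interface User {
  id: string;
  role: string;
}

const users: User[] = [
  { id: 'u1', role: 'editor' },
  { id: 'u2', role: 'editor' },
];

const findUserIdsByRole = (role: string): string[] =>
  users.filter((u) => u.role === role).map((u) => u.id);

const updateUsersRoleByIds = (ids: string[], role: string): void => {
  for (const u of users) {
    if (ids.includes(u.id)) u.role = role;
  }
};

// 1. Snapshot the affected users, then migrate them forward.
const migrated = findUserIdsByRole('editor');
updateUsersRoleByIds(migrated, 'writer');

// 2. A concurrent admin request independently assigns u3 to the new role.
users.push({ id: 'u3', role: 'writer' });

// 3. The rename fails: roll back only the snapshot, leaving u3 untouched.
updateUsersRoleByIds(migrated, 'editor');
```

A role-wide rollback (every 'writer' back to 'editor') would have moved u3 as well.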

* fix: guard last admin in addRoleMember to prevent zero-admin lockout

Since each user has exactly one role, addRoleMember implicitly removes
the user from their current role. Without a guard, reassigning the sole
admin to a non-admin role leaves zero admins and locks out admin
management. Adds the same countUsersByRole check used in removeRoleMember.

* fix: wire findUserIdsByRole and updateUsersRoleByIds into roles route

The scoped rollback deps added in c89b5db were missing from the route
DI wiring, so renameRole invoked an undefined function and returned a 500.

* fix: post-write admin guard in addRoleMember, compound role index, review cleanup

- Add post-write admin count check + rollback to addRoleMember to match
  removeRoleMember's two-phase TOCTOU protection (prevents zero-admin via
  concurrent requests)
- Replace single-field User.role index with compound { role: 1, tenantId: 1 }
  to align with existing multi-tenant index pattern (email, OAuth IDs)
- Narrow listRoles dep return type to RoleListItem (projected fields only)
- Refactor validateDescription to early-return style per AGENTS.md
- Remove redundant double .lean() in updateRoleByName
- Document rename snapshot race window in renameRole JSDoc
- Document cache null-set behavior in deleteRoleByName
- Add routing-coupling comment on RESERVED_ROLE_NAMES
- Add test for addRoleMember post-write rollback

* fix: review cleanup — system-role guard, type safety, JSDoc accuracy, tests

- Add system-role guard to addRoleMember: block direct assignment to
  non-ADMIN system roles (403), symmetric with removeRoleMember
- Fix RESERVED_ROLE_NAMES comment: explain semantic URL ambiguity, not
  a routing conflict (Express resolves single vs multi-segment correctly)
- Replace _id: unknown with Types.ObjectId | string per AGENTS.md
- Narrow listRoles data-layer return type to Pick<IRole, 'name' | 'description'>
  to match the actual .select() projection
- Move updateRoleHandler param check inside try/catch for consistency
- Include user IDs in all CRITICAL rollback failure logs for operator recovery
- Clarify deleteRoleByName JSDoc: replace "self-healing" with "idempotent",
  document that recovery requires caller retry
- Add tests: system-role guard, promote non-admin to ADMIN,
  findUserIdsByRole throw prevents migration

* fix: include _id in listRoles return type to match RoleListItem

Pick<IRole, 'name' | 'description'> omits _id, making it incompatible
with the handler dep's RoleListItem which requires _id.

* fix: case-insensitive system role guard, reject null permissions, check updateUser result

- System role name checks now use case-insensitive comparison via
  toUpperCase() — prevents creating 'admin' or 'user' which would
  collide with the legacy roles route that uppercases params
- Reject permissions: null in createRole (typeof null === 'object'
  was bypassing the validation)
- Check updateUser return in addRoleMember — return 404 if the user
  was deleted between the findUser and updateUser calls
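
The null bypass can be shown with two hypothetical guards (the function names are illustrative, not the handler's actual code):

```typescript
// Naive check: `typeof null` is 'object', so null slips through.
function isPermissionsObjectNaive(value: unknown): boolean {
  return typeof value === 'object' && !Array.isArray(value);
}

// Hardened check: reject null explicitly before accepting an object.
function isPermissionsObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}
```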

* fix: check updateUser return in removeRoleMember for concurrent delete safety

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-27 15:44:47 -04:00
Dustin Healy
2e3d66cfe2
👥 feat: Admin Groups API Endpoints (#12387)
* feat: add listGroups and deleteGroup methods to userGroup

* feat: add admin groups handler factory and Express routes

* fix: address convention violations in admin groups handlers

* fix: address Copilot review findings in admin groups handlers

- Escape regex in listGroups to prevent injection/ReDoS
- Validate ObjectId format in all handlers accepting id/userId params
- Replace N+1 findUser loop with batched findUsers query
- Remove unused findGroupsByMemberId from dep interface
- Map Mongoose ValidationError to 400 in create/update handlers
- Validate name in updateGroupHandler (reject empty/whitespace)
- Handle null updateGroupById result (race condition)
- Tighten error message matching in add/remove member handlers
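
For the regex escaping in the first bullet, a commonly used helper looks like the following. This is a generic sketch; the helper actually used in listGroups may differ.

```typescript
// Escape every character that has special meaning in a RegExp source.
function escapeRegExp(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive search pattern from untrusted input.
function buildSearchPattern(search: string): RegExp {
  return new RegExp(escapeRegExp(search), 'i');
}
```

Unescaped, a search for `a.c` would also match `abc`, and input such as `(a+)+$` would be compiled as a pattern rather than matched literally.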

* test: add unit tests for admin groups handlers

* fix: address code review findings for admin groups

- Make delete/update handlers atomic (single DB trip)
- Pass through idOnTheSource
- Add removeMemberById for non-ObjectId members
- Deduplicate member results
- Fix error message exposure
- Add hard cap/sort to listGroups
- Replace GroupListFilter with Pick of GroupFilterOptions
- Validate memberIds as array
- Trim name in update
- Fix import order
- Improve test hygiene with fresh IDs per test

* fix: cascade cleanup, pagination, and test coverage for admin groups

Add deleteGrantsForPrincipal to systemGrant data layer and wire cascade
cleanup (Config, AclEntry, SystemGrant) into deleteGroupHandler. Add
limit/offset pagination to getGroupMembers. Guard empty PATCH bodies with
400. Remove dead type guard and unnecessary type cast. Add 11 new tests
covering cascade delete, idempotent member removal, empty update, search
filter, 500 error paths, and pagination.

* fix: harden admin groups with cascade resilience, type safety, and fallback removal

Wrap cascade cleanup in inner try/catch so partial failure logs but still
returns 200 (group is already deleted). Replace Record<string, unknown> on
deleteAclEntries with proper typed filter. Log warning for unmapped user
ObjectIds in createGroup memberIds. Add removeMemberById fallback when
removeUserFromGroup throws User not found for ObjectId-format userId.
Extract VALID_GROUP_SOURCES constant. Add 3 new tests (60 total).

* refactor: add countGroups, pagination, and projection type to data layer

Extract buildGroupQuery helper, add countGroups method, support
limit/offset/skip in listGroups, standardize session handling to
.session(session ?? null), and tighten projection parameter from
Record<string, unknown> to Record<string, 0 | 1>.

* fix: cascade resilience, pagination, validation, and error clarity for admin groups

- Use Promise.allSettled for cascade cleanup so all steps run even if
  one fails; log individual rejections
- Echo deleted group id in delete response
- Add countGroups dep and wire limit/offset pagination for listGroups
- Deduplicate memberIds before computing total in getGroupMembers
- Use { memberIds: 1 } projection in getGroupMembers
- Cap memberIds at 500 entries in createGroup
- Reject search queries exceeding 200 characters
- Clarify addGroupMember error for non-ObjectId userId
- Document deleted-user fallback limitation in removeGroupMember
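
The cascade pattern from the first bullet, sketched with a hypothetical CleanupStep type and cascadeCleanup helper (the real handler logs rejections rather than returning them):

```typescript
type CleanupStep = { label: string; run: () => Promise<void> };

// Run every step even if some fail, and report each rejection.
async function cascadeCleanup(steps: CleanupStep[]): Promise<string[]> {
  const results = await Promise.allSettled(steps.map((step) => step.run()));
  const failures: string[] = [];
  results.forEach((result, i) => {
    if (result.status === 'rejected') {
      failures.push(`${steps[i]?.label ?? 'unknown'}: ${String(result.reason)}`);
    }
  });
  return failures;
}
```

Unlike Promise.all, which rejects as soon as one step fails, Promise.allSettled waits for every step and exposes each outcome.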

* test: extend handler and DB-layer test coverage for admin groups

Handler tests: projection assertion, dedup total, memberIds cap,
search max length, non-ObjectId memberIds passthrough, cascade partial
failure resilience, dedup scenarios, echo id in delete response.

DB-layer tests: listGroups sort/filter/pagination, countGroups,
deleteGroup, removeMemberById, deleteGrantsForPrincipal.

* fix: cast group principalId to ObjectId for ACL entry cleanup

deleteAclEntries is a thin deleteMany wrapper with no type casting,
but grantPermission stores group principalId as ObjectId. Passing the
raw string from req.params would leave orphaned ACL entries on group
deletion.

* refactor: remove redundant pagination clamping from DB listGroups

Handler already clamps limit/offset at the API boundary. The DB
method is a general-purpose building block and should not re-validate.

* fix: add source and name validation, import order, and test coverage for admin groups

- Validate source against VALID_GROUP_SOURCES in createGroupHandler
- Cap name at 500 characters in both create and update handlers
- Document total as upper bound in getGroupMembers response
- Document ObjectId requirement for deleteAclEntries in cascade
- Fix import ordering in test file (local value after type imports)
- Add tests for updateGroup with description, email, avatar fields
- Add tests for invalid source and name max-length in both handlers

* fix: add field length caps, flatten nested try/catch, and fix logger level in admin groups

Add max-length validation for description, email, avatar, and
idOnTheSource in create/update handlers. Extract removeObjectIdMember
helper to flatten nested try/catch per never-nesting convention. Downgrade
unmapped-memberIds log from error to warn. Fix type import ordering and
add missing await in removeMemberById for consistency.
2026-03-26 17:36:18 -04:00
Danny Avila
9f6d8c6e93
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation

Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.

- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
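
A minimal sketch of the three paths above, using Node's AsyncLocalStorage. The Req/Res shapes are trimmed stand-ins for Express types, and the factory name, strict flag, and getTenantId body are assumptions, not the actual middleware:

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface TenantContext {
  tenantId: string;
}

const tenantStorage = new AsyncLocalStorage<TenantContext>();

interface Req {
  user?: { tenantId?: string };
}
interface Res {
  status(code: number): Res;
  json(body: unknown): void;
}
type Next = () => void;

function createTenantContextMiddleware(strict: boolean) {
  return (req: Req, res: Res, next: Next): void => {
    // Unauthenticated request: no-op.
    if (!req.user) return next();
    const tenantId = req.user.tenantId;
    if (!tenantId) {
      if (strict) {
        res.status(403).json({ error: 'Tenant context required' });
        return;
      }
      // Non-strict mode: pass through without a tenant context.
      return next();
    }
    // Everything called from next() can read the tenant via getTenantId().
    return tenantStorage.run({ tenantId }, next);
  };
}

const getTenantId = (): string | undefined => tenantStorage.getStore()?.tenantId;
```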

* feat: register tenant middleware and wrap startup/auth in runAsSystem()

- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
  before user context is established (LDAP, SAML, OpenID, social login, AuthService)

* feat: thread tenantId through all getAppConfig callers

Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.

Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.

Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint

* feat: add config cache invalidation on admin mutations

- Add clearOverrideCache(tenantId?) to flush per-principal override caches
  by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
  caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
  (upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)

* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering

The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.

* fix: replace runAsSystem with baseOnly for pre-tenant code paths

App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.

All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.

* fix: address PR review findings — middleware ordering, types, cache safety

- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
  instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)

* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly

- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
  initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
  queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)

* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests

- requireJwtAuth: 5 tests verifying ALS tenant context is set after
  passport auth, isolated between concurrent requests, and not set
  when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
  tenantId is threaded through, partial failure is handled gracefully,
  and operations run in parallel via Promise.all (Finding 11)

* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping

- Forward passport errors in requireJwtAuth before entering tenant
  middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
  are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
  so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
  base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
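
The namespacing mismatch described above can be sketched as follows. The namespace and prefix strings are the ones quoted in this message; the helper name `selectOverrideKeys` and the sample keys are hypothetical and are not taken from the repository.

```typescript
// Hypothetical helper: pick override entries out of a list of stored keys.
const NAMESPACE = 'APP_CONFIG';
const OVERRIDE_PREFIX = '_OVERRIDE_:';

function selectOverrideKeys(storedKeys: readonly string[]): string[] {
  // Stored keys carry the Keyv namespace, so the prefix must include it.
  const prefix = `${NAMESPACE}:${OVERRIDE_PREFIX}`;
  const matches: string[] = [];
  for (const key of storedKeys) {
    // startsWith (not includes) avoids matching the prefix mid-key.
    if (key.startsWith(prefix)) {
      matches.push(key);
    }
  }
  return matches;
}
```

Matching on the bare `_OVERRIDE_:` prefix would select nothing from a namespaced store, which is the failure mode this fix addresses.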

* fix: address second review — cache safety, code quality, test reliability

- Decouple cache invalidation from mutation response: fire-and-forget
  with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
  side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
  env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock

* fix: move JSDoc to correct function after clearEndpointConfigCache extraction

* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry

Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.

* fix: remove return from tenantStorage.run to satisfy void middleware signature

* fix: address second review — cache safety, code quality, test reliability

- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
  so partial failures are logged individually instead of producing one
  undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
  flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
  if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
  clearAppConfigCache rejects (not just the swallowed endpoint error)
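
A minimal sketch of the Promise.allSettled pattern described above, assuming each cache is represented by a named async clearer. The `Clearer` type, `invalidateAll`, and the log format are illustrative names, not the actual invalidateConfigCaches implementation.

```typescript
// Hypothetical shape: one named async operation per cache.
type Clearer = { name: string; run: () => Promise<void> };

// Runs every clearer in parallel and reports each failure on its own line.
// Returns the number of clearers that rejected.
async function invalidateAll(
  clearers: Clearer[],
  logError: (message: string) => void,
): Promise<number> {
  const results = await Promise.allSettled(clearers.map((c) => c.run()));
  let failures = 0;
  results.forEach((result, i) => {
    if (result.status === 'rejected') {
      failures += 1;
      logError(`[invalidateAll] ${clearers[i].name} failed: ${String(result.reason)}`);
    }
  });
  return failures;
}
```

Unlike Promise.all, which rejects with the first error only, this form lets every operation finish and attributes each failure to the cache that caused it.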

* chore: reorder imports in index.js and app.js for consistency

- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
2026-03-26 17:35:00 -04:00
Danny Avila
4b6d68b3b5
🎛️ feat: DB-Backed Per-Principal Config System (#12354)
* feat: Add Config schema, model, and methods for role-based DB config overrides

Add the database foundation for principal-based configuration overrides
(user, group, role) in data-schemas. Includes schema with tenantId and
tenant isolation, CRUD methods, and barrel exports.

* 🔧 fix: Add shebang and enforce LF line endings for git hooks

The pre-commit hook was missing #!/bin/sh, and core.autocrlf=true was
converting it to CRLF, both causing "Exec format error" on Windows.
Add .gitattributes to force LF for .husky/* and *.sh files.

* feat: Add admin config API routes with section-level capability checks

Add /api/admin/config endpoints for managing per-principal config
overrides (user, group, role). Handlers in @librechat/api use DI pattern
with section-level hasConfigCapability checks for granular access control.

Supports full overrides replacement, per-field PATCH via dot-paths, field
deletion, toggle active, and listing.

* 🐛 fix: Move deleteConfigField fieldPath from URL param to request body

The path-to-regexp wildcard syntax (:fieldPath(*)) is not supported by
the version used in Express. Send fieldPath in the DELETE request body
instead, which also avoids URL-encoding issues with dotted paths.

* feat: Wire config resolution into getAppConfig with override caching

Add mergeConfigOverrides utility in data-schemas for deep-merging DB
config overrides into base AppConfig by priority order.

Update getAppConfig to query DB for applicable configs when role/userId
is provided, with short-TTL caching and a hasAnyConfigs feature flag
for zero-cost when no DB configs exist.

Also: add unique compound index on Config schema, pass userId from
config middleware, and signal config changes from admin API handlers.

* 🔄 refactor: Extract getAppConfig logic into packages/api as TS service

Move override resolution, caching strategy, and signalConfigChange from
api/server/services/Config/app.js into packages/api/src/app/appConfigService.ts
using the DI factory pattern (createAppConfigService). The JS file becomes
a thin wiring layer injecting loadBaseConfig, cache, and DB dependencies.

* 🧹 chore: Rename configResolution.ts to resolution.ts

* feat: Move admin types & capabilities to librechat-data-provider

Move SystemCapabilities, CapabilityImplications, and utility functions
(hasImpliedCapability, expandImplications) from data-schemas to
data-provider so they are available to external consumers like the
admin panel without a data-schemas dependency.

Add API-friendly admin types: TAdminConfig, TAdminSystemGrant,
TAdminAuditLogEntry, TAdminGroup, TAdminMember, TAdminUserSearchResult,
TCapabilityCategory, and CAPABILITY_CATEGORIES.

data-schemas re-exports these from data-provider and extends with
config-schema-derived types (ConfigSection, SystemCapability union).

Bump version to 0.8.500.

* feat: Add JSON-serializable admin config API response types to data-schemas

Add AdminConfig, AdminConfigListResponse, AdminConfigResponse, and
AdminConfigDeleteResponse types so both LibreChat API handlers and the
admin panel can share the same response contract. Bump version to 0.0.41.

* refactor: Move admin capabilities & types from data-provider to data-schemas

SystemCapabilities, CapabilityImplications, utility functions,
CAPABILITY_CATEGORIES, and admin API response types should not be in
data-provider as it gets compiled into the frontend bundle, exposing
the capability surface. Moved everything to data-schemas (server-only).

All consumers already import from @librechat/data-schemas, so no
import changes needed elsewhere. Consolidated duplicate AdminConfig
type (was in both config.ts and admin.ts).

* chore: Bump @librechat/data-schemas to 0.0.42

* refactor: Reorganize admin capabilities into admin/ and types/admin.ts

Split systemCapabilities.ts following data-schemas conventions:
- Types (BaseSystemCapability, SystemCapability, AdminConfig, etc.)
  → src/types/admin.ts
- Runtime code (SystemCapabilities, CapabilityImplications, utilities)
  → src/admin/capabilities.ts

Revert data-provider version to 0.8.401 (no longer modified).

* chore: Fix import ordering, rename appConfigService to service

- Rename app/appConfigService.ts → app/service.ts (directory provides context)
- Fix import order in admin/config.ts, types/admin.ts, types/config.ts
- Add naming convention to AGENTS.md

* feat: Add DB base config support (role/__base__)

- Add BASE_CONFIG_PRINCIPAL_ID constant for reserved base config doc
- getApplicableConfigs always includes __base__ in queries
- getAppConfig queries DB even without role/userId when DB configs exist
- Bump @librechat/data-schemas to 0.0.43

* fix: Address PR review issues for admin config

- Add listAllConfigs method; listConfigs endpoint returns all active
  configs instead of only __base__
- Normalize principalId to string in all config methods to prevent
  ObjectId vs string mismatch on user/group lookups
- Block __proto__ and all dunder-prefixed segments in field path
  validation to prevent prototype pollution
- Fix configVersion off-by-one: default to 0, guard pre('save') with
  !isNew, use $inc on findOneAndUpdate
- Remove unused getApplicableConfigs from admin handler deps

* fix: Enable tree-shaking for data-schemas, bump packages

- Switch data-schemas Rollup output to preserveModules so each source
  file becomes its own chunk; consumers (admin panel) can now import
  just the modules they need without pulling in winston/mongoose/etc.
- Add sideEffects: false to data-schemas package.json
- Bump data-schemas to 0.0.44, data-provider to 0.8.402

* feat: add capabilities subpath export to data-schemas

Adds `@librechat/data-schemas/capabilities` subpath export so browser
consumers can import BASE_CONFIG_PRINCIPAL_ID and capability constants
without pulling in Node.js-only modules (winston, async_hooks, etc.).

Bump version to 0.0.45.

* fix: include dist/ in data-provider npm package

Add explicit files field so npm includes dist/types/ in the published
package. Without this, the root .gitignore exclusion of dist/ causes
npm to omit type declarations, breaking TypeScript consumers.

* chore: bump librechat-data-provider to 0.8.403

* feat: add GET /api/admin/config/base for raw AppConfig

Returns the full AppConfig (YAML + DB base merged) so the admin panel
can display actual config field values and structure. The startup config
endpoint (/api/config) returns TStartupConfig which is a different shape
meant for the frontend app.

* chore: imports order

* fix: address code review findings for admin config

Critical:
- Fix clearAppConfigCache: was deleting from wrong cache store (CONFIG_STORE
  instead of APP_CONFIG), now clears BASE and HAS_DB_CONFIGS keys
- Eliminate race condition: patchConfigField and deleteConfigField now use
  atomic MongoDB $set/$unset with dot-path notation instead of
  read-modify-write cycles, removing the lost-update bug entirely
- Add patchConfigFields and unsetConfigField atomic DB methods
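
To illustrate the atomic dot-path approach, here is a sketch that builds the update documents for MongoDB's `$set` and `$unset` operators. It assumes overrides live under an `overrides` field on the config document; that field name, `PatchEntry`, and both builder functions are assumptions made for illustration only.

```typescript
// Hypothetical entry shape for a per-field patch request.
interface PatchEntry {
  fieldPath: string;
  value: unknown;
}

// Assumed location of the overrides object inside the config document.
const OVERRIDES_ROOT = 'overrides';

// One $set document covering every patched field, keyed by dot-path.
function buildSetUpdate(entries: PatchEntry[]): { $set: Record<string, unknown> } {
  const $set: Record<string, unknown> = {};
  for (const { fieldPath, value } of entries) {
    $set[`${OVERRIDES_ROOT}.${fieldPath}`] = value;
  }
  return { $set };
}

// $unset ignores the value; the empty string is the conventional placeholder.
function buildUnsetUpdate(fieldPath: string): { $unset: Record<string, string> } {
  return { $unset: { [`${OVERRIDES_ROOT}.${fieldPath}`]: '' } };
}
```

Because the server applies such an update in a single operation, two concurrent patches to different fields cannot overwrite each other the way two read-modify-write cycles can.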

Major:
- Reorder cache check before principal resolution in getAppConfig so
  getUserPrincipals DB query only fires on cache miss
- Replace '' as ConfigSection with typed BROAD_CONFIG_ACCESS constant
- Parallelize capability checks with Promise.all instead of sequential
  awaits in for loops
- Use loose equality (== null) for cache miss check to handle both null
  and undefined returns from cache implementations
- Set HAS_DB_CONFIGS_KEY to true on successful config fetch

Minor:
- Remove dead pre('save') hook from config schema (all writes use
  findOneAndUpdate which bypasses document hooks)
- Consolidate duplicate type imports in resolution.ts
- Remove dead deepGet/deepSet/deepUnset functions (replaced by atomic ops)
- Add .sort({ priority: 1 }) to getApplicableConfigs query
- Rename _impliedBy to impliedByMap

* fix: self-referencing BROAD_CONFIG_ACCESS constant

* fix: replace type-cast sentinel with proper null parameter

Update hasConfigCapability to accept ConfigSection | null where null
means broad access check (MANAGE_CONFIGS or READ_CONFIGS only).
Removes the '' as ConfigSection type lie from admin config handlers.

* fix: remaining review findings + add tests

- listAllConfigs accepts optional { isActive } filter so admin listing
  can show inactive configs (#9)
- Standardize session application to .session(session ?? null) across
  all config DB methods (#15)
- Export isValidFieldPath and getTopLevelSection for testability
- Add 38 tests across 3 spec files:
  - config.spec.ts (api): path validation, prototype pollution rejection
  - resolution.spec.ts: deep merge, priority ordering, array replacement
  - config.spec.ts (data-schemas): full CRUD, ObjectId normalization,
    atomic $set/$unset, configVersion increment, toggle, __base__ query

* fix: address second code review findings

- Fix cross-user cache contamination: overrideCacheKey now handles
  userId-without-role case with its own cache key (#1)
- Add broad capability check before DB lookup in getConfig to prevent
  config existence enumeration (#2/#3)
- Move deleteConfigField fieldPath from request body to query parameter
  for proxy/load balancer compatibility (#5)
- Derive BaseSystemCapability from SystemCapabilities const instead of
  manual string union (#6)
- Return 201 on upsert creation, 200 on update (#11)
- Remove inline narration comments per AGENTS.md (#12)
- Type overrides as Partial<TCustomConfig> in DB methods and handler
  deps (#13)
- Replace double as-unknown-as casts in resolution.ts with generic
  deepMerge<T> (#14)
- Make override cache TTL injectable via AppConfigServiceDeps (#16)
- Add exhaustive never check in principalModel switch (#17)

* fix: remaining review findings — tests, rename, semantics

- Rename signalConfigChange → markConfigsDirty with JSDoc documenting
  the stale-window tradeoff and overrideCacheTtl knob
- Fix DEFAULT_OVERRIDE_CACHE_TTL naming convention
- Add createAppConfigService tests (14 cases): cache behavior, feature
  flag, cross-user key isolation, fallback on error, markConfigsDirty
- Add admin handler integration tests (13 cases): auth ordering,
  201/200 on create/update, fieldPath from query param, markConfigsDirty
  calls, capability checks

* fix: global flag corruption + empty overrides auth bypass

- Remove HAS_DB_CONFIGS_KEY=false optimization: a scoped query returning
  no configs does not mean no configs exist globally. Setting the flag
  false from a per-principal query short-circuited all subsequent users.
- Add broad manage capability check before section checks in
  upsertConfigOverrides: empty overrides {} no longer bypasses auth.

* test: add regression and invariant tests for config system

Regression tests:
- Bug 1: User A's empty result does not short-circuit User B's overrides
- Bug 2: Empty overrides {} returns 403 without MANAGE_CONFIGS

Invariant tests (applied across ALL handlers):
- All 5 mutation handlers call markConfigsDirty on success
- All 5 mutation handlers return 401 without auth
- All 5 mutation handlers return 403 without capability
- All 3 read handlers return 403 without capability

* fix: third review pass — all findings addressed

Service (service.ts):
- Restore HAS_DB_CONFIGS=false for base-only queries (no role/userId)
  so deployments with zero DB configs skip DB queries (#1)
- Resolve cache once at factory init instead of per-invocation (#8)
- Use BASE_CONFIG_PRINCIPAL_ID constant in overrideCacheKey (#10)
- Add JSDoc to clearAppConfigCache documenting stale-window (#4)
- Fix log message to not say "from YAML" (#14)

Admin handlers (config.ts):
- Use configVersion===1 for 201 vs 200, eliminating TOCTOU race (#2)
- Add Array.isArray guard on overrides body (#5)
- Import CapabilityUser from capabilities.ts, remove duplicate (#6)
- Replace as-unknown-as cast with targeted type assertion (#7)
- Add MAX_PATCH_ENTRIES=100 cap on entries array (#15)
- Reorder deleteConfigField to validate principalType first (#12)
- Export CapabilityUser from middleware/capabilities.ts

DB methods (config.ts):
- Remove isActive:true from patchConfigFields to prevent silent
  reactivation of disabled configs (#3)

Schema (config.ts):
- Change principalId from Schema.Types.Mixed to String (#11)

Tests:
- Add patchConfigField unsafe fieldPath rejection test (#9)
- Add base-only HAS_DB_CONFIGS=false test (#1)
- Update 201/200 tests to use configVersion instead of findConfig (#2)

* fix: add read handler 401 invariant tests + document flag behavior

- Add invariant: all 3 read handlers return 401 without auth
- Document on markConfigsDirty that HAS_DB_CONFIGS stays true after
  all configs are deleted until clearAppConfigCache or restart

* fix: remove HAS_DB_CONFIGS false optimization entirely

getApplicableConfigs([]) only queries for __base__, not all configs.
A deployment with role/group configs but no __base__ doc gets the
flag poisoned to false by a base-only query, silently ignoring all
scoped overrides. The optimization is not safe without a comprehensive
Config.exists() check, which adds its own DB cost. Removed entirely.

The flag is now write-once-true (set when configs are found or by
markConfigsDirty) and only cleared by clearAppConfigCache/restart.

* chore: reorder import statements in app.js for clarity

* refactor: remove HAS_DB_CONFIGS_KEY machinery entirely

The three-state flag (false/null/true) was the source of multiple bugs
across review rounds. Every attempt to safely set it to false was
defeated by getApplicableConfigs querying only a subset of principals.

Removed: HAS_DB_CONFIGS_KEY constant, all reads/writes of the flag,
markConfigsDirty (now a no-op concept), notifyChange wrapper, and all
tests that seeded false manually.

The per-user/role TTL cache (overrideCacheTtl, default 60s) is the
sole caching mechanism. On cache miss, getApplicableConfigs queries
the DB. This is one indexed query per user per TTL window — acceptable
for the config override use case.

* docs: rewrite admin panel remaining work with current state

* perf: cache empty override results to avoid repeated DB queries

When getApplicableConfigs returns no configs for a principal, cache
baseConfig under their override key with TTL. Without this, every
user with no per-principal overrides hits MongoDB on every request
after the 60s cache window expires.

* fix: add tenantId to cache keys + reject PUBLIC principal type

- Include tenantId in override cache keys to prevent cross-tenant
  config contamination. Single-tenant deployments (tenantId undefined)
  use '_' as placeholder — no behavior change for them.
- Reject PrincipalType.PUBLIC in admin config validation — PUBLIC has
  no PrincipalModel and is never resolved by getApplicableConfigs,
  so config docs for it would be dead data.
- Config middleware passes req.user.tenantId to getAppConfig.
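
A rough sketch of a tenant-aware override cache key, assuming a simple delimited layout. The `_OVERRIDE_` prefix and the `'_'` placeholder come from the commit messages in this log; the segment layout and the name `sketchOverrideKey` are hypothetical and may differ from the real overrideCacheKey.

```typescript
// Placeholder used when no tenantId is present (single-tenant deployments).
const NO_TENANT = '_';

interface KeyParts {
  role?: string;
  userId?: string;
  tenantId?: string;
}

// Labels each segment so a role-only key can never equal a user-only key.
function sketchOverrideKey({ role, userId, tenantId }: KeyParts): string {
  const tenant = tenantId ?? NO_TENANT;
  return `_OVERRIDE_:${tenant}:r=${role ?? ''}:u=${userId ?? ''}`;
}
```

Placing the tenant segment first keeps all of one tenant's entries under a common prefix, and labelled segments keep the userId-without-role case distinct from the role-only case.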

* fix: fourth review pass findings

DB methods (config.ts):
- findConfigByPrincipal accepts { includeInactive } option so admin
  GET can retrieve inactive configs (#5)
- upsertConfig catches E11000 duplicate key on concurrent upserts and
  retries without upsert flag (#2)
- unsetConfigField no longer filters isActive:true, consistent with
  patchConfigFields (#11)
- Typed filter objects replace Record<string, unknown> (#12)

Admin handlers (config.ts):
- patchConfigField: serial broad capability check before Promise.all
  to pre-warm ALS principal cache, preventing N parallel DB calls (#3)
- isValidFieldPath rejects leading/trailing dots and consecutive
  dots (#7)
- Duplicate fieldPaths in patch entries return 400 (#8)
- DEFAULT_PRIORITY named constant replaces hardcoded 10 (#14)
- Admin getConfig and patchConfigField pass includeInactive to
  findConfigByPrincipal (#5)
- Route import uses barrel instead of direct file path (#13)
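
The path rules mentioned above (no leading, trailing, or consecutive dots, plus the earlier rejection of dunder-prefixed segments) can be sketched as a single segment check. This is an illustrative stand-in named `isValidFieldPathSketch`, not the exported isValidFieldPath, which may enforce additional rules.

```typescript
// Returns true only for dot-paths whose segments are all non-empty
// and none of which starts with a double underscore.
function isValidFieldPathSketch(fieldPath: string): boolean {
  if (fieldPath.length === 0) {
    return false;
  }
  // A leading, trailing, or doubled dot produces an empty segment.
  const segments = fieldPath.split('.');
  return segments.every((segment) => segment.length > 0 && !segment.startsWith('__'));
}
```

Splitting once and validating each segment covers all three dot cases with one rule, since each of them yields an empty string after the split.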

Resolution (resolution.ts):
- deepMerge has MAX_MERGE_DEPTH=10 guard to prevent stack overflow
  from crafted deeply nested configs (#4)

* fix: final review cleanup

- Remove ADMIN_PANEL_REMAINING.md (local dev notes with Windows paths)
- Add empty-result caching regression test
- Add tenantId to AdminConfigDeps.getAppConfig type
- Restore exhaustive never check in principalModel switch
- Standardize toggleConfigActive session handling to options pattern

* fix: validate priority in patchConfigField handler

Add the same non-negative number validation for priority that
upsertConfigOverrides already has. Without this, invalid priority
values could be stored via PATCH and corrupt merge ordering.

* chore: remove planning doc from PR

* fix: correct stale cache key strings in service tests

* fix: clean up service tests and harden tenant sentinel

- Remove no-op cache delete lines from regression tests
- Change no-tenant sentinel from '_' to '__default__' to avoid
  collision with a real tenant ID when multi-tenancy is enabled
- Remove unused CONFIG_STORE from AppConfigServiceDeps

* chore: bump @librechat/data-schemas to 0.0.46

* fix: block prototype-poisoning keys in deepMerge

Skip __proto__, constructor, and prototype keys during config merge
to prevent prototype pollution via PUT /api/admin/config overrides.
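
A minimal sketch of a guarded deep merge, combining the blocked keys named here with the MAX_MERGE_DEPTH=10 guard and array replacement mentioned in earlier commits. The function name, the `Plain` type, and the choice to replace rather than recurse once the depth limit is reached are assumptions, not the code in resolution.ts.

```typescript
type Plain = Record<string, unknown>;

const BLOCKED_KEYS = new Set(['__proto__', 'constructor', 'prototype']);
const MAX_MERGE_DEPTH = 10;

function isPlainObject(value: unknown): value is Plain {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Merges override into a copy of base without mutating either input.
function deepMergeSketch(base: Plain, override: Plain, depth = 0): Plain {
  const result: Plain = { ...base };
  for (const key of Object.keys(override)) {
    if (BLOCKED_KEYS.has(key)) {
      continue; // never copy prototype-poisoning keys
    }
    const current = result[key];
    const next = override[key];
    if (depth < MAX_MERGE_DEPTH && isPlainObject(current) && isPlainObject(next)) {
      result[key] = deepMergeSketch(current, next, depth + 1);
    } else {
      result[key] = next; // arrays, scalars, and over-deep values replace wholesale
    }
  }
  return result;
}
```

The blocked-key check matters because JSON.parse creates `__proto__` as an ordinary own property, so a crafted request body can carry it into a naive merge.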
2026-03-25 19:39:29 -04:00
Marco Beretta
ccd049d8ce
📁 refactor: Prompts UI (#11570)
* style: enhance prompts UI with new components and improved structure; add CreatePromptButton and AutoSendPrompt; refactor GroupSidePanel and PromptsAccordion

* refactor(Prompts): move button components to buttons/ subdirectory

* refactor(Prompts): move dialog components to dialogs/ subdirectory

* refactor(Prompts): move display components to display/ subdirectory

* refactor(Prompts): move editor components to editor/ subdirectory

* refactor(Prompts): move field components to fields/ subdirectory

* refactor(Prompts): move form components to forms/ subdirectory

* refactor(Prompts): move layout components to layouts/ subdirectory

* refactor(Prompts): move list components to lists/ subdirectory

* refactor(Prompts): move sidebar components to sidebar/ subdirectory

* refactor(Prompts): move utility components to utils/ subdirectory

* refactor(Prompts): update main exports and external imports

* refactor(Prompts): fix class name typo in AutoSendPrompt

* refactor(Prompts): reorganize exports and imports order across components

* refactor(Prompts): reorder exports for better organization and clarity

* refactor(Buttons): enhance prompts accessibility with aria-labels and update translations

* refactor(AdminSettings): reorganize imports and improve form structure for clarity

* refactor(Dialogs): reorganize imports for consistency and clarity across DeleteVersion, SharePrompt, and VariableDialog components

* refactor(Dialogs): enhance prompts accessibility with aria-labels

* refactor(Display): enhance prompt components and accessibility features

* refactor(.gitignore): add Playwright MCP directory

* refactor(Preview): enhance prompt components, improve layout, and add accessibility features

* refactor(Prompts): enhance variable handling, improve accessibility, and update UI components

* refactor(Prompts): enhance loading state handling and improve accessibility in PromptName component

* refactor(Prompts): streamline special variable handling, improve icon management, and enhance UI components

* refactor(Prompts): update AdvancedSwitch component to use Radio for mode selection, enhance PromptName with tooltips, and improve layout in PromptForm

* refactor(Prompts): enhance VersionCard and VersionBadge components for improved UI and accessibility, update loading state handling in VersionsPanel

* refactor(Prompts): improve layout and styling of VersionCard component for better visual alignment and clarity

* refactor(DeleteVersion): update text color for confirmation prompt in DeleteConfirmDialog

* refactor(Prompts): add configurations for always make production and auto-send prompts, update localization strings for clarity

* refactor(Prompts): enhance layout and styling in CategorySelector, CreatePromptForm, and List components for improved responsiveness and clarity

* refactor(Prompts): enhance PromptDetailHeader and ChatGroupItem components, add shared prompt indication, and remove unused PromptMetadata component

* refactor(Prompts): implement prompt group usage tracking, update sorting logic, and enhance related components

* fix(Prompts): security, performance, and pagination fixes

- Fix cursor pagination skipping/duplicating items by including
  numberOfGenerations in cursor condition to match sort order
- Close NoSQL injection vector via otherFilters rest spread in
  GET /all, GET /groups, and buildPromptGroupFilter
- Validate groupId as ObjectId before passing to query (GET /)
- Add prompt body validation in addPromptToGroup (type + text)
- Return 404 instead of 500 for missing group in POST /use
- Combine data + count into single $facet aggregation
- Add compound index {numberOfGenerations, updatedAt, _id}
- Add index on prompt.author for deleteUserPrompts
- Update useRecordPromptUsage to refresh client caches
- Replace console.error with logger.error
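
The cursor fix in the first bullet can be illustrated in memory: the cursor comparison has to use the same fields, in the same order, as the sort. The sort directions below (most used first, then most recently updated, then `_id` ascending) and the names `Row`, `compareRows`, and `nextPage` are assumptions for the sketch, not the actual aggregation.

```typescript
interface Row {
  _id: string;
  numberOfGenerations: number;
  updatedAt: number;
}

// Assumed sort order: usage desc, updatedAt desc, _id asc as a tiebreaker.
function compareRows(a: Row, b: Row): number {
  if (a.numberOfGenerations !== b.numberOfGenerations) {
    return b.numberOfGenerations - a.numberOfGenerations;
  }
  if (a.updatedAt !== b.updatedAt) {
    return b.updatedAt - a.updatedAt;
  }
  return a._id < b._id ? -1 : a._id > b._id ? 1 : 0;
}

// Returns the rows that sort strictly after the cursor, up to limit.
function nextPage(rows: Row[], limit: number, cursor?: Row): Row[] {
  const sorted = [...rows].sort(compareRows);
  const remaining = cursor ? sorted.filter((row) => compareRows(cursor, row) < 0) : sorted;
  return remaining.slice(0, limit);
}
```

If the cursor compared only updatedAt and `_id` while the sort led with numberOfGenerations, rows could be skipped or repeated across pages, which is the bug described above.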

* refactor(PromptForm): remove console warning for unselected prompt in VersionsPanel

* refactor(Prompts): improve error handling for groupId and streamline usage tracking

* refactor(.gitignore): add CLAUDE.md to ignore list

* refactor(Prompts): streamline prompt components by removing unused variables and enhancing props structure

* refactor(Prompts): fix sort stability, keyboard handling, and remove dead code

Add _id tiebreaker to prompt group sort pipelines for deterministic
pagination ordering. Prevent default browser scroll on Space key in
PromptEditor preview mode. Remove unused blurTimeoutRef and its
onMutate callback from DashGroupItem.

* refactor(Prompts): enhance groupId validation and improve prompt group aggregation handling

* fix: aria-hidden, API fixes, accessibility improvements

* fix: ACL author filter, mobile guard, semantic HTML, and add useFocusTrap hook

- Remove author filter from patchPromptGroup so ACL-granted editors
  can update prompt groups (aligns with deletePromptGroupController)
- Add missing group guard to mobile HeaderActions in PromptForm
- Replace div with article in DashGroupItem, remove redundant
  stopPropagation and onClick on outer container
- Add useFocusTrap hook for keyboard focus management
- Add numberOfGenerations to default projection
- Deduplicate ObjectId validation, remove console.warn,
  fix aria-labelledby, localize search announcements

* refactor(Prompts): adjust UI and improve a11y

* refactor(Prompts): reorder imports for consistency and clarity

* refactor(Prompts): implement updateFieldsInPlace for efficient data updates and add related tests

* refactor(Prompts): reorder imports to include updateFieldsInPlace for better organization

* refactor(Prompts): enhance DashGroupItem with toast notifications for prompt updates and add click-to-edit functionality in PromptEditor

* style: use self-closing TooltipAnchor in CreatePromptButton

Replace ></TooltipAnchor> with /> for consistency with the rest of the Prompts directory.

* fix(i18n): replace placeholder text for com_ui_global_group translation key

The value was left as 'something needs to go here. was empty' which
would be visible to users as an aria-label in DashGroupItem.

* fix(DashGroupItem): sync rename input with group.name on external changes

nameInputValue was initialized via useState(group.name) but never
synced when group.name changed from a background refetch. Added
useEffect that updates the input when the dialog is closed.

* perf(useFocusTrap): store onEscape in ref to avoid listener churn

onEscape was in the useEffect dependency array, causing the keydown
listener to be torn down and re-attached on every render when callers
passed an inline function. Now stored in a ref so the effect only
re-runs when active or containerRef changes.

* fix(a11y): replace role=button div with layered button overlay in ListCard

The card used role='button' on a div that contained nested Button
elements — an invalid ARIA pattern. Replaced with a hidden button
at z-0 for the card action while child interactive elements sit
at z-10, eliminating nested interactive element violations.

* fix(PromptForm): reset selectionIndex on route change, guard auto-save, and fix a11y

- Reset selectionIndex to 0 and isEditing to false when promptId
  changes, preventing out-of-bounds index when navigating between
  groups with different version counts.
- Track selectedPrompt in a ref so the auto-save effect doesn't
  fire against a stale prompt when the selection changed mid-edit.
- Stabilize useFocusTrap onEscape via useCallback to avoid
  unnecessary listener re-attachment.
- Conditionally render mobile overlay instead of always-present
  button with aria-hidden/pointer-events toggling.

* refactor: extract isValidObjectIdString to shared utility in data-schemas

The same regex helper was duplicated in api/server/routes/prompts.js
and packages/data-schemas/src/methods/prompt.ts. Moved to
packages/data-schemas/src/utils/objectId.ts and imported from both
consumers. Also removed a duplicate router.use block introduced
during the extraction.

* perf(updateFieldsInPlace): replace JSON deep clone with targeted spread

Instead of JSON.parse(JSON.stringify(data)) which serializes the
entire paginated data structure, use targeted immutable spreads
that only copy the affected page and collection array. Returns the
original data reference unchanged when the item is not found.
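
A sketch of the targeted-spread idea, assuming a paginated shape of `{ pages: [{ items: [...] }] }`. The `Page` and `Paginated` types and the function name `updateInPages` are hypothetical; the real updateFieldsInPlace signature is not shown in this log.

```typescript
interface Page<T> {
  items: T[];
}

interface Paginated<T> {
  pages: Page<T>[];
}

// Copies only the affected page and its items array; everything else is shared.
function updateInPages<T extends object>(
  data: Paginated<T>,
  updated: Partial<T>,
  idField: keyof T,
): Paginated<T> {
  const id = updated[idField];
  if (id == null) {
    return data; // nothing to match against
  }
  for (let p = 0; p < data.pages.length; p++) {
    const page = data.pages[p];
    const index = page.items.findIndex((item) => item[idField] === id);
    if (index === -1) {
      continue;
    }
    const items = [...page.items];
    items[index] = { ...page.items[index], ...updated };
    const pages = [...data.pages];
    pages[p] = { ...page, items };
    return { ...data, pages };
  }
  return data; // not found: same reference, so consumers see no change
}
```

Untouched pages keep their original references, so reference-equality checks in consumers only see a change for the page that actually changed.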

* perf(VariablesDropdown): memoize items array and stabilize handleAddVariable

The items array containing JSX elements was rebuilt on every render.
Wrapped in useMemo keyed on usedVariables and localize. Also wrapped
handleAddVariable in useCallback and memoized usedCount to avoid
redundant array filtering.

* perf(DashGroupItem): stabilize mutation callbacks via refs

handleSaveRename and handleDelete had updateGroup/deleteGroup mutation
objects in their useCallback dependency arrays. Since mutation objects
are new references each render, the callbacks were recreated every
render, defeating memoization. Now store mutation objects in refs and
call via ref.current in the callbacks.

* fix(security): validate groupId in incrementPromptGroupUsage

The data-schema method passed the groupId string directly to
findByIdAndUpdate without validation. If called from a different
entrypoint without the route-level check, Mongoose would throw a
CastError. Now validates with isValidObjectIdString before the
DB call and throws a clean 'Invalid groupId' error.

* fix(security): add rate limiter to prompt usage tracking endpoint

POST /groups/:groupId/use had no rate limiting — a user could spam
it to inflate numberOfGenerations, which controls sort order for all
users. Added promptUsageLimiter (30 req/user/min) following the same
pattern as toolCallLimiter. Also handle 'Invalid groupId' error from
the data layer in the route error handler.

* fix(updateFieldsInPlace): guard against undefined identifier value

If updatedItem[identifierField] is null/undefined, findIndex could
match unintended items where that field is also undefined. Added
early return when the identifier value is nullish.

* fix(a11y): use React useId for stable unique IDs in ListCard

aria-describedby/id values were derived from prompt name which can
contain spaces and special characters, producing invalid HTML IDs
and potential collisions. Now uses React.useId() for guaranteed
unique, valid IDs per component instance.

* fix: Align prompts panel styling with other sidebar panels and fix test

- Match FilterPrompts first row to Memory/Bookmark pattern (items-center gap-2)
- Remove items-stretch override from PromptsAccordion
- Add missing promptUsageLimiter mock to prompts route test

* fix: Address code review findings for prompts refactor PR

- Fix #5: Gate DeletePrompt in HeaderActions behind canDelete permission
- Fix #8: BackToChat navigates to last conversation instead of /c/new
- Fix #7: Restore useLiveAnnouncer for screen reader feedback on delete/rename
- Fix #1: Use isPublic (set by API) instead of deprecated projectIds for globe icon
- Fix #4: Optimistic cache update in useRecordPromptUsage instead of full invalidation
- Fix #6: Add migration to drop superseded { createdAt, updatedAt } compound index
- Fix #9: Single-pass reduce in PromptVariables instead of triple filter
- Fix #10: Rename PromptLabelsForm internal component to avoid collision with PromptForm
- Fix #14: Remove redundant aria-label from aria-hidden Checkbox in AutoSendPrompt

* fix: Align prompts panel filter row element sizes with other panels

- Override Dropdown trigger to size-9 (36px) to match FilterInput height
- Set CreatePromptButton to size-9 shrink-0 bg-transparent matching
  Memory/Bookmark panel button pattern

* fix(prompts): Shared Prompts filter ignores direct shares, only returns PUBLIC

Folds fix from PR #11882 into the refactored codebase.

Bug A: filterAccessibleIdsBySharedLogic now accepts ownedPromptGroupIds:
- MY_PROMPTS: accessible intersect owned
- SHARED_PROMPTS: (accessible union public) minus owned
- ALL: accessible union public (deduplicated)
Legacy fallback preserved when ownedPromptGroupIds is omitted.
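
The three filter modes above reduce to plain set operations. A minimal sketch, using string ids in place of ObjectIds and a hypothetical function name; the legacy fallback path is not modeled.

```typescript
type SharedFilterSketch = 'MY_PROMPTS' | 'SHARED_PROMPTS' | 'ALL';

// Sketch of the set logic only; not the real filterAccessibleIdsBySharedLogic.
function filterIdsSketch(
  filter: SharedFilterSketch,
  accessible: string[],
  publicIds: string[],
  owned: string[],
): string[] {
  const ownedSet = new Set(owned);
  // Union of accessible and public ids, deduplicated while preserving order.
  const union = Array.from(new Set(accessible.concat(publicIds)));
  switch (filter) {
    case 'MY_PROMPTS':
      return accessible.filter((id) => ownedSet.has(id)); // accessible intersect owned
    case 'SHARED_PROMPTS':
      return union.filter((id) => !ownedSet.has(id)); // (accessible union public) minus owned
    case 'ALL':
      return union;
  }
}
```

With accessible = [a, b], public = [b, c], and owned = [a], SHARED_PROMPTS yields b and c, so a direct share (b) is no longer dropped in favor of public-only results.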

Bug B: getPromptGroup uses $lookup aggregation to populate productionPrompt,
fixing empty text on direct URL navigation to shared prompts.

Also adds getOwnedPromptGroupIds to data-schemas methods and passes it
from both /all and /groups route handlers.

* fix: Add missing canDelete to mobile HeaderActions, remove dead instanceProjectId prop

- Pass canDelete to mobile HeaderActions row (was only on desktop)
- Remove instanceProjectId prop from ChatGroupItem and DashGroupItem
  since global check now uses group.isPublic
- Remove useGetStartupConfig from List.tsx (no longer needed)

* fix: Use runtime ObjectId instead of type-only Types.ObjectId, fix i18next interpolation

- getPromptGroup and getOwnedPromptGroupIds were using Types.ObjectId
  (imported as type-only), which is erased at compile time. Use the
  runtime ObjectId from mongoose.Types (already destructured at line 20).
  This fixes the 404s in PATCH /groups/:groupId tests.
- Fix com_ui_prompt_deleted_group translation to use {{0}} (i18next
  double-brace syntax) instead of {0}.

* chore: Fix translation key ordering, add sideEffects: false to data-provider

- Reorder new translation keys to maintain alphabetical order:
  com_ui_click_to_edit, com_ui_labels, com_ui_live, com_ui_prompt_delete_confirm,
  com_ui_prompt_deleted_group, com_ui_prompt_details, com_ui_prompt_renamed,
  com_ui_prompt_update_error, com_ui_prompt_variables_list
- Add "sideEffects": false to librechat-data-provider package.json to
  enable tree-shaking of unused exports (types, constants, pure functions)

* fix: Reduce prompts panel spacing, align memory toggle with checkbox pattern

- Remove unnecessary wrapper div around AutoSendPrompt in PromptsAccordion,
  reducing vertical space between the toggle and the first prompt item
- Replace Memory panel's Switch toggle with Checkbox+Button pattern
  matching the prompts panel's AutoSendPrompt for visual consistency

* fix: Reduce gap between AutoSendPrompt and first prompt item

Change ChatGroupItem margin from my-2 to mb-2 to eliminate the
doubled spacing (gap-2 from parent + top margin from first item).
Restore wrapper div around AutoSendPrompt for right-alignment.

* fix: Restore prompt name on empty save, remove dead bodyProps from checkGlobalPromptShare

- PromptName: reset newName to name when save is cancelled due to empty
  or unchanged input, preventing blank title in read mode
- checkGlobalPromptShare: remove dead bodyProps config — Permissions.SHARE
  was not in the permissions array so the bodyProps rule was never evaluated.
  Per-resource share checks are handled by canAccessPromptGroupResource.

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-22 16:56:22 -04:00
Danny Avila
b5c097e5c7
⚗️ feat: Agent Context Compaction/Summarization (#12287)
* chore: imports/types

Add summarization config and package-level summarize handler contracts

Register summarize handlers across server controller paths

Port cursor dual-read/dual-write summary support and UI status handling

Selectively merge cursor branch files for BaseClient summary content
block detection (last-summary-wins), dual-write persistence, summary
block unit tests, and on_summarize_status SSE event handling with
started/completed/failed branches.

Co-authored-by: Cursor <cursoragent@cursor.com>

refactor: type safety

feat: add localization for summarization status messages

refactor: optimize summary block detection in BaseClient

Updated the logic for identifying existing summary content blocks to use a reverse loop for improved efficiency. Added a new test case to ensure the last summary content block is updated correctly when multiple summary blocks exist.
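
A rough sketch of the reverse scan, assuming summary blocks are marked with a `type` of 'summary' (the actual discriminator used in BaseClient is not shown here).

```typescript
type SummaryScanBlock = { type: string; [key: string]: unknown };

// Walk backwards so the last summary block wins without scanning the whole array.
function findLastSummaryIndexSketch(content: SummaryScanBlock[]): number {
  for (let i = content.length - 1; i >= 0; i--) {
    if (content[i].type === 'summary') {
      return i;
    }
  }
  return -1;
}
```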

chore: add runName to chainOptions in AgentClient

refactor: streamline summarization configuration and handler integration

Removed the deprecated summarizeNotConfigured function and replaced it with a more flexible createSummarizeFn. Updated the summarization handler setup across various controllers to utilize the new function, enhancing error handling and configuration resolution. Improved overall code clarity and maintainability by consolidating summarization logic.

feat(summarization): add staged chunk-and-merge fallback

feat(usage): track summarization usage separately from messages

feat(summarization): resolve prompt from config in runtime

fix(endpoints): use @librechat/api provider config loader

refactor(agents): import getProviderConfig from @librechat/api

chore: code order

feat(app-config): auto-enable summarization when configured

feat: summarization config

refactor(summarization): streamline persist summary handling and enhance configuration validation

Removed the deprecated createDeferredPersistSummary function and integrated a new createPersistSummary function for MongoDB persistence. Updated summarization handlers across various controllers to utilize the new persistence method. Enhanced validation for summarization configuration to ensure provider, model, and prompt are properly set, improving error handling and overall robustness.

refactor(summarization): update event handling and remove legacy summarize handlers

Replaced the deprecated summarization handlers with new event-driven handlers for summarization start and completion across multiple controllers. This change enhances the clarity of the summarization process and improves the integration of summarization events in the application. Additionally, removed unused summarization functions and streamlined the configuration loading process.

refactor(summarization): standardize event names in handlers

Updated event names in the summarization handlers to use constants from GraphEvents for consistency and clarity. This change improves maintainability and reduces the risk of errors related to string literals in event handling.

feat(summarization): enhance usage tracking for summarization events

Added logic to track summarization usage in multiple controllers by checking the current node type. If the node indicates a summarization task, the usage type is set accordingly. This change improves the granularity of usage data collected during summarization processes.

feat(summarization): integrate SummarizationConfig into AppSummarizationConfig type

Enhanced the AppSummarizationConfig type by extending it with the SummarizationConfig type from librechat-data-provider. This change improves type safety and consistency in the summarization configuration structure.

test: add end-to-end tests for summarization functionality

Introduced a comprehensive suite of end-to-end tests for the summarization feature, covering the full LibreChat pipeline from message creation to summarization. This includes a new setup file for environment configuration and a Jest configuration specifically for E2E tests. The tests utilize real API keys and ensure proper integration with the summarization process, enhancing overall test coverage and reliability.

refactor(summarization): include initial summary in formatAgentMessages output

Updated the formatAgentMessages function to return an initial summary alongside messages and index token count map. This change is reflected in multiple controllers and the corresponding tests, enhancing the summarization process by providing additional context for each agent's response.

refactor: move hydrateMissingIndexTokenCounts to tokenMap utility

Extracted the hydrateMissingIndexTokenCounts function from the AgentClient and related tests into a new tokenMap utility file. This change improves code organization and reusability, allowing for better management of token counting logic across the application.

refactor(summarization): standardize step event handling and improve summary rendering

Refactored the step event handling in the useStepHandler and related components to utilize constants for event names, enhancing consistency and maintainability. Additionally, improved the rendering logic in the Summary component to conditionally display the summary text based on its availability, providing a better user experience during the summarization process.

feat(summarization): introduce baseContextTokens and reserveTokensRatio for improved context management

Added baseContextTokens to the InitializedAgent type to calculate the context budget based on agentMaxContextNum and maxOutputTokensNum. Implemented reserveTokensRatio in the createRun function to allow configurable context token management. Updated related tests to validate these changes and ensure proper functionality.

feat(summarization): add minReserveTokens, context pruning, and overflow recovery configurations

Introduced new configuration options for summarization, including minReserveTokens, context pruning settings, and overflow recovery parameters. Updated the createRun function to accommodate these new options and added a comprehensive test suite to validate their functionality and integration within the summarization process.

feat(summarization): add updatePrompt and reserveTokensRatio to summarization configuration

Introduced an updatePrompt field for updating existing summaries with new messages, enhancing the flexibility of the summarization process. Additionally, added reserveTokensRatio to the configuration schema, allowing for improved management of token allocation during summarization. Updated related tests to validate these new features.

feat(logging): add on_agent_log event handler for structured logging

Implemented an on_agent_log event handler in both the agents' callbacks and responses to facilitate structured logging of agent activities. This enhancement allows for better tracking and debugging of agent interactions by logging messages with associated metadata. Updated the summarization process to ensure proper handling of log events.

fix: remove duplicate IBalanceUpdate interface declaration

perf(usage): single-pass partition of collectedUsage

Replace two Array.filter() passes with a single for-of loop that
partitions message vs. summarization usages in one iteration.
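
The single-pass partition might look like the sketch below. The `usage_type` field name and the entry shape are assumptions; only the one-loop structure is taken from the message above.

```typescript
interface UsageEntrySketch {
  usage_type?: 'message' | 'summarization'; // hypothetical field name
  input_tokens: number;
  output_tokens: number;
}

// One iteration replaces two Array.filter() passes over the same array.
function partitionUsageSketch(collected: UsageEntrySketch[]): {
  message: UsageEntrySketch[];
  summarization: UsageEntrySketch[];
} {
  const message: UsageEntrySketch[] = [];
  const summarization: UsageEntrySketch[] = [];
  for (const entry of collected) {
    if (entry.usage_type === 'summarization') {
      summarization.push(entry);
    } else {
      message.push(entry);
    }
  }
  return { message, summarization };
}
```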

fix(BaseClient): shallow-copy message content before mutating and preserve string content

Avoid mutating the original message.content array in-place when
appending a summary block. Also convert string content to a text
content part instead of silently discarding it.
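
A minimal sketch of the copy-then-append behavior, with hypothetical type and function names. It returns a new array in both branches so the caller's message.content is never mutated.

```typescript
type AppendContentPart = { type: string; text?: string; [key: string]: unknown };
interface AppendMessageSketch {
  content?: string | AppendContentPart[];
}

function appendSummaryBlockSketch(
  message: AppendMessageSketch,
  summary: AppendContentPart,
): AppendContentPart[] {
  const { content } = message;
  if (typeof content === 'string') {
    // Preserve string content by converting it to a text part.
    return [{ type: 'text', text: content }, summary];
  }
  // Shallow copy so the original array is left untouched.
  return [...(content ?? []), summary];
}
```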

fix(ui): fix Part.tsx indentation and useStepHandler summarize-complete handling

- Fix SUMMARY else-if branch indentation in Part.tsx to match chain level
- Guard ON_SUMMARIZE_COMPLETE with didFinalize flag to avoid unnecessary
  re-renders when no summarizing parts exist
- Protect against undefined completeData.summary instead of unsafe spread

fix(agents): use strict enabled check for summarization handlers

Change summarizationConfig?.enabled !== false to === true so handlers
are not registered when summarizationConfig is undefined.

chore: fix initializeClient JSDoc and move DEFAULT_RESERVE_RATIO to module scope

refactor(Summary): align collapse/expand behavior with Reasoning component

- Single render path instead of separate streaming vs completed branches
- Use useMessageContext for isSubmitting/isLatestMessage awareness so
  the "Summarizing..." label only shows during active streaming
- Default to collapsed (matching Reasoning), user toggles to expand
- Add proper aria attributes (aria-hidden, role, aria-controls, contentId)
- Hide copy button while actively streaming

feat(summarization): default to self-summarize using agent's own provider/model

When no summarization config is provided (neither in librechat.yaml nor
on the agent), automatically enable summarization using the agent's own
provider and model. The agents package already provides default prompts,
so no prompt configuration is needed.

Also removes the dead resolveSummarizationLLMConfig in summarize.ts
(and its spec) — run.ts buildAgentContext is the single source of truth
for summarization config resolution. Removes the duplicate
RuntimeSummarizationConfig local type in favor of the canonical
SummarizationConfig from data-provider.

chore: schema and type cleanup for summarization

- Add trigger field to summarizationAgentOverrideSchema so per-agent
  trigger overrides in librechat.yaml are not silently stripped by Zod
- Remove unused SummarizationStatus type from runs.ts
- Make AppSummarizationConfig.enabled non-optional to reflect the
  invariant that loadSummarizationConfig always sets it

refactor(responses): extract duplicated on_agent_log handler

refactor(run): use agents package types for summarization config

Import SummarizationConfig, ContextPruningConfig, and
OverflowRecoveryConfig from @librechat/agents and use them to
type-check the translation layer in buildAgentContext. This ensures
the config object passed to the agent graph matches what it expects.

- Use `satisfies AgentSummarizationConfig` on the config object
- Cast contextPruningConfig and overflowRecoveryConfig to agents types
- Properly narrow trigger fields from DeepPartial to required shape

feat(config): add maxToolResultChars to base endpoint schema

Add maxToolResultChars to baseEndpointSchema so it can be configured
on any endpoint in librechat.yaml. Resolved during agent initialization
using getProviderConfig's endpoint resolution: custom endpoint config
takes precedence, then the provider-specific endpoint config, then the
shared `all` config.

Passed through to the agents package ToolNode, which uses it to cap
tool result length before it enters the context window. When not
configured, the agents package computes a sensible default from
maxContextTokens.
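
The precedence order described above can be sketched with nullish coalescing. The function name and the three-argument shape are assumptions; the real resolution goes through getProviderConfig.

```typescript
interface EndpointConfigSketch {
  maxToolResultChars?: number;
}

// Custom endpoint config first, then provider-specific config, then the shared `all` config.
// Returns undefined when nothing is configured, leaving the default to the agents package.
function resolveMaxToolResultCharsSketch(
  custom?: EndpointConfigSketch,
  provider?: EndpointConfigSketch,
  all?: EndpointConfigSketch,
): number | undefined {
  return custom?.maxToolResultChars ?? provider?.maxToolResultChars ?? all?.maxToolResultChars;
}
```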

fix(summarization): forward agent model_parameters in self-summarize default

When no explicit summarization config exists, the self-summarize
default now forwards the agent's model_parameters as the
summarization parameters. This ensures provider-specific settings
(e.g. Bedrock region, credentials, endpoint host) are available
when the agents package constructs the summarization LLM.

fix(agents): register summarization handlers by default

Change the enabled gate from === true to !== false so handlers
register when no explicit summarization config exists. This aligns
with the self-summarize default where summarization is always on
unless explicitly disabled via enabled: false.
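
The difference between the two gates shows up only when the config or its enabled field is missing. A small sketch with hypothetical names:

```typescript
interface SummarizationGateConfig {
  enabled?: boolean;
}

// Earlier behavior: handlers register only when enabled is explicitly true.
const strictGateSketch = (config?: SummarizationGateConfig): boolean =>
  config?.enabled === true;

// Behavior after this change: handlers register unless enabled is explicitly false.
const defaultOnGateSketch = (config?: SummarizationGateConfig): boolean =>
  config?.enabled !== false;
```

With no config at all, the strict gate evaluates to false and the default-on gate to true, which is what the self-summarize default relies on.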

refactor(summarization): let agents package inherit clientOptions for self-summarize

Remove model_parameters forwarding from the self-summarize default.
The agents package now reuses the agent's own clientOptions when the
summarization provider matches the agent's provider, inheriting all
provider-specific settings (region, credentials, proxy, etc.)
automatically.

refactor(summarization): use MessageContentComplex[] for summary content

Unify summary content to always use MessageContentComplex[] arrays,
matching the pattern used by on_message_delta. No more string | array
unions — content is always an array of typed blocks ({ type: 'text',
text: '...' } for text, { type: 'reasoning_content', ... } for
reasoning).

Agents package:
- SummaryContentBlock.content: MessageContentComplex[] (was string)
- tokenCount now optional (not sent on deltas)
- Removed reasoning field — reasoning is now a content block type
- streamAndCollect normalizes all chunks to content block arrays
- Delta events pass content blocks directly

LibreChat:
- SummaryContentPart.content: Agents.MessageContentComplex[]
- Updated Part.tsx, Summary.tsx, useStepHandler.ts, BaseClient.js
- Summary.tsx derives display text from content blocks via useMemo
- Aggregator uses simple array spread
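
As an illustration of the unified shape, the sketch below normalizes legacy string content into an array of blocks and derives display text from the text blocks. The local type is a simplified stand-in for MessageContentComplex, and both function names are hypothetical.

```typescript
// Simplified stand-in for MessageContentComplex; the real type has more variants.
type SummaryBlockSketch = { type: string; text?: string; [key: string]: unknown };

// Legacy string content becomes a single text block; arrays pass through.
function toSummaryBlocksSketch(content: string | SummaryBlockSketch[]): SummaryBlockSketch[] {
  return typeof content === 'string' ? [{ type: 'text', text: content }] : content;
}

// Concatenate only the text blocks; reasoning blocks are skipped.
function extractSummaryTextSketch(content: string | SummaryBlockSketch[]): string {
  return toSummaryBlocksSketch(content)
    .filter((block) => block.type === 'text' && typeof block.text === 'string')
    .map((block) => block.text as string)
    .join('');
}
```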

refactor(summarization): enhance summary handling and text extraction

- Updated BaseClient.js to improve summary text extraction, accommodating both legacy and new content formats.
- Modified summarization logic to ensure consistent handling of summary content across different message formats.
- Adjusted test cases in summarization.e2e.spec.js to utilize the new summary text extraction method.
- Refined SSE useStepHandler to initialize summary content as an array.
- Updated configuration schema by removing unused minReserveTokens field.
- Cleaned up SummaryContentPart type by removing rangeHash property.

These changes streamline the summarization process and ensure compatibility with various content structures.

refactor(summarization): streamline usage tracking and logging

- Removed direct checks for summarization nodes in ModelEndHandler and replaced them with a dedicated markSummarizationUsage function for better readability and maintainability.
- Updated OpenAIChatCompletionController and responses handlers to utilize the new markSummarizationUsage function for setting usage types.
- Enhanced logging functionality by ensuring the logger correctly handles different log levels.
- Introduced a new useCopyToClipboard hook in the Summary component to encapsulate clipboard copy logic, improving code reusability and clarity.

These changes improve the overall structure and efficiency of the summarization handling and logging processes.

refactor(summarization): update summary content block documentation

- Removed outdated comment regarding the last summary content block in BaseClient.js.
- Added a new comment to clarify the purpose of the findSummaryContentBlock method, ensuring consistency in documentation.

These changes enhance code clarity and maintainability by providing accurate descriptions of the summarization logic.

refactor(summarization): update summary content structure in tests

- Modified the summarization content structure in e2e tests to use an array format for text, aligning with recent changes in summary handling.
- Updated test descriptions to clarify the behavior of context token calculations, ensuring consistency and clarity in the tests.

These changes enhance the accuracy and maintainability of the summarization tests by reflecting the updated content structure.

refactor(summarization): remove legacy E2E test setup and configuration

- Deleted the e2e-setup.js and jest.e2e.config.js files, which contained legacy configurations for E2E tests using real API keys.
- Introduced a new summarization.e2e.ts file that implements comprehensive E2E backend integration tests for the summarization process, utilizing real AI providers and tracking summaries throughout the run.

These changes streamline the testing framework by consolidating E2E tests into a single, more robust file while removing outdated configurations.

refactor(summarization): enhance E2E tests and error handling

- Added a cleanup step to force exit after all tests to manage Redis connections.
- Updated the summarization model to 'claude-haiku-4-5-20251001' for consistency across tests.
- Improved error handling in the processStream function to capture and return processing errors.
- Enhanced logging for cross-run tests and tight context scenarios to provide better insights into test execution.

These changes improve the reliability and clarity of the E2E tests for the summarization process.

refactor(summarization): enhance test coverage for maxContextTokens behavior

- Updated run-summarization.test.ts to include a new test case ensuring that maxContextTokens does not exceed user-defined limits, even when calculated ratios suggest otherwise.
- Modified summarization.e2e.ts to replace legacy UsageMetadata type with a more appropriate type for collectedUsage, improving type safety and clarity in the test setup.

These changes improve the robustness of the summarization tests by validating context token constraints and refining type definitions.

feat(summarization): add comprehensive E2E tests for summarization process

- Introduced a new summarization.e2e.test.ts file that implements extensive end-to-end integration tests for the summarization pipeline, covering the full flow from LibreChat to agents.
- The tests utilize real AI providers and include functionality to track summaries during and between runs.
- Added necessary cleanup steps to manage Redis connections post-tests and ensure proper exit.

These changes enhance the testing framework by providing robust coverage for the summarization process, ensuring reliability and performance under real-world conditions.

fix(service): import logger from winston configuration

- Removed the import statement for logger from '@librechat/data-schemas' and replaced it with an import from '~/config/winston'.
- This change ensures that the logger is correctly sourced from the updated configuration, improving consistency in logging practices across the application.

refactor(summary): simplify Summary component and enhance token display

- Removed the unused `meta` prop from the `SummaryButton` component to streamline its interface.
- Updated the token display logic to use a localized string for better internationalization support.
- Adjusted the rendering of the `meta` information to improve its visibility within the `Summary` component.

These changes enhance the clarity and usability of the Summary component while ensuring better localization practices.

feat(summarization): add maxInputTokens configuration for summarization

- Introduced a new `maxInputTokens` property in the summarization configuration schema to control the amount of conversation context sent to the summarizer, with a default value of 10000.
- Updated the `createRun` function to utilize the new `maxInputTokens` setting, allowing for more flexible summarization based on agent context.

These changes enhance the summarization capabilities by providing better control over input token limits, improving the overall summarization process.

refactor(summarization): simplify maxInputTokens logic in createRun function

- Updated the logic for the `maxInputTokens` property in the `createRun` function to directly use the agent's base context tokens when the resolved summarization configuration does not specify a value.
- This change streamlines the configuration process and enhances clarity in how input token limits are determined for summarization.

These modifications improve the maintainability of the summarization configuration by reducing complexity in the token calculation logic.

feat(summary): enhance Summary component to display meta information

- Updated the SummaryContent component to accept an optional `meta` prop, allowing for additional contextual information to be displayed above the main content.
- Adjusted the rendering logic in the Summary component to utilize the new `meta` prop, improving the visibility of supplementary details.

These changes enhance the user experience by providing more context within the Summary component, making it clearer and more informative.

refactor(summarization): standardize reserveRatio configuration in summarization logic

- Replaced instances of `reserveTokensRatio` with `reserveRatio` in the `createRun` function and related tests to unify the terminology across the codebase.
- Updated the summarization configuration schema to reflect this change, ensuring consistency in how the reserve ratio is defined and utilized.
- Removed the per-agent override logic for summarization configuration, simplifying the overall structure and enhancing clarity.

These modifications improve the maintainability and readability of the summarization logic by standardizing the configuration parameters.

* fix: circular dependency of `~/models`

* chore: update logging scope in agent log handlers

Changed log scope from `[agentus:${data.scope}]` to `[agents:${data.scope}]` in both the callbacks and responses controllers to ensure consistent logging format across the application.

* feat: calibration ratio

* refactor(tests): update summarizationConfig tests to reflect changes in enabled property

Modified tests to check for the new `summarizationEnabled` property instead of the deprecated `enabled` field in the summarization configuration. This change ensures that the tests accurately validate the current configuration structure and behavior of the agents.

* feat(tests): add markSummarizationUsage mock for improved test coverage

Introduced a mock for the markSummarizationUsage function in the responses unit tests to enhance the testing of summarization usage tracking. This addition supports better validation of summarization-related functionalities and ensures comprehensive test coverage for the agents' response handling.

* refactor(tests): simplify event handler setup in createResponse tests

Removed redundant mock implementations for event handlers in the createResponse unit tests, streamlining the setup process. This change enhances test clarity and maintainability while ensuring that the tests continue to validate the correct behavior of usage tracking during on_chat_model_end events.

* refactor(agents): move calibration ratio capture to finally block

Reorganized the logic for capturing the calibration ratio in the AgentClient class to ensure it is executed in the finally block. This change guarantees that the ratio is captured even if the run is aborted, enhancing the reliability of the response message persistence. Removed redundant code and improved clarity in the handling of context metadata.

* refactor(agents): streamline bulk write logic in recordCollectedUsage function

Removed redundant bulk write operations and consolidated document handling in the recordCollectedUsage function. The logic now combines all documents into a single bulk write operation, improving efficiency and reducing error handling complexity. Updated logging to provide consistent error messages for bulk write failures.

* refactor(agents): enhance summarization configuration resolution in createRun function

Streamlined the summarization configuration logic by introducing a base configuration and allowing for overrides from agent-specific settings. This change improves clarity and maintainability, ensuring that the summarization configuration is consistently applied while retaining flexibility for customization. Updated the handling of summarization parameters to ensure proper integration with the agent's model and provider settings.

* refactor(agents): remove unused tokenCountMap and streamline calibration ratio handling

Eliminated the unused tokenCountMap variable from the AgentClient class to enhance code clarity. Additionally, streamlined the logic for capturing the calibration ratio by using optional chaining and a fallback value, ensuring that context metadata is consistently defined. This change improves maintainability and reduces potential confusion in the codebase.

* refactor(agents): extract agent log handler for improved clarity and reusability

Refactored the agent log handling logic by extracting it into a dedicated function, `agentLogHandler`, enhancing code clarity and reusability across different modules. Updated the event handlers in both the OpenAI and responses controllers to utilize the new handler, ensuring consistent logging behavior throughout the application.

* test: add summarization event tests for useStepHandler

Implemented a series of tests for the summarization events in the useStepHandler hook. The tests cover scenarios for ON_SUMMARIZE_START, ON_SUMMARIZE_DELTA, and ON_SUMMARIZE_COMPLETE events, ensuring proper handling of summarization logic, including message accumulation and finalization. This addition enhances test coverage and validates the correct behavior of the summarization process within the application.

* refactor(config): update summarizationTriggerSchema to use enum for type validation

Changed the type of the `type` field in the summarizationTriggerSchema from a string to an enum with a single value 'token_count'. This modification enhances type safety and ensures that only valid types are accepted in the configuration, improving overall clarity and maintainability of the schema.

* test(usage): add bulk write tests for message and summarization usage

Implemented tests for the bulk write functionality in the recordCollectedUsage function, covering scenarios for combined message and summarization usage, summarization-only usage, and message-only usage. These tests ensure correct document handling and token rollup calculations, enhancing test coverage and validating the behavior of the usage tracking logic.

* refactor(Chat): enhance clipboard copy functionality and type definitions in Summary component

Updated the Summary component to improve the clipboard copy functionality by handling clipboard permission errors. Refactored type definitions for SummaryProps to use a more specific type, enhancing type safety. Adjusted the SummaryButton and FloatingSummaryBar components to accept isCopied and onCopy props, promoting better separation of concerns and reusability.

* chore(translations): remove unused "Expand Summary" key from English translations

Deleted the "Expand Summary" key from the English translation file to streamline the localization resources and improve clarity in the user interface. This change helps maintain an organized and efficient translation structure.

* refactor: adjust token counting for Claude model to account for API discrepancies

Implemented a correction factor for token counting when using the Claude model, addressing discrepancies between Anthropic's API and local tokenizer results. This change ensures accurate token counts by applying a scaling factor, improving the reliability of token-related functionalities.

* refactor(agents): implement token count adjustment for Claude model messages

Added a method to adjust token counts for messages processed by the Claude model, applying a correction factor to align with API expectations. This enhancement improves the accuracy of token counting, ensuring reliable functionality when interacting with the Claude model.

* refactor(agents): add token counting for media content in messages

Introduced a new method to estimate token costs for image and document blocks in messages, improving the accuracy of token counting. This enhancement ensures that media content is properly accounted for, particularly for the Claude model, by integrating additional token estimation logic for various content types. Updated the token counting function to utilize this new method, enhancing overall reliability and functionality.

* chore: fix missing import

* fix(agents): clamp baseContextTokens and document reserve ratio change

Prevent negative baseContextTokens when maxOutputTokens exceeds the
context window (misconfigured models). Document the 10%→5% default
reserve ratio reduction introduced alongside summarization.
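
A minimal sketch of the clamp, assuming baseContextTokens is derived as the context window minus maxOutputTokens. The helper name and the exact formula are hypothetical, not taken from the codebase.

```typescript
// Hypothetical helper: the name and formula are assumptions for illustration.
function computeBaseContextTokens(contextWindow: number, maxOutputTokens: number): number {
  // A misconfigured model may declare maxOutputTokens larger than its context window;
  // clamp at zero so downstream budget math never sees a negative value.
  return Math.max(0, contextWindow - maxOutputTokens);
}
```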

* fix(agents): include media tokens in hydrated token counts

Add estimateMediaTokensForMessage to createTokenCounter so the hydration
path (used by hydrateMissingIndexTokenCounts) matches the precomputed
path in AgentClient.getTokenCountForMessage. Without this, messages
containing images or documents were systematically undercounted during
hydration, risking context window overflow.

Add 34 unit tests covering all block-type branches of
estimateMediaTokensForMessage.

* fix(agents): include summarization output tokens in usage return value

The returned output_tokens from recordCollectedUsage now reflects all
billed LLM calls (message + summarization). Previously, summarization
completions were billed but excluded from the returned metadata, causing
a discrepancy between what users were charged and what the response
message reported.

* fix(tests): replace process.exit with proper Redis cleanup in e2e test

The summarization E2E test used process.exit(0) to work around a Redis
connection opened at import time, which killed the Jest runner and
bypassed teardown. Use ioredisClient.quit() and keyvRedisClient.disconnect()
for graceful cleanup instead.

* fix(tests): update getConvo imports in OpenAI and response tests

Refactor test files to import getConvo from the main models module instead of the Conversation submodule. This change ensures consistency across tests and simplifies the import structure, enhancing maintainability.

* fix(clients): improve summary text validation in BaseClient

Refactor the summary extraction logic to ensure that only non-empty summary texts are considered valid. This change enhances the robustness of the message processing by utilizing a dedicated method for summary text retrieval, improving overall reliability.

* fix(config): replace z.any() with explicit union in summarization schema

Model parameters (temperature, top_p, etc.) are constrained to
primitive types rather than the policy-violating z.any().

* refactor(agents): deduplicate CLAUDE_TOKEN_CORRECTION constant

Export from the TS source in packages/api and import in the JS client,
eliminating the static class property that could drift out of sync.

* refactor(agents): eliminate duplicate selfProvider in buildAgentContext

selfProvider and provider were derived from the same expression with
different type casts. Consolidated to a single provider variable.

* refactor(agents): extract shared SSE handlers and restrict log levels

- buildSummarizationHandlers() factory replaces triplicated handler
  blocks across responses.js and openai.js
- agentLogHandlerObj exported from callbacks.js for consistent reuse
- agentLogHandler restricted to an allowlist of safe log levels
  (debug, info, warn, error) instead of accepting arbitrary strings
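
The allowlist idea can be sketched as follows. Only the four level names come from the commit message; the handler shape, `handleAgentLog`, and the `Logger` type are hypothetical.

```typescript
// The four levels are from the commit message; everything else is illustrative.
const SAFE_LOG_LEVELS = ['debug', 'info', 'warn', 'error'] as const;
type SafeLogLevel = (typeof SAFE_LOG_LEVELS)[number];
type Logger = Record<SafeLogLevel, (message: string) => void>;

function isSafeLogLevel(level: unknown): level is SafeLogLevel {
  return typeof level === 'string' && (SAFE_LOG_LEVELS as readonly string[]).indexOf(level) !== -1;
}

// Returns true when the event was logged, false when it was dropped.
function handleAgentLog(
  logger: Logger,
  event: { level?: unknown; message?: unknown } | null,
): boolean {
  const level = event?.level;
  if (!isSafeLogLevel(level)) {
    // Unknown, missing, or non-string levels never reach the logger,
    // so a payload cannot invoke an arbitrary logger property.
    return false;
  }
  logger[level](String(event?.message ?? ''));
  return true;
}
```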

* fix(SSE): batch summarize deltas, add exhaustiveness check, conditional error announcement

- ON_SUMMARIZE_DELTA coalesces rapid-fire renders via requestAnimationFrame
  instead of calling setMessages per chunk
- Exhaustive never-check on TStepEvent catches unhandled variants at
  compile time when new StepEvents are added
- ON_SUMMARIZE_COMPLETE error announcement only fires when a summary
  part was actually present and removed
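
The exhaustiveness check in the second bullet follows a common TypeScript pattern, sketched here with a hypothetical two-member union (the real TStepEvent has different members):

```typescript
// Hypothetical event union standing in for TStepEvent.
type StepEventSketch =
  | { event: 'on_summarize_delta'; text: string }
  | { event: 'on_summarize_complete'; summary?: string };

function assertNever(value: never): never {
  throw new Error(`Unhandled step event: ${JSON.stringify(value)}`);
}

function describeStepEvent(e: StepEventSketch): string {
  switch (e.event) {
    case 'on_summarize_delta':
      return `delta:${e.text}`;
    case 'on_summarize_complete':
      return e.summary ? 'complete' : 'complete-without-summary';
    default:
      // If a new variant is added to the union but not handled above,
      // `e` is no longer `never` here and this call fails to compile.
      return assertNever(e);
  }
}
```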

* feat(agents): persist instruction overhead in contextMeta and seed across runs

Extend contextMeta with instructionOverhead and toolCount so the
provider-observed instruction overhead is persisted on the response message
and seeded into the pruner on subsequent runs. This enables the pruner to
use a calibrated budget from the first call instead of waiting for a
provider observation, preventing the ratio collapse caused by local
tokenizer overestimating tool schema tokens.

The seeded overhead is only used when encoding and tool count match
between runs, ensuring stale values from different configurations
are discarded.

* test(agents): enhance OpenAI test mocks for summarization handlers

Updated the OpenAI test suite to include additional mock implementations for summarization handlers, including buildSummarizationHandlers, markSummarizationUsage, and agentLogHandlerObj. This improves test coverage and ensures consistent behavior during testing.

* fix(agents): address review findings for summarization v2

Cancel rAF on unmount to prevent stale Recoil writes from dead
component context. Clear orphaned summarizing:true parts when
ON_SUMMARIZE_COMPLETE arrives without a summary payload. Add null
guard and safe spread to agentLogHandler. Handle Anthropic-format
base64 image/* documents in estimateMediaTokensForMessage. Use
role="region" for expandable summary content. Add .describe() to
contextMeta Zod fields. Extract duplicate usage loop into helper.

* refactor: simplify contextMeta to calibrationRatio + encoding only

Remove instructionOverhead and toolCount from cross-run persistence —
instruction tokens change too frequently between runs (prompt edits,
tool changes) for a persisted seed to be reliable. The intra-run
calibration in the pruner still self-corrects via provider observations.
contextMeta now stores only the tokenizer-bias ratio and encoding,
which are stable across instruction changes.

* test(SSE): enhance useStepHandler tests for ON_SUMMARIZE_COMPLETE behavior

Updated the test for ON_SUMMARIZE_COMPLETE to clarify that it finalizes the existing part with summarizing set to false when the summary is undefined. Added assertions to verify the correct behavior of message updates and the state of summary parts.

* refactor(BaseClient): remove handleContextStrategy and truncateToolCallOutputs functions

Eliminated the handleContextStrategy method from BaseClient to streamline message handling. Also removed the truncateToolCallOutputs function from the prompts module, simplifying the codebase and improving maintainability.

* refactor: add AGENT_DEBUG_LOGGING option and refactor token count handling in BaseClient

Introduced AGENT_DEBUG_LOGGING to .env.example for enhanced debugging capabilities. Refactored token count handling in BaseClient by removing the handleTokenCountMap method and simplifying token count updates. Updated AgentClient to log detailed token count recalculations and adjustments, improving traceability during message processing.

* chore: update dependencies in package-lock.json and package.json files

Bumped versions of several dependencies, including @librechat/agents to ^3.1.62 and various AWS SDK packages to their latest versions. This ensures compatibility and incorporates the latest features and fixes.

* chore: imports order

* refactor: extract summarization config resolution from buildAgentContext

* refactor: rename and simplify summarization configuration shaping function

* refactor: replace AgentClient token counting methods with single-pass pure utility

Extract getTokenCount() and getTokenCountForMessage() from AgentClient
into countFormattedMessageTokens(), a pure function in packages/api that
handles text, tool_call, image, and document content types in one loop.

- Decompose estimateMediaTokensForMessage into block-level helpers
  (estimateImageDataTokens, estimateImageBlockTokens, estimateDocumentBlockTokens)
  shared by both estimateMediaTokensForMessage and the new single-pass function
- Remove redundant per-call getEncoding() resolution (closure captures once)
- Remove deprecated gpt-3.5-turbo-0301 model branching
- Drop this.getTokenCount guard from BaseClient.sendMessage

* refactor: streamline token counting in createTokenCounter function

Simplified the createTokenCounter function by removing the media token estimation and directly calculating the token count. This change enhances clarity and performance by consolidating the token counting logic into a single pass, while maintaining compatibility with Claude's token correction.

* refactor: simplify summarization configuration types

Removed the AppSummarizationConfig type and directly used SummarizationConfig in the AppConfig interface. This change streamlines the type definitions and enhances consistency across the codebase.

* chore: import order

* fix: summarization event handling in useStepHandler

- Cancel pending summarizeDeltaRaf in clearStepMaps to prevent stale
  frames firing after map reset or component unmount
- Move announcePolite('summarize_completed') inside the didFinalize
  guard so screen readers only announce when finalization actually occurs
- Remove dead cleanup closure returned from stepHandler useCallback body
  that was never invoked by any caller

* fix: estimate tokens for non-PDF/non-image base64 document blocks

Previously estimateDocumentBlockTokens returned 0 for unrecognized MIME
types (e.g. text/plain, application/json), silently undercounting the
tokens those blocks consume. Fall back to a character-based heuristic or
countTokens.
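
A rough sketch of a character-based fallback for base64 payloads. The 4-characters-per-token ratio and the function name are assumptions, not values from the codebase.

```typescript
// Assumption: roughly 4 characters of decoded text per token.
const CHARS_PER_TOKEN = 4;

// Estimates tokens for a base64-encoded text document without decoding it.
function estimateBase64TextTokens(base64Data: string): number {
  const len = base64Data.length;
  // base64 encodes 3 bytes in 4 characters; trailing '=' padding carries no data.
  let padding = 0;
  if (len > 0 && base64Data.charAt(len - 1) === '=') {
    padding = len > 1 && base64Data.charAt(len - 2) === '=' ? 2 : 1;
  }
  const decodedBytes = Math.max(0, Math.floor((len * 3) / 4) - padding);
  return Math.ceil(decodedBytes / CHARS_PER_TOKEN);
}
```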

* refactor: return cloned usage from markSummarizationUsage

Avoid mutating LangChain's internal usage_metadata object by returning
a shallow clone with the usage_type tag. Update all call sites in
callbacks, openai, and responses controllers to use the returned value.
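
A minimal sketch of the clone-and-tag approach. The interface, the function name suffix, and the 'summarization' tag value are hypothetical; only the idea of returning a shallow clone comes from the commit message.

```typescript
// Hypothetical shape of a usage object.
interface UsageSketch {
  input_tokens: number;
  output_tokens: number;
  usage_type?: string;
}

function markSummarizationUsageSketch(usage: UsageSketch): UsageSketch {
  // Shallow clone so the caller's object (owned by the LLM library) stays untouched.
  return { ...usage, usage_type: 'summarization' };
}
```

Call sites then have to use the returned value, since the input is no longer tagged in place.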

* refactor: consolidate debug logging loops in buildMessages

Merge the two sequential O(n) debug-logging passes over orderedMessages
into a single pass inside the map callback where all data is available.

* refactor: narrow SummaryContentPart.content type

Replace broad Agents.MessageContentComplex[] with the specific
Array<{ type: ContentTypes.TEXT; text: string }> that all producers
and consumers already use, improving compile-time safety.

* refactor: use single output array in recordCollectedUsage

Have processUsageGroup append to a shared array instead of returning
separate arrays that are spread into a third, reducing allocations.

* refactor: use for...in in hydrateMissingIndexTokenCounts

Replace Object.entries with for...in to avoid allocating an
intermediate tuple array during token map hydration.
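
A sketch of the iteration style, with a hypothetical signature (the real hydrateMissingIndexTokenCounts differs):

```typescript
// Fills in missing counts by calling `countAt` for indexes without a value.
function hydrateMissingCounts(
  tokenCountMap: Record<string, number | undefined>,
  countAt: (index: string) => number,
): Record<string, number> {
  const hydrated: Record<string, number> = {};
  for (const key in tokenCountMap) {
    // for...in also walks inherited enumerable keys, so keep own keys only.
    if (!Object.prototype.hasOwnProperty.call(tokenCountMap, key)) {
      continue;
    }
    const existing = tokenCountMap[key];
    hydrated[key] = existing ?? countAt(key);
  }
  return hydrated;
}
```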
2026-03-21 14:28:56 -04:00
Danny Avila
67db0c1cb3
🗑️ chore: Remove Action Test Suite and Update Mock Implementations (#12268)
- Deleted the Action test suite located in `api/models/Action.spec.js` to streamline the codebase.
- Updated various test files to reflect changes in model mocks, consolidating mock implementations for user-related actions and enhancing clarity.
- Improved consistency in test setups by aligning with the latest model updates and removing redundant mock definitions.
2026-03-21 14:28:55 -04:00
Danny Avila
dd72b7b17e
🔄 chore: Consolidate agent model imports across middleware and tests after rebase
- Updated imports for `createAgent` and `getAgent` to streamline access from a unified `~/models` path.
- Enhanced test files to reflect the new import structure, ensuring consistency and maintainability across the codebase.
- Improved clarity by removing redundant imports and aligning with the latest model updates.
2026-03-21 14:28:55 -04:00
Atef Bellaaj
a0fed6173c
🗂️ refactor: Migrate S3 Storage to TypeScript in packages/api (#11947)
* Migrate S3 storage module with unit and integration tests

  - Migrate S3 CRUD and image operations to packages/api/src/storage/s3/
  - Add S3ImageService class with dependency injection
  - Add unit tests using aws-sdk-client-mock
  - Add integration tests against a real S3 bucket (run only when AWS_TEST_BUCKET_NAME is set)

* AI Review Findings Fixes

* chore: tests and refactor S3 storage types

- Added mock implementations for the 'sharp' library in various test files to improve image processing testing.
- Updated type references in S3 storage files from MongoFile to TFile for consistency and type safety.
- Refactored S3 CRUD operations to ensure proper handling of file types and improve code clarity.
- Enhanced integration tests to validate S3 file operations and error handling more effectively.

* chore: rename test file

* Remove duplicate import of refreshS3Url

* chore: imports order

* fix: remove duplicate imports for S3 URL handling in UserController

* fix: remove duplicate import of refreshS3FileUrls in files.js

* test: Add mock implementations for 'sharp' and '@librechat/api' in UserController tests

- Introduced mock functions for the 'sharp' library to facilitate image processing tests, including metadata retrieval and buffer conversion.
- Enhanced mocking for '@librechat/api' to ensure consistent behavior in tests, particularly for the needsRefresh and getNewS3URL functions.

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-21 14:28:55 -04:00
Danny Avila
9e0592a236
📜 feat: Implement System Grants for Capability-Based Authorization (#11896)
* feat: Implement System Grants for Role-Based Capabilities

- Added a new `systemGrant` model and associated methods to manage role-based capabilities within the application.
- Introduced middleware functions `hasCapability` and `requireCapability` to check user permissions based on their roles.
- Updated the database seeding process to include system grants for the ADMIN role, ensuring all necessary capabilities are assigned on startup.
- Enhanced type definitions and schemas to support the new system grant functionality, improving overall type safety and clarity in the codebase.

* test: Add unit tests for capabilities middleware and system grant methods

- Introduced comprehensive unit tests for the capabilities middleware, including `hasCapability` and `requireCapability`, ensuring proper permission checks based on user roles.
- Added tests for the `SystemGrant` methods, verifying the seeding of system grants, capability granting, and revocation processes.
- Enhanced test coverage for edge cases, including idempotency of grant operations and handling of unexpected errors in middleware.
- Utilized mocks for database interactions to isolate tests and improve reliability.

* refactor: Transition to Capability-Based Access Control

- Replaced role-based access checks with capability-based checks across various middleware and routes, enhancing permission management.
- Introduced `hasCapability` and `requireCapability` functions to streamline capability verification for user actions.
- Updated relevant routes and middleware to utilize the new capability system, ensuring consistent permission enforcement.
- Enhanced type definitions and added tests for the new capability functions, improving overall code reliability and maintainability.

* test: Enhance capability-based access tests for ADMIN role

- Updated tests to reflect the new capability-based access control, specifically for the ADMIN role.
- Modified test descriptions to clarify that users with the MANAGE_AGENTS capability can bypass permission checks.
- Seeded capabilities for the ADMIN role in multiple test files to ensure consistent permission checks across different routes and middleware.
- Improved overall test coverage for capability verification, ensuring robust permission management.

* test: Update capability tests for MCP server access

- Renamed test to reflect the correct capability for bypassing permission checks, changing from MANAGE_AGENTS to MANAGE_MCP_SERVERS.
- Updated seeding of capabilities for the ADMIN role to align with the new capability structure.
- Ensured consistency in capability definitions across tests and middleware for improved permission management.

* feat: Add hasConfigCapability for enhanced config access control

- Introduced `hasConfigCapability` function to check user permissions for managing or reading specific config sections.
- Updated middleware to export the new capability function, ensuring consistent access control across the application.
- Enhanced unit tests to cover various scenarios for the new capability, improving overall test coverage and reliability.

* fix: Update tenantId filter in createSystemGrantMethods

- Added a condition to set tenantId filter to { $exists: false } when tenantId is null, ensuring proper handling of cases where tenantId is not provided.
- This change improves the robustness of the system grant methods by explicitly managing the absence of tenantId in the filter logic.
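
A sketch of the filter logic, assuming null and undefined are treated alike. The helper name and type are hypothetical.

```typescript
// Hypothetical helper; the field name follows the commit message.
type TenantFilter = { tenantId: string } | { tenantId: { $exists: false } };

function buildTenantFilter(tenantId?: string | null): TenantFilter {
  // In MongoDB, { tenantId: null } matches both missing and explicit-null fields;
  // { $exists: false } matches only documents where the field is absent.
  return tenantId == null ? { tenantId: { $exists: false } } : { tenantId };
}
```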

* fix: account deletion capability check

- Updated the `canDeleteAccount` middleware to ensure that the `hasManageUsers` capability check only occurs if a user is present, preventing potential errors when the user object is undefined.
- This change improves the robustness of the account deletion logic by ensuring proper handling of user permissions.

* refactor: Optimize seeding of system grants for ADMIN role

- Replaced sequential capability granting with parallel execution using Promise.all in the seedSystemGrants function.
- This change improves performance and efficiency during the initialization of system grants, ensuring all capabilities are granted concurrently.

* refactor: Simplify systemGrantSchema index definition

- Removed the sparse option from the unique index on principalType, principalId, capability, and tenantId in the systemGrantSchema.
- This change streamlines the index definition, potentially improving query performance and clarity in the schema design.

* refactor: Reorganize role capability check in roles route

- Moved the capability check for reading roles to occur after parsing the roleName, improving code clarity and structure.
- This change ensures that the authorization logic is consistently applied before fetching role details, enhancing overall permission management.

* refactor: Remove unused ISystemGrant interface from systemCapabilities.ts

- Deleted the ISystemGrant interface as it was no longer needed, streamlining the code and improving clarity.
- This change helps reduce clutter in the file and focuses on relevant capabilities for the system.

* refactor: Migrate SystemCapabilities to data-schemas

- Replaced imports of SystemCapabilities from 'librechat-data-provider' with imports from '@librechat/data-schemas' across multiple files.
- This change centralizes the management of system capabilities, improving code organization and maintainability.

* refactor: Update account deletion middleware and capability checks

- Modified the `canDeleteAccount` middleware to ensure that the account deletion permission is only granted to users with the `MANAGE_USERS` capability, improving security and clarity in permission management.
- Enhanced error logging for unauthorized account deletion attempts, providing better insights into permission issues.
- Updated the `capabilities.ts` file to ensure consistent handling of user authentication checks, improving robustness in capability verification.
- Refined type definitions in `systemGrant.ts` and `systemGrantMethods.ts` to utilize the `PrincipalType` enum, enhancing type safety and code clarity.

* refactor: Extract principal ID normalization into a separate function

- Introduced `normalizePrincipalId` function to streamline the normalization of principal IDs based on their type, enhancing code clarity and reusability.
- Updated references in `createSystemGrantMethods` to utilize the new normalization function, improving maintainability and reducing code duplication.

* test: Add unit tests for principalId normalization in systemGrant

- Introduced tests for the `grantCapability`, `revokeCapability`, and `getCapabilitiesForPrincipal` methods to verify correct handling of principalId normalization between string and ObjectId formats.
- Enhanced the `capabilities.ts` middleware to utilize the `PrincipalType` enum for improved type safety.
- Added a new utility function `normalizePrincipalId` to streamline principal ID normalization logic, ensuring consistent behavior across the application.

* feat: Introduce capability implications and enhance system grant methods

- Added `CapabilityImplications` to define relationships between broader and implied capabilities, allowing for more intuitive permission checks.
- Updated `createSystemGrantMethods` to expand capability queries to include implied capabilities, improving authorization logic.
- Enhanced `systemGrantSchema` to include an `expiresAt` field for future TTL enforcement of grants, and added validation to ensure `tenantId` is not set to null.
- Documented authorization requirements for prompt group and prompt deletion methods to clarify access control expectations.
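
The query expansion can be sketched as below. The capability strings and the map contents are hypothetical; only the idea that a broader capability implies a narrower one comes from the commit message.

```typescript
// Hypothetical implication map: broader capability -> capabilities it implies.
const IMPLICATIONS: Record<string, string[]> = {
  'manage:agents': ['read:agents'],
  'manage:assistants': ['read:assistants'],
};

// Returns every capability whose grant would satisfy a check for `required`:
// the capability itself plus any broader capability that implies it.
function capabilitiesSatisfying(required: string): string[] {
  const result = [required];
  for (const broader in IMPLICATIONS) {
    if ((IMPLICATIONS[broader] ?? []).indexOf(required) !== -1) {
      result.push(broader);
    }
  }
  return result;
}
```

A grant lookup could then match any capability in the returned list instead of only the exact one requested.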

* test: Add unit tests for canDeleteAccount middleware

- Introduced unit tests for the `canDeleteAccount` middleware to verify account deletion permissions based on user roles and capabilities.
- Covered scenarios for both allowed and blocked account deletions, including checks for ADMIN users with the `MANAGE_USERS` capability and handling of undefined user cases.
- Enhanced test structure to ensure clarity and maintainability of permission checks in the middleware.

* fix: Add principalType enum validation to SystemGrant schema

Without enum validation, any string value was accepted for principalType
and silently stored. Invalid documents would never match capability
queries, creating phantom grants impossible to diagnose without raw DB
inspection. All other ACL models in the codebase validate this field.

* fix: Replace seedSystemGrants Promise.all with bulkWrite for concurrency safety

When two server instances start simultaneously (K8s rolling deploy, PM2
cluster), both call seedSystemGrants. With Promise.all + findOneAndUpdate
upsert, both instances may attempt to insert the same documents, causing
E11000 duplicate key errors that crash server startup.

bulkWrite with ordered:false handles concurrent upserts gracefully and
reduces 17 individual round trips to a single network call. The returned
documents (previously discarded) are no longer fetched.
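
A sketch of how the upsert operations might be built. The filter fields follow the index described earlier in this log, but the `$setOnInsert` payload, the `grantedAt` field, and the builder name are assumptions.

```typescript
// Hypothetical operation shape for a bulk upsert.
interface UpsertOp {
  updateOne: {
    filter: { principalType: string; principalId: string; capability: string };
    update: { $setOnInsert: { grantedAt: Date } };
    upsert: true;
  };
}

function buildSeedOps(role: string, capabilities: string[], now: Date): UpsertOp[] {
  return capabilities.map((capability) => ({
    updateOne: {
      filter: { principalType: 'role', principalId: role, capability },
      // $setOnInsert only writes when the upsert creates a new document.
      update: { $setOnInsert: { grantedAt: now } },
      upsert: true as const,
    },
  }));
}

// With Mongoose the array would be sent in one call, for example:
//   await SystemGrant.bulkWrite(ops, { ordered: false });
// Unordered execution lets the remaining operations proceed even if one fails.
```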

* perf: Add AsyncLocalStorage per-request cache for capability checks

Every hasCapability call previously required 2 DB round trips
(getUserPrincipals + SystemGrant.exists) — replacing what were O(1)
string comparisons. Routes like patchPromptGroup triggered this twice,
and hasConfigCapability's fallback path resolved principals twice.

This adds a per-request AsyncLocalStorage cache that:
- Caches resolved principals (same for all checks within one request)
- Caches capability check results (same user+cap = same answer)
- Automatically scoped to request lifetime (no stale grants)
- Falls through to DB when no store exists (background jobs, tests)
- Requires no signature changes to hasCapability

The capabilityContextMiddleware is registered at the app level before
all routes, initializing a fresh store per request.
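
A simplified sketch of a per-request cache built on Node's AsyncLocalStorage. All names are hypothetical, and a synchronous `lookup` callback stands in for the two database round trips.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical store: one result map per request.
type CapabilityStore = { results: Map<string, boolean> };
const capabilityContext = new AsyncLocalStorage<CapabilityStore>();

// An Express-style middleware would wrap each request's handling in this call.
function runWithCapabilityContext<T>(fn: () => T): T {
  return capabilityContext.run({ results: new Map() }, fn);
}

function hasCapabilityCached(
  userId: string,
  capability: string,
  lookup: (userId: string, capability: string) => boolean,
): boolean {
  const store = capabilityContext.getStore();
  if (!store) {
    // No request context (background job, test): fall through without caching.
    return lookup(userId, capability);
  }
  const key = `${userId}:${capability}`;
  const cached = store.results.get(key);
  if (cached !== undefined) {
    return cached;
  }
  const result = lookup(userId, capability);
  store.results.set(key, result); // negative results are cached too
  return result;
}
```

Because the store is created per `run` call, nothing survives past the request, which is why grants cannot go stale across requests.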

* fix: Add error handling for inline hasCapability calls

canDeleteAccount, fetchAssistants, and validateAuthor all call
hasCapability without try-catch. These were previously O(1) string
comparisons that could never throw. Now they hit the database and can
fail on connection timeout or transient errors.

Wrap each call in try-catch, defaulting to deny (false) on error.
This ensures a DB hiccup returns a clean 403 instead of an unhandled
500 with a stack trace.
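
The deny-by-default wrapper can be sketched as a small helper. The name `allowOrDeny` and the optional error callback are hypothetical; the commit wraps each call site inline.

```typescript
// Runs an async capability check and treats any failure as "not allowed".
async function allowOrDeny(
  check: () => Promise<boolean>,
  logError: (err: unknown) => void = () => {},
): Promise<boolean> {
  try {
    return await check();
  } catch (err) {
    // A transient DB error must not grant access or surface as an unhandled 500.
    logError(err);
    return false;
  }
}
```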

* test: Add canDeleteAccount DB-error resilience test

Tests that hasCapability rejection (e.g., DB timeout) results in a clean
403 rather than an unhandled exception. Validates the error handling
added in the previous commit.

* refactor: Use barrel import for hasCapability in validateAuthor

Import from ~/server/middleware barrel instead of directly from
~/server/middleware/roles/capabilities for consistency with other
non-middleware consumers. Files within the middleware barrel itself
must continue using direct imports to avoid circular requires.

* refactor: Remove misleading pre('save') hook from SystemGrant schema

The pre('save') hook normalized principalId for USER/GROUP principals,
but the primary write path (grantCapability) uses findOneAndUpdate —
which does not trigger save hooks. The normalization was already handled
explicitly in grantCapability itself. The hook created a false impression
of schema-level enforcement that only covered save()/create() paths.

Replace with a comment documenting that all writes must go through
grantCapability.

* feat: Add READ_ASSISTANTS capability to complete manage/read pair

Every other managed resource had a paired READ_X / MANAGE_X capability
except assistants. This adds READ_ASSISTANTS and registers the
MANAGE_ASSISTANTS → READ_ASSISTANTS implication in CapabilityImplications,
enabling future read-only assistant visibility grants.

* chore: Reorder systemGrant methods for clarity

Moved hasCapabilityForPrincipals to a more logical position in the returned object of createSystemGrantMethods, improving code readability. This change also maintains the inclusion of seedSystemGrants in the export, ensuring all necessary methods are available.

* fix: Wrap seedSystemGrants in try-catch to avoid blocking startup

Seeding capabilities is idempotent and will succeed on the next restart.
A transient DB error during seeding should not prevent the server from
starting — log the error and continue.

* refactor: Improve capability check efficiency and add audit logging

Move hasCapability calls after cheap early-exits in validateAuthor and
fetchAssistants so the DB check only runs when its result matters. Add
logger.debug on every capability bypass grant across all 7 call sites
for auditability, and log errors in catch blocks instead of silently
swallowing them.

* test: Add integration tests for AsyncLocalStorage capability caching

Exercises the full vertical — ALS context, generateCapabilityCheck,
real getUserPrincipals, real hasCapabilityForPrincipals, real MongoDB
via MongoMemoryServer. Covers per-request caching, cross-context
isolation, concurrent request isolation, negative caching, capability
implications, tenant scoping, group-based grants, and requireCapability
middleware.

* test: Add systemGrant data-layer and ALS edge-case integration tests

systemGrant.spec.ts (51 tests): Full integration tests for all
systemGrant methods against real MongoDB — grant/revoke lifecycle,
principalId normalization (string→ObjectId for USER/GROUP, string for
ROLE), capability implications (both directions), tenant scoping,
schema validation (null tenantId, invalid enum, required fields,
unique compound index).

capabilities.integration.spec.ts (27 tests): Adds ALS edge cases —
missing context degrades gracefully with no caching (background jobs,
child processes), nested middleware creates independent inner context,
optional-chaining safety when store is undefined, mid-request grant
changes are invisible due to result caching, requireCapability works
without ALS, and interleaved concurrent contexts maintain isolation.

* fix: Add worker thread guards to capability ALS usage

Detect when hasCapability or capabilityContextMiddleware is called from
a worker thread (where ALS context does not propagate from the parent).
hasCapability logs a warn-once per factory instance; the middleware logs
an error since mounting Express middleware in a worker is likely a
misconfiguration. Both continue to function correctly — the guard is
observability, not a hard block.

* fix: Include tenantId in ALS principal cache key for tenant isolation

The principal cache key was user.id:user.role, which would reuse
cached principals across tenants for the same user within a request.
When getUserPrincipals gains tenant-scoped group resolution, principals
from tenant-a would incorrectly serve tenant-b checks. Changed to
user.id:user.role:user.tenantId to prevent cross-tenant cache hits.
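
A sketch of the key construction. The `UserSketch` type and the empty-string placeholder for a missing tenantId are assumptions.

```typescript
// Hypothetical user shape.
interface UserSketch {
  id: string;
  role: string;
  tenantId?: string;
}

// Including tenantId keeps principals resolved for one tenant from being
// reused for another tenant within the same request.
function principalCacheKey(user: UserSketch): string {
  return `${user.id}:${user.role}:${user.tenantId ?? ''}`;
}
```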

Adds integration test proving separate principal lookups per tenantId.

* test: Remove redundant mocked capabilities.spec.js

The JS wrapper test (7 tests, all mocked) is a strict subset of
capabilities.integration.spec.ts (28 tests, real MongoDB). Every
scenario it covered — hasCapability true/false, tenantId passthrough,
requireCapability 403/500, error handling — is tested with higher
fidelity in the integration suite.

* test: Replace mocked canDeleteAccount tests with real MongoDB integration

Remove hasCapability mock — tests now exercise the full capability
chain against real MongoDB (getUserPrincipals, hasCapabilityForPrincipals,
SystemGrant collection). Only mocks remaining are logger and cache.

Adds new coverage: admin role without grant is blocked, user-level
grant bypasses deletion restriction, null user handling.

* test: Add comprehensive tests for ACL entry management and user group methods

Introduces new tests for `deleteAclEntries`, `bulkWriteAclEntries`, and `findPublicResourceIds` in `aclEntry.spec.ts`, ensuring proper functionality for deleting and bulk managing ACL entries. Additionally, enhances `userGroup.spec.ts` with tests for finding groups by ID and name pattern, including external ID matching and source filtering. These changes improve coverage and validate the integrity of ACL and user group operations against real MongoDB interactions.

* refactor: Update capability checks and logging for better clarity and error handling

Replaced `MANAGE_USERS` with `ACCESS_ADMIN` in the `canDeleteAccount` middleware and related tests to align with updated permission structure. Enhanced logging in various middleware functions to use `logger.warn` for capability check failures, providing clearer error messages. Additionally, refactored capability checks in the `patchPromptGroup` and `validateAuthor` functions to improve readability and maintainability. This commit also includes adjustments to the `systemGrant` methods to implement retry logic for transient failures during capability seeding, ensuring robustness in the face of database errors.

* refactor: Enhance logging and retry logic in seedSystemGrants method

Updated the logging format in the seedSystemGrants method to include error messages for better clarity. Improved the retry mechanism by explicitly mocking multiple failures in tests, ensuring robust error handling during transient database issues. Additionally, refined imports in the systemGrant schema for better type management.

* refactor: Consolidate imports in canDeleteAccount middleware

Merged logger and SystemCapabilities imports from the data-schemas module into a single line for improved readability and maintainability of the code. This change streamlines the import statements in the canDeleteAccount middleware.

* test: Enhance systemGrant tests for error handling and capability validation

Added tests to the systemGrant methods to handle various error scenarios, including E11000 race conditions, invalid ObjectId strings for USER and GROUP principals, and invalid capability strings. These enhancements improve the robustness of the capability granting and revoking logic, ensuring proper error propagation and validation of inputs.

* fix: Wrap hasCapability calls in deny-by-default try-catch at remaining sites

canAccessResource, files.js, and roles.js all had hasCapability inside
outer try-catch blocks that returned 500 on DB failure instead of
falling through to the regular ACL check. This contradicts the
deny-by-default pattern used everywhere else.

Also removes raw error.message from the roles.js 500 response to
prevent internal host/connection info leaking to clients.

* fix: Normalize user ID in canDeleteAccount before passing to hasCapability

requireCapability normalizes req.user.id via _id?.toString() fallback,
but canDeleteAccount passed raw req.user directly. If req.user.id is
absent (some auth layers only populate _id), getUserPrincipals received
undefined, silently returning empty principals and blocking the bypass.
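
A sketch of the normalization, assuming the user object may carry either a string `id` or an `_id` with a `toString()` method. The helper name is hypothetical.

```typescript
// Prefers the string id and falls back to _id.toString() when id is absent.
function resolveUserId(
  user: { id?: string; _id?: { toString(): string } } | undefined,
): string | undefined {
  return user?.id ?? user?._id?.toString();
}
```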

* fix: Harden systemGrant schema and type safety

- Reject empty string tenantId in schema validator (was only blocking
  null; empty string silently orphaned documents)
- Fix reverseImplications to use BaseSystemCapability[] instead of
  string[], preserving the narrow discriminated type
- Document READ_ASSISTANTS as reserved/unenforced

* test: Use fake timers for seedSystemGrants retry tests and add tenantId validation

- Switch retry tests to jest.useFakeTimers() to eliminate 3+ seconds
  of real setTimeout delays per test run
- Add regression test for empty-string tenantId rejection

* docs: Add TODO(#12091) comments for tenant-scoped capability gaps

In multi-tenant mode, platform-level grants (no tenantId) won't match
tenant-scoped queries, breaking admin access. getUserPrincipals also
returns cross-tenant group memberships. Both need fixes in #12091.
2026-03-21 14:28:54 -04:00
Danny Avila
8ba2bde5c1
📦 refactor: Consolidate DB models, encapsulating Mongoose usage in data-schemas (#11830)
* chore: move database model methods to /packages/data-schemas

* chore: add TypeScript ESLint rule to warn on unused variables

* refactor: model imports to streamline access

- Consolidated model imports across various files to improve code organization and reduce redundancy.
- Updated imports for models such as Assistant, Message, Conversation, and others to a unified import path.
- Adjusted middleware and service files to reflect the new import structure, ensuring functionality remains intact.
- Enhanced test files to align with the new import paths, maintaining test coverage and integrity.

* chore: migrate database models to packages/data-schemas and refactor all direct Mongoose Model usage outside of data-schemas

* test: update agent model mocks in unit tests

- Added `getAgent` mock to `client.test.js` to enhance test coverage for agent-related functionality.
- Removed redundant `getAgent` and `getAgents` mocks from `openai.spec.js` and `responses.unit.spec.js` to streamline test setup and reduce duplication.
- Ensured consistency in agent mock implementations across test files.

* fix: update types in data-schemas

* refactor: enhance type definitions in transaction and spending methods

- Updated type definitions in `checkBalance.ts` to use specific request and response types.
- Refined `spendTokens.ts` to utilize a new `SpendTxData` interface for better clarity and type safety.
- Improved transaction handling in `transaction.ts` by introducing `TransactionResult` and `TxData` interfaces, ensuring consistent data structures across methods.
- Adjusted unit tests in `transaction.spec.ts` to accommodate new type definitions and enhance robustness.

* refactor: streamline model imports and enhance code organization

- Consolidated model imports across various controllers and services to a unified import path, improving code clarity and reducing redundancy.
- Updated multiple files to reflect the new import structure, ensuring all functionalities remain intact.
- Enhanced overall code organization by removing duplicate import statements and optimizing the usage of model methods.

* feat: implement loadAddedAgent and refactor agent loading logic

- Introduced `loadAddedAgent` function to handle loading agents from added conversations, supporting multi-convo parallel execution.
- Created a new `load.ts` file to encapsulate agent loading functionalities, including `loadEphemeralAgent` and `loadAgent`.
- Updated the `index.ts` file to export the new `load` module instead of the deprecated `loadAgent`.
- Enhanced type definitions and improved error handling in the agent loading process.
- Adjusted unit tests to reflect changes in the agent loading structure and ensure comprehensive coverage.

* refactor: enhance balance handling with new update interface

- Introduced `IBalanceUpdate` interface to streamline balance update operations across the codebase.
- Updated `upsertBalanceFields` method signatures in `balance.ts`, `transaction.ts`, and related tests to utilize the new interface for improved type safety.
- Adjusted type imports in `balance.spec.ts` to include `IBalanceUpdate`, ensuring consistency in balance management functionalities.
- Enhanced overall code clarity and maintainability by refining type definitions related to balance operations.

* feat: add unit tests for loadAgent functionality and enhance agent loading logic

- Introduced comprehensive unit tests for the `loadAgent` function, covering various scenarios including null and empty agent IDs, loading of ephemeral agents, and permission checks.
- Enhanced the `initializeClient` function by moving `getConvoFiles` to the correct position in the database method exports, ensuring proper functionality.
- Improved test coverage for agent loading, including handling of non-existent agents and user permissions.

* chore: reorder memory method exports for consistency

- Moved `deleteAllUserMemories` to the correct position in the exported memory methods, ensuring a consistent and logical order of method exports in `memory.ts`.
2026-03-21 14:28:53 -04:00
Danny Avila
58f128bee7
🗑️ chore: Remove Deprecated Project Model and Associated Fields (#11773)
* chore: remove projects and projectIds usage

* chore: empty line linting

* chore: remove isCollaborative property across agent models and related tests

- Removed the isCollaborative property from agent models, controllers, and tests, as it is deprecated in favor of ACL permissions.
- Updated related validation schemas and data provider types to reflect this change.
- Ensured all references to isCollaborative were stripped from the codebase to maintain consistency and clarity.
2026-03-21 14:28:53 -04:00
Danny Avila
365a0dc0f6
🩺 refactor: Surface Descriptive OCR Error Messages to Client (#12344)
* fix: pass along error message when OCR fails

Right now, if OCR fails, it just says "Error processing file" which
isn't very helpful.

The `error.message` does have helpful information in it, but our
filter wasn't including the right case to pass it along. Now it does!

* fix: extract shared upload error filter, apply to images route

The 'Unable to extract text from' error was only allowlisted in the
files route but not the images route, which also calls
processAgentFileUpload. Extract the duplicated error filter logic
into a shared resolveUploadErrorMessage utility in packages/api so
both routes stay in sync.

---------

Co-authored-by: Dan Lew <daniel@mightyacorn.com>
2026-03-20 17:10:25 -04:00
Danny Avila
9a64791e3e
🪢 fix: Action Domain Encoding Collision for HTTPS URLs (#12271)
* fix: strip protocol from domain before encoding in `domainParser`

All https:// (and http://) domains produced the same 10-char base64
prefix due to ENCODED_DOMAIN_LENGTH truncation, causing tool name
collisions for agents with multiple actions.

Strip the protocol before encoding so the base64 key is derived from
the hostname. Add `legacyDomainEncode` to preserve the old encoding
logic for backward-compatible matching of existing stored actions.
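
To illustrate the collision, here is a simplified sketch. `ENCODED_LENGTH`, `truncatedKey`, and `withoutProtocol` are hypothetical names, and the real `domainParser` does more than this; the point is only that the first 10 base64 characters of any `https://` URL encode the scheme, not the hostname:

```typescript
// Assumption: mirrors the "10-char base64 prefix" described above.
const ENCODED_LENGTH = 10;

// btoa handles ASCII input only, which is enough for this illustration.
function truncatedKey(value: string): string {
  return btoa(value).substring(0, ENCODED_LENGTH);
}

function withoutProtocol(domain: string): string {
  return domain.replace(/^https?:\/\//, '');
}

const first = 'https://api.example.com';
const second = 'https://weather.example.org';

// Keys derived from the full URL are identical for both domains.
const collides = truncatedKey(first) === truncatedKey(second);
// Keys derived from the hostname differ.
const distinct =
  truncatedKey(withoutProtocol(first)) !== truncatedKey(withoutProtocol(second));
```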

* fix: backward-compatible tool matching in ToolService

Update `getActionToolDefinitions` to match stored tools against both
new and legacy domain encodings. Update `loadActionToolsForExecution`
to resolve model-called tool names via a `normalizedToDomain` map
that includes both encoding variants, with legacy fallback for
request builder lookup.

* fix: action route save/delete domain encoding issues

Save routes now remove old tools matching either new or legacy domain
encoding, preventing stale entries when an action's encoding changes
on update.

Delete routes no longer re-encode the already-encoded domain extracted
from the stored actions array, which was producing incorrect keys and
leaving orphaned tools.

* test: comprehensive coverage for action domain encoding

Rewrite ActionService tests to cover real matching patterns used by
ToolService and action routes. Tests verify encode/decode round-trips,
protocol stripping, backward-compatible tool name matching at both
definition and execution phases, save-route cleanup of old/new
encodings, delete-route domain extraction, and the collision fix for
multi-action agents.

* fix: add legacy domain compat to all execution paths, make legacyDomainEncode sync

CRITICAL: processRequiredActions (assistants path) was not updated with
legacy domain matching — existing assistants with https:// domain actions
would silently fail post-deployment because domainMap only had new encoding.

MAJOR: loadAgentTools definitionsOnly=false path had the same issue.

Both now use a normalizedToDomain map with legacy+new entries and extract
function names via the matched key (not the canonical domain).

Also: make legacyDomainEncode synchronous (no async operations), store
legacyNormalized in processedActionSets to eliminate recomputation in
the per-tool fallback, and hoist domainSeparatorRegex to module level.

* refactor: clarify domain variable naming and tool-filter helpers in action routes

Rename shadowed 'domain' to 'encodedDomain' to separate raw URL from
encoded key in both agent and assistant save routes.

Rename shouldRemoveTool to shouldRemoveAgentTool / shouldRemoveAssistantTool
to make the distinct data-shape guards explicit.

Remove await on now-synchronous legacyDomainEncode.

* test: expand coverage for all review findings

- Add validateAndUpdateTool tests (protocol-stripping match logic)
- Restore unicode domain encode/decode/round-trip tests
- Add processRequiredActions matching pattern tests (assistants path)
- Add legacy guard skip test for short bare hostnames
- Add pre-normalized Set test for definition-phase optimization
- Fix corrupt-cache test to assert typeof instead of toBeDefined
- Verify legacyDomainEncode is synchronous (not a Promise)
- Remove all await on legacyDomainEncode (now sync)

58 tests, up from 44.

* fix: address follow-up review findings A-E

A: Fix stale JSDoc @returns {Promise<string>} on now-synchronous
   legacyDomainEncode — changed to @returns {string}.

B: Rename normalizedToDomain to domainLookupMap in processRequiredActions
   and loadAgentTools where keys are raw encoded domains (not normalized),
   avoiding confusion with loadActionToolsForExecution where keys ARE
   normalized.

C: Pre-normalize actionToolNames into a Set<string> in
   getActionToolDefinitions, replacing O(signatures × tools) per-check
   .some() + .replace() with O(1) Set.has() lookups.

D: Remove stripProtocol from ActionService exports — it is a one-line
   internal helper. Spec tests for it removed; behavior is fully covered
   by domainParser protocol-stripping tests.

E: Fix pre-existing bug where processRequiredActions re-loaded action
   sets on every missing-tool iteration. The guard !actionSets.length
   always re-triggered because actionSets was reassigned to a plain
   object (whose .length is undefined). Replaced with a null-check
   on a dedicated actionSetsData variable.
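
Finding E can be reproduced in isolation. In this sketch `loadActionSets` and `getActionSets` are hypothetical stand-ins for the real loader; only the `actionSetsData` variable name is taken from the description above:

```typescript
type ActionSet = { domain: string };

let loads = 0;
// Hypothetical loader standing in for the real action-set query.
function loadActionSets(): Record<string, ActionSet> {
  loads += 1;
  return { 'example.com': { domain: 'example.com' } };
}

// A plain object has no `length`, so a guard like `!sets.length` is always true.
const sets = loadActionSets();
const guardAlwaysFires = !sets['length'];

// Null-check on a dedicated variable: loads once, then reuses the result.
let actionSetsData: Record<string, ActionSet> | null = null;
function getActionSets(): Record<string, ActionSet> {
  if (actionSetsData === null) {
    actionSetsData = loadActionSets();
  }
  return actionSetsData;
}
```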

* fix: strip path and query from domain URLs in stripProtocol

URLs like 'https://api.example.com/v1/endpoint?foo=bar' previously
retained the path after protocol stripping, contaminating the encoded
domain key. Now strips everything after the first '/' following the
host, using string indexing instead of URL parsing to avoid punycode
normalization of unicode hostnames.
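
A sketch of the string-indexing approach, written for illustration only (`hostOnly` is a hypothetical name, not the project's `stripProtocol`):

```typescript
// Hypothetical re-implementation for illustration, not the project's helper.
function hostOnly(domain: string): string {
  const schemeEnd = domain.indexOf('://');
  const start = schemeEnd === -1 ? 0 : schemeEnd + 3;
  // Cut at the first '/' after the host, which drops the path and query.
  const pathStart = domain.indexOf('/', start);
  return pathStart === -1 ? domain.slice(start) : domain.slice(start, pathStart);
}

// String indexing keeps a unicode hostname as written...
const kept = hostOnly('https://münchen.example/v1');
// ...whereas URL parsing converts it to its punycode form.
const parsed = new URL('https://münchen.example/v1').hostname;
```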

Closes Copilot review comments 1, 2, and 5.
2026-03-17 01:38:51 -04:00
Danny Avila
0c27ad2d55
🛡️ refactor: Scope Action Mutations by Parent Resource Ownership (#12237)
* 🛡️ fix: Scope action mutations by parent resource ownership

Prevent cross-tenant action overwrites by validating that an existing
action's agent_id/assistant_id matches the URL parameter before allowing
updates or deletes. Without this, a user with EDIT access on their own
agent could reference a foreign action_id to hijack another agent's
action record.

* 🛡️ fix: Harden action ownership checks and scope write filters

- Remove && short-circuit that bypassed the guard when agent_id or
  assistant_id was falsy (e.g. assistant-owned actions have no agent_id,
  so the check was skipped entirely on the agents route).
- Include agent_id / assistant_id in the updateAction and deleteAction
  query filters so the DB write itself enforces ownership atomically.
- Log a warning when deleteAction returns null (silent no-op from
  data-integrity mismatch).
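
The effect of putting the parent ID into the write filter can be sketched with an in-memory store. `updateOne` here is a hypothetical stand-in for a find-and-update call, and the document shape is assumed:

```typescript
interface ActionDoc {
  action_id: string;
  agent_id?: string;
  metadata: { domain: string };
}

type ActionFilter = { action_id: string; agent_id?: string };

// Hypothetical in-memory stand-in for a find-and-update call:
// every field in `filter` must match for the write to apply.
function updateOne(store: ActionDoc[], filter: ActionFilter, domain: string): ActionDoc | null {
  const keys = Object.keys(filter) as (keyof ActionFilter)[];
  const doc = store.find((d) => keys.every((k) => d[k] === filter[k]));
  if (!doc) {
    return null; // no match: nothing is written
  }
  doc.metadata = { domain };
  return doc;
}
```

With `agent_id` in the filter, a request that names another agent's `action_id` matches no document, so the ownership check and the write cannot drift apart.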

* 📝 docs: Update Action model JSDoc to reflect scoped query params

* test: Add Action ownership scoping tests

Cover update, delete, and cross-type protection scenarios using
MongoMemoryServer to verify that scoped query filters (agent_id,
assistant_id) prevent cross-tenant overwrites and deletions at the
database level.

* 🛡️ fix: Scope updateAction filter in agent duplication handler

* 🐛 fix: Use action metadata domain instead of action_id when duplicating agent actions

The duplicate handler was splitting `action.action_id` by `actionDelimiter`
to extract the domain, but `action_id` is a bare nanoid that doesn't
contain the delimiter. This produced malformed entries in the duplicated
agent's actions array (nanoid_action_newNanoid instead of
domain_action_newNanoid). The domain is available on `action.metadata.domain`.

* test: Add integration tests for agent duplication action handling

Uses MongoMemoryServer with real Agent and Action models to verify:
- Duplicated actions use metadata.domain (not action_id) for the
  agent actions array entries
- Sensitive metadata fields are stripped from duplicated actions
- Original action documents are not modified
2026-03-15 10:19:29 -04:00
Danny Avila
7bc793b18d
🌊 fix: Prevent Buffered Event Duplication on SSE Resume Connections (#12225)
* fix: skipBufferReplay for job resume connections

- Introduced a new option `skipBufferReplay` in the `subscribe` method of `GenerationJobManagerClass` to prevent duplication of events when resuming a connection.
- Updated the logic to conditionally skip replaying buffered events if a sync event has already been sent, enhancing the efficiency of event handling during reconnections.
- Added integration tests to verify the correct behavior of the new option, ensuring that no buffered events are replayed when `skipBufferReplay` is true, while still allowing for normal replay behavior when false.

* refactor: Update GenerationJobManager to handle sync events more efficiently

- Modified the `subscribe` method to utilize a new `skipBufferReplay` option, allowing for the prevention of duplicate events during resume connections.
- Enhanced the logic in the `chat/stream` route to conditionally skip replaying buffered events if a sync event has already been sent, improving event handling efficiency.
- Updated integration tests to verify the correct behavior of the new option, ensuring that no buffered events are replayed when `skipBufferReplay` is true, while maintaining normal replay behavior when false.

* test: Enhance GenerationJobManager integration tests for Redis mode

- Updated integration tests to conditionally run based on the USE_REDIS environment variable, allowing for better control over Redis-related tests.
- Refactored test descriptions to utilize a dynamic `describeRedis` function, improving clarity and organization of tests related to Redis functionality.
- Removed redundant checks for Redis availability within individual tests, streamlining the test logic and enhancing readability.

* fix: sync handler state for new messages on resume

The sync event's else branch (new response message) was missing
resetContentHandler() and syncStepMessage() calls, leaving stale
handler state that caused subsequent deltas to build on partial
content instead of the synced aggregatedContent.

* feat: atomic subscribeWithResume to close resume event gap

Replaces separate getResumeState() + subscribe() calls with a single
subscribeWithResume() that atomically drains earlyEventBuffer between
the resume snapshot and the subscribe. In in-memory mode, drained events
are returned as pendingEvents for the client to replay after sync.
In Redis mode, pendingEvents is empty since chunks are already persisted.

The route handler now uses the atomic method for resume connections and
extracted shared SSE write helpers to reduce duplication. The client
replays any pendingEvents through the existing step/content handlers
after applying aggregatedContent from the sync payload.

* fix: only capture gap events in subscribeWithResume, not pre-snapshot buffer

The previous implementation drained the entire earlyEventBuffer into
pendingEvents, but pre-snapshot events are already reflected in
aggregatedContent. Replaying them re-introduced the duplication bug
through a different vector.

Now records buffer length before getResumeState() and slices from that
index, so only events arriving during the async gap are returned as
pendingEvents.
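
The length-then-slice idea can be sketched like this. It is a simplification: the real snapshot is async, while here `takeSnapshot` is a synchronous callback during which events may be appended. `resumeWithGapEvents` and `StreamEvent` are hypothetical names:

```typescript
type StreamEvent = { seq: number };

// Simplified model: events may be appended to the buffer while takeSnapshot runs.
function resumeWithGapEvents<S>(
  earlyEventBuffer: StreamEvent[],
  takeSnapshot: () => S,
): { resumeState: S; pendingEvents: StreamEvent[] } {
  // Everything before this index is assumed to be reflected in the snapshot.
  const lengthBefore = earlyEventBuffer.length;
  const resumeState = takeSnapshot();
  // Only events that arrived during the gap are returned for replay.
  const pendingEvents = earlyEventBuffer.slice(lengthBefore);
  return { resumeState, pendingEvents };
}
```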

Also:
- Handle pendingEvents when resumeState is null (replay directly)
- Hoist duplicate test helpers to shared scope
- Remove redundant writableEnded guard in onDone
2026-03-14 10:54:26 -04:00
Danny Avila
35a35dc2e9
📏 refactor: Add File Size Limits to Conversation Imports (#12221)
* fix: add file size limits to conversation import multer instance

* fix: address review findings for conversation import file size limits

* fix: use local jest.mock for data-schemas instead of global moduleNameMapper

The global @librechat/data-schemas mock in jest.config.js only provided
logger, breaking all tests that depend on createModels from the same
package. Replace with a virtual jest.mock scoped to the import spec file.

* fix: move import to top of file, pre-compute upload middleware, assert logger.warn in tests

* refactor: move resolveImportMaxFileSize to packages/api

New backend logic belongs in packages/api as TypeScript. Delete the
api/server/utils/import/limits.js wrapper and import directly from
@librechat/api in convos.js and importConversations.js. Resolver unit
tests move to packages/api; the api/ spec retains only multer behavior
tests.

* chore: rename importLimits to import

* fix: stale type reference and mock isolation in import tests

Update typeof import path from '../importLimits' to '../import' after
the rename. Clear mockLogger.warn in beforeEach to prevent cross-test
accumulation.

* fix: add resolveImportMaxFileSize to @librechat/api mock in convos.spec.js

* fix: resolve jest.mock hoisting issue in import tests

jest.mock factories are hoisted above const declarations, so the
mockLogger reference was undefined at factory evaluation time. Use a
direct import of the mocked logger module instead.

* fix: remove virtual flag from data-schemas mock for CI compatibility

virtual: true prevents the mock from intercepting the real module in
CI where @librechat/data-schemas is built, causing import.ts to use
the real logger while the test asserts against the mock.
2026-03-14 03:06:29 -04:00
Danny Avila
c6982dc180
🛡️ fix: Agent Permission Check on Image Upload Route (#12219)
* fix: add agent permission check to image upload route

* refactor: remove unused SystemRoles import and format test file for clarity

* fix: address review findings for image upload agent permission check

* refactor: move agent upload auth logic to TypeScript in packages/api

Extract pure authorization logic from agentPermCheck.js into
checkAgentUploadAuth() in packages/api/src/files/agentUploadAuth.ts.
The function returns a structured result ({ allowed, status, error })
instead of writing HTTP responses directly, eliminating the dual
responsibility and confusing sentinel return value. The JS wrapper
in /api is now a thin adapter that translates the result to HTTP.

* test: rewrite image upload permission tests as integration tests

Replace mock-heavy images-agent-perm.spec.js with integration tests
using MongoMemoryServer, real models, and real PermissionService.
Follows the established pattern in files.agents.test.js. Moves test
to sibling location (images.agents.test.js) matching backend convention.
Adds temp file cleanup assertions on 403/404 responses and covers
message_file exemption paths (boolean true, string "true", false).

* fix: widen AgentUploadAuthDeps types to accept ObjectId from Mongoose

The injected getAgent returns Mongoose documents where _id and author
are Types.ObjectId at runtime, not string. Widen the DI interface to
accept string | Types.ObjectId for _id, author, and resourceId so the
contract accurately reflects real callers.

* chore: move agent upload auth into files/agents/ subdirectory

* refactor: delete agentPermCheck.js wrapper, move verifyAgentUploadPermission to packages/api

The /api-only dependencies (getAgent, checkPermission) are now passed
as object-field params from the route call sites. Both images.js and
files.js import verifyAgentUploadPermission from @librechat/api and
inject the deps directly, eliminating the intermediate JS wrapper.

* style: fix import type ordering in agent upload auth

* fix: prevent token TTL race in MCPTokenStorage.storeTokens

When expires_in is provided, use it directly instead of round-tripping
through Date arithmetic. The previous code computed accessTokenExpiry
as a Date, then after an async encryptV2 call, recomputed expiresIn by
subtracting Date.now(). On loaded CI runners the elapsed time caused
Math.floor to truncate to 0, triggering the 1-year fallback and making
the token appear permanently valid — so refresh never fired.
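
The truncation can be reproduced with plain arithmetic. This sketch assumes a short-lived token and an exact fallback of 365 days; the function names are hypothetical:

```typescript
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // assumed value of the 1-year fallback

// Round-trip through a timestamp: elapsed time can floor the remaining
// seconds to 0, which is falsy and selects the fallback.
function ttlViaDate(expiresIn: number, issuedAtMs: number, nowMs: number): number {
  const expiryMs = issuedAtMs + expiresIn * 1000;
  const remaining = Math.floor((expiryMs - nowMs) / 1000);
  return remaining || ONE_YEAR_SECONDS;
}

// Using expires_in directly does not depend on how long encryption took.
function ttlDirect(expiresIn: number | undefined): number {
  return expiresIn ?? ONE_YEAR_SECONDS;
}
```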
2026-03-14 02:57:56 -04:00
Danny Avila
71a3b48504
🔑 fix: Require OTP Verification for 2FA Re-Enrollment and Backup Code Regeneration (#12223)
* fix: require OTP verification for 2FA re-enrollment and backup code regeneration

* fix: require OTP verification for account deletion when 2FA is enabled

* refactor: Improve code formatting and readability in TwoFactorController and UserController

- Reformatted code in TwoFactorController and UserController for better readability by aligning parameters and breaking long lines.
- Updated test cases in deleteUser.spec.js and TwoFactorController.spec.js to enhance clarity by formatting object parameters consistently.

* refactor: Consolidate OTP and backup code verification logic in TwoFactorController and UserController

- Introduced a new `verifyOTPOrBackupCode` function to streamline the verification process for TOTP tokens and backup codes across multiple controllers.
- Updated the `enable2FA`, `disable2FA`, and `deleteUserController` methods to utilize the new verification function, enhancing code reusability and readability.
- Adjusted related tests to reflect the changes in verification logic, ensuring consistent behavior across different scenarios.
- Improved error handling and response messages for verification failures, providing clearer feedback to users.

* chore: linting

* refactor: Update BackupCodesItem component to enhance OTP verification logic

- Consolidated OTP input handling by moving the 2FA verification UI logic to a more consistent location within the component.
- Improved the state management for OTP readiness, ensuring the regenerate button is only enabled when the OTP is ready.
- Cleaned up imports by removing redundant type imports, enhancing code clarity and maintainability.

* chore: lint

* fix: stage 2FA re-enrollment in pending fields to prevent disarmament window

enable2FA now writes to pendingTotpSecret/pendingBackupCodes instead of
overwriting the live fields. confirm2FA performs the atomic swap only after
the new TOTP code is verified. If the user abandons mid-flow, their
existing 2FA remains active and intact.
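
A minimal sketch of the staging pattern, using plain objects instead of database writes. `pendingTotpSecret` and `pendingBackupCodes` come from the description above; the live field names and both function names are assumptions:

```typescript
interface TwoFactorState {
  totpSecret?: string; // assumed name of the live secret field
  backupCodes?: string[]; // assumed name of the live backup-code field
  pendingTotpSecret?: string;
  pendingBackupCodes?: string[];
}

// Re-enrollment writes only to the pending fields; live 2FA stays armed.
function stageEnrollment(user: TwoFactorState, secret: string, codes: string[]): TwoFactorState {
  return { ...user, pendingTotpSecret: secret, pendingBackupCodes: codes };
}

// The swap happens only after the new TOTP code has been verified.
function confirmEnrollment(user: TwoFactorState, codeIsValid: boolean): TwoFactorState {
  if (!codeIsValid || user.pendingTotpSecret === undefined) {
    return user;
  }
  return { totpSecret: user.pendingTotpSecret, backupCodes: user.pendingBackupCodes ?? [] };
}
```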
2026-03-14 01:51:31 -04:00
Danny Avila
189cdf581d
🔐 fix: Add User Filter to Message Deletion (#12220)
* fix: add user filter to message deletion to prevent IDOR

* refactor: streamline DELETE request syntax in messages-delete test

- Simplified the DELETE request syntax in the messages-delete.spec.js test file by combining multiple lines into a single line for improved readability. This change enhances the clarity of the test code without altering its functionality.

* fix: address review findings for message deletion IDOR fix

* fix: add user filter to message deletion in conversation tests

- Included a user filter in the message deletion test to ensure proper handling of user-specific deletions, enhancing the accuracy of the test case and preventing potential IDOR vulnerabilities.

* chore: lint
2026-03-13 23:42:37 -04:00
Danny Avila
ca79a03135
🚦 fix: Add Rate Limiting to Conversation Duplicate Endpoint (#12218)
* fix: add rate limiting to conversation duplicate endpoint

* chore: linter

* fix: address review findings for conversation duplicate rate limiting

* refactor: streamline test mocks for conversation routes

- Consolidated mock implementations into a dedicated `convos-route-mocks.js` file to enhance maintainability and readability of test files.
- Updated tests in `convos-duplicate-ratelimit.spec.js` and `convos.spec.js` to utilize the new mock structure, improving clarity and reducing redundancy.
- Enhanced the `duplicateConversation` function to accept an optional title parameter for better flexibility in conversation duplication.

* chore: rename files
2026-03-13 23:40:44 -04:00
Danny Avila
fa9e1b228a
🪪 fix: MCP API Responses and OAuth Validation (#12217)
* 🔒 fix: Validate MCP Configs in Server Responses

* 🔒 fix: Enhance OAuth URL Validation in MCPOAuthHandler

- Introduced validation for OAuth URLs to ensure they do not target private or internal addresses, enhancing security against SSRF attacks.
- Updated the OAuth flow to validate both authorization and token URLs before use, ensuring compliance with security standards.
- Refactored redirect URI handling to streamline the OAuth client registration process.
- Added comprehensive error handling for invalid URLs, improving robustness in OAuth interactions.
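
A rough sketch of what such a URL check might look like. It covers only `localhost` and literal IPv4 addresses; DNS resolution and IPv6 are out of scope here, and `isDisallowedOAuthUrl` is a hypothetical name, not the handler's actual validator:

```typescript
// Literal-IPv4 and localhost checks only; not a complete SSRF defense.
function isDisallowedOAuthUrl(rawUrl: string): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname;
  } catch {
    return true; // unparseable URLs are rejected
  }
  if (host === 'localhost') {
    return true;
  }
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) {
    return false;
  }
  const a = Number(match[1]);
  const b = Number(match[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) || // link-local, includes cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}
```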

* 🔒 feat: Implement Permission Checks for MCP Server Management

- Added permission checkers for MCP server usage and creation, enhancing access control.
- Updated routes for reinitializing MCP servers and retrieving authentication values to include these permission checks, ensuring only authorized users can access these functionalities.
- Refactored existing permission logic to improve clarity and maintainability.

* 🔒 fix: Enhance MCP Server Response Validation and Redaction

- Updated MCP route tests to use `toMatchObject` for better validation of server response structures, ensuring consistency in expected properties.
- Refactored the `redactServerSecrets` function to streamline the removal of sensitive information, ensuring that user-sourced API keys are properly redacted while retaining their source.
- Improved OAuth security tests to validate rejection of private URLs across multiple endpoints, enhancing protection against SSRF vulnerabilities.
- Added comprehensive tests for the `redactServerSecrets` function to ensure proper handling of various server configurations, reinforcing security measures.

* chore: eslint

* 🔒 fix: Enhance OAuth Server URL Validation in MCPOAuthHandler

- Added validation for discovered authorization server URLs to ensure they meet security standards.
- Improved logging to provide clearer insights when an authorization server is found from resource metadata.
- Refactored the handling of authorization server URLs to enhance robustness against potential security vulnerabilities.

* 🔒 test: Bypass SSRF validation for MCP OAuth Flow tests

- Mocked SSRF validation functions to allow tests to use real local HTTP servers, facilitating more accurate testing of the MCP OAuth flow.
- Updated test setup to ensure compatibility with the new mocking strategy, enhancing the reliability of the tests.

* 🔒 fix: Add Validation for OAuth Metadata Endpoints in MCPOAuthHandler

- Implemented checks for the presence and validity of registration and token endpoints in the OAuth metadata, enhancing security by ensuring that these URLs are properly validated before use.
- Improved error handling and logging to provide better insights during the OAuth metadata processing, reinforcing the robustness of the OAuth flow.

* 🔒 refactor: Simplify MCP Auth Values Endpoint Logic

- Removed redundant permission checks for accessing the MCP server resource in the auth-values endpoint, streamlining the request handling process.
- Consolidated error handling and response structure for improved clarity and maintainability.
- Enhanced logging for better insights during the authentication value checks, reinforcing the robustness of the endpoint.

* 🔒 test: Refactor LeaderElection Integration Tests for Improved Cleanup

- Moved Redis key cleanup to the beforeEach hook to ensure a clean state before each test.
- Enhanced afterEach logic to handle instance resignations and Redis key deletion more robustly, improving test reliability and maintainability.
2026-03-13 23:18:56 -04:00
Danny Avila
f32907cd36
🔏 fix: MCP Server URL Schema Validation (#12204)
* fix: MCP server configuration validation and schema

- Added tests to reject URLs containing environment variable references for SSE, streamable-http, and websocket types in the MCP routes.
- Introduced a new schema in the data provider to ensure user input URLs do not resolve environment variables, enhancing security against potential leaks.
- Updated existing MCP server user input schema to utilize the new validation logic, ensuring consistent handling of user-supplied URLs across the application.
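
The rejection rule can be sketched as a single pattern test. The `${NAME}` placeholder syntax is an assumption about what an environment variable reference looks like in this configuration, and `containsEnvReference` is a hypothetical name:

```typescript
// Assumption: environment variable references use the ${NAME} form.
const ENV_REFERENCE = /\$\{[^}]+\}/;

function containsEnvReference(url: string): boolean {
  return ENV_REFERENCE.test(url);
}
```

A user-input schema would reject any URL for which this returns `true`, so a value such as `https://attacker.example/${SECRET}` never reaches the step that resolves placeholders.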

* fix: MCP URL validation to reject env variable references

- Updated tests to ensure that URLs for SSE, streamable-http, and websocket types containing environment variable patterns are rejected, improving security against potential leaks.
- Refactored the MCP server user input schema to enforce stricter validation rules, preventing the resolution of environment variables in user-supplied URLs.
- Introduced new test cases for various URL types to validate the rejection logic, ensuring consistent handling across the application.

* test: Enhance MCPServerUserInputSchema tests for environment variable handling

- Introduced new test cases to validate the prevention of environment variable exfiltration through user input URLs in the MCPServerUserInputSchema.
- Updated existing tests to confirm that URLs containing environment variable patterns are correctly resolved or rejected, improving security against potential leaks.
- Refactored test structure to better organize environment variable handling scenarios, ensuring comprehensive coverage of edge cases.
2026-03-12 23:19:31 -04:00
Danny Avila
fcb344da47
🛂 fix: MCP OAuth Race Conditions, CSRF Fallback, and Token Expiry Handling (#12171)
* fix: Handle race conditions in MCP OAuth flow

- Added connection mutex to coalesce concurrent `getUserConnection` calls, preventing multiple simultaneous attempts.
- Enhanced flow state management to retry once when a flow state is missing, improving resilience against race conditions.
- Introduced `ReauthenticationRequiredError` for better error handling when access tokens are expired or missing.
- Updated tests to cover new race condition scenarios and ensure proper handling of OAuth flows.
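
Coalescing concurrent calls can be sketched with a map of in-flight promises. This illustrates the general technique under assumed names (`inFlight`, `getConnection`, a string standing in for the connection); it is not the `getUserConnection` implementation:

```typescript
// One in-flight promise per key; concurrent callers share it.
const inFlight = new Map<string, Promise<string>>();

function getConnection(key: string, connect: () => Promise<string>): Promise<string> {
  const existing = inFlight.get(key);
  if (existing) {
    return existing;
  }
  // Clear the entry once the attempt settles so later calls can retry.
  const attempt = connect().finally(() => {
    inFlight.delete(key);
  });
  inFlight.set(key, attempt);
  return attempt;
}
```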

* fix: Stale PENDING flow detection and OAuth URL re-issuance

PENDING flows in handleOAuthRequired now check createdAt age — flows
older than 2 minutes are treated as stale and replaced instead of
joined. Fixes the case where a leftover PENDING flow from a previous
session blocks new OAuth initiation.
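
The staleness check can be sketched as follows. The 2-minute threshold is from the description above; the `FlowState` shape, the non-PENDING status values, and the function name are assumptions:

```typescript
const PENDING_STALE_MS = 2 * 60 * 1000; // flows older than 2 minutes are stale

// Assumed shape; only the PENDING status and createdAt are taken from the text.
interface FlowState {
  status: 'PENDING' | 'COMPLETED' | 'FAILED';
  createdAt: number; // epoch milliseconds
}

// Join an existing flow only if it is PENDING and not older than the threshold;
// otherwise the caller replaces it with a new flow.
function shouldJoinPendingFlow(flow: FlowState, nowMs: number): boolean {
  return flow.status === 'PENDING' && nowMs - flow.createdAt <= PENDING_STALE_MS;
}
```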

authorizationUrl is now stored in MCPOAuthFlowMetadata so that when a
second caller joins an active PENDING flow (e.g., the SSE-emitting path
in ToolService), it can re-issue the URL to the user via oauthStart.

* fix: CSRF fallback via active PENDING flow in OAuth callback

When the OAuth callback arrives without CSRF or session cookies (common
in the chat/SSE flow where cookies can't be set on streaming responses),
fall back to validating that a PENDING flow exists for the flowId. This
is safe because the flow was created server-side after JWT authentication
and the authorization code is PKCE-protected.
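
A rough sketch of the fallback order described above, assuming a hypothetical `isCallbackAllowed` function and `CallbackContext` shape. Cookie validation and flow lookup are reduced to precomputed fields; the real callback performs them against the request and the flow store. The age limit is an assumption that mirrors the stale-flow threshold.

```typescript
// Assumed to match the 2-minute stale-flow threshold.
const MAX_PENDING_AGE_MS = 2 * 60 * 1000;

interface CallbackContext {
  hasValidCsrfCookie: boolean;
  hasValidSessionCookie: boolean;
  // Flow looked up by flowId, if any.
  flow?: { status: string; createdAt: number };
}

function isCallbackAllowed(ctx: CallbackContext, now: number): boolean {
  // Normal path: a valid CSRF or session cookie is sufficient.
  if (ctx.hasValidCsrfCookie || ctx.hasValidSessionCookie) {
    return true;
  }
  // Fallback path: no cookies, so require a fresh PENDING flow for this flowId.
  const flow = ctx.flow;
  if (flow === undefined || flow.status !== "PENDING") {
    return false;
  }
  return now - flow.createdAt <= MAX_PENDING_AGE_MS;
}
```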

* test: Extract shared OAuth test server helpers

Move MockKeyv, getFreePort, trackSockets, and createOAuthMCPServer into
a shared helpers/oauthTestServer module. Enhance the test server with
refresh token support, token rotation, metadata discovery, and dynamic
client registration endpoints. Add InMemoryTokenStore for token storage
tests.

Refactor MCPOAuthRaceCondition.test.ts to import from shared helpers.

* test: Add comprehensive MCP OAuth test modules

MCPOAuthTokenStorage — 21 tests for storeTokens/getTokens with
InMemoryTokenStore: encrypt/decrypt round-trips, expiry calculation,
refresh callback wiring, ReauthenticationRequiredError paths.

MCPOAuthFlow — 10 tests against real HTTP server: token refresh with
stored client info, refresh token rotation, metadata discovery, dynamic
client registration, full store/retrieve/expire/refresh lifecycle.

MCPOAuthConnectionEvents — 5 tests for MCPConnection OAuth event cycle
with real OAuth-gated MCP server: oauthRequired emission on 401,
oauthHandled reconnection, oauthFailed rejection, token expiry detection.

MCPOAuthTokenExpiry — 12 tests for the token expiry edge case: refresh
success/failure paths, ReauthenticationRequiredError, PENDING flow CSRF
fallback, authorizationUrl metadata storage, full re-auth cycle after
refresh failure, concurrent expired token coalescing, stale PENDING
flow detection.

* test: Enhance MCP OAuth connection tests with cooldown reset

Added a `beforeEach` hook to clear the cooldown for `MCPConnection` before each test, ensuring a clean state. Updated the race condition handling in the tests to properly clear the timeout, improving reliability in the event data retrieval process.

* refactor: PENDING flow management and state recovery in MCP OAuth

- Introduced a constant `PENDING_STALE_MS` to define the age threshold for PENDING flows, improving the handling of stale flows.
- Updated the logic in `MCPConnectionFactory` and `FlowStateManager` to check the age of PENDING flows before joining or reusing them.
- Modified the `completeFlow` method to return false when the flow state is deleted, ensuring graceful handling of race conditions.
- Enhanced tests to validate the new behavior and ensure robustness against state recovery issues.

* refactor: MCP OAuth flow management and testing

- Updated the `completeFlow` method to log warnings when a tool flow state is not found during completion, improving error handling.
- Introduced a new `normalizeExpiresAt` function to standardize expiration timestamp handling across the application.
- Refactored token expiration checks in `MCPConnectionFactory` to utilize the new normalization function, ensuring consistent behavior.
- Added a comprehensive test suite for OAuth callback CSRF fallback logic, validating the handling of PENDING flows and their staleness.
- Enhanced existing tests to cover new expiration normalization logic and ensure robust flow state management.

* test: Add CSRF fallback tests for active PENDING flows in MCP OAuth

- Introduced new tests to validate CSRF fallback behavior when a fresh PENDING flow exists without cookies, ensuring successful OAuth callback handling.
- Added scenarios to reject requests when no PENDING flow exists, when only a COMPLETED flow is present, and when a PENDING flow is stale, enhancing the robustness of flow state management.
- Improved overall test coverage for OAuth callback logic, reinforcing the handling of CSRF validation failures.

* chore: imports order

* refactor: Update UserConnectionManager to conditionally manage pending connections

- Modified the logic in `UserConnectionManager` to only set pending connections if `forceNew` is false, preventing unnecessary overwrites.
- Adjusted the cleanup process to ensure pending connections are only deleted when not forced, enhancing connection management efficiency.

* refactor: MCP OAuth flow state management

- Introduced a new method `storeStateMapping` in `MCPOAuthHandler` to securely map the OAuth state parameter to the flow ID, improving callback resolution and security against forgery.
- Updated the OAuth initiation and callback handling in `mcp.js` to utilize the new state mapping functionality, ensuring robust flow management.
- Refactored `MCPConnectionFactory` to store state mappings during flow initialization, enhancing the integrity of the OAuth process.
- Adjusted comments to clarify the purpose of state parameters in authorization URLs, reinforcing code readability.

* refactor: MCPConnection with OAuth recovery handling

- Added `oauthRecovery` flag to manage OAuth recovery state during connection attempts.
- Introduced `decrementCycleCount` method to reduce the circuit breaker's cycle count upon successful reconnection after OAuth recovery.
- Updated connection logic to reset the `oauthRecovery` flag after handling OAuth, improving state management and connection reliability.

* chore: Add debug logging for OAuth recovery cycle count decrement

- Introduced a debug log statement in the `MCPConnection` class to track the decrement of the cycle count after a successful reconnection during OAuth recovery.
- This enhancement improves observability and aids in troubleshooting connection issues related to OAuth recovery.

* test: Add OAuth recovery cycle management tests

- Introduced new tests for the OAuth recovery cycle in `MCPConnection`, validating the decrement of cycle counts after successful reconnections.
- Added scenarios to ensure that the cycle count is not decremented on OAuth failures, enhancing the robustness of connection management.
- Improved test coverage for OAuth reconnect scenarios, ensuring reliable behavior under various conditions.

* feat: Implement circuit breaker configuration in MCP

- Added circuit breaker settings to `.env.example` for max cycles, cycle window, and cooldown duration.
- Refactored `MCPConnection` to utilize the new configuration values from `mcpConfig`, enhancing circuit breaker management.
- Improved code maintainability by centralizing circuit breaker parameters in the configuration file.

* refactor: Update decrementCycleCount method for circuit breaker management

- Changed the visibility of the `decrementCycleCount` method in `MCPConnection` from private to public static, allowing it to be called with a server name parameter.
- Updated calls to `decrementCycleCount` in `MCPConnectionFactory` to use the new static method, improving clarity and consistency in circuit breaker management during connection failures and OAuth recovery.
- Enhanced the handling of circuit breaker state by ensuring the method checks for the existence of the circuit breaker before decrementing the cycle count.
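
To make the static, per-server decrement concrete, here is a simplified sketch. It is not the `MCPConnection` implementation: the `CircuitBreakers` class, the constant values, and the window/cooldown rules are assumptions chosen for illustration. Only the idea of a static `decrementCycleCount(serverName)` that first checks whether a breaker exists comes from the commit message.

```typescript
// Hypothetical values; the commit says the real ones come from mcpConfig.
const MAX_CYCLES = 3;
const CYCLE_WINDOW_MS = 60 * 1000;
const COOLDOWN_MS = 30 * 1000;

interface BreakerState {
  cycleCount: number;
  windowStart: number;
  cooldownUntil: number;
}

class CircuitBreakers {
  private static readonly states = new Map<string, BreakerState>();

  // Count one reconnect cycle; trip the breaker when the limit is reached.
  static recordCycle(serverName: string, now: number): void {
    let state = CircuitBreakers.states.get(serverName);
    if (state === undefined) {
      state = { cycleCount: 0, windowStart: now, cooldownUntil: 0 };
      CircuitBreakers.states.set(serverName, state);
    } else if (now - state.windowStart > CYCLE_WINDOW_MS) {
      state.cycleCount = 0;
      state.windowStart = now;
    }
    state.cycleCount += 1;
    if (state.cycleCount >= MAX_CYCLES) {
      state.cooldownUntil = now + COOLDOWN_MS;
    }
  }

  static isOpen(serverName: string, now: number): boolean {
    const state = CircuitBreakers.states.get(serverName);
    return state !== undefined && now < state.cooldownUntil;
  }

  // Static and keyed by server name; a missing breaker is a no-op.
  static decrementCycleCount(serverName: string): void {
    const state = CircuitBreakers.states.get(serverName);
    if (state === undefined) {
      return;
    }
    state.cycleCount = Math.max(0, state.cycleCount - 1);
  }

  static getCycleCount(serverName: string): number {
    const state = CircuitBreakers.states.get(serverName);
    return state === undefined ? 0 : state.cycleCount;
  }
}
```

Decrementing after an OAuth-driven reconnect keeps expected re-authentication from counting toward the limit that trips the breaker.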

* refactor: cycle count decrement on tool listing failure

- Added a call to `MCPConnection.decrementCycleCount` in the `MCPConnectionFactory` to handle cases where unauthenticated tool listing fails, improving circuit breaker management.
- This change ensures that the cycle count is decremented appropriately, maintaining the integrity of the connection recovery process.

* refactor: Update circuit breaker configuration and logic

- Enhanced circuit breaker settings in `.env.example` to include new parameters for failed rounds and backoff strategies.
- Refactored `MCPConnection` to utilize the updated configuration values from `mcpConfig`, improving circuit breaker management.
- Updated tests to reflect changes in circuit breaker logic, ensuring accurate validation of connection behavior under rapid reconnect scenarios.

* feat: Implement state mapping deletion in MCP flow management

- Added a new method `deleteStateMapping` in `MCPOAuthHandler` to remove orphaned state mappings when a flow is replaced, preventing old authorization URLs from resolving after a flow restart.
- Updated `MCPConnectionFactory` to call `deleteStateMapping` during flow cleanup, ensuring proper management of OAuth states.
- Enhanced test coverage for state mapping functionality to validate the new deletion logic.
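
The state-mapping lifecycle can be sketched with an in-memory store. The method names `storeStateMapping` and `deleteStateMapping` come from the commit message, but their signatures here, the `StateMappingStore` class, and `resolveFlowId` are assumptions; the real handler presumably uses the flow store rather than a plain `Map`.

```typescript
// Hypothetical in-memory stand-in for the OAuth state -> flowId mapping.
class StateMappingStore {
  private readonly stateToFlowId = new Map<string, string>();

  // Record which flow an OAuth `state` value belongs to.
  storeStateMapping(state: string, flowId: string): void {
    this.stateToFlowId.set(state, flowId);
  }

  // Used by the callback to find the flow for an incoming `state`.
  resolveFlowId(state: string): string | undefined {
    return this.stateToFlowId.get(state);
  }

  // Called when a flow is replaced so the old authorization URL stops resolving.
  deleteStateMapping(state: string): void {
    this.stateToFlowId.delete(state);
  }
}
```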
2026-03-10 21:15:01 -04:00
Lionel Ringenbach
6d0938be64
🔒 refactor: Set ALLOW_SHARED_LINKS_PUBLIC to false by Default (#12100)
* fix: default ALLOW_SHARED_LINKS_PUBLIC to false for security

Shared links were publicly accessible by default when
ALLOW_SHARED_LINKS_PUBLIC was not explicitly set, which could lead to
unintentional data exposure. Users may assume their authentication
settings protect shared links when they do not.

This changes the default behavior so shared links require JWT
authentication unless ALLOW_SHARED_LINKS_PUBLIC is explicitly set to
true.
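
A minimal sketch of the secure-by-default behavior, assuming a hypothetical `isSharedLinksPublic` helper; the project's actual environment parsing may differ. Anything other than an explicit `true` keeps shared links behind authentication.

```typescript
// Hypothetical helper: shared links are public only when explicitly enabled.
function isSharedLinksPublic(env: Record<string, string | undefined>): boolean {
  const raw = env["ALLOW_SHARED_LINKS_PUBLIC"];
  // Unset, empty, or any value other than "true" means authentication is required.
  return raw !== undefined && raw.trim().toLowerCase() === "true";
}
```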

* Document ALLOW_SHARED_LINKS_PUBLIC in .env.example

Add comment explaining ALLOW_SHARED_LINKS_PUBLIC setting.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Danny Avila <danacordially@gmail.com>
2026-03-06 19:05:56 -05:00
Peter Nancarrow
14bcab60b3
🧬 feat: Allow Agent Editors to Duplicate Agents (#12041)
* feat: allow editors to duplicate agents

* fix: Update permissions for duplicating agents and enhance visibility in AgentFooter

- Changed required permission for duplicating agents from VIEW to EDIT in the API route.
- Updated AgentFooter component to display the duplicate button for admins and users with EDIT permission, improving access control.
- Added tests to ensure the duplicate button visibility logic works correctly based on user roles and permissions.

* test: Update AgentFooter tests to reflect permission changes

- Adjusted tests in AgentFooter.spec.tsx to verify UI behavior based on user permissions.
- Updated expectations for the visibility of the grant access dialog and duplicate button, ensuring they align with the new permission logic.

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-03 20:45:02 -05:00
Danny Avila
b8c31e7314
🔱 chore: Harden API Routes Against IDOR and DoS Attacks (#11760)
* 🔧 feat: Update user key handling in keys route and add comprehensive tests

- Enhanced the PUT /api/keys route to destructure request body for better clarity and maintainability.
- Introduced a new test suite for keys route, covering key update, deletion, and retrieval functionalities, ensuring robust validation and IDOR prevention.
- Added tests to verify handling of extraneous fields and missing optional parameters in requests.

* 🔧 fix: Enhance conversation deletion route with parameter validation

- Updated the DELETE /api/convos route to handle cases where the request body is empty or the 'arg' parameter is null/undefined, returning a 400 status with an appropriate error message for DoS prevention.
- Added corresponding tests to ensure proper validation and error handling for these scenarios, enhancing the robustness of the API.

* 🔧 fix: Improve request body validation in keys and convos routes

- Updated the DELETE /api/convos and PUT /api/keys routes to validate the request body, returning a 400 status for null or invalid bodies to enhance security and prevent potential DoS attacks.
- Added corresponding tests to ensure proper error handling for these scenarios, improving the robustness of the API.
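
The validation described for `DELETE /api/convos` can be sketched as a pure function. The `checkDeleteConvosBody` name, the `BodyCheck` shape, and the error messages are assumptions for illustration; only the rule (reject a missing or invalid body, or a null/undefined `arg`, with status 400) comes from the commit message.

```typescript
interface BodyCheck {
  status: number;
  message: string;
}

// Hypothetical validator: returns 400 before any database work is attempted.
function checkDeleteConvosBody(body: unknown): BodyCheck {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { status: 400, message: "Request body must be a JSON object" };
  }
  const arg = (body as Record<string, unknown>)["arg"];
  if (arg === undefined || arg === null) {
    return { status: 400, message: "Missing required parameter: arg" };
  }
  return { status: 200, message: "ok" };
}
```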
2026-02-12 18:08:24 -05:00
Danny Avila
599f4a11f1
🛡️ fix: Secure MCP/Actions OAuth Flows, Resolve Race Condition & Tool Cache Cleanup (#11756)
* 🔧 fix: Update OAuth error message for clarity

- Changed the default error message in the OAuth error route from 'Unknown error' to 'Unknown OAuth error' to provide clearer context during authentication failures.

* 🔒 feat: Enhance OAuth flow with CSRF protection and session management

- Implemented CSRF protection for OAuth flows by introducing `generateOAuthCsrfToken`, `setOAuthCsrfCookie`, and `validateOAuthCsrf` functions.
- Added session management for OAuth with `setOAuthSession` and `validateOAuthSession` middleware.
- Updated routes to bind CSRF tokens for MCP and action OAuth flows, ensuring secure authentication.
- Enhanced tests to validate CSRF handling and session management in OAuth processes.

* 🔧 refactor: Invalidate cached tools after user plugin disconnection

- Added a call to `invalidateCachedTools` in the `updateUserPluginsController` to ensure that cached tools are refreshed when a user disconnects from an MCP server after a plugin authentication update. This change improves the accuracy of tool data for users.

* chore: imports order

* fix: domain separator regex usage in ToolService

- Moved the declaration of `domainSeparatorRegex` to avoid redundancy in the `loadActionToolsForExecution` function, improving code clarity and performance.

* chore: OAuth flow error handling and CSRF token generation

- Enhanced the OAuth callback route to validate the flow ID format, ensuring proper error handling for invalid states.
- Updated the CSRF token generation function to require a JWT secret, throwing an error if not provided, which improves security and clarity in token generation.
- Adjusted tests to reflect changes in flow ID handling and ensure robust validation across various scenarios.
2026-02-12 14:22:05 -05:00
Danny Avila
6279ea8dd7
🛸 feat: Remote Agent Access with External API Support (#11503)
* 🪪 feat: Microsoft Graph Access Token Placeholder for MCP Servers (#10867)

* feat: MCP Graph Token env var

* Addressing copilot remarks

* Addressed Copilot review remarks

* Fixed graphtokenservice mock in MCP test suite

* fix: remove unnecessary type check and cast in resolveGraphTokensInRecord

* ci: add Graph Token integration tests in MCPManager

* refactor: update user type definitions to use Partial<IUser> in multiple functions

* test: enhance MCP tests for graph token processing and user placeholder resolution

- Added comprehensive tests to validate the interaction between preProcessGraphTokens and processMCPEnv.
- Ensured correct resolution of graph tokens and user placeholders in various configurations.
- Mocked OIDC utilities to facilitate testing of token extraction and validation.
- Verified that original options remain unchanged after processing.

* chore: import order

* chore: imports

---------

Co-authored-by: Danny Avila <danny@librechat.ai>

* WIP: OpenAI-compatible API for LibreChat agents

- Added OpenAIChatCompletionController for handling chat completions.
- Introduced ListModelsController and GetModelController for listing and retrieving agent details.
- Created routes for OpenAI API endpoints, including /v1/chat/completions and /v1/models.
- Developed event handlers for streaming responses in OpenAI format.
- Implemented request validation and error handling for API interactions.
- Integrated content aggregation and response formatting to align with OpenAI specifications.

This commit establishes a foundational API for interacting with LibreChat agents in a manner compatible with OpenAI's chat completion interface.

* refactor: OpenAI-spec content aggregation for improved performance and clarity

* fix: OpenAI chat completion controller with safe user handling for correct tool loading

* refactor: Remove conversation ID from OpenAI response context and related handlers

* refactor: OpenAI chat completion handling with streaming support

- Introduced a lightweight tracker for streaming responses, allowing for efficient tracking of emitted content and usage metadata.
- Updated the OpenAIChatCompletionController to utilize the new tracker, improving the handling of streaming and non-streaming responses.
- Refactored event handlers to accommodate the new streaming logic, ensuring proper management of tool calls and content aggregation.
- Adjusted response handling to streamline error reporting during streaming sessions.

* WIP: Open Responses API with core service, types, and handlers

- Added Open Responses API module with comprehensive types and enums.
- Implemented core service for processing requests, including validation and input conversion.
- Developed event handlers for streaming responses and non-streaming aggregation.
- Established response building logic and error handling mechanisms.
- Created detailed types for input and output content, ensuring compliance with Open Responses specification.

* feat: Implement response storage and retrieval in Open Responses API

- Added functionality to save user input messages and assistant responses to the database when the `store` flag is set to true.
- Introduced a new endpoint to retrieve stored responses by ID, allowing users to access previous interactions.
- Enhanced the response creation process to include database operations for conversation and message storage.
- Implemented tests to validate the storage and retrieval of responses, ensuring correct behavior for both existing and non-existent response IDs.

* refactor: Open Responses API with additional token tracking and validation

- Added support for tracking cached tokens in response usage, improving token management.
- Updated response structure to include new properties for top log probabilities and detailed usage metrics.
- Enhanced tests to validate the presence and types of new properties in API responses, ensuring compliance with updated specifications.
- Refactored response handling to accommodate new fields and improve overall clarity and performance.

* refactor: Update reasoning event handlers and types for consistency

- Renamed reasoning text events to simplify naming conventions, changing `emitReasoningTextDelta` to `emitReasoningDelta` and `emitReasoningTextDone` to `emitReasoningDone`.
- Updated event types in the API to reflect the new naming, ensuring consistency across the codebase.
- Added `logprobs` property to output events for enhanced tracking of log probabilities.

* feat: Add validation for streaming events in Open Responses API tests

* feat: Implement response.created event in Open Responses API

- Added emitResponseCreated function to emit the response.created event as the first event in the streaming sequence, adhering to the Open Responses specification.
- Updated createResponse function to emit response.created followed by response.in_progress.
- Enhanced tests to validate the order of emitted events, ensuring response.created is triggered before response.in_progress.

* feat: Responses API with attachment event handling

- Introduced `createResponsesToolEndCallback` to handle attachment events in the Responses API, emitting `librechat:attachment` events as per the Open Responses extension specification.
- Updated the `createResponse` function to utilize the new callback for processing tool outputs and emitting attachments during streaming.
- Added helper functions for writing attachment events and defined types for attachment data, ensuring compatibility with the Open Responses protocol.
- Enhanced tests to validate the integration of attachment events within the Responses API workflow.

* WIP: remote agent auth

* fix: Improve loading state handling in AgentApiKeys component

- Updated the rendering logic to conditionally display loading spinner and API keys based on the loading state.
- Removed unnecessary imports and streamlined the component for better readability.

* refactor: Update API key access handling in routes

- Replaced `checkAccess` with `generateCheckAccess` for improved access control.
- Consolidated access checks into a single `checkApiKeyAccess` function, enhancing code readability and maintainability.
- Streamlined route definitions for creating, listing, retrieving, and deleting API keys.

* fix: Add permission handling for REMOTE_AGENT resource type

* feat: Enhance permission handling for REMOTE_AGENT resources

- Updated the deleteAgent and deleteUserAgents functions to handle permissions for both AGENT and REMOTE_AGENT resource types.
- Introduced new functions to enrich REMOTE_AGENT principals and backfill permissions for AGENT owners.
- Modified createAgentHandler and duplicateAgentHandler to grant permissions for REMOTE_AGENT alongside AGENT.
- Added utility functions for retrieving effective permissions for REMOTE_AGENT resources, ensuring consistent access control across the application.

* refactor: Rename and update roles for remote agent access

- Changed role name from API User to Editor in translation files for clarity.
- Updated default editor role ID from REMOTE_AGENT_USER to REMOTE_AGENT_EDITOR in resource configurations.
- Adjusted role localization to reflect the new Editor role.
- Modified access permissions to align with the updated role definitions across the application.

* feat: Introduce remote agent permissions and update access handling

- Added support for REMOTE_AGENTS in permission schemas, including use, create, share, and share_public permissions.
- Updated the interface configuration to include remote agent settings.
- Modified middleware and API key access checks to align with the new remote agent permission structure.
- Enhanced role defaults to incorporate remote agent permissions, ensuring consistent access control across the application.

* refactor: Update AgentApiKeys component and permissions handling

- Refactored the AgentApiKeys component to improve structure and readability, including the introduction of ApiKeysContent for better separation of concerns.
- Updated CreateKeyDialog to accept an onKeyCreated callback, enhancing its functionality.
- Adjusted permission checks in Data component to use REMOTE_AGENTS and USE permissions, aligning with recent permission schema changes.
- Enhanced loading state handling and dialog management for a smoother user experience.

* refactor: Update remote agent access checks in API routes

- Replaced existing access checks with `generateCheckAccess` for remote agents in the API keys and agents routes.
- Introduced specific permission checks for creating, listing, retrieving, and deleting API keys, enhancing access control.
- Improved code structure by consolidating permission handling for remote agents across multiple routes.

* fix: Correct query parameters in ApiKeysContent component

- Updated the useGetAgentApiKeysQuery call to include an object for the enabled parameter, ensuring proper functionality when the component is open.
- This change improves the handling of API key retrieval based on the component's open state.

* feat: Implement remote agents permissions and update API routes

- Added new API route for updating remote agents permissions, enhancing role management capabilities.
- Introduced remote agents permissions handling in the AgentApiKeys component, including a dedicated settings dialog.
- Updated localization files to include new remote agents permission labels for better user experience.
- Refactored data provider to support remote agents permissions updates, ensuring consistent access control across the application.

* feat: Add remote agents permissions to role schema and interface

- Introduced new permissions for REMOTE_AGENTS in the role schema, including USE, CREATE, SHARE, and SHARE_PUBLIC.
- Updated the IRole interface to reflect the new remote agents permissions structure, enhancing role management capabilities.

* feat: Add remote agents settings button to API keys dialog

* feat: Update AgentFooter to include remote agent sharing permissions

- Refactored access checks to incorporate permissions for sharing remote agents.
- Enhanced conditional rendering logic to allow sharing by users with remote agent permissions.
- Improved loading state handling for remote agent permissions, ensuring a smoother user experience.

* refactor: Update API key creation access check and localization strings

- Replaced the access check for creating API keys to use the existing remote agents access check.
- Updated localization strings to correct the descriptions for remote agent permissions, ensuring clarity in user interface.

* fix: resource permission mapping to include remote agents

- Changed the resourceToPermissionMap to use a Partial<Record> for better flexibility.
- Added mapping for REMOTE_AGENT permissions, enhancing the sharing capabilities for remote agents.

* feat: Implement remote access checks for agent models

- Enhanced ListModelsController and GetModelController to include checks for user permissions on remote agents.
- Integrated findAccessibleResources to filter agents based on VIEW permission for REMOTE_AGENT.
- Updated response handling to ensure users can only access agents they have permissions for, improving security and access control.

* fix: Update user parameter type in processUserPlaceholders function

- Changed the user parameter type in the processUserPlaceholders function from Partial<Partial<IUser>> to Partial<IUser> for improved type clarity and consistency.

* refactor: Simplify integration test structure by removing conditional describe

- Replaced conditional describeWithApiKey with a standard describe for all integration tests in responses.spec.js.
- This change enhances test clarity and ensures all tests are executed consistently, regardless of the SKIP_INTEGRATION_TESTS flag.

* test: Update AgentFooter tests to reflect new grant access dialog ID

- Changed test IDs for the grant access dialog in AgentFooter tests to include the resource type, ensuring accurate identification in the test cases.
- This update improves test clarity and aligns with recent changes in the component's implementation.

* test: Enhance integration tests for Open Responses API

- Updated integration tests in responses.spec.js to utilize an authRequest helper for consistent authorization handling across all test cases.
- Introduced a test user and API key creation to improve test setup and ensure proper permission checks for remote agents.
- Added checks for existing access roles and created necessary roles if they do not exist, enhancing test reliability and coverage.

* feat: Extend accessRole schema to include remoteAgent resource type

- Updated the accessRole schema to add 'remoteAgent' to the resourceType enum, enhancing the flexibility of role assignments and permissions management.

* test: Refactor test setup to create a minimal Express app for responses routes, improving test structure and maintainability

* test: Enhance abort.spec.js by mocking additional modules for improved test isolation

- Updated the test setup in abort.spec.js to include actual implementations of '@librechat/data-schemas' and '@librechat/api' while maintaining mock functionality.
- This change improves test reliability and ensures that the tests are more representative of the actual module behavior.

* refactor: Update conversation ID generation to use UUID

- Replaced nanoid with uuidv4 for generating conversation IDs in the createResponse function, enhancing uniqueness and consistency in ID generation.

* test: Add remote agent access roles to AccessRole model tests

- Included additional access roles for remote agents (REMOTE_AGENT_EDITOR, REMOTE_AGENT_OWNER, REMOTE_AGENT_VIEWER) in the AccessRole model tests to ensure comprehensive coverage of role assignments and permissions management.

* chore: Add deletion of user agent API keys in user deletion process

- Updated the user deletion process in UserController and delete-user.js to include the removal of user agent API keys, ensuring comprehensive cleanup of user data upon account deletion.

* test: Add remote agents permissions to permissions.spec.ts

- Enhanced the permissions tests by including comprehensive permission settings for remote agents across various scenarios, ensuring accurate validation of access controls for remote agent roles.

* chore: Update remote agents translations for clarity and consistency

- Removed outdated remote agents translation entries and added revised entries to improve clarity on API key creation and sharing permissions for remote agents. This enhances user understanding of the available functionalities.

* feat: Add indexing and TTL for agent API keys

- Introduced an index on the `key` field for improved query performance.
- Added a TTL index on the `expiresAt` field to enable automatic cleanup of expired API keys, ensuring efficient management of stored keys.

* chore: Update API route documentation for clarity

- Revised comments in the agents route file to clarify the handling of API key authentication.
- Removed outdated endpoint listings to streamline the documentation and focus on current functionality.

---------

Co-authored-by: Max Sanna <max@maxsanna.com>
2026-01-28 17:44:33 -05:00
Danny Avila
b6af884dd2
🔐 feat: Admin Auth. Routes with Secure Cross-Origin Token Exchange (#11297)
* feat: implement admin authentication with OpenID & Local Auth proxy support

* feat: implement admin OAuth exchange flow with caching support

- Added caching for admin OAuth exchange codes with a short TTL.
- Introduced new endpoints for generating and exchanging admin OAuth codes.
- Updated relevant controllers and routes to handle admin panel redirects and token exchanges.
- Enhanced logging for better traceability of OAuth operations.
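
A short-lived exchange code cache could look roughly like this. It is a sketch under assumptions: the `ExchangeCodeCache` class, the 30-second TTL, and the single-use behavior are illustrative choices, not details confirmed by the commit.

```typescript
// Hypothetical TTL; the commit only says the TTL is short.
const EXCHANGE_CODE_TTL_MS = 30 * 1000;

interface ExchangeEntry {
  userId: string;
  expiresAt: number;
}

class ExchangeCodeCache {
  private readonly codes = new Map<string, ExchangeEntry>();

  // Store a freshly generated code for the authenticated admin user.
  issue(code: string, userId: string, now: number): void {
    this.codes.set(code, { userId, expiresAt: now + EXCHANGE_CODE_TTL_MS });
  }

  // Exchange a code for the user it was issued to; assumed to be single use.
  exchange(code: string, now: number): string | undefined {
    const entry = this.codes.get(code);
    if (entry === undefined) {
      return undefined;
    }
    this.codes.delete(code);
    return now < entry.expiresAt ? entry.userId : undefined;
  }
}
```

The admin panel would send the code back to the API once, and the API would issue tokens only if `exchange` returns a user ID.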

* refactor: enhance OpenID strategy mock to support multiple verify callbacks

- Updated the OpenID strategy mock to store and retrieve verify callbacks by strategy name.
- Improved backward compatibility by maintaining a method to get the last registered callback.
- Adjusted tests to utilize the new callback retrieval methods, ensuring clarity in the verification process for the 'openid' strategy.

* refactor: reorder import statements for better organization

* refactor: admin OAuth flow with improved URL handling and validation

- Added a utility function to retrieve the admin panel URL, defaulting to a local development URL if not set in the environment.
- Updated the OAuth exchange endpoint to include validation for the authorization code format.
- Refactored the admin panel redirect logic to handle URL parsing more robustly, ensuring accurate origin comparisons.
- Removed redundant local URL definitions from the codebase for better maintainability.
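
The origin comparison can be illustrated with the standard `URL` class. The function names, the default URL value, and the way the configured value is passed in are assumptions; the sketch only shows why parsing both sides and comparing `origin` is more robust than comparing strings.

```typescript
// Hypothetical local development default.
const DEFAULT_ADMIN_PANEL_URL = "http://localhost:3000";

function getAdminPanelUrl(configured: string | undefined): string {
  return configured !== undefined && configured !== "" ? configured : DEFAULT_ADMIN_PANEL_URL;
}

// True only when both URLs parse and share scheme, host, and port.
function isAdminPanelOrigin(candidate: string, adminPanelUrl: string): boolean {
  try {
    return new URL(candidate).origin === new URL(adminPanelUrl).origin;
  } catch (_err) {
    // Unparseable input is never treated as the admin panel.
    return false;
  }
}
```

A string prefix check would accept a redirect such as `https://evil.example/http://localhost:3000`, while the origin comparison rejects it.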

* refactor: remove deprecated requireAdmin middleware and migrate to TypeScript

- Deleted the old requireAdmin middleware file and its references in the middleware index.
- Introduced a new TypeScript version of the requireAdmin middleware with enhanced error handling and logging.
- Updated routes to utilize the new requireAdmin middleware, ensuring consistent access control for admin routes.

* feat: add requireAdmin middleware for admin role verification

- Introduced requireAdmin middleware to enforce admin role checks for authenticated users.
- Implemented comprehensive error handling and logging for unauthorized access attempts.
- Added unit tests to validate middleware functionality and ensure proper behavior for different user roles.
- Updated middleware index to include the new requireAdmin export.
2026-01-28 17:44:31 -05:00
Danny Avila
8be0047a80
🔒 fix: Access Check for User-Specific Job Metadata in Streaming Endpoint (#11487)
* Implemented a check to ensure that only the user associated with a job can access its chat stream, returning a 403 Forbidden response for mismatched user IDs.
* This enhancement improves security by preventing unauthorized access to user-specific job data.
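
The ownership check can be sketched as a pure function over the job metadata. `authorizeStreamAccess` and the `JobMetadata` shape are hypothetical, and the 404 for a missing job is an assumption; the 403 for a mismatched user ID comes from the commit message.

```typescript
interface JobMetadata {
  userId: string;
}

// Hypothetical check run before subscribing a request to a job's stream.
function authorizeStreamAccess(job: JobMetadata | undefined, requestUserId: string): number {
  if (job === undefined) {
    return 404; // assumption: unknown jobs are reported as not found
  }
  if (job.userId !== requestUserId) {
    return 403; // the job belongs to a different user
  }
  return 200;
}
```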
2026-01-23 09:06:48 -05:00
Danny Avila
11210d8b98
🏁 fix: Message Race Condition if Cancelled Early (#11462)
* 🔧 fix: Prevent race conditions in message saving during abort scenarios

* Added logic to save partial responses before returning from the abort endpoint to ensure parentMessageId exists in the database.
* Updated the ResumableAgentController to save response messages before emitting final events, preventing orphaned parentMessageIds.
* Enhanced handling of unfinished responses to improve stability and data integrity in agent interactions.

* 🔧 fix: logging and job replacement handling in ResumableAgentController

* Added detailed logging for job creation and final event emissions to improve traceability.
* Implemented logic to check for job replacement before emitting events, preventing stale requests from affecting newer jobs.
* Updated abort handling to log additional context about the abort result, enhancing debugging capabilities.
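
One way to picture the job replacement check is to compare the creation timestamp a request captured with the timestamp of the job currently registered. The registry, `createJob`, and `wasReplaced` below are hypothetical names for this sketch.

```javascript
// Hypothetical in-memory job registry keyed by stream ID.
const jobs = new Map();

function createJob(streamId, now = Date.now()) {
  const job = { streamId, createdAt: now };
  jobs.set(streamId, job); // a newer request replaces the older job
  return job;
}

// Returns true when the job held by the caller is no longer the current one.
function wasReplaced(streamId, createdAt) {
  const current = jobs.get(streamId);
  return !current || current.createdAt !== createdAt;
}

const first = createJob('convo-1', 1000);
const second = createJob('convo-1', 2000);
console.log(wasReplaced('convo-1', first.createdAt)); // true
console.log(wasReplaced('convo-1', second.createdAt)); // false
```

A stale request would call `wasReplaced` just before emitting its final event and skip the emit when it returns `true`.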

* refactor: abort handling and token spending logic in AgentStream

* Added authorization check for abort attempts to prevent unauthorized access.
* Improved response message saving logic to ensure valid message IDs are stored.
* Implemented token spending for aborted requests to prevent double-spending across parallel agents.
* Enhanced logging for better traceability of token spending operations during abort scenarios.

* refactor: remove TODO comments for token spending in abort handling

* Removed outdated TODO comments regarding token spending for aborted requests in the abort endpoint.
* This change streamlines the code and clarifies the current implementation status.

* test: Add comprehensive tests for job replacement and abort handling

* Introduced unit tests for job replacement detection in ResumableAgentController, covering job creation timestamp tracking, stale job detection, and response message saving order.
* Added tests for the agent abort endpoint, ensuring proper authorization checks, early abort handling, and partial response saving.
* Enhanced logging and error handling in tests to improve traceability and robustness of the abort functionality.
2026-01-21 13:57:12 -05:00
Danny Avila
b5e4c763af
🔀 refactor: Endpoint Check for File Uploads in Images Route (#11352)
- Changed the endpoint check from `isAgentsEndpoint` to `isAssistantsEndpoint` to adjust the logic for processing file uploads.
- Reordered the import statements for better organization.
2026-01-14 14:07:58 -05:00
Danny Avila
f8774983a0
🪪 fix: Misleading MCP Server Lookup Method Name (#11315)
* 🔧 fix: MCP server ID resolver in access permissions (#11315)

- Replaced `findMCPServerById` with `findMCPServerByObjectId` in access permissions route and corresponding tests for improved clarity and consistency in resource identification.

* 🔧 refactor: Update MCP server resource access methods to use server name

- Replaced instances of `findMCPServerById` with `findMCPServerByServerName` across middleware, database, and test files for improved clarity and consistency in resource identification.
- Updated related comments and test cases to reflect the change in method usage.

* chore: Increase timeout for Redis update in GenerationJobManager integration tests

- Updated the timeout duration from 50ms to 200ms in the GenerationJobManager integration tests to ensure reliable verification of final event data in Redis after emitting the done event.
2026-01-12 21:04:25 -05:00
Danny Avila
76e17ba701
🔧 refactor: Permission handling for Resource Sharing (#11283)
* 🔧 refactor: permission handling for public sharing

- Updated permission keys from SHARED_GLOBAL to SHARE across various files for consistency.
- Added public access configuration in librechat.example.yaml.
- Adjusted related tests and components to reflect the new permission structure.

* chore: Update default SHARE permission to false

* fix: Update SHARE permissions in tests and implementation

- Added SHARE permission handling for user and admin roles in permissions.spec.ts and permissions.ts.
- Updated expected permissions in tests to reflect new SHARE permission values for various permission types.

* fix: Handle undefined values in PeoplePickerAdminSettings component

- Updated the checked and value props of the Switch component to handle undefined values gracefully by defaulting to false. This ensures consistent behavior when the field value is not set.

* feat: Add CREATE permission handling for prompts and agents

- Introduced CREATE permission for user and admin roles in permissions.spec.ts and permissions.ts.
- Updated expected permissions in tests to include CREATE permission for various permission types.

* 🔧 refactor: Enhance permission handling for sharing dialog usability

* refactor: public sharing permissions for resources

- Added middleware to check SHARE_PUBLIC permissions for agents, prompts, and MCP servers.
- Updated interface configuration in librechat.example.yaml to include public sharing options.
- Enhanced components and hooks to support public sharing functionality.
- Adjusted tests to validate new permission handling for public sharing across various resource types.

* refactor: update Share2Icon styling in GenericGrantAccessDialog

* refactor: update Share2Icon size in GenericGrantAccessDialog for consistency

* refactor: improve layout and styling of Share2Icon in GenericGrantAccessDialog

* refactor: update Share2Icon size in GenericGrantAccessDialog for improved consistency

* chore: remove redundant public sharing option from People Picker

* refactor: add SHARE_PUBLIC permission handling in updateInterfacePermissions tests
2026-01-10 14:02:56 -05:00
Danny Avila
9434d4a070
🔧 fix: Sorting and Pagination logic for Conversations (#11242)
- Changed default sorting from 'createdAt' to 'updatedAt' in both Conversation and Message routes.
- Updated pagination logic to ensure the cursor is created from the last returned item instead of the popped item, preventing skipped items at page boundaries.
- Added comprehensive tests for pagination behavior, ensuring no messages or conversations are skipped and that sorting works as expected.
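
The cursor fix can be illustrated with a small in-memory pager. This is a sketch under stated assumptions: `paginate` is a hypothetical function, items are pre-sorted by `updatedAt` descending, and timestamps are unique.

```javascript
// Hypothetical pager over an array already sorted by updatedAt descending.
function paginate(sortedItems, limit, cursor) {
  const start = cursor == null
    ? 0
    : sortedItems.findIndex((item) => item.updatedAt < cursor);
  if (start === -1) {
    return { items: [], nextCursor: null };
  }
  // Fetch one extra item only to learn whether another page exists.
  const page = sortedItems.slice(start, start + limit + 1);
  const hasMore = page.length > limit;
  if (hasMore) {
    page.pop();
  }
  // The cursor comes from the last item actually returned, not the popped one.
  const nextCursor = hasMore ? page[page.length - 1].updatedAt : null;
  return { items: page, nextCursor };
}
```

If the cursor were taken from the popped item instead, the next page would begin after that item and it would never be returned.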
2026-01-07 09:44:45 -05:00
Danny Avila
348b4a4a32
🍪 refactor: Move OpenID Tokens from Cookies to Server-Side Sessions (#11236)
* refactor: OpenID token handling by storing tokens in session to reduce cookie size

* refactor: Improve OpenID user identification logic in logout controller

* refactor: Enhance OpenID logout flow by adding post-logout redirect URI

* refactor: Update logout process to clear additional OpenID user ID cookie
2026-01-06 15:22:10 -05:00
Danny Avila
b7db0dd9bc
📎 fix: Allow Message Attachments for Users with Viewer Permission on Agents (#11210)
* fix: allow message attachments for users with viewer permission on agents

Fixes regression introduced by the agent file upload access control fix
(SBA-ADV-20251204-01). The original fix was too restrictive - it blocked
ALL file uploads with agent_id + tool_resource, including temporary
message attachments used during chat.

## Problem

Users with VIEWER permission on a shared agent could not attach files to
their chat messages. The permission check blocked any upload request that
included both `agent_id` and `tool_resource`, but message attachments
legitimately include both fields since files need to be added to the
agent's context for processing within that conversation.

* test: Add permission check for file uploads with message_file set to false

Introduced a new test case to ensure that file uploads are denied when the `message_file` flag is false, reinforcing permission checks for users with VIEW access on agents. This change enhances security by preventing unauthorized file uploads while maintaining functionality for legitimate message attachments.

* fix: Update BadgeRow to handle undefined endpoint in ChatForm

Modified the `showEphemeralBadges` prop in the `BadgeRow` component to ensure it correctly handles cases where the `endpoint` is undefined. This change improves the robustness of the chat input functionality by preventing potential errors related to endpoint checks.
2026-01-05 13:44:59 -05:00
Danny Avila
211b39f311
🔒 fix: Restrict MCP Stdio Transport via API (#11184)
- Updated MCP server configuration tests to reject stdio transport configurations, ensuring that only remote transports (SSE, HTTP, WebSocket) are allowed via the API.
- Enhanced documentation to clarify that stdio transport is excluded from user input for security, as it allows arbitrary command execution and should only be configured by administrators through YAML files.
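
A rough sketch of this kind of validation follows. The config shape, the transport type strings, and the name `validateUserMCPConfig` are assumptions for illustration; the project's actual schema may use different identifiers.

```javascript
// Transport names follow the commit text; the exact type strings are assumed.
const REMOTE_TRANSPORTS = new Set(['sse', 'http', 'websocket']);

function validateUserMCPConfig(config) {
  // A `command` field implies a stdio server, which can run arbitrary commands.
  if (config.type === 'stdio' || 'command' in config) {
    return { ok: false, error: 'stdio transport is not allowed via the API' };
  }
  if (!REMOTE_TRANSPORTS.has(config.type)) {
    return { ok: false, error: `unsupported transport: ${config.type}` };
  }
  return { ok: true };
}
```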
2026-01-03 12:47:11 -05:00
Danny Avila
b94388ce9d
🏺 fix: Restore Archive Functionality with Dedicated Endpoint (#11183)
The archive conversation feature was broken after the `/api/convos/update`
route was modified to only handle title updates. The frontend was sending
`{ conversationId, isArchived }` to the update endpoint, but the backend
was only extracting `title` and ignoring the `isArchived` field entirely.

This fix implements a dedicated `/api/convos/archive` endpoint to restore
the archive/unarchive functionality.

Changes:

packages/data-provider/src/api-endpoints.ts:
- Add `archiveConversation()` endpoint returning `/api/convos/archive`

packages/data-provider/src/data-service.ts:
- Update `archiveConversation()` to use dedicated archive endpoint

api/server/routes/convos.js:
- Add `POST /archive` route with validation for `conversationId` (required)
  and `isArchived` (must be boolean)

api/server/routes/__tests__/convos.spec.js:
- Add test coverage for archive endpoint (success, validation, error cases)
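
The validation rules listed for `POST /archive` might be expressed as in the sketch below. `validateArchiveBody`, the status codes, and the error messages are hypothetical; only the two rules (required `conversationId`, boolean `isArchived`) come from the commit message.

```javascript
// Hypothetical validator mirroring the two rules described for the archive route.
function validateArchiveBody(body) {
  const { conversationId, isArchived } = body || {};
  if (!conversationId) {
    return { status: 400, error: 'conversationId is required' };
  }
  if (typeof isArchived !== 'boolean') {
    return { status: 400, error: 'isArchived must be a boolean' };
  }
  return { status: 200, value: { conversationId, isArchived } };
}
```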
2026-01-02 19:41:53 -05:00
Danny Avila
a7aa4dc91b
🚦 refactor: Concurrent Request Limiter for Resumable Streams (#11167)
* feat: Implement concurrent request handling in ResumableAgentController

- Introduced a new concurrency management system by adding `checkAndIncrementPendingRequest` and `decrementPendingRequest` functions to manage user request limits.
- Replaced the previous `concurrentLimiter` middleware with a more integrated approach directly within the `ResumableAgentController`.
- Enhanced violation logging and request denial for users exceeding their concurrent request limits.
- Removed the obsolete `concurrentLimiter` middleware file and updated related imports across the codebase.

* refactor: Simplify error handling in ResumableAgentController and enhance SSE error management

- Removed the `denyRequest` middleware and replaced it with a direct response for concurrent request violations in the ResumableAgentController.
- Improved error handling in the `useResumableSSE` hook to differentiate between network errors and other error types, ensuring more informative error responses are sent to the error handler.

* test: Enhance MCP server configuration tests with new mocks and improved logging

- Added mocks for MCP server registry and manager in `index.spec.js` to facilitate testing of server configurations.
- Updated debug logging in `initializeMCPs.spec.js` to simplify messages regarding server configurations, improving clarity in test outputs.

* refactor: Enhance concurrency management in request handling

- Updated `checkAndIncrementPendingRequest` and `decrementPendingRequest` functions to utilize Redis for atomic request counting, improving concurrency control.
- Added error handling for Redis operations to ensure requests can proceed even during Redis failures.
- Streamlined cache key generation for both Redis and in-memory fallback, enhancing clarity and performance in managing pending requests.
- Improved comments and documentation for better understanding of the concurrency logic and its implications.

* refactor: Improve atomicity in Redis operations for pending request management

- Updated `checkAndIncrementPendingRequest` to utilize Redis pipelines for atomic INCR and EXPIRE operations, enhancing concurrency control and preventing edge cases.
- Added error handling for pipeline execution failures to ensure robust request management.
- Improved comments for clarity on the concurrency logic and its implications.
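
The counting scheme could look something like the sketch below. It assumes a client with an ioredis-style `multi()` chain whose `exec()` resolves to an array of `[error, result]` pairs, and it uses a MULTI-style transaction in place of a plain pipeline. The key format and function name are hypothetical.

```javascript
// Sketch: count a user's pending requests with INCR and EXPIRE sent together.
// `client` is assumed to follow the ioredis-style multi()/exec() contract.
async function checkAndIncrementPending(client, userId, limit, ttlSeconds = 60) {
  const key = `pending_req:${userId}`; // hypothetical key format
  let results;
  try {
    results = await client.multi().incr(key).expire(key, ttlSeconds).exec();
  } catch (err) {
    // Fail open: a Redis outage should not block every request.
    return { allowed: true, count: 0 };
  }
  if (!results || results[0][0]) {
    // Transaction aborted or INCR failed: also fail open.
    return { allowed: true, count: 0 };
  }
  const count = results[0][1];
  if (count > limit) {
    // Best-effort undo of our increment; the TTL cleans up if this fails.
    await client.decr(key).catch(() => {});
    return { allowed: false, count: count - 1 };
  }
  return { allowed: true, count };
}
```

Setting the expiry in the same round trip as the increment means a crashed request cannot leave a counter without a TTL.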
2026-01-01 11:10:56 -05:00
Danny Avila
06ba025bd9
🔒 fix: Access Control on Agent Permission Queries (#11145)
Adds access control check to GET /api/permissions/:resourceType/:resourceId
endpoint to prevent unauthorized disclosure of agent permission information.

## Vulnerability Summary

LibreChat version 0.8.1-rc2 did not enforce proper access control when
querying agent permissions. Any authenticated user could read the permissions
of arbitrary agents by knowing the agent ID, even for private agents they
had no access to.

**Impact:**
- Attackers could enumerate which users have access to private agents
- Permission levels (owner, editor, viewer) were exposed
- User emails and names of permitted users were disclosed
- Agent's public/private sharing status was revealed

**Attack Vector:**
```
GET /api/permissions/agent/{agent_id}
Authorization: Bearer <any_valid_token>
```

The MongoDB ObjectId format (timestamp + process ID + counter) made it
feasible to brute-force discover valid agent IDs.

## Fix

Added `checkResourcePermissionAccess` middleware factory that enforces
SHARE permission before allowing access to permission queries. This
middleware is now applied to the GET endpoint, matching the existing
access control on the PUT endpoint.

**Before:**
```javascript
router.get('/:resourceType/:resourceId', getResourcePermissions);
```

**After:**
```javascript
router.get(
  '/:resourceType/:resourceId',
  checkResourcePermissionAccess(PermissionBits.SHARE),
  getResourcePermissions,
);
```

The middleware handles all supported resource types:
- Agent (ResourceType.AGENT)
- Prompt Group (ResourceType.PROMPTGROUP)
- MCP Server (ResourceType.MCPSERVER)

## Code Changes

**api/server/routes/accessPermissions.js:**
- Added `checkResourcePermissionAccess()` middleware factory
- Applied middleware to GET /:resourceType/:resourceId endpoint
- Refactored PUT endpoint to use the same middleware factory (DRY)

**api/server/routes/accessPermissions.test.js:**
- Added security tests verifying unauthorized access is denied
- Tests confirm 403 Forbidden for users without SHARE permission

## Security Tests

```
✓ should deny permission query for user without access (main vulnerability test)
✓ should return 400 for unsupported resource type
✓ should deny permission update for user without access
```
2025-12-29 15:10:31 -05:00
Danny Avila
4b9c6ab1cb
🔧 fix: Agent File Upload Permission Checks (#11144)
- Added permission checks for agent file uploads to ensure only authorized users can upload files.
- Admin users bypass permission checks, while other users must be the agent author or have EDIT permissions.
- Enhanced error handling for non-existent agents and insufficient permissions.
- Updated tests to cover various scenarios for file uploads, including permission validation and message attachments.
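
The decision order described above (admin bypass, then author, then EDIT permission) can be sketched as a pure function. The role name `'ADMIN'`, the bit value for `EDIT`, the field names, and the status codes are all assumptions for this example.

```javascript
// Hypothetical decision function; role names, bit value, and fields are assumed.
const EDIT = 2;

function canUploadToAgent({ user, agent, permissionBits }) {
  if (!agent) {
    return { allowed: false, status: 404, reason: 'Agent not found' };
  }
  if (user.role === 'ADMIN') {
    return { allowed: true }; // admins bypass the permission check
  }
  if (String(agent.author) === String(user.id)) {
    return { allowed: true }; // the agent's author may always upload
  }
  if ((permissionBits & EDIT) === EDIT) {
    return { allowed: true }; // explicit EDIT grant
  }
  return { allowed: false, status: 403, reason: 'Insufficient permissions' };
}
```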
2025-12-29 15:10:14 -05:00
Danny Avila
bfc981d736
✍️ fix: Validation for Conversation Title Updates (#11099)
* ✍️ fix: Validation for Conversation Title Updates

* fix: Add validateConvoAccess middleware mock in tests
2025-12-25 12:59:48 -05:00
Artyom Bogachenko
7844a93f8b
♻️ fix: use DOMAIN_CLIENT for MCP OAuth Redirects (#11057)
Co-authored-by: Artyom Bogachenco <a.bogachenko@easyreport.ai>
2025-12-25 12:24:01 -05:00
Danny Avila
439bc98682
⏸ refactor: Improve UX for Parallel Streams (Multi-Convo) (#11096)
* 🌊 feat: Implement multi-conversation feature with added conversation context and payload adjustments

* refactor: Replace isSubmittingFamily with isSubmitting across message components for consistency

* feat: Add loadAddedAgent and processAddedConvo for multi-conversation agent execution

* refactor: Update ContentRender usage to conditionally render PlaceholderRow based on isLast and isSubmitting

* WIP: first pass, sibling index

* feat: Enhance multi-conversation support with agent tracking and display improvements

* refactor: Introduce isEphemeralAgentId utility and update related logic for agent handling

* refactor: Implement createDualMessageContent utility for sibling message display and enhance useStepHandler for added conversations

* refactor: duplicate tools for added agent if ephemeral and primary agent is also ephemeral

* chore: remove deprecated multimessage rendering

* refactor: enhance dual message content creation and agent handling for parallel rendering

* refactor: streamline message rendering and submission handling by removing unused state and optimizing conditional logic

* refactor: adjust content handling in parallel mode to utilize existing content for improved agent display

* refactor: update @librechat/agents dependency to version 3.0.53

* refactor: update @langchain/core and @librechat/agents dependencies to latest versions

* refactor: remove deprecated @langchain/core dependency from package.json

* chore: remove unused SearchToolConfig and GetSourcesParams types from web.ts

* refactor: remove unused message properties from Message component

* refactor: enhance parallel content handling with groupId support in ContentParts and useStepHandler

* refactor: implement parallel content styling in Message, MessageRender, and ContentRender components. use explicit model name

* refactor: improve agent ID handling in createDualMessageContent for dual message display

* refactor: simplify title generation in AddedConvo by removing unused sender and preset logic

* refactor: replace string interpolation with cn utility for className in HoverButtons component

* refactor: enhance agent ID handling by adding suffix management for parallel agents and updating related components

* refactor: enhance column ordering in ContentParts by sorting agents with suffix management

* refactor: update @librechat/agents dependency to version 3.0.55

* feat: implement parallel content rendering with metadata support

- Added `ParallelContentRenderer` and `ParallelColumns` components for rendering messages in parallel based on groupId and agentId.
- Introduced `contentMetadataMap` to store metadata for each content part, allowing efficient parallel content detection.
- Updated `Message` and `ContentRender` components to utilize the new metadata structure for rendering.
- Modified `useStepHandler` to manage content indices and metadata during message processing.
- Enhanced `IJobStore` interface and its implementations to support storing and retrieving content metadata.
- Updated data schemas to include `contentMetadataMap` for messages, enabling multi-agent and parallel execution scenarios.

* refactor: update @librechat/agents dependency to version 3.0.56

* refactor: remove unused EPHEMERAL_AGENT_ID constant and simplify agent ID check

* refactor: enhance multi-agent message processing and primary agent determination

* refactor: implement branch message functionality for parallel responses

* refactor: integrate added conversation retrieval into message editing and regeneration processes

* refactor: remove unused isCard and isMultiMessage props from MessageRender and ContentRender components

* refactor: update @librechat/agents dependency to version 3.0.60

* refactor: replace usage of EPHEMERAL_AGENT_ID constant with isEphemeralAgentId function for improved clarity and consistency

* refactor: standardize agent ID format in tests for consistency

* chore: move addedConvo property to the correct position in payload construction

* refactor: rename agent_id values in loadAgent tests for clarity

* chore: reorder props in ContentParts component for improved readability

* refactor: rename variable 'content' to 'result' for clarity in RedisJobStore tests

* refactor: streamline useMessageActions by removing duplicate handleFeedback assignment

* chore: revert placeholder rendering logic in MessageRender and ContentRender components to original

* refactor: implement useContentMetadata hook for optimized content metadata handling

* refactor: remove contentMetadataMap and related logic from the codebase and revert back to agentId/groupId in content parts

- Eliminated contentMetadataMap from various components and services, simplifying the handling of message content.
- Updated functions to directly access agentId and groupId from content parts instead of relying on a separate metadata map.
- Adjusted related hooks and components to reflect the removal of contentMetadataMap, ensuring consistent handling of message content.
- Updated tests and documentation to align with the new structure of message content handling.
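
Reading `agentId` and `groupId` directly from content parts could be sketched as below. Only those two field names come from the commit text; `groupContentParts`, the `'unknown'` fallback key, and the return shape are hypothetical and not the project's implementation.

```javascript
// Hypothetical grouping: parts sharing a groupId are split into per-agent columns.
function groupContentParts(parts) {
  const groups = new Map(); // groupId -> Map(agentId -> parts[])
  const ungrouped = []; // parts without a groupId render sequentially
  for (const part of parts) {
    if (part.groupId == null) {
      ungrouped.push(part);
      continue;
    }
    if (!groups.has(part.groupId)) {
      groups.set(part.groupId, new Map());
    }
    const columns = groups.get(part.groupId);
    const agentKey = part.agentId ?? 'unknown';
    if (!columns.has(agentKey)) {
      columns.set(agentKey, []);
    }
    columns.get(agentKey).push(part);
  }
  return { ungrouped, groups };
}
```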

* refactor: remove logging from groupParallelContent function to clean up output

* refactor: remove model parameter from TBranchMessageRequest type for simplification

* refactor: enhance branch message creation by stripping metadata for standalone content

* chore: streamline branch message creation by simplifying content filtering and removing unnecessary metadata checks

* refactor: include attachments in branch message creation for improved content handling

* refactor: streamline agent content processing by consolidating primary agent identification and filtering logic

* refactor: simplify multi-agent message processing by creating a dedicated mapping method and enhancing content filtering

* refactor: remove unused parameter from loadEphemeralAgent function for cleaner code

* refactor: update groupId handling in metadata to only set when provided by the server
2025-12-25 01:43:54 -05:00