LibreChat

mirror of https://github.com/danny-avila/LibreChat.git synced 2026-04-03 14:27:20 +02:00

Author	SHA1	Message	Date
Danny Avila	cb41ba14b2	🔁 fix: Pass recursionLimit to OpenAI-Compatible Agents API Endpoint (#12510 ) * fix: pass recursionLimit to processStream in OpenAI-compatible agents API The OpenAI-compatible endpoint never passed recursionLimit to LangGraph's processStream(), silently capping all API-based agent calls at the default 25 steps. Mirror the 3-step cascade already used by the UI path (client.js): yaml config default → per-agent DB override → max cap. * refactor: extract resolveRecursionLimit into shared utility Extract the 3-step recursion limit cascade into a shared resolveRecursionLimit() function in @librechat/api. Both openai.js and client.js now call this single source of truth. Also fixes falsy-guard edge cases where recursion_limit=0 or maxRecursionLimit=0 would silently misbehave, by using explicit typeof + positive checks. Includes unit tests covering all cascade branches and edge cases. * refactor: use resolveRecursionLimit in openai.js and client.js Replace duplicated cascade logic in both controllers with the shared resolveRecursionLimit() utility from @librechat/api. In openai.js: hoist agentsEConfig to avoid double property walk, remove displaced comment, add integration test assertions. In client.js: remove inline cascade that was overriding config after initial assignment. * fix: hoist processStream mock for test accessibility The processStream mock was created inline inside mockResolvedValue, making it inaccessible via createRun.mock.results (which returns the Promise, not the resolved value). Hoist it to a module-level variable so tests can assert on it directly. * test: improve test isolation and boundary coverage Use mockReturnValueOnce instead of mockReturnValue to prevent mock leaking across test boundaries. Add boundary tests for downward agent override and exact-match maxRecursionLimit.	2026-04-01 21:13:07 -04:00
Dustin Healy	2451bf54cf	🛡️ fix: Restrict System Grants to Role Principals (#12491 ) * 🛡️ fix: restrict system grants to role principals only Narrows GrantPrincipalType to PrincipalType.ROLE, rejecting GROUP and USER with 400. Removes grant cascade cleanup from group/user deletion handlers and their route wiring since only roles can hold grants. * 🛡️ fix: address review findings for grants roles-only restriction Add missing GROUP rejection test for revokeGrant (symmetric with getPrincipalGrants and assignGrant coverage), add extensibility comment to GrantPrincipalType, and document the checkRoleExists guard.	2026-03-31 19:25:14 -04:00
Danny Avila	2e706ebcb3	⚖️ refactor: Split Config Route into Unauthenticated and Authenticated Paths (#12490 ) * refactor: split /api/config into unauthenticated and authenticated response paths - Replace preAuthTenantMiddleware with optionalJwtAuth on the /api/config route so the handler can detect whether the request is authenticated - When unauthenticated: call getAppConfig({ baseOnly: true }) for zero DB queries, return only login-relevant fields (social logins, turnstile, privacy policy / terms of service from interface config) - When authenticated: call getAppConfig({ role, userId, tenantId }) to resolve per-user DB overrides (USER + ROLE + GROUP + PUBLIC principals), return full payload including modelSpecs, balance, webSearch, etc. - Extract buildSharedPayload() and addWebSearchConfig() helpers to avoid duplication between the two code paths - Fixes per-user balance overrides not appearing in the frontend because userId was never passed to getAppConfig (follow-up to #12474) * test: rewrite config route tests for unauthenticated vs authenticated paths - Replace the previously-skipped supertest tests with proper mocked tests - Cover unauthenticated path: baseOnly config call, minimal payload, interface subset (privacyPolicy/termsOfService only), exclusion of authenticated-only fields - Cover authenticated path: getAppConfig called with userId, full payload including modelSpecs/balance/webSearch, per-user balance override merging * fix: address review findings — restore multi-tenant support, improve tests - Chain preAuthTenantMiddleware back before optionalJwtAuth on /api/config so unauthenticated requests in multi-tenant deployments still get tenant-scoped config via X-Tenant-Id header (Finding #1) - Use getAppConfig({ tenantId }) instead of getAppConfig({ baseOnly: true }) when a tenant context is present; fall back to baseOnly for single-tenant - Fix @type annotation: unauthenticated payload is Partial<TStartupConfig> - Refactor addWebSearchConfig into pure buildWebSearchConfig that returns a value instead of mutating the payload argument - Hoist isBirthday() to module level - Remove inline narration comments - Assert tenantId propagation in tests, including getTenantId fallback and user.tenantId preference - Add error-path tests for both unauthenticated and authenticated branches - Expand afterEach env var cleanup for proper test isolation * test: fix mock isolation and add tenant-scoped response test - Replace jest.clearAllMocks() with jest.resetAllMocks() so mockReturnValue implementations don't leak between tests - Add test verifying tenant-scoped socialLogins and turnstile are correctly mapped in the unauthenticated response * fix: add optionalJwtAuth to /api/config in experimental.js Without this middleware, req.user is never populated in the experimental cluster entrypoint, so authenticated users always receive the minimal unauthenticated config payload.	2026-03-31 19:22:51 -04:00
Dustin Healy	3d1b883e9d	👨‍👨‍👦‍👦 feat: Admin Users API Endpoints (#12446 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * feat: add admin user management endpoints Add /api/admin/users with list, search, and delete handlers gated by ACCESS_ADMIN + READ_USERS/MANAGE_USERS system grants. Handler factory in packages/api uses findUsers, countUsers, and deleteUserById from data-schemas. * fix: address convention violations in admin users handlers * fix: add pagination, self-deletion guard, and DB-level search limit - listUsers now uses parsePagination + countUsers for proper pagination matching the roles/groups pattern - findUsers extended with optional limit/offset options - deleteUser returns 403 when caller tries to delete own account - searchUsers passes limit to DB query instead of fetching all and slicing in JS - Fix import ordering per CLAUDE.md, complete logger mock - Replace fabricated date fallback with undefined * fix: deterministic sort, null-safe pagination, consistent search filter - Add sort option to findUsers; listUsers sorts by createdAt desc for deterministic pagination - Use != null guards for offset/limit to handle zero values correctly - Remove username from search filter since it is not in the projection or AdminUserSearchResult response type * fix: last-admin deletion guard and search query max-length - Prevent deleting the last admin user (look up target role, count admins, reject with 400 if count <= 1) - Cap search query at 200 characters to prevent regex DoS - Add tests for both guards * fix: include missing capability name in 403 Forbidden response * fix: cascade user deletion cleanup, search username, parallel capability checks - Cascade Config, AclEntry, and SystemGrant cleanup on user deletion (matching the pattern in roles/groups handlers) - Add username to admin search $or filter for parity with searchUsers - Parallelize READ_* capability checks in listAllGrants with Promise.all * fix: TOCTOU safety net, capability info leak, DRY/style cleanup, data-layer tests - Add post-delete admin recount with CRITICAL log if race leaves 0 admins - Revert capability name from 403 response to server-side log only - Document thin deleteUserById limitation (full cascade is a future task) - DRY: extract query.trim() to local variable in searchUsersHandler - Add username to search projection, response type, and AdminUserSearchResult - Functional filter/map in grants.ts parallel capability check - Consistent null guards and limit>0 guard in findUsers options - Fallback for empty result.message on delete response - Fix mockUser() to generate unique _id per call - Break long destructuring across multiple lines - Assert countUsers filter and non-admin skip in delete tests - Add data-layer tests for findUsers limit, offset, sort, and pagination * chore: comment out admin delete user endpoint (out of scope) * fix: cast USER principalId to ObjectId for ACL entry cleanup ACL entries store USER principalId as ObjectId (via grantPermission casting), but deleteAclEntries is a raw deleteMany that passes the filter through. Passing a string won't match stored ObjectIds, leaving orphaned entries. * chore: comment out unused requireManageUsers alongside disabled delete route * fix: add missing logger.warn mock in capabilities test * fix: harden admin users handlers — type safety, response consistency, test coverage - Unify response shape: AdminUserSearchResult.userId → id, add AdminUserListItem type - Fix unsafe req.query type assertion in searchUsersHandler (typeof guards) - Anchor search regex with ^ for prefix matching (enables index usage) - Add total/capped to search response for truncation signaling - Add parseInt radix, remove redundant new Date() wrap - Add tests: countUsers throw, countUsers call args, array query param, capped flag * fix: scope deleteGrantsForPrincipal to tenant, deterministic search sort, align test mocks - Add tenantId option to AdminUsersDeps.deleteGrantsForPrincipal and pass req.user.tenantId at the call site, matching the pattern already used by the roles and groups handlers - Add sort: { name: 1 } to searchUsersHandler for deterministic results - Align test mock deleteUserById messages with production output ('User was deleted successfully.') - Make capped-results test explicitly set limit: '20' instead of relying on the implicit default * test: add tenantId propagation test for deleteGrantsForPrincipal Add tenantId to createReqRes user type and test that a non-undefined tenantId is threaded through to deleteGrantsForPrincipal. * test: remove redundant deleteUserById override in tenantId test --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-30 23:06:50 -04:00
Danny Avila	fd01dfc083	💰 fix: Lazy-Initialize Balance Record at Check Time for Overrides (#12474 ) * fix: Lazy-initialize balance record when missing at check time When balance is configured via admin panel DB overrides, users with existing sessions never pass through the login middleware that creates their balance record. This causes checkBalanceRecord to find no record and return balance: 0, blocking the user. Add optional balanceConfig and upsertBalanceFields deps to CheckBalanceDeps. When no balance record exists but startBalance is configured, lazily create the record instead of returning canSpend: false. Pass the new deps from BaseClient, chatV1, and chatV2 callers. * test: Add checkBalance lazy initialization tests Cover lazy balance init scenarios: successful init with startBalance, insufficient startBalance, missing config fallback, undefined startBalance, missing upsertBalanceFields dep, and startBalance of 0. * fix: Address review findings for lazy balance initialization - Use canonical BalanceConfig and IBalanceUpdate types from @librechat/data-schemas instead of inline type definitions - Include auto-refill fields (autoRefillEnabled, refillIntervalValue, refillIntervalUnit, refillAmount, lastRefill) during lazy init, mirroring the login middleware's buildUpdateFields logic - Add try/catch around upsertBalanceFields with graceful fallback to canSpend: false on DB errors - Read balance from DB return value instead of raw startBalance constant - Fix misleading test names to describe observable throw behavior - Add tests: upsertBalanceFields rejection, auto-refill field inclusion, DB-returned balance value, and logViolation assertions * fix: Address second review pass findings - Fix import ordering: package type imports before local type imports - Remove misleading comment on DB-fallback test, rename for clarity - Add logViolation assertion to insufficient-balance lazy-init test - Add test for partial auto-refill config (autoRefillEnabled without required dependent fields) * refactor: Replace createMockReqRes factory with describe-scoped consts Replace zero-argument factory with plain const declarations using direct type casts instead of double-cast through unknown. * fix: Sort local type imports longest-first, add missing logViolation assertion - Reorder local type imports in spec file per AGENTS.md (longest to shortest within sub-group) - Add logViolation assertion to startBalance: 0 test for consistent violation payload coverage across all throw paths	2026-03-30 22:51:07 -04:00
Danny Avila	4f37e8adb9	🔐 feat: Admin Auth Support for SAML and Social OAuth Providers (#12472 ) * refactor: Add existingUsersOnly support to social and SAML auth callbacks - Add `existingUsersOnly` option to the `socialLogin` handler factory to reject unknown users instead of creating new accounts - Refactor SAML strategy callback into `createSamlCallback(existingUsersOnly)` factory function, mirroring the OpenID `createOpenIDCallback` pattern - Extract shared SAML config into `getBaseSamlConfig()` helper - Register `samlAdmin` passport strategy with `existingUsersOnly: true` and admin-specific callback URL, called automatically from `setupSaml()` * feat: Register admin OAuth strategy variants for all social providers - Add admin strategy exports to Google, GitHub, Discord, Facebook, and Apple strategy files with admin callback URLs and existingUsersOnly - Extract provider configs into reusable helpers to avoid duplication between regular and admin strategy constructors - Re-export all admin strategy factories from strategies/index.js - Register admin passport strategies (googleAdmin, githubAdmin, etc.) alongside regular ones in socialLogins.js when env vars are present * feat: Add admin auth routes for SAML and social OAuth providers - Add initiation and callback routes for SAML, Google, GitHub, Discord, Facebook, and Apple to the admin auth router - Each provider follows the exchange code + PKCE pattern established by OpenID admin auth: store PKCE challenge on initiation, retrieve on callback, generate exchange code for the admin panel - SAML and Apple use POST callbacks with state extracted from req.body.RelayState and req.body.state respectively - Extract storePkceChallenge(), retrievePkceChallenge(), and generateState() helpers; refactor existing OpenID routes to use them - All callback chains enforce requireAdminAccess, setBalanceConfig, checkDomainAllowed, and the shared createOAuthHandler - No changes needed to the generic POST /oauth/exchange endpoint * fix: Update SAML strategy test to handle dual strategy registration setupSaml() now registers both 'saml' and 'samlAdmin' strategies, causing the SamlStrategy mock to be called twice. The verifyCallback variable was getting overwritten with the admin callback (which has existingUsersOnly: true), making all new-user tests fail. Fix: capture only the first callback per setupSaml() call and reset between tests. * fix: Address review findings for admin OAuth strategy changes - Fix existingUsersOnly rejection in socialLogin.js to use cb(null, false, { message }) instead of cb(error), ensuring passport's failureRedirect fires correctly for admin flows - Consolidate duplicate require() calls in strategies/index.js by destructuring admin exports from the already-imported default export - Pass pre-parsed baseConfig to setupSamlAdmin() to avoid redundant certificate file I/O at startup - Extract getGoogleConfig() helper in googleStrategy.js for consistency with all other provider strategy files - Replace randomState() (openid-client) with generateState() (crypto) in the OpenID admin route for consistency with all other providers, and remove the now-unused openid-client import * Reorder import statements in auth.js	2026-03-30 22:49:44 -04:00
Danny Avila	2bf0f892d6	🛡️ fix: Add Origin Binding to Admin OAuth Exchange Codes (#12469 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * fix(auth): add origin binding to admin OAuth exchange codes Bind admin OAuth exchange codes to the admin panel's origin at generation time and validate the origin on redemption. This prevents an intercepted code (via referrer leakage, logs, or network capture) from being redeemed by a different origin within the 30-second TTL. - Store the admin panel origin alongside the exchange code in cache - Extract the request origin (from Origin/Referer headers) on the exchange endpoint and pass it for validation - Reject code redemption when the request origin does not match the stored origin (code is still consumed to prevent replay) - Backward compatible: codes without a stored origin are accepted * fix(auth): add PKCE proof-of-possession to admin OAuth exchange codes Add a PKCE-like code_challenge/code_verifier flow to the admin OAuth exchange so that intercepting the exchange code alone is insufficient to redeem it. The admin panel generates a code_verifier (stored in its HttpOnly session cookie) and sends sha256(verifier) as code_challenge through the OAuth initiation URL. LibreChat stores the challenge keyed by OAuth state and attaches it to the exchange code. On redemption, the admin panel sends the verifier and LibreChat verifies the hash match. - Add verifyCodeChallenge() helper using SHA-256 - Store code_challenge in ADMIN_OAUTH_EXCHANGE cache (pkce: prefix, 5min TTL) - Capture OAuth state in callback middleware before passport processes it - Accept code_verifier in exchange endpoint body - Backward compatible: no challenge stored → PKCE check skipped * fix(auth): harden PKCE and origin binding in admin OAuth exchange - Await cache.set for PKCE challenge storage with error handling - Use crypto.timingSafeEqual for PKCE hash comparison - Drop case-insensitive flag from hex validation regexes - Add code_verifier length validation (max 512 chars) - Normalize Origin header via URL parsing in resolveRequestOrigin - Add test for undefined requestOrigin rejection - Clarify JSDoc: hex-encoded SHA-256, not RFC 7636 S256 * fix(auth): fail closed on PKCE callback cache errors, clean up origin/buffer handling - Callback middleware now redirects to error URL on cache.get failure instead of silently continuing without PKCE challenge - resolveRequestOrigin returns undefined (not raw header) on parse failure - Remove dead try/catch around Buffer.from which never throws for string input * chore(auth): remove narration comments, scope eslint-disable to lines * chore(auth): narrow query.state to string, remove narration comments in exchange.ts * fix(auth): address review findings — warn on missing PKCE challenge, validate verifier length, deduplicate URL parse - Log warning when OAuth state is present but no PKCE challenge found - Add minimum length check (>= 1) on code_verifier input validation - Update POST /oauth/exchange JSDoc to document code_verifier param - Deduplicate new URL(redirectUri) parse in createOAuthHandler - Restore intent comment on pre-delete pattern in exchangeAdminCode * test(auth): replace mock cache with real Keyv, remove all as-any casts - Use real Keyv in-memory store instead of hand-rolled Map mock - Replace jest.fn mocks with jest.spyOn on real Keyv instance - Remove redundant store.has() assertion, use cache.get() instead - Eliminate all eslint-disable and as-any suppressions - User fixture no longer needs any cast (Keyv accepts plain objects) * fix(auth): add IUser type cast for test fixture to satisfy tsc	2026-03-30 16:54:00 -04:00
Danny Avila	fda72ac621	🏗️ refactor: Remove Redundant Caching, Migrate Config Services to TypeScript (#12466 ) * ♻️ refactor: Remove redundant scopedCacheKey caching, support user-provided key model fetching Remove redundant cache layers that used `scopedCacheKey()` (tenant-only scoping) on top of `getAppConfig()` which already caches per-principal (role+user+tenant). This caused config overrides for different principals within the same tenant to be invisible due to stale cached data. Changes: - Add `requireJwtAuth` to `/api/endpoints` route for proper user context - Remove ENDPOINT_CONFIG, STARTUP_CONFIG, PLUGINS, TOOLS, and MODELS_CONFIG cache layers — all derive from `getAppConfig()` with cheap computation - Enhance MODEL_QUERIES cache: hash(baseURL+apiKey) keys, 2-minute TTL, caching centralized in `fetchModels()` base function - Support fetching models with user-provided API keys in `loadConfigModels` via `getUserKeyValues` lookup (no caching for user keys) - Update all affected tests Closes #1028 * ♻️ refactor: Migrate config services to TypeScript in packages/api Move core config logic from CJS /api wrappers to typed TypeScript in packages/api using dependency injection factories: - `createEndpointsConfigService` — endpoint config merging + checkCapability - `createLoadConfigModels` — custom endpoint model loading with user key support - `createMCPToolCacheService` — MCP tool cache operations (update, merge, cache) /api files become thin wrappers that wire dependencies (getAppConfig, loadDefaultEndpointsConfig, getUserKeyValues, getCachedTools, etc.) into the typed factories. Also moves existing `endpoints/config.ts` → `endpoints/config/providers.ts` to accommodate the new `config/` directory structure. * 🔄 fix: Invalidate models query when user API key is set or revoked Without this, users had to refresh the page after entering their API key to see the updated model list fetched with their credentials. - Invalidate QueryKeys.models in useUpdateUserKeysMutation onSuccess - Invalidate QueryKeys.models in useRevokeUserKeyMutation onSuccess - Invalidate QueryKeys.models in useRevokeAllUserKeysMutation onSuccess * 🗺️ fix: Remap YAML-level override keys to AppConfig equivalents in mergeConfigOverrides Config overrides stored in the DB use YAML-level keys (TCustomConfig), but they're merged into the already-processed AppConfig where some fields have been renamed by AppService. This caused mcpServers overrides to land on a nonexistent key instead of mcpConfig, so config-override MCP servers never appeared in the UI. - Add OVERRIDE_KEY_MAP to remap mcpServers→mcpConfig, interface→interfaceConfig - Apply remapping before deep merge in mergeConfigOverrides - Add test for YAML-level key remapping behavior - Update existing tests to use AppConfig field names in assertions * 🧪 test: Update service.spec to use AppConfig field names after override key remapping * 🛡️ fix: Address code review findings — reliability, types, tests, and performance - Pass tenant context (getTenantId) in importers.js getEndpointsConfig call - Add 5 tests for user-provided API key model fetching (key found, no key, DB error, missing userId, apiKey-only with fixed baseURL) - Distinguish NO_USER_KEY (debug) from infrastructure errors (warn) in catch - Switch fetchPromisesMap from Promise.all to Promise.allSettled so one failing provider doesn't kill the entire model config - Parallelize getUserKeyValues DB lookups via batched Promise.allSettled instead of sequential awaits in the loop - Hoist standardCache instance in fetchModels to avoid double instantiation - Replace Record<string, unknown> types with Partial<TConfig>-based types; remove as unknown as T double-cast in endpoints config - Narrow Bedrock availableRegions to typed destructure - Narrow version field from string\|number\|undefined to string\|undefined - Fix import ordering in mcp/tools.ts and config/models.ts per AGENTS.md - Add JSDoc to getModelsConfig alias clarifying caching semantics * fix: Guard against null getCachedTools in mergeAppTools * 🔍 fix: Address follow-up review — deduplicate extractEnvVariable, fix error discrimination, add log-level tests - Deduplicate extractEnvVariable calls: resolve apiKey/baseURL once, reuse for both the entry and isUserProvided checks (Finding A) - Move ResolvedEndpoint interface from function closure to module scope (Finding B) - Replace fragile msg.includes('NO_USER_KEY') with ErrorTypes.NO_USER_KEY enum check against actual error message format (Finding C). Also handle ErrorTypes.INVALID_USER_KEY as an expected "no key" case. - Add test asserting logger.warn is called for infra errors (not debug) - Add test asserting logger.debug is called for NO_USER_KEY errors (not warn) * fix: Preserve numeric assistants version via String() coercion * 🐛 fix: Address secondary review — Ollama cache bypass, cache tests, type safety - Fix Ollama success path bypassing cache write in fetchModels (CRITICAL): store result before returning so Ollama models benefit from 2-minute TTL - Add 4 fetchModels cache behavior tests: cache write with TTL, cache hit short-circuits HTTP, skipCache bypasses read+write, empty results not cached - Type-safe OVERRIDE_KEY_MAP: Partial<Record<keyof TCustomConfig, keyof AppConfig>> so compiler catches future field rename mismatches - Fix import ordering in config/models.ts (package types longest→shortest) - Rename ToolCacheDeps → MCPToolCacheDeps for naming consistency - Expand getModelsConfig JSDoc to explain caching granularity * fix: Narrow OVERRIDE_KEY_MAP index to satisfy strict tsconfig * 🧩 fix: Add allowedProviders to TConfig, remove Record<string, unknown> from PartialEndpointEntry The agents endpoint config includes allowedProviders (used by the frontend AgentPanel to filter available providers), but it was missing from TConfig. This forced PartialEndpointEntry to use & Record<string, unknown> as an escape hatch, violating AGENTS.md type policy. - Add allowedProviders?: (string \| EModelEndpoint)[] to TConfig - Remove Record<string, unknown> from PartialEndpointEntry — now just Partial<TConfig> * 🛡️ fix: Isolate Ollama cache write from fetch try-catch, add Ollama cache tests - Separate Ollama fetch and cache write into distinct scopes so a cache failure (e.g., Redis down) doesn't misattribute the error as an Ollama API failure and fall through to the OpenAI-compatible path (Issue A) - Add 2 Ollama-specific cache tests: models written with TTL on fetch, cached models returned without hitting server (Issue B) - Replace hardcoded 120000 with Time.TWO_MINUTES constant in cache TTL test assertion (Issue C) - Fix OVERRIDE_KEY_MAP JSDoc to accurately describe runtime vs compile-time type enforcement (Issue D) - Add global beforeEach for cache mock reset to prevent cross-test leakage * 🧪 fix: Address third review — DI consistency, cache key width, MCP tests - Inject loadCustomEndpointsConfig via EndpointsConfigDeps with default fallback, matching loadDefaultEndpointsConfig DI pattern (Finding 3) - Widen modelsCacheKey from 64-bit (.slice(0,16)) to 128-bit (.slice(0,32)) for collision-sensitive cross-credential cache key (Finding 4) - Add fetchModels.mockReset() in loadConfigModels.spec beforeEach to prevent mock implementation leaks across tests (Finding 5) - Add 11 unit tests for createMCPToolCacheService covering all three functions: null/empty input, successful ops, error propagation, cold-cache merge (Finding 2) - Simplify getModelsConfig JSDoc to @see reference (Finding 10) * ♻️ refactor: Address remaining follow-ups from reviews OVERRIDE_KEY_MAP completeness: - Add missing turnstile→turnstileConfig mapping - Add exhaustiveness test verifying all three renamed keys are remapped and original YAML keys don't leak through Import role context: - Pass userRole through importConversations job → importLibreChatConvo so role-based endpoint overrides are honored during conversation import - Update convos.js route to include req.user.role in the job payload createEndpointsConfigService unit tests: - Add 8 tests covering: default+custom merge, Azure/AzureAssistants/ Anthropic Vertex/Bedrock config enrichment, assistants version coercion, agents allowedProviders, req.config bypass Plugins/tools efficiency: - Use Set for includedTools/filteredTools lookups (O(1) vs O(n) per plugin) - Combine auth check + filter into single pass (eliminates intermediate array) - Pre-compute toolDefKeys Set for O(1) tool definition lookups * fix: Scope model query cache by user when userIdQuery is enabled * fix: Skip model cache for userIdQuery endpoints, fix endpoints test types - When userIdQuery is true, skip caching entirely (like user_provided keys) to avoid cross-user model list leakage without duplicating cache data - Fix AgentCapabilities type error in endpoints.spec.ts — use enum values and appConfig() helper for partial mock typing * 🐛 fix: Restore filteredTools+includedTools composition, add checkCapability tests - Fix filteredTools regression: whitelist and blacklist are now applied independently (two flat guards), matching original behavior where includedTools=['a','b'] + filteredTools=['b'] produces ['a'] (Finding A) - Fix Set spread in toolkit loop: pre-compute toolDefKeysList array once alongside the Set, reuse for .some() without per-plugin allocation (Finding B) - Add 2 filteredTools tests: blacklist-only path and combined whitelist+blacklist composition (Finding C) - Add 3 checkCapability tests: capability present, capability absent, fallback to defaultAgentCapabilities for non-agents endpoints (Finding D) * 🔑 fix: Include config-override MCP servers in filterAuthorizedTools Config-override MCP servers (defined via admin config overrides for roles/groups) were rejected by filterAuthorizedTools because it called getAllServerConfigs(userId) without the configServers parameter. Only YAML and DB-backed user servers were included in the access check. - Add configServers parameter to filterAuthorizedTools - Resolve config servers via resolveConfigServers(req) at all 4 callsites (create, update, duplicate, revert) using parallel Promise.all - Pass configServers through to getAllServerConfigs(userId, configServers) so the registry merges config-source servers into the access check - Update filterAuthorizedTools.spec.js mock for resolveConfigServers * fix: Skip model cache for userIdQuery endpoints, fix endpoints test types For user-provided key endpoints (userProvide: true), skip the full model list re-fetch during message validation — the user already selected from a list we served them, and re-fetching with skipCache:true on every message send is both slow and fragile (5s provider timeout = rejected model). Instead, validate the model string format only: - Must be a string, max 256 chars - Must match [a-zA-Z0-9][a-zA-Z0-9_.:\-/@+ ]* (covers all known provider model ID formats while rejecting injection attempts) System-configured endpoints still get full model list validation as before. * 🧪 test: Add regression tests for filterAuthorizedTools configServers and validateModel filterAuthorizedTools: - Add test verifying configServers is passed to getAllServerConfigs and config-override server tools are allowed through - Guard resolveConfigServers in createAgentHandler to only run when MCP tools are present (skip for tool-free agent creates) validateModel (12 new tests): - Format validation: missing model, non-string, length overflow, leading special char, script injection, standard model ID acceptance - userProvide early-return: next() called immediately, getModelsConfig not invoked (regression guard for the exact bug this fixes) - System endpoint list validation: reject unknown model, accept known model, handle null/missing models config Also fix unnecessary backslash escape in MODEL_PATTERN regex. * 🧹 fix: Remove space from MODEL_PATTERN, trim input, clean up nits - Remove space character from MODEL_PATTERN regex — no real model ID uses spaces; prevents spurious violation logs from whitespace artifacts - Add model.trim() before validation to handle accidental whitespace - Remove redundant filterUniquePlugins call on already-deduplicated output - Add comment documenting intentional whitelist+blacklist composition - Add getUserKeyValues.mockReset() in loadConfigModels.spec beforeEach - Remove narrating JSDoc from getModelsConfig one-liner - Add 2 tests: trim whitespace handling, reject spaces in model ID * fix: Match startup tool loader semantics — includedTools takes precedence over filteredTools The startup tool loader (loadAndFormatTools) explicitly ignores filteredTools when includedTools is set, with a warning log. The PluginController was applying both independently, creating inconsistent behavior where the same config produced different results at startup vs plugin listing time. Restored mutually exclusive semantics: when includedTools is non-empty, filteredTools is not evaluated. * 🧹 chore: Simplify validateModel flow, note auth requirement on endpoints route - Separate missing-model from invalid-model checks cleanly: type+presence guard first, then trim+format guard (reviewer NIT) - Add route comment noting auth is required for role/tenant scoping * fix: Write trimmed model back to req.body.model for downstream consumers	2026-03-30 16:49:48 -04:00
Dustin Healy	a4a17ac771	⛩️ feat: Admin Grants API Endpoints (#12438 ) * feat: add System Grants handler factory with tests Handler factory with 4 endpoints: getEffectiveCapabilities (expanded capability set for authenticated user), getPrincipalGrants (list grants for a specific principal), assignGrant, and revokeGrant. Write ops dynamically check MANAGE_ROLES/GROUPS/USERS based on target principal type. 31 unit tests covering happy paths, validation, 403, and errors. * feat: wire System Grants REST routes Mount /api/admin/grants with requireJwtAuth + ACCESS_ADMIN gate. Add barrel export for createAdminGrantsHandlers and AdminGrantsDeps. * fix: cascade grant cleanup on role deletion Add deleteGrantsForPrincipal to AdminRolesDeps and call it in deleteRoleHandler via Promise.allSettled after successful deletion, matching the groups cleanup pattern. 3 tests added for cleanup call, skip on 404, and resilience to cleanup failure. * fix: simplify cascade grant cleanup on role deletion Replace Promise.allSettled wrapper with a direct try/catch for the single deleteGrantsForPrincipal call. * fix: harden grant handlers with auth, validation, types, and RESTful revoke - Add per-handler auth checks (401) and granular capability gates (READ_* for getPrincipalGrants, possession check for assignGrant) - Extract validatePrincipal helper; rewrite validateGrantBody to use direct type checks instead of unsafe `as string` casts - Align DI types with data layer (ResolvedPrincipal.principalType widened to string, getUserPrincipals role made optional) - Switch revoke route from DELETE body to RESTful URL params - Return 201 for assignGrant to match roles/groups create convention - Handle null grantCapability return with 500 - Add comprehensive test coverage for new auth/validation paths * fix: deduplicate ResolvedPrincipal, typed body, defensive auth checks - Remove duplicate ResolvedPrincipal from capabilities.ts; import the canonical export from grants.ts - Replace Record<string, unknown> with explicit GrantRequestBody interface - Add defensive 403 when READ_CAPABILITY_BY_TYPE lookup misses - Document revoke asymmetry (no possession check) with JSDoc - Use _id only in resolveUser (avoid Mongoose virtual reliance) - Improve null-grant error message - Complete logger mock in tests * refactor: move ResolvedPrincipal to shared types to fix circular dep Extract ResolvedPrincipal from admin/grants.ts to types/principal.ts so middleware/capabilities.ts imports from shared types rather than depending upward on the admin handler layer. * chore: remove dead re-export, align logger mocks across admin tests - Remove unused ResolvedPrincipal re-export from grants.ts (canonical source is types/principal.ts) - Align logger mocks in roles.spec.ts and groups.spec.ts to include all log levels (error, warn, info, debug) matching grants.spec.ts * fix: cascade Config and AclEntry cleanup on role deletion Add deleteConfig and deleteAclEntries to role deletion cascade, matching the group deletion pattern. Previously only grants were cleaned up, leaving orphaned config overrides and ACL entries. * perf: single-query batch for getEffectiveCapabilities Add getCapabilitiesForPrincipals (plural) to the data layer — a single $or query across all principals instead of N+1 parallel queries. Wire it into the grants handler so getEffectiveCapabilities hits the DB once regardless of how many principals the user has. * fix: defer SystemCapabilities access to factory call time Move all SystemCapabilities usage (VALID_CAPABILITIES, MANAGE_CAPABILITY_BY_TYPE, READ_CAPABILITY_BY_TYPE) inside the createAdminGrantsHandlers factory. External test suites that mock @librechat/data-schemas without providing SystemCapabilities crashed at import time when grants.ts was loaded transitively. * test: add data-layer and handler test coverage for review findings - Add 6 mongodb-memory-server tests for getCapabilitiesForPrincipals: multi-principal batch, empty array, filtering, tenant scoping - Add handler test: all principals filtered (only PUBLIC) - Add handler test: granting an implied capability succeeds - Add handler test: all cascade cleanup operations fail simultaneously - Document platform-scope-only tenantId behavior in JSDoc * fix: resolveUser fallback to user.id, early-return empty principals - Match capabilities middleware pattern: _id?.toString() ?? user.id to handle JWT-deserialized users without Mongoose _id - Move empty-array guard before principals.map() to skip unnecessary normalizePrincipalId calls - Add comment explaining VALID_PRINCIPAL_TYPES module-scope asymmetry * refactor: derive VALID_PRINCIPAL_TYPES from capability maps Make MANAGE_CAPABILITY_BY_TYPE and READ_CAPABILITY_BY_TYPE non-Partial Records over a shared GrantPrincipalType union, then derive VALID_PRINCIPAL_TYPES from the map keys. This makes divergence between the three data structures structurally impossible. * feat: add GET /api/admin/grants list-all-grants endpoint Add listAllGrants data-layer method and handler so the admin panel can fetch all grants in a single request instead of fanning out N+M calls per role and group. Response is filtered to only include grants for principal types the caller has read access to. * fix: update principalType to use GrantPrincipalType for consistency in grants handling - Refactor principalType in createAdminGrantsHandlers to use GrantPrincipalType instead of PrincipalType for better type accuracy. - Ensure type consistency across the grants handling logic in the API. * fix: address admin grants review findings — tenantId propagation, capability validation, pagination, and test coverage Propagate tenantId through all grant operations for multi-tenancy support. Extract isValidCapability to accept full SystemCapability union (base, section, assign) and reuse it in both Mongoose schema validation and handler input checks. Replace listAllGrants with paginated listGrants + countGrants. Filter PUBLIC principals from getCapabilitiesForPrincipals queries. Export getCachedPrincipals from ALS store for fast-path principal resolution. Move DELETE capability param to query string to avoid colon-in-URL issues. Remove dead code and add comprehensive handler and data-layer test coverage. * refactor: harden admin grants — FilterQuery types, auth-first ordering, DELETE path param, isValidCapability tests Replace Record<string, unknown> with FilterQuery<ISystemGrant> across all data-layer query filters. Refactor buildTenantFilter to a pure tenantCondition function that returns a composable FilterQuery fragment, eliminating the $or collision between tenant and principal queries. Move auth check before input validation in getPrincipalGrantsHandler, assignGrantHandler, and revokeGrantHandler to avoid leaking valid type names to unauthenticated callers. Switch DELETE route from query param back to path param (/:capability) with encodeURIComponent per project conventions. Add compound index for listGrants sort. Type VALID_PRINCIPAL_TYPES as Set<GrantPrincipalType>. Remove unused GetCachedPrincipalsFn type export. Add dedicated isValidCapability unit tests and revokeGrant idempotency test. * refactor: batch capability checks in listGrantsHandler via getHeldCapabilities Replace 3 parallel hasCapabilityForPrincipals DB calls with a single getHeldCapabilities query that returns the subset of capabilities any principal holds. Also: defensive limit(0) clamp, parallelized assignGrant auth checks, principalId type-vs-required error split, tenantCondition hoisted to factory top, JSDoc on cascade deps, DELETE route encoding note. * fix: normalize principalId and filter undefined in getHeldCapabilities Add normalizePrincipalId + null guard to getHeldCapabilities, matching the contract of getCapabilitiesForPrincipals. Simplify allCaps build with flatMap, add no-tenantId cross-check and undefined-principalId test cases. * refactor: use concrete types in GrantRequestBody, rename encoding test Replace unknown fields with explicit string types in GrantRequestBody, matching the established pattern in roles/groups/config handlers. Rename misleading 'encoded' test to 'with colons' since Express auto-decodes req.params. * fix: support hierarchical parent capabilities in possession checks hasCapabilityForPrincipals and getHeldCapabilities now resolve parent base capabilities for section/assignment grants. An admin holding manage:configs can now grant manage:configs:<section> and transitively read:configs:<section>. Fixes anti-escalation 403 blocking config capability delegation. * perf: use getHeldCapabilities in assignGrant to halve DB round-trips assignGrantHandler was making two parallel hasCapabilityForPrincipals calls to check manage + capability possession. getHeldCapabilities was introduced in this PR specifically for this pattern. Replace with a single batched call. Update corresponding spec assertions. * fix: validate role existence before granting capabilities Grants for non-existent role names were silently persisted, creating orphaned grants that could surprise-activate if a role with that name was later created. Add optional checkRoleExists dep to assignGrant and wire it to getRoleByName in the route file. * refactor: tighten principalType typing and use grantCapability in tests Narrow getCapabilitiesForPrincipals parameter from string to PrincipalType, removing the redundant cast. Replace direct SystemGrant.create() calls in getCapabilitiesForPrincipals tests with methods.grantCapability() to honor the schema's normalization invariant. Add getHeldCapabilities extended capability tests. * test: rename misleading cascade cleanup test name The test only injects failure into deleteGrantsForPrincipal, not all cascade operations. Rename from 'cascade cleanup fails' to 'grant cleanup fails' to match the actual scope. * fix: reorder role check after permission guard, add tenantId to index Move checkRoleExists after the getHeldCapabilities permission check so that a sub-MANAGE_ROLES admin cannot probe role name existence via 400 vs 403 response codes. Add tenantId to the { principalType, capability } index so listGrants queries in multi-tenant deployments can use a covering index instead of post-scanning for tenant condition. Add missing test for checkRoleExists throwing. * fix: scope deleteGrantsForPrincipal to tenant on role deletion deleteGrantsForPrincipal previously filtered only on principalType + principalId, deleting grants across all tenants. Since the role schema supports multi-tenancy (compound unique index on name + tenantId), two tenants can share a role name like 'editor'. Deleting that role in one tenant would wipe grants for identically-named roles in other tenants. Add optional tenantId parameter to deleteGrantsForPrincipal. When provided, scopes the delete to that tenant plus platform-level grants. Propagate req.user.tenantId through the role deletion cascade. * fix: scope grant cleanup to tenant on group deletion Same cross-tenant gap as the role deletion path: deleteGroupHandler called deleteGrantsForPrincipal without tenantId, so deleting a group would wipe its grants across all tenants. Extract req.user.tenantId and pass it through. * test: add HTTP integration test for admin grants routes Supertest-based test with real MongoMemoryServer exercising the full Express wiring: route registration, injected auth middleware, handler DI deps, and real DB round-trips. Covers GET /, GET /effective, POST / + DELETE / lifecycle, role existence validation, and 401 for unauthenticated callers. Also documents the expandImplications scope: the /effective endpoint returns base-level capabilities only; section-level resolution is handled at authorization check time by getParentCapabilities. * fix: use exact tenant match in deleteGrantsForPrincipal, normalize principalId, harden API CRITICAL: deleteGrantsForPrincipal was using tenantCondition (a read-query helper) for deleteMany, which includes the { tenantId: { $exists: false } } arm. This silently destroyed platform-level grants when a tenant-scoped role/group deletion occurred. Replace with exact { tenantId } match for deletes so platform-level grants survive tenant-scoped cascade cleanup. Refactor deleteGrantsForPrincipal signature from fragile positional overload (sessionOrTenantId union + maybeSession) to a clean options object: { tenantId?, session? }. Update all callers and test assertions. Add normalizePrincipalId to hasCapabilityForPrincipals to match the pattern already used by getHeldCapabilities — prevents string/ObjectId type mismatch on USER/GROUP principal queries. Also: export GrantPrincipalType from barrel, add upper-bound cap to listGrants, document GROUP/USER existence check trade-off, add integration tests for tenant-isolation property of deleteGrantsForPrincipal. * fix: forward tenantId to getUserPrincipals in resolvePrincipals resolvePrincipals had tenantId available from the caller but only forwarded it to getCachedPrincipals (cache lookup). The DB fallback via getUserPrincipals omitted it. While the Group schema's applyTenantIsolation Mongoose plugin handles scoping via AsyncLocalStorage in HTTP request context, explicitly passing tenantId makes the contract visible and prevents silent cross-tenant group resolution if called outside request context. * fix: remove unused import and add assertion to 401 integration test Remove unused SystemCapabilities import flagged by ESLint. Add explicit body assertion to the 401 test so it has a jest expect() call. * chore: hoist grant limit constants to scope, remove dead isolateModules Move GRANTS_DEFAULT_LIMIT / GRANTS_MAX_LIMIT from inside listGrants function body to createSystemGrantMethods scope so they are evaluated once at module load. Remove dead jest.isolateModules + jest.doMock block in integration test — the ~/models mock was never exercised since handlers are built with explicit DI deps. --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-30 16:49:23 -04:00
Danny Avila	0d94881c2d	🧹 refactor: Tighten Config Schema Typing and Remove Deprecated Fields (#12452 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * refactor: Remove deprecated and unused fields from endpoint schemas - Remove summarize, summaryModel from endpointSchema and azureEndpointSchema - Remove plugins from azureEndpointSchema - Remove customOrder from endpointSchema and azureEndpointSchema - Remove baseURL from all and agents endpoint schemas - Type paramDefinitions with full SettingDefinition-based schema - Clean up summarize/summaryModel references in initialize.ts and config.spec.ts * refactor: Improve MCP transport schema typing - Add defaults to transport type discriminators (stdio, websocket, sse) - Type stderr field as IOType union instead of z.any() * refactor: Add narrowed preset schema for model specs - Create tModelSpecPresetSchema omitting system/DB/deprecated fields - Update tModelSpecSchema to use the narrowed preset schema * test: Add explicit type field to MCP test fixtures Add transport type discriminator to test objects that construct MCPOptions/ParsedServerConfig directly, required after type field changed from optional to default in schema definitions. * chore: Bump librechat-data-provider to 0.8.404 * refactor: Tighten z.record(z.any()) fields to precise value types - Type headers fields as z.record(z.string()) in endpoint, assistant, and azure schemas - Type addParams as z.record(z.union([z.string(), z.number(), z.boolean(), z.null()])) - Type azure additionalHeaders as z.record(z.string()) - Type memory model_parameters as z.record(z.union([z.string(), z.number(), z.boolean()])) - Type firecrawl changeTrackingOptions.schema as z.record(z.string()) * refactor: Type supportedMimeTypes schema as z.array(z.string()) Replace z.array(z.any()).refine() with z.array(z.string()) since config input is always strings that get converted to RegExp via convertStringsToRegex() after parsing. Destructure supportedMimeTypes from spreads to avoid string[]/RegExp[] type mismatch. * refactor: Tighten enum, role, and numeric constraint schemas - Type engineSTT as enum ['openai', 'azureOpenAI'] - Type engineTTS as enum ['openai', 'azureOpenAI', 'elevenlabs', 'localai'] - Constrain playbackRate to 0.25–4 range - Type titleMessageRole as enum ['system', 'user', 'assistant'] - Add int().nonnegative() to MCP timeout and firecrawl timeout * chore: Bump librechat-data-provider to 0.8.405 * fix: Accept both string and RegExp in supportedMimeTypes schema The schema must accept both string[] (config input) and RegExp[] (post-merge runtime) since tests validate merged output against the schema. Use z.union([z.string(), z.instanceof(RegExp)]) to handle both. * refactor: Address review findings for schema tightening PR - Revert changeTrackingOptions.schema to z.record(z.unknown()) (JSON Schema is nested, not flat strings) - Remove dead contextStrategy code from BaseClient.js and cleanup.js - Extract paramDefinitionSchema to named exported constant - Add .int() constraint to columnSpan and columns - Apply consistent .int().nonnegative() to initTimeout, sseReadTimeout, scraperTimeout - Update stale stderr JSDoc to match actual accepted types - Add comprehensive tests for paramDefinitionSchema, tModelSpecPresetSchema, endpointSchema deprecated field stripping, and azureEndpointSchema * fix: Address second review pass findings - Revert supportedMimeTypesSchema to z.array(z.string()) and remove as string[] casts — fix tests to not validate merged RegExp[] output against the config input schema - Remove unused tModelSpecSchema import from test file - Consolidate duplicate '../src/schemas' imports - Add expiredAt coverage to tModelSpecPresetSchema test - Assert plugins is absent in azureEndpointSchema test - Add sync comments for engineSTT/engineTTS enum literals * refactor: Omit preset-management fields from tModelSpecPresetSchema Omit conversationId, presetId, title, defaultPreset, and order from the model spec preset schema — these are preset-management fields that don't belong in model spec configuration.	2026-03-29 01:10:57 -04:00
Danny Avila	877c2efc85	🏗️ feat: bulkWrite isolation, pre-auth context, strict-mode fixes (#12445 ) * fix: wrap seedDatabase() in runAsSystem() for strict tenant mode seedDatabase() was called without tenant context at startup, causing every Mongoose operation inside it to throw when TENANT_ISOLATION_STRICT=true. Wrapping in runAsSystem() gives it the SYSTEM_TENANT_ID sentinel so the isolation plugin skips filtering, matching the pattern already used for performStartupChecks and updateInterfacePermissions. * fix: chain tenantContextMiddleware in optionalJwtAuth optionalJwtAuth populated req.user but never established ALS tenant context, unlike requireJwtAuth which chains tenantContextMiddleware after successful auth. Authenticated users hitting routes with optionalJwtAuth (e.g. /api/banner) had no tenant isolation. * feat: tenant-safe bulkWrite wrapper and call-site migration Mongoose's bulkWrite() does not trigger schema-level middleware hooks, so the applyTenantIsolation plugin cannot intercept it. This adds a tenantSafeBulkWrite() utility that injects the current ALS tenant context into every operation's filter/document before delegating to native bulkWrite. Migrates all 8 runtime bulkWrite call sites: - agentCategory (seedCategories, ensureDefaultCategories) - conversation (bulkSaveConvos) - message (bulkSaveMessages) - file (batchUpdateFiles) - conversationTag (updateTagsForConversation, bulkIncrementTagCounts) - aclEntry (bulkWriteAclEntries) systemGrant.seedSystemGrants is intentionally not migrated — it uses explicit tenantId: { $exists: false } filters and is exempt from the isolation plugin. * feat: pre-auth tenant middleware and tenant-scoped config cache Adds preAuthTenantMiddleware that reads X-Tenant-Id from the request header and wraps downstream in tenantStorage ALS context. Wired onto /oauth, /api/auth, /api/config, and /api/share — unauthenticated routes that need tenant scoping before JWT auth runs. The /api/config cache key is now tenant-scoped (STARTUP_CONFIG:${tenantId}) so multi-tenant deployments serve the correct login page config per tenant. The middleware is intentionally minimal — no subdomain parsing, no OIDC claim extraction. The private fork's reverse proxy or auth gateway sets the header. * feat: accept optional tenantId in updateInterfacePermissions When tenantId is provided, the function re-enters inside tenantStorage.run({ tenantId }) so all downstream Mongoose queries target that tenant's roles instead of the system context. This lets the private fork's tenant provisioning flow call updateInterfacePermissions per-tenant after creating tenant-scoped ADMIN/USER roles. * fix: tenant-filter $lookup in getPromptGroup aggregation The $lookup stage in getPromptGroup() queried the prompts collection without tenant filtering. While the outer PromptGroup aggregate is protected by the tenantIsolation plugin's pre('aggregate') hook, $lookup runs as an internal MongoDB operation that bypasses Mongoose hooks entirely. Converts from simple field-based $lookup to pipeline-based $lookup with an explicit tenantId match when tenant context is active. * fix: replace field-level unique indexes with tenant-scoped compounds Field-level unique:true creates a globally-unique single-field index in MongoDB, which would cause insert failures across tenants sharing the same ID values. - agent.id: removed field-level unique, added { id, tenantId } compound - convo.conversationId: removed field-level unique (compound at line 50 already exists: { conversationId, user, tenantId }) - message.messageId: removed field-level unique (compound at line 165 already exists: { messageId, user, tenantId }) - preset.presetId: removed field-level unique, added { presetId, tenantId } compound * fix: scope MODELS_CONFIG, ENDPOINT_CONFIG, PLUGINS, TOOLS caches by tenant These caches store per-tenant configuration (available models, endpoint settings, plugin availability, tool definitions) but were using global cache keys. In multi-tenant mode, one tenant's cached config would be served to all tenants. Appends :${tenantId} to cache keys when tenant context is active. Falls back to the unscoped key when no tenant context exists (backward compatible for single-tenant OSS deployments). Covers all read, write, and delete sites: - ModelController.js: get/set MODELS_CONFIG - PluginController.js: get/set PLUGINS, get/set TOOLS - getEndpointsConfig.js: get/set/delete ENDPOINT_CONFIG - app.js: delete ENDPOINT_CONFIG (clearEndpointConfigCache) - mcp.js: delete TOOLS (updateMCPTools, mergeAppTools) - importers.js: get ENDPOINT_CONFIG * fix: add getTenantId to PluginController spec mock The data-schemas mock was missing getTenantId, causing all PluginController tests to throw when the controller calls getTenantId() for tenant-scoped cache keys. * fix: address review findings — migration, strict-mode, DRY, types Addresses all CRITICAL, MAJOR, and MINOR review findings: F1 (CRITICAL): Add agents, conversations, messages, presets to SUPERSEDED_INDEXES in tenantIndexes.ts so dropSupersededTenantIndexes() drops the old single-field unique indexes that block multi-tenant inserts. F2 (CRITICAL): Unknown bulkWrite op types now throw in strict mode instead of silently passing through without tenant injection. F3 (MAJOR): Replace wildcard export with named export for tenantSafeBulkWrite, hiding _resetBulkWriteStrictCache from the public package API. F5 (MAJOR): Restore AnyBulkWriteOperation<IAclEntry>[] typing on bulkWriteAclEntries — the unparameterized wrapper accepts parameterized ops as a subtype. F7 (MAJOR): Fix config.js tenant precedence — JWT-derived req.user.tenantId now takes priority over the X-Tenant-Id header for authenticated requests. F8 (MINOR): Extract scopedCacheKey() helper into tenantContext.ts and replace all 11 inline occurrences across 7 files. F9 (MINOR): Use simple localField/foreignField $lookup for the non-tenant getPromptGroup path (more efficient index seeks). F12 (NIT): Remove redundant BulkOp type alias. F13 (NIT): Remove debug log that leaked raw tenantId. * fix: add new superseded indexes to tenantIndexes test fixture The test creates old indexes to verify the migration drops them. Missing fixture entries for agents.id_1, conversations.conversationId_1, messages.messageId_1, and presets.presetId_1 caused the count assertion to fail (expected 22, got 18). * fix: restore logger.warn for unknown bulk op types in non-strict mode * fix: block SYSTEM_TENANT_ID sentinel from external header input CRITICAL: preAuthTenantMiddleware accepted any string as X-Tenant-Id, including '__SYSTEM__'. The tenantIsolation plugin treats SYSTEM_TENANT_ID as an explicit bypass — skipping ALL query filters. A client sending X-Tenant-Id: __SYSTEM__ to pre-auth routes (/api/share, /api/config, /api/auth, /oauth) would execute Mongoose operations without tenant isolation. Fixes: - preAuthTenantMiddleware rejects SYSTEM_TENANT_ID in header - scopedCacheKey returns the base key (not key:__SYSTEM__) in system context, preventing stale cache entries during runAsSystem() - updateInterfacePermissions guards tenantId against SYSTEM_TENANT_ID - $lookup pipeline separates $expr join from constant tenantId match for better index utilization - Regression test for sentinel rejection in preAuthTenant.spec.ts - Remove redundant getTenantId() call in config.js * test: add missing deleteMany/replaceOne coverage, fix vacuous ALS assertions bulkWrite spec: - deleteMany: verifies tenant-scoped deletion leaves other tenants untouched - replaceOne: verifies tenantId injected into both filter and replacement - replaceOne overwrite: verifies a conflicting tenantId in the replacement document is overwritten by the ALS tenant (defense-in-depth) - empty ops array: verifies graceful handling preAuthTenant spec: - All negative-case tests now use the capturedNext pattern to verify getTenantId() inside the middleware's execution context, not the test runner's outer frame (which was always undefined regardless) * feat: tenant-isolate MESSAGES cache, FLOWS cache, and GenerationJobManager MESSAGES cache (streamAudio.js): - Cache key now uses scopedCacheKey(messageId) to prefix with tenantId, preventing cross-tenant message content reads during TTS streaming. FLOWS cache (FlowStateManager): - getFlowKey() now generates ${type}:${tenantId}:${flowId} when tenant context is active, isolating OAuth flow state per tenant. GenerationJobManager: - tenantId added to SerializableJobData and GenerationJobMetadata - createJob() captures the current ALS tenant context (excluding SYSTEM_TENANT_ID) and stores it in job metadata - SSE subscription endpoint validates job.metadata.tenantId matches req.user.tenantId, blocking cross-tenant stream access - Both InMemoryJobStore and RedisJobStore updated to accept tenantId * fix: add getTenantId and SYSTEM_TENANT_ID to MCP OAuth test mocks FlowStateManager.getFlowKey() now calls getTenantId() for tenant-scoped flow keys. The 4 MCP OAuth test files mock @librechat/data-schemas without these exports, causing TypeError at runtime. * fix: correct import ordering per AGENTS.md conventions Package imports sorted shortest to longest line length, local imports sorted longest to shortest — fixes ordering violations introduced by our new imports across 8 files. * fix: deserialize tenantId in RedisJobStore — cross-tenant SSE guard was no-op in Redis mode serializeJob() writes tenantId to the Redis hash via Object.entries, but deserializeJob() manually enumerates fields and omitted tenantId. Every getJob() from Redis returned tenantId: undefined, causing the SSE route's cross-tenant guard to short-circuit (undefined && ... → false). * test: SSE tenant guard, FlowStateManager key consistency, ALS scope docs SSE stream tenant tests (streamTenant.spec.js): - Cross-tenant user accessing another tenant's stream → 403 - Same-tenant user accessing own stream → allowed - OSS mode (no tenantId on job) → tenant check skipped FlowStateManager tenant tests (manager.tenant.spec.ts): - completeFlow finds flow created under same tenant context - completeFlow does NOT find flow under different tenant context - Unscoped flows are separate from tenant-scoped flows Documentation: - JSDoc on getFlowKey documenting ALS context consistency requirement - Comment on streamAudio.js scopedCacheKey capture site * fix: SSE stream tests hang on success path, remove internal fork references The success-path tests entered the SSE streaming code which never closes, causing timeout. Mock subscribe() to end the response immediately. Restructured assertions to verify non-403/non-404. Removed "private fork" and "OSS" references from code and test descriptions — replaced with "deployment layer", "multi-tenant deployments", and "single-tenant mode". * fix: address review findings — test rigor, tenant ID validation, docs F1: SSE stream tests now mock subscribe() with correct signature (streamId, writeEvent, onDone, onError) and assert 200 status, verifying the tenant guard actually allows through same-tenant users. F2: completeFlow logs the attempted key and ALS tenantId when flow is not found, so reverse proxy misconfiguration (missing X-Tenant-Id on OAuth callback) produces an actionable warning. F3/F10: preAuthTenantMiddleware validates tenant ID format — rejects colons, special characters, and values exceeding 128 chars. Trims whitespace. Prevents cache key collisions via crafted headers. F4: Documented cache invalidation scope limitation in clearEndpointConfigCache — only the calling tenant's key is cleared; other tenants expire via TTL. F7: getFlowKey JSDoc now lists all 8 methods requiring consistent ALS context. F8: Added dedicated scopedCacheKey unit tests — base key without context, base key in system context, scoped key with tenant, no ALS leakage across scope boundaries. * fix: revert flow key tenant scoping, fix SSE test timing FlowStateManager: Reverts tenant-scoped flow keys. OAuth callbacks arrive without tenant ALS context (provider redirects don't carry X-Tenant-Id), so completeFlow/failFlow would never find flows created under tenant context. Flow IDs are random UUIDs with no collision risk, and flow data is ephemeral (TTL-bounded). SSE tests: Use process.nextTick for onDone callback so Express response headers are flushed before res.write/res.end are called. * fix: restore getTenantId import for completeFlow diagnostic log * fix: correct completeFlow warning message, add missing flow test The warning referenced X-Tenant-Id header consistency which was only relevant when flow keys were tenant-scoped (since reverted). Updated to list actual causes: TTL expiry, missing flow, or routing to a different instance without shared Keyv storage. Removed the getTenantId() call and import — no longer needed since flow keys are unscoped. Added test for the !flowState branch in completeFlow — verifies return false and logger.warn on nonexistent flow ID. * fix: add explicit return type to recursive updateInterfacePermissions The recursive call (tenantId branch calls itself without tenantId) causes TypeScript to infer circular return type 'any'. Adding explicit Promise<void> satisfies the rollup typescript plugin. * fix: update MCPOAuthRaceCondition test to match new completeFlow warning * fix: clearEndpointConfigCache deletes both scoped and unscoped keys Unauthenticated /api/endpoints requests populate the unscoped ENDPOINT_CONFIG key. Admin config mutations clear only the tenant-scoped key, leaving the unscoped entry stale indefinitely. Now deletes both when in tenant context. * fix: tenant guard on abort/status endpoints, warn logs, test coverage F1: Add tenant guard to /chat/status/:conversationId and /chat/abort matching the existing guard on /chat/stream/:streamId. The status endpoint exposes aggregatedContent (AI response text) which requires tenant-level access control. F2: preAuthTenantMiddleware now logs warn for rejected __SYSTEM__ sentinel and malformed tenant IDs, providing observability for bypass probing attempts. F3: Abort fallback path (getActiveJobIdsForUser) now has tenant check after resolving the job. F4: Test for strict mode + SYSTEM_TENANT_ID — verifies runAsSystem bypasses tenantSafeBulkWrite without throwing in strict mode. F5: Test for job with tenantId + user without tenantId → 403. F10: Regex uses idiomatic hyphen-at-start form. F11: Test descriptions changed from "rejects" to "ignores" since middleware calls next() (not 4xx). Also fixes MCPOAuthRaceCondition test assertion to match updated completeFlow warning message. * fix: test coverage for logger.warn, status/abort guards, consistency A: preAuthTenant spec now mocks logger and asserts warn calls for __SYSTEM__ sentinel, malformed characters, and oversized headers. B: streamTenant spec expanded with status and abort endpoint tests — cross-tenant status returns 403, same-tenant returns 200 with body, cross-tenant abort returns 403. C: Abort endpoint uses req.user.tenantId (not req.user?.tenantId) matching stream/status pattern — requireJwtAuth guarantees req.user. D: Malformed header warning now includes ip in log metadata, matching the sentinel warning for consistent SOC correlation. * fix: assert ip field in malformed header warn tests * fix: parallelize cache deletes, document tenant guard, fix import order - clearEndpointConfigCache uses Promise.all for independent cache deletes instead of sequential awaits - SSE stream tenant guard has inline comment explaining backward-compat behavior for untenanted legacy jobs - conversation.ts local imports reordered longest-to-shortest per AGENTS.md * fix: tenant-qualify userJobs keys, document tenant guard backward-compat Job store userJobs keys now include tenantId when available: - Redis: stream:user:{tenantId:userId}:jobs (falls back to stream:user:{userId}:jobs when no tenant) - InMemory: composite key tenantId:userId in userJobMap getActiveJobIdsByUser/getActiveJobIdsForUser accept optional tenantId parameter, threaded through from req.user.tenantId at all call sites (/chat/active and /chat/abort fallback). Added inline comments on all three SSE tenant guards explaining the backward-compat design: untenanted legacy jobs remain accessible when the userId check passes. * fix: parallelize cache deletes, document tenant guard, fix import order Fix InMemoryJobStore.getActiveJobIdsByUser empty-set cleanup to use the tenant-qualified userKey instead of bare userId — prevents orphaned empty Sets accumulating in userJobMap for multi-tenant users. Document cross-tenant staleness in clearEndpointConfigCache JSDoc — other tenants' scoped keys expire via TTL, not active invalidation. * fix: cleanup userJobMap leak, startup warning, DRY tenant guard, docs F1: InMemoryJobStore.cleanup() now removes entries from userJobMap before calling deleteJob, preventing orphaned empty Sets from accumulating with tenant-qualified composite keys. F2: Startup warning when TENANT_ISOLATION_STRICT is active — reminds operators to configure reverse proxy to control X-Tenant-Id header. F3: mergeAppTools JSDoc documents that tenant-scoped TOOLS keys are not actively invalidated (matching clearEndpointConfigCache pattern). F5: Abort handler getActiveJobIdsForUser call uses req.user.tenantId (not req.user?.tenantId) — consistent with stream/status handlers. F6: updateInterfacePermissions JSDoc clarifies SYSTEM_TENANT_ID behavior — falls through to caller's ALS context. F7: Extracted hasTenantMismatch() helper, replacing three identical inline tenant guard blocks across stream/status/abort endpoints. F9: scopedCacheKey JSDoc documents both passthrough cases (no context and SYSTEM_TENANT_ID context). * fix: clean userJobMap in evictOldest — same leak as cleanup()	2026-03-28 16:43:50 -04:00
Danny Avila	935288f841	🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring - Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with `enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains` - Add `MCPServerSource` type ('yaml' \| 'config' \| 'user') and `source` field to `ParsedServerConfig` - Change `dbSourced` determination from `!!config.dbId` to `config.source === 'user'` across MCPManager, ConnectionsRepository, UserConnectionManager, and MCPServerInspector - Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB * feat: three-layer MCPServersRegistry with config cache and lazy init - Add `configCacheRepo` as third repository layer between YAML cache and DB for admin-defined config-source MCP servers - Implement `ensureConfigServers()` that identifies config-override servers from resolved `getAppConfig()` mcpConfig, lazily inspects them, and caches parsed configs with `source: 'config'` - Add `lazyInitConfigServer()` with timeout, stub-on-failure, and concurrent-init deduplication via `pendingConfigInits` map - Extend `getAllServerConfigs()` with optional `configServers` param for three-way merge: YAML → Config → User - Add `getServerConfig()` lookup through config cache layer - Add `invalidateConfigCache()` for clearing config-source inspection results on admin config mutations - Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on DB-stored servers in `addServer()` and `addServerStub()` * feat: wire tenant context into MCP controllers, services, and cache invalidation - Resolve config-source servers via `getAppConfig({ role, tenantId })` in `getMCPTools()` and `getMCPServersList()` controllers - Pass `ensureConfigServers()` results through `getAllServerConfigs()` for three-way merge of YAML + Config + User servers - Add tenant/role context to `getMCPSetupData()` and connection status routes via `getTenantId()` from ALS - Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin config mutations trigger re-inspection of config-source MCP servers * feat: enforce tenantMcpPolicy on admin config mcpServers mutations - Add `validateMcpServerPolicy()` helper that checks mcpServers against operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant, allowedTransports, allowedDomains) - Wire validation into `upsertConfigOverrides` and `patchConfigField` handlers — rejects with 403 when policy is violated - Infer transport type from config shape (command → stdio, url protocol → websocket/sse, type field → streamable-http) - Validate server domains against policy allowlist when configured * revert: remove tenantMcpPolicy schema and enforcement The existing admin config CRUD routes already provide the mechanism for granular MCP server prepopulation (groups, roles, users). The tenantMcpPolicy gating adds unnecessary complexity that can be revisited if needed in the future. - Remove tenantMcpPolicy from mcpSettings Zod schema - Remove validateMcpServerPolicy helper and TenantMcpPolicy interface - Remove policy enforcement from upsertConfigOverrides and patchConfigField handlers * test: update test assertions for source field and config-server wiring - Use objectContaining in MCPServersRegistry reset test to account for new source: 'yaml' field on CACHE-stored configs - Add getTenantId and ensureConfigServers mocks to MCP route tests - Add getAppConfig mock to route test Config service mock - Update getMCPSetupData assertion to expect second options argument - Update getAllServerConfigs assertions for new configServers parameter * fix: disconnect active connections when config-source servers are evicted When admin config overrides change and config-source MCP servers are removed, the invalidation now proactively disconnects active connections for evicted servers instead of leaving them lingering until timeout. - Return evicted server names from invalidateConfigCache() - Disconnect app-level connections for evicted servers in clearMcpConfigCache() via MCPManager.appConnections.disconnect() * fix: address code review findings (CRITICAL, MAJOR, MINOR) CRITICAL fixes: - Scope configCacheRepo keys by config content hash to prevent cross-tenant cache poisoning when two tenants define the same server name with different configurations - Change dbSourced checks from `source === 'user'` to `source !== 'yaml' && source !== 'config'` so undefined source (pre-upgrade cached configs) fails closed to restricted mode MAJOR fixes: - Derive OAuth servers from already-computed mcpConfig instead of calling getOAuthServers() separately — config-source OAuth servers are now properly detected - Add parseInt radix (10) and NaN guard with fallback to 30_000 for CONFIG_SERVER_INIT_TIMEOUT_MS - Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in ServerConfigsCacheFactory to avoid SCAN-based Redis stalls - Remove `if (role \|\| tenantId)` guard in getMCPSetupData — config servers now always resolve regardless of tenant context MINOR fixes: - Extract resolveAllMcpConfigs() helper in mcp controller to eliminate 3x copy-pasted config resolution boilerplate - Distinguish "not initialized" from real errors in clearMcpConfigCache — log actual failures instead of swallowing - Remove narrative inline comments per style guide - Remove dead try/catch inside Promise.allSettled in ensureConfigServers (inner method never throws) - Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll() calls per request Test updates: - Add ensureConfigServers mock to registry test fixtures - Update getMCPSetupData assertions for inline OAuth derivation * fix: address code review findings (CRITICAL, MAJOR, MINOR) CRITICAL fixes: - Break circular dependency: move CONFIG_CACHE_NAMESPACE from MCPServersRegistry to ServerConfigsCacheFactory - Fix dbSourced fail-closed: use source field when present, fall back to legacy dbId check when absent (backward-compatible with pre-upgrade cached configs that lack source field) MAJOR fixes: - Add CONFIG_CACHE_NAMESPACE to aggregate-key set in ServerConfigsCacheFactory to avoid SCAN-based Redis stalls - Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests) covering lazy init, stub-on-failure, cross-tenant isolation via config hash keys, concurrent deduplication, merge order, and cache invalidation MINOR fixes: - Update MCPServerInspector test assertion for dbSourced change * fix: restore getServerConfig lookup for config-source servers (NEW-1) Add configNameToKey map that indexes server name → hash-based cache key for O(1) lookup by name in getServerConfig. This restores the config cache layer that was dropped when hash-based keys were introduced. Without this fix, config-source servers appeared in tool listings (via getAllServerConfigs) but getServerConfig returned undefined, breaking all connection and tool call paths. - Populate configNameToKey in ensureSingleConfigServer - Clear configNameToKey in invalidateConfigCache and reset - Clear stale read-through cache entries after lazy init - Remove dead code in invalidateConfigCache (config.title, key parsing) - Add getServerConfig tests for config-source server lookup * fix: eliminate configNameToKey race via caller-provided configServers param Replace the process-global configNameToKey map (last-writer-wins under concurrent multi-tenant load) with a configServers parameter on getServerConfig. Callers pass the pre-resolved config servers map directly — no shared mutable state, no cross-tenant race. - Add optional configServers param to getServerConfig; when provided, returns matching config directly without any global lookup - Remove configNameToKey map entirely (was the source of the race) - Extract server names from cache keys via lastIndexOf in invalidateConfigCache (safe for names containing colons) - Use mcpConfig[serverName] directly in getMCPTools instead of a redundant getServerConfig call - Add cross-tenant isolation test for getServerConfig * fix: populate read-through cache after config server lazy init After lazyInitConfigServer succeeds, write the parsed config to readThroughCache keyed by serverName so that getServerConfig calls from ConnectionsRepository, UserConnectionManager, and MCPManager.callTool find the config without needing configServers. Without this, config-source servers appeared in tool listings but every connection attempt and tool call returned undefined. * fix: user-scoped getServerConfig fallback to server-only cache key When getServerConfig is called with a userId (e.g., from callTool or UserConnectionManager), the cache key is serverName::userId. Config-source servers are cached under the server-only key (no userId). Add a fallback so user-scoped lookups find config-source servers in the read-through cache. * fix: configCacheRepo fallback, isUserSourced DRY, cross-process race CRITICAL: Add findInConfigCache fallback in getServerConfig so config-source servers remain reachable after readThroughCache TTL expires (5s). Without this, every tool call after 5s returned undefined for config-source servers. MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace all 5 inline dbSourced ternary expressions (MCPManager x2, ConnectionsRepository, UserConnectionManager, MCPServerInspector). MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when configCacheRepo.add throws (key exists from another process), fall back to reading the existing entry instead of returning undefined. MINOR: Parallelize invalidateConfigCache awaits with Promise.all. Remove redundant .catch(() => {}) inside Promise.allSettled. Tighten dedup test assertion to toBe(1). Add TTL-expiry tests for getServerConfig (with and without userId). * feat: thread configServers through getAppToolFunctions and formatInstructionsForContext Add optional configServers parameter to getAppToolFunctions, getInstructions, and formatInstructionsForContext so config-source server tools and instructions are visible to agent initialization and context injection paths. Existing callers (boot-time init, tests) pass no argument and continue to work unchanged. Agent runtime paths can now thread resolved config servers from request context. * fix: stale failure stubs retry after 5 min, upsert for cross-process races - Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried instead of permanently disabling config-source servers after transient errors (DNS outage, cold-start race) - Extract upsertConfigCache() helper that tries add then falls back to update, preventing cross-process Redis races where a second instance's successful inspection result was discarded - Add test for stale-stub retry after CONFIG_STUB_RETRY_MS * fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup - Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs were always considered stale (updatedAt ?? 0 → epoch → always expired) - Add null guard for rawConfig in MCPManager.callTool before passing to preProcessGraphTokens — prevents unsafe `as` cast on undefined - Log double-failure in upsertConfigCache instead of silently swallowing - Replace module-scope Date.now monkey-patch with jest.useFakeTimers / jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests * fix: server-only readThrough fallback only returns truthy values Prevents a cached undefined from a prior no-userId lookup from short-circuiting the DB query on a subsequent userId-scoped lookup. * fix: remove findInConfigCache to eliminate cross-tenant config leakage The findInConfigCache prefix scan (serverName:) could return any tenant's config after readThrough TTL expires, violating tenant isolation. Config-source servers are now ONLY resolvable through: 1. The configServers param (callers with tenant context from ALS) 2. The readThrough cache (populated by ensureSingleConfigServer, 5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs) Connection/tool-call paths without tenant context rely exclusively on the readThrough cache. If it expires before the next HTTP request repopulates it, the server is not found — which is correct because there is no tenant context to determine which config to return. - Remove findInConfigCache method and its call in getServerConfig - Update server-only readThrough fallback to only return truthy values (prevents cached undefined from short-circuiting user-scoped DB lookup) - Update tests to document tenant isolation behavior after cache expiry style: fix import order per AGENTS.md conventions Sort package imports shortest-to-longest, local imports longest-to-shortest across MCPServersRegistry, ConnectionsRepository, MCPManager, UserConnectionManager, and MCPServerInspector. * fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures Thread pre-resolved serverConfig from tool creation context into callTool, removing dependency on the readThrough cache for config-source servers. This fixes two issues: - Cross-tenant contamination: the readThrough cache key was unscoped (just serverName), so concurrent multi-tenant requests for same-named servers would overwrite each other's entries - TTL expiry: tool calls happening >5s after config resolution would fail with "Configuration not found" because the readThrough entry had expired Changes: - Add optional serverConfig param to MCPManager.callTool — uses provided config directly, falling back to getServerConfig lookup for YAML/user servers - Thread serverConfig from createMCPTool through createToolInstance closure to callTool - Remove readThrough write from ensureSingleConfigServer — config-source servers are only accessible via configServers param (tenant-scoped) - Remove server-only readThrough fallback from getServerConfig - Increase config cache hash from 8 to 16 hex chars (64-bit) - Add isUserSourced boundary tests for all source/dbId combinations - Fix double Object.keys call in getMCPTools controller - Update test assertions for new getServerConfig behavior * fix: cache base configs for config-server users; narrow upsertConfigCache error handling - Refactor getAllServerConfigs to separate base config fetch (YAML + DB) from config-server layering. Base configs are cached via readThroughCacheAll regardless of whether configServers is provided, eliminating uncached MongoDB queries per request for config-server users - Narrow upsertConfigCache catch to duplicate-key errors only; infrastructure errors (Redis timeouts, network failures) now propagate instead of being silently swallowed, preventing inspection storms during outages * fix: restore correct merge order and document upsert error matching - Restore YAML → Config → User DB precedence in getAllServerConfigs (user DB servers have highest precedence, matching the JSDoc contract) - Add source comment on upsertConfigCache duplicate-key detection linking to the two cache implementations that define the error message * feat: complete config-source server support across all execution paths Wire configServers through the entire agent execution pipeline so config-source MCP servers are fully functional — not just visible in listings but executable in agent sessions. - Thread configServers into handleTools.js agent tool pipeline: resolve config servers from tenant context before MCP tool iteration, pass to getServerConfig, createMCPTools, and createMCPTool - Thread configServers into agent instructions pipeline: applyContextToAgent → getMCPInstructionsForServers → formatInstructionsForContext, resolved in client.js before agent context application - Add configServers param to createMCPTool and createMCPTools for reconnect path fallback - Add source field to redactServerSecrets allowlist for client UI differentiation of server tiers - Narrow invalidateConfigCache to only clear readThroughCacheAll (merged results), preserving YAML individual-server readThrough entries - Update context.spec.ts assertions for new configServers parameter * fix: add missing mocks for config-source server dependencies in client.test.js Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added to client.js but not reflected in the test file's jest.mock declarations. * fix: update formatInstructionsForContext assertions for configServers param The test assertions expected formatInstructionsForContext to be called with only the server names array, but it now receives configServers as a second argument after the config-source server feature wiring. * fix: move configServers resolution before MCP tool loop to avoid TDZ configServers was declared with `let` after the first tool loop but referenced inside it via getServerConfig(), causing a ReferenceError temporal dead zone. Move declaration and resolution before the loop, using tools.some(mcpToolPattern) to gate the async resolution. * fix: address review findings — cache bypass, discoverServerTools gap, DRY - #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via readThroughCacheAll) instead of bypassing it when configServers is present. Extracts user-DB entries from cached base by diffing against YAML keys to maintain YAML → Config → User DB merge order without extra MongoDB calls. - #3: Add configServers param to ToolDiscoveryOptions and thread it through discoverServerTools → getServerConfig so config-source servers are discoverable during OAuth reconnection flows. - #6: Replace inline import() type annotations in context.ts with proper import type { ParsedServerConfig } per AGENTS.md conventions. - #7: Extract resolveConfigServers(req) helper in MCP.js and use it from handleTools.js and client.js, eliminating the duplicated 6-line config resolution pattern. - #10: Restore removed "why" comment explaining getLoaded() vs getAll() choice in getMCPSetupData — documents non-obvious correctness constraint. - #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs. * fix: consolidate imports, reorder constants, fix YAML-DB merge edge case - Merge duplicate @librechat/data-schemas requires in MCP.js into one - Move resolveConfigServers after module-level constants - Fix getAllServerConfigs edge case where user-DB entry overriding a YAML entry with the same name was excluded from userDbConfigs; now uses reference equality check to detect DB-overwritten YAML keys * fix: replace fragile string-match error detection with proper upsert method Add upsert() to IServerConfigsRepositoryInterface and all implementations (InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle error message string match ('already exists in cache') in upsertConfigCache that was the only thing preventing cross-process init races from silently discarding inspection results. Each implementation handles add-or-update atomically: - InMemory: direct Map.set() - Redis: direct cache.set() - RedisAggregateKey: read-modify-write under write lock - DB: delegates to update() (DB servers use explicit add() with ACL setup) * fix: wire configServers through remaining HTTP endpoints - getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig - reinitialize route: resolve configServers before getServerConfig - auth-values route: resolve configServers before getServerConfig - getOAuthHeaders: accept configServers param, thread from callers - Update mcp.spec.js tests to mock getAllServerConfigs for GET by name * fix: thread serverConfig through getConnection for config-source servers Config-source servers exist only in configCacheRepo, not in YAML cache or DB. When callTool → getConnection → getUserConnection → getServerConfig runs without configServers, it returns undefined and throws. Fix by threading the pre-resolved serverConfig (providedConfig) from callTool through getConnection → getUserConnection → createUserConnectionInternal, using it as a fallback before the registry lookup. * fix: thread configServers through reinit, reconnect, and tool definition paths Wire configServers through every remaining call chain that creates or reconnects MCP server connections: - reinitMCPServer: accepts serverConfig and configServers, uses them for getServerConfig fallback, getConnection, and discoverServerTools - reconnectServer: accepts and passes configServers to reinitMCPServer - createMCPTools/createMCPTool: pass configServers to reconnectServer - ToolService.loadToolDefinitionsWrapper: resolves configServers from req, passes to both reinitMCPServer call sites - reinitialize route: passes serverConfig and configServers to reinitMCPServer * fix: address review findings — simplify merge, harden error paths, fix log labels - Simplify getAllServerConfigs merge: replace fragile reference-equality loop with direct spread { ...yamlConfigs, ...configServers, ...base } - Guard upsertConfigCache in lazyInitConfigServer catch block so cache failures don't mask the original inspection error - Deduplicate getYamlServerNames cold-start with promise dedup pattern - Remove dead `if (!mcpConfig)` guard in getMCPSetupData - Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error messages — now uses this.namespace for correct Config/App labeling - Remove misleading OAuth callback comment about readThrough cache - Move resolveConfigServers after module-level constants in MCP.js * fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label - Clear yamlServerNamesPromise on rejection so transient cache errors don't permanently prevent ensureConfigServers from working - Skip reinspectServer for config-source servers (source: 'config') in reinitMCPServer — they lack a CACHE/DB storage location; retry is handled by CONFIG_STUB_RETRY_MS in ensureConfigServers - Use source field instead of dbId for storageLocation derivation - Fix remaining hardcoded "App" in reset() leaderCheck message * fix: persist oauthHeaders in flow state for config-source OAuth servers The OAuth callback route has no JWT auth context and cannot resolve config-source server configs. Previously, getOAuthHeaders would silently return {} for config-source servers, dropping custom token exchange headers. Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow initiation (which has auth context), and the callback reads them from the stored flow state with a fallback to the registry lookup for YAML/user-DB servers. * fix: update tests for getMCPSetupData null guard removal and ToolService mock - MCP.spec.js: update test to expect graceful handling of null mcpConfig instead of a throw (getAllServerConfigs always returns an object) - MCP.js: add defensive \|\| {} for Object.entries(mcpConfig) in case of null from test mocks - ToolService.spec.js: add missing mock for ~/server/services/MCP (resolveConfigServers) * fix: address review findings — DRY, naming, logging, dead code, defensive guards - #1: Simplify getAllServerConfigs to single getBaseServerConfigs call, eliminating redundant double-fetch of cacheConfigsRepo.getAll() - #2: Add warning log when oauthHeaders absent from OAuth callback flow state - #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller imports shared helper instead of reimplementing - #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider in createToolInstance — these are actively used, not unused - #5: Log rejected results from ensureConfigServers Promise.allSettled so cache errors are visible instead of silently dropped - #6: Remove dead 'MCP config not found' error handlers from routes - #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache - #8: Remove logger.error from withTimeout to prevent double-logging timeouts - #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message - #12: Use spread instead of mutation in addServer for immutability consistency - Add upsert mock to ensureConfigServers.test.ts DB mock - Update route tests for resolveAllMcpConfigs import change * fix: restore correct merge priority, use immutable spread, fix test mock - getAllServerConfigs: { ...configServers, ...base } so userDB wins over configServers, matching documented "User DB (highest)" priority - lazyInitConfigServer: use immutable spread instead of direct mutation for parsedConfig.source, consistent with addServer fix - Fix test to mock getAllServerConfigs as {} instead of null, remove unnecessary \|\| {} defensive guard in getMCPSetupData * fix: error handling, stable hashing, flatten nesting, remove dead param - Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with graceful {} fallback so transient DB/cache errors don't crash tool pipeline - Sort keys in configCacheKey JSON.stringify for deterministic hashing regardless of object property insertion order - Flatten clearMcpConfigCache from 3 nested try-catch to early returns; document that user connections are cleaned up lazily (accepted tradeoff) - Remove dead configServers param from getAppToolFunctions (never passed) - Add security rationale comment for source field in redactServerSecrets * fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision The array replacer in JSON.stringify acts as a property allowlist at every nesting depth, silently dropping nested keys like headers['X-API-Key'], oauth.client_secret, etc. Two configs with different nested values but identical top-level structure produced the same hash, causing cross-tenant cache hits and potential credential contamination. Switch to a function replacer that recursively sorts keys at all depths without dropping any properties. Also document the known gap in getOAuthServers: config-source OAuth servers are not covered by auto-reconnection or uninstall cleanup because callers lack request context. * fix: move clearMcpConfigCache to packages/api to eliminate circular dependency The function only depends on MCPServersRegistry and MCPManager, both of which live in packages/api. Import it directly from @librechat/api in the CJS layer instead of using dynamic require('~/config'). * chore: imports/fields ordering * fix: address review findings — error handling, targeted lookup, test gaps - Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so getAppConfig/getAllServerConfigs failures propagate instead of masking infrastructure errors as empty server lists. - Use targeted getServerConfig in getMCPServerById instead of fetching all server configs for a single-server lookup. - Forward configServers to inner createMCPTool calls so reconnect path works for config-source servers. - Update getAllServerConfigs JSDoc to document disjoint-key design. - Add OAuth callback oauthHeaders fallback tests (flow state present vs registry fallback). - Add resolveConfigServers/resolveAllMcpConfigs unit tests covering happy path and error propagation. * fix: add getOAuthReconnectionManager mock to OAuth callback tests * chore: imports ordering	2026-03-28 10:36:43 -04:00
Danny Avila	77712c825f	🏢 feat: Tenant-Scoped App Config in Auth Login Flows (#12434 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * feat: add resolveAppConfigForUser utility for tenant-scoped auth config TypeScript utility in packages/api that wraps getAppConfig in tenantStorage.run() when the user has a tenantId, falling back to baseOnly for new users or non-tenant deployments. Uses DI pattern (getAppConfig passed as parameter) for testability. Auth flows apply role-level overrides only (userId not passed) because user/group principal resolution is deferred to post-auth. * feat: tenant-scoped app config in auth login flows All auth strategies (LDAP, SAML, OpenID, social login) now use a two-phase domain check consistent with requestPasswordReset: 1. Fast-fail with base config (memory-cached, zero DB queries) 2. DB user lookup 3. Tenant-scoped re-check via resolveAppConfigForUser (only when user has a tenantId; otherwise reuse base config) This preserves the original fast-fail protection against globally blocked domains while enabling tenant-specific config overrides. OpenID error ordering preserved: AUTH_FAILED checked before domain re-check so users with wrong providers get the correct error type. registerUser unchanged (baseOnly, no user identity yet). * test: add tenant-scoped config tests for auth strategies Add resolveAppConfig.spec.ts in packages/api with 8 tests: - baseOnly fallback for null/undefined/no-tenant users - tenant-scoped config with role and tenantId - ALS context propagation verified inside getAppConfig callback - undefined role with tenantId edge case Update strategy and AuthService tests to mock resolveAppConfigForUser via @librechat/api. Tests verify two-phase domain check behavior: fast-fail before DB, tenant re-check after. Non-tenant users reuse base config without calling resolveAppConfigForUser. * refactor: skip redundant domain re-check for non-tenant users Guard the second isEmailDomainAllowed call with appConfig !== baseConfig in SAML, OpenID, and social strategies. For non-tenant users the tenant config is the same base config object, so the second check is a no-op. Narrow eslint-disable in resolveAppConfig.spec.ts to the specific require line instead of blanket file-level suppression. * fix: address review findings — consistency, tests, and ordering - Consolidate duplicate require('@librechat/api') in AuthService.js - Add two-phase domain check to LDAP (base fast-fail before findUser), making all strategies consistent with PR description - Add appConfig !== baseConfig guard to requestPasswordReset second domain check, consistent with SAML/OpenID/social strategies - Move SAML provider check before tenant config resolution to avoid unnecessary resolveAppConfigForUser call for wrong-provider users - Add tenant domain rejection tests to SAML, OpenID, and social specs verifying that tenant config restrictions actually block login - Add error propagation tests to resolveAppConfig.spec.ts - Remove redundant mockTenantStorage alias in resolveAppConfig.spec.ts - Narrow eslint-disable to specific require line * test: add tenant domain rejection test for LDAP strategy Covers the appConfig !== baseConfig && !isEmailDomainAllowed path, consistent with SAML, OpenID, and social strategy specs. * refactor: rename resolveAppConfig to app/resolve per AGENTS.md Rename resolveAppConfig.ts → resolve.ts and resolveAppConfig.spec.ts → resolve.spec.ts to align with the project's concise naming convention. * fix: remove fragile reference-equality guard, add logging and docs Remove appConfig !== baseConfig guard from all strategies and requestPasswordReset. The guard relied on implicit cache-backend identity semantics (in-memory Keyv returns same object reference) that would silently break with Redis or cloned configs. The second isEmailDomainAllowed call is a cheap synchronous check — always running it is clearer and eliminates the coupling. Add audit logging to requestPasswordReset domain blocks (base and tenant), consistent with all auth strategies. Extract duplicated error construction into makeDomainDeniedError(). Wrap resolveAppConfigForUser in requestPasswordReset with try/catch to prevent DB errors from leaking to the client via the controller's generic catch handler. Document the dual tenantId propagation (ALS for DB isolation, explicit param for cache key) in resolveAppConfigForUser JSDoc. Add comment documenting the LDAP error-type ordering change (cross-provider users from blocked domains now get 'domain not allowed' instead of AUTH_FAILED). Assert resolveAppConfigForUser is not called on LDAP provider mismatch path. * fix: return generic response for tenant domain block in password reset Tenant-scoped domain rejection in requestPasswordReset now returns the same generic "If an account with that email exists..." response instead of an Error. This prevents user-enumeration: an attacker cannot distinguish between "email not found" and "tenant blocks this domain" by comparing HTTP responses. The base-config fast-fail (pre-user-lookup) still returns an Error since it fires before any user existence is revealed. * docs: document phase 1 vs phase 2 domain check behavior in JSDoc Phase 1 (base config, pre-findUser) intentionally returns Error/400 to reveal globally blocked domains without confirming user existence. Phase 2 (tenant config, post-findUser) returns generic 200 to prevent user-enumeration. This distinction is now explicit in the JSDoc.	2026-03-27 16:08:43 -04:00
Dustin Healy	5972a21479	🪪 feat: Admin Roles API Endpoints (#12400 ) * feat: add createRole and deleteRole methods to role * feat: add admin roles handler factory and Express routes * fix: address convention violations in admin roles handlers * fix: rename createRole/deleteRole to avoid AccessRole name collision The existing accessRole.ts already exports createRole/deleteRole for the AccessRole model. In createMethods index.ts, these are spread after roleMethods, overwriting them. Renamed our Role methods to createRoleByName/deleteRoleByName to match the existing pattern (getRoleByName, updateRoleByName) and avoid the collision. * feat: add description field to Role model - Add description to IRole, CreateRoleRequest, UpdateRoleRequest types - Add description field to Mongoose roleSchema (default: '') - Wire description through createRoleHandler and updateRoleHandler - Include description in listRoles select clause so it appears in list * fix: address Copilot review findings in admin roles handlers * test: add unit tests for admin roles and groups handlers * test: add data-layer tests for createRoleByName, deleteRoleByName, listUsersByRole * fix: allow system role updates when name is unchanged The updateRoleHandler guard rejected any request where body.name matched a system role, even when the name was not being changed. This blocked editing a system role's description. Compare against the URL param to only reject actual renames to reserved names. * fix: address external review findings for admin roles - Block renaming system roles (ADMIN/USER) and add user migration on rename - Add input validation: name max-length, trim on update, duplicate name check - Replace fragile String.includes error matching with prefix-based classification - Catch MongoDB 11000 duplicate key in createRoleByName - Add pagination (limit/offset/total) to getRoleMembersHandler - Reverse delete order in deleteRoleByName — reassign users before deletion - Add role existence check in removeRoleMember; drop unused createdAt select - Add Array.isArray guard for permissions input; use consistent ?? coalescing - Fix import ordering per AGENTS.md conventions - Type-cast mongoose.models.User as Model<IUser> for proper TS inference - Add comprehensive tests: rename guards, pagination, validation, 500 paths * fix: address re-review findings for admin roles - Gate deleteRoleByName on existence check — skip user reassignment and cache invalidation when role doesn't exist (fixes test mismatch) - Reverse rename order: migrate users before renaming role so a migration failure leaves the system in a consistent state - Add .sort({ _id: 1 }) to listUsersByRole for deterministic pagination - Import shared AdminMember type from data-schemas instead of local copy; make joinedAt optional since neither groups nor roles populate it - Change IRole.description from optional to required to match schema default - Add data-layer tests for updateUsersByRole and countUsersByRole - Add handler test verifying users-first rename ordering and migration failure safety * fix: add rollback on rename failure and update PR description - Roll back user migration if updateRoleByName returns null during a rename (race: role deleted between existence check and update) - Add test verifying rollback calls updateUsersByRole in reverse - Update PR #12400 description to reflect current test counts (56 handler tests, 40 data-layer tests) and safety features * fix: rollback on rename throw, description validation, delete/DRY cleanup - Hoist isRename/trimmedName above try block so catch can roll back user migration when updateRoleByName throws (not just returns null) - Add description type + max-length (2000) validation in create and update, consistent with groups handler - Remove redundant getRoleByName existence check in deleteRoleHandler — use deleteRoleByName return value directly - Skip no-op name write when body.name equals current name (use isRename) - Extract getUserModel() accessor to DRY repeated Model<IUser> casts - Use name.trim() consistently in createRoleByName error messages - Add tests: rename-throw rollback, description validation (create+update), update delete test mocks to match simplified handler * fix: guard spurious rollback, harden createRole error path, validate before DB calls - Add migrationRan flag to prevent rollback of user migration that never ran - Return generic message on 500 in createRoleHandler, specific only for 409 - Move description validation before DB queries in updateRoleHandler - Return existing role early when update body has no changes - Wrap cache.set in createRoleByName with try/catch to prevent masking DB success - Add JSDoc on 11000 catch explaining compound unique index - Add tests: spurious rollback guard, empty update body, description validation ordering, listUsersByRole pagination * fix: validate permissions in create, RoleConflictError, rollback safety, cache consistency - Add permissions type/array validation in createRoleHandler - Introduce RoleConflictError class replacing fragile string-prefix matching - Wrap rollback in !role null path with try/catch for correct 404 response - Wrap deleteRoleByName cache.set in try/catch matching createRoleByName - Narrow updateRoleHandler body type to { name?, description? } - Add tests: non-string description in create, rollback failure logging, permissions array rejection, description max-length assertion fix * feat: prevent removing the last admin user Add guard in removeRoleMember that checks countUsersByRole before demoting an ADMIN user, returning 400 if they are the last one. * fix: move interleaved export below imports, add await to countUsersByRole * fix: paginate listRoles, null-guard permissions handler, fix export ordering - Add limit/offset/total pagination to listRoles matching the groups pattern - Add countRoles data-layer method - Omit permissions from listRoles select (getRole returns full document) - Null-guard re-fetched role in updateRolePermissionsHandler - Move interleaved export below all imports in methods/index.ts * fix: address review findings — race safety, validation DRY, type accuracy, test coverage - Add post-write admin count verification in removeRoleMember to prevent zero-admin race condition (TOCTOU → rollback if count hits 0) - Make IRole.description optional; backfill in initializeRoles for pre-existing roles that lack the field (.lean() bypasses defaults) - Extract parsePagination, validateNameParam, validateRoleName, and validateDescription helpers to eliminate duplicated validation - Add validateNameParam guard to all 7 handlers reading req.params.name - Catch 11000 in updateRoleByName and surface as 409 via RoleConflictError - Add idempotent skip in addRoleMember when user already has target role - Verify updateRolePermissions test asserts response body - Add data-layer tests: listRoles sort/pagination/projection, countRoles, and createRoleByName 11000 duplicate key race * fix: defensive rollback in removeRoleMember, type/style cleanup, test coverage - Wrap removeRoleMember post-write admin rollback in try/catch so a transient DB failure cannot leave the system with zero administrators - Replace double `as unknown[] as IRole[]` cast with `.lean<IRole[]>()` - Type parsePagination param explicitly; extract DEFAULT/MAX page constants - Preserve original error cause in updateRoleByName re-throw - Add test for rollback failure path in removeRoleMember (returns 400) - Add test for pre-existing roles missing description field (.lean()) * chore: bump @librechat/data-schemas to 0.0.47 * fix: stale cache on rename, extract renameRole helper, shared pagination, cleanup - Fix updateRoleByName cache bug: invalidate old key and populate new key when updates.name differs from roleName (prevents stale cache after rename) - Extract renameRole helper to eliminate mutable outer-scope state flags (isRename, trimmedName, migrationRan) in updateRoleHandler - Unify system-role protection to 403 for both rename-from and rename-to - Extract parsePagination to shared admin/pagination.ts; use in both roles.ts and groups.ts - Extract name.trim() to local const in createRoleByName (was called 5×) - Remove redundant findOne pre-check in deleteRoleByName - Replace getUserModel closure with local const declarations - Remove redundant description ?? '' in createRoleHandler (schema default) - Add doc comment on updateRolePermissionsHandler noting cache dependency - Add data-layer tests for cache rename behavior (old key null, new key set) * fix: harden role guards, add User.role index, validate names, improve tests - Add index on User.role field for efficient member queries at scale - Replace fragile SystemRoles key lookup with value-based Set check (6 sites) - Elevate rename rollback failure logging to CRITICAL (matches removeRoleMember) - Guard removeRoleMember against non-ADMIN system roles (403 for USER) - Fix parsePagination limit=0 gotcha: use parseInt + NaN check instead of \|\| - Add control character and reserved path segment validation to role names - Simplify validateRoleName: remove redundant casts and dead conditions - Add JSDoc to deleteRoleByName documenting non-atomic window - Split mixed value+type import in methods/index.ts per AGENTS.md - Add 9 new tests: permissions assertion, combined rename+desc, createRole with permissions, pagination edge cases, control char/reserved name rejection, system role removeRoleMember guard * fix: exact-case reserved name check, consistent validation, cleaner createRole - Remove .toLowerCase() from reserved name check so only exact matches (members, permissions) are rejected, not legitimate names like "Members" - Extract trimmed const in validateRoleName for consistent validation - Add control char check to validateNameParam for parity with body validation - Build createRole roleData conditionally to avoid passing description: undefined - Expand deleteRoleByName JSDoc documenting self-healing design and no-op trade-off * fix: scope rename rollback to only migrated users, prevent cross-role corruption Capture user IDs before forward migration so the rollback path only reverts users this request actually moved. Previously the rollback called updateUsersByRole(newName, currentName) which would sweep all users with the new role — including any independently assigned by a concurrent admin request — causing silent cross-role data corruption. Adds findUserIdsByRole and updateUsersRoleByIds to the data layer. Extracts rollbackMigratedUsers helper to deduplicate rollback sites. * fix: guard last admin in addRoleMember to prevent zero-admin lockout Since each user has exactly one role, addRoleMember implicitly removes the user from their current role. Without a guard, reassigning the sole admin to a non-admin role leaves zero admins and locks out admin management. Adds the same countUsersByRole check used in removeRoleMember. * fix: wire findUserIdsByRole and updateUsersRoleByIds into roles route The scoped rollback deps added in `c89b5db` were missing from the route DI wiring, causing renameRole to call undefined and return a 500. * fix: post-write admin guard in addRoleMember, compound role index, review cleanup - Add post-write admin count check + rollback to addRoleMember to match removeRoleMember's two-phase TOCTOU protection (prevents zero-admin via concurrent requests) - Replace single-field User.role index with compound { role: 1, tenantId: 1 } to align with existing multi-tenant index pattern (email, OAuth IDs) - Narrow listRoles dep return type to RoleListItem (projected fields only) - Refactor validateDescription to early-return style per AGENTS.md - Remove redundant double .lean() in updateRoleByName - Document rename snapshot race window in renameRole JSDoc - Document cache null-set behavior in deleteRoleByName - Add routing-coupling comment on RESERVED_ROLE_NAMES - Add test for addRoleMember post-write rollback * fix: review cleanup — system-role guard, type safety, JSDoc accuracy, tests - Add system-role guard to addRoleMember: block direct assignment to non-ADMIN system roles (403), symmetric with removeRoleMember - Fix RESERVED_ROLE_NAMES comment: explain semantic URL ambiguity, not a routing conflict (Express resolves single vs multi-segment correctly) - Replace _id: unknown with Types.ObjectId \| string per AGENTS.md - Narrow listRoles data-layer return type to Pick<IRole, 'name' \| 'description'> to match the actual .select() projection - Move updateRoleHandler param check inside try/catch for consistency - Include user IDs in all CRITICAL rollback failure logs for operator recovery - Clarify deleteRoleByName JSDoc: replace "self-healing" with "idempotent", document that recovery requires caller retry - Add tests: system-role guard, promote non-admin to ADMIN, findUserIdsByRole throw prevents migration * fix: include _id in listRoles return type to match RoleListItem Pick<IRole, 'name' \| 'description'> omits _id, making it incompatible with the handler dep's RoleListItem which requires _id. * fix: case-insensitive system role guard, reject null permissions, check updateUser result - System role name checks now use case-insensitive comparison via toUpperCase() — prevents creating 'admin' or 'user' which would collide with the legacy roles route that uppercases params - Reject permissions: null in createRole (typeof null === 'object' was bypassing the validation) - Check updateUser return in addRoleMember — return 404 if the user was deleted between the findUser and updateUser calls * fix: check updateUser return in removeRoleMember for concurrent delete safety --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-27 15:44:47 -04:00
Dustin Healy	2e3d66cfe2	👥 feat: Admin Groups API Endpoints (#12387 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * feat: add listGroups and deleteGroup methods to userGroup * feat: add admin groups handler factory and Express routes * fix: address convention violations in admin groups handlers * fix: address Copilot review findings in admin groups handlers - Escape regex in listGroups to prevent injection/ReDoS - Validate ObjectId format in all handlers accepting id/userId params - Replace N+1 findUser loop with batched findUsers query - Remove unused findGroupsByMemberId from dep interface - Map Mongoose ValidationError to 400 in create/update handlers - Validate name in updateGroupHandler (reject empty/whitespace) - Handle null updateGroupById result (race condition) - Tighten error message matching in add/remove member handlers * test: add unit tests for admin groups handlers * fix: address code review findings for admin groups Atomic delete/update handlers (single DB trip), pass through idOnTheSource, add removeMemberById for non-ObjectId members, deduplicate member results, fix error message exposure, add hard cap/sort to listGroups, replace GroupListFilter with Pick of GroupFilterOptions, validate memberIds as array, trim name in update, fix import order, and improve test hygiene with fresh IDs per test. * fix: cascade cleanup, pagination, and test coverage for admin groups Add deleteGrantsForPrincipal to systemGrant data layer and wire cascade cleanup (Config, AclEntry, SystemGrant) into deleteGroupHandler. Add limit/offset pagination to getGroupMembers. Guard empty PATCH bodies with 400. Remove dead type guard and unnecessary type cast. Add 11 new tests covering cascade delete, idempotent member removal, empty update, search filter, 500 error paths, and pagination. * fix: harden admin groups with cascade resilience, type safety, and fallback removal Wrap cascade cleanup in inner try/catch so partial failure logs but still returns 200 (group is already deleted). Replace Record<string, unknown> on deleteAclEntries with proper typed filter. Log warning for unmapped user ObjectIds in createGroup memberIds. Add removeMemberById fallback when removeUserFromGroup throws User not found for ObjectId-format userId. Extract VALID_GROUP_SOURCES constant. Add 3 new tests (60 total). * refactor: add countGroups, pagination, and projection type to data layer Extract buildGroupQuery helper, add countGroups method, support limit/offset/skip in listGroups, standardize session handling to .session(session ?? null), and tighten projection parameter from Record<string, unknown> to Record<string, 0 \| 1>. * fix: cascade resilience, pagination, validation, and error clarity for admin groups - Use Promise.allSettled for cascade cleanup so all steps run even if one fails; log individual rejections - Echo deleted group id in delete response - Add countGroups dep and wire limit/offset pagination for listGroups - Deduplicate memberIds before computing total in getGroupMembers - Use { memberIds: 1 } projection in getGroupMembers - Cap memberIds at 500 entries in createGroup - Reject search queries exceeding 200 characters - Clarify addGroupMember error for non-ObjectId userId - Document deleted-user fallback limitation in removeGroupMember * test: extend handler and DB-layer test coverage for admin groups Handler tests: projection assertion, dedup total, memberIds cap, search max length, non-ObjectId memberIds passthrough, cascade partial failure resilience, dedup scenarios, echo id in delete response. DB-layer tests: listGroups sort/filter/pagination, countGroups, deleteGroup, removeMemberById, deleteGrantsForPrincipal. * fix: cast group principalId to ObjectId for ACL entry cleanup deleteAclEntries is a thin deleteMany wrapper with no type casting, but grantPermission stores group principalId as ObjectId. Passing the raw string from req.params would leave orphaned ACL entries on group deletion. * refactor: remove redundant pagination clamping from DB listGroups Handler already clamps limit/offset at the API boundary. The DB method is a general-purpose building block and should not re-validate. * fix: add source and name validation, import order, and test coverage for admin groups - Validate source against VALID_GROUP_SOURCES in createGroupHandler - Cap name at 500 characters in both create and update handlers - Document total as upper bound in getGroupMembers response - Document ObjectId requirement for deleteAclEntries in cascade - Fix import ordering in test file (local value after type imports) - Add tests for updateGroup with description, email, avatar fields - Add tests for invalid source and name max-length in both handlers * fix: add field length caps, flatten nested try/catch, and fix logger level in admin groups Add max-length validation for description, email, avatar, and idOnTheSource in create/update handlers. Extract removeObjectIdMember helper to flatten nested try/catch per never-nesting convention. Downgrade unmapped-memberIds log from error to warn. Fix type import ordering and add missing await in removeMemberById for consistency.	2026-03-26 17:36:18 -04:00
Danny Avila	9f6d8c6e93	🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407 ) * feat: add tenant context middleware for ALS-based isolation Introduces tenantContextMiddleware that propagates req.user.tenantId into AsyncLocalStorage, activating the Mongoose applyTenantIsolation plugin for all downstream DB queries within a request. - Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId - Non-strict mode passes through for backward compatibility - No-op for unauthenticated requests - Includes 6 unit tests covering all paths * feat: register tenant middleware and wrap startup/auth in runAsSystem() - Register tenantContextMiddleware in Express app after capability middleware - Wrap server startup initialization in runAsSystem() for strict mode compat - Wrap auth strategy getAppConfig() calls in runAsSystem() since they run before user context is established (LDAP, SAML, OpenID, social login, AuthService) * feat: thread tenantId through all getAppConfig callers Pass tenantId from req.user to getAppConfig() across all callers that have request context, ensuring correct per-tenant cache key resolution. Also fixes getBaseConfig admin endpoint to scope to requesting admin's tenant instead of returning the unscoped base config. Files updated: - Controllers: UserController, PluginController - Middleware: checkDomainAllowed, balance - Routes: config - Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP - Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech - Admin: getBaseConfig endpoint * feat: add config cache invalidation on admin mutations - Add clearOverrideCache(tenantId?) to flush per-principal override caches by enumerating Keyv store keys matching _OVERRIDE_: prefix - Add invalidateConfigCaches() helper that clears base config, override caches, tool caches, and endpoint config cache in one call - Wire invalidation into all 5 admin config mutation handlers (upsert, patch, delete field, delete overrides, toggle active) - Add strict mode warning when __default__ tenant fallback is used - Add 3 new tests for clearOverrideCache (all/scoped/base-preserving) * chore: update getUserPrincipals comment to reflect ALS-based tenant filtering The TODO(#12091) about missing tenantId filtering is resolved by the tenant context middleware + applyTenantIsolation Mongoose plugin. Group queries are now automatically scoped by tenantId via ALS. * fix: replace runAsSystem with baseOnly for pre-tenant code paths App configs are tenant-owned — runAsSystem() would bypass tenant isolation and return cross-tenant DB overrides. Instead, add baseOnly option to getAppConfig() that returns YAML-derived config only, with zero DB queries. All startup code, auth strategies, and MCP initialization now use getAppConfig({ baseOnly: true }) to get the YAML config without touching the Config collection. * fix: address PR review findings — middleware ordering, types, cache safety - Chain tenantContextMiddleware inside requireJwtAuth after passport auth instead of global app.use() where req.user is always undefined (Finding 1) - Remove global tenantContextMiddleware registration from index.js - Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4) - Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3) - Use startsWith instead of includes for cache key filtering (Finding 12) - Use generator loop instead of Array.from for key enumeration (Finding 3) - Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5) - Move isMainThread check to module level, remove per-request check (Finding 9) - Move mid-file require to top of app.js (Finding 8) - Parallelize invalidateConfigCaches with Promise.all (Finding 10) - Remove clearOverrideCache from public app.js exports (internal only) - Strengthen getUserPrincipals comment re: ALS dependency (Finding 2) * fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly - Restore runAsSystem() around performStartupChecks, updateInterfacePermissions, initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose queries that need system context in strict tenant mode (NEW-3) - Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1) - Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2) * test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests - requireJwtAuth: 5 tests verifying ALS tenant context is set after passport auth, isolated between concurrent requests, and not set when user has no tenantId (Finding 6) - invalidateConfigCaches: 4 tests verifying all four caches are cleared, tenantId is threaded through, partial failure is handled gracefully, and operations run in parallel via Promise.all (Finding 11) * fix: address Copilot review — passport errors, namespaced cache keys, /base scoping - Forward passport errors in requireJwtAuth before entering tenant middleware — prevents silent auth failures from reaching handlers (P1) - Account for Keyv namespace prefix in clearOverrideCache — stored keys are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...", so override caches were never actually matched/cleared (P2) - Remove role from getBaseConfig — /base should return tenant-scoped base config, not role-merged config that drifts per admin role (P2) - Return tenantStorage.run() for cleaner async semantics - Update mock cache in service.spec.ts to simulate Keyv namespacing * fix: address second review — cache safety, code quality, test reliability - Decouple cache invalidation from mutation response: fire-and-forget with logging so DB mutation success is not masked by cache failures - Extract clearEndpointConfigCache helper from inline IIFE - Move isMainThread check to lazy once-per-process guard (no import side effect) - Memoize process.env read in overrideCacheKey to avoid per-request env lookups and log flooding in strict mode - Remove flaky timer-based parallelism assertion, use structural check - Merge orphaned double JSDoc block on getUserPrincipals - Fix stale [getAppConfig] log prefix → [ensureBaseConfig] - Fix import order in tenant.spec.ts (package types before local values) - Replace "Finding 1" reference with self-contained description - Use real tenantStorage primitives in requireJwtAuth spec mock * fix: move JSDoc to correct function after clearEndpointConfigCache extraction * refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry Redis SCAN causes 60s+ stalls under concurrent load (see #12410). APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the in-memory store.keys() path handles the standard case. When APP_CONFIG is Redis-backed, overrides expire naturally via overrideCacheTtl (60s default) — an acceptable window for admin config mutations. * fix: remove return from tenantStorage.run to satisfy void middleware signature * fix: address second review — cache safety, code quality, test reliability - Switch invalidateConfigCaches from Promise.all to Promise.allSettled so partial failures are logged individually instead of producing one undifferentiated error (Finding 3) - Gate overrideCacheKey strict-mode warning behind a once-per-process flag to prevent log flooding under load (Finding 4) - Add test for passport error forwarding in requireJwtAuth — the if (err) { return next(err) } branch now has coverage (Finding 5) - Add test for real partial failure in invalidateConfigCaches where clearAppConfigCache rejects (not just the swallowed endpoint error) * chore: reorder imports in index.js and app.js for consistency - Moved logger and runAsSystem imports to maintain a consistent import order across files. - Improved code readability by ensuring related imports are grouped together.	2026-03-26 17:35:00 -04:00
Danny Avila	8e2721011e	🔑 fix: Robust MCP OAuth Detection in Tool-Call Flow (#12418 ) * fix(api): add buildOAuthToolCallName utility for MCP OAuth flows Extract a shared utility that builds the synthetic tool-call name used during MCP OAuth flows (oauth_mcp_{normalizedServerName}). Uses startsWith on the raw serverName (not the normalized form) to guard against double-wrapping, so names that merely normalize to start with oauth_mcp_ (e.g., oauth@mcp@server) are correctly prefixed while genuinely pre-wrapped names are left as-is. Add 8 unit tests covering normal names, pre-wrapped names, _mcp_ substrings, special characters, non-ASCII, and empty string inputs. * fix(backend): use buildOAuthToolCallName in MCP OAuth flows Replace inline tool-call name construction in both reconnectServer (MCP.js) and createOAuthEmitter (ToolService.js) with the shared buildOAuthToolCallName utility. Remove unused normalizeServerName import from ToolService.js. Fix import ordering in both files. This ensures the oauth_mcp_ prefix is consistently applied so the client correctly identifies MCP OAuth flows and binds the CSRF cookie to the right server. * fix(client): robust MCP OAuth detection and split handling in ToolCall - Fix split() destructuring to preserve tail segments for server names containing _mcp_ (e.g., foo_mcp_bar no longer truncated to foo). - Add auth URL redirect_uri fallback: when the tool-call name lacks the _mcp_ delimiter, parse redirect_uri for the MCP callback path. Set function_name to the extracted server name so progress text shows the server, not the raw tool-call ID. - Display server name instead of literal "oauth" as function_name, gated on auth presence to avoid misidentifying real tools named "oauth". - Consolidate three independent new URL(auth) parses into a single parsedAuthUrl useMemo shared across detection, actionId, and authDomain hooks. - Replace any type on ProgressText test mock with structural type. - Add 8 tests covering delimiter detection, multi-segment names, function_name display, redirect_uri fallback, normalized _mcp_ server names, and non-MCP action auth exclusion. * chore: fix import order in utils.test.ts * fix(client): drop auth gate on OAuth displayName so completed flows show server name The createOAuthEnd handler re-emits the toolCall delta without auth, so auth is cleared on the client after OAuth completes. Gating displayName on `func === 'oauth' && auth` caused completed OAuth steps to render "Completed oauth" instead of "Completed my-server". Remove the `&& auth` gate — within the MCP delimiter branch the func="oauth" check alone is sufficient. Also remove `auth` from the useMemo dep array since only `parsedAuthUrl` is referenced. Update the test to assert correct post-completion display.	2026-03-26 14:45:13 -04:00
Danny Avila	4b6d68b3b5	🎛️ feat: DB-Backed Per-Principal Config System (#12354 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * ✨ feat: Add Config schema, model, and methods for role-based DB config overrides Add the database foundation for principal-based configuration overrides (user, group, role) in data-schemas. Includes schema with tenantId and tenant isolation, CRUD methods, and barrel exports. * 🔧 fix: Add shebang and enforce LF line endings for git hooks The pre-commit hook was missing #!/bin/sh, and core.autocrlf=true was converting it to CRLF, both causing "Exec format error" on Windows. Add .gitattributes to force LF for .husky/* and .sh files. ✨ feat: Add admin config API routes with section-level capability checks Add /api/admin/config endpoints for managing per-principal config overrides (user, group, role). Handlers in @librechat/api use DI pattern with section-level hasConfigCapability checks for granular access control. Supports full overrides replacement, per-field PATCH via dot-paths, field deletion, toggle active, and listing. * 🐛 fix: Move deleteConfigField fieldPath from URL param to request body The path-to-regexp wildcard syntax (:fieldPath()) is not supported by the version used in Express. Send fieldPath in the DELETE request body instead, which also avoids URL-encoding issues with dotted paths. ✨ feat: Wire config resolution into getAppConfig with override caching Add mergeConfigOverrides utility in data-schemas for deep-merging DB config overrides into base AppConfig by priority order. Update getAppConfig to query DB for applicable configs when role/userId is provided, with short-TTL caching and a hasAnyConfigs feature flag for zero-cost when no DB configs exist. Also: add unique compound index on Config schema, pass userId from config middleware, and signal config changes from admin API handlers. * 🔄 refactor: Extract getAppConfig logic into packages/api as TS service Move override resolution, caching strategy, and signalConfigChange from api/server/services/Config/app.js into packages/api/src/app/appConfigService.ts using the DI factory pattern (createAppConfigService). The JS file becomes a thin wiring layer injecting loadBaseConfig, cache, and DB dependencies. * 🧹 chore: Rename configResolution.ts to resolution.ts * ✨ feat: Move admin types & capabilities to librechat-data-provider Move SystemCapabilities, CapabilityImplications, and utility functions (hasImpliedCapability, expandImplications) from data-schemas to data-provider so they are available to external consumers like the admin panel without a data-schemas dependency. Add API-friendly admin types: TAdminConfig, TAdminSystemGrant, TAdminAuditLogEntry, TAdminGroup, TAdminMember, TAdminUserSearchResult, TCapabilityCategory, and CAPABILITY_CATEGORIES. data-schemas re-exports these from data-provider and extends with config-schema-derived types (ConfigSection, SystemCapability union). Bump version to 0.8.500. * feat: Add JSON-serializable admin config API response types to data-schemas Add AdminConfig, AdminConfigListResponse, AdminConfigResponse, and AdminConfigDeleteResponse types so both LibreChat API handlers and the admin panel can share the same response contract. Bump version to 0.0.41. * refactor: Move admin capabilities & types from data-provider to data-schemas SystemCapabilities, CapabilityImplications, utility functions, CAPABILITY_CATEGORIES, and admin API response types should not be in data-provider as it gets compiled into the frontend bundle, exposing the capability surface. Moved everything to data-schemas (server-only). All consumers already import from @librechat/data-schemas, so no import changes needed elsewhere. Consolidated duplicate AdminConfig type (was in both config.ts and admin.ts). * chore: Bump @librechat/data-schemas to 0.0.42 * refactor: Reorganize admin capabilities into admin/ and types/admin.ts Split systemCapabilities.ts following data-schemas conventions: - Types (BaseSystemCapability, SystemCapability, AdminConfig, etc.) → src/types/admin.ts - Runtime code (SystemCapabilities, CapabilityImplications, utilities) → src/admin/capabilities.ts Revert data-provider version to 0.8.401 (no longer modified). * chore: Fix import ordering, rename appConfigService to service - Rename app/appConfigService.ts → app/service.ts (directory provides context) - Fix import order in admin/config.ts, types/admin.ts, types/config.ts - Add naming convention to AGENTS.md * feat: Add DB base config support (role/__base__) - Add BASE_CONFIG_PRINCIPAL_ID constant for reserved base config doc - getApplicableConfigs always includes __base__ in queries - getAppConfig queries DB even without role/userId when DB configs exist - Bump @librechat/data-schemas to 0.0.43 * fix: Address PR review issues for admin config - Add listAllConfigs method; listConfigs endpoint returns all active configs instead of only __base__ - Normalize principalId to string in all config methods to prevent ObjectId vs string mismatch on user/group lookups - Block __proto__ and all dunder-prefixed segments in field path validation to prevent prototype pollution - Fix configVersion off-by-one: default to 0, guard pre('save') with !isNew, use $inc on findOneAndUpdate - Remove unused getApplicableConfigs from admin handler deps * fix: Enable tree-shaking for data-schemas, bump packages - Switch data-schemas Rollup output to preserveModules so each source file becomes its own chunk; consumers (admin panel) can now import just the modules they need without pulling in winston/mongoose/etc. - Add sideEffects: false to data-schemas package.json - Bump data-schemas to 0.0.44, data-provider to 0.8.402 * feat: add capabilities subpath export to data-schemas Adds `@librechat/data-schemas/capabilities` subpath export so browser consumers can import BASE_CONFIG_PRINCIPAL_ID and capability constants without pulling in Node.js-only modules (winston, async_hooks, etc.). Bump version to 0.0.45. * fix: include dist/ in data-provider npm package Add explicit files field so npm includes dist/types/ in the published package. Without this, the root .gitignore exclusion of dist/ causes npm to omit type declarations, breaking TypeScript consumers. * chore: bump librechat-data-provider to 0.8.403 * feat: add GET /api/admin/config/base for raw AppConfig Returns the full AppConfig (YAML + DB base merged) so the admin panel can display actual config field values and structure. The startup config endpoint (/api/config) returns TStartupConfig which is a different shape meant for the frontend app. * chore: imports order * fix: address code review findings for admin config Critical: - Fix clearAppConfigCache: was deleting from wrong cache store (CONFIG_STORE instead of APP_CONFIG), now clears BASE and HAS_DB_CONFIGS keys - Eliminate race condition: patchConfigField and deleteConfigField now use atomic MongoDB $set/$unset with dot-path notation instead of read-modify-write cycles, removing the lost-update bug entirely - Add patchConfigFields and unsetConfigField atomic DB methods Major: - Reorder cache check before principal resolution in getAppConfig so getUserPrincipals DB query only fires on cache miss - Replace '' as ConfigSection with typed BROAD_CONFIG_ACCESS constant - Parallelize capability checks with Promise.all instead of sequential awaits in for loops - Use loose equality (== null) for cache miss check to handle both null and undefined returns from cache implementations - Set HAS_DB_CONFIGS_KEY to true on successful config fetch Minor: - Remove dead pre('save') hook from config schema (all writes use findOneAndUpdate which bypasses document hooks) - Consolidate duplicate type imports in resolution.ts - Remove dead deepGet/deepSet/deepUnset functions (replaced by atomic ops) - Add .sort({ priority: 1 }) to getApplicableConfigs query - Rename _impliedBy to impliedByMap * fix: self-referencing BROAD_CONFIG_ACCESS constant * fix: replace type-cast sentinel with proper null parameter Update hasConfigCapability to accept ConfigSection \| null where null means broad access check (MANAGE_CONFIGS or READ_CONFIGS only). Removes the '' as ConfigSection type lie from admin config handlers. * fix: remaining review findings + add tests - listAllConfigs accepts optional { isActive } filter so admin listing can show inactive configs (#9) - Standardize session application to .session(session ?? null) across all config DB methods (#15) - Export isValidFieldPath and getTopLevelSection for testability - Add 38 tests across 3 spec files: - config.spec.ts (api): path validation, prototype pollution rejection - resolution.spec.ts: deep merge, priority ordering, array replacement - config.spec.ts (data-schemas): full CRUD, ObjectId normalization, atomic $set/$unset, configVersion increment, toggle, __base__ query * fix: address second code review findings - Fix cross-user cache contamination: overrideCacheKey now handles userId-without-role case with its own cache key (#1) - Add broad capability check before DB lookup in getConfig to prevent config existence enumeration (#2/#3) - Move deleteConfigField fieldPath from request body to query parameter for proxy/load balancer compatibility (#5) - Derive BaseSystemCapability from SystemCapabilities const instead of manual string union (#6) - Return 201 on upsert creation, 200 on update (#11) - Remove inline narration comments per AGENTS.md (#12) - Type overrides as Partial<TCustomConfig> in DB methods and handler deps (#13) - Replace double as-unknown-as casts in resolution.ts with generic deepMerge<T> (#14) - Make override cache TTL injectable via AppConfigServiceDeps (#16) - Add exhaustive never check in principalModel switch (#17) * fix: remaining review findings — tests, rename, semantics - Rename signalConfigChange → markConfigsDirty with JSDoc documenting the stale-window tradeoff and overrideCacheTtl knob - Fix DEFAULT_OVERRIDE_CACHE_TTL naming convention - Add createAppConfigService tests (14 cases): cache behavior, feature flag, cross-user key isolation, fallback on error, markConfigsDirty - Add admin handler integration tests (13 cases): auth ordering, 201/200 on create/update, fieldPath from query param, markConfigsDirty calls, capability checks * fix: global flag corruption + empty overrides auth bypass - Remove HAS_DB_CONFIGS_KEY=false optimization: a scoped query returning no configs does not mean no configs exist globally. Setting the flag false from a per-principal query short-circuited all subsequent users. - Add broad manage capability check before section checks in upsertConfigOverrides: empty overrides {} no longer bypasses auth. * test: add regression and invariant tests for config system Regression tests: - Bug 1: User A's empty result does not short-circuit User B's overrides - Bug 2: Empty overrides {} returns 403 without MANAGE_CONFIGS Invariant tests (applied across ALL handlers): - All 5 mutation handlers call markConfigsDirty on success - All 5 mutation handlers return 401 without auth - All 5 mutation handlers return 403 without capability - All 3 read handlers return 403 without capability * fix: third review pass — all findings addressed Service (service.ts): - Restore HAS_DB_CONFIGS=false for base-only queries (no role/userId) so deployments with zero DB configs skip DB queries (#1) - Resolve cache once at factory init instead of per-invocation (#8) - Use BASE_CONFIG_PRINCIPAL_ID constant in overrideCacheKey (#10) - Add JSDoc to clearAppConfigCache documenting stale-window (#4) - Fix log message to not say "from YAML" (#14) Admin handlers (config.ts): - Use configVersion===1 for 201 vs 200, eliminating TOCTOU race (#2) - Add Array.isArray guard on overrides body (#5) - Import CapabilityUser from capabilities.ts, remove duplicate (#6) - Replace as-unknown-as cast with targeted type assertion (#7) - Add MAX_PATCH_ENTRIES=100 cap on entries array (#15) - Reorder deleteConfigField to validate principalType first (#12) - Export CapabilityUser from middleware/capabilities.ts DB methods (config.ts): - Remove isActive:true from patchConfigFields to prevent silent reactivation of disabled configs (#3) Schema (config.ts): - Change principalId from Schema.Types.Mixed to String (#11) Tests: - Add patchConfigField unsafe fieldPath rejection test (#9) - Add base-only HAS_DB_CONFIGS=false test (#1) - Update 201/200 tests to use configVersion instead of findConfig (#2) * fix: add read handler 401 invariant tests + document flag behavior - Add invariant: all 3 read handlers return 401 without auth - Document on markConfigsDirty that HAS_DB_CONFIGS stays true after all configs are deleted until clearAppConfigCache or restart * fix: remove HAS_DB_CONFIGS false optimization entirely getApplicableConfigs([]) only queries for __base__, not all configs. A deployment with role/group configs but no __base__ doc gets the flag poisoned to false by a base-only query, silently ignoring all scoped overrides. The optimization is not safe without a comprehensive Config.exists() check, which adds its own DB cost. Removed entirely. The flag is now write-once-true (set when configs are found or by markConfigsDirty) and only cleared by clearAppConfigCache/restart. * chore: reorder import statements in app.js for clarity * refactor: remove HAS_DB_CONFIGS_KEY machinery entirely The three-state flag (false/null/true) was the source of multiple bugs across review rounds. Every attempt to safely set it to false was defeated by getApplicableConfigs querying only a subset of principals. Removed: HAS_DB_CONFIGS_KEY constant, all reads/writes of the flag, markConfigsDirty (now a no-op concept), notifyChange wrapper, and all tests that seeded false manually. The per-user/role TTL cache (overrideCacheTtl, default 60s) is the sole caching mechanism. On cache miss, getApplicableConfigs queries the DB. This is one indexed query per user per TTL window — acceptable for the config override use case. * docs: rewrite admin panel remaining work with current state * perf: cache empty override results to avoid repeated DB queries When getApplicableConfigs returns no configs for a principal, cache baseConfig under their override key with TTL. Without this, every user with no per-principal overrides hits MongoDB on every request after the 60s cache window expires. * fix: add tenantId to cache keys + reject PUBLIC principal type - Include tenantId in override cache keys to prevent cross-tenant config contamination. Single-tenant deployments (tenantId undefined) use '_' as placeholder — no behavior change for them. - Reject PrincipalType.PUBLIC in admin config validation — PUBLIC has no PrincipalModel and is never resolved by getApplicableConfigs, so config docs for it would be dead data. - Config middleware passes req.user.tenantId to getAppConfig. * fix: fourth review pass findings DB methods (config.ts): - findConfigByPrincipal accepts { includeInactive } option so admin GET can retrieve inactive configs (#5) - upsertConfig catches E11000 duplicate key on concurrent upserts and retries without upsert flag (#2) - unsetConfigField no longer filters isActive:true, consistent with patchConfigFields (#11) - Typed filter objects replace Record<string, unknown> (#12) Admin handlers (config.ts): - patchConfigField: serial broad capability check before Promise.all to pre-warm ALS principal cache, preventing N parallel DB calls (#3) - isValidFieldPath rejects leading/trailing dots and consecutive dots (#7) - Duplicate fieldPaths in patch entries return 400 (#8) - DEFAULT_PRIORITY named constant replaces hardcoded 10 (#14) - Admin getConfig and patchConfigField pass includeInactive to findConfigByPrincipal (#5) - Route import uses barrel instead of direct file path (#13) Resolution (resolution.ts): - deepMerge has MAX_MERGE_DEPTH=10 guard to prevent stack overflow from crafted deeply nested configs (#4) * fix: final review cleanup - Remove ADMIN_PANEL_REMAINING.md (local dev notes with Windows paths) - Add empty-result caching regression test - Add tenantId to AdminConfigDeps.getAppConfig type - Restore exhaustive never check in principalModel switch - Standardize toggleConfigActive session handling to options pattern * fix: validate priority in patchConfigField handler Add the same non-negative number validation for priority that upsertConfigOverrides already has. Without this, invalid priority values could be stored via PATCH and corrupt merge ordering. * chore: remove planning doc from PR * fix: correct stale cache key strings in service tests * fix: clean up service tests and harden tenant sentinel - Remove no-op cache delete lines from regression tests - Change no-tenant sentinel from '_' to '__default__' to avoid collision with a real tenant ID when multi-tenancy is enabled - Remove unused CONFIG_STORE from AppConfigServiceDeps * chore: bump @librechat/data-schemas to 0.0.46 * fix: block prototype-poisoning keys in deepMerge Skip __proto__, constructor, and prototype keys during config merge to prevent prototype pollution via PUT /api/admin/config overrides.	2026-03-25 19:39:29 -04:00
Marco Beretta	0c66823c26	🧩 feat: Redesign Tool Call UI with Contextual Icons, Smart Grouping, and Rich Output Rendering (#12163 ) * feat: redesign tool call UI with type-specific icons, smart grouping, and rich output rendering Replace the generic spinner/checkmark tool call UI with a modern, Cursor-inspired design: - Add per-tool-type icons (Plug for MCP, Terminal for code, Globe for web search, etc.) - Group 2+ consecutive tool calls into collapsible "Used N tools" sections - Stack unique tool icons in grouped headers with overlapping circle design - Replace raw JSON output with intelligent renderers (table, error, text) - Restructure ToolCallInfo: output first, parameters collapsible at bottom - Add shared useExpandCollapse hook for consistent animations - Add CodeWindowHeader for ExecuteCode windowed view - Remove FinishedIcon (purple checkmark) entirely * feat: display custom MCP server icons in tool calls Add useMCPIconMap hook to resolve MCP server names to their configured icon paths. ToolIcon and StackedToolIcons now accept custom icon URLs, showing actual server logos (e.g., Home Assistant, GitHub) instead of the generic Plug icon for MCP tool calls. * refactor: unify container styling across code blocks, mermaid, and tool output Replace hardcoded gray colors with theme tokens throughout: - CodeBlock: bg-gray-900/700 -> bg-surface-secondary/tertiary + border-border-light - Mermaid dialog: bg-gray-700 -> bg-surface-secondary, text-gray-200 -> text-text-secondary - Mermaid containers: rounded-xl -> rounded-lg, remove shadow-md for consistency - ResultSwitcher: bg-gray-700 -> bg-surface-secondary with border separator - RunCode: hover:bg-gray-700 -> hover:bg-surface-hover - ErrorOutput: add border for visual consistency - MermaidHeader/CodeWindowHeader: consistent focus outlines using border-heavy * refactor: simplify tool output to plain text, remove custom renderers Remove over-engineered tool output system (TableOutput, ErrorOutput, detectOutputType) in favor of simple text extraction. Tool output now extracts the text content from MCP content blocks and displays it as clean readable text — no tables, no error styling, no JSON formatting. Parameters only show key-value badges for simple objects; complex JSON is hidden instead of dumped raw. Matches Cursor-style simplicity. * fix: handle error messages and format JSON arrays in tool output - Strip verbose MCP error prefixes (Error: [MCP][server][tool] tool call failed: Error POSTing...) and show just the meaningful error message - Display errors in red text - Format uniform JSON arrays as readable lists (name — path) instead of raw JSON dumps - Format plain JSON objects as key: value lines * feat: improve JSON display in tool call output and parameters - Replace flat formatObject with recursive formatValue for proper indented display of nested JSON structures - Add ComplexInput component for tool parameters with nested objects, arrays, or long strings (previously hidden) - Broaden hasParams check to show parameters for all object types - Add font-mono to output renderer for better alignment * feat: add localization keys for tool errors, web search, and code UI * refactor: move Mermaid components into dedicated directory module * refactor: extract CodeBar, FloatingCodeBar, and copy utilities from CodeBlock * feat: replace manual SVG icons with @icons-pack/react-simple-icons Supports 50+ programming languages with tree-shaken brand icons instead of hand-crafted SVGs for 19 languages. * refactor: simplify code execution UI with persistent code toggle * refactor: use useExpandCollapse hook in Thinking and Reasoning * feat: improve tool call error states, subtitles, and group summaries * feat: redesign web search with inline source display * feat: improve agent handoff with keyboard accessibility * feat: reorganize exports order in hooks and utils * refactor: unify CopyCodeButton with animated icon transitions and iconOnly support * feat: add run code state machine with animated success/error feedback * refactor: improve ResultSwitcher with lucide icons and accessibility * refactor: update CopyButton component * refactor: replace CopyCodeButton with CopyButton component across multiple files * test: add ImageGen test stubs * test: add RetrievalCall test stubs * feat: merge ImageGen with ToolIcon and localized progress text * feat: modernize RetrievalCall with ToolIcon and collapsible output * test: add getToolIconType action delimiter tests * test: add ImageGen collapsible output tests * feat: add action ToolIcon type with Zap icon * fix: replace AgentHandoff div with semantic button * feat: add aria-live regions to tool components * feat: redesign execute_code tool UI with syntax highlighting and language icons - Remove filename labels (script.py, main.rs) and line counter from CodeWindowHeader - Replace generic FileCode icon with language-specific LangIcon - Add syntax highlighting via highlight.js to code blocks - Add SquareTerminal icon to ExecuteCode progress text - Use shared CopyButton component in CodeWindowHeader - Remove active:scale-95 press animation from CopyButton and RunCode * feat: dynamic tool status text sizing based on markdown font-size variable - Add tool-status-text CSS class using calc(0.9 * --markdown-font-size) - Update progress-text-wrapper to use dynamic sizing instead of base size - Apply tool-status-text to WebSearch, ToolCallGroup, AgentHandoff, ImageGen - Replace hardcoded text-sm/text-xs with dynamic class across all tools - Animate chevron rotation in ProgressText and ToolCallGroup - Update subtitle text color from tertiary to secondary * fix: consistent spacing and text styles across all tool components - Standardize tool status row spacing to my-1/my-1.5 across all components - Update ToolCallInfo text from tertiary to secondary, add vertical padding - Animate ToolCallInfo parameters chevron rotation - Update OutputRenderer link colors from tertiary to secondary * feat: unify tool call grouping for all tool types All consecutive tool calls (MCP, execute_code, web_search, image_gen, file_search, code_interpreter) are now grouped under a single collapsible "Used N tools" header instead of only grouping generic tool calls. - Remove SPECIAL_TOOL_NAMES blacklist from groupToolCalls - Replace getToolCallData with getToolMeta to handle all tool types - Use renderPart callback in ToolCallGroup for proper component routing - Add file_search and code_interpreter mappings to getToolIconType * feat: friendly tool group labels, more icons, and output copy button - Show friendly names in group summary (Code, Web Search, Image Generation) instead of raw tool names - Display MCP server names instead of individual function names - Deduplicate labels and show up to 3 with +N overflow - Increase stacked icons from 3 to 4 - Add icon-only copy button to tool output (OutputRenderer) * fix: execute_code spacing and syntax-highlighted code visibility Match ToolCall spacing by using my-1.5 on status line and moving my-2 inside overflow-hidden. Replace broken hljs.highlight() with lowlight (same engine used by rehype-highlight for markdown code blocks) to render syntax-highlighted code as React elements. Handle object args in useParseArgs to support both string and Record arg formats. * feat: replace showCode with auto-expand tools setting Replace the execute_code-only "Always show code when using code interpreter" global toggle with a new "Auto-expand tool details" setting that controls all tool types. Each tool instance now uses independent local state initialized from the setting, so expanding one tool no longer affects others. Applies to ToolCall, ExecuteCode, ToolCallGroup, and CodeAnalyze components. * fix: apply auto-expand tools setting to WebSearch and RetrievalCall * fix: only auto-expand tools when content is available Defer auto-expansion until tool output or content arrives, preventing empty bordered containers from showing while tools are still running. Uses useEffect to expand when output becomes available during streaming. * feat: redesign file_search tool output, citations, and file preview - Redesign RetrievalCall with per-file cards using OutputRenderer (truncated content with show more/less, copy button) matching MCP tool pattern - Route file_search tool calls from Agents API to RetrievalCall instead of generic ToolCall - Add FilePreviewDialog for viewing files (PDF iframe, text content) with download option, opened from clickable filenames - Redesign file citations: FileText icon in badge, relevance and page numbers in hovercard, click opens file preview instead of downloading - Add file preview to message file attachments (Files.tsx) - Fix hovercard animation to slide top-to-bottom and dismiss instantly on file click to prevent glitching over dialog - Add localization keys for relevance, extracted content, preview - Add top margin to ToolCallGroup * chore: remove leftover .planning files * fix: polish FilePreviewDialog, CodeBlock, LangIcon, and Sources * fix: prevent keyboard focus on collapsed tool content Add inert attribute to all expand/collapse wrapper divs so collapsed content is removed from tab order and hidden from assistive technology. Skip disabled ProgressText buttons from tab order with tabIndex={-1}. * feat: integrate file metadata into file_search UI Pass fileType (MIME) and fileBytes from backend file records through to the frontend. Add file-type-specific icons, file size display, pages sorted by relevance, multi-snippet content per file, smart preview detection by MIME type, and copy button in file preview dialog. * fix: review fixes — inverted type check, wrong dimension, missing import, fail-open perms, timer leaks, dead code cleanup * fix: update CodeBlock styling for improved visual consistency * fix(chat): open composite file citations in preview * fix(chat): restore file previews for parsed search results * chore(git): ignore bg-shell artifacts * fix(chat): restore readable code content in light theme * style(chat): align code and output surfaces by theme * chore(i18n): remove 6 unused translation keys * fix(deps): replace private registry URL with public npm registry in lockfile * fix: CI lint, build, and test failures - Add missing scaleImage utility (fixes Vite build error) - Export scaleImage from utils/index.ts - Remove unused imports from Part.tsx (FunctionToolCall, CodeToolCall, Agents) - Fix prettier formatting in Part.tsx (multi-line → single-line imports, conditions) - Remove excess blank lines in Part.tsx - Remove unused CodeEditorRef import from Artifacts.tsx - Add useProgress mock to OpenAIImageGen.test.tsx - Add scaleImage mock to OpenAIImageGen.test.tsx - Update OpenAIImageGen tests to match redesigned component structure - Remove dead collapsible output panel tests from ImageGen.test.tsx - Add @icons-pack/react-simple-icons to Jest transformIgnorePatterns (ESM fix) * refactor: reorganize imports order across multiple components for consistency * fix: add scaleImage tests, delete dead ImageGen wrapper, wire up onUIAction in ToolCallInfo - Add 7 unit tests for scaleImage utility covering null ref, scaling, no-upscale, height clamping, landscape, and panoramic images - Delete unused Content/ImageGen.tsx re-export wrapper (ImageGen is imported from Parts/OpenAIImageGen via the Parts barrel) - Wire up onUIAction in ToolCallInfo to use handleUIAction + ask from useMessagesOperations, matching UIResourceCarousel's behavior (was previously a silent no-op) * refactor: optimize imports and enhance lazy loading for language icons * fix: address review findings for tool call UI redesign - Fix unstable array-index keys in ToolCallGroup (streaming state corruption) - Add plain-text fallback in InputRenderer for non-JSON tool args - Localize FRIENDLY_NAMES via translation keys instead of hardcoded English - Guard autoCollapse against user-initiated manual expansion - Fix CODE_INTERPRETER hasOutput to check actual outputs instead of hardcoding true - Add logger.warn for Citations fail-closed behavior on permission errors - Add Terminal icon to CodeAnalyze ProgressText for visual consistency - Fix getMCPServerName to use indexOf instead of fragile split - Use useLayoutEffect for inert attribute in useExpandCollapse (a11y) - Memoize style object in useExpandCollapse to avoid defeating React.memo - Memoize groupSequentialToolCalls in ContentParts to avoid recomputation - Use source.link as stable key instead of array index in WebSearch - Hoist rehypePlugins outside CodeMarkdown to prevent per-render recreation * fix: revert useMemo after conditional returns in ContentParts The useMemo placed after early returns violated React Rules of Hooks — hook call count would change when transitioning between edit/view mode. Reverted to the original plain forEach which is correct and equally performant since content changes on every streaming token anyway. * chore: remove unused com_ui_variables_info translation key * fix: update tests and jest config for ESM compatibility after rebase - Add ESM-only packages to transformIgnorePatterns (@dicebear, unified ecosystem, react-dnd, lowlight, etc.) to fix Jest parse failures introduced by dev rebase - Update ToolCall.test.tsx to match new component API (CSS expand/collapse instead of conditional rendering, simplified props) - Update ToolCallInfo.test.tsx to mock OutputRenderer (avoids ESM chain), align with current component interface (input/output/attachments) * refactor: replace @icons-pack/react-simple-icons with inline SVGs Inline the 51 Simple Icons SVG paths used by LangIcon directly into langIconPaths.ts, eliminating the runtime dependency on @icons-pack/react-simple-icons (which requires Node >= 24). - LangIcon now renders a plain <svg> with the path data instead of lazy-loading React components from the package - Removes Suspense/React.lazy overhead for code block language icons - SVG paths sourced from Simple Icons (CC0 1.0 license) - Package kept in package.json for now (will be removed separately) * fix: replace Plug icon with Wrench for MCP tools, remove unused i18n keys - MCP tools without a custom iconPath now show Wrench instead of Plug, matching the generic tool fallback and avoiding the "plugin" metaphor - Remove unused translation keys: com_assistants_action_attempt, com_assistants_attempt_info, com_assistants_domain_info, com_ui_ui_resources * fix: address second review findings - Combine 3x getToolMeta loop into single toolMetadata pass (ToolCallGroup) - Extract sortPagesByRelevance to shared util (was duplicated in FilePreviewDialog and RetrievalCall) - Deduplicate AGENT_STYLE_TOOLS Set (export from OpenAIImageGen/index.ts) - Localize "source/sources" in WebSearch aria-label - Add autoExpand useEffect to CodeAnalyze for live setting changes - Log download errors in FilePreviewDialog instead of silently swallowing - Replace @ts-ignore with @ts-expect-error + explanation in Code.tsx - Remove dead currentContent alias in CodeMarkdown * chore: remove @icons-pack/react-simple-icons dependency from package.json and package-lock.json - Deleted the @icons-pack/react-simple-icons entry from both package.json and package-lock.json, following the previous refactor to use inline SVGs for icons. * fix: address triage audit findings - Remove unused gIdx variable (ESLint error) - Fix singular/plural in web search sources aria-label - Separate inline type import in ToolCallGroup per AGENTS.md * fix: remove invalid placeholderDimensions prop from Image component * chore: import order * chore: import order * fix: resolve TypeScript errors in PR-touched files - Remove non-existent placeholderDimensions prop from Image in Files.tsx - Fix localize count param type (number, not string) in WebSearch.tsx - Pass full resource object instead of partial in UIResourceCarousel.tsx - Add 'as const' to toggleSwitchConfigs localizationKey in General.tsx - Fix SearchResultData type in Citation.test.tsx - Fix TAttachment and UIResource test fixture types across test files * docs: document formatBytes difference in FilePreviewDialog The local formatBytes returns a human-readable string with units ("1.5 MB"), while ~/utils/formatBytes returns a raw number. They serve different purposes, so the local copy is retained with a JSDoc comment explaining the distinction. * fix: address remaining review items - Replace cancelled IIFE with documented ternary in OpenAIImageGen, explaining the agent vs legacy path distinction - Add .catch() fallback to loadLowlight() in useLazyHighlight — falls back to plain text if the chunk fails to load - Fix import ordering in ToolCallGroup.tsx (type imports grouped before local value imports per AGENTS.md) * fix: blob URL leak and useGetFiles over-fetch - FilePreviewDialog: add cancelledRef guard to loadPreview so blob URLs are never created after the dialog closes (prevents orphaned object URLs on unmount during async PDF fetch) - RetrievalCall: filter useGetFiles by fileIds from fileSources instead of fetching the entire user file corpus for display-only name matching * chore: fix com_nav_auto_expand_tools alphabetical order in translation.json * fix: render non-object JSON params instead of returning null in InputRenderer * refactor: render JSON tool output as syntax-highlighted code block Replace the custom YAML-ish formatValue/formatObjectArray rendering with JSON.stringify + hljs language-json styling. Structured API responses (like GitHub search results) now display as proper syntax-highlighted JSON with indentation instead of a flat key-value text dump. - Remove formatValue, formatObjectArray, isUniformObjectArray helpers - Add isJson flag to extractText return type - JSON output rendered in <code class="hljs language-json"> block - Text content blocks (type: "text") still extracted and rendered as plain text - Error output unchanged * fix: extract cancelled IIFE to named function in OpenAIImageGen Replace nested ternary with a named computeCancelled() function that documents the agent vs legacy path branching. Resolves eslint no-nested-ternary warning. --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-25 12:31:39 -04:00
Marco Beretta	ccd049d8ce	📁 refactor: Prompts UI (#11570 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * style: enhance prompts UI with new components and improved structure; add CreatePromptButton and AutoSendPrompt; refactor GroupSidePanel and PromptsAccordion * refactor(Prompts): move button components to buttons/ subdirectory * refactor(Prompts): move dialog components to dialogs/ subdirectory * refactor(Prompts): move display components to display/ subdirectory * refactor(Prompts): move editor components to editor/ subdirectory * refactor(Prompts): move field components to fields/ subdirectory * refactor(Prompts): move form components to forms/ subdirectory * refactor(Prompts): move layout components to layouts/ subdirectory * refactor(Prompts): move list components to lists/ subdirectory * refactor(Prompts): move sidebar components to sidebar/ subdirectory * refactor(Prompts): move utility components to utils/ subdirectory * refactor(Prompts): update main exports and external imports * refactor(Prompts): fix class name typo in AutoSendPrompt * refactor(Prompts): reorganize exports and imports order across components * refactor(Prompts): reorder exports for better organization and clarity * refactor(Buttons): enhance prompts accessibility with aria-labels and update translations * refactor(AdminSettings): reorganize imports and improve form structure for clarity * refactor(Dialogs): reorganize imports for consistency and clarity across DeleteVersion, SharePrompt, and VariableDialog components * refactor(Dialogs): enhance prompts accessibility with aria-labels * refactor(Display): enhance prompt components and accessibility features * refactor(.gitignore): add Playwright MCP directory * refactor(Preview): enhance prompt components, improve layout, and add accessibility features * refactor(Prompts): enhance variable handling, improve accessibility, and update UI components * refactor(Prompts): enhance loading state handling and improve accessibility in PromptName component * refactor(Prompts): streamline special variable handling, improve icon management, and enhance UI components * refactor(Prompts): update AdvancedSwitch component to use Radio for mode selection, enhance PromptName with tooltips, and improve layout in PromptForm * refactor(Prompts): enhance VersionCard and VersionBadge components for improved UI and accessibility, update loading state handling in VersionsPanel * refactor(Prompts): improve layout and styling of VersionCard component for better visual alignment and clarity * refactor(DeleteVersion): update text color for confirmation prompt in DeleteConfirmDialog * refactor(Prompts): add configurations for always make production and auto-send prompts, update localization strings for clarity * refactor(Prompts): enhance layout and styling in CategorySelector, CreatePromptForm, and List components for improved responsiveness and clarity * refactor(Prompts): enhance PromptDetailHeader and ChatGroupItem components, add shared prompt indication, and remove unused PromptMetadata component * refactor(Prompts): implement prompt group usage tracking, update sorting logic, and enhance related components * fix(Prompts): security, performance, and pagination fixes - Fix cursor pagination skipping/duplicating items by including numberOfGenerations in cursor condition to match sort order - Close NoSQL injection vector via otherFilters rest spread in GET /all, GET /groups, and buildPromptGroupFilter - Validate groupId as ObjectId before passing to query (GET /) - Add prompt body validation in addPromptToGroup (type + text) - Return 404 instead of 500 for missing group in POST /use - Combine data + count into single $facet aggregation - Add compound index {numberOfGenerations, updatedAt, _id} - Add index on prompt.author for deleteUserPrompts - Update useRecordPromptUsage to refresh client caches - Replace console.error with logger.error * refactor(PromptForm): remove console warning for unselected prompt in VersionsPanel * refactor(Prompts): improve error handling for groupId and streamline usage tracking * refactor(.gitignore): add CLAUDE.md to ignore list * refactor(Prompts): streamline prompt components by removing unused variables and enhancing props structure * refactor(Prompts): fix sort stability, keyboard handling, and remove dead code Add _id tiebreaker to prompt group sort pipelines for deterministic pagination ordering. Prevent default browser scroll on Space key in PromptEditor preview mode. Remove unused blurTimeoutRef and its onMutate callback from DashGroupItem. * refactor(Prompts): enhance groupId validation and improve prompt group aggregation handling * fix: aria-hidden, API fixes, accessibility improvements * fix: ACL author filter, mobile guard, semantic HTML, and add useFocusTrap hook - Remove author filter from patchPromptGroup so ACL-granted editors can update prompt groups (aligns with deletePromptGroupController) - Add missing group guard to mobile HeaderActions in PromptForm - Replace div with article in DashGroupItem, remove redundant stopPropagation and onClick on outer container - Add useFocusTrap hook for keyboard focus management - Add numberOfGenerations to default projection - Deduplicate ObjectId validation, remove console.warn, fix aria-labelledby, localize search announcements * refactor(Prompts): adjust UI and improve a11y * refactor(Prompts): reorder imports for consistency and clarity * refactor(Prompts): implement updateFieldsInPlace for efficient data updates and add related tests * refactor(Prompts): reorder imports to include updateFieldsInPlace for better organization * refactor(Prompts): enhance DashGroupItem with toast notifications for prompt updates and add click-to-edit functionality in PromptEditor * style: use self-closing TooltipAnchor in CreatePromptButton Replace ></TooltipAnchor> with /> for consistency with the rest of the Prompts directory. * fix(i18n): replace placeholder text for com_ui_global_group translation key The value was left as 'something needs to go here. was empty' which would be visible to users as an aria-label in DashGroupItem. * fix(DashGroupItem): sync rename input with group.name on external changes nameInputValue was initialized via useState(group.name) but never synced when group.name changed from a background refetch. Added useEffect that updates the input when the dialog is closed. * perf(useFocusTrap): store onEscape in ref to avoid listener churn onEscape was in the useEffect dependency array, causing the keydown listener to be torn down and re-attached on every render when callers passed an inline function. Now stored in a ref so the effect only re-runs when active or containerRef changes. * fix(a11y): replace role=button div with layered button overlay in ListCard The card used role='button' on a div that contained nested Button elements — an invalid ARIA pattern. Replaced with a hidden button at z-0 for the card action while child interactive elements sit at z-10, eliminating nested interactive element violations. * fix(PromptForm): reset selectionIndex on route change, guard auto-save, and fix a11y - Reset selectionIndex to 0 and isEditing to false when promptId changes, preventing out-of-bounds index when navigating between groups with different version counts. - Track selectedPrompt in a ref so the auto-save effect doesn't fire against a stale prompt when the selection changed mid-edit. - Stabilize useFocusTrap onEscape via useCallback to avoid unnecessary listener re-attachment. - Conditionally render mobile overlay instead of always-present button with aria-hidden/pointer-events toggling. * refactor: extract isValidObjectIdString to shared utility in data-schemas The same regex helper was duplicated in api/server/routes/prompts.js and packages/data-schemas/src/methods/prompt.ts. Moved to packages/data-schemas/src/utils/objectId.ts and imported from both consumers. Also removed a duplicate router.use block introduced during the extraction. * perf(updateFieldsInPlace): replace JSON deep clone with targeted spread Instead of JSON.parse(JSON.stringify(data)) which serializes the entire paginated data structure, use targeted immutable spreads that only copy the affected page and collection array. Returns the original data reference unchanged when the item is not found. * perf(VariablesDropdown): memoize items array and stabilize handleAddVariable The items array containing JSX elements was rebuilt on every render. Wrapped in useMemo keyed on usedVariables and localize. Also wrapped handleAddVariable in useCallback and memoized usedCount to avoid redundant array filtering. * perf(DashGroupItem): stabilize mutation callbacks via refs handleSaveRename and handleDelete had updateGroup/deleteGroup mutation objects in their useCallback dependency arrays. Since mutation objects are new references each render, the callbacks were recreated every render, defeating memoization. Now store mutation objects in refs and call via ref.current in the callbacks. * fix(security): validate groupId in incrementPromptGroupUsage The data-schema method passed the groupId string directly to findByIdAndUpdate without validation. If called from a different entrypoint without the route-level check, Mongoose would throw a CastError. Now validates with isValidObjectIdString before the DB call and throws a clean 'Invalid groupId' error. * fix(security): add rate limiter to prompt usage tracking endpoint POST /groups/:groupId/use had no rate limiting — a user could spam it to inflate numberOfGenerations, which controls sort order for all users. Added promptUsageLimiter (30 req/user/min) following the same pattern as toolCallLimiter. Also handle 'Invalid groupId' error from the data layer in the route error handler. * fix(updateFieldsInPlace): guard against undefined identifier value If updatedItem[identifierField] is null/undefined, findIndex could match unintended items where that field is also undefined. Added early return when the identifier value is nullish. * fix(a11y): use React useId for stable unique IDs in ListCard aria-describedby/id values were derived from prompt name which can contain spaces and special characters, producing invalid HTML IDs and potential collisions. Now uses React.useId() for guaranteed unique, valid IDs per component instance. * fix: Align prompts panel styling with other sidebar panels and fix test - Match FilterPrompts first row to Memory/Bookmark pattern (items-center gap-2) - Remove items-stretch override from PromptsAccordion - Add missing promptUsageLimiter mock to prompts route test * fix: Address code review findings for prompts refactor PR - Fix #5: Gate DeletePrompt in HeaderActions behind canDelete permission - Fix #8: BackToChat navigates to last conversation instead of /c/new - Fix #7: Restore useLiveAnnouncer for screen reader feedback on delete/rename - Fix #1: Use isPublic (set by API) instead of deprecated projectIds for globe icon - Fix #4: Optimistic cache update in useRecordPromptUsage instead of full invalidation - Fix #6: Add migration to drop superseded { createdAt, updatedAt } compound index - Fix #9: Single-pass reduce in PromptVariables instead of triple filter - Fix #10: Rename PromptLabelsForm internal component to avoid collision with PromptForm - Fix #14: Remove redundant aria-label from aria-hidden Checkbox in AutoSendPrompt * fix: Align prompts panel filter row element sizes with other panels - Override Dropdown trigger to size-9 (36px) to match FilterInput height - Set CreatePromptButton to size-9 shrink-0 bg-transparent matching Memory/Bookmark panel button pattern * fix(prompts): Shared Prompts filter ignores direct shares, only returns PUBLIC Folds fix from PR #11882 into the refactored codebase. Bug A: filterAccessibleIdsBySharedLogic now accepts ownedPromptGroupIds: - MY_PROMPTS: accessible intersect owned - SHARED_PROMPTS: (accessible union public) minus owned - ALL: accessible union public (deduplicated) Legacy fallback preserved when ownedPromptGroupIds is omitted. Bug B: getPromptGroup uses $lookup aggregation to populate productionPrompt, fixing empty text on direct URL navigation to shared prompts. Also adds getOwnedPromptGroupIds to data-schemas methods and passes it from both /all and /groups route handlers. * fix: Add missing canDelete to mobile HeaderActions, remove dead instanceProjectId prop - Pass canDelete to mobile HeaderActions row (was only on desktop) - Remove instanceProjectId prop from ChatGroupItem and DashGroupItem since global check now uses group.isPublic - Remove useGetStartupConfig from List.tsx (no longer needed) * fix: Use runtime ObjectId instead of type-only Types.ObjectId, fix i18next interpolation - getPromptGroup and getOwnedPromptGroupIds were using Types.ObjectId (imported as type-only), which is erased at compile time. Use the runtime ObjectId from mongoose.Types (already destructured at line 20). This fixes the 404s in PATCH /groups/:groupId tests. - Fix com_ui_prompt_deleted_group translation to use {{0}} (i18next double-brace syntax) instead of {0}. * chore: Fix translation key ordering, add sideEffects: false to data-provider - Reorder new translation keys to maintain alphabetical order: com_ui_click_to_edit, com_ui_labels, com_ui_live, com_ui_prompt_delete_confirm, com_ui_prompt_deleted_group, com_ui_prompt_details, com_ui_prompt_renamed, com_ui_prompt_update_error, com_ui_prompt_variables_list - Add "sideEffects": false to librechat-data-provider package.json to enable tree-shaking of unused exports (types, constants, pure functions) * fix: Reduce prompts panel spacing, align memory toggle with checkbox pattern - Remove unnecessary wrapper div around AutoSendPrompt in PromptsAccordion, reducing vertical space between the toggle and the first prompt item - Replace Memory panel's Switch toggle with Checkbox+Button pattern matching the prompts panel's AutoSendPrompt for visual consistency * fix: Reduce gap between AutoSendPrompt and first prompt item Change ChatGroupItem margin from my-2 to mb-2 to eliminate the doubled spacing (gap-2 from parent + top margin from first item). Restore wrapper div around AutoSendPrompt for right-alignment. * fix: Restore prompt name on empty save, remove dead bodyProps from checkGlobalPromptShare - PromptName: reset newName to name when save is cancelled due to empty or unchanged input, preventing blank title in read mode - checkGlobalPromptShare: remove dead bodyProps config — Permissions.SHARE was not in the permissions array so the bodyProps rule was never evaluated. Per-resource share checks are handled by canAccessPromptGroupResource. --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-22 16:56:22 -04:00
Danny Avila	04e65bb21a	🧹 chore: Move direct model usage from PermissionsController to data-schemas Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details	2026-03-21 15:20:15 -04:00
Danny Avila	87a3b8221a	🧹 chore: Consolidate getSoleOwnedResourceIds into data-schemas and use db object in PermissionService Move getSoleOwnedResourceIds from PermissionService to data-schemas aclEntry methods, update PermissionService to use the db object pattern instead of destructured imports from ~/models, and update UserController + tests accordingly.	2026-03-21 14:28:57 -04:00
Danny Avila	7829fa9eca	🪄 refactor: Simplify MCP Tool Content Formatting to Unified String Output (#12352 ) * refactor: Simplify content formatting in MCP service and parser - Consolidated content handling in `formatToolContent` to return a plain-text string instead of an array for all providers, enhancing clarity and consistency. - Removed unnecessary checks for content array providers, streamlining the logic for handling text and image artifacts. - Updated related tests to reflect changes in expected output format, ensuring comprehensive coverage for the new implementation. * fix: Return empty string for image-only tool responses instead of '(No response)' When artifacts exist (images/UI resources) but no text content is present, return an empty string rather than the misleading '(No response)' fallback. Adds missing test assertions for image-only content and standardizes length checks to explicit `> 0` comparisons.	2026-03-21 14:28:56 -04:00
Danny Avila	b5c097e5c7	⚗️ feat: Agent Context Compaction/Summarization (#12287 ) * chore: imports/types Add summarization config and package-level summarize handler contracts Register summarize handlers across server controller paths Port cursor dual-read/dual-write summary support and UI status handling Selectively merge cursor branch files for BaseClient summary content block detection (last-summary-wins), dual-write persistence, summary block unit tests, and on_summarize_status SSE event handling with started/completed/failed branches. Co-authored-by: Cursor <cursoragent@cursor.com> refactor: type safety feat: add localization for summarization status messages refactor: optimize summary block detection in BaseClient Updated the logic for identifying existing summary content blocks to use a reverse loop for improved efficiency. Added a new test case to ensure the last summary content block is updated correctly when multiple summary blocks exist. chore: add runName to chainOptions in AgentClient refactor: streamline summarization configuration and handler integration Removed the deprecated summarizeNotConfigured function and replaced it with a more flexible createSummarizeFn. Updated the summarization handler setup across various controllers to utilize the new function, enhancing error handling and configuration resolution. Improved overall code clarity and maintainability by consolidating summarization logic. feat(summarization): add staged chunk-and-merge fallback feat(usage): track summarization usage separately from messages feat(summarization): resolve prompt from config in runtime fix(endpoints): use @librechat/api provider config loader refactor(agents): import getProviderConfig from @librechat/api chore: code order feat(app-config): auto-enable summarization when configured feat: summarization config refactor(summarization): streamline persist summary handling and enhance configuration validation Removed the deprecated createDeferredPersistSummary function and integrated a new createPersistSummary function for MongoDB persistence. Updated summarization handlers across various controllers to utilize the new persistence method. Enhanced validation for summarization configuration to ensure provider, model, and prompt are properly set, improving error handling and overall robustness. refactor(summarization): update event handling and remove legacy summarize handlers Replaced the deprecated summarization handlers with new event-driven handlers for summarization start and completion across multiple controllers. This change enhances the clarity of the summarization process and improves the integration of summarization events in the application. Additionally, removed unused summarization functions and streamlined the configuration loading process. refactor(summarization): standardize event names in handlers Updated event names in the summarization handlers to use constants from GraphEvents for consistency and clarity. This change improves maintainability and reduces the risk of errors related to string literals in event handling. feat(summarization): enhance usage tracking for summarization events Added logic to track summarization usage in multiple controllers by checking the current node type. If the node indicates a summarization task, the usage type is set accordingly. This change improves the granularity of usage data collected during summarization processes. feat(summarization): integrate SummarizationConfig into AppSummarizationConfig type Enhanced the AppSummarizationConfig type by extending it with the SummarizationConfig type from librechat-data-provider. This change improves type safety and consistency in the summarization configuration structure. test: add end-to-end tests for summarization functionality Introduced a comprehensive suite of end-to-end tests for the summarization feature, covering the full LibreChat pipeline from message creation to summarization. This includes a new setup file for environment configuration and a Jest configuration specifically for E2E tests. The tests utilize real API keys and ensure proper integration with the summarization process, enhancing overall test coverage and reliability. refactor(summarization): include initial summary in formatAgentMessages output Updated the formatAgentMessages function to return an initial summary alongside messages and index token count map. This change is reflected in multiple controllers and the corresponding tests, enhancing the summarization process by providing additional context for each agent's response. refactor: move hydrateMissingIndexTokenCounts to tokenMap utility Extracted the hydrateMissingIndexTokenCounts function from the AgentClient and related tests into a new tokenMap utility file. This change improves code organization and reusability, allowing for better management of token counting logic across the application. refactor(summarization): standardize step event handling and improve summary rendering Refactored the step event handling in the useStepHandler and related components to utilize constants for event names, enhancing consistency and maintainability. Additionally, improved the rendering logic in the Summary component to conditionally display the summary text based on its availability, providing a better user experience during the summarization process. feat(summarization): introduce baseContextTokens and reserveTokensRatio for improved context management Added baseContextTokens to the InitializedAgent type to calculate the context budget based on agentMaxContextNum and maxOutputTokensNum. Implemented reserveTokensRatio in the createRun function to allow configurable context token management. Updated related tests to validate these changes and ensure proper functionality. feat(summarization): add minReserveTokens, context pruning, and overflow recovery configurations Introduced new configuration options for summarization, including minReserveTokens, context pruning settings, and overflow recovery parameters. Updated the createRun function to accommodate these new options and added a comprehensive test suite to validate their functionality and integration within the summarization process. feat(summarization): add updatePrompt and reserveTokensRatio to summarization configuration Introduced an updatePrompt field for updating existing summaries with new messages, enhancing the flexibility of the summarization process. Additionally, added reserveTokensRatio to the configuration schema, allowing for improved management of token allocation during summarization. Updated related tests to validate these new features. feat(logging): add on_agent_log event handler for structured logging Implemented an on_agent_log event handler in both the agents' callbacks and responses to facilitate structured logging of agent activities. This enhancement allows for better tracking and debugging of agent interactions by logging messages with associated metadata. Updated the summarization process to ensure proper handling of log events. fix: remove duplicate IBalanceUpdate interface declaration perf(usage): single-pass partition of collectedUsage Replace two Array.filter() passes with a single for-of loop that partitions message vs. summarization usages in one iteration. fix(BaseClient): shallow-copy message content before mutating and preserve string content Avoid mutating the original message.content array in-place when appending a summary block. Also convert string content to a text content part instead of silently discarding it. fix(ui): fix Part.tsx indentation and useStepHandler summarize-complete handling - Fix SUMMARY else-if branch indentation in Part.tsx to match chain level - Guard ON_SUMMARIZE_COMPLETE with didFinalize flag to avoid unnecessary re-renders when no summarizing parts exist - Protect against undefined completeData.summary instead of unsafe spread fix(agents): use strict enabled check for summarization handlers Change summarizationConfig?.enabled !== false to === true so handlers are not registered when summarizationConfig is undefined. chore: fix initializeClient JSDoc and move DEFAULT_RESERVE_RATIO to module scope refactor(Summary): align collapse/expand behavior with Reasoning component - Single render path instead of separate streaming vs completed branches - Use useMessageContext for isSubmitting/isLatestMessage awareness so the "Summarizing..." label only shows during active streaming - Default to collapsed (matching Reasoning), user toggles to expand - Add proper aria attributes (aria-hidden, role, aria-controls, contentId) - Hide copy button while actively streaming feat(summarization): default to self-summarize using agent's own provider/model When no summarization config is provided (neither in librechat.yaml nor on the agent), automatically enable summarization using the agent's own provider and model. The agents package already provides default prompts, so no prompt configuration is needed. Also removes the dead resolveSummarizationLLMConfig in summarize.ts (and its spec) — run.ts buildAgentContext is the single source of truth for summarization config resolution. Removes the duplicate RuntimeSummarizationConfig local type in favor of the canonical SummarizationConfig from data-provider. chore: schema and type cleanup for summarization - Add trigger field to summarizationAgentOverrideSchema so per-agent trigger overrides in librechat.yaml are not silently stripped by Zod - Remove unused SummarizationStatus type from runs.ts - Make AppSummarizationConfig.enabled non-optional to reflect the invariant that loadSummarizationConfig always sets it refactor(responses): extract duplicated on_agent_log handler refactor(run): use agents package types for summarization config Import SummarizationConfig, ContextPruningConfig, and OverflowRecoveryConfig from @librechat/agents and use them to type-check the translation layer in buildAgentContext. This ensures the config object passed to the agent graph matches what it expects. - Use `satisfies AgentSummarizationConfig` on the config object - Cast contextPruningConfig and overflowRecoveryConfig to agents types - Properly narrow trigger fields from DeepPartial to required shape feat(config): add maxToolResultChars to base endpoint schema Add maxToolResultChars to baseEndpointSchema so it can be configured on any endpoint in librechat.yaml. Resolved during agent initialization using getProviderConfig's endpoint resolution: custom endpoint config takes precedence, then the provider-specific endpoint config, then the shared `all` config. Passed through to the agents package ToolNode, which uses it to cap tool result length before it enters the context window. When not configured, the agents package computes a sensible default from maxContextTokens. fix(summarization): forward agent model_parameters in self-summarize default When no explicit summarization config exists, the self-summarize default now forwards the agent's model_parameters as the summarization parameters. This ensures provider-specific settings (e.g. Bedrock region, credentials, endpoint host) are available when the agents package constructs the summarization LLM. fix(agents): register summarization handlers by default Change the enabled gate from === true to !== false so handlers register when no explicit summarization config exists. This aligns with the self-summarize default where summarization is always on unless explicitly disabled via enabled: false. refactor(summarization): let agents package inherit clientOptions for self-summarize Remove model_parameters forwarding from the self-summarize default. The agents package now reuses the agent's own clientOptions when the summarization provider matches the agent's provider, inheriting all provider-specific settings (region, credentials, proxy, etc.) automatically. refactor(summarization): use MessageContentComplex[] for summary content Unify summary content to always use MessageContentComplex[] arrays, matching the pattern used by on_message_delta. No more string \| array unions — content is always an array of typed blocks ({ type: 'text', text: '...' } for text, { type: 'reasoning_content', ... } for reasoning). Agents package: - SummaryContentBlock.content: MessageContentComplex[] (was string) - tokenCount now optional (not sent on deltas) - Removed reasoning field — reasoning is now a content block type - streamAndCollect normalizes all chunks to content block arrays - Delta events pass content blocks directly LibreChat: - SummaryContentPart.content: Agents.MessageContentComplex[] - Updated Part.tsx, Summary.tsx, useStepHandler.ts, BaseClient.js - Summary.tsx derives display text from content blocks via useMemo - Aggregator uses simple array spread refactor(summarization): enhance summary handling and text extraction - Updated BaseClient.js to improve summary text extraction, accommodating both legacy and new content formats. - Modified summarization logic to ensure consistent handling of summary content across different message formats. - Adjusted test cases in summarization.e2e.spec.js to utilize the new summary text extraction method. - Refined SSE useStepHandler to initialize summary content as an array. - Updated configuration schema by removing unused minReserveTokens field. - Cleaned up SummaryContentPart type by removing rangeHash property. These changes streamline the summarization process and ensure compatibility with various content structures. refactor(summarization): streamline usage tracking and logging - Removed direct checks for summarization nodes in ModelEndHandler and replaced them with a dedicated markSummarizationUsage function for better readability and maintainability. - Updated OpenAIChatCompletionController and responses handlers to utilize the new markSummarizationUsage function for setting usage types. - Enhanced logging functionality by ensuring the logger correctly handles different log levels. - Introduced a new useCopyToClipboard hook in the Summary component to encapsulate clipboard copy logic, improving code reusability and clarity. These changes improve the overall structure and efficiency of the summarization handling and logging processes. refactor(summarization): update summary content block documentation - Removed outdated comment regarding the last summary content block in BaseClient.js. - Added a new comment to clarify the purpose of the findSummaryContentBlock method, ensuring consistency in documentation. These changes enhance code clarity and maintainability by providing accurate descriptions of the summarization logic. refactor(summarization): update summary content structure in tests - Modified the summarization content structure in e2e tests to use an array format for text, aligning with recent changes in summary handling. - Updated test descriptions to clarify the behavior of context token calculations, ensuring consistency and clarity in the tests. These changes enhance the accuracy and maintainability of the summarization tests by reflecting the updated content structure. refactor(summarization): remove legacy E2E test setup and configuration - Deleted the e2e-setup.js and jest.e2e.config.js files, which contained legacy configurations for E2E tests using real API keys. - Introduced a new summarization.e2e.ts file that implements comprehensive E2E backend integration tests for the summarization process, utilizing real AI providers and tracking summaries throughout the run. These changes streamline the testing framework by consolidating E2E tests into a single, more robust file while removing outdated configurations. refactor(summarization): enhance E2E tests and error handling - Added a cleanup step to force exit after all tests to manage Redis connections. - Updated the summarization model to 'claude-haiku-4-5-20251001' for consistency across tests. - Improved error handling in the processStream function to capture and return processing errors. - Enhanced logging for cross-run tests and tight context scenarios to provide better insights into test execution. These changes improve the reliability and clarity of the E2E tests for the summarization process. refactor(summarization): enhance test coverage for maxContextTokens behavior - Updated run-summarization.test.ts to include a new test case ensuring that maxContextTokens does not exceed user-defined limits, even when calculated ratios suggest otherwise. - Modified summarization.e2e.ts to replace legacy UsageMetadata type with a more appropriate type for collectedUsage, improving type safety and clarity in the test setup. These changes improve the robustness of the summarization tests by validating context token constraints and refining type definitions. feat(summarization): add comprehensive E2E tests for summarization process - Introduced a new summarization.e2e.test.ts file that implements extensive end-to-end integration tests for the summarization pipeline, covering the full flow from LibreChat to agents. - The tests utilize real AI providers and include functionality to track summaries during and between runs. - Added necessary cleanup steps to manage Redis connections post-tests and ensure proper exit. These changes enhance the testing framework by providing robust coverage for the summarization process, ensuring reliability and performance under real-world conditions. fix(service): import logger from winston configuration - Removed the import statement for logger from '@librechat/data-schemas' and replaced it with an import from '~/config/winston'. - This change ensures that the logger is correctly sourced from the updated configuration, improving consistency in logging practices across the application. refactor(summary): simplify Summary component and enhance token display - Removed the unused `meta` prop from the `SummaryButton` component to streamline its interface. - Updated the token display logic to use a localized string for better internationalization support. - Adjusted the rendering of the `meta` information to improve its visibility within the `Summary` component. These changes enhance the clarity and usability of the Summary component while ensuring better localization practices. feat(summarization): add maxInputTokens configuration for summarization - Introduced a new `maxInputTokens` property in the summarization configuration schema to control the amount of conversation context sent to the summarizer, with a default value of 10000. - Updated the `createRun` function to utilize the new `maxInputTokens` setting, allowing for more flexible summarization based on agent context. These changes enhance the summarization capabilities by providing better control over input token limits, improving the overall summarization process. refactor(summarization): simplify maxInputTokens logic in createRun function - Updated the logic for the `maxInputTokens` property in the `createRun` function to directly use the agent's base context tokens when the resolved summarization configuration does not specify a value. - This change streamlines the configuration process and enhances clarity in how input token limits are determined for summarization. These modifications improve the maintainability of the summarization configuration by reducing complexity in the token calculation logic. feat(summary): enhance Summary component to display meta information - Updated the SummaryContent component to accept an optional `meta` prop, allowing for additional contextual information to be displayed above the main content. - Adjusted the rendering logic in the Summary component to utilize the new `meta` prop, improving the visibility of supplementary details. These changes enhance the user experience by providing more context within the Summary component, making it clearer and more informative. refactor(summarization): standardize reserveRatio configuration in summarization logic - Replaced instances of `reserveTokensRatio` with `reserveRatio` in the `createRun` function and related tests to unify the terminology across the codebase. - Updated the summarization configuration schema to reflect this change, ensuring consistency in how the reserve ratio is defined and utilized. - Removed the per-agent override logic for summarization configuration, simplifying the overall structure and enhancing clarity. These modifications improve the maintainability and readability of the summarization logic by standardizing the configuration parameters. * fix: circular dependency of `~/models` * chore: update logging scope in agent log handlers Changed log scope from `[agentus:${data.scope}]` to `[agents:${data.scope}]` in both the callbacks and responses controllers to ensure consistent logging format across the application. * feat: calibration ratio * refactor(tests): update summarizationConfig tests to reflect changes in enabled property Modified tests to check for the new `summarizationEnabled` property instead of the deprecated `enabled` field in the summarization configuration. This change ensures that the tests accurately validate the current configuration structure and behavior of the agents. * feat(tests): add markSummarizationUsage mock for improved test coverage Introduced a mock for the markSummarizationUsage function in the responses unit tests to enhance the testing of summarization usage tracking. This addition supports better validation of summarization-related functionalities and ensures comprehensive test coverage for the agents' response handling. * refactor(tests): simplify event handler setup in createResponse tests Removed redundant mock implementations for event handlers in the createResponse unit tests, streamlining the setup process. This change enhances test clarity and maintainability while ensuring that the tests continue to validate the correct behavior of usage tracking during on_chat_model_end events. * refactor(agents): move calibration ratio capture to finally block Reorganized the logic for capturing the calibration ratio in the AgentClient class to ensure it is executed in the finally block. This change guarantees that the ratio is captured even if the run is aborted, enhancing the reliability of the response message persistence. Removed redundant code and improved clarity in the handling of context metadata. * refactor(agents): streamline bulk write logic in recordCollectedUsage function Removed redundant bulk write operations and consolidated document handling in the recordCollectedUsage function. The logic now combines all documents into a single bulk write operation, improving efficiency and reducing error handling complexity. Updated logging to provide consistent error messages for bulk write failures. * refactor(agents): enhance summarization configuration resolution in createRun function Streamlined the summarization configuration logic by introducing a base configuration and allowing for overrides from agent-specific settings. This change improves clarity and maintainability, ensuring that the summarization configuration is consistently applied while retaining flexibility for customization. Updated the handling of summarization parameters to ensure proper integration with the agent's model and provider settings. * refactor(agents): remove unused tokenCountMap and streamline calibration ratio handling Eliminated the unused tokenCountMap variable from the AgentClient class to enhance code clarity. Additionally, streamlined the logic for capturing the calibration ratio by using optional chaining and a fallback value, ensuring that context metadata is consistently defined. This change improves maintainability and reduces potential confusion in the codebase. * refactor(agents): extract agent log handler for improved clarity and reusability Refactored the agent log handling logic by extracting it into a dedicated function, `agentLogHandler`, enhancing code clarity and reusability across different modules. Updated the event handlers in both the OpenAI and responses controllers to utilize the new handler, ensuring consistent logging behavior throughout the application. * test: add summarization event tests for useStepHandler Implemented a series of tests for the summarization events in the useStepHandler hook. The tests cover scenarios for ON_SUMMARIZE_START, ON_SUMMARIZE_DELTA, and ON_SUMMARIZE_COMPLETE events, ensuring proper handling of summarization logic, including message accumulation and finalization. This addition enhances test coverage and validates the correct behavior of the summarization process within the application. * refactor(config): update summarizationTriggerSchema to use enum for type validation Changed the type of the `type` field in the summarizationTriggerSchema from a string to an enum with a single value 'token_count'. This modification enhances type safety and ensures that only valid types are accepted in the configuration, improving overall clarity and maintainability of the schema. * test(usage): add bulk write tests for message and summarization usage Implemented tests for the bulk write functionality in the recordCollectedUsage function, covering scenarios for combined message and summarization usage, summarization-only usage, and message-only usage. These tests ensure correct document handling and token rollup calculations, enhancing test coverage and validating the behavior of the usage tracking logic. * refactor(Chat): enhance clipboard copy functionality and type definitions in Summary component Updated the Summary component to improve the clipboard copy functionality by handling clipboard permission errors. Refactored type definitions for SummaryProps to use a more specific type, enhancing type safety. Adjusted the SummaryButton and FloatingSummaryBar components to accept isCopied and onCopy props, promoting better separation of concerns and reusability. * chore(translations): remove unused "Expand Summary" key from English translations Deleted the "Expand Summary" key from the English translation file to streamline the localization resources and improve clarity in the user interface. This change helps maintain an organized and efficient translation structure. * refactor: adjust token counting for Claude model to account for API discrepancies Implemented a correction factor for token counting when using the Claude model, addressing discrepancies between Anthropic's API and local tokenizer results. This change ensures accurate token counts by applying a scaling factor, improving the reliability of token-related functionalities. * refactor(agents): implement token count adjustment for Claude model messages Added a method to adjust token counts for messages processed by the Claude model, applying a correction factor to align with API expectations. This enhancement improves the accuracy of token counting, ensuring reliable functionality when interacting with the Claude model. * refactor(agents): token counting for media content in messages Introduced a new method to estimate token costs for image and document blocks in messages, improving the accuracy of token counting. This enhancement ensures that media content is properly accounted for, particularly for the Claude model, by integrating additional token estimation logic for various content types. Updated the token counting function to utilize this new method, enhancing overall reliability and functionality. * chore: fix missing import * fix(agents): clamp baseContextTokens and document reserve ratio change Prevent negative baseContextTokens when maxOutputTokens exceeds the context window (misconfigured models). Document the 10%→5% default reserve ratio reduction introduced alongside summarization. * fix(agents): include media tokens in hydrated token counts Add estimateMediaTokensForMessage to createTokenCounter so the hydration path (used by hydrateMissingIndexTokenCounts) matches the precomputed path in AgentClient.getTokenCountForMessage. Without this, messages containing images or documents were systematically undercounted during hydration, risking context window overflow. Add 34 unit tests covering all block-type branches of estimateMediaTokensForMessage. * fix(agents): include summarization output tokens in usage return value The returned output_tokens from recordCollectedUsage now reflects all billed LLM calls (message + summarization). Previously, summarization completions were billed but excluded from the returned metadata, causing a discrepancy between what users were charged and what the response message reported. * fix(tests): replace process.exit with proper Redis cleanup in e2e test The summarization E2E test used process.exit(0) to work around a Redis connection opened at import time, which killed the Jest runner and bypassed teardown. Use ioredisClient.quit() and keyvRedisClient.disconnect() for graceful cleanup instead. * fix(tests): update getConvo imports in OpenAI and response tests Refactor test files to import getConvo from the main models module instead of the Conversation submodule. This change ensures consistency across tests and simplifies the import structure, enhancing maintainability. * fix(clients): improve summary text validation in BaseClient Refactor the summary extraction logic to ensure that only non-empty summary texts are considered valid. This change enhances the robustness of the message processing by utilizing a dedicated method for summary text retrieval, improving overall reliability. * fix(config): replace z.any() with explicit union in summarization schema Model parameters (temperature, top_p, etc.) are constrained to primitive types rather than the policy-violating z.any(). * refactor(agents): deduplicate CLAUDE_TOKEN_CORRECTION constant Export from the TS source in packages/api and import in the JS client, eliminating the static class property that could drift out of sync. * refactor(agents): eliminate duplicate selfProvider in buildAgentContext selfProvider and provider were derived from the same expression with different type casts. Consolidated to a single provider variable. * refactor(agents): extract shared SSE handlers and restrict log levels - buildSummarizationHandlers() factory replaces triplicated handler blocks across responses.js and openai.js - agentLogHandlerObj exported from callbacks.js for consistent reuse - agentLogHandler restricted to an allowlist of safe log levels (debug, info, warn, error) instead of accepting arbitrary strings * fix(SSE): batch summarize deltas, add exhaustiveness check, conditional error announcement - ON_SUMMARIZE_DELTA coalesces rapid-fire renders via requestAnimationFrame instead of calling setMessages per chunk - Exhaustive never-check on TStepEvent catches unhandled variants at compile time when new StepEvents are added - ON_SUMMARIZE_COMPLETE error announcement only fires when a summary part was actually present and removed * feat(agents): persist instruction overhead in contextMeta and seed across runs Extend contextMeta with instructionOverhead and toolCount so the provider-observed instruction overhead is persisted on the response message and seeded into the pruner on subsequent runs. This enables the pruner to use a calibrated budget from the first call instead of waiting for a provider observation, preventing the ratio collapse caused by local tokenizer overestimating tool schema tokens. The seeded overhead is only used when encoding and tool count match between runs, ensuring stale values from different configurations are discarded. * test(agents): enhance OpenAI test mocks for summarization handlers Updated the OpenAI test suite to include additional mock implementations for summarization handlers, including buildSummarizationHandlers, markSummarizationUsage, and agentLogHandlerObj. This improves test coverage and ensures consistent behavior during testing. * fix(agents): address review findings for summarization v2 Cancel rAF on unmount to prevent stale Recoil writes from dead component context. Clear orphaned summarizing:true parts when ON_SUMMARIZE_COMPLETE arrives without a summary payload. Add null guard and safe spread to agentLogHandler. Handle Anthropic-format base64 image/* documents in estimateMediaTokensForMessage. Use role="region" for expandable summary content. Add .describe() to contextMeta Zod fields. Extract duplicate usage loop into helper. * refactor: simplify contextMeta to calibrationRatio + encoding only Remove instructionOverhead and toolCount from cross-run persistence — instruction tokens change too frequently between runs (prompt edits, tool changes) for a persisted seed to be reliable. The intra-run calibration in the pruner still self-corrects via provider observations. contextMeta now stores only the tokenizer-bias ratio and encoding, which are stable across instruction changes. * test(SSE): enhance useStepHandler tests for ON_SUMMARIZE_COMPLETE behavior Updated the test for ON_SUMMARIZE_COMPLETE to clarify that it finalizes the existing part with summarizing set to false when the summary is undefined. Added assertions to verify the correct behavior of message updates and the state of summary parts. * refactor(BaseClient): remove handleContextStrategy and truncateToolCallOutputs functions Eliminated the handleContextStrategy method from BaseClient to streamline message handling. Also removed the truncateToolCallOutputs function from the prompts module, simplifying the codebase and improving maintainability. * refactor: add AGENT_DEBUG_LOGGING option and refactor token count handling in BaseClient Introduced AGENT_DEBUG_LOGGING to .env.example for enhanced debugging capabilities. Refactored token count handling in BaseClient by removing the handleTokenCountMap method and simplifying token count updates. Updated AgentClient to log detailed token count recalculations and adjustments, improving traceability during message processing. * chore: update dependencies in package-lock.json and package.json files Bumped versions of several dependencies, including @librechat/agents to ^3.1.62 and various AWS SDK packages to their latest versions. This ensures compatibility and incorporates the latest features and fixes. * chore: imports order * refactor: extract summarization config resolution from buildAgentContext * refactor: rename and simplify summarization configuration shaping function * refactor: replace AgentClient token counting methods with single-pass pure utility Extract getTokenCount() and getTokenCountForMessage() from AgentClient into countFormattedMessageTokens(), a pure function in packages/api that handles text, tool_call, image, and document content types in one loop. - Decompose estimateMediaTokensForMessage into block-level helpers (estimateImageDataTokens, estimateImageBlockTokens, estimateDocumentBlockTokens) shared by both estimateMediaTokensForMessage and the new single-pass function - Remove redundant per-call getEncoding() resolution (closure captures once) - Remove deprecated gpt-3.5-turbo-0301 model branching - Drop this.getTokenCount guard from BaseClient.sendMessage * refactor: streamline token counting in createTokenCounter function Simplified the createTokenCounter function by removing the media token estimation and directly calculating the token count. This change enhances clarity and performance by consolidating the token counting logic into a single pass, while maintaining compatibility with Claude's token correction. * refactor: simplify summarization configuration types Removed the AppSummarizationConfig type and directly used SummarizationConfig in the AppConfig interface. This change streamlines the type definitions and enhances consistency across the codebase. * chore: import order * fix: summarization event handling in useStepHandler - Cancel pending summarizeDeltaRaf in clearStepMaps to prevent stale frames firing after map reset or component unmount - Move announcePolite('summarize_completed') inside the didFinalize guard so screen readers only announce when finalization actually occurs - Remove dead cleanup closure returned from stepHandler useCallback body that was never invoked by any caller * fix: estimate tokens for non-PDF/non-image base64 document blocks Previously estimateDocumentBlockTokens returned 0 for unrecognized MIME types (e.g. text/plain, application/json), silently underestimating context budget. Fall back to character-based heuristic or countTokens. * refactor: return cloned usage from markSummarizationUsage Avoid mutating LangChain's internal usage_metadata object by returning a shallow clone with the usage_type tag. Update all call sites in callbacks, openai, and responses controllers to use the returned value. * refactor: consolidate debug logging loops in buildMessages Merge the two sequential O(n) debug-logging passes over orderedMessages into a single pass inside the map callback where all data is available. * refactor: narrow SummaryContentPart.content type Replace broad Agents.MessageContentComplex[] with the specific Array<{ type: ContentTypes.TEXT; text: string }> that all producers and consumers already use, improving compile-time safety. * refactor: use single output array in recordCollectedUsage Have processUsageGroup append to a shared array instead of returning separate arrays that are spread into a third, reducing allocations. * refactor: use for...in in hydrateMissingIndexTokenCounts Replace Object.entries with for...in to avoid allocating an intermediate tuple array during token map hydration.	2026-03-21 14:28:56 -04:00
Danny Avila	67db0c1cb3	🗑️ chore: Remove Action Test Suite and Update Mock Implementations (#12268 ) - Deleted the Action test suite located in `api/models/Action.spec.js` to streamline the codebase. - Updated various test files to reflect changes in model mocks, consolidating mock implementations for user-related actions and enhancing clarity. - Improved consistency in test setups by aligning with the latest model updates and removing redundant mock definitions.	2026-03-21 14:28:55 -04:00
Danny Avila	dd72b7b17e	🔄 chore: Consolidate agent model imports across middleware and tests from rebase - Updated imports for `createAgent` and `getAgent` to streamline access from a unified `~/models` path. - Enhanced test files to reflect the new import structure, ensuring consistency and maintainability across the codebase. - Improved clarity by removing redundant imports and aligning with the latest model updates.	2026-03-21 14:28:55 -04:00
Atef Bellaaj	a0fed6173c	🗂️ refactor: Migrate S3 Storage to TypeScript in packages/api (#11947 ) * Migrate S3 storage module with unit and integration tests - Migrate S3 CRUD and image operations to packages/api/src/storage/s3/ - Add S3ImageService class with dependency injection - Add unit tests using aws-sdk-client-mock - Add integration tests with real s3 bucket (condition presence of AWS_TEST_BUCKET_NAME) * AI Review Findings Fixes * chore: tests and refactor S3 storage types - Added mock implementations for the 'sharp' library in various test files to improve image processing testing. - Updated type references in S3 storage files from MongoFile to TFile for consistency and type safety. - Refactored S3 CRUD operations to ensure proper handling of file types and improve code clarity. - Enhanced integration tests to validate S3 file operations and error handling more effectively. * chore: rename test file * Remove duplicate import of refreshS3Url * chore: imports order * fix: remove duplicate imports for S3 URL handling in UserController * fix: remove duplicate import of refreshS3FileUrls in files.js * test: Add mock implementations for 'sharp' and '@librechat/api' in UserController tests - Introduced mock functions for the 'sharp' library to facilitate image processing tests, including metadata retrieval and buffer conversion. - Enhanced mocking for '@librechat/api' to ensure consistent behavior in tests, particularly for the needsRefresh and getNewS3URL functions. --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-21 14:28:55 -04:00
Danny Avila	9e0592a236	📜 feat: Implement System Grants for Capability-Based Authorization (#11896 ) * feat: Implement System Grants for Role-Based Capabilities - Added a new `systemGrant` model and associated methods to manage role-based capabilities within the application. - Introduced middleware functions `hasCapability` and `requireCapability` to check user permissions based on their roles. - Updated the database seeding process to include system grants for the ADMIN role, ensuring all necessary capabilities are assigned on startup. - Enhanced type definitions and schemas to support the new system grant functionality, improving overall type safety and clarity in the codebase. * test: Add unit tests for capabilities middleware and system grant methods - Introduced comprehensive unit tests for the capabilities middleware, including `hasCapability` and `requireCapability`, ensuring proper permission checks based on user roles. - Added tests for the `SystemGrant` methods, verifying the seeding of system grants, capability granting, and revocation processes. - Enhanced test coverage for edge cases, including idempotency of grant operations and handling of unexpected errors in middleware. - Utilized mocks for database interactions to isolate tests and improve reliability. * refactor: Transition to Capability-Based Access Control - Replaced role-based access checks with capability-based checks across various middleware and routes, enhancing permission management. - Introduced `hasCapability` and `requireCapability` functions to streamline capability verification for user actions. - Updated relevant routes and middleware to utilize the new capability system, ensuring consistent permission enforcement. - Enhanced type definitions and added tests for the new capability functions, improving overall code reliability and maintainability. * test: Enhance capability-based access tests for ADMIN role - Updated tests to reflect the new capability-based access control, specifically for the ADMIN role. - Modified test descriptions to clarify that users with the MANAGE_AGENTS capability can bypass permission checks. - Seeded capabilities for the ADMIN role in multiple test files to ensure consistent permission checks across different routes and middleware. - Improved overall test coverage for capability verification, ensuring robust permission management. * test: Update capability tests for MCP server access - Renamed test to reflect the correct capability for bypassing permission checks, changing from MANAGE_AGENTS to MANAGE_MCP_SERVERS. - Updated seeding of capabilities for the ADMIN role to align with the new capability structure. - Ensured consistency in capability definitions across tests and middleware for improved permission management. * feat: Add hasConfigCapability for enhanced config access control - Introduced `hasConfigCapability` function to check user permissions for managing or reading specific config sections. - Updated middleware to export the new capability function, ensuring consistent access control across the application. - Enhanced unit tests to cover various scenarios for the new capability, improving overall test coverage and reliability. * fix: Update tenantId filter in createSystemGrantMethods - Added a condition to set tenantId filter to { $exists: false } when tenantId is null, ensuring proper handling of cases where tenantId is not provided. - This change improves the robustness of the system grant methods by explicitly managing the absence of tenantId in the filter logic. * fix: account deletion capability check - Updated the `canDeleteAccount` middleware to ensure that the `hasManageUsers` capability check only occurs if a user is present, preventing potential errors when the user object is undefined. - This change improves the robustness of the account deletion logic by ensuring proper handling of user permissions. * refactor: Optimize seeding of system grants for ADMIN role - Replaced sequential capability granting with parallel execution using Promise.all in the seedSystemGrants function. - This change improves performance and efficiency during the initialization of system grants, ensuring all capabilities are granted concurrently. * refactor: Simplify systemGrantSchema index definition - Removed the sparse option from the unique index on principalType, principalId, capability, and tenantId in the systemGrantSchema. - This change streamlines the index definition, potentially improving query performance and clarity in the schema design. * refactor: Reorganize role capability check in roles route - Moved the capability check for reading roles to occur after parsing the roleName, improving code clarity and structure. - This change ensures that the authorization logic is consistently applied before fetching role details, enhancing overall permission management. * refactor: Remove unused ISystemGrant interface from systemCapabilities.ts - Deleted the ISystemGrant interface as it was no longer needed, streamlining the code and improving clarity. - This change helps reduce clutter in the file and focuses on relevant capabilities for the system. * refactor: Migrate SystemCapabilities to data-schemas - Replaced imports of SystemCapabilities from 'librechat-data-provider' with imports from '@librechat/data-schemas' across multiple files. - This change centralizes the management of system capabilities, improving code organization and maintainability. * refactor: Update account deletion middleware and capability checks - Modified the `canDeleteAccount` middleware to ensure that the account deletion permission is only granted to users with the `MANAGE_USERS` capability, improving security and clarity in permission management. - Enhanced error logging for unauthorized account deletion attempts, providing better insights into permission issues. - Updated the `capabilities.ts` file to ensure consistent handling of user authentication checks, improving robustness in capability verification. - Refined type definitions in `systemGrant.ts` and `systemGrantMethods.ts` to utilize the `PrincipalType` enum, enhancing type safety and code clarity. * refactor: Extract principal ID normalization into a separate function - Introduced `normalizePrincipalId` function to streamline the normalization of principal IDs based on their type, enhancing code clarity and reusability. - Updated references in `createSystemGrantMethods` to utilize the new normalization function, improving maintainability and reducing code duplication. * test: Add unit tests for principalId normalization in systemGrant - Introduced tests for the `grantCapability`, `revokeCapability`, and `getCapabilitiesForPrincipal` methods to verify correct handling of principalId normalization between string and ObjectId formats. - Enhanced the `capabilities.ts` middleware to utilize the `PrincipalType` enum for improved type safety. - Added a new utility function `normalizePrincipalId` to streamline principal ID normalization logic, ensuring consistent behavior across the application. * feat: Introduce capability implications and enhance system grant methods - Added `CapabilityImplications` to define relationships between broader and implied capabilities, allowing for more intuitive permission checks. - Updated `createSystemGrantMethods` to expand capability queries to include implied capabilities, improving authorization logic. - Enhanced `systemGrantSchema` to include an `expiresAt` field for future TTL enforcement of grants, and added validation to ensure `tenantId` is not set to null. - Documented authorization requirements for prompt group and prompt deletion methods to clarify access control expectations. * test: Add unit tests for canDeleteAccount middleware - Introduced unit tests for the `canDeleteAccount` middleware to verify account deletion permissions based on user roles and capabilities. - Covered scenarios for both allowed and blocked account deletions, including checks for ADMIN users with the `MANAGE_USERS` capability and handling of undefined user cases. - Enhanced test structure to ensure clarity and maintainability of permission checks in the middleware. * fix: Add principalType enum validation to SystemGrant schema Without enum validation, any string value was accepted for principalType and silently stored. Invalid documents would never match capability queries, creating phantom grants impossible to diagnose without raw DB inspection. All other ACL models in the codebase validate this field. * fix: Replace seedSystemGrants Promise.all with bulkWrite for concurrency safety When two server instances start simultaneously (K8s rolling deploy, PM2 cluster), both call seedSystemGrants. With Promise.all + findOneAndUpdate upsert, both instances may attempt to insert the same documents, causing E11000 duplicate key errors that crash server startup. bulkWrite with ordered:false handles concurrent upserts gracefully and reduces 17 individual round trips to a single network call. The returned documents (previously discarded) are no longer fetched. * perf: Add AsyncLocalStorage per-request cache for capability checks Every hasCapability call previously required 2 DB round trips (getUserPrincipals + SystemGrant.exists) — replacing what were O(1) string comparisons. Routes like patchPromptGroup triggered this twice, and hasConfigCapability's fallback path resolved principals twice. This adds a per-request AsyncLocalStorage cache that: - Caches resolved principals (same for all checks within one request) - Caches capability check results (same user+cap = same answer) - Automatically scoped to request lifetime (no stale grants) - Falls through to DB when no store exists (background jobs, tests) - Requires no signature changes to hasCapability The capabilityContextMiddleware is registered at the app level before all routes, initializing a fresh store per request. * fix: Add error handling for inline hasCapability calls canDeleteAccount, fetchAssistants, and validateAuthor all call hasCapability without try-catch. These were previously O(1) string comparisons that could never throw. Now they hit the database and can fail on connection timeout or transient errors. Wrap each call in try-catch, defaulting to deny (false) on error. This ensures a DB hiccup returns a clean 403 instead of an unhandled 500 with a stack trace. * test: Add canDeleteAccount DB-error resilience test Tests that hasCapability rejection (e.g., DB timeout) results in a clean 403 rather than an unhandled exception. Validates the error handling added in the previous commit. * refactor: Use barrel import for hasCapability in validateAuthor Import from ~/server/middleware barrel instead of directly from ~/server/middleware/roles/capabilities for consistency with other non-middleware consumers. Files within the middleware barrel itself must continue using direct imports to avoid circular requires. * refactor: Remove misleading pre('save') hook from SystemGrant schema The pre('save') hook normalized principalId for USER/GROUP principals, but the primary write path (grantCapability) uses findOneAndUpdate — which does not trigger save hooks. The normalization was already handled explicitly in grantCapability itself. The hook created a false impression of schema-level enforcement that only covered save()/create() paths. Replace with a comment documenting that all writes must go through grantCapability. * feat: Add READ_ASSISTANTS capability to complete manage/read pair Every other managed resource had a paired READ_X / MANAGE_X capability except assistants. This adds READ_ASSISTANTS and registers the MANAGE_ASSISTANTS → READ_ASSISTANTS implication in CapabilityImplications, enabling future read-only assistant visibility grants. * chore: Reorder systemGrant methods for clarity Moved hasCapabilityForPrincipals to a more logical position in the returned object of createSystemGrantMethods, improving code readability. This change also maintains the inclusion of seedSystemGrants in the export, ensuring all necessary methods are available. * fix: Wrap seedSystemGrants in try-catch to avoid blocking startup Seeding capabilities is idempotent and will succeed on the next restart. A transient DB error during seeding should not prevent the server from starting — log the error and continue. * refactor: Improve capability check efficiency and add audit logging Move hasCapability calls after cheap early-exits in validateAuthor and fetchAssistants so the DB check only runs when its result matters. Add logger.debug on every capability bypass grant across all 7 call sites for auditability, and log errors in catch blocks instead of silently swallowing them. * test: Add integration tests for AsyncLocalStorage capability caching Exercises the full vertical — ALS context, generateCapabilityCheck, real getUserPrincipals, real hasCapabilityForPrincipals, real MongoDB via MongoMemoryServer. Covers per-request caching, cross-context isolation, concurrent request isolation, negative caching, capability implications, tenant scoping, group-based grants, and requireCapability middleware. * test: Add systemGrant data-layer and ALS edge-case integration tests systemGrant.spec.ts (51 tests): Full integration tests for all systemGrant methods against real MongoDB — grant/revoke lifecycle, principalId normalization (string→ObjectId for USER/GROUP, string for ROLE), capability implications (both directions), tenant scoping, schema validation (null tenantId, invalid enum, required fields, unique compound index). capabilities.integration.spec.ts (27 tests): Adds ALS edge cases — missing context degrades gracefully with no caching (background jobs, child processes), nested middleware creates independent inner context, optional-chaining safety when store is undefined, mid-request grant changes are invisible due to result caching, requireCapability works without ALS, and interleaved concurrent contexts maintain isolation. * fix: Add worker thread guards to capability ALS usage Detect when hasCapability or capabilityContextMiddleware is called from a worker thread (where ALS context does not propagate from the parent). hasCapability logs a warn-once per factory instance; the middleware logs an error since mounting Express middleware in a worker is likely a misconfiguration. Both continue to function correctly — the guard is observability, not a hard block. * fix: Include tenantId in ALS principal cache key for tenant isolation The principal cache key was user.id:user.role, which would reuse cached principals across tenants for the same user within a request. When getUserPrincipals gains tenant-scoped group resolution, principals from tenant-a would incorrectly serve tenant-b checks. Changed to user.id:user.role:user.tenantId to prevent cross-tenant cache hits. Adds integration test proving separate principal lookups per tenantId. * test: Remove redundant mocked capabilities.spec.js The JS wrapper test (7 tests, all mocked) is a strict subset of capabilities.integration.spec.ts (28 tests, real MongoDB). Every scenario it covered — hasCapability true/false, tenantId passthrough, requireCapability 403/500, error handling — is tested with higher fidelity in the integration suite. * test: Replace mocked canDeleteAccount tests with real MongoDB integration Remove hasCapability mock — tests now exercise the full capability chain against real MongoDB (getUserPrincipals, hasCapabilityForPrincipals, SystemGrant collection). Only mocks remaining are logger and cache. Adds new coverage: admin role without grant is blocked, user-level grant bypasses deletion restriction, null user handling. * test: Add comprehensive tests for ACL entry management and user group methods Introduces new tests for `deleteAclEntries`, `bulkWriteAclEntries`, and `findPublicResourceIds` in `aclEntry.spec.ts`, ensuring proper functionality for deleting and bulk managing ACL entries. Additionally, enhances `userGroup.spec.ts` with tests for finding groups by ID and name pattern, including external ID matching and source filtering. These changes improve coverage and validate the integrity of ACL and user group operations against real MongoDB interactions. * refactor: Update capability checks and logging for better clarity and error handling Replaced `MANAGE_USERS` with `ACCESS_ADMIN` in the `canDeleteAccount` middleware and related tests to align with updated permission structure. Enhanced logging in various middleware functions to use `logger.warn` for capability check failures, providing clearer error messages. Additionally, refactored capability checks in the `patchPromptGroup` and `validateAuthor` functions to improve readability and maintainability. This commit also includes adjustments to the `systemGrant` methods to implement retry logic for transient failures during capability seeding, ensuring robustness in the face of database errors. * refactor: Enhance logging and retry logic in seedSystemGrants method Updated the logging format in the seedSystemGrants method to include error messages for better clarity. Improved the retry mechanism by explicitly mocking multiple failures in tests, ensuring robust error handling during transient database issues. Additionally, refined imports in the systemGrant schema for better type management. * refactor: Consolidate imports in canDeleteAccount middleware Merged logger and SystemCapabilities imports from the data-schemas module into a single line for improved readability and maintainability of the code. This change streamlines the import statements in the canDeleteAccount middleware. * test: Enhance systemGrant tests for error handling and capability validation Added tests to the systemGrant methods to handle various error scenarios, including E11000 race conditions, invalid ObjectId strings for USER and GROUP principals, and invalid capability strings. These enhancements improve the robustness of the capability granting and revoking logic, ensuring proper error propagation and validation of inputs. * fix: Wrap hasCapability calls in deny-by-default try-catch at remaining sites canAccessResource, files.js, and roles.js all had hasCapability inside outer try-catch blocks that returned 500 on DB failure instead of falling through to the regular ACL check. This contradicts the deny-by-default pattern used everywhere else. Also removes raw error.message from the roles.js 500 response to prevent internal host/connection info leaking to clients. * fix: Normalize user ID in canDeleteAccount before passing to hasCapability requireCapability normalizes req.user.id via _id?.toString() fallback, but canDeleteAccount passed raw req.user directly. If req.user.id is absent (some auth layers only populate _id), getUserPrincipals received undefined, silently returning empty principals and blocking the bypass. * fix: Harden systemGrant schema and type safety - Reject empty string tenantId in schema validator (was only blocking null; empty string silently orphaned documents) - Fix reverseImplications to use BaseSystemCapability[] instead of string[], preserving the narrow discriminated type - Document READ_ASSISTANTS as reserved/unenforced * test: Use fake timers for seedSystemGrants retry tests and add tenantId validation - Switch retry tests to jest.useFakeTimers() to eliminate 3+ seconds of real setTimeout delays per test run - Add regression test for empty-string tenantId rejection * docs: Add TODO(#12091) comments for tenant-scoped capability gaps In multi-tenant mode, platform-level grants (no tenantId) won't match tenant-scoped queries, breaking admin access. getUserPrincipals also returns cross-tenant group memberships. Both need fixes in #12091.	2026-03-21 14:28:54 -04:00
Danny Avila	0412f05daf	🪢 chore: Consolidate Pricing and Tx Imports After tx.js Module Removal (#12086 ) * 🧹 chore: resolve imports due to rebase * chore: Update model mocks in unit tests for consistency - Consolidated model mock implementations across various test files to streamline setup and reduce redundancy. - Removed duplicate mock definitions for `getMultiplier` and `getCacheMultiplier`, ensuring a unified approach in `recordCollectedUsage.spec.js`, `openai.spec.js`, `responses.unit.spec.js`, and `abortMiddleware.spec.js`. - Enhanced clarity and maintainability of test files by aligning mock structures with the latest model updates. * fix: Safeguard token credit checks in transaction tests - Updated assertions in `transaction.spec.ts` to handle potential null values for `updatedBalance` by using optional chaining. - Enhanced robustness of tests related to token credit calculations, ensuring they correctly account for scenarios where the balance may not be found. * chore: transaction methods with bulk insert functionality - Introduced `bulkInsertTransactions` method in `transaction.ts` to facilitate batch insertion of transaction documents. - Updated test file `transactions.bulk-parity.spec.ts` to utilize new pricing function assignments and handle potential null values in calculations, improving test robustness. - Refactored pricing function initialization for clarity and consistency. * refactor: Enhance type definitions and introduce new utility functions for model matching - Added `findMatchingPattern` and `matchModelName` utility functions to improve model name matching logic in transaction methods. - Updated type definitions for `findMatchingPattern` to accept a more specific tokensMap structure, enhancing type safety. - Refactored `dbMethods` initialization in `transactions.bulk-parity.spec.ts` to include the new utility functions, improving test clarity and functionality. * refactor: Update database method imports and enhance transaction handling - Refactored `abortMiddleware.js` to utilize centralized database methods for message handling and conversation retrieval, improving code consistency. - Enhanced `bulkInsertTransactions` in `transaction.ts` to handle empty document arrays gracefully and added error logging for better debugging. - Updated type definitions in `transactions.ts` to enforce stricter typing for token types, enhancing type safety across transaction methods. - Improved test setup in `transactions.bulk-parity.spec.ts` by refining pricing function assignments and ensuring robust handling of potential null values. * refactor: Update database method references and improve transaction multiplier handling - Refactored `client.js` to update database method references for `bulkInsertTransactions` and `updateBalance`, ensuring consistency in method usage. - Enhanced transaction multiplier calculations in `transaction.spec.ts` to provide fallback values for write and read multipliers, improving robustness in cost calculations across structured token spending tests.	2026-03-21 14:28:53 -04:00
Danny Avila	8ba2bde5c1	📦 refactor: Consolidate DB models, encapsulating Mongoose usage in `data-schemas` (#11830 ) * chore: move database model methods to /packages/data-schemas * chore: add TypeScript ESLint rule to warn on unused variables * refactor: model imports to streamline access - Consolidated model imports across various files to improve code organization and reduce redundancy. - Updated imports for models such as Assistant, Message, Conversation, and others to a unified import path. - Adjusted middleware and service files to reflect the new import structure, ensuring functionality remains intact. - Enhanced test files to align with the new import paths, maintaining test coverage and integrity. * chore: migrate database models to packages/data-schemas and refactor all direct Mongoose Model usage outside of data-schemas * test: update agent model mocks in unit tests - Added `getAgent` mock to `client.test.js` to enhance test coverage for agent-related functionality. - Removed redundant `getAgent` and `getAgents` mocks from `openai.spec.js` and `responses.unit.spec.js` to streamline test setup and reduce duplication. - Ensured consistency in agent mock implementations across test files. * fix: update types in data-schemas * refactor: enhance type definitions in transaction and spending methods - Updated type definitions in `checkBalance.ts` to use specific request and response types. - Refined `spendTokens.ts` to utilize a new `SpendTxData` interface for better clarity and type safety. - Improved transaction handling in `transaction.ts` by introducing `TransactionResult` and `TxData` interfaces, ensuring consistent data structures across methods. - Adjusted unit tests in `transaction.spec.ts` to accommodate new type definitions and enhance robustness. * refactor: streamline model imports and enhance code organization - Consolidated model imports across various controllers and services to a unified import path, improving code clarity and reducing redundancy. - Updated multiple files to reflect the new import structure, ensuring all functionalities remain intact. - Enhanced overall code organization by removing duplicate import statements and optimizing the usage of model methods. * feat: implement loadAddedAgent and refactor agent loading logic - Introduced `loadAddedAgent` function to handle loading agents from added conversations, supporting multi-convo parallel execution. - Created a new `load.ts` file to encapsulate agent loading functionalities, including `loadEphemeralAgent` and `loadAgent`. - Updated the `index.ts` file to export the new `load` module instead of the deprecated `loadAgent`. - Enhanced type definitions and improved error handling in the agent loading process. - Adjusted unit tests to reflect changes in the agent loading structure and ensure comprehensive coverage. * refactor: enhance balance handling with new update interface - Introduced `IBalanceUpdate` interface to streamline balance update operations across the codebase. - Updated `upsertBalanceFields` method signatures in `balance.ts`, `transaction.ts`, and related tests to utilize the new interface for improved type safety. - Adjusted type imports in `balance.spec.ts` to include `IBalanceUpdate`, ensuring consistency in balance management functionalities. - Enhanced overall code clarity and maintainability by refining type definitions related to balance operations. * feat: add unit tests for loadAgent functionality and enhance agent loading logic - Introduced comprehensive unit tests for the `loadAgent` function, covering various scenarios including null and empty agent IDs, loading of ephemeral agents, and permission checks. - Enhanced the `initializeClient` function by moving `getConvoFiles` to the correct position in the database method exports, ensuring proper functionality. - Improved test coverage for agent loading, including handling of non-existent agents and user permissions. * chore: reorder memory method exports for consistency - Moved `deleteAllUserMemories` to the correct position in the exported memory methods, ensuring a consistent and logical order of method exports in `memory.ts`.	2026-03-21 14:28:53 -04:00
Danny Avila	58f128bee7	🗑️ chore: Remove Deprecated Project Model and Associated Fields (#11773 ) * chore: remove projects and projectIds usage * chore: empty line linting * chore: remove isCollaborative property across agent models and related tests - Removed the isCollaborative property from agent models, controllers, and tests, as it is deprecated in favor of ACL permissions. - Updated related validation schemas and data provider types to reflect this change. - Ensured all references to isCollaborative were stripped from the codebase to maintain consistency and clarity.	2026-03-21 14:28:53 -04:00
Danny Avila	38521381f4	🐘 feat: FerretDB Compatibility (#11769 ) * feat: replace unsupported MongoDB aggregation operators for FerretDB compatibility Replace $lookup, $unwind, $sample, $replaceRoot, and $addFields aggregation stages which are unsupported on FerretDB v2.x (postgres-documentdb backend). - Prompt.js: Replace $lookup/$unwind/$project pipelines with find().select().lean() + attachProductionPrompts() batch helper. Replace $group/$replaceRoot/$sample in getRandomPromptGroups with distinct() + Fisher-Yates shuffle. - Agent/Prompt migration scripts: Replace $lookup anti-join pattern with distinct() + $nin two-step queries for finding un-migrated resources. All replacement patterns verified against FerretDB v2.7.0. * fix: use $pullAll for simple array removals, fix memberIds type mismatches Replace $pull with $pullAll for exact-value scalar array removals. Both operators work on MongoDB and FerretDB, but $pullAll is more explicit for exact matching (no condition expressions). Fix critical type mismatch bugs where ObjectId values were used against String[] memberIds arrays in Group queries: - config/delete-user.js: use string uid instead of ObjectId user._id - e2e/setup/cleanupUser.ts: convert userId.toString() before query Harden PermissionService.bulkUpdateResourcePermissions abort handling to prevent crash when abortTransaction is called after commitTransaction. All changes verified against FerretDB v2.7.0 and MongoDB Memory Server. * fix: harden transaction support probe for FerretDB compatibility Commit the transaction before aborting in supportsTransactions probe, and wrap abortTransaction in try-catch to prevent crashes when abort is called after a successful commit (observed behavior on FerretDB). * feat: add FerretDB compatibility test suite, retry utilities, and CI config Add comprehensive FerretDB integration test suite covering: - $pullAll scalar array operations - $pull with subdocument conditions - $lookup replacement (find + manual join) - $sample replacement (distinct + Fisher-Yates) - $bit and $bitsAllSet operations - Migration anti-join pattern - Multi-tenancy (useDb, scaling, write amplification) - Sharding proof-of-concept - Production operations (backup/restore, schema migration, deadlock retry) Add production retryWithBackoff utility for deadlock recovery during concurrent index creation on FerretDB/DocumentDB backends. Add UserController.spec.js tests for deleteUserController (runs in CI). Configure jest and eslint to isolate FerretDB tests from CI pipelines: - packages/data-schemas/jest.config.mjs: ignore misc/ directory - eslint.config.mjs: ignore packages/data-schemas/misc/ Include Docker Compose config for local FerretDB v2.7 + postgres-documentdb, dedicated jest/tsconfig for the test files, and multi-tenancy findings doc. * style: brace formatting in aclEntry.ts modifyPermissionBits * refactor: reorganize retry utilities and update imports - Moved retryWithBackoff utility to a new file `retry.ts` for better structure. - Updated imports in `orgOperations.ferretdb.spec.ts` to reflect the new location of retry utilities. - Removed old import statement for retryWithBackoff from index.ts to streamline exports. * test: add $pullAll coverage for ConversationTag and PermissionService Add integration tests for deleteConversationTag verifying $pullAll removes tags from conversations correctly, and for syncUserEntraGroupMemberships verifying $pullAll removes user from non-matching Entra groups while preserving local group membership. ---------	2026-03-21 14:28:49 -04:00
Danny Avila	365a0dc0f6	🩺 refactor: Surface Descriptive OCR Error Messages to Client (#12344 ) * fix: pass along error message when OCR fails Right now, if OCR fails, it just says "Error processing file" which isn't very helpful. The `error.message` does has helpful information in it, but our filter wasn't including the right case to pass it along. Now it does! * fix: extract shared upload error filter, apply to images route The 'Unable to extract text from' error was only allowlisted in the files route but not the images route, which also calls processAgentFileUpload. Extract the duplicated error filter logic into a shared resolveUploadErrorMessage utility in packages/api so both routes stay in sync. --------- Co-authored-by: Dan Lew <daniel@mightyacorn.com>	2026-03-20 17:10:25 -04:00
mfish911	4e5ae28fa9	📡 feat: Support Unauthenticated SMTP Relays (#12322 ) * allow smtp server that does not have authentication * fix: align checkEmailConfig with optional SMTP credentials and add tests Remove EMAIL_USERNAME/EMAIL_PASSWORD requirements from the hasSMTPConfig predicate in checkEmailConfig() so the rest of the codebase (login, startup checks, invite-user) correctly recognizes unauthenticated SMTP as a valid email configuration. Add a warning when only one of the two credential env vars is set, in both sendEmail.js and checkEmailConfig(), to catch partial misconfigurations early. Add test coverage for both the transporter auth assembly in sendEmail.js and the checkEmailConfig predicate in packages/api. Document in .env.example that credentials are optional for unauthenticated SMTP relays. --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-20 13:07:39 -04:00
JooyoungChoi14	54fc9c2c99	♾️ fix: Permanent Ban Cache and Expired Ban Cleanup Defects (#12324 ) * fix: preserve ban data object in checkBan to prevent permanent cache The !! operator on line 108 coerces the ban data object to a boolean, losing the expiresAt property. This causes: 1. Number(true.expiresAt) = NaN → expired bans never cleaned from banLogs 2. banCache.set(key, true, NaN) → Keyv stores with expires: null (permanent) 3. IP-based cache entries persist indefinitely, blocking unrelated users Fix: replace isBanned (boolean) with banData (original object) so that expiresAt is accessible for TTL calculation and proper cache expiry. * fix: address checkBan cleanup defects exposed by ban data fix The prior commit correctly replaced boolean coercion with the ban data object, but activated previously-dead cleanup code with several defects: - IP-only expired bans fell through cleanup without returning next(), re-caching with negative TTL (permanent entry) and blocking the user - Redis deployments used cache-prefixed keys for banLogs.delete(), silently failing since bans are stored at raw keys - banCache.set() calls were fire-and-forget, silently dropping errors - No guard for missing/invalid expiresAt reproduced the NaN TTL bug on legacy ban records Consolidate expired-ban cleanup into a single block that always returns next(), use raw keys (req.ip, userId) for banLogs.delete(), add an expiresAt validity guard, await cache writes with error logging, and parallelize independent I/O with Promise.all. Add 25 tests covering all checkBan code paths including the specific regressions for IP-only cleanup, Redis key mismatch, missing expiresAt, and cache write failures. --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-20 12:47:51 -04:00
Airam Hernández Hernández	96f6976e00	🪂 fix: Automatic `logout_hint` Fallback for Oversized OpenID Token URLs (#12326 ) * fix: automatic logout_hint fallback for long OpenID tokens Implements OIDC RP-Initiated Logout cascading strategy to prevent errors when id_token_hint makes logout URL too long. Automatically detects URLs exceeding configurable length and falls back to logout_hint only when URL is too long, preserving previous behavior when token is missing. Adds OPENID_MAX_LOGOUT_URL_LENGTH environment variable. Comprehensive test coverage with 20 tests. Works with any OpenID provider. * fix: address review findings for OIDC logout URL length fallback - Replace two-boolean tri-state (useIdTokenHint/urlTooLong) with a single string discriminant ('use_token'\|'too_long'\|'no_token') for clarity - Fix misleading warning: differentiate 'url too long + no client_id' from 'no token + no client_id' so operators get actionable advice - Strict env var parsing: reject partial numeric strings like '500abc' that Number.parseInt silently accepted; use regex + Number() instead - Pre-compute projected URL length from base URL + token length (JWT chars are URL-safe), eliminating the set-then-delete mutation pattern - Extract parseMaxLogoutUrlLength helper for validation and early return - Add tests: invalid env values, url-too-long + missing OPENID_CLIENT_ID, boundary condition (exact max vs max+1), cookie-sourced long token - Remove redundant try/finally in 'respects custom limit' test - Use empty value in .env.example to signal optional config (default: 2000) --------- Co-authored-by: Airam Hernández Hernández <airam.hernandez@intelequia.com> Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-20 12:46:57 -04:00
Brad Russell	ecd6d76bc8	🚦 fix: ERR_ERL_INVALID_IP_ADDRESS and IPv6 Key Collisions in IP Rate Limiters (#12319 ) * fix: Add removePorts keyGenerator to all IP-based rate limiters Six IP-based rate limiters are missing the `keyGenerator: removePorts` option that is already used by the auth-related limiters (login, register, resetPassword, verifyEmail). Without it, reverse proxies that include ports in X-Forwarded-For headers cause ERR_ERL_INVALID_IP_ADDRESS errors from express-rate-limit. Fixes #12318 * fix: make removePorts IPv6-safe to prevent rate-limit key collisions The original regex `/:\d+[^:]$/` treated the last colon-delimited segment of bare IPv6 addresses as a port, mangling valid IPs (e.g. `::1` → `::`, `2001:db8::1` → `2001:db8::`). Distinct IPv6 clients could collapse into the same rate-limit bucket. Use `net.isIP()` as a fast path for already-valid IPs, then match bracketed IPv6+port and IPv4+port explicitly. Bare IPv6 addresses are now returned unchanged. Also fixes pre-existing property ordering inconsistency in ttsLimiters.js userLimiterOptions (keyGenerator before store). refactor: move removePorts to packages/api as TypeScript, fix import order - Move removePorts implementation to packages/api/src/utils/removePorts.ts with proper Express Request typing - Reduce api/server/utils/removePorts.js to a thin re-export from @librechat/api for backward compatibility - Consolidate removePorts import with limiterCache from @librechat/api in all 6 limiter files, fixing import order (package imports shortest to longest, local imports longest to shortest) - Remove narrating inline comments per code style guidelines --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-19 21:48:03 -04:00
Danny Avila	1ecff83b20	🪦 fix: ACL-Safe User Account Deletion for Agents, Prompts, and MCP Servers (#12314 ) * fix: use ACL ownership for prompt group cleanup on user deletion deleteUserPrompts previously called getAllPromptGroups with only an author filter, which defaults to searchShared=true and drops the author filter for shared/global project entries. This caused any user deleting their account to strip shared prompt group associations and ACL entries for other users. Replace the author-based query with ACL-based ownership lookup: - Find prompt groups where the user has OWNER permission (DELETE bit) - Only delete groups where the user is the sole owner - Preserve multi-owned groups and their ACL entries for other owners * fix: use ACL ownership for agent cleanup on user deletion deleteUserAgents used the deprecated author field to find and delete agents, then unconditionally removed all ACL entries for those agents. This could destroy ACL entries for agents shared with or co-owned by other users. Replace the author-based query with ACL-based ownership lookup: - Find agents where the user has OWNER permission (DELETE bit) - Only delete agents where the user is the sole owner - Preserve multi-owned agents and their ACL entries for other owners - Also clean up handoff edges referencing deleted agents * fix: add MCP server cleanup on user deletion User deletion had no cleanup for MCP servers, leaving solely-owned servers orphaned in the database with dangling ACL entries for other users. Add deleteUserMcpServers that follows the same ACL ownership pattern as prompt groups and agents: find servers with OWNER permission, check for sole ownership, and only delete those with no other owners. * style: fix prettier formatting in Prompt.spec.js * refactor: extract getSoleOwnedResourceIds to PermissionService The ACL sole-ownership detection algorithm was duplicated across deleteUserPrompts, deleteUserAgents, and deleteUserMcpServers. Centralizes the three-step pattern (find owned entries, find other owners, compute sole-owned set) into a single reusable utility. * refactor: use getSoleOwnedResourceIds in all deletion functions - Replace inline ACL queries with the centralized utility - Remove vestigial _req parameter from deleteUserPrompts - Use Promise.all for parallel project removal instead of sequential awaits - Disconnect live MCP sessions and invalidate tool cache before deleting sole-owned MCP server documents - Export deleteUserMcpServers for testability * test: improve deletion test coverage and quality - Move deleteUserPrompts call to beforeAll to eliminate execution-order dependency between tests - Standardize on test() instead of it() for consistency in Prompt.spec.js - Add assertion for deleting user's own ACL entry preservation on multi-owned agents - Add deleteUserMcpServers integration test suite with 6 tests covering sole-owner deletion, multi-owner preservation, session disconnect, cache invalidation, model-not-registered guard, and missing MCPManager - Add PermissionService mock to existing deleteUser.spec.js to fix import chain * fix: add legacy author-based fallback for unmigrated resources Resources created before the ACL system have author set but no AclEntry records. The sole-ownership detection returns empty for these, causing deleteUserPrompts, deleteUserAgents, and deleteUserMcpServers to silently skip them — permanently orphaning data on user deletion. Add a fallback that identifies author-owned resources with zero ACL entries (truly unmigrated) and includes them in the deletion set. This preserves the multi-owner safety of the ACL path while ensuring pre-ACL resources are still cleaned up regardless of migration status. * style: fix prettier formatting across all changed files * test: add resource type coverage guard for user deletion Ensures every ResourceType in the ACL system has a corresponding cleanup handler wired into deleteUserController. When a new ResourceType is added (e.g. WORKFLOW), this test fails immediately, preventing silent data orphaning on user account deletion. * style: fix import order in PermissionService destructure * test: add opt-out set and fix test lifecycle in coverage guard Add NO_USER_CLEANUP_NEEDED set for resource types that legitimately require no per-user deletion. Move fs.readFileSync into beforeAll so path errors surface as clean test failures instead of unhandled crashes.	2026-03-19 17:46:14 -04:00
Danny Avila	f380390408	🛡️ fix: Prevent loop in ChatGPT import on Cyclic Parent Graphs (#12313 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details Cap adjustTimestampsForOrdering to N passes and add cycle detection to findValidParent, preventing DoS via crafted ChatGPT export files with cyclic parentMessageId relationships. Add breakParentCycles to sever cyclic back-edges before saving, ensuring structurally valid message trees are persisted to the DB.	2026-03-19 17:15:12 -04:00
Danny Avila	39f5f83a8a	🔌 fix: Isolate Code-Server HTTP Agents to Prevent Socket Pool Contamination (#12311 ) * 🔧 fix: Isolate HTTP agents for code-server axios requests Prevents socket hang up after 5s on Node 19+ when code executor has file attachments. follow-redirects (axios dep) leaks `socket.destroy` as a timeout listener on TCP sockets; with Node 19+ defaulting to keepAlive: true, tainted sockets re-enter the global pool and destroy active node-fetch requests in CodeExecutor after the idle timeout. Uses dedicated http/https agents with keepAlive: false for all axios calls targeting CODE_BASEURL in crud.js and process.js. Closes #12298 * ♻️ refactor: Extract code-server HTTP agents to shared module - Move duplicated agent construction from crud.js and process.js into a shared agents.js module to eliminate DRY violation - Switch process.js from raw `require('axios')` to `createAxiosInstance()` for proxy configuration parity with crud.js - Fix import ordering in process.js (agent constants no longer split imports) - Add 120s timeout to uploadCodeEnvFile (was the only code-server call without a timeout) * ✅ test: Add regression tests for code-server socket isolation - Add crud.spec.js covering getCodeOutputDownloadStream and uploadCodeEnvFile (agent options, timeout, URL, error handling) - Add socket pool isolation tests to process.spec.js asserting keepAlive:false agents are forwarded to axios - Update process.spec.js mocks for createAxiosInstance() migration * ♻️ refactor: Move code-server agents to packages/api Relocate agents.js from api/server/services/Files/Code/ to packages/api/src/utils/code.ts per workspace conventions. Consumers now import codeServerHttpAgent/codeServerHttpsAgent from @librechat/api.	2026-03-19 16:16:57 -04:00
Danny Avila	93952f06b4	🧯 fix: Remove Revoked Agents from User Favorites (#12296 ) * 🧯 fix: Remove revoked agents from user favorites When agent access is revoked, the agent remained in the user's favorites causing repeated 403 errors on page load. Backend now cleans up favorites on permission revocation; frontend treats 403 like 404 and auto-removes stale agent references. * 🧪 fix: Address review findings for stale agent favorites cleanup - Guard cleanup effect with ref to prevent infinite loop on mutation failure (Finding 1) - Use validated results.revoked instead of raw request payload for revokedUserIds (Finding 3) - Stabilize staleAgentIds memo with string key to avoid spurious re-evaluation during drag-drop (Finding 5) - Add JSDoc with param types to removeRevokedAgentFromFavorites (Finding 7) - Return promise from removeRevokedAgentFromFavorites for testability - Add 7 backend tests covering revocation cleanup paths - Add 3 frontend tests for 403 handling and stale cleanup persistence	2026-03-19 15:15:10 -04:00
Danny Avila	2f09d29c71	🛂 fix: Validate `types` Query Param in People Picker Access Middleware (#12276 ) * 🛂 fix: Validate `types` query param in people picker access middleware checkPeoplePickerAccess only inspected `req.query.type` (singular), allowing callers to bypass type-specific permission checks by using the `types` (plural) parameter accepted by the controller. Now both `type` and `types` are collected and each requested principal type is validated against the caller's role permissions. * 🛂 refactor: Hoist valid types constant, improve logging, and add edge-case tests - Hoist VALID_PRINCIPAL_TYPES to module-level Set to avoid per-request allocation - Include both `type` and `types` in error log for debuggability - Restore detailed JSDoc documenting per-type permission requirements - Add missing .json() assertion on partial-denial test - Add edge-case tests: all-invalid types, empty string types, PrincipalType.PUBLIC * 🏷️ fix: Align TPrincipalSearchParams with actual controller API The stale type used `type` (singular) but the controller and all callers use `types` (plural array). Aligns with PrincipalSearchParams in types/queries.ts.	2026-03-17 02:46:11 -04:00
Danny Avila	9a64791e3e	🪢 fix: Action Domain Encoding Collision for HTTPS URLs (#12271 ) * fix: strip protocol from domain before encoding in `domainParser` All https:// (and http://) domains produced the same 10-char base64 prefix due to ENCODED_DOMAIN_LENGTH truncation, causing tool name collisions for agents with multiple actions. Strip the protocol before encoding so the base64 key is derived from the hostname. Add `legacyDomainEncode` to preserve the old encoding logic for backward-compatible matching of existing stored actions. * fix: backward-compatible tool matching in ToolService Update `getActionToolDefinitions` to match stored tools against both new and legacy domain encodings. Update `loadActionToolsForExecution` to resolve model-called tool names via a `normalizedToDomain` map that includes both encoding variants, with legacy fallback for request builder lookup. * fix: action route save/delete domain encoding issues Save routes now remove old tools matching either new or legacy domain encoding, preventing stale entries when an action's encoding changes on update. Delete routes no longer re-encode the already-encoded domain extracted from the stored actions array, which was producing incorrect keys and leaving orphaned tools. * test: comprehensive coverage for action domain encoding Rewrite ActionService tests to cover real matching patterns used by ToolService and action routes. Tests verify encode/decode round-trips, protocol stripping, backward-compatible tool name matching at both definition and execution phases, save-route cleanup of old/new encodings, delete-route domain extraction, and the collision fix for multi-action agents. * fix: add legacy domain compat to all execution paths, make legacyDomainEncode sync CRITICAL: processRequiredActions (assistants path) was not updated with legacy domain matching — existing assistants with https:// domain actions would silently fail post-deployment because domainMap only had new encoding. MAJOR: loadAgentTools definitionsOnly=false path had the same issue. Both now use a normalizedToDomain map with legacy+new entries and extract function names via the matched key (not the canonical domain). Also: make legacyDomainEncode synchronous (no async operations), store legacyNormalized in processedActionSets to eliminate recomputation in the per-tool fallback, and hoist domainSeparatorRegex to module level. * refactor: clarify domain variable naming and tool-filter helpers in action routes Rename shadowed 'domain' to 'encodedDomain' to separate raw URL from encoded key in both agent and assistant save routes. Rename shouldRemoveTool to shouldRemoveAgentTool / shouldRemoveAssistantTool to make the distinct data-shape guards explicit. Remove await on now-synchronous legacyDomainEncode. * test: expand coverage for all review findings - Add validateAndUpdateTool tests (protocol-stripping match logic) - Restore unicode domain encode/decode/round-trip tests - Add processRequiredActions matching pattern tests (assistants path) - Add legacy guard skip test for short bare hostnames - Add pre-normalized Set test for definition-phase optimization - Fix corrupt-cache test to assert typeof instead of toBeDefined - Verify legacyDomainEncode is synchronous (not a Promise) - Remove all await on legacyDomainEncode (now sync) 58 tests, up from 44. * fix: address follow-up review findings A-E A: Fix stale JSDoc @returns {Promise<string>} on now-synchronous legacyDomainEncode — changed to @returns {string}. B: Rename normalizedToDomain to domainLookupMap in processRequiredActions and loadAgentTools where keys are raw encoded domains (not normalized), avoiding confusion with loadActionToolsForExecution where keys ARE normalized. C: Pre-normalize actionToolNames into a Set<string> in getActionToolDefinitions, replacing O(signatures × tools) per-check .some() + .replace() with O(1) Set.has() lookups. D: Remove stripProtocol from ActionService exports — it is a one-line internal helper. Spec tests for it removed; behavior is fully covered by domainParser protocol-stripping tests. E: Fix pre-existing bug where processRequiredActions re-loaded action sets on every missing-tool iteration. The guard !actionSets.length always re-triggered because actionSets was reassigned to a plain object (whose .length is undefined). Replaced with a null-check on a dedicated actionSetsData variable. * fix: strip path and query from domain URLs in stripProtocol URLs like 'https://api.example.com/v1/endpoint?foo=bar' previously retained the path after protocol stripping, contaminating the encoded domain key. Now strips everything after the first '/' following the host, using string indexing instead of URL parsing to avoid punycode normalization of unicode hostnames. Closes Copilot review comments 1, 2, and 5.	2026-03-17 01:38:51 -04:00
Danny Avila	d17ac8f06d	🔏 fix: Remove Federated Tokens from OpenID Refresh Response (#12264 ) * 🔒 fix: Remove OpenID federated tokens from refresh endpoint response The refresh controller was attaching federatedTokens (including the refresh_token) to the user object returned in the JSON response, exposing HttpOnly-protected tokens to client-side JavaScript. The tokens are already stored server-side by setOpenIDAuthTokens and re-attached by the JWT strategy on authenticated requests. * 🔒 fix: Strip sensitive fields from OpenID refresh response user object The OpenID refresh path returned the raw findOpenIDUser result without field projection, unlike the non-OpenID path which excludes password, __v, totpSecret, and backupCodes via getUserById projection. Destructure out sensitive fields before serializing. Also strengthens the regression test: uses not.toHaveProperty for true property-absence checks (expect.anything() misses null/undefined), adds positive shape assertion, and DRYs up duplicated mock user setup.	2026-03-16 09:23:46 -04:00
Danny Avila	381ed8539b	🪪 fix: Enforce Conversation Ownership Checks in Remote Agent Controllers (#12263 ) * 🔒 fix: Validate conversation ownership in remote agent API endpoints Add user-scoped ownership checks for client-supplied conversation IDs in OpenAI-compatible and Open Responses controllers to prevent cross-tenant file/message loading via IDOR. * 🔒 fix: Harden ownership checks against type confusion and unhandled errors - Add typeof string validation before getConvo to block NoSQL operator injection (e.g. { "$gt": "" }) bypassing the ownership check - Move ownership checks inside try/catch so DB errors produce structured JSON error responses instead of unhandled promise rejections - Add string type validation for conversation_id and previous_response_id in the upstream TS request validators (defense-in-depth) * 🧪 test: Add coverage for conversation ownership validation in remote agent APIs - Fix broken getConvo mock in openai.spec.js (was missing entirely) - Add tests for: owned conversation, unowned (404), non-string type (400), absent conversation_id (skipped), and DB error (500) — both controllers	2026-03-16 09:19:48 -04:00
Danny Avila	acd07e8085	🗝️ fix: Exempt Admin-Trusted Domains from MCP OAuth Validation (#12255 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details * fix: exempt allowedDomains from MCP OAuth SSRF checks (#12254) The SSRF guard in validateOAuthUrl was context-blind — it blocked private/internal OAuth endpoints even for admin-trusted MCP servers listed in mcpSettings.allowedDomains. Add isHostnameAllowed() to domain.ts and skip SSRF checks in validateOAuthUrl when the OAuth endpoint hostname matches an allowed domain. * refactor: thread allowedDomains through MCP connection stack Pass allowedDomains from MCPServersRegistry through BasicConnectionOptions, MCPConnectionFactory, and into MCPOAuthHandler method calls so the OAuth layer can exempt admin-trusted domains from SSRF validation. * test: add allowedDomains bypass tests and fix registry mocks Add isHostnameAllowed unit tests (exact, wildcard, case-insensitive, private IPs). Add MCPOAuthSecurity tests covering the allowedDomains bypass for initiateOAuthFlow, refreshOAuthTokens, and revokeOAuthToken. Update registry mocks to include getAllowedDomains. * fix: enforce protocol/port constraints in OAuth allowedDomains bypass Replace isHostnameAllowed (hostname-only check) with isOAuthUrlAllowed which parses the full OAuth URL and matches against allowedDomains entries including protocol and explicit port constraints — mirroring isDomainAllowedCore's allowlist logic. Prevents a port-scoped entry like 'https://auth.internal:8443' from also exempting other ports. * test: cover auto-discovery and branch-3 refresh paths with allowedDomains Add three new integration tests using a real OAuth test server: - auto-discovered OAuth endpoints allowed when server IP is in allowedDomains - auto-discovered endpoints rejected when allowedDomains doesn't match - refreshOAuthTokens branch 3 (no clientInfo/config) with allowedDomains bypass Also rename describe block from ephemeral issue number to durable name. * docs: explain intentional absence of allowedDomains in completeOAuthFlow Prevents future contributors from assuming a missing parameter during security audits — URLs are pre-validated during initiateOAuthFlow. * test: update initiateOAuthFlow assertion for allowedDomains parameter * perf: avoid redundant URL parse for admin-trusted OAuth endpoints Move isOAuthUrlAllowed check before the hostname extraction so admin-trusted URLs short-circuit with a single URL parse instead of two. The hostname extraction (new URL) is now deferred to the SSRF-check path where it's actually needed.	2026-03-15 23:03:12 -04:00
Danny Avila	8e8fb01d18	🧱 fix: Enforce Agent Access Control on Context and OCR File Loading (#12253 ) * 🔏 fix: Apply agent access control filtering to context/OCR resource loading The context/OCR file path in primeResources fetched files by file_id without applying filterFilesByAgentAccess, unlike the file_search and execute_code paths. Add filterFiles dependency injection to primeResources and invoke it after getFiles to enforce consistent access control. * fix: Wire filterFilesByAgentAccess into all agent initialization callers Pass the filterFilesByAgentAccess function from the JS layer into the TS initializeAgent → primeResources chain via dependency injection, covering primary, handoff, added-convo, and memory agent init paths. * test: Add access control filtering tests for primeResources Cover filterFiles invocation with context/OCR files, verify filtering rejects inaccessible files, and confirm graceful fallback when filterFiles, userId, or agentId are absent. * fix: Guard filterFilesByAgentAccess against ephemeral agent IDs Ephemeral agents have no DB document, so getAgent returns null and the access map defaults to all-false, silently blocking all non-owned files. Short-circuit with isEphemeralAgentId to preserve the pass-through behavior for inline-built agents (memory, tool agents). * fix: Clean up resources.ts and JS caller import order Remove redundant optional chain on req.user.role inside user-guarded block, update primeResources JSDoc with filterFiles and agentId params, and reorder JS imports to longest-to-shortest per project conventions. * test: Strengthen OCR assertion and add filterFiles error-path test Use toHaveBeenCalledWith for the OCR filtering test to verify exact arguments after the OCR→context merge step. Add test for filterFiles rejection to verify graceful degradation (logs error, returns original tool_resources). * fix: Correct import order in addedConvo.js and initialize.js Sort by total line length descending: loadAddedAgent (91) before filterFilesByAgentAccess (84), loadAgentTools (91) before filterFilesByAgentAccess (84). * test: Add unit tests for filterFilesByAgentAccess and hasAccessToFilesViaAgent Cover every branch in permissions.js: ephemeral agent guard, missing userId/agentId/files early returns, all-owned short-circuit, mixed owned + non-owned with VIEW/no-VIEW, agent-not-found fail-closed, author path scoped to attached files, EDIT gate on delete, DB error fail-closed, and agent with no tool_resources. * test: Cover file.user undefined/null in permissions spec Files with no user field fall into the non-owned path and get run through hasAccessToFilesViaAgent. Add two cases: attached file with no user field is returned, unattached file with no user field is excluded.	2026-03-15 23:02:36 -04:00
Danny Avila	6f87b49df8	🛂 fix: Enforce Actions Capability Gate Across All Event-Driven Tool Loading Paths (#12252 ) * fix: gate action tools by actions capability in all code paths Extract resolveAgentCapabilities helper to eliminate 3x-duplicated capability resolution. Apply early action-tool filtering in both loadToolDefinitionsWrapper and loadAgentTools non-definitions path. Gate loadActionToolsForExecution in loadToolsForExecution behind an actionsEnabled parameter with a cache-based fallback. Replace the late capability guard in loadAgentTools with a hasActionTools check to avoid unnecessary loadActionSets DB calls and duplicate warnings. * fix: thread actionsEnabled through InitializedAgent type Add actionsEnabled to the loadTools callback return type, InitializedAgent, and the initializeAgent destructuring/return so callers can forward the resolved value to loadToolsForExecution without redundant getEndpointsConfig cache lookups. * fix: pass actionsEnabled from callers to loadToolsForExecution Thread actionsEnabled through the agentToolContexts map in initialize.js (primary and handoff agents) and through primaryConfig in the openai.js and responses.js controllers, avoiding per-tool-call capability re-resolution on the hot path. * test: add regression tests for action capability gating Test the real exported functions (resolveAgentCapabilities, loadAgentTools, loadToolsForExecution) with mocked dependencies instead of shadow re-implementations. Covers definition filtering, execution gating, actionsEnabled param forwarding, and fallback capability resolution. * test: use Constants.EPHEMERAL_AGENT_ID in ephemeral fallback test Replaces a string guess with the canonical constant to avoid fragility if the ephemeral detection heuristic changes. * fix: populate agentToolContexts for addedConvo parallel agents After processAddedConvo returns, backfill agentToolContexts for any agents in agentConfigs not already present, so ON_TOOL_EXECUTE for added-convo agents receives actionsEnabled instead of falling back to a per-call cache lookup.	2026-03-15 23:01:36 -04:00
Danny Avila	a26eeea592	🔏 fix: Enforce MCP Server Authorization on Agent Tool Persistence (#12250 ) * 🛡️ fix: Validate MCP tool authorization on agent create/update Agent creation and update accepted arbitrary MCP tool strings without verifying the user has access to the referenced MCP servers. This allowed a user to embed unauthorized server names in tool identifiers (e.g. "anything_mcp_<victimServer>"), causing mcpServerNames to be stored on the agent and granting consumeOnly access via hasAccessViaAgent(). Adds filterAuthorizedTools() that checks MCP tool strings against the user's accessible server configs (via getAllServerConfigs) before persisting. Applied to create, update, and duplicate agent paths. * 🛡️ fix: Harden MCP tool authorization and add test coverage Addresses review findings on the MCP agent tool authorization fix: - Wrap getMCPServersRegistry() in try/catch so uninitialized registry gracefully filters all MCP tools instead of causing a 500 (DoS risk) - Guard revertAgentVersionHandler: filter unauthorized MCP tools after reverting to a previous version snapshot - Preserve existing MCP tools on collaborative updates: only validate newly added tools, preventing silent stripping of tools the editing user lacks direct access to - Add audit logging (logger.warn) when MCP tools are rejected - Refactor to single-pass lazy-fetch (registry queried only on first MCP tool encountered) - Export filterAuthorizedTools for direct unit testing - Add 18 tests covering: authorized/unauthorized/mixed tools, registry unavailable fallback, create/update/duplicate/revert handler paths, collaborative update preservation, and mcpServerNames persistence * test: Add duplicate handler test, use Constants.mcp_delimiter, DB assertions - N1: Add duplicateAgentHandler integration test verifying unauthorized MCP tools are stripped from the cloned agent and mcpServerNames are correctly persisted in the database - N2: Replace all hardcoded '_mcp_' delimiter literals with Constants.mcp_delimiter to prevent silent false-positive tests if the delimiter value ever changes - N3: Add DB state assertion to the revert-with-strip test confirming persisted tools match the response after unauthorized tools are removed * fix: Enforce exact 2-segment format for MCP tool keys Reject MCP tool keys with multiple delimiters to prevent authorization/execution mismatch when `.pop()` vs `split[1]` extract different server names from the same key. * fix: Preserve existing MCP tools when registry is unavailable When the MCP registry is uninitialized (e.g. server restart), existing tools already persisted on the agent are preserved instead of silently stripped. New MCP tools are still rejected when the registry cannot verify them. Applies to duplicate and revert handlers via existingTools param; update handler already preserves existing tools via its diff logic.	2026-03-15 20:08:34 -04:00
Danny Avila	ad08df4db6	🔏 fix: Scope Agent-Author File Access to Attached Files Only (#12251 ) * 🛡️ fix: Scope agent-author file access to attached files only The hasAccessToFilesViaAgent helper short-circuited for agent authors, granting access to all requested file IDs without verifying they were attached to the agent's tool_resources. This enabled an IDOR where any agent author could delete arbitrary files by supplying their agent_id alongside unrelated file IDs. Now both the author and non-author paths check file IDs against the agent's tool_resources before granting access. * chore: Use Object.values/for...of and add JSDoc in getAttachedFileIds * test: Add boundary cases for agent file access authorization - Agent with no tool_resources denies all access (fail-closed) - Files across multiple resource types are all reachable - Author + isDelete: true still scopes to attached files only	2026-03-15 18:54:34 -04:00

1 2 3 4 5 ...

1124 commits