Commit graph

3 commits

Author SHA1 Message Date
Danny Avila
4b6d68b3b5
🎛️ feat: DB-Backed Per-Principal Config System (#12354)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
*  feat: Add Config schema, model, and methods for role-based DB config overrides

Add the database foundation for principal-based configuration overrides
(user, group, role) in data-schemas. Includes schema with tenantId and
tenant isolation, CRUD methods, and barrel exports.

* 🔧 fix: Add shebang and enforce LF line endings for git hooks

The pre-commit hook was missing #!/bin/sh, and core.autocrlf=true was
converting it to CRLF, both causing "Exec format error" on Windows.
Add .gitattributes to force LF for .husky/* and *.sh files.

*  feat: Add admin config API routes with section-level capability checks

Add /api/admin/config endpoints for managing per-principal config
overrides (user, group, role). Handlers in @librechat/api use DI pattern
with section-level hasConfigCapability checks for granular access control.

Supports full overrides replacement, per-field PATCH via dot-paths, field
deletion, toggle active, and listing.

* 🐛 fix: Move deleteConfigField fieldPath from URL param to request body

The path-to-regexp wildcard syntax (:fieldPath(*)) is not supported by
the version used in Express. Send fieldPath in the DELETE request body
instead, which also avoids URL-encoding issues with dotted paths.

*  feat: Wire config resolution into getAppConfig with override caching

Add mergeConfigOverrides utility in data-schemas for deep-merging DB
config overrides into base AppConfig by priority order.

Update getAppConfig to query DB for applicable configs when role/userId
is provided, with short-TTL caching and a hasAnyConfigs feature flag
for zero-cost when no DB configs exist.

Also: add unique compound index on Config schema, pass userId from
config middleware, and signal config changes from admin API handlers.

* 🔄 refactor: Extract getAppConfig logic into packages/api as TS service

Move override resolution, caching strategy, and signalConfigChange from
api/server/services/Config/app.js into packages/api/src/app/appConfigService.ts
using the DI factory pattern (createAppConfigService). The JS file becomes
a thin wiring layer injecting loadBaseConfig, cache, and DB dependencies.

* 🧹 chore: Rename configResolution.ts to resolution.ts

*  feat: Move admin types & capabilities to librechat-data-provider

Move SystemCapabilities, CapabilityImplications, and utility functions
(hasImpliedCapability, expandImplications) from data-schemas to
data-provider so they are available to external consumers like the
admin panel without a data-schemas dependency.

Add API-friendly admin types: TAdminConfig, TAdminSystemGrant,
TAdminAuditLogEntry, TAdminGroup, TAdminMember, TAdminUserSearchResult,
TCapabilityCategory, and CAPABILITY_CATEGORIES.

data-schemas re-exports these from data-provider and extends with
config-schema-derived types (ConfigSection, SystemCapability union).

Bump version to 0.8.500.

* feat: Add JSON-serializable admin config API response types to data-schemas

Add AdminConfig, AdminConfigListResponse, AdminConfigResponse, and
AdminConfigDeleteResponse types so both LibreChat API handlers and the
admin panel can share the same response contract. Bump version to 0.0.41.

* refactor: Move admin capabilities & types from data-provider to data-schemas

SystemCapabilities, CapabilityImplications, utility functions,
CAPABILITY_CATEGORIES, and admin API response types should not be in
data-provider as it gets compiled into the frontend bundle, exposing
the capability surface. Moved everything to data-schemas (server-only).

All consumers already import from @librechat/data-schemas, so no
import changes needed elsewhere. Consolidated duplicate AdminConfig
type (was in both config.ts and admin.ts).

* chore: Bump @librechat/data-schemas to 0.0.42

* refactor: Reorganize admin capabilities into admin/ and types/admin.ts

Split systemCapabilities.ts following data-schemas conventions:
- Types (BaseSystemCapability, SystemCapability, AdminConfig, etc.)
  → src/types/admin.ts
- Runtime code (SystemCapabilities, CapabilityImplications, utilities)
  → src/admin/capabilities.ts

Revert data-provider version to 0.8.401 (no longer modified).

* chore: Fix import ordering, rename appConfigService to service

- Rename app/appConfigService.ts → app/service.ts (directory provides context)
- Fix import order in admin/config.ts, types/admin.ts, types/config.ts
- Add naming convention to AGENTS.md

* feat: Add DB base config support (role/__base__)

- Add BASE_CONFIG_PRINCIPAL_ID constant for reserved base config doc
- getApplicableConfigs always includes __base__ in queries
- getAppConfig queries DB even without role/userId when DB configs exist
- Bump @librechat/data-schemas to 0.0.43

* fix: Address PR review issues for admin config

- Add listAllConfigs method; listConfigs endpoint returns all active
  configs instead of only __base__
- Normalize principalId to string in all config methods to prevent
  ObjectId vs string mismatch on user/group lookups
- Block __proto__ and all dunder-prefixed segments in field path
  validation to prevent prototype pollution
- Fix configVersion off-by-one: default to 0, guard pre('save') with
  !isNew, use $inc on findOneAndUpdate
- Remove unused getApplicableConfigs from admin handler deps

* fix: Enable tree-shaking for data-schemas, bump packages

- Switch data-schemas Rollup output to preserveModules so each source
  file becomes its own chunk; consumers (admin panel) can now import
  just the modules they need without pulling in winston/mongoose/etc.
- Add sideEffects: false to data-schemas package.json
- Bump data-schemas to 0.0.44, data-provider to 0.8.402

* feat: add capabilities subpath export to data-schemas

Adds `@librechat/data-schemas/capabilities` subpath export so browser
consumers can import BASE_CONFIG_PRINCIPAL_ID and capability constants
without pulling in Node.js-only modules (winston, async_hooks, etc.).

Bump version to 0.0.45.

* fix: include dist/ in data-provider npm package

Add explicit files field so npm includes dist/types/ in the published
package. Without this, the root .gitignore exclusion of dist/ causes
npm to omit type declarations, breaking TypeScript consumers.

* chore: bump librechat-data-provider to 0.8.403

* feat: add GET /api/admin/config/base for raw AppConfig

Returns the full AppConfig (YAML + DB base merged) so the admin panel
can display actual config field values and structure. The startup config
endpoint (/api/config) returns TStartupConfig which is a different shape
meant for the frontend app.

* chore: imports order

* fix: address code review findings for admin config

Critical:
- Fix clearAppConfigCache: was deleting from wrong cache store (CONFIG_STORE
  instead of APP_CONFIG), now clears BASE and HAS_DB_CONFIGS keys
- Eliminate race condition: patchConfigField and deleteConfigField now use
  atomic MongoDB $set/$unset with dot-path notation instead of
  read-modify-write cycles, removing the lost-update bug entirely
- Add patchConfigFields and unsetConfigField atomic DB methods

Major:
- Reorder cache check before principal resolution in getAppConfig so
  getUserPrincipals DB query only fires on cache miss
- Replace '' as ConfigSection with typed BROAD_CONFIG_ACCESS constant
- Parallelize capability checks with Promise.all instead of sequential
  awaits in for loops
- Use loose equality (== null) for cache miss check to handle both null
  and undefined returns from cache implementations
- Set HAS_DB_CONFIGS_KEY to true on successful config fetch

Minor:
- Remove dead pre('save') hook from config schema (all writes use
  findOneAndUpdate which bypasses document hooks)
- Consolidate duplicate type imports in resolution.ts
- Remove dead deepGet/deepSet/deepUnset functions (replaced by atomic ops)
- Add .sort({ priority: 1 }) to getApplicableConfigs query
- Rename _impliedBy to impliedByMap

* fix: self-referencing BROAD_CONFIG_ACCESS constant

* fix: replace type-cast sentinel with proper null parameter

Update hasConfigCapability to accept ConfigSection | null where null
means broad access check (MANAGE_CONFIGS or READ_CONFIGS only).
Removes the '' as ConfigSection type lie from admin config handlers.

* fix: remaining review findings + add tests

- listAllConfigs accepts optional { isActive } filter so admin listing
  can show inactive configs (#9)
- Standardize session application to .session(session ?? null) across
  all config DB methods (#15)
- Export isValidFieldPath and getTopLevelSection for testability
- Add 38 tests across 3 spec files:
  - config.spec.ts (api): path validation, prototype pollution rejection
  - resolution.spec.ts: deep merge, priority ordering, array replacement
  - config.spec.ts (data-schemas): full CRUD, ObjectId normalization,
    atomic $set/$unset, configVersion increment, toggle, __base__ query

* fix: address second code review findings

- Fix cross-user cache contamination: overrideCacheKey now handles
  userId-without-role case with its own cache key (#1)
- Add broad capability check before DB lookup in getConfig to prevent
  config existence enumeration (#2/#3)
- Move deleteConfigField fieldPath from request body to query parameter
  for proxy/load balancer compatibility (#5)
- Derive BaseSystemCapability from SystemCapabilities const instead of
  manual string union (#6)
- Return 201 on upsert creation, 200 on update (#11)
- Remove inline narration comments per AGENTS.md (#12)
- Type overrides as Partial<TCustomConfig> in DB methods and handler
  deps (#13)
- Replace double as-unknown-as casts in resolution.ts with generic
  deepMerge<T> (#14)
- Make override cache TTL injectable via AppConfigServiceDeps (#16)
- Add exhaustive never check in principalModel switch (#17)

* fix: remaining review findings — tests, rename, semantics

- Rename signalConfigChange → markConfigsDirty with JSDoc documenting
  the stale-window tradeoff and overrideCacheTtl knob
- Fix DEFAULT_OVERRIDE_CACHE_TTL naming convention
- Add createAppConfigService tests (14 cases): cache behavior, feature
  flag, cross-user key isolation, fallback on error, markConfigsDirty
- Add admin handler integration tests (13 cases): auth ordering,
  201/200 on create/update, fieldPath from query param, markConfigsDirty
  calls, capability checks

* fix: global flag corruption + empty overrides auth bypass

- Remove HAS_DB_CONFIGS_KEY=false optimization: a scoped query returning
  no configs does not mean no configs exist globally. Setting the flag
  false from a per-principal query short-circuited all subsequent users.
- Add broad manage capability check before section checks in
  upsertConfigOverrides: empty overrides {} no longer bypasses auth.

* test: add regression and invariant tests for config system

Regression tests:
- Bug 1: User A's empty result does not short-circuit User B's overrides
- Bug 2: Empty overrides {} returns 403 without MANAGE_CONFIGS

Invariant tests (applied across ALL handlers):
- All 5 mutation handlers call markConfigsDirty on success
- All 5 mutation handlers return 401 without auth
- All 5 mutation handlers return 403 without capability
- All 3 read handlers return 403 without capability

* fix: third review pass — all findings addressed

Service (service.ts):
- Restore HAS_DB_CONFIGS=false for base-only queries (no role/userId)
  so deployments with zero DB configs skip DB queries (#1)
- Resolve cache once at factory init instead of per-invocation (#8)
- Use BASE_CONFIG_PRINCIPAL_ID constant in overrideCacheKey (#10)
- Add JSDoc to clearAppConfigCache documenting stale-window (#4)
- Fix log message to not say "from YAML" (#14)

Admin handlers (config.ts):
- Use configVersion===1 for 201 vs 200, eliminating TOCTOU race (#2)
- Add Array.isArray guard on overrides body (#5)
- Import CapabilityUser from capabilities.ts, remove duplicate (#6)
- Replace as-unknown-as cast with targeted type assertion (#7)
- Add MAX_PATCH_ENTRIES=100 cap on entries array (#15)
- Reorder deleteConfigField to validate principalType first (#12)
- Export CapabilityUser from middleware/capabilities.ts

DB methods (config.ts):
- Remove isActive:true from patchConfigFields to prevent silent
  reactivation of disabled configs (#3)

Schema (config.ts):
- Change principalId from Schema.Types.Mixed to String (#11)

Tests:
- Add patchConfigField unsafe fieldPath rejection test (#9)
- Add base-only HAS_DB_CONFIGS=false test (#1)
- Update 201/200 tests to use configVersion instead of findConfig (#2)

* fix: add read handler 401 invariant tests + document flag behavior

- Add invariant: all 3 read handlers return 401 without auth
- Document on markConfigsDirty that HAS_DB_CONFIGS stays true after
  all configs are deleted until clearAppConfigCache or restart

* fix: remove HAS_DB_CONFIGS false optimization entirely

getApplicableConfigs([]) only queries for __base__, not all configs.
A deployment with role/group configs but no __base__ doc gets the
flag poisoned to false by a base-only query, silently ignoring all
scoped overrides. The optimization is not safe without a comprehensive
Config.exists() check, which adds its own DB cost. Removed entirely.

The flag is now write-once-true (set when configs are found or by
markConfigsDirty) and only cleared by clearAppConfigCache/restart.

* chore: reorder import statements in app.js for clarity

* refactor: remove HAS_DB_CONFIGS_KEY machinery entirely

The three-state flag (false/null/true) was the source of multiple bugs
across review rounds. Every attempt to safely set it to false was
defeated by getApplicableConfigs querying only a subset of principals.

Removed: HAS_DB_CONFIGS_KEY constant, all reads/writes of the flag,
markConfigsDirty (now a no-op concept), notifyChange wrapper, and all
tests that seeded false manually.

The per-user/role TTL cache (overrideCacheTtl, default 60s) is the
sole caching mechanism. On cache miss, getApplicableConfigs queries
the DB. This is one indexed query per user per TTL window — acceptable
for the config override use case.

* docs: rewrite admin panel remaining work with current state

* perf: cache empty override results to avoid repeated DB queries

When getApplicableConfigs returns no configs for a principal, cache
baseConfig under their override key with TTL. Without this, every
user with no per-principal overrides hits MongoDB on every request
after the 60s cache window expires.

* fix: add tenantId to cache keys + reject PUBLIC principal type

- Include tenantId in override cache keys to prevent cross-tenant
  config contamination. Single-tenant deployments (tenantId undefined)
  use '_' as placeholder — no behavior change for them.
- Reject PrincipalType.PUBLIC in admin config validation — PUBLIC has
  no PrincipalModel and is never resolved by getApplicableConfigs,
  so config docs for it would be dead data.
- Config middleware passes req.user.tenantId to getAppConfig.

* fix: fourth review pass findings

DB methods (config.ts):
- findConfigByPrincipal accepts { includeInactive } option so admin
  GET can retrieve inactive configs (#5)
- upsertConfig catches E11000 duplicate key on concurrent upserts and
  retries without upsert flag (#2)
- unsetConfigField no longer filters isActive:true, consistent with
  patchConfigFields (#11)
- Typed filter objects replace Record<string, unknown> (#12)

Admin handlers (config.ts):
- patchConfigField: serial broad capability check before Promise.all
  to pre-warm ALS principal cache, preventing N parallel DB calls (#3)
- isValidFieldPath rejects leading/trailing dots and consecutive
  dots (#7)
- Duplicate fieldPaths in patch entries return 400 (#8)
- DEFAULT_PRIORITY named constant replaces hardcoded 10 (#14)
- Admin getConfig and patchConfigField pass includeInactive to
  findConfigByPrincipal (#5)
- Route import uses barrel instead of direct file path (#13)

Resolution (resolution.ts):
- deepMerge has MAX_MERGE_DEPTH=10 guard to prevent stack overflow
  from crafted deeply nested configs (#4)

* fix: final review cleanup

- Remove ADMIN_PANEL_REMAINING.md (local dev notes with Windows paths)
- Add empty-result caching regression test
- Add tenantId to AdminConfigDeps.getAppConfig type
- Restore exhaustive never check in principalModel switch
- Standardize toggleConfigActive session handling to options pattern

* fix: validate priority in patchConfigField handler

Add the same non-negative number validation for priority that
upsertConfigOverrides already has. Without this, invalid priority
values could be stored via PATCH and corrupt merge ordering.

* chore: remove planning doc from PR

* fix: correct stale cache key strings in service tests

* fix: clean up service tests and harden tenant sentinel

- Remove no-op cache delete lines from regression tests
- Change no-tenant sentinel from '_' to '__default__' to avoid
  collision with a real tenant ID when multi-tenancy is enabled
- Remove unused CONFIG_STORE from AppConfigServiceDeps

* chore: bump @librechat/data-schemas to 0.0.46

* fix: block prototype-poisoning keys in deepMerge

Skip __proto__, constructor, and prototype keys during config merge
to prevent prototype pollution via PUT /api/admin/config overrides.
2026-03-25 19:39:29 -04:00
Danny Avila
6167ce6e57
🧪 chore: MCP Reconnect Storm Follow-Up Fixes and Integration Tests (#12172)
* 🧪 test: Add reconnection storm regression tests for MCPConnection

Introduced a comprehensive test suite for reconnection storm scenarios, validating circuit breaker, throttling, cooldown, and timeout fixes. The tests utilize real MCP SDK transports and a StreamableHTTP server to ensure accurate behavior under rapid connect/disconnect cycles and error handling for SSE 400/405 responses. This enhances the reliability of the MCPConnection by ensuring proper handling of reconnection logic and circuit breaker functionality.

* 🔧 fix: Update createUnavailableToolStub to return structured response

Modified the `createUnavailableToolStub` function to return an array containing the unavailable message and a null value, enhancing the response structure. Additionally, added a debug log to skip tool creation when the result is null, improving the handling of reconnection scenarios in the MCP service.

* 🧪 test: Enhance MCP tool creation tests for cache and throttle interactions

Added new test cases for the `createMCPTool` function to validate the caching behavior when tools are unavailable or throttled. The tests ensure that tools are correctly cached as missing and prevent unnecessary reconnects across different users, improving the reliability of the MCP service under concurrent usage scenarios. Additionally, introduced a test for the `createMCPTools` function to verify that it returns an empty array when reconnect is throttled, ensuring proper handling of throttling logic.

* 📝 docs: Update AGENTS.md with testing philosophy and guidelines

Expanded the testing section in AGENTS.md to emphasize the importance of using real logic over mocks, advocating for the use of spies and real dependencies in tests. Added specific recommendations for testing with MongoDB and MCP SDK, highlighting the need to mock only uncontrollable external services. This update aims to improve testing practices and encourage more robust test implementations.

* 🧪 test: Enhance reconnection storm tests with socket tracking and SSE handling

Updated the reconnection storm test suite to include a new socket tracking mechanism for better resource management during tests. Improved the handling of SSE 400/405 responses by ensuring they are processed in the same branch as 404 errors, preventing unhandled cases. This enhances the reliability of the MCPConnection under rapid reconnect scenarios and ensures proper error handling.

* 🔧 fix: Implement cache eviction for stale reconnect attempts and missing tools

Added an `evictStale` function to manage the size of the `lastReconnectAttempts` and `missingToolCache` maps, ensuring they do not exceed a maximum cache size. This enhancement improves resource management by removing outdated entries based on a specified time-to-live (TTL), thereby optimizing the MCP service's performance during reconnection scenarios.
2026-03-10 17:44:13 -04:00
Danny Avila
c3da148fa0
📝 docs: Add AGENTS.md for Project Structure and Coding Standards (#11866)
Some checks are pending
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
* 📝 docs: Add AGENTS.md for project structure and coding standards

- Introduced AGENTS.md to outline project workspaces, coding standards, and development commands.
- Defined workspace boundaries for backend and frontend development, emphasizing TypeScript usage.
- Established guidelines for code style, iteration performance, type safety, and import order.
- Updated CONTRIBUTING.md to reference AGENTS.md for coding standards and project conventions.
- Modified package.json to streamline build commands, consolidating frontend and backend build processes.

* chore: Update build commands and improve smart reinstall process

- Modified AGENTS.md to clarify the purpose of `npm run smart-reinstall` and other build commands, emphasizing Turborepo's role in dependency management and builds.
- Updated package.json to streamline build commands, replacing the legacy frontend build with a Turborepo-based approach for improved performance.
- Enhanced the smart reinstall script to fully delegate build processes to Turborepo, including cache management and dependency checks, ensuring a more efficient build workflow.
2026-02-19 16:33:43 -05:00