mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-04-03 06:17:21 +02:00
⚡ refactor: Use in-memory cache for App MCP configs to avoid Redis SCAN (#12410)
* ⚡ perf: Use in-memory cache for App MCP configs to avoid Redis SCAN

  The 'App' namespace holds static YAML-loaded configs identical on every instance. Storing them in Redis and retrieving via SCAN + batch-GET caused 60s+ stalls under concurrent load (#11624). Since these configs are already loaded into memory at startup, bypass Redis entirely by always returning ServerConfigsCacheInMemory for the 'App' namespace.

* ♻️ refactor: Extract APP_CACHE_NAMESPACE constant and harden tests

  - Extract magic string 'App' to a shared `APP_CACHE_NAMESPACE` constant used by both ServerConfigsCacheFactory and MCPServersRegistry
  - Document that `leaderOnly` is ignored for the App namespace
  - Reset `cacheConfig.USE_REDIS` in test `beforeEach` to prevent ordering-dependent flakiness
  - Fix import order in test file (longest to shortest)

* 🐛 fix: Populate App cache on follower instances in cluster mode

  In cluster deployments, only the leader runs MCPServersInitializer to inspect and cache MCP server configs. Followers previously read these from Redis, but with the App namespace now using in-memory storage, followers would have an empty cache. Add populateLocalCache() so follower processes independently initialize their own in-memory App cache from the same YAML configs after the leader signals completion. The method is idempotent: if the cache is already populated (leader case), it's a no-op.

* 🐛 fix: Use static flag for populateLocalCache idempotency

  Replace getAllServerConfigs() idempotency check with a static localCachePopulated flag. The previous check merged App + DB caches, causing false early returns in deployments with publicly shared DB configs, and poisoned the TTL read-through cache with stale results. The static flag is zero-cost (no async/Redis/DB calls), immune to DB config interference, and is reset alongside hasInitializedThisProcess in resetProcessFlag() for test teardown.

  Also set localCachePopulated=true after leader initialization completes, so subsequent calls on the leader don't redundantly re-run populateLocalCache.

* 📝 docs: Document process-local reset() semantics for App cache

  With the App namespace using in-memory storage, reset() only clears the calling process's cache. Add JSDoc noting this behavioral change so callers in cluster deployments know each instance must reset independently.

* ✅ test: Add follower cache population tests for MCPServersInitializer

  Cover the populateLocalCache code path:
  - Follower populates its own App cache after leader signals completion
  - localCachePopulated flag prevents redundant re-initialization
  - Fresh follower process independently initializes all servers

* 🧹 style: Fix import order to longest-to-shortest convention

* 🔬 test: Add Redis perf benchmark to isolate getAll() bottleneck

  Benchmarks that run against a live Redis instance to measure:
  1. SCAN vs batched GET phases independently
  2. SCAN cost scaling with total keyspace size (noise keys)
  3. Concurrent getAll() at various concurrency levels (1/10/50/100)
  4. Alternative: single aggregate key vs SCAN+GET
  5. Alternative: raw MGET vs Keyv batch GET (serialization overhead)

  Run with:
    npx jest --config packages/api/jest.config.mjs \
      --testPathPatterns="perf_benchmark" --coverage=false

* ⚡ feat: Add aggregate-key Redis cache for MCP App configs

  ServerConfigsCacheRedisAggregateKey stores all configs under a single Redis key, making getAll() a single GET instead of SCAN + N GETs. This eliminates the O(keyspace_size) SCAN that caused 60s+ stalls in large deployments while preserving cross-instance visibility: all instances read/write the same Redis key, so reinspection results propagate automatically after readThroughCache TTL expiry.

* ♻️ refactor: Use aggregate-key cache for App namespace in factory

  Update ServerConfigsCacheFactory to return ServerConfigsCacheRedisAggregateKey for the App namespace when Redis is enabled, instead of ServerConfigsCacheInMemory. This preserves cross-instance visibility (reinspection results propagate through Redis) while eliminating SCAN. Non-App namespaces still use the standard per-key ServerConfigsCacheRedis.

* 🗑️ revert: Remove populateLocalCache (no longer needed with aggregate key)

  With App configs stored under a single Redis key (aggregate approach), followers read from Redis like before. The populateLocalCache mechanism and its localCachePopulated flag are no longer necessary. Also reverts the process-local reset() JSDoc since reset() is now cluster-wide again via Redis.

* 🐛 fix: Add write mutex to aggregate cache and exclude perf benchmark from CI

  - Add promise-based write lock to ServerConfigsCacheRedisAggregateKey to prevent concurrent read-modify-write races during parallel initialization (Promise.allSettled runs multiple addServer calls concurrently, causing last-write-wins data loss on the aggregate key)
  - Rename perf benchmark to cache_integration pattern so CI skips it (requires live Redis)

* 🔧 fix: Rename perf benchmark to *.manual.spec.ts to exclude from all CI

  The cache_integration pattern is picked up by test:cache-integration:mcp in CI. Rename to *.manual.spec.ts which isn't matched by any CI runner.

* ✅ test: Add cache integration tests for ServerConfigsCacheRedisAggregateKey

  Tests against a live Redis instance covering:
  - CRUD operations (add, get, update, remove)
  - getAll with empty/populated cache
  - Duplicate add rejection, missing update/remove errors
  - Concurrent write safety (20 parallel adds without data loss)
  - Concurrent read safety (50 parallel getAll calls)
  - Reset clears all configs

* 🔧 fix: Rename perf benchmark to *.manual.spec.ts to exclude from all CI

  The perf benchmark file was renamed to *.manual.spec.ts but no testPathIgnorePatterns existed for that convention. Add .*manual\.spec\. to both test and test:ci scripts, plus jest.config.mjs, so manual-only tests never run in CI unit test jobs.

* fix: Address review findings for aggregate key cache

  - Add successCheck() to all write paths (add/update/remove) so Redis SET failures throw instead of being silently swallowed
  - Override reset() to use targeted cache.delete(AGGREGATE_KEY) instead of inherited SCAN-based cache.clear(), consistent with eliminating SCAN operations
  - Document cross-instance write race invariant in class JSDoc: the promise-based writeLock is process-local only; callers must enforce single-writer semantics externally (leader-only init)
  - Use definite-assignment assertion (let resolve!:) instead of non-null assertion at call site
  - Fix import type convention in integration test
  - Verify Promise.allSettled rejections explicitly in concurrent write test
  - Fix broken run command in benchmark file header

* style: Fix import ordering per AGENTS.md convention

  Local/project imports sorted longest to shortest.

* chore: Update import ordering and clean up unused imports in MCPServersRegistry.ts

* chore: import order

* chore: import order
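
The aggregate-key approach and its promise-based write lock, as described in the commit message above, can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: `KeyValueStore`, `MemoryStore`, `AggregateKeyCache`, and the `'__all__'` key name are hypothetical, and an in-memory map stands in for Redis so the sketch runs on its own.

```typescript
// Hypothetical stand-in for a Redis-backed store (not the commit's Keyv setup).
interface KeyValueStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<boolean>;
  delete(key: string): Promise<boolean>;
}

class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }
  async set(key: string, value: string): Promise<boolean> {
    this.data.set(key, value);
    return true;
  }
  async delete(key: string): Promise<boolean> {
    return this.data.delete(key);
  }
}

type ServerConfig = Record<string, unknown>;

// Assumed key name; the real AGGREGATE_KEY value is not shown in this page.
const AGGREGATE_KEY = '__all__';

class AggregateKeyCache {
  // Tail of a promise chain: each write waits for the previous one to settle.
  // Process-local only; it does not protect against writers in other processes.
  private writeLock: Promise<void> = Promise.resolve();

  constructor(private store: KeyValueStore) {}

  // One GET returns every config; no key scan is involved.
  async getAll(): Promise<Record<string, ServerConfig>> {
    const raw = await this.store.get(AGGREGATE_KEY);
    return raw ? JSON.parse(raw) : {};
  }

  private withWriteLock<T>(fn: () => Promise<T>): Promise<T> {
    const run = this.writeLock.then(fn);
    // Keep the chain alive even if this write rejects.
    this.writeLock = run.then(
      () => undefined,
      () => undefined,
    );
    return run;
  }

  add(name: string, config: ServerConfig): Promise<void> {
    return this.withWriteLock(async () => {
      const all = await this.getAll(); // read
      if (name in all) {
        throw new Error(`Server "${name}" already exists`);
      }
      all[name] = config; // modify
      const ok = await this.store.set(AGGREGATE_KEY, JSON.stringify(all)); // write
      if (!ok) {
        throw new Error(`Failed to write config for "${name}"`);
      }
    });
  }

  // Targeted delete of the single key instead of clearing by scan.
  reset(): Promise<void> {
    return this.withWriteLock(async () => {
      await this.store.delete(AGGREGATE_KEY);
    });
  }
}
```

Without the lock, parallel `add` calls would each read the same snapshot and the last write would overwrite the others. Serializing the read-modify-write cycle inside one process avoids that, while writers in other processes still have to be excluded by other means, which the commit message says is done through leader-only initialization.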
parent 1123f96e6a
commit 359cc63b41
8 changed files with 779 additions and 43 deletions
@@ -18,8 +18,8 @@
     "build:dev": "npm run clean && NODE_ENV=development rollup -c --bundleConfigAsCjs",
     "build:watch": "NODE_ENV=development rollup -c -w --bundleConfigAsCjs",
     "build:watch:prod": "rollup -c -w --bundleConfigAsCjs",
-    "test": "jest --coverage --watch --testPathIgnorePatterns=\"\\.*integration\\.|\\.*helper\\.|__tests__/helpers/\"",
-    "test:ci": "jest --coverage --ci --testPathIgnorePatterns=\"\\.*integration\\.|\\.*helper\\.|__tests__/helpers/\"",
+    "test": "jest --coverage --watch --testPathIgnorePatterns=\"\\.*integration\\.|\\.*helper\\.|__tests__/helpers/|\\.*manual\\.spec\\.\"",
+    "test:ci": "jest --coverage --ci --testPathIgnorePatterns=\"\\.*integration\\.|\\.*helper\\.|__tests__/helpers/|\\.*manual\\.spec\\.\"",
     "test:cache-integration:core": "jest --testPathPatterns=\"src/cache/.*\\.cache_integration\\.spec\\.ts$\" --coverage=false",
     "test:cache-integration:cluster": "jest --testPathPatterns=\"src/cluster/.*\\.cache_integration\\.spec\\.ts$\" --coverage=false --runInBand",
     "test:cache-integration:mcp": "jest --testPathPatterns=\"src/mcp/.*\\.cache_integration\\.spec\\.ts$\" --coverage=false",
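
To see what the extended ignore pattern in the `test` and `test:ci` scripts excludes, the pattern can be evaluated as a plain regular expression. The sketch below assumes the pattern reaches Jest as written once JSON and shell escaping are removed; the file paths are hypothetical examples, not files from this commit.

```typescript
// The pattern as Jest would receive it after JSON and shell unescaping.
const ignorePattern = String.raw`\.*integration\.|\.*helper\.|__tests__/helpers/|\.*manual\.spec\.`;
const ignoreRegex = new RegExp(ignorePattern);

// True when a test path would be skipped by the unit-test scripts.
function isIgnored(testPath: string): boolean {
  return ignoreRegex.test(testPath);
}

// Hypothetical paths, for illustration only.
const samplePaths = [
  'src/mcp/example.manual.spec.ts',
  'src/mcp/example.cache_integration.spec.ts',
  'src/mcp/example.spec.ts',
];

for (const p of samplePaths) {
  console.log(`${p}: ${isIgnored(p) ? 'ignored' : 'runs'}`);
}
```

Since `\.*` matches zero or more literal dots, each alternative behaves like a substring match on `integration.`, `helper.`, or `manual.spec.`, so any path containing `manual.spec.` is skipped by the unit-test jobs.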