🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)

* feat: add tenant context middleware for ALS-based isolation

Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.

- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
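
A minimal sketch of the behavior listed above, assuming a module-level AsyncLocalStorage instance. The names `tenantStorage` and `getTenantId`, the store shape, and the 403 response body are illustrative assumptions rather than the code added in this commit, and the sketch reads the env var on every call for simplicity.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Hypothetical store; the real module may shape it differently.
const tenantStorage = new AsyncLocalStorage();

/** Returns the tenant ID bound to the current async context, if any. */
function getTenantId() {
  return tenantStorage.getStore()?.tenantId;
}

function tenantContextMiddleware(req, res, next) {
  // Unauthenticated request: nothing to propagate.
  if (!req.user) {
    return next();
  }
  const tenantId = req.user.tenantId;
  if (!tenantId) {
    if (process.env.TENANT_ISOLATION_STRICT === 'true') {
      // Strict mode: reject authenticated users without a tenant.
      res.status(403).json({ error: 'Tenant context required' });
      return;
    }
    // Non-strict mode: pass through for backward compatibility.
    return next();
  }
  // Everything invoked from next(), including awaited DB calls, sees this store.
  tenantStorage.run({ tenantId }, () => next());
}
```

A plugin such as applyTenantIsolation could then call `getTenantId()` inside a query hook to scope the filter without any explicit parameter passing.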

* feat: register tenant middleware and wrap startup/auth in runAsSystem()

- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
  before user context is established (LDAP, SAML, OpenID, social login, AuthService)

* feat: thread tenantId through all getAppConfig callers

Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.

Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.

Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint

* feat: add config cache invalidation on admin mutations

- Add clearOverrideCache(tenantId?) to flush per-principal override caches
  by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
  caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
  (upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)

* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering

The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.

* fix: replace runAsSystem with baseOnly for pre-tenant code paths

App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.

All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.
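
The baseOnly short-circuit can be sketched roughly as follows. `createConfigGetter`, `loadBase`, `loadOverrides`, and the shallow merge are hypothetical stand-ins chosen for illustration; only the `baseOnly` option and the "zero DB queries" guarantee come from the description above.

```javascript
// Hypothetical factory: loadBase reads the YAML-derived config,
// loadOverrides stands in for the DB-backed, tenant-owned overrides.
function createConfigGetter({ loadBase, loadOverrides }) {
  return async function getAppConfig({ baseOnly = false, role, userId, tenantId } = {}) {
    const base = await loadBase();
    if (baseOnly) {
      // role, userId, and tenantId are ignored here: no DB access happens.
      return base;
    }
    const overrides = await loadOverrides({ role, userId, tenantId });
    // Shallow merge is an assumption made to keep the sketch short.
    return { ...base, ...overrides };
  };
}
```

Pre-tenant code paths (startup, auth strategies) would call `getAppConfig({ baseOnly: true })`, so they never need a system context to read configuration.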

* fix: address PR review findings — middleware ordering, types, cache safety

- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
  instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)
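
The ordering fix can be illustrated with a small composition sketch. `createRequireJwtAuth` is a hypothetical factory used here so the example runs without passport; in practice `authenticate` would be an Express-style middleware such as `passport.authenticate('jwt', { session: false })`. The error check follows the usual Express convention and is included for completeness.

```javascript
// Hypothetical factory: chains tenant middleware after authentication,
// so req.user is already populated when the tenant context is set.
function createRequireJwtAuth({ authenticate, tenantContextMiddleware }) {
  return function requireJwtAuth(req, res, next) {
    authenticate(req, res, (err) => {
      if (err) {
        // Forward auth errors instead of entering the tenant middleware.
        return next(err);
      }
      tenantContextMiddleware(req, res, next);
    });
  };
}
```

Registering the tenant middleware with a global `app.use()` would run it before any auth strategy, which is why `req.user` was always undefined there.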

* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly

- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
  initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
  queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)
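
To show why startup DB operations need a system context in strict mode, here is a hedged sketch. The store shape (`{ system: true }`), `resolveTenantFilter`, and the error message are assumptions for illustration only; the actual `runAsSystem` and plugin logic may differ.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Hypothetical store shared by the middleware and the isolation plugin.
const tenantStorage = new AsyncLocalStorage();

/** Runs fn with a marker that tells the isolation layer to skip tenant scoping. */
function runAsSystem(fn) {
  return tenantStorage.run({ system: true }, fn);
}

/** Hypothetical helper: computes the extra query filter for the current context. */
function resolveTenantFilter({ strict }) {
  const store = tenantStorage.getStore();
  if (store?.system) {
    return {}; // system context: unscoped query
  }
  if (store?.tenantId) {
    return { tenantId: store.tenantId };
  }
  if (strict) {
    throw new Error('Tenant context required in strict mode');
  }
  return {}; // non-strict fallback: unscoped query
}
```

Under these assumptions, a startup query issued outside any request would throw in strict mode unless it is wrapped in `runAsSystem()`.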

* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests

- requireJwtAuth: 5 tests verifying ALS tenant context is set after
  passport auth, isolated between concurrent requests, and not set
  when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
  tenantId is threaded through, partial failure is handled gracefully,
  and operations run in parallel via Promise.all (Finding 11)

* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping

- Forward passport errors in requireJwtAuth before entering tenant
  middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
  are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
  so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
  base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
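
The namespaced key matching can be sketched with a plain Map standing in for the in-memory store. Only the `APP_CONFIG:_OVERRIDE_:` prefix is taken from the text above; the tenant segment that follows it and the helper name `clearOverrideKeys` are assumptions.

```javascript
// Assumed key layout: "<namespace>:_OVERRIDE_:<tenantId>:<rest>".
const NAMESPACE = 'APP_CONFIG';
const OVERRIDE_PREFIX = '_OVERRIDE_:';

/** Deletes override entries, optionally scoped to one tenant. Returns the count removed. */
function clearOverrideKeys(store, tenantId) {
  const prefix = tenantId
    ? `${NAMESPACE}:${OVERRIDE_PREFIX}${tenantId}:`
    : `${NAMESPACE}:${OVERRIDE_PREFIX}`;
  const doomed = [];
  // Iterate lazily instead of materializing every key with Array.from.
  for (const key of store.keys()) {
    // startsWith avoids matching keys that merely contain the prefix elsewhere.
    if (key.startsWith(prefix)) {
      doomed.push(key);
    }
  }
  for (const key of doomed) {
    store.delete(key);
  }
  return doomed.length;
}
```

Matching on the bare `_OVERRIDE_:` prefix would find nothing in such a store, which is the failure mode the namespace fix addresses.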

* fix: address second review — cache safety, code quality, test reliability

- Decouple cache invalidation from mutation response: fire-and-forget
  with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
  side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
  env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock
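
The fire-and-forget decoupling might look roughly like this. `handleUpsert`, `saveConfig`, `invalidateInBackground`, and the log prefix are hypothetical names for illustration; the point is only that the response depends on the DB mutation, not on cache invalidation.

```javascript
/** Starts invalidation without awaiting it; failures are logged, never thrown. */
function invalidateInBackground(invalidate, tenantId, logger) {
  // Promise.resolve().then(...) also captures a synchronous throw from invalidate().
  Promise.resolve()
    .then(() => invalidate(tenantId))
    .catch((err) => logger.error('[config] cache invalidation failed:', err));
}

// Hypothetical admin handler with injected dependencies.
async function handleUpsert(req, res, { saveConfig, invalidate, logger }) {
  const saved = await saveConfig(req.body);
  invalidateInBackground(invalidate, req.user?.tenantId, logger);
  // The mutation succeeded, so report success even if a cache is unreachable.
  res.status(200).json(saved);
}
```

The trade-off is that a client may briefly read stale config after a successful mutation if invalidation fails or lags.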

* fix: move JSDoc to correct function after clearEndpointConfigCache extraction

* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry

Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.

* fix: remove return from tenantStorage.run to satisfy void middleware signature

* fix: address second review — cache safety, code quality, test reliability

- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
  so partial failures are logged individually instead of producing one
  undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
  flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
  if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
  clearAppConfigCache rejects (not just the swallowed endpoint error)
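
A once-per-process warning gate can be sketched as below. The commit describes a process-level flag; this version keeps the flag in a closure (`createOnceWarner`, a hypothetical name) so it can be exercised in isolation.

```javascript
/** Returns a function that logs its first message and ignores every later call. */
function createOnceWarner(logger) {
  let warned = false;
  return function warnOnce(message) {
    if (warned) {
      return;
    }
    warned = true;
    logger.warn(message);
  };
}
```

Under load, every request that hits the fallback path calls `warnOnce`, but only the first call reaches the logger.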

* chore: reorder imports in index.js and app.js for consistency

- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
Danny Avila 2026-03-26 17:35:00 -04:00 committed by GitHub
parent 083042e56c
commit 9f6d8c6e93
32 changed files with 768 additions and 63 deletions

@@ -0,0 +1,137 @@
+// ── Mocks ──────────────────────────────────────────────────────────────
+const mockConfigStoreDelete = jest.fn().mockResolvedValue(true);
+const mockClearAppConfigCache = jest.fn().mockResolvedValue(undefined);
+const mockClearOverrideCache = jest.fn().mockResolvedValue(undefined);
+jest.mock('~/cache/getLogStores', () => {
+  return jest.fn(() => ({
+    delete: mockConfigStoreDelete,
+  }));
+});
+jest.mock('~/server/services/start/tools', () => ({
+  loadAndFormatTools: jest.fn(() => ({})),
+}));
+jest.mock('../loadCustomConfig', () => jest.fn().mockResolvedValue({}));
+jest.mock('@librechat/data-schemas', () => {
+  const actual = jest.requireActual('@librechat/data-schemas');
+  return { ...actual, AppService: jest.fn(() => ({ availableTools: {} })) };
+});
+jest.mock('~/models', () => ({
+  getApplicableConfigs: jest.fn().mockResolvedValue([]),
+  getUserPrincipals: jest.fn().mockResolvedValue([]),
+}));
+const mockInvalidateCachedTools = jest.fn().mockResolvedValue(undefined);
+jest.mock('../getCachedTools', () => ({
+  setCachedTools: jest.fn().mockResolvedValue(undefined),
+  invalidateCachedTools: mockInvalidateCachedTools,
+}));
+jest.mock('@librechat/api', () => ({
+  createAppConfigService: jest.fn(() => ({
+    getAppConfig: jest.fn().mockResolvedValue({ availableTools: {} }),
+    clearAppConfigCache: mockClearAppConfigCache,
+    clearOverrideCache: mockClearOverrideCache,
+  })),
+}));
+// ── Tests ──────────────────────────────────────────────────────────────
+const { CacheKeys } = require('librechat-data-provider');
+const { invalidateConfigCaches } = require('../app');
+describe('invalidateConfigCaches', () => {
+  beforeEach(() => {
+    jest.clearAllMocks();
+  });
+  it('clears all four caches', async () => {
+    await invalidateConfigCaches();
+    expect(mockClearAppConfigCache).toHaveBeenCalledTimes(1);
+    expect(mockClearOverrideCache).toHaveBeenCalledTimes(1);
+    expect(mockInvalidateCachedTools).toHaveBeenCalledWith({ invalidateGlobal: true });
+    expect(mockConfigStoreDelete).toHaveBeenCalledWith(CacheKeys.ENDPOINT_CONFIG);
+  });
+  it('passes tenantId through to clearOverrideCache', async () => {
+    await invalidateConfigCaches('tenant-a');
+    expect(mockClearOverrideCache).toHaveBeenCalledWith('tenant-a');
+    expect(mockClearAppConfigCache).toHaveBeenCalledTimes(1);
+    expect(mockInvalidateCachedTools).toHaveBeenCalledWith({ invalidateGlobal: true });
+  });
+  it('does not throw when CONFIG_STORE.delete fails', async () => {
+    mockConfigStoreDelete.mockRejectedValueOnce(new Error('store not found'));
+    await expect(invalidateConfigCaches()).resolves.not.toThrow();
+    // Other caches should still have been invalidated
+    expect(mockClearAppConfigCache).toHaveBeenCalledTimes(1);
+    expect(mockClearOverrideCache).toHaveBeenCalledTimes(1);
+    expect(mockInvalidateCachedTools).toHaveBeenCalledWith({ invalidateGlobal: true });
+  });
+  it('all operations run in parallel (not sequentially)', async () => {
+    const order = [];
+    mockClearAppConfigCache.mockImplementation(
+      () =>
+        new Promise((r) =>
+          setTimeout(() => {
+            order.push('base');
+            r();
+          }, 10),
+        ),
+    );
+    mockClearOverrideCache.mockImplementation(
+      () =>
+        new Promise((r) =>
+          setTimeout(() => {
+            order.push('override');
+            r();
+          }, 10),
+        ),
+    );
+    mockInvalidateCachedTools.mockImplementation(
+      () =>
+        new Promise((r) =>
+          setTimeout(() => {
+            order.push('tools');
+            r();
+          }, 10),
+        ),
+    );
+    mockConfigStoreDelete.mockImplementation(
+      () =>
+        new Promise((r) =>
+          setTimeout(() => {
+            order.push('endpoint');
+            r();
+          }, 10),
+        ),
+    );
+    await invalidateConfigCaches();
+    // All four should have been called (parallel execution via Promise.allSettled)
+    expect(order).toHaveLength(4);
+    expect(new Set(order)).toEqual(new Set(['base', 'override', 'tools', 'endpoint']));
+  });
+  it('resolves even when clearAppConfigCache throws (partial failure)', async () => {
+    mockClearAppConfigCache.mockRejectedValueOnce(new Error('cache connection lost'));
+    await expect(invalidateConfigCaches()).resolves.not.toThrow();
+    // Other caches should still have been invalidated despite the failure
+    expect(mockClearOverrideCache).toHaveBeenCalledTimes(1);
+    expect(mockInvalidateCachedTools).toHaveBeenCalledWith({ invalidateGlobal: true });
+  });
+});

@@ -1,9 +1,9 @@
 const { CacheKeys } = require('librechat-data-provider');
-const { AppService } = require('@librechat/data-schemas');
 const { createAppConfigService } = require('@librechat/api');
+const { AppService, logger } = require('@librechat/data-schemas');
+const { setCachedTools, invalidateCachedTools } = require('./getCachedTools');
 const { loadAndFormatTools } = require('~/server/services/start/tools');
 const loadCustomConfig = require('./loadCustomConfig');
-const { setCachedTools } = require('./getCachedTools');
 const getLogStores = require('~/cache/getLogStores');
 const paths = require('~/config/paths');
 const db = require('~/models');
@@ -20,7 +20,7 @@ const loadBaseConfig = async () => {
   return AppService({ config, paths, systemTools });
 };
-const { getAppConfig, clearAppConfigCache } = createAppConfigService({
+const { getAppConfig, clearAppConfigCache, clearOverrideCache } = createAppConfigService({
   loadBaseConfig,
   setCachedTools,
   getCache: getLogStores,
@@ -29,7 +29,44 @@ const { getAppConfig, clearAppConfigCache } = createAppConfigService({
   getUserPrincipals: db.getUserPrincipals,
 });
+/** Deletes the ENDPOINT_CONFIG entry from CONFIG_STORE. Failures are non-critical and swallowed. */
+async function clearEndpointConfigCache() {
+  try {
+    const configStore = getLogStores(CacheKeys.CONFIG_STORE);
+    await configStore.delete(CacheKeys.ENDPOINT_CONFIG);
+  } catch {
+    // CONFIG_STORE or ENDPOINT_CONFIG may not exist — not critical
+  }
+}
+/**
+ * Invalidate all config-related caches after an admin config mutation.
+ * Clears the base config, per-principal override caches, tool caches,
+ * and the endpoints config cache.
+ * @param {string} [tenantId] - Optional tenant ID to scope override cache clearing.
+ */
+async function invalidateConfigCaches(tenantId) {
+  const results = await Promise.allSettled([
+    clearAppConfigCache(),
+    clearOverrideCache(tenantId),
+    invalidateCachedTools({ invalidateGlobal: true }),
+    clearEndpointConfigCache(),
+  ]);
+  const labels = [
+    'clearAppConfigCache',
+    'clearOverrideCache',
+    'invalidateCachedTools',
+    'clearEndpointConfigCache',
+  ];
+  for (let i = 0; i < results.length; i++) {
+    if (results[i].status === 'rejected') {
+      logger.error(`[invalidateConfigCaches] ${labels[i]} failed:`, results[i].reason);
+    }
+  }
+}
 module.exports = {
   getAppConfig,
   clearAppConfigCache,
+  invalidateConfigCaches,
 };

@@ -26,7 +26,8 @@ async function getEndpointsConfig(req) {
     }
   }
-  const appConfig = req.config ?? (await getAppConfig({ role: req.user?.role }));
+  const appConfig =
+    req.config ?? (await getAppConfig({ role: req.user?.role, tenantId: req.user?.tenantId }));
   const defaultEndpointsConfig = await loadDefaultEndpointsConfig(appConfig);
   const customEndpointsConfig = loadCustomEndpointsConfig(appConfig?.endpoints?.custom);

@@ -12,7 +12,7 @@ const { getAppConfig } = require('./app');
  * @param {ServerRequest} req - The Express request object.
  */
 async function loadConfigModels(req) {
-  const appConfig = await getAppConfig({ role: req.user?.role });
+  const appConfig = await getAppConfig({ role: req.user?.role, tenantId: req.user?.tenantId });
   if (!appConfig) {
     return {};
   }

@@ -16,7 +16,8 @@ const { getAppConfig } = require('./app');
  */
 async function loadDefaultModels(req) {
   try {
-    const appConfig = req.config ?? (await getAppConfig({ role: req.user?.role }));
+    const appConfig =
+      req.config ?? (await getAppConfig({ role: req.user?.role, tenantId: req.user?.tenantId }));
     const vertexConfig = appConfig?.endpoints?.[EModelEndpoint.anthropic]?.vertexConfig;
     const [openAI, anthropic, azureOpenAI, assistants, azureAssistants, google, bedrock] =