🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)

* feat: add tenant context middleware for ALS-based isolation

Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.

- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths
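
The propagation described above can be illustrated with a minimal sketch. It uses Node's built-in `AsyncLocalStorage`; the local `tenantStorage` and `getTenantId` are stand-ins for the real exports, and `findDocuments` and `handleRequest` are hypothetical helpers that exist only for this example.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Stand-ins for the real tenantStorage / getTenantId exports (assumed shape).
interface TenantContext {
  tenantId: string;
}

const tenantStorage = new AsyncLocalStorage<TenantContext>();

function getTenantId(): string | undefined {
  return tenantStorage.getStore()?.tenantId;
}

// Hypothetical DB helper: reads the tenant from ALS instead of taking an argument.
async function findDocuments(): Promise<string> {
  // Cross an async boundary to show that the store survives awaits.
  await new Promise((resolve) => setTimeout(resolve, 5));
  return `query scoped to ${getTenantId() ?? 'no tenant'}`;
}

// Hypothetical request handler: mirrors the middleware by wrapping the
// downstream work in run() only when a tenantId is present.
function handleRequest(tenantId: string | undefined): Promise<string> {
  if (!tenantId) {
    return findDocuments();
  }
  return tenantStorage.run({ tenantId }, () => findDocuments());
}
```

Calling `handleRequest('tenant-1')` and `handleRequest('tenant-2')` concurrently should give each call its own tenant, since every `run()` creates an independent store.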

* feat: register tenant middleware and wrap startup/auth in runAsSystem()

- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
  before user context is established (LDAP, SAML, OpenID, social login, AuthService)

* feat: thread tenantId through all getAppConfig callers

Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.

Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.

Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint
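
Why threading `tenantId` matters for cache keys can be sketched as follows. The `_OVERRIDE_:` prefix and the `__default__` fallback are named in this commit message, but the key layout, the `'anonymous'` placeholder, and the `cacheKeyForRequest` helper are assumptions made for illustration only.

```typescript
// Names taken from the commit message; the exact key layout is an assumption.
const OVERRIDE_PREFIX = '_OVERRIDE_:';
const DEFAULT_TENANT = '__default__';

interface ConfigLookup {
  role?: string;
  tenantId?: string;
}

/** Builds a key that differs per tenant, so tenants never share cached overrides. */
function overrideCacheKey({ role, tenantId }: ConfigLookup): string {
  const tenant = tenantId ?? DEFAULT_TENANT;
  const principal = role ?? 'anonymous'; // hypothetical placeholder for a missing role
  return `${OVERRIDE_PREFIX}${tenant}:${principal}`;
}

/** Hypothetical caller: forwards both role and tenantId from the request user. */
function cacheKeyForRequest(req: { user?: ConfigLookup }): string {
  return overrideCacheKey({ role: req.user?.role, tenantId: req.user?.tenantId });
}
```

If a caller forwarded only `role`, two admins from different tenants would resolve to the same key in this sketch, which is the collision the threading change is meant to avoid.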

* feat: add config cache invalidation on admin mutations

- Add clearOverrideCache(tenantId?) to flush per-principal override caches
  by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
  caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
  (upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)

* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering

The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.

* fix: replace runAsSystem with baseOnly for pre-tenant code paths

App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.

All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.
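
A rough sketch of the `baseOnly` short-circuit, assuming a simplified `getAppConfig`. The config fields, the in-memory `overridesByTenant` map, and the `dbLookups` counter are hypothetical and stand in for the YAML config and the Config collection.

```typescript
type AppConfig = Record<string, unknown>;

interface GetAppConfigOptions {
  role?: string;
  tenantId?: string;
  baseOnly?: boolean;
}

// Stand-in for the YAML-derived config loaded at startup (hypothetical fields).
const baseConfig: AppConfig = { registration: true, modelLimit: 10 };

// Stand-in for the Config collection; the counter makes the short-circuit observable.
const overridesByTenant = new Map<string, AppConfig>([['tenant-a', { modelLimit: 50 }]]);
let dbLookups = 0;

async function loadOverrides(tenantId?: string): Promise<AppConfig> {
  dbLookups += 1;
  const found = tenantId ? overridesByTenant.get(tenantId) : undefined;
  return found ?? {};
}

async function getAppConfig(options: GetAppConfigOptions = {}): Promise<AppConfig> {
  if (options.baseOnly) {
    // role and tenantId are ignored on this path: no override lookup happens.
    return { ...baseConfig };
  }
  const overrides = await loadOverrides(options.tenantId);
  return { ...baseConfig, ...overrides };
}
```

Under these assumptions, `getAppConfig({ baseOnly: true })` is safe to call before any tenant context exists, because it never reaches the override lookup.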

* fix: address PR review findings — middleware ordering, types, cache safety

- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
  instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)
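
The ordering fix in the first item can be sketched with simplified middleware types. This is not the real `requireJwtAuth`: the `Req`, `Next`, and `Middleware` types and the `chainAfterAuth` helper are hypothetical, and the response object is omitted for brevity.

```typescript
// Simplified, hypothetical middleware shapes (no response object).
interface Req {
  user?: { tenantId?: string };
}
type Next = (err?: unknown) => void;
type Middleware = (req: Req, next: Next) => void;

/**
 * Runs the tenant middleware only after authentication has populated req.user.
 * Authentication errors are forwarded and the tenant middleware is skipped.
 */
function chainAfterAuth(authenticate: Middleware, tenantContext: Middleware): Middleware {
  return (req, next) => {
    authenticate(req, (err) => {
      if (err) {
        next(err);
        return;
      }
      // req.user is set by now, which is not true at global app.use() scope.
      tenantContext(req, next);
    });
  };
}
```

Registering the tenant middleware globally would run it before authentication, so it would always see an undefined user and fall through as a no-op.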

* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly

- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
  initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
  queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)

* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests

- requireJwtAuth: 5 tests verifying ALS tenant context is set after
  passport auth, isolated between concurrent requests, and not set
  when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
  tenantId is threaded through, partial failure is handled gracefully,
  and operations run in parallel via Promise.all (Finding 11)

* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping

- Forward passport errors in requireJwtAuth before entering tenant
  middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
  are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
  so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
  base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing
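
The namespace issue in the second item can be illustrated with a plain `Map` standing in for the in-memory store. The `APP_CONFIG` namespace and `_OVERRIDE_:` prefix come from this commit message; placing the tenant id directly after the prefix and the `clearOverrideKeys` name are assumptions for this sketch.

```typescript
const NAMESPACE = 'APP_CONFIG';
const OVERRIDE_PREFIX = '_OVERRIDE_:';

/**
 * Deletes override entries from an in-memory store and returns how many were removed.
 * Stored keys carry the namespace, so the match must include it. Matching with
 * startsWith avoids hitting keys that merely contain the prefix somewhere.
 */
function clearOverrideKeys(store: Map<string, unknown>, tenantId?: string): number {
  const base = `${NAMESPACE}:${OVERRIDE_PREFIX}`;
  // Assumed key layout: the tenant id follows the override prefix.
  const prefix = tenantId ? `${base}${tenantId}:` : base;
  let removed = 0;
  // Deleting the current entry while iterating a Map is well-defined in JavaScript.
  for (const key of store.keys()) {
    if (key.startsWith(prefix)) {
      store.delete(key);
      removed += 1;
    }
  }
  return removed;
}
```

Matching on the bare `_OVERRIDE_:` prefix would never succeed against namespaced keys, which is consistent with the "never actually matched/cleared" symptom described above.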

* fix: address second review — cache safety, code quality, test reliability

- Decouple cache invalidation from mutation response: fire-and-forget
  with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
  side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
  env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock

* fix: move JSDoc to correct function after clearEndpointConfigCache extraction

* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry

Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.

* fix: remove return from tenantStorage.run to satisfy void middleware signature

* fix: address second review — cache safety, code quality, test reliability

- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
  so partial failures are logged individually instead of producing one
  undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
  flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
  if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
  clearAppConfigCache rejects (not just the swallowed endpoint error)
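
A minimal sketch of the `Promise.allSettled` approach, assuming each cache exposes an async clear function. The `CacheClearer` shape, the `invalidateAll` name, and the injected `log` callback are hypothetical and are not the real `invalidateConfigCaches` signature.

```typescript
// Hypothetical shape: a named cache with an async clear operation.
interface CacheClearer {
  name: string;
  clear: () => Promise<void>;
}

/**
 * Clears all caches in parallel and logs each failure separately.
 * Returns the number of failed clears instead of rejecting on the first one.
 */
async function invalidateAll(
  clearers: CacheClearer[],
  log: (message: string) => void,
): Promise<number> {
  const results = await Promise.allSettled(clearers.map((clearer) => clearer.clear()));
  let failures = 0;
  results.forEach((result, index) => {
    if (result.status === 'rejected') {
      failures += 1;
      log(`[invalidateConfigCaches] ${clearers[index].name} failed: ${String(result.reason)}`);
    }
  });
  return failures;
}
```

With `Promise.all`, the first rejection would surface as a single error and hide which cache failed; here the remaining clears still complete and each failure is attributed by name.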

* chore: reorder imports in index.js and app.js for consistency

- Move logger and runAsSystem imports so the import order is consistent across files
- Group related imports together for readability
Danny Avila 2026-03-26 17:35:00 -04:00 committed by GitHub
parent 083042e56c
commit 9f6d8c6e93
32 changed files with 768 additions and 63 deletions


@@ -0,0 +1,101 @@
import { getTenantId } from '@librechat/data-schemas';
import type { Response, NextFunction } from 'express';
import type { ServerRequest } from '~/types/http';
// Import directly from source file: _resetTenantMiddlewareStrictCache is intentionally
// excluded from the public barrel export (index.ts).
import { tenantContextMiddleware, _resetTenantMiddlewareStrictCache } from '../tenant';

function mockReq(user?: Record<string, unknown>): ServerRequest {
  return { user } as unknown as ServerRequest;
}

function mockRes(): Response {
  const res = {
    status: jest.fn().mockReturnThis(),
    json: jest.fn().mockReturnThis(),
  };
  return res as unknown as Response;
}

/** Runs the middleware and returns a Promise that resolves when next() is called. */
function runMiddleware(req: ServerRequest, res: Response): Promise<string | undefined> {
  return new Promise((resolve) => {
    const next: NextFunction = () => {
      resolve(getTenantId());
    };
    tenantContextMiddleware(req, res, next);
  });
}

describe('tenantContextMiddleware', () => {
  afterEach(() => {
    _resetTenantMiddlewareStrictCache();
    delete process.env.TENANT_ISOLATION_STRICT;
  });

  it('sets ALS tenant context for authenticated requests with tenantId', async () => {
    const req = mockReq({ tenantId: 'tenant-x', role: 'user' });
    const res = mockRes();
    const tenantId = await runMiddleware(req, res);
    expect(tenantId).toBe('tenant-x');
  });

  it('is a no-op for unauthenticated requests (no user)', async () => {
    const req = mockReq();
    const res = mockRes();
    const tenantId = await runMiddleware(req, res);
    expect(tenantId).toBeUndefined();
  });

  it('passes through without ALS when user has no tenantId in non-strict mode', async () => {
    const req = mockReq({ role: 'user' });
    const res = mockRes();
    const tenantId = await runMiddleware(req, res);
    expect(tenantId).toBeUndefined();
  });

  it('returns 403 when user has no tenantId in strict mode', () => {
    process.env.TENANT_ISOLATION_STRICT = 'true';
    _resetTenantMiddlewareStrictCache();
    const req = mockReq({ role: 'user' });
    const res = mockRes();
    const next: NextFunction = jest.fn();
    tenantContextMiddleware(req, res, next);
    expect(res.status).toHaveBeenCalledWith(403);
    expect(res.json).toHaveBeenCalledWith(
      expect.objectContaining({ error: expect.stringContaining('Tenant context required') }),
    );
    expect(next).not.toHaveBeenCalled();
  });

  it('allows authenticated requests with tenantId in strict mode', async () => {
    process.env.TENANT_ISOLATION_STRICT = 'true';
    _resetTenantMiddlewareStrictCache();
    const req = mockReq({ tenantId: 'tenant-y', role: 'admin' });
    const res = mockRes();
    const tenantId = await runMiddleware(req, res);
    expect(tenantId).toBe('tenant-y');
  });

  it('different requests get independent tenant contexts', async () => {
    const runRequest = (tid: string) => {
      const req = mockReq({ tenantId: tid, role: 'user' });
      const res = mockRes();
      return runMiddleware(req, res);
    };
    const results = await Promise.all([runRequest('tenant-1'), runRequest('tenant-2')]);
    expect(results).toHaveLength(2);
    expect(results).toContain('tenant-1');
    expect(results).toContain('tenant-2');
  });
});


@@ -12,7 +12,11 @@ import type { BalanceUpdateFields } from '~/types';
 import { getBalanceConfig } from '~/app/config';
 export interface BalanceMiddlewareOptions {
-  getAppConfig: (options?: { role?: string; refresh?: boolean }) => Promise<AppConfig>;
+  getAppConfig: (options?: {
+    role?: string;
+    tenantId?: string;
+    refresh?: boolean;
+  }) => Promise<AppConfig>;
   findBalanceByUser: (userId: string) => Promise<IBalance | null>;
   upsertBalanceFields: (userId: string, fields: IBalanceUpdate) => Promise<IBalance | null>;
 }
@@ -92,7 +96,10 @@ export function createSetBalanceConfig({
   return async (req: ServerRequest, res: ServerResponse, next: NextFunction): Promise<void> => {
     try {
       const user = req.user as IUser & { _id: string | ObjectId };
-      const appConfig = await getAppConfig({ role: user?.role });
+      const appConfig = await getAppConfig({
+        role: user?.role,
+        tenantId: user?.tenantId,
+      });
       const balanceConfig = getBalanceConfig(appConfig);
       if (!balanceConfig?.enabled) {
         return next();


@@ -5,5 +5,6 @@ export * from './notFound';
 export * from './balance';
 export * from './json';
 export * from './capabilities';
+export { tenantContextMiddleware } from './tenant';
 export * from './concurrency';
 export * from './checkBalance';


@@ -0,0 +1,70 @@
import { isMainThread } from 'worker_threads';
import { tenantStorage, logger } from '@librechat/data-schemas';
import type { Response, NextFunction } from 'express';
import type { ServerRequest } from '~/types/http';

let _checkedThread = false;
let _strictMode: boolean | undefined;

function isStrict(): boolean {
  return (_strictMode ??= process.env.TENANT_ISOLATION_STRICT === 'true');
}

/** Resets the cached strict-mode flag. Exposed for test teardown only. */
export function _resetTenantMiddlewareStrictCache(): void {
  _strictMode = undefined;
}

/**
 * Express middleware that propagates the authenticated user's `tenantId` into
 * the AsyncLocalStorage context used by the Mongoose tenant-isolation plugin.
 *
 * **Placement**: Chained automatically by `requireJwtAuth` after successful
 * passport authentication (`req.user` is populated). Must NOT be registered at
 * global `app.use()` scope, because `req.user` is undefined at that stage.
 *
 * Behaviour:
 * - Authenticated request with `tenantId`: wraps downstream in `tenantStorage.run({ tenantId })`
 * - Authenticated request **without** `tenantId`:
 *   - Strict mode (`TENANT_ISOLATION_STRICT=true`): responds 403
 *   - Non-strict (default): passes through without ALS context (backward compat)
 * - Unauthenticated request: no-op (calls `next()` directly)
 */
export function tenantContextMiddleware(
  req: ServerRequest,
  res: Response,
  next: NextFunction,
): void {
  // Warn once per process if the middleware is loaded outside the main thread.
  if (!_checkedThread) {
    _checkedThread = true;
    if (!isMainThread) {
      logger.error(
        '[tenantContextMiddleware] Running in a worker thread. ' +
          'ALS context will not propagate. This middleware must only run in the main Express process.',
      );
    }
  }

  const user = req.user as { tenantId?: string } | undefined;
  if (!user) {
    next();
    return;
  }

  const tenantId = user.tenantId;
  if (!tenantId) {
    if (isStrict()) {
      res.status(403).json({ error: 'Tenant context required in strict isolation mode' });
      return;
    }
    next();
    return;
  }

  return void tenantStorage.run({ tenantId }, async () => {
    next();
  });
}