🧪 chore: MCP Reconnect Storm Follow-Up Fixes and Integration Tests (#12172)

* 🧪 test: Add reconnection storm regression tests for MCPConnection Introduced a comprehensive test suite for reconnection storm scenarios, validating circuit breaker, throttling, cooldown, and timeout fixes. The tests utilize real MCP SDK transports and a StreamableHTTP server to ensure accurate behavior under rapid connect/disconnect cycles and error handling for SSE 400/405 responses. This enhances the reliability of the MCPConnection by ensuring proper handling of reconnection logic and circuit breaker functionality. * 🔧 fix: Update createUnavailableToolStub to return structured response Modified the `createUnavailableToolStub` function to return an array containing the unavailable message and a null value, enhancing the response structure. Additionally, added a debug log to skip tool creation when the result is null, improving the handling of reconnection scenarios in the MCP service. * 🧪 test: Enhance MCP tool creation tests for cache and throttle interactions Added new test cases for the `createMCPTool` function to validate the caching behavior when tools are unavailable or throttled. The tests ensure that tools are correctly cached as missing and prevent unnecessary reconnects across different users, improving the reliability of the MCP service under concurrent usage scenarios. Additionally, introduced a test for the `createMCPTools` function to verify that it returns an empty array when reconnect is throttled, ensuring proper handling of throttling logic. * 📝 docs: Update AGENTS.md with testing philosophy and guidelines Expanded the testing section in AGENTS.md to emphasize the importance of using real logic over mocks, advocating for the use of spies and real dependencies in tests. Added specific recommendations for testing with MongoDB and MCP SDK, highlighting the need to mock only uncontrollable external services. This update aims to improve testing practices and encourage more robust test implementations. * 🧪 test: Enhance reconnection storm tests with socket tracking and SSE handling Updated the reconnection storm test suite to include a new socket tracking mechanism for better resource management during tests. Improved the handling of SSE 400/405 responses by ensuring they are processed in the same branch as 404 errors, preventing unhandled cases. This enhances the reliability of the MCPConnection under rapid reconnect scenarios and ensures proper error handling. * 🔧 fix: Implement cache eviction for stale reconnect attempts and missing tools Added an `evictStale` function to manage the size of the `lastReconnectAttempts` and `missingToolCache` maps, ensuring they do not exceed a maximum cache size. This enhancement improves resource management by removing outdated entries based on a specified time-to-live (TTL), thereby optimizing the MCP service's performance during reconnection scenarios.
2026-03-11 10:32:37 +01:00 · 2026-03-10 17:44:13 -04:00 · 2026-03-10 17:44:13 -04:00 · 6167ce6e57
commit 6167ce6e57
parent c0e876a2e6
4 changed files with 736 additions and 2 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@ -149,7 +149,15 @@ Multi-line imports count total character length across all lines. Consolidate va
 - Run tests from their workspace directory: `cd api && npx jest <pattern>`, `cd packages/api && npx jest <pattern>`, etc.
 - Frontend tests: `__tests__` directories alongside components; use `test/layout-test-utils` for rendering.
 - Cover loading, success, and error states for UI/data flows.
- Mock data-provider hooks and external dependencies.
+
+### Philosophy
+
+- **Real logic over mocks.** Exercise actual code paths with real dependencies. Mocking is a last resort.
+- **Spies over mocks.** Assert that real functions are called with expected arguments and frequency without replacing underlying logic.
+- **MongoDB**: use `mongodb-memory-server` for a real in-memory MongoDB instance. Test actual queries and schema validation, not mocked DB calls.
+- **MCP**: use real `@modelcontextprotocol/sdk` exports for servers, transports, and tool definitions. Mirror real scenarios, don't stub SDK internals.
+- Only mock what you cannot control: external HTTP APIs, rate-limited services, non-deterministic system calls.
+- Heavy mocking is a code smell, not a testing strategy.

 ---

--- a/api/server/services/MCP.js
+++ b/api/server/services/MCP.js
@ -34,12 +34,28 @@ const { reinitMCPServer } = require('./Tools/mcp');
 const { getAppConfig } = require('./Config');
 const { getLogStores } = require('~/cache');

+const MAX_CACHE_SIZE = 1000;
 const lastReconnectAttempts = new Map();
 const RECONNECT_THROTTLE_MS = 10_000;

 const missingToolCache = new Map();
 const MISSING_TOOL_TTL_MS = 10_000;

+function evictStale(map, ttl) {
+  if (map.size <= MAX_CACHE_SIZE) {
+    return;
+  }
+  const now = Date.now();
+  for (const [key, timestamp] of map) {
+    if (now - timestamp >= ttl) {
+      map.delete(key);
+    }
+    if (map.size <= MAX_CACHE_SIZE) {
+      return;
+    }
+  }
+}
+
 const unavailableMsg =
  "This tool's MCP server is temporarily unavailable. Please try again shortly.";

@ -49,7 +65,7 @@ const unavailableMsg =
 */
 function createUnavailableToolStub(toolName, serverName) {
  const normalizedToolKey = `${toolName}${Constants.mcp_delimiter}${normalizeServerName(serverName)}`;
-  const _call = async () => unavailableMsg;
+  const _call = async () => [unavailableMsg, null];
  const toolInstance = tool(_call, {
    schema: {
      type: 'object',
@ -253,6 +269,7 @@ async function reconnectServer({
    return null;
  }
  lastReconnectAttempts.set(throttleKey, now);
+  evictStale(lastReconnectAttempts, RECONNECT_THROTTLE_MS);

  const runId = Constants.USE_PRELIM_RESPONSE_MESSAGE_ID;
  const flowId = `${user.id}:${serverName}:${Date.now()}`;
@ -373,6 +390,10 @@ async function createMCPTools({
    userMCPAuthMap,
    streamId,
  });
+  if (result === null) {
+    logger.debug(`[MCP][${serverName}] Reconnect throttled, skipping tool creation.`);
+    return [];
+  }
  if (!result || !result.tools) {
    logger.warn(`[MCP][${serverName}] Failed to reinitialize MCP server.`);
    return [];
@ -469,6 +490,7 @@ async function createMCPTool({

    if (!toolDefinition) {
      missingToolCache.set(toolKey, Date.now());
+      evictStale(missingToolCache, MISSING_TOOL_TTL_MS);
    }
  }

--- a/api/server/services/MCP.spec.js
+++ b/api/server/services/MCP.spec.js
@ -45,6 +45,7 @@ const {
  getMCPSetupData,
  checkOAuthFlowStatus,
  getServerConnectionStatus,
+  createUnavailableToolStub,
 } = require('./MCP');

 jest.mock('./Config', () => ({
@ -1098,6 +1099,188 @@ describe('User parameter passing tests', () => {
    });
  });

+  describe('createUnavailableToolStub', () => {
+    it('should return a tool whose _call returns a valid CONTENT_AND_ARTIFACT two-tuple', async () => {
+      const stub = createUnavailableToolStub('myTool', 'myServer');
+      // invoke() goes through langchain's base tool, which checks responseFormat.
+      // CONTENT_AND_ARTIFACT requires [content, artifact] — a bare string would throw:
+      //   "Tool response format is "content_and_artifact" but the output was not a two-tuple"
+      const result = await stub.invoke({});
+      // If we reach here without throwing, the two-tuple format is correct.
+      // invoke() returns the content portion of [content, artifact] as a string.
+      expect(result).toContain('temporarily unavailable');
+    });
+  });
+
+  describe('negative tool cache and throttle interaction', () => {
+    it('should cache tool as missing even when throttled (cross-user dedup)', async () => {
+      const mockUser = { id: 'throttle-test-user' };
+      const mockRes = { write: jest.fn(), flush: jest.fn() };
+
+      // First call: reconnect succeeds but tool not found
+      mockReinitMCPServer.mockResolvedValueOnce({
+        availableTools: {},
+      });
+
+      await createMCPTool({
+        res: mockRes,
+        user: mockUser,
+        toolKey: `missing-tool${D}cache-dedup-server`,
+        provider: 'openai',
+        userMCPAuthMap: {},
+        availableTools: undefined,
+      });
+
+      // Second call within 10s for DIFFERENT tool on same server:
+      // reconnect is throttled (returns null), tool is still cached as missing.
+      // This is intentional: the cache acts as cross-user dedup since the
+      // throttle is per-user-per-server and can't prevent N different users
+      // from each triggering their own reconnect.
+      const result2 = await createMCPTool({
+        res: mockRes,
+        user: mockUser,
+        toolKey: `other-tool${D}cache-dedup-server`,
+        provider: 'openai',
+        userMCPAuthMap: {},
+        availableTools: undefined,
+      });
+
+      expect(result2).toBeDefined();
+      expect(result2.name).toContain('other-tool');
+      expect(mockReinitMCPServer).toHaveBeenCalledTimes(1);
+    });
+
+    it('should prevent user B from triggering reconnect when user A already cached the tool', async () => {
+      const userA = { id: 'cache-user-A' };
+      const userB = { id: 'cache-user-B' };
+      const mockRes = { write: jest.fn(), flush: jest.fn() };
+
+      // User A: real reconnect, tool not found → cached
+      mockReinitMCPServer.mockResolvedValueOnce({
+        availableTools: {},
+      });
+
+      await createMCPTool({
+        res: mockRes,
+        user: userA,
+        toolKey: `shared-tool${D}cross-user-server`,
+        provider: 'openai',
+        userMCPAuthMap: {},
+        availableTools: undefined,
+      });
+
+      expect(mockReinitMCPServer).toHaveBeenCalledTimes(1);
+
+      // User B requests the SAME tool within 10s.
+      // The negative cache is keyed by toolKey (no user prefix), so user B
+      // gets a cache hit and no reconnect fires. This is the cross-user
+      // storm protection: without this, user B's unthrottled first request
+      // would trigger a second reconnect to the same server.
+      const result = await createMCPTool({
+        res: mockRes,
+        user: userB,
+        toolKey: `shared-tool${D}cross-user-server`,
+        provider: 'openai',
+        userMCPAuthMap: {},
+        availableTools: undefined,
+      });
+
+      expect(result).toBeDefined();
+      expect(result.name).toContain('shared-tool');
+      // reinitMCPServer still called only once — user B hit the cache
+      expect(mockReinitMCPServer).toHaveBeenCalledTimes(1);
+    });
+
+    it('should prevent user B from triggering reconnect for throttle-cached tools', async () => {
+      const userA = { id: 'storm-user-A' };
+      const userB = { id: 'storm-user-B' };
+      const mockRes = { write: jest.fn(), flush: jest.fn() };
+
+      // User A: real reconnect for tool-1, tool not found → cached
+      mockReinitMCPServer.mockResolvedValueOnce({
+        availableTools: {},
+      });
+
+      await createMCPTool({
+        res: mockRes,
+        user: userA,
+        toolKey: `tool-1${D}storm-server`,
+        provider: 'openai',
+        userMCPAuthMap: {},
+        availableTools: undefined,
+      });
+
+      // User A: tool-2 on same server within 10s → throttled → cached from throttle
+      await createMCPTool({
+        res: mockRes,
+        user: userA,
+        toolKey: `tool-2${D}storm-server`,
+        provider: 'openai',
+        userMCPAuthMap: {},
+        availableTools: undefined,
+      });
+
+      expect(mockReinitMCPServer).toHaveBeenCalledTimes(1);
+
+      // User B requests tool-2 — gets cache hit from the throttle-cached entry.
+      // Without this caching, user B would trigger a real reconnect since
+      // user B has their own throttle key and hasn't reconnected yet.
+      const result = await createMCPTool({
+        res: mockRes,
+        user: userB,
+        toolKey: `tool-2${D}storm-server`,
+        provider: 'openai',
+        userMCPAuthMap: {},
+        availableTools: undefined,
+      });
+
+      expect(result).toBeDefined();
+      expect(result.name).toContain('tool-2');
+      // Still only 1 real reconnect — user B was protected by the cache
+      expect(mockReinitMCPServer).toHaveBeenCalledTimes(1);
+    });
+  });
+
+  describe('createMCPTools throttle handling', () => {
+    it('should return empty array with debug log when reconnect is throttled', async () => {
+      const mockUser = { id: 'throttle-tools-user' };
+      const mockRes = { write: jest.fn(), flush: jest.fn() };
+
+      // First call: real reconnect
+      mockReinitMCPServer.mockResolvedValueOnce({
+        tools: [{ name: 'tool1' }],
+        availableTools: {
+          [`tool1${D}throttle-tools-server`]: {
+            function: { description: 'Tool 1', parameters: {} },
+          },
+        },
+      });
+
+      await createMCPTools({
+        res: mockRes,
+        user: mockUser,
+        serverName: 'throttle-tools-server',
+        provider: 'openai',
+        userMCPAuthMap: {},
+      });
+
+      // Second call within 10s — throttled
+      const result = await createMCPTools({
+        res: mockRes,
+        user: mockUser,
+        serverName: 'throttle-tools-server',
+        provider: 'openai',
+        userMCPAuthMap: {},
+      });
+
+      expect(result).toEqual([]);
+      // reinitMCPServer called only once — second was throttled
+      expect(mockReinitMCPServer).toHaveBeenCalledTimes(1);
+      // Should log at debug level (not warn) for throttled case
+      expect(logger.debug).toHaveBeenCalledWith(expect.stringContaining('Reconnect throttled'));
+    });
+  });
+
  describe('User parameter integrity', () => {
    it('should preserve user object properties through the call chain', async () => {
      const complexUser = {
--- a/packages/api/src/mcp/tests/reconnection-storm.test.ts
+++ b/packages/api/src/mcp/tests/reconnection-storm.test.ts
@ -0,0 +1,521 @@
+/**
+ * Reconnection storm regression tests for PR #12162.
+ *
+ * Validates circuit breaker, throttling, cooldown, and timeout fixes using real
+ * MCP SDK transports (no mocked stubs). A real StreamableHTTP server is spun up
+ * per test suite and MCPConnection talks to it through a genuine HTTP stack.
+ */
+import http from 'http';
+import { randomUUID } from 'crypto';
+import express from 'express';
+import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
+import { isInitializeRequest } from '@modelcontextprotocol/sdk/types.js';
+import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
+import type { Socket } from 'net';
+import { OAuthReconnectionTracker } from '~/mcp/oauth/OAuthReconnectionTracker';
+import { MCPConnection } from '~/mcp/connection';
+
+jest.mock('@librechat/data-schemas', () => ({
+  logger: {
+    info: jest.fn(),
+    warn: jest.fn(),
+    error: jest.fn(),
+    debug: jest.fn(),
+  },
+}));
+
+/* ------------------------------------------------------------------ */
+/*  Helpers                                                           */
+/* ------------------------------------------------------------------ */
+
+interface TestServer {
+  url: string;
+  httpServer: http.Server;
+  close: () => Promise<void>;
+}
+
+function trackSockets(httpServer: http.Server): () => Promise<void> {
+  const sockets = new Set<Socket>();
+  httpServer.on('connection', (socket: Socket) => {
+    sockets.add(socket);
+    socket.once('close', () => sockets.delete(socket));
+  });
+  return () =>
+    new Promise<void>((resolve) => {
+      for (const socket of sockets) {
+        socket.destroy();
+      }
+      sockets.clear();
+      httpServer.close(() => resolve());
+    });
+}
+
+function startMCPServer(): Promise<TestServer> {
+  const app = express();
+  app.use(express.json());
+
+  const transports: Record<string, StreamableHTTPServerTransport> = {};
+
+  function createServer(): McpServer {
+    const server = new McpServer({ name: 'test-server', version: '1.0.0' });
+    server.tool('echo', 'echoes input', { message: { type: 'string' } as never }, async (args) => {
+      const msg = (args as Record<string, string>).message ?? '';
+      return { content: [{ type: 'text', text: msg }] };
+    });
+    return server;
+  }
+
+  app.all('/mcp', async (req, res) => {
+    const sessionId = req.headers['mcp-session-id'] as string | undefined;
+
+    if (sessionId && transports[sessionId]) {
+      await transports[sessionId].handleRequest(req, res, req.body);
+      return;
+    }
+
+    if (!sessionId && isInitializeRequest(req.body)) {
+      const transport = new StreamableHTTPServerTransport({
+        sessionIdGenerator: () => randomUUID(),
+        onsessioninitialized: (sid) => {
+          transports[sid] = transport;
+        },
+      });
+      transport.onclose = () => {
+        const sid = transport.sessionId;
+        if (sid) {
+          delete transports[sid];
+        }
+      };
+      const server = createServer();
+      await server.connect(transport);
+      await transport.handleRequest(req, res, req.body);
+      return;
+    }
+
+    if (req.method === 'GET') {
+      res.status(404).send('Not Found');
+      return;
+    }
+
+    res.status(400).json({
+      jsonrpc: '2.0',
+      error: { code: -32000, message: 'Bad Request: No valid session ID provided' },
+      id: null,
+    });
+  });
+
+  return new Promise((resolve) => {
+    const httpServer = app.listen(0, '127.0.0.1', () => {
+      const destroySockets = trackSockets(httpServer);
+      const addr = httpServer.address() as { port: number };
+      resolve({
+        url: `http://127.0.0.1:${addr.port}/mcp`,
+        httpServer,
+        close: async () => {
+          for (const t of Object.values(transports)) {
+            t.close().catch(() => {});
+          }
+          await destroySockets();
+        },
+      });
+    });
+  });
+}
+
+function createConnection(serverName: string, url: string, initTimeout = 5000): MCPConnection {
+  return new MCPConnection({
+    serverName,
+    serverConfig: { url, type: 'streamable-http', initTimeout } as never,
+  });
+}
+
+async function teardownConnection(conn: MCPConnection): Promise<void> {
+  // eslint-disable-next-line @typescript-eslint/no-explicit-any
+  (conn as any).shouldStopReconnecting = true;
+  conn.removeAllListeners();
+  await conn.disconnect();
+}
+
+afterEach(() => {
+  // eslint-disable-next-line @typescript-eslint/no-explicit-any
+  (MCPConnection as any).circuitBreakers.clear();
+});
+
+/* ------------------------------------------------------------------ */
+/*  Fix #2 — Circuit breaker trips after rapid connect/disconnect      */
+/*  cycles (5 cycles within 60s -> 30s cooldown)                       */
+/* ------------------------------------------------------------------ */
+describe('Fix #2: Circuit breaker stops rapid reconnect cycling', () => {
+  it('blocks connection after 5 rapid cycles via static circuit breaker', async () => {
+    const srv = await startMCPServer();
+    const conn = createConnection('cycling-server', srv.url);
+
+    let completedCycles = 0;
+    let breakerMessage = '';
+    for (let cycle = 0; cycle < 10; cycle++) {
+      try {
+        await conn.connect();
+        await teardownConnection(conn);
+        // eslint-disable-next-line @typescript-eslint/no-explicit-any
+        (conn as any).shouldStopReconnecting = false;
+        completedCycles++;
+      } catch (e) {
+        breakerMessage = (e as Error).message;
+        break;
+      }
+    }
+
+    expect(breakerMessage).toContain('Circuit breaker is open');
+    expect(completedCycles).toBeLessThanOrEqual(5);
+
+    await srv.close();
+  });
+});
+
+/* ------------------------------------------------------------------ */
+/*  Fix #3 — SSE 400/405 handled in same branch as 404                */
+/* ------------------------------------------------------------------ */
+describe('Fix #3: SSE 400/405 handled in same branch as 404', () => {
+  it('400 with active session triggers reconnection (session lost)', async () => {
+    const srv = await startMCPServer();
+    const conn = createConnection('sse-400', srv.url);
+    await conn.connect();
+
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    (conn as any).shouldStopReconnecting = true;
+
+    const changes: string[] = [];
+    conn.on('connectionChange', (s: string) => changes.push(s));
+
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    const transport = (conn as any).transport;
+    transport.onerror({ message: 'Failed to open SSE stream', code: 400 });
+
+    expect(changes).toContain('error');
+
+    await teardownConnection(conn);
+    await srv.close();
+  });
+
+  it('405 with active session triggers reconnection (session lost)', async () => {
+    const srv = await startMCPServer();
+    const conn = createConnection('sse-405', srv.url);
+    await conn.connect();
+
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    (conn as any).shouldStopReconnecting = true;
+
+    const changes: string[] = [];
+    conn.on('connectionChange', (s: string) => changes.push(s));
+
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    const transport = (conn as any).transport;
+    transport.onerror({ message: 'Method Not Allowed', code: 405 });
+
+    expect(changes).toContain('error');
+
+    await teardownConnection(conn);
+    await srv.close();
+  });
+});
+
+/* ------------------------------------------------------------------ */
+/*  Fix #4 — Circuit breaker state persists in static Map across       */
+/*  instance replacements                                              */
+/* ------------------------------------------------------------------ */
+describe('Fix #4: Circuit breaker state persists across instance replacement', () => {
+  it('new MCPConnection for same serverName inherits breaker state from static Map', async () => {
+    const srv = await startMCPServer();
+
+    const conn1 = createConnection('replace', srv.url);
+    await conn1.connect();
+    await teardownConnection(conn1);
+
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    const cbAfterConn1 = (MCPConnection as any).circuitBreakers.get('replace');
+    expect(cbAfterConn1).toBeDefined();
+    const cyclesAfterConn1 = cbAfterConn1.cycleCount;
+    expect(cyclesAfterConn1).toBeGreaterThan(0);
+
+    const conn2 = createConnection('replace', srv.url);
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    const cbFromConn2 = (conn2 as any).getCircuitBreaker();
+    expect(cbFromConn2.cycleCount).toBe(cyclesAfterConn1);
+
+    await teardownConnection(conn2);
+    await srv.close();
+  });
+
+  it('clearCooldown resets static state so explicit retry proceeds', () => {
+    const conn = createConnection('replace', 'http://127.0.0.1:1/mcp');
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    const cb = (conn as any).getCircuitBreaker();
+    cb.cooldownUntil = Date.now() + 999_999;
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    expect((conn as any).isCircuitOpen()).toBe(true);
+
+    MCPConnection.clearCooldown('replace');
+
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    expect((conn as any).isCircuitOpen()).toBe(false);
+  });
+});
+
+/* ------------------------------------------------------------------ */
+/*  Fix #5 — Dead servers now trigger circuit breaker via              */
+/*  recordFailedRound() in the catch path                              */
+/* ------------------------------------------------------------------ */
+describe('Fix #5: Dead server triggers circuit breaker', () => {
+  it('3 failures trigger backoff, blocking subsequent attempts before they reach the SDK', async () => {
+    const conn = createConnection('dead', 'http://127.0.0.1:1/mcp', 1000);
+    const spy = jest.spyOn(conn.client, 'connect');
+
+    const errors: string[] = [];
+    for (let i = 0; i < 5; i++) {
+      try {
+        await conn.connect();
+      } catch (e) {
+        errors.push((e as Error).message);
+      }
+    }
+
+    expect(spy.mock.calls.length).toBe(3);
+    expect(errors).toHaveLength(5);
+    expect(errors.filter((m) => m.includes('Circuit breaker is open'))).toHaveLength(2);
+
+    await conn.disconnect();
+  });
+
+  it('user B is immediately blocked when user A already tripped the breaker for the same server', async () => {
+    const deadUrl = 'http://127.0.0.1:1/mcp';
+
+    const userA = new MCPConnection({
+      serverName: 'shared-dead',
+      serverConfig: { url: deadUrl, type: 'streamable-http', initTimeout: 1000 } as never,
+      userId: 'user-A',
+    });
+
+    for (let i = 0; i < 3; i++) {
+      try {
+        await userA.connect();
+      } catch {
+        // expected
+      }
+    }
+
+    const userB = new MCPConnection({
+      serverName: 'shared-dead',
+      serverConfig: { url: deadUrl, type: 'streamable-http', initTimeout: 1000 } as never,
+      userId: 'user-B',
+    });
+    const spyB = jest.spyOn(userB.client, 'connect');
+
+    let blockedMessage = '';
+    try {
+      await userB.connect();
+    } catch (e) {
+      blockedMessage = (e as Error).message;
+    }
+
+    expect(blockedMessage).toContain('Circuit breaker is open');
+    expect(spyB).toHaveBeenCalledTimes(0);
+
+    await userA.disconnect();
+    await userB.disconnect();
+  });
+
+  it('clearCooldown after user retry unblocks all users', async () => {
+    const deadUrl = 'http://127.0.0.1:1/mcp';
+
+    const userA = new MCPConnection({
+      serverName: 'shared-dead-clear',
+      serverConfig: { url: deadUrl, type: 'streamable-http', initTimeout: 1000 } as never,
+      userId: 'user-A',
+    });
+    for (let i = 0; i < 3; i++) {
+      try {
+        await userA.connect();
+      } catch {
+        // expected
+      }
+    }
+
+    const userB = new MCPConnection({
+      serverName: 'shared-dead-clear',
+      serverConfig: { url: deadUrl, type: 'streamable-http', initTimeout: 1000 } as never,
+      userId: 'user-B',
+    });
+    try {
+      await userB.connect();
+    } catch (e) {
+      expect((e as Error).message).toContain('Circuit breaker is open');
+    }
+
+    MCPConnection.clearCooldown('shared-dead-clear');
+
+    const spyB = jest.spyOn(userB.client, 'connect');
+    try {
+      await userB.connect();
+    } catch {
+      // expected — server is still dead
+    }
+
+    expect(spyB).toHaveBeenCalledTimes(1);
+
+    await userA.disconnect();
+    await userB.disconnect();
+  });
+});
+
+/* ------------------------------------------------------------------ */
+/*  Fix #5b — disconnect(false) preserves cycle tracking               */
+/* ------------------------------------------------------------------ */
+describe('Fix #5b: disconnect(false) preserves cycle tracking', () => {
+  it('connect() passes false to disconnect, which calls recordCycle()', async () => {
+    const srv = await startMCPServer();
+    const conn = createConnection('wipe', srv.url);
+    const spy = jest.spyOn(conn, 'disconnect');
+
+    await conn.connect();
+    expect(spy).toHaveBeenCalledWith(false);
+
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    const cb = (MCPConnection as any).circuitBreakers.get('wipe');
+    expect(cb).toBeDefined();
+    expect(cb.cycleCount).toBeGreaterThan(0);
+
+    await teardownConnection(conn);
+    await srv.close();
+  });
+});
+
+/* ------------------------------------------------------------------ */
+/*  Fix #6 — OAuth failure uses cooldown-based retry                   */
+/* ------------------------------------------------------------------ */
+describe('Fix #6: OAuth failure uses cooldown-based retry', () => {
+  beforeEach(() => jest.useFakeTimers());
+  afterEach(() => jest.useRealTimers());
+
+  it('isFailed expires after first cooldown of 5 min', () => {
+    jest.setSystemTime(Date.now());
+    const tracker = new OAuthReconnectionTracker();
+    tracker.setFailed('u1', 'srv');
+
+    expect(tracker.isFailed('u1', 'srv')).toBe(true);
+    jest.advanceTimersByTime(5 * 60 * 1000);
+    expect(tracker.isFailed('u1', 'srv')).toBe(false);
+  });
+
+  it('progressive cooldown: 5m, 10m, 20m, 30m (capped)', () => {
+    jest.setSystemTime(Date.now());
+    const tracker = new OAuthReconnectionTracker();
+
+    tracker.setFailed('u1', 'srv');
+    jest.advanceTimersByTime(5 * 60 * 1000);
+    expect(tracker.isFailed('u1', 'srv')).toBe(false);
+
+    tracker.setFailed('u1', 'srv');
+    jest.advanceTimersByTime(10 * 60 * 1000);
+    expect(tracker.isFailed('u1', 'srv')).toBe(false);
+
+    tracker.setFailed('u1', 'srv');
+    jest.advanceTimersByTime(20 * 60 * 1000);
+    expect(tracker.isFailed('u1', 'srv')).toBe(false);
+
+    tracker.setFailed('u1', 'srv');
+    jest.advanceTimersByTime(29 * 60 * 1000);
+    expect(tracker.isFailed('u1', 'srv')).toBe(true);
+    jest.advanceTimersByTime(1 * 60 * 1000);
+    expect(tracker.isFailed('u1', 'srv')).toBe(false);
+  });
+
+  it('removeFailed resets attempt count so next failure starts at 5m', () => {
+    jest.setSystemTime(Date.now());
+    const tracker = new OAuthReconnectionTracker();
+
+    tracker.setFailed('u1', 'srv');
+    tracker.setFailed('u1', 'srv');
+    tracker.setFailed('u1', 'srv');
+    tracker.removeFailed('u1', 'srv');
+
+    tracker.setFailed('u1', 'srv');
+    jest.advanceTimersByTime(5 * 60 * 1000);
+    expect(tracker.isFailed('u1', 'srv')).toBe(false);
+  });
+});
+
+/* ------------------------------------------------------------------ */
+/*  Integration: Circuit breaker caps rapid cycling with real transport */
+/* ------------------------------------------------------------------ */
+describe('Cascade: Circuit breaker caps rapid cycling', () => {
+  it('breaker trips before 10 cycles complete against a live server', async () => {
+    const srv = await startMCPServer();
+    const conn = createConnection('cascade', srv.url);
+    const spy = jest.spyOn(conn.client, 'connect');
+
+    let completedCycles = 0;
+    for (let i = 0; i < 10; i++) {
+      try {
+        await conn.connect();
+        await teardownConnection(conn);
+        // eslint-disable-next-line @typescript-eslint/no-explicit-any
+        (conn as any).shouldStopReconnecting = false;
+        completedCycles++;
+      } catch (e) {
+        if ((e as Error).message.includes('Circuit breaker is open')) {
+          break;
+        }
+        throw e;
+      }
+    }
+
+    expect(completedCycles).toBeLessThanOrEqual(5);
+    expect(spy.mock.calls.length).toBeLessThanOrEqual(5);
+
+    await srv.close();
+  });
+
+  it('breaker bounds failures against a killed server', async () => {
+    const srv = await startMCPServer();
+    const conn = createConnection('cascade-die', srv.url, 2000);
+
+    await conn.connect();
+    await teardownConnection(conn);
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    (conn as any).shouldStopReconnecting = false;
+    await srv.close();
+
+    let breakerTripped = false;
+    for (let i = 0; i < 10; i++) {
+      try {
+        await conn.connect();
+      } catch (e) {
+        if ((e as Error).message.includes('Circuit breaker is open')) {
+          breakerTripped = true;
+          break;
+        }
+      }
+    }
+
+    expect(breakerTripped).toBe(true);
+  }, 30_000);
+});
+
+/* ------------------------------------------------------------------ */
+/*  Sanity: Real transport works end-to-end                            */
+/* ------------------------------------------------------------------ */
+describe('Sanity: Real MCP SDK transport works correctly', () => {
+  it('connects, lists tools, and disconnects cleanly', async () => {
+    const srv = await startMCPServer();
+    const conn = createConnection('sanity', srv.url);
+
+    await conn.connect();
+    expect(await conn.isConnected()).toBe(true);
+
+    const tools = await conn.fetchTools();
+    expect(tools).toEqual(expect.arrayContaining([expect.objectContaining({ name: 'echo' })]));
+
+    await teardownConnection(conn);
+    await srv.close();
+  });
+});