🧮 refactor: Bulk Transactions & Balance Updates for Token Spending (#11996)

* refactor: transaction handling by integrating pricing and bulk write operations

- Updated `recordCollectedUsage` to accept pricing functions and bulk write operations, improving transaction management (a sketch of the injected deps follows this list).
- Refactored `AgentClient` and related controllers to use the new transaction handling capabilities, improving performance and accuracy of token spending.
- Added tests to validate the new functionality, ensuring correct behavior for both standard and bulk transaction paths.
- Introduced a new `transactions.ts` file to encapsulate transaction-related logic and types, enhancing code organization and maintainability.
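A rough sketch of the dependency object now threaded into `recordCollectedUsage` (the property names mirror the test assertions later in this commit; the exact signatures are assumptions):

```ts
// Rough shape of the injected deps (names match the tests below; signatures are assumed).
interface RecordUsageDeps {
  spendTokens: (
    txData: Record<string, unknown>,
    tokens: { promptTokens: number; completionTokens: number },
  ) => Promise<void>;
  spendStructuredTokens: (
    txData: Record<string, unknown>,
    tokens: { promptTokens: { input: number; write: number; read: number }; completionTokens: number },
  ) => Promise<void>;
  pricing: {
    getMultiplier: (params: { model?: string; tokenType: string; valueKey?: string }) => number;
    getCacheMultiplier: (params: { model?: string; cacheType: string }) => number | null;
  };
  bulkWriteOps: {
    insertMany: (docs: unknown[]) => Promise<unknown>;
    updateBalance: (params: { user: string; incrementValue: number }) => Promise<unknown>;
  };
}
```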

* chore: reorganize imports in agents client controller

- Moved `getMultiplier` and `getCacheMultiplier` imports to maintain consistency and clarity in the import structure.
- Removed duplicate import of `updateBalance` and `bulkInsertTransactions`, streamlining the code for better readability.

* refactor: add TransactionData type and CANCEL_RATE constant to data-schemas

Establishes a single source of truth for the transaction document shape
and the incomplete-context billing rate constant, both consumed by
packages/api and api/.
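An illustrative sketch of the shared definitions (the `CANCEL_RATE` value and any `TransactionData` fields beyond those referenced elsewhere in this commit are assumptions):

```ts
// packages/data-schemas (sketch): single source of truth for both packages/api and api/.
export const CANCEL_RATE = 1.15; // hypothetical value; the real constant is whatever data-schemas exports

export interface TransactionData {
  user: string;
  conversationId?: string;
  model?: string;
  context?: string;
  tokenType: 'prompt' | 'completion';
  rawAmount?: number;
  rate?: number;
  tokenValue?: number;
  valueKey?: string;
}
```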

* refactor: use proper types in data-schemas transaction methods

- Replace `as unknown as { tokenCredits }` with `lean<IBalance>()` (see the sketch after this list)
- Use `TransactionData[]` instead of `Record<string, unknown>[]`
  for bulkInsertTransactions parameter
- Add JSDoc noting insertMany bypasses document middleware
- Remove orphan section comment in methods/index.ts
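As referenced above, a sketch of the typed methods (the import paths and model declarations are assumptions; only the shapes matter here):

```ts
import type { Model } from 'mongoose';
import type { TransactionData } from '@librechat/data-schemas'; // assumed package/export path

// Illustrative only: minimal balance shape and ambient model declarations.
interface IBalance {
  user: string;
  tokenCredits: number;
  lastRefill?: Date;
}
declare const Balance: Model<IBalance>;
declare const Transaction: Model<TransactionData>;

/**
 * Note: Model.insertMany() bypasses document middleware (pre/post save hooks),
 * so any balance side effects must be handled explicitly by the caller.
 */
async function bulkInsertTransactions(transactions: TransactionData[]) {
  return Transaction.insertMany(transactions);
}

async function getTokenCredits(user: string): Promise<number> {
  // lean<IBalance>() replaces the old `as unknown as { tokenCredits }` cast
  const balance = await Balance.findOne({ user }).lean<IBalance>();
  return balance?.tokenCredits ?? 0;
}
```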

* refactor: use shared types in transactions.ts, fix bulk write logic

- Import CANCEL_RATE from data-schemas instead of local duplicate
- Import TransactionData from data-schemas for PreparedEntry/BulkWriteDeps
- Use tilde alias for EndpointTokenConfig import
- Pass valueKey through to getMultiplier
- Only sum tokenValue for balance-enabled docs in bulkWriteTransactions
- Consolidate two loops into a single-pass map (sketched below)
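A sketch of that single-pass consolidation (type names follow the bullets above; the real code in transactions.ts differs):

```ts
import type { TransactionData } from '@librechat/data-schemas'; // assumed export path

// PreparedEntry is a sketch of the per-usage-entry output of prepareTokenSpend / prepareStructuredTokenSpend.
interface PreparedEntry {
  txData: TransactionData;
  balanceEnabled: boolean;
}

function collectForBulkWrite(entries: PreparedEntry[]) {
  const docs: TransactionData[] = [];
  let balanceDelta = 0;
  for (const entry of entries) {
    docs.push(entry.txData);
    // Only balance-enabled docs contribute to the balance deduction.
    const value = entry.txData.tokenValue;
    if (entry.balanceEnabled && typeof value === 'number' && !Number.isNaN(value)) {
      balanceDelta += value;
    }
  }
  return { docs, balanceDelta };
}
```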

* refactor: remove duplicate updateBalance from Transaction.js

Import updateBalance from ~/models (sourced from data-schemas) instead
of maintaining a second copy. Also import CANCEL_RATE from data-schemas
and remove the Balance model import (no longer needed directly).

* fix: test real spendCollectedUsage instead of IIFE replica

Export spendCollectedUsage from abortMiddleware.js and rewrite the test
file to import and test the actual function. Previously the tests ran
against a hand-written replica that could silently diverge from the real
implementation.
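In practice the pattern looks roughly like this (module path and fixtures are assumptions):

```ts
// abortMiddleware.js (sketch): spendCollectedUsage is exported alongside the existing exports,
// e.g. module.exports = { /* ...existing exports... */ spendCollectedUsage };
// The spec then imports and exercises that export directly instead of redefining it.
import { spendCollectedUsage } from '~/server/middleware/abortMiddleware';

it('tests the real spendCollectedUsage export, not a hand-written replica', () => {
  expect(typeof spendCollectedUsage).toBe('function');
});
```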

* test: add transactions.spec.ts and restore regression comments

Add 22 direct unit tests for transactions.ts financial logic covering
prepareTokenSpend, prepareStructuredTokenSpend, bulkWriteTransactions,
CANCEL_RATE paths, NaN guards, disabled transactions, zero tokens,
cache multipliers, and balance-enabled filtering.

Restore critical regression documentation comments in
recordCollectedUsage.spec.js explaining which production bugs the
tests guard against.

* fix: widen setValues type to include lastRefill

The UpdateBalanceParams.setValues type was Partial<Pick<IBalance,
'tokenCredits'>> which excluded lastRefill — used by
createAutoRefillTransaction. Widen to also pick 'lastRefill'.
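The widened type, roughly (surrounding fields are illustrative):

```ts
// Minimal illustrative balance shape.
interface IBalance {
  user: string;
  tokenCredits: number;
  lastRefill?: Date;
}

// Before: setValues?: Partial<Pick<IBalance, 'tokenCredits'>> (lastRefill was not assignable).
// After: lastRefill is included, so createAutoRefillTransaction can set it.
interface UpdateBalanceParams {
  user: string;
  incrementValue?: number; // assumed companion field
  setValues?: Partial<Pick<IBalance, 'tokenCredits' | 'lastRefill'>>;
}
```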

* test: use real MongoDB for bulkWriteTransactions tests

Replace mock-based bulkWriteTransactions tests with real DB tests using
MongoMemoryServer. Pure function tests (prepareTokenSpend,
prepareStructuredTokenSpend) remain mock-based since they don't touch
DB. Add end-to-end integration tests that verify the full prepare →
bulk write → DB state pipeline with real Transaction and Balance models.
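A typical mongodb-memory-server harness for such tests (a generic sketch, not the exact spec setup):

```ts
import mongoose from 'mongoose';
import { MongoMemoryServer } from 'mongodb-memory-server';

let mongoServer: MongoMemoryServer;

beforeAll(async () => {
  // Spin up an in-memory MongoDB instance and point mongoose at it.
  mongoServer = await MongoMemoryServer.create();
  await mongoose.connect(mongoServer.getUri());
});

afterAll(async () => {
  await mongoose.disconnect();
  await mongoServer.stop();
});
```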

* chore: update @librechat/agents dependency to version 3.1.54 in package-lock.json and related package.json files

* test: add bulk path parity tests proving identical DB outcomes

Three test suites proving the bulk path (prepareTokenSpend/
prepareStructuredTokenSpend + bulkWriteTransactions) produces
numerically identical results to the legacy path for all scenarios:

- usage.bulk-parity.spec.ts: mirrors all legacy recordCollectedUsage
  tests; asserts same return values and verifies metadata fields on
  the insertMany docs match what spendTokens args would carry

- transactions.bulk-parity.spec.ts: real-DB tests using actual
  getMultiplier/getCacheMultiplier pricing functions; asserts exact
  tokenValue, rate, rawAmount and balance deductions for standard
  tokens, structured/cache tokens, CANCEL_RATE, premium pricing,
  multi-entry batches, and edge cases (NaN, zero, disabled)

- Transaction.spec.js: adds describe('Bulk path parity') that mirrors
  7 key legacy tests via recordCollectedUsage + bulk deps against
  real MongoDB, asserting same balance deductions and doc counts

* refactor: update llmConfig structure to use modelKwargs for reasoning effort

Refactor the llmConfig in getOpenAILLMConfig to store reasoning effort within modelKwargs instead of directly on llmConfig. This change ensures consistency in the configuration structure and improves clarity in the handling of reasoning properties in the tests.
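Illustrative before/after of the returned config (the model name and exact field spellings are assumptions):

```ts
// Before (sketch): reasoning effort set directly on llmConfig
const before = { model: 'o3-mini', reasoning_effort: 'high' };

// After (sketch): reasoning effort carried inside modelKwargs
const after = { model: 'o3-mini', modelKwargs: { reasoning_effort: 'high' } };
```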

* test: update performance checks in processAssistantMessage tests

Revise the performance assertions in the processAssistantMessage tests to check that each message is processed in under 100ms, guarding against potential ReDoS regressions. The tests now enforce an absolute ceiling on processing time rather than relative ratios between runs, which makes them more reliable.
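The assertion style described above, as a sketch (the 100ms ceiling comes from the commit; the function declaration and fixture are placeholders):

```ts
declare function processAssistantMessage(message: { content: string }): unknown; // placeholder for the real import

it('processes each message in under 100ms (ReDoS guard)', () => {
  const message = { content: 'a'.repeat(10_000) }; // hypothetical long input
  const start = Date.now();
  processAssistantMessage(message);
  // Absolute per-message ceiling, rather than a ratio between runs.
  expect(Date.now() - start).toBeLessThan(100);
});
```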

* test: fill parity test gaps — model fallback, abort context, structured edge cases

- usage.bulk-parity: add undefined model fallback test
- transactions.bulk-parity: add abort context test (txns inserted,
  balance unchanged when balance not passed), fix readTokens type cast
- Transaction.spec: add 3 missing mirrors — balance disabled with
  transactions enabled, structured transactions disabled, structured
  balance disabled

* fix: deduct balance before inserting transactions to prevent orphaned docs

Swap the order in bulkWriteTransactions: updateBalance runs before
insertMany. If updateBalance fails (after exhausting retries), no
transaction documents are written — avoiding the inconsistent state
where transactions exist in MongoDB with no corresponding balance
deduction.
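The resulting order, sketched (names follow the earlier deps example; signatures are assumptions):

```ts
import type { TransactionData } from '@librechat/data-schemas'; // assumed export path

declare const updateBalance: (params: { user: string; incrementValue: number }) => Promise<unknown>;
declare const insertMany: (docs: TransactionData[]) => Promise<unknown>;

async function bulkWriteTransactions(user: string, docs: TransactionData[], balanceDelta: number) {
  if (balanceDelta !== 0) {
    // Runs first: if this throws after exhausting retries, no transaction docs are written,
    // so transactions never exist in MongoDB without a matching balance deduction.
    await updateBalance({ user, incrementValue: balanceDelta });
  }
  await insertMany(docs);
}
```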

* chore: import order

* test: update config.spec.ts for OpenRouter reasoning in modelKwargs

Same fix as llm.spec.ts — OpenRouter reasoning is now passed via
modelKwargs instead of llmConfig.reasoning directly.
Danny Avila, 2026-03-01 12:26:36 -05:00, committed via GitHub (commit e1e204d6cf, parent 0e5ee379b3)
29 changed files with 3004 additions and 1070 deletions


@@ -2,23 +2,37 @@
* Tests for AgentClient.recordCollectedUsage
*
* This is a critical function that handles token spending for agent LLM calls.
* It must correctly handle:
* - Sequential execution (single agent with tool calls)
* - Parallel execution (multiple agents with independent inputs)
* - Cache token handling (OpenAI and Anthropic formats)
* The client now delegates to the TS recordCollectedUsage from @librechat/api,
* passing pricing and bulkWriteOps deps.
*/
const { EModelEndpoint } = require('librechat-data-provider');
// Mock dependencies before requiring the module
const mockSpendTokens = jest.fn().mockResolvedValue();
const mockSpendStructuredTokens = jest.fn().mockResolvedValue();
const mockGetMultiplier = jest.fn().mockReturnValue(1);
const mockGetCacheMultiplier = jest.fn().mockReturnValue(null);
const mockUpdateBalance = jest.fn().mockResolvedValue({});
const mockBulkInsertTransactions = jest.fn().mockResolvedValue(undefined);
const mockRecordCollectedUsage = jest
.fn()
.mockResolvedValue({ input_tokens: 100, output_tokens: 50 });
jest.mock('~/models/spendTokens', () => ({
spendTokens: (...args) => mockSpendTokens(...args),
spendStructuredTokens: (...args) => mockSpendStructuredTokens(...args),
}));
jest.mock('~/models/tx', () => ({
getMultiplier: mockGetMultiplier,
getCacheMultiplier: mockGetCacheMultiplier,
}));
jest.mock('~/models', () => ({
updateBalance: mockUpdateBalance,
bulkInsertTransactions: mockBulkInsertTransactions,
}));
jest.mock('~/config', () => ({
logger: {
debug: jest.fn(),
@@ -39,6 +53,14 @@ jest.mock('@librechat/agents', () => ({
}),
}));
jest.mock('@librechat/api', () => {
const actual = jest.requireActual('@librechat/api');
return {
...actual,
recordCollectedUsage: (...args) => mockRecordCollectedUsage(...args),
};
});
const AgentClient = require('./client');
describe('AgentClient - recordCollectedUsage', () => {
@@ -74,31 +96,66 @@ describe('AgentClient - recordCollectedUsage', () => {
});
describe('basic functionality', () => {
it('should return early if collectedUsage is empty', async () => {
it('should delegate to recordCollectedUsage with full deps', async () => {
const collectedUsage = [{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' }];
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockRecordCollectedUsage).toHaveBeenCalledTimes(1);
const [deps, params] = mockRecordCollectedUsage.mock.calls[0];
expect(deps).toHaveProperty('spendTokens');
expect(deps).toHaveProperty('spendStructuredTokens');
expect(deps).toHaveProperty('pricing');
expect(deps.pricing).toHaveProperty('getMultiplier');
expect(deps.pricing).toHaveProperty('getCacheMultiplier');
expect(deps).toHaveProperty('bulkWriteOps');
expect(deps.bulkWriteOps).toHaveProperty('insertMany');
expect(deps.bulkWriteOps).toHaveProperty('updateBalance');
expect(params).toEqual(
expect.objectContaining({
user: 'user-123',
conversationId: 'convo-123',
collectedUsage,
context: 'message',
balance: { enabled: true },
transactions: { enabled: true },
}),
);
});
it('should not set this.usage if collectedUsage is empty (returns undefined)', async () => {
mockRecordCollectedUsage.mockResolvedValue(undefined);
await client.recordCollectedUsage({
collectedUsage: [],
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockSpendTokens).not.toHaveBeenCalled();
expect(mockSpendStructuredTokens).not.toHaveBeenCalled();
expect(client.usage).toBeUndefined();
});
it('should return early if collectedUsage is null', async () => {
it('should not set this.usage if collectedUsage is null (returns undefined)', async () => {
mockRecordCollectedUsage.mockResolvedValue(undefined);
await client.recordCollectedUsage({
collectedUsage: null,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockSpendTokens).not.toHaveBeenCalled();
expect(client.usage).toBeUndefined();
});
it('should handle single usage entry correctly', async () => {
const collectedUsage = [{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' }];
it('should set this.usage from recordCollectedUsage result', async () => {
mockRecordCollectedUsage.mockResolvedValue({ input_tokens: 200, output_tokens: 75 });
const collectedUsage = [{ input_tokens: 200, output_tokens: 75, model: 'gpt-4' }];
await client.recordCollectedUsage({
collectedUsage,
@@ -106,521 +163,122 @@ describe('AgentClient - recordCollectedUsage', () => {
transactions: { enabled: true },
});
expect(mockSpendTokens).toHaveBeenCalledTimes(1);
expect(mockSpendTokens).toHaveBeenCalledWith(
expect.objectContaining({
conversationId: 'convo-123',
user: 'user-123',
model: 'gpt-4',
}),
{ promptTokens: 100, completionTokens: 50 },
);
expect(client.usage.input_tokens).toBe(100);
expect(client.usage.output_tokens).toBe(50);
});
it('should skip null entries in collectedUsage', async () => {
const collectedUsage = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
null,
{ input_tokens: 200, output_tokens: 60, model: 'gpt-4' },
];
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockSpendTokens).toHaveBeenCalledTimes(2);
expect(client.usage).toEqual({ input_tokens: 200, output_tokens: 75 });
});
});
describe('sequential execution (single agent with tool calls)', () => {
it('should calculate tokens correctly for sequential tool calls', async () => {
// Sequential flow: output of call N becomes part of input for call N+1
// Call 1: input=100, output=50
// Call 2: input=150 (100+50), output=30
// Call 3: input=180 (150+30), output=20
it('should pass all usage entries to recordCollectedUsage', async () => {
const collectedUsage = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{ input_tokens: 150, output_tokens: 30, model: 'gpt-4' },
{ input_tokens: 180, output_tokens: 20, model: 'gpt-4' },
];
mockRecordCollectedUsage.mockResolvedValue({ input_tokens: 100, output_tokens: 100 });
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockSpendTokens).toHaveBeenCalledTimes(3);
// Total output should be sum of all output_tokens: 50 + 30 + 20 = 100
expect(mockRecordCollectedUsage).toHaveBeenCalledTimes(1);
const [, params] = mockRecordCollectedUsage.mock.calls[0];
expect(params.collectedUsage).toHaveLength(3);
expect(client.usage.output_tokens).toBe(100);
expect(client.usage.input_tokens).toBe(100); // First entry's input
expect(client.usage.input_tokens).toBe(100);
});
});
describe('parallel execution (multiple agents)', () => {
it('should handle parallel agents with independent input tokens', async () => {
// Parallel agents have INDEPENDENT input tokens (not cumulative)
// Agent A: input=100, output=50
// Agent B: input=80, output=40 (different context, not 100+50)
it('should pass parallel agent usage to recordCollectedUsage', async () => {
const collectedUsage = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{ input_tokens: 80, output_tokens: 40, model: 'gpt-4' },
];
mockRecordCollectedUsage.mockResolvedValue({ input_tokens: 100, output_tokens: 90 });
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockSpendTokens).toHaveBeenCalledTimes(2);
// Expected total output: 50 + 40 = 90
// output_tokens must be positive and should reflect total output
expect(mockRecordCollectedUsage).toHaveBeenCalledTimes(1);
expect(client.usage.output_tokens).toBe(90);
expect(client.usage.output_tokens).toBeGreaterThan(0);
});
it('should NOT produce negative output_tokens for parallel execution', async () => {
// Critical bug scenario: parallel agents where second agent has LOWER input tokens
/** Bug regression: parallel agents where second agent has LOWER input tokens produced negative output via incremental calculation. */
it('should NOT produce negative output_tokens', async () => {
const collectedUsage = [
{ input_tokens: 200, output_tokens: 100, model: 'gpt-4' },
{ input_tokens: 50, output_tokens: 30, model: 'gpt-4' },
];
mockRecordCollectedUsage.mockResolvedValue({ input_tokens: 200, output_tokens: 130 });
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
// output_tokens MUST be positive for proper token tracking
expect(client.usage.output_tokens).toBeGreaterThan(0);
// Correct value should be 100 + 30 = 130
});
it('should calculate correct total output for parallel agents', async () => {
// Three parallel agents with independent contexts
const collectedUsage = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{ input_tokens: 120, output_tokens: 60, model: 'gpt-4-turbo' },
{ input_tokens: 80, output_tokens: 40, model: 'claude-3' },
];
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockSpendTokens).toHaveBeenCalledTimes(3);
// Total output should be 50 + 60 + 40 = 150
expect(client.usage.output_tokens).toBe(150);
});
it('should handle worst-case parallel scenario without negative tokens', async () => {
// Extreme case: first agent has very high input, subsequent have low
const collectedUsage = [
{ input_tokens: 1000, output_tokens: 500, model: 'gpt-4' },
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{ input_tokens: 50, output_tokens: 25, model: 'gpt-4' },
];
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
// Must be positive, should be 500 + 50 + 25 = 575
expect(client.usage.output_tokens).toBeGreaterThan(0);
expect(client.usage.output_tokens).toBe(575);
expect(client.usage.output_tokens).toBe(130);
});
});
describe('real-world scenarios', () => {
it('should correctly sum output tokens for sequential tool calls with growing context', async () => {
// Real production data: Claude Opus with multiple tool calls
// Context grows as tool results are added, but output_tokens should only count model generations
it('should correctly handle sequential tool calls with growing context', async () => {
const collectedUsage = [
{
input_tokens: 31596,
output_tokens: 151,
total_tokens: 31747,
input_token_details: { cache_read: 0, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 35368,
output_tokens: 150,
total_tokens: 35518,
input_token_details: { cache_read: 0, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 58362,
output_tokens: 295,
total_tokens: 58657,
input_token_details: { cache_read: 0, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 112604,
output_tokens: 193,
total_tokens: 112797,
input_token_details: { cache_read: 0, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 257440,
output_tokens: 2217,
total_tokens: 259657,
input_token_details: { cache_read: 0, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
{ input_tokens: 31596, output_tokens: 151, model: 'claude-opus-4-5-20251101' },
{ input_tokens: 35368, output_tokens: 150, model: 'claude-opus-4-5-20251101' },
{ input_tokens: 58362, output_tokens: 295, model: 'claude-opus-4-5-20251101' },
{ input_tokens: 112604, output_tokens: 193, model: 'claude-opus-4-5-20251101' },
{ input_tokens: 257440, output_tokens: 2217, model: 'claude-opus-4-5-20251101' },
];
mockRecordCollectedUsage.mockResolvedValue({ input_tokens: 31596, output_tokens: 3006 });
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
// input_tokens should be first entry's input (initial context)
expect(client.usage.input_tokens).toBe(31596);
// output_tokens should be sum of all model outputs: 151 + 150 + 295 + 193 + 2217 = 3006
// NOT the inflated value from incremental calculation (338,559)
expect(client.usage.output_tokens).toBe(3006);
// Verify spendTokens was called for each entry with correct values
expect(mockSpendTokens).toHaveBeenCalledTimes(5);
expect(mockSpendTokens).toHaveBeenNthCalledWith(
1,
expect.objectContaining({ model: 'claude-opus-4-5-20251101' }),
{ promptTokens: 31596, completionTokens: 151 },
);
expect(mockSpendTokens).toHaveBeenNthCalledWith(
5,
expect.objectContaining({ model: 'claude-opus-4-5-20251101' }),
{ promptTokens: 257440, completionTokens: 2217 },
);
});
it('should handle single followup message correctly', async () => {
// Real production data: followup to the above conversation
const collectedUsage = [
{
input_tokens: 263406,
output_tokens: 257,
total_tokens: 263663,
input_token_details: { cache_read: 0, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
];
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(client.usage.input_tokens).toBe(263406);
expect(client.usage.output_tokens).toBe(257);
expect(mockSpendTokens).toHaveBeenCalledTimes(1);
expect(mockSpendTokens).toHaveBeenCalledWith(
expect.objectContaining({ model: 'claude-opus-4-5-20251101' }),
{ promptTokens: 263406, completionTokens: 257 },
);
});
it('should ensure output_tokens > 0 check passes for BaseClient.sendMessage', async () => {
// This verifies the fix for the duplicate token spending bug
// BaseClient.sendMessage checks: if (usage != null && Number(usage[this.outputTokensKey]) > 0)
const collectedUsage = [
{
input_tokens: 31596,
output_tokens: 151,
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 35368,
output_tokens: 150,
model: 'claude-opus-4-5-20251101',
},
];
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
const usage = client.getStreamUsage();
// The check that was failing before the fix
expect(usage).not.toBeNull();
expect(Number(usage.output_tokens)).toBeGreaterThan(0);
// Verify correct value
expect(usage.output_tokens).toBe(301); // 151 + 150
});
it('should correctly handle cache tokens with multiple tool calls', async () => {
// Real production data: Claude Opus with cache tokens (prompt caching)
// First entry has cache_creation, subsequent entries have cache_read
it('should correctly handle cache tokens', async () => {
const collectedUsage = [
{
input_tokens: 788,
output_tokens: 163,
total_tokens: 951,
input_token_details: { cache_read: 0, cache_creation: 30808 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 3802,
output_tokens: 149,
total_tokens: 3951,
input_token_details: { cache_read: 30808, cache_creation: 768 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 26808,
output_tokens: 225,
total_tokens: 27033,
input_token_details: { cache_read: 31576, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 80912,
output_tokens: 204,
total_tokens: 81116,
input_token_details: { cache_read: 31576, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 136454,
output_tokens: 206,
total_tokens: 136660,
input_token_details: { cache_read: 31576, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 146316,
output_tokens: 224,
total_tokens: 146540,
input_token_details: { cache_read: 31576, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 150402,
output_tokens: 1248,
total_tokens: 151650,
input_token_details: { cache_read: 31576, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 156268,
output_tokens: 139,
total_tokens: 156407,
input_token_details: { cache_read: 31576, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
{
input_tokens: 167126,
output_tokens: 2961,
total_tokens: 170087,
input_token_details: { cache_read: 31576, cache_creation: 0 },
model: 'claude-opus-4-5-20251101',
},
];
mockRecordCollectedUsage.mockResolvedValue({ input_tokens: 31596, output_tokens: 163 });
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
// input_tokens = first entry's input + cache_creation + cache_read
// = 788 + 30808 + 0 = 31596
expect(client.usage.input_tokens).toBe(31596);
// output_tokens = sum of all output_tokens
// = 163 + 149 + 225 + 204 + 206 + 224 + 1248 + 139 + 2961 = 5519
expect(client.usage.output_tokens).toBe(5519);
// First 2 entries have cache tokens, should use spendStructuredTokens
// Remaining 7 entries have cache_read but no cache_creation, still structured
expect(mockSpendStructuredTokens).toHaveBeenCalledTimes(9);
expect(mockSpendTokens).toHaveBeenCalledTimes(0);
// Verify first entry uses structured tokens with cache_creation
expect(mockSpendStructuredTokens).toHaveBeenNthCalledWith(
1,
expect.objectContaining({ model: 'claude-opus-4-5-20251101' }),
{
promptTokens: { input: 788, write: 30808, read: 0 },
completionTokens: 163,
},
);
// Verify second entry uses structured tokens with both cache_creation and cache_read
expect(mockSpendStructuredTokens).toHaveBeenNthCalledWith(
2,
expect.objectContaining({ model: 'claude-opus-4-5-20251101' }),
{
promptTokens: { input: 3802, write: 768, read: 30808 },
completionTokens: 149,
},
);
});
});
describe('cache token handling', () => {
it('should handle OpenAI format cache tokens (input_token_details)', async () => {
const collectedUsage = [
{
input_tokens: 100,
output_tokens: 50,
model: 'gpt-4',
input_token_details: {
cache_creation: 20,
cache_read: 10,
},
},
];
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockSpendStructuredTokens).toHaveBeenCalledTimes(1);
expect(mockSpendStructuredTokens).toHaveBeenCalledWith(
expect.objectContaining({ model: 'gpt-4' }),
{
promptTokens: {
input: 100,
write: 20,
read: 10,
},
completionTokens: 50,
},
);
});
it('should handle Anthropic format cache tokens (cache_*_input_tokens)', async () => {
const collectedUsage = [
{
input_tokens: 100,
output_tokens: 50,
model: 'claude-3',
cache_creation_input_tokens: 25,
cache_read_input_tokens: 15,
},
];
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockSpendStructuredTokens).toHaveBeenCalledTimes(1);
expect(mockSpendStructuredTokens).toHaveBeenCalledWith(
expect.objectContaining({ model: 'claude-3' }),
{
promptTokens: {
input: 100,
write: 25,
read: 15,
},
completionTokens: 50,
},
);
});
it('should use spendTokens for entries without cache tokens', async () => {
const collectedUsage = [{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' }];
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockSpendTokens).toHaveBeenCalledTimes(1);
expect(mockSpendStructuredTokens).not.toHaveBeenCalled();
});
it('should handle mixed cache and non-cache entries', async () => {
const collectedUsage = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{
input_tokens: 150,
output_tokens: 30,
model: 'gpt-4',
input_token_details: { cache_creation: 10, cache_read: 5 },
},
{ input_tokens: 200, output_tokens: 20, model: 'gpt-4' },
];
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockSpendTokens).toHaveBeenCalledTimes(2);
expect(mockSpendStructuredTokens).toHaveBeenCalledTimes(1);
});
it('should include cache tokens in total input calculation', async () => {
const collectedUsage = [
{
input_tokens: 100,
output_tokens: 50,
model: 'gpt-4',
input_token_details: {
cache_creation: 20,
cache_read: 10,
},
},
];
await client.recordCollectedUsage({
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
// Total input should include cache tokens: 100 + 20 + 10 = 130
expect(client.usage.input_tokens).toBe(130);
expect(client.usage.output_tokens).toBe(163);
});
});
describe('model fallback', () => {
it('should use usage.model when available', async () => {
const collectedUsage = [{ input_tokens: 100, output_tokens: 50, model: 'gpt-4-turbo' }];
await client.recordCollectedUsage({
model: 'fallback-model',
collectedUsage,
balance: { enabled: true },
transactions: { enabled: true },
});
expect(mockSpendTokens).toHaveBeenCalledWith(
expect.objectContaining({ model: 'gpt-4-turbo' }),
expect.any(Object),
);
});
it('should fallback to param model when usage.model is missing', async () => {
it('should use param model when available', async () => {
mockRecordCollectedUsage.mockResolvedValue({ input_tokens: 100, output_tokens: 50 });
const collectedUsage = [{ input_tokens: 100, output_tokens: 50 }];
await client.recordCollectedUsage({
@@ -630,14 +288,13 @@ describe('AgentClient - recordCollectedUsage', () => {
transactions: { enabled: true },
});
expect(mockSpendTokens).toHaveBeenCalledWith(
expect.objectContaining({ model: 'param-model' }),
expect.any(Object),
);
const [, params] = mockRecordCollectedUsage.mock.calls[0];
expect(params.model).toBe('param-model');
});
it('should fallback to client.model when param model is missing', async () => {
client.model = 'client-model';
mockRecordCollectedUsage.mockResolvedValue({ input_tokens: 100, output_tokens: 50 });
const collectedUsage = [{ input_tokens: 100, output_tokens: 50 }];
await client.recordCollectedUsage({
@@ -646,13 +303,12 @@ describe('AgentClient - recordCollectedUsage', () => {
transactions: { enabled: true },
});
expect(mockSpendTokens).toHaveBeenCalledWith(
expect.objectContaining({ model: 'client-model' }),
expect.any(Object),
);
const [, params] = mockRecordCollectedUsage.mock.calls[0];
expect(params.model).toBe('client-model');
});
it('should fallback to agent model_parameters.model as last resort', async () => {
mockRecordCollectedUsage.mockResolvedValue({ input_tokens: 100, output_tokens: 50 });
const collectedUsage = [{ input_tokens: 100, output_tokens: 50 }];
await client.recordCollectedUsage({
@@ -661,15 +317,14 @@ describe('AgentClient - recordCollectedUsage', () => {
transactions: { enabled: true },
});
expect(mockSpendTokens).toHaveBeenCalledWith(
expect.objectContaining({ model: 'gpt-4' }),
expect.any(Object),
);
const [, params] = mockRecordCollectedUsage.mock.calls[0];
expect(params.model).toBe('gpt-4');
});
});
describe('getStreamUsage integration', () => {
it('should return the usage object set by recordCollectedUsage', async () => {
mockRecordCollectedUsage.mockResolvedValue({ input_tokens: 100, output_tokens: 50 });
const collectedUsage = [{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' }];
await client.recordCollectedUsage({
@@ -679,10 +334,7 @@ describe('AgentClient - recordCollectedUsage', () => {
});
const usage = client.getStreamUsage();
expect(usage).toEqual({
input_tokens: 100,
output_tokens: 50,
});
expect(usage).toEqual({ input_tokens: 100, output_tokens: 50 });
});
it('should return undefined before recordCollectedUsage is called', () => {
@@ -690,9 +342,9 @@ describe('AgentClient - recordCollectedUsage', () => {
expect(usage).toBeUndefined();
});
/** Verifies usage passes the check in BaseClient.sendMessage: if (usage != null && Number(usage[this.outputTokensKey]) > 0) */
it('should have output_tokens > 0 for BaseClient.sendMessage check', async () => {
// This test verifies the usage will pass the check in BaseClient.sendMessage:
// if (usage != null && Number(usage[this.outputTokensKey]) > 0)
mockRecordCollectedUsage.mockResolvedValue({ input_tokens: 200, output_tokens: 130 });
const collectedUsage = [
{ input_tokens: 200, output_tokens: 100, model: 'gpt-4' },
{ input_tokens: 50, output_tokens: 30, model: 'gpt-4' },