🧮 refactor: Bulk Transactions & Balance Updates for Token Spending (#11996)

* refactor: transaction handling by integrating pricing and bulk write operations

- Updated `recordCollectedUsage` to accept pricing functions and bulk write operations as injected dependencies, improving transaction management.
- Refactored `AgentClient` and related controllers to use the new transaction handling, improving the performance and accuracy of token spending.
- Added tests to validate the new functionality, ensuring correct behavior for both standard and bulk transaction paths.
- Introduced a new `transactions.ts` file to encapsulate transaction-related logic and types, enhancing code organization and maintainability.
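The new dependency-injection surface can be sketched as follows. This is a minimal TypeScript sketch: only the `pricing` and `bulkWriteOps` field names come from the test setup further down; the interface names and the `prepareDocs` helper are illustrative, not the real implementation.

```typescript
// Assumed shapes for the injected dependencies (names are illustrative).
interface PricingDeps {
  getMultiplier: (args: { model: string; tokenType: 'prompt' | 'completion' }) => number;
  getCacheMultiplier: (args: { model: string; cacheType: 'read' | 'write' }) => number | null;
}

interface BulkWriteOps {
  insertMany: (txns: Record<string, unknown>[]) => Promise<unknown>;
  updateBalance: (args: { user: string; incrementValue: number }) => Promise<unknown>;
}

interface RecordUsageDeps {
  pricing: PricingDeps;
  bulkWriteOps: BulkWriteOps;
}

// Each collected-usage entry fans out to two transaction documents
// (prompt + completion), with token amounts stored as negative values,
// matching the "5 calls × 2 docs" expectation in the parity tests.
function prepareDocs(usage: { input_tokens: number; output_tokens: number; model: string }[]) {
  return usage.flatMap((u) => [
    { tokenType: 'prompt', rawAmount: -u.input_tokens, model: u.model },
    { tokenType: 'completion', rawAmount: -u.output_tokens, model: u.model },
  ]);
}
```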

* chore: reorganize imports in agents client controller

- Moved `getMultiplier` and `getCacheMultiplier` imports to maintain consistency and clarity in the import structure.
- Removed duplicate import of `updateBalance` and `bulkInsertTransactions`, streamlining the code for better readability.

* refactor: add TransactionData type and CANCEL_RATE constant to data-schemas

Establishes a single source of truth for the transaction document shape
and the incomplete-context billing rate constant, both consumed by
packages/api and api/.
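For reference, the constant's value and the way it is applied are both visible in the Transaction.js diff further down; a minimal sketch:

```typescript
// Value taken from the removed local `cancelRate` constant in Transaction.js.
const CANCEL_RATE = 1.15;

// Mirrors calculateTokenValue: completion tokens billed under an
// 'incomplete' context (aborted generation) cost 15% more. tokenValue is
// negative, so Math.ceil rounds the charge toward zero.
function applyCancelRate(txn: { tokenType: string; context?: string; tokenValue: number; rate: number }) {
  if (txn.context === 'incomplete' && txn.tokenType === 'completion') {
    txn.tokenValue = Math.ceil(txn.tokenValue * CANCEL_RATE);
    txn.rate *= CANCEL_RATE;
  }
  return txn;
}
```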

* refactor: use proper types in data-schemas transaction methods

- Replace `as unknown as { tokenCredits }` with `lean<IBalance>()`
- Use `TransactionData[]` instead of `Record<string, unknown>[]`
  for bulkInsertTransactions parameter
- Add JSDoc noting insertMany bypasses document middleware
- Remove orphan section comment in methods/index.ts

* refactor: use shared types in transactions.ts, fix bulk write logic

- Import CANCEL_RATE from data-schemas instead of local duplicate
- Import TransactionData from data-schemas for PreparedEntry/BulkWriteDeps
- Use tilde alias for EndpointTokenConfig import
- Pass valueKey through to getMultiplier
- Only sum tokenValue for balance-enabled docs in bulkWriteTransactions
- Consolidate two loops into single-pass map
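The last two items can be sketched as a single pass (the `PreparedEntry` shape and `collect` name here are hypothetical; only the balance-enabled filtering behavior is taken from the commit):

```typescript
// Hypothetical prepared-entry shape; `balanceEnabled` flags whether the
// entry should count toward the balance deduction.
interface PreparedEntry {
  doc: { tokenValue: number };
  balanceEnabled: boolean;
}

// One pass builds the insertMany payload and the balance increment together,
// summing tokenValue only for balance-enabled docs.
function collect(entries: PreparedEntry[]) {
  const docs: { tokenValue: number }[] = [];
  let incrementValue = 0;
  for (const entry of entries) {
    docs.push(entry.doc);
    if (entry.balanceEnabled) {
      incrementValue += entry.doc.tokenValue;
    }
  }
  return { docs, incrementValue };
}
```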

* refactor: remove duplicate updateBalance from Transaction.js

Import updateBalance from ~/models (sourced from data-schemas) instead
of maintaining a second copy. Also import CANCEL_RATE from data-schemas
and remove the Balance model import (no longer needed directly).
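The retained `updateBalance` retries optimistic-concurrency conflicts with exponential backoff. Its schedule (constants taken from the removed copy shown in the diff further down; jitter of up to half the delay is added at runtime) works out as:

```typescript
// Backoff schedule for updateBalance's retry loop: start at 50 ms, double
// each attempt, cap at 2000 ms. A sleep only happens when attempt < maxRetries,
// so 10 attempts produce 9 waits.
function backoffDelays(maxRetries = 10, initial = 50, cap = 2000): number[] {
  const delays: number[] = [];
  let delay = initial;
  for (let attempt = 1; attempt < maxRetries; attempt++) {
    delays.push(delay);
    delay = Math.min(delay * 2, cap);
  }
  return delays;
}
```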

* fix: test real spendCollectedUsage instead of IIFE replica

Export spendCollectedUsage from abortMiddleware.js and rewrite the test
file to import and test the actual function. Previously the tests ran
against a hand-written replica that could silently diverge from the real
implementation.

* test: add transactions.spec.ts and restore regression comments

Add 22 direct unit tests for transactions.ts financial logic covering
prepareTokenSpend, prepareStructuredTokenSpend, bulkWriteTransactions,
CANCEL_RATE paths, NaN guards, disabled transactions, zero tokens,
cache multipliers, and balance-enabled filtering.

Restore critical regression documentation comments in
recordCollectedUsage.spec.js explaining which production bugs the
tests guard against.

* fix: widen setValues type to include lastRefill

The UpdateBalanceParams.setValues type was Partial<Pick<IBalance,
'tokenCredits'>> which excluded lastRefill — used by
createAutoRefillTransaction. Widen to also pick 'lastRefill'.
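A sketch of the widened type (the `IBalance` shape is simplified here for illustration):

```typescript
// Simplified stand-in for the real IBalance document interface.
interface IBalance {
  tokenCredits: number;
  lastRefill: Date;
}

interface UpdateBalanceParams {
  user: string;
  incrementValue: number;
  // Was Partial<Pick<IBalance, 'tokenCredits'>>; widened so
  // createAutoRefillTransaction can also set lastRefill.
  setValues?: Partial<Pick<IBalance, 'tokenCredits' | 'lastRefill'>>;
}

// updateBalance merges setValues into the $set payload alongside the
// computed credits, as the implementation in the diff below does.
function buildSetPayload(newCredits: number, setValues?: UpdateBalanceParams['setValues']) {
  return { tokenCredits: newCredits, ...(setValues ?? {}) };
}
```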

* test: use real MongoDB for bulkWriteTransactions tests

Replace mock-based bulkWriteTransactions tests with real DB tests using
MongoMemoryServer. Pure function tests (prepareTokenSpend,
prepareStructuredTokenSpend) remain mock-based since they don't touch
DB. Add end-to-end integration tests that verify the full prepare →
bulk write → DB state pipeline with real Transaction and Balance models.

* chore: update @librechat/agents dependency to version 3.1.54 in package-lock.json and related package.json files

* test: add bulk path parity tests proving identical DB outcomes

Three test suites proving the bulk path (prepareTokenSpend/
prepareStructuredTokenSpend + bulkWriteTransactions) produces
numerically identical results to the legacy path for all scenarios:

- usage.bulk-parity.spec.ts: mirrors all legacy recordCollectedUsage
  tests; asserts same return values and verifies metadata fields on
  the insertMany docs match what spendTokens args would carry

- transactions.bulk-parity.spec.ts: real-DB tests using actual
  getMultiplier/getCacheMultiplier pricing functions; asserts exact
  tokenValue, rate, rawAmount and balance deductions for standard
  tokens, structured/cache tokens, CANCEL_RATE, premium pricing,
  multi-entry batches, and edge cases (NaN, zero, disabled)

- Transaction.spec.js: adds describe('Bulk path parity') that mirrors
  7 key legacy tests via recordCollectedUsage + bulk deps against
  real MongoDB, asserting same balance deductions and doc counts

* refactor: update llmConfig structure to use modelKwargs for reasoning effort

Refactor the llmConfig in getOpenAILLMConfig to store reasoning effort within modelKwargs instead of directly on llmConfig. This change ensures consistency in the configuration structure and improves clarity in the handling of reasoning properties in the tests.
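A hedged sketch of the shape change (`reasoning_effort` as the `modelKwargs` key is an assumption; the commit only states that reasoning effort moves into `modelKwargs` rather than living directly on `llmConfig`):

```typescript
type LLMConfig = {
  model: string;
  modelKwargs?: Record<string, unknown>;
};

// Reasoning effort is nested under modelKwargs instead of being set as a
// top-level llmConfig property.
function withReasoningEffort(config: LLMConfig, effort: 'low' | 'medium' | 'high'): LLMConfig {
  return {
    ...config,
    modelKwargs: { ...(config.modelKwargs ?? {}), reasoning_effort: effort },
  };
}
```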

* test: update performance checks in processAssistantMessage tests

Revise the performance assertions in the processAssistantMessage tests so that each message must be processed in under 100ms, guarding against potential ReDoS regressions. Asserting an absolute per-message ceiling is more reliable than comparing relative timing ratios between runs.
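The assertion style can be sketched as follows (the message content and the 100ms budget are illustrative; the real tests live in the processAssistantMessage spec):

```typescript
// Measure each message individually and require it to stay under a fixed
// budget, rather than comparing timing ratios between runs.
function assertUnderBudget(messages: string[], process: (m: string) => unknown, budgetMs = 100) {
  for (const message of messages) {
    const start = Date.now();
    process(message);
    const elapsed = Date.now() - start;
    if (elapsed >= budgetMs) {
      throw new Error(`processing took ${elapsed}ms (budget ${budgetMs}ms)`);
    }
  }
}
```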

* test: fill parity test gaps — model fallback, abort context, structured edge cases

- usage.bulk-parity: add undefined model fallback test
- transactions.bulk-parity: add abort context test (txns inserted,
  balance unchanged when balance not passed), fix readTokens type cast
- Transaction.spec: add 3 missing mirrors — balance disabled with
  transactions enabled, structured transactions disabled, structured
  balance disabled

* fix: deduct balance before inserting transactions to prevent orphaned docs

Swap the order in bulkWriteTransactions: updateBalance runs before
insertMany. If updateBalance fails (after exhausting retries), no
transaction documents are written — avoiding the inconsistent state
where transactions exist in MongoDB with no corresponding balance
deduction.
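A minimal sketch of the ordering (real signatures differ; the dependency shapes here are illustrative):

```typescript
async function bulkWrite(
  docs: { tokenValue: number }[],
  incrementValue: number,
  deps: {
    updateBalance: (args: { incrementValue: number }) => Promise<unknown>;
    insertMany: (docs: { tokenValue: number }[]) => Promise<unknown>;
  },
) {
  // 1. Balance deduction first; if this throws (retries exhausted), we never insert.
  await deps.updateBalance({ incrementValue });
  // 2. Only then persist the transaction documents.
  await deps.insertMany(docs);
}
```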

* chore: import order

* test: update config.spec.ts for OpenRouter reasoning in modelKwargs

Same fix as llm.spec.ts — OpenRouter reasoning is now passed via
modelKwargs instead of llmConfig.reasoning directly.
Danny Avila 2026-03-01 12:26:36 -05:00 committed by GitHub
parent 0e5ee379b3
commit e1e204d6cf
29 changed files with 3004 additions and 1070 deletions


@@ -1,140 +1,7 @@
-const { logger } = require('@librechat/data-schemas');
+const { logger, CANCEL_RATE } = require('@librechat/data-schemas');
 const { getMultiplier, getCacheMultiplier } = require('./tx');
-const { Transaction, Balance } = require('~/db/models');
-const cancelRate = 1.15;
-/**
- * Updates a user's token balance based on a transaction using optimistic concurrency control
- * without schema changes. Compatible with DocumentDB.
- * @async
- * @function
- * @param {Object} params - The function parameters.
- * @param {string|mongoose.Types.ObjectId} params.user - The user ID.
- * @param {number} params.incrementValue - The value to increment the balance by (can be negative).
- * @param {import('mongoose').UpdateQuery<import('@librechat/data-schemas').IBalance>['$set']} [params.setValues] - Optional additional fields to set.
- * @returns {Promise<Object>} Returns the updated balance document (lean).
- * @throws {Error} Throws an error if the update fails after multiple retries.
- */
-const updateBalance = async ({ user, incrementValue, setValues }) => {
-  let maxRetries = 10; // Number of times to retry on conflict
-  let delay = 50; // Initial retry delay in ms
-  let lastError = null;
-  for (let attempt = 1; attempt <= maxRetries; attempt++) {
-    let currentBalanceDoc;
-    try {
-      // 1. Read the current document state
-      currentBalanceDoc = await Balance.findOne({ user }).lean();
-      const currentCredits = currentBalanceDoc ? currentBalanceDoc.tokenCredits : 0;
-      // 2. Calculate the desired new state
-      const potentialNewCredits = currentCredits + incrementValue;
-      const newCredits = Math.max(0, potentialNewCredits); // Ensure balance doesn't go below zero
-      // 3. Prepare the update payload
-      const updatePayload = {
-        $set: {
-          tokenCredits: newCredits,
-          ...(setValues || {}), // Merge other values to set
-        },
-      };
-      // 4. Attempt the conditional update or upsert
-      let updatedBalance = null;
-      if (currentBalanceDoc) {
-        // --- Document Exists: Perform Conditional Update ---
-        // Try to update only if the tokenCredits match the value we read (currentCredits)
-        updatedBalance = await Balance.findOneAndUpdate(
-          {
-            user: user,
-            tokenCredits: currentCredits, // Optimistic lock: condition based on the read value
-          },
-          updatePayload,
-          {
-            new: true, // Return the modified document
-            // lean: true, // .lean() is applied after query execution in Mongoose >= 6
-          },
-        ).lean(); // Use lean() for plain JS object
-        if (updatedBalance) {
-          // Success! The update was applied based on the expected current state.
-          return updatedBalance;
-        }
-        // If updatedBalance is null, it means tokenCredits changed between read and write (conflict).
-        lastError = new Error(`Concurrency conflict for user ${user} on attempt ${attempt}.`);
-        // Proceed to retry logic below.
-      } else {
-        // --- Document Does Not Exist: Perform Conditional Upsert ---
-        // Try to insert the document, but only if it still doesn't exist.
-        // Using tokenCredits: {$exists: false} helps prevent race conditions where
-        // another process creates the doc between our findOne and findOneAndUpdate.
-        try {
-          updatedBalance = await Balance.findOneAndUpdate(
-            {
-              user: user,
-              // Attempt to match only if the document doesn't exist OR was just created
-              // without tokenCredits (less likely but possible). A simple { user } filter
-              // might also work, relying on the retry for conflicts.
-              // Let's use a simpler filter and rely on retry for races.
-              // tokenCredits: { $exists: false } // This condition might be too strict if doc exists with 0 credits
-            },
-            updatePayload,
-            {
-              upsert: true, // Create if doesn't exist
-              new: true, // Return the created/updated document
-              // setDefaultsOnInsert: true, // Ensure schema defaults are applied on insert
-              // lean: true,
-            },
-          ).lean();
-          if (updatedBalance) {
-            // Upsert succeeded (likely created the document)
-            return updatedBalance;
-          }
-          // If null, potentially a rare race condition during upsert. Retry should handle it.
-          lastError = new Error(
-            `Upsert race condition suspected for user ${user} on attempt ${attempt}.`,
-          );
-        } catch (error) {
-          if (error.code === 11000) {
-            // E11000 duplicate key error on index
-            // This means another process created the document *just* before our upsert.
-            // It's a concurrency conflict during creation. We should retry.
-            lastError = error; // Store the error
-            // Proceed to retry logic below.
-          } else {
-            // Different error, rethrow
-            throw error;
-          }
-        }
-      } // End if/else (document exists?)
-    } catch (error) {
-      // Catch errors from findOne or unexpected findOneAndUpdate errors
-      logger.error(`[updateBalance] Error during attempt ${attempt} for user ${user}:`, error);
-      lastError = error; // Store the error
-      // Consider stopping retries for non-transient errors, but for now, we retry.
-    }
-    // If we reached here, it means the update failed (conflict or error), wait and retry
-    if (attempt < maxRetries) {
-      const jitter = Math.random() * delay * 0.5; // Add jitter to delay
-      await new Promise((resolve) => setTimeout(resolve, delay + jitter));
-      delay = Math.min(delay * 2, 2000); // Exponential backoff with cap
-    }
-  } // End for loop (retries)
-  // If loop finishes without success, throw the last encountered error or a generic one
-  logger.error(
-    `[updateBalance] Failed to update balance for user ${user} after ${maxRetries} attempts.`,
-  );
-  throw (
-    lastError ||
-    new Error(
-      `Failed to update balance for user ${user} after maximum retries due to persistent conflicts.`,
-    )
-  );
-};
+const { Transaction } = require('~/db/models');
+const { updateBalance } = require('~/models');
 /** Method to calculate and set the tokenValue for a transaction */
 function calculateTokenValue(txn) {
@@ -145,8 +12,8 @@ function calculateTokenValue(txn) {
   txn.rate = multiplier;
   txn.tokenValue = txn.rawAmount * multiplier;
   if (txn.context && txn.tokenType === 'completion' && txn.context === 'incomplete') {
-    txn.tokenValue = Math.ceil(txn.tokenValue * cancelRate);
-    txn.rate *= cancelRate;
+    txn.tokenValue = Math.ceil(txn.tokenValue * CANCEL_RATE);
+    txn.rate *= CANCEL_RATE;
   }
 }
@@ -321,11 +188,11 @@ function calculateStructuredTokenValue(txn) {
   }
   if (txn.context && txn.tokenType === 'completion' && txn.context === 'incomplete') {
-    txn.tokenValue = Math.ceil(txn.tokenValue * cancelRate);
-    txn.rate *= cancelRate;
+    txn.tokenValue = Math.ceil(txn.tokenValue * CANCEL_RATE);
+    txn.rate *= CANCEL_RATE;
     if (txn.rateDetail) {
       txn.rateDetail = Object.fromEntries(
-        Object.entries(txn.rateDetail).map(([k, v]) => [k, v * cancelRate]),
+        Object.entries(txn.rateDetail).map(([k, v]) => [k, v * CANCEL_RATE]),
       );
     }
   }


@@ -1,8 +1,10 @@
 const mongoose = require('mongoose');
+const { recordCollectedUsage } = require('@librechat/api');
+const { createMethods } = require('@librechat/data-schemas');
 const { MongoMemoryServer } = require('mongodb-memory-server');
-const { spendTokens, spendStructuredTokens } = require('./spendTokens');
 const { getMultiplier, getCacheMultiplier, premiumTokenValues, tokenValues } = require('./tx');
 const { createTransaction, createStructuredTransaction } = require('./Transaction');
+const { spendTokens, spendStructuredTokens } = require('./spendTokens');
 const { Balance, Transaction } = require('~/db/models');
 let mongoServer;
@@ -985,3 +987,339 @@ describe('Premium Token Pricing Integration Tests', () => {
expect(updatedBalance.tokenCredits).toBeCloseTo(initialBalance - expectedCost, 0);
});
});
describe('Bulk path parity', () => {
/**
* Each test here mirrors an existing legacy test above, replacing spendTokens/
* spendStructuredTokens with recordCollectedUsage + bulk deps.
* The balance deduction and transaction document fields must be numerically identical.
*/
let bulkDeps;
let methods;
beforeEach(() => {
methods = createMethods(mongoose);
bulkDeps = {
spendTokens: () => Promise.resolve(),
spendStructuredTokens: () => Promise.resolve(),
pricing: { getMultiplier, getCacheMultiplier },
bulkWriteOps: {
insertMany: methods.bulkInsertTransactions,
updateBalance: methods.updateBalance,
},
};
});
test('balance should decrease when spending tokens via bulk path', async () => {
const userId = new mongoose.Types.ObjectId();
const initialBalance = 10000000;
await Balance.create({ user: userId, tokenCredits: initialBalance });
const model = 'gpt-3.5-turbo';
const promptTokens = 100;
const completionTokens = 50;
await recordCollectedUsage(bulkDeps, {
user: userId.toString(),
conversationId: 'test-conversation-id',
model,
context: 'test',
balance: { enabled: true },
transactions: { enabled: true },
collectedUsage: [{ input_tokens: promptTokens, output_tokens: completionTokens, model }],
});
const updatedBalance = await Balance.findOne({ user: userId });
const promptMultiplier = getMultiplier({
model,
tokenType: 'prompt',
inputTokenCount: promptTokens,
});
const completionMultiplier = getMultiplier({
model,
tokenType: 'completion',
inputTokenCount: promptTokens,
});
const expectedTotalCost =
promptTokens * promptMultiplier + completionTokens * completionMultiplier;
const expectedBalance = initialBalance - expectedTotalCost;
expect(updatedBalance.tokenCredits).toBeCloseTo(expectedBalance, 0);
const txns = await Transaction.find({ user: userId }).lean();
expect(txns).toHaveLength(2);
});
test('bulk path should not update balance when balance.enabled is false', async () => {
const userId = new mongoose.Types.ObjectId();
const initialBalance = 10000000;
await Balance.create({ user: userId, tokenCredits: initialBalance });
const model = 'gpt-3.5-turbo';
await recordCollectedUsage(bulkDeps, {
user: userId.toString(),
conversationId: 'test-conversation-id',
model,
context: 'test',
balance: { enabled: false },
transactions: { enabled: true },
collectedUsage: [{ input_tokens: 100, output_tokens: 50, model }],
});
const updatedBalance = await Balance.findOne({ user: userId });
expect(updatedBalance.tokenCredits).toBe(initialBalance);
const txns = await Transaction.find({ user: userId }).lean();
expect(txns).toHaveLength(2); // transactions still recorded
});
test('bulk path should not insert when transactions.enabled is false', async () => {
const userId = new mongoose.Types.ObjectId();
const initialBalance = 10000000;
await Balance.create({ user: userId, tokenCredits: initialBalance });
await recordCollectedUsage(bulkDeps, {
user: userId.toString(),
conversationId: 'test-conversation-id',
model: 'gpt-3.5-turbo',
context: 'test',
balance: { enabled: true },
transactions: { enabled: false },
collectedUsage: [{ input_tokens: 100, output_tokens: 50, model: 'gpt-3.5-turbo' }],
});
const txns = await Transaction.find({ user: userId }).lean();
expect(txns).toHaveLength(0);
const balance = await Balance.findOne({ user: userId });
expect(balance.tokenCredits).toBe(initialBalance);
});
test('bulk path handles incomplete context for completion tokens — same CANCEL_RATE as legacy', async () => {
const userId = new mongoose.Types.ObjectId();
const initialBalance = 17613154.55;
await Balance.create({ user: userId, tokenCredits: initialBalance });
const model = 'claude-3-5-sonnet';
const promptTokens = 10;
const completionTokens = 50;
await recordCollectedUsage(bulkDeps, {
user: userId.toString(),
conversationId: 'test-convo',
model,
context: 'incomplete',
balance: { enabled: true },
transactions: { enabled: true },
collectedUsage: [{ input_tokens: promptTokens, output_tokens: completionTokens, model }],
});
const txns = await Transaction.find({ user: userId }).lean();
const completionTx = txns.find((t) => t.tokenType === 'completion');
const completionMultiplier = getMultiplier({
model,
tokenType: 'completion',
inputTokenCount: promptTokens,
});
expect(completionTx.tokenValue).toBeCloseTo(-completionTokens * completionMultiplier * 1.15, 0);
});
test('bulk path structured tokens — balance deduction matches legacy spendStructuredTokens', async () => {
const userId = new mongoose.Types.ObjectId();
const initialBalance = 17613154.55;
await Balance.create({ user: userId, tokenCredits: initialBalance });
const model = 'claude-3-5-sonnet';
const promptInput = 11;
const promptWrite = 140522;
const promptRead = 0;
const completionTokens = 5;
const totalInput = promptInput + promptWrite + promptRead;
await recordCollectedUsage(bulkDeps, {
user: userId.toString(),
conversationId: 'test-convo',
model,
context: 'message',
balance: { enabled: true },
transactions: { enabled: true },
collectedUsage: [
{
input_tokens: promptInput,
output_tokens: completionTokens,
model,
input_token_details: { cache_creation: promptWrite, cache_read: promptRead },
},
],
});
const promptMultiplier = getMultiplier({
model,
tokenType: 'prompt',
inputTokenCount: totalInput,
});
const completionMultiplier = getMultiplier({
model,
tokenType: 'completion',
inputTokenCount: totalInput,
});
const writeMultiplier = getCacheMultiplier({ model, cacheType: 'write' }) ?? promptMultiplier;
const readMultiplier = getCacheMultiplier({ model, cacheType: 'read' }) ?? promptMultiplier;
const expectedPromptCost =
promptInput * promptMultiplier + promptWrite * writeMultiplier + promptRead * readMultiplier;
const expectedCompletionCost = completionTokens * completionMultiplier;
const expectedTotalCost = expectedPromptCost + expectedCompletionCost;
const expectedBalance = initialBalance - expectedTotalCost;
const updatedBalance = await Balance.findOne({ user: userId });
expect(Math.abs(updatedBalance.tokenCredits - expectedBalance)).toBeLessThan(100);
});
test('premium pricing above threshold via bulk path — same balance as legacy', async () => {
const userId = new mongoose.Types.ObjectId();
const initialBalance = 100000000;
await Balance.create({ user: userId, tokenCredits: initialBalance });
const model = 'claude-opus-4-6';
const promptTokens = 250000;
const completionTokens = 500;
await recordCollectedUsage(bulkDeps, {
user: userId.toString(),
conversationId: 'test-premium',
model,
context: 'test',
balance: { enabled: true },
transactions: { enabled: true },
collectedUsage: [{ input_tokens: promptTokens, output_tokens: completionTokens, model }],
});
const premiumPromptRate = premiumTokenValues[model].prompt;
const premiumCompletionRate = premiumTokenValues[model].completion;
const expectedCost =
promptTokens * premiumPromptRate + completionTokens * premiumCompletionRate;
const updatedBalance = await Balance.findOne({ user: userId });
expect(updatedBalance.tokenCredits).toBeCloseTo(initialBalance - expectedCost, 0);
});
test('real-world multi-entry batch: 5 sequential tool calls — same total deduction as 5 legacy spendTokens calls', async () => {
const userId = new mongoose.Types.ObjectId();
const initialBalance = 100000000;
await Balance.create({ user: userId, tokenCredits: initialBalance });
const model = 'claude-opus-4-5-20251101';
const calls = [
{ input_tokens: 31596, output_tokens: 151 },
{ input_tokens: 35368, output_tokens: 150 },
{ input_tokens: 58362, output_tokens: 295 },
{ input_tokens: 112604, output_tokens: 193 },
{ input_tokens: 257440, output_tokens: 2217 },
];
let expectedTotalCost = 0;
for (const { input_tokens, output_tokens } of calls) {
const pm = getMultiplier({ model, tokenType: 'prompt', inputTokenCount: input_tokens });
const cm = getMultiplier({ model, tokenType: 'completion', inputTokenCount: input_tokens });
expectedTotalCost += input_tokens * pm + output_tokens * cm;
}
await recordCollectedUsage(bulkDeps, {
user: userId.toString(),
conversationId: 'test-sequential',
model,
context: 'message',
balance: { enabled: true },
transactions: { enabled: true },
collectedUsage: calls.map((c) => ({ ...c, model })),
});
const txns = await Transaction.find({ user: userId }).lean();
expect(txns).toHaveLength(10); // 5 calls × 2 docs (prompt + completion)
const updatedBalance = await Balance.findOne({ user: userId });
expect(updatedBalance.tokenCredits).toBeCloseTo(initialBalance - expectedTotalCost, 0);
});
test('bulk path should save transaction but not update balance when balance disabled, transactions enabled', async () => {
const userId = new mongoose.Types.ObjectId();
const initialBalance = 10000000;
await Balance.create({ user: userId, tokenCredits: initialBalance });
await recordCollectedUsage(bulkDeps, {
user: userId.toString(),
conversationId: 'test-conversation-id',
model: 'gpt-3.5-turbo',
context: 'test',
balance: { enabled: false },
transactions: { enabled: true },
collectedUsage: [{ input_tokens: 100, output_tokens: 50, model: 'gpt-3.5-turbo' }],
});
const txns = await Transaction.find({ user: userId }).lean();
expect(txns).toHaveLength(2);
expect(txns[0].rawAmount).toBeDefined();
const balance = await Balance.findOne({ user: userId });
expect(balance.tokenCredits).toBe(initialBalance);
});
test('bulk path structured tokens should not save when transactions.enabled is false', async () => {
const userId = new mongoose.Types.ObjectId();
const initialBalance = 10000000;
await Balance.create({ user: userId, tokenCredits: initialBalance });
await recordCollectedUsage(bulkDeps, {
user: userId.toString(),
conversationId: 'test-conversation-id',
model: 'claude-3-5-sonnet',
context: 'message',
balance: { enabled: true },
transactions: { enabled: false },
collectedUsage: [
{
input_tokens: 10,
output_tokens: 5,
model: 'claude-3-5-sonnet',
input_token_details: { cache_creation: 100, cache_read: 5 },
},
],
});
const txns = await Transaction.find({ user: userId }).lean();
expect(txns).toHaveLength(0);
const balance = await Balance.findOne({ user: userId });
expect(balance.tokenCredits).toBe(initialBalance);
});
test('bulk path structured tokens should save but not update balance when balance disabled', async () => {
const userId = new mongoose.Types.ObjectId();
const initialBalance = 10000000;
await Balance.create({ user: userId, tokenCredits: initialBalance });
await recordCollectedUsage(bulkDeps, {
user: userId.toString(),
conversationId: 'test-conversation-id',
model: 'claude-3-5-sonnet',
context: 'message',
balance: { enabled: false },
transactions: { enabled: true },
collectedUsage: [
{
input_tokens: 10,
output_tokens: 5,
model: 'claude-3-5-sonnet',
input_token_details: { cache_creation: 100, cache_read: 5 },
},
],
});
const txns = await Transaction.find({ user: userId }).lean();
expect(txns).toHaveLength(2);
const promptTx = txns.find((t) => t.tokenType === 'prompt');
expect(promptTx.inputTokens).toBe(-10);
expect(promptTx.writeTokens).toBe(-100);
expect(promptTx.readTokens).toBe(-5);
const balance = await Balance.findOne({ user: userId });
expect(balance.tokenCredits).toBe(initialBalance);
});
});