🧮 refactor: Bulk Transactions & Balance Updates for Token Spending (#11996)

* refactor: transaction handling by integrating pricing and bulk write operations

- Updated `recordCollectedUsage` to accept pricing functions and bulk write operations, so a run's collected token spend can be recorded in a single batched write instead of one write per transaction.
- Refactored `AgentClient` and related controllers to use the new transaction handling, reducing database round trips while keeping token spending accurate.
- Added tests to validate the new functionality, ensuring correct behavior for both standard and bulk transaction paths.
- Introduced a new `transactions.ts` file to encapsulate transaction-related logic and types, enhancing code organization and maintainability.
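The two-phase shape described above can be sketched as follows. This is a simplified illustration, not the real implementation: the actual `prepareTokenSpend` and `bulkWriteTransactions` in `transactions.ts` take richer metadata and real pricing functions, and the helper names `prepareSpend`/`bulkWrite` here are invented for brevity.

```typescript
type PreparedDoc = { tokenType: 'prompt' | 'completion'; rawAmount: number; tokenValue: number };
type PreparedEntry = { doc: PreparedDoc; balance?: { enabled: boolean } };

// Phase 1: pure preparation — compute each transaction doc and its cost.
function prepareSpend(
  tokens: { promptTokens?: number; completionTokens?: number },
  rate: number,
): PreparedEntry[] {
  const entries: PreparedEntry[] = [];
  for (const [tokenType, raw] of [
    ['prompt', tokens.promptTokens],
    ['completion', tokens.completionTokens],
  ] as const) {
    if (raw === undefined || Number.isNaN(raw)) continue; // NaN guard
    entries.push({
      doc: { tokenType, rawAmount: -raw, tokenValue: -raw * rate },
      balance: { enabled: true },
    });
  }
  return entries;
}

// Phase 2: one balance update plus one insertMany, instead of N round trips.
async function bulkWrite(
  entries: PreparedEntry[],
  ops: {
    updateBalance: (delta: number) => Promise<void>;
    insertMany: (docs: PreparedDoc[]) => Promise<void>;
  },
) {
  const delta = entries
    .filter((e) => e.balance?.enabled)
    .reduce((sum, e) => sum + e.doc.tokenValue, 0);
  await ops.updateBalance(delta);
  await ops.insertMany(entries.map((e) => e.doc));
}
```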

* chore: reorganize imports in agents client controller

- Moved `getMultiplier` and `getCacheMultiplier` imports to maintain consistency and clarity in the import structure.
- Removed duplicate import of `updateBalance` and `bulkInsertTransactions`, streamlining the code for better readability.

* refactor: add TransactionData type and CANCEL_RATE constant to data-schemas

Establishes a single source of truth for the transaction document shape
and the incomplete-context billing rate constant, both consumed by
packages/api and api/.
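An illustrative sketch of the shared shape (field names are inferred from the parity tests in this PR; the canonical definitions live in `@librechat/data-schemas`, and the `CANCEL_RATE` value below is a placeholder, not the real constant):

```typescript
// Placeholder value — the real CANCEL_RATE is defined once in data-schemas.
const CANCEL_RATE = 1.15;

// Field names inferred from the tests; the canonical TransactionData
// interface lives in @librechat/data-schemas.
interface TransactionData {
  user: string;
  conversationId?: string;
  messageId?: string;
  model?: string;
  context?: string; // 'incomplete' contexts bill completions at CANCEL_RATE
  tokenType: 'prompt' | 'completion';
  rawAmount: number;
  rate?: number;
  tokenValue?: number;
  inputTokens?: number; // structured (cache-aware) token fields
  writeTokens?: number;
  readTokens?: number;
}

// Example: a discounted completion transaction for an incomplete run.
const completionRate = 15; // hypothetical per-token multiplier
const doc: TransactionData = {
  user: 'user-1',
  tokenType: 'completion',
  context: 'incomplete',
  rawAmount: -50,
  rate: completionRate * CANCEL_RATE,
  tokenValue: Math.ceil(-50 * completionRate * CANCEL_RATE),
};
```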

* refactor: use proper types in data-schemas transaction methods

- Replace `as unknown as { tokenCredits }` with `lean<IBalance>()`
- Use `TransactionData[]` instead of `Record<string, unknown>[]`
  for bulkInsertTransactions parameter
- Add JSDoc noting insertMany bypasses document middleware
- Remove orphan section comment in methods/index.ts

* refactor: use shared types in transactions.ts, fix bulk write logic

- Import CANCEL_RATE from data-schemas instead of local duplicate
- Import TransactionData from data-schemas for PreparedEntry/BulkWriteDeps
- Use tilde alias for EndpointTokenConfig import
- Pass valueKey through to getMultiplier
- Only sum tokenValue for balance-enabled docs in bulkWriteTransactions
- Consolidate two loops into single-pass map
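The consolidated single pass can be sketched like this (names are illustrative, mirroring the logic described above rather than the real code):

```typescript
interface Entry {
  doc: { tokenValue: number };
  balance?: { enabled: boolean };
}

function collectDocsAndDelta(entries: Entry[]) {
  let balanceDelta = 0;
  const docs = entries.map((entry) => {
    // Every entry becomes a transaction document, but only balance-enabled
    // entries contribute to the single balance deduction.
    if (entry.balance?.enabled) {
      balanceDelta += entry.doc.tokenValue;
    }
    return entry.doc;
  });
  return { docs, balanceDelta };
}
```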

* refactor: remove duplicate updateBalance from Transaction.js

Import updateBalance from ~/models (sourced from data-schemas) instead
of maintaining a second copy. Also import CANCEL_RATE from data-schemas
and remove the Balance model import (no longer needed directly).

* fix: test real spendCollectedUsage instead of IIFE replica

Export spendCollectedUsage from abortMiddleware.js and rewrite the test
file to import and test the actual function. Previously the tests ran
against a hand-written replica that could silently diverge from the real
implementation.

* test: add transactions.spec.ts and restore regression comments

Add 22 direct unit tests for transactions.ts financial logic covering
prepareTokenSpend, prepareStructuredTokenSpend, bulkWriteTransactions,
CANCEL_RATE paths, NaN guards, disabled transactions, zero tokens,
cache multipliers, and balance-enabled filtering.

Restore critical regression documentation comments in
recordCollectedUsage.spec.js explaining which production bugs the
tests guard against.

* fix: widen setValues type to include lastRefill

The UpdateBalanceParams.setValues type was Partial<Pick<IBalance,
'tokenCredits'>> which excluded lastRefill — used by
createAutoRefillTransaction. Widen to also pick 'lastRefill'.
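A before/after sketch of the type change, with `IBalance` and `UpdateBalanceParams` reduced to the relevant fields (other fields are assumed and omitted here):

```typescript
interface IBalance {
  user: string;
  tokenCredits: number;
  lastRefill: Date;
}

// Before: setValues?: Partial<Pick<IBalance, 'tokenCredits'>>;
// After: 'lastRefill' is now assignable too.
interface UpdateBalanceParams {
  user: string;
  incrementValue?: number;
  setValues?: Partial<Pick<IBalance, 'tokenCredits' | 'lastRefill'>>;
}

// createAutoRefillTransaction can now set both fields in one update.
const refill: UpdateBalanceParams = {
  user: 'user-1',
  setValues: { tokenCredits: 50000, lastRefill: new Date('2026-03-01T00:00:00Z') },
};
```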

* test: use real MongoDB for bulkWriteTransactions tests

Replace mock-based bulkWriteTransactions tests with real DB tests using
MongoMemoryServer. Pure function tests (prepareTokenSpend,
prepareStructuredTokenSpend) remain mock-based since they don't touch
DB. Add end-to-end integration tests that verify the full prepare →
bulk write → DB state pipeline with real Transaction and Balance models.

* chore: update @librechat/agents dependency to version 3.1.54 in package-lock.json and related package.json files

* test: add bulk path parity tests proving identical DB outcomes

Three test suites proving the bulk path (prepareTokenSpend/
prepareStructuredTokenSpend + bulkWriteTransactions) produces
numerically identical results to the legacy path for all scenarios:

- usage.bulk-parity.spec.ts: mirrors all legacy recordCollectedUsage
  tests; asserts same return values and verifies metadata fields on
  the insertMany docs match what spendTokens args would carry

- transactions.bulk-parity.spec.ts: real-DB tests using actual
  getMultiplier/getCacheMultiplier pricing functions; asserts exact
  tokenValue, rate, rawAmount and balance deductions for standard
  tokens, structured/cache tokens, CANCEL_RATE, premium pricing,
  multi-entry batches, and edge cases (NaN, zero, disabled)

- Transaction.spec.js: adds describe('Bulk path parity') that mirrors
  7 key legacy tests via recordCollectedUsage + bulk deps against
  real MongoDB, asserting same balance deductions and doc counts

* refactor: update llmConfig structure to use modelKwargs for reasoning effort

Refactor the llmConfig in getOpenAILLMConfig to store reasoning effort within modelKwargs instead of directly on llmConfig. This change ensures consistency in the configuration structure and improves clarity in the handling of reasoning properties in the tests.
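A before/after sketch of the shape change (keys and model name are illustrative; the real config is assembled in `getOpenAILLMConfig`):

```typescript
// Before: reasoning effort sat directly on llmConfig.
const llmConfigBefore = {
  model: 'o3-mini',
  reasoning_effort: 'high',
};

// After: reasoning options are nested under modelKwargs, the pass-through
// bag for provider-specific parameters.
const llmConfigAfter = {
  model: 'o3-mini',
  modelKwargs: {
    reasoning_effort: 'high',
  },
};
```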

* test: update performance checks in processAssistantMessage tests

Revise the performance assertions in the processAssistantMessage tests to ensure that each message processing time remains under 100ms, addressing potential ReDoS vulnerabilities. This change enhances the reliability of the tests by focusing on maximum processing time rather than relative ratios.

* test: fill parity test gaps — model fallback, abort context, structured edge cases

- usage.bulk-parity: add undefined model fallback test
- transactions.bulk-parity: add abort context test (txns inserted,
  balance unchanged when balance not passed), fix readTokens type cast
- Transaction.spec: add 3 missing mirrors — balance disabled with
  transactions enabled, structured transactions disabled, structured
  balance disabled

* fix: deduct balance before inserting transactions to prevent orphaned docs

Swap the order in bulkWriteTransactions: updateBalance runs before
insertMany. If updateBalance fails (after exhausting retries), no
transaction documents are written — avoiding the inconsistent state
where transactions exist in MongoDB with no corresponding balance
deduction.
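The reordering can be sketched as follows (simplified; the real `bulkWriteTransactions` also filters balance-enabled docs and works with the prepared entries described in earlier commits):

```typescript
async function bulkWriteTransactionsSketch(
  entries: { tokenValue: number }[],
  ops: {
    updateBalance: (delta: number) => Promise<void>;
    insertMany: (docs: { tokenValue: number }[]) => Promise<void>;
  },
) {
  const delta = entries.reduce((sum, e) => sum + e.tokenValue, 0);
  // Balance update runs first; it may throw after exhausting retries.
  await ops.updateBalance(delta);
  // Only reached on success, so no orphaned transaction docs can exist.
  await ops.insertMany(entries);
}
```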

* chore: import order

* test: update config.spec.ts for OpenRouter reasoning in modelKwargs

Same fix as llm.spec.ts — OpenRouter reasoning is now passed via
modelKwargs instead of llmConfig.reasoning directly.
Commit e1e204d6cf by Danny Avila, 2026-03-01 12:26:36 -05:00, committed via GitHub (parent 0e5ee379b3).
29 changed files with 3004 additions and 1070 deletions.


@@ -90,7 +90,7 @@
     "@google/genai": "^1.19.0",
     "@keyv/redis": "^4.3.3",
     "@langchain/core": "^0.3.80",
-    "@librechat/agents": "^3.1.53",
+    "@librechat/agents": "^3.1.54",
     "@librechat/data-schemas": "*",
     "@modelcontextprotocol/sdk": "^1.27.1",
     "@smithy/node-http-handler": "^4.4.5",


@@ -9,6 +9,7 @@ export * from './legacy';
 export * from './memory';
 export * from './migration';
 export * from './openai';
+export * from './transactions';
 export * from './usage';
 export * from './resources';
 export * from './responses';


@@ -0,0 +1,559 @@
/**
 * Real-DB parity tests for the bulk transaction path.
 *
 * Each test uses the actual getMultiplier/getCacheMultiplier pricing functions
 * (the same ones the legacy createTransaction path uses) and runs the bulk path
 * against a real MongoMemoryServer instance.
 *
 * The assertion pattern: compute the expected tokenValue/rate/rawAmount from the
 * pricing functions directly, then verify the DB state matches exactly. Since both
 * legacy (createTransaction) and bulk (prepareTokenSpend + bulkWriteTransactions)
 * call the same pricing functions with the same inputs, their outputs must be
 * numerically identical.
 */
import mongoose from 'mongoose';
import { MongoMemoryServer } from 'mongodb-memory-server';
import {
  CANCEL_RATE,
  createMethods,
  balanceSchema,
  transactionSchema,
} from '@librechat/data-schemas';
import type { PricingFns, TxMetadata } from './transactions';
import {
  prepareStructuredTokenSpend,
  bulkWriteTransactions,
  prepareTokenSpend,
} from './transactions';

jest.mock('@librechat/data-schemas', () => {
  const actual = jest.requireActual('@librechat/data-schemas');
  return {
    ...actual,
    logger: { debug: jest.fn(), error: jest.fn(), warn: jest.fn(), info: jest.fn() },
  };
});

// Real pricing functions from api/models/tx.js — same ones the legacy path uses
/* eslint-disable @typescript-eslint/no-require-imports */
const {
  getMultiplier,
  getCacheMultiplier,
  tokenValues,
  premiumTokenValues,
} = require('../../../../api/models/tx.js');
/* eslint-enable @typescript-eslint/no-require-imports */

const pricing: PricingFns = { getMultiplier, getCacheMultiplier };

let mongoServer: MongoMemoryServer;
let Transaction: mongoose.Model<unknown>;
let Balance: mongoose.Model<unknown>;
let dbMethods: ReturnType<typeof createMethods>;

beforeAll(async () => {
  mongoServer = await MongoMemoryServer.create();
  await mongoose.connect(mongoServer.getUri());
  Transaction = mongoose.models.Transaction || mongoose.model('Transaction', transactionSchema);
  Balance = mongoose.models.Balance || mongoose.model('Balance', balanceSchema);
  dbMethods = createMethods(mongoose);
});

afterAll(async () => {
  await mongoose.disconnect();
  await mongoServer.stop();
});

beforeEach(async () => {
  await mongoose.connection.dropDatabase();
});

const dbOps = () => ({
  insertMany: dbMethods.bulkInsertTransactions,
  updateBalance: dbMethods.updateBalance,
});

function txMeta(user: string, extra: Partial<TxMetadata> = {}): TxMetadata {
  return {
    user,
    conversationId: 'test-convo',
    context: 'test',
    balance: { enabled: true },
    transactions: { enabled: true },
    ...extra,
  };
}
describe('Standard token parity', () => {
  test('balance should decrease by promptCost + completionCost — identical to legacy path', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 10000000;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const model = 'gpt-3.5-turbo';
    const promptTokens = 100;
    const completionTokens = 50;
    const promptMultiplier = getMultiplier({
      model,
      tokenType: 'prompt',
      inputTokenCount: promptTokens,
    });
    const completionMultiplier = getMultiplier({
      model,
      tokenType: 'completion',
      inputTokenCount: promptTokens,
    });
    const expectedCost = promptTokens * promptMultiplier + completionTokens * completionMultiplier;
    const expectedBalance = initialBalance - expectedCost;
    const entries = prepareTokenSpend(
      txMeta(userId, { model }),
      { promptTokens, completionTokens },
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const balance = (await Balance.findOne({ user: userId }).lean()) as Record<string, unknown>;
    expect(balance.tokenCredits).toBeCloseTo(expectedBalance, 0);
    const txns = (await Transaction.find({ user: userId }).lean()) as Record<string, unknown>[];
    expect(txns).toHaveLength(2);
    const promptTx = txns.find((t) => t.tokenType === 'prompt');
    const completionTx = txns.find((t) => t.tokenType === 'completion');
    expect(promptTx!.rawAmount).toBe(-promptTokens);
    expect(promptTx!.rate).toBe(promptMultiplier);
    expect(promptTx!.tokenValue).toBe(-promptTokens * promptMultiplier);
    expect(completionTx!.rawAmount).toBe(-completionTokens);
    expect(completionTx!.rate).toBe(completionMultiplier);
    expect(completionTx!.tokenValue).toBe(-completionTokens * completionMultiplier);
  });

  test('balance unchanged when balance.enabled is false — identical to legacy path', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 10000000;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const entries = prepareTokenSpend(
      txMeta(userId, { model: 'gpt-3.5-turbo', balance: { enabled: false } }),
      { promptTokens: 100, completionTokens: 50 },
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const balance = (await Balance.findOne({ user: userId }).lean()) as Record<string, unknown>;
    expect(balance.tokenCredits).toBe(initialBalance);
    const txns = await Transaction.find({ user: userId }).lean();
    expect(txns).toHaveLength(2); // transactions still inserted
  });

  test('no docs when transactions.enabled is false — identical to legacy path', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 10000000;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const entries = prepareTokenSpend(
      txMeta(userId, { model: 'gpt-3.5-turbo', transactions: { enabled: false } }),
      { promptTokens: 100, completionTokens: 50 },
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const txns = await Transaction.find({ user: userId }).lean();
    expect(txns).toHaveLength(0);
    const balance = (await Balance.findOne({ user: userId }).lean()) as Record<string, unknown>;
    expect(balance.tokenCredits).toBe(initialBalance);
  });

  test('abort context — transactions inserted, no balance update when balance not passed', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 10000000;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const model = 'gpt-3.5-turbo';
    const entries = prepareTokenSpend(
      txMeta(userId, { model, context: 'abort', balance: undefined }),
      { promptTokens: 100, completionTokens: 50 },
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const txns = await Transaction.find({ user: userId }).lean();
    expect(txns).toHaveLength(2);
    const balance = (await Balance.findOne({ user: userId }).lean()) as Record<string, unknown>;
    expect(balance.tokenCredits).toBe(initialBalance);
  });

  test('NaN promptTokens — only completion doc inserted, identical to legacy', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 10000000;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const entries = prepareTokenSpend(
      txMeta(userId, { model: 'gpt-3.5-turbo' }),
      { promptTokens: NaN, completionTokens: 50 },
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const txns = (await Transaction.find({ user: userId }).lean()) as Record<string, unknown>[];
    expect(txns).toHaveLength(1);
    expect(txns[0].tokenType).toBe('completion');
  });

  test('zero tokens produce docs with rawAmount=0, tokenValue=0', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    await Balance.create({ user: userId, tokenCredits: 10000 });
    const entries = prepareTokenSpend(
      txMeta(userId, { model: 'gpt-3.5-turbo' }),
      { promptTokens: 0, completionTokens: 0 },
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const txns = (await Transaction.find({ user: userId }).lean()) as Record<string, unknown>[];
    expect(txns).toHaveLength(2);
    expect(txns.every((t) => t.rawAmount === 0)).toBe(true);
    expect(txns.every((t) => t.tokenValue === 0)).toBe(true);
  });
});
describe('CANCEL_RATE parity (incomplete context)', () => {
  test('CANCEL_RATE applied to completion token — same tokenValue as legacy', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    await Balance.create({ user: userId, tokenCredits: 10000000 });
    const model = 'claude-3-5-sonnet';
    const completionTokens = 50;
    const promptTokens = 10;
    const completionMultiplier = getMultiplier({
      model,
      tokenType: 'completion',
      inputTokenCount: promptTokens,
    });
    const expectedCompletionTokenValue = Math.ceil(
      -completionTokens * completionMultiplier * CANCEL_RATE,
    );
    const entries = prepareTokenSpend(
      txMeta(userId, { model, context: 'incomplete' }),
      { promptTokens, completionTokens },
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const txns = (await Transaction.find({ user: userId }).lean()) as Record<string, unknown>[];
    const completionTx = txns.find((t) => t.tokenType === 'completion');
    expect(completionTx!.tokenValue).toBe(expectedCompletionTokenValue);
    expect(completionTx!.rate).toBeCloseTo(completionMultiplier * CANCEL_RATE, 5);
  });

  test('CANCEL_RATE NOT applied to prompt tokens in incomplete context', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    await Balance.create({ user: userId, tokenCredits: 10000000 });
    const model = 'claude-3-5-sonnet';
    const promptTokens = 100;
    const promptMultiplier = getMultiplier({
      model,
      tokenType: 'prompt',
      inputTokenCount: promptTokens,
    });
    const entries = prepareTokenSpend(
      txMeta(userId, { model, context: 'incomplete' }),
      { promptTokens, completionTokens: 0 },
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const txns = (await Transaction.find({ user: userId }).lean()) as Record<string, unknown>[];
    const promptTx = txns.find((t) => t.tokenType === 'prompt');
    expect(promptTx!.rate).toBe(promptMultiplier); // no CANCEL_RATE
  });
});
describe('Structured token parity', () => {
  test('balance deduction identical to legacy spendStructuredTokens', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 17613154.55;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const model = 'claude-3-5-sonnet';
    const tokenUsage = {
      promptTokens: { input: 11, write: 140522, read: 0 },
      completionTokens: 5,
    };
    const promptMultiplier = getMultiplier({
      model,
      tokenType: 'prompt',
      inputTokenCount: 11 + 140522,
    });
    const completionMultiplier = getMultiplier({
      model,
      tokenType: 'completion',
      inputTokenCount: 11 + 140522,
    });
    const writeMultiplier = getCacheMultiplier({ model, cacheType: 'write' }) ?? promptMultiplier;
    const readMultiplier = getCacheMultiplier({ model, cacheType: 'read' }) ?? promptMultiplier;
    const expectedPromptCost =
      tokenUsage.promptTokens.input * promptMultiplier +
      tokenUsage.promptTokens.write * writeMultiplier +
      tokenUsage.promptTokens.read * readMultiplier;
    const expectedCompletionCost = tokenUsage.completionTokens * completionMultiplier;
    const expectedTotalCost = expectedPromptCost + expectedCompletionCost;
    const expectedBalance = initialBalance - expectedTotalCost;
    const entries = prepareStructuredTokenSpend(txMeta(userId, { model }), tokenUsage, pricing);
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const balance = (await Balance.findOne({ user: userId }).lean()) as Record<string, unknown>;
    expect(Math.abs((balance.tokenCredits as number) - expectedBalance)).toBeLessThan(100);
    const txns = (await Transaction.find({ user: userId }).lean()) as Record<string, unknown>[];
    const promptTx = txns.find((t) => t.tokenType === 'prompt');
    expect(promptTx!.inputTokens).toBe(-11);
    expect(promptTx!.writeTokens).toBe(-140522);
    expect(Math.abs(Number(promptTx!.readTokens ?? 0))).toBe(0);
  });

  test('structured tokens with both cache_creation and cache_read', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 10000000;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const model = 'claude-3-5-sonnet';
    const tokenUsage = {
      promptTokens: { input: 100, write: 50, read: 30 },
      completionTokens: 80,
    };
    const totalInput = 180;
    const promptMultiplier = getMultiplier({
      model,
      tokenType: 'prompt',
      inputTokenCount: totalInput,
    });
    const writeMultiplier = getCacheMultiplier({ model, cacheType: 'write' }) ?? promptMultiplier;
    const readMultiplier = getCacheMultiplier({ model, cacheType: 'read' }) ?? promptMultiplier;
    const completionMultiplier = getMultiplier({
      model,
      tokenType: 'completion',
      inputTokenCount: totalInput,
    });
    const expectedPromptCost = 100 * promptMultiplier + 50 * writeMultiplier + 30 * readMultiplier;
    const expectedCost = expectedPromptCost + 80 * completionMultiplier;
    const entries = prepareStructuredTokenSpend(txMeta(userId, { model }), tokenUsage, pricing);
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const txns = (await Transaction.find({ user: userId }).lean()) as Record<string, unknown>[];
    expect(txns).toHaveLength(2);
    const promptTx = txns.find((t) => t.tokenType === 'prompt');
    expect(promptTx!.inputTokens).toBe(-100);
    expect(promptTx!.writeTokens).toBe(-50);
    expect(promptTx!.readTokens).toBe(-30);
    const balance = (await Balance.findOne({ user: userId }).lean()) as Record<string, unknown>;
    expect(
      Math.abs((balance.tokenCredits as number) - (initialBalance - expectedCost)),
    ).toBeLessThan(1);
  });

  test('CANCEL_RATE applied to completion in structured incomplete context', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    await Balance.create({ user: userId, tokenCredits: 17613154.55 });
    const model = 'claude-3-5-sonnet';
    const tokenUsage = {
      promptTokens: { input: 10, write: 100, read: 5 },
      completionTokens: 50,
    };
    const completionMultiplier = getMultiplier({
      model,
      tokenType: 'completion',
      inputTokenCount: 115,
    });
    const expectedCompletionTokenValue = Math.ceil(-50 * completionMultiplier * CANCEL_RATE);
    const entries = prepareStructuredTokenSpend(
      txMeta(userId, { model, context: 'incomplete' }),
      tokenUsage,
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const txns = (await Transaction.find({ user: userId }).lean()) as Record<string, unknown>[];
    const completionTx = txns.find((t) => t.tokenType === 'completion');
    expect(completionTx!.tokenValue).toBeCloseTo(expectedCompletionTokenValue, 0);
  });
});
describe('Premium pricing parity', () => {
  test('standard pricing below threshold — identical to legacy', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 100000000;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const model = 'claude-opus-4-6';
    const promptTokens = 100000;
    const completionTokens = 500;
    const standardPromptRate = (tokenValues as Record<string, Record<string, number>>)[model]
      .prompt;
    const standardCompletionRate = (tokenValues as Record<string, Record<string, number>>)[model]
      .completion;
    const expectedCost =
      promptTokens * standardPromptRate + completionTokens * standardCompletionRate;
    const entries = prepareTokenSpend(
      txMeta(userId, { model }),
      { promptTokens, completionTokens },
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const balance = (await Balance.findOne({ user: userId }).lean()) as Record<string, unknown>;
    expect(balance.tokenCredits).toBeCloseTo(initialBalance - expectedCost, 0);
  });

  test('premium pricing above threshold — identical to legacy', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 100000000;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const model = 'claude-opus-4-6';
    const promptTokens = 250000;
    const completionTokens = 500;
    const premiumPromptRate = (premiumTokenValues as Record<string, Record<string, number>>)[model]
      .prompt;
    const premiumCompletionRate = (premiumTokenValues as Record<string, Record<string, number>>)[
      model
    ].completion;
    const expectedCost =
      promptTokens * premiumPromptRate + completionTokens * premiumCompletionRate;
    const entries = prepareTokenSpend(
      txMeta(userId, { model }),
      { promptTokens, completionTokens },
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const balance = (await Balance.findOne({ user: userId }).lean()) as Record<string, unknown>;
    expect(balance.tokenCredits).toBeCloseTo(initialBalance - expectedCost, 0);
  });

  test('standard pricing at exactly the threshold — identical to legacy', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 100000000;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const model = 'claude-opus-4-6';
    const promptTokens = (premiumTokenValues as Record<string, Record<string, number>>)[model]
      .threshold;
    const completionTokens = 500;
    const standardPromptRate = (tokenValues as Record<string, Record<string, number>>)[model]
      .prompt;
    const standardCompletionRate = (tokenValues as Record<string, Record<string, number>>)[model]
      .completion;
    const expectedCost =
      promptTokens * standardPromptRate + completionTokens * standardCompletionRate;
    const entries = prepareTokenSpend(
      txMeta(userId, { model }),
      { promptTokens, completionTokens },
      pricing,
    );
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const balance = (await Balance.findOne({ user: userId }).lean()) as Record<string, unknown>;
    expect(balance.tokenCredits).toBeCloseTo(initialBalance - expectedCost, 0);
  });
});
describe('Multi-entry batch parity', () => {
  test('real-world sequential tool calls — total balance deduction identical to N individual legacy calls', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 100000000;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const model = 'claude-opus-4-5-20251101';
    const calls = [
      { promptTokens: 31596, completionTokens: 151 },
      { promptTokens: 35368, completionTokens: 150 },
      { promptTokens: 58362, completionTokens: 295 },
      { promptTokens: 112604, completionTokens: 193 },
      { promptTokens: 257440, completionTokens: 2217 },
    ];
    let expectedTotalCost = 0;
    const allEntries = [];
    for (const { promptTokens, completionTokens } of calls) {
      const pm = getMultiplier({ model, tokenType: 'prompt', inputTokenCount: promptTokens });
      const cm = getMultiplier({ model, tokenType: 'completion', inputTokenCount: promptTokens });
      expectedTotalCost += promptTokens * pm + completionTokens * cm;
      const entries = prepareTokenSpend(
        txMeta(userId, { model }),
        { promptTokens, completionTokens },
        pricing,
      );
      allEntries.push(...entries);
    }
    await bulkWriteTransactions({ user: userId, docs: allEntries }, dbOps());
    const txns = await Transaction.find({ user: userId }).lean();
    expect(txns).toHaveLength(10); // 5 calls × 2 docs
    const balance = (await Balance.findOne({ user: userId }).lean()) as Record<string, unknown>;
    expect(balance.tokenCredits).toBeCloseTo(initialBalance - expectedTotalCost, 0);
  });

  test('structured premium above threshold — batch vs individual produce same balance deduction', async () => {
    const userId = new mongoose.Types.ObjectId().toString();
    const initialBalance = 100000000;
    await Balance.create({ user: userId, tokenCredits: initialBalance });
    const model = 'claude-opus-4-6';
    const tokenUsage = {
      promptTokens: { input: 200000, write: 10000, read: 5000 },
      completionTokens: 1000,
    };
    const totalInput = 215000;
    const premiumPromptRate = (premiumTokenValues as Record<string, Record<string, number>>)[model]
      .prompt;
    const premiumCompletionRate = (premiumTokenValues as Record<string, Record<string, number>>)[
      model
    ].completion;
    const writeMultiplier = getCacheMultiplier({ model, cacheType: 'write' });
    const readMultiplier = getCacheMultiplier({ model, cacheType: 'read' });
    const expectedPromptCost =
      tokenUsage.promptTokens.input * premiumPromptRate +
      tokenUsage.promptTokens.write * writeMultiplier +
      tokenUsage.promptTokens.read * readMultiplier;
    const expectedCompletionCost = tokenUsage.completionTokens * premiumCompletionRate;
    const expectedTotalCost = expectedPromptCost + expectedCompletionCost;
    expect(totalInput).toBeGreaterThan(
      (premiumTokenValues as Record<string, Record<string, number>>)[model].threshold,
    );
    const entries = prepareStructuredTokenSpend(txMeta(userId, { model }), tokenUsage, pricing);
    await bulkWriteTransactions({ user: userId, docs: entries }, dbOps());
    const balance = (await Balance.findOne({ user: userId }).lean()) as Record<string, unknown>;
    expect(balance.tokenCredits).toBeCloseTo(initialBalance - expectedTotalCost, 0);
  });
});


@@ -0,0 +1,474 @@
import mongoose from 'mongoose';
import { MongoMemoryServer } from 'mongodb-memory-server';
import {
  CANCEL_RATE,
  createMethods,
  balanceSchema,
  transactionSchema,
} from '@librechat/data-schemas';
import type { PricingFns, TxMetadata, PreparedEntry } from './transactions';
import {
  prepareStructuredTokenSpend,
  bulkWriteTransactions,
  prepareTokenSpend,
} from './transactions';

jest.mock('@librechat/data-schemas', () => {
  const actual = jest.requireActual('@librechat/data-schemas');
  return {
    ...actual,
    logger: {
      debug: jest.fn(),
      error: jest.fn(),
      warn: jest.fn(),
      info: jest.fn(),
    },
  };
});

let mongoServer: MongoMemoryServer;
let Transaction: mongoose.Model<unknown>;
let Balance: mongoose.Model<unknown>;
let dbMethods: ReturnType<typeof createMethods>;

beforeAll(async () => {
  mongoServer = await MongoMemoryServer.create();
  await mongoose.connect(mongoServer.getUri());
  Transaction = mongoose.models.Transaction || mongoose.model('Transaction', transactionSchema);
  Balance = mongoose.models.Balance || mongoose.model('Balance', balanceSchema);
  dbMethods = createMethods(mongoose);
});

afterAll(async () => {
  await mongoose.disconnect();
  await mongoServer.stop();
});

beforeEach(async () => {
  await mongoose.connection.dropDatabase();
});

const testUserId = new mongoose.Types.ObjectId().toString();

const baseTxData: TxMetadata = {
  user: testUserId,
  context: 'message',
  conversationId: 'convo-123',
  model: 'gpt-4',
  messageId: 'msg-123',
  balance: { enabled: true },
  transactions: { enabled: true },
};

const mockPricing: PricingFns = {
  getMultiplier: jest.fn().mockReturnValue(2),
  getCacheMultiplier: jest.fn().mockReturnValue(null),
};
describe('prepareTokenSpend', () => {
  beforeEach(() => {
    jest.clearAllMocks();
  });

  it('should prepare prompt + completion entries', () => {
    const entries = prepareTokenSpend(
      baseTxData,
      { promptTokens: 100, completionTokens: 50 },
      mockPricing,
    );
    expect(entries).toHaveLength(2);
    expect(entries[0].doc.tokenType).toBe('prompt');
    expect(entries[1].doc.tokenType).toBe('completion');
  });

  it('should return empty array when transactions disabled', () => {
    const txData = { ...baseTxData, transactions: { enabled: false } };
    const entries = prepareTokenSpend(
      txData,
      { promptTokens: 100, completionTokens: 50 },
      mockPricing,
    );
    expect(entries).toHaveLength(0);
  });

  it('should filter out NaN rawAmount entries', () => {
    const entries = prepareTokenSpend(
      baseTxData,
      { promptTokens: NaN, completionTokens: 50 },
      mockPricing,
    );
    expect(entries).toHaveLength(1);
    expect(entries[0].doc.tokenType).toBe('completion');
  });

  it('should handle promptTokens only', () => {
    const entries = prepareTokenSpend(baseTxData, { promptTokens: 100 }, mockPricing);
    expect(entries).toHaveLength(1);
    expect(entries[0].doc.tokenType).toBe('prompt');
  });

  it('should handle completionTokens only', () => {
    const entries = prepareTokenSpend(baseTxData, { completionTokens: 50 }, mockPricing);
    expect(entries).toHaveLength(1);
    expect(entries[0].doc.tokenType).toBe('completion');
  });

  it('should handle zero tokens', () => {
    const entries = prepareTokenSpend(
      baseTxData,
      { promptTokens: 0, completionTokens: 0 },
      mockPricing,
    );
    expect(entries).toHaveLength(2);
    expect(entries[0].doc.rawAmount).toBe(0);
    expect(entries[1].doc.rawAmount).toBe(0);
  });

  it('should calculate tokenValue using pricing multiplier', () => {
    (mockPricing.getMultiplier as jest.Mock).mockReturnValue(3);
    const entries = prepareTokenSpend(
      baseTxData,
      { promptTokens: 100, completionTokens: 50 },
      mockPricing,
    );
    expect(entries[0].doc.rate).toBe(3);
    expect(entries[0].doc.tokenValue).toBe(-100 * 3);
    expect(entries[1].doc.rate).toBe(3);
    expect(entries[1].doc.tokenValue).toBe(-50 * 3);
  });

  it('should pass valueKey to getMultiplier', () => {
    prepareTokenSpend(baseTxData, { promptTokens: 100 }, mockPricing);
    expect(mockPricing.getMultiplier).toHaveBeenCalledWith(
      expect.objectContaining({ tokenType: 'prompt', model: 'gpt-4' }),
    );
  });

  it('should carry balance config on each entry', () => {
    const entries = prepareTokenSpend(
      baseTxData,
      { promptTokens: 100, completionTokens: 50 },
      mockPricing,
    );
    for (const entry of entries) {
      expect(entry.balance).toEqual({ enabled: true });
    }
  });
});
describe('prepareTokenSpend — CANCEL_RATE', () => {
  beforeEach(() => {
    jest.clearAllMocks();
    (mockPricing.getMultiplier as jest.Mock).mockReturnValue(2);
  });

  it('should apply CANCEL_RATE to completion tokens with incomplete context', () => {
    const txData: TxMetadata = { ...baseTxData, context: 'incomplete' };
    const entries = prepareTokenSpend(
      txData,
      { promptTokens: 100, completionTokens: 50 },
      mockPricing,
    );
    const completion = entries.find((e) => e.doc.tokenType === 'completion');
    expect(completion).toBeDefined();
    expect(completion!.doc.rate).toBe(2 * CANCEL_RATE);
    expect(completion!.doc.tokenValue).toBe(Math.ceil(-50 * 2 * CANCEL_RATE));
  });

  it('should NOT apply CANCEL_RATE to prompt tokens with incomplete context', () => {
    const txData: TxMetadata = { ...baseTxData, context: 'incomplete' };
    const entries = prepareTokenSpend(
      txData,
      { promptTokens: 100, completionTokens: 50 },
      mockPricing,
    );
    const prompt = entries.find((e) => e.doc.tokenType === 'prompt');
    expect(prompt!.doc.rate).toBe(2);
  });

  it('should NOT apply CANCEL_RATE for abort context', () => {
    const txData: TxMetadata = { ...baseTxData, context: 'abort' };
    const entries = prepareTokenSpend(txData, { completionTokens: 50 }, mockPricing);
    expect(entries[0].doc.rate).toBe(2);
  });
});
describe('prepareStructuredTokenSpend', () => {
beforeEach(() => {
jest.clearAllMocks();
(mockPricing.getMultiplier as jest.Mock).mockReturnValue(2);
(mockPricing.getCacheMultiplier as jest.Mock).mockReturnValue(null);
});
it('should prepare prompt + completion for structured tokens', () => {
const entries = prepareStructuredTokenSpend(
baseTxData,
{ promptTokens: { input: 100, write: 50, read: 30 }, completionTokens: 80 },
mockPricing,
);
expect(entries).toHaveLength(2);
expect(entries[0].doc.tokenType).toBe('prompt');
expect(entries[0].doc.inputTokens).toBe(-100);
expect(entries[0].doc.writeTokens).toBe(-50);
expect(entries[0].doc.readTokens).toBe(-30);
expect(entries[1].doc.tokenType).toBe('completion');
});
it('should use cache multipliers when available', () => {
(mockPricing.getCacheMultiplier as jest.Mock).mockImplementation(({ cacheType }) => {
if (cacheType === 'write') {
return 5;
}
if (cacheType === 'read') {
return 0.5;
}
return null;
});
const entries = prepareStructuredTokenSpend(
baseTxData,
{ promptTokens: { input: 100, write: 50, read: 30 }, completionTokens: 0 },
mockPricing,
);
const prompt = entries.find((e) => e.doc.tokenType === 'prompt');
expect(prompt).toBeDefined();
expect(prompt!.doc.rateDetail).toEqual({ input: 2, write: 5, read: 0.5 });
});
it('should return empty when transactions disabled', () => {
const txData = { ...baseTxData, transactions: { enabled: false } };
const entries = prepareStructuredTokenSpend(
txData,
{ promptTokens: { input: 100 }, completionTokens: 50 },
mockPricing,
);
expect(entries).toHaveLength(0);
});
it('should handle zero totalPromptTokens (fallback rate)', () => {
const entries = prepareStructuredTokenSpend(
baseTxData,
{ promptTokens: { input: 0, write: 0, read: 0 }, completionTokens: 50 },
mockPricing,
);
const prompt = entries.find((e) => e.doc.tokenType === 'prompt');
expect(prompt).toBeDefined();
expect(prompt!.doc.rate).toBe(2);
});
});
describe('bulkWriteTransactions (real DB)', () => {
it('should return early for empty docs without DB writes', async () => {
const dbOps = {
insertMany: dbMethods.bulkInsertTransactions,
updateBalance: dbMethods.updateBalance,
};
await bulkWriteTransactions({ user: testUserId, docs: [] }, dbOps);
const txCount = await Transaction.countDocuments();
expect(txCount).toBe(0);
});
it('should insert transaction documents into MongoDB', async () => {
const docs: PreparedEntry[] = [
{
doc: {
user: testUserId,
conversationId: 'c1',
tokenType: 'prompt',
tokenValue: -200,
rate: 2,
rawAmount: -100,
},
tokenValue: -200,
balance: { enabled: true },
},
{
doc: {
user: testUserId,
conversationId: 'c1',
tokenType: 'completion',
tokenValue: -100,
rate: 2,
rawAmount: -50,
},
tokenValue: -100,
balance: { enabled: true },
},
];
const dbOps = {
insertMany: dbMethods.bulkInsertTransactions,
updateBalance: dbMethods.updateBalance,
};
await bulkWriteTransactions({ user: testUserId, docs }, dbOps);
const saved = await Transaction.find({ user: testUserId }).lean();
expect(saved).toHaveLength(2);
expect(saved.map((t: Record<string, unknown>) => t.tokenType).sort()).toEqual([
'completion',
'prompt',
]);
});
it('should create balance document and update credits', async () => {
const docs: PreparedEntry[] = [
{
doc: { user: testUserId, conversationId: 'c1', tokenType: 'prompt', tokenValue: -300 },
tokenValue: -300,
balance: { enabled: true },
},
];
const dbOps = {
insertMany: dbMethods.bulkInsertTransactions,
updateBalance: dbMethods.updateBalance,
};
await bulkWriteTransactions({ user: testUserId, docs }, dbOps);
const bal = (await Balance.findOne({ user: testUserId }).lean()) as Record<
string,
unknown
> | null;
expect(bal).toBeDefined();
expect(bal!.tokenCredits).toBe(0);
});
it('should NOT update balance when no docs have balance enabled', async () => {
const docs: PreparedEntry[] = [
{
doc: { user: testUserId, conversationId: 'c1', tokenType: 'prompt', tokenValue: -100 },
tokenValue: -100,
balance: { enabled: false },
},
];
const dbOps = {
insertMany: dbMethods.bulkInsertTransactions,
updateBalance: dbMethods.updateBalance,
};
await bulkWriteTransactions({ user: testUserId, docs }, dbOps);
const txCount = await Transaction.countDocuments({ user: testUserId });
expect(txCount).toBe(1);
const bal = await Balance.findOne({ user: testUserId }).lean();
expect(bal).toBeNull();
});
it('should only sum tokenValue from balance-enabled docs', async () => {
await Balance.create({ user: testUserId, tokenCredits: 1000 });
const docs: PreparedEntry[] = [
{
doc: { user: testUserId, conversationId: 'c1', tokenType: 'prompt', tokenValue: -100 },
tokenValue: -100,
balance: { enabled: true },
},
{
doc: { user: testUserId, conversationId: 'c1', tokenType: 'completion', tokenValue: -50 },
tokenValue: -50,
balance: { enabled: false },
},
];
const dbOps = {
insertMany: dbMethods.bulkInsertTransactions,
updateBalance: dbMethods.updateBalance,
};
await bulkWriteTransactions({ user: testUserId, docs }, dbOps);
const bal = (await Balance.findOne({ user: testUserId }).lean()) as Record<
string,
unknown
> | null;
expect(bal!.tokenCredits).toBe(900);
});
it('should handle null balance gracefully', async () => {
const docs: PreparedEntry[] = [
{
doc: { user: testUserId, conversationId: 'c1', tokenType: 'prompt', tokenValue: -100 },
tokenValue: -100,
balance: null,
},
];
const dbOps = {
insertMany: dbMethods.bulkInsertTransactions,
updateBalance: dbMethods.updateBalance,
};
await bulkWriteTransactions({ user: testUserId, docs }, dbOps);
const txCount = await Transaction.countDocuments({ user: testUserId });
expect(txCount).toBe(1);
const bal = await Balance.findOne({ user: testUserId }).lean();
expect(bal).toBeNull();
});
});
describe('end-to-end: prepare → bulk write → verify', () => {
it('should prepare, write, and correctly update balance for standard tokens', async () => {
await Balance.create({ user: testUserId, tokenCredits: 10000 });
(mockPricing.getMultiplier as jest.Mock).mockReturnValue(2);
const entries = prepareTokenSpend(
baseTxData,
{ promptTokens: 100, completionTokens: 50 },
mockPricing,
);
const dbOps = {
insertMany: dbMethods.bulkInsertTransactions,
updateBalance: dbMethods.updateBalance,
};
await bulkWriteTransactions({ user: testUserId, docs: entries }, dbOps);
const txns = (await Transaction.find({ user: testUserId }).lean()) as Record<string, unknown>[];
expect(txns).toHaveLength(2);
const prompt = txns.find((t) => t.tokenType === 'prompt');
const completion = txns.find((t) => t.tokenType === 'completion');
expect(prompt!.tokenValue).toBe(-200);
expect(prompt!.rate).toBe(2);
expect(completion!.tokenValue).toBe(-100);
expect(completion!.rate).toBe(2);
const bal = (await Balance.findOne({ user: testUserId }).lean()) as Record<
string,
unknown
> | null;
expect(bal!.tokenCredits).toBe(10000 + -200 + -100);
});
it('should prepare and write structured tokens with cache pricing', async () => {
await Balance.create({ user: testUserId, tokenCredits: 5000 });
(mockPricing.getMultiplier as jest.Mock).mockReturnValue(1);
(mockPricing.getCacheMultiplier as jest.Mock).mockImplementation(({ cacheType }) => {
if (cacheType === 'write') {
return 3;
}
if (cacheType === 'read') {
return 0.1;
}
return null;
});
const entries = prepareStructuredTokenSpend(
baseTxData,
{ promptTokens: { input: 100, write: 50, read: 200 }, completionTokens: 80 },
mockPricing,
);
const dbOps = {
insertMany: dbMethods.bulkInsertTransactions,
updateBalance: dbMethods.updateBalance,
};
await bulkWriteTransactions({ user: testUserId, docs: entries }, dbOps);
const txns = (await Transaction.find({ user: testUserId }).lean()) as Record<string, unknown>[];
expect(txns).toHaveLength(2);
const prompt = txns.find((t) => t.tokenType === 'prompt');
expect(prompt!.inputTokens).toBe(-100);
expect(prompt!.writeTokens).toBe(-50);
expect(prompt!.readTokens).toBe(-200);
const bal = (await Balance.findOne({ user: testUserId }).lean()) as Record<
string,
unknown
> | null;
expect(bal!.tokenCredits).toBeLessThan(5000);
});
});
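The end-to-end assertions above come down to simple arithmetic. A standalone sketch of that math follows (it mirrors the standard spend path but is not the real module; the `CANCEL_RATE` figure of 1.15 is an assumption here — the actual constant is exported from `@librechat/data-schemas`):

```typescript
// Sketch of the standard token-spend arithmetic exercised by the tests above.
// Not the real module; CANCEL_RATE = 1.15 is assumed for illustration only.
const CANCEL_RATE = 1.15;

function tokenValue(rawTokens: number, multiplier: number, cancelled = false): number {
  // Spends are stored as negative values.
  let value = -rawTokens * multiplier;
  if (cancelled) {
    // Incomplete-context completions are billed at an adjusted rate,
    // rounded toward zero with Math.ceil (the value is negative).
    value = Math.ceil(value * CANCEL_RATE);
  }
  return value;
}

// Balance 10000, multiplier 2, 100 prompt + 50 completion tokens:
const newBalance = 10000 + tokenValue(100, 2) + tokenValue(50, 2);
// → 10000 - 200 - 100 = 9700
```

This is the same arithmetic the end-to-end test asserts with `10000 + -200 + -100`.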


@@ -0,0 +1,345 @@
import { CANCEL_RATE } from '@librechat/data-schemas';
import type { TCustomConfig, TTransactionsConfig } from 'librechat-data-provider';
import type { TransactionData } from '@librechat/data-schemas';
import type { EndpointTokenConfig } from '~/types/tokens';
interface GetMultiplierParams {
valueKey?: string;
tokenType?: string;
model?: string;
endpointTokenConfig?: EndpointTokenConfig;
inputTokenCount?: number;
}
interface GetCacheMultiplierParams {
cacheType: 'write' | 'read';
model?: string;
endpointTokenConfig?: EndpointTokenConfig;
}
export interface PricingFns {
getMultiplier: (params: GetMultiplierParams) => number;
getCacheMultiplier: (params: GetCacheMultiplierParams) => number | null;
}
interface BaseTxData {
user: string;
model?: string;
context: string;
messageId?: string;
conversationId: string;
endpointTokenConfig?: EndpointTokenConfig;
balance?: Partial<TCustomConfig['balance']> | null;
transactions?: Partial<TTransactionsConfig>;
}
interface StandardTxData extends BaseTxData {
tokenType: string;
rawAmount: number;
inputTokenCount?: number;
valueKey?: string;
}
interface StructuredTxData extends BaseTxData {
tokenType: string;
inputTokens?: number;
writeTokens?: number;
readTokens?: number;
inputTokenCount?: number;
rawAmount?: number;
}
export interface PreparedEntry {
doc: TransactionData;
tokenValue: number;
balance?: Partial<TCustomConfig['balance']> | null;
}
export interface TokenUsage {
promptTokens?: number;
completionTokens?: number;
}
export interface StructuredPromptTokens {
input?: number;
write?: number;
read?: number;
}
export interface StructuredTokenUsage {
promptTokens?: StructuredPromptTokens;
completionTokens?: number;
}
export interface TxMetadata {
user: string;
model?: string;
context: string;
messageId?: string;
conversationId: string;
balance?: Partial<TCustomConfig['balance']> | null;
transactions?: Partial<TTransactionsConfig>;
endpointTokenConfig?: EndpointTokenConfig;
}
export interface BulkWriteDeps {
insertMany: (docs: TransactionData[]) => Promise<unknown>;
updateBalance: (params: { user: string; incrementValue: number }) => Promise<unknown>;
}
function calculateTokenValue(
txData: StandardTxData,
pricing: PricingFns,
): { tokenValue: number; rate: number } {
const { tokenType, model, endpointTokenConfig, inputTokenCount, rawAmount, valueKey } = txData;
const multiplier = Math.abs(
pricing.getMultiplier({ valueKey, tokenType, model, endpointTokenConfig, inputTokenCount }),
);
let rate = multiplier;
let tokenValue = rawAmount * multiplier;
if (txData.context === 'incomplete' && tokenType === 'completion') {
tokenValue = Math.ceil(tokenValue * CANCEL_RATE);
rate *= CANCEL_RATE;
}
return { tokenValue, rate };
}
function calculateStructuredTokenValue(
txData: StructuredTxData,
pricing: PricingFns,
): { tokenValue: number; rate: number; rawAmount: number; rateDetail?: Record<string, number> } {
const { tokenType, model, endpointTokenConfig, inputTokenCount } = txData;
if (!tokenType) {
return { tokenValue: txData.rawAmount ?? 0, rate: 0, rawAmount: txData.rawAmount ?? 0 };
}
if (tokenType === 'prompt') {
const inputMultiplier = pricing.getMultiplier({
tokenType: 'prompt',
model,
endpointTokenConfig,
inputTokenCount,
});
const writeMultiplier =
pricing.getCacheMultiplier({ cacheType: 'write', model, endpointTokenConfig }) ??
inputMultiplier;
const readMultiplier =
pricing.getCacheMultiplier({ cacheType: 'read', model, endpointTokenConfig }) ??
inputMultiplier;
const inputAbs = Math.abs(txData.inputTokens ?? 0);
const writeAbs = Math.abs(txData.writeTokens ?? 0);
const readAbs = Math.abs(txData.readTokens ?? 0);
const totalPromptTokens = inputAbs + writeAbs + readAbs;
const rate =
totalPromptTokens > 0
? (Math.abs(inputMultiplier * (txData.inputTokens ?? 0)) +
Math.abs(writeMultiplier * (txData.writeTokens ?? 0)) +
Math.abs(readMultiplier * (txData.readTokens ?? 0))) /
totalPromptTokens
: Math.abs(inputMultiplier);
const tokenValue = -(
inputAbs * inputMultiplier +
writeAbs * writeMultiplier +
readAbs * readMultiplier
);
return {
tokenValue,
rate,
rawAmount: -totalPromptTokens,
rateDetail: { input: inputMultiplier, write: writeMultiplier, read: readMultiplier },
};
}
const multiplier = pricing.getMultiplier({
tokenType,
model,
endpointTokenConfig,
inputTokenCount,
});
const rawAmount = -Math.abs(txData.rawAmount ?? 0);
let rate = Math.abs(multiplier);
let tokenValue = rawAmount * multiplier;
if (txData.context === 'incomplete' && tokenType === 'completion') {
tokenValue = Math.ceil(tokenValue * CANCEL_RATE);
rate *= CANCEL_RATE;
}
return { tokenValue, rate, rawAmount };
}
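// Worked example of the blended prompt rate above, using the same figures as
// the cache-multiplier unit test earlier in this PR
// (input 100 @ 2, write 50 @ 5, read 30 @ 0.5):
//   tokenValue = -(100*2 + 50*5 + 30*0.5) = -465
//   rate       = 465 / (100 + 50 + 30)    = 465 / 180 ≈ 2.583
// i.e. `rate` is the value-weighted average multiplier across the three token
// classes, so rawAmount * rate recovers tokenValue.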
function prepareStandardTx(
_txData: StandardTxData & {
balance?: Partial<TCustomConfig['balance']> | null;
transactions?: Partial<TTransactionsConfig>;
},
pricing: PricingFns,
): PreparedEntry | null {
const { balance, transactions, ...txData } = _txData;
if (txData.rawAmount != null && isNaN(txData.rawAmount)) {
return null;
}
if (transactions?.enabled === false) {
return null;
}
const { tokenValue, rate } = calculateTokenValue(txData, pricing);
return {
doc: { ...txData, tokenValue, rate },
tokenValue,
balance,
};
}
function prepareStructuredTx(
_txData: StructuredTxData & {
balance?: Partial<TCustomConfig['balance']> | null;
transactions?: Partial<TTransactionsConfig>;
},
pricing: PricingFns,
): PreparedEntry | null {
const { balance, transactions, ...txData } = _txData;
if (transactions?.enabled === false) {
return null;
}
const { tokenValue, rate, rawAmount, rateDetail } = calculateStructuredTokenValue(
txData,
pricing,
);
return {
doc: {
...txData,
tokenValue,
rate,
rawAmount,
...(rateDetail && { rateDetail }),
},
tokenValue,
balance,
};
}
export function prepareTokenSpend(
txData: TxMetadata,
tokenUsage: TokenUsage,
pricing: PricingFns,
): PreparedEntry[] {
const { promptTokens, completionTokens } = tokenUsage;
const results: PreparedEntry[] = [];
const normalizedPromptTokens = Math.max(promptTokens ?? 0, 0);
if (promptTokens !== undefined) {
const entry = prepareStandardTx(
{
...txData,
tokenType: 'prompt',
rawAmount: promptTokens === 0 ? 0 : -normalizedPromptTokens,
inputTokenCount: normalizedPromptTokens,
},
pricing,
);
if (entry) {
results.push(entry);
}
}
if (completionTokens !== undefined) {
const entry = prepareStandardTx(
{
...txData,
tokenType: 'completion',
rawAmount: completionTokens === 0 ? 0 : -Math.max(completionTokens, 0),
inputTokenCount: normalizedPromptTokens,
},
pricing,
);
if (entry) {
results.push(entry);
}
}
return results;
}
export function prepareStructuredTokenSpend(
txData: TxMetadata,
tokenUsage: StructuredTokenUsage,
pricing: PricingFns,
): PreparedEntry[] {
const { promptTokens, completionTokens } = tokenUsage;
const results: PreparedEntry[] = [];
if (promptTokens) {
const input = Math.max(promptTokens.input ?? 0, 0);
const write = Math.max(promptTokens.write ?? 0, 0);
const read = Math.max(promptTokens.read ?? 0, 0);
const totalInputTokens = input + write + read;
const entry = prepareStructuredTx(
{
...txData,
tokenType: 'prompt',
inputTokens: -input,
writeTokens: -write,
readTokens: -read,
inputTokenCount: totalInputTokens,
},
pricing,
);
if (entry) {
results.push(entry);
}
}
if (completionTokens) {
const totalInputTokens = promptTokens
? Math.max(promptTokens.input ?? 0, 0) +
Math.max(promptTokens.write ?? 0, 0) +
Math.max(promptTokens.read ?? 0, 0)
: undefined;
const entry = prepareStandardTx(
{
...txData,
tokenType: 'completion',
rawAmount: -Math.max(completionTokens, 0),
inputTokenCount: totalInputTokens,
},
pricing,
);
if (entry) {
results.push(entry);
}
}
return results;
}
export async function bulkWriteTransactions(
{ user, docs }: { user: string; docs: PreparedEntry[] },
dbOps: BulkWriteDeps,
): Promise<void> {
if (!docs.length) {
return;
}
let totalTokenValue = 0;
let balanceEnabled = false;
const plainDocs = docs.map(({ doc, tokenValue, balance }) => {
if (balance?.enabled) {
balanceEnabled = true;
totalTokenValue += tokenValue;
}
return doc;
});
if (balanceEnabled) {
await dbOps.updateBalance({ user, incrementValue: totalTokenValue });
}
await dbOps.insertMany(plainDocs);
}
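The single-pass aggregation in `bulkWriteTransactions` can be restated standalone (types simplified; `aggregate` and `SketchEntry` are hypothetical names for illustration, not part of the module):

```typescript
// Standalone restatement of the single-pass map in bulkWriteTransactions:
// extract the plain docs while summing tokenValue only for balance-enabled
// entries. `aggregate`/`SketchEntry` are illustrative names only.
interface SketchEntry {
  doc: { tokenType: string; tokenValue: number };
  tokenValue: number;
  balance?: { enabled?: boolean } | null;
}

function aggregate(entries: SketchEntry[]) {
  let totalTokenValue = 0;
  let balanceEnabled = false;
  const plainDocs = entries.map(({ doc, tokenValue, balance }) => {
    if (balance?.enabled) {
      balanceEnabled = true;
      totalTokenValue += tokenValue;
    }
    return doc;
  });
  return { plainDocs, totalTokenValue, balanceEnabled };
}

const { plainDocs, totalTokenValue, balanceEnabled } = aggregate([
  { doc: { tokenType: 'prompt', tokenValue: -100 }, tokenValue: -100, balance: { enabled: true } },
  { doc: { tokenType: 'completion', tokenValue: -50 }, tokenValue: -50, balance: { enabled: false } },
]);
// Both docs are inserted, but only the balance-enabled -100 is billed.
```

This mirrors the "only sum tokenValue from balance-enabled docs" test above: the balance moves by -100 while both transaction docs are still stored.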


@@ -0,0 +1,533 @@
/**
* Bulk path parity tests for recordCollectedUsage.
*
* Every test here mirrors a corresponding legacy-path test in usage.spec.ts.
* The return values (input_tokens, output_tokens) must be identical between paths.
* The docs written to insertMany must carry the same metadata as the args that
* would have been passed to spendTokens/spendStructuredTokens.
*/
import type { UsageMetadata } from '../stream/interfaces/IJobStore';
import type { RecordUsageDeps, RecordUsageParams } from './usage';
import type { BulkWriteDeps, PricingFns } from './transactions';
import { recordCollectedUsage } from './usage';
describe('recordCollectedUsage — bulk path parity', () => {
let mockSpendTokens: jest.Mock;
let mockSpendStructuredTokens: jest.Mock;
let mockInsertMany: jest.Mock;
let mockUpdateBalance: jest.Mock;
let mockPricing: PricingFns;
let mockBulkWriteOps: BulkWriteDeps;
let deps: RecordUsageDeps;
const baseParams: Omit<RecordUsageParams, 'collectedUsage'> = {
user: 'user-123',
conversationId: 'convo-123',
model: 'gpt-4',
context: 'message',
balance: { enabled: true },
transactions: { enabled: true },
};
beforeEach(() => {
jest.clearAllMocks();
mockSpendTokens = jest.fn().mockResolvedValue(undefined);
mockSpendStructuredTokens = jest.fn().mockResolvedValue(undefined);
mockInsertMany = jest.fn().mockResolvedValue(undefined);
mockUpdateBalance = jest.fn().mockResolvedValue({});
mockPricing = {
getMultiplier: jest.fn().mockReturnValue(1),
getCacheMultiplier: jest.fn().mockReturnValue(null),
};
mockBulkWriteOps = {
insertMany: mockInsertMany,
updateBalance: mockUpdateBalance,
};
deps = {
spendTokens: mockSpendTokens,
spendStructuredTokens: mockSpendStructuredTokens,
pricing: mockPricing,
bulkWriteOps: mockBulkWriteOps,
};
});
describe('basic functionality', () => {
it('should return undefined if collectedUsage is empty', async () => {
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage: [] });
expect(result).toBeUndefined();
expect(mockInsertMany).not.toHaveBeenCalled();
expect(mockSpendTokens).not.toHaveBeenCalled();
});
it('should return undefined if collectedUsage is null-ish', async () => {
const result = await recordCollectedUsage(deps, {
...baseParams,
collectedUsage: null as unknown as UsageMetadata[],
});
expect(result).toBeUndefined();
expect(mockInsertMany).not.toHaveBeenCalled();
});
it('should handle single usage entry — same return value as legacy path', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result).toEqual({ input_tokens: 100, output_tokens: 50 });
expect(mockSpendTokens).not.toHaveBeenCalled();
expect(mockInsertMany).toHaveBeenCalledTimes(1);
const docs = mockInsertMany.mock.calls[0][0];
expect(docs).toHaveLength(2);
const promptDoc = docs.find((d: { tokenType: string }) => d.tokenType === 'prompt');
const completionDoc = docs.find((d: { tokenType: string }) => d.tokenType === 'completion');
expect(promptDoc.user).toBe('user-123');
expect(promptDoc.conversationId).toBe('convo-123');
expect(promptDoc.model).toBe('gpt-4');
expect(promptDoc.context).toBe('message');
expect(promptDoc.rawAmount).toBe(-100);
expect(completionDoc.rawAmount).toBe(-50);
});
it('should skip null entries — same return value as legacy path', async () => {
const collectedUsage = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
null,
{ input_tokens: 200, output_tokens: 60, model: 'gpt-4' },
] as UsageMetadata[];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result).toEqual({ input_tokens: 100, output_tokens: 110 });
expect(mockInsertMany).toHaveBeenCalledTimes(1);
const docs = mockInsertMany.mock.calls[0][0];
expect(docs).toHaveLength(4); // 2 non-null entries × 2 docs each
});
});
describe('sequential execution (tool calls)', () => {
it('should calculate tokens correctly for sequential tool calls — same totals as legacy', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{ input_tokens: 150, output_tokens: 30, model: 'gpt-4' },
{ input_tokens: 180, output_tokens: 20, model: 'gpt-4' },
];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result?.output_tokens).toBe(100); // 50 + 30 + 20
expect(result?.input_tokens).toBe(100); // first entry's input
expect(mockInsertMany).toHaveBeenCalledTimes(1);
const docs = mockInsertMany.mock.calls[0][0];
expect(docs).toHaveLength(6); // 3 entries × 2 docs
expect(mockSpendTokens).not.toHaveBeenCalled();
});
});
describe('parallel execution (multiple agents)', () => {
it('should handle parallel agents — same output_tokens total as legacy', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{ input_tokens: 80, output_tokens: 40, model: 'gpt-4' },
];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result?.output_tokens).toBe(90); // 50 + 40
expect(result?.output_tokens).toBeGreaterThan(0);
expect(mockInsertMany).toHaveBeenCalledTimes(1);
});
/** Regression: with parallel agents, a second agent reporting LOWER input tokens used to produce negative output_tokens via the incremental calculation. */
it('should NOT produce negative output_tokens — same positive result as legacy', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 200, output_tokens: 100, model: 'gpt-4' },
{ input_tokens: 50, output_tokens: 30, model: 'gpt-4' },
];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result?.output_tokens).toBeGreaterThan(0);
expect(result?.output_tokens).toBe(130); // 100 + 30
});
it('should calculate correct total output for 3 parallel agents', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{ input_tokens: 120, output_tokens: 60, model: 'gpt-4-turbo' },
{ input_tokens: 80, output_tokens: 40, model: 'claude-3' },
];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result?.output_tokens).toBe(150); // 50 + 60 + 40
expect(mockInsertMany).toHaveBeenCalledTimes(1);
const docs = mockInsertMany.mock.calls[0][0];
expect(docs).toHaveLength(6);
expect(mockSpendTokens).not.toHaveBeenCalled();
});
});
describe('cache token handling - OpenAI format', () => {
it('should route cache entries to structured path — same input_tokens as legacy', async () => {
const collectedUsage: UsageMetadata[] = [
{
input_tokens: 100,
output_tokens: 50,
model: 'gpt-4',
input_token_details: { cache_creation: 20, cache_read: 10 },
},
];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result?.input_tokens).toBe(130); // 100 + 20 + 10
expect(mockInsertMany).toHaveBeenCalledTimes(1);
expect(mockSpendStructuredTokens).not.toHaveBeenCalled();
expect(mockSpendTokens).not.toHaveBeenCalled();
const docs = mockInsertMany.mock.calls[0][0];
const promptDoc = docs.find((d: { tokenType: string }) => d.tokenType === 'prompt');
expect(promptDoc.inputTokens).toBe(-100);
expect(promptDoc.writeTokens).toBe(-20);
expect(promptDoc.readTokens).toBe(-10);
expect(promptDoc.model).toBe('gpt-4');
});
});
describe('cache token handling - Anthropic format', () => {
it('should route Anthropic cache entries to structured path — same input_tokens as legacy', async () => {
const collectedUsage: UsageMetadata[] = [
{
input_tokens: 100,
output_tokens: 50,
model: 'claude-3',
cache_creation_input_tokens: 25,
cache_read_input_tokens: 15,
},
];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result?.input_tokens).toBe(140); // 100 + 25 + 15
expect(mockInsertMany).toHaveBeenCalledTimes(1);
expect(mockSpendStructuredTokens).not.toHaveBeenCalled();
const docs = mockInsertMany.mock.calls[0][0];
const promptDoc = docs.find((d: { tokenType: string }) => d.tokenType === 'prompt');
expect(promptDoc.inputTokens).toBe(-100);
expect(promptDoc.writeTokens).toBe(-25);
expect(promptDoc.readTokens).toBe(-15);
expect(promptDoc.model).toBe('claude-3');
});
});
describe('mixed cache and non-cache entries', () => {
it('should handle mixed entries — same output_tokens as legacy', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{
input_tokens: 150,
output_tokens: 30,
model: 'gpt-4',
input_token_details: { cache_creation: 10, cache_read: 5 },
},
{ input_tokens: 200, output_tokens: 20, model: 'gpt-4' },
];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result?.output_tokens).toBe(100); // 50 + 30 + 20
expect(mockInsertMany).toHaveBeenCalledTimes(1);
expect(mockSpendTokens).not.toHaveBeenCalled();
expect(mockSpendStructuredTokens).not.toHaveBeenCalled();
const docs = mockInsertMany.mock.calls[0][0];
expect(docs).toHaveLength(6); // 3 entries × 2 docs each
});
});
describe('model fallback', () => {
it('should use usage.model when available — model lands in doc', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4-turbo' },
];
await recordCollectedUsage(deps, {
...baseParams,
model: 'fallback-model',
collectedUsage,
});
const docs = mockInsertMany.mock.calls[0][0];
expect(docs[0].model).toBe('gpt-4-turbo');
});
it('should fallback to param model when usage.model is missing — model lands in doc', async () => {
const collectedUsage: UsageMetadata[] = [{ input_tokens: 100, output_tokens: 50 }];
await recordCollectedUsage(deps, {
...baseParams,
model: 'param-model',
collectedUsage,
});
const docs = mockInsertMany.mock.calls[0][0];
expect(docs[0].model).toBe('param-model');
});
it('should fallback to undefined model when both usage.model and param model are missing', async () => {
const collectedUsage: UsageMetadata[] = [{ input_tokens: 100, output_tokens: 50 }];
await recordCollectedUsage(deps, {
...baseParams,
model: undefined,
collectedUsage,
});
const docs = mockInsertMany.mock.calls[0][0];
expect(docs[0].model).toBeUndefined();
});
});
describe('real-world scenarios', () => {
it('should correctly sum output tokens for sequential tool calls with growing context', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 31596, output_tokens: 151, model: 'claude-opus' },
{ input_tokens: 35368, output_tokens: 150, model: 'claude-opus' },
{ input_tokens: 58362, output_tokens: 295, model: 'claude-opus' },
{ input_tokens: 112604, output_tokens: 193, model: 'claude-opus' },
{ input_tokens: 257440, output_tokens: 2217, model: 'claude-opus' },
];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result?.input_tokens).toBe(31596);
expect(result?.output_tokens).toBe(3006); // 151+150+295+193+2217
expect(mockInsertMany).toHaveBeenCalledTimes(1);
const docs = mockInsertMany.mock.calls[0][0];
expect(docs).toHaveLength(10); // 5 entries × 2 docs
expect(mockSpendTokens).not.toHaveBeenCalled();
});
it('should handle cache tokens with multiple tool calls — same totals as legacy', async () => {
const collectedUsage: UsageMetadata[] = [
{
input_tokens: 788,
output_tokens: 163,
model: 'claude-opus',
input_token_details: { cache_read: 0, cache_creation: 30808 },
},
{
input_tokens: 3802,
output_tokens: 149,
model: 'claude-opus',
input_token_details: { cache_read: 30808, cache_creation: 768 },
},
{
input_tokens: 26808,
output_tokens: 225,
model: 'claude-opus',
input_token_details: { cache_read: 31576, cache_creation: 0 },
},
];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result?.input_tokens).toBe(31596); // 788 + 30808 + 0
expect(result?.output_tokens).toBe(537); // 163 + 149 + 225
expect(mockInsertMany).toHaveBeenCalledTimes(1);
expect(mockSpendStructuredTokens).not.toHaveBeenCalled();
expect(mockSpendTokens).not.toHaveBeenCalled();
});
});
describe('error handling', () => {
it('should catch bulk write errors — still returns correct result', async () => {
mockInsertMany.mockRejectedValue(new Error('DB error'));
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
const result = await recordCollectedUsage(deps, { ...baseParams, collectedUsage });
expect(result).toEqual({ input_tokens: 100, output_tokens: 50 });
});
});
describe('transaction metadata — doc fields match what legacy would pass to spendTokens', () => {
it('should pass all metadata fields to docs', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
const endpointTokenConfig = { 'gpt-4': { prompt: 0.01, completion: 0.03, context: 8192 } };
await recordCollectedUsage(deps, {
...baseParams,
messageId: 'msg-123',
endpointTokenConfig,
collectedUsage,
});
const docs = mockInsertMany.mock.calls[0][0];
for (const doc of docs) {
expect(doc.user).toBe('user-123');
expect(doc.conversationId).toBe('convo-123');
expect(doc.model).toBe('gpt-4');
expect(doc.context).toBe('message');
expect(doc.messageId).toBe('msg-123');
}
});
it('should use default context "message" when not provided', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
await recordCollectedUsage(deps, {
user: 'user-123',
conversationId: 'convo-123',
collectedUsage,
});
const docs = mockInsertMany.mock.calls[0][0];
expect(docs[0].context).toBe('message');
});
it('should allow custom context like "title"', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
await recordCollectedUsage(deps, {
...baseParams,
context: 'title',
collectedUsage,
});
const docs = mockInsertMany.mock.calls[0][0];
expect(docs[0].context).toBe('title');
});
});
describe('messageId propagation — messageId on every doc', () => {
it('should propagate messageId to all docs', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 10, output_tokens: 5, model: 'gpt-4' },
];
await recordCollectedUsage(deps, {
...baseParams,
messageId: 'msg-1',
collectedUsage,
});
const docs = mockInsertMany.mock.calls[0][0];
for (const doc of docs) {
expect(doc.messageId).toBe('msg-1');
}
});
it('should propagate messageId to structured cache docs', async () => {
const collectedUsage: UsageMetadata[] = [
{
input_tokens: 100,
output_tokens: 50,
model: 'claude-3',
cache_creation_input_tokens: 25,
cache_read_input_tokens: 15,
},
];
await recordCollectedUsage(deps, {
...baseParams,
messageId: 'msg-cache-1',
collectedUsage,
});
const docs = mockInsertMany.mock.calls[0][0];
for (const doc of docs) {
expect(doc.messageId).toBe('msg-cache-1');
}
expect(mockSpendStructuredTokens).not.toHaveBeenCalled();
});
it('should pass undefined messageId when not provided', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 10, output_tokens: 5, model: 'gpt-4' },
];
await recordCollectedUsage(deps, {
user: 'user-123',
conversationId: 'convo-123',
collectedUsage,
});
const docs = mockInsertMany.mock.calls[0][0];
expect(docs[0].messageId).toBeUndefined();
});
it('should propagate messageId across all entries in a multi-entry batch', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{ input_tokens: 200, output_tokens: 60, model: 'gpt-4' },
{
input_tokens: 150,
output_tokens: 30,
model: 'gpt-4',
input_token_details: { cache_creation: 10, cache_read: 5 },
},
];
await recordCollectedUsage(deps, {
...baseParams,
messageId: 'msg-multi',
collectedUsage,
});
const docs = mockInsertMany.mock.calls[0][0];
for (const doc of docs) {
expect(doc.messageId).toBe('msg-multi');
}
expect(mockSpendTokens).not.toHaveBeenCalled();
expect(mockSpendStructuredTokens).not.toHaveBeenCalled();
});
});
describe('balance behavior parity', () => {
it('should not call updateBalance when balance is disabled — same as legacy', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
await recordCollectedUsage(deps, {
...baseParams,
balance: { enabled: false },
collectedUsage,
});
expect(mockInsertMany).toHaveBeenCalledTimes(1);
expect(mockUpdateBalance).not.toHaveBeenCalled();
});
it('should not insert docs when transactions are disabled — same as legacy', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
await recordCollectedUsage(deps, {
...baseParams,
transactions: { enabled: false },
collectedUsage,
});
expect(mockInsertMany).not.toHaveBeenCalled();
expect(mockUpdateBalance).not.toHaveBeenCalled();
});
});
});
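The enable-flag gating that the "balance behavior parity" tests above exercise can be sketched as a small pure helper. This is a hedged sketch: `WriteFlags` and `plannedWrites` are illustrative names, and the defaults (transactions on unless explicitly disabled, balance only on when explicitly enabled) are assumptions inferred from the test expectations, not the real `recordCollectedUsage` contract.

```typescript
// Sketch only: flag shapes and defaults are assumptions inferred from the tests.
interface WriteFlags {
  transactions?: { enabled?: boolean };
  balance?: { enabled?: boolean };
}

// Decide which writes would happen for a batch of usage entries.
function plannedWrites(flags: WriteFlags): { insertDocs: boolean; updateBalance: boolean } {
  // Transactions default to enabled; an explicit `enabled: false` skips inserts.
  const insertDocs = flags.transactions?.enabled !== false;
  // Balance updates require transactions AND an explicitly enabled balance.
  const updateBalance = insertDocs && flags.balance?.enabled === true;
  return { insertDocs, updateBalance };
}
```

With `transactions: { enabled: false }` both writes are skipped, while `balance: { enabled: false }` still inserts docs but never touches the balance — matching the two parity tests above.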


@@ -1,6 +1,7 @@
import { recordCollectedUsage } from './usage';
import type { RecordUsageDeps, RecordUsageParams } from './usage';
import type { UsageMetadata } from '../stream/interfaces/IJobStore';
import type { RecordUsageDeps, RecordUsageParams } from './usage';
import type { BulkWriteDeps, PricingFns } from './transactions';
import { recordCollectedUsage } from './usage';
describe('recordCollectedUsage', () => {
let mockSpendTokens: jest.Mock;
@@ -522,4 +523,199 @@ describe('recordCollectedUsage', () => {
);
});
});
describe('bulk write path', () => {
let mockInsertMany: jest.Mock;
let mockUpdateBalance: jest.Mock;
let mockPricing: PricingFns;
let mockBulkWriteOps: BulkWriteDeps;
let bulkDeps: RecordUsageDeps;
beforeEach(() => {
mockInsertMany = jest.fn().mockResolvedValue(undefined);
mockUpdateBalance = jest.fn().mockResolvedValue({});
mockPricing = {
getMultiplier: jest.fn().mockReturnValue(1),
getCacheMultiplier: jest.fn().mockReturnValue(null),
};
mockBulkWriteOps = {
insertMany: mockInsertMany,
updateBalance: mockUpdateBalance,
};
bulkDeps = {
spendTokens: mockSpendTokens,
spendStructuredTokens: mockSpendStructuredTokens,
pricing: mockPricing,
bulkWriteOps: mockBulkWriteOps,
};
});
it('should use bulk path when pricing and bulkWriteOps are provided', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
const result = await recordCollectedUsage(bulkDeps, {
...baseParams,
collectedUsage,
});
expect(mockInsertMany).toHaveBeenCalledTimes(1);
expect(mockSpendTokens).not.toHaveBeenCalled();
expect(mockSpendStructuredTokens).not.toHaveBeenCalled();
expect(result).toEqual({ input_tokens: 100, output_tokens: 50 });
});
it('should batch all entries into a single insertMany call', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{ input_tokens: 200, output_tokens: 60, model: 'gpt-4' },
{ input_tokens: 300, output_tokens: 70, model: 'gpt-4' },
];
await recordCollectedUsage(bulkDeps, {
...baseParams,
collectedUsage,
});
expect(mockInsertMany).toHaveBeenCalledTimes(1);
const insertedDocs = mockInsertMany.mock.calls[0][0];
expect(insertedDocs.length).toBe(6); // 2 per entry (prompt + completion)
});
it('should call updateBalance once when balance is enabled', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{ input_tokens: 200, output_tokens: 60, model: 'gpt-4' },
];
await recordCollectedUsage(bulkDeps, {
...baseParams,
balance: { enabled: true },
collectedUsage,
});
expect(mockUpdateBalance).toHaveBeenCalledTimes(1);
expect(mockUpdateBalance).toHaveBeenCalledWith(
expect.objectContaining({
user: 'user-123',
incrementValue: expect.any(Number),
}),
);
});
it('should not call updateBalance when balance is disabled', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
await recordCollectedUsage(bulkDeps, {
...baseParams,
balance: { enabled: false },
collectedUsage,
});
expect(mockInsertMany).toHaveBeenCalledTimes(1);
expect(mockUpdateBalance).not.toHaveBeenCalled();
});
it('should handle cache tokens via bulk path', async () => {
const collectedUsage: UsageMetadata[] = [
{
input_tokens: 100,
output_tokens: 50,
model: 'gpt-4',
input_token_details: { cache_creation: 20, cache_read: 10 },
},
];
const result = await recordCollectedUsage(bulkDeps, {
...baseParams,
collectedUsage,
});
expect(mockInsertMany).toHaveBeenCalledTimes(1);
expect(mockSpendStructuredTokens).not.toHaveBeenCalled();
expect(result).toBeDefined();
});
it('should handle mixed cache and non-cache entries in bulk', async () => {
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
{
input_tokens: 150,
output_tokens: 30,
model: 'gpt-4',
input_token_details: { cache_creation: 10, cache_read: 5 },
},
];
const result = await recordCollectedUsage(bulkDeps, {
...baseParams,
collectedUsage,
});
expect(mockInsertMany).toHaveBeenCalledTimes(1);
expect(mockSpendTokens).not.toHaveBeenCalled();
expect(mockSpendStructuredTokens).not.toHaveBeenCalled();
expect(result?.output_tokens).toBe(80);
});
it('should fall back to legacy path when pricing is missing', async () => {
const legacyDeps: RecordUsageDeps = {
spendTokens: mockSpendTokens,
spendStructuredTokens: mockSpendStructuredTokens,
bulkWriteOps: mockBulkWriteOps,
// no pricing
};
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
await recordCollectedUsage(legacyDeps, {
...baseParams,
collectedUsage,
});
expect(mockSpendTokens).toHaveBeenCalledTimes(1);
expect(mockInsertMany).not.toHaveBeenCalled();
});
it('should fall back to legacy path when bulkWriteOps is missing', async () => {
const legacyDeps: RecordUsageDeps = {
spendTokens: mockSpendTokens,
spendStructuredTokens: mockSpendStructuredTokens,
pricing: mockPricing,
// no bulkWriteOps
};
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
await recordCollectedUsage(legacyDeps, {
...baseParams,
collectedUsage,
});
expect(mockSpendTokens).toHaveBeenCalledTimes(1);
expect(mockInsertMany).not.toHaveBeenCalled();
});
it('should handle errors in bulk write gracefully', async () => {
mockInsertMany.mockRejectedValue(new Error('DB error'));
const collectedUsage: UsageMetadata[] = [
{ input_tokens: 100, output_tokens: 50, model: 'gpt-4' },
];
const result = await recordCollectedUsage(bulkDeps, {
...baseParams,
collectedUsage,
});
expect(result).toEqual({ input_tokens: 100, output_tokens: 50 });
});
});
});
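The single-`insertMany`-plus-single-`updateBalance` behavior asserted above can be sketched as follows. `TxDoc`, `BulkDeps`, and `flushTransactions` are simplified stand-ins for the real `TransactionData`, `BulkWriteDeps`, and `bulkWriteTransactions` in `transactions.ts`; per the PR notes, only balance-enabled docs contribute to the aggregate balance delta.

```typescript
// Sketch only: simplified shapes, not the real TransactionData/BulkWriteDeps types.
interface TxDoc {
  user: string;
  tokenType: 'prompt' | 'completion';
  tokenValue: number;
  balanceEnabled: boolean;
}

interface BulkDeps {
  insertMany: (docs: TxDoc[]) => Promise<void>;
  updateBalance: (args: { user: string; incrementValue: number }) => Promise<void>;
}

// Flush all prepared docs in one insert, then apply one aggregate balance update.
async function flushTransactions(user: string, docs: TxDoc[], deps: BulkDeps): Promise<number> {
  let incrementValue = 0;
  for (const doc of docs) {
    // Only balance-enabled docs are summed into the balance delta.
    if (doc.balanceEnabled) {
      incrementValue += doc.tokenValue;
    }
  }
  await deps.insertMany(docs); // one write for the whole batch
  if (incrementValue !== 0) {
    await deps.updateBalance({ user, incrementValue }); // at most one balance update
  }
  return incrementValue;
}
```

The design point the tests pin down is the call count: however many usage entries arrive, the database sees one insert and at most one balance update instead of a write pair per entry.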


@@ -1,34 +1,20 @@
import { logger } from '@librechat/data-schemas';
import type { TCustomConfig, TTransactionsConfig } from 'librechat-data-provider';
import type { UsageMetadata } from '../stream/interfaces/IJobStore';
import type { EndpointTokenConfig } from '../types/tokens';
interface TokenUsage {
promptTokens?: number;
completionTokens?: number;
}
interface StructuredPromptTokens {
input?: number;
write?: number;
read?: number;
}
interface StructuredTokenUsage {
promptTokens?: StructuredPromptTokens;
completionTokens?: number;
}
interface TxMetadata {
user: string;
model?: string;
context: string;
messageId?: string;
conversationId: string;
balance?: Partial<TCustomConfig['balance']> | null;
transactions?: Partial<TTransactionsConfig>;
endpointTokenConfig?: EndpointTokenConfig;
}
import type {
StructuredTokenUsage,
BulkWriteDeps,
PreparedEntry,
TxMetadata,
TokenUsage,
PricingFns,
} from './transactions';
import type { UsageMetadata } from '~/stream/interfaces/IJobStore';
import type { EndpointTokenConfig } from '~/types/tokens';
import {
prepareStructuredTokenSpend,
bulkWriteTransactions,
prepareTokenSpend,
} from './transactions';
type SpendTokensFn = (txData: TxMetadata, tokenUsage: TokenUsage) => Promise<unknown>;
type SpendStructuredTokensFn = (
@@ -39,6 +25,8 @@ type SpendStructuredTokensFn = (
export interface RecordUsageDeps {
spendTokens: SpendTokensFn;
spendStructuredTokens: SpendStructuredTokensFn;
pricing?: PricingFns;
bulkWriteOps?: BulkWriteDeps;
}
export interface RecordUsageParams {
@@ -61,6 +49,9 @@ export interface RecordUsageResult {
/**
* Records token usage for collected LLM calls and spends tokens against balance.
* This handles both sequential execution (tool calls) and parallel execution (multiple agents).
*
* When `pricing` and `bulkWriteOps` deps are provided, prepares all transaction documents
* in-memory first, then writes them in a single `insertMany` + one `updateBalance` call.
*/
export async function recordCollectedUsage(
deps: RecordUsageDeps,
@@ -78,8 +69,6 @@ export async function recordCollectedUsage(
context = 'message',
} = params;
const { spendTokens, spendStructuredTokens } = deps;
if (!collectedUsage || !collectedUsage.length) {
return;
}
@@ -96,6 +85,11 @@
let total_output_tokens = 0;
const { pricing, bulkWriteOps } = deps;
const useBulk = pricing && bulkWriteOps;
const allDocs: PreparedEntry[] = [];
for (const usage of collectedUsage) {
if (!usage) {
continue;
@@ -121,26 +115,68 @@
model: usage.model ?? model,
};
if (cache_creation > 0 || cache_read > 0) {
spendStructuredTokens(txMetadata, {
promptTokens: {
input: usage.input_tokens,
write: cache_creation,
read: cache_read,
},
completionTokens: usage.output_tokens,
}).catch((err) => {
logger.error('[packages/api #recordCollectedUsage] Error spending structured tokens', err);
});
if (useBulk) {
const entries =
cache_creation > 0 || cache_read > 0
? prepareStructuredTokenSpend(
txMetadata,
{
promptTokens: {
input: usage.input_tokens,
write: cache_creation,
read: cache_read,
},
completionTokens: usage.output_tokens,
},
pricing,
)
: prepareTokenSpend(
txMetadata,
{
promptTokens: usage.input_tokens,
completionTokens: usage.output_tokens,
},
pricing,
);
allDocs.push(...entries);
continue;
}
spendTokens(txMetadata, {
promptTokens: usage.input_tokens,
completionTokens: usage.output_tokens,
}).catch((err) => {
logger.error('[packages/api #recordCollectedUsage] Error spending tokens', err);
});
if (cache_creation > 0 || cache_read > 0) {
deps
.spendStructuredTokens(txMetadata, {
promptTokens: {
input: usage.input_tokens,
write: cache_creation,
read: cache_read,
},
completionTokens: usage.output_tokens,
})
.catch((err) => {
logger.error(
'[packages/api #recordCollectedUsage] Error spending structured tokens',
err,
);
});
continue;
}
deps
.spendTokens(txMetadata, {
promptTokens: usage.input_tokens,
completionTokens: usage.output_tokens,
})
.catch((err) => {
logger.error('[packages/api #recordCollectedUsage] Error spending tokens', err);
});
}
if (useBulk && allDocs.length > 0) {
try {
await bulkWriteTransactions({ user, docs: allDocs }, bulkWriteOps);
} catch (err) {
logger.error('[packages/api #recordCollectedUsage] Error in bulk write', err);
}
}
return {

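The in-memory preparation step described in the JSDoc above can be sketched per entry: a plain usage entry yields one prompt doc and one completion doc (hence the "2 per entry" count in the bulk tests), each priced through the injected multiplier. This is an illustrative sketch — the sign convention and shapes are assumptions, and the real `prepareTokenSpend` also handles `valueKey`, `endpointTokenConfig`, and structured cache entries.

```typescript
// Sketch only: a simplified prepareTokenSpend-style helper.
type TokenType = 'prompt' | 'completion';

interface PreparedDoc {
  tokenType: TokenType;
  rawAmount: number;
  tokenValue: number;
}

function prepareEntry(
  usage: { promptTokens: number; completionTokens: number },
  getMultiplier: (args: { tokenType: TokenType }) => number,
): PreparedDoc[] {
  const pairs: Array<[TokenType, number]> = [
    ['prompt', usage.promptTokens],
    ['completion', usage.completionTokens],
  ];
  return pairs.map(([tokenType, amount]) => {
    const rawAmount = -Math.abs(amount); // spending is recorded as a negative amount
    return { tokenType, rawAmount, tokenValue: rawAmount * getMultiplier({ tokenType }) };
  });
}
```

Each entry in `collectedUsage` is mapped this way into the shared `allDocs` array, which is flushed once at the end of the loop.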

@@ -872,9 +872,8 @@ describe('getOpenAIConfig', () => {
modelOptions,
});
// OpenRouter reasoning object should only include effort, not summary
expect(result.llmConfig.reasoning).toEqual({
effort: ReasoningEffort.high,
expect(result.llmConfig.modelKwargs).toMatchObject({
reasoning: { effort: ReasoningEffort.high },
});
expect(result.llmConfig.include_reasoning).toBeUndefined();
expect(result.provider).toBe('openrouter');
@@ -1206,13 +1205,13 @@ describe('getOpenAIConfig', () => {
model: 'gpt-4-turbo',
temperature: 0.8,
streaming: false,
reasoning: { effort: ReasoningEffort.high }, // OpenRouter reasoning object
});
expect(result.llmConfig.include_reasoning).toBeUndefined();
// Should NOT have useResponsesApi for OpenRouter
expect(result.llmConfig.useResponsesApi).toBeUndefined();
expect(result.llmConfig.maxTokens).toBe(2000);
expect(result.llmConfig.modelKwargs).toEqual({
reasoning: { effort: ReasoningEffort.high },
verbosity: Verbosity.medium,
customParam: 'custom-value',
plugins: [{ id: 'web' }], // OpenRouter web search format
@@ -1482,13 +1481,11 @@ describe('getOpenAIConfig', () => {
user: 'openrouter-user',
temperature: 0.7,
maxTokens: 4000,
reasoning: {
effort: ReasoningEffort.high,
},
apiKey: apiKey,
});
expect(result.llmConfig.include_reasoning).toBeUndefined();
expect(result.llmConfig.modelKwargs).toMatchObject({
reasoning: { effort: ReasoningEffort.high },
top_k: 50,
repetition_penalty: 1.1,
});


@@ -393,7 +393,9 @@ describe('getOpenAILLMConfig', () => {
},
});
expect(result.llmConfig).toHaveProperty('reasoning', { effort: ReasoningEffort.high });
expect(result.llmConfig.modelKwargs).toHaveProperty('reasoning', {
effort: ReasoningEffort.high,
});
expect(result.llmConfig).not.toHaveProperty('include_reasoning');
expect(result.llmConfig.modelKwargs).toHaveProperty('plugins', [{ id: 'web' }]);
});
@@ -617,7 +619,9 @@ describe('getOpenAILLMConfig', () => {
},
});
expect(result.llmConfig).toHaveProperty('reasoning', { effort: ReasoningEffort.high });
expect(result.llmConfig.modelKwargs).toHaveProperty('reasoning', {
effort: ReasoningEffort.high,
});
expect(result.llmConfig).not.toHaveProperty('include_reasoning');
expect(result.llmConfig).not.toHaveProperty('reasoning_effort');
});
@@ -634,7 +638,9 @@ describe('getOpenAILLMConfig', () => {
},
});
expect(result.llmConfig).toHaveProperty('reasoning', { effort: ReasoningEffort.high });
expect(result.llmConfig.modelKwargs).toHaveProperty('reasoning', {
effort: ReasoningEffort.high,
});
});
it.each([ReasoningEffort.xhigh, ReasoningEffort.minimal, ReasoningEffort.none])(
@@ -650,7 +656,7 @@ describe('getOpenAILLMConfig', () => {
},
});
expect(result.llmConfig).toHaveProperty('reasoning', { effort });
expect(result.llmConfig.modelKwargs).toHaveProperty('reasoning', { effort });
expect(result.llmConfig).not.toHaveProperty('include_reasoning');
},
);

View file
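The mapping these reasoning-effort assertions pin down can be condensed into a small sketch. The config shape is simplified and the function name is hypothetical; the real code in `getOpenAILLMConfig` also tracks `hasModelKwargs` and applies this branch only for OpenRouter.

```typescript
// Sketch only: simplified stand-in for the OpenRouter branch of getOpenAILLMConfig.
interface ConfigSketch {
  include_reasoning?: boolean;
  modelKwargs?: Record<string, unknown>;
}

function applyOpenRouterReasoning(config: ConfigSketch, reasoningEffort?: string): ConfigSketch {
  if (reasoningEffort) {
    // Explicit effort goes under modelKwargs.reasoning; include_reasoning is
    // intentionally omitted because it is legacy compat for the no-effort case.
    config.modelKwargs = { ...(config.modelKwargs ?? {}), reasoning: { effort: reasoningEffort } };
  } else {
    // No explicit effort: fall back to the legacy flag for reasoning-token inclusion.
    config.include_reasoning = true;
  }
  return config;
}
```

This is why the updated tests assert on `llmConfig.modelKwargs.reasoning` rather than a top-level `llmConfig.reasoning`, and why `include_reasoning` must be undefined whenever an effort is set.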

@ -1,7 +1,6 @@
import { EModelEndpoint, removeNullishValues } from 'librechat-data-provider';
import type { BindToolsInput } from '@langchain/core/language_models/chat_models';
import type { SettingDefinition } from 'librechat-data-provider';
import type { OpenRouterReasoning } from '@librechat/agents';
import type { AzureOpenAIInput } from '@langchain/openai';
import type { OpenAI } from 'openai';
import type * as t from '~/types';
@@ -231,7 +230,8 @@ export function getOpenAILLMConfig({
* `include_reasoning` is legacy compat that maps to `{ enabled: true }` only when
* no `reasoning` object is present, so we intentionally omit it here.
*/
llmConfig.reasoning = { effort: reasoning_effort } as OpenRouterReasoning;
modelKwargs.reasoning = { effort: reasoning_effort };
hasModelKwargs = true;
} else {
/** No explicit effort; fall back to legacy `include_reasoning` for reasoning token inclusion */
llmConfig.include_reasoning = true;