💰 fix: Multi-Agent Token Spending & Prevent Double-Spend (#11433)

* fix: Token Spending Logic for Multi-Agents on Abort Scenarios

* Implemented logic to skip token spending if a conversation is aborted, preventing double-spending.
* Introduced `spendCollectedUsage` function to handle token spending for multiple models during aborts, ensuring accurate accounting for parallel agents.
* Updated `GenerationJobManager` to store and retrieve collected usage data for improved abort handling.
* Added comprehensive tests for the new functionality, covering various scenarios including cache token handling and parallel agent usage.
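The abort guard described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `finishCompletion` and `recordUsage` are hypothetical names, and the only behavior assumed is that the normal completion path skips spending once the abort signal has fired, leaving the spend to the abort handler.

```javascript
// Hypothetical sketch: finishCompletion and recordUsage are illustrative names,
// not functions from this commit.
function finishCompletion({ abortController, collectedUsage, recordUsage }) {
  // If the request was aborted, the abort handler is assumed to own the spend,
  // so the normal completion path must not spend a second time.
  const wasAborted = abortController?.signal?.aborted === true;
  if (wasAborted) {
    return { spent: false };
  }
  recordUsage(collectedUsage);
  return { spent: true };
}

// A stand-in ledger so the effect of each path is observable.
const ledger = [];
const recordUsage = (usage) => ledger.push(...usage);

// Normal completion: usage is recorded once.
const normalResult = finishCompletion({
  abortController: new AbortController(),
  collectedUsage: [{ model: 'model-a', input_tokens: 10, output_tokens: 5 }],
  recordUsage,
});

// Aborted completion: spending is skipped on this path.
const abortedController = new AbortController();
abortedController.abort();
const abortedResult = finishCompletion({
  abortController: abortedController,
  collectedUsage: [{ model: 'model-b', input_tokens: 7, output_tokens: 3 }],
  recordUsage,
});

console.log(normalResult.spent, abortedResult.spent, ledger.length);
```

Checking the signal at the point of spending, rather than earlier, keeps the decision close to the side effect it guards.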

* fix: Memory Context Handling for Multi-Agents

* Refactored `buildMessages` method to pass memory context to parallel agents, ensuring they share the same user context.
* Improved handling of memory context when no existing instructions are present for parallel agents.
* Added comprehensive tests to verify memory context propagation and behavior under various scenarios, including cases with no memory available and empty agent configurations.
* Enhanced logging for better traceability of memory context additions to agents.
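A rough sketch of the propagation step, assuming the parallel agent configs live in a `Map` keyed by agent ID. `addMemoryContext` is a hypothetical helper name used only for illustration; the point is that the config objects are mutated in place, so any other holder of the same references sees the updated instructions.

```javascript
// Hypothetical helper, for illustration only: appends memory context to every
// parallel agent config, mutating each config object in place.
function addMemoryContext(agentConfigs, memoryContext) {
  if (!memoryContext || !agentConfigs || agentConfigs.size === 0) {
    return 0;
  }
  let updated = 0;
  for (const agentConfig of agentConfigs.values()) {
    agentConfig.instructions = agentConfig.instructions
      ? `${agentConfig.instructions}\n\n${memoryContext}`
      : memoryContext;
    updated += 1;
  }
  return updated;
}

const agentConfigs = new Map([
  ['agent_a', { instructions: 'Be concise.' }],
  ['agent_b', {}], // no existing instructions
]);

// A reference taken before the update, standing in for whatever consumes the configs later.
const heldReference = agentConfigs.get('agent_b');

const updatedCount = addMemoryContext(agentConfigs, '# Existing memory about the user:\nLikes tea');
const skippedCount = addMemoryContext(new Map(), 'unused');

console.log(updatedCount, skippedCount, heldReference.instructions);
```

Because the objects are mutated rather than replaced, `heldReference` reflects the new instructions without the `Map` being touched again.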

* chore: Memory Context Documentation for Parallel Agents

* Updated documentation in the `AgentClient` class to clarify the in-place mutation of agentConfig objects when passing memory context to parallel agents.
* Added notes on the implications of mutating objects directly to ensure all parallel agents receive the correct memory context before execution.

* chore: UsageMetadata Interface docs for Token Spending

* Expanded the UsageMetadata interface to support both OpenAI and Anthropic cache token formats.
* Added detailed documentation for cache token properties, including mutually exclusive fields for different model types.
* Improved clarity on how to access cache token details for accurate token spending tracking.
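As an illustration of reading cache tokens from two mutually exclusive shapes, the sketch below assumes one nested format (`input_token_details.cache_creation` / `cache_read`) and one flat format (`cache_creation_input_tokens` / `cache_read_input_tokens`). These field names and the helper `getCacheTokens` are assumptions for the example; the interface itself is not shown in this excerpt, so check it before relying on them.

```javascript
// Hypothetical helper: returns cache token counts from whichever shape is present.
// All field names here are assumptions, not taken from this commit excerpt.
function getCacheTokens(usage) {
  const details = usage?.input_token_details ?? {};
  // Number(undefined) is NaN, which is falsy, so a missing field falls through
  // to the alternative format and finally to 0.
  const creation =
    Number(details.cache_creation) || Number(usage?.cache_creation_input_tokens) || 0;
  const read = Number(details.cache_read) || Number(usage?.cache_read_input_tokens) || 0;
  return { creation, read };
}

const nestedUsage = {
  input_tokens: 100,
  output_tokens: 50,
  input_token_details: { cache_creation: 10, cache_read: 5 },
};
const flatUsage = {
  input_tokens: 100,
  output_tokens: 50,
  cache_creation_input_tokens: 20,
  cache_read_input_tokens: 8,
};
const plainUsage = { input_tokens: 100, output_tokens: 50 };

console.log(getCacheTokens(nestedUsage)); // { creation: 10, read: 5 }
console.log(getCacheTokens(flatUsage)); // { creation: 20, read: 8 }
console.log(getCacheTokens(plainUsage)); // { creation: 0, read: 0 }
```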

* fix: Enhance Token Spending Logic in Abort Middleware

* Refactored `spendCollectedUsage` function to utilize Promise.all for concurrent token spending, improving performance and ensuring all operations complete before clearing the collectedUsage array.
* Added documentation to clarify the importance of clearing the collectedUsage array to prevent double-spending in abort scenarios.
* Updated tests to verify the correct behavior of the spending logic and the clearing of the array after spending operations.
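The spend-then-clear pattern can be sketched like this. `spendAll` and `spendOne` are hypothetical names, and the sketch assumes only what the message states: all spend operations run concurrently via `Promise.all`, and the shared array is emptied afterwards so that a later caller holding the same array finds nothing left to spend.

```javascript
// Hypothetical sketch: spendAll and spendOne are illustrative names.
async function spendAll(collectedUsage, spendOne) {
  if (!Array.isArray(collectedUsage) || collectedUsage.length === 0) {
    return 0;
  }
  // Run every spend concurrently and wait for all of them to finish.
  const results = await Promise.all(collectedUsage.map((usage) => spendOne(usage)));
  // Clear in place (not by reassignment) so other holders of this array see it empty.
  collectedUsage.length = 0;
  return results.length;
}

async function demo() {
  const spent = [];
  const spendOne = async (usage) => {
    spent.push(usage.model);
  };
  const collectedUsage = [
    { model: 'model-a', input_tokens: 10, output_tokens: 5 },
    { model: 'model-b', input_tokens: 7, output_tokens: 3 },
  ];
  const first = await spendAll(collectedUsage, spendOne);
  // A second pass over the same array, as a competing code path might attempt.
  const second = await spendAll(collectedUsage, spendOne);
  return { first, second, spent, remaining: collectedUsage.length };
}

demo().then((result) => console.log(result));
```

Note that `Promise.all` rejects as soon as any spend fails, in which case this sketch leaves the array uncleared; whether the actual implementation handles failures differently is not stated here.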
Danny Avila 2026-01-20 14:43:19 -05:00 committed by GitHub
parent 32e6f3b8e5
commit 36c5a88c4e
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
11 changed files with 1440 additions and 28 deletions

@@ -522,14 +522,36 @@ class AgentClient extends BaseClient {
     }
     const withoutKeys = await this.useMemory();
-    if (withoutKeys) {
-      systemContent += `${memoryInstructions}\n\n# Existing memory about the user:\n${withoutKeys}`;
+    const memoryContext = withoutKeys
+      ? `${memoryInstructions}\n\n# Existing memory about the user:\n${withoutKeys}`
+      : '';
+    if (memoryContext) {
+      systemContent += memoryContext;
     }
     if (systemContent) {
       this.options.agent.instructions = systemContent;
     }
+    /**
+     * Pass memory context to parallel agents (addedConvo) so they have the same user context.
+     *
+     * NOTE: This intentionally mutates the agentConfig objects in place. The agentConfigs Map
+     * holds references to config objects that will be passed to the graph runtime. Mutating
+     * them here ensures all parallel agents receive the memory context before execution starts.
+     * Creating new objects would not work because the Map references would still point to the old objects.
+     */
+    if (memoryContext && this.agentConfigs?.size > 0) {
+      for (const [agentId, agentConfig] of this.agentConfigs.entries()) {
+        if (agentConfig.instructions) {
+          agentConfig.instructions = agentConfig.instructions + '\n\n' + memoryContext;
+        } else {
+          agentConfig.instructions = memoryContext;
+        }
+        logger.debug(`[AgentClient] Added memory context to parallel agent: ${agentId}`);
+      }
+    }
     return result;
   }
@@ -1084,11 +1106,20 @@ class AgentClient extends BaseClient {
           this.artifactPromises.push(...attachments);
         }
-        await this.recordCollectedUsage({
-          context: 'message',
-          balance: balanceConfig,
-          transactions: transactionsConfig,
-        });
+        /** Skip token spending if aborted - the abort handler (abortMiddleware.js) handles it
+         This prevents double-spending when user aborts via `/api/agents/chat/abort` */
+        const wasAborted = abortController?.signal?.aborted;
+        if (!wasAborted) {
+          await this.recordCollectedUsage({
+            context: 'message',
+            balance: balanceConfig,
+            transactions: transactionsConfig,
+          });
+        } else {
+          logger.debug(
+            '[api/server/controllers/agents/client.js #chatCompletion] Skipping token spending - handled by abort middleware',
+          );
+        }
       } catch (err) {
         logger.error(
           '[api/server/controllers/agents/client.js #chatCompletion] Error in cleanup phase',