🏁 fix: Message Race Condition if Cancelled Early (#11462)

* 🔧 fix: Prevent race conditions in message saving during abort scenarios

* Added logic to save partial responses before returning from the abort endpoint, ensuring that the parentMessageId referenced by a follow-up message already exists in the database.
* Updated the ResumableAgentController to save response messages before emitting final events, preventing orphaned parentMessageIds.
* Responses saved after an early abort are now flagged as unfinished, improving stability and data integrity in agent interactions.
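
The save-before-emit ordering can be illustrated with a small in-memory sketch. It is not the controller's code: createStore, finishResponse and the emit callback are hypothetical stand-ins for saveMessage and GenerationJobManager.emitDone, and the 10 ms timeout only simulates an asynchronous database write. Because the save is awaited before the final event goes out, a client that reacts to that event by sending a follow-up finds its parent message already persisted.

```javascript
// Hypothetical in-memory message store standing in for the database.
function createStore() {
  const messages = new Map();
  return {
    async save(message) {
      // Simulate an asynchronous database write.
      await new Promise((resolve) => setTimeout(resolve, 10));
      messages.set(message.messageId, message);
    },
    has(messageId) {
      return messages.has(messageId);
    },
  };
}

// Persist the response first, then emit the final event.
async function finishResponse(store, response, emit, wasAbortedBeforeComplete) {
  await store.save({ ...response, unfinished: wasAbortedBeforeComplete });
  emit({ final: true, responseMessage: response });
}

// Simulates a client that sends a follow-up as soon as it sees the final event.
async function demo() {
  const store = createStore();
  let parentFound = null;
  const emit = (event) => {
    // The follow-up would use the response as its parent message.
    parentFound = store.has(event.responseMessage.messageId);
  };
  await finishResponse(store, { messageId: 'resp-1', text: 'partial' }, emit, true);
  return parentFound;
}

demo().then((found) => console.log('parent exists at emit time:', found));
```

Swapping the two statements inside finishResponse makes demo() resolve to false, which is the orphaned-parentMessageId case the commit fixes.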

* 🔧 fix: logging and job replacement handling in ResumableAgentController

* Added detailed logging for job creation and final event emissions to improve traceability.
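* Implemented a job replacement check before emitting the final event: the controller compares the createdAt timestamp captured when its job was created with that of the job currently registered for the stream, and if the job was replaced or removed, the stale request skips emitting so it cannot affect the newer job.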
* Updated abort handling to log additional context about the abort result, enhancing debugging capabilities.
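
A minimal sketch of the replacement check, assuming a Map-backed job registry. JobRegistry is hypothetical and is not GenerationJobManager's real implementation; it also uses a monotonic counter in place of wall-clock time for createdAt, a choice made here so that two jobs created in the same millisecond still receive distinct stamps.

```javascript
// Hypothetical registry keyed by stream ID; creating a new job for the
// same stream replaces the previous one.
class JobRegistry {
  constructor() {
    this.jobs = new Map();
    this.clock = 0; // monotonic stamp used instead of Date.now()
  }

  createJob(streamId) {
    const job = { streamId, createdAt: ++this.clock };
    this.jobs.set(streamId, job);
    return job;
  }

  getJob(streamId) {
    return this.jobs.get(streamId);
  }
}

// True when the job created earlier is no longer the current one.
function jobWasReplaced(registry, streamId, jobCreatedAt) {
  const currentJob = registry.getJob(streamId);
  return !currentJob || currentJob.createdAt !== jobCreatedAt;
}

const registry = new JobRegistry();
const first = registry.createJob('stream-1');
const firstCreatedAt = first.createdAt; // captured at creation time

console.log(jobWasReplaced(registry, 'stream-1', firstCreatedAt)); // false

// A newer request reuses the same stream ID.
registry.createJob('stream-1');
console.log(jobWasReplaced(registry, 'stream-1', firstCreatedAt)); // true

// The job was removed entirely.
registry.jobs.delete('stream-1');
console.log(jobWasReplaced(registry, 'stream-1', firstCreatedAt)); // true
```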

* refactor: abort handling and token spending logic in AgentStream

* Added authorization check for abort attempts to prevent unauthorized access.
* Improved response message saving logic to ensure valid message IDs are stored.
* Implemented token spending for aborted requests, guarding against double-spending across parallel agents.
* Enhanced logging for better traceability of token spending operations during abort scenarios.
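
The commit does not show how AgentStream performs the ownership check or deduplicates spending, so the following is only a sketch of one common approach. AbortService, spendOnce, and the usage-entry IDs are hypothetical: the abort is rejected unless the requesting user owns the job, and each usage entry is charged at most once by recording the IDs already spent.

```javascript
// Hypothetical abort handler: only the job's owner may abort it, and
// each usage entry is charged at most once.
class AbortService {
  constructor() {
    this.jobs = new Map(); // streamId -> { userId, usage: [{ id, tokens }] }
    this.spentUsageIds = new Set(); // usage entries already charged
    this.balances = new Map(); // userId -> tokens charged so far
  }

  abort(streamId, requestingUserId) {
    const job = this.jobs.get(streamId);
    if (!job) {
      return { status: 404 };
    }
    // Authorization: reject attempts to abort another user's job.
    if (job.userId !== requestingUserId) {
      return { status: 403 };
    }
    const charged = this.spendOnce(job.userId, job.usage);
    this.jobs.delete(streamId);
    return { status: 200, charged };
  }

  // Charge each usage entry once, even if more than one agent reports it.
  spendOnce(userId, usage) {
    let charged = 0;
    for (const entry of usage) {
      if (this.spentUsageIds.has(entry.id)) continue;
      this.spentUsageIds.add(entry.id);
      charged += entry.tokens;
    }
    this.balances.set(userId, (this.balances.get(userId) ?? 0) + charged);
    return charged;
  }
}

const service = new AbortService();
service.jobs.set('stream-1', {
  userId: 'alice',
  usage: [
    { id: 'agent-a:run-1', tokens: 120 },
    { id: 'agent-b:run-1', tokens: 80 },
    { id: 'agent-a:run-1', tokens: 120 }, // duplicate report of the same entry
  ],
});

console.log(service.abort('stream-1', 'mallory')); // { status: 403 }
console.log(service.abort('stream-1', 'alice')); // { status: 200, charged: 200 }
```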

* refactor: remove TODO comments for token spending in abort handling

* Removed outdated TODO comments regarding token spending for aborted requests in the abort endpoint.
* This change streamlines the code and clarifies the current implementation status.

* test: Add comprehensive tests for job replacement and abort handling

* Introduced unit tests for job replacement detection in ResumableAgentController, covering job creation timestamp tracking, stale job detection, and response message saving order.
* Added tests for the agent abort endpoint, ensuring proper authorization checks, early abort handling, and partial response saving.
* Enhanced logging and error handling in tests to improve traceability and robustness of the abort functionality.
Danny Avila 2026-01-21 13:57:12 -05:00 committed by GitHub
parent dea246934e
commit 11210d8b98
GPG key ID: B5690EEEBB952194
4 changed files with 682 additions and 12 deletions


@@ -67,7 +67,15 @@ const ResumableAgentController = async (req, res, next, initializeClient, addTit
let client = null;
try {
logger.debug(`[ResumableAgentController] Creating job`, {
streamId,
conversationId,
reqConversationId,
userId,
});
const job = await GenerationJobManager.createJob(streamId, userId, conversationId);
const jobCreatedAt = job.createdAt; // Capture creation time to detect job replacement
req._resumableStreamId = streamId;
// Send JSON response IMMEDIATELY so client can connect to SSE stream
@@ -272,6 +280,33 @@ const ResumableAgentController = async (req, res, next, initializeClient, addTit
});
}
// CRITICAL: Save response message BEFORE emitting final event.
// This prevents race conditions where the client sends a follow-up message
// before the response is saved to the database, causing orphaned parentMessageIds.
if (client.savedMessageIds && !client.savedMessageIds.has(messageId)) {
await saveMessage(
req,
{ ...response, user: userId, unfinished: wasAbortedBeforeComplete },
{ context: 'api/server/controllers/agents/request.js - resumable response end' },
);
}
// Check if our job was replaced by a new request before emitting
// This prevents stale requests from emitting events to newer jobs
const currentJob = await GenerationJobManager.getJob(streamId);
const jobWasReplaced = !currentJob || currentJob.createdAt !== jobCreatedAt;
if (jobWasReplaced) {
logger.debug(`[ResumableAgentController] Skipping FINAL emit - job was replaced`, {
streamId,
originalCreatedAt: jobCreatedAt,
currentCreatedAt: currentJob?.createdAt,
});
// Still decrement pending request since we incremented at start
await decrementPendingRequest(userId);
return;
}
if (!wasAbortedBeforeComplete) {
const finalEvent = {
final: true,
@@ -281,26 +316,34 @@ const ResumableAgentController = async (req, res, next, initializeClient, addTit
responseMessage: { ...response },
};
logger.debug(`[ResumableAgentController] Emitting FINAL event`, {
streamId,
wasAbortedBeforeComplete,
userMessageId: userMessage?.messageId,
responseMessageId: response?.messageId,
conversationId: conversation?.conversationId,
});
GenerationJobManager.emitDone(streamId, finalEvent);
GenerationJobManager.completeJob(streamId);
await decrementPendingRequest(userId);
if (client.savedMessageIds && !client.savedMessageIds.has(messageId)) {
await saveMessage(
req,
{ ...response, user: userId },
{ context: 'api/server/controllers/agents/request.js - resumable response end' },
);
}
} else {
const finalEvent = {
final: true,
conversation,
title: conversation.title,
requestMessage: sanitizeMessageForTransmit(userMessage),
responseMessage: { ...response, error: true },
error: { message: 'Request was aborted' },
responseMessage: { ...response, unfinished: true },
};
logger.debug(`[ResumableAgentController] Emitting ABORTED FINAL event`, {
streamId,
wasAbortedBeforeComplete,
userMessageId: userMessage?.messageId,
responseMessageId: response?.messageId,
conversationId: conversation?.conversationId,
});
GenerationJobManager.emitDone(streamId, finalEvent);
GenerationJobManager.completeJob(streamId, 'Request aborted');
await decrementPendingRequest(userId);