🌊 fix: Prevent Buffered Event Duplication on SSE Resume Connections (#12225)

* fix: skipBufferReplay for job resume connections

- Introduced a new option `skipBufferReplay` in the `subscribe` method of `GenerationJobManagerClass` to prevent duplication of events when resuming a connection.
- Updated the logic to conditionally skip replaying buffered events when a sync event has already been sent, preventing the same events from being delivered twice during reconnections.
- Added integration tests to verify the correct behavior of the new option, ensuring that no buffered events are replayed when `skipBufferReplay` is true, while still allowing for normal replay behavior when false.
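The buffered-replay behavior above can be sketched with a minimal in-memory pub/sub. This is a hypothetical sketch: the real `GenerationJobManagerClass` keeps far more state (run steps, Redis-backed buffers), and its `subscribe` signature differs in detail.

```javascript
// Hypothetical minimal sketch of the skipBufferReplay option.
class EventBufferSketch {
  constructor() {
    this.buffer = []; // events emitted before a client attaches
    this.listeners = new Set();
  }

  publish(event) {
    this.buffer.push(event);
    for (const listener of this.listeners) listener(event);
  }

  subscribe(onEvent, { skipBufferReplay = false } = {}) {
    // A resume connection already received buffered content via the sync
    // event, so replaying the buffer here would duplicate those events.
    if (!skipBufferReplay) {
      for (const event of this.buffer) onEvent(event);
    }
    this.listeners.add(onEvent);
    return () => this.listeners.delete(onEvent);
  }
}
```

A fresh subscriber gets the buffer replayed and then live events; a resume subscriber gets live events only.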

* refactor: Update GenerationJobManager to handle sync events more efficiently

- Modified the `subscribe` method to utilize a new `skipBufferReplay` option, allowing for the prevention of duplicate events during resume connections.
- Enhanced the logic in the `chat/stream` route to conditionally skip replaying buffered events when a sync event has already been sent, preventing duplicate delivery on resume.
- Updated integration tests to verify the correct behavior of the new option, ensuring that no buffered events are replayed when `skipBufferReplay` is true, while maintaining normal replay behavior when false.

* test: Enhance GenerationJobManager integration tests for Redis mode

- Updated integration tests to conditionally run based on the USE_REDIS environment variable, allowing for better control over Redis-related tests.
- Refactored test descriptions to utilize a dynamic `describeRedis` function, improving clarity and organization of tests related to Redis functionality.
- Removed redundant checks for Redis availability within individual tests, streamlining the test logic and enhancing readability.
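The gating pattern described above is commonly written as a one-line ternary over Jest's `describe`/`describe.skip`. This is a sketch, not the actual test file; the standalone fallback stub is only there so the snippet loads outside a Jest runner.

```javascript
// Sketch of a USE_REDIS gate for Jest suites. Outside a Jest runner the
// globals are missing, so fall back to no-op stubs to keep this loadable.
const jestDescribe =
  typeof describe === 'function'
    ? describe
    : Object.assign(() => {}, { skip: () => {} });

// describe.skip registers the suite but skips every test inside it, so the
// Redis-only tests no longer need per-test availability checks.
const describeRedis = process.env.USE_REDIS === 'true' ? jestDescribe : jestDescribe.skip;

describeRedis('GenerationJobManager (Redis mode)', () => {
  // Redis-backed resume/replay tests go here.
});
```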

* fix: sync handler state for new messages on resume

The sync event's else branch (new response message) was missing
resetContentHandler() and syncStepMessage() calls, leaving stale
handler state that caused subsequent deltas to build on partial
content instead of the synced aggregatedContent.
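A hedged sketch of the fix: `resetContentHandler` and `syncStepMessage` are the calls named in the commit, but the `state`/`sync` shapes below are assumptions for illustration, not LibreChat's actual types.

```javascript
// Sketch of the resume-sync fix; state/sync shapes are assumed.
function resetContentHandler(state) {
  state.parts = []; // drop stale partial content from the previous message
}

function syncStepMessage(state, sync) {
  state.messageId = sync.messageId; // rebind step handlers to the new message
}

function handleSync(state, sync) {
  if (sync.messageId !== state.messageId) {
    // New response message: without these two calls, later deltas were
    // appended to the old message's partial content instead of the
    // synced aggregatedContent.
    resetContentHandler(state);
    syncStepMessage(state, sync);
  }
  // Adopt the server's authoritative content in either branch.
  state.parts = [...sync.aggregatedContent];
}
```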

* feat: atomic subscribeWithResume to close resume event gap

Replaces separate getResumeState() + subscribe() calls with a single
subscribeWithResume() that atomically drains earlyEventBuffer between
the resume snapshot and the subscribe. In in-memory mode, drained events
are returned as pendingEvents for the client to replay after sync.
In Redis mode, pendingEvents is empty since chunks are already persisted.

The route handler now uses the atomic method for resume connections and
extracted shared SSE write helpers to reduce duplication. The client
replays any pendingEvents through the existing step/content handlers
after applying aggregatedContent from the sync payload.
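On the client side, the replay step described above might look like the following. The handler names are assumptions for illustration, not LibreChat's actual client API.

```javascript
// Hypothetical sketch of the client-side resume path: apply the sync
// snapshot first, then replay only the gap events through normal handlers.
function applySyncPayload({ resumeState, pendingEvents = [] }, handlers) {
  // 1) The snapshot already contains everything emitted before it was taken.
  handlers.setAggregatedContent(resumeState.aggregatedContent);
  // 2) pendingEvents covers only the window between snapshot and subscribe,
  //    so replaying them cannot duplicate snapshot content.
  for (const event of pendingEvents) {
    handlers.onStreamEvent(event);
  }
}
```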

* fix: only capture gap events in subscribeWithResume, not pre-snapshot buffer

The previous implementation drained the entire earlyEventBuffer into
pendingEvents, but pre-snapshot events are already reflected in
aggregatedContent. Replaying them re-introduced the duplication bug
through a different vector.

Now records buffer length before getResumeState() and slices from that
index, so only events arriving during the async gap are returned as
pendingEvents.

Also:
- Handle pendingEvents when resumeState is null (replay directly)
- Hoist duplicate test helpers to shared scope
- Remove redundant writableEnded guard in onDone
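The before/after-snapshot bookkeeping can be sketched with a toy in-memory manager. The shapes here are assumed; the real `subscribeWithResume` also returns the subscription itself and behaves differently in Redis mode.

```javascript
// Toy sketch of gap-only capture in subscribeWithResume.
class ResumeManagerSketch {
  constructor() {
    this.earlyEventBuffer = [];
    this.subscriber = null;
  }

  emit(event) {
    if (this.subscriber) this.subscriber(event);
    else this.earlyEventBuffer.push(event); // no subscriber yet: buffer it
  }

  async getResumeState() {
    // The snapshot is taken synchronously, but the await opens a gap during
    // which new events can land in earlyEventBuffer.
    const aggregatedContent = this.earlyEventBuffer.map((e) => e.text).join('');
    await new Promise((resolve) => setImmediate(resolve));
    return { aggregatedContent };
  }

  async subscribeWithResume(onEvent) {
    // Record the buffer length BEFORE the async snapshot: everything up to
    // this index is already reflected in aggregatedContent.
    const preSnapshotLength = this.earlyEventBuffer.length;
    const resumeState = await this.getResumeState();
    // Only events that arrived during the async gap need replaying.
    const pendingEvents = this.earlyEventBuffer.slice(preSnapshotLength);
    this.subscriber = onEvent;
    return { resumeState, pendingEvents };
  }
}
```

Slicing from the pre-snapshot index is what closes the duplication vector: pre-snapshot events stay in `aggregatedContent` only, gap events go to `pendingEvents` only.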
Danny Avila 2026-03-14 10:54:26 -04:00 committed by GitHub
parent cbdc6f6060
commit 7bc793b18d
5 changed files with 700 additions and 259 deletions


```diff
@@ -76,52 +76,62 @@ router.get('/chat/stream/:streamId', async (req, res) => {
   logger.debug(`[AgentStream] Client subscribed to ${streamId}, resume: ${isResume}`);
-  // Send sync event with resume state for ALL reconnecting clients
-  // This supports multi-tab scenarios where each tab needs run step data
-  if (isResume) {
-    const resumeState = await GenerationJobManager.getResumeState(streamId);
-    if (resumeState && !res.writableEnded) {
-      // Send sync event with run steps AND aggregatedContent
-      // Client will use aggregatedContent to initialize message state
-      res.write(`event: message\ndata: ${JSON.stringify({ sync: true, resumeState })}\n\n`);
-      if (typeof res.flush === 'function') {
-        res.flush();
-      }
-      logger.debug(
-        `[AgentStream] Sent sync event for ${streamId} with ${resumeState.runSteps.length} run steps`,
-      );
-    }
-  }
-  const result = await GenerationJobManager.subscribe(
-    streamId,
-    (event) => {
-      if (!res.writableEnded) {
-        res.write(`event: message\ndata: ${JSON.stringify(event)}\n\n`);
-        if (typeof res.flush === 'function') {
-          res.flush();
-        }
-      }
-    },
-    (event) => {
-      if (!res.writableEnded) {
-        res.write(`event: message\ndata: ${JSON.stringify(event)}\n\n`);
-        if (typeof res.flush === 'function') {
-          res.flush();
-        }
-        res.end();
-      }
-    },
-    (error) => {
-      if (!res.writableEnded) {
-        res.write(`event: error\ndata: ${JSON.stringify({ error })}\n\n`);
-        if (typeof res.flush === 'function') {
-          res.flush();
-        }
-        res.end();
-      }
-    },
-  );
+  const writeEvent = (event) => {
+    if (!res.writableEnded) {
+      res.write(`event: message\ndata: ${JSON.stringify(event)}\n\n`);
+      if (typeof res.flush === 'function') {
+        res.flush();
+      }
+    }
+  };
+  const onDone = (event) => {
+    writeEvent(event);
+    res.end();
+  };
+  const onError = (error) => {
+    if (!res.writableEnded) {
+      res.write(`event: error\ndata: ${JSON.stringify({ error })}\n\n`);
+      if (typeof res.flush === 'function') {
+        res.flush();
+      }
+      res.end();
+    }
+  };
+  let result;
+  if (isResume) {
+    const { subscription, resumeState, pendingEvents } =
+      await GenerationJobManager.subscribeWithResume(streamId, writeEvent, onDone, onError);
+    if (!res.writableEnded) {
+      if (resumeState) {
+        res.write(
+          `event: message\ndata: ${JSON.stringify({ sync: true, resumeState, pendingEvents })}\n\n`,
+        );
+        if (typeof res.flush === 'function') {
+          res.flush();
+        }
+        GenerationJobManager.markSyncSent(streamId);
+        logger.debug(
+          `[AgentStream] Sent sync event for ${streamId} with ${resumeState.runSteps.length} run steps, ${pendingEvents.length} pending events`,
+        );
+      } else if (pendingEvents.length > 0) {
+        for (const event of pendingEvents) {
+          writeEvent(event);
+        }
+        logger.warn(
+          `[AgentStream] Resume state null for ${streamId}, replayed ${pendingEvents.length} gap events directly`,
+        );
+      }
+    }
+    result = subscription;
+  } else {
+    result = await GenerationJobManager.subscribe(streamId, writeEvent, onDone, onError);
+  }
   if (!result) {
     return res.status(404).json({ error: 'Failed to subscribe to stream' });
```