const { v4: uuidv4 } = require('uuid');
const { logger, scopedCacheKey } = require('@librechat/data-schemas');
const { EModelEndpoint, Constants, openAISettings, CacheKeys } = require('librechat-data-provider');
const { createImportBatchBuilder } = require('./importBatchBuilder');
const { cloneMessagesWithTimestamps } = require('./fork');
const getLogStores = require('~/cache/getLogStores');

/**
 * Returns the appropriate importer function based on the provided JSON data.
 *
 * @param {Object} jsonData - The JSON data to import.
 * @returns {Function} - The importer function.
 * @throws {Error} - If the import type is not supported.
 */
function getImporter(jsonData) {
  // For array-based formats (ChatGPT or Claude)
  if (Array.isArray(jsonData)) {
    // Claude format has chat_messages array in each conversation
    if (jsonData.length > 0 && jsonData[0]?.chat_messages) {
      logger.info('Importing Claude conversation');
      return importClaudeConvo;
    }
    // ChatGPT format has mapping object in each conversation
    logger.info('Importing ChatGPT conversation');
    return importChatGptConvo;
  }

  // For ChatbotUI
  if (jsonData.version && Array.isArray(jsonData.history)) {
    logger.info('Importing ChatbotUI conversation');
    return importChatBotUiConvo;
  }

  // For LibreChat
  if (jsonData.conversationId && (jsonData.messagesTree || jsonData.messages)) {
    logger.info('Importing LibreChat conversation');
    return importLibreChatConvo;
  }

  throw new Error('Unsupported import type');
}
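
/*
Illustrative sketch only, not part of this module. The detection order used by
getImporter() can be exercised in isolation with a hypothetical helper
(detectImportFormat is an assumed name) that returns a label instead of an
importer function. It assumes only the shape checks shown above; note that an
empty top-level array falls through to the ChatGPT branch.

```javascript
// Hypothetical helper mirroring the shape checks in getImporter().
function detectImportFormat(jsonData) {
  if (Array.isArray(jsonData)) {
    // Claude exports carry a chat_messages array on each conversation.
    if (jsonData.length > 0 && jsonData[0]?.chat_messages) {
      return 'claude';
    }
    // Every other array, including an empty one, is treated as ChatGPT.
    return 'chatgpt';
  }
  if (jsonData.version && Array.isArray(jsonData.history)) {
    return 'chatbot-ui';
  }
  if (jsonData.conversationId && (jsonData.messagesTree || jsonData.messages)) {
    return 'librechat';
  }
  throw new Error('Unsupported import type');
}

console.log(detectImportFormat([{ chat_messages: [] }]));
console.log(detectImportFormat({ version: 4, history: [] }));
```
*/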

/**
 * Imports a chatbot-ui V1 conversation from a JSON file and saves it to the database.
 *
 * @param {Object} jsonData - The JSON data containing the chatbot conversation.
 * @param {string} requestUserId - The ID of the user making the import request.
 * @param {Function} [builderFactory=createImportBatchBuilder] - The factory function to create an import batch builder.
 * @returns {Promise<void>} - A promise that resolves when the import is complete.
 * @throws {Error} - If there is an error creating the conversation from the JSON file.
 */
async function importChatBotUiConvo(
  jsonData,
  requestUserId,
  builderFactory = createImportBatchBuilder,
) {
  // This has been tested with the chatbot-ui V1 export: https://github.com/mckaywrigley/chatbot-ui/tree/b865b0555f53957e96727bc0bbb369c9eaecd83b#legacy-code
  try {
    /** @type {ImportBatchBuilder} */
    const importBatchBuilder = builderFactory(requestUserId);

    for (const historyItem of jsonData.history) {
      importBatchBuilder.startConversation(EModelEndpoint.openAI);
      for (const message of historyItem.messages) {
        if (message.role === 'assistant') {
          importBatchBuilder.addGptMessage(message.content, historyItem.model.id);
        } else if (message.role === 'user') {
          importBatchBuilder.addUserMessage(message.content);
        }
      }
      importBatchBuilder.finishConversation(historyItem.name, new Date());
    }
    await importBatchBuilder.saveBatch();
    logger.info(`user: ${requestUserId} | ChatbotUI conversation imported`);
  } catch (error) {
    logger.error(`user: ${requestUserId} | Error creating conversation from ChatbotUI file`, error);
  }
}
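
/*
Illustrative sketch only, not part of this module. Because the builder is
injected through builderFactory, the call sequence can be observed with a
recording stub instead of a database. Everything below is hypothetical:
createRecordingBuilder, replayChatbotUiHistory, the sample export, and the
'openAI' string standing in for EModelEndpoint.openAI. The loop is a
simplified copy of the one above so the sketch runs on its own.

```javascript
// Hypothetical stub that records each builder call instead of persisting data.
function createRecordingBuilder(calls) {
  return {
    startConversation: (endpoint) => calls.push(['startConversation', endpoint]),
    addGptMessage: (text, model) => calls.push(['addGptMessage', text, model]),
    addUserMessage: (text) => calls.push(['addUserMessage', text]),
    finishConversation: (title) => calls.push(['finishConversation', title]),
    saveBatch: async () => calls.push(['saveBatch']),
  };
}

// Simplified copy of the import loop, taking the builder directly.
async function replayChatbotUiHistory(jsonData, builder) {
  for (const historyItem of jsonData.history) {
    builder.startConversation('openAI');
    for (const message of historyItem.messages) {
      if (message.role === 'assistant') {
        builder.addGptMessage(message.content, historyItem.model.id);
      } else if (message.role === 'user') {
        builder.addUserMessage(message.content);
      }
    }
    builder.finishConversation(historyItem.name, new Date());
  }
  await builder.saveBatch();
}

// Made-up export with one conversation and two messages.
const sampleExport = {
  version: 4,
  history: [
    {
      name: 'Greeting',
      model: { id: 'gpt-4' },
      messages: [
        { role: 'user', content: 'Hi' },
        { role: 'assistant', content: 'Hello!' },
      ],
    },
  ],
};

const calls = [];
replayChatbotUiHistory(sampleExport, createRecordingBuilder(calls)).then(() => {
  console.log(calls.map((call) => call[0]).join(' -> '));
});
```
*/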

/**
 * Extracts text and thinking content from a Claude message.
 * @param {Object} msg - Claude message object with content array and optional text field.
 * @returns {{textContent: string, thinkingContent: string}} Extracted text and thinking content.
 */
function extractClaudeContent(msg) {
  let textContent = '';
  let thinkingContent = '';

  for (const part of msg.content || []) {
    if (part.type === 'text' && part.text) {
      textContent += part.text;
    } else if (part.type === 'thinking' && part.thinking) {
      thinkingContent += part.thinking;
    }
  }

  // Use the text field as fallback if content array is empty
  if (!textContent && msg.text) {
    textContent = msg.text;
  }

  return { textContent, thinkingContent };
}
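
/*
Illustrative sketch only, not part of this module. It shows what
extractClaudeContent() returns for two made-up messages: one with mixed
content parts and one that relies on the text fallback. The function is
repeated so the sketch runs on its own, and the 'tool_use' part is only a
placeholder for any part type the function does not handle.

```javascript
// Standalone copy of extractClaudeContent() from above.
function extractClaudeContent(msg) {
  let textContent = '';
  let thinkingContent = '';
  for (const part of msg.content || []) {
    if (part.type === 'text' && part.text) {
      textContent += part.text;
    } else if (part.type === 'thinking' && part.thinking) {
      thinkingContent += part.thinking;
    }
  }
  if (!textContent && msg.text) {
    textContent = msg.text;
  }
  return { textContent, thinkingContent };
}

// Text parts are concatenated in order; unrecognized part types are skipped.
const mixed = extractClaudeContent({
  content: [
    { type: 'thinking', thinking: 'Consider the question. ' },
    { type: 'text', text: 'Hello' },
    { type: 'text', text: ' world' },
    { type: 'tool_use', name: 'placeholder' },
  ],
});
console.log(mixed);
// { textContent: 'Hello world', thinkingContent: 'Consider the question. ' }

// With an empty content array, the top-level text field is used instead.
const fallback = extractClaudeContent({ content: [], text: 'Plain text' });
console.log(fallback);
// { textContent: 'Plain text', thinkingContent: '' }
```
*/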

/**
 * Imports Claude conversations from provided JSON data.
 * Claude export format: array of conversations with chat_messages array.
 *
 * @param {Array} jsonData - Array of Claude conversation objects to be imported.
 * @param {string} requestUserId - The ID of the user who initiated the import process.
 * @param {Function} builderFactory - Factory function to create a new import batch builder instance.
 * @returns {Promise<void>} Promise that resolves when all conversations have been imported.
 */
async function importClaudeConvo(
  jsonData,
  requestUserId,
  builderFactory = createImportBatchBuilder,
) {
  try {
    const importBatchBuilder = builderFactory(requestUserId);

    for (const conv of jsonData) {
      importBatchBuilder.startConversation(EModelEndpoint.anthropic);

      let lastMessageId = Constants.NO_PARENT;
      let lastTimestamp = null;

      for (const msg of conv.chat_messages || []) {
        const isCreatedByUser = msg.sender === 'human';
        const messageId = uuidv4();

        const { textContent, thinkingContent } = extractClaudeContent(msg);

        // Skip empty messages
        if (!textContent && !thinkingContent) {
          continue;
        }

        // Parse timestamp, fallback to conversation created_at or current time
        const messageTime = msg.created_at || conv.created_at;
        let createdAt = messageTime ? new Date(messageTime) : new Date();

        // Ensure timestamp is after the previous message.
        // Messages are sorted by createdAt and buildTree expects parents to appear before children.
        // This guards against any potential ordering issues in exports.
        if (lastTimestamp && createdAt <= lastTimestamp) {
          createdAt = new Date(lastTimestamp.getTime() + 1);
        }
        lastTimestamp = createdAt;

        const message = {
          messageId,
          parentMessageId: lastMessageId,
          text: textContent,
          sender: isCreatedByUser ? 'user' : 'Claude',
          isCreatedByUser,
          user: requestUserId,
          endpoint: EModelEndpoint.anthropic,
          createdAt,
        };

        // Add content array with thinking if present
        if (thinkingContent && !isCreatedByUser) {
          message.content = [
            { type: 'think', think: thinkingContent },
            { type: 'text', text: textContent },
          ];
        }

        importBatchBuilder.saveMessage(message);
        lastMessageId = messageId;
      }

      const createdAt = conv.created_at ? new Date(conv.created_at) : new Date();
      importBatchBuilder.finishConversation(conv.name || 'Imported Claude Chat', createdAt);
    }

    await importBatchBuilder.saveBatch();
    logger.info(`user: ${requestUserId} | Claude conversation imported`);
  } catch (error) {
    logger.error(`user: ${requestUserId} | Error creating conversation from Claude file`, error);
  }
}
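
/*
Illustrative sketch only, not part of this module. The timestamp guard in
importClaudeConvo() can be viewed as a small pure function: any timestamp that
is not later than its predecessor is bumped to one millisecond after it.
makeStrictlyIncreasing is a hypothetical name, and the millisecond values are
made up for the demonstration.

```javascript
// Hypothetical helper isolating the ordering guard from importClaudeConvo().
function makeStrictlyIncreasing(dates) {
  let last = null;
  return dates.map((date) => {
    let current = date;
    // Same comparison as the importer: bump when not strictly later.
    if (last && current <= last) {
      current = new Date(last.getTime() + 1);
    }
    last = current;
    return current;
  });
}

// A duplicate and an out-of-order timestamp, expressed in epoch milliseconds.
const input = [new Date(1000), new Date(1000), new Date(500)];
const adjusted = makeStrictlyIncreasing(input).map((date) => date.getTime());
console.log(adjusted); // [ 1000, 1001, 1002 ]
```

The original spacing between messages is not preserved for bumped entries;
only the relative order is.
*/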

/**
 * Imports a LibreChat conversation from JSON.
 *
 * @param {Object} jsonData - The JSON data representing the conversation.
 * @param {string} requestUserId - The ID of the user making the import request.
 * @param {Function} [builderFactory=createImportBatchBuilder] - The factory function to create an import batch builder.
 * @returns {Promise<void>} - A promise that resolves when the import is complete.
 */
async function importLibreChatConvo(
  jsonData,
  requestUserId,
  builderFactory = createImportBatchBuilder,
) {
  try {
    /** @type {ImportBatchBuilder} */
    const importBatchBuilder = builderFactory(requestUserId);
    const options = jsonData.options || {};

    /* Endpoint configuration */
    let endpoint = jsonData.endpoint ?? options.endpoint ?? EModelEndpoint.openAI;
    const cache = getLogStores(CacheKeys.CONFIG_STORE);
🏗️ feat: bulkWrite isolation, pre-auth context, strict-mode fixes (#12445)
* fix: wrap seedDatabase() in runAsSystem() for strict tenant mode
seedDatabase() was called without tenant context at startup, causing
every Mongoose operation inside it to throw when
TENANT_ISOLATION_STRICT=true. Wrapping in runAsSystem() gives it the
SYSTEM_TENANT_ID sentinel so the isolation plugin skips filtering,
matching the pattern already used for performStartupChecks and
updateInterfacePermissions.
* fix: chain tenantContextMiddleware in optionalJwtAuth
optionalJwtAuth populated req.user but never established ALS tenant
context, unlike requireJwtAuth which chains tenantContextMiddleware
after successful auth. Authenticated users hitting routes with
optionalJwtAuth (e.g. /api/banner) had no tenant isolation.
* feat: tenant-safe bulkWrite wrapper and call-site migration
Mongoose's bulkWrite() does not trigger schema-level middleware hooks,
so the applyTenantIsolation plugin cannot intercept it. This adds a
tenantSafeBulkWrite() utility that injects the current ALS tenant
context into every operation's filter/document before delegating to
native bulkWrite.
Migrates all 8 runtime bulkWrite call sites:
- agentCategory (seedCategories, ensureDefaultCategories)
- conversation (bulkSaveConvos)
- message (bulkSaveMessages)
- file (batchUpdateFiles)
- conversationTag (updateTagsForConversation, bulkIncrementTagCounts)
- aclEntry (bulkWriteAclEntries)
systemGrant.seedSystemGrants is intentionally not migrated — it uses
explicit tenantId: { $exists: false } filters and is exempt from the
isolation plugin.
* feat: pre-auth tenant middleware and tenant-scoped config cache
Adds preAuthTenantMiddleware that reads X-Tenant-Id from the request
header and wraps downstream in tenantStorage ALS context. Wired onto
/oauth, /api/auth, /api/config, and /api/share — unauthenticated
routes that need tenant scoping before JWT auth runs.
The /api/config cache key is now tenant-scoped
(STARTUP_CONFIG:${tenantId}) so multi-tenant deployments serve the
correct login page config per tenant.
The middleware is intentionally minimal — no subdomain parsing, no
OIDC claim extraction. The private fork's reverse proxy or auth
gateway sets the header.
* feat: accept optional tenantId in updateInterfacePermissions
When tenantId is provided, the function re-enters inside
tenantStorage.run({ tenantId }) so all downstream Mongoose queries
target that tenant's roles instead of the system context. This lets
the private fork's tenant provisioning flow call
updateInterfacePermissions per-tenant after creating tenant-scoped
ADMIN/USER roles.
* fix: tenant-filter $lookup in getPromptGroup aggregation
The $lookup stage in getPromptGroup() queried the prompts collection
without tenant filtering. While the outer PromptGroup aggregate is
protected by the tenantIsolation plugin's pre('aggregate') hook,
$lookup runs as an internal MongoDB operation that bypasses Mongoose
hooks entirely.
Converts from simple field-based $lookup to pipeline-based $lookup
with an explicit tenantId match when tenant context is active.
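The two $lookup forms can be sketched roughly as follows. The collection and field names (prompts, productionId, productionPrompt) and the helper name buildPromptLookup are assumptions for illustration; only the $lookup syntax itself is standard MongoDB:

```javascript
// Hypothetical builder for the $lookup stage. Field and collection names are assumed.
function buildPromptLookup(tenantId) {
  if (!tenantId) {
    // No tenant context: plain field-based join
    return {
      $lookup: {
        from: 'prompts',
        localField: 'productionId',
        foreignField: '_id',
        as: 'productionPrompt',
      },
    };
  }
  // Tenant context: pipeline-based join with an explicit tenantId match
  return {
    $lookup: {
      from: 'prompts',
      let: { promptId: '$productionId' },
      pipeline: [
        { $match: { $expr: { $eq: ['$_id', '$$promptId'] } } },
        { $match: { tenantId } },
      ],
      as: 'productionPrompt',
    },
  };
}

console.log(JSON.stringify(buildPromptLookup('tenant-a'), null, 2));
```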
* fix: replace field-level unique indexes with tenant-scoped compounds
Field-level unique:true creates a globally-unique single-field index in
MongoDB, which would cause insert failures across tenants sharing the
same ID values.
- agent.id: removed field-level unique, added { id, tenantId } compound
- convo.conversationId: removed field-level unique (compound at line 50
already exists: { conversationId, user, tenantId })
- message.messageId: removed field-level unique (compound at line 165
already exists: { messageId, user, tenantId })
- preset.presetId: removed field-level unique, added { presetId, tenantId }
compound
* fix: scope MODELS_CONFIG, ENDPOINT_CONFIG, PLUGINS, TOOLS caches by tenant
These caches store per-tenant configuration (available models, endpoint
settings, plugin availability, tool definitions) but were using global
cache keys. In multi-tenant mode, one tenant's cached config would be
served to all tenants.
Appends :${tenantId} to cache keys when tenant context is active.
Falls back to the unscoped key when no tenant context exists (backward
compatible for single-tenant OSS deployments).
Covers all read, write, and delete sites:
- ModelController.js: get/set MODELS_CONFIG
- PluginController.js: get/set PLUGINS, get/set TOOLS
- getEndpointsConfig.js: get/set/delete ENDPOINT_CONFIG
- app.js: delete ENDPOINT_CONFIG (clearEndpointConfigCache)
- mcp.js: delete TOOLS (updateMCPTools, mergeAppTools)
- importers.js: get ENDPOINT_CONFIG
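A minimal sketch of the key-scoping rule above, using Node's AsyncLocalStorage. The function bodies are assumptions written for illustration, and the system-context passthrough follows the sentinel fix described further down in this log:

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Sentinel value taken from this commit log; everything else is an illustrative assumption.
const SYSTEM_TENANT_ID = '__SYSTEM__';
const tenantStorage = new AsyncLocalStorage();

/** Returns the tenant ID of the current async context, if any. */
function getTenantId() {
  return tenantStorage.getStore()?.tenantId;
}

/** Appends `:${tenantId}` when a real tenant context is active, else returns the base key. */
function scopedCacheKey(baseKey) {
  const tenantId = getTenantId();
  if (!tenantId || tenantId === SYSTEM_TENANT_ID) {
    return baseKey;
  }
  return `${baseKey}:${tenantId}`;
}

console.log(scopedCacheKey('MODELS_CONFIG'));
console.log(tenantStorage.run({ tenantId: 'acme' }, () => scopedCacheKey('MODELS_CONFIG')));
```

Because the key is derived from the ambient context, reads, writes, and deletes of the same entry must all run under the same tenant context to hit the same key.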
* fix: add getTenantId to PluginController spec mock
The data-schemas mock was missing getTenantId, causing all
PluginController tests to throw when the controller calls
getTenantId() for tenant-scoped cache keys.
* fix: address review findings — migration, strict-mode, DRY, types
Addresses all CRITICAL, MAJOR, and MINOR review findings:
F1 (CRITICAL): Add agents, conversations, messages, presets to
SUPERSEDED_INDEXES in tenantIndexes.ts so dropSupersededTenantIndexes()
drops the old single-field unique indexes that block multi-tenant inserts.
F2 (CRITICAL): Unknown bulkWrite op types now throw in strict mode
instead of silently passing through without tenant injection.
F3 (MAJOR): Replace wildcard export with named export for
tenantSafeBulkWrite, hiding _resetBulkWriteStrictCache from the
public package API.
F5 (MAJOR): Restore AnyBulkWriteOperation<IAclEntry>[] typing on
bulkWriteAclEntries — the unparameterized wrapper accepts parameterized
ops as a subtype.
F7 (MAJOR): Fix config.js tenant precedence — JWT-derived
req.user.tenantId now takes priority over the X-Tenant-Id header for
authenticated requests.
F8 (MINOR): Extract scopedCacheKey() helper into tenantContext.ts and
replace all 11 inline occurrences across 7 files.
F9 (MINOR): Use simple localField/foreignField $lookup for the
non-tenant getPromptGroup path (more efficient index seeks).
F12 (NIT): Remove redundant BulkOp type alias.
F13 (NIT): Remove debug log that leaked raw tenantId.
* fix: add new superseded indexes to tenantIndexes test fixture
The test creates old indexes to verify the migration drops them.
Missing fixture entries for agents.id_1, conversations.conversationId_1,
messages.messageId_1, and presets.presetId_1 caused the count assertion
to fail (expected 22, got 18).
* fix: restore logger.warn for unknown bulk op types in non-strict mode
* fix: block SYSTEM_TENANT_ID sentinel from external header input
CRITICAL: preAuthTenantMiddleware accepted any string as X-Tenant-Id,
including '__SYSTEM__'. The tenantIsolation plugin treats SYSTEM_TENANT_ID
as an explicit bypass — skipping ALL query filters. A client sending
X-Tenant-Id: __SYSTEM__ to pre-auth routes (/api/share, /api/config,
/api/auth, /oauth) would execute Mongoose operations without tenant
isolation.
Fixes:
- preAuthTenantMiddleware rejects SYSTEM_TENANT_ID in header
- scopedCacheKey returns the base key (not key:__SYSTEM__) in system
context, preventing stale cache entries during runAsSystem()
- updateInterfacePermissions guards tenantId against SYSTEM_TENANT_ID
- $lookup pipeline separates $expr join from constant tenantId match
for better index utilization
- Regression test for sentinel rejection in preAuthTenant.spec.ts
- Remove redundant getTenantId() call in config.js
* test: add missing deleteMany/replaceOne coverage, fix vacuous ALS assertions
bulkWrite spec:
- deleteMany: verifies tenant-scoped deletion leaves other tenants untouched
- replaceOne: verifies tenantId injected into both filter and replacement
- replaceOne overwrite: verifies a conflicting tenantId in the replacement
document is overwritten by the ALS tenant (defense-in-depth)
- empty ops array: verifies graceful handling
preAuthTenant spec:
- All negative-case tests now use the capturedNext pattern to verify
getTenantId() inside the middleware's execution context, not the
test runner's outer frame (which was always undefined regardless)
* feat: tenant-isolate MESSAGES cache, FLOWS cache, and GenerationJobManager
MESSAGES cache (streamAudio.js):
- Cache key now uses scopedCacheKey(messageId) to prefix with tenantId,
preventing cross-tenant message content reads during TTS streaming.
FLOWS cache (FlowStateManager):
- getFlowKey() now generates ${type}:${tenantId}:${flowId} when tenant
context is active, isolating OAuth flow state per tenant.
GenerationJobManager:
- tenantId added to SerializableJobData and GenerationJobMetadata
- createJob() captures the current ALS tenant context (excluding
SYSTEM_TENANT_ID) and stores it in job metadata
- SSE subscription endpoint validates job.metadata.tenantId matches
req.user.tenantId, blocking cross-tenant stream access
- Both InMemoryJobStore and RedisJobStore updated to accept tenantId
* fix: add getTenantId and SYSTEM_TENANT_ID to MCP OAuth test mocks
FlowStateManager.getFlowKey() now calls getTenantId() for tenant-scoped
flow keys. The 4 MCP OAuth test files mock @librechat/data-schemas
without these exports, causing TypeError at runtime.
* fix: correct import ordering per AGENTS.md conventions
Package imports sorted shortest to longest line length, local imports
sorted longest to shortest — fixes ordering violations introduced by
our new imports across 8 files.
* fix: deserialize tenantId in RedisJobStore — cross-tenant SSE guard was no-op in Redis mode
serializeJob() writes tenantId to the Redis hash via Object.entries,
but deserializeJob() manually enumerates fields and omitted tenantId.
Every getJob() from Redis returned tenantId: undefined, causing the
SSE route's cross-tenant guard to short-circuit (undefined && ... → false).
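The asymmetry behind this bug can be sketched as follows. The field names other than tenantId are assumptions, and the functions are simplified stand-ins rather than the actual RedisJobStore code:

```javascript
// Simplified stand-ins for the store's serializer pair. Field names are assumed.
function serializeJob(job) {
  const hash = {};
  // Generic: every defined field is written, including tenantId
  for (const [key, value] of Object.entries(job)) {
    if (value !== undefined) {
      hash[key] = String(value);
    }
  }
  return hash;
}

function deserializeJob(hash) {
  // Manual enumeration: any field not listed here is silently dropped
  return {
    streamId: hash.streamId,
    userId: hash.userId,
    status: hash.status,
    createdAt: Number(hash.createdAt),
    tenantId: hash.tenantId, // the field that has to be listed explicitly
  };
}

const roundTripped = deserializeJob(
  serializeJob({ streamId: 's1', userId: 'u1', status: 'running', createdAt: 1700000000000, tenantId: 'acme' }),
);
console.log(roundTripped.tenantId);
```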
* test: SSE tenant guard, FlowStateManager key consistency, ALS scope docs
SSE stream tenant tests (streamTenant.spec.js):
- Cross-tenant user accessing another tenant's stream → 403
- Same-tenant user accessing own stream → allowed
- OSS mode (no tenantId on job) → tenant check skipped
FlowStateManager tenant tests (manager.tenant.spec.ts):
- completeFlow finds flow created under same tenant context
- completeFlow does NOT find flow under different tenant context
- Unscoped flows are separate from tenant-scoped flows
Documentation:
- JSDoc on getFlowKey documenting ALS context consistency requirement
- Comment on streamAudio.js scopedCacheKey capture site
* fix: SSE stream tests hang on success path, remove internal fork references
The success-path tests entered the SSE streaming code which never
closes, causing timeout. Mock subscribe() to end the response
immediately. Restructured assertions to verify non-403/non-404.
Removed "private fork" and "OSS" references from code and test
descriptions — replaced with "deployment layer", "multi-tenant
deployments", and "single-tenant mode".
* fix: address review findings — test rigor, tenant ID validation, docs
F1: SSE stream tests now mock subscribe() with correct signature
(streamId, writeEvent, onDone, onError) and assert 200 status,
verifying the tenant guard actually allows through same-tenant users.
F2: completeFlow logs the attempted key and ALS tenantId when flow
is not found, so reverse proxy misconfiguration (missing X-Tenant-Id
on OAuth callback) produces an actionable warning.
F3/F10: preAuthTenantMiddleware validates tenant ID format — rejects
colons, special characters, and values exceeding 128 chars. Trims
whitespace. Prevents cache key collisions via crafted headers.
F4: Documented cache invalidation scope limitation in
clearEndpointConfigCache — only the calling tenant's key is cleared;
other tenants expire via TTL.
F7: getFlowKey JSDoc now lists all 8 methods requiring consistent
ALS context.
F8: Added dedicated scopedCacheKey unit tests — base key without
context, base key in system context, scoped key with tenant, no
ALS leakage across scope boundaries.
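The header validation in F3/F10 might look roughly like this. The allowed character set and the helper name parseTenantHeader are assumptions, since the log only states what is rejected (colons, special characters, values over 128 characters, and the system sentinel):

```javascript
// Hypothetical validator. The exact allowed charset is an assumption.
const SYSTEM_TENANT_ID = '__SYSTEM__';
const MAX_TENANT_ID_LENGTH = 128;
const TENANT_ID_PATTERN = /^[-\w]+$/; // letters, digits, underscore, hyphen

/** Returns a trimmed tenant ID, or undefined when the header should be ignored. */
function parseTenantHeader(raw) {
  if (typeof raw !== 'string') {
    return undefined;
  }
  const value = raw.trim();
  if (!value || value.length > MAX_TENANT_ID_LENGTH) {
    return undefined;
  }
  if (value === SYSTEM_TENANT_ID || !TENANT_ID_PATTERN.test(value)) {
    return undefined;
  }
  return value;
}

console.log(parseTenantHeader(' acme '));
console.log(parseTenantHeader('acme:MODELS_CONFIG'));
```

Rejecting the colon matters because scoped cache keys join the base key and the tenant ID with a colon, so a crafted value could otherwise alias another key.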
* fix: revert flow key tenant scoping, fix SSE test timing
FlowStateManager: Reverts tenant-scoped flow keys. OAuth callbacks
arrive without tenant ALS context (provider redirects don't carry
X-Tenant-Id), so completeFlow/failFlow would never find flows
created under tenant context. Flow IDs are random UUIDs with no
collision risk, and flow data is ephemeral (TTL-bounded).
SSE tests: Use process.nextTick for onDone callback so Express
response headers are flushed before res.write/res.end are called.
* fix: restore getTenantId import for completeFlow diagnostic log
* fix: correct completeFlow warning message, add missing flow test
The warning referenced X-Tenant-Id header consistency which was only
relevant when flow keys were tenant-scoped (since reverted). Updated
to list actual causes: TTL expiry, missing flow, or routing to a
different instance without shared Keyv storage.
Removed the getTenantId() call and import — no longer needed since
flow keys are unscoped.
Added test for the !flowState branch in completeFlow — verifies
return false and logger.warn on nonexistent flow ID.
* fix: add explicit return type to recursive updateInterfacePermissions
The recursive call (tenantId branch calls itself without tenantId)
causes TypeScript to infer circular return type 'any'. Adding
explicit Promise<void> satisfies the rollup typescript plugin.
* fix: update MCPOAuthRaceCondition test to match new completeFlow warning
* fix: clearEndpointConfigCache deletes both scoped and unscoped keys
Unauthenticated /api/endpoints requests populate the unscoped
ENDPOINT_CONFIG key. Admin config mutations clear only the
tenant-scoped key, leaving the unscoped entry stale indefinitely.
Now deletes both when in tenant context.
* fix: tenant guard on abort/status endpoints, warn logs, test coverage
F1: Add tenant guard to /chat/status/:conversationId and /chat/abort
matching the existing guard on /chat/stream/:streamId. The status
endpoint exposes aggregatedContent (AI response text) which requires
tenant-level access control.
F2: preAuthTenantMiddleware now logs warn for rejected __SYSTEM__
sentinel and malformed tenant IDs, providing observability for
bypass probing attempts.
F3: Abort fallback path (getActiveJobIdsForUser) now has tenant
check after resolving the job.
F4: Test for strict mode + SYSTEM_TENANT_ID — verifies runAsSystem
bypasses tenantSafeBulkWrite without throwing in strict mode.
F5: Test for job with tenantId + user without tenantId → 403.
F10: Regex uses idiomatic hyphen-at-start form.
F11: Test descriptions changed from "rejects" to "ignores" since
middleware calls next() (not 4xx).
Also fixes MCPOAuthRaceCondition test assertion to match updated
completeFlow warning message.
* fix: test coverage for logger.warn, status/abort guards, consistency
A: preAuthTenant spec now mocks logger and asserts warn calls for
__SYSTEM__ sentinel, malformed characters, and oversized headers.
B: streamTenant spec expanded with status and abort endpoint tests —
cross-tenant status returns 403, same-tenant returns 200 with body,
cross-tenant abort returns 403.
C: Abort endpoint uses req.user.tenantId (not req.user?.tenantId)
matching stream/status pattern — requireJwtAuth guarantees req.user.
D: Malformed header warning now includes ip in log metadata,
matching the sentinel warning for consistent SOC correlation.
* fix: assert ip field in malformed header warn tests
* fix: parallelize cache deletes, document tenant guard, fix import order
- clearEndpointConfigCache uses Promise.all for independent cache
deletes instead of sequential awaits
- SSE stream tenant guard has inline comment explaining backward-compat
behavior for untenanted legacy jobs
- conversation.ts local imports reordered longest-to-shortest per
AGENTS.md
* fix: tenant-qualify userJobs keys, document tenant guard backward-compat
Job store userJobs keys now include tenantId when available:
- Redis: stream:user:{tenantId:userId}:jobs (falls back to
stream:user:{userId}:jobs when no tenant)
- InMemory: composite key tenantId:userId in userJobMap
getActiveJobIdsByUser/getActiveJobIdsForUser accept optional tenantId
parameter, threaded through from req.user.tenantId at all call sites
(/chat/active and /chat/abort fallback).
Added inline comments on all three SSE tenant guards explaining the
backward-compat design: untenanted legacy jobs remain accessible
when the userId check passes.
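The in-memory composite key can be sketched as below. The class name UserJobIndex and its methods are hypothetical. Only the tenantId:userId key shape and the userJobMap name come from the description above:

```javascript
// Hypothetical index keyed by `tenantId:userId`, falling back to the bare userId.
class UserJobIndex {
  constructor() {
    this.userJobMap = new Map();
  }

  static userKey(userId, tenantId) {
    return tenantId ? `${tenantId}:${userId}` : userId;
  }

  add(userId, jobId, tenantId) {
    const key = UserJobIndex.userKey(userId, tenantId);
    if (!this.userJobMap.has(key)) {
      this.userJobMap.set(key, new Set());
    }
    this.userJobMap.get(key).add(jobId);
  }

  remove(userId, jobId, tenantId) {
    // Use the same qualified key for cleanup so empty Sets are not orphaned
    const key = UserJobIndex.userKey(userId, tenantId);
    const jobs = this.userJobMap.get(key);
    if (!jobs) {
      return;
    }
    jobs.delete(jobId);
    if (jobs.size === 0) {
      this.userJobMap.delete(key);
    }
  }

  getActiveJobIds(userId, tenantId) {
    return [...(this.userJobMap.get(UserJobIndex.userKey(userId, tenantId)) ?? [])];
  }
}

const index = new UserJobIndex();
index.add('u1', 'job-1', 'tenant-a');
console.log(index.getActiveJobIds('u1', 'tenant-a'), index.getActiveJobIds('u1', 'tenant-b'));
```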
* fix: parallelize cache deletes, document tenant guard, fix import order
Fix InMemoryJobStore.getActiveJobIdsByUser empty-set cleanup to use
the tenant-qualified userKey instead of bare userId — prevents
orphaned empty Sets accumulating in userJobMap for multi-tenant users.
Document cross-tenant staleness in clearEndpointConfigCache JSDoc —
other tenants' scoped keys expire via TTL, not active invalidation.
* fix: cleanup userJobMap leak, startup warning, DRY tenant guard, docs
F1: InMemoryJobStore.cleanup() now removes entries from userJobMap
before calling deleteJob, preventing orphaned empty Sets from
accumulating with tenant-qualified composite keys.
F2: Startup warning when TENANT_ISOLATION_STRICT is active — reminds
operators to configure reverse proxy to control X-Tenant-Id header.
F3: mergeAppTools JSDoc documents that tenant-scoped TOOLS keys are
not actively invalidated (matching clearEndpointConfigCache pattern).
F5: Abort handler getActiveJobIdsForUser call uses req.user.tenantId
(not req.user?.tenantId) — consistent with stream/status handlers.
F6: updateInterfacePermissions JSDoc clarifies SYSTEM_TENANT_ID
behavior — falls through to caller's ALS context.
F7: Extracted hasTenantMismatch() helper, replacing three identical
inline tenant guard blocks across stream/status/abort endpoints.
F9: scopedCacheKey JSDoc documents both passthrough cases (no context
and SYSTEM_TENANT_ID context).
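A sketch of what the shared guard from F7 could look like, consistent with the behavior listed in this log (untenanted jobs pass, a tenanted job with an untenanted user is blocked). The signature is an assumption:

```javascript
// Hypothetical signature. Only the helper name and the tenantId fields come from this log.
function hasTenantMismatch(job, user) {
  const jobTenantId = job?.metadata?.tenantId;
  if (!jobTenantId) {
    // Untenanted legacy job: the tenant check is skipped
    return false;
  }
  return jobTenantId !== user?.tenantId;
}

console.log(hasTenantMismatch({ metadata: { tenantId: 'a' } }, { tenantId: 'b' }));
console.log(hasTenantMismatch({ metadata: {} }, { tenantId: 'b' }));
```

A route handler would respond with 403 when this returns true, after the usual userId ownership check.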
* fix: clean userJobMap in evictOldest — same leak as cleanup()
2026-03-28 16:43:50 -04:00
    const endpointsConfig = await cache.get(scopedCacheKey(CacheKeys.ENDPOINT_CONFIG));
    const endpointConfig = endpointsConfig?.[endpoint];
    if (!endpointConfig && endpointsConfig) {
      endpoint = Object.keys(endpointsConfig)[0];
    } else if (!endpointConfig) {
      endpoint = EModelEndpoint.openAI;
    }

    importBatchBuilder.startConversation(endpoint);

    let firstMessageDate = null;

    const messagesToImport = jsonData.messagesTree || jsonData.messages;

    if (jsonData.recursive) {
      /**
       * Flatten the recursive message tree into a flat array
       * @param {TMessage[]} messages
       * @param {string} parentMessageId
       * @param {TMessage[]} flatMessages
       */
      const flattenMessages = (
        messages,
        parentMessageId = Constants.NO_PARENT,
        flatMessages = [],
      ) => {
        for (const message of messages) {
          if (!message.text && !message.content) {
            continue;
          }

          const flatMessage = {
            ...message,
            parentMessageId: parentMessageId,
            children: undefined, // Remove children from flat structure
          };
          flatMessages.push(flatMessage);

          if (!firstMessageDate && message.createdAt) {
            firstMessageDate = new Date(message.createdAt);
          }

          if (message.children && message.children.length > 0) {
            flattenMessages(message.children, message.messageId, flatMessages);
          }
        }
        return flatMessages;
      };

      const flatMessages = flattenMessages(messagesToImport);
      cloneMessagesWithTimestamps(flatMessages, importBatchBuilder);
    } else if (messagesToImport) {
      cloneMessagesWithTimestamps(messagesToImport, importBatchBuilder);
      for (const message of messagesToImport) {
        if (!firstMessageDate && message.createdAt) {
          firstMessageDate = new Date(message.createdAt);
        }
      }
    } else {
      throw new Error('Invalid LibreChat file format');
    }

    // An unparseable createdAt yields an invalid Date object, which never equals the string 'Invalid Date'
    if (firstMessageDate && Number.isNaN(firstMessageDate.getTime())) {
      firstMessageDate = null;
    }

    importBatchBuilder.finishConversation(jsonData.title, firstMessageDate ?? new Date(), options);
    await importBatchBuilder.saveBatch();
    logger.debug(`user: ${requestUserId} | Conversation "${jsonData.title}" imported`);
  } catch (error) {
    logger.error(`user: ${requestUserId} | Error creating conversation from LibreChat file`, error);
  }
}

/**
 * Imports ChatGPT conversations from provided JSON data.
 * Initializes the import process by creating a batch builder and processing each conversation in the data.
 *
 * @param {ChatGPTConvo[]} jsonData - Array of conversation objects to be imported.
 * @param {string} requestUserId - The ID of the user who initiated the import process.
 * @param {Function} builderFactory - Factory function to create a new import batch builder instance, defaults to createImportBatchBuilder.
 * @returns {Promise<void>} Promise that resolves when all conversations have been imported.
 */
async function importChatGptConvo(
  jsonData,
  requestUserId,
  builderFactory = createImportBatchBuilder,
) {
  try {
    const importBatchBuilder = builderFactory(requestUserId);
    for (const conv of jsonData) {
      processConversation(conv, importBatchBuilder, requestUserId);
    }
    await importBatchBuilder.saveBatch();
  } catch (error) {
    logger.error(`user: ${requestUserId} | Error creating conversation from imported file`, error);
  }
}

/**
 * Processes a single conversation, adding messages to the batch builder based on author roles and handling text content.
 * It directly manages the addition of messages for different roles and handles citations for assistant messages.
 *
 * @param {ChatGPTConvo} conv - A single conversation object that contains multiple messages and other details.
🌿 feat: Fork Messages/Conversations (#2617)
* typedef for ImportBatchBuilder
* feat: first pass, fork conversations
* feat: fork - getMessagesUpToTargetLevel
* fix: additional tests and fix getAllMessagesUpToParent
* chore: arrow function return
* refactor: fork 3 options
* chore: remove unused genbuttons
* chore: remove unused hover buttons code
* feat: fork first pass
* wip: fork remember setting
* style: user icon
* chore: move clear chats to data tab
* WIP: fork UI options
* feat: data-provider fork types/services/vars and use generic MutationOptions
* refactor: use single param for fork option, use enum, fix mongo errors, use Date.now(), add records flag for testing, use endpoint from original convo and messages, pass originalConvo to finishConversation
* feat: add fork mutation hook and consolidate type imports
* refactor: use enum
* feat: first pass, fork mutation
* chore: add enum for target level fork option
* chore: add enum for target level fork option
* show toast when checking remember selection
* feat: splitAtTarget
* feat: split at target option
* feat: navigate to new fork, show toasts, set result query data
* feat: hover info for all fork options
* refactor: add Messages settings tab
* fix(Fork): remember text info
* ci: test for single message and is target edge case
* feat: additional tests for getAllMessagesUpToParent
* ci: additional tests and cycle detection for getMessagesUpToTargetLevel
* feat: circular dependency checks for getAllMessagesUpToParent
* fix: getMessagesUpToTargetLevel circular dep. check
* ci: more tests for getMessagesForConversation
* style: hover text for checkbox fork items
* refactor: add statefulness to conversation import
2024-05-05 11:48:20 -04:00
 * @param {ImportBatchBuilder} importBatchBuilder - The batch builder instance used to manage and batch conversation data.
 * @param {string} requestUserId - The ID of the user who initiated the import process.
 * @returns {void}
 */
function processConversation(conv, importBatchBuilder, requestUserId) {
  importBatchBuilder.startConversation(EModelEndpoint.openAI);

  // Map all message IDs to new UUIDs
  const messageMap = new Map();
  for (const [id, mapping] of Object.entries(conv.mapping)) {
    if (mapping.message && mapping.message.content.content_type) {
      const newMessageId = uuidv4();
      messageMap.set(id, newMessageId);
    }
  }

  /**
   * Finds the nearest valid parent by traversing up through skippable messages
   * (system, reasoning_recap, thoughts). Uses iterative traversal to avoid
   * stack overflow on deep chains of skippable messages.
   *
   * @param {string} startId - The ID of the starting parent message.
   * @returns {string} The ID of the nearest valid parent message.
   */
  const findValidParent = (startId) => {
    const visited = new Set();
    let parentId = startId;

    while (parentId) {
      if (!messageMap.has(parentId) || visited.has(parentId)) {
        return Constants.NO_PARENT;
      }
      visited.add(parentId);

      const parentMapping = conv.mapping[parentId];
      if (!parentMapping?.message) {
        return Constants.NO_PARENT;
      }

      const contentType = parentMapping.message.content?.content_type;
      const shouldSkip =
        parentMapping.message.author?.role === 'system' ||
        contentType === 'reasoning_recap' ||
        contentType === 'thoughts';

      if (!shouldSkip) {
        return messageMap.get(parentId);
      }

      parentId = parentMapping.parent;
    }

    return Constants.NO_PARENT;
  };

  /**
   * Helper function to find thinking content from parent chain (thoughts messages)
   * @param {string} parentId - The ID of the parent message.
   * @param {Set} visited - Set of already-visited IDs to prevent cycles.
   * @returns {Array} The thinking content array (empty if not found).
   */
  const findThinkingContent = (parentId, visited = new Set()) => {
    // Guard against circular references in malformed imports
    if (!parentId || visited.has(parentId)) {
      return [];
    }
    visited.add(parentId);

    const parentMapping = conv.mapping[parentId];
    if (!parentMapping?.message) {
      return [];
    }

    const contentType = parentMapping.message.content?.content_type;

    // If this is a thoughts message, extract the thinking content
    if (contentType === 'thoughts') {
      const thoughts = parentMapping.message.content.thoughts || [];
      const thinkingText = thoughts
        .map((t) => t.content || t.summary || '')
        .filter(Boolean)
        .join('\n\n');

      if (thinkingText) {
        return [{ type: 'think', think: thinkingText }];
      }
      return [];
    }

    // If this is reasoning_recap, look at its parent for thoughts
    if (contentType === 'reasoning_recap') {
      return findThinkingContent(parentMapping.parent, visited);
    }

    return [];
  };

  // Create and save messages using the mapped IDs
  const messages = [];
  for (const [id, mapping] of Object.entries(conv.mapping)) {
    const role = mapping.message?.author?.role;
    if (!mapping.message) {
      messageMap.delete(id);
      continue;
    } else if (role === 'system') {
      // Skip system messages but keep their ID in messageMap for parent references
      continue;
    }

    const contentType = mapping.message.content?.content_type;

    // Skip thoughts messages - they will be merged into the response message
    if (contentType === 'thoughts') {
      continue;
    }

    // Skip reasoning_recap messages (just summaries like "Thought for 44s")
    if (contentType === 'reasoning_recap') {
      continue;
    }

    const newMessageId = messageMap.get(id);
    const parentMessageId = findValidParent(mapping.parent);

    const messageText = formatMessageText(mapping.message);

    const isCreatedByUser = role === 'user';
    let sender = isCreatedByUser ? 'user' : 'assistant';
    const model = mapping.message.metadata.model_slug || openAISettings.model.default;

    if (!isCreatedByUser) {
      /** Extracted model name from model slug */
      const gptMatch = model.match(/gpt-(.+)/i);
      if (gptMatch) {
        sender = `GPT-${gptMatch[1]}`;
      } else {
        sender = model || 'assistant';
      }
    }

    // Use create_time from ChatGPT export to ensure proper message ordering
    // For null timestamps, use the conversation's create_time as fallback, or current time as last resort
    const messageTime = mapping.message.create_time || conv.create_time;
    const createdAt = messageTime ? new Date(messageTime * 1000) : new Date();

    const message = {
      messageId: newMessageId,
      parentMessageId,
      text: messageText,
      sender,
      isCreatedByUser,
      model,
      user: requestUserId,
      endpoint: EModelEndpoint.openAI,
      createdAt,
    };

    // For assistant messages, check if there's thinking content in the parent chain
    if (!isCreatedByUser) {
      const thinkingContent = findThinkingContent(mapping.parent);
      if (thinkingContent.length > 0) {
        // Combine thinking content with the text response
        message.content = [...thinkingContent, { type: 'text', text: messageText }];
      }
    }

    messages.push(message);
  }

  const cycleDetected = adjustTimestampsForOrdering(messages);
  if (cycleDetected) {
    breakParentCycles(messages);
  }

  for (const message of messages) {
    importBatchBuilder.saveMessage(message);
  }

  importBatchBuilder.finishConversation(conv.title, new Date(conv.create_time * 1000));
}

/**
 * Processes text content of messages authored by an assistant, inserting citation links as required.
 * Uses citation start and end indices to place links at the correct positions.
 *
 * @param {ChatGPTMessage} messageData - The message data containing metadata about citations.
 * @param {string} messageText - The original text of the message which may be altered by inserting citation links.
 * @returns {string} - The updated message text after processing for citations.
 */
function processAssistantMessage(messageData, messageText) {
  if (!messageText) {
    return messageText;
  }

  const citations = messageData.metadata?.citations ?? [];

  // Process from the last citation to the first so earlier indices stay valid
  const sortedCitations = [...citations].sort((a, b) => b.start_ix - a.start_ix);

  let result = messageText;
  for (const citation of sortedCitations) {
    if (
      !citation.metadata?.type ||
      citation.metadata.type !== 'webpage' ||
      typeof citation.start_ix !== 'number' ||
      typeof citation.end_ix !== 'number' ||
      citation.start_ix >= citation.end_ix
    ) {
      continue;
    }
const replacement = ` ([ ${ citation . metadata . title } ]( ${ citation . metadata . url } )) ` ;
    result = result.slice(0, citation.start_ix) + replacement + result.slice(citation.end_ix);
  }

  return result;
}

/**
 * Formats the text content of a message based on its content type and author role.
 * @param {ChatGPTMessage} messageData - The message data.
 * @returns {string} - The formatted message text.
 */
function formatMessageText(messageData) {
  const contentType = messageData.content.content_type;
  const isText = contentType === 'text';
  let messageText = '';

  if (isText && messageData.content.parts) {
    messageText = messageData.content.parts.join(' ');
  } else if (contentType === 'code') {
    messageText = `\`\`\`${messageData.content.language}\n${messageData.content.text}\n\`\`\``;
  } else if (contentType === 'execution_output') {
    messageText = `Execution Output:\n> ${messageData.content.text}`;
  } else if (messageData.content.parts) {
    for (const part of messageData.content.parts) {
      if (typeof part === 'string') {
        messageText += part + ' ';
      } else if (typeof part === 'object') {
        // Append rather than assign so that earlier parts are not discarded
        messageText += `\`\`\`json\n${JSON.stringify(part, null, 2)}\n\`\`\`\n`;
      }
    }
    messageText = messageText.trim();
  } else {
    messageText = `\`\`\`json\n${JSON.stringify(messageData.content, null, 2)}\n\`\`\``;
  }

  if (isText && messageData.author.role !== 'user') {
    messageText = processAssistantMessage(messageData, messageText);
  }

  return messageText;
}

/**
 * Adjusts message timestamps to ensure children always come after parents.
 * Messages are sorted by createdAt and buildTree expects parents to appear before children.
 * ChatGPT exports can have slight timestamp inversions (e.g., tool call results
 * arriving a few ms before their parent). Uses multiple passes to handle cascading adjustments.
 * Capped at N passes (where N = message count) to guarantee termination on cyclic graphs.
 *
 * @param {Array} messages - Array of message objects with messageId, parentMessageId, and createdAt.
 * @returns {boolean} True if cyclic parent relationships were detected.
 */
function adjustTimestampsForOrdering(messages) {
  if (messages.length === 0) {
    return false;
  }

  const timestampMap = new Map();
  for (const msg of messages) {
    timestampMap.set(msg.messageId, msg.createdAt);
  }

  let hasChanges = true;
  let remainingPasses = messages.length;
  while (hasChanges && remainingPasses > 0) {
    hasChanges = false;
    remainingPasses--;
    for (const message of messages) {
      if (message.parentMessageId && message.parentMessageId !== Constants.NO_PARENT) {
        const parentTimestamp = timestampMap.get(message.parentMessageId);
        if (parentTimestamp && message.createdAt <= parentTimestamp) {
          // Push the child 1 ms past its parent
          message.createdAt = new Date(parentTimestamp.getTime() + 1);
          timestampMap.set(message.messageId, message.createdAt);
          hasChanges = true;
        }
      }
    }
  }

  // Still changing after N passes means the parent graph contains a cycle
  const cycleDetected = remainingPasses === 0 && hasChanges;
  if (cycleDetected) {
    logger.warn(
      '[importers] Detected cyclic parent relationships while adjusting import timestamps',
    );
  }
  return cycleDetected;
}
/**
 * Severs cyclic parentMessageId back-edges so saved messages form a valid tree.
 * Walks each message's parent chain; if a message is visited twice, its parentMessageId
 * is set to NO_PARENT to break the cycle.
 *
 * @param {Array} messages - Array of message objects with messageId and parentMessageId.
 */
function breakParentCycles(messages) {
  const parentLookup = new Map();
  for (const msg of messages) {
    parentLookup.set(msg.messageId, msg);
  }

  // IDs whose parent chain has already been walked and is known to be acyclic.
  const settled = new Set();
  for (const message of messages) {
    const chain = new Set();
    let current = message;
    while (current && !settled.has(current.messageId)) {
      if (chain.has(current.messageId)) {
        // Revisiting a message within the same walk means we closed a cycle.
        current.parentMessageId = Constants.NO_PARENT;
        break;
      }
      chain.add(current.messageId);
      const parentId = current.parentMessageId;
      if (!parentId || parentId === Constants.NO_PARENT) {
        break;
      }
      current = parentLookup.get(parentId);
    }
    for (const id of chain) {
      settled.add(id);
    }
  }
}

module.exports = { getImporter, processAssistantMessage };