Mirror of https://github.com/danny-avila/LibreChat.git (synced 2026-04-03 14:27:20 +02:00)
* fix: wrap seedDatabase() in runAsSystem() for strict tenant mode
seedDatabase() was called without tenant context at startup, causing
every Mongoose operation inside it to throw when
TENANT_ISOLATION_STRICT=true. Wrapping in runAsSystem() gives it the
SYSTEM_TENANT_ID sentinel so the isolation plugin skips filtering,
matching the pattern already used for performStartupChecks and
updateInterfacePermissions.
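A minimal sketch of the change, assuming runAsSystem() is exported from the
package's tenant utilities and runs its callback under the SYSTEM_TENANT_ID
sentinel in AsyncLocalStorage:

```js
// Sketch only — runAsSystem() export path and signature are assumptions.
const { runAsSystem } = require('@librechat/data-schemas');

async function startup() {
  // Without this wrapper, every Mongoose call inside seedDatabase()
  // (the existing startup seeder) throws under TENANT_ISOLATION_STRICT=true
  // because no tenant context exists yet at boot.
  await runAsSystem(() => seedDatabase());
}
```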
* fix: chain tenantContextMiddleware in optionalJwtAuth
optionalJwtAuth populated req.user but never established ALS tenant
context, unlike requireJwtAuth which chains tenantContextMiddleware
after successful auth. Authenticated users hitting routes with
optionalJwtAuth (e.g. /api/banner) had no tenant isolation.
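A hedged sketch of the chaining pattern; the passport wiring and the
middleware import path are assumptions, not the fork's exact code:

```js
const passport = require('passport');
// Assumed import path for the ALS-establishing middleware.
const { tenantContextMiddleware } = require('~/server/middleware');

const optionalJwtAuth = (req, res, next) => {
  passport.authenticate('jwt', { session: false }, (err, user) => {
    if (err) {
      return next(err);
    }
    if (!user) {
      // Anonymous requests proceed without tenant context.
      return next();
    }
    req.user = user;
    // Mirror requireJwtAuth: establish ALS tenant context after auth.
    return tenantContextMiddleware(req, res, next);
  })(req, res, next);
};
```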
* feat: tenant-safe bulkWrite wrapper and call-site migration
Mongoose's bulkWrite() does not trigger schema-level middleware hooks,
so the applyTenantIsolation plugin cannot intercept it. This adds a
tenantSafeBulkWrite() utility that injects the current ALS tenant
context into every operation's filter/document before delegating to
native bulkWrite.
Migrates all 8 runtime bulkWrite call sites:
- agentCategory (seedCategories, ensureDefaultCategories)
- conversation (bulkSaveConvos)
- message (bulkSaveMessages)
- file (batchUpdateFiles)
- conversationTag (updateTagsForConversation, bulkIncrementTagCounts)
- aclEntry (bulkWriteAclEntries)
systemGrant.seedSystemGrants is intentionally not migrated — it uses
explicit tenantId: { $exists: false } filters and is exempt from the
isolation plugin.
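A sketch of what such a wrapper might look like, with names taken from this
log; the strict-mode rejection of unknown op types (added in a later commit)
is elided here:

```js
const { getTenantId } = require('@librechat/data-schemas'); // assumed export

/** Injects the ALS tenant into each op before native bulkWrite. Sketch only. */
async function tenantSafeBulkWrite(model, ops, options) {
  const tenantId = getTenantId();
  if (!tenantId) {
    return model.bulkWrite(ops, options); // single-tenant: pass through untouched
  }
  const scoped = ops.map((op) => {
    const [type, spec] = Object.entries(op)[0];
    const next = { ...spec };
    if (next.filter) {
      next.filter = { ...next.filter, tenantId }; // update/delete/replace filters
    }
    if (next.document) {
      next.document = { ...next.document, tenantId }; // insertOne
    }
    if (next.replacement) {
      next.replacement = { ...next.replacement, tenantId }; // replaceOne
    }
    return { [type]: next };
  });
  return model.bulkWrite(scoped, options);
}
```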
* feat: pre-auth tenant middleware and tenant-scoped config cache
Adds preAuthTenantMiddleware that reads X-Tenant-Id from the request
header and wraps downstream in tenantStorage ALS context. Wired onto
/oauth, /api/auth, /api/config, and /api/share — unauthenticated
routes that need tenant scoping before JWT auth runs.
The /api/config cache key is now tenant-scoped
(STARTUP_CONFIG:${tenantId}) so multi-tenant deployments serve the
correct login page config per tenant.
The middleware is intentionally minimal — no subdomain parsing, no
OIDC claim extraction. The private fork's reverse proxy or auth
gateway sets the header.
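Sketched below, folding in the sentinel and format hardening from later
commits in this log; the exports and exact shape are assumptions:

```js
// Assumed exports from the package's tenant utilities.
const { tenantStorage, SYSTEM_TENANT_ID, logger } = require('@librechat/data-schemas');

const TENANT_ID_PATTERN = /^[-A-Za-z0-9_.]{1,128}$/; // illustrative format check

function preAuthTenantMiddleware(req, res, next) {
  const raw = req.get('X-Tenant-Id');
  const tenantId = typeof raw === 'string' ? raw.trim() : '';
  if (!tenantId) {
    return next(); // no header: single-tenant behavior
  }
  if (tenantId === SYSTEM_TENANT_ID || !TENANT_ID_PATTERN.test(tenantId)) {
    // Ignore (not 4xx) and log for bypass-probing observability.
    logger.warn('[preAuthTenant] ignoring invalid X-Tenant-Id', { ip: req.ip });
    return next();
  }
  // Wrap the rest of the request in ALS tenant context.
  return tenantStorage.run({ tenantId }, next);
}
```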
* feat: accept optional tenantId in updateInterfacePermissions
When tenantId is provided, the function re-invokes itself inside
tenantStorage.run({ tenantId }) so all downstream Mongoose queries
target that tenant's roles instead of the system context. This lets
the private fork's tenant provisioning flow call
updateInterfacePermissions per-tenant after creating tenant-scoped
ADMIN/USER roles.
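In sketch form, assuming the signature below (the real function takes more
options); the SYSTEM_TENANT_ID guard comes from a later commit in this log:

```js
// Sketch only — signature and option shape are assumptions.
async function updateInterfacePermissions({ appConfig, tenantId } = {}) {
  if (tenantId && tenantId !== SYSTEM_TENANT_ID) {
    // Re-enter under the target tenant so the role updates below hit that
    // tenant's ADMIN/USER roles; SYSTEM_TENANT_ID falls through to the
    // caller's ALS context.
    return tenantStorage.run({ tenantId }, () => updateInterfacePermissions({ appConfig }));
  }
  // ...existing role/permission reconciliation runs in the current context...
}
```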
* fix: tenant-filter $lookup in getPromptGroup aggregation
The $lookup stage in getPromptGroup() queried the prompts collection
without tenant filtering. While the outer PromptGroup aggregate is
protected by the tenantIsolation plugin's pre('aggregate') hook,
$lookup runs as an internal MongoDB operation that bypasses Mongoose
hooks entirely.
Converts from simple field-based $lookup to pipeline-based $lookup
with an explicit tenantId match when tenant context is active.
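Roughly, the conversion looks like this (collection and field names are
assumptions; the two-stage $match split comes from a later hardening commit):

```js
const tenantId = getTenantId();
const lookupStage = tenantId
  ? {
      $lookup: {
        from: 'prompts',
        let: { groupId: '$_id' },
        pipeline: [
          // Constant tenant match first, kept separate from the $expr join,
          // so MongoDB can use a tenantId index.
          { $match: { tenantId } },
          { $match: { $expr: { $eq: ['$groupId', '$$groupId'] } } },
        ],
        as: 'prompts',
      },
    }
  : // No tenant context: keep the simple form for efficient index seeks (F9).
    { $lookup: { from: 'prompts', localField: '_id', foreignField: 'groupId', as: 'prompts' } };
```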
* fix: replace field-level unique indexes with tenant-scoped compounds
Field-level unique:true creates a globally-unique single-field index in
MongoDB, which would cause insert failures across tenants sharing the
same ID values.
- agent.id: removed field-level unique, added { id, tenantId } compound
- convo.conversationId: removed field-level unique (compound at line 50
already exists: { conversationId, user, tenantId })
- message.messageId: removed field-level unique (compound at line 165
already exists: { messageId, user, tenantId })
- preset.presetId: removed field-level unique, added { presetId, tenantId }
compound
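For one of these schemas the change is roughly as follows (mongoose; exact
field definitions and options assumed):

```js
const { Schema } = require('mongoose');

const presetSchema = new Schema({
  presetId: { type: String, index: true }, // was: unique: true (globally unique)
  tenantId: { type: String },
  // ...other fields...
});
// Uniqueness is now per tenant: two tenants may reuse the same presetId.
presetSchema.index({ presetId: 1, tenantId: 1 }, { unique: true });
```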
* fix: scope MODELS_CONFIG, ENDPOINT_CONFIG, PLUGINS, TOOLS caches by tenant
These caches store per-tenant configuration (available models, endpoint
settings, plugin availability, tool definitions) but were using global
cache keys. In multi-tenant mode, one tenant's cached config would be
served to all tenants.
Appends :${tenantId} to cache keys when tenant context is active.
Falls back to the unscoped key when no tenant context exists (backward
compatible for single-tenant OSS deployments).
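The per-site pattern then looks roughly like this, with scopedCacheKey
behaving as described above (and as extracted into a shared helper in a
later commit):

```js
const { scopedCacheKey } = require('@librechat/data-schemas');
const { CacheKeys } = require('librechat-data-provider');
const getLogStores = require('~/cache/getLogStores');

async function getCachedModelsConfig() {
  const cache = getLogStores(CacheKeys.CONFIG_STORE);
  // Tenant 'acme' reads MODELS_CONFIG:acme; with no tenant context (or in
  // system context, per a later fix) the bare MODELS_CONFIG key is used.
  return cache.get(scopedCacheKey(CacheKeys.MODELS_CONFIG));
}
```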
Covers all read, write, and delete sites:
- ModelController.js: get/set MODELS_CONFIG
- PluginController.js: get/set PLUGINS, get/set TOOLS
- getEndpointsConfig.js: get/set/delete ENDPOINT_CONFIG
- app.js: delete ENDPOINT_CONFIG (clearEndpointConfigCache)
- mcp.js: delete TOOLS (updateMCPTools, mergeAppTools)
- importers.js: get ENDPOINT_CONFIG
* fix: add getTenantId to PluginController spec mock
The data-schemas mock was missing getTenantId, causing all
PluginController tests to throw when the controller calls
getTenantId() for tenant-scoped cache keys.
* fix: address review findings — migration, strict-mode, DRY, types
Addresses all CRITICAL, MAJOR, and MINOR review findings:
F1 (CRITICAL): Add agents, conversations, messages, presets to
SUPERSEDED_INDEXES in tenantIndexes.ts so dropSupersededTenantIndexes()
drops the old single-field unique indexes that block multi-tenant inserts.
F2 (CRITICAL): Unknown bulkWrite op types now throw in strict mode
instead of silently passing through without tenant injection.
F3 (MAJOR): Replace wildcard export with named export for
tenantSafeBulkWrite, hiding _resetBulkWriteStrictCache from the
public package API.
F5 (MAJOR): Restore AnyBulkWriteOperation<IAclEntry>[] typing on
bulkWriteAclEntries — the unparameterized wrapper accepts parameterized
ops as a subtype.
F7 (MAJOR): Fix config.js tenant precedence — JWT-derived
req.user.tenantId now takes priority over the X-Tenant-Id header for
authenticated requests.
F8 (MINOR): Extract scopedCacheKey() helper into tenantContext.ts and
replace all 11 inline occurrences across 7 files.
F9 (MINOR): Use simple localField/foreignField $lookup for the
non-tenant getPromptGroup path (more efficient index seeks).
F12 (NIT): Remove redundant BulkOp type alias.
F13 (NIT): Remove debug log that leaked raw tenantId.
* fix: add new superseded indexes to tenantIndexes test fixture
The test creates old indexes to verify the migration drops them.
Missing fixture entries for agents.id_1, conversations.conversationId_1,
messages.messageId_1, and presets.presetId_1 caused the count assertion
to fail (expected 22, got 18).
* fix: restore logger.warn for unknown bulk op types in non-strict mode
* fix: block SYSTEM_TENANT_ID sentinel from external header input
CRITICAL: preAuthTenantMiddleware accepted any string as X-Tenant-Id,
including '__SYSTEM__'. The tenantIsolation plugin treats SYSTEM_TENANT_ID
as an explicit bypass — skipping ALL query filters. A client sending
X-Tenant-Id: __SYSTEM__ to pre-auth routes (/api/share, /api/config,
/api/auth, /oauth) would execute Mongoose operations without tenant
isolation.
Fixes:
- preAuthTenantMiddleware rejects SYSTEM_TENANT_ID in header
- scopedCacheKey returns the base key (not key:__SYSTEM__) in system
context, preventing stale cache entries during runAsSystem()
- updateInterfacePermissions guards tenantId against SYSTEM_TENANT_ID
- $lookup pipeline separates $expr join from constant tenantId match
for better index utilization
- Regression test for sentinel rejection in preAuthTenant.spec.ts
- Remove redundant getTenantId() call in config.js
* test: add missing deleteMany/replaceOne coverage, fix vacuous ALS assertions
bulkWrite spec:
- deleteMany: verifies tenant-scoped deletion leaves other tenants untouched
- replaceOne: verifies tenantId injected into both filter and replacement
- replaceOne overwrite: verifies a conflicting tenantId in the replacement
document is overwritten by the ALS tenant (defense-in-depth)
- empty ops array: verifies graceful handling
preAuthTenant spec:
- All negative-case tests now use the capturedNext pattern to verify
getTenantId() inside the middleware's execution context rather than the
test runner's outer frame, where it was always undefined and made the
old assertions vacuous
* feat: tenant-isolate MESSAGES cache, FLOWS cache, and GenerationJobManager
MESSAGES cache (streamAudio.js):
- Cache key now uses scopedCacheKey(messageId) to prefix with tenantId,
preventing cross-tenant message content reads during TTS streaming.
FLOWS cache (FlowStateManager):
- getFlowKey() now generates ${type}:${tenantId}:${flowId} when tenant
context is active, isolating OAuth flow state per tenant.
GenerationJobManager:
- tenantId added to SerializableJobData and GenerationJobMetadata
- createJob() captures the current ALS tenant context (excluding
SYSTEM_TENANT_ID) and stores it in job metadata
- SSE subscription endpoint validates job.metadata.tenantId matches
req.user.tenantId, blocking cross-tenant stream access
- Both InMemoryJobStore and RedisJobStore updated to accept tenantId
* fix: add getTenantId and SYSTEM_TENANT_ID to MCP OAuth test mocks
FlowStateManager.getFlowKey() now calls getTenantId() for tenant-scoped
flow keys. The 4 MCP OAuth test files mock @librechat/data-schemas
without these exports, causing TypeError at runtime.
* fix: correct import ordering per AGENTS.md conventions
Package imports sorted shortest to longest line length, local imports
sorted longest to shortest — fixes ordering violations introduced by
our new imports across 8 files.
* fix: deserialize tenantId in RedisJobStore — cross-tenant SSE guard was no-op in Redis mode
serializeJob() writes tenantId to the Redis hash via Object.entries,
but deserializeJob() manually enumerates fields and omitted tenantId.
Every getJob() from Redis returned tenantId: undefined, causing the
SSE route's cross-tenant guard to short-circuit (undefined && ... → false).
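The omission and its fix, in sketch form (field list abridged; the real
hash carries more fields):

```js
function deserializeJob(hash) {
  return {
    jobId: hash.jobId,
    userId: hash.userId,
    status: hash.status,
    // The missing line: without it, every Redis-backed getJob() returned
    // tenantId: undefined and the SSE guard short-circuited to "allow".
    tenantId: hash.tenantId,
    // ...remaining fields...
  };
}
```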
* test: SSE tenant guard, FlowStateManager key consistency, ALS scope docs
SSE stream tenant tests (streamTenant.spec.js):
- Cross-tenant user accessing another tenant's stream → 403
- Same-tenant user accessing own stream → allowed
- OSS mode (no tenantId on job) → tenant check skipped
FlowStateManager tenant tests (manager.tenant.spec.ts):
- completeFlow finds flow created under same tenant context
- completeFlow does NOT find flow under different tenant context
- Unscoped flows are separate from tenant-scoped flows
Documentation:
- JSDoc on getFlowKey documenting ALS context consistency requirement
- Comment on streamAudio.js scopedCacheKey capture site
* fix: SSE stream tests hang on success path, remove internal fork references
The success-path tests entered the SSE streaming code which never
closes, causing timeout. Mock subscribe() to end the response
immediately. Restructured assertions to verify non-403/non-404.
Removed "private fork" and "OSS" references from code and test
descriptions — replaced with "deployment layer", "multi-tenant
deployments", and "single-tenant mode".
* fix: address review findings — test rigor, tenant ID validation, docs
F1: SSE stream tests now mock subscribe() with correct signature
(streamId, writeEvent, onDone, onError) and assert 200 status,
verifying the tenant guard actually allows through same-tenant users.
F2: completeFlow logs the attempted key and ALS tenantId when flow
is not found, so reverse proxy misconfiguration (missing X-Tenant-Id
on OAuth callback) produces an actionable warning.
F3/F10: preAuthTenantMiddleware validates tenant ID format — rejects
colons, special characters, and values exceeding 128 chars. Trims
whitespace. Prevents cache key collisions via crafted headers.
F4: Documented cache invalidation scope limitation in
clearEndpointConfigCache — only the calling tenant's key is cleared;
other tenants expire via TTL.
F7: getFlowKey JSDoc now lists all 8 methods requiring consistent
ALS context.
F8: Added dedicated scopedCacheKey unit tests — base key without
context, base key in system context, scoped key with tenant, no
ALS leakage across scope boundaries.
* fix: revert flow key tenant scoping, fix SSE test timing
FlowStateManager: Reverts tenant-scoped flow keys. OAuth callbacks
arrive without tenant ALS context (provider redirects don't carry
X-Tenant-Id), so completeFlow/failFlow would never find flows
created under tenant context. Flow IDs are random UUIDs with no
collision risk, and flow data is ephemeral (TTL-bounded).
SSE tests: Use process.nextTick for onDone callback so Express
response headers are flushed before res.write/res.end are called.
* fix: restore getTenantId import for completeFlow diagnostic log
* fix: correct completeFlow warning message, add missing flow test
The warning referenced X-Tenant-Id header consistency which was only
relevant when flow keys were tenant-scoped (since reverted). Updated
to list actual causes: TTL expiry, missing flow, or routing to a
different instance without shared Keyv storage.
Removed the getTenantId() call and import — no longer needed since
flow keys are unscoped.
Added test for the !flowState branch in completeFlow — verifies
return false and logger.warn on nonexistent flow ID.
* fix: add explicit return type to recursive updateInterfacePermissions
The recursive call (tenantId branch calls itself without tenantId)
causes TypeScript to infer circular return type 'any'. Adding
explicit Promise<void> satisfies the rollup typescript plugin.
* fix: update MCPOAuthRaceCondition test to match new completeFlow warning
* fix: clearEndpointConfigCache deletes both scoped and unscoped keys
Unauthenticated /api/endpoints requests populate the unscoped
ENDPOINT_CONFIG key. Admin config mutations clear only the
tenant-scoped key, leaving the unscoped entry stale indefinitely.
Now deletes both when in tenant context.
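In sketch form (the Promise.all parallelization lands in a later commit in
this log; the cache handle shape is assumed):

```js
async function clearEndpointConfigCache(cache) {
  // Either key may have been populated, depending on whether the request
  // that warmed the cache was authenticated.
  await Promise.all([
    cache.delete(scopedCacheKey(CacheKeys.ENDPOINT_CONFIG)), // tenant-scoped
    cache.delete(CacheKeys.ENDPOINT_CONFIG), // unscoped
  ]);
}
```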
* fix: tenant guard on abort/status endpoints, warn logs, test coverage
F1: Add tenant guard to /chat/status/:conversationId and /chat/abort
matching the existing guard on /chat/stream/:streamId. The status
endpoint exposes aggregatedContent (AI response text) which requires
tenant-level access control.
F2: preAuthTenantMiddleware now logs warn for rejected __SYSTEM__
sentinel and malformed tenant IDs, providing observability for
bypass probing attempts.
F3: Abort fallback path (getActiveJobIdsForUser) now has tenant
check after resolving the job.
F4: Test for strict mode + SYSTEM_TENANT_ID — verifies runAsSystem
bypasses tenantSafeBulkWrite without throwing in strict mode.
F5: Test for job with tenantId + user without tenantId → 403.
F10: Regex uses idiomatic hyphen-at-start form.
F11: Test descriptions changed from "rejects" to "ignores" since
middleware calls next() (not 4xx).
Also fixes MCPOAuthRaceCondition test assertion to match updated
completeFlow warning message.
* fix: test coverage for logger.warn, status/abort guards, consistency
A: preAuthTenant spec now mocks logger and asserts warn calls for
__SYSTEM__ sentinel, malformed characters, and oversized headers.
B: streamTenant spec expanded with status and abort endpoint tests —
cross-tenant status returns 403, same-tenant returns 200 with body,
cross-tenant abort returns 403.
C: Abort endpoint uses req.user.tenantId (not req.user?.tenantId)
matching stream/status pattern — requireJwtAuth guarantees req.user.
D: Malformed header warning now includes ip in log metadata,
matching the sentinel warning for consistent SOC correlation.
* fix: assert ip field in malformed header warn tests
* fix: parallelize cache deletes, document tenant guard, fix import order
- clearEndpointConfigCache uses Promise.all for independent cache
deletes instead of sequential awaits
- SSE stream tenant guard has inline comment explaining backward-compat
behavior for untenanted legacy jobs
- conversation.ts local imports reordered longest-to-shortest per
AGENTS.md
* fix: tenant-qualify userJobs keys, document tenant guard backward-compat
Job store userJobs keys now include tenantId when available:
- Redis: stream:user:{tenantId:userId}:jobs (falls back to
stream:user:{userId}:jobs when no tenant)
- InMemory: composite key tenantId:userId in userJobMap
getActiveJobIdsByUser/getActiveJobIdsForUser accept optional tenantId
parameter, threaded through from req.user.tenantId at all call sites
(/chat/active and /chat/abort fallback).
Added inline comments on all three SSE tenant guards explaining the
backward-compat design: untenanted legacy jobs remain accessible
when the userId check passes.
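The shared key derivation, roughly (helper name hypothetical):

```js
// Hypothetical helper; both stores derive the same composite key.
function userJobsKey(userId, tenantId) {
  return tenantId ? `${tenantId}:${userId}` : `${userId}`;
}

// Redis:    stream:user:${userJobsKey(userId, tenantId)}:jobs
// InMemory: userJobMap.get(userJobsKey(userId, tenantId)) -> Set<jobId>
```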
* fix: use tenant-qualified key in empty-set cleanup, document cache staleness
Fix InMemoryJobStore.getActiveJobIdsByUser empty-set cleanup to use
the tenant-qualified userKey instead of bare userId — prevents
orphaned empty Sets accumulating in userJobMap for multi-tenant users.
Document cross-tenant staleness in clearEndpointConfigCache JSDoc —
other tenants' scoped keys expire via TTL, not active invalidation.
* fix: cleanup userJobMap leak, startup warning, DRY tenant guard, docs
F1: InMemoryJobStore.cleanup() now removes entries from userJobMap
before calling deleteJob, preventing orphaned empty Sets from
accumulating with tenant-qualified composite keys.
F2: Startup warning when TENANT_ISOLATION_STRICT is active — reminds
operators to configure reverse proxy to control X-Tenant-Id header.
F3: mergeAppTools JSDoc documents that tenant-scoped TOOLS keys are
not actively invalidated (matching clearEndpointConfigCache pattern).
F5: Abort handler getActiveJobIdsForUser call uses req.user.tenantId
(not req.user?.tenantId) — consistent with stream/status handlers.
F6: updateInterfacePermissions JSDoc clarifies SYSTEM_TENANT_ID
behavior — falls through to caller's ALS context.
F7: Extracted hasTenantMismatch() helper, replacing three identical
inline tenant guard blocks across stream/status/abort endpoints (see
the sketch after this list).
F9: scopedCacheKey JSDoc documents both passthrough cases (no context
and SYSTEM_TENANT_ID context).
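The extracted guard (F7) might look like this; the job and user shapes are
assumptions based on earlier commits in this log:

```js
/**
 * Hypothetical shape of the extracted helper: a job with a tenantId is only
 * visible to users of that tenant; untenanted legacy jobs stay accessible
 * when the userId check passes (backward compat).
 */
function hasTenantMismatch(job, user) {
  return Boolean(job?.metadata?.tenantId) && job.metadata.tenantId !== user.tenantId;
}

// Usage in the stream/status/abort handlers:
// if (hasTenantMismatch(job, req.user)) { return res.status(403).end(); }
```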
* fix: clean userJobMap in evictOldest — same leak as cleanup()
importers.js · 645 lines · 22 KiB · JavaScript
const { v4: uuidv4 } = require('uuid');
const { logger, scopedCacheKey } = require('@librechat/data-schemas');
const { EModelEndpoint, Constants, openAISettings, CacheKeys } = require('librechat-data-provider');
const { createImportBatchBuilder } = require('./importBatchBuilder');
const { cloneMessagesWithTimestamps } = require('./fork');
const getLogStores = require('~/cache/getLogStores');

/**
 * Returns the appropriate importer function based on the provided JSON data.
 *
 * @param {Object} jsonData - The JSON data to import.
 * @returns {Function} - The importer function.
 * @throws {Error} - If the import type is not supported.
 */
function getImporter(jsonData) {
  // For array-based formats (ChatGPT or Claude)
  if (Array.isArray(jsonData)) {
    // Claude format has chat_messages array in each conversation
    if (jsonData.length > 0 && jsonData[0]?.chat_messages) {
      logger.info('Importing Claude conversation');
      return importClaudeConvo;
    }
    // ChatGPT format has mapping object in each conversation
    logger.info('Importing ChatGPT conversation');
    return importChatGptConvo;
  }

  // For ChatbotUI
  if (jsonData.version && Array.isArray(jsonData.history)) {
    logger.info('Importing ChatbotUI conversation');
    return importChatBotUiConvo;
  }

  // For LibreChat
  if (jsonData.conversationId && (jsonData.messagesTree || jsonData.messages)) {
    logger.info('Importing LibreChat conversation');
    return importLibreChatConvo;
  }

  throw new Error('Unsupported import type');
}

/**
 * Imports a chatbot-ui V1 conversation from a JSON file and saves it to the database.
 *
 * @param {Object} jsonData - The JSON data containing the chatbot conversation.
 * @param {string} requestUserId - The ID of the user making the import request.
 * @param {Function} [builderFactory=createImportBatchBuilder] - The factory function to create an import batch builder.
 * @returns {Promise<void>} - A promise that resolves when the import is complete.
 * @throws {Error} - If there is an error creating the conversation from the JSON file.
 */
async function importChatBotUiConvo(
  jsonData,
  requestUserId,
  builderFactory = createImportBatchBuilder,
) {
  // This has been tested with the chatbot-ui V1 export: https://github.com/mckaywrigley/chatbot-ui/tree/b865b0555f53957e96727bc0bbb369c9eaecd83b#legacy-code
  try {
    /** @type {ImportBatchBuilder} */
    const importBatchBuilder = builderFactory(requestUserId);

    for (const historyItem of jsonData.history) {
      importBatchBuilder.startConversation(EModelEndpoint.openAI);
      for (const message of historyItem.messages) {
        if (message.role === 'assistant') {
          importBatchBuilder.addGptMessage(message.content, historyItem.model.id);
        } else if (message.role === 'user') {
          importBatchBuilder.addUserMessage(message.content);
        }
      }
      importBatchBuilder.finishConversation(historyItem.name, new Date());
    }
    await importBatchBuilder.saveBatch();
    logger.info(`user: ${requestUserId} | ChatbotUI conversation imported`);
  } catch (error) {
    logger.error(`user: ${requestUserId} | Error creating conversation from ChatbotUI file`, error);
  }
}

/**
 * Extracts text and thinking content from a Claude message.
 * @param {Object} msg - Claude message object with content array and optional text field.
 * @returns {{textContent: string, thinkingContent: string}} Extracted text and thinking content.
 */
function extractClaudeContent(msg) {
  let textContent = '';
  let thinkingContent = '';

  for (const part of msg.content || []) {
    if (part.type === 'text' && part.text) {
      textContent += part.text;
    } else if (part.type === 'thinking' && part.thinking) {
      thinkingContent += part.thinking;
    }
  }

  // Use the text field as fallback if content array is empty
  if (!textContent && msg.text) {
    textContent = msg.text;
  }

  return { textContent, thinkingContent };
}

/**
 * Imports Claude conversations from provided JSON data.
 * Claude export format: array of conversations with chat_messages array.
 *
 * @param {Array} jsonData - Array of Claude conversation objects to be imported.
 * @param {string} requestUserId - The ID of the user who initiated the import process.
 * @param {Function} builderFactory - Factory function to create a new import batch builder instance.
 * @returns {Promise<void>} Promise that resolves when all conversations have been imported.
 */
async function importClaudeConvo(
  jsonData,
  requestUserId,
  builderFactory = createImportBatchBuilder,
) {
  try {
    const importBatchBuilder = builderFactory(requestUserId);

    for (const conv of jsonData) {
      importBatchBuilder.startConversation(EModelEndpoint.anthropic);

      let lastMessageId = Constants.NO_PARENT;
      let lastTimestamp = null;

      for (const msg of conv.chat_messages || []) {
        const isCreatedByUser = msg.sender === 'human';
        const messageId = uuidv4();

        const { textContent, thinkingContent } = extractClaudeContent(msg);

        // Skip empty messages
        if (!textContent && !thinkingContent) {
          continue;
        }

        // Parse timestamp, fallback to conversation create_time or current time
        const messageTime = msg.created_at || conv.created_at;
        let createdAt = messageTime ? new Date(messageTime) : new Date();

        // Ensure timestamp is after the previous message.
        // Messages are sorted by createdAt and buildTree expects parents to appear before children.
        // This guards against any potential ordering issues in exports.
        if (lastTimestamp && createdAt <= lastTimestamp) {
          createdAt = new Date(lastTimestamp.getTime() + 1);
        }
        lastTimestamp = createdAt;

        const message = {
          messageId,
          parentMessageId: lastMessageId,
          text: textContent,
          sender: isCreatedByUser ? 'user' : 'Claude',
          isCreatedByUser,
          user: requestUserId,
          endpoint: EModelEndpoint.anthropic,
          createdAt,
        };

        // Add content array with thinking if present
        if (thinkingContent && !isCreatedByUser) {
          message.content = [
            { type: 'think', think: thinkingContent },
            { type: 'text', text: textContent },
          ];
        }

        importBatchBuilder.saveMessage(message);
        lastMessageId = messageId;
      }

      const createdAt = conv.created_at ? new Date(conv.created_at) : new Date();
      importBatchBuilder.finishConversation(conv.name || 'Imported Claude Chat', createdAt);
    }

    await importBatchBuilder.saveBatch();
    logger.info(`user: ${requestUserId} | Claude conversation imported`);
  } catch (error) {
    logger.error(`user: ${requestUserId} | Error creating conversation from Claude file`, error);
  }
}

/**
 * Imports a LibreChat conversation from JSON.
 *
 * @param {Object} jsonData - The JSON data representing the conversation.
 * @param {string} requestUserId - The ID of the user making the import request.
 * @param {Function} [builderFactory=createImportBatchBuilder] - The factory function to create an import batch builder.
 * @returns {Promise<void>} - A promise that resolves when the import is complete.
 */
async function importLibreChatConvo(
  jsonData,
  requestUserId,
  builderFactory = createImportBatchBuilder,
) {
  try {
    /** @type {ImportBatchBuilder} */
    const importBatchBuilder = builderFactory(requestUserId);
    const options = jsonData.options || {};

    /* Endpoint configuration */
    let endpoint = jsonData.endpoint ?? options.endpoint ?? EModelEndpoint.openAI;
    const cache = getLogStores(CacheKeys.CONFIG_STORE);
    const endpointsConfig = await cache.get(scopedCacheKey(CacheKeys.ENDPOINT_CONFIG));
    const endpointConfig = endpointsConfig?.[endpoint];
    if (!endpointConfig && endpointsConfig) {
      endpoint = Object.keys(endpointsConfig)[0];
    } else if (!endpointConfig) {
      endpoint = EModelEndpoint.openAI;
    }

    importBatchBuilder.startConversation(endpoint);

    let firstMessageDate = null;

    const messagesToImport = jsonData.messagesTree || jsonData.messages;

    if (jsonData.recursive) {
      /**
       * Flatten the recursive message tree into a flat array
       * @param {TMessage[]} messages
       * @param {string} parentMessageId
       * @param {TMessage[]} flatMessages
       */
      const flattenMessages = (
        messages,
        parentMessageId = Constants.NO_PARENT,
        flatMessages = [],
      ) => {
        for (const message of messages) {
          if (!message.text && !message.content) {
            continue;
          }

          const flatMessage = {
            ...message,
            parentMessageId: parentMessageId,
            children: undefined, // Remove children from flat structure
          };
          flatMessages.push(flatMessage);

          if (!firstMessageDate && message.createdAt) {
            firstMessageDate = new Date(message.createdAt);
          }

          if (message.children && message.children.length > 0) {
            flattenMessages(message.children, message.messageId, flatMessages);
          }
        }
        return flatMessages;
      };

      const flatMessages = flattenMessages(messagesToImport);
      cloneMessagesWithTimestamps(flatMessages, importBatchBuilder);
    } else if (messagesToImport) {
      cloneMessagesWithTimestamps(messagesToImport, importBatchBuilder);
      for (const message of messagesToImport) {
        if (!firstMessageDate && message.createdAt) {
          firstMessageDate = new Date(message.createdAt);
        }
      }
    } else {
      throw new Error('Invalid LibreChat file format');
    }

    // Guard against invalid dates parsed from malformed createdAt values
    // (a strict comparison against the string 'Invalid Date' never matches a Date object)
    if (firstMessageDate && isNaN(firstMessageDate.getTime())) {
      firstMessageDate = null;
    }

    importBatchBuilder.finishConversation(jsonData.title, firstMessageDate ?? new Date(), options);
    await importBatchBuilder.saveBatch();
    logger.debug(`user: ${requestUserId} | Conversation "${jsonData.title}" imported`);
  } catch (error) {
    logger.error(`user: ${requestUserId} | Error creating conversation from LibreChat file`, error);
  }
}

/**
 * Imports ChatGPT conversations from provided JSON data.
 * Initializes the import process by creating a batch builder and processing each conversation in the data.
 *
 * @param {ChatGPTConvo[]} jsonData - Array of conversation objects to be imported.
 * @param {string} requestUserId - The ID of the user who initiated the import process.
 * @param {Function} builderFactory - Factory function to create a new import batch builder instance, defaults to createImportBatchBuilder.
 * @returns {Promise<void>} Promise that resolves when all conversations have been imported.
 */
async function importChatGptConvo(
  jsonData,
  requestUserId,
  builderFactory = createImportBatchBuilder,
) {
  try {
    const importBatchBuilder = builderFactory(requestUserId);
    for (const conv of jsonData) {
      processConversation(conv, importBatchBuilder, requestUserId);
    }
    await importBatchBuilder.saveBatch();
  } catch (error) {
    logger.error(`user: ${requestUserId} | Error creating conversation from imported file`, error);
  }
}

/**
 * Processes a single conversation, adding messages to the batch builder based on author roles and handling text content.
 * It directly manages the addition of messages for different roles and handles citations for assistant messages.
 *
 * @param {ChatGPTConvo} conv - A single conversation object that contains multiple messages and other details.
 * @param {ImportBatchBuilder} importBatchBuilder - The batch builder instance used to manage and batch conversation data.
 * @param {string} requestUserId - The ID of the user who initiated the import process.
 * @returns {void}
 */
function processConversation(conv, importBatchBuilder, requestUserId) {
  importBatchBuilder.startConversation(EModelEndpoint.openAI);

  // Map all message IDs to new UUIDs
  const messageMap = new Map();
  for (const [id, mapping] of Object.entries(conv.mapping)) {
    if (mapping.message && mapping.message.content.content_type) {
      const newMessageId = uuidv4();
      messageMap.set(id, newMessageId);
    }
  }

  /**
   * Finds the nearest valid parent by traversing up through skippable messages
   * (system, reasoning_recap, thoughts). Uses iterative traversal to avoid
   * stack overflow on deep chains of skippable messages.
   *
   * @param {string} startId - The ID of the starting parent message.
   * @returns {string} The ID of the nearest valid parent message.
   */
  const findValidParent = (startId) => {
    const visited = new Set();
    let parentId = startId;

    while (parentId) {
      if (!messageMap.has(parentId) || visited.has(parentId)) {
        return Constants.NO_PARENT;
      }
      visited.add(parentId);

      const parentMapping = conv.mapping[parentId];
      if (!parentMapping?.message) {
        return Constants.NO_PARENT;
      }

      const contentType = parentMapping.message.content?.content_type;
      const shouldSkip =
        parentMapping.message.author?.role === 'system' ||
        contentType === 'reasoning_recap' ||
        contentType === 'thoughts';

      if (!shouldSkip) {
        return messageMap.get(parentId);
      }

      parentId = parentMapping.parent;
    }

    return Constants.NO_PARENT;
  };

  /**
   * Helper function to find thinking content from parent chain (thoughts messages)
   * @param {string} parentId - The ID of the parent message.
   * @param {Set} visited - Set of already-visited IDs to prevent cycles.
   * @returns {Array} The thinking content array (empty if not found).
   */
  const findThinkingContent = (parentId, visited = new Set()) => {
    // Guard against circular references in malformed imports
    if (!parentId || visited.has(parentId)) {
      return [];
    }
    visited.add(parentId);

    const parentMapping = conv.mapping[parentId];
    if (!parentMapping?.message) {
      return [];
    }

    const contentType = parentMapping.message.content?.content_type;

    // If this is a thoughts message, extract the thinking content
    if (contentType === 'thoughts') {
      const thoughts = parentMapping.message.content.thoughts || [];
      const thinkingText = thoughts
        .map((t) => t.content || t.summary || '')
        .filter(Boolean)
        .join('\n\n');

      if (thinkingText) {
        return [{ type: 'think', think: thinkingText }];
      }
      return [];
    }

    // If this is reasoning_recap, look at its parent for thoughts
    if (contentType === 'reasoning_recap') {
      return findThinkingContent(parentMapping.parent, visited);
    }

    return [];
  };

  // Create and save messages using the mapped IDs
  const messages = [];
  for (const [id, mapping] of Object.entries(conv.mapping)) {
    const role = mapping.message?.author?.role;
    if (!mapping.message) {
      messageMap.delete(id);
      continue;
    } else if (role === 'system') {
      // Skip system messages but keep their ID in messageMap for parent references
      continue;
    }

    const contentType = mapping.message.content?.content_type;

    // Skip thoughts messages - they will be merged into the response message
    if (contentType === 'thoughts') {
      continue;
    }

    // Skip reasoning_recap messages (just summaries like "Thought for 44s")
    if (contentType === 'reasoning_recap') {
      continue;
    }

    const newMessageId = messageMap.get(id);
    const parentMessageId = findValidParent(mapping.parent);

    const messageText = formatMessageText(mapping.message);

    const isCreatedByUser = role === 'user';
    let sender = isCreatedByUser ? 'user' : 'assistant';
    const model = mapping.message.metadata?.model_slug || openAISettings.model.default;

    if (!isCreatedByUser) {
      // Extract a display name from the model slug, e.g. `gpt-4o` -> `GPT-4o`
      const gptMatch = model.match(/gpt-(.+)/i);
      if (gptMatch) {
        sender = `GPT-${gptMatch[1]}`;
      } else {
        sender = model || 'assistant';
      }
    }

    // Use create_time from ChatGPT export to ensure proper message ordering
    // For null timestamps, use the conversation's create_time as fallback, or current time as last resort
    const messageTime = mapping.message.create_time || conv.create_time;
    const createdAt = messageTime ? new Date(messageTime * 1000) : new Date();

    const message = {
      messageId: newMessageId,
      parentMessageId,
      text: messageText,
      sender,
      isCreatedByUser,
      model,
      user: requestUserId,
      endpoint: EModelEndpoint.openAI,
      createdAt,
    };

    // For assistant messages, check if there's thinking content in the parent chain
    if (!isCreatedByUser) {
      const thinkingContent = findThinkingContent(mapping.parent);
      if (thinkingContent.length > 0) {
        // Combine thinking content with the text response
        message.content = [...thinkingContent, { type: 'text', text: messageText }];
      }
    }

    messages.push(message);
  }

  const cycleDetected = adjustTimestampsForOrdering(messages);
  if (cycleDetected) {
    breakParentCycles(messages);
  }

  for (const message of messages) {
    importBatchBuilder.saveMessage(message);
  }

  importBatchBuilder.finishConversation(conv.title, new Date(conv.create_time * 1000));
}

/**
 * Processes text content of messages authored by an assistant, inserting citation links as required.
 * Uses citation start and end indices to place links at the correct positions.
 *
 * @param {ChatGPTMessage} messageData - The message data containing metadata about citations.
 * @param {string} messageText - The original text of the message which may be altered by inserting citation links.
 * @returns {string} - The updated message text after processing for citations.
 */
function processAssistantMessage(messageData, messageText) {
  if (!messageText) {
    return messageText;
  }

  const citations = messageData.metadata?.citations ?? [];

  // Process citations from last to first so earlier indices stay valid after each splice
  const sortedCitations = [...citations].sort((a, b) => b.start_ix - a.start_ix);

  let result = messageText;
  for (const citation of sortedCitations) {
    if (
      !citation.metadata?.type ||
      citation.metadata.type !== 'webpage' ||
      typeof citation.start_ix !== 'number' ||
      typeof citation.end_ix !== 'number' ||
      citation.start_ix >= citation.end_ix
    ) {
      continue;
    }

    const replacement = ` ([${citation.metadata.title}](${citation.metadata.url}))`;

    result = result.slice(0, citation.start_ix) + replacement + result.slice(citation.end_ix);
  }

  return result;
}

/**
 * Formats the text content of a message based on its content type and author role.
 * @param {ChatGPTMessage} messageData - The message data.
 * @returns {string} - The formatted message text.
 */
function formatMessageText(messageData) {
  const contentType = messageData.content.content_type;
  const isText = contentType === 'text';
  let messageText = '';

  if (isText && messageData.content.parts) {
    messageText = messageData.content.parts.join(' ');
  } else if (contentType === 'code') {
    messageText = `\`\`\`${messageData.content.language}\n${messageData.content.text}\n\`\`\``;
  } else if (contentType === 'execution_output') {
    messageText = `Execution Output:\n> ${messageData.content.text}`;
  } else if (messageData.content.parts) {
    for (const part of messageData.content.parts) {
      if (typeof part === 'string') {
        messageText += part + ' ';
      } else if (typeof part === 'object') {
        messageText = `\`\`\`json\n${JSON.stringify(part, null, 2)}\n\`\`\`\n`;
      }
    }
    messageText = messageText.trim();
  } else {
    messageText = `\`\`\`json\n${JSON.stringify(messageData.content, null, 2)}\n\`\`\``;
  }

  if (isText && messageData.author.role !== 'user') {
    messageText = processAssistantMessage(messageData, messageText);
  }

  return messageText;
}

/**
 * Adjusts message timestamps to ensure children always come after parents.
 * Messages are sorted by createdAt and buildTree expects parents to appear before children.
 * ChatGPT exports can have slight timestamp inversions (e.g., tool call results
 * arriving a few ms before their parent). Uses multiple passes to handle cascading adjustments.
 * Capped at N passes (where N = message count) to guarantee termination on cyclic graphs.
 *
 * @param {Array} messages - Array of message objects with messageId, parentMessageId, and createdAt.
 * @returns {boolean} True if cyclic parent relationships were detected.
 */
function adjustTimestampsForOrdering(messages) {
  if (messages.length === 0) {
    return false;
  }

  const timestampMap = new Map();
  for (const msg of messages) {
    timestampMap.set(msg.messageId, msg.createdAt);
  }

  let hasChanges = true;
  let remainingPasses = messages.length;
  while (hasChanges && remainingPasses > 0) {
    hasChanges = false;
    remainingPasses--;
    for (const message of messages) {
      if (message.parentMessageId && message.parentMessageId !== Constants.NO_PARENT) {
        const parentTimestamp = timestampMap.get(message.parentMessageId);
        if (parentTimestamp && message.createdAt <= parentTimestamp) {
          message.createdAt = new Date(parentTimestamp.getTime() + 1);
          timestampMap.set(message.messageId, message.createdAt);
          hasChanges = true;
        }
      }
    }
  }

  const cycleDetected = remainingPasses === 0 && hasChanges;
  if (cycleDetected) {
    logger.warn(
      '[importers] Detected cyclic parent relationships while adjusting import timestamps',
    );
  }
  return cycleDetected;
}

/**
 * Severs cyclic parentMessageId back-edges so saved messages form a valid tree.
 * Walks each message's parent chain; if a message is visited twice, its parentMessageId
 * is set to NO_PARENT to break the cycle.
 *
 * @param {Array} messages - Array of message objects with messageId and parentMessageId.
 */
function breakParentCycles(messages) {
  const parentLookup = new Map();
  for (const msg of messages) {
    parentLookup.set(msg.messageId, msg);
  }

  const settled = new Set();
  for (const message of messages) {
    const chain = new Set();
    let current = message;
    while (current && !settled.has(current.messageId)) {
      if (chain.has(current.messageId)) {
        current.parentMessageId = Constants.NO_PARENT;
        break;
      }
      chain.add(current.messageId);
      const parentId = current.parentMessageId;
      if (!parentId || parentId === Constants.NO_PARENT) {
        break;
      }
      current = parentLookup.get(parentId);
    }
    for (const id of chain) {
      settled.add(id);
    }
  }
}

module.exports = { getImporter, processAssistantMessage };