🗂️ feat: Better Persistence for Code Execution Files Between Sessions (#11362)

* refactor: process code output files for re-use (WIP)
* feat: file attachment handling with additional metadata for downloads
* refactor: Update directory path logic for local file saving based on basePath
* refactor: file attachment handling to support TFile type and improve data merging logic
* feat: thread filtering of code-generated files
  - Introduced parentMessageId parameter in addedConvo and initialize functions to enhance thread management.
  - Updated related methods to utilize parentMessageId for retrieving messages and filtering code-generated files by conversation threads.
  - Enhanced type definitions to include parentMessageId in relevant interfaces for better clarity and usage.
* chore: imports/params ordering
* feat: update file model to use messageId for filtering and processing
  - Changed references from 'message' to 'messageId' in file-related methods for consistency.
  - Added messageId field to the file schema and updated related types.
  - Enhanced file processing logic to accommodate the new messageId structure.
* feat: enhance file retrieval methods to support user-uploaded execute_code files
  - Added a new method `getUserCodeFiles` to retrieve user-uploaded execute_code files, excluding code-generated files.
  - Updated existing file retrieval methods to improve filtering logic and handle edge cases.
  - Enhanced thread data extraction to collect both message IDs and file IDs efficiently.
  - Integrated `getUserCodeFiles` into relevant endpoints for better file management in conversations.
* chore: update @librechat/agents package version to 3.0.78 in package-lock.json and related package.json files
* refactor: file processing and retrieval logic
  - Added a fallback mechanism for download URLs when files exceed size limits or cannot be processed locally.
  - Implemented a deduplication strategy for code-generated files based on conversationId and filename to optimize storage.
  - Updated file retrieval methods to ensure proper filtering by messageIds, preventing orphaned files from being included.
  - Introduced comprehensive tests for new thread data extraction functionality, covering edge cases and performance considerations.
* fix: improve file retrieval tests and handling of optional properties
  - Updated tests to safely access optional properties using non-null assertions.
  - Modified test descriptions for clarity regarding the exclusion of execute_code files.
  - Ensured that the retrieval logic correctly reflects the expected outcomes for file queries.
* test: add comprehensive unit tests for processCodeOutput functionality
  - Introduced a new test suite for the processCodeOutput function, covering various scenarios including file retrieval, creation, and processing for both image and non-image files.
  - Implemented mocks for dependencies such as axios, logger, and file models to isolate tests and ensure reliable outcomes.
  - Validated behavior for existing files, new file creation, and error handling, including size limits and fallback mechanisms.
  - Enhanced test coverage for metadata handling and usage increment logic, ensuring robust verification of file processing outcomes.
* test: enhance file size limit enforcement in processCodeOutput tests
  - Introduced a configurable file size limit for tests to improve flexibility and coverage.
  - Mocked the `librechat-data-provider` to allow dynamic adjustment of file size limits during tests.
  - Updated the file size limit enforcement test to validate behavior when files exceed specified limits, ensuring proper fallback to download URLs.
  - Reset file size limit after tests to maintain isolation for subsequent test cases.
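The deduplication and size-limit fallback described in the commit message can be modeled in a few lines. This is a minimal, hypothetical sketch, not LibreChat code: `InMemoryCodeFileStore` and its `save` method are illustrative names, and a `Map` stands in for the MongoDB file collection. It mirrors the rules stated in the diff: records are keyed by `(conversationId, filename)` without `messageId`, an existing `file_id` and `createdAt` are reused, `usage` is incremented, and oversized buffers return a download URL that expires 24 hours later instead of being stored.

```javascript
const { randomUUID } = require('node:crypto');

// Fallback download URLs expire 24 hours (86,400,000 ms) after creation.
const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Hypothetical in-memory stand-in for the file collection, keyed by
 * (conversationId, filename). messageId is deliberately not part of the key.
 */
class InMemoryCodeFileStore {
  constructor() {
    this.files = new Map();
  }

  static key(conversationId, filename) {
    // JSON-encode the pair so no separator character can cause collisions.
    return JSON.stringify([conversationId, filename]);
  }

  save({ id, session_id, name, conversationId, messageId, buffer, fileSizeLimit, now = Date.now() }) {
    // Over the limit: return a download-URL fallback and leave the store untouched.
    if (buffer.length > fileSizeLimit) {
      return {
        filename: name,
        filepath: `/api/files/code/download/${session_id}/${id}`,
        expiresAt: now + DAY_MS,
        conversationId,
        messageId,
      };
    }

    const key = InMemoryCodeFileStore.key(conversationId, name);
    const existing = this.files.get(key);
    const record = {
      // Reuse the existing file_id so repeated runs update a single record.
      file_id: existing ? existing.file_id : randomUUID(),
      filename: name,
      conversationId,
      messageId, // the message that most recently produced this file
      bytes: buffer.length,
      usage: existing ? (existing.usage ?? 0) + 1 : 1,
      createdAt: existing ? existing.createdAt : now,
      updatedAt: now,
      // Identifies the file in the code environment for later re-upload.
      metadata: { fileIdentifier: `${session_id}/${id}` },
    };
    this.files.set(key, record);
    return record;
  }
}
```

Saving `chart.png` twice in the same conversation yields one record with `usage` 2 and an unchanged `createdAt`; the same filename in another conversation gets its own `file_id`. As the JSDoc in the diff notes, this trades version history for storage efficiency.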
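The thread-filtering bullets (walk `parentMessageId`, collect message IDs and file IDs, then filter code-generated files by `messageId`) can be illustrated with the sketch below. It is hypothetical: `extractThreadData`, `filterCodeFilesByThread`, and the `{ messageId, parentMessageId, files }` message shape are assumptions for illustration, not the helpers added in this commit. Treating files without a `messageId` as orphans to be excluded is also an assumption.

```javascript
/**
 * Hypothetical sketch: walk from a message back toward the root via
 * parentMessageId, collecting message IDs and attached file IDs on that branch.
 * Sibling branches (e.g. regenerated responses) are never visited.
 */
function extractThreadData(messages, parentMessageId) {
  const byId = new Map(messages.map((m) => [m.messageId, m]));
  const messageIds = new Set();
  const fileIds = new Set();
  let currentId = parentMessageId;
  // Stop at the root (parent not found) or if a cycle is detected.
  while (currentId && byId.has(currentId) && !messageIds.has(currentId)) {
    const message = byId.get(currentId);
    messageIds.add(currentId);
    for (const file of message.files ?? []) {
      if (file?.file_id) {
        fileIds.add(file.file_id);
      }
    }
    currentId = message.parentMessageId;
  }
  return { messageIds, fileIds };
}

/** Keep only code-generated files produced by a message on the current branch. */
function filterCodeFilesByThread(codeFiles, messageIds) {
  // Files with no messageId (orphans) are excluded.
  return codeFiles.filter((f) => Boolean(f.messageId) && messageIds.has(f.messageId));
}
```

Under these assumptions, a file generated on a sibling branch is not offered to the code environment, even if it shares the conversation.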
2026-01-20 09:16:13 +01:00 · 2026-01-16 10:06:24 -05:00 · 2026-01-16 10:06:24 -05:00 · cc32895d13
commit cc32895d13
parent fe32cbedf9
22 changed files with 1364 additions and 83 deletions
--- a/api/server/services/Files/Code/process.js
+++ b/api/server/services/Files/Code/process.js
@@ -6,27 +6,112 @@ const { getCodeBaseURL } = require('@librechat/agents');
 const { logAxiosError, getBasePath } = require('@librechat/api');
 const {
  Tools,
+  megabyte,
+  fileConfig,
  FileContext,
  FileSources,
  imageExtRegex,
+  inferMimeType,
  EToolResources,
+  EModelEndpoint,
+  mergeFileConfig,
+  getEndpointFileConfig,
 } = require('librechat-data-provider');
 const { filterFilesByAgentAccess } = require('~/server/services/Files/permissions');
 const { getStrategyFunctions } = require('~/server/services/Files/strategies');
 const { convertImage } = require('~/server/services/Files/images/convert');
 const { createFile, getFiles, updateFile } = require('~/models');
+const { determineFileType } = require('~/server/utils');

 /**
- * Process OpenAI image files, convert to target format, save and return file metadata.
+ * Creates a fallback download-URL response when the file cannot be stored.
+ * Used when the file exceeds the size limit, the storage strategy has no `saveBuffer`, or downloading/processing fails.
+ * @param {Object} params - The parameters.
+ * @param {string} params.name - The filename.
+ * @param {string} params.session_id - The code execution session ID.
+ * @param {string} params.id - The file ID from the code environment.
+ * @param {string} params.conversationId - The current conversation ID.
+ * @param {string} params.toolCallId - The tool call ID that generated the file.
+ * @param {string} params.messageId - The current message ID.
+ * @param {number} params.expiresAt - Expiration timestamp in epoch milliseconds (callers pass 24 hours from now).
+ * @returns {Object} Fallback response with download URL.
+ */
+const createDownloadFallback = ({
+  id,
+  name,
+  messageId,
+  expiresAt,
+  session_id,
+  toolCallId,
+  conversationId,
+}) => {
+  const basePath = getBasePath();
+  return {
+    filename: name,
+    filepath: `${basePath}/api/files/code/download/${session_id}/${id}`,
+    expiresAt,
+    conversationId,
+    toolCallId,
+    messageId,
+  };
+};
+
+/**
+ * Find an existing code-generated file by filename in the conversation.
+ * Used to update existing files instead of creating duplicates.
+ *
+ * ## Deduplication Strategy
+ *
+ * Files are deduplicated by `(conversationId, filename)` - NOT including `messageId`.
+ * This is an intentional design decision to handle iterative code development patterns:
+ *
+ * **Rationale:**
+ * - When users iteratively refine code (e.g., "regenerate that chart with red bars"),
+ *   the same logical file (e.g., "chart.png") is produced multiple times
+ * - Without deduplication, each iteration would create a new file, leading to storage bloat
+ * - The latest version is what matters for re-upload to the code environment
+ *
+ * **Implications:**
+ * - Different messages producing files with the same name will update the same file record
+ * - The `messageId` field tracks which message last updated the file
+ * - The `usage` counter tracks how many times the file has been generated
+ *
+ * **Future Considerations:**
+ * - If file versioning is needed, consider adding a `versions` array or separate version collection
+ * - The current approach prioritizes storage efficiency over history preservation
+ *
+ * @param {string} filename - The filename to search for.
+ * @param {string} conversationId - The conversation ID.
+ * @returns {Promise<MongoFile | null>} The existing file or null.
+ */
+const findExistingCodeFile = async (filename, conversationId) => {
+  if (!filename || !conversationId) {
+    return null;
+  }
+  const files = await getFiles(
+    {
+      filename,
+      conversationId,
+      context: FileContext.execute_code,
+    },
+    { createdAt: -1 },
+    { text: 0 },
+  );
+  return files?.[0] ?? null;
+};
+
+/**
+ * Process code execution output files: downloads and saves both image and non-image files.
+ * Files are saved via the configured file strategy with `fileIdentifier` metadata for re-upload to the code env.
 * @param {ServerRequest} params.req - The Express request object.
- * @param {string} params.id - The file ID.
+ * @param {string} params.id - The file ID from the code environment.
 * @param {string} params.name - The filename.
 * @param {string} params.apiKey - The code execution API key.
 * @param {string} params.toolCallId - The tool call ID that generated the file.
 * @param {string} params.session_id - The code execution session ID.
 * @param {string} params.conversationId - The current conversation ID.
 * @param {string} params.messageId - The current message ID.
- * @returns {Promise<MongoFile & { messageId: string, toolCallId: string } | { filename: string; filepath: string; expiresAt: number; conversationId: string; toolCallId: string; messageId: string } | undefined>} The file metadata or undefined if an error occurs.
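+ * @returns {Promise<MongoFile & { messageId: string, toolCallId: string } | { filename: string; filepath: string; expiresAt: number; conversationId: string; toolCallId: string; messageId: string }>} The file metadata, or a download-URL fallback if the file cannot be stored.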
 */
 const processCodeOutput = async ({
  req,
@@ -41,19 +126,15 @@ const processCodeOutput = async ({
  const appConfig = req.config;
  const currentDate = new Date();
  const baseURL = getCodeBaseURL();
-  const basePath = getBasePath();
-  const fileExt = path.extname(name);
-  if (!fileExt || !imageExtRegex.test(name)) {
-    return {
-      filename: name,
-      filepath: `${basePath}/api/files/code/download/${session_id}/${id}`,
-      /** Note: expires 24 hours after creation */
-      expiresAt: currentDate.getTime() + 86400000,
-      conversationId,
-      toolCallId,
-      messageId,
-    };
-  }
+  const fileExt = path.extname(name).toLowerCase();
+  const isImage = fileExt && imageExtRegex.test(name);
+
+  const mergedFileConfig = mergeFileConfig(appConfig.fileConfig);
+  const endpointFileConfig = getEndpointFileConfig({
+    fileConfig: mergedFileConfig,
+    endpoint: EModelEndpoint.agents,
+  });
+  const fileSizeLimit = endpointFileConfig.fileSizeLimit ?? mergedFileConfig.serverFileSizeLimit;

  try {
    const formattedDate = currentDate.toISOString();
@@ -70,29 +151,135 @@ const processCodeOutput = async ({

    const buffer = Buffer.from(response.data, 'binary');

-    const file_id = v4();
-    const _file = await convertImage(req, buffer, 'high', `${file_id}${fileExt}`);
+    // Enforce file size limit
+    if (buffer.length > fileSizeLimit) {
+      logger.warn(
+        `[processCodeOutput] File "${name}" (${(buffer.length / megabyte).toFixed(2)} MB) exceeds size limit of ${(fileSizeLimit / megabyte).toFixed(2)} MB, falling back to download URL`,
+      );
+      return createDownloadFallback({
+        id,
+        name,
+        messageId,
+        toolCallId,
+        session_id,
+        conversationId,
+        expiresAt: currentDate.getTime() + 86400000,
+      });
+    }
+
+    const fileIdentifier = `${session_id}/${id}`;
+
+    /**
+     * Check for existing file with same filename in this conversation.
+     * If found, we'll update it instead of creating a duplicate.
+     */
+    const existingFile = await findExistingCodeFile(name, conversationId);
+    const file_id = existingFile?.file_id ?? v4();
+    const isUpdate = !!existingFile;
+
+    if (isUpdate) {
+      logger.debug(
+        `[processCodeOutput] Updating existing file "${name}" (${file_id}) instead of creating duplicate`,
+      );
+    }
+
+    if (isImage) {
+      const _file = await convertImage(req, buffer, 'high', `${file_id}${fileExt}`);
+      const file = {
+        ..._file,
+        file_id,
+        messageId,
+        usage: isUpdate ? (existingFile.usage ?? 0) + 1 : 1,
+        filename: name,
+        conversationId,
+        user: req.user.id,
+        type: `image/${appConfig.imageOutputType}`,
+        createdAt: isUpdate ? existingFile.createdAt : formattedDate,
+        updatedAt: formattedDate,
+        source: appConfig.fileStrategy,
+        context: FileContext.execute_code,
+        metadata: { fileIdentifier },
+      };
+      createFile(file, true);
+      return Object.assign(file, { messageId, toolCallId });
+    }
+
+    // For non-image files, save to configured storage strategy
+    const { saveBuffer } = getStrategyFunctions(appConfig.fileStrategy);
+    if (!saveBuffer) {
+      logger.warn(
+        `[processCodeOutput] saveBuffer not available for strategy ${appConfig.fileStrategy}, falling back to download URL`,
+      );
+      return createDownloadFallback({
+        id,
+        name,
+        messageId,
+        toolCallId,
+        session_id,
+        conversationId,
+        expiresAt: currentDate.getTime() + 86400000,
+      });
+    }
+
+    // Determine MIME type from buffer or extension
+    const detectedType = await determineFileType(buffer, true);
+    const mimeType = detectedType?.mime || inferMimeType(name, '') || 'application/octet-stream';
+
+    /** Check MIME type support - for code-generated files, we're lenient but log unsupported types */
+    const isSupportedMimeType = fileConfig.checkType(
+      mimeType,
+      endpointFileConfig.supportedMimeTypes,
+    );
+    if (!isSupportedMimeType) {
+      logger.warn(
+        `[processCodeOutput] File "${name}" has unsupported MIME type "${mimeType}", proceeding with storage but may not be usable as tool resource`,
+      );
+    }
+
+    const fileName = `${file_id}__${name}`;
+    const filepath = await saveBuffer({
+      userId: req.user.id,
+      buffer,
+      fileName,
+      basePath: 'uploads',
+    });
+
    const file = {
-      ..._file,
      file_id,
-      usage: 1,
+      filepath,
+      messageId,
+      object: 'file',
      filename: name,
+      type: mimeType,
      conversationId,
      user: req.user.id,
-      type: `image/${appConfig.imageOutputType}`,
-      createdAt: formattedDate,
+      bytes: buffer.length,
      updatedAt: formattedDate,
+      metadata: { fileIdentifier },
      source: appConfig.fileStrategy,
      context: FileContext.execute_code,
+      usage: isUpdate ? (existingFile.usage ?? 0) + 1 : 1,
+      createdAt: isUpdate ? existingFile.createdAt : formattedDate,
    };
+
    createFile(file, true);
-    /** Note: `messageId` & `toolCallId` are not part of file DB schema; message object records associated file ID */
    return Object.assign(file, { messageId, toolCallId });
  } catch (error) {
    logAxiosError({
-      message: 'Error downloading code environment file',
+      message: 'Error downloading/processing code environment file',
      error,
    });
+
+    // Fallback for download errors - return download URL so user can still manually download
+    return createDownloadFallback({
+      id,
+      name,
+      messageId,
+      toolCallId,
+      session_id,
+      conversationId,
+      expiresAt: currentDate.getTime() + 86400000,
+    });
  }
 };

@@ -204,9 +391,16 @@ const primeFiles = async (options, apiKey) => {
        if (!toolContext) {
          toolContext = `- Note: The following files are available in the "${Tools.execute_code}" tool environment:`;
        }
-        toolContext += `\n\t- /mnt/data/${file.filename}${
-          agentResourceIds.has(file.file_id) ? '' : ' (just attached by user)'
-        }`;
+
+        let fileSuffix = '';
+        if (!agentResourceIds.has(file.file_id)) {
+          fileSuffix =
+            file.context === FileContext.execute_code
+              ? ' (from previous code execution)'
+              : ' (attached by user)';
+        }
+
+        toolContext += `\n\t- /mnt/data/${file.filename}${fileSuffix}`;
        files.push({
          id,
          session_id,