📎 feat: Direct Provider Attachment Support for Multimodal Content (#9994)

* 📎 feat: Direct Provider Attachment Support for Multimodal Content

* 📑 feat: Anthropic Direct Provider Upload (#9072)

* feat: implement Anthropic native PDF support with document preservation

- Add comprehensive debug logging throughout PDF processing pipeline
- Refactor attachment processing to separate image and document handling
- Create distinct addImageURLs(), addDocuments(), and processAttachments() methods
- Fix critical bugs in stream handling and parameter passing
- Add streamToBuffer utility for proper stream-to-buffer conversion (see the sketch below)
- Remove api/agents submodule from repository
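
As a rough illustration of the streamToBuffer utility mentioned above, here is a minimal sketch of the common Node.js pattern; the actual implementation in the repo may differ:

const streamToBuffer = async (stream) => {
  const chunks = [];
  // Async-iterate the readable stream and collect its chunks.
  for await (const chunk of stream) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  // Concatenate into a single Buffer for downstream encoding (e.g., base64).
  return Buffer.concat(chunks);
};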

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: remove out of scope formatting changes

* fix: stop duplication of file in chat on end of response stream

* chore: bring back file search and ocr options

* chore: localize upload to provider string in file menu

* refactor: change createMenuItems args to fit new pattern introduced by anthropic-native-pdf-support

* feat: add cache point for PDFs processed by the anthropic endpoint, since they are unlikely to change and should benefit from caching

* feat: combine Upload Image into Upload to Provider, since both perform a direct upload, and change the provider upload icon to reflect multimodal upload

* feat: add citations support according to docs
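
For context, a hedged sketch of the document content block the Anthropic Messages API documents for PDFs, combining the cache point and citations mentioned above; values are illustrative and the exact shape LibreChat builds may differ:

const fs = require('fs');

// Illustrative path; in LibreChat the bytes come from the configured file strategy.
const pdfBase64 = fs.readFileSync('report.pdf').toString('base64');

const anthropicDocumentPart = {
  type: 'document',
  source: {
    type: 'base64',
    media_type: 'application/pdf',
    data: pdfBase64,
  },
  // Citations support, per the Anthropic docs referenced above.
  citations: { enabled: true },
  // Cache point for PDFs, since they rarely change between turns.
  cache_control: { type: 'ephemeral' },
};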

* refactor: remove redundant 'document' check since documents are handled properly by formatMessage in the agents repo now

* refactor: change upload logic so the anthropic endpoint isn't exempted from the normal Agents upload path, for consistency with the rest of the upload logic

* fix: include width and height in return from uploadLocalFile so images are correctly identified when going through an AgentUpload in addImageURLs

* chore: remove client specific handling since the direct provider stuff is handled by the agent client

* feat: handle documents in AgentClient so no need for change to agents repo

* chore: removed unused changes

* chore: remove auto generated comments from OG commit

* feat: add logic for agents to use direct to provider uploads if supported (currently just anthropic)

* fix: reintroduce role check to fix render error because of undefined value for Content Part

* fix: actually fix render bug by using proper isCreatedByUser check and making sure our mutation of formattedMessage.content is consistent

---------

Co-authored-by: Andres Restrepo <andres@thelinuxkid.com>
Co-authored-by: Claude <noreply@anthropic.com>

📁 feat: Send Attachments Directly to Provider (OpenAI) (#9098)

* refactor: change references from direct upload to direct attach to better reflect functionality

Since we now use a base64 encoding strategy rather than the Files/File API to send attachments directly to the provider, the upload nomenclature no longer makes sense. direct_attach better describes the different methods of sending attachments to providers, even if we later introduce direct upload support.
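
In other words, "direct attach" means the file bytes travel inline with the chat request as base64 instead of being uploaded to a provider Files API first. A minimal sketch of that encoding step, with illustrative names and paths rather than the project's actual helpers:

const fs = require('fs');

// Illustrative: read the stored attachment and inline its bytes as base64.
const buffer = fs.readFileSync('/uploads/user-id/report.pdf');
const base64Data = buffer.toString('base64');
const dataUrl = `data:application/pdf;base64,${base64Data}`;
// dataUrl (or the raw base64 string, depending on the provider) is embedded
// directly in the message content instead of a file_id from a Files API upload.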

* feat: add upload to provider option for openai (and agent) ui

* chore: move anthropic pdf validator over to packages/api

* feat: simple pdf validation according to openai docs

* feat: add provider agnostic validatePdf logic to start handling multiple endpoints
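
A hedged sketch of what such provider-agnostic validation could look like. The limits shown are assumptions drawn from the providers' public PDF-input docs, not necessarily the values used in the codebase:

// Assumed per-provider PDF limits (illustrative; verify against provider docs and the repo).
const PDF_LIMITS = {
  anthropic: { maxSizeMB: 32, maxPages: 100 },
  openAI: { maxSizeMB: 32, maxPages: 100 },
};

function validatePdf({ provider, sizeBytes, pageCount }) {
  const limits = PDF_LIMITS[provider];
  if (!limits) {
    return { valid: false, reason: `PDF attachments not supported for ${provider}` };
  }
  if (sizeBytes > limits.maxSizeMB * 1024 * 1024) {
    return { valid: false, reason: `PDF exceeds ${limits.maxSizeMB} MB limit` };
  }
  if (pageCount != null && pageCount > limits.maxPages) {
    return { valid: false, reason: `PDF exceeds ${limits.maxPages} page limit` };
  }
  return { valid: true };
}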

* feat: add handling for openai specific documentPart formatting
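
For reference, the two OpenAI document shapes in play, as documented by OpenAI for PDF inputs; this is a hedged sketch and the exact parts LibreChat emits may differ. Chat Completions uses a 'file' content part, while the Responses API uses 'input_file':

const pdfBase64 = require('fs').readFileSync('report.pdf').toString('base64'); // illustrative

// Chat Completions: a 'file' content part with an inline data URL.
const chatCompletionsDocumentPart = {
  type: 'file',
  file: {
    filename: 'report.pdf',
    file_data: `data:application/pdf;base64,${pdfBase64}`,
  },
};

// Responses API: an 'input_file' content part.
const responsesDocumentPart = {
  type: 'input_file',
  filename: 'report.pdf',
  file_data: `data:application/pdf;base64,${pdfBase64}`,
};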

* refactor: move require statement to proper place at top of file

* chore: add in openAI endpoint for the rest of the document handling logic

* feat: add direct attach support for azureOpenAI endpoint and agents

* feat: add pdf validation for azureOpenAI endpoint

* refactor: unify all the endpoint checks with isDocumentSupportedEndpoint
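
A rough sketch of what a unified check might look like; names and the endpoint list are hypothetical (the google endpoint joins the list in the follow-up PR below):

const { EModelEndpoint } = require('librechat-data-provider');

// Hypothetical sketch; the actual implementation and endpoint list may differ.
const DOCUMENT_SUPPORTED_ENDPOINTS = new Set([
  EModelEndpoint.anthropic,
  EModelEndpoint.openAI,
  EModelEndpoint.azureOpenAI,
]);

const isDocumentSupportedEndpoint = (endpoint) => DOCUMENT_SUPPORTED_ENDPOINTS.has(endpoint);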

* refactor: consolidate Upload to Provider vs Upload image logic for clarity

* refactor: remove anthropic from anthropic_multimodal fileType since we support multiple providers now

🗂️ feat: Send Attachments Directly to Provider (Google) (#9100)

* feat: add validation for google PDFs and add google endpoint as a document supporting endpoint

* feat: add proper pdf formatting for google endpoints (requires PR #14 in agents)
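
A hedged example of how Gemini accepts PDFs as inline base64 data, per Google's API docs; how the agents package wraps this may differ:

const pdfBase64 = require('fs').readFileSync('report.pdf').toString('base64'); // illustrative

// Gemini-style inline document part.
const googleDocumentPart = {
  inlineData: {
    mimeType: 'application/pdf',
    data: pdfBase64,
  },
};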

* feat: add multimodal support for google endpoint attachments

* feat: add audio file svg

* fix: refactor attachments logic so multi-attachment messages work properly

* feat: add video file svg

* fix: allow follow-up questions about uploaded multimodal attachments

* fix: remove incorrect final message filtering that was breaking Attachment component rendering

fix: manually rename 'documents' to 'Documents' in git, since the change wasn't picked up due to case insensitivity in the directory name

fix: add logic so filepicker for a google agent has proper filetype filtering

🛫 refactor: Move Encoding Logic to packages/api (#9182)

* refactor: move audio encode over to TS

* refactor: audio encoding now functional in LC again

* refactor: move video encode over to TS

* refactor: move document encode over to TS

* refactor: video encoding now functional in LC again

* refactor: document encoding now functional in LC again
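
Rough shape of what the document encoder is expected to return, inferred from how BaseClient.addDocuments consumes the result in the diff further down; the real TypeScript types in packages/api may differ, and the 'documents' entries are provider-specific content parts (an Anthropic-style block is shown here):

// Inferred, not authoritative: 'documents' become message content; 'files' feed the file map.
const documentResult = {
  documents: [
    {
      type: 'document',
      source: { type: 'base64', media_type: 'application/pdf', data: '<base64>' },
    },
  ],
  files: [{ file_id: 'abc123', filename: 'report.pdf', type: 'application/pdf' }],
};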

* fix: extend file type options in AttachFileMenu to include 'google_multimodal' and update dependency array to include agent?.provider

* feat: only accept pdfs if responses api is enabled for openai convos

chore: address ESLint comments

chore: add missing audio mimetype

* fix: type safety for message content parts and improve null handling

* chore: reorder AttachFileMenuProps for consistency and clarity

* chore: import order in AttachFileMenu

* fix: improve null handling for text parts in parseTextParts function

* fix: remove no longer used unsupported capability error message for file uploads

* fix: OpenAI Direct File Attachment Format

* fix: update encodeAndFormatDocuments to support the OpenAI Responses API and enhance document result types

* refactor: broaden providers supported for documents

* feat: enhance DragDrop context and modal to support document uploads based on provider capabilities

* fix: reorder import statements for consistency in video encoding module

---------

Co-authored-by: Dustin Healy <54083382+dustinhealy@users.noreply.github.com>
This commit is contained in:
Danny Avila 2025-10-06 17:30:16 -04:00 committed by GitHub
parent 9c77f53454
commit bcd97aad2f
33 changed files with 1040 additions and 74 deletions

View file

@@ -1,18 +1,24 @@
const crypto = require('crypto');
const fetch = require('node-fetch');
const { logger } = require('@librechat/data-schemas');
const { getBalanceConfig } = require('@librechat/api');
const {
supportsBalanceCheck,
isAgentsEndpoint,
isParamEndpoint,
EModelEndpoint,
getBalanceConfig,
encodeAndFormatAudios,
encodeAndFormatVideos,
encodeAndFormatDocuments,
} = require('@librechat/api');
const {
Constants,
ErrorTypes,
ContentTypes,
excludedKeys,
ErrorTypes,
Constants,
EModelEndpoint,
isParamEndpoint,
isAgentsEndpoint,
supportsBalanceCheck,
} = require('librechat-data-provider');
const { getMessages, saveMessage, updateMessage, saveConvo, getConvo } = require('~/models');
const { getStrategyFunctions } = require('~/server/services/Files/strategies');
const { checkBalance } = require('~/models/balanceMethods');
const { truncateToolCallOutputs } = require('./prompts');
const { getFiles } = require('~/models/File');
@@ -1198,8 +1204,99 @@ class BaseClient {
    return await this.sendCompletion(payload, opts);
  }

  async addDocuments(message, attachments) {
    const documentResult = await encodeAndFormatDocuments(
      this.options.req,
      attachments,
      {
        provider: this.options.agent?.provider,
        useResponsesApi: this.options.agent?.model_parameters?.useResponsesApi,
      },
      getStrategyFunctions,
    );
    message.documents =
      documentResult.documents && documentResult.documents.length
        ? documentResult.documents
        : undefined;
    return documentResult.files;
  }

  async addVideos(message, attachments) {
    const videoResult = await encodeAndFormatVideos(
      this.options.req,
      attachments,
      this.options.agent.provider,
      getStrategyFunctions,
    );
    message.videos =
      videoResult.videos && videoResult.videos.length ? videoResult.videos : undefined;
    return videoResult.files;
  }

  async addAudios(message, attachments) {
    const audioResult = await encodeAndFormatAudios(
      this.options.req,
      attachments,
      this.options.agent.provider,
      getStrategyFunctions,
    );
    message.audios =
      audioResult.audios && audioResult.audios.length ? audioResult.audios : undefined;
    return audioResult.files;
  }

  async processAttachments(message, attachments) {
    const categorizedAttachments = {
      images: [],
      documents: [],
      videos: [],
      audios: [],
    };

    for (const file of attachments) {
      if (file.type.startsWith('image/')) {
        categorizedAttachments.images.push(file);
      } else if (file.type === 'application/pdf') {
        categorizedAttachments.documents.push(file);
      } else if (file.type.startsWith('video/')) {
        categorizedAttachments.videos.push(file);
      } else if (file.type.startsWith('audio/')) {
        categorizedAttachments.audios.push(file);
      }
    }

    const [imageFiles, documentFiles, videoFiles, audioFiles] = await Promise.all([
      categorizedAttachments.images.length > 0
        ? this.addImageURLs(message, categorizedAttachments.images)
        : Promise.resolve([]),
      categorizedAttachments.documents.length > 0
        ? this.addDocuments(message, categorizedAttachments.documents)
        : Promise.resolve([]),
      categorizedAttachments.videos.length > 0
        ? this.addVideos(message, categorizedAttachments.videos)
        : Promise.resolve([]),
      categorizedAttachments.audios.length > 0
        ? this.addAudios(message, categorizedAttachments.audios)
        : Promise.resolve([]),
    ]);

    const allFiles = [...imageFiles, ...documentFiles, ...videoFiles, ...audioFiles];
    const seenFileIds = new Set();
    const uniqueFiles = [];
    for (const file of allFiles) {
      if (file.file_id && !seenFileIds.has(file.file_id)) {
        seenFileIds.add(file.file_id);
        uniqueFiles.push(file);
      } else if (!file.file_id) {
        uniqueFiles.push(file);
      }
    }

    return uniqueFiles;
  }

  /**
   *
   * @param {TMessage[]} _messages
   * @returns {Promise<TMessage[]>}
   */
@@ -1248,7 +1345,7 @@
{},
);
await this.addImageURLs(message, files, this.visionMode);
await this.processAttachments(message, files);
this.message_file_map[message.messageId] = files;
return message;

View file

@@ -48,7 +48,7 @@
"@langchain/google-genai": "^0.2.13",
"@langchain/google-vertexai": "^0.2.13",
"@langchain/textsplitters": "^0.1.0",
"@librechat/agents": "^2.4.84",
"@librechat/agents": "^2.4.85",
"@librechat/api": "*",
"@librechat/data-schemas": "*",
"@microsoft/microsoft-graph-client": "^3.0.7",

View file

@@ -257,7 +257,7 @@ class AgentClient extends BaseClient {
};
}
const files = await this.addImageURLs(
const files = await this.processAttachments(
orderedMessages[orderedMessages.length - 1],
attachments,
);

View file

@@ -4,6 +4,7 @@ const axios = require('axios');
const { logger } = require('@librechat/data-schemas');
const { EModelEndpoint } = require('librechat-data-provider');
const { generateShortLivedToken } = require('@librechat/api');
const { resizeImageBuffer } = require('~/server/services/Files/images/resize');
const { getBufferMetadata } = require('~/server/utils');
const paths = require('~/config/paths');
@@ -286,7 +287,18 @@ async function uploadLocalFile({ req, file, file_id }) {
  await fs.promises.writeFile(newPath, inputBuffer);
  const filepath = path.posix.join('/', 'uploads', req.user.id, path.basename(newPath));
  return { filepath, bytes };
  let height, width;
  if (file.mimetype && file.mimetype.startsWith('image/')) {
    try {
      const { width: imgWidth, height: imgHeight } = await resizeImageBuffer(inputBuffer, 'high');
      height = imgHeight;
      width = imgWidth;
    } catch (error) {
      logger.warn('[uploadLocalFile] Could not get image dimensions:', error.message);
    }
  }
  return { filepath, bytes, height, width };
}
/**

View file

@@ -522,11 +522,6 @@ const processAgentFileUpload = async ({ req, res, metadata }) => {
}
const isImage = file.mimetype.startsWith('image');
if (!isImage && !tool_resource) {
/** Note: this needs to be removed when we can support files to providers */
throw new Error('No tool resource provided for non-image agent file upload');
}
let fileInfoMetadata;
const entity_id = messageAttachment === true ? undefined : agent_id;
const basePath = mime.getType(file.originalname)?.startsWith('image') ? 'images' : 'uploads';