📎 feat: Direct Provider Attachment Support for Multimodal Content (#9994)

* 📎 feat: Direct Provider Attachment Support for Multimodal Content

* 📑 feat: Anthropic Direct Provider Upload (#9072)

* feat: implement Anthropic native PDF support with document preservation

- Add comprehensive debug logging throughout PDF processing pipeline
- Refactor attachment processing to separate image and document handling
- Create distinct addImageURLs(), addDocuments(), and processAttachments() methods
- Fix critical bugs in stream handling and parameter passing
- Add streamToBuffer utility for proper stream-to-buffer conversion
- Remove api/agents submodule from repository

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: remove out of scope formatting changes
* fix: stop duplication of file in chat on end of response stream
* chore: bring back file search and ocr options
* chore: localize upload to provider string in file menu
* refactor: change createMenuItems args to fit new pattern introduced by anthropic-native-pdf-support
* feat: add cache point for pdfs processed by anthropic endpoint since they are unlikely to change and should benefit from caching
* feat: combine Upload Image into Upload to Provider since they both perform direct upload and change provider upload icon to reflect multimodal upload
* feat: add citations support according to docs
* refactor: remove redundant 'document' check since documents are handled properly by formatMessage in the agents repo now
* refactor: change upload logic so anthropic endpoint isn't exempted from normal upload path using Agents for consistency with the rest of the upload logic
* fix: include width and height in return from uploadLocalFile so images are correctly identified when going through an AgentUpload in addImageURLs
* chore: remove client specific handling since the direct provider stuff is handled by the agent client
* feat: handle documents in AgentClient so no need for change to agents repo
* chore: removed unused changes
* chore: remove auto generated comments from OG commit
* feat: add logic for agents to use direct to provider uploads if supported (currently just anthropic)
* fix: reintroduce role check to fix render error because of undefined value for Content Part
* fix: actually fix render bug by using proper isCreatedByUser check and making sure our mutation of formattedMessage.content is consistent

---------

Co-authored-by: Andres Restrepo <andres@thelinuxkid.com>
Co-authored-by: Claude <noreply@anthropic.com>

📁 feat: Send Attachments Directly to Provider (OpenAI) (#9098)

* refactor: change references from direct upload to direct attach to better reflect functionality
  since we are just using base64 encoding strategy now rather than Files/File API for sending our attachments directly to the provider, the upload nomenclature no longer makes sense. direct_attach better describes the different methods of sending attachments to providers anyways even if we later introduce direct upload support
* feat: add upload to provider option for openai (and agent) ui
* chore: move anthropic pdf validator over to packages/api
* feat: simple pdf validation according to openai docs
* feat: add provider agnostic validatePdf logic to start handling multiple endpoints
* feat: add handling for openai specific documentPart formatting
* refactor: move require statement to proper place at top of file
* chore: add in openAI endpoint for the rest of the document handling logic
* feat: add direct attach support for azureOpenAI endpoint and agents
* feat: add pdf validation for azureOpenAI endpoint
* refactor: unify all the endpoint checks with isDocumentSupportedEndpoint
* refactor: consolidate Upload to Provider vs Upload image logic for clarity
* refactor: remove anthropic from anthropic_multimodal fileType since we support multiple providers now

🗂️ feat: Send Attachments Directly to Provider (Google) (#9100)

* feat: add validation for google PDFs and add google endpoint as a document supporting endpoint
* feat: add proper pdf formatting for google endpoints (requires PR #14 in agents)
* feat: add multimodal support for google endpoint attachments
* feat: add audio file svg
* fix: refactor attachments logic so multi-attachment messages work properly
* feat: add video file svg
* fix: allows for followup questions of uploaded multimodal attachments
* fix: remove incorrect final message filtering that was breaking Attachment component rendering

fix: manually rename 'documents' to 'Documents' in git since it wasn't picked up due to case insensitivity in dir name

fix: add logic so filepicker for a google agent has proper filetype filtering

🛫 refactor: Move Encoding Logic to packages/api (#9182)

* refactor: move audio encode over to TS
* refactor: audio encoding now functional in LC again
* refactor: move video encode over to TS
* refactor: move document encode over to TS
* refactor: video encoding now functional in LC again
* refactor: document encoding now functional in LC again
* fix: extend file type options in AttachFileMenu to include 'google_multimodal' and update dependency array to include agent?.provider
* feat: only accept pdfs if responses api is enabled for openai convos

chore: address ESLint comments

chore: add missing audio mimetype

* fix: type safety for message content parts and improve null handling
* chore: reorder AttachFileMenuProps for consistency and clarity
* chore: import order in AttachFileMenu
* fix: improve null handling for text parts in parseTextParts function
* fix: remove no longer used unsupported capability error message for file uploads
* fix: OpenAI Direct File Attachment Format
* fix: update encodeAndFormatDocuments to support OpenAI responses API and enhance document result types
* refactor: broaden providers supported for documents
* feat: enhance DragDrop context and modal to support document uploads based on provider capabilities
* fix: reorder import statements for consistency in video encoding module

---------

Co-authored-by: Dustin Healy <54083382+dustinhealy@users.noreply.github.com>
2025-12-17 00:40:14 +01:00 · 2025-10-06 17:30:16 -04:00 · 2025-10-06 17:30:16 -04:00 · bcd97aad2f
commit bcd97aad2f
parent 9c77f53454
33 changed files with 1040 additions and 74 deletions
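
Note on the commit message: it mentions a `streamToBuffer` utility and a base64 encoding strategy for sending attachments directly to the provider, but this portion of the diff does not show either. The following is only a minimal sketch of what such a helper could look like in Node.js; the function bodies and the `toBase64` name are assumptions for illustration, not the code from this commit.

```typescript
import { Readable } from 'node:stream';

/**
 * Hypothetical sketch: collect every chunk of a readable stream into one Buffer.
 * The real streamToBuffer in the repository may differ.
 */
async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Streams may yield strings or binary chunks; normalize both to Buffer.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

/** Hypothetical helper: base64-encode a stream's full contents. */
async function toBase64(stream: Readable): Promise<string> {
  const buffer = await streamToBuffer(stream);
  return buffer.toString('base64');
}
```

Buffering the whole stream keeps the encoding step simple, at the cost of holding the entire file in memory, which is one reason size validation would normally run before encoding.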
--- a/client/src/components/Chat/Input/Files/AttachFileChat.tsx
+++ b/client/src/components/Chat/Input/Files/AttachFileChat.tsx
@@ -2,13 +2,15 @@ import { memo, useMemo } from 'react';
 import {
  Constants,
  supportsFiles,
+  EModelEndpoint,
  mergeFileConfig,
  isAgentsEndpoint,
  isAssistantsEndpoint,
  fileConfig as defaultFileConfig,
 } from 'librechat-data-provider';
 import type { EndpointFileConfig, TConversation } from 'librechat-data-provider';
-import { useGetFileConfig } from '~/data-provider';
+import { useGetFileConfig, useGetEndpointsQuery } from '~/data-provider';
+import { getEndpointField } from '~/utils/endpoints';
 import AttachFileMenu from './AttachFileMenu';
 import AttachFile from './AttachFile';

@@ -20,7 +22,7 @@ function AttachFileChat({
  conversation: TConversation | null;
 }) {
  const conversationId = conversation?.conversationId ?? Constants.NEW_CONVO;
-  const { endpoint, endpointType } = conversation ?? { endpoint: null };
+  const { endpoint } = conversation ?? { endpoint: null };
  const isAgents = useMemo(() => isAgentsEndpoint(endpoint), [endpoint]);
  const isAssistants = useMemo(() => isAssistantsEndpoint(endpoint), [endpoint]);

@@ -28,6 +30,15 @@ function AttachFileChat({
    select: (data) => mergeFileConfig(data),
  });

+  const { data: endpointsConfig } = useGetEndpointsQuery();
+
+  const endpointType = useMemo(() => {
+    return (
+      getEndpointField(endpointsConfig, endpoint, 'type') ||
+      (endpoint as EModelEndpoint | undefined)
+    );
+  }, [endpoint, endpointsConfig]);
+
  const endpointFileConfig = fileConfig.endpoints[endpoint ?? ''] as EndpointFileConfig | undefined;
  const endpointSupportsFiles: boolean = supportsFiles[endpointType ?? endpoint ?? ''] ?? false;
  const isUploadDisabled = (disableInputs || endpointFileConfig?.disabled) ?? false;
@@ -37,7 +48,9 @@ function AttachFileChat({
  } else if (isAgents || (endpointSupportsFiles && !isUploadDisabled)) {
    return (
      <AttachFileMenu
+        endpoint={endpoint}
        disabled={disableInputs}
+        endpointType={endpointType}
        conversationId={conversationId}
        agentId={conversation?.agent_id}
        endpointFileConfig={endpointFileConfig}
--- a/client/src/components/Chat/Input/Files/AttachFileMenu.tsx
+++ b/client/src/components/Chat/Input/Files/AttachFileMenu.tsx
@@ -1,8 +1,19 @@
 import React, { useRef, useState, useMemo } from 'react';
-import * as Ariakit from '@ariakit/react';
 import { useRecoilState } from 'recoil';
-import { FileSearch, ImageUpIcon, TerminalSquareIcon, FileType2Icon } from 'lucide-react';
-import { EToolResources, EModelEndpoint, defaultAgentCapabilities } from 'librechat-data-provider';
+import * as Ariakit from '@ariakit/react';
+import {
+  FileSearch,
+  ImageUpIcon,
+  FileType2Icon,
+  FileImageIcon,
+  TerminalSquareIcon,
+} from 'lucide-react';
+import {
+  EToolResources,
+  EModelEndpoint,
+  defaultAgentCapabilities,
+  isDocumentSupportedProvider,
+} from 'librechat-data-provider';
 import {
  FileUpload,
  TooltipAnchor,
@@ -26,15 +37,19 @@ import { MenuItemProps } from '~/common';
 import { cn } from '~/utils';

 interface AttachFileMenuProps {
-  conversationId: string;
  agentId?: string | null;
+  endpoint?: string | null;
  disabled?: boolean | null;
+  conversationId: string;
+  endpointType?: EModelEndpoint;
  endpointFileConfig?: EndpointFileConfig;
 }

 const AttachFileMenu = ({
  agentId,
+  endpoint,
  disabled,
+  endpointType,
  conversationId,
  endpointFileConfig,
 }: AttachFileMenuProps) => {
@@ -55,44 +70,75 @@ const AttachFileMenu = ({
    overrideEndpointFileConfig: endpointFileConfig,
    toolResource,
  });
+
+  const { agentsConfig } = useGetAgentsConfig();
  const { data: startupConfig } = useGetStartupConfig();
  const sharePointEnabled = startupConfig?.sharePointFilePickerEnabled;

  const [isSharePointDialogOpen, setIsSharePointDialogOpen] = useState(false);
-  const { agentsConfig } = useGetAgentsConfig();
+
  /** TODO: Ephemeral Agent Capabilities
   * Allow defining agent capabilities on a per-endpoint basis
   * Use definition for agents endpoint for ephemeral agents
   * */
  const capabilities = useAgentCapabilities(agentsConfig?.capabilities ?? defaultAgentCapabilities);

-  const { fileSearchAllowedByAgent, codeAllowedByAgent } = useAgentToolPermissions(
+  const { fileSearchAllowedByAgent, codeAllowedByAgent, provider } = useAgentToolPermissions(
    agentId,
    ephemeralAgent,
  );

-  const handleUploadClick = (isImage?: boolean) => {
+  const handleUploadClick = (
+    fileType?: 'image' | 'document' | 'multimodal' | 'google_multimodal',
+  ) => {
    if (!inputRef.current) {
      return;
    }
    inputRef.current.value = '';
-    inputRef.current.accept = isImage === true ? 'image/*' : '';
+    if (fileType === 'image') {
+      inputRef.current.accept = 'image/*';
+    } else if (fileType === 'document') {
+      inputRef.current.accept = '.pdf,application/pdf';
+    } else if (fileType === 'multimodal') {
+      inputRef.current.accept = 'image/*,.pdf,application/pdf';
+    } else if (fileType === 'google_multimodal') {
+      inputRef.current.accept = 'image/*,.pdf,application/pdf,video/*,audio/*';
+    } else {
+      inputRef.current.accept = '';
+    }
    inputRef.current.click();
    inputRef.current.accept = '';
  };

  const dropdownItems = useMemo(() => {
-    const createMenuItems = (onAction: (isImage?: boolean) => void) => {
-      const items: MenuItemProps[] = [
-        {
+    const createMenuItems = (
+      onAction: (fileType?: 'image' | 'document' | 'multimodal' | 'google_multimodal') => void,
+    ) => {
+      const items: MenuItemProps[] = [];
+
+      const currentProvider = provider || endpoint;
+
+      if (isDocumentSupportedProvider(endpointType || currentProvider)) {
+        items.push({
+          label: localize('com_ui_upload_provider'),
+          onClick: () => {
+            setToolResource(undefined);
+            onAction(
+              (provider || endpoint) === EModelEndpoint.google ? 'google_multimodal' : 'multimodal',
+            );
+          },
+          icon: <FileImageIcon className="icon-md" />,
+        });
+      } else {
+        items.push({
          label: localize('com_ui_upload_image_input'),
          onClick: () => {
            setToolResource(undefined);
-            onAction(true);
+            onAction('image');
          },
          icon: <ImageUpIcon className="icon-md" />,
-        },
-      ];
+        });
+      }

      if (capabilities.contextEnabled) {
        items.push({
@@ -156,8 +202,11 @@ const AttachFileMenu = ({

    return localItems;
  }, [
-    capabilities,
    localize,
+    endpoint,
+    provider,
+    endpointType,
+    capabilities,
    setToolResource,
    setEphemeralAgent,
    sharePointEnabled,
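
Review note on the hunk above: `handleUploadClick` picks the file input's `accept` string through an `if`/`else if` chain keyed on the file type. The same mapping can be expressed as a lookup table, sketched below. The accept strings are copied from the diff; the names `UploadFileType`, `ACCEPT_BY_FILE_TYPE`, and `getAcceptString` are hypothetical and do not appear in the commit.

```typescript
// Hypothetical type alias for the union used in the diff.
type UploadFileType = 'image' | 'document' | 'multimodal' | 'google_multimodal';

// Accept strings taken from the handleUploadClick branches in the diff.
const ACCEPT_BY_FILE_TYPE: Record<UploadFileType, string> = {
  image: 'image/*',
  document: '.pdf,application/pdf',
  multimodal: 'image/*,.pdf,application/pdf',
  google_multimodal: 'image/*,.pdf,application/pdf,video/*,audio/*',
};

// An undefined file type falls back to '', which leaves the picker unfiltered.
function getAcceptString(fileType?: UploadFileType): string {
  return fileType ? ACCEPT_BY_FILE_TYPE[fileType] : '';
}
```

Typing the table as `Record<UploadFileType, string>` makes the compiler flag a missing entry if another file type is added to the union later.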
--- a/client/src/components/Chat/Input/Files/DragDropModal.tsx
+++ b/client/src/components/Chat/Input/Files/DragDropModal.tsx
@@ -1,8 +1,18 @@
 import React, { useMemo } from 'react';
 import { useRecoilValue } from 'recoil';
 import { OGDialog, OGDialogTemplate } from '@librechat/client';
-import { EToolResources, defaultAgentCapabilities } from 'librechat-data-provider';
-import { ImageUpIcon, FileSearch, TerminalSquareIcon, FileType2Icon } from 'lucide-react';
+import {
+  EToolResources,
+  defaultAgentCapabilities,
+  isDocumentSupportedProvider,
+} from 'librechat-data-provider';
+import {
+  ImageUpIcon,
+  FileSearch,
+  FileType2Icon,
+  FileImageIcon,
+  TerminalSquareIcon,
+} from 'lucide-react';
 import {
  useAgentToolPermissions,
  useAgentCapabilities,
@@ -34,22 +44,34 @@ const DragDropModal = ({ onOptionSelect, setShowModal, files, isVisible }: DragD
   * Use definition for agents endpoint for ephemeral agents
   * */
  const capabilities = useAgentCapabilities(agentsConfig?.capabilities ?? defaultAgentCapabilities);
-  const { conversationId, agentId } = useDragDropContext();
+  const { conversationId, agentId, endpoint, endpointType } = useDragDropContext();
  const ephemeralAgent = useRecoilValue(ephemeralAgentByConvoId(conversationId ?? ''));
-  const { fileSearchAllowedByAgent, codeAllowedByAgent } = useAgentToolPermissions(
+  const { fileSearchAllowedByAgent, codeAllowedByAgent, provider } = useAgentToolPermissions(
    agentId,
    ephemeralAgent,
  );

  const options = useMemo(() => {
-    const _options: FileOption[] = [
-      {
+    const _options: FileOption[] = [];
+    const currentProvider = provider || endpoint;
+
+    // Check if provider supports document upload
+    if (isDocumentSupportedProvider(endpointType || currentProvider)) {
+      _options.push({
+        label: localize('com_ui_upload_provider'),
+        value: undefined,
+        icon: <FileImageIcon className="icon-md" />,
+        condition: true, // Allow for both images and documents
+      });
+    } else {
+      // Only show image upload option if all files are images and provider doesn't support documents
+      _options.push({
        label: localize('com_ui_upload_image_input'),
        value: undefined,
        icon: <ImageUpIcon className="icon-md" />,
        condition: files.every((file) => file.type?.startsWith('image/')),
-      },
-    ];
+      });
+    }
    if (capabilities.fileSearchEnabled && fileSearchAllowedByAgent) {
      _options.push({
        label: localize('com_ui_upload_file_search'),
@@ -73,7 +95,16 @@ const DragDropModal = ({ onOptionSelect, setShowModal, files, isVisible }: DragD
    }

    return _options;
-  }, [capabilities, files, localize, fileSearchAllowedByAgent, codeAllowedByAgent]);
+  }, [
+    files,
+    localize,
+    provider,
+    endpoint,
+    endpointType,
+    capabilities,
+    codeAllowedByAgent,
+    fileSearchAllowedByAgent,
+  ]);

  if (!isVisible) {
    return null;
--- a/client/src/components/Chat/Messages/Content/Part.tsx
+++ b/client/src/components/Chat/Messages/Content/Part.tsx
@@ -57,7 +57,7 @@ const Part = memo(
        </>
      );
    } else if (part.type === ContentTypes.TEXT) {
-      const text = typeof part.text === 'string' ? part.text : part.text.value;
+      const text = typeof part.text === 'string' ? part.text : part.text?.value;

      if (typeof text !== 'string') {
        return null;
@@ -71,7 +71,7 @@ const Part = memo(
        </Container>
      );
    } else if (part.type === ContentTypes.THINK) {
-      const reasoning = typeof part.think === 'string' ? part.think : part.think.value;
+      const reasoning = typeof part.think === 'string' ? part.think : part.think?.value;
      if (typeof reasoning !== 'string') {
        return null;
      }
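
Review note on the two `Part.tsx` hunks above: both changes swap `.value` for `?.value`, so a part whose `text` or `think` field is undefined falls through to the existing `typeof ... !== 'string'` guard instead of throwing. A minimal standalone sketch of that pattern follows; the `TextLike` type and the `readText` name are assumptions for illustration, not types from the repository.

```typescript
// Hypothetical shape: a string, an object that may carry a value, or nothing.
type TextLike = string | { value?: string } | undefined;

// Returns the text when one is present, otherwise null (the component renders nothing).
function readText(text: TextLike): string | null {
  // Optional chaining yields undefined instead of throwing when text is undefined.
  const value = typeof text === 'string' ? text : text?.value;
  return typeof value === 'string' ? value : null;
}
```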
--- a/client/src/components/Chat/Messages/Content/Parts/LogContent.tsx
+++ b/client/src/components/Chat/Messages/Content/Parts/LogContent.tsx
@@ -37,7 +37,7 @@ const LogContent: React.FC<LogContentProps> = ({ output = '', renderImages, atta
    attachments?.forEach((attachment) => {
      const { width, height, filepath = null } = attachment as TFile & TAttachmentMetadata;
      const isImage =
-        imageExtRegex.test(attachment.filename) &&
+        imageExtRegex.test(attachment.filename ?? '') &&
        width != null &&
        height != null &&
        filepath != null;
@@ -56,21 +56,25 @@ const LogContent: React.FC<LogContentProps> = ({ output = '', renderImages, atta

  const renderAttachment = (file: TAttachment) => {
    const now = new Date();
-    const expiresAt = typeof file.expiresAt === 'number' ? new Date(file.expiresAt) : null;
+    const expiresAt =
+      'expiresAt' in file && typeof file.expiresAt === 'number' ? new Date(file.expiresAt) : null;
    const isExpired = expiresAt ? isAfter(now, expiresAt) : false;
+    const filename = file.filename || '';

    if (isExpired) {
-      return `${file.filename} ${localize('com_download_expired')}`;
+      return `${filename} ${localize('com_download_expired')}`;
    }

+    const filepath = file.filepath || '';
+
    // const expirationText = expiresAt
    //   ? ` ${localize('com_download_expires', { 0: format(expiresAt, 'MM/dd/yy HH:mm') })}`
    //   : ` ${localize('com_click_to_download')}`;

    return (
-      <LogLink href={file.filepath} filename={file.filename}>
+      <LogLink href={filepath} filename={filename}>
        {'- '}
-        {file.filename} {localize('com_click_to_download')}
+        {filename} {localize('com_click_to_download')}
      </LogLink>
    );
  };
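
Review note on the `renderAttachment` hunk above: the new `'expiresAt' in file` check narrows a union type before `expiresAt` is read, which matters when not every member of the union declares that field. A small sketch of the same narrowing, under stated assumptions: the `StoredFile` and `ToolAttachment` interfaces are hypothetical stand-ins for the real `TAttachment` members, and a plain timestamp comparison replaces the `isAfter` call used in the component.

```typescript
// Hypothetical stand-ins for the members of the attachment union.
interface StoredFile {
  filename?: string;
  filepath?: string;
  expiresAt?: number;
}
interface ToolAttachment {
  filename?: string;
  filepath?: string;
}
type Attachment = StoredFile | ToolAttachment;

// `in` narrows the union so expiresAt can be read without a type error.
function getExpiry(file: Attachment): Date | null {
  return 'expiresAt' in file && typeof file.expiresAt === 'number'
    ? new Date(file.expiresAt)
    : null;
}

// Attachments without an expiry are treated as never expired.
function isExpired(file: Attachment, now: Date = new Date()): boolean {
  const expiresAt = getExpiry(file);
  return expiresAt ? now.getTime() > expiresAt.getTime() : false;
}
```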