📎 feat: Direct Provider Attachment Support for Multimodal Content (#9994)
Some checks failed
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Has been cancelled
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Has been cancelled

* 📎 feat: Direct Provider Attachment Support for Multimodal Content

* 📑 feat: Anthropic Direct Provider Upload (#9072)

* feat: implement Anthropic native PDF support with document preservation

- Add comprehensive debug logging throughout PDF processing pipeline
- Refactor attachment processing to separate image and document handling
- Create distinct addImageURLs(), addDocuments(), and processAttachments() methods
- Fix critical bugs in stream handling and parameter passing
- Add streamToBuffer utility for proper stream-to-buffer conversion
- Remove api/agents submodule from repository

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: remove out of scope formatting changes

* fix: stop duplication of file in chat on end of response stream

* chore: bring back file search and ocr options

* chore: localize upload to provider string in file menu

* refactor: change createMenuItems args to fit new pattern introduced by anthropic-native-pdf-support

* feat: add cache point for pdfs processed by anthropic endpoint since they are unlikely to change and should benefit from caching

* feat: combine Upload Image into Upload to Provider since they both perform direct upload and change provider upload icon to reflect multimodal upload

* feat: add citations support according to docs

* refactor: remove redundant 'document' check since documents are handled properly by formatMessage in the agents repo now

* refactor: change upload logic so anthropic endpoint isn't exempted from normal upload path using Agents for consistency with the rest of the upload logic

* fix: include width and height in return from uploadLocalFile so images are correctly identified when going through an AgentUpload in addImageURLs

* chore: remove client specific handling since the direct provider stuff is handled by the agent client

* feat: handle documents in AgentClient so no need for change to agents repo

* chore: removed unused changes

* chore: remove auto generated comments from OG commit

* feat: add logic for agents to use direct to provider uploads if supported (currently just anthropic)

* fix: reintroduce role check to fix render error because of undefined value for Content Part

* fix: actually fix render bug by using proper isCreatedByUser check and making sure our mutation of formattedMessage.content is consistent

---------

Co-authored-by: Andres Restrepo <andres@thelinuxkid.com>
Co-authored-by: Claude <noreply@anthropic.com>

📁 feat: Send Attachments Directly to Provider (OpenAI) (#9098)

* refactor: change references from direct upload to direct attach to better reflect functionality

since we are just using base64 encoding strategy now rather than Files/File API for sending our attachments directly to the provider, the upload nomenclature no longer makes sense. direct_attach better describes the different methods of sending attachments to providers anyways even if we later introduce direct upload support

* feat: add upload to provider option for openai (and agent) ui

* chore: move anthropic pdf validator over to packages/api

* feat: simple pdf validation according to openai docs

* feat: add provider agnostic validatePdf logic to start handling multiple endpoints

* feat: add handling for openai specific documentPart formatting

* refactor: move require statement to proper place at top of file

* chore: add in openAI endpoint for the rest of the document handling logic

* feat: add direct attach support for azureOpenAI endpoint and agents

* feat: add pdf validation for azureOpenAI endpoint

* refactor: unify all the endpoint checks with isDocumentSupportedEndpoint

* refactor: consolidate Upload to Provider vs Upload image logic for clarity

* refactor: remove anthropic from anthropic_multimodal fileType since we support multiple providers now

🗂️ feat: Send Attachments Directly to Provider (Google) (#9100)

* feat: add validation for google PDFs and add google endpoint as a document supporting endpoint

* feat: add proper pdf formatting for google endpoints (requires PR #14 in agents)

* feat: add multimodal support for google endpoint attachments

* feat: add audio file svg

* fix: refactor attachments logic so multi-attachment messages work properly

* feat: add video file svg

* fix: allows for followup questions of uploaded multimodal attachments

* fix: remove incorrect final message filtering that was breaking Attachment component rendering

fix: manualy rename 'documents' to 'Documents' in git since it wasn't picked up due to case insensitivity in dir name

fix: add logic so filepicker for a google agent has proper filetype filtering

🛫 refactor: Move Encoding Logic to packages/api (#9182)

* refactor: move audio encode over to TS

* refactor: audio encoding now functional in LC again

* refactor: move video encode over to TS

* refactor: move document encode over to TS

* refactor: video encoding now functional in LC again

* refactor: document encoding now functional in LC again

* fix: extend file type options in AttachFileMenu to include 'google_multimodal' and update dependency array to include agent?.provider

* feat: only accept pdfs if responses api is enabled for openai convos

chore: address ESLint comments

chore: add missing audio mimetype

* fix: type safety for message content parts and improve null handling

* chore: reorder AttachFileMenuProps for consistency and clarity

* chore: import order in AttachFileMenu

* fix: improve null handling for text parts in parseTextParts function

* fix: remove no longer used unsupported capability error message for file uploads

* fix: OpenAI Direct File Attachment Format

* fix: update encodeAndFormatDocuments to support  OpenAI responses API and enhance document result types

* refactor: broaden providers supported for documents

* feat: enhance DragDrop context and modal to support document uploads based on provider capabilities

* fix: reorder import statements for consistency in video encoding module

---------

Co-authored-by: Dustin Healy <54083382+dustinhealy@users.noreply.github.com>
This commit is contained in:
Danny Avila 2025-10-06 17:30:16 -04:00 committed by GitHub
parent 9c77f53454
commit bcd97aad2f
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
33 changed files with 1040 additions and 74 deletions

View file

@ -2,13 +2,15 @@ import { memo, useMemo } from 'react';
import {
Constants,
supportsFiles,
EModelEndpoint,
mergeFileConfig,
isAgentsEndpoint,
isAssistantsEndpoint,
fileConfig as defaultFileConfig,
} from 'librechat-data-provider';
import type { EndpointFileConfig, TConversation } from 'librechat-data-provider';
import { useGetFileConfig } from '~/data-provider';
import { useGetFileConfig, useGetEndpointsQuery } from '~/data-provider';
import { getEndpointField } from '~/utils/endpoints';
import AttachFileMenu from './AttachFileMenu';
import AttachFile from './AttachFile';
@ -20,7 +22,7 @@ function AttachFileChat({
conversation: TConversation | null;
}) {
const conversationId = conversation?.conversationId ?? Constants.NEW_CONVO;
const { endpoint, endpointType } = conversation ?? { endpoint: null };
const { endpoint } = conversation ?? { endpoint: null };
const isAgents = useMemo(() => isAgentsEndpoint(endpoint), [endpoint]);
const isAssistants = useMemo(() => isAssistantsEndpoint(endpoint), [endpoint]);
@ -28,6 +30,15 @@ function AttachFileChat({
select: (data) => mergeFileConfig(data),
});
const { data: endpointsConfig } = useGetEndpointsQuery();
const endpointType = useMemo(() => {
return (
getEndpointField(endpointsConfig, endpoint, 'type') ||
(endpoint as EModelEndpoint | undefined)
);
}, [endpoint, endpointsConfig]);
const endpointFileConfig = fileConfig.endpoints[endpoint ?? ''] as EndpointFileConfig | undefined;
const endpointSupportsFiles: boolean = supportsFiles[endpointType ?? endpoint ?? ''] ?? false;
const isUploadDisabled = (disableInputs || endpointFileConfig?.disabled) ?? false;
@ -37,7 +48,9 @@ function AttachFileChat({
} else if (isAgents || (endpointSupportsFiles && !isUploadDisabled)) {
return (
<AttachFileMenu
endpoint={endpoint}
disabled={disableInputs}
endpointType={endpointType}
conversationId={conversationId}
agentId={conversation?.agent_id}
endpointFileConfig={endpointFileConfig}

View file

@ -1,8 +1,19 @@
import React, { useRef, useState, useMemo } from 'react';
import * as Ariakit from '@ariakit/react';
import { useRecoilState } from 'recoil';
import { FileSearch, ImageUpIcon, TerminalSquareIcon, FileType2Icon } from 'lucide-react';
import { EToolResources, EModelEndpoint, defaultAgentCapabilities } from 'librechat-data-provider';
import * as Ariakit from '@ariakit/react';
import {
FileSearch,
ImageUpIcon,
FileType2Icon,
FileImageIcon,
TerminalSquareIcon,
} from 'lucide-react';
import {
EToolResources,
EModelEndpoint,
defaultAgentCapabilities,
isDocumentSupportedProvider,
} from 'librechat-data-provider';
import {
FileUpload,
TooltipAnchor,
@ -26,15 +37,19 @@ import { MenuItemProps } from '~/common';
import { cn } from '~/utils';
interface AttachFileMenuProps {
conversationId: string;
agentId?: string | null;
endpoint?: string | null;
disabled?: boolean | null;
conversationId: string;
endpointType?: EModelEndpoint;
endpointFileConfig?: EndpointFileConfig;
}
const AttachFileMenu = ({
agentId,
endpoint,
disabled,
endpointType,
conversationId,
endpointFileConfig,
}: AttachFileMenuProps) => {
@ -55,44 +70,75 @@ const AttachFileMenu = ({
overrideEndpointFileConfig: endpointFileConfig,
toolResource,
});
const { agentsConfig } = useGetAgentsConfig();
const { data: startupConfig } = useGetStartupConfig();
const sharePointEnabled = startupConfig?.sharePointFilePickerEnabled;
const [isSharePointDialogOpen, setIsSharePointDialogOpen] = useState(false);
const { agentsConfig } = useGetAgentsConfig();
/** TODO: Ephemeral Agent Capabilities
* Allow defining agent capabilities on a per-endpoint basis
* Use definition for agents endpoint for ephemeral agents
* */
const capabilities = useAgentCapabilities(agentsConfig?.capabilities ?? defaultAgentCapabilities);
const { fileSearchAllowedByAgent, codeAllowedByAgent } = useAgentToolPermissions(
const { fileSearchAllowedByAgent, codeAllowedByAgent, provider } = useAgentToolPermissions(
agentId,
ephemeralAgent,
);
const handleUploadClick = (isImage?: boolean) => {
const handleUploadClick = (
fileType?: 'image' | 'document' | 'multimodal' | 'google_multimodal',
) => {
if (!inputRef.current) {
return;
}
inputRef.current.value = '';
inputRef.current.accept = isImage === true ? 'image/*' : '';
if (fileType === 'image') {
inputRef.current.accept = 'image/*';
} else if (fileType === 'document') {
inputRef.current.accept = '.pdf,application/pdf';
} else if (fileType === 'multimodal') {
inputRef.current.accept = 'image/*,.pdf,application/pdf';
} else if (fileType === 'google_multimodal') {
inputRef.current.accept = 'image/*,.pdf,application/pdf,video/*,audio/*';
} else {
inputRef.current.accept = '';
}
inputRef.current.click();
inputRef.current.accept = '';
};
const dropdownItems = useMemo(() => {
const createMenuItems = (onAction: (isImage?: boolean) => void) => {
const items: MenuItemProps[] = [
{
const createMenuItems = (
onAction: (fileType?: 'image' | 'document' | 'multimodal' | 'google_multimodal') => void,
) => {
const items: MenuItemProps[] = [];
const currentProvider = provider || endpoint;
if (isDocumentSupportedProvider(endpointType || currentProvider)) {
items.push({
label: localize('com_ui_upload_provider'),
onClick: () => {
setToolResource(undefined);
onAction(
(provider || endpoint) === EModelEndpoint.google ? 'google_multimodal' : 'multimodal',
);
},
icon: <FileImageIcon className="icon-md" />,
});
} else {
items.push({
label: localize('com_ui_upload_image_input'),
onClick: () => {
setToolResource(undefined);
onAction(true);
onAction('image');
},
icon: <ImageUpIcon className="icon-md" />,
},
];
});
}
if (capabilities.contextEnabled) {
items.push({
@ -156,8 +202,11 @@ const AttachFileMenu = ({
return localItems;
}, [
capabilities,
localize,
endpoint,
provider,
endpointType,
capabilities,
setToolResource,
setEphemeralAgent,
sharePointEnabled,

View file

@ -1,8 +1,18 @@
import React, { useMemo } from 'react';
import { useRecoilValue } from 'recoil';
import { OGDialog, OGDialogTemplate } from '@librechat/client';
import { EToolResources, defaultAgentCapabilities } from 'librechat-data-provider';
import { ImageUpIcon, FileSearch, TerminalSquareIcon, FileType2Icon } from 'lucide-react';
import {
EToolResources,
defaultAgentCapabilities,
isDocumentSupportedProvider,
} from 'librechat-data-provider';
import {
ImageUpIcon,
FileSearch,
FileType2Icon,
FileImageIcon,
TerminalSquareIcon,
} from 'lucide-react';
import {
useAgentToolPermissions,
useAgentCapabilities,
@ -34,22 +44,34 @@ const DragDropModal = ({ onOptionSelect, setShowModal, files, isVisible }: DragD
* Use definition for agents endpoint for ephemeral agents
* */
const capabilities = useAgentCapabilities(agentsConfig?.capabilities ?? defaultAgentCapabilities);
const { conversationId, agentId } = useDragDropContext();
const { conversationId, agentId, endpoint, endpointType } = useDragDropContext();
const ephemeralAgent = useRecoilValue(ephemeralAgentByConvoId(conversationId ?? ''));
const { fileSearchAllowedByAgent, codeAllowedByAgent } = useAgentToolPermissions(
const { fileSearchAllowedByAgent, codeAllowedByAgent, provider } = useAgentToolPermissions(
agentId,
ephemeralAgent,
);
const options = useMemo(() => {
const _options: FileOption[] = [
{
const _options: FileOption[] = [];
const currentProvider = provider || endpoint;
// Check if provider supports document upload
if (isDocumentSupportedProvider(endpointType || currentProvider)) {
_options.push({
label: localize('com_ui_upload_provider'),
value: undefined,
icon: <FileImageIcon className="icon-md" />,
condition: true, // Allow for both images and documents
});
} else {
// Only show image upload option if all files are images and provider doesn't support documents
_options.push({
label: localize('com_ui_upload_image_input'),
value: undefined,
icon: <ImageUpIcon className="icon-md" />,
condition: files.every((file) => file.type?.startsWith('image/')),
},
];
});
}
if (capabilities.fileSearchEnabled && fileSearchAllowedByAgent) {
_options.push({
label: localize('com_ui_upload_file_search'),
@ -73,7 +95,16 @@ const DragDropModal = ({ onOptionSelect, setShowModal, files, isVisible }: DragD
}
return _options;
}, [capabilities, files, localize, fileSearchAllowedByAgent, codeAllowedByAgent]);
}, [
files,
localize,
provider,
endpoint,
endpointType,
capabilities,
codeAllowedByAgent,
fileSearchAllowedByAgent,
]);
if (!isVisible) {
return null;

View file

@ -57,7 +57,7 @@ const Part = memo(
</>
);
} else if (part.type === ContentTypes.TEXT) {
const text = typeof part.text === 'string' ? part.text : part.text.value;
const text = typeof part.text === 'string' ? part.text : part.text?.value;
if (typeof text !== 'string') {
return null;
@ -71,7 +71,7 @@ const Part = memo(
</Container>
);
} else if (part.type === ContentTypes.THINK) {
const reasoning = typeof part.think === 'string' ? part.think : part.think.value;
const reasoning = typeof part.think === 'string' ? part.think : part.think?.value;
if (typeof reasoning !== 'string') {
return null;
}

View file

@ -37,7 +37,7 @@ const LogContent: React.FC<LogContentProps> = ({ output = '', renderImages, atta
attachments?.forEach((attachment) => {
const { width, height, filepath = null } = attachment as TFile & TAttachmentMetadata;
const isImage =
imageExtRegex.test(attachment.filename) &&
imageExtRegex.test(attachment.filename ?? '') &&
width != null &&
height != null &&
filepath != null;
@ -56,21 +56,25 @@ const LogContent: React.FC<LogContentProps> = ({ output = '', renderImages, atta
const renderAttachment = (file: TAttachment) => {
const now = new Date();
const expiresAt = typeof file.expiresAt === 'number' ? new Date(file.expiresAt) : null;
const expiresAt =
'expiresAt' in file && typeof file.expiresAt === 'number' ? new Date(file.expiresAt) : null;
const isExpired = expiresAt ? isAfter(now, expiresAt) : false;
const filename = file.filename || '';
if (isExpired) {
return `${file.filename} ${localize('com_download_expired')}`;
return `${filename} ${localize('com_download_expired')}`;
}
const filepath = file.filepath || '';
// const expirationText = expiresAt
// ? ` ${localize('com_download_expires', { 0: format(expiresAt, 'MM/dd/yy HH:mm') })}`
// : ` ${localize('com_click_to_download')}`;
return (
<LogLink href={file.filepath} filename={file.filename}>
<LogLink href={filepath} filename={filename}>
{'- '}
{file.filename} {localize('com_click_to_download')}
{filename} {localize('com_click_to_download')}
</LogLink>
);
};