mirror of https://github.com/danny-avila/LibreChat.git (synced 2025-12-17 00:40:14 +01:00)
📎 feat: Direct Provider Attachment Support for Multimodal Content (#9994)
Some checks failed
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Has been cancelled
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Has been cancelled
* 📎 feat: Direct Provider Attachment Support for Multimodal Content

* 📑 feat: Anthropic Direct Provider Upload (#9072)

* feat: implement Anthropic native PDF support with document preservation
  - Add comprehensive debug logging throughout PDF processing pipeline
  - Refactor attachment processing to separate image and document handling
  - Create distinct addImageURLs(), addDocuments(), and processAttachments() methods
  - Fix critical bugs in stream handling and parameter passing
  - Add streamToBuffer utility for proper stream-to-buffer conversion
  - Remove api/agents submodule from repository

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <noreply@anthropic.com>
* chore: remove out of scope formatting changes
* fix: stop duplication of file in chat on end of response stream
* chore: bring back file search and ocr options
* chore: localize upload to provider string in file menu
* refactor: change createMenuItems args to fit new pattern introduced by anthropic-native-pdf-support
* feat: add cache point for pdfs processed by anthropic endpoint since they are unlikely to change and should benefit from caching
* feat: combine Upload Image into Upload to Provider since they both perform direct upload and change provider upload icon to reflect multimodal upload
* feat: add citations support according to docs
* refactor: remove redundant 'document' check since documents are handled properly by formatMessage in the agents repo now
* refactor: change upload logic so anthropic endpoint isn't exempted from normal upload path using Agents for consistency with the rest of the upload logic
* fix: include width and height in return from uploadLocalFile so images are correctly identified when going through an AgentUpload in addImageURLs
* chore: remove client specific handling since the direct provider stuff is handled by the agent client
* feat: handle documents in AgentClient so no need for change to agents repo
* chore: removed unused changes
* chore: remove auto generated comments from OG commit
* feat: add logic for agents to use direct to provider uploads if supported (currently just anthropic)
* fix: reintroduce role check to fix render error because of undefined value for Content Part
* fix: actually fix render bug by using proper isCreatedByUser check and making sure our mutation of formattedMessage.content is consistent

---------

Co-authored-by: Andres Restrepo <andres@thelinuxkid.com>
Co-authored-by: Claude <noreply@anthropic.com>

📁 feat: Send Attachments Directly to Provider (OpenAI) (#9098)

* refactor: change references from direct upload to direct attach to better reflect functionality
  since we are just using base64 encoding strategy now rather than Files/File API for sending our attachments directly to the provider, the upload nomenclature no longer makes sense. direct_attach better describes the different methods of sending attachments to providers anyways even if we later introduce direct upload support
* feat: add upload to provider option for openai (and agent) ui
* chore: move anthropic pdf validator over to packages/api
* feat: simple pdf validation according to openai docs
* feat: add provider agnostic validatePdf logic to start handling multiple endpoints
* feat: add handling for openai specific documentPart formatting
* refactor: move require statement to proper place at top of file
* chore: add in openAI endpoint for the rest of the document handling logic
* feat: add direct attach support for azureOpenAI endpoint and agents
* feat: add pdf validation for azureOpenAI endpoint
* refactor: unify all the endpoint checks with isDocumentSupportedEndpoint
* refactor: consolidate Upload to Provider vs Upload image logic for clarity
* refactor: remove anthropic from anthropic_multimodal fileType since we support multiple providers now

🗂️ feat: Send Attachments Directly to Provider (Google) (#9100)

* feat: add validation for google PDFs and add google endpoint as a document supporting endpoint
* feat: add proper pdf formatting for google endpoints (requires PR #14 in agents)
* feat: add multimodal support for google endpoint attachments
* feat: add audio file svg
* fix: refactor attachments logic so multi-attachment messages work properly
* feat: add video file svg
* fix: allows for followup questions of uploaded multimodal attachments
* fix: remove incorrect final message filtering that was breaking Attachment component rendering

fix: manually rename 'documents' to 'Documents' in git since it wasn't picked up due to case insensitivity in dir name

fix: add logic so filepicker for a google agent has proper filetype filtering

🛫 refactor: Move Encoding Logic to packages/api (#9182)

* refactor: move audio encode over to TS
* refactor: audio encoding now functional in LC again
* refactor: move video encode over to TS
* refactor: move document encode over to TS
* refactor: video encoding now functional in LC again
* refactor: document encoding now functional in LC again
* fix: extend file type options in AttachFileMenu to include 'google_multimodal' and update dependency array to include agent?.provider
* feat: only accept pdfs if responses api is enabled for openai convos

chore: address ESLint comments

chore: add missing audio mimetype

* fix: type safety for message content parts and improve null handling
* chore: reorder AttachFileMenuProps for consistency and clarity
* chore: import order in AttachFileMenu
* fix: improve null handling for text parts in parseTextParts function
* fix: remove no longer used unsupported capability error message for file uploads
* fix: OpenAI Direct File Attachment Format
* fix: update encodeAndFormatDocuments to support OpenAI responses API and enhance document result types
* refactor: broaden providers supported for documents
* feat: enhance DragDrop context and modal to support document uploads based on provider capabilities
* fix: reorder import statements for consistency in video encoding module

---------

Co-authored-by: Dustin Healy <54083382+dustinhealy@users.noreply.github.com>
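
The commit message refers to "simple pdf validation" and "provider agnostic validatePdf logic" without showing either. The following is only a minimal sketch of what such a check could look like, not the code from this commit: `checkPdfSketch`, `PdfCheck`, and the caller-supplied `maxBytes` limit are hypothetical names, and the actual per-provider size limits are not stated on this page.

```typescript
// Hypothetical sketch; NOT the validatePdf implementation from this commit.
type PdfCheck = { isValid: boolean; error?: string };

// "%PDF-" as bytes: a well-formed PDF begins with this signature.
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d];

function checkPdfSketch(bytes: Uint8Array, maxBytes: number): PdfCheck {
  // Assumption: the size limit is supplied by the caller per provider.
  if (bytes.length > maxBytes) {
    return { isValid: false, error: `PDF exceeds ${maxBytes} bytes` };
  }
  // Compare the first five bytes against the signature.
  const hasSignature = PDF_SIGNATURE.every((byte, i) => bytes[i] === byte);
  if (!hasSignature) {
    return { isValid: false, error: 'Missing %PDF- header' };
  }
  return { isValid: true };
}
```

Passing the limit in as a parameter keeps the check itself provider-agnostic, which is the direction the commit message describes.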
This commit is contained in:
parent 9c77f53454
commit bcd97aad2f
33 changed files with 1040 additions and 74 deletions
@@ -2,13 +2,15 @@ import { memo, useMemo } from 'react';
 import {
   Constants,
   supportsFiles,
+  EModelEndpoint,
   mergeFileConfig,
   isAgentsEndpoint,
   isAssistantsEndpoint,
   fileConfig as defaultFileConfig,
 } from 'librechat-data-provider';
 import type { EndpointFileConfig, TConversation } from 'librechat-data-provider';
-import { useGetFileConfig } from '~/data-provider';
+import { useGetFileConfig, useGetEndpointsQuery } from '~/data-provider';
+import { getEndpointField } from '~/utils/endpoints';
 import AttachFileMenu from './AttachFileMenu';
 import AttachFile from './AttachFile';
 
@@ -20,7 +22,7 @@ function AttachFileChat({
   conversation: TConversation | null;
 }) {
   const conversationId = conversation?.conversationId ?? Constants.NEW_CONVO;
-  const { endpoint, endpointType } = conversation ?? { endpoint: null };
+  const { endpoint } = conversation ?? { endpoint: null };
   const isAgents = useMemo(() => isAgentsEndpoint(endpoint), [endpoint]);
   const isAssistants = useMemo(() => isAssistantsEndpoint(endpoint), [endpoint]);
 
@@ -28,6 +30,15 @@ function AttachFileChat({
     select: (data) => mergeFileConfig(data),
   });
 
+  const { data: endpointsConfig } = useGetEndpointsQuery();
+
+  const endpointType = useMemo(() => {
+    return (
+      getEndpointField(endpointsConfig, endpoint, 'type') ||
+      (endpoint as EModelEndpoint | undefined)
+    );
+  }, [endpoint, endpointsConfig]);
+
   const endpointFileConfig = fileConfig.endpoints[endpoint ?? ''] as EndpointFileConfig | undefined;
   const endpointSupportsFiles: boolean = supportsFiles[endpointType ?? endpoint ?? ''] ?? false;
   const isUploadDisabled = (disableInputs || endpointFileConfig?.disabled) ?? false;
@@ -37,7 +48,9 @@ function AttachFileChat({
   } else if (isAgents || (endpointSupportsFiles && !isUploadDisabled)) {
     return (
       <AttachFileMenu
+        endpoint={endpoint}
         disabled={disableInputs}
+        endpointType={endpointType}
         conversationId={conversationId}
         agentId={conversation?.agent_id}
         endpointFileConfig={endpointFileConfig}
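
The hunks above stop reading `endpointType` from the conversation and instead derive it from the endpoints config, falling back to the endpoint name. A rough sketch of that fallback follows, assuming a simplified config shape: `EndpointsConfigSketch` and `resolveEndpointType` are hypothetical stand-ins, not the real `getEndpointField` helper, and unlike the diff this sketch returns `undefined` when no endpoint is set.

```typescript
// Simplified, hypothetical stand-in for the endpoints config returned by useGetEndpointsQuery.
type EndpointsConfigSketch = Record<string, { type?: string } | undefined>;

function resolveEndpointType(
  config: EndpointsConfigSketch | undefined,
  endpoint: string | null | undefined,
): string | undefined {
  if (endpoint == null) {
    return undefined;
  }
  // Prefer the `type` declared in the config; otherwise fall back to the endpoint name itself.
  return config?.[endpoint]?.type || endpoint;
}
```

With this shape, `resolveEndpointType({ myEndpoint: { type: 'someType' } }, 'myEndpoint')` yields the configured type, while an endpoint missing from the config resolves to its own name.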
@@ -1,8 +1,19 @@
 import React, { useRef, useState, useMemo } from 'react';
-import * as Ariakit from '@ariakit/react';
 import { useRecoilState } from 'recoil';
-import { FileSearch, ImageUpIcon, TerminalSquareIcon, FileType2Icon } from 'lucide-react';
-import { EToolResources, EModelEndpoint, defaultAgentCapabilities } from 'librechat-data-provider';
+import * as Ariakit from '@ariakit/react';
+import {
+  FileSearch,
+  ImageUpIcon,
+  FileType2Icon,
+  FileImageIcon,
+  TerminalSquareIcon,
+} from 'lucide-react';
+import {
+  EToolResources,
+  EModelEndpoint,
+  defaultAgentCapabilities,
+  isDocumentSupportedProvider,
+} from 'librechat-data-provider';
 import {
   FileUpload,
   TooltipAnchor,
@@ -26,15 +37,19 @@ import { MenuItemProps } from '~/common';
 import { cn } from '~/utils';
 
 interface AttachFileMenuProps {
-  conversationId: string;
   agentId?: string | null;
+  endpoint?: string | null;
   disabled?: boolean | null;
+  conversationId: string;
+  endpointType?: EModelEndpoint;
   endpointFileConfig?: EndpointFileConfig;
 }
 
 const AttachFileMenu = ({
   agentId,
+  endpoint,
   disabled,
+  endpointType,
   conversationId,
   endpointFileConfig,
 }: AttachFileMenuProps) => {
@@ -55,44 +70,75 @@ const AttachFileMenu = ({
     overrideEndpointFileConfig: endpointFileConfig,
     toolResource,
   });
+
+  const { agentsConfig } = useGetAgentsConfig();
   const { data: startupConfig } = useGetStartupConfig();
   const sharePointEnabled = startupConfig?.sharePointFilePickerEnabled;
 
   const [isSharePointDialogOpen, setIsSharePointDialogOpen] = useState(false);
-  const { agentsConfig } = useGetAgentsConfig();
+
   /** TODO: Ephemeral Agent Capabilities
    * Allow defining agent capabilities on a per-endpoint basis
    * Use definition for agents endpoint for ephemeral agents
    * */
   const capabilities = useAgentCapabilities(agentsConfig?.capabilities ?? defaultAgentCapabilities);
 
-  const { fileSearchAllowedByAgent, codeAllowedByAgent } = useAgentToolPermissions(
+  const { fileSearchAllowedByAgent, codeAllowedByAgent, provider } = useAgentToolPermissions(
     agentId,
     ephemeralAgent,
   );
 
-  const handleUploadClick = (isImage?: boolean) => {
+  const handleUploadClick = (
+    fileType?: 'image' | 'document' | 'multimodal' | 'google_multimodal',
+  ) => {
     if (!inputRef.current) {
       return;
     }
     inputRef.current.value = '';
-    inputRef.current.accept = isImage === true ? 'image/*' : '';
+    if (fileType === 'image') {
+      inputRef.current.accept = 'image/*';
+    } else if (fileType === 'document') {
+      inputRef.current.accept = '.pdf,application/pdf';
+    } else if (fileType === 'multimodal') {
+      inputRef.current.accept = 'image/*,.pdf,application/pdf';
+    } else if (fileType === 'google_multimodal') {
+      inputRef.current.accept = 'image/*,.pdf,application/pdf,video/*,audio/*';
+    } else {
+      inputRef.current.accept = '';
+    }
     inputRef.current.click();
     inputRef.current.accept = '';
   };
 
   const dropdownItems = useMemo(() => {
-    const createMenuItems = (onAction: (isImage?: boolean) => void) => {
-      const items: MenuItemProps[] = [
-        {
+    const createMenuItems = (
+      onAction: (fileType?: 'image' | 'document' | 'multimodal' | 'google_multimodal') => void,
+    ) => {
+      const items: MenuItemProps[] = [];
+
+      const currentProvider = provider || endpoint;
+
+      if (isDocumentSupportedProvider(endpointType || currentProvider)) {
+        items.push({
+          label: localize('com_ui_upload_provider'),
+          onClick: () => {
+            setToolResource(undefined);
+            onAction(
+              (provider || endpoint) === EModelEndpoint.google ? 'google_multimodal' : 'multimodal',
+            );
+          },
+          icon: <FileImageIcon className="icon-md" />,
+        });
+      } else {
+        items.push({
           label: localize('com_ui_upload_image_input'),
           onClick: () => {
             setToolResource(undefined);
-            onAction(true);
+            onAction('image');
           },
           icon: <ImageUpIcon className="icon-md" />,
-        },
-      ];
+        });
+      }
 
       if (capabilities.contextEnabled) {
         items.push({
@@ -156,8 +202,11 @@ const AttachFileMenu = ({
 
     return localItems;
   }, [
-    capabilities,
     localize,
+    endpoint,
+    provider,
+    endpointType,
+    capabilities,
     setToolResource,
     setEphemeralAgent,
     sharePointEnabled,
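
The new `handleUploadClick` maps a file-type tag to the `accept` attribute of the hidden file input. The same mapping can be sketched as a pure function, which makes it easy to check in isolation. `acceptFor` and `UploadFileType` are names chosen here for illustration and do not appear in the diff; the strings are copied from the hunk above.

```typescript
// Illustrative extraction of the accept-string mapping; the names are hypothetical.
type UploadFileType = 'image' | 'document' | 'multimodal' | 'google_multimodal';

function acceptFor(fileType?: UploadFileType): string {
  switch (fileType) {
    case 'image':
      return 'image/*';
    case 'document':
      return '.pdf,application/pdf';
    case 'multimodal':
      return 'image/*,.pdf,application/pdf';
    case 'google_multimodal':
      return 'image/*,.pdf,application/pdf,video/*,audio/*';
    default:
      // No tag: leave the picker unrestricted, as the diff does.
      return '';
  }
}
```

In the component the value is assigned to `inputRef.current.accept` just before `click()` and reset to an empty string afterwards, so the restriction only applies to that one picker invocation.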
@@ -1,8 +1,18 @@
 import React, { useMemo } from 'react';
 import { useRecoilValue } from 'recoil';
 import { OGDialog, OGDialogTemplate } from '@librechat/client';
-import { EToolResources, defaultAgentCapabilities } from 'librechat-data-provider';
-import { ImageUpIcon, FileSearch, TerminalSquareIcon, FileType2Icon } from 'lucide-react';
+import {
+  EToolResources,
+  defaultAgentCapabilities,
+  isDocumentSupportedProvider,
+} from 'librechat-data-provider';
+import {
+  ImageUpIcon,
+  FileSearch,
+  FileType2Icon,
+  FileImageIcon,
+  TerminalSquareIcon,
+} from 'lucide-react';
 import {
   useAgentToolPermissions,
   useAgentCapabilities,
@@ -34,22 +44,34 @@ const DragDropModal = ({ onOptionSelect, setShowModal, files, isVisible }: DragD
    * Use definition for agents endpoint for ephemeral agents
    * */
   const capabilities = useAgentCapabilities(agentsConfig?.capabilities ?? defaultAgentCapabilities);
-  const { conversationId, agentId } = useDragDropContext();
+  const { conversationId, agentId, endpoint, endpointType } = useDragDropContext();
   const ephemeralAgent = useRecoilValue(ephemeralAgentByConvoId(conversationId ?? ''));
-  const { fileSearchAllowedByAgent, codeAllowedByAgent } = useAgentToolPermissions(
+  const { fileSearchAllowedByAgent, codeAllowedByAgent, provider } = useAgentToolPermissions(
     agentId,
     ephemeralAgent,
   );
 
   const options = useMemo(() => {
-    const _options: FileOption[] = [
-      {
+    const _options: FileOption[] = [];
+    const currentProvider = provider || endpoint;
+
+    // Check if provider supports document upload
+    if (isDocumentSupportedProvider(endpointType || currentProvider)) {
+      _options.push({
+        label: localize('com_ui_upload_provider'),
+        value: undefined,
+        icon: <FileImageIcon className="icon-md" />,
+        condition: true, // Allow for both images and documents
+      });
+    } else {
+      // Only show image upload option if all files are images and provider doesn't support documents
+      _options.push({
         label: localize('com_ui_upload_image_input'),
         value: undefined,
         icon: <ImageUpIcon className="icon-md" />,
         condition: files.every((file) => file.type?.startsWith('image/')),
-      },
-    ];
+      });
+    }
     if (capabilities.fileSearchEnabled && fileSearchAllowedByAgent) {
       _options.push({
         label: localize('com_ui_upload_file_search'),
@@ -73,7 +95,16 @@ const DragDropModal = ({ onOptionSelect, setShowModal, files, isVisible }: DragD
     }
 
     return _options;
-  }, [capabilities, files, localize, fileSearchAllowedByAgent, codeAllowedByAgent]);
+  }, [
+    files,
+    localize,
+    provider,
+    endpoint,
+    endpointType,
+    capabilities,
+    codeAllowedByAgent,
+    fileSearchAllowedByAgent,
+  ]);
 
   if (!isVisible) {
     return null;
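
The drag-and-drop modal now picks its first option from the provider's document support rather than always offering image upload. The branch can be sketched on its own as below. The provider set is an assumption: the real predicate is `isDocumentSupportedProvider` from `librechat-data-provider`, whose exact list is not shown in this diff, so `DOC_PROVIDERS_SKETCH` only names the endpoints the commit message mentions. `firstOptionSketch` and the two types are hypothetical as well.

```typescript
// Assumed provider list, taken from the commit message; NOT the library's actual predicate.
const DOC_PROVIDERS_SKETCH = new Set(['anthropic', 'openAI', 'azureOpenAI', 'google']);

type DroppedFileSketch = { type?: string };
type FirstOptionSketch = { labelKey: string; condition: boolean };

function firstOptionSketch(
  files: DroppedFileSketch[],
  endpointType?: string | null,
  provider?: string | null,
  endpoint?: string | null,
): FirstOptionSketch {
  // Same precedence as the diff: endpointType first, then the agent's provider, then the endpoint.
  const currentProvider = provider || endpoint;
  const key = endpointType || currentProvider;
  if (key != null && DOC_PROVIDERS_SKETCH.has(key)) {
    // Document-capable provider: one option covers images and documents.
    return { labelKey: 'com_ui_upload_provider', condition: true };
  }
  // Otherwise only offer image upload, and only when every dropped file is an image.
  const allImages = files.every((file) => file.type?.startsWith('image/') ?? false);
  return { labelKey: 'com_ui_upload_image_input', condition: allImages };
}
```

Because `endpointType` takes precedence, an agent's `provider` only decides the outcome when no endpoint type is resolved.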
@@ -57,7 +57,7 @@ const Part = memo(
         </>
       );
     } else if (part.type === ContentTypes.TEXT) {
-      const text = typeof part.text === 'string' ? part.text : part.text.value;
+      const text = typeof part.text === 'string' ? part.text : part.text?.value;
 
       if (typeof text !== 'string') {
         return null;
@@ -71,7 +71,7 @@ const Part = memo(
         </Container>
       );
     } else if (part.type === ContentTypes.THINK) {
-      const reasoning = typeof part.think === 'string' ? part.think : part.think.value;
+      const reasoning = typeof part.think === 'string' ? part.think : part.think?.value;
       if (typeof reasoning !== 'string') {
         return null;
       }
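
Both hunks above replace `.value` with `?.value`, so a missing `text` or `think` field no longer throws and instead falls through to the existing `typeof ... !== 'string'` guard. A small sketch of that pattern, with `TextValueSketch` and `readTextSketch` as hypothetical names and a simplified type in place of the real content-part types:

```typescript
// Simplified, hypothetical type: a part's text may be a string, a wrapper object, or absent.
type TextValueSketch = string | { value?: string } | undefined;

function readTextSketch(text: TextValueSketch): string | null {
  // Optional chaining yields undefined instead of throwing when `text` is absent.
  const resolved = typeof text === 'string' ? text : text?.value;
  // Mirrors the guard in the diff: anything that is not a string renders nothing.
  return typeof resolved === 'string' ? resolved : null;
}
```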
@@ -37,7 +37,7 @@ const LogContent: React.FC<LogContentProps> = ({ output = '', renderImages, atta
     attachments?.forEach((attachment) => {
       const { width, height, filepath = null } = attachment as TFile & TAttachmentMetadata;
       const isImage =
-        imageExtRegex.test(attachment.filename) &&
+        imageExtRegex.test(attachment.filename ?? '') &&
         width != null &&
         height != null &&
         filepath != null;
@@ -56,21 +56,25 @@ const LogContent: React.FC<LogContentProps> = ({ output = '', renderImages, atta
 
   const renderAttachment = (file: TAttachment) => {
     const now = new Date();
-    const expiresAt = typeof file.expiresAt === 'number' ? new Date(file.expiresAt) : null;
+    const expiresAt =
+      'expiresAt' in file && typeof file.expiresAt === 'number' ? new Date(file.expiresAt) : null;
     const isExpired = expiresAt ? isAfter(now, expiresAt) : false;
+    const filename = file.filename || '';
 
     if (isExpired) {
-      return `${file.filename} ${localize('com_download_expired')}`;
+      return `${filename} ${localize('com_download_expired')}`;
     }
 
+    const filepath = file.filepath || '';
+
     // const expirationText = expiresAt
     // ? ` ${localize('com_download_expires', { 0: format(expiresAt, 'MM/dd/yy HH:mm') })}`
     // : ` ${localize('com_click_to_download')}`;
 
     return (
-      <LogLink href={file.filepath} filename={file.filename}>
+      <LogLink href={filepath} filename={filename}>
         {'- '}
-        {file.filename} {localize('com_click_to_download')}
+        {filename} {localize('com_click_to_download')}
       </LogLink>
     );
   };
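
The `renderAttachment` change guards `expiresAt` with an `in` check and defaults `filename` and `filepath` to empty strings, presumably because the attachment type is now a union in which those fields may be absent. The sketch below shows the narrowing under that assumption. The two member types and `describeAttachmentSketch` are hypothetical, and it compares millisecond timestamps directly rather than calling `isAfter` from date-fns as the component does.

```typescript
// Hypothetical union standing in for TAttachment; only one member carries expiresAt.
type StoredFileSketch = { filename?: string; filepath?: string; expiresAt?: number };
type ToolAttachmentSketch = { filename?: string; filepath?: string };
type AttachmentSketch = StoredFileSketch | ToolAttachmentSketch;

function describeAttachmentSketch(file: AttachmentSketch, nowMs: number) {
  // The `in` check narrows the union so `file.expiresAt` type-checks.
  const expiresAt =
    'expiresAt' in file && typeof file.expiresAt === 'number' ? file.expiresAt : null;
  // Plain timestamp comparison here; the diff uses date-fns isAfter on Date objects.
  const isExpired = expiresAt !== null && nowMs > expiresAt;
  return {
    filename: file.filename || '',
    filepath: file.filepath || '',
    isExpired,
  };
}
```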