mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-04-02 13:57:19 +02:00
📎 fix: Route Unrecognized File Types via supportedMimeTypes Config (#12508)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
* fix: check supportedMimeTypes before routing unrecognized file types

  In processAttachments, files not matching the hardcoded mime type categories (image, PDF, video, audio) were silently dropped. Now resolves the endpoint's file config and checks the file type against supportedMimeTypes before routing to the documents pipeline. Files not matching any config are still skipped (original behavior). Closes #12482

* feat: encode generic document types for supported providers

  Remove the restrictive mime type filter in encodeAndFormatDocuments that only allowed PDFs and application/* types. Add a generic encoding path for non-PDF, non-Bedrock files using the provider's native format (Anthropic base64 document, OpenAI file block, Google media block). Files are already validated upstream by supportedMimeTypes.

* fix: guard file.type and cache file config in processAttachments

  - Add a file.type truthiness check before checkType to prevent coercion of null/undefined to the strings 'null'/'undefined'
  - Cache mergedFileConfig and endpointFileConfig on the instance so addPreviousAttachments doesn't recompute them per message

* refactor: harden generic document encoding with validation and tests

  - Extract a formatDocumentBlock helper to eliminate ~30 lines of duplicate provider-dispatch code between the PDF and generic paths
  - Add size validation in the generic encoding path using configuredFileSizeLimit (was fetched but unused)
  - Guard Bedrock from the generic path: non-bedrockDocumentFormats types are now skipped instead of silently tracking metadata
  - Only push metadata to result.files when a document block was actually created, preventing silent inconsistent state
  - Enable Anthropic citations for text/plain, text/html, and text/markdown (supported by Anthropic's document API)
  - Fix != to !== for the Providers.AZURE comparison
  - Add 9 tests covering all four provider branches, Bedrock exclusion, size limit enforcement, and an unhandled provider

* fix: resolve filename type mismatch in formatDocumentBlock

  The filename parameter is string | undefined, but OpenAIFileBlock and OpenAIInputFileBlock require string. Default to 'document' when filename is undefined.

* fix: use endpoint name for file config lookup in processAttachments

  Agent runs can have agent.provider set to a base provider (e.g., openAI) while agent.endpoint is a custom endpoint name. Using provider for the getEndpointFileConfig lookup bypassed a custom endpoint's supportedMimeTypes config. Now uses agent.endpoint, matching the pattern in addDocuments.

* perf: filter non-Bedrock files before fetching streams

  Bedrock only supports types in bedrockDocumentFormats. Previously, getFileStream was called for all files and unsupported types were discarded after download. Now pre-filters the file list for Bedrock to avoid unnecessary network and memory overhead for large unsupported attachments.

* refactor: clean up processAttachments file config handling

  - Remove redundant ?? null intermediaries; use instance properties directly in the else-if condition
  - Add JSDoc @type annotations for _mergedFileConfig and _endpointFileConfig in the constructor

* refactor: harden document encoding and add routing tests

  - Hoist configuredFileSizeLimit above the loop to avoid recomputing mergeFileConfig per file
  - Replace the Buffer.from decode with a base64 length formula in the generic size check to avoid unnecessary heap allocation
  - Use nullish coalescing (??) for the filename fallback
  - Clean up tests: remove an unnecessary type cast, use the createMockRequest helper for the size-limit test
  - Add 14 tests for processAttachments categorization logic covering supportedMimeTypes routing, null/undefined guards, standard type passthrough, and edge cases

* fix: use optional chaining for checkType in routing tests

  FileConfig.checkType is typed as optional. Use optional chaining to satisfy strict type checking.

* fix: skip stream fetches for unsupported providers, block Bedrock generic routing

  - Return early from encodeAndFormatDocuments when the provider is neither document-supported nor Bedrock, avoiding unnecessary getFileStream calls for providers that would discard all results
  - Add a !isBedrock guard to the supportedMimeTypes fallback branch in processAttachments so permissive patterns like '.*' don't route non-Bedrock types into documents that would be silently dropped
  - Add a test for Bedrock + non-Bedrock-document-type skipping

* fix: respect supportedMimeTypes config for Bedrock endpoints

  Remove the !isBedrock guard from the generic supportedMimeTypes routing branch. If a user configures permissive supportedMimeTypes for a Bedrock endpoint, the upload validation already accepted the file. The encoding layer pre-filters to Bedrock-supported types before fetching streams, so unsupported types are handled there without silently dropping files the user explicitly allowed.
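The routing fix described above can be sketched in isolation. This is a minimal stand-in, not the repo's code: the real `checkType` lives in librechat-data-provider, and the assumption here (suggested by the `'.*'` wildcard in the tests) is that `supportedMimeTypes` entries are regex source strings matched against the file's mime type.

```typescript
// Stand-in for librechat-data-provider's checkType (assumption: patterns are
// regex source strings tested against the mime type).
function checkType(fileType: string, supportedMimeTypes: string[]): boolean {
  return supportedMimeTypes.some((pattern) => new RegExp(pattern).test(fileType));
}

// Hypothetical fallback branch: a file that matched none of the hardcoded
// categories (image, PDF, video, audio) is routed to `documents` only when the
// endpoint's config explicitly allows its mime type; otherwise it is skipped,
// which preserves the original drop-unmatched behavior.
function routeUnrecognized(
  fileType: string | null | undefined,
  supportedMimeTypes: string[] | undefined,
): 'documents' | 'skipped' {
  if (fileType && supportedMimeTypes && checkType(fileType, supportedMimeTypes)) {
    return 'documents';
  }
  return 'skipped';
}
```

Note the `fileType &&` truthiness guard: it is what prevents `null`/`undefined` from being coerced into the strings `'null'`/`'undefined'` before the regex test, as the commit message describes.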
This commit is contained in:
parent
275af48592
commit
6ecd1b510f
4 changed files with 539 additions and 64 deletions
@@ -17,11 +17,13 @@ const {
  ContentTypes,
  excludedKeys,
  EModelEndpoint,
  mergeFileConfig,
  isParamEndpoint,
  isAgentsEndpoint,
  isEphemeralAgentId,
  supportsBalanceCheck,
  isBedrockDocumentType,
  getEndpointFileConfig,
} = require('librechat-data-provider');
const { getStrategyFunctions } = require('~/server/services/Files/strategies');
const { logViolation } = require('~/cache');

@@ -71,6 +73,10 @@ class BaseClient {
    this.currentMessages = [];
    /** @type {import('librechat-data-provider').VisionModes | undefined} */
    this.visionMode;
    /** @type {import('librechat-data-provider').FileConfig | undefined} */
    this._mergedFileConfig;
    /** @type {import('librechat-data-provider').EndpointFileConfig | undefined} */
    this._endpointFileConfig;
  }

  setOptions() {

@@ -1160,6 +1166,16 @@ class BaseClient {
    const provider = this.options.agent?.provider ?? this.options.endpoint;
    const isBedrock = provider === EModelEndpoint.bedrock;

    if (!this._mergedFileConfig && this.options.req?.config?.fileConfig) {
      this._mergedFileConfig = mergeFileConfig(this.options.req.config.fileConfig);
      const endpoint = this.options.agent?.endpoint ?? this.options.endpoint;
      this._endpointFileConfig = getEndpointFileConfig({
        fileConfig: this._mergedFileConfig,
        endpoint,
        endpointType: this.options.endpointType,
      });
    }

    for (const file of attachments) {
      /** @type {FileSources} */
      const source = file.source ?? FileSources.local;

@@ -1186,6 +1202,14 @@ class BaseClient {
      } else if (file.type.startsWith('audio/')) {
        categorizedAttachments.audios.push(file);
        allFiles.push(file);
      } else if (
        file.type &&
        this._mergedFileConfig &&
        this._endpointFileConfig?.supportedMimeTypes &&
        this._mergedFileConfig.checkType(file.type, this._endpointFileConfig.supportedMimeTypes)
      ) {
        categorizedAttachments.documents.push(file);
        allFiles.push(file);
      }
    }
@@ -56,13 +56,16 @@ describe('encodeAndFormatDocuments - fileConfig integration', () => {
  });

  /** Helper to create a mock request with file config */
  const createMockRequest = (fileSizeLimit?: number): Partial<AppConfig> => ({
  const createMockRequest = (
    fileSizeLimit?: number,
    provider: string = Providers.OPENAI,
  ): Partial<AppConfig> => ({
    config:
      fileSizeLimit !== undefined
        ? {
            fileConfig: {
              endpoints: {
                [Providers.OPENAI]: {
                [provider]: {
                  fileSizeLimit,
                },
              },

@@ -747,4 +750,235 @@ describe('encodeAndFormatDocuments - fileConfig integration', () => {
      });
    });
  });

  describe('Generic document encoding path', () => {
    it('should format text/plain for Anthropic with citations enabled', async () => {
      const req = createMockRequest(30) as ServerRequest;
      const file = createMockDocFile(1, 'text/plain', 'notes.txt');

      const mockContent = Buffer.from('plain text content').toString('base64');
      mockedGetFileStream.mockResolvedValue({
        file,
        content: mockContent,
        metadata: file,
      });

      const result = await encodeAndFormatDocuments(
        req,
        [file],
        { provider: Providers.ANTHROPIC },
        mockStrategyFunctions,
      );

      expect(result.documents).toHaveLength(1);
      expect(result.documents[0]).toMatchObject({
        type: 'document',
        source: {
          type: 'base64',
          media_type: 'text/plain',
          data: mockContent,
        },
        citations: { enabled: true },
        context: 'File: "notes.txt"',
      });
      expect(result.files).toHaveLength(1);
    });

    it('should format text/html for Anthropic with citations enabled', async () => {
      const req = createMockRequest(30) as ServerRequest;
      const file = createMockDocFile(1, 'text/html', 'page.html');

      const mockContent = Buffer.from('<html>content</html>').toString('base64');
      mockedGetFileStream.mockResolvedValue({
        file,
        content: mockContent,
        metadata: file,
      });

      const result = await encodeAndFormatDocuments(
        req,
        [file],
        { provider: Providers.ANTHROPIC },
        mockStrategyFunctions,
      );

      expect(result.documents).toHaveLength(1);
      expect(result.documents[0]).toMatchObject({
        type: 'document',
        source: { type: 'base64', media_type: 'text/html', data: mockContent },
        citations: { enabled: true },
      });
    });

    it('should format application/json for Anthropic without citations', async () => {
      const req = createMockRequest(30) as ServerRequest;
      const file = createMockDocFile(1, 'application/json', 'data.json');

      const mockContent = Buffer.from('{"key":"value"}').toString('base64');
      mockedGetFileStream.mockResolvedValue({
        file,
        content: mockContent,
        metadata: file,
      });

      const result = await encodeAndFormatDocuments(
        req,
        [file],
        { provider: Providers.ANTHROPIC },
        mockStrategyFunctions,
      );

      expect(result.documents).toHaveLength(1);
      expect(result.documents[0]).not.toHaveProperty('citations');
    });

    it('should format text/csv for OpenAI responses API', async () => {
      const req = createMockRequest(15) as ServerRequest;
      const file = createMockDocFile(1, 'text/csv', 'data.csv');

      const mockContent = Buffer.from('a,b\n1,2').toString('base64');
      mockedGetFileStream.mockResolvedValue({
        file,
        content: mockContent,
        metadata: file,
      });

      const result = await encodeAndFormatDocuments(
        req,
        [file],
        { provider: Providers.OPENAI, useResponsesApi: true },
        mockStrategyFunctions,
      );

      expect(result.documents).toHaveLength(1);
      expect(result.documents[0]).toMatchObject({
        type: 'input_file',
        filename: 'data.csv',
        file_data: `data:text/csv;base64,${mockContent}`,
      });
      expect(result.files).toHaveLength(1);
    });

    it('should format XLSX for Google/VertexAI as media block', async () => {
      const req = createMockRequest(25) as ServerRequest;
      const mimeType = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet';
      const file = createMockDocFile(2, mimeType, 'report.xlsx');

      const mockContent = Buffer.from('xlsx-binary').toString('base64');
      mockedGetFileStream.mockResolvedValue({
        file,
        content: mockContent,
        metadata: file,
      });

      const result = await encodeAndFormatDocuments(
        req,
        [file],
        { provider: Providers.GOOGLE },
        mockStrategyFunctions,
      );

      expect(result.documents).toHaveLength(1);
      expect(result.documents[0]).toMatchObject({
        type: 'media',
        mimeType,
        data: mockContent,
      });
      expect(result.files).toHaveLength(1);
    });

    it('should format text/plain for standard OpenAI-like provider as file block', async () => {
      const req = createMockRequest(15) as ServerRequest;
      const file = createMockDocFile(1, 'text/plain', 'readme.txt');

      const mockContent = Buffer.from('readme content').toString('base64');
      mockedGetFileStream.mockResolvedValue({
        file,
        content: mockContent,
        metadata: file,
      });

      const result = await encodeAndFormatDocuments(
        req,
        [file],
        { provider: Providers.OPENAI },
        mockStrategyFunctions,
      );

      expect(result.documents).toHaveLength(1);
      expect(result.documents[0]).toMatchObject({
        type: 'file',
        file: {
          filename: 'readme.txt',
          file_data: `data:text/plain;base64,${mockContent}`,
        },
      });
      expect(result.files).toHaveLength(1);
    });

    it('should skip non-Bedrock-document types for Bedrock provider', async () => {
      const req = createMockRequest() as ServerRequest;
      const file = createMockDocFile(1, 'application/zip', 'archive.zip');

      const mockContent = Buffer.from('zip-content').toString('base64');
      mockedGetFileStream.mockResolvedValue({
        file,
        content: mockContent,
        metadata: file,
      });

      const result = await encodeAndFormatDocuments(
        req,
        [file],
        { provider: Providers.BEDROCK },
        mockStrategyFunctions,
      );

      expect(result.documents).toHaveLength(0);
      expect(result.files).toHaveLength(0);
    });

    it('should throw when generic file exceeds configured size limit', async () => {
      const req = createMockRequest(1, Providers.ANTHROPIC) as ServerRequest;
      const file = createMockDocFile(2, 'text/plain', 'large.txt');

      const largeContent = Buffer.alloc(2 * 1024 * 1024).toString('base64');
      mockedGetFileStream.mockResolvedValue({
        file,
        content: largeContent,
        metadata: file,
      });

      await expect(
        encodeAndFormatDocuments(
          req,
          [file],
          { provider: Providers.ANTHROPIC },
          mockStrategyFunctions,
        ),
      ).rejects.toThrow('File size');
    });

    it('should not push metadata when provider has no handler', async () => {
      const req = createMockRequest(15) as ServerRequest;
      const file = createMockDocFile(1, 'text/plain', 'test.txt');

      const mockContent = Buffer.from('content').toString('base64');
      mockedGetFileStream.mockResolvedValue({
        file,
        content: mockContent,
        metadata: file,
      });

      const result = await encodeAndFormatDocuments(
        req,
        [file],
        { provider: Providers.AZURE as Providers },
        mockStrategyFunctions,
      );

      expect(result.documents).toHaveLength(0);
      expect(result.files).toHaveLength(0);
    });
  });
});
@@ -7,6 +7,7 @@ import {
} from 'librechat-data-provider';
import type { IMongoFile } from '@librechat/data-schemas';
import type {
  DocumentBlock,
  AnthropicDocumentBlock,
  StrategyFunctions,
  DocumentResult,

@@ -15,16 +16,85 @@ import type {
import { validatePdf, validateBedrockDocument } from '~/files/validation';
import { getFileStream, getConfiguredFileSizeLimit } from './utils';

const ANTHROPIC_CITATION_TYPES = new Set([
  'application/pdf',
  'text/plain',
  'text/html',
  'text/markdown',
]);

/**
 * Processes and encodes document files for various providers
 * @param req - Express request object
 * @param files - Array of file objects to process
 * @param params - Object containing provider, endpoint, and other options
 * @param params.provider - The provider name
 * @param params.endpoint - Optional endpoint name for file config lookup
 * @param params.useResponsesApi - Whether to use responses API format
 * @param getStrategyFunctions - Function to get strategy functions
 * @returns Promise that resolves to documents and file metadata
 * Formats a base64-encoded document into the appropriate provider-specific block.
 * Returns `null` when the provider has no matching handler.
 */
function formatDocumentBlock(
  provider: Providers,
  mimeType: string,
  content: string,
  filename: string | undefined,
  useResponsesApi: boolean | undefined,
): DocumentBlock | null {
  if (provider === Providers.ANTHROPIC) {
    const document: AnthropicDocumentBlock = {
      type: 'document',
      source: {
        type: 'base64',
        media_type: mimeType,
        data: content,
      },
    };

    if (ANTHROPIC_CITATION_TYPES.has(mimeType)) {
      document.citations = { enabled: true };
    }

    if (filename) {
      document.context = `File: "${filename}"`;
    }

    return document;
  }

  const resolvedFilename = filename ?? 'document';

  if (useResponsesApi) {
    return {
      type: 'input_file',
      filename: resolvedFilename,
      file_data: `data:${mimeType};base64,${content}`,
    };
  }

  if (provider === Providers.GOOGLE || provider === Providers.VERTEXAI) {
    return {
      type: 'media',
      mimeType,
      data: content,
    };
  }

  if (isOpenAILikeProvider(provider) && provider !== Providers.AZURE) {
    return {
      type: 'file',
      file: {
        filename: resolvedFilename,
        file_data: `data:${mimeType};base64,${content}`,
      },
    };
  }

  return null;
}

/**
 * Encodes and formats document files for various providers.
 *
 * Callers are responsible for pre-filtering `files` to types the endpoint accepts
 * (e.g., via `supportedMimeTypes` in `processAttachments`). This function processes
 * every file it receives and dispatches to the appropriate provider format:
 * - **Bedrock**: Only encodes types in `bedrockDocumentFormats`; all others are skipped.
 * - **PDF**: Validated via `validatePdf` before encoding.
 * - **Generic types**: Encoded with a provider-specific size check.
 */
export async function encodeAndFormatDocuments(
  req: ServerRequest,

@@ -43,25 +113,22 @@ export async function encodeAndFormatDocuments(
  const isBedrock = provider === Providers.BEDROCK;
  const isDocSupported = isDocumentSupportedProvider(provider);

  const documentFiles = files.filter((file) => {
    if (isBedrock && isBedrockDocumentType(file.type)) {
      return true;
    }
    return file.type === 'application/pdf' || file.type?.startsWith('application/');
  });

  if (!documentFiles.length) {
  if (!isDocSupported && !isBedrock) {
    return result;
  }

  const processableFiles = isBedrock
    ? files.filter((file) => isBedrockDocumentType(file.type))
    : files;

  if (!processableFiles.length) {
    return result;
  }

  const configuredFileSizeLimit = getConfiguredFileSizeLimit(req, { provider, endpoint });

  const results = await Promise.allSettled(
    documentFiles.map((file) => {
      const isProcessable = isBedrock
        ? isBedrockDocumentType(file.type)
        : file.type === 'application/pdf' && isDocSupported;
      if (!isProcessable) {
        return Promise.resolve(null);
      }
    processableFiles.map((file) => {
      return getFileStream(req, file, encodingMethods, getStrategyFunctions);
    }),
  );

@@ -82,7 +149,6 @@ export async function encodeAndFormatDocuments(
      continue;
    }

    const configuredFileSizeLimit = getConfiguredFileSizeLimit(req, { provider, endpoint });
    const mimeType = file.type ?? '';

    if (isBedrock && isBedrockDocumentType(mimeType)) {

@@ -130,44 +196,37 @@ export async function encodeAndFormatDocuments(
        throw new Error(`PDF validation failed: ${validation.error}`);
      }

      if (provider === Providers.ANTHROPIC) {
        const document: AnthropicDocumentBlock = {
          type: 'document',
          source: {
            type: 'base64',
            media_type: 'application/pdf',
            data: content,
          },
          citations: { enabled: true },
        };

        if (file.filename) {
          document.context = `File: "${file.filename}"`;
        }

        result.documents.push(document);
      } else if (useResponsesApi) {
        result.documents.push({
          type: 'input_file',
          filename: file.filename,
          file_data: `data:application/pdf;base64,${content}`,
        });
      } else if (provider === Providers.GOOGLE || provider === Providers.VERTEXAI) {
        result.documents.push({
          type: 'media',
          mimeType: 'application/pdf',
          data: content,
        });
      } else if (isOpenAILikeProvider(provider) && provider != Providers.AZURE) {
        result.documents.push({
          type: 'file',
          file: {
            filename: file.filename,
            file_data: `data:application/pdf;base64,${content}`,
          },
        });
      const block = formatDocumentBlock(
        provider,
        mimeType,
        content,
        file.filename,
        useResponsesApi,
      );
      if (block) {
        result.documents.push(block);
        result.files.push(metadata);
      }
    } else if (isDocSupported && !isBedrock) {
      const paddingChars = content.endsWith('==') ? 2 : content.endsWith('=') ? 1 : 0;
      const decodedByteCount = Math.floor((content.length * 3) / 4) - paddingChars;
      if (configuredFileSizeLimit && decodedByteCount > configuredFileSizeLimit) {
        throw new Error(
          `File size (~${(decodedByteCount / 1024 / 1024).toFixed(1)}MB) exceeds the configured limit for ${provider}`,
        );
      }

      const block = formatDocumentBlock(
        provider,
        mimeType,
        content,
        file.filename,
        useResponsesApi,
      );
      if (block) {
        result.documents.push(block);
        result.files.push(metadata);
      }
      result.files.push(metadata);
    }
  }
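The size check in the generic path above avoids decoding the base64 payload just to measure it: every 4 base64 characters encode 3 bytes, minus one byte per trailing `=` pad character. A standalone sketch of that formula (the function name is illustrative, not the repo's):

```typescript
// Decoded byte count of a base64 string, computed from its length alone:
// 4 base64 chars encode 3 bytes; each trailing '=' pad removes one byte.
function decodedByteCount(base64: string): number {
  const paddingChars = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0;
  return Math.floor((base64.length * 3) / 4) - paddingChars;
}

// Sanity check against a real encode: an 18-byte input reports 18 bytes
// without ever allocating the decoded buffer.
const sample = Buffer.from('plain text content'); // 18 bytes
console.log(decodedByteCount(sample.toString('base64'))); // 18
```

This is why the commit could drop the `Buffer.from` decode from the size check: the comparison against `configuredFileSizeLimit` needs only the byte count, not the bytes.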
158
packages/api/src/files/encode/processAttachments.spec.ts
Normal file
@@ -0,0 +1,158 @@
import {
  FileSources,
  mergeFileConfig,
  EModelEndpoint,
  getEndpointFileConfig,
  isBedrockDocumentType,
} from 'librechat-data-provider';
import type { FileConfig, EndpointFileConfig } from 'librechat-data-provider';

/**
 * Mirrors the categorization logic from BaseClient.processAttachments.
 * Extracted here for testability since the /api workspace test setup is broken.
 */
function categorizeFile(
  file: {
    type?: string | null;
    source?: string;
    embedded?: boolean;
    metadata?: { fileIdentifier?: string };
  },
  isBedrock: boolean,
  mergedFileConfig: FileConfig | undefined,
  endpointFileConfig: EndpointFileConfig | undefined,
): 'images' | 'documents' | 'videos' | 'audios' | 'skipped' {
  const source = file.source ?? FileSources.local;
  if (source === FileSources.text) {
    return 'skipped';
  }
  if (file.embedded === true || file.metadata?.fileIdentifier != null) {
    return 'skipped';
  }

  if (file.type?.startsWith('image/')) {
    return 'images';
  } else if (file.type === 'application/pdf') {
    return 'documents';
  } else if (isBedrock && file.type && isBedrockDocumentType(file.type)) {
    return 'documents';
  } else if (file.type?.startsWith('video/')) {
    return 'videos';
  } else if (file.type?.startsWith('audio/')) {
    return 'audios';
  } else if (
    file.type &&
    mergedFileConfig &&
    endpointFileConfig?.supportedMimeTypes &&
    mergedFileConfig.checkType?.(file.type, endpointFileConfig.supportedMimeTypes)
  ) {
    return 'documents';
  }

  return 'skipped';
}

describe('processAttachments — supportedMimeTypes routing logic', () => {
  const endpoint = EModelEndpoint.openAI;

  function resolveConfig(mimePatterns: string[]) {
    const merged = mergeFileConfig({
      endpoints: {
        [endpoint]: { supportedMimeTypes: mimePatterns },
      },
    });
    const epConfig = getEndpointFileConfig({
      fileConfig: merged,
      endpoint,
    });
    return { merged, epConfig };
  }

  it('should route text/csv to documents when supportedMimeTypes includes it', () => {
    const { merged, epConfig } = resolveConfig(['text/csv']);
    const result = categorizeFile({ type: 'text/csv' }, false, merged, epConfig);
    expect(result).toBe('documents');
  });

  it('should route text/plain to documents when supportedMimeTypes uses wildcard', () => {
    const { merged, epConfig } = resolveConfig(['.*']);
    const result = categorizeFile({ type: 'text/plain' }, false, merged, epConfig);
    expect(result).toBe('documents');
  });

  it('should skip application/zip when supportedMimeTypes only allows text types', () => {
    const { merged, epConfig } = resolveConfig(['text/csv', 'text/plain']);
    const result = categorizeFile({ type: 'application/zip' }, false, merged, epConfig);
    expect(result).toBe('skipped');
  });

  it('should skip files when no fileConfig is provided', () => {
    const result = categorizeFile({ type: 'text/csv' }, false, undefined, undefined);
    expect(result).toBe('skipped');
  });

  it('should skip files with null type even with permissive config', () => {
    const { merged, epConfig } = resolveConfig(['.*']);
    const result = categorizeFile({ type: null }, false, merged, epConfig);
    expect(result).toBe('skipped');
  });

  it('should skip files with undefined type even with permissive config', () => {
    const { merged, epConfig } = resolveConfig(['.*']);
    const result = categorizeFile({ type: undefined }, false, merged, epConfig);
    expect(result).toBe('skipped');
  });

  it('should still route image types through images category (not documents)', () => {
    const { merged, epConfig } = resolveConfig(['.*']);
    expect(categorizeFile({ type: 'image/png' }, false, merged, epConfig)).toBe('images');
  });

  it('should still route PDF through documents (dedicated branch)', () => {
    const { merged, epConfig } = resolveConfig(['.*']);
    expect(categorizeFile({ type: 'application/pdf' }, false, merged, epConfig)).toBe('documents');
  });

  it('should still route video types through videos category', () => {
    const { merged, epConfig } = resolveConfig(['.*']);
    expect(categorizeFile({ type: 'video/mp4' }, false, merged, epConfig)).toBe('videos');
  });

  it('should still route audio types through audios category', () => {
    const { merged, epConfig } = resolveConfig(['.*']);
    expect(categorizeFile({ type: 'audio/mp3' }, false, merged, epConfig)).toBe('audios');
  });

  it('should route Bedrock document types through documents for Bedrock provider', () => {
    const { merged, epConfig } = resolveConfig(['.*']);
    expect(categorizeFile({ type: 'text/csv' }, true, merged, epConfig)).toBe('documents');
  });

  it('should route non-Bedrock-document types for Bedrock when config allows them', () => {
    const { merged, epConfig } = resolveConfig(['.*']);
    expect(categorizeFile({ type: 'application/zip' }, true, merged, epConfig)).toBe('documents');
  });

  it('should route xlsx to documents with matching config', () => {
    const xlsxType = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet';
    const { merged, epConfig } = resolveConfig([xlsxType]);
    expect(categorizeFile({ type: xlsxType }, false, merged, epConfig)).toBe('documents');
  });

  it('should skip text source files regardless of config', () => {
    const { merged, epConfig } = resolveConfig(['.*']);
    const result = categorizeFile(
      { type: 'text/csv', source: FileSources.text },
      false,
      merged,
      epConfig,
    );
    expect(result).toBe('skipped');
  });

  it('should skip embedded files regardless of config', () => {
    const { merged, epConfig } = resolveConfig(['.*']);
    const result = categorizeFile({ type: 'text/csv', embedded: true }, false, merged, epConfig);
    expect(result).toBe('skipped');
  });
});
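The commit's perf change (pre-filtering files for Bedrock before any stream fetch) reduces to a plain filter step. In this sketch, `bedrockDocumentFormats` and the file shape are assumptions for illustration; the real list is exported by librechat-data-provider and may contain more types.

```typescript
// Assumed stand-in for bedrockDocumentFormats; the real export may differ.
const bedrockDocumentFormats = new Set(['application/pdf', 'text/csv', 'text/plain']);

interface AttachmentFile {
  filename: string;
  type?: string | null;
}

// Pre-filter before fetching streams: for Bedrock, only types the provider can
// encode are kept, so unsupported attachments never cost a download; other
// providers receive the full list unchanged.
function filesToFetch(files: AttachmentFile[], isBedrock: boolean): AttachmentFile[] {
  if (!isBedrock) {
    return files;
  }
  return files.filter((file) => file.type != null && bedrockDocumentFormats.has(file.type));
}
```

Filtering before `getFileStream` rather than after is the whole point: the discarded files previously incurred a full network round trip and an in-memory base64 buffer before being dropped.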