LibreChat/packages/api/src/files/encode/audio.ts
papasaidfine 4fe223eedd
🎞️ feat: OpenRouter Audio/Video File Upload Support (#11070)
* Added video upload support for OpenRouter

- Added VIDEO_URL content type to support video_url message format
- Implemented OpenRouter video encoding using base64 data URLs
- Extended encodeAndFormatVideos() to handle OpenRouter provider
- Updated UI to accept video uploads for OpenRouter (mp4, webm, mpeg, mov)
- Fixed case-sensitivity in provider detection for agents
- Made isDocumentSupportedProvider() and isOpenAILikeProvider() case-insensitive

Videos are now converted to data:video/mp4;base64,... format compatible
with OpenRouter's API requirements per their documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: rename multimodal and google_multimodal to the more descriptive image_document and image_document_video_audio

(google_multimodal was also no longer accurate, since this adds video and audio upload support for OpenRouter)

* fix: revert the .toLowerCase change to isOpenAILikeProvider and isDocumentSupportedProvider, which broke upload-to-provider detection for OpenAI endpoints

* wip: add audio support to openrouter

* fix: parse and send file types properly for OpenRouter instead of destructuring MIME types

* refactor: Omit to Exclude for ESLint

* feat: update DragDropModal for new openrouter support

* fix: special case openrouter for lower case provider

(The provider currently arrives as 'OpenRouter' while our enum value is 'openrouter'. Handling case insensitivity for all providers will probably require a larger refactor later, which will have to be thoroughly tested in its own isolated PR.)

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Dustin Healy <54083382+dustinhealy@users.noreply.github.com>
2025-12-25 13:23:29 -05:00

103 lines
3.4 KiB
TypeScript

import { Providers } from '@librechat/agents';
import { isDocumentSupportedProvider } from 'librechat-data-provider';
import type { IMongoFile } from '@librechat/data-schemas';
import type { ServerRequest, StrategyFunctions, AudioResult } from '~/types';
import { getFileStream, getConfiguredFileSizeLimit } from './utils';
import { validateAudio } from '~/files/validation';

/**
 * Encodes and formats audio files for different providers
 * @param req - The request object
 * @param files - Array of audio files
 * @param params - Object containing provider and optional endpoint
 * @param params.provider - The provider to format for (audio parts are emitted only for Google, Vertex AI, and OpenRouter; other providers receive file metadata only)
 * @param params.endpoint - Optional endpoint name for file config lookup
 * @param getStrategyFunctions - Function to get strategy functions
 * @returns Promise that resolves to the formatted audio parts and file metadata
 */
export async function encodeAndFormatAudios(
  req: ServerRequest,
  files: IMongoFile[],
  params: { provider: Providers; endpoint?: string },
  getStrategyFunctions: (source: string) => StrategyFunctions,
): Promise<AudioResult> {
  const { provider, endpoint } = params;
  if (!files?.length) {
    return { audios: [], files: [] };
  }

  const encodingMethods: Record<string, StrategyFunctions> = {};
  const result: AudioResult = { audios: [], files: [] };

  const results = await Promise.allSettled(
    files.map((file) => getFileStream(req, file, encodingMethods, getStrategyFunctions)),
  );

  for (const settledResult of results) {
    if (settledResult.status === 'rejected') {
      console.error('Audio processing failed:', settledResult.reason);
      continue;
    }

    const processed = settledResult.value;
    if (!processed) continue;

    const { file, content, metadata } = processed;

    if (!content || !file) {
      if (metadata) result.files.push(metadata);
      continue;
    }

    if (!file.type.startsWith('audio/') || !isDocumentSupportedProvider(provider)) {
      result.files.push(metadata);
      continue;
    }

    const audioBuffer = Buffer.from(content, 'base64');

    /** Extract configured file size limit from fileConfig for this endpoint */
    const configuredFileSizeLimit = getConfiguredFileSizeLimit(req, {
      provider,
      endpoint,
    });

    const validation = await validateAudio(
      audioBuffer,
      audioBuffer.length,
      provider,
      configuredFileSizeLimit,
    );

    if (!validation.isValid) {
      throw new Error(`Audio validation failed: ${validation.error}`);
    }

    if (provider === Providers.GOOGLE || provider === Providers.VERTEXAI) {
      result.audios.push({
        type: 'media',
        mimeType: file.type,
        data: content,
      });
    } else if (provider === Providers.OPENROUTER) {
      // Extract format from filename extension (e.g., 'audio.mp3' -> 'mp3')
      // OpenRouter expects format values like: wav, mp3, aiff, aac, ogg, flac, m4a, pcm16, pcm24
      // Note: MIME types don't always match (e.g., 'audio/mpeg' is mp3, not mpeg), so that is why we are using the file extension instead
      const format = file.filename.split('.').pop()?.toLowerCase();
      if (!format) {
        throw new Error(`Could not extract audio format from filename: ${file.filename}`);
      }

      result.audios.push({
        type: 'input_audio',
        input_audio: {
          data: content,
          format,
        },
      });
    }

    result.files.push(metadata);
  }

  return result;
}
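
The OpenRouter branch above derives `format` from the filename extension. One edge case to be aware of: `'audio'.split('.').pop()` returns `'audio'`, so a filename with no extension passes the `!format` check and is sent as the format. The sketch below shows one possible stricter variant. The helper name `extractAudioFormat` and the constant `OPENROUTER_AUDIO_FORMATS` are hypothetical and not part of LibreChat, and the allow-list is simply copied from the comment in the source, not taken from an authoritative list.

```typescript
// Assumption: this allow-list mirrors the formats named in the source comment;
// it is not an authoritative list of what OpenRouter accepts.
const OPENROUTER_AUDIO_FORMATS = new Set([
  'wav',
  'mp3',
  'aiff',
  'aac',
  'ogg',
  'flac',
  'm4a',
  'pcm16',
  'pcm24',
]);

/**
 * Hypothetical helper (not in the LibreChat codebase).
 * Returns the lowercased extension when the filename has one and it appears
 * in the allow-list; otherwise returns undefined.
 */
function extractAudioFormat(filename: string): string | undefined {
  const dotIndex = filename.lastIndexOf('.');
  // No dot at all, or a trailing dot, means there is no extension to read.
  if (dotIndex === -1 || dotIndex === filename.length - 1) {
    return undefined;
  }
  const extension = filename.slice(dotIndex + 1).toLowerCase();
  return OPENROUTER_AUDIO_FORMATS.has(extension) ? extension : undefined;
}
```

A caller could throw the same "Could not extract audio format" error when this returns `undefined`, which would keep the existing error path while also rejecting extensionless or non-audio filenames.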