* fix: Remove ephemeral cache control from document encoding function
* refactor: Improve document encoding types and add file context for anthropic messages api
- Added AnthropicDocumentBlock interface to define the structure for documents from the Anthropic provider.
- Updated encodeAndFormatDocuments function to utilize the new type and include optional context for filenames.
- Refactored DocumentResult to use a union type for various document formats, improving type safety and clarity.
- problem: `addImageUrls` had a side effect that was being leveraged before to populate both the `ocr` message field, now `fileContext`, and `client.options.attachments`, which would record the user's uploaded message attachments to the user message when saved to the database and returned at the end of the request lifecycle
- solution: created dedicated handling for file context, and made sure to populate `allFiles` with non-provider attachments
* 📎 feat: Direct Provider Attachment Support for Multimodal Content
* 📑 feat: Anthropic Direct Provider Upload (#9072)
* feat: implement Anthropic native PDF support with document preservation
- Add comprehensive debug logging throughout PDF processing pipeline
- Refactor attachment processing to separate image and document handling
- Create distinct addImageURLs(), addDocuments(), and processAttachments() methods
- Fix critical bugs in stream handling and parameter passing
- Add streamToBuffer utility for proper stream-to-buffer conversion
- Remove api/agents submodule from repository
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* chore: remove out of scope formatting changes
* fix: stop duplication of file in chat on end of response stream
* chore: bring back file search and ocr options
* chore: localize upload to provider string in file menu
* refactor: change createMenuItems args to fit new pattern introduced by anthropic-native-pdf-support
* feat: add cache point for pdfs processed by anthropic endpoint since they are unlikely to change and should benefit from caching
* feat: combine Upload Image into Upload to Provider since they both perform direct upload and change provider upload icon to reflect multimodal upload
* feat: add citations support according to docs
* refactor: remove redundant 'document' check since documents are handled properly by formatMessage in the agents repo now
* refactor: change upload logic so anthropic endpoint isn't exempted from normal upload path using Agents for consistency with the rest of the upload logic
* fix: include width and height in return from uploadLocalFile so images are correctly identified when going through an AgentUpload in addImageURLs
* chore: remove client specific handling since the direct provider stuff is handled by the agent client
* feat: handle documents in AgentClient so no need for change to agents repo
* chore: removed unused changes
* chore: remove auto generated comments from OG commit
* feat: add logic for agents to use direct to provider uploads if supported (currently just anthropic)
* fix: reintroduce role check to fix render error because of undefined value for Content Part
* fix: actually fix render bug by using proper isCreatedByUser check and making sure our mutation of formattedMessage.content is consistent
---------
Co-authored-by: Andres Restrepo <andres@thelinuxkid.com>
Co-authored-by: Claude <noreply@anthropic.com>
📁 feat: Send Attachments Directly to Provider (OpenAI) (#9098)
* refactor: change references from direct upload to direct attach to better reflect functionality
since we are just using base64 encoding strategy now rather than Files/File API for sending our attachments directly to the provider, the upload nomenclature no longer makes sense. direct_attach better describes the different methods of sending attachments to providers anyways even if we later introduce direct upload support
* feat: add upload to provider option for openai (and agent) ui
* chore: move anthropic pdf validator over to packages/api
* feat: simple pdf validation according to openai docs
* feat: add provider agnostic validatePdf logic to start handling multiple endpoints
* feat: add handling for openai specific documentPart formatting
* refactor: move require statement to proper place at top of file
* chore: add in openAI endpoint for the rest of the document handling logic
* feat: add direct attach support for azureOpenAI endpoint and agents
* feat: add pdf validation for azureOpenAI endpoint
* refactor: unify all the endpoint checks with isDocumentSupportedEndpoint
* refactor: consolidate Upload to Provider vs Upload image logic for clarity
* refactor: remove anthropic from anthropic_multimodal fileType since we support multiple providers now
🗂️ feat: Send Attachments Directly to Provider (Google) (#9100)
* feat: add validation for google PDFs and add google endpoint as a document supporting endpoint
* feat: add proper pdf formatting for google endpoints (requires PR #14 in agents)
* feat: add multimodal support for google endpoint attachments
* feat: add audio file svg
* fix: refactor attachments logic so multi-attachment messages work properly
* feat: add video file svg
* fix: allows for followup questions of uploaded multimodal attachments
* fix: remove incorrect final message filtering that was breaking Attachment component rendering
fix: manualy rename 'documents' to 'Documents' in git since it wasn't picked up due to case insensitivity in dir name
fix: add logic so filepicker for a google agent has proper filetype filtering
🛫 refactor: Move Encoding Logic to packages/api (#9182)
* refactor: move audio encode over to TS
* refactor: audio encoding now functional in LC again
* refactor: move video encode over to TS
* refactor: move document encode over to TS
* refactor: video encoding now functional in LC again
* refactor: document encoding now functional in LC again
* fix: extend file type options in AttachFileMenu to include 'google_multimodal' and update dependency array to include agent?.provider
* feat: only accept pdfs if responses api is enabled for openai convos
chore: address ESLint comments
chore: add missing audio mimetype
* fix: type safety for message content parts and improve null handling
* chore: reorder AttachFileMenuProps for consistency and clarity
* chore: import order in AttachFileMenu
* fix: improve null handling for text parts in parseTextParts function
* fix: remove no longer used unsupported capability error message for file uploads
* fix: OpenAI Direct File Attachment Format
* fix: update encodeAndFormatDocuments to support OpenAI responses API and enhance document result types
* refactor: broaden providers supported for documents
* feat: enhance DragDrop context and modal to support document uploads based on provider capabilities
* fix: reorder import statements for consistency in video encoding module
---------
Co-authored-by: Dustin Healy <54083382+dustinhealy@users.noreply.github.com>
* refactor: move `loadOCRConfig` from `packages/data-provider` to `packages/api` and return `undefined` if not explicitly configured
* fix: loadOCRConfig import from @librechat/api
* refactor: update defaultTextMimeTypes to support virtually all file types for text parsing
* fix: improve OCR capability check and error message for unsupported file types
* ci: remove unnecessary ocr expectation from AppService test
* fix: axios response logging for text parsing, remove console logging, remove jsdoc
* refactor: error logging in logAxiosError function to handle various error types with type guards
* refactor: enhance text parsing with improved error handling and async file reading
* refactor: replace synchronous file reading with asynchronous methods for improved performance and memory management
* ci: update tests
* 💻 feat: Add proxy configuration support for Mistral OCR API requests
* refactor: Implement proxy support for Mistral API requests using HttpsProxyAgent
* 🪶 feat: Add Support for Uploading Plaintext Files
feat: delineate between OCR and text handling in fileConfig field of config file
- also adds support for passing in mimetypes as just plain file extensions
feat: add showLabel bool to support future synthetic component DynamicDropdownInput
feat: add new combination dropdown-input component in params panel to support file type token limits
refactor: move hovercard to side to align with other hovercards
chore: clean up autogenerated comments
feat: add delineation to file upload path between text and ocr configured filetypes
feat: add token limit checks during file upload
refactor: move textParsing out of ocrEnabled logic
refactor: clean up types for filetype config
refactor: finish decoupling DynamicDropdownInput from fileTokenLimits
fix: move image token cost function into file to fix circular dependency causing unittest to fail and remove unused var for linter
chore: remove out of scope code following review
refactor: make fileTokenLimit conform to existing styles
chore: remove unused localization string
chore: undo changes to DynamicInput and other strays
feat: add fileTokenLimit to all provider config panels
fix: move textParsing back into ocr tool_resource block for now so that it doesn't interfere with other upload types
* 📤 feat: Add RAG API Endpoint Support for Text Parsing (#8849)
* feat: implement RAG API integration for text parsing with fallback to native parsing
* chore: remove TODO now that placeholder and fllback are implemented
* ✈️ refactor: Migrate Text Parsing to TS (#8892)
* refactor: move generateShortLivedToken to packages/api
* refactor: move textParsing logic into packages/api
* refactor: reduce nesting and dry code with createTextFile
* fix: add proper source handling
* fix: mock new parseText and parseTextNative functions in jest file
* ci: add test coverage for textParser
* 💬 feat: Add Audio File Support to Upload as Text (#8893)
* feat: add STT support for Upload as Text
* refactor: move processAudioFile to packages/api
* refactor: move textParsing from utils to files
* fix: remove audio/mp3 from unsupported mimetypes test since it is now supported
* ✂️ feat: Configurable File Token Limits and Truncation (#8911)
* feat: add configurable fileTokenLimit default value
* fix: add stt to fileConfig merge logic
* fix: add fileTokenLimit to mergeFileConfig logic so configurable value is actually respected from yaml
* feat: add token limiting to parsed text files
* fix: add extraction logic and update tests so fileTokenLimit isnt sent to LLM providers
* fix: address comments
* refactor: rename textTokenLimiter.ts to text.ts
* chore: update form-data package to address CVE-2025-7783 and update package-lock
* feat: use default supported mime types for ocr on frontend file validation
* fix: should be using logger.debug not console.debug
* fix: mock existsSync in text.spec.ts
* fix: mock logger rather than every one of its function calls
* fix: reorganize imports and streamline file upload processing logic
* refactor: update createTextFile function to use destructured parameters and improve readability
* chore: update file validation to use EToolResources for improved type safety
* chore: update import path for types in audio processing module
* fix: update file configuration access and replace console.debug with logger.debug for improved logging
---------
Co-authored-by: Dustin Healy <dustinhealy1@gmail.com>
Co-authored-by: Dustin Healy <54083382+dustinhealy@users.noreply.github.com>
* WIP: app.locals refactoring
WIP: appConfig
fix: update memory configuration retrieval to use getAppConfig based on user role
fix: update comment for AppConfig interface to clarify purpose
🏷️ refactor: Update tests to use getAppConfig for endpoint configurations
ci: Update AppService tests to initialize app config instead of app.locals
ci: Integrate getAppConfig into remaining tests
refactor: Update multer storage destination to use promise-based getAppConfig and improve error handling in tests
refactor: Rename initializeAppConfig to setAppConfig and update related tests
ci: Mock getAppConfig in various tests to provide default configurations
refactor: Update convertMCPToolsToPlugins to use mcpManager for server configuration and adjust related tests
chore: rename `Config/getAppConfig` -> `Config/app`
fix: streamline OpenAI image tools configuration by removing direct appConfig dependency and using function parameters
chore: correct parameter documentation for imageOutputType in ToolService.js
refactor: remove `getCustomConfig` dependency in config route
refactor: update domain validation to use appConfig for allowed domains
refactor: use appConfig registration property
chore: remove app parameter from AppService invocation
refactor: update AppConfig interface to correct registration and turnstile configurations
refactor: remove getCustomConfig dependency and use getAppConfig in PluginController, multer, and MCP services
refactor: replace getCustomConfig with getAppConfig in STTService, TTSService, and related files
refactor: replace getCustomConfig with getAppConfig in Conversation and Message models, update tempChatRetention functions to use AppConfig type
refactor: update getAppConfig calls in Conversation and Message models to include user role for temporary chat expiration
ci: update related tests
refactor: update getAppConfig call in getCustomConfigSpeech to include user role
fix: update appConfig usage to access allowedDomains from actions instead of registration
refactor: enhance AppConfig to include fileStrategies and update related file strategy logic
refactor: update imports to use normalizeEndpointName from @librechat/api and remove redundant definitions
chore: remove deprecated unused RunManager
refactor: get balance config primarily from appConfig
refactor: remove customConfig dependency for appConfig and streamline loadConfigModels logic
refactor: remove getCustomConfig usage and use app config in file citations
refactor: consolidate endpoint loading logic into loadEndpoints function
refactor: update appConfig access to use endpoints structure across various services
refactor: implement custom endpoints configuration and streamline endpoint loading logic
refactor: update getAppConfig call to include user role parameter
refactor: streamline endpoint configuration and enhance appConfig usage across services
refactor: replace getMCPAuthMap with getUserMCPAuthMap and remove unused getCustomConfig file
refactor: add type annotation for loadedEndpoints in loadEndpoints function
refactor: move /services/Files/images/parse to TS API
chore: add missing FILE_CITATIONS permission to IRole interface
refactor: restructure toolkits to TS API
refactor: separate manifest logic into its own module
refactor: consolidate tool loading logic into a new tools module for startup logic
refactor: move interface config logic to TS API
refactor: migrate checkEmailConfig to TypeScript and update imports
refactor: add FunctionTool interface and availableTools to AppConfig
refactor: decouple caching and DB operations from AppService, make part of consolidated `getAppConfig`
WIP: fix tests
* fix: rebase conflicts
* refactor: remove app.locals references
* refactor: replace getBalanceConfig with getAppConfig in various strategies and middleware
* refactor: replace appConfig?.balance with getBalanceConfig in various controllers and clients
* test: add balance configuration to titleConvo method in AgentClient tests
* chore: remove unused `openai-chat-tokens` package
* chore: remove unused imports in initializeMCPs.js
* refactor: update balance configuration to use getAppConfig instead of getBalanceConfig
* refactor: integrate configMiddleware for centralized configuration handling
* refactor: optimize email domain validation by removing unnecessary async calls
* refactor: simplify multer storage configuration by removing async calls
* refactor: reorder imports for better readability in user.js
* refactor: replace getAppConfig calls with req.config for improved performance
* chore: replace getAppConfig calls with req.config in tests for centralized configuration handling
* chore: remove unused override config
* refactor: add configMiddleware to endpoint route and replace getAppConfig with req.config
* chore: remove customConfig parameter from TTSService constructor
* refactor: pass appConfig from request to processFileCitations for improved configuration handling
* refactor: remove configMiddleware from endpoint route and retrieve appConfig directly in getEndpointsConfig if not in `req.config`
* test: add mockAppConfig to processFileCitations tests for improved configuration handling
* fix: pass req.config to hasCustomUserVars and call without await after synchronous refactor
* fix: type safety in useExportConversation
* refactor: retrieve appConfig using getAppConfig in PluginController and remove configMiddleware from plugins route, to avoid always retrieving when plugins are cached
* chore: change `MongoUser` typedef to `IUser`
* fix: Add `user` and `config` fields to ServerRequest and update JSDoc type annotations from Express.Request to ServerRequest
* fix: remove unused setAppConfig mock from Server configuration tests
* chore: Handle optional token_endpoint in OAuth metadata discovery
* chore: Simplify permission typing logic in checkAccess function
* feat: Implement `deleteMistralFile` function and integrate file cleanup in `uploadMistralOCR`
* feat: Enhance loadServiceKey to support stringified JSON input
* chore: Update GOOGLE_SERVICE_KEY_FILE_PATH to GOOGLE_SERVICE_KEY_FILE for consistency
* Implemented new uploadGoogleVertexMistralOCR function for processing OCR using Google Vertex AI.
* Added vertexMistralOCRStrategy to handle file uploads.
* Updated FileSources and OCRStrategy enums to include vertexai_mistral_ocr.
* Introduced helper functions for JWT creation and Google service account configuration loading.
* 🔧 fix: enhance client options handling in AgentClient and set default recursion limit
- Updated the recursion limit to default to 25 if not specified in agentsEConfig.
- Enhanced client options in AgentClient to include model parameters such as apiKey and anthropicApiUrl from agentModelParams.
- Updated requestOptions in the anthropic endpoint to use reverseProxyUrl as anthropicApiUrl.
* Enhance LLM configuration tests with edge case handling
* chore add return type annotation for getCustomEndpointConfig function
* fix: update modelOptions handling to use optional chaining and default to empty object in multiple endpoint initializations
* chore: update @librechat/agents to version 2.4.42
* refactor: streamline agent endpoint configuration and enhance client options handling for title generations
- Introduced a new `getProviderConfig` function to centralize provider configuration logic.
- Updated `AgentClient` to utilize the new provider configuration, improving clarity and maintainability.
- Removed redundant code related to endpoint initialization and model parameter handling.
- Enhanced error logging for missing endpoint configurations.
* fix: add abort handling for image generation and editing in OpenAIImageTools
* ci: enhance getLLMConfig tests to verify fetchOptions and dispatcher properties
* fix: use optional chaining for endpointOption properties in getOptions
* fix: increase title generation timeout from 25s to 45s, pass `endpointOption` to `getOptions`
* fix: update file filtering logic in getToolFilesByIds to ensure text field is properly checked
* fix: add error handling for empty OCR results in uploadMistralOCR and uploadAzureMistralOCR
* fix: enhance error handling in file upload to include 'No OCR result' message
* chore: update error messages in uploadMistralOCR and uploadAzureMistralOCR
* fix: enhance filtering logic in getToolFilesByIds to include context checks for OCR resources to only include files directly attached to agent
---------
Co-authored-by: Matt Burnett <matt.burnett@shopify.com>
* 🧹 chore: Remove Comments and Cleanup base64 handling for Azure Mistral OCR
* chore: Remove unnecessary await from MCP instructions formatting in AgentClient
* ci: Update document_url regex in MistralOCR tests to support PDF format
* 👁️ feat: Add Azure Mistral OCR strategy and endpoint integration
This commit introduces a new OCR strategy named 'azure_mistral_ocr', allowing the use of a Mistral OCR endpoint deployed on Azure. The configuration, schemas, and file upload strategies have been updated to support this integration, enabling seamless OCR processing via Azure-hosted Mistral services.
* 🗑️ chore: Clean up .gitignore by removing commented-out uncommon directory name
* chore: remove unused vars
* refactor: Move createAxiosInstance to packages/api/utils and update imports
- Removed the createAxiosInstance function from the config module and relocated it to a new utils module for better organization.
- Updated import paths in relevant files to reflect the new location of createAxiosInstance.
- Added tests for createAxiosInstance to ensure proper functionality and proxy configuration handling.
* chore: move axios helpers to packages/api
- Added logAxiosError function to @librechat/api for centralized error logging.
- Updated imports across various files to use the new logAxiosError function.
- Removed the old axios.js utility file as it is no longer needed.
* chore: Update Jest moduleNameMapper for improved path resolution
- Added a new mapping for '~/' to resolve module paths in Jest configuration, enhancing import handling for the project.
* feat: Implement Mistral OCR API integration in TS
* chore: Update MistralOCR tests based on new imports
* fix: Enhance MistralOCR configuration handling and tests
- Introduced helper functions for resolving configuration values from environment variables or hardcoded settings.
- Updated the uploadMistralOCR and uploadAzureMistralOCR functions to utilize the new configuration resolution logic.
- Improved test cases to ensure correct behavior when mixing environment variables and hardcoded values.
- Mocked file upload and signed URL responses in tests to validate functionality without external dependencies.
* feat: Enhance MistralOCR functionality with improved configuration and error handling
- Introduced helper functions for loading authentication configuration and resolving values from environment variables.
- Updated uploadMistralOCR and uploadAzureMistralOCR functions to utilize the new configuration logic.
- Added utility functions for processing OCR results and creating error messages.
- Improved document type determination and result aggregation for better OCR processing.
* refactor: Reorganize OCR type imports in Mistral CRUD file
- Moved OCRResult, OCRResultPage, and OCRImage imports to a more logical grouping for better readability and maintainability.
* feat: Add file exports to API and create files index
* chore: Update OCR types for enhanced structure and clarity
- Redesigned OCRImage interface to include mandatory fields and improved naming conventions.
- Added PageDimensions interface for better representation of page metrics.
- Updated OCRResultPage to include dimensions and mandatory images array.
- Refined OCRResult to include document annotation and usage information.
* refactor: use TS counterpart of uploadOCR methods
* ci: Update MistralOCR tests to reflect new OCR result structure
* chore: Bump version of @librechat/api to 1.2.3 in package.json and package-lock.json
* chore: Update CONFIG_VERSION to 1.2.8
* chore: remove unused sendEvent function from config module (now imported from '@librechat/api')
* chore: remove MistralOCR service files and tests (now in '@librechat/api')
* ci: update logger import in ModelService tests to use @librechat/data-schemas
---------
Co-authored-by: arthurolivierfortin <arthurolivier.fortin@gmail.com>