Commit graph

33 commits

Author SHA1 Message Date
Danny Avila
e442984364
💣 fix: Harden against falsified ZIP metadata in ODT parsing (#12320)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Publish `@librechat/client` to NPM / build-and-publish (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
* security: replace JSZip metadata guard with yauzl streaming decompression

The ODT decompressed-size guard was checking JSZip's private
_data.uncompressedSize fields, which are populated from the ZIP central
directory — attacker-controlled metadata. A crafted ODT with falsified
uncompressedSize values bypassed the 50MB cap entirely, allowing
content.xml decompression to exhaust Node.js heap memory (DoS).

Replace JSZip with yauzl for ODT extraction. The new extractOdtContentXml
function uses yauzl's streaming API: it lazily iterates ZIP entries,
opens a decompression stream for content.xml, and counts real bytes as
they arrive from the inflate stream. The stream is destroyed the moment
the byte count crosses ODT_MAX_DECOMPRESSED_SIZE, aborting the inflate
before the full payload is materialised in memory.

- Remove jszip from direct dependencies (still transitive via mammoth)
- Add yauzl + @types/yauzl
- Update zip-bomb test to verify streaming abort with DEFLATE payload

* fix: close file descriptor leaks and declare jszip test dependency

- Use a shared `finish()` helper in extractOdtContentXml that calls
  zipfile.close() on every exit path (success, size cap, missing entry,
  openReadStream errors, zipfile errors). Without this, any error path
  leaked one OS file descriptor permanently — uploading many malformed
  ODTs could exhaust the process FD limit (a distinct DoS vector).
- Add jszip to devDependencies so the zip-bomb test has an explicit
  dependency rather than relying on mammoth's transitive jszip.
- Update JSDoc to document that all exit paths close the zipfile.

* fix: move yauzl from dependencies to peerDependencies

Matches the established pattern for runtime parser libraries in
packages/api: mammoth, pdfjs-dist, and xlsx are all peerDependencies
(provided by the consuming /api workspace) with devDependencies for
testing. yauzl was incorrectly placed in dependencies.

* fix: add yauzl to /api dependencies to satisfy peer dep

packages/api declares yauzl as a peerDependency; /api is the consuming
workspace that must provide it at runtime, matching the pattern used
for mammoth, pdfjs-dist, and xlsx.
2026-03-19 22:13:40 -04:00
Pol Burkardt Freire
7e74165c3c
📖 feat: Add Native ODT Document Parser Support (#12303)
* fix: add ODT support to native document parser

* fix: replace execSync with jszip for ODT parsing

* docs: update documentParserMimeTypes comment to include odt

* fix: improve ODT XML extraction and add empty.odt fixture

- Scope extraction to <office:body> to exclude metadata/style nodes
- Map </text:p> and </text:h> closings to newlines, preserving paragraph
  structure instead of collapsing everything to a single line
- Handle <text:line-break/> as explicit newlines
- Strip remaining tags, normalize horizontal whitespace, cap consecutive
  blank lines at one
- Regenerate sample.odt as a two-paragraph fixture so the test exercises
  multi-paragraph output
- Add empty.odt fixture and test asserting 'No text found in document'

* fix: address review findings in ODT parser

- Use static `import JSZip from 'jszip'` instead of dynamic import;
  jszip is CommonJS-only with no ESM/Jest-isolation concern (F1)
- Decode the five standard XML entities after tag-stripping so
  documents with &, <, >, ", ' send correct text to the LLM (F2)
- Remove @types/jszip devDependency; jszip ships bundled declarations
  and @types/jszip is a stale 2020 stub that would shadow them (F3)
- Handle <text:tab/> → \t and <text:s .../> → ' ' before the generic
  tag stripper so tab-aligned and multi-space content is preserved (F4)
- Add sample-entities.odt fixture and test covering entity decoding,
  tab, and spacing-element handling (F5)
- Rename 'throws for empty odt' → 'throws for odt with no extractable
  text' to distinguish from a zero-byte/corrupt file case (F8)

* fix: add decompressed content size cap to odtToText (F6)

Reads uncompressed entry sizes from the JSZip internal metadata before
extracting any content. Throws if the total exceeds 50MB, preventing a
crafted ODT with a high-ratio compressed payload from exhausting heap.

Adds a corresponding test using a real DEFLATE-compressed ZIP (~51KB on
disk, 51MB uncompressed) to verify the guard fires before any extraction.

* fix: add java to codeTypeMapping for file upload support

.java files were rejected with "Unable to determine file type" because
browsers send an empty MIME type for them and codeTypeMapping had no
'java' entry for inferMimeType() to fall back on.

text/x-java was already present in all five validation lists
(fullMimeTypesList, codeInterpreterMimeTypesList, retrievalMimeTypesList,
textMimeTypes, retrievalMimeTypes), so mapping to it (not text/plain)
ensures .java uploads work for both File Search and Code Interpreter.

Closes #12307

* fix: address follow-up review findings (A-E)

A: regenerate package-lock.json after removing @types/jszip from
   package.json; without this npm ci was still installing the stale
   2020 type stubs and TypeScript was resolving against them
B: replace dynamic import('jszip') in the zip-bomb test with the same
   static import already used in production; jszip is CJS-only with no
   ESM/Jest isolation concern
C: document that the _data.uncompressedSize guard fails open if jszip
   renames the private field (accepted limitation, test would catch it)
D: rename 'preserves tabs' test to 'normalizes tab and spacing elements
   to spaces' since <text:tab> is collapsed to a space, not kept as \t
E: fix test.each([ formatting artifact (missing newline after '[')

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-19 15:49:52 -04:00
Danny Avila
68435cdcd0
🧯 fix: Add Pre-Parse File Size Guard to Document Parser (#12275)
Prevent memory exhaustion DoS by rejecting documents exceeding 15MB
before reading them into memory, closing the gap between the 512MB
upload limit and unbounded in-memory parsing.
2026-03-17 02:36:18 -04:00
Danny Avila
c6982dc180
🛡️ fix: Agent Permission Check on Image Upload Route (#12219)
* fix: add agent permission check to image upload route

* refactor: remove unused SystemRoles import and format test file for clarity

* fix: address review findings for image upload agent permission check

* refactor: move agent upload auth logic to TypeScript in packages/api

Extract pure authorization logic from agentPermCheck.js into
checkAgentUploadAuth() in packages/api/src/files/agentUploadAuth.ts.
The function returns a structured result ({ allowed, status, error })
instead of writing HTTP responses directly, eliminating the dual
responsibility and confusing sentinel return value. The JS wrapper
in /api is now a thin adapter that translates the result to HTTP.

* test: rewrite image upload permission tests as integration tests

Replace mock-heavy images-agent-perm.spec.js with integration tests
using MongoMemoryServer, real models, and real PermissionService.
Follows the established pattern in files.agents.test.js. Moves test
to sibling location (images.agents.test.js) matching backend convention.
Adds temp file cleanup assertions on 403/404 responses and covers
message_file exemption paths (boolean true, string "true", false).

* fix: widen AgentUploadAuthDeps types to accept ObjectId from Mongoose

The injected getAgent returns Mongoose documents where _id and author
are Types.ObjectId at runtime, not string. Widen the DI interface to
accept string | Types.ObjectId for _id, author, and resourceId so the
contract accurately reflects real callers.

* chore: move agent upload auth into files/agents/ subdirectory

* refactor: delete agentPermCheck.js wrapper, move verifyAgentUploadPermission to packages/api

The /api-only dependencies (getAgent, checkPermission) are now passed
as object-field params from the route call sites. Both images.js and
files.js import verifyAgentUploadPermission from @librechat/api and
inject the deps directly, eliminating the intermediate JS wrapper.

* style: fix import type ordering in agent upload auth

* fix: prevent token TTL race in MCPTokenStorage.storeTokens

When expires_in is provided, use it directly instead of round-tripping
through Date arithmetic. The previous code computed accessTokenExpiry
as a Date, then after an async encryptV2 call, recomputed expiresIn by
subtracting Date.now(). On loaded CI runners the elapsed time caused
Math.floor to truncate to 0, triggering the 1-year fallback and making
the token appear permanently valid — so refresh never fired.
2026-03-14 02:57:56 -04:00
Danny Avila
3b84cc048a
🧮 fix: XLSX/XLS Upload-as-Text via Buffer-Based SheetJS Parsing (#12098)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
* 🔧 fix: Update Excel sheet parsing to use fs.promises.readFile and correct import for xlsx

- Modified the excelSheetToText function to read the file using fs.promises.readFile instead of directly accessing the file path.
- Updated the import statement for the xlsx library to use the correct read method, ensuring proper functionality in parsing Excel sheets.

* 🔧 fix: Update document parsing methods to use buffer for file reading

- Modified the wordDocToText function to read the file as a buffer using fs.promises.readFile, ensuring compatibility with the mammoth library.
- Updated the excelSheetToText function to read the Excel file as a buffer, addressing issues with the xlsx library's handling of dynamic imports and file access.

* feat: Add tests for empty xlsx document parsing and validate xlsx imports

- Introduced a new test case to verify that the `parseDocument` function correctly handles an empty xlsx file with only a sheet name, ensuring it returns the expected document structure.
- Added a test to confirm that the `xlsx` library exports `read` and `utils` as named imports, validating the functionality of the library integration.
- Included a new empty xlsx file to support the test cases.
2026-03-06 00:21:55 -05:00
Danny Avila
046e92217f
🧩 feat: OpenDocument Format File Upload and Native ODS Parsing (#11959)
*  feat: Add support for OpenDocument MIME types in file configuration

Updated the applicationMimeTypes regex to include support for OASIS OpenDocument formats, enhancing the file type recognition capabilities of the data provider.

* feat: document processing with OpenDocument support

Added support for OpenDocument Spreadsheet (ODS) MIME type in the file processing service and updated the document parser to handle ODS files. Included tests to verify correct parsing of ODS documents and updated file configuration to recognize OpenDocument formats.

* refactor: Enhance document processing to support additional Excel MIME types

Updated the document processing logic to utilize a regex for matching Excel MIME types, improving flexibility in handling various Excel file formats. Added tests to ensure correct parsing of new MIME types, including multiple Excel variants and OpenDocument formats. Adjusted file configuration to include these MIME types for better recognition in the file processing service.

* feat: Add support for additional OpenDocument MIME types in file processing

Enhanced the document processing service to support ODT, ODP, and ODG MIME types. Updated tests to verify correct routing through the OCR strategy for these new formats. Adjusted documentation to reflect changes in handled MIME types for improved clarity.
2026-02-26 14:39:49 -05:00
Dustin Healy
1d0a4c501f
🪨 feat: AWS Bedrock Document Uploads (#11912)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
* feat: add aws bedrock upload to provider support

* chore: address copilot comments

* feat: add shared Bedrock document format types and MIME mapping

Bedrock Converse API accepts 9 document formats beyond PDF. Add
BedrockDocumentFormat union type, MIME-to-format mapping, and helpers
in data-provider so both client and backend can reference them.

* refactor: generalize Bedrock PDF validation to support all document types

Rename validateBedrockPdf to validateBedrockDocument with MIME-aware
logic: 4.5MB hard limit applies to all types, PDF header check only
runs for application/pdf. Adds test coverage for non-PDF documents.

* feat: support all Bedrock document formats in encoding pipeline

Widen file type gates to accept csv, doc, docx, xls, xlsx, html, txt,
md for Bedrock. Uses shared MIME-to-format map instead of hardcoded
'pdf'. Other providers' PDF-only paths remain unchanged.

* feat: expand Bedrock file upload UI to accept all document types

Add 'image_document_extended' upload type for Bedrock with accept
filters for all 9 supported formats. Update drag-and-drop validation
to use isBedrockDocumentType helper.

* fix: route Bedrock document types through provider pipeline
2026-02-23 22:32:44 -05:00
Danny Avila
7ce898d6a0
📄 feat: Local Text Extraction for PDF, DOCX, and XLS/XLSX (#11900)
* feat: Added "document parser" OCR strategy

The document parser uses libraries to parse the text out of known document types.
This lets LibreChat handle some complex document types without having to use a
secondary service (like Mistral or standing up a RAG API server).

To enable the document parser, set the ocr strategy to "document_parser" in
librechat.yaml.

We now support:

- PDFs using pdfjs
- DOCX using mammoth
- XLS/XLSX using SheetJS

(The associated packages were also added to the project.)

* fix: applied Copilot code review suggestions

- Properly calculate length of text based on UTF8.

- Avoid issues with loading / blocking PDF parsing.

* fix: improved docs on parseDocument()

* chore: move to packages/api for TS support

* refactor: make document processing the default ocr strategy

- Introduced support for additional document types in the OCR strategy, including PDF, DOCX, and XLS/XLSX.
- Updated the file upload handling to dynamically select the appropriate parsing strategy based on the file type.
- Refactored the document parsing functions to use asynchronous imports for improved performance and maintainability.

* test: add unit tests for processAgentFileUpload functionality

- Introduced a new test suite for the processAgentFileUpload function in process.spec.js.
- Implemented various test cases to validate OCR strategy selection based on file types, including PDF, DOCX, XLSX, and XLS.
- Mocked dependencies to ensure isolated testing of file upload handling and strategy selection logic.
- Enhanced coverage for scenarios involving OCR capability checks and default strategy fallbacks.

* chore: update pdfjs-dist version and enhance document parsing tests

- Bumped pdfjs-dist dependency to version 5.4.624 in both api and packages/api.
- Refactored document parsing tests to use 'originalname' instead of 'filename' for file objects.
- Added a new test case for parsing XLS files to improve coverage of document types supported by the parser.
- Introduced a sample XLS file for testing purposes.

* feat: enforce text size limit and improve OCR fallback handling in processAgentFileUpload

- Added a check to ensure extracted text does not exceed the 15MB storage limit, throwing an error if it does.
- Refactored the OCR handling logic to improve fallback behavior when the configured OCR fails, ensuring a more robust document processing flow.
- Enhanced unit tests to cover scenarios for oversized text and fallback mechanisms, ensuring proper error handling and functionality.

* fix: correct OCR URL construction in performOCR function

- Updated the OCR URL construction to ensure it correctly appends '/ocr' to the base URL if not already present, improving the reliability of the OCR request.

---------

Co-authored-by: Dan Lew <daniel@mightyacorn.com>
2026-02-22 14:22:45 -05:00
ethanlaj
2513e0a423
🔧 feat: deleteRagFile utility for Consistent RAG API document deletion (#11493)
* 🔧 feat: Implement deleteRagFile utility for RAG API document deletion across storage strategies

* chore: import order

* chore: import order & remove unnecessary comments

---------

Co-authored-by: Danny Avila <danacordially@gmail.com>
2026-02-14 13:57:01 -05:00
papasaidfine
4fe223eedd
🎞️ feat: OpenRouter Audio/Video File Upload Support (#11070)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
* Added video upload support for OpenRouter

- Added VIDEO_URL content type to support video_url message format
- Implemented OpenRouter video encoding using base64 data URLs
- Extended encodeAndFormatVideos() to handle OpenRouter provider
- Updated UI to accept video uploads for OpenRouter (mp4, webm, mpeg, mov)
- Fixed case-sensitivity in provider detection for agents
- Made isDocumentSupportedProvider() and isOpenAILikeProvider() case-insensitive

Videos are now converted to data:video/mp4;base64,... format compatible
with OpenRouter's API requirements per their documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: change multimodal and google_multimodal to more transparent variable names of image_document and image_document_video_audio

(also google_multimodal doesn't apply as much since we are adding support for video and audio uploads for open router)

* fix: revert .toLowerCase change to isOpenAILikeProvider and isDocumentSupportedProvider which broke upload to provider detection for openAI endpoints

* wip: add audio support to openrouter

* fix: filetypes now properly parsed and sent rather than destructured mimetypes for openrouter

* refactor: Omit to Exclude for ESLint

* feat: update DragDropModal for new openrouter support

* fix: special case openrouter for lower case provider

(currently getting issues with the provider coming in as 'OpenRouter' and our enum being 'openrouter') This will probably require a larger refactor later to handle case insensitivity for all providers, but that will have to be thoroughly tested in its own isolated PR

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Dustin Healy <54083382+dustinhealy@users.noreply.github.com>
2025-12-25 13:23:29 -05:00
rossbg
959984f959
⏱️ fix: Increase RAG API Text Parsing Timeout (#10562)
* fix: increase RAG API text parsing timeout for large files

* ci: Update text.spec.ts

---------

Co-authored-by: Rosen Simov <rosen.simov@endurosat.com>
Co-authored-by: Danny Avila <danny@librechat.ai>
2025-11-25 14:54:53 -05:00
Dustin Healy
dfcaff9b00
📷 fix: Use 'media' type for Google multimodal attachments (#10586)
Some checks failed
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Has been cancelled
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Has been cancelled
* fix: change google multimodal attachments to use type: 'media'

* chore: Update @librechat/agents to version 3.0.27 in package.json and package-lock.json

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2025-11-19 18:31:05 -05:00
Danny Avila
937563f645
🖼️ feat: File Size and MIME Type Filtering at Agent level (#10446)
* refactor: add image file size validation as part of payload build

* feat: implement file size and MIME type filtering in endpoint configuration

* chore: import order
2025-11-10 21:36:48 -05:00
Danny Avila
2524d33362
📂 refactor: Cleanup File Filtering Logic, Improve Validation (#10414)
* feat: add filterFilesByEndpointConfig to filter disabled file processing by provider

* chore: explicit define of endpointFileConfig for better debugging

* refactor: move `normalizeEndpointName` to data-provider as used app-wide

* chore: remove overrideEndpoint from useFileHandling

* refactor: improve endpoint file config selection

* refactor: update filterFilesByEndpointConfig to accept structured parameters and improve endpoint file config handling

* refactor: replace defaultFileConfig with getEndpointFileConfig for improved file configuration handling across components

* test: add comprehensive unit tests for getEndpointFileConfig to validate endpoint configuration handling

* refactor: streamline agent endpoint assignment and improve file filtering logic

* feat: add error handling for disabled file uploads in endpoint configuration

* refactor: update encodeAndFormat functions to accept structured parameters for provider and endpoint

* refactor: streamline requestFiles handling in initializeAgent function

* fix: getEndpointFileConfig partial config merging scenarios

* refactor: enhance mergeWithDefault function to support document-supported providers with comprehensive MIME types

* refactor: user-configured default file config in getEndpointFileConfig

* fix: prevent file handling when endpoint is disabled and file is dragged to chat

* refactor: move `getEndpointField` to `data-provider` and update usage across components and hooks

* fix: prioritize endpointType based on agent.endpoint in file filtering logic

* fix: prioritize agent.endpoint in file filtering logic and remove unnecessary endpointType defaulting
2025-11-10 19:05:30 -05:00
Danny Avila
360ec22964
⚗️ refactor: Provider File Validation with Configurable Size Limits (#10405)
Some checks failed
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Has been cancelled
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Has been cancelled
* chore: correct type for ServerRequest

* chore: improve ServerRequest typing across several modules

* feat: Add PDF configured limit validation

- Introduced comprehensive tests for PDF validation across multiple providers, ensuring correct behavior for file size limits and edge cases.
- Enhanced the `validatePdf` function to accept an optional configured file size limit, allowing for stricter validation based on user configurations.
- Updated related functions to utilize the new validation logic, ensuring consistent behavior across different providers.

* chore: Update Request type to ServerRequest in audio and video encoding modules

* refactor: move `getConfiguredFileSizeLimit` utility

* feat: Add video and audio validation with configurable size limits

- Introduced `validateVideo` and `validateAudio` functions to validate media files against provider-specific size limits.
- Enhanced validation logic to consider optional configured file size limits, allowing for more flexible file handling.
- Added comprehensive tests for video and audio validation across different providers, ensuring correct behavior for various scenarios.

* refactor: Update PDF and media validation to allow higher configured limits

- Modified validation logic to accept user-configured file size limits that exceed provider defaults, ensuring correct acceptance of files within the specified range.
- Updated tests to reflect changes in validation behavior, confirming that files are accepted when within the configured limits.
- Enhanced documentation in tests to clarify expected outcomes with the new validation rules.

* chore: Add @types/node-fetch dependency to package.json and package-lock.json

- Included the @types/node-fetch package to enhance type definitions for node-fetch usage.
- Updated package-lock.json to reflect the addition of the new dependency.

* fix: Rename FileConfigInput to TFileConfig
2025-11-07 10:57:15 -05:00
Danny Avila
f59daaeecc
📄 feat: Context Field for Anthropic Documents (PDF) (#10148)
* fix: Remove ephemeral cache control from document encoding function

* refactor: Improve document encoding types and add file context for anthropic messages api

- Added AnthropicDocumentBlock interface to define the structure for documents from the Anthropic provider.
- Updated encodeAndFormatDocuments function to utilize the new type and include optional context for filenames.
- Refactored DocumentResult to use a union type for various document formats, improving type safety and clarity.
2025-10-16 16:24:14 -04:00
Danny Avila
07d0abc9fd
🖼️ fix: Extract File Context & Persist Attachments (#10069)
- problem: `addImageUrls` had a side effect that was being leveraged before to populate both the `ocr` message field, now `fileContext`, and `client.options.attachments`, which would record the user's uploaded message attachments to the user message when saved to the database and returned at the end of the request lifecycle
- solution: created dedicated handling for file context, and made sure to populate `allFiles` with non-provider attachments
2025-10-10 05:35:37 -04:00
Danny Avila
bcd97aad2f
📎 feat: Direct Provider Attachment Support for Multimodal Content (#9994)
Some checks failed
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Has been cancelled
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Has been cancelled
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Has been cancelled
* 📎 feat: Direct Provider Attachment Support for Multimodal Content

* 📑 feat: Anthropic Direct Provider Upload (#9072)

* feat: implement Anthropic native PDF support with document preservation

- Add comprehensive debug logging throughout PDF processing pipeline
- Refactor attachment processing to separate image and document handling
- Create distinct addImageURLs(), addDocuments(), and processAttachments() methods
- Fix critical bugs in stream handling and parameter passing
- Add streamToBuffer utility for proper stream-to-buffer conversion
- Remove api/agents submodule from repository

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: remove out of scope formatting changes

* fix: stop duplication of file in chat on end of response stream

* chore: bring back file search and ocr options

* chore: localize upload to provider string in file menu

* refactor: change createMenuItems args to fit new pattern introduced by anthropic-native-pdf-support

* feat: add cache point for pdfs processed by anthropic endpoint since they are unlikely to change and should benefit from caching

* feat: combine Upload Image into Upload to Provider since they both perform direct upload and change provider upload icon to reflect multimodal upload

* feat: add citations support according to docs

* refactor: remove redundant 'document' check since documents are handled properly by formatMessage in the agents repo now

* refactor: change upload logic so anthropic endpoint isn't exempted from normal upload path using Agents for consistency with the rest of the upload logic

* fix: include width and height in return from uploadLocalFile so images are correctly identified when going through an AgentUpload in addImageURLs

* chore: remove client specific handling since the direct provider stuff is handled by the agent client

* feat: handle documents in AgentClient so no need for change to agents repo

* chore: removed unused changes

* chore: remove auto generated comments from OG commit

* feat: add logic for agents to use direct to provider uploads if supported (currently just anthropic)

* fix: reintroduce role check to fix render error because of undefined value for Content Part

* fix: actually fix render bug by using proper isCreatedByUser check and making sure our mutation of formattedMessage.content is consistent

---------

Co-authored-by: Andres Restrepo <andres@thelinuxkid.com>
Co-authored-by: Claude <noreply@anthropic.com>

📁 feat: Send Attachments Directly to Provider (OpenAI) (#9098)

* refactor: change references from direct upload to direct attach to better reflect functionality

since we are just using base64 encoding strategy now rather than Files/File API for sending our attachments directly to the provider, the upload nomenclature no longer makes sense. direct_attach better describes the different methods of sending attachments to providers anyways even if we later introduce direct upload support

* feat: add upload to provider option for openai (and agent) ui

* chore: move anthropic pdf validator over to packages/api

* feat: simple pdf validation according to openai docs

* feat: add provider agnostic validatePdf logic to start handling multiple endpoints

* feat: add handling for openai specific documentPart formatting

* refactor: move require statement to proper place at top of file

* chore: add in openAI endpoint for the rest of the document handling logic

* feat: add direct attach support for azureOpenAI endpoint and agents

* feat: add pdf validation for azureOpenAI endpoint

* refactor: unify all the endpoint checks with isDocumentSupportedEndpoint

* refactor: consolidate Upload to Provider vs Upload image logic for clarity

* refactor: remove anthropic from anthropic_multimodal fileType since we support multiple providers now

🗂️ feat: Send Attachments Directly to Provider (Google) (#9100)

* feat: add validation for google PDFs and add google endpoint as a document supporting endpoint

* feat: add proper pdf formatting for google endpoints (requires PR #14 in agents)

* feat: add multimodal support for google endpoint attachments

* feat: add audio file svg

* fix: refactor attachments logic so multi-attachment messages work properly

* feat: add video file svg

* fix: allows for followup questions of uploaded multimodal attachments

* fix: remove incorrect final message filtering that was breaking Attachment component rendering

fix: manualy rename 'documents' to 'Documents' in git since it wasn't picked up due to case insensitivity in dir name

fix: add logic so filepicker for a google agent has proper filetype filtering

🛫 refactor: Move Encoding Logic to packages/api (#9182)

* refactor: move audio encode over to TS

* refactor: audio encoding now functional in LC again

* refactor: move video encode over to TS

* refactor: move document encode over to TS

* refactor: video encoding now functional in LC again

* refactor: document encoding now functional in LC again

* fix: extend file type options in AttachFileMenu to include 'google_multimodal' and update dependency array to include agent?.provider

* feat: only accept pdfs if responses api is enabled for openai convos

chore: address ESLint comments

chore: add missing audio mimetype

* fix: type safety for message content parts and improve null handling

* chore: reorder AttachFileMenuProps for consistency and clarity

* chore: import order in AttachFileMenu

* fix: improve null handling for text parts in parseTextParts function

* fix: remove no longer used unsupported capability error message for file uploads

* fix: OpenAI Direct File Attachment Format

* fix: update encodeAndFormatDocuments to support  OpenAI responses API and enhance document result types

* refactor: broaden providers supported for documents

* feat: enhance DragDrop context and modal to support document uploads based on provider capabilities

* fix: reorder import statements for consistency in video encoding module

---------

Co-authored-by: Dustin Healy <54083382+dustinhealy@users.noreply.github.com>
2025-10-06 17:30:16 -04:00
Danny Avila
4b5b46604c
🔍 refactor: OCR Fully Optional with Defaults for "Upload as Text" (#9856)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
* refactor: move `loadOCRConfig` from `packages/data-provider` to `packages/api` and return `undefined` if not explicitly configured

* fix: loadOCRConfig import from @librechat/api

* refactor: update defaultTextMimeTypes to support virtually all file types for text parsing

* fix: improve OCR capability check and error message for unsupported file types

* ci: remove unnecessary ocr expectation from AppService test
2025-09-26 11:56:11 -04:00
Danny Avila
2489670f54
📂 refactor: File Read Operations (#9747)
* fix: axios response logging for text parsing, remove console logging, remove jsdoc

* refactor: error logging in logAxiosError function to handle various error types with type guards

* refactor: enhance text parsing with improved error handling and async file reading

* refactor: replace synchronous file reading with asynchronous methods for improved performance and memory management

* ci: update tests
2025-09-20 10:17:24 -04:00
Danny Avila
5bfb06b417
💻 feat: Add Proxy Config for Mistral OCR API (#9629)
* 💻 feat: Add proxy configuration support for Mistral OCR API requests

* refactor: Implement proxy support for Mistral API requests using HttpsProxyAgent
2025-09-14 18:50:41 -04:00
Danny Avila
e2a6937ca6
⚙️ fix: Update OCR context to use req.config (#9367) 2025-08-29 10:06:03 -04:00
Danny Avila
7742b18c9c
🔧 fix: Upload Audio as Text missing Param (#9356) 2025-08-28 21:07:30 -04:00
Danny Avila
48f6f8f2f8
📎 feat: Upload as Text Support for Plaintext, STT, RAG, and Token Limits (#8868)
* 🪶 feat: Add Support for Uploading Plaintext Files

feat: delineate between OCR and text handling in fileConfig field of config file

- also adds support for passing in mimetypes as just plain file extensions

feat: add showLabel bool to support future synthetic component DynamicDropdownInput

feat: add new combination dropdown-input component in params panel to support file type token limits

refactor: move hovercard to side to align with other hovercards

chore: clean up autogenerated comments

feat: add delineation to file upload path between text and ocr configured filetypes

feat: add token limit checks during file upload

refactor: move textParsing out of ocrEnabled logic

refactor: clean up types for filetype config

refactor: finish decoupling DynamicDropdownInput from fileTokenLimits

fix: move image token cost function into file to fix circular dependency causing unittest to fail and remove unused var for linter

chore: remove out of scope code following review

refactor: make fileTokenLimit conform to existing styles

chore: remove unused localization string

chore: undo changes to DynamicInput and other strays

feat: add fileTokenLimit to all provider config panels

fix: move textParsing back into ocr tool_resource block for now so that it doesn't interfere with other upload types

* 📤 feat: Add RAG API Endpoint Support for Text Parsing (#8849)

* feat: implement RAG API integration for text parsing with fallback to native parsing

* chore: remove TODO now that placeholder and fllback are implemented

* ✈️ refactor: Migrate Text Parsing to TS (#8892)

* refactor: move generateShortLivedToken to packages/api

* refactor: move textParsing logic into packages/api

* refactor: reduce nesting and dry code with createTextFile

* fix: add proper source handling

* fix: mock new parseText and parseTextNative functions in jest file

* ci: add test coverage for textParser

* 💬 feat: Add Audio File Support to Upload as Text (#8893)

* feat: add STT support for Upload as Text

* refactor: move processAudioFile to packages/api

* refactor: move textParsing from utils to files

* fix: remove audio/mp3 from unsupported mimetypes test since it is now supported

* ✂️ feat: Configurable File Token Limits and Truncation (#8911)

* feat: add configurable fileTokenLimit default value

* fix: add stt to fileConfig merge logic

* fix: add fileTokenLimit to mergeFileConfig logic so configurable value is actually respected from yaml

* feat: add token limiting to parsed text files

* fix: add extraction logic and update tests so fileTokenLimit isnt sent to LLM providers

* fix: address comments

* refactor: rename textTokenLimiter.ts to text.ts

* chore: update form-data package to address CVE-2025-7783 and update package-lock

* feat: use default supported mime types for ocr on frontend file validation

* fix: should be using logger.debug not console.debug

* fix: mock existsSync in text.spec.ts

* fix: mock logger rather than every one of its function calls

* fix: reorganize imports and streamline file upload processing logic

* refactor: update createTextFile function to use destructured parameters and improve readability

* chore: update file validation to use EToolResources for improved type safety

* chore: update import path for types in audio processing module

* fix: update file configuration access and replace console.debug with logger.debug for improved logging

---------

Co-authored-by: Dustin Healy <dustinhealy1@gmail.com>
Co-authored-by: Dustin Healy <54083382+dustinhealy@users.noreply.github.com>
2025-08-27 03:44:39 -04:00
Danny Avila
9a210971f5
🛜 refactor: Streamline App Config Usage (#9234)
* WIP: app.locals refactoring

WIP: appConfig

fix: update memory configuration retrieval to use getAppConfig based on user role

fix: update comment for AppConfig interface to clarify purpose

🏷️ refactor: Update tests to use getAppConfig for endpoint configurations

ci: Update AppService tests to initialize app config instead of app.locals

ci: Integrate getAppConfig into remaining tests

refactor: Update multer storage destination to use promise-based getAppConfig and improve error handling in tests

refactor: Rename initializeAppConfig to setAppConfig and update related tests

ci: Mock getAppConfig in various tests to provide default configurations

refactor: Update convertMCPToolsToPlugins to use mcpManager for server configuration and adjust related tests

chore: rename `Config/getAppConfig` -> `Config/app`

fix: streamline OpenAI image tools configuration by removing direct appConfig dependency and using function parameters

chore: correct parameter documentation for imageOutputType in ToolService.js

refactor: remove `getCustomConfig` dependency in config route

refactor: update domain validation to use appConfig for allowed domains

refactor: use appConfig registration property

chore: remove app parameter from AppService invocation

refactor: update AppConfig interface to correct registration and turnstile configurations

refactor: remove getCustomConfig dependency and use getAppConfig in PluginController, multer, and MCP services

refactor: replace getCustomConfig with getAppConfig in STTService, TTSService, and related files

refactor: replace getCustomConfig with getAppConfig in Conversation and Message models, update tempChatRetention functions to use AppConfig type

refactor: update getAppConfig calls in Conversation and Message models to include user role for temporary chat expiration

ci: update related tests

refactor: update getAppConfig call in getCustomConfigSpeech to include user role

fix: update appConfig usage to access allowedDomains from actions instead of registration

refactor: enhance AppConfig to include fileStrategies and update related file strategy logic

refactor: update imports to use normalizeEndpointName from @librechat/api and remove redundant definitions

chore: remove deprecated unused RunManager

refactor: get balance config primarily from appConfig

refactor: remove customConfig dependency for appConfig and streamline loadConfigModels logic

refactor: remove getCustomConfig usage and use app config in file citations

refactor: consolidate endpoint loading logic into loadEndpoints function

refactor: update appConfig access to use endpoints structure across various services

refactor: implement custom endpoints configuration and streamline endpoint loading logic

refactor: update getAppConfig call to include user role parameter

refactor: streamline endpoint configuration and enhance appConfig usage across services

refactor: replace getMCPAuthMap with getUserMCPAuthMap and remove unused getCustomConfig file

refactor: add type annotation for loadedEndpoints in loadEndpoints function

refactor: move /services/Files/images/parse to TS API

chore: add missing FILE_CITATIONS permission to IRole interface

refactor: restructure toolkits to TS API

refactor: separate manifest logic into its own module

refactor: consolidate tool loading logic into a new tools module for startup logic

refactor: move interface config logic to TS API

refactor: migrate checkEmailConfig to TypeScript and update imports

refactor: add FunctionTool interface and availableTools to AppConfig

refactor: decouple caching and DB operations from AppService, make part of consolidated `getAppConfig`

WIP: fix tests

* fix: rebase conflicts

* refactor: remove app.locals references

* refactor: replace getBalanceConfig with getAppConfig in various strategies and middleware

* refactor: replace appConfig?.balance with getBalanceConfig in various controllers and clients

* test: add balance configuration to titleConvo method in AgentClient tests

* chore: remove unused `openai-chat-tokens` package

* chore: remove unused imports in initializeMCPs.js

* refactor: update balance configuration to use getAppConfig instead of getBalanceConfig

* refactor: integrate configMiddleware for centralized configuration handling

* refactor: optimize email domain validation by removing unnecessary async calls

* refactor: simplify multer storage configuration by removing async calls

* refactor: reorder imports for better readability in user.js

* refactor: replace getAppConfig calls with req.config for improved performance

* chore: replace getAppConfig calls with req.config in tests for centralized configuration handling

* chore: remove unused override config

* refactor: add configMiddleware to endpoint route and replace getAppConfig with req.config

* chore: remove customConfig parameter from TTSService constructor

* refactor: pass appConfig from request to processFileCitations for improved configuration handling

* refactor: remove configMiddleware from endpoint route and retrieve appConfig directly in getEndpointsConfig if not in `req.config`

* test: add mockAppConfig to processFileCitations tests for improved configuration handling

* fix: pass req.config to hasCustomUserVars and call without await after synchronous refactor

* fix: type safety in useExportConversation

* refactor: retrieve appConfig using getAppConfig in PluginController and remove configMiddleware from plugins route, to avoid always retrieving when plugins are cached

* chore: change `MongoUser` typedef to `IUser`

* fix: Add `user` and `config` fields to ServerRequest and update JSDoc type annotations from Express.Request to ServerRequest

* fix: remove unused setAppConfig mock from Server configuration tests
2025-08-26 12:10:18 -04:00
Danny Avila
33834cd484
🧹 feat: Automatic File Cleanup for Mistral OCR Uploads (#8827)
* chore: Handle optional token_endpoint in OAuth metadata discovery

* chore: Simplify permission typing logic in checkAccess function

* feat: Implement `deleteMistralFile` function and integrate file cleanup in `uploadMistralOCR`
2025-08-03 17:11:14 -04:00
Danny Avila
7e37211458
🗝️ refactor: loadServiceKey to Support Stringified JSON and Env Var Renaming (#8317)
* feat: Enhance loadServiceKey to support stringified JSON input

* chore: Update GOOGLE_SERVICE_KEY_FILE_PATH to GOOGLE_SERVICE_KEY_FILE for consistency
2025-07-08 21:07:33 -04:00
Danny Avila
59d00e99f3
🔍 feat: Fetch Google Service Key and Consolidate Key Loading Logic (#8179) 2025-07-01 22:37:29 -04:00
Danny Avila
20100e120b
🔑 feat: Set Google Service Key File Path (#8130) 2025-06-29 17:09:37 -04:00
Danny Avila
3f3cfefc52
🗒️ feat: Add Google Vertex AI Mistral OCR Strategy (#8125)
* Implemented new uploadGoogleVertexMistralOCR function for processing OCR using Google Vertex AI.
* Added vertexMistralOCRStrategy to handle file uploads.
* Updated FileSources and OCRStrategy enums to include vertexai_mistral_ocr.
* Introduced helper functions for JWT creation and Google service account configuration loading.
2025-06-28 13:26:03 -04:00
Danny Avila
d39b99971f
🧠 fix: Agent Title Config & Resource Handling (#8028)
* 🔧 fix: enhance client options handling in AgentClient and set default recursion limit

- Updated the recursion limit to default to 25 if not specified in agentsEConfig.
- Enhanced client options in AgentClient to include model parameters such as apiKey and anthropicApiUrl from agentModelParams.
- Updated requestOptions in the anthropic endpoint to use reverseProxyUrl as anthropicApiUrl.

* Enhance LLM configuration tests with edge case handling

* chore add return type annotation for getCustomEndpointConfig function

* fix: update modelOptions handling to use optional chaining and default to empty object in multiple endpoint initializations

* chore: update @librechat/agents to version 2.4.42

* refactor: streamline agent endpoint configuration and enhance client options handling for title generations

- Introduced a new `getProviderConfig` function to centralize provider configuration logic.
- Updated `AgentClient` to utilize the new provider configuration, improving clarity and maintainability.
- Removed redundant code related to endpoint initialization and model parameter handling.
- Enhanced error logging for missing endpoint configurations.

* fix: add abort handling for image generation and editing in OpenAIImageTools

* ci: enhance getLLMConfig tests to verify fetchOptions and dispatcher properties

* fix: use optional chaining for endpointOption properties in getOptions

* fix: increase title generation timeout from 25s to 45s, pass `endpointOption` to `getOptions`

* fix: update file filtering logic in getToolFilesByIds to ensure text field is properly checked

* fix: add error handling for empty OCR results in uploadMistralOCR and uploadAzureMistralOCR

* fix: enhance error handling in file upload to include 'No OCR result' message

* chore: update error messages in uploadMistralOCR and uploadAzureMistralOCR

* fix: enhance filtering logic in getToolFilesByIds to include context checks for OCR resources to only include files directly attached to agent

---------

Co-authored-by: Matt Burnett <matt.burnett@shopify.com>
2025-06-23 19:44:24 -04:00
Danny Avila
0103b4b08a
🧹 chore: Cleanup base64 Handling for Azure Mistral OCR (#7892)
* 🧹 chore: Remove Comments and Cleanup base64 handling for Azure Mistral OCR

* chore: Remove unnecessary await from MCP instructions formatting in AgentClient

* ci: Update document_url regex in MistralOCR tests to support PDF format
2025-06-13 18:17:25 -04:00
Danny Avila
5f2d1c5dc9
👁️ feat: Azure Mistral OCR Strategy (#7888)
* 👁️ feat: Add Azure Mistral OCR strategy and endpoint integration

This commit introduces a new OCR strategy named 'azure_mistral_ocr', allowing the use of a Mistral OCR endpoint deployed on Azure. The configuration, schemas, and file upload strategies have been updated to support this integration, enabling seamless OCR processing via Azure-hosted Mistral services.

* 🗑️ chore: Clean up .gitignore by removing commented-out uncommon directory name

* chore: remove unused vars

* refactor: Move createAxiosInstance to packages/api/utils and update imports

- Removed the createAxiosInstance function from the config module and relocated it to a new utils module for better organization.
- Updated import paths in relevant files to reflect the new location of createAxiosInstance.
- Added tests for createAxiosInstance to ensure proper functionality and proxy configuration handling.

* chore: move axios helpers to packages/api

- Added logAxiosError function to @librechat/api for centralized error logging.
- Updated imports across various files to use the new logAxiosError function.
- Removed the old axios.js utility file as it is no longer needed.

* chore: Update Jest moduleNameMapper for improved path resolution

- Added a new mapping for '~/' to resolve module paths in Jest configuration, enhancing import handling for the project.

* feat: Implement Mistral OCR API integration in TS

* chore: Update MistralOCR tests based on new imports

* fix: Enhance MistralOCR configuration handling and tests

- Introduced helper functions for resolving configuration values from environment variables or hardcoded settings.
- Updated the uploadMistralOCR and uploadAzureMistralOCR functions to utilize the new configuration resolution logic.
- Improved test cases to ensure correct behavior when mixing environment variables and hardcoded values.
- Mocked file upload and signed URL responses in tests to validate functionality without external dependencies.

* feat: Enhance MistralOCR functionality with improved configuration and error handling

- Introduced helper functions for loading authentication configuration and resolving values from environment variables.
- Updated uploadMistralOCR and uploadAzureMistralOCR functions to utilize the new configuration logic.
- Added utility functions for processing OCR results and creating error messages.
- Improved document type determination and result aggregation for better OCR processing.

* refactor: Reorganize OCR type imports in Mistral CRUD file

- Moved OCRResult, OCRResultPage, and OCRImage imports to a more logical grouping for better readability and maintainability.

* feat: Add file exports to API and create files index

* chore: Update OCR types for enhanced structure and clarity

- Redesigned OCRImage interface to include mandatory fields and improved naming conventions.
- Added PageDimensions interface for better representation of page metrics.
- Updated OCRResultPage to include dimensions and mandatory images array.
- Refined OCRResult to include document annotation and usage information.

* refactor: use TS counterpart of uploadOCR methods

* ci: Update MistralOCR tests to reflect new OCR result structure

* chore: Bump version of @librechat/api to 1.2.3 in package.json and package-lock.json

* chore: Update CONFIG_VERSION to 1.2.8

* chore: remove unused sendEvent function from config module (now imported from '@librechat/api')

* chore: remove MistralOCR service files and tests (now in '@librechat/api')

* ci: update logger import in ModelService tests to use @librechat/data-schemas

---------

Co-authored-by: arthurolivierfortin <arthurolivier.fortin@gmail.com>
2025-06-13 15:14:57 -04:00