LibreChat/packages/data-provider/src
Pol Burkardt Freire 7e74165c3c
📖 feat: Add Native ODT Document Parser Support (#12303)
* fix: add ODT support to native document parser

* fix: replace execSync with jszip for ODT parsing

* docs: update documentParserMimeTypes comment to include odt

* fix: improve ODT XML extraction and add empty.odt fixture

- Scope extraction to <office:body> to exclude metadata/style nodes
- Map </text:p> and </text:h> closings to newlines, preserving paragraph
  structure instead of collapsing everything to a single line
- Handle <text:line-break/> as explicit newlines
- Strip remaining tags, normalize horizontal whitespace, cap consecutive
  blank lines at one
- Regenerate sample.odt as a two-paragraph fixture so the test exercises
  multi-paragraph output
- Add empty.odt fixture and test asserting 'No text found in document'
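The extraction steps listed above can be sketched as follows. This assumes the input is the raw content.xml string already read out of the archive. The function name extractOdtBodyText is hypothetical and this is not the actual odtToText code. It uses plain regular expressions and only handles the elements named in the bullets.

```typescript
// Hypothetical helper illustrating the bullet list above; not the real odtToText.
// Input: the raw content.xml string from inside the .odt archive.
function extractOdtBodyText(contentXml: string): string {
  // Scope to <office:body> so <office:meta> / style nodes are ignored.
  const bodyMatch = contentXml.match(/<office:body[^>]*>([\s\S]*?)<\/office:body>/);
  const body = bodyMatch ? bodyMatch[1] : '';

  return (
    body
      // Paragraph and heading closings become newlines.
      .replace(/<\/text:(?:p|h)>/g, '\n')
      // Explicit <text:line-break/> becomes a newline.
      .replace(/<text:line-break\s*\/>/g, '\n')
      // Strip every remaining tag.
      .replace(/<[^>]+>/g, '')
      // Collapse runs of horizontal whitespace to a single space.
      .replace(/[ \t]+/g, ' ')
      // Cap consecutive blank lines at one (at most two newlines in a row).
      .replace(/\n{3,}/g, '\n\n')
      .trim()
  );
}
```

An empty return value corresponds to the case where the caller would report 'No text found in document', as the empty.odt test asserts.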

* fix: address review findings in ODT parser

- Use static `import JSZip from 'jszip'` instead of dynamic import;
  jszip is CommonJS-only with no ESM/Jest-isolation concern (F1)
- Decode the five standard XML entities after tag-stripping so
  documents containing `&`, `<`, `>`, `"` or `'` send correct text to the LLM (F2)
- Remove @types/jszip devDependency; jszip ships bundled declarations
  and @types/jszip is a stale 2020 stub that would shadow them (F3)
- Handle <text:tab/> → \t and <text:s .../> → ' ' before the generic
  tag stripper so tab-aligned and multi-space content is preserved (F4)
- Add sample-entities.odt fixture and test covering entity decoding,
  tab, and spacing-element handling (F5)
- Rename 'throws for empty odt' → 'throws for odt with no extractable
  text' to distinguish from a zero-byte/corrupt file case (F8)
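The F2 and F4 fixes depend on ordering. Inline `<text:tab/>` and `<text:s/>` elements have to be replaced before the generic tag stripper removes them. Entities have to be decoded after it, otherwise an escaped `&lt;b&gt;` would become a tag and be stripped. Below is a sketch of that ordering; the helper names are illustrative and not taken from the codebase. The single-pass entity regex also avoids double-decoding `&amp;lt;`. The tab collapses to one space during whitespace normalization, which matches review note D.

```typescript
// Hypothetical helpers illustrating the F2/F4 ordering; not the real parser.
const XML_ENTITIES: Record<string, string> = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&apos;': "'",
};

// Single pass over the five predefined XML entities, so "&amp;lt;"
// decodes to the literal text "&lt;" rather than to "<".
function decodeXmlEntities(text: string): string {
  return text.replace(/&(?:amp|lt|gt|quot|apos);/g, (entity) => XML_ENTITIES[entity]);
}

function inlineXmlToText(fragment: string): string {
  const stripped = fragment
    // F4: map spacing elements before the generic stripper would drop them.
    .replace(/<text:tab\s*\/>/g, '\t')
    .replace(/<text:s(?:\s[^>]*)?\/>/g, ' ')
    // Generic tag stripper.
    .replace(/<[^>]+>/g, '')
    // Horizontal whitespace normalization collapses the tab to one space.
    .replace(/[ \t]+/g, ' ');
  // F2: decode entities last, so escaped "<" and ">" are never treated as tags.
  return decodeXmlEntities(stripped);
}
```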

* fix: add decompressed content size cap to odtToText (F6)

Reads uncompressed entry sizes from the JSZip internal metadata before
extracting any content. Throws if the total exceeds 50MB, preventing a
crafted ODT with a high-ratio compressed payload from exhausting heap.

Adds a corresponding test using a real DEFLATE-compressed ZIP (~51KB on
disk, 51MB uncompressed) to verify the guard fires before any extraction.
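A sketch of the size-cap guard follows. It assumes that entries expose jszip's private `_data.uncompressedSize` field, as the commit describes. The `ZipEntryLike` interface and the function name are hypothetical. The commit does not state the exact byte value behind "50MB", so `50 * 1024 * 1024` is an assumption. The loop only sums metadata, so it throws before any entry is decompressed. A missing field counts as 0, which is the fail-open behaviour noted in review item C.

```typescript
// Hypothetical structural view of a zip entry. The `_data.uncompressedSize`
// field is jszip-internal and undocumented, so this shape is an assumption.
interface ZipEntryLike {
  dir: boolean;
  _data?: { uncompressedSize?: number };
}

// Assumed interpretation of the "50MB" cap.
const MAX_DECOMPRESSED_BYTES = 50 * 1024 * 1024;

// Sums declared uncompressed sizes and throws once the running total exceeds
// the cap. Returns the total when it stays within bounds.
function assertDecompressedSizeWithinCap(
  files: Record<string, ZipEntryLike>,
  cap: number = MAX_DECOMPRESSED_BYTES,
): number {
  let total = 0;
  for (const entry of Object.values(files)) {
    if (entry.dir) continue; // directories carry no content
    // Fails open: if the private field is missing or renamed, it adds 0.
    total += entry._data?.uncompressedSize ?? 0;
    if (total > cap) {
      throw new Error(`Decompressed ODT content exceeds ${cap} bytes`);
    }
  }
  return total;
}
```

In practice a guard like this would run on `zip.files` right after `JSZip.loadAsync(buffer)` and before any `zip.file(name).async('string')` call.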

* fix: add java to codeTypeMapping for file upload support

.java files were rejected with "Unable to determine file type" because
browsers send an empty MIME type for them and codeTypeMapping had no
'java' entry for inferMimeType() to fall back on.

text/x-java was already present in all five validation lists
(fullMimeTypesList, codeInterpreterMimeTypesList, retrievalMimeTypesList,
textMimeTypes, retrievalMimeTypes), so mapping to it (not text/plain)
ensures .java uploads work for both File Search and Code Interpreter.

Closes #12307
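The fallback path can be sketched as follows. `codeTypeMappingSketch` and `inferMimeTypeSketch` are hypothetical stand-ins for codeTypeMapping and inferMimeType, and their real signatures in file-config.ts may differ. The sketch shows what happens when the browser sends an empty MIME type: the lookup falls through to the extension map. Without a `java` key it returns nothing, and the upload is rejected as "Unable to determine file type".

```typescript
// Hypothetical, trimmed-down stand-in for codeTypeMapping; only the entry
// added by this commit is shown.
const codeTypeMappingSketch: Record<string, string> = {
  java: 'text/x-java',
};

// Hypothetical stand-in for inferMimeType: trust a non-empty browser MIME
// type, otherwise fall back to the lowercase file extension.
function inferMimeTypeSketch(fileName: string, browserType: string): string {
  if (browserType) return browserType;
  const dot = fileName.lastIndexOf('.');
  if (dot < 0) return ''; // no extension: type cannot be determined
  const ext = fileName.slice(dot + 1).toLowerCase();
  return codeTypeMappingSketch[ext] ?? '';
}
```

Mapping to `text/x-java` rather than `text/plain` matters because that is the value the validation lists already accept.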

* fix: address follow-up review findings (A-E)

A: regenerate package-lock.json after removing @types/jszip from
   package.json; without this npm ci was still installing the stale
   2020 type stubs and TypeScript was resolving against them
B: replace dynamic import('jszip') in the zip-bomb test with the same
   static import already used in production; jszip is CJS-only with no
   ESM/Jest isolation concern
C: document that the _data.uncompressedSize guard fails open if jszip
   renames the private field (accepted limitation, test would catch it)
D: rename 'preserves tabs' test to 'normalizes tab and spacing elements
   to spaces' since <text:tab> is collapsed to a space, not kept as \t
E: fix test.each([ formatting artifact (missing newline after '[')

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-19 15:49:52 -04:00
react-query 🛸 feat: Remote Agent Access with External API Support (#11503) 2026-01-28 17:44:33 -05:00
types 📄 feat: Local Text Extraction for PDF, DOCX, and XLS/XLSX (#11900) 2026-02-22 14:22:45 -05:00
accessPermissions.ts 🛂 fix: Validate types Query Param in People Picker Access Middleware (#12276) 2026-03-17 02:46:11 -04:00
actions.ts 🛡️ fix: Implement TOCTOU-Safe SSRF Protection for Actions and MCP (#11722) 2026-02-11 22:09:58 -05:00
api-endpoints.ts 🧭 fix: Subdirectory Deployment Auth Redirect Path Doubling (#12077) 2026-03-05 01:38:44 -05:00
artifacts.ts 🪟 fix: Windows Build (npm) (#3889) 2024-09-02 10:01:09 -04:00
azure.ts 🔃 refactor: Decouple Effects from AppService, move to data-schemas (#9974) 2025-10-05 06:37:57 -04:00
bedrock.ts 🧠 feat: Add reasoning_effort configuration for Bedrock models (#11991) 2026-02-28 15:02:09 -05:00
config.spec.ts fix: Resolve Agent Provider Endpoint Type for File Upload Support (#12117) 2026-03-07 10:45:43 -05:00
config.ts 🫧 refactor: Clear Drafts and Surface Error on Expired SSE Stream (#12309) 2026-03-19 14:51:28 -04:00
createPayload.ts ⏸ refactor: Improve UX for Parallel Streams (Multi-Convo) (#11096) 2025-12-25 01:43:54 -05:00
data-service.ts 🔑 fix: Require OTP Verification for 2FA Re-Enrollment and Backup Code Regeneration (#12223) 2026-03-14 01:51:31 -04:00
feedback.ts 📈 feat: Chat rating for feedback (#5878) 2025-05-30 12:16:34 -04:00
file-config.spec.ts 📖 feat: Add Native ODT Document Parser Support (#12303) 2026-03-19 15:49:52 -04:00
file-config.ts 📖 feat: Add Native ODT Document Parser Support (#12303) 2026-03-19 15:49:52 -04:00
generate.ts 🪐 feat: Initial OpenAI Responses API Support (#8149) 2025-06-30 18:34:47 -04:00
headers-helpers.ts 🚪 fix: Complete OIDC RP-Initiated Logout With id_token_hint and Redirect Race Fix (#12024) 2026-03-02 21:34:13 -05:00
index.ts 🔒 fix: Request interceptor for Shared Link Page Scenarios (#12036) 2026-03-03 12:03:33 -05:00
keys.ts 🛸 feat: Remote Agent Access with External API Support (#11503) 2026-01-28 17:44:33 -05:00
mcp.ts 🔏 fix: MCP Server URL Schema Validation (#12204) 2026-03-12 23:19:31 -04:00
messages.ts 🐛 fix: String Interpolation in Messages Endpoint from #9155 (#9312) 2025-08-27 13:48:48 -04:00
models.ts 🗂️ refactor: Artifacts via Model Specs & Scope Badge Persistence by Spec Context (#11796) 2026-02-14 13:56:50 -05:00
parameterSettings.ts 🎚️ feat: Add Thinking Level Parameter for Gemini 3+ Models (#11994) 2026-02-28 16:56:10 -05:00
parsers.ts 📅 refactor: Replace Numeric Weekday Index with Named Day in Date Template Variables (#12022) 2026-03-02 19:22:11 -05:00
permissions.ts 🛸 feat: Remote Agent Access with External API Support (#11503) 2026-01-28 17:44:33 -05:00
request.ts 🧭 fix: Subdirectory Deployment Auth Redirect Path Doubling (#12077) 2026-03-05 01:38:44 -05:00
roles.spec.ts 🎭 fix: Set Explicit Permission Defaults for USER Role in roleDefaults (#12308) 2026-03-19 14:52:06 -04:00
roles.ts 🎭 fix: Set Explicit Permission Defaults for USER Role in roleDefaults (#12308) 2026-03-19 14:52:06 -04:00
schemas.spec.ts 🤖 feat: Claude Opus 4.6 - 1M Context, Premium Pricing, Adaptive Thinking (#11670) 2026-02-06 18:35:36 -05:00
schemas.ts 🎚️ feat: Add Thinking Level Parameter for Gemini 3+ Models (#11994) 2026-02-28 16:56:10 -05:00
types.ts 🔑 fix: Require OTP Verification for 2FA Re-Enrollment and Backup Code Regeneration (#12223) 2026-03-14 01:51:31 -04:00
utils.ts 🧯 fix: Prevent Env-Variable Exfil. via Placeholder Injection (#12260) 2026-03-16 08:48:24 -04:00