mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-01-20 17:26:12 +01:00
⚡ refactor: Optimize & Standardize Tokenizer Usage (#10777)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
* refactor: Token Limit Processing with Enhanced Efficiency

- Added a new test suite for `processTextWithTokenLimit`, ensuring comprehensive coverage of various scenarios including under, at, and exceeding token limits.
- Refactored the `processTextWithTokenLimit` function to utilize a ratio-based estimation method, significantly reducing the number of token counting function calls compared to the previous binary search approach.
- Improved handling of edge cases and variable token density, ensuring accurate truncation and performance across diverse text inputs.
- Included direct comparisons with the old implementation to validate correctness and efficiency improvements.

* refactor: Remove Tokenizer Route and Related References

- Deleted the tokenizer route from the server and removed its references from the routes index and server files, streamlining the API structure.
- This change simplifies the routing configuration by eliminating unused endpoints.

* refactor: Migrate countTokens Utility to API Module

- Removed the local countTokens utility and integrated it into the @librechat/api module for centralized access.
- Updated various files to reference the new countTokens import from the API module, ensuring consistent usage across the application.
- Cleaned up unused references and imports related to the previous countTokens implementation.

* refactor: Centralize escapeRegExp Utility in API Module

- Moved the escapeRegExp function from local utility files to the @librechat/api module for consistent usage across the application.
- Updated imports in various files to reference the new centralized escapeRegExp function, ensuring cleaner code and reducing redundancy.
- Removed duplicate implementations of escapeRegExp from multiple files, streamlining the codebase.

* refactor: Enhance Token Counting Flexibility in Text Processing

- Updated the `processTextWithTokenLimit` function to accept both synchronous and asynchronous token counting functions, improving its versatility.
- Introduced a new `TokenCountFn` type to define the token counting function signature.
- Added comprehensive tests to validate the behavior of `processTextWithTokenLimit` with both sync and async token counting functions, ensuring consistent results.
- Implemented a wrapper to track call counts for the `countTokens` function, optimizing performance and reducing unnecessary calls.
- Enhanced existing tests to compare the performance of the new implementation against the old one, demonstrating significant improvements in efficiency.

* chore: documentation for Truncation Safety Buffer in Token Processing

- Added a safety buffer multiplier to the character position estimates during text truncation to prevent overshooting token limits.
- Updated the `processTextWithTokenLimit` function to utilize the new `TRUNCATION_SAFETY_BUFFER` constant, enhancing the accuracy of token limit processing.
- Improved documentation to clarify the rationale behind the buffer and its impact on performance and efficiency in token counting.
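
The commit message mentions a wrapper that tracks how often the token counter is called, and a `TokenCountFn` type that covers both sync and async counters. The test code itself is not shown on this page, so the following is only a minimal sketch of how such a wrapper could look. `withCallCount`, `syncCounter`, and `asyncCounter` are hypothetical names, and the "4 characters per token" counters are toy stand-ins for a real tokenizer.

```typescript
/** Token count function that can be sync or async (same shape as the type in this commit). */
type TokenCountFn = (text: string) => number | Promise<number>;

/** Hypothetical helper: wraps a counter and records how many times it is invoked. */
function withCallCount(fn: TokenCountFn): { countTokens: TokenCountFn; calls: () => number } {
  let calls = 0;
  return {
    countTokens: (text: string) => {
      calls++;
      return fn(text);
    },
    calls: () => calls,
  };
}

// Toy counters (assumption: roughly 4 characters per token), one sync and one async.
const syncCounter: TokenCountFn = (text) => Math.ceil(text.length / 4);
const asyncCounter: TokenCountFn = async (text) => Math.ceil(text.length / 4);

async function demo(): Promise<number[]> {
  const a = withCallCount(syncCounter);
  const b = withCallCount(asyncCounter);
  // `await` accepts plain values as well as promises, so the caller needs no branching.
  const counts = [await a.countTokens('hello world'), await b.countTokens('hello world')];
  return [...counts, a.calls(), b.calls()];
}
```

Because `await` passes non-promise values through unchanged, a function that always awaits the counter can accept either kind without inspecting it first.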
This commit is contained in:
parent
b2387cc6fa
commit
8bdc808074
19 changed files with 925 additions and 107 deletions
@@ -1,11 +1,39 @@
 import { logger } from '@librechat/data-schemas';
 
+/** Token count function that can be sync or async */
+export type TokenCountFn = (text: string) => number | Promise<number>;
+
+/**
+ * Safety buffer multiplier applied to character position estimates during truncation.
+ *
+ * We use 98% (0.98) rather than 100% to intentionally undershoot the target on the first attempt.
+ * This is necessary because:
+ * - Token density varies across text (some regions may have more tokens per character than the average)
+ * - The ratio-based estimate assumes uniform token distribution, which is rarely true
+ * - Undershooting is safer than overshooting: exceeding the limit requires another iteration,
+ *   while being slightly under is acceptable
+ * - In practice, this buffer reduces refinement iterations from 2-3 down to 0-1 in most cases
+ *
+ * @example
+ * // If text has 1000 chars and 250 tokens (4 chars/token average), targeting 100 tokens:
+ * // Without buffer: estimate = 1000 * (100/250) = 400 chars → might yield 105 tokens (over!)
+ * // With 0.98 buffer: estimate = 400 * 0.98 = 392 chars → likely yields 97-99 tokens (safe)
+ */
+const TRUNCATION_SAFETY_BUFFER = 0.98;
+
 /**
  * Processes text content by counting tokens and truncating if it exceeds the specified limit.
+ * Uses ratio-based estimation to minimize expensive tokenCountFn calls.
+ *
  * @param text - The text content to process
  * @param tokenLimit - The maximum number of tokens allowed
- * @param tokenCountFn - Function to count tokens
+ * @param tokenCountFn - Function to count tokens (can be sync or async)
  * @returns Promise resolving to object with processed text, token count, and truncation status
+ *
+ * @remarks
+ * This function uses a ratio-based estimation algorithm instead of binary search.
+ * Binary search would require O(log n) tokenCountFn calls (~17 for 100k chars),
+ * while this approach typically requires only 2-3 calls for a 90%+ reduction in CPU usage.
  */
 export async function processTextWithTokenLimit({
   text,
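
The figures quoted in the doc comments of this hunk (392 characters for the buffered estimate, about 17 calls for a binary search over 100k characters) can be checked with a few lines of arithmetic. This is a sketch for illustration only: `estimateCharPosition` and `binarySearchProbes` are hypothetical helper names, and the probe count is an approximate upper bound for a search over character positions, not a measurement of the removed code.

```typescript
const TRUNCATION_SAFETY_BUFFER = 0.98;

/** First-guess cut position: scale the text length by the token ratio, then undershoot by 2%. */
function estimateCharPosition(textLength: number, totalTokens: number, tokenLimit: number): number {
  const ratio = tokenLimit / totalTokens;
  return Math.floor(textLength * ratio * TRUNCATION_SAFETY_BUFFER);
}

/** Approximate number of probes a binary search over positions 0..textLength needs. */
function binarySearchProbes(textLength: number): number {
  return Math.ceil(Math.log2(textLength + 1));
}

// The @example from the doc comment: 1000 chars, 250 tokens, target of 100 tokens.
const firstGuess = estimateCharPosition(1000, 250, 100); // 392

// The @remarks figure: a 100k-character text.
const probes = binarySearchProbes(100000); // 17
```

Each probe in the binary search is a full tokenizer call on a prefix of the text, which is why replacing roughly 17 probes with 2-3 calls matters for long inputs.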
@@ -14,7 +42,7 @@ export async function processTextWithTokenLimit({
 }: {
   text: string;
   tokenLimit: number;
-  tokenCountFn: (text: string) => number;
+  tokenCountFn: TokenCountFn;
 }): Promise<{ text: string; tokenCount: number; wasTruncated: boolean }> {
   const originalTokenCount = await tokenCountFn(text);
 
@@ -26,40 +54,34 @@ export async function processTextWithTokenLimit({
     };
   }
 
-  /**
-   * Doing binary search here to find the truncation point efficiently
-   * (May be a better way to go about this)
-   */
-  let low = 0;
-  let high = text.length;
-  let bestText = '';
-
   logger.debug(
     `[textTokenLimiter] Text content exceeds token limit: ${originalTokenCount} > ${tokenLimit}, truncating...`,
   );
 
-  while (low <= high) {
-    const mid = Math.floor((low + high) / 2);
-    const truncatedText = text.substring(0, mid);
-    const tokenCount = await tokenCountFn(truncatedText);
+  const ratio = tokenLimit / originalTokenCount;
+  let charPosition = Math.floor(text.length * ratio * TRUNCATION_SAFETY_BUFFER);
 
-    if (tokenCount <= tokenLimit) {
-      bestText = truncatedText;
-      low = mid + 1;
-    } else {
-      high = mid - 1;
-    }
+  let truncatedText = text.substring(0, charPosition);
+  let tokenCount = await tokenCountFn(truncatedText);
+
+  const maxIterations = 5;
+  let iterations = 0;
+
+  while (tokenCount > tokenLimit && iterations < maxIterations && charPosition > 0) {
+    const overageRatio = tokenLimit / tokenCount;
+    charPosition = Math.floor(charPosition * overageRatio * TRUNCATION_SAFETY_BUFFER);
+    truncatedText = text.substring(0, charPosition);
+    tokenCount = await tokenCountFn(truncatedText);
+    iterations++;
   }
 
-  const finalTokenCount = await tokenCountFn(bestText);
-
   logger.warn(
-    `[textTokenLimiter] Text truncated from ${originalTokenCount} to ${finalTokenCount} tokens (limit: ${tokenLimit})`,
+    `[textTokenLimiter] Text truncated from ${originalTokenCount} to ${tokenCount} tokens (limit: ${tokenLimit})`,
   );
 
   return {
-    text: bestText,
-    tokenCount: finalTokenCount,
+    text: truncatedText,
+    tokenCount,
     wasTruncated: true,
   };
 }
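
To try the ratio-based approach outside the repository, the logic in this diff can be condensed into a standalone sketch. This is not the committed function: `truncateByRatio` and `wordCount` are hypothetical names, logging is omitted, the early return for text already under the limit is assumed from the diff context, and the extra `calls` field exists only to show how many times the counter ran.

```typescript
type TokenCountFn = (text: string) => number | Promise<number>;

const TRUNCATION_SAFETY_BUFFER = 0.98;

interface TruncationResult {
  text: string;
  tokenCount: number;
  wasTruncated: boolean;
  calls: number; // illustration only, not part of the committed return type
}

async function truncateByRatio(
  text: string,
  tokenLimit: number,
  tokenCountFn: TokenCountFn,
): Promise<TruncationResult> {
  let calls = 0;
  const count = async (s: string): Promise<number> => {
    calls++;
    return tokenCountFn(s);
  };

  const originalTokenCount = await count(text);
  // Assumed early return: nothing to do when the text already fits.
  if (originalTokenCount <= tokenLimit) {
    return { text, tokenCount: originalTokenCount, wasTruncated: false, calls };
  }

  // First guess: scale the length by the token ratio and undershoot slightly.
  let charPosition = Math.floor(
    text.length * (tokenLimit / originalTokenCount) * TRUNCATION_SAFETY_BUFFER,
  );
  let truncatedText = text.substring(0, charPosition);
  let tokenCount = await count(truncatedText);

  // Refine only if the guess still overshoots, with a hard cap on iterations.
  const maxIterations = 5;
  let iterations = 0;
  while (tokenCount > tokenLimit && iterations < maxIterations && charPosition > 0) {
    charPosition = Math.floor(charPosition * (tokenLimit / tokenCount) * TRUNCATION_SAFETY_BUFFER);
    truncatedText = text.substring(0, charPosition);
    tokenCount = await count(truncatedText);
    iterations++;
  }

  return { text: truncatedText, tokenCount, wasTruncated: true, calls };
}

// Toy tokenizer (assumption): one token per whitespace-separated word.
const wordCount: TokenCountFn = (s) => s.split(/\s+/).filter(Boolean).length;
```

With uniformly dense input, such as a thousand repetitions of the same word, the first guess already lands under the limit, so the counter runs only twice: once on the full text and once on the truncated prefix. Note that, like the loop in the diff, this sketch can still return text above the limit if all five refinement iterations overshoot.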