🦥 refactor: Event-Driven Lazy Tool Loading (#11588)

* refactor: json schema tools with lazy loading

- Added LocalToolExecutor class for lazy loading and caching of tools during execution.
- Introduced ToolExecutionContext and ToolExecutor interfaces for better type management.
- Created utility functions to generate tool proxies with JSON schema support.
- Added ExtendedJsonSchema type for enhanced schema definitions.
- Updated existing toolkits to utilize the new schema and executor functionalities.
- Introduced a comprehensive tool definitions registry for managing various tool schemas.

chore: update @librechat/agents to version 3.1.2

refactor: enhance tool loading optimization and classification

- Improved the loadAgentToolsOptimized function to utilize a proxy pattern for all tools, enabling deferred execution and reducing overhead.
- Introduced caching for tool instances and refined tool classification logic to streamline tool management.
- Updated the handling of MCP tools to improve logging and error reporting for missing tools in the cache.
- Enhanced the structure of tool definitions to support better classification and integration with existing tools.

refactor: modularize tool loading and enhance optimization

- Moved the loadAgentToolsOptimized function to a new service file for better organization and maintainability.
- Updated the ToolService to utilize the new service for optimized tool loading, improving code clarity.
- Removed legacy tool loading methods and streamlined the tool loading process to enhance performance and reduce complexity.
- Introduced feature flag handling for optimized tool loading, allowing for easier toggling of this functionality.

refactor: replace loadAgentToolsWithFlag with loadAgentTools in tool loader

refactor: enhance MCP tool loading with proxy creation and classification

refactor: optimize MCP tool loading by grouping tools by server

- Introduced a Map to group cached tools by server name, improving the organization of tool data.
- Updated the createMCPProxyTool function to accept server name directly, enhancing clarity.
- Refactored the logic for handling MCP tools, streamlining the process of creating proxy tools for classification.

refactor: enhance MCP tool loading and proxy creation

- Added functionality to retrieve MCP server tools and reinitialize servers if necessary, improving tool availability.
- Updated the tool loading logic to utilize a Map for organizing tools by server, enhancing clarity and performance.
- Refactored the createToolProxy function to ensure a default response format, streamlining tool creation.

refactor: update createToolProxy to ensure consistent response format

- Modified the createToolProxy function to await the executor's execution and validate the result format.
- Ensured that the function returns a default response structure when the result is not an array of two elements, enhancing reliability in tool proxy creation.

refactor: ToolExecutionContext with toolCall property

- Added toolCall property to ToolExecutionContext interface for improved context handling during tool execution.
- Updated LocalToolExecutor to include toolCall in the runnable configuration, allowing for more flexible tool invocation.
- Modified createToolProxy to pass toolCall from the configuration, ensuring consistent context across tool executions.

refactor: enhance event-driven tool execution and logging

- Introduced ToolExecuteOptions for improved handling of event-driven tool execution, allowing for parallel execution of tool calls.
- Updated getDefaultHandlers to include support for ON_TOOL_EXECUTE events, enhancing the flexibility of tool invocation.
- Added detailed logging in LocalToolExecutor to track tool loading and execution metrics, improving observability and debugging capabilities.
- Refactored initializeClient to integrate event-driven tool loading, ensuring compatibility with the new execution model.
chore: update @librechat/agents to version 3.1.21

refactor: remove legacy tool loading and executor components

- Eliminated the loadAgentToolsWithFlag function, simplifying the tool loading process by directly using loadAgentTools.
- Removed the LocalToolExecutor and related executor components to streamline the tool execution architecture.
- Updated ToolService and related files to reflect the removal of deprecated features, enhancing code clarity and maintainability.

refactor: enhance tool classification and definitions handling

- Updated the loadAgentTools function to return toolDefinitions alongside toolRegistry, improving the structure of tool data returned to clients.
- Removed the convertRegistryToDefinitions function from the initialize.js file, simplifying the initialization process.
- Adjusted the buildToolClassification function to ensure toolDefinitions are built and returned simultaneously with the toolRegistry, enhancing efficiency in tool management.
- Updated type definitions in initialize.ts to include toolDefinitions, ensuring consistency across the codebase.

refactor: implement event-driven tool execution handler

- Introduced createToolExecuteHandler function to streamline the handling of ON_TOOL_EXECUTE events, allowing for parallel execution of tool calls.
- Updated getDefaultHandlers to utilize the new handler, simplifying the event-driven architecture.
- Added handlers.ts file to encapsulate tool execution logic, improving code organization and maintainability.
- Enhanced OpenAI handlers to integrate the new tool execution capabilities, ensuring consistent event handling across the application.

refactor: integrate event-driven tool execution options

- Added toolExecuteOptions to support event-driven tool execution in OpenAI and responses controllers, enhancing flexibility in tool handling.
- Updated handlers to utilize createToolExecuteHandler, allowing for streamlined execution of tools during agent interactions.
- Refactored service dependencies to include toolExecuteOptions, ensuring consistent integration across the application.

refactor: enhance tool loading with definitionsOnly parameter

- Updated createToolLoader and loadAgentTools functions to include a definitionsOnly parameter, allowing for the retrieval of only serializable tool definitions in event-driven mode.
- Adjusted related interfaces and documentation to reflect the new parameter, improving clarity and flexibility in tool management.
- Ensured compatibility across various components by integrating the definitionsOnly option in the initialization process.

refactor: improve agent tool presence check in initialization

- Added a check for tool presence using a new hasAgentTools variable, which evaluates both structuredTools and toolDefinitions.
- Updated the conditional logic in the agent initialization process to utilize the hasAgentTools variable, enhancing clarity and maintainability in tool management.

refactor: enhance agent tool extraction to support tool definitions

- Updated the extractMCPServers function to handle both tool instances and serializable tool definitions, improving flexibility in agent tool management.
- Added a new property toolDefinitions to the AgentWithTools type for better integration of event-driven mode.
- Enhanced documentation to clarify the function's capabilities in extracting unique MCP server names from both tools and tool definitions.

refactor: enhance tool classification and registry building

- Added serverName property to ToolDefinition for improved tool identification.
- Introduced buildToolRegistry function to streamline the creation of tool registries based on MCP tool definitions and agent options.
- Updated buildToolClassification to utilize the new registry building logic, ensuring basic definitions are returned even when advanced classification features are not allowed.
- Enhanced documentation and logging for clarity in tool classification processes.
refactor: update @librechat/agents dependency to version 3.1.22

fix: expose loadTools function in ToolService

- Added loadTools function to the exported module in ToolService.js, enhancing the accessibility of tool loading functionality.

chore: remove configurable options from tool execute options in OpenAI controller

refactor: enhance tool loading mechanism to utilize agent-specific context

chore: update @librechat/agents dependency to version 3.1.23

fix: simplify result handling in createToolExecuteHandler

* refactor: loadToolDefinitions for efficient tool loading in event-driven mode

* refactor: replace legacy tool loading with loadToolsForExecution in OpenAI and responses controllers

- Updated OpenAIChatCompletionController and createResponse functions to utilize loadToolsForExecution for improved tool loading.
- Removed deprecated loadToolsLegacy references, streamlining the tool execution process.
- Enhanced tool loading options to include agent-specific context and configurations.

* refactor: enhance tool loading and execution handling

- Introduced loadActionToolsForExecution function to streamline loading of action tools, improving organization and maintainability.
- Updated loadToolsForExecution to handle both regular and action tools, optimizing the tool loading process.
- Added detailed logging for missing tools in createToolExecuteHandler, enhancing error visibility.
- Refactored tool definitions to normalize action tool names, improving consistency in tool management.

* refactor: enhance built-in tool definitions loading

- Updated loadToolDefinitions to include descriptions and parameters from the tool registry for built-in tools, improving the clarity and usability of tool definitions.
- Integrated getToolDefinition to streamline the retrieval of tool metadata, enhancing the overall tool management process.
* feat: add action tool definitions loading to tool service

- Introduced getActionToolDefinitions function to load action tool definitions based on agent ID and tool names, enhancing the tool loading process.
- Updated loadToolDefinitions to integrate action tool definitions, allowing for better management and retrieval of action-specific tools.
- Added comprehensive tests for action tool definitions to ensure correct loading and parameter handling, improving overall reliability and functionality.

* chore: update @librechat/agents dependency to version 3.1.26

* refactor: add toolEndCallback to handle tool execution results

* fix: tool definitions and execution handling

- Introduced native tools (execute_code, file_search, web_search) to the tool service, allowing for better integration and management of these tools.
- Updated isBuiltInTool function to include native tools in the built-in check, improving tool recognition.
- Added comprehensive tests for loading parameters of native tools, ensuring correct functionality and parameter handling.
- Enhanced tool definitions registry to include new agent tool definitions, streamlining tool retrieval and management.

* refactor: enhance tool loading and execution context

- Added toolRegistry to the context for OpenAIChatCompletionController and createResponse functions, improving tool management.
- Updated loadToolsForExecution to utilize toolRegistry for better integration of programmatic tools and tool search functionalities.
- Enhanced the initialization process to include toolRegistry in agent context, streamlining tool access and configuration.
- Refactored tool classification logic to support event-driven execution, ensuring compatibility with new tool definitions.

* chore: add request duration logging to OpenAI and Responses controllers

- Introduced logging for request start and completion times in OpenAIChatCompletionController and createResponse functions.
- Calculated and logged the duration of each request, enhancing observability and performance tracking.
- Improved debugging capabilities by providing detailed logs for both streaming and non-streaming responses.

* chore: update @librechat/agents dependency to version 3.1.27

* refactor: implement buildToolSet function for tool management

- Introduced buildToolSet function to streamline the creation of tool sets from agent configurations, enhancing tool management across various controllers.
- Updated AgentClient, OpenAIChatCompletionController, and createResponse functions to utilize buildToolSet, improving consistency in tool handling.
- Added comprehensive tests for buildToolSet to ensure correct functionality and edge case handling, enhancing overall reliability.

* refactor: update import paths for ToolExecuteOptions and createToolExecuteHandler

* fix: update GoogleSearch.js description for maximum search results

- Changed the default maximum number of search results from 10 to 5 in the Google Search JSON schema description, ensuring accurate documentation of the expected behavior.

* chore: remove deprecated Browser tool and associated assets

- Deleted the Browser tool definition from manifest.json, which included its name, plugin key, description, and authentication configuration.
- Removed the web-browser.svg asset as it is no longer needed following the removal of the Browser tool.

* fix: ensure tool definitions are valid before processing

- Added a check to verify the existence of tool definitions in the registry before accessing their properties, preventing potential runtime errors.
- Updated the loading logic for built-in tool definitions to ensure that only valid definitions are pushed to the built-in tool definitions array.

* fix: extend ExtendedJsonSchema to support 'null' type and nullable enums

- Updated the ExtendedJsonSchema type to include 'null' as a valid type option.
- Modified the enum property to accept an array of values that can include strings, numbers, booleans, and null, enhancing schema flexibility.

* test: add comprehensive tests for tool definitions loading and registry behavior

- Implemented tests to verify the handling of built-in tools without registry definitions, ensuring they are skipped correctly.
- Added tests to confirm that built-in tools include descriptions and parameters in the registry.
- Enhanced tests for action tools, checking for proper inclusion of metadata and handling of tools without parameters in the registry.

* test: add tests for mixed-type and number enum schema handling

- Introduced tests to validate the parsing of mixed-type enum values, including strings, numbers, booleans, and null.
- Added tests for number enum schema values to ensure correct parsing of numeric inputs, enhancing schema validation coverage.

* fix: update mock implementation for @librechat/agents

- Changed the mock for @librechat/agents to spread the actual module's properties, ensuring that all necessary functionalities are preserved in tests.
- This adjustment enhances the accuracy of the tests by reflecting the real structure of the module.

* fix: change max_results type in GoogleSearch schema from number to integer

- Updated the type of max_results in the Google Search JSON schema to 'integer' for better type accuracy and validation consistency.

* fix: update max_results description and type in GoogleSearch schema

- Changed the type of max_results from 'number' to 'integer' for improved type accuracy.
- Updated the description to reflect the new default maximum number of search results, changing it from 10 to 5.

* refactor: remove unused code and improve tool registry handling

- Eliminated outdated comments and conditional logic related to event-driven mode in the ToolService.
- Enhanced the handling of the tool registry by ensuring it is configurable for better integration during tool execution.
* feat: add definitionsOnly option to buildToolClassification for event-driven mode

- Introduced a new parameter, definitionsOnly, to the BuildToolClassificationParams interface to enable a mode that skips tool instance creation.
- Updated the buildToolClassification function to conditionally add tool definitions without instantiating tools when definitionsOnly is true.
- Modified the loadToolDefinitions function to pass definitionsOnly as true, ensuring compatibility with the new feature.

* test: add unit tests for buildToolClassification with definitionsOnly option

- Implemented tests to verify the behavior of buildToolClassification when definitionsOnly is set to true or false.
- Ensured that tool instances are not created when definitionsOnly is true, while still adding necessary tool definitions.
- Confirmed that loadAuthValues is called appropriately based on the definitionsOnly parameter, enhancing test coverage for this new feature.
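
The definitionsOnly behavior described in the message can be pictured with a small sketch. It is not the code from this commit: every name below (ToolDefinitionSketch, ToolInstanceSketch, buildClassificationSketch, createInstance) is hypothetical, and the only behavior taken from the message is that definitions are always returned while instance creation is skipped when definitionsOnly is true.

```typescript
// Hypothetical shapes for illustration only; the real types live in the LibreChat codebase.
interface ToolDefinitionSketch {
  name: string;
  description: string;
}

interface ToolInstanceSketch {
  name: string;
  invoke: (input: string) => Promise<string>;
}

interface ClassificationSketch {
  toolDefinitions: ToolDefinitionSketch[];
  tools: ToolInstanceSketch[];
}

interface BuildParamsSketch {
  definitions: ToolDefinitionSketch[];
  /** When true, return serializable definitions and skip instance creation. */
  definitionsOnly?: boolean;
  /** Assumed stand-in for the costly step (auth lookup, tool construction). */
  createInstance: (def: ToolDefinitionSketch) => ToolInstanceSketch;
}

function buildClassificationSketch(params: BuildParamsSketch): ClassificationSketch {
  const { definitions, definitionsOnly = false, createInstance } = params;
  // Definitions are cheap and serializable, so they are always included.
  const result: ClassificationSketch = { toolDefinitions: [...definitions], tools: [] };
  if (definitionsOnly) {
    // Event-driven mode: instances are created later, only for tools that are called.
    return result;
  }
  result.tools = definitions.map((def) => createInstance(def));
  return result;
}
```

With definitionsOnly set, the factory is never called, which is the property the unit tests in the message are described as checking.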
2026-02-16 23:48:09 +01:00 · 2026-02-01 08:50:57 -05:00 · 2026-02-01 08:50:57 -05:00 · 5af1342dbb
commit 5af1342dbb
parent 6279ea8dd7
46 changed files with 3297 additions and 565 deletions
--- a/packages/api/src/tools/toolkits/gemini.ts
+++ b/packages/api/src/tools/toolkits/gemini.ts
@@ -1,4 +1,4 @@
-import { z } from 'zod';
+import type { ExtendedJsonSchema } from '../registry/definitions';

 /** Default description for Gemini image generation tool */
 const DEFAULT_GEMINI_IMAGE_GEN_DESCRIPTION =
@@ -46,6 +46,35 @@ const getGeminiImageIdsDescription = () => {
  return process.env.GEMINI_IMAGE_IDS_DESCRIPTION || DEFAULT_GEMINI_IMAGE_IDS_DESCRIPTION;
 };

+const geminiImageGenJsonSchema: ExtendedJsonSchema = {
+  type: 'object',
+  properties: {
+    prompt: {
+      type: 'string',
+      maxLength: 32000,
+      description: getGeminiImageGenPromptDescription(),
+    },
+    image_ids: {
+      type: 'array',
+      items: { type: 'string' },
+      description: getGeminiImageIdsDescription(),
+    },
+    aspectRatio: {
+      type: 'string',
+      enum: ['1:1', '2:3', '3:2', '3:4', '4:3', '4:5', '5:4', '9:16', '16:9', '21:9'],
+      description:
+        'The aspect ratio of the generated image. Use 16:9 or 3:2 for landscape, 9:16 or 2:3 for portrait, 21:9 for ultra-wide/cinematic, 1:1 for square. Defaults to 1:1 if not specified.',
+    },
+    imageSize: {
+      type: 'string',
+      enum: ['1K', '2K', '4K'],
+      description:
+        'The resolution of the generated image. Use 1K for standard, 2K for high, 4K for maximum quality. Defaults to 1K if not specified.',
+    },
+  },
+  required: ['prompt'],
+};
+
 export const geminiToolkit = {
  gemini_image_gen: {
    name: 'gemini_image_gen' as const,
@@ -77,22 +106,7 @@ export const geminiToolkit = {
 9. Use imageSize to control the resolution: 1K (standard), 2K (high), 4K (maximum quality).

 The prompt should be a detailed paragraph describing every part of the image in concrete, objective detail.`,
-    schema: z.object({
-      prompt: z.string().max(32000).describe(getGeminiImageGenPromptDescription()),
-      image_ids: z.array(z.string()).optional().describe(getGeminiImageIdsDescription()),
-      aspectRatio: z
-        .enum(['1:1', '2:3', '3:2', '3:4', '4:3', '4:5', '5:4', '9:16', '16:9', '21:9'])
-        .optional()
-        .describe(
-          'The aspect ratio of the generated image. Use 16:9 or 3:2 for landscape, 9:16 or 2:3 for portrait, 21:9 for ultra-wide/cinematic, 1:1 for square. Defaults to 1:1 if not specified.',
-        ),
-      imageSize: z
-        .enum(['1K', '2K', '4K'])
-        .optional()
-        .describe(
-          'The resolution of the generated image. Use 1K for standard, 2K for high, 4K for maximum quality. Defaults to 1K if not specified.',
-        ),
-    }),
+    schema: geminiImageGenJsonSchema,
    responseFormat: 'content_and_artifact' as const,
  },
 } as const;
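
In the converted schema above, fields that were `.optional()` in the zod version are simply left out of `required`, and `.max(32000)` becomes `maxLength`. The sketch below shows how a plain-object schema of this shape can be checked without zod. It is only an illustration: SchemaSketch is an assumed, simplified stand-in for ExtendedJsonSchema (whose real definition is not shown in this diff), checkArgs is a hypothetical helper, and the descriptions are shortened.

```typescript
// Assumed, simplified stand-in for ExtendedJsonSchema (the real type is in ../registry/definitions).
type SchemaSketch = {
  type: 'object' | 'string' | 'array';
  properties?: Record<string, SchemaSketch>;
  items?: SchemaSketch;
  enum?: string[];
  maxLength?: number;
  minItems?: number;
  required?: string[];
  description?: string;
};

// Hypothetical helper: returns a list of human-readable problems, empty when args are valid.
function checkArgs(schema: SchemaSketch, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (args[key] === undefined) {
      errors.push(`missing required field: ${key}`);
    }
  }
  for (const [key, value] of Object.entries(args)) {
    const prop = schema.properties?.[key];
    if (!prop || value === undefined) {
      continue; // unknown or omitted optional fields are ignored in this sketch
    }
    if (prop.type === 'string') {
      if (typeof value !== 'string') {
        errors.push(`${key} must be a string`);
      } else if (prop.maxLength !== undefined && value.length > prop.maxLength) {
        errors.push(`${key} exceeds maxLength ${prop.maxLength}`);
      } else if (prop.enum && !prop.enum.includes(value)) {
        errors.push(`${key} must be one of: ${prop.enum.join(', ')}`);
      }
    } else if (prop.type === 'array') {
      if (!Array.isArray(value)) {
        errors.push(`${key} must be an array`);
      } else if (prop.minItems !== undefined && value.length < prop.minItems) {
        errors.push(`${key} needs at least ${prop.minItems} item(s)`);
      }
    }
  }
  return errors;
}

// Trimmed copy of the Gemini schema from the diff, with shortened descriptions.
const exampleSchema: SchemaSketch = {
  type: 'object',
  properties: {
    prompt: { type: 'string', maxLength: 32000, description: 'Image prompt' },
    image_ids: { type: 'array', items: { type: 'string' }, description: 'Prior image IDs' },
    imageSize: { type: 'string', enum: ['1K', '2K', '4K'], description: 'Resolution' },
  },
  required: ['prompt'],
};
```

Because the schema is a plain serializable object, it can be sent as a tool definition without constructing the tool, which appears to be the motivation for replacing the zod schemas here.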
--- a/packages/api/src/tools/toolkits/oai.ts
+++ b/packages/api/src/tools/toolkits/oai.ts
@@ -1,4 +1,4 @@
-import { z } from 'zod';
+import type { ExtendedJsonSchema } from '../registry/definitions';

 /** Default descriptions for image generation tool  */
 const DEFAULT_IMAGE_GEN_DESCRIPTION =
@@ -67,87 +67,81 @@ const getImageEditPromptDescription = () => {
  return process.env.IMAGE_EDIT_OAI_PROMPT_DESCRIPTION || DEFAULT_IMAGE_EDIT_PROMPT_DESCRIPTION;
 };

+const imageGenOaiJsonSchema: ExtendedJsonSchema = {
+  type: 'object',
+  properties: {
+    prompt: {
+      type: 'string',
+      maxLength: 32000,
+      description: getImageGenPromptDescription(),
+    },
+    background: {
+      type: 'string',
+      enum: ['transparent', 'opaque', 'auto'],
+      description:
+        'Sets transparency for the background. Must be one of transparent, opaque or auto (default). When transparent, the output format should be png or webp.',
+    },
+    quality: {
+      type: 'string',
+      enum: ['auto', 'high', 'medium', 'low'],
+      description: 'The quality of the image. One of auto (default), high, medium, or low.',
+    },
+    size: {
+      type: 'string',
+      enum: ['auto', '1024x1024', '1536x1024', '1024x1536'],
+      description:
+        'The size of the generated image. One of 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait), or auto (default).',
+    },
+  },
+  required: ['prompt'],
+};
+
+const imageEditOaiJsonSchema: ExtendedJsonSchema = {
+  type: 'object',
+  properties: {
+    image_ids: {
+      type: 'array',
+      items: { type: 'string' },
+      minItems: 1,
+      description: `IDs (image ID strings) of previously generated or uploaded images that should guide the edit.
+
+Guidelines:
+- If the user's request depends on any prior image(s), copy their image IDs into the \`image_ids\` array (in the same order the user refers to them).  
+- Never invent or hallucinate IDs; only use IDs that are still visible in the conversation context.
+- If no earlier image is relevant, omit the field entirely.`,
+    },
+    prompt: {
+      type: 'string',
+      maxLength: 32000,
+      description: getImageEditPromptDescription(),
+    },
+    quality: {
+      type: 'string',
+      enum: ['auto', 'high', 'medium', 'low'],
+      description:
+        'The quality of the image. One of auto (default), high, medium, or low. High/medium/low only supported for gpt-image-1.',
+    },
+    size: {
+      type: 'string',
+      enum: ['auto', '1024x1024', '1536x1024', '1024x1536', '256x256', '512x512'],
+      description:
+        'The size of the generated images. For gpt-image-1: auto (default), 1024x1024, 1536x1024, 1024x1536. For dall-e-2: 256x256, 512x512, 1024x1024.',
+    },
+  },
+  required: ['image_ids', 'prompt'],
+};
+
 export const oaiToolkit = {
  image_gen_oai: {
    name: 'image_gen_oai' as const,
    description: getImageGenDescription(),
-    schema: z.object({
-      prompt: z.string().max(32000).describe(getImageGenPromptDescription()),
-      background: z
-        .enum(['transparent', 'opaque', 'auto'])
-        .optional()
-        .describe(
-          'Sets transparency for the background. Must be one of transparent, opaque or auto (default). When transparent, the output format should be png or webp.',
-        ),
-      /*
-        n: z
-          .number()
-          .int()
-          .min(1)
-          .max(10)
-          .optional()
-          .describe('The number of images to generate. Must be between 1 and 10.'),
-        output_compression: z
-          .number()
-          .int()
-          .min(0)
-          .max(100)
-          .optional()
-          .describe('The compression level (0-100%) for webp or jpeg formats. Defaults to 100.'),
-           */
-      quality: z
-        .enum(['auto', 'high', 'medium', 'low'])
-        .optional()
-        .describe('The quality of the image. One of auto (default), high, medium, or low.'),
-      size: z
-        .enum(['auto', '1024x1024', '1536x1024', '1024x1536'])
-        .optional()
-        .describe(
-          'The size of the generated image. One of 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait), or auto (default).',
-        ),
-    }),
+    schema: imageGenOaiJsonSchema,
    responseFormat: 'content_and_artifact' as const,
  } as const,
  image_edit_oai: {
    name: 'image_edit_oai' as const,
    description: getImageEditDescription(),
-    schema: z.object({
-      image_ids: z
-        .array(z.string())
-        .min(1)
-        .describe(
-          `
-IDs (image ID strings) of previously generated or uploaded images that should guide the edit.
-
-Guidelines:
- If the user's request depends on any prior image(s), copy their image IDs into the \`image_ids\` array (in the same order the user refers to them).  
- Never invent or hallucinate IDs; only use IDs that are still visible in the conversation context.
- If no earlier image is relevant, omit the field entirely.
-`.trim(),
-        ),
-      prompt: z.string().max(32000).describe(getImageEditPromptDescription()),
-      /*
-        n: z
-          .number()
-          .int()
-          .min(1)
-          .max(10)
-          .optional()
-          .describe('The number of images to generate. Must be between 1 and 10. Defaults to 1.'),
-        */
-      quality: z
-        .enum(['auto', 'high', 'medium', 'low'])
-        .optional()
-        .describe(
-          'The quality of the image. One of auto (default), high, medium, or low. High/medium/low only supported for gpt-image-1.',
-        ),
-      size: z
-        .enum(['auto', '1024x1024', '1536x1024', '1024x1536', '256x256', '512x512'])
-        .optional()
-        .describe(
-          'The size of the generated images. For gpt-image-1: auto (default), 1024x1024, 1536x1024, 1024x1536. For dall-e-2: 256x256, 512x512, 1024x1024.',
-        ),
-    }),
+    schema: imageEditOaiJsonSchema,
    responseFormat: 'content_and_artifact' as const,
  },
 } as const;
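
Both toolkits keep `responseFormat: 'content_and_artifact'`, under which a tool is expected to return a two-element `[content, artifact]` pair. The commit message mentions falling back to a default response structure when a result is not an array of two elements. A minimal sketch of that kind of normalization follows; normalizeToolResult is a hypothetical name, and the fallback shape (stringified content, undefined artifact) is an assumption rather than the project's actual default.

```typescript
// A [content, artifact] pair, as used by the 'content_and_artifact' response format.
type ContentAndArtifactSketch = [string, unknown];

// Hypothetical helper: coerce any tool result into a [content, artifact] pair.
function normalizeToolResult(result: unknown): ContentAndArtifactSketch {
  if (Array.isArray(result) && result.length === 2) {
    const [content, artifact] = result;
    // Keep string content as-is; serialize anything else so the model receives text.
    return [typeof content === 'string' ? content : JSON.stringify(content) ?? '', artifact];
  }
  // Assumed fallback: treat the whole result as content and attach no artifact.
  const content = typeof result === 'string' ? result : JSON.stringify(result);
  return [content ?? '', undefined];
}
```

Normalizing at one boundary means downstream handlers can destructure the pair without checking its shape each time.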