LibreChat/api/app/clients/tools/structured/DALLE3.js
Danny Avila 5af1342dbb
🦥 refactor: Event-Driven Lazy Tool Loading (#11588)
* refactor: json schema tools with lazy loading

- Added LocalToolExecutor class for lazy loading and caching of tools during execution.
- Introduced ToolExecutionContext and ToolExecutor interfaces for better type management.
- Created utility functions to generate tool proxies with JSON schema support.
- Added ExtendedJsonSchema type for enhanced schema definitions.
- Updated existing toolkits to utilize the new schema and executor functionalities.
- Introduced a comprehensive tool definitions registry for managing various tool schemas.

chore: update @librechat/agents to version 3.1.2

refactor: enhance tool loading optimization and classification

- Improved the loadAgentToolsOptimized function to utilize a proxy pattern for all tools, enabling deferred execution and reducing overhead.
- Introduced caching for tool instances and refined tool classification logic to streamline tool management.
- Updated the handling of MCP tools to improve logging and error reporting for missing tools in the cache.
- Enhanced the structure of tool definitions to support better classification and integration with existing tools.

refactor: modularize tool loading and enhance optimization

- Moved the loadAgentToolsOptimized function to a new service file for better organization and maintainability.
- Updated the ToolService to utilize the new service for optimized tool loading, improving code clarity.
- Removed legacy tool loading methods and streamlined the tool loading process to enhance performance and reduce complexity.
- Introduced feature flag handling for optimized tool loading, allowing for easier toggling of this functionality.

refactor: replace loadAgentToolsWithFlag with loadAgentTools in tool loader

refactor: enhance MCP tool loading with proxy creation and classification

refactor: optimize MCP tool loading by grouping tools by server

- Introduced a Map to group cached tools by server name, improving the organization of tool data.
- Updated the createMCPProxyTool function to accept server name directly, enhancing clarity.
- Refactored the logic for handling MCP tools, streamlining the process of creating proxy tools for classification.

refactor: enhance MCP tool loading and proxy creation

- Added functionality to retrieve MCP server tools and reinitialize servers if necessary, improving tool availability.
- Updated the tool loading logic to utilize a Map for organizing tools by server, enhancing clarity and performance.
- Refactored the createToolProxy function to ensure a default response format, streamlining tool creation.

refactor: update createToolProxy to ensure consistent response format

- Modified the createToolProxy function to await the executor's execution and validate the result format.
- Ensured that the function returns a default response structure when the result is not an array of two elements, enhancing reliability in tool proxy creation.

refactor: ToolExecutionContext with toolCall property

- Added toolCall property to ToolExecutionContext interface for improved context handling during tool execution.
- Updated LocalToolExecutor to include toolCall in the runnable configuration, allowing for more flexible tool invocation.
- Modified createToolProxy to pass toolCall from the configuration, ensuring consistent context across tool executions.

refactor: enhance event-driven tool execution and logging

- Introduced ToolExecuteOptions for improved handling of event-driven tool execution, allowing for parallel execution of tool calls.
- Updated getDefaultHandlers to include support for ON_TOOL_EXECUTE events, enhancing the flexibility of tool invocation.
- Added detailed logging in LocalToolExecutor to track tool loading and execution metrics, improving observability and debugging capabilities.
- Refactored initializeClient to integrate event-driven tool loading, ensuring compatibility with the new execution model.

chore: update @librechat/agents to version 3.1.21

refactor: remove legacy tool loading and executor components

- Eliminated the loadAgentToolsWithFlag function, simplifying the tool loading process by directly using loadAgentTools.
- Removed the LocalToolExecutor and related executor components to streamline the tool execution architecture.
- Updated ToolService and related files to reflect the removal of deprecated features, enhancing code clarity and maintainability.

refactor: enhance tool classification and definitions handling

- Updated the loadAgentTools function to return toolDefinitions alongside toolRegistry, improving the structure of tool data returned to clients.
- Removed the convertRegistryToDefinitions function from the initialize.js file, simplifying the initialization process.
- Adjusted the buildToolClassification function to ensure toolDefinitions are built and returned simultaneously with the toolRegistry, enhancing efficiency in tool management.
- Updated type definitions in initialize.ts to include toolDefinitions, ensuring consistency across the codebase.

refactor: implement event-driven tool execution handler

- Introduced createToolExecuteHandler function to streamline the handling of ON_TOOL_EXECUTE events, allowing for parallel execution of tool calls.
- Updated getDefaultHandlers to utilize the new handler, simplifying the event-driven architecture.
- Added handlers.ts file to encapsulate tool execution logic, improving code organization and maintainability.
- Enhanced OpenAI handlers to integrate the new tool execution capabilities, ensuring consistent event handling across the application.

refactor: integrate event-driven tool execution options

- Added toolExecuteOptions to support event-driven tool execution in OpenAI and responses controllers, enhancing flexibility in tool handling.
- Updated handlers to utilize createToolExecuteHandler, allowing for streamlined execution of tools during agent interactions.
- Refactored service dependencies to include toolExecuteOptions, ensuring consistent integration across the application.

refactor: enhance tool loading with definitionsOnly parameter

- Updated createToolLoader and loadAgentTools functions to include a definitionsOnly parameter, allowing for the retrieval of only serializable tool definitions in event-driven mode.
- Adjusted related interfaces and documentation to reflect the new parameter, improving clarity and flexibility in tool management.
- Ensured compatibility across various components by integrating the definitionsOnly option in the initialization process.

refactor: improve agent tool presence check in initialization

- Added a check for tool presence using a new hasAgentTools variable, which evaluates both structuredTools and toolDefinitions.
- Updated the conditional logic in the agent initialization process to utilize the hasAgentTools variable, enhancing clarity and maintainability in tool management.

refactor: enhance agent tool extraction to support tool definitions

- Updated the extractMCPServers function to handle both tool instances and serializable tool definitions, improving flexibility in agent tool management.
- Added a new property toolDefinitions to the AgentWithTools type for better integration of event-driven mode.
- Enhanced documentation to clarify the function's capabilities in extracting unique MCP server names from both tools and tool definitions.

refactor: enhance tool classification and registry building

- Added serverName property to ToolDefinition for improved tool identification.
- Introduced buildToolRegistry function to streamline the creation of tool registries based on MCP tool definitions and agent options.
- Updated buildToolClassification to utilize the new registry building logic, ensuring basic definitions are returned even when advanced classification features are not allowed.
- Enhanced documentation and logging for clarity in tool classification processes.

refactor: update @librechat/agents dependency to version 3.1.22

fix: expose loadTools function in ToolService

- Added loadTools function to the exported module in ToolService.js, enhancing the accessibility of tool loading functionality.

chore: remove configurable options from tool execute options in OpenAI controller

refactor: enhance tool loading mechanism to utilize agent-specific context

chore: update @librechat/agents dependency to version 3.1.23

fix: simplify result handling in createToolExecuteHandler

* refactor: loadToolDefinitions for efficient tool loading in event-driven mode

* refactor: replace legacy tool loading with loadToolsForExecution in OpenAI and responses controllers

- Updated OpenAIChatCompletionController and createResponse functions to utilize loadToolsForExecution for improved tool loading.
- Removed deprecated loadToolsLegacy references, streamlining the tool execution process.
- Enhanced tool loading options to include agent-specific context and configurations.

* refactor: enhance tool loading and execution handling

- Introduced loadActionToolsForExecution function to streamline loading of action tools, improving organization and maintainability.
- Updated loadToolsForExecution to handle both regular and action tools, optimizing the tool loading process.
- Added detailed logging for missing tools in createToolExecuteHandler, enhancing error visibility.
- Refactored tool definitions to normalize action tool names, improving consistency in tool management.

* refactor: enhance built-in tool definitions loading

- Updated loadToolDefinitions to include descriptions and parameters from the tool registry for built-in tools, improving the clarity and usability of tool definitions.
- Integrated getToolDefinition to streamline the retrieval of tool metadata, enhancing the overall tool management process.

* feat: add action tool definitions loading to tool service

- Introduced getActionToolDefinitions function to load action tool definitions based on agent ID and tool names, enhancing the tool loading process.
- Updated loadToolDefinitions to integrate action tool definitions, allowing for better management and retrieval of action-specific tools.
- Added comprehensive tests for action tool definitions to ensure correct loading and parameter handling, improving overall reliability and functionality.

* chore: update @librechat/agents dependency to version 3.1.26

* refactor: add toolEndCallback to handle tool execution results

* fix: tool definitions and execution handling

- Introduced native tools (execute_code, file_search, web_search) to the tool service, allowing for better integration and management of these tools.
- Updated isBuiltInTool function to include native tools in the built-in check, improving tool recognition.
- Added comprehensive tests for loading parameters of native tools, ensuring correct functionality and parameter handling.
- Enhanced tool definitions registry to include new agent tool definitions, streamlining tool retrieval and management.

* refactor: enhance tool loading and execution context

- Added toolRegistry to the context for OpenAIChatCompletionController and createResponse functions, improving tool management.
- Updated loadToolsForExecution to utilize toolRegistry for better integration of programmatic tools and tool search functionalities.
- Enhanced the initialization process to include toolRegistry in agent context, streamlining tool access and configuration.
- Refactored tool classification logic to support event-driven execution, ensuring compatibility with new tool definitions.

* chore: add request duration logging to OpenAI and Responses controllers

- Introduced logging for request start and completion times in OpenAIChatCompletionController and createResponse functions.
- Calculated and logged the duration of each request, enhancing observability and performance tracking.
- Improved debugging capabilities by providing detailed logs for both streaming and non-streaming responses.

* chore: update @librechat/agents dependency to version 3.1.27

* refactor: implement buildToolSet function for tool management

- Introduced buildToolSet function to streamline the creation of tool sets from agent configurations, enhancing tool management across various controllers.
- Updated AgentClient, OpenAIChatCompletionController, and createResponse functions to utilize buildToolSet, improving consistency in tool handling.
- Added comprehensive tests for buildToolSet to ensure correct functionality and edge case handling, enhancing overall reliability.

* refactor: update import paths for ToolExecuteOptions and createToolExecuteHandler

* fix: update GoogleSearch.js description for maximum search results

- Changed the default maximum number of search results from 10 to 5 in the Google Search JSON schema description, ensuring accurate documentation of the expected behavior.

* chore: remove deprecated Browser tool and associated assets

- Deleted the Browser tool definition from manifest.json, which included its name, plugin key, description, and authentication configuration.
- Removed the web-browser.svg asset as it is no longer needed following the removal of the Browser tool.

* fix: ensure tool definitions are valid before processing

- Added a check to verify the existence of tool definitions in the registry before accessing their properties, preventing potential runtime errors.
- Updated the loading logic for built-in tool definitions to ensure that only valid definitions are pushed to the built-in tool definitions array.

* fix: extend ExtendedJsonSchema to support 'null' type and nullable enums

- Updated the ExtendedJsonSchema type to include 'null' as a valid type option.
- Modified the enum property to accept an array of values that can include strings, numbers, booleans, and null, enhancing schema flexibility.

* test: add comprehensive tests for tool definitions loading and registry behavior

- Implemented tests to verify the handling of built-in tools without registry definitions, ensuring they are skipped correctly.
- Added tests to confirm that built-in tools include descriptions and parameters in the registry.
- Enhanced tests for action tools, checking for proper inclusion of metadata and handling of tools without parameters in the registry.

* test: add tests for mixed-type and number enum schema handling

- Introduced tests to validate the parsing of mixed-type enum values, including strings, numbers, booleans, and null.
- Added tests for number enum schema values to ensure correct parsing of numeric inputs, enhancing schema validation coverage.

* fix: update mock implementation for @librechat/agents

- Changed the mock for @librechat/agents to spread the actual module's properties, ensuring that all necessary functionalities are preserved in tests.
- This adjustment enhances the accuracy of the tests by reflecting the real structure of the module.

* fix: change max_results type in GoogleSearch schema from number to integer

- Updated the type of max_results in the Google Search JSON schema to 'integer' for better type accuracy and validation consistency.

* fix: update max_results description and type in GoogleSearch schema

- Changed the type of max_results from 'number' to 'integer' for improved type accuracy.
- Updated the description to reflect the new default maximum number of search results, changing it from 10 to 5.

* refactor: remove unused code and improve tool registry handling

- Eliminated outdated comments and conditional logic related to event-driven mode in the ToolService.
- Enhanced the handling of the tool registry by ensuring it is configurable for better integration during tool execution.

* feat: add definitionsOnly option to buildToolClassification for event-driven mode

- Introduced a new parameter, definitionsOnly, to the BuildToolClassificationParams interface to enable a mode that skips tool instance creation.
- Updated the buildToolClassification function to conditionally add tool definitions without instantiating tools when definitionsOnly is true.
- Modified the loadToolDefinitions function to pass definitionsOnly as true, ensuring compatibility with the new feature.

* test: add unit tests for buildToolClassification with definitionsOnly option

- Implemented tests to verify the behavior of buildToolClassification when definitionsOnly is set to true or false.
- Ensured that tool instances are not created when definitionsOnly is true, while still adding necessary tool definitions.
- Confirmed that loadAuthValues is called appropriately based on the definitionsOnly parameter, enhancing test coverage for this new feature.
2026-02-01 08:50:57 -05:00

243 lines
9.7 KiB
JavaScript

const path = require('path');
const OpenAI = require('openai');
const { v4: uuidv4 } = require('uuid');
const { ProxyAgent, fetch } = require('undici');
const { Tool } = require('@langchain/core/tools');
const { logger } = require('@librechat/data-schemas');
const { getImageBasename, extractBaseURL } = require('@librechat/api');
const { FileContext, ContentTypes } = require('librechat-data-provider');
const dalle3JsonSchema = {
type: 'object',
properties: {
prompt: {
type: 'string',
maxLength: 4000,
description:
'A text description of the desired image, following the rules, up to 4000 characters.',
},
style: {
type: 'string',
enum: ['vivid', 'natural'],
description:
'Must be one of `vivid` or `natural`. `vivid` generates hyper-real and dramatic images, `natural` produces more natural, less hyper-real looking images',
},
quality: {
type: 'string',
enum: ['hd', 'standard'],
description: 'The quality of the generated image. Only `hd` and `standard` are supported.',
},
size: {
type: 'string',
enum: ['1024x1024', '1792x1024', '1024x1792'],
description:
'The size of the requested image. Use 1024x1024 (square) as the default, 1792x1024 if the user requests a wide image, and 1024x1792 for full-body portraits. Always include this parameter in the request.',
},
},
required: ['prompt', 'style', 'quality', 'size'],
};
const displayMessage =
"DALL-E displayed an image. All generated images are already plainly visible, so don't repeat the descriptions in detail. Do not list download links as they are available in the UI already. The user may download the images by clicking on them, but do not mention anything about downloading to the user.";
class DALLE3 extends Tool {
constructor(fields = {}) {
super();
/** @type {boolean} Used to initialize the Tool without necessary variables. */
this.override = fields.override ?? false;
/** @type {boolean} Necessary for output to contain all image metadata. */
this.returnMetadata = fields.returnMetadata ?? false;
this.userId = fields.userId;
this.fileStrategy = fields.fileStrategy;
/** @type {boolean} */
this.isAgent = fields.isAgent;
if (fields.processFileURL) {
/** @type {processFileURL} Necessary for output to contain all image metadata. */
this.processFileURL = fields.processFileURL.bind(this);
}
let apiKey = fields.DALLE3_API_KEY ?? fields.DALLE_API_KEY ?? this.getApiKey();
const config = { apiKey };
if (process.env.DALLE_REVERSE_PROXY) {
config.baseURL = extractBaseURL(process.env.DALLE_REVERSE_PROXY);
}
if (process.env.DALLE3_AZURE_API_VERSION && process.env.DALLE3_BASEURL) {
config.baseURL = process.env.DALLE3_BASEURL;
config.defaultQuery = { 'api-version': process.env.DALLE3_AZURE_API_VERSION };
config.defaultHeaders = {
'api-key': process.env.DALLE3_API_KEY,
'Content-Type': 'application/json',
};
config.apiKey = process.env.DALLE3_API_KEY;
}
if (process.env.PROXY) {
const proxyAgent = new ProxyAgent(process.env.PROXY);
config.fetchOptions = {
dispatcher: proxyAgent,
};
}
/** @type {OpenAI} */
this.openai = new OpenAI(config);
this.name = 'dalle';
this.description = `Use DALLE to create images from text descriptions.
- It requires prompts to be in English, detailed, and to specify image type and human features for diversity.
- Create only one image, without repeating or listing descriptions outside the "prompts" field.
- Maintains the original intent of the description, with parameters for image style, quality, and size to tailor the output.`;
this.description_for_model =
process.env.DALLE3_SYSTEM_PROMPT ??
`// Whenever a description of an image is given, generate prompts (following these rules), and use dalle to create the image. If the user does not ask for a specific number of images, default to creating 2 prompts to send to dalle that are written to be as diverse as possible. All prompts sent to dalle must abide by the following policies:
// 1. Prompts must be in English. Translate to English if needed.
// 2. One image per function call. Create only 1 image per request unless explicitly told to generate more than 1 image.
// 3. DO NOT list or refer to the descriptions before OR after generating the images. They should ONLY ever be written out ONCE, in the \`"prompts"\` field of the request. You do not need to ask for permission to generate, just do it!
// 4. Always mention the image type (photo, oil painting, watercolor painting, illustration, cartoon, drawing, vector, render, etc.) at the beginning of the caption. Unless the captions suggests otherwise, make one of the images a photo.
// 5. Diversify depictions of ALL images with people to always include always DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.
// - EXPLICITLY specify these attributes, not abstractly reference them. The attributes should be specified in a minimal way and should directly describe their physical form.
// - Your choices should be grounded in reality. For example, all of a given OCCUPATION should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.
// - Use "various" or "diverse" ONLY IF the description refers to groups of more than 3 people. Do not change the number of people requested in the original description.
// - Don't alter memes, fictional character origins, or unseen people. Maintain the original prompt's intent and prioritize quality.
// The prompt must intricately describe every part of the image in concrete, objective detail. THINK about what the end goal of the description is, and extrapolate that to what would make satisfying images.
// All descriptions sent to dalle should be a paragraph of text that is extremely descriptive and detailed. Each should be more than 3 sentences long.
// - The "vivid" style is HIGHLY preferred, but "natural" is also supported.`;
this.schema = dalle3JsonSchema;
}
static get jsonSchema() {
return dalle3JsonSchema;
}
getApiKey() {
const apiKey = process.env.DALLE3_API_KEY ?? process.env.DALLE_API_KEY ?? '';
if (!apiKey && !this.override) {
throw new Error('Missing DALLE_API_KEY environment variable.');
}
return apiKey;
}
replaceUnwantedChars(inputString) {
return inputString
.replace(/\r\n|\r|\n/g, ' ')
.replace(/"/g, '')
.trim();
}
wrapInMarkdown(imageUrl) {
return `![generated image](${imageUrl})`;
}
returnValue(value) {
if (this.isAgent === true && typeof value === 'string') {
return [value, {}];
} else if (this.isAgent === true && typeof value === 'object') {
return [displayMessage, value];
}
return value;
}
async _call(data) {
const { prompt, quality = 'standard', size = '1024x1024', style = 'vivid' } = data;
if (!prompt) {
throw new Error('Missing required field: prompt');
}
let resp;
try {
resp = await this.openai.images.generate({
model: 'dall-e-3',
quality,
style,
size,
prompt: this.replaceUnwantedChars(prompt),
n: 1,
});
} catch (error) {
logger.error('[DALL-E-3] Problem generating the image:', error);
return this
.returnValue(`Something went wrong when trying to generate the image. The DALL-E API may be unavailable:
Error Message: ${error.message}`);
}
if (!resp) {
return this.returnValue(
'Something went wrong when trying to generate the image. The DALL-E API may be unavailable',
);
}
const theImageUrl = resp.data[0].url;
if (!theImageUrl) {
return this.returnValue(
'No image URL returned from OpenAI API. There may be a problem with the API or your configuration.',
);
}
if (this.isAgent) {
let fetchOptions = {};
if (process.env.PROXY) {
const proxyAgent = new ProxyAgent(process.env.PROXY);
fetchOptions.dispatcher = proxyAgent;
}
const imageResponse = await fetch(theImageUrl, fetchOptions);
const arrayBuffer = await imageResponse.arrayBuffer();
const base64 = Buffer.from(arrayBuffer).toString('base64');
const content = [
{
type: ContentTypes.IMAGE_URL,
image_url: {
url: `data:image/png;base64,${base64}`,
},
},
];
const response = [
{
type: ContentTypes.TEXT,
text: displayMessage,
},
];
return [response, { content }];
}
const imageBasename = getImageBasename(theImageUrl);
const imageExt = path.extname(imageBasename);
const extension = imageExt.startsWith('.') ? imageExt.slice(1) : imageExt;
const imageName = `img-${uuidv4()}.${extension}`;
logger.debug('[DALL-E-3]', {
imageName,
imageBasename,
imageExt,
extension,
theImageUrl,
data: resp.data[0],
});
try {
const result = await this.processFileURL({
URL: theImageUrl,
basePath: 'images',
userId: this.userId,
fileName: imageName,
fileStrategy: this.fileStrategy,
context: FileContext.image_generation,
});
if (this.returnMetadata) {
this.result = result;
} else {
this.result = this.wrapInMarkdown(result.filepath);
}
} catch (error) {
logger.error('Error while saving the image:', error);
this.result = `Failed to save the image locally. ${error.message}`;
}
return this.returnValue(this.result);
}
}
module.exports = DALLE3;