🍌 feat: Gemini Image Generation Tool (Nano Banana) (#10676)

* Added fully functioning Agent Tool supporting Google's Nano Banana
* 🔧 refactor: Update Google credentials handling in GeminiImageGen.js
  - Refactored the credentials path to follow a consistent pattern with other Google service integrations, allowing for an environment variable override.
  - Updated documentation in README-GeminiNanoBanana.md to reflect the new credentials handling approach and removed references to hardcoded paths.
* 🛠️ refactor: Remove unnecessary whitespace in handleTools.js
* 🔧 feat: Update Gemini Image Generation Tool
  - Bump @google/genai package version to ^1.19.0 for improved functionality.
  - Refactor GeminiImageGen to createGeminiImageTool for better clarity and consistency.
  - Enhance manifest.json for Gemini Image Tools with updated descriptions and icon.
  - Add SVG icon for Gemini Image Tools.
  - Implement progress tracking for Gemini image generation in the UI.
  - Introduce new toolkit and context handling for image generation tools.

  This update improves the Gemini image generation capabilities and user experience.
* 🗑️ chore: Remove outdated Gemini image generation PNG and update SVG icon
  - Deleted the obsolete PNG file for Gemini image generation.
  - Updated the SVG icon with a new design featuring a gradient and shadow effect, enhancing visual appeal and consistency.
* fix: ESLint formatting and unused variable in GeminiImageGen
* fix: Update default model to gemini-2.5-flash-image
* ✨ feat: Enhance Gemini Image Generation Configuration
  - Updated .env.example to include new environment variables for Google Cloud region, service account configuration, and Gemini API key options.
  - Modified GeminiImageGen.js to support both user-provided API keys and Vertex AI service accounts, improving flexibility in client initialization.
  - Updated manifest.json to reflect changes in authentication methods for the Gemini Image Tools.
  - Bumped @google/genai package version to 1.19.0 in package-lock.json for compatibility with new features.
* 🔧 fix: Format Default Service Key Path in GeminiImageGen.js
  - Adjusted the return statement in getDefaultServiceKeyPath function for improved readability by formatting it across multiple lines. This change enhances code clarity without altering functionality.
* ✨ feat: Enhance Gemini Image Generation with Token Usage Tracking
  - Added `recordTokenUsage` function to track token usage for balance management.
  - Integrated token recording into the image generation process.
  - Updated Gemini image generation tool to accept optional `aspectRatio` and `imageSize` parameters for improved image customization.
  - Updated token values for new Gemini models in the transaction model.
  - Improved documentation for image generation tool descriptions and parameters.
* ✨ feat: Add new Gemini models for image generation token limits
  - Introduced token limits for 'gemini-3-pro-image' and 'gemini-2.5-flash-image' models.
  - Updated token values to enhance the Gemini image generation capabilities.
* 🔧 fix: Update Google Service Key Path for Consistency in Initialization (#11001)
* 🔧 refactor: Update GeminiImageGen for improved file handling and path resolution
  - Changed the default service key path to use process.cwd() for better compatibility.
  - Replaced synchronous file system operations with asynchronous promises for mkdir and writeFile, enhancing performance and error handling.
  - Added error handling for credential file access to prevent crashes when the file does not exist.
* 🔧 refactor: Update GeminiImageGen to streamline API key handling
  - Refactored API key checks to improve clarity and consistency.
  - Removed redundant checks for user-provided keys, enhancing code readability.
  - Ensured proper logging for API key usage across different configurations.
* 🔧 fix: Update GeminiImageGen to handle imageSize support conditionally
  - Added a check to ensure imageSize is only applied if the gemini model does not include 'gemini-2.5-flash-image', improving compatibility.
  - Enhanced the logic for setting imageConfig to prevent potential issues with unsupported configurations.
* 🔧 refactor: Simplify local storage condition in createGeminiImageTool function
* 🔧 feat: Enhance image format handling in GeminiImageGen with conversion support
* 🔧 refactor: Streamline API key initialization in GeminiImageGen
  - Simplified the handling of API keys by removing redundant checks for user-provided keys.
  - Updated logging to reflect the new priority order for API key usage, enhancing clarity and consistency.
  - Improved code readability by consolidating key retrieval logic.

---------

Co-authored-by: Dev Bhanushali <dev.bhanushali@hingehealth.com>
Co-authored-by: Danny Avila <danny@librechat.ai>
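The imageSize bullet above describes a model-dependent request option. The sketch below shows one way such an `imageConfig` object could be assembled. It is not the code from `GeminiImageGen.js`, which this diff does not include. The helpers `buildImageConfig`, `supportsImageSize`, `isAspectRatio`, and `isImageSize` are hypothetical. The rule that models whose name contains `gemini-2.5-flash-image` reject `imageSize` is taken from the commit message, not from the Gemini API documentation. Leaving unset fields out (so the service applies its own defaults) is also an assumption. The allowed values mirror the `aspectRatio` and `imageSize` enums in the `gemini_image_gen` schema.

```typescript
// Allowed values copied from the gemini_image_gen zod schema in this diff.
const ASPECT_RATIOS = [
  '1:1', '2:3', '3:2', '3:4', '4:3', '4:5', '5:4', '9:16', '16:9', '21:9',
] as const;
const IMAGE_SIZES = ['1K', '2K', '4K'] as const;

type AspectRatio = (typeof ASPECT_RATIOS)[number];
type ImageSize = (typeof IMAGE_SIZES)[number];

interface ImageConfig {
  aspectRatio?: AspectRatio;
  imageSize?: ImageSize;
}

/** Runtime guard for untrusted tool arguments (hypothetical helper). */
function isAspectRatio(value: string): value is AspectRatio {
  return (ASPECT_RATIOS as readonly string[]).includes(value);
}

/** Runtime guard for imageSize values (hypothetical helper). */
function isImageSize(value: string): value is ImageSize {
  return (IMAGE_SIZES as readonly string[]).includes(value);
}

/**
 * Assumption from the commit message: models whose name contains
 * 'gemini-2.5-flash-image' do not accept imageSize.
 */
function supportsImageSize(model: string): boolean {
  return !model.includes('gemini-2.5-flash-image');
}

/**
 * Builds an imageConfig object, omitting unset fields so the service's own
 * defaults apply. Returns undefined when there is nothing to send.
 */
function buildImageConfig(
  model: string,
  aspectRatio?: AspectRatio,
  imageSize?: ImageSize,
): ImageConfig | undefined {
  const config: ImageConfig = {};
  if (aspectRatio) {
    config.aspectRatio = aspectRatio;
  }
  // Silently drop imageSize for models assumed not to support it.
  if (imageSize && supportsImageSize(model)) {
    config.imageSize = imageSize;
  }
  return Object.keys(config).length > 0 ? config : undefined;
}
```

For example, `buildImageConfig('gemini-2.5-flash-image', '16:9', '4K')` returns `{ aspectRatio: '16:9' }`: the requested size is dropped rather than sent to a model that would reject it.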
2026-02-26 12:24:10 +01:00 · 2026-01-03 11:26:46 -05:00 · 2026-01-03 11:26:46 -05:00 · 200098d992
commit 200098d992
parent e452c1a8d9
19 changed files with 1063 additions and 55 deletions
--- a/packages/api/package.json
+++ b/packages/api/package.json
@@ -85,6 +85,7 @@
    "@azure/identity": "^4.7.0",
    "@azure/search-documents": "^12.0.0",
    "@azure/storage-blob": "^12.27.0",
+    "@google/genai": "^1.19.0",
    "@keyv/redis": "^4.3.3",
    "@langchain/core": "^0.3.80",
    "@librechat/agents": "^3.0.66",
--- a/packages/api/src/endpoints/google/initialize.ts
+++ b/packages/api/src/endpoints/google/initialize.ts
@@ -45,7 +45,7 @@ export async function initializeGoogle({
    /** Only attempt to load service key if GOOGLE_KEY is not provided */
    try {
      const serviceKeyPath =
-        process.env.GOOGLE_SERVICE_KEY_FILE || path.join(process.cwd(), 'data', 'auth.json');
+        process.env.GOOGLE_SERVICE_KEY_FILE || path.join(process.cwd(), 'api', 'data', 'auth.json');
      const loadedKey = await loadServiceKey(serviceKeyPath);
      if (loadedKey) {
        serviceKey = loadedKey;
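With this change, the fallback path is resolved against `process.cwd()` with an extra `api` segment, while `GOOGLE_SERVICE_KEY_FILE` still takes precedence. Because the expression uses `||`, an empty `GOOGLE_SERVICE_KEY_FILE` also falls back. The sketch below isolates that resolution as a pure function so it can be checked without touching the real environment. It pairs that function with a loader that returns `null` for a missing file, echoing the commit message's note about not crashing when the credential file does not exist. `resolveServiceKeyPath` and `tryLoadServiceKey` are hypothetical names. The real `loadServiceKey` called in `initialize.ts` is not shown in this diff and may behave differently.

```typescript
import { promises as fs } from 'node:fs';
import * as path from 'node:path';

type Env = Record<string, string | undefined>;

/**
 * Same precedence as the diff: a non-empty GOOGLE_SERVICE_KEY_FILE wins,
 * otherwise fall back to <cwd>/api/data/auth.json.
 */
function resolveServiceKeyPath(env: Env = process.env, cwd: string = process.cwd()): string {
  return env.GOOGLE_SERVICE_KEY_FILE || path.join(cwd, 'api', 'data', 'auth.json');
}

/**
 * Hypothetical loader: parses the key file as JSON, returns null when the
 * file does not exist, and rethrows any other error (bad JSON, permissions).
 */
async function tryLoadServiceKey(filePath: string): Promise<Record<string, unknown> | null> {
  try {
    const raw = await fs.readFile(filePath, 'utf8');
    return JSON.parse(raw) as Record<string, unknown>;
  } catch (err) {
    if ((err as { code?: string }).code === 'ENOENT') {
      return null;
    }
    throw err;
  }
}
```

The fallback only points at the intended file when the process starts in the directory that contains `api/`. Setting `GOOGLE_SERVICE_KEY_FILE` explicitly avoids depending on the working directory.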
--- a/packages/api/src/tools/toolkits/gemini.ts
+++ b/packages/api/src/tools/toolkits/gemini.ts
@@ -0,0 +1,100 @@
+import { z } from 'zod';
+
+/** Default description for Gemini image generation tool */
+const DEFAULT_GEMINI_IMAGE_GEN_DESCRIPTION =
+  `Generates high-quality, original images based on text prompts, with optional image context.
+
+When to use \`gemini_image_gen\`:
+- To create entirely new images from detailed text descriptions
+- To generate images using existing images as context or inspiration
+- When the user requests image generation, creation, or asks to "generate an image"
+- When the user asks to "edit", "modify", "change", or "swap" elements in an image (generates new image with changes)
+
+When NOT to use \`gemini_image_gen\`:
+- For uploading or saving existing images without modification
+
+Generated image IDs will be returned in the response, so you can refer to them in future requests.` as const;
+
+const getGeminiImageGenDescription = () => {
+  return process.env.GEMINI_IMAGE_GEN_DESCRIPTION || DEFAULT_GEMINI_IMAGE_GEN_DESCRIPTION;
+};
+
+/** Default prompt description for Gemini image generation */
+const DEFAULT_GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION =
+  `A detailed text description of the desired image, up to 32000 characters. For "editing" requests, describe the changes you want to make to the referenced image. Be specific about composition, style, lighting, and subject matter.` as const;
+
+const getGeminiImageGenPromptDescription = () => {
+  return (
+    process.env.GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION || DEFAULT_GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION
+  );
+};
+
+/** Default image IDs description */
+const DEFAULT_GEMINI_IMAGE_IDS_DESCRIPTION = `
+Optional array of image IDs to use as visual context for generation.
+
+Guidelines:
+- For "editing" requests: ALWAYS include the image ID being "edited"
+- For new generation with context: Include any relevant reference image IDs
+- If the user's request references any prior images, include their image IDs in this array
+- These images will be used as visual context/inspiration for the new generation
+- Never invent or hallucinate IDs; only use IDs that are visible in the conversation
+- If no images are relevant, omit this field entirely
+`.trim();
+
+const getGeminiImageIdsDescription = () => {
+  return process.env.GEMINI_IMAGE_IDS_DESCRIPTION || DEFAULT_GEMINI_IMAGE_IDS_DESCRIPTION;
+};
+
+export const geminiToolkit = {
+  gemini_image_gen: {
+    name: 'gemini_image_gen' as const,
+    description: getGeminiImageGenDescription(),
+    description_for_model: `Use this tool to generate images from text descriptions using Vertex AI Gemini.
+1. Prompts should be detailed and specific for best results.
+2. One image per function call. Create only 1 image per request.
+3. IMPORTANT: When user asks to "edit", "modify", "change", or "swap" elements in an existing image:
+   - ALWAYS include the original image ID in the image_ids array
+   - Describe the desired changes clearly in the prompt
+   - The tool will generate a new image based on the original image context + your prompt
+4. IMPORTANT: For editing requests, use DIRECT editing instructions:
+   - User says "remove the gun" → prompt should be "remove the gun from this image"
+   - User says "make it blue" → prompt should be "make this image blue"
+   - User says "add sunglasses" → prompt should be "add sunglasses to this image"
+   - DO NOT reconstruct or modify the original prompt - use the user's editing instruction directly
+   - ALWAYS include the image being edited in image_ids array
+5. OPTIONAL: Use image_ids to provide context images that will influence the generation:
+   - Include any relevant image IDs from the conversation in the image_ids array
+   - These images will be used as visual context/inspiration for the new generation
+   - For "editing" requests, always include the image being "edited"
+6. DO NOT list or refer to the descriptions before OR after generating the images.
+7. Always mention the image type (photo, oil painting, watercolor painting, illustration, cartoon, drawing, vector, render, etc.) at the beginning of the prompt.
+8. Use aspectRatio to control the shape of the image:
+   - 16:9 or 3:2 for landscape/wide images
+   - 9:16 or 2:3 for portrait/tall images
+   - 21:9 for ultra-wide/cinematic images
+   - 1:1 for square images (default)
+9. Use imageSize to control the resolution: 1K (standard), 2K (high), 4K (maximum quality).
+
+The prompt should be a detailed paragraph describing every part of the image in concrete, objective detail.`,
+    schema: z.object({
+      prompt: z.string().max(32000).describe(getGeminiImageGenPromptDescription()),
+      image_ids: z.array(z.string()).optional().describe(getGeminiImageIdsDescription()),
+      aspectRatio: z
+        .enum(['1:1', '2:3', '3:2', '3:4', '4:3', '4:5', '5:4', '9:16', '16:9', '21:9'])
+        .optional()
+        .describe(
+          'The aspect ratio of the generated image. Use 16:9 or 3:2 for landscape, 9:16 or 2:3 for portrait, 21:9 for ultra-wide/cinematic, 1:1 for square. Defaults to 1:1 if not specified.',
+        ),
+      imageSize: z
+        .enum(['1K', '2K', '4K'])
+        .optional()
+        .describe(
+          'The resolution of the generated image. Use 1K for standard, 2K for high, 4K for maximum quality. Defaults to 1K if not specified.',
+        ),
+    }),
+    responseFormat: 'content_and_artifact' as const,
+  },
+} as const;
+
+export type GeminiToolkit = typeof geminiToolkit;
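Each description getter reads an environment variable with `||`, so an unset or empty variable falls back to the default text. `geminiToolkit` calls the getters inside an object literal, and the schema calls them inside `.describe(...)`, so the values are captured once, when the module is first evaluated. Changing `process.env` afterwards does not affect the exported toolkit. `description_for_model` is a fixed string with no override at all. The sketch below contrasts that eager capture with a lazy getter. `envOrDefault`, `makeEagerToolkit`, and `makeLazyToolkit` are illustrative names, not part of the codebase.

```typescript
type Env = Record<string, string | undefined>;

const DEFAULT_DESCRIPTION = 'Generates high-quality, original images based on text prompts.';
const OVERRIDE_VAR = 'GEMINI_IMAGE_GEN_DESCRIPTION';

/** Same semantics as the getters in gemini.ts: an empty string counts as unset. */
function envOrDefault(env: Env, name: string, fallback: string): string {
  return env[name] || fallback;
}

/** Eager: the value is read once, when the object is built (like geminiToolkit). */
function makeEagerToolkit(env: Env) {
  return { description: envOrDefault(env, OVERRIDE_VAR, DEFAULT_DESCRIPTION) };
}

/** Lazy (hypothetical alternative): the getter re-reads env on every access. */
function makeLazyToolkit(env: Env) {
  return {
    get description(): string {
      return envOrDefault(env, OVERRIDE_VAR, DEFAULT_DESCRIPTION);
    },
  };
}
```

In practice, the eager pattern means the override variables must already be set in the environment the server starts with, before this module is imported.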
--- a/packages/api/src/tools/toolkits/imageContext.ts
+++ b/packages/api/src/tools/toolkits/imageContext.ts
@@ -0,0 +1,38 @@
+/**
+ * Builds tool context string for image generation tools based on available image files.
+ * @param params - The parameters for building image context
+ * @param params.imageFiles - Array of image file objects with file_id property
+ * @param params.toolName - The name of the tool (e.g., 'gemini_image_gen', 'image_edit_oai')
+ * @param params.contextDescription - Description of what the images are for (e.g., 'image context', 'image editing')
+ * @returns The tool context string or empty string if no images
+ */
+export function buildImageToolContext({
+  imageFiles,
+  toolName,
+  contextDescription = 'image context',
+}: {
+  imageFiles: Array<{ file_id: string }>;
+  toolName: string;
+  contextDescription?: string;
+}): string {
+  if (!imageFiles || imageFiles.length === 0) {
+    return '';
+  }
+
+  let toolContext = '';
+  for (let i = 0; i < imageFiles.length; i++) {
+    const file = imageFiles[i];
+    if (!file) {
+      continue;
+    }
+    if (i === 0) {
+      toolContext = `Image files provided in this request (their image IDs listed in order of appearance) available for ${contextDescription}:`;
+    }
+    toolContext += `\n\t- ${file.file_id}`;
+    if (i === imageFiles.length - 1) {
+      toolContext += `\n\nInclude any you need in the \`image_ids\` array when calling \`${toolName}\` to use them as visual context for generation. You may also include previously referenced or generated image IDs.`;
+    }
+  }
+  return toolContext;
+}
+
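`buildImageToolContext` writes the header only at index 0 and the footer only at the last index. If either entry is missing at runtime (possible only when the array's contents do not match its declared type), the header or footer is dropped. The sketch below produces the same output for well-formed input but filters missing entries first and builds the string with `map`/`join`. `buildImageToolContextSketch` is a hypothetical name, and its text is copied from the function above.

```typescript
interface ImageFile {
  file_id: string;
}

function buildImageToolContextSketch({
  imageFiles,
  toolName,
  contextDescription = 'image context',
}: {
  imageFiles: Array<ImageFile | null | undefined>;
  toolName: string;
  contextDescription?: string;
}): string {
  // Drop missing entries up front so the header and footer are always emitted together.
  const files = (imageFiles ?? []).filter((file): file is ImageFile => file != null);
  if (files.length === 0) {
    return '';
  }
  const header = `Image files provided in this request (their image IDs listed in order of appearance) available for ${contextDescription}:`;
  const list = files.map((file) => `\n\t- ${file.file_id}`).join('');
  const footer = `\n\nInclude any you need in the \`image_ids\` array when calling \`${toolName}\` to use them as visual context for generation. You may also include previously referenced or generated image IDs.`;
  return header + list + footer;
}
```

For `[{ file_id: 'img_a' }, { file_id: 'img_b' }]` and `toolName: 'gemini_image_gen'`, both versions return the same string: the header line, one tab-indented `- img_x` line per file, then the `image_ids` instruction.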
--- a/packages/api/src/tools/toolkits/index.ts
+++ b/packages/api/src/tools/toolkits/index.ts
@@ -1,2 +1,4 @@
+export * from './gemini';
+export * from './imageContext';
 export * from './oai';
 export * from './yt';
--- a/packages/api/src/utils/tokens.ts
+++ b/packages/api/src/utils/tokens.ts
@@ -77,9 +77,11 @@ const googleModels = {
  'gemini-pro-vision': 12288,
  'gemini-exp': 2000000,
  'gemini-3': 1000000, // 1M input tokens, 64k output tokens
+  'gemini-3-pro-image': 1000000,
  'gemini-2.5': 1000000, // 1M input tokens, 64k output tokens
  'gemini-2.5-pro': 1000000,
  'gemini-2.5-flash': 1000000,
+  'gemini-2.5-flash-image': 1000000,
  'gemini-2.5-flash-lite': 1000000,
  'gemini-2.0': 2000000,
  'gemini-2.0-flash': 1000000,
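A versioned name such as `gemini-2.5-flash-image-preview` contains `gemini-2.5`, `gemini-2.5-flash`, and `gemini-2.5-flash-image`, so a substring-based lookup has to prefer the most specific key. The sketch below assumes a longest-substring match over a subset of the `googleModels` entries in this hunk. It is not LibreChat's actual matching function, whose strategy this diff does not show. `findModelKey` and `getContextTokens` are hypothetical names.

```typescript
// Subset of the googleModels context-window map from this hunk.
const googleModels: Record<string, number> = {
  'gemini-3': 1000000,
  'gemini-3-pro-image': 1000000,
  'gemini-2.5': 1000000,
  'gemini-2.5-pro': 1000000,
  'gemini-2.5-flash': 1000000,
  'gemini-2.5-flash-image': 1000000,
  'gemini-2.5-flash-lite': 1000000,
  'gemini-2.0': 2000000,
  'gemini-2.0-flash': 1000000,
};

/** Hypothetical lookup: returns the longest key contained in the model name. */
function findModelKey(
  modelName: string,
  table: Record<string, number> = googleModels,
): string | undefined {
  let best: string | undefined;
  for (const key of Object.keys(table)) {
    if (modelName.includes(key) && (best === undefined || key.length > best.length)) {
      best = key;
    }
  }
  return best;
}

/** Resolves a model name to its context window, or undefined if no key matches. */
function getContextTokens(modelName: string): number | undefined {
  const key = findModelKey(modelName);
  return key === undefined ? undefined : googleModels[key];
}
```

In this hunk both new image keys carry the same 1000000 value as their parent keys (`gemini-3` and `gemini-2.5-flash`). Under this matching assumption, they change a lookup result only if their values are later edited independently.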