| title | description | weight |
|---|---|---|
| 🚅 LiteLLM | Using LibreChat with LiteLLM Proxy | -7 |
# Using LibreChat with LiteLLM Proxy
Use LiteLLM Proxy for:

- Calling 100+ LLMs (Huggingface, Bedrock, TogetherAI, etc.) in the OpenAI ChatCompletions & Completions format
- Load balancing between multiple models and deployments of the same model (the LiteLLM proxy can handle 1k+ requests/second during load tests)
- Authentication & spend tracking via virtual keys
## Start LiteLLM Proxy Server

### Install litellm

```bash
pip install litellm
```
### Create a config.yaml for the litellm proxy

More information on LiteLLM configurations here: [docs.litellm.ai/docs/simple_proxy](https://docs.litellm.ai/docs/simple_proxy)
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-eu
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key:
      rpm: 6      # Rate limit for this deployment: in requests per minute (rpm)
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key:
      rpm: 6
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-large
      api_base: https://openai-france-1234.openai.azure.com/
      api_key:
      rpm: 1440
```
### Start the proxy

```bash
litellm --config /path/to/config.yaml
#INFO: Proxy running on http://0.0.0.0:8000
```
## Use LiteLLM Proxy Server with LibreChat
### 1. Clone the repo

```bash
git clone https://github.com/danny-avila/LibreChat.git
```
### 2. Modify LibreChat's docker-compose.yml

```
OPENAI_REVERSE_PROXY=http://host.docker.internal:8000/v1/chat/completions
```
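For reference, a minimal sketch of where this variable can go in `docker-compose.yml`, assuming the stock compose file where the LibreChat API service is named `api`:

```yaml
services:
  api:
    environment:
      # Route OpenAI chat requests through the LiteLLM proxy started above
      - OPENAI_REVERSE_PROXY=http://host.docker.internal:8000/v1/chat/completions
```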
**Important:** As of v0.6.6, it's recommended that you use the `librechat.yaml` configuration file (guide here) to add reverse proxies as separate endpoints.
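As an illustration of that recommended approach, here is a minimal sketch of a custom endpoint entry in `librechat.yaml` pointing at the LiteLLM proxy above; the field names and accepted values are described in the custom config guide, so treat this as a starting point rather than a drop-in file:

```yaml
version: 1.0.0
cache: true
endpoints:
  custom:
    - name: "LiteLLM"
      # Key configured for the LiteLLM proxy, or "user_provided" to ask each user for their own
      apiKey: "sk-1234"
      baseURL: "http://host.docker.internal:8000/v1"
      models:
        default: ["gpt-3.5-turbo"]
        fetch: true  # fetch the model list from the proxy
      titleConvo: true
      titleModel: "gpt-3.5-turbo"
      modelDisplayLabel: "LiteLLM"
```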
### 3. Save a fake OpenAI key in LibreChat's .env

Copy LibreChat's `.env.example` to `.env` and overwrite the default `OPENAI_API_KEY` (by default it requires the user to pass a key).

```bash
OPENAI_API_KEY=sk-1234
```
### 4. Run LibreChat

```bash
docker compose up
```
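If you go with the `librechat.yaml` approach from step 2, the file also needs to be visible inside the container. Below is a minimal sketch of a `docker-compose.override.yml` that bind-mounts it, again assuming the API service is named `api`:

```yaml
services:
  api:
    volumes:
      # Make the custom endpoint config available to the container
      - ./librechat.yaml:/app/librechat.yaml
```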
## Why use LiteLLM?
- Access to Multiple LLMs: It allows calling over 100 LLMs from platforms like Huggingface, Bedrock, TogetherAI, etc., using OpenAI's ChatCompletions and Completions format.
- Load Balancing: Capable of handling over 1,000 requests per second during load tests, it balances load across various models and deployments.
- Authentication & Spend Tracking: The server supports virtual keys for authentication and tracks spending.
Key components and features include:
- Installation: Simple installation via pip (`pip install litellm`).
- Testing: Features to verify that requests are routed to the intended models.
- Server Endpoints: Offers multiple endpoints for chat completions, completions, embeddings, model lists, and key generation.
- Supported LLMs: Supports a wide range of LLMs, including AWS Bedrock, Azure OpenAI, Huggingface, AWS Sagemaker, Anthropic, and more.
- Proxy Configurations: Allows setting various parameters like model list, server settings, environment variables, and more.
- Multiple Models Management: Configurations can be set up for managing multiple models with fallbacks, cooldowns, retries, and timeouts.
- Embedding Models Support: Special configurations for embedding models.
- Authentication Management: Features for managing authentication through virtual keys, model upgrades/downgrades, and tracking spend.
- Custom Configurations: Supports setting model-specific parameters, caching responses, and custom prompt templates.
- Debugging Tools: Options for debugging and logging proxy input/output.
- Deployment and Performance: Information on deploying LiteLLM Proxy and its performance metrics.
- Proxy CLI Arguments: A wide range of command-line arguments for customization.
Overall, LiteLLM Server offers a comprehensive suite of tools for managing, deploying, and interacting with a variety of LLMs, making it a versatile choice for large-scale AI applications.