2023-12-22 08:36:42 -05:00
---
title: 🚅 LiteLLM
2023-12-28 17:10:06 -05:00
description: Using LibreChat with LiteLLM Proxy
2023-12-22 08:36:42 -05:00
weight: -7
---
2023-11-30 10:59:16 -08:00
# Using LibreChat with LiteLLM Proxy
2023-12-28 17:10:06 -05:00
Use ** [LiteLLM Proxy ](https://docs.litellm.ai/docs/simple_proxy )** for:
2023-11-30 10:59:16 -08:00
* Calling 100+ LLMs Huggingface/Bedrock/TogetherAI/etc. in the OpenAI ChatCompletions & Completions format
* Load balancing - between Multiple Models + Deployments of the same model LiteLLM proxy can handle 1k+ requests/second during load tests
* Authentication & Spend Tracking Virtual Keys
## Start LiteLLM Proxy Server
### Pip install litellm
```shell
pip install litellm
```
### Create a config.yaml for litellm proxy
2023-12-28 17:10:06 -05:00
More information on LiteLLM configurations here: ** [docs.litellm.ai/docs/simple_proxy ](https://docs.litellm.ai/docs/simple_proxy )**
2023-11-30 10:59:16 -08:00
```yaml
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-small-eu
api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
api_key:
rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-small-ca
api_base: https://my-endpoint-canada-berri992.openai.azure.com/
api_key:
rpm: 6
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-large
api_base: https://openai-france-1234.openai.azure.com/
api_key:
rpm: 1440
```
### Start the proxy
```shell
litellm --config /path/to/config.yaml
#INFO: Proxy running on http://0.0.0.0:8000
```
## Use LiteLLM Proxy Server with LibreChat
#### 1. Clone the repo
```shell
git clone https://github.com/danny-avila/LibreChat.git
```
#### 2. Modify Librechat's `docker-compose.yml`
```yaml
OPENAI_REVERSE_PROXY=http://host.docker.internal:8000/v1/chat/completions
```
💫 feat: Config File & Custom Endpoints (#1474)
* WIP(backend/api): custom endpoint
* WIP(frontend/client): custom endpoint
* chore: adjust typedefs for configs
* refactor: use data-provider for cache keys and rename enums and custom endpoint for better clarity and compatibility
* feat: loadYaml utility
* refactor: rename back to from and proof-of-concept for creating schemas from user-defined defaults
* refactor: remove custom endpoint from default endpointsConfig as it will be exclusively managed by yaml config
* refactor(EndpointController): rename variables for clarity
* feat: initial load custom config
* feat(server/utils): add simple `isUserProvided` helper
* chore(types): update TConfig type
* refactor: remove custom endpoint handling from model services as will be handled by config, modularize fetching of models
* feat: loadCustomConfig, loadConfigEndpoints, loadConfigModels
* chore: reorganize server init imports, invoke loadCustomConfig
* refactor(loadConfigEndpoints/Models): return each custom endpoint as standalone endpoint
* refactor(Endpoint/ModelController): spread config values after default (temporary)
* chore(client): fix type issues
* WIP: first pass for multiple custom endpoints
- add endpointType to Conversation schema
- add update zod schemas for both convo/presets to allow non-EModelEndpoint value as endpoint (also using type assertion)
- use `endpointType` value as `endpoint` where mapping to type is necessary using this field
- use custom defined `endpoint` value and not type for mapping to modelsConfig
- misc: add return type to `getDefaultEndpoint`
- in `useNewConvo`, add the endpointType if it wasn't already added to conversation
- EndpointsMenu: use user-defined endpoint name as Title in menu
- TODO: custom icon via custom config, change unknown to robot icon
* refactor(parseConvo): pass args as an object and change where used accordingly; chore: comment out 'create schema' code
* chore: remove unused availableModels field in TConfig type
* refactor(parseCompactConvo): pass args as an object and change where used accordingly
* feat: chat through custom endpoint
* chore(message/convoSchemas): avoid saving empty arrays
* fix(BaseClient/saveMessageToDatabase): save endpointType
* refactor(ChatRoute): show Spinner if endpointsQuery or modelsQuery are still loading, which is apparent with slow fetching of models/remote config on first serve
* fix(useConversation): assign endpointType if it's missing
* fix(SaveAsPreset): pass real endpoint and endpointType when saving Preset)
* chore: recorganize types order for TConfig, add `iconURL`
* feat: custom endpoint icon support:
- use UnknownIcon in all icon contexts
- add mistral and openrouter as known endpoints, and add their icons
- iconURL support
* fix(presetSchema): move endpointType to default schema definitions shared between convoSchema and defaults
* refactor(Settings/OpenAI): remove legacy `isOpenAI` flag
* fix(OpenAIClient): do not invoke abortCompletion on completion error
* feat: add responseSender/label support for custom endpoints:
- use defaultModelLabel field in endpointOption
- add model defaults for custom endpoints in `getResponseSender`
- add `useGetSender` hook which uses EndpointsQuery to determine `defaultModelLabel`
- include defaultModelLabel from endpointConfig in custom endpoint client options
- pass `endpointType` to `getResponseSender`
* feat(OpenAIClient): use custom options from config file
* refactor: rename `defaultModelLabel` to `modelDisplayLabel`
* refactor(data-provider): separate concerns from `schemas` into `parsers`, `config`, and fix imports elsewhere
* feat: `iconURL` and extract environment variables from custom endpoint config values
* feat: custom config validation via zod schema, rename and move to `./projectRoot/librechat.yaml`
* docs: custom config docs and examples
* fix(OpenAIClient/mistral): mistral does not allow singular system message, also add `useChatCompletion` flag to use openai-node for title completions
* fix(custom/initializeClient): extract env var and use `isUserProvided` function
* Update librechat.example.yaml
* feat(InputWithLabel): add className props, and forwardRef
* fix(streamResponse): handle error edge case where either messages or convos query throws an error
* fix(useSSE): handle errorHandler edge cases where error response is and is not properly formatted from API, especially when a conversationId is not yet provided, which ensures stream is properly closed on error
* feat: user_provided keys for custom endpoints
* fix(config/endpointSchema): do not allow default endpoint values in custom endpoint `name`
* feat(loadConfigModels): extract env variables and optimize fetching models
* feat: support custom endpoint iconURL for messages and Nav
* feat(OpenAIClient): add/dropParams support
* docs: update docs with default params, add/dropParams, and notes to use config file instead of `OPENAI_REVERSE_PROXY`
* docs: update docs with additional notes
* feat(maxTokensMap): add mistral models (32k context)
* docs: update openrouter notes
* Update ai_setup.md
* docs(custom_config): add table of contents and fix note about custom name
* docs(custom_config): reorder ToC
* Update custom_config.md
* Add note about `max_tokens` field in custom_config.md
2024-01-03 09:22:48 -05:00
**Important**: As of v0.6.6, it's recommend you use the `librechat.yaml` [Configuration file (guide here) ](./custom_config.md ) to add Reverse Proxies as separate endpoints.
2023-11-30 10:59:16 -08:00
#### 3. Save fake OpenAI key in Librechat's `.env`
Copy Librechat's `.env.example` to `.env` and overwrite the default OPENAI_API_KEY (by default it requires the user to pass a key).
```env
OPENAI_API_KEY=sk-1234
```
#### 4. Run LibreChat:
```shell
docker compose up
```
---
### Why use LiteLLM?
1. **Access to Multiple LLMs** : It allows calling over 100 LLMs from platforms like Huggingface, Bedrock, TogetherAI, etc., using OpenAI's ChatCompletions and Completions format.
2. **Load Balancing** : Capable of handling over 1,000 requests per second during load tests, it balances load across various models and deployments.
3. **Authentication & Spend Tracking** : The server supports virtual keys for authentication and tracks spending.
Key components and features include:
- **Installation**: Easy installation.
- **Testing**: Testing features to route requests to specific models.
- **Server Endpoints**: Offers multiple endpoints for chat completions, completions, embeddings, model lists, and key generation.
- **Supported LLMs**: Supports a wide range of LLMs, including AWS Bedrock, Azure OpenAI, Huggingface, AWS Sagemaker, Anthropic, and more.
- **Proxy Configurations**: Allows setting various parameters like model list, server settings, environment variables, and more.
- **Multiple Models Management**: Configurations can be set up for managing multiple models with fallbacks, cooldowns, retries, and timeouts.
- **Embedding Models Support**: Special configurations for embedding models.
- **Authentication Management**: Features for managing authentication through virtual keys, model upgrades/downgrades, and tracking spend.
- **Custom Configurations**: Supports setting model-specific parameters, caching responses, and custom prompt templates.
- **Debugging Tools**: Options for debugging and logging proxy input/output.
- **Deployment and Performance**: Information on deploying LiteLLM Proxy and its performance metrics.
- **Proxy CLI Arguments**: A wide range of command-line arguments for customization.
Overall, LiteLLM Server offers a comprehensive suite of tools for managing, deploying, and interacting with a variety of LLMs, making it a versatile choice for large-scale AI applications.