---
title: 🚅 LiteLLM
description: Using LibreChat with LiteLLM Proxy
weight: -7
---
# Using LibreChat with LiteLLM Proxy
Use **[LiteLLM Proxy](https://docs.litellm.ai/docs/simple_proxy)** for:

* Calling 100+ LLMs (Hugging Face, Bedrock, TogetherAI, and more) in the OpenAI ChatCompletions & Completions format
* Load balancing across multiple models and multiple deployments of the same model; LiteLLM Proxy can handle 1k+ requests/second during load tests
* Authentication & spend tracking via virtual keys
## Start LiteLLM Proxy Server
### 1. Uncomment desired sections in docker-compose.override.yml
The override file contains sections for the LiteLLM features described below.
Minimal working `docker-compose.override.yml` example:
```yaml
litellm:
image: ghcr.io/berriai/litellm:main-latest
volumes:
- ./litellm/litellm-config.yaml:/app/config.yaml
    # NOTE: For Google - the "GOOGLE_APPLICATION_CREDENTIALS" environment variable and volume mount below are required
    # This also means you need to add the `application_default_credentials.json` file within ~/litellm
- ./litellm/application_default_credentials.json:/app/application_default_credentials.json
ports:
- "4000:8000"
command: [ "--config", "/app/config.yaml", "--port", "8000", "--num_workers", "8" ]
  # NOTE: For Google - see above about the required "GOOGLE_APPLICATION_CREDENTIALS" environment variable and volume mount
environment:
GOOGLE_APPLICATION_CREDENTIALS: /app/application_default_credentials.json
```
#### Caching with Redis
LiteLLM supports in-memory, Redis, and S3 caching. Note: caching currently only works with exact matching.
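A minimal sketch of what the Redis pieces of the override file could look like, assuming a compose service named `redis` and the `REDIS_HOST`/`REDIS_PORT` environment variables described in LiteLLM's caching docs (verify the exact names against your version):

```yaml
# Hypothetical Redis additions to docker-compose.override.yml
redis:
  image: redis:7-alpine

litellm:
  environment:
    REDIS_HOST: redis    # the compose service name above
    REDIS_PORT: "6379"   # default Redis port
```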
#### Performance Monitoring with Langfuse
LiteLLM supports various logging and observability options. The settings below enable Langfuse, which provides a `cache_hit` tag showing which conversations used the cache.
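A sketch of the corresponding environment variables on the `litellm` service, assuming you already have a Langfuse project; confirm the variable names against the LiteLLM logging docs for your version:

```yaml
# Hypothetical Langfuse additions to docker-compose.override.yml
litellm:
  environment:
    LANGFUSE_PUBLIC_KEY: pk-lf-...   # from your Langfuse project settings
    LANGFUSE_SECRET_KEY: sk-lf-...
    LANGFUSE_HOST: https://cloud.langfuse.com
```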
### 2. Create a Config for LiteLLM Proxy
LiteLLM requires a configuration file in addition to the override file. Within LibreChat, this will be `litellm/litellm-config.yaml`. The file below has options for proxying to various providers, load balancing, Redis caching, and Langfuse monitoring. Review the LiteLLM documentation for other configuration options.
More information on LiteLLM configurations here: **[docs.litellm.ai/docs/simple_proxy](https://docs.litellm.ai/docs/simple_proxy)**
#### Working Example Incorporating OpenAI, Azure OpenAI, AWS Bedrock, and GCP
Please note that `...` stands for a secret or a value you should not share (an API key, a custom tenant endpoint, etc.).
You can also use environment variables for these, e.g. `api_key: "os.environ/AZURE_API_KEY" # does os.getenv("AZURE_API_KEY")`.
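For instance, a hypothetical Azure entry that reads both values from the environment rather than hardcoding them (`AZURE_API_BASE` and `AZURE_API_KEY` are assumed to be set in the container):

```yaml
model_list:
  - model_name: azure-gpt-4-example   # hypothetical deployment name
    litellm_params:
      model: azure/gpt-4
      api_base: "os.environ/AZURE_API_BASE"   # does os.getenv("AZURE_API_BASE")
      api_key: "os.environ/AZURE_API_KEY"     # does os.getenv("AZURE_API_KEY")
```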
```yaml
model_list:
# https://litellm.vercel.app/docs/proxy/quick_start
- model_name: claude-3-haiku
litellm_params:
model: bedrock/anthropic.claude-3-haiku-20240307-v1:0
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: claude-3-sonnet
litellm_params:
model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: claude-3-opus
litellm_params:
model: bedrock/anthropic.claude-3-opus-20240229-v1:0
aws_region_name: us-west-2
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: claude-v2
litellm_params:
model: bedrock/anthropic.claude-v2:1
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: claude-instant
litellm_params:
model: bedrock/anthropic.claude-instant-v1
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: llama2-13b
litellm_params:
model: bedrock/meta.llama2-13b-chat-v1
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: llama2-70b
litellm_params:
model: bedrock/meta.llama2-70b-chat-v1
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: llama3-8b
litellm_params:
model: bedrock/meta.llama3-8b-instruct-v1:0
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: llama3-70b
litellm_params:
model: bedrock/meta.llama3-70b-instruct-v1:0
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: mistral-7b-instruct
litellm_params:
model: bedrock/mistral.mistral-7b-instruct-v0:2
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: mixtral-8x7b-instruct
litellm_params:
model: bedrock/mistral.mixtral-8x7b-instruct-v0:1
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: mixtral-large
litellm_params:
model: bedrock/mistral.mistral-large-2402-v1:0
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: cohere-command-v14
litellm_params:
model: bedrock/cohere.command-text-v14
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: cohere-command-light-v14
litellm_params:
model: bedrock/cohere.command-light-text-v14
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: ai21-j2-mid
litellm_params:
model: bedrock/ai21.j2-mid-v1
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: ai21-j2-ultra
litellm_params:
model: bedrock/ai21.j2-ultra-v1
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: amazon-titan-lite
litellm_params:
model: bedrock/amazon.titan-text-lite-v1
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
- model_name: amazon-titan-express
litellm_params:
model: bedrock/amazon.titan-text-express-v1
aws_region_name: us-east-1
aws_access_key_id: A...
aws_secret_access_key: ...
  - model_name: azure-gpt-4-turbo-preview
    litellm_params:
      model: azure/gpt-4-turbo-preview
      api_base: https://tenant-name.openai.azure.com/
      api_key: ...
  - model_name: azure-gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-35-turbo
      api_base: https://tenant-name.openai.azure.com/
      api_key: ...
  - model_name: azure-gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://tenant-name.openai.azure.com/
      api_key: ...
  - model_name: azure-gpt-3.5-turbo-16k
    litellm_params:
      model: azure/gpt-35-turbo-16k
      api_base: https://tenant-name.openai.azure.com/
      api_key: ...
  - model_name: azure-gpt-4-32k
    litellm_params:
      model: azure/gpt-4-32k
      api_base: https://tenant-name.openai.azure.com/
      api_key: ...
  - model_name: gpt-4-turbo
    litellm_params:
      model: gpt-4-turbo
      api_key: ...
  - model_name: old-gpt-4-turbo-preview
    litellm_params:
      model: gpt-4-turbo-preview
      api_key: ...
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: ...
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: ...
  - model_name: gpt-3.5-turbo-16k
    litellm_params:
      model: gpt-3.5-turbo-16k
      api_key: ...
  - model_name: gpt-4-32k
    litellm_params:
      model: gpt-4-32k
      api_key: ...
  - model_name: gpt-4-vision-preview
    litellm_params:
      model: gpt-4-vision-preview
      api_key: ...
  # NOTE: For Google - see above about the required "GOOGLE_APPLICATION_CREDENTIALS" environment variable and volume mount
  - model_name: google-chat-bison
    litellm_params:
      model: vertex_ai/chat-bison
      vertex_project: gcp-proj-name
      vertex_location: us-central1
  - model_name: google-chat-bison-32k
    litellm_params:
      model: vertex_ai/chat-bison-32k
      vertex_project: gcp-proj-name
      vertex_location: us-central1
  - model_name: google-gemini-pro-1.0
    litellm_params:
      model: vertex_ai/gemini-pro
      vertex_project: gcp-proj-name
      vertex_location: us-central1
  - model_name: google-gemini-pro-1.5-preview
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-preview-0409
      vertex_project: gcp-proj-name
      vertex_location: us-central1

# NOTE: Consider commenting out "success_callback", "cache", and "cache_params" (the two lines under it) when you first start, until everything works!
litellm_settings:
success_callback: ["langfuse"]
cache: True
cache_params:
type: "redis"
supported_call_types: ["acompletion", "completion", "embedding", "aembedding"]
general_settings:
master_key: sk_live_SetToRandomValue
```
#### Example of a Few Different Options (e.g. rpm, stream, Ollama)
```yaml
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-small-eu
api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
api_key:
rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-small-ca
api_base: https://my-endpoint-canada-berri992.openai.azure.com/
api_key:
rpm: 6
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-large
api_base: https://openai-france-1234.openai.azure.com/
api_key:
rpm: 1440
  - model_name: mixtral
    litellm_params:
      model: openai/mixtral:8x7b-instruct-v0.1-q5_K_M # use openai/* for Ollama's OpenAI API compatibility
      api_base: http://ollama:11434/v1
      stream: True
  - model_name: mistral
    litellm_params:
      model: openai/mistral # use openai/* for Ollama's OpenAI API compatibility
      api_base: http://ollama:11434/v1
      stream: True
litellm_settings:
success_callback: ["langfuse"]
cache: True
cache_params:
type: "redis"
supported_call_types: ["acompletion", "completion", "embedding", "aembedding"]
general_settings:
master_key: sk_live_SetToRandomValue
```
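Because the three `gpt-3.5-turbo` entries above share one `model_name`, LiteLLM load balances across them automatically, respecting the `rpm` limits. For more control, the config also accepts an optional `router_settings` block - a sketch, assuming the strategy names in the LiteLLM router docs apply to your version:

```yaml
# Optional: tune how LiteLLM balances deployments that share a model_name
router_settings:
  routing_strategy: usage-based-routing   # alternatives include simple-shuffle and least-busy
  num_retries: 2                          # retry a failed request on another deployment
```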
### 3. Configure LibreChat
Use the `librechat.yaml` [configuration file (guide here)](./ai_endpoints.md) to add reverse proxies such as LiteLLM as separate endpoints.
Here is an example config:
```yaml
custom:
- name: "Lite LLM"
    # A placeholder - otherwise it becomes the default (OpenAI) key
# Provide the key instead in each "model" block within "litellm/litellm-config.yaml"
apiKey: "sk-from-config-file"
# See the required changes above in "Start LiteLLM Proxy Server" step.
baseURL: "http://host.docker.internal:4000"
# A "default" model to start new users with. The "fetch" will pull the rest of the available models from LiteLLM
# More or less this is "irrelevant", you can pick any model. Just pick one you have defined in LiteLLM.
models:
default: ["gpt-3.5-turbo"]
fetch: true
titleConvo: true
titleModel: "gpt-3.5-turbo"
summarize: false
summaryModel: "gpt-3.5-turbo"
forcePrompt: false
modelDisplayLabel: "Lite LLM"
```
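If LibreChat runs in the same Docker Compose network as LiteLLM, you can also point `baseURL` at the service directly instead of going through the host. A sketch, assuming the `litellm` service name and container port `8000` from the override example above; note that if `general_settings.master_key` is set in the LiteLLM config, the `apiKey` LibreChat sends will likely need to be that master key or a generated virtual key:

```yaml
custom:
  - name: "Lite LLM"
    apiKey: "sk-from-config-file"   # see the master_key note above
    # Container-to-container: compose service name + container port (not the host-mapped 4000)
    baseURL: "http://litellm:8000"
```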
---
### Why use LiteLLM?
1. **Access to Multiple LLMs**: It allows calling over 100 LLMs from platforms like Hugging Face, Bedrock, TogetherAI, etc., using OpenAI's ChatCompletions and Completions format.
2. **Load Balancing**: Capable of handling over 1,000 requests per second during load tests, it balances load across various models and deployments.
3. **Authentication & Spend Tracking**: The server supports virtual keys for authentication and tracks spending.
Key components and features include:
- **Installation**: Simple installation via pip or Docker.
- **Testing**: Testing features to route requests to specific models.
- **Server Endpoints**: Offers multiple endpoints for chat completions, completions, embeddings, model lists, and key generation.
- **Supported LLMs**: Supports a wide range of LLMs, including AWS Bedrock, Azure OpenAI, Huggingface, AWS Sagemaker, Anthropic, and more.
- **Proxy Configurations**: Allows setting various parameters like model list, server settings, environment variables, and more.
- **Multiple Models Management**: Configurations can be set up for managing multiple models with fallbacks, cooldowns, retries, and timeouts.
- **Embedding Models Support**: Special configurations for embedding models.
- **Authentication Management**: Features for managing authentication through virtual keys, model upgrades/downgrades, and tracking spend.
- **Custom Configurations**: Supports setting model-specific parameters, caching responses, and custom prompt templates.
- **Debugging Tools**: Options for debugging and logging proxy input/output.
- **Deployment and Performance**: Information on deploying LiteLLM Proxy and its performance metrics.
- **Proxy CLI Arguments**: A wide range of command-line arguments for customization.
Overall, LiteLLM Server offers a comprehensive suite of tools for managing, deploying, and interacting with a variety of LLMs, making it a versatile choice for large-scale AI applications.