---
title: 🚅 LiteLLM
description: Using LibreChat with LiteLLM Proxy
weight: -7
---
# Using LibreChat with LiteLLM Proxy
Use **[LiteLLM Proxy](https://docs.litellm.ai/docs/simple_proxy)** for:

* Calling 100+ LLMs (Huggingface, Bedrock, TogetherAI, etc.) in the OpenAI ChatCompletions & Completions format
* Load balancing between multiple models and between deployments of the same model; the LiteLLM proxy can handle 1k+ requests/second during load tests
* Authentication & spend tracking with virtual keys
## Start LiteLLM Proxy Server

### 1. Uncomment desired sections in docker-compose.override.yml

The override file contains sections for the LiteLLM features below.

#### Caching with Redis

LiteLLM supports in-memory, Redis, and S3 caching. Note: caching currently only works with exact prompt matching.

#### Performance Monitoring with Langfuse

LiteLLM supports various logging and observability options. The settings below enable Langfuse, which adds a `cache_hit` tag showing which conversations were served from the cache.
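For reference, below is a minimal sketch of what the uncommented override sections might look like. The service names, image tag, port, and file paths here are illustrative assumptions; prefer the sections that ship in LibreChat's own `docker-compose.override.yml`.

```yaml
# Hypothetical excerpt of docker-compose.override.yml.
# Service names, image tag, port, and paths are assumptions for illustration.
version: '3.4'
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    volumes:
      # Mount the config.yaml created in step 2 into the container
      - ./litellm/litellm-config.yaml:/app/config.yaml
    command: ['--config', '/app/config.yaml', '--port', '8000']
    environment:
      REDIS_HOST: redis        # only needed when Redis caching is enabled
      REDIS_PORT: '6379'
      LANGFUSE_PUBLIC_KEY: 'your-langfuse-public-key'   # only needed when Langfuse logging is enabled
      LANGFUSE_SECRET_KEY: 'your-langfuse-secret-key'
  redis:
    image: redis:7-alpine
```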
### 2. Create a config.yaml for LiteLLM proxy

LiteLLM requires a configuration file in addition to the override file. The example below enables proxying to multiple providers, load balancing, Redis caching, and Langfuse monitoring. Review the documentation for other configuration options.

More information on LiteLLM configurations here: **[docs.litellm.ai/docs/simple_proxy](https://docs.litellm.ai/docs/simple_proxy)**
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-eu
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key:
      rpm: 6      # Rate limit for this deployment: in requests per minute (rpm)
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key:
      rpm: 6
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-large
      api_base: https://openai-france-1234.openai.azure.com/
      api_key:
      rpm: 1440
  - model_name: mixtral
    litellm_params:
      model: ollama/mixtral:8x7b-instruct-v0.1-q5_K_M
      api_base: http://ollama:11434
      stream: True
  - model_name: mistral
    litellm_params:
      model: ollama/mistral
      api_base: http://ollama:11434
      stream: True
litellm_settings:
  success_callback: ["langfuse"]
  cache: True
  cache_params:
    type: "redis"
    supported_call_types: ["acompletion", "completion", "embedding", "aembedding"]
general_settings:
  master_key: sk_live_SetToRandomValue
```
### 3. Configure LibreChat
Use the `librechat.yaml` [configuration file (guide here)](./ai_endpoints.md) to add reverse proxies, such as the LiteLLM proxy, as separate endpoints.
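As a starting point, here is a minimal sketch of a custom endpoint entry for the LiteLLM proxy. It assumes the proxy is reachable from the LibreChat container at `http://litellm:8000/v1` and that the `master_key` from step 2 is used as the API key; adjust the host, port, and model names to match your setup.

```yaml
# Minimal librechat.yaml sketch; the host, port, and model names are assumptions.
version: 1.0.0
endpoints:
  custom:
    - name: 'LiteLLM'
      apiKey: 'sk_live_SetToRandomValue'   # the master_key from the LiteLLM config.yaml
      baseURL: 'http://litellm:8000/v1'    # assumed address of the LiteLLM proxy
      models:
        default: ['gpt-3.5-turbo', 'mixtral', 'mistral']
        fetch: true                        # fetch the model list from the proxy
      titleConvo: true
      titleModel: 'gpt-3.5-turbo'
      modelDisplayLabel: 'LiteLLM'
```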
---
### Why use LiteLLM?

1. **Access to Multiple LLMs**: It allows calling over 100 LLMs from platforms like Huggingface, Bedrock, TogetherAI, etc., using OpenAI's ChatCompletions and Completions format.
2. **Load Balancing**: Capable of handling over 1,000 requests per second during load tests, it balances load across various models and deployments.
3. **Authentication & Spend Tracking**: The server supports virtual keys for authentication and tracks spending.

Key components and features include:

- **Installation**: Quick and easy to install.
- **Testing**: Built-in ways to verify that requests are routed to the intended models.
- **Server Endpoints**: Offers multiple endpoints for chat completions, completions, embeddings, model lists, and key generation.
- **Supported LLMs**: Supports a wide range of LLMs, including AWS Bedrock, Azure OpenAI, Huggingface, AWS Sagemaker, Anthropic, and more.
- **Proxy Configurations**: Allows setting various parameters like model list, server settings, environment variables, and more.
- **Multiple Models Management**: Configurations can be set up for managing multiple models with fallbacks, cooldowns, retries, and timeouts.
- **Embedding Models Support**: Special configurations for embedding models.
- **Authentication Management**: Features for managing authentication through virtual keys, model upgrades/downgrades, and tracking spend.
- **Custom Configurations**: Supports setting model-specific parameters, caching responses, and custom prompt templates.
- **Debugging Tools**: Options for debugging and logging proxy input/output.
- **Deployment and Performance**: Information on deploying LiteLLM Proxy and its performance metrics.
- **Proxy CLI Arguments**: A wide range of command-line arguments for customization.

Overall, LiteLLM Server offers a comprehensive suite of tools for managing, deploying, and interacting with a variety of LLMs, making it a versatile choice for large-scale AI applications.