---
title: 🚅 LiteLLM and Ollama
description: Using LibreChat with LiteLLM Proxy
weight: -7
---
# Using LibreChat with LiteLLM Proxy
Use **[LiteLLM Proxy](https://docs.litellm.ai/docs/simple_proxy)** to:

* Call 100+ LLMs (Hugging Face, Bedrock, TogetherAI, etc.) in the OpenAI ChatCompletions & Completions format
* Load balance across multiple models and multiple deployments of the same model (the proxy can handle 1k+ requests/second during load tests)
* Handle authentication & spend tracking with virtual keys
## Start LiteLLM Proxy Server
### Pip install litellm
```shell
pip install litellm
```
### Create a config.yaml for litellm proxy
More information on LiteLLM configurations here: **[docs.litellm.ai/docs/simple_proxy](https://docs.litellm.ai/docs/simple_proxy)**
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-eu
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key:
      rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key:
      rpm: 6
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-large
      api_base: https://openai-france-1234.openai.azure.com/
      api_key:
      rpm: 1440
```
### Start the proxy
```shell
litellm --config /path/to/config.yaml
#INFO: Proxy running on http://0.0.0.0:8000
```
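To confirm the proxy is routing requests, you can send it a test call in the OpenAI format. This is a minimal sketch, assuming the default port `8000` and the `gpt-3.5-turbo` model group defined in the config above:

```shell
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello, which model am I talking to?"}]
  }'
```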
## Use LiteLLM Proxy Server with LibreChat
#### 1. Clone the repo
```shell
git clone https://github.com/danny-avila/LibreChat.git
```
#### 2. Modify LibreChat's `docker-compose.yml`
Add the following to the LibreChat service's `environment` section (or set it in your `.env` file) so requests are routed through the LiteLLM proxy:
```yaml
OPENAI_REVERSE_PROXY=http://host.docker.internal:8000/v1/chat/completions
```
**Important**: As of v0.6.6, it is recommended that you use the `librechat.yaml` [configuration file (guide here)](./custom_config.md) to add reverse proxies as separate endpoints.
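For example, a minimal `librechat.yaml` entry pointing LibreChat at the LiteLLM proxy could look like the sketch below; the endpoint name, key, and model list are illustrative, so check the linked guide for the full schema:

```yaml
version: 1.0.0  # schema version; see the configuration guide for the current value
cache: true
endpoints:
  custom:
    - name: "LiteLLM"
      # Any placeholder works here unless the proxy enforces virtual keys
      apiKey: "sk-1234"
      baseURL: "http://host.docker.internal:8000/v1"
      models:
        default: ["gpt-3.5-turbo"]
        fetch: true  # fetch the model list from the proxy
      titleConvo: true
      titleModel: "gpt-3.5-turbo"
      modelDisplayLabel: "LiteLLM"
```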
#### 3. Save a fake OpenAI key in LibreChat's `.env`
Copy LibreChat's `.env.example` to `.env` and overwrite the default `OPENAI_API_KEY` value (by default it requires each user to provide their own key):
```env
OPENAI_API_KEY=sk-1234
```
#### 4. Run LibreChat
```shell
docker compose up
```
---
### Why use LiteLLM?
1. **Access to Multiple LLMs**: It allows calling over 100 LLMs from platforms like Huggingface, Bedrock, TogetherAI, etc., using OpenAI's ChatCompletions and Completions format.
2. **Load Balancing**: Capable of handling over 1,000 requests per second during load tests, it balances load across various models and deployments.
3. **Authentication & Spend Tracking**: The server supports virtual keys for authentication and tracks spending.

Key components and features include:

- **Installation**: Easy installation.
- **Testing**: Testing features to route requests to specific models.
- **Server Endpoints**: Offers multiple endpoints for chat completions, completions, embeddings, model lists, and key generation.
- **Supported LLMs**: Supports a wide range of LLMs, including AWS Bedrock, Azure OpenAI, Huggingface, AWS Sagemaker, Anthropic, and more.
- **Proxy Configurations**: Allows setting various parameters like model list, server settings, environment variables, and more.
- **Multiple Models Management**: Configurations can be set up for managing multiple models with fallbacks, cooldowns, retries, and timeouts.
- **Embedding Models Support**: Special configurations for embedding models.
- **Authentication Management**: Features for managing authentication through virtual keys, model upgrades/downgrades, and tracking spend.
- **Custom Configurations**: Supports setting model-specific parameters, caching responses, and custom prompt templates.
- **Debugging Tools**: Options for debugging and logging proxy input/output.
- **Deployment and Performance**: Information on deploying LiteLLM Proxy and its performance metrics.
- **Proxy CLI Arguments**: A wide range of command-line arguments for customization.

Overall, LiteLLM Proxy Server offers a comprehensive suite of tools for managing, deploying, and interacting with a variety of LLMs, making it a versatile choice for large-scale AI applications.
## Ollama
Use [Ollama](https://ollama.ai/) to:

* Run large language models on local hardware
* Host multiple models
* Dynamically load models upon request
### docker-compose.yaml with GPU
```yaml
version: "3.8"
services:
litellm:
image: ghcr.io/berriai/litellm:main-v1.18.8
volumes:
- ./litellm/litellm-config.yaml:/app/config.yaml
command: [ "--config", "/app/config.yaml", "--port", "8000", "--num_workers", "8" ]
ollama:
image: ollama/ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
capabilities: [compute, utility]
ports:
- "11434:11434"
volumes:
- ./ollama:/root/.ollama
```
### Loading Models in Ollama
1. Browse the available models at the [Ollama Library](https://ollama.ai/library)
2. Open a shell in the running container: `docker exec -it ollama /bin/bash`
3. Copy the command shown on the **Tags** tab of the model's library page (it begins with `ollama run`) and run it inside the container (see the example below)
4. Check the model size: models that fit entirely in GPU memory perform best
5. Use `/bye` to exit the interactive prompt
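For example, pulling and testing the Mistral model (used here purely as an illustration) from inside the running container:

```shell
docker exec -it ollama /bin/bash
# Inside the container: downloads the model on first use, then opens an interactive prompt
ollama run mistral
# Type /bye at the prompt to leave the model, then `exit` to leave the container shell
```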
### LiteLLM Ollama Configuration
Add the lines below to the LiteLLM `config.yaml` (under `model_list`) to make the Ollama models available through the proxy:
```yaml
  - model_name: mixtral
    litellm_params:
      model: ollama/mixtral:8x7b-instruct-v0.1-q5_K_M
      api_base: http://ollama:11434
      stream: True
  - model_name: mistral
    litellm_params:
      model: ollama/mistral
      api_base: http://ollama:11434
      stream: True
```
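After restarting the LiteLLM container with the updated config, the Ollama models can be exercised through the proxy like any other model group. A sketch, assuming the proxy's port is published to the host (e.g. by adding a `ports: ["8000:8000"]` mapping to the `litellm` service above):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [{"role": "user", "content": "Hello from Ollama via LiteLLM"}]
  }'
```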