From 2bcfb04a724176f2ab03ae6d4b998d778b41245a Mon Sep 17 00:00:00 2001
From: Ishaan Jaff
Date: Thu, 30 Nov 2023 10:59:16 -0800
Subject: [PATCH] 📚 docs: Add LiteLLM Proxy - Load balance 100+ LLMs & Spend
 Tracking ⚖️🤖📈 (#1249)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* (docs) add instructions on using litellm

* Update litellm.md

---------

Co-authored-by: Danny Avila <110412045+danny-avila@users.noreply.github.com>
---
 README.md               |  1 +
 docs/install/litellm.md | 96 +++++++++++++++++++++++++++++++++++++++++
 mkdocs.yml              |  1 +
 3 files changed, 98 insertions(+)
 create mode 100644 docs/install/litellm.md

diff --git a/README.md b/README.md
index 127447d38..e77d6b86d 100644
--- a/README.md
+++ b/README.md
@@ -73,6 +73,7 @@ Keep up with the latest updates by visiting the releases page - [Releases](https
   * [User Auth System](docs/install/user_auth_system.md)
   * [Online MongoDB Database](docs/install/mongodb.md)
   * [Default Language](docs/install/default_language.md)
+  * [LiteLLM Proxy: Load Balance LLMs + Spend Tracking](docs/install/litellm.md)
diff --git a/mkdocs.yml b/mkdocs.yml
index 211d520e6..1ce87c5ac 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -93,6 +93,7 @@ nav:
       - User Auth System: 'install/user_auth_system.md'
       - Online MongoDB Database: 'install/mongodb.md'
       - Languages: 'install/default_language.md'
+      - LiteLLM Proxy: 'install/litellm.md'
       - Miscellaneous: 'install/misc.md'
   - Features:
     - Plugins:
diff --git a/docs/install/litellm.md b/docs/install/litellm.md
new file mode 100644
index 000000000..dfae409e0
--- /dev/null
+++ b/docs/install/litellm.md
@@ -0,0 +1,96 @@
+# Using LibreChat with LiteLLM Proxy
+Use LiteLLM Proxy to:
+* Call 100+ LLMs (Hugging Face, Bedrock, TogetherAI, and more) in the OpenAI ChatCompletions & Completions format
+* Load balance across multiple models, and across multiple deployments of the same model; the proxy handled 1k+ requests/second during load tests
+* Authenticate requests and track spend with virtual keys
+
+## Start LiteLLM Proxy Server
+### Install litellm
+```shell
+pip install litellm
+```
+
+### Create a config.yaml for litellm proxy
+More information on LiteLLM proxy configurations: https://docs.litellm.ai/docs/simple_proxy#proxy-configs
+
+```yaml
+model_list:
+  - model_name: gpt-3.5-turbo
+    litellm_params:
+      model: azure/gpt-turbo-small-eu
+      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
+      api_key:
+      rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
+  - model_name: gpt-3.5-turbo
+    litellm_params:
+      model: azure/gpt-turbo-small-ca
+      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
+      api_key:
+      rpm: 6
+  - model_name: gpt-3.5-turbo
+    litellm_params:
+      model: azure/gpt-turbo-large
+      api_base: https://openai-france-1234.openai.azure.com/
+      api_key:
+      rpm: 1440
+```
+
+### Start the proxy
+```shell
+litellm --config /path/to/config.yaml
+
+# INFO: Proxy running on http://0.0.0.0:8000
+```
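+
+To sanity-check the proxy before wiring up LibreChat, send it a test request. A minimal sketch, assuming the proxy is running on the default `http://0.0.0.0:8000` with the `gpt-3.5-turbo` model group from the config above:
+
+```shell
+# The proxy speaks the OpenAI ChatCompletions format and will route
+# this request to one of the three gpt-3.5-turbo deployments
+curl http://0.0.0.0:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-3.5-turbo",
+    "messages": [{"role": "user", "content": "Say hello"}]
+  }'
+```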
+
+## Use LiteLLM Proxy Server with LibreChat
+
+#### 1. Clone the repo
+```shell
+git clone https://github.com/danny-avila/LibreChat.git
+```
+
+#### 2. Modify LibreChat's `docker-compose.yml`
+Point LibreChat's OpenAI traffic at the proxy by setting `OPENAI_REVERSE_PROXY` in the `environment` section of the LibreChat service (see the sketch after this step):
+```yaml
+OPENAI_REVERSE_PROXY=http://host.docker.internal:8000/v1/chat/completions
+```
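+
+For reference, a minimal sketch of where that setting can live. The `api` service name matches LibreChat's stock `docker-compose.yml` at the time of writing, but treat the surrounding structure as an assumption and adapt it to your file:
+
+```yaml
+# docker-compose.yml (sketch): route LibreChat's OpenAI requests to LiteLLM
+services:
+  api:
+    environment:
+      - OPENAI_REVERSE_PROXY=http://host.docker.internal:8000/v1/chat/completions
+```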
+
+#### 3. Save a placeholder OpenAI key in LibreChat's `.env`
+
+Copy LibreChat's `.env.example` to `.env` and overwrite the default `OPENAI_API_KEY` (LibreChat requires a key to be set, but the real provider keys live in the LiteLLM config, so a placeholder works):
+```env
+OPENAI_API_KEY=sk-1234
+```
+
+#### 4. Run LibreChat
+```shell
+docker compose up
+```
+
+---
+
+### Why use LiteLLM?
+
+1. **Access to Multiple LLMs**: It allows calling 100+ LLMs from providers such as Hugging Face, Bedrock, and TogetherAI, all through OpenAI's ChatCompletions and Completions format.
+
+2. **Load Balancing**: Capable of handling over 1,000 requests per second during load tests, it balances load across multiple models and deployments.
+
+3. **Authentication & Spend Tracking**: The server supports virtual keys for authentication and tracks spend per key.
+
+Key components and features include:
+
+- **Installation**: A single `pip install`.
+- **Testing**: Built-in options for routing test requests to specific models.
+- **Server Endpoints**: Offers endpoints for chat completions, completions, embeddings, model lists, and key generation.
+- **Supported LLMs**: Supports a wide range of LLMs, including AWS Bedrock, Azure OpenAI, Huggingface, AWS Sagemaker, Anthropic, and more.
+- **Proxy Configurations**: Allows setting parameters such as the model list, server settings, and environment variables.
+- **Multiple Models Management**: Configurations can be set up for managing multiple models with fallbacks, cooldowns, retries, and timeouts.
+- **Embedding Models Support**: Special configurations for embedding models.
+- **Authentication Management**: Manages authentication through virtual keys, model upgrades/downgrades, and spend tracking (see the key-generation sketch at the end of this page).
+- **Custom Configurations**: Supports setting model-specific parameters, caching responses, and custom prompt templates.
+- **Debugging Tools**: Options for debugging and logging proxy input/output.
+- **Deployment and Performance**: Information on deploying LiteLLM Proxy and its performance metrics.
+- **Proxy CLI Arguments**: A wide range of command-line arguments for customization.
+
+Overall, LiteLLM Proxy offers a comprehensive suite of tools for managing, deploying, and interacting with a variety of LLMs, making it a versatile choice for large-scale AI applications.
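+
+### Example: Generating a virtual key
+
+A hedged sketch of minting a virtual key through the proxy, assuming a `master_key` (here `sk-1234`) has been set under `general_settings` in the LiteLLM config; the `/key/generate` endpoint and its fields can vary between LiteLLM versions, so confirm against the LiteLLM docs:
+
+```shell
+# Ask the proxy to mint a virtual key scoped to the gpt-3.5-turbo group;
+# the Bearer token is the proxy's master_key
+curl http://0.0.0.0:8000/key/generate \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{"models": ["gpt-3.5-turbo"], "duration": "30d"}'
+```
+
+Clients then pass the returned key as their `Authorization: Bearer` token, and the proxy tracks spend against it.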