Some cleanup of the LLM contrib README

2026-03-16 21:06:30 +01:00 · 2023-07-15 00:17:54 +02:00
commit f6021cf8c3
parent 64c2da18c4
2 changed files with 18 additions and 18 deletions
--- a/docs/source/Contribs/Contrib-Llm.md
+++ b/docs/source/Contribs/Contrib-Llm.md
@@ -10,24 +10,24 @@ You need two components for this contrib - Evennia, and an LLM webserver that op

 ### LLM Server

-There are many LLM servers, but they can be pretty technical to install and set up. This contrib was tested with [text-generation-webui](https://github.com/oobabooga/text-generation-webui) which has a lot of features, and is also easy to install. Here are the install instructions in brief, see their home page for more details.
+There are many LLM servers, but they can be pretty technical to install and set up. This contrib was tested with [text-generation-webui](https://github.com/oobabooga/text-generation-webui). It has a lot of features while also being easy to install.

 1. [Go to the Installation section](https://github.com/oobabooga/text-generation-webui#installation) and grab the 'one-click installer' for your OS.
 2. Unzip the files in a folder somewhere on your hard drive (you don't have to put it next to your evennia stuff if you don't want to).
 3. In a terminal/console, `cd` into the folder and execute the source file in whatever way it's done for your OS (like `source start_linux.sh` for Linux, or `.\start_windows` for Windows). This is an installer that will fetch and install everything in a conda virtual environment. When asked, make sure to select your GPU (NVIDIA/AMD etc) if you have one, otherwise use CPU.
-4. Once all is loaded, Ctrl-C (or Cmd-C) the server and open the file `webui.py` (it's one of the top files in the archive you unzipped). Find a text string `CMD_FLAGS = ''` and change this to `CMD_FLAGS = '--api'`. Then save and close. This makes the server activate its api automatically.
-4. Now just run that server starting script again. This is what you'll use to start the LLM server henceforth.
-5. Once the server is running, open your browser on http://127.0.0.1:7860 to see the running Text generation web ui running. If you turned on the API, you'll find it's now active on port 5000. This should not collide with default Evennia ports unless you changed something.
+4. Once all is loaded, stop the server with `Ctrl-C` (or `Cmd-C`) and open the file `webui.py` (it's one of the top files in the archive you unzipped). Find the text string `CMD_FLAGS = ''` near the top and change this to `CMD_FLAGS = '--api'`. Then save and close. This makes the server activate its api automatically.
+4. Now just run that server starting script (`start_linux.sh` etc) again. This is what you'll use to start the LLM server henceforth.
+5. Once the server is running, point your browser to http://127.0.0.1:7860 to see the running Text generation web ui running. If you turned on the API, you'll find it's now active on port 5000. This should not collide with default Evennia ports unless you changed something.
 6. At this point you have the server and API, but it's not actually running any Large-Language-Model (LLM) yet. In the web ui, go to the `models` tab and enter a github-style path in the `Download custom model or LoRA` field.  To test so things work, enter `facebook/opt-125m` and download. This is a relatively small model (125 million parameters) so should be possible to run on most machines using only CPU. Update the models in the drop-down on the left and select it, then load it with the `Transformers` loader. It should load pretty quickly. If you want to load this every time, you can select the `Autoload the model` checkbox; otherwise you'll need to select and load the model every time you start the LLM server.
-7. To experiment, you can find thousands of other open-source text-generation LLM models on [huggingface.co/models](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending). Be ware to not download a too huge model; your machine may not be able to load it! If you try large models, _don't_ set the `Autoload the model` checkbox, in case the model crashes your server on startup.
+7. To experiment, you can find thousands of other open-source text-generation LLM models on [huggingface.co/models](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending). Beware to not download a too huge model; your machine may not be able to load it! If you try large models, _don't_ set the `Autoload the model` checkbox, in case the model crashes your server on startup.

-For troubleshooting, you can look at the terminal output of the `text-generation-webui` server; it will show you the requests you do to it and also list any errors.
+For troubleshooting, you can look at the terminal output of the `text-generation-webui` server; it will show you the requests you do to it and also list any errors. See the text-generation-webui homepage for more details.

 ### Evennia config

-To be able to talk to NPCs, import and add the `evennia.contrib.rpg.llm.llm_npc.CmdLLMTalk` command to your Character cmdset in `mygame/commands/default_commands.py` (see the basic tutorials if you are unsure.
+To be able to talk to NPCs, import and add the `evennia.contrib.rpg.llm.llm_npc.CmdLLMTalk` command to your Character cmdset in `mygame/commands/default_commands.py` (see the basic tutorials if you are unsure).

-The default LLM api config should work with the text-generation-webui LLM server running its API on port 5000. You can also customize it via settings (if a setting is not added, the default below is used:
+The default LLM api config should work with the text-generation-webui LLM server running its API on port 5000. You can also customize it via settings (if a setting is not added, the default below is used):

 ```python
    # path to the LLM server
@@ -97,7 +97,7 @@ Running an LLM locally can be _very_ demanding.

 As an example, I tested this on my very beefy work laptop. It has 32GB or RAM, but no gpu. So I ran the example (small 128m parameter) model on cpu. It takes about 3-4 seconds to generate a (frankly very bad) response. So keep that in mind.

-On huggingface you can find listings of the 'best performing' language models right now. This changes all the time. The leading models require 100+ GB RAM. And while it's possible to run on a CPU, ideally you should have a large graphics card (GPU) with a lot of VRAM too.
+On huggingface.co you can find listings of the 'best performing' language models right now. This changes all the time. The leading models require 100+ GB RAM. And while it's possible to run on a CPU, ideally you should have a large graphics card (GPU) with a lot of VRAM too.

 So most likely you'll have to settle on something smaller. Experimenting with different models and also tweaking the prompt is needed.
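
Note on the hardware paragraph in the last hunk: the "100+ GB RAM" figure can be sanity-checked with back-of-the-envelope arithmetic. The following is a minimal sketch, not part of the contrib. It assumes that the memory needed just to hold the weights is roughly parameter count times bytes per parameter (4 bytes for 32-bit floats, 2 for 16-bit), and it ignores activations and loader overhead, so real usage will be higher. The function name and the 70-billion-parameter example are hypothetical, chosen only for illustration.

```python
def estimate_weight_memory_gb(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Rough lower bound on the memory needed to hold model weights.

    Returns gigabytes (10**9 bytes). This counts the weights only; it is an
    assumption-laden estimate, not a measurement of any particular server.
    """
    return num_parameters * bytes_per_parameter / 1e9


if __name__ == "__main__":
    # The small test model from the README has about 125 million parameters.
    small = estimate_weight_memory_gb(125_000_000)
    # Hypothetical large model: 70 billion parameters stored as 16-bit floats.
    large = estimate_weight_memory_gb(70_000_000_000, bytes_per_parameter=2)
    print(f"125M parameters, 32-bit floats: ~{small} GB")
    print(f"70B parameters, 16-bit floats: ~{large} GB")
```

Under these assumptions the small test model fits comfortably in a laptop's RAM, while a model in the tens of billions of parameters already passes 100 GB, which is in line with the README's advice to settle for something smaller.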