Add LLM contrib for having NPCs talk with input from an LLM AI server

This commit is contained in:
Griatch 2023-07-14 21:49:20 +02:00
parent 49bb82f8ff
commit 64c2da18c4
17 changed files with 670 additions and 19 deletions

View file

@ -1,5 +1,10 @@
# Changelog
## Evennia main branch
- Contrib: Large-language-model (LLM) AI integration; allows NPCs to talk using
responses from a neural network server.
## Evennia 2.1.0
July 14, 2023

View file

@ -1,5 +1,10 @@
# Changelog
## Evennia main branch
- Contrib: Large-language-model (LLM) AI integration; allows NPCs to talk using
responses from a neural network server.
## Evennia 2.1.0
July 14, 2023

View file

@ -0,0 +1,133 @@
# Large Language Model ("Chat-bot AI") integration
Contribution by Griatch 2023
This adds an LLMClient that allows Evennia to send prompts to an LLM server (Large Language Model, along the lines of ChatGPT). The example uses a local open-source LLM install. Included is an NPC you can chat with using a new `talk` command. The NPC will respond using the AI responses from the LLM server. All calls are asynchronous, so even if the LLM server is slow, Evennia is not affected.
## Installation
You need two components for this contrib - Evennia itself, and an LLM webserver that runs an LLM AI model and provides an API to it.
### LLM Server
There are many LLM servers, but they can be pretty technical to install and set up. This contrib was tested with [text-generation-webui](https://github.com/oobabooga/text-generation-webui), which has a lot of features and is also easy to install. Here are the install instructions in brief; see their home page for more details.
1. [Go to the Installation section](https://github.com/oobabooga/text-generation-webui#installation) and grab the 'one-click installer' for your OS.
2. Unzip the files in a folder somewhere on your hard drive (you don't have to put it next to your evennia stuff if you don't want to).
3. In a terminal/console, `cd` into the folder and execute the source file in whatever way it's done for your OS (like `source start_linux.sh` for Linux, or `.\start_windows` for Windows). This is an installer that will fetch and install everything in a conda virtual environment. When asked, make sure to select your GPU (NVIDIA/AMD etc) if you have one, otherwise use CPU.
4. Once all is loaded, Ctrl-C (or Cmd-C) the server and open the file `webui.py` (it's one of the top files in the archive you unzipped). Find the text string `CMD_FLAGS = ''` and change this to `CMD_FLAGS = '--api'`. Then save and close. This makes the server activate its API automatically.
5. Now just run that server starting script again. This is what you'll use to start the LLM server henceforth.
6. Once the server is running, open your browser on http://127.0.0.1:7860 to see the Text generation web UI running. If you turned on the API, you'll find it's now active on port 5000. This should not collide with default Evennia ports unless you changed something.
7. At this point you have the server and API, but it's not actually running any Large Language Model (LLM) yet. In the web UI, go to the `models` tab and enter a github-style path in the `Download custom model or LoRA` field. To test that things work, enter `facebook/opt-125m` and download. This is a relatively small model (125 million parameters), so it should be possible to run on most machines using only CPU. Refresh the model list in the drop-down on the left and select it, then load it with the `Transformers` loader. It should load pretty quickly. If you want to load this model every time, you can select the `Autoload the model` checkbox; otherwise you'll need to select and load the model every time you start the LLM server.
8. To experiment, you can find thousands of other open-source text-generation LLM models on [huggingface.co/models](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending). Beware of downloading too huge a model; your machine may not be able to load it! If you try large models, _don't_ set the `Autoload the model` checkbox, in case the model crashes your server on startup.
For troubleshooting, you can look at the terminal output of the `text-generation-webui` server; it will show you the requests you do to it and also list any errors.
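You can also smoke-test the API from plain Python, outside Evennia. This is a minimal sketch assuming the contrib's defaults for text-generation-webui's `/api/v1/generate` endpoint; the helper names (`build_payload`, `generate`) are illustrative, not part of the contrib:

```python
# Quick smoke test of the LLM server API, independent of Evennia.
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/api/v1/generate"


def build_payload(prompt, max_new_tokens=250):
    """Build the JSON body: generation params plus the 'prompt' key."""
    return {"prompt": prompt, "max_new_tokens": max_new_tokens, "temperature": 0.7}


def generate(prompt):
    """POST the prompt and return the generated text (blocking; testing only)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.loads(response.read())
    # text-generation-webui returns {"results": [{"text": "..."}]}
    return data["results"][0]["text"]


if __name__ == "__main__":
    print(generate("Hello! How are you?"))
```

If this script prints a (possibly nonsensical) completion, the server and API are working.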
### Evennia config
To be able to talk to NPCs, import and add the `evennia.contrib.rpg.llm.llm_npc.CmdLLMTalk` command to your Character cmdset in `mygame/commands/default_commands.py` (see the basic tutorials if you are unsure).
The default LLM API config should work with the text-generation-webui LLM server running its API on port 5000. You can also customize it via settings (if a setting is not added, the default below is used):
```python
# path to the LLM server
LLM_HOST = "http://127.0.0.1:5000"
LLM_PATH = "/api/v1/generate"
# if you want to authenticate to some external service, you could
# add an Authorization header here with a token
LLM_HEADERS = {"Content-Type": "application/json"}
# this key will be inserted in the request, with your user input
LLM_PROMPT_KEYNAME = "prompt"
# defaults are set up for text-generation-webui. I have no idea what most of
# these do ^_^; you'll need to read a book on LLMs, or at least dive
# into a bunch of online tutorials.
LLM_REQUEST_BODY = {
    "max_new_tokens": 250,  # set how many tokens are part of a response
    "preset": "None",
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.1,
    "typical_p": 1,
    "epsilon_cutoff": 0,  # In units of 1e-4
    "eta_cutoff": 0,  # In units of 1e-4
    "tfs": 1,
    "top_a": 0,
    "repetition_penalty": 1.18,
    "repetition_penalty_range": 0,
    "top_k": 40,
    "min_length": 0,
    "no_repeat_ngram_size": 0,
    "num_beams": 1,
    "penalty_alpha": 0,
    "length_penalty": 1,
    "early_stopping": False,
    "mirostat_mode": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1,
    "seed": -1,
    "add_bos_token": True,
    "truncation_length": 2048,
    "ban_eos_token": False,
    "skip_special_tokens": True,
    "stopping_strings": [],
}
```
Don't forget to reload Evennia if you make any changes.
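Wiring in the command could look something like this - a sketch assuming the default game template, where the character cmdset class is `CharacterCmdSet` (adjust the module path to wherever your own cmdset lives):

```python
# in the module holding your CharacterCmdSet
from evennia import default_cmds
from evennia.contrib.rpg.llm.llm_npc import CmdLLMTalk


class CharacterCmdSet(default_cmds.CharacterCmdSet):
    """Commands available to all in-game Characters."""

    def at_cmdset_creation(self):
        super().at_cmdset_creation()
        self.add(CmdLLMTalk())  # makes `talk` available to all Characters
```

This only runs inside an Evennia game dir, so treat it as a configuration sketch rather than standalone code.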
## Usage
With the LLM server running and the new `talk` command added, create a new LLM-connected NPC and talk to it in-game:

```
> create/drop girl:evennia.contrib.rpg.llm.LLMNPC
> talk girl Hello!

girl ponders ...
girl says, Hello! How are you?
```
The NPC will show a 'thinking' message if the server responds slower than 2 seconds (by default).
Most likely, your first response will *not* be this nice and short, but will be quite nonsensical, looking like an email. This is because the example model we loaded is not optimized for conversations. But at least you know it works!
## A note on running LLMs locally
Running an LLM locally can be _very_ demanding.
As an example, I tested this on my very beefy work laptop. It has 32GB of RAM but no GPU, so I ran the example (small, 125m-parameter) model on CPU. It takes about 3-4 seconds to generate a (frankly very bad) response. So keep that in mind.
On huggingface you can find listings of the 'best performing' language models right now. This changes all the time. The leading models require 100+ GB RAM. And while it's possible to run on a CPU, ideally you should have a large graphics card (GPU) with a lot of VRAM too.
So most likely you'll have to settle for something smaller. You'll need to experiment with different models and also tweak the prompt.
Also be aware that many open-source models are intended for AI research and licensed for non-commercial use only. So be careful if you want to use this in a commercial game. No doubt there will be a lot of changes in this area over the coming years.
### Why not use an AI cloud service?
You could in principle use this to call out to an external API, like OpenAI (ChatGPT) or Google. Most such cloud-hosted services are commercial (they cost money). But since they have the hardware to run bigger models (or their own, proprietary models), they may give better and faster results.
Calling an external API is not tested, so report any findings. Since the Evennia Server (not the Portal) does the calling, it is recommended to put a proxy between yourself and the internet if you call out like this.
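If you do experiment with an external endpoint, the settings involved would look along these lines. These are hypothetical values and untested; note also that the client parses the response in text-generation-webui's format, so an external service would likely need a compatible wrapper:

```python
# hypothetical settings for an external, authenticated service -- untested
LLM_HOST = "https://llm.example.com"  # placeholder host
LLM_PATH = "/v1/generate"             # placeholder path
LLM_HEADERS = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your-api-token>",  # placeholder token
}
```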
## The LLMNPC class
This is a simple Character class, with a few extra properties:
```python
response_template = "{name} says: {response}"
thinking_timeout = 2 # how long to wait until showing thinking
# random 'thinking echoes' to return while we wait, if the AI is slow
thinking_messages = [
    "{name} thinks about what you said ...",
    "{name} ponders your words ...",
    "{name} ponders ...",
]
```
The character has a new method `at_talked_to`, which makes the connection to the LLM server and responds. It is called by the new `talk` command. Note that all these calls are asynchronous, meaning a slow response will not block Evennia.
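The contrib implements the delayed 'thinking' echo with Twisted's `deferLater`; the same cancel-on-response pattern can be sketched with only the standard library (the names below are illustrative, not the contrib's API):

```python
import threading
import time


def talk(get_response, on_message, thinking_timeout=2.0):
    """Echo a 'thinking' line if the reply takes longer than thinking_timeout.

    get_response: blocking callable standing in for the async LLM request.
    on_message: callable receiving each line to show the player.
    """
    # schedule the thinking echo to fire after the timeout
    timer = threading.Timer(thinking_timeout, on_message, args=("girl ponders ...",))
    timer.start()
    response = get_response()
    timer.cancel()  # response arrived: suppress the thinking echo if still pending
    on_message(f"girl says, {response}")
```

With a fast `get_response` the timer is cancelled before it fires; with a slow one the player first sees the thinking line, then the reply.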
----
<small>This document page is generated from `evennia/contrib/rpg/llm/README.md`. Changes to this
file will be overwritten, so edit that file rather than this one.</small>

View file

@ -7,7 +7,7 @@ in the [Community Contribs & Snippets][forum] forum.
_Contribs_ are optional code snippets and systems contributed by
the Evennia community. They vary in size and complexity and
may be more specific about game types and styles than 'core' Evennia.
This page is auto-generated and summarizes all **48** contribs currently included
This page is auto-generated and summarizes all **49** contribs currently included
with the Evennia distribution.
All contrib categories are imported from `evennia.contrib`, such as
@ -34,11 +34,11 @@ If you want to add a contrib, see [the contrib guidelines](./Contribs-Guidelines
| [components](#components) | [containers](#containers) | [cooldowns](#cooldowns) | [crafting](#crafting) | [custom_gametime](#custom_gametime) |
| [dice](#dice) | [email_login](#email_login) | [evadventure](#evadventure) | [evscaperoom](#evscaperoom) | [extended_room](#extended_room) |
| [fieldfill](#fieldfill) | [gendersub](#gendersub) | [git_integration](#git_integration) | [godotwebsocket](#godotwebsocket) | [health_bar](#health_bar) |
| [ingame_map_display](#ingame_map_display) | [ingame_python](#ingame_python) | [mail](#mail) | [mapbuilder](#mapbuilder) | [menu_login](#menu_login) |
| [mirror](#mirror) | [multidescer](#multidescer) | [mux_comms_cmds](#mux_comms_cmds) | [name_generator](#name_generator) | [puzzles](#puzzles) |
| [random_string_generator](#random_string_generator) | [red_button](#red_button) | [rpsystem](#rpsystem) | [simpledoor](#simpledoor) | [slow_exit](#slow_exit) |
| [talking_npc](#talking_npc) | [traits](#traits) | [tree_select](#tree_select) | [turnbattle](#turnbattle) | [tutorial_world](#tutorial_world) |
| [unixcommand](#unixcommand) | [wilderness](#wilderness) | [xyzgrid](#xyzgrid) |
| [ingame_map_display](#ingame_map_display) | [ingame_python](#ingame_python) | [llm](#llm) | [mail](#mail) | [mapbuilder](#mapbuilder) |
| [menu_login](#menu_login) | [mirror](#mirror) | [multidescer](#multidescer) | [mux_comms_cmds](#mux_comms_cmds) | [name_generator](#name_generator) |
| [puzzles](#puzzles) | [random_string_generator](#random_string_generator) | [red_button](#red_button) | [rpsystem](#rpsystem) | [simpledoor](#simpledoor) |
| [slow_exit](#slow_exit) | [talking_npc](#talking_npc) | [traits](#traits) | [tree_select](#tree_select) | [turnbattle](#turnbattle) |
| [tutorial_world](#tutorial_world) | [unixcommand](#unixcommand) | [wilderness](#wilderness) | [xyzgrid](#xyzgrid) |
@ -552,6 +552,7 @@ Contrib-Buffs.md
Contrib-Character-Creator.md
Contrib-Dice.md
Contrib-Health-Bar.md
Contrib-Llm.md
Contrib-RPSystem.md
Contrib-Traits.md
```
@ -604,6 +605,16 @@ and can be used for any sort of appropriate data besides player health.
### `llm`
_Contribution by Griatch 2023_
This adds an LLMClient that allows Evennia to send prompts to an LLM server (Large Language Model, along the lines of ChatGPT). The example uses a local open-source LLM install. Included is an NPC you can chat with using a new `talk` command. The NPC will respond using the AI responses from the LLM server. All calls are asynchronous, so even if the LLM server is slow, Evennia is not affected.
[Read the documentation](./Contrib-Llm.md) - [Browse the Code](evennia.contrib.rpg.llm)
### `rpsystem`
_Contribution by Griatch, 2015_

View file

@ -0,0 +1,10 @@
```{eval-rst}
evennia.contrib.rpg.llm.llm\_client
==========================================
.. automodule:: evennia.contrib.rpg.llm.llm_client
:members:
:undoc-members:
:show-inheritance:
```

View file

@ -0,0 +1,10 @@
```{eval-rst}
evennia.contrib.rpg.llm.llm\_npc
=======================================
.. automodule:: evennia.contrib.rpg.llm.llm_npc
:members:
:undoc-members:
:show-inheritance:
```

View file

@ -0,0 +1,19 @@
```{eval-rst}
evennia.contrib.rpg.llm
===============================
.. automodule:: evennia.contrib.rpg.llm
:members:
:undoc-members:
:show-inheritance:
.. toctree::
:maxdepth: 6
evennia.contrib.rpg.llm.llm_client
evennia.contrib.rpg.llm.llm_npc
evennia.contrib.rpg.llm.tests
```

View file

@ -0,0 +1,10 @@
```{eval-rst}
evennia.contrib.rpg.llm.tests
====================================
.. automodule:: evennia.contrib.rpg.llm.tests
:members:
:undoc-members:
:show-inheritance:
```

View file

@ -15,6 +15,7 @@ evennia.contrib.rpg
evennia.contrib.rpg.character_creator
evennia.contrib.rpg.dice
evennia.contrib.rpg.health_bar
evennia.contrib.rpg.llm
evennia.contrib.rpg.rpsystem
evennia.contrib.rpg.traits

View file

@ -36,13 +36,12 @@ from weakref import WeakValueDictionary
from django.conf import settings
from django.utils.translation import gettext as _
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks, returnValue
from twisted.internet.task import deferLater
from evennia.commands.command import InterruptCommand
from evennia.utils import logger, utils
from evennia.utils.utils import string_suggestions
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks, returnValue
from twisted.internet.task import deferLater
_IN_GAME_ERRORS = settings.IN_GAME_ERRORS

View file

@ -0,0 +1,128 @@
# Large Language Model ("Chat-bot AI") integration
Contribution by Griatch 2023
This adds an LLMClient that allows Evennia to send prompts to an LLM server (Large Language Model, along the lines of ChatGPT). The example uses a local open-source LLM install. Included is an NPC you can chat with using a new `talk` command. The NPC will respond using the AI responses from the LLM server. All calls are asynchronous, so even if the LLM server is slow, Evennia is not affected.
## Installation
You need two components for this contrib - Evennia itself, and an LLM webserver that runs an LLM AI model and provides an API to it.
### LLM Server
There are many LLM servers, but they can be pretty technical to install and set up. This contrib was tested with [text-generation-webui](https://github.com/oobabooga/text-generation-webui), which has a lot of features and is also easy to install. Here are the install instructions in brief; see their home page for more details.
1. [Go to the Installation section](https://github.com/oobabooga/text-generation-webui#installation) and grab the 'one-click installer' for your OS.
2. Unzip the files in a folder somewhere on your hard drive (you don't have to put it next to your evennia stuff if you don't want to).
3. In a terminal/console, `cd` into the folder and execute the source file in whatever way it's done for your OS (like `source start_linux.sh` for Linux, or `.\start_windows` for Windows). This is an installer that will fetch and install everything in a conda virtual environment. When asked, make sure to select your GPU (NVIDIA/AMD etc) if you have one, otherwise use CPU.
4. Once all is loaded, Ctrl-C (or Cmd-C) the server and open the file `webui.py` (it's one of the top files in the archive you unzipped). Find the text string `CMD_FLAGS = ''` and change this to `CMD_FLAGS = '--api'`. Then save and close. This makes the server activate its API automatically.
5. Now just run that server starting script again. This is what you'll use to start the LLM server henceforth.
6. Once the server is running, open your browser on http://127.0.0.1:7860 to see the Text generation web UI running. If you turned on the API, you'll find it's now active on port 5000. This should not collide with default Evennia ports unless you changed something.
7. At this point you have the server and API, but it's not actually running any Large Language Model (LLM) yet. In the web UI, go to the `models` tab and enter a github-style path in the `Download custom model or LoRA` field. To test that things work, enter `facebook/opt-125m` and download. This is a relatively small model (125 million parameters), so it should be possible to run on most machines using only CPU. Refresh the model list in the drop-down on the left and select it, then load it with the `Transformers` loader. It should load pretty quickly. If you want to load this model every time, you can select the `Autoload the model` checkbox; otherwise you'll need to select and load the model every time you start the LLM server.
8. To experiment, you can find thousands of other open-source text-generation LLM models on [huggingface.co/models](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending). Beware of downloading too huge a model; your machine may not be able to load it! If you try large models, _don't_ set the `Autoload the model` checkbox, in case the model crashes your server on startup.
For troubleshooting, you can look at the terminal output of the `text-generation-webui` server; it will show you the requests you do to it and also list any errors.
### Evennia config
To be able to talk to NPCs, import and add the `evennia.contrib.rpg.llm.llm_npc.CmdLLMTalk` command to your Character cmdset in `mygame/commands/default_commands.py` (see the basic tutorials if you are unsure).
The default LLM API config should work with the text-generation-webui LLM server running its API on port 5000. You can also customize it via settings (if a setting is not added, the default below is used):
```python
# path to the LLM server
LLM_HOST = "http://127.0.0.1:5000"
LLM_PATH = "/api/v1/generate"
# if you want to authenticate to some external service, you could
# add an Authorization header here with a token
LLM_HEADERS = {"Content-Type": "application/json"}
# this key will be inserted in the request, with your user input
LLM_PROMPT_KEYNAME = "prompt"
# defaults are set up for text-generation-webui. I have no idea what most of
# these do ^_^; you'll need to read a book on LLMs, or at least dive
# into a bunch of online tutorials.
LLM_REQUEST_BODY = {
    "max_new_tokens": 250,  # set how many tokens are part of a response
    "preset": "None",
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.1,
    "typical_p": 1,
    "epsilon_cutoff": 0,  # In units of 1e-4
    "eta_cutoff": 0,  # In units of 1e-4
    "tfs": 1,
    "top_a": 0,
    "repetition_penalty": 1.18,
    "repetition_penalty_range": 0,
    "top_k": 40,
    "min_length": 0,
    "no_repeat_ngram_size": 0,
    "num_beams": 1,
    "penalty_alpha": 0,
    "length_penalty": 1,
    "early_stopping": False,
    "mirostat_mode": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1,
    "seed": -1,
    "add_bos_token": True,
    "truncation_length": 2048,
    "ban_eos_token": False,
    "skip_special_tokens": True,
    "stopping_strings": [],
}
```
Don't forget to reload Evennia if you make any changes.
## Usage
With the LLM server running and the new `talk` command added, create a new LLM-connected NPC and talk to it in-game:

```
> create/drop girl:evennia.contrib.rpg.llm.LLMNPC
> talk girl Hello!

girl ponders ...
girl says, Hello! How are you?
```
The NPC will show a 'thinking' message if the server responds slower than 2 seconds (by default).
Most likely, your first response will *not* be this nice and short, but will be quite nonsensical, looking like an email. This is because the example model we loaded is not optimized for conversations. But at least you know it works!
## A note on running LLMs locally
Running an LLM locally can be _very_ demanding.
As an example, I tested this on my very beefy work laptop. It has 32GB of RAM but no GPU, so I ran the example (small, 125m-parameter) model on CPU. It takes about 3-4 seconds to generate a (frankly very bad) response. So keep that in mind.
On huggingface you can find listings of the 'best performing' language models right now. This changes all the time. The leading models require 100+ GB RAM. And while it's possible to run on a CPU, ideally you should have a large graphics card (GPU) with a lot of VRAM too.
So most likely you'll have to settle for something smaller. You'll need to experiment with different models and also tweak the prompt.
Also be aware that many open-source models are intended for AI research and licensed for non-commercial use only. So be careful if you want to use this in a commercial game. No doubt there will be a lot of changes in this area over the coming years.
### Why not use an AI cloud service?
You could in principle use this to call out to an external API, like OpenAI (ChatGPT) or Google. Most such cloud-hosted services are commercial (they cost money). But since they have the hardware to run bigger models (or their own, proprietary models), they may give better and faster results.
Calling an external API is not tested, so report any findings. Since the Evennia Server (not the Portal) does the calling, it is recommended to put a proxy between yourself and the internet if you call out like this.
## The LLMNPC class
This is a simple Character class, with a few extra properties:
```python
response_template = "{name} says: {response}"
thinking_timeout = 2 # how long to wait until showing thinking
# random 'thinking echoes' to return while we wait, if the AI is slow
thinking_messages = [
    "{name} thinks about what you said ...",
    "{name} ponders your words ...",
    "{name} ponders ...",
]
```
The character has a new method `at_talked_to`, which makes the connection to the LLM server and responds. It is called by the new `talk` command. Note that all these calls are asynchronous, meaning a slow response will not block Evennia.

View file

@ -0,0 +1,2 @@
from .llm_client import LLMClient
from .llm_npc import LLMNPC, CmdLLMTalk

View file

@ -0,0 +1,183 @@
"""
LLM (Large Language Model) client, for communicating with an LLM backend. This can be used
for generating texts for AI npcs, or for fine-tuning the LLM on a given prompt.
Note that running an LLM locally requires a lot of power, and ideally a powerful GPU. Testing
this in CPU mode on a beefy laptop still takes some 4s even on a very small model.
The server defaults to output suitable for a local server
https://github.com/oobabooga/text-generation-webui, but could be used for other LLM servers too.
See the LLM instructions on that page for how to set up the server. You'll also need
a model file - there are thousands to try out on https://huggingface.co/models (you want Text
Generation models specifically).
# Optional Evennia settings (if not given, these defaults are used)
DEFAULT_LLM_HOST = "http://localhost:5000"
DEFAULT_LLM_PATH = "/api/v1/generate"
DEFAULT_LLM_HEADERS = {"Content-Type": "application/json"}
DEFAULT_LLM_PROMPT_KEYNAME = "prompt"
DEFAULT_LLM_REQUEST_BODY = {...} # see below, this controls how to prompt the LLM server.
"""
import json
from django.conf import settings
from evennia import logger
from twisted.internet import defer, protocol, reactor
from twisted.internet.defer import inlineCallbacks
from twisted.web.client import Agent, HTTPConnectionPool, _HTTP11ClientFactory
from twisted.web.http_headers import Headers
from twisted.web.iweb import IBodyProducer
from zope.interface import implementer
DEFAULT_LLM_HOST = "http://127.0.0.1:5000"
DEFAULT_LLM_PATH = "/api/v1/generate"
DEFAULT_LLM_HEADERS = {"Content-Type": "application/json"}
DEFAULT_LLM_PROMPT_KEYNAME = "prompt"
DEFAULT_LLM_REQUEST_BODY = {
"max_new_tokens": 250,
# Generation params. If 'preset' is set to different than 'None', the values
# in presets/preset-name.yaml are used instead of the individual numbers.
"preset": "None",
"do_sample": True,
"temperature": 0.7,
"top_p": 0.1,
"typical_p": 1,
"epsilon_cutoff": 0, # In units of 1e-4
"eta_cutoff": 0, # In units of 1e-4
"tfs": 1,
"top_a": 0,
"repetition_penalty": 1.18,
"repetition_penalty_range": 0,
"top_k": 40,
"min_length": 0,
"no_repeat_ngram_size": 0,
"num_beams": 1,
"penalty_alpha": 0,
"length_penalty": 1,
"early_stopping": False,
"mirostat_mode": 0,
"mirostat_tau": 5,
"mirostat_eta": 0.1,
"seed": -1,
"add_bos_token": True,
"truncation_length": 2048,
"ban_eos_token": False,
"skip_special_tokens": True,
"stopping_strings": [],
}
@implementer(IBodyProducer)
class StringProducer:
"""
Used for feeding a request body to the HTTP client.
"""
def __init__(self, body):
self.body = bytes(body, "utf-8")
self.length = len(self.body)  # byte length (the body was utf-8 encoded above)
def startProducing(self, consumer):
consumer.write(self.body)
return defer.succeed(None)
def pauseProducing(self):
pass
def stopProducing(self):
pass
class SimpleResponseReceiver(protocol.Protocol):
"""
Used for pulling the response body out of an HTTP response.
"""
def __init__(self, status_code, d):
self.status_code = status_code
self.buf = b""
self.d = d
def dataReceived(self, data):
self.buf += data
def connectionLost(self, reason=protocol.connectionDone):
self.d.callback((self.status_code, self.buf))
class QuietHTTP11ClientFactory(_HTTP11ClientFactory):
"""
Silences the obnoxious factory start/stop messages in the default client.
"""
noisy = False
class LLMClient:
"""
A client for communicating with an LLM server.
"""
def __init__(self, on_bad_request=None):
self._conn_pool = HTTPConnectionPool(reactor)
self._conn_pool._factory = QuietHTTP11ClientFactory
self.prompt_keyname = getattr(settings, "LLM_PROMPT_KEYNAME", DEFAULT_LLM_PROMPT_KEYNAME)
self.hostname = getattr(settings, "LLM_HOST", DEFAULT_LLM_HOST)
self.pathname = getattr(settings, "LLM_PATH", DEFAULT_LLM_PATH)
self.headers = getattr(settings, "LLM_HEADERS", DEFAULT_LLM_HEADERS)
self.request_body = getattr(settings, "LLM_REQUEST_BODY", DEFAULT_LLM_REQUEST_BODY)
@inlineCallbacks
def get_response(self, prompt):
"""
Get a response from the LLM server for the given prompt.
Args:
prompt (str): The prompt to send to the LLM server.
Returns:
str: The generated text response. Will return an empty string
if there is an issue with the server, in which case the
caller is expected to handle this gracefully.
"""
status_code, response = yield self._get_response_from_llm_server(prompt)
if status_code == 200:
return json.loads(response)["results"][0]["text"]
else:
logger.log_err(f"LLM API error (status {status_code}): {response}")
return ""
def _get_response_from_llm_server(self, prompt):
"""Call and wait for response from LLM server"""
agent = Agent(reactor, pool=self._conn_pool)
request_body = self.request_body.copy()
request_body[self.prompt_keyname] = prompt
d = agent.request(
b"POST",
bytes(self.hostname + self.pathname, "utf-8"),
headers=Headers(self.headers),
bodyProducer=StringProducer(json.dumps(request_body)),
)
d.addCallbacks(self._handle_llm_response_body, self._handle_llm_error)
return d
def _handle_llm_response_body(self, response):
"""Get the response body from the response"""
d = defer.Deferred()
response.deliverBody(SimpleResponseReceiver(response.code, d))
return d
def _handle_llm_error(self, failure):
failure.trap(Exception)
return (500, failure.getErrorMessage())

View file

@ -0,0 +1,109 @@
"""
Basic class for NPC that makes use of an LLM (Large Language Model) to generate replies.
It comes with a `talk` command; use `talk npc <something>` to talk to the NPC. The NPC will
respond using the LLM response.
Makes use of the LLMClient for communicating with the server. The NPC will also
echo a 'thinking...' message if the LLM server takes too long to respond.
"""
from random import choice
from evennia import Command, DefaultCharacter
from evennia.utils.utils import make_iter
from twisted.internet import reactor, task
from twisted.internet.defer import inlineCallbacks
from .llm_client import LLMClient
class LLMNPC(DefaultCharacter):
"""An NPC that uses the LLM server to generate its responses. If the server is slow, it will
echo a thinking message to the character while it waits for a response."""
response_template = "{name} says: {response}"
thinking_timeout = 2 # seconds
thinking_messages = [
"{name} thinks about what you said ...",
"{name} ponders your words ...",
"{name} ponders ...",
]
@property
def llm_client(self):
if not hasattr(self, "_llm_client"):
self._llm_client = LLMClient()
return self._llm_client
@inlineCallbacks
def at_talked_to(self, speech, character):
"""Called when this NPC is talked to by a character."""
def _respond(response):
"""Async handling of the server response"""
if thinking_defer and not thinking_defer.called:
# abort the thinking message if we were fast enough
thinking_defer.cancel()
character.msg(
self.response_template.format(
name=self.get_display_name(character), response=response
)
)
def _echo_thinking_message():
"""Echo a random thinking message to the character"""
thinking_messages = make_iter(self.db.thinking_messages or self.thinking_messages)
character.msg(choice(thinking_messages).format(name=self.get_display_name(character)))
# if response takes too long, note that the NPC is thinking.
thinking_defer = task.deferLater(reactor, self.thinking_timeout, _echo_thinking_message)
# get the response from the LLM server
yield self.llm_client.get_response(speech).addCallback(_respond)
class CmdLLMTalk(Command):
"""
Talk to an NPC
Usage:
talk npc <something>
talk npc with spaces in name = <something>
"""
key = "talk"
def parse(self):
args = self.args.strip()
if "s=" in args:
name, *speech = args.split("=", 1)
else:
name, *speech = args.split(" ", 1)
self.target_name = name
self.speech = speech[0] if speech else ""
def func(self):
if not self.target_name:
self.caller.msg("Talk to who?")
return
location = self.caller.location
target = self.caller.search(self.target_name)
if not target:
return
if location:
location.msg_contents(
f'$You() talk to $You({target.key}), saying "{self.speech}"',
mapping={target.key: target},
from_obj=self.caller,
)
if hasattr(target, "at_talked_to"):
target.at_talked_to(self.speech, self.caller)
else:
self.caller.msg(f"{target.key} doesn't seem to want to talk to you.")

View file

@ -0,0 +1,27 @@
"""
Unit tests for the LLM Client and npc.
"""
from anything import Something
from evennia.utils.create import create_object
from evennia.utils.test_resources import EvenniaTestCase
from mock import Mock, patch
from .llm_npc import LLMNPC
class TestLLMClient(EvenniaTestCase):
@patch("evennia.contrib.rpg.llm.llm_npc.task.deferLater")
def test_npc_at_talked_to(self, mock_deferLater):
"""
Test the LLMNPC class.
"""
npc = create_object(LLMNPC, key="Test NPC")
mock_LLMClient = Mock()
npc._llm_client = mock_LLMClient
npc.at_talked_to("Hello", npc)
mock_deferLater.assert_called_with(Something, npc.thinking_timeout, Something)
mock_LLMClient.get_response.assert_called_with("Hello")

View file

@ -8,7 +8,10 @@ import urllib.parse
import urllib.request
import django
import evennia
from django.conf import settings
from evennia.accounts.models import AccountDB
from evennia.utils import get_evennia_version, logger
from twisted.internet import defer, protocol, reactor
from twisted.internet.defer import inlineCallbacks
from twisted.web.client import Agent, HTTPConnectionPool, _HTTP11ClientFactory
@ -16,15 +19,11 @@ from twisted.web.http_headers import Headers
from twisted.web.iweb import IBodyProducer
from zope.interface import implementer
import evennia
from evennia.accounts.models import AccountDB
from evennia.utils import get_evennia_version, logger
_EGI_HOST = "http://evennia-game-index.appspot.com"
_EGI_REPORT_PATH = "/api/v1/game/check_in"
class EvenniaGameIndexClient(object):
class EvenniaGameIndexClient:
"""
This client class is used for gathering and sending game details to the
Evennia Game Index. Since EGI is in the early goings, this isn't
@ -33,8 +32,8 @@ class EvenniaGameIndexClient(object):
def __init__(self, on_bad_request=None):
"""
:param on_bad_request: Optional callable to trigger when a bad request
was sent. This is almost always going to be due to bad config.
on_bad_request (callable, optional): Callable to trigger when a bad request was sent.
"""
self.report_host = _EGI_HOST
self.report_path = _EGI_REPORT_PATH
@ -150,7 +149,7 @@ class SimpleResponseReceiver(protocol.Protocol):
@implementer(IBodyProducer)
class StringProducer(object):
class StringProducer:
"""
Used for feeding a request body to the tx HTTP client.
"""

View file

@ -54,6 +54,6 @@ class EvenniaGameIndexService(Service):
Stop the service so we're not wasting resources.
"""
logger.log_infomsg(
"Shutting down Evennia Game Index client service due to " "invalid configuration."
"Shutting down Evennia Game Index client service due to invalid configuration."
)
self.stopService()