evennia/docs/2.x/Contribs/Contrib-Llm.html


<!DOCTYPE html>

<html>
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

    <title>Large Language Model (“Chat-bot AI”) integration &#8212; Evennia 2.x documentation</title>
    <link rel="stylesheet" href="../_static/nature.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    <script id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
    <script src="../_static/jquery.js"></script>
    <script src="../_static/underscore.js"></script>
    <script src="../_static/doctools.js"></script>
    <script src="../_static/language_data.js"></script>
    <link rel="shortcut icon" href="../_static/favicon.ico"/>
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="next" title="Roleplaying base system for Evennia" href="Contrib-RPSystem.html" />
    <link rel="prev" title="Health Bar" href="Contrib-Health-Bar.html" />
  </head><body>


    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="Contrib-RPSystem.html" title="Roleplaying base system for Evennia"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="Contrib-Health-Bar.html" title="Health Bar"
             accesskey="P">previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="../index.html">Evennia 2.x</a> &#187;</li>
          <li class="nav-item nav-item-1"><a href="Contribs-Overview.html" accesskey="U">Contribs</a> &#187;</li>
        <li class="nav-item nav-item-this"><a href="">Large Language Model (“Chat-bot AI”) integration</a></li>
      </ul>
    </div>

    <div class="document">

      <div class="documentwrapper">
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
            <p class="logo"><a href="../index.html">
              <img class="logo" src="../_static/evennia_logo.png" alt="Logo"/>
            </a></p>
<div id="searchbox" style="display: none" role="search">
  <h3 id="searchlabel">Quick search</h3>
    <div class="searchformwrapper">
    <form class="search" action="../search.html" method="get">
      <input type="text" name="q" aria-labelledby="searchlabel" />
      <input type="submit" value="Go" />
    </form>
    </div>
</div>
<script>$('#searchbox').show(0);</script>
<h3><a href="../index.html">Table of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">Large Language Model (“Chat-bot AI”) integration</a><ul>
<li><a class="reference internal" href="#installation">Installation</a><ul>
<li><a class="reference internal" href="#llm-server">LLM Server</a></li>
<li><a class="reference internal" href="#evennia-config">Evennia config</a></li>
</ul>
</li>
<li><a class="reference internal" href="#usage">Usage</a></li>
<li><a class="reference internal" href="#a-note-on-running-llms-locally">A note on running LLMs locally</a><ul>
<li><a class="reference internal" href="#why-not-use-an-ai-cloud-service">Why not use an AI cloud service?</a></li>
</ul>
</li>
<li><a class="reference internal" href="#the-llmnpc-class">The LLMNPC class</a></li>
<li><a class="reference internal" href="#todo">TODO</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="Contrib-Health-Bar.html"
                        title="previous chapter">Health Bar</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="Contrib-RPSystem.html"
                        title="next chapter">Roleplaying base system for Evennia</a></p>
  <div role="note" aria-label="source link">
    <!--h3>This Page</h3-->
    <ul class="this-page-menu">
      <li><a href="../_sources/Contribs/Contrib-Llm.md.txt"
            rel="nofollow">Show Page Source</a></li>
    </ul>
   </div><h3>Links</h3>
<ul>
  <li><a href="https://www.evennia.com/docs/latest/index.html">Documentation Top</a> </li>
  <li><a href="https://www.evennia.com">Evennia Home</a> </li>
  <li><a href="https://github.com/evennia/evennia">Github</a> </li>
  <li><a href="http://games.evennia.com">Game Index</a> </li>
  <li>
    <a href="https://discord.gg/AJJpcRUhtF">Discord</a> -
     <a href="https://github.com/evennia/evennia/discussions">Discussions</a> -
      <a href="https://evennia.blogspot.com/">Blog</a>
  </li>
</ul>
<h3>Doc Versions</h3>
<ul>

    <li><a href="Contrib-Llm.html">2.x (main branch)</a></li>
 <ul>
    <li><a href="../1.3.0/index.html">1.3.0 (v1.3.0 branch)</a></li>

    <li><a href="../0.9.5/index.html">0.9.5 (v0.9.5 branch)</a></li>


</ul>

        </div>
      </div>
        <div class="bodywrapper">
          <div class="body" role="main">

  <section class="tex2jax_ignore mathjax_ignore" id="large-language-model-chat-bot-ai-integration">
<h1>Large Language Model (“Chat-bot AI”) integration<a class="headerlink" href="#large-language-model-chat-bot-ai-integration" title="Permalink to this headline">¶</a></h1>
<p>Contribution by Griatch 2023</p>
<p>This adds an LLMClient that allows Evennia to send prompts to a  LLM server (Large Language Model, along the lines of ChatGPT). Example uses a local OSS LLM install. Included is an NPC you can chat with using a new <code class="docutils literal notranslate"><span class="pre">talk</span></code> command. The NPC will respond using the AI responses from the LLM server. All calls are asynchronous, so if the LLM is slow, Evennia is not affected.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>&gt; create/drop villager:evennia.contrib.rpg.llm.LLMNPC
You create a new LLMNPC: villager

&gt; talk villager Hello there friend, what&#39;s up?
You say (to villager): Hello there friend, what&#39;s up?
villager says (to You): Hello! Not much going on, really. How about you?

&gt; talk villager Just enjoying the nice weather.
You say (to villager): Just enjoying the nice weather.
villager says (to You): Yeah, it is really quite nice, ain&#39;t it.
</pre></div>
</div>
<section id="installation">
<h2>Installation<a class="headerlink" href="#installation" title="Permalink to this headline">¶</a></h2>
<p>You need two components for this contrib - Evennia, and an LLM webserver that operates and provides an API to an LLM AI model.</p>
<section id="llm-server">
<h3>LLM Server<a class="headerlink" href="#llm-server" title="Permalink to this headline">¶</a></h3>
<p>There are many LLM servers, but they can be pretty technical to install and set up. This contrib was tested with <a class="reference external" href="https://github.com/oobabooga/text-generation-webui">text-generation-webui</a>. It has a lot of features while also being easy to install. |</p>
<ol class="simple">
<li><p><a class="reference external" href="https://github.com/oobabooga/text-generation-webui#installation">Go to the Installation section</a> and grab the ‘one-click installer’ for your OS.</p></li>
<li><p>Unzip the files in a folder somewhere on your hard drive (you don’t have to put it next to your evennia stuff if you don’t want to).</p></li>
<li><p>In a terminal/console, <code class="docutils literal notranslate"><span class="pre">cd</span></code> into the folder and execute the source file in whatever way it’s done for your OS (like <code class="docutils literal notranslate"><span class="pre">source</span> <span class="pre">start_linux.sh</span></code> for Linux, or <code class="docutils literal notranslate"><span class="pre">.\start_windows</span></code> for Windows). This is an installer that will fetch and install everything in a conda virtual environment. When asked, make sure to select your GPU (NVIDIA/AMD etc) if you have one, otherwise use CPU.</p></li>
<li><p>Once all is loaded, stop the server with <code class="docutils literal notranslate"><span class="pre">Ctrl-C</span></code> (or <code class="docutils literal notranslate"><span class="pre">Cmd-C</span></code>) and open the file <code class="docutils literal notranslate"><span class="pre">webui.py</span></code> (it’s one of the top files in the archive you unzipped). Find the text string <code class="docutils literal notranslate"><span class="pre">CMD_FLAGS</span> <span class="pre">=</span> <span class="pre">''</span></code> near the top and change this to <code class="docutils literal notranslate"><span class="pre">CMD_FLAGS</span> <span class="pre">=</span> <span class="pre">'--api'</span></code>. Then save and close. This makes the server activate its api automatically.</p></li>
<li><p>Now just run that server starting script (<code class="docutils literal notranslate"><span class="pre">start_linux.sh</span></code> etc) again. This is what you’ll use to start the LLM server henceforth.</p></li>
<li><p>Once the server is running, point your browser to <a class="reference external" href="http://127.0.0.1:7860">http://127.0.0.1:7860</a> to see the running Text generation web ui running. If you turned on the API, you’ll find it’s now active on port 5000. This should not collide with default Evennia ports unless you changed something.</p></li>
<li><p>At this point you have the server and API, but it’s not actually running any Large-Language-Model (LLM) yet. In the web ui, go to the <code class="docutils literal notranslate"><span class="pre">models</span></code> tab and enter a github-style path in the <code class="docutils literal notranslate"><span class="pre">Download</span> <span class="pre">custom</span> <span class="pre">model</span> <span class="pre">or</span> <span class="pre">LoRA</span></code> field.  To test so things work, enter <code class="docutils literal notranslate"><span class="pre">DeepPavlov/bart-base-en-persona-chat</span></code> and download. This is a relatively small model (350 million parameters) so should be possible to run on most machines using only CPU. Update the models in the drop-down on the left and select it, then load it with the <code class="docutils literal notranslate"><span class="pre">Transformers</span></code> loader. It should load pretty quickly. If you want to load this every time, you can select the <code class="docutils literal notranslate"><span class="pre">Autoload</span> <span class="pre">the</span> <span class="pre">model</span></code> checkbox; otherwise you’ll need to select and load the model every time you start the LLM server.</p></li>
<li><p>To experiment, you can find thousands of other open-source text-generation LLM models on <a class="reference external" href="https://huggingface.co/models?pipeline_tag=text-generation&amp;sort=trending">huggingface.co/models</a>. Beware to not download a too huge model; your machine may not be able to load it! If you try large models, <em>don’t</em> set the <code class="docutils literal notranslate"><span class="pre">Autoload</span> <span class="pre">the</span> <span class="pre">model</span></code> checkbox, in case the model crashes your server on startup.</p></li>
</ol>
<p>For troubleshooting, you can look at the terminal output of the <code class="docutils literal notranslate"><span class="pre">text-generation-webui</span></code> server; it will show you the requests you do to it and also list any errors. See the text-generation-webui homepage for more details.</p>
</section>
<section id="evennia-config">
<h3>Evennia config<a class="headerlink" href="#evennia-config" title="Permalink to this headline">¶</a></h3>
<p>To be able to talk to NPCs, import and add the <code class="docutils literal notranslate"><span class="pre">evennia.contrib.rpg.llm.llm_npc.CmdLLMTalk</span></code> command to your Character cmdset in <code class="docutils literal notranslate"><span class="pre">mygame/commands/default_commands.py</span></code> (see the basic tutorials if you are unsure).</p>
<p>The default LLM api config should work with the text-generation-webui LLM server running its API on port 5000. You can also customize it via settings (if a setting is not added, the default below is used):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>    <span class="c1"># path to the LLM server</span>
    <span class="n">LLM_HOST</span> <span class="o">=</span> <span class="s2">&quot;http://127.0.0.1:5000&quot;</span>
    <span class="n">LLM_PATH</span> <span class="o">=</span> <span class="s2">&quot;/api/v1/generate&quot;</span>

    <span class="c1"># if you wanted to authenticated to some external service, you could</span>
    <span class="c1"># add an Authenticate header here with a token</span>
    <span class="n">LLM_HEADERS</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&quot;Content-Type&quot;</span><span class="p">:</span> <span class="s2">&quot;application/json&quot;</span><span class="p">}</span>

    <span class="c1"># this key will be inserted in the request, with your user-input</span>
    <span class="n">LLM_PROMPT_KEYNAME</span> <span class="o">=</span> <span class="s2">&quot;prompt&quot;</span>

    <span class="c1"># defaults are set up for text-generation-webui and most models</span>
    <span class="n">LLM_REQUEST_BODY</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s2">&quot;max_new_tokens&quot;</span><span class="p">:</span> <span class="mi">250</span><span class="p">,</span>  <span class="c1"># set how many tokens are part of a response</span>
        <span class="s2">&quot;temperature&quot;</span><span class="p">:</span> <span class="mf">0.7</span><span class="p">,</span> <span class="c1"># 0-2. higher=more random, lower=predictable</span>
    <span class="p">}</span>
</pre></div>
</div>
<p>Don’t forget to reload Evennia if you make any changes.</p>
</section>
</section>
<section id="usage">
<h2>Usage<a class="headerlink" href="#usage" title="Permalink to this headline">¶</a></h2>
<p>With the LLM server running and the new <code class="docutils literal notranslate"><span class="pre">talk</span></code> command added, create a new LLM-connected NPC and talk to it in-game.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>&gt; create/drop girl:evennia.contrib.rpg.llm.LLMNPC
&gt; talk girl Hello!
You say (to girl): Hello
girl ponders ...
girl says (to You): Hello! How are you?
</pre></div>
</div>
<p>Most likely, your first response will <em>not</em> be this nice and short, but will be quite nonsensical, looking like an email. This is because the example model we loaded is not optimized for conversations. But at least you know it works!</p>
<p>The  conversation will be echoed to everyone in the room. The NPC will show a thinking/pondering message if the server responds slower than 2 seconds (by default).</p>
</section>
<section id="a-note-on-running-llms-locally">
<h2>A note on running LLMs locally<a class="headerlink" href="#a-note-on-running-llms-locally" title="Permalink to this headline">¶</a></h2>
<p>Running an LLM locally can be <em>very</em> demanding.</p>
<p>As an example, I tested this on my very beefy work laptop. It has 32GB or RAM, but no gpu. so i ran the example (small 128m parameter) model on cpu. it takes about 3-4 seconds to generate a (frankly very bad) response. so keep that in mind.</p>
<p>On <a class="reference external" href="http://huggingface.co">huggingface.co</a> you can find listings of the ‘best performing’ language models right now. This changes all the time. The leading models require 100+ GB RAM. And while it’s possible to run on a CPU, ideally you should have a large graphics card (GPU) with a lot of VRAM too.</p>
<p>So most likely you’ll have to settle on something smaller. Experimenting with different models and also tweaking the prompt is needed.</p>
<p>Also be aware that many open-source models are intended for AI research and licensed for non-commercial use only. So be careful if you want to use this in a commercial game. No doubt there will be a lot of changes in this area over the coming years.</p>
<section id="why-not-use-an-ai-cloud-service">
<h3>Why not use an AI cloud service?<a class="headerlink" href="#why-not-use-an-ai-cloud-service" title="Permalink to this headline">¶</a></h3>
<p>You could in principle use this to call out to an external API, like OpenAI (chat-GPT) or Google. Most cloud-hosted services are commercial and costs money. But since they have the hardware to run bigger models (or their own, proprietary models), they may give better and faster results.</p>
<p>Calling an external API is not tested, so report any findings. Since the Evennia Server (not the Portal) is doing the calling, you are recommended to put a proxy between you and the internet if you call out like this.</p>
<p>Here is an untested example of the Evennia setting for calling <a class="reference external" href="https://platform.openai.com/docs/api-reference/completions">OpenAI’s v1/completions API</a>:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">LLM_HOST</span> <span class="o">=</span> <span class="s2">&quot;https://api.openai.com&quot;</span>
<span class="n">LLM_PATH</span> <span class="o">=</span> <span class="s2">&quot;/v1/completions&quot;</span>
<span class="n">LLM_HEADERS</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&quot;Content-Type&quot;</span><span class="p">:</span> <span class="s2">&quot;application/json&quot;</span><span class="p">,</span>
               <span class="s2">&quot;Authorization&quot;</span><span class="p">:</span> <span class="s2">&quot;Bearer YOUR_OPENAI_API_KEY&quot;</span><span class="p">}</span>
<span class="n">LLM_PROMPT_KEYNAME</span> <span class="o">=</span> <span class="s2">&quot;prompt&quot;</span>
<span class="n">LLM_REQUEST_BODY</span> <span class="o">=</span> <span class="p">{</span>
                        <span class="s2">&quot;model&quot;</span><span class="p">:</span> <span class="s2">&quot;gpt-3.5-turbo&quot;</span><span class="p">,</span>
                        <span class="s2">&quot;temperature&quot;</span><span class="p">:</span> <span class="mf">0.7</span><span class="p">,</span>
                        <span class="s2">&quot;max_tokens&quot;</span><span class="p">:</span> <span class="mi">128</span><span class="p">,</span>
                   <span class="p">}</span>

</pre></div>
</div>
<blockquote>
<div><p>TODO: OpenAI’s more modern <a class="reference external" href="https://platform.openai.com/docs/api-reference/chat">v1/chat/completions</a> api does currently not work out of the gate since it’s a bit more complex, having the prompt given as a list of all responses so far.</p>
</div></blockquote>
</section>
</section>
<section id="the-llmnpc-class">
<h2>The LLMNPC class<a class="headerlink" href="#the-llmnpc-class" title="Permalink to this headline">¶</a></h2>
<p>This is a simple Character class, with a few extra properties:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>    <span class="c1"># response template on msg_contents form.</span>
    <span class="n">prompt_prefix</span> <span class="o">=</span> <span class="p">(</span><span class="s2">&quot;You will chat and roleplay &quot;</span><span class="p">)</span>

    <span class="n">response_template</span> <span class="o">=</span> <span class="s2">&quot;$You() $conj(say) (to $You(character)): </span><span class="si">{response}</span><span class="s2">&quot;</span>
    <span class="n">thinking_timeout</span> <span class="o">=</span> <span class="mi">2</span>    <span class="c1"># how long to wait until showing thinking</span>

    <span class="c1"># random &#39;thinking echoes&#39; to return while we wait, if the AI is slow</span>
    <span class="n">thinking_messages</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s2">&quot;</span><span class="si">{name}</span><span class="s2"> thinks about what you said ...&quot;</span><span class="p">,</span>
        <span class="s2">&quot;</span><span class="si">{name}</span><span class="s2"> ponders your words ...&quot;</span><span class="p">,</span>
        <span class="s2">&quot;</span><span class="si">{name}</span><span class="s2"> ponders ...&quot;</span><span class="p">,</span>
    <span class="p">]</span>
</pre></div>
</div>
<p>The character has a new method <code class="docutils literal notranslate"><span class="pre">at_talked_to</span></code> which does the connection to the LLM server and responds. This is called by the new <code class="docutils literal notranslate"><span class="pre">talk</span></code> command. Note that all these calls are asynchronous, meaning a slow response will not block Evennia.</p>
</section>
<section id="todo">
<h2>TODO<a class="headerlink" href="#todo" title="Permalink to this headline">¶</a></h2>
<p>There is a lot of expansion potential with this contrib. Some ideas:</p>
<ul class="simple">
<li><p>Better standard prompting to make the NPC actually conversant.</p></li>
<li><p>Have the NPC remember previous conversations with the player</p></li>
<li><p>Easier support for different cloud LLM provider API structures.</p></li>
<li><p>More examples of useful prompts and suitable models for MUD use.</p></li>
</ul>
<hr class="docutils" />
<p><small>This document page is generated from <code class="docutils literal notranslate"><span class="pre">evennia/contrib/rpg/llm/README.md</span></code>. Changes to this
file will be overwritten, so edit that file rather than this one.</small></p>
</section>
</section>


          </div>
        </div>
      </div>

    </div>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             >index</a></li>
        <li class="right" >
          <a href="../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="Contrib-RPSystem.html" title="Roleplaying base system for Evennia"
             >next</a> |</li>
        <li class="right" >
          <a href="Contrib-Health-Bar.html" title="Health Bar"
             >previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="../index.html">Evennia 2.x</a> &#187;</li>
          <li class="nav-item nav-item-1"><a href="Contribs-Overview.html" >Contribs</a> &#187;</li>
        <li class="nav-item nav-item-this"><a href="">Large Language Model (“Chat-bot AI”) integration</a></li>
      </ul>
    </div>


    <div class="footer" role="contentinfo">
        &#169; Copyright 2023, The Evennia developer community.
      Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
    </div>
  </body>
</html>