PatBot - How to Trust Your Local AI
Trust is an interesting concept when applied to computing. What do you trust a computer with? If you think about it, quite a lot really. You trust computers with your most intimate communications, your most outlandish chat groups and your professional correspondence - and all of that is just on the messaging front. Expand this thought to banking, and money in general - we "trust" that those numbers on the screen mean we have more or less to spend, and we trust the computers on board transportation to run correctly so we can get to where we need to go.
So yes, we "trust" these things, and have done for some time, because we rely on them and in turn, they [the manufacturers] rely on us. Most of the systems we trust today, we trust for one of the following reasons:
A. They are historically reliable - let's say Microsoft Outlook - ignoring those times it has been very much *unreliable* - this is the "that's how it's always been done" group
B. They come from a trusted legacy entity - let’s say the iPhone - this is the “you never get fired for buying IBM” group
C. We have either built them or understand how they work - usually reserved for early adopters or technical individuals - this is the "let's find the limits of this so we know how best to use it" group
It's not a surprise then that AI in its many forms sits firmly within group C here. But with the capabilities and opportunities AI is affording everyone - regardless of size or budget - there is a valid concern from those not fully on the hype train: not wanting to get left behind, but also not wanting to "back the wrong horse".
The good news is that this isn't a VHS vs Betamax or Blu-ray vs HD-DVD [can't imagine why that catchy name never prevailed] scenario. These may be early days for AI and agents, but there are some things you can set up fairly simply to cover basic tasks and help out, without having to break the bank, put all your eggs in one basket or - equally importantly - dump all your data somewhere outside where you feel comfortable.
So how do we start to feel comfortable? How do we start to "trust" AI and/or the AI systems that are out there? We have to try to understand how they work and how they can work for us - and ideally, we can do this in fairly short order and on a low budget.
Let’s chat!
It's not an original idea, but it's good for a fun example, so here's the scenario: I want a digital clone of me. I want to begin testing how an AI chat bot would behave if it had all the information I have put out into the world and was restricted to that knowledge base for assessing and responding to questions. Maybe a Pat Chat-Bot - a "PatBot" if you will - is a fairly niche use case, but if you consider the model - this knowledge base that the LLM is restricted to using when responding - you can imagine applications across pretty much any area of expertise. This is Retrieval Augmented Generation - a "restricted knowledge base" meaning your chat bot can only answer from a set of specific documents you have chosen - and you can [probably] set it up locally, privately and start using it within a day.
Sounds neat, right? Got a set of user manuals for those household appliances? Stick them into a dedicated knowledge base and you can chat with your "HousekeeperModel", who will only have information about "how to set the clock on the cooker" or "where the reset button for the washing machine is". Saves you trawling through search results to the user manual of a similar-but-different-enough vacuum cleaner, only to find yours doesn't have the same buttons.
Or maybe something more business-focussed - chat with the previous set of contracts in your archives, see if the next one you are putting together includes or omits anything noteworthy, or whether the latest terms and conditions you received align with your policies. Endless opportunities.
And I mentioned local and private. This is the key to understanding and getting comfortable - to moving into the group C types. You are much more likely to happily test the limits of a system, and fire sensitive information into it, if you know that the information is not leaving your hard drive.
What hardware do you need?
You don’t need a brand-new GPU or a data-centre server. Even a modest modern laptop is enough for a small, local RAG setup that’s only indexing a few dozen documents.
Example configuration:
5-year-old MacBook with Apple M1 Pro (16 GB RAM)
Roughly comparable to an Intel Core i7-13700K desktop (16 GB RAM)
In other words, if you can run a web browser and a few developer tools on your machine, you can probably run a scaled-down LLM like llama3.2. Worst case, install everything and benchmark for your own workloads - you can always upgrade later if you really need more power.
Now for the installation: there are dozens of blogs and YouTube videos to walk you step by step through the exact process, so I won't regurgitate that. You can search "Ollama with OpenWebUI and Docker Desktop installation" or, better still, ask your favourite LLM to "ELI5 installing Ollama with OpenWebUI and Docker Desktop". The following elements I will break down.
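Once everything is installed, a quick sanity check is worth doing before going any further. A minimal sketch, assuming Ollama is running on its default port [11434] and you have Python with the requests library to hand:

```python
import requests

# Ollama's local API lists the models you have pulled.
# If this prints a list (even an empty one), Ollama is installed and running.
tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
print([model["name"] for model in tags.get("models", [])])
```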
What is Docker Desktop?
Docker is a platform for virtualising applications in what they call "containers". Again, I'm not getting into the weeds on any of this, so please search and wiki if you want a deep dive. Basically, it's where the chat "website" is going to live. Docker runs it, and if Docker isn't running, you won't be able to reach the chat bot.
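That dependency is easy to see for yourself. A small sketch, assuming the common setup where the OpenWebUI container is mapped to port 3000 [adjust if you chose a different port]:

```python
import requests

# If the Docker container is up, OpenWebUI answers on this port.
# Stop Docker Desktop and run this again: the chat "website" disappears.
try:
    requests.get("http://localhost:3000", timeout=3)
    print("OpenWebUI is reachable - the container is running.")
except requests.exceptions.RequestException:
    print("No response - is Docker Desktop (and the container) running?")
```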
What is Ollama?
This is the main power behind it all. Ollama is an open-source application that runs on your computer and can load and run "any" LLM. The available LLMs are always being updated, but there are plenty that will fit the bill for our purposes. For example, llama3.2 is a small version of Meta's LLM - the kind of model you would have been interacting with if you used Meta's assistant nine months or so ago.
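You can talk to Ollama directly, with nothing on top. A minimal sketch, assuming you have already pulled the model [ollama pull llama3.2] and Ollama is running on its default port:

```python
import requests

# Send a prompt straight to the locally running model via Ollama's REST API.
# Nothing here leaves your machine: localhost:11434 is Ollama's own address.
reply = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "In one sentence, what is Retrieval Augmented Generation?",
        "stream": False,  # return the whole answer at once rather than streaming
    },
).json()
print(reply["response"])
```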
What is OpenWebUI?
So, while Ollama "runs" the models, it doesn't give you a nice interface [unless you're a fan of the command line] to interact with them - so no familiar "hello, how can I help" interface. That's where OpenWebUI comes in. It is also open-source software and provides the "website" and the back-end connections to Ollama, so when you hit the "website" and ask a question, that question is passed to Ollama, the model you are running answers it, and the answer is passed back to you via the "website".
All sounding too complicated? It doesn’t have to be - the explanations of each part here are more to aid the move to “group C” - understanding how things work.
More on how things work - this setup is a one-time deal: a bit of tinkering, but then you will have your system running. After a pull to get the model we want for Ollama [llama3.2, as previously suggested], we can disconnect our machine from the internet and still interact with the new chat bot "website".
This is the step that will hopefully mean you can begin to trust what's going on - there's the chat interface, the knowledge [more on that in a minute] and the "model" that will do the work. None of these elements touch the internet, and none are exposed or accessible to anyone else if we only run them locally.
RAG
Retrieval Augmented Generation. Here’s the wiki for what it “is”:
Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information. With RAG, LLMs do not respond to user queries until they refer to a specified set of documents. These documents supplement information from the LLM's pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this helps LLM-based chatbots access internal company data or generate responses based on authoritative sources.
So basically, we are going to create our first RAG chat bot [we can create several just with the above setup] and give it some data to use.
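Before the point-and-click setup, it helps to see what RAG actually does under the hood. Here is a minimal sketch - a toy, not how OpenWebUI is implemented - assuming Ollama is running with llama3.2 plus an embedding model such as nomic-embed-text [pulled the same way as the chat model]:

```python
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    # nomic-embed-text is a small embedding model available through Ollama;
    # any embedding model will do for this toy example.
    response = requests.post(
        f"{OLLAMA}/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    return response.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    # Similarity between two vectors: closer to 1.0 means more alike.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# 1. "Index": embed each document once. Real systems split long files
#    into chunks first; these toy documents are already bite-sized.
documents = [
    "The cooker clock is set by holding the minus and plus buttons together.",
    "The washing machine's reset button is behind the detergent drawer.",
]
index = [(doc, embed(doc)) for doc in documents]

# 2. "Retrieve": embed the question and pick the closest document.
question = "How do I set the clock on the cooker?"
question_vec = embed(question)
best_doc = max(index, key=lambda pair: cosine(question_vec, pair[1]))[0]

# 3. "Generate": hand the model only the retrieved text plus the question.
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"
answer = requests.post(
    f"{OLLAMA}/api/generate",
    json={"model": "llama3.2", "prompt": prompt, "stream": False},
).json()["response"]
print(answer)
```

Embed and index once, retrieve the closest match for each question, hand only that text to the model - that's the whole trick, and it's essentially what the "Knowledge" feature below automates for you [with smarter chunking and ranking].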
Setting up OpenWebUI
Once you have logged in [again, this username and password is just for you, logging locally into your own private website - you are not signing up for anything], click on "Workspace" in the left-hand menu to get into the settings for your instance, followed by "Knowledge", which will give you a pane where you can create a new knowledge base and upload the files you wish to use as the basis / knowledge for your chat bot. Here I have added my previous blog posts to "PatBot" so it can respond using only this data.
Next, click back to Models [to the left of "Knowledge"] so we can add a new model and attach the knowledge we just created to its "memory". Choose the base model [the LLM you pulled, e.g. llama3.2] so we have the "basis" for how the LLM will interact. As mentioned, it's not important that PatBot is up to date with the latest football scores or news headlines, as we are creating it for a very specific purpose; it just needs to respond conversationally to our questions, and llama3.2 will do this no problem. Scroll down to the knowledge section and select the knowledge base we created earlier. Next up is giving the model a "system prompt". This is where a bit of nuance and trial and error happens - all perfect in the quest to learn and understand the limitations of the system. We basically want to tell the model what it is and what it's supposed to do. For PatBot this is pretty easy, as I don't care about the format of the output; if I wanted a bot responsible for outputting a structured format, I would need to specify that here. I've just told PatBot that it is me. Save the model and we're almost done.
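For a flavour of what goes in that box, here's a hypothetical starting point - the exact wording is yours to tune by trial and error:

```
You are PatBot, a digital clone of Pat. Answer in the first person, in a
conversational tone, using only information from the attached knowledge base.
If the knowledge base does not cover a question, say so rather than guessing.
```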
Click on New Chat at the top left and, where the model selector is listed, drop it down and you should see your new RAG chat bot! Now you can ask all manner of things and the chat bot will respond, along with citations from the documents and files it used to build the response - but rather than a Gemini / ChatGPT / etc. request that encompasses everything, it will be restricted to your chosen knowledge base, and will do so privately and locally on your machine.
I have often used the analogy that you should treat AI like an intern: a very capable individual, but someone whose work you would want to check to ensure it was in line with your expectations, and then provide further guidance to bridge any gaps in their knowledge, hoping that next time you would see the knowledge transfer had been successful.
Just as with a real intern, your chat bot should be given feedback and additional knowledge to perform at its best. Learning to do this on a local chat bot is something you can start now, and it will hopefully demystify some of the simple but powerful ways AI can be implemented securely, cheaply and in a piecemeal but relevant way today.