DeepSeek R1 is an open-source LLM with performance on par with OpenAI-o1 (https://api-docs.deepseek.com/news/news250120). In this tutorial, we're briefly exploring how to self-host a ChatGPT-style chat service using Open WebUI, with DeepSeek R1 as the backend model.

Goals:

  1. Free use of open-source LLMs
  2. Energy-efficient and low-noise for long-term hosting
  3. Re-use of outdated hardware
  4. Public access from the internet, secured by credentials

Out of scope:

  1. High-volume throughput or enterprise-level deployment
  2. Enhanced security and protection
  3. LLM guardrails

As this is only for experimental purposes, I decided to go ahead and carry out all the work on an almost 10-year-old Microsoft Surface Book, the base model without a GPU (I would have liked one with a GPU, but I don't have a suitable laptop around). For the OS, there are options like Windows, Linux and macOS; for popularity reasons I picked Windows this time, though personally I prefer Linux over the others. The specs and OS of the Surface Book are as shown in the picture; the storage is a 128 GB SSD. The reason for picking a laptop is to satisfy Goals 2 and 3, though those two may conflict, as old hardware is sometimes less energy efficient.

The method discussed in this post should require only minimal technical understanding, so everyone can try it out. There are many reasons to do this, for example lower cost: in the UK the estimated monthly running cost is less than £10, which is much cheaper than other options, not to mention you can share your hosting with family and friends to make it even cheaper, and the laptop doesn't even need to be powered on all the time, since trigger events could wake it up to make it even more energy efficient. Other reasons are much better privacy (everything is self-hosted, and nothing is shared with any service or model provider), and fun. Let's start the build. I assume the laptop has a fresh Windows install with all patches and updates, like the Surface Book used in this experiment.

TL;DR

  1. Install Docker and update
  2. Install Ollama
  3. Deploy Open WebUI using Docker
  4. Pull models
  5. Grant appropriate permissions to users
  6. Set up port forwarding to enable access from the internet; optionally, point your domain to your public IP address.

Step 1: Install/Update Docker

To minimize the user's effort, in this experiment we're using Docker, as suggested by Open WebUI in its "Quick Start with Docker" section.

Download and install Docker, restarting the PC if required (https://docs.docker.com/desktop/setup/install/windows-install/).

After restarting Windows, continue the setup and use the recommended settings. You will also need a Docker account, which is free to create.
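
To confirm Docker is installed and running correctly, you can open a command prompt and run the following, which should simply print the installed Docker version:

docker --version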

 

Step 2: Install Ollama

Ollama is a free, open-source tool that allows users to run large language models (LLMs) on their own computers. Utilizing Ollama significantly minimizes the need for command-line interactions.

Download the Windows installer of Ollama and install it (https://ollama.com/download/windows). After installation, check the notification area at the bottom right to make sure Ollama is running in the background.
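
Similarly, you can double-check the Ollama installation from the command prompt; this should print the installed version:

ollama --version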

 

Step 3: Deploy Open WebUI using Docker

This step is quite straightforward: just open the command prompt, run the following command, and wait till it is finished. To open the command prompt on Windows, press the Windows key + R to open the Run dialog, type "cmd" and press Enter, then copy, paste and run the command below. This will take a while.

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
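
In brief: -d runs the container in the background, -p 3000:8080 maps port 3000 on the laptop to port 8080 inside the container (so the WebUI is reached on port 3000), -v open-webui:/app/backend/data keeps your chats and settings in a named volume so they survive container restarts, --add-host lets the container reach the Ollama service running on the host, and --restart always brings the WebUI back up automatically after a reboot.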

Once the command has finished, don't rush to your browser, as you will most likely see the following error 😀

This is because Docker is still doing its job. You can open the Docker Desktop app and check the logs to make sure the service has started. While it is still starting up, you will see logs like this:
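
If you prefer to stay in the command prompt instead of opening the Docker Desktop app, you can follow the same logs with (the container name open-webui comes from the command above):

docker logs -f open-webui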

Wait till you see "Application startup complete", and now you can go to your browser at localhost:3000, create an admin account (this will be local) and start playing around. Actually, from this point, you can access the Open WebUI from any device on your local network by using the laptop's LAN IP address.

Now you might run into the issue of having no model: when you start a new chat, there is no model available. Don't worry, this is because we haven't pulled any model yet. To pull a model, you can either do it in the command prompt using:

ollama run deepseek-r1:1.5b

Or you can do it using the WebUI:

In this experiment, we're using the 1.5b model, as the Surface Book is a bit outdated and won't be able to host the 7b model. If we try 7b, it will give the error "model requires more system memory".
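
Once the download has finished, you can check which models are available locally from the command prompt; deepseek-r1:1.5b should appear in the list:

ollama list

If you only want to download the model without opening the interactive chat, ollama pull deepseek-r1:1.5b does the same download.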

Now you can enjoy the open-source model on your LAN:

One more step: access your own LLM from the internet

Before going ahead, please note this will expose your IP address to the internet.

Using your self-hosted LLM from anywhere over the internet is not that difficult; the only thing you need to do is forward a port from the internet to your laptop on your local network. To do this, please google "port forwarding <your ISP provider> router"; for example, the Vodafone router tutorial is here: https://deviceguides.vodafone.ie/vodafone/gigabox-windows-10/basic-use/set-up-port-forwarding/

Basically, log into your router and forward port 3000 to your laptop on the local network.

After this, find your public IP address, for example using https://whatismyipaddress.com/, and you should be good to use your LLM at <YOUR IP ADDRESS>:3000. To make this a bit more interesting, if you have a domain, you can add an "A record" to point the domain to your IP address, so you can visit your LLM using a URL like http://lsong.net:3000/ (a rough example of such a record is shown at the end of this post). If you would like to, you can share this URL with your family and friends; you will need to make the model public to other users via the admin panel in the WebUI. That's it, hope you enjoy this post.
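
For reference, the A record at your DNS provider would look roughly like the line below; the hostname matches the example URL above, and 203.0.113.10 is just a documentation placeholder standing in for your real public IP address:

lsong.net.    3600    IN    A    203.0.113.10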