Running a local LLM

A local LLM runs entirely on your own computer. Your email body never crosses the network, classification works offline, and you don’t pay per-token fees. The trade-off is that you need a machine capable of running the model and you have to keep the model server running while Saymail is open.

What works

Saymail talks to local models over the OpenAI Chat Completions API. Any tool that exposes that API works:

Ollama — the most common choice. Easy install, model catalogue, runs on Windows/macOS/Linux.
LM Studio — desktop app with a built-in OpenAI-compatible server.
llama.cpp server — minimal CLI server.
vLLM, TGI, Text Generation WebUI — heavier-weight options that also expose the same API.

If your tool of choice has an “OpenAI compatible” or “OpenAI API” mode, it should work.

Connecting Ollama (the typical path)

Install Ollama and pull a model. A capable mid-size model is a good starting point — for example: ollama pull llama3.1:8b. Saymail’s classification prompt isn’t huge, so 8–12B parameter models on a recent GPU give good results; 70B+ models give better classification but want a lot of VRAM.
Run Ollama (it starts a server on http://localhost:11434).
In Saymail, open LLM Connections → New.
Pick Local as the provider. The base URL pre-fills to http://localhost:11434/v1 — that’s Ollama’s OpenAI-compatible endpoint.
Type the model name exactly as you pulled it, e.g. llama3.1:8b.
Leave the API key blank (Ollama doesn’t require one).
Save.

Saymail tests the connection on save. If it succeeds the connection lights up green and is ready to classify.

Other servers

If you use LM Studio, the OpenAI-compatible server defaults to http://localhost:1234/v1 — adjust the base URL accordingly. Other servers have their own ports; check the tool’s docs for the OpenAI endpoint.

Performance notes

Email classification runs every 15 minutes by default. If your machine is slow, this can pile up. The classification task processes mail in batches, so it catches up over the course of a few rounds.
Quantised models (Q4, Q5) run far faster than full-precision weights with only a small quality drop. They’re a good default.
A model that’s “too big” for your VRAM will fall back to CPU and become very slow — pick a model that fits.
If you close Ollama, classifications fail silently. After three failures Saymail marks the connection as disconnected. Restart Ollama and click Reconnect in the LLM Connections panel. See LLM connection health.

Combining local with cloud

A common setup is local as priority 1 with a cloud key as priority 2. When your laptop is plugged in and at home, Saymail uses the local model and costs nothing. When you’re on the road and the local server isn’t running, Saymail falls back to the cloud. See Choosing your AI setup.