Models
Kiki talks to language models through one abstraction, so a task doesn't care whether a model runs on your device or in the cloud. The router decides which model handles each request, governed by your configuration.
Local-first by default
Out of the box, Kiki runs a small model on your own machine through Ollama — the shipped default is granite4.1:3b. With the default policy (allow_remote = false), that local model is the only one used: nothing leaves the device.
This is why a fresh Kiki device makes no outbound inference requests and incurs no inference charges.
Cloud models, when you opt in
For models bigger than your device can run, Kiki can route to cloud providers — but only when you allow it. Each provider owns the models it supports:
| Provider | Owns models like |
|---|---|
| Local (Ollama) | anything not claimed by a cloud provider |
| Anthropic | claude-* |
| OpenAI | gpt-*, o1, o3 |
| Modal | modal/* (your own hosted models) |
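
As a rough illustration of the ownership table, the lookup can be thought of as pattern matching on the model name, with the local provider as the catch-all. This is a minimal sketch, not Kiki's actual router: the CLOUD_PATTERNS table and the owning_provider function are hypothetical names, and it assumes o1 and o3 are matched exactly as written in the table.

```python
from fnmatch import fnmatchcase

# Hypothetical ownership table mirroring the documentation table above.
# Kiki's real matching rules may differ.
CLOUD_PATTERNS = {
    "anthropic": ["claude-*"],
    "openai": ["gpt-*", "o1", "o3"],
    "modal": ["modal/*"],
}


def owning_provider(model: str) -> str:
    """Return the provider that claims `model`, or "local" if none does."""
    for provider, patterns in CLOUD_PATTERNS.items():
        # fnmatchcase does case-sensitive shell-style matching ("*" = any run).
        if any(fnmatchcase(model, pattern) for pattern in patterns):
            return provider
    # Anything not claimed by a cloud provider belongs to local (Ollama).
    return "local"


if __name__ == "__main__":
    for name in ["granite4.1:3b", "claude-example", "gpt-example", "modal/my-model"]:
        print(name, "->", owning_provider(name))
```

Ownership only says which provider could serve a model; whether a cloud provider is actually used still depends on your policy.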
How routing decides
Beyond model ownership and allow_remote, the router also respects the policy levers in agentd.toml: falling back to the local model on low battery, keeping voice off third-party providers, and pinning or auto-selecting a model.
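
To make the interaction between these levers concrete, here is a minimal sketch of one way such a decision could be ordered. It is an illustration under stated assumptions, not the router's real logic: only allow_remote and the granite4.1:3b default come from this page. The other field names, the Request type, the order in which the checks run, and the choice to treat every cloud provider as third-party for voice are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

LOCAL_DEFAULT = "granite4.1:3b"  # the shipped local default named on this page


@dataclass
class Policy:
    allow_remote: bool = False           # from the docs; remote is off by default
    local_on_low_battery: bool = True    # hypothetical field name
    voice_local_only: bool = True        # hypothetical field name
    pinned_model: Optional[str] = None   # hypothetical; None means auto-select


@dataclass
class Request:
    model: str                 # model the task asked for
    is_voice: bool = False     # hypothetical flag for voice traffic
    battery_low: bool = False  # hypothetical flag reported by the device


def is_cloud_model(model: str) -> bool:
    """True if the name falls under a cloud provider in the ownership table."""
    return model.startswith(("claude-", "gpt-", "modal/")) or model in ("o1", "o3")


def choose_model(req: Request, policy: Policy) -> str:
    """Pick a model name; the precedence of the checks is an assumption."""
    wanted = policy.pinned_model or req.model
    if not is_cloud_model(wanted):
        return wanted  # local models never leave the device
    if not policy.allow_remote:
        return LOCAL_DEFAULT  # remote disabled: stay on the device
    if policy.local_on_low_battery and req.battery_low:
        return LOCAL_DEFAULT  # fall back to local on low battery
    if policy.voice_local_only and req.is_voice:
        return LOCAL_DEFAULT  # keep voice off cloud providers
    return wanted
```

With the default Policy(), every request resolves to a local model, which matches the local-first behavior described earlier. Setting allow_remote to True is what lets a cloud model name pass through, and the remaining levers can still pull a request back to the device.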
Choosing a default model
Pick a model your hardware can run on-device. Set it when you build a custom image (--build-arg DEFAULT_MODEL=...) or change it later in the config. Larger machines can run larger local models; smaller ones lean on a compact default and opt into the cloud for heavy lifting.
Cloud inference and cost
When a device is in the cloud, remote inference is metered as part of billing. On-device inference is always free — it's your hardware.