Deploy DeepSeek R2 + Ollama on Kubernetes: Local AI Stack
Running a local LLM requires a model server, persistent weight storage, and a browser interface behind an ingress. This template deploys DeepSeek R2 via Ollama with Open WebUI on Kubernetes, producing an offline-capable AI stack accessible at a local domain.
What's Included
| Component | Type | Port | Role |
|---|---|---|---|
| Namespace | Namespace | - | Isolates all resources under deepseek |
| ollama-ds-pv | PersistentVolume | - | 30 GB hostPath volume for model weights |
| ollama-ds-pvc | PersistentVolumeClaim | - | Binds Ollama to model storage |
| webui-ds-pv | PersistentVolume | - | 5 GB hostPath volume for WebUI data |
| webui-ds-pvc | PersistentVolumeClaim | - | Binds WebUI to its data storage |
| ollama-ds-config | ConfigMap | - | Ollama runtime env vars |
| webui-ds-config | ConfigMap | - | Open WebUI env vars |
| ollama | Deployment | 11434 | Ollama inference server running DeepSeek R2 |
| ollama-ds-service | Service (NodePort) | 11434 / 30112 | Exposes Ollama internally and via NodePort |
| open-webui | Deployment | 8080 | Browser chat interface |
| open-webui-service | Service (NodePort) | 8080 / 30081 | Exposes WebUI internally and via NodePort |
| nginx | IngressClass | - | Declares NGINX as default ingress class |
| deepseek-ingress | Ingress | 80 | Routes deepseek.local to WebUI, ollama-ds.local to API |
| open-webui-hpa | HorizontalPodAutoscaler | - | Scales WebUI on CPU or memory pressure |
| deepseek-quota | ResourceQuota | - | Caps CPU, memory, and pod count in namespace |
| deepseek-limits | LimitRange | - | Sets per-container resource defaults and ceilings |
| ollama-network-policy | NetworkPolicy | 11434 | Restricts Ollama to WebUI pods and ingress controller |
Architecture Overview
Ollama runs as a single replica with a 30 GB PersistentVolume for model weights. An init container pulls DeepSeek R2 on first boot using the Ollama CLI directly. Open WebUI reaches Ollama via internal ClusterIP DNS. NGINX Ingress routes deepseek.local to the WebUI and ollama-ds.local to the raw API, with buffering off for token streaming. A NetworkPolicy restricts Ollama to WebUI pods and the ingress controller only.
Prerequisites
- Docker Desktop with Kubernetes enabled and WSL2 configured with at least 12 GB memory via
.wslconfig - NGINX Ingress Controller deployed in the
ingress-nginxnamespace 127.0.0.1 deepseek.local ollama-ds.localadded toC:\Windows\System32\drivers\etc\hosts- KubeKanvas CLI installed and running on your computer (Optional, if you want to use one-click deployment)
How to Deploy
- Confirm the NGINX Ingress Controller pod is running in
ingress-nginxbefore applying. - Verify
.wslconfigsetsmemory=12GBand restart Docker Desktop if you changed it. - Update the
hostPath.pathvalues in both PersistentVolume specs to match your storage path. Kubernetes creates the directories automatically on first mount. - Deploy the template to your cluster via the Play button in the top right bar. If you prefer to deploy manually, download the YAML and apply it with
kubectl apply -f deepseek-k8s-final.yaml. - Wait for all pods to reach Running status. You can monitor progress in the Release Monitor screen.
How to Test
- Run
kubectl logs -n deepseek -l app=ollama -c model-puller -fand confirmPull completeandInit doneappear before the main container starts. - Open
http://deepseek.localand confirm the Open WebUI interface loads. - Send a message and confirm DeepSeek R2 responds with streamed tokens.
- Hit
http://ollama-ds.local/api/tagsand confirm the JSON response listsdeepseek-r2.
Use Cases
- Offline LLM development: Test prompts against DeepSeek R2 with no internet or API token after the initial pull.
- Private code assistance: Point Continue.dev or any OpenAI-compatible client at
http://ollama-ds.localfor local inference with no data leaving the machine. - Kubernetes learning: Study a realistic manifest covering PVs, init containers, Ingress, HPA, NetworkPolicy, ResourceQuota, and LimitRange in one deployable file.
- Air-gapped environments: Pre-pull weights to the hostPath volume and deploy on a machine with no outbound internet.
- Multi-user setups: Enable Open WebUI signup so multiple developers share one inference server with isolated chat history.
Summary
This template configures a 17-resource Kubernetes stack that runs DeepSeek R2 via Ollama, exposes it through a streaming-optimized NGINX Ingress, and enforces namespace-level resource and network controls.
