Deploy Gemma 4 + Ollama on Kubernetes Via Kubekanvas: Local AI Stack

Running a local LLM requires a model server, persistent weight storage, and a browser interface behind an ingress. This template deploys Google Gemma 4 via Ollama with Open WebUI on Kubernetes, producing an offline-capable AI stack accessible at a local domain.

| Component | Type | Port | Role |
|---|---|---|---|
| gemma4 | Namespace | - | Isolates all resources of the stack |
| ollama-pv | PersistentVolume | - | 30 GB hostPath volume for model weights |
| ollama-pvc | PersistentVolumeClaim | - | Binds Ollama to model storage |
| webui-pv | PersistentVolume | - | 5 GB hostPath volume for WebUI data |
| webui-pvc | PersistentVolumeClaim | - | Binds WebUI to its data storage |
| ollama-config | ConfigMap | - | Ollama runtime env vars |
| webui-config | ConfigMap | - | Open WebUI env vars |
| ollama | Deployment | 11434 | Ollama inference server running Gemma 4 |
| ollama-service | Service (NodePort) | 11434 / 30111 | Exposes Ollama internally and via NodePort |
| open-webui | Deployment | 8080 | Browser chat interface |
| open-webui-service | Service (NodePort) | 8080 / 30080 | Exposes WebUI internally and via NodePort |
| nginx | IngressClass | - | Declares NGINX as default ingress class |
| gemma4-ingress | Ingress | 80 | Routes gemma.local to WebUI, ollama.local to API |
| open-webui-hpa | HorizontalPodAutoscaler | - | Scales WebUI on CPU or memory pressure |
| gemma4-quota | ResourceQuota | - | Caps CPU, memory, and pod count in namespace |
| gemma4-limits | LimitRange | - | Sets per-container resource defaults and ceilings |
| ollama-network-policy | NetworkPolicy | 11434 | Restricts Ollama to WebUI pods and ingress controller |
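
A minimal sketch of the `ollama-pv` / `ollama-pvc` pair from the table follows. It assumes static binding through a storage class named `manual` and a placeholder host path of `/mnt/data/ollama`; both names are hypothetical and not taken from the template. `webui-pv` and `webui-pvc` would follow the same pattern at 5 GB.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ollama-pv                   # PersistentVolumes are cluster-scoped: no namespace
spec:
  capacity:
    storage: 30Gi                   # the table lists 30 GB for model weights
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep downloaded weights if the claim is deleted
  storageClassName: manual          # assumption: static binding, no dynamic provisioner
  hostPath:
    path: /mnt/data/ollama          # hypothetical path: replace with your storage path
    type: DirectoryOrCreate         # create the directory on first mount if it is missing
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
  namespace: gemma4
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  volumeName: ollama-pv             # pin the claim to this specific volume
  resources:
    requests:
      storage: 30Gi
```

Pinning the claim with `volumeName` matters when two hostPath volumes share a storage class: a 5 Gi WebUI claim could otherwise bind the 30 Gi model volume if it happened to be processed first.
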

Ollama runs as a single replica with a 30 GB PersistentVolume for model weights. An init container pulls Gemma 4 on first boot using the Ollama CLI directly. Open WebUI reaches Ollama via internal ClusterIP DNS. NGINX Ingress routes gemma.local to the WebUI and ollama.local to the raw API, with buffering off for token streaming. A NetworkPolicy restricts Ollama to WebUI pods and the ingress controller only.
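
The init-container pattern described above can be sketched roughly as follows. Because `ollama pull` talks to a running server, the sketch starts `ollama serve` in the background, waits until the CLI can reach it, pulls the model, and exits so the main container can start. The `ollama/ollama:latest` image tag, the `/root/.ollama` mount path, the `gemma4` model tag, the `envFrom` wiring, and the `Pull complete` / `Init done` echo lines are assumptions chosen to match the names in this article; the template's actual script may differ. The second document shows a `webui-config` entry that points Open WebUI at Ollama's in-cluster DNS name through Open WebUI's `OLLAMA_BASE_URL` setting.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: gemma4
spec:
  replicas: 1
  strategy:
    type: Recreate                  # stop the old pod before starting a new one on the shared volume
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      initContainers:
        - name: model-puller
          image: ollama/ollama:latest   # assumption: official image, tag not from the template
          command: ["/bin/sh", "-c"]
          args:
            - |
              set -e
              # ollama pull needs a server, so run one temporarily in the background
              ollama serve &
              SERVER_PID=$!
              # Poll until the CLI can reach the local server
              until ollama list >/dev/null 2>&1; do sleep 2; done
              ollama pull gemma4        # assumption: model tag
              echo "Pull complete"
              kill "$SERVER_PID"
              echo "Init done"
          volumeMounts:
            - name: models
              mountPath: /root/.ollama  # default model directory in the official image
      containers:
        - name: ollama
          image: ollama/ollama:latest   # default entrypoint runs `ollama serve`
          ports:
            - containerPort: 11434
          envFrom:
            - configMapRef:
                name: ollama-config     # assumption: runtime env vars injected wholesale
          volumeMounts:
            - name: models
              mountPath: /root/.ollama
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ollama-pvc
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: webui-config
  namespace: gemma4
data:
  # In-cluster DNS name of the Ollama Service (its ClusterIP), not the NodePort
  OLLAMA_BASE_URL: "http://ollama-service.gemma4.svc.cluster.local:11434"
```

`strategy: Recreate` is a design choice for one replica on one ReadWriteOnce volume: it prevents two Ollama servers from writing to the same model directory during a rollout. Once the weights are on the volume, later restarts should not need to download them again.
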
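
Prerequisites:

- Docker Desktop on Windows with the WSL 2 backend and Kubernetes enabled, with WSL memory configured in `.wslconfig`
- The NGINX Ingress controller installed in the `ingress-nginx` namespace
- `127.0.0.1 gemma.local ollama.local` added to `C:\Windows\System32\drivers\etc\hosts`

Deployment steps:

1. Confirm the NGINX Ingress controller is running in `ingress-nginx` before applying the manifest.
2. Check that `.wslconfig` sets `memory=12GB`. If you changed it, run `wsl --shutdown` and restart Docker Desktop so the new limit takes effect.
3. Update the `hostPath.path` values in both PersistentVolume specs to match your storage path. Kubernetes creates the directories automatically on first mount.
4. Apply the manifest with `kubectl apply -f gemma4-k8s-final.yaml`.
5. Follow the init container with `kubectl logs -n gemma4 -l app=ollama -c model-puller -f` and confirm `Pull complete` and `Init done` appear before the main container starts.
6. Open `http://gemma.local` and confirm the Open WebUI interface loads.
7. Request `http://ollama.local/api/tags` and confirm the JSON response lists `gemma4`.
8. Point API clients at `http://ollama.local` for local inference with no data leaving the machine.

This template configures a 17-resource Kubernetes stack that runs Gemma 4 via Ollama, exposes it through a streaming-optimized NGINX Ingress, and enforces namespace-level resource and network controls.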
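
The streaming-optimized Ingress and the network controls around Ollama might look like the sketch below. The `proxy-buffering: "off"` annotation is what lets NGINX pass tokens to the client as Ollama emits them instead of holding the response in a buffer. The 600-second timeouts, the `app: open-webui` pod label, and selecting the controller's namespace by its `kubernetes.io/metadata.name` label are assumptions, not values taken from the template.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gemma4-ingress
  namespace: gemma4
  annotations:
    # Forward response bytes as they arrive instead of buffering the whole reply
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    # Assumption: allow long generations before NGINX gives up on the upstream
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
spec:
  ingressClassName: nginx
  rules:
    - host: gemma.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: open-webui-service
                port:
                  number: 8080
    - host: ollama.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ollama-service
                port:
                  number: 11434
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ollama-network-policy
  namespace: gemma4
spec:
  podSelector:
    matchLabels:
      app: ollama                   # applies to the Ollama pods only
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Open WebUI pods in the same namespace (label is an assumption)
        - podSelector:
            matchLabels:
              app: open-webui
        # Any pod in the ingress-nginx namespace, i.e. the ingress controller
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 11434
```

NetworkPolicy objects only take effect when the cluster's network plugin enforces them, so it is worth confirming that your local cluster does before relying on this restriction.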