Deploy Gemma 4 + Ollama on Kubernetes Via Kubekanvas: Local AI Stack

Running a local LLM requires a model server, persistent weight storage, and a browser interface behind an ingress. This template deploys Google Gemma 4 via Ollama with Open WebUI on Kubernetes, producing an offline-capable AI stack accessible at a local domain.

What's Included

Component	Type	Port	Role
Namespace	Namespace	-	Isolates all resources under `gemma4`
ollama-pv	PersistentVolume	-	30 GB hostPath volume for model weights
ollama-pvc	PersistentVolumeClaim	-	Binds Ollama to model storage
webui-pv	PersistentVolume	-	5 GB hostPath volume for WebUI data
webui-pvc	PersistentVolumeClaim	-	Binds WebUI to its data storage
ollama-config	ConfigMap	-	Ollama runtime env vars
webui-config	ConfigMap	-	Open WebUI env vars
ollama	Deployment	11434	Ollama inference server running Gemma 4
ollama-service	Service (NodePort)	11434 / 30111	Exposes Ollama internally and via NodePort
open-webui	Deployment	8080	Browser chat interface
open-webui-service	Service (NodePort)	8080 / 30080	Exposes WebUI internally and via NodePort
nginx	IngressClass	-	Declares NGINX as default ingress class
gemma4-ingress	Ingress	80	Routes gemma.local to WebUI, ollama.local to API
open-webui-hpa	HorizontalPodAutoscaler	-	Scales WebUI on CPU or memory pressure
gemma4-quota	ResourceQuota	-	Caps CPU, memory, and pod count in namespace
gemma4-limits	LimitRange	-	Sets per-container resource defaults and ceilings
ollama-network-policy	NetworkPolicy	11434	Restricts Ollama to WebUI pods and ingress controller

Architecture Overview

Ollama runs as a single replica with a 30 GB PersistentVolume for model weights. An init container pulls Gemma 4 on first boot using the Ollama CLI directly. Open WebUI reaches Ollama via internal ClusterIP DNS. NGINX Ingress routes gemma.local to the WebUI and ollama.local to the raw API, with buffering off for token streaming. A NetworkPolicy restricts Ollama to WebUI pods and the ingress controller only.

Prerequisites

Docker Desktop with Kubernetes enabled and WSL2 configured with at least 12 GB memory via .wslconfig
NGINX Ingress Controller deployed in the ingress-nginx namespace
127.0.0.1 gemma.local ollama.local added to C:\Windows\System32\drivers\etc\hosts
KubeKanvas CLI installed and running on your computer (Optional, if you want to use one-click deployment)

How to Deploy

Confirm the NGINX Ingress Controller pod is running in ingress-nginx before applying.
Verify .wslconfig sets memory=12GB and restart Docker Desktop if you changed it.
Update the hostPath.path values in both PersistentVolume specs to match your storage path. Kubernetes creates the directories automatically on first mount.
Deploy the template to your cluster via the Play button in the top right bar. If you prefer to deploy manually, download the YAML and apply it with kubectl apply -f gemma4-k8s-final.yaml.
Wait for all pods to reach Running status. You can monitor progress in the Release Monitor screen.

How to Test

Run kubectl logs -n gemma4 -l app=ollama -c model-puller -f and confirm Pull complete and Init done appear before the main container starts.
Open http://gemma.local and confirm the Open WebUI interface loads.
Send a message and confirm Gemma 4 responds with streamed tokens.
Hit http://ollama.local/api/tags and confirm the JSON response lists gemma4.

Use Cases

Offline LLM development: Test prompts against Gemma 4 with no internet or API token after the initial pull.
Private code assistance: Point Continue.dev or any OpenAI-compatible client at http://ollama.local for local inference with no data leaving the machine.
Kubernetes learning: Study a realistic manifest covering PVs, init containers, Ingress, HPA, NetworkPolicy, ResourceQuota, and LimitRange in one deployable file.
Air-gapped environments: Pre-pull weights to the hostPath volume and deploy on a machine with no outbound internet.
Multi-user setups: Enable Open WebUI signup so multiple developers share one inference server with isolated chat history.

Summary

This template configures a 17-resource Kubernetes stack that runs Gemma 4 via Ollama, exposes it through a streaming-optimized NGINX Ingress, and enforces namespace-level resource and network controls.

Tags:

GemmaKubernetesKubekanvasLLMK8s

Created by:

Siddiqui

Deploy Gemma 4 + Ollama on Kubernetes Via Kubekanvas: Local AI Stack template preview

0 uses

What's Included

Component	Type	Port	Role
Namespace	Namespace	-	Isolates all resources under `gemma4`
ollama-pv	PersistentVolume	-	30 GB hostPath volume for model weights
ollama-pvc	PersistentVolumeClaim	-	Binds Ollama to model storage
webui-pv	PersistentVolume	-	5 GB hostPath volume for WebUI data
webui-pvc	PersistentVolumeClaim	-	Binds WebUI to its data storage
ollama-config	ConfigMap	-	Ollama runtime env vars
webui-config	ConfigMap	-	Open WebUI env vars
ollama	Deployment	11434	Ollama inference server running Gemma 4
ollama-service	Service (NodePort)	11434 / 30111	Exposes Ollama internally and via NodePort
open-webui	Deployment	8080	Browser chat interface
open-webui-service	Service (NodePort)	8080 / 30080	Exposes WebUI internally and via NodePort
nginx	IngressClass	-	Declares NGINX as default ingress class
gemma4-ingress	Ingress	80	Routes gemma.local to WebUI, ollama.local to API
open-webui-hpa	HorizontalPodAutoscaler	-	Scales WebUI on CPU or memory pressure
gemma4-quota	ResourceQuota	-	Caps CPU, memory, and pod count in namespace
gemma4-limits	LimitRange	-	Sets per-container resource defaults and ceilings
ollama-network-policy	NetworkPolicy	11434	Restricts Ollama to WebUI pods and ingress controller

Architecture Overview

Prerequisites

Docker Desktop with Kubernetes enabled and WSL2 configured with at least 12 GB memory via .wslconfig

NGINX Ingress Controller deployed in the ingress-nginx namespace

127.0.0.1 gemma.local ollama.local added to C:\Windows\System32\drivers\etc\hosts

KubeKanvas CLI installed and running on your computer (Optional, if you want to use one-click deployment)

How to Deploy

Confirm the NGINX Ingress Controller pod is running in ingress-nginx before applying.

Verify .wslconfig sets memory=12GB and restart Docker Desktop if you changed it.

Update the hostPath.path values in both PersistentVolume specs to match your storage path. Kubernetes creates the directories automatically on first mount.

Deploy the template to your cluster via the Play button in the top right bar. If you prefer to deploy manually, download the YAML and apply it with kubectl apply -f gemma4-k8s-final.yaml.

Wait for all pods to reach Running status. You can monitor progress in the Release Monitor screen.

How to Test

Run kubectl logs -n gemma4 -l app=ollama -c model-puller -f and confirm Pull complete and Init done appear before the main container starts.

Open http://gemma.local and confirm the Open WebUI interface loads.

Send a message and confirm Gemma 4 responds with streamed tokens.

Hit http://ollama.local/api/tags and confirm the JSON response lists gemma4.

Use Cases

Offline LLM development: Test prompts against Gemma 4 with no internet or API token after the initial pull.

Private code assistance: Point Continue.dev or any OpenAI-compatible client at http://ollama.local for local inference with no data leaving the machine.

Kubernetes learning: Study a realistic manifest covering PVs, init containers, Ingress, HPA, NetworkPolicy, ResourceQuota, and LimitRange in one deployable file.

Air-gapped environments: Pre-pull weights to the hostPath volume and deploy on a machine with no outbound internet.

Multi-user setups: Enable Open WebUI signup so multiple developers share one inference server with isolated chat history.