Deploying OpenClaw in Kubernetes

A few weeks ago our team decided to give OpenClaw a proper run. Not just a quick test on someone's laptop, but a shared instance the whole team could rely on day to day. My first instinct was to grab a cheap VM somewhere, run docker compose up, and call it done. Simple enough.
Then I remembered we already had a small Kubernetes cluster humming along on Hetzner, handling a handful of internal services. Paying for yet another VM felt wasteful when there were spare resources sitting right there. So I figured: why not drop OpenClaw onto the cluster we already maintain?
This article is exactly that—everything I learned while getting OpenClaw running on Kubernetes, laid out step by step so you can skip the head-scratching.

1. Secrets & Isolation: The Foundation
First, we create a dedicated Namespace to keep our resources organized and a Secret to keep our keys under lock and key. Avoid environment variables in plain manifests—security first!
apiVersion: v1
kind: Namespace
metadata:
  name: openclaw
---
apiVersion: v1
kind: Secret
metadata:
  name: openclaw-secrets
  namespace: openclaw
type: Opaque
data:
  OPENCLAW_TELEGRAM_TOKEN: <base64-encoded-value>
  ANTHROPIC_API_KEY: <base64-encoded-value>
  DISCORD_BOT_TOKEN: <base64-encoded-value>
Pro Tip: Generate your values using echo -n 'your-value' | base64.
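A quick way to produce the values and sanity-check them before pasting into the manifest (the token string here is just a placeholder):

```shell
# Encode a placeholder token for the Secret's data field.
# printf '%s' avoids the trailing newline a bare echo would sneak
# into the encoded value; echo -n also works, printf is more portable.
VALUE='example-telegram-token'
ENCODED=$(printf '%s' "$VALUE" | base64)
echo "$ENCODED"

# Decode to confirm the round trip
DECODED=$(printf '%s' "$ENCODED" | base64 -d)
echo "$DECODED"
```

Alternatively, kubectl create secret generic openclaw-secrets --from-literal=ANTHROPIC_API_KEY=... does the base64 encoding for you and skips the manifest entirely.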
2. The Deployment
OpenClaw keeps its config on disk and runs cron jobs internally, so it behaves like a stateful app. Running multiple replicas sounds appealing on paper, but it opens a can of worms: shared storage requirements, duplicate cron executions, state conflicts between pods. For our team's usage a single pod turned out to be the sweet spot. Kubernetes still restarts it automatically if anything goes wrong, and one less moving part means one less thing to debug.
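One wrinkle worth knowing about: the default RollingUpdate strategy briefly runs the new pod alongside the old one during a rollout, and with the ReadWriteOnce volumes used below, the new pod can hang waiting for a volume that is still attached to the old pod. Setting the deployment strategy to Recreate sidesteps this; the relevant fragment to merge into the Deployment spec looks like:

```yaml
spec:
  replicas: 1
  strategy:
    type: Recreate   # stop the old pod before starting the new one
```

The cost is a few seconds of downtime per rollout, which is a fair trade for a single-replica internal tool.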
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openclaw-gateway
  template:
    metadata:
      labels:
        app: openclaw-gateway
    spec:
      containers:
        - name: gateway
          image: openclaw/openclaw:latest
          ports:
            - containerPort: 18789
          envFrom:
            - secretRef:
                name: openclaw-secrets
          volumeMounts:
            - name: config
              mountPath: /root/.openclaw
            - name: workspace
              mountPath: /root/workspace
          resources:
            limits:
              cpu: '2'
              memory: '2Gi'
            requests:
              cpu: '1'
              memory: '1Gi'
          livenessProbe:
            httpGet:
              path: /health
              port: 18789
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: 18789
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: config
          persistentVolumeClaim:
            claimName: openclaw-config-pvc
        - name: workspace
          persistentVolumeClaim:
            claimName: openclaw-workspace-pvc
The liveness probe is your safety net—if the gateway locks up, Kubernetes kills the pod and brings up a fresh one. The readiness probe keeps traffic away until the process is actually ready to serve requests.
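If the gateway turns out to be slow on first boot (say, while it reads state from the config volume; an assumption about OpenClaw's startup behavior, not something I measured), a startupProbe is a cleaner fix than inflating initialDelaySeconds. The liveness probe only takes over once the startup probe has succeeded:

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 18789
  periodSeconds: 5
  failureThreshold: 30   # allow up to ~150s of startup before restarting the pod
```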
3. Persistent Storage
With a single pod, standard ReadWriteOnce (RWO) block storage works perfectly—no need for shared filesystems like NFS or Longhorn RWX. AWS EBS, GCE Persistent Disk, or Hetzner Volumes are all fine choices.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-config-pvc
  namespace: openclaw
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-workspace-pvc
  namespace: openclaw
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
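Neither claim names a storageClassName, so the cluster's default StorageClass is used. To pin it explicitly (for example to the Hetzner CSI driver), add the field; hcloud-volumes is the usual name for that class, but verify with kubectl get storageclass on your cluster:

```yaml
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: hcloud-volumes   # cluster-specific; check kubectl get storageclass
  resources:
    requests:
      storage: 10Gi
```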
4. Networking: Ingress & Long-Lived Connections
OpenClaw relies on WebSockets to keep a persistent connection between your browser and the gateway. When you send a prompt, the gateway might need minutes to process it, streaming results back over that open connection the entire time. The problem is that most Ingress controllers default to a 60-second proxy timeout. If the response takes longer than that, NGINX closes the connection mid-stream and you get a broken session with no output. Bumping the read and send timeouts to 3600 seconds (one hour) gives long-running tasks enough room to finish without getting cut off.
apiVersion: v1
kind: Service
metadata:
  name: openclaw-service
  namespace: openclaw
spec:
  selector:
    app: openclaw-gateway
  ports:
    - name: gateway
      port: 18789
      targetPort: 18789
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openclaw-ingress
  namespace: openclaw
  annotations:
    cert-manager.io/cluster-issuer: 'letsencrypt-prod'
    nginx.ingress.kubernetes.io/proxy-read-timeout: '3600'
    nginx.ingress.kubernetes.io/proxy-send-timeout: '3600'
spec:
  tls:
    - hosts:
        - openclaw.example.com
      secretName: openclaw-tls   # cert-manager stores the issued certificate here
  rules:
    - host: openclaw.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: openclaw-service
                port:
                  number: 18789
5. Managing State
With a single pod, you avoid the common multi-replica headaches entirely—no duplicate cron executions, no shared-state conflicts. A few tips:
- External Backends: If using QMD or Mem0, run them as standalone deployments with their own ClusterIP services. This keeps concerns separated and makes each component independently manageable.
- Webhooks: Ensure your Ingress points to a stable URL. Telegram and Discord expect a single endpoint; Kubernetes routes traffic to your pod automatically.
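The external-backend pattern is just a second Deployment plus a ClusterIP Service in the same namespace. A minimal sketch for a memory backend, with the image name and port as placeholders (substitute whatever your Mem0 setup actually ships):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mem0
  namespace: openclaw
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mem0
  template:
    metadata:
      labels:
        app: mem0
    spec:
      containers:
        - name: mem0
          image: mem0/mem0:latest   # hypothetical image; use your own build or registry path
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: mem0
  namespace: openclaw
spec:
  selector:
    app: mem0
  ports:
    - port: 8000
      targetPort: 8000
  type: ClusterIP
```

The gateway then reaches it at http://mem0.openclaw.svc.cluster.local:8000 without anything leaving the cluster.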
6. Example: Monitoring a Statically Generated Site with WhatsApp Alerts
Here is one thing we actually set up with our running instance that I think shows the value of having OpenClaw on Kubernetes.
Imagine you run a site at https://openclawexample.mywebsite.com built with incremental static generation:
- Blog and informational pages are generated statically and revalidated over time.
- A pricing page pulls data from a database during static generation and refreshes it at a configured interval.
Even if the backend service that provides pricing data or the service that stores blog content fails or returns incorrect data, the website itself will usually keep serving HTML. The problem is that the content on those pages might be empty, stale, or incorrect—and you might not notice until a user reports it.
This is where OpenClaw helps. You can:
- Configure a scheduled job (via jobs.json stored on the shared config volume) that periodically checks key pages such as the blog index, a sample blog post, and the pricing page.
- Have that job call a small checker endpoint or script that verifies that expected data is present (for example, that the pricing table is not empty and the blog list has recent entries).
- If the checker detects missing or invalid data, OpenClaw triggers an alert message over WhatsApp to your phone so you can investigate immediately while the site is still up.
To make this work, in addition to the Kubernetes manifests shown above, you need to:
- Configure a WhatsApp channel in OpenClaw and add the required credentials (for example, WhatsApp API keys or tokens) as additional Kubernetes Secrets.
- Mount those secrets into the gateway (as shown for other tokens) and reference them in your OpenClaw configuration.
- Define a cron-style job in OpenClaw's jobs.json that calls your site URLs and validates the responses. This is not a Kubernetes CronJob—it is an OpenClaw-level config file that lives on the persistent volume you mounted at /root/.openclaw. A minimal entry looks like this:
[
  {
    "name": "check-pricing-page",
    "cron": "*/30 * * * *",
    "prompt": "Fetch https://openclawexample.mywebsite.com/pricing and verify the pricing table has at least one row. If it is empty or the page returns an error, send a WhatsApp alert saying 'Pricing page is broken'."
  }
]
OpenClaw reads this file on startup, schedules each entry as an internal cron task, and executes the prompt at the specified interval. Because the gateway pod already has network access to your site and the WhatsApp channel credentials mounted as secrets, no extra Kubernetes resources are needed—just this one JSON file.
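The scheduled prompt delegates the actual validation to the model, but the core check is simple enough to sketch as a plain script the job could call instead. Here the HTML string stands in for a real fetch of the pricing page:

```shell
# Stand-in for: HTML=$(curl -s https://openclawexample.mywebsite.com/pricing)
HTML='<table><tr><td>Starter</td><td>$10/mo</td></tr></table>'

# A pricing page with zero <tr> rows means the data pipeline broke,
# even though the static page itself is still being served.
if printf '%s' "$HTML" | grep -q '<tr>'; then
  echo "pricing table present"
else
  echo "ALERT: pricing table empty"
fi
```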
With this pattern, your statically generated site continues to serve traffic, but you get proactive, chat-based alerts when the underlying data pipelines or content APIs break.
7. Don't Forget Backups!
PVCs are not backups—they are live data. If your storage volume is corrupted, the pod loses its state. Use tools like Velero for Kubernetes-native snapshots and off-cluster backups to S3-compatible storage so you can recover both cluster state and persistent volumes.
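With Velero installed, the whole thing can be driven from a single Schedule resource. A minimal sketch, assuming Velero runs in the velero namespace with a volume snapshotter configured for your storage provider (the schedule time and retention are arbitrary choices):

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: openclaw-daily
  namespace: velero
spec:
  schedule: '0 3 * * *'        # daily at 03:00
  template:
    includedNamespaces:
      - openclaw
    snapshotVolumes: true      # snapshot the PVCs, not just the object manifests
    ttl: 720h0m0s              # keep backups for 30 days
```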
Honestly, the deployment itself was the boring part. The fun started when we actually began using it. Within a couple of days, someone on the team had wired up Telegram alerts for a staging environment, and another person set up a cron job to check whether our docs site was returning stale content. It kind of snowballed from there. Now it just sits on the cluster doing its thing, and we keep finding new uses for it. If you have a cluster with some room to spare, throw it on there and see what happens. You will probably end up using it for stuff you did not plan for.
Grab the YAMLs and Go
I have pushed every manifest from this article into a public GitHub repository—you can clone it, tweak the values, and kubectl apply the whole thing in one shot.

Either way, I hope this saves you the weekend I spent figuring it all out. If you run into something I missed, open an issue on the repo and I will add it.




