This documentation covers the complete setup of a K3s cluster with Flux for GitOps and Vault for secrets management.
- Mesh VPN Installation
- K3s Installation
- Control Plane Setup
- Worker Node Setup
- Mac Worker via Multipass
- Flux Installation
- Helm Installation
- Vault Installation
- MinIO Installation
- Phoenix Installation
- OpenObserve Installation
- Inference API Services
- Troubleshooting
You need to run a Headscale server on your chosen server:
mkdir -p ./headscale/{config,lib,run}
cd ./headscale
docker run \
--name headscale \
--detach \
--volume "$(pwd)/config:/etc/headscale" \
--volume "$(pwd)/lib:/var/lib/headscale" \
--volume "$(pwd)/run:/var/run/headscale" \
--publish 0.0.0.0:8080:8080 \
--publish 0.0.0.0:9090:9090 \
docker.io/headscale/headscale:<VERSION> \
serve
docker exec -it headscale \
headscale users create myfirstuser
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up --login-server=http://<IP_SERVER>:8080
headscale nodes register --user USERNAME --key <GENERATED_KEY>
On Docker:
sudo docker exec -it headscale headscale nodes register --user k3s-headscale --key <KEY>
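To confirm the mesh is up before installing K3s, check both ends (node names and IPs will differ in your setup):
# On the Headscale server: list registered nodes
sudo docker exec -it headscale headscale nodes list
# On each joined machine: show the mesh IP and peer status
tailscale ip -4
tailscale status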
curl -sfL https://get.k3s.io | sh -
If you encounter the error "failed to find memory cgroup (v2)", follow these steps:
Enable cgroups v2:
- Edit the boot configuration:
sudo nano /boot/firmware/cmdline.txt
- Add the following parameters to the end of the existing line (everything must be on one line):
cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1
- Reboot the Raspberry Pi:
sudo reboot
- Restart K3s service:
sudo systemctl start k3s
sudo systemctl status k3s
- Verify the cluster is working:
sudo k3s kubectl get nodes
On the control plane node, retrieve the token and IP:
# Get the node token
sudo cat /var/lib/rancher/k3s/server/node-token
# Get the control plane IP
hostname -I
On the worker node, run:
curl -sfL https://get.k3s.io | K3S_URL=https://CONTROL_PLANE_IP:6443 K3S_TOKEN=YOUR_TOKEN sh -
or
curl -sfL https://get.k3s.io | K3S_URL=https://CONTROL_PLANE_IP:6443 \
K3S_TOKEN=YOUR_TOKEN \
sh -s - --node-ip=$(tailscale ip -4)
Replace:
- CONTROL_PLANE_IP with the actual IP of your control plane
- YOUR_TOKEN with the token from the previous step
Verify the agent is running:
sudo systemctl status k3s-agent
For Mac systems, use Colima to create an Ubuntu VM:
colima start --cpu 4 --memory 4 --disk 30
colima ssh
Then follow the worker installation steps above within the VM.
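Once the agent inside the VM has started, the new node should appear from the control plane; a quick check (the node name is whatever hostname the Colima VM reports):
# Run on the control plane
sudo k3s kubectl get nodes -o wide
# The Colima VM should be listed as Ready, with its Headscale IP if --node-ip was set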
macOS:
brew install fluxcd/tap/flux
Linux:
curl -s https://fluxcd.io/install.sh | sudo bash
Install the Flux components into the cluster:
flux install
After installation, you should see Flux components running:
kubectl get pods -A
Expected output should include:
NAMESPACE NAME READY STATUS RESTARTS AGE
flux-system helm-controller-5c898f4887-568tw 1/1 Running 0 28s
flux-system kustomize-controller-7bcf986f97-67hfv 1/1 Running 0 28s
flux-system notification-controller-5f66f99d4d-s6qll 1/1 Running 0 28s
flux-system source-controller-54bc45dc6-7zcpk 1/1 Running 0 28s
Helm is required for installing Vault and other applications in the cluster.
macOS:
brew install helm
Linux:
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
Verify the installation:
helm version
Add the chart repositories:
helm repo add hashicorp https://helm.releases.hashicorp.com
# The openbao/openbao chart used below is published in the OpenBao Helm repository
helm repo add openbao https://openbao.github.io/openbao-helm
Install OpenBao (the Vault-compatible secrets manager used here):
helm install openbao openbao/openbao --namespace soludev -f config/dev/openbao/values.yaml
Note: Make sure you have a values.yaml file configured for your Vault setup.
kubectl exec -n soludev openbao-0 -- vault operator init
This command will output unseal keys and a root token. Save these securely!
Use any 3 of the 5 unseal keys provided during initialization:
kubectl exec -n soludev openbao-0 -- vault operator unseal '<key1>'
kubectl exec -n soludev openbao-0 -- vault operator unseal '<key2>'
kubectl exec -n soludev openbao-0 -- vault operator unseal '<key3>'
Replace <key1>, <key2>, and <key3> with actual unseal keys from the initialization step.
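Before moving on, the unseal state can be confirmed; a minimal check using the same CLI as above:
kubectl exec -n soludev openbao-0 -- vault status
# The output should show the Sealed field as false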
After unsealing Vault, you need to enable and configure Kubernetes authentication to allow pods to authenticate with Vault.
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
helm install external-secrets external-secrets/external-secrets \
-n external-secrets-system
export VAULT_TOKEN=<YOUR_TOKEN>
kubectl exec -n soludev openbao-0 -- \
env VAULT_TOKEN="$VAULT_TOKEN" \
bao auth enable kubernetes
# Create service account
kubectl create serviceaccount openbao-auth -n soludev
# Create cluster role binding
kubectl create clusterrolebinding openbao-auth \
--clusterrole=system:auth-delegator \
--serviceaccount=soludev:openbao-auth
kubectl create serviceaccount external-secrets-sa -n soludev
# Generate token for the service account
kubectl create token openbao-auth -n soludev
# Get the Kubernetes host URL (usually the API server)
K8S_HOST=$(kubectl config view --raw --minify --flatten -o jsonpath='{.clusters[].cluster.server}')
# Get the service account token
SA_TOKEN=$(kubectl create token openbao-auth -n soludev)
# Get the CA certificate
K8S_CA_CERT=$(kubectl get configmap kube-root-ca.crt -o jsonpath='{.data.ca\.crt}')
# Configure the Kubernetes auth method
kubectl exec -n soludev openbao-0 -- env VAULT_TOKEN="$VAULT_TOKEN" \
bao write auth/kubernetes/config \
kubernetes_host="https://kubernetes.default.svc" \
kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
disable_iss_validation=true
kubectl exec -n soludev openbao-0 -- env VAULT_TOKEN="$VAULT_TOKEN" \
bao write auth/kubernetes/role/external-secrets-role \
bound_service_account_names=external-secrets-sa \
bound_service_account_namespaces=soludev \
policies=external-secrets-policy \
ttl=24h
kubectl exec -n soludev openbao-0 -- sh -c "echo 'path \"kv/data/*\" {
capabilities = [\"read\", \"list\"]
}
path \"kv/metadata/*\" {
capabilities = [\"read\", \"list\"]
}' | env VAULT_TOKEN=\"$VAULT_TOKEN\" bao policy write external-secrets-policy -"
After installing Flux, you need to configure GitRepository and Kustomization resources to enable GitOps workflows.
Create a GitRepository resource to tell Flux where to find your configuration:
# config/dev/gitrepository.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
name: kaiohz-repo
namespace: flux-system
spec:
interval: 1m
url: https://github.com/Kaiohz/flux.git
ref:
branch: main
Apply the configuration:
kubectl apply -f config/dev/gitrepository.yaml
Create a Kustomization resource to define how Flux should reconcile your manifests:
# config/dev/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: kaiohz-kustomization
namespace: flux-system
spec:
interval: 2m
timeout: 90s
wait: false # set wait to true if you deploy resources like Deployment, ConfigMap without Helm, for HelmRelease set wait to false
targetNamespace: kaiohz
sourceRef:
kind: GitRepository
name: kaiohz-repo
# If the helmreleases are in a directory use the parameter below
path: "dev"
# When the Git revision changes, the manifests are reconciled automatically. If previously applied objects are missing from the current revision, these objects are deleted from the cluster when spec.prune is enabled
prune: true
Apply the configuration:
kubectl apply -f config/dev/kustomization.yaml
You can also configure additional GitRepository resources for external projects:
# dev/gitrepositories.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
name: prospection-api-mcp
namespace: kaiohz
spec:
interval: 2m
url: https://github.com/Kaiohz/prospectio-api-mcp.git
ref:
branch: main
Apply the configuration:
kubectl apply -f dev/gitrepositories.yaml
This section covers setting up services to access inference APIs running on external machines (Jetson) from within the K3s cluster.
The inference services allow pods in the cluster to access AI inference APIs running on external machines using standard Kubernetes service discovery. This enables applications to use inference endpoints via cluster-internal URLs.
For accessing the Ollama inference API running on a Jetson device:
# dev/ollama-inference-jetson.yaml
apiVersion: v1
kind: Service
metadata:
name: ollama-inference-jetson
namespace: kaiohz
spec:
type: ExternalName
externalName: 192.168.1.6 # Replace with your Jetson IP
ports:
- port: 11434
Access from pods:
http://ollama-inference-jetson.kaiohz.svc.cluster.local:11434
For accessing the inference API running on a Mac:
# dev/ollama-inference-mac.yaml
apiVersion: v1
kind: Service
metadata:
name: ollama-inference-mac
namespace: kaiohz
spec:
type: ExternalName
externalName: 192.168.1.10 # Replace with your Mac IP
ports:
- port: 11434
Access from pods:
http://ollama-inference-mac.kaiohz.svc.cluster.local:11434
ExternalName Service:
- type: ExternalName: Creates a CNAME record pointing to the external host
- externalName: IP address or hostname of the external service
- ports: Port mapping for the service
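For a quick end-to-end check of an ExternalName service, the Ollama API can be queried from a throwaway pod; /api/tags is the standard Ollama model-listing endpoint and is assumed to be reachable on the Jetson:
# One-off request through the ExternalName service defined above
kubectl run ollama-check -n kaiohz --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s http://ollama-inference-jetson.kaiohz.svc.cluster.local:11434/api/tags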
Alternative: Endpoint-based Service
For more control, you can create a service with explicit endpoints:
apiVersion: v1
kind: Service
metadata:
name: mac-inference-endpoints
namespace: kaiohz
spec:
ports:
- port: 11434
targetPort: 11434
protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
name: mac-inference-endpoints
namespace: kaiohz
subsets:
- addresses:
- ip: 192.168.1.10 # Mac IP
ports:
- port: 11434
In your application pods, you can now access the inference APIs using cluster-internal URLs:
# Example deployment using the inference service
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
spec:
template:
spec:
containers:
- name: app
image: my-app:latest
env:
- name: MAC_INFERENCE_URL
value: "http://ollama-inference-mac.kaiohz.svc.cluster.local:11434"
- name: JETSON_INFERENCE_URL
value: "http://ollama-inference-jetson.kaiohz.svc.cluster.local:11434"You can add health checks to monitor the external services:
apiVersion: v1
kind: Pod
metadata:
name: inference-healthcheck
spec:
containers:
- name: healthcheck
image: curlimages/curl
command:
- /bin/sh
- -c
- |
while true; do
echo "Checking Mac inference..."
curl -f http://ollama-inference-mac.kaiohz.svc.cluster.local:11434/health || echo "Mac inference down"
echo "Checking Jetson inference..."
curl -f http://ollama-inference-jetson.kaiohz.svc.cluster.local:11434/health || echo "Jetson inference down"
sleep 30
done
Ensure that:
- The external machines (Mac/Jetson) are accessible from the K3s cluster nodes
- Firewall rules allow traffic on the inference API ports (11434)
- The inference services are running and bound to the correct network interfaces
Deploy the inference services to your cluster:
# Apply Jetson service
kubectl apply -f dev/ollama-inference-jetson.yaml
# Apply Mac service
kubectl apply -f dev/ollama-inference-mac.yaml
# Verify services are created
kubectl get services -A | grep inference
Test the services from within the cluster:
# Create a test pod
kubectl run test-pod -n kaiohz --image=curlimages/curl --rm -it -- /bin/sh
# Test Mac inference service (short names resolve because the pod runs in the kaiohz namespace)
curl http://ollama-inference-mac:11434
# Test Jetson inference service
curl http://ollama-inference-jetson:11434
Check that Flux is monitoring your repositories:
# Check GitRepository resources
kubectl get gitrepositories -A
# Check Kustomization resources
kubectl get kustomizations -A
# Check Flux reconciliation status
flux get sources git
flux get kustomizations
Your repository should be structured like this for optimal GitOps workflow:
flux/
├── config/
│ └── dev/
│ ├── gitrepository.yaml # Main repo configuration
│ └── kustomization.yaml # Main kustomization
└── dev/
├── gitrepositories.yaml # Additional repos
├── pgvector/ # Application manifests
├── prospectio-api-mcp/ # Application manifests
└── vault/ # Vault configuration
GitRepository Parameters:
- interval: How often Flux checks for changes
- url: Git repository URL (HTTPS or SSH)
- ref.branch: Branch to monitor
- ref.tag: Specific tag to use (alternative to branch)
Kustomization Parameters:
- interval: How often to reconcile manifests
- timeout: Maximum time for reconciliation
- wait: Whether to wait for resources to be ready
- targetNamespace: Default namespace for resources
- path: Directory in the repo containing manifests
- prune: Remove resources not present in the current revision
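When testing changes, reconciliation can be triggered immediately instead of waiting for the interval; these are standard flux CLI commands using the resource names from this setup:
# Fetch the latest Git revision right away
flux reconcile source git kaiohz-repo -n flux-system
# Re-apply the manifests from that revision
flux reconcile kustomization kaiohz-kustomization -n flux-system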
Before deploying IngressRouteTCP resources, ensure Traefik CRDs are installed in your cluster.
kubectl apply -f https://raw.githubusercontent.com/traefik/traefik/v3.3.6/docs/content/reference/dynamic-configuration/kubernetes-crd-definition-v1.yml
Add installCRDs: true to your Traefik HelmChart values:
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
name: traefik
namespace: kube-system
spec:
chart: https://%{KUBERNETES_API}%/static/charts/traefik-34.2.1+up34.2.0.tgz
valuesContent: |-
installCRDs: true
additionalArguments:
- "--entrypoints.postgres.address=:5432/tcp"
ports:
postgres:
port: 5432
expose:
default: true
exposedPort: 5432
protocol: TCP
# ... rest of your configuration
Check if Traefik CRDs are installed:
kubectl get crd | grep traefik
You should see output similar to:
ingressroutes.traefik.io
ingressroutetcps.traefik.io
ingressrouteudps.traefik.io
middlewares.traefik.io
middlewaretcps.traefik.io
serverstransports.traefik.io
tlsoptions.traefik.io
tlsstores.traefik.io
traefikservices.traefik.io
If you get the error "no matches for kind 'IngressRouteTCP'", it means the CRDs are not installed. Follow one of the installation methods above.
Common issues:
- CRDs not found: Ensure Traefik is deployed with installCRDs: true or install the CRDs manually
- Version mismatch: Make sure the CRD version matches your Traefik version
- Permissions: Ensure you have cluster-admin permissions to install CRDs
This section explains how to automatically synchronize secrets stored in Vault to Kubernetes secrets using External Secrets Operator (ESO).
# Add the repo
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
# Install ESO
helm install external-secrets external-secrets/external-secrets \
-n external-secrets-system \
--create-namespace
# Verify installation
kubectl get pods -n external-secrets-system
Or apply the manifests directly without Helm:
kubectl apply -f https://raw.githubusercontent.com/external-secrets/external-secrets/main/deploy/crds/bundle.yaml
kubectl apply -f https://raw.githubusercontent.com/external-secrets/external-secrets/main/deploy/charts/external-secrets/templates/deployment.yaml
# vault-secret-store.yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: openbao-backend
namespace: soludev
spec:
provider:
vault:
server: "http://vault.soludev.svc.cluster.local:8200"
path: "secret" # Path to your KV engine
version: "v2" # Version of the KV engine (v1 or v2)
auth:
kubernetes:
mountPath: "kubernetes"
role: "external-secrets-role"
serviceAccountRef:
name: "external-secrets-sa"# Apply the configuration
kubectl apply -f vault-secret-store.yaml
# Verify the SecretStore
kubectl get secretstore openbao-backend -n soludev
kubectl describe secretstore openbao-backend -n soludev
To create a secret through the OpenBao/Vault UI:
- Secrets > secret/ (or your engine)
- Create secret
Example structures:
# Secrets for an application
Path: myapp/database
- username: mydbuser
- password: supersecret123
- host: db.example.com
- port: 5432
# Secrets for the API
Path: myapp/api
- key: abc123xyz
- secret: def456uvw
- endpoint: https://api.example.com
# Secrets for certificates
Path: myapp/tls
- cert: -----BEGIN CERTIFICATE-----...
- key: -----BEGIN PRIVATE KEY-----...
# Database secrets
vault kv put secret/myapp/database \
username=mydbuser \
password=supersecret123 \
host=db.example.com \
port=5432
# API secrets
vault kv put secret/myapp/api \
key=abc123xyz \
secret=def456uvw \
endpoint=https://api.example.com
# General configuration
vault kv put secret/myapp/config \
env=production \
debug=false \
log_level=info
# myapp-database-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: myapp-database
namespace: kaiohz
spec:
refreshInterval: 60s # Synchronize every minute
secretStoreRef:
name: vault-backend
kind: SecretStore
target:
name: myapp-db-secret # Name of the K8s secret to be created
creationPolicy: Owner # ESO manages the secret
type: Opaque # Type of K8s secret
data:
- secretKey: DB_USERNAME # Key in the K8s secret
remoteRef:
key: myapp/database # Path in Vault
property: username # Specific property
- secretKey: DB_PASSWORD
remoteRef:
key: myapp/database
property: password
- secretKey: DB_HOST
remoteRef:
key: myapp/database
property: host
- secretKey: DB_PORT
remoteRef:
key: myapp/database
property: port
# myapp-api-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: myapp-api
namespace: kaiohz
spec:
refreshInterval: 30s
secretStoreRef:
name: vault-backend
kind: SecretStore
target:
name: myapp-api-secret
creationPolicy: Owner
dataFrom:
- extract:
key: myapp/api # Retrieves ALL keys from this Vault secret
# myapp-tls-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: myapp-tls
namespace: kaiohz
spec:
refreshInterval: 300s # 5 minutes for certificates
secretStoreRef:
name: vault-backend
kind: SecretStore
target:
name: myapp-tls-secret
creationPolicy: Owner
type: kubernetes.io/tls # Special type for TLS
data:
- secretKey: tls.crt
remoteRef:
key: myapp/tls
property: cert
- secretKey: tls.key
remoteRef:
key: myapp/tls
property: key
# Apply all ExternalSecrets
kubectl apply -f myapp-database-secret.yaml
kubectl apply -f myapp-api-secret.yaml
kubectl apply -f myapp-tls-secret.yaml
# Check status
kubectl get externalsecrets -n kaiohz
# View details (important for debugging)
kubectl describe externalsecret myapp-database -n kaiohz
# List created secrets
kubectl get secrets -n kaiohz | grep myapp
# View the content of a secret (base64)
kubectl get secret myapp-db-secret -n kaiohz -o yaml
# Decode a secret to verify
kubectl get secret myapp-db-secret -n kaiohz -o jsonpath='{.data.DB_USERNAME}' | base64 -d
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
namespace: kaiohz
spec:
replicas: 1
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: app
image: myapp:latest
env:
# Environment variables from secrets
- name: DB_USERNAME
valueFrom:
secretKeyRef:
name: myapp-db-secret
key: DB_USERNAME
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: myapp-db-secret
key: DB_PASSWORD
- name: API_KEY
valueFrom:
secretKeyRef:
name: myapp-api-secret
key: key
# Volume for TLS certificates
volumeMounts:
- name: tls-certs
mountPath: /etc/ssl/certs/app
readOnly: true
volumes:
- name: tls-certs
secret:
secretName: myapp-tls-secret
Alternatively, load all keys from a secret as environment variables:
envFrom:
- secretRef:
name: myapp-api-secret # All keys become environment variables
Run these commands inside the Colima VM:
sudo mkdir -p /var/lib/k3s-storage
sudo chmod 777 /var/lib/k3s-storage
sudo apt update
sudo apt install -y nfs-kernel-server
echo "/var/lib/k3s-storage 100.64.0.0/10(rw,sync,no_subtree_check,no_root_squash,insecure)" | sudo tee -a /etc/exports
sudo exportfs -ra
sudo systemctl enable nfs-kernel-server
sudo systemctl restart nfs-kernel-server
showmount -e localhost
Test the NFS port (2049) from another node:
# With netcat
nc -zv <IP_HEADSCALE_COLIMA> 2049
telnet <IP_HEADSCALE_COLIMA> 2049
nmap -p 2049 <IP_HEADSCALE_COLIMA>
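Beyond the port checks, an actual mount from another node proves the export works end to end; a sketch, with the mount point and IP as placeholders:
# On a worker node (requires the NFS client: nfs-common on Debian/Ubuntu, nfs-utils elsewhere)
sudo mkdir -p /mnt/nfs-test
sudo mount -t nfs <IP_HEADSCALE_COLIMA>:/var/lib/k3s-storage /mnt/nfs-test
touch /mnt/nfs-test/write-test && ls -l /mnt/nfs-test
sudo umount /mnt/nfs-test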
MinIO is a high-performance, S3-compatible object storage service. This section covers deploying MinIO in your K3s cluster using Helm.
helm repo add minio https://charts.min.io/
helm repo update
kubectl create namespace minio
Create or update your config/dev/minio/values.yaml file with the following configuration:
# MinIO Helm values
mode: standalone
replicas: 1
# Root credentials (change these in production!)
rootUser: minioadmin
rootPassword: minioadmin
# Storage configuration
persistence:
enabled: true
storageClass: nfs-cluster-global # Use your storage class
size: 50Gi # Adjust based on your PV capacity
# Resource limits
resources:
requests:
memory: 512Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
# Service configuration
service:
type: ClusterIP
port: 9000
consoleService:
type: ClusterIP
port: 9001
# Auto-create buckets
buckets:
- name: documents
policy: none
- name: uploads
policy: none
helm install minio minio/minio \
-n minio \
-f config/dev/minio/values.yaml
# Check if MinIO pod is running
kubectl get pods -n minio
# Check PVC binding
kubectl get pvc -n minio
# View MinIO service
kubectl get svc -n minio
kubectl port-forward -n minio svc/minio 9001:9001
Then access the console at: http://localhost:9001
Create an Ingress resource to expose MinIO API and Console:
# dev/minio/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minio-ingress
namespace: minio
spec:
rules:
# MinIO Console
- host: minio-console.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio
port:
number: 9001
# MinIO API
- host: minio-api.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio
port:
number: 9000
Apply the ingress:
kubectl apply -f dev/minio/ingress.yaml
After MinIO is running, create access keys for applications:
# Port forward to MinIO
kubectl port-forward -n minio svc/minio 9001:9001 &
# Access console at http://localhost:9001
# Login with minioadmin/minioadmin
# Create new access key under "Access Keys"
Or use MinIO CLI:
# Install MinIO CLI
curl https://dl.min.io/client/mc/release/darwin-amd64/mc \
-o /usr/local/bin/mc
chmod +x /usr/local/bin/mc
# Configure MinIO alias
mc config host add minio http://localhost:9000 minioadmin minioadmin
# Create service account
mc admin user svcacct add minio minioadmin
Create a Kubernetes Secret with the MinIO credentials for your applications:
apiVersion: v1
kind: Secret
metadata:
name: minio-credentials
namespace: default
type: Opaque
stringData:
MINIO_ROOT_USER: minioadmin
MINIO_ROOT_PASSWORD: minioadmin
MINIO_ENDPOINT: minio.minio.svc.cluster.local:9000
MINIO_USE_SSL: "false"
Reference these credentials from your application Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-with-minio
namespace: default
spec:
template:
spec:
containers:
- name: app
image: myapp:latest
envFrom:
- secretRef:
name: minio-credentials
env:
- name: MINIO_BUCKET
value: "uploads"If you need MinIO access from multiple namespaces, create a PersistentVolumeClaim in each namespace:
# Each namespace needs its own PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minio-pvc
namespace: my-app-namespace
spec:
accessModes:
- ReadWriteMany
storageClassName: nfs-cluster-global
resources:
requests:
storage: 20Gi # Size per namespace claim
# Check MinIO status
kubectl get all -n minio
# View MinIO logs
kubectl logs -n minio -l app=minio -f
# Scale MinIO (for HA setup, change mode to distributed)
kubectl scale statefulset minio -n minio --replicas=3
# Delete MinIO (data persists if using Retain policy)
helm uninstall minio -n minio
# Delete MinIO with data
kubectl delete namespace minio
PVC not binding:
kubectl describe pvc -n minio
kubectl get pv
Pod stuck in Pending:
kubectl describe pod -n minio -l app=minio
Connection issues from applications:
- Verify DNS resolution:
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup minio.minio.svc.cluster.local
- Check network policies and firewall rules
- Verify credentials are correct
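To rule out credential or endpoint problems from inside the cluster, a throwaway pod with the MinIO client can help; a sketch assuming the default minioadmin credentials from the values above (adjust for your setup):
# In-cluster check using the official mc image and its MC_HOST_<alias> environment variable
kubectl run mc-test -n minio --rm -it --restart=Never --image=minio/mc \
  --env MC_HOST_local="http://minioadmin:minioadmin@minio.minio.svc.cluster.local:9000" \
  -- ls local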
Phoenix is an open-source AI observability platform for LLM applications. This section covers deploying Phoenix in your K3s cluster with external PostgreSQL and secrets managed by OpenBao.
- Running PostgreSQL instance in the cluster
- OpenBao/Vault configured and unsealed
- External Secrets Operator installed
- Helm 3.x installed
Chart Version Limitations:
- Phoenix Helm chart version 4.0.6 does NOT support the additionalEnv parameter for injecting external secrets
- The chart always creates its own secret for authentication credentials
- Cannot use auth.name to point directly to an external secret (causes ownership conflicts with External Secrets Operator)
- Secrets must be provided at Helm upgrade time via command-line or temporary values file
Secret Requirements:
- PHOENIX_SECRET: Must be at least 32 characters long
- PHOENIX_ADMIN_SECRET: Must be at least 32 characters long
- Other secrets can be any length
Before installing Phoenix, create a dedicated database and user in your PostgreSQL instance.
# Find your PostgreSQL pod
kubectl get pods -n soludev | grep postgres
# Connect to PostgreSQL
kubectl exec -it -n soludev <postgres-pod-name> -- psql -U <postgres-user>
-- Create the phoenix user
CREATE USER phoenix WITH PASSWORD 'your-secure-password';
-- Create the phoenix database
CREATE DATABASE phoenix OWNER phoenix;
-- Grant permissions
GRANT ALL PRIVILEGES ON DATABASE phoenix TO phoenix;
-- Verify
\l -- List databases
\du -- List users
Phoenix requires several secrets to be stored in OpenBao. Create them with the following structure:
# Generate 32+ character secrets for PHOENIX_SECRET and PHOENIX_ADMIN_SECRET
openssl rand -base64 32 # For PHOENIX_SECRET
openssl rand -base64 32 # For PHOENIX_ADMIN_SECRET
# Generate other secrets
openssl rand -base64 16 # For PHOENIX_SMTP_PASSWORD (if using email)
openssl rand -base64 16 # For PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD
Using the OpenBao CLI:
# Set your vault token
export VAULT_TOKEN=<your-root-token>
# Port forward to OpenBao (if needed)
kubectl port-forward -n soludev svc/openbao 8200:8200 &
# Store Phoenix secrets
kubectl exec -n soludev openbao-0 -- env VAULT_TOKEN="$VAULT_TOKEN" \
bao kv put soludev/phoenix \
PHOENIX_SECRET="<your-32+-char-secret>" \
PHOENIX_ADMIN_SECRET="<your-32+-char-secret>" \
PHOENIX_POSTGRES_PASSWORD="<your-db-password>" \
PHOENIX_SMTP_PASSWORD="" \
PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD="<your-admin-password>"
Using the OpenBao UI:
- Port forward to OpenBao: kubectl port-forward -n soludev svc/openbao 8200:8200
- Access the UI at http://localhost:8200
- Login with your root token
- Navigate to Secrets → soludev/
- Create a new secret named phoenix
- Add the following keys:
  - PHOENIX_SECRET (32+ characters)
  - PHOENIX_ADMIN_SECRET (32+ characters)
  - PHOENIX_POSTGRES_PASSWORD (your database password)
  - PHOENIX_SMTP_PASSWORD (empty string if not using email)
  - PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD (admin password)
Create an ExternalSecret resource to sync secrets from OpenBao to Kubernetes:
# dev/soludev/phoenix/external-secret.yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: soludev-phoenix-external-secret
namespace: soludev
spec:
refreshInterval: 60s
secretStoreRef:
name: openbao-backend
kind: ClusterSecretStore
target:
name: soludev-phoenix-secret
creationPolicy: Owner
dataFrom:
- extract:
key: soludev/phoenix
Apply the ExternalSecret (if not using Flux):
kubectl apply -f dev/soludev/phoenix/external-secret.yaml
Verify Secret Synchronization:
# Check ExternalSecret status
kubectl get externalsecret -n soludev soludev-phoenix-external-secret
# Verify all keys are synced
kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data}' | jq -r 'keys[]'
# Expected output:
# PHOENIX_ADMIN_SECRET
# PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD
# PHOENIX_POSTGRES_PASSWORD
# PHOENIX_SECRET
# PHOENIX_SMTP_PASSWORD
# Verify secret lengths (should be 32+ for PHOENIX_SECRET and PHOENIX_ADMIN_SECRET)
kubectl get secret -n soludev soludev-phoenix-secret -o json | \
jq -r '.data | to_entries[] | "\(.key): \(.value | @base64d | length) chars"'
Create a values file for Phoenix that references your PostgreSQL instance:
# config/dev/phoenix/values.yaml
postgresql:
enabled: false # We're using external PostgreSQL
database:
postgres:
host: "postgres" # Short name works for same-namespace services
port: 5432
db: "phoenix"
user: "phoenix"
password: "" # Will be provided via command-line during upgrade
auth:
enableAuth: true # Enable authentication
persistence:
enabled: false # Use PostgreSQL for persistence instead of SQLite
ingress:
enabled: false # Configure based on your needs
server:
port: 6006
Since the Helm chart doesn't support external secrets directly, we need to pass secrets at upgrade time.
# Create temporary values file with secrets from Kubernetes secret
cat > /tmp/phoenix-secrets.yaml <<EOF
database:
postgres:
password: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_POSTGRES_PASSWORD}' | base64 -d)"
auth:
secret:
- key: "PHOENIX_SECRET"
value: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_SECRET}' | base64 -d)"
- key: "PHOENIX_ADMIN_SECRET"
value: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_ADMIN_SECRET}' | base64 -d)"
- key: "PHOENIX_POSTGRES_PASSWORD"
value: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_POSTGRES_PASSWORD}' | base64 -d)"
- key: "PHOENIX_SMTP_PASSWORD"
value: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_SMTP_PASSWORD}' | base64 -d)"
- key: "PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD"
value: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD}' | base64 -d)"
EOF
# Install Phoenix
helm install phoenix oci://registry-1.docker.io/arizephoenix/phoenix-helm \
--version 4.0.6 \
-n soludev \
-f config/dev/phoenix/values.yaml \
-f /tmp/phoenix-secrets.yaml
# Clean up temporary file
rm /tmp/phoenix-secrets.yaml
When you need to upgrade Phoenix or update configuration:
# Create temporary values file with current secrets
cat > /tmp/phoenix-secrets.yaml <<EOF
database:
postgres:
password: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_POSTGRES_PASSWORD}' | base64 -d)"
auth:
secret:
- key: "PHOENIX_SECRET"
value: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_SECRET}' | base64 -d)"
- key: "PHOENIX_ADMIN_SECRET"
value: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_ADMIN_SECRET}' | base64 -d)"
- key: "PHOENIX_POSTGRES_PASSWORD"
value: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_POSTGRES_PASSWORD}' | base64 -d)"
- key: "PHOENIX_SMTP_PASSWORD"
value: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_SMTP_PASSWORD}' | base64 -d)"
- key: "PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD"
value: "$(kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD}' | base64 -d)"
EOF
# Upgrade Phoenix
helm upgrade phoenix oci://registry-1.docker.io/arizephoenix/phoenix-helm \
--version 4.0.6 \
-n soludev \
-f config/dev/phoenix/values.yaml \
-f /tmp/phoenix-secrets.yaml
# Clean up
rm /tmp/phoenix-secrets.yaml
# Check if Phoenix pod is running
kubectl get pods -n soludev -l app=phoenix
# Expected output:
# NAME READY STATUS RESTARTS AGE
# phoenix-xxxxxxxxxx-xxxxx 1/1 Running 0 2m
# Check Phoenix logs
kubectl logs -n soludev -l app=phoenix --tail=50
# Look for successful startup messages:
# - "Application startup complete"
# - "Uvicorn running on http://0.0.0.0:6006"
# - Database connection: "postgresql://phoenix:***@postgres:5432/phoenix"
kubectl port-forward -n soludev svc/phoenix-svc 6006:6006
Access Phoenix at: http://localhost:6006
# Get admin password
kubectl get secret -n soludev phoenix-secret \
-o jsonpath='{.data.PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD}' | base64 -d
echo # Print newline
# Default username: admin
To expose Phoenix externally:
# dev/soludev/phoenix/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: phoenix-ingress
namespace: soludev
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod # If using cert-manager
spec:
rules:
- host: phoenix.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: phoenix-svc
port:
number: 6006
tls:
- hosts:
- phoenix.yourdomain.com
secretName: phoenix-tls
1. DNS Resolution Errors
Error: [Errno -2] Name or service not known
Solution:
- Verify PostgreSQL service exists: kubectl get svc -n soludev postgres
- For same-namespace services, the short hostname (postgres) should work
- For different namespaces, use the FQDN: postgres.soludev.svc.cluster.local
- Test DNS from the Phoenix pod: kubectl exec -n soludev <phoenix-pod> -- nslookup postgres
2. Password Authentication Failed
FATAL: password authentication failed for user "phoenix"
Solution:
- Verify the database password in the secret: kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_POSTGRES_PASSWORD}' | base64 -d
- Check the PostgreSQL user exists: kubectl exec -it -n soludev <postgres-pod> -- psql -U <postgres-user> -c "\du" | grep phoenix
- Verify the password matches what you set in PostgreSQL
- Ensure secrets are properly synced from OpenBao
3. Secret Validation Errors
ValueError: Phoenix secret must be at least 32 characters long
Solution:
- Check secret lengths: kubectl get secret -n soludev soludev-phoenix-secret -o json | jq -r '.data | to_entries[] | "\(.key): \(.value | @base64d | length) chars"'
- Regenerate secrets in OpenBao with proper length (32+ chars)
- Wait for ExternalSecret to sync (60s refresh interval)
- Force sync: Restart External Secrets Operator pod
4. Chart Limitations
If you see that additionalEnv is not being applied:
- Chart version 4.0.6 doesn't support the additionalEnv parameter
- This feature exists in the GitHub main branch but hasn't been released
- Use the temporary values file method shown above as a workaround
- Watch for future chart versions that support external secrets natively
5. Pod Stuck in CrashLoopBackOff
# Check pod events
kubectl describe pod -n soludev <phoenix-pod-name>
# View detailed logs
kubectl logs -n soludev <phoenix-pod-name> --previous
# Common causes:
# - Database connection issues
# - Invalid secret values
# - Database migration failures
6. Database Connection Troubleshooting
# Test PostgreSQL connectivity from Phoenix pod
kubectl run -it --rm debug --image=postgres:16 --restart=Never -n soludev -- \
psql -h postgres -U phoenix -d phoenix -c "SELECT version();"
# If connection fails:
# - Verify PostgreSQL service is running
# - Check PostgreSQL logs for authentication errors
# - Ensure database and user exist
# - Verify network policies allow traffic
When you need to update Phoenix secrets:
- Update secrets in OpenBao:
kubectl exec -n soludev openbao-0 -- env VAULT_TOKEN="$VAULT_TOKEN" \
  bao kv patch soludev/phoenix \
  PHOENIX_SECRET="<new-32+-char-secret>"
- Wait for the ExternalSecret to sync (60 seconds) or force a sync:
kubectl delete pod -n external-secrets-system -l app.kubernetes.io/name=external-secrets
- Verify the secrets updated:
kubectl get secret -n soludev soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_SECRET}' | base64 -d
- Upgrade Phoenix with the new secrets using the upgrade command from step 5
- Restart the Phoenix pod to apply the changes:
kubectl rollout restart deployment/phoenix -n soludev
Secret Management:
- ✅ Store all secrets in OpenBao, never in Git
- ✅ Use strong, randomly generated secrets (32+ chars for main secrets)
- ✅ Rotate secrets periodically
- ✅ Use the temporary file method to avoid secrets in shell history
- ✅ Delete temporary secret files immediately after use
Configuration:
- ✅ Use short hostnames for same-namespace services
- ✅ Keep values.yaml in Git without any secret values
- ✅ Document the secret structure in comments
- ✅ Use database.postgres.password: "" as a placeholder
Operations:
- ✅ Always verify secrets are synced before upgrading
- ✅ Check pod logs after deployment
- ✅ Monitor database connections
- ✅ Set up proper monitoring and alerting
Security:
- ✅ Enable authentication (auth.enableAuth: true)
- ✅ Use TLS for ingress in production
- ✅ Restrict network access using NetworkPolicies
- ✅ Regular security updates (upgrade Phoenix chart versions)
For easier management, create a deployment script:
#!/bin/bash
# deploy-phoenix.sh
set -e
NAMESPACE="soludev"
RELEASE="phoenix"
CHART_VERSION="4.0.6"
echo "Creating temporary values file with secrets..."
cat > /tmp/phoenix-secrets.yaml <<EOF
database:
postgres:
password: "$(kubectl get secret -n ${NAMESPACE} soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_POSTGRES_PASSWORD}' | base64 -d)"
auth:
secret:
- key: "PHOENIX_SECRET"
value: "$(kubectl get secret -n ${NAMESPACE} soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_SECRET}' | base64 -d)"
- key: "PHOENIX_ADMIN_SECRET"
value: "$(kubectl get secret -n ${NAMESPACE} soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_ADMIN_SECRET}' | base64 -d)"
- key: "PHOENIX_POSTGRES_PASSWORD"
value: "$(kubectl get secret -n ${NAMESPACE} soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_POSTGRES_PASSWORD}' | base64 -d)"
- key: "PHOENIX_SMTP_PASSWORD"
value: "$(kubectl get secret -n ${NAMESPACE} soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_SMTP_PASSWORD}' | base64 -d)"
- key: "PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD"
value: "$(kubectl get secret -n ${NAMESPACE} soludev-phoenix-secret -o jsonpath='{.data.PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD}' | base64 -d)"
EOF
echo "Upgrading Phoenix release..."
helm upgrade ${RELEASE} oci://registry-1.docker.io/arizephoenix/phoenix-helm \
--version ${CHART_VERSION} \
-n ${NAMESPACE} \
--install \
-f config/dev/phoenix/values.yaml \
-f /tmp/phoenix-secrets.yaml
echo "Cleaning up temporary file..."
rm /tmp/phoenix-secrets.yaml
echo "Phoenix deployment complete!"
echo "Check status with: kubectl get pods -n ${NAMESPACE} -l app=phoenix"Make it executable and use:
chmod +x deploy-phoenix.sh
./deploy-phoenix.sh
Watch for chart updates that add native support for:
- External secret references via additionalEnv
- Support for an existingSecret parameter
- Direct integration with External Secrets Operator
Once these features are available, you can simplify the deployment by updating values.yaml to reference the external secret directly, eliminating the need for temporary files during upgrades.
OpenObserve is a cloud-native observability platform for logs, metrics, and traces. This section covers deploying OpenObserve with MinIO for object storage and PostgreSQL for metadata.
- MinIO instance running in the cluster
- PostgreSQL instance running in the cluster
- OpenBao/Vault configured and unsealed
- External Secrets Operator installed
- Helm 3.x installed
- NFS storage configured (for local cache)
OpenObserve deployment uses:
- MinIO: For storing logs, metrics, and traces (object storage)
- PostgreSQL: For metadata storage
- NFS: For local cache/temporary data
- External Secrets: For secure credential management
Create a dedicated database for OpenObserve metadata.
# Find your PostgreSQL pod
kubectl get pods -n soludev | grep postgres
# Connect to PostgreSQL
kubectl exec -it -n soludev <postgres-pod-name> -- psql -U <postgres-user>
-- Create the openobserve user
CREATE USER openobserve WITH PASSWORD 'your-secure-password';
-- Create the openobserve database
CREATE DATABASE openobserve OWNER openobserve;
-- Grant permissions
GRANT ALL PRIVILEGES ON DATABASE openobserve TO openobserve;
-- Connect to the database to grant schema permissions
\c openobserve
GRANT ALL ON SCHEMA public TO openobserve;
-- Verify
\l -- List databases
\du -- List users
Create a dedicated bucket in MinIO for OpenObserve data.
# Port forward to MinIO console
kubectl port-forward -n soludev svc/minio 9001:9001
# Access console at http://localhost:9001
# Login with your MinIO credentials
# Navigate to "Buckets" and create a new bucket named "observability"# Install MinIO CLI if not already installed
curl https://dl.min.io/client/mc/release/darwin-amd64/mc \
-o /usr/local/bin/mc
chmod +x /usr/local/bin/mc
# Port forward to MinIO API
kubectl port-forward -n soludev svc/minio 9000:9000 &
# Configure MinIO alias
mc alias set minio http://localhost:9000 <MINIO_USER> <MINIO_PASSWORD>
# Create bucket
mc mb minio/observability
# Verify
mc ls minio
# Using MinIO console:
# 1. Go to "Access Keys"
# 2. Click "Create Access Key"
# 3. Save the Access Key and Secret Key
# Or using CLI:
mc admin user svcacct add minio <MINIO_USER> \
--access-key "openobserve-access" \
--secret-key "your-secret-key"OpenObserve requires several secrets to be stored in OpenBao.
- ZO_ROOT_USER_EMAIL: Admin email for OpenObserve login
- ZO_ROOT_USER_PASSWORD: Admin password for OpenObserve
- MINIO_ACCESS: MinIO access key
- MINIO_SECRET: MinIO secret key
- ZO_META_POSTGRES_DSN: PostgreSQL connection string
The DSN format for PostgreSQL:
postgresql://username:password@host:port/database
Example:
postgresql://openobserve:your-password@postgres.soludev.svc.cluster.local:5432/openobserve
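Before storing the DSN in OpenBao, it can be sanity-checked with a one-off psql pod; a sketch, substitute your real password:
# Hypothetical connectivity test for the DSN
kubectl run psql-test -n soludev --rm -it --image=postgres:16 --restart=Never -- \
  psql "postgresql://openobserve:your-password@postgres.soludev.svc.cluster.local:5432/openobserve" -c "SELECT 1;"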
Using the OpenBao CLI:
# Set your vault token
export VAULT_TOKEN=<your-root-token>
# Port forward to OpenBao (if needed)
kubectl port-forward -n soludev svc/openbao 8200:8200 &
# Store OpenObserve secrets
kubectl exec -n soludev openbao-0 -- env VAULT_TOKEN="$VAULT_TOKEN" \
bao kv put soludev/openobserve \
ZO_ROOT_USER_EMAIL="admin@yourdomain.com" \
ZO_ROOT_USER_PASSWORD="your-secure-password" \
MINIO_ACCESS="openobserve-access-key" \
MINIO_SECRET="your-minio-secret-key" \
ZO_META_POSTGRES_DSN="postgresql://openobserve:your-db-password@postgres.soludev.svc.cluster.local:5432/openobserve"Using the OpenBao UI:
- Port forward: kubectl port-forward -n soludev svc/openbao 8200:8200
- Access the UI at http://localhost:8200
- Login with your root token
- Navigate to Secrets → soludev/
- Create a new secret named openobserve
- Add all required keys listed above
Create an ExternalSecret resource to sync secrets from OpenBao:
# dev/soludev/openobserve/external-secret.yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: soludev-openobserve-external-secret
namespace: soludev
spec:
refreshInterval: 60s
secretStoreRef:
name: openbao-backend
kind: ClusterSecretStore
target:
name: soludev-openobserve-secret
creationPolicy: Owner
dataFrom:
- extract:
key: soludev/openobserve
Apply the ExternalSecret:
kubectl apply -f dev/soludev/openobserve/external-secret.yaml
Verify Secret Synchronization:
# Check ExternalSecret status
kubectl get externalsecret -n soludev soludev-openobserve-external-secret
# Verify all keys are synced
kubectl get secret -n soludev soludev-openobserve-secret -o jsonpath='{.data}' | jq -r 'keys[]'
# Expected output:
# MINIO_ACCESS
# MINIO_SECRET
# ZO_META_POSTGRES_DSN
# ZO_ROOT_USER_EMAIL
# ZO_ROOT_USER_PASSWORD
OpenObserve needs local storage for cache. Create an NFS-backed PersistentVolume:
# dev/soludev/openobserve/persistent-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: nfs-soludev-openobserve
namespace: soludev
spec:
capacity:
storage: 5Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs-soludev-openobserve
nfs:
server: <NFS_SERVER_IP> # Your NFS server IP
path: /path/to/openobserve/storage
mountOptions:
- nfsvers=4.1
- hard
- timeo=600
- retrans=2
Create the NFS directory on your NFS server:
# SSH to your NFS server
ssh user@nfs-server
# Create directory
sudo mkdir -p /path/to/openobserve/storage
sudo chmod 777 /path/to/openobserve/storage
# Update NFS exports if needed
sudo exportfs -ra
Apply the PersistentVolume:
kubectl apply -f dev/soludev/openobserve/persistent-volume.yaml
# Verify PV is available
kubectl get pv nfs-soludev-openobserve
Create a values file that references external secrets and configures OpenObserve:
# config/dev/openobserve/values.yml
# Empty auth section - credentials injected via extraEnv
auth:
ZO_ROOT_USER_EMAIL: ""
ZO_ROOT_USER_PASSWORD: ""
ZO_S3_ACCESS_KEY: ""
ZO_S3_SECRET_KEY: ""
# Inject credentials from external secret
extraEnv:
- name: ZO_ROOT_USER_EMAIL
valueFrom:
secretKeyRef:
name: soludev-openobserve-secret
key: ZO_ROOT_USER_EMAIL
- name: ZO_ROOT_USER_PASSWORD
valueFrom:
secretKeyRef:
name: soludev-openobserve-secret
key: ZO_ROOT_USER_PASSWORD
- name: ZO_S3_ACCESS_KEY
valueFrom:
secretKeyRef:
name: soludev-openobserve-secret
key: MINIO_ACCESS
- name: ZO_S3_SECRET_KEY
valueFrom:
secretKeyRef:
name: soludev-openobserve-secret
key: MINIO_SECRET
- name: ZO_META_POSTGRES_DSN
valueFrom:
secretKeyRef:
name: soludev-openobserve-secret
key: ZO_META_POSTGRES_DSN
# OpenObserve configuration
config:
# Data retention (10 days)
ZO_COMPACT_DATA_RETENTION_DAYS: "10"
ZO_COMPACT_ENABLED: "true"
ZO_COMPACT_INTERVAL: "3600"
# PostgreSQL for metadata
ZO_META_STORE: "postgres"
ZO_META_CONNECTION_POOL_MIN_SIZE: "2"
ZO_META_CONNECTION_POOL_MAX_SIZE: "10"
# Local mode with S3/MinIO storage
ZO_LOCAL_MODE: "true"
ZO_LOCAL_MODE_STORAGE: "s3"
# MinIO configuration
ZO_S3_PROVIDER: "minio"
ZO_S3_SERVER_URL: "http://minio.soludev.svc.cluster.local:9000"
ZO_S3_REGION_NAME: "us-east-1"
ZO_S3_BUCKET_NAME: "observability"
ZO_S3_BUCKET_PREFIX: ""
ZO_S3_FEATURE_FORCE_HOSTED_STYLE: "false"
ZO_S3_FEATURE_FORCE_PATH_STYLE: "true"
ZO_S3_FEATURE_HTTP1_ONLY: "false"
ZO_S3_FEATURE_HTTP2_ONLY: "false"
# Resource limits
resources:
limits:
cpu: 1500m
memory: 2Gi
requests:
cpu: 1000m
memory: 1Gi
# Local cache persistence
persistence:
enabled: true
size: 5Gi
storageClass: "nfs-soludev-openobserve"
accessModes:
- ReadWriteMany
# Service configuration
service:
type: ClusterIP
http_port: 5080
grpc_port: 5081
# Ingress disabled (configured separately)
ingress:
enabled: false
# Disable built-in MinIO (using external MinIO)
minio:
enabled: false
helm repo add openobserve https://charts.openobserve.ai
helm repo update
helm install openobserve openobserve/openobserve-standalone \
-n soludev \
-f config/dev/openobserve/values.yml
When you need to update configuration:
helm upgrade openobserve openobserve/openobserve-standalone \
-n soludev \
-f config/dev/openobserve/values.yml
# Check if OpenObserve pod is running
kubectl get pods -n soludev -l app.kubernetes.io/name=openobserve
# Expected output:
# NAME READY STATUS RESTARTS AGE
# openobserve-openobserve-standalone-0 1/1 Running 0 2m
# Check PVC is bound
kubectl get pvc -n soludev | grep openobserve
# Check logs
kubectl logs -n soludev -l app.kubernetes.io/name=openobserve --tail=50
# Look for successful startup messages:
# - "Starting OpenObserve"
# - PostgreSQL connection established
# - MinIO/S3 connection verified
Create an Ingress resource to expose OpenObserve:
# dev/soludev/openobserve/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: openobserve-ingress
namespace: soludev
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
traefik.ingress.kubernetes.io/redirect-scheme: https
cert-manager.io/cluster-issuer: letsencrypt-prod # If using cert-manager
spec:
rules:
- host: openobserve.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: openobserve-openobserve-standalone
port:
number: 5080
tls:
- hosts:
- openobserve.yourdomain.com
secretName: openobserve-tls
Apply the Ingress:
kubectl apply -f dev/soludev/openobserve/ingress.yaml
# Verify ingress
kubectl get ingress -n soludev openobserve-ingress
Or access it locally via port-forward:
kubectl port-forward -n soludev svc/openobserve-openobserve-standalone 5080:5080
Access OpenObserve at: http://localhost:5080
# Get admin email
kubectl get secret -n soludev soludev-openobserve-secret \
-o jsonpath='{.data.ZO_ROOT_USER_EMAIL}' | base64 -d
echo
# Get admin password
kubectl get secret -n soludev soludev-openobserve-secret \
-o jsonpath='{.data.ZO_ROOT_USER_PASSWORD}' | base64 -d
echo
# Ingest logs using curl
curl -u "admin@yourdomain.com:your-password" \
-X POST "http://openobserve.yourdomain.com/api/default/logs" \
-H "Content-Type: application/json" \
-d '[
{
"timestamp": "2024-01-01T12:00:00Z",
"level": "info",
"message": "Test log message",
"service": "my-app"
}
]'
# fluent-bit-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: logging
data:
fluent-bit.conf: |
[OUTPUT]
Name http
Match *
Host openobserve-openobserve-standalone.soludev.svc.cluster.local
Port 5080
URI /api/default/logs
Format json
HTTP_User admin@yourdomain.com
HTTP_Passwd your-password
tls Off
# prometheus-config.yaml
remote_write:
- url: http://openobserve-openobserve-standalone.soludev.svc.cluster.local:5080/api/default/prometheus/api/v1/write
basic_auth:
username: admin@yourdomain.com
password: your-password
1. PostgreSQL Connection Errors
Error: failed to connect to PostgreSQL
Solution:
- Verify the DSN format in the secret: kubectl get secret -n soludev soludev-openobserve-secret -o jsonpath='{.data.ZO_META_POSTGRES_DSN}' | base64 -d
- Check PostgreSQL is accessible: kubectl exec -it -n soludev <openobserve-pod> -- nc -zv postgres 5432
- Verify the database and user exist in PostgreSQL
- Check PostgreSQL logs for authentication errors
2. MinIO/S3 Connection Errors
Error: failed to connect to S3
Solution:
- Verify MinIO is running: kubectl get pods -n soludev | grep minio
- Check the MinIO access keys in the secret
- Verify the bucket exists: mc ls minio/observability
- Test connectivity from the OpenObserve pod:
kubectl exec -it -n soludev <openobserve-pod> -- \
  curl http://minio.soludev.svc.cluster.local:9000/minio/health/live
3. PVC Not Binding
# Check PVC status
kubectl get pvc -n soludev | grep openobserve
# Check PV status
kubectl get pv nfs-soludev-openobserve
# Describe PVC for events
kubectl describe pvc -n soludev <pvc-name>
Solution:
- Verify the NFS server is accessible
- Check NFS exports: showmount -e <nfs-server-ip>
- Verify the storage class matches: kubectl get storageclass
- Check the NFS path exists and has correct permissions
4. Secret Not Syncing
# Check ExternalSecret status
kubectl describe externalsecret -n soludev soludev-openobserve-external-secret
# Check for errors in External Secrets Operator
kubectl logs -n external-secrets-system -l app.kubernetes.io/name=external-secrets
Solution:
- Verify secrets exist in OpenBao
- Check ClusterSecretStore is configured correctly
- Verify service account has proper permissions
- Force sync by restarting External Secrets Operator pod
5. Data Not Appearing in UI
Solution:
- Check data is being sent to correct endpoint
- Verify authentication credentials
- Check OpenObserve logs for ingestion errors
- Verify the MinIO bucket has data: mc ls minio/observability
- Check PostgreSQL for metadata entries
6. High Memory Usage
Solution:
- Reduce ZO_META_CONNECTION_POOL_MAX_SIZE
- Increase resource limits in values.yml
- Adjust ZO_COMPACT_DATA_RETENTION_DAYS to retain less data
- Monitor MinIO bucket size
# Health check endpoint
curl http://openobserve.yourdomain.com/healthz
# Metrics endpoint
curl http://openobserve.yourdomain.com/metrics
# Check MinIO bucket size
mc du minio/observability
# Check PVC usage
kubectl exec -it -n soludev <openobserve-pod> -- df -h /data
# Check PostgreSQL database size
kubectl exec -it -n soludev <postgres-pod> -- \
psql -U openobserve -d openobserve -c \
"SELECT pg_size_pretty(pg_database_size('openobserve'));"OpenObserve automatically compacts data based on configuration:
config:
ZO_COMPACT_DATA_RETENTION_DAYS: "10" # Keep data for 10 days
ZO_COMPACT_ENABLED: "true"
ZO_COMPACT_INTERVAL: "3600" # Run compaction every hourPostgreSQL Metadata:
# Backup PostgreSQL database
kubectl exec -n soludev <postgres-pod> -- \
pg_dump -U openobserve openobserve > openobserve-metadata-backup.sql
MinIO Data:
# Backup MinIO bucket
mc mirror minio/observability /path/to/backup/
For production workloads, consider:
Horizontal Scaling:
replicaCount: 3 # Run multiple replicas
Resource Scaling:
resources:
limits:
cpu: 4000m
memory: 8Gi
requests:
cpu: 2000m
memory: 4Gi
PostgreSQL Connection Pool:
config:
ZO_META_CONNECTION_POOL_MAX_SIZE: "20"Security:
- ✅ Use strong passwords for admin account
- ✅ Enable TLS/HTTPS via ingress
- ✅ Store all credentials in OpenBao
- ✅ Use network policies to restrict access
- ✅ Regularly rotate credentials
Performance:
- ✅ Adjust retention policies based on your needs
- ✅ Monitor resource usage and scale accordingly
- ✅ Use appropriate PostgreSQL connection pool sizes
- ✅ Enable data compaction
Reliability:
- ✅ Use persistent storage for local cache
- ✅ Backup PostgreSQL metadata regularly
- ✅ Monitor MinIO bucket size
- ✅ Set up proper monitoring and alerting
Cost Optimization:
- ✅ Adjust data retention to balance storage costs
- ✅ Use MinIO lifecycle policies for old data
- ✅ Right-size resource requests and limits
# Example: Configure your app to send logs to OpenObserve
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
spec:
template:
spec:
containers:
- name: app
image: my-app:latest
env:
- name: LOG_ENDPOINT
value: "http://openobserve-openobserve-standalone.soludev.svc.cluster.local:5080/api/default/logs"
- name: LOG_USER
valueFrom:
secretKeyRef:
name: soludev-openobserve-secret
key: ZO_ROOT_USER_EMAIL
- name: LOG_PASSWORD
valueFrom:
secretKeyRef:
name: soludev-openobserve-secret
key: ZO_ROOT_USER_PASSWORD
# otel-collector-config.yaml
exporters:
otlphttp:
endpoint: http://openobserve-openobserve-standalone.soludev.svc.cluster.local:5080/api/default
headers:
Authorization: Basic <base64(email:password)>
This section covers the deployment of the PickPro application stack, including OAuth2 Proxy for authentication, MinIO for storage, and Traefik middleware integration.
OAuth2 Proxy is used to protect the application with OIDC authentication (e.g., Logto).
- OpenBao/Vault configured with the secret pickpro/oauth2-proxy containing: client-id, client-secret, cookie-secret
- Add Helm Repository:
helm repo add oauth2-proxy https://oauth2-proxy.github.io/manifests
helm repo update
- Deploy with Helm:
helm upgrade --install oauth2-proxy oauth2-proxy/oauth2-proxy \
--namespace pickpro \
--create-namespace \
-f config/dev/oauth2-proxy/values.yaml
- Apply External Secrets and ConfigMap: Ensure the oauth2-proxy-secret and oauth2-proxy-config are created (a sketch of the ExternalSecret follows these steps):
kubectl apply -f dev/pickpro/oauth2-proxy/
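The dev/pickpro/oauth2-proxy/ directory is expected to contain an ExternalSecret that materialises oauth2-proxy-secret from the OpenBao path listed in the prerequisites; a minimal sketch following the conventions used elsewhere in this document (the resource name is illustrative and may differ from the actual manifests):
cat <<'EOF' | kubectl apply -f -
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: pickpro-oauth2-proxy-external-secret
  namespace: pickpro
spec:
  refreshInterval: 60s
  secretStoreRef:
    name: openbao-backend
    kind: ClusterSecretStore
  target:
    name: oauth2-proxy-secret
    creationPolicy: Owner
  dataFrom:
    - extract:
        key: pickpro/oauth2-proxy
EOF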
To protect your Ingress resources, configure a Traefik Middleware that delegates authentication to OAuth2 Proxy.
File: dev/pickpro/traefik/middleware.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: traefik-forward-auth
namespace: pickpro
spec:
forwardAuth:
address: "http://oauth2-proxy.pickpro.svc.cluster.local:4180/oauth2/auth"
authResponseHeaders:
- "X-Auth-Request-User"
- "X-Auth-Request-Email"
- "Authorization"
trustForwardHeader: true
kubectl apply -f dev/pickpro/traefik/middleware.yaml
PickPro uses a dedicated MinIO instance for storing CVs and other documents.
helm upgrade --install minio-pickpro minio/minio \
--namespace pickpro \
-f config/dev/minio/pickpro/values.yamlThis configuration:
- Creates a bucket named cvs
- Uses the NFS storage class nfs-pickpro-minio
- Sets up a standalone MinIO instance
To protect an Ingress resource (e.g., pickpro-front), add the middleware annotation:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: pickpro-front
namespace: pickpro
annotations:
traefik.ingress.kubernetes.io/router.middlewares: pickpro-traefik-forward-auth@kubernetescrd
spec:
# ... ingress rules ...
Common issues when joining nodes:
- cgroups v2 Error: Follow the cgroups v2 setup steps in the Control Plane Setup section
- Network Issues: Ensure firewall allows traffic on port 6443
- Token Issues: Verify the token is copied correctly without extra spaces or characters
Symptoms:
- Pods remain in ContainerCreating state for extended periods (5+ minutes)
- Multiple pods trying to mount the same NFS volume simultaneously
- No events showing volume mounting progress
- kubectl describe pod shows the pod is scheduled but containers haven't started
Root Cause: NFS server in Colima VM stops responding or becomes unresponsive, preventing pods from mounting NFS volumes.
Solution:
- Restart Colima completely:
# Stop Colima
colima stop
# Start Colima
colima start
- Verify NFS is running after the restart:
# SSH into Colima
colima ssh
# Check NFS server status
sudo systemctl status nfs-kernel-server
# Verify exports
showmount -e localhost
- If NFS is still not working, restart the service manually:
# Inside Colima VM
sudo systemctl restart nfs-kernel-server
sudo exportfs -ra
- Test connectivity from other nodes:
# From the Jetson or Raspberry Pi
showmount -e <COLIMA_HEADSCALE_IP>
nc -zv <COLIMA_HEADSCALE_IP> 2049
Prevention:
- Monitor NFS server health regularly
- Consider restarting Colima periodically if experiencing frequent NFS issues
- Use the local-path storage class for critical system components that don't need shared storage
Note: After restarting Colima, pods should automatically retry mounting volumes and transition from ContainerCreating to Running within 1-2 minutes.
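If a pod stays stuck in ContainerCreating after the NFS server is back, deleting it forces a fresh mount attempt; the pod name below is a placeholder:
# List pods still stuck in ContainerCreating across the cluster
kubectl get pods -A | grep ContainerCreating
# Delete a stuck pod so its controller recreates it and remounts the volume
kubectl delete pod <stuck-pod-name> -n <namespace>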
Symptoms:
Warning: Extension tcp revision 0 not supported, missing kernel module?
iptables v1.8.11 (nf_tables): RULE_INSERT failed (No such file or directory): rule in chain FORWARD
Or pods svclb-traefik-* in CrashLoopBackOff:
failed to "StartContainer" for "lb-tcp-80" with CrashLoopBackOff
failed to "StartContainer" for "lb-tcp-443" with CrashLoopBackOff
Root Cause:
The Traefik/ServiceLB containers use iptables-nft internally, but the Jetson host uses iptables-legacy. This causes iptables rule insertion failures.
Solution: Disable ServiceLB on Jetson and enable on other nodes
Important: Once you set an enablelb label on ANY node, K3s changes behavior:
- Without any labels → svclb deploys on ALL nodes
- With labels → svclb deploys ONLY on nodes with enablelb=true
So you must explicitly enable it on nodes where you want it:
# Disable ServiceLB on the Jetson node
kubectl label node jetson-desktop svccontroller.k3s.cattle.io/enablelb=false
# Enable ServiceLB on all other nodes (required!)
kubectl label node bignode svccontroller.k3s.cattle.io/enablelb=true
kubectl label node colima svccontroller.k3s.cattle.io/enablelb=true
kubectl label node raspberrypi svccontroller.k3s.cattle.io/enablelb=true
# Add any other nodes here...
Verify the fix:
# Check svclb pods are running on correct nodes
kubectl get pods -n kube-system -o wide | grep svclb
# Verify the DaemonSet has the expected number of pods
kubectl get daemonset -n kube-system | grep svclb
# Check labels on all nodes
kubectl get nodes --show-labels | grep enablelb
Expected result:
- DaemonSet should show pods on all nodes except Jetson
- svclb-traefik pods should be Running on bignode, colima, raspberrypi
- No svclb pods on jetson-desktop
Alternative: If you need ServiceLB on Jetson
You would need to either:
- Switch the Jetson to use iptables-nft (not recommended, may break other things)
- Wait for a Traefik/K3s update that handles this better
- Use a different ingress controller that supports legacy iptables
# Check cluster status
kubectl get nodes
# Check all pods
kubectl get pods -A
# Check K3s service status
sudo systemctl status k3s
# Check K3s agent status (on worker nodes)
sudo systemctl status k3s-agent
# View K3s logs
sudo journalctl -u k3s -f
# Check CPU and memory usage by nodes
kubectl top nodes
# Check allocated resources vs available on nodes
kubectl describe nodes | grep -A 7 "Allocated resources"
This setup creates:
- A K3s cluster with control plane and worker nodes
- Flux for GitOps deployment management
- Vault for secrets management
- Traefik as the default ingress controller
- CoreDNS for cluster DNS
- Local path provisioner for storage
After completing this setup, you can:
- Configure Flux to watch your Git repository
- Set up Vault policies and authentication
- Deploy applications using GitOps workflows
- Configure ingress for external access to services