Kubernetes monitoring tools: a practical guide without the YAML avalanche

Kubernetes monitoring tools explained for small clusters: kubectl, metrics-server, Prometheus, Lens, K9s, and log tailing with DockLog before you buy a platform.

Kubernetes monitoring tools split into four jobs: is the cluster healthy, are pods running, are users happy (metrics), and what did the app print (logs). Enterprise vendors sell one pane for all four. Small teams usually assemble two or three cheap pieces.

This guide assumes a homelab, staging cluster, or modest prod (not 500 nodes). If you are at platform-team scale, you already have opinions. If you are not, read this before installing everything in the CNCF landscape at once.

Layer 0: kubectl (free, required)

bash
kubectl get pods -A
kubectl describe pod my-app-7f9c8
kubectl logs -f deploy/my-app
kubectl top nodes   # needs metrics-server
kubectl top pods

Enough for the person who owns kubeconfig. Not enough for the developer who should see one namespace, or on-call without a laptop.

A kubectl workflow that scales to one person

When a pod misbehaves, this sequence answers most questions before you install anything else:

bash
kubectl get pods -n staging -l app=my-app
kubectl describe pod my-app-7f9c8-abcde -n staging
kubectl logs my-app-7f9c8-abcde -n staging --previous   # crashed container
kubectl get events -n staging --sort-by='.lastTimestamp'

--previous is the one people forget. If the container restarted, current logs may be empty while the last crash reason lives in the previous instance.

kubectl get events surfaces scheduling failures, image pull errors, and eviction warnings. Those never appear in application stdout. More on that below.
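The sequence above can be wrapped in one shell function so it is a single command under pressure. This is a minimal sketch, not a standard tool: the name triage_pod, the 100-line tail, and the fallback to current logs are assumptions made for illustration.

```bash
# Sketch: run the triage sequence for one pod.
# Usage: triage_pod <namespace> <pod-name>
triage_pod() {
  local ns="${1:-}" pod="${2:-}"
  if [ -z "$ns" ] || [ -z "$pod" ]; then
    echo "usage: triage_pod <namespace> <pod-name>" >&2
    return 2
  fi

  echo "== describe =="
  kubectl describe pod "$pod" -n "$ns"

  echo "== logs (previous container instance if there is one) =="
  # --previous errors out when the container never restarted,
  # so fall back to the current instance in that case.
  kubectl logs "$pod" -n "$ns" --previous --tail=100 \
    || kubectl logs "$pod" -n "$ns" --tail=100

  echo "== events for this pod, oldest first =="
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by='.lastTimestamp'
}
```

Call it as triage_pod staging my-app-7f9c8-abcde. The field selector narrows events to the one pod, which keeps the output readable in a busy namespace.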

Layer 1: Cluster plumbing

metrics-server

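Enables kubectl top and CPU- or memory-based autoscaling with the HPA. Install once per cluster. It is small: k3s bundles it by default, and kind usually needs the --kubelet-insecure-tls flag added to its arguments.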
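For clusters that do not already bundle metrics-server, a hedged sketch of the install follows. The manifest URL is the upstream project's "latest release" link. The function names are made up for this example, and the patch assumes metrics-server is the first container in its Deployment, which holds for the upstream manifest but is worth checking on yours.

```bash
# Install metrics-server from the upstream release manifest.
install_metrics_server() {
  kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
}

# Local clusters such as kind: the kubelet serving certificate is usually
# not signed by a CA that metrics-server trusts, so skip verification.
# Do not use this on production clusters.
allow_insecure_kubelet_tls() {
  kubectl patch deployment metrics-server -n kube-system --type=json \
    -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}
```

Run install_metrics_server, then allow_insecure_kubelet_tls only if kubectl top nodes still reports that metrics are unavailable after a minute or two.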

Without it, you are flying blind on memory pressure until the OOM killer arrives. On a 4 GB homelab node, run kubectl top pods -A weekly to learn which Deployment actually needs limits.
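One way to make the weekly check quick is to list only the heaviest pods. A small sketch, assuming metrics-server is running; the function name and the default of five rows are arbitrary choices for this example.

```bash
# Print the N pods using the most memory across all namespaces.
# Usage: top_memory_pods [N]   (default 5)
top_memory_pods() {
  local n="${1:-5}"
  # --sort-by=memory orders rows by memory usage, highest first;
  # --no-headers keeps the header row out of the count.
  kubectl top pods -A --sort-by=memory --no-headers | head -n "$n"
}
```

Anything that appears in this list every week without a memory limit set is the first candidate for one.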

Kubernetes Dashboard

Official web UI for resources. Useful for "what is CrashLoopBackOff." Less loved for log tailing at scale. Lock it behind auth and network policy.

Dashboard is not a substitute for log RBAC. Giving every developer dashboard access with a shared token is the same problem as sharing cluster-admin.

Lens / OpenLens

Desktop IDE for clusters. Popular with people who want GUI without running another in-cluster service. Logs and metrics depend on what the cluster already exposes.

Lens shines when you manage three clusters and do not want three browser bookmarks. It still assumes kubeconfig on your laptop, same trust boundary as K9s.

Layer 2: Terminal UIs

K9s

The default power-user terminal for Kubernetes. Pods, logs, port-forward, CRDs, plugins. Requires kubeconfig on the machine. DockLog vs K9s compares when a web UI with RBAC beats a local binary.

K9s is hard to beat for the person who deploys. It is a poor fit for the frontend developer who needs staging logs but should never hold cluster-admin credentials.

stern

Multi-pod log tail from the terminal:

bash
stern my-app -n staging
stern . -n staging --since 10m

Great for deploy day. Same SSH/kubeconfig assumptions as kubectl.

Pair stern with a namespace-scoped Role if you want developers to self-serve without full K9s access. Stern still needs credentials on the machine.
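A namespace-scoped Role for log reading could look roughly like the sketch below. The names log-reader and log-reader-binding, and the use of a User subject, are assumptions for illustration; your cluster may bind a Group or ServiceAccount instead.

```bash
# Print a Role and RoleBinding that allow reading pods and their logs
# in one namespace only.
# Usage: log_reader_rbac <namespace> <user>
log_reader_rbac() {
  local ns="${1:-}" user="${2:-}"
  if [ -z "$ns" ] || [ -z "$user" ]; then
    echo "usage: log_reader_rbac <namespace> <user>" >&2
    return 2
  fi
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: log-reader
  namespace: $ns
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: log-reader-binding
  namespace: $ns
subjects:
- kind: User
  name: $user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: log-reader
  apiGroup: rbac.authorization.k8s.io
EOF
}
```

Review the output, then pipe it to the cluster: log_reader_rbac staging dev@example.com | kubectl apply -f -. Printing the manifest first, rather than applying it inside the function, leaves room for a diff or a code review.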

Layer 3: In-cluster observability stacks

Prometheus + Grafana (+ Alertmanager)

The open-source metrics standard. Scrape targets through ServiceMonitors, chart RED metrics (rate, errors, duration), page on burn rates. kube-prometheus-stack bundles a lot of YAML.

Worth it when:

  • You have SLOs
  • Multiple services need the same dashboards
  • Someone will maintain Prometheus upgrades

Heavy for "three microservices on k3s." Monitoring tools roundup covers when to stay lighter.

On a 2 GB node, skip the full stack. On an 8 GB homelab with five services you care about, a minimal Prometheus plus one Grafana dashboard for HTTP error rate is reasonable.

Loki (+ Grafana)

Log aggregation aligned with Prometheus labels. Powerful, ops-heavy. DockLog vs Grafana/Loki explains pairing live tail with Loki instead of replacing one with the other.

Install Loki when tickets say "find this trace ID from last Tuesday across all pods." Until then, tailing is enough.
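Before Loki, a short-window search across pods is possible with kubectl alone. The sketch below is an assumption-laden stopgap, not a replacement for aggregation: it only sees logs the nodes still hold, so "last Tuesday" is usually out of reach. The function name and the one-hour default are made up for this example.

```bash
# Search recent logs of every pod matching a label selector.
# Usage: search_recent_logs <namespace> <selector> <text> [since]
search_recent_logs() {
  local ns="$1" selector="$2" pattern="$3" since="${4:-1h}"
  # --tail=-1 matters: with a selector, kubectl otherwise returns only
  # the last 10 lines per container. --prefix tags each line with its pod.
  kubectl logs -n "$ns" -l "$selector" \
    --all-containers --prefix --since="$since" --tail=-1 \
    | grep -F -- "$pattern"
}
```

For example, search_recent_logs staging app=my-app 4f2a9c 30m looks for a trace ID across the last 30 minutes of every matching pod.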

Datadog / New Relic / Grafana Cloud

Agents, hosted backends, credit card. Fastest path if budget exists and you do not want to run Thanos.

Reasonable when on-call rotation, mobile escalation, and APM are day-one requirements. Overkill when the cluster is a staging k3s on a NUC.

Log tailing without a logging stack

Many teams reach for Loki on day one because logs feel urgent. You can defer that if the real need is "see staging pod output from a browser."

DockLog mounts kubeconfig or runs in-cluster, scopes users by namespace pattern, and tails the same way it does for Docker. K8s log tailing guide has compose and ingress notes.

When to use DockLog vs K9s vs stern:

Situation | Tool
Platform engineer at laptop | K9s or stern
Developer, one namespace, no kubeconfig | DockLog with RBAC
On-call from phone | DockLog native apps
Search last month across 40 services | Loki or SaaS

Namespace RBAC in practice

A developer who owns staging should see staging-* pods, not prod-*. DockLog allowed_containers patterns map to namespace prefixes. Full patterns in the RBAC guide.

Platform engineers keep kubeconfig. Everyone else gets a DockLog login. That split prevents the "we gave them cluster-admin because logs were hard" anti-pattern.
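To make the prefix idea concrete, here is how glob-style patterns such as staging-* behave against namespace names. This is a hypothetical helper written for illustration only; it is not DockLog's implementation or configuration syntax, so check the RBAC guide for the real matching rules.

```bash
# Hypothetical illustration of prefix patterns, not DockLog code.
# Usage: namespace_allowed <namespace> <pattern>...
# Returns 0 if the namespace matches any pattern, 1 otherwise.
namespace_allowed() {
  local ns="$1"
  shift
  local pattern
  for pattern in "$@"; do
    # $pattern is left unquoted on purpose so case treats it as a glob.
    case "$ns" in
      $pattern) return 0 ;;
    esac
  done
  return 1
}
```

With the pattern 'staging-*', the namespace staging-api is allowed and prod-api is not. Quote the patterns when calling the function so the shell does not expand them against local filenames.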

Events and alerts people forget

Kubernetes emits Warning events (failed scheduling, backoff, eviction). They are not container stdout. DockLog can surface K8s warning events and route alerts alongside log rules. Useful when the pod never started and there is nothing to tail.

Common event-driven failures with no application logs:

  • FailedScheduling because CPU requests exceed node capacity
  • FailedMount because a PVC or secret is missing
  • ImagePullBackOff because registry credentials expired
  • Evicted because the node ran out of ephemeral storage
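With plain kubectl, failures like the ones listed above can be pulled out by filtering events on their type. A minimal sketch; the function name is an assumption, the flags are standard kubectl.

```bash
# List Warning events, oldest first.
# Usage: warning_events [namespace]   (all namespaces when omitted)
warning_events() {
  local ns="${1:-}"
  if [ -n "$ns" ]; then
    kubectl get events -n "$ns" --field-selector type=Warning --sort-by='.lastTimestamp'
  else
    kubectl get events -A --field-selector type=Warning --sort-by='.lastTimestamp'
  fi
}
```

Events expire from the API server after a retention window (one hour by default on many clusters), so this shows recent trouble, not history.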

For HTTP uptime, something external (Uptime Kuma, Better Stack, Pingdom) still matters. In-cluster health checks do not see DNS or CDN issues. Self-hosted monitoring covers pairing external checks with in-cluster tail.
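If a hosted checker is not in place yet, even a cron job on a machine outside the cluster covers the basics. The sketch below is an assumption about what "good enough" means here: it treats any 2xx or 3xx status within 10 seconds as healthy, and the function name is invented for this example.

```bash
# Probe a URL from outside the cluster.
# Usage: check_url <url>
# Prints "OK <code> <url>" and returns 0 on 2xx/3xx, otherwise returns 1.
check_url() {
  local url="$1" code
  # -w prints only the status code; curl reports 000 when no response arrived.
  code="$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url")" || code="000"
  if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
    echo "OK $code $url"
  else
    echo "FAIL $code $url" >&2
    return 1
  fi
}
```

Run it from somewhere that resolves DNS the way users do, for example check_url https://staging.example.com/healthz, where the hostname and path are placeholders.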

Resource limits: the silent monitoring layer

kubectl top shows usage. Requests define what the scheduler believes a pod needs, and limits cap what it may consume. A pod can be "Running" and still unhealthy because it keeps hitting its memory limit and restarting.

Checklist:

  • Set requests so the scheduler can place pods honestly
  • Set limits so one leaky pod cannot take the node down
  • Alert on restart count, not just pod phase
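The restart-count item in the checklist can be approximated without Prometheus. This sketch splits the job in two so the filter can be tested on its own; both function names and the default threshold of 3 are assumptions.

```bash
# Emit one row per pod: namespace, name, comma-separated restart counts
# (one count per container, or <none> before containers have started).
list_restart_counts() {
  kubectl get pods -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,RESTARTS:.status.containerStatuses[*].restartCount'
}

# Read those rows on stdin and print pods whose total restarts reach
# the threshold, as "namespace name total".
# Usage: list_restart_counts | pods_over_restart_threshold [threshold]
pods_over_restart_threshold() {
  local threshold="${1:-3}"
  awk -v t="$threshold" '{
    n = split($3, counts, ",")
    total = 0
    for (i = 1; i <= n; i++) total += counts[i]   # "<none>" counts as 0
    if (total >= t) print $1, $2, total
  }'
}
```

Run list_restart_counts | pods_over_restart_threshold 5 from cron and send any non-empty output to the channel you already watch.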

The principles in Docker health checks transfer to liveness and readiness probes: keep liveness cheap, make readiness stricter, and do not hammer dependencies on every probe.

A sane stack for a small cluster

Homelab / staging

  1. metrics-server
  2. DockLog or K9s for daily logs
  3. One external uptime check on the ingress URL

Small production

  1. Everything above
  2. Prometheus + Grafana OR managed metrics
  3. Loki or log SaaS when search tickets appear weekly
  4. RBAC and audit on anything with cluster credentials (RBAC guide)

Mistakes we see

  • Installing kube-prometheus-stack "for logs" (it is metrics-first; add Loki separately)
  • Giving every developer cluster-admin because logs are hard
  • Tail-only tools on prod without disk rotation on noisy pods (fix the app or cap container log size on nodes, for example with the kubelet's containerLogMaxSize setting)
  • Ignoring control plane and etcd on self-managed clusters (k3s hides some of this; bring-your-own-k8s does not)
  • Treating Running as healthy without checking restart count
  • Logging everything at DEBUG in prod because "we might need it" (disk and viewer noise)

Day-one install order

If you are standing up a new k3s cluster this weekend:

  1. metrics-server (15 minutes)
  2. Ingress with TLS (Caddy or nginx, reverse proxy guide)
  3. DockLog in-cluster or with kubeconfig on a trusted host
  4. One Slack alert for pod BackOff or a log pattern you already grep
  5. Defer Prometheus until you have a metric you would chart twice

Kubernetes monitoring tools are not one product. Start with visibility you will use tomorrow, then add Prometheus when metrics drive decisions, and Loki when search drives incidents.
