July 12, 2026 · 6 min read · DockLog

Docker health checks and monitoring that actually help

Docker HEALTHCHECK, restart policies, and how they fit with monitoring tools, log alerts, and external uptime checks.

Docker health checks tell the engine whether a container is fit to serve traffic. Monitoring tools tell humans something is wrong. Those are related but not the same.

A passing healthcheck does not mean users can log in. A failing healthcheck with no alert behind it means the container sits unhealthy, or restarts in a loop under an orchestrator, while you sleep.

HEALTHCHECK in practice

Dockerfile example:

dockerfile

HEALTHCHECK --interval=30s --timeout=5s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

Compose example:

yaml

services:
  api:
    image: my-api:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 40s

docker ps shows healthy, unhealthy, or starting. Orchestrators and compose can wait on service_healthy before starting dependents.
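To get a rough feel for what these settings imply, the time between the app breaking and Docker marking the container unhealthy can be bounded with simple arithmetic. The sketch below is a simplification: it assumes the app breaks right after a passing probe, every failing probe runs for the full timeout, and each probe starts one interval after the previous one finished. The function name is made up for illustration.

```python
def worst_case_unhealthy_seconds(interval: float, timeout: float, retries: int) -> float:
    """Rough upper bound on the delay from 'app breaks' to 'status is unhealthy'.

    Each failing probe costs one interval of waiting plus up to one timeout
    of running, and Docker needs `retries` consecutive failures.
    """
    return retries * (interval + timeout)


# Settings from the examples above: interval=30s, timeout=5s, retries=3
print(worst_case_unhealthy_seconds(30, 5, 3))   # 105

# interval=10s, retries=5, timeout left at Docker's 30s default
print(worst_case_unhealthy_seconds(10, 30, 5))  # 200
```

If that number is longer than you can tolerate, shorten the interval or the timeout before lowering retries, since a single slow probe should not flip the status.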

Healthcheck without curl in the image

Minimal images often lack curl. Alternatives:

dockerfile

# wget (busybox/alpine)
HEALTHCHECK CMD wget -q -O- http://localhost:8080/health || exit 1

# bash-only TCP check: needs bash and timeout in the image (weaker: port open != app healthy)
HEALTHCHECK CMD timeout 1 bash -c '</dev/tcp/localhost/8080' || exit 1

TCP checks catch "nothing listening." They miss "listening but returning 500." Prefer HTTP when the image allows it.
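If the image already ships a language runtime, the runtime itself can do the HTTP probe. Below is a minimal sketch for a Python-based image using only the standard library. The function name `probe` and the file name used in the note afterwards are assumptions for illustration, not part of any Docker tooling.

```python
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the URL answers with a 2xx status, otherwise 1.

    Mirrors the exit-code contract of `curl -f ... || exit 1`.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except Exception:
        # urlopen raises for 4xx/5xx responses, refused connections,
        # timeouts, and malformed URLs. For a probe, all of them mean
        # "not healthy".
        return 1
```

Assuming the file is saved as healthprobe.py in the container's working directory, the Dockerfile line could look like HEALTHCHECK CMD python -c "import sys, healthprobe; sys.exit(healthprobe.probe('http://localhost:8080/health'))".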

depends_on with health conditions

Docker Compose v2 (the docker compose CLI) supports waiting for healthy dependencies:

yaml

services:
  api:
    image: my-api:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      retries: 5
      start_period: 30s

  worker:
    image: my-worker:latest
    depends_on:
      api:
        condition: service_healthy

Without condition: service_healthy, the worker starts while the API is still booting and may spam connection errors in logs. Those errors look like incidents in your log viewer unless you expect them during startup.

What makes a good health endpoint

Hits the app process, not just "port open"
Avoids requiring auth for a trivial /health or /ready
Separates liveness (process up) from readiness (can serve traffic), which pays off if you run Kubernetes later
Fast: sub-second response, no full DB migration check on every probe unless you mean it

A useful pattern:

/health or /live: returns 200 if the process is up
/ready: returns 200 only if DB and cache connections work

Docker has one HEALTHCHECK per container. Combine checks thoughtfully or pick the stricter one for Docker and split later in K8s.
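The liveness and readiness split can be sketched with the Python standard library. This is an illustration under stated assumptions, not a production server: the entries in `DEPENDENCY_CHECKS` are placeholders that always pass, and you would replace them with a real, cheap ping such as SELECT 1 or a cache PING.

```python
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple

# Hypothetical dependency checks: each returns True when the dependency answers.
DEPENDENCY_CHECKS: Dict[str, Callable[[], bool]] = {
    "db": lambda: True,
    "cache": lambda: True,
}


def _passes(check: Callable[[], bool]) -> bool:
    """Treat an exception inside a check as a failed check."""
    try:
        return bool(check())
    except Exception:
        return False


def health_status(path: str, checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Map a probe path to (HTTP status, body)."""
    if path in ("/health", "/live"):
        return 200, "ok"  # liveness: the process answered at all
    if path == "/ready":
        failed = [name for name, check in checks.items() if not _passes(check)]
        if failed:
            return 503, "not ready: " + ",".join(failed)
        return 200, "ready"
    return 404, "not found"


class HealthHandler(BaseHTTPRequestHandler):
    """Serves /health, /live, and /ready with no auth."""

    def do_GET(self):
        status, body = health_status(self.path, DEPENDENCY_CHECKS)
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it locally, serve the handler with http.server.HTTPServer(("", 8080), HealthHandler).serve_forever(). Pointing the Docker HEALTHCHECK at /ready is the stricter choice; pointing it at /health only proves the process is up.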

Common mistakes

Checking curl localhost on the wrong port inside the container
120s start_period on a 5s boot app (slows real failure detection)
Healthcheck that hammers the database every 10s
No healthcheck at all on stateful services that can deadlock while still running

Another frequent one: the healthcheck hits the public URL through the load balancer instead of localhost inside the container. That tests nginx and DNS, not the app process. Keep probes inside the container's network namespace.

Restart policies

yaml

restart: unless-stopped

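Policy	Behavior
`no`	Default; never restarts automatically
`on-failure`	Restarts after a non-zero exit
`unless-stopped`	Common for servers: like `always`, but a container you stopped stays stopped after a daemon restart
`always`	Restarts whenever the container stops, and starts it again when the daemon restarts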

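unless-stopped plus unhealthy does not mean Docker replaces the container. Restart policies react to the container process exiting, not to health status: on a standalone Docker Engine, an unhealthy container keeps running until something stops it. Health status informs humans and compose dependencies. Swarm services do replace unhealthy tasks, so know your version and orchestration behavior.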

When a container is stuck in a restart loop, logs are the evidence. A log viewer beats parsing docker events by hand.

Detecting restart storms

Signs something is wrong even when docker ps shows Up:

Restart count climbing in docker inspect
Same error line every 30 seconds in logs
Health flapping between starting and unhealthy

Wire an alert on Docker die or restart events, or a log pattern that appears only on crash. Alert setup walks through Slack and Discord webhooks.
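The first and third signs can be checked mechanically from docker inspect output, which is a JSON array with a top-level RestartCount and, when a healthcheck is defined, State.Health.Status. The sketch below is one possible way to flag suspects. The function name, the threshold of 3 restarts, and the sample data are all made up for illustration.

```python
import json


def flag_containers(inspect_json: str, max_restarts: int = 3) -> list:
    """Return (name, reason) pairs for containers that look unstable."""
    flagged = []
    for container in json.loads(inspect_json):
        name = container.get("Name", "?").lstrip("/")
        restarts = container.get("RestartCount", 0)
        # Containers without a healthcheck have no State.Health section.
        health = (container.get("State", {}).get("Health") or {}).get("Status", "none")
        if restarts > max_restarts:
            flagged.append((name, f"restarted {restarts} times"))
        if health == "unhealthy":
            flagged.append((name, "healthcheck reports unhealthy"))
    return flagged


# Invented sample: docker ps would show this container as Up.
sample = (
    '[{"Name": "/api", "RestartCount": 7, '
    '"State": {"Status": "running", "Health": {"Status": "starting"}}}]'
)
print(flag_containers(sample))  # [('api', 'restarted 7 times')]
```

In practice the input would come from something like docker inspect $(docker ps -aq), and the result would feed whatever alert channel you already use.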

Health checks vs monitoring tools

Layer	What it does
HEALTHCHECK	Local process judgment, Docker status
docker stats / cAdvisor	Resource usage
Log tail + alerts	App errors in stdout
Uptime monitor	User-visible URL from outside
Prometheus	Time-series metrics, SLO alerts

Use all layers that match your risk. A payment API wants external HTTP checks and metrics. An internal wiki might live with healthcheck plus log alerts.

None of these replace the others. HEALTHCHECK can pass while memory leaks. Logs can look clean while TLS is broken at the edge. External uptime can pass while one background worker is stuck.

Wiring health into DockLog workflows

DockLog is not a healthcheck engine. It complements one:

Log alerts when health endpoint starts returning 500 in access logs
Docker event alerts on restart storms
CPU/memory thresholds when leaks precede health failures
Tail during incidents when unhealthy appears in docker ps

Slack/Teams/Discord alerts covers channel setup.

For team access during firefighting, RBAC limits who can restart containers after you diagnose.

Incident sequence that works

External monitor or user report says the site is down
Check docker ps for health status and uptime
Tail logs in DockLog (or native app if you are on call away from laptop)
If healthcheck fails locally, fix the app; if healthcheck passes but users fail, check nginx, DNS, TLS
Restart only after you know why; blind restarts hide root cause

External checks matter

Healthchecks run inside the container network namespace. They will not catch:

Bad TLS cert on nginx
DNS pointing to the wrong IP
Cloudflare or CDN misconfig
Database reachable from app but not from users

Run Uptime Kuma, Better Stack, or similar against the public URL. For self-hosted options, see the guide to self-hosted monitoring on a budget.

Check both the root URL and one authenticated or API path if auth middleware can fail independently of /health.
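Hosted monitors do this for you, but the idea of checking more than one path is small enough to sketch. The version below uses only the Python standard library. The function names and the URL and paths in the note afterwards are hypothetical, and a real authenticated path would also need a token header, which this sketch leaves out.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx: the edge answered, but with an error
    except Exception:
        return None      # DNS failure, TLS error, refused connection, timeout


def failing_paths(base_url, paths, fetch=fetch_status):
    """Return {path: status} for every path that did not answer 2xx."""
    failures = {}
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        if status is None or not 200 <= status < 300:
            failures[path] = status
    return failures
```

A call such as failing_paths("https://example.com", ["/", "/api/v1/me"]) returns an empty dict when both paths answer 2xx. A None status points at DNS, TLS, or the network rather than the app. The fetch parameter exists so the logic can be exercised without touching the network.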

Kubernetes note

Docker HEALTHCHECK maps loosely to liveness and readiness probes. Same principles: cheap liveness, stricter readiness, do not DoS yourself. See Kubernetes monitoring tools for the cluster side.

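In K8s you get separate liveness and readiness probes (plus an optional startup probe) instead of one check. A failing liveness probe restarts the container; a failing readiness probe removes the pod from Service endpoints. A DB migration on boot should affect readiness, not liveness, or you restart mid-migration.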

DockLog tails pod logs the same way it tails Docker when the server runs in K8s mode. See K8s log tailing for setup.

Minimal production checklist

Every long-running service has a health endpoint
Compose defines healthcheck and sensible restart
Log rotation configured (max-size, max-file)
One monitoring UI with auth for logs and resource peeks
One external uptime check on the customer-facing URL
One alert to a channel humans read

Compose production tie-in

Healthchecks belong in the same compose file as TLS, auth, and logging limits. Compose for production shows a full example with DockLog behind Caddy.

