Docker health checks and monitoring that actually help

Docker HEALTHCHECK, restart policies, and how they fit with monitoring tools, log alerts, and external uptime checks.

Docker health checks tell the engine whether a container is fit to serve traffic. Monitoring tools tell humans something is wrong. Those are related but not the same.

A passing healthcheck does not mean users can log in. A failing healthcheck without alerts means the container sits unhealthy, or restarts in a loop under an orchestrator, while you sleep.

HEALTHCHECK in practice

Dockerfile example:

```dockerfile
HEALTHCHECK --interval=30s --timeout=5s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1
```

Compose example:

```yaml
services:
  api:
    image: my-api:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 40s
```

docker ps shows healthy, unhealthy, or starting in the STATUS column. Orchestrators and Compose can wait on service_healthy before starting dependents.

Healthcheck without curl in the image

Minimal images often lack curl. Alternatives:

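```dockerfile
# Pick one: only the last HEALTHCHECK in a Dockerfile takes effect.

# wget (busybox/alpine)
HEALTHCHECK CMD wget -q -O- http://localhost:8080/health || exit 1

# TCP check via /dev/tcp (a bash feature, so the image needs bash; weaker: port open != app healthy)
HEALTHCHECK CMD timeout 1 bash -c '</dev/tcp/localhost/8080' || exit 1
```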

TCP checks catch "nothing listening." They miss "listening but returning 500." Prefer HTTP when the image allows it.
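If the image already ships a Python interpreter, an HTTP probe needs nothing beyond the standard library. The following is a minimal sketch under that assumption, not a drop-in script: the function names and the 3-second timeout are illustrative, and the URL is the one used in the curl examples above.

```python
import http.client
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 3.0) -> bool:
    """Return True only if the URL answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError,
            http.client.HTTPException):
        # Connection refused, timeout, HTTP 4xx/5xx, or a malformed URL.
        return False


def main() -> int:
    """Exit code in the form Docker expects: 0 = healthy, 1 = unhealthy."""
    return 0 if probe("http://localhost:8080/health") else 1
```

Saved under a hypothetical name such as healthcheck.py, with a final `raise SystemExit(main())` line, it could replace the curl command in HEALTHCHECK CMD. Unlike the TCP check, it fails on a 500 response as well as on a closed port.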

depends_on with health conditions

Docker Compose v2 (the docker compose plugin) supports waiting for healthy dependencies:

```yaml
services:
  api:
    image: my-api:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      retries: 5
      start_period: 30s

  worker:
    image: my-worker:latest
    depends_on:
      api:
        condition: service_healthy
```

Without condition: service_healthy, the worker starts while the API is still booting and may spam connection errors in logs. Those errors look like incidents in your log viewer unless you expect them during startup.

What makes a good health endpoint

  • Hits the app process, not just "port open"
  • Requires no auth for a trivial /health or /ready
  • Separates liveness (process up) from readiness (can serve traffic) if you run Kubernetes later
  • Responds fast: sub-second, with no full DB migration check on every probe unless you mean it

A useful pattern:

  • /health or /live: returns 200 if the process is up
  • /ready: returns 200 only if DB and cache connections work

Docker has one HEALTHCHECK per container. Combine checks thoughtfully or pick the stricter one for Docker and split later in K8s.
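The liveness/readiness split can be sketched with Python's standard library alone. This is an illustration under stated assumptions rather than a recommended server: `dependencies_ok` is a hypothetical placeholder for real DB and cache pings, and returning 503 for "not ready" is a common convention that the pattern above does not prescribe.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def dependencies_ok() -> bool:
    """Hypothetical placeholder: swap in a cheap DB ping and cache ping."""
    return True


def status_for(path: str, ready_check: Callable[[], bool]) -> int:
    """Map a probe path to an HTTP status code."""
    if path in ("/health", "/live"):
        return 200  # The process is up and answering requests.
    if path == "/ready":
        return 200 if ready_check() else 503  # Up, but unable to serve.
    return 404


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        self.send_response(status_for(self.path, dependencies_ok))
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # Keep probe traffic out of the application logs.


def serve(port: int = 8080) -> None:
    """Blocks forever; call it from the application's entry point."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Keeping the routing in a plain function makes the stricter /ready behavior easy to unit-test without opening a socket. Pointing Docker's single HEALTHCHECK at /ready gives the stricter check; /live stays available for a later Kubernetes liveness probe.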

Common mistakes

  • Checking curl localhost on the wrong port inside the container
  • 120s start_period on a 5s boot app (slows real failure detection)
  • Healthcheck that hammers the database every 10s
  • No healthcheck at all on stateful services that can deadlock while still running

Another frequent one: the healthcheck hits the public URL through the load balancer instead of localhost inside the container. That tests nginx and DNS, not the app process. Keep probes inside the container network namespace.

Restart policies

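```yaml
restart: unless-stopped
```

| Policy | Behavior |
| --- | --- |
| no | Default; never restarts automatically, restart by hand |
| on-failure | Restarts after a non-zero exit |
| unless-stopped | Like always, but stays stopped if you stopped it manually; common for servers |
| always | Always restarts, even after a daemon restart |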

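unless-stopped plus unhealthy does not mean Docker replaces the container. Under plain Docker Engine and Compose, restart policies react to the main process exiting, not to failed probes. Health status informs humans and Compose dependencies; automatic replacement of unhealthy containers is an orchestrator feature (Swarm services do it). Know your version and orchestration behavior.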
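Since an unhealthy status may simply sit there, something has to notice it. Here is a minimal sketch that reads the JSON printed by docker inspect and lists containers that are running yet report unhealthy. The function name is illustrative; the `State.Status`, `State.Health.Status`, and `Name` fields are the ones docker inspect emits.

```python
import json
from typing import List


def running_but_unhealthy(inspect_output: str) -> List[str]:
    """Return names of containers that are running yet report unhealthy."""
    flagged = []
    for container in json.loads(inspect_output):
        state = container.get("State", {})
        # "Health" is absent when the container defines no healthcheck.
        health = state.get("Health", {}).get("Status")
        if state.get("Status") == "running" and health == "unhealthy":
            # docker inspect prefixes container names with a slash.
            flagged.append(container.get("Name", "").lstrip("/"))
    return flagged
```

One way to feed it is the output of `docker inspect $(docker ps -q)`, run from cron or a small loop, with the returned names forwarded to whatever alert channel the team already reads.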

When a container is stuck in a restart loop, logs are the evidence. A log viewer beats parsing docker events by hand.

Detecting restart storms

Signs something is wrong even when docker ps shows Up:

  • Restart count climbing in docker inspect
  • Same error line every 30 seconds in logs
  • Health flapping between starting and unhealthy

Wire an alert on Docker die or restart events, or a log pattern that appears only on crash. Alert setup walks through Slack and Discord webhooks.
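As one way to picture an alert on die events, the sketch below counts container deaths in a sliding window over the JSON lines printed by `docker events --format '{{json .}}'`. The thresholds (3 deaths in 300 seconds) and the function name are assumptions for illustration, not DockLog behavior.

```python
import json
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List


def find_restart_storms(lines: Iterable[str], max_deaths: int = 3,
                        window_seconds: int = 300) -> List[str]:
    """Return containers that died max_deaths times within window_seconds."""
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    storms: List[str] = []
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("Type") != "container" or event.get("Action") != "die":
            continue
        actor = event["Actor"]
        name = actor["Attributes"].get("name", actor["ID"])
        times = recent[name]
        times.append(event["time"])
        # Drop deaths that have fallen out of the sliding window.
        while times and times[0] <= event["time"] - window_seconds:
            times.popleft()
        if len(times) >= max_deaths and name not in storms:
            storms.append(name)
    return storms
```

Three deaths of the same container 30 seconds apart would be flagged; three deaths spread over 15 minutes would not. Tune both numbers to how often the service legitimately restarts during deploys.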

Health checks vs monitoring tools

| Layer | What it does |
| --- | --- |
| HEALTHCHECK | Local process judgment, Docker status |
| docker stats / cAdvisor | Resource usage |
| Log tail + alerts | App errors in stdout |
| Uptime monitor | User-visible URL from outside |
| Prometheus | Time-series metrics, SLO alerts |

Use all layers that match your risk. A payment API wants external HTTP checks and metrics. An internal wiki might live with healthcheck plus log alerts.

None of these replace the others. HEALTHCHECK can pass while memory leaks. Logs can look clean while TLS is broken at the edge. External uptime can pass while one background worker is stuck.

Wiring health into DockLog workflows

DockLog is not a healthcheck engine. It complements one:

  • Log alerts when health endpoint starts returning 500 in access logs
  • Docker event alerts on restart storms
  • CPU/memory thresholds when leaks precede health failures
  • Tail during incidents when unhealthy appears in docker ps

Slack/Teams/Discord alerts covers channel setup.

For team access during firefighting, RBAC limits who can restart containers after you diagnose.

Incident sequence that works

  1. External monitor or user report says the site is down
  2. Check docker ps for health status and uptime
  3. Tail logs in DockLog (or native app if you are on call away from laptop)
  4. If healthcheck fails locally, fix the app; if healthcheck passes but users fail, check nginx, DNS, TLS
  5. Restart only after you know why; blind restarts hide root cause

External checks matter

Healthchecks run inside the container network namespace. They will not catch:

  • Bad TLS cert on nginx
  • DNS pointing to the wrong IP
  • Cloudflare or CDN misconfig
  • Database reachable from app but not from users

Run Uptime Kuma, Better Stack, or similar against the public URL. For a self-hosted setup, see the guide to self-hosted monitoring on a budget.

Check both the root URL and one authenticated or API path if auth middleware can fail independently of /health.
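A hosted monitor does this for you, but the idea of checking more than one path fits in a few lines. The sketch below is an assumption-laden illustration: the function name is hypothetical, it sends no credentials (an authenticated path would need a header added through `urllib.request.Request`), and it has to run from outside the host to mean anything.

```python
import http.client
import urllib.error
import urllib.request
from typing import Dict, Iterable


def check_urls(urls: Iterable[str], timeout: float = 10.0) -> Dict[str, str]:
    """Return {url: reason} for every URL that did not answer with 2xx."""
    failures: Dict[str, str] = {}
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if not 200 <= resp.status < 300:
                    failures[url] = f"HTTP {resp.status}"
        except urllib.error.HTTPError as exc:
            # 4xx/5xx returned by the app or a proxy in front of it.
            failures[url] = f"HTTP {exc.code}"
        except (urllib.error.URLError, OSError, ValueError,
                http.client.HTTPException) as exc:
            # DNS failure, TLS error, timeout, or connection refused.
            failures[url] = f"{type(exc).__name__}: {exc}"
    return failures
```

Called with the root URL and one API path, an empty result means both answered. A failure on only the API path points at auth middleware or routing rather than at the whole site.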

Kubernetes note

Docker HEALTHCHECK maps loosely to liveness and readiness probes. Same principles: cheap liveness, stricter readiness, do not DoS yourself. See Kubernetes monitoring tools for the cluster side.

In K8s you get two probes instead of one. Liveness restarts the pod; readiness removes it from Service endpoints. A DB migration on boot should affect readiness, not liveness, or you restart mid-migration.

DockLog tails pod logs the same way it tails Docker when the server runs in K8s mode. See K8s log tailing for setup.

Minimal production checklist

  1. Every long-running service has a health endpoint
  2. Compose defines healthcheck and sensible restart
  3. Log rotation configured (max-size, max-file)
  4. One monitoring UI with auth for logs and resource peeks
  5. One external uptime check on the customer-facing URL
  6. One alert to a channel humans read

Compose production tie-in

Healthchecks belong in the same compose file as TLS, auth, and logging limits. Compose for production shows a full example with DockLog behind Caddy.

Health checks are the automatic reflex. Monitoring tools are the nervous system that tells the team. Configure both, or you will only notice the gap at 2am.