Slack and Teams alerts without the noise
Notification channels, alert rules, and thresholds that survived our own staging hosts.
DockLog alerts are useful when they fire once and mean something. They're useless when staging spam wakes you at 2am.
Two places in the UI: Admin → Notifications (where messages go) and Admin → Alerts (what triggers them). Channels first, rules second. Get delivery working before you tune thresholds.
Hook up a channel
Slack
- Open the channel you want (#docklog-alerts or similar).
- Channel name → Integrations → Incoming Webhooks → Add.
- Copy the webhook URL (it looks like https://hooks.slack.com/services/T.../B.../...).
- In DockLog: Admin → Notifications → Add channel → Slack → paste URL → Save.
- Hit Test. You should see a message in Slack within a few seconds.
If Test fails, the usual causes are a mistyped URL or a webhook that Slack has revoked. Regenerate the webhook and update DockLog.
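To rule DockLog out when Test fails, you can post to the webhook yourself from the same host. This is a minimal sketch using only the standard library. The URL is a placeholder you replace with your own, and the function names are ours, not part of DockLog. Slack incoming webhooks accept a JSON body with a text field.

```python
import json
import urllib.request

# Placeholder: paste your real webhook URL here.
WEBHOOK_URL = "https://hooks.slack.com/services/T.../B.../..."


def build_slack_request(url: str, text: str) -> urllib.request.Request:
    """Build a POST request with the JSON body Slack incoming webhooks accept."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(url: str, text: str) -> int:
    """Send the message and return the HTTP status code."""
    with urllib.request.urlopen(build_slack_request(url, text), timeout=10) as resp:
        return resp.status


# Usage, once WEBHOOK_URL is filled in:
# send(WEBHOOK_URL, "Manual webhook check from the DockLog host")
```

If the manual post lands in Slack but DockLog's Test does not, the problem is more likely the URL saved in DockLog than the webhook itself. A revoked or mistyped URL typically makes urlopen raise urllib.error.HTTPError.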
Microsoft Teams
- Open the target channel.
- Channel name → Connectors → Incoming Webhook → Configure.
- Name it (e.g. "DockLog"), copy the URL.
- Same flow in DockLog: pick Teams, paste, save, Test.
A Teams webhook stops working as soon as someone deletes the connector from the channel. The symptom usually looks like "it worked last week."
Discord
- Channel settings → Integrations → Webhooks → New Webhook.
- Copy webhook URL.
- Add channel in DockLog, pick Discord, paste, Test.
Discord rate-limits aggressively. If you fire 50 alerts in a minute during a deploy, some may drop. That's Discord, not DockLog.
Custom HTTPS endpoint
Anything that accepts POST JSON works: n8n, PagerDuty Events API, your own script, a Zapier catch hook.
DockLog sends a JSON payload with rule name, severity, container/pod scope, and a short message body. Point a test channel at webhook.site first if you want to see the exact shape before wiring production.
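If you would rather inspect the payload on your own machine than on webhook.site, a throwaway receiver is enough. The sketch below assumes nothing about field names: it prints whatever JSON object arrives. The port and the names parse_alert, AlertHandler, and serve are arbitrary choices for illustration. It speaks plain HTTP, so put it behind a TLS-terminating proxy before pointing anything outside a test network at it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_alert(raw: bytes) -> dict:
    """Decode a JSON request body; raise ValueError if it is not a JSON object."""
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except ValueError:
            # Covers malformed JSON, bad encoding, and non-object bodies.
            self.send_response(400)
            self.end_headers()
            return
        # Print the payload exactly as received; no field names are assumed.
        print(json.dumps(alert, indent=2, sort_keys=True))
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8099) -> None:
    """Listen on all interfaces; port 8099 is an arbitrary choice."""
    HTTPServer(("", port), AlertHandler).serve_forever()


# To run: serve()
```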
Channel toggles
For rule-based alerts, enable Intelligent alerts on the channel. Without it, only the simpler event toggles fire (container start/stop, healthcheck failures, blocked actions).
| Toggle | When to enable |
|---|---|
| Intelligent alerts | Log, metric, and K8s event rules |
| Container started/stopped | Noisy on staging; useful on prod if restarts matter |
| Healthcheck failed | Good early signal before OOM |
| Blocked action | Catches RBAC mistakes; see RBAC post |
Turn on only what you need. A channel that pings on every container start on a dev host gets muted in a day.
Rule types
Logs
Match text in stdout/stderr. "15 lines containing ERROR in 3 minutes on prod-api-*" is a common starting point.
Regex works when you need case-insensitivity or structured patterns:
(?i)(exception|fatal|panic)
Scope with the same patterns as RBAC: prod-api-*, staging/*, ^worker-\d+$.
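Before saving a regex into a rule, it helps to run it against a few real log lines. The sketch below checks the case-insensitive pattern above with Python's re module, on the assumption that DockLog's regex engine treats this particular pattern the same way. The sample lines are made up.

```python
import re

PATTERN = re.compile(r"(?i)(exception|fatal|panic)")

# Made-up log lines; swap in lines copied from your own containers.
sample_lines = [
    "INFO request served",
    "java.lang.NullPointerException: oops",
    "FATAL: could not connect to database",
    "worker panic: index out of range",
    "error: retrying",
]

hits = [line for line in sample_lines if PATTERN.search(line)]
print(len(hits))  # 3
```

Note that the plain "error: retrying" line does not match. If you want it counted, add error to the alternation instead of loosening the pattern further.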
Events
Restart loops, OOM kills, unhealthy healthchecks. Catches things you'd otherwise docker inspect for.
Example: a container restarts 5 times in 10 minutes. That's usually a deploy gone wrong or a missing env var, not a flaky network.
Metrics
CPU or memory above X for Y minutes. Useful when logs stay quiet but the process is thrashing.
Start high (90% CPU for 5 minutes) and tighten after a week of baseline. A rule at 50% CPU on a bursty API will lie to you.
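The difference between the two thresholds is easier to see on numbers. This sketch measures the longest streak of consecutive readings above a threshold, one reading per minute. The readings are invented and the function is illustrative, not how DockLog evaluates metric rules internally.

```python
from typing import Sequence


def longest_run_above(samples: Sequence[float], threshold: float) -> int:
    """Return the longest streak of consecutive samples above `threshold`."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest


# One CPU reading per minute (made-up numbers).
bursty = [35, 92, 40, 38, 95, 41, 37, 60, 55, 39]      # healthy API with short spikes
thrashing = [88, 91, 93, 95, 96, 94, 97, 95, 96, 98]   # process that is actually stuck

print(longest_run_above(bursty, 50))     # 2: "50% for 2 minutes" would fire
print(longest_run_above(bursty, 90))     # 1: "90% for 5 minutes" stays quiet
print(longest_run_above(thrashing, 90))  # 9: "90% for 5 minutes" fires
```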
Kubernetes events
Crash loop backoff, image pull failures, scheduling failures. Scope with namespace patterns: production/*, staging/api-*.
Pair with K8s log tailing so on-call can jump from alert to live pod logs in one UI.
Starter rules (enable one at a time)
Fresh installs ship six starter rules, all disabled:
| Rule | What it catches | Suggested first? |
|---|---|---|
| OOM kill | Kernel killed the container | Yes on prod |
| Restart loop | Too many restarts in a window | Yes on prod |
| Error spike | Log pattern threshold | After you tune pattern |
| High CPU | Sustained CPU over limit | After baseline week |
| High memory | Sustained memory over limit | After baseline week |
| Unhealthy container | Failed healthcheck | Yes if you use healthchecks |
Enable one, assign a channel, trigger a test (restart a container, print ERROR lines), confirm delivery, then add the next. Turning them all on at once on a busy host is how you mute the channel forever.
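For the "print ERROR lines" test, a small script run inside a throwaway container gives you a repeatable trigger. A minimal sketch: the tag, line count, and delay are arbitrary defaults, chosen so 20 lines arrive in about 10 seconds.

```python
import time


def error_lines(count: int, tag: str = "alert-test") -> list[str]:
    """Build `count` log lines that contain the literal text ERROR."""
    return [f"ERROR {tag} synthetic line {i + 1}/{count}" for i in range(count)]


def emit(count: int = 20, delay_seconds: float = 0.5) -> None:
    """Write the lines to stdout, flushing so the log driver sees them promptly."""
    for line in error_lines(count):
        print(line, flush=True)
        time.sleep(delay_seconds)


# Call emit() from the throwaway container's entrypoint.
```

The tag makes the synthetic lines easy to tell apart from real errors when you look at the logs later.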
Staging vs prod on one instance
Common on a single VPS: staging and prod containers share one DockLog.
| Approach | Pros | Cons |
|---|---|---|
| Two Slack channels | Clean on-call signal | More setup |
| Tighter prod scope only | One channel to watch | Staging rules still need tuning |
| Disable staging rules entirely | Zero noise | Miss staging regressions |
What we do: #docklog-staging with loose cooldowns, #docklog-prod with strict scope (prod-* only) and 10+ minute cooldowns on log rules.
Tuning so people don't mute the channel
- Scope tight at first (prod-api-*, not *)
- Cooldown and max-per-hour: staging can be looser, prod should be stricter
- Separate channels for staging and prod if both run on the same DockLog instance
- Recovery notifications are nice for "it stopped happening" but optional
- Log rules: count hits in a time window, not single-line triggers on every stack trace line
Example prod log rule:
| Field | Value |
|---|---|
| Pattern | ERROR or (?i)exception |
| Threshold | 15 hits in 3 minutes |
| Scope | prod-api-* |
| Cooldown | 10 minutes |
| Max per hour | 3 |
That catches a real spike without paging on one stray ERROR during a deploy.
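To see how threshold, cooldown, and max-per-hour interact, here is a small simulation of the example rule. It is a sketch of one plausible set of semantics (sliding window, cooldown measured from the last alert, cap over a rolling hour). DockLog's actual evaluation may differ, and the LogRule class is hypothetical.

```python
from collections import deque


class LogRule:
    def __init__(self, threshold=15, window=180, cooldown=600, max_per_hour=3):
        self.threshold = threshold        # hits needed inside the window
        self.window = window              # seconds
        self.cooldown = cooldown          # seconds between alerts
        self.max_per_hour = max_per_hour  # cap over a rolling hour
        self.hits = deque()               # timestamps of matching lines
        self.fired = deque()              # timestamps of alerts sent

    def on_hit(self, now: float) -> bool:
        """Record one matching line at `now` (seconds); return True if an alert fires."""
        self.hits.append(now)
        while self.hits and now - self.hits[0] > self.window:
            self.hits.popleft()
        while self.fired and now - self.fired[0] >= 3600:
            self.fired.popleft()
        if len(self.hits) < self.threshold:
            return False
        if self.fired and now - self.fired[-1] < self.cooldown:
            return False
        if len(self.fired) >= self.max_per_hour:
            return False
        self.fired.append(now)
        return True


rule = LogRule()
stray = rule.on_hit(0)  # one stray ERROR during a deploy
# Then an hour-long spike: one matching line per second.
alerts = [t for t in range(1000, 4600) if rule.on_hit(t)]

print(stray)   # False
print(alerts)  # [1014, 1614, 2214]
```

An hour of continuous errors produces three messages instead of thousands. The cooldown spaces them ten minutes apart, and the hourly cap suppresses the fourth.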
Step-by-step: first prod alert
- Create #docklog-prod in Slack, add Incoming Webhook.
- Admin → Notifications → add channel, enable Intelligent alerts, Test.
- Admin → Alerts → enable "Restart loop" starter rule.
- Set scope to prod-* (or tighter).
- Assign the Slack channel as destination.
- Save. Restart a prod container 3-4 times quickly in a test window (or use staging first).
- Check History tab: fired? delivered? throttled?
Repeat for OOM rule before adding log-based rules. Events are easier to reason about than log noise.
When nothing arrives
Check in order:
| Step | Question |
|---|---|
| 1 | Global delivery enabled? (Admin → Notifications, top-level toggle) |
| 2 | Intelligent alerts on the channel? |
| 3 | Rule enabled with a destination channel assigned? |
| 4 | Scope actually matches container names? Copy-paste from the UI; names lie. |
| 5 | Cooldown or max-per-hour suppressing repeats? |
| 6 | Webhook URL still valid? (regenerate in Slack/Teams) |
History tab under Admin → Alerts shows what fired, delivered, or was suppressed. If History says "delivered" but Slack is quiet, the webhook is dead or the channel was archived.
Scope debugging
Container named prod-api-1 but rule scoped to production-api-*? Zero matches, zero alerts, no error. Always verify names in the container list before blaming DockLog.
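A quick way to check a wildcard scope is to run it against the names you copied from the container list. The sketch below uses Python's fnmatch as a stand-in for DockLog's wildcard matching, which is an assumption. Apart from prod-api-1, the container names are invented.

```python
from fnmatch import fnmatchcase

# Names as copied from the container list (mostly invented for this example).
containers = ["prod-api-1", "prod-api-2", "staging-api-1", "worker-3"]


def matches(scope: str, names: list[str]) -> list[str]:
    """Return the names a glob-style scope would select (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, scope)]


print(matches("production-api-*", containers))  # []
print(matches("prod-api-*", containers))        # ['prod-api-1', 'prod-api-2']
```

An empty list here corresponds to the silent failure described above: the rule is valid, it just never sees a container.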
For K8s, remember K8S_NAMESPACES is a hard ceiling. A rule on kube-system/* does nothing if that namespace isn't listed in the instance's K8S_NAMESPACES environment variable. See RBAC guide.
Security events
Worth a separate low-traffic channel if compliance cares. Blocked stop/delete attempts show up when someone hits a button their user account doesn't allow.
Good for catching permission mistakes or someone poking at buttons they shouldn't have. Wire this before handing client logins: RBAC patterns.
If you expose DockLog on the internet, put it behind TLS first so webhook URLs and admin sessions aren't the weak link: reverse proxy post.
Email and mobile
Email isn't built yet. Slack mobile notifications are the usual workaround: pin #docklog-prod on your phone. If you add a bot username later, you can switch the channel's notification preference to mentions only.
For on-call without Slack, custom webhook → PagerDuty or Opsgenie is the path most teams take.
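PagerDuty's Events API v2 expects its own body shape (routing_key, event_action, and a payload with summary, source, and severity). Unless DockLog's payload already matches that shape, a small adapter between the two is the likely setup. The sketch below shows only the translation step. The incoming field names rule, severity, scope, and message are assumptions based on the payload description earlier in this post, not a documented schema, so check them against a captured payload first.

```python
# Severities PagerDuty Events API v2 accepts.
PAGERDUTY_SEVERITIES = {"critical", "error", "warning", "info"}


def to_pagerduty_event(alert: dict, routing_key: str) -> dict:
    """Translate an alert dict (hypothetical field names) into an Events API v2 body."""
    severity = str(alert.get("severity", "")).lower()
    if severity not in PAGERDUTY_SEVERITIES:
        severity = "error"  # fallback chosen for this sketch
    rule = alert.get("rule", "unknown rule")
    scope = alert.get("scope", "unknown scope")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Same rule + scope collapses into one PagerDuty incident.
        "dedup_key": f"docklog:{rule}:{scope}",
        "payload": {
            # PagerDuty limits summary to 1024 characters.
            "summary": f"{rule}: {alert.get('message', '')}"[:1024],
            "source": scope,
            "severity": severity,
        },
    }


example = to_pagerduty_event(
    {"rule": "Restart loop", "severity": "CRITICAL",
     "scope": "prod-api-1", "message": "5 restarts in 10 minutes"},
    routing_key="YOUR_INTEGRATION_KEY",  # placeholder
)
print(example["payload"]["severity"])  # critical
```

The resulting dict is what the adapter would POST as JSON to PagerDuty's enqueue endpoint. Deriving the dedup key from rule and scope keeps repeated alerts for the same container from opening a new incident each time.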
Related reading
More in the alerts guide. Compose baseline with auth and DB_PATH: docker-compose setup. Why alerts matter on a self-hosted viewer: why self-hosted.