Szymon Borowski
Extended\Mind::Thesis()
The mind extends beyond the skull — into tools, notes, and environment. — Clark & Chalmers, 1998

Monitoring with Prometheus, Loki and Grafana — observability stack in Docker Compose

Szymon Borowski

Three pillars of observability

Full system observability requires three types of data:

  • Metrics — numbers over time: CPU, memory, request count, response time (Prometheus)
  • Logs — textual event records (Loki + Promtail)
  • Traces — tracking a specific request across multiple services (not yet implemented)

Prometheus — collecting metrics

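Prometheus scrapes metrics over HTTP from each target's /metrics endpoint. Traefik serves that endpoint once its Prometheus metrics provider is enabled. Laravel has no built-in /metrics endpoint, so the PHP services are covered by an exporter running next to Nginx.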
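Traefik only serves /metrics when its Prometheus metrics provider is switched on in the static configuration. A minimal sketch of that setting, assuming a file-based static config and a dedicated entry point on port 8080 to match the traefik:8080 target used in this post. The file path and the entry point name "metrics" are illustrative and not taken from the actual setup.

```yaml
# infra/traefik/traefik.yml (hypothetical path; static configuration)
entryPoints:
  # Dedicated entry point for metrics; the name "metrics" is an assumption.
  metrics:
    address: ":8080"

metrics:
  prometheus:
    # Serve /metrics on the entry point defined above.
    entryPoint: metrics
```

If port 8080 is already taken by Traefik's default "traefik" entry point (for example with the insecure API enabled), leaving out the entryPoint line should have the same effect, because "traefik" is the provider's default entry point.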

Scrape configuration:

# infra/prometheus/prometheus.yml
scrape_configs:
  - job_name: traefik
    static_configs:
      - targets: ['traefik:8080']
    metrics_path: /metrics

  - job_name: frontend
    static_configs:
      - targets: ['frontend-nginx:9113']

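For the PHP/Laravel services I run nginx-prometheus-exporter as a sidecar: it reads Nginx's stub_status page and re-exposes the counters in Prometheus format on port 9113. These are Nginx connection and request counters, not application-level Laravel metrics.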
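The post does not show how the sidecar is wired up, so the following is only one possible arrangement. It assumes the exporter shares the network namespace of the frontend-nginx container, which is what makes the frontend-nginx:9113 target from the scrape configuration resolve, and that Nginx exposes stub_status on 127.0.0.1:8081. The exporter service name, the stub_status port, and the image tags are assumptions.

```yaml
# docker-compose.yml (fragment; names, ports and tags are assumptions)
services:
  frontend-nginx:
    image: nginx:stable
    # The Nginx config is assumed to contain something like:
    #   server { listen 127.0.0.1:8081; location /stub_status { stub_status; } }

  frontend-nginx-exporter:
    # Pin a specific version in a real deployment.
    image: nginx/nginx-prometheus-exporter
    command:
      # Where the exporter reads the Nginx status page from.
      - --nginx.scrape-uri=http://127.0.0.1:8081/stub_status
    # Share the network namespace of frontend-nginx, so the exporter's
    # default port 9113 is reachable as frontend-nginx:9113.
    network_mode: "service:frontend-nginx"
    depends_on:
      - frontend-nginx
```

Running the exporter as a regular service on the Compose network would work too, but the Prometheus target would then have to point at the exporter's own service name instead.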

Loki + Promtail — log aggregation

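Instead of relying on docker logs (whose output is lost when a container is removed or recreated), I ship logs through Promtail → Loki.

All PHP services set LOG_CHANNEL=stderr, so Laravel writes its logs to the container's stderr stream. Docker captures that stream, and Promtail discovers the containers through the Docker socket and collects their output: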

# infra/promtail/promtail.yml
scrape_configs:
  - job_name: containers
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      # Docker reports container names with a leading slash ("/app"); strip it.
      - source_labels: [__meta_docker_container_name]
        regex: '/(.*)'
        target_label: container
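The scrape section above only tells Promtail which containers to read. To actually deliver the logs, the same file also needs a client pointing at Loki and a place to store read positions. A sketch of those top-level sections, assuming Loki is reachable under the service name loki on its default port 3100; the hostname and the positions path are assumptions.

```yaml
# infra/promtail/promtail.yml (remaining top-level sections; values are assumptions)
server:
  http_listen_port: 9080
  grpc_listen_port: 0

# Remembers how far each log stream has been read, so a Promtail restart
# does not resend everything.
positions:
  filename: /tmp/positions.yaml

# Where collected log lines are pushed.
clients:
  - url: http://loki:3100/loki/api/v1/push
```

For Docker service discovery to work, the Promtail container also needs /var/run/docker.sock mounted from the host.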

Grafana — dashboards

I created three dashboards:

1. Infrastructure Overview

  • CPU and RAM per container
  • Network I/O
  • Number of active containers

2. Laravel Application

  • HTTP requests per minute (via Nginx logs)
  • Response time (p50, p95, p99)
  • Error rate (5xx)
  • Redis cache hit rate

3. RabbitMQ

  • Number of messages in queues
  • Consumer throughput
  • Dead-letter queue size
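All three dashboards read from Prometheus and Loki, so both have to be registered in Grafana as data sources. One way to do that without clicking through the UI is a provisioning file, sketched here under the assumption that the services are reachable as prometheus:9090 and loki:3100 on the Compose network. The file path and both hostnames are assumptions.

```yaml
# infra/grafana/provisioning/datasources/datasources.yml (hypothetical path)
apiVersion: 1

datasources:
  # Metrics for the infrastructure, Laravel and RabbitMQ panels.
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

  # Logs shipped by Promtail.
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
```

Grafana loads files like this from its provisioning/datasources directory at startup, which keeps the data source setup in the repository next to the rest of the stack.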

Alerts (TODO)

Grafana allows setting up alerts based on PromQL queries. Planned alerts:

  • Error rate > 5% for 5 minutes
  • Container restart count > 3
  • Queue depth > 1000 messages
  • p95 response time > 2 seconds
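As a rough sketch, the planned thresholds could be written as PromQL expressions like the ones below. They are shown in Prometheus rule-file format because it is plain YAML; the same expr strings could serve as queries in Grafana alert rules. Every metric name is an assumption about which exporters are scraped (Traefik's built-in metrics, the RabbitMQ Prometheus plugin, cAdvisor), and the one-hour restart window is a placeholder, since the list above does not give one.

```yaml
# infra/prometheus/alerts.yml (hypothetical file; metric names are assumptions)
groups:
  - name: planned-alerts
    rules:
      # Share of 5xx responses above 5% for 5 minutes (Traefik metrics).
      - alert: HighErrorRate
        expr: |
          sum(rate(traefik_service_requests_total{code=~"5.."}[5m]))
            / sum(rate(traefik_service_requests_total[5m])) > 0.05
        for: 5m

      # Container restarted more than 3 times in the last hour
      # (assumes cAdvisor is scraped; the 1h window is a placeholder).
      - alert: ContainerRestarting
        expr: changes(container_start_time_seconds{name!=""}[1h]) > 3

      # More than 1000 messages waiting in a queue (RabbitMQ Prometheus plugin).
      - alert: QueueBacklog
        expr: rabbitmq_queue_messages > 1000

      # p95 response time above 2 seconds (Traefik latency histogram).
      - alert: SlowResponses
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(traefik_service_request_duration_seconds_bucket[5m]))) > 2
```

If the rules were evaluated by Prometheus itself rather than Grafana, the file would have to be listed under rule_files in prometheus.yml, and an Alertmanager would be needed to send notifications.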

For now the monitoring is read-only: I watch the dashboards, but nothing notifies me when something breaks. Setting up alerts is one of the tasks to complete before a full production deployment.
