CI/CD, Cloud Deployment & Observability


From industry scale to your first deployment

What we'll cover

01

Industry Reality

How the big players do it at scale

02

Our Deployed Stack

Docker Compose + Traefik + Observability on VPS

03

Putting It Together

Comparison & key takeaways

01

Industry Reality

What real production deployments look like

Architecture at Scale

Users / CDN
  ↓
Load Balancer / Ingress
  ↓
Service A · Service B · Service C
  ↓
Database · Cache · Queue

Monitoring / Observability

Microservices · Kubernetes · Service Mesh · Multi-region

Autoscaling

Handle traffic spikes automatically

Horizontal Scaling →

Add more instances to distribute load

Kubernetes HPA · KEDA

Vertical Scaling ↑

Make existing instances bigger (more CPU, RAM)

More CPU · More RAM

Cloud resources are rented, so you pay for whatever is running. Autoscaling saves money by scaling down when traffic drops, too.
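The core rule behind Kubernetes HPA horizontal scaling is documented as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). Below is a minimal Python sketch of that rule with its default 10% tolerance band. The `min_replicas`/`max_replicas` defaults (1 and 10) are hypothetical stand-ins for an HPA's `minReplicas`/`maxReplicas`. The real controller also applies stabilization windows and readiness handling, which this sketch leaves out.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Sketch of the HPA rule: ceil(current * currentMetric / targetMetric), clamped."""
    ratio = current_metric / target_metric
    # Close enough to target (within the tolerance band): leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Respect the configured floor and ceiling (hypothetical defaults above).
    return max(min_replicas, min(max_replicas, desired))

# 3 pods averaging 90% CPU against a 60% target: scale out to ceil(3 * 1.5) = 5
print(desired_replicas(3, 90, 60))
# 4 pods at 20% CPU against a 60% target: scale in to ceil(4 * 0.33) = 2
print(desired_replicas(4, 20, 60))
```

Scaling in is what produces the cost savings mentioned above. The same formula that adds pods under load removes them when the metric falls below target.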

Deployment Strategies

Ship code without breaking things

Strategy     How it works                                     Risk     Downtime
Blue/Green   Two identical envs; switch all traffic at once   Low      Zero
Canary       Gradually shift 5% → 25% → 100% of traffic       Low      Zero
Rolling      Replace instances one by one                     Medium   Zero
Recreate     Kill all old, start all new                      High     Yes

Blue / Green

v1.0 Blue · Current
v2.0 Green · New

Switch all traffic at once. Instant rollback — just switch back.
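The blue/green idea fits in a few lines. Both environments stay running, and only a pointer decides which one receives traffic. The following is a toy Python sketch, and the class and image names are hypothetical. In a real setup the "pointer" would be something like a load balancer target or a router rule.

```python
from dataclasses import dataclass

@dataclass
class BlueGreenRouter:
    """Toy router: both environments stay up; only the active pointer moves."""
    blue: str             # e.g. the live v1.0 deployment
    green: str            # e.g. the freshly deployed v2.0
    active: str = "blue"

    def target(self) -> str:
        # Every request goes to whichever environment is currently active.
        return self.blue if self.active == "blue" else self.green

    def switch(self) -> None:
        # Move 100% of traffic in one step; calling it again is the rollback.
        self.active = "green" if self.active == "blue" else "blue"

router = BlueGreenRouter(blue="web:v1.0", green="web:v2.0")
print(router.target())  # web:v1.0
router.switch()
print(router.target())  # web:v2.0
router.switch()         # rollback: v1.0 was never torn down
print(router.target())  # web:v1.0
```

Rollback is instant only because the old environment is kept running. The cost of that guarantee is paying for two full environments during the switch.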

Canary

Gradually shift traffic to the new version

Phase 1: v1 95% · v2 5%
Phase 2: v1 75% · v2 25%
Phase 3: v2 100% (fully rolled out)

If errors spike, roll back immediately. Only the canary slice of traffic is affected, so the blast radius stays small.
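The following Python sketch shows one common way to implement the canary phases above. Each user is hashed into a stable bucket from 0 to 99 so they keep seeing the same version, and a phase only advances while the canary's error rate stays under a threshold. The 1% threshold and all function names are hypothetical. Real setups usually do the weighting in the proxy or service mesh.

```python
import hashlib

PHASES = [5, 25, 100]     # % of traffic sent to v2, as on the slide
MAX_ERROR_RATE = 0.01     # hypothetical rollback threshold (1%)

def bucket(user_id: str) -> int:
    """Map a user to a stable bucket 0..99 so they keep seeing the same version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100

def route(user_id: str, canary_percent: int) -> str:
    # Users whose bucket falls below the canary share get the new version.
    return "v2" if bucket(user_id) < canary_percent else "v1"

def next_phase(current_percent: int, canary_error_rate: float) -> int:
    """Advance to the next phase, or drop to 0% (full rollback) if errors spike."""
    if canary_error_rate > MAX_ERROR_RATE:
        return 0
    later = [p for p in PHASES if p > current_percent]
    return later[0] if later else current_percent

users = [f"user-{i}" for i in range(10_000)]
share = sum(route(u, 25) == "v2" for u in users) / len(users)
print(f"{share:.1%} of users on v2")   # close to 25%
print(next_phase(5, 0.002))            # healthy canary: 25
print(next_phase(25, 0.05))            # error spike: 0 (roll back)
```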

The Three Pillars of Observability

📝

Logs

What happened?

Structured event records.
"User X got error 500 at 14:03"

📊

Metrics

How is it performing?

Numbers over time.
CPU, memory, request rate

🔍

Traces

Where did the request go?

Follow a request across services to find the bottleneck.
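The "Logs" pillar works best when each event is a structured record rather than free text. Below is a minimal sketch using Python's standard `logging` module (the app itself is Next.js, so Python here is only for illustration). It emits one JSON object per line. The field names `user_id`, `status` and `path` are hypothetical examples.

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        # Structured fields passed via extra={"fields": {...}} become top-level keys.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)

logger = logging.getLogger("web")
handler = logging.StreamHandler()          # writes to stderr, which Docker captures
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# "User X got error 500 at 14:03" as a queryable record instead of a sentence:
logger.error("request failed",
             extra={"fields": {"user_id": "X", "status": 500, "path": "/api/chat"}})
```

Because `status` is its own field, a log backend can filter on it (for example, all 500s for one user) instead of pattern-matching message text.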

Industry CI/CD Pipelines

  • Code — Developer pushes to Git
  • Build — Compile, bundle, create artifacts
  • Test — Unit, integration, E2E tests
  • Security Scan — SAST, dependency audit, container scan
  • Staging — Deploy to pre-production
  • Approval — Manual gate or automated checks
  • Production — Canary / Blue-Green deploy
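The pipeline stages above behave as a chain of gates. Each stage runs only if every earlier one passed, so a failing security scan means production is never touched. Here is a toy Python sketch of that control flow, in which each callable is a hypothetical stand-in for a real CI job.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]

def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order; the first failure blocks every later stage."""
    completed: List[str] = []
    for name, step in stages:
        if not step():
            print(f"[FAIL] {name}: pipeline stopped")
            break
        print(f"[ok]   {name}")
        completed.append(name)
    return completed

# Hypothetical results: the scan finds a vulnerable dependency.
stages: List[Stage] = [
    ("Build", lambda: True),
    ("Test", lambda: True),
    ("Security Scan", lambda: False),
    ("Staging", lambda: True),
    ("Approval", lambda: True),
    ("Production", lambda: True),
]
done = run_pipeline(stages)
print(done)  # ['Build', 'Test']
```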

ArgoCD · GitOps · Terraform · GHCR / ECR

That's the enterprise world.

But you don't need all of this to get started.


Let me show you something you can set up this weekend.

02

Our Deployed Stack

Docker Compose + Traefik + Observability on VPS

Why Docker?

The Problem

"It works on my machine" ™

  • Different OS versions
  • Different dependency versions
  • Missing system libraries
  • Configuration drift

The Solution

Package everything into a container

  • Same environment everywhere
  • Reproducible builds
  • Isolated dependencies
  • Ship it anywhere

Multi-Stage Builds

3-stage build: deps → build → run

Before: 1.2GB (full Node + Bun image)
After:  ~120MB (Next.js standalone + Alpine)

~90% smaller: faster CI/CD pushes and pulls, faster deploys, lower storage and transfer costs.

Our Actual Dockerfile


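# Stage 1: Install dependencies (Bun for speed)
FROM oven/bun:1.2.21-alpine AS deps
WORKDIR /app
COPY package.json bun.lock turbo.json ./
COPY apps/web/package.json apps/web/package.json
# A wildcard COPY does not keep directory structure (every package.json
# would land on ./packages/package.json), so copy the workspace packages whole.
COPY packages ./packages
RUN bun install --frozen-lockfile

# Stage 2: Build the Next.js app (requires output: "standalone" in next.config)
FROM deps AS builder
COPY . .
RUN bun run build

# Stage 3: Production runner (minimal)
FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production PORT=3000 HOSTNAME=0.0.0.0
RUN addgroup -S nodejs && adduser -S nextjs -G nodejs
# Standalone output: server.js plus only the traced node_modules
COPY --from=builder --chown=nextjs:nodejs \
  /app/apps/web/.next/standalone ./
# Standalone output excludes static assets; copy them next to the server
# (copy apps/web/public the same way if the app has one)
COPY --from=builder --chown=nextjs:nodejs \
  /app/apps/web/.next/static ./apps/web/.next/static
USER nextjs
EXPOSE 3000
CMD ["node", "apps/web/server.js"]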
        

Bun for fast installs · Next.js standalone output · Non-root user

compose.prod.yml — 10 services


# Abridged: image, volume and network settings omitted for most services
services:
  # ── Reverse Proxy ───────────────────
  traefik:               # Auto SSL + routing
    image: traefik:v3.6.1
    ports: ["80:80", "443:443"]

  # ── Application ─────────────────────
  web:                   # Next.js (from GHCR)
    image: ${WEB_IMAGE}
    depends_on: [postgres]

  postgres:              # Database
    image: postgres:16-alpine

  # ── Observability ───────────────────
  prometheus:            # Metrics collection
  loki:                  # Log aggregation
  alloy:                 # Log shipping (Grafana Alloy)
  grafana:               # Dashboards & visualization

  # ── System Metrics ──────────────────
  node-exporter:         # Host CPU, RAM, disk
  cadvisor:              # Container metrics
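Compose uses `depends_on` to decide start order, and a dependency is always started before its dependents. The following Python sketch reproduces that ordering with the standard library's `graphlib`. Only the `web` → `postgres` edge comes from the file above. The `alloy` and `grafana` edges are hypothetical, included for illustration.

```python
from graphlib import TopologicalSorter

# service -> services it depends_on (only web -> postgres is from compose.prod.yml)
depends_on = {
    "traefik": set(),
    "web": {"postgres"},
    "postgres": set(),
    "prometheus": set(),
    "loki": set(),
    "alloy": {"loki"},                    # hypothetical
    "grafana": {"prometheus", "loki"},    # hypothetical
    "node-exporter": set(),
    "cadvisor": set(),
}

# static_order() yields each service after all of its dependencies
# (and raises graphlib.CycleError if the graph has a cycle).
start_order = list(TopologicalSorter(depends_on).static_order())
print(start_order)
```

By default `depends_on` only waits for the dependency's container to start, not for it to be ready. To wait until Postgres actually accepts connections, the dependency needs a healthcheck and `condition: service_healthy`.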
        

Architecture — How it connects

🌐 Browser (HTTPS)
  ↓ ports 80 / 443
Traefik v3: reverse proxy + auto SSL
  ├─ app.domain.com      → web (Next.js)
  └─ grafana.domain.com  → Grafana
  ↓ internal network
PostgreSQL · Prometheus · Loki · Alloy
  ↑ scrape metrics
node-exporter · cadvisor

Two Docker networks: public (Traefik-facing) & internal (backend services)

Traefik v3

Cloud-native reverse proxy — replaces Nginx + Certbot

Auto SSL

Let's Encrypt certs
via HTTP-01 challenge.
Automatic renewal.

Docker Labels

Route traffic with
container labels.
No per-route config files.

Built-in Metrics

Exposes Prometheus
metrics once enabled:
request rates, latency.


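# Docker labels on the web service:
- "traefik.enable=true"   # needed when exposedByDefault=false
- "traefik.http.routers.web.rule=Host(`app.${DOMAIN}`)"
- "traefik.http.routers.web.tls.certresolver=letsencrypt"   # must match the resolver name in Traefik's config
- "traefik.http.services.web.loadbalancer.server.port=3000"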
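Conceptually, each `Host(...)` rule adds an entry to a routing table keyed by hostname, and the `loadbalancer.server.port` label says which container port to forward to. Below is a toy Python model of that lookup, not Traefik's implementation. The `web` entry mirrors the labels above. The `grafana` entry is an assumption (3000 is Grafana's default port).

```python
from typing import Dict, Optional, Tuple

Backend = Tuple[str, int]  # (service name, container port)

def build_routes(domain: str) -> Dict[str, Backend]:
    """Hypothetical routing table: one entry per Host(`...`) router rule."""
    domain = domain.lower()
    return {
        f"app.{domain}": ("web", 3000),          # from the labels above
        f"grafana.{domain}": ("grafana", 3000),  # assumed: Grafana's default port
    }

def route_request(host_header: str, routes: Dict[str, Backend]) -> Optional[Backend]:
    # Drop any ":port" suffix and compare case-insensitively (DNS names are).
    host = host_header.split(":", 1)[0].lower()
    # No matching router: Traefik answers 404.
    return routes.get(host)

routes = build_routes("example.com")
print(route_request("app.example.com", routes))      # ('web', 3000)
print(route_request("unknown.example.com", routes))  # None
```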
        

Our Observability Stack

Prometheus + Loki + Alloy + Grafana


Prometheus — Metrics

Scrapes metrics from the web app, Traefik, node-exporter, and cadvisor every 15s. Query with PromQL.


Loki — Logs

Like grep across all your containers. Indexes only labels (not log content) and stores the raw log lines. Query with LogQL.


Grafana Alloy — Collector

Collects Docker container logs via the Docker socket and ships them to Loki.


Grafana — Dashboards

Visualize logs & metrics in one place. Pre-provisioned with Prometheus + Loki data sources.
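The data Prometheus scrapes every 15s is plain text in the Prometheus exposition format. Each scrape target serves it over HTTP, usually at `/metrics`. The following Python sketch renders one counter in that format so the wire shape is visible. The metric name `http_requests_total` and its values are hypothetical. A Node app would normally use a client library such as prom-client instead of hand-writing this.

```python
from typing import Dict, List, Tuple, Union

Sample = Tuple[Dict[str, str], Union[int, float]]

def escape(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def render_counter(name: str, help_text: str, samples: List[Sample]) -> str:
    """Render one counter in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        pairs = ",".join(f'{k}="{escape(v)}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{pairs}}} {value}" if pairs else f"{name} {value}")
    return "\n".join(lines) + "\n"

print(render_counter(
    "http_requests_total", "Total HTTP requests handled.",
    [({"method": "GET", "status": "200"}, 1027), ({"method": "POST", "status": "500"}, 3)],
))
# ...
# http_requests_total{method="GET",status="200"} 1027
# http_requests_total{method="POST",status="500"} 3
```

Each unique label combination becomes its own time series in Prometheus. That is why labels like `status` are useful, while high-cardinality values such as user IDs should stay out of metric labels.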

Data Flow

Docker containers → Alloy (collector) → Loki (logs)
web · traefik · node-exporter · cadvisor → Prometheus (scrape)
Loki + Prometheus → Grafana

Logs and metrics side by side in Grafana. Example LogQL query: {stack="ai20k-demo"}
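A query like `{stack="ai20k-demo"}` works in two steps. The label selector picks whole streams using Loki's label index, and an optional line filter such as `|= "error"` then scans the raw lines of those streams. The following Python sketch mimics that two-step shape. The streams, the `container` label name, and the log lines are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Labels = FrozenSet[Tuple[str, str]]

# Hypothetical streams: each unique label set is one stream. Only the labels
# are indexed; the log lines themselves are just stored.
streams: Dict[Labels, List[str]] = {
    frozenset({("stack", "ai20k-demo"), ("container", "web")}): [
        "GET /api/health 200",
        "POST /api/chat error 500",
    ],
    frozenset({("stack", "ai20k-demo"), ("container", "traefik")}): [
        "router web matched host app.example.com",
    ],
    frozenset({("stack", "other"), ("container", "web")}): [
        "error from another stack",
    ],
}

def query(streams: Dict[Labels, List[str]], selector: Dict[str, str],
          contains: Optional[str] = None) -> List[str]:
    """Rough analogue of LogQL `{k="v"} |= "text"`."""
    results: List[str] = []
    for labels, lines in streams.items():
        label_map = dict(labels)
        # Step 1: the label selector picks streams (cheap, index lookup in Loki).
        if all(label_map.get(k) == v for k, v in selector.items()):
            # Step 2: the line filter scans raw lines of the chosen streams (grep-like).
            results.extend(line for line in lines if contains is None or contains in line)
    return results

print(query(streams, {"stack": "ai20k-demo"}, contains="error"))
# ['POST /api/chat error 500']
```

The design choice this illustrates is that indexing only labels keeps Loki cheap to run. The trade-off is that queries should narrow by labels first so the line scan stays small.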

GitHub Actions — Our Actual Workflow

Push to master → Build & push to GHCR → SSH deploy to VPS

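name: Build and Deploy VPS
on:
  push:
    branches: [master, main]

permissions:
  contents: read
  packages: write          # lets GITHUB_TOKEN push to GHCR

jobs:
  build:  # Job 1: Build image & push to GHCR
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3      # GHCR auth
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: meta
        uses: docker/metadata-action@v5    # Smart tagging
        with:
          images: ghcr.io/${{ github.repository }}/web
          tags: |                          # branch, sha-xxx, latest
            type=ref,event=branch
            type=sha
            type=raw,value=latest,enable={{is_default_branch}}
      - uses: docker/build-push-action@v6  # Build + push
        with:
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha             # reuse GHA layer cache
          cache-to: type=gha,mode=max

  deploy:  # Job 2: SSH into VPS & restart
    needs: build
    runs-on: ubuntu-latest
    steps:
      # assumes an SSH key and a "vps" host alias are set up on the runner (step omitted)
      - name: Pull and restart on VPS
        run: |
          ssh vps "cd /opt/ai20k-demo && \
            docker compose -f compose.prod.yml --env-file .env.prod pull web && \
            docker compose -f compose.prod.yml --env-file .env.prod up -d --remove-orphans"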
        

GitHub Container Registry

What is GHCR?

  • Container registry built into GitHub (free for public images)
  • Stores your Docker images
  • Private repos = private images
  • Built-in with GitHub Actions

Our image tags


# Our image on GHCR:
ghcr.io/hoangnb24/ai20k-demo/web:master
ghcr.io/hoangnb24/ai20k-demo/web:sha-abc123
ghcr.io/hoangnb24/ai20k-demo/web:latest

# Setting WEB_IMAGE to the sha-* tag ties each deploy to an exact commit
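The three tags above follow a simple pattern: the branch name, `sha-` plus a short commit hash, and `latest` on the default branch. The following Python sketch approximates that pattern. It is a simplification, not docker/metadata-action's actual implementation, and treating `master` as the default branch is an assumption.

```python
from typing import List

def image_tags(image: str, branch: str, sha: str, default_branch: str = "master") -> List[str]:
    """Simplified approximation of the tag set: branch, sha-<short>, and latest."""
    # Docker tags cannot contain "/", so a branch like feature/x becomes feature-x.
    branch_tag = branch.replace("/", "-")
    tags = [f"{image}:{branch_tag}", f"{image}:sha-{sha[:7]}"]
    # latest only moves when the default branch is built.
    if branch == default_branch:
        tags.append(f"{image}:latest")
    return tags

for tag in image_tags("ghcr.io/hoangnb24/ai20k-demo/web", "master", "abc1234f00d"):
    print(tag)
```

`latest` and the branch tag move with every push. The `sha-` tag never changes, which is what makes it useful for pinning and rolling back.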
            

No Docker Hub needed. Integrates seamlessly with your GitHub workflow.

Cloudflare DNS

Point subdomains to VPS — Traefik handles the rest

Type   Name      Content   Proxy status
A      app       VPS IP    DNS only
A      grafana   VPS IP    DNS only

DNS only (gray cloud) so Traefik's Let's Encrypt HTTP-01 challenge works. Port 80 must reach the VPS directly.
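Port 80 matters because of how HTTP-01 works (RFC 8555, section 8.3). Let's Encrypt fetches `http://<domain>/.well-known/acme-challenge/<token>` and expects the body to be the "key authorization", which is the token, a dot, and the base64url SHA-256 thumbprint (RFC 7638) of the ACME account key. Traefik computes and serves this automatically. The Python sketch below only shows what is being checked. The token and key values are hypothetical placeholders.

```python
import base64
import hashlib
import json

def b64url(data: bytes) -> str:
    # ACME/JOSE use base64url without "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

# Required JWK members per key type (RFC 7638, section 3.2).
REQUIRED_MEMBERS = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}

def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638: SHA-256 over the required members, keys sorted, no whitespace."""
    members = {k: jwk[k] for k in REQUIRED_MEMBERS[jwk["kty"]]}
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())

def key_authorization(token: str, account_jwk: dict) -> str:
    """Body served at /.well-known/acme-challenge/<token> for HTTP-01."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"

# Hypothetical placeholder values, not a real key or token.
account_jwk = {"kty": "EC", "crv": "P-256", "x": "example-x", "y": "example-y"}
print(key_authorization("example-token", account_jwk))
```

If a proxy or redirect in front of the VPS changes what that URL returns, validation fails and no certificate is issued. That is the reason for the gray-cloud setting.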

VPS Deployment Steps

  • Get a VPS: install Docker + Docker Compose
  • Clone the repo: git clone into /opt/ai20k-demo
  • Create .env.prod: domain, DB passwords, API keys
  • Set up Cloudflare DNS A records → VPS IP
  • If the GHCR image is private, run docker login ghcr.io on the VPS
  • docker compose -f compose.prod.yml --env-file .env.prod up -d
  • GitHub Actions auto-deploys on every push to master (the runner needs SSH access to the VPS)
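A missing variable in `.env.prod` usually surfaces as a confusing Compose warning or an empty image name. The following Python sketch is a small preflight check that can be run before `docker compose up`. `DOMAIN` and `WEB_IMAGE` are the two variables referenced in the snippets above. Any other required keys are yours to add. The parser handles only simple `KEY=VALUE` lines and is not a full reimplementation of Compose's env-file rules (no interpolation or multi-line values).

```python
from typing import Dict, List, Sequence

REQUIRED = ("DOMAIN", "WEB_IMAGE")  # referenced in compose.prod.yml; extend as needed

def parse_env(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, strips matching quotes."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env

def missing_keys(env: Dict[str, str], required: Sequence[str] = REQUIRED) -> List[str]:
    # Empty values count as missing.
    return [k for k in required if not env.get(k)]

sample = '# prod\nDOMAIN=example.com\nWEB_IMAGE="ghcr.io/hoangnb24/ai20k-demo/web:master"\n'
print(missing_keys(parse_env(sample)))  # []
```

On the VPS, the same two functions could be fed `Path("/opt/ai20k-demo/.env.prod").read_text()`, with the script exiting non-zero when the list is not empty.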

The Complete Picture

Developer pushes to master
  ↓
GitHub Actions: build & push
  ↓ image pushed
GHCR (ghcr.io/.../web:sha-xxx)
  ↓ SSH → pull & restart
VPS: Docker Compose + Traefik
  ↓ runs
web · postgres · grafana · prometheus + loki

03

Putting It Together

Industry vs Our Stack & Key Takeaways

Industry vs Our Deployed Stack

Aspect          Industry                   Our Stack
Orchestration   Kubernetes                 Docker Compose
CI/CD           GitOps + ArgoCD            GitHub Actions + SSH
Registry        ECR / private registries   GHCR (free)
Reverse Proxy   AWS ALB / Istio            Traefik v3
SSL             ACM / managed certs        Let's Encrypt (auto)
Observability   Datadog / Splunk           Prometheus + Loki + Grafana
Hosting         AWS / GCP / Azure          VPS + Cloudflare DNS
Cost            $$$$                       $5-10/month

Key Takeaways

  • Start simple. Docker Compose + VPS + Traefik is enough for many small production apps.
  • Observability from day 1. Prometheus + Loki + Grafana — free and powerful.
  • Automate deployments. GitHub Actions → GHCR → SSH deploy on every push.
  • Optimize your images. Multi-stage builds (Bun → build → Alpine runner).
  • Know the industry. Understand K8s and cloud — so you can grow into them.

Q & A


"When should I switch from Docker Compose to Kubernetes?"

"Why Traefik over Nginx?"

"How do you debug issues with Grafana + Loki?"

Learn More

The Full Stack

Everything we showed is in one repo

github.com/hoangnb24/ai20k-demo

Dockerfile · compose.prod.yml · deploy/ configs · GitHub Actions workflow (deploy-vps.yml)

Thank You


Now go deploy something.