From industry scale to your first deployment
01
Industry Reality
How the big players do it at scale
02
Our Deployed Stack
Docker Compose + Traefik + Observability on VPS
03
Putting It Together
Comparison & key takeaways
01
What real production deployments look like
Microservices · Kubernetes · Service Mesh · Multi-region
Handle traffic spikes automatically
Add more instances to distribute load
Kubernetes HPA · KEDA
Make existing instances bigger (more CPU, RAM)
More CPU · More RAM
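A horizontal-scaling rule like the HPA mentioned above can be sketched as follows — a minimal example, where the deployment name `web` and the 70% CPU threshold are assumptions, not values from this deck:

```yaml
# Hypothetical HPA: keep a "web" deployment between 2 and 10 replicas,
# adding instances whenever average CPU utilization crosses 70%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

KEDA works the same way but can scale on external signals (queue depth, HTTP traffic) instead of only CPU/memory.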
Ship code without breaking things
| Strategy | How It Works | Risk | Downtime |
|---|---|---|---|
| Blue / Green | Two identical envs — switch all traffic at once | Low | Zero |
| Canary | Gradually shift 5% → 25% → 100% of traffic | Low | Zero |
| Rolling | Replace instances one by one | Medium | Zero |
| Recreate | Kill all old, start all new | High | Yes |
Switch all traffic at once. Instant rollback — just switch back.
Gradually shift traffic to the new version
If errors spike → rollback immediately. Minimal blast radius.
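A rough canary split can even be done without Kubernetes. As a sketch using Traefik's weighted round-robin (dynamic file configuration; the service names and weights here are assumptions for illustration):

```yaml
# Hypothetical Traefik dynamic config: send ~5% of traffic to a canary.
http:
  services:
    web-weighted:
      weighted:
        services:
          - name: web-stable
            weight: 95
          - name: web-canary
            weight: 5
```

Shift the weights toward the canary as confidence grows; set it back to 0 to roll back.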
📝
What happened?
Structured event records.
"User X got error 500 at 14:03"
📊
How is it performing?
Numbers over time.
CPU, memory, request rate
🔍
Where did the request go?
Follow a request across services. Find the bottleneck
ArgoCD · GitOps · Terraform · GHCR / ECR
That's the enterprise world.
But you don't need all of this to get started.
Let me show you something you can set up this weekend.
02
Docker Compose + Traefik + Observability on VPS
"It works on my machine" ™
Package everything into a container
3-stage build: deps → build → run
Up to ~90% smaller images. Faster CI/CD. Faster deploys. Lower cost.
# Stage 1: Install dependencies (Bun for speed)
FROM oven/bun:1.2.21-alpine AS deps
WORKDIR /app
COPY package.json bun.lock turbo.json ./
COPY apps/web/package.json apps/web/package.json
COPY packages/*/package.json ./packages/
RUN bun install --frozen-lockfile
# Stage 2: Build the Next.js app
FROM deps AS builder
COPY . .
RUN bun run build
# Stage 3: Production runner (minimal)
FROM node:22-alpine AS runner
WORKDIR /app
RUN addgroup -S nodejs && adduser -S nextjs -G nodejs
# Standalone output includes server.js + a pruned node_modules
COPY --from=builder --chown=nextjs:nodejs \
  /app/apps/web/.next/standalone ./
# Static assets are NOT in the standalone folder — copy them too
COPY --from=builder --chown=nextjs:nodejs \
  /app/apps/web/.next/static ./apps/web/.next/static
USER nextjs
EXPOSE 3000
CMD ["node", "apps/web/server.js"]
Bun for fast installs · Next.js standalone output · Non-root user
services:
  # ── Reverse Proxy ───────────────────
  traefik:          # Auto SSL + routing
    image: traefik:v3.6.1
    ports: ["80:80", "443:443"]
  # ── Application ─────────────────────
  web:              # Next.js (from GHCR)
    image: ${WEB_IMAGE}
    depends_on: [postgres]
  postgres:         # Database
    image: postgres:16-alpine
  # ── Observability ───────────────────
  prometheus:       # Metrics collection
  loki:             # Log aggregation
  alloy:            # Log shipping (Grafana Alloy)
  grafana:          # Dashboards & visualization
  # ── System Metrics ──────────────────
  node-exporter:    # Host CPU, RAM, disk
  cadvisor:         # Container metrics
Two Docker networks: public (Traefik-facing) & internal (backend services)
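In Compose terms, that network split might look like this — a sketch, with the network names taken from the slide:

```yaml
# Hypothetical network layout: only Traefik bridges both networks.
networks:
  public:           # Traefik-facing; exposed to the internet
  internal:
    internal: true  # backend services; no direct external access
```

Postgres, Prometheus, and friends attach only to `internal`, so nothing reaches them except through Traefik-routed services.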
Cloud-native reverse proxy — replaces Nginx + Certbot
Let's Encrypt certs via HTTP-01 challenge. Zero-config renewal.
Route traffic with container labels. No config files needed.
Exposes Prometheus metrics automatically. Request rates, latency.
# Docker labels on the web service:
- "traefik.http.routers.web.rule=Host(`app.${DOMAIN}`)"
- "traefik.http.routers.web.tls.certresolver=letsencrypt"
- "traefik.http.services.web.loadbalancer.server.port=3000"
Prometheus + Loki + Alloy + Grafana
Scrapes metrics from the web app, Traefik, node-exporter, and cAdvisor every 15s. Query with PromQL.
"grep for your containers." Indexes only labels, not log content; stores raw log lines. Query with LogQL.
Collects Docker container logs via the Docker socket and ships them to Loki.
Visualize logs & metrics in one place. Pre-provisioned with Prometheus + Loki data sources.
Logs + Metrics unified in Grafana. Query: {stack="ai20k-demo"}
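As an illustration, queries against this stack might look like the following — the metric and label names are assumptions based on typical Traefik and Loki setups, not taken from this deployment:

```
# PromQL: p95 request latency from Traefik, per service
histogram_quantile(0.95,
  sum by (le, service) (
    rate(traefik_service_request_duration_seconds_bucket[5m])))

# LogQL: only error-level lines from the web container
{stack="ai20k-demo", container="web"} |= "error"
```

Both run from the same Grafana Explore view, which is the point: one UI for metrics and logs.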
name: Build and Deploy VPS
on:
  push:
    branches: [master, main]
jobs:
  build:            # Job 1: Build image & push to GHCR
    steps:
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3       # GHCR auth
      - uses: docker/metadata-action@v5    # Smart tagging
        # tags: branch, sha-xxx, latest
      - uses: docker/build-push-action@v6  # Build + push
        with: { cache-from: type=gha }     # GHA cache
  deploy:           # Job 2: SSH into VPS & restart
    needs: build
    steps:
      - name: Pull and restart on VPS
        run: |
          ssh vps "cd /opt/ai20k-demo && \
            docker compose pull web && \
            docker compose up -d --remove-orphans"
# Our image on GHCR:
ghcr.io/hoangnb24/ai20k-demo/web:master
ghcr.io/hoangnb24/ai20k-demo/web:sha-abc123
ghcr.io/hoangnb24/ai20k-demo/web:latest
# Deploy uses the SHA tag for traceability
No Docker Hub needed. Integrates seamlessly with your GitHub workflow.
Point subdomains to VPS — Traefik handles the rest
| Type | Name | Content | Proxy |
|---|---|---|---|
| A | app | VPS IP | DNS only |
| A | grafana | VPS IP | DNS only |
DNS only (gray cloud) so Traefik's Let's Encrypt HTTP-01 challenge works. Port 80 must reach the VPS directly.
git clone into /opt/ai20k-demo
.env.prod: domain, DB passwords, API keys
docker compose -f compose.prod.yml up -d
03
Industry vs Our Stack & Key Takeaways
| Aspect | Industry | Our Stack |
|---|---|---|
| Orchestration | Kubernetes | Docker Compose |
| CI/CD | GitOps + ArgoCD | GitHub Actions + SSH |
| Registry | ECR / Private | GHCR (free) |
| Reverse Proxy | AWS ALB / Istio | Traefik v3 |
| SSL | ACM / Managed certs | Let's Encrypt (auto) |
| Observability | Datadog / Splunk | Prometheus + Loki + Grafana |
| Hosting | AWS / GCP / Azure | VPS + Cloudflare DNS |
| Cost | $$$$ | $5-10/month |
"When should I switch from Docker Compose to Kubernetes?"
"Why Traefik over Nginx?"
"How do you debug issues with Grafana + Loki?"
Everything we showed is in one repo
Dockerfile · compose.prod.yml · deploy/ configs · GitHub Actions workflow
Now go deploy something.