CI/CD, Cloud Deployment & Observability

From industry scale to your first deployment

Overview

What we'll cover

Industry Reality

How the big players do it at scale

Our Deployed Stack

Docker Compose + Traefik + Observability on VPS

Putting It Together

Comparison & key takeaways

Industry Reality

What real production deployments look like

Part 01

Architecture at Scale

Users / CDN

↓

Load Balancer / Ingress

↓

Service A · Service B · Service C

↓

Database · Cache · Queue

↓

Monitoring / Observability

Microservices · Kubernetes · Service Mesh · Multi-region

Part 01

Autoscaling

Handle traffic spikes automatically

Horizontal Scaling →

Add more instances to distribute load

Kubernetes HPA · KEDA

Vertical Scaling ↑

Make existing instances bigger (more CPU, RAM)

More CPU · More RAM

Cloud resources = rent. Autoscaling saves money by scaling down too.
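As a rough illustration of horizontal autoscaling, the sketch below applies the ratio rule documented for the Kubernetes HPA: desired replicas = ceil(current replicas × current metric / target metric). The function name and the min/max bounds are assumptions made for this example, and the tolerance band and stabilization windows of the real controller are left out.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Replica count from the HPA ratio rule, clamped to [min, max]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 replicas at 90% CPU with a 60% target: scale out
    print(desired_replicas(4, 90.0, 60.0))
    # 4 replicas at 15% CPU with a 60% target: scale in, which saves rent
    print(desired_replicas(4, 15.0, 60.0))
```

The same rule scales in both directions, which is where the cost saving comes from.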

Part 01

Deployment Strategies

Ship code without breaking things

Strategy	How It Works	Risk	Downtime
Blue / Green	Two identical envs — switch all traffic at once	Low	Zero
Canary	Gradually shift 5% → 25% → 100% of traffic	Low	Zero
Rolling	Replace instances one by one	Medium	Zero
Recreate	Kill all old, start all new	High	Yes

Part 01 · Deployment

Blue / Green

v1.0 Blue · Current

⇄

v2.0 Green · New

Switch all traffic at once. Instant rollback — just switch back.
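A minimal sketch of the blue/green idea, assuming the "switch" is nothing more than a pointer to the active environment. The class and method names are hypothetical; in practice the pointer would be a load balancer target or a DNS record.

```python
class BlueGreenRouter:
    """Holds two environments and sends all traffic to exactly one of them."""

    def __init__(self, blue: str, green: str) -> None:
        self.environments = {"blue": blue, "green": green}
        self.active = "blue"  # blue serves traffic until the first switch

    def switch(self) -> str:
        """Flip all traffic to the idle environment and return its name."""
        self.active = "green" if self.active == "blue" else "blue"
        return self.active

    def route(self) -> str:
        """Return the version that currently receives traffic."""
        return self.environments[self.active]


if __name__ == "__main__":
    router = BlueGreenRouter(blue="v1.0", green="v2.0")
    router.switch()          # cut over to green
    print(router.route())
    router.switch()          # rollback is the same operation in reverse
    print(router.route())
```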

Part 01 · Deployment

Canary

Gradually shift traffic to the new version

Phase 1: 5% of traffic on v2

Phase 2: 25%

Phase 3: 100% (v2 fully rolled out)

If errors spike → rollback immediately. Minimal blast radius.
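The canary loop can be sketched as follows, assuming a caller-supplied function that reports the error rate observed at each traffic step. The names `run_canary` and `error_rate_at` and the 1% threshold are illustrative assumptions, not part of any specific tool.

```python
from typing import Callable, Sequence


def run_canary(steps: Sequence[int],
               error_rate_at: Callable[[int], float],
               max_error_rate: float = 0.01) -> tuple[str, int]:
    """Walk the traffic steps; roll back to 0% as soon as errors spike."""
    if not steps:
        raise ValueError("at least one traffic step is required")
    for percent in steps:
        # In a real rollout this would shift traffic, wait, then read metrics.
        if error_rate_at(percent) > max_error_rate:
            return ("rolled_back", 0)
    return ("promoted", steps[-1])


if __name__ == "__main__":
    healthy = lambda percent: 0.001
    print(run_canary([5, 25, 100], healthy))
```

Because the check runs at every step, a bad release is caught while only a small share of users is exposed to it.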

Part 01

The Three Pillars of Observability

📝

Logs

What happened?

Structured event records.
"User X got error 500 at 14:03"

📊

Metrics

How is it performing?

Numbers over time.
CPU, memory, request rate

🔍

Traces

Where did the request go?

Follow a request across services. Find the bottleneck.
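To make "structured event records" concrete, here is a small sketch that emits one JSON log line carrying a trace ID, so the log can be matched to the trace of the same request. The field names (`ts`, `level`, `message`, `trace_id`) are assumptions chosen for this example rather than a required schema.

```python
import json
from datetime import datetime, timezone


def log_event(level: str, message: str, trace_id: str, **fields) -> str:
    """Build one structured log line as a JSON string."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "trace_id": trace_id,  # lets a log line be joined to a trace
        **fields,
    }
    return json.dumps(record)


if __name__ == "__main__":
    print(log_event("error", "User X got error 500", "trace-abc", status=500))
```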

Part 01

Industry CI/CD Pipelines

Code — Developer pushes to Git
Build — Compile, bundle, create artifacts
Test — Unit, integration, E2E tests
Security Scan — SAST, dependency audit, container scan
Staging — Deploy to pre-production
Approval — Manual gate or automated checks
Production — Canary / Blue-Green deploy

ArgoCD · GitOps · Terraform · GHCR / ECR
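The stages above behave like a chain of gates: each one must pass before the next runs. A minimal sketch, assuming each stage is a function returning True or False (the stage keys and `run_pipeline` are names invented for this example):

```python
from typing import Callable

STAGES = ["code", "build", "test", "security_scan",
          "staging", "approval", "production"]


def run_pipeline(checks: dict[str, Callable[[], bool]]) -> list[str]:
    """Run stages in order and stop at the first one that fails."""
    completed = []
    for stage in STAGES:
        check = checks.get(stage, lambda: True)  # stages without a check pass
        if not check():
            break
        completed.append(stage)
    return completed


if __name__ == "__main__":
    # A failing security scan keeps the change out of staging and production.
    print(run_pipeline({"security_scan": lambda: False}))
```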

That's the enterprise world.

But you don't need all of this to get started.

Let me show you something you can set up this weekend.

Our Deployed Stack

Docker Compose + Traefik + Observability on VPS

Part 02 · Docker

Why Docker?

The Problem

"It works on my machine" ™

Different OS versions
Different dependency versions
Missing system libraries
Configuration drift

The Solution

Package everything into a container

Same environment everywhere
Reproducible builds
Isolated dependencies
Ship it anywhere

Part 02 · Docker

Multi-Stage Builds

3-stage build: deps → build → run

1.2GB

Full Node + Bun image

→

~120MB

Next.js standalone + Alpine

90% smaller. Faster CI/CD. Faster deploys. Less cost.

Part 02 · Docker

Our Actual Dockerfile


# Stage 1: Install dependencies (Bun for speed)
FROM oven/bun:1.2.21-alpine AS deps
WORKDIR /app
COPY package.json bun.lock turbo.json ./
COPY apps/web/package.json apps/web/package.json
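# Caution: a wildcard COPY flattens every match into ./packages/; copy each
# package's package.json to its own path if the workspace layout must be kept.
COPY packages/*/package.json ./packages/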
RUN bun install --frozen-lockfile

# Stage 2: Build the Next.js app
FROM deps AS builder
COPY . .
RUN bun run build

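# Stage 3: Production runner (minimal)
FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
RUN addgroup -S nodejs && adduser -S nextjs -G nodejs
COPY --from=builder --chown=nextjs:nodejs \
  /app/apps/web/.next/standalone ./
# Standalone output omits static assets, so copy them in separately
COPY --from=builder --chown=nextjs:nodejs \
  /app/apps/web/.next/static ./apps/web/.next/static
USER nextjs
CMD ["node", "apps/web/server.js"]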

Bun for fast installs · Next.js standalone output · Non-root user

Part 02 · Docker Compose

compose.prod.yml — 10 services (9 shown)


services:
  # ── Reverse Proxy ───────────────────
  traefik:               # Auto SSL + routing
    image: traefik:v3.6.1
    ports: ["80:80", "443:443"]

  # ── Application ─────────────────────
  web:                   # Next.js (from GHCR)
    image: ${WEB_IMAGE}
    depends_on: [postgres]

  postgres:              # Database
    image: postgres:16-alpine

  # ── Observability ───────────────────
  prometheus:            # Metrics collection
  loki:                  # Log aggregation
  alloy:                 # Log shipping (Grafana Alloy)
  grafana:               # Dashboards & visualization

  # ── System Metrics ──────────────────
  node-exporter:         # Host CPU, RAM, disk
  cadvisor:              # Container metrics

Part 02 · Docker Compose

Architecture — How it connects

🌐 Browser (HTTPS)

↓ ports 80 / 443

Traefik v3 — Reverse Proxy + Auto SSL

↓ app.domain.com → web (Next.js)
↓ grafana.domain.com → Grafana

↓ internal network

PostgreSQL · Prometheus · Loki · Alloy

↑ scrape metrics

node-exporter · cadvisor

Two Docker networks: public (Traefik-facing) & internal (backend services)

Part 02 · Reverse Proxy

Traefik v3

Cloud-native reverse proxy — replaces Nginx + Certbot

Auto SSL

Let's Encrypt certs
via HTTP-01 challenge.
Zero config renewal.

Docker Labels

Route traffic with
container labels.
No config files needed.

Built-in Metrics

Exposes Prometheus
metrics automatically.
Request rates, latency.


# Docker labels on the web service:
- "traefik.http.routers.web.rule=Host(`app.${DOMAIN}`)"
- traefik.http.routers.web.tls.certresolver=letsencrypt
- traefik.http.services.web.loadbalancer.server.port=3000
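The three labels follow a fixed naming pattern, so they can be generated per service. The helper below is a hypothetical sketch (the function name and the `example.com` host are assumptions); it only builds the label strings shown above and does not talk to Traefik.

```python
def traefik_labels(name: str, host: str, port: int,
                   resolver: str = "letsencrypt") -> list[str]:
    """Build the router, TLS and service labels for one container."""
    return [
        f"traefik.http.routers.{name}.rule=Host(`{host}`)",
        f"traefik.http.routers.{name}.tls.certresolver={resolver}",
        f"traefik.http.services.{name}.loadbalancer.server.port={port}",
    ]


if __name__ == "__main__":
    for label in traefik_labels("web", "app.example.com", 3000):
        print(f'- "{label}"')
```

Quoting every label when it is pasted into YAML, as the print loop does, avoids surprises with the backticks in the Host rule.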

Part 02 · Observability

Our Observability Stack

Prometheus + Loki + Alloy + Grafana

Prometheus — Metrics

Scrapes metrics from the web app, Traefik, node-exporter, and cadvisor every 15s. Query with PromQL.

Loki — Logs

"grep for your containers." Indexes only labels, not the full log text, and stores the log lines compressed. Query with LogQL.

Grafana Alloy — Collector

Collects Docker container logs via the Docker socket and ships them to Loki.

Grafana — Dashboards

Visualize logs & metrics in one place. Pre-provisioned with Prometheus + Loki data sources.
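When Prometheus scrapes a target, it reads plain text from an HTTP endpoint. The sketch below renders one counter in that text exposition format; the metric name and labels are made up for illustration, label-value escaping is omitted, and a real app would normally use a Prometheus client library instead of formatting by hand.

```python
def render_counter(name: str, help_text: str,
                   samples: list[tuple[dict[str, str], float]]) -> str:
    """Render a counter in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Sort labels so the output is stable between scrapes.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_counter(
        "http_requests_total",
        "Total HTTP requests",
        [({"method": "GET", "status": "200"}, 42)],
    ))
```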

Part 02 · Observability

Data Flow

Logs: Docker Containers → Alloy (collector) → Loki (logs)

Metrics: web · traefik · node-exporter · cadvisor ← Prometheus (scrape)

Dashboards: Loki + Prometheus → Grafana

Logs + Metrics unified in Grafana. Query: {stack="ai20k-demo"}
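The same LogQL selector can be sent to Loki's HTTP API outside Grafana. This sketch only builds the request URL for the `query_range` endpoint; the base URL `http://loki:3100` (the compose service name plus Loki's default port) and the optional line filter are assumptions for this example.

```python
from urllib.parse import urlencode


def loki_query_url(base_url: str, selector: str,
                   line_filter: str = "", limit: int = 100) -> str:
    """Build a Loki query_range URL for a label selector."""
    query = selector
    if line_filter:
        # |= keeps only log lines that contain the given text
        query = f'{selector} |= "{line_filter}"'
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"


if __name__ == "__main__":
    print(loki_query_url("http://loki:3100", '{stack="ai20k-demo"}', "error"))
```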

Part 02 · CI/CD

GitHub Actions — Our Actual Workflow

Push to master → Build & Push → GHCR → SSH Deploy to VPS


name: Build and Deploy VPS
on:
  push:
    branches: [master, main]

jobs:
  build:  # Job 1: Build image & push to GHCR
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write                      # allow pushing to GHCR
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3      # GHCR auth
      - uses: docker/metadata-action@v5    # Smart tagging
        # tags: branch, sha-xxx, latest
      - uses: docker/build-push-action@v6  # Build + push
        with: { cache-from: type=gha }     # GHA cache

  deploy:  # Job 2: SSH into VPS & restart
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Pull and restart on VPS
        # "vps" is an SSH host alias; key and host setup are omitted here
        run: |
          ssh vps "cd /opt/ai20k-demo && \
            docker compose -f compose.prod.yml pull web && \
            docker compose -f compose.prod.yml up -d --remove-orphans"

Part 02 · Registry

GitHub Container Registry

What is GHCR?

Free container registry by GitHub
Stores your Docker images
Private repos = private images
Built-in with GitHub Actions

Our image tags


# Our image on GHCR:
ghcr.io/hoangnb24/ai20k-demo/web:master
ghcr.io/hoangnb24/ai20k-demo/web:sha-abc123
ghcr.io/hoangnb24/ai20k-demo/web:latest

# Deploy uses the SHA tag for traceability

No Docker Hub needed. Integrates seamlessly with your GitHub workflow.
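The tagging scheme above (branch, short SHA, latest) can be sketched as a small function. This is an illustration of the naming convention only, assuming a 7-character short SHA and that `latest` is applied on the default branch; the workflow itself delegates tagging to docker/metadata-action.

```python
def image_tags(image: str, branch: str, commit_sha: str,
               default_branch: str = "master") -> list[str]:
    """Return the branch, short-SHA and (on the default branch) latest tags."""
    tags = [
        f"{image}:{branch}",
        f"{image}:sha-{commit_sha[:7]}",  # immutable tag used for traceability
    ]
    if branch == default_branch:
        tags.append(f"{image}:latest")
    return tags


if __name__ == "__main__":
    for tag in image_tags("ghcr.io/hoangnb24/ai20k-demo/web",
                          "master", "abc1234def5678"):
        print(tag)
```

Deploying the SHA tag rather than `latest` means the running container can always be traced back to one commit.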

Part 02 · DNS

Cloudflare DNS

Point subdomains to VPS — Traefik handles the rest

Type	Name	Content	Proxy
A	app	VPS IP	DNS only
A	grafana	VPS IP	DNS only

DNS only (gray cloud) so Traefik's Let's Encrypt HTTP-01 challenge works. Port 80 must reach the VPS directly.

Part 02 · Deployment

VPS Deployment Steps

Get a VPS — install Docker + Docker Compose
Clone the repo: git clone into /opt/ai20k-demo
Create .env.prod — domain, DB passwords, API keys
Set up Cloudflare DNS A records → VPS IP
docker compose -f compose.prod.yml up -d
GitHub Actions auto-deploys on every push to master
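Before the first `docker compose up`, it can help to check that .env.prod defines everything the stack expects. A minimal sketch follows; `DOMAIN` and `WEB_IMAGE` appear in the configs above, while `POSTGRES_PASSWORD` and the parsing rules are assumptions for this example.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: dict[str, str],
                 required=("DOMAIN", "WEB_IMAGE", "POSTGRES_PASSWORD")) -> list[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    sample = "DOMAIN=example.com\nWEB_IMAGE=ghcr.io/x/web:sha-1\n"
    print(missing_keys(parse_env(sample)))
```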

Part 02

The Complete Picture

Developer pushes to master

↓

GitHub Actions — Build & Push

↓ image pushed

GHCR (ghcr.io/.../web:sha-xxx)

↓ SSH → pull & restart

VPS — Docker Compose + Traefik

↓ runs

web · postgres · grafana · prometheus + loki

Putting It Together

Industry vs Our Stack & Key Takeaways

Part 03

Industry vs Our Deployed Stack

Aspect	Industry	Our Stack
Orchestration	Kubernetes	Docker Compose
CI/CD	GitOps + ArgoCD	GitHub Actions + SSH
Registry	ECR / Private	GHCR (free)
Reverse Proxy	AWS ALB / Istio	Traefik v3
SSL	ACM / Managed certs	Let's Encrypt (auto)
Observability	Datadog / Splunk	Prometheus + Loki + Grafana
Hosting	AWS / GCP / Azure	VPS + Cloudflare DNS
Cost	$$$$	$5-10/month

Part 03

Key Takeaways

Start simple. Docker Compose + VPS + Traefik is enough for production.
Observability from day 1. Prometheus + Loki + Grafana — free and powerful.
Automate deployments. GitHub Actions → GHCR → SSH deploy on every push.
Optimize your images. Multi-stage builds (Bun → build → Alpine runner).
Know the industry. Understand K8s and cloud — so you can grow into them.

Q & A

"When should I switch from Docker Compose to Kubernetes?"

"Why Traefik over Nginx?"

"How do you debug issues with Grafana + Loki?"

Resources

Learn More

Source Code

The Full Stack

Everything we showed is in one repo

github.com/hoangnb24/ai20k-demo ↗

Dockerfile · compose.prod.yml · deploy/ configs · GitHub Actions workflow

Dockerfile

compose.prod.yml

deploy-vps.yml

Thank You

Now go deploy something.
