Engineering

Docker, PM2, nginx — the small-team production stack that works

Not every team needs Kubernetes. For 1-5 person teams shipping to a VPS, the Docker + PM2 + nginx stack is boring, reliable, and cheap. Here's the exact config we run across twelve deployments, including the footguns and the fixes.

27 May 2026 · 10 min read · Krypton Forge Labs

We run twelve server-side deployments. All of them use some variation of Docker, PM2, and nginx on a single VPS. Not one of them runs Kubernetes. Not one of them has a dedicated DevOps person. The stack is boring, reliable, and costs roughly ₹500-2,000/month in compute per deployment.

This is the exact config we use, the decisions behind it, and the things that have broken in production so you can skip the part where you discover them at 2 AM.

Why not Kubernetes

Kubernetes is the right answer for teams that need: rolling updates with zero downtime, horizontal autoscaling, service mesh, multi-cloud, and a dedicated platform team to maintain it. That describes about 5% of the teams building SaaS in India.

For a 1-5 person team shipping a Next.js app, a Node.js API, and a Postgres database to production, Kubernetes adds complexity that is not justified by the problem. The Kubernetes migration can wait until the team grows past the point where SSH into a single server is a risk rather than a convenience.

Even then, consider Docker Swarm or Nomad before Kubernetes. They are simpler, smaller, and solve 80% of the orchestration problem with 20% of the complexity.

The stack, layer by layer

nginx sits at the edge. It terminates SSL (Let's Encrypt via certbot), proxies to the app, serves static assets, and provides rate limiting. One nginx instance per VPS, one server block per domain.

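# /etc/nginx/sites-available/paraslace

# Rate limiting: 30 req/s per IP. limit_req_zone is only valid in the
# http context, so it sits outside the server block (files in
# sites-enabled are included inside http {}).
limit_req_zone $binary_remote_addr zone=paraslace:10m rate=30r/s;

server {
    listen 443 ssl http2;
    server_name paraslace.in www.paraslace.in;

    ssl_certificate     /etc/letsencrypt/live/paraslace.in/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/paraslace.in/privkey.pem;

    # Allow bursts of up to 50 requests above the rate, served without delay
    limit_req zone=paraslace burst=50 nodelay;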

    location / {
        proxy_pass http://127.0.0.1:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Static assets — nginx serves these directly
    location /_next/static {
        alias /home/apps/paraslace/.next/static;
        expires 365d;
        add_header Cache-Control "public, immutable";
    }
}

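The limit_req zone stops a single misbehaving client from saturating the server. 30 requests/second is generous for a human user and restrictive for a scraper. The burst of 50 absorbs short spikes: with nodelay, up to 50 requests above the rate are served immediately, and anything beyond that is rejected (HTTP 503 by default).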
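
To make the rate and burst numbers concrete, here is a rough leaky-bucket model of limit_req with nodelay, written as a shell function around awk. It is a simplified approximation for building intuition, not nginx's actual implementation, and the name simulate_limit_req is made up for this sketch. It reads request arrival times in milliseconds for a single client and reports how many requests the model would accept.

```shell
#!/bin/sh
# Simplified leaky-bucket model of limit_req with nodelay (an approximation,
# not nginx's source code). Reads arrival times in milliseconds, one per line,
# for a single client and prints how many requests are accepted or rejected.
# Usage: simulate_limit_req RATE_PER_SECOND BURST < arrival_times
simulate_limit_req() {
    rate="$1"
    burst="$2"
    awk -v rate="$rate" -v burst="$burst" '
        # The first request from a client is always accepted.
        NR == 1 { last = $1; excess = 0; accepted++; next }
        {
            # Drain the bucket for the elapsed time, then add this request.
            e = excess - rate * ($1 - last) / 1000 + 1
            if (e < 0) e = 0
            # Over the burst allowance: nginx would answer 503 here.
            if (e > burst) { rejected++; next }
            excess = e; last = $1; accepted++
        }
        END { printf "accepted=%d rejected=%d\n", accepted, rejected }
    '
}
```

Feeding it 100 requests that all arrive in the same millisecond, with a rate of 30 and a burst of 50, prints accepted=51 rejected=49 under this model: the first request plus the 50-request burst. A client that sends one request every 100 ms never builds up any excess and has every request accepted.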

PM2 manages the Node.js processes. It restarts them when they crash, rotates logs, and provides a simple CLI for status and restart. We run PM2 in cluster mode for CPU-bound apps (one process per core) and fork mode for I/O-bound apps.

# pm2 start
pm2 start npm --name "paraslace" -- run start -- -p 3001
pm2 save
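pm2 startup  # generates the systemd unit for restart-on-boot

The systemd hook is critical. Without it, a server reboot leaves your app down until someone notices. pm2 startup generates the systemd unit (run as a non-root user, it prints a sudo command that you have to run yourself), and pm2 save records the process list that the unit restores on boot.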

Docker contains everything that is not our own app code: Postgres, n8n, Redis, PgBouncer. Each stack gets a docker-compose.yml. The app itself runs directly on the host because, in our experience, Node.js + PM2 gives us better observability and restart behaviour than a containerised Node process.

# docker-compose.yml — n8n + Postgres + Redis
version: "3.8"
services:
  postgres:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_USER: n8n
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: n8n
    volumes:
      - pgdata:/var/lib/postgresql/data
    ports:
      - "127.0.0.1:5432:5432"

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --appendonly yes
    volumes:
      - redisdata:/data

  n8n:
    image: n8nio/n8n:latest
    restart: unless-stopped
    environment:
      N8N_HOST: n8n.kryptonforge.in
      N8N_PORT: 5678
      N8N_PROTOCOL: https
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_DATABASE: n8n
      DB_POSTGRESDB_USER: n8n
      DB_POSTGRESDB_PASSWORD: ${DB_PASSWORD}
    ports:
      - "127.0.0.1:5678:5678"
    depends_on:
      - postgres
      - redis

volumes:
  pgdata:
  redisdata:

Key decisions in this compose file: restart: unless-stopped means Docker will restart the container unless you explicitly stop it. Containers bind to 127.0.0.1, not 0.0.0.0 — nginx is the only thing exposed to the internet. The database password comes from an environment variable, not the compose file. The compose file lives in git; the .env file does not.
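
Because the .env file stays out of git, it has to be created by hand on each server. A minimal sketch of one way to do that, assuming /dev/urandom, head, and od are available; the function name write_env_file is hypothetical and not part of Docker or Compose.

```shell
#!/bin/sh
# Create a .env file holding a random DB_PASSWORD, readable only by its owner.
# Refuses to overwrite an existing file, so re-running it never rotates the
# password out from under a database that was initialised with the old one.
# Usage: write_env_file [PATH]   (PATH defaults to ./.env)
write_env_file() {
    env_path="${1:-.env}"
    if [ -e "$env_path" ]; then
        echo "refusing to overwrite $env_path" >&2
        return 1
    fi
    # 24 random bytes rendered as 48 hex characters
    password=$(head -c 24 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # The subshell keeps the restrictive umask from leaking into the caller
    ( umask 077 && printf 'DB_PASSWORD=%s\n' "$password" > "$env_path" )
}
```

Run it once in the directory that holds docker-compose.yml, and list .env in .gitignore so it cannot be committed by accident.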

The deployment pipeline

We do not use CI/CD pipelines that deploy on push. For a small team, that is more risk than benefit. Instead:

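#!/bin/bash
# deploy.sh: the deploy script we run manually
set -euo pipefail

cd /home/apps/paraslace
git pull --ff-only origin main

# Install devDependencies too: the build step usually needs them
npm ci
npm run build

# Reload the app. Zero downtime in cluster mode; in fork mode this is a restart
pm2 reload paraslace

echo "Deployed at $(date)"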

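pm2 reload is what keeps deploys smooth. In cluster mode it replaces the instances one at a time, so at least one process is always serving requests and users see no downtime. A process started in fork mode (which is what pm2 start npm gives you) cannot be reloaded this way, and pm2 reload falls back to a plain restart with a brief gap. A broken build never reaches the reload step, because set -e aborts the script when npm run build fails.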

We deploy to staging first (a separate VPS), run a smoke test, then deploy to production. The entire flow takes about three minutes for a typical Next.js app.
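
The smoke test can be as small as polling a health endpoint until it answers. A sketch under stated assumptions: the function name, retry count, and delay are illustrative choices, not a copy of any production script.

```shell
#!/bin/sh
# Poll a URL until it returns a successful HTTP status, or give up.
# Usage: wait_for_healthy URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_healthy() {
    url="$1"
    attempts="${2:-10}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl exit non-zero on HTTP errors (4xx/5xx)
        if curl -fsS -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "health check failed after $attempts attempts: $url" >&2
    return 1
}
```

In a deploy script this could run right after the reload, for example wait_for_healthy http://127.0.0.1:3001/api/health || exit 1, so that a deploy which comes up unhealthy fails loudly instead of quietly.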

What has broken in production

Disk full from Docker logs. Docker's default json-file log driver accumulates logs indefinitely. On a small VPS with a 40 GB disk, a chatty n8n instance filled the disk in six months. The fix:

# docker-compose.yml — per-service logging limits
services:
  n8n:
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "3"
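
Before adding limits, it helps to see which containers are actually eating the disk. A small sketch, assuming Docker's default data root of /var/lib/docker and the json-file driver, which writes one <container-id>-json.log per container. The function name is hypothetical, and reading that directory normally requires root.

```shell
#!/bin/sh
# List the largest Docker json-file logs, biggest first (sizes in KiB).
# Usage: largest_container_logs [CONTAINERS_DIR] [COUNT]
largest_container_logs() {
    containers_dir="${1:-/var/lib/docker/containers}"
    count="${2:-5}"
    find "$containers_dir" -type f -name '*-json.log' -exec du -k {} + 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}
```

Logging options only take effect when a container is created, so the service has to be recreated after the limits are added; an existing container keeps its old, unlimited log file.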

Next.js memory leaks under PM2. Next.js 14 and 15 had a known issue where the dev server leaked memory over time. Next.js 16 in production mode has been stable for us, but we still set PM2's memory limit and auto-restart as a safety net:

pm2 start npm --name "paraslace" --max-memory-restart 500M -- run start -- -p 3001

nginx client_max_body_size too small. The default is 1 MB. An API endpoint that accepts file uploads fails with a 413 error and a cryptic nginx page. Set it in the server block:

client_max_body_size 50m;

Let's Encrypt auto-renewal silently failing. certbot installs a cron job or systemd timer that renews certificates. If port 80 is blocked by a firewall, or if the nginx config has a redirect that interferes with the ACME challenge, renewal fails and you discover it when the certificate expires. We now run a weekly check:

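#!/bin/sh
# /etc/cron.weekly/ssl-check (must be executable)
# Prints the expiry date of the certificate nginx is actually serving
openssl s_client -connect paraslace.in:443 -servername paraslace.in </dev/null 2>/dev/null \
  | openssl x509 -noout -enddate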
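
Printing the date still relies on someone reading it. A sketch of turning that line into a number you can alert on, assuming GNU date (the -d flag), as found on most Linux servers. The function name days_until_expiry is illustrative.

```shell
#!/bin/sh
# Convert the "notAfter=..." line printed by `openssl x509 -noout -enddate`
# into whole days remaining. Requires GNU date for the -d flag.
# Usage: days_until_expiry "notAfter=Jun  1 12:00:00 2026 GMT" [NOW_EPOCH]
days_until_expiry() {
    # Strip the "notAfter=" prefix, leaving just the date string
    end_date="${1#notAfter=}"
    end_epoch=$(date -u -d "$end_date" +%s) || return 2
    # NOW_EPOCH is optional and mainly useful for testing
    now_epoch="${2:-$(date -u +%s)}"
    echo $(( (end_epoch - now_epoch) / 86400 ))
}
```

Pass it the output of the openssl pipeline above and compare the result against a threshold of your choosing, such as 14 days. A healthy renewal setup should never let the number drop that low, so a low value means renewal is broken.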

The monitoring that costs nothing

We do not run Datadog. We do not run Grafana. We run three things:

  1. PM2 monitoring: pm2 monit gives CPU, memory, and restart counts. Good enough for daily checks.
  2. nginx access logs: tail -f /var/log/nginx/access.log filtered through goaccess for basic analytics.
  3. Health check cron: a simple curl against /api/health every minute, which alerts a WhatsApp group if it fails three times in a row.

# /etc/cron.d/healthcheck
* * * * * root /usr/local/bin/healthcheck.sh

The health check script is 15 lines of bash. It has caught more production issues than every paid monitoring tool we have ever trialled.
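
A hypothetical sketch of such a script, to show how the three-in-a-row logic can be tracked across cron runs with a small state file. The state file path, the port in the default URL, and the send_alert placeholder (which only writes to syslog via logger) are all assumptions; swap in whatever actually messages your team.

```shell
#!/bin/sh
# Hypothetical healthcheck.sh: alert once after THRESHOLD consecutive failures.
HEALTH_URL="${HEALTH_URL:-http://127.0.0.1:3001/api/health}"
STATE_FILE="${STATE_FILE:-/var/tmp/healthcheck.failures}"
THRESHOLD="${THRESHOLD:-3}"

send_alert() {
    # Placeholder: replace with the command that notifies your team
    logger -t healthcheck "$1"
}

check_once() {
    # Cron starts a fresh process every minute, so the running count of
    # failures has to live in a file between runs.
    failures=$(cat "$STATE_FILE" 2>/dev/null || true)
    failures="${failures:-0}"
    if curl -fsS -o /dev/null --max-time 10 "$HEALTH_URL"; then
        failures=0
    else
        failures=$((failures + 1))
        # Fire only when the threshold is first reached, not every minute after
        if [ "$failures" -eq "$THRESHOLD" ]; then
            send_alert "$HEALTH_URL failed $failures checks in a row"
        fi
    fi
    echo "$failures" > "$STATE_FILE"
}
```

A real script would end with a call to check_once. Alerting only when the counter first reaches the threshold means a long outage produces one message, not one per minute, and the counter resets as soon as a check succeeds.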

When this stack stops working

This stack works for deployments up to about 50-100 concurrent users, a few hundred requests per second, and a team of 1-5 engineers. Beyond that:

  • The single VPS becomes a scaling bottleneck. You need to split the database, the app, and the background workers onto separate machines.
  • PM2's cluster mode hits diminishing returns beyond 4-8 cores on a single machine.
  • Manual deployments become a coordination problem when multiple people can deploy.

At that point, you have a good problem. The business is growing. Migrate to the infrastructure that matches the scale. But do not pre-emptively build a Kubernetes cluster for 10 users. The stack above will serve you for longer than you think.

Start boring. Ship working. Add complexity when the traffic demands it, not because a conference talk made Docker Compose feel inadequate.

Tags

  • docker
  • pm2
  • nginx
  • devops
  • deployment
  • infrastructure