Docker, PM2, nginx — the small-team production stack that works
Not every team needs Kubernetes. For 1-5 person teams shipping to a VPS, the Docker + PM2 + nginx stack is boring, reliable, and cheap. Here's the exact config we run across twelve deployments, including the footguns and the fixes.
We run twelve server-side deployments. All of them use some variation of Docker, PM2, and nginx on a single VPS. Not one of them runs Kubernetes. Not one of them has a dedicated DevOps person. The stack is boring, reliable, and costs roughly ₹500-2,000/month in compute per deployment.
This is the exact config we use, the decisions behind it, and the things that have broken in production so you can skip the part where you discover them at 2 AM.
Why not Kubernetes
Kubernetes is the right answer for teams that need rolling updates with zero downtime, horizontal autoscaling, a service mesh, or multi-cloud deployments, and that have a dedicated platform team to maintain it. That describes perhaps 5% of the teams building SaaS in India.
For a 1-5 person team shipping a Next.js app, a Node.js API, and a Postgres database to production, Kubernetes adds complexity the problem does not justify. The migration can wait until the team grows past the point where SSHing into a single server is a risk rather than a convenience.
Even then, consider Docker Swarm or Nomad before Kubernetes. They are simpler, smaller, and solve 80% of the orchestration problem with 20% of the complexity.
The stack, layer by layer
nginx sits at the edge. It terminates SSL (Let's Encrypt via certbot), proxies requests to the app, serves static assets, and applies rate limiting. We run one nginx instance per VPS, with a separate server block for each domain.
# /etc/nginx/sites-available/paraslace
# limit_req_zone is only valid in the http context. Site files are included
# inside http {} by nginx.conf, so it goes at the top level, outside server {}.
limit_req_zone $binary_remote_addr zone=paraslace:10m rate=30r/s;

server {
    listen 443 ssl http2;   # nginx >= 1.25.1 prefers "listen 443 ssl;" plus "http2 on;"
    server_name paraslace.in www.paraslace.in;

    ssl_certificate     /etc/letsencrypt/live/paraslace.in/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/paraslace.in/privkey.pem;

    # Rate limiting: 30 req/s per IP, burst of 50
    limit_req zone=paraslace burst=50 nodelay;

    location / {
        proxy_pass http://127.0.0.1:3001;
        proxy_http_version 1.1;
        # WebSocket upgrade support
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Static assets: nginx serves these directly
    location /_next/static {
        alias /home/apps/paraslace/.next/static;
        expires 365d;
        add_header Cache-Control "public, immutable";
    }
}
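The limit_req zone stops a single misbehaving client from saturating the server. Thirty requests per second is generous for a human user and restrictive for a scraper. The burst of 50 absorbs short spikes, and nodelay serves those burst requests immediately instead of queueing them. Anything beyond the burst is rejected with a 503 by default (limit_req_status 429 changes the code).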
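To see how rate=30r/s and burst=50 interact, here is a simplified bash model of the per-IP leaky-bucket bookkeeping limit_req performs. It is a sketch under stated assumptions, not nginx's code: it assumes nodelay, a single client IP, and request arrival times given in milliseconds on stdin, and simulate_limit_req is a hypothetical name.

```shell
#!/usr/bin/env bash
# Simplified model of nginx limit_req with nodelay (NOT nginx's actual code).
# Assumption: modeled on ngx_http_limit_req_module's bookkeeping, where
# "excess" is kept in thousandths of a request and drains at RATE per second.
RATE=30    # rate=30r/s
BURST=50   # burst=50

# Reads ascending arrival times (ms) from stdin; prints accepted/rejected counts.
simulate_limit_req() {
  local excess=0 last_ms="" accepted=0 rejected=0 t e
  while read -r t; do
    if [[ -z "$last_ms" ]]; then
      # First request from this IP: the bucket starts empty
      last_ms=$t
      accepted=1
      continue
    fi
    # 30 r/s drains 30 thousandths of a request per elapsed millisecond;
    # each new request adds one whole request (1000 thousandths)
    e=$(( excess - RATE * (t - last_ms) + 1000 ))
    if (( e < 0 )); then e=0; fi
    if (( e > BURST * 1000 )); then
      rejected=$(( rejected + 1 ))   # nginx would answer 503 here
    else
      excess=$e
      last_ms=$t
      accepted=$(( accepted + 1 ))
    fi
  done
  echo "accepted=$accepted rejected=$rejected"
}
```

In this model, 100 simultaneous requests from one IP (`for i in $(seq 1 100); do echo 0; done | simulate_limit_req`) let 51 through, the first plus the burst of 50, and reject the other 49. A client holding steady just under 30 requests per second (`seq 0 34 6766 | simulate_limit_req`) is never limited.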
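PM2 manages the Node.js processes. It restarts them when they crash and provides a simple CLI for status, logs, and restarts. Log rotation needs the pm2-logrotate module (pm2 install pm2-logrotate), since PM2 does not rotate logs on its own. We run PM2 in cluster mode for CPU-bound apps (one process per core) and fork mode for I/O-bound apps. Cluster mode needs PM2 to launch the Node.js entry script directly; an app started through npm, as below, runs in fork mode.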
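# pm2 start (fork mode: PM2 supervises npm, which runs "next start -p 3001")
pm2 start npm --name "paraslace" -- run start -- -p 3001
pm2 save       # record the current process list
pm2 startup    # generate the systemd unit for restart-on-boot; if not root, run the sudo command it prints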
The systemd hook is critical. Without it, a server reboot leaves your app down until someone notices. The unit that pm2 startup installs restores whatever process list pm2 save last recorded, so re-run pm2 save after adding or removing an app.
Docker runs the supporting services: Postgres, n8n, Redis, PgBouncer. Each stack gets its own docker-compose.yml. The app itself runs directly on the host because Node.js + PM2 gives us better observability and restart behaviour than a containerised Node process.
# docker-compose.yml: n8n + Postgres + Redis
version: "3.8"
services:
  postgres:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_USER: n8n
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: n8n
    volumes:
      - pgdata:/var/lib/postgresql/data
    ports:
      - "127.0.0.1:5432:5432"

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --appendonly yes
    volumes:
      - redisdata:/data

  n8n:
    image: n8nio/n8n:latest   # pin a specific version tag for reproducible deploys
    restart: unless-stopped
    environment:
      N8N_HOST: n8n.kryptonforge.in
      N8N_PORT: 5678
      N8N_PROTOCOL: https
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_DATABASE: n8n
      DB_POSTGRESDB_USER: n8n
      DB_POSTGRESDB_PASSWORD: ${DB_PASSWORD}
    volumes:
      # Persists n8n's encryption key; without it, recreating the container
      # leaves the credentials stored in Postgres undecryptable
      - n8ndata:/home/node/.n8n
    ports:
      - "127.0.0.1:5678:5678"
    depends_on:   # start order only; does not wait for Postgres to accept connections
      - postgres
      - redis     # n8n only talks to Redis in queue mode (EXECUTIONS_MODE=queue)

volumes:
  pgdata:
  redisdata:
  n8ndata:
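Key decisions in this compose file: restart: unless-stopped means Docker restarts the container after a crash or a daemon restart (including a reboot) unless you explicitly stopped it. Published ports bind to 127.0.0.1, not 0.0.0.0, so nginx is the only thing exposed to the internet. This matters because Docker writes its own iptables rules, so a port published on 0.0.0.0 can be reachable even when ufw blocks it. The database password comes from an environment variable, not the compose file: the compose file lives in git; the .env file does not.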
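A quick way to confirm that only nginx (and SSH) faces the internet is to list TCP listeners that are not bound to loopback. This sketch assumes the iproute2 ss tool and its `ss -ltnH` output, where field 4 is the local address:port; find_exposed_ports and the allow-list are hypothetical. Docker-published ports typically appear here through docker-proxy while Docker's userland proxy is enabled, which is the default.

```shell
#!/usr/bin/env bash
# Print TCP listeners reachable from outside the host.
# Assumption: stdin is `ss -ltnH` output (no header); field 4 is address:port.
# Arguments are ports that are allowed to be public (e.g. 22 80 443).
find_exposed_ports() {
  local allowed=" $* " addr port
  while read -r _ _ _ addr _; do
    case "$addr" in
      127.*|'[::1]':*) continue ;;   # loopback only: not reachable externally
    esac
    port=${addr##*:}                 # everything after the last colon
    if [[ "$allowed" == *" $port "* ]]; then
      continue                       # explicitly allowed public port
    fi
    echo "$addr"
  done
}
```

Run it as `ss -ltnH | find_exposed_ports 22 80 443`; any line it prints is a port worth investigating.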
The deployment pipeline
We do not use CI/CD pipelines that deploy on push. For a small team, that is more risk than benefit. Instead, we run a deploy script by hand:
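#!/bin/bash
# deploy.sh: the deploy script we run manually
set -euo pipefail

cd /home/apps/paraslace
git pull --ff-only origin main   # refuse to create a merge commit on the server
npm ci                           # full install: the build needs devDependencies
npm run build                    # set -e stops the script here if the build fails

# Graceful reload: zero downtime in cluster mode; in fork mode it is a restart
pm2 reload paraslace
echo "Deployed at $(date)"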
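The pm2 reload is the magic, with one condition: the app must run in cluster mode. There, reload starts new instances, waits for them to begin listening, then stops the old ones, so users see no downtime. In fork mode, which is what pm2 start npm gives you, reload falls back to a plain restart and there is a brief gap. The main failure mode is a broken build, and set -e catches it: if npm run build fails, the script stops before the reload. One caveat: the build rewrites the same .next directory the running app serves from, so a failed build can leave a broken .next behind for the next crash-restart.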
We deploy to staging first (a separate VPS), run a smoke test, then deploy to production. The entire flow takes about three minutes for a typical Next.js app.
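The smoke test can be as small as a loop of curl calls that expects HTTP 200 from a handful of paths. This is a hypothetical sketch, not the script we run: smoke_test, the staging URL, and the path list are placeholders, and it assumes curl is installed.

```shell
#!/usr/bin/env bash
# Minimal smoke test: every given path must answer HTTP 200 within 10 seconds.
# Usage (hypothetical URL): smoke_test https://staging.example.in / /api/health
smoke_test() {
  local base=$1 path code failed=0
  shift
  for path in "$@"; do
    # -w prints the status code; curl prints 000 when it gets no response
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$base$path" || true)
    if [[ "$code" != "200" ]]; then
      echo "FAIL $base$path -> $code" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Run it against staging once deploy.sh finishes there, and deploy to production only if it returns 0.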
What has broken in production
Disk full from Docker logs. Docker's default json-file log driver accumulates logs indefinitely. On a small VPS with a 40 GB disk, a chatty n8n instance filled the disk in six months. The fix:
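# docker-compose.yml: per-service logging limits (caps each container at 3 x 50 MB)
# Takes effect when the container is recreated, e.g. by docker compose up -d
services:
  n8n:
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "3"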
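Log limits fix this particular cause; a cheap disk-usage check catches whatever fills the disk next. This sketch assumes POSIX `df -P` output (use percentage in field 5, mount point in field 6); disk_over_threshold is a hypothetical name, and its output could feed whatever alerting you already have.

```shell
#!/usr/bin/env bash
# Print "mountpoint use%" for every filesystem at or above the given percentage.
# Assumption: stdin is `df -P` output; the header line is skipped.
# Mount points containing spaces are not handled.
disk_over_threshold() {
  local limit=$1
  awk -v limit="$limit" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)          # "95%" -> "95"
    if (use + 0 >= limit) print $6, $5
  }'
}
```

For example, `df -P | disk_over_threshold 85` from a daily cron job prints nothing until a filesystem crosses 85%.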
PM2 memory leaks in Next.js. Next.js 14 and 15 had a known issue where the dev server leaked memory over time. Next.js 16 in production mode has been stable for us, but we still set PM2's memory limit and auto-restart as a safety net:
pm2 start npm --name "paraslace" --max-memory-restart 500M -- run start -- -p 3001
nginx client_max_body_size too small. The default is 1 MB. An API endpoint that accepts file uploads fails with a 413 error and a cryptic nginx page. Set it in the server block:
client_max_body_size 50m;
Let's Encrypt auto-renewal silently failing. certbot renews certificates from a scheduled job (a cron entry or a systemd timer, depending on how it was installed). If port 80 is blocked by a firewall, or an nginx redirect interferes with the HTTP-01 ACME challenge, renewal fails and you discover it when the certificate expires. certbot renew --dry-run exercises the full renewal path without replacing anything. We now also run a weekly check:
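#!/bin/sh
# /etc/cron.weekly/ssl-check: prints the live certificate's notAfter date
# (must be executable; run-parts skips file names containing a dot)
openssl s_client -connect paraslace.in:443 -servername paraslace.in </dev/null 2>/dev/null \
  | openssl x509 -noout -enddate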
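Printing the expiry date still relies on someone reading it. `openssl x509 -checkend` turns it into pass/fail: it exits non-zero if the certificate expires within the given number of seconds. This sketch warns when fewer than 14 days remain; the function names and the 14-day threshold are assumptions, and timeout (from coreutils) caps a hung connection.

```shell
#!/usr/bin/env bash
# Succeeds if the PEM certificate on stdin stays valid for at least $1 more days.
cert_valid_for_days() {
  openssl x509 -noout -checkend $(( $1 * 86400 )) >/dev/null
}

# Fetch the live certificate for $1 and warn if it expires within $2 days.
check_domain() {
  local domain=$1 days=${2:-14}
  if ! timeout 15 openssl s_client -connect "$domain:443" -servername "$domain" \
        </dev/null 2>/dev/null | cert_valid_for_days "$days"; then
    echo "WARNING: certificate for $domain expires within $days days (or could not be fetched)" >&2
    return 1
  fi
}

# Run the check only when executed directly (e.g. by run-parts from cron.weekly)
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  check_domain paraslace.in 14
fi
```

A failed fetch also triggers the warning, which is deliberate: a certificate you cannot retrieve deserves the same attention as one about to expire.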
The monitoring that costs nothing
We do not run Datadog. We do not run Grafana. We run three things:
- PM2 monitoring: pm2 monit gives CPU, memory, and restart counts. Good enough for daily checks.
- nginx access logs: tail -f /var/log/nginx/access.log, filtered through goaccess for basic analytics.
- Health check cron: a simple curl against /api/health every minute, which alerts a WhatsApp group if it fails three times in a row.
# /etc/cron.d/healthcheck
* * * * * root /usr/local/bin/healthcheck.sh
The health check script is 15 lines of bash. It has caught more production issues than every paid monitoring tool we have ever trialled.
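A health check in that spirit can be sketched as follows (a little longer than 15 lines because of comments). This is hypothetical, not our actual script: HEALTH_URL defaults to the app's local port 3001 from the PM2 example (point it at the public HTTPS URL to exercise nginx and TLS as well), the failure streak lives in a small state file, and send_alert is a placeholder for whatever delivers the WhatsApp message.

```shell
#!/usr/bin/env bash
# Hypothetical cron health check: alert once after 3 consecutive failures.
HEALTH_URL=${HEALTH_URL:-http://127.0.0.1:3001/api/health}
STATE_FILE=${STATE_FILE:-/tmp/healthcheck.failures}
THRESHOLD=3

# Succeeds unless the request fails, times out, or returns HTTP 400 or above
check() {
  curl -fs --max-time 10 -o /dev/null "$HEALTH_URL"
}

# Placeholder: replace with the call that posts to your WhatsApp group
send_alert() {
  echo "ALERT: $HEALTH_URL failed $1 checks in a row" >&2
}

run_once() {
  local failures
  failures=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
  if check; then
    failures=0
  else
    failures=$(( failures + 1 ))
    # Alert once, when the streak first reaches the threshold
    if (( failures == THRESHOLD )); then
      send_alert "$failures"
    fi
  fi
  echo "$failures" > "$STATE_FILE"
}

# Cron executes the file directly; sourcing it only defines the functions
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  run_once
fi
```

Because the alert fires only when the streak reaches exactly three, a long outage produces one message rather than one per minute; a recovery message would need a second check against the previous count.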
When this stack stops working
This stack works for deployments up to about 50-100 concurrent users, a few hundred requests per second, and a team of 1-5 engineers. Beyond that:
- The single VPS becomes a scaling bottleneck. You need to split the database, the app, and the background workers onto separate machines.
- PM2's cluster mode hits diminishing returns beyond 4-8 cores on a single machine.
- Manual deployments become a coordination problem when multiple people can deploy.
At that point, you have a good problem. The business is growing. Migrate to the infrastructure that matches the scale. But do not pre-emptively build a Kubernetes cluster for 10 users. The stack above will serve you for longer than you think.
Start boring. Ship working. Add complexity when the traffic demands it, not because a conference talk made Docker Compose feel inadequate.
Tags
- docker
- pm2
- nginx
- devops
- deployment
- infrastructure