SQLite Durable Workflows — What the HN Crowd Got Right, and What They Skipped
A 473-point Hacker News post argues SQLite is all you need for durable workflows. We break down the architecture, where it actually works, and where it crumbles for Indian SMB workloads that need shared state.
A blog post by the Obelisk team hit 473 points on Hacker News this week. The thesis: SQLite is all you need for durable workflows. The argument builds on DBOS's earlier claim that Postgres replaces your queue and orchestration tier — but Obelisk pushes it further. Why run a network database when a local file, wrapped in transactions, gives you the same guarantees?
We agree with 80% of it. The remaining 20% is where Indian SMB workloads break the model. Here's our breakdown.
The Architecture They're Proposing
The pattern is clean. Each worker process owns a SQLite file. Workflow state is written transactionally into that file. Litestream streams WAL changes to S3-compatible object storage asynchronously. An observer process pulls databases for inspection and debugging.
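A minimal sketch of the "workflow state written transactionally into a local file" idea, using Python's standard `sqlite3` module. The `steps` table, the `open_state` and `run_step` helpers, and the JSON encoding of results are assumptions made for illustration; they are not Obelisk's schema or API.

```python
import json
import sqlite3


def open_state(path):
    """Open the worker's own state file in WAL mode (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steps ("
        " workflow_id TEXT NOT NULL,"
        " step TEXT NOT NULL,"
        " result TEXT NOT NULL,"
        " PRIMARY KEY (workflow_id, step))"
    )
    conn.commit()
    return conn


def run_step(conn, workflow_id, step, fn):
    """Run fn once per (workflow_id, step); replay the stored result afterwards."""
    row = conn.execute(
        "SELECT result FROM steps WHERE workflow_id = ? AND step = ?",
        (workflow_id, step),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # step already recorded: skip re-execution
    result = fn()
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO steps (workflow_id, step, result) VALUES (?, ?, ?)",
            (workflow_id, step, json.dumps(result)),
        )
    return result
```

After a restart, calling `run_step` again with the same workflow and step name returns the recorded result instead of re-running the function. Note that in this sketch a crash between `fn()` and the commit means the step runs again, so the step itself should be idempotent.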
Obelisk's argument is that AI agent workloads are especially well-suited here. Agents are bursty and experimental. Each agent or tenant benefits from a self-contained unit of state. A fleet of tiny containers, each with its own SQLite file, is cheaper and simpler than a shared Postgres cluster — and gives better fault isolation.
Where This Actually Works
We've used this pattern internally for two things:
- Single-user workflow engines. When you're running an n8n instance for one department of a textile SMB in Ludhiana, the workflow history doesn't need to be shared. SQLite is the right call.
- Agent debugging. Dumping an agent's execution log into a SQLite file, backing it to S3, and pulling it for inspection beats tailing structured logs from a central service. You get the full state, not just what someone remembered to emit.
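The agent-debugging case can be sketched roughly as follows, again with Python's standard `sqlite3` module. The `events` table and the helper names are hypothetical. `Connection.backup` is the standard-library call for taking a consistent copy of a live database; the upload to S3 is left out because it depends on your tooling.

```python
import json
import sqlite3


def open_log(path):
    """Open (or create) an agent's execution log (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " agent_id TEXT NOT NULL,"
        " kind TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


def log_event(conn, agent_id, kind, payload):
    """Append one execution event; the payload is stored as JSON text."""
    with conn:
        conn.execute(
            "INSERT INTO events (agent_id, kind, payload) VALUES (?, ?, ?)",
            (agent_id, kind, json.dumps(payload)),
        )


def snapshot(conn, dest_path):
    """Copy a consistent snapshot of the live log to dest_path."""
    dest = sqlite3.connect(dest_path)
    try:
        conn.backup(dest)
    finally:
        dest.close()


def events_of_kind(path, kind):
    """Inspect a snapshot: decoded payloads for one event kind, in insert order."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT payload FROM events WHERE kind = ? ORDER BY id", (kind,)
        ).fetchall()
    finally:
        conn.close()
    return [json.loads(r[0]) for r in rows]
```

Because the snapshot is an ordinary SQLite file, whoever pulls it can run arbitrary SQL over the agent's full history rather than grepping whatever was logged.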
The Litestream caveat matters. Replication is asynchronous. If the SQLite volume disappears before the latest writes are copied, you lose them. For AI experimentation and single-tenant workflows, this is acceptable. For a production order management system that can't lose a single transaction, it isn't.
What the Post Skips: Shared-Mutable State
| SQLite + Litestream | Postgres + Queue |
|---|---|
| Single-writer only. WAL mode allows concurrent readers but only one writer. | Multi-writer. Row-level locking handles concurrent mutation. |
| No built-in notification. You poll or use external signaling. | LISTEN/NOTIFY for real-time event delivery. |
| Backup is async. RPO measured in seconds to minutes. | Synchronous replication available. RPO = 0. |
| Operational simplicity — a file. No separate process. | Operational overhead — a database server to manage. |
| Best for: single-tenant agents, local tooling, embedded workflows. | Best for: multi-tenant SaaS, shared queues, zero-data-loss requirements. |
Here's the problem for Indian SMB SaaS. Most of our vertical products — Paraslace for textile ERP, for instance — have multi-tenant architectures. Multiple garment units share infrastructure. Workflow state spans tenants. A SQLite-per-tenant model means 400 SQLite files for 400 manufacturers. That's manageable with Litestream, but the moment two tenants need to share a workflow — like a dyeing unit and a stitching unit coordinating on the same order — the single-writer constraint bites.
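To make the single-writer constraint concrete, here is a small sketch with two connections to the same WAL-mode file. The `orders` table and the function name are invented for the example. `timeout=0` is set only so the second writer fails immediately rather than waiting for the lock.

```python
import sqlite3


def single_writer_demo(path):
    """Return (rows_seen_by_reader, second_writer_blocked) for a WAL-mode database."""
    # isolation_level=None means autocommit: we issue BEGIN/COMMIT ourselves.
    first = sqlite3.connect(path, isolation_level=None)
    # timeout=0: raise at once instead of waiting for the write lock.
    second = sqlite3.connect(path, isolation_level=None, timeout=0)
    try:
        first.execute("PRAGMA journal_mode=WAL")
        first.execute(
            "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, status TEXT)"
        )
        # First writer (say, the dyeing unit) takes the write lock and keeps it.
        first.execute("BEGIN IMMEDIATE")
        first.execute("INSERT INTO orders (status) VALUES ('dyeing')")
        # Readers are not blocked; they just don't see the uncommitted row.
        rows_seen = second.execute("SELECT COUNT(*) FROM orders").fetchall()[0][0]
        # Second writer (the stitching unit) tries to start its own write.
        try:
            second.execute("BEGIN IMMEDIATE")
            second.execute("ROLLBACK")
            blocked = False
        except sqlite3.OperationalError:
            blocked = True  # SQLite reports the database as locked
        first.execute("COMMIT")
        return rows_seen, blocked
    finally:
        first.close()
        second.close()
```

In practice you would set a non-zero busy timeout and keep write transactions short, which is fine for one tenant's workflow. It becomes a bottleneck when several tenants' services all write to the same file.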
The real insight isn't "SQLite replaces Postgres." It's "match your durability mechanism to your concurrency model." Most systems over-provision infrastructure on day one. The Obelisk team is right that many workflows don't need a distributed queue. But a team that already knows it will need shared mutable state within six months shouldn't start with SQLite either.
When We'd Reach for SQLite (and When We Wouldn't)
Use SQLite workflows when:
- Single-tenant or agent-per-user architecture
- Bursty, experimental workloads
- Workflow state is self-contained
- You already run SQLite for the application
- You want to avoid a separate queue service
Use Postgres + queue when:
- Multi-tenant with shared workflows
- Zero-data-loss requirement (RPO = 0)
- You need LISTEN/NOTIFY for real-time triggers
- Multiple services need to read/write the same state
- You're already running Postgres for the app
The Obelisk team acknowledges this: they support Postgres as a backend too. In their words: "Many workflow systems do not need that on day one and should not start with more infrastructure than their state actually demands." That's the line we'd underline.
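For the "avoid a separate queue service" case, a polling queue inside the same SQLite file might look roughly like this. Since SQLite has no LISTEN/NOTIFY equivalent, the worker polls. The `jobs` table, its status values, and the function names are assumptions made for illustration.

```python
import sqlite3
import time


def open_queue(path):
    """Open the queue file in autocommit mode; transactions are explicit below."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue(conn, payload):
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))


def claim_next(conn):
    """Atomically claim the oldest pending job; return (id, payload) or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def poll(conn, handle, interval=0.5, max_idle_polls=3):
    """Process jobs until the queue is empty for max_idle_polls checks in a row."""
    idle = 0
    while idle < max_idle_polls:
        job = claim_next(conn)
        if job is None:
            idle += 1
            time.sleep(interval)
            continue
        idle = 0
        handle(job[1])
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job[0],))
```

The polling interval trades latency against idle wake-ups. A job whose handler crashes stays in `running` in this sketch, so a real worker would also need a way to re-queue stale jobs.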
The Indian SMB Angle
Most Indian SMBs running SaaS platforms don't have dedicated DevOps. They're on a ₹1,500/month VPS from Hostinger or DigitalOcean, running PM2 and nginx. Adding Redis for a queue, or Kafka for event streaming, is not just a cost; it's cognitive load. If SQLite gets them 90% of the way for fewer than 100 users, they should use it.
But once they cross 100 tenants and workflows start spanning them, the migration from SQLite to Postgres isn't trivial. SQL dialect and column-type differences, connection pooling, and the mental model shift from "a file I can copy" to "a server I must manage" all hit at once.
Our recommendation: start with Postgres if you expect multi-tenancy with shared workflows within 12 months. The overhead is lower than a migration. If you're building single-tenant agents or internal tools, SQLite + Litestream is the correct default.
The Obelisk post is good engineering advice. It's just incomplete for the kind of shared-state systems Indian B2B SaaS tends to build.