Microsoft Open-Sources pg_durable — Durable Execution Moves Inside PostgreSQL
Microsoft released pg_durable on June 5, 2026, bringing durable execution directly into PostgreSQL as an extension. Define workflows in SQL, let Postgres checkpoint each step, survive crashes without external orchestrators. Here's what it does, who it's for, and why it matters.
Microsoft open-sourced pg_durable on June 5, 2026 — a PostgreSQL extension that brings durable execution inside the database. Instead of running Temporal workers, Airflow DAGs, or bespoke cron-job-plus-status-table contraptions, you define your workflow in SQL and let PostgreSQL handle checkpointing, retries, and crash recovery. The extension is MIT-licensed and available on GitHub. It ships as a built-in feature in Microsoft's new Azure HorizonDB PostgreSQL service.
This is not a minor utility extension. It's an architectural pivot. Durable execution — the ability to run long-running workflows that survive process crashes, database restarts, and partial failures — has been a separate infrastructure concern for decades. pg_durable collapses it into the database layer.
What Durable Execution Looks Like in SQL
A pg_durable workflow is a graph of SQL steps composed with operators like ~> (chain) and |=> (parallel fan-out). You start it with df.start() and get back an instance ID. PostgreSQL executes each step, checkpoints progress between steps, and picks up from the last checkpoint if anything fails.
Here's the conceptual shape:
    SELECT df.start(
        df.step('deduplicate', $sql$ DELETE FROM staging WHERE id IN (...) $sql$)
        ~> df.step('transform', $sql$ INSERT INTO facts SELECT ... FROM staging $sql$)
        ~> df.step('notify', $sql$ SELECT df.http('https://hooks.slack.com/...', ...) $sql$)
    );
Each ~> is a durability boundary. If the database crashes after deduplicate runs but before transform completes, pg_durable restarts from transform — not from scratch. The checkpointing is automatic. The retry state lives in df.instances, queryable with standard SQL.
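To make the checkpoint-and-resume idea concrete, here is a minimal Python sketch using the standard-library sqlite3 module. It is not how pg_durable is implemented, and the `checkpoints` table, `run_workflow` function, and step functions are all hypothetical names chosen for illustration. It only shows the general pattern: commit each step's work together with a completion marker, then skip marked steps on the next run.

```python
import sqlite3

def run_workflow(conn, instance_id, steps):
    """Run named steps in order, skipping any already checkpointed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "instance_id TEXT, step TEXT, PRIMARY KEY (instance_id, step))"
    )
    done = {
        row[0]
        for row in conn.execute(
            "SELECT step FROM checkpoints WHERE instance_id = ?", (instance_id,)
        )
    }
    executed = []
    for name, fn in steps:
        if name in done:
            continue  # finished before the crash; do not run it again
        # The step's writes and its checkpoint commit in one transaction,
        # so a failure mid-step rolls back both.
        with conn:
            fn(conn)
            conn.execute(
                "INSERT INTO checkpoints (instance_id, step) VALUES (?, ?)",
                (instance_id, name),
            )
        executed.append(name)
    return executed

# Demo: 'transform' fails on its first attempt, then the workflow is re-run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (msg TEXT)")
attempts = {"transform": 0}

def deduplicate(c):
    c.execute("INSERT INTO log VALUES ('deduplicate')")

def transform(c):
    attempts["transform"] += 1
    c.execute("INSERT INTO log VALUES ('transform')")
    if attempts["transform"] == 1:
        raise RuntimeError("simulated crash")

def notify(c):
    c.execute("INSERT INTO log VALUES ('notify')")

steps = [("deduplicate", deduplicate), ("transform", transform), ("notify", notify)]

try:
    run_workflow(conn, "wf-1", steps)
except RuntimeError:
    pass  # the first run dies inside 'transform'

resumed = run_workflow(conn, "wf-1", steps)
log = [r[0] for r in conn.execute("SELECT msg FROM log ORDER BY rowid")]
print(resumed, log)
```

The second call skips `deduplicate` because its checkpoint row already exists, and the partial write from the failed `transform` attempt was rolled back, so the log contains each step exactly once.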
Who This Is For (and Who It Isn't)
pg_durable is not a Temporal replacement for every use case. It's intentionally SQL-shaped. If your workflow is mostly data transformations, batch pipelines, scheduled maintenance, or enrichment jobs that call HTTP APIs — and your state already lives in PostgreSQL — pg_durable eliminates an entire class of infrastructure.
It is explicitly not for:
- Sub-millisecond synchronous request handling
- Workflows spanning many heterogeneous systems with arbitrary application logic
- Environments where you can't install extensions or run background workers
The Architecture Shift
The table below compares the standard approach to reliable background work in PostgreSQL before pg_durable with what the extension provides:
| Component | Without pg_durable | With pg_durable |
|---|---|---|
| Workflow definition | Spread across SQL, application code, queue config, and cron schedules | SQL functions using df.step() and df.start() |
| Retry logic | Custom retry counters in a jobs table, application-level loops | Built into the extension, checkpointed automatically |
| Crash recovery | Manual — figure out what ran, what didn't, and replay | Automatic — resumes from last checkpoint |
| Operational visibility | Scattered across logs, queue dashboards, and status tables | df.instances — one table, standard SQL queries |
| Infrastructure dependencies | PostgreSQL + pg_cron + worker process + queue + monitoring | PostgreSQL + pg_durable extension |
The common thread is subtraction. pg_durable doesn't add another service to manage; it removes infrastructure. For teams already running PostgreSQL, this is a genuine reduction in operational surface area.
Real Workloads pg_durable Handles
From the README and Microsoft's documentation, the workloads this targets:
Vector embedding pipelines. Chunk documents, call an embedding API, upsert into pgvector. Each step checkpoints, so a failed API call doesn't re-chunk or re-embed already-processed documents.
Ingest pipelines. Stage raw data, deduplicate, transform, publish. Large batches that would otherwise run in long transactions holding locks.
Scheduled maintenance. Detect table bloat, notify the team, wait for approval, run VACUUM or REINDEX. Survives restarts between steps.
Fan-out aggregation. Run independent queries in parallel, join results. PostgreSQL parallel query has limits; pg_durable's |=> operator gives you explicit parallel control with checkpointing.
External API workflows. Enrichment calls, classification, webhook notifications — all from SQL, all durable.
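The fan-out case is worth a small illustration, because checkpointing per branch is what keeps a single failed branch from forcing every branch to rerun. The Python sketch below models that idea only. It does not use or describe pg_durable's API; the `fan_out` function and the branch names are hypothetical, and an in-memory dict stands in for durable storage.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(branches, checkpoints):
    """Run branches that lack a checkpoint in parallel, then join all results."""
    pending = {name: fn for name, fn in branches.items() if name not in checkpoints}
    failures = {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in pending.items()}
        for name, future in futures.items():
            try:
                # Record each successful branch result individually.
                checkpoints[name] = future.result()
            except Exception as exc:
                failures[name] = exc
    if failures:
        raise RuntimeError(f"branches failed: {sorted(failures)}")
    return {name: checkpoints[name] for name in branches}

# Demo: 'refunds' fails once; 'orders' must not be recomputed on the retry.
calls = {"orders": 0, "refunds": 0}

def count_orders():
    calls["orders"] += 1
    return 120

def count_refunds():
    calls["refunds"] += 1
    if calls["refunds"] == 1:
        raise RuntimeError("simulated failure")
    return 7

branches = {"orders": count_orders, "refunds": count_refunds}
checkpoints = {}  # stand-in for durable storage

try:
    fan_out(branches, checkpoints)
except RuntimeError:
    pass  # 'refunds' failed, but 'orders' is already checkpointed

totals = fan_out(branches, checkpoints)
print(totals, calls)
```

On the retry only `count_refunds` runs again; the `orders` result is read back from its checkpoint.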
Comparisons With Existing Tools
pg_durable enters a space with established players:
Temporal / Cadence: General-purpose durable execution with SDKs in Go, Java, TypeScript, Python. Much more powerful for complex workflows with arbitrary code. Much more infrastructure to run. pg_durable wins on simplicity for SQL-heavy workflows.
Apache Airflow: Python DAGs, scheduler, web UI, metadata database. Industry standard for data pipelines. Massive operational overhead. pg_durable wins when your "pipeline" is really just SQL with some HTTP calls.
AWS Step Functions / Azure Durable Functions: Fully managed, cloud-specific. pg_durable is self-hosted and database-native. If you're already on Azure, HorizonDB with built-in pg_durable is the obvious path.
pg_cron + jobs table: The thing pg_durable explicitly replaces. Works, but you're building durability from scratch every time.
What to Watch
pg_durable is v0.x software. The README lists limitations: the model is SQL-shaped, arbitrary code needs to live behind HTTP endpoints or SQL functions, and the extension requires a background worker process in PostgreSQL. These are reasonable constraints for a first release.
The bigger signal is Microsoft's PostgreSQL strategy. pg_durable is part of Azure HorizonDB, Microsoft's new managed PostgreSQL service. Microsoft is investing in PostgreSQL-native features — not just hosting PostgreSQL, but extending it. Between pg_durable, pgvector integration, and the Citus acquisition, Microsoft is betting that the future of application backends looks like "PostgreSQL plus extensions" rather than "PostgreSQL plus 12 microservices and 4 queues."
Bottom Line
If your background jobs are mostly SQL with HTTP calls sprinkled in, pg_durable is worth evaluating. It's not going to replace Temporal for complex multi-service orchestrations. But it will replace a lot of pg_cron-plus-jobs-table setups that teams have been building for years.
The durable execution pattern is now available as a PostgreSQL extension. That's a meaningful step toward shrinking the infrastructure footprint of the average SaaS backend. We'll be watching the extension's maturity trajectory closely — and testing it on real workloads.