What a rollback is allowed to delete
"Roll the project back to yesterday" sounds like one operation. It isn't. A project on this platform is spread across three stores with completely different durability and identity models:
- Source: files in a per-project git repo (/data/projects/{id}), one working tree containing every app under apps/{slug}/.
- Runtime: Kubernetes Deployments, Services, and Ingress rules in the project's namespace, generated from a spec.
- Data: a copy-on-write Postgres branch on shared Neon storage, plus the relational rows (project_apps, plan revisions) that describe what exists.
A rollback has to make all three agree on the same past moment. The restore itself — checkout this commit, fork the Neon timeline at that LSN — is the easy half. The half that actually decides whether time travel is trustworthy is the teardown: when you roll back to a point before a feature existed, what do you physically delete, and how do you guarantee you can still go forward again?
Get it wrong in one direction and the rollback is a lie — the spec says the feature is gone but its code still sits on disk and its container still serves traffic. Get it wrong in the other direction and the rollback is destructive — you delete something you can't bring back, and "reversible time travel" quietly isn't.
This post is the rule we settled on, and why each store obeys it.
The shape of the problem
Concretely: a project is a todo app. Later it grows a second app — a blog — with
its own source under apps/blog/, its own app-blog Deployment and Service, a
routing rule on the shared preview Ingress, and some rows in Postgres. Then the
user rolls back to a snapshot taken before the blog existed.
What "before the blog" means per store:
                     target snapshot           now
git source    ───────●─────────────────────────●  blog files committed after ●
k8s runtime   ───────●─────────────────────────●  app-blog Deployment running
neon data     ───────●─────────────────────────●  blog tables written after ●
project_apps  ───────●─────────────────────────●  blog app row created after ●
Each store needs to end up at the left dot. But they don't roll back the same way, and the naïve per-store operation is wrong for two of the three.
Source: "restore the files" is not "match the commit"
The instinct is to run git checkout <target> -- . over the whole tree. It's wrong, and subtly so.
git checkout <commit> -- <pathspec> updates the index and working tree for paths that
exist in <commit>. A file that was added after the target — every blog
source file — isn't in the target tree, so checkout simply doesn't mention it.
It stays on disk. You then git add -A and commit, and the "restore" commit
faithfully captures a tree that still contains the blog. The rollback ran, the
diff looked plausible, and nothing got removed.
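A toy model makes the failure concrete. Here a tree is a Map from path to content; overlayCheckout (a hypothetical name) mimics the default behavior of git checkout <commit> -- ., writing every path the target has and leaving everything else alone, while exactCheckout (also hypothetical) additionally drops the paths the target lacks. This is a sketch of the semantics, not of how git implements them.

```javascript
// Toy model: a tree is a Map from path to file content.
// overlayCheckout mimics `git checkout <commit> -- .` in its default overlay mode:
// every path present in the target is written, and nothing else is touched.
function overlayCheckout(workingTree, targetTree) {
  const result = new Map(workingTree);
  for (const [path, content] of targetTree) {
    result.set(path, content);
  }
  return result;
}

// exactCheckout is what a rollback needs: after the overlay, drop every path
// the target doesn't contain (the paths added after the target).
function exactCheckout(workingTree, targetTree) {
  const result = overlayCheckout(workingTree, targetTree);
  for (const path of workingTree.keys()) {
    if (!targetTree.has(path)) {
      result.delete(path);
    }
  }
  return result;
}
```

With a HEAD tree holding apps/todo/index.js and apps/blog/index.js and a target holding only the former, overlayCheckout keeps the blog file and exactCheckout drops it. Git has a name for this: checkout's default is overlay mode, and git checkout --no-overlay (Git 2.22 and later) removes tracked paths that the source tree doesn't contain.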
The fix is to make the working tree match the target exactly, deletions included. We capture the set of paths added between target and HEAD and remove them explicitly:
```javascript
// project-git.js: restoreVersion()
// Files present in HEAD but absent in the target tree (added after the target).
const addedSince = await git(dir, ["diff", "--name-only", "--diff-filter=A", target, "HEAD"]);
await git(dir, ["checkout", target, "--", "."]); // restore modified/deleted paths
if (addedSince.length) {
  await git(dir, ["rm", "-f", "--ignore-unmatch", "--", ...addedSince]); // remove added-since
}
await git(dir, ["add", "-A"]);
await git(dir, ["commit", "-m", `restore: reverted to ${short}`]);
```
(git read-tree -u --reset <target> does the same thing in one move; we keep
the explicit checkout-plus-delete because the failure it fixes is exactly about
which files are not mentioned, and the code reads as that intent.)
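The excerpt doesn't show the git helper. Since it checks addedSince.length and spreads the result into git rm, the helper evidently returns output lines as an array. A minimal sketch of that shape, assuming Node's child_process.execFile; the name splitLines is hypothetical:

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// Split git's stdout into non-empty lines (e.g. the paths from `git diff --name-only`).
function splitLines(stdout) {
  return stdout.split("\n").filter((line) => line.length > 0);
}

// Hypothetical helper: run git inside `dir` and return its output as an array of lines.
// A non-zero exit rejects the promise, so callers see git failures as exceptions.
async function git(dir, args) {
  const { stdout } = await execFileAsync("git", args, { cwd: dir, maxBuffer: 64 * 1024 * 1024 });
  return splitLines(stdout);
}
```

Passing arguments as an array to execFile avoids a shell, so paths with spaces need no quoting. Paths containing newlines, or characters git quotes in --name-only output, would need -z and a NUL split; this sketch ignores that case.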
One detail matters enormously here and we'll come back to it: the restore is a new commit on top of HEAD. We never move the branch pointer or rewrite history. The pre-rollback commit — the one with the blog — is still fully reachable in the object graph. That single fact is what makes deleting the files safe: they aren't gone, they're one commit away.
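A toy model shows why appending is what keeps the deletion safe. History here is an append-only array; commit and restore are hypothetical names, and a restore is just another entry whose tree copies the target's. Nothing is ever removed, so rolling forward is the same call aimed at the pre-rollback commit.

```javascript
// Toy model: history is an append-only array of commits, each holding a frozen,
// sorted list of paths. HEAD is always the last entry; nothing is ever removed.
function commit(history, paths, message) {
  history.push({ tree: Object.freeze([...paths].sort()), message });
  return history.length - 1; // index of the new commit
}

// A restore never moves HEAD backwards: it appends a new commit whose tree is a
// copy of the target's. Rolling forward is the same call aimed at the pre-rollback commit.
function restore(history, targetIndex) {
  const target = history[targetIndex];
  if (!target) throw new Error(`no commit at index ${targetIndex}`);
  return commit(history, target.tree, `restore: reverted to ${targetIndex}`);
}
```

Starting from a todo-only commit and a later commit that adds apps/blog/index.js, restoring to the first appends a third commit without the blog, and restoring to the second appends a fourth that brings it back; all four stay in the array.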
The same checkout also drives index maintenance. Every changed path — including
the now-deleted ones — is fed to reindexFiles, which removes vanished files
from the code graph and the vector index instead of leaving dangling embeddings:
```javascript
// file-sync.js: reindexFiles()
if (content === null) { // file gone after the checkout
  await fanoutSyncRemove(projectId, projectRelPath);
  scheduleVectorFlush(projectId, projectRelPath, null);
}
```
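The null check in reindexFiles amounts to partitioning the changed paths into removals and re-reads. A sketch, assuming a hypothetical readContent callback that returns null for a path that no longer exists after the checkout; planReindex is likewise a hypothetical name:

```javascript
// Split changed paths into index removals and index upserts.
// readContent(path) is assumed to return the file's text, or null if the file is gone.
function planReindex(changedPaths, readContent) {
  const removals = [];
  const upserts = [];
  for (const path of changedPaths) {
    const content = readContent(path);
    if (content === null) {
      removals.push(path); // drop from the code graph and the vector index
    } else {
      upserts.push({ path, content }); // re-read and re-index
    }
  }
  return { removals, upserts };
}
```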
So the source store reaches the target by deletion that's recoverable from history. Hold that thought.
Runtime: don't restore it, regenerate it
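The blog's app-blog Deployment is still running. You could try to "roll back Kubernetes", but there's nothing to roll back to: Kubernetes keeps no project-level history. A Deployment's rollout history covers revisions of its own pod template, not whether the Deployment, Service, or Ingress rule should exist at all. You don't need that history anyway, because the runtime is not a store of record. Every Deployment, Service, and Ingress rule is derived: generated from the project's spec by the manifest generator on each start.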
That changes the question from "how do I restore the runtime" to "how do I make the derivation stop including the blog." The spec is built from the live app list, and that query is filtered:
```sql
-- listProjectApps(): the spec only ever sees live apps
SELECT id, slug, is_primary, fs_dir_name
FROM project_apps
WHERE project_id = $1 AND archived_at IS NULL;
```
So the moment the blog's row is archived, it falls out of the spec, and the next
manifest generation simply doesn't emit app-blog or its Ingress rule. The
runtime converges by regeneration, not restoration.
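Regeneration can be sketched as a pure function from the app list to resources. The naming is an assumption read off the post (the primary app's resources are called app, other apps get app-{slug} as in app-blog), the path-based Ingress routing is hypothetical, and generateManifests is not the real generator. The archived_at filter is repeated so the sketch stands on its own:

```javascript
// Derive runtime resources from the app list. Only live rows contribute,
// mirroring the `archived_at IS NULL` predicate in listProjectApps().
function generateManifests(apps) {
  const live = apps.filter((app) => app.archived_at == null); // == null matches null and undefined
  const resources = [];
  const rules = [];
  for (const app of live) {
    const name = app.is_primary ? "app" : `app-${app.slug}`; // assumed naming scheme
    resources.push({ kind: "Deployment", name }, { kind: "Service", name });
    rules.push({ path: app.is_primary ? "/" : `/${app.slug}`, service: name }); // hypothetical routing
  }
  return { resources, ingress: { kind: "Ingress", rules } };
}
```

Archiving the blog row removes its Deployment, its Service, and its Ingress rule from the output; clearing archived_at brings them back unchanged, which is what makes deleting the live objects free.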
There's still one explicit step: archiving a row removes the app from future manifests, but the Deployment that's already running keeps running until something deletes it. So the reconcile tears down each archived app's live runtime — and nothing else:
```javascript
// orchestrator/cleanup.js: teardownApp()
// ns (the project's namespace) and the remaining deleteIfExists arguments are elided in this excerpt.
export async function teardownApp(projectId, name) {
  if (!name || name === "app") return { ok: false, reason: "refusing to tear down the primary app" };
  await deleteIfExists(() => appsApi.deleteNamespacedDeployment({ name, namespace: ns }), ...);
  await deleteIfExists(() => coreApi.deleteNamespacedService({ name, namespace: ns }), ...);
}
```
Two guard rails are load-bearing. It refuses to touch the primary app —
the project's anchor app is never torn down by a per-app reconcile. And it's
idempotent (deleteIfExists swallows 404), because a reconcile must be safe to
run twice. Deleting derived runtime loses nothing: a roll-forward re-adds the
row, and the next start regenerates the manifests verbatim.
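The reconcile around teardownApp can be sketched against an in-memory stand-in for the namespace. FakeCluster, ignoreNotFound (a simplified analogue of deleteIfExists), and reconcileTeardown are hypothetical; in practice the desired names would come from the regenerated spec and the running names from listing Deployments. Running it twice with the same stale input shows the idempotence the guard rails depend on:

```javascript
// In-memory stand-in for one namespace's Deployments. Deleting a missing
// object throws an error carrying statusCode 404, like an API "not found".
class FakeCluster {
  constructor(names) {
    this.deployments = new Set(names);
  }

  deleteDeployment(name) {
    if (!this.deployments.delete(name)) {
      const err = new Error(`deployment ${name} not found`);
      err.statusCode = 404;
      throw err;
    }
  }
}

// Treat "already gone" as a no-op and rethrow anything else.
// Returns true only if this call actually deleted something.
function ignoreNotFound(fn) {
  try {
    fn();
    return true;
  } catch (err) {
    if (err.statusCode === 404) return false;
    throw err;
  }
}

// Tear down every running app the regenerated spec no longer contains,
// never the primary "app". Safe to run repeatedly.
function reconcileTeardown(cluster, runningNames, desiredNames) {
  const desired = new Set(desiredNames);
  const removed = [];
  for (const name of runningNames) {
    if (name === "app" || desired.has(name)) continue;
    if (ignoreNotFound(() => cluster.deleteDeployment(name))) removed.push(name);
  }
  return removed;
}
```

The second run gets a 404 for app-blog, treats it as already done, and reports nothing removed; the primary app is skipped even when the desired set is empty.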
Data: the branch restore already did it
The blog wrote tables into Postgres. But apps share one Neon database per project, and the combined rollback restores that database by forking the branch at the snapshot's LSN. DDL and DML after that LSN — the blog's tables and their rows — are simply on the other side of the fork. There is no per-app data to delete, because point-in-time is the deletion. (This is also why a data-only restore has to check for schema drift first — reverting the branch reverts the schema too. That's its own post.)
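A toy model of why the fork is the deletion. It illustrates point-in-time semantics only and is not Neon's API; branchAt and the log shape are hypothetical:

```javascript
// Toy model of point-in-time branching. The parent branch is an ordered log of
// writes, each stamped with an increasing LSN (plain integers here; real Postgres
// LSNs look like "0/16B3748"). A branch forked at `lsn` replays only the writes
// at or before that position, so later DDL and DML are simply absent from it.
function branchAt(log, lsn) {
  const tables = new Map();
  for (const write of log) {
    if (write.lsn > lsn) break; // the log is ordered, so nothing later qualifies
    if (write.op === "create") tables.set(write.table, []);
    else if (write.op === "insert") tables.get(write.table).push(write.row);
    else if (write.op === "drop") tables.delete(write.table);
  }
  return tables;
}
```

The parent log is never modified, which is the property the roll-forward snapshot relies on: forking again at a later position recovers the blog's tables.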
What does not get hard-deleted is the relational anchor — the project_apps
row. We soft-archive it (archived_at = now()), never DELETE. The row is
the identity that ties the three stores together; keeping it is what lets a
roll-forward un-archive the app and bring the whole thing back. Hard-deleting it
would orphan any history that referenced it and make forward travel impossible.
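A sketch of the soft-archive lifecycle on plain objects. The function names are hypothetical; in SQL, archiving corresponds to an UPDATE that sets archived_at = now() rather than a DELETE. Both directions only toggle archived_at, so the id that other records reference never changes:

```javascript
// Soft-archive: mark the row, never remove it, so references to row.id stay valid.
function archiveApp(rows, id, now) {
  const row = rows.find((r) => r.id === id);
  if (!row) throw new Error(`no project_apps row with id ${id}`);
  row.archived_at = now;
  return row;
}

// Roll-forward: the same row comes back with the same identity.
function unarchiveApp(rows, id) {
  const row = rows.find((r) => r.id === id);
  if (!row) throw new Error(`no project_apps row with id ${id}`);
  row.archived_at = null;
  return row;
}

// The live view, mirroring listProjectApps()'s `archived_at IS NULL` filter.
function liveApps(rows) {
  return rows.filter((r) => r.archived_at === null);
}
```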
The invariant
Step back and the three stores obey one rule. A rollback may physically delete a thing only if that thing is:
- Recoverable from git history — source files. The pre-rollback commit stays reachable, so deletion is a pointer away from undone.
- Regenerable from the spec — K8s Deployments, Services, Ingress rules. Derived state; deleting it is free because the next start rebuilds it.
Everything else is soft-archived, not destroyed — the project_apps row,
the prior plan revisions. And shared data isn't deleted at all; it's reverted by
the branch fork, which is itself reversible (the pre-rollback state is captured
as a roll-forward snapshot before the fork).
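The invariant fits in a small decision function. The flags are hypothetical labels for the properties named above, not fields from the codebase, and rollbackAction is likewise an illustrative name:

```javascript
// The rollback invariant as a decision function: physically delete only what is
// recoverable from git history or regenerable from the spec.
function rollbackAction(state) {
  if (state.inGitHistory) return "delete";         // pre-rollback commit stays reachable
  if (state.derivedFromSpec) return "delete";      // the next start regenerates it
  if (state.hasPointInTimeRevert) return "revert"; // e.g. the Neon branch fork
  if (state.softArchivable) return "archive";      // e.g. the project_apps row
  return "capture-first";                          // neither recoverable nor derived
}
```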
Reversibility isn't a feature bolted on top — it's the property that defines what's safe to delete in the first place. If you can't get it back, you don't delete it; you archive it or you revert-not-erase it.
The one thing that isn't safe
The invariant has a sharp edge, and it's worth stating plainly so it isn't discovered the hard way: the rule only holds for state that is either in git or derived from the spec. Anything that is neither must not be blind-deleted by a rollback:
- Files that were never committed: a user-uploaded asset sitting in an app's public/ directory that no generation has captured. Forcing the working tree to match the target would remove it, and there is no commit to recover it from.
- Objects in a per-app external resource: a storage bucket the app wrote to directly. The Neon branch fork reverts the project database, but it knows nothing about an S3 bucket's contents.
For both, the correct move is the same as for the relational row: capture before you remove (commit untracked files into the roll-forward point; snapshot external resources), or leave them in place. The invariant is precise about its own boundary: recoverable or derived — and if a piece of state is neither, you make it recoverable before a rollback is allowed to touch it.
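For source files, capturing before removing starts with knowing what no commit contains. A sketch with hypothetical names that gates the restore on that set; the two path lists are assumed inputs. git ls-files --others --exclude-standard reports untracked, non-ignored files directly, though an asset matched by .gitignore would not appear there and needs separate handling.

```javascript
// Paths on disk that no commit contains: a rollback must not remove these
// until they have been captured (e.g. committed into the roll-forward point).
function uncapturedPaths(diskPaths, trackedPaths) {
  const tracked = new Set(trackedPaths);
  return diskPaths.filter((path) => !tracked.has(path)).sort();
}

// Gate the restore: refuse to proceed while anything is uncaptured.
function assertCaptured(diskPaths, trackedPaths) {
  const missing = uncapturedPaths(diskPaths, trackedPaths);
  if (missing.length > 0) {
    throw new Error(`capture before rollback: ${missing.join(", ")}`);
  }
}
```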
Why this shape
The deeper reason all of this works is that we keep a strict separation between stores of record and derived state, and we never let a rollback confuse the two. Source and the relational anchor are records — they get reverted with history preserved. The Kubernetes runtime is derived — it gets regenerated. Shared data is a record with its own time-travel primitive — it gets reverted by branch fork. The rollback's only job is to move each store to the same instant using that store's reversible operation, and to refuse to destroy anything it can't reconstruct.
Land that, and "go back to yesterday" stops being a hopeful one-liner and becomes what it should be: a coherent, reversible move across three stores that each remember the past in their own way.