The gravity of the primary app
For a long time a project here meant one thing: one app, one stack. You described an app, we generated a React (or React + tRPC + Postgres) project, and that was the universe.
Then projects grew a second app. "Add a marketing blog next to my finance tracker." "Give me an admin panel beside the storefront." Suddenly a single project is several apps, each its own isolated Vite tree with its own stack, plus several services — some shared platform infrastructure, some per-feature add-ons — all living in one Kubernetes namespace and all provisioned from chat.
That shift is small to describe and large to get right. This post is about how provisioning actually works once a project is plural, and about a failure mode that turned out to be a single bug wearing three different costumes. We shipped all three fixes today; the interesting part is that they're the same fix.
What a project actually is
Here is the anatomy of a real two-app project (qiwykfu39n: a finance tracker
that grew a blog):
project: qiwykfu39n
┌──────────────────────────────────────────────────────────────────┐
│ git worktree /data/projects/qiwykfu39n │
│ apps/personal-finance/ kind=fullstack is_primary=true │
│ apps/blog/ kind=react is_primary=false │
│ .adorable/spec.json ← the declared desired state │
└──────────────────────────────────────────────────────────────────┘
│ manifests generated from spec
▼
┌─ k8s namespace: adorable-qiwykfu39n ──────────────────────────────┐
│ Deployment app personal-finance (Vite + tRPC) │
│ Deployment app-blog blog (Vite) │
│ Deployment directus catalog service (shared, 1 per ns) │
│ Deployment neon-compute per-project Postgres compute │
│ Ingress personal-finance.preview… blog-… directus-… │
└──────────────────────────────────────────────────────────────────┘
│ DATABASE_URL / S3 keys / tokens via Vault → K8s Secrets
▼
Neon (shared pageserver) Garage S3 (shared) ← platform services
Three things in that picture matter for provisioning:
- Apps are isolated trees. Each app is a self-contained Vite app under apps/{slug}/. There is no shared bundler and no cross-app import: a symbol in apps/blog is never reachable from apps/personal-finance. This is a hard invariant, and we'll see it bite later.
- Services split into two classes. Platform services (Neon Postgres, Garage S3) are shared instances outside the namespace, reached via ExternalName. In-namespace services are either the built-in stack (react-frontend, trpc-api, postgres, s3) or catalog add-ons (directus, redis, strapi, …) defined as config/services/{id}/service.json. That distinction is not cosmetic: it decides what a rollback is allowed to tear down.
- Everything is derived from the spec. .adorable/spec.json is the declared desired state. The K8s runtime is regenerated from it; it is never the source of truth.
The three stores, and the relational glue
The state of a project is spread across stores with different identity and durability models:
| Store | Holds | Identity |
|---|---|---|
| git worktree | every app's source under apps/{slug}/ | commit SHA |
| K8s namespace | Deployments / Services / Ingress | derived, regenerable |
| Neon branch | all app data + Directus tables + adorable_meta | timeline + LSN |
| Postgres (platform) | project_apps, projects.services, plan revisions | relational rows |
The relational rows are the glue. Two are load-bearing:
-- migration 100 (+108 added fs_dir_name): one row per app
CREATE TABLE project_apps (
id TEXT PRIMARY KEY, -- a_xxx
project_id TEXT NOT NULL,
slug TEXT NOT NULL, -- → apps/{slug}/ on disk, app-{slug} in k8s
kind TEXT CHECK (kind IN ('react','fullstack')),
is_primary BOOLEAN NOT NULL,
fs_dir_name TEXT,
archived_at TIMESTAMPTZ -- soft-archive on rollback
);
-- the project's flat, declared service set (jsonb array on the projects row)
SELECT services FROM projects WHERE id = 'qiwykfu39n';
-- → ["react-frontend","trpc-api","postgres","s3","directus"]
project_apps.kind is per-app; projects.services is project-wide. Hold on to that
asymmetry: it's the seam that half of today's bugs slipped through.
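The slug is the key that ties a row to its tree and its workload. Below is a minimal sketch of that mapping. It assumes the conventions in the comment above (apps/{slug}/ on disk, app-{slug} in K8s). It also assumes, as the namespace diagram shows, that the primary app keeps the plain Deployment name app. appResourceNames and its slug rule are hypothetical. fs_dir_name is ignored because its role isn't described here.

```javascript
// Hypothetical helper: map one project_apps row to its on-disk tree and K8s names.
// Assumed conventions: apps/{slug}/ on disk; Deployment app-{slug}, except the
// primary app, which the diagrams show as plain "app". fs_dir_name is ignored.
function appResourceNames(projectId, row) {
  // Assumed slug rule: lowercase letters and digits joined by single hyphens.
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(row.slug)) {
    throw new Error(`invalid app slug: ${row.slug}`);
  }
  return {
    dir: `apps/${row.slug}/`,
    deployment: row.is_primary ? "app" : `app-${row.slug}`,
    namespace: `adorable-${projectId}`,
  };
}

const blogNames = appResourceNames("qiwykfu39n", { slug: "blog", is_primary: false });
console.log(blogNames.deployment); // app-blog
```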
From a sentence to a second app
How does "add a blog as a separate app" become a project_apps row, a Vite
tree, and a Deployment? Through the plan, never through a chat tool reaching
into Kubernetes directly. The architect proposes a feature carrying a
createsApp directive and a $NEW:<featureId> sentinel as its
targetAppId.
The sentinel matters because the app doesn't exist yet at plan time, but the
plan already needs to reference it. The DB can't store $NEW:blog as an
app_id (the column is a foreign key), so sentinel features write NULL, and the literal
$NEW:blog lives only in spec.json. At approve time,
resolveCreatesAppSentinels (in state-tools.js):
- creates the real app (createApp → row + scaffold + K8s rebuild trigger),
- rewrites $NEW:blog → the real a_xxx in the spec and stamps app_id onto the subtask rows that referenced it (including sibling features that point at the same $NEW:blog),
- hands the resolved plan to bucketPlanByAppId, which fans the work out into one BullMQ job per distinct app.
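The rewrite step can be sketched in isolation. This sketch assumes a plan shaped as an array of features with id, createsApp, and targetAppId. It also assumes a synchronous createApp callback that returns the new a_xxx id; the real call also persists rows and is likely async. resolveSentinels and the createsApp payload shape are hypothetical. The point is that sibling features sharing one sentinel converge on a single created app.

```javascript
const NEW_PREFIX = "$NEW:";

// Sketch: resolve $NEW:<featureId> sentinels to real app ids.
// `createApp` is an injected stand-in for row + scaffold + K8s rebuild trigger.
function resolveSentinels(features, createApp) {
  // featureId -> real app id; each declared app is created exactly once.
  const created = new Map();
  for (const f of features) {
    if (f.createsApp) created.set(f.id, createApp(f.createsApp));
  }

  // Rewrite every reference, including sibling features that share the sentinel,
  // and drop the createsApp directive now that it has been honored.
  return features.map(({ createsApp: _done, ...feature }) => {
    const target = feature.targetAppId;
    if (typeof target !== "string" || !target.startsWith(NEW_PREFIX)) return feature;
    const appId = created.get(target.slice(NEW_PREFIX.length));
    if (!appId) throw new Error(`unresolved sentinel ${target} in feature ${feature.id}`);
    return { ...feature, targetAppId: appId };
  });
}

let nextId = 0;
const resolvedPlan = resolveSentinels(
  [
    { id: "blog", createsApp: { slug: "blog", kind: "react" }, targetAppId: "$NEW:blog" },
    { id: "blog-posts", targetAppId: "$NEW:blog" },
    { id: "budget-chart", targetAppId: "a_finance" },
  ],
  () => `a_${++nextId}`
);
console.log(resolvedPlan.map((f) => f.targetAppId).join(",")); // a_1,a_1,a_finance
```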
That fan-out is why per-app scoping has to be real: each job runs the producer
against exactly one app's tree (resolveAppDir(projectId, appId)).
From services to a running namespace
Services take a parallel path. The service registry (service-registry.js)
holds both the code-defined built-ins and the catalog entries. A ProjectSpec
is assembled (project-spec.js) and handed to k8s-manifest-generator.js,
which emits one Deployment/Service/Ingress per app plus one per
in-namespace service. The multi-app twist lives in project-spec.js: the
react-frontend / trpc-api registry entries are skipped and replayed
per-app instead, with each app's kind deciding its shape:
// project-spec.js — the primary app's kind is derived, not trusted
const isTrpcActive = resolved.includes("trpc-api");
const effectiveKind = (app.isPrimary && isTrpcActive) ? "fullstack" : app.kind;
(The primary app's stored kind can drift — legacy projects defaulted to
react and never got updated when the agent added trpc-api — so manifest
generation re-derives it from the active service set every time rather than
trusting the column.)
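Here is a sketch of the per-app pass that makes the replay concrete. It assumes resolved is the project's active service-id list and that each app carries slug, isPrimary, and kind. buildAppComponents and its output shape are hypothetical. It shows two things: react-frontend and trpc-api never become standalone services, and the primary's kind is recomputed on every call.

```javascript
// Registry entries that are replayed once per app instead of emitted once.
const PER_APP_SERVICES = new Set(["react-frontend", "trpc-api"]);

function buildAppComponents(resolved, apps) {
  const isTrpcActive = resolved.includes("trpc-api");

  const appComponents = apps.map((app) => {
    // Re-derive the primary's kind from the active services; don't trust the column.
    const kind = app.isPrimary && isTrpcActive ? "fullstack" : app.kind;
    return { slug: app.slug, kind };
  });

  // Everything else (postgres, s3, catalog add-ons) stays a namespace-level service.
  const services = resolved.filter((id) => !PER_APP_SERVICES.has(id));
  return { appComponents, services };
}

// A legacy primary stored as "react" still comes out fullstack while trpc-api is active.
const { appComponents } = buildAppComponents(
  ["react-frontend", "trpc-api", "postgres", "s3", "directus"],
  [
    { slug: "personal-finance", isPrimary: true, kind: "react" },
    { slug: "blog", isPrimary: false, kind: "react" },
  ]
);
console.log(appComponents.map((c) => `${c.slug}:${c.kind}`).join(" ")); // personal-finance:fullstack blog:react
```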
Deployment order is not hardcoded. deploy-order.js reads each service's
requires and readyCheck from the catalog and topologically sorts into
tiers, running readiness probes between them:
deploy-order.js — tiers from `requires` + `readyCheck` (no hardcoded names)
Tier 0 (no deps) Tier 1 (deps healthy first)
────────────────── ─────────────────────────────
neon-compute ─ pg_isready ──┐
redis ├──► directus requires: [postgres, s3]
app / app-blog │ strapi requires: [postgres]
garage (s3) ─ bucket OK ───┘
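The tiering amounts to longest-path layering over the requires graph: a service's tier is one more than the deepest tier among its dependencies. Below is a sketch, assuming catalog entries shaped as { id, requires }. Readiness probes between tiers are omitted. For brevity it uses the logical ids postgres and s3 for what the diagram deploys as neon-compute and garage. deployTiers is a hypothetical name, not deploy-order.js's actual API.

```javascript
// Group services into deployment tiers: a service lands in the first tier
// after every service it `requires`. Throws on unknown deps or cycles.
function deployTiers(services) {
  const byId = new Map(services.map((s) => [s.id, s]));
  const tierOf = new Map();
  const visiting = new Set();

  function tier(id) {
    if (tierOf.has(id)) return tierOf.get(id);
    const svc = byId.get(id);
    if (!svc) throw new Error(`unknown service: ${id}`);
    if (visiting.has(id)) throw new Error(`dependency cycle at ${id}`);
    visiting.add(id);
    const deps = svc.requires ?? [];
    const t = deps.length === 0 ? 0 : 1 + Math.max(...deps.map(tier));
    visiting.delete(id);
    tierOf.set(id, t);
    return t;
  }

  const tiers = [];
  for (const s of services) {
    const t = tier(s.id);
    (tiers[t] ??= []).push(s.id);
  }
  return tiers;
}

const layered = deployTiers([
  { id: "directus", requires: ["postgres", "s3"] },
  { id: "strapi", requires: ["postgres"] },
  { id: "postgres" },
  { id: "s3" },
  { id: "redis" },
  { id: "app" },
]);
console.log(JSON.stringify(layered)); // [["postgres","s3","redis","app"],["directus","strapi"]]
```

Because tiers are computed from the data, adding a catalog service with a new requires entry changes the rollout without touching the ordering code.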
Credentials thread through Vault the entire way: getEnvVarsForCompose
aggregates a project's secrets (DATABASE_URL, S3 keys, service creds), and
every buildProjectSpec call must pass them as vaultEnvVars or the
container boots without a database. Catalog services like Directus connect to
the same project Neon branch (DB_CLIENT=pg, their own schema) — which is why
their tables, and the platform's adorable_meta content ledger, all ride one
Neon timeline. That fact decides what a data rollback reverts.
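Omitting vaultEnvVars fails quietly: the container starts, just without a database. One option is to fail closed when the spec is built. The guard below is hypothetical, not the platform's code. Only DATABASE_URL is named in this post, so the requirement map lists just that and is meant to be extended.

```javascript
// Hypothetical requirement map: env vars each built-in dependency needs.
const REQUIRED_ENV = new Map([["postgres", ["DATABASE_URL"]]]);

// Fail closed at spec-build time instead of booting a container without a database.
function assertVaultEnv(resolvedServices, vaultEnvVars) {
  if (!vaultEnvVars || typeof vaultEnvVars !== "object") {
    throw new Error("buildProjectSpec called without vaultEnvVars");
  }
  const missing = [];
  for (const svc of resolvedServices) {
    for (const key of REQUIRED_ENV.get(svc) ?? []) {
      if (!vaultEnvVars[key]) missing.push(`${svc}:${key}`);
    }
  }
  if (missing.length > 0) {
    throw new Error(`missing Vault env vars: ${missing.join(", ")}`);
  }
}

try {
  assertVaultEnv(["postgres", "directus"], {});
} catch (err) {
  console.log(err.message); // missing Vault env vars: postgres:DATABASE_URL
}
```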
The bug that wears three costumes
Here's where it gets interesting. Once a project is plural, a specific failure keeps recurring: a "new sibling app" silently collapses into the primary one. We hit it three separate ways today, fixed each, and only afterward saw they were one bug.
THE FAILURE CLASS: multi-app intent → silently degrades to "the primary app"
surface where it leaked what fell back to primary fix (commit)
─────────────────────────── ────────────────────────── ───────────
1. producer's READ context saw a sibling app's file in scope context
(gatherContext) the project-wide code index, to targetAppId
wrote a cross-app import (1ce1b58)
2. app MATERIALIZATION createsApp/$NEW never resolved, reconcile at
(approve happy-path only) subtasks (app_id NULL) routed startGeneration,
to the primary app's tree fail closed (c3afd35)
3. ROLLBACK teardown service + checkpoint not reconcile code +
(code+data only) reconciled; blog infra stayed data + INFRA to
live after "undo the blog" one target (bf3edbd)
Costume 1 — context bleed
The producer generating apps/blog was handed the project-wide code index and
vector-search results — unfiltered by app. The structural index correctly
fully-qualifies paths (code-index.js emits apps/personal-finance/src/lib/trpc.tsx
with an appId tag), but the producer pattern-matched the symbol, not the
app boundary, and wrote import { TRPCProvider } from './lib/trpc' into the
Directus-only blog — a file that exists in a different app's tree. Vite
couldn't resolve it; the preview died.
Apps are isolated trees, so a sibling's file is never locally importable. The
fix scopes the gathered context to the run's target app — the dependency graph
stays full (it's the intended cross-app surface, and it names apps), but the
raw file dump the producer copies from is filtered to targetAppId. It's a
knowledge bug — the context was factually wrong about which app owns the file
— fixed by correcting the context, not by adding a rule.
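The fix is a filter on the file-level context, with the dependency graph passed through whole. This sketch assumes gathered context shaped as { depGraph, files }, where each file carries the appId tag the code index already emits. scopeContextToApp and that shape are hypothetical.

```javascript
// Keep only files from the target app's own tree; the dependency graph,
// which names apps explicitly, passes through unfiltered.
function scopeContextToApp(context, targetAppId, targetSlug) {
  const prefix = `apps/${targetSlug}/`;
  // Check both the index tag and the path, so a mis-tagged entry can't leak across trees.
  const files = context.files.filter(
    (f) => f.appId === targetAppId && f.path.startsWith(prefix)
  );
  return { ...context, files };
}

const gathered = {
  depGraph: { a_blog: [], a_finance: [] },
  files: [
    { appId: "a_finance", path: "apps/personal-finance/src/lib/trpc.tsx" },
    { appId: "a_blog", path: "apps/blog/src/main.tsx" },
  ],
};
console.log(scopeContextToApp(gathered, "a_blog", "blog").files.length); // 1
```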
Costume 2 — materialization on the happy path only
App materialization (createApp) lived in exactly one place: approve_plan's
sentinel resolver. But generation reaches the producer through many doors —
an errored approval turn, trigger_generation, resume, iteration carryover. In
one trace the approval turn errored after planning; the blog's $NEW:blog was
never resolved, the subtasks kept app_id = NULL, and bucketPlanByAppId's
rowAppId || defaultAppId routed them straight to the primary app. The
"separate Finance Blog" was built into the finance tracker. No app row was ever
created.
The services next door already had the answer. startGeneration runs a
convergent reconcile of services on every entry path — it exists
precisely to catch the doors that bypass the happy path. Apps just hadn't
joined that contract.
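Here is a sketch of what joining that contract can look like. It assumes the spec's features are available at startGeneration, and that any feature still carrying createsApp is one approve_plan never finished. reconcileApps is a hypothetical name. materializeApp is an injected stand-in for the approve-time work: create the row, rewrite the sentinel, strip createsApp.

```javascript
// Runs on every generation entry path, not just after a clean approve_plan.
// Idempotent: once createsApp has been stripped from the spec, this is a no-op.
async function reconcileApps(spec, materializeApp) {
  const pending = spec.features.filter((f) => f.createsApp);
  const materialized = [];
  for (const feature of pending) {
    // A throw from materializeApp propagates, which also refuses generation.
    const appId = await materializeApp(feature);
    if (!appId) {
      // Fail closed: never let a declared-but-missing app route to the primary.
      throw new Error(
        `feature ${feature.id} declares an app that was not materialized; refusing to generate`
      );
    }
    materialized.push(appId);
  }
  return materialized;
}
```

Called with a spec that approve_plan already cleaned up, it returns an empty list and generation proceeds unchanged.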
The reconcile is idempotent (a successful approve_plan strips createsApp
from the spec, so this no-ops) and fails closed: a declared-but-unmaterialized
app refuses generation rather than letting it route to primary. We also
inverted the fan-out: a literal $NEW: sentinel reaching bucketPlanByAppId
now throws instead of falling back.
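In code the inversion is small. Below is a sketch of a fail-closed bucketing step, assuming subtasks shaped as { id, appId }. It refuses a literal $NEW: sentinel, as described above. It also refuses a missing app_id, which this post later classes as a contract violation. Whether the real bucketPlanByAppId does the latter in every case isn't covered here. bucketByAppId is a hypothetical name.

```javascript
// Fail-closed fan-out: one bucket (one job) per app, no silent default.
// Contrast with the fail-open `rowAppId || defaultAppId`.
function bucketByAppId(subtasks) {
  const buckets = new Map();
  for (const st of subtasks) {
    const appId = st.appId;
    if (typeof appId === "string" && appId.startsWith("$NEW:")) {
      throw new Error(`unresolved sentinel ${appId} on subtask ${st.id}`);
    }
    if (!appId) {
      throw new Error(`subtask ${st.id} has no app_id; refusing to default to the primary app`);
    }
    if (!buckets.has(appId)) buckets.set(appId, []);
    buckets.get(appId).push(st);
  }
  return buckets;
}

const jobs = bucketByAppId([
  { id: 1, appId: "a_finance" },
  { id: 2, appId: "a_blog" },
  { id: 3, appId: "a_blog" },
]);
console.log([...jobs.keys()].join(",")); // a_finance,a_blog
```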
Costume 3 — time-travel that forgets two of the three stores
This is the one worth dwelling on, because rolling a multi-app, multi-service project back in time is the provisioning problem run in reverse. "Roll back the blog" must land the project at the moment before the blog existed — and that moment has to be the same across all three stores at once.
One project, one timeline — why the data revert is atomic
Every generation commit records a lightweight checkpoint: the dev Neon
timeline id + its head LSN + the commit SHA (recordCheckpoint in
db-timetravel.js). No data is copied — it's a (timeline, lsn, commit)
marker. A combined restore then rewinds three stores to that marker:
checkpoints (one per generation commit):
● 6fe2a8b ─────────────────────────── ● ac62c29 (= current HEAD)
pre-blog post-blog
"undo the blog" ⇒ restore to 6fe2a8b
│
┌──────────────────────────────┼──────────────────────────────────┐
▼ CODE (git) ▼ DATA (Neon) ▼ INFRA (reconcile)
restoreVersion forkBranch at the reconcileAfterRestore
read-tree --reset checkpoint LSN: • archive post-target
to 6fe2a8b: whole-branch COW fork plan revisions
deletes apps/blog/* reverts EVERY app's • soft-archive + tear
+ any file added schema, Directus down apps/blog (K8s
since target, across tables, AND teardownApp)
ALL app trees, adorable_meta — • deprovision directus
uniformly ATOMICALLY (catalog add-on, via
(one timeline/project) isCodeDefinedService)
repointDevCompute swaps
the compute → 6fe2a8b
└──────── a pre-restore snapshot is captured FIRST → roll-forward point ──────┘
The multi-app payoff is in the middle column. Because all of a project's apps
share one Neon branch (each app gets its own schema; Directus and the
adorable_meta content ledger live in that same branch), a single
copy-on-write timeline fork at one LSN reverts every app's data and every
in-branch service's tables atomically. There is no "revert app A's data but
not app B's" — it's one physical operation. Putting all app data on one branch
costs you per-app isolation guarantees, but it buys you a coherent,
all-or-nothing data rewind for free.
The code revert is uniform for a different reason: one git worktree holds every
app under apps/{slug}/, so a read-tree --reset to the target commit removes
files added since — across all app trees — in one stroke. (A plain
git checkout would leave added files behind; the reset is what makes
apps/blog/ actually vanish.)
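The checkpoint is only a marker, and the combined restore is mostly about ordering. In this sketch the store operations are injected async callbacks. They stand in for the snapshot step, restoreVersion, forkBranch plus repointDevCompute, and reconcileAfterRestore, and their signatures are hypothetical. Two orderings come from this post: the roll-forward snapshot is captured first, and infra reconciles after the code restore because it reads the reset spec.json. Running code before data is a choice of this sketch, not a stated requirement.

```javascript
// A checkpoint is a marker, not a copy: (timeline, lsn, commit).
function makeCheckpoint(timelineId, lsn, commitSha) {
  return Object.freeze({ timelineId, lsn, commitSha });
}

// Hypothetical combined restore; `ops` are injected stand-ins for the real steps.
async function combinedRestore(target, ops) {
  // 1. Capture the roll-forward point before touching anything.
  const rollForward = await ops.snapshotCurrent();
  // 2. Code: reset the one worktree, covering every app tree at once.
  await ops.restoreCode(target.commitSha);
  // 3. Data: fork the project's single branch at the target LSN.
  await ops.restoreData(target.timelineId, target.lsn);
  // 4. Infra: converge derived and relational state; reads the reset spec.json.
  await ops.reconcileInfra(target);
  return rollForward;
}
```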
Infra is the store that neither restore owns
Git rewinds source; Neon rewinds data. Nothing rewinds the runtime — and
that's the gap we closed today. reconcileAfterRestore brings the derived and
relational stores back in line with the restored moment:
- relational rows: archive plan revisions created after the target, reactivate the target-era revision, and soft-archive project_apps rows created after the target (so the new app leaves the active set);
- per-app K8s: teardownApp deletes the post-target app's Deployment (derived, so this is safe and regenerable);
- catalog services: converge projects.services down to the target's declared set (read from the worktree spec.json, which the code restore has already reset) and deprovision the add-ons that appeared since. This is what finally takes the orphaned directus Deployment down.
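The service side of that convergence is a set difference filtered by the registry's taxonomy. This sketch assumes current and target service-id lists and an isCodeDefined predicate standing in for isCodeDefinedService. It also assumes apps carry numeric or Date createdAt values compared against the target's time. planInfraRollback and those field names are hypothetical. It only plans: the outputs are soft-archives and deprovisions, never hard deletes.

```javascript
// Plan the infra half of a rollback. Outputs are soft-archives and
// deprovisions of regenerable workloads; nothing is hard-deleted here.
function planInfraRollback({ currentServices, targetServices, apps, targetTime, isCodeDefined }) {
  const keep = new Set(targetServices);
  // Only catalog add-ons that appeared since the target are torn down;
  // built-ins are left for the code+data restore to revert.
  const deprovision = currentServices.filter((id) => !keep.has(id) && !isCodeDefined(id));
  // Apps created after the target leave the active set via soft-archive.
  const archiveApps = apps
    .filter((a) => !a.archivedAt && a.createdAt > targetTime)
    .map((a) => a.id);
  return { services: [...keep], deprovision, archiveApps };
}

const BUILT_INS = new Set(["react-frontend", "trpc-api", "postgres", "s3"]);
const rollback = planInfraRollback({
  currentServices: ["react-frontend", "trpc-api", "postgres", "s3", "directus"],
  targetServices: ["react-frontend", "trpc-api", "postgres", "s3"],
  apps: [{ id: "a_finance", createdAt: 100 }, { id: "a_blog", createdAt: 200 }],
  targetTime: 150,
  isCodeDefined: (id) => BUILT_INS.has(id),
});
console.log(rollback.deprovision.join(","), rollback.archiveApps.join(",")); // directus a_blog
```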
Two subtleties made this hard, both fixed today:
- Pick the right moment. Checkpoints sit at commits, so "undo the blog" means restoring the checkpoint taken before the blog was added. The agent instead picked the post-blog checkpoint, which was current HEAD, making the whole restore a no-op. The restore-point list now flags the point anchored at current HEAD (current: true) and steers callers to the one just older.
- Tear down by taxonomy, not by name. "Which services are safe to remove" is not a hardcoded list. The registry already separates the built-in stack from catalog add-ons (codeServiceIds, captured before the catalog loads); rollback reads it via isCodeDefinedService. Built-ins (react-frontend, trpc-api, postgres, s3) are reverted by code+data and left alone; only catalog add-ons are deprovisioned.
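The picking rule is small enough to show whole. This sketch assumes the restore-point list arrives newest-first, with current: true on the point anchored at HEAD. resolveUndoTarget is a hypothetical name.

```javascript
// Given restore points ordered newest-first, return the point that "undo the
// latest change" should restore: never the one anchored at current HEAD.
function resolveUndoTarget(points) {
  const idx = points.findIndex((p) => p.current);
  // If HEAD has a checkpoint, step one older; otherwise the newest is already older.
  const target = idx === -1 ? points[0] : points[idx + 1];
  if (!target) {
    throw new Error("no checkpoint older than current HEAD; nothing to undo");
  }
  return target;
}

const points = [
  { commit: "ac62c29", current: true }, // post-blog, anchored at HEAD
  { commit: "6fe2a8b", current: false }, // pre-blog
];
console.log(resolveUndoTarget(points).commit); // 6fe2a8b
```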
Staying reversible
Time travel is only trustworthy if it's bidirectional. Before forking away from
the current state, restoreToCheckpoint captures a pre-restore snapshot at
the serving timeline's HEAD — real current schema and data, not a
seed-state LSN — so "undo the rollback" is a first-class restore to that point.
And because a data fork reverts DDL too, migrationsSince counts schema changes
after the target and warns when a data-only restore would strand the code
ahead of its schema, steering the user to a combined code+data rollback instead.
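The drift check compares where code and schema would each land. migrationsSince is the post's name, but the signature here is hypothetical. The sketch assumes each migration records the Postgres LSN at which it was applied and compares that against the checkpoint's LSN. LSNs are parsed rather than string-compared, since "X/Y" is two hex halves. restoreWarning and its message are hypothetical too.

```javascript
// Parse a Postgres LSN ("X/Y", two 32-bit hex halves) into a comparable BigInt.
function parseLsn(lsn) {
  const parts = lsn.split("/");
  if (parts.length !== 2) throw new Error(`bad LSN: ${lsn}`);
  return (BigInt(`0x${parts[0]}`) << 32n) | BigInt(`0x${parts[1]}`);
}

// Count schema migrations applied after the restore target's LSN.
function migrationsSince(migrations, targetLsn) {
  const target = parseLsn(targetLsn);
  return migrations.filter((m) => parseLsn(m.appliedAtLsn) > target).length;
}

// Warn when a data-only restore would revert DDL that the current code expects.
function restoreWarning(mode, migrations, targetLsn) {
  const n = migrationsSince(migrations, targetLsn);
  if (mode === "data" && n > 0) {
    return `${n} schema migration(s) after this point; a data-only restore leaves code ahead of its schema, prefer a combined code+data rollback`;
  }
  return null;
}

console.log(migrationsSince([{ appliedAtLsn: "0/2000" }, { appliedAtLsn: "0/800" }], "0/1000")); // 1
```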
The teardown stays reversible by construction: everything it removes is either
recoverable from git history (added-since files), regenerable from the spec
(K8s Deployments, catalog services), or soft-archived rather than deleted
(project_apps rows). The single rule behind the whole thing (what a rollback
is allowed to delete) is its own post.
The one principle
Strip the three costumes off and the same sentence is underneath all of them:
Every place that can silently fall back to "the primary app" is a fail-open that should fail closed — and multi-app desired state belongs in one generation-entry reconcile, not in a happy-path side-effect.
Concretely, that's three habits the codebase now leans on:
- Converge to the declared state at one chokepoint. startGeneration is the single door every generation passes through. Services reconcile there; apps reconcile there now too. The spec is desired state; the entry point makes it true, idempotently, regardless of which path the request arrived by.
- Fail closed, not back to primary. A NULL/sentinel app_id, an unmaterialized app, an unresolved reference: these are contract violations to refuse, not defaults to absorb. The seductive rowAppId || defaultAppId is exactly the smell.
- Classify from the data, never from a list. Built-in vs catalog comes from the registry. App identity comes from project_apps. Cross-app importability comes from the tree layout. When a fix wants to hardcode "the four core services" or "the blog app," that's a sign the taxonomy already exists somewhere authoritative and should be read, not duplicated.
The primary app has gravity: every ambiguous default, every silent fallback, every un-reconciled store pulls work back toward it. Good multi-app provisioning is mostly the discipline of refusing that pull — making the declared plurality real at one point, and treating "fell back to primary" as a bug to fail on rather than a safe default to rest on.