The runtime · Part 1 of 2

The gravity of the primary app

A project on this platform isn't one app and one stack — it's N isolated apps and M services sharing a single Kubernetes namespace, all provisioned from a chat message. Once you allow that, almost every bug rhymes: a "new sibling app" silently collapses back into the primary one. This is the story of why that keeps happening, the single principle that fixes the whole class — converge to the declared state at one chokepoint, fail closed instead of falling back to primary — and how the same principle, run in reverse, lets us time-travel the entire stack (code in git, data in a shared Neon branch, and the live K8s infrastructure) back to one coherent moment.


For a long time a project here meant one thing: one app, one stack. You described an app, we generated a React (or React + tRPC + Postgres) project, and that was the universe.

Then projects grew a second app. "Add a marketing blog next to my finance tracker." "Give me an admin panel beside the storefront." Suddenly a single project is several apps, each its own isolated Vite tree with its own stack, plus several services — some shared platform infrastructure, some per-feature add-ons — all living in one Kubernetes namespace and all provisioned from chat.

That shift is small to describe and large to get right. This post is about how provisioning actually works once a project is plural, and about a failure mode that turned out to be a single bug wearing three different costumes. We shipped all three fixes today; the interesting part is that they're the same fix.

What a project actually is

Here is the anatomy of a real two-app project (qiwykfu39n: a finance tracker that grew a blog):

                          project: qiwykfu39n
  ┌──────────────────────────────────────────────────────────────────┐
  │  git worktree   /data/projects/qiwykfu39n                          │
  │    apps/personal-finance/   kind=fullstack   is_primary=true       │
  │    apps/blog/               kind=react       is_primary=false      │
  │    .adorable/spec.json      ← the declared desired state           │
  └──────────────────────────────────────────────────────────────────┘
                                  │  manifests generated from spec
                                  ▼
  ┌─ k8s namespace: adorable-qiwykfu39n ──────────────────────────────┐
  │   Deployment  app          personal-finance  (Vite + tRPC)        │
  │   Deployment  app-blog      blog              (Vite)               │
  │   Deployment  directus      catalog service   (shared, 1 per ns)   │
  │   Deployment  neon-compute  per-project Postgres compute           │
  │   Ingress     personal-finance.preview…  blog-…  directus-…        │
  └──────────────────────────────────────────────────────────────────┘
        │  DATABASE_URL / S3 keys / tokens  via Vault → K8s Secrets
        ▼
   Neon (shared pageserver)          Garage S3 (shared)   ← platform services

Three things in that picture matter for provisioning:

  1. Apps are isolated trees. Each app is a self-contained Vite app under apps/{slug}/. There is no shared bundler, no cross-app import — a symbol in apps/blog is never reachable from apps/personal-finance. This is a hard invariant, and we'll see it bite later.

  2. Services split into two classes. Platform services (Neon Postgres, Garage S3) are shared instances outside the namespace, reached via ExternalName. In-namespace services are either the built-in stack (react-frontend, trpc-api, postgres, s3) or catalog add-ons (directus, redis, strapi, …) defined as config/services/{id}/service.json. That distinction is not cosmetic — it decides what a rollback is allowed to tear down.

  3. Everything is derived from the spec. .adorable/spec.json is the declared desired state. The K8s runtime is regenerated from it; it is never the source of truth.
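Point 2's split between the built-in stack and catalog add-ons can be read from data instead of being kept as a list. Below is a minimal sketch, assuming a registry that snapshots its code-defined ids before any catalog entry loads. codeServiceIds and isCodeDefinedService are names this post uses in the rollback section. The ServiceRegistry class, loadCatalog, and the collision check are hypothetical.

```javascript
// Hypothetical registry: built-ins are registered from code; catalog add-ons
// (e.g. parsed config/services/{id}/service.json entries) are loaded afterwards.
class ServiceRegistry {
  constructor(builtins) {
    this.services = new Map(builtins.map((svc) => [svc.id, svc]));
    // Snapshot taken before the catalog loads: this set *is* the taxonomy.
    this.codeServiceIds = new Set(this.services.keys());
  }

  loadCatalog(entries) {
    for (const entry of entries) {
      // Assumption: a catalog entry may not shadow a built-in or another entry.
      if (this.services.has(entry.id)) {
        throw new Error(`catalog entry "${entry.id}" collides with an existing service`);
      }
      this.services.set(entry.id, entry);
    }
  }

  isCodeDefinedService(id) {
    return this.codeServiceIds.has(id);
  }
}

const registry = new ServiceRegistry(
  ["react-frontend", "trpc-api", "postgres", "s3"].map((id) => ({ id }))
);
registry.loadCatalog([
  { id: "directus", requires: ["postgres", "s3"] },
  { id: "redis", requires: [] },
]);
```

Because the snapshot is taken in the constructor, nothing a catalog file declares can later make itself look like a built-in.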

The three stores, and the relational glue

The state of a project is spread across stores with different identity and durability models:

  Store                 Holds                                             Identity
  ───────────────────   ───────────────────────────────────────────────   ────────────────────
  git worktree          every app's source under apps/{slug}/             commit SHA
  K8s namespace         Deployments / Services / Ingress                  derived, regenerable
  Neon branch           all app data + Directus tables + adorable_meta    timeline + LSN
  Postgres (platform)   project_apps, projects.services, plan revisions   relational rows

The relational rows are the glue. Two are load-bearing:

-- migration 100 (migration 108 later added fs_dir_name): one row per app
CREATE TABLE project_apps (
  id          TEXT PRIMARY KEY,         -- a_xxx
  project_id  TEXT NOT NULL,
  slug        TEXT NOT NULL,            -- → apps/{slug}/ on disk, app-{slug} in k8s
  kind        TEXT CHECK (kind IN ('react','fullstack')),
  is_primary  BOOLEAN NOT NULL,
  fs_dir_name TEXT,
  archived_at TIMESTAMPTZ              -- soft-archive on rollback
);

-- the project's flat, declared service set (jsonb array on the projects row)
SELECT services FROM projects WHERE id = 'qiwykfu39n';
-- → ["react-frontend","trpc-api","postgres","s3","directus"]

project_apps.kind is per-app; projects.services is project-wide. Hold on to that asymmetry: it's the seam that half of today's bugs slipped through.

From a sentence to a second app

How does "add a blog as a separate app" become a project_apps row, a Vite tree, and a Deployment? Through the plan, never through a chat tool reaching into Kubernetes directly. The architect proposes a feature carrying a createsApp directive and a $NEW:<featureId> sentinel as its targetAppId:

  chat: add a blog as a separate app
          │
          ▼
  architect / propose_plan
          │
          ▼
  spec.json feature:
    createsApp:  {slug: blog, kind: react}
    targetAppId: $NEW:blog
          │  user approves
          ▼
  resolveCreatesAppSentinels
          │  • project_apps row  (slug = blog)
          │  • rewrite spec + subtask rows:  $NEW:blog → a_xxx
          ▼
  bucketPlanByAppId
          ├──►  BullMQ job  (app = personal-finance)
          └──►  BullMQ job  (app = blog)

The sentinel matters because the app doesn't exist yet at plan time, but the plan already needs to reference it. The database can't store $NEW:blog as an app_id (the column is a foreign key), so subtask rows for sentinel features are written with NULL, and the literal $NEW:blog lives only in spec.json. At approve time, resolveCreatesAppSentinels (in state-tools.js):

  1. creates the real app (createApp → row + scaffold + K8s rebuild trigger),
  2. rewrites $NEW:blog → the real a_xxx in the spec and stamps app_id onto the subtask rows that referenced it (including sibling features that point at the same $NEW:blog),
  3. hands the resolved plan to bucketPlanByAppId, which fans the work out into one BullMQ job per distinct app.

That fan-out is why per-app scoping has to be real: each job runs the producer against exactly one app's tree (resolveAppDir(projectId, appId)).
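The three approve-time steps can be sketched as one function. This sketch assumes a spec shaped as { features: [...] }, subtask rows that carry a featureId, and an injected createApp callback standing in for the real createApp. resolveSentinelsSketch and all of those shapes are hypothetical, not the actual state-tools.js API.

```javascript
const NEW_PREFIX = "$NEW:";

// Hypothetical sketch of approve-time sentinel resolution.
// createApp(directive) stands in for the real createApp (row + scaffold + rebuild)
// and must return the new app's id (e.g. "a_xxx").
function resolveSentinelsSketch(spec, subtaskRows, createApp) {
  const resolved = new Map(); // "$NEW:<featureId>" -> real app id

  // 1. Materialize one app per createsApp directive.
  for (const feature of spec.features) {
    if (!feature.createsApp) continue;
    resolved.set(NEW_PREFIX + feature.id, createApp(feature.createsApp));
    delete feature.createsApp; // a second run finds nothing left to create
  }

  // 2a. Rewrite sentinel references in the spec, including sibling features
  //     that target the same $NEW: sentinel without creating anything themselves.
  for (const feature of spec.features) {
    if (resolved.has(feature.targetAppId)) {
      feature.targetAppId = resolved.get(feature.targetAppId);
    }
  }

  // 2b. Stamp app_id onto subtask rows stored with NULL, via their feature.
  //     Rows whose feature still points at a sentinel stay NULL (never guessed).
  const appByFeature = new Map(spec.features.map((f) => [f.id, f.targetAppId]));
  for (const row of subtaskRows) {
    const target = appByFeature.get(row.featureId);
    if (row.appId === null && target && !target.startsWith(NEW_PREFIX)) {
      row.appId = target;
    }
  }

  // 3. The caller hands the resolved plan to the per-app fan-out.
  return resolved;
}
```

Stripping createsApp once the app exists is what lets a second run do nothing, and rows whose feature still points at an unresolved sentinel are deliberately left NULL rather than filled with a default.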

From services to a running namespace

Services take a parallel path. The service registry (service-registry.js) holds both the code-defined built-ins and the catalog entries. A ProjectSpec is assembled (project-spec.js) and handed to k8s-manifest-generator.js, which emits one Deployment/Service/Ingress per app plus one per in-namespace service. The multi-app twist lives in project-spec.js: the react-frontend / trpc-api registry entries are skipped and replayed per-app instead, with each app's kind deciding its shape:

// project-spec.js — the primary app's kind is derived, not trusted
const isTrpcActive  = resolved.includes("trpc-api");
const effectiveKind = (app.isPrimary && isTrpcActive) ? "fullstack" : app.kind;

(The primary app's stored kind can drift — legacy projects defaulted to react and never got updated when the agent added trpc-api — so manifest generation re-derives it from the active service set every time rather than trusting the column.)
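Run once per app, the replay can be sketched as follows. The sketch assumes app records shaped { slug, kind, isPrimary } and the Deployment naming from the namespace diagram (app for the primary, app-{slug} for siblings). appDeployments is a hypothetical name; only the effectiveKind rule comes from the excerpt above.

```javascript
// Hypothetical sketch: the per-app replay of the react-frontend / trpc-api
// entries, re-deriving the primary app's kind on every run.
function appDeployments(apps, resolvedServices) {
  const isTrpcActive = resolvedServices.includes("trpc-api");
  return apps.map((app) => ({
    // Naming as in the namespace diagram: primary → "app", siblings → "app-{slug}".
    name: app.isPrimary ? "app" : `app-${app.slug}`,
    // The stored kind can be stale for a legacy primary app, so derive it;
    // sibling apps keep their own declared kind.
    kind: app.isPrimary && isTrpcActive ? "fullstack" : app.kind,
  }));
}
```

Siblings keep their declared kind. Because projects.services is project-wide, inferring a sibling's kind from it would presumably hand the primary's tRPC stack to every app.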

Deployment order is not hardcoded. deploy-order.js reads each service's requires and readyCheck from the catalog and topologically sorts into tiers, running readiness probes between them:

deploy-order.js — tiers from `requires` + `readyCheck` (no hardcoded names)

  Tier 0  (no deps)            Tier 1  (deps healthy first)
  ──────────────────           ─────────────────────────────
  neon-compute  ─ pg_isready ──┐
  redis                        ├──►  directus   requires: [postgres, s3]
  app / app-blog               │     strapi     requires: [postgres]
  garage (s3)   ─ bucket OK ───┘
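The tiering is an ordinary layered topological sort. Below is a minimal sketch using Kahn's algorithm, assuming each catalog entry exposes an id and an optional requires array. Readiness probes between tiers are left out, the ids are logical service names (the diagram's neon-compute and garage satisfy postgres and s3), and deployTiers and its error messages are hypothetical.

```javascript
// Hypothetical sketch: layered topological sort (Kahn's algorithm).
// Each returned tier depends only on services in earlier tiers, so a deployer
// can start tier N once every readyCheck in tiers 0..N-1 has passed.
function deployTiers(services) {
  const known = new Set(services.map((s) => s.id));
  const pending = new Map(); // id -> set of not-yet-placed dependencies
  for (const s of services) {
    const deps = s.requires ?? [];
    for (const dep of deps) {
      if (!known.has(dep)) throw new Error(`${s.id} requires unknown service ${dep}`);
    }
    pending.set(s.id, new Set(deps));
  }

  const tiers = [];
  while (pending.size > 0) {
    // Everything whose dependencies are all placed forms the next tier.
    const tier = [...pending]
      .filter(([, deps]) => deps.size === 0)
      .map(([id]) => id)
      .sort();
    if (tier.length === 0) {
      throw new Error(`dependency cycle among: ${[...pending.keys()].join(", ")}`);
    }
    for (const id of tier) pending.delete(id);
    for (const deps of pending.values()) {
      for (const id of tier) deps.delete(id);
    }
    tiers.push(tier);
  }
  return tiers;
}
```

Refusing on a cycle or an unknown dependency keeps the ordering derived from the catalog rather than patched by a hardcoded name list.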

Credentials thread through Vault the entire way: getEnvVarsForCompose aggregates a project's secrets (DATABASE_URL, S3 keys, service creds), and every buildProjectSpec call must pass them as vaultEnvVars or the container boots without a database. Catalog services like Directus connect to the same project Neon branch (DB_CLIENT=pg, their own schema) — which is why their tables, and the platform's adorable_meta content ledger, all ride one Neon timeline. That fact decides what a data rollback reverts.
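Forgetting vaultEnvVars is the same fail-open shape this post keeps returning to: the container boots anyway. A guard that refuses instead might look like the sketch below. It assumes each service declares the env keys it needs in a requiredEnv field. That field and assertVaultEnv are invented for illustration; the post doesn't say how env requirements are declared.

```javascript
// Hypothetical fail-closed guard run before a spec is built: every service's
// declared env keys must be present in the Vault-sourced env vars.
// `requiredEnv` is an assumed field, not a documented one.
function assertVaultEnv(services, vaultEnvVars) {
  const env = vaultEnvVars ?? {};
  const missing = [];
  for (const svc of services) {
    for (const key of svc.requiredEnv ?? []) {
      if (!(key in env) || env[key] === "") missing.push(`${svc.id}: ${key}`);
    }
  }
  if (missing.length > 0) {
    // Refuse instead of booting a container without its database.
    throw new Error(`missing Vault env vars: ${missing.join(", ")}`);
  }
  return env;
}
```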

The bug that wears three costumes

Here's where it gets interesting. Once a project is plural, a specific failure keeps recurring: a "new sibling app" silently collapses into the primary one. We hit it three separate ways today, fixed each, and only afterward saw they were one bug.

THE FAILURE CLASS:  multi-app intent → silently degrades to "the primary app"

  surface where it leaked          what fell back to primary        fix (commit)
  ───────────────────────────      ──────────────────────────       ───────────
  1. producer's READ context       saw a sibling app's file in       scope context
     (gatherContext)               the project-wide code index,      to targetAppId
                                   wrote a cross-app import          (1ce1b58)

  2. app MATERIALIZATION           createsApp/$NEW never resolved,   reconcile at
     (approve happy-path only)     subtasks (app_id NULL) routed     startGeneration,
                                   to the primary app's tree         fail closed (c3afd35)

  3. ROLLBACK teardown             service + checkpoint not          reconcile code +
     (code+data only)              reconciled; blog infra stayed     data + INFRA to
                                   live after "undo the blog"        one target (bf3edbd)

Costume 1 — context bleed

The producer generating apps/blog was handed the project-wide code index and vector-search results, unfiltered by app. The structural index does fully qualify paths (code-index.js emits apps/personal-finance/src/lib/trpc.tsx with an appId tag), but the producer pattern-matched on the symbol rather than the app boundary and wrote import { TRPCProvider } from './lib/trpc' into the Directus-only blog: an import of a file that exists only in a different app's tree. Vite couldn't resolve it, and the preview died.

Apps are isolated trees, so a sibling's file is never locally importable. The fix scopes the gathered context to the run's target app — the dependency graph stays full (it's the intended cross-app surface, and it names apps), but the raw file dump the producer copies from is filtered to targetAppId. It's a knowledge bug — the context was factually wrong about which app owns the file — fixed by correcting the context, not by adding a rule.
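The fix amounts to a filter at context-gathering time. The minimal sketch below assumes the gathered context has the shape { dependencyGraph, files, searchHits }, where every file and search hit carries the appId tag that code-index.js emits. scopeContext and that shape are hypothetical, and throwing on a missing targetAppId is this sketch's own choice.

```javascript
// Hypothetical sketch: scope the raw file context to the run's target app,
// while passing the (app-qualified) dependency graph through untouched.
function scopeContext(context, targetAppId) {
  if (!targetAppId) {
    // Fail closed: an unscoped run would reintroduce the cross-app bleed.
    throw new Error("scopeContext: targetAppId is required");
  }
  return {
    dependencyGraph: context.dependencyGraph, // intended cross-app surface; names apps
    files: context.files.filter((f) => f.appId === targetAppId),
    searchHits: context.searchHits.filter((h) => h.appId === targetAppId),
  };
}
```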

Costume 2 — materialization on the happy path only

App materialization (createApp) lived in exactly one place: approve_plan's sentinel resolver. But generation reaches the producer through many doors — an errored approval turn, trigger_generation, resume, iteration carryover. In one trace the approval turn errored after planning; the blog's $NEW:blog was never resolved, the subtasks kept app_id = NULL, and bucketPlanByAppId's rowAppId || defaultAppId routed them straight to the primary app. The "separate Finance Blog" was built into the finance tracker. No app row was ever created.

The services next door already had the answer. startGeneration runs a convergent reconcile of services on every entry path — it exists precisely to catch the doors that bypass the happy path. Apps just hadn't joined that contract:

  approve_plan ──────────┐
  trigger_generation ────┤
  resume / retry ────────┼──►  startGeneration
  iteration carryover ───┘            │
                                      ▼
                     spec declares an unmaterialized app?
                          │ yes                     │ no
                          ▼                         │
              resolveCreatesAppSentinels            │
                   (reconcile mode)                 │
                    │ ok           │ fail           │
                    │              ▼                │
                    │     REFUSE: status → idle,    │
                    │             no enqueue        │
                    ▼                               │
              syncProjectServices  ◄────────────────┘
           (converge to spec.services)
                    │
                    ▼
              enqueue producer

The reconcile is idempotent (a successful approve_plan strips createsApp from the spec, so on that path it no-ops) and fails closed: a declared-but-unmaterialized app refuses generation rather than letting it route to the primary app. We also made the fan-out itself fail closed: a literal $NEW: sentinel reaching bucketPlanByAppId now throws instead of falling back.
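Side by side, the fallback and a fail-closed replacement look like this. The sketch assumes subtasks of shape { id, appId } and a set of materialized app ids. The post only states that a $NEW: sentinel now throws; also refusing NULL and unknown ids here is an assumption that follows the principle stated at the end of this post. bucketFailOpen and bucketFailClosed are hypothetical names.

```javascript
// The fail-open shape: a NULL app_id silently lands on the primary app.
function bucketFailOpen(subtasks, defaultAppId) {
  const buckets = new Map();
  for (const task of subtasks) {
    const appId = task.appId || defaultAppId; // the smell
    if (!buckets.has(appId)) buckets.set(appId, []);
    buckets.get(appId).push(task);
  }
  return buckets;
}

// Hypothetical fail-closed version: one bucket (one BullMQ job) per app,
// and a refusal instead of a default for anything unresolved.
function bucketFailClosed(subtasks, materializedAppIds) {
  const buckets = new Map();
  for (const task of subtasks) {
    const appId = task.appId;
    if (appId == null) {
      throw new Error(`subtask ${task.id}: app_id is NULL, refusing to default to primary`);
    }
    if (appId.startsWith("$NEW:")) {
      throw new Error(`subtask ${task.id}: unresolved sentinel ${appId}`);
    }
    if (!materializedAppIds.has(appId)) {
      throw new Error(`subtask ${task.id}: unknown app ${appId}`);
    }
    if (!buckets.has(appId)) buckets.set(appId, []);
    buckets.get(appId).push(task);
  }
  return buckets;
}
```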

Costume 3 — time-travel that forgets two of the three stores

This is the one worth dwelling on, because rolling a multi-app, multi-service project back in time is the provisioning problem run in reverse. "Roll back the blog" must land the project at the moment before the blog existed — and that moment has to be the same across all three stores at once.

One project, one timeline — why the data revert is atomic

Every generation commit records a lightweight checkpoint: the dev Neon timeline id + its head LSN + the commit SHA (recordCheckpoint in db-timetravel.js). No data is copied — it's a (timeline, lsn, commit) marker. A combined restore then rewinds three stores to that marker:

  checkpoints (one per generation commit):
        ● 6fe2a8b ─────────────────────────── ● ac62c29  (= current HEAD)
         pre-blog                                post-blog
                     "undo the blog"  ⇒  restore to 6fe2a8b
                                  │
   ┌──────────────────────────────┼──────────────────────────────────┐
   ▼ CODE  (git)         ▼ DATA  (Neon)            ▼ INFRA  (reconcile)
 restoreVersion          forkBranch at the        reconcileAfterRestore
 read-tree --reset       checkpoint LSN:          • archive post-target
  to 6fe2a8b:            whole-branch COW fork       plan revisions
  deletes apps/blog/*     reverts EVERY app's      • soft-archive + tear
  + any file added        schema, Directus           down apps/blog (K8s
  since target, across     tables, AND               teardownApp)
  ALL app trees,           adorable_meta —         • deprovision directus
  uniformly                ATOMICALLY                (catalog add-on, via
                          (one timeline/project)     isCodeDefinedService)
                         repointDevCompute swaps
                          the compute → 6fe2a8b
   └──────── a pre-restore snapshot is captured FIRST → roll-forward point ──────┘

The multi-app payoff is in the middle column. Because all of a project's apps share one Neon branch (each app gets its own schema; Directus and the adorable_meta content ledger live in that same branch), a single copy-on-write timeline fork at one LSN reverts every app's data and every in-branch service's tables atomically. There is no "revert app A's data but not app B's" — it's one physical operation. Putting all app data on one branch costs you per-app isolation guarantees, but it buys you a coherent, all-or-nothing data rewind for free.

The code revert is uniform for a different reason: one git worktree holds every app under apps/{slug}/, so a read-tree --reset to the target commit removes files added since — across all app trees — in one stroke. (A plain git checkout would leave added files behind; the reset is what makes apps/blog/ actually vanish.)

Infra is the store that neither restore owns

Git rewinds source; Neon rewinds data. Nothing rewinds the runtime — and that's the gap we closed today. reconcileAfterRestore brings the derived and relational stores back in line with the restored moment:

  • relational rows: archive plan revisions created after the target, reactivate the target-era revision, and soft-archive project_apps created after the target (so the new app leaves the active set);
  • per-app K8s: teardownApp deletes the post-target app's Deployment (derived, so this is safe and regenerable);
  • catalog services: converge projects.services down to the target's declared set (read from the worktree spec.json, which the code restore has already reset) and deprovision the add-ons that appeared since. This is what finally takes the orphaned directus Deployment down.
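The three bullets reduce to a diff between the current state and the target's declared state. The minimal sketch below assumes app rows carry createdAt and archivedAt, that the target's service list has already been read from the restored spec.json, and that isCodeDefinedService is passed in. planInfraReconcile and its return shape are hypothetical. Performing the plan (teardownApp, deprovisioning, row updates, plan-revision archiving) is left to the caller.

```javascript
// Hypothetical sketch: compute what reconcileAfterRestore has to undo as a
// pure plan; performing it (teardownApp, deprovision, row updates) is separate.
function planInfraReconcile({ targetAt, apps, currentServices, targetServices, isCodeDefinedService }) {
  const declared = new Set(targetServices);
  return {
    // Active apps created after the target: soft-archive the row, tear down the Deployment.
    archiveApps: apps
      .filter((app) => app.archivedAt == null && app.createdAt > targetAt)
      .map((app) => app.id),
    // Catalog add-ons that appeared since the target. Built-ins are reverted
    // by the code + data restore, so they are never deprovisioned here.
    deprovision: currentServices.filter(
      (id) => !declared.has(id) && !isCodeDefinedService(id)
    ),
    // projects.services converges down to the target's declared set.
    nextServices: [...targetServices],
  };
}
```

Keeping the diff pure makes it easy to show the user what a rollback will remove before anything is torn down.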

Two subtleties made this hard, both fixed today:

  • Pick the right moment. Checkpoints sit at commits, so "undo the blog" means restore the checkpoint before it. The agent instead picked the post-blog checkpoint — which was current HEAD — making the whole restore a no-op. The restore-point list now flags the point anchored at current HEAD (current: true) and steers callers to the one just older.

  • Tear down by taxonomy, not by name. "Which services are safe to remove" is not a hardcoded list. The registry already separates the built-in stack from catalog add-ons (codeServiceIds, captured before the catalog loads); rollback reads it via isCodeDefinedService. Built-ins (react-frontend, trpc-api, postgres, s3) are reverted by code+data and left alone; only catalog add-ons are deprovisioned.
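The first subtlety fits in a few lines. The sketch below assumes checkpoints arrive oldest first as (timeline, lsn, commit) records. The commits mirror the timeline diagram above. The timeline id, the LSN values, and the names listRestorePoints, restorePointBefore, and assertNotCurrent are made up for illustration.

```javascript
// Hypothetical sketch: flag the checkpoint anchored at current HEAD,
// since restoring to it is a no-op.
function listRestorePoints(checkpoints, headSha) {
  return checkpoints.map((cp) => ({ ...cp, current: cp.commit === headSha }));
}

// "Undo X" = restore the newest checkpoint strictly older than X's commit.
// `points` must be ordered oldest → newest.
function restorePointBefore(points, commitSha) {
  const idx = points.findIndex((p) => p.commit === commitSha);
  if (idx === -1) throw new Error(`no checkpoint at commit ${commitSha}`);
  if (idx === 0) throw new Error(`no checkpoint older than ${commitSha}`);
  return points[idx - 1];
}

// Refuse the no-op instead of reporting a successful restore.
function assertNotCurrent(point) {
  if (point.current) {
    throw new Error(`checkpoint ${point.commit} is current HEAD; restoring it changes nothing`);
  }
  return point;
}

// Usage with the two checkpoints from the diagram (timeline id and LSNs are made up).
const points = listRestorePoints(
  [
    { timeline: "tl_dev", lsn: "0/16B3748", commit: "6fe2a8b" }, // pre-blog
    { timeline: "tl_dev", lsn: "0/1A2F3B0", commit: "ac62c29" }, // post-blog, HEAD
  ],
  "ac62c29"
);
const undoBlog = assertNotCurrent(restorePointBefore(points, "ac62c29"));
```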

Staying reversible

Time travel is only trustworthy if it's bidirectional. Before forking away from the current state, restoreToCheckpoint captures a pre-restore snapshot at the serving timeline's HEAD — real current schema and data, not a seed-state LSN — so "undo the rollback" is a first-class restore to that point. And because a data fork reverts DDL too, migrationsSince counts schema changes after the target and warns when a data-only restore would strand the code ahead of its schema, steering the user to a combined code+data rollback instead.
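The ordering matters most: check, snapshot, then fork. The sketch below injects every side effect and assumes two modes, "data" and "combined". restoreWithRollForward and its deps object are hypothetical. Where the post says the real flow warns, this sketch simply stops and returns the warning. Running the code restore before reconcile follows from reconcile reading the already-reset spec.json; the relative order of code and data here is an assumption.

```javascript
// Hypothetical orchestration sketch; every dependency is injected.
function restoreWithRollForward(target, mode, deps) {
  if (mode === "data") {
    const n = deps.migrationsSince(target);
    if (n > 0) {
      // A data fork reverts DDL too: the code would be left ahead of its schema.
      return {
        ok: false,
        warning: `${n} schema change(s) since ${target.commit}; use a combined code+data rollback`,
      };
    }
  }
  // Captured FIRST, at the serving timeline's HEAD: this is the roll-forward point.
  const rollForward = deps.captureSnapshot();
  if (mode === "combined") deps.restoreCode(target.commit);
  deps.forkBranch(target.timeline, target.lsn);
  if (mode === "combined") deps.reconcile(target); // reads the restored spec.json
  return { ok: true, rollForward };
}
```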

The teardown stays reversible by construction: everything it removes is either recoverable from git history (added-since files), regenerable from the spec (K8s Deployments, catalog services), or soft-archived rather than deleted (project_apps rows). The single rule behind the whole thing — what a rollback is allowed to delete — is its own post.

The one principle

Strip the three costumes off and the same sentence is underneath all of them:

Every place that can silently fall back to "the primary app" is a fail-open that should fail closed — and multi-app desired state belongs in one generation-entry reconcile, not in a happy-path side-effect.

Concretely, that's three habits the codebase now leans on:

  1. Converge to the declared state at one chokepoint. startGeneration is the single door every generation passes through. Services reconcile there; apps reconcile there now too. The spec is desired state; the entry point makes it true, idempotently, regardless of which path arrived.

  2. Fail closed, not back to primary. A NULL/sentinel app_id, an unmaterialized app, an unresolved reference — these are contract violations to refuse, not defaults to absorb. The seductive rowAppId || defaultAppId is exactly the smell.

  3. Classify from the data, never from a list. Built-in vs catalog comes from the registry. App identity comes from project_apps. Cross-app importability comes from the tree layout. When a fix wants to hardcode "the four core services" or "the blog app," that's a sign the taxonomy already exists somewhere authoritative and should be read, not duplicated.

The primary app has gravity: every ambiguous default, every silent fallback, every un-reconciled store pulls work back toward it. Good multi-app provisioning is mostly the discipline of refusing that pull — making the declared plurality real at one point, and treating "fell back to primary" as a bug to fail on rather than a safe default to rest on.
