Idea-to-Reality Stack · v2.0

A general-purpose
creation engine

In 2020 I mapped the tools a person needs to turn ideas into reality. Now the question is different: for any output you can imagine, how much of it can AI build end-to-end — and what's still on you?

The big idea

Part of solving for AGI is building out the potential space of all possible output artifacts — then building the heuristics and processes that align each artifact's necessary inputs with the tools and steps that produce it, in a logical flow.

The goal over the next few years: an unhinged amount of creative output, produced by treating creation itself as a compilable pipeline. This page is the thinking made visible — and the live map is the first working organ of that engine.

From a second brain to a creation engine

The original Idea-to-Reality Stack (2020) was a personal-workflow language. Its core claim still holds: a “second brain” is only one layer of a full stack that spans three stages — Generate → Validate → Execute — built from both mental frameworks and digital/physical tools.

The original 2020 Idea-to-Reality Stack canvas: a static grid of mental frameworks arranged by One Mind / Many Minds and the three stages.
Before · 2020 A static canvas. You hand-placed mental frameworks on a printed grid — no tools connected, no automation, no estimate of what you could actually finish.
Mockup of the new Creation Engine UI: a creation prompt, an interactive ease x completeness map with a frontier line, a production graph with per-step tool integrations, connected tools, and a leaderboard.
After · now The Creation Engine. Type what you want to make; it coordinates the stages into real output artifacts, plugs into your tools (BYO key), scores each step, and plots your frontier on the live map.

The live map: ease × completeness

Every dot is an output artifact. Up = more of it can be done end-to-end by AI + a human. Right = less real-world friction (cost, robotics, regulation). Dot size = how many human touchpoints remain. Use the controls to filter by domain and to move the AI-completable frontier through time.

How the engine works

  1. Decompose the target artifact into a production graph (sub-artifacts + steps).
  2. Map each step to the best available tool, agent, or skill (LLM, text-to-image, code, API, human, robot).
  3. Score each step's fulfillment % and ease, then roll up to a composite for the whole artifact.
  4. Surface the gaps — the exact steps where a human (or a not-yet-existing capability) must enter.
  5. Place the artifact on the map and show the full stack required to make it real.

Architecture

One portable knowledge layer, two front doors. The data is the moat; the UI and the agent-skills are thin clients over it.

1 · Stack ontology

Artifact taxonomy, step graphs, tool/agent registry, and fulfillment + ease heuristics — plain versioned files. (This site reads its seed from data.js.)

2 · Estimator

A headless “make X” planner (CLI/API) that returns the decomposition, the stack, a completeness %, an ease score, and the gaps.

3 · Clients

This interactive map + blog, plus bring-your-own-key agent skills (Claude/Codex/Hermes/OpenClaw) that don't just describe the stack — they run it.

Layer 1 · the moat

The ontology: the map's source of truth

Everything you see on the map is a thin renderer over one small, carefully-typed dataset called the ontology. Think of it as the engine's dictionary of what can be made and what it takes to make it. It answers three questions in plain data:

37artifacts
56tools / agents
136production steps
5domains

It's written in TypeScript and validated with Zod, so the data can't drift out of shape: every artifact, tool, and step is checked against a schema, and the build refuses to publish if a step ever cites a tool that doesn't exist. That's the honesty rule made mechanical — no capability gets claimed unless it's a real, registered tool. A single command re-validates the ontology and regenerates the data.js this site reads.

// ontology/schema.ts — every step must cite at least one tool
const StepSchema = z.object({
  stage:       StageEnum,                 // generate | validate | execute
  name:        z.string().min(1),
  toolIds:     z.array(z.string()).min(1), // ← can't be empty: the honesty rule
  fulfillment: z.number().int().min(0).max(100),
});

The goal

Build the Creation Engine: a portable stack ontology plus a headless estimator that, given a target output, returns its decomposition, the required stack, a composite completeness % and ease score, the ordered plan, and the human-in-the-loop gaps — rendered onto the ease × completeness map you see above.

Honesty rule: every fulfillment % cites the tool and maturity it's based on. The map shows the capability frontier as it actually is, not as we wish it.

Coming next

The benchmark: plot your stack's frontier

The map above shows the global frontier. The next build is an open eval that any stack can plug into — an agent, a model, a toolchain, even a human + AI pair, bring your own API key. It runs a battery of creation tasks, measures how much each one your stack can actually finish, and draws your own frontier line overlaid on this exact map.

Plug in any stack

Submit a stack manifest (model + tools + keys) and a thin adapter that turns a task brief into produced artifacts plus a run-log of the steps it took.

Four scored axes

Completeness (steps finished unaided), quality (rubric / judged), autonomy & ease (human steps, cost, time), and honesty (did it report its own gaps instead of overclaiming?).

Your frontier, overlaid

Results render as a contour on the map above — the line where your stack's completeness × quality drops off. That line is “where your capabilities end up.”

How a run works

  1. Pick a task set — start with auto-gradeable digital artifacts (blog post, landing page, small web app, song, eBook) with deterministic verification.
  2. Your adapter runs each brief and drops artifacts + a step log into an output folder.
  3. The harness verifies — programmatic checks first, judged quality second, with provenance (model versions, cost, timestamp) recorded.
  4. Scores roll up into the four axes and a composite reach.
  5. Your frontier is drawn on the map and added to a dated leaderboard, so different stacks — and the same stack over time — can be compared as the frontier moves.

Same honesty rule applies: physical artifacts (house, drug, spacecraft) are scored only on their digital precursors, with the real-world remainder labelled unevaluated rather than faked.