Codex — Project Plan
Version 0.1 · 2026-04-18
1. Vision
A single resource that takes a self-taught learner from a minimal prerequisite (high-school algebra + basic functions) to graduate-level mastery across mathematics and physics — with a per-unit rigor toggle so the same corpus serves a curious 15-year-old on Beginner, an undergrad on Intermediate, and a PhD-track learner on Master. Built to supersede every existing roadmap (Fast Track, 't Hooft, Rigetti, Tong's notes) by being structurally better in ways books physically cannot be: cross-linked concept graph, Lean-verified proof practice, progressive rigor disclosure, mastery-gated progression.
2. Audience & Scope
Primary learner profile
- Knows high-school algebra, functions, basic geometry
- May or may not have calculus exposure
- Self-directed; no institutional support
- Motivated by either curiosity (Beginner path) or career/research ambition (Master path)
Tiers (progressive disclosure — see §3)
| Tier | Anchor literature | Endpoint |
|---|---|---|
| Beginner | 3Blue1Brown, Strogatz Infinite Powers | Scientific literacy; can read popular-science writing with real comprehension |
| Intermediate | Axler Linear Algebra Done Right, Apostol, Griffiths | Undergrad-textbook mastery; can work exercises, follow derivations |
| Master | Lang Algebra, Hartshorne, Weinberg, Bott-Tu | Graduate research readiness; can open follow-on research papers |
Scope v1 (math + physics)
- Fast Track spine (Sections 1–3) + pre-calc ramp for pre-FT entry
- ~80 canonical texts' worth of material, reorganized into ~1000–2000 atomic units
- Each atomic unit covers one concept/theorem/technique across all three tiers
Explicitly out of scope for v1
- Chemistry, biology, CS (defer to v2+)
- K-8 arithmetic (require basic-algebra entry level; build separate "Codex Foundations" later if needed)
- Credentialing, grading, cohort-style features
3. Unit Architecture — Progressive Disclosure
Each unit is one source document with tier markers. UI filters on tier; the content is single-source-of-truth.
Required unit structure
---
id: 01.01.03
title: Vector space
prerequisites: [01.01.01, 01.01.02]
references:
- quantum-well: "Mathematical foundations/Algebra/Linear Algebra and Matrix Theory/Vector space.md"
- tong: "raw/pdfs/vc/vc.pdf#ch1"
- shilov: "Shilov-LinearAlgebra.pdf#ch1"
lean_module: Codex.LinearAlgebra.VectorSpace
---
## Intuition [Beginner]
Plain-language "what is this and why does it matter" — one or two paragraphs,
heavy on analogy. No formal notation beyond what's already in prior units.
## Visual [Beginner]
Diagram, animation, or 3Blue1Brown-style geometric picture.
## Worked example [Beginner]
A concrete, fully-worked example using everyday quantities.
## Formal definition [Intermediate+]
Axioms, notation, counterexamples. Minimum viable rigor.
## Key theorem with proof [Intermediate+]
The one theorem a reader of Axler would remember. Full proof.
## Exercises [Intermediate+]
3–8 problems. Difficulty ladder.
## Lean formalization [Intermediate+]
Statement + proof in Lean 4 / Mathlib; link to compiled module.
## Advanced results [Master]
Harder theorems, unusual perspectives, connections to deep theory.
## Full proof set [Master]
Formal proofs of advanced results; additional Lean artifacts.
## Historical & philosophical context [Master]
Where did this idea come from? Which mathematicians mattered? What's the
modern generalization?
## Bibliography [Master]
References to primary literature beyond the Codex reference archive.
Tier filter rules
- Beginner view: shows only
[Beginner]sections - Intermediate view: shows
[Beginner]+[Intermediate+] - Master view: shows everything
Cross-reference system
Every [concept-X] inline reference becomes a clickable link to unit X (which opens at the reader's current tier). Cross-references span the whole graph — linking from a QFT unit back to the vector-space unit is routine.
4. Agent Orchestration
Agents are roles, not people. Each role has explicit input/output contracts so any agent swarm can execute.
Roles
| Role | Input | Output | Tools |
|---|---|---|---|
| Scanner | Unit spec (id, concept, prerequisites) | Ranked list of relevant passages from reference/ archive |
RAG retrieval over archive |
| Producer | Unit spec + top-N passages from Scanner | Draft unit (all tiers, all sections) | Large-context LLM |
| Mathematical reviewer | Draft unit | Pass/fail + flagged errors | Large-context LLM; Lean compiler |
| Pedagogical reviewer | Draft unit | Pass/fail + flagged issues against rubric | LLM rated against rubric |
| Integrator | Approved unit + neighbor units | Final unit with cross-references resolved, dependency-graph-updated | LLM + graph tooling |
| Copy editor | Final unit | Polished unit | LLM for prose quality |
Handoff contracts
All inter-role handoffs use a canonical manifest (JSON):
{
"unit_id": "01.01.03",
"phase": "producer_output" | "review_pending" | "integrated",
"status": "pass" | "revise" | "fail",
"flags": [ { "section": "...", "issue": "...", "severity": "..." } ],
"artifacts": { "md_path": "...", "lean_path": "..." }
}
Parallelization
Units with non-overlapping prerequisites can be produced concurrently. The dependency graph drives scheduling. Initial pilot is serial (10 units, prove the pipeline); production scales via topological-sort batching.
5. Quality System
Per-tier rubrics (Machine-checkable where possible)
Beginner rubric (sample):
- No undefined formal notation
- At least one visual or diagram reference
- At least one worked example with concrete numbers
- Reading level ≤ grade 10 (automated check)
- No proof language (no "∴", "QED", "Proof:")
- Paragraph length ≤ 120 words
Intermediate rubric (sample):
- Formal definition present and correct
- At least one theorem with complete proof
- All proofs either Lean-verified or reviewed by mathematical reviewer
- Exercise set includes easy/medium/hard
- References Shilov/Apostol/Axler-level source
Master rubric (sample):
- Coverage of advanced results comparable to the unit's Master-tier anchor text
- All Master-tier proofs in Lean where Mathlib coverage exists; flagged if not
- Historical/context section has primary-literature citations
- Cross-references to downstream graduate topics present
Correctness verification
- Lean 4 / Mathlib: any theorem at Intermediate+ that can be formalized must be formalized. Lean compile = ground truth.
- Human mathematical review: domain expert reviews anything Lean can't cover. Budget per unit: ~30 minutes expert time.
- Automated reference check: every
[reference: ...]cite must resolve to an existing file in the archive.
Consistency checks
- Notation consistency across units (automated: shared notation glossary)
- Prerequisite-chain integrity (every prerequisite listed must be a published unit)
- Tier-section separation (no
[Master]content leaking into[Beginner])
6. Pipeline Stages (revised 2026-04-23)
Phase 0 — Reference archive (DONE)
1 GB archive across 11 sources. reference/_meta/SOURCES.md, TOPIC_INDEX.md in place.
Phase 1 — Scaffolding (CURRENT)
OVERVIEW.md✓BRIEF.md✓docs/specs/UNIT_SPEC.md✓docs/plans/PROJECT_PLAN.md(this document) ✓docs/catalogs/DEPENDENCY_MAP.md— pendingdocs/plans/PILOT_PLAN.md— pendingdocs/plans/REVIEWER_PLAN.md✓docs/catalogs/CONCEPT_CATALOG.md✓ (seed entries only for pilot subjects)
Note on revised ordering: docs/specs/QUALITY_RUBRIC.md is not a Phase 1 deliverable. It is distilled after pilot unit #1 produces real failure modes to capture. See Phase 2b below.
Phase 1.5 — RAG layer (NEW)
Build embeddings + vector store + retrieval API over reference/ so the Scanner agent role can actually function. Without this, agent production cannot begin. TOPIC_INDEX.md keyword-matching is too coarse for per-unit reference selection.
Deliverables:
- Embedding pipeline for all
reference/**/*.mdand PDF text layers - Vector store (probably local: LanceDB / Chroma / pgvector)
- Retrieval API that Scanner calls with a concept and returns top-N passages with provenance
Phase 2a — Pilot unit #1 (NEW, manual)
Produce one pilot unit end-to-end manually (no agents). Use the scaffold exactly as-is. The purpose is to surface real failure modes of the spec.
Candidate unit: named in docs/plans/PILOT_PLAN.md when that file is written. Likely an apex unit (e.g., Clifford algebra, Master tier only).
Phase 2b — Distill docs/specs/QUALITY_RUBRIC.md
After unit #1 exists, catalogue what went wrong or required judgment. Those become the rubric's checklist items. Per-tier rubrics written now, grounded in real output.
Phase 2c — 9 more pilot units
Produce 9 more (mix of manual + agent-assisted using the rubric). By end of pilot:
- All 10 units shipped with review manifests
- Spec gaps identified and either accepted or filed as open issues
- First real measurements on production time and review time
Pilot success criteria (revised):
- For units that include Master tier: mathematical reviewer rates ≥9 of 10 as "publication-quality"
- For units with
lean_status: full: Lean proofs compile - For units with
lean_status: none:lean_mathlib_gapandhuman_reviewerboth populated; human review attested - All cross-refs resolve
- All reference citations resolve
- For units that include Beginner tier: a naive reader completes the Beginner section in ≤ 20 min with correct retention check
- Production time per unit ≤ X hours (X recorded, not targeted — we're measuring)
Phase 3 — Iterate
Revise specs based on Phase 2 findings. Expect non-trivial changes to UNIT_SPEC and QUALITY_RUBRIC.
Phase 4 — Scale
Parallel agent swarms. Topological-sort dependency graph. Target ~1500 units over 12–18 months.
7. Data Layer
Directory structure
codex/
├── README.md
├── OVERVIEW.md
├── BRIEF.md
├── docs/
│ ├── pilot-lessons.md
│ ├── plans/ # PROJECT_PLAN, PILOT_PLAN, WAVE_*, V05_*_PLAN, SITE_PLAN, REVIEWER_PLAN, FASTTRACK_EQUIVALENCE_PLAN, CURRICULUM_V0_5_PLAN
│ ├── specs/ # UNIT_SPEC, QUALITY_RUBRIC, ORCHESTRATION_PROTOCOL, CONTINUITY_SCAFFOLD, FASTTRACK_FLOW_SCAFFOLD
│ ├── catalogs/ # CONCEPT_CATALOG, DEPENDENCY_MAP, MATHLIB_GAPS, FASTTRACK_BOOKLIST, NEED_TO_SOURCE
│ └── batches/ # GPT batch scaffolds
├── reference/ # scanned external material (Phase 0)
├── content/ # produced units (Phase 2+)
│ ├── 00-precalc/
│ ├── 01-foundations/
│ ├── 02-analysis/
│ ├── 03-modern-geometry/
│ ├── 04-algebraic-geometry/
│ ├── 05-symplectic/
│ ├── 06-riemann-surfaces/
│ ├── 07-representation-theory/
│ └── 08-stat-mech/
├── lean/ # Lean 4 project — Codex.* modules
├── scripts/ # orchestration tooling
├── plans/fasttrack/ # per-book Fast Track equivalence plans
├── manifests/ # per-unit status JSON; dependency graph; campaign + connections
└── site/ # Astro companion site
Dependency graph format
manifests/deps.json: JSON adjacency list. Every edge is a declared prerequisite. Topological sort feeds the scheduler.
Reference index
reference/_meta/TOPIC_INDEX.md: topic → archive files mapping (already built; regenerated on archive changes).
8. Success Metrics
Pilot (10 units):
- Beginner-tier unit readable by naive human in ≤ 20 min with full comprehension
- Master-tier unit rated "publication quality" by domain expert
- Lean proofs compile 100%
- Cross-references resolve 100%
v1 (500–800 units, math + physics through mid-Section-2):
- 1 independent learner completes Beginner path for all 500 units
- 1 undergrad-background learner completes Intermediate path
- Master path reviewed by 3+ domain experts per section
Long-term:
- Peer-reviewable content quality (some units publishable standalone as pedagogy papers)
- Learner outcome data (testable via embedded assessments in pilot)
9. Risks & Mitigations (revised 2026-04-23)
| Risk | Mitigation |
|---|---|
| AI producers drift without spec adherence | Tight rubric + automated checks; reject-and-regenerate loop |
| Mathematical errors compound across dependent units | Lean where Mathlib covers; named human reviewer where not; integration phase explicitly checks prereq chain |
| Pilot succeeds but doesn't generalize | Pilot deliberately covers three very different concepts to stress the spec |
| Scope creep into other sciences | Hard gate: math + physics only in v1; written into OVERVIEW invariants |
| Burnout on specs, never produce | Pilot unit #1 is the hard stop — produce it before writing docs/specs/QUALITY_RUBRIC.md |
| Over-engineering agent orchestration | Pilot unit #1 is manual; agent orchestration only after rubric distilled |
| Reviewer bandwidth (new, critical) | docs/plans/REVIEWER_PLAN.md; LLM-augmented review with human spot-check; recruit 1–3 collaborators before Master scaling |
| Lean coverage collapses at FT top (new) | Accept lean_status: none with lean_mathlib_gap + named human_reviewer; feed gaps as Mathlib contribution roadmap |
| DAG partial-order freedom (new) | docs/catalogs/CONCEPT_CATALOG.md as canonical concept source; two producers cannot declare different prereqs for the same concept |
| v0.x audience ≠ advertised audience (new) | Own it in product communications: apex-first pilot = graduate reference; Beginner/Intermediate served at v1+ when prereq chains fill |
| No RAG = no Scanner = no agent production (new) | Phase 1.5 explicitly builds embeddings + retrieval before any agent work begins |
10. Immediate next actions (revised 2026-04-23)
OVERVIEW.md✓BRIEF.md✓docs/specs/UNIT_SPEC.md✓ (revised withtiers_present,concept_catalog_id,lean_mathlib_gap,human_reviewer)docs/plans/REVIEWER_PLAN.md✓docs/catalogs/CONCEPT_CATALOG.md✓ (seed only)- Write
docs/catalogs/DEPENDENCY_MAP.md— seed apex units, pulled-prereq DAG - Write
docs/plans/PILOT_PLAN.md— 10 apex units (Master-only) - Build RAG layer (Phase 1.5)
- Produce pilot unit #1 manually — stress-test spec
- Distill
docs/specs/QUALITY_RUBRIC.mdfrom unit #1's failure modes - Produce 9 more pilot units
- Only then invoke parallel agent orchestration.