Codex — Unit Specification
The canonical format for a Codex learning unit. Every produced unit must conform. Used by content producers (human or AI) and by reviewers (rubric keys directly off this spec).
Read BRIEF.md first for audience, tier, and mastery definitions.
1. Core concept
A unit is the smallest shippable piece of curriculum: one concept, one theorem cluster, or one technique. A unit contains content at all three tiers in a single markdown source file. UI tier filters determine what each learner sees.
- Granularity target: ~20 to 50 minutes of Master-tier reading time (including exercises). Too small = unit fragmentation; too big = unit gets unwieldy.
- Atomicity rule: one unit has one primary concept. "Vector space" is one unit. "Vector space, linear map, basis" is three units that link to each other.
2. File location & naming
content/<section>/<chapter>/<unit-id>-<slug>.md
# Example
content/01-fundamentals/01-linear-algebra/01.01.03-vector-space.md
<unit-id>is dotted triple:section.chapter.ordinal(e.g.,01.01.03).<slug>is kebab-case, derived from title.- Lean artifact lives in
lean/Codex/<Section>/<Chapter>/<Slug>.leanwith the same hierarchy.
3. Required file structure
---
id: 01.01.03
title: Vector space
slug: vector-space
section: fundamentals
chapter: linear-algebra
concept_catalog_id: linalg.vector-space # from docs/catalogs/CONCEPT_CATALOG.md
prerequisites:
- 00.02.05 # function
- 01.01.01 # field
- 01.01.02 # scalar
successors:
- 01.01.04 # subspace
- 01.01.05 # linear combination
tier_anchors:
beginner: "3Blue1Brown — 'Essence of Linear Algebra' Ch. 1"
intermediate: "Axler — Linear Algebra Done Right §1.A"
master: "Lang — Algebra Ch. III §1"
tiers_present: [beginner, intermediate, master] # subset actually filled in
references:
- source: quantum-well
path: "Mathematical foundations/Algebra/Linear Algebra and Matrix Theory/Vector space.md"
- source: shilov
path: "Shilov-LinearAlgebra.pdf"
locator: "Ch. 1, §1.1"
- source: tong
path: "raw/pdfs/vc/vc.pdf"
locator: "Ch. 1"
lean_module: Codex.LinearAlgebra.VectorSpace
lean_status: full # full | partial | none
lean_mathlib_gap: # required iff lean_status: none
human_reviewer: # required iff lean_status: none at Intermediate+
estimated_time:
beginner: 15m
intermediate: 30m
master: 60m
status: draft # draft | review | approved | shipped
produced_by: agent-producer-v1
reviewed_by: []
---
## Intuition [Beginner]
...
## Visual [Beginner]
...
## Worked example [Beginner]
...
## Check your understanding [Beginner]
...
## Formal definition [Intermediate+]
...
## Key theorem with proof [Intermediate+]
...
## Exercises [Intermediate+]
...
## Lean formalization [Intermediate+]
...
## Advanced results [Master]
...
## Full proof set [Master]
...
## Connections [Master]
...
## Historical & philosophical context [Master]
...
## Bibliography [Master]
...
4. Frontmatter fields — spec
| Field | Required | Notes |
|---|---|---|
id |
yes | dotted triple, globally unique |
title |
yes | human-readable, title-case |
slug |
yes | kebab-case, derives filename |
section, chapter |
yes | match directory structure |
prerequisites |
yes | list of unit IDs; every listed ID must exist as a published unit at time of this unit shipping. Integrator enforces. Apex / top-down production exception: if pending_prereqs: true is set, listed IDs may reference not-yet-produced units; they go into the next-wave production queue and the integrator records the dependency edge in manifests/deps.json with state pending. Do not use pending_prereqs: true outside top-down production waves. |
pending_prereqs |
no (default false) |
when true, prereq IDs may point at not-yet-produced units. Used during apex-first production. |
successors |
no | populated by Integrator; producers leave empty |
tier_anchors |
yes | name the specific anchor source per tier. Accepted values per tier: a string naming the anchor (e.g. "Lawson-Michelsohn §I.1") or the literal deferred (without quotes is a YAML tag; quoted "deferred" works) when that tier is not in tiers_present. |
references |
yes | ≥1 entry. By default each path must resolve in reference/ at build time. An entry may instead set pending: true with pointer: linking to docs/catalogs/NEED_TO_SOURCE.md (e.g. pointer: "docs/catalogs/NEED_TO_SOURCE.md#75") to declare a citation that will resolve once the source is acquired. Pending citations count toward ≥1 but a unit may not have only pending citations — at least one resolved citation is required. |
lean_module |
yes if Mathlib covers | canonical Lean module path |
lean_status |
yes | see §7 |
lean_mathlib_gap |
yes if lean_status: none |
short note on why Mathlib doesn't cover (names the gap; feeds upstream-contribution roadmap) |
human_reviewer |
yes if lean_status: none at Intermediate+ |
name of the domain expert who signed off on correctness |
tiers_present |
yes | subset of [beginner, intermediate, master] actually written in this unit; rubric only runs on present tiers |
concept_catalog_id |
yes | unique id from docs/catalogs/CONCEPT_CATALOG.md canonical list; prevents two producers declaring the same concept differently |
estimated_time |
yes | guides learner pacing; Beginner 10–20m, Intermediate 20–45m, Master 45–90m typical |
status |
yes | drives the pipeline (see §12) |
5. Sections — detailed spec
Every section heading carries a tier marker in square brackets. The marker is load-bearing: the UI filter keys off it.
## Intuition [Beginner]
- Length: 1–3 short paragraphs (≤ 120 words per paragraph).
- Goal: convey why this concept exists and what it captures before any formal definition.
- Rules:
- No undefined formal notation.
- No proofs, no QED, no "Let X be...".
- One concrete analogy from everyday experience minimum.
- Must answer: "Why do we bother with this?"
## Visual [Beginner]
- Content: one embedded image, animation, or interactive widget; or a reference to a 3Blue1Brown video (acceptable as a stopgap for pilot units; must be replaced with original content before v1.0).
- Alt text required for accessibility and RAG retrieval.
## Worked example [Beginner]
- Length: fully worked, with specific numbers (not variables).
- Goal: a concrete instance the reader can compute along with.
- Rules:
- All steps shown, nothing "obvious".
- End with "what this tells us" — the takeaway.
## Check your understanding [Beginner]
- 2–4 questions.
- Answer types allowed at this tier: multiple-choice, true/false, one-number numeric.
- Each question declares
answer,hint,feedback-correct,feedback-wrong(see §6 interactivity schema).
## Formal definition [Intermediate+]
- The standard definition as it would appear in the Intermediate anchor text.
- Notation introduced here must be listed in the notation glossary (auto-linked).
- Counter-examples or non-examples included where helpful.
## Key theorem with proof [Intermediate+]
- One signature theorem for the unit.
- Full proof, written to the standard of the Intermediate anchor text.
- Proof uses Lean syntax-compatible notation where possible (so Lean formalization below mirrors 1:1).
## Exercises [Intermediate+]
- 5–10 problems, difficulty ladder: 2 easy, 3–5 medium, 2–3 hard.
- Answer types allowed: numeric, symbolic, multiple-choice, short-answer-with-rubric.
- Each problem tagged with difficulty, topic, and solution rubric (see §6).
## Lean formalization [Intermediate+]
- Inline Lean 4 code block showing the statement and (where Mathlib covers) the proof.
- Full formalization lives in the
lean_modulepath; this section is the "readable" view.
## Advanced results [Master]
- Additional theorems, harder problems, non-obvious perspectives.
- No upper limit on length, but coherence required — this is not a dumping ground.
## Full proof set [Master]
- Formal proofs of everything in "Advanced results" that can be proved.
- Preferred: Lean-verified. Otherwise human-reviewed, flagged.
## Connections [Master]
- How this concept shows up later in the curriculum.
- Cross-references to downstream graduate-level units.
## Historical & philosophical context [Master]
- Where did this idea come from? Who mattered? What's the modern generalization?
- Not required to be long. Required to be accurate. Cite primary sources in bibliography.
## Bibliography [Master]
- Primary-literature references beyond the Codex reference archive.
- BibTeX or similar structured format preferred.
6. Interactivity — exercise schema
Exercises are structured, not freeform. They ship as static-renderable markdown today, and as interactive widgets once the platform supports it. Same source, both render paths.
v0.1 production format — semantic HTML. Neutron's marked engine doesn't process Pandoc-style fenced divs (:::), so exercises use <aside class="exercise"> HTML blocks with structured data-* attributes. Marked passes raw HTML through unchanged; the renderer styles them via CSS; future interactivity can read the data-* attributes.
<aside class="exercise" data-difficulty="medium" data-type="numeric">
<p class="exercise__label"><strong>Exercise (medium, numeric).</strong></p>
<p>Given the vector $v = (3, 4)$ in $\mathbb{R}^2$, compute $\|v\|$.</p>
<details class="exercise__hint"><summary>Hint</summary>
Remember the definition of the Euclidean norm.
</details>
<details class="exercise__answer"><summary>Answer</summary>
<p><strong>$5$.</strong> $\sqrt{9 + 16} = 5$.</p>
</details>
</aside>
Future format (once Neutron's marked extension pipeline is wired): ::: Pandoc-style fenced divs with YAML-like fields, parsed by a remark-directive plugin into the same HTML structure. Both the canonical UNIT_SPEC and the producer agents should target the v0.1 HTML format until that plugin is in place.
Supported type values
type |
Answer shape | Auto-checkable |
|---|---|---|
multiple-choice |
one of N options | yes |
true-false |
boolean | yes |
numeric |
number + tolerance | yes |
symbolic |
SymPy-parseable expression | yes (server-side) |
short-answer |
text matched against regex OR rubric | partial — regex auto, rubric human/LLM |
lean-proof |
Lean 4 tactic script | yes (Lean compiler) |
open-ended |
prose answer | human/LLM review only |
Required fields per exercise (v0.1 HTML mapping)
| Logical field | HTML representation |
|---|---|
type |
data-type="…" on outer <aside> |
difficulty |
data-difficulty="easy" | "medium" | "hard" on outer <aside> |
prompt |
first <p> after .exercise__label |
options (multiple-choice only) |
<ol class="exercise__options" type="A"> |
hint |
<details class="exercise__hint"><summary>Hint</summary>…</details> |
answer |
<details class="exercise__answer"><summary>Answer</summary>…</details> |
feedback-correct, feedback-wrong |
merged into the answer details paragraph; will be split when interactive renderer arrives |
7. Lean integration
lean_status values
full: theorem and all proofs formalized in Mathlib-compatible Lean 4. Compiles clean. Thelean_modulepath must exist and be wired into the Codex Lean build.partial: the theorem statements must compile (no missing imports, no syntax errors), but proofs may besorry-gated or partial. Thelean_modulepath must exist. The namedhuman_reviewerverifies the proof gap is actually believed-true.none: Mathlib doesn't cover this material (common at frontier: spin geometry, QFT, advanced algebraic geometry). Unit ships without alean_modulepath;lean_mathlib_gaptext describes the specific Mathlib coverage missing;human_revieweris named as the correctness gate. The aggregatedlean_status: noneunits form the upstream Mathlib contribution roadmap.
lean_module path verification
- If
lean_statusisfullorpartial, the path declared inlean_module:must correspond to an existing file inlean/and that file must successfullylake buildas part of the Codex Lean project. - If
lean_status: none, thelean_module:field must be absent or empty; declaring a path is a build error.
Formalization rules
- Unit must declare
lean_moduleiflean_status != none. - Lean module must build as part of the Codex Lean project. CI enforces.
- Lean module is not a replacement for the Master-tier proof prose — it is a verification companion.
- Tactic proofs preferred over term-mode where readability differs.
When Mathlib doesn't cover
lean_status: none is an accepted state, not a failure. At the top of Fast Track, expect this to dominate. Requirements:
lean_mathlib_gapfield in frontmatter names the specific gap (e.g., "Mathlib lacksCliffordAlgebra.spinStructure").human_reviewerfield names the domain expert who verified correctness.- Aggregated gaps feed a contribution roadmap upstream to Mathlib — every
lean_status: noneis a formalization target.
This reflects the honest state of Mathlib coverage as of 2026: extensive for undergraduate core, thin-to-absent for graduate physics and advanced geometry. Lean is a correctness accelerator where it applies, not a universal quality gate.
8. Reference citations
Every ## Key theorem, ## Formal definition, and ## Advanced results section must cite at least one source from the reference/ archive.
Citation syntax (inline):
The formal definition follows Shilov [ref: shilov Ch. 1 §1.1].
Where shilov maps to an entry in frontmatter references with source: shilov. Renderer resolves to a click-through.
Citation must resolve to a real file in reference/. Automated check at build time: any unresolved [ref: …] fails the build.
9. Cross-references to other units
Inline:
A [vector space][01.01.03] is a set with two operations...
Renderer resolves [01.01.03] to the unit's page at the learner's current tier. Integrator agent validates all cross-refs at integration time.
Broken cross-refs fail the build.
10. Notation glossary
All formal notation introduced in Intermediate+ sections must appear in the project-wide notation glossary at _meta/NOTATION.md. Auto-linkable: first use of a symbol in a unit links to its glossary entry.
Producer adds new entries when introducing notation not already present. Integrator agent checks for unregistered symbols and flags.
11. Style rules
Voice
- Second-person ("you") for Beginner intuition sections; third-person impersonal for Intermediate+ formal content.
- Emulate the rigor of the anchor text, not the literal prose. Do not copy sentences. Do not paraphrase closely. Rewrite.
- Master-tier prose standard: Lawson-Michelsohn, Hörmander, Kato — declarative, structurally honest, sparing with motivation, generous with the precise statement of universal properties, confident in categorical and functorial language without ostentation. Six prose failure modes are explicitly disallowed at Master tier and checked by the rubric (
docs/specs/QUALITY_RUBRIC.md§M):- No section-level metacommentary (don't narrate what a section is doing — enact structure through theorem ordering).
- No "Counterexamples to common slips" bullet lists at Master tier — fold into running prose. (Encouraged at Intermediate as a
### Counterexamples to common slipssubsection; prohibited at Master.) - No hedged futures or apologies for omission ("we will see…", "we do not state…").
- No valedictory closing paragraphs in Historical & philosophical context — cite, attribute, stop.
- No real-time error correction in worked examples — compute correctly the first time.
- No triplicate emphasis or rhetorical italics — italics are for definitions and proper names.
Notation
- Consistent across units. Glossary (§10) is authoritative.
- LaTeX for all math:
$...$inline,$$...$$display. - No Unicode-only math (except in prose where appropriate).
Length ceilings
- Beginner intuition paragraph: ≤ 120 words.
- Any single paragraph anywhere: ≤ 250 words.
- Proof: no limit — but must be continuous, not bulleted.
Prohibited phrasings (case-insensitive)
Two classes — see docs/specs/QUALITY_RUBRIC.md §0 for the full list and tier-specific application:
- Slop (any tier): "obviously", "clearly", "it is easy to see", "as we will see", "now, let's consider", "trivial(ly)".
- Hedged futures and LM-failure-mode phrases (Master tier only): "we will see", "we will rely on", "we do not state", "a subtlety worth noting", "the right organising principle", "the structural fact that organises", "the philosophical point is", "the deeper philosophical content".
Beginner and Intermediate tiers may use the second-class phrasings pedagogically (e.g., "we will see" as a forward pointer in Beginner intuition is fine).
Other prohibited
- Unattributed analogies from other sources.
- Forward references to un-shipped units (use
pending_prereqs: trueand apendingcross-ref instead).
12. Lifecycle & status field
The status frontmatter field drives the pipeline. Producer writes draft. Reviewer changes to review. On pass, Integrator changes to approved. Publishing pipeline promotes to shipped.
| status | Meaning |
|---|---|
draft |
Producer output; not yet reviewed |
review |
In reviewer queue |
approved |
Passed all reviews; ready to ship |
shipped |
Live; changes require re-review |
13. Example: minimal pilot-grade unit
A complete, tier-complete unit for 01.01.03 — Vector Space will be written during the pilot as the worked example of this spec. It lives at content/01-fundamentals/01-linear-algebra/01.01.03-vector-space.md once produced.
Before writing that unit, every item in this spec must have an unambiguous interpretation. If producing the pilot unit reveals a spec gap, that's a signal to revise docs/specs/UNIT_SPEC.md first — not to patch in the unit.