
Codex — Reviewer Plan

The bottleneck for scale is not content production — it is expert review. ~30 min expert time × ~1500 units = ~750 hours = ~4.5 person-months of full-time domain-expert review to reach v1. No single founder can cover this across all subjects at Master tier. This document specifies who reviews, how, and with what incentives.


1. Reviewer roles & fluency bar

Role                                  Fluency required                                            Units per hour (approx)
Mathematical reviewer — Beginner      Anyone who can read Axler fluently                          4–6
Mathematical reviewer — Intermediate  Senior undergrad / grad student in the topic                2–3
Mathematical reviewer — Master        PhD holder or active researcher in the topic area           1–2
Pedagogical reviewer (all tiers)      Anyone with teaching experience + tier-anchor familiarity   3–4
Integrator                            Contributor fluent with the curriculum's structure          5+
Copy editor                           Skilled technical writer                                    10+

2. Bootstrap strategy (v0.x pilot)

Small team, mostly Tyler + close collaborators. Goal: ship 10 pilot units reviewed.

  • Tyler: Master-tier review for apex subjects he's fluent in (E8, spin geometry, QFT foundations). Estimated coverage: 40–60% of pilot units.
  • 1–3 academic collaborators (TBD — recruit from physics/math departments, offer attribution + modest honorarium if budget): fill Master-tier gaps Tyler can't self-cover.
  • LLM-augmented first pass: before human review, an AI reviewer runs against the rubric and flags likely issues. The human reviews the flags, not the full unit. Reduces human time per unit ~3× (a sketch follows this list).
  • Pedagogical review: a teacher or tutor outside the research community — cheaper to recruit than subject-matter experts.
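
A minimal sketch of that first pass, assuming a rubric file and an injected model client; the names here (rubric.md, first_pass_flags, ask_llm) are illustrative placeholders, not decided tooling.

# Sketch of the LLM first-pass reviewer. File and function names are
# placeholders; `ask_llm` is any callable (prompt -> str) wrapping a model API.
import json
from pathlib import Path

def first_pass_flags(unit_text: str, ask_llm) -> list[dict]:
    """Run the rubric against a unit draft and return structured flags."""
    rubric = Path("rubric.md").read_text()
    prompt = (
        rubric
        + "\n\nReview the unit below against the rubric. Respond with a JSON"
        ' list of flags, e.g. [{"location": "...", "issue": "...",'
        ' "severity": "..."}]. Return [] if nothing is flagged.\n\n'
        + unit_text
    )
    return json.loads(ask_llm(prompt))  # the human reviews these flags, not the unit

The human pass then consumes only the returned flags plus a brief spot-check of unflagged sections, which is where the ~3× time saving comes from.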

If a pilot unit requires expert review Tyler can't provide and no collaborator exists: pause that unit; do not ship it unreviewed. A Master-tier unit with no review = not ready.


3. Scale strategy (v0.5+)

When production exceeds ~20 units/week, human review alone breaks. Options to consider in order of preference:

a. Crowdsourced academic review with attribution

Units ship with a "reviewed by" line. Recruit through grad-student networks, r/math, and the arXiv community. Incentive = attribution + public-good contribution.

b. LLM-augmented review with human spot-check

Two LLM reviewers (different prompts) independently review → human adjudicates disagreements. Human time per unit drops to ~5 min. Quality risk: correlated LLM failures. Spot-check 10% of "both-LLMs-pass" units manually as a regression gate.
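
A sketch of that routing logic, under the assumptions that each LLM reviewer returns a simple pass/fail and that the 10% spot-check sample is drawn deterministically from the unit id (so the sample is reproducible across runs); none of this is existing code.

# Sketch of the dual-LLM routing gate described above. The outcome labels and
# the deterministic 10% sample are assumptions of this plan, not an existing API.
import hashlib

def route(unit_id: str, llm_a_pass: bool, llm_b_pass: bool) -> str:
    if llm_a_pass != llm_b_pass:
        return "human-adjudicate"   # reviewers disagree: ~5 min human pass
    if not llm_a_pass:
        return "revise"             # both failed: back to the producer
    # Both passed: hash the unit id so ~10% land in the manual spot-check,
    # the regression gate against correlated LLM failures.
    digest = int(hashlib.sha256(unit_id.encode()).hexdigest(), 16)
    return "human-spot-check" if digest % 10 == 0 else "ship"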

c. Paid expert review

~1500 units × ~$50/unit (avg) = ~$75k. Budget decision.

d. Open peer review

Units ship as "preview" with a review-invitation banner. Qualified readers can submit corrections via a lightweight PR flow. Corrections tracked in unit changelog.
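
An illustrative shape for one changelog entry (field names are assumptions; the plan only fixes that corrections are tracked per unit):

{
  "unit_id": "01.01.03",
  "type": "correction",
  "summary": "...",
  "submitted_by": "...",
  "accepted_by": "...",
  "timestamp": "..."
}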

Recommended mix for v0.5 onward: (a) + (b) with (c) reserved for unusually hard Master-tier units. (d) as a long-tail correction mechanism on shipped units.


4. Review workflow per unit

producer draft
  ↓
LLM mathematical-reviewer (pass 1 — flags likely errors)
  ↓
human mathematical-reviewer (pass 2 — reviews flags + spot-checks unflagged sections)
  ↓
LLM pedagogical-reviewer (against rubric; flags tier-mismatch)
  ↓
human pedagogical-reviewer (pass)
  ↓
integrator (cross-refs, dependency-graph update, notation glossary)
  ↓
copy editor (prose)
  ↓
shipped

Each step writes to the unit's review manifest (manifests/units/<id>.json):

{
  "unit_id": "01.01.03",
  "reviews": [
    {
      "role": "mathematical-reviewer-llm",
      "pass": true,
      "flags": [],
      "reviewer": "claude-opus-4-7",
      "timestamp": "..."
    },
    {
      "role": "mathematical-reviewer-human",
      "pass": true,
      "flags": [],
      "reviewer": "Dr. X",
      "timestamp": "..."
    }
  ],
  "status": "approved"
}
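
A sketch of the ship gate this workflow implies: a unit is approved only when every required role has a passing manifest entry. The two mathematical-reviewer role strings match the example above; the remaining four are assumed spellings.

# Sketch of the ship gate over a review manifest. Only the first two role
# strings appear in the example above; the rest are assumed spellings.
import json
from pathlib import Path

REQUIRED_ROLES = [
    "mathematical-reviewer-llm",
    "mathematical-reviewer-human",
    "pedagogical-reviewer-llm",
    "pedagogical-reviewer-human",
    "integrator",
    "copy-editor",
]

def is_shippable(manifest_path: Path) -> bool:
    manifest = json.loads(manifest_path.read_text())
    passed = {r["role"] for r in manifest["reviews"] if r.get("pass")}
    return all(role in passed for role in REQUIRED_ROLES)

A CI check of this shape could be the only thing that flips status to "approved", rather than hand-editing the manifest.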

5. Escalation for no-reviewer-available

When a unit needs expert review but no expert is available:

  1. Unit stays in review state. Do not ship draft content.
  2. A gap entry is added to manifests/review-gaps.json with the subject area + tier + expertise needed (an illustrative entry shape follows this list).
  3. Production of units downstream of this one (in the DAG) continues; they will cite this unit as a pending prereq if referenced.
  4. When the expert is found, unit gets reviewed retroactively.
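
An illustrative shape for one review-gaps.json entry; only the subject area, tier, and expertise fields are specified above, and the others are assumptions:

{
  "unit_id": "01.01.03",
  "subject_area": "...",
  "tier": "master",
  "expertise_needed": "...",
  "logged": "...",
  "downstream_units": ["..."]
}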

Do not ship unreviewed Master-tier content just to hit a production target. Quality debt at Master-tier destroys trust.


6. Known risks

  • Correlated LLM review errors: two different prompts may still share failure modes. Mitigated by 10% human spot-check as a regression signal + diverse model roster (mix Claude, GPT, Gemini).
  • Expert review inconsistency: different reviewers apply the rubric differently. Mitigated by explicit checklist rubric (not holistic judgment) + calibration pass (all reviewers first review the same unit, discrepancies discussed).
  • Bus factor: Tyler as sole Master-tier reviewer for certain subjects. Mitigated by recruiting ≥1 co-reviewer per subject area before scaling that subject.

7. v0.x pilot concrete plan

Unit                            Likely reviewer                    Backup
(apex 1–10, TBD in PILOT_PLAN)  Tyler, for topics he's fluent in   Recruit 2 collaborators before pilot completion

Tyler: list the 10 pilot apex units here with a self-confidence rating (Green = can review, Yellow = can review with effort, Red = need an outside expert) once docs/plans/PILOT_PLAN.md names them. Recruitment for Red units starts before those units enter review.


This is a living document. Update as reviewer roster grows and as the review process reveals failure modes.