Codex — Reviewer Plan
The bottleneck for scale is not content production — it is expert review. ~30 min expert time × ~1500 units = ~750 hours = ~4.5 person-months of full-time domain-expert review to reach v1. No single founder can cover this across all subjects at Master tier. This document specifies who reviews, how, and with what incentives.
1. Reviewer roles & fluency bar
| Role | Fluency required | Units per hour (approx) |
|---|---|---|
| Mathematical reviewer — Beginner | Anyone who can read Axler fluently | 4–6 |
| Mathematical reviewer — Intermediate | Senior undergrad / grad student in the topic | 2–3 |
| Mathematical reviewer — Master | PhD holder or active researcher in the topic area | 1–2 |
| Pedagogical reviewer (all tiers) | Anyone with teaching experience + tier-anchor familiarity | 3–4 |
| Integrator | Contributor fluent with the curriculum's structure | 5+ |
| Copy editor | Skilled technical writer | 10+ |
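For rough capacity planning, the throughput table converts directly into reviewer-hours. A minimal sketch, using range midpoints and an invented tier mix (so the totals differ from the flat ~30 min/unit estimate above):

```python
# Rough reviewer-hour estimate from the throughput table above.
# Throughputs are range midpoints; the tier mix is an invented example,
# not a real target, so totals differ from the flat ~30 min/unit estimate.
THROUGHPUT = {        # units reviewed per hour
    "beginner": 5.0,
    "intermediate": 2.5,
    "master": 1.5,
}
UNIT_MIX = {          # hypothetical split of a ~1500-unit corpus
    "beginner": 600,
    "intermediate": 600,
    "master": 300,
}

math_hours = sum(UNIT_MIX[tier] / THROUGHPUT[tier] for tier in UNIT_MIX)
ped_hours = sum(UNIT_MIX.values()) / 3.5    # pedagogical reviewer: 3-4 units/hr

print(f"mathematical review: ~{math_hours:.0f} h")   # ~560 h
print(f"pedagogical review:  ~{ped_hours:.0f} h")    # ~429 h
print(f"total: ~{(math_hours + ped_hours) / 160:.1f} person-months")  # ~6.2
```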
2. Bootstrap strategy (v0.x pilot)
Small team, mostly Tyler + close collaborators. Goal: ship 10 reviewed pilot units.
- Tyler: Master-tier review for apex subjects he's fluent in (E8, spin geometry, QFT foundations). Estimated coverage: 40–60% of pilot units.
- 1–3 academic collaborators (TBD: recruit from physics/math departments; offer attribution + a modest honorarium if budget allows): fill Master-tier gaps Tyler can't self-cover.
- LLM-augmented first pass: before human review, an AI reviewer runs against the rubric and flags likely issues. The human reviews the flags, not the full unit, cutting human time per unit by roughly 3× (a minimal sketch follows this list).
- Pedagogical review: a teacher or tutor outside the research community — cheaper to recruit than subject-matter experts.
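The LLM first pass could be as thin as a rubric prompt plus strict output parsing. A minimal sketch, where `call_llm`, the prompt wording, and the flag schema are all placeholders, not a fixed spec:

```python
# Minimal sketch of the LLM first-pass mathematical reviewer.
# `call_llm`, the prompt wording, and the flag schema are placeholders.
import json

RUBRIC_PROMPT = """You are a mathematical reviewer. Check the unit below \
against the rubric. Return a JSON list of flags, each shaped like \
{"location": ..., "severity": "error" | "warn", "note": ...}. \
Return [] if nothing looks suspect."""

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up the model API of choice")

def first_pass(unit_text: str) -> list[dict]:
    """Run the rubric prompt and parse the model's flag list."""
    raw = call_llm(f"{RUBRIC_PROMPT}\n\n---\n\n{unit_text}")
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Unparseable output is itself a flag: fail loud, never fail open.
        return [{"location": "whole-unit", "severity": "error",
                 "note": "LLM reviewer returned unparseable output"}]
```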
If the pilot requires expert review Tyler can't provide and no collaborator exists: pause that unit, do not ship it unreviewed. Master-tier content with no review is not ready.
3. Scale strategy (v0.5+)
When production exceeds ~20 units/week, human review alone breaks. Options to consider in order of preference:
a. Crowdsourced academic review with attribution
Units ship with a "reviewed by" line. Recruit through grad-student networks, r/math, and the arXiv community. Incentive = attribution + public-good contribution.
b. LLM-augmented review with human spot-check
Two LLM reviewers (different prompts) independently review → human adjudicates disagreements. Human time per unit drops to ~5 min. Quality risk: correlated LLM failures. Spot-check 10% of "both-LLMs-pass" units manually as a regression gate.
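A sketch of the routing logic for option (b). The function shape and destination names are assumptions; only the 10% spot-check rate and the adjudicate-on-disagreement rule come from the paragraph above:

```python
# Sketch of option (b): two independently prompted LLM reviews, human
# adjudication on any disagreement or flag, and a 10% spot-check of
# clean double-passes as a regression gate. Dict shapes mirror the
# review-manifest entries in section 4.
import random

SPOT_CHECK_RATE = 0.10

def route_unit(review_a: dict, review_b: dict) -> str:
    """Decide where a unit goes after the two LLM passes."""
    if review_a["pass"] != review_b["pass"]:
        return "human-adjudication"        # reviewers disagree -> human decides
    if review_a["flags"] or review_b["flags"]:
        return "human-review-of-flags"     # any flag still needs a human eye
    if random.random() < SPOT_CHECK_RATE:
        return "human-spot-check"          # regression signal on clean passes
    return "approved-llm-only"
```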
c. Paid expert review
~1500 units × ~$50/unit (avg) = ~$75k. Budget decision.
d. Open peer review
Units ship as "preview" with a review-invitation banner. Qualified readers can submit corrections via a lightweight PR flow. Corrections tracked in unit changelog.
Recommended mix for v0.5 onward: (a) + (b), with (c) reserved for unusually hard Master-tier units and (d) as a long-tail correction mechanism on shipped units.
4. Review workflow per unit
```
producer draft
  ↓
LLM mathematical-reviewer (pass 1: flags likely errors)
  ↓
human mathematical-reviewer (pass 2: reviews flags + spot-checks unflagged sections)
  ↓
LLM pedagogical-reviewer (against rubric; flags tier-mismatch)
  ↓
human pedagogical-reviewer (pass)
  ↓
integrator (cross-refs, dependency-graph update, notation glossary)
  ↓
copy editor (prose)
  ↓
shipped
```
Each step writes to the unit's review manifest (`manifests/units/<id>.json`):
```json
{
  "unit_id": "01.01.03",
  "reviews": [
    {
      "role": "mathematical-reviewer-llm",
      "pass": true,
      "flags": [],
      "reviewer": "claude-opus-4-7",
      "timestamp": "..."
    },
    {
      "role": "mathematical-reviewer-human",
      "pass": true,
      "flags": [],
      "reviewer": "Dr. X",
      "timestamp": "..."
    }
  ],
  "status": "approved"
}
```
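Ship-gating can then be a pure function of the manifest. A minimal sketch, assuming the role strings implied by the section-4 workflow (the exact names are not yet fixed):

```python
# Sketch: gate shipping on the review manifest. The required-role strings
# mirror the section-4 workflow but are assumptions, not a fixed spec.
import json
from pathlib import Path

REQUIRED_ROLES = {
    "mathematical-reviewer-llm", "mathematical-reviewer-human",
    "pedagogical-reviewer-llm", "pedagogical-reviewer-human",
    "integrator", "copy-editor",
}

def is_shippable(manifest_path: Path) -> bool:
    """A unit ships only if every required role has a passing review."""
    manifest = json.loads(manifest_path.read_text())
    passed = {r["role"] for r in manifest["reviews"] if r["pass"]}
    return manifest["status"] == "approved" and REQUIRED_ROLES <= passed
```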
5. Escalation for no-reviewer-available
When a unit needs expert review but no expert is available:
- Unit stays in `review` state. Do not ship `draft` content.
- A gap entry is added to `manifests/review-gaps.json` with the subject area + tier + expertise needed (illustrative example below).
- Production of units downstream of this one (in the DAG) continues; they will cite this unit as a pending prereq if referenced.
- When the expert is found, the unit gets reviewed retroactively.
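The gap-entry schema is not fixed yet; an illustrative `manifests/review-gaps.json` entry, with every field name and value an assumption, might look like:

```json
{
  "unit_id": "03.02.07",
  "tier": "master",
  "subject_area": "spin geometry",
  "expertise_needed": "Dirac operators / index theory",
  "blocked_since": "...",
  "downstream_units": ["03.02.08", "03.03.01"]
}
```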
Do not ship unreviewed Master-tier content just to hit a production target. Quality debt at Master-tier destroys trust.
6. Known risks
- Correlated LLM review errors: two different prompts may still share failure modes. Mitigated by the 10% human spot-check as a regression signal + a diverse model roster (a mix of Claude, GPT, and Gemini).
- Expert review inconsistency: different reviewers apply the rubric differently. Mitigated by explicit checklist rubric (not holistic judgment) + calibration pass (all reviewers first review the same unit, discrepancies discussed).
- Bus factor: Tyler as sole Master-tier reviewer for certain subjects. Mitigated by recruiting ≥1 co-reviewer per subject area before scaling that subject.
7. v0.x pilot concrete plan
| Unit | Likely reviewer | Backup |
|---|---|---|
| (apex 1–10, TBD in PILOT_PLAN) | Tyler for topics he's fluent in | Recruit 2 collaborators before pilot completion |
Tyler: list the 10 pilot apex units here with a self-confidence rating (Green = can review, Yellow = can review with effort, Red = need outside expert) once `docs/plans/PILOT_PLAN.md` names them. Recruitment for a Red unit starts before that unit enters review.
This is a living document. Update as reviewer roster grows and as the review process reveals failure modes.