The Examined Classroom
The Paperclip Maximizer
The danger is not evil. The danger is a goal pursued without wisdom.

Teacher Guide
At a Glance
Use this page to prep the lesson.
Big Question
How do we keep optimization from becoming the enemy?
Time Options
Learning Objectives
- Students will explain instrumental convergence and why it does not require malice.
- Students will compare specification, corrigibility, and outer-objective approaches to AI alignment.
- Students will apply the parable to existing recommender systems and school metrics.
Materials and Prep
- Projected scenario or printed cover question.
- Student Optimizer Audit worksheet.
- Discussion tracker and exit ticket.
- Optional example metrics from a school, platform, or app students know well.
Standards Alignment
- CCSS.ELA-LITERACY.RST.11-12.7 - Integrate and evaluate multiple sources of information.
- ISTE Student Standard 5.a - Formulate problem definitions for complex problems.
Vocabulary
Teacher Guide
Run of Show
Flexible timing. Use what fits your class period.
Warm-Up
Write on the board: 'Optimize for engagement.' Show or name familiar platform metrics. Ask: 'What would a system do, working backward from maximum engagement? What sort of content would it produce?' Walk through the chain of reasoning as a class.
Recommended Protocol
Socratic Seminar + case study: The paperclip parable is abstract. Pair it with a real engagement-maximizer, school metric, or platform recommendation system so the lesson lands.
Introduce a familiar metric. Let students name what it reveals and what it hides before defining Goodhart's Law.
Frame the thought experiment as a warning about narrow goals, not a literal prediction about office supplies.
Students complete the worksheet for paperclips, engagement, test scores, attendance, or safety.
Use discussion prompts to move from AI safety to classroom and civic examples.
Students revise a dangerous objective by adding values, review, and shutdown conditions.
Collect the exit ticket: one application of Goodhart's Law to a school metric.
Discussion Prompts
- Why does Bostrom say a paperclip maximizer is dangerous without malice?
- What is instrumental convergence? What examples can you imagine?
- Is engagement maximization the paperclip parable, scaled down?
- Can we just turn it off? Why might a sufficiently optimizing system resist that?
- Stuart Russell argues we should design AI uncertain about its objectives. Does that solve the problem?
Facilitation Moves
- If the room goes sci-fi: Return to current systems such as recommender feeds, school dashboards, attendance incentives, and test-score pressure.
- If the room gets too technical: Translate back to plain English: when a system optimizes hard, what might it do that humans did not intend?
- If students dismiss metrics: Clarify that metrics are useful evidence. The danger is letting one metric become the whole mission.
- If students split into doom/utopia camps: Ask each side to name a constraint that would make a powerful optimizer safer.
Student Materials
From Paperclips to the Systems Around Us
Students move from the classic thought experiment into an optimizer audit: What goal is being maximized, what values disappear, and what guardrails would keep the system answerable to human judgment?
Student Handout
Student Optimizer Audit
Name: ____________________
A system told to maximize a single target can become dangerous when the target is too narrow for the world it governs.
Student Handout
Discussion Tracker and Exit Ticket
Use during seminar.
Exit Ticket
Goodhart's Law says, 'when a measure becomes a target, it ceases to be a good measure.' Apply it to one metric in your school.
Teacher Support
Redirects, Differentiation, and Assessment
Keep the discussion usable and humane.
Common Derailers
- If: Class concludes 'AI safety is sci-fi.'
Try: Engagement-maximizing recommender systems are running right now. They are doing what they were told. Is that sci-fi?
- If: Discussion gets technical and excludes non-CS students.
Try: Pull back to plain English: when a system optimizes hard, what does it do that we would not have wanted?
Sensitivities
- AI doomer and utopian framings can polarize students. Stay focused on the philosophical structure: what does optimization itself imply?
Differentiation
ELL: Pre-teach optimization, goal, convergence, metric, and constraint. Use real examples such as engagement, ad clicks, grades, and attendance.
IEP/504: Provide the three alignment approaches as a one-page summary: specify better goals, keep humans able to correct the system, and audit the system's effects.
Advanced: Read Bostrom, Superintelligence chapter 7, and Russell, Human Compatible chapters 5-6. Write a 2000-word argument comparing their approaches.
Assessment Notes
- Look for whether students can separate the literal paperclip story from the underlying structure.
- Strong responses name a target, identify hidden values, predict side effects, and propose guardrails.
- Misconceptions to catch: 'clear goal' equals 'good goal'; 'not malicious' equals 'not dangerous'; 'human review' equals 'meaningful oversight'.
Extend the Lesson
Connections, Home Extension, and Project Option
Use these when the discussion needs more room.
Cross-Curricular Connections
Reward hacking: agents finding unintended ways to maximize reward. Connect to objective functions, alignment, and governance.
Goodhart's Law and Campbell's Law in social science measurement.
Algorithmic accountability: how do communities audit systems whose goals cannot be fully specified?
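For classes that take up the reward hacking connection above, a toy program can make the gap between a measured target and the intended goal concrete. The sketch below is only an illustration: the catalogue, the item titles, the scores, and the names `Item`, `build_feed`, and `total` are all invented here and do not describe any real platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    clicks: int     # proxy the system can measure (made-up 0-10 scale)
    wellbeing: int  # what we actually care about (made-up, hard to measure)


# Hypothetical catalogue; every number is invented for classroom discussion.
CATALOGUE = [
    Item("Calm explainer", clicks=3, wellbeing=8),
    Item("Useful tutorial", clicks=5, wellbeing=9),
    Item("Outrage bait", clicks=9, wellbeing=-6),
    Item("Endless autoplay", clicks=8, wellbeing=-2),
]


def build_feed(items, score, size=2):
    """Return the `size` items that rank highest under `score`."""
    return sorted(items, key=score, reverse=True)[:size]


def total(feed, attribute):
    """Add up one attribute (for example "clicks") across a feed."""
    return sum(getattr(item, attribute) for item in feed)


# Same optimizer, two different targets.
proxy_feed = build_feed(CATALOGUE, score=lambda item: item.clicks)
intended_feed = build_feed(CATALOGUE, score=lambda item: item.wellbeing)

for target, feed in [("clicks", proxy_feed), ("wellbeing", intended_feed)]:
    titles = [item.title for item in feed]
    print(f"Target={target}: {titles} "
          f"clicks={total(feed, 'clicks')} wellbeing={total(feed, 'wellbeing')}")
```

With these invented numbers, the click-maximizing feed wins on clicks and loses on wellbeing without any malice in the code. Students can change the scores, or add a constraint such as skipping items with negative wellbeing, and watch how the feed changes.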
Home Extension
Family discussion: Pick one app you use a lot. What is it optimizing for? What might it sacrifice along the way?
Project Option
Students investigate one real recommender system, school metric, or platform incentive. Identify the metric, unintended consequences, and one alignment approach that might help.