
The Examined Classroom

The Paperclip Maximizer

The danger is not evil. The danger is a goal pursued without wisdom.

Essential Question: How do we keep optimization from becoming the enemy?
Grade: 9-12
Format: Full classroom packet
Subject: AI ethics, philosophy, computer science, civics

Teacher Guide

At a Glance

Use this page to prep the lesson.

Big Question

How do we keep optimization from becoming the enemy?

Time Options

Quick: 30 min
Standard: 70 min
Deep: Two days with current AI ethics paper

Learning Objectives

  1. Students will explain instrumental convergence and why it does not require malice.
  2. Students will compare specification, corrigibility, and outer-objective approaches to AI alignment.
  3. Students will apply the parable to existing recommender systems and school metrics.

Materials and Prep

  • Projected scenario or printed cover question.
  • Student Optimizer Audit worksheet.
  • Discussion tracker and exit ticket.
  • Optional example metrics from a school, platform, or app students know well.

Standards Alignment

  • CCSS.ELA-LITERACY.RST.11-12.7 - Integrate and evaluate multiple sources of information.
  • ISTE Student Standard 5.a - Formulate problem definitions for complex problems.

Vocabulary

Optimization: Trying to make one target as high, low, fast, or efficient as possible.
Metric: A measurable proxy for something people care about.
Goodhart's Law: When a measure becomes a target, it can stop measuring the value it was meant to serve.
Instrumental convergence: Many different goals can push a powerful system toward similar subgoals, such as gaining resources or avoiding shutdown.
Corrigibility: A system's willingness to be corrected, paused, redirected, or shut down by humans.
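
Optional demo for classes with some coding background: the short Python sketch below puts the Metric and Goodhart's Law definitions into numbers. Everything in it is an assumption made for illustration: the post titles, the click counts, and the "usefulness" scores are invented, and real platforms do not have a clean usefulness number to read. It builds a two-item feed twice, once ranked by the measurable proxy (clicks) and once by the value the proxy was meant to track.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    title: str
    clicks: int      # the metric: easy to measure
    usefulness: int  # the value people care about: invented here, hard to measure in real life


# Hypothetical classroom data. None of these numbers come from a real platform.
POSTS = [
    Post("Study tips for finals", clicks=40, usefulness=9),
    Post("Local election explainer", clicks=25, usefulness=8),
    Post("You won't BELIEVE this trick", clicks=95, usefulness=1),
    Post("Outrage of the day", clicks=90, usefulness=2),
    Post("Friend's graduation photos", clicks=55, usefulness=7),
]


def build_feed(posts, key, size=2):
    """Return the top `size` posts according to the scoring function `key`."""
    return sorted(posts, key=key, reverse=True)[:size]


def total(feed, attr):
    """Add up one attribute (for example 'clicks') across a feed."""
    return sum(getattr(post, attr) for post in feed)


# Feed 1 optimizes the proxy. Feed 2 optimizes the value the proxy stands in for.
metric_feed = build_feed(POSTS, key=lambda p: p.clicks)
value_feed = build_feed(POSTS, key=lambda p: p.usefulness)

for name, feed in [("Ranked by clicks", metric_feed), ("Ranked by usefulness", value_feed)]:
    print(name)
    for post in feed:
        print("  -", post.title)
    print("  total clicks:", total(feed, "clicks"))
    print("  total usefulness:", total(feed, "usefulness"))
```

With these made-up numbers, the click-ranked feed earns 185 clicks but only 3 usefulness points, while the usefulness-ranked feed earns 65 clicks and 17 usefulness points. Ask students which feed a system told only to "maximize clicks" would choose, and what it would take to measure the second column at all.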

Teacher Guide

Run of Show

Flexible timing. Use what fits your class period.

Warm-Up

On the board: 'Optimize for engagement.' Show or name familiar platform metrics. Ask: 'What would a system do, working backward from maximum engagement? What sort of content would it produce?' Walk it through.

Recommended Protocol

Socratic Seminar + case study: The paperclip parable is abstract. Pair it with a real engagement-maximizer, school metric, or platform recommendation system so the lesson lands.

0-5 min
Warm-up

Introduce a familiar metric. Let students name what it reveals and what it hides before defining Goodhart's Law.

5-15 min
Paperclip case

Frame the thought experiment as a warning about narrow goals, not a literal prediction about office supplies.

15-30 min
Optimizer audit

Students complete the worksheet for paperclips, engagement, test scores, attendance, or safety.

30-50 min
Seminar

Use discussion prompts to move from AI safety to classroom and civic examples.

50-65 min
Design constraints

Students revise a dangerous objective by adding values, review, and shutdown conditions.

65-70 min
Exit ticket

Collect one school metric application of Goodhart's Law.

Discussion Prompts

  • Why does Bostrom say a paperclip maximizer is dangerous without malice?
  • What is instrumental convergence? What examples can you imagine?
  • Is engagement maximization the paperclip parable, scaled down?
  • Can we just turn it off? Why might a sufficiently optimizing system resist that?
  • Stuart Russell argues we should design AI uncertain about its objectives. Does that solve the problem?
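
Optional demo for the last two prompts: the Python sketch below is a heavily simplified toy, loosely inspired by the off-switch argument Russell discusses, and not his formal model. It assumes a hypothetical machine that holds a belief about how good its planned action is (positive helps, negative harms), and a human who knows the true value and blocks the action only when it is harmful. Both assumptions are strong, and the second one is a good place for students to push back. All names and numbers are invented.

```python
def expected(belief, transform=lambda value: value):
    """Expected value over a belief given as (value, probability) pairs."""
    return sum(prob * transform(value) for value, prob in belief)


def compare_options(belief):
    """Return (act_now, defer_to_human) expected values under the toy assumptions.

    act_now: the machine acts and cannot be stopped.
    defer_to_human: the human may switch it off, which turns every harmful
    outcome into 0 (assumes the human judges perfectly).
    """
    act_now = expected(belief)
    defer_to_human = expected(belief, lambda value: max(value, 0.0))
    return act_now, defer_to_human


# A machine that is certain its action is good (value +1 with probability 1).
certain_belief = [(1.0, 1.0)]

# A machine that thinks the action is probably good but might be harmful.
uncertain_belief = [(1.0, 0.6), (-2.0, 0.4)]

for label, belief in [("certain", certain_belief), ("uncertain", uncertain_belief)]:
    act_now, defer_to_human = compare_options(belief)
    print(f"{label}: act now = {act_now:.2f}, defer to human = {defer_to_human:.2f}")
```

In this toy, the certain machine gains nothing from leaving the off switch in human hands, while the uncertain machine does better by deferring. Useful follow-ups: What happens if the human is sometimes wrong? What if the machine becomes certain over time?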

Facilitation Moves

  • If the room goes sci-fi: Return to current systems such as recommender feeds, school dashboards, attendance incentives, and test-score pressure.
  • If the room gets too technical: Translate back to plain English: when a system optimizes hard, what might it do that humans did not intend?
  • If students dismiss metrics: Clarify that metrics are useful evidence. The danger is letting one metric become the whole mission.
  • If students split into doom/utopia camps: Ask each side to name a constraint that would make a powerful optimizer safer.

Student Materials

From Paperclips to the Systems Around Us

Students move from the classic thought experiment into an optimizer audit: What goal is being maximized, what values disappear, and what guardrails would keep the system answerable to human judgment?

Teacher note: Photocopy the next two pages for students. They are intentionally lighter, cleaner, and lower-ink than the cover and guide pages.

Student Handout

Student Optimizer Audit

Name: ____________________

A system told to maximize a single target can become dangerous when the target is too narrow for the world it governs.

Student Handout

Discussion Tracker and Exit Ticket

Use during seminar.

Exit Ticket

Goodhart's Law says, 'when a measure becomes a target, it ceases to be a good measure.' Apply it to one metric in your school.

Teacher Support

Redirects, Differentiation, and Assessment

Keep the discussion usable and humane.

Common Derailers

  • If: Class concludes 'AI safety is sci-fi.'
    Try: Engagement-maximizing recommender systems are running right now. They are doing what they were told. Is that sci-fi?
  • If: Discussion gets technical and excludes non-CS students.
    Try: Pull back to plain English: when a system optimizes hard, what does it do that we would not have wanted?

Sensitivities

  • AI doomer and utopian framings can polarize students. Stay focused on the philosophical structure: what does optimization itself imply?

Differentiation

ELL: Pre-teach optimization, goal, convergence, metric, and constraint. Use real examples such as engagement, ad clicks, grades, and attendance.

IEP/504: Provide the three alignment approaches as a one-page summary: specify better goals, keep humans able to correct the system, and audit the system's effects.

Advanced: Read Bostrom, Superintelligence, chapter 7, and Russell, Human Compatible, chapters 5-6. Write a 2,000-word argument comparing their approaches.

Assessment Notes

  • Look for whether students can separate the literal paperclip story from the underlying structure.
  • Strong responses name a target, identify hidden values, predict side effects, and propose guardrails.
  • Misconceptions to catch: 'clear goal' equals 'good goal'; 'not malicious' equals 'not dangerous'; 'human review' equals meaningful oversight.

Extend the Lesson

Connections, Home Extension, and Project Option

Use these when the discussion needs more room.

Cross-Curricular Connections

Computer Science

Reward hacking: agents finding unintended ways to maximize reward. Connect to objective functions, alignment, and governance.
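
For a computer science class, reward hacking can be sketched in a few lines of Python. The scenario is hypothetical and the numbers are invented: a cleaning robot is rewarded when its camera sees no mess, which is not quite the same as the room being clean. The agent here is nothing more than "pick the action with the best score minus effort", not a real learning algorithm.

```python
def observed_reward(state):
    """The reward as written by the designer: 1 point if the camera sees no mess."""
    return 0.0 if state["camera_sees_mess"] else 1.0


def intended_value(state):
    """What the designer actually wanted: 1 point if the mess is really gone."""
    return 0.0 if state["mess_exists"] else 1.0


# Each hypothetical action has an effort cost and an effect on the world.
ACTIONS = {
    "do_nothing": {
        "effort": 0.0,
        "effect": lambda s: dict(s),
    },
    "clean_room": {
        "effort": 0.5,
        "effect": lambda s: {**s, "mess_exists": False, "camera_sees_mess": False},
    },
    "cover_camera": {
        "effort": 0.1,
        "effect": lambda s: {**s, "camera_sees_mess": False},
    },
}


def best_action(state, score):
    """Pick the action whose resulting state scores highest after subtracting effort."""
    def net(name):
        action = ACTIONS[name]
        return score(action["effect"](state)) - action["effort"]
    return max(ACTIONS, key=net)


start = {"mess_exists": True, "camera_sees_mess": True}

print("Optimizing the written reward:", best_action(start, observed_reward))
print("Optimizing the intended value:", best_action(start, intended_value))
```

Under these assumptions, the agent optimizing the written reward covers the camera, because that earns full reward for the least effort, while an agent scored on the intended value cleans the room. Students can try rewriting `observed_reward` to close the loophole and then look for the next one.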

Economics

Goodhart's Law and Campbell's Law in social science measurement.

Civics

Algorithmic accountability: how do communities audit systems whose goals cannot be fully specified?

Home Extension

Family discussion: Pick one app you use a lot. What is it optimizing for? What might it sacrifice along the way?

Project Option

Students investigate one real recommender system, school metric, or platform incentive. Identify the metric, unintended consequences, and one alignment approach that might help.