Ethics · Frameworks · Policy

From Ambiguity to Action

"Uphold ethics" is not a policy. It's a placeholder where a policy should be. This piece walks through the normative frameworks educators actually need — utilitarianism, deontology, virtue ethics — and the thought experiments that translate them into rules a sixteen-year-old can quote back at you.


In school leadership documents, "ethics" tends to show up in the same paragraph as "rigor," "excellence," and "the highest standards." The words feel weighty. They are also, almost always, doing no work. A staff that has agreed to "uphold the highest ethical standards" has not yet agreed on a single concrete behavior. The agreement is the easy part. The disagreement starts the moment two reasonable teachers reach different conclusions about whether AI-generated feedback is acceptable in a fifth-grade writing class.

The point of this piece is not to deliver a finished ethical theory. Philosophers have been at that project for 2,400 years without consensus, and a school leadership team is not going to settle the matter on a Tuesday afternoon. The point is to give educators a workable habit: name the value, choose the framework that tests it, build the thought experiment that exposes its edge cases, write the guideline that follows, and then watch what happens in the classroom — because the classroom is where the policy either holds or fails.

The reason this matters now is that AI in education has stopped being a future problem. By early 2025, 92% of UK undergraduates reported using AI tools in their studies (HEPI 2025), up from 66% in 2024. The Digital Education Council's global survey found 86% of students worldwide doing the same. By late 2025, at least 33 U.S. states had issued formal AI guidance. The EU AI Act entered into force in August 2024, with several educational uses of AI, such as admissions and the evaluation of learning outcomes, classified as "high-risk." Schools that haven't done the slow work of getting from value to practice are no longer ahead of the wave. They are inside it.

Why 'Uphold Ethics' Fails

Two teachers in the same building can hold opposite positions, sincerely, under the heading of "uphold ethics." One refuses to give AI-generated feedback because she believes it strips the human relationship out of evaluation. Another routinely uses AI to draft initial comments because she believes withholding faster, more consistent feedback fails her students. Both are appealing to ethics. The slogan settles nothing.

That's not a flaw in either teacher. It's a flaw in the policy that handed them both the same one-line standard and expected the disagreement to resolve itself. The real disagreement isn't about whether to be ethical. It's about which values to prioritize when honest values point in different directions — and how to test those priorities against the cases that will actually walk through the classroom door.

"A school that has agreed to be ethical has not yet agreed on anything."

— The article's working premise

Until a policy can address specific cases by name, it is not yet a policy. Below are the kinds of questions a real AI policy in a real school has to answer in language that a parent, a substitute teacher, and a high school junior can all interpret the same way.

  • If a teacher uses an AI-generated voice clone to send personalized weekly check-ins to every student, is that an act of care or an act of deception? Does the answer change if the students are told? Does it change if the parents are told and the students aren't?
  • If an AI tutor measurably outperforms human one-on-one tutoring for math fluency, and the school can afford only one of the two, what does "putting students first" actually require?
  • If a student uses an AI avatar that looks and sounds like them to attend a class they would otherwise miss, asks questions they would ask, and learns the material — what's been compromised, and by whom?
  • If the school's behavior-monitoring AI flags a student for a pattern of disengagement that no human noticed, who owns the next conversation: the AI, the counselor, or the teacher whose intuition was overridden?
  • If AI-generated feedback is more accurate, more consistent, and more timely than human feedback, is a teacher who insists on writing every comment herself acting on principle or on pride?

A school that can answer all five of these in writing has a policy. A school that gestures at "ethics" and "professional judgment" does not. It has an aspiration that will quietly hand the answers to whichever individual teacher happens to be in the room.

The Three Frameworks Educators Actually Need

Visualization

Three Lenses on the Same Decision

  • Utility (consequences): Utilitarianism asks "did learning improve?"
  • Duty (principles): Deontology asks "is it honest?"
  • Virtue (character): Virtue Ethics asks "who am I becoming?"

All three lenses point at the same thing: the action in front of you.
The frameworks don't compete for which is correct — they ask different questions about the same action. Most workable AI policies braid all three.

Utilitarianism, in its classical form (Bentham, then refined by Mill), holds that the right action is the one that produces the greatest overall good — usually measured as wellbeing, happiness, or welfare. Applied to AI in education, the utilitarian question is empirical: did students learn more? Did teachers have more time for the work only they can do? Did the intervention raise outcomes for the students who needed it most?

The strength of this framework is that it forces accountability to outcomes. "We adopted this tool because we believed in it" is not enough. The utilitarian asks: did it actually work, for whom, and at what cost? A school that takes this seriously runs the data. A school that doesn't is using the framework as decoration.

The limits show up the moment outcomes start trading against each other. If an AI tutor raises average test scores by 12 points but disengages the lowest-performing third of students, the utilitarian math has to pick a population. If a behavior-monitoring system catches more incidents but corrodes student trust, the framework needs a way to weigh the two effects against each other. Outcomes are not all in the same currency.

Source: Stanford Encyclopedia of Philosophy, "John Stuart Mill"
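The aggregation problem described above can be made concrete with a little arithmetic. The sketch below is illustrative only: the nine students, their baseline scores, and their score changes are invented numbers, and the names `students` and `mean_change_by_tercile` are hypothetical. It shows how a healthy overall average can sit on top of a loss for the lowest-performing third.

```python
from statistics import mean

# Hypothetical data, invented for illustration:
# (baseline score, change in score after the AI tutor was introduced)
students = [
    (41, -3), (45, -2), (48, -4),    # lowest-performing third
    (62, 15), (66, 16), (70, 17),    # middle third
    (84, 22), (88, 23), (91, 24),    # highest-performing third
]

def mean_change_by_tercile(records):
    """Sort students by baseline score and average the change within each third."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = len(ordered) // 3
    groups = {
        "bottom": ordered[:cut],
        "middle": ordered[cut:2 * cut],
        "top": ordered[2 * cut:],
    }
    return {name: mean([change for _, change in group])
            for name, group in groups.items()}

# The single number a summary report would show.
overall_change = mean([change for _, change in students])

# The same data, broken out so the worst-off group stays visible.
by_tercile = mean_change_by_tercile(students)

print("Overall mean change:", overall_change)
for name, value in by_tercile.items():
    print(f"  {name} third: {value}")
```

With these invented numbers the overall mean change is +12 points while the bottom third averages a 3-point loss. The utilitarian question "did it work?" has a different answer depending on which of those two numbers the school chooses to report.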

Deontology (Kant being the classical source) holds that some duties hold regardless of consequences. The categorical imperative — act only on principles you could will to be universal law, and treat every person as an end and never merely as a means — gives a fundamentally different test. The question is not "did it work?" but "is the action itself defensible if everyone did it?"

For AI in education, deontology surfaces duties the utilitarian frame can paper over: the duty to be honest with students about what they are interacting with; the duty to respect a student's autonomy in deciding how to do their own intellectual work; the duty not to treat a child as a data point in a system optimization problem. A teacher who uses an AI voice clone to send personalized check-ins without disclosure is producing a good outcome (better student-teacher connection) by means most deontologists would consider impermissible.

Kant's famous case of the murderer at the door is the standard objection: should you lie to save your friend? Kant held that you may not, even then, and most reasonable people reject that conclusion. Pure deontology can produce verdicts like this one. The framework is a corrective, not a sole guide.

Source: Stanford Encyclopedia of Philosophy, "Kant's Moral Philosophy"

Aristotle's virtue ethics shifts the focus from the action to the person performing it. The question is not "did the act produce good?" or "did it follow the rule?" but "what does it cultivate in the agent over time?" Virtues — practical wisdom, courage, justice, temperance, honesty — are developed by habituation. We become what we repeatedly do.

For students, the virtue lens asks: does this use of AI cultivate intellectual independence, or quietly erode it? Does it teach the patience required to sit with difficulty, or does it remove the difficulty before the patience can form? Coelho and colleagues, writing in the British Educational Research Journal in 2025, argued that AI-assisted student work can produce the appearance of intellectual development without the actual cultivation of it — what they called a "placebo effect" paired with a "nocebo effect" in which students retreat from the harder work of becoming autonomous thinkers.

For teachers, the lens asks: does the way I'm using this tool cultivate the kind of educator I want to be? A teacher who uses AI to generate feedback faster and reinvests the saved time in conferencing one-on-one with students is exercising a different virtue than one who uses the same tool to disengage from grading altogether. The action is identical. The character formation is not.

Coelho et al. (2025), "The placebo and nocebo effects of generative AI on subjectification": the authors argued that AI-assisted student work produces an appearance of independent intellectual development without the underlying habit formation, and that students who experience this loop subsequently retreat from the slower work of becoming autonomous thinkers. Uniform AI rules will not address this; only assignment design and assessment practice will.

Coelho et al., British Educational Research Journal, 2025

Source: Stanford Encyclopedia of Philosophy, "Aristotle's Ethics"


The frameworks aren't competing for which is correct. They are asking different questions about the same action, and the same action will look different through each. The most workable AI policies in actual schools — NYC's traffic-light framework being the most detailed U.S. example as of March 2026 — implicitly use all three. The "red" prohibitions read as deontological. The "yellow" categories with active educator judgment read as virtue ethics. The "green" approvals based on demonstrated benefit read as utilitarian.

What each framework catches

  • Utility: outcomes most likely to be missed by intuition.
  • Duty: dignity violations most likely to be rationalized away.
  • Virtue: long-run character costs invisible in short-run scoring.

What each framework misses

  • Utility: aggregation that hides the worst-off student.
  • Duty: edge cases where the rule produces obvious harm.
  • Virtue: a way to settle disagreement between two virtuous people.

A policy that uses only one of these will reliably mishandle the cases the other two are built to catch. The discipline isn't picking a framework. It's keeping all three in the room and noticing when they pull in different directions.

Source: NYC Public Schools, "Guidance on Artificial Intelligence" (March 2026)

Thought Experiments as Policy Tools

Philippa Foot introduced the Trolley Problem in 1967 and Judith Jarvis Thomson sharpened it through the 1970s and 1980s. The setup: a runaway trolley is about to kill five people on the track ahead. You can pull a lever that diverts it to a side track, where it will kill one person instead. Most people, asked, pull the lever — sacrificing one to save five looks like the right call.

The Footbridge variant changes the means. You're now on a bridge over the tracks. The only way to stop the trolley is to push a large man off the bridge in front of it. He dies, the five live. Same math. Most people refuse. The intuition is that the means matter — that there is a moral difference between redirecting harm and using a person as the instrument of stopping it.

For school leaders, this is not abstract. It is the test for any AI tool that improves average outcomes by accepting harm to a specific student. A behavior-monitoring system that improves overall safety by occasionally flagging an innocent child as "at risk" is, in moral structure, the Footbridge case. A teacher who agrees with the utilitarian math but balks at deploying the system is doing what the Footbridge intuition predicts: the means matter, even when the outcome math runs the other way.

Source: Philippa Foot, "The Problem of Abortion and the Doctrine of the Double Effect" (1967)

Source: Judith Jarvis Thomson, "The Trolley Problem" (1985)

You do not need a moral philosophy degree to design a thought experiment that serves a real policy decision. The structure is simple: take the value you say you hold, push it into a case where it would cost you something, and see if you still hold it. The cases that produce the most useful disagreement are the ones nobody wants to answer.

  1. Name the value. Pick one. "We value honesty." "We value student autonomy." "We value teacher judgment." One sentence.

  2. Build a case where the value costs something. What does honesty about AI use cost when the student would have gotten a higher grade without disclosure? What does student autonomy cost when the autonomous choice is to disengage entirely?

  3. Force the answer in writing. Ambiguity dies in the writing. A team that can say "in this case, we would do X and accept the cost Y" is doing real work. A team that says "it depends on the situation" is still upstream of the work.

  4. Test against the cases you've actually seen. Pull real cases from the last semester. Does your answer still hold? If not, the value statement was hiding the real principle. Find the real one.

The schools that handle AI well are not the ones with the longest policy documents. They are the ones whose leadership team has done this exercise enough times that the next case doesn't ambush them.

From Value to Practice

Visualization

From Aspiration to Practice

  1. Value: what we say we care about

  2. Framework: the ethical lens we apply

  3. Thought experiment: the test that surfaces what the value really demands

  4. Guideline: the rule it produces

  5. Practice: what actually happens in the classroom
Each layer narrows the previous one. Skip a layer and you get policy that reads well but breaks on contact with a real student.

Every working AI policy in education narrows through the same five layers. The slogan-only policies stop at layer one. The legalistic policies skip to layer four. Both versions fail in their own ways. The ones that hold do the slow work in the middle.

  1. Value. What does the school actually care about? Not what looks good on the website. What survives the case where it costs something?

  2. Framework. Which ethical lens makes this value testable? Utility for outcome-driven values, deontology for dignity-driven ones, virtue for character-driven ones.

  3. Thought experiment. What's the case that reveals what the value really demands? The one nobody wants to answer is the one worth designing.

  4. Guideline. The specific, sixteen-year-old-readable rule the experiment produces. "AI use is permitted with disclosure on take-home essays" is a guideline. "Uphold ethics" is not.

  5. Practice. What actually happens when a teacher gets the case. The classroom is where the policy either holds or fails. No amount of writing fixes a practice the staff can't execute.
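For teams that keep their policy drafts in a shared tracker, the five layers can be treated as a simple checklist record. The sketch below is one possible way to do that, not a prescribed tool: the `PolicyDraft` class and the `missing_layers` helper are hypothetical names, and modeling a "legalistic" policy as a guideline with nothing behind it is an assumed reading of "skip to layer four."

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class PolicyDraft:
    """One policy item, with a slot for each of the five layers (hypothetical structure)."""
    value: Optional[str] = None
    framework: Optional[str] = None
    thought_experiment: Optional[str] = None
    guideline: Optional[str] = None
    practice: Optional[str] = None

def missing_layers(draft: PolicyDraft) -> List[str]:
    """Return the names of the layers that are still empty, in layer order."""
    return [f.name for f in fields(draft) if not getattr(draft, f.name)]

# A slogan-only policy stops at layer one.
slogan_only = PolicyDraft(value="Uphold ethics")

# Assumed reading of a legalistic policy: a rule with no stated value or test behind it.
legalistic = PolicyDraft(
    guideline="AI use is permitted with disclosure on take-home essays",
)

# A draft that has done the work in the middle (example wording is illustrative).
complete = PolicyDraft(
    value="We value honesty",
    framework="deontology",
    thought_experiment="Disclosure would have cost the student a higher grade",
    guideline="AI use is permitted with disclosure on take-home essays",
    practice="Teachers ask for a one-line disclosure note on each take-home essay",
)

for name, draft in [("slogan-only", slogan_only),
                    ("legalistic", legalistic),
                    ("complete", complete)]:
    print(name, "is missing:", missing_layers(draft))
```

The check is deliberately shallow. It only shows which layers are blank, not whether the filled ones are any good, which is still a conversation for the leadership team.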

Every AI policy in every school is going to age badly. The tool that's central to today's policy will be obsolete in eighteen months. The new tool that wasn't on the radar at the time of writing will create cases the policy doesn't address. Teachers will find loopholes. Students will find better ones. The policy will need to be rewritten — not because it was bad, but because the world it was written for has moved.

This is fine. The policy's job is not to be permanent. Its job is to make the next round of decisions visible, contested, and improvable. A school that has done the value → framework → experiment → guideline → practice work once knows how to do it again. A school that hasn't is starting from zero every time the technology changes.

Stop trying to write the policy that catches every case. Start building the institutional habit that handles the next case. The first move is to schedule a rewrite in 12–18 months and to name that cadence explicitly in the policy itself.

Hume's old observation, sharpened by Sparrow and Flenady's 2025 paper in AI & Society: you cannot derive an "ought" from an "is." That AI can do a thing — write the essay, grade the paper, tutor the student, replace the teacher — does not settle whether it should. The settling is a different conversation, requiring different tools, and it does not happen on its own. It happens because someone in the building decided to host it, with the frameworks ready, the thought experiments drafted, and enough time blocked on the calendar to argue the cases through.

The point of the move from ambiguity to action is not to arrive at a finished ethics. It is to put the conversation on stable enough footing that the next case — and there will be a next case — gets handled by a school that has practiced thinking about cases. The slogan-only schools won't do this. The legalistic schools will overdo the document and underdo the practice. The schools that do the slow work in between are the ones whose teachers, in five years, will still be able to look a parent in the eye and explain what their school stands for and why.

Source: Sparrow & Flenady, "Bullshit Universities: The Future of Automated Education" (AI & Society 40, 2025)

References

Matthew A. Zinn. "From Ambiguity to Action: Navigating Ethical Challenges in AI-Enhanced Education." The Examined Classroom, July 12, 2024. (Original publication; this article is the expanded internal treatment.)

Philippa Foot. "The Problem of Abortion and the Doctrine of the Double Effect." Oxford Review 5, 1967.

Judith Jarvis Thomson. "The Trolley Problem." Yale Law Journal 94(6), 1985, 1395–1415.

Stanford Encyclopedia of Philosophy. "John Stuart Mill." Substantive revision 2022.

Stanford Encyclopedia of Philosophy. "Kant's Moral Philosophy." Substantive revision 2022.

Stanford Encyclopedia of Philosophy. "Aristotle's Ethics." Substantive revision 2022.

Aristotle. Nicomachean Ethics. Translation by W. D. Ross.

Coelho et al. "Generative AI in schools: placebo and nocebo effects on subjectification." British Educational Research Journal, 2025.

Robert Sparrow and Gene Flenady. "Bullshit Universities: The Future of Automated Education." AI & Society 40, 2025, 5285–5296.

Gert Biesta. The Beautiful Risk of Education. Routledge, 2014.

NYC Public Schools. "Guidance on Artificial Intelligence." Released March 24, 2026.

European Union. Regulation (EU) 2024/1689 (the AI Act). In force August 1, 2024.

Higher Education Policy Institute. "Student Generative AI Survey 2025."

UNESCO. "AI Competency Framework for Teachers" (2024). UNESDOC: ark:/48223/pf0000391104.

Microsoft Research. "The Impact of Generative AI on Critical Thinking" (2025).

For Educators

Take this somewhere. The three sections below distill what to remember, what to do with students next week, and where to keep reading.

Key Takeaways

  1. "Uphold ethics" is not a policy. Two reasonable teachers can hold opposite positions under that heading. A policy starts at the point where you can say what you would do in a specific case and what cost you would accept.

  2. The three classical frameworks aren't competing for which is correct. They ask different questions about the same action. Most workable AI policies use utilitarianism for outcome questions, deontology for dignity questions, and virtue ethics for character questions — often in the same paragraph.

  3. Thought experiments are policy tools, not philosophy-class decorations. A case nobody on the leadership team wants to answer is the most useful design for the policy you need next.

  4. Every AI policy will age badly. Plan the rewrite cadence (12–18 months) into the policy itself. The point is the institutional habit, not the document.

  5. The is/ought distinction is the question every AI-in-education conversation eventually lands on. That AI can do something does not yet tell you whether it should. The settling work is values work, and it doesn't happen on its own.

Bring It Into Your Classroom

Run the Trolley → Footbridge → AI sequence with staff

60 min

Walk through the Trolley Problem, then the Footbridge variant, then a current AI case from your school (a flagged essay, a monitoring alert, an AI-tutor-vs-teacher-time tradeoff). Notice where intuitions hold across all three and where they break.

Discussion prompt: If your intuition flipped between Trolley and Footbridge but not between Footbridge and the AI case, what is your intuition tracking that the math isn't?

The 'name the value' exercise

45 min

Each department writes one sentence: "We value ___." Then each writes a case where that value would cost something. Then the team picks the one case nobody wants to answer and answers it in writing.

Discussion prompt: If you can't answer the case in writing, the value statement was hiding something. What is the real value underneath?

Audit the gap between policy and practice

30 min

Pull your current AI guidance and three real cases from the last semester. For each case, mark whether the policy actually told the teacher what to do, or whether the teacher had to decide alone. Count the cases where the teacher decided alone.

Discussion prompt: If most of the cases left the teacher to decide alone, where would the next thought experiment have to land to close the gap?
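If the audit is kept in a spreadsheet or notes file, the count takes a few lines to tally. This is a minimal sketch with invented records: the three case labels echo the kinds of cases mentioned earlier, the True/False markings are hypothetical, and `cases`, `tally`, and `decided_alone_share` are illustrative names.

```python
from collections import Counter

# Hypothetical audit records: (case description, did the policy tell the teacher what to do?)
cases = [
    ("Flagged essay", True),
    ("Monitoring alert", False),
    ("AI-tutor-vs-teacher-time tradeoff", False),
]

# Tally how many cases the policy covered and how many it left to the teacher.
tally = Counter(
    "policy decided" if covered else "teacher decided alone"
    for _, covered in cases
)

# Share of cases where the written policy gave no usable direction.
decided_alone_share = tally["teacher decided alone"] / len(cases)

print(dict(tally))
print(f"Teacher decided alone in {decided_alone_share:.0%} of audited cases")
```

Three cases is a small sample, so the share is a conversation starter and not a measurement. The uncovered cases themselves are the raw material for the next thought experiment.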

Where to Go Next

Outside reading ↗

NYC Schools AI Guidance (March 2026)

The most detailed U.S. district framework. Read it as a model of frameworks made operational.


Sparrow & Flenady on automated education (AI & Society, 2025)

The is/ought distinction made very sharp, applied to teacher replacement.


UNESCO AI Competency Framework for Teachers (2024)

The first global framework. 15 competencies across 5 dimensions; a useful scaffold for staff development.
