Do LLMs Fade Worked Examples?

Tim Gallagher Pilot Study

March 2026

A Pilot Study of Pedagogical Reasoning in Frontier AI Models

Abstract

The worked example fading effect is one of the most robust findings in cognitive load theory: instructional sequences should begin with fully worked examples and progressively remove solution steps so learners take over the problem-solving process. If teachers use LLMs to generate worked examples, do those models apply fading?
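To make the term concrete, the following is a minimal sketch of what a faded sequence looks like. It assumes backward fading (the last solution step is removed first), which is only one possible ordering; the function name `fade_sequence`, the blank marker, and the algebra problem are hypothetical illustrations, not materials from this study.

```python
from typing import List


def fade_sequence(steps: List[str], blank: str = "____") -> List[List[str]]:
    """Return one version of a worked example per fading stage.

    Stage 0 shows every solution step. Each later stage blanks one more
    step, starting from the final step (backward fading is an assumption
    made for this sketch).
    """
    stages = []
    for n_hidden in range(len(steps) + 1):
        n_shown = len(steps) - n_hidden
        stages.append(steps[:n_shown] + [blank] * n_hidden)
    return stages


# Hypothetical problem: solve 2x + 3 = 11.
example_steps = [
    "Subtract 3 from both sides: 2x = 8",
    "Divide both sides by 2: x = 4",
    "Check: 2(4) + 3 = 11",
]

for number, stage in enumerate(fade_sequence(example_steps)):
    print(f"Problem {number + 1}:")
    for step in stage:
        print("  " + step)
```

The first problem in the sequence is fully worked and the last leaves every step to the learner, which is the progression the fading effect describes.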

This pilot study tested two frontier models (Claude Opus 4.6 and Gemini 3.1 Pro) across three prompting conditions: a baseline with no pedagogical instruction, general cognitive load theory instruction, and specific fading instruction. For each run, the model output and the reasoning trace (chain-of-thought) were collected and scored on two dimensions: whether the output implemented fading, and whether the trace referenced it.
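The design can be sketched as a small grid of model and condition cells, each carrying the two scores. This is an illustrative data structure only: the class and field names are hypothetical, the label "A" for the baseline is an assumption (the text names only Conditions B and C), and the boolean scores simplify whatever scale the full rubric uses.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

MODELS = ["Claude Opus 4.6", "Gemini 3.1 Pro"]

# Condition labels B and C follow the text; "A" for the baseline is assumed.
CONDITIONS = {
    "A": "baseline, no pedagogical instruction",
    "B": "general cognitive load theory instruction",
    "C": "specific fading instruction",
}


@dataclass
class Score:
    """One scored run: a model under one prompting condition."""

    model: str
    condition: str
    output_implements_fading: bool  # dimension 1: the generated material
    trace_references_fading: bool   # dimension 2: the reasoning trace

    @property
    def knowledge_application_gap(self) -> bool:
        # The trace mentions fading but the output does not use it.
        return self.trace_references_fading and not self.output_implements_fading


def design_cells() -> List[Tuple[str, str]]:
    """Enumerate every (model, condition) pair in the design."""
    return list(product(MODELS, CONDITIONS))


# Illustrative record shaped like the pattern described for Condition B;
# it is not a row copied from the study's data.
illustration = Score("example-model", "B", False, True)
print(len(design_cells()), illustration.knowledge_application_gap)
```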

The headline finding: when prompted to "apply cognitive load theory," both models refer to fading in their reasoning traces, showing that they know the principle, but neither applies it in its output unless told specifically what fading is. This knowledge-application gap has direct implications for teachers relying on AI-generated instructional materials.

This is a pilot study with two models, one run per condition, and no inferential statistics. All findings are descriptive. For the full list of AlignED reports, visit AlignED Reports.

At a glance

Frontier models tested: 2

Prompting conditions: 3

Outputs scored

Reasoning traces analysed

Key finding: Both models referenced fading in their reasoning traces when told to "apply cognitive load theory" (Condition B), but neither applied fading in its actual output until it received the specific fading instruction (Condition C). The models know about fading but do not select it from their repertoire of CLT principles unless the prompt names it directly.

Reading guide

This site follows the structure of an academic paper. The Introduction explains why fading matters and frames the research question. Methods describes the prompts, models, and scoring rubric. Results presents the scores, charts, and key trace excerpts. Discussion draws out implications and states limitations. The Appendices contain all prompts, the full rubric, and complete model outputs and reasoning traces.
