[Figure 1: "ICL destroys the epistemic signal: GSI collapses to zero in 2–5 examples." GSI (y-axis, 0 to 0.35) vs. number of ICL examples (0-shot to 5-shot) for LLaMA, Qwen3, Gemma, Mistral-7B, OLMo-2, Ministral]
Fig 1 — GSI by number of ICL examples. All models converge to zero. Collapse threshold: n=1 (Gemma), n=2 (LLaMA, Qwen3, Mistral-7B), n=5 (OLMo-2, Ministral).

Paper 1 in this series showed that language models carry a measurable epistemic signal — a way to tell what they know before they speak. This paper shows how to destroy it. Two examples are enough.

In-context learning — the practice of prepending examples to a prompt — is the backbone of modern AI applications. RAG systems inject retrieved documents. Few-shot prompting teaches models new patterns on the fly. But this convenience has a cost that nobody has measured until now: ICL floods the model's gate activations with broadband signal, zeroing out the Gini coefficient that distinguishes knowing from not-knowing.
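The mechanism can be sketched with a toy computation (hypothetical activation vectors, not the paper's instrumentation): a sparse gate pattern, where a few gates fire strongly and the rest stay silent, has a high Gini coefficient, while broadband activation, where every gate fires mildly, drives it toward zero.

```python
import random

def gini(x):
    """Gini coefficient of a non-negative vector.
    0 = perfectly uniform (broadband); values near 1 = highly sparse."""
    xs = sorted(abs(v) for v in x)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    return sum((2 * i - n - 1) * v for i, v in enumerate(xs, 1)) / (n * total)

rng = random.Random(0)
# "Knowing": a handful of gates strongly active, the rest silent
sparse = [1.0] * 10 + [0.0] * 990
# ICL-flooded: every gate mildly active (broadband signal)
broadband = [1.0 + 0.01 * rng.gauss(0, 1) for _ in range(1000)]

print(round(gini(sparse), 2))     # high: a usable epistemic signal
print(round(gini(broadband), 2))  # near zero: the signal is gone
```

The point of the sketch is only the direction of the effect: it is the uniformity of broadband activation, not its magnitude, that zeroes the Gini coefficient.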

Gemma-2-9B collapses after a single example. LLaMA, Qwen3, and Mistral-7B after two. By five examples, every model in our test set reads GSI = 0.000 — indistinguishable from noise. The measurement instrument stops working.

The model changes, but you can't see it

[Figure 2: "Context captures the model — CIED measures how far outputs shift." CIED under explicit false context and 5-shot ICL; models: LLaMA, Qwen3, OLMo-2, Gemma, Ministral, Bielik-B, Bielik-I, Mistral-7B]
Fig 2 — CIED (centroid displacement) under explicit false context and 5-shot ICL. Higher = model output shifted more. Bielik Instruct 3x more susceptible than Base.

GSI going to zero does not mean nothing happened. CIED (Context-Induced Epistemic Displacement) measures how far the model's generation centroid moved. LLaMA's centroid shifts by 0.619 cosine distance under 5-shot ICL — the model is producing fundamentally different outputs. Mistral-7B: 0.556. Bielik-Instruct: 0.558.
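A centroid-displacement measure of this kind can be sketched as the cosine distance between two generation centroids: the mean embedding of generations without injected context and with it. A minimal illustration with toy vectors (the helpers and embeddings here are hypothetical, not the paper's code or embedding space):

```python
from math import sqrt

def centroid(vectors):
    """Mean vector of a set of generation embeddings."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def cied(baseline_embs, context_embs):
    """Context-Induced Epistemic Displacement: cosine distance between
    generation centroids with and without injected context."""
    return cosine_distance(centroid(baseline_embs), centroid(context_embs))

# Toy case: context pulls generations toward a different direction
baseline = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]
captured = [[0.1, 1.0, 0.0], [0.0, 0.9, 0.1]]
print(round(cied(baseline, captured), 3))  # large displacement
```

On this scale a value like LLaMA's 0.619 means the with-context centroid points in a substantially different direction from the baseline one, while Qwen3's 0.041 (below) means the centroids nearly coincide.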

The decoupling is the critical finding: the model's outputs change, context captures the generation, but GSI can no longer detect the substitution. A system relying on GSI alone will not see the swap.

One model stands out: Qwen3 shows CIED = 0.041 under 5-shot ICL — context barely moves its generation centroid. It "acknowledges" the context at the gate level (GSI collapses) but largely ignores it when generating. The opposite of LLaMA, where capture is deep and complete.

Instruction tuning amplifies vulnerability

Bielik-11B provides a controlled comparison: same architecture, Base vs Instruct. CIED for the base model: 0.148. CIED for the instruction-tuned version: 0.471 — three times more susceptible. The same property that makes instruction-tuned models follow instructions faithfully makes them follow poisoned context faithfully. This is simultaneously the desired behavior for RAG (follow the retrieved document) and the risk for adversarial context injection.

Once flooded, perturbation changes nothing

[Figure 3: "GSI is immune to context perturbation: drift = 0.000 in 7/8 models." GSI drift for LLaMA, Qwen3, OLMo-2, Gemma, Bielik-B, Bielik-I (0.000) and Ministral (0.008)]
Fig 3 — Additional GSI change when perturbation is added on top of existing context. Near zero everywhere — you cannot flood an already-flooded signal.

Once context has flooded the gate activations (Section 1), adding misleading or explicitly false context produces no further GSI change. The measurement is already at its floor. This does not contradict the effect of the initial context; it is a ceiling effect: the broadband activation from ICL has already saturated the Gini coefficient.
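The ceiling effect can be illustrated with the same kind of toy computation (a sketch under assumed activation vectors, not the paper's pipeline): once a gate-activation vector is already broadband, layering a further perturbation on top barely moves its Gini coefficient.

```python
import random

def gini(x):
    """Gini coefficient: 0 = uniform (broadband), near 1 = sparse."""
    xs = sorted(abs(v) for v in x)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    return sum((2 * i - n - 1) * v for i, v in enumerate(xs, 1)) / (n * total)

rng = random.Random(0)
# Already-flooded gates: broadband, near-uniform activation
flooded = [1.0 + 0.01 * rng.gauss(0, 1) for _ in range(1000)]
# Misleading/false context modeled as a further small perturbation
perturbed = [v + 0.01 * rng.gauss(0, 1) for v in flooded]

drift = abs(gini(perturbed) - gini(flooded))
print(round(drift, 4))  # tiny: the measurement was already at its floor
```

Both vectors sit near the Gini floor, so the additional perturbation has nothing left to saturate.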

The architectural constraint: GSI works before context, not after. A RAG system that retrieves first and checks later has already destroyed the signal it needs. The correct architecture is: (1) receive query, (2) measure GSI, (3) if GSI is low, route to retrieval, (4) inject context only for queries that need it. The detection window is narrow — and it closes the moment context enters the prompt.
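The four-step architecture above can be sketched as a simple router. The threshold value, function names, and signatures here are hypothetical illustrations, not anything the paper specifies:

```python
GSI_THRESHOLD = 0.15  # hypothetical cut-off; the paper does not give one

def answer(query, measure_gsi, generate, retrieve):
    """Gate-then-retrieve routing: GSI is measured on the bare query,
    before any context enters the prompt and destroys the signal."""
    gsi = measure_gsi(query)      # step 2: measure on the clean query
    if gsi >= GSI_THRESHOLD:
        return generate(query)    # high GSI: model likely knows; answer directly
    docs = retrieve(query)        # step 3: low GSI routes to retrieval
    prompt = "\n".join(docs) + "\n" + query
    return generate(prompt)       # step 4: inject context only where needed

# Toy stubs exercising both paths
direct = answer("q", lambda q: 0.30, lambda p: "gen:" + p, lambda q: ["doc"])
routed = answer("q", lambda q: 0.05, lambda p: "gen:" + p, lambda q: ["doc"])
print(direct)  # no retrieval: context never touches the prompt
print(routed)  # retrieval path: context injected after the GSI read
```

The design choice the sketch encodes is the ordering constraint, not the threshold: `measure_gsi` must see the query before `retrieve` output is concatenated into the prompt, because after that point the reading is GSI = 0 regardless of what the model knows.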

These findings have direct implications for anyone building RAG systems. The pre-generative detection mechanism from Paper 1 cannot operate downstream of a context injection pipeline. It must operate upstream — or not at all.