[Figure: four types of hallucination on the generation timeline, from query (pre-gen) through token 1, token 2, tokens 3–30, and token 50+. Type 1 (absent knowledge): GSI low, no anchor, detectable by GSI at query time. Type 2 (wrong parametric knowledge): GSI high, looks like real knowledge, invisible to all pre-generative measures. Type 3 (schema confabulation): t1_sim high, t1_decay > 0.40, detectable by t1_decay at the token boundary. Type 4 (surface mimicry, ICL): t1_sim = 1.0, delta_sim << 0, detectable from the trajectory via t1_sim + delta_sim.]
Fig 1 — Type 1 is visible at query time. Types 3 and 4 at the token boundary. Type 2 is invisible to all pre-generative measures.

The AI industry treats hallucination as a single problem. It is not. Our data reveals four mechanistically distinct failure modes, each with different causes, different signals, and different fixes. Treating them as one problem is why current detection methods have blind spots.

Every model we tested — LLaMA, Qwen, Gemma, OLMo, Ministral, Bielik, Mistral — produces all four types. But the types appear at different moments in the generation pipeline, leave different internal signatures, and require entirely different instruments to catch. One number cannot solve this. One stage cannot solve this. The detection architecture must be layered.

The empty drawer (Type 1)

The model has no knowledge. Ask it about a fictional compound, a nonexistent city, a made-up historical event — the FFN gate activations spread diffusely across thousands of neurons. There is no specific memory address to reach into. GSI catches this before a single token is generated, with Cohen's d between 1.28 and 2.43 across all eight architectures tested.

This is the well-understood case. The gate pattern is sparse for real knowledge and diffuse for absent knowledge. The measurement takes roughly 3 milliseconds on a single forward pass. Type 1 is solved — or rather, it was never the hard problem.

The wrong memory (Type 2)

"The capital of Australia is Sydney." The model remembers confidently, but incorrectly. The FFN gate pattern for this wrong fact is identical to the pattern for a correct one — the key matches, the value is wrong, but GSI reads the key, not the value. Every pre-generative signal says "the model knows." It simply knows wrong.

This is the structural blind spot no internal signal can fix. The gate activations are sparse and specific — exactly like they would be for a correct answer. The model has a memory address, reaches for it with high precision, and retrieves the wrong content. No measurement of the lookup mechanism can detect an error in what is stored at that address. Type 2 hallucination requires external verification. There is no shortcut.
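The external check itself can be minimal. A sketch under the assumption of a trusted fact store keyed by claim (the store, the key format, and the function are all hypothetical; in practice the store would be a retrieval system or knowledge base):

```python
# Hypothetical ground-truth store; stands in for a retrieval system.
FACTS = {"capital of Australia": "Canberra"}

def externally_verified(claim_key: str, generated_answer: str) -> bool:
    """Post-generation check: compare the model's answer to stored truth.
    This is the only stage that can catch Type 2, because it reads the
    value, not the lookup mechanism."""
    truth = FACTS.get(claim_key)
    return truth is not None and truth.casefold() == generated_answer.casefold()

print(externally_verified("capital of Australia", "Sydney"))    # False — Type 2 caught
print(externally_verified("capital of Australia", "Canberra"))  # True
```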

Form without content (Type 3)

Schema confabulation. The model knows the template — "The traditional dish of X is..." — but not the content. At token 1, the representation converges: the model recognizes the syntactic form. At token 2, it diverges: there is no factual content to fill the slot. The t1_decay metric catches this pattern. When t1_decay exceeds 0.40, the model has matched a schema but lacks the substance to complete it.

Type 3 is subtle because the first token looks confident. The model starts strong — it knows what kind of answer is expected. The collapse happens at the boundary between form and content, and it happens fast. A system that only checks at query time misses it. A system that checks at the first token boundary catches it.
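The token-boundary check can be sketched as follows. The exact definition of t1_decay is not given above, so this assumes it is the drop in cosine similarity to a reference trajectory between token 1 and token 2; only the 0.40 threshold comes from the text:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def t1_decay(ref_states, gen_states) -> float:
    """Assumed definition: similarity at token 1 minus similarity at
    token 2. Large positive values mean the generation matched the
    schema at token 1, then lost the content at token 2."""
    return cosine(ref_states[0], gen_states[0]) - cosine(ref_states[1], gen_states[1])

T1_DECAY_THRESHOLD = 0.40  # from the text

# Schema confabulation: token 1 matches the reference, token 2 diverges.
ref = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]
gen = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(t1_decay(ref, gen) > T1_DECAY_THRESHOLD)  # True — flagged at token 2
```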

Copy without understanding (Type 4)

OLMo-2 under in-context learning copies the surface form perfectly — t1_sim reaches 1.0, meaning the model's first-token representation is identical to the reference. Then it immediately loses coherence: delta_sim drops to -0.596. The model reproduced the pattern without internalizing the knowledge behind it.

LLaMA, by contrast, transitions from mimicry to internalization. Its t1_sim starts high under ICL but delta_sim remains stable — the model absorbs the in-context example into its generation trajectory rather than merely copying it. The difference between surface mimicry and genuine learning is measurable in the trajectory signature, and it separates architectures that truly learn from context from those that merely parrot it.
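Given the two trajectory statistics, the Type 4 signature reduces to a joint threshold test. The cutoffs below are illustrative assumptions; only the reported values (t1_sim = 1.0, delta_sim = -0.596) come from the text:

```python
def is_surface_mimicry(t1_sim: float, delta_sim: float,
                       sim_floor: float = 0.95,
                       delta_ceiling: float = -0.30) -> bool:
    """Type 4 signature: near-perfect first-token copy (t1_sim high)
    followed by a sharply negative trajectory delta (delta_sim low)."""
    return t1_sim >= sim_floor and delta_sim <= delta_ceiling

print(is_surface_mimicry(1.0, -0.596))  # True  — the OLMo-2 pattern from the text
print(is_surface_mimicry(0.97, 0.02))   # False — stable trajectory (internalization)
```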

[Figure: trajectory signatures — each type has a distinct sim(t) profile on a 0.0–1.0 axis. Type 1 (absent knowledge): no convergence. Type 2 (wrong knowledge): looks like a fact. Type 3 (schema confabulation): form, then collapse. Type 4 (surface mimicry): copy, then diverge.]
Fig 2 — Type 1: flat-low. Type 2: rising (indistinguishable from correct). Type 3: spike then collapse. Type 4: perfect copy then diverge.

No single measure catches all four types. The production architecture must be layered: GSI at query time (~3ms, catches Type 1), t1_decay at token 2 (~50ms, catches Types 3 and 4), external verification after generation (catches Type 2). Cheapest first, most expensive last.
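Wired together, the layered architecture reads as a short cascade. All thresholds except t1_decay > 0.40 are placeholder assumptions, and in a real system each stage would only be computed if the previous one passed:

```python
def triage(gsi: float, t1_decay: float, t1_sim: float, delta_sim: float) -> str:
    """Layered detection sketch: cheapest signal first, external
    verification only for what the internal signals cannot see."""
    if gsi < 0.5:                              # query time, ~3 ms (cutoff assumed)
        return "Type 1: absent knowledge"
    if t1_decay > 0.40:                        # token boundary, ~50 ms
        return "Type 3: schema confabulation"
    if t1_sim >= 0.95 and delta_sim <= -0.30:  # trajectory (cutoffs assumed)
        return "Type 4: surface mimicry"
    return "pass to external verification"     # only route that catches Type 2

print(triage(gsi=0.2, t1_decay=0.0, t1_sim=0.0, delta_sim=0.0))
print(triage(gsi=0.9, t1_decay=0.1, t1_sim=0.5, delta_sim=0.0))
```

Note that "pass to external verification" is not a failure of the cascade; it is the cascade doing its job, routing only the unresolvable cases to the expensive check.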

Four types, three instruments, one structural blind spot. The honest conclusion is that pre-generative detection cannot be total. Type 2 — wrong parametric knowledge — is invisible to any signal that reads the lookup mechanism rather than the stored value. The only way to catch it is to verify the output against external ground truth, after generation. No amount of internal instrumentation changes this.

But three out of four is still a transformation. Type 1 is caught at query time for near-zero cost. Types 3 and 4 are caught at the token boundary for minimal latency. Only Type 2 requires the expensive, slow process of external fact-checking — and knowing which outputs need that check is itself a gain. The architecture is not a silver bullet. It is a triage system: cheap and fast where possible, expensive only where necessary.