A benchmark for honkaku (本格) deduction

Can an AI
solve a murder?

Seventy honkaku murder mysteries, from the genre that swears every clue is on the page and dares you to name the killer first. We handed the dossiers to five frontier AI models and let them play detective. Can they solve the cases, and get there by the correct reasoning?

Verdicts Try the cases yourself →

Headline Results

The scoreboard

Model	Perfect (100%)	Solved end‑to‑end (≥90%)	Mean (partial)
Claude Fable 5	17.1%	34.3%	67.7%
Claude Opus 4.8	5.7%	10.0%	46.0%
Claude Opus 4.6	7.1%	12.9%	45.8%
GPT‑5.5	7.1%	10.0%	45.7%
Gemini 3.1 Pro	4.3%	10.0%	42.1%

All five models answered all 70 graded cases. “Perfect” is a flawless 100%. “Solved end‑to‑end” counts cases scored ≥90%, which is essentially the complete solution, from culprit through deductions, rather than partial credit. “Mean” is the average partial-credit score.
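
As a rough illustration of how the three columns relate, here is a minimal Python sketch that derives them from a list of per-case scores. It is not the benchmark's actual grading code: the `summarize` function, the `Summary` type, and the 0 to 100 score scale are assumptions made for this example, and the sample scores are invented. They are chosen only so that 12 of 70 cases are perfect and 24 of 70 reach the 90% bar, which are the counts that 17.1% and 34.3% of 70 cases work out to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Scoreboard columns for one model, all in percent (hypothetical type)."""
    perfect_pct: float  # share of cases scored exactly 100
    solved_pct: float   # share of cases scored at or above the threshold
    mean_pct: float     # average partial-credit score


def summarize(scores: list[float], solved_threshold: float = 90.0) -> Summary:
    """Aggregate per-case scores (assumed to lie on a 0-100 scale)."""
    if not scores:
        raise ValueError("need at least one graded case")
    n = len(scores)
    perfect = sum(1 for s in scores if s >= 100.0)
    solved = sum(1 for s in scores if s >= solved_threshold)
    return Summary(
        perfect_pct=round(100 * perfect / n, 1),
        solved_pct=round(100 * solved / n, 1),
        mean_pct=round(sum(scores) / n, 1),
    )


# Invented scores for 70 cases: 12 perfect, 12 more at 95, 46 at 50.
example_scores = [100.0] * 12 + [95.0] * 12 + [50.0] * 46
print(summarize(example_scores))
# Summary(perfect_pct=17.1, solved_pct=34.3, mean_pct=66.3)
```

Every perfect case also counts as solved end‑to‑end, so the second column can never be smaller than the first, while the mean also reflects the partial credit earned on cases that fall short of the 90% bar.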

Read the analysis →

Now it's your turn.

Eight cases. Every clue on the page. No answer key in sight. Read one, out-deduce the machine, and send us your reasoning — it goes straight to the benchmark authors.

Try a case →
