A benchmark for honkaku (本格) deduction

Can an AI solve a murder?

Seventy murder mysteries from honkaku, the genre that swears every clue is on the page and dares you to name the killer first. We handed the dossiers to five frontier AI models and let them play detective. Can they name the culprit and get the reasoning right?

The scoreboard

Model            Perfect (100%)   Solved end‑to‑end (≥90%)   Mean (partial)
Claude Fable 5   17.1%            34.3%                      67.7%
Claude Opus 4.8  5.7%             10.0%                      46.0%
Claude Opus 4.6  7.1%             12.9%                      45.8%
GPT‑5.5          7.1%             10.0%                      45.7%
Gemini 3.1 Pro   4.3%             10.0%                      42.1%

All five models answered all 70 graded cases. “Perfect” is the share of cases with a flawless 100% score. “Solved end‑to‑end” is the share of cases scored ≥90%, which means essentially the complete solution, from culprit through deductions, rather than partial credit. “Mean” is the average partial-credit score across all cases.
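For readers who want to see how the three columns relate, here is a minimal sketch in Python. It assumes each case receives a single partial-credit score between 0 and 1. The function name `summarize`, the `solved_threshold` parameter, and the example scores are hypothetical and are not taken from the benchmark's own grading code.

```python
from statistics import mean


def summarize(scores, solved_threshold=0.90):
    """Summarize per-case partial-credit scores (each assumed to lie in [0, 1]).

    Returns the share of perfect cases, the share of cases at or above
    the solved threshold, and the mean partial-credit score.
    """
    if not scores:
        raise ValueError("need at least one graded case")
    n = len(scores)
    perfect = sum(1 for s in scores if s >= 1.0) / n
    solved = sum(1 for s in scores if s >= solved_threshold) / n
    return {"perfect": perfect, "solved": solved, "mean": mean(scores)}


# Hypothetical scores for ten cases, for illustration only.
example_scores = [1.0, 0.95, 0.9, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
summary = summarize(example_scores)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

Every perfect case also counts as solved end‑to‑end, so the first column can never exceed the second. If the percentages are shares of the 70 graded cases, each case is worth about 1.4 percentage points; a Perfect score of 17.1% would then correspond to 12 cases (12/70) and a Solved score of 34.3% to 24 cases (24/70).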

Now it's your turn.

Eight cases. Every clue on the page. No answer key in sight. Read one, out-deduce the machine, and send us your reasoning — it goes straight to the benchmark authors.

A case awaits you…

Can you beat an AI detective?
