Decision: Anthropic Haiku for hub-recall reranker

Logged 2026-04-11.

What we ship

Hybrid retrieval (BM25 + pgvector union, top-30) → Anthropic Haiku 4.5 reranks → top-k returned to caller.
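
A minimal sketch of that flow, under stated assumptions: the two hit lists are interleaved by rank, de-duplicated, and capped at 30 before reranking. The function names (hybrid_candidates, recall) and the rerank callable are hypothetical stand-ins, not our actual code; the real reranker would be a call to Haiku.

```python
from typing import Callable, Sequence


def hybrid_candidates(
    bm25_hits: Sequence[str],
    vector_hits: Sequence[str],
    limit: int = 30,
) -> list[str]:
    """Union two ranked lists of document IDs.

    Assumption: candidates are interleaved by rank and duplicates are
    dropped, then the merged list is cut to `limit` (30 in this note).
    """
    merged: list[str] = []
    seen: set[str] = set()
    for rank in range(max(len(bm25_hits), len(vector_hits))):
        for hits in (bm25_hits, vector_hits):
            if rank < len(hits) and hits[rank] not in seen:
                seen.add(hits[rank])
                merged.append(hits[rank])
    return merged[:limit]


def recall(
    query: str,
    bm25_hits: Sequence[str],
    vector_hits: Sequence[str],
    rerank: Callable[[str, list[str]], list[str]],
    k: int = 5,
) -> list[str]:
    """Build the candidate pool, rerank it, and return the top k IDs."""
    candidates = hybrid_candidates(bm25_hits, vector_hits)
    # `rerank` is a hypothetical callable returning the candidates
    # reordered from most to least relevant.
    return rerank(query, candidates)[:k]
```

Passing the reranker in as a callable keeps the vendor swappable, which matters if the revisit conditions further down ever trigger.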

Why Haiku, not Voyage rerank-2 / Cohere rerank-v3 / Mixedbread

Voyage rerank-2 is the obvious technical choice (it's a purpose-built reranker from a retrieval-focused shop). I ran the eval anyway:

Reranker           nDCG@5 (our eval set)   p95 latency   Price per 1M tokens
Haiku 4.5          0.83                    320ms         $1 in / $5 out
Voyage rerank-2    0.85                    110ms         $0.50
Cohere rerank-v3   0.84                    180ms         $1.00

Voyage and Cohere are slightly more accurate and faster. So why Haiku?

  • Single-vendor story. We already use Anthropic for capture summarisation and graph extraction. Adding a second model provider just for reranking costs more operationally than the marginal quality gain is worth.
  • The eval gap is inside noise. Our eval set has 60 queries. The 0.02 nDCG gap between Haiku and Voyage falls inside the bootstrap CI. We can't prove the difference is real at this scale.
  • Latency budget has room. Our recall budget is 800ms p95 end-to-end. The reranker takes 320ms of that, which leaves 480ms for retrieval and everything else. We're not against a wall.
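
The noise argument above can be checked with a paired bootstrap over per-query scores. This is a sketch of one common way to do it, not necessarily the exact procedure used for this eval: the function name, the resample count, and the percentile method are all assumptions. The inputs are per-query nDCG@5 values for two rerankers on the same queries, in the same order.

```python
import random
from typing import Sequence


def paired_bootstrap_ci(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean per-query difference (a - b)."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")

    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)

    # Resample queries with replacement and record the mean difference.
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(diffs, k=n)
        means.append(sum(sample) / n)
    means.sort()

    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

If the returned interval contains zero, the gap cannot be distinguished from resampling noise at this sample size. With only 60 queries the interval tends to be wide, which is why a 0.02 gap is hard to call.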

When I'd revisit

  • If Voyage releases rerank-2.5 with a meaningful jump on the long-doc benchmark. Voyage has explicitly said they're working on it.
  • If we ever need to serve recall to free-tier users at scale. At that volume, the saving from Voyage's $0.50/1M pricing over Haiku's $1 in / $5 out would compound enough to matter.
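
To put a rough number on the cost trigger, here is a back-of-envelope sketch. The prices come from the table above; the token counts and query volume in the usage note are hypothetical placeholders, not measurements, and the helper names are made up for illustration.

```python
# Prices in USD per 1M tokens, taken from the comparison table.
HAIKU_IN_PER_M = 1.00
HAIKU_OUT_PER_M = 5.00
VOYAGE_PER_M = 0.50


def cost_per_query(
    input_tokens: int,
    output_tokens: int,
    usd_per_m_in: float,
    usd_per_m_out: float = 0.0,
) -> float:
    """USD cost of one rerank call at the given per-1M-token prices."""
    return (input_tokens * usd_per_m_in + output_tokens * usd_per_m_out) / 1_000_000


def monthly_saving(input_tokens: int, output_tokens: int, queries_per_month: int) -> float:
    """USD saved per month by using Voyage instead of Haiku.

    Assumption: Voyage bills only on the tokens sent in, at a flat rate.
    """
    haiku = cost_per_query(input_tokens, output_tokens, HAIKU_IN_PER_M, HAIKU_OUT_PER_M)
    voyage = cost_per_query(input_tokens, 0, VOYAGE_PER_M)
    return (haiku - voyage) * queries_per_month
```

For example, with a hypothetical 6,000 input tokens (30 candidates at about 200 tokens each) and 100 output tokens per query, monthly_saving(6000, 100, 1_000_000) gives the monthly gap at one million queries. Plugging in real token counts from production logs would make the threshold concrete.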

Until then, single-vendor wins.