---
title: "RFC: hub recall API"
url: https://memory.wiki/OGySiVoO
updated: 2026-05-14T18:15:49.480Z
hub: https://memory.wiki/hub/demo
concept_count: 12
source: "memory.wiki"
---
# RFC: hub recall API

> Status: shipped.

## The endpoint

`POST mdfy.app/api/hub/{slug}/recall`

Body:

```json
{
  "question": "How does cross-AI memory work?",
  "k": 10,
  "level": "doc",
  "rerank": true
}
```

Returns the top-k matching chunks (or docs), ranked.
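As a rough illustration of the request shape above, here is a minimal Python sketch that builds (but does not send) the call. The `https://` scheme, the `Content-Type` header, the `demo` slug, and the helper name `build_recall_request` are assumptions for the example, not part of the spec.

```python
import json
import urllib.request


def build_recall_request(slug, question, k=10, level="doc", rerank=True,
                         base_url="https://mdfy.app"):
    """Build a POST request for the recall endpoint (hypothetical helper)."""
    body = {"question": question, "k": k, "level": level, "rerank": rerank}
    return urllib.request.Request(
        url=f"{base_url}/api/hub/{slug}/recall",
        data=json.dumps(body).encode("utf-8"),
        # Assumed header; the RFC only shows a JSON body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_recall_request("demo", "How does cross-AI memory work?")
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```

The response schema is not pinned down here beyond the fields listed under "Return" below, so the sketch stops at building the request.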

## How the recall actually works

1. **Embedding lookup.** Embed the question with `text-embedding-3-small` (1536 dim).
2. **Hybrid retrieval.** Run two queries in parallel against the user's hub:
   - **Vector.** pgvector cosine match against `documents.embedding` (HNSW index, `ef_search = 40`).
   - **Lexical.** Postgres `to_tsvector` full-text search against `documents.fts`.
3. **Union + de-dup.** Concatenate the top 30 from each and de-dup by doc id. This usually leaves ~30-50 unique candidates (at most 60, when the two lists do not overlap).
4. **Reranker (optional).** If `rerank: true`, send the candidates + the question to Anthropic Haiku, which scores each match. Re-sort by Haiku's score, take top-k.
5. **Return.** Each result includes: doc id, doc title, doc URL, the matched chunk text, the rank score, and the source (vector / lexical / both).
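Steps 3 and 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the shipped code: hits are modelled as `(doc_id, score)` pairs already sorted best-first, `Candidate`, `union_dedup`, and `rerank_top_k` are hypothetical names, and `score_fn` stands in for the Haiku scoring call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    doc_id: str
    vector_score: Optional[float] = None
    lexical_score: Optional[float] = None

    @property
    def source(self) -> str:
        """Which retriever(s) produced this candidate."""
        if self.vector_score is not None and self.lexical_score is not None:
            return "both"
        return "vector" if self.vector_score is not None else "lexical"


def union_dedup(vector_hits, lexical_hits, per_source=30):
    """Take the top `per_source` from each list and de-dup by doc id."""
    merged = {}
    for doc_id, score in vector_hits[:per_source]:
        cand = merged.setdefault(doc_id, Candidate(doc_id))
        if cand.vector_score is None:  # keep the best-ranked score per doc
            cand.vector_score = score
    for doc_id, score in lexical_hits[:per_source]:
        cand = merged.setdefault(doc_id, Candidate(doc_id))
        if cand.lexical_score is None:
            cand.lexical_score = score
    return list(merged.values())


def rerank_top_k(candidates, score_fn, k=10):
    """Re-sort by an external relevance score and keep the top k.

    `score_fn` is a placeholder for the reranker call (assumption).
    """
    scored = sorted(candidates, key=score_fn, reverse=True)
    return scored[:k]


candidates = union_dedup(
    vector_hits=[("a", 0.91), ("b", 0.84)],
    lexical_hits=[("b", 0.40), ("c", 0.33)],
)
```

Keeping the vector and lexical scores separate avoids comparing a cosine similarity with a full-text rank, which are on different scales; how the real service combines them is not stated here.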

## What's tunable

- `k`: number of results to return, from 1 to 20.
- `level`: `"doc"` returns whole docs; `"chunk"` returns specific passages (chunks are pre-computed at ~500 tokens each).
- `rerank`: boolean, default `true`. Reranking adds ~300ms at p95; set it to `false` on speed-first paths.
- `min_score`: discard results below a cosine threshold. Useful for "don't return anything if nothing matches."
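A minimal sketch of how a caller might validate these parameters before sending them, and what the `min_score` cutoff amounts to. The function names, the defaults for `k` and `level` (taken from the example body, not stated as defaults), and the `"score"` key on results are assumptions for illustration.

```python
def validate_recall_params(k=10, level="doc", rerank=True, min_score=None):
    """Check the tunables against the documented ranges (hypothetical helper)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(k, bool) or not isinstance(k, int) or not 1 <= k <= 20:
        raise ValueError("k must be an integer from 1 to 20")
    if level not in ("doc", "chunk"):
        raise ValueError('level must be "doc" or "chunk"')
    if not isinstance(rerank, bool):
        raise ValueError("rerank must be a boolean")
    if min_score is not None and not isinstance(min_score, (int, float)):
        raise ValueError("min_score must be a number")
    params = {"k": k, "level": level, "rerank": rerank}
    if min_score is not None:
        params["min_score"] = min_score
    return params


def apply_min_score(results, min_score):
    """Drop results whose score falls below the threshold."""
    if min_score is None:
        return list(results)
    return [r for r in results if r["score"] >= min_score]


# A speed-first request: chunk-level, no reranker, strict cutoff.
fast = validate_recall_params(k=5, level="chunk", rerank=False, min_score=0.3)
```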

## Auth

The endpoint is publicly callable for public hubs. For restricted/private hubs, the caller has to be the owner OR have an MCP-signed token. Anonymous calls to a private hub return 401.
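The access rule reduces to a small predicate. The sketch below is an assumption-laden illustration: `may_recall` and its arguments are hypothetical names, token verification is out of scope, and the only status code taken from this document is the 401 for anonymous calls to a private hub (200 for success is assumed).

```python
def may_recall(hub_is_public, is_owner=False, has_valid_mcp_token=False):
    """Return True if the caller may run recall against the hub."""
    if hub_is_public:
        return True  # public hubs are callable by anyone
    return is_owner or has_valid_mcp_token


def status_for_anonymous(hub_is_public):
    """Status an anonymous caller would see (200 on success is an assumption)."""
    return 200 if may_recall(hub_is_public) else 401
```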

## What it doesn't do

- **Multi-hop reasoning.** No "fetch this, then fetch what it links to, then aggregate." That's a higher-level construct that lives in the caller's loop.
- **Live recomputation of embeddings.** We embed at write time; recall reads from the existing vectors. Staleness is bounded by the longest delay between a doc edit and the embedding-refresh job (currently 30s).
- **Graph traversal.** Recall is flat over chunks. The graph relationships are at the concept level, accessible separately via the concept index.

## What's next

- **Per-hub recall caching.** Common queries against a public hub should be cacheable for ~60s.
- **Streaming results.** Today the response waits for the reranker to finish. We could stream the union results as they arrive and replace them as the reranker scores them. Tradeoff: more complex client code.
- **Configurable embedding model.** Currently hardcoded to OpenAI's `text-embedding-3-small`. Worth exposing if we ever support a non-OpenAI default.
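For the per-hub caching idea, one possible shape is a small TTL cache keyed by hub slug and request parameters. This is only a sketch of the proposal, not an existing component: `RecallCache`, the in-memory dict, and the injectable clock are all assumptions, and a real deployment would likely need eviction and a shared store.

```python
import json
import time


class RecallCache:
    """In-memory TTL cache for recall responses (hypothetical sketch)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}

    def get_or_compute(self, slug, params, compute):
        # Serialize params with sorted keys so equal requests share a key.
        key = (slug, json.dumps(params, sort_keys=True))
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip retrieval and reranking
        value = compute()
        self._entries[key] = (now, value)
        return value
```

Keying on the full parameter set means a `rerank: false` request never serves a reranked response, at the cost of a lower hit rate.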


---

## Concepts in this document
- **pgvector** _(entity)_
  PostgreSQL extension providing vector data type and HNSW indexing for efficient similarity search.
- **hybrid retrieval** _(concept)_
  The core search strategy combining vector embeddings and lexical full-text search in parallel to maximize recall quality.
- **Anthropic Haiku** _(entity)_
  LLM called during optional reranking phase to score candidate results and improve final ranking.
- **reranking** _(concept)_
  Optional LLM-based scoring step using Anthropic Haiku to re-sort hybrid results by semantic relevance before returning top-k.
- **hub recall API** _(entity)_
  The primary POST endpoint that retrieves top-k matching chunks or documents from a user's hub using hybrid search.
- **lexical search** _(concept)_
  Full-text search using Postgres tsvector for keyword and phrase matching against document content.
- **vector search** _(concept)_
  Cosine similarity matching against pgvector embeddings using HNSW indexing to find semantically similar content.
- **hub authentication** _(concept)_
  Access control mechanism distinguishing public hub recall (no auth required) from private hub recall (owner or MCP token required).
- **text-embedding-3-small** _(entity)_
  OpenAI embedding model used to vectorize questions and documents into 1536-dimensional vectors for semantic search.
- **chunk-level retrieval** _(concept)_
  Returns specific ~500-token passages rather than whole documents for precise content matching.
- **de-duplication** _(concept)_
  Union of vector and lexical results filtered by document id to eliminate duplicate matches before reranking.
- **embedding staleness** _(concept)_
  Bounded delay between a document edit and the embedding refresh job picking it up, currently capped at 30s.

## Concept relations (within this doc's concepts)
- **hybrid retrieval** combines **vector search**
- **vector search** executes via **pgvector**
- **hub recall API** supports **chunk-level retrieval**
- **vector search** bounded by **embedding staleness**
- **hybrid retrieval** combines **lexical search**
- **hub recall API** optionally applies **reranking**
- **hybrid retrieval** followed by **de-duplication**
- **hub recall API** implements **hybrid retrieval**
- **vector search** uses **text-embedding-3-small**
- **reranking** calls **Anthropic Haiku**
- **hub recall API** enforces **hub authentication**

_Hub canonical:_ https://memory.wiki/hub/demo
_Concept digest:_ https://memory.wiki/raw/hub/demo?digest=1&compact=1