Memory.Wiki Memory
Your AI memory, owned by you, readable by any AI you paste it to.
What "memory" means here
Every chat with ChatGPT, Claude, or Cursor produces useful answers. Tomorrow they're gone. The chat is closed, the share link rots, the next session has no idea what you decided last time. Vendors have started building memory layers (ChatGPT memory, Claude projects, Cursor docs) but each one lives behind a vendor wall. They don't talk to each other, you can't share them, you can't read them outside the app, and you definitely can't paste them into the other AI tomorrow.
Memory.Wiki Memory is the inverse: a memory layer that lives at a public URL you control. Every captured answer is a markdown page anyone (you, your teammate, any AI agent) can read, and the whole hub is one URL that any AI can fetch as context.
The full architecture below is what makes that work (chunked indexing, hybrid retrieval, automatic refresh), but you only need to know the surface to use it.
The surface (what you actually do)
1. Capture
- Paste a ChatGPT or Claude share URL into the editor.
- Run /memory.wiki capture <title> from inside Claude Code, Cursor, Codex CLI, or Aider.
- Drop a PDF, DOCX, or transcript file.
Each capture lands at memory.wiki/<id> as a permanent URL. No signup required.
2. Organize (or let Memory.Wiki do it)
Captures roll up into your hub at memory.wiki/hub/<you>. Bundles group docs by topic. You can curate manually, or let auto-synthesis suggest groupings as the cluster forms.
3. Recall
Two ways:
- Paste the hub URL into any AI. They fetch the markdown index and load your knowledge as context.
- Hit the recall endpoint for question-targeted retrieval. Far fewer tokens, much higher precision:
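```bash
# Replace <slug> with your hub slug. The URL is quoted so the shell does not treat < and > as redirections.
curl -X POST "https://memory.wiki/api/hub/<slug>/recall" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How does mem0 extract memories?",
    "k": 5,
    "level": "chunk",
    "hybrid": true
  }'
```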
That's the whole product surface. The rest of this doc is what's underneath.
How the memory layer works (architecture)
Memory.Wiki Memory is built on the same shape Karpathy described in his LLM Wiki gist: raw, wiki, schema, with the AI doing 80% of the curation work that he does by hand.
Layer 1: embeddings everywhere, idempotent
Every public doc carries a 1536-dimensional vector embedded with OpenAI text-embedding-3-small, indexed with HNSW for cosine similarity. Same for every bundle. Same, at a finer grain, for every chunk inside a doc.
The refresh is idempotent. Each artifact carries a sha256 hash of its source. When you save a doc:
- The frontend debounces for 10 s after the last save.
- It hits POST /api/embed/<id> (fire-and-forget).
- The route hashes the current source. If the hash matches the stored hash, it returns {skipped: "unchanged"} without ever calling OpenAI. Cost on a no-op save: zero.
- If the hash differs, it embeds, writes the vector + new hash, and continues.
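The hash check in the steps above can be sketched roughly as follows. This is a minimal illustration, not the actual route: the in-memory `store`, the `refresh_embedding` name, and the injected `embed` callable are all hypothetical stand-ins for the real table and the OpenAI call.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the embeddings table: artifact id -> (source hash, vector).
Store = Dict[str, Tuple[str, List[float]]]


def source_hash(text: str) -> str:
    """sha256 hex digest of the exact source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh_embedding(
    store: Store,
    artifact_id: str,
    source: str,
    embed: Callable[[str], List[float]],
) -> dict:
    """Re-embed only when the source hash changed; otherwise return without calling embed()."""
    new_hash = source_hash(source)
    stored = store.get(artifact_id)
    if stored is not None and stored[0] == new_hash:
        # No-op save: the embedding provider is never called.
        return {"skipped": "unchanged"}
    store[artifact_id] = (new_hash, embed(source))
    return {"embedded": True, "hash": new_hash}
```

Because the comparison happens before the embedding call, repeated saves of unchanged content cost one hash computation and one lookup.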
Same pattern at three levels:
| Level | Source | Trigger |
|---|---|---|
| Doc | title + body | doc save (10s debounce) |
| Chunk | each markdown heading subtree | runs alongside doc embed; only changed chunks re-embed; deleted sections pruned |
| Bundle | title + description + member doc titles | /api/embed/bundle/<id> |
Result: the schema layer is always fresh enough to retrieve from, without ever paying the full embed cost on an unchanged hub.
Layer 2: chunks by markdown structure
A doc isn't one vector. It's split on markdown headings (#, ##, ###); each chunk is the heading line plus everything until the next heading at equal-or-higher rank. Any pre-heading prelude becomes chunk 0. Sections longer than ~1800 chars are further split on blank-line boundaries, with the heading re-emitted at the top of each piece.
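A literal reading of that rule can be sketched as below. The function name is hypothetical, and the sketch makes two simplifications: it skips the ~1800-char secondary split, and it does not special-case `#` lines inside fenced code. Note that under this reading a parent section's chunk includes its subsections, which also get chunks of their own; the real chunker may handle that overlap differently.

```python
import re
from typing import List, Tuple

# Matches #, ##, or ### followed by whitespace and a title.
HEADING = re.compile(r"^(#{1,3})\s+(.*)$")


def split_by_headings(markdown: str) -> List[str]:
    """Return the prelude (if any) followed by one chunk per heading subtree."""
    lines = markdown.split("\n")
    heads: List[Tuple[int, int]] = []  # (line index, heading level)
    for i, line in enumerate(lines):
        m = HEADING.match(line)
        if m:
            heads.append((i, len(m.group(1))))

    chunks: List[str] = []
    first = heads[0][0] if heads else len(lines)
    prelude = "\n".join(lines[:first]).strip()
    if prelude:
        chunks.append(prelude)  # pre-heading prelude becomes chunk 0

    for n, (start, level) in enumerate(heads):
        end = len(lines)
        for later_start, later_level in heads[n + 1:]:
            if later_level <= level:  # equal-or-higher rank closes this chunk
                end = later_start
                break
        chunks.append("\n".join(lines[start:end]).strip())
    return chunks
```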
Each chunk carries a breadcrumb:
Memory.Wiki Memory > How the memory layer works > Layer 1: embeddings everywhere
When recall returns chunks, the breadcrumb tells the LLM (and the human reading the JSON) exactly where in the doc the snippet came from.
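One way to derive such breadcrumbs is to keep a stack of the currently open headings while scanning the doc. The sketch below assumes the " > " separator shown above and the same three heading levels; the function name is hypothetical.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,3})\s+(.*)$")


def breadcrumbs(markdown: str) -> List[str]:
    """Return one breadcrumb per heading, in document order."""
    stack: List[Tuple[int, str]] = []  # (level, title) of the open ancestors
    out: List[str] = []
    for line in markdown.split("\n"):
        m = HEADING.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2).strip()
        # Close every heading at the same or a deeper level before opening this one.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        out.append(" > ".join(t for _, t in stack))
    return out
```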
Layer 3: recall as an HTTP endpoint
The retrieval surface is a single public endpoint. No SDK, no API key, no MCP server.
```
POST memory.wiki/api/hub/<slug>/recall
body:
{
  "question": "...",
  "k": 5,
  "level": "doc" | "chunk" | "bundle",
  "hybrid": false
}
```
Three retrieval granularities:
| level | Returns | When |
|---|---|---|
| doc | Top-K whole docs | "Which docs are about X?" Lowest tokens. |
| chunk | Paragraph-level chunks with breadcrumb | Default for AI agents. Actual answering paragraph, ~10x less waste. |
| bundle | Top-K curated bundles | "Is there a reading order for this?" The bundle URL pulls full topic context. |
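Since the endpoint needs no SDK, a client can be a few lines of standard-library Python. The sketch below only assumes the request shape documented above; the helper names `build_recall_request` and `recall` are hypothetical, and the response is returned as parsed JSON without assuming its fields.

```python
import json
import urllib.request

LEVELS = ("doc", "chunk", "bundle")


def build_recall_request(slug, question, k=5, level="chunk", hybrid=False):
    """Build (but do not send) the POST request for a hub's recall endpoint."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}, got {level!r}")
    body = json.dumps(
        {"question": question, "k": k, "level": level, "hybrid": hybrid}
    ).encode("utf-8")
    return urllib.request.Request(
        f"https://memory.wiki/api/hub/{slug}/recall",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recall(slug, question, **options):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_recall_request(slug, question, **options)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call happens.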
Hybrid (BM25 + vector RRF): when hybrid: true on level: "chunk":
- Vector cosine over chunk embeddings (top k*4).
- Postgres FTS (BM25 via tsvector) over the same chunks (top k*4).
- Reciprocal Rank Fusion: score = sum( 1 / (60 + rank_in_list) ).
RRF merges ranks, not raw scores, so vector and BM25 (incompatible scales) combine cleanly with no normalization. Each result returns vector_rank, fts_rank, and rrf_score so callers can see why a chunk surfaced.
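A minimal sketch of that fusion step, assuming ranks are 1-based and that a chunk missing from one list simply contributes nothing for that list. The function name and the chunk ids in the usage note are hypothetical; the output keys mirror the vector_rank, fts_rank, and rrf_score fields named above.

```python
from typing import Dict, List, Optional, Sequence

RRF_K = 60  # the constant in score = sum(1 / (60 + rank_in_list))


def rrf_fuse(vector_ids: Sequence[str], fts_ids: Sequence[str], k: int) -> List[dict]:
    """Fuse two ranked id lists with Reciprocal Rank Fusion and keep the top k."""
    # Assumption: ranks start at 1 for the best hit in each list.
    vector_rank: Dict[str, int] = {cid: r for r, cid in enumerate(vector_ids, start=1)}
    fts_rank: Dict[str, int] = {cid: r for r, cid in enumerate(fts_ids, start=1)}

    results: List[dict] = []
    for cid in set(vector_rank) | set(fts_rank):
        score = sum(
            1.0 / (RRF_K + ranks[cid])
            for ranks in (vector_rank, fts_rank)
            if cid in ranks
        )
        v: Optional[int] = vector_rank.get(cid)
        f: Optional[int] = fts_rank.get(cid)
        results.append({"id": cid, "vector_rank": v, "fts_rank": f, "rrf_score": score})

    # Highest fused score first; ties broken by id so the order is deterministic.
    results.sort(key=lambda r: (-r["rrf_score"], r["id"]))
    return results[:k]
```

For example, `rrf_fuse(["why-now", "mcp", "intro"], ["mcp", "setup"], k=3)` puts "mcp" first: it is second in the vector list but first in the lexical list, and appearing in both lists outweighs a single first place.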
In practice: query "MCP server" has weak semantic signal (an acronym to the embedding model) but strong lexical signal (the chunk that literally mentions MCP should win). Vector-only ranks a vague "Why now?" doc first. Hybrid promotes the chunk that says "Built the MCP server" to top-1.
Layer 4: privacy filters live in SQL
Every public retrieval RPC enforces the same four privacy gates in SQL, not in the API route:
```sql
WHERE d.is_draft = FALSE
  AND d.deleted_at IS NULL
  AND d.password_hash IS NULL
  AND (d.allowed_emails IS NULL OR array_length(d.allowed_emails, 1) IS NULL)
```
Drafts, soft-deletes, password-protected, and email-restricted docs cannot leak through recall, even by accident, even if the API route has a bug. The schema is the boundary.
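To make the four gates concrete, here is the same predicate mirrored in Python. It is only an illustration for reading the SQL, since the point of the design is that the check lives in the database; the `Doc` dataclass and function name are hypothetical, with field names taken from the columns above. One detail worth spelling out: in Postgres, array_length(arr, 1) returns NULL for an empty array, so an empty allowed_emails list counts as unrestricted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Doc:
    # Hypothetical in-memory view of the columns used by the privacy gates.
    is_draft: bool = False
    deleted_at: Optional[datetime] = None
    password_hash: Optional[str] = None
    allowed_emails: Optional[List[str]] = None


def is_publicly_recallable(d: Doc) -> bool:
    """Mirror of the four SQL gates: not a draft, not deleted, no password, no email allowlist."""
    return (
        d.is_draft is False
        and d.deleted_at is None
        and d.password_hash is None
        # An empty list corresponds to array_length(...) IS NULL in Postgres.
        and (d.allowed_emails is None or len(d.allowed_emails) == 0)
    )
```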
Layer x Operation matrix
| Layer | Embed | Retrieve | Public? |
|---|---|---|---|
| Doc | auto on save (idempotent) | /recall?level=doc (vector) | yes |
| Chunk | auto alongside doc embed (per-chunk hash) | /recall?level=chunk (vector) or hybrid=true (BM25 + vector RRF) | yes |
| Bundle | /api/embed/bundle/<id> | /recall?level=bundle (vector) | yes |
| Hub graph | precomputed semantic edges (cos < 0.42) between all docs | /hub/<slug>/graph (visual) | yes |
| Cross-refs | extracted from markdown links across all public hubs | /api/social/cross-refs | yes |
Five distinct retrieval surfaces, all reading from the same embedding tables, all behind the same SQL privacy gates.
Why this is different from mem0 / OpenMemory
| | mem0 / OpenMemory | Memory.Wiki Memory |
|---|---|---|
| First user | AI agent | human (agent reads via URL) |
| Interface | MCP server / SDK | HTTP endpoint |
| Content shape | atomic memories | long-form docs + bundles |
| Visibility | black box | human-readable markdown URL |
| Sharing | personal / team | public URL, any AI can fetch |
| Vendor lock-in | MCP-compatible only | any AI that can hit a URL |
Memory.Wiki Memory isn't a backend store hidden behind an SDK. It's a public HTTP endpoint over content the user can read, edit, and paste. The retrieval pipeline below the surface is comparable to backend-only systems (chunked, hybrid, idempotent) but the surface stays human-shaped.
What's deliberately not here (yet)
- Cross-encoder reranker on top of RRF. Better ranking, at the cost of +50 to 100 ms latency. It can wait until users have hubs big enough that the gain matters.
- Per-bundle automatic re-embed on metadata edits. Doc-level is wired through auto-save; bundle-level still needs a manual /api/embed/bundle/<id> after edits. Auto-trigger on bundle PATCH is planned for the next sprint.
- Multi-vector / late interaction (ColBERT-style). Useful at scale; overkill for hubs in the hundreds.
Try it
```bash
curl -X POST https://memory.wiki/api/hub/raymindai/recall \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How does mem0 extract memories?",
    "k": 5,
    "level": "chunk",
    "hybrid": true
  }'
```
The response carries results[].markdown (the actual chunk), heading_path (breadcrumb), doc_url (link back), and rrf_score, vector_rank, and fts_rank, so you can see why each chunk surfaced.
For the wider thesis (what Memory.Wiki is and how it sits next to vendor memory layers), see How Memory.Wiki works and MWBench for the open cross-AI verification.
This page is itself a Memory.Wiki memory. Paste it into Claude or ChatGPT and they read the whole pipeline as context.