Memory.Wiki Memory

Your AI memory, owned by you, readable by any AI you paste it to.


What "memory" means here

Every chat with ChatGPT, Claude, or Cursor produces useful answers. Tomorrow they're gone. The chat is closed, the share link rots, the next session has no idea what you decided last time. Vendors have started building memory layers (ChatGPT memory, Claude projects, Cursor docs), but each one lives behind a vendor wall. They don't talk to each other, you can't share them, you can't read them outside the app, and you definitely can't paste them into the other AI tomorrow.

Memory.Wiki Memory is the inverse: a memory layer that lives at a public URL you control. Every captured answer is a markdown page anyone (you, your teammate, any AI agent) can read, and the whole hub is one URL that any AI can fetch as context.

The full architecture below is what makes that work (chunked indexing, hybrid retrieval, automatic refresh), but you only need to know the surface to use it.


The surface (what you actually do)

1. Capture

  • Paste a ChatGPT or Claude share URL into the editor.
  • Run /memory.wiki capture <title> from inside Claude Code, Cursor, Codex CLI, or Aider.
  • Drop a PDF, DOCX, or transcript file.

Each capture lands at memory.wiki/<id> as a permanent URL. No signup required.

2. Organize (or let Memory.Wiki do it)

Captures roll up into your hub at memory.wiki/hub/<you>. Bundles group docs by topic. You can curate manually, or let auto-synthesis suggest groupings as clusters form.

3. Recall

Two ways:

  • Paste the hub URL into any AI. It fetches the markdown index and loads your knowledge as context.
  • Hit the recall endpoint for question-targeted retrieval. Far fewer tokens than loading the whole hub, and much higher precision:
```bash
curl -X POST https://memory.wiki/api/hub/<slug>/recall \
  -H "Content-Type: application/json" \
  -d '{ "question": "How does mem0 extract memories?", "k": 5, "level": "chunk", "hybrid": true }'
```

That's the whole product surface. The rest of this doc is what's underneath.


How the memory layer works (architecture)

Memory.Wiki Memory is built on the same shape Karpathy described in his LLM Wiki gist: three layers (raw, wiki, schema), with the AI doing 80% of the curation work that he does by hand.

Layer 1: embeddings everywhere, idempotent

Every public doc carries a 1536-dimensional vector embedded with OpenAI text-embedding-3-small, indexed with HNSW for cosine similarity. Same for every bundle. Same, at a finer grain, for every chunk inside a doc.

The refresh is idempotent. Each artifact carries a sha256 hash of its source. When you save a doc:

  1. The frontend debounces: it waits 10 s after the last save.
  2. Hits POST /api/embed/<id> (fire-and-forget).
  3. The route hashes the current source. If the hash matches stored, it returns {skipped: "unchanged"} without ever calling OpenAI. Cost on a no-op save: zero.
  4. If the hash differs, it embeds the source and writes the new vector and hash.
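A minimal Python sketch of the hash-gated refresh in steps 3 and 4. The names (StoredDoc, source_hash, refresh_doc_embedding) are hypothetical, hashing title and body joined by a newline is an assumption about exactly what gets hashed, and `embed` stands in for a call to text-embedding-3-small. The real route persists to Postgres rather than mutating an in-memory object.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StoredDoc:
    # Hypothetical stand-in for a doc's row in the embedding table.
    source_hash: Optional[str] = None
    vector: Optional[list] = None


def source_hash(title: str, body: str) -> str:
    # Hash exactly the text that gets embedded, so any edit changes the hash.
    return hashlib.sha256(f"{title}\n{body}".encode("utf-8")).hexdigest()


def refresh_doc_embedding(doc: StoredDoc, title: str, body: str,
                          embed: Callable[[str], list]) -> dict:
    new_hash = source_hash(title, body)
    if doc.source_hash == new_hash:
        # No-op save: return before any embedding API call is made.
        return {"skipped": "unchanged"}
    doc.vector = embed(f"{title}\n{body}")  # the only paid call
    doc.source_hash = new_hash
    return {"embedded": True}
```

Because the hash is compared before the API call, a save that changes nothing costs one sha256 computation and no embedding request.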

Same pattern at three levels:

| Level | Source | Trigger |
|---|---|---|
| Doc | title + body | doc save (10 s debounce) |
| Chunk | each markdown heading subtree | runs alongside the doc embed; only changed chunks re-embed; deleted sections are pruned |
| Bundle | title + description + member doc titles | /api/embed/bundle/<id> |
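At the chunk level, "only changed chunks re-embed; deleted sections pruned" reduces to a diff between the stored hashes and freshly computed ones. A sketch, assuming each chunk has a stable key (for example its heading path plus a piece index; that keying scheme and the function names are hypothetical):

```python
import hashlib


def chunk_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_chunk_refresh(stored: dict, current: dict) -> tuple:
    """stored: chunk_key -> hash already in the table.
    current: chunk_key -> chunk text from the freshly saved doc.
    Returns (keys_to_embed, keys_to_delete)."""
    # New chunks and chunks whose text changed need a fresh embedding.
    to_embed = [key for key, text in current.items()
                if stored.get(key) != chunk_hash(text)]
    # Stored chunks with no counterpart in the new doc belong to deleted sections.
    to_delete = [key for key in stored if key not in current]
    return to_embed, to_delete
```

Unchanged chunks fall through both lists, so editing one section of a long doc re-embeds only that section's chunks.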

Result: the schema layer is always fresh enough to retrieve from, and re-saving an unchanged hub never pays embedding cost.

Layer 2: chunks by markdown structure

A doc isn't one vector. It's split on markdown headings (#, ##, ###); each chunk is the heading line plus everything until the next heading of equal or higher rank (the same number of # or fewer), so a section's chunk includes its subsections. The pre-heading prelude becomes chunk 0. Sections longer than ~1800 chars are further split on blank-line boundaries, with the heading re-emitted at the top of each piece.
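A Python sketch of the splitting rule as written. Assumptions: only ATX headings with one to three # count, the ~1800-character cutoff is treated as exact, and fenced code blocks are not special-cased (a `# comment` inside a fence would be mistaken for a heading). Read literally, "until the next heading of equal or higher rank" means a ## section includes its ### subsections, so a parent chunk overlaps its children; the sketch keeps that behavior. Function names are hypothetical.

```python
import re

HEADING = re.compile(r"^(#{1,3})\s+\S")  # assumption: h1-h3 only
MAX_CHARS = 1800  # "~1800 chars"; the exact cutoff is an assumption


def split_long(heading: str, body: str, max_chars: int = MAX_CHARS) -> list:
    """One chunk, or several if too long: split on blank lines, re-emit the heading."""
    prefix = f"{heading}\n" if heading else ""
    whole = (prefix + body).strip()
    if len(whole) <= max_chars:
        return [whole] if whole else []
    pieces, current = [], []
    for para in re.split(r"\n\s*\n", body.strip()):
        candidate = "\n\n".join(current + [para])
        if current and len(prefix) + len(candidate) > max_chars:
            pieces.append(prefix + "\n\n".join(current))  # flush before overflowing
            current = [para]
        else:
            current.append(para)
    if current:
        pieces.append(prefix + "\n\n".join(current))
    return pieces


def chunk_markdown(md: str, max_chars: int = MAX_CHARS) -> list:
    lines = md.splitlines()
    # (line index, heading level) for every h1-h3 line.
    heads = [(i, len(line) - len(line.lstrip("#")))
             for i, line in enumerate(lines) if HEADING.match(line)]
    first = heads[0][0] if heads else len(lines)
    # Pre-heading prelude becomes chunk 0 (if there is one).
    chunks = split_long("", "\n".join(lines[:first]), max_chars)
    for n, (start, level) in enumerate(heads):
        # Section ends at the next heading with level <= this one (equal or higher rank).
        end = next((s for s, lv in heads[n + 1:] if lv <= level), len(lines))
        chunks += split_long(lines[start], "\n".join(lines[start + 1:end]), max_chars)
    return chunks
```

For example, a doc with an intro line, `# A`, `## B`, and `# C` yields four chunks: the intro, A including B, B on its own, and C.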

Each chunk carries a breadcrumb:

Memory.Wiki Memory > How the memory layer works > Layer 1: embeddings everywhere

When recall returns chunks, the breadcrumb tells the LLM (and the human reading the JSON) exactly where in the doc the snippet came from.
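Breadcrumbs fall out of a stack walk over the doc's headings: before pushing a heading, pop everything at the same or a deeper level, since those are siblings or their descendants rather than ancestors. A sketch, assuming the doc title is supplied separately as the root and headings arrive as (level, text) pairs in document order; the function name is hypothetical.

```python
def breadcrumbs(doc_title: str, headings: list) -> list:
    """Return one 'Title > Section > Subsection' breadcrumb per heading."""
    stack = []  # (level, text) of the current ancestor chain
    out = []
    for level, text in headings:
        # Same-or-deeper headings on the stack are no longer ancestors.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, text))
        out.append(" > ".join([doc_title] + [t for _, t in stack]))
    return out
```

Applied to this page's own outline, the third ### heading under "How the memory layer works" gets the same two-level prefix as the example breadcrumb above.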

Layer 3: recall as an HTTP endpoint

The retrieval surface is a single public endpoint. No SDK, no API key, no MCP server.

POST https://memory.wiki/api/hub/<slug>/recall
Body: { "question": "...", "k": 5, "level": "doc" | "chunk" | "bundle", "hybrid": false }
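The same request in Python, using only the standard library. build_recall_request is a hypothetical helper: its validation reflects the three documented level values, and its defaults (k=5, hybrid=False) mirror the body shown above rather than any documented server default.

```python
import json
import urllib.request
from urllib.parse import quote

LEVELS = {"doc", "chunk", "bundle"}


def build_recall_request(slug: str, question: str, level: str,
                         k: int = 5, hybrid: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST to the hub's recall endpoint."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {sorted(LEVELS)}")
    if k < 1:
        raise ValueError("k must be a positive integer")
    body = {"question": question, "k": k, "level": level, "hybrid": hybrid}
    return urllib.request.Request(
        url=f"https://memory.wiki/api/hub/{quote(slug, safe='')}/recall",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is `urllib.request.urlopen(req)`, and `json.load` on the response gives the result object. No API key or SDK is involved, matching the endpoint description.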

Three retrieval granularities:

| level | Returns | When |
|---|---|---|
| doc | Top-K whole docs | "Which docs are about X?" Lowest tokens. |
| chunk | Paragraph-level chunks with breadcrumb | Default for AI agents. The actual answering paragraph, ~10x less waste. |
| bundle | Top-K curated bundles | "Is there a reading order for this?" The bundle URL pulls full topic context. |

Hybrid retrieval (BM25 + vector, fused with RRF) applies when hybrid: true is sent with level: "chunk":

  1. Vector cosine over chunk embeddings (top k*4).
  2. Postgres FTS (BM25 via tsvector) over the same chunks (top k*4).
  3. Reciprocal Rank Fusion: each chunk's rrf_score = sum, over the lists it appears in, of 1 / (60 + its rank in that list).

RRF merges ranks, not raw scores, so vector and BM25 (incompatible scales) combine cleanly with no normalization. Each result returns vector_rank, fts_rank, and rrf_score so callers can see why a chunk surfaced.
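A sketch of the fusion step in Python. It assumes 1-based ranks and takes the two candidate lists as chunk IDs, best first, each already cut to the top k*4. rrf_fuse is a hypothetical name; the returned vector_rank, fts_rank, and rrf_score fields match the ones described above.

```python
def rrf_fuse(vector_ids: list, fts_ids: list, k: int = 5, c: int = 60) -> list:
    """Reciprocal Rank Fusion of two ranked lists (best first)."""
    vector_rank = {cid: r for r, cid in enumerate(vector_ids, start=1)}
    fts_rank = {cid: r for r, cid in enumerate(fts_ids, start=1)}
    results = []
    for cid in vector_rank.keys() | fts_rank.keys():
        # Only ranks are used, never raw cosine or FTS scores, so no normalization.
        score = sum(1 / (c + ranks[cid])
                    for ranks in (vector_rank, fts_rank) if cid in ranks)
        results.append({"id": cid,
                        "vector_rank": vector_rank.get(cid),  # None if absent
                        "fts_rank": fts_rank.get(cid),
                        "rrf_score": score})
    # Highest fused score first; ties broken by ID for a stable order.
    results.sort(key=lambda r: (-r["rrf_score"], r["id"]))
    return results[:k]
```

With hypothetical IDs, a chunk ranked 2nd by vector and 1st by FTS (1/62 + 1/61) outranks one ranked 1st by vector alone (1/61), because appearing in both lists adds a second term.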

In practice: the query "MCP server" has a weak semantic signal (to the embedding model it's just an acronym) but a strong lexical signal (the chunk that literally mentions MCP should win). Vector-only ranks a vague "Why now?" doc first. Hybrid promotes the chunk that says "Built the MCP server" to top-1.

Layer 4: privacy filters live in SQL

Every public retrieval RPC enforces the same four privacy gates in SQL, not in the API route:

```sql
WHERE d.is_draft = FALSE
  AND d.deleted_at IS NULL
  AND d.password_hash IS NULL
  AND (d.allowed_emails IS NULL OR array_length(d.allowed_emails, 1) IS NULL)
```

Drafts, soft-deletes, password-protected, and email-restricted docs cannot leak through recall, even by accident, even if the API route has a bug. The schema is the boundary.
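For application-side tests, the same four gates can be mirrored as a Python predicate, for example as an oracle that asserts no recall result ever fails it. DocRow and passes_privacy_gates are hypothetical test helpers, not a replacement for the SQL filter. The sketch makes one Postgres detail explicit: array_length on an empty array returns NULL, so an empty allowed_emails list counts as unrestricted, the same as NULL.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DocRow:
    # Hypothetical mirror of the columns the SQL gate reads.
    is_draft: Optional[bool]
    deleted_at: Optional[datetime] = None
    password_hash: Optional[str] = None
    allowed_emails: Optional[list] = None


def passes_privacy_gates(d: DocRow) -> bool:
    return (
        d.is_draft is False           # SQL `= FALSE` also rejects NULL
        and d.deleted_at is None      # not soft-deleted
        and d.password_hash is None   # not password-protected
        and not d.allowed_emails      # NULL or empty array: no email restriction
    )
```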


Layer × Operation matrix

| | Embed | Retrieve | Public? |
|---|---|---|---|
| Doc | auto on save (idempotent) | /recall with level: "doc" (vector) | yes |
| Chunk | auto alongside doc embed (per-chunk hash) | /recall with level: "chunk" (vector), or with hybrid: true (BM25 + vector RRF) | yes |
| Bundle | /api/embed/bundle/<id> | /recall with level: "bundle" (vector) | yes |
| Hub graph | precomputed semantic edges between all docs (cosine distance < 0.42) | /hub/<slug>/graph (visual) | yes |
| Cross-refs | extracted from markdown links across all public hubs | /api/social/cross-refs | yes |
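A sketch of the hub-graph precomputation. The 0.42 threshold only makes sense as a cosine *distance* (1 - similarity), as reported by pgvector's `<=>` operator, so that is the assumption here: two docs get an edge when their distance is below 0.42. The all-pairs loop is a deliberate simplification for hubs in the hundreds; function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def semantic_edges(vectors: dict, max_distance: float = 0.42) -> list:
    """All doc pairs closer than max_distance, as (doc_a, doc_b, distance)."""
    edges = []
    # O(n^2) pairs: fine for hundreds of docs, would need ANN search at scale.
    for (id_a, va), (id_b, vb) in combinations(sorted(vectors.items()), 2):
        d = cosine_distance(va, vb)
        if d < max_distance:
            edges.append((id_a, id_b, d))
    return edges
```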

Five distinct retrieval surfaces, all reading from the same embedding tables, all behind the same SQL privacy gates.


Why this is different from mem0 / OpenMemory

| | mem0 / OpenMemory | Memory.Wiki Memory |
|---|---|---|
| First user | AI agent | human (agent reads via URL) |
| Interface | MCP server / SDK | HTTP endpoint |
| Content shape | atomic memories | long-form docs + bundles |
| Visibility | black box | human-readable markdown URL |
| Sharing | personal / team | public URL, any AI can fetch |
| Vendor lock-in | MCP-compatible only | any AI that can hit a URL |

Memory.Wiki Memory isn't a backend store hidden behind an SDK. It's a public HTTP endpoint over content the user can read, edit, and paste. The retrieval pipeline below the surface is comparable to backend-only systems (chunked, hybrid, idempotent) but the surface stays human-shaped.


What's deliberately not here (yet)

  • Cross-encoder reranker on top of RRF. Better ranking, at a cost of 50 to 100 ms extra latency. Wait until users have hubs big enough that the gain matters.
  • Per-bundle automatic re-embed on metadata edits. Doc-level is wired through auto-save; bundle-level still needs a manual /api/embed/bundle/<id> call after edits. Auto-trigger on bundle PATCH is planned for the next sprint.
  • Multi-vector / late interaction (ColBERT-style). Useful at scale; overkill for hubs in the hundreds.

Try it

```bash
curl -X POST https://memory.wiki/api/hub/raymindai/recall \
  -H "Content-Type: application/json" \
  -d '{ "question": "How does mem0 extract memories?", "k": 5, "level": "chunk", "hybrid": true }'
```

The response carries results[].markdown (the actual chunk), heading_path (breadcrumb), doc_url (link back to the doc), rrf_score, vector_rank, and fts_rank, so you can see why each chunk surfaced.
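A small Python sketch that turns a recall response into one explanation line per result, using only the fields listed above. The document doesn't say whether heading_path is a string or a list of headings, so the hypothetical explain_results helper accepts either; the sample response in the tests is made up.

```python
def explain_results(response: dict) -> list:
    """Summarize where each result came from and why it surfaced."""
    lines = []
    for r in response.get("results", []):
        path = r.get("heading_path")
        if isinstance(path, list):  # format unknown: accept a list or a string
            path = " > ".join(path)
        vr, fr = r.get("vector_rank"), r.get("fts_rank")
        if vr is not None and fr is not None:
            source = "vector + lexical"
        elif fr is not None:
            source = "lexical only"
        else:
            source = "vector only"
        lines.append(f"{path} <{r.get('doc_url')}> rrf={r.get('rrf_score')} "
                     f"vector_rank={vr} fts_rank={fr} ({source})")
    return lines
```

A result with an fts_rank but no vector_rank is the "MCP server" case: it surfaced on the literal match alone.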

For the wider thesis (what Memory.Wiki is and how it sits next to vendor memory layers), see How Memory.Wiki works; for the open cross-AI verification, see MWBench.


This page is itself a Memory.Wiki memory. Paste it into Claude or ChatGPT and they read the whole pipeline as context.