Memory.Wiki Memory

Your AI memory, owned by you, readable by any AI you paste it to.

What "memory" means here

Every chat with ChatGPT, Claude, or Cursor produces useful answers. Tomorrow they're gone. The chat is closed, the share link rots, the next session has no idea what you decided last time. Vendors have started building memory layers (ChatGPT memory, Claude projects, Cursor docs) but each one lives behind a vendor wall. They don't talk to each other, you can't share them, you can't read them outside the app, and you definitely can't paste them into the other AI tomorrow.

Memory.Wiki Memory is the inverse: a memory layer that lives at a public URL you control. Every captured answer is a markdown page anyone (you, your teammate, any AI agent) can read, and the whole hub is one URL that any AI can fetch as context.

The full architecture below is what makes that work (chunked indexing, hybrid retrieval, automatic refresh), but you only need to know the surface to use it.

The surface (what you actually do)

1. Capture

Paste a ChatGPT or Claude share URL into the editor.
Run /memory.wiki capture <title> from inside Claude Code, Cursor, Codex CLI, or Aider.
Drop a PDF, DOCX, or transcript file.

Each capture lands at memory.wiki/<id> as a permanent URL. No signup required.

2. Organize (or let Memory.Wiki do it)

Captures roll up into your hub at memory.wiki/hub/<you>. Bundles group docs by topic. You can curate manually, or let auto-synthesis suggest groupings as the cluster forms.

3. Recall

Two ways:

Paste the hub URL into any AI. It fetches the markdown index and loads your knowledge as context.
Hit the recall endpoint for question-targeted retrieval. Far fewer tokens, much higher precision:

bash
curl -X POST https://memory.wiki/api/hub/<slug>/recall \
  -H "Content-Type: application/json" \
  -d '{
        "question": "How does mem0 extract memories?",
        "k": 5,
        "level": "chunk",
        "hybrid": true
      }'

That's the whole product surface. The rest of this doc is what's underneath.

How the memory layer works (architecture)

Memory.Wiki Memory is built on the same shape Karpathy described in his LLM Wiki gist: three layers (raw, wiki, schema), with the AI doing 80% of the curation work that he does by hand.

Layer 1: embeddings everywhere, idempotent

Every public doc carries a 1536-dimensional vector embedded with OpenAI text-embedding-3-small, indexed with HNSW for cosine similarity. Same for every bundle. Same, at a finer grain, for every chunk inside a doc.

The refresh is idempotent. Each artifact carries a sha256 hash of its source. When you save a doc:

Frontend debounces 10 s after the last save.
Hits POST /api/embed/<id> (fire-and-forget).
The route hashes the current source. If the hash matches the stored one, it returns {skipped: "unchanged"} without ever calling OpenAI. Cost on a no-op save: zero.
If the hash differs, it embeds the source, writes the vector and the new hash, and continues.
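The save path above can be sketched in a few lines. This is a minimal illustration, not the actual route: the in-memory `store`, the `refresh_embedding` name, the injected `embed` callable, and the `{"embedded": ...}` return shape are all assumptions. Only the sha256 comparison and the `{skipped: "unchanged"}` result come from the description.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for the embeddings table: id -> (hash, vector).
Store = Dict[str, Tuple[str, List[float]]]


def source_hash(text: str) -> str:
    """sha256 of the exact source text that would be embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh_embedding(
    store: Store,
    artifact_id: str,
    source: str,
    embed: Callable[[str], List[float]],
) -> dict:
    """Embed only when the source changed since the last stored hash."""
    new_hash = source_hash(source)
    stored = store.get(artifact_id)
    if stored is not None and stored[0] == new_hash:
        # No-op save: the embedding provider is never called.
        return {"skipped": "unchanged"}
    vector = embed(source)
    store[artifact_id] = (new_hash, vector)
    # Assumed response shape for the "changed" branch.
    return {"embedded": True, "hash": new_hash}
```

Because the hash check runs before the `embed` call, saving the same content twice costs one embedding, not two.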

Same pattern at three levels:

| Level | Source | Trigger |
| --- | --- | --- |
| Doc | title + body | doc save (10s debounce) |
| Chunk | each markdown heading subtree | runs alongside doc embed; only changed chunks re-embed; deleted sections pruned |
| Bundle | title + description + member doc titles | `/api/embed/bundle/<id>` |

Result: the schema layer is always fresh enough to retrieve from, without ever paying full embed cost on an unchanged hub.
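The chunk row (only changed chunks re-embed, deleted sections pruned) amounts to a diff between stored hashes and the current chunks. A rough sketch, assuming chunks are keyed by some stable identifier such as their heading path; the key scheme and the `plan_chunk_refresh` name are hypothetical, not taken from the real implementation:

```python
import hashlib
from typing import Dict, List, NamedTuple


class ChunkPlan(NamedTuple):
    to_embed: List[str]   # chunk keys that are new or whose text changed
    to_prune: List[str]   # chunk keys stored earlier but no longer in the doc
    unchanged: List[str]  # chunk keys whose hash still matches


def plan_chunk_refresh(
    stored_hashes: Dict[str, str],
    current_chunks: Dict[str, str],
) -> ChunkPlan:
    """Compare stored sha256 hashes against the doc's current chunk texts."""
    to_embed: List[str] = []
    unchanged: List[str] = []
    for key, text in current_chunks.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if stored_hashes.get(key) == digest:
            unchanged.append(key)
        else:
            to_embed.append(key)
    to_prune = [key for key in stored_hashes if key not in current_chunks]
    return ChunkPlan(to_embed, to_prune, unchanged)
```

Editing one section of a long doc then yields a plan with a single entry in `to_embed`, which is where the savings on large hubs would come from.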

Layer 2: chunks by markdown structure

A doc isn't one vector. It's split on markdown headings (#, ##, ###); each chunk is the heading line plus everything until the next heading at equal-or-higher rank. Pre-heading prelude becomes chunk 0. Sections longer than ~1800 chars further split on blank-line boundaries with the heading re-emitted at the top of each piece.
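Read literally, that rule looks roughly like the sketch below. It is an interpretation, not the production chunker: function names are made up, headings inside fenced code blocks are not special-cased, and a parent section's chunk includes its subsections (which are also emitted as their own chunks), because that is what "until the next heading at equal-or-higher rank" implies.

```python
import re
from typing import List

# Simplification: any line starting with 1-3 '#' plus whitespace counts as a
# heading, including lines that sit inside fenced code blocks.
HEADING = re.compile(r"^(#{1,3})\s+\S")
MAX_CHARS = 1800  # the "~1800 chars" figure from the text


def heading_level(line: str) -> int:
    """Return 1-3 for '#', '##', '###' headings, else 0."""
    match = HEADING.match(line)
    return len(match.group(1)) if match else 0


def split_long(section: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split an oversized section on blank lines, repeating its heading."""
    if len(section) <= max_chars:
        return [section]
    heading, _, body = section.partition("\n")
    paragraphs = [p.strip("\n") for p in re.split(r"\n\s*\n", body) if p.strip()]
    pieces: List[str] = []
    current = heading
    for paragraph in paragraphs:
        candidate = current + "\n\n" + paragraph
        if len(candidate) > max_chars and current != heading:
            pieces.append(current)
            current = heading + "\n\n" + paragraph
        else:
            # A single paragraph longer than max_chars stays in one piece.
            current = candidate
    pieces.append(current)
    return pieces


def chunk_markdown(text: str) -> List[str]:
    """Prelude first, then one chunk (or several pieces) per heading."""
    lines = text.split("\n")
    heads = [(i, heading_level(line)) for i, line in enumerate(lines)]
    heads = [(i, level) for i, level in heads if level > 0]
    chunks: List[str] = []
    first = heads[0][0] if heads else len(lines)
    prelude = "\n".join(lines[:first]).strip()
    if prelude:
        chunks.append(prelude)  # pre-heading prelude becomes chunk 0
    for n, (start, level) in enumerate(heads):
        end = len(lines)
        for later_start, later_level in heads[n + 1:]:
            if later_level <= level:  # equal-or-higher rank closes the chunk
                end = later_start
                break
        section = "\n".join(lines[start:end]).strip()
        chunks.extend(split_long(section))
    return chunks
```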

Each chunk carries a breadcrumb:


Memory.Wiki Memory > How the memory layer works > Layer 1: embeddings everywhere

When recall returns chunks, the breadcrumb tells the LLM (and the human reading the JSON) exactly where in the doc the snippet came from.
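One plausible way to derive such a breadcrumb is to keep a stack of open headings while walking the doc. The sketch below assumes the " > " separator from the example and standard `#` heading syntax; the `breadcrumbs` function itself is illustrative, not the actual code.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def breadcrumbs(markdown: str, separator: str = " > ") -> List[Tuple[str, str]]:
    """Return (heading title, breadcrumb) for every heading, in order."""
    stack: List[Tuple[int, str]] = []  # (level, title) of the open ancestors
    out: List[Tuple[str, str]] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        # Close every heading at the same or a deeper level first.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        out.append((title, separator.join(t for _, t in stack)))
    return out
```

Storing the breadcrumb next to each chunk means a retrieved snippet stays interpretable even when it is read without the rest of the doc.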

Layer 3: recall as an HTTP endpoint

The retrieval surface is a single public endpoint. No SDK, no API key, no MCP server.


POST memory.wiki/api/hub/<slug>/recall
body:
  {
    "question": "...",
    "k": 5,
    "level": "doc" | "chunk" | "bundle",
    "hybrid": false
  }

Three retrieval granularities:

| level | Returns | When |
| --- | --- | --- |
| `doc` | Top-K whole docs | "Which docs are about X?" Lowest tokens. |
| `chunk` | Paragraph-level chunks with breadcrumb | Default for AI agents. Actual answering paragraph, ~10x less waste. |
| `bundle` | Top-K curated bundles | "Is there a reading order for this?" The bundle URL pulls full topic context. |
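Since the endpoint needs no SDK, a client is little more than a JSON POST. A minimal standard-library sketch follows; the function names and the client-side `level` check are assumptions added for illustration, and the response is assumed to be JSON as the text suggests.

```python
import json
import urllib.parse
import urllib.request

VALID_LEVELS = ("doc", "chunk", "bundle")


def build_recall_request(
    slug: str,
    question: str,
    k: int = 5,
    level: str = "chunk",
    hybrid: bool = False,
    base_url: str = "https://memory.wiki",
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a hub's recall endpoint."""
    if level not in VALID_LEVELS:
        # Client-side guard only; the server's own validation is not documented here.
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    payload = {"question": question, "k": k, "level": level, "hybrid": hybrid}
    url = f"{base_url}/api/hub/{urllib.parse.quote(slug, safe='')}/recall"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recall(slug: str, question: str, **options) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    request = build_recall_request(slug, question, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `recall("raymindai", "How does mem0 extract memories?", level="chunk", hybrid=True)` would issue the same request as the curl command on this page.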

Hybrid (BM25 + vector RRF): when hybrid is true and level is "chunk", recall runs three steps:

Vector cosine over chunk embeddings (top k*4).
Postgres FTS (BM25 via tsvector) over the same chunks (top k*4).
Reciprocal Rank Fusion: score = sum( 1 / (60 + rank_in_list) ).

RRF merges ranks, not raw scores, so vector and BM25 (incompatible scales) combine cleanly with no normalization. Each result returns vector_rank, fts_rank, and rrf_score so callers can see why a chunk surfaced.
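The fusion step is small enough to show in full. The sketch below assumes 1-based ranks and two best-first lists of unique chunk ids; the output field names mirror the ones the endpoint is described as returning, but the function is an illustration, not the actual RPC.

```python
from typing import Dict, List

RRF_K = 60  # the constant in score = sum(1 / (60 + rank_in_list))


def rrf_fuse(vector_ids: List[str], fts_ids: List[str], k: int) -> List[dict]:
    """Fuse two ranked id lists (best first) and return the top k rows."""
    rows: Dict[str, dict] = {}
    for field, ranked in (("vector_rank", vector_ids), ("fts_rank", fts_ids)):
        for rank, chunk_id in enumerate(ranked, start=1):  # assumed 1-based
            row = rows.setdefault(
                chunk_id,
                {"id": chunk_id, "vector_rank": None, "fts_rank": None, "rrf_score": 0.0},
            )
            row[field] = rank
            # Only the rank matters; raw cosine or lexical scores never enter.
            row["rrf_score"] += 1.0 / (RRF_K + rank)
    fused = sorted(rows.values(), key=lambda r: r["rrf_score"], reverse=True)
    return fused[:k]
```

A chunk that appears in both lists collects two terms, so it can outrank a chunk that is first in only one list, which is the behavior the "MCP server" example relies on.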

In practice: query "MCP server" has weak semantic signal (an acronym to the embedding model) but strong lexical signal (the chunk that literally mentions MCP should win). Vector-only ranks a vague "Why now?" doc first. Hybrid promotes the chunk that says "Built the MCP server" to top-1.

Layer 4: privacy filters live in SQL

Every public retrieval RPC enforces the same four privacy gates in SQL, not in the API route:

sql
WHERE d.is_draft = FALSE
  AND d.deleted_at IS NULL
  AND d.password_hash IS NULL
  AND (d.allowed_emails IS NULL OR array_length(d.allowed_emails, 1) IS NULL)

Drafts, soft-deletes, password-protected, and email-restricted docs cannot leak through recall, even by accident, even if the API route has a bug. The schema is the boundary.
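To make the truth table of those four gates explicit (for example in a unit test that checks fixtures against the SQL), the same predicate can be mirrored in a few lines. This is only a reading aid: enforcement stays in SQL as described, and the `DocRow` class is a hypothetical stand-in for a row of the docs table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class DocRow:
    """Hypothetical in-memory copy of the four columns the gates read."""
    is_draft: Optional[bool] = False
    deleted_at: Optional[datetime] = None
    password_hash: Optional[str] = None
    allowed_emails: Optional[List[str]] = None


def is_publicly_retrievable(doc: DocRow) -> bool:
    """Mirror of the WHERE clause; True only if all four gates pass."""
    return (
        doc.is_draft is False          # a NULL is_draft fails, as in SQL
        and doc.deleted_at is None
        and doc.password_hash is None
        # In Postgres, array_length of an empty array is NULL, so both NULL
        # and '{}' mean "no email restriction".
        and not doc.allowed_emails
    )
```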

Layer x Operation matrix

| Layer | Embed | Retrieve | Public? |
| --- | --- | --- | --- |
| Doc | auto on save (idempotent) | `/recall?level=doc` (vector) | yes |
| Chunk | auto alongside doc embed (per-chunk hash) | `/recall?level=chunk` (vector) or `hybrid=true` (BM25 + vector RRF) | yes |
| Bundle | `/api/embed/bundle/<id>` | `/recall?level=bundle` (vector) | yes |
| Hub graph | precomputed semantic edges (cos > 0.42) between all docs | `/hub/<slug>/graph` (visual) | yes |
| Cross-refs | extracted from markdown links across all public hubs | `/api/social/cross-refs` | yes |

Five distinct retrieval surfaces, all reading from the same embedding tables, all behind the same SQL privacy gates.

Why this is different from mem0 / OpenMemory


                     mem0 / OpenMemory     Memory.Wiki Memory
First user           AI agent              human (agent reads via URL)
Interface            MCP server / SDK      HTTP endpoint
Content shape        atomic memories       long-form docs + bundles
Visibility           black box             human-readable markdown URL
Sharing              personal / team       public URL, any AI can fetch
Vendor lock-in       MCP-compatible only   any AI that can hit a URL

Memory.Wiki Memory isn't a backend store hidden behind an SDK. It's a public HTTP endpoint over content the user can read, edit, and paste. The retrieval pipeline below the surface is comparable to backend-only systems (chunked, hybrid, idempotent) but the surface stays human-shaped.

What's deliberately not here (yet)

Cross-encoder reranker on top of RRF. Better ranking, at a cost of +50 to 100 ms latency. Deferred until users have hubs big enough that the gain matters.
Per-bundle automatic re-embed on metadata edits. Doc-level is wired through auto-save; bundle-level still needs a manual call to /api/embed/bundle/<id> after edits. Auto-trigger on bundle PATCH is planned for the next sprint.
Multi-vector / late interaction (ColBERT-style). Useful at scale; overkill for hubs in the hundreds.

Try it

bash
curl -X POST https://memory.wiki/api/hub/raymindai/recall \
  -H "Content-Type: application/json" \
  -d '{
        "question": "How does mem0 extract memories?",
        "k": 5,
        "level": "chunk",
        "hybrid": true
      }'

The response carries results[].markdown (the actual chunk), heading_path (breadcrumb), doc_url (link back), plus rrf_score, vector_rank, and fts_rank, so you can see why each chunk surfaced.
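Turning that response into a prompt-ready context block might look like the sketch below. It assumes `heading_path` arrives as a single string and that results are already ordered best first; the `results_to_context` helper and its character budget are illustrative, not part of the API.

```python
from typing import List


def results_to_context(response: dict, max_chars: int = 4000) -> str:
    """Join recalled chunks into one context string, best result first."""
    blocks: List[str] = []
    used = 0
    for result in response.get("results", []):
        # Breadcrumb and source link on the first line, chunk text below.
        block = f"[{result['heading_path']}]({result['doc_url']})\n{result['markdown']}"
        if blocks and used + len(block) > max_chars:
            break  # keep at least one chunk, then respect the budget
        blocks.append(block)
        used += len(block)
    return "\n\n---\n\n".join(blocks)
```

Keeping the breadcrumb and `doc_url` next to each chunk lets the model cite where a snippet came from instead of treating the context as anonymous text.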

For the wider thesis (what Memory.Wiki is and how it sits next to vendor memory layers), see How Memory.Wiki works and MWBench for the open cross-AI verification.

This page is itself a Memory.Wiki memory. Paste it into Claude or ChatGPT and they read the whole pipeline as context.