---
title: "Memory.Wiki Memory"
url: https://memory.wiki/mdfy-memory
updated: 2026-05-24T17:45:03.775Z
hub: https://memory.wiki/hub/raymindai
bundle_count: 1
concept_count: 12
source: "mdfy-explainer"
---
# Memory.Wiki Memory

*Your AI memory, owned by you, readable by any AI you paste it to.*

---

## What "memory" means here

Every chat with ChatGPT, Claude, or Cursor produces useful answers. Tomorrow they're gone. The chat is closed, the share link rots, the next session has no idea what you decided last time. Vendors have started building memory layers (ChatGPT memory, Claude projects, Cursor docs) but each one lives behind a vendor wall. They don't talk to each other, you can't share them, you can't read them outside the app, and you definitely can't paste them into the *other* AI tomorrow.

Memory.Wiki Memory is the inverse: a memory layer that lives at a public URL you control. Every captured answer is a markdown page anyone (you, your teammate, any AI agent) can read, and the whole hub is one URL that any AI can fetch as context.

The full architecture below is what makes that work (chunked indexing, hybrid retrieval, automatic refresh), but you only need to know the surface to use it.

---

## The surface (what you actually do)

### 1. Capture

- Paste a ChatGPT or Claude share URL into the editor.
- `/memory.wiki capture <title>` from inside Claude Code, Cursor, Codex CLI, or Aider.
- Drop a PDF, DOCX, or transcript file.

Each capture lands at `memory.wiki/<id>` as a permanent URL. No signup required.

### 2. Organize (or let Memory.Wiki do it)

Captures roll up into your hub at `memory.wiki/hub/<you>`. Bundles group docs by topic. You can curate manually, or let auto-synthesis suggest groupings as the cluster forms.

### 3. Recall

Two ways:

- **Paste the hub URL** into any AI. It fetches the markdown index and loads your knowledge as context.
- **Hit the recall endpoint** for question-targeted retrieval. Far fewer tokens, much higher precision:

```bash
curl -X POST https://memory.wiki/api/hub/<slug>/recall \
  -H "Content-Type: application/json" \
  -d '{
        "question": "How does mem0 extract memories?",
        "k": 5,
        "level": "chunk",
        "hybrid": true
      }'
```

That's the whole product surface. The rest of this doc is what's underneath.

---

## How the memory layer works (architecture)

Memory.Wiki Memory is built on the same shape Karpathy described in his [LLM Wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f): three layers (raw, wiki, schema), with the AI doing 80% of the curation work that he does by hand.

### Layer 1: embeddings everywhere, idempotent

Every public doc carries a 1536-dimensional vector embedded with OpenAI `text-embedding-3-small`, indexed with HNSW for cosine similarity. Same for every bundle. Same, at a finer grain, for every chunk inside a doc.

The refresh is **idempotent**. Each artifact carries a sha256 hash of its source. When you save a doc:

1. Frontend debounces 10 s after the last save.
2. Hits `POST /api/embed/<id>` (fire-and-forget).
3. The route hashes the current source. If the hash matches the stored hash, it returns `{skipped: "unchanged"}` without ever calling OpenAI. Cost on a no-op save: zero.
4. If the hash differs, it embeds the source, writes the vector and the new hash, and continues.
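
The hash check in steps 3 and 4 can be sketched in a few lines. This is a minimal illustration, not the actual route: the in-memory `store`, the `embed_if_changed` name, the `{"embedded": ...}` return shape, and the `fake_embed` stand-in for the OpenAI call are all assumptions made for the example. Only the `{skipped: "unchanged"}` response comes from the description above.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory store: artifact id -> (source hash, vector).
Store = Dict[str, Tuple[str, List[float]]]


def embed_if_changed(
    store: Store,
    artifact_id: str,
    source: str,
    embed: Callable[[str], List[float]],
) -> dict:
    """Re-embed only when the sha256 of the source differs from the stored hash."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    stored = store.get(artifact_id)
    if stored is not None and stored[0] == digest:
        # No-op save: the embedding function is never called.
        return {"skipped": "unchanged"}
    store[artifact_id] = (digest, embed(source))
    return {"embedded": True, "hash": digest}


calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    """Stand-in for the real embedding call; records each invocation."""
    calls.append(text)
    return [float(len(text))]


store: Store = {}
first = embed_if_changed(store, "doc-1", "# Title\nbody", fake_embed)
second = embed_if_changed(store, "doc-1", "# Title\nbody", fake_embed)
third = embed_if_changed(store, "doc-1", "# Title\nbody, edited", fake_embed)
```

Saving the same content twice costs one embed call; only the edited save triggers a second one.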

Same pattern at three levels:

| Level | Source | Trigger |
|---|---|---|
| **Doc** | title + body | doc save (10s debounce) |
| **Chunk** | each markdown heading subtree | runs alongside doc embed; only changed chunks re-embed; deleted sections pruned |
| **Bundle** | title + description + member doc titles | `/api/embed/bundle/<id>` |

Result: the schema layer is always *fresh enough* to retrieve from, without ever paying full embed cost on an unchanged hub.

### Layer 2: chunks by markdown structure

A doc isn't one vector. It's split on markdown headings (`#`, `##`, `###`); each chunk is the heading line plus everything until the next heading at equal-or-higher rank. Any prelude before the first heading becomes chunk 0. Sections longer than ~1800 chars are further split on blank-line boundaries, with the heading re-emitted at the top of each piece.
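
A rough sketch of that splitting rule, read literally. The function names, the regex, and the handling of edge cases are assumptions for illustration; the real chunker may differ. In particular, this sketch does not skip `#` lines inside fenced code blocks, and under the literal "equal-or-higher rank" reading a parent section's chunk also contains its child sections.

```python
import re
from typing import List

HEADING = re.compile(r"^(#{1,3})\s+\S")
MAX_CHARS = 1800  # the "~1800 chars" limit from the text


def split_long(heading: str, body: str, limit: int = MAX_CHARS) -> List[str]:
    """Split an oversized section on blank lines, repeating the heading on each piece.

    A single paragraph longer than the limit is kept whole.
    """
    pieces: List[str] = []
    current = ""
    for para in re.split(r"\n\s*\n", body.strip()):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(heading) + 1 + len(candidate) > limit:
            pieces.append(f"{heading}\n{current}")
            current = para
        else:
            current = candidate
    pieces.append(f"{heading}\n{current}")
    return pieces


def chunk_markdown(text: str, limit: int = MAX_CHARS) -> List[str]:
    lines = text.splitlines()
    heads = [(i, len(m.group(1))) for i, line in enumerate(lines)
             if (m := HEADING.match(line))]
    chunks: List[str] = []
    first = heads[0][0] if heads else len(lines)
    prelude = "\n".join(lines[:first]).strip()
    if prelude:
        chunks.append(prelude)  # chunk 0: text before the first heading
    for n, (start, level) in enumerate(heads):
        # Section ends at the next heading of equal-or-higher rank (same or fewer '#').
        end = next((i for i, lv in heads[n + 1:] if lv <= level), len(lines))
        heading = lines[start]
        body = "\n".join(lines[start + 1:end])
        section = f"{heading}\n{body}".rstrip()
        if len(section) <= limit:
            chunks.append(section)
        else:
            chunks.extend(split_long(heading, body, limit))
    return chunks


chunks = chunk_markdown("intro line\n# A\nalpha\n## A1\nbeta\n# B\ngamma")
```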

Each chunk carries a *breadcrumb*:

```
Memory.Wiki Memory > How the memory layer works > Layer 1: embeddings everywhere
```

When recall returns chunks, the breadcrumb tells the LLM (and the human reading the JSON) exactly *where* in the doc the snippet came from.
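
One way to derive such a breadcrumb is to keep a stack of ancestor headings while walking the doc. The sketch below is an assumption about how this could be done, not the actual implementation; `heading_paths` and the `" > "` separator default are illustrative names.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,3})\s+(.*\S)\s*$")


def heading_paths(text: str, sep: str = " > ") -> List[str]:
    """Return one breadcrumb per heading, built from its chain of ancestor headings."""
    stack: List[Tuple[int, str]] = []
    paths: List[str] = []
    for line in text.splitlines():
        m = HEADING.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2)
        # Pop siblings and deeper headings so the stack holds only ancestors.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        paths.append(sep.join(t for _, t in stack))
    return paths


doc = (
    "# Memory.Wiki Memory\n"
    "## How the memory layer works\n"
    "### Layer 1: embeddings everywhere\n"
    "### Layer 2: chunks by markdown structure\n"
    "## Try it\n"
)
paths = heading_paths(doc)
```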

### Layer 3: recall as an HTTP endpoint

The retrieval surface is a single public endpoint. No SDK, no API key, no MCP server.

```
POST memory.wiki/api/hub/<slug>/recall
body:
  {
    "question": "...",
    "k": 5,
    "level": "doc" | "chunk" | "bundle",
    "hybrid": false
  }
```
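
For callers that prefer code over curl, a request with this body can be built with the Python standard library alone. This is a sketch: `build_recall_request` and its default values are assumptions for the example, and the client-side check on `level` is a convenience, not documented server behavior.

```python
import json
import urllib.request

LEVELS = {"doc", "chunk", "bundle"}


def build_recall_request(slug: str, question: str, k: int = 5,
                         level: str = "chunk",
                         hybrid: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for the recall endpoint."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {sorted(LEVELS)}")
    body = {"question": question, "k": k, "level": level, "hybrid": hybrid}
    return urllib.request.Request(
        url=f"https://memory.wiki/api/hub/{slug}/recall",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_recall_request("raymindai", "How does mem0 extract memories?", hybrid=True)
# To send it: pass req to urllib.request.urlopen and json.load the response body.
```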

Three retrieval granularities:

| level | Returns | When |
|---|---|---|
| `doc` | Top-K whole docs | "Which docs are about X?" Lowest tokens. |
| `chunk` | Heading-level chunks with breadcrumb | Default for AI agents. Returns the section that actually answers the question, with roughly 10x fewer wasted tokens than whole docs. |
| `bundle` | Top-K curated bundles | "Is there a reading order for this?" The bundle URL pulls full topic context. |

**Hybrid (BM25 + vector RRF)**: when `hybrid: true` on `level: "chunk"`:

1. Vector cosine over chunk embeddings (top `k*4`).
2. Postgres FTS (BM25 via tsvector) over the same chunks (top `k*4`).
3. Reciprocal Rank Fusion: `score = sum( 1 / (60 + rank_in_list) )`.

RRF merges *ranks*, not raw scores, so vector and BM25 (incompatible scales) combine cleanly with no normalization. Each result returns `vector_rank`, `fts_rank`, and `rrf_score` so callers can see why a chunk surfaced.

In practice: the query *"MCP server"* has weak semantic signal (it is just an acronym to the embedding model) but strong lexical signal (the chunk that *literally mentions MCP* should win). Vector-only search ranks a vague "Why now?" doc first; hybrid promotes the chunk that says "Built the MCP server" to top-1.
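
The fusion step is small enough to sketch directly. Below, ranks are assumed to be 1-based (the usual RRF convention; the text above does not say), and the chunk ids are made up to mirror the "MCP server" example. `rrf_fuse` is an illustrative name, not the actual RPC.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(vector_ids: List[str], fts_ids: List[str],
             k: int = 5, c: int = 60) -> List[dict]:
    """Fuse two ranked id lists with Reciprocal Rank Fusion: sum of 1 / (c + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    ranks: Dict[str, dict] = defaultdict(lambda: {"vector_rank": None, "fts_rank": None})
    for name, ids in (("vector_rank", vector_ids), ("fts_rank", fts_ids)):
        for rank, chunk_id in enumerate(ids, start=1):  # assumed 1-based ranks
            scores[chunk_id] += 1.0 / (c + rank)
            ranks[chunk_id][name] = rank
    ordered = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    return [{"id": cid, "rrf_score": scores[cid], **ranks[cid]} for cid in ordered[:k]]


# Hypothetical candidate lists: vector search prefers "why-now",
# full-text search finds the chunk that literally mentions MCP.
fused = rrf_fuse(
    vector_ids=["why-now", "mcp-server", "roadmap"],
    fts_ids=["mcp-server", "changelog"],
)
```

A chunk that appears in both lists collects two reciprocal-rank terms, so `mcp-server` (second by vector, first by full-text) outranks `why-now`, which only the vector list returned.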

### Layer 4: privacy filters live in SQL

Every public retrieval RPC enforces the same four privacy gates *in SQL*, not in the API route:

```sql
WHERE d.is_draft = FALSE
  AND d.deleted_at IS NULL
  AND d.password_hash IS NULL
  AND (d.allowed_emails IS NULL OR array_length(d.allowed_emails, 1) IS NULL)
```

Drafts, soft-deletes, password-protected, and email-restricted docs *cannot* leak through recall, even by accident, even if the API route has a bug. The schema is the boundary.
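
To make the four gates concrete, here is the same predicate mirrored in Python. It is meant as a reading aid or a test oracle to check the RPC against, not as a replacement for the SQL filter. The `Doc` dataclass is a hypothetical row shape whose field names simply follow the columns in the `WHERE` clause.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical row shape; field names follow the columns in the WHERE clause."""
    id: str
    is_draft: bool = False
    deleted_at: Optional[datetime] = None
    password_hash: Optional[str] = None
    allowed_emails: Optional[List[str]] = None


def is_publicly_recallable(d: Doc) -> bool:
    """Mirror of the four SQL gates: not draft, not deleted, no password, no email list."""
    return (
        d.is_draft is False
        and d.deleted_at is None
        and d.password_hash is None
        # In Postgres, array_length of an empty array is NULL,
        # so both NULL and an empty list pass this gate.
        and not d.allowed_emails
    )


docs = [
    Doc("public"),
    Doc("draft", is_draft=True),
    Doc("deleted", deleted_at=datetime(2026, 1, 1)),
    Doc("locked", password_hash="hypothetical-hash"),
    Doc("restricted", allowed_emails=["a@example.com"]),
    Doc("empty-list", allowed_emails=[]),
]
visible = [d.id for d in docs if is_publicly_recallable(d)]
```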

---

## Layer x Operation matrix

| | Embed | Retrieve | Public? |
|---|---|---|---|
| **Doc** | auto on save (idempotent) | `/recall` with `"level": "doc"` (vector) | yes |
| **Chunk** | auto alongside doc embed (per-chunk hash) | `/recall` with `"level": "chunk"` (vector), plus `"hybrid": true` for BM25 + vector RRF | yes |
| **Bundle** | `/api/embed/bundle/<id>` | `/recall` with `"level": "bundle"` (vector) | yes |
| **Hub graph** | precomputed semantic edges (cos < 0.42) between all docs | `/hub/<slug>/graph` (visual) | yes |
| **Cross-refs** | extracted from markdown links across all public hubs | `/api/social/cross-refs` | yes |

Five distinct retrieval surfaces, all reading from the same embedding tables, all behind the same SQL privacy gates.

---

## Why this is different from mem0 / OpenMemory

```
                     mem0 / OpenMemory     Memory.Wiki Memory
First user           AI agent              human (agent reads via URL)
Interface            MCP server / SDK      HTTP endpoint
Content shape        atomic memories       long-form docs + bundles
Visibility           black box             human-readable markdown URL
Sharing              personal / team       public URL, any AI can fetch
Vendor lock-in       MCP-compatible only   any AI that can hit a URL
```

Memory.Wiki Memory isn't a backend store hidden behind an SDK. It's a public HTTP endpoint over content the user can read, edit, and paste. The retrieval pipeline below the surface is comparable to backend-only systems (chunked, hybrid, idempotent) but the *surface* stays human-shaped.

---

## What's deliberately not here (yet)

- **Cross-encoder reranker** on top of RRF. Better ranking, at a cost of roughly 50 to 100 ms extra latency. Deferred until users have hubs big enough that the gain matters.
- **Per-bundle automatic re-embed on metadata edits.** Doc-level is wired through auto-save; bundle-level still needs a manual `/api/embed/bundle/<id>` after edits. Auto-trigger on bundle PATCH is the next sprint.
- **Multi-vector / late interaction (ColBERT-style).** Useful at scale; overkill for hubs in the hundreds.

---

## Try it

```bash
curl -X POST https://memory.wiki/api/hub/raymindai/recall \
  -H "Content-Type: application/json" \
  -d '{
        "question": "How does mem0 extract memories?",
        "k": 5,
        "level": "chunk",
        "hybrid": true
      }'
```

The response carries `results[].markdown` (the actual chunk), `heading_path` (breadcrumb), `doc_url` (link back), `rrf_score`, `vector_rank`, `fts_rank` so you can see *why* each chunk surfaced.
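
Once the JSON is back, turning it into paste-ready context is a few lines. The sketch below assumes every field listed above sits on each item of `results[]`; the sample response is invented for illustration and is not real endpoint output.

```python
from typing import List


def results_to_context(response: dict) -> str:
    """Render recall results as a markdown context block with source attribution."""
    parts: List[str] = []
    for r in response.get("results", []):
        source = f"[{r['heading_path']}]({r['doc_url']})"
        why = (f"rrf={r['rrf_score']:.4f}, "
               f"vector_rank={r['vector_rank']}, fts_rank={r['fts_rank']}")
        parts.append(f"Source: {source} ({why})\n\n{r['markdown']}")
    return "\n\n---\n\n".join(parts)


# Hypothetical response, shaped after the field names listed above.
sample = {"results": [{
    "markdown": "### Layer 3\nThe retrieval surface is a single public endpoint.",
    "heading_path": "Memory.Wiki Memory > Layer 3",
    "doc_url": "https://memory.wiki/mdfy-memory",
    "rrf_score": 0.0325,
    "vector_rank": 2,
    "fts_rank": 1,
}]}
context = results_to_context(sample)
```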

For the wider thesis (what Memory.Wiki is and how it sits next to vendor memory layers), see [How Memory.Wiki works](/how-memorywiki-works). For the open cross-AI verification, see [MWBench](/mwbench).

---

*This page is itself a Memory.Wiki memory. Paste it into Claude or ChatGPT and they read the whole pipeline as context.*


---

## Summary
Memory.Wiki Memory is a public, human-readable memory layer that lets you capture answers from AI chats, organize them into a hub at a permanent URL, and share that hub with any AI via a simple HTTP recall endpoint that supports chunked retrieval, hybrid search (BM25 + vector), and privacy-enforced filtering. Unlike vendor-specific memory systems, it's not locked behind an SDK or MCP server and can be pasted into any AI as markdown context.

## Themes
- portable AI memory
- vendor-agnostic retrieval
- human-readable knowledge layer

## Key takeaways
- Memory.Wiki Memory is a public HTTP endpoint over markdown documents, not a vendor-locked SDK or MCP server.
- Documents are split into chunks at markdown heading boundaries with breadcrumb paths, enabling precision retrieval at three granularities: doc, chunk, and bundle.
- Embeddings are computed idempotently: frontend debounces saves, hashes source content, and skips OpenAI calls if the hash matches stored state.
- Retrieval supports hybrid search combining vector cosine similarity and PostgreSQL BM25 full-text search via Reciprocal Rank Fusion, returning both individual ranks and merged scores.
- The system differs from mem0 and OpenMemory by prioritizing human-readable, shareable markdown over atomic memory abstractions and black-box storage.

## Insights
- The system achieves cost efficiency through idempotent embedding with SHA256 hashing, skipping OpenAI calls entirely when document content is unchanged.
- Hybrid retrieval (BM25 + vector RRF) solves a specific semantic weakness: acronyms like MCP have poor embedding signal but strong lexical presence, so rank fusion outperforms vector-only search.
- Privacy enforcement lives in the SQL of the retrieval RPCs rather than in the API route, so drafts, soft-deleted, password-protected, and email-restricted docs stay out of recall even if the API route contains bugs.

## Open questions / gaps
- How does performance scale when a hub contains thousands of documents or millions of chunks?
- What is the operational cost to users, and does the idempotent embedding strategy meaningfully reduce spend at various hub sizes?

## Concepts in this document
- **Knowledge Management** _(tag)_
  Overarching domain of personal and organizational information systems
- **Claude** _(entity)_
  One of the AI platforms currently suffering from isolated memory silos.
- **MCP server** _(concept)_
  A standard protocol allowing diverse AI tools to query and interact with the memory.wiki hub.
- **Hybrid Retrieval** _(concept)_
  Technical approach combining vector similarity with BM25 full-text search over chunks, merged by Reciprocal Rank Fusion, for precise question-targeted recall.
- **URL as API** _(concept)_
  Core architectural principle where every Memory.Wiki URL serves as a fetchable API endpoint for AI consumption
- **ChatGPT** _(entity)_
  One of the AI platforms currently suffering from isolated memory silos.
- **Cursor** _(entity)_
  One of the AI platforms currently suffering from isolated memory silos.
- **AI memory** _(tag)_
  Conceptual topic describing how memory is used by AI agents.
- **Cross-AI Compatibility** _(concept)_
  The ability for Memory.Wiki URLs to work across any AI tool without vendor lock-in
- **Markdown** _(tag)_
  Lightweight markup format used as the universal content format across Memory.Wiki.
- **Obsidian** _(entity)_
  Competitor noted as a note-taking tool in relation to memory concepts.
- **Mem0** _(entity)_
  Competitive reference mentioned in the redefinition document.

## Concept relations (within this doc's concepts)
- **Markdown** transport format **URL as API**
- **AI memory** disrupts category **Knowledge Management**
- **URL as API** content format **Markdown**
- **URL as API** integrates with **ChatGPT**
- **URL as API** integrates with **Claude**
- **Cross-AI Compatibility** enables through **URL as API**
- **Hybrid Retrieval** serves domain **Knowledge Management**
- **URL as API** implements pattern **Cross-AI Compatibility**
- **URL as API** enables compatibility **Cross-AI Compatibility**
- **URL as API** enables **Cross-AI Compatibility**
- **Claude** has isolated **AI memory**
- **ChatGPT** has isolated **AI memory**
- **Cursor** has isolated **AI memory**

## Bundles containing this document
- [Memory.Wiki Foundations](https://memory.wiki/b/9FATHAnw)
  > What Memory.Wiki is, the three URL primitives, the memory architecture, the /memory.wiki skill across coding tools, the Bundle Spec, FAQ, and roadmap. Read in order, or paste the bundle URL into any A

_Hub canonical:_ https://memory.wiki/hub/raymindai
_Concept digest:_ https://memory.wiki/raw/hub/raymindai?digest=1&compact=1
