---
title: "Reading: Karpathy on LLM evals"
url: https://memory.wiki/c9e5203af6ee
updated: 2026-05-18T08:06:00.000Z
hub: https://memory.wiki/hub/memorywiki-demo
bundle_count: 1
source: "Memory.Wiki"
---
# Reading: Karpathy on LLM evals

A good error message answers three questions: what happened, why it happened, and what to try next. Most ship the first, hint at the second, and forget the third. The fix is usually a single sentence longer.
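The three questions can be made concrete with a small helper. This is only a sketch: `format_error` and the example wording are hypothetical, invented here for illustration, and not part of any library.

```python
def format_error(what: str, why: str, next_step: str) -> str:
    """Join the three parts of an error message into one line.

    Hypothetical helper for illustration; the parameters mirror the three
    questions: what happened, why it happened, and what to try next.
    """
    parts = [what, why, next_step]
    # Normalize punctuation so each part ends with exactly one period.
    return " ".join(p.strip().rstrip(".") + "." for p in parts)


message = format_error(
    what="Could not save notes.md",
    why="The disk is full",
    next_step="Free up space and press Save again",
)
print(message)
```

The third argument is the "single sentence longer": dropping it still leaves a valid message, just a less useful one.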

The hardest part of a one-person startup isn't the work; it's the lack of a forcing function. Without a meeting on Tuesday, nothing has to ship on Monday. The schedule has to come from somewhere, and "because I said so" isn't enough.

### Three rules I keep returning to

- Ship one feature, deeply, before two features shallowly.
- The interface IS the product. The engine just has to keep up.
- Anything important should fit on one screen.

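```python
# Tiny script that prints any URL's title.
import re

import requests

def title(url: str) -> str:
    """Return the page's <title> text, or the URL itself if none is found."""
    resp = requests.get(url, timeout=5)
    resp.raise_for_status()  # don't parse an error page as if it were the target
    # \b[^>]* tolerates attributes on the tag, e.g. <title lang="en">.
    m = re.search(r"<title\b[^>]*>(.*?)</title>", resp.text, re.S | re.I)
    return m.group(1).strip() if m else url

print(title("https://memory.wiki"))
```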

> "The best note-taking system is the one you already have open."
> — every productivity post ever, and also true

The thesis here[^1] is that the delivery model matters more than retrieval quality.

[^1]: First articulated in the W6 internal note "Graph RAG is delivery, not retrieval."

## What changed

The interesting thing about long-context models isn't that they can read more — it's that they finally make the *retrieval* problem optional. When a model can hold the whole repo in context, the question shifts from "what should I fetch?" to "what should I show?". That's a UX question, not an infrastructure one.
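One way to picture the shift is to check whether the material simply fits before building any retrieval step. The sketch below is an illustration under loud assumptions, not a method from the note: `estimate_tokens`, `fits_in_context`, the 4-characters-per-token ratio, and the 200,000-token window are all hypothetical placeholders.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4 chars/token ratio is an assumed rule of thumb."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(
    docs: list[str],
    context_window: int = 200_000,  # assumed window size, not a real model's limit
    reserve: int = 8_000,           # assumed headroom for the prompt and the answer
) -> bool:
    """Return True if every document fits in the window with `reserve` left over."""
    total = sum(estimate_tokens(d) for d in docs)
    return total <= context_window - reserve


# Toy "repo": two small files repeated to give them some size.
docs = ["def add(a, b):\n    return a + b\n" * 100, "# README\n" * 50]

if fits_in_context(docs):
    plan = "send everything; decide what to show"
else:
    plan = "retrieve a subset first"
print(plan)
```

When the check passes, the remaining work is choosing what to surface to the reader, which is the UX question rather than the infrastructure one.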

---


## Bundles containing this document
- [App Store submission checklist](https://memory.wiki/b/bf24c8b646ef)
  > App Store submission checklist — a curated set of memories grouped by theme. Reviewer note: this is generated demo content.

