CORE DIRECTORY // SYSTEM.USER.DIANA_ISMAIL

Labs by Diana — Experiments that ship.

Side projects that got out of hand. AI tools built for problems I kept tripping over — now live, now yours.

Research // Active

What I Built Before Karpathy Named It

ARTICLE_006

PUBLISHED

2026.04.22

Every project in this portfolio was built by a twelve-agent AI team running inside Claude Code. Those agents lose all memory when a session ends. No built-in persistence, no shared state, no graph database. The solution that emerged — months before Karpathy gave it a name — is four kinds of text files, each with a specific purpose and a maintenance discipline attached. CLAUDE.md carries the ambient rules every agent reads at session start. manifest.yaml documents what each module guarantees, requires, and breaks when it fails. MEMORY.md persists cross-session knowledge — feedback, project state, references — in a curated index that stays under thirty entries. Handoff documents bridge the gap between sessions, capturing decisions too fresh to harden into permanent rules.

The system is file-based and manual in ways that more sophisticated architectures point past. There is no semantic query across MEMORY.md. There is no real-time consistency checking between manifests and code. The staleness check is a nightly cron job, not a background process. But it works — durably enough to run at twelve-agent scale across six production repositories. This article documents what was built, how it works, and the honest gap between what exists and what the next generation of tooling will replace.

CLAUDE.md:_The_Ambient_Context_Layer

Every project gets two CLAUDE.md files. The global one — ~/.claude/CLAUDE.md — lives outside any single repo and defines how my twelve agents work across all six projects. It's roughly 120 lines: model tiering rules, security baselines, git workflow conventions, session compaction thresholds, agent coordination protocols. The per-project CLAUDE.md lives in the repository root and answers a narrower question: how do we build here?
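A global CLAUDE.md covering those categories might read something like the following. This is my sketch of the shape, not an excerpt; the article describes the file's contents but never quotes it, so the specific rules below are illustrative:

```markdown
<!-- Hypothetical excerpt; the real ~/.claude/CLAUDE.md is described, not quoted, in the article -->
## Model tiering
- Route routine edits to the cheapest capable model; reserve the top tier for architecture and review.

## Git workflow
- manifest.yaml updates ship in the same PR as the code they describe.

## Session compaction
- Re-read CLAUDE.md after any context compaction; assume nothing survived it.
```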

The critical move is that agents read CLAUDE.md at session start. Not as documentation they should be familiar with. As executable context they load before writing anything. They read the rules, understand the constraints, and factor them into their decisions before implementation begins.

The system is bidirectional. Agents write to CLAUDE.md when they discover something that affects how future agents should operate. When Nix discovered that worktree agents lose their working directory during context compaction, she documented the three-point mitigation directly into the global file: grant Bash permission, anchor the CWD first, use absolute paths — with a reference to the platform bug (anthropics/claude-code#22945). That entry stays live. Every agent that touches worktrees reads it in context.
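One way that worktree entry might read, sketched from the three mitigation points the article lists (the exact wording in the real file isn't quoted):

```markdown
<!-- Hypothetical phrasing of the entry described above -->
## Worktree agents: CWD loss during context compaction
Ref: anthropics/claude-code#22945
- Grant Bash permission up front.
- Anchor the working directory first, before any other operation.
- Use absolute paths everywhere; never rely on a relative CWD.
```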

The Labs CLAUDE.md goes deeper. It documents the chat engine's failure asymmetry: rate limiting fails closed (if Redis is down, reject the message with a 429); OTP handling fails open (if Redis is down, let the request through — OTP won't work but nothing crashes). It names the exact modules that own these decisions (engine.ts, memory.ts, redis.ts). It explains why certain Redis calls are wrapped in try-catch while summarisation failures log and continue. That's not a tutorial. That's the operational knowledge of how the system behaves under pressure.
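The asymmetry can be sketched in TypeScript. The `RedisClient` shape, key names, threshold, and function names below are my stand-ins for illustration, not the actual engine.ts / redis.ts code:

```typescript
// Sketch of the fail-closed / fail-open asymmetry described above.
// All names and values here are hypothetical, not the real modules.
type RedisClient = {
  incr(key: string): Promise<number>;
  get(key: string): Promise<string | null>;
};

const RATE_LIMIT = 10; // hypothetical per-window message cap

// Rate limiting fails CLOSED: if Redis is down, reject with 429.
async function checkRateLimit(redis: RedisClient, userId: string): Promise<number> {
  try {
    const count = await redis.incr(`rate:${userId}`);
    return count > RATE_LIMIT ? 429 : 200;
  } catch {
    return 429; // Redis unavailable -> reject the message
  }
}

// OTP validation fails OPEN: if Redis is down, let the request through.
async function checkOtp(redis: RedisClient, userId: string, otp: string): Promise<boolean> {
  try {
    const stored = await redis.get(`otp:${userId}`);
    return stored === otp;
  } catch {
    return true; // Redis unavailable -> OTP won't work, but nothing crashes
  }
}
```

The try-catch placement is the whole design: the same outage produces opposite outcomes depending on which module catches it.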

The file reads dense. The grammar is compressed. There's no table of contents because agents navigate by search, not by structure. It's written to be consumed by AI, not browsed by humans.

manifest.yaml:_The_Module_Contract_Layer

Every substantial module — anything with a clear boundary and multiple consumers — carries a manifest.yaml file at its root. The schema is deliberate and spare: name, purpose, owner, status, depends_on (with explicit why), depended_on_by (with explicit how), exports, contracts, failure_modes, performance.

The failure modes field matters most. In the chat engine, the rate limiter has documented failure modes at the trigger level: "Redis unavailable while handling rate limit check → request rejected with 429 (fails closed)." The OTP validator has the opposite: "Redis unavailable while validating OTP → request allowed through (fails open)." These aren't implementation details. They're architectural intentions. Any future change to those modules would break if it didn't preserve this asymmetry — and the manifest calls it out explicitly, in two sentences, in a location where a developer knows to look before touching the code.
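Put together, a manifest for the rate limiter might look like this. The field names follow the schema listed above and the failure mode is quoted from the article; the remaining values are illustrative, not the real file:

```yaml
# Hypothetical manifest.yaml for the rate limiter; schema fields from the
# article, values illustrative.
name: rate-limiter
purpose: Per-user message throttling for the chat engine
owner: Nix
status: stable
depends_on:
  - module: redis
    why: counters live in Redis so limits survive process restarts
depended_on_by:
  - module: engine
    how: checks the limit before accepting a message
exports:
  - checkRateLimit
contracts:
  - rejects with 429 once the per-window cap is exceeded
failure_modes:
  - "Redis unavailable while handling rate limit check -> request rejected with 429 (fails closed)"
performance:
  - one Redis round-trip per message
```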

Nix owns manifest updates across all my repos. When she adds an export, updates a dependency, or discovers a new failure mode, the manifest updates in the same PR. Stale manifests are worse than none — a developer trusting a manifest that describes yesterday's behaviour will introduce a real bug. The maintenance rule is non-negotiable: if the manifest isn't current, the PR is blocked.

MEMORY.md:_The_Cross-Session_Persistence_Layer

Conversation context closes. Projects persist. I need knowledge about how I work and how this team works to survive that boundary.

Every project has a MEMORY.md index — a curated set of observations, corrected approaches, and reference pointers. Four types: user (my role and preferences), feedback (how I've asked agents to change approach), project (current state, why, and by when), reference (where to find things in external systems). The index stays under thirty entries, each under 150 characters. If the index gets longer, it stops serving its purpose. An agent scanning thirty entries looking for relevance isn't finding context — it's reading documentation that failed to compress.
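Those two constraints are mechanical enough to lint. A minimal sketch, assuming one `- ` bullet per entry (the article doesn't specify the actual entry syntax):

```typescript
// Hypothetical lint for the MEMORY.md index constraints described above:
// at most thirty entries, each under 150 characters. The one-bullet-per-entry
// parsing rule is my assumption, not the article's.
function lintMemoryIndex(md: string): string[] {
  const entries = md.split("\n").filter(line => line.startsWith("- "));
  const problems: string[] = [];
  if (entries.length > 30) {
    problems.push(`index has ${entries.length} entries (max 30)`);
  }
  for (const entry of entries) {
    if (entry.length >= 150) {
      problems.push(`entry too long (${entry.length} chars)`);
    }
  }
  return problems;
}
```

A check like this could run in the same nightly job as the staleness check, keeping the index honest without a human counting lines.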

Feedback memories are the most valuable because they embed reasoning. They lead with the rule itself, then Why: (the reasoning at the time the rule was formed), then How to apply: (when this rule does and doesn't hold). I built this structure because a rule without its original reasoning becomes brittle when circumstances change. When the situation evolves, an agent can reason about whether the rule still applies instead of blindly following a ghost of earlier intent.

The global MEMORY.md contains entries like: "projects.json content-only changes are chore + patch bump, not feature + minor" with the reasoning attached: "discovered that separating content updates from code changes makes semantic-release cleaner and keeps the changelog focused on actual code work." A future agent reading that can evaluate whether the rule still fits the team's needs, instead of treating it as law.
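Structured as a feedback memory, that entry might look like this. The layout is my sketch, and the How to apply: line is illustrative, since the article quotes only the rule and its reasoning:

```markdown
<!-- Hypothetical formatting; MEMORY.md's real layout isn't shown in the article -->
- projects.json content-only changes are chore + patch bump, not feature + minor
  - Why: separating content updates from code changes makes semantic-release
    cleaner and keeps the changelog focused on actual code work
  - How to apply: holds while content and code share one release pipeline;
    reconsider if content moves out of the repo
```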

Handoff_Documents:_The_Session_Boundary_Layer

CLAUDE.md is the stable instruction set. Handoff documents are the shock absorbers.

Before ending a session, I write a handoff that contains: what was completed, what's blocked and why, what the next session needs to know, what decisions are too fresh to have hardened into CLAUDE.md yet. Mid-course corrections. Experimental rules. Things that are true now but might shift. The distinction is sharp: CLAUDE.md lives long, handoff documents bridge the gap between sessions. They allow decisions made at maximum context density to transfer without cluttering the permanent rules.
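A handoff skeleton covering those four sections might look like the following; the article names the contents, not the layout, so this template is my sketch:

```markdown
<!-- Hypothetical template; section names from the article, structure assumed -->
# HANDOFF: <session date>

## Completed
## Blocked (and why)
## Next session needs to know
## Fresh decisions (not yet hardened into CLAUDE.md)
```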

The_Honest_Gap

This system is file-based and manual in ways Karpathy's framing points past. His architecture sketches graph-based retrieval, semantic search, automated staleness detection. I don't have those. There's no semantic query across my MEMORY.md — an agent reads the index linearly. There's no real-time consistency checking between manifests and code. The staleness check is a nightly cron job, not a background process. My system depends on discipline: readable code, agents that think before writing, code reviewers that catch the gap between what was promised and what was shipped.

This is what you build when you solve the problem empirically, in production, without access to infrastructure that doesn't exist yet at accessible cost. It works. It's durable enough to run at twelve-agent scale across six projects. It's also fragile in the ways that matter: it requires thought, not just structure.

But that gap is the point worth naming. I didn't build this because I was ahead of the curve. I built it because agents kept losing context between sessions, and documents are the medium through which humans and AI coordinate when the better tools don't exist yet. Karpathy gave me language for what I was already doing. The system itself — the four files, the maintenance disciplines, the decision to write for AI readability before human browsability — came first, from the weight of production continuity.

The knowledge base that's useful to an agent is not the knowledge base that looks impressive in a diagram. It's the one that survives a session ending and hands off everything the next session needs to not repeat work or break coherence. That's what these four files do.

Claude Code · Agentic AI · LLM-Wiki · Knowledge Persistence · CLAUDE.md · manifest.yaml · MEMORY.md · Session Handoff

KEY_TAKEAWAYS

TAKEAWAY_01

Four kinds of text files — CLAUDE.md, manifest.yaml, MEMORY.md, and handoff documents — each with a specific purpose and maintenance discipline, are sufficient to maintain coherence across twelve agents and six repositories. The system is file-based and manual in ways that more sophisticated architectures point past, but it works because it solves the actual production problem: what survives a session ending.

TAKEAWAY_02

The knowledge base that's useful to an agent is not the one that looks impressive in a diagram — it's the one that hands off everything the next session needs to not repeat work or break coherence. Writing for AI readability before human browsability is a design choice that compounds over months of production use.

TAKEAWAY_03

The gap between file-based persistence and graph-based retrieval is the honest frontier. No semantic query, no automated staleness detection, no real-time consistency checking — the system depends on discipline and agents that think before writing. Naming the gap matters as much as building what exists.

SYSTEM.INT // 2026 LABS_CORE v2.18.4

LATENCY: 24ms // STATUS: NOMINAL