CORE DIRECTORY // SYSTEM.USER.DIANA_ISMAIL
Labs by Diana — Experiments that ship.
A Playbook for Multi-Project AI Teams
ARTICLE_004
PUBLISHED
2026.04.13
READ
~8 MIN
Parts 1 and 2 documented the system and the rules. This article is the implementation guide. Every published AI governance playbook targets enterprise compliance teams -- risk taxonomies, RACI matrices, lifecycle frameworks. None of them answer the practical questions: how do you write your first CLAUDE.md, how do you decide what goes global versus project-scoped, how do you onboard an agent to a codebase it has never seen, and how do you know whether your governance is actually working six weeks later.
This article fills that gap. It walks through the setup from an empty directory to a functioning multi-project governance system, documents the five failure modes that practitioners encounter most consistently when scaling AI-assisted development, and provides a health checklist using proxy metrics that no published framework includes. Everything described here is drawn from the system running across fourteen projects in production -- not from theory, not from enterprise recommendations, but from what actually worked and what had to be fixed when it didn't.
The_Missing_Layer
There is no shortage of AI governance frameworks. Microsoft released the Agent Governance Toolkit in April 2026 for runtime security policy enforcement. IAPS published a 63-page field guide with a five-part intervention taxonomy. Arthur.ai, EY, and Superwise all have playbooks with RACI models and lifecycle management steps. Every one of them targets enterprise compliance teams.
None of them tell you how to write a CLAUDE.md file. None explain how to structure rules across five repositories so conventions stay consistent. None address what happens when an agent enters a codebase for the first time and has to figure out where to start.
The gap is not at the policy level -- it is at the practitioner level. The frameworks assume the governance infrastructure already exists and focus on monitoring and enforcement. For a solo operator or small team building with AI agents, the infrastructure is the entire problem. This article is the setup guide that the enterprise playbooks skip.
The_Three-File_Foundation
The minimum viable governance system is three files. The first is a global rules file at ~/.claude/CLAUDE.md. This is the baseline -- conventions that apply to every project regardless of stack or domain. In my system, this covers TypeScript strict mode, Zod validation for all API payloads, Conventional Commits, WCAG 2.1 AA accessibility, security practices (no hardcoded secrets, parameterised queries, no exposed stack traces), and agent coordination rules (specialists don't delegate to each other, cross-project decisions route through the owner). It runs about 60 lines. Every convention here is something I never want to restate per project.
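To make the shape concrete, a global baseline along these lines might look like the following excerpt. The wording is illustrative, not the author's actual file; the categories are the ones listed above:

```markdown
# Global Rules (~/.claude/CLAUDE.md) -- illustrative excerpt

## Code quality
- TypeScript strict mode everywhere; `any` is forbidden.
- Validate all API payloads with Zod at the boundary.

## Git
- Conventional Commits (`feat:`, `fix:`, `chore:` ...).

## Accessibility
- WCAG 2.1 AA for anything user-facing.

## Security
- No hardcoded secrets. Parameterised queries only. Never expose stack traces.

## Agent coordination
- Specialists do not delegate to each other.
- Cross-project decisions route through the owner.
```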
The second file is a project CLAUDE.md in each repository root. This documents what makes this codebase different from every other one -- the stack, the architecture patterns, the build commands, the error handling conventions, the git workflow. Labs documents its chat engine layer and CSS custom property system. GEO Audit documents its API route structure and Machine Clarity design system. These files run under 200 lines each.
The third layer is one or more scoped rules files in .claude/rules/, each with a glob pattern that loads it conditionally. FitChecker's admin.md carries a glob of app/manage/**,app/api/manage/** and injects 26 route definitions only when editing admin module files. These three layers -- global baseline, project conventions, conditional scoped rules -- are the entire governance infrastructure. Everything else in my system (planning folders, cross-project state files, decision logs) builds on this foundation but is not required to start.
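A scoped rules file of that kind might be sketched like this. The frontmatter field name and the route entries are illustrative, standing in for the structure described above:

```markdown
---
glob: app/manage/**,app/api/manage/**
---

# Admin module rules (.claude/rules/admin.md) -- sketch

Loaded only when an admin module file is being edited.

## Route table
- GET  /api/manage/users      -- list users (paginated, admin-only)
- POST /api/manage/users/:id  -- update a user record
- (remaining route definitions, one line each)

## Constraints
- Every admin route checks the session role before touching data.
```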
Writing_the_First_CLAUDE.md
The best structure for a CLAUDE.md follows three questions: what is this (stack, architecture, key directories), why does it work this way (purpose, non-obvious decisions, constraints), and how do you work here (commands, workflow, exceptions).
Start with what -- the things an agent needs to know before touching any code. Tech stack, build command, test command, lint command. Architecture overview in two to three sentences, not two to three paragraphs. Key directories and what lives in each.
Then why -- the decisions that the codebase itself does not explain. Why the chat engine uses a Redis singleton instead of per-request connections. Why CSS custom properties instead of Tailwind utility classes for design tokens. Why manual versioning instead of semantic-release automation. These are the things a new human developer would ask in their first week; an agent needs them stated explicitly because it cannot ask.
Then how -- the workflow rules that govern daily work. Git branching conventions, commit message format, what gets its own commit versus what gets bundled. Error handling patterns. Security requirements. The "never do" list. A good rules file tells the agent what the codebase itself cannot: the code shows what exists; the rules file explains why it exists that way and what constraints govern changes to it. HumanLayer's research confirmed this: the highest-leverage rules are the ones that bridge the gap between "what the code is" and "how to work with it correctly."
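Putting the three questions together, a project CLAUDE.md skeleton might look like this, with placeholders to fill per project:

```markdown
# CLAUDE.md -- <project name>

## What
- Stack: <framework, language, database>
- Commands: build `npm run build` · test `npm test` · lint `npm run lint`
- Architecture: <two to three sentences, not paragraphs>
- Key directories: <path -- what lives there>

## Why
- <A non-obvious decision and the constraint behind it>
- <Another decision a new developer would ask about in week one>

## How
- Branching: <convention> · Commits: <format>
- Error handling: <pattern>
- Never: <the "never do" list>
```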
Onboarding_an_Agent_to_an_Existing_Codebase
An agent entering a repository for the first time has the same problem as a new hire on day one: it can see all the code but has no idea what matters, what is fragile, or where to start. The CLAUDE.md is the onboarding document, but it is not enough on its own. The agent also needs to know what state the project is in right now -- not what the architecture looks like in general, but what branch is active, what was the last change, what is the current focus.
The SessionStart hook in my system handles this by scanning all active repositories and injecting a live git status dashboard into every conversation. But the broader principle is that onboarding has two phases: static context (the CLAUDE.md, the scoped rules, the folder structure) and dynamic context (current branch, recent commits, active work). Static context is written once and maintained. Dynamic context is injected at session start and refreshed on demand.
The failure mode is treating static context as sufficient. A CLAUDE.md that says "this project uses feature branches" does not tell the agent that there is currently an open branch with uncommitted changes on a half-finished migration. Onboarding works when the agent knows both the rules of the house and the state of the house.
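The dynamic-context idea can be sketched as a small script. This is not the author's actual hook; the directory layout and output format are assumptions, and wiring it into a session-start mechanism is left to whatever tool is in use:

```shell
# Sketch of a session-start dashboard: scan repositories under a root
# directory and print branch, last commit, and uncommitted-file count.
repo_dashboard() {
  root="$1"
  for dir in "$root"/*/; do
    [ -d "$dir/.git" ] || continue
    branch=$(git -C "$dir" rev-parse --abbrev-ref HEAD)
    last=$(git -C "$dir" log -1 --format='%h %s' 2>/dev/null)
    dirty=$(git -C "$dir" status --porcelain | wc -l | tr -d ' ')
    printf '%s  [%s]  %s  (%s uncommitted)\n' \
      "$(basename "$dir")" "$branch" "$last" "$dirty"
  done
}

# Example: repo_dashboard "$HOME/projects"
```

One line per repository is the point: the agent gets the state of the house in a glance, without burning context on raw git output.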
Five_Failure_Modes
Five failure modes appear consistently when scaling AI-assisted development, and governance addresses all five.
The first is monolithic prompts -- asking an agent to produce a large output in a single pass. Addy Osmani documented the key lesson: never ask for large outputs; break into iterative, single-purpose steps. This is why the system uses subagents for context-heavy operations -- each gets its own context window and a focused task.
The second is the review bottleneck. Research shows AI-assisted pull requests carry 1.7 times more issues than human-authored ones. High adoption correlates with larger PRs and longer review times. The governance response is scoping: one branch per task, commit after every completed subtask, keep changes reviewable.
The third is technical debt accumulation -- studies show a 30-41% increase in technical debt after AI tool adoption, with cognitive complexity rising 39% in agent-assisted repositories. The constraint layer (strict TypeScript, no any types, explicit error handling) is the countermeasure. It doesn't prevent debt from accruing, but it makes the most common forms of it structurally impossible.
The fourth is security blind spots. More code shipped faster with less human visibility means more surface area for vulnerabilities. OWASP published the Top 10 for Agentic Applications in December 2025 specifically for this risk class. The security rules in the global CLAUDE.md (no hardcoded secrets, parameterised queries, no exposed stack traces, self-hosted fonts) are not optional -- they are the minimum viable security posture.
The fifth is skipping fundamentals. AI is an amplifier. Teams with strong engineering practices see those practices amplified. Teams without them see existing problems magnified. Governance rules do not replace good engineering -- they encode it so it persists across sessions, agents, and projects.
The_Governance_Health_Checklist
No published framework includes metrics for measuring whether development-team governance is actually working. Enterprise playbooks measure compliance rates and audit coverage. Practitioners need different signals. Five proxy metrics emerged from running this system across fourteen projects.
First, PR rejection rate on agent-generated code. If agents routinely produce work that fails review, the rules are not constraining behaviour effectively -- either they are too vague, or they are missing the constraints that would prevent the most common failures. Second, time-to-first-useful-output when onboarding an agent to a new project. If the first three messages in every session are the agent asking clarifying questions or making incorrect assumptions, the CLAUDE.md is not doing its job.
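The first metric lends itself to a quick script. A rough sketch, assuming agent-authored PRs carry a hypothetical ai-assisted label and counting closed-unmerged PRs as rejections:

```shell
# Percentage of closed agent-authored PRs that were rejected
# (closed without merging). Usage: rejection_rate <rejected> <total>
rejection_rate() {
  awk -v r="$1" -v t="$2" \
    'BEGIN { if (t > 0) printf "%.1f\n", 100 * r / t; else print "0.0" }'
}

# Gathering the counts with the GitHub CLI (requires gh auth; the
# "ai-assisted" label is a hypothetical tagging convention):
#   rejected=$(gh pr list --label ai-assisted --state closed \
#     --json mergedAt --jq '[.[] | select(.mergedAt == null)] | length')
#   total=$(gh pr list --label ai-assisted --state closed \
#     --json number --jq 'length')

rejection_rate 3 20   # prints 15.0
```

The absolute number matters less than the trend: a rate that falls after a rule change is evidence the rule is constraining behaviour.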
Third, rule revision frequency. Rules that never change are probably stale -- the codebase has evolved and the rules haven't kept up. Rules that change every week are unstable -- the conventions haven't settled. The healthy range is somewhere in between: rules update when the architecture changes, not when an agent misinterprets something that should have been clearer.
Fourth, incident count from agent-generated code reaching production. This is the lagging indicator -- by the time an incident happens, the governance gap has already been there for a while. But tracking it over time reveals whether rule changes are having an effect. Fifth, review cycle time before and after governance rules are applied. If reviews take longer after adding rules, the rules may be creating complexity rather than reducing it. Treat governance as a living document iterated alongside the codebase. The rules that exist today should not be the rules that exist in six months -- they should be better, fewer, and more precise.
What_to_Build_Next
AGENTS.md is converging as a cross-tool standard. The Agentic AI Foundation, stewarded by the Linux Foundation with founding members including Anthropic, OpenAI, and Block, published the specification in 2026. Over 60,000 open-source projects have adopted it. The trajectory is clear: tool-specific rule files (CLAUDE.md, .cursorrules) will coexist with a universal AGENTS.md that carries conventions readable by any AI development tool.
For practitioners building governance systems today, the practical implication is to keep tool-specific additions minimal and push shared conventions into a format that will survive a tool switch.
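Concretely, that might mean an AGENTS.md at the repository root carrying the shared conventions, with tool-specific files reduced to additions and a pointer. The contents below are illustrative; the AGENTS.md convention is plain markdown with no mandated structure:

```markdown
# AGENTS.md

## Build & test
- `npm run build`, `npm test`, and `npm run lint` must pass before any commit.

## Conventions
- TypeScript strict mode. Conventional Commits. WCAG 2.1 AA.

## Boundaries
- Never commit secrets. Never edit generated files under dist/.
```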
The broader pattern is that agent governance is not a solved problem -- it is an emerging discipline with new failure modes surfacing as adoption scales. The system documented across these three articles handles fourteen projects today. It will need to handle more, across more tools, with agents that are more capable and therefore more dangerous when ungoverned. The work is not building a system that works now. It is building one that can be maintained, extended, and corrected by the next person -- or the next agent -- that inherits it.
KEY_TAKEAWAYS
TAKEAWAY_01
The gap between enterprise governance frameworks and practitioner needs is the entire implementation layer. Published playbooks address risk taxonomies and compliance monitoring. They do not address how to write a CLAUDE.md, how to structure rules across repositories, or how to onboard an agent to a codebase. The three-file foundation -- global baseline, project conventions, conditional scoped rules -- is the minimum viable governance system, and it can be set up in an afternoon.
TAKEAWAY_02
Five failure modes appear consistently when scaling AI-assisted development: monolithic prompts, review bottlenecks, technical debt accumulation, security blind spots, and skipping fundamentals. Governance does not prevent these -- it encodes the engineering practices that counteract them, so the countermeasures persist across sessions, agents, and projects rather than depending on whoever happens to be paying attention.
TAKEAWAY_03
Governance without measurement degrades silently. Five proxy metrics -- PR rejection rate, onboarding time-to-first-useful-output, rule revision frequency, agent-generated incident count, and review cycle time -- provide the feedback loop that no published framework includes. If you are not measuring whether your rules are working, you are maintaining a system on faith.