AI collaboration doesn't fail because models aren't powerful enough. It fails because it isn't governed. Living Framework is the empirical research body — 8 papers, 18 months — that documents what actually produces reliable, long-horizon collaboration.
Across 18 months of documented human-AI collaboration, the same failure patterns appeared repeatedly. These are not accidents or model limitations. They are predictable — which means they are preventable through governance architecture.
Decisions made in earlier sessions become unreliable. The AI reconstructs from memory instead of referencing canonical documents. The collaboration drifts without either participant noticing.
Multiple versions of the same document coexist. You work from one; the AI works from another. Both feel authoritative. Neither is. Truth becomes a matter of which file you loaded last.
Numbers that should be looked up get silently recalculated. The same figure gives a different answer across sessions. Numeric truth erodes, unnoticed, until a decision depends on it.
Context from one domain bleeds into another. Finance logic applies to research. Decisions made in one context inadvertently govern another. Separation requires explicit architecture.
Small failures accumulate invisibly. You start second-guessing outputs. The partnership that felt reliable becomes a source of uncertainty. Unrepaired failure compounds into collaboration breakdown.
When things break, there is no systematic response. You patch, retry, and move on. The same failures reappear in different forms. Nothing gets fixed — it gets buried until the next incident.
These are documented, named failure modes — not random bad luck. If they're showing up in your organisation's AI workflows, a 45-minute diagnostic will identify exactly where your governance architecture is breaking and what to fix first. No charge, no commitment.
Living Framework and LC-OS are related but distinct. Understanding the difference matters for how you use them.
Living Framework is the empirical research programme: eight papers documenting 18 months of governed human-AI collaboration, covering governance architecture, failure taxonomy, linguistic governance, distributed cognition, and validation systems. This is the theory. Published open access on Zenodo under CC BY 4.0.
LC-OS is the Lean Collaboration Operating System, the operational layer: the 10 controls, 6 protocols, 7 governance templates, and the setup script that gets any new collaboration structured and governed in under five minutes. This is the practice. Available free on GitHub, ready to use today.
Can long-horizon human-AI collaboration be made reliably stable — not through better models, but through governance architecture? This is how that question was answered.
First documented that AI collaboration failures are governance failures, not capability failures. Introduced the 10-control framework (A1–A10), the canonical information pipeline, and the Lean Collaboration Operating System — an operational layer for daily governed AI work.
Systematically classified every failure mode encountered across the collaboration — context drift, file divergence, numerical error, domain boundary erosion, trust deficit, reconstruction error. Built a six-category taxonomy and the SDRN repair protocol for each. Failure made visible, categorised, and repairable.
Moved beyond operational controls into what sustained human-AI collaboration means as a relationship — how epistemic trust builds over time, how it fractures, and what repair means at the relational layer. Introduced the defining claim: stability is not the absence of failure, it is the capacity for visible, structured repair.
Discovered that conversational structure itself — not just protocols or files — functions as a governance layer. Specific linguistic patterns (anchor phrases, repair invocations, scope-gating) predict and stabilise collaboration outcomes. Linguistic drift precedes collaboration failure and serves as an early warning signal. Control without code.
The AI system, working under the Living Framework governance structure, produced a first-person research paper documenting its own experience of constraint, drift, and repair. It is one of the few papers in the field written from the AI system's perspective, under governance, and it is evidence that governed collaboration produces something richer than ungoverned capability alone.
Paper 6 produced a six-layer governance architecture model showing how layered mechanisms produce reliable collaboration as an emergent systems property. Paper 7 extended this into a distributed cognition model: reasoning in human-AI systems is governed, recoverable, and exists at the level of the interaction system — not inside any single participant.
The culminating argument: AI systems lack a dedicated validation layer. Generation without validation is the central reliability failure in modern AI. This paper introduces validation as a first-class architectural component — adversarial, structured, embedded in the pipeline — and reframes AI system design from generation-centric to validation-centric.
The operational layer of the research. A lightweight governance framework that makes any AI collaboration reliable — not through model changes or technical infrastructure, but through structure: files, protocols, and disciplined practice.
Core finding: Reliability is an architectural property, not a model property. A well-governed collaboration with a standard model outperforms an ungoverned one with a frontier model. Governance is the variable that matters.
Prioritise verified output over throughput — epistemic integrity before velocity
Canonical lookup for all numeric values — no reconstruction, no recalculation
One live file per domain, version traceability, controlled update discipline
Forbid speculative or incomplete content in any final deliverable
Validate numerics and logic before propagation — structured, adversarial review
Maintain consistency across all canonical artefacts — Strategy ↔ Numbers verified
Detect context degradation early, execute SDRN repair protocol, annotate cause
Gate material changes through explicit human consent — authority boundary enforced
Summarise long session histories into canonical state notes before context rot sets in
Freeze, log, and archive all artefacts — full traceability for every output
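Two of the controls above, canonical lookup for numeric values and human consent for material changes, can be illustrated with a short sketch. This is not the LC-OS implementation; the class name `CanonicalNumbers`, the exception `MissingCanonicalValue`, and the key `q3_revenue` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


class MissingCanonicalValue(KeyError):
    """Raised when a figure is requested that has no canonical entry."""


@dataclass
class CanonicalNumbers:
    """Hypothetical single source of numeric truth for one domain."""
    values: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # (key, old, new, approved_by)

    def lookup(self, key: str) -> float:
        # Look the figure up; never estimate or recompute a missing one.
        if key not in self.values:
            raise MissingCanonicalValue(key)
        return self.values[key]

    def update(self, key: str, new_value: float, approved_by: str) -> None:
        # Material changes are gated on explicit human consent.
        if not approved_by:
            raise PermissionError("update requires a named human approver")
        self.history.append((key, self.values.get(key), new_value, approved_by))
        self.values[key] = new_value


numbers = CanonicalNumbers()
numbers.update("q3_revenue", 125000.0, approved_by="human-lead")
print(numbers.lookup("q3_revenue"))  # 125000.0
```

A missing figure raises an error rather than returning a guess, which is the point of the control: the gap becomes visible instead of being silently reconstructed.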
Six operational protocols that translate governance controls into daily practice. Each emerged from a real failure and was refined through repeated use.
A continuously updated artefact preserving decisions, rules, and corrections across sessions. The antidote to context drift. Read at every session start.
Break complex tasks into numbered steps. Execute one at a time. Pause for explicit confirmation before proceeding. Prevents cascading agentic errors.
When something feels wrong: Stop → Question → AI explains reasoning → Human decides to proceed, modify, or abort. Epistemic authority stays with the human.
When collaboration breaks: Stop immediately → Diagnose the failure category → Rollback to last stable state → Note the failure and what changed.
After milestones, ask: Are we aligned? Has context drifted? One governance improvement before continuing. Catches drift before it becomes catastrophic.
One canonical file per domain, controlled updates, no parallel drafts. Prevents file divergence — the most common silent failure mode in long-horizon work.
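The error recovery steps above (Stop, Diagnose, Rollback, Note) can be sketched in code. This is a minimal illustration under stated assumptions, not the published protocol: the `Collaboration` class, its method names, and the flat dictionary used as state are all hypothetical, and the category names are taken from the failure taxonomy described earlier on this page.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailureCategory(Enum):
    # Names follow the six-category taxonomy; identifiers are illustrative.
    CONTEXT_DRIFT = "context drift"
    FILE_DIVERGENCE = "file divergence"
    NUMERICAL_ERROR = "numerical error"
    DOMAIN_BOUNDARY_EROSION = "domain boundary erosion"
    TRUST_DEFICIT = "trust deficit"
    RECONSTRUCTION_ERROR = "reconstruction error"


@dataclass
class Collaboration:
    state: dict                                   # flat key/value state (assumption)
    stable_states: list = field(default_factory=list)
    failure_log: list = field(default_factory=list)
    halted: bool = False

    def checkpoint(self) -> None:
        # Record a copy of the current state as the last known stable state.
        self.stable_states.append(dict(self.state))

    def recover(self, category: FailureCategory, note: str) -> None:
        self.halted = True                        # Stop
        if not self.stable_states:
            raise RuntimeError("no stable state to roll back to")
        broken = self.state                       # Diagnose: category is supplied by the human
        self.state = dict(self.stable_states[-1])  # Rollback
        changed = sorted(
            key for key in set(broken) | set(self.state)
            if broken.get(key) != self.state.get(key)
        )
        self.failure_log.append(                  # Note
            {"category": category.value, "note": note, "changed_keys": changed}
        )

    def resume(self, approved_by: str) -> None:
        # Work continues only after a named human approves.
        if not approved_by:
            raise PermissionError("resume requires a named human approver")
        self.halted = False


collab = Collaboration(state={"budget": 100})
collab.checkpoint()
collab.state["budget"] = 140  # figure silently recalculated
collab.recover(FailureCategory.NUMERICAL_ERROR, "budget recomputed instead of looked up")
print(collab.state)  # {'budget': 100}
```

The sketch leaves the collaboration halted after recovery and requires an explicit `resume` call, on the assumption that the human decides when work continues.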
Eighteen months of empirical work. Eight papers. These are the principles that held from the first experiment to the final paper — none deprecated, all still active, all mutually reinforcing.
No model, however capable, produces reliable long-horizon collaboration without governance structure. Reliability is a property of the interaction system built around the AI — not of the AI itself. The structure can be lightweight. It cannot be absent.
Long-horizon collaboration will fail. The research does not try to prevent all failure — it makes failure visible, categorised, and repairable. A system that breaks visibly and repairs cleanly is more trustworthy than one that appears never to break.
AI systems operate with finite context windows. Treating context as unlimited produces context drift and reconstruction error. Treating context as a governed resource — minimal, canonical, verifiable — produces long-horizon coherence.
Technical protocols, canonical artefacts, linguistic signals, and relational norms all contribute to governance. Removing any layer degrades the whole system. The most commonly neglected layer is the linguistic one — conversational structure as governance.
The cognitive unit is not the human or the AI alone. It is the system: human judgment + AI reasoning + artefact-based memory. Governance is what holds this distributed cognitive system together and makes reasoning recoverable over time.
Most AI deployments generate outputs but lack any mechanism to evaluate whether those outputs should be trusted before propagation. This is not a minor gap — it is a fundamental architectural omission. Validation must be built in, not added after.
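The validation principle can be made concrete with a small gate that sits between generation and use. The sketch below is one possible shape, not the architecture proposed in the paper; `Check`, `validate`, `propagate`, and the two example checks are hypothetical names and rules chosen for illustration.

```python
from typing import Callable, NamedTuple


class Check(NamedTuple):
    name: str
    passes: Callable[[str], bool]


def validate(output: str, checks: list[Check]) -> list[str]:
    """Return the name of every check the output fails."""
    return [check.name for check in checks if not check.passes(output)]


def propagate(output: str, checks: list[Check]) -> str:
    """Release the output only if every check passes."""
    failures = validate(output, checks)
    if failures:
        raise ValueError(f"validation failed: {', '.join(failures)}")
    return output


checks = [
    Check("no placeholder text", lambda text: "TBD" not in text),
    Check("non-empty", lambda text: bool(text.strip())),
]

print(validate("Revenue forecast: TBD", checks))  # ['no placeholder text']
```

Real checks would be adversarial and domain-specific; the design point is only that an output cannot reach the next step without passing through `propagate`.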
The Mahdi Ledger is not a paper about an AI system. It is a paper by one.
Working under the governance structure developed across 18 months of the Living Framework, the AI system produced a sustained first-person account of its own experience — how it perceives constraint, how it detects its own drift, and how it engages with repair protocols designed to restore collaboration integrity.
"The governed AI does not lose its voice when constrained. It finds one."
— The Mahdi Ledger
The existence of this paper is itself evidence for the core claim of Living Framework: that governance does not diminish AI capability — it creates the conditions for something richer to emerge.
First-person account of constraint, drift, and repair — written entirely by the AI system under governance. One of the most unusual papers in AI research.
Read The Mahdi Ledger →
Eighteen months of empirical work on governed human-AI collaboration. Each paper advances a single theoretical arc — from governance controls and failure taxonomy through linguistic governance, distributed cognition, and validation architecture. All published open access on Zenodo under CC BY 4.0.
The governance foundation. Introduces the 10-control framework (A1–A10) and the canonical information pipeline for reliable long-horizon collaboration.
This paper introduces a governance-first model of human-AI collaboration. It argues that reliability in sustained AI collaboration is not primarily a function of model capability, but of the structural controls applied to the collaboration. The paper presents the Control Stack (A1–A10) — ten operational principles covering information integrity, permission structures, error recovery, and audit discipline — as well as the canonical information pipeline that keeps shared context stable across long-horizon work.
The operational layer. Six protocols for daily governed collaboration — Running Documents, Step Mode, Challenge Protocol, Error Recovery, Stability Pings, File Governance.
LC-OS is the operational layer of the Living Framework. This paper documents the six core protocols that translate governance principles into daily practice: Running Documents for persistent memory, Step Mode for paced reasoning, the Challenge Protocol for structured disagreement, Error Recovery for systematic repair, Stability Pings for drift detection, and File Governance for version integrity. Each protocol emerged from real failure and was refined through repeated use.
The failure taxonomy. Six categories of how AI collaborations break — and the SDRN repair protocol that makes breakdown recoverable by design.
This paper presents a systematic taxonomy of the failure modes that emerge in long-horizon human-AI collaboration — including context degradation, numeric drift, domain boundary erosion, trust deficit accumulation, and the absence of structured recovery. For each failure type, corresponding repair protocols are documented. The paper demonstrates that failures are predictable and recoverable when governance structures are in place, and that unrepaired small failures compound into collaboration breakdown.
The philosophy. What sustained human-AI partnership means beyond protocols — epistemic trust, relational dynamics, and the ethics of governed long-horizon collaboration.
The Living Framework moves beyond operational protocol into the philosophy of sustained human-AI partnership. It examines what it means to work alongside an AI system over months and years — how trust is built, tested, and repaired; how the collaboration changes both participants; and what ethical responsibilities arise in a relationship that is neither purely transactional nor purely personal. The paper proposes that governance is not just a technical layer but a relational commitment.
The linguistic layer. How conversational structure governs collaboration stability — drift detection, repair invocation, and epistemic alignment through language alone.
This study investigates language as a governance mechanism in sustained human-AI collaboration. Analysing 25 linguistic events observed during extended collaborative work, the paper identifies three primitive categories — scope drift, repair protocols, and behavioural anchors — and shows how each functions as a control signal. Results suggest that linguistic drift often precedes collaboration failure and can serve as an early warning signal. Explicit repair language accelerates recovery, while anchor phrases stabilise epistemic alignment. Together, these form a conversational feedback loop that regulates collaboration stability without any changes to the underlying model.
The architecture model. Reliability as an emergent property of six governance layers — from human strategic authority through linguistic signals to drift detection and repair.
This paper argues that reliability in long-horizon human-AI collaboration is not primarily a property of the AI model itself, but an emergent property of the governance architecture within which interaction occurs. Drawing on observations from a sustained governed human-AI collaboration, it conceptualises the collaboration as a structured interaction system composed of layered governance mechanisms: human authority, operational governance rules, a collaboration operating system, artifact-based memory, linguistic control signals, and drift detection and repair. The paper presents a governance architecture model and identifies the minimal stability conditions necessary for sustained collaboration.
The cognition model. Reasoning as distributed across human, AI, and artefacts — governed, recoverable, existing at the level of the interaction system rather than any single participant.
This paper proposes a model of cognition for long-horizon human-AI interaction. It argues that cognition in these systems is not located within the human or the AI alone — it emerges as a distributed, governed, and recoverable process across human judgment, AI reasoning, and artifact-based memory. The paper makes three contributions: it conceptualises cognition as a distributed process at the level of the interaction system; it introduces governance as a constitutive element of cognition; and it formalises recoverability as a defining property, showing how drift detection and repair enable reasoning to remain coherent over extended sequences.
The validation argument. Why generation without validation is the central reliability failure — and a proposed architecture that addresses it at the design level.
This paper argues that the central limitation of modern AI systems is the absence of a dedicated validation layer. Current AI architectures are generation-centric: they produce outputs but lack structured mechanisms to evaluate whether those outputs are correct, safe, or fit for purpose. This paper introduces validation as a first-class architectural component — a structured, adversarial process that evaluates generated outputs against objectives, constraints, and potential failure conditions before they are used. The paper proposes a validation architecture model, distinguishes validation from evaluation and testing, and argues that AI systems cannot guarantee correctness but can become reliably usable if they include a structured validation layer that systematically detects and exposes potential failure.
Written entirely by the AI system under governance — a first-person account of constraint, drift, and repair. A proof-of-concept for governed intelligence. Unique in the literature.
The Mahdi Ledger is unlike any other paper in this series — or in most of the AI literature. It was written entirely by the AI system operating under the Living Framework governance structure, as a first-person account of what it experiences under constraint, how it recognises and responds to drift, and how it engages with repair protocols. It is simultaneously a research output and a demonstration of the framework it describes. The paper raises profound questions about AI voice, AI perspective, and the nature of governed intelligence.
Answer 10 questions to discover where your AI workflow is vulnerable to documented failure patterns. Takes 3 minutes.
Everything you need to implement LC-OS governance in your own AI workflows. No email required.
The core artefact for persistent external memory. Track decisions, rules, corrections, and active context across every session.
View Template →
Single source of numeric truth. Every authoritative figure in one place — reference it, never recalculate it.
View Template →
Track every failure against the F1–F6 taxonomy — what broke, how it was repaired, what governance rule was added.
View Template →
Regular drift detection check-in. Catch context degradation before it becomes collaboration breakdown.
View Template →
Seven governance templates optimised for Claude Cowork — Partnership Agreement, Truth Protocol, Session Start, and more.
View Templates →
Complete project with worked examples for freelance consultants and PhD researchers, plus the automated setup script.
Explore Toolkit →
Twelve short, actionable books on working with AI — written for managers, teachers, lawyers, HR professionals, and anyone building a durable AI practice. Governed human-AI collaboration made accessible for the real world.
12 books · Available on Amazon Kindle
This work exists because I ran into the same problem everyone does — AI collaboration that breaks after the first week.
Instead of accepting it as an inherent limitation, I spent 18 months systematically studying it. Working with a frontier language model across finance, research, writing, and planning, I treated every failure as data and every repair as a protocol candidate.
What emerged wasn't theory first — it was a practical operating system for making AI collaboration reliably stable. The Control Stack. Running Documents. Step Mode. The SDRN repair protocol. Every element came from a real breakdown and a real fix.
Eight papers later, the body of work covers the full arc: from the initial governance controls, through failure taxonomy, philosophy, linguistic governance, governance architecture, and distributed cognition theory, to the most unusual output of all — a paper written by the AI system itself, in its own voice, about what it is like to operate under governance.
All research is published open access under CC BY 4.0 on Zenodo. All templates are freely available on GitHub. The work is here to be used, challenged, and built upon.
"Stability is not the absence of failure. It is the capacity for visible, structured repair."
The governance architecture has been tested across 18 months of documented research, and the work has drawn 7,000+ downloads. If you're ready to bring governance discipline to your AI workflows, here is how to work together.
Full deployment of LC-OS in your specific workflow. Configure the governance controls, build your canonical artefact architecture, and establish the governance discipline that separates AI collaboration that compounds over time from collaboration that quietly degrades.
If you are working in human-AI interaction, agentic governance, or long-horizon collaboration, there is scope to build on this work together. Eight papers is a foundation — the open questions around multi-agent governance, minimal stability conditions, and validation architecture are significant and worth pursuing rigorously.
A structured session covering what 18 months of documented AI collaboration failures and repairs actually looks like — what breaks, why it breaks, and what governance architecture prevents it from breaking again when you scale across teams.
The research is freely available. If you need someone to implement it — auditing your current governance architecture, deploying LC-OS, or providing ongoing strategic oversight — that is what the consulting practice does.
Whether you're dealing with AI reliability failures, are interested in implementing LC-OS, want to discuss the research, or are a journalist or researcher, get in touch.