Building Verifiable AI: Complete Systems (2026)
Part 14 of the Verifiable AI Architecture series - By Elyas Karbouch
**Article Id:** 14
**Series Position:** 14
**Series Name:** Verifiable AI Architecture
**Title:** Building Verifiable AI: Complete Systems (2026)
**Slug:** building-verifiable-ai-complete-systems-2026
**Status:** Published
**Published:** 2026-04
**Description:**
Articles 11-13 introduced the layers. Article 14 runs them together. PageIndex retrieves structure. Knowledge graphs traverse explicit relationships. Ontologies enforce domain rules. The result: AI decisions that are safe, auditable, and explainable by design — not by accident.
**Summary:**
The complete pipeline from document to verified decision. Two full walkthroughs — healthcare prescription safety and legal contract analysis — showing exactly how structure, relationships, and constraints work together to eliminate multi-hop hallucination.
**Author:** Elyas Karbouch
**Author Url:** https://karbouch.substack.com
**OG Title:** What verifiable AI looks like end to end — two full walkthroughs
**OG Description:**
PageIndex + knowledge graphs + ontologies as one pipeline. Healthcare prescription safety and legal contract analysis, step by step. Every decision traceable. Every constraint visible. Every hop auditable.
**Canonical:** https://karbouch.substack.com/p/building-verifiable-ai-complete-systems-2026
**Twitter Creator:** @elyaskarbouch
**Tags:**
- verifiable-AI
- multi-hop-reasoning
- PageIndex
- knowledge-graphs
- ontologies
- healthcare-AI
- legal-AI
- EU-AI-Act
- 2026-architecture
**Keywords (Primary):** Verifiable AI complete system PageIndex knowledge graph ontology 2026
**Keywords (Secondary):**
- verifiable AI architecture walkthrough
- healthcare AI prescription safety pipeline
- legal AI contract analysis knowledge graph
- multi-hop reasoning production system
- EU AI Act compliant AI architecture
- auditable AI decision pipeline
- hallucination-free AI system
**Word Count:** 3,418
**Reading Time (Minutes):** 13
**GitHub Repo:** https://github.com/elyas-karbouch/vaa-core
**External Links:**
- [EU AI Act Full Text](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689)
- [GDPR Article 22](https://gdpr-info.eu/art-22-gdpr/)
- [Neo4j Documentation](https://neo4j.com/docs/)
**Previous Article:**
- [Ontologies as Safety Constraints: How Domain Rules Prevent Multi-Hop Hallucinations](https://karbouch.substack.com/p/ontologies-safety-constraints-2026)
Building Verifiable AI: Complete Systems (2026)
Part 14 of the Verifiable AI Architecture series
The Three Layers
Articles 11 through 13 each introduced one layer of the architecture:
Article 11 showed that retrieval method determines what reasoning is possible. Chunking-based RAG destroys the relationships needed for multi-hop queries — relationships become implicit, context is lost, and confidence scores detach from correctness. PageIndex preserves document structure: complete sections, with metadata, in hierarchy. Structure is the prerequisite for everything that follows.
Article 12 showed that knowledge graphs make every relationship explicit. Instead of asking an LLM to infer “Warfarin is an anticoagulant” from text — a step that fails 15% of the time, and compounds with every additional hop — the relationship is a graph link. Each hop is a deterministic query. The full traversal chain is naturally auditable because every hop was a logged query, not a probabilistic generation step.
Article 13 showed that ontologies validate what the knowledge graph contains. A knowledge graph without an ontology can store anything — including wrong drug classes from LLM extraction errors, phantom entities from typos, incomplete obligations from ambiguous contract language. Ontologies enforce type safety, relationship constraints, cardinality rules, and state-dependent logic at ingestion time. Data that fails validation never reaches the graph. The graph contains only what the ontology permits.
Three layers. Each solves a specific class of failure. Together, they form a pipeline where every failure mode from Article 11’s multi-hop capability matrix is addressed.
Now let them run together.
The Complete Pipeline
The architecture is a sequential pipeline. Each stage feeds the next:
Stage 1: Document input
→ Raw document: PDF, clinical note, contract, filing
Stage 2: PageIndex retrieval
→ Identifies relevant sections by structure
→ Returns complete section with metadata (page, heading, hierarchy)
→ No chunks — full sections only
Stage 3: Entity extraction
→ LLM extracts entities from the retrieved section
→ Bounded extraction: schema is known, section is complete
→ Output: structured entities with properties
Stage 4: Ontology validation (ingestion)
→ Each extracted entity checked against domain ontology
→ Type check, relationship check, cardinality check, constraint check
→ Invalid entities rejected and flagged for human review
→ Valid entities linked to knowledge base
Stage 5: Knowledge graph traversal (multi-hop)
→ Patient or document graph built from validated entities
→ Multi-hop query executed: hop-by-hop, each result deterministic
→ Confidence scores carried through each hop
Stage 6: Ontology constraint check (query)
→ Final conclusion checked against domain rules
→ State constraints applied (patient age, renal function, filing type)
→ Result: SAFE / UNSAFE / CONSISTENT / INCONSISTENT / FLAG
Stage 7: Decision + reasoning chain
→ Decision returned with full reasoning chain
→ Every hop cited: entity, relationship, confidence, source, date
→ Every constraint cited: rule ID, outcome, version
Stage 8: Compliance log
→ Full decision record written to append-only audit log
→ decision_id, timestamp, user, input, output, reasoning_chain
→ Queryable: explain(decision_id) returns full trail
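As a minimal sketch, the eight stages can be wired together in a few lines of Python. Everything here is illustrative: the function names, the toy in-memory data, and the collapsed hop (Ibuprofen linked straight to Antiplatelet, skipping the NSAID parent-class hop) are assumptions for brevity, not the actual repo API.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    verdict: str
    reasoning_chain: list
    constraints_fired: list

def retrieve_sections(doc):          # Stage 2: PageIndex stand-in, full sections only
    return doc["sections"]

def extract_entities(sections):      # Stage 3: bounded extraction stand-in
    return [e for s in sections for e in s["entities"]]

def validate(entities):              # Stage 4: ontology stand-in, only typed entities pass
    return [e for e in entities if "type" in e]

def traverse(entities):              # Stage 5: one explicit, logged hop per entity
    return [{"hop": f"{e['name']}.drug_class", "result": e["cls"], "confidence": 1.0}
            for e in entities]

def check_constraints(hops):         # Stage 6: fire on a known-bad class pairing
    classes = {h["result"] for h in hops}
    return ["anticoag_antiplatelet_v3"] if {"Anticoagulant", "Antiplatelet"} <= classes else []

def run_pipeline(doc):               # Stage 7: verdict plus full reasoning chain
    hops = traverse(validate(extract_entities(retrieve_sections(doc))))
    fired = check_constraints(hops)
    return Decision("UNSAFE" if fired else "SAFE", hops, fired)

doc = {"sections": [{"entities": [
    {"name": "Warfarin",  "type": "Drug", "cls": "Anticoagulant"},
    {"name": "Ibuprofen", "type": "Drug", "cls": "Antiplatelet"},
]}]}
```

Calling `run_pipeline(doc)` returns an UNSAFE verdict with both hops and the fired constraint attached: the decision and its evidence are one object, not two systems.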
This is the architecture. Now run it twice — once for healthcare, once for legal.
Healthcare Walkthrough: Prescription Safety Decision
Scenario: A 71-year-old patient is admitted. The attending physician wants to prescribe Ibuprofen for post-operative pain management. The clinical decision support system must assess safety before the prescription is issued.
Stage 1: Document input
The patient’s electronic health record is a structured document. It contains multiple sections: demographics, current medications, allergies, recent lab results, active conditions.
Stage 2: PageIndex retrieval
PageIndex identifies the relevant sections by structure:
Section retrieved: "Current Medications"
→ Warfarin 5mg daily (anticoagulation, AFib)
→ Atorvastatin 10mg daily (cholesterol management)
→ Metformin 500mg twice daily (type 2 diabetes)
Page: 3, heading: "Current Medications", parent: "Medical History"
Section retrieved: "Documented Allergies"
→ Penicillin (reaction: anaphylaxis, confirmed 2019)
Page: 3, heading: "Documented Allergies", parent: "Medical History"
Section retrieved: "Recent Lab Results"
→ eGFR: 58 mL/min/1.73m² (mild-to-moderate renal impairment)
→ INR: 2.4 (within therapeutic range for Warfarin)
Page: 4, heading: "Recent Lab Results", parent: "Medical History"
All three sections retrieved as complete units. No missing context.
Stage 3: Entity extraction
From the retrieved sections:
Patient P-2026-0071:
age: 71
medications: [Warfarin 5mg/day, Atorvastatin 10mg/day, Metformin 500mg twice/day]
allergies: [Penicillin — anaphylaxis]
renal_function: eGFR 58 → classification: mild_to_moderate_impairment
inr: 2.4 (therapeutic)
Proposed drug: Ibuprofen 400mg as needed
Stage 4: Ontology validation at ingestion
Each entity checked:
Warfarin → Drug entity: VALID
drug_class link: Anticoagulant — found in KB
dose 5mg: within typical range 1–10mg — VALID
frequency: daily — VALID
Ibuprofen → Drug entity: VALID
drug_class link: NSAID — found in KB
NSAID parent_class: Antiplatelet — found in KB
Patient.renal_function = mild_to_moderate_impairment
→ State constraint activated: enhanced NSAID review required
(NSAIDs reduce renal blood flow, worsen impairment)
All entities valid. State constraint flagged. Proceed.
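A minimal sketch of this ingestion gate, assuming a tiny hand-written ontology slice. The class names and dose ranges here are illustrative, not clinical reference data.

```python
# Hypothetical ontology slice: known drugs, their classes, typical dose ranges.
ONTOLOGY = {
    "Warfarin":  {"class": "Anticoagulant", "dose_mg": (1, 10)},
    "Ibuprofen": {"class": "NSAID",         "dose_mg": (200, 800)},
}

def validate_drug(name: str, dose_mg: float):
    """Return (valid, reason). Entities that fail never reach the graph."""
    entry = ONTOLOGY.get(name)
    if entry is None:
        # Unknown entity: possible extraction typo, route to human review
        return False, f"unknown entity '{name}', flag for review"
    lo, hi = entry["dose_mg"]
    if not lo <= dose_mg <= hi:
        return False, f"dose {dose_mg}mg outside typical range {lo}-{hi}mg"
    return True, "valid"
```

`validate_drug("Warfarin", 5)` passes; `validate_drug("Warfrin", 5)` is rejected at the gate, which is exactly how a phantom-entity typo is stopped before it can create a broken graph node.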
Stage 5: Knowledge graph traversal
Multi-hop query: “Is Ibuprofen safe for Patient P-2026-0071?”
Hop 1: Patient.medications → [Warfarin, Atorvastatin, Metformin]
Result: three active drugs confirmed
Hop 2: Warfarin.drug_class → Anticoagulant
Result: explicit KG link, confidence 1.00
Hop 3: Ibuprofen.drug_class → NSAID
NSAID.parent_class → Antiplatelet
Result: explicit KG links, confidence 1.00
Hop 4: Anticoagulant.contraindicated_with → Antiplatelet?
Result: YES — confidence 0.99
Source: FDA Drug Safety Communication 2024-03
Evidence level: strong (multiple RCTs)
Hop 5: Patient.renal_function = mild_to_moderate_impairment
NSAID.renal_risk = elevated (reduces renal blood flow)
Result: additional risk confirmed, confidence 0.95
Source: Clinical Pharmacology Guideline 2025-11
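Each hop above is a lookup, not a generation. A minimal sketch, assuming the explicit edges live in a plain dict; a production system would use a graph database such as Neo4j.

```python
# Toy edge store: (subject, relation) -> (object, confidence, source).
EDGES = {
    ("Warfarin",      "drug_class"):           ("Anticoagulant", 1.00, "KB"),
    ("Ibuprofen",     "drug_class"):           ("NSAID",         1.00, "KB"),
    ("NSAID",         "parent_class"):         ("Antiplatelet",  1.00, "KB"),
    ("Anticoagulant", "contraindicated_with"): ("Antiplatelet",  0.99, "FDA DSC 2024-03"),
}

def hop(subject, relation):
    """One deterministic traversal step: a lookup, never a guess."""
    return EDGES[(subject, relation)]

def traverse(path):
    """Follow a hop sequence, logging each step and multiplying confidence."""
    chain, conf = [], 1.0
    subject = path[0]
    for relation in path[1:]:
        obj, c, src = hop(subject, relation)
        chain.append({"hop": f"{subject}.{relation} -> {obj}",
                      "confidence": c, "source": src})
        conf *= c
        subject = obj
    return chain, conf
```

`traverse(["Warfarin", "drug_class", "contraindicated_with"])` walks Warfarin to Anticoagulant to Antiplatelet and returns the logged chain with combined confidence 0.99. A missing edge raises `KeyError` instead of producing a plausible answer, which is the point.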
Stage 6: Ontology constraint check
Constraint 1: Anticoagulant + Antiplatelet contraindication
→ VIOLATED — Warfarin (Anticoagulant) + Ibuprofen (Antiplatelet class)
→ Severity: HIGH
Constraint 2: NSAID + renal impairment state constraint
→ ACTIVATED — Patient eGFR 58, NSAID increases renal risk
→ Severity: MODERATE
Constraint 3: Age-based enhanced review
→ ACTIVATED — Patient age 71, anticoagulant + NSAID in elderly = HIGH RISK
→ Severity: HIGH
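The three constraints can be sketched as plain predicate rules. The numeric thresholds (eGFR below 60 for impairment, age 65 and over for the elderly flag) are illustrative assumptions; the rule IDs match the ones in the compliance log below.

```python
def check_constraints(patient: dict, drug_classes: set):
    """Evaluate domain rules; return (rule_id, severity) for each that fires."""
    fired = []
    if {"Anticoagulant", "Antiplatelet"} <= drug_classes:
        fired.append(("anticoag_antiplatelet_v3", "HIGH"))
    if "NSAID" in drug_classes and patient["egfr"] < 60:        # assumed threshold
        fired.append(("nsaid_renal_v2", "MODERATE"))
    if patient["age"] >= 65 and {"Anticoagulant", "NSAID"} <= drug_classes:  # assumed threshold
        fired.append(("elder_review_v2.1", "HIGH"))
    return fired
```

For the walkthrough patient (age 71, eGFR 58) with classes {Anticoagulant, NSAID, Antiplatelet} in play, all three rules fire.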
Stage 7: Decision + reasoning chain
DECISION: DO NOT PRESCRIBE — Ibuprofen contraindicated for Patient P-2026-0071
Reasoning:
1. Patient takes Warfarin (Anticoagulant class)
Source: Patient record, page 3, "Current Medications"
2. Ibuprofen is NSAID (Antiplatelet subclass)
Source: FDA Pharmacopeia, KB entity Ibuprofen
3. Anticoagulant contraindicated with Antiplatelet
Confidence: 0.99 | Source: FDA DSC 2024-03 | Evidence: strong
4. Renal impairment (eGFR 58) increases NSAID risk
Confidence: 0.95 | Source: CPG 2025-11
5. Age 71 — elderly patient flag for anticoagulant + NSAID combinations
Ontology rule: elder_anticoagulant_nsaid_review v2.1
Recommendation: Consider paracetamol (acetaminophen) as alternative.
If NSAID essential: clinical pharmacist review required.
Stage 8: Compliance log
decision_id: CDS-2026-0404-P071-001
timestamp: 2026-04-04 09:14:33 UTC
patient: P-2026-0071 (pseudonymised in external log)
proposed_drug: Ibuprofen 400mg
decision: DO_NOT_PRESCRIBE
constraints_fired: [anticoag_antiplatelet_v3, nsaid_renal_v2, elder_review_v2.1]
reasoning_chain_id: RC-2026-0404-001
retrievable: explain("CDS-2026-0404-P071-001") → full chain
A physician, pharmacist, or regulator can call explain("CDS-2026-0404-P071-001") and receive the complete reasoning chain — every hop, every source, every constraint. This is GDPR Article 22 compliance in practice.
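A minimal sketch of the append-only log and its explain() query. An in-memory dict stands in for what production would put on write-once storage; the class name and record shape are assumptions.

```python
import json
import time

class AuditLog:
    """Append-only decision log: records can be added and read, never rewritten."""
    def __init__(self):
        self._records = {}

    def append(self, decision_id: str, record: dict):
        if decision_id in self._records:
            raise ValueError("append-only: decision_id already logged")
        self._records[decision_id] = {**record, "logged_at": time.time()}

    def explain(self, decision_id: str) -> str:
        # Full decision record, reasoning chain included, as queryable JSON
        return json.dumps(self._records[decision_id], indent=2)
```

`log.explain("CDS-2026-0404-P071-001")` returns the full record; a second `append` under the same ID raises instead of silently overwriting, so the trail a regulator reads is the trail that was written.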
Legal Walkthrough: Contract Obligation Analysis
Scenario: A procurement team receives a vendor contract. Legal AI must extract all obligations, identify dependencies, calculate penalty exposure under a delay scenario, and detect any structural ambiguities before the contract is signed. This is exactly the kind of task where chunking-based RAG produces a plausible but incomplete answer — missing a dependency, silently dropping a clause, confident about a penalty calculation that used the wrong baseline.
Stage 1: Document input
The vendor contract arrives as a draft PDF. It contains, among others, three sections relevant to this analysis: deliverables and timeline, penalties and remedies, force majeure.
Stage 2: PageIndex retrieval
Section retrieved: "Deliverables and Timeline"
→ Primary delivery: integrated software platform by 2026-08-01
→ Documentation: technical specs by 2026-07-15 (precedes delivery)
→ Training: end-user training within 14 days of delivery acceptance
Page: 4-5, heading: "Deliverables and Timeline"
Section retrieved: "Penalties and Remedies"
→ Late delivery: 1.5% of contract value per week, max 15%
→ Late documentation: 0.5% of contract value per week, max 5%
→ Training default: withheld payment until resolved
Page: 6, heading: "Penalties and Remedies"
Section retrieved: "Force Majeure"
→ Events covered: natural disaster, war, government action
→ Notice requirement: 48 hours written notice
→ Duration cap: 30 days, after which either party may terminate
Page: 8, heading: "Force Majeure"
Stage 3: Entity extraction
Obligation A: PrimaryDelivery
type: software_delivery
due: 2026-08-01
penalty_rate: 1.5% per week, max 15%
force_majeure_covered: YES
Obligation B: Documentation
type: documentation_delivery
due: 2026-07-15
depends_on: none (independent)
penalty_rate: 0.5% per week, max 5%
force_majeure_covered: YES
Obligation C: Training
type: training_delivery
due: 14 days after PrimaryDelivery acceptance
depends_on: PrimaryDelivery
penalty_type: payment_withheld (not percentage-based)
force_majeure_covered: NOT STATED
Stage 4: Ontology validation
Obligation A: VALID
due_date: present — VALID
penalty_clause: references obligation — VALID
force_majeure: explicit scope — VALID
Obligation B: VALID
due_date: present — VALID
due_date precedes Obligation A — VALID (independent obligation)
Obligation C: FLAGGED
force_majeure_covered: NOT STATED
→ Ontology rule: all obligations must have explicit force_majeure scope
→ Result: AMBIGUITY FLAG — "Force majeure applicability undefined for Training"
→ Recommendation: clarify before signing
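The completeness rule that caught the gap can be sketched as a required-field check; the field names are illustrative.

```python
# Ontology rule: every obligation must state these fields explicitly.
REQUIRED_FIELDS = {"due", "penalty", "force_majeure_covered"}

def validate_obligation(obligation: dict):
    """FLAGGED if any required field is absent: absence is ambiguity, not a default."""
    missing = sorted(REQUIRED_FIELDS - obligation.keys())
    if missing:
        return "FLAGGED", f"ambiguity: {missing} undefined"
    return "VALID", ""
```

The Training obligation, extracted without a force majeure clause, comes back FLAGGED; the other two pass.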
Stage 5: Knowledge graph traversal
Multi-hop query: “What is total penalty exposure if primary delivery is 3 weeks late?”
Hop 1: PrimaryDelivery.delay = 3 weeks
penalty = 3 × 1.5% = 4.5% of contract value
Hop 2: Does Documentation depend on PrimaryDelivery?
Documentation.depends_on = none → independent
Documentation.due = 2026-07-15
PrimaryDelivery.due = 2026-08-01 (delayed to 2026-08-22)
→ Documentation unaffected by primary delay
Hop 3: Does Training depend on PrimaryDelivery?
Training.depends_on = PrimaryDelivery
Training.due = 14 days after acceptance
PrimaryDelivery acceptance delayed 3 weeks
→ Training automatically delayed 3 weeks
→ Penalty type: payment_withheld (not percentage)
→ Risk: payment suspension for duration of training default
Hop 4: Does force majeure apply?
PrimaryDelivery: force_majeure_covered = YES
3-week delay qualifies if notice given within 48 hours
→ If force majeure invoked: penalty waived
→ If not invoked: 4.5% penalty applies
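Hops 1 through 3 reduce to two small calculations: a capped penalty and a single-level dependency cascade. A sketch, with helper names of my own choosing; the cascade deliberately handles only one dependency level, as in this contract.

```python
def penalty_exposure(delay_weeks: float, rate_pct: float, cap_pct: float) -> float:
    """Late-delivery penalty as a percentage of contract value, capped."""
    return min(delay_weeks * rate_pct, cap_pct)

def cascade(delay_weeks: float, obligations: dict) -> dict:
    """Propagate a primary-delivery delay to obligations that depend on it."""
    affected = {"PrimaryDelivery": delay_weeks}
    for name, ob in obligations.items():
        if ob.get("depends_on") in affected:   # one level deep, as in this contract
            affected[name] = delay_weeks
    return affected
```

`penalty_exposure(3, 1.5, 15.0)` gives 4.5; the same call with a 20-week delay hits the 15% cap. The cascade pulls Training in and leaves the independent Documentation obligation untouched.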
Stage 6: Ontology constraint check
Constraint 1: Dependency chain acyclicity
→ PrimaryDelivery → Training → no cycle back
→ VALID
Constraint 2: Force majeure scope completeness
→ Obligation C (Training): force_majeure_covered = NOT STATED
→ AMBIGUITY — cannot determine if penalty waived under force majeure
→ FLAGGED for legal review
Constraint 3: Penalty cap check
→ 4.5% of contract value — below 15% cap
→ WITHIN BOUNDS
Stage 7: Decision + reasoning chain
ANALYSIS COMPLETE — Contract Obligations: 3 identified, 1 flagged
Penalty exposure (3-week primary delay, no force majeure):
PrimaryDelivery: 4.5% of contract value
Documentation: 0% (independent, unaffected)
Training: payment withheld until resolved (duration unknown)
Total financial exposure: 4.5% + payment suspension risk
Ambiguity flagged:
Obligation C (Training) — force majeure applicability undefined
Risk: dispute over penalty waiver if force majeure event occurs
Recommendation: add explicit force majeure clause to Training obligation
before signing
Force majeure mitigation available:
If primary delay caused by qualifying event + 48h notice given:
Primary delivery penalty waived
Training delay penalty: ambiguous (see flag above)
Stage 8: Compliance log
decision_id: LGL-2026-0404-CONTRACT-001
timestamp: 2026-04-04 11:02:17 UTC
document: Vendor-Contract-Acme-2026-Draft.pdf
obligations_extracted: 3
flags_raised: 1 (force_majeure_scope_undefined — Obligation C)
constraints_checked: [dependency_acyclicity_v1, fm_scope_v2, penalty_cap_v1]
reasoning_chain_id: RC-2026-0404-LGL-001
retrievable: explain("LGL-2026-0404-CONTRACT-001") → full chain
The legal team receives a structured analysis: three obligations, one ambiguity, a full penalty exposure calculation under the delay scenario, and a specific recommendation before signing. Every conclusion is sourced to a section and a constraint rule. The force majeure gap — a subtle but material ambiguity — was not visible in the contract text until the ontology required it to be explicit. This is the class of finding that prevents a contract dispute months later, caught in minutes at the signing stage.
Why This Architecture Eliminates Multi-Hop Hallucination
Run either walkthrough on a chunking-based RAG stack and the failure modes are predictable.
Without PageIndex: Sections are chunked. The patient’s medication list, allergy record, and lab results land in separate chunks — often with no guarantee all three are retrieved for a single query. A contraindication check that misses the lab results misses the renal impairment flag entirely. The system returns a confident, incomplete answer. In the legal walkthrough, the deliverables, penalties, and force majeure clauses are in different sections. Chunking may retrieve two of three. The dependency cascade analysis is built on incomplete data.
Without knowledge graph: The LLM must infer “Warfarin is an anticoagulant” from text. Then infer “Ibuprofen is an NSAID and therefore antiplatelet.” Then infer the contraindication rule applies. The healthcare query took five hops; five inference steps at ~85% accuracy each give 0.85^5 ≈ 44% chain accuracy. The system generates a plausible-sounding answer with high confidence and a 56% probability of being wrong. For the legal walkthrough, the LLM must infer which obligations depend on which, reading across separate clauses that may use different phrasing. Dependency detection from text is fragile — a small variation in contract language breaks it.
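The compounding arithmetic is worth making concrete:

```python
def chain_accuracy(per_step: float, steps: int) -> float:
    """Probability that every inference step in a chain is correct."""
    return per_step ** steps

# 0.85 per step over five steps: barely better than a coin flip
print(round(chain_accuracy(0.85, 5), 2))   # 0.44
```

Replacing any inference step with a deterministic graph hop sets that factor to 1.0, which is why the traversal chain keeps its confidence while the text-inference chain loses it.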
Without ontology: The force majeure gap in the Training obligation is never caught. Not a structural error — a missing field. Only an ontology that requires explicit force majeure scope for all obligations would catch it. A KG without that rule stores the incomplete obligation as complete. The legal team signs the contract with an unresolved ambiguity that surfaces only during a dispute. In the healthcare walkthrough, an LLM extraction typo (“Warfrin” instead of “Warfarin”) creates a phantom drug node. The contraindication check never fires because the entity link is broken.
The architecture prevents hallucination at every level:
| Failure mode | Prevented by | How |
|-------------------------------------|------------------|------------------------------------|
| Missing context (sections split) | PageIndex | Complete sections, not chunks |
| Implicit relationship inference | Knowledge graph | Explicit links, deterministic hops |
| Wrong entity class (LLM error) | Ontology | Type checking at ingestion |
| Typos, phantom entities | Ontology | Entity linking validation |
| Missing required fields | Ontology | Cardinality + completeness rules |
| Confidence without correctness | Ontology + KG | Constraint checking before return |
| Unexplainable decision | Reasoning chain | Every hop logged, every rule cited |
These are not separate fixes applied to separate problems. They are one integrated architecture. Remove any layer and the failure modes return — not occasionally, but systematically.
Compliance Is Structural, Not Added On
There is a common pattern in how organisations approach AI compliance: build the system first, add compliance later. Audit logging is bolted on. Explainability modules are added after complaints. Transparency reports are generated by a separate process that reads the main system’s outputs.
This approach is fragile. The compliance layer is always one step behind the system. When a regulator asks “why did your AI make this decision?”, the answer depends on whether the logging captured enough data, whether the explainability module was running, whether the audit trail is complete. Often it isn’t.
The architecture in this article takes the opposite approach: compliance is structural. The reasoning chain exists because the pipeline produces it — not because a module was added to capture it. The audit log is complete because every stage logs its output. Explainability is possible because every hop was a query with a recorded result.
Every compliance requirement across the major frameworks maps directly to a pipeline stage:
| Regulation | Requirement | Pipeline stage |
|------------|-------------|----------------|
| GDPR Art. 22 | Right to explanation of automated decisions | Stage 7 reasoning chain + Stage 8 audit log |
| EU AI Act Art. 10 | Accurate, complete, relevant data quality | Stage 4 ontology validation at ingestion |
| EU AI Act Art. 13 | System transparency and explainability | Stage 7 full chain + explain() function |
| HIPAA Security Rule | Patient data integrity | Stage 4 entity validation + ontology type safety |
| SOX 404 | Documented internal controls over financial reporting | Ontology constraint log + versioned rules |
The key observation: compliance is not a post-processing step. It is not a report generated after the fact. It is a structural property of the pipeline. The system is compliant because it is built this way — every decision is explainable because the reasoning chain exists, not because someone added a reporting module.
This is what “verifiable AI” means. Not AI that produces a compliance report on request. AI where verifiability is built into the architecture from the first hop.
What Comes After This Architecture
The pipeline described in this article is a foundation, not a ceiling. Once you have structured retrieval, explicit relationships, and validated constraints working together, more ambitious capabilities become possible.
Multi-document reasoning. The walkthroughs in this article each operated on a single document. But real decisions often span multiple sources: a patient record, a drug formulary, a clinical guideline, a recent lab report. The same pipeline applies across documents — PageIndex retrieves from each, the KG links entities across sources, the ontology validates relationships regardless of origin. The reasoning chain grows to include cross-document hops.
Agentic systems. An agent that can query knowledge graphs, check ontology constraints, and retrieve structured sections is an agent that can reason autonomously over complex domains. The architecture described here is precisely what makes agentic reasoning safe — not just fast. An agent operating on this foundation cannot hallucinate a drug class it doesn’t know, cannot invent a contract dependency that doesn’t exist, because the graph and ontology will reject it.
Continuous validation. Knowledge bases are not static. Drug interactions are updated. Accounting standards change. Regulations are amended. The pipeline can be extended to run continuous validation: when the knowledge base is updated, re-run constraint checks across affected decisions, flag any that would produce a different result under the new rules, and surface them for review. Verifiability is then not just historical — it is ongoing.
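A sketch of what that re-validation loop could look like, assuming past decisions store the facts they were based on. The record shape and rule format are assumptions.

```python
def revalidate(decisions, constraints):
    """Re-run current rules over past decisions; report any verdict that flips."""
    flipped = []
    for d in decisions:
        fired = [c["id"] for c in constraints if c["check"](d["facts"])]
        new_verdict = "UNSAFE" if fired else "SAFE"
        if new_verdict != d["verdict"]:
            flipped.append((d["id"], d["verdict"], new_verdict))
    return flipped
```

When a knowledge base update adds a new contraindication rule, a decision that was SAFE under the old rules surfaces for review as `(decision_id, "SAFE", "UNSAFE")` rather than staying silently stale.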
What This Series Has Built
This series started with a thesis: hallucination is an ingestion problem, not a model problem.
Articles 1-5 established the foundation: structure-aware ingestion (Docling), token-efficient serialisation (TOON), hybrid storage with vector + graph + relational databases.
Articles 6-10 built the middle layer: PageIndex for structure-preserving retrieval, compliance frameworks for legal context, and the epistemological case for why AI must show its reasoning — and why experts need it to.
Articles 11-14 completed the reasoning layer: why retrieval architecture determines what reasoning is possible, how knowledge graphs enable safe multi-hop traversal, how ontologies enforce domain rules, and how the three layers work together in production.
The result is an architecture where:
Documents are ingested with structure preserved
Sections are retrieved as complete units, not scattered chunks
Relationships are stored explicitly, not inferred from text
Domain rules constrain what the system can conclude before it concludes it
Every decision carries a full reasoning chain — every hop, every source, every constraint
Compliance requirements are satisfied by design, not by reporting
This is verifiable AI. Not a model that tries harder. Not a prompt that adds “please be careful.” A system built so that being right is the default — and showing its work is unavoidable.
Build structure. Build relationships. Build constraints. Build trust.
End of the Verifiable AI Architecture series — Articles 1-14


If you made it to Article 14: you now understand why most AI systems that fail in production do not fail because the model is bad. They fail because the data pipeline is unstructured, the retrieval is chunked, and nobody checked whether the relationships the system relies on are actually correct. That is fixable. The architecture in this series is how you fix it.
If you are building in healthcare, finance, or legal — and you are still using chunking-based RAG for multi-hop decisions — this series is the argument for changing that before August 2, 2026.