Building Verifiable AI: Complete Systems (2026)
Part 14 of the Verifiable AI Architecture series - By Elyas Karbouch
**Article Id:** 14
**Series Position:** 14
**Series Name:** Verifiable AI Architecture
**Title:** Building Verifiable AI: Complete Systems (2026)
**Slug:** building-verifiable-ai-complete-systems-2026
**Status:** Published
**Published:** 2026-04
**Description:**
Articles 11-13 introduced the layers. Article 14 runs them together. PageIndex retrieves structure. Knowledge graphs traverse explicit relationships. Ontologies enforce domain rules. The result: AI decisions that are safe, auditable, and explainable by design — not by accident.
**Summary:**
The complete pipeline from document to verified decision. Two full walkthroughs — healthcare prescription safety and legal contract analysis — showing exactly how structure, relationships, and constraints work together to eliminate multi-hop hallucination.
**Author:** Elyas Karbouch
**Author Url:** https://karbouch.substack.com
**OG Title:** What verifiable AI looks like end to end — two full walkthroughs
**OG Description:**
PageIndex + knowledge graphs + ontologies as one pipeline. Healthcare prescription safety and legal contract analysis, step by step. Every decision traceable. Every constraint visible. Every hop auditable.
**Canonical:** https://karbouch.substack.com/p/building-verifiable-ai-complete-systems-2026
**Twitter Creator:** @elyaskarbouch
**Tags:**
- verifiable-AI
- multi-hop-reasoning
- PageIndex
- knowledge-graphs
- ontologies
- healthcare-AI
- legal-AI
- EU-AI-Act
- 2026-architecture
**Keywords (Primary):** Verifiable AI complete system PageIndex knowledge graph ontology 2026
**Keywords (Secondary):**
- verifiable AI architecture walkthrough
- healthcare AI prescription safety pipeline
- legal AI contract analysis knowledge graph
- multi-hop reasoning production system
- EU AI Act compliant AI architecture
- auditable AI decision pipeline
- hallucination-free AI system
**Word Count:** 3,418
**Reading Time (Minutes):** 13
**GitHub Repo:** https://github.com/elyas-karbouch/vaa-core
**External Links:**
- [EU AI Act Full Text](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689)
- [GDPR Article 22](https://gdpr-info.eu/art-22-gdpr/)
- [Neo4j Documentation](https://neo4j.com/docs/)
**Previous Article:**
- [Ontologies as Safety Constraints: How Domain Rules Prevent Multi-Hop Hallucinations](https://karbouch.substack.com/p/ontologies-safety-constraints-2026)
Building Verifiable AI: Complete Systems (2026)
Part 14 of the Verifiable AI Architecture series
The Three Layers
Articles 11 through 13 each introduced one layer of the architecture:
Article 11 showed that retrieval method determines what reasoning is possible. Chunking-based RAG destroys the relationships needed for multi-hop queries — relationships become implicit, context is lost, and confidence scores detach from correctness. PageIndex preserves document structure: complete sections, with metadata, in hierarchy. Structure is the prerequisite for everything that follows.
Article 12 showed that knowledge graphs make every relationship explicit. Instead of asking an LLM to infer “Warfarin is an anticoagulant” from text — a step that fails 15% of the time, and compounds with every additional hop — the relationship is a graph link. Each hop is a deterministic query. The full traversal chain is naturally auditable because every hop was a logged query, not a probabilistic generation step.
Article 13 showed that ontologies validate what the knowledge graph contains. A knowledge graph without an ontology can store anything — including wrong drug classes from LLM extraction errors, phantom entities from typos, incomplete obligations from ambiguous contract language. Ontologies enforce type safety, relationship constraints, cardinality rules, and state-dependent logic at ingestion time. Data that fails validation never reaches the graph. The graph contains only what the ontology permits.
Three layers. Each solves a specific class of failure. Together, they form a pipeline where every failure mode from Article 11’s multi-hop capability matrix is addressed.
Now let them run together.
The Complete Pipeline
The architecture is a sequential pipeline. Each stage feeds the next:
Stage 1: Document input
→ Raw document: PDF, clinical note, contract, filing
Stage 2: PageIndex retrieval
→ Identifies relevant sections by structure
→ Returns complete section with metadata (page, heading, hierarchy)
→ No chunks — full sections only
Stage 3: Entity extraction
→ LLM extracts entities from the retrieved section
→ Bounded extraction: schema is known, section is complete
→ Output: structured entities with properties
Stage 4: Ontology validation (ingestion)
→ Each extracted entity checked against domain ontology
→ Type check, relationship check, cardinality check, constraint check
→ Invalid entities rejected and flagged for human review
→ Valid entities linked to knowledge base
Stage 5: Knowledge graph traversal (multi-hop)
→ Patient or document graph built from validated entities
→ Multi-hop query executed: hop-by-hop, each result deterministic
→ Confidence scores carried through each hop
Stage 6: Ontology constraint check (query)
→ Final conclusion checked against domain rules
→ State constraints applied (patient age, renal function, filing type)
→ Result: SAFE / UNSAFE / CONSISTENT / INCONSISTENT / FLAG
Stage 7: Decision + reasoning chain
→ Decision returned with full reasoning chain
→ Every hop cited: entity, relationship, confidence, source, date
→ Every constraint cited: rule ID, outcome, version
Stage 8: Compliance log
→ Full decision record written to append-only audit log
→ decision_id, timestamp, user, input, output, reasoning_chain
→ Queryable: explain(decision_id) returns full trail
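As a minimal sketch, the eight stages can be wired together in a few lines of Python. Everything here is illustrative: the function names, the toy in-memory data, and the collapsed hop (Ibuprofen linked straight to Antiplatelet, skipping the NSAID parent-class hop) are assumptions for brevity, not the actual repo API.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    verdict: str
    reasoning_chain: list
    constraints_fired: list

def retrieve_sections(doc):          # Stage 2: PageIndex stand-in, full sections only
    return doc["sections"]

def extract_entities(sections):      # Stage 3: bounded extraction stand-in
    return [e for s in sections for e in s["entities"]]

def validate(entities):              # Stage 4: ontology stand-in, only typed entities pass
    return [e for e in entities if "type" in e]

def traverse(entities):              # Stage 5: one explicit, logged hop per entity
    return [{"hop": f"{e['name']}.drug_class", "result": e["cls"], "confidence": 1.0}
            for e in entities]

def check_constraints(hops):         # Stage 6: fire on a known-bad class pairing
    classes = {h["result"] for h in hops}
    return ["anticoag_antiplatelet_v3"] if {"Anticoagulant", "Antiplatelet"} <= classes else []

def run_pipeline(doc):               # Stage 7: verdict plus full reasoning chain
    hops = traverse(validate(extract_entities(retrieve_sections(doc))))
    fired = check_constraints(hops)
    return Decision("UNSAFE" if fired else "SAFE", hops, fired)

doc = {"sections": [{"entities": [
    {"name": "Warfarin",  "type": "Drug", "cls": "Anticoagulant"},
    {"name": "Ibuprofen", "type": "Drug", "cls": "Antiplatelet"},
]}]}
```

Calling `run_pipeline(doc)` returns an UNSAFE verdict with both hops and the fired constraint attached: the decision and its evidence are one object, not two systems.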
This is the architecture. Now run it twice — once for healthcare, once for legal.
Healthcare Walkthrough: Prescription Safety Decision
Scenario: A 71-year-old patient is admitted. The attending physician wants to prescribe Ibuprofen for post-operative pain management. The clinical decision support system must assess safety before the prescription is issued.
Stage 1: Document input
The patient’s electronic health record is a structured document. It contains multiple sections: demographics, current medications, allergies, recent lab results, active conditions.
Stage 2: PageIndex retrieval
PageIndex identifies the relevant sections by structure:
Section retrieved: "Current Medications"
→ Warfarin 5mg daily (anticoagulation, AFib)
→ Atorvastatin 10mg daily (cholesterol management)
→ Metformin 500mg twice daily (type 2 diabetes)
Page: 3, heading: "Current Medications", parent: "Medical History"
Section retrieved: "Documented Allergies"
→ Penicillin (reaction: anaphylaxis, confirmed 2019)
Page: 3, heading: "Documented Allergies", parent: "Medical History"
Section retrieved: "Recent Lab Results"
→ eGFR: 58 mL/min/1.73m² (mild-to-moderate renal impairment)
→ INR: 2.4 (within therapeutic range for Warfarin)
Page: 4, heading: "Recent Lab Results", parent: "Medical History"
All three sections retrieved as complete units. No missing context.
Stage 3: Entity extraction
From the retrieved sections:
Patient P-2026-0071:
age: 71
medications: [Warfarin 5mg/day, Atorvastatin 10mg/day, Metformin 500mg twice/day]
allergies: [Penicillin — anaphylaxis]
renal_function: eGFR 58 → classification: mild_to_moderate_impairment
inr: 2.4 (therapeutic)
Proposed drug: Ibuprofen 400mg as needed
Stage 4: Ontology validation at ingestion
Each entity checked:
Warfarin → Drug entity: VALID
drug_class link: Anticoagulant — found in KB
dose 5mg: within typical range 1–10mg — VALID
frequency: daily — VALID
Ibuprofen → Drug entity: VALID
drug_class link: NSAID — found in KB
NSAID parent_class: Antiplatelet — found in KB
Patient.renal_function = mild_to_moderate_impairment
→ State constraint activated: enhanced NSAID review required
(NSAIDs reduce renal blood flow, worsen impairment)
All entities valid. State constraint flagged. Proceed.
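A minimal sketch of this ingestion gate, assuming a tiny hand-written ontology slice. The class names and dose ranges here are illustrative, not clinical reference data.

```python
# Hypothetical ontology slice: known drugs, their classes, typical dose ranges.
ONTOLOGY = {
    "Warfarin":  {"class": "Anticoagulant", "dose_mg": (1, 10)},
    "Ibuprofen": {"class": "NSAID",         "dose_mg": (200, 800)},
}

def validate_drug(name: str, dose_mg: float):
    """Return (valid, reason). Entities that fail never reach the graph."""
    entry = ONTOLOGY.get(name)
    if entry is None:
        # Unknown entity: possible extraction typo, route to human review
        return False, f"unknown entity '{name}', flag for review"
    lo, hi = entry["dose_mg"]
    if not lo <= dose_mg <= hi:
        return False, f"dose {dose_mg}mg outside typical range {lo}-{hi}mg"
    return True, "valid"
```

`validate_drug("Warfarin", 5)` passes; `validate_drug("Warfrin", 5)` is rejected at the gate, which is exactly how a phantom-entity typo is stopped before it can create a broken graph node.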
Stage 5: Knowledge graph traversal
Multi-hop query: “Is Ibuprofen safe for Patient P-2026-0071?”
Hop 1: Patient.medications → [Warfarin, Atorvastatin, Metformin]
Result: three active drugs confirmed
Hop 2: Warfarin.drug_class → Anticoagulant
Result: explicit KG link, confidence 1.00
Hop 3: Ibuprofen.drug_class → NSAID
NSAID.parent_class → Antiplatelet
Result: explicit KG links, confidence 1.00
Hop 4: Anticoagulant.contraindicated_with → Antiplatelet?
Result: YES — confidence 0.99
Source: FDA Drug Safety Communication 2024-03
Evidence level: strong (multiple RCTs)
Hop 5: Patient.renal_function = mild_to_moderate_impairment
NSAID.renal_risk = elevated (reduces renal blood flow)
Result: additional risk confirmed, confidence 0.95
Source: Clinical Pharmacology Guideline 2025-11
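Each hop above is a lookup, not a generation. A minimal sketch, assuming the explicit edges live in a plain dict; a production system would use a graph database such as Neo4j.

```python
# Toy edge store: (subject, relation) -> (object, confidence, source).
EDGES = {
    ("Warfarin",      "drug_class"):           ("Anticoagulant", 1.00, "KB"),
    ("Ibuprofen",     "drug_class"):           ("NSAID",         1.00, "KB"),
    ("NSAID",         "parent_class"):         ("Antiplatelet",  1.00, "KB"),
    ("Anticoagulant", "contraindicated_with"): ("Antiplatelet",  0.99, "FDA DSC 2024-03"),
}

def hop(subject, relation):
    """One deterministic traversal step: a lookup, never a guess."""
    return EDGES[(subject, relation)]

def traverse(path):
    """Follow a hop sequence, logging each step and multiplying confidence."""
    chain, conf = [], 1.0
    subject = path[0]
    for relation in path[1:]:
        obj, c, src = hop(subject, relation)
        chain.append({"hop": f"{subject}.{relation} -> {obj}",
                      "confidence": c, "source": src})
        conf *= c
        subject = obj
    return chain, conf
```

`traverse(["Warfarin", "drug_class", "contraindicated_with"])` walks Warfarin to Anticoagulant to Antiplatelet and returns the logged chain with combined confidence 0.99. A missing edge raises `KeyError` instead of producing a plausible answer, which is the point.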
Stage 6: Ontology constraint check
Constraint 1: Anticoagulant + Antiplatelet contraindication
→ VIOLATED — Warfarin (Anticoagulant) + Ibuprofen (Antiplatelet class)
→ Severity: HIGH
Constraint 2: NSAID + renal impairment state constraint
→ ACTIVATED — Patient eGFR 58, NSAID increases renal risk
→ Severity: MODERATE
Constraint 3: Age-based enhanced review
→ ACTIVATED — Patient age 71, anticoagulant + NSAID in elderly = HIGH RISK
→ Severity: HIGH
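The three constraints can be sketched as plain predicate rules. The numeric thresholds (eGFR below 60 for impairment, age 65 and over for the elderly flag) are illustrative assumptions; the rule IDs match the ones in the compliance log below.

```python
def check_constraints(patient: dict, drug_classes: set):
    """Evaluate domain rules; return (rule_id, severity) for each that fires."""
    fired = []
    if {"Anticoagulant", "Antiplatelet"} <= drug_classes:
        fired.append(("anticoag_antiplatelet_v3", "HIGH"))
    if "NSAID" in drug_classes and patient["egfr"] < 60:        # assumed threshold
        fired.append(("nsaid_renal_v2", "MODERATE"))
    if patient["age"] >= 65 and {"Anticoagulant", "NSAID"} <= drug_classes:  # assumed threshold
        fired.append(("elder_review_v2.1", "HIGH"))
    return fired
```

For the walkthrough patient (age 71, eGFR 58) with classes {Anticoagulant, NSAID, Antiplatelet} in play, all three rules fire.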
Stage 7: Decision + reasoning chain
DECISION: DO NOT PRESCRIBE — Ibuprofen contraindicated for Patient P-2026-0071
Reasoning:
1. Patient takes Warfarin (Anticoagulant class)
Source: Patient record, page 3, "Current Medications"
2. Ibuprofen is NSAID (Antiplatelet subclass)
Source: FDA Pharmacopeia, KB entity Ibuprofen
3. Anticoagulant contraindicated with Antiplatelet
Confidence: 0.99 | Source: FDA DSC 2024-03 | Evidence: strong
4. Renal impairment (eGFR 58) increases NSAID risk
Confidence: 0.95 | Source: CPG 2025-11
5. Age 71 — elderly patient flag for anticoagulant + NSAID combinations
Ontology rule: elder_anticoagulant_nsaid_review v2.1
Recommendation: Consider paracetamol (acetaminophen) as alternative.
If NSAID essential: clinical pharmacist review required.
Stage 8: Compliance log
decision_id: CDS-2026-0404-P071-001
timestamp: 2026-04-04 09:14:33 UTC
patient: P-2026-0071 (pseudonymised in external log)
proposed_drug: Ibuprofen 400mg
decision: DO_NOT_PRESCRIBE
constraints_fired: [anticoag_antiplatelet_v3, nsaid_renal_v2, elder_review_v2.1]
reasoning_chain_id: RC-2026-0404-001
retrievable: explain("CDS-2026-0404-P071-001") → full chain
A physician, pharmacist, or regulator can call explain("CDS-2026-0404-P071-001") and receive the complete reasoning chain — every hop, every source, every constraint. This is GDPR Article 22 compliance in practice.
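A minimal sketch of the append-only log and its explain() query. An in-memory dict stands in for what production would put on write-once storage; the class name and record shape are assumptions.

```python
import json
import time

class AuditLog:
    """Append-only decision log: records can be added and read, never rewritten."""
    def __init__(self):
        self._records = {}

    def append(self, decision_id: str, record: dict):
        if decision_id in self._records:
            raise ValueError("append-only: decision_id already logged")
        self._records[decision_id] = {**record, "logged_at": time.time()}

    def explain(self, decision_id: str) -> str:
        # Full decision record, reasoning chain included, as queryable JSON
        return json.dumps(self._records[decision_id], indent=2)
```

`log.explain("CDS-2026-0404-P071-001")` returns the full record; a second `append` under the same ID raises instead of silently overwriting, so the trail a regulator reads is the trail that was written.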
Legal Walkthrough: Contract Obligation Analysis
Scenario: A procurement team receives a vendor contract. Legal AI must extract all obligations, identify dependencies, calculate penalty exposure under a delay scenario, and detect any structural ambiguities before the contract is signed. This is exactly the kind of task where chunking-based RAG produces a plausible but incomplete answer — missing a dependency, silently dropping a clause, confident about a penalty calculation that used the wrong baseline.
Stage 1: Document input
The vendor contract arrives as a draft PDF. It contains, among others, three sections relevant to this analysis: deliverables and timeline, penalties and remedies, force majeure.
Stage 2: PageIndex retrieval
Section retrieved: "Deliverables and Timeline"
→ Primary delivery: integrated software platform by 2026-08-01
→ Documentation: technical specs by 2026-07-15 (precedes delivery)
→ Training: end-user training within 14 days of delivery acceptance
Page: 4-5, heading: "Deliverables and Timeline"
Section retrieved: "Penalties and Remedies"
→ Late delivery: 1.5% of contract value per week, max 15%
→ Late documentation: 0.5% of contract value per week, max 5%
→ Training default: withheld payment until resolved
Page: 6, heading: "Penalties and Remedies"
Section retrieved: "Force Majeure"
→ Events covered: natural disaster, war, government action
→ Notice requirement: 48 hours written notice
→ Duration cap: 30 days, after which either party may terminate
Page: 8, heading: "Force Majeure"
Stage 3: Entity extraction
Obligation A: PrimaryDelivery
type: software_delivery
due: 2026-08-01
penalty_rate: 1.5% per week, max 15%
force_majeure_covered: YES
Obligation B: Documentation
type: documentation_delivery
due: 2026-07-15
depends_on: none (independent)
penalty_rate: 0.5% per week, max 5%
force_majeure_covered: YES
Obligation C: Training
type: training_delivery
due: 14 days after PrimaryDelivery acceptance
depends_on: PrimaryDelivery
penalty_type: payment_withheld (not percentage-based)
force_majeure_covered: NOT STATED
Stage 4: Ontology validation
Obligation A: VALID
due_date: present — VALID
penalty_clause: references obligation — VALID
force_majeure: explicit scope — VALID
Obligation B: VALID
due_date: present — VALID
due_date precedes Obligation A — VALID (independent obligation)
Obligation C: FLAGGED
force_majeure_covered: NOT STATED
→ Ontology rule: all obligations must have explicit force_majeure scope
→ Result: AMBIGUITY FLAG — "Force majeure applicability undefined for Training"
→ Recommendation: clarify before signing
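The completeness rule that caught the gap can be sketched as a required-field check; the field names are illustrative.

```python
# Ontology rule: every obligation must state these fields explicitly.
REQUIRED_FIELDS = {"due", "penalty", "force_majeure_covered"}

def validate_obligation(obligation: dict):
    """FLAGGED if any required field is absent: absence is ambiguity, not a default."""
    missing = sorted(REQUIRED_FIELDS - obligation.keys())
    if missing:
        return "FLAGGED", f"ambiguity: {missing} undefined"
    return "VALID", ""
```

The Training obligation, extracted without a force majeure clause, comes back FLAGGED; the other two pass.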
Stage 5: Knowledge graph traversal
Multi-hop query: “What is total penalty exposure if primary delivery is 3 weeks late?”
Hop 1: PrimaryDelivery.delay = 3 weeks
penalty = 3 × 1.5% = 4.5% of contract value
Hop 2: Does Documentation depend on PrimaryDelivery?
Documentation.depends_on = none → independent
Documentation.due = 2026-07-15
PrimaryDelivery.due = 2026-08-01 (delayed to 2026-08-22)
→ Documentation unaffected by primary delay
Hop 3: Does Training depend on PrimaryDelivery?
Training.depends_on = PrimaryDelivery
Training.due = 14 days after acceptance
PrimaryDelivery acceptance delayed 3 weeks
→ Training automatically delayed 3 weeks
→ Penalty type: payment_withheld (not percentage)
→ Risk: payment suspension for duration of training default
Hop 4: Does force majeure apply?
PrimaryDelivery: force_majeure_covered = YES
3-week delay qualifies if notice given within 48 hours
→ If force majeure invoked: penalty waived
→ If not invoked: 4.5% penalty applies
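Hops 1 through 3 reduce to two small calculations: a capped penalty and a single-level dependency cascade. A sketch, with helper names of my own choosing; the cascade deliberately handles only one dependency level, as in this contract.

```python
def penalty_exposure(delay_weeks: float, rate_pct: float, cap_pct: float) -> float:
    """Late-delivery penalty as a percentage of contract value, capped."""
    return min(delay_weeks * rate_pct, cap_pct)

def cascade(delay_weeks: float, obligations: dict) -> dict:
    """Propagate a primary-delivery delay to obligations that depend on it."""
    affected = {"PrimaryDelivery": delay_weeks}
    for name, ob in obligations.items():
        if ob.get("depends_on") in affected:   # one level deep, as in this contract
            affected[name] = delay_weeks
    return affected
```

`penalty_exposure(3, 1.5, 15.0)` gives 4.5; the same call with a 20-week delay hits the 15% cap. The cascade pulls Training in and leaves the independent Documentation obligation untouched.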
Stage 6: Ontology constraint check
Constraint 1: Dependency chain acyclicity
→ PrimaryDelivery → Training → no cycle back
→ VALID
Constraint 2: Force majeure scope completeness
→ Obligation C (Training): force_majeure_covered = NOT STATED
→ AMBIGUITY — cannot determine if penalty waived under force majeure
→ FLAGGED for legal review
Constraint 3: Penalty cap check
→ 4.5% of contract value — below 15% cap
→ WITHIN BOUNDS
Stage 7: Decision + reasoning chain
ANALYSIS COMPLETE — Contract Obligations: 3 identified, 1 flagged
Penalty exposure (3-week primary delay, no force majeure):
PrimaryDelivery: 4.5% of contract value
Documentation: 0% (independent, unaffected)
Training: payment withheld until resolved (duration unknown)
Total financial exposure: 4.5% + payment suspension risk
Ambiguity flagged:
Obligation C (Training) — force majeure applicability undefined
Risk: dispute over penalty waiver if force majeure event occurs
Recommendation: add explicit force majeure clause to Training obligation
before signing
Force majeure mitigation available:
If primary delay caused by qualifying event + 48h notice given:
Primary delivery penalty waived
Training delay penalty: ambiguous (see flag above)
Stage 8: Compliance log
decision_id: LGL-2026-0404-CONTRACT-001
timestamp: 2026-04-04 11:02:17 UTC
document: Vendor-Contract-Acme-2026-Draft.pdf
obligations_extracted: 3
flags_raised: 1 (force_majeure_scope_undefined — Obligation C)
constraints_checked: [dependency_acyclicity_v1, fm_scope_v2, penalty_cap_v1]
reasoning_chain_id: RC-2026-0404-LGL-001
retrievable: explain("LGL-2026-0404-CONTRACT-001") → full chain
The legal team receives a structured analysis: three obligations, one ambiguity, a full penalty exposure calculation under the delay scenario, and a specific recommendation before signing. Every conclusion is sourced to a section and a constraint rule. The force majeure gap — a subtle but material ambiguity — was not visible in the contract text until the ontology required it to be explicit. This is the class of finding that prevents a contract dispute months later, caught in minutes at the signing stage.
Why This Architecture Eliminates Multi-Hop Hallucination
Run either walkthrough on a chunking-based RAG stack and the failure modes are predictable.
Without PageIndex: Sections are chunked. The patient’s medication list, allergy record, and lab results land in separate chunks — often with no guarantee all three are retrieved for a single query. A contraindication check that misses the lab results misses the renal impairment flag entirely. The system returns a confident, incomplete answer. In the legal walkthrough, the deliverables, penalties, and force majeure clauses are in different sections. Chunking may retrieve two of three. The dependency cascade analysis is built on incomplete data.
Without knowledge graph: The LLM must infer “Warfarin is an anticoagulant” from text. Then infer “Ibuprofen is an NSAID and therefore antiplatelet.” Then infer the contraindication rule applies. The healthcare query took five hops; five inference steps at ~85% accuracy each give 0.85^5 ≈ 44% chain accuracy. The system generates a plausible-sounding answer with high confidence and a 56% probability of being wrong. For the legal walkthrough, the LLM must infer which obligations depend on which, reading across separate clauses that may use different phrasing. Dependency detection from text is fragile — a small variation in contract language breaks it.
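The compounding arithmetic is worth making concrete:

```python
def chain_accuracy(per_step: float, steps: int) -> float:
    """Probability that every inference step in a chain is correct."""
    return per_step ** steps

# 0.85 per step over five steps: barely better than a coin flip
print(round(chain_accuracy(0.85, 5), 2))   # 0.44
```

Replacing any inference step with a deterministic graph hop sets that factor to 1.0, which is why the traversal chain keeps its confidence while the text-inference chain loses it.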
Without ontology: The force majeure gap in the Training obligation is never caught. Not a structural error — a missing field. Only an ontology that requires explicit force majeure scope for all obligations would catch it. A KG without that rule stores the incomplete obligation as complete. The legal team signs the contract with an unresolved ambiguity that surfaces only during a dispute. In the healthcare walkthrough, an LLM extraction typo (“Warfrin” instead of “Warfarin”) creates a phantom drug node. The contraindication check never fires because the entity link is broken.
The architecture prevents hallucination at every level:
| Failure mode | Prevented by | How |
|-------------------------------------|------------------|------------------------------------|
| Missing context (sections split) | PageIndex | Complete sections, not chunks |
| Implicit relationship inference | Knowledge graph | Explicit links, deterministic hops |
| Wrong entity class (LLM error) | Ontology | Type checking at ingestion |
| Typos, phantom entities | Ontology | Entity linking validation |
| Missing required fields | Ontology | Cardinality + completeness rules |
| Confidence without correctness | Ontology + KG | Constraint checking before return |
| Unexplainable decision | Reasoning chain | Every hop logged, every rule cited |
These are not separate fixes applied to separate problems. They are one integrated architecture. Remove any layer and the failure modes return — not occasionally, but systematically.
Compliance Is Structural, Not Added On
There is a common pattern in how organisations approach AI compliance: build the system first, add compliance later. Audit logging is bolted on. Explainability modules are added after complaints. Transparency reports are generated by a separate process that reads the main system’s outputs.
This approach is fragile. The compliance layer is always one step behind the system. When a regulator asks “why did your AI make this decision?”, the answer depends on whether the logging captured enough data, whether the explainability module was running, whether the audit trail is complete. Often it isn’t.
The architecture in this article takes the opposite approach: compliance is structural. The reasoning chain exists because the pipeline produces it — not because a module was added to capture it. The audit log is complete because every stage logs its output. Explainability is possible because every hop was a query with a recorded result.
Every compliance requirement across the major frameworks maps directly to a pipeline stage:
| Regulation | Requirement | Pipeline stage |
|------------|-------------|----------------|
| GDPR Art. 22 | Right to explanation of automated decisions | Stage 7 reasoning chain + Stage 8 audit log |
| EU AI Act Art. 10 | Accurate, complete, relevant data quality | Stage 4 ontology validation at ingestion |
| EU AI Act Art. 13 | System transparency and explainability | Stage 7 full chain + explain() function |
| HIPAA Security Rule | Patient data integrity | Stage 4 entity validation + ontology type safety |
| SOX 404 | Documented internal controls over financial reporting | Ontology constraint log + versioned rules |
The key observation: compliance is not a post-processing step. It is not a report generated after the fact. It is a structural property of the pipeline. The system is compliant because it is built this way — every decision is explainable because the reasoning chain exists, not because someone added a reporting module.
This is what “verifiable AI” means. Not AI that produces a compliance report on request. AI where verifiability is built into the architecture from the first hop.
What Comes After This Architecture
The pipeline described in this article is a foundation, not a ceiling. Once you have structured retrieval, explicit relationships, and validated constraints working together, more ambitious capabilities become possible.
Multi-document reasoning. The walkthroughs in this article each operated on a single document. But real decisions often span multiple sources: a patient record, a drug formulary, a clinical guideline, a recent lab report. The same pipeline applies across documents — PageIndex retrieves from each, the KG links entities across sources, the ontology validates relationships regardless of origin. The reasoning chain grows to include cross-document hops.
Agentic systems. An agent that can query knowledge graphs, check ontology constraints, and retrieve structured sections is an agent that can reason autonomously over complex domains. The architecture described here is precisely what makes agentic reasoning safe — not just fast. An agent operating on this foundation cannot hallucinate a drug class it doesn’t know, cannot invent a contract dependency that doesn’t exist, because the graph and ontology will reject it.
Continuous validation. Knowledge bases are not static. Drug interactions are updated. Accounting standards change. Regulations are amended. The pipeline can be extended to run continuous validation: when the knowledge base is updated, re-run constraint checks across affected decisions, flag any that would produce a different result under the new rules, and surface them for review. Verifiability is then not just historical — it is ongoing.
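A sketch of what that re-validation loop could look like, assuming past decisions store the facts they were based on. The record shape and rule format are assumptions.

```python
def revalidate(decisions, constraints):
    """Re-run current rules over past decisions; report any verdict that flips."""
    flipped = []
    for d in decisions:
        fired = [c["id"] for c in constraints if c["check"](d["facts"])]
        new_verdict = "UNSAFE" if fired else "SAFE"
        if new_verdict != d["verdict"]:
            flipped.append((d["id"], d["verdict"], new_verdict))
    return flipped
```

When a knowledge base update adds a new contraindication rule, a decision that was SAFE under the old rules surfaces for review as `(decision_id, "SAFE", "UNSAFE")` rather than staying silently stale.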
What This Series Has Built
This series started with a thesis: hallucination is an ingestion problem, not a model problem.
Articles 1-5 established the foundation: structure-aware ingestion (Docling), token-efficient serialisation (TOON), hybrid storage with vector + graph + relational databases.
Articles 6-10 built the middle layer: PageIndex for structure-preserving retrieval, compliance frameworks for legal context, and the epistemological case for why AI must show its reasoning — and why experts need it to.
Articles 11-14 completed the reasoning layer: why retrieval architecture determines what reasoning is possible, how knowledge graphs enable safe multi-hop traversal, how ontologies enforce domain rules, and how the three layers work together in production.
The result is an architecture where:
Documents are ingested with structure preserved
Sections are retrieved as complete units, not scattered chunks
Relationships are stored explicitly, not inferred from text
Domain rules constrain what the system can conclude before it concludes it
Every decision carries a full reasoning chain — every hop, every source, every constraint
Compliance requirements are satisfied by design, not by reporting
This is verifiable AI. Not a model that tries harder. Not a prompt that adds “please be careful.” A system built so that being right is the default — and showing its work is unavoidable.
Build structure. Build relationships. Build constraints. Build trust.
End of the Verifiable AI Architecture series — Articles 1-14


If you made it to Article 14: you now understand why most AI systems that fail in production do not fail because the model is bad. They fail because the data pipeline is unstructured, the retrieval is chunked, and nobody checked whether the relationships the system relies on are actually correct. That is fixable. The architecture in this series is how you fix it.
If you are building in healthcare, finance, or legal — and you are still using chunking-based RAG for multi-hop decisions — this series is the argument for changing that before August 2, 2026.