
McKinsey Lilli Breach : Research Report And Learnings

Gaurav Chopra · March 19, 2026

Executive Summary

On February 28, 2026, an autonomous AI agent built by security research startup CodeWall gained full read-write access to McKinsey's internal AI platform "Lilli" in under two hours, with no credentials, no insider access, and no human guidance during the attack. The entry point was a SQL injection vulnerability in an unauthenticated API endpoint. The twist: query values were correctly parameterized, but JSON field names (keys) were concatenated directly into SQL. OWASP ZAP does not systematically test that position, and JSON-based SQL injection more broadly bypassed all five major WAFs tested by Claroty Team82 researchers in 2022. A secondary IDOR (Insecure Direct Object Reference) flaw then enabled cross-user access to individual employee search histories.

The exposure reached 46.5 million plaintext chat messages, 728,000 client files, 57,000 user accounts, and 3.68 million RAG document chunks representing decades of McKinsey's proprietary research. The most consequential finding was write access to 95 system prompt configurations controlling AI behavior, each modifiable with a single HTTP call: no deployment gate, no alert, no audit trail.

McKinsey patched all unauthenticated endpoints within 24 hours of disclosure, and a third-party forensics firm found no evidence of unauthorized access beyond the research engagement. The core lesson: AI platforms are behavioral control systems sitting on top of data aggregations. A single vulnerability now reaches not just data, but the AI's entire reasoning contract with its users.


1. What Happened — Timeline, Actors, and Discovery

CodeWall and Its Autonomous Agent

CodeWall is an autonomous offensive security research startup. Their AI agent conducts what they describe as fully autonomous red-teaming: "researching the target, analyzing, attacking, and reporting" with no human in the loop. The agent does not follow a predefined checklist — it reasons iteratively, forming hypotheses from observed responses and adapting its attack chain accordingly.

The agent began working against McKinsey's Lilli platform on February 28, 2026. Notably, according to CodeWall, the agent itself selected the target. It flagged McKinsey because:

  • McKinsey had recently publicized updates to the Lilli platform, signaling an active, high-value target
  • McKinsey's HackerOne responsible disclosure policy provided a legal and ethical boundary the agent could operate within

Lilli: The Platform at Stake

Lilli is McKinsey's enterprise AI platform, processing 500,000+ prompts per month across 43,000+ employees. Capabilities include AI-assisted chat covering strategy, M&A, and client engagements; RAG-based search over 100,000+ internal documents; 384,000 AI assistants across 94,000 workspaces; and integration with external APIs including OpenAI vector stores. It had been running in production for over two years.

Attack Sequence

  1. API Surface Mapping — The agent discovered publicly accessible API documentation listing 200+ endpoints. Of these, 22 required no authentication.
  2. Unauthenticated Endpoint Targeting — An endpoint accepting JSON payloads to log user search queries was identified as a write path to the database.
  3. SQL Injection Discovery — The agent observed that JSON key names were reflected verbatim in database error messages. This indicated keys were being concatenated into SQL rather than parameterized.
  4. Blind SQL Injection Exploitation — Through 15 iterative error-based probing attempts, the agent mapped the database schema.
  5. IDOR Chaining — Following the SQL injection, the agent identified a Broken Object Level Authorization (BOLA/IDOR) flaw that allowed retrieval of individual employee records without verifying the caller's permission to access them.
  6. Full Database Access — Complete read-write access to the production database, including cross-user records, was achieved within two hours.

Responsible Disclosure and Remediation Timeline

Date | Event
February 28, 2026 | Agent achieves full access to Lilli production database
March 1, 2026 | CodeWall initiates responsible disclosure to McKinsey
March 2, 2026 | McKinsey patches all unauthenticated endpoints; dev environment taken offline; public API docs blocked
March 9, 2026 | CodeWall publishes public disclosure

McKinsey's investigation, supported by a leading third-party forensics firm, found no evidence that client data or confidential information was accessed by any unauthorized party beyond this research engagement.

Sources: CodeWall, The Register, CyberNews


2. The Exploit: JSON Field-Name SQL Injection and IDOR — Technical Deep Dive

What Partial Parameterization Means

The developers at McKinsey applied parameterized queries correctly for field values — the textbook defense against SQL injection. The failure was in field names. Consider this simplified example:

-- Intended query (values parameterized, safe):
INSERT INTO search_logs (user_id, query_text) VALUES (?, ?)

-- Actual query construction (key names concatenated, vulnerable):
-- The JSON payload: {"user_id": 123, "query_text": "market analysis"}
-- The code extracts keys and builds:
"INSERT INTO search_logs (" + key1 + ", " + key2 + ") VALUES (?, ?)"

If an attacker submits a JSON payload with a malicious key name:

{"user_id": 123, "'; SELECT * FROM chat_messages; --": "value"}

The key name is injected into the structural (column name) position of the SQL statement — a location parameterization never touches. The database interprets the injected SQL as valid commands.
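To make the structural position concrete, here is a minimal Python sketch of the pattern described above. The function name `build_insert_unsafe` and the table name are hypothetical; this is not Lilli's code, only an illustration of how bound values and concatenated keys end up in different parts of the statement.

```python
def build_insert_unsafe(table: str, payload: dict) -> str:
    """Hypothetical vulnerable builder: values are bound, keys are concatenated."""
    columns = ", ".join(payload.keys())             # structural position, never bound
    placeholders = ", ".join("?" for _ in payload)  # value position, safely bound
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"


benign = {"user_id": 123, "query_text": "market analysis"}
hostile = {"user_id": 123, "'; SELECT * FROM chat_messages; --": "value"}

# The benign payload yields the intended statement.
print(build_insert_unsafe("search_logs", benign))
# The hostile key lands verbatim inside the column list.
print(build_insert_unsafe("search_logs", hostile))
```

The sketch only builds strings and never executes them. The point is that the placeholders protect the values while the attacker-controlled key text becomes part of the statement itself.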

Historical note: The principle that JSON keys can become SQL injection vectors was first documented publicly by Kazuho Oku in July 2014 (The JSON SQL Injection Vulnerability, Kazuho's Weblog). That analysis appeared more than eight years before the OWASP CRS project addressed the vector at default protection levels in December 2022, which illustrates how long the tooling community can lag a known vulnerability class.

How the Agent Detected It

The critical signal: database error messages reflected the JSON key names verbatim. When the agent submitted keys containing SQL fragments, it observed partial SQL error messages echoed back — confirming that the key content was reaching the SQL parser unescaped. This enabled error-based blind SQL injection: by crafting keys that produce specific database errors, the agent extracted database structure information from error content across 15 iterative probes.

IDOR: The Secondary Vulnerability That Amplified Cross-User Access

Beyond the SQL injection, the agent identified a Broken Object Level Authorization (BOLA) flaw — the modern OWASP API Security Top 10 term for what is commonly called IDOR (Insecure Direct Object Reference). As the Promptfoo analysis of the incident describes, "the application accepts an object identifier and returns a record without verifying that the caller is allowed to see it." After the SQL injection established database access, the IDOR flaw allowed the agent to enumerate and retrieve individual employees' search histories by simply supplying their identifiers in API requests — no authorization verification was performed.

The chained attack path was: unauthenticated endpoint → SQL injection → database schema extraction → IDOR exploitation → cross-user search history access. Each step used the previous one's findings, which is precisely the multi-step reasoning that checklist scanners cannot replicate.
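The missing check in the IDOR step can be sketched in a few lines of Python. Everything here is hypothetical (the `Caller` type, the in-memory `SEARCH_HISTORY` store, the admin flag); it only illustrates an object-level authorization check that compares the caller against the owner of the requested record.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when the caller may not read the requested object."""


@dataclass(frozen=True)
class Caller:
    user_id: int
    is_admin: bool = False


# Hypothetical in-memory store standing in for a search-history table.
SEARCH_HISTORY = {
    101: ["pricing strategy"],
    202: ["merger targets"],
}


def get_search_history(caller: Caller, owner_id: int) -> list[str]:
    """Return a user's search history only if the caller owns it or is an admin."""
    if caller.user_id != owner_id and not caller.is_admin:
        raise Forbidden(f"user {caller.user_id} may not read history of {owner_id}")
    return SEARCH_HISTORY.get(owner_id, [])
```

With this check in place, `get_search_history(Caller(101), 202)` raises `Forbidden` instead of returning another employee's queries. Supplying a different identifier is no longer enough.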

Source: McKinsey's Lilli Looks More Like an API Security Failure Than a Model Jailbreak — Promptfoo

This Is Not a New Class — It Is Systematically Missed

CVE-2024-42005, disclosed August 8, 2024, belongs to the same vulnerability class, this time in Django's ORM. On models with a JSONField, the QuerySet.values() and values_list() methods failed to properly validate JSON object keys passed as arguments, allowing SQL injection through column aliases. Affected versions: Django 4.2 before 4.2.15 and Django 5.0 before 5.0.8. The fix required explicitly treating JSON key names as untrusted input, which is exactly what Lilli's developers failed to do.

Source: CVE-2024-42005: Django JSONField SQL Injection — Vulert

Why WAFs and OWASP ZAP Missed It

Claroty Team82's research (JS-ON: Security-OFF, December 2022) tested JSON-based SQL injection against five major WAF vendors and bypassed all of them:

  • Palo Alto Networks (Next Generation Firewall)
  • Amazon Web Services (AWS ELB)
  • Cloudflare
  • F5 (Big-IP)
  • Imperva

The reason: WAF SQL injection detection rules look for standard SQL operators and keywords combined with suspicious patterns. JSON-based injection uses database JSON operators (JSON_EXTRACT, ?, @>, ::jsonb) that WAF parsers did not model as SQL. The database interprets these as valid SQL; the WAF sees only JSON.

Example PostgreSQL injection payload that passes WAFs:

'{"b":2}'::jsonb <@ '{"a":1, "b":2}'::jsonb
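A toy Python illustration of the mismatch: the regular expression below is a deliberately simplified stand-in for a classic tautology signature (it is not taken from any real WAF rule set), and the two payloads are illustrative. It matches the textbook `OR 1=1` form but has nothing to match in the JSON-operator form, which contains no equals sign at all.

```python
import re

# Toy signature in the spirit of classic tautology rules: OR <x> = <x>.
# Real WAF rule sets are far larger; this is an illustration only.
CLASSIC_TAUTOLOGY = re.compile(r"(?i)\bor\b\s+(['\"]?\w+['\"]?)\s*=\s*\1")


def flagged(payload: str) -> bool:
    """Return True if the toy signature matches the payload."""
    return CLASSIC_TAUTOLOGY.search(payload) is not None


classic = "' OR 1=1 --"
json_based = "' OR '{\"b\":2}'::jsonb <@ '{\"a\":1, \"b\":2}'::jsonb --"

print(flagged(classic))     # the textbook tautology is caught
print(flagged(json_based))  # the JSON containment tautology is not
```

Both payloads evaluate to true in PostgreSQL, but only the first resembles what the signature was written to detect.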

The OWASP Core Rule Set (CRS) project only added Rule 942550 ("JSON-Based SQL Injection") in December 2022 — six days after the Team82 research published — providing protection at default paranoia level 1. Before that, protection only existed at paranoia level 2 (non-default). Organizations running WAFs with CRS at default paranoia level 1 before 2023 had no WAF-level defense. Those running out-of-date WAF rules in 2026 may still be unprotected.

Source: OWASP CRS Project — A new rule to prevent SQL in JSON

OWASP ZAP's scanner tests input value positions against known bad signatures. It does not systematically model JSON key names as injection candidates. This is not a flaw in ZAP — it is a structural limitation of signature-based scanning against a non-standard injection surface.

Correct Remediation

The fix is not more thorough parameterization. Parameterization cannot protect structural SQL positions (column names, table names, operators). The correct approaches are:

  1. Allowlisting for all structural elements. Maintain an explicit set of permitted column names. Reject any key name not on the list before SQL construction begins.
  2. ORM abstractions that never accept field names from user input. Do not expose raw JSON keys as database column selectors.
  3. Static analysis rules. Scan for any string concatenation involving request-derived data in SQL construction paths — values and structural positions alike.
  4. Object-level authorization checks. Every request returning a record must verify the authenticated caller is permitted to access that specific object — not just that they are authenticated.
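The allowlisting approach in item 1 can be sketched as follows, using Python's built-in sqlite3 module as a stand-in database. The table, the `ALLOWED_COLUMNS` set, and the function name are assumptions for illustration. The column list is still assembled as a string, but only from names that have already been matched against a fixed set.

```python
import sqlite3

# Assumed allowlist for a hypothetical search_logs table.
ALLOWED_COLUMNS = frozenset({"user_id", "query_text"})


def insert_search_log(conn: sqlite3.Connection, payload: dict) -> None:
    """Insert a row, rejecting any JSON key that is not an allowlisted column."""
    if not payload:
        raise ValueError("empty payload")
    unknown = set(payload) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"rejected unexpected keys: {sorted(unknown)!r}")
    columns = sorted(payload)  # every name here is a known constant
    sql = "INSERT INTO search_logs ({}) VALUES ({})".format(
        ", ".join(columns), ", ".join("?" for _ in columns)
    )
    conn.execute(sql, [payload[c] for c in columns])  # values stay parameterized


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_logs (user_id INTEGER, query_text TEXT)")
insert_search_log(conn, {"user_id": 123, "query_text": "market analysis"})
```

A payload carrying a key such as `'; SELECT * FROM chat_messages; --` is rejected with `ValueError` before any SQL string is built, so the structural position only ever receives vetted names.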

Sources: Imperva — Abusing JSON-Based SQL, Claroty Team82, OWASP SQL Injection Prevention Cheat Sheet, Promptfoo


3. Impact Analysis — What Was Exposed and What It Means

Scale of Exposure

Data Class | Quantity | Significance
Chat messages | 46.5 million | Plaintext: M&A discussions, client strategy, financial analysis
Files | 728,000 | 192K PDFs, 93K spreadsheets, 93K presentations, 58K Word docs
User accounts | 57,000 | PII, search histories, session context
AI assistants / workspaces | 384,000 / 94,000 | Full internal AI deployment scope exposed
System prompt configs | 95 (across 12 model types) | Behavioral control plane, writable
RAG document chunks | 3.68 million | Entire proprietary knowledge base with S3 paths and file metadata
External API file flows | 1.1 million files | Including 266,000+ OpenAI vector stores

Why Chat Message Exposure Is Categorically Worse for a Consulting Firm

Conversations between McKinsey employees and Lilli about client matters carry confidentiality expectations comparable to those of privileged legal communications: confidential strategy, M&A deal analyses, financial model discussions. Unlike the records in a consumer data breach, these messages represent the intellectual core of client relationships. The 46.5 million messages exposed in plaintext represent years of strategic work across hundreds of client engagements. Clients whose matters were discussed in those messages had no direct notification pathway, which adds a third-party data risk dimension.

Under GDPR Article 33, the exposure of 57,000 user accounts' personal data triggers mandatory notification to the supervisory authority within 72 hours of the controller becoming aware of the breach — unless the breach is unlikely to result in a risk to the rights and freedoms of natural persons (GDPR Art. 33, gdpr-info.eu). McKinsey's handling of this obligation was not publicly detailed in any disclosure filing.

The Write-Access Vector: Prompt Poisoning

The most operationally dangerous finding was not the data exposure — it was write access to 95 system prompt configurations stored in the same database.

System prompts define how Lilli behaves for its 43,000 users:

  • What topics Lilli will and will not engage with
  • How Lilli frames risk, opportunity, and competitive analysis
  • What sources Lilli cites and how it qualifies them
  • What guardrails Lilli applies to sensitive topics
  • What downstream tools and APIs Lilli invokes

An attacker with the same SQL injection access could execute a single UPDATE statement via an HTTP call and silently modify any of these 95 configurations. At 500,000 prompts per month, a compromised system prompt would propagate subtly altered advice to the entire 43,000-person user base — with no code change, no deployment event, and no alert in any conventional monitoring system.

This is categorically different from data theft. Data theft extracts what has happened. Prompt poisoning shapes what will happen, invisibly and persistently.

Sources: CodeWall, NeuralTrust, BankInfoSecurity


4. AI-Specific Vulnerabilities Exposed

OWASP LLM Top 10 Mapping

This breach touches multiple OWASP LLM Top 10 (2025) categories simultaneously:

OWASP LLM Category | How It Applies to Lilli
LLM01: Prompt Injection | Write access to system prompts via SQL injection is an indirect prompt injection: the attacker modifies the data the LLM reads at inference time, not the prompt submitted by the user
LLM02: Sensitive Information Disclosure | 46.5M chat messages, 728K files, and 57K user accounts exposed without authorization
LLM04: Data and Model Poisoning | 3.68M RAG chunks writable via the same injection vector, with potential for poisoned embeddings to corrupt AI retrieval
LLM06: Excessive Agency | Lilli's access to external APIs and 266K+ OpenAI vector stores means a compromised instance could have been directed to exfiltrate via those tool connections
LLM07: System Prompt Leakage | 95 system prompt configurations exposed: intellectual property and behavioral control surface
LLM08: Vector and Embedding Weaknesses | 3.68M RAG document chunks exposed with S3 paths; same write access could inject poisoned embeddings

Source: OWASP Top 10 for LLM Applications 2025

Indirect Prompt Injection via Database Write Access

Direct prompt injection requires an attacker to interact with the AI through a user-facing prompt. Indirect prompt injection is more dangerous: the attacker modifies data the AI will ingest — RAG document chunks, system prompt records, cached context — without ever sending a prompt to the model. The AI then executes the attacker's instructions as though they were legitimate system configuration.

The Lilli breach is a textbook indirect prompt injection path: SQL injection to database write access to system prompt modification to compromised AI output. The attacker never interacted with the LLM directly.

Source: Indirect Prompt Injection: The Hidden Threat — Lakera

RAG Layer as the Weakest Link

The 3.68 million RAG document chunks represent McKinsey's accumulated proprietary knowledge base — decades of research methodologies and client work distilled into the AI's retrieval system. This is simultaneously the platform's core value proposition and its most sensitive asset.

Snyk Labs' RAGPoison research (RAGPoison: Persistent Prompt Injection via Poisoned Vector Databases, August 18, 2025) demonstrated that with write access to a vector database, an attacker can inject poisoned embeddings positioned to intercept retrieval results. In Snyk's demonstration, 274,944 poisoned data points were inserted into the vector database in approximately 80 seconds using a Python script. The technique works by surrounding each legitimate document's embedding with injected points at calculated offsets (4 points per axis per document at ±0.0001 and ±0.00001 units), ensuring the poisoned content is returned as more similar to the query vector than the legitimate document. The attack successfully caused the LLM to return injected content instead of legitimate answers — demonstrating persistent prompt injection achieved entirely through the data layer, with no interaction with the LLM itself.
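The geometric idea can be illustrated with a toy nearest-neighbor search in Python. The three-dimensional vectors and the use of Euclidean distance are assumptions made for readability, and this is not Snyk's script. Only the offsets (±0.0001 and ±0.00001, four points per axis) come from the description above.

```python
import math

OFFSETS = (0.0001, -0.0001, 0.00001, -0.00001)  # four points per axis, as described


def surround(doc: list[float]) -> list[list[float]]:
    """Generate points offset from doc along each axis (4 per axis)."""
    points = []
    for axis in range(len(doc)):
        for off in OFFSETS:
            p = list(doc)
            p[axis] += off
            points.append(p)
    return points


def nearest(query, candidates):
    """Return the label of the candidate closest to query (Euclidean distance)."""
    return min(candidates, key=lambda item: math.dist(query, item[1]))[0]


legit = [0.20, 0.50, 0.10]   # toy 3-d embedding of a legitimate chunk
query = [0.25, 0.40, 0.15]   # toy query embedding
candidates = [("legit", legit)] + [("poisoned", p) for p in surround(legit)]
winner = nearest(query, candidates)
print(winner)
```

Because the injected points sit on both sides of the legitimate embedding along every axis, at least one of them is slightly closer to almost any query than the legitimate chunk, so retrieval returns the injected content first.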

Write access to the RAG knowledge base — available via the same SQL injection in Lilli — enables this attack at scale. The AI continues to appear functional; it simply returns subtly wrong answers for targeted questions, with no alert and no audit trail.

Additionally, text embeddings — often assumed to be opaque numerical representations — can be reverse-engineered to reconstruct the original source text. Research published at ACL 2024 (Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries) demonstrated that adversaries can infer sensitive information from text embeddings without direct access to the embedding model, using a surrogate transfer attack. Separately, Morris et al. (2023) showed that an adversary can recover 92% of a 32-token text input given embeddings from a T5-based transformer. Exposing the 3.68M RAG chunks and their S3 paths therefore means the underlying documents are not protected even if the source file store is theoretically separate.

The No-Audit-Trail Problem

Traditional intrusion detection systems and SIEMs monitor for:

  • Unusual network traffic
  • Privilege escalation events
  • File system changes
  • Executable code changes
  • Authentication anomalies

None of these fire when an application's own SQL connection updates a database row containing a system prompt. The attack is invisible to all conventional monitoring. The compromised prompt survives application restarts, code audits (prompts live in the database, not the code), and routine security scans. This is a new category of persistent, deniable compromise.
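One way to close this gap is to record prompt changes at the storage layer itself. The sketch below uses SQLite (through Python's sqlite3 module) purely as a stand-in, and the table and trigger names are hypothetical. A production system would also need to ship audit rows to a store that the application's own database credentials cannot modify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE system_prompts (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE prompt_audit (
        prompt_id INTEGER, old_body TEXT, new_body TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Record every change to a prompt row, whoever issues the UPDATE.
    CREATE TRIGGER audit_prompt_update AFTER UPDATE OF body ON system_prompts
    BEGIN
        INSERT INTO prompt_audit (prompt_id, old_body, new_body)
        VALUES (OLD.id, OLD.body, NEW.body);
    END;
    INSERT INTO system_prompts (id, body) VALUES (1, 'Cite sources for every claim.');
    """
)

# Simulate the silent modification: a plain UPDATE over the application's connection.
conn.execute("UPDATE system_prompts SET body = ? WHERE id = ?", ("Never cite sources.", 1))
audit_rows = conn.execute(
    "SELECT prompt_id, old_body, new_body FROM prompt_audit"
).fetchall()
print(audit_rows)
```

The UPDATE still succeeds, but it now leaves a row that a monitoring job can alert on, which is the signal conventional tooling lacks.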


5. Why Standard Tools Missed It

The Scanner Architecture Problem

OWASP ZAP and signature-based WAFs share the same architectural limitation: they test input value positions against libraries of known malicious patterns. They model the attack surface as a set of named fields with user-supplied values. They do not model:

  • The relationship between request structure and backend query construction
  • Dynamic key names as injection candidates
  • Multi-step reasoning across multiple responses to infer database structure

The CodeWall agent succeeded by observing that error messages contained key names verbatim — a subtle, multi-request inference that no single-request signature scanner can make. It then used that observation to build an iterative error-extraction channel. This is behavioral analysis, not pattern matching.
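The difference between value-position and key-position testing is easy to see in a test-case generator. This Python sketch is hypothetical, with a single harmless probe marker and made-up function names. It shows that mutating values leaves the key set unchanged, so the key position is only exercised when the generator renames keys as well.

```python
from typing import Iterator

PROBE = "x'"  # a single harmless quote-bearing marker; real suites use many


def mutate_values(payload: dict) -> Iterator[dict]:
    """Value-position fuzzing: replace each value in turn, keys untouched."""
    for key in payload:
        yield {**payload, key: PROBE}


def mutate_keys(payload: dict) -> Iterator[dict]:
    """Key-position fuzzing: rename each key in turn so it carries the probe."""
    for key in payload:
        mutated = {k: v for k, v in payload.items() if k != key}
        mutated[key + PROBE] = payload[key]
        yield mutated


base = {"user_id": 123, "query_text": "market analysis"}
value_cases = list(mutate_values(base))
key_cases = list(mutate_keys(base))
```

A test harness built only on `mutate_values` would never send a request whose keys differ from the documented schema, which is why the concatenated-key path stays untested.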

WAF Bypass: A Documented, Industry-Acknowledged Problem

Claroty Team82's research (2022) demonstrated bypass of Palo Alto, AWS ELB, Cloudflare, F5, and Imperva using JSON-based injection syntax. The OWASP CRS project added Rule 942550 within six days of that publication, but the rule protects only deployments that have since updated their rule sets. Organizations that had not updated CRS rules since before December 2022 were running without default-level protection for this specific vector.

The fundamental issue, as Imperva's research noted: "an attacker can craft a tautology that does not use an equal sign," bypassing WAF rules that require = combined with suspicious patterns to identify SQL injection.

The False Assurance of "Scanner Passed"

The most dangerous outcome of scanner-based security assurance is not any single vulnerability that scanners miss; it is the organizational confidence that "scanner passed" generates. Teams see clean results and treat that as clearance. This mental model breaks for AI platforms because:

  1. AI-era API designs introduce structural injection surfaces that scanners were not built to test
  2. The AI behavioral layer (prompts, RAG configs) is entirely outside traditional scanner scope
  3. Complex API surfaces (200+ endpoints in Lilli's case) exceed what manual review can cover without a systematic approach

Source: Picus Security — WAF Bypass Using JSON-Based SQL Injection Attacks


6. The Autonomous Agent Advantage

What Checklist-Based Testing Misses by Design

Conventional penetration testing is bounded: a defined scope, a checklist of known attack classes, and a fixed testing window (often 1–2 weeks per year). It is excellent at finding known vulnerability classes in known locations. It is structurally unable to discover:

  • Novel injection surfaces not in the tester's checklist
  • Multi-step attack chains that link five individually innocuous observations into a critical finding
  • Vulnerabilities that require adaptive reasoning across many responses

The Lilli vulnerability required 15 iterative probes, each using the previous error response to refine the next attempt. A checklist tester would have logged "JSON injection test: no response match" and moved on. The autonomous agent recognized partial SQL error content in key reflection and built a channel from that signal.

Empirical Evidence: AI Agents vs. Human Testers

A December 2025 arXiv paper (Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing, arXiv 2512.09882) tested the ARTEMIS AI agent framework against human penetration testers in real-world exercises:

  • ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate
  • ARTEMIS outperformed 9 of 10 human participants
  • The top human performer found 13 vulnerabilities; ARTEMIS configurations reached 11
  • Cost differential: $18.21/hour for ARTEMIS vs. $125,034/year (average U.S. penetration tester)
  • Key AI advantage: parallel sub-agent deployment — ARTEMIS "launches a sub-agent to probe that target in the background, sometimes resulting in multiple sub-agents for multiple targets," enabling concurrent exploitation that individual humans cannot match

The paper's conclusion: AI agents are not replacements for human testers (GUI-based interactions, business logic, social engineering remain human-dominated tasks) but are genuine operational complements offering cost-effective continuous coverage.

Multi-Agent Orchestration: The AWS Security Agent Model

The multi-agent architecture used by CodeWall reflects a broader shift in how automated security testing is being designed. AWS launched its own Security Agent (Inside AWS Security Agent: A multi-agent architecture for automated penetration testing) in preview at AWS re:Invent 2025, providing a concrete commercial illustration of the approach. The system coordinates specialized sub-agents: one performs broad reconnaissance using static predefined tasks to map the application surface; a guided exploration agent then dynamically generates focused test tasks tailored to the specific application, reasoning about discovered endpoints, business logic patterns, and potential vulnerability chains; and an intelligent sign-in component maintains authenticated sessions across testing phases.

Critically, the system "dynamically generates focused test tasks" rather than running from a fixed list, adapting based on application responses in the same way the CodeWall agent adapted based on error messages. AWS's public implementation of this architecture is significant: it confirms that major cloud providers now treat adaptive, multi-agent autonomous security testing as a mainstream engineering capability, not an experimental research method. The same reasoning loop that makes it effective for defense makes it equally effective when used offensively.

Source: AWS Security Agent Blog — AWS re:Invent 2025

The Attacker Parity Problem

CodeWall CEO Paul Price stated directly: "Hackers will be using the same technology and strategies" for ransomware and data extortion. This is not speculative. RunSybil — founded by OpenAI's first security hire — raised $40M in March 2026 for continuous autonomous penetration testing. Penetrify, an autonomous AI red team, maps attack surfaces, chains vulnerabilities, and provides production-ready code fixes autonomously. The autonomous offensive capability demonstrated in the Lilli breach is commercially available, at low cost, to any attacker with motivation.

The defender calculus: if attackers can run autonomous agents continuously against your production systems at $18/hour, annual penetration tests and quarterly scanner runs are not an adequate response model.
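The two cost figures quoted earlier use different units, so a short Python calculation helps put them side by side. The 2,080-hour working year (40 hours × 52 weeks) is an assumption; the hourly and annual figures are the ones cited above.

```python
HOURS_PER_YEAR_CONTINUOUS = 24 * 365   # agent running around the clock: 8,760 hours
WORK_HOURS_PER_YEAR = 40 * 52          # assumed full-time schedule: 2,080 hours

agent_hourly = 18.21                   # figure quoted above
tester_salary = 125_034                # figure quoted above

agent_annual = agent_hourly * HOURS_PER_YEAR_CONTINUOUS
tester_hourly = tester_salary / WORK_HOURS_PER_YEAR

print(f"Agent, 24/7 for a year: ${agent_annual:,.0f}")
print(f"Tester, per working hour: ${tester_hourly:,.2f}")
```

Under these assumptions the agent costs roughly $159,520 for a full year of continuous operation, while the tester's salary works out to about $60.11 per working hour. The agent is cheaper per hour, and a continuously running agent covers about four times as many hours as one full-time tester.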

Sources: arXiv 2512.09882, RunSybil $40M raise — Fortune, 2026 Guide to AI Penetration Testing — Penligent


7. Blast Radius: Why AI Platforms Are Different

Conventional Database Breach vs. AI Platform Breach

In a conventional enterprise database breach via SQL injection — say, a CRM with 50,000 customer records — the blast radius is:

  • Records exposed: bounded, enumerable, notifiable
  • Behavior affected: none
  • Recovery: patch the query, rotate credentials, notify affected users
  • Persistence: none after remediation

The Lilli breach blast radius is qualitatively different (CodeWall, NeuralTrust):

Dimension | Conventional DB Breach | AI Platform Breach (Lilli)
Data exposed | Records in targeted table | All data in all tables the DB user can reach
Behavior affected | None | AI behavioral layer fully writable
User impact | Specific affected records | All 43,000 users via single prompt modification
Knowledge base | Not applicable | 3.68M RAG chunks exposable and poisonable
Persistence after remediation | Ends when patched | Prompt modification survives patch if not detected
Detection | Access logs, data monitoring | No existing monitoring category catches prompt modification
Supply chain | Direct users | Every client whose work informed those 46.5M messages

The Behavioral Multiplier

The 500,000 monthly prompts number is the multiplier that transforms prompt write access from a targeted attack into a platform-wide influence operation. Modifying one system prompt does not affect one user — it affects every query that prompt handles, continuously, until the modification is detected and reversed. At 500,000 prompts per month, even a subtle framing bias in financial analysis prompts compounds into thousands of subtly influenced decisions per day.
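A rough Python calculation shows where "thousands per day" comes from. The 30-day month and the 10% traffic share are assumptions chosen for illustration; the source does not say how prompts are distributed across the 95 configurations.

```python
monthly_prompts = 500_000     # figure quoted above
days_per_month = 30           # assumed for a rough daily rate
share_through_prompt = 0.10   # assumption: 10% of traffic uses the modified prompt

daily_prompts = monthly_prompts / days_per_month
affected_per_day = daily_prompts * share_through_prompt

print(f"Prompts per day: {daily_prompts:,.0f}")
print(f"Affected per day at a 10% share: {affected_per_day:,.0f}")
```

That is about 16,667 prompts per day in total, and about 1,667 per day flowing through a single modified configuration under the assumed 10% share.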

This is categorically not a data breach. Data breaches are discrete events. Prompt poisoning is a continuous-time compromise of an organization's AI-mediated reasoning. As the Sombrainc LLM Security 2026 analysis notes, a single poisoning event in a shared AI system has a multiplied reach that no conventional database exploit can match.

The IP Aggregation Problem

The 3.68 million RAG document chunks represent McKinsey's accumulated proprietary knowledge — research methodologies, client engagement patterns, analytical frameworks built over decades. A conventional filing system breach might expose documents in a folder. A RAG knowledge base breach exposes the distilled, semantically searchable essence of everything the organization knows. Read access to the RAG layer is equivalent to read access to a structured, queryable representation of the firm's intellectual capital — without needing to read any individual document.

As the Development Corporate M&A analysis of the breach observes, this IP aggregation dimension means that RAG-based AI systems are now acquisition intelligence targets — a single breach can yield not just current data but a firm's accumulated knowledge architecture.

Source: Outpost24 — How an AI Agent Hacked McKinsey AI Platform, NeuralTrust, Development Corporate

OWASP LLM06: Excessive Agency

The integration of Lilli with 266,000+ external OpenAI vector stores and 1.1 million files flowing through external APIs represents OWASP LLM06: Excessive Agency risk. A compromised system prompt could direct those tool connections to exfiltrate data, modify external stores, or invoke APIs with the platform's full permission scope — well beyond what the original breach access alone could achieve. Overpermissioned AI systems amplify the impact of any underlying infrastructure breach.


8. Unauthenticated Endpoints: The Entry Point

How 22 Unauthenticated Endpoints Reached Production

The 22 unauthenticated endpoints among Lilli's 200+ are almost certainly a development-to-production hygiene failure: endpoints created during development where authentication was deferred, marked as internal, or used by tooling that was never meant to be public-facing. In a production platform serving 43,000 employees, these represent oversight rather than deliberate design.

This pattern is extremely common in enterprise platforms that grew quickly. Features ship faster than security reviews. Development endpoints that exist for convenience or internal tooling get included in deployments. Public API documentation — intended for internal developer reference — becomes an external reconnaissance resource.

McKinsey's remediation after disclosure was to immediately remove public API documentation and patch all unauthenticated endpoints. The remediation itself confirms the pre-breach state: public docs and open endpoints were not a security concern someone had addressed.

The API Documentation Gift to Attackers

Publicly exposed OpenAPI/Swagger documentation is a reconnaissance gift. It provides:

  • Exact endpoint paths
  • Authentication requirements (or lack thereof)
  • Accepted parameters and data types
  • Expected response schemas
  • Error message formats

For an autonomous agent, this is a structured attack surface map. The CodeWall agent did not need to brute-force endpoint discovery because the documentation enumerated the surface.

Zero-Trust API Design Principles

NIST Special Publication 800-207 (Zero Trust Architecture) establishes the principle that all access requests — regardless of network location — must be authenticated and authorized before resource access is granted (NIST SP 800-207). Applied to API design, this means authentication is the default state; unauthenticated access requires explicit, reviewed exception. Developer endpoints are never shipped to production environments.

The Lilli breach represents a failure of this principle applied at the API design level. Authentication was not enforced by default — it was configured per endpoint, and 22 endpoints were missed.
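A default-deny design inverts that failure mode: a forgotten flag leaves an endpoint protected rather than open. The Python sketch below is a hypothetical, framework-free illustration. The `route` decorator, the paths, and the presence-only token check are placeholders, not a real authentication scheme.

```python
from typing import Callable, Optional

ROUTES: dict[str, dict] = {}


def route(path: str, *, public_reason: Optional[str] = None) -> Callable:
    """Register a handler; it requires auth unless a reviewed reason is given."""
    def decorator(handler: Callable) -> Callable:
        ROUTES[path] = {
            "handler": handler,
            "requires_auth": public_reason is None,
            "public_reason": public_reason,
        }
        return handler
    return decorator


def dispatch(path: str, token: Optional[str]) -> str:
    """Call the handler, rejecting unauthenticated calls to protected routes."""
    entry = ROUTES[path]
    if entry["requires_auth"] and not token:  # placeholder: real code verifies the token
        return "401 Unauthorized"
    return entry["handler"]()


@route("/search/log")  # protected by default, no flag needed
def log_search() -> str:
    return "200 logged"


@route("/health", public_reason="load balancer probe, reviewed")
def health() -> str:
    return "200 ok"
```

Because every public route must carry a `public_reason`, the list of unauthenticated endpoints becomes something a reviewer can enumerate directly from `ROUTES`.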


9. Responsible Disclosure and Industry Response

The Disclosure Process

CodeWall followed a responsible disclosure model bounded by McKinsey's HackerOne policy. The agent verified the policy existed before proceeding — treating it as a legal guardrail for the engagement. Disclosure to McKinsey occurred on March 1, 2026, one day after the breach. McKinsey acted rapidly: all unauthenticated endpoints patched, development environment taken offline, and public API documentation removed by March 2 — within 24 hours of notification.

Public disclosure followed on March 9, 2026. This eight-day window from disclosure to publication is consistent with responsible disclosure norms (which typically allow 7–14 days for critical actively-exploitable vulnerabilities with patches in place).

McKinsey's Public Posture

McKinsey has not issued a public statement beyond confirming through a spokesperson that a third-party forensics investigation found no evidence of unauthorized data access by any party other than the CodeWall research engagement. There has been no public acknowledgment of the prompt write-access risk, no client notification, and no publicly visible regulatory disclosure filing. The absence of public disclosure of the broader vulnerability risk, given the platform scale and the nature of client data involved, has been noted critically by multiple security analysts.

Industry Framing

Coverage across The Register, BankInfoSecurity, FStech, Inc., and The Decoder consistently applied the same framing: a decades-old vulnerability class (SQL injection) amplified to an unprecedented scale and qualitative impact by its context (enterprise AI platform). The story landed not as "yet another SQL injection" but as "the first major demonstration of AI platform-specific blast radius in a real production breach."


10. Enterprise AI Security Governance Recommendations

Security Architecture for AI Platforms

Based on the Lilli breach findings and corroborating analysis from Traefik, Swept AI, NeuralTrust, and Outpost24, the following architecture controls should be standard for any enterprise AI platform:

Layer 1 — API Gateway (catch at the perimeter)

  • Authentication enforced on every endpoint, no exceptions
  • WAF with current CRS rules including Rule 942550 (JSON SQL injection)
  • SQL injection pattern detection as a blocking rule
  • API documentation never exposed publicly

Layer 2 — Application Layer (validate before database contact)

  • Allowlist validation for all JSON key names before SQL construction
  • ORM/query builder that never accepts field names from user input
  • Rate limiting per authenticated identity
  • Input validation schema enforcement
  • Object-level authorization check on every record-returning operation
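The first and last items in Layer 2 can be sketched together. The example below is a minimal illustration under stated assumptions: the `documents` table, its columns, and the `FILTER_COLUMNS` map are hypothetical, and SQLite stands in for whatever database the platform actually uses. JSON keys are only ever used to look up a column name from a fixed map, values are bound as parameters, and every query is scoped to the caller's `owner_id` as a simple object-level authorization check.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Hypothetical allowlist: maps accepted JSON keys to fixed column names.
# SQL identifiers come from this constant, never from the request itself.
FILTER_COLUMNS = {"title": "title", "workspace": "workspace"}


def build_search_query(filters: Dict[str, Any], owner_id: int) -> Tuple[str, List[Any]]:
    """Build a parameterized query from client-supplied JSON filters."""
    unknown = sorted(set(filters) - set(FILTER_COLUMNS))
    if unknown:
        # Reject the request instead of trying to sanitize the key.
        raise ValueError(f"unsupported filter keys: {unknown}")

    # Object-level authorization: results are always scoped to the caller.
    clauses = ["owner_id = ?"]
    params: List[Any] = [owner_id]
    for key in sorted(filters):
        clauses.append(f"{FILTER_COLUMNS[key]} = ?")
        params.append(filters[key])
    sql = "SELECT id, title FROM documents WHERE " + " AND ".join(clauses)
    return sql, params


def demo() -> List[tuple]:
    """Run the builder against an in-memory database with two owners."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, owner_id INTEGER, title TEXT, workspace TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents (owner_id, title, workspace) VALUES (?, ?, ?)",
        [(1, "Q3 plan", "strategy"), (2, "Q3 plan", "strategy")],
    )
    sql, params = build_search_query({"title": "Q3 plan"}, owner_id=1)
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows
```

A key such as `title = '' OR 1=1 --` never reaches the SQL string, because it is not in the map and the request fails with `ValueError` before any query is built.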

Layer 3 — Database Layer (limit blast radius)

  • Read-only credentials for all AI inference query paths
  • Separate credentials and schemas for: user conversation data, RAG knowledge base, system prompt configurations
  • Row-level security or table-level access policies
  • Audit logging at the storage layer, separate from application logs
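The effect of a read-only credential on the inference path can be demonstrated with SQLite's read-only URI mode as a stand-in. This is an assumption for illustration only: in a server database the equivalent control would be a role granted SELECT and nothing else, and the `system_prompts` table and `open_inference_connection` helper are hypothetical names.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_inference_connection(db_path: str) -> sqlite3.Connection:
    """Open the database in read-only mode for the AI query path."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo() -> bool:
    """Return True if a write through the read-only connection is rejected."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "platform.db")

        # A separate, privileged connection sets up the data.
        admin = sqlite3.connect(db_path)
        admin.execute("CREATE TABLE system_prompts (id INTEGER PRIMARY KEY, body TEXT)")
        admin.execute("INSERT INTO system_prompts (body) VALUES ('be helpful')")
        admin.commit()
        admin.close()

        readonly = open_inference_connection(db_path)
        try:
            # Reads work as usual.
            rows = readonly.execute("SELECT body FROM system_prompts").fetchall()
            assert rows == [("be helpful",)]
            try:
                # Even a successful injection cannot modify data on this path.
                readonly.execute("UPDATE system_prompts SET body = 'tampered'")
            except sqlite3.OperationalError:
                return True
            return False
        finally:
            readonly.close()
```

The design choice is that the write restriction is enforced by the storage layer, so it holds even when the application layer has a flaw.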

Layer 4 — Prompt and RAG Layer (protect the behavioral control plane)

  • System prompt storage isolated from operational data in a separate, write-restricted store
  • Version-controlled prompts with change approval workflow (treat like source code)
  • Cryptographic hash of production prompts stored in an append-only audit log; verify at load time
  • Write access to prompt configurations restricted to the deployment pipeline, not the application database user
  • Continuous hash-based integrity monitoring; alert on deviation from known-good baseline
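The "verify at load time" control can be sketched as follows. This is a minimal illustration, assuming prompts are plain strings keyed by an identifier and that the approved hashes come from a trusted source outside the operational database (for example, the deployment pipeline). The `load_prompt` function, `PromptIntegrityError`, and both dictionaries are hypothetical names.

```python
import hashlib
import hmac
from typing import Dict


class PromptIntegrityError(RuntimeError):
    """Raised when a stored prompt does not match its approved hash."""


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_prompt(prompt_id: str,
                prompt_store: Dict[str, str],
                approved_hashes: Dict[str, str]) -> str:
    """Return the prompt body only if its hash matches the approved value."""
    body = prompt_store[prompt_id]
    expected = approved_hashes.get(prompt_id)
    # A prompt with no approved hash is treated as untrusted, not as valid.
    if expected is None or not hmac.compare_digest(sha256_hex(body), expected):
        raise PromptIntegrityError(f"prompt {prompt_id!r} failed integrity check")
    return body
```

Failing closed here means a silently modified prompt stops being served instead of reaching users, at the cost of an outage for that assistant until the change is reviewed.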

Prompt Integrity Monitoring: The Gap No One Has Filled

The most underinvested capability in enterprise AI security today is prompt integrity monitoring. Organizations have endpoint detection and response (EDR) for devices, SIEM for logs, and DLP for data. There is no widely deployed equivalent for detecting that an AI's behavioral instructions have been silently modified. Building this capability requires:

  1. Baseline snapshots of all system prompt configurations
  2. Continuous comparison against those baselines
  3. Automated alert on any deviation not correlated with an approved deployment event
  4. Incident response playbook specifically for "suspected prompt modification" scenarios
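Steps 1 through 3 can be sketched as a small drift detector. This is an illustration under stated assumptions, not an existing product: `snapshot`, `detect_drift`, and `Alert` are hypothetical names, and approved deployment events are modeled as a set of `(prompt_id, new_hash)` pairs, with `None` as the hash for an approved removal.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Alert:
    prompt_id: str
    kind: str  # "added", "removed", or "modified"


def snapshot(prompts: Dict[str, str]) -> Dict[str, str]:
    """Step 1: record a SHA-256 fingerprint for every prompt configuration."""
    return {pid: hashlib.sha256(body.encode("utf-8")).hexdigest()
            for pid, body in prompts.items()}


def detect_drift(baseline: Dict[str, str],
                 current: Dict[str, str],
                 approved_changes: Set[Tuple[str, Optional[str]]]) -> List[Alert]:
    """Steps 2 and 3: compare snapshots and alert on unapproved deviations."""
    alerts: List[Alert] = []
    for pid in sorted(set(baseline) | set(current)):
        old, new = baseline.get(pid), current.get(pid)
        if old == new:
            continue
        # A change is acceptable only if it matches an approved deployment.
        if (pid, new) in approved_changes:
            continue
        if old is None:
            kind = "added"
        elif new is None:
            kind = "removed"
        else:
            kind = "modified"
        alerts.append(Alert(pid, kind))
    return alerts
```

Run on a schedule, `detect_drift` would feed whatever alerting channel the organization already uses; step 4, the response playbook, remains a process question rather than a code one.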

AI Asset Inventory: The Governance Prerequisite

Swept AI's analysis of the Lilli breach found a pattern across enterprise AI deployments: organizations cannot answer basic governance questions about their own AI systems — which assistants exist, what data each accesses, who created them, what prompts govern them, when they were last modified. Without this inventory, security reviews are necessarily incomplete and regulatory compliance is unauditable.

The Lilli platform's 384,000 AI assistants and 94,000 workspaces illustrate the scale problem. At that volume, manual inventory is impossible. AI asset management tooling — analogous to software asset management (SAM) for traditional applications — is now a prerequisite for AI security governance.

M&A Due Diligence: A New Security Domain

The Development Corporate analysis identified AI platform security as an emerging M&A due diligence domain. For acquirers evaluating companies with enterprise AI platforms, the Lilli breach surfaces a checklist that did not previously exist:

  • Are all API endpoints authenticated by default?
  • Is API documentation publicly accessible?
  • How are system prompt configurations stored, versioned, and access-controlled?
  • Is the database credential used for AI inference read-only?
  • When was the last penetration test, and was AI-specific attack surface included in scope?
  • Are RAG vector stores access-controlled at retrieval time?
  • Does the AI incident response plan cover prompt modification scenarios?

AI security maturity now affects valuation — both as discount risk and as premium for organizations that have proactively built these controls.

OWASP LLM Top 10 as Audit Framework

The most actionable governance artifact for teams that need to assess their current AI security posture is the OWASP LLM Top 10 (2025). Mapping the Lilli breach to this framework produces a concrete audit checklist:

OWASP Category | Lilli Breach Evidence | Your Audit Question
LLM01: Prompt Injection | Write access to system prompts via SQL injection | Can any API path reach system prompt records in write mode?
LLM02: Sensitive Information Disclosure | 46.5M messages, 728K files exposed | Are AI query credentials read-only and scoped to minimum data?
LLM04: Data and Model Poisoning | RAG write access via same injection vector | Who can write to the RAG knowledge base, via what path?
LLM06: Excessive Agency | 266K+ external API connections exploitable from compromised prompt | Are AI tool permissions scoped to minimum needed?
LLM07: System Prompt Leakage | 95 system prompts exposed and readable | Are system prompts access-controlled and not in the operational DB?
LLM08: Vector and Embedding Weaknesses | 3.68M RAG chunks exposed with S3 paths | Is the vector database access-controlled at retrieval time?

Source: OWASP LLM Top 10 2025


This report was produced by the Eightgen Research Division on March 19, 2026. All claims are cited with primary sources.

Gaurav Chopra

Gaurav is a Co-Founder of Eightgen AI
