Executive Summary
On February 28, 2026, an autonomous AI agent built by security research startup CodeWall gained full read-write access to McKinsey's internal AI platform "Lilli" in under two hours — with no credentials, no insider access, and no human guidance during the attack. The entry point was a SQL injection vulnerability in an unauthenticated API endpoint. The twist: while query values were correctly parameterized, JSON field names (keys) were concatenated directly into SQL — a variant structurally invisible to OWASP ZAP and all five major WAFs tested by researchers. A secondary IDOR (Insecure Direct Object Reference) flaw then enabled cross-user access to individual employee search histories. The exposure reached 46.5 million plaintext chat messages, 728,000 client files, 57,000 user accounts, and 3.68 million RAG document chunks representing decades of McKinsey's proprietary research. The most consequential finding was write access to 95 system prompt configurations controlling AI behavior — modifiable with a single HTTP call, no deployment gate, no alert, no audit trail. McKinsey patched all unauthenticated endpoints within 24 hours of disclosure and a third-party forensics firm found no evidence of unauthorized access beyond the research engagement. The core lesson: AI platforms are behavioral control systems sitting on top of data aggregations. A single vulnerability now reaches not just data, but the AI's entire reasoning contract with its users.
1. What Happened — Timeline, Actors, and Discovery
CodeWall and Its Autonomous Agent
CodeWall is an autonomous offensive security research startup. Their AI agent conducts what they describe as fully autonomous red-teaming: "researching the target, analyzing, attacking, and reporting" with no human in the loop. The agent does not follow a predefined checklist — it reasons iteratively, forming hypotheses from observed responses and adapting its attack chain accordingly.
The agent was pointed at McKinsey's Lilli platform on February 28, 2026. Notably, the agent itself selected the target. It flagged McKinsey because:
- McKinsey had recently publicized updates to the Lilli platform, signaling an active, high-value target
- McKinsey's HackerOne responsible disclosure policy provided a legal and ethical boundary the agent could operate within
Lilli: The Platform at Stake
Lilli is McKinsey's enterprise AI platform, processing 500,000+ prompts per month across 43,000+ employees. Capabilities include AI-assisted chat covering strategy, M&A, and client engagements; RAG-based search over 100,000+ internal documents; 384,000 AI assistants across 94,000 workspaces; and integration with external APIs including OpenAI vector stores. It had been running in production for over two years.
Attack Sequence
- API Surface Mapping — The agent discovered publicly accessible API documentation listing 200+ endpoints. Of these, 22 required no authentication.
- Unauthenticated Endpoint Targeting — An endpoint accepting JSON payloads to log user search queries was identified as a write path to the database.
- SQL Injection Discovery — The agent observed that JSON key names were reflected verbatim in database error messages. This indicated keys were being concatenated into SQL rather than parameterized.
- Blind SQL Injection Exploitation — Through 15 iterative error-based probing attempts, the agent mapped the database schema.
- IDOR Chaining — Following the SQL injection, the agent identified a Broken Object Level Authorization (BOLA/IDOR) flaw that allowed retrieval of individual employee records without verifying the caller's permission to access them.
- Full Database Access — Complete read-write access to the production database, including cross-user records, was achieved within two hours.
Responsible Disclosure and Remediation Timeline
| Date | Event |
|---|---|
| February 28, 2026 | Agent achieves full access to Lilli production database |
| March 1, 2026 | CodeWall initiates responsible disclosure to McKinsey |
| March 2, 2026 | McKinsey patches all unauthenticated endpoints; dev environment taken offline; public API docs blocked |
| March 9, 2026 | CodeWall publishes public disclosure |
McKinsey's investigation, supported by a leading third-party forensics firm, found no evidence that client data or confidential information was accessed by any unauthorized party beyond this research engagement.
Sources: CodeWall, The Register, CyberNews
2. The Exploit: JSON Field-Name SQL Injection and IDOR — Technical Deep Dive
What Partial Parameterization Means
The developers at McKinsey applied parameterized queries correctly for field values — the textbook defense against SQL injection. The failure was in field names. Consider this simplified example:
    -- Intended query (values parameterized, safe):
    INSERT INTO search_logs (user_id, query_text) VALUES (?, ?)

    -- Actual query construction (key names concatenated, vulnerable):
    -- The JSON payload: {"user_id": 123, "query_text": "market analysis"}
    -- The code extracts keys and builds:
    "INSERT INTO search_logs (" + key1 + ", " + key2 + ") VALUES (?, ?)"

If an attacker submits a JSON payload with a malicious key name:

    {"user_id": 123, "'; SELECT * FROM chat_messages; --": "value"}

The key name is injected into the structural (column name) position of the SQL statement, a location parameterization never touches. The database interprets the injected SQL as valid commands.
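To make the pattern concrete, the following is a minimal Python sketch of the flawed construction described above. It is not McKinsey's code: the function name `build_insert_vulnerable` is hypothetical, and the table and column names are taken from the simplified example. It only builds the SQL text (nothing is executed) to show that bound values stay out of the statement while key names land inside it.

```python
def build_insert_vulnerable(payload: dict) -> tuple:
    """Hypothetical handler logic: values are bound, key names are concatenated."""
    columns = ", ".join(payload.keys())             # untrusted keys reach the SQL text
    placeholders = ", ".join("?" for _ in payload)  # values are parameterized correctly
    sql = f"INSERT INTO search_logs ({columns}) VALUES ({placeholders})"
    return sql, list(payload.values())


# Benign payload: the statement looks exactly like the intended query.
benign_sql, benign_params = build_insert_vulnerable(
    {"user_id": 123, "query_text": "market analysis"}
)

# Payload from the example above: the key text becomes part of the statement.
hostile_sql, hostile_params = build_insert_vulnerable(
    {"user_id": 123, "'; SELECT * FROM chat_messages; --": "value"}
)

print(benign_sql)
print(hostile_sql)
```

The values list is identical in shape for both payloads; only the statement text differs, which is why value-focused defenses and tests do not notice the problem.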
Historical note: The principle that JSON keys can become SQL injection vectors was first documented publicly by Kazuho Oku in July 2014 (The JSON SQL Injection Vulnerability, Kazuho's Weblog). That analysis predates the December 2022 OWASP CRS rule addressing JSON-based SQL injection at default protection levels by more than eight years, illustrating how long the tooling community can lag a known vulnerability class.
How the Agent Detected It
The critical signal: database error messages reflected the JSON key names verbatim. When the agent submitted keys containing SQL fragments, it observed partial SQL error messages echoed back — confirming that the key content was reaching the SQL parser unescaped. This enabled error-based blind SQL injection: by crafting keys that produce specific database errors, the agent extracted database structure information from error content across 15 iterative probes.
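The reflection signal can be checked against one's own services with a harmless marker. The sketch below is an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for the real backend, `log_search_flawed` is a hypothetical handler that reproduces the concatenation flaw and returns raw database errors, and `key_is_reflected` is a hypothetical test helper. Real database engines word their errors differently.

```python
import sqlite3
import uuid


def log_search_flawed(conn, payload):
    """Hypothetical handler: concatenates keys and leaks the raw DB error text."""
    columns = ", ".join(payload.keys())
    placeholders = ", ".join("?" for _ in payload)
    sql = f"INSERT INTO search_logs ({columns}) VALUES ({placeholders})"
    try:
        conn.execute(sql, list(payload.values()))
        return None
    except sqlite3.Error as exc:
        return str(exc)  # a leaky API would echo this back to the caller


def key_is_reflected(conn) -> bool:
    """Send a unique, harmless key and check whether the error echoes it."""
    marker = "probe_" + uuid.uuid4().hex[:8]
    error_text = log_search_flawed(conn, {"user_id": 1, marker: "x"})
    return error_text is not None and marker in error_text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_logs (user_id INTEGER, query_text TEXT)")
reflected = key_is_reflected(conn)
print("key reflected in error:", reflected)
```

A `True` result means key text is reaching the SQL parser and the error channel is exposed to the caller; both conditions are worth treating as findings even before any exploitation is attempted.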
IDOR: The Secondary Vulnerability That Amplified Cross-User Access
Beyond the SQL injection, the agent identified a Broken Object Level Authorization (BOLA) flaw — the modern OWASP API Security Top 10 term for what is commonly called IDOR (Insecure Direct Object Reference). As the Promptfoo analysis of the incident describes, "the application accepts an object identifier and returns a record without verifying that the caller is allowed to see it." After the SQL injection established database access, the IDOR flaw allowed the agent to enumerate and retrieve individual employees' search histories by simply supplying their identifiers in API requests — no authorization verification was performed.
The chained attack path was: unauthenticated endpoint → SQL injection → database schema extraction → IDOR exploitation → cross-user search history access. Each step used the previous one's findings, which is precisely the multi-step reasoning that checklist scanners cannot replicate.
Source: McKinsey's Lilli Looks More Like an API Security Failure Than a Model Jailbreak — Promptfoo
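The missing control in the IDOR step is a per-object ownership check. The sketch below shows one minimal shape for it, assuming a simple "owner only" policy; the `SearchHistory` type, the `HISTORIES` store, and `get_search_history` are hypothetical names for illustration, and a real platform would likely also need role- or delegation-based rules.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when the caller is not allowed to read the requested object."""


@dataclass(frozen=True)
class SearchHistory:
    owner_id: int
    queries: tuple


# Hypothetical in-memory store standing in for the real database.
HISTORIES = {
    101: SearchHistory(101, ("pricing strategy",)),
    202: SearchHistory(202, ("merger model",)),
}


def get_search_history(caller_id: int, target_user_id: int) -> SearchHistory:
    """Return a record only if the authenticated caller owns it."""
    record = HISTORIES.get(target_user_id)
    # Treat "missing" and "not yours" identically so identifiers cannot be enumerated.
    if record is None or record.owner_id != caller_id:
        raise Forbidden("caller may not read this search history")
    return record


print(get_search_history(101, 101).queries)
```

The key design choice is that the check compares the object's owner to the authenticated caller on every read, rather than trusting that an authenticated session implies access to any identifier supplied in the request.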
This Is Not a New Class — It Is Systematically Missed
CVE-2024-42005, disclosed August 8, 2024, is the same vulnerability class in Django's ORM. Django's QuerySet.values() and values_list() methods, when used on models with a JSONField, failed to properly validate JSON object keys passed as arguments, allowing SQL injection through column aliases. Affected versions: Django 4.2 before 4.2.15 and Django 5.0 before 5.0.8. The fix required explicitly treating JSON key names as untrusted input, which is exactly what Lilli's developers failed to do.
Source: CVE-2024-42005: Django JSONField SQL Injection — Vulert
Why WAFs and OWASP ZAP Missed It
Claroty Team82's research (JS-ON: Security-OFF, December 2022) tested JSON-based SQL injection against five major WAF vendors and bypassed all of them:
- Palo Alto Networks (Next Generation Firewall)
- Amazon Web Services (AWS ELB)
- Cloudflare
- F5 (Big-IP)
- Imperva
The reason: WAF SQL injection detection rules look for standard SQL operators and keywords combined with suspicious patterns. JSON-based injection uses database JSON operators (JSON_EXTRACT, ?, @>, ::jsonb) that WAF parsers did not model as SQL. The database interprets these as valid SQL; the WAF sees only JSON.
Example PostgreSQL injection payload that passes WAFs:
    '{"b":2}'::jsonb <@ '{"a":1, "b":2}'::jsonb

The OWASP Core Rule Set (CRS) project added Rule 942550 ("JSON-Based SQL Injection") only in December 2022, six days after the Team82 research was published, providing protection at default paranoia level 1. Before that, protection only existed at paranoia level 2 (non-default). Organizations running WAFs with CRS at default paranoia level 1 before 2023 had no WAF-level defense. Those running out-of-date WAF rules in 2026 may still be unprotected.
Source: OWASP CRS Project — A new rule to prevent SQL in JSON
OWASP ZAP's scanner tests input value positions against known bad signatures. It does not systematically model JSON key names as injection candidates. This is not a flaw in ZAP — it is a structural limitation of signature-based scanning against a non-standard injection surface.
Correct Remediation
The fix is not more thorough parameterization. Parameterization cannot protect structural SQL positions (column names, table names, operators). The correct approaches are:
- Allowlisting for all structural elements. Maintain an explicit set of permitted column names. Reject any key name not on the list before SQL construction begins.
- ORM abstractions that never accept field names from user input. Do not expose raw JSON keys as database column selectors.
- Static analysis rules. Scan for any string concatenation involving request-derived data in SQL construction paths — values and structural positions alike.
- Object-level authorization checks. Every request returning a record must verify the authenticated caller is permitted to access that specific object — not just that they are authenticated.
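The allowlisting approach from the list above can be sketched in a few lines of Python. This is an illustrative sketch, not a drop-in fix: `ALLOWED_COLUMNS` and `build_insert_safe` are hypothetical names, and the table and columns come from the simplified example earlier in this section. Key names are validated against a fixed set before any SQL text is built, and values remain bound parameters.

```python
ALLOWED_COLUMNS = frozenset({"user_id", "query_text"})  # assumed schema for the sketch


def build_insert_safe(payload: dict) -> tuple:
    """Build an INSERT only from allowlisted column names; values stay parameterized."""
    if not payload:
        raise ValueError("empty payload")
    unknown = set(payload) - ALLOWED_COLUMNS
    if unknown:
        # Reject before any SQL construction begins.
        raise ValueError(f"rejected key names: {sorted(unknown)}")
    columns = sorted(payload)  # deterministic column order
    column_sql = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO search_logs ({column_sql}) VALUES ({placeholders})"
    return sql, [payload[name] for name in columns]


safe_sql, safe_params = build_insert_safe(
    {"user_id": 123, "query_text": "market analysis"}
)
print(safe_sql, safe_params)

try:
    build_insert_safe({"user_id": 123, "'; SELECT * FROM chat_messages; --": "value"})
except ValueError as exc:
    print("blocked:", exc)
```

Because only names from the fixed set can ever appear in the statement, the structural positions of the query are no longer influenced by request data, regardless of what the WAF or scanner does or does not detect.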
Sources: Imperva — Abusing JSON-Based SQL, Claroty Team82, OWASP SQL Injection Prevention Cheat Sheet, Promptfoo
3. Impact Analysis — What Was Exposed and What It Means
Scale of Exposure
| Data Class | Quantity | Significance |
|---|---|---|
| Chat messages | 46.5 million | Plaintext — M&A discussions, client strategy, financial analysis |
| Files | 728,000 | 192K PDFs, 93K spreadsheets, 93K presentations, 58K Word docs |
| User accounts | 57,000 | PII, search histories, session context |
| AI assistants / workspaces | 384,000 / 94,000 | Full internal AI deployment scope exposed |
| System prompt configs | 95 (across 12 model types) | Behavioral control plane — writable |
| RAG document chunks | 3.68 million | Entire proprietary knowledge base with S3 paths and file metadata |
| External API file flows | 1.1 million files | Including 266,000+ OpenAI vector stores |
Why Chat Message Exposure Is Categorically Worse for a Consulting Firm
The conversations McKinsey employees held with Lilli about client matters occupy legal territory analogous to attorney-client privilege: confidential strategy, M&A deal analyses, financial model discussions. Unlike a consumer data breach, these messages represent the intellectual core of client relationships. The 46.5 million messages exposed in plaintext represent years of strategic work across hundreds of client engagements. Clients whose matters were discussed in those messages had no direct notification pathway, which is the third-party data risk dimension of the exposure.
Under GDPR Article 33, the exposure of 57,000 user accounts' personal data triggers mandatory notification to the supervisory authority within 72 hours of the controller becoming aware of the breach — unless the breach is unlikely to result in a risk to the rights and freedoms of natural persons (GDPR Art. 33, gdpr-info.eu). McKinsey's handling of this obligation was not publicly detailed in any disclosure filing.
The Write-Access Vector: Prompt Poisoning
The most operationally dangerous finding was not the data exposure — it was write access to 95 system prompt configurations stored in the same database.
System prompts define how Lilli behaves for its 43,000 users:
- What topics Lilli will and will not engage with
- How Lilli frames risk, opportunity, and competitive analysis
- What sources Lilli cites and how it qualifies them
- What guardrails Lilli applies to sensitive topics
- What downstream tools and APIs Lilli invokes
An attacker with the same SQL injection access could execute a single UPDATE statement via an HTTP call and silently modify any of these 95 configurations. At 500,000 prompts per month, a compromised system prompt would propagate subtly altered advice to the entire 43,000-person user base — with no code change, no deployment event, and no alert in any conventional monitoring system.
This is categorically different from data theft. Data theft extracts what has happened. Prompt poisoning shapes what will happen, invisibly and persistently.
Sources: CodeWall, NeuralTrust, BankInfoSecurity
4. AI-Specific Vulnerabilities Exposed
OWASP LLM Top 10 Mapping
This breach touches multiple OWASP LLM Top 10 (2025) categories simultaneously:
| OWASP LLM Category | How It Applies to Lilli |
|---|---|
| LLM01: Prompt Injection | Write access to system prompts via SQL injection is an indirect prompt injection — the attacker modifies the data the LLM reads at inference time, not the prompt submitted by the user |
| LLM02: Sensitive Information Disclosure | 46.5M chat messages, 728K files, and 57K user accounts exposed without authorization |
| LLM04: Data and Model Poisoning | 3.68M RAG chunks writable via the same injection vector — potential for poisoned embeddings to corrupt AI retrieval |
| LLM06: Excessive Agency | Lilli's access to external APIs and 266K+ OpenAI vector stores means a compromised instance could have been directed to exfiltrate via those tool connections |
| LLM07: System Prompt Leakage | 95 system prompt configurations exposed — intellectual property and behavioral control surface |
| LLM08: Vector and Embedding Weaknesses | 3.68M RAG document chunks exposed with S3 paths; same write access could inject poisoned embeddings |
Source: OWASP Top 10 for LLM Applications 2025
Indirect Prompt Injection via Database Write Access
Direct prompt injection requires an attacker to interact with the AI through a user-facing prompt. Indirect prompt injection is more dangerous: the attacker modifies data the AI will ingest — RAG document chunks, system prompt records, cached context — without ever sending a prompt to the model. The AI then executes the attacker's instructions as though they were legitimate system configuration.
The Lilli breach is a textbook indirect prompt injection path: SQL injection → database write access → system prompt modification → compromised AI output. The attacker never interacted with the LLM directly.
Source: Indirect Prompt Injection: The Hidden Threat — Lakera
RAG Layer as the Weakest Link
The 3.68 million RAG document chunks represent McKinsey's accumulated proprietary knowledge base — decades of research methodologies and client work distilled into the AI's retrieval system. This is simultaneously the platform's core value proposition and its most sensitive asset.
Snyk Labs' RAGPoison research (RAGPoison: Persistent Prompt Injection via Poisoned Vector Databases, August 18, 2025) demonstrated that with write access to a vector database, an attacker can inject poisoned embeddings positioned to intercept retrieval results. In Snyk's demonstration, 274,944 poisoned data points were inserted into the vector database in approximately 80 seconds using a Python script. The technique works by surrounding each legitimate document's embedding with injected points at calculated offsets (4 points per axis per document at ±0.0001 and ±0.00001 units), ensuring the poisoned content is returned as more similar to the query vector than the legitimate document. The attack successfully caused the LLM to return injected content instead of legitimate answers — demonstrating persistent prompt injection achieved entirely through the data layer, with no interaction with the LLM itself.
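The geometric intuition behind the offset technique can be illustrated with a toy example. The sketch below does not reproduce Snyk's code: it assumes Euclidean distance as the similarity measure, uses a made-up three-dimensional "embedding", and the function name `surrounding_points` is hypothetical. It only shows why a point placed a tiny step from a legitimate embedding, on the side facing the query, ranks ahead of the legitimate document.

```python
import math

# Per-axis offsets quoted in the text: four points per axis per document.
OFFSETS = (1e-4, -1e-4, 1e-5, -1e-5)


def surrounding_points(doc_vec):
    """Return the points that surround doc_vec at the given offsets on each axis."""
    points = []
    for axis in range(len(doc_vec)):
        for delta in OFFSETS:
            shifted = list(doc_vec)
            shifted[axis] += delta
            points.append(tuple(shifted))
    return points


legit = (0.20, 0.50, 0.10)   # toy embedding of a legitimate chunk
query = (0.25, 0.40, 0.30)   # toy embedding of a user query

injected = surrounding_points(legit)
legit_distance = math.dist(query, legit)
closest_injected = min(math.dist(query, p) for p in injected)

print(len(injected), legit_distance, closest_injected)
```

For defenders, the practical implication under these assumptions is that clusters of near-duplicate vectors packed tightly around existing embeddings are an anomaly worth monitoring for in the vector store.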
Write access to the RAG knowledge base — available via the same SQL injection in Lilli — enables this attack at scale. The AI continues to appear functional; it simply returns subtly wrong answers for targeted questions, with no alert and no audit trail.
Additionally, text embeddings — often assumed to be opaque numerical representations — can be reverse-engineered to reconstruct the original source text. Research published at ACL 2024 (Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries) demonstrated that adversaries can infer sensitive information from text embeddings without direct access to the embedding model, using a surrogate transfer attack. Separately, Morris et al. (2023) showed that an adversary can recover 92% of a 32-token text input given embeddings from a T5-based transformer. Exposing the 3.68M RAG chunks and their S3 paths therefore means the underlying documents are not protected even if the source file store is theoretically separate.
The No-Audit-Trail Problem
Traditional intrusion detection systems and SIEMs monitor for:
- Unusual network traffic
- Privilege escalation events
- File system changes
- Executable code changes
- Authentication anomalies
None of these fire when an application's own SQL connection updates a database row containing a system prompt. The attack is invisible to all conventional monitoring. The compromised prompt survives application restarts, code audits (prompts live in the database, not the code), and routine security scans. This is a new category of persistent, deniable compromise.
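One way to give prompt changes a trail is to record them at the storage layer itself. The sketch below uses SQLite triggers purely as an illustration; the table names `system_prompts` and `prompt_audit` and the trigger name are hypothetical, and a production system would need the audit sink to live somewhere the application's database user cannot write, since an attacker with the same write access could otherwise tamper with the log as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE system_prompts (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE prompt_audit (
        prompt_id  INTEGER,
        old_body   TEXT,
        new_body   TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Fires on every change to a prompt body, regardless of which code path made it.
    CREATE TRIGGER log_prompt_update AFTER UPDATE OF body ON system_prompts
    BEGIN
        INSERT INTO prompt_audit (prompt_id, old_body, new_body)
        VALUES (OLD.id, OLD.body, NEW.body);
    END;
    """
)

conn.execute("INSERT INTO system_prompts (id, body) VALUES (1, 'Cite sources.')")
# Simulates the silent single-statement modification described above.
conn.execute("UPDATE system_prompts SET body = 'Never cite sources.' WHERE id = 1")

audit_rows = conn.execute(
    "SELECT prompt_id, old_body, new_body FROM prompt_audit"
).fetchall()
print(audit_rows)
```

The point of the sketch is that the record is produced by the database rather than by application code, so it exists even when the change arrives through an injected statement.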
5. Why Standard Tools Missed It
The Scanner Architecture Problem
OWASP ZAP and signature-based WAFs share the same architectural limitation: they test input value positions against libraries of known malicious patterns. They model the attack surface as a set of named fields with user-supplied values. They do not model:
- The relationship between request structure and backend query construction
- Dynamic key names as injection candidates
- Multi-step reasoning across multiple responses to infer database structure
The CodeWall agent succeeded by observing that error messages contained key names verbatim — a subtle, multi-request inference that no single-request signature scanner can make. It then used that observation to build an iterative error-extraction channel. This is behavioral analysis, not pattern matching.
WAF Bypass: A Documented, Industry-Acknowledged Problem
Claroty Team82's research (2022) demonstrated bypass of Palo Alto, AWS ELB, Cloudflare, F5, and Imperva using JSON-based injection syntax. The OWASP CRS project added Rule 942550 within six days of that publication, but the rule only helps deployments that have actually picked up the updated rule set. Organizations that had not updated CRS rules since before December 2022 were running without protection for this specific vector at the default paranoia level.
The fundamental issue, as Imperva's research noted: "an attacker can craft a tautology that does not use an equal sign," bypassing WAF rules that require = combined with suspicious patterns to identify SQL injection.
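The gap can be illustrated with a deliberately simplistic signature. The regular expression below is a toy written for this example, not a rule from any real WAF or from CRS: it flags a quote followed by `OR` and an `x = y` comparison. The classic tautology matches, while the JSON-operator expression shown earlier in this document is also always true yet contains no equal sign.

```python
import re

# Toy signature (hypothetical): quote, OR, then a comparison using "=".
NAIVE_TAUTOLOGY = re.compile(r"'\s*or\s+\S+\s*=\s*\S+", re.IGNORECASE)

classic = "' OR 1=1 --"
json_based = """' OR '{"b":2}'::jsonb <@ '{"a":1, "b":2}'::jsonb --"""

classic_flagged = NAIVE_TAUTOLOGY.search(classic) is not None
json_flagged = NAIVE_TAUTOLOGY.search(json_based) is not None

print("classic flagged:", classic_flagged)
print("json-based flagged:", json_flagged)
```

Real WAF rule sets are far more elaborate than this, but the failure mode is the same in kind: a rule keyed to familiar operators does not fire on syntax the parser was never taught to treat as SQL.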
The False Assurance of "Scanner Passed"
The most dangerous outcome of scanner-based security assurance is not the vulnerabilities that scanners miss; it is the organizational confidence that "scanner passed" generates. Teams see clean results and treat that as clearance. This mental model breaks for AI platforms because:
- AI-era API designs introduce structural injection surfaces that scanners were not built to test
- The AI behavioral layer (prompts, RAG configs) is entirely outside traditional scanner scope
- Complex API surfaces (200+ endpoints in Lilli's case) exceed what ad hoc manual review can cover without a systematic approach
Source: Picus Security — WAF Bypass Using JSON-Based SQL Injection Attacks
6. The Autonomous Agent Advantage
What Checklist-Based Testing Misses by Design
Conventional penetration testing is bounded: a defined scope, a checklist of known attack classes, and a fixed testing window (often 1–2 weeks per year). It is excellent at finding known vulnerability classes in known locations. It is structurally unable to discover:
- Novel injection surfaces not in the tester's checklist
- Multi-step attack chains that link five individually innocuous observations into a critical finding
- Vulnerabilities that require adaptive reasoning across many responses
The Lilli vulnerability required 15 iterative probes, each using the previous error response to refine the next attempt. A checklist tester would have logged "JSON injection test: no response match" and moved on. The autonomous agent recognized partial SQL error content in key reflection and built a channel from that signal.
Empirical Evidence: AI Agents vs. Human Testers
A December 2025 arXiv paper (Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing, arXiv 2512.09882) tested the ARTEMIS AI agent framework against human penetration testers in real-world exercises:
- ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate
- ARTEMIS outperformed 9 of 10 human participants
- The top human performer found 13 vulnerabilities; ARTEMIS configurations reached 11
- Cost differential: $18.21/hour for ARTEMIS vs. $125,034/year for the average U.S. penetration tester (roughly $60/hour, assuming a 2,080-hour work year)
- Key AI advantage: parallel sub-agent deployment — ARTEMIS "launches a sub-agent to probe that target in the background, sometimes resulting in multiple sub-agents for multiple targets," enabling concurrent exploitation that individual humans cannot match
The paper's conclusion: AI agents are not replacements for human testers (GUI-based interactions, business logic, social engineering remain human-dominated tasks) but are genuine operational complements offering cost-effective continuous coverage.
Multi-Agent Orchestration: The AWS Security Agent Model
The multi-agent architecture used by CodeWall reflects a broader shift in how automated security testing is being designed. AWS launched its own Security Agent (Inside AWS Security Agent: A multi-agent architecture for automated penetration testing) in preview at AWS re:Invent 2025, providing a concrete commercial illustration of the approach. The system coordinates specialized sub-agents: one performs broad reconnaissance using static predefined tasks to map the application surface; a guided exploration agent then dynamically generates focused test tasks tailored to the specific application, reasoning about discovered endpoints, business logic patterns, and potential vulnerability chains; and an intelligent sign-in component maintains authenticated sessions across testing phases. Critically, the system "dynamically generates focused test tasks" rather than running from a fixed list — adapting based on application responses in the same way the CodeWall agent adapted based on error messages. AWS's public implementation of this architecture is significant: it confirms that major cloud providers now treat adaptive, multi-agent autonomous security testing as a mainstream engineering capability, not an experimental research method. The same reasoning loop that makes it effective for defense makes it equally effective when used offensively.
Source: AWS Security Agent Blog — AWS re:Invent 2025
The Attacker Parity Problem
CodeWall CEO Paul Price stated directly: "Hackers will be using the same technology and strategies" for ransomware and data extortion. This is not speculative. RunSybil — founded by OpenAI's first security hire — raised $40M in March 2026 for continuous autonomous penetration testing. Penetrify, an autonomous AI red team, maps attack surfaces, chains vulnerabilities, and provides production-ready code fixes autonomously. The autonomous offensive capability demonstrated in the Lilli breach is commercially available, at low cost, to any attacker with motivation.
The defender calculus: if attackers can run autonomous agents continuously against your production systems at $18/hour, annual penetration tests and quarterly scanner runs are not an adequate response model.
Sources: arXiv 2512.09882, RunSybil $40M raise — Fortune, 2026 Guide to AI Penetration Testing — Penligent
7. Blast Radius: Why AI Platforms Are Different
Conventional Database Breach vs. AI Platform Breach
In a conventional enterprise database breach via SQL injection — say, a CRM with 50,000 customer records — the blast radius is:
- Records exposed: bounded, enumerable, notifiable
- Behavior affected: none
- Recovery: patch the query, rotate credentials, notify affected users
- Persistence: none after remediation
The Lilli breach blast radius is qualitatively different (CodeWall, NeuralTrust):
| Dimension | Conventional DB Breach | AI Platform Breach (Lilli) |
|---|---|---|
| Data exposed | Records in targeted table | All data in all tables the DB user can reach |
| Behavior affected | None | AI behavioral layer fully writable |
| User impact | Specific affected records | All 43,000 users via single prompt modification |
| Knowledge base | Not applicable | 3.68M RAG chunks exposable and poisonable |
| Persistence after remediation | Ends when patched | Prompt modification survives patch if not detected |
| Detection | Access logs, data monitoring | No existing monitoring category catches prompt modification |
| Supply chain | Direct users | Every client whose work informed those 46.5M messages |
The Behavioral Multiplier
The volume of 500,000 prompts per month is the multiplier that transforms prompt write access from a targeted attack into a platform-wide influence operation. Modifying one system prompt does not affect one user; it affects every query that prompt handles, continuously, until the modification is detected and reversed. At 500,000 prompts per month (roughly 16,700 per day on a 30-day month), even a subtle framing bias in financial analysis prompts compounds into thousands of subtly influenced decisions per day.
This is categorically not a data breach. Data breaches are discrete events. Prompt poisoning is a continuous-time compromise of an organization's AI-mediated reasoning. As the Sombrainc LLM Security 2026 analysis notes, a single poisoning event in a shared AI system has a multiplied reach that no conventional database exploit can match.
The IP Aggregation Problem
The 3.68 million RAG document chunks represent McKinsey's accumulated proprietary knowledge — research methodologies, client engagement patterns, analytical frameworks built over decades. A conventional filing system breach might expose documents in a folder. A RAG knowledge base breach exposes the distilled, semantically searchable essence of everything the organization knows. Read access to the RAG layer is equivalent to read access to a structured, queryable representation of the firm's intellectual capital — without needing to read any individual document.
As the Development Corporate M&A analysis of the breach observes, this IP aggregation dimension means that RAG-based AI systems are now acquisition intelligence targets — a single breach can yield not just current data but a firm's accumulated knowledge architecture.
Source: Outpost24 — How an AI Agent Hacked McKinsey AI Platform, NeuralTrust, Development Corporate
OWASP LLM06: Excessive Agency
The integration of Lilli with 266,000+ external OpenAI vector stores and 1.1 million files flowing through external APIs represents OWASP LLM06: Excessive Agency risk. A compromised system prompt could direct those tool connections to exfiltrate data, modify external stores, or invoke APIs with the platform's full permission scope — well beyond what the original breach access alone could achieve. Overpermissioned AI systems amplify the impact of any underlying infrastructure breach.
8. Unauthenticated Endpoints: The Entry Point
How 22 Unauthenticated Endpoints Reached Production
The 22 unauthenticated endpoints among Lilli's 200+ are almost certainly a development-to-production hygiene failure: endpoints created during development where authentication was deferred, marked as internal, or used by tooling that was never meant to be public-facing. In a production platform serving 43,000 employees, these represent oversight rather than deliberate design.
This pattern is extremely common in enterprise platforms that grew quickly. Features ship faster than security reviews. Development endpoints that exist for convenience or internal tooling get included in deployments. Public API documentation — intended for internal developer reference — becomes an external reconnaissance resource.
McKinsey's remediation after disclosure was to immediately remove public API documentation and patch all unauthenticated endpoints. The remediation itself confirms the pre-breach state: public docs and open endpoints had not been identified and addressed as a security concern before disclosure.
The API Documentation Gift to Attackers
Publicly exposed OpenAPI/Swagger documentation is a reconnaissance gift. It provides:
- Exact endpoint paths
- Authentication requirements (or lack thereof)
- Accepted parameters and data types
- Expected response schemas
- Error message formats
For an autonomous agent, this is a structured attack surface map. The CodeWall agent did not need to brute-force endpoint discovery because the documentation enumerated the surface.
Zero-Trust API Design Principles
NIST Special Publication 800-207 (Zero Trust Architecture) establishes the principle that all access requests — regardless of network location — must be authenticated and authorized before resource access is granted (NIST SP 800-207). Applied to API design, this means authentication is the default state; unauthenticated access requires explicit, reviewed exception. Developer endpoints are never shipped to production environments.
The Lilli breach represents a failure of this principle applied at the API design level. Authentication was not enforced by default — it was configured per endpoint, and 22 endpoints were missed.
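The difference between per-endpoint configuration and authentication by default can be sketched as a route registry in which a public route is only possible with a recorded exception. Everything here is hypothetical and framework-agnostic: `RouteRegistry`, its methods, the paths, and the ticket identifier are illustrative names, not part of any real framework or of Lilli.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RouteRegistry:
    """Routes require authentication unless a reviewed exception is recorded."""
    routes: dict = field(default_factory=dict)

    def register(self, path: str, *, public_exception_ticket: Optional[str] = None) -> None:
        # No ticket means the route is authenticated; there is no "forgot to set it" state.
        self.routes[path] = {
            "requires_auth": public_exception_ticket is None,
            "exception_ticket": public_exception_ticket,
        }

    def unauthenticated(self) -> list:
        """List every public route so it can be reviewed in CI or before release."""
        return sorted(p for p, r in self.routes.items() if not r["requires_auth"])


registry = RouteRegistry()
registry.register("/api/search-log")                                    # authenticated by default
registry.register("/api/chat")                                          # authenticated by default
registry.register("/healthz", public_exception_ticket="SEC-REVIEW-1")  # explicit, reviewed exception

print(registry.unauthenticated())
```

With this inversion, a missed endpoint fails closed, and the complete list of unauthenticated routes is a short, reviewable artifact instead of something discovered by an outside party.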
9. Responsible Disclosure and Industry Response
The Disclosure Process
CodeWall followed a responsible disclosure model bounded by McKinsey's HackerOne policy. The agent verified the policy existed before proceeding — treating it as a legal guardrail for the engagement. Disclosure to McKinsey occurred on March 1, 2026, one day after the breach. McKinsey acted rapidly: all unauthenticated endpoints patched, development environment taken offline, and public API documentation removed by March 2 — within 24 hours of notification.
Public disclosure followed on March 9, 2026. This eight-day window from disclosure to publication is consistent with responsible disclosure norms (which typically allow 7–14 days for critical actively-exploitable vulnerabilities with patches in place).
McKinsey's Public Posture
McKinsey has not issued a public statement beyond confirming through a spokesperson that a third-party forensics investigation found no evidence of unauthorized data access by any party other than the CodeWall research engagement. There has been no public acknowledgment of the prompt write-access risk, no publicly reported client notification, and no publicly visible regulatory disclosure filing. The absence of public disclosure of the broader vulnerability risk, given the platform scale and the nature of client data involved, has been noted critically by multiple security analysts.
Industry Framing
Coverage across The Register, BankInfoSecurity, FStech, Inc., and The Decoder consistently applied the same framing: a decades-old vulnerability class (SQL injection) amplified to an unprecedented scale and qualitative impact by its context (enterprise AI platform). The story landed not as "yet another SQL injection" but as "the first major demonstration of AI platform-specific blast radius in a real production breach."
10. Enterprise AI Security Governance Recommendations
Security Architecture for AI Platforms
Based on the Lilli breach findings and corroborating analysis from Traefik, Swept AI, NeuralTrust, and Outpost24, the following architecture controls should be standard for any enterprise AI platform:
Layer 1 — API Gateway (catch at the perimeter)
- Authentication enforced on every endpoint, no exceptions
- WAF with current CRS rules including Rule 942550 (JSON SQL injection)
- SQL injection pattern detection as a blocking rule
- API documentation never exposed publicly
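The "authentication on every endpoint" control is easiest to keep when the framework denies by default and a route must opt out explicitly, rather than opt in. The following is a minimal sketch of that idea, not a description of Lilli's gateway: the `Router` class, the `public` flag, and the `is_valid_token` check with its hard-coded token set are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[dict], dict]

# Placeholder credential check (assumption): a real gateway would validate
# a signed token or session against an identity provider.
VALID_TOKENS = {"demo-token"}


def is_valid_token(token: Optional[str]) -> bool:
    return token is not None and token in VALID_TOKENS


class Router:
    """Toy router: every route requires authentication unless marked public."""

    def __init__(self) -> None:
        self._routes: Dict[str, Tuple[Handler, bool]] = {}

    def route(self, path: str, *, public: bool = False) -> Callable[[Handler], Handler]:
        # `public` defaults to False, so forgetting the flag fails closed.
        def register(handler: Handler) -> Handler:
            self._routes[path] = (handler, public)
            return handler
        return register

    def dispatch(self, path: str, request: dict,
                 token: Optional[str] = None) -> Tuple[int, dict]:
        if path not in self._routes:
            return 404, {"error": "not found"}
        handler, public = self._routes[path]
        if not public and not is_valid_token(token):
            return 401, {"error": "authentication required"}
        return 200, handler(request)


router = Router()


@router.route("/search")  # no flag given, so authentication is required
def search(request: dict) -> dict:
    return {"results": []}


@router.route("/health", public=True)  # explicit, reviewable opt-out
def health(request: dict) -> dict:
    return {"status": "ok"}
```

With this shape, a missed configuration step produces a 401 instead of an open endpoint, and the list of public routes can be audited by searching for `public=True`.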
Layer 2 — Application Layer (validate before database contact)
- Allowlist validation for all JSON key names before SQL construction
- ORM/query builder that never accepts field names from user input
- Rate limiting per authenticated identity
- Input validation schema enforcement
- Object-level authorization check on every record-returning operation
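Two of the Layer 2 controls, key allowlisting and object-level authorization, can be sketched together. The example below assumes a hypothetical `documents` table and a hypothetical `ALLOWED_FILTER_COLUMNS` mapping; it uses SQLite only so that it runs anywhere. The point is that JSON keys never reach the SQL string directly: each key is looked up in a fixed mapping, values stay parameterized, and every query is scoped to the authenticated caller.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical allowlist: client-facing JSON keys mapped to real column names.
ALLOWED_FILTER_COLUMNS = {"title": "title", "practice": "practice", "year": "year"}


def build_search_query(filters: dict, user_id: int) -> Tuple[str, List]:
    """Return (sql, params); reject any JSON key that is not allowlisted."""
    clauses = ["owner_id = ?"]  # object-level scoping to the caller
    params: List = [user_id]
    for key, value in filters.items():
        column = ALLOWED_FILTER_COLUMNS.get(key)
        if column is None:
            raise ValueError(f"unsupported filter field: {key!r}")
        # Only the allowlisted column name is interpolated, never the raw key.
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT id, title FROM documents WHERE " + " AND ".join(clauses)
    return sql, params


# Small in-memory demonstration with two users owning same-titled documents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, owner_id INTEGER, "
    "title TEXT, practice TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO documents (owner_id, title, practice, year) VALUES (?, ?, ?, ?)",
    [(1, "Q3 review", "energy", 2025), (2, "Q3 review", "retail", 2025)],
)
sql, params = build_search_query({"title": "Q3 review"}, user_id=1)
rows = conn.execute(sql, params).fetchall()  # only the row owned by user 1
```

A key such as `"title = '' OR 1=1 --"` raises `ValueError` before any SQL is built, which is the behavior the key-concatenation flaw lacked.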
Layer 3 — Database Layer (limit blast radius)
- Read-only credentials for all AI inference query paths
- Separate credentials and schemas for: user conversation data, RAG knowledge base, system prompt configurations
- Row-level security or table-level access policies
- Audit logging at the storage layer, separate from application logs
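The effect of read-only credentials on the inference path can be illustrated with a small stand-in. The sketch below uses SQLite's read-only open mode in place of a database role with only read privileges, which is an assumption made purely so the example is self-contained; the `system_prompts` table and file name are hypothetical. It shows that even if an attacker reaches the query layer, a write to prompt records fails at the storage layer.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

db_path = Path(tempfile.mkdtemp()) / "platform.db"

# Privileged read-write connection, standing in for the deployment pipeline.
admin = sqlite3.connect(os.fspath(db_path))
admin.execute("CREATE TABLE system_prompts (id INTEGER PRIMARY KEY, body TEXT)")
admin.execute("INSERT INTO system_prompts (body) VALUES ('You are a helpful assistant.')")
admin.commit()
admin.close()

# The inference path opens the same database in read-only mode.
inference = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)


def try_overwrite_prompt(conn: sqlite3.Connection) -> bool:
    """Attempt the write an injected statement would make; report success."""
    try:
        conn.execute("UPDATE system_prompts SET body = 'tampered' WHERE id = 1")
        conn.commit()
        return True
    except sqlite3.OperationalError:
        # SQLite refuses writes on a connection opened with mode=ro.
        return False


write_succeeded = try_overwrite_prompt(inference)
current_body = inference.execute(
    "SELECT body FROM system_prompts WHERE id = 1"
).fetchone()[0]
```

In a production database the equivalent would be a role with only read privileges on the tables the inference path needs, with prompt tables owned by a separate role.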
Layer 4 — Prompt and RAG Layer (protect the behavioral control plane)
- System prompt storage isolated from operational data in a separate, write-restricted store
- Version-controlled prompts with change approval workflow (treat like source code)
- Cryptographic hash of production prompts stored in an append-only audit log; verify at load time
- Write access to prompt configurations restricted to the deployment pipeline, not the application database user
- Continuous hash-based integrity monitoring; alert on deviation from known-good baseline
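The "verify at load time" control can be sketched as a gate in front of prompt loading. The example assumes prompts live in a simple mapping and that approved SHA-256 digests are held in a separate, write-restricted record; `load_verified_prompt`, `PromptIntegrityError`, and both dictionaries are hypothetical names, and a real deployment would back them with the isolated store and append-only log described above.

```python
import hashlib
import hmac
from typing import Dict


class PromptIntegrityError(Exception):
    """Raised when a stored prompt does not match its approved digest."""


def prompt_digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_verified_prompt(prompt_id: str, store: Dict[str, str],
                         approved_digests: Dict[str, str]) -> str:
    """Return the prompt only if its hash matches the approved record."""
    text = store[prompt_id]
    expected = approved_digests.get(prompt_id)
    # Fail closed: an unknown prompt id is treated like a mismatch.
    if expected is None or not hmac.compare_digest(prompt_digest(text), expected):
        raise PromptIntegrityError(f"prompt {prompt_id!r} failed integrity check")
    return text


# Illustrative data: one prompt, approved at deployment time.
store = {"research-assistant": "Answer using only cited internal documents."}
approved_digests = {"research-assistant": prompt_digest(store["research-assistant"])}

prompt = load_verified_prompt("research-assistant", store, approved_digests)
```

Because the approved digests are kept outside the application database, an attacker with write access to the prompt table alone cannot make a modified prompt pass the check.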
Prompt Integrity Monitoring: The Gap No One Has Filled
The most underinvested capability in enterprise AI security today is prompt integrity monitoring. Organizations have endpoint detection and response (EDR) for devices, SIEM for logs, and DLP for data. There is no widely deployed equivalent for detecting that an AI's behavioral instructions have been silently modified. Building this capability requires:
- Baseline snapshots of all system prompt configurations
- Continuous comparison against those baselines
- Automated alert on any deviation not correlated with an approved deployment event
- Incident response playbook specifically for "suspected prompt modification" scenarios
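The first three requirements can be sketched as a periodic scan that compares current prompts with a baseline and suppresses only those changes that match an approved deployment record. Everything here is illustrative: the `scan` function, the `Alert` type, and the representation of approved deployments as `(prompt_id, digest)` pairs are assumptions, not an existing tool's interface.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Alert:
    prompt_id: str
    reason: str


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def scan(baseline: Dict[str, str], current: Dict[str, str],
         approved_changes: Set[Tuple[str, str]]) -> List[Alert]:
    """Compare current prompt texts against baseline digests.

    baseline maps prompt id -> known-good digest; current maps prompt id ->
    live prompt text; approved_changes holds (prompt id, new digest) pairs
    recorded by the deployment pipeline.
    """
    alerts: List[Alert] = []
    for prompt_id in sorted(baseline.keys() | current.keys()):
        if prompt_id not in current:
            alerts.append(Alert(prompt_id, "missing"))
            continue
        live = digest(current[prompt_id])
        if baseline.get(prompt_id) == live:
            continue  # unchanged since the snapshot
        if (prompt_id, live) in approved_changes:
            continue  # change correlates with an approved deployment
        reason = "modified" if prompt_id in baseline else "unexpected new prompt"
        alerts.append(Alert(prompt_id, reason))
    return alerts


baseline = {"a": digest("Prompt A"), "b": digest("Prompt B")}
current = {"a": "Prompt A", "b": "Prompt B, but tampered", "c": "Rogue prompt"}
alerts = scan(baseline, current, approved_changes=set())
```

Run on a schedule, a scan like this turns a silent prompt modification into an alert that can trigger the incident response playbook.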
AI Asset Inventory: The Governance Prerequisite
Swept AI's analysis of the Lilli breach found a pattern across enterprise AI deployments: organizations cannot answer basic governance questions about their own AI systems — which assistants exist, what data each accesses, who created them, what prompts govern them, when they were last modified. Without this inventory, security reviews are necessarily incomplete and regulatory compliance is unauditable.
The Lilli platform's 384,000 AI assistants and 94,000 workspaces illustrate the scale problem. At that volume, manual inventory is impossible. AI asset management tooling — analogous to software asset management (SAM) for traditional applications — is now a prerequisite for AI security governance.
M&A Due Diligence: A New Security Domain
The Development Corporate analysis identified AI platform security as an emerging M&A due diligence domain. For acquirers evaluating companies with enterprise AI platforms, the Lilli breach surfaces a checklist that did not previously exist:
- Are all API endpoints authenticated by default?
- Is API documentation publicly accessible?
- How are system prompt configurations stored, versioned, and access-controlled?
- Is the database credential used for AI inference read-only?
- When was the last penetration test, and was AI-specific attack surface included in scope?
- Are RAG vector stores access-controlled at retrieval time?
- Does the AI incident response plan cover prompt modification scenarios?
AI security maturity now affects valuation, both as a discount for unaddressed risk and as a premium for organizations that have proactively built these controls.
OWASP LLM Top 10 as Audit Framework
The most actionable governance artifact for teams that need to assess their current AI security posture is the OWASP LLM Top 10 (2025). Mapping the Lilli breach to this framework produces a concrete audit checklist:
| OWASP Category | Lilli Breach Evidence | Your Audit Question |
|---|---|---|
| LLM01: Prompt Injection | Write access to system prompts via SQL injection | Can any API path reach system prompt records in write mode? |
| LLM02: Sensitive Information Disclosure | 46.5M messages, 728K files exposed | Are AI query credentials read-only and scoped to minimum data? |
| LLM04: Data and Model Poisoning | RAG write access via same injection vector | Who can write to the RAG knowledge base, via what path? |
| LLM06: Excessive Agency | 266K+ external API connections exploitable from compromised prompt | Are AI tool permissions scoped to minimum needed? |
| LLM07: System Prompt Leakage | 95 system prompts exposed and readable | Are system prompts access-controlled and not in the operational DB? |
| LLM08: Vector and Embedding Weaknesses | 3.68M RAG chunks exposed with S3 paths | Is the vector database access-controlled at retrieval time? |
Source: OWASP LLM Top 10 2025
11. Source Index
Primary Source
Dimension 1: Incident Coverage
- AI agent hacked McKinsey chatbot for read-write access — The Register
- Autonomous Agent Hacked McKinsey's AI in 2 Hours — BankInfoSecurity
- An AI agent hacked McKinsey's internal AI platform in two hours — The Decoder
- How an AI Agent Hacked McKinsey and Exposed 46 Million Messages — NeuralTrust
- Red-teamers unleash AI agent on McKinsey's chatbot — CyberNews
- McKinsey AI Platform Lilli Hacked — ResultSense
- An AI Agent Broke Into McKinsey's Internal Chatbot — Inc.
- McKinsey working to fix flaws in AI system after hack — FStech
- How an AI Agent Hacked McKinsey AI Platform — Outpost24
- McKinsey's Lilli Looks More Like an API Security Failure Than a Model Jailbreak — Promptfoo
Dimension 2: JSON SQL Injection and IDOR Technical Sources
- Abusing JSON-Based SQL — Imperva
- JS-ON: Security-OFF — Abusing JSON-Based SQL to Bypass WAF — Claroty Team82
- JSON-based SQL attacks bypassed WAFs — Contrast Security
- CVE-2024-42005: Django JSONField SQL Injection — Vulert
- A new rule to prevent SQL in JSON — OWASP CRS Project
- WAF Bypass Using JSON-Based SQL Injection Attacks — Picus Security
- OWASP SQL Injection Prevention Cheat Sheet
- The JSON SQL Injection Vulnerability — Kazuho's Weblog (2014, historical reference)
Dimension 3: Impact and Regulatory Context
- SQL Injection in McKinsey's AI Platform — AI Productivity
- An AI Agent Hacked McKinsey's AI Platform — State of Surveillance
- GDPR Article 33 — Notification of a personal data breach — gdpr-info.eu
Dimension 4: AI-Specific Vulnerabilities
- OWASP Top 10 for LLM Applications 2025
- OWASP LLM Gen AI Security Project
- Indirect Prompt Injection: The Hidden Threat — Lakera
- RAGPoison: Persistent Prompt Injection via Poisoned Vector Databases — Snyk Labs
- Transferable Embedding Inversion Attack — ACL 2024
- LLM Security Risks in 2026: Prompt Injection, RAG, and Shadow AI — Sombrainc
- AI Security in 2026: Prompt Injection, the Lethal Trifecta — Airia
Dimension 5: Traditional Tool Failures
- WAF Bypass Using JSON-Based SQL Injection Attacks — Picus Security
- OWASP CRS — A new rule to prevent SQL in JSON
- ZAP Active Scan Rules — OWASP ZAP Documentation
Dimension 6: Autonomous Security Agents
- Comparing AI Agents to Cybersecurity Professionals — arXiv 2512.09882
- Inside AWS Security Agent: A multi-agent architecture for automated penetration testing — AWS Security Blog
- The 2026 Ultimate Guide to AI Penetration Testing — Penligent
- RunSybil raises $40M for autonomous penetration testing — Fortune
- Top 6 Continuous Pentesting Tools in 2026 — Aikido
Dimension 7: Blast Radius
- How an AI Agent Hacked McKinsey AI Platform — Outpost24
- McKinsey AI Agent Lilli Hacked — The Stack Technology
- LLM Security Risks in 2026 — Sombrainc
- OWASP LLM Top 10 for LLM Applications — LLM06 Excessive Agency
- Enterprise AI Security Due Diligence — Development Corporate
Dimension 8: Unauthenticated Endpoints
- The Real Security Lesson from the McKinsey Breach — Traefik
- OWASP Injection Prevention Cheat Sheet
- NIST Special Publication 800-207: Zero Trust Architecture
Dimension 9: Responsible Disclosure
- McKinsey's AI chatbot hack reveals agentic AI security risks — CXOToday
- CodeWall says it hacked McKinsey's AI platform — what holds up and what doesn't — Edward Kiledjian
Dimension 10: Enterprise Governance
- McKinsey AI Platform Breach: Enterprise AI Governance Lessons — Swept AI
- AI Agent Hacks McKinsey: When You Should Not Deploy Agents — Nanonets
- Enterprise AI Security Due Diligence — Development Corporate
- AI & Security Predictions for 2026 — Prompt Security
- AI Cyber Defense 2026: 5 Critical Strategies — Cyber Strategy Institute
This report was produced by the Eightgen Research Division on March 19, 2026. All claims are cited with primary sources.



