Anthropic vs. Alibaba: What the largest known AI distillation attack tells us about model security

On June 10, Anthropic sent a letter to senior members of the US Senate Banking Committee describing what it called the largest known attack of its kind against the company: a coordinated operation using 25,000 fraudulent accounts to run 28.8 million interactions with Claude between April 22 and June 5, 2026.

Anthropic alleges Alibaba ran the operation to create training data for its Qwen model family. The technique is called distillation: a less capable model is trained on the outputs of a more capable one, allowing the smaller model to mimic the stronger model's behavior at a fraction of the development cost.

Alibaba has not responded.

What distillation is and why it matters

Model distillation in the legitimate sense is a well-established technique in machine learning: a student model learns to replicate the behavior of a teacher model by training on the teacher's outputs. It is how you compress a large, expensive model into a smaller, cheaper one without retraining from scratch.
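For readers who want the mechanics, here is a minimal sketch of the classic soft-target distillation loss: the teacher's and student's logits are softened with a temperature, and the student is penalized by the KL divergence between the two distributions. It assumes access to the teacher's raw logits, which is the legitimate setting. A closed API typically returns generated text rather than full logits, so the attack variant described next amounts to ordinary supervised fine-tuning on the teacher's responses. The function names are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float], student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor is the usual convention that keeps gradient magnitudes
    comparable when the temperature changes.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# A student that matches the teacher (up to a constant shift) has zero loss;
# a student that disagrees is penalized.
print(distillation_loss([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]))
print(distillation_loss([3.0, 1.0, 0.0], [0.0, 1.0, 3.0]) > 0)
```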

As an attack, distillation looks different. Instead of distilling from a teacher model you control, you systematically query someone else's closed model, collect its outputs across a carefully designed set of prompts covering the capabilities you want to replicate, and use those outputs as training data. You do not need access to the model weights, only API access and a large volume of queries.

The limiting factor is cost: at standard API pricing, 28.8 million interactions would cost millions of dollars. Anthropic says the fraudulent accounts were created to distribute the cost and evade per-account rate limits. The accounts ran queries designed to probe Claude's strengths in software development, multi-step reasoning, and agentic task execution. Those specific capabilities are, not coincidentally, areas where Qwen has shown notable improvement in recent benchmark comparisons.
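The cost argument is simple arithmetic: interactions times average tokens per interaction times the per-token price. The sketch below is a back-of-envelope estimator. The token counts and per-million-token prices are parameters you must supply, and the values in the example call are hypothetical placeholders, not Anthropic's published pricing or the actual size of the queries in this campaign.

```python
def estimate_cost_usd(interactions: int, input_tokens: float, output_tokens: float,
                      input_usd_per_mtok: float, output_usd_per_mtok: float) -> float:
    """Back-of-envelope API cost.

    input_tokens / output_tokens: assumed average tokens per interaction.
    *_usd_per_mtok: assumed price per million tokens (placeholders, not real pricing).
    """
    # Multiply before dividing so whole-number inputs stay exact in floating point.
    return interactions * (input_tokens * input_usd_per_mtok
                           + output_tokens * output_usd_per_mtok) / 1_000_000


# Hypothetical example: 1,000,000 interactions averaging 1,000 input and
# 500 output tokens, at placeholder prices of $3 and $15 per million tokens.
print(f"${estimate_cost_usd(1_000_000, 1_000, 500, 3.0, 15.0):,.2f}")
# prints: $10,500.00
```

Because the total scales linearly with tokens per interaction, long multi-step coding and agentic transcripts dominate the estimate far more than the raw interaction count.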

What the detection failure tells us

Anthropic's fraud detection did not catch the campaign for approximately six weeks. That is the most operationally significant fact in this case.

Running 25,000 accounts with coordinated, systematically designed queries for six weeks is a large operation. The accounts were presumably spread across IP ranges and created gradually to evade creation-velocity detection, and the query patterns were likely designed to look diverse rather than repetitive. Whatever the exact methods, the operation evaded detection long enough to collect a substantial dataset.
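To make that evasion concrete: a creation-velocity control is often a sliding-window count of new accounts that share some attribute. The sketch below groups signups by IPv4 /24 subnet. The class name, the choice of /24 as the grouping key, and the thresholds are all hypothetical. A farm that registers accounts slowly, or across many subnets, never crosses the threshold, which is the evasion pattern described in the paragraph above.

```python
import ipaddress
from collections import defaultdict, deque


class SignupVelocity:
    """Hypothetical sliding-window count of new accounts per IPv4 /24 subnet."""

    def __init__(self, window_seconds: float, max_signups: int) -> None:
        self.window_seconds = window_seconds
        self.max_signups = max_signups
        self._events = defaultdict(deque)  # subnet -> signup timestamps inside the window

    def record(self, ip: str, timestamp: float) -> bool:
        """Record a signup and return True if its subnet is now over the limit.

        Assumes timestamps arrive in non-decreasing order.
        """
        subnet = str(ipaddress.ip_network(f"{ip}/24", strict=False))
        window = self._events[subnet]
        window.append(timestamp)
        # Drop signups that have aged out of the sliding window.
        while window and timestamp - window[0] >= self.window_seconds:
            window.popleft()
        return len(window) > self.max_signups


# A burst of four signups from one /24 within a minute trips the check...
burst = SignupVelocity(window_seconds=3600, max_signups=3)
print([burst.record(f"203.0.113.{i}", i * 10) for i in range(1, 5)])
# prints: [False, False, False, True]

# ...but the same four signups spaced two hours apart never do.
slow = SignupVelocity(window_seconds=3600, max_signups=3)
print([slow.record(f"198.51.100.{i}", i * 7200) for i in range(1, 5)])
# prints: [False, False, False, False]
```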

This matters for any organization deploying a proprietary or fine-tuned model via API. The same techniques that defeated Anthropic's fraud detection will work against weaker detection systems. Most organizations deploying custom models do not have Anthropic's scale of investment in abuse detection. If Anthropic took six weeks to detect this, smaller organizations may not detect it at all.

What the White House says about it

On June 5, the White House signed NSPM-11, a National Security Presidential Memorandum on artificial intelligence. For the first time in presidential-level security guidance, the document explicitly names malicious distillation attacks as a national security concern and directs federal agencies to incorporate this threat class into model security assessments.

This is not regulatory theater. Distillation attacks, executed at the scale Anthropic describes, convert closed proprietary models into open, unguarded derivatives that bypass the safety controls built into the original system. A model trained on Claude's outputs to replicate Claude's capabilities does not inherit Claude's safety guardrails or constitutional AI training. It inherits the capability without the constraint.

The national security framing focuses on geopolitical competition: frontier AI capabilities built with American investment being replicated by strategic rivals at minimal cost. The AI security framing is different but equally concrete: organizations with proprietary models trained on sensitive internal data, customer data, or specialized domain knowledge face the same attack class, except that what is stolen is competitive advantage rather than national security material.

Practical implications for teams deploying models via API

Query diversity fingerprinting is the most direct detection approach. A legitimate user's query distribution over time looks different from a systematic capability extraction campaign. Legitimate users ask varied questions reflecting their actual work. Capability extractors ask questions designed to probe specific model behaviors across a structured taxonomy of tasks. The query set over a six-week campaign will show systematic coverage of model capabilities that no normal user would produce.
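One way to turn this into a measurable signal is to score each account by how much of a capability taxonomy it covers and how evenly it covers it. This sketch assumes a hypothetical upstream classifier (not shown) that tags each query with a category from your own taxonomy. The function names and the thresholds are illustrative and would need tuning against real traffic.

```python
import math
from collections import Counter
from typing import Iterable, Set, Tuple


def topic_profile(topics: Iterable[str], taxonomy: Set[str]) -> Tuple[float, float]:
    """Return (coverage, evenness) for one account's query topics.

    coverage: fraction of taxonomy categories the account has touched.
    evenness: Shannon entropy of the topic distribution divided by its
    maximum, log(len(taxonomy)); 1.0 means perfectly even coverage.
    """
    if not taxonomy:
        raise ValueError("taxonomy must not be empty")
    counts = Counter(t for t in topics if t in taxonomy)
    total = sum(counts.values())
    coverage = len(counts) / len(taxonomy)
    if total == 0 or len(taxonomy) < 2:
        return coverage, 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return coverage, entropy / math.log(len(taxonomy))


def looks_systematic(topics, taxonomy, min_coverage=0.8, min_evenness=0.9) -> bool:
    """Flag accounts that sweep the taxonomy broadly and evenly (illustrative thresholds)."""
    coverage, evenness = topic_profile(topics, taxonomy)
    return coverage >= min_coverage and evenness >= min_evenness


taxonomy = {"code", "math", "agentic", "reasoning", "writing"}
developer = ["code"] * 40 + ["writing"] * 3          # concentrated on real work
extractor = ["code", "math", "agentic", "reasoning", "writing"] * 8  # even sweep
print(looks_systematic(developer, taxonomy), looks_systematic(extractor, taxonomy))
# prints: False True
```

A per-account score like this only catches campaigns that concentrate coverage in one account; spreading the sweep across thousands of accounts defeats it, which is why the aggregate view matters.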

Volume-to-diversity ratio anomaly detection is a related signal. A single account generating high query volume with unusually broad topic diversity is a red flag. 25,000 accounts each generating moderate query volume with coordinated topic coverage across the group is harder to detect at the account level, but visible at the aggregate level if you are looking for correlated query patterns across accounts.
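A crude version of the aggregate check looks for prompt templates that recur across many otherwise unrelated accounts. The sketch below normalizes each prompt by lowercasing it and replacing numbers and double-quoted strings with placeholders, then counts distinct accounts per template. The normalization rules, function names, and `min_accounts` threshold are assumptions. Popular legitimate prompts will also be shared, so in practice this would be one input to a scoring model rather than a verdict on its own.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

_QUOTED = re.compile(r'"[^"]*"')
_NUMBER = re.compile(r"\d+(?:\.\d+)?")
_SPACES = re.compile(r"\s+")


def prompt_template(prompt: str) -> str:
    """Collapse a prompt to a rough template: lowercase, placeholders for literals."""
    text = _QUOTED.sub("<str>", prompt.lower())
    text = _NUMBER.sub("<num>", text)
    return _SPACES.sub(" ", text).strip()


def shared_templates(events: Iterable[Tuple[str, str]],
                     min_accounts: int = 3) -> Dict[str, Set[str]]:
    """Map each template used by at least min_accounts distinct accounts to those accounts."""
    accounts_by_template: Dict[str, Set[str]] = defaultdict(set)
    for account_id, prompt in events:
        accounts_by_template[prompt_template(prompt)].add(account_id)
    return {t: accts for t, accts in accounts_by_template.items()
            if len(accts) >= min_accounts}


events = [
    ("acct-17", "Write a Python function that sorts 10 items"),
    ("acct-402", "Write a Python function that sorts 25 items"),
    ("acct-9031", "write a python function that sorts 7 items"),
    ("acct-55", "What should I cook tonight?"),
]
for template, accounts in shared_templates(events).items():
    print(template, sorted(accounts))
# prints: write a python function that sorts <num> items ['acct-17', 'acct-402', 'acct-9031']
```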

Attribution tracing matters for API products. API keys tied to verified organizational identities, with revocation processes that propagate in near-real time, raise the cost of the account-farming approach significantly.
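The revocation half of this is largely a caching problem: gateways cache key checks for speed, and the cache TTL bounds how long a revoked key keeps working. The sketch below binds each key to an organization ID, revokes at the organization level, and uses a short-TTL cache with an injectable clock so the propagation delay is explicit. All class and method names are hypothetical. A production system would typically also push revocation events to gateways rather than rely on TTL expiry alone.

```python
import time
from typing import Callable, Dict, Set, Tuple


class KeyRegistry:
    """Source of truth: each API key is bound to a verified organization ID."""

    def __init__(self) -> None:
        self._org_by_key: Dict[str, str] = {}
        self._revoked_orgs: Set[str] = set()

    def issue(self, key: str, org_id: str) -> None:
        self._org_by_key[key] = org_id

    def revoke_org(self, org_id: str) -> None:
        # One decision disables every key issued to that organization.
        self._revoked_orgs.add(org_id)

    def is_valid(self, key: str) -> bool:
        org_id = self._org_by_key.get(key)
        return org_id is not None and org_id not in self._revoked_orgs


class GatewayCache:
    """Per-gateway cache of key checks; ttl_seconds bounds the revocation delay."""

    def __init__(self, registry: KeyRegistry, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._registry = registry
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[bool, float]] = {}

    def allow(self, key: str) -> bool:
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # may be stale for at most ttl_seconds
        valid = self._registry.is_valid(key)
        self._cache[key] = (valid, now)
        return valid


# Fake clock so the propagation window is visible.
now = [0.0]
registry = KeyRegistry()
registry.issue("key-1", "org-a")
gateway = GatewayCache(registry, ttl_seconds=5.0, clock=lambda: now[0])
print(gateway.allow("key-1"))  # prints: True
registry.revoke_org("org-a")
now[0] = 2.0
print(gateway.allow("key-1"))  # prints: True (cached result, still inside the TTL)
now[0] = 6.0
print(gateway.allow("key-1"))  # prints: False
```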

Rate limits and cost controls slow distillation campaigns but do not stop them. A well-resourced attacker distributes the load across enough accounts and time to stay under per-account thresholds.
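Anthropic's reported figures show why. Assuming the 28.8 million interactions were spread evenly across the 25,000 accounts over the 44 days between April 22 and June 5 (the real per-account distribution was not disclosed), each account averaged only about 26 interactions per day. The function name below is illustrative.

```python
from datetime import date


def per_account_daily_rate(total_interactions: int, accounts: int,
                           start: date, end: date) -> float:
    """Average interactions per account per day, assuming an even spread."""
    days = (end - start).days  # 44 for April 22 -> June 5
    return total_interactions / accounts / days


rate = per_account_daily_rate(28_800_000, 25_000, date(2026, 4, 22), date(2026, 6, 5))
print(f"{rate:.1f} per account per day, {rate / 24:.2f} per hour")
# prints: 26.2 per account per day, 1.09 per hour
```

Any per-account limit comfortably above roughly one request per hour would never fire on this traffic, which is why cross-account signals carry more weight than per-account thresholds.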

What this case changes

The Anthropic/Alibaba case, if Anthropic's account is accurate, is the first time a distillation attack at this scale has been documented and attributed to a specific organization in a public, government-facing disclosure. It moves distillation from a theoretical risk discussed in AI safety papers into the category of documented incidents with named actors.

For the AI security community, the practical shift is this: model security is not just about preventing jailbreaks and prompt injection. It also means protecting model capabilities as proprietary assets, with the same rigor already applied to training data and model weights.

The detection gap is real. Closing it requires investment in query-level behavioral analytics that most API platforms do not have by default.

Gigia Tsiklauri is a Security Architect and founder of Infosec.ge. Get in touch if you are working on AI model security or API abuse detection for your organization.
