Skip to content
AI SecurityAnthropicVulnerability ResearchLLMDual Use

Project Glasswing: 10,000 Critical Flaws in a Month and a Growing Patch Gap

6 min read
Share

Anthropic Project Glasswing finds 10,000 critical software flaws in a month. The patches cannot keep up.

On May 23, Anthropic published the first one-month update from Project Glasswing, the initiative it launched in April to direct frontier-model offensive cyber capability at the world's most systemically important open-source software. The headline number is the headline for a reason. Claude Mythos Preview, working with approximately 50 launch partners, surfaced more than 10,000 high or critical vulnerabilities in 30 days.

That number is correct. The implications are uncomfortable.

What Glasswing actually did in 30 days

The Frontier Red Team's technical writeup gives the breakdown. Mythos Preview analyzed code across more than 1,000 open-source projects. It flagged 6,202 high or critical-severity vulnerability candidates. Human validators confirmed 1,726 of those as real exploitable flaws. Of those confirmed, 1,094 were genuine high or critical severity.

Two bugs surfaced in month one capture the depth of what the model is finding. A 27-year-old vulnerability in OpenBSD, undiscovered since the late 1990s, and a 16-year-old flaw in FFmpeg, undiscovered since roughly 2010. These are bugs that have lived through every major OSS security review wave of the last two decades. Mythos found them in a software-reading pass measured in days.

The launch partner list reads like a snapshot of who matters in 2026 infrastructure. AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks. The framing of "systemically important" software is deliberate. Anthropic and the partners are not pointing the model at arbitrary open-source projects. They are pointing it at the libraries, kernels, browsers, and runtimes that everything else depends on.

The patching gap is the actual story

Here is the line in the Frontier Red Team update that should change how every defender thinks about 2026. Of the thousands of validated high-severity findings, fewer than 100 confirmed patches have shipped.

Re-read that. Anthropic and its partners have confirmed over a thousand real exploitable critical vulnerabilities in widely-used software. The world has fixed roughly one in ten.

The bottleneck has structurally moved. For the entire history of software security, the rate-limiting step has been vulnerability discovery. Defenders worried about how fast they could find bugs, how broad their fuzzing coverage was, how many bug-bounty submissions they could process. The implicit assumption was that finding bugs took time and effort, and the remediation backlog scaled to match.

That assumption is now broken. Mythos finds bugs at a pace that exceeds what the entire OSS community plus its commercial security-vendor allies can validate, disclose, patch, ship, and deploy. The bottleneck has moved upstream to coordinated disclosure capacity, then to patch development bandwidth, then to downstream deployment cadence. Each of those is a different organizational discipline than vulnerability discovery, and none of them scale linearly with budget.

Mythos held from public release

The second piece of news in the Glasswing update is the model availability decision. Mythos Preview will not be released publicly in its current form. Anthropic explicitly cited the cybersecurity capability ceiling. The model's autonomous vulnerability-discovery and exploit-development capability exceeds what Anthropic considers safe for general availability.

This is the first time a frontier lab has publicly invoked offensive cyber capability as the gating factor on a model release. The bio, chem, and dual-use cyber framings have existed in safety-policy documents for two years now. This is different. This is a specific model, with specific demonstrated capabilities on real targets, being held from general availability because the offense-defense math broke.

For enterprise procurement, the implication is direct. Capability disclosures from frontier labs that touch vulnerability research should now be assumed to involve extended pre-release evaluation windows. Some capability tiers will never be commercially available. Vendor security questionnaires that ask about day-zero model availability should add a question about which capabilities have been gated, on what criteria, and by whom.

What this means for you this week

Three things change if you operate any infrastructure that depends on systemically-important open-source software.

First, plan for a sustained patch wave over the next 90 days. Anthropic operates Glasswing under a strict 90-day coordinated-disclosure window. The first batch of validated high-severity findings will start hitting public CVE databases through May, June, and July. Weight your patch operations toward the partner ecosystem critical list, especially Linux Foundation projects, the major browsers, and the open-source components that ship inside AWS, Microsoft, and Google products.

Second, subscribe to the Anthropic Frontier Red Team blog. Treat each Glasswing disclosure as an in-cycle Patch-Tuesday-equivalent event. The Tier-1 primary source is the Frontier Red Team writeup; downstream vendor advisories will follow as patches ship, but the Anthropic post will land first and give you the architectural framing.

Third, audit your patch-cycle latency for the systemically-important software you depend on. If your average time from CVE-released to production-deployed for a critical-severity issue is measured in weeks rather than days, this is the moment to invest in cutting that down. The defender window has shrunk on the discovery side. Whether it has shrunk on your side depends on choices your engineering organization makes now.

The bigger frame

The Unit42 Frontier AI Defender's Guide from May 14 estimated a 3-to-5-month window for defenders before frontier-model offensive capability would meaningfully outpace defender capacity. Glasswing has now compressed that estimate. The window is over for the discovery half of the problem. What remains open is the response half.

The response half is, in the long run, the harder problem. Discovery scales with compute. Disclosure, patching, and deployment scale with human coordination, organizational process, and the slow gears of vendor relationships. Anthropic has shown what one frontier model with one month of inference can do. It is going to take the rest of the software industry several quarters to absorb the implications.

For Tashkent-region defenders and for any organization running infrastructure that depends on OpenBSD, FFmpeg, Linux kernel, or any of the partner-ecosystem critical software, the read for this week is direct. Watch the Frontier Red Team blog. Plan for a 90-day rolling patch wave. Reduce patch-cycle latency wherever the engineering effort is feasible. The offense-defense balance has structurally shifted, and the response will play out across the rest of 2026.

Gigia Tsiklauri is a Security Architect and founder of Infosec.ge. Get in touch if you want to talk through how Project Glasswing changes patch-cycle planning for your infrastructure.