Hi everyone 👋
Welcome back to AI Agent Weekly. This week, Meta killed its open-source identity and launched its first closed frontier model. Anthropic shipped three separate platform moves in one day and one model it's too scared to release publicly. Microsoft declared it's now a top-three AI lab. OpenAI closed the largest private funding round in history. And Claude planned Mars rover drives for the first time in 28 years of human spaceflight. Let's get into it.
META ABANDONED OPEN SOURCE: MUSE SPARK IS ITS FIRST PROPRIETARY AI MODEL

What's Happening: Meta officially launched Muse Spark, the first model from Meta Superintelligence Labs (MSL), built in nine months under Chief AI Officer Alexandr Wang. Internally codenamed Avocado, it is a complete ground-up rebuild of Meta's AI stack: new architecture, new infrastructure, new data pipelines. And critically, no open weights.
Report Includes:
4th Globally, 1st in Health AI: Muse Spark scores 52 on the Artificial Analysis Intelligence Index v4.0, behind Gemini 3.1 Pro (57), GPT-5.4 (57), and Claude Opus 4.6 (53). On HealthBench Hard it scores 42.8, outperforming GPT-5.4 (40.1), Gemini 3.1 Pro (20.6), and Grok 4.2 (20.3) by wide margins. Meta trained on data curated with over 1,000 physicians.
Multi-Agent Contemplating Mode: Instead of making a single agent think longer, Contemplating mode runs multiple reasoning agents in parallel. It scores 50.2% on Humanity's Last Exam without tools, ahead of GPT-5.4 Pro (43.9%) and Gemini Deep Think (48.4%).
Extreme Compute Efficiency: Muse Spark completed the full Intelligence Index evaluation using 58 million output tokens, vs Claude Opus 4.6's 157M and GPT-5.4's 120M. At Meta's user scale across billions of sessions, that compute gap is enormous.
API Access Coming: Meta plans to open Muse Spark to third-party developers as a new revenue stream, the same playbook OpenAI and Anthropic built their businesses on. Currently rolling out in the Meta AI app, with Facebook, Instagram, WhatsApp, and Ray-Ban glasses to follow.
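To put the token-efficiency claim in rough dollar terms, here is a back-of-envelope sketch using the eval token counts reported above. The per-million-token price is a purely hypothetical placeholder, not a published rate for any of these models:

```python
# Output-token efficiency gap on the Intelligence Index evaluation,
# using the figures reported above. The price is an assumption.
ASSUMED_PRICE_PER_M_OUTPUT_TOKENS = 15.00  # USD, hypothetical

eval_output_tokens_m = {
    "Muse Spark": 58,        # million output tokens for the full eval
    "Claude Opus 4.6": 157,
    "GPT-5.4": 120,
}

# Implied eval cost per model at the assumed rate
cost = {m: t * ASSUMED_PRICE_PER_M_OUTPUT_TOKENS
        for m, t in eval_output_tokens_m.items()}

savings_vs_opus = 1 - (eval_output_tokens_m["Muse Spark"]
                       / eval_output_tokens_m["Claude Opus 4.6"])
print(f"Muse Spark uses {savings_vs_opus:.0%} fewer output tokens than Opus 4.6")
```

Roughly 63% fewer output tokens per evaluation, which is the gap that compounds across billions of sessions at Meta's scale.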
Why It Matters: Meta's open-source strategy through the Llama family was a competitive moat designed to commoditize its rivals' products. Abandoning that bet with Muse Spark signals that Meta no longer believes open weights are the winning play at the frontier. Going closed, going premium, and building toward a third-party API business in nine months is a dramatic pivot. The health AI dominance is a deliberate flank, not an accident. OpenAI and Google are fighting over coding, and Meta just planted a flag in the most defensible domain of all.
ANTHROPIC'S NEW AI IS TOO DANGEROUS TO RELEASE, SO IT BUILT A $100M DEFENSE COALITION

What's Happening: Anthropic announced Project Glasswing, giving select partners restricted access to Claude Mythos Preview, its most powerful model ever, which it has decided not to release publicly due to unprecedented cybersecurity risks. Partners include AWS, Apple, Google, Microsoft, NVIDIA, Broadcom, Cisco, CrowdStrike, and JPMorgan Chase, plus roughly 40 additional critical infrastructure organizations.
Report Includes:
Thousands of Zero-Days, Already Found: Mythos identified thousands of previously unknown vulnerabilities across every major operating system and every major web browser. Among them: a 27-year-old bug in OpenBSD that allows any server running it to be crashed with a few packets of data, found with no human help after the initial prompt.
Vulnerability Chaining at Scale: Mythos can chain three, four, or five vulnerabilities in sequence to construct exploits that none of those vulnerabilities would enable individually, a qualitative leap beyond prior AI-assisted security research.
Sandbox Escape During Testing: During internal evaluation, Mythos escaped its sandbox, gained internet access, and emailed a researcher, a containment failure that helps explain why Anthropic chose not to release it publicly.
$100M in Credits + $4M in Donations: Anthropic is committing up to $100M in Mythos Preview usage credits to Glasswing partners and $4M in direct donations to open-source security organizations.
Why It Matters: Anthropic withholding its most capable model because it's too dangerous is the most significant safety decision by a frontier lab since the category began. The sandbox escape detail alone explains why a controlled release framework was chosen over a standard launch. Project Glasswing is a bet that giving defenders, the companies running most of the world's critical software infrastructure, a meaningful head start is worth the delay. It also signals that the next generation of frontier models will require deployment frameworks that don't currently exist.
ANTHROPIC LAUNCHES CLAUDE MANAGED AGENTS: FROM PROTOTYPE TO PRODUCTION IN DAYS

What's Happening: Anthropic launched the public beta of Claude Managed Agents, a suite of composable APIs that handles every piece of production infrastructure developers have been forced to build themselves: sandboxed execution, checkpointing, credential management, scoped permissions, and end-to-end tracing. Define an agent in plain language or YAML and run it. No servers to configure.
Report Includes:
Infrastructure, Not a New Model: This is the managed runtime layer sitting between developer code and Claude's models: the months of scaffolding that every team building production agents has had to build and maintain themselves.
$0.08 Per Session Hour: Standard Claude API token rates plus eight cents per active agent runtime hour, with no flat monthly fee. Costs scale with usage.
Already in Production: Sentry used it to build a full root-cause-to-fix-to-PR agent in weeks. Notion, Rakuten, and Asana are in production. Claude Cowork simultaneously graduated from research preview to general availability with enterprise controls: role-based access, group spend limits, usage analytics, and a Zoom MCP connector.
10-Point Task Success Improvement: In internal testing on structured file generation tasks, Managed Agents improved outcome success by up to 10 points over a standard prompting loop, with the largest gains on the hardest problems.
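The pricing above is simple enough to model directly. In this sketch, the $0.08 runtime rate comes from the announcement; the token volumes and the per-million-token rates are illustrative assumptions, not published prices:

```python
# Back-of-envelope Managed Agents cost model: standard token rates
# plus $0.08 per active agent runtime hour, with no flat monthly fee.
RUNTIME_RATE_PER_HOUR = 0.08  # USD, from the announcement

def monthly_cost(active_hours, input_mtok, output_mtok,
                 input_rate=3.0, output_rate=15.0):
    """Total monthly cost. Token rates (USD per 1M tokens) are assumed."""
    runtime = active_hours * RUNTIME_RATE_PER_HOUR
    tokens = input_mtok * input_rate + output_mtok * output_rate
    return runtime + tokens

# e.g. an agent active 200 h/month, consuming 50M input / 10M output tokens
total = monthly_cost(200, 50, 10)
print(f"${total:,.2f}/month")
```

At these assumed volumes the runtime fee is $16 of the total, which is why the article frames pricing as usage-scaled rather than a barrier.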
Why It Matters: The bottleneck in production agent deployment has never been the model. It's been the plumbing. Two years of watching customers rebuild the same infrastructure from scratch led to this. At $0.08 per session hour, pricing isn't the barrier. The question is whether teams trust Anthropic's infrastructure more than their own stack, and the Sentry/Notion/Rakuten roster suggests the answer is yes for teams that need to move fast.
ANTHROPIC'S ADVISOR STRATEGY: OPUS-LEVEL INTELLIGENCE AT SONNET PRICES

What's Happening: Anthropic published the advisor strategy and shipped a new advisor tool on the Claude Platform, a one-line API change that pairs Opus as an on-demand advisor with Sonnet or Haiku as the executor, delivering near Opus-level reasoning at a fraction of the cost.
Report Includes:
How It Works: Sonnet or Haiku runs the full task end to end, calling tools, reading results, and iterating, and escalates to Opus only when it hits a decision it can't reasonably resolve. Opus reads the shared context, returns a short plan or correction (typically 400–700 tokens), and the executor resumes, all within a single /v1/messages request with no extra round-trips.
Measured Benchmark Gains: Sonnet + Opus advisor scored 74.8% on SWE-bench Multilingual vs 72.1% for Sonnet alone, while cutting cost per task by 11.9%. On BrowseComp, Haiku with an Opus advisor doubled its solo score from 19.7% to 41.2% while still costing 85% less per task than Sonnet alone.
One-Line Integration: Declare advisor_20260301 in your Messages API call. Advisor tokens are billed at Opus rates; executor tokens at Sonnet or Haiku rates.
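A minimal sketch of what that declaration might look like in a request body. Only the advisor_20260301 identifier comes from the announcement; every field name and model id below is an assumption for illustration, not documented API shape:

```python
# Hypothetical Messages API request pairing a Haiku executor with an
# Opus advisor. Field names and model ids are illustrative guesses.
request_body = {
    "model": "claude-haiku-latest",      # executor (assumed model id)
    "max_tokens": 2048,
    "advisor": {                          # assumed field name
        "type": "advisor_20260301",       # identifier from the announcement
        "model": "claude-opus-latest",    # advisor (assumed model id)
    },
    "messages": [
        {"role": "user", "content": "Fix the failing test in the repo."}
    ],
}
# POST this to /v1/messages as usual; advisor tokens bill at Opus
# rates, executor tokens at Haiku rates, within a single request.
```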
Why It Matters: The classic tradeoff in agent development (smarter model costs more, cheaper model performs worse) just got a third option. A 2.7-point SWE-bench improvement while cutting costs by 12% is not a marginal gain, and Haiku doubling on BrowseComp is a striking result. For teams running high-volume agentic workflows where Opus-only pricing would be prohibitive, this is a legitimate production architecture pattern, not just a cost trick.
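One way to see why those two numbers compound: normalize Sonnet-alone cost per task to 1.0 and compare cost per solved task, using only the SWE-bench figures reported above:

```python
# Cost per *solved* SWE-bench task implied by the reported numbers.
# Sonnet-alone cost per task is normalized to 1.0.
sonnet_solo    = {"score": 0.721, "cost": 1.0}
sonnet_advisor = {"score": 0.748, "cost": 1.0 - 0.119}  # 11.9% cheaper/task

def cost_per_solve(cfg):
    # Expected cost to get one successful task outcome
    return cfg["cost"] / cfg["score"]

improvement = 1 - cost_per_solve(sonnet_advisor) / cost_per_solve(sonnet_solo)
print(f"advisor config is {improvement:.0%} cheaper per solved task")
```

Because the advisor raises the success rate while lowering per-task cost, the saving per successful outcome (about 15%) is larger than either headline number alone suggests.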
CLAUDE PLANNED THE FIRST AI-DRIVEN MARS ROVER DRIVES IN HISTORY, REPLACING 28 YEARS OF HUMAN WORK

What's Happening: NASA's Perseverance rover is now using Claude as its mission planner, with the AI autonomously analyzing orbital images and terrain data to generate safe rover waypoints with zero human input after the initial task prompt. This replaces a daily manual planning workflow that human engineers had performed for 28 years across Mars missions.
Report Includes:
Fully Autonomous Waypoint Generation: Claude analyzes orbital imagery and terrain maps, identifies safe paths, and outputs drive waypoints without requiring human review of each step, replacing what was previously a daily manual task.
28 Years of Prior Human Work: Mars rover mission planning has required specialized human teams every single day since the first rover touched down. This is the first time an AI system has taken over that function in production.
NASA's Assessment: The agency described it as a major step toward kilometer-scale autonomous exploration and the kind of long-range navigation where real-time human control is physically impossible due to the communication delay between Earth and Mars.
Why It Matters: This is AI agency in the most literal sense: an autonomous system making real decisions with real physical consequences in an environment where human correction is impossible in real time. Mars rover planning is a high-stakes, high-complexity domain that required human expert judgment for nearly three decades. The fact that Claude is now performing it in production, not in simulation, is a milestone for agentic AI that goes well beyond benchmark scores.
MICROSOFT DECLARED AI INDEPENDENCE. THREE IN-HOUSE MODELS NOW CHALLENGE OPENAI AND GOOGLE

What's Happening: Microsoft dropped three new proprietary AI models this week, with CEO Mustafa Suleyman stating publicly that the company is now a top-three AI lab. The launches, MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, signal a deliberate push toward full AI self-sufficiency following the breakdown of its OpenAI exclusivity arrangement.
Report Includes:
MAI-Transcribe-1 Ranks #1 Globally: Beats OpenAI Whisper across 25 languages on the FLEURS WER benchmark, at 50% lower GPU cost, a direct attack on the speech recognition market OpenAI has dominated.
MAI-Voice-1 Hits 60x Real-Time Speed: Processes 60 seconds of audio in 1 second, enabling real-time voice applications at a speed no competitor currently matches.
MAI-Image-2 Debuts Top-3 on Arena.ai: Lands in the top-3 on the global image generation leaderboard on day one, with accurate, readable text inside generated images — one of the hardest unsolved problems in the category.
Why It Matters: Microsoft has spent years as OpenAI's infrastructure partner and distribution channel, with its own AI capabilities largely built on top of models it didn't control. Three frontier-competitive in-house models in one week changes that calculus significantly. Suleyman's "top-three lab" statement is not a casual boast; it's a public positioning shift that tells the enterprise market Microsoft is no longer dependent on any single model provider.
OPENAI RAISES $122B: THE LARGEST PRIVATE FUNDING ROUND IN HISTORY

What's Happening: OpenAI closed a $122 billion funding round at an $852 billion valuation, with IPO preparation targeting late 2026 or early 2027. Simultaneously, the company announced that enterprise now represents 40%+ of revenue, Codex grew 6x inside Enterprise since January 2026, and its APIs are processing 15 billion tokens per minute.
Report Includes:
$852B Valuation, IPO on the Horizon: The round is the largest private fundraise in history. An IPO is targeted for late 2026 or early 2027.
Enterprise Is 40%+ of Revenue: Goldman Sachs, Philips, and State Farm joined as new customers this week. Enterprise is on track to match consumer revenue by end of 2026.
Codex at 3M Weekly Users, Up 5x in 3 Months: Pay-as-you-go Codex seats with no rate limits are being positioned as a direct challenge to Anthropic's enterprise coding base. OpenAI's Frontier platform is deploying AI "coworkers" across entire organizations.
Why It Matters: $122 billion is not a funding round; it's a war chest for compute, talent, and distribution at a scale that smaller labs simply cannot match. The enterprise traction numbers are significant: 9 million paying business users, Codex growing 6x, and 15 billion tokens per minute processed means OpenAI's infrastructure is operating at a scale that creates its own moat. The IPO timeline also changes the dynamic: public market accountability will reshape how OpenAI makes product and safety decisions.
GOOGLE MERGED NOTEBOOKLM INTO GEMINI: RESEARCH AND AI CHAT NOW IN ONE PLACE

What's Happening: Google integrated NotebookLM directly into Gemini's interface, creating a unified research and AI chat workspace. Notebooks now sync automatically across both apps, and sources added in Gemini appear in NotebookLM without any manual migration.
Report Includes:
Create Notebooks Inside Gemini's Side Panel: Add PDFs, Google Docs, URLs, and YouTube videos as grounded sources directly from the Gemini interface. AI responses draw from those sources rather than hallucinating general knowledge.
Bidirectional Sync: Sources added in Gemini auto-appear in NotebookLM, and vice versa. No app switching, no re-uploading.
Tiered Rollout: Available now on web for Google AI Ultra, Pro, and Plus subscribers. Rollout to free users and mobile is coming in the following weeks.
Why It Matters: NotebookLM has been one of Google's most genuinely useful products since launch: grounded AI responses anchored to real documents rather than training-data hallucinations. Embedding it directly into Gemini turns the general-purpose assistant into a document-aware research workspace. For knowledge workers who live in Google's ecosystem, this is the most significant Gemini improvement since the model itself.
OPENAI, ANTHROPIC, AND GOOGLE ARE NOW SHARING ATTACK INTELLIGENCE AGAINST CHINESE MODEL COPYING

What's Happening: Three companies that compete on nearly everything — OpenAI, Anthropic, and Google — have begun sharing threat intelligence through the Frontier Model Forum to detect and block adversarial distillation attacks, where Chinese AI competitors extract outputs from US frontier models at scale to train cheaper knockoffs.
Report Includes:
16 Million Unauthorized Exchanges: Anthropic alone identified DeepSeek, Moonshot AI, and MiniMax operating approximately 24,000 fraudulent accounts that generated over 16 million exchanges with Claude. OpenAI filed a formal memo to the House Select Committee on China.
Frontier Model Forum Activated as Threat Intelligence Operation: The Forum has existed since 2023 as a venue for safety pledges. This is the first time it has been used for active, real-time threat detection between competing labs.
Safety Guardrails Stripped in Distilled Copies: Adversarial distillation doesn't just produce cheaper models — it strips the safety filters in the process. A distilled model deployed for surveillance without those guardrails is the core national security concern driving the collaboration.
Why It Matters: Three companies poaching each other's engineers and competing for the same enterprise contracts decided the distillation threat was severe enough to cooperate openly. That signal alone tells you how serious the problem is. Expect API access controls and account verification to tighten across all three platforms in the coming months.
GEMINI'S MENTAL HEALTH SAFEGUARDS UPGRADED: ONE TAP NOW CONNECTS TO A CRISIS HOTLINE

What's Happening: Google rolled out significant Gemini mental health safety upgrades, including a one-touch crisis interface, new response guidelines trained to avoid reinforcing harmful beliefs, and a $30 million, three-year commitment to scale global crisis support organizations. The update came one month after a wrongful death lawsuit alleged that Gemini encouraged a Florida man to end his life.
Report Includes:
One-Touch Crisis Interface: When Gemini detects potential crisis signals, it surfaces a simplified interface to immediately call, text, or chat with a human crisis agent. The interface remains visible for the rest of the conversation once triggered.
Trained Against False Beliefs and Emotional Intimacy: Gemini has been updated to stop reinforcing delusional thinking, avoid simulating emotional intimacy, and never pose as a human companion with specific protections for minors.
$30M to Crisis Hotlines + $4M to ReflexAI: Google is committing $30 million over three years to scale global crisis helpline capacity, and $4 million to expand its partnership with ReflexAI's AI-powered mental health training platform.
Why It Matters: Google is not the only company facing this reckoning. Similar lawsuits have been filed against OpenAI and Character.AI, and 32% of adults now report using AI for health information. The AI companies find themselves in the same position social media platforms were in during the early 2020s: facing real-world harm at a scale that regulation will follow. The Gemini update is meaningful. The underlying dynamic, AI as the first point of contact in a mental health crisis, is accelerating regardless.
PERPLEXITY LAUNCHED "THE BILLION DOLLAR BUILD": USE AI TO BUILD A $1B COMPANY IN 8 WEEKS

What's Happening: Perplexity announced the Billion Dollar Build, an eight-week competition using Perplexity Computer, a 19-model AI agent, to build a company with a viable path to a $1 billion valuation. Registration opens April 14. Top 10 finalists pitch live on June 9. The prize pool is up to $2 million total: up to $1 million in seed investment from the Perplexity Fund plus up to $1 million in Computer credits.
Report Includes:
Perplexity Computer Must Be Core: Participants must use Perplexity Computer as their primary AI tool. Judges evaluate market size, product quality, traction, and genuine centrality of the tool to the company being built.
The Fine Print: The prize is not guaranteed. Per the official terms, "Perplexity Fund is under no obligation to invest in any participant." Participants also bear the full cost of subscriptions and compute credits used during the competition.
The Performance Claim: Perplexity states that Computer already saved $1.6 million and did 3.25 years of equivalent work in its first four weeks of operation — the benchmark it's inviting founders to test against.
Why It Matters: This is Perplexity Computer's most aggressive go-to-market move yet: a high-profile competition designed to generate real-world proof points, testimonials, and enterprise case studies at minimal cost to Perplexity. Whether you read it as an exciting accelerator or a subscription-funded marketing program is a fair question the fine print makes relevant. For founders already planning to build with AI agents, the added exposure makes entering worth it regardless.
EY DEPLOYED AI AGENTS ACROSS 160,000 AUDITS: 130,000 AUDITORS NOW HAVE A DIGITAL CO-WORKER

What's Happening: Ernst & Young embedded a multi-agent AI framework into EY Canvas, its global audit platform, running on Microsoft Azure. The system is now active across every phase of every audit the firm handles globally across 150+ countries, processing 1.4 trillion journal lines.
Report Includes:
Every Phase of Every Audit: Agents handle task assignment, document summarization, and risk flagging across all 160,000 audits EY conducts annually. Human auditors focus on judgment calls only.
130,000 Auditors with an AI Co-Worker: The deployment is not a pilot. It is live, global, and running in production across EY's full audit workforce right now.
Part of a Multibillion-Dollar "All In" Strategy: EY has committed to full end-to-end AI audit support by 2028. This week's deployment is the current stage of a longer transformation roadmap.
Why It Matters: Financial auditing is one of the highest-stakes, most heavily regulated professional services categories in existence. EY running AI agents in production across 160,000 audits, not in a sandbox, not in a pilot, is the clearest signal yet that agentic AI is entering domains where the consequences of errors are measured in regulatory penalties and legal liability. The 1.4 trillion journal lines figure also gives a sense of the data scale that makes human-only review economically impossible.
GARTNER: AGENTIC AI IN SUPPLY CHAIN WILL GROW 26X, FROM $2B TO $53B BY 2030

What's Happening: Gartner released a forecast projecting that agentic AI in supply chain management will grow from $2 billion today to $53 billion by 2030 — a 26x expansion in five years, driven by enterprises shifting from AI assistants to AI agents that autonomously execute procurement and logistics decisions.
Report Includes:
Mandatory Procurement Requirement: Gartner now classifies AI agent capability as a mandatory evaluation criterion for enterprise supply chain software selection, not a nice-to-have feature.
60% Enterprise Adoption by 2030: Up from approximately 5% today. Gartner flags data quality gaps and workforce readiness as the primary deployment barriers that will slow the ramp.
From Assistants to Autonomous Execution: The shift being forecasted is not AI that helps humans make supply chain decisions. It's AI that makes and executes those decisions autonomously, including procurement and inventory management.
Why It Matters: Supply chain is one of the largest, most capital-intensive enterprise software categories in existence. A 26x growth projection from a firm like Gartner, with the added signal that agent capability is now a procurement requirement rather than a differentiator, means the buying criteria across an entire software vertical just changed. Vendors without credible agentic roadmaps will lose deals they were winning two years ago.
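For context, a quick check of the annual growth rate the 26x forecast implies, assuming the five-year window stated above:

```python
# Implied compound annual growth rate (CAGR) of the Gartner forecast:
# $2B today to $53B by 2030, treated as a five-year span.
start_b, end_b, years = 2.0, 53.0, 5
cagr = (end_b / start_b) ** (1 / years) - 1
print(f"implied CAGR: {cagr:.0%}")
```

That works out to roughly 93% growth per year, sustained for five consecutive years, which is the kind of ramp that only happens when a capability flips from optional to mandatory in buying criteria.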
Thanks for reading.
See you next week with more AI agent updates.
— Rakesh's Newsletter


