Hi everyone 👋
Welcome back to AI Agent Weekly.
This week, OpenAI took a significant step toward practical, domain-specialized agents with the launch of self-improving Tax AI agents powered by Codex. These systems demonstrate autonomous feedback loops and continuous improvement from real-world usage, a key milestone in moving agents from general tools to reliable professional collaborators. Alongside this, we saw notable advances in AI-powered security, specialized agent hardware, and multimodal generation.
Let’s get into the details.
OpenAI Builds Self-Improving Tax Agents with Codex

What’s Happening: OpenAI, in collaboration with Thrive Holdings, developed Tax AI, a self-improving agent system for complex tax preparation using Codex’s agentic capabilities.
Report Includes:
Autonomous feedback loops that turn real-world usage into structured improvements.
Collaboration between practitioners and AI to handle increasingly complex tax returns.
Shift from manual engineering fixes to production-driven self-improvement.
Real deployment across 30+ accounting firms.
Why It Matters: This project shows how frontier agentic models can move beyond one-off tasks into domain-specific, continuously improving professional systems.
Google AI Threat Defense Leads a New Wave of Autonomous Security

What’s Happening: Google Cloud introduced Google AI Threat Defense, an automated security system that uses multiple AI models and agents to proactively detect, prioritize, and remediate vulnerabilities faster than adversaries can exploit them.
Report Includes:
Combines Gemini reasoning, Wiz contextual risk analysis, CodeMender for automated fixes, and Mandiant expertise.
Four-step framework: Prepare, Scan & Prioritize, Remediate, and continuously Monitor.
AI-powered penetration testing and autonomous patch generation.
Focus on reducing exploit windows in the era of fast AI-driven attacks.
Why It Matters: As attackers use AI to accelerate breaches, defenders need autonomous systems that operate at machine speed. Google’s new platform marks a major step toward proactive, AI-vs-AI cybersecurity.
NVIDIA Vera CPU Delivers Strong Results for Agentic Workloads

What’s Happening: New benchmarks highlight the performance of NVIDIA’s Vera CPU, purpose-built for agentic AI orchestration and sustained workloads.
Report Includes:
88 custom Olympus cores with 1.2 TB/s memory bandwidth.
Excellent sustained performance under heavy parallel agent tasks.
Strong results in code compilation, data processing, and orchestration.
Clear generational improvement over the previous Grace CPU.
Why It Matters: Agentic systems need powerful CPUs for coordination and tool use. Vera reinforces NVIDIA’s strategy to build full-stack AI infrastructure.
Anthropic Releases Zero Trust Framework for AI Agents

What’s Happening: Anthropic published a practical security guide titled “Zero Trust for AI Agents”, outlining how enterprises should securely deploy autonomous agents in the face of AI-accelerated threats.
Report Includes:
Addresses unique agent risks: prompt injection, tool poisoning, identity/privilege abuse, memory poisoning, and multi-agent coordination issues.
Introduces a three-tier Zero Trust architecture (Foundation, Advanced, Optimized) tailored to different maturity levels and risk profiles.
Emphasizes cryptographically rooted identities, per-task scoped permissions, protected memory, and defensive operations that match the speed of autonomous attackers.
Covers the compressed vulnerability-to-exploit timeline driven by frontier models.
Why It Matters: As agents gain more autonomy and tool access, traditional security models fall short. This framework provides a clear, actionable path for enterprises to adopt agentic AI safely.
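The guide is written as architecture guidance, not code. To make one of its ideas concrete, here is a minimal sketch of "per-task scoped permissions", assuming a hypothetical ToolGateway that sits between an agent and its tools. Every name here (TaskGrant, ToolGateway, issue, authorize) is illustrative and not taken from Anthropic's framework, and the sketch leaves out the cryptographic identity layer entirely.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskGrant:
    """A short-lived grant tied to a single task (hypothetical structure)."""
    task_id: str
    allowed_tools: frozenset
    expires_at: float  # Unix timestamp after which the grant is void


class ToolGateway:
    """Checks every agent tool call against the grant issued for its task."""

    def __init__(self):
        self._grants = {}

    def issue(self, task_id, tools, ttl_seconds):
        # Each task gets only the tools it needs, for a limited time.
        grant = TaskGrant(task_id, frozenset(tools), time.time() + ttl_seconds)
        self._grants[task_id] = grant
        return grant

    def authorize(self, task_id, tool, now=None):
        now = time.time() if now is None else now
        grant = self._grants.get(task_id)
        if grant is None:
            return False  # deny by default: no grant means no access
        if now >= grant.expires_at:
            return False  # expired grants are never honored
        return tool in grant.allowed_tools
```

Usage: after `gateway = ToolGateway()` and `gateway.issue("task-42", {"read_ticket", "search_docs"}, ttl_seconds=300)`, calling `gateway.authorize("task-42", "search_docs")` returns True, while `gateway.authorize("task-42", "send_email")` returns False. The design choice is deny-by-default: the agent never holds standing credentials, only a narrow grant that expires with the task.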
Microsoft MAI Image 2.5 Climbs to No. 3 on Arena AI Leaderboard

What’s Happening: Microsoft’s MAI team released MAI Image 2.5, which quickly reached third place on the Arena AI image generation leaderboard.
Report Includes:
Strong performance in quality, prompt adherence, and creative consistency.
Competitive ranking against top multimodal image models.
Rapid iteration from the MAI research group.
Why It Matters: Demonstrates Microsoft’s continued aggressive push in the competitive multimodal generation space.
AWS AgentWatch: Ambient Agents for Proactive Cloud Monitoring

What’s Happening: AWS launched AgentWatch, an ambient agent system that delivers continuous proactive monitoring of AWS environments using Amazon Bedrock.
Report Includes:
Automated health reports delivered to Slack every 15 minutes.
Hybrid human-in-the-loop patterns (Notify, Question, Review).
Natural language queries across accounts.
Built on Bedrock AgentCore for scalable deployment.
Why It Matters: Moves cloud operations from reactive to proactive, AI-driven oversight, which is essential for reliable agentic infrastructure at scale.
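The announcement names three human-in-the-loop patterns (Notify, Question, Review) but the routing rules behind them are not described here. Below is a minimal sketch of how an ambient agent might pick a pattern per finding. The Finding fields, the 1 to 5 severity scale, and the thresholds are all assumptions for illustration, not AgentWatch's actual logic or API.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    NOTIFY = "notify"      # inform humans; no response needed
    QUESTION = "question"  # ask humans for missing context
    REVIEW = "review"      # a human must approve before any action


@dataclass
class Finding:
    summary: str
    severity: int          # hypothetical scale: 1 (info) to 5 (critical)
    confident: bool        # does the agent believe it understands the cause?
    proposes_change: bool  # does the agent want to modify resources?


def route(finding: Finding) -> Pattern:
    """Pick a human-in-the-loop pattern for one finding (assumed policy)."""
    # Any proposed change or severe issue needs explicit human approval.
    if finding.proposes_change or finding.severity >= 4:
        return Pattern.REVIEW
    # Low-severity but unclear findings go back to a human as a question.
    if not finding.confident:
        return Pattern.QUESTION
    # Everything else is informational.
    return Pattern.NOTIFY
```

The ordering matters: checking for proposed changes first guarantees that no remediation slips through as a mere notification, even when the agent is confident.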
Anthropic Launches Claude Marketplace for Enterprise Agents

What’s Happening: Anthropic introduced the Claude Marketplace, a curated platform where enterprises can discover and purchase Claude-powered tools and agents using their existing Anthropic commitments.
Report Includes:
Simplifies procurement by letting organizations apply part of their Anthropic spend toward partner solutions.
Features enterprise-grade agents and platforms, including Augment Code (agentic software engineering), Harvey (legal), Lovable & Bolt (app building), GitLab, Snowflake Cortex Agents, CodeRabbit, and more.
Centralized billing and invoicing managed by Anthropic.
Focus on secure, scalable, production-ready agents for code, legal, finance, and data workflows.
Why It Matters: This lowers friction for enterprises adopting agentic AI by creating a trusted marketplace with consolidated spend and governance, accelerating deployment of specialized agents across the organization.
OpenAI Launches Secure MCP Tunnels for Private Agent Tools

What’s Happening: OpenAI released Secure MCP Tunnel, enabling enterprises to securely connect private, on-premises, or firewall-protected MCP (Model Context Protocol) servers to OpenAI products without exposing them to the public internet.
Report Includes:
Uses an outbound-only HTTPS connection; there is no need to open inbound firewall ports or allowlist IP addresses.
A lightweight tunnel client runs inside your network, establishes a secure tunnel to OpenAI, and forwards MCP requests and responses.
Supported across ChatGPT, Codex, Responses API, AgentKit, and other agentic workflows.
Maintains full security boundaries while allowing agents to access internal tools, databases, APIs, and services.
Why It Matters: This removes a major barrier for production agent deployments in regulated and security-conscious environments. Enterprises can now safely expose internal tools to powerful AI agents while keeping everything behind their firewall. A significant step toward secure, real-world agentic systems.
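The key idea is that the connection is initiated from inside the network, so nothing outside ever dials in. The wire protocol of OpenAI's tunnel is not described here, so the following is only a toy, in-process sketch of the general outbound-only (reverse tunnel) pattern. Relay, private_mcp_server, and tunnel_client_step are hypothetical names, and the message shape is a simplified stand-in for an MCP `tools/call` request.

```python
import queue


class Relay:
    """Stands in for the public side of the tunnel (hypothetical)."""

    def __init__(self):
        self._requests = queue.Queue()
        self._responses = {}

    def enqueue(self, request_id, payload):
        # Called by the agent platform; it never connects to the private network.
        self._requests.put((request_id, payload))

    def next_request(self, timeout=1.0):
        # Called BY the tunnel client over its outbound connection.
        try:
            return self._requests.get(timeout=timeout)
        except queue.Empty:
            return None

    def submit_response(self, request_id, result):
        self._responses[request_id] = result

    def response(self, request_id):
        return self._responses.get(request_id)


def private_mcp_server(payload):
    """Hypothetical internal tool reachable only from inside the network."""
    tool = payload.get("params", {}).get("name")
    if payload.get("method") == "tools/call" and tool == "lookup_customer":
        return {"name": "Acme Corp", "tier": "gold"}
    return {"error": "unknown tool"}


def tunnel_client_step(relay):
    """One loop iteration: pull a request, forward it locally, push the result."""
    item = relay.next_request(timeout=0.1)
    if item is None:
        return False  # nothing pending
    request_id, payload = item
    relay.submit_response(request_id, private_mcp_server(payload))
    return True
```

Usage: enqueue `{"method": "tools/call", "params": {"name": "lookup_customer"}}` under id "r1", call `tunnel_client_step(relay)` once, and `relay.response("r1")` returns `{'name': 'Acme Corp', 'tier': 'gold'}`. Every call that crosses the boundary (`next_request`, `submit_response`) originates from the client side, which is why no inbound firewall rule is needed.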
IBM Advances Agentic AI for Trusted Dataset Integration

What’s Happening: IBM highlighted how agentic AI in watsonx.data integration is transforming data access, enabling business users to generate trusted, production-ready datasets in minutes instead of weeks.
Report Includes:
Business users describe needs in natural language; the agent interprets intent, identifies sources, builds pipelines, applies governance, and delivers validated datasets.
Reduces typical 1–4 week ticket-based delays to under 3 minutes.
Built-in guardrails: automatic compliance, access controls, audit trails, lineage, and human-in-the-loop approval.
Connects to over 300 enterprise systems with support for batch, streaming, and replication.
Why It Matters: One of the biggest bottlenecks for agentic AI adoption is access to trusted, timely data. IBM’s approach makes high-quality data self-serviceable while maintaining enterprise governance, accelerating AI initiatives without compromising security or reliability.
Thanks for reading.
See you next week with more AI agent updates.
— Rakesh’s Newsletter


