Hey there 👋

Last week I went again in the internet searching for how production-grade AI agents are implemented by reviewing engineering blogs, case studies, and system design docs from teams operating at scale.

Here’s what I found after reviewing these systems in production.👇

American Express: IT Support & Travel Agents

What they built: Conversational AI agents that handle IT support tickets and travel assistance requests.

How they did it:

  • Trained on historical support interactions and company policies

  • Integrated with existing ticketing systems and travel databases

  • Built escalation logic to route complex cases to humans

What matters: 40% fewer IT escalations and an 85% increase in travel assistance efficiency. The win wasn't about handling every case but about filtering out the noise so experts could focus on what truly required their expertise.

Uber: Finch for Financial Data

What they built: A natural language to SQL agent called Finch that lets finance teams query data without writing code.

How they did it:

  • LLM translates business questions into SQL queries

  • Connected to Uber's data warehouse

  • Added guardrails and query validation for safety

What matters: Eliminated dozens of hours per week of manual SQL writing. Finance people now get answers instantly instead of waiting for data engineers.

Anthropic: Multi-Agent Research System

What they built: A coordinated system where specialized agents collaborate on research tasks: one plans, one searches, one reads, one synthesizes.

How they did it:

  • Each sub-agent handles a specific capability

  • The orchestration layer coordinates the workflow

  • Built with Claude models for reasoning and tool use

What matters: 90.2% performance improvement and 40% faster completion on research tasks. Breaking complex work into specialized agents beats the single-agent approach.

What they built: A multi-step RAG agent that searches across 50+ connected apps and platforms.

How they did it:

  • Semantic search across all data sources

  • Claude for natural language understanding

  • Heavy optimization for speed

What matters: Sub-2-second response times on 95%+ of queries. Speed wasn't a nice-to-have; it was the entire product. Nobody uses a search tool that takes 10 seconds to think.

Airtable: Field Agents for Automation

What they built: Agents that automate workflows through natural language instructions, no code required.

How they did it:

  • Natural language interface for defining automation rules

  • Agents read, write, and manipulate Airtable data

  • Integration with their database and API ecosystem

What matters: 100+ hours of work automated in seconds. Business users became automation creators without needing developers.

Salesforce: Text-to-SQL Agents

What they built: An agent that translates natural language questions into SQL queries against Salesforce data.

How they did it:

  • LLM generates SQL from business questions

  • Implemented data security and access controls

  • Query validation before execution

What matters: Self-serve answers in minutes vs. waiting dozens of hours for analyst support. Reduced the bottleneck on routine data requests.

Intercom: Voice Support Agents

What they built: Voice AI agents that handle customer support phone calls.

How they did it:

  • Natural language understanding for voice

  • Integration with knowledge bases and CRM

  • Seamless handoff to humans when needed

What matters: 56% resolution rate + 5x cost savings per call. The economics work because agents handle volume while humans handle complexity.

DoorDash: Testing & Support Agents

What they built: AI agents for automated testing and customer support on AWS Bedrock.

How they did it:

  • Agents simulate user flows for testing

  • Support agents handle order tracking and common issues

  • Integration with existing infrastructure

What matters: 50x testing capacity increase + 49% fewer transfers to human support + 2.5s response latency. Scaled two bottlenecks simultaneously.

Meta & Aitomatic: Domain-Expert Agents for IC Design

What they built: Domain-Expert Agents (DXA) that captured decades of specialized knowledge for integrated circuit field engineers.

How they did it:

  • Used Llama 3.1 70B fine-tuned on domain data

  • Three-phase approach: Capture (documents, expert knowledge) → Train (augment with synthetic scenarios) → Apply (deploy with connected resources)

  • Neuro-symbolic AI combining neural networks with expert rules

  • On-premises deployment to protect IP

What matters: 3x faster issue resolution + 75% first-attempt success rate (vs. 15-20% from generic AI). The key was capturing and scaling expert knowledge, not just having a good model. Open-source Llama gave them control over sensitive company data.

Shopify: Product Attribute Extraction

What they built: Multimodal agents that extract product attributes from images to improve listings and search.

How they did it:

  • Fine-tuned Llama 2 7B-based LLaVA model on product images

  • Two agents: real-time merchant assistant + offline catalog optimizer

  • Used QLoRA for efficient training, then full-precision for accuracy

  • Deployed on 100 NVIDIA A100 GPUs with LM deploy

  • Process over 10,000 metadata categories

What matters: Tens of billions of tokens processed daily. They reduced the number of parameters from 13B to 7B, not for accuracy but for cost; open-source made massive-scale inference economically viable. Improved merchant experience, better SEO, and conversational search for customers.

McKinsey: Agentic AI Mesh Architecture

What they built: Enterprise architecture for deploying Gen AI agents across 1000+ teams at scale.

How they did it:

  • Centralized governance, decentralized execution

  • Modular agent components that compose for different use cases

  • Standardized deployment patterns and monitoring

What matters: Deployment time went from months to days. The architecture is the Product: It's how you scale agents across an enterprise without chaos.

LinkedIn: Hiring Assistant

What they built: An agent that automates recruiting tasks, candidate sourcing, screening, and outreach.

How they did it:

  • Built on LinkedIn's Gen AI tech stack

  • Natural language interface for recruiter instructions

  • Integration with LinkedIn's professional network data

What matters: 73% of recruiters save 1+ hour per role, + 20x sourcing efficiency,y + 20% revenue increase. Time saved on admin = more time building relationships with candidates.

LinkedIn: Gen AI Tech Stack for Agents

What they built: The infrastructure layer supporting all their AI agents and applications.

The architecture:

  • Foundation models layer (mix of open-source and proprietary)

  • Agent framework for building, testing, and deploying

  • Orchestration layer for workflow management

  • Real-time data infrastructure (LinkedIn's professional graph)

  • Safety and governance systems

What matters: modularity for rapid experimentation, horizontal scalability, and built-in observability. The stack is designed for iteration speed without breaking reliability.

BlackRock: Portfolio Management Agents

What they built: AI agents that analyze market data and provide portfolio recommendations.

How they did it:

  • Agents process market data, financial reports, and economic indicators

  • Integration with proprietary data and models

  • Augments human decision-making with AI insights

What matters: Agents outperform most humans in specific portfolio cases. Faster analysis, more consistent strategy application, better risk assessment.

Vercel: Developer Tool Agents

What they built: AI agents integrated into their development platform for code generation, debugging, and deployment.

How they did it:

  • Natural language interface for dev tasks

  • Context-aware based on project structure

  • Designed for agent-human collaboration

What matters: Their learnings: start narrow, invest in evaluation, design for collaboration, not replacement, iterate on real feedback. Build for developers, not demos.

Thanks for reading. — Rakesh’s Newsletter

Keep Reading