When MCP Fails
The Model Context Protocol (MCP) was meant to be the "USB-C for AI agents" - a universal standard for connecting LLMs to tools and data. Nearly nine months after release, there are thousands of MCP servers and widespread adoption across companies of all sizes, yet most implementations fail to deliver any real value. What's the gap, really?
Wrapping your existing API in an MCP server is the route most people take, and it just won't work, for a multitude of reasons.
I'm seeing the market move toward "done-for-you" automation. Companies like Clay are building AI agents that complete entire workflows, not just answer questions. This is the trend, not the outlier. They've done it well, but for a lot of others, here's the problem: getting your LLM to chain the correct APIs, in the correct order, with the correct data is brutally error-prone.
This post walks through where I'm seeing MCP implementations fail, why most barely work, and what actually works in production. The tl;dr: we need to move from endpoint wrappers to intent-based workflows.
The Common Failure Modes
Let me show you three critical mistakes I see everywhere. The difference between a tool that technically works and one that AI models can actually use effectively comes down to design decisions that might seem trivial at first.
Most companies are taking the easy path: wrap existing APIs, add an MCP server, ship it. Browse the MCP servers on Glama [1] or Smithery [2] - this is the pattern. The result? MCPs that barely work and miss the entire point.
Here's where these implementations fail.
Failure Mode #1: Useless Error Messages
Take this conversation:
- User: "Assign this ticket to John from the engineering team"
- AI: Makes API call
- API: "Error 404: User not found"
- AI: "I couldn't find that user. Can you provide more information?"
- User: Gives up in frustration
That error tells an AI agent nothing. A proper MCP error would look like:
"User 'John' not found. Found 3 possible matches in the "Engineering" team:
1. John Smith (john.smith@company.com) - Engineering
2. John Doe (john.doe@company.com) - Engineering
3. Jonathan Lee (jonathan@company.com) - Engineering
Call createTicket with the email address, or use findUser('John engineering') for more options."
Better yet? Handle the name resolution internally. The AI shouldn't need to understand your UUID system.
Here's the counterintuitive truth: the best tool descriptions aren't the most detailed ones. If your error handling can guide the model to correct usage 90% of the time, lean on that instead of bloating descriptions with every edge case. Piling on edge cases just pollutes your context.
Every response - success or failure - is a chance to guide the model's next action. When returning search results, don't just dump data. Add guidance: "Found 5 results. Use getDetails() for full information on any result." These breadcrumbs are what separate working MCPs from magical ones.
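To make this concrete, here's a minimal sketch of a tool handler in that spirit. It's TypeScript with a stubbed directory lookup; `assignTicket`, `findUsersByName`, and the response shape are all hypothetical, not from any particular MCP SDK. The point is the failure path: candidates plus the exact next call, never a bare 404.

```typescript
// Hypothetical tool handler - the lookup and response shape are illustrative,
// not part of any real MCP SDK.
interface UserMatch {
  name: string;
  email: string;
  team: string;
}

interface ToolResult {
  ok: boolean;
  message: string; // the text the model reads before its next action
}

// Stubbed service-layer lookup so the sketch runs standalone.
function findUsersByName(name: string, team?: string): UserMatch[] {
  const directory: UserMatch[] = [
    { name: "John Smith", email: "john.smith@company.com", team: "Engineering" },
    { name: "John Doe", email: "john.doe@company.com", team: "Engineering" },
    { name: "Johnathan Lee", email: "jonathan@company.com", team: "Engineering" },
  ];
  return directory.filter(
    (u) =>
      u.name.toLowerCase().includes(name.toLowerCase()) &&
      (!team || u.team.toLowerCase() === team.toLowerCase())
  );
}

function assignTicket(ticketId: string, assignee: string, team?: string): ToolResult {
  const matches = findUsersByName(assignee, team);

  if (matches.length === 1) {
    // Unambiguous: resolve the name internally and just do the work.
    return { ok: true, message: `Ticket ${ticketId} assigned to ${matches[0].email}.` };
  }

  if (matches.length === 0) {
    return {
      ok: false,
      message:
        `User '${assignee}' not found. Call findUser('<name> <team>') to search ` +
        `the directory, then retry assignTicket with the matched email address.`,
    };
  }

  // Ambiguous: return the candidates plus the exact next call to make.
  const options = matches
    .map((m, i) => `${i + 1}. ${m.name} (${m.email}) - ${m.team}`)
    .join("\n");
  return {
    ok: false,
    message:
      `User '${assignee}' is ambiguous. Found ${matches.length} possible matches:\n` +
      `${options}\nRetry assignTicket with the email address of the intended person.`,
  };
}

console.log(assignTicket("ENG-142", "John", "Engineering").message);
```

When there's exactly one match, the tool resolves the name itself and just does the work - the model never sees a UUID.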
Failure Mode #2: Token Explosion and Conceptual Confusion
Here's a typical example I see everywhere: A company takes their 80 API endpoints and wraps them directly - a 1:1 mapping, no abstraction. They're proud of the "completeness."
The damage is always immediate and measurable:
- 24,000+ tokens consumed by tool definitions alone
- Before the AI even started working
- Before any actual user request
But token count is literally just the beginning of the problems. Take this common scenario:
The AI encounters a conceptual maze. It's drowning in ambiguous terminology: `resource` vs `entity` vs `object`. When should it use `community_members` vs `space_members`? What's the difference between `flows` and `docs`? The AI has no context for these distinctions - just 80 tools with overlapping, confusing purposes.
Watch how this unfolds. User request: "Find the most active members in our JavaScript community."
Simple, right? The AI took fourteen API calls. And still failed.
Here's the painful journey it went through:
- First, it had to figure out that "community" mapped to "space" in their system
- Then discover that "space_groups" are collections of spaces (not the same thing!)
- Learn that member activity is tracked separately from member profiles
- Realize that "workshops" are a type of "event" but stored in a different endpoint
- Understand that activity metrics require joining data from three different services
Each call added 2-3 seconds. Each failure required backtracking. The user waits over 40 seconds for... an error message.
The fundamental problem: APIs are designed for developers who build mental models over time. They read docs, experiment, learn the relationships. Each interaction builds on the last.
AI models? They start fresh every single conversation. LLMs are, at the end of the day, pure stateless functions. They can't "remember" that in your system, "spaces" are different from "groups" which are different from "communities." They rely entirely on what you tell them in that moment.
Unlike developers who internalize these concepts over weeks, the AI gets one shot to understand your entire data model. And when you dump 80 endpoints on it? Game over.
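If you want a feel for the token side of this, here's a rough back-of-the-envelope sketch. The chars/4 heuristic is only an approximation, and these stub definitions are far leaner than real ones with full JSON Schemas, enums, and examples - so treat the output as a floor, not the 24,000-token reality.

```typescript
// Rough token-budget check for tool definitions. The chars/4 figure is a
// common approximation, not an exact tokenizer, and these stubs are much
// smaller than real definitions with full JSON Schemas.
interface ToolDef {
  name: string;
  description: string;
  parameters: Record<string, string>;
}

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Imagine 80 endpoint wrappers, each with a modest description and schema.
const wrappedEndpoints: ToolDef[] = Array.from({ length: 80 }, (_, i) => ({
  name: `endpoint_${i}`,
  description:
    "Lists resources for the given entity. Accepts filters, pagination, and " +
    "sort order. See related endpoints for members, spaces, groups, and flows.",
  parameters: { entity_id: "string", filter: "string", page: "number", sort: "string" },
}));

const totalTokens = wrappedEndpoints.reduce(
  (sum, tool) => sum + estimateTokens(JSON.stringify(tool)),
  0
);

// Every one of these tokens is spent before the user has asked for anything.
console.log(`~${totalTokens} tokens of tool definitions in every request`);
```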
Failure Mode #3: Multi-Step Hell
Here's the most painful failure mode I see: forcing LLMs to play systems integrator instead of focusing on the actual task.
"Create a ticket and assign it to John" should be ONE MCP call, not four.
Yet I keep seeing implementations that require:
- Call `searchUsers('John')` → get a list of Johns
- Call `getUserDetails(user_id)` → get John's full profile
- Call `getProjects()` → find the right project
- Call `createTicket(project_id, user_id, details)` → finally create the ticket
Each call adds 2-3 seconds of latency. Something simple ends up taking 10+ seconds for no reason. Users complain about sluggish multi-step processes. Figma's AI chat has become the negative benchmark everyone cites - "Don't be like Figma."
The problem? We're forcing LLMs to orchestrate brittle multi-call sequences when they should be inferring intent - which LLMs are genuinely great at: taking intent and producing structured data to invoke tools.
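Here's a sketch of the intent-based alternative: one tool the model can call, with the orchestration (user resolution, project lookup, ticket creation) handled server-side. Every function name below is a hypothetical stand-in for your own service layer, stubbed so the example runs on its own.

```typescript
// Hypothetical intent-based tool: one call from the model; the orchestration
// (user resolution, project lookup, ticket creation) happens server-side.
interface CreateTicketRequest {
  title: string;
  assignee: string; // a name like "John", not a UUID
  project?: string; // optional human-readable project name
}

interface CreateTicketResult {
  ok: boolean;
  message: string;
}

// Stubbed service-layer calls so the sketch is self-contained.
const resolveUser = (name: string): string | null =>
  name.toLowerCase().startsWith("john") ? "user-7f3a" : null;
const resolveProject = (name?: string): string => (name ? "proj-42" : "proj-default");
const insertTicket = (projectId: string, userId: string, title: string): string =>
  `TCK-1042 ("${title}" in ${projectId}, owner ${userId})`;

function createAndAssignTicket(req: CreateTicketRequest): CreateTicketResult {
  const userId = resolveUser(req.assignee);
  if (!userId) {
    // Even the failure path tells the model what to try next.
    return {
      ok: false,
      message: `Couldn't resolve '${req.assignee}' to a user. Retry with a full name or email.`,
    };
  }
  const projectId = resolveProject(req.project);
  const ticket = insertTicket(projectId, userId, req.title);
  return { ok: true, message: `Created ${ticket}, assigned to ${req.assignee}.` };
}

console.log(createAndAssignTicket({ title: "Fix login bug", assignee: "John" }).message);
```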
The Automation Layer Solution
After watching these failures repeat across dozens of companies, I've identified what actually works. Traditional web dev follows a predictable pattern: user performs action → system responds with predetermined outcome. It's deterministic and immediate.
But with LLM systems, we're moving toward something fundamentally different: user expresses intent → system interprets meaning → system determines and executes actions (plural).
This paradigm shift requires an automation layer - a deterministic workflow engine that sits between your source of truth and your AI applications. This layer handles the complex orchestration that LLMs shouldn't be responsible for.
Think of it as a bridge:
Source of Truth (your APIs, databases, business logic) → Automation Layer (intent-based workflows, smart routing) → AI Applications (chatbots, agents, browser extensions, voice interfaces)
The issue isn't with MCP itself, but with how we're forcing LLM interactions into traditional request-response patterns without this intelligent middle layer.
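One way to picture that middle layer: a small registry of deterministic workflows keyed by intent. The model's only job is to pick an intent and fill in parameters; the layer runs the fixed steps against your source of truth. This is a toy sketch under those assumptions - the `update_salary` workflow and the stubbed API are illustrative, not any real product's architecture.

```typescript
// A toy automation layer: intents map to deterministic workflows. The LLM's
// only job is choosing the intent and filling in the parameters.
type Params = Record<string, string>;
type Ctx = Record<string, unknown>;
type Step = (params: Params, ctx: Ctx) => void;

interface Workflow {
  description: string;
  steps: Step[];
}

// Stubbed "source of truth" calls so the sketch runs on its own.
const api = {
  lookupEmployee: (name: string) => ({ id: "emp-19", name }),
  requesterCanEditPayroll: (_requestedBy: string) => true,
  updatePayroll: (id: string, salary: string) => `${id} -> ${salary}`,
  writeAudit: (entry: string) => console.log(`[audit] ${entry}`),
};

const workflows: Record<string, Workflow> = {
  update_salary: {
    description: "Change an employee's salary with policy checks and an audit trail.",
    steps: [
      (p, ctx) => { ctx.employee = api.lookupEmployee(p.employee); },
      (p, ctx) => { ctx.allowed = api.requesterCanEditPayroll(p.requested_by); },
      (p, ctx) => {
        if (!ctx.allowed) throw new Error("Approval required before updating salary.");
        ctx.result = api.updatePayroll((ctx.employee as { id: string }).id, p.new_salary);
      },
      (p) => api.writeAudit(`salary change for ${p.employee} requested by ${p.requested_by}`),
    ],
  },
};

function runIntent(intent: string, params: Params): string {
  const wf = workflows[intent];
  if (!wf) return `Unknown intent '${intent}'. Available: ${Object.keys(workflows).join(", ")}`;
  const ctx: Ctx = {};
  wf.steps.forEach((step) => step(params, ctx)); // deterministic, in order
  return `Done: ${String(ctx.result)}`;
}

console.log(runIntent("update_salary", {
  employee: "Marco",
  new_salary: "150000",
  requested_by: "hr-lead",
}));
```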
How Circle.so Got It Right: From 80 Endpoints to 12 Intent Tools
Let me show you what actually works. Circle transformed their 80-endpoint API into just 12 intent-based tools [3]:
- Finding things: `findContent`, `findMembers`, `findSpaces` (replaced ~15 endpoints)
- Understanding activity: `getSpaceActivity`, `getSpaceEvents` (replaced ~20 endpoints)
- Taking action: `publishContent`, `moderateContent`, `manageMembership` (replaced ~25 endpoints)
- Communication: `sendDirectMessage`, `bulkMessage` (replaced ~10 endpoints)
- Automation: `watchForChanges`, `exportData` (replaced ~10 endpoints)
Now when someone asks "Find the most active members in our JavaScript community who haven't attended workshops," it's just three clean calls instead of fourteen [4]. Each tool represents complete user intent, not technical operations.
The AI doesn't need to understand that "spaces" are different from "groups" [5]. It just knows it can `findSpaces` and `getSpaceActivity`. Beautiful.
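To give a feel for how those three clean calls might chain, here's a rough sketch. The tool names come from the list above, but their signatures, payloads, and the stubbed data are my assumptions - the thing to notice is that each response carries exactly the identifiers (and the hint) the next call needs.

```typescript
// Hypothetical shapes for the three-call flow. Each response carries exactly
// the identifiers the next tool needs, plus a hint about what to call next.
interface FindSpacesResult {
  spaces: { space_id: string; name: string }[];
  next: string;
}
interface SpaceActivityResult {
  members: { member_id: string; posts: number; attended_workshops: boolean }[];
  next: string;
}

// Stubbed tools standing in for the real MCP server.
const findSpaces = (_query: string): FindSpacesResult => ({
  spaces: [{ space_id: "spc-js", name: "JavaScript" }],
  next: "Call getSpaceActivity(space_id) for member activity in a space.",
});

const getSpaceActivity = (_spaceId: string): SpaceActivityResult => ({
  members: [
    { member_id: "m-1", posts: 42, attended_workshops: false },
    { member_id: "m-2", posts: 35, attended_workshops: true },
    { member_id: "m-3", posts: 28, attended_workshops: false },
  ],
  next: "Call findMembers(member_ids) to get names and emails.",
});

// "Most active members in our JavaScript community who haven't attended workshops":
const space = findSpaces("JavaScript").spaces[0];
const active = getSpaceActivity(space.space_id)
  .members.filter((m) => !m.attended_workshops)
  .sort((a, b) => b.posts - a.posts);

// The third call, findMembers, would hydrate these IDs into profiles.
console.log(active.map((m) => m.member_id)); // ["m-1", "m-3"]
```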
The Native Toolshed Alternative
The compelling use case - giving Cursor new capabilities just by injecting your tool - won't work most of the time. In production agents, you need to tailor the system message and architecture to the tools you make available.
If you own the entire stack - UI, prompts, architecture, tools - you can actually deliver on customers' rising expectations. Otherwise, good luck.
For internal automation, you're often better off with native tool definitions that you control completely. These tools use service layers defined in your codebase, not wrappers over public APIs. They can't be made public because they come from deep in the codebase - methods that wrap your services with permissioning, dependencies, and so on.
This approach is better for things that aren't available publicly and where you need tight control over:
- Permissions - handled within your codebase
- Dependencies - managed through your service architecture
- Context - you control the prompt and system message
- Reliability - no external network calls or server management
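A minimal sketch of what such a native tool can look like, assuming a hypothetical internal billing service and a caller context your agent runtime already has. The names are placeholders; the point is that permissioning lives in code you control, not in a prompt.

```typescript
// A native tool living in your own codebase: it calls the service layer
// directly, enforces permissions in code, and never crosses a network
// boundary. All names here are placeholders for your own services.
interface RequestContext {
  userId: string;
  roles: string[];
}

class PermissionError extends Error {}

// Imaginary internal service - the thing you'd never expose publicly.
const billingService = {
  refund(invoiceId: string, amount: number) {
    return { invoiceId, amount, status: "refunded" };
  },
};

const nativeTools = {
  issueRefund(ctx: RequestContext, invoiceId: string, amount: number) {
    // The permission check lives here, in code you control, not in a prompt.
    if (!ctx.roles.includes("billing:write")) {
      throw new PermissionError("Caller is not allowed to issue refunds.");
    }
    return billingService.refund(invoiceId, amount);
  },
};

// The agent runtime passes the caller's context along with the tool arguments.
const ctx: RequestContext = { userId: "u-88", roles: ["billing:write"] };
console.log(nativeTools.issueRefund(ctx, "INV-1042", 49.0));
```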
Why even start to look into MCP then? An actual answer I've gotten to this question: "MCP is good for a blog post."
Current limitations:
- Fragmentation - Non-standard servers, inconsistent tool names & schemas, imperfect documentation
- Generic wrappers don't work - Success comes from custom per-server handling
- Tool proliferation - 1000 tools from an MCP will fill your context window
MCP is most useful when you want to bring tools to an agent you don't control, or when you need externalization for compliance/security reasons.
The Quality and Safety Challenges
- Groundedness: Where does the advice come from? Humans. The move toward evals involves generating datasets and judging conversations with LLM-as-judge, but the advice itself should come from human ground truth - capture your customer success managers' knowledge.
- High quality bar: The long tail of edge cases means "works for these things but not those" is unacceptable. Production systems need near-100% reliability.
- Safety & approvals: In big companies, this is critical. Many organizations have disabled "update" actions due to missing approval flows [6]. Action tools stay disabled until an approval layer is added.
- Evals are essential: Generate datasets and use LLM-as-judge [7] on conversations. Errors need diagnostic detail and repair hints, not just failure notifications.
Understanding Your User: The LLM
Traditional APIs serve deterministic software. MCPs serve LLMs that think in conversations and work with ambiguous input. When you ask an AI agent to "assign this ticket to John," it shouldn't need 4 separate API calls to find John's UUID, look up project IDs, then create the ticket.
"Create a ticket and assign it to John" should be ONE MCP call, not four.
The Latency Problem
Tool-call chains (often 20+ tools) make experiences painfully slow. Figma's AI chat is cited as a negative benchmark - users complain about sluggish multi-step processes.
Something simple ends up taking 10+ seconds for no reason. If we designed for the intent of the LLM we could get much better, faster results. This might require:
- Combining API calls in a single MCP call
- Running AI enhancements of content in the MCP call
- Building intelligence into your MCP server to handle ambiguity
- Returning what LLMs actually need, not what your existing API dumps out
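On that last point, here's a small sketch of shaping a response for the model instead of dumping the raw payload. The verbose invoice object and the chosen fields are made up, but the pattern - summarize, project only what's needed, hint at the next tool - carries over.

```typescript
// Shape the response for the model: drop fields the LLM doesn't need, add a
// one-line summary, and hint at the follow-up tool. The verbose rawInvoice
// below is a made-up stand-in for a typical API payload.
const rawInvoice = {
  id: "inv_9f2",
  amount_cents: 1200000,
  currency: "USD",
  internal_ledger_ref: "LG-2231-A", // noise for the model
  created_by_service: "billing-svc-7", // noise for the model
  due_date: "2025-08-01",
  line_items: Array.from({ length: 40 }, (_, i) => ({ sku: `SKU-${i}`, qty: 1 })),
};

function shapeForModel(invoice: typeof rawInvoice) {
  return {
    summary: `Invoice ${invoice.id}: $${(invoice.amount_cents / 100).toLocaleString()} due ${invoice.due_date}.`,
    invoice_id: invoice.id,
    total_usd: invoice.amount_cents / 100,
    due_date: invoice.due_date,
    line_item_count: invoice.line_items.length, // a count, not 40 raw objects
    next: "Call getLineItems(invoice_id) only if the user asks for a breakdown.",
  };
}

console.log(JSON.stringify(shapeForModel(rawInvoice), null, 2));
```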
Backend Realities
Take a typical HR platform as an example. Internal APIs are complex controller logic (not clean REST). Multiple ID types (worker/HRIS/profile/user) confuse sequential tool use. Contractors vs. employees often have separate endpoints - inconsistency hurts chaining.
The companies getting this right are building MCPs that feel magical. One request accomplishes what used to take multiple API calls.
LLM constraints you must understand:
- No persistent learning - They can't remember between conversations
- Limited exploration - They don't experiment or learn from mistakes
- Context ≠ comprehension - They rely on your descriptions right now
- Token bloat - Beware dumping 80+ endpoints as tools
Real-world examples:
Clay's Claygent enables in-table automation with row-level actions (create Notion docs, create Linear tickets) and cross-row AI actions inside tables. Success comes from non-lazy, spec-aware implementations per server.
The principles that actually work:
- Build intent-based tools that encapsulate multi-step workflows
- Return next-step guidance in every response
- Handle the complexity internally, not in the AI's reasoning loop
This is the foundation of the automation layer - and it's what separates MCPs that work from those that don't.
Why This Shift Is Coming Now
The market is racing toward "done-for-you" automation. Companies like Clay are building AI agents that complete entire workflows, not just answer questions. But there are catches.
The Last Mile Problem: Why 95% Isn't Good Enough
Here's the brutal truth about AI automation that nobody wants to talk about.
A security compliance team put it perfectly: "We can't go to full automation without 100% reliability. Even the last 5% requires an approval flow and verification process."
Think about that. 95% accuracy isn't enough - the last 5% failures kill the entire user experience.
When AI struggles with complex tool chains (10-20+ steps), that 5% failure rate compounds. A simple request becomes a frustrating game of "what went wrong this time?" This is why, for financial operations, users still prefer buttons and confirmations. It's not about the tech - it's about perceived control and safety.
"No-UI" requires near-100% correctness. And let's be honest - that's not realistic today in many domains. Payments involve third parties, geographic variance, switching rails. Too many variables for deterministic automation.
The UI vs. Automation Paradox
Here's the paradox that's racking my brain: we're building toward "done-for-you" automation, yet right now we're actually seeing more UI, not less...
Wait, what?
Three economic forces are driving this counterintuitive trend:
1. Feature velocity explosion: AI makes building features 3-5x faster. Internal teams are shipping more because the barrier to creating features has collapsed. It's feature inflation - when you can build anything quickly, you build everything.
2. The copy-paste economy: Remember when that guy built a free DocuSign alternative in a weekend using AI [8]? That's the new reality. Your moat isn't your features anymore - they're cloneable overnight. So what do companies do? They go for the platform play. One-stop-shop. Every feature in one place. More features = more UI.
3. Pipeline crisis: AI outbounding has created a race to the bottom. Nobody reads LinkedIn AI messages anymore. Pipeline is vanishing. When you can't generate new customers cheaply, you need to expand revenue from existing ones. More products, more features, more things to upsell.
The result? We're entering the validation economy - people spend more time reviewing AI outputs than entering data. But they still need sophisticated interfaces for that review process.
Case Study: Clay's Activation Problem
Clay's "Sculptor" AI provides the perfect case study. They pushed new users to a chat-only interface. Guess what happened?
Activation tanked.
The results were brutal: the vast majority of users avoided the chat entirely. Those who engaged showed incredible activation rates, but losing most of your users at the door? That's a disaster.
The core problem isn't the AI - it's that many users don't know what to do with Clay. They need guidance, not a blank chat box. Execution is just the tip of the iceberg. The real challenge is the idea generation and guidance layer that's missing.
Chat is powerful for users who know exactly what they want. But for discovery? For learning what's possible? It's terrible.
Clay's solution: They've nailed it with a combined approach. Claygent blends chat with wizard-style guidance, meeting users where they are in their journey.
Case Study: Central HQ's Automation-First Approach
Now look at Central HQ [9] - they took the opposite approach and it's working brilliantly.
The key insight? Meet users where they already are. Instead of building another HR dashboard that forces users to context-switch, they brought HR directly into Slack - where teams already spend 8+ hours a day.
Type "Change @Marco's salary to $150k" in Slack and watch the magic:
- Permissions validated ✓
- Compliance rules checked ✓
- Multiple payroll systems updated ✓
- Audit trail created ✓
- Response: "Done. Effective next payroll cycle, tax implications handled."
One message. Complete automation. No UI.
Their partnership with Puzzle's accounting APIs [10] shows the power of the automation layer. Traditional approach? Four API calls to check cash flow. Central's approach? "You have $47K available, next payroll is $12K on Friday."
But here's the key insight: Central targets small teams who want speed over control. Traditional HR platforms target large teams who need dashboards and oversight.
This isn't random. It's classic disruption theory in action.
Why Big Companies Have a Blind Spot
Clayton M. Christensen explained this pattern decades ago in "The Innovator's Dilemma" [11]. Large companies focus on their most demanding customers - the ones who pay for features, compliance, and control. They're blind to the disruptive tech that initially seems worse but is progressing faster.
Central HQ looks primitive compared to enterprise HR platforms. No dashboards! No complex approval workflows! Just Slack messages!
But that's exactly the point.
Small teams don't need 200 features. They need five things that work perfectly through the interface they're already using all day. While enterprise platforms add another compliance module for Fortune 500 companies, Central is learning how to handle 80% of HR tasks through chat.
This is the power of meeting users where they are - no new logins, no context switching, no learning curve. Just type your request where you already work. This will be a massive benefit of integrating LLMs in my opinion.
The pattern is predictable:
- Start simple - Serve underserved segments with "worse" but more convenient solutions
- Learn fast - Small teams provide rapid feedback loops
- Move upmarket - Add enterprise features only after nailing the core experience
By the time large companies notice, the disruptor has both the momentum and the learning advantage. This is why MCP matters - not for the enterprises building comprehensive platforms, but for the startups building focused, intent-based automation.
Where MCP Actually Fits (Spoiler: Not Where You Think)
After all this, you might think I'm anti-MCP. I'm not. I'm anti-bad-MCP.
Here's what I've learned: MCP is terrible for internal automation but brilliant for the long tail.
Let me explain.
The Internal Tools Trap
For internal automation, you're better off with native tool definitions that you control completely. These tools use service layers deep in your codebase - methods that wrap your services with permissions, dependencies, context. They can't be made public because they're coupled to your infrastructure.
One engineering lead told me bluntly: "MCP is good for a blog post."
Harsh? Maybe. But he's not wrong for internal use cases.
If you own the entire stack - UI, prompts, architecture, tools - you can actually deliver on customers' increasing expectations. You need to follow 12-factor agent principles [12], control your context, optimize for your specific use cases.
MCP adds a layer of abstraction you don't need.
The Long Tail Opportunity
But here's where MCP shines: the infinite long tail of integrations.
Think Zapier, not Salesforce.
The right comparison isn't "an agent I could build with MCP" vs "a polished agent with native tools." It's "the thousand small automations I need" vs "the five polished agents someone built."
There's an infinite number of workflows - connecting email to Sheets, Slack to Notion, Airtable to Linear. Nobody's going to build polished agents for each combination. With MCP, you can wire them together yourself.
This is the real promise: democratizing automation for the long tail of use cases.
The Three Progressive Steps: From Naive to Native
I've watched dozens of companies go through the same painful evolution. Let me save you some time.
Stage 1: The API Wrapper Phase
"Let's just wrap our API 1:1!"
- Result: 80 endpoints = 24,000 tokens = unusable
- Example: Separate tools for `users.list`, `channels.list`, `chat.postMessage`
- Duration: 2-4 weeks before they realize it's broken
Stage 2: The Workflow Phase
"Let's combine related API calls!"
- Result: Better, but still brittle
- Example: `sendSlackUpdate` that handles multiple calls but can't resolve "engineering team" to actual channels
- Duration: 2-3 months of bandaids and patches
Stage 3: The Automation Layer
"Let's build intent-based tools with business logic!"
- Result: Actually works
- Example: `communicateWithTeam` that understands your org, handles permissions, routes intelligently
- This is where you should start
The final stage requires building what I call the automation layer - a deterministic workflow engine between your source systems and AI applications. Stop forcing LLMs to be systems integrators. Let them express intent, then handle the complexity yourself.
Where MCP Excels Today
Despite the problems, MCP has genuine use cases:
- Externalization: When you need tools available to agents you don't control. Your customers' Cursor installations. Third-party automation platforms. External AI assistants.
- Non-technical builders: Subject matter experts who want to create automations without touching code. They won't edit agent logic. They need plug-and-play.
- Integration layer: Bringing external data into your app - if done right. Intercom [13], Notion [14], and Linear already have solid MCP servers. Users can connect via URL + API key.
But remember: 1000 tools from a poorly designed MCP will destroy your context window. Design for intent, always.
Building Better MCPs: A Practical Guide
If you're going to build MCPs despite everything I've said, here's how to not suck at it.
The 90/10 Rule
Keep descriptions lean. If your error handling can guide the model 90% of the time, don't bloat descriptions with edge cases. Let smart errors teach.
Use XML tags for structure:
- `<usecase>` - WHAT the tool does and WHEN to use it
- `<instructions>` - HOW to use it correctly
Don't write paragraphs. Write breadcrumbs.
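A quick sketch of a description in that style - the tool name is hypothetical and the wording deliberately terse:

```typescript
// A lean, XML-tagged tool description: one usecase block, one instructions
// block, nothing else. Edge cases are left to smart error messages.
const findMembersDescription = `
<usecase>
Finds members of a community by name, role, or activity. Use when the user
asks about people ("who", "members", "most active").
</usecase>
<instructions>
Pass a plain-language query (e.g. "active members in the JavaScript space").
Results include member_id and email; pass member_id to other tools.
</instructions>
`.trim();

console.log(findMembersDescription);
```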
The Intent Principle
Never expose your internal data model. The AI shouldn't know that "spaces" differ from "groups" in your system. Build tools around user intent:
Bad: `getSpaceMembers`, `getGroupMembers`, `getCommunityUsers`
Good: `findMembers` (handles all the complexity internally)
The Circle.so Pattern
Transform endpoint soup into intent tools:
- ~80 endpoints → ~12 intent-based tools
- Group by what users want to do, not how your system works
- Every response structures data to feed the next tool cleanly
Safety First
Isolate destructive operations. Require approvals for updates and deletes. Your model might misinterpret a request - don't let it destroy production data.
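A minimal sketch of that guardrail, assuming a flat list of tool names you consider destructive: flagged calls get parked for a human instead of executing. How approvals are surfaced (Slack message, dashboard, whatever) is up to you.

```typescript
// A minimal approval gate: tools flagged as destructive are parked for a
// human instead of executing. The tool names are illustrative only.
type ToolCall = { tool: string; args: Record<string, unknown> };

const DESTRUCTIVE = new Set(["deleteSpace", "updateSalary", "bulkMessage"]);
const pendingApprovals: ToolCall[] = [];

function execute(call: ToolCall, runTool: (c: ToolCall) => string): string {
  if (DESTRUCTIVE.has(call.tool)) {
    pendingApprovals.push(call);
    return `'${call.tool}' queued for human approval (${pendingApprovals.length} pending).`;
  }
  return runTool(call);
}

console.log(execute({ tool: "deleteSpace", args: { space_id: "spc-js" } }, () => "executed"));
```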
The Path Forward
Let me tell you what's actually going to happen.
The holy grail - "users say what they want, and it's done" - requires near-perfect reliability. We're not there. We might never be there for critical operations.
But that's okay.
The companies getting this right aren't trying to eliminate humans from the loop. They're building automation layers that handle the complex orchestration, then letting humans validate the important bits.
User Intent → AI Reasoning → Automation Layer → Source Systems, with human validation looping back in when needed.
The market is already segmenting:
Small teams go automation-first. Central HQ in Slack. Speed over control. Move fast, fix things.
Large teams stay UI-first. Enterprise dashboards. Compliance and audit trails. Move carefully, break nothing.
Both are right for their segments.
For Builders
Here's my advice after watching this space evolve:
If you control the full stack: Skip MCP. Build native tools with tight integration. You'll move faster and build better.
If you need externalization: MCP can work, but design for intent, not endpoints. Your 80 endpoints should become 12 intent tools.
Whatever you build: Ship with reliability in mind. That 95% accuracy rate? It kills the user experience when your app is built for doing, not just suggesting. Or at least give users a way to revert :)
The Real Future
The medium-term future isn't "no UI" - it's validation over entry. People will spend more time reviewing AI outputs than entering data. But they'll still need sophisticated interfaces for that review.
We're not eliminating UI. We're eliminating data entry.
We're not eliminating humans. We're eliminating tedium.
And please, for the love of all that is holy:
Stop shipping endpoint wrappers.
References
1. Glama: "MCP Servers Directory", 2024.
2. Smithery: "MCP Server Registry", 2024.
3. UseAI: "MCP Tool Design: From APIs to AI-First Interfaces", detailed case study on transforming Circle's 80 API endpoints into 12 intent-based tools, June 2025.
4. Circle API Documentation: "Optimizing API Usage", official guidance on efficient API patterns for the Circle platform.
5. Circle Developer Platform: "Get to Know the Circle Developer Platform", understanding Circle's data model and terminology.
6. MCP Security: "Security Best Practices", Model Context Protocol specification.
7. "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena", NeurIPS 2023.
8. Aadit Sheth: "Built a free DocuSign alternative in a weekend using AI", Twitter/X, 2024.
9. Central HQ: company website, 2024.
10. Puzzle: "Case Study: How Central Built an AI-Powered Accounting Assistant Using Puzzle's Modern APIs", 2025.
11. Clayton M. Christensen: "The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail", Harvard Business Review Press, 1997.
12. Dex Horthy: "12-Factor Agents: Patterns of Reliable LLM Applications", HumanLayer, 2024.
13. Intercom: "Model Context Protocol (MCP)", developer documentation for Intercom's MCP server.
14. Notion: "Notion MCP – Connect Notion to your favorite AI tools", developer documentation for Notion's MCP integration.