The Integration Illusion
Most companies build integrations the wrong way. I know because we did too.
At SAMMY, every new customer request followed the same pattern: "We need to integrate with Zendesk to make this work." So we'd build a custom Zendesk connector. Six months later: "What about Intercom?" Another custom connector. Then Jira, Document360, Linear - each one a separate project, a separate maintenance burden.
After building integration pipelines for dozens of platforms, I realized we were solving the same problem over and over. Every integration felt unique, but they all followed identical patterns. The differences weren't fundamental - they were surface-level variations on how humans organize knowledge.
This changed everything. Instead of building custom connectors for every platform, we built one system that handles them all. The trick here isn't better integration tools or APIs. It's recognizing the universal patterns that exist beneath platform-specific differences.
The Universal Pattern Hidden in Plain Sight
Here's the insight that changed how we thought about integrations, using docs as the running example: every knowledge platform uses the same fundamental structure.
It doesn't matter if you're looking at Zendesk, Intercom, Notion, or Confluence. Strip away the API differences, field names, and authentication schemes, and you'll find the same core pattern:
- Containers (folders, categories, collections) that organize content
- Content (documents, articles, tickets) that lives within containers
- Hierarchical relationships that create navigable structure
- Metadata that adds context and meaning
This isn't coincidence - it's how humans naturally organize information. Whether you're organizing files on your computer, books in a library, or thoughts in your mind, you use the same hierarchical pattern.
This universality extends beyond documentation. Ticketing systems, messaging platforms, even web scraping - they all follow variations of this same structure. The key insight: what feels like completely different integration challenges are actually the same problem wearing different clothes.
This sounds ridiculously simple and obvious, but it changes a great deal about the design.
From Chaos to Clarity
Let me show you the difference this approach makes with a real example.
Custom Connectors Everywhere
```python
# Zendesk connector
class ZendeskConnector:
    def get_categories(self):
        return self.api.get('/help_center/categories')

    def get_articles(self, category_id):
        return self.api.get(f'/help_center/categories/{category_id}/articles')

# Intercom connector (completely different)
class IntercomConnector:
    def get_collections(self):
        return self.api.get('/articles')

    def get_articles(self, collection_id):
        return self.api.get(f'/articles?collection_id={collection_id}')

# Result: 3 platforms = 3 separate codebases
```
One System for Everything
```python
# Single loader handles all platforms
loader = HelpCenterLoader(
    integration_type=IntegrationType.ZENDESK,  # or INTERCOM, or JIRA
    content_type=ContentType.HTML,
    locale=LocaleISO639.EN,
)

# Same method works for any platform
hierarchical_data = load_json_data(data_dir)
loader.load_hierarchical_data(hierarchical_data)

# Result: 10 platforms = 1 codebase
```
The difference is dramatic: instead of building N connectors for N platforms, you build one system that handles them all.
Stateful Design
The real problem wasn't the number of connectors. It was how we built them.
Look at what every integration tutorial teaches you:
# The "obvious" way - stateful design
class ZendeskConnector:
def __init__(self):
self.fetched_resources = [] # Hidden state!
async def fetch_resources(self, ids):
self.fetched_resources = await self.api.get_articles(ids)
return self # Method chaining feels clever
async def transform_content(self):
# Reads from hidden state - must call fetch first!
for resource in self.fetched_resources:
resource.content = markdown(resource.html)
return self
async def upload(self):
# More hidden state dependency
return await self.db.insert(self.fetched_resources)
Looks reasonable, right? Wrong. This design creates a cascade of problems:
- Temporal coupling - Methods must be called in exact order
- Testing nightmare - Can't test `transform` without calling `fetch` first
- Hidden dependencies - What state does each method need?
- Reusability? Forget it - State persists between operations
The fix? Make everything stateless:
```python
# The better way - stateless design (pure functions)
class ZendeskConnector:
    async def fetch_batch(self, resources, context) -> FetchResult:
        # Input → Output, no hidden state
        articles = await self.api.get_articles(resources)
        return FetchResult(succeeded=articles, failed=[])

    async def transform_resource(self, raw_data, context) -> StandardizedResource:
        # Pure transformation - testable in isolation
        return StandardizedResource(
            content=markdown(raw_data.html),
            title=raw_data.title,
        )
```
This shift alone dramatically reduced our testing time. No more elaborate state setup. Just pass data in, assert on what comes out.
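To make that concrete, here's a minimal sketch of what a test looks like under this design. Everything here (the `RawArticle` stand-in, the simplified transform) is illustrative, not our production code:

```python
# Minimal sketch of why stateless transforms are easy to test: no setup,
# no fetch call, no database - just input in, assertion out.
import asyncio
from dataclasses import dataclass

@dataclass
class RawArticle:
    html: str
    title: str

@dataclass
class StandardizedResource:
    content: str
    title: str

async def transform_resource(raw: RawArticle) -> StandardizedResource:
    # Stand-in for the real HTML-to-markdown conversion
    return StandardizedResource(content=raw.html.strip(), title=raw.title)

async def test_transform_resource():
    raw = RawArticle(html="<h1>Hello</h1>", title="Greeting")
    result = await transform_resource(raw)
    assert result.title == "Greeting"
    assert "Hello" in result.content

asyncio.run(test_transform_resource())
```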
The Four-Layer Architecture (and the Patterns That Make It Work)
This transformation is possible because of a carefully designed abstraction hierarchy. Architecture diagrams are lies without the patterns that implement them.
Here's how we structure every integration:
Each level has a specific responsibility:
- Level 1: Platform APIs - Handle the messy reality of different APIs, authentication, and rate limits
- Level 2: Platform Normalizers - Transform platform-specific data into universal concepts
- Level 3: Unified Data Model - Store everything in the same structure regardless of source
- Level 4: Business Logic - Build features that work across all platforms
The magic happens at Level 2: this is where Zendesk "categories" become "folders", Intercom "collections" become "folders", and so on. Once normalized, everything downstream is identical.
This separation means platform complexity never leaks into your core system. Your search, analytics, and business logic work the same whether data came from Zendesk or Slack.
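To sketch what a Level 2 normalizer can look like (the function and field names here are illustrative assumptions, not our production code):

```python
# Illustrative Level 2 normalizers: platform vocabulary in, universal
# concepts out. A Zendesk "category" and an Intercom "collection" both
# leave this layer as a plain "folder".
from typing import Any

def normalize_zendesk_category(category: dict[str, Any]) -> dict[str, Any]:
    return {
        "type": "folder",                               # universal concept
        "id": str(category["id"]),
        "title": category["name"],
        "metadata": {"source": "zendesk", **category},  # preserve everything
    }

def normalize_intercom_collection(collection: dict[str, Any]) -> dict[str, Any]:
    return {
        "type": "folder",                               # same universal concept
        "id": str(collection["id"]),
        "title": collection["name"],
        "metadata": {"source": "intercom", **collection},
    }
```

Everything downstream consumes folders; the caller never knows or cares which platform produced them.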
The Patterns Behind the Magic
But how do you actually BUILD this? After trying (and failing) with several approaches, we landed on four battle-tested design patterns:
1. Template Method Pattern[^1] - The orchestrator defines the algorithm skeleton:
```python
class IngestionOrchestrator:
    async def ingest(self, resources, context):
        # Fixed pipeline - never changes
        fetch_result = await self._fetch_resources(...)     # Hook
        validated = await self._validate_resources(...)     # Hook
        transformed = await self._transform_resources(...)  # Hook
        persisted = await self._persist_resources(...)      # Hook
        return self._build_result(...)                      # Fixed
```
This is your North Star. The pipeline structure never changes, only the implementations vary.
2. Strategy Pattern[^2] - Each platform implements the hooks differently:

```python
from abc import ABC, abstractmethod

class PlatformClient(ABC):
    @abstractmethod
    async def fetch_batch(self, resources, context) -> FetchResult:
        """Platform-specific fetching"""

    @abstractmethod
    async def transform_resource(self, raw_data, context) -> StandardizedResource:
        """Platform-specific transformation"""
```
3. Registry Pattern[^3] - Auto-discovery without if/else chains:

```python
# Old way - modify factory for each platform (gross)
if platform == "ZENDESK":
    return ZendeskClient()
elif platform == "JIRA":  # Must modify!
    return JiraClient()

# New way - auto-registration (chef's kiss)
register_platform_client(PlatformEnum.JIRA, JiraClient)
# Factory automatically knows about it
```
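Under the hood the registry can be as small as a dict plus a factory. A sketch, with an assumed `PlatformEnum` standing in for the real one:

```python
# Minimal registry sketch: one dict, one registration call, one factory.
# Adding a platform means registering it - never editing the factory.
from enum import Enum

class PlatformEnum(Enum):
    ZENDESK = "zendesk"
    INTERCOM = "intercom"
    JIRA = "jira"

_CLIENT_REGISTRY: dict[PlatformEnum, type] = {}

def register_platform_client(platform: PlatformEnum, client_cls: type) -> None:
    _CLIENT_REGISTRY[platform] = client_cls

def create_client(platform: PlatformEnum, **kwargs):
    try:
        return _CLIENT_REGISTRY[platform](**kwargs)
    except KeyError:
        raise ValueError(f"No client registered for {platform}") from None
```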
4. Dependency Injection - Everything is injected, nothing is created internally:
```python
class PlatformClient:
    def __init__(
        self,
        db_client: AsyncClient,       # Injected
        converter: ContentConverter,  # Injected
        **kwargs
    ):
        # No hidden `ContentConverter()` creation
        # Everything explicit, everything testable
        ...
```
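The testing payoff is immediate: hand the client fakes instead of real infrastructure. A sketch using illustrative stand-ins and the `PlatformClient` above:

```python
# Illustrative: injected dependencies can be swapped for in-memory fakes,
# so unit tests never touch a network or a database.
class FakeDB:
    def __init__(self):
        self.inserted = []

    async def insert(self, rows):
        self.inserted.extend(rows)

class FakeConverter:
    def convert(self, html: str) -> str:
        return html  # pass-through is enough for most tests

client = PlatformClient(db_client=FakeDB(), converter=FakeConverter())
```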
These patterns work together like a Swiss watch. Template Method ensures consistency. Strategy enables variety. Registry provides extensibility. Dependency Injection delivers testability.
The Power of Simple Primitives
Here's where the magic happens: complex platform integrations reduce to just two simple data structures.
```python
# Core primitive structures
from dataclasses import dataclass, field
from typing import Any

@dataclass
class HierarchyDoc:
    id: str     # Universal identifier
    title: str  # Human-readable name
    body: str   # Content payload
    metadata: dict[str, Any] = field(default_factory=dict)  # Platform-specific data

@dataclass
class HierarchyFolder:
    id: str     # Universal identifier
    title: str  # Human-readable name
    children: list["HierarchyFolder"] = field(default_factory=list)  # Recursive nesting
    docs: list[HierarchyDoc] = field(default_factory=list)           # Content within folder
    metadata: dict[str, Any] = field(default_factory=dict)           # Platform-specific data
```
These two simple structures can represent any knowledge organization system:
- Zendesk: Categories → Sections → Articles
- Intercom: Collections → Articles (flat or nested)
- Document360: Categories → Subcategories → Articles
- Notion: Databases → Pages → Blocks
- Confluence: Spaces → Pages → Child Pages
- Any future platform you need to integrate
The beauty is in the simplicity: no matter how complex the source platform, everything maps to folders and documents.
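As an example, Zendesk's Categories → Sections → Articles tree maps onto the two primitives above like this (the values are made up for illustration):

```python
# Zendesk's three-level hierarchy expressed with the two primitives.
article = HierarchyDoc(
    id="z-article-1",
    title="How do I reset my password?",
    body="<p>Click 'Forgot password' on the login page.</p>",
    metadata={"vote_sum": 42, "locale": "en-gb"},  # Zendesk-specific, preserved
)

section = HierarchyFolder(
    id="z-section-1",
    title="Account Help",
    docs=[article],       # content within the folder
)

category = HierarchyFolder(
    id="z-category-1",
    title="Getting Started",
    children=[section],   # sections nest inside categories
)
```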
At SAMMY we use the same principle to generalize "knowledge" from vastly different sources. The principle layers and stacks to cover all sorts of integration data.
This diagram illustrates the key change: regardless of source complexity, everything flows through the same normalization pattern. Different platform types (docs, messages, tickets) each have their own source table, but all feed into the same unified knowledge system.
Preserve Everything
Here's a crucial principle: treat platform-specific fields as metadata, not schema requirements. This means you preserve every piece of data without forcing it into rigid columns.
Why does this matter? Because every platform has unique features that users depend on:
```python
# Zendesk-specific metadata preserved:
{
    "author_id": "123456",
    "vote_sum": 42,
    "vote_count": 50,
    "section_id": "789",
    "locale": "en-gb",
    "outdated": False,
    "permission_group_id": "456",
}

# Intercom-specific metadata preserved:
{
    "intercom_type": "collection",
    "parent_type": "collection",
    "parent_id": "123",
    "author_id": "456",
    "help_center_id": "789",
}
```
This metadata approach solves four critical problems:
- Zero Data Loss - Every platform field is preserved, nothing gets thrown away
- Schema Evolution - New fields appear? No problem, no migrations needed
- Platform Features - You can still build platform-specific functionality when needed
- Future-Proofing - New platforms work immediately without schema changes
Standardize what's common, preserve what's unique. The core system for docs works with title, content, and hierarchy. Platform-specific features live in metadata, available when needed but never blocking integration.
The Seven Principles That Make It Work
This architecture succeeds because it follows seven core principles. This is how we build integrations that scale:
Platform Agnosticism
The Rule: Your application code should never know where data came from.
Why It Matters: When your search feature works the same whether data came from Zendesk or Slack, adding new platforms becomes trivial.
How to Apply: Map all platforms to standardized fields (`title`, `assignee`, `tier`). Keep platform-specific data in metadata.
Group by Data Type, Not Platform
The Rule: Don't create separate tables for each platform. Group by what the data represents.
Why It Matters: Prevents table explosion and enables cross-platform queries with a single SELECT.
How to Apply: Use a single table for all ticketing platforms, another for all messaging platforms. Identify the platform with an enum column.
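For instance (table and column names are assumptions for illustration), a single `tickets` table with a platform enum column makes cross-platform queries trivial:

```python
# Illustrative: one table for every ticketing platform, discriminated by
# an enum column - so one SELECT spans Zendesk, Jira, and Intercom.
CROSS_PLATFORM_URGENT = """
    SELECT platform, title, assignee
    FROM tickets
    WHERE tier = 'urgent'
    ORDER BY created_at DESC
"""
```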
Share What's Common
The Rule: Resources and entities used across platforms get their own shared tables. These shouldn't need to care about the source type.
Why It Matters: Eliminates duplication and creates single sources of truth for common functionality.
How to Apply: Use dedicated tables for shared resources like change tracking and file attachments. Use join tables to link sources to shared resources.
Design for Growth
The Rule: Adding new platforms should be easy, not a major project.
Why It Matters: Business moves fast. Your integration system should too.
How to Apply: Use extensible ENUMs for platforms, JSONB for flexible metadata, and consistent table patterns for new source types.
Enable Polymorphism
The Rule: Shared resources should work with any source type.
Why It Matters: Documentation changes can come from tickets, messages, or documents. Don't artificially limit connections.
How to Apply: Use `source_type` enums to enable polymorphic relationships. Keep shared tables source-agnostic.
Preserve Everything
The Rule: Never throw away data. Storage is cheap, data recovery is expensive.
Why It Matters: Requirements change, processes get better, bugs happen, and you'll want that "useless" field later.
How to Apply: Store complete API responses in `payload_data`. Extract standardized fields, but keep the original. Remove PII first, and the rest is fair game.
Configure Without Code Changes
The Rule: Different organizations need different configurations. Make it data, not code.
Why It Matters: Every customer has unique field mappings, webhook URLs, and integration requirements.
How to Apply: Use settings columns per organization/platform combination.
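A sketch of configuration-as-data (the shape and keys are illustrative assumptions):

```python
# Illustrative: per-organization, per-platform settings stored as data.
# Changing a webhook URL or a field mapping is a row update, not a deploy.
org_platform_settings = {
    ("acme-corp", "zendesk"): {
        "webhook_url": "https://hooks.example.com/acme/zendesk",
        "field_mappings": {"subject": "title", "priority": "tier"},
        "sync_interval_minutes": 15,
    },
    ("acme-corp", "jira"): {
        "webhook_url": "https://hooks.example.com/acme/jira",
        "field_mappings": {"summary": "title"},
        "sync_interval_minutes": 60,
    },
}
```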
The Compound Effect
Every platform we add makes the next one easier. Every bug we fix improves all platforms. Every optimization benefits everyone. It's not linear improvement - it compounds.
The first few platforms took weeks to integrate. Now? New integrations take days, sometimes hours. Each one strengthens the foundation for the next.
Your Next Steps
Building integrations that last requires a fundamental shift in thinking. Stop treating each platform as unique. Instead, recognize the universal patterns that exist in every knowledge organization system.
Footnotes

[^1]: Refactoring Guru, "Template Method Pattern": a design pattern that defines the skeleton of an algorithm in the superclass but lets subclasses override specific steps without changing its structure.
[^2]: Refactoring Guru, "Strategy Pattern": a behavioral design pattern that lets you define a family of algorithms, put each of them into a separate class, and make their objects interchangeable.
[^3]: GeeksforGeeks, "Registry Pattern": a system design pattern for managing and accessing a collection of objects through a centralized registry.