The Integration Illusion
Most companies build integrations the wrong way. I know because we did too.
At SAMMY, every new customer request followed the same pattern: "We need to integrate with Zendesk to make this work." So we'd build a custom Zendesk connector. Six months later: "What about Intercom?" Another custom connector. Then Jira, Document360, Linear - each one a separate project, a separate maintenance burden.
After building integration pipelines for dozens of platforms, I realized we were solving the same problem over and over. Every integration felt unique, but they all followed identical patterns. The differences weren't fundamental - they were surface-level variations on how humans organize knowledge.
This changed everything. Instead of building custom connectors for every platform, we built one system that handles them all. The trick here isn't better integration tools or APIs. It's recognizing the universal patterns that exist beneath platform-specific differences.
The Universal Pattern Hidden in Plain Sight
Here's the insight that changed how we thought about integrations, using docs as the running example: every knowledge platform uses the same fundamental structure.
It doesn't matter if you're looking at Zendesk, Intercom, Notion, or Confluence. Strip away the API differences, field names, and authentication schemes, and you'll find the same core pattern:
- Containers (folders, categories, collections) that organize content
- Content (documents, articles, tickets) that lives within containers
- Hierarchical relationships that create navigable structure
- Metadata that adds context and meaning
This isn't coincidence - it's how humans naturally organize information. Whether you're organizing files on your computer, books in a library, or thoughts in your mind, you use the same hierarchical pattern.
This universality extends beyond documentation. Ticketing systems, messaging platforms, even web scraping - they all follow variations of this same structure. The key insight: what feels like completely different integration challenges are actually the same problem wearing different clothes.
This sounds ridiculously simple and obvious, but it changes a great deal about the design.
From Chaos to Clarity
Let me show you the difference this approach makes with a real example.
Custom Connectors Everywhere
```python
# Zendesk connector
class ZendeskConnector:
    def get_categories(self):
        return self.api.get('/help_center/categories')

    def get_articles(self, category_id):
        return self.api.get(f'/help_center/categories/{category_id}/articles')

# Intercom connector (completely different)
class IntercomConnector:
    def get_collections(self):
        return self.api.get('/articles')

    def get_articles(self, collection_id):
        return self.api.get(f'/articles?collection_id={collection_id}')

# Result: 3 platforms = 3 separate codebases
```
One System for Everything
```python
# Single loader handles all platforms
loader = HelpCenterLoader(
    integration_type=IntegrationType.ZENDESK,  # or INTERCOM, or JIRA
    content_type=ContentType.HTML,
    locale=LocaleISO639.EN,
)

# Same method works for any platform
hierarchical_data = load_json_data(data_dir)
loader.load_hierarchical_data(hierarchical_data)

# Result: 10 platforms = 1 codebase
```
The difference is dramatic: instead of building N connectors for N platforms, you build one system that handles them all.
Stateful Design
The real problem wasn't the number of connectors. It was how we built them.
Look at what every integration tutorial teaches you:
# The "obvious" way - stateful design
class ZendeskConnector:
def __init__(self):
self.fetched_resources = [] # Hidden state!
async def fetch_resources(self, ids):
self.fetched_resources = await self.api.get_articles(ids)
return self # Method chaining feels clever
async def transform_content(self):
# Reads from hidden state - must call fetch first!
for resource in self.fetched_resources:
resource.content = markdown(resource.html)
return self
async def upload(self):
# More hidden state dependency
return await self.db.insert(self.fetched_resources)
Looks reasonable, right? Wrong. This design creates a cascade of problems:
- Temporal coupling - Methods must be called in exact order
- Testing nightmare - Can't test `transform` without calling `fetch` first
- Hidden dependencies - What state does each method need?
- Reusability? Forget it - State persists between operations
The fix? Make everything stateless:
```python
# The better way - stateless design (pure functions)
class ZendeskConnector:
    async def fetch_batch(self, resources, context) -> FetchResult:
        # Input → Output, no hidden state
        articles = await self.api.get_articles(resources)
        return FetchResult(succeeded=articles, failed=[])

    async def transform_resource(self, raw_data, context) -> StandardizedResource:
        # Pure transformation - testable in isolation
        return StandardizedResource(
            content=markdown(raw_data.html),
            title=raw_data.title,
        )
```
This shift alone dramatically reduced our testing time. No more elaborate state setup. Just pass data in, assert on what comes out.
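To make that concrete, here's a minimal sketch of what a test looks like under this design. Everything here (the `RawArticle` stand-in, the simplified transform) is illustrative, not our production code:

```python
# Minimal sketch of why stateless transforms are easy to test: no setup,
# no fetch call, no database - just input in, assertion out.
import asyncio
from dataclasses import dataclass

@dataclass
class RawArticle:
    html: str
    title: str

@dataclass
class StandardizedResource:
    content: str
    title: str

async def transform_resource(raw: RawArticle) -> StandardizedResource:
    # Stand-in for the real HTML-to-markdown conversion
    return StandardizedResource(content=raw.html.strip(), title=raw.title)

async def test_transform_resource():
    raw = RawArticle(html="<h1>Hello</h1>", title="Greeting")
    result = await transform_resource(raw)
    assert result.title == "Greeting"
    assert "Hello" in result.content

asyncio.run(test_transform_resource())
```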
The Four-Layer Architecture (and the Patterns That Make It Work)
This transformation is possible because of a carefully designed abstraction hierarchy. Architecture diagrams are lies without the patterns that implement them.
Here's how we structure every integration:
Each level has a specific responsibility:
- Level 1: Platform APIs - Handle the messy reality of different APIs, authentication, and rate limits
- Level 2: Platform Normalizers - Transform platform-specific data into universal concepts
- Level 3: Unified Data Model - Store everything in the same structure regardless of source
- Level 4: Business Logic - Build features that work across all platforms
The magic happens at Level 2: this is where Zendesk "categories" become "folders", Intercom "collections" become "folders", and so on. Once normalized, everything downstream is identical.
This separation means platform complexity never leaks into your core system. Your search, analytics, and business logic work the same whether data came from Zendesk or Slack.
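To sketch what a Level 2 normalizer can look like (the function and field names here are illustrative assumptions, not our production code):

```python
# Illustrative Level 2 normalizers: platform vocabulary in, universal
# concepts out. A Zendesk "category" and an Intercom "collection" both
# leave this layer as a plain "folder".
from typing import Any

def normalize_zendesk_category(category: dict[str, Any]) -> dict[str, Any]:
    return {
        "type": "folder",                               # universal concept
        "id": str(category["id"]),
        "title": category["name"],
        "metadata": {"source": "zendesk", **category},  # preserve everything
    }

def normalize_intercom_collection(collection: dict[str, Any]) -> dict[str, Any]:
    return {
        "type": "folder",                               # same universal concept
        "id": str(collection["id"]),
        "title": collection["name"],
        "metadata": {"source": "intercom", **collection},
    }
```

Everything downstream consumes folders; the caller never knows or cares which platform produced them.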
The Patterns Behind the Magic
But how do you actually BUILD this? After trying (and failing) with several approaches, we landed on four battle-tested design patterns:
1. Template Method Pattern[^1] - The orchestrator defines the algorithm skeleton:
```python
class IngestionOrchestrator:
    async def ingest(self, resources, context):
        # Fixed pipeline - never changes
        fetch_result = await self._fetch_resources(...)     # Hook
        validated = await self._validate_resources(...)     # Hook
        transformed = await self._transform_resources(...)  # Hook
        persisted = await self._persist_resources(...)      # Hook
        return self._build_result(...)                      # Fixed
```
This is your North Star. The pipeline structure never changes, only the implementations vary.
2. Strategy Pattern[^2] - Each platform implements the hooks differently:

```python
from abc import ABC, abstractmethod

class PlatformClient(ABC):
    @abstractmethod
    async def fetch_batch(self, resources, context) -> FetchResult:
        """Platform-specific fetching"""

    @abstractmethod
    async def transform_resource(self, raw_data, context) -> StandardizedResource:
        """Platform-specific transformation"""
```
3. Registry Pattern[^3] - Auto-discovery without if/else chains:

```python
# Old way - modify factory for each platform (gross)
if platform == "ZENDESK":
    return ZendeskClient()
elif platform == "JIRA":  # Must modify!
    return JiraClient()

# New way - auto-registration (chef's kiss)
register_platform_client(PlatformEnum.JIRA, JiraClient)
# Factory automatically knows about it
```
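Under the hood the registry can be as small as a dict plus a factory. A sketch, with an assumed `PlatformEnum` standing in for the real one:

```python
# Minimal registry sketch: one dict, one registration call, one factory.
# Adding a platform means registering it - never editing the factory.
from enum import Enum

class PlatformEnum(Enum):
    ZENDESK = "zendesk"
    INTERCOM = "intercom"
    JIRA = "jira"

_CLIENT_REGISTRY: dict[PlatformEnum, type] = {}

def register_platform_client(platform: PlatformEnum, client_cls: type) -> None:
    _CLIENT_REGISTRY[platform] = client_cls

def create_client(platform: PlatformEnum, **kwargs):
    try:
        return _CLIENT_REGISTRY[platform](**kwargs)
    except KeyError:
        raise ValueError(f"No client registered for {platform}") from None
```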
4. Dependency Injection - Everything is injected, nothing is created internally:
```python
class PlatformClient:
    def __init__(
        self,
        db_client: AsyncClient,       # Injected
        converter: ContentConverter,  # Injected
        **kwargs
    ):
        # No hidden `ContentConverter()` creation
        # Everything explicit, everything testable
        ...
```
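The testing payoff is immediate: hand the client fakes instead of real infrastructure. A sketch using illustrative stand-ins and the `PlatformClient` above:

```python
# Illustrative: injected dependencies can be swapped for in-memory fakes,
# so unit tests never touch a network or a database.
class FakeDB:
    def __init__(self):
        self.inserted = []

    async def insert(self, rows):
        self.inserted.extend(rows)

class FakeConverter:
    def convert(self, html: str) -> str:
        return html  # pass-through is enough for most tests

client = PlatformClient(db_client=FakeDB(), converter=FakeConverter())
```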
These patterns work together like a Swiss watch. Template Method ensures consistency. Strategy enables variety. Registry provides extensibility. Dependency Injection delivers testability.
The Power of Simple Primitives
Here's where the magic happens: complex platform integrations reduce to just two simple data structures.
```python
# Core primitive structures
from dataclasses import dataclass, field
from typing import Any

@dataclass
class HierarchyDoc:
    id: str     # Universal identifier
    title: str  # Human-readable name
    body: str   # Content payload
    metadata: dict[str, Any] = field(default_factory=dict)  # Platform-specific data

@dataclass
class HierarchyFolder:
    id: str     # Universal identifier
    title: str  # Human-readable name
    children: list["HierarchyFolder"] = field(default_factory=list)  # Recursive nesting
    docs: list[HierarchyDoc] = field(default_factory=list)           # Content within folder
    metadata: dict[str, Any] = field(default_factory=dict)           # Platform-specific data
```
These two simple structures can represent any knowledge organization system:
- Zendesk: Categories → Sections → Articles
- Intercom: Collections → Articles (flat or nested)
- Document360: Categories → Subcategories → Articles
- Notion: Databases → Pages → Blocks
- Confluence: Spaces → Pages → Child Pages
- Any future platform you need to integrate
The beauty is in the simplicity: no matter how complex the source platform, everything maps to folders and documents.
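As an example, Zendesk's Categories → Sections → Articles tree maps onto the two primitives above like this (the values are made up for illustration):

```python
# Zendesk's three-level hierarchy expressed with the two primitives.
article = HierarchyDoc(
    id="z-article-1",
    title="How do I reset my password?",
    body="<p>Click 'Forgot password' on the login page.</p>",
    metadata={"vote_sum": 42, "locale": "en-gb"},  # Zendesk-specific, preserved
)

section = HierarchyFolder(
    id="z-section-1",
    title="Account Help",
    docs=[article],       # content within the folder
)

category = HierarchyFolder(
    id="z-category-1",
    title="Getting Started",
    children=[section],   # sections nest inside categories
)
```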
At SAMMY we use the same principle to generalize "knowledge" from vastly different sources. The principle layers and stacks to cover all sorts of integration data.
This diagram illustrates the key change: regardless of source complexity, everything flows through the same normalization pattern. Different platform types (docs, messages, tickets) each have their own source table, but all feed into the same unified knowledge system.
Preserve Everything
Here's a crucial principle: treat platform-specific fields as metadata, not schema requirements. This means you preserve every piece of data without forcing it into rigid columns.
Why does this matter? Because every platform has unique features that users depend on:
```python
# Zendesk-specific metadata preserved:
{
    "author_id": "123456",
    "vote_sum": 42,
    "vote_count": 50,
    "section_id": "789",
    "locale": "en-gb",
    "outdated": False,
    "permission_group_id": "456",
}

# Intercom-specific metadata preserved:
{
    "intercom_type": "collection",
    "parent_type": "collection",
    "parent_id": "123",
    "author_id": "456",
    "help_center_id": "789",
}
```
This metadata approach solves four critical problems:
- Zero Data Loss - Every platform field is preserved, nothing gets thrown away
- Schema Evolution - New fields appear? No problem, no migrations needed
- Platform Features - You can still build platform-specific functionality when needed
- Future-Proofing - New platforms work immediately without schema changes
Standardize what's common, preserve what's unique. The core system for docs works with title, content, and hierarchy. Platform-specific features live in metadata, available when needed but never blocking integration.
The Seven Principles That Make It Work
This architecture succeeds because it follows seven core principles. This is how we build integrations that scale:
Platform Agnosticism
The Rule: Your application code should never know where data came from.
Why It Matters: When your search feature works the same whether data came from Zendesk or Slack, adding new platforms becomes trivial.
How to Apply: Map all platforms to standardized fields (`title`, `assignee`, `tier`). Keep platform-specific data in metadata.
Group by Data Type, Not Platform
The Rule: Don't create separate tables for each platform. Group by what the data represents.
Why It Matters: Prevents table explosion and enables cross-platform queries with a single SELECT.
How to Apply: Use a single table for all ticketing platforms, another for all messaging platforms. Identify the platform with an enum column.
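For instance (table and column names are assumptions for illustration), a single `tickets` table with a platform enum column makes cross-platform queries trivial:

```python
# Illustrative: one table for every ticketing platform, discriminated by
# an enum column - so one SELECT spans Zendesk, Jira, and Intercom.
CROSS_PLATFORM_URGENT = """
    SELECT platform, title, assignee
    FROM tickets
    WHERE tier = 'urgent'
    ORDER BY created_at DESC
"""
```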
Share What's Common
The Rule: Resources and entities used across platforms get their own shared tables. These shouldn't need to care about the source type.
Why It Matters: Eliminates duplication and creates single sources of truth for common functionality.
How to Apply: Use dedicated tables for shared resources like change tracking and file attachments. Use join tables to link sources to shared resources.
Design for Growth
The Rule: Adding new platforms should be easy, not a major project.
Why It Matters: Business moves fast. Your integration system should too.
How to Apply: Use extensible ENUMs for platforms, JSONB for flexible metadata, and consistent table patterns for new source types.
Enable Polymorphism
The Rule: Shared resources should work with any source type.
Why It Matters: Documentation changes can come from tickets, messages, or documents. Don't artificially limit connections.
How to Apply: Use `source_type` enums to enable polymorphic relationships. Keep shared tables source-agnostic.
Preserve Everything
The Rule: Never throw away data. Storage is cheap, data recovery is expensive.
Why It Matters: Requirements change, processes get better, bugs happen, and you'll want that "useless" field later.
How to Apply: Store complete API responses in `payload_data`. Extract standardized fields, but keep the original. Remove PII first, and the rest is fair game.
Configure Without Code Changes
The Rule: Different organizations need different configurations. Make it data, not code.
Why It Matters: Every customer has unique field mappings, webhook URLs, and integration requirements.
How to Apply: Use settings columns per organization/platform combination.
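A sketch of configuration-as-data (the shape and keys are illustrative assumptions):

```python
# Illustrative: per-organization, per-platform settings stored as data.
# Changing a webhook URL or a field mapping is a row update, not a deploy.
org_platform_settings = {
    ("acme-corp", "zendesk"): {
        "webhook_url": "https://hooks.example.com/acme/zendesk",
        "field_mappings": {"subject": "title", "priority": "tier"},
        "sync_interval_minutes": 15,
    },
    ("acme-corp", "jira"): {
        "webhook_url": "https://hooks.example.com/acme/jira",
        "field_mappings": {"summary": "title"},
        "sync_interval_minutes": 60,
    },
}
```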
The Compound Effect
Every platform we add makes the next one easier. Every bug we fix improves all platforms. Every optimization benefits everyone. It's not linear improvement - it compounds.
The first few platforms took weeks to integrate. Now? New integrations take days, sometimes hours. Each one strengthens the foundation for the next.
Your Next Steps
Building integrations that last requires a fundamental shift in thinking. Stop treating each platform as unique. Instead, recognize the universal patterns that exist in every knowledge organization system.
Footnotes

[^1]: Refactoring Guru, "Template Method Pattern": a design pattern that defines the skeleton of an algorithm in the superclass but lets subclasses override specific steps without changing its structure.
[^2]: Refactoring Guru, "Strategy Pattern": a behavioral design pattern that lets you define a family of algorithms, put each of them into a separate class, and make their objects interchangeable.
[^3]: GeeksforGeeks, "Registry Pattern": a system design pattern for managing and accessing a collection of objects through a centralized registry.