How I Gave My Coding Agents Persistent Memory
I had 50 coding sessions in one day last week. Not unusual - I run Claude Code and Cursor side by side, and each task gets its own session. Over the past few months I've accumulated over 4,000 sessions across both tools, plus quite a few recorded meetings. Decisions, things I've worked on, dead ends, architectural choices - all of them end up trapped in JSONL files and SQLite databases I'd never open again. Each time I opened a new terminal I was re-explaining the same context from scratch or having to remember which doc contained the best context to set off my research agents.
Claude Code's built-in search doesn't help much here. Under the hood it runs a smaller model that does sequential greps file-by-file for text matching - fine for finding a function definition, useless for "what was the architectural decision we made about the compliance engine last month?" I would watch it churn for three minutes and come back with string matches that included audit() function calls when I was looking for notes about actual patterns and reasons for choices we had made.
I needed something that understood relevance and reasoning, not just string matching. QMD1 turned out to be the answer - Tobias Lutke's2 local search engine that runs a pretty standard RAG pipeline (keyword scoring, vector similarity, LLM-based re-ranking) but entirely locally on your machine through node-llama-cpp3 and quantized GGUF models. No cloud APIs, no token costs. The difference was amazing - sub-second results ranked by actual relevance, all local.
Memory Layer
The bet I'm making is that context outlives any specific tool. Workflows will keep changing - Claude Code today, Cursor yesterday, something else tomorrow. But if your decisions, debugging sessions, and meeting notes are indexed locally in a format any tool can read, you're insulated from the churn. I've written about this before - context is the thing to focus on managing. Tobias Lutke calls it "the art of providing all the context for the task to be plausibly solvable by the LLM"4. Anthropic's engineering team frames it differently - they write about "context rot," how models lose recall accuracy as context grows5. Either way, the answer is the same: find the smallest set of high-signal tokens that maximize the outcome you want.
I built this as a set of Claude Code skills, inspired by ArtemXTech's personal-os-skills6 but extended with Cursor session extraction, Granola API reverse engineering, and a full automation pipeline. Everything is local markdown indexed by QMD, accessible from Claude Code and any MCP-compatible tool7.
Two skills sit on top. /recall loads context from previous sessions - you can recall by time ("what features did I work on yesterday?"), by topic ("recall Compliance Engine work" via BM25), or as an interactive graph visualization. /search is the lighter counterpart for mid-conversation lookups - BM25 with snippets, no query expansion, no synthesis. Just fast inline results when you need to find something specific.
Collecting Context
The first step is getting your context into one place. I needed context to index and search over, so I set up five QMD collections: notes, claude-code-sessions, cursor-sessions, granola-sessions, and service-docs. The first four are generated - raw data extracted from apps and converted to markdown. The fifth is mirrored documentation from the repos I actively work on, copied so it's searchable alongside everything else.
All of them end up as markdown files in git, indexed by QMD. Claude Code sessions are saved incrementally via hooks. Cursor sessions are batch-exported from SQLite. Granola meetings are fetched from reverse-engineered API calls. Service docs are synced from the source repos.
Each source stores its data differently, so a normalisation layer converts each one to the same markdown shape (similar to the integration post). Claude Code hands you flat JSONL files. Cursor buries everything in SQLite. Granola doesn't want you to have it locally at all. The stories get progressively more interesting.
Claude Code Sessions
This was the easiest data source. Every Claude Code conversation ends up as a JSONL file under ~/.claude/projects/, organized by working directory. The problem is that the raw files are noisy: tool use blocks, system prompts, role markers, base64 image data. Most of it is useless for search.
My export script strips all of that out and writes clean markdown - just the user messages, the title, a summary, and which files were touched. That's what QMD actually indexes. A SessionEnd hook triggers the export automatically whenever I close a terminal, so the index stays current without me thinking about it.
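To make the stripping step concrete, here is a minimal sketch of that loop. The record shape (`type`, `message.content`) is a simplified assumption for illustration, not the exact JSONL layout Claude Code writes:

```python
import json
from pathlib import Path

def export_session(jsonl_path: str, out_path: str) -> None:
    """Strip a session JSONL down to searchable markdown: user text only,
    skipping assistant output, tool blocks, and other noise."""
    lines = []
    for raw in Path(jsonl_path).read_text().splitlines():
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if record.get("type") != "user":
            continue
        content = record.get("message", {}).get("content", "")
        # Content may be a plain string or a list of typed blocks;
        # keep only the text blocks, dropping tool results and images.
        if isinstance(content, list):
            content = "\n".join(
                b.get("text", "") for b in content if b.get("type") == "text"
            )
        if content.strip():
            lines.append(f"### User\n{content.strip()}\n")
    Path(out_path).write_text("# Session\n\n" + "\n".join(lines))
```

The real script also pulls the title, summary, and touched files into frontmatter; this shows only the filtering core.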
Directory Structure
Sessions are organized by project group and, for multi-repo products, by repo subdirectory:
claude-code-sessions/
├── my-product/ # SaaS product - separate repos
│ ├── api/ # Backend API - workflows, webhooks, shared services
│ ├── dashboard/ # Frontend dashboard - management, analytics
│ ├── agents-v2/ # Agent orchestration - pipelines, multi-LLM adapters
│ ├── workers/ # Event-driven async - message queues, saga workflows
│ └── voice/ # AI voice agent - real-time conversation
├── blog/ # Personal MDX blog built with Next.js
├── prompt-lib/ # Prompt library and templates
├── context/ # Context system - session sync, recall, QMD search skills
└── notes/ # Personal notes and research

The export script has a catalog of project groups and their repos - it knows which folders get subdirectories and what descriptions to apply.
Context Routing
Each project and repo subdirectory gets a QMD context description so search results include what each folder is about. Running cs context registers all of them at once, and qmd context list verifies the setup.
With Claude Code sessions flowing into the index automatically, I started noticing what was missing. Some of my coding still happens in Cursor - and none of those sessions were searchable.
Cursor Sessions
Claude Code was straightforward - one JSONL file per session, easy to parse. Cursor was a different problem entirely.
Cursor doesn't write flat files. It stores everything in SQLite databases. All the composer metadata, all the messages, all the file states - locked inside state.vscdb files. If you want your Cursor conversations searchable alongside your Claude Code sessions, you need to crack those databases open and extract the same kind of clean markdown.
That's what the export script does. Same output format, same directory structure, same QMD integration. The only difference is where the data comes from.
Data Architecture
Cursor uses a two-level SQLite architecture:
The global DB (~/Library/Application Support/Cursor/User/globalStorage/state.vscdb, ~6.6GB) is the big one. A single cursorDiskKV table holds everything. Composer metadata lives in rows keyed composerData:<uuid> (~1,600 entries). Individual messages are stored separately as bubbleId:<composerId>:<bubbleId> (~146K rows).
The separation exists because messages can be huge - a single bubble JSON can be 250KB with full diffs and tool outputs.
Then there are the workspace DBs - about 300 of them, one per workspace hash under ~/Library/Application Support/Cursor/User/workspaceStorage/. Each workspace directory has a workspace.json that maps to a real folder path, and its own state.vscdb with a composer.composerData key listing which composers were opened in that workspace.
The global DB has the content. The workspace DBs have the project mapping. You need both.
Read-Only Access
I open every database with file:{path}?mode=ro (SQLite URI read-only mode). This matters because Cursor is usually running while you export. SQLite handles concurrent readers fine, but if you open in read-write mode you risk WAL checkpoint conflicts or - worse - accidentally corrupting Cursor's state. Read-only mode makes the script safe to run anytime, even mid-session.
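In Python's sqlite3 this is a one-liner; the sketch below shows the URI form and why writes fail fast:

```python
import sqlite3

def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open a SQLite database that another process (Cursor) may be writing to.
    mode=ro makes SQLite itself refuse writes, so this connection can never
    corrupt the app's state or fight over WAL checkpoints."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
```

Any accidental `INSERT`/`UPDATE` through this connection raises `sqlite3.OperationalError` instead of touching the file.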
Export Pipeline
The script runs in three phases.
First, build the workspace map. Scan all ~300 workspace directories, read each workspace.json to get the folder path, then query its state.vscdb for the list of composer IDs. The result maps composer IDs to folder paths - many composers appear in multiple workspaces, so the raw map is large, but it covers the ~1,600 unique composers in the global DB. This is the only way to know which project a Cursor session belongs to - the global DB doesn't store folder paths.
Next, stream and extract composers from the global DB. I query all composerData:* rows one at a time rather than loading everything into memory - the full dataset is ~105MB of JSON. For each composer I extract:
- Metadata: title, mode (agent/chat/edit/plan), model name, branch, creation and update timestamps
- File operations: originalFileStates keys (modified files), newlyCreatedFiles (created files)
- User messages: I only want the user's text, not the full bubble JSON. Instead of deserializing 250KB blobs, I use json_extract(value, '$.text') in SQL to pull just the text field. This keeps memory usage low and export fast.
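A minimal version of that streaming query looks like the sketch below. The `lastUpdatedAt` field is real (it drives the idempotency check later); `$.name` as the title key is my illustrative assumption:

```python
import sqlite3

def iter_composers(conn: sqlite3.Connection):
    """Stream composer rows without materialising all the JSON in Python.
    json_extract pulls single fields inside SQLite, so large blobs are
    never fully deserialised on the Python side."""
    cur = conn.execute(
        """
        SELECT substr(key, length('composerData:') + 1) AS composer_id,
               json_extract(value, '$.name')            AS title,
               json_extract(value, '$.lastUpdatedAt')   AS last_updated
        FROM cursorDiskKV
        WHERE key LIKE 'composerData:%'
        """
    )
    yield from cur  # one row at a time, never the whole 105MB
```

The `WHERE key LIKE 'composerData:%'` filter also skips the ~146K bubble rows entirely.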
Finally, resolve projects and write markdown. Each composer gets assigned to a project using a three-strategy priority chain. The workspace map is most reliable (~50% of composers) - if workspace.json says this composer was opened in a project folder, that's the answer. For the rest, I fall back to file URI matching (~40%) - extract paths from originalFileStates, find the common prefix, match against the project catalog. Sessions with neither workspace nor file matches go in the root unresolved.
Once the project is resolved, I generate the same markdown format as Claude sessions - YAML frontmatter, artifacts section, a preserved ## My Notes section, and the conversation. Cursor-specific frontmatter fields include mode (agent/chat/edit/plan), model (which LLM was used), and branch.
Preserving Notes
Every export re-generates the full markdown from the source SQLite data. But you might have added notes to a session file - annotations, links, context that only you know. The ## My Notes section and certain frontmatter fields (status, tags, rating, comments, title, projects) are read from the existing file before overwriting and carried forward into the new output. You can re-export as many times as you want without losing your annotations.
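The preservation step can be sketched as a regex that lifts the section body out of the previous export before it is overwritten (the real script also carries the listed frontmatter fields forward; this shows only the notes part):

```python
import re

def carry_forward_notes(existing: str) -> str:
    """Extract the body of '## My Notes' from a previously exported file
    so a re-export can splice it back in instead of discarding it."""
    match = re.search(
        r"^## My Notes\n(.*?)(?=^## |\Z)",  # up to the next H2 or end of file
        existing,
        re.MULTILINE | re.DOTALL,
    )
    return match.group(1).strip() if match else ""
```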
Output Format
---
type: cursor-session
date: 2026-01-23
composer_id: 002f2d09-...
repo: product-api
mode: agent
model: claude-4.5-sonnet-thinking
title: "Notification filtering decision persistence"
branch: you/eng-123-refactor-webhooks
messages: 34
last_activity: 2026-01-23T13:08:45+00:00
status: active
tags: []
rating: null
comments: ""
projects: []
---
# Notification filtering decision persistence
## Artifacts
**Modified:**
- `src/routes/webhook.py`
## My Notes
<!-- preserved across re-exports -->
## Conversation
### User
<message text>

Subdirectory Structure
Same structure as claude-code-sessions/, using the same project catalog:
cursor-sessions/
├── my-product/ # 992 sessions
│ ├── api/ # Backend API
│ ├── dashboard/ # Frontend dashboard
│ ├── workers/ # Event-driven async orchestration
│ └── voice/ # AI voice agent
├── blog/ # 19 sessions
├── prompt_lib/ # 18 sessions
└── notes/ # 2 sessions

Commands
crs="python3 ~/projects/context/skills/sync-cursor-sessions/scripts/cursor-sessions"

| Command | Description |
|---|---|
| crs export --all | Export all composers to markdown |
| crs export --today | Only composers updated in last 24h |
| crs export --since 2026-03-01 | Composers updated since date |
| crs list | List active exported sessions |
| crs list --all --json | All sessions as JSON |
| crs context | Register QMD context descriptions |
No sync, resume, note, or close commands. Those are Claude Code-specific - sync uses hooks that fire on each prompt, resume calls claude --resume to reopen a session. Cursor doesn't expose those integration points, so cursor-sessions is export-only. You run it when you want a fresh snapshot.
After exporting, run qmd update && qmd embed. qmd update scans for new or changed markdown files and adds them to the index. qmd embed generates vector embeddings for any new chunks. Both commands are incremental - they only process what changed. The first full embed for 1,036 cursor session files (25,773 chunks) took about 16 minutes on an M4 Pro.
Indexing happens automatically via the hourly context-sync --cron - more on that in Keeping It Fresh.
Coding agent sessions were now covered. But what about the decisions that happen in meetings?
Granola Sessions
Cursor stored everything locally in SQLite - locked away, but at least on my machine. Granola was the opposite problem. The app shows you beautiful AI summaries and full transcripts in the UI, but the local cache is almost empty. Hundreds of meetings in the cache, zero transcripts, zero summaries. The content you see in the app lives on Granola's servers8.
The local cache (~/Library/Application Support/Granola/cache-v4.json) has the metadata - titles, attendees, calendar events, timestamps. But the two things that make meetings searchable - what was said and what the AI extracted from it - aren't there.
We had to go find them.
Dead End
The first instinct was to look harder locally. Granola is an Electron app - there should be more data somewhere. I dug through everything:
- IndexedDB (~45KB) - too small to hold anything useful
- Local Storage (~160KB) - analytics telemetry, not content
- Session Storage (~55KB) - session-scoped app state
- blob_storage - one empty UUID directory
Then I found something interesting: an Origin Private File System (OPFS) directory at ~/Library/Application Support/Granola/File System/000/t/00/ containing a 6.3MB SQLite database (granola.db) with a 4.4MB WAL file. This is where the AI summaries live locally - stored as Yjs collaborative documents. But the database is encrypted (likely SQLCipher). No standard SQLite header, no readable strings. Dead end without the encryption key.
I also found a transcription_retention_time_ms flag set to 259,200,000 milliseconds (3 days). That explains why the cache only had 1 transcript entry (empty). Transcripts are cached briefly, then purged. The cache is a thin layer for the most recently viewed data, not a persistent store.
Granola API
The data exists - just server-side. Fortunately, multiple developers have already reverse-engineered Granola's internal API9. Two endpoints give us everything:
- POST /v2/get-documents with include_last_viewed_panel: true - returns all meetings with AI summary panels
- POST /v1/get-document-transcript - returns full transcript segments with speaker source, timestamps, and text
Authentication uses WorkOS OAuth. The access token is stored locally at ~/Library/Application Support/Granola/supabase.json - the same token the desktop app uses. The script reads it directly, no manual setup needed. The token structure is JSON-inside-JSON: supabase.json has a workos_tokens key containing a JSON string that must be parsed again to get access_token.
One gotcha: the API returns gzip-compressed responses even without an Accept-Encoding header. The script detects the \x1f\x8b gzip magic bytes and decompresses automatically.
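Both quirks are a few lines each. A sketch of the token read and the gzip sniff (the file path and key names follow the description above; treat them as observed-on-my-machine details rather than a stable API):

```python
import gzip
import json
from pathlib import Path

TOKEN_FILE = Path(
    "~/Library/Application Support/Granola/supabase.json"
).expanduser()

def read_access_token(path: Path = TOKEN_FILE) -> str:
    """JSON-inside-JSON: workos_tokens is itself a JSON string
    that must be parsed a second time."""
    outer = json.loads(path.read_text())
    inner = json.loads(outer["workos_tokens"])
    return inner["access_token"]

def maybe_decompress(body: bytes) -> bytes:
    """The API gzips responses even without Accept-Encoding,
    so sniff the \\x1f\\x8b magic bytes and decompress if present."""
    return gzip.decompress(body) if body[:2] == b"\x1f\x8b" else body
```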
Summary Conversion
The AI-generated summaries aren't markdown or plain text. They're ProseMirror/TipTap JSON - the same rich-text format used by Notion, GitLab, and even our own editors in SAMMY. Each summary is a tree of typed nodes:
{
"type": "doc",
"content": [
{
"type": "heading",
"attrs": { "level": 3 },
"content": [{ "type": "text", "text": "Feature Status Update" }]
},
{
"type": "bulletList",
"content": [
{
"type": "listItem",
"content": [
{
"type": "paragraph",
"content": [
{ "type": "text", "text": "No requests since last quarter" }
]
}
]
}
]
}
]
}

The script walks this tree recursively, converting each node type to its markdown equivalent - headings, bullet lists, ordered lists, code blocks, bold/italic marks, links. Nested lists get indented. It handles the full ProseMirror spec that Granola actually uses.
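A trimmed-down version of that walker, covering only the node types in the example above (the full converter also handles ordered lists, code blocks, marks, and links):

```python
def prosemirror_to_md(node: dict, depth: int = 0) -> str:
    """Recursively convert a subset of a ProseMirror node tree to markdown."""
    t = node.get("type")
    children = node.get("content", [])
    if t == "text":
        return node.get("text", "")
    if t == "heading":
        level = node.get("attrs", {}).get("level", 1)
        return "#" * level + " " + "".join(prosemirror_to_md(c) for c in children)
    if t == "paragraph":
        return "".join(prosemirror_to_md(c) for c in children)
    if t == "bulletList":
        return "\n".join(prosemirror_to_md(c, depth) for c in children)
    if t == "listItem":
        # depth controls indentation so nested lists render correctly
        body = "\n".join(prosemirror_to_md(c, depth + 1) for c in children)
        return "  " * depth + "- " + body
    # doc and unknown containers: join children with blank lines
    return "\n\n".join(prosemirror_to_md(c, depth) for c in children)
```

Running it on the JSON above yields `### Feature Status Update` followed by the bullet line.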
Two Sources, One File
The final design merges both data sources per meeting:
| Content | Source | Coverage |
|---|---|---|
| Attendee details | Local cache (people) | 141/141 |
| Calendar metadata | Local cache (google_calendar_event) | 57/141 |
| Transcripts | API (get-document-transcript) | 138/141 |
| AI summaries | API (last_viewed_panel) | 114/141 |
| User notes | Local cache (notes_markdown) | 6/141 |
If the API is unavailable (no token, network down), the script falls back gracefully to cache-only export - you still get metadata and attendees, just no transcripts or summaries. The script never writes to the API or modifies local state. Granola records automatically when it detects a call, but you have to actively type notes in the app - the AI summaries fill that gap. The meetings have structured summaries with headings like "Key Decisions", "Action Items", "Technical Discussion". That's the most searchable content.
Output Format
---
type: granola-meeting
date: 2026-03-10
meeting_id: b68af9b5-...
title: "Alex <> Shav (Weekly)"
time: "11:00"
duration_min: 30
attendees:
- "Alex Chen <alex.chen@partner.com>"
creator: "Shav <shav@example.com>"
last_activity: 2026-03-10T11:30:00Z
status: raw
tags: []
rating: null
comments: ""
projects: []
---
# Alex <> Shav (Weekly)
## Attendees
- Alex Chen (alex.chen@partner.com)
## Summary
### Feature Status Update
- No requests from partner since last quarter
- Endpoints still active but unused
...
## My Notes
<!-- preserved across re-exports -->
## Transcript
[11:00:16] **You**: How's it going? Long time no see.
[11:00:17] **Participant**: Hello.
...

Transcripts use two speaker labels: You (microphone source - your voice) and Participant (system audio - everyone else on the call). Each utterance has a timestamp from the original recording.
Commands
grs="python3 ~/projects/context/skills/granola/scripts/granola-sessions"

| Command | Description |
|---|---|
| grs export --all | Export all valid meetings |
| grs export --today | Meetings from last 24h |
| grs export --since 2026-03-01 | Meetings since date |
| grs list [--all] [--json] | List exported meetings |
| grs context | Register QMD context description |
The hourly cron handles indexing. Now I had conversations, code sessions, and meetings - but not the documentation that explains how my systems actually work.
Service Docs
The other four collections are about what happened - conversations, sessions, meetings. This one is about what exists. Your actual service documentation, the stuff engineers write to explain how systems work.
The problem is obvious once you have the other collections working. You search for "webhook processing" and get back three Cursor sessions where you worked on webhooks, a meeting where you discussed the webhook architecture, and a Claude Code session where you debugged a webhook handler. But you don't get the actual documentation that explains how webhooks work in your system - because that lives in a different repo and isn't indexed.
Service docs sit in the repos where they belong (product/api/docs/, product/dashboard/docs/, product/workers/docs/). That's the right place for them - close to the code, updated alongside it and versioned. But for search, you want them in one place alongside your sessions and meeting notes. When you ask "how does the event publisher work?", you want the architecture doc, the session where you refactored it, and the meeting where the team decided on the design - all in one result set.
The script doesn't transform anything. No parsing, no frontmatter injection, no format conversion. It copies markdown files from source repos into service-docs/, preserving directory structure, organized by service name. That's it.
Why Copy
QMD collections map to directories. You could add each repo's docs/ folder as a separate collection, but then you'd have 3 collections just for documentation, and that number grows every time you add a service. You'd need separate context descriptions for each, separate search flags, and the results would be fragmented across collections.
Copying into one folder means one collection, one set of context descriptions, one search scope. qmd search "saga workflow" -c service-docs searches all service documentation. The service name is preserved in the path (service-docs/workers/design/workflows.md), so you always know where a result came from. Each service subdirectory gets its own QMD context description so search results rank higher when the query matches a service's domain.
qmd://service-docs
SaaS product documentation - API, dashboard, workers service architecture and guides
qmd://service-docs/api
Backend API - workflows, webhooks, ingestion, analysis, shared services
qmd://service-docs/dashboard
Frontend dashboard - management, analytics, feature flags
qmd://service-docs/workers
Event-driven async orchestration - message queue workers, saga workflows

The sync is incremental - it compares file contents and only copies what changed. Re-running is fast. Files deleted from the source repo are cleaned up from the mirror. I work in git worktrees, so the main branch docs are always in a stable state. The sync script reads from the worktree root, not from whatever feature branch I happen to be on.
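The incremental mirror reduces to a hash-compare copy plus a deletion sweep. A sketch (my script also walks the SOURCES dict per service; this shows one source tree):

```python
import hashlib
import shutil
from pathlib import Path

def _digest(p: Path) -> str:
    return hashlib.sha256(p.read_bytes()).hexdigest()

def sync_tree(src: Path, dst: Path) -> None:
    """Mirror src into dst: copy files whose content changed,
    delete mirrored files whose source disappeared."""
    wanted = set()
    for f in src.rglob("*.md"):
        rel = f.relative_to(src)
        wanted.add(rel)
        target = dst / rel
        if not target.exists() or _digest(target) != _digest(f):
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)
    for f in list(dst.rglob("*.md")):  # clean up deletions
        if f.relative_to(dst) not in wanted:
            f.unlink()
```

Comparing content hashes rather than mtimes means a `git checkout` that touches timestamps but not content causes zero copies.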
Adding Services
Edit the SOURCES dict in the script:
SOURCES = {
"api": Path("~/projects/product/api/docs"),
"dashboard": Path("~/projects/product/dashboard/docs"),
"workers": Path("~/projects/product/workers/docs"),
# Add new services here:
"agents-v2": Path("~/projects/product/agents-v2/docs"),
}

Then run sds sync && sds context && qmd update && qmd embed.
Commands
sds="python3 ~/projects/context/skills/sync-service-docs/scripts/service-docs"

| Command | Description |
|---|---|
| sds sync | Sync docs from all source repos (incremental) |
| sds sync --clean | Full re-sync (remove all files first) |
| sds list | List synced documentation by service |
| sds context | Register QMD context descriptions |
Indexing happens automatically on the same hourly cron, same pipeline.
Keeping It Fresh
Now we have four export pipelines, four data sources, and four different tools. If you have to remember to run them manually, you just won't, and the context ends up stale. You /recall something and get results from last week because you forgot to export.
Two Triggers
Each collection's data changes in a different place. Claude Code sessions change while you're in Claude. Cursor sessions change while you're in Cursor. Granola meetings happen on their own schedule. Service docs update when someone pushes to a repo. No single trigger captures all of them.
Hooks are event-driven - Claude Code supports SessionEnd hooks that run when you close a session, so the data gets exported the moment it changes. But hooks only work inside Claude Code. A Cursor session or Granola meeting won't trigger anything.
Cron catches everything else. It runs whether you're in Claude, Cursor, or sleeping. The tradeoff is staleness - data can be up to an hour old. A meeting that ended 5 minutes ago won't show up in /recall until the next cron tick.
The answer is to do both. Hooks for the data that changes during Claude use. Cron for everything else.
Orchestration Script
The first version of automation was a crontab one-liner: qmd update && qmd embed. It only re-indexed files that were already exported - it didn't run the exports at all. To actually keep context fresh, you'd need something like:
cs export --today && crs export --today && grs export --today && sds sync -q && qmd update && qmd embed

That works until it doesn't. If Granola's API is down, everything after && never runs - including the QMD indexing for collections that exported successfully. When you add a sixth collection, you edit the crontab. When you want to debug why cursor-sessions is slow, you rewrite it with time in front of each step. When you want to call the same pipeline from a hook with different options, you duplicate the whole thing.
scripts/context-sync instead is a single orchestration script that solves all of this. Each collection runs independently - one failing doesn't block others. It takes a mode flag that controls which collections run:
python3 scripts/context-sync --hook # Fast: claude-sessions only + qmd update
python3 scripts/context-sync --cron # Full: all 4 exports + qmd update + qmd embed
python3 scripts/context-sync --all # Everything including full re-embed

Every step is timed and logged to logs/context-sync.log:
=== context-sync (cron) 2026-03-10T09:07:00 ===
[claude-sessions] ok (2.1s)
[cursor-sessions] ok (1.8s)
[granola-sessions] ok (3.4s)
[service-docs] ok (5.2s)
[qmd-update] ok (1.0s)
[qmd-embed] ok (12.3s)
=== done ===

One place to add new sources, one place to debug timing, one script callable from both hook and cron.
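The core of such an orchestration script is small: run each step in isolation, time it, and record the status instead of aborting the chain. A hedged sketch - the step commands here are placeholders, not my actual skill paths:

```python
import subprocess
import sys
import time

STEPS = [
    # Hypothetical step registry - the real script gates these on --hook/--cron/--all.
    ("claude-sessions", [sys.executable, "skills/sync-claude-sessions/export", "--today"]),
    ("qmd-update", ["qmd", "update"]),
]

def run_all(steps) -> list[tuple[str, str]]:
    """Run each step independently: a failure is logged but never blocks
    the remaining steps, unlike a chained `&&` one-liner."""
    results = []
    for name, cmd in steps:
        start = time.monotonic()
        try:
            subprocess.run(cmd, check=True, capture_output=True)
            status = "ok"
        except (subprocess.CalledProcessError, FileNotFoundError):
            status = "FAILED"
        print(f"[{name}] {status} ({time.monotonic() - start:.1f}s)")
        results.append((name, status))
    return results
```

Because each step gets its own try/except, a Granola outage still leaves the Claude, Cursor, and docs collections freshly indexed.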
Why SessionEnd
The first idea was syncing on every prompt. Claude Code has a UserPromptSubmit hook - it fires before each message is processed. You'd always have the latest context available for /recall.
The problem is cost. That hook adds 1-2 seconds of latency to every single message. Every prompt, follow-up, "yes do it." It's a tax you pay constantly for a benefit you rarely need - how often do you /recall something from the current session you're still in?
SessionEnd fires once, when you close the session. The session is done. Export it. That's the moment that matters - the next time you start a new session and /recall, the previous conversation is there. My hook fires once at SessionEnd with async: true, so the sync runs in the background after I close the terminal. One call to context-sync --hook handles the Claude session export and QMD re-indexing in one shot.
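For reference, the registration lives in Claude Code's settings file and looks roughly like the fragment below. This is a sketch based on my understanding of the hooks schema - check the current Claude Code hooks documentation for the exact field names before copying it:

```json
{
  "hooks": {
    "SessionEnd": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python3 ~/projects/context/scripts/context-sync --hook",
            "async": true
          }
        ]
      }
    ]
  }
}
```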
What Runs When
Each collection gets the mechanism that matches how its data changes:
| Collection | Hook (SessionEnd) | Cron (hourly) | Why |
|---|---|---|---|
| claude-sessions | Yes | Yes | Data changes during Claude use. SessionEnd captures it immediately. Cron catches any missed sessions. |
| cursor-sessions | - | Yes | Data changes while using Cursor (different app). |
| granola-sessions | - | Yes | Meetings happen independently. --today means only today's meetings are fetched (fast). |
| service-docs | - | Yes | Repo docs change via git push. Hourly sync copies incrementally (only changed files). |
| notes | - | Yes | Direct QMD collection. qmd update picks up file changes automatically. |
The cron runs at minute 7 of every hour:
7 * * * * python3 ~/projects/context/scripts/context-sync --cron >> ~/projects/context/logs/context-sync.log 2>&1

Idempotency
The pipeline runs hourly. If nothing changed since the last run - no new Claude sessions, no Cursor activity, no meetings, no doc updates - it shouldn't re-parse 800+ JSONL files, re-query a 6.6GB SQLite database, re-fetch hundreds of meeting transcripts from an API, and re-copy 400 documentation files.
Every export script checks whether the source data has actually changed before doing any work:
For claude-sessions, it's simple mtime comparison - if the JSONL source hasn't been modified since the last export, skip it entirely.
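That gate is a few lines - skip all parsing and writing when the source file is older than the last export:

```python
from pathlib import Path

def needs_export(source: Path, output: Path) -> bool:
    """Cheap idempotency check: only re-export when the JSONL source
    has been modified after the markdown output was last written."""
    if not output.exists():
        return True
    return source.stat().st_mtime > output.stat().st_mtime
```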
Cursor-sessions compares each composer's lastUpdatedAt timestamp against the output file's mtime. Unchanged composers get skipped before the expensive parts. The workspace map alone requires scanning ~300 workspace SQLite databases (~8 seconds), so that scan is deferred entirely - if no composers need exporting, it never runs.
Granola-sessions does the same mtime check, but the real win is deferring API calls. The script scans the local cache first to find meetings that actually changed. If everything is up to date, it makes zero API calls - no fetch_all_documents, no per-meeting fetch_transcript. Those calls were the whole reason export was slow.
Service-docs and QMD's own update/embed commands were already incremental - content hashes and mtime checks respectively.
A typical "nothing changed" cron run:
=== context-sync (cron) 2026-03-10T23:39:37 ===
[claude-sessions] ok (0.0s) # skipped - no new JSONL changes
[cursor-sessions] ok (7.5s) # DB scan unavoidable (6.6GB), but no writes
[granola-sessions] ok (0.1s) # skipped - no API calls
[service-docs] ok (0.3s) # hash check, no copies
[qmd-update] ok (1.3s) # mtime check, no reindex
[qmd-embed] ok (0.2s) # nothing new to embed
=== done ===

Cursor-sessions is the outlier. Even when every composer is skipped, the script still streams 1,620 JSON blobs from a 6.6GB SQLite database just to check their timestamps. The query itself is the bottleneck - there's no way to filter by lastUpdatedAt in SQL because the timestamp is buried inside a JSON value column. But the expensive work after that - building the workspace map (~8s), fetching messages, writing files - all gets skipped. The 7.5s is the floor for cursor-sessions.
Setting Up QMD
With all the export pipelines producing markdown, I needed to wire them into QMD. The mental model: each folder of markdown becomes a "collection" that QMD indexes separately but searches together.
Collections and Context
I pointed QMD at all five context folders - notes, the four export directories. Each one got a qmd collection add with a name. The interesting part is context descriptions. QMD lets you annotate any level of your directory hierarchy with a human-readable description of what's in there. During search, these descriptions influence ranking - a query about "webhook retries" will naturally score higher against a collection described as "Backend API - workflows, webhooks, shared services" than one described as "Frontend dashboard."
I went deep on this. Every project subfolder within claude-code-sessions and cursor-sessions has its own description. The export scripts generate these automatically from the project catalog, so when I add a new repo the context descriptions update on the next run.
Models and Indexing
The first qmd embed run downloads three GGUF models to your machine (~2GB total). An embeddings model for vector search, a reranker for scoring results, and a query expansion model for generating search variants. After that, qmd update && qmd embed only processes changed files - a typical incremental run finishes in seconds.
Incremental Updates
QMD doesn't watch for file changes on its own, so the context-sync cron described in Keeping It Fresh handles qmd update && qmd embed hourly. Both commands are incremental - they only process what changed.
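The cron entry itself is a one-liner. Paths and log location are illustrative; both commands are cheap no-ops when nothing has changed:

```shell
# config/crontab - hourly incremental sync of the QMD index.
0 * * * * cd ~/context && qmd update && qmd embed >> ~/logs/context-sync.log 2>&1
```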
How Search Works
QMD ships with BM25, vector, and hybrid search - each suited to different kinds of questions.
Search Modes
To show the difference, I searched for "webhook retries" across my collections.
Plain grep would match every file containing those words - including test files, log outputs, and unrelated code comments. Hundreds of hits, no ranking, no way to know which ones actually discuss the retry architecture.
BM25 (qmd search) narrows it down immediately. It scores documents by term frequency and rarity - a focused design doc mentioning "webhook retries" throughout scores higher than a massive session transcript where the phrase appears once in passing. Two seconds, five ranked results, and the top hit was the architecture decision doc from service-docs. No embeddings involved - pure statistical relevance.
Where BM25 falls short: qmd search "event delivery failures" returns nothing if nobody used that exact phrase. The concept is the same as webhook retries, but the words don't match.
Semantic search (qmd vsearch) bridges that gap. It embeds the query and finds documents by meaning, not keywords. I searched qmd vsearch "messages getting lost between services" and it surfaced a Cursor session about the dead letter queue implementation, a meeting where we discussed retry backoff strategy, and a note about idempotency guarantees - none of which contain my search words.
Hybrid search (qmd query) runs both approaches and fuses the results. For most queries, BM25 is enough and fast. I reach for hybrid when searching across meeting transcripts and braindumps, where the exact terminology is unpredictable.
Search Pipeline
QMD's hybrid search (qmd query) runs a multi-stage pipeline combining all three approaches.
Score Normalization
Each search backend produces scores on a different scale:
| Backend | Raw Score | Conversion | Range |
|---|---|---|---|
| FTS (BM25) | SQLite FTS5 BM25 | Math.abs(score) | 0 to ~25+ |
| Vector | Cosine distance | 1 / (1 + distance) | 0.0 to 1.0 |
| Reranker | LLM 0-10 rating | score / 10 | 0.0 to 1.0 |
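The conversion column maps directly to code. A minimal sketch of the normalization step - mirroring the table above, not QMD's actual implementation:

```python
def normalize(backend, raw):
    """Map each backend's raw score onto a comparable scale."""
    if backend == "fts":          # SQLite FTS5 reports BM25 as negative
        return abs(raw)           # 0 to ~25+
    if backend == "vector":       # cosine distance, lower is closer
        return 1.0 / (1.0 + raw)  # 0.0 to 1.0
    if backend == "reranker":     # LLM rates relevance 0-10
        return raw / 10.0         # 0.0 to 1.0
    raise ValueError(f"unknown backend: {backend}")
```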
Fusion Strategy
The query command uses Reciprocal Rank Fusion (RRF) with position-aware blending.
The original query gets doubled for weighting and expanded into one LLM-generated variation. Each variant searches both the FTS and vector indexes in parallel, then all result lists are combined using RRF (score = Σ 1/(k + rank + 1), with k = 60). Documents ranking #1 in any list get a +0.05 bonus, #2-3 get +0.02 - this prevents exact matches from getting diluted by expanded queries. The top 30 candidates go to the LLM reranker (yes/no with logprobs confidence), then a position-aware blend produces the final scores: RRF ranks 1-3 get 75% retrieval / 25% reranker to preserve exact matches, ranks 4-10 shift to 60/40, and ranks 11+ trust the reranker more at 40/60.
Pure RRF can dilute exact matches when expanded queries don't match. The top-rank bonus preserves documents that score #1 for the original query. Position-aware blending prevents the reranker from destroying high-confidence retrieval results.
| Score | Meaning |
|---|---|
| 0.8 - 1.0 | Highly relevant |
| 0.5 - 0.8 | Moderately relevant |
| 0.2 - 0.5 | Somewhat relevant |
| 0.0 - 0.2 | Low relevance |
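The fusion scheme described above fits in a few lines. This is a sketch of the mechanics - 0-based ranks so the denominator is k + rank + 1, a top-rank bonus, and a position-aware blend - not QMD's actual code:

```python
def rrf_fuse(result_lists, k=60):
    """Reciprocal Rank Fusion with a top-rank bonus.

    result_lists: one ranked list of doc ids per query variant/backend.
    Returns doc ids ordered by fused score, best first.
    """
    scores = {}
    for results in result_lists:
        for rank, doc in enumerate(results):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
            if rank == 0:
                scores[doc] += 0.05   # preserve exact-match winners
            elif rank <= 2:
                scores[doc] += 0.02
    return sorted(scores, key=scores.get, reverse=True)

def blend(rrf_rank, retrieval_score, reranker_score):
    """Position-aware blend: trust retrieval near the top, reranker below."""
    if rrf_rank <= 3:
        w = 0.75   # ranks 1-3: protect exact matches
    elif rrf_rank <= 10:
        w = 0.60
    else:
        w = 0.40   # deep results: let the reranker rescue them
    return w * retrieval_score + (1 - w) * reranker_score
```

A document that tops multiple lists compounds both its RRF contribution and its bonuses, which is exactly what keeps the original query's best hit at #1.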
Embedding Flow
Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection.
Smart Chunking
Instead of cutting at hard token boundaries, QMD uses a scoring algorithm to find natural markdown break points. This keeps semantic units (sections, paragraphs, code blocks) together.
| Pattern | Score | Description |
|---|---|---|
| `# Heading` | 100 | H1 - major section |
| `## Heading` | 90 | H2 - subsection |
| `### Heading` | 80 | H3 |
| `#### Heading` | 70 | H4 |
| `##### Heading` | 60 | H5 |
| `###### Heading` | 50 | H6 |
| `` ``` `` | 80 | Code block boundary |
| `---` / `***` | 60 | Horizontal rule |
| Blank line | 20 | Paragraph boundary |
| `- item` / `1. item` | 5 | List item |
| Line break | 1 | Minimal break |
The algorithm:
- Scan the document for all break points and their scores
- When approaching the 900-token target, search a 200-token window before the cutoff
- Score each candidate: finalScore = baseScore × (1 - (distance/window)² × 0.7)
- Cut at the highest-scoring break point
The squared distance decay means a heading 200 tokens back (score ~30) still beats a simple line break at the target (score 1), but a closer heading wins over a distant one.
Code blocks get special treatment - break points inside code fences are ignored, keeping code together. If a code block exceeds the chunk size, it's kept whole when possible.
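The scoring step is small enough to show directly. A sketch of the decay formula and the pick-the-best-candidate logic, with candidate generation and code-fence handling left out:

```python
def break_score(base_score, distance, window=200):
    """Distance-decayed score for a candidate break point.

    distance: tokens between the candidate and the 900-token target.
    Implements finalScore = baseScore * (1 - (distance/window)**2 * 0.7).
    """
    return base_score * (1 - (distance / window) ** 2 * 0.7)

def best_break(candidates, window=200):
    """Pick the break point with the highest decayed score.

    candidates: (base_score, distance_from_target) pairs, e.g.
    (100, 180) for an H1 heading 180 tokens before the target.
    """
    return max(candidates, key=lambda c: break_score(c[0], c[1], window))
```

Plugging in the example from the text: an H1 a full window back scores 100 × (1 - 0.7) = 30, which still beats a line break (score 1) sitting exactly at the target.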
QMD Reference
Beyond search, QMD has a few commands I use constantly. You can annotate any level of your collection hierarchy with descriptions using qmd:// paths - I set these for every project subfolder so searches know what each folder contains. qmd status is my go-to for checking index health, and qmd update --pull is handy when service docs live in remote repos.
For retrieval, I mostly use qmd get with the docid from search results or a file path. The -c flag restricts searches to a single collection, which I use when I know the answer is in service-docs and don't want meeting transcripts cluttering the results. --explain shows the full score breakdown - useful when debugging why a document ranks where it does.
Deploy and Skills
The context system has one more problem: configuration sprawl.
The hook config lives in ~/.claude/settings.json. The skills live in ~/.claude/skills/. The cron schedule lives in the crontab. The shell aliases live in ~/.zshrc. Four different places, none of them version controlled. You edit a hook timeout, forget what you changed, and can't get back. You set up a new machine and spend an hour remembering which files go where.
I moved everything into the repo.
config/settings.json is the Claude Code settings file - hooks, statusline, preferences. config/crontab is the cron schedule. config/aliases.sh has the shell aliases. skills/ has the skill implementations. Everything tracked in git, diffable, and committable.
scripts/deploy is the bridge between the repo and the filesystem. It copies everything from where it's tracked to where it needs to live:
python3 scripts/deploy            # Copy everything to its destination
python3 scripts/deploy --dry-run  # Preview what would change

What it deploys:
| Source | Destination | What |
|---|---|---|
config/settings.json | ~/.claude/settings.json | Hooks, statusline, preferences |
skills/* | ~/.claude/skills/* | All 6 skills (recall, search, sync-claude-sessions, sync-cursor-sessions, granola, sync-service-docs) |
config/crontab | system crontab | Hourly context-sync |
config/aliases.sh | sourced from ~/.zshrc | Shell aliases (cs, crs, grs, sds) |
The workflow: edit source files in the repo, commit, run deploy. If something breaks, git diff shows what changed. Roll back with git checkout. Set up a new machine: clone, deploy, done.
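The core of such a deploy script is a mapping plus a copy loop. A simplified sketch - the real script also installs the crontab and wires aliases into ~/.zshrc, which this version omits, and the target paths here mirror the table above:

```python
import shutil
from pathlib import Path

# Repo-relative source -> home-relative destination (subset of the real map).
TARGETS = {
    "config/settings.json": ".claude/settings.json",
    "skills": ".claude/skills",
}

def deploy(repo_root, home=Path.home(), dry_run=False):
    """Copy tracked config from the repo to its live locations.

    Returns the list of (src, dst) pairs that were (or would be) deployed.
    """
    deployed = []
    for src_rel, dst_rel in TARGETS.items():
        src = Path(repo_root) / src_rel
        dst = Path(home) / dst_rel
        if not src.exists():
            continue  # skip targets the repo doesn't have yet
        if not dry_run:
            if src.is_dir():
                shutil.copytree(src, dst, dirs_exist_ok=True)
            else:
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)
        deployed.append((str(src), str(dst)))
    return deployed
```

Because the repo is the single source of truth, the copy only ever flows one way - which is what makes git diff and git checkout sufficient for rollback.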
/recall
This is the skill I reach for every morning. I open a new Claude Code session and type /recall yesterday before anything else. It gives me a table of every session from the previous day - timestamps, message counts, what I was doing. 47 sessions from one Tuesday. I pick the three that matter, drill in, and Claude has the full context of where I left off.
But the date-based mode only works when I know when something happened. Half the time I'm looking for a topic, not a date. That's where QMD comes in. I type /recall compliance rules engine and it fans out across all five collections - comes back with the evaluator architecture from a Claude Code session three weeks ago, a Cursor session where I wrote the jurisdiction tests, and meeting notes from when we discussed the override logic. All of that from one command, ranked by relevance, in under a minute.
The graph mode I use less often but it's the most visually satisfying. It renders an interactive HTML page where each session is a node, colored by day, connected to every file it touched. You immediately see clusters - oh, these seven sessions all touched the same three files, that's the webhook refactor. That isolated cluster over there is the blog work. Older sessions fade out, recent ones glow purple.
I asked "what is the biggest thing I implemented last month?" and it searched across all five collections:
❯ /recall what is the biggest thing i implemented last month? Rules engine
⏺ Let me search for rules-engine-related work from last month.
⏺ Bash(qmd search "rules engine implementation" -c claude-code-sessions -n 5)
⎿ qmd://claude-code-sessions/my-product/api/2026-03-11-5a751a41.md
Title: Implement the following plan: Refactor rules_engine → evaluation_engine
… +35 lines
⏺ Bash(qmd search "rules engine" -c notes -n 5)
⎿ qmd://notes/my-product/industries/readme.md
Title: Product - Strategic Overview & Research Index
… +34 lines
⏺ Bash(qmd search "rules engine" -c granola-sessions -n 3)
⎿ qmd://granola-sessions/2026-02-19-92999391.md
Title: Weekly Sync
… +35 lines
⏺ Bash(qmd search "rules engine" -c cursor-sessions -n 5)
⎿ qmd://cursor-sessions/my-product/api/2025-12-08-8e3ac8eb.md
Title: Implement ticket ingestion rules
… +31 lines
⏺ Bash(python3 recall-day.py list "last 30 days" --min-msgs 3)
⎿ Sessions for 2026-02-10 to 2026-03-12
… +206 lines
⏺ Now let me fetch the most relevant sessions to understand the biggest
implementation.
⏺ Bash(qmd get "qmd://claude-code-sessions/my-product/api/2026-03-11-..." -l 80)
⎿ … +78 lines
⏺ Here's what I found:
Biggest Implementation Last Month
The biggest thing you built in February–March is ...
✻ Churned for 44s

One question, and it searched Claude Code sessions, Cursor sessions, meeting transcripts from Granola, personal notes, and research docs. It pulled context from sessions I'd forgotten about, meetings where the architecture was discussed, and notes where I'd sketched out the approach months earlier. Without this, I'd have spent 20 minutes explaining to Claude what the rules engine even is.
/search
The lightweight sibling. Mid-conversation I'll realize I need to check how we handle webhook retries, but I don't want the full recall workflow with its synthesis and "One Thing" recommendation. /search webhook retries gives me BM25 snippets inline and I keep moving. I use it dozens of times a day - quick lookups that would otherwise mean opening another terminal and grepping.
Export Skills
The other skills handle the data pipeline. I have one for Claude Code sessions and one for Cursor - both produce the same markdown format so QMD treats them identically. The Claude one also lets me resume old sessions, add timestamped annotations, and mark things done. The Cursor one is export-only since Cursor doesn't expose hooks for live sync. I run the Claude exports automatically via the SessionEnd hook. Cursor exports I trigger manually or let the hourly cron handle.
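The common markdown format is what makes the two pipelines interchangeable to QMD. A sketch of the conversion step, assuming a simplified message schema (role, content, timestamp per JSONL line) rather than the real Claude Code or Cursor schemas:

```python
import json
from datetime import datetime

def session_to_markdown(jsonl_path):
    """Render a JSONL session transcript as markdown.

    Each input line is assumed to be a JSON object with 'role',
    'content', and a Unix 'timestamp' - a simplification of the real
    session schemas. Both export skills target this output shape so
    QMD indexes Claude Code and Cursor sessions identically.
    """
    parts = ["# Session\n"]
    with open(jsonl_path) as f:
        for raw in f:
            msg = json.loads(raw)
            ts = datetime.fromtimestamp(msg["timestamp"]).strftime("%H:%M")
            parts.append(f"## {msg['role']} ({ts})\n\n{msg['content']}\n")
    return "\n".join(parts)
```

The per-message H2 headings also play nicely with the chunker, which scores headings as the strongest break points.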
Results
Index Scale
| Metric | Value |
|---|---|
| Total documents indexed | 2,398 |
| Total vector embeddings | 41,420 chunks |
| Index size (SQLite) | 312 MB |
| Model cache | ~2 GB (includes 3 GGUF models) |
At ~500 tokens per chunk, that's roughly 20 million tokens of searchable context.
Local Models
Three GGUF models power the entire pipeline, all running locally on an M4 Pro with Metal GPU acceleration:
| Model | Purpose | Size |
|---|---|---|
| EmbeddingGemma 300M | Embedding generation | ~300M params |
| Qwen3-Reranker 0.6B (Q8_0) | Cross-encoder reranking | 639 MB |
| qmd-query-expansion 1.7B (Q4_K_M) | Query expansion for qmd query | 1.28 GB |
Embedding 4 queries takes ~486ms. Everything runs locally.
Cross-Collection Search
A search for "authentication middleware" returns results from multiple collections at once:
| Score | Source | File |
|---|---|---|
| 0.90 | service-docs | api/slack/slack-authentication-flow.md |
| 0.90 | service-docs | api/slack/slack-bolt-integration-guide.md |
| 0.90 | cursor-sessions | my-product/dashboard/2025-12-19-...md |
Service docs and coding sessions surfacing together for the same query. Without this system, you'd search each tool's history separately.
A hybrid query for "what did I work on yesterday" shows the query expansion in action:
Expanding query...
├─ what did I work on yesterday (original)
├─ lex: tracking personal tasks (lexical expansion)
├─ lex: what were my (lexical expansion)
├─ vec: tracking personal tasks (vector query)
├─ vec: what were my responsibilities yesterday (vector query)
└─ hyde: The topic of what did I work on yesterday covers... (hypothetical document)

Six sub-queries generated, 4 embedded (486ms), 40 chunks reranked. The top 3 results come from three different collections - a Claude Code session, personal notes, and a meeting transcript. The system surfaces relevant context regardless of where it was captured.
What's Still Rough
This isn't a polished product. A few things still bother me.
Cursor-sessions has a 7.5-second floor even when nothing changed, because the script has to stream 1,620 JSON blobs from a 6.6GB database just to check timestamps. The Granola database is encrypted - if they ever change the API, I lose access to summaries entirely. And there are now a dozen-plus "memory for coding agents" projects popping up - Total Recall [10], Ghost [11], Engram [12], and others - all solving the same problem differently. The space is moving fast and the right abstractions haven't settled yet.
But even with the rough edges, the core loop works. I start a session, run /recall yesterday, and see 47 sessions reconstructed with timelines and message counts. I search for "compliance rules engine" and get back the evaluator design, the test plan, the jurisdiction loader - all from different collections, all ranked by relevance. I type a natural language question and the pipeline expands it into six sub-queries, searches both keyword and semantic indexes, re-ranks with a local LLM, and returns results in under a second.
You'll always find another data source to crack open, another edge case in the sync pipeline, another collection to add. But the alternative - starting every session from scratch - is worse. So you build the layer, automate it, and keep going.
References
Footnotes
1. Tobias Lutke: "QMD - Query Markup Documents", GitHub, 2025
2. Tobias Lutke is the CEO of Shopify and creator of QMD. See Douglas Laney: "Viral Shopify CEO Manifesto Says AI Now Mandatory", Forbes, 2025
3. node-llama-cpp: "Node.js bindings for llama.cpp", GitHub
4. Philipp Schmid: "The new skill in AI is not prompting, it's context engineering", 2025
5. Anthropic: "Effective context engineering for AI agents", 2026
6. ArtemXTech: "Personal OS Skills", GitHub - original post
7. Anthropic: "Model Context Protocol - Introduction", 2025
8. Granola: "granola.ai", AI meeting notepad
9. getprobo: "Reverse Engineering Granola API", GitHub
10. Dave Goldblatt: "Total Recall - Persistent memory for Claude Code", GitHub, 2026
11. notkurt: "Ghost - Claude Code session capture with QMD", GitHub, 2026
12. Engram: "Engram - Memory MCP server", 2026