Review Intelligence
Architecture Deep Dive

How to Actually Build the Customer Voice Skill

Three ways to structure your review data inside Mind OS — each optimized for different types of questions your team will ask. Based on Anthropic's own research, Karpathy's LLM Wiki pattern, and how your existing skills are built.

Can You Ask "Give Me 5 Amazing Reviews About :Rise"?

Short answer: yes, but only if the data is structured right. Here's why it matters:

Claude's context window is like a desk. You can spread a lot of papers on it (~500 pages worth). Claude can find specific items on that desk — but there's a catch called the "lost in the middle" problem.

The Science: Stanford and UC Berkeley found that AI models pay the most attention to what's at the beginning and end of the context — and their accuracy drops 30%+ for information buried in the middle. So if you dump 1,000 raw reviews in and ask for the best ones about taste, Claude will reliably find reviews near the top and bottom, but might miss great ones in the middle.

The fix isn't to use fewer reviews. It's to organize them so Claude knows where to look — with tags, headers, and an index. Anthropic's own testing found three techniques that dramatically improve retrieval:

  • Structure the data with XML-style tags and metadata, so reviews can be matched by tag instead of by position
  • Put the longform data at the top of the prompt, before the question and instructions
  • Ask Claude to quote the relevant passages first, then answer: in Anthropic's testing, this raised retrieval accuracy from 27% to 98%

All three options below use these techniques. The difference is how they organize the files.

How Much Review Data Actually Fits

Claude's context window is 200K tokens. But it's not all yours — the system, conversation, and other skills take up space. Here's the realistic budget:

What's Using the Space                            Tokens
System + tools + other skills                    ~20,000
Your conversation with Claude                    ~10,000
Claude's "thinking room" (needs space to reason) ~20,000
Available for review data                       ~150,000

A typical customer review (100-200 words + metadata tags) is about 200-300 tokens. That means the available budget holds roughly 500-750 curated reviews at once, far more than any single question should ever need to load.

Key insight: Anthropic recommends filling no more than ~60% of the context window. Past that, reasoning quality degrades. So the skill should be designed to load only what's relevant to each question — not everything every time.
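As a sanity check, the budget above works out in a few lines. All numbers are the estimates from the table, not measured values:

```python
# Back-of-envelope context budget for the review skill.
CONTEXT_WINDOW = 200_000
OVERHEAD = 20_000 + 10_000 + 20_000    # system/tools, conversation, thinking room
AVAILABLE = CONTEXT_WINDOW - OVERHEAD  # tokens left for review data

# A typical tagged review runs 200-300 tokens.
fits_low = AVAILABLE // 300    # conservative estimate of reviews that fit
fits_high = AVAILABLE // 200   # optimistic estimate

# Anthropic's ~60% guidance, applied to the whole window:
SAFE_BUDGET = int(CONTEXT_WINDOW * 0.60)
```

Running this gives 150K tokens available and roughly 500-750 reviews of theoretical capacity, while the 60% guidance caps total fill (data plus everything else) near 120K tokens. That gap is exactly why the skill should load files selectively rather than everything at once.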

How to Structure It

Option 01: Flexible

The Wiki — One File Per Topic

Inspired by Karpathy's LLM Wiki pattern. Every topic gets its own file — one for each product, one for each theme, one for each persona. Each file has a summary at the top and curated raw reviews below. An index file tells Claude where everything lives.

mudwtr-customer-voice/
  SKILL.md — instructions for Claude
  references/
    index.md — master catalog of all files
    products/
      rise.md — :rise summary + top 40 reviews
      rest.md
      balance.md
      starter-kit.md
    themes/
      taste.md — cross-product taste insights + reviews
      energy.md
      ritual.md
      quitting-coffee.md
    personas/
      coffee-quitter.md
      wellness-optimizer.md
      ritual-seeker.md
    swipe/
      ad-hooks.md — pre-built hooks from review language
      testimonial-blocks.md
      objection-handling.md

Example Question
"Give me 5 amazing reviews about :rise that talk about energy benefits"
Claude reads index.md, loads products/rise.md + themes/energy.md, finds tagged reviews that match both :rise AND energy, returns 5 verbatim reviews with context.
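That lookup is easy to picture in code. Here is a minimal sketch, assuming a hypothetical tagged-review format inside the reference files; the tag names and attributes are illustrative, not a fixed spec:

```python
import re

# Hypothetical tagged-review format inside a file like products/rise.md:
RISE_MD = """\
<review product=":rise" themes="energy,taste" rating="5">
Swapped my morning coffee for :rise and my energy is steady all day. No crash.
</review>
<review product=":rise" themes="ritual" rating="4">
The frothing step is my favorite part of the morning now.
</review>
"""

def find_reviews(text: str, product: str, theme: str, limit: int = 5):
    """Return up to `limit` verbatim reviews matching product AND theme tags."""
    pattern = re.compile(
        r'<review product="([^"]+)" themes="([^"]+)"[^>]*>\n(.*?)\n</review>',
        re.DOTALL,
    )
    hits = []
    for prod, themes, body in pattern.findall(text):
        if prod == product and theme in themes.split(","):
            hits.append(body.strip())
    return hits[:limit]

matches = find_reviews(RISE_MD, ":rise", "energy")
```

Claude does this matching natively rather than via regex, but the point is the same: because each review carries explicit tags, retrieval depends on the tags, not on where the review happens to sit in the file.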

Pros

  • Handles any type of question — product, theme, persona, or creative
  • Claude loads only relevant files (good token efficiency)
  • Each file is small and focused (avoids the "lost in the middle" problem)
  • Easy to update one file without touching others
  • Follows the Karpathy pattern used by leading AI practitioners

Cons

  • Most files to build upfront (~18-20 files)
  • Some reviews appear in multiple files (a great :rise taste review is in both rise.md and taste.md)
  • Claude has to decide which files to load — needs clear routing instructions
  • More maintenance when refreshing with new reviews

Option 02: Simple

The Product Bible — One Big File Per Product

The simplest approach. Each product gets one comprehensive file containing everything: summary, themes, persona notes, AND all the curated reviews for that product. Plus one cross-product file for brand-level insights and swipe files.

mudwtr-customer-voice/
  SKILL.md — instructions for Claude
  references/
    index.md — file catalog
    rise-reviews.md — everything about :rise (summary + 80 best reviews)
    rest-reviews.md
    balance-reviews.md
    starter-kit-reviews.md
    brand-overview.md — cross-product themes, personas, swipe file

Example Question
"Give me 5 amazing reviews about :rise that talk about energy benefits"
Claude loads rise-reviews.md, scans the "Energy" section header, finds tagged reviews, returns 5 verbatim reviews. Simple and fast.
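The same idea for the single-file layout: split the product file on its section headers and pull the reviews under the matching one. A minimal sketch, assuming a hypothetical layout for rise-reviews.md with "## Theme" headers and one bulleted review per line:

```python
# Hypothetical section layout inside rise-reviews.md:
RISE_REVIEWS_MD = """\
## Taste
- "Tastes like chai hot chocolate. I don't miss coffee at all."

## Energy
- "Steady focus until 5pm, zero jitters."
- "I stopped crashing at 2pm within the first week."
"""

def section(text: str, header: str) -> list[str]:
    """Return the bulleted reviews under the `## <header>` section."""
    current, out = None, []
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
        elif current == header and line.startswith("- "):
            out.append(line[2:].strip())
    return out

energy = section(RISE_REVIEWS_MD, "Energy")
```

Section headers make the scan simple, but note the tradeoff: the whole file still has to be in context before the scan happens, which is where the larger files of this option cost tokens.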

Pros

  • Fewest files — easiest to build and maintain (5-6 files total)
  • Dead simple mental model: "all :rise stuff is in one place"
  • No duplication — each review appears once
  • Fewer routing decisions for Claude to make
  • Mirrors your existing mudwtr-products skill structure (one file per product)

Cons

  • Files get large (8,000-15,000 tokens each) — higher "lost in the middle" risk
  • Cross-product questions need multiple files loaded at once
  • Always loads the full product file, even for narrow questions
  • Persona and swipe file content gets buried inside a big file

Side-by-Side

Factor                      01: Wiki                          02: Product Bible              03: Hybrid
Files to build              ~18-20                            ~5-6                           ~12-14
"Give me 5 real reviews"    Yes (scattered across files)      Yes (in one big file)          Yes (dedicated vault layer)
"Write me ad hooks"         Needs to generate from reviews    Needs to generate from reviews Pre-built in playbooks
Tokens per typical query    ~8-15K                            ~10-15K                        ~5-10K
"Lost in the middle" risk   Low (small files)                 Medium (large files)           Low (small, layered files)
Cross-product questions     Good (theme files span products)  Needs multiple files           Good (theme-map spans products)
Maintenance effort          Higher (many files)               Lowest                         Medium
Speed of common asks        Medium                            Medium                         Fastest (playbooks are pre-built)

Recommendation: Option 3 — The Hybrid

The three-layer design directly maps to how your team will actually use this data. Your creative strategist asking for ad hooks is a different workflow than your designer asking for real testimonial quotes, which is different from your brand strategist building a persona. The hybrid gives each of them a fast path.

The Playbooks layer is the secret weapon. Your most common requests — "give me hooks," "give me testimonials," "help me handle this objection" — are pre-built and ready to go. Claude doesn't need to re-derive them from raw reviews every time. It just serves them up.

And yes — "Give me 5 amazing reviews about :rise that talk about benefits" works. Claude loads the vault, scans the tags, and returns real verbatim reviews. The tagging and indexing techniques from Anthropic's research make this reliable, not a gamble.

The total data footprint (~63K tokens for everything, ~5-10K for a typical question) leaves your team plenty of room to have a real back-and-forth conversation with Claude on top of the data. No context window crunch.

Research Sources

  • Anthropic — Long Context Window Prompting Tips
  • Anthropic — Context Engineering for AI Agents
  • Anthropic — Using XML Tags to Structure Prompts
  • Anthropic — "Quote First" Retrieval Technique (27% to 98%)
  • Stanford/UC Berkeley — Lost in the Middle: How Language Models Use Long Contexts
  • Karpathy — LLM Wiki Knowledge Base Pattern (GitHub Gist)
  • VentureBeat — Karpathy's LLM Knowledge Base Architecture
  • MindStudio — LLM Wiki vs RAG Comparison
  • DTCskills — How to Run Your Ecommerce Brand with Claude
  • Multi-Needle Retrieval in Large Context Windows (2025)
  • Long Context vs RAG for LLMs (2025)