How to Actually Build the Customer Voice Skill
Three ways to structure your review data inside Mind OS — each optimized for different types of questions your team will ask. Based on Anthropic's own research, Karpathy's LLM Wiki pattern, and how your existing skills are built.
Can You Ask "Give Me 5 Amazing Reviews About :Rise"?
Short answer: yes, but only if the data is structured right. Here's why it matters:
Claude's context window is like a desk. You can spread a lot of papers on it (roughly 500 pages' worth). Claude can find specific items on that desk — but there's a catch called the "lost in the middle" problem.
The Science: Stanford and UC Berkeley found that AI models pay the most attention to what's at the beginning and end of the context — and their accuracy drops 30%+ for information buried in the middle. So if you dump 1,000 raw reviews in and ask for the best ones about taste, Claude will reliably find reviews near the top and bottom, but might miss great ones in the middle.
The fix isn't to use fewer reviews. It's to organize them so Claude knows where to look — with tags, headers, and an index. Anthropic's own testing found three techniques that dramatically improve retrieval:
- Tagging each review with metadata (product, theme, star rating) — improves accuracy up to 40%
- Asking Claude to "quote the relevant reviews first, then answer" — improved accuracy from 27% to 98% in Anthropic's tests
- Putting a table of contents / index at the top — gives Claude a map so it goes directly to the right section instead of scanning everything
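The first technique — metadata tagging — can be sketched in a few lines. This is an illustrative format, not an official schema: the attribute names (`id`, `product`, `rating`, `themes`) are assumptions, but XML-style tags like this are what Claude filters on reliably.

```python
# Sketch: wrap each raw review in lightweight XML-style tags so Claude can
# filter by product, theme, or star rating. Attribute names are illustrative.

def tag_review(review_id, product, rating, themes, text):
    """Format one review with machine-scannable metadata tags."""
    theme_attr = ",".join(themes)
    return (
        f'<review id="{review_id}" product="{product}" '
        f'rating="{rating}" themes="{theme_attr}">\n'
        f"{text.strip()}\n"
        f"</review>"
    )

block = tag_review(
    "R-042", ":rise", 5, ["taste", "energy"],
    "Replaced my morning coffee and I feel calm, focused energy all day.",
)
print(block)
```

Every review in every reference file would get this treatment once, at build time — the tags cost a few tokens per review and pay for themselves on every query.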
All three options below use these techniques. The difference is how they organize the files.
How Much Review Data Actually Fits
Claude's context window is 200K tokens. But it's not all yours — the system, conversation, and other skills take up space. Here's the realistic budget:
| What's Using the Space | Tokens |
|---|---|
| System + tools + other skills | ~20,000 |
| Your conversation with Claude | ~10,000 |
| Claude's "thinking room" (needs space to reason) | ~20,000 |
| Available for review data | ~150,000 |
A typical customer review (100-200 words + metadata tags) is about 200-300 tokens. That means:
- ~500-750 raw reviews could fit if that's ALL you loaded
- ~300-400 reviews is the comfortable sweet spot (leaves headroom)
- Summaries + personas + swipe files + 200 curated reviews = ~63,000 tokens (about 42% of the 150K budget, under a third of the full window)
Key insight: Anthropic recommends filling no more than ~60% of the context window. Past that, reasoning quality degrades. So the skill should be designed to load only what's relevant to each question — not everything every time.
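The budget arithmetic above can be checked with the common ~4 characters/token heuristic. Actual counts vary by tokenizer, so treat this as an estimate only; the overhead figures come straight from the table above, and the 60% soft cap is the recommendation just mentioned.

```python
# Back-of-the-envelope token budgeting using the rough ~4 chars/token
# heuristic (real counts vary by tokenizer; estimate only).

CONTEXT_WINDOW = 200_000
OVERHEAD = 20_000 + 10_000 + 20_000    # system/tools, conversation, thinking room
BUDGET = CONTEXT_WINDOW - OVERHEAD     # ~150K usable for review data
SOFT_CAP = int(CONTEXT_WINDOW * 0.60)  # stay under ~60% of the window overall

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits(files: dict) -> bool:
    """True if the candidate files fit the review-data budget and soft cap."""
    data_tokens = sum(estimate_tokens(t) for t in files.values())
    return data_tokens <= BUDGET and (data_tokens + OVERHEAD) <= SOFT_CAP

sample = {"vault/rise-reviews.md": "x" * 24_000}  # ~6K tokens
print(fits(sample))
```

A skill built this way can sanity-check itself: run the estimator over whatever set of files a question would load and confirm the total stays comfortably inside the cap.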
How to Structure It
Option 1: The Wiki — One File Per Topic
Inspired by Karpathy's LLM Wiki pattern. Every topic gets its own file — one for each product, one for each theme, one for each persona. Each file has a summary at the top and curated raw reviews below. An index file tells Claude where everything lives.
```
SKILL.md                      — instructions for Claude
references/
  index.md                    — master catalog of all files
  products/
    rise.md                   — :rise summary + top 40 reviews
    rest.md
    balance.md
    starter-kit.md
  themes/
    taste.md                  — cross-product taste insights + reviews
    energy.md
    ritual.md
    quitting-coffee.md
  personas/
    coffee-quitter.md
    wellness-optimizer.md
    ritual-seeker.md
  swipe/
    ad-hooks.md               — pre-built hooks from review language
    testimonial-blocks.md
    objection-handling.md
```
Pros
- Handles any type of question — product, theme, persona, or creative
- Claude loads only relevant files (good token efficiency)
- Each file is small and focused (avoids the "lost in the middle" problem)
- Easy to update one file without touching others
- Follows the Karpathy pattern used by leading AI practitioners
Cons
- Most files to build upfront (~18-20 files)
- Some reviews appear in multiple files (a great :rise taste review is in both rise.md and taste.md)
- Claude has to decide which files to load — needs clear routing instructions
- More maintenance when refreshing with new reviews
Option 2: The Product Bible — One Big File Per Product
The simplest approach. Each product gets one comprehensive file containing everything: summary, themes, persona notes, AND all the curated reviews for that product. Plus one cross-product file for brand-level insights and swipe files.
```
SKILL.md                  — instructions for Claude
references/
  index.md                — file catalog
  rise-reviews.md         — everything about :rise (summary + 80 best reviews)
  rest-reviews.md
  balance-reviews.md
  starter-kit-reviews.md
  brand-overview.md       — cross-product themes, personas, swipe file
```
Pros
- Fewest files — easiest to build and maintain (5-6 files total)
- Dead simple mental model: "all :rise stuff is in one place"
- No duplication — each review appears once
- Fewer routing decisions for Claude to make
- Mirrors your existing mudwtr-products skill structure (one file per product)
Cons
- Files get large (8,000-15,000 tokens each) — higher "lost in the middle" risk
- Cross-product questions need multiple files loaded at once
- Always loads the full product file, even for narrow questions
- Persona and swipe file content gets buried inside a big file
Option 3: The Hybrid — Three Layers by Purpose
Combines the best of both approaches. Three layers, each designed for a different type of question: Insights (what do customers think?), Playbooks (give me creative output), and Vault (give me actual reviews). Claude loads only the layer(s) needed.
```
SKILL.md                      — instructions + routing logic
references/
  index.md                    — master catalog

  # LAYER 1: INSIGHTS (~25K tokens)
  insights/
    product-summaries.md      — 1-page review summary per product
    theme-map.md              — all major themes + key findings + star quotes
    customer-personas.md      — 4-5 persona cards from review patterns
    brand-language.md         — actual words customers use, by emotion/topic

  # LAYER 2: PLAYBOOKS (~8K tokens)
  playbooks/
    ad-hooks.md               — 50 hooks from review language, tagged by product
    testimonial-blocks.md     — 30 copy-paste review blocks for ads/emails
    objection-handling.md     — top 10 objections + real customer responses

  # LAYER 3: VAULT (~30K tokens)
  vault/
    rise-reviews.md           — best 50 verbatim :rise reviews, tagged
    rest-reviews.md           — best 50 verbatim :rest reviews
    balance-reviews.md
    starter-kit-reviews.md
    negative-reviews.md       — honest critical reviews (for objection handling)
```
How Claude Routes Questions
| If the Team Asks... | Claude Loads... | Tokens Used |
|---|---|---|
| "What do customers think about :rise?" | insights/product-summaries.md | ~5K |
| "Write me 10 ad hooks about energy" | playbooks/ad-hooks.md + insights/brand-language.md | ~10K |
| "Give me 5 amazing reviews about :rise" | vault/rise-reviews.md | ~6K |
| "Build me a persona for the Coffee Quitter" | insights/customer-personas.md + insights/brand-language.md | ~8K |
| "What objections do people have before buying?" | playbooks/objection-handling.md + vault/negative-reviews.md | ~10K |
| "Write a full landing page using customer voice" | All three layers | ~63K |
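In practice the routing table is prose in SKILL.md, but the logic it encodes is simple enough to sketch. The keyword sets and file paths here mirror the table and are assumptions — the real instructions would be richer (product detection, multi-intent questions):

```python
# Sketch of the routing logic SKILL.md describes in prose: match the
# question against simple intent keywords, return the files to load.
# Keyword lists and paths are illustrative, not a real implementation.

ROUTES = [
    ({"hook", "hooks", "ad"},          ["playbooks/ad-hooks.md", "insights/brand-language.md"]),
    ({"objection", "objections"},      ["playbooks/objection-handling.md", "vault/negative-reviews.md"]),
    ({"persona"},                      ["insights/customer-personas.md", "insights/brand-language.md"]),
    ({"reviews", "testimonial"},       ["vault/rise-reviews.md"]),  # product resolved separately
    ({"think", "feedback", "summary"}, ["insights/product-summaries.md"]),
]

def route(question: str) -> list:
    """Return the reference files to load for a given team question."""
    words = set(question.lower().replace("?", "").split())
    for keywords, files in ROUTES:
        if words & keywords:
            return files
    return ["index.md"]  # fall back to the map and let Claude decide

print(route("Give me 5 amazing reviews about :rise"))
```

The fallback matters: when no route matches, Claude loads only the index and picks files itself, which still keeps the first hop cheap.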
Why this structure answers your question: The Vault layer is specifically designed for "give me actual reviews" requests. Each review in the vault is tagged with product, star rating, themes, and sentiment — so Claude can filter precisely. The SKILL.md instructions tell Claude to "always quote reviews by ID before answering" — the technique Anthropic found improves retrieval from 27% to 98% accuracy.
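The quote-first instruction is just prompt scaffolding, and a sketch makes it concrete. The exact wording below is illustrative — what matters is the order: vault content first, then the quote-by-ID instruction, then the question.

```python
# Sketch: the "quote relevant reviews first, then answer" scaffold that
# SKILL.md would instruct Claude to follow. Prompt wording is illustrative.

def quote_first_prompt(question: str, vault_block: str) -> str:
    """Assemble a prompt: tagged reviews, then the quote-first instruction."""
    return (
        f"{vault_block}\n\n"
        "First, quote the reviews relevant to the question below, citing each "
        "by its id attribute. Then answer using only the quoted reviews.\n\n"
        f"Question: {question}"
    )
```

Forcing the quotes first means every claim in the answer is anchored to a specific tagged review — which is what turns retrieval from a gamble into something auditable.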
Pros
- Best token efficiency — most questions use only 5-10K tokens
- Pre-built playbooks make the most common asks (hooks, testimonials) instant
- Vault gives you verbatim reviews on demand — answers "give me 5 real reviews"
- Each layer is independently updatable
- Follows both the Karpathy pattern AND Anthropic's own guidance
- Scales — can easily add more vault files or insight docs over time
- Claude only loads what's needed, keeping reasoning quality high
Cons
- More upfront work than Option 2 (~12-14 files vs 5-6)
- Routing logic in SKILL.md needs to be written carefully
- Some review content appears in both insights (as quotes) and vault (as full reviews)
Side-by-Side
| Factor | 01: Wiki | 02: Product Bible | 03: Hybrid |
|---|---|---|---|
| Files to build | ~18-20 | ~5-6 | ~12-14 |
| "Give me 5 real reviews" | Yes (scattered across files) | Yes (in one big file) | Yes (dedicated vault layer) |
| "Write me ad hooks" | Generated on the fly from reviews | Generated on the fly from reviews | Pre-built in playbooks |
| Tokens per typical query | ~8-15K | ~10-15K | ~5-10K |
| "Lost in the middle" risk | Low (small files) | Medium (large files) | Low (small, layered files) |
| Cross-product questions | Good (theme files span products) | Needs multiple files | Good (theme-map spans products) |
| Maintenance effort | Higher (many files) | Lowest | Medium |
| Speed of common asks | Medium | Medium | Fastest (playbooks are pre-built) |
Recommendation: Option 3 — The Hybrid
The three-layer design directly maps to how your team will actually use this data. Your creative strategist asking for ad hooks is a different workflow than your designer asking for real testimonial quotes, which is different from your brand strategist building a persona. The hybrid gives each of them a fast path.
The Playbooks layer is the secret weapon. Your most common requests — "give me hooks," "give me testimonials," "help me handle this objection" — are pre-built and ready to go. Claude doesn't need to re-derive them from raw reviews every time. It just serves them up.
And yes — "Give me 5 amazing reviews about :rise that talk about benefits" works. Claude loads the vault, scans the tags, and returns real verbatim reviews. The tagging and indexing techniques from Anthropic's research make this reliable, not a gamble.
The total data footprint (~63K tokens for everything, ~5-10K for a typical question) leaves your team plenty of room to have a real back-and-forth conversation with Claude on top of the data. No context window crunch.