@houtini/voice-analyser 1.3.1 → 1.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,19 +1,18 @@
- # Voice Analyser
+ # Voice Analyser MCP
 
  [![npm version](https://img.shields.io/npm/v/@houtini/voice-analyser)](https://www.npmjs.com/package/@houtini/voice-analyser)
+ [![Known Vulnerabilities](https://snyk.io/test/github/houtini-ai/voice-analyser-mcp/badge.svg)](https://snyk.io/test/github/houtini-ai/voice-analyser-mcp)
  [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 
- > **Experimental library** for extracting statistical voice models from your published writing. Generates immersive style guides that teach LLMs to replicate how you actually write - not through rules, but through examples and rhythm patterns.
+ > MCP server that analyses your published writing and generates executable style guides for voice-matched content creation.
 
- ## Why This Exists
+ ## What This Does
 
- Traditional style guides list rules: "Use short sentences. Vary paragraph length. Include personal anecdotes."
+ I built this because traditional style guides don't work. They tell you "use short sentences" and "vary paragraph length" - rules that sound helpful but produce robotic output when you try to follow them.
 
- This doesn't work. Writers don't follow rules - they channel voice.
+ This tool extracts statistical patterns from your published writing and generates a style guide that teaches through **zero tolerance rules, phrase libraries, and validation checklists** rather than vague principles.
 
- This tool extracts the statistical fingerprint of *your writing* and presents it as immersive examples with annotations showing *what makes each passage feel human*. The goal is voice replication through pattern recognition, not rule compliance.
-
- **Status:** Experimental. The approach works but is under active development.
+ Version 1.4.0 focuses on executable instructions: forbidden word lists with alternatives, 50+ actual phrases from your corpus, and checkbox validation that catches AI slop before you publish.
 
  ## Installation
 
@@ -32,217 +31,191 @@ Add to your `claude_desktop_config.json`:
  }
  ```
 
- **Config locations:**
+ **Config file locations:**
  - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
  - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
  - Linux: `~/.config/Claude/claude_desktop_config.json`
 
  Restart Claude Desktop after saving.
 
- ### Requirements
-
- - Node.js 20+
+ **Requirements:** Node.js 20+
 
  ## Quick Start
 
- ### 1. Create an Output Directory
+ ### 1. Create Output Directory
 
- First, create a directory where your corpus and analysis will be stored:
+ Pick a directory for corpus storage and analysis:
 
  ```
  C:\writing\voice-models\ (Windows)
  ~/writing/voice-models/ (Mac/Linux)
  ```
 
- This directory will contain:
- - Collected articles (markdown)
- - Analysis JSON files
- - Generated voice guides
+ This holds collected articles, analysis JSON files, and generated style guides.
 
- ### 2. Collect Your Writing
+ ### 2. Collect Writing Corpus
 
+ In Claude Desktop:
  ```
  Collect corpus from https://yoursite.com/post-sitemap.xml
  Save as "my-voice" in "C:\writing\voice-models"
  ```
 
- The tool needs:
- - `sitemap_url` - Your XML sitemap URL
- - `output_name` - A name for this corpus (e.g., "my-voice", "blog-posts")
- - `output_dir` - The directory you created above
+ Parameters:
+ - `sitemap_url` - XML sitemap URL
+ - `output_name` - Corpus identifier (e.g., "my-voice")
+ - `output_dir` - Directory you created above
+ - `max_articles` - Optional limit (default: 100)
 
- **Example with all parameters:**
- ```
- Collect corpus from https://example.com/post-sitemap.xml
- Output name: "technical-writing"
- Output directory: "C:\writing\voice-models"
- Maximum articles: 50
- ```
+ The tool crawls your sitemap, extracts clean content, and saves markdown files.
 
- ### 3. Analyse Patterns
+ ### 3. Run Analysis
 
  ```
  Analyse corpus "my-voice" in directory "C:\writing\voice-models"
  ```
 
- This runs 14 analysers covering vocabulary, sentence structure, voice markers, argument flow, and paragraph transitions.
+ This runs 15+ analysers covering:
+ - Vocabulary tiers (AI slop detection, formality scoring)
+ - Phrase extraction (opening patterns, transitions, caveats)
+ - Sentence structure and rhythm
+ - Voice markers and conversational devices
+ - Punctuation habits
 
- ### 4. Generate Voice Guide
+ ### 4. Generate Style Guide v4
 
  ```
- Generate narrative guide for "my-voice" in directory "C:\writing\voice-models"
+ Generate style guide for "my-voice" in directory "C:\writing\voice-models"
  ```
 
- Creates an immersive style guide with annotated examples at:
- `C:\writing\voice-models\my-voice\writing_style_my-voice_narrative.md`
+ Creates an example-first guide at:
+ `C:\writing\voice-models\my-voice\writing_style_my-voice.md`
 
- ## Using the Voice Guide
+ ## What v1.4.0 Changed
 
- Once generated, the voice guide can be loaded into any LLM conversation to help it write in your voice.
+ Previous versions generated statistical analysis that was accurate but not useful for writing. v1.4.0 restructures the output:
 
- ### Loading the Guide
+ **Before:** 60% statistics, 40% guidance
+ **After:** 70% examples, 30% statistics
 
- ```
- Load the file C:\writing\voice-models\my-voice\writing_style_my-voice_narrative.md
- and use it as a reference for all writing in this conversation.
- ```
+ ### New Style Guide Structure
 
- ### Example Prompts for Content Generation
+ **Part 1: Zero Tolerance Rules**
+ - Forbidden vocabulary (AI slop) with alternatives
+ - Formal words flagged with casual replacements
+ - Punctuation rules (em-dash detection)
 
- **Blog post:**
- ```
- Using the voice guide as your reference, write a blog post about [topic].
-
- Key requirements:
- - Match the sentence rhythm patterns shown in the examples
- - Use the conversational devices naturally (not forced)
- - Include the micro-rhythms: mid-thought pivots, embedded uncertainty, present-tense immediacy
- - Vary sentence length as shown in the statistical targets
- - Use British/American spelling as indicated in the guide
- ```
+ **Part 2: Phrase Library (50+ Examples)**
+ - Opening patterns (personal story, direct action, protective warnings)
+ - Equipment references (possessive vs generic)
+ - Caveat phrases (honesty markers)
+ - Transition patterns
 
- **Technical article:**
- ```
- Reference the voice guide and write a technical explanation of [concept].
-
- Channel the voice by:
- - Opening with the pattern shown in "Opening Moves" section
- - Using specific product/tool names, not generic references
- - Including admissions of complexity or uncertainty where authentic
- - Following the argument flow patterns from the guide
- - Matching the punctuation habits (especially dash usage)
- ```
+ **Part 3: Sentence Patterns**
+ - Rhythm variation targets with corpus examples
+ - First-person usage frequency
+ - Natural sentence flow demonstrations
+
+ **Part 4: Validation Checklist**
+ - Critical rules (must pass)
+ - Voice match rules (should pass)
+ - Actionable checkbox format
+
+ **Part 5: Quick Reference**
+ - Top phrases by frequency
+ - Statistics summary
+
+ ## Using the Style Guide
+
+ Load the generated guide into Claude conversations:
 
- **Product review:**
  ```
- Using the loaded voice guide, write a review of [product].
-
- Capture the voice by:
- - Starting with personal context (why you tested this)
- - Blending technical specs with practical implications
- - Using the transition patterns between paragraphs
- - Including the "human tells" - parenthetical asides, mid-thought corrections
- - Ending with the closing patterns shown in examples
+ Load C:\writing\voice-models\my-voice\writing_style_my-voice.md
+ and use it to write [content type] about [topic]
  ```
 
- **Email/communication:**
- ```
- Write an email about [subject] using the voice patterns from the guide.
+ The guide includes validation checklists. After Claude writes, run:
 
- Focus on:
- - Conversational markers appearing naturally
- - Sentence length variation (some punchy, some complex)
- - The hedging/confidence balance shown in statistics
- - First-person usage matching the corpus frequency
+ ```
+ Check what you just wrote against the style guide validation checklist.
+ Report any violations.
  ```
 
- ### Validation After Writing
+ ### Critical Validation Rules
 
- The guide includes statistical targets. After writing, check:
+ The guide flags these as must-pass:
+ - Zero AI slop words (delve, leverage, unlock, seamless, robust)
+ - Zero em-dashes if corpus doesn't use them
+ - British/American spelling consistency
+ - Equipment named specifically (not "the product")
 
- ```
- Review what you just wrote against the voice guide metrics:
- - Does sentence length variation match the target standard deviation?
- - Is first-person frequency within the expected range?
- - Are conversational markers present but not overused?
- - Does the rhythm feel like the extended examples?
- ```
+ ### Voice Match Validation
 
- ## What Gets Analysed
+ The guide checks these as should-pass:
+ - First-person frequency matches target (~0.8 per 100 words typical)
+ - Sentence length varies wildly (5-word to 40-word sentences)
+ - Honest caveats present ("It's not perfect", "I wish I'd...")
+ - Opening follows corpus patterns
 
- ### Core Voice Patterns
- - **Vocabulary** - Word choice, British/American markers, domain specificity
- - **Sentence structure** - Length distribution, openers, complexity patterns
- - **Voice markers** - First-person usage, hedging language, conversational markers
- - **Punctuation** - Dash types, comma density, parenthetical frequency
+ ## Analysis Output
 
- ### Argument & Flow Patterns
- - **Argument flow** - How you open, build, and close arguments
- - **Paragraph transitions** - How ideas connect across paragraphs
- - **Conversational devices** - "look", "frankly", "actually" and when they appear
+ The tool generates these JSON files in `corpus-name/analysis/`:
 
- ### Micro-Rhythm Detection
- The guide annotates examples with invisible patterns that make writing feel human:
- - Mid-thought pivots (comma before "and", "but", "so")
- - Present-tense immediacy ("Right now, it's...")
- - Embedded uncertainty ("I think", "probably")
- - Casual sentence starters ("So,", "And,", "But,")
- - Parenthetical asides
- - Punchy fragments contrasting with longer sentences
+ **Core Analysis:**
+ - `vocabulary.json` - Word choice, domain terms, British/American markers
+ - `sentence.json` - Length distribution, complexity patterns
+ - `voice.json` - First-person usage, hedging language, conversational markers
+ - `paragraph.json` - Structure and transition patterns
+ - `punctuation.json` - Dash types, comma density, parenthetical frequency
 
- ## Output Structure
+ **v1.4.0 Additions:**
+ - `vocabulary-tiers.json` - AI slop detection, formality scoring with alternatives
+ - `phrase-library.json` - 50+ extracted phrases organized by type
 
- ```
- your-output-directory/
- └── corpus-name/
-     ├── articles/           # Collected markdown files
-     ├── corpus.json         # Metadata
-     ├── analysis/           # JSON analysis files
-     │   ├── vocabulary.json
-     │   ├── sentence.json
-     │   ├── voice.json
-     │   ├── paragraph.json
-     │   ├── punctuation.json
-     │   ├── function-words.json
-     │   ├── argument-flow.json
-     │   └── paragraph-transitions.json
-     └── writing_style_[name]_narrative.md   # The voice guide
- ```
+ **Advanced Analysis:**
+ - `function-words.json` - Z-scores for style fingerprinting
+ - `anti-mechanical.json` - Naturalness scoring
+ - `argument-flow.json` - How arguments open, build, close
+ - `paragraph-transitions.json` - Cross-paragraph connection patterns
+ - `specificity-patterns.json` - Possessive vs generic references
 
- ## Minimum Corpus Size
+ ## Minimum Corpus Requirements
 
  - **Minimum:** 15,000 words (~20 articles)
- - **Recommended:** 30,000 words
+ - **Recommended:** 30,000 words (~40 articles)
  - **Ideal:** 50,000+ words
 
- Below 15k words, statistical patterns become unreliable.
+ Below 15k words, statistical patterns become unreliable. The phrase library needs volume to find frequently-used patterns.
 
- ## Tools Reference
+ ## MCP Tools Reference
 
  ### collect_corpus
 
- | Parameter | Required | Description |
- |-----------|----------|-------------|
- | `sitemap_url` | Yes | XML sitemap URL |
- | `output_name` | Yes | Corpus identifier (e.g., "my-voice") |
- | `output_dir` | Yes | Directory to store corpus |
- | `max_articles` | No | Limit (default: 100) |
- | `article_pattern` | No | Regex filter for URLs |
+ Crawls sitemap and collects clean writing corpus.
+
+ **Parameters:**
+ - `sitemap_url` (required) - XML sitemap URL
+ - `output_name` (required) - Corpus identifier
+ - `output_dir` (required) - Storage directory
+ - `max_articles` (optional) - Limit, default 100
+ - `article_pattern` (optional) - Regex URL filter
 
  ### analyze_corpus
 
- | Parameter | Required | Description |
- |-----------|----------|-------------|
- | `corpus_name` | Yes | Name from collect_corpus |
- | `corpus_dir` | Yes | Directory containing corpus |
- | `analysis_type` | No | full, quick, vocabulary, syntax |
+ Runs linguistic analysis on collected corpus.
+
+ **Parameters:**
+ - `corpus_name` (required) - Name from collect_corpus
+ - `corpus_dir` (required) - Directory containing corpus
+ - `analysis_type` (optional) - full, quick, vocabulary, syntax (default: full)
+
+ ### generate_style_guide
 
- ### generate_narrative_guide
+ Generates v4 executable style guide.
 
- | Parameter | Required | Description |
- |-----------|----------|-------------|
- | `corpus_name` | Yes | Name from analyze_corpus |
- | `corpus_dir` | Yes | Directory containing corpus |
+ **Parameters:**
+ - `corpus_name` (required) - Name from analyze_corpus
+ - `corpus_dir` (required) - Directory containing analysis
 
  ## Development
 
@@ -253,13 +226,37 @@ npm install
  npm run build
  ```
 
- ## Limitations
+ Local development mode in Claude Desktop config:
+
+ ```json
+ {
+   "mcpServers": {
+     "voice-analysis": {
+       "command": "node",
+       "args": ["C:\\path\\to\\mcp-server-voice-analysis\\dist\\index.js"]
+     }
+   }
+ }
+ ```
+
+ ## Known Limitations
+
+ - Needs XML sitemap (RSS feeds not supported)
+ - Works best with single-author content
+ - Mixed authorship weakens statistical signals
+ - Heavily edited content produces less distinct voice patterns
+ - Transition phrase detection currently returns sparse results (being improved)
+
+ ## What's Next
 
- - Requires XML sitemap (RSS feeds not currently supported)
- - Works best with consistent single-author content
- - Mixed authorship or heavily edited content produces weaker signals
- - The approach is experimental - results vary by writing style
+ v1.5.0 planned features:
+ - Automated text validation against corpus
+ - Real-time writing feedback
+ - Custom forbidden vocabulary per corpus
+ - Improved transition phrase detection
 
  ---
 
- Apache License 2.0 - [Houtini.ai](https://houtini.ai)
+ **License:** Apache 2.0
+ **Author:** [Houtini](https://houtini.ai)
+ **Version:** 1.4.0
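
The README's must-pass rule "zero AI slop words" can be checked mechanically. A minimal sketch in plain JavaScript - the five words come from the README's own example list; the shipped guide's corpus-specific list may differ:

```javascript
// Flag AI-slop vocabulary in a draft. Word list mirrors the README's
// examples; a real style guide would supply its own list.
const AI_SLOP = ['delve', 'leverage', 'unlock', 'seamless', 'robust'];

function findSlop(text) {
  // Tokenise to lowercase words, then keep only slop words that appear.
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  return AI_SLOP.filter(slop => words.includes(slop));
}

console.log(findSlop("Let's delve into this seamless, robust workflow."));
// → [ 'delve', 'seamless', 'robust' ]
```

A draft passes the critical check only when the returned list is empty.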
@@ -0,0 +1,30 @@
+ /**
+  * Phrase Extraction Analysis
+  *
+  * Extracts frequently-used phrases for direct imitation.
+  * Focuses on opening patterns, transitions, equipment references, and caveats.
+  */
+ export interface PhraseExample {
+   phrase: string;
+   count: number;
+   context?: string;
+ }
+ export interface PhraseLibrary {
+   openingPatterns: {
+     personalStory: PhraseExample[];
+     directAction: PhraseExample[];
+     protectiveWarning: PhraseExample[];
+   };
+   transitionPhrases: PhraseExample[];
+   equipmentReferences: {
+     withPossessive: PhraseExample[];
+     generic: PhraseExample[];
+   };
+   caveatPhrases: PhraseExample[];
+   totalPhrases: number;
+ }
+ /**
+  * Main phrase extraction function
+  */
+ export declare function extractPhrases(text: string): PhraseLibrary;
+ //# sourceMappingURL=phrase-extraction.d.ts.map
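
The style guide's "Top phrases by frequency" view can be derived from the `PhraseLibrary` shape declared above. A self-contained sketch - the sample library is invented for illustration:

```javascript
// Flatten a PhraseLibrary-shaped object into one list ranked by count,
// roughly what a "top phrases" quick-reference view needs.
function topPhrases(library, limit = 3) {
  const all = [
    ...library.openingPatterns.personalStory,
    ...library.openingPatterns.directAction,
    ...library.openingPatterns.protectiveWarning,
    ...library.transitionPhrases,
    ...library.equipmentReferences.withPossessive,
    ...library.equipmentReferences.generic,
    ...library.caveatPhrases,
  ];
  return all.sort((a, b) => b.count - a.count).slice(0, limit);
}

// Invented sample data matching the declared interface.
const sample = {
  openingPatterns: {
    personalStory: [{ phrase: "I've been testing", count: 4 }],
    directAction: [{ phrase: "Right, let's start", count: 2 }],
    protectiveWarning: [],
  },
  transitionPhrases: [{ phrase: 'from there', count: 6 }],
  equipmentReferences: {
    withPossessive: [{ phrase: 'my rig', count: 5 }],
    generic: [],
  },
  caveatPhrases: [{ phrase: "it's not perfect", count: 1 }],
};

console.log(topPhrases(sample).map(p => p.phrase));
// → [ 'from there', 'my rig', "I've been testing" ]
```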
@@ -0,0 +1,175 @@
+ /**
+  * Phrase Extraction Analysis
+  *
+  * Extracts frequently-used phrases for direct imitation.
+  * Focuses on opening patterns, transitions, equipment references, and caveats.
+  */
+ import nlp from 'compromise';
+ /**
+  * Extract n-grams (2-6 word phrases) from text
+  */
+ function extractNGrams(text, minLength = 2, maxLength = 6) {
+   const words = text.toLowerCase()
+     .replace(/[—–]/g, '-')
+     .split(/\s+/)
+     .filter(w => w.length > 0);
+   const ngramCounts = new Map();
+   for (let n = minLength; n <= maxLength; n++) {
+     for (let i = 0; i <= words.length - n; i++) {
+       const ngram = words.slice(i, i + n).join(' ');
+       ngramCounts.set(ngram, (ngramCounts.get(ngram) || 0) + 1);
+     }
+   }
+   return ngramCounts;
+ }
+ /**
+  * Extract opening patterns from paragraphs
+  */
+ function extractOpeningPatterns(text) {
+   const paragraphs = text.split(/\n\n+/);
+   const openingSentences = [];
+   for (const para of paragraphs) {
+     const sentences = para.split(/[.!?]+/).filter(s => s.trim().length > 10);
+     if (sentences.length > 0) {
+       openingSentences.push(sentences[0].trim());
+     }
+   }
+   const personalStory = [];
+   const directAction = [];
+   const protectiveWarning = [];
+   const personalMarkers = /\b(i've|i'm|i was|i have|for me|my|when i|how i)\b/i;
+   const actionMarkers = /\b(right|so|now|let's|first|start|here's|okay)\b/i;
+   const warningMarkers = /\b(before|make sure|important|note|remember|warning|careful)\b/i;
+   for (const sentence of openingSentences) {
+     const lower = sentence.toLowerCase();
+     if (personalMarkers.test(lower)) {
+       personalStory.push({ phrase: sentence, count: 1 });
+     } else if (actionMarkers.test(lower)) {
+       directAction.push({ phrase: sentence, count: 1 });
+     } else if (warningMarkers.test(lower)) {
+       protectiveWarning.push({ phrase: sentence, count: 1 });
+     }
+   }
+   return { personalStory, directAction, protectiveWarning };
+ }
+ /**
+  * Extract transition phrases (between paragraphs or at sentence starts)
+  */
+ function extractTransitionPhrases(text) {
+   const sentences = text.split(/[.!?]+/).filter(s => s.trim().length > 10);
+   const transitionMarkers = [
+     'at this', 'once you', 'once we', 'at that', 'after that', 'from there',
+     'next up', 'moving on', 'the key', 'the thing', 'in fact', 'actually',
+     'however', 'though', 'still', 'meanwhile', 'alternatively',
+     'for instance', 'for example', 'essentially', 'basically'
+   ];
+   const transitions = new Map();
+   for (const sentence of sentences) {
+     const lower = sentence.toLowerCase().trim();
+     for (const marker of transitionMarkers) {
+       if (lower.startsWith(marker)) {
+         const match = sentence.match(new RegExp(`^${marker}[^,]*`, 'i'));
+         if (match) {
+           const phrase = match[0].trim();
+           transitions.set(phrase, (transitions.get(phrase) || 0) + 1);
+         }
+       }
+     }
+   }
+   return Array.from(transitions.entries())
+     .filter(([_, count]) => count >= 1)
+     .map(([phrase, count]) => ({ phrase, count }))
+     .sort((a, b) => b.count - a.count);
+ }
+ /**
+  * Extract equipment references with possessives vs generic
+  */
+ function extractEquipmentReferences(text) {
+   const equipmentWords = [
+     'pc', 'rig', 'card', 'gpu', 'block', 'pad', 'thermal', 'cooler',
+     'radiator', 'pump', 'fan', 'case', 'system', 'setup', 'hardware',
+     'device', 'product', 'equipment', 'unit', 'tool'
+   ];
+   const withPossessive = new Map();
+   const generic = new Map();
+   const doc = nlp(text);
+   for (const word of equipmentWords) {
+     const possessiveMatches = doc.match(`(my|our) ${word}`);
+     possessiveMatches.forEach((match) => {
+       const phrase = match.text().toLowerCase();
+       withPossessive.set(phrase, (withPossessive.get(phrase) || 0) + 1);
+     });
+     const genericMatches = doc.match(`(the|this|that|a) ${word}`);
+     genericMatches.forEach((match) => {
+       const phrase = match.text().toLowerCase();
+       generic.set(phrase, (generic.get(phrase) || 0) + 1);
+     });
+   }
+   return {
+     withPossessive: Array.from(withPossessive.entries())
+       .map(([phrase, count]) => ({ phrase, count }))
+       .sort((a, b) => b.count - a.count)
+       .slice(0, 30),
+     generic: Array.from(generic.entries())
+       .map(([phrase, count]) => ({ phrase, count }))
+       .sort((a, b) => b.count - a.count)
+       .slice(0, 20)
+   };
+ }
+ /**
+  * Extract caveat and honesty phrases
+  */
+ function extractCaveatPhrases(text) {
+   const caveatMarkers = [
+     "it isn't", "it's not", "not perfect", "not ideal", "could be better",
+     "i wish", "would have", "should have", "probably", "might not",
+     "may not", "doesn't always", "won't always", "can be", "tends to",
+     "in my case", "for me", "your mileage"
+   ];
+   const caveats = new Map();
+   const sentences = text.split(/[.!?]+/).filter(s => s.trim().length > 10);
+   for (const sentence of sentences) {
+     const lower = sentence.toLowerCase();
+     for (const marker of caveatMarkers) {
+       if (lower.includes(marker)) {
+         const match = sentence.match(new RegExp(`[^.!?]*${marker}[^.!?]*`, 'i'));
+         if (match) {
+           const phrase = match[0].trim();
+           if (phrase.length > 10 && phrase.length < 100) {
+             caveats.set(phrase, (caveats.get(phrase) || 0) + 1);
+           }
+         }
+       }
+     }
+   }
+   return Array.from(caveats.entries())
+     .filter(([_, count]) => count >= 1)
+     .map(([phrase, count]) => ({ phrase, count }))
+     .sort((a, b) => b.count - a.count)
+     .slice(0, 20);
+ }
+ /**
+  * Main phrase extraction function
+  */
+ export function extractPhrases(text) {
+   const openingPatterns = extractOpeningPatterns(text);
+   const transitionPhrases = extractTransitionPhrases(text);
+   const equipmentReferences = extractEquipmentReferences(text);
+   const caveatPhrases = extractCaveatPhrases(text);
+   const totalPhrases = openingPatterns.personalStory.length +
+     openingPatterns.directAction.length +
+     openingPatterns.protectiveWarning.length +
+     transitionPhrases.length +
+     equipmentReferences.withPossessive.length +
+     equipmentReferences.generic.length +
+     caveatPhrases.length;
+   return {
+     openingPatterns,
+     transitionPhrases,
+     equipmentReferences,
+     caveatPhrases,
+     totalPhrases
+   };
+ }
+ //# sourceMappingURL=phrase-extraction.js.map
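
Every extractor above ends with the same aggregation idiom: tally phrases in a `Map`, then convert to a `{ phrase, count }` list sorted by frequency. Isolated as a self-contained sketch (no `compromise` dependency needed):

```javascript
// The count-then-rank idiom shared by the extractors above:
// tally occurrences in a Map, emit PhraseExample-shaped objects,
// sort descending by count, and truncate to the top N.
function rankPhrases(phrases, topN = 5) {
  const counts = new Map();
  for (const p of phrases) {
    counts.set(p, (counts.get(p) || 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([phrase, count]) => ({ phrase, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, topN);
}

console.log(rankPhrases(['for me', 'in my case', 'for me', 'probably', 'for me']));
// → [ { phrase: 'for me', count: 3 }, { phrase: 'in my case', count: 1 }, { phrase: 'probably', count: 1 } ]
```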
@@ -0,0 +1,27 @@
+ /**
+  * Vocabulary Tier Analysis
+  *
+  * Identifies formal vs casual vocabulary to help writers avoid
+  * using words that are too formal for their authentic voice.
+  */
+ export interface VocabularyTier {
+   word: string;
+   count: number;
+   category: 'verb' | 'adjective' | 'adverb';
+   formality: 'formal' | 'ai-slop';
+   suggestedAlternatives: string[];
+ }
+ export interface VocabularyTierAnalysis {
+   formalVerbs: VocabularyTier[];
+   formalAdjectives: VocabularyTier[];
+   formalAdverbs: VocabularyTier[];
+   aiSlop: VocabularyTier[];
+   totalFormalWords: number;
+   formalityScore: number;
+   recommendations: string[];
+ }
+ /**
+  * Analyze vocabulary formality in text
+  */
+ export declare function analyzeVocabularyTiers(text: string): VocabularyTierAnalysis;
+ //# sourceMappingURL=vocabulary-tiers.d.ts.map
+ //# sourceMappingURL=vocabulary-tiers.d.ts.map