@ijonis/geo-lint 0.1.1 → 0.1.2

package/README.md CHANGED
@@ -1,32 +1,22 @@
1
1
  # @ijonis/geo-lint
2
2
 
3
- **An agentic SEO and GEO linter. AI coding agents run it, read the violations, fix your content, and re-lint -- hands off.**
3
+ **The first open-source linter for GEO (Generative Engine Optimization). Validates your content for AI search visibility -- then lets your AI agent fix it automatically.**
4
4
 
5
5
  [![npm version](https://img.shields.io/npm/v/@ijonis/geo-lint)](https://www.npmjs.com/package/@ijonis/geo-lint)
6
6
  [![CI](https://img.shields.io/github/actions/workflow/status/ijonis/geo-lint/ci.yml?branch=main&label=CI)](https://github.com/IJONIS/geo-lint/actions)
7
7
  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/IJONIS/geo-lint/blob/main/LICENSE)
8
8
 
9
+ ![geo-lint demo](docs/demo.gif)
10
+
9
11
  ---
10
12
 
11
13
  ## Why this exists
12
14
 
13
- Traditional SEO tools give you a report. You read it, figure out what to change, edit the file, re-run, repeat. That loop is manual, slow, and breaks down at scale.
14
-
15
- `@ijonis/geo-lint` is built for a different loop:
16
-
17
- ```
18
- Agent runs geo-lint → reads JSON violations → fixes the content → re-runs geo-lint → done
19
- ```
20
-
21
- **You don't fix the violations. Your AI agent does.** The linter is the rule engine that tells the agent exactly what's wrong and how to fix it. Every rule ships with a machine-readable `fixStrategy` and a `suggestion` field that agents consume directly. The JSON output has zero ANSI formatting -- pure structured data.
15
+ **GEO (Generative Engine Optimization)** is the practice of structuring content so AI search engines cite it -- ChatGPT, Perplexity, Google AI Overviews, Gemini. Traditional SEO gets you into search result lists. GEO gets you **cited in AI-generated answers**. They require different content patterns, and no existing open-source tool checks for GEO.
22
16
 
23
- This works today with Claude Code, Cursor, Windsurf, Copilot, or any agent that can run shell commands and edit files.
17
+ `@ijonis/geo-lint` is built for an agentic workflow: your AI agent runs the linter, reads the JSON violations, fixes the content, and re-lints until clean. Every rule ships with a machine-readable `suggestion` and `fixStrategy` that agents consume directly.
24
18
 
25
- ### What about GEO?
26
-
27
- **GEO (Generative Engine Optimization)** is the practice of optimizing content so it gets cited by AI search engines -- ChatGPT, Perplexity, Google AI Overviews, Gemini. When someone asks an AI a question, the model pulls from web content to build its answer. GEO makes your content the source it pulls from.
28
-
29
- Traditional SEO gets you into search result lists. GEO gets you **cited in AI-generated answers**. They're complementary, but GEO requires structural changes that no existing SEO tool checks for. `@ijonis/geo-lint` validates both -- 32 SEO rules, **35 dedicated GEO rules**, and **14 content quality rules** including readability analysis inspired by Yoast SEO -- with zero open-source alternatives for the GEO checks.
19
+ **92 rules: 35 GEO, 32 SEO, 14 content quality, 8 technical, 3 i18n.** Readability analysis inspired by Yoast SEO. Zero open-source alternatives for the GEO checks.
30
20
 
31
21
  ---
32
22
 
@@ -51,45 +41,24 @@ export default defineConfig({
51
41
  });
52
42
  ```
53
43
 
54
- Run it manually:
44
+ Run it:
55
45
 
56
46
  ```bash
57
- npx geo-lint
47
+ npx geo-lint # Human-readable output
48
+ npx geo-lint --format=json # Machine-readable for AI agents
58
49
  ```
59
50
 
60
- Or let your agent handle it -- see [Agent Integration](#agent-integration) below.
51
+ Works out of the box with `.md`/`.mdx` files. For Astro, HTML, or other formats, see [Custom Adapters](docs/custom-adapters.md).
61
52
 
62
53
  ---
63
54
 
64
- ## GEO Rules
65
-
66
- No other open-source linter checks for these. 35 rules across E-E-A-T signals, content structure, freshness, and RAG optimization -- each targeting a specific content pattern that AI search engines use when deciding what to cite. When your agent fixes a GEO violation, it's directly increasing the probability that the content gets pulled into AI-generated answers.
67
-
68
- > **New in 0.1.1:** 14 content quality rules now include transition word analysis, consecutive sentence start detection, and sentence length variety scoring -- readability checks inspired by Yoast SEO, built for the agentic lint-fix loop.
69
-
70
- ### Core GEO Rules (7 rules)
71
-
72
- ### 1. `geo-no-question-headings`
55
+ ## GEO in Action
73
56
 
74
- **At least 20% of H2/H3 headings should be phrased as questions.**
57
+ Three examples of what GEO rules catch and how to fix them. See [all 7 core GEO rules with examples](docs/geo-rules.md).
75
58
 
76
- LLMs match user queries against headings to find relevant sections. Question-formatted headings create a direct mapping between what users ask and what your content answers.
59
+ ### `geo-weak-lead-sentences`
77
60
 
78
- **Before:**
79
- ```markdown
80
- ## Benefits of Remote Work
81
- ```
82
-
83
- **After:**
84
- ```markdown
85
- ## What are the benefits of remote work?
86
- ```
87
-
88
- ### 2. `geo-weak-lead-sentences`
89
-
90
- **At least 50% of sections should start with a direct answer, not filler.**
91
-
92
- AI systems use the first sentence after a heading as the citation snippet. Filler openings like "In this section, we will explore..." get skipped in favor of content that leads with the answer.
61
+ AI systems use the first sentence after a heading as the citation snippet. Filler openings get skipped.
93
62
 
94
63
  **Before:**
95
64
  ```markdown
@@ -108,11 +77,9 @@ dynamically allocates compute resources per request, eliminating the
108
77
  need to provision or manage servers.
109
78
  ```
110
79
 
111
- ### 3. `geo-low-citation-density`
112
-
113
- **Content needs at least 1 statistical data point per 500 words.**
80
+ ### `geo-low-citation-density`
114
81
 
115
- AI answers prefer citable claims backed by numbers. A post that says "performance improved significantly" is less likely to be cited than one that says "performance improved by 47% in load testing."
82
+ AI answers prefer citable claims backed by numbers. Vague statements get passed over.
116
83
 
117
84
  **Before:**
118
85
  ```markdown
@@ -126,39 +93,9 @@ exceeding 50,000 lines of code, according to a 2023 study by
126
93
  Microsoft Research.
127
94
  ```
128
95
 
129
- ### 4. `geo-missing-faq-section`
130
-
131
- **Long posts (800+ words) should include an FAQ section.**
96
+ ### `geo-missing-table`
132
97
 
133
- FAQ sections are extracted verbatim by AI systems more than any other content structure. A well-written FAQ at the bottom of a post can generate more AI citations than the rest of the article combined.
134
-
135
- **Before:**
136
- ```markdown
137
- ## Conclusion
138
-
139
- TypeScript is a valuable tool for large teams.
140
- ```
141
-
142
- **After:**
143
- ```markdown
144
- ## FAQ
145
-
146
- ### Is TypeScript worth learning in 2026?
147
-
148
- Yes. TypeScript is used by 78% of professional JavaScript developers
149
- and is required in most enterprise job listings.
150
-
151
- ### Does TypeScript slow down development?
152
-
153
- Initial setup adds overhead, but teams report 15-25% faster
154
- iteration after the first month due to fewer runtime errors.
155
- ```
156
-
157
- ### 5. `geo-missing-table`
158
-
159
- **Long posts (1000+ words) should include at least one data table.**
160
-
161
- Tables are highly structured and unambiguous, which makes them ideal for AI extraction. Research shows that content with comparison tables is cited 2.5x more frequently in AI-generated answers than equivalent content without tables.
98
+ Tables are highly structured and unambiguous -- ideal for AI extraction. Content with comparison tables is cited significantly more often than equivalent prose.
162
99
 
163
100
  **Before:**
164
101
  ```markdown
@@ -176,57 +113,9 @@ components at build time.
176
113
  | Svelte | Compile-time | 1.6 KB | Low |
177
114
  ```
178
115
 
179
- ### 6. `geo-short-citation-blocks`
180
-
181
- **At least 50% of sections should start with a paragraph of 40+ words.**
182
-
183
- The first paragraph after a heading is the "citation block" -- the unit of text that AI systems extract and present to users. If your opening paragraph is too short (a single sentence fragment), the AI may skip it or pull from a competitor's more complete answer.
184
-
185
- **Before:**
186
- ```markdown
187
- ## How does DNS work?
188
-
189
- It translates domain names.
190
-
191
- DNS uses a hierarchical system of nameservers...
192
- ```
193
-
194
- **After:**
195
- ```markdown
196
- ## How does DNS work?
197
-
198
- DNS (Domain Name System) translates human-readable domain names like
199
- example.com into IP addresses that computers use to route traffic.
200
- The resolution process queries a hierarchy of nameservers, starting
201
- from root servers and drilling down through TLD and authoritative
202
- nameservers to find the correct IP address.
203
- ```
204
-
205
- ### 7. `geo-low-entity-density`
206
-
207
- **Brand name and location should appear in the content body.**
208
-
209
- AI systems build entity graphs that connect brands, locations, products, and topics. If your content never mentions your brand name or geographic context, the AI cannot associate the content with your entity -- even if the domain is correct.
210
-
211
- This rule checks for the presence of the `brandName` and `brandCity` values from your config. When either value is empty, that check is skipped.
212
-
213
- **Before:**
214
- ```markdown
215
- Our team builds high-performance web applications using modern
216
- frameworks and cloud infrastructure.
217
- ```
218
-
219
- **After:**
220
- ```markdown
221
- ACME builds high-performance web applications from our Berlin
222
- headquarters, using modern frameworks and cloud infrastructure.
223
- ```
224
-
225
116
  ---
226
117
 
227
- ## All Rules
228
-
229
- `@ijonis/geo-lint` ships with 92 rules across 5 categories. Here is a summary:
118
+ ## All 92 Rules
230
119
 
231
120
  | Category | Rules | Severity Mix | Focus |
232
121
  |----------|-------|-------------|-------|
@@ -236,258 +125,40 @@ headquarters, using modern frameworks and cloud infrastructure.
236
125
  | i18n | 3 | 0 errors, 3 warnings | Translation pairs, locale metadata |
237
126
  | GEO | 35 | 0 errors, 35 warnings | AI citation readiness: E-E-A-T signals, content structure, freshness, RAG optimization |
238
127
 
239
- <details>
240
- <summary>Full rule list</summary>
241
-
242
- **Title (4 rules)**
243
-
244
- | Rule | Severity | Description |
245
- |------|----------|-------------|
246
- | `title-missing` | error | Title must be present in frontmatter |
247
- | `title-too-short` | warning | Title should meet minimum length (30 chars) |
248
- | `title-too-long` | error | Title must not exceed maximum length (60 chars) |
249
- | `title-approaching-limit` | warning | Title is close to the maximum length |
250
-
251
- **Description (4 rules)**
128
+ See the [complete rule reference](docs/rules.md) with descriptions and severity for every rule.
252
129
 
253
- | Rule | Severity | Description |
254
- |------|----------|-------------|
255
- | `description-missing` | error | Meta description must be present |
256
- | `description-too-long` | error | Description must not exceed 160 characters |
257
- | `description-approaching-limit` | warning | Description is close to the maximum length |
258
- | `description-too-short` | warning | Description should meet minimum length (70 chars) |
259
-
260
- **Heading (4 rules)**
261
-
262
- | Rule | Severity | Description |
263
- |------|----------|-------------|
264
- | `missing-h1` | warning | Content should have an H1 heading |
265
- | `multiple-h1` | error | Content must not have more than one H1 |
266
- | `heading-hierarchy-skip` | warning | Heading levels should not skip (e.g., H2 to H4) |
267
- | `duplicate-heading-text` | warning | Heading text should be unique within a page |
268
-
269
- **Slug (2 rules)**
270
-
271
- | Rule | Severity | Description |
272
- |------|----------|-------------|
273
- | `slug-invalid-characters` | error | Slugs must be lowercase alphanumeric with hyphens |
274
- | `slug-too-long` | warning | Slugs should not exceed 75 characters |
275
-
276
- **Open Graph (2 rules)**
277
-
278
- | Rule | Severity | Description |
279
- |------|----------|-------------|
280
- | `blog-missing-og-image` | warning | Blog posts should have a featured image |
281
- | `project-missing-og-image` | warning | Projects should have a featured image |
282
-
283
- **Canonical (2 rules)**
284
-
285
- | Rule | Severity | Description |
286
- |------|----------|-------------|
287
- | `canonical-missing` | warning | Indexed pages should have a canonical URL |
288
- | `canonical-malformed` | warning | Canonical URL must be a valid path or site URL |
289
-
290
- **Robots (1 rule)**
291
-
292
- | Rule | Severity | Description |
293
- |------|----------|-------------|
294
- | `published-noindex` | warning | Published content with noindex may be unintentional |
295
-
296
- **Schema (1 rule)**
297
-
298
- | Rule | Severity | Description |
299
- |------|----------|-------------|
300
- | `blog-missing-schema-fields` | warning | Blog posts should have fields for BlogPosting schema |
301
-
302
- **Keyword Coherence (3 rules)**
130
+ ---
303
131
 
304
- | Rule | Severity | Description |
305
- |------|----------|-------------|
306
- | `keyword-not-in-description` | warning | Title keywords should appear in the description |
307
- | `keyword-not-in-headings` | warning | Title keywords should appear in subheadings |
308
- | `title-description-no-overlap` | warning | Title and description should share keywords |
132
+ ## Works With
309
133
 
310
- **Duplicate Detection (2 rules)**
134
+ **AI agents**: Claude Code, Cursor, Windsurf, GitHub Copilot -- any agent that can run shell commands and edit files
311
135
 
312
- | Rule | Severity | Description |
313
- |------|----------|-------------|
314
- | `duplicate-title` | error | Titles must be unique across all content |
315
- | `duplicate-description` | error | Descriptions must be unique across all content |
136
+ **Content formats**: Markdown and MDX out of the box. Astro, HTML, Nuxt, any CMS via [custom adapters](docs/custom-adapters.md)
316
137
 
317
- **Link Validation (4 rules)**
138
+ **Build tools**: Runs in any CI pipeline. JSON output for programmatic consumption
318
139
 
319
- | Rule | Severity | Description |
320
- |------|----------|-------------|
321
- | `broken-internal-link` | error | Internal links must resolve to existing pages |
322
- | `absolute-internal-link` | warning | Internal links should use relative paths |
323
- | `draft-link-leak` | error | Links must not point to draft or noindex pages |
324
- | `trailing-slash-inconsistency` | warning | Internal links should not have trailing slashes |
325
-
326
- **External Links (3 rules)**
327
-
328
- | Rule | Severity | Description |
329
- |------|----------|-------------|
330
- | `external-link-malformed` | warning | External URLs must be well-formed |
331
- | `external-link-http` | warning | External links should use HTTPS |
332
- | `external-link-low-density` | warning | Blog posts should cite external sources |
333
-
334
- **Image Validation (3 rules)**
335
-
336
- | Rule | Severity | Description |
337
- |------|----------|-------------|
338
- | `inline-image-missing-alt` | error | Inline images must have alt text |
339
- | `frontmatter-image-missing-alt` | warning | Featured images should have alt text |
340
- | `image-not-found` | warning | Referenced images should exist on disk |
341
-
342
- **Performance (1 rule)**
343
-
344
- | Rule | Severity | Description |
345
- |------|----------|-------------|
346
- | `image-file-too-large` | warning | Image files should not exceed 500 KB |
347
-
348
- **Orphan Detection (1 rule)**
349
-
350
- | Rule | Severity | Description |
351
- |------|----------|-------------|
352
- | `orphan-content` | warning | Content should be linked from at least one other page |
353
-
354
- **Content Quality (14 rules)**
355
-
356
- | Rule | Severity | Description |
357
- |------|----------|-------------|
358
- | `content-too-short` | warning | Content should meet minimum word count (300) |
359
- | `low-readability` | warning | Content should meet minimum readability score |
360
- | `content-jargon-density` | warning | Complex/uncommon word density exceeds 8% (error at 15%) |
361
- | `content-repetition` | warning | High paragraph similarity or repeated phrases |
362
- | `content-sentence-length-extreme` | warning | Average sentence length exceeds 35 words (error at 50) |
363
- | `content-substance-ratio` | warning | Low vocabulary diversity (type-token ratio below 25%) |
364
- | `content-low-transition-words` | warning | Fewer than 20% of sentences contain transition words (error at 10%) |
365
- | `content-consecutive-starts` | warning | 3+ consecutive sentences start with the same word (error at 5+) |
366
- | `content-sentence-variety` | warning | Monotonous sentence lengths (coefficient of variation below 0.30) |
367
-
368
- **Date Validation (3 rules)**
369
-
370
- | Rule | Severity | Description |
371
- |------|----------|-------------|
372
- | `missing-date` | error | Blog and project content must have a date |
373
- | `future-date` | warning | Date should not be in the future |
374
- | `missing-updated-at` | warning | Content should have an updatedAt field |
375
-
376
- **Category Validation (2 rules)**
377
-
378
- | Rule | Severity | Description |
379
- |------|----------|-------------|
380
- | `category-invalid` | error | Categories must match the configured list |
381
- | `missing-categories` | warning | Blog posts should have at least one category |
382
-
383
- **i18n (2 rules)**
384
-
385
- | Rule | Severity | Description |
386
- |------|----------|-------------|
387
- | `translation-pair-missing` | warning | Translated content should have both language versions |
388
- | `missing-locale` | warning | Content should have a locale field |
389
-
390
- **GEO — Core (7 rules)**
391
-
392
- | Rule | Severity | Description |
393
- |------|----------|-------------|
394
- | `geo-no-question-headings` | warning | At least 20% of headings should be questions |
395
- | `geo-weak-lead-sentences` | warning | Sections should start with direct answers |
396
- | `geo-low-citation-density` | warning | Content needs data points (1 per 500 words) |
397
- | `geo-missing-faq-section` | warning | Long posts should include an FAQ section |
398
- | `geo-missing-table` | warning | Long posts should include a data table |
399
- | `geo-short-citation-blocks` | warning | Section lead paragraphs should be 40+ words |
400
- | `geo-low-entity-density` | warning | Brand and location should appear in content |
401
-
402
- **GEO — E-E-A-T (8 rules)**
403
-
404
- | Rule | Severity | Description |
405
- |------|----------|-------------|
406
- | `geo-missing-source-citations` | warning | Min 1 source citation per 500 words |
407
- | `geo-missing-expert-quotes` | warning | Long posts need at least 1 attributed blockquote |
408
- | `geo-missing-author` | warning | Blog posts need a non-generic author name |
409
- | `geo-heading-too-vague` | warning | Headings must be 3+ words and not generic |
410
- | `geo-faq-quality` | warning | FAQ sections need 3+ Q&A pairs with proper formatting |
411
- | `geo-definition-pattern` | warning | "What is X?" headings should start with "X is..." |
412
- | `geo-howto-steps` | warning | "How to" headings need 3+ numbered steps |
413
- | `geo-missing-tldr` | warning | Long posts need a TL;DR or key takeaway near the top |
414
-
415
- **GEO — Structure (7 rules)**
416
-
417
- | Rule | Severity | Description |
418
- |------|----------|-------------|
419
- | `geo-section-too-long` | warning | H2 sections over 300 words need H3 sub-headings |
420
- | `geo-paragraph-too-long` | warning | Paragraphs should not exceed 100 words |
421
- | `geo-missing-lists` | warning | Content should include at least one list |
422
- | `geo-citation-block-upper-bound` | warning | First paragraph after H2 should be under 80 words |
423
- | `geo-orphaned-intro` | warning | Introduction before first H2 should be under 150 words |
424
- | `geo-heading-density` | warning | No text gap should exceed 300 words without a heading |
425
- | `geo-structural-element-ratio` | warning | At least 1 structural element per 500 words |
426
-
427
- **GEO — Freshness & Quality (7 rules)**
428
-
429
- | Rule | Severity | Description |
430
- |------|----------|-------------|
431
- | `geo-stale-date-references` | warning | Year references older than 18 months |
432
- | `geo-outdated-content` | warning | Content not updated in over 6 months |
433
- | `geo-passive-voice-excess` | warning | Over 15% passive voice sentences |
434
- | `geo-sentence-too-long` | warning | Sentences exceeding 40 words |
435
- | `geo-low-internal-links` | warning | Fewer than 2 internal links |
436
- | `geo-comparison-table` | warning | Comparison headings without a data table |
437
- | `geo-inline-html` | warning | Raw HTML tags in markdown content |
438
-
439
- **GEO — RAG Optimization (6 rules)**
440
-
441
- | Rule | Severity | Description |
442
- |------|----------|-------------|
443
- | `geo-extraction-triggers` | warning | Long posts need summary/takeaway phrases |
444
- | `geo-section-self-containment` | warning | Sections should not open with unresolved pronouns |
445
- | `geo-vague-opening` | warning | Articles should not start with filler phrases |
446
- | `geo-acronym-expansion` | warning | Acronyms must be expanded on first use |
447
- | `geo-statistic-without-context` | warning | Statistics need source attribution or timeframe |
448
- | `geo-missing-summary-section` | warning | Long posts (2000+ words) need a summary section |
449
-
450
- </details>
140
+ **Runtime**: Node.js >= 18. Zero peer dependencies
451
141
 
452
142
  ---
453
143
 
454
- ## Agent Integration
144
+ ## Agent-First Design
145
+
146
+ This linter is **deterministic** -- same content in, same violations out, every time. Your AI agent provides the creativity to fix the content; geo-lint provides the guardrails to verify it's correct. The loop runs until violations hit zero.
455
147
 
456
- This is what `@ijonis/geo-lint` is built for. The linter isn't a reporting tool you read -- it's a **rule engine that governs your AI agent**. The agent runs the linter, reads the structured output, fixes every violation, and re-runs until the content is clean. You don't touch the content.
148
+ ### Try it now
457
149
 
458
- ### How it works
150
+ Paste this into **Claude Code**, **Cursor**, or any AI coding agent:
459
151
 
460
152
  ```
461
- ┌─────────────────────────────────────────────────┐
462
- │ You: "Optimize my blog posts for AI search" │
463
- └──────────────────┬──────────────────────────────┘
464
-
465
- ┌─────────────────────────────────────────────────┐
466
- │ Agent runs: npx geo-lint --format=json │
467
- │ ← Gets structured violations with fix guidance │
468
- └──────────────────┬──────────────────────────────┘
469
-
470
- ┌─────────────────────────────────────────────────┐
471
- │ Agent reads each violation's `suggestion` │
472
- │ Opens the file, applies the fix, saves it │
473
- └──────────────────┬──────────────────────────────┘
474
-
475
- ┌─────────────────────────────────────────────────┐
476
- │ Agent re-runs: npx geo-lint --format=json │
477
- │ Loops until violations = 0 │
478
- └──────────────────┬──────────────────────────────┘
479
-
480
- ┌─────────────────────────────────────────────────┐
481
- │ Done. Content is GEO-optimized. │
482
- └─────────────────────────────────────────────────┘
153
+ Run npx geo-lint --format=json, then fix every violation in the reported
154
+ files using each violation's suggestion field. After fixing, re-run the
155
+ linter and repeat until the output is an empty array []. Preserve the
156
+ author's voice -- restructure, don't rewrite.
483
157
  ```
484
158
 
485
- The entire loop is hands-off. Every rule includes:
486
- - **`suggestion`** -- a plain-language instruction the agent follows to fix the violation
487
- - **`fixStrategy`** -- a machine-readable fix description for the rule itself
488
- - **`file`, `field`, `line`** -- exact location so the agent edits the right place
159
+ That's it. The agent will iterate automatically.
489
160
 
490
- ### JSON output (what the agent reads)
161
+ ### What the agent sees
491
162
 
492
163
  ```bash
493
164
  npx geo-lint --format=json
@@ -502,482 +173,71 @@ npx geo-lint --format=json
502
173
  "severity": "warning",
503
174
  "message": "Only 1/5 (20%) H2/H3 headings are question-formatted",
504
175
  "suggestion": "Rephrase some headings as questions (e.g., 'How does X work?') to improve LLM snippet extraction."
505
- },
506
- {
507
- "file": "blog/my-post",
508
- "field": "body",
509
- "rule": "geo-missing-table",
510
- "severity": "warning",
511
- "message": "No data table found in long-form content",
512
- "suggestion": "Add a comparison table, feature matrix, or data summary table."
513
176
  }
514
177
  ]
515
178
  ```
516
179
 
517
- No ANSI colors. No human-friendly formatting. Pure structured data that any agent can parse and act on.
518
-
519
- ### Rule discovery (agent bootstrapping)
520
-
521
- Before fixing anything, an agent can learn every rule and its fix strategy in one call:
522
-
523
- ```bash
524
- npx geo-lint --rules
525
- ```
526
-
527
- ```json
528
- [
529
- {
530
- "name": "geo-no-question-headings",
531
- "severity": "warning",
532
- "category": "geo",
533
- "fixStrategy": "Rephrase some headings as questions (e.g., 'How does X work?')"
534
- }
535
- ]
536
- ```
537
-
538
- ### Drop-in Claude Code skill
539
-
540
- Add this to your project's `.claude/skills/` and the agent will optimize your content autonomously:
180
+ Every violation includes:
181
+ - **`suggestion`** -- plain-language fix instruction the agent follows directly
182
+ - **`fixStrategy`** -- machine-readable fix pattern for the rule
183
+ - **`file`, `field`, `line`** -- exact location so the agent edits the right place
541
184
 
542
- ```markdown
543
- ## GEO Lint & Fix
544
-
545
- 1. Run `npx geo-lint --format=json` and capture output
546
- 2. Parse the JSON array of violations
547
- 3. Group violations by file
548
- 4. For each file:
549
- - Read the MDX file
550
- - For each violation, apply the fix described in `suggestion`
551
- - Preserve the author's voice -- don't rewrite, restructure
552
- 5. Re-run `npx geo-lint --format=json`
553
- 6. If violations remain, repeat from step 4 (max 3 passes)
554
- 7. Report: files changed, violations fixed, any remaining issues
555
- ```
185
+ An empty array `[]` means zero violations -- the content is clean. The agent knows to stop.
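As an illustration of how a consumer might handle the JSON described above, here is a minimal TypeScript sketch that parses a violations array and groups it by file. The `Violation` interface is inferred from the sample output in this README and is an assumption, not the package's published type. `parseViolations`, `groupByFile`, and the sample data are hypothetical names introduced for this example. `line` is marked optional because the README lists it in prose but the sample output does not show it.

```typescript
// Shape of one violation, inferred from the README's sample JSON (assumption).
interface Violation {
  file: string;
  field: string;
  rule: string;
  severity: "error" | "warning";
  message: string;
  suggestion: string;
  line?: number; // mentioned in the README prose, not visible in the sample
}

// Parse the linter's JSON output and fail loudly if it is not an array.
function parseViolations(json: string): Violation[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array of violations");
  }
  return parsed as Violation[];
}

// Group violations by file so each file only needs to be opened once.
function groupByFile(violations: Violation[]): Map<string, Violation[]> {
  const groups = new Map<string, Violation[]>();
  for (const v of violations) {
    const bucket = groups.get(v.file) ?? [];
    bucket.push(v);
    groups.set(v.file, bucket);
  }
  return groups;
}

// Hypothetical sample that mirrors the README's example output.
const sampleOutput = JSON.stringify([
  {
    file: "blog/my-post",
    field: "body",
    rule: "geo-no-question-headings",
    severity: "warning",
    message: "Only 1/5 (20%) H2/H3 headings are question-formatted",
    suggestion: "Rephrase some headings as questions (e.g., 'How does X work?') to improve LLM snippet extraction.",
  },
  {
    file: "blog/my-post",
    field: "body",
    rule: "geo-missing-table",
    severity: "warning",
    message: "No data table found in long-form content",
    suggestion: "Add a comparison table, feature matrix, or data summary table.",
  },
]);

const grouped = groupByFile(parseViolations(sampleOutput));
// An empty array means the content is clean and the fix loop can stop.
const isClean = parseViolations("[]").length === 0;
```

In practice the JSON string would come from captured stdout of `npx geo-lint --format=json`. That step is left out here so the sketch stays free of process handling.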
556
186
 
557
- Works with **Claude Code**, **Cursor**, **Windsurf**, **Copilot**, or any agent that can run shell commands and edit files.
187
+ See the full [Agent Integration Guide](docs/agent-integration.md) for per-agent setup, a Claude Code skill, and handling edge cases.
558
188
 
559
189
  ---
560
190
 
561
- ## Configuration Reference
562
-
563
- Configuration is loaded from `geo-lint.config.ts` (also supports `.mjs` and `.js`), or from a `geoLint` key in `package.json`.
191
+ ## Configuration
564
192
 
565
- Use `defineConfig` for TypeScript autocomplete:
193
+ Override any rule's severity or disable it entirely:
566
194
 
567
195
  ```typescript
568
196
  import { defineConfig } from '@ijonis/geo-lint';
569
197
 
570
198
  export default defineConfig({
571
- // Required: your canonical site URL
572
- siteUrl: 'https://example.com',
573
-
574
- // Content directories to scan (defaults shown)
575
- contentPaths: [
576
- { dir: 'content/blog', type: 'blog', urlPrefix: '/blog/' },
577
- { dir: 'content/pages', type: 'page', urlPrefix: '/' },
578
- { dir: 'content/projects', type: 'project', urlPrefix: '/projects/' },
579
- ],
580
-
581
- // Additional valid internal URLs for link validation
582
- staticRoutes: ['/about', '/contact', '/pricing'],
583
-
584
- // Directories to scan for image existence checks (default: ['public/images'])
585
- imageDirectories: ['public/images'],
586
-
587
- // Valid content categories (empty = skip category validation)
588
- categories: ['engineering', 'design', 'business'],
589
-
590
- // Slugs to exclude from linting
591
- excludeSlugs: ['draft-post', 'test-page'],
592
-
593
- // Content categories to exclude entirely (default: ['legal'])
594
- excludeCategories: ['legal'],
595
-
596
- // GEO-specific configuration
597
- geo: {
598
- brandName: 'ACME Corp', // Entity density check (empty = skip)
599
- brandCity: 'Berlin', // Location entity check (empty = skip)
600
- keywordsPath: '', // Reserved for future use
601
- fillerPhrases: ['in this article', 'welcome to'], // Flagged in openings
602
- extractionTriggers: ['key takeaway', 'in summary'], // Summary phrases
603
- acronymAllowlist: ['HTML', 'CSS', 'API', 'SEO'], // Skip expansion check
604
- vagueHeadings: ['introduction', 'overview'], // Generic headings
605
- genericAuthorNames: ['admin', 'team'], // Flagged author names
606
- allowedHtmlTags: ['Callout', 'Note'], // MDX components
607
- },
608
-
609
- // Per-rule severity overrides ('error' | 'warning' | 'off')
199
+ siteUrl: 'https://your-site.com',
200
+ contentPaths: [{ dir: 'content/blog', type: 'blog', urlPrefix: '/blog/' }],
610
201
  rules: {
611
- 'geo-missing-table': 'off', // Disable a rule
612
- 'orphan-content': 'error', // Upgrade to error
613
- 'title-approaching-limit': 'off', // Disable a rule
614
- },
615
-
616
- // Threshold overrides
617
- thresholds: {
618
- title: { minLength: 30, maxLength: 60, warnLength: 55 },
619
- description: { minLength: 70, maxLength: 160, warnLength: 150 },
620
- slug: { maxLength: 75 },
621
- content: { minWordCount: 300, minReadabilityScore: 30 },
202
+ 'geo-missing-table': 'off', // disable a rule
203
+ 'orphan-content': 'error', // upgrade to error
622
204
  },
623
205
  });
624
206
  ```
625
207
 
626
- ### Configuration Options
627
-
628
- | Option | Type | Required | Default | Description |
629
- |--------|------|----------|---------|-------------|
630
- | `siteUrl` | `string` | Yes | -- | Canonical site URL for link and canonical validation |
631
- | `contentPaths` | `ContentPathConfig[]` | No | blog + pages + projects | Content directories to scan |
632
- | `staticRoutes` | `string[]` | No | `[]` | Additional valid internal URLs |
633
- | `imageDirectories` | `string[]` | No | `['public/images']` | Directories to scan for images |
634
- | `categories` | `string[]` | No | `[]` | Valid content categories |
635
- | `excludeSlugs` | `string[]` | No | `[]` | Slugs to skip during linting |
636
- | `excludeCategories` | `string[]` | No | `['legal']` | Categories to skip entirely |
637
- | `geo` | `GeoConfig` | No | `{}` | GEO entity density configuration |
638
- | `rules` | `Record<string, Severity>` | No | `{}` | Per-rule severity overrides |
639
- | `thresholds` | `ThresholdConfig` | No | See above | Length and quality thresholds |
640
-
641
- ### ContentPathConfig
642
-
643
- ```typescript
644
- interface ContentPathConfig {
645
- dir: string; // Relative path from project root
646
- type: 'blog' | 'page' | 'project';
647
- urlPrefix?: string; // URL prefix for permalink derivation
648
- defaultLocale?: string; // Default locale when frontmatter has none
649
- }
650
- ```
208
+ See the full [Configuration Reference](docs/configuration.md) for all options, thresholds, and GEO-specific settings.
651
209
 
652
210
  ---
653
211
 
654
- ## Custom Adapters
655
-
656
- By default, `@ijonis/geo-lint` scans `.md` and `.mdx` files with `gray-matter` frontmatter. **But you can lint any content source** -- Astro content collections, plain HTML, a headless CMS, a database -- by writing a small adapter that maps your content into `ContentItem` objects.
657
-
658
- The adapter runs through the **programmatic API** (`lint()` / `lintQuiet()`), so you create a tiny wrapper script instead of calling the CLI directly. This takes ~20 lines for most setups.
212
+ ## Extend It
659
213
 
660
- ### How it works
214
+ ### Custom Adapters
661
215
 
662
- ```
663
- Your content (Astro, HTML, CMS, DB, …)
664
- → Adapter maps each page to a ContentItem
665
- → geo-lint runs all 92 rules against those items
666
- → JSON violations come back, agent fixes content
667
- ```
668
-
669
- ### The `ContentItem` contract
670
-
671
- Every adapter must return an array of objects matching this interface. The required fields are what rules inspect:
672
-
673
- ```typescript
674
- interface ContentItem {
675
- // Required -- rules depend on these
676
- title: string; // Page/post title (SEO title rules)
677
- slug: string; // URL slug (slug validation rules)
678
- description: string; // Meta description (description rules)
679
- permalink: string; // Full URL path, e.g. '/blog/my-post' (link validation)
680
- contentType: 'blog' | 'page' | 'project'; // Controls which rules apply
681
- filePath: string; // Path to source file on disk (image path resolution)
682
- rawContent: string; // Full file content including frontmatter/metadata
683
- body: string; // Body content only (heading, readability, GEO rules)
684
-
685
- // Optional -- unlocks additional rules when provided
686
- image?: string; // Featured/OG image path
687
- imageAlt?: string; // Image alt text
688
- categories?: string[]; // Content categories
689
- date?: string; // Publish date (freshness rules)
690
- updatedAt?: string; // Last updated date
691
- author?: string; // Author name (E-E-A-T rules)
692
- locale?: string; // Locale code (i18n rules)
693
- translationKey?: string; // Links translated versions
694
- noindex?: boolean; // noindex flag
695
- draft?: boolean; // Draft flag (skipped by default adapter)
696
- }
697
- ```
698
-
699
- > **Tip:** Provide as many optional fields as you can. Each one unlocks rules that would otherwise be silently skipped.
700
-
701
- ### Example: CMS / API adapter
216
+ Lint any content source -- Astro, HTML, a headless CMS -- by writing a small adapter:
702
217
 
703
218
  ```typescript
704
219
  import { lint, createAdapter } from '@ijonis/geo-lint';
705
220
 
706
221
  const adapter = createAdapter(async (projectRoot) => {
707
- const posts = await fetchFromCMS();
708
-
709
- return posts.map(post => ({
710
- title: post.title,
711
- slug: post.slug,
712
- description: post.metaDescription,
713
- permalink: `/blog/${post.slug}`,
714
- body: post.markdownContent,
715
- contentType: 'blog' as const,
716
- filePath: `virtual/${post.slug}.mdx`,
717
- rawContent: post.markdownContent,
718
- image: post.featuredImage,
719
- imageAlt: post.featuredImageAlt,
720
- date: post.publishedAt,
721
- locale: post.language,
722
- categories: post.tags,
723
- }));
724
- });
725
-
726
- const exitCode = await lint({ adapter });
727
- process.exit(exitCode);
728
- ```
729
-
730
- ### Example: Astro content collections
731
-
732
- Astro stores content in `src/content/` with its own frontmatter schema. Write an adapter that reads the `.md`/`.mdx` files and maps Astro's frontmatter fields to `ContentItem`:
733
-
734
- ```typescript
735
- // scripts/lint.ts
736
- import { lint, createAdapter } from '@ijonis/geo-lint';
737
- import { readFileSync, readdirSync } from 'fs';
738
- import { join, basename } from 'path';
739
- import matter from 'gray-matter';
740
-
741
- const adapter = createAdapter((projectRoot) => {
742
- const contentDir = join(projectRoot, 'src/content/blog');
743
- const files = readdirSync(contentDir).filter(f => f.endsWith('.md') || f.endsWith('.mdx'));
744
-
745
- return files.map(file => {
746
- const filePath = join(contentDir, file);
747
- const raw = readFileSync(filePath, 'utf-8');
748
- const { data: fm, content: body } = matter(raw);
749
- const slug = fm.slug ?? basename(file, '.mdx').replace(/\.md$/, '');
750
-
751
- return {
752
- title: fm.title ?? '',
753
- slug,
754
- description: fm.description ?? '',
755
- permalink: `/blog/${slug}`,
756
- contentType: 'blog' as const,
757
- filePath,
758
- rawContent: raw,
759
- body,
760
- image: fm.heroImage ?? fm.image,
761
- imageAlt: fm.heroImageAlt ?? fm.imageAlt,
762
- date: fm.pubDate ?? fm.date,
763
- updatedAt: fm.updatedDate,
764
- author: fm.author,
765
- categories: fm.tags ?? fm.categories,
766
- draft: fm.draft,
767
- };
768
- });
222
+ // Map your content into ContentItem objects
223
+ return [{ title, slug, description, body, permalink, contentType, filePath, rawContent }];
769
224
  });
770
225
 
771
- const exitCode = await lint({
772
- adapter,
773
- projectRoot: process.cwd(),
774
- format: 'json',
775
- });
776
- process.exit(exitCode);
777
- ```
778
-
779
- Run it with:
780
-
781
- ```bash
782
- npx tsx scripts/lint.ts
783
- ```
784
-
785
- ### Example: Static HTML site
786
-
787
- For a static site with plain `.html` files (no frontmatter), extract metadata from `<title>`, `<meta>` tags, and the document body. A lightweight parser like `cheerio` does the job:
788
-
789
- ```typescript
790
- // scripts/lint.ts
791
- import { lint, createAdapter } from '@ijonis/geo-lint';
792
- import { readFileSync, readdirSync, statSync } from 'fs';
793
- import { join, relative, basename } from 'path';
794
- import * as cheerio from 'cheerio';
795
-
796
- function findHtmlFiles(dir: string): string[] {
797
- const results: string[] = [];
798
- for (const entry of readdirSync(dir)) {
799
- const full = join(dir, entry);
800
- if (statSync(full).isDirectory()) results.push(...findHtmlFiles(full));
801
- else if (entry.endsWith('.html')) results.push(full);
802
- }
803
- return results;
804
- }
805
-
806
- const adapter = createAdapter((projectRoot) => {
807
- const htmlFiles = findHtmlFiles(projectRoot);
808
-
809
- return htmlFiles.map(filePath => {
810
- const raw = readFileSync(filePath, 'utf-8');
811
- const $ = cheerio.load(raw);
812
-
813
- const title = $('title').text() || '';
814
- const description = $('meta[name="description"]').attr('content') || '';
815
- const ogImage = $('meta[property="og:image"]').attr('content');
816
- const ogImageAlt = $('meta[property="og:image:alt"]').attr('content');
817
- const author = $('meta[name="author"]').attr('content');
818
- const body = $('main').html() ?? $('body').html() ?? '';
819
- const rel = relative(projectRoot, filePath);
820
- const slug = rel.replace(/\.html$/, '').replace(/\/index$/, '');
821
-
822
- return {
823
- title,
824
- slug,
825
- description,
826
- permalink: `/${slug}`,
827
- contentType: 'page' as const,
828
- filePath,
829
- rawContent: raw,
830
- body,
831
- image: ogImage,
832
- imageAlt: ogImageAlt,
833
- author,
834
- };
835
- });
836
- });
837
-
838
- const exitCode = await lint({
839
- adapter,
840
- projectRoot: process.cwd(),
841
- format: 'json',
842
- });
843
- process.exit(exitCode);
844
- ```
845
-
846
- ### Example: Astro `.astro` component pages
847
-
848
- For `.astro` files that use embedded frontmatter (the `---` block at the top), extract the variables and template body:
849
-
850
- ```typescript
851
- // scripts/lint.ts
852
- import { lint, createAdapter } from '@ijonis/geo-lint';
853
- import { readFileSync, readdirSync, statSync } from 'fs';
854
- import { join, relative } from 'path';
855
-
856
- function findAstroFiles(dir: string): string[] {
857
- const results: string[] = [];
858
- for (const entry of readdirSync(dir)) {
859
- const full = join(dir, entry);
860
- if (statSync(full).isDirectory()) results.push(...findAstroFiles(full));
861
- else if (entry.endsWith('.astro')) results.push(full);
862
- }
863
- return results;
864
- }
865
-
866
- function parseAstroFrontmatter(raw: string): Record<string, string> {
867
- const match = raw.match(/^---\n([\s\S]*?)\n---/);
868
- if (!match) return {};
869
- const vars: Record<string, string> = {};
870
- for (const line of match[1].split('\n')) {
871
- const assign = line.match(/(?:const|let)\s+(\w+)\s*=\s*['"](.+?)['"]/);
872
- if (assign) vars[assign[1]] = assign[2];
873
- }
874
- return vars;
875
- }
876
-
877
- const adapter = createAdapter((projectRoot) => {
878
- const pagesDir = join(projectRoot, 'src/pages');
879
- const files = findAstroFiles(pagesDir);
880
-
881
- return files.map(filePath => {
882
- const raw = readFileSync(filePath, 'utf-8');
883
- const vars = parseAstroFrontmatter(raw);
884
- const templateBody = raw.replace(/^---[\s\S]*?---/, '').trim();
885
- const rel = relative(pagesDir, filePath);
886
- const slug = rel.replace(/\.astro$/, '').replace(/\/index$/, '');
887
-
888
- return {
889
- title: vars.title ?? '',
890
- slug,
891
- description: vars.description ?? '',
892
- permalink: `/${slug}`,
893
- contentType: 'page' as const,
894
- filePath,
895
- rawContent: raw,
896
- body: templateBody,
897
- image: vars.ogImage,
898
- author: vars.author,
899
- };
900
- });
901
- });
902
-
903
- const exitCode = await lint({
904
- adapter,
905
- projectRoot: process.cwd(),
906
- format: 'json',
907
- });
908
- process.exit(exitCode);
909
- ```
910
-
911
- ### Tips for custom adapters
912
-
913
- | Topic | Guidance |
914
- |-------|----------|
915
- | **`filePath` must be a real path** | Rules like `image-not-found` resolve image paths relative to `filePath`. Use the actual file path on disk, not a virtual one, whenever possible. |
916
- | **`body` should be the renderable content** | Strip frontmatter, script blocks, and layout wrappers. Rules analyze headings, paragraphs, and links in the body. |
917
- | **`rawContent` includes everything** | Some rules inspect the full file (frontmatter + body). Always pass the unmodified file content. |
918
- | **`contentType` controls rule selection** | `'blog'` triggers date/author/category rules. `'page'` and `'project'` are lighter. Map your content to the closest match. |
919
- | **Config still applies** | Your `geo-lint.config.ts` settings (`siteUrl`, `categories`, `imageDirectories`, `rules`, etc.) still apply. Only `contentPaths` is bypassed by the adapter. |
920
- | **Combine with the default adapter** | You can lint MDX files via `contentPaths` in config AND additional content via a custom adapter in separate runs. |
921
-
922
- ### Let an AI agent write the adapter for you
923
-
924
- If you're integrating geo-lint into a project that uses a non-standard content format, you can ask your AI agent to generate the adapter. Give it this prompt:
925
-
226
+ await lint({ adapter });
926
227
  ```
927
- I want to lint my content with @ijonis/geo-lint but my site uses [Astro/HTML/Nuxt/etc.].
928
- Create a scripts/lint.ts file with a custom adapter that:
929
- 1. Finds all content files in [describe your content directory]
930
- 2. Extracts title, description, slug, body from [describe your format]
931
- 3. Maps them to ContentItem objects
932
- 4. Runs lint() with JSON output
933
-
934
- See the Custom Adapters section in the @ijonis/geo-lint README for the ContentItem interface
935
- and examples. Use createAdapter() from '@ijonis/geo-lint'.
936
- ```
937
-
938
- The agent will read your project structure, create the adapter, run it, and fix any violations it finds -- the standard agentic lint-fix loop works the same regardless of the content format.
939
228
 
940
- ---
229
+ See the [Custom Adapters Guide](docs/custom-adapters.md) for the full `ContentItem` interface and ready-to-use examples for Astro, HTML, and CMS sources.
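
The shortened adapter snippet above leaves the mapping step implicit. As a minimal sketch, assuming a CMS record shape (`CmsPost`) that is purely hypothetical, the mapping into the required `ContentItem` fields might look like the following. The `ContentItem` fields mirror the interface shown in the earlier README text; `toContentItem` and `missingFields` are illustrative helpers, not part of the package.

```typescript
// Required ContentItem fields, copied from the interface in the earlier README text.
interface ContentItem {
  title: string;
  slug: string;
  description: string;
  permalink: string;
  contentType: 'blog' | 'page' | 'project';
  filePath: string;
  rawContent: string;
  body: string;
}

// Hypothetical CMS record shape (an assumption); adjust to your own source.
interface CmsPost {
  title: string;
  slug: string;
  metaDescription: string;
  markdownContent: string;
}

// Map one CMS record to the fields the rules inspect.
function toContentItem(post: CmsPost): ContentItem {
  return {
    title: post.title,
    slug: post.slug,
    description: post.metaDescription,
    permalink: `/blog/${post.slug}`,
    contentType: 'blog',
    filePath: `virtual/${post.slug}.mdx`,
    rawContent: post.markdownContent,
    body: post.markdownContent,
  };
}

// Illustrative helper: list required string fields that are empty,
// so gaps in the mapping surface before the linter runs.
function missingFields(item: ContentItem): string[] {
  const required: (keyof ContentItem)[] = [
    'title', 'slug', 'description', 'permalink', 'filePath', 'rawContent', 'body',
  ];
  return required.filter((key) => item[key].trim() === '');
}
```

Inside `createAdapter`, the callback would then return `posts.map(toContentItem)`. The removed adapter tips caution that a virtual `filePath` can affect rules that resolve image paths relative to the file, so a real path on disk is preferable where one exists.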
941
230
 
942
- ## Programmatic API
943
-
944
- Use `lint()` for full output or `lintQuiet()` for raw results without console output:
231
+ ### Programmatic API
945
232
 
946
233
  ```typescript
947
234
  import { lint, lintQuiet } from '@ijonis/geo-lint';
948
235
 
949
- // Full lint with formatted console output
950
- const exitCode = await lint({
951
- projectRoot: './my-project',
952
- format: 'json',
953
- });
954
-
955
- // Quiet mode: returns raw LintResult[] with no console output
956
- const results = await lintQuiet({
957
- projectRoot: './my-project',
958
- });
959
-
960
- // Filter and process results programmatically
961
- const geoViolations = results.filter(r => r.rule.startsWith('geo-'));
962
- const errors = results.filter(r => r.severity === 'error');
963
-
964
- console.log(`${geoViolations.length} GEO issues found`);
965
- console.log(`${errors.length} errors (will block build)`);
236
+ const exitCode = await lint({ format: 'json' }); // with console output
237
+ const results = await lintQuiet({ projectRoot: '.' }); // raw LintResult[]
966
238
  ```
967
239
 
968
- ### LintResult Type
969
-
970
- ```typescript
971
- interface LintResult {
972
- file: string; // Relative path (e.g., "blog/my-post")
973
- field: string; // Field checked (e.g., "title", "body", "image")
974
- rule: string; // Rule identifier (e.g., "geo-no-question-headings")
975
- severity: 'error' | 'warning';
976
- message: string; // Human-readable violation description
977
- suggestion?: string; // Actionable fix suggestion
978
- line?: number; // Line number in source file (when applicable)
979
- }
980
- ```
240
+ See the [API Reference](docs/api.md) for all options and types.
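
As a rough sketch of what can be done with the array returned by `lintQuiet()`, the helper below tallies results by severity and rule. The `LintResult` shape is copied from the earlier README text; `Summary` and `summarize` are hypothetical names introduced here for illustration.

```typescript
// LintResult as documented in the earlier README text.
interface LintResult {
  file: string;
  field: string;
  rule: string;
  severity: 'error' | 'warning';
  message: string;
  suggestion?: string;
  line?: number;
}

// Hypothetical summary shape (an assumption, not a package type).
interface Summary {
  errors: number;
  warnings: number;
  geo: number; // results whose rule id starts with "geo-"
  byRule: Record<string, number>;
}

// Count results by severity, by GEO prefix, and per rule id.
function summarize(results: LintResult[]): Summary {
  const summary: Summary = { errors: 0, warnings: 0, geo: 0, byRule: {} };
  for (const result of results) {
    if (result.severity === 'error') summary.errors += 1;
    else summary.warnings += 1;
    if (result.rule.startsWith('geo-')) summary.geo += 1;
    summary.byRule[result.rule] = (summary.byRule[result.rule] ?? 0) + 1;
  }
  return summary;
}
```

A typical call would be `summarize(await lintQuiet({ projectRoot: '.' }))`, for example to fail a CI step only when `errors` is non-zero.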
981
241
 
982
242
  ---
983
243
 
@@ -999,6 +259,10 @@ Options:
999
259
 
1000
260
  ---
1001
261
 
262
+ ## Contributing
263
+
264
+ See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, testing, and how to add new rules. Changes are tracked in the [CHANGELOG](CHANGELOG.md).
265
+
1002
266
  ## License
1003
267
 
1004
268
  [MIT](LICENSE)