npm - @ijonis/geo-lint - Versions diffs - 0.1.0 → 0.1.1 - Mend

@ijonis/geo-lint 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

package/CHANGELOG.md CHANGED

@@ -8,10 +8,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 ### Added
-- Initial release with 55 SEO/GEO rules
+- 28 new GEO rules across 4 categories (total: 35 GEO rules, 81 rules overall)
+  - **E-E-A-T (8 rules):** source citations, expert quotes, author validation, heading quality, FAQ quality, definition patterns, how-to steps, TL;DR detection
+  - **Structure (7 rules):** section length, paragraph length, list presence, citation block bounds, orphaned intros, heading density, structural element ratio
+  - **Freshness (7 rules):** stale year references, outdated content, passive voice, sentence length, internal links, comparison tables, inline HTML
+  - **RAG Optimization (6 rules):** extraction triggers, section self-containment, vague openings, acronym expansion, statistic context, summary sections
+- `author` field support in ContentItem and MDX adapter
+- 6 new GeoConfig options: `fillerPhrases`, `extractionTriggers`, `acronymAllowlist`, `vagueHeadings`, `genericAuthorNames`, `allowedHtmlTags`
+- New utility module `geo-advanced-analyzer.ts` with 10 analysis functions
+- Extended `geo-analyzer.ts` with 6 new utility functions
+- Comprehensive tests for all 28 new rules (~120 tests)
+## [0.1.0] - 2026-02-18
+### Added
+- Initial release with 53 SEO/GEO rules
 - 7 GEO (Generative Engine Optimization) rules for AI search visibility
 - Configurable via `geo-lint.config.ts`
 - JSON output mode for AI agent integration
 - Fix strategies for every rule (agent-readable)
 - MDX/Markdown content adapter with `gray-matter`
 - CLI with `--format=json`, `--rules`, `--root`, `--config` flags
+[0.1.0]: https://github.com/IJONIS/geo-lint/releases/tag/v0.1.0

package/README.md CHANGED

@@ -26,7 +26,7 @@ This works today with Claude Code, Cursor, Windsurf, Copilot, or any agent that
 **GEO (Generative Engine Optimization)** is the practice of optimizing content so it gets cited by AI search engines -- ChatGPT, Perplexity, Google AI Overviews, Gemini. When someone asks an AI a question, the model pulls from web content to build its answer. GEO makes your content the source it pulls from.
-Traditional SEO gets you into search result lists. GEO gets you **cited in AI-generated answers**. They're complementary, but GEO requires structural changes that no existing SEO tool checks for. `@ijonis/geo-lint` validates both -- 46 SEO rules and **7 dedicated GEO rules** that have zero open-source alternatives.
+Traditional SEO gets you into search result lists. GEO gets you **cited in AI-generated answers**. They're complementary, but GEO requires structural changes that no existing SEO tool checks for. `@ijonis/geo-lint` validates both -- 32 SEO rules, **35 dedicated GEO rules**, and **14 content quality rules** including readability analysis inspired by Yoast SEO -- with zero open-source alternatives for the GEO checks.
 ---
@@ -61,9 +61,13 @@ Or let your agent handle it -- see [Agent Integration](#agent-integration) below
 ---
-## The 7 GEO Rules
+## GEO Rules
-No other open-source linter checks for these. Each rule targets a specific content pattern that AI search engines use when deciding what to cite. When your agent fixes a GEO violation, it's directly increasing the probability that the content gets pulled into AI-generated answers.
+No other open-source linter checks for these. 35 rules across E-E-A-T signals, content structure, freshness, and RAG optimization -- each targeting a specific content pattern that AI search engines use when deciding what to cite. When your agent fixes a GEO violation, it's directly increasing the probability that the content gets pulled into AI-generated answers.
+> **New in 0.1.1:** 14 content quality rules now include transition word analysis, consecutive sentence start detection, and sentence length variety scoring -- readability checks inspired by Yoast SEO, built for the agentic lint-fix loop.
+### Core GEO Rules (7 rules)
 ### 1. `geo-no-question-headings`
@@ -222,15 +226,15 @@ headquarters, using modern frameworks and cloud infrastructure.
 ## All Rules
-`@ijonis/geo-lint` ships with 53 rules across 5 categories. Here is a summary:
+`@ijonis/geo-lint` ships with 92 rules across 5 categories. Here is a summary:
 | Category | Rules | Severity Mix | Focus |
 |----------|-------|-------------|-------|
-| SEO | 27 | 6 errors, 21 warnings | Titles, descriptions, headings, slugs, OG images, canonical URLs, keywords, links, schema |
-| Content | 7 | 2 errors, 5 warnings | Word count, readability, dates, categories |
-| Technical | 9 | 3 errors, 6 warnings | Broken links, image files, trailing slashes, external URLs, performance |
-| i18n | 2 | 0 errors, 2 warnings | Translation pairs, locale metadata |
-| GEO | 7 | 0 errors, 7 warnings | AI citation readiness (see above) |
+| SEO | 32 | 6 errors, 26 warnings | Titles, descriptions, headings, slugs, OG images, canonical URLs, keywords, links, schema |
+| Content | 14 | 2 errors, 12 warnings | Word count, readability, dates, categories, jargon density, repetition, vocabulary diversity, transition words, sentence variety |
+| Technical | 8 | 3 errors, 5 warnings | Broken links, image files, trailing slashes, external URLs, performance |
+| i18n | 3 | 0 errors, 3 warnings | Translation pairs, locale metadata |
+| GEO | 35 | 0 errors, 35 warnings | AI citation readiness: E-E-A-T signals, content structure, freshness, RAG optimization |
 <details>
 <summary>Full rule list</summary>
@@ -347,12 +351,19 @@ headquarters, using modern frameworks and cloud infrastructure.
 |------|----------|-------------|
 | `orphan-content` | warning | Content should be linked from at least one other page |
-**Content Quality (2 rules)**
+**Content Quality (14 rules)**
 | Rule | Severity | Description |
 |------|----------|-------------|
 | `content-too-short` | warning | Content should meet minimum word count (300) |
 | `low-readability` | warning | Content should meet minimum readability score |
+| `content-jargon-density` | warning | Complex/uncommon word density exceeds 8% (error at 15%) |
+| `content-repetition` | warning | High paragraph similarity or repeated phrases |
+| `content-sentence-length-extreme` | warning | Average sentence length exceeds 35 words (error at 50) |
+| `content-substance-ratio` | warning | Low vocabulary diversity (type-token ratio below 25%) |
+| `content-low-transition-words` | warning | Fewer than 20% of sentences contain transition words (error at 10%) |
+| `content-consecutive-starts` | warning | 3+ consecutive sentences start with the same word (error at 5+) |
+| `content-sentence-variety` | warning | Monotonous sentence lengths (coefficient of variation below 0.30) |
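
Two of the metrics named in the table above, the coefficient of variation behind `content-sentence-variety` and the type-token ratio behind `content-substance-ratio`, can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper names, the naive sentence splitter, and the tokenizer are all assumptions. Only the thresholds (0.30 and 25%) come from the rule descriptions.

```typescript
// Hypothetical helpers for illustration; none of these names come from @ijonis/geo-lint.

/** Naive sentence splitter: a run of text ending in ., ! or ? (assumption). */
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map(s => s.trim()).filter(s => s.length > 0);
}

/** Lowercased word tokens; a simple ASCII tokenizer (assumption). */
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

/** Coefficient of variation: population standard deviation of sentence lengths divided by their mean. */
function sentenceLengthCv(text: string): number {
  const lengths = splitSentences(text)
    .map(s => tokenize(s).length)
    .filter(n => n > 0);
  if (lengths.length === 0) return 0;
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((acc, n) => acc + (n - mean) * (n - mean), 0) / lengths.length;
  return Math.sqrt(variance) / mean;
}

/** Type-token ratio: unique words divided by total words. */
function typeTokenRatio(text: string): number {
  const tokens = tokenize(text);
  if (tokens.length === 0) return 0;
  return new Set(tokens).size / tokens.length;
}

// Thresholds taken from the rule descriptions in the table.
const isMonotonous = (text: string): boolean => sentenceLengthCv(text) < 0.3;
const isLowDiversity = (text: string): boolean => typeTokenRatio(text) < 0.25;
```

A text whose sentences all have the same word count yields a coefficient of variation of 0, so it would fall below the 0.30 threshold under this sketch.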
 **Date Validation (3 rules)**
@@ -376,7 +387,7 @@ headquarters, using modern frameworks and cloud infrastructure.
 | `translation-pair-missing` | warning | Translated content should have both language versions |
 | `missing-locale` | warning | Content should have a locale field |
-**GEO (7 rules)**
+**GEO — Core (7 rules)**
 | Rule | Severity | Description |
 |------|----------|-------------|
@@ -388,6 +399,54 @@ headquarters, using modern frameworks and cloud infrastructure.
 | `geo-short-citation-blocks` | warning | Section lead paragraphs should be 40+ words |
 | `geo-low-entity-density` | warning | Brand and location should appear in content |
+**GEO — E-E-A-T (8 rules)**
+| Rule | Severity | Description |
+|------|----------|-------------|
+| `geo-missing-source-citations` | warning | Min 1 source citation per 500 words |
+| `geo-missing-expert-quotes` | warning | Long posts need at least 1 attributed blockquote |
+| `geo-missing-author` | warning | Blog posts need a non-generic author name |
+| `geo-heading-too-vague` | warning | Headings must be 3+ words and not generic |
+| `geo-faq-quality` | warning | FAQ sections need 3+ Q&A pairs with proper formatting |
+| `geo-definition-pattern` | warning | "What is X?" headings should start with "X is..." |
+| `geo-howto-steps` | warning | "How to" headings need 3+ numbered steps |
+| `geo-missing-tldr` | warning | Long posts need a TL;DR or key takeaway near the top |
+**GEO — Structure (7 rules)**
+| Rule | Severity | Description |
+|------|----------|-------------|
+| `geo-section-too-long` | warning | H2 sections over 300 words need H3 sub-headings |
+| `geo-paragraph-too-long` | warning | Paragraphs should not exceed 100 words |
+| `geo-missing-lists` | warning | Content should include at least one list |
+| `geo-citation-block-upper-bound` | warning | First paragraph after H2 should be under 80 words |
+| `geo-orphaned-intro` | warning | Introduction before first H2 should be under 150 words |
+| `geo-heading-density` | warning | No text gap should exceed 300 words without a heading |
+| `geo-structural-element-ratio` | warning | At least 1 structural element per 500 words |
+**GEO — Freshness & Quality (7 rules)**
+| Rule | Severity | Description |
+|------|----------|-------------|
+| `geo-stale-date-references` | warning | Year references older than 18 months |
+| `geo-outdated-content` | warning | Content not updated in over 6 months |
+| `geo-passive-voice-excess` | warning | Over 15% passive voice sentences |
+| `geo-sentence-too-long` | warning | Sentences exceeding 40 words |
+| `geo-low-internal-links` | warning | Fewer than 2 internal links |
+| `geo-comparison-table` | warning | Comparison headings without a data table |
+| `geo-inline-html` | warning | Raw HTML tags in markdown content |
+**GEO — RAG Optimization (6 rules)**
+| Rule | Severity | Description |
+|------|----------|-------------|
+| `geo-extraction-triggers` | warning | Long posts need summary/takeaway phrases |
+| `geo-section-self-containment` | warning | Sections should not open with unresolved pronouns |
+| `geo-vague-opening` | warning | Articles should not start with filler phrases |
+| `geo-acronym-expansion` | warning | Acronyms must be expanded on first use |
+| `geo-statistic-without-context` | warning | Statistics need source attribution or timeframe |
+| `geo-missing-summary-section` | warning | Long posts (2000+ words) need a summary section |
 </details>
 ---
@@ -539,6 +598,12 @@ export default defineConfig({
     brandName: 'ACME Corp',   // Entity density check (empty = skip)
     brandCity: 'Berlin',       // Location entity check (empty = skip)
     keywordsPath: '',          // Reserved for future use
+    fillerPhrases: ['in this article', 'welcome to'],  // Flagged in openings
+    extractionTriggers: ['key takeaway', 'in summary'], // Summary phrases
+    acronymAllowlist: ['HTML', 'CSS', 'API', 'SEO'],   // Skip expansion check
+    vagueHeadings: ['introduction', 'overview'],        // Generic headings
+    genericAuthorNames: ['admin', 'team'],              // Flagged author names
+    allowedHtmlTags: ['Callout', 'Note'],               // MDX components
   },
   // Per-rule severity overrides ('error' | 'warning' | 'off')
@@ -588,13 +653,57 @@ interface ContentPathConfig {
 ## Custom Adapters
-By default, `@ijonis/geo-lint` scans Markdown and MDX files using `gray-matter` for frontmatter parsing. If your content lives in a CMS, database, or custom format, you can provide a custom adapter:
+By default, `@ijonis/geo-lint` scans `.md` and `.mdx` files with `gray-matter` frontmatter. **But you can lint any content source** -- Astro content collections, plain HTML, a headless CMS, a database -- by writing a small adapter that maps your content into `ContentItem` objects.
+The adapter runs through the **programmatic API** (`lint()` / `lintQuiet()`), so you create a tiny wrapper script instead of calling the CLI directly. This takes ~20 lines for most setups.
+### How it works
+```
+Your content (Astro, HTML, CMS, DB, …)
+  → Adapter maps each page to a ContentItem
+    → geo-lint runs all 92 rules against those items
+      → JSON violations come back, agent fixes content
+```
+### The `ContentItem` contract
+Every adapter must return an array of objects matching this interface. The required fields are what rules inspect:
+```typescript
+interface ContentItem {
+  // Required -- rules depend on these
+  title: string;           // Page/post title (SEO title rules)
+  slug: string;            // URL slug (slug validation rules)
+  description: string;     // Meta description (description rules)
+  permalink: string;       // Full URL path, e.g. '/blog/my-post' (link validation)
+  contentType: 'blog' | 'page' | 'project'; // Controls which rules apply
+  filePath: string;        // Path to source file on disk (image path resolution)
+  rawContent: string;      // Full file content including frontmatter/metadata
+  body: string;            // Body content only (heading, readability, GEO rules)
+  // Optional -- unlocks additional rules when provided
+  image?: string;          // Featured/OG image path
+  imageAlt?: string;       // Image alt text
+  categories?: string[];   // Content categories
+  date?: string;           // Publish date (freshness rules)
+  updatedAt?: string;      // Last updated date
+  author?: string;         // Author name (E-E-A-T rules)
+  locale?: string;         // Locale code (i18n rules)
+  translationKey?: string; // Links translated versions
+  noindex?: boolean;       // noindex flag
+  draft?: boolean;         // Draft flag (skipped by default adapter)
+}
+```
+> **Tip:** Provide as many optional fields as you can. Each one unlocks rules that would otherwise be silently skipped.
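
Because missing optional fields cause rules to be skipped silently, an adapter author may want to see which ones are absent before linting. The sketch below is one way to do that. `missingOptionalFields` is a hypothetical helper, not part of `@ijonis/geo-lint`, and the field list is copied by hand from the interface above. The `noindex` and `draft` flags are left out on the assumption that they are switches rather than rule inputs.

```typescript
// Hypothetical helper; not part of @ijonis/geo-lint.
// Field names are copied from the optional section of the ContentItem interface above.
const OPTIONAL_FIELDS = [
  'image',
  'imageAlt',
  'categories',
  'date',
  'updatedAt',
  'author',
  'locale',
  'translationKey',
] as const;

type OptionalField = (typeof OPTIONAL_FIELDS)[number];

/** Returns the optional fields that are undefined, null, empty strings, or empty arrays. */
function missingOptionalFields(item: Record<string, unknown>): OptionalField[] {
  return OPTIONAL_FIELDS.filter(field => {
    const value = item[field];
    return (
      value === undefined ||
      value === null ||
      value === '' ||
      (Array.isArray(value) && value.length === 0)
    );
  });
}

// Usage: report gaps for one mapped item before handing it to the linter.
const gaps = missingOptionalFields({
  title: 'Example post',
  image: '/og.png',
  author: 'Jane Doe',
  date: '2026-02-18',
});
console.log(`Optional fields not provided: ${gaps.join(', ')}`);
```

Calling this inside the adapter's `map` callback, and logging the result per item, makes skipped rule groups visible while the adapter is being developed.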
+### Example: CMS / API adapter
 ```typescript
 import { lint, createAdapter } from '@ijonis/geo-lint';
 const adapter = createAdapter(async (projectRoot) => {
-  // Fetch from your CMS, database, or API
   const posts = await fetchFromCMS();
   return posts.map(post => ({
@@ -606,7 +715,6 @@ const adapter = createAdapter(async (projectRoot) => {
     contentType: 'blog' as const,
     filePath: `virtual/${post.slug}.mdx`,
     rawContent: post.markdownContent,
-    // Optional fields
     image: post.featuredImage,
     imageAlt: post.featuredImageAlt,
     date: post.publishedAt,
@@ -619,7 +727,215 @@ const exitCode = await lint({ adapter });
 process.exit(exitCode);
 ```
-The adapter receives the project root path and must return an array of `ContentItem` objects. All standard rules run against the returned items.
+### Example: Astro content collections
+Astro stores content in `src/content/` with its own frontmatter schema. Write an adapter that reads the `.md`/`.mdx` files and maps Astro's frontmatter fields to `ContentItem`:
+```typescript
+// scripts/lint.ts
+import { lint, createAdapter } from '@ijonis/geo-lint';
+import { readFileSync, readdirSync } from 'fs';
+import { join, basename } from 'path';
+import matter from 'gray-matter';
+const adapter = createAdapter((projectRoot) => {
+  const contentDir = join(projectRoot, 'src/content/blog');
+  const files = readdirSync(contentDir).filter(f => f.endsWith('.md') || f.endsWith('.mdx'));
+  return files.map(file => {
+    const filePath = join(contentDir, file);
+    const raw = readFileSync(filePath, 'utf-8');
+    const { data: fm, content: body } = matter(raw);
+    const slug = fm.slug ?? basename(file, '.mdx').replace(/\.md$/, '');
+    return {
+      title: fm.title ?? '',
+      slug,
+      description: fm.description ?? '',
+      permalink: `/blog/${slug}`,
+      contentType: 'blog' as const,
+      filePath,
+      rawContent: raw,
+      body,
+      image: fm.heroImage ?? fm.image,
+      imageAlt: fm.heroImageAlt ?? fm.imageAlt,
+      date: fm.pubDate ?? fm.date,
+      updatedAt: fm.updatedDate,
+      author: fm.author,
+      categories: fm.tags ?? fm.categories,
+      draft: fm.draft,
+    };
+  });
+});
+const exitCode = await lint({
+  adapter,
+  projectRoot: process.cwd(),
+  format: 'json',
+});
+process.exit(exitCode);
+```
+Run it with:
+```bash
+npx tsx scripts/lint.ts
+```
+### Example: Static HTML site
+For a static site with plain `.html` files (no frontmatter), extract metadata from `<title>`, `<meta>` tags, and the document body. A lightweight parser like `cheerio` does the job:
+```typescript
+// scripts/lint.ts
+import { lint, createAdapter } from '@ijonis/geo-lint';
+import { readFileSync, readdirSync, statSync } from 'fs';
+import { join, relative, basename } from 'path';
+import * as cheerio from 'cheerio';
+function findHtmlFiles(dir: string): string[] {
+  const results: string[] = [];
+  for (const entry of readdirSync(dir)) {
+    const full = join(dir, entry);
+    if (statSync(full).isDirectory()) results.push(...findHtmlFiles(full));
+    else if (entry.endsWith('.html')) results.push(full);
+  }
+  return results;
+}
+const adapter = createAdapter((projectRoot) => {
+  const htmlFiles = findHtmlFiles(projectRoot);
+  return htmlFiles.map(filePath => {
+    const raw = readFileSync(filePath, 'utf-8');
+    const $ = cheerio.load(raw);
+    const title = $('title').text() || '';
+    const description = $('meta[name="description"]').attr('content') || '';
+    const ogImage = $('meta[property="og:image"]').attr('content');
+    const ogImageAlt = $('meta[property="og:image:alt"]').attr('content');
+    const author = $('meta[name="author"]').attr('content');
+    const body = $('main').html() ?? $('body').html() ?? '';
+    const rel = relative(projectRoot, filePath);
+    const slug = rel.replace(/\.html$/, '').replace(/\/index$/, '');
+    return {
+      title,
+      slug,
+      description,
+      permalink: `/${slug}`,
+      contentType: 'page' as const,
+      filePath,
+      rawContent: raw,
+      body,
+      image: ogImage,
+      imageAlt: ogImageAlt,
+      author,
+    };
+  });
+});
+const exitCode = await lint({
+  adapter,
+  projectRoot: process.cwd(),
+  format: 'json',
+});
+process.exit(exitCode);
+```
+### Example: Astro `.astro` component pages
+For `.astro` files that use embedded frontmatter (the `---` block at the top), extract the variables and template body:
+```typescript
+// scripts/lint.ts
+import { lint, createAdapter } from '@ijonis/geo-lint';
+import { readFileSync, readdirSync, statSync } from 'fs';
+import { join, relative } from 'path';
+function findAstroFiles(dir: string): string[] {
+  const results: string[] = [];
+  for (const entry of readdirSync(dir)) {
+    const full = join(dir, entry);
+    if (statSync(full).isDirectory()) results.push(...findAstroFiles(full));
+    else if (entry.endsWith('.astro')) results.push(full);
+  }
+  return results;
+}
+function parseAstroFrontmatter(raw: string): Record<string, string> {
+  const match = raw.match(/^---\n([\s\S]*?)\n---/);
+  if (!match) return {};
+  const vars: Record<string, string> = {};
+  for (const line of match[1].split('\n')) {
+    const assign = line.match(/(?:const|let)\s+(\w+)\s*=\s*['"](.+?)['"]/);
+    if (assign) vars[assign[1]] = assign[2];
+  }
+  return vars;
+}
+const adapter = createAdapter((projectRoot) => {
+  const pagesDir = join(projectRoot, 'src/pages');
+  const files = findAstroFiles(pagesDir);
+  return files.map(filePath => {
+    const raw = readFileSync(filePath, 'utf-8');
+    const vars = parseAstroFrontmatter(raw);
+    const templateBody = raw.replace(/^---[\s\S]*?---/, '').trim();
+    const rel = relative(pagesDir, filePath);
+    const slug = rel.replace(/\.astro$/, '').replace(/\/index$/, '');
+    return {
+      title: vars.title ?? '',
+      slug,
+      description: vars.description ?? '',
+      permalink: `/${slug}`,
+      contentType: 'page' as const,
+      filePath,
+      rawContent: raw,
+      body: templateBody,
+      image: vars.ogImage,
+      author: vars.author,
+    };
+  });
+});
+const exitCode = await lint({
+  adapter,
+  projectRoot: process.cwd(),
+  format: 'json',
+});
+process.exit(exitCode);
+```
+### Tips for custom adapters
+| Topic | Guidance |
+|-------|----------|
+| **`filePath` must be a real path** | Rules like `image-not-found` resolve image paths relative to `filePath`. Use the actual file path on disk, not a virtual one, whenever possible. |
+| **`body` should be the renderable content** | Strip frontmatter, script blocks, and layout wrappers. Rules analyze headings, paragraphs, and links in the body. |
+| **`rawContent` includes everything** | Some rules inspect the full file (frontmatter + body). Always pass the unmodified file content. |
+| **`contentType` controls rule selection** | `'blog'` triggers date/author/category rules. `'page'` and `'project'` are lighter. Map your content to the closest match. |
+| **Config still applies** | Your `geo-lint.config.ts` settings (`siteUrl`, `categories`, `imageDirectories`, `rules`, etc.) still apply. Only `contentPaths` is bypassed by the adapter. |
+| **Combine with the default adapter** | You can lint MDX files via `contentPaths` in config AND additional content via a custom adapter in separate runs. |
+### Let an AI agent write the adapter for you
+If you're integrating geo-lint into a project that uses a non-standard content format, you can ask your AI agent to generate the adapter. Give it this prompt:
+```
+I want to lint my content with @ijonis/geo-lint but my site uses [Astro/HTML/Nuxt/etc.].
+Create a scripts/lint.ts file with a custom adapter that:
+1. Finds all content files in [describe your content directory]
+2. Extracts title, description, slug, body from [describe your format]
+3. Maps them to ContentItem objects
+4. Runs lint() with JSON output
+See the Custom Adapters section in the @ijonis/geo-lint README for the ContentItem interface
+and examples. Use createAdapter() from '@ijonis/geo-lint'.
+```
+The agent will read your project structure, create the adapter, run it, and fix any violations it finds -- the standard agentic lint-fix loop works the same regardless of the content format.
 ---