npm - aeorank - Versions diffs - 3.0.3 → 3.1.0 - Mend

aeorank 3.0.3 → 3.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

package/README.md CHANGED

@@ -1,6 +1,6 @@
 # AEORank
-Score any website for AI engine visibility across 34 criteria in a 5-pillar framework. Pure HTTP + regex - zero API keys, under 10 seconds.
+Score any website for AI engine visibility across 36 criteria in a 5-pillar framework. Pure HTTP + regex - zero API keys, under 10 seconds.
 [![npm version](https://img.shields.io/npm/v/aeorank.svg)](https://www.npmjs.com/package/aeorank)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -35,7 +35,7 @@ import { audit } from 'aeorank';
 const result = await audit('example.com');
 console.log(result.overallScore);  // 0-100
-console.log(result.scorecard);     // 34 criteria with scores, pillars, weights
+console.log(result.scorecard);     // 36 criteria with scores, pillars, weights
 console.log(result.pillarScores);  // { answerReadiness, contentStructure, ... }
 console.log(result.topFixes);      // Top 3 highest-impact fixes
 console.log(result.opportunities); // Prioritized improvements
@@ -43,7 +43,7 @@ console.log(result.opportunities); // Prioritized improvements
 ## What It Checks
-AEORank evaluates 34 criteria that determine how AI engines (ChatGPT, Claude, Perplexity, Google AI Overviews) discover, parse, and cite your content. Criteria are organized into five pillars:
+AEORank evaluates 36 criteria that determine how AI engines (ChatGPT, Claude, Perplexity, Google AI Overviews) discover, parse, and cite your content. Criteria are organized into five pillars:
 ### 5-Pillar Framework
@@ -55,7 +55,9 @@ AEORank evaluates 34 criteria that determine how AI engines (ChatGPT, Claude, Pe
 | Original Data & Expert Analysis | 10% | Proprietary research, case studies, unique data points |
 | Content Depth | 7% | Article length, heading structure, deep vs thin pages |
 | Fact & Data Density | 6% | Specific numbers, statistics, data points per page |
+| Duplicate Content Blocks | 5% | Identical text blocks repeated across sections within a page |
 | Citation-Ready Writing | 4% | Self-contained definition sentences, single-claim statements |
+| Cross-Page Duplicate Content | 3% | Same paragraphs copy-pasted across multiple pages |
 | Answer-First Placement | 3% | Answer block in first 300 words, no throat-clearing openers |
 | Evidence Packaging | 3% | Inline citations, attribution phrases, sources sections |
@@ -64,9 +66,9 @@ AEORank evaluates 34 criteria that determine how AI engines (ChatGPT, Claude, Pe
 | Criterion | Weight | What it measures |
 |-----------|--------|------------------|
 | Direct Answer Paragraphs | 5% | Concise answer paragraphs after question headings |
-| Q&A Content Format | 5% | Question-format headings (What, How, Why) with answers |
-| Query-Answer Alignment | 5% | Every question heading followed by a direct answer |
-| Comprehensive FAQ Section | 4% | Dedicated FAQ with FAQPage schema markup |
+| Q&A Content Format | 4% | Question-format headings (What, How, Why) with answers |
+| Query-Answer Alignment | 4% | Every question heading followed by a direct answer |
+| Comprehensive FAQ Section | 3% | Dedicated FAQ with FAQPage schema markup |
 | Table & List Extractability | 3% | HTML tables with headers, ordered/unordered lists |
 | Definition Patterns | 2% | Clear "X is defined as..." patterns for key terms |
 | Entity Disambiguation | 2% | Primary entity defined early, consistent terminology |
@@ -107,19 +109,21 @@ AEORank evaluates 34 criteria that determine how AI engines (ChatGPT, Claude, Pe
 | RSS/Atom Feed | 1% | RSS feed linked from homepage |
 > **Coherence Gate:** Sites with topic coherence below 6/10 are score-capped regardless of technical perfection. A scattered site with perfect robots.txt, llms.txt, and schema will score lower than a focused site with mediocre technical implementation.
+>
+> **Duplication Gate:** Per-page scores are capped when duplicate content blocks are detected. A page with 3+ identical copy-pasted paragraphs cannot score above 35/75 regardless of other signals — LLMs will flag it as low-quality content.
 <details>
-<summary>All 34 criteria (numbered list)</summary>
+<summary>All 36 criteria (numbered list)</summary>
 | # | Criterion | Weight | Pillar |
 |---|-----------|--------|--------|
 | 1 | llms.txt File | 2% | AI Discovery |
 | 2 | Schema.org Structured Data | 3% | Trust & Authority |
-| 3 | Q&A Content Format | 5% | Content Structure |
+| 3 | Q&A Content Format | 4% | Content Structure |
 | 4 | Clean, Crawlable HTML | 2% | Technical Foundation |
 | 5 | Entity Authority & NAP Consistency | 5% | Trust & Authority |
 | 6 | robots.txt for AI Crawlers | 2% | AI Discovery |
-| 7 | Comprehensive FAQ Section | 4% | Content Structure |
+| 7 | Comprehensive FAQ Section | 3% | Content Structure |
 | 8 | Original Data & Expert Analysis | 10% | Answer Readiness |
 | 9 | Internal Linking Structure | 4% | Trust & Authority |
 | 10 | Semantic HTML5 & Accessibility | 2% | Technical Foundation |
@@ -136,7 +140,7 @@ AEORank evaluates 34 criteria that determine how AI engines (ChatGPT, Claude, Pe
 | 21 | Content Publishing Velocity | 2% | AI Discovery |
 | 22 | Schema Coverage & Depth | 1% | Technical Foundation |
 | 23 | Speakable Schema | 1% | Technical Foundation |
-| 24 | Query-Answer Alignment | 5% | Content Structure |
+| 24 | Query-Answer Alignment | 4% | Content Structure |
 | 25 | Content Cannibalization | 2% | AI Discovery |
 | 26 | Visible Date Signal | 2% | Technical Foundation |
 | 27 | Topic Coherence | 14% | Answer Readiness |
@@ -147,6 +151,8 @@ AEORank evaluates 34 criteria that determine how AI engines (ChatGPT, Claude, Pe
 | 32 | Entity Disambiguation | 2% | Content Structure |
 | 33 | Extraction Friction | 2% | Technical Foundation |
 | 34 | Image Context for AI | 1% | Technical Foundation |
+| 35 | Duplicate Content Blocks | 5% | Answer Readiness |
+| 36 | Cross-Page Duplicate Content | 3% | Answer Readiness |
 </details>
@@ -197,7 +203,7 @@ Or use `npx` directly:
 Run a complete audit. Returns `AuditResult` with:
 - `overallScore` - 0-100 weighted score
-- `scorecard` - 28 `ScoreCardItem` entries (criterion, score 0-10, status, key findings)
+- `scorecard` - 36 `ScoreCardItem` entries (criterion, score 0-10, status, key findings)
 - `detailedFindings` - Per-criterion findings with severity
 - `opportunities` - Prioritized improvements with effort/impact
 - `pitchNumbers` - Key metrics (schema types, AI crawler access, etc.)
@@ -219,10 +225,10 @@ Run a complete audit. Returns `AuditResult` with:
 ### `scorePage(html, url?)`
-Score a single HTML page against 14 per-page AEO criteria. Returns `PageScoreResult` with:
+Score a single HTML page against 21 per-page AEO criteria. Returns `PageScoreResult` with:
-- `aeoScore` - 0-100 weighted score
-- `criterionScores` - 14 `PageCriterionScore` entries (criterion, score 0-10, weight)
+- `aeoScore` - 0-75 weighted score (capped; duplication gate may lower further)
+- `criterionScores` - 21 `PageCriterionScore` entries (criterion, score 0-10, weight)
 ### `scoreAllPages(siteData)`
@@ -382,21 +388,22 @@ console.log(crawlResult.discoveredUrls.length); // Total URLs found
 ## Per-Page Scoring
-AEORank scores each individual page (0-75) against the 20 criteria that apply at page level. Instead of only seeing "your site scores 62," you get "your /about page scores 45, your /blog/guide scores 72."
+AEORank scores each individual page (0-75) against the 21 criteria that apply at page level. Instead of only seeing "your site scores 62," you get "your /about page scores 45, your /blog/guide scores 72."
-The 20 per-page criteria follow the same pillar-first weighting as the site-level score:
+The 21 per-page criteria follow the same pillar-first weighting as the site-level score:
 | Pillar | Per-Page Criteria | Weight |
 |--------|-------------------|--------|
 | **Answer Readiness** | Original Data & Expert Content | 10% |
 | | Fact & Data Density | 6% |
+| | Duplicate Content Blocks | 5% |
 | | Citation-Ready Writing | 4% |
 | | Answer-First Placement | 3% |
 | | Evidence Packaging | 3% |
 | **Content Structure** | Direct Answer Paragraphs | 5% |
-| | Q&A Content Format | 5% |
-| | Query-Answer Alignment | 5% |
-| | FAQ Section Content | 4% |
+| | Q&A Content Format | 4% |
+| | Query-Answer Alignment | 4% |
+| | FAQ Section Content | 3% |
 | | Table & List Extractability | 3% |
 | | Definition Patterns | 2% |
 | | Entity Disambiguation | 2% |
@@ -409,9 +416,11 @@ The 20 per-page criteria follow the same pillar-first weighting as the site-leve
 | | Image Context for AI | 1% |
 | **AI Discovery** | Canonical URL Strategy | 1% |
-The remaining 14 criteria are site-level only: llms.txt, robots.txt, sitemap, RSS, entity consistency, internal linking, content licensing, author schema, content velocity, schema coverage, speakable schema, content cannibalization, topic coherence, and content depth.
+The remaining 15 criteria are site-level only: llms.txt, robots.txt, sitemap, RSS, entity consistency, internal linking, content licensing, author schema, content velocity, schema coverage, speakable schema, content cannibalization, cross-page duplication, topic coherence, and content depth.
 > **Single-page cap:** Per-page scores are capped at 75 since single pages cannot demonstrate site-wide signals like topic coherence, content velocity, or sitemap completeness.
+>
+> **Duplication gate:** Pages with significant duplicate content blocks are score-capped. A page with 3+ copy-pasted paragraphs is capped at 35/75 — LLMs treat repeated content as low-quality regardless of other signals.
 ### CLI Output
@@ -436,7 +445,7 @@ import type { PageScoreResult, PageCriterionScore } from 'aeorank';
 // Score a single page
 const result = scorePage(html, url);
 console.log(result.aeoScore);         // 0-75 (capped for single pages)
-console.log(result.criterionScores);  // 20 per-criterion scores
+console.log(result.criterionScores);  // 21 per-criterion scores
 console.log(result.scoreCapped);      // true if score was capped at 75
 // Score all pages from site data
@@ -565,9 +574,13 @@ console.log(result.comparison.tied);              // Criteria with equal scores
 ## Changelog
+### v3.1.0 - Duplicate Content Detection
+2 new criteria (#35-#36): Duplicate Content Blocks (intra-page, 5%) and Cross-Page Duplicate Content (3%). Detects identical text blocks within pages and copy-pasted paragraphs across pages using shingle-based Jaccard similarity. Boilerplate filtering excludes CTAs, signups, and template content from false positives. Duplication gate caps per-page scores when severe duplication is found. CLI now shows duplicate section names inline per page.
 ### v3.0.0 - 5-Pillar Framework & 6 New Criteria
-Scoring Engine v2: 28 → 34 criteria with 5-pillar framework (Answer Readiness, Content Structure, Trust & Authority, Technical Foundation, AI Discovery). 6 new criteria targeting citation quality, evidence packaging, and extraction friction. Per-pillar sub-scores, top-3 fixes, client-friendly names. Single-page score cap at 75. 15 per-page quality checks (up from 12).
+Scoring Engine v2: 28 → 34 criteria (now 36) with 5-pillar framework (Answer Readiness, Content Structure, Trust & Authority, Technical Foundation, AI Discovery). 6 new criteria targeting citation quality, evidence packaging, and extraction friction. Per-pillar sub-scores, top-3 fixes, client-friendly names. Single-page score cap at 75. 15 per-page quality checks (up from 12).
 ### v2.3.0 - Coherence Scaling & Script Stripping
@@ -595,7 +608,7 @@ Individual page scores (0-100) against 14 page-level criteria. Top/bottom page r
 ## Benchmark Dataset
-The `data/` directory contains the largest open dataset of AI visibility scores - **13,619 domains** scored across 34 criteria, including **4,328 Y Combinator startups** across 48 batches (W06-W26):
+The `data/` directory contains the largest open dataset of AI visibility scores - **13,619 domains** scored across 36 criteria, including **4,328 Y Combinator startups** across 48 batches (W06-W26):
 | File | Contents |
 |------|----------|
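
Reviewer note on the v3.1.0 changelog entry: the README says duplicates are detected "using shingle-based Jaccard similarity" but the diff shown here does not include the implementation. The following is a minimal sketch of that general technique, not the package's actual code. The function names (`tokenize`, `shingles`, `jaccard`, `findDuplicateBlocks`), the shingle size of 3 words, and the 0.8 threshold are all assumptions chosen for illustration.

```typescript
// Hypothetical helpers illustrating shingle-based Jaccard similarity.
// None of these names or defaults are taken from the aeorank source.

/** Split text into lowercase alphanumeric word tokens. */
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

/** Build the set of k-word shingles (overlapping word windows). */
function shingles(text: string, k = 3): Set<string> {
  const words = tokenize(text);
  const result = new Set<string>();
  for (let i = 0; i + k <= words.length; i++) {
    result.add(words.slice(i, i + k).join(' '));
  }
  return result;
}

/** Jaccard similarity |A ∩ B| / |A ∪ B|; treated as 0 when both sets are empty. */
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let intersection = 0;
  a.forEach((s) => {
    if (b.has(s)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

/** Return index pairs of text blocks whose similarity meets the threshold. */
function findDuplicateBlocks(
  blocks: string[],
  threshold = 0.8, // assumed cutoff, not from the source
  k = 3,           // assumed shingle size, not from the source
): Array<[number, number]> {
  const sets = blocks.map((block) => shingles(block, k));
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i < sets.length; i++) {
    for (let j = i + 1; j < sets.length; j++) {
      if (jaccard(sets[i], sets[j]) >= threshold) pairs.push([i, j]);
    }
  }
  return pairs;
}
```

The pairwise loop is quadratic in the number of blocks, which is usually acceptable within one page. A cross-page check over a large crawl would likely need hashing or sampling of shingles, and the boilerplate filtering the changelog mentions would run before this comparison.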

package/dist/browser.d.ts CHANGED

@@ -64,7 +64,7 @@ declare function buildLinkGraph(pages: FetchResult[], domain: string, homepageUr
 /**
  * V2 Pillar Framework — 5-pillar scoring model.
- * Maps all 34 criteria into pillars, computes sub-scores,
+ * Maps all 36 criteria into pillars, computes sub-scores,
  * provides client-friendly names, and calculates top-3 fixes.
  */
@@ -320,7 +320,7 @@ interface SitemapDateAnalysis {
 declare function countRecentSitemapDates(sitemapText: string): SitemapDateAnalysis;
 declare function extractRawDataSummary(data: SiteData): RawDataSummary;
 /**
- * Run all 34 criteria checks using pre-fetched site data.
+ * Run all 36 criteria checks using pre-fetched site data.
  * All functions are synchronous (no HTTP calls) - data was already fetched.
  */
 declare function auditSiteFromData(data: SiteData): CriterionResult[];
@@ -456,7 +456,7 @@ declare function analyzeAllPages(siteData: SiteData): PageReview[];
 /**
  * Per-page AEO scoring.
- * Evaluates 20 of 34 criteria that apply at individual page level.
+ * Evaluates 21 of 36 criteria that apply at individual page level.
  * Produces a 0-75 AEO score per page (single-page cap at 75).
  */
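
Reviewer note on the per-page caps: the README in this release describes two limits on a page score, a single-page cap of 75 and a duplication gate that holds a page with 3+ copy-pasted paragraphs to 35/75. The sketch below shows one way those two rules could compose. It is an assumption-laden illustration, not the package's logic: `applyPageCaps`, the constant names, and the `duplicationGated` field are hypothetical, and only the numbers 75, 35, and 3 come from the README text. The README also mentions capping for "significant" duplication in general, and any intermediate tiers are not modeled here.

```typescript
// Hypothetical sketch of the two per-page caps described in the README.
// Only the values 75, 35, and "3+ paragraphs" come from the source text.
const SINGLE_PAGE_CAP = 75;
const DUPLICATION_CAP = 35;
const DUPLICATE_PARAGRAPH_TRIGGER = 3;

interface GatedScore {
  score: number;
  scoreCapped: boolean;      // true if the single-page cap of 75 was applied
  duplicationGated: boolean; // true if the duplication gate lowered the score (assumed field)
}

function applyPageCaps(rawScore: number, duplicateParagraphs: number): GatedScore {
  // Clamp negatives, then apply the single-page cap first.
  let score = Math.max(0, rawScore);
  const scoreCapped = score > SINGLE_PAGE_CAP;
  score = Math.min(score, SINGLE_PAGE_CAP);

  // The duplication gate only matters when it would actually lower the score.
  const gateActive = duplicateParagraphs >= DUPLICATE_PARAGRAPH_TRIGGER;
  const duplicationGated = gateActive && score > DUPLICATION_CAP;
  if (duplicationGated) score = DUPLICATION_CAP;

  return { score, scoreCapped, duplicationGated };
}

// Usage: a strong page with three duplicated paragraphs is held at 35.
const gated = applyPageCaps(60, 3);
console.log(gated.score, gated.duplicationGated);
```

Applying the single-page cap before the gate keeps the two flags independent, so a caller can tell whether a low score came from duplication or from the page's own signals.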