seo-intel 1.5.40 → 1.5.45
- package/CHANGELOG.md +64 -0
- package/analyses/blog-draft/prescorer.js +17 -0
- package/analyses/loop/orchestrator.js +179 -0
- package/cli.js +162 -6
- package/crawler/html-extract.js +127 -0
- package/crawler/light.js +169 -0
- package/db/db.js +66 -0
- package/lib/gate.js +33 -1
- package/lib/intel.js +9 -3
- package/mcp/server.js +172 -17
- package/package.json +1 -1
- package/reports/generate-html.js +42 -404
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,69 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 1.5.45 (2026-05-29)
|
|
4
|
+
|
|
5
|
+
### The content loop in one command — `seo-intel loop` + `run_content_loop` (MCP)
|
|
6
|
+
Closes the loop from the agent's side: instead of running gap-finding, drafting, and scoring as separate steps, one invocation walks the whole content half — **rank the open gaps → draft the highest-leverage one → AEO-prescore → record it → queue for publish.**
|
|
7
|
+
|
|
8
|
+
- **CLI `seo-intel loop <project>`** (free): picks the top gap from the Intelligence Ledger (ranked by priority × source × AI-intent), drafts it with your chosen model, pre-scores citability, optionally auto-revises (`--revise k`) until it clears `--min-score`, marks the gap in-progress (the v1.5.42 write-back), and writes the approved draft to `reports/ready/<project>/`. Flags: `--topic`, `--count`, `--lang`, `--type`, `--model`, `--min-score`, `--revise`, `--no-queue`, `--dry-run`, `--format json`. `--dry-run` shows which gap it would target with no model call.
|
|
9
|
+
- **MCP `run_content_loop`** (free): the same ranking + selection, returned as a seeded draft prompt for the agent's own LLM (hand-back mode) — write the draft, then call `prescore_draft(project, topic)` to score and close the loop. `dry_run` to just see the target.
|
|
10
|
+
- New module `analyses/loop/orchestrator.js` (`runContentLoop`) backs both surfaces; the model writer is injectable so the CLI drives your cloud model while MCP hands the prompt to the agent.
|
|
11
|
+
- The queued draft carries front-matter (`status: ready`, score, tier, source gap) as a publish handoff — it does not auto-deploy.
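The "priority × source × AI-intent" ranking described above can be sketched as follows. This is a minimal illustration, not the shipped module: the weight tables and the intent pattern are copied from `analyses/loop/orchestrator.js` as it appears in this diff, while the helper names `leverageOf` and `rankCandidates` and the sample candidates are assumptions made for the example.

```javascript
// Sketch only: weights mirror the orchestrator diff; function names are illustrative.
const PRIORITY_W = { high: 3, medium: 2, low: 1 };
const SOURCE_W = { citability_gap: 1.3, content_gap: 1.3, keyword_gap: 1.1, long_tail: 1.0, keyword_inventor: 1.0 };
const HOT_INTENT = /decision|comparison|implementation|compare|\bvs\b|best|should/i;

// leverage = priority weight x source weight x (1 + intent bonus)
function leverageOf({ source, priority = 'medium', intent = null }) {
  const pw = PRIORITY_W[String(priority).toLowerCase()] || 2; // unknown priority counts as medium
  const sw = SOURCE_W[source] || 1;                           // unknown source gets a neutral weight
  const bonus = intent && HOT_INTENT.test(intent) ? 0.25 : 0;
  return +(pw * sw * (1 + bonus)).toFixed(3);
}

// Rank candidates, keeping only the highest-leverage entry per topic (case-insensitive).
function rankCandidates(candidates) {
  const best = new Map();
  for (const c of candidates) {
    const topic = String(c.topic || '').trim();
    if (!topic) continue; // candidates without a topic are ignored
    const scored = { ...c, topic, leverage: leverageOf(c) };
    const key = topic.toLowerCase();
    if (!best.has(key) || best.get(key).leverage < scored.leverage) best.set(key, scored);
  }
  return [...best.values()].sort((a, b) => b.leverage - a.leverage);
}

// Hypothetical sample data, not taken from a real Ledger.
const ranked = rankCandidates([
  { source: 'keyword_gap', topic: 'Schema markup', priority: 'medium', intent: 'informational' },
  { source: 'citability_gap', topic: 'schema markup', priority: 'high', intent: 'comparison' },
  { source: 'long_tail', topic: 'best crawl budget tools', priority: 'low', intent: 'best' },
]);
console.log(ranked.map((r) => `${r.topic}: ${r.leverage}`));
```

The two "schema markup" entries collapse into one, and the citability gap wins because a high priority, a 1.3 source weight, and a hot intent multiply together. Keeping the score multiplicative means a low-priority gap cannot outrank a high-priority one on intent alone.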
|
|
12
|
+
|
|
13
|
+
## 1.5.44 (2026-05-29)
|
|
14
|
+
|
|
15
|
+
### New standalone skill: `ai-citability` — score AI citability with zero install, zero account
|
|
16
|
+
A drop-in Agent Skill (Claude Code / Cursor / Codex) that scores any page or draft 0–100 for how easily an AI assistant (ChatGPT, Claude, Perplexity, AI Overviews, Bing Copilot) can cite it — across the same six signals as the full AEO audit.
|
|
17
|
+
|
|
18
|
+
- **Truly self-contained:** pure Node, no `npm install`, no account, no API key, no network. Nothing is saved or sent. Drop the folder into your agent's skills directory and it works.
|
|
19
|
+
- **Markdown or HTML** input, from a file or stdin: `node scripts/score.mjs <file>` (add `--json` for machine output). The agent fetches the content however it likes — a local file, WebFetch, or the `crawl_site` MCP tool.
|
|
20
|
+
- Reports the overall score + tier, all six signal bars, the two weakest signals with concrete fixes, and funnels to the full `seo-intel aeo` audit for whole-site, entity-aware, historical scoring.
|
|
21
|
+
- Ships the exact scoring engine as the product (vendored from `analyses/aeo/scorer.js`), with a smoke-test drift guard so the standalone score never diverges from `seo-intel aeo`.
|
|
22
|
+
- Lives in `skills/ai-citability/` — distributed via the repo / skill directories, not the npm package.
|
|
23
|
+
|
|
24
|
+
## 1.5.43 (2026-05-29)
|
|
25
|
+
|
|
26
|
+
### New MCP tool: `crawl_site` — crawl any URL, no setup, no account, nothing saved
|
|
27
|
+
A zero-config crawl for any AI agent. Point it at a URL and it returns structured SEO/AEO data — no project to configure, no account, no API key, and nothing is persisted or sent anywhere.
|
|
28
|
+
|
|
29
|
+
- **Lightweight by design:** plain HTTP fetch (no browser, no Playwright download), same-origin BFS, honours `robots.txt` + crawl-delay, small page budget (default 10, hard cap 50). www/non-www and http/https are treated as the same site.
|
|
30
|
+
- **Returns per page:** title, meta description, canonical, indexability, headings, internal/external link counts, JSON-LD schema types, word count, and published/modified dates — plus a deduped list of discovered internal URLs.
|
|
31
|
+
- **Optional AI-citability (AEO) score** per page via `include_citability` (approximate in light mode — no entity extraction; run `seo-intel aeo` for the full score).
|
|
32
|
+
- **Knows its limits:** JavaScript-rendered / SPA pages under-report content (the response says so) — use `seo-intel crawl` (Playwright) for those, and install seo-intel for persistent history, the Intelligence Ledger, and competitor analysis.
|
|
33
|
+
- New modules: `crawler/light.js` (fetch crawler) + `crawler/html-extract.js` (pure regex HTML extraction, zero new dependencies).
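To make the "same-origin BFS with a small page budget" idea concrete, here is a rough sketch of the traversal only. It is not the code in `crawler/light.js`: the names `siteKey`, `planCrawl`, and `getLinks` are hypothetical, the link graph is in memory so nothing is fetched, and `robots.txt` handling and crawl-delay are deliberately left out.

```javascript
// Sketch only: `getLinks` stands in for "fetch the page and extract its hrefs".
// Treat www/non-www and http/https as the same site by comparing bare hostnames.
function siteKey(url) {
  return new URL(url).hostname.replace(/^www\./i, '').toLowerCase();
}

function planCrawl(startUrl, getLinks, { maxPages = 10, hardCap = 50 } = {}) {
  const budget = Math.min(Math.max(1, maxPages), hardCap); // clamp to 1..hardCap
  const start = new URL(startUrl).href;                    // normalise "https://x" to "https://x/"
  const site = siteKey(start);
  const seen = new Set([start]);
  const queue = [start];
  const visited = [];
  while (queue.length && visited.length < budget) {
    const url = queue.shift(); // FIFO queue gives breadth-first order
    visited.push(url);
    for (const href of getLinks(url)) {
      let next;
      try { next = new URL(href, url); } catch { continue; } // skip unparseable hrefs
      if (!/^https?:$/.test(next.protocol)) continue;         // skip mailto:, javascript:, etc.
      next.hash = '';                                         // fragments point at the same page
      const abs = next.href;
      if (siteKey(abs) !== site) continue;                    // same-site only
      if (seen.has(abs)) continue;                            // dedupe discovered URLs
      seen.add(abs);
      queue.push(abs);
    }
  }
  return visited;
}

// Hypothetical link graph used in place of real HTTP fetches.
const graph = {
  'https://example.com/': ['/a', 'https://www.example.com/b', 'https://other.test/x', 'mailto:hi@example.com'],
  'https://example.com/a': ['/c', '/'],
  'https://www.example.com/b': ['http://example.com/d'],
};
const order = planCrawl('https://example.com', (u) => graph[u] || [], { maxPages: 4 });
console.log(order);
```

With a budget of four, the traversal visits the start page, its two same-site links, and one second-level page, then stops with `/d` still queued. The off-site and `mailto:` links are never enqueued.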
|
|
34
|
+
|
|
35
|
+
## 1.5.42 (2026-05-29)
|
|
36
|
+
|
|
37
|
+
### The content loop now remembers its own work
|
|
38
|
+
Drafting a post used to leave the Intelligence Ledger untouched — so the same gap kept getting suggested even after you'd written about it. Now drafting closes that memory gap.
|
|
39
|
+
|
|
40
|
+
- **`blog-draft` writes back to the Ledger.** After a draft is generated and AEO-scored, SEO Intel records a `draft_created` insight and flips the matching gap(s) to `in_progress` — so they stop resurfacing until a re-audit re-scores the published page. Matching is precise (keyword/topic), and the write-back is best-effort: a Ledger hiccup never fails your draft.
|
|
41
|
+
- **MCP `prescore_draft` can close the loop too.** It gains optional `project` and `topic` arguments — pass them and the scored draft is recorded + matching gaps marked `in_progress`, identical to the CLI. Omit them for a pure, stateless score (unchanged default).
|
|
42
|
+
- When no `--topic` is given, the targeted gap is recovered from the draft's own frontmatter title or first H1.
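The "best-effort" write-back described above amounts to wrapping the Ledger calls so that a storage failure is reported but never propagated. The sketch below illustrates that pattern under stated assumptions: `closeLoop`, the `ledger` object, and its `recordDraft` / `markInProgress` methods are hypothetical stand-ins rather than the package's real functions, and the `tier` value is a placeholder.

```javascript
// Sketch only: `ledger` and its two methods are hypothetical stand-ins for the real Ledger calls.
function closeLoop(ledger, project, topic, scoreResult) {
  const outcome = { draft_recorded: false, gaps_marked_in_progress: 0, error: null };
  try {
    ledger.recordDraft(project, { topic, score: scoreResult.score, tier: scoreResult.tier });
    outcome.draft_recorded = true;
    outcome.gaps_marked_in_progress = ledger.markInProgress(project, topic);
  } catch (err) {
    // Best-effort: note the problem but never throw, so the draft itself still succeeds.
    outcome.error = err.message;
  }
  return outcome;
}

// A ledger that works, and one that fails on the first write.
const workingLedger = {
  recordDraft() {},
  markInProgress() { return 2; },
};
const brokenLedger = {
  recordDraft() { throw new Error('database is locked'); },
  markInProgress() { return 0; },
};

const ok = closeLoop(workingLedger, 'acme', 'schema markup', { score: 72, tier: 'placeholder-tier' });
const failed = closeLoop(brokenLedger, 'acme', 'schema markup', { score: 72, tier: 'placeholder-tier' });
console.log(ok, failed);
```

Returning an outcome object instead of throwing lets the caller print a "write-back skipped" note while still delivering the draft.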
|
|
43
|
+
|
|
44
|
+
### Fix — free-tier `blog-draft` no longer claims "no data" when you have citability gaps
|
|
45
|
+
`blog-draft`'s empty-state check only counted competitor-analysis gaps, so a free user whose gaps came from `aeo` (AI citability) or only from `keywords` could be wrongly told "No intelligence data found." It now counts citability and content gaps too, and points free users at `aeo` + `keywords` first (competitor gaps via `analyze` are Solo).
|
|
46
|
+
|
|
47
|
+
### Fix — MCP startup banner now reflects the real free/paid split
|
|
48
|
+
The `seo-intel-mcp` startup line (shown in your MCP host's logs) still listed `run_citability_audit`, `prescore_draft`, `draft_blog_prompt`, and `get_intel(audit/blog)` as paid. They've been free since 1.5.41 — the banner now says so, listing only competitor synthesis (`get_competitor_positioning`, `get_intel(competitor)`, the `analyses` export table) as Solo.
|
|
49
|
+
|
|
50
|
+
## 1.5.41 (2026-05-28)
|
|
51
|
+
|
|
52
|
+
### Your own site is now fully free — across CLI, MCP, and the dashboard
|
|
53
|
+
Everything SEO Intel can tell you about **your own** site is now available on the free tier: full crawl, AI Citability (AEO) scoring, keyword intelligence, programmatic template detection, orphan detection, JS-rendering delta, Search Console insights, blog-draft generation, and the complete dashboard. The daily problem-notification cron stays free too.
|
|
54
|
+
|
|
55
|
+
Solo (€19.99/mo) now focuses on the things a tool has to do that an AI agent can't do for itself:
|
|
56
|
+
|
|
57
|
+
- **Competitor synthesis** — gap analysis, keyword battleground, positioning, competitive landscape sections, and the competitor export/digest.
|
|
58
|
+
- **Automation** — the scheduled-crawl scheduler.
|
|
59
|
+
- **History & trends** — the crawl change brief ("what changed") and publishing velocity.
|
|
60
|
+
|
|
61
|
+
**CLI:** own-site commands (`aeo`, `keywords`, `templates`, `orphans`, `js-delta`, `blog-draft`, `gsc-insights`, `scan`, `extract`, `html`) no longer prompt to upgrade. `intel <project> --for=audit|blog` is free; `--for=competitor` is Solo.
|
|
62
|
+
|
|
63
|
+
**MCP:** `run_citability_audit`, `prescore_draft`, and `draft_blog_prompt` are now free tools. `export_intel` returns all own-site tables (pages, keywords, headings, links, technical, schemas, extractions, citability scores, and the Intelligence Ledger) for free; only the competitor gap-analysis history requires Solo. `get_competitor_positioning` remains Solo.
|
|
64
|
+
|
|
65
|
+
**Dashboard:** collapsed to a single template that gates sections individually — own-site analysis (citability, keyword inventor, top keywords, GSC insights, internal links, exports, drafts) renders on every tier; competitor and strategy sections render with Solo. The old separate free-tier dashboard codepath is gone.
|
|
66
|
+
|
|
3
67
|
## 1.5.40 (2026-05-28)
|
|
4
68
|
|
|
5
69
|
### Setup wizard — daily notifications now self-install
|
package/analyses/blog-draft/prescorer.js
CHANGED
|
@@ -10,6 +10,23 @@
|
|
|
10
10
|
|
|
11
11
|
import { scorePage } from '../aeo/scorer.js';
|
|
12
12
|
|
|
13
|
+
/**
|
|
14
|
+
* Recover a draft's subject from its own output when no explicit topic was
|
|
15
|
+
* given — so the agentic-loop write-back (F1) can still match Ledger gaps.
|
|
16
|
+
* Prefers YAML frontmatter `title:`, falls back to the first markdown H1.
|
|
17
|
+
* Shared by CLI `blog-draft` and the MCP `prescore_draft` tool.
|
|
18
|
+
* @param {string} draft
|
|
19
|
+
* @returns {string|null}
|
|
20
|
+
*/
|
|
21
|
+
export function extractDraftTopic(draft) {
|
|
22
|
+
if (!draft) return null;
|
|
23
|
+
const fm = draft.match(/^\s*---\s*[\s\S]*?\btitle\s*:\s*["']?(.+?)["']?\s*$/im);
|
|
24
|
+
if (fm && fm[1]) return fm[1].trim();
|
|
25
|
+
const h1 = draft.match(/^\s{0,3}#\s+(.+?)\s*$/m);
|
|
26
|
+
if (h1 && h1[1]) return h1[1].replace(/[#*_`]/g, '').trim();
|
|
27
|
+
return null;
|
|
28
|
+
}
|
|
29
|
+
|
|
13
30
|
export function prescore(markdownText) {
|
|
14
31
|
// Strip YAML frontmatter
|
|
15
32
|
const bodyMatch = markdownText.match(/^---[\s\S]*?---\n([\s\S]*)$/);
|
package/analyses/loop/orchestrator.js
|
@@ -0,0 +1,179 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Content-loop orchestrator (F2) — the "hands, not eyes" engine.
|
|
3
|
+
*
|
|
4
|
+
* One invocation walks the content half of the agentic loop:
|
|
5
|
+
* gather gaps → rank → draft → prescore → (revise) → write-back → queue
|
|
6
|
+
*
|
|
7
|
+
* Library-first: backs both `seo-intel loop` (CLI) and `run_content_loop` (MCP).
|
|
8
|
+
* The `generate` fn is injected so the CLI can drive the user's cloud model while
|
|
9
|
+
* MCP can hand the prompt back to the agent's own LLM (generate = null).
|
|
10
|
+
*
|
|
11
|
+
* Builds on F1 (v1.5.42): a finished draft records a `draft_created` insight and
|
|
12
|
+
* flips matching gaps to in_progress, so the loop remembers its own work.
|
|
13
|
+
* F3 (re-audit → flip in_progress→done once the live page clears 60) is NOT here.
|
|
14
|
+
*/
|
|
15
|
+
|
|
16
|
+
import { writeFileSync, mkdirSync } from 'node:fs';
|
|
17
|
+
import { join, dirname, relative } from 'node:path';
|
|
18
|
+
import { fileURLToPath } from 'node:url';
|
|
19
|
+
import { gatherBlogDraftContext, buildBlogDraftPrompt } from '../blog-draft/index.js';
|
|
20
|
+
import { prescore, extractDraftTopic } from '../blog-draft/prescorer.js';
|
|
21
|
+
import { recordDraftCreated, markGapsInProgress } from '../../db/db.js';
|
|
22
|
+
|
|
23
|
+
const __dirname = dirname(fileURLToPath(import.meta.url));
|
|
24
|
+
const REPO_ROOT = join(__dirname, '../..');
|
|
25
|
+
const REPORTS_DIR = join(REPO_ROOT, 'reports');
|
|
26
|
+
|
|
27
|
+
const PRIORITY_W = { high: 3, medium: 2, low: 1 };
|
|
28
|
+
const SOURCE_W = { citability_gap: 1.3, content_gap: 1.3, keyword_gap: 1.1, long_tail: 1.0, keyword_inventor: 1.0 };
|
|
29
|
+
const HOT_INTENT = /decision|comparison|implementation|compare|\bvs\b|best|should/i;
|
|
30
|
+
|
|
31
|
+
function slugify(s) {
|
|
32
|
+
return (s || 'draft').toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '').slice(0, 60) || 'draft';
|
|
33
|
+
}
|
|
34
|
+
|
|
35
|
+
/** Build a unified, leverage-ranked list of candidate gaps from gathered context. */
|
|
36
|
+
export function rankGaps(ctx) {
|
|
37
|
+
const cands = [];
|
|
38
|
+
const add = (source, topicRaw, priority, intent, extra = {}) => {
|
|
39
|
+
const topic = (topicRaw || '').toString().trim();
|
|
40
|
+
if (!topic) return;
|
|
41
|
+
const pw = PRIORITY_W[(priority || 'medium').toLowerCase()] || 2;
|
|
42
|
+
const sw = SOURCE_W[source] || 1;
|
|
43
|
+
const bonus = intent && HOT_INTENT.test(intent) ? 0.25 : 0;
|
|
44
|
+
cands.push({ source, topic, priority: (priority || 'medium'), intent: intent || null, leverage: +(pw * sw * (1 + bonus)).toFixed(3), ...extra });
|
|
45
|
+
};
|
|
46
|
+
for (const kg of ctx.keywordGaps || []) add('keyword_gap', kg.keyword, kg.priority, kg.intent);
|
|
47
|
+
for (const lt of ctx.longTails || []) add('long_tail', lt.phrase, lt.priority, lt.intent);
|
|
48
|
+
for (const kw of ctx.kwInventor || []) add('keyword_inventor', kw.phrase, kw.priority, kw.intent);
|
|
49
|
+
for (const cg of ctx.contentGaps || []) add('content_gap', typeof cg === 'string' ? cg : (cg.topic || cg.suggested_title || cg.gap), cg.priority || 'high', null);
|
|
50
|
+
for (const cgap of ctx.citabilityGaps || []) add('citability_gap', cgap.title || cgap.url, (cgap.score ?? 50) < 35 ? 'high' : 'medium', (cgap.ai_intents || [])[0], { url: cgap.url, current_score: cgap.score });
|
|
51
|
+
|
|
52
|
+
// Dedupe by lowercased topic, keep highest leverage.
|
|
53
|
+
const best = new Map();
|
|
54
|
+
for (const c of cands) {
|
|
55
|
+
const k = c.topic.toLowerCase();
|
|
56
|
+
if (!best.has(k) || best.get(k).leverage < c.leverage) best.set(k, c);
|
|
57
|
+
}
|
|
58
|
+
return [...best.values()].sort((a, b) => b.leverage - a.leverage);
|
|
59
|
+
}
|
|
60
|
+
|
|
61
|
+
/**
|
|
62
|
+
* @param {import('node:sqlite').DatabaseSync} db
|
|
63
|
+
* @param {string} project
|
|
64
|
+
* @param {object} opts
|
|
65
|
+
* @param {object} opts.config project config (for the prompt builder)
|
|
66
|
+
* @param {string} [opts.topic] focus topic (else auto-pick top gap)
|
|
67
|
+
* @param {number} [opts.count=1] draft the top N gaps
|
|
68
|
+
* @param {string} [opts.lang='en']
|
|
69
|
+
* @param {string} [opts.contentType='blog']
|
|
70
|
+
* @param {number} [opts.minScore=60]
|
|
71
|
+
* @param {number} [opts.revise=0] auto-revise up to k times if below minScore
|
|
72
|
+
* @param {boolean} [opts.queue=true] write approved drafts to reports/ready/<project>/
|
|
73
|
+
* @param {string} [opts.queueDir]
|
|
74
|
+
* @param {boolean} [opts.dryRun=false] select + plan only, no model call
|
|
75
|
+
* @param {(prompt:string)=>Promise<string|null>} [opts.generate] null ⇒ hand-back mode
|
|
76
|
+
* @param {(msg:string)=>void} [opts.onProgress]
|
|
77
|
+
*/
|
|
78
|
+
export async function runContentLoop(db, project, opts = {}) {
|
|
79
|
+
const {
|
|
80
|
+
config = { project }, topic = null, count = 1, lang = 'en', contentType = 'blog',
|
|
81
|
+
minScore = 60, revise = 0, queue = true, queueDir, dryRun = false,
|
|
82
|
+
generate = null, onProgress = () => {},
|
|
83
|
+
} = opts;
|
|
84
|
+
|
|
85
|
+
const ctx = gatherBlogDraftContext(db, project, topic);
|
|
86
|
+
const ranked = rankGaps(ctx);
|
|
87
|
+
|
|
88
|
+
if (!ranked.length) {
|
|
89
|
+
return {
|
|
90
|
+
project, mode: 'no-gaps', drafts: [], skipped: [],
|
|
91
|
+
next_action: 'No active gaps to draft. Run `seo-intel aeo` + `seo-intel keywords` (own-site, free) or `seo-intel analyze` (competitor, Solo) to populate the Ledger.',
|
|
92
|
+
};
|
|
93
|
+
}
|
|
94
|
+
|
|
95
|
+
const targets = ranked.slice(0, Math.max(1, count));
|
|
96
|
+
|
|
97
|
+
if (dryRun) {
|
|
98
|
+
return {
|
|
99
|
+
project, mode: 'dry-run',
|
|
100
|
+
planned: targets.map(t => ({ topic: t.topic, source: t.source, priority: t.priority, leverage: t.leverage })),
|
|
101
|
+
next_action: 'Dry run — no draft generated. Re-run without dryRun to draft these.',
|
|
102
|
+
};
|
|
103
|
+
}
|
|
104
|
+
|
|
105
|
+
const drafts = [];
|
|
106
|
+
const skipped = [];
|
|
107
|
+
|
|
108
|
+
for (const target of targets) {
|
|
109
|
+
const tTopic = target.topic;
|
|
110
|
+
const prompt = buildBlogDraftPrompt(ctx, { config, lang, topic: tTopic, contentType });
|
|
111
|
+
|
|
112
|
+
// Hand-back mode (MCP default): no model wired — return the prompt so the
|
|
113
|
+
// agent's own LLM writes the draft, then calls prescore_draft to close.
|
|
114
|
+
if (!generate) {
|
|
115
|
+
drafts.push({
|
|
116
|
+
gap: { source: target.source, topic: tTopic, leverage: target.leverage },
|
|
117
|
+
mode: 'handback', prompt,
|
|
118
|
+
next: `Write the draft from this prompt with your own LLM, then call prescore_draft(project="${project}", topic="${tTopic}") to AEO-score it and close the loop.`,
|
|
119
|
+
});
|
|
120
|
+
continue;
|
|
121
|
+
}
|
|
122
|
+
|
|
123
|
+
onProgress(`drafting "${tTopic}" (${target.source} · leverage ${target.leverage})`);
|
|
124
|
+
let draft = await generate(prompt);
|
|
125
|
+
if (!draft) { skipped.push({ topic: tTopic, reason: 'generation_failed' }); continue; }
|
|
126
|
+
let score = prescore(draft);
|
|
127
|
+
|
|
128
|
+
let revisions = 0;
|
|
129
|
+
while (score.score < minScore && revisions < revise) {
|
|
130
|
+
const weak = Object.entries(score.breakdown).sort((a, b) => a[1] - b[1]).slice(0, 2).map(([k]) => k.replace(/_/g, ' '));
|
|
131
|
+
onProgress(`revising ${revisions + 1}/${revise} (was ${score.score}; weak: ${weak.join(', ')})`);
|
|
132
|
+
const revised = await generate(prompt + `\n\n## REVISION\nThe previous draft scored ${score.score}/100. Strengthen the weakest signals: ${weak.join(', ')}. Add Q&A structure (H2 question → immediate answer), concrete numbers/dates, and named entities/sources. Return the full improved draft.`);
|
|
133
|
+
revisions++;
|
|
134
|
+
if (revised) {
|
|
135
|
+
const rs = prescore(revised);
|
|
136
|
+
if (rs.score >= score.score) { draft = revised; score = rs; }
|
|
137
|
+
}
|
|
138
|
+
}
|
|
139
|
+
|
|
140
|
+
const effectiveTopic = tTopic || extractDraftTopic(draft);
|
|
141
|
+
|
|
142
|
+
// Queue for publish (handoff, not auto-deploy).
|
|
143
|
+
let queuedPath = null;
|
|
144
|
+
if (queue) {
|
|
145
|
+
const dir = queueDir || join(REPORTS_DIR, 'ready', project);
|
|
146
|
+
mkdirSync(dir, { recursive: true });
|
|
147
|
+
queuedPath = join(dir, `${slugify(effectiveTopic)}.md`);
|
|
148
|
+
const fm = `---\nstatus: ready\nscore: ${score.score}\ntier: ${score.tier}\ntopic: ${JSON.stringify(effectiveTopic)}\nsource_gap: ${target.source}\nlang: ${lang}\ntype: ${contentType}\ncreated_at: ${new Date().toISOString()}\n---\n\n`;
|
|
149
|
+
writeFileSync(queuedPath, /^\s*---/.test(draft) ? draft : fm + draft, 'utf8');
|
|
150
|
+
}
|
|
151
|
+
|
|
152
|
+
// Write-back (F1) — never let a Ledger hiccup fail the draft.
|
|
153
|
+
let marked = 0;
|
|
154
|
+
try {
|
|
155
|
+
recordDraftCreated(db, project, { topic: effectiveTopic, score: score.score, tier: score.tier, wordCount: score.wordCount, lang, contentType, savedPath: queuedPath });
|
|
156
|
+
marked = markGapsInProgress(db, project, effectiveTopic);
|
|
157
|
+
} catch { /* best-effort */ }
|
|
158
|
+
|
|
159
|
+
drafts.push({
|
|
160
|
+
gap: { source: target.source, topic: tTopic, leverage: target.leverage, ...(target.url ? { url: target.url, previous_score: target.current_score } : {}) },
|
|
161
|
+
topic: effectiveTopic, score: score.score, tier: score.tier, revisions,
|
|
162
|
+
word_count: score.wordCount,
|
|
163
|
+
queued_path: queuedPath ? relative(REPO_ROOT, queuedPath) : null,
|
|
164
|
+
ledger: { draft_recorded: true, gaps_marked_in_progress: marked },
|
|
165
|
+
});
|
|
166
|
+
}
|
|
167
|
+
|
|
168
|
+
const handback = !generate;
|
|
169
|
+
return {
|
|
170
|
+
project,
|
|
171
|
+
mode: handback ? 'handback' : 'generated',
|
|
172
|
+
drafts, skipped,
|
|
173
|
+
next_action: handback
|
|
174
|
+
? `No generation model wired — write each draft from its prompt, then call prescore_draft(project, topic) to score + close the loop. (CLI: \`seo-intel loop ${project}\` drives a model directly.)`
|
|
175
|
+
: queue
|
|
176
|
+
? `Review reports/ready/${project}/, publish, then re-crawl + \`seo-intel aeo\` to verify the gap closed.`
|
|
177
|
+
: 'Drafts generated (not queued).',
|
|
178
|
+
};
|
|
179
|
+
}
|
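The revise step in the orchestrator above follows a bounded "retry, but never accept a regression" pattern. A simplified sketch of just that loop is given here. It makes several assumptions for brevity: `reviseUntil` is a hypothetical name, the injected `generate` and `score` functions are synchronous (the orchestrator awaits an async `generate`), and the toy writer and scorer exist only to make the example runnable.

```javascript
// Sketch only: `generate` and `score` are injected; both are synchronous here for brevity.
function reviseUntil(prompt, generate, score, { minScore = 60, revise = 0 } = {}) {
  let draft = generate(prompt);
  if (!draft) return null; // generation failed, nothing to score
  let result = score(draft);
  let revisions = 0;
  while (result.score < minScore && revisions < revise) {
    const revised = generate(`${prompt}\n\n## REVISION\nThe previous draft scored ${result.score}/100.`);
    revisions++; // count the attempt even if the revision is rejected
    if (!revised) continue;
    const next = score(revised);
    // Keep the revision only if it scores at least as well as the current draft.
    if (next.score >= result.score) { draft = revised; result = next; }
  }
  return { draft, score: result.score, revisions };
}

// Toy writer and scorer: each call yields a longer draft, and score = length capped at 100.
let calls = 0;
const toyGenerate = () => 'x'.repeat(40 + 15 * calls++);
const toyScore = (text) => ({ score: Math.min(100, text.length) });

const out = reviseUntil('Write about schema markup', toyGenerate, toyScore, { minScore: 60, revise: 3 });
console.log(out.score, out.revisions);
```

Because the attempt counter increments before the result is inspected, a writer that keeps returning nothing still terminates after `revise` attempts.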
package/cli.js
CHANGED
|
@@ -40,6 +40,7 @@ import {
|
|
|
40
40
|
getPageHash, getSchemasByProject,
|
|
41
41
|
upsertInsightsFromAnalysis, upsertInsightsFromKeywords,
|
|
42
42
|
upsertSitemapUrls,
|
|
43
|
+
recordDraftCreated, markGapsInProgress,
|
|
43
44
|
} from './db/db.js';
|
|
44
45
|
import { generateMultiDashboard } from './reports/generate-html.js';
|
|
45
46
|
import { buildTechnicalActions } from './exports/technical.js';
|
|
@@ -4709,7 +4710,7 @@ program
|
|
|
4709
4710
|
printAttackHeader('✍️ AEO Blog Draft Generator', project);
|
|
4710
4711
|
|
|
4711
4712
|
const { gatherBlogDraftContext, buildBlogDraftPrompt } = await getBlogDraftModule();
|
|
4712
|
-
const { prescore } = await getPrescorerModule();
|
|
4713
|
+
const { prescore, extractDraftTopic } = await getPrescorerModule();
|
|
4713
4714
|
|
|
4714
4715
|
// ── Gather intelligence ──
|
|
4715
4716
|
console.log(chalk.gray(' Gathering intelligence from Ledger...'));
|
|
@@ -4728,9 +4729,13 @@ program
|
|
|
4728
4729
|
console.log(chalk.gray(` Keyword gaps: ${stats.keywordGaps} Long-tails: ${stats.longTails} Citability gaps: ${stats.citabilityGaps}`));
|
|
4729
4730
|
console.log(chalk.gray(` Keyword inventor: ${stats.kwInventor} Content gaps: ${stats.contentGaps} Entities: ${stats.entities}`));
|
|
4730
4731
|
|
|
4731
|
-
|
|
4732
|
+
// F5 (v1.5.42): count the free-tier gap sources too — a free user's gaps
|
|
4733
|
+
// come from `aeo` (citability) and `keywords` (kw-inventor), not just the
|
|
4734
|
+
// Solo competitor analysis. Don't tell them "no data" when AEO gaps exist.
|
|
4735
|
+
if (stats.keywordGaps + stats.longTails + stats.kwInventor + stats.citabilityGaps + stats.contentGaps === 0) {
|
|
4732
4736
|
console.log(chalk.yellow('\n ⚠️ No intelligence data found in the Ledger.'));
|
|
4733
|
-
console.log(chalk.gray('
|
|
4737
|
+
console.log(chalk.gray(' Free sources: seo-intel aeo ' + project + ' and seo-intel keywords ' + project));
|
|
4738
|
+
console.log(chalk.gray(' Competitor gaps (Solo): seo-intel analyze ' + project + '\n'));
|
|
4734
4739
|
return;
|
|
4735
4740
|
}
|
|
4736
4741
|
|
|
@@ -4786,14 +4791,15 @@ program
|
|
|
4786
4791
|
}
|
|
4787
4792
|
|
|
4788
4793
|
// ── Output ──
|
|
4794
|
+
let savedPath = null;
|
|
4789
4795
|
if (opts.save) {
|
|
4790
4796
|
const slug = opts.topic
|
|
4791
4797
|
? opts.topic.toLowerCase().replace(/[^a-z0-9]+/g, '-').slice(0, 50)
|
|
4792
4798
|
: 'auto';
|
|
4793
4799
|
const filename = `${project}-blog-draft-${slug}-${new Date().toISOString().slice(0, 10)}.md`;
|
|
4794
|
-
|
|
4795
|
-
writeFileSync(
|
|
4796
|
-
console.log(chalk.bold.green(` ✅ Draft saved: ${
|
|
4800
|
+
savedPath = join(__dirname, 'reports', filename);
|
|
4801
|
+
writeFileSync(savedPath, draft, 'utf8');
|
|
4802
|
+
console.log(chalk.bold.green(` ✅ Draft saved: ${savedPath}`));
|
|
4797
4803
|
console.log('');
|
|
4798
4804
|
} else {
|
|
4799
4805
|
console.log(chalk.bold(' 📝 Generated Draft'));
|
|
@@ -4804,6 +4810,32 @@ program
|
|
|
4804
4810
|
console.log(chalk.gray(' Tip: add --save to write the draft to reports/'));
|
|
4805
4811
|
console.log('');
|
|
4806
4812
|
}
|
|
4813
|
+
|
|
4814
|
+
// ── F1 (v1.5.42): close the loop's memory gap ──
|
|
4815
|
+
// Record the draft in the Ledger and flip matching gaps to in_progress so
|
|
4816
|
+
// they stop resurfacing next pass. Best-effort — never break the command.
|
|
4817
|
+
try {
|
|
4818
|
+
const effectiveTopic = opts.topic || extractDraftTopic(draft);
|
|
4819
|
+
recordDraftCreated(db, project, {
|
|
4820
|
+
topic: effectiveTopic,
|
|
4821
|
+
score: scoreResult.score,
|
|
4822
|
+
tier: scoreResult.tier,
|
|
4823
|
+
wordCount: scoreResult.wordCount,
|
|
4824
|
+
lang: opts.lang,
|
|
4825
|
+
contentType: opts.type || 'blog',
|
|
4826
|
+
savedPath,
|
|
4827
|
+
});
|
|
4828
|
+
const marked = markGapsInProgress(db, project, effectiveTopic);
|
|
4829
|
+
if (marked > 0) {
|
|
4830
|
+
console.log(chalk.gray(` 📌 Ledger: ${marked} gap(s) marked in-progress — they'll stop resurfacing until re-audited.`));
|
|
4831
|
+
} else {
|
|
4832
|
+
console.log(chalk.gray(' 📌 Ledger: draft recorded.'));
|
|
4833
|
+
}
|
|
4834
|
+
console.log('');
|
|
4835
|
+
} catch (e) {
|
|
4836
|
+
// loop write-back is best-effort; a failure must not fail the draft
|
|
4837
|
+
console.log(chalk.dim(` (Ledger write-back skipped: ${e.message})`));
|
|
4838
|
+
}
|
|
4807
4839
|
});
|
|
4808
4840
|
|
|
4809
4841
|
// ── GUIDE (Coach-style chapter map) ──────────────────────────────────────
|
|
@@ -5359,6 +5391,130 @@ if (process.argv.length <= 2) {
|
|
|
5359
5391
|
process.exit(0);
|
|
5360
5392
|
}
|
|
5361
5393
|
|
|
5394
|
+
// ── LOOP — content-loop orchestrator: gap → draft → prescore → queue ─────
|
|
5395
|
+
program
|
|
5396
|
+
.command('loop <project>')
|
|
5397
|
+
.description('Run the content loop once: pick the top gap → draft → prescore → queue for publish')
|
|
5398
|
+
.option('--topic <t>', 'Focus on a specific topic instead of auto-picking')
|
|
5399
|
+
.option('--count <n>', 'Draft the top N gaps', '1')
|
|
5400
|
+
.option('--lang <code>', 'Language: en or fi', 'en')
|
|
5401
|
+
.option('--type <type>', 'Content type: blog, docs, or social', 'blog')
|
|
5402
|
+
.option('--model <name>', 'Generation model (gemini, claude, gpt, deepseek)', 'gemini')
|
|
5403
|
+
.option('--min-score <n>', 'Target citability before publishing', '60')
|
|
5404
|
+
.option('--revise <k>', 'Auto-revise up to k times if below min-score', '0')
|
|
5405
|
+
.option('--no-queue', 'Do not write drafts to reports/ready/')
|
|
5406
|
+
.option('--dry-run', 'Pick the gap and show the plan — no model call')
|
|
5407
|
+
.option('--format <type>', 'Output format: brief or json', 'brief')
|
|
5408
|
+
.action(async (project, opts) => {
|
|
5409
|
+
if (!requirePro('loop')) return;
|
|
5410
|
+
const db = getDb();
|
|
5411
|
+
const config = loadConfig(project);
|
|
5412
|
+
const isJson = opts.format === 'json';
|
|
5413
|
+
const { runContentLoop } = await import('./analyses/loop/orchestrator.js');
|
|
5414
|
+
|
|
5415
|
+
if (!isJson) printAttackHeader('🔁 Content Loop', project);
|
|
5416
|
+
|
|
5417
|
+
const result = await runContentLoop(db, project, {
|
|
5418
|
+
config,
|
|
5419
|
+
topic: opts.topic || null,
|
|
5420
|
+
count: parseInt(opts.count, 10) || 1,
|
|
5421
|
+
lang: opts.lang,
|
|
5422
|
+
contentType: opts.type || 'blog',
|
|
5423
|
+
minScore: parseInt(opts.minScore, 10) || 60,
|
|
5424
|
+
revise: parseInt(opts.revise, 10) || 0,
|
|
5425
|
+
queue: opts.queue !== false,
|
|
5426
|
+
dryRun: !!opts.dryRun,
|
|
5427
|
+
generate: opts.dryRun ? null : (p) => callAnalysisModel(p, opts.model),
|
|
5428
|
+
onProgress: isJson ? () => {} : (m) => console.log(chalk.gray(' · ' + m)),
|
|
5429
|
+
});
|
|
5430
|
+
|
|
5431
|
+
if (isJson) { console.log(JSON.stringify(result, null, 2)); return; }
|
|
5432
|
+
|
|
5433
|
+
console.log('');
|
|
5434
|
+
if (result.mode === 'no-gaps') {
|
|
5435
|
+
console.log(chalk.yellow(' No active gaps to draft.'));
|
|
5436
|
+
console.log(chalk.gray(' ' + result.next_action));
|
|
5437
|
+
console.log('');
|
|
5438
|
+
return;
|
|
5439
|
+
}
|
|
5440
|
+
if (result.mode === 'dry-run') {
|
|
5441
|
+
console.log(chalk.bold(' Planned drafts (highest leverage first):'));
|
|
5442
|
+
for (const p of result.planned) {
|
|
5443
|
+
console.log(` ${chalk.cyan(p.topic.slice(0, 60))} ${chalk.gray(`[${p.source} · ${p.priority} · leverage ${p.leverage}]`)}`);
|
|
5444
|
+
}
|
|
5445
|
+
console.log('');
|
|
5446
|
+
console.log(chalk.gray(' Re-run without --dry-run to draft these.'));
|
|
5447
|
+
console.log('');
|
|
5448
|
+
return;
|
|
5449
|
+
}
|
|
5450
|
+
for (const d of result.drafts) {
|
|
5451
|
+
const tier = d.score >= 60 ? chalk.green : d.score >= 35 ? chalk.yellow : chalk.red;
|
|
5452
|
+
console.log(` ${chalk.bold('→')} ${chalk.cyan(`"${d.topic.slice(0, 56)}"`)} ${tier(`${d.score}/100 (${d.tier})`)}`);
|
|
5453
|
+
console.log(chalk.gray(` ${d.word_count}w · ${d.revisions} revision(s) · gap: ${d.gap.source} · ${d.ledger.gaps_marked_in_progress} gap(s) marked in-progress`));
|
|
5454
|
+
if (d.queued_path) console.log(chalk.gray(` queued: ${d.queued_path}`));
|
|
5455
|
+
}
|
|
5456
|
+
for (const s of result.skipped) console.log(chalk.gray(` ✗ skipped "${s.topic.slice(0, 40)}": ${s.reason}`));
|
|
5457
|
+
console.log('');
|
|
5458
|
+
console.log(chalk.gray(' ' + result.next_action));
|
|
5459
|
+
console.log('');
|
|
5460
|
+
});
|
|
5461
|
+
|
|
5462
|
+
// ── CRAWL-URL — ad-hoc lightweight crawl of any URL (free, no project) ────
|
|
5463
|
+
program
|
|
5464
|
+
.command('crawl-url <url>')
|
|
5465
|
+
.description('Ad-hoc lightweight crawl of any URL — no project, no browser, nothing saved')
|
|
5466
|
+
.option('--max-pages <n>', 'Pages to fetch (default 10, hard cap 50)', '10')
|
|
5467
|
+
.option('--citability', 'Include per-page AI citability (AEO) score')
|
|
5468
|
+
.option('--all-origins', 'Follow links across origins (default: same site only)')
|
|
5469
|
+
.option('--format <type>', 'Output format: brief or json', 'brief')
|
|
5470
|
+
.action(async (url, opts) => {
|
|
5471
|
+
const isJson = opts.format === 'json';
|
|
5472
|
+
const { lightCrawl } = await import('./crawler/light.js');
|
|
5473
|
+
try {
|
|
5474
|
+
const r = await lightCrawl(url, {
|
|
5475
|
+
maxPages: parseInt(opts.maxPages, 10) || 10,
|
|
5476
|
+
includeCitability: !!opts.citability,
|
|
5477
|
+
sameOrigin: !opts.allOrigins,
|
|
5478
|
+
onProgress: isJson ? undefined : (m) => console.log(chalk.gray(' ' + m)),
|
|
5479
|
+
});
|
|
5480
|
+
|
|
5481
|
+
if (isJson) {
|
|
5482
|
+
const pages = r.pages.map(p => ({
|
|
5483
|
+
url: p.url, status_code: p.status_code, title: p.title, meta_desc: p.meta_desc,
|
|
5484
|
+
canonical: p.canonical || null, is_indexable: p.is_indexable, word_count: p.word_count,
|
|
5485
|
+
headings: p.headings, schema_types: p.schema_types,
|
|
5486
|
+
published_date: p.published_date, modified_date: p.modified_date,
|
|
5487
|
+
internal_links: p.links.filter(l => l.internal).length,
|
|
5488
|
+
external_links: p.links.filter(l => !l.internal).length,
|
|
5489
|
+
...(p.citability ? { citability: p.citability } : {}),
|
|
5490
|
+
}));
|
|
5491
|
+
console.log(JSON.stringify({ start: r.start, origin: r.origin, stats: r.stats, pages, skipped: r.skipped }, null, 2));
|
|
5492
|
+
return;
|
|
5493
|
+
}
|
|
5494
|
+
|
|
5495
|
+
console.log('');
|
|
5496
|
+
console.log(chalk.bold(` 🕸 Light crawl — ${r.origin}`));
|
|
5497
|
+
console.log(chalk.gray(` ${r.stats.crawled} pages · ${r.stats.with_schema} with schema · ${r.stats.missing_title} missing title · ${r.stats.missing_meta_desc} missing meta · ${r.stats.elapsed_ms}ms`));
|
|
5498
|
+
console.log('');
|
|
5499
|
+
for (const p of r.pages) {
|
|
5500
|
+
const c = p.citability;
|
|
5501
|
+
const citeTxt = c ? (c.score >= 60 ? chalk.green : c.score >= 35 ? chalk.yellow : chalk.red)(` · cite:${c.score}`) : '';
|
|
5502
|
+
console.log(` ${chalk.cyan(p.url)}`);
|
|
5503
|
+
console.log(chalk.gray(` [${p.status_code}] ${p.word_count}w · h:${p.headings.length} · schema:[${p.schema_types.join(',') || '—'}]`) + citeTxt);
|
|
5504
|
+
if (p.title) console.log(chalk.gray(` “${p.title.slice(0, 80)}”`));
|
|
5505
|
+
}
|
|
5506
|
+
if (r.skipped.length) console.log(chalk.gray(`\n skipped ${r.skipped.length}: ${r.skipped.slice(0, 5).map(s => s.reason).join(', ')}`));
|
|
5507
|
+
console.log('');
|
|
5508
|
+
console.log(chalk.gray(' Ephemeral + local — nothing saved. JS-rendered pages under-report (use `seo-intel crawl`).'));
|
|
5509
|
+
console.log(chalk.gray(' For persistent history, the Ledger & competitors: `seo-intel setup` → `crawl`.'));
|
|
5510
|
+
console.log('');
|
|
5511
|
+
} catch (err) {
|
|
5512
|
+
if (isJson) { console.log(JSON.stringify({ error: err.message })); process.exit(1); }
|
|
5513
|
+
console.error(chalk.red(`\n ✗ ${err.message}\n`));
|
|
5514
|
+
process.exit(1);
|
|
5515
|
+
}
|
|
5516
|
+
});
|
|
5517
|
+
|
|
5362
5518
|
// Global error handler — ensures uncaught errors in async actions exit non-zero (BUG-004)
|
|
5363
5519
|
program.parseAsync().catch(err => {
|
|
5364
5520
|
console.error(chalk.red(`\n✗ ${err.message}\n`));
|
|
@@ -0,0 +1,127 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Lightweight HTML extractor — pure string/regex parsing. No DOM, no browser.
|
|
3
|
+
*
|
|
4
|
+
* Powers the fetch-based light crawler (crawler/light.js) so ANY Claude user can
|
|
5
|
+
* crawl + analyze a site with zero browser environment installed. Consistent
|
|
6
|
+
* with schema-parser.js's regex approach ("no DOM parser needed").
|
|
7
|
+
*
|
|
8
|
+
* Trade-off: not as bulletproof as a full DOM parse on adversarial markup, but
|
|
9
|
+
* more than good enough for SEO/AEO metadata (title, meta, headings, links,
|
|
10
|
+
* JSON-LD, dates). The full Playwright crawler stays the heavyweight option.
|
|
11
|
+
*/
|
|
12
|
+
|
|
13
|
+
import { stripHtml } from './sanitize.js';
|
|
14
|
+
import { parseJsonLd } from './schema-parser.js';
|
|
15
|
+
|
|
16
|
+
function decodeEntities(s) {
|
|
17
|
+
return (s || '')
|
|
18
|
+
.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>')
|
|
19
|
+
.replace(/&quot;/g, '"').replace(/&#0?39;/g, "'").replace(/&apos;/g, "'")
|
|
20
|
+
.replace(/&nbsp;/g, ' ')
|
|
21
|
+
.replace(/&#(\d+);/g, (_, n) => { try { return String.fromCodePoint(+n); } catch { return ' '; } })
|
|
22
|
+
.replace(/&#x([0-9a-f]+);/gi, (_, h) => { try { return String.fromCodePoint(parseInt(h, 16)); } catch { return ' '; } })
|
|
23
|
+
.trim();
|
|
24
|
+
}
|
|
25
|
+
|
|
26
|
+
const collapse = (s) => decodeEntities(stripHtml(s || '').replace(/\s+/g, ' '));
|
|
27
|
+
|
|
28
|
+
export function extractTitle(html) {
|
|
29
|
+
const m = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
|
|
30
|
+
return m ? decodeEntities(m[1].replace(/\s+/g, ' ')) : '';
|
|
31
|
+
}
|
|
32
|
+
|
|
33
|
+
// Find a <meta> tag by attribute (name|property) = value, then read its content.
|
|
34
|
+
function metaContent(html, attr, value) {
|
|
35
|
+
const re = new RegExp(`<meta\\b[^>]*\\b${attr}\\s*=\\s*["']${value}["'][^>]*>`, 'i');
|
|
36
|
+
const tag = html.match(re);
|
|
37
|
+
if (!tag) return '';
|
|
38
|
+
const c = tag[0].match(/\bcontent\s*=\s*["']([\s\S]*?)["']/i);
|
|
39
|
+
return c ? decodeEntities(c[1]) : '';
|
|
40
|
+
}
|
|
41
|
+
|
|
42
|
+
export function extractMetaDescription(html) {
|
|
43
|
+
return metaContent(html, 'name', 'description') || metaContent(html, 'property', 'og:description');
|
|
44
|
+
}
|
|
45
|
+
|
|
46
|
+
export function extractMetaRobots(html) {
|
|
47
|
+
return metaContent(html, 'name', 'robots').toLowerCase();
|
|
48
|
+
}
|
|
49
|
+
|
|
50
|
+
export function extractCanonical(html, baseUrl) {
|
|
51
|
+
const tag = html.match(/<link\b[^>]*\brel\s*=\s*["']canonical["'][^>]*>/i);
|
|
52
|
+
if (!tag) return '';
|
|
53
|
+
const h = tag[0].match(/\bhref\s*=\s*["']([^"']+)["']/i);
|
|
54
|
+
if (!h) return '';
|
|
55
|
+
try { return new URL(h[1], baseUrl).toString(); } catch { return h[1]; }
|
|
56
|
+
}
|
|
57
|
+
|
|
58
|
+
export function extractHeadings(html) {
|
|
59
|
+
const out = [];
|
|
60
|
+
const re = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi;
|
|
61
|
+
let m;
|
|
62
|
+
while ((m = re.exec(html)) !== null) {
|
|
63
|
+
const text = collapse(m[2]);
|
|
64
|
+
if (text) out.push({ level: Number(m[1]), text: text.slice(0, 300) });
|
|
65
|
+
if (out.length >= 300) break;
|
|
66
|
+
}
|
|
67
|
+
return out;
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
export function extractLinks(html, baseUrl) {
|
|
71
|
+
const out = [];
|
|
72
|
+
const seen = new Set();
|
|
73
|
+
let base; try { base = new URL(baseUrl); } catch { base = null; }
|
|
74
|
+
const re = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
|
|
75
|
+
let m;
|
|
76
|
+
while ((m = re.exec(html)) !== null) {
|
|
77
|
+
let href = m[1].trim();
|
|
78
|
+
if (!href) continue;
|
|
79
|
+
if (/^(#|mailto:|tel:|javascript:|data:)/i.test(href)) continue;
|
|
80
|
+
let abs;
|
|
81
|
+
try { abs = base ? new URL(href, base).toString() : href; } catch { continue; }
|
|
82
|
+
abs = abs.split('#')[0];
|
|
83
|
+
if (seen.has(abs)) continue;
|
|
84
|
+
seen.add(abs);
|
|
85
|
+
let internal = false;
|
|
86
|
+
try { internal = !!base && new URL(abs).hostname === base.hostname; } catch { /* keep false */ }
|
|
87
|
+
out.push({ href: abs, text: collapse(m[2]).slice(0, 120), internal });
|
|
88
|
+
if (out.length >= 1000) break;
|
|
89
|
+
}
|
|
90
|
+
return out;
|
|
91
|
+
}
|
|
92
|
+
|
|
93
|
+
/**
|
|
94
|
+
* Parse one fetched HTML document into the structured shape the rest of
|
|
95
|
+
* SEO Intel speaks (mirrors the Playwright crawler's per-page object).
|
|
96
|
+
* @param {string} html
|
|
97
|
+
* @param {string} url - the (final) URL this HTML was fetched from
|
|
98
|
+
*/
|
|
99
|
+
export function extractPageData(html, url) {
|
|
100
|
+
const schemas = parseJsonLd(html) || [];
|
|
101
|
+
const schemaTypes = [...new Set(schemas.map(s => s.type).filter(Boolean))];
|
|
102
|
+
let published = null, modified = null;
|
|
103
|
+
for (const s of schemas) {
|
|
104
|
+
if (!published && s.datePublished) published = s.datePublished;
|
|
105
|
+
if (!modified && s.dateModified) modified = s.dateModified;
|
|
106
|
+
}
|
|
107
|
+
const bodyText = stripHtml(html);
|
|
108
|
+
const wordCount = bodyText ? bodyText.split(/\s+/).filter(Boolean).length : 0;
|
|
109
|
+
const robots = extractMetaRobots(html);
|
|
110
|
+
|
|
111
|
+
return {
|
|
112
|
+
url,
|
|
113
|
+
title: extractTitle(html),
|
|
114
|
+
meta_desc: extractMetaDescription(html),
|
|
115
|
+
canonical: extractCanonical(html, url),
|
|
116
|
+
robots,
|
|
117
|
+
is_indexable: !/\bnoindex\b/.test(robots),
|
|
118
|
+
headings: extractHeadings(html),
|
|
119
|
+
links: extractLinks(html, url),
|
|
120
|
+
schema_types: schemaTypes,
|
|
121
|
+
schemas,
|
|
122
|
+
word_count: wordCount,
|
|
123
|
+
body_text: bodyText.slice(0, 20000),
|
|
124
|
+
published_date: published,
|
|
125
|
+
modified_date: modified,
|
|
126
|
+
};
|
|
127
|
+
}
|
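The regex-only approach in `crawler/html-extract.js` can be tried in isolation. The following is a minimal standalone sketch, not the package's code: `stripTags`, `squash`, `headingsOf`, and `isIndexable` are hypothetical names, and `stripTags` is only a rough stand-in for the real `stripHtml` helper in `sanitize.js`, whose behavior is not shown in this diff. It reuses the two patterns visible above: the `\1` backreference that pairs each `<hN>` with its matching close tag, and the two-step meta lookup (find the tag, then read `content`).

```javascript
// Standalone sketch of regex-only metadata extraction (no DOM parser).
// stripTags is a hypothetical stand-in for the package's stripHtml helper.
const stripTags = (s) => s.replace(/<[^>]*>/g, ' ');
const squash = (s) => stripTags(s).replace(/\s+/g, ' ').trim();

function headingsOf(html) {
  const out = [];
  // \1 forces the closing tag to carry the same level as the opening tag.
  const re = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    const text = squash(m[2]);
    if (text) out.push({ level: Number(m[1]), text });
  }
  return out;
}

function isIndexable(html) {
  // Step 1: locate the robots <meta> tag; step 2: read its content attribute.
  const tag = html.match(/<meta\b[^>]*\bname\s*=\s*["']robots["'][^>]*>/i);
  if (!tag) return true; // no robots meta: treated as indexable
  const c = tag[0].match(/\bcontent\s*=\s*["']([\s\S]*?)["']/i);
  return !/\bnoindex\b/.test((c ? c[1] : '').toLowerCase());
}

const sample = `
<html><head>
  <title>Demo</title>
  <meta name="robots" content="NOINDEX, follow">
</head><body>
  <h1>Main <em>title</em></h1>
  <h2 class="sub">Section   one</h2>
</body></html>`;

const headings = headingsOf(sample);
const indexable = isIndexable(sample);
console.log(headings, indexable);
```

As in the original, the robots value is lowercased before the `noindex` test, so `NOINDEX` is still detected. The sketch shares the trade-off stated in the file header: markup such as a `>` inside an attribute value can defeat the patterns.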