@opendirectory.dev/skills 0.1.71 → 0.1.73

@@ -0,0 +1,125 @@
1
+ # Guardrails — domain-expired-opportunity-finder
2
+
3
+ This document defines the ethical, policy, and anti-abuse constraints
4
+ that govern how this skill frames its output and recommendations.
5
+
6
+ ---
7
+
8
+ ## Core Principle
9
+
10
+ This skill is a **research and triage tool**, not an acquisition engine.
11
+ It surfaces candidates for human review. It does not make buying decisions,
12
+ guarantee SEO outcomes, or encourage manipulative domain practices.
13
+
14
+ ---
15
+
16
+ ## Anti-Abuse Policy
17
+
18
+ ### What This Skill Must NOT Encourage
19
+
20
+ 1. **Private Blog Network (PBN) construction.** The skill must never suggest
21
+ acquiring expired domains to build link networks. If the user's stated
22
+ intent suggests PBN use, the skill should note the risk and decline to
23
+ optimize for that use case.
24
+
25
+ 2. **Deceptive redirects.** The skill must never recommend redirecting an
26
+ expired domain to an unrelated site without clearly flagging this as
27
+ high-risk and likely ineffective. The `redirect_mismatch` flag exists
28
+ specifically to enforce this constraint.
29
+
30
+ 3. **Unrelated domain repurposing.** Acquiring a domain with no topical
31
+ connection to the user's niche purely for its metrics is a pattern
32
+ the skill must actively discourage through scoring (topical relevance
33
+ is the highest-weighted dimension at 30%).
34
+
35
+ 4. **Spam domain recycling.** The skill must flag domains with spammy
36
+ anchor profiles, suspicious registrar history, or deindex signals
37
+ rather than presenting them as opportunities.
38
+
39
+ ---
40
+
41
+ ## Redirect Analysis Constraints
42
+
43
+ Redirect suitability is only rated `high` or `medium` when:
44
+
45
+ - The historical topic of the expired domain **closely matches** the
46
+ target niche (topic continuity score ≥ 5/10)
47
+ - The content type is compatible (e.g., blog to blog, tool to tool)
48
+ - The anchor text profile is primarily branded or generic (not spammy)
49
+
50
+ If ANY of these conditions fail, redirect suitability is rated `low` or
51
+ `not-recommended`, and the `redirect_mismatch` flag is applied.
52
+
53
+ The skill must explicitly state in output when a domain is better suited
54
+ for rebuild than redirect, and explain why.
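The three conditions above can be read as a simple gate. The following is a minimal sketch, assuming a hypothetical `RedirectEvidence` record whose field names are illustrative and not part of the skill:

```python
from dataclasses import dataclass


# Hypothetical input record; the field names are assumptions for illustration.
@dataclass
class RedirectEvidence:
    topic_continuity: int          # 0-10 topic continuity score
    content_type_compatible: bool  # e.g. blog to blog, tool to tool
    anchors_clean: bool            # anchor text mostly branded or generic


def may_rate_high_or_medium(ev: RedirectEvidence) -> bool:
    """Return True only when all three listed conditions hold."""
    return (
        ev.topic_continuity >= 5
        and ev.content_type_compatible
        and ev.anchors_clean
    )
```

If the function returns `False`, the rating would fall to `low` or `not-recommended` as described above.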
55
+
56
+ ---
57
+
58
+ ## Framing Rules
59
+
60
+ ### Language to USE in output:
61
+ - "This domain appears worth reviewing for..."
62
+ - "Based on available evidence, this candidate shows..."
63
+ - "Manual due diligence is recommended before any acquisition decision."
64
+ - "The historical topic appears adjacent to your niche, but further
65
+ verification is needed."
66
+ - "This assessment is based on publicly available signals and may not
67
+ reflect the domain's current state in search engines."
68
+
69
+ ### Language to AVOID in output:
70
+ - ~~"Guaranteed rankings"~~
71
+ - ~~"Link equity transfer"~~
72
+ - ~~"This domain will boost your SEO"~~
73
+ - ~~"Safe to redirect"~~ (redirect advice must always be qualified)
74
+ - ~~"High authority domain"~~ (authority metrics are not our primary signal)
75
+ - ~~"Easy win"~~
76
+ - ~~"No-brainer acquisition"~~
77
+ - ~~"Free backlinks"~~
78
+ - ~~"Instant traffic"~~
79
+ - ~~"SEO hack"~~
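One way to enforce the avoid-list is a plain substring scan over generated text before it is returned. This is a sketch only: the function name `find_avoided_phrases` is hypothetical, and the matching is deliberately naive (case-insensitive, whitespace-normalized).

```python
import re

# Phrases copied from the "Language to AVOID" list above.
AVOID_PHRASES = [
    "guaranteed rankings",
    "link equity transfer",
    "this domain will boost your seo",
    "safe to redirect",
    "high authority domain",
    "easy win",
    "no-brainer acquisition",
    "free backlinks",
    "instant traffic",
    "seo hack",
]


def find_avoided_phrases(text: str) -> list[str]:
    """Return every avoided phrase found in text, in list order."""
    normalized = re.sub(r"\s+", " ", text.lower())
    return [phrase for phrase in AVOID_PHRASES if phrase in normalized]
```

A non-empty result would signal that the summary needs rewording before it reaches the user.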
80
+
81
+ ---
82
+
83
+ ## Required Disclaimer
84
+
85
+ Every output — in both shortlist and audit mode — must include this
86
+ disclaimer at the end:
87
+
88
+ > **Disclaimer:** These results are research recommendations, not guarantees
89
+ > of SEO value. Redirect analysis should only be considered when strong
90
+ > topic continuity exists between the expired domain and your target site.
91
+ > Search engine algorithms change frequently. Always perform manual due
92
+ > diligence — including checking current index status, reviewing the full
93
+ > backlink profile with a commercial tool, and verifying domain history —
94
+ > before making any acquisition decision. This skill does not endorse or
95
+ > facilitate manipulative SEO practices.
96
+
97
+ ---
98
+
99
+ ## User Intent Handling
100
+
101
+ If the user describes an intended use that conflicts with these guardrails:
102
+
103
+ | User Intent | Skill Response |
104
+ |---|---|
105
+ | "I want to build a PBN" | Note the SEO risk, explain why PBN patterns are increasingly detected, and proceed with scoring but add a warning to every output. Do not optimize for PBN use. |
106
+ | "I want to redirect unrelated domains" | Apply `redirect_mismatch` flag to all candidates where topic continuity is weak. Explain why unrelated redirects are unreliable. Still provide rebuild recommendations if applicable. |
107
+ | "I want cheap domains for link building" | Clarify that expired domain acquisition for link building carries inherent risk. Score normally but ensure `why_risky` explanations are thorough. |
108
+ | "I want to find domains in my niche to rebuild" | This is the ideal use case. Proceed normally with full scoring. |
109
+
110
+ ---
111
+
112
+ ## Transparency Requirements
113
+
114
+ 1. Every shortlisted domain must include BOTH `why_selected` AND `why_risky`.
115
+ Even the best candidates have some level of risk or uncertainty.
116
+
117
+ 2. The `scoring_mode` field must indicate whether LLM-enhanced or rule-based-only
118
+ scoring was used, so the user understands the depth of analysis.
119
+
120
+ 3. If data sources were unavailable or returned no results for a candidate,
121
+ this must be reflected in the `confidence` level and `signal_completeness`
122
+ dimension.
123
+
124
+ 4. The skill must never present a score without giving the user a way to
125
+ understand how it was computed (by referencing `references/scoring-model.md`).
@@ -0,0 +1,200 @@
1
+ # Output Format — domain-expired-opportunity-finder
2
+
3
+ This document defines the exact output schema for both Shortlist and Audit modes.
4
+ All output must follow this structure precisely.
5
+
6
+ ---
7
+
8
+ ## Output Modes
9
+
10
+ ### Shortlist Mode (default)
11
+ Returns only candidates with `recommended_action` of `high-priority-review`,
12
+ `review`, or `rebuild-only-review`. Rejected candidates are excluded.
13
+ Optimized for operator review — compact, decision-ready.
14
+
15
+ ### Audit Mode
16
+ Returns ALL candidates, including rejected ones. Each candidate includes
17
+ full dimension breakdowns. Use this mode to understand why candidates were
18
+ rejected or down-ranked.
19
+
20
+ To request audit mode, the user says: "run in audit mode" or "show all candidates including rejects."
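The difference between the two modes amounts to a filter on `recommended_action`. A minimal sketch, assuming candidates are plain dictionaries and using a hypothetical helper name `select_candidates`:

```python
# Actions that qualify a candidate for Shortlist Mode, per the text above.
SHORTLIST_ACTIONS = {"high-priority-review", "review", "rebuild-only-review"}


def select_candidates(candidates: list[dict], audit_mode: bool = False) -> list[dict]:
    """Audit mode keeps everything; shortlist mode drops rejected candidates."""
    if audit_mode:
        return list(candidates)
    return [c for c in candidates if c["recommended_action"] in SHORTLIST_ACTIONS]
```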
21
+
22
+ ---
23
+
24
+ ## JSON Schema — Shortlist Mode
25
+
26
+ ```json
27
+ {
28
+ "skill": "domain-expired-opportunity-finder",
29
+ "run_date": "YYYY-MM-DD",
30
+ "target_niche": "string — the niche used for scoring",
31
+ "seed_keywords": ["array of seed keywords used, if any"],
32
+ "intended_use": "rebuild | redirect | either",
33
+ "candidates_evaluated": 0,
34
+ "candidates_shortlisted": 0,
35
+ "candidates_rejected": 0,
36
+ "scoring_mode": "llm-enhanced | rule-based-only",
37
+ "shortlist": [
38
+ {
39
+ "domain": "example.com",
40
+ "opportunity_score": 81,
41
+ "confidence": "high | medium | low",
42
+ "recommended_action": "high-priority-review | review | rebuild-only-review",
43
+ "redirect_suitability": "high | medium | low | not-recommended",
44
+ "topical_fit_summary": "One or two sentences on niche alignment evidence.",
45
+ "activity_level_summary": "One or two sentences on historical snapshot density.",
46
+ "content_quality_summary": "One or two sentences on historical title/meta quality.",
47
+ "history_summary": "One or two sentences on prior use and cleanliness.",
48
+ "risk_flags": ["topic_mismatch", "spam_content_risk"],
49
+ "why_selected": "Human-readable rationale for why this domain made the shortlist.",
50
+ "why_risky": "Human-readable rationale for caution, even for selected domains."
51
+ }
52
+ ],
53
+ "guardrails_disclaimer": "These results are research recommendations, not guarantees of SEO value. Redirect analysis should only be considered when strong topic continuity exists between the expired domain and your target site. Search engine algorithms change frequently. Always perform manual due diligence — including checking current index status, reviewing the full backlink profile with a commercial tool, and verifying domain history — before making any acquisition decision. This skill does not endorse or facilitate manipulative SEO practices."
54
+ }
55
+ ```
56
+
57
+ ---
58
+
59
+ ## JSON Schema — Audit Mode
60
+
61
+ Includes all fields from Shortlist Mode, plus:
62
+
63
+ ```json
64
+ {
65
+ "all_candidates": [
66
+ {
67
+ "domain": "example.com",
68
+ "opportunity_score": 34,
69
+ "confidence": "low",
70
+ "recommended_action": "reject",
71
+ "redirect_suitability": "not-recommended",
72
+ "dimension_breakdown": {
73
+ "topical_relevance": { "score": 5, "max": 30, "notes": "No keyword match, historical content unrelated" },
74
+ "historical_activity_level": { "score": 12, "max": 25, "notes": "Consistent snapshots but large gap in 2021" },
75
+ "historical_content_quality": { "score": 3, "max": 15, "notes": "Heavy exact-match money keywords in titles" },
76
+ "history_cleanliness": { "score": 8, "max": 15, "notes": "3 years of snapshots, some gaps" },
77
+ "redirect_suitability": { "score": 2, "max": 10, "notes": "Historical topic does not match target niche" },
78
+ "signal_completeness": { "score": 4, "max": 5, "notes": "RDAP lookup failed" }
79
+ },
80
+ "risk_flags": ["topic_mismatch", "spam_content_risk"],
81
+ "rejection_reason": "Topic mismatch (High severity) combined with spammy historical content.",
82
+ "topical_fit_summary": "...",
83
+ "activity_level_summary": "...",
84
+ "content_quality_summary": "...",
85
+ "history_summary": "...",
86
+ "why_selected": null,
87
+ "why_risky": "Domain has no topical overlap with target niche and titles show signs of keyword stuffing."
88
+ }
89
+ ]
90
+ }
91
+ ```
92
+
93
+ ---
94
+
95
+ ## Field Reference
96
+
97
+ | Field | Type | Required | Description |
98
+ |---|---|---|---|
99
+ | `domain` | string | Yes | Candidate domain name (no protocol, no trailing slash) |
100
+ | `opportunity_score` | integer | Yes | Final weighted score, 0–100 |
101
+ | `confidence` | enum | Yes | `high`, `medium`, or `low` — reflects evidence completeness |
102
+ | `recommended_action` | enum | Yes | `high-priority-review`, `review`, `rebuild-only-review`, or `reject` |
103
+ | `redirect_suitability` | enum | Yes | `high`, `medium`, `low`, or `not-recommended` |
104
+ | `topical_fit_summary` | string | Yes | 1–2 sentence explanation of niche alignment |
105
+ | `activity_level_summary` | string | Yes | 1–2 sentence explanation of snapshot/activity density |
106
+ | `content_quality_summary` | string | Yes | 1–2 sentence explanation of historical title quality |
107
+ | `history_summary` | string | Yes | 1–2 sentence explanation of prior use and cleanliness |
108
+ | `risk_flags` | array | Yes | Array of flag strings (empty array if none) |
109
+ | `why_selected` | string | Yes* | Rationale for inclusion. *Null for rejected candidates in audit mode. |
110
+ | `why_risky` | string | Yes | Rationale for caution, present even for selected domains |
111
+ | `dimension_breakdown` | object | Audit only | Per-dimension score, max, and notes |
112
+ | `rejection_reason` | string | Audit only | Why the candidate was rejected |
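The field table can be checked mechanically. The sketch below is one possible validator, not the skill's own implementation; `validate_candidate` is a hypothetical name, and it only covers the fields that are required in both modes.

```python
ENUMS = {
    "confidence": {"high", "medium", "low"},
    "recommended_action": {
        "high-priority-review", "review", "rebuild-only-review", "reject",
    },
    "redirect_suitability": {"high", "medium", "low", "not-recommended"},
}
REQUIRED_STRINGS = [
    "domain", "topical_fit_summary", "activity_level_summary",
    "content_quality_summary", "history_summary", "why_risky",
]


def validate_candidate(c: dict) -> list[str]:
    """Return a list of problems; an empty list means the candidate passes."""
    errors = []
    for field in REQUIRED_STRINGS:
        if not isinstance(c.get(field), str) or not c[field]:
            errors.append(f"{field}: required non-empty string")
    score = c.get("opportunity_score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
        errors.append("opportunity_score: integer 0-100 required")
    for field, allowed in ENUMS.items():
        if c.get(field) not in allowed:
            errors.append(f"{field}: must be one of {sorted(allowed)}")
    if not isinstance(c.get("risk_flags"), list):
        errors.append("risk_flags: array required (may be empty)")
    # why_selected may be null only for rejected candidates.
    rejected = c.get("recommended_action") == "reject"
    if not rejected and not isinstance(c.get("why_selected"), str):
        errors.append("why_selected: required for non-rejected candidates")
    return errors
```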
113
+
114
+ ---
115
+
116
+ ## Redirect Suitability Mapping
117
+
118
+ | Score (out of 10) | Label |
119
+ |---|---|
120
+ | 8–10 | `high` |
121
+ | 5–7 | `medium` |
122
+ | 2–4 | `low` |
123
+ | 0–1 | `not-recommended` |
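The mapping table translates directly into a small lookup function. A sketch, with `redirect_label` as a hypothetical name:

```python
def redirect_label(score: int) -> str:
    """Map a 0-10 redirect suitability score to its label."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 8:
        return "high"
    if score >= 5:
        return "medium"
    if score >= 2:
        return "low"
    return "not-recommended"
```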
124
+
125
+ ---
126
+
127
+ ## Example Complete Output — Shortlist Mode
128
+
129
+ ```json
130
+ {
131
+ "skill": "domain-expired-opportunity-finder",
132
+ "run_date": "2026-05-10",
133
+ "target_niche": "developer tools",
134
+ "seed_keywords": ["devops", "CI/CD", "code editor", "IDE"],
135
+ "intended_use": "either",
136
+ "candidates_evaluated": 8,
137
+ "candidates_shortlisted": 3,
138
+ "candidates_rejected": 5,
139
+ "scoring_mode": "llm-enhanced",
140
+ "shortlist": [
141
+ {
142
+ "domain": "devtoolsweekly.com",
143
+ "opportunity_score": 86,
144
+ "confidence": "high",
145
+ "recommended_action": "high-priority-review",
146
+ "redirect_suitability": "high",
147
+ "topical_fit_summary": "Domain name contains 'devtools'. Wayback snapshots confirm it was a weekly newsletter covering developer tools and IDE plugins from 2019 to 2024.",
148
+ "activity_level_summary": "High snapshot frequency across 6 consecutive years, indicating sustained active use.",
149
+ "content_quality_summary": "Historical page titles are predominantly branded ('DevTools Weekly') and natural. No keyword stuffing detected.",
150
+ "history_summary": "6 years of consistent Wayback snapshots (2019–2024). All snapshots show real content. No parking pages or sudden drop-offs detected.",
151
+ "risk_flags": [],
152
+ "why_selected": "Strong topical match to developer tools niche with a healthy activity history. Suitable for both rebuild and redirect analysis.",
153
+ "why_risky": "No significant risk signals detected. Standard due diligence recommended before acquisition."
154
+ },
155
+ {
156
+ "domain": "codeshipnews.io",
157
+ "opportunity_score": 63,
158
+ "confidence": "medium",
159
+ "recommended_action": "review",
160
+ "redirect_suitability": "medium",
161
+ "topical_fit_summary": "Domain suggests CI/CD or shipping code. Wayback shows a blog about deployment automation, adjacent to developer tools.",
162
+ "activity_level_summary": "Moderate activity. Captures exist but are clustered rather than continuous.",
163
+ "content_quality_summary": "Titles are somewhat generic but generally readable. Not alarming but worth noting.",
164
+ "history_summary": "3 years of snapshots (2020–2023). Clean content, no parking pages. Relatively short history.",
165
+ "risk_flags": ["short_history"],
166
+ "why_selected": "Adjacent to developer tools with a reasonable history. Worth manual review for rebuild potential.",
167
+ "why_risky": "Relatively short active history (3 years). Some boilerplate titles present."
168
+ },
169
+ {
170
+ "domain": "stackforgeapp.com",
171
+ "opportunity_score": 58,
172
+ "confidence": "medium",
173
+ "recommended_action": "rebuild-only-review",
174
+ "redirect_suitability": "low",
175
+ "topical_fit_summary": "Domain suggests a developer platform. Wayback shows a SaaS landing page for a project management tool — partially related to dev tools but not a direct match.",
176
+ "activity_level_summary": "Consistent snapshots over a 4-year period.",
177
+ "content_quality_summary": "Mostly branded ('StackForge') titles. Clean pattern.",
178
+ "history_summary": "4 years of snapshots. Clean, real product pages throughout.",
179
+ "risk_flags": ["redirect_mismatch"],
180
+ "why_selected": "Clean history makes it worth reviewing as a rebuild candidate in the developer tools space.",
181
+ "why_risky": "Historical topic (project management SaaS) does not closely match developer tools. Redirect not recommended — topic continuity is weak."
182
+ }
183
+ ],
184
+ "guardrails_disclaimer": "These results are research recommendations, not guarantees of SEO value. Redirect analysis should only be considered when strong topic continuity exists between the expired domain and your target site. Search engine algorithms change frequently. Always perform manual due diligence — including checking current index status, reviewing the full backlink profile with a commercial tool, and verifying domain history — before making any acquisition decision. This skill does not endorse or facilitate manipulative SEO practices."
185
+ }
186
+ ```
187
+
188
+ ---
189
+
190
+ ## Save Location
191
+
192
+ Output is saved to:
193
+ ```
194
+ docs/expired-domain-intel/YYYY-MM-DD.json
195
+ ```
196
+
197
+ Create the directory if it does not exist:
198
+ ```bash
199
+ mkdir -p docs/expired-domain-intel
200
+ ```
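For callers that write the file from Python rather than the shell, the same steps might look like the sketch below. `save_output` is a hypothetical helper; falling back to today's date when `run_date` is missing is an assumption, not something the spec states.

```python
import json
from datetime import date
from pathlib import Path


def save_output(result: dict, base_dir: str = "docs/expired-domain-intel") -> Path:
    """Write result to <base_dir>/<run_date>.json, creating the directory if needed."""
    out_dir = Path(base_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    # Assumption: use today's date if the result carries no run_date.
    run_date = result.get("run_date") or date.today().isoformat()
    path = out_dir / f"{run_date}.json"
    path.write_text(json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```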
@@ -0,0 +1,135 @@
1
+ # Risk Flags — domain-expired-opportunity-finder
2
+
3
+ This document defines the risk flags that the skill can attach to any
4
+ candidate domain. Risk flags serve two purposes:
5
+
6
+ 1. They explain WHY a domain may be risky despite looking attractive.
7
+ 2. They influence the recommendation logic (high-severity flags cap recommendations).
8
+
9
+ ---
10
+
11
+ ## Flag Definitions
12
+
13
+ ### `topic_mismatch`
14
+ - **Severity:** High
15
+ - **Trigger:** Overlap between historical topic and target niche scores < 30% (based on
16
+ Wayback content analysis and anchor text review)
17
+ - **Effect:** Caps recommendation at `reject` for redirect use cases.
18
+ For rebuild use cases, caps at `review` (never `high-priority-review`).
19
+ - **Why It Matters:** Acquiring a domain with no topical connection to your
20
+ niche is the most common and most damaging expired domain mistake. Search
21
+ engines increasingly detect and devalue unrelated redirects.
22
+
23
+ ---
24
+
25
+ ### `spam_content_risk`
26
+ - **Severity:** High
27
+ - **Trigger:** Historical page titles or meta descriptions consist predominantly of
28
+ exact-match money keywords, nonsensical text, or foreign-language spam unrelated to the
29
+ domain's apparent topic
30
+ - **Effect:** Caps recommendation at `review` (never `high-priority-review`).
31
+ Adds explicit warning in `why_risky` field.
32
+ - **Why It Matters:** Spammy historical content suggests the domain was used
33
+ for link schemes or low-quality affiliate sites. The historical value is unreliable.
34
+
35
+ ---
36
+
37
+ ### `weak_historical_activity`
38
+ - **Severity:** Medium
39
+ - **Trigger:** Wayback snapshot count is extremely low (e.g., < 10 total snapshots
40
+ across its lifetime) but is not low enough to trigger `unclear_history`.
41
+ - **Effect:** Lowers `historical_activity_level` score. Does not automatically
42
+ reject but reduces overall opportunity_score.
43
+ - **Why It Matters:** Expired domains with very low historical activity have had
44
+ less time to build natural authority. While not inherently spammy, they offer
45
+ less evidence of sustained value.
46
+
47
+ ---
48
+
49
+ ### `unclear_history`
50
+ - **Severity:** Medium
51
+ - **Trigger:** Fewer than 3 Wayback CDX snapshots exist for the domain,
52
+ OR all available snapshots resolve to parking pages (Sedo, GoDaddy parked,
53
+ domain-for-sale pages)
54
+ - **Effect:** Reduces `history_cleanliness` score. Lowers confidence level
55
+ by one step (e.g., `high` → `medium`).
56
+ - **Why It Matters:** Without verifiable history, there is no way to confirm
57
+ what the domain was previously used for. It could have been legitimate,
58
+ or it could have been a spam site that was cleaned. Ambiguity should
59
+ reduce confidence, not be ignored.
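The two snapshot-count triggers above (`weak_historical_activity` and `unclear_history`) do not overlap, which a short sketch makes explicit. `history_flag` is a hypothetical helper, and the `< 10` cutoff is the illustrative value from the text ("e.g."), not a confirmed constant.

```python
from typing import Optional


def history_flag(snapshot_count: int, all_parking: bool = False) -> Optional[str]:
    """Pick the history-related flag implied by the Wayback snapshot count."""
    # Fewer than 3 snapshots, or nothing but parking pages: history is unclear.
    if snapshot_count < 3 or all_parking:
        return "unclear_history"
    # Assumed example cutoff: sparse but still verifiable history.
    if snapshot_count < 10:
        return "weak_historical_activity"
    return None
```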
60
+
61
+ ---
62
+
63
+ ### `possible_deindex`
64
+ - **Severity:** High
65
+ - **Trigger:** Wayback CDX data shows years of regular snapshots followed by a
66
+ sudden complete stop (no snapshots for 2+ years before expiry), suggesting the
67
+ domain may have been deindexed or penalized before it expired
68
+ - **Effect:** Caps recommendation at `review`. Adds explicit warning about
69
+ potential search engine penalties in `why_risky` field.
70
+ - **Why It Matters:** A domain that was deindexed may carry legacy penalties
71
+ that persist after acquisition. This is one of the highest-risk
72
+ scenarios in expired domain acquisition.
73
+
74
+ ---
75
+
76
+ ### `redirect_mismatch`
77
+ - **Severity:** Medium
78
+ - **Trigger:** Redirect suitability score is below 4/10 (weak topic continuity
79
+ between historical use and intended target)
80
+ - **Effect:** Changes recommendation from `review` to `rebuild-only-review`.
81
+ Explicitly notes in output that redirect analysis is not recommended.
82
+ - **Why It Matters:** Redirecting a domain to an unrelated site is increasingly
83
+ treated as manipulative by search engines. Even when done with good intentions,
84
+ a topic-mismatched redirect is unlikely to pass meaningful value.
85
+
86
+ ---
87
+
88
+ ### `short_history`
89
+ - **Severity:** Low
90
+ - **Trigger:** Domain was actively used for less than 1 year before expiring
91
+ (based on Wayback first and last capture dates)
92
+ - **Effect:** Minor reduction to `history_cleanliness` score. No recommendation
93
+ cap.
94
+ - **Why It Matters:** Domains with very short histories have had less time to
95
+ build natural authority. They are not necessarily bad, but they offer less
96
+ evidence of sustained value.
97
+
98
+ ---
99
+
100
+ ### `suspicious_registrar`
101
+ - **Severity:** Medium
102
+ - **Trigger:** Registration data (RDAP) shows the domain was registered through a registrar
103
+ commonly associated with bulk domain spam operations (pattern-matched against
104
+ a known list of high-spam registrars)
105
+ - **Effect:** Lowers confidence by one step. Adds note in `why_risky`.
106
+ - **Why It Matters:** Domains registered through bulk-spam registrars have a
107
+ higher probability of having been used for link schemes, PBNs, or other
108
+ manipulative purposes, even if the current signals look clean.
109
+
110
+ ---
111
+
112
+ ## Severity Interaction Rules
113
+
114
+ 1. **Any High-severity flag** → recommendation is capped at `review` or lower.
115
+ A domain can never be `high-priority-review` with an active High flag.
116
+
117
+ 2. **Two or more Medium-severity flags** → treated as equivalent to one High
118
+ flag for recommendation capping purposes.
119
+
120
+ 3. **Low-severity flags** → informational only. They appear in output but do
121
+ not cap recommendations.
122
+
123
+ 4. **Flag stacking** → multiple flags of the same severity do not compound
124
+ further than the rules above. Two High flags are treated the same as one
125
+ High flag for capping purposes (the cap is already at `review`).
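Rules 1 to 4 reduce to a single boolean check on the flag list. The sketch below uses the severities from the flag definitions in this file; the function name `caps_at_review` is hypothetical.

```python
# Severities as listed under "Flag Definitions".
FLAG_SEVERITY = {
    "topic_mismatch": "high",
    "spam_content_risk": "high",
    "possible_deindex": "high",
    "weak_historical_activity": "medium",
    "unclear_history": "medium",
    "redirect_mismatch": "medium",
    "suspicious_registrar": "medium",
    "short_history": "low",
}


def caps_at_review(risk_flags: list[str]) -> bool:
    """True when the flags cap the recommendation at `review` or lower."""
    severities = [FLAG_SEVERITY[flag] for flag in risk_flags]
    # One High flag, or two or more Medium flags, triggers the cap.
    # Low flags are informational, and extra flags do not compound further.
    return "high" in severities or severities.count("medium") >= 2
```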
126
+
127
+ ---
128
+
129
+ ## Flag Presentation in Output
130
+
131
+ Every flagged domain includes:
132
+ - The flag name in the `risk_flags` array
133
+ - A human-readable explanation in `why_risky`
134
+ - The severity level is documented here but not repeated in output
135
+ (to keep output compact — operators can reference this file for severity details)
@@ -0,0 +1,198 @@
1
+ # Scoring Model — domain-expired-opportunity-finder
2
+
3
+ This document defines the transparent, weighted scoring model used to evaluate
4
+ expired domain candidates. Every score is explainable — no black-box ranking.
5
+
6
+ ---
7
+
8
+ ## Scoring Dimensions
9
+
10
+ The skill scores each candidate across 6 dimensions. Each dimension has a
11
+ defined weight, maximum point value, and the signals used to compute it.
12
+
13
+ ### 1. Topical Relevance (30 points, weight 30%)
14
+
15
+ **Purpose:** Does this domain historically align with the user's target niche?
16
+
17
+ **Signals:**
18
+ - Domain string keyword match against `target_niche` and `seed_keywords`
19
+ - Historical page titles from Wayback CDX snapshots
20
+ - Historical meta descriptions and visible text (when available)
21
+ - LLM niche-fit assessment (when LLM API key is configured)
22
+
23
+ **Scoring Rules:**
24
+ - 25–30: Strong keyword match in domain name AND historical content confirms niche
25
+ - 18–24: Partial keyword match OR historical content is adjacent to niche
26
+ - 10–17: Weak or indirect topical connection
27
+ - 0–9: No meaningful topical overlap detected
28
+
29
+ **Why Highest Weight:** Topical mismatch is the #1 trap in expired domain
30
+ acquisition. A high-authority domain in the wrong niche is worse than a
31
+ moderate domain in the right niche.
32
+
33
+ ---
34
+
35
+ ### 2. Historical Activity Level (25 points, weight 25%)
36
+
37
+ **Purpose:** Does the domain have a sustained track record of active use?
38
+
39
+ **Signals:**
40
+ - Wayback CDX snapshot frequency over time
41
+ - Density of snapshots during active years
42
+ - Absence of prolonged dormant periods (excluding post-expiry)
43
+
44
+ **Scoring Rules:**
45
+ - 20–25: High snapshot frequency spanning multiple years continuously
46
+ - 13–19: Moderate activity, some years with lower capture rates
47
+ - 6–12: Low overall activity or large multi-year gaps
48
+ - 0–5: Barely any snapshots, very sparse activity history
49
+
50
+ **Key Principle:** A domain with a deep, sustained archive history provides
51
+ more evidence of legitimate past use than a domain that only existed briefly.
52
+
53
+ ---
54
+
55
+ ### 3. Historical Content Quality (15 points, weight 15%)
56
+
57
+ **Purpose:** Does the archived content look natural or keyword-stuffed?
58
+
59
+ **Signals:**
60
+ - Keyword stuffing in historical `<title>` tags
61
+ - Unnatural phrasing in meta descriptions
62
+ - Language consistency (sudden language shifts in history)
63
+
64
+ **Scoring Rules:**
65
+ - 12–15: Natural titles, branded focus, readable descriptions
66
+ - 8–11: Some exact-match keywords but generally readable
67
+ - 4–7: Heavy exact-match keywords or boilerplate titles
68
+ - 0–3: Predominantly spammy, nonsensical, or foreign-language shifts
69
+
70
+ **Red Flag:** If historical titles are heavily stuffed with money keywords, the
71
+ `spam_content_risk` flag is triggered regardless of score.
72
+
73
+ ---
74
+
75
+ ### 4. History Cleanliness (15 points, weight 15%)
76
+
77
+ **Purpose:** Was this domain used legitimately before it expired?
78
+
79
+ **Signals:**
80
+ - Wayback CDX snapshot count (more = longer verifiable history)
81
+ - First and last capture dates (years of active use)
82
+ - HTTP status code consistency across snapshots (200s vs 301s/404s)
83
+ - Absence of parking page patterns (e.g., Sedo, GoDaddy parked pages)
84
+ - No sudden activity drop-offs suggesting deindexing
85
+
86
+ **Scoring Rules:**
87
+ - 12–15: 5+ years of consistent snapshots, clean status codes, real content
88
+ - 8–11: 2–5 years of history, mostly clean, minor gaps
89
+ - 4–7: Short history OR significant gaps OR some parking page evidence
90
+ - 0–3: Very sparse history, mostly parking pages, or suspicious patterns
91
+
92
+ ---
93
+
94
+ ### 5. Redirect Suitability (10 points, weight 10%)
95
+
96
+ **Purpose:** If the user wants to redirect this domain, is the historical
97
+ topic close enough to the target niche to make that plausible?
98
+
99
+ **Signals:**
100
+ - Topic continuity score (overlap between historical topic and target niche)
101
+ - Content type match (was it a blog, tool, company site, etc.)
102
+ - Audience overlap estimate (did the historical audience align with target?)
103
+
104
+ **Scoring Rules:**
105
+ - 8–10: Historical topic is the same or very closely adjacent to target niche
106
+ - 5–7: Some topic overlap, partially plausible redirect
107
+ - 2–4: Weak overlap, redirect would be a stretch
108
+ - 0–1: No meaningful topic continuity — redirect not recommended
109
+
110
+ **Key Constraint:** `redirect_suitability` is always included in `opportunity_score`
111
+ regardless of `intended_use`. When `intended_use` is `rebuild`, a low
112
+ `redirect_suitability` score does **not** trigger the `redirect_mismatch` flag
113
+ or lower the recommendation — it is scored and summed normally. When
114
+ `intended_use` is `redirect`, a score below 4/10 triggers the `redirect_mismatch`
115
+ flag and caps the recommendation at `rebuild-only-review`.
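The interaction with `intended_use` can be sketched as follows. Two things here are assumptions: the ordering of actions in `ACTION_RANK` (used to express "caps at"), and leaving `either` untouched, since the constraint above only defines the `rebuild` and `redirect` cases.

```python
# Assumed ordering from weakest to strongest recommendation.
ACTION_RANK = ["reject", "rebuild-only-review", "review", "high-priority-review"]


def apply_redirect_constraint(
    redirect_score: int, intended_use: str, action: str
) -> tuple[str, list[str]]:
    """Return the (possibly capped) action and any flags this rule adds."""
    flags: list[str] = []
    if intended_use == "redirect" and redirect_score < 4:
        flags.append("redirect_mismatch")
        cap = "rebuild-only-review"
        if ACTION_RANK.index(action) > ACTION_RANK.index(cap):
            action = cap
    # intended_use == "rebuild": no flag, no cap; the score is still summed.
    return action, flags
```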
116
+
117
+ ---
118
+
119
+ ### 6. Signal Completeness (5 points, weight 5%)
120
+
121
+ **Purpose:** How much evidence do we actually have for this candidate?
122
+
123
+ **Signals:**
124
+ - Percentage of data sources that returned usable data
125
+ - Whether Wayback and RDAP both succeeded
126
+ - Whether LLM analysis was available (if API key configured)
127
+
128
+ **Scoring Rules:**
129
+ - 4–5: All available data sources returned data
130
+ - 2–3: Most sources returned data, 1–2 gaps
131
+ - 0–1: Significant data gaps — multiple sources failed or returned nothing
132
+
133
+ **Why This Matters:** A domain that looks promising but has almost no
134
+ verifiable data should not rank as highly as one with strong evidence.
135
+ Missing signals reduce confidence, not boost it.
136
+
137
+ ---
138
+
139
+ ## Total Score Computation
140
+
141
+ ```
142
+ opportunity_score = topical_relevance
143
+ + historical_activity_level
144
+ + historical_content_quality
145
+ + history_cleanliness
146
+ + redirect_suitability
147
+ + signal_completeness
148
+ ```
149
+
150
+ **Range:** 0–100 points.
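Because each dimension's maximum equals its weight, the total is a plain sum. A minimal sketch that also checks each value against its maximum; the function name and the strictness of the validation are assumptions:

```python
# Maximum points per dimension, as defined in the sections above.
MAX_POINTS = {
    "topical_relevance": 30,
    "historical_activity_level": 25,
    "historical_content_quality": 15,
    "history_cleanliness": 15,
    "redirect_suitability": 10,
    "signal_completeness": 5,
}


def opportunity_score(dims: dict[str, int]) -> int:
    """Sum the six dimension scores after checking each against its maximum."""
    if set(dims) != set(MAX_POINTS):
        raise ValueError("exactly the six scoring dimensions are required")
    for name, value in dims.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return sum(dims.values())
```

With the dimension scores 28, 18, 12, 14, 9, and 5, this returns 86.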
151
+
152
+ ---
153
+
154
+ ## Confidence Mapping
155
+
156
+ Confidence reflects how much evidence exists behind the score.
157
+
158
+ | Condition | Confidence Level |
159
+ |---|---|
160
+ | ≥ 4 of 6 dimensions have strong data (non-zero signals) | `high` |
161
+ | 3 of 6 dimensions have strong data | `medium` |
162
+ | ≤ 2 dimensions have strong data | `low` |
163
+
164
+ **Rule:** A domain with `low` confidence can never receive `high-priority-review`,
165
+ regardless of its opportunity_score.
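The confidence table maps a count of well-evidenced dimensions to a level. A sketch, assuming the caller has already counted how many of the six dimensions have strong data:

```python
def confidence_level(dimensions_with_strong_data: int) -> str:
    """Map the number of dimensions with strong data (0-6) to a confidence level."""
    n = dimensions_with_strong_data
    if not 0 <= n <= 6:
        raise ValueError(f"expected a count between 0 and 6, got {n}")
    if n >= 4:
        return "high"
    if n == 3:
        return "medium"
    return "low"
```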
166
+
167
+ ---
168
+
169
+ ## Recommendation Logic
170
+
171
+ | Score Range | Confidence | Recommended Action |
172
+ |---|---|---|
173
+ | ≥ 75 | `high` | `high-priority-review` |
174
+ | ≥ 55 | `medium` or `high` | `review` |
175
+ | ≥ 55 | any, BUT redirect_suitability < 4/10 | `rebuild-only-review` |
176
+ | < 55 | any | `reject` |
177
+ | any | any, WITH High-severity risk flag | capped at `review` or lower (see Override Rule) |
178
+
179
+ **Override Rule:** A single High-severity risk flag (e.g., `topic_mismatch`,
180
+ `spam_content_risk`, `possible_deindex`) automatically caps the recommendation
181
+ at `review` or lower, even if the score is ≥ 75.
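One possible reading of the table and the Override Rule, as a sketch rather than the skill's actual logic. It follows the table literally for the `rebuild-only-review` row, treats a High-severity flag as a cap at `review`, and returns `None` for the one case the table does not define (score ≥ 55 with `low` confidence and adequate redirect suitability). The function name and signature are hypothetical.

```python
from typing import Optional


def recommend(
    score: int, confidence: str, redirect_score: int, has_high_flag: bool = False
) -> Optional[str]:
    """Derive recommended_action from score, confidence, and redirect score."""
    if score < 55:
        return "reject"
    if redirect_score < 4:
        return "rebuild-only-review"
    # Override Rule: a High-severity flag blocks high-priority-review.
    if score >= 75 and confidence == "high" and not has_high_flag:
        return "high-priority-review"
    if confidence in ("medium", "high"):
        return "review"
    return None  # not specified by the table; left undefined in this sketch
```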
182
+
183
+ ---
184
+
185
+ ## Example Scoring Walkthrough
186
+
187
+ **Candidate:** `devtoolsweekly.com` | **Target Niche:** `developer tools`
188
+
189
+ | Dimension | Score | Rationale |
190
+ |---|---|---|
191
+ | Topical Relevance | 28/30 | "devtools" in domain, Wayback shows newsletter about dev tools |
192
+ | Historical Activity | 18/25 | Consistent snapshot captures across multiple years |
193
+ | Content Quality | 12/15 | Natural titles ("DevTools Weekly - Issue #45") |
194
+ | History Cleanliness | 14/15 | 6 years of consistent snapshots, no parking pages |
195
+ | Redirect Suitability | 9/10 | Direct topic match — dev tools to dev tools |
196
+ | Signal Completeness | 5/5 | All sources returned data |
197
+
198
+ **Total:** 86/100 | **Confidence:** `high` | **Action:** `high-priority-review`
@@ -1,7 +0,0 @@
1
- # claude-md-generator — Environment Variables
2
- # =============================================
3
- # Gemini is required for generating the CLAUDE.md content from analysis.
4
-
5
- # Required: Google Gemini API key for CLAUDE.md generation
6
- # Get it: aistudio.google.com, Get API key
7
- GEMINI_API_KEY=your_gemini_api_key_here