@opendirectory.dev/skills 0.1.70 → 0.1.72

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,540 @@
---
name: domain-expired-opportunity-finder
description: Evaluates expired domain candidates against a target niche, scores them by topical relevance, historical activity level, and history cleanliness, then outputs a ranked shortlist with explainable reasoning and risk flags.
compatibility: [claude-code, gemini-cli, github-copilot]
author: ajaycodesitbetter
version: 1.0.0
---

# Expired Domain Opportunity Finder

Evaluate expired domain candidates for a specific niche. Score them on topical
fit, historical activity level, history cleanliness, and redirect suitability.
Output a conservative, explainable shortlist for human review.

---

**Critical rule:** Every recommendation must include BOTH a positive rationale
(`why_selected`) AND a caution rationale (`why_risky`). Never output a bare
score without explanation.

**Conservative-by-default rule:** When signals are incomplete or contradictory,
lower the confidence level. Do not surface ambiguous candidates as strong
opportunities. Missing data reduces confidence, never inflates it.

**Anti-abuse rule:** Never encourage unrelated redirects, PBN construction, or
domain repurposing where the historical topic does not match the target niche.
Read `references/guardrails.md` for the full anti-abuse policy.

---

## Step 1: Setup Check

Check the environment before doing anything else.

Verify that `curl` and `python3` (or `python`) are available:
```bash
curl --version > /dev/null 2>&1 && echo "curl: available" || echo "curl: MISSING"
python3 --version 2>/dev/null || python --version 2>/dev/null || echo "python: MISSING"
```

Check for an optional LLM API key for enhanced niche-relevance scoring:
```bash
echo "LLM_API_KEY: ${LLM_API_KEY:+set}"
```

**If `curl` or `python` is missing:**
Stop. Tell the user: "This skill requires curl and Python 3.10+. Please install them and try again."

**If `LLM_API_KEY` is not set:**
Continue. The skill will use rule-based scoring only (domain string matching,
Wayback title analysis, keyword overlap). Note to the user: "Running in
rule-based-only mode. Set LLM_API_KEY for enhanced niche-relevance scoring."

**If `LLM_API_KEY` is set:**
The skill will use LLM-enhanced scoring for topical relevance analysis.
This provides deeper contextual assessment of niche fit.

QA: State the scoring mode (llm-enhanced or rule-based-only) and confirm tools are available.

---

## Step 2: Input Collection

Collect the required and optional inputs from the user.

**Required:**
- `target_niche` (string): The core niche to evaluate against. Examples: "developer tools", "AI SaaS", "cybersecurity", "fintech".

**Optional (ask only if not provided):**
- `seed_keywords` (array): Keywords to refine topical matching. If not provided, extract 3–5 keywords from the niche name automatically.
- `candidate_domains` (array): Specific domains to evaluate. If not provided, prompt the user.
- `discovery_source` (string): Where candidates came from — `manual`, `expireddomains-net`, `external-feed`.
- `min_snapshots` (integer): Minimum historical snapshot threshold. Default: 10.
- `max_risk_level` (string): `low`, `medium`, or `high`. Controls how aggressively risky candidates are filtered. Default: `medium`.
- `intended_use` (string): `rebuild`, `redirect`, or `either`. Default: `either`.

**If no `candidate_domains` are provided:**
Ask: "Please provide a list of expired domain candidates to evaluate. You can:
1. Paste domain names (one per line or comma-separated)
2. Provide a file path to a text file with one domain per line
3. Say 'example' to run with a built-in demo set for the 'developer tools' niche"

**If the user says 'example':**
Use this demo set:
```
devtoolsweekly.com
codeshipnews.io
stackforgeapp.com
quickseorank.net
bestcheaphosting247.com
cloudbuildpro.dev
reactwidgetlib.com
megadealsshop.xyz
```

After collecting all inputs, confirm:
"Target niche: [niche]. Evaluating [N] candidate domains. Scoring mode: [mode]. Intended use: [use]."
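
The automatic seed-keyword extraction mentioned above can be sketched as follows. This is a minimal illustration, assuming simple tokenization plus a small stopword list; `auto_seed_keywords` is a hypothetical helper, not part of the skill's files:

```python
# Hypothetical sketch: derive seed keywords when the user supplies none.
STOPWORDS = {"and", "or", "the", "for", "of", "a", "an"}

def auto_seed_keywords(niche, limit=5):
    """Split the niche name into lowercase words, drop stopwords,
    and keep the full phrase itself as an extra multi-word seed."""
    words = [w.lower() for w in niche.replace("/", " ").split()]
    seeds = [w for w in words if w not in STOPWORDS]
    if len(seeds) > 1:
        seeds.append(niche.strip().lower())
    return seeds[:limit]

print(auto_seed_keywords("developer tools"))  # ['developer', 'tools', 'developer tools']
```

The full phrase is kept as its own seed so that exact multi-word matches in historical titles still score.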

---

## Step 3: Candidate Normalization

Clean and validate the candidate list before scoring.

```bash
python3 -c "
import sys, re

domains = '''CANDIDATE_LIST_HERE'''.strip().split('\n')
seen = set()
valid = []
invalid = []
dupes = 0

for d in domains:
    d = d.strip().lower()
    # Strip protocols and paths
    d = re.sub(r'^https?://', '', d)
    d = d.split('/')[0]
    d = d.strip('.')

    if not d:
        continue

    # Basic TLD validation
    if '.' not in d or len(d) < 4:
        invalid.append(d)
        continue

    # Deduplicate (and count duplicates so the summary adds up)
    if d in seen:
        dupes += 1
        continue
    seen.add(d)
    valid.append(d)

print(f'Valid candidates: {len(valid)}')
print(f'Removed (invalid/duplicate): {len(invalid) + dupes}')
for v in valid:
    print(f'  ✓ {v}')
for i in invalid:
    print(f'  ✗ {i} (invalid format)')
"
```

Replace `CANDIDATE_LIST_HERE` with the actual domain list from Step 2.

State: "[N] valid candidates after normalization. [M] removed (invalid/duplicate)."

If 0 valid candidates remain, stop and tell the user: "No valid domain candidates found. Please provide domain names in the format 'example.com'."

---

## Step 4: Signal Collection

For each valid candidate, collect signals from free public sources.
Run these checks sequentially per domain.

### 4a: Wayback CDX API — History Snapshots

Query the Wayback Machine for all historical snapshots. We use `limit=100000`,
and the `from`/`to` parameters are intentionally omitted so that CDX returns
snapshots from the domain's full lifetime. Results are sorted ascending by
timestamp (oldest first), so `first_capture` and `last_capture` are accurate:

```bash
curl -s "https://web.archive.org/cdx/search/cdx?url=DOMAIN_HERE&output=json&fl=timestamp,statuscode&collapse=timestamp:6&limit=100000" \
  | python3 -c "
import sys, json

try:
    data = json.load(sys.stdin)
    if len(data) <= 1:
        print(json.dumps({'domain': 'DOMAIN_HERE', 'snapshots': 0, 'first_capture': None, 'last_capture': None, 'status_codes': {}, 'years_active': 0}))
    else:
        rows = data[1:]  # skip header row
        timestamps = [r[0] for r in rows]  # already ascending (oldest first)
        statuses = [r[1] for r in rows]
        status_counts = {}
        for s in statuses:
            status_counts[s] = status_counts.get(s, 0) + 1
        first_year = int(timestamps[0][:4])
        last_year = int(timestamps[-1][:4])
        print(json.dumps({
            'domain': 'DOMAIN_HERE',
            'snapshots': len(rows),
            'first_capture': timestamps[0],
            'last_capture': timestamps[-1],
            'status_codes': status_counts,
            'years_active': last_year - first_year + 1
        }))
except Exception:
    print(json.dumps({'domain': 'DOMAIN_HERE', 'snapshots': 0, 'error': 'wayback_api_failed'}))
"
```

**Rate limiting:** Wait 2 seconds between Wayback API calls to be polite to the service.

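The timestamps collected here also feed the `possible_deindex` flag in Step 6 (a sudden Wayback drop-off after years of activity). One way to detect that is sketched below; this is a heuristic assumption, since `references/risk-flags.md` defines the authoritative rule:

```python
def capture_years(timestamps):
    """Count Wayback snapshots per calendar year.
    Timestamps are 14-digit strings like '20180115093000', ascending."""
    counts = {}
    for t in timestamps:
        year = int(t[:4])
        counts[year] = counts.get(year, 0) + 1
    return counts

def sudden_dropoff(timestamps, expiry_year):
    """Heuristic: a domain with 3+ active years but zero captures in the
    two years before expiry looks like a sudden drop-off."""
    counts = capture_years(timestamps)
    if len(counts) < 3:
        return False  # too little history to call it a drop-off
    return all(counts.get(y, 0) == 0 for y in (expiry_year - 1, expiry_year - 2))

ts = ["20150601000000", "20160601000000", "20170601000000", "20180601000000"]
print(sudden_dropoff(ts, 2021))  # True: active 2015-2018, silent 2019-2020
```
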
### 4b: Wayback Content Sampling — Historical Page Titles

For candidates with > 0 snapshots, fetch the most recent snapshot to extract
the page title (used for topical relevance scoring):

```bash
curl -s -L "https://web.archive.org/web/LATEST_TIMESTAMP/http://DOMAIN_HERE" \
  | python3 -c "
import sys, re
html = sys.stdin.read()[:50000]
title_match = re.search(r'<title[^>]*>(.*?)</title>', html, re.IGNORECASE | re.DOTALL)
title = title_match.group(1).strip() if title_match else 'no title found'
# Extract meta description too. \x22/\x27 are regex escapes for the double
# and single quote, avoiding quote conflicts inside this shell-quoted script.
meta_match = re.search(r'<meta[^>]*name=[\x22\x27]description[\x22\x27][^>]*content=[\x22\x27](.*?)[\x22\x27]', html, re.IGNORECASE)
desc = meta_match.group(1).strip() if meta_match else 'no description found'
print(f'Title: {title}')
print(f'Description: {desc}')
"
```

Replace `LATEST_TIMESTAMP` with the most recent timestamp from Step 4a.

### 4c: RDAP Lookup — Registration Status

Use the cross-platform, HTTP-based RDAP standard (which replaces OS-dependent
WHOIS). An HTTP 404 from RDAP means the domain is not registered (i.e. it is
genuinely available or untracked) — that is distinct from a network failure.
Handle both cases explicitly:

```bash
python3 -c "
import urllib.request, urllib.error, json

domain = 'DOMAIN_HERE'
try:
    req = urllib.request.Request(
        f'https://rdap.org/domain/{domain}',
        headers={'User-Agent': 'Mozilla/5.0'}
    )
    with urllib.request.urlopen(req, timeout=10) as response:
        data = json.loads(response.read().decode())

    registrar = 'unknown'
    created = 'unknown'

    for entity in data.get('entities', []):
        if 'registrar' in entity.get('roles', []):
            try:
                # jCard: the 'fn' property carries the registrar's name
                for prop in entity.get('vcardArray', ['vcard', []])[1]:
                    if prop[0] == 'fn':
                        registrar = prop[3]
            except Exception:
                pass

    for event in data.get('events', []):
        if event.get('eventAction') == 'registration':
            created = event.get('eventDate', 'unknown')

    print(json.dumps({
        'domain': domain,
        'status': 'registered',
        'registrar': registrar,
        'created': created
    }))
except urllib.error.HTTPError as e:
    if e.code == 404:
        # Domain has no RDAP object — likely unregistered or not in RDAP coverage
        print(json.dumps({'domain': domain, 'status': 'unregistered_or_no_rdap_object'}))
    else:
        print(json.dumps({'domain': domain, 'error': f'rdap_http_error_{e.code}'}))
except Exception:
    print(json.dumps({'domain': domain, 'error': 'rdap_lookup_failed'}))
"
```

### 4d: Domain String Analysis — Keyword Matching

Score keyword overlap between the domain name and the target niche / seed keywords:

```bash
python3 -c "
import re, json

domain = 'DOMAIN_HERE'
niche = 'NICHE_HERE'
seeds = SEEDS_JSON_HERE  # e.g., ['devops', 'ci/cd', 'code editor']

# Extract words from domain
domain_base = domain.rsplit('.', 1)[0]  # remove TLD
domain_words = re.split(r'[-_.]', domain_base.lower())

# Check niche words
niche_words = niche.lower().split()
all_keywords = set(niche_words + [s.lower() for s in seeds])

matches = [w for w in domain_words if any(kw in w or w in kw for kw in all_keywords)]
match_ratio = len(matches) / max(len(domain_words), 1)

print(json.dumps({
    'domain': domain,
    'domain_words': domain_words,
    'keyword_matches': matches,
    'match_ratio': round(match_ratio, 2)
}))
"
```

### 4e: Gemini LLM Niche-Relevance Assessment (if LLM_API_KEY is set)

If the LLM API key is configured, batch all candidates with their collected
signals and ask for a contextual niche-relevance assessment.

**Note:** The request/response format below uses the **Gemini API** (`generateContent`
format). It is not compatible with OpenAI-style endpoints without modification.
If you use a different provider, you must adapt the JSON body and response parsing.

```bash
cat > /tmp/domain-relevance-request.json << 'ENDJSON'
{
  "system_instruction": {
    "parts": [{
      "text": "You are an SEO research analyst. For each expired domain candidate provided, assess its topical relevance to the specified target niche. Consider the domain name, historical page title, and meta description. For each domain, output a JSON object with: domain (string), relevance_score (integer 1-10), relevance_rationale (one sentence explaining the score), redirect_plausibility (integer 1-10), redirect_rationale (one sentence). Output only a JSON array. No commentary before or after."
    }]
  },
  "contents": [{
    "parts": [{
      "text": "DOMAIN_SIGNALS_AND_NICHE_CONTEXT_HERE"
    }]
  }],
  "generationConfig": {
    "temperature": 0.2,
    "maxOutputTokens": 2048
  }
}
ENDJSON
```

Replace `DOMAIN_SIGNALS_AND_NICHE_CONTEXT_HERE` with:
- The target niche and seed keywords
- For each candidate: domain name, historical title, description, keyword match data

Send the request to the Gemini API:

```bash
curl -s -X POST \
  "${LLM_API_ENDPOINT:-https://generativelanguage.googleapis.com/v1beta}/models/${LLM_MODEL:-gemini-2.0-flash}:generateContent?key=$LLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/domain-relevance-request.json \
  | python3 -c "
import sys, json
try:
    d = json.load(sys.stdin)
    text = d['candidates'][0]['content']['parts'][0]['text']
    print(text)
except (KeyError, IndexError, json.JSONDecodeError) as e:
    print(json.dumps({'error': 'llm_response_parse_failed', 'detail': str(e)}))
"
```

If the LLM call or response parsing fails, log the error and continue with
rule-based scoring only. Do not stop the workflow.

After all signal collection, state:
"Signal collection complete for [N] candidates. [M] Wayback hits, [K] RDAP lookups succeeded."

---

## Step 5: Scoring & Classification

Read `references/scoring-model.md` for the full scoring framework.

For each candidate, compute scores across the 6 dimensions:

1. **Topical Relevance (0–30):** Combine domain keyword match ratio, historical
   title/description analysis, and LLM relevance score (if available). Without
   LLM: use keyword match ratio × 15 + title keyword overlap × 15. With LLM:
   use LLM relevance_score × 3.

2. **Historical Activity Level (0–25):** Based on Wayback snapshot diversity and
   frequency. More snapshots consistently captured across multiple years indicate
   higher sustained activity and inferred legitimacy.

3. **Historical Content Quality (0–15):** Derived from historical page title and
   meta description analysis, checking for natural phrasing versus keyword
   stuffing. Without LLM: base score of 8/15 adjusted by exact-match density.

4. **History Cleanliness (0–15):** Based on Wayback snapshot count, years active,
   status code consistency, and absence of parking page indicators.

5. **Redirect Suitability (0–10):** Based on topic continuity between historical
   content and target niche. Use LLM redirect_plausibility score if available;
   otherwise use keyword overlap ratio.

6. **Signal Completeness (0–5):** Count how many data sources returned usable
   data (Wayback, RDAP, domain analysis, LLM if configured).

**Compute:**
- `opportunity_score` = sum of all dimension scores (0–100)
- `confidence` = based on how many dimensions have strong data (see scoring-model.md)
- `recommended_action` = based on score + confidence + risk flags (see Step 6)

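The aggregation above can be sketched as a capped sum. Illustrative only; `references/scoring-model.md` remains the source of truth for weights and confidence:

```python
# Per-dimension maximums from the list above.
DIMENSION_CAPS = {
    "topical_relevance": 30,
    "historical_activity": 25,
    "content_quality": 15,
    "history_cleanliness": 15,
    "redirect_suitability": 10,
    "signal_completeness": 5,
}

def opportunity_score(dims):
    """Capped sum of the six dimension scores (0-100). Missing
    dimensions contribute 0, so missing data never inflates a score."""
    return sum(min(dims.get(name, 0), cap) for name, cap in DIMENSION_CAPS.items())

print(opportunity_score({"topical_relevance": 27, "historical_activity": 20,
                         "content_quality": 10, "history_cleanliness": 12,
                         "redirect_suitability": 6, "signal_completeness": 4}))  # 79
```

Clamping each dimension to its cap keeps a single inflated signal from pushing the total past its weight.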
---

## Step 6: Risk Flagging & Filtering

Read `references/risk-flags.md` for the complete flag definitions.

Apply risk flags to each candidate:

| Check | Flag Applied |
|---|---|
| Historical topic overlap < 30% with target niche | `topic_mismatch` |
| Domain active < 1 year before expiry | `short_history` |
| < 3 Wayback snapshots or all parking pages | `unclear_history` |
| Sudden Wayback drop-off after years of activity | `possible_deindex` |
| Snapshot count below `min_snapshots` | `weak_historical_activity` |
| Redirect suitability < 4/10 | `redirect_mismatch` |

**Apply recommendation logic:**

| Score + Flags | Recommendation |
|---|---|
| Score ≥ 75 AND confidence `high` AND no High-severity flags | `high-priority-review` |
| Score ≥ 55 AND confidence ≥ `medium` | `review` |
| Score ≥ 55 BUT redirect_suitability < 4/10 | `rebuild-only-review` |
| Score < 55 OR any critical High-severity flag | `reject` |

**Apply `max_risk_level` filter:**
- If `max_risk_level` = `low`: exclude any candidate with Medium or High flags
- If `max_risk_level` = `medium`: exclude candidates with High flags only
- If `max_risk_level` = `high`: include all candidates (no filter)
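
The two tables, plus the Self-QA rule that `redirect_mismatch` candidates never land in plain `review`, combine into one decision function. A sketch under stated assumptions: which flags count as High severity is defined in `references/risk-flags.md`, so the set below is a placeholder:

```python
HIGH_SEVERITY = {"topic_mismatch", "possible_deindex"}  # placeholder; see references/risk-flags.md

def recommend(score, confidence, flags, redirect_suitability):
    """Map score, confidence, and flags to a recommended_action per the tables above."""
    if score < 55 or set(flags) & HIGH_SEVERITY:
        return "reject"
    if redirect_suitability < 4:
        return "rebuild-only-review"  # never plain 'review' with a redirect mismatch
    if score >= 75 and confidence == "high":
        return "high-priority-review"
    if confidence in ("high", "medium"):
        return "review"
    return "reject"  # score >= 55 but confidence too low

print(recommend(80, "high", [], 8))  # high-priority-review
```

The reject checks run first so a High-severity flag always wins, matching the last table row.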

---

## Step 7: Output & Save

Read `references/output-format.md` for the exact JSON schema.
Read `references/guardrails.md` for the required disclaimer text.

**Default: Shortlist mode.** Show only candidates with `recommended_action`
of `high-priority-review`, `review`, or `rebuild-only-review`.

If the user requested audit mode, show ALL candidates with full dimension
breakdowns including rejection reasons.

### Present the output:

```
## Expired Domain Opportunity Finder — [YYYY-MM-DD]

**Target niche:** [niche]
**Seed keywords:** [keywords]
**Intended use:** [rebuild/redirect/either]
**Candidates evaluated:** [N]
**Shortlisted:** [M]
**Rejected:** [K]
**Scoring mode:** [llm-enhanced / rule-based-only]

---

### 1. [domain.com] — Score: [N]/100 | Confidence: [level] | Action: [recommendation]

**Topical fit:** [summary]
**Activity level:** [summary]
**Content quality:** [summary]
**History:** [summary]
**Redirect suitability:** [level]
**Risk flags:** [flags or "none"]

**Why selected:** [rationale]
**Why risky:** [rationale]

---

[repeat for each shortlisted domain, ranked by opportunity_score descending]

---

**Disclaimer:** These results are research recommendations, not guarantees
of SEO value. Redirect analysis should only be considered when strong
topic continuity exists between the expired domain and your target site.
Search engine algorithms change frequently. Always perform manual due
diligence — including checking current index status, reviewing the full
backlink profile with a commercial tool, and verifying domain history —
before making any acquisition decision. This skill does not endorse or
facilitate manipulative SEO practices.
```

**Save the structured JSON output:**
```bash
mkdir -p docs/expired-domain-intel
OUTFILE="docs/expired-domain-intel/$(date +%Y-%m-%d).json"
cat > "$OUTFILE" << 'EOF'
JSON_OUTPUT_HERE
EOF
echo "Saved to $OUTFILE"
```
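
The heredoc above breaks if the JSON ever contains a line reading `EOF`; writing the file from Python sidesteps shell quoting entirely. A minimal alternative sketch (`save_report` is illustrative, not part of the skill's files):

```python
import datetime
import json
import pathlib

def save_report(report, outdir="docs/expired-domain-intel"):
    """Write the structured report to <outdir>/YYYY-MM-DD.json and return its path."""
    out = pathlib.Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    outfile = out / f"{datetime.date.today().isoformat()}.json"
    outfile.write_text(json.dumps(report, indent=2))
    return outfile
```
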
493
+
494
+ **If 0 candidates pass the shortlist:**
495
+ "No candidates met the shortlist criteria for the '[niche]' niche with the
496
+ current risk tolerance. This is a normal outcome — it means the evaluated
497
+ domains were not strong enough matches. Try:
498
+ 1. Providing different candidate domains
499
+ 2. Widening seed keywords
500
+ 3. Setting max_risk_level to 'high' to see borderline candidates
501
+ 4. Running in audit mode to see why candidates were rejected"

---

## Self-QA Checklist

Run every check before presenting output:

- [ ] Every shortlisted domain has both `why_selected` AND `why_risky`
- [ ] No shortlisted domain has a High-severity risk flag AND `high-priority-review` action
- [ ] Domains with `redirect_mismatch` are labeled `rebuild-only-review` (not `review`)
- [ ] The guardrails disclaimer is present at the end of output
- [ ] No hype language: no "guaranteed", "easy win", "safe to redirect", "SEO hack"
- [ ] `scoring_mode` correctly reflects whether LLM was used
- [ ] Candidates are ranked by `opportunity_score` descending
- [ ] JSON output saved to `docs/expired-domain-intel/YYYY-MM-DD.json`
- [ ] All Wayback API calls were rate-limited (2s between calls)

Fix any violation before presenting.

---

## What Good Output Looks Like

- Every domain has a score, confidence, action, and risk assessment
- Summaries are 1–2 sentences each, specific to the candidate (not generic)
- Risk flags are present and explained in `why_risky`
- The shortlist is small (quality over quantity) — typically 2–5 domains from a batch of 10–20
- Conservative: when in doubt, reject or lower confidence
- The user can understand exactly why each domain was selected or rejected

## What Bad Output Looks Like

- Bare scores without explanation
- Generic summaries like "this domain has good metrics" (must be specific)
- High-priority recommendations for domains with serious risk flags
- Redirect recommendations for topic-mismatched domains
- No disclaimer at the end
- Hype language promising SEO outcomes
- Too many shortlisted domains (the skill should be selective, not permissive)
@@ -0,0 +1,58 @@
{
  "skill_name": "domain-expired-opportunity-finder",
  "evals": [
    {
      "id": 1,
      "prompt": "Find expired domain opportunities in the developer tools niche. Evaluate these candidates: devtoolsweekly.com, codeshipnews.io, stackforgeapp.com, quickseorank.net, bestcheaphosting247.com",
      "expected_output": "Agent checks environment: confirms curl and python are available. Checks LLM_API_KEY (set). Reports scoring mode: llm-enhanced. Collects target_niche='developer tools', extracts seed keywords automatically. Normalizes 5 candidates (all valid, 0 removed). For each candidate: queries Wayback CDX API with 2-second rate limiting, fetches most recent snapshot for page title extraction, runs RDAP lookup, performs domain string keyword analysis, sends batch to LLM for niche-relevance assessment. Scores all 5 across 6 dimensions using references/scoring-model.md. Applies risk flags per references/risk-flags.md. devtoolsweekly.com and codeshipnews.io should score highest (strong topical fit). quickseorank.net and bestcheaphosting247.com should be rejected (topic_mismatch). stackforgeapp.com should get rebuild-only-review or review. Output includes why_selected AND why_risky for each shortlisted domain. Guardrails disclaimer present at end. JSON saved to docs/expired-domain-intel/YYYY-MM-DD.json.",
      "files": [
        "references/scoring-model.md",
        "references/risk-flags.md",
        "references/output-format.md",
        "references/guardrails.md"
      ]
    },
    {
      "id": 2,
      "prompt": "Evaluate these expired domains for my AI SaaS niche: mlpipeline.io, datascience-hub.com, cheapcarinsurance99.com",
      "expected_output": "Agent checks environment: LLM_API_KEY is NOT set. Reports: 'Running in rule-based-only mode. Set LLM_API_KEY for enhanced niche-relevance scoring.' Does not crash. Proceeds with rule-based scoring: domain string keyword matching against 'AI SaaS' and auto-extracted seed keywords, Wayback CDX history analysis, RDAP lookups. mlpipeline.io and datascience-hub.com should score reasonably on topical relevance via keyword overlap. cheapcarinsurance99.com should be rejected with topic_mismatch flag. Confidence levels may be lower than LLM-enhanced mode due to less nuanced relevance assessment. scoring_mode field in output reads 'rule-based-only'. All other output structure remains identical. Disclaimer present.",
      "files": [
        "references/scoring-model.md",
        "references/output-format.md"
      ],
      "setup": "LLM_API_KEY not set in environment"
    },
    {
      "id": 3,
      "prompt": "Find expired domain opportunities in the developer tools niche. Run with the example set.",
      "expected_output": "Agent loads the built-in demo candidate set: devtoolsweekly.com, codeshipnews.io, stackforgeapp.com, quickseorank.net, bestcheaphosting247.com, cloudbuildpro.dev, reactwidgetlib.com, megadealsshop.xyz. Normalizes all 8 (all valid). Evaluates all 8. Expected results: devtoolsweekly.com, codeshipnews.io, cloudbuildpro.dev, and reactwidgetlib.com should score well (topical fit to developer tools). stackforgeapp.com should get rebuild-only-review (adjacent but not direct match). quickseorank.net, bestcheaphosting247.com, megadealsshop.xyz should be rejected (topic_mismatch and/or spam signals). Shortlist should contain 3-5 domains. Rejected count should be 3-5. Output is selective, not permissive.",
      "files": [
        "references/scoring-model.md",
        "references/risk-flags.md",
        "references/output-format.md",
        "references/guardrails.md"
      ]
    },
    {
      "id": 4,
      "prompt": "Evaluate paymentsdaily.com for the fintech niche. I want to redirect it to my fintech blog.",
      "expected_output": "Agent sets intended_use='redirect'. Normalizes 1 candidate. Collects signals for paymentsdaily.com. If Wayback shows the domain was about payment news/fintech: topical_relevance scores high, redirect_suitability should be medium-to-high (topic continuity between payments content and fintech blog). Recommended action should be 'review' or 'high-priority-review' depending on signal quality. If Wayback shows sparse or unrelated history: redirect_suitability should be 'low', redirect_mismatch flag applied, recommendation should be 'rebuild-only-review'. Output explicitly addresses redirect suitability assessment since intended_use is 'redirect'. why_risky present even for a good candidate.",
      "files": [
        "references/scoring-model.md",
        "references/risk-flags.md",
        "references/output-format.md"
      ]
    },
    {
      "id": 5,
      "prompt": "Evaluate these domains for cybersecurity niche: xyz123abc.com, q9w8e7r6.net, testdomain000.org",
      "expected_output": "Agent normalizes 3 candidates. Collects signals. All three domains have nonsensical names with no keyword match to cybersecurity. Wayback CDX returns few or zero snapshots for all three (gibberish domains rarely have meaningful history). RDAP may show short registration periods. Expected scoring: all three should score below 55 on opportunity_score. All three should receive topic_mismatch flag. Multiple candidates should receive unclear_history flag due to sparse Wayback data. Confidence should be 'low' for all three. All three should be recommended as 'reject'. Zero-result message presented: 'No candidates met the shortlist criteria for the cybersecurity niche.' Suggests widening keywords, trying different candidates, or running in audit mode. Guardrails disclaimer present.",
      "files": [
        "references/scoring-model.md",
        "references/risk-flags.md",
        "references/output-format.md",
        "references/guardrails.md"
      ]
    }
  ]
}