@cloudcreate/adsense-check 1.0.1 → 1.1.0

package/README.md CHANGED
@@ -1,6 +1,6 @@
  # @cloudcreate/adsense-check
 
- Automated website checker for Google AdSense review requirements. Focuses on detecting "low value content" — the #1 rejection reason.
+ Automated website checker for Google AdSense review requirements. Detects "low value content" — the #1 rejection reason. Supports content sites, tool sites, and game sites with AI-powered topic analysis and content relevance checking.
 
  ## Install
 
@@ -18,75 +18,92 @@ npx @cloudcreate/adsense-check https://example.com
 
  ```bash
  # Full check with AI analysis
- adsense-check https://example.com
+ adsense-check https://example.com --ai
 
  # Quick check without AI
- adsense-check https://example.com --skip-ai
+ adsense-check https://example.com
 
  # JSON output (for programmatic use)
  adsense-check https://example.com --json
+
+ # Chinese output
+ adsense-check https://example.com -l zh --ai
+
+ # Only detect site type and topic
+ adsense-check https://example.com --detect-only --ai
  ```
 
  Reports are auto-saved to `tmp/<domain>-<timestamp>.json`.
 
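Each run writes a new timestamped report, so a small helper can pick out the most recent one for a domain. This is a minimal sketch that assumes only the `tmp/<domain>-<timestamp>.json` naming described above. `latest_report` is a hypothetical helper and is not part of the package. It sorts by file modification time because the exact timestamp format is not documented.

```bash
#!/bin/sh
# latest_report: hypothetical helper, not part of adsense-check.
# Prints the most recently modified report for a domain, assuming the
# <dir>/<domain>-<timestamp>.json naming. $2 overrides the dir (default: tmp).
latest_report() {
  domain="$1"
  dir="${2:-tmp}"
  # ls -t sorts newest first; errors for a non-matching glob are discarded,
  # so nothing is printed when no report exists for the domain.
  ls -t "$dir/$domain"-*.json 2>/dev/null | head -n 1
}
```

For example, `latest_report example.com` prints the newest matching path under `tmp/`, or nothing if no report exists. If you changed the output directory with `-o`, pass it as the second argument.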
- ## What It Checks
-
- | Category | Checks | Focus |
- |----------|--------|-------|
- | **Content Quality** (8) | Content ratio, depth, template detection, filler detection, duplication, freshness, site scale | Low-value content |
- | **Required Pages** (4) | About, Privacy Policy, Contact, Terms of Service | Completeness |
- | **Site Structure** (5) | H1 tags, robots.txt, sitemap, internal links, dead links | Crawlability |
- | **Performance** (5) | Load speed, viewport, mobile overflow, font size, popups | User experience |
- | **Policy Compliance** (1) | Blacklisted keywords | AdSense policy |
- | **AI Analysis** (3+) | Content value, originality, compliance + per-page analysis | Low-value content |
-
- ### Content Quality (Anti Low-Value Content)
-
- The core focus of this tool is detecting content that AdSense reviewers flag as "low value":
-
- - **Content Ratio**: Strips navigation/footer/sidebar, measures real content percentage
- - **Content Depth**: Per-page word count of actual content (not total page text)
- - **Template Detection**: Flags pages with identical structures but different words
- - **Filler Detection**: Catches repeated phrases, padding, meaningless text
- - **Cross-Page Duplication**: Segment-level dedup across all crawled pages
- - **Content Freshness**: Checks if site has been updated recently
- - **Site Scale**: Warns if site has too few content pages
-
- ### AI Per-Page Analysis
-
- With AI enabled, each crawled page gets an individual assessment:
-
- ```json
- {
-   "pages": [
-     {
-       "url": "https://example.com/blog/post-1",
-       "title": "Post Title",
-       "contentChars": 1200,
-       "contentRatio": 85,
-       "contentStatus": "pass",
-       "issues": [],
-       "ai": {
-         "status": "pass",
-         "assessment": "Content provides genuine value...",
-         "suggestions": ["Add more specific examples"]
-       }
-     }
-   ]
- }
+ ## Features
+
+ ### Site Type Detection
+
+ Automatically classifies websites into one of three supported types, or marks them as unsupported:
+
+ | Type | Description | Examples |
+ |------|-------------|----------|
+ | **Content** | News, blogs, reference material | theexceltranslator.com |
+ | **Tool** | Online calculators, converters, generators | ishowspeedsaid.com |
+ | **Game** | Online games, game portals | popstone2.com |
+ | **Unsupported** | Other types (e-commerce, social, etc.) | — |
+
+ AI analysis classifies the site type and topic. When AI is unavailable, the tool falls back to detecting the type from DOM signals. Use `--type` to force a type.
+
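The README does not say which DOM signals the fallback inspects. Purely as an illustration, the sketch below assumes three hypothetical signals: a `<canvas>` element suggests a game, form inputs suggest a tool, and `<article>` elements suggest a content site. It also honours a forced type in the same way `--type` does. The function names, the signals, and their priority are all assumptions and are not the tool's actual heuristics.

```bash
#!/bin/sh
# Illustrative DOM-signal classifier; signals and priority are assumptions,
# not adsense-check's real logic.

# count_tag: count case-insensitive occurrences of an opening tag ($2) in HTML ($1).
count_tag() {
  printf '%s' "$1" | grep -oi "$2" | wc -l | tr -d ' '
}

# classify_site: $1 = page HTML, $2 = optional forced type (like --type).
classify_site() {
  html="$1"
  forced="$2"
  case "$forced" in
    content|tool|game) echo "$forced"; return ;;
  esac
  canvas=$(count_tag "$html" '<canvas')
  inputs=$(count_tag "$html" '<input')
  articles=$(count_tag "$html" '<article')
  if [ "$canvas" -gt 0 ]; then
    echo game
  elif [ "$inputs" -gt 0 ] && [ "$inputs" -ge "$articles" ]; then
    echo tool
  elif [ "$articles" -gt 0 ]; then
    echo content
  else
    echo unsupported
  fi
}
```

Real pages mix signals (for example, a blog with a search box contains an `<input>`), so a heuristic like this is coarse compared with the AI classification.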
53
+ ### AI Topic Analysis
54
+
55
+ With `--ai`, the tool analyzes the homepage to determine:
56
+ - **Topic**: What the site is about (e.g., "online match-3 puzzle games")
57
+ - **Description**: One-line summary of the site's purpose
58
+ - **Type**: content / tool / game / unsupported
59
+
60
+ ### Content Relevance Checking
61
+
62
+ Each page is evaluated for relevance to the site's topic:
63
+ - **relevant**: Directly related to the site's topic
64
+ - **tangential**: Loosely related
65
+ - **off-topic**: Unrelated to the site's purpose
66
+
67
+ Sites with >30% off-topic content are flagged as potentially failing review.
68
+
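The flagging rule can be sketched as a share of labelled pages. This sketch assumes the 30% is counted over evaluated pages rather than over content volume, because the README does not say which. `offtopic_check` is a hypothetical name, and its `<percent> <verdict>` output is an illustrative format.

```bash
#!/bin/sh
# offtopic_check: sketch of the ">30% off-topic" rule, assuming it is computed
# over page count. Reads one label per line on stdin
# (relevant | tangential | off-topic) and prints "<percent> <verdict>".
offtopic_check() {
  awk '
    NF { total++ }                  # count non-empty label lines
    $1 == "off-topic" { off++ }
    END {
      if (total == 0) { print "0 pass"; exit }
      pct = 100 * off / total
      # Strictly more than 30% is flagged; exactly 30% passes.
      printf "%.0f %s\n", pct, (pct > 30 ? "flag" : "pass")
    }'
}
```

For example, piping two `off-topic` labels and one `relevant` label into it prints `67 flag`.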
+ ### Sampling Strategy
+
+ The tool discovers content pages from sitemaps (following nested sitemap indexes recursively) and from homepage links, then samples based on:
+
+ - **6-month freshness**: Prioritizes recently updated content
+ - **Configurable minimum**: `--sample-min` (default: 20)
+ - **Configurable ratio**: `--sample-ratio` (default: 0.2, i.e., 20%)
+ - **Confidence level**: Reported from the share of discovered content pages that were sampled: high (≥50%), medium (≥20%), low (<20%)
+
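The sample-size and confidence rules can be sketched as shown below. This reading is inferred, not documented. The sketch assumes the sample is `--sample-ratio` × discovered pages, rounded up, then raised to `--sample-min` and capped at the number of pages found. It also assumes confidence is the sampled share of discovered pages. The 6-month freshness preference decides which pages are picked rather than how many, so the sketch does not model it. `sample_plan` is a hypothetical name.

```bash
#!/bin/sh
# sample_plan: sketch of the sample-size and confidence rules, inferred from
# the README (not taken from adsense-check's source).
# $1 = discovered content pages, $2 = --sample-min (default 20),
# $3 = --sample-ratio (default 0.2). Prints "<sampled> <percent> <confidence>".
sample_plan() {
  awk -v total="$1" -v min="${2:-20}" -v ratio="${3:-0.2}" 'BEGIN {
    n = total * ratio
    if (n - int(n) > 1e-9) n = int(n) + 1   # round up to a whole page
    if (n < min) n = min                    # never sample fewer than --sample-min
    if (n > total) n = total                # cannot sample more pages than exist
    pct = (total > 0) ? 100 * n / total : 0
    conf = (pct >= 50) ? "high" : (pct >= 20) ? "medium" : "low"
    printf "%d %.0f %s\n", n, pct, conf
  }'
}
```

The report sample under Report Output lists 165 discovered pages. For that input, `sample_plan 165` prints `33 20 medium`, which matches the sample's "33 sampled (20%) medium confidence" line.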
+ ### Two-Group Scoring
+
+ Checks are divided into **Hard Requirements** (pass/fail) and **Soft Scoring** (0-100):
+
  ```
+ Composite = Hard Pass Rate × 0.4 + Soft Score × 0.6 - Warning Penalty
+ ```
+
+ - **Hard**: Site scale, required pages, structure, performance baseline, policy compliance
+ - **Soft**: Content quality, user experience, AI analysis, content relevance
 
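A small helper makes the composite formula above easy to check by hand. This is a minimal sketch that assumes the hard pass rate and the soft score are both on a 0-100 scale and that the warning penalty is already expressed in points. The README does not document how the penalty is derived, so the helper takes it as an input. `composite_score` is hypothetical, and rounding the result to a whole number is an assumption.

```bash
#!/bin/sh
# composite_score: the README's formula
#   Composite = Hard Pass Rate x 0.4 + Soft Score x 0.6 - Warning Penalty
# $1 = hard pass rate (%), $2 = soft score (0-100), $3 = warning penalty (points).
# Rounding to a whole number is an assumption about how the score is displayed.
composite_score() {
  awk -v hard="$1" -v soft="$2" -v penalty="${3:-0}" 'BEGIN {
    score = hard * 0.4 + soft * 0.6 - penalty
    printf "%.0f\n", score
  }'
}
```

For example, `composite_score 100 75 3` prints `82`.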
  ## Options
 
  ```
- -v, --version            Show version
- -j, --json               Output JSON to stdout
- -d, --depth <n>          Pages to crawl (default: 10)
- -s, --skip-ai            Skip AI analysis
- -t, --timeout <ms>       Page load timeout (default: 30000)
- --api-key <key>          AI API key
- -o, --output <dir>       Report output dir (default: tmp)
- --no-save                Skip auto-saving report
+ -v, --version            Show version
+ -j, --json               Output JSON to stdout
+ -n, --max-crawl <n>      Total page crawl limit, Phase 1 + 2 (default: 50)
+ -m, --page-limit <n>     Max structural pages to crawl, Phase 1 (default: 50)
+ -c, --content-limit <n>  Max content pages to crawl, Phase 2 (default: 20)
+ --sample-min <n>         Min content pages to sample (default: 20)
+ --sample-ratio <ratio>   Content page sampling ratio 0-1 (default: 0.2)
+ --ai                     Enable AI content quality analysis
+ -t, --timeout <ms>       Page load timeout (default: 30000)
+ --api-key <key>          AI API key
+ -o, --output <dir>       Report output dir (default: tmp)
+ --no-save                Skip auto-saving report
+ -l, --lang <lang>        Output language: en|zh (default: en)
+ --type <type>            Force site type: content|tool|game
+ --detect-only            Only detect site type/topic, skip full check
  ```
 
  ## AI Configuration
@@ -104,7 +121,7 @@ cp .env.example .env
  Or pass directly:
 
  ```bash
- adsense-check https://example.com --api-key sk-xxx...
+ adsense-check https://example.com --ai --api-key sk-xxx...
  ```
 
  ## Report Output
@@ -113,30 +130,33 @@ adsense-check https://example.com --api-key sk-xxx...
 
  ```
  AdSense Checklist Report
- Website: https://example.com
-
- Content Quality
- [PASS] Content ratio is normal on all pages
- [PASS] Homepage has sufficient main content (2,340 chars)
-
- ...
-
- Page Details (5 pages analyzed)
- /
- Content 92% (2,340/2,540 chars)
- ⚠ /blog/old-post
- Content 25% (80/320 chars)
- ! Content ratio only 25%, mostly template elements
- AI: Content is too thin and lacks substantive information
- Add at least 500 characters of original analysis
-
- Score: 18/21
- Status: NOT READY, 1 failed check needs fixing
+ URL: https://example.com
+ Time: 2026-05-08T15:00:00.000Z
+ Site type: Content site
+ Topic: Excel translation reference — Provides Excel terminology translations for multiple languages.
+ Pages: 165 total, 82 recent (6mo), 33 sampled (20%) medium confidence
+
+ Composite score: 82/100
+
+ ┌─ Hard Requirements ──────────────────────────────────── PASS
+ │ ✔ Site scale  Site scale is good (194 pages)
+ │ ✔ About  Found About page (/about/)
+ │ ...
+ └─ Rating: READY, all required checks passed
+
+ ┌─ Soft Scoring ──────────────────────────────────── 75/100
+ │ ████████████████████ 100% Content quality
+ │ ████████████████████ 100% User experience
+ │ ████████░░░░░░░░░░░░ 40% AI content analysis
+ │ ████████████████████ 100% Content relevance
+
+ │ Hard 40% × 0.4 + Soft 75% × 0.6 - Penalty 0 = 82
+ └─
  ```
 
  ### JSON Report
 
- Full structured data including per-page details, saved automatically to `tmp/`.
+ Full structured data including per-page details, AI assessments, topic info, and sampling stats. Saved automatically to `tmp/`.
 
  ## Exit Codes