@cloudcreate/adsense-check 1.0.0 → 1.1.0

package/README.md CHANGED
@@ -1,76 +1,170 @@
- # adsense-check
+ # @cloudcreate/adsense-check
 
- Check if a website meets Google AdSense review requirements.
+ Automated website checker for Google AdSense review requirements. Detects "low value content" — the #1 rejection reason. Supports content sites, tool sites, and game sites with AI-powered topic analysis and content relevance checking.
 
  ## Install
 
  ```bash
- npm install -g adsense-check
+ npm install -g @cloudcreate/adsense-check
  ```
 
- ## Usage
+ Or run directly without installing:
 
  ```bash
- # Basic check
+ npx @cloudcreate/adsense-check https://example.com
+ ```
+
+ ## Quick Start
+
+ ```bash
+ # Full check with AI analysis
+ adsense-check https://example.com --ai
+
+ # Quick check without AI
  adsense-check https://example.com
 
  # JSON output (for programmatic use)
  adsense-check https://example.com --json
 
- # Crawl more internal pages (default: 5)
- adsense-check https://example.com --depth 10
-
- # Skip AI content analysis
- adsense-check https://example.com --skip-ai
+ # Chinese output
+ adsense-check https://example.com -l zh --ai
 
- # Custom timeout (ms)
- adsense-check https://example.com --timeout 60000
+ # Only detect site type and topic
+ adsense-check https://example.com --detect-only --ai
  ```
 
- ## What it checks
+ Reports are auto-saved to `tmp/<domain>-<timestamp>.json`.
+
+ ## Features
+
+ ### Site Type Detection
+
+ Automatically classifies websites into three supported types:
+
+ | Type | Description | Examples |
+ |------|-------------|----------|
+ | **Content** | News, blogs, reference material | theexceltranslator.com |
+ | **Tool** | Online calculators, converters, generators | ishowspeedsaid.com |
+ | **Game** | Online games, game portals | popstone2.com |
+ | **Unsupported** | Other types (e-commerce, social, etc.) | — |
+
+ AI analysis classifies the site type and topic. Falls back to DOM signal detection when AI is unavailable.
+
+ ### AI Topic Analysis
+
+ With `--ai`, the tool analyzes the homepage to determine:
+ - **Topic**: What the site is about (e.g., "online match-3 puzzle games")
+ - **Description**: One-line summary of the site's purpose
+ - **Type**: content / tool / game / unsupported
+
+ ### Content Relevance Checking
+
+ Each page is evaluated for relevance to the site's topic:
+ - **relevant**: Directly related to the site's topic
+ - **tangential**: Loosely related
+ - **off-topic**: Unrelated to the site's purpose
+
+ Sites with >30% off-topic content are flagged as potentially failing review.
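
The 30% rule above can be illustrated with a small sketch. This is not the package's code: the function name `off_topic_flag` and the list-of-labels input are assumptions made for illustration, and only the three labels and the ">30%" cut-off come from the README.

```python
from collections import Counter

OFF_TOPIC_THRESHOLD = 0.30  # README: sites with >30% off-topic content are flagged


def off_topic_flag(labels: list[str]) -> tuple[float, bool]:
    """Return (off-topic share, flagged) for a list of per-page relevance labels.

    Each label is one of "relevant", "tangential", "off-topic".
    The input shape is hypothetical; the real report format may differ.
    """
    if not labels:
        return 0.0, False
    counts = Counter(labels)
    share = counts["off-topic"] / len(labels)
    # Strictly greater than the threshold, matching the ">30%" wording.
    return share, share > OFF_TOPIC_THRESHOLD
```

Under this reading, a site with exactly 3 off-topic pages out of 10 is not flagged, while 4 out of 10 is.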
 
- | Category | Checks |
- |----------|--------|
- | **Content Quality** | Page word count, content duplication |
- | **Required Pages** | About, Privacy Policy, Contact, Terms of Service |
- | **Site Structure** | H1 tags, robots.txt, sitemap.xml, internal links, dead links |
- | **Performance** | Load time, mobile viewport, responsive layout, font size, popups |
- | **Policy Compliance** | Blacklisted keywords (porn, gambling, piracy, etc.) |
- | **AI Analysis** | Content originality, quality, compliance (requires `AI_API_KEY`) |
+ ### Sampling Strategy
+
+ The tool discovers content pages from sitemaps (including recursive sitemap indexes) and homepage links, then samples based on:
+
+ - **6-month freshness**: Prioritizes recently updated content
+ - **Configurable minimum**: `--sample-min` (default: 20)
+ - **Configurable ratio**: `--sample-ratio` (default: 0.2, i.e., 20%)
+ - **Confidence level**: high (≥50%), medium (≥20%), low (<20%)
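
One plausible reading of how the minimum, the ratio, and the confidence bands interact is sketched below. The README does not spell out the rule, so two assumptions are made here: the sample size is the larger of `--sample-min` and ratio × page count (capped at the pages available), and the confidence percentage is the sampled share of all discovered pages. The function names are hypothetical.

```python
import math


def sample_size(candidates: int, sample_min: int = 20, sample_ratio: float = 0.2) -> int:
    """Assumed rule: the larger of the minimum and the ratio, capped at what exists."""
    by_ratio = math.ceil(candidates * sample_ratio)
    return min(candidates, max(sample_min, by_ratio))


def confidence(sampled: int, total: int) -> str:
    """Map the sampled share to the README's bands: high >=50%, medium >=20%, low <20%."""
    share = sampled / total if total else 0.0
    if share >= 0.5:
        return "high"
    if share >= 0.2:
        return "medium"
    return "low"
```

With the defaults, 165 discovered pages give a sample of 33 and a "medium" confidence, which is consistent with the figures in the README's sample terminal report.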
+
+ ### Two-Group Scoring
+
+ Checks are divided into **Hard Requirements** (pass/fail) and **Soft Scoring** (0-100):
+
+ ```
+ Composite = Hard Pass Rate × 0.4 + Soft Score × 0.6 - Warning Penalty
+ ```
+
+ - **Hard**: Site scale, required pages, structure, performance baseline, policy compliance
+ - **Soft**: Content quality, user experience, AI analysis, content relevance
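
As a worked instance of the composite formula, the sketch below assumes that both the hard pass rate and the soft score are expressed on a 0-100 scale and that the penalty is in score points. The function name and the clamp to the 0-100 range are additions of this sketch, not something the README states.

```python
def composite_score(hard_pass_rate: float, soft_score: float,
                    warning_penalty: float = 0.0) -> float:
    """Composite = Hard Pass Rate x 0.4 + Soft Score x 0.6 - Warning Penalty.

    Inputs are assumed to be on a 0-100 scale; the clamp is this sketch's choice.
    """
    raw = hard_pass_rate * 0.4 + soft_score * 0.6 - warning_penalty
    return max(0.0, min(100.0, raw))
```

For example, a hard pass rate of 80, a soft score of 70, and a penalty of 2 give 80 × 0.4 + 70 × 0.6 - 2 = 32 + 42 - 2 = 72.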
 
  ## Options
 
  ```
- -v, --version            Show version
- -j, --json               Output as JSON
- -d, --depth <n>          Number of internal pages to crawl (default: 5)
- -s, --skip-ai            Skip AI content analysis
- -t, --timeout <ms>       Page load timeout (default: 30000)
- --api-key <key>          Anthropic API key (or set ANTHROPIC_API_KEY env var)
+ -v, --version            Show version
+ -j, --json               Output JSON to stdout
+ -n, --max-crawl <n>      Total page crawl limit, Phase 1 + 2 (default: 50)
+ -m, --page-limit <n>     Max structural pages to crawl, Phase 1 (default: 50)
+ -c, --content-limit <n>  Max content pages to crawl, Phase 2 (default: 20)
+ --sample-min <n>         Min content pages to sample (default: 20)
+ --sample-ratio <ratio>   Content page sampling ratio 0-1 (default: 0.2)
+ --ai                     Enable AI content quality analysis
+ -t, --timeout <ms>       Page load timeout (default: 30000)
+ --api-key <key>          AI API key
+ -o, --output <dir>       Report output dir (default: tmp)
+ --no-save                Skip auto-saving report
+ -l, --lang <lang>        Output language: en|zh (default: en)
+ --type <type>            Force site type: content|tool|game
+ --detect-only            Only detect site type/topic, skip full check
  ```
 
- ## AI Analysis
+ ## AI Configuration
 
- For deeper content quality assessment, configure AI API in `.env`:
+ Supports any OpenAI-compatible API (DeepSeek, OpenAI, Moonshot, local LLM, etc.).
 
  ```bash
  cp .env.example .env
- # Edit .env and set AI_API_KEY, AI_API_BASE, AI_MODEL
+ # Edit .env:
+ # AI_API_KEY=sk-xxx
+ # AI_API_BASE=https://api.deepseek.com
+ # AI_MODEL=deepseek-chat
  ```
 
- Compatible with any OpenAI-format API: DeepSeek, OpenAI, Moonshot, local LLM, etc.
-
- Or pass the key directly:
+ Or pass directly:
 
  ```bash
- adsense-check https://example.com --api-key sk-xxx...
+ adsense-check https://example.com --ai --api-key sk-xxx...
+ ```
+
+ ## Report Output
+
+ ### Terminal Report
+
+ ```
+ AdSense Checklist Report
+ URL: https://example.com
+ Time: 2026-05-08T15:00:00.000Z
+ Site type: Content site
+ Topic: Excel translation reference — Provides Excel terminology translations for multiple languages.
+ Pages: 165 total, 82 recent (6mo), 33 sampled (20%) medium confidence
+
+ Composite score: 82/100
+
+ ┌─ Hard requirements ─────────────────────────── PASS
+ │ ✔ Site scale  Site scale is good (194 pages)
+ │ ✔ About  About page found (/about/)
+ │ ...
+ └─ Rating: READY - all required items met
+
+ ┌─ Soft scoring ──────────────────────────────── 75/100
+ │ ████████████████████ 100% Content quality
+ │ ████████████████████ 100% User experience
+ │ ████████░░░░░░░░░░░░ 40% AI content analysis
+ │ ████████████████████ 100% Content relevance
+
+ │ Hard 40% × 0.4 + Soft 75% × 0.6 - Penalty 0 = 82
+ └─
  ```
 
- ## Exit codes
+ ### JSON Report
+
+ Full structured data including per-page details, AI assessments, topic info, and sampling stats. Saved automatically to `tmp/`.
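
For programmatic use, a saved report can be picked up from the output directory. The sketch below assumes only what the README states, namely that reports are written as `tmp/<domain>-<timestamp>.json`. Because the timestamp format is not documented, the helper (a hypothetical name, `latest_report`) selects the newest file by modification time and makes no assumption about the JSON field names.

```python
import json
from pathlib import Path
from typing import Optional


def latest_report(domain: str, report_dir: str = "tmp") -> Optional[dict]:
    """Load the most recently modified `<domain>-*.json` report, or None if absent."""
    candidates = sorted(
        Path(report_dir).glob(f"{domain}-*.json"),
        key=lambda p: p.stat().st_mtime,  # newest file sorts last
    )
    if not candidates:
        return None
    with candidates[-1].open(encoding="utf-8") as fh:
        return json.load(fh)
```

If a different directory was given with `-o, --output <dir>`, it would be passed as `report_dir`.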
+
+ ## Exit Codes
 
- `0` No failures (ready or mostly ready)
- - `1` — Has failures (not ready)
- `2` Error (invalid URL, network failure, etc.)
+ | Code | Meaning |
+ |------|---------|
+ | 0 | No failures (READY or MOSTLY READY) |
+ | 1 | Has failures (NOT READY) |
+ | 2 | Runtime error |
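
The exit codes make the tool usable as a gate in scripts or CI. The following is a minimal sketch of a wrapper, assuming `adsense-check` is installed and on `PATH`; the helper names `describe_exit` and `run_check` are hypothetical, while the code meanings and the `--no-save` flag are taken from the README.

```python
import subprocess

# Meanings copied from the exit code table in the 1.1.0 README.
EXIT_MEANINGS = {
    0: "No failures (READY or MOSTLY READY)",
    1: "Has failures (NOT READY)",
    2: "Runtime error",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into the README's wording."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}")


def run_check(url: str) -> int:
    """Run the CLI without saving a report and return its exit code."""
    completed = subprocess.run(["adsense-check", url, "--no-save"], check=False)
    return completed.returncode
```

A caller might print `describe_exit(run_check("https://example.com"))` and fail the build on any non-zero code.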
 
  ## License