@tydung26/product-kit 1.3.2 → 1.5.0
- package/README.md +4 -8
- package/dist/scripts/market-intel/search-app-store.d.ts +7 -0
- package/dist/scripts/market-intel/search-app-store.d.ts.map +1 -0
- package/dist/scripts/market-intel/search-app-store.js +91 -0
- package/dist/scripts/market-intel/search-app-store.js.map +1 -0
- package/dist/scripts/market-intel/search-google-play.d.ts +7 -0
- package/dist/scripts/market-intel/search-google-play.d.ts.map +1 -0
- package/dist/scripts/market-intel/search-google-play.js +195 -0
- package/dist/scripts/market-intel/search-google-play.js.map +1 -0
- package/dist/scripts/market-intel/search-product-hunt.d.ts +7 -0
- package/dist/scripts/market-intel/search-product-hunt.d.ts.map +1 -0
- package/dist/scripts/market-intel/search-product-hunt.js +236 -0
- package/dist/scripts/market-intel/search-product-hunt.js.map +1 -0
- package/dist/scripts/market-intel/search-yc-launch.d.ts +7 -0
- package/dist/scripts/market-intel/search-yc-launch.d.ts.map +1 -0
- package/dist/scripts/market-intel/search-yc-launch.js +229 -0
- package/dist/scripts/market-intel/search-yc-launch.js.map +1 -0
- package/dist/scripts/market-intel/shared-types.d.ts +44 -0
- package/dist/scripts/market-intel/shared-types.d.ts.map +1 -0
- package/dist/scripts/market-intel/shared-types.js +63 -0
- package/dist/scripts/market-intel/shared-types.js.map +1 -0
- package/package.json +8 -9
- package/skills/market-intel/SKILL.md +184 -61
- package/skills/market-intel/scripts/search-app-store.py +117 -0
- package/skills/market-intel/scripts/search-google-play.py +179 -0
- package/skills/market-intel/scripts/search-product-hunt.py +194 -0
- package/skills/market-intel/scripts/search-yc-launch.py +160 -0
- package/skills/naming/SKILL.md +66 -0
- package/dist/commands/config/index.d.ts +0 -3
- package/dist/commands/config/index.d.ts.map +0 -1
- package/dist/commands/config/index.js +0 -34
- package/dist/commands/config/index.js.map +0 -1
|
@@ -11,14 +11,14 @@ license: MIT
|
|
|
11
11
|
|
|
12
12
|
# Market Intel - Competitive Landscape Analysis
|
|
13
13
|
|
|
14
|
-
|
|
14
|
+
Automated competitor discovery across 4 platforms → structured analysis → dashboard-style report.
|
|
15
15
|
|
|
16
|
-
**Principles:**
|
|
16
|
+
**Principles:** Evidence over opinion | Honest about strengths and gaps | Actionable recs over generic summaries | Graceful degradation when data unavailable
|
|
17
17
|
|
|
18
18
|
## Usage
|
|
19
19
|
|
|
20
20
|
```
|
|
21
|
-
/pkit:market-intel <product
|
|
21
|
+
/pkit:market-intel <product idea or market description>
|
|
22
22
|
```
|
|
23
23
|
|
|
24
24
|
**Do NOT use for:** Internal A/B decisions, roadmap planning (`/pkit:roadmap`), or feature prioritization.
|
|
@@ -26,16 +26,36 @@ Scope → competitor profiles → feature matrix → whitespace mapping → stra
|
|
|
26
26
|
## Workflow Overview
|
|
27
27
|
|
|
28
28
|
```
|
|
29
|
-
[Scope
|
|
29
|
+
[Scope] → [Search Platforms] → [Fetch & Extract] → [Analyze] → [Compare] → [Report]
|
|
30
30
|
```
|
|
31
31
|
|
|
32
|
-
| Step
|
|
33
|
-
|
|
|
34
|
-
| 1. Scope
|
|
35
|
-
| 2.
|
|
36
|
-
| 3.
|
|
37
|
-
| 4.
|
|
38
|
-
| 5.
|
|
32
|
+
| Step | Action | Skip if |
|
|
33
|
+
| ---------- | ----------------------------------------------- | ---------------- |
|
|
34
|
+
| 1. Scope | Define product idea, competitors, focus area | Context provided |
|
|
35
|
+
| 2. Search | Run crawler scripts on all 4 platforms | — |
|
|
36
|
+
| 3. Parse | Parse JSON output, merge and deduplicate | — |
|
|
37
|
+
| 4. Analyze | Per-competitor analysis across all dimensions | — |
|
|
38
|
+
| 5. Compare | Cross-competitor feature matrix + pricing | — |
|
|
39
|
+
| 6. Report | Generate dashboard-style MD report | — |
|
|
40
|
+
|
|
41
|
+
## Helper Scripts
|
|
42
|
+
|
|
43
|
+
Platform-specific crawler scripts are bundled in `scripts/` (relative to this SKILL.md).
|
|
44
|
+
Each script fetches and parses data from one platform, outputting structured JSON to stdout.
|
|
45
|
+
|
|
46
|
+
**Usage:** Run via Bash tool:
|
|
47
|
+
|
|
48
|
+
```bash
|
|
49
|
+
python3 {skill_dir}/scripts/search-app-store.py "<keywords>" [limit]
|
|
50
|
+
python3 {skill_dir}/scripts/search-google-play.py "<keywords>" [limit]
|
|
51
|
+
python3 {skill_dir}/scripts/search-product-hunt.py "<keywords>" [limit]
|
|
52
|
+
python3 {skill_dir}/scripts/search-yc-launch.py "<keywords>" [limit]
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
- `{skill_dir}` = directory containing this SKILL.md
|
|
56
|
+
- `limit` = max results per platform (default 5, max 10)
|
|
57
|
+
- Output: JSON with `results[]` array (name, url, description, rating, pricing, reviews) and `errors[]` for non-fatal issues
|
|
58
|
+
- **Run all 4 in parallel** for speed
|
|
39
59
|
|
|
40
60
|
## Step Details
|
|
41
61
|
|
|
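Note on the hunk above: the Helper Scripts section asks for all four crawlers to be run in parallel, with `limit` defaulting to 5 and capped at 10. A minimal sketch of how a caller might do that from Python follows, assuming the scripts sit in a `scripts/` directory next to SKILL.md. The helper names `clamp_limit`, `build_commands`, `run_one`, and `run_all` are hypothetical and not part of the package.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Script names as listed in the diff; everything else here is illustrative.
SCRIPTS = [
    "search-app-store.py",
    "search-google-play.py",
    "search-product-hunt.py",
    "search-yc-launch.py",
]


def clamp_limit(limit, default=5, lo=1, hi=10):
    """Mirror the documented rule: default 5, max 10, fall back on bad input."""
    try:
        return max(lo, min(int(limit), hi))
    except (TypeError, ValueError):
        return default


def build_commands(skill_dir, keywords, limit=5):
    """Return one argv list per crawler script."""
    scripts_dir = Path(skill_dir) / "scripts"
    n = str(clamp_limit(limit))
    return [["python3", str(scripts_dir / name), keywords, n] for name in SCRIPTS]


def run_one(argv, timeout=60):
    """Run one crawler and parse its stdout as JSON; report failures as errors."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        return json.loads(proc.stdout)
    except (OSError, subprocess.SubprocessError, json.JSONDecodeError) as exc:
        return {"results": [], "errors": [f"{argv[1]}: {exc}"]}


def run_all(skill_dir, keywords, limit=5):
    """Run all four crawlers concurrently and return their parsed outputs."""
    commands = build_commands(skill_dir, keywords, limit)
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(run_one, commands))
```

Threads are enough here because each worker only waits on a child process. A failed or missing script becomes an entry in `errors` rather than an exception, which fits the skill's "graceful degradation" principle.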
@@ -43,79 +63,182 @@ Scope → competitor profiles → feature matrix → whitespace mapping → stra
|
|
|
43
63
|
|
|
44
64
|
Ask if not provided:
|
|
45
65
|
|
|
46
|
-
- What product/feature are we analyzing?
|
|
47
|
-
- Who are
|
|
66
|
+
- What product/feature/idea are we analyzing?
|
|
67
|
+
- Who are known competitors? (or: "find them for me")
|
|
48
68
|
- Focus area: pricing / features / UX / positioning / all?
|
|
49
|
-
-
|
|
69
|
+
- How many competitors to analyze? (default 5–8, max 10)
|
|
70
|
+
|
|
71
|
+
### Step 2 — Search Platforms
|
|
72
|
+
|
|
73
|
+
**Primary method:** Run all 4 crawler scripts in parallel via Bash tool:
|
|
74
|
+
|
|
75
|
+
```bash
|
|
76
|
+
python3 {skill_dir}/scripts/search-app-store.py "{keywords}" {limit}
|
|
77
|
+
python3 {skill_dir}/scripts/search-google-play.py "{keywords}" {limit}
|
|
78
|
+
python3 {skill_dir}/scripts/search-product-hunt.py "{keywords}" {limit}
|
|
79
|
+
python3 {skill_dir}/scripts/search-yc-launch.py "{keywords}" {limit}
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
Also run a general WebSearch: `{keywords} app alternatives competitors` to catch competitors not on these platforms.
|
|
83
|
+
|
|
84
|
+
**Fallback (if scripts unavailable):** Use WebSearch with `site:` queries:
|
|
85
|
+
|
|
86
|
+
| Platform | Search Query Pattern |
|
|
87
|
+
| ------------ | ----------------------------------------------------- |
|
|
88
|
+
| App Store | `site:apps.apple.com {keywords}` |
|
|
89
|
+
| Google Play | `site:play.google.com/store/apps {keywords}` |
|
|
90
|
+
| Product Hunt | `site:producthunt.com/posts {keywords}` |
|
|
91
|
+
| YC Launch | `site:ycombinator.com/launches {keywords}` |
|
|
92
|
+
|
|
93
|
+
### Step 3 — Parse & Merge Results
|
|
94
|
+
|
|
95
|
+
Parse JSON output from each script. Each result contains:
|
|
96
|
+
|
|
97
|
+
- `name`, `url`, `description`, `tagline`
|
|
98
|
+
- `rating`, `reviewCount`
|
|
99
|
+
- `pricing` (free, monthly, yearly, other)
|
|
100
|
+
- `reviews[]` (text, rating, sentiment)
|
|
101
|
+
- `errors[]` (non-fatal issues encountered)
|
|
102
|
+
|
|
103
|
+
**Merge:** Deduplicate by product name across platforms. Prioritize entries with richer data (more reviews, pricing info). Check the `errors` array — note any platform issues in the final report.
|
|
104
|
+
|
|
105
|
+
**If using WebSearch fallback:** WebFetch each result URL and manually extract the same fields.
|
|
106
|
+
|
|
107
|
+
### Step 4 — Analyze Per Competitor
|
|
50
108
|
|
|
51
|
-
|
|
109
|
+
For each competitor, determine:
|
|
52
110
|
|
|
53
|
-
|
|
111
|
+
| Dimension | What to assess |
|
|
112
|
+
| ------------------ | ------------------------------------------------------- |
|
|
113
|
+
| Problem Solved | Core pain point addressed |
|
|
114
|
+
| Target Audience | Primary user segment (role, context, company size) |
|
|
115
|
+
| Value Proposition | Why users choose this over alternatives |
|
|
116
|
+
| Killer Features | Top 3 differentiating capabilities |
|
|
117
|
+
| Strengths | Top 3 things done well (from reviews + product page) |
|
|
118
|
+
| Weaknesses | Top 3 gaps or pain points (from negative reviews/discussions) |
|
|
119
|
+
| Review Sentiment | Positive / Mixed / Negative + key themes |
|
|
120
|
+
| Pricing Model | Free tier, monthly, yearly, lifetime, other tiers |
|
|
54
121
|
|
|
55
|
-
**
|
|
122
|
+
Base analysis on **actual data** from fetched pages, reviews, and discussions — not assumptions.
|
|
56
123
|
|
|
57
|
-
|
|
58
|
-
- **Target segment:** Who they primarily serve
|
|
59
|
-
- **Pricing model:** Free/freemium/paid tiers (include price points if known)
|
|
60
|
-
- **Key strengths:** Top 3 things they do well
|
|
61
|
-
- **Key weaknesses:** Top 3 pain points or gaps
|
|
62
|
-
- **Notable features:** Differentiated capabilities worth noting
|
|
124
|
+
### Step 5 — Cross-Compare
|
|
63
125
|
|
|
64
|
-
|
|
126
|
+
Build two comparison matrices:
|
|
65
127
|
|
|
66
|
-
|
|
128
|
+
**Feature Matrix** — identify 8–15 key features across all competitors:
|
|
67
129
|
|
|
68
|
-
| Feature |
|
|
69
|
-
| --------- | --------- |
|
|
70
|
-
| [Feature] | ✓ / ✗ / ~ | ✓ / ✗ / ~
|
|
130
|
+
| Feature | App A | App B | App C |
|
|
131
|
+
| --------- | --------- | --------- | --------- |
|
|
132
|
+
| [Feature] | ✓ / ✗ / ~ | ✓ / ✗ / ~ | ✓ / ✗ / ~ |
|
|
71
133
|
|
|
72
134
|
Legend: ✓ = strong, ~ = partial/limited, ✗ = missing
|
|
73
135
|
|
|
74
|
-
|
|
136
|
+
**Pricing Landscape** — normalize pricing across competitors.
|
|
75
137
|
|
|
76
|
-
|
|
138
|
+
Then identify:
|
|
77
139
|
|
|
78
|
-
- **Gaps no one fills well**
|
|
79
|
-
- **Table stakes**
|
|
80
|
-
- **
|
|
81
|
-
- **Threats to watch** (competitors gaining momentum)
|
|
140
|
+
- **Gaps no one fills well** — your opportunity
|
|
141
|
+
- **Table stakes** — must-have to compete
|
|
142
|
+
- **Emerging threats** — competitors gaining momentum
|
|
82
143
|
|
|
83
|
-
### Step
|
|
144
|
+
### Step 6 — Generate Report
|
|
84
145
|
|
|
85
|
-
|
|
146
|
+
Output a single markdown report using the template below. Save to the plan reports directory if available.
|
|
86
147
|
|
|
87
|
-
|
|
88
|
-
- What to build to close gaps
|
|
89
|
-
- What to monitor but not react to yet
|
|
148
|
+
**Table width rule:** If >7 competitors, split the Competitor Dashboard into 2 tables (e.g., 1–5 and 6–10).
|
|
90
149
|
|
|
91
|
-
##
|
|
150
|
+
## Report Template
|
|
92
151
|
|
|
93
|
-
```
|
|
94
|
-
|
|
152
|
+
```markdown
|
|
153
|
+
# Market Intel Report: [Product Idea]
|
|
154
|
+
|
|
155
|
+
> **Platforms searched:** App Store, Google Play, Product Hunt, YC Launch
|
|
156
|
+
> **Competitors analyzed:** {count} | **Generated:** {date}
|
|
157
|
+
|
|
158
|
+
## Competitor Dashboard
|
|
159
|
+
|
|
160
|
+
| Aspect | App A | App B | App C |
|
|
161
|
+
| ----------------- | -------------- | -------------- | -------------- |
|
|
162
|
+
| Platform(s) | PH, App Store | Google Play | YC Launch |
|
|
163
|
+
| Problem Solved | ... | ... | ... |
|
|
164
|
+
| Target Audience | ... | ... | ... |
|
|
165
|
+
| Value Proposition | ... | ... | ... |
|
|
166
|
+
| Killer Features | ... | ... | ... |
|
|
167
|
+
| Strengths | ... | ... | ... |
|
|
168
|
+
| Weaknesses | ... | ... | ... |
|
|
169
|
+
| Rating | 4.5★ (2.3k) | 4.1★ (800) | N/A |
|
|
170
|
+
| Review Sentiment | Positive | Mixed | Positive |
|
|
171
|
+
|
|
172
|
+
## Pricing Landscape
|
|
173
|
+
|
|
174
|
+
| App | Free Tier | Monthly | Yearly | Other | Notes |
|
|
175
|
+
| ----- | --------- | ------- | ------ | ------------ | ------------- |
|
|
176
|
+
| App A | ✓ | $9 | $79 | — | 14-day trial |
|
|
177
|
+
| App B | Freemium | $12 | — | Lifetime $199| — |
|
|
178
|
+
|
|
179
|
+
Use "Unknown" for pricing not found — never leave blank.
|
|
180
|
+
|
|
181
|
+
## Feature Matrix
|
|
182
|
+
|
|
183
|
+
| Feature | A | B | C | D | E |
|
|
184
|
+
| ------------ | - | - | - | - | - |
|
|
185
|
+
| [Feature 1] | ✓ | ✗ | ~ | ✓ | ✓ |
|
|
186
|
+
| [Feature 2] | ~ | ✓ | ✓ | ✗ | ~ |
|
|
187
|
+
|
|
188
|
+
Legend: ✓ = strong, ~ = partial, ✗ = missing
|
|
95
189
|
|
|
96
|
-
|
|
97
|
-
- Positioning: ...
|
|
98
|
-
- Target: ...
|
|
99
|
-
- Pricing: ...
|
|
100
|
-
- Strengths: ...
|
|
101
|
-
- Weaknesses: ...
|
|
190
|
+
## Strategic Insights
|
|
102
191
|
|
|
103
|
-
|
|
192
|
+
### What to Take
|
|
104
193
|
|
|
105
|
-
|
|
194
|
+
| Insight | Evidence | Source |
|
|
195
|
+
| ------- | -------- | ------ |
|
|
196
|
+
| ... | ... | App A reviews |
|
|
106
197
|
|
|
107
|
-
|
|
108
|
-
|---------|----|-------|-------|
|
|
198
|
+
### What to Avoid
|
|
109
199
|
|
|
110
|
-
|
|
200
|
+
| Anti-pattern | Why | Evidence |
|
|
201
|
+
| ------------ | --- | -------- |
|
|
202
|
+
| ... | ... | App C 1-star reviews |
|
|
111
203
|
|
|
112
|
-
|
|
113
|
-
**Table stakes:** ...
|
|
114
|
-
**Our differentiation:** ...
|
|
115
|
-
**Threats:** ...
|
|
204
|
+
### What to Do Uniquely
|
|
116
205
|
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
206
|
+
| Opportunity | Gap in Market | Our Angle |
|
|
207
|
+
| ----------- | ------------- | --------- |
|
|
208
|
+
| ... | No one does X well | ... |
|
|
209
|
+
|
|
210
|
+
## Market Gaps & Threats
|
|
211
|
+
|
|
212
|
+
| Type | Finding |
|
|
213
|
+
| --------------- | ------- |
|
|
214
|
+
| Unfilled gap | ... |
|
|
215
|
+
| Table stakes | ... |
|
|
216
|
+
| Emerging threat | ... |
|
|
217
|
+
|
|
218
|
+
## Data Sources
|
|
219
|
+
|
|
220
|
+
| Competitor | Platforms Found On | URLs |
|
|
221
|
+
| ---------- | ---------------------- | ------------- |
|
|
222
|
+
| App A | Product Hunt, App Store| [links] |
|
|
223
|
+
| App B | Google Play | [links] |
|
|
121
224
|
```
|
|
225
|
+
|
|
226
|
+
## Graceful Degradation
|
|
227
|
+
|
|
228
|
+
| Scenario | Action |
|
|
229
|
+
| -------------------------------- | --------------------------------------------------------- |
|
|
230
|
+
| Platform fetch blocked | Use search snippet data, note in Data Sources |
|
|
231
|
+
| No results on a platform | Note "No results on {platform}", continue with rest |
|
|
232
|
+
| >7 competitors analyzed | Split Competitor Dashboard into 2 tables |
|
|
233
|
+
| Pricing not found | Mark as "Unknown", never leave blank |
|
|
234
|
+
| Reviews unavailable | Note "No reviews available", assess from description |
|
|
235
|
+
| Entire search yields <3 hits | Ask user for more specific keywords or known competitors |
|
|
236
|
+
| WebSearch/WebFetch unavailable | Ask user for competitor names and URLs, analyze from provided context |
|
|
237
|
+
| Scripts not found / Node error | Fall back to WebSearch + WebFetch method (see Step 2 fallback) |
|
|
238
|
+
| Script returns empty results | Check `errors[]` in JSON, fall back to WebSearch for that platform |
|
|
239
|
+
|
|
240
|
+
## Follow-up
|
|
241
|
+
|
|
242
|
+
Always end with:
|
|
243
|
+
|
|
244
|
+
> "Report complete. Want me to go deeper on any competitor, or feed these insights into `/pkit:product-design` for a PRD?"
|
|
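Note on the hunk above: Step 3 says to deduplicate by product name across platforms and to prefer entries with richer data. The sketch below shows one way that could work. It assumes "richer" means more reviews first, then pricing detail, then a rating being present, and that names match after lowercasing and stripping punctuation. Both rules are assumptions, and `normalize_name`, `richness`, and `merge_results` are hypothetical names.

```python
import re


def normalize_name(name):
    """Lowercase and drop non-alphanumerics so 'Notion!' matches 'notion'."""
    return re.sub(r"[^a-z0-9]+", "", name.lower())


def richness(entry):
    """Score an entry: review count first, then pricing detail, then rating."""
    pricing = entry.get("pricing") or {}
    has_price_detail = any(pricing.get(k) for k in ("monthly", "yearly", "other"))
    return (
        len(entry.get("reviews") or []),
        1 if has_price_detail else 0,
        1 if entry.get("rating") is not None else 0,
    )


def merge_results(crawl_results):
    """Merge per-platform outputs into one list, one entry per product name."""
    best = {}       # normalized name -> richest entry seen so far
    platforms = {}  # normalized name -> platforms the product appeared on
    errors = []
    for crawl in crawl_results:
        platform = crawl.get("platform", "unknown")
        errors.extend(f"{platform}: {e}" for e in crawl.get("errors", []))
        for entry in crawl.get("results", []):
            key = normalize_name(entry.get("name", ""))
            if not key:
                continue
            platforms.setdefault(key, []).append(platform)
            if key not in best or richness(entry) > richness(best[key]):
                best[key] = entry
    merged = [dict(best[k], platforms=platforms[k]) for k in best]
    return merged, errors
```

Collecting the per-platform `errors` alongside the merged list keeps them available for the Data Sources section of the report, as Step 3 requests.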
@@ -0,0 +1,117 @@
|
|
|
1
|
+
#!/usr/bin/env python3
|
|
2
|
+
"""App Store crawler via iTunes Search API. Zero external dependencies.
|
|
3
|
+
Usage: python3 search-app-store.py "<keywords>" [limit]
|
|
4
|
+
Output: JSON CrawlResult to stdout
|
|
5
|
+
"""
|
|
6
|
+
|
|
7
|
+
import json
|
|
8
|
+
import sys
|
|
9
|
+
import urllib.request
|
|
10
|
+
import urllib.parse
|
|
11
|
+
from datetime import datetime, timezone
|
|
12
|
+
|
|
13
|
+
|
|
14
|
+
def safe_fetch(url, timeout=10):
|
|
15
|
+
"""Fetch URL with timeout and user-agent header."""
|
|
16
|
+
req = urllib.request.Request(url, headers={
|
|
17
|
+
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
|
|
18
|
+
"AppleWebKit/537.36 (KHTML, like Gecko) "
|
|
19
|
+
"Chrome/120.0.0.0 Safari/537.36",
|
|
20
|
+
})
|
|
21
|
+
return urllib.request.urlopen(req, timeout=timeout)
|
|
22
|
+
|
|
23
|
+
|
|
24
|
+
def truncate(text, max_len=500):
|
|
25
|
+
return text[:max_len] + "..." if len(text) > max_len else text
|
|
26
|
+
|
|
27
|
+
|
|
28
|
+
def fetch_reviews(track_id):
|
|
29
|
+
"""Fetch recent reviews via iTunes RSS feed."""
|
|
30
|
+
reviews = []
|
|
31
|
+
try:
|
|
32
|
+
url = (f"https://itunes.apple.com/rss/customerreviews/"
|
|
33
|
+
f"id={track_id}/sortBy=mostRecent/json")
|
|
34
|
+
resp = safe_fetch(url)
|
|
35
|
+
data = json.loads(resp.read())
|
|
36
|
+
entries = data.get("feed", {}).get("entry", [])
|
|
37
|
+
for entry in entries[:6]:
|
|
38
|
+
content = entry.get("content", {}).get("label", "")
|
|
39
|
+
rating_str = entry.get("im:rating", {}).get("label", "")
|
|
40
|
+
if not content or not rating_str:
|
|
41
|
+
continue
|
|
42
|
+
rating = int(rating_str)
|
|
43
|
+
sentiment = "positive" if rating >= 4 else "negative" if rating <= 2 else "neutral"
|
|
44
|
+
reviews.append({
|
|
45
|
+
"text": truncate(content, 300),
|
|
46
|
+
"rating": rating,
|
|
47
|
+
"sentiment": sentiment,
|
|
48
|
+
})
|
|
49
|
+
except Exception:
|
|
50
|
+
pass
|
|
51
|
+
return reviews
|
|
52
|
+
|
|
53
|
+
|
|
54
|
+
def main():
|
|
55
|
+
if len(sys.argv) < 2:
|
|
56
|
+
print(json.dumps({"error": "Usage: python3 search-app-store.py <keywords> [limit]"}))
|
|
57
|
+
sys.exit(1)
|
|
58
|
+
|
|
59
|
+
query = sys.argv[1]
|
|
60
|
+
limit = 5
|
|
61
|
+
if len(sys.argv) >= 3:
|
|
62
|
+
try:
|
|
63
|
+
limit = max(1, min(int(sys.argv[2]), 10))
|
|
64
|
+
except ValueError:
|
|
65
|
+
limit = 5
|
|
66
|
+
|
|
67
|
+
errors = []
|
|
68
|
+
search_url = (
|
|
69
|
+
f"https://itunes.apple.com/search?"
|
|
70
|
+
f"term={urllib.parse.quote(query)}&entity=software&limit={limit}&country=us"
|
|
71
|
+
)
|
|
72
|
+
|
|
73
|
+
try:
|
|
74
|
+
resp = safe_fetch(search_url)
|
|
75
|
+
search_data = json.loads(resp.read())
|
|
76
|
+
except Exception as e:
|
|
77
|
+
print(json.dumps({
|
|
78
|
+
"platform": "app_store", "query": query,
|
|
79
|
+
"timestamp": datetime.now(timezone.utc).isoformat(),
|
|
80
|
+
"results": [], "errors": [f"iTunes API error: {e}"],
|
|
81
|
+
}, indent=2))
|
|
82
|
+
return
|
|
83
|
+
|
|
84
|
+
results = []
|
|
85
|
+
for app in search_data.get("results", []):
|
|
86
|
+
track_id = app.get("trackId", 0)
|
|
87
|
+
reviews = fetch_reviews(track_id)
|
|
88
|
+
price = app.get("price", 0)
|
|
89
|
+
|
|
90
|
+
results.append({
|
|
91
|
+
"name": app.get("trackName", ""),
|
|
92
|
+
"url": app.get("trackViewUrl", ""),
|
|
93
|
+
"description": truncate(app.get("description", "")),
|
|
94
|
+
"tagline": None,
|
|
95
|
+
"rating": round(app.get("averageUserRating", 0), 1) or None,
|
|
96
|
+
"reviewCount": app.get("userRatingCount"),
|
|
97
|
+
"pricing": {
|
|
98
|
+
"free": price == 0,
|
|
99
|
+
"monthly": None,
|
|
100
|
+
"yearly": None,
|
|
101
|
+
"other": app.get("formattedPrice") if price > 0 else None,
|
|
102
|
+
},
|
|
103
|
+
"features": [],
|
|
104
|
+
"reviews": reviews,
|
|
105
|
+
})
|
|
106
|
+
|
|
107
|
+
print(json.dumps({
|
|
108
|
+
"platform": "app_store",
|
|
109
|
+
"query": query,
|
|
110
|
+
"timestamp": datetime.now(timezone.utc).isoformat(),
|
|
111
|
+
"results": results,
|
|
112
|
+
"errors": errors,
|
|
113
|
+
}, indent=2))
|
|
114
|
+
|
|
115
|
+
|
|
116
|
+
if __name__ == "__main__":
|
|
117
|
+
main()
|
|
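Note on the file above: `fetch_reviews` wraps its whole body in `except Exception: pass`, so a malformed feed yields an empty review list and nothing is recorded in `errors[]`. The sketch below separates the parsing from the network call so it can be exercised offline and can report problems. It keeps the script's thresholds (rating 4 or higher is positive, 2 or lower is negative). The handling of a single-object `entry` is an assumption about the feed that has not been verified here, and `sentiment_for` and `parse_review_entries` are hypothetical names.

```python
def sentiment_for(rating):
    """Same thresholds as the script: 4-5 positive, 1-2 negative, 3 neutral."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


def parse_review_entries(feed_json, max_reviews=6, max_len=300):
    """Turn a decoded customer-reviews feed into (reviews, errors)."""
    reviews, errors = [], []
    entries = feed_json.get("feed", {}).get("entry", [])
    if isinstance(entries, dict):
        # Assumption: a feed with a single entry may hold an object, not a list.
        entries = [entries]
    for entry in entries[:max_reviews]:
        content = entry.get("content", {}).get("label", "")
        rating_str = entry.get("im:rating", {}).get("label", "")
        if not content or not rating_str:
            continue
        try:
            rating = int(rating_str)
        except ValueError:
            errors.append(f"Unparseable rating: {rating_str!r}")
            continue
        text = content[:max_len] + "..." if len(content) > max_len else content
        reviews.append({"text": text, "rating": rating, "sentiment": sentiment_for(rating)})
    return reviews, errors
```

The returned `errors` list could be appended to the script's top-level `errors` array, which SKILL.md tells the caller to check.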
@@ -0,0 +1,179 @@
|
|
|
1
|
+
#!/usr/bin/env python3
|
|
2
|
+
"""Google Play Store crawler via HTML scraping. Zero external dependencies.
|
|
3
|
+
Usage: python3 search-google-play.py "<keywords>" [limit]
|
|
4
|
+
Output: JSON CrawlResult to stdout
|
|
5
|
+
"""
|
|
6
|
+
|
|
7
|
+
import json
|
|
8
|
+
import re
|
|
9
|
+
import sys
|
|
10
|
+
import urllib.request
|
|
11
|
+
import urllib.parse
|
|
12
|
+
from datetime import datetime, timezone
|
|
13
|
+
from html.parser import HTMLParser
|
|
14
|
+
|
|
15
|
+
|
|
16
|
+
def safe_fetch(url, timeout=10):
|
|
17
|
+
"""Fetch URL with timeout and user-agent header."""
|
|
18
|
+
req = urllib.request.Request(url, headers={
|
|
19
|
+
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
|
|
20
|
+
"AppleWebKit/537.36 (KHTML, like Gecko) "
|
|
21
|
+
"Chrome/120.0.0.0 Safari/537.36",
|
|
22
|
+
"Accept-Language": "en-US,en;q=0.9",
|
|
23
|
+
})
|
|
24
|
+
return urllib.request.urlopen(req, timeout=timeout).read().decode("utf-8", errors="replace")
|
|
25
|
+
|
|
26
|
+
|
|
27
|
+
def truncate(text, max_len=500):
|
|
28
|
+
return text[:max_len] + "..." if len(text) > max_len else text
|
|
29
|
+
|
|
30
|
+
|
|
31
|
+
def extract_app_ids_from_search(html):
|
|
32
|
+
"""Extract app IDs from Google Play search page HTML."""
|
|
33
|
+
# Google Play links: /store/apps/details?id=com.example.app
|
|
34
|
+
ids = re.findall(r'/store/apps/details\?id=([a-zA-Z0-9_.]+)', html)
|
|
35
|
+
# Deduplicate while preserving order
|
|
36
|
+
seen = set()
|
|
37
|
+
unique = []
|
|
38
|
+
for app_id in ids:
|
|
39
|
+
if app_id not in seen:
|
|
40
|
+
seen.add(app_id)
|
|
41
|
+
unique.append(app_id)
|
|
42
|
+
return unique
|
|
43
|
+
|
|
44
|
+
|
|
45
|
+
def extract_json_ld(html):
|
|
46
|
+
"""Extract JSON-LD structured data from HTML."""
|
|
47
|
+
matches = re.findall(
|
|
48
|
+
r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
|
|
49
|
+
html, re.DOTALL
|
|
50
|
+
)
|
|
51
|
+
for match in matches:
|
|
52
|
+
try:
|
|
53
|
+
data = json.loads(match)
|
|
54
|
+
if isinstance(data, dict) and data.get("@type") == "SoftwareApplication":
|
|
55
|
+
return data
|
|
56
|
+
except (json.JSONDecodeError, ValueError):
|
|
57
|
+
continue
|
|
58
|
+
return None
|
|
59
|
+
|
|
60
|
+
|
|
61
|
+
def extract_app_details(html, app_id):
|
|
62
|
+
"""Extract app details from a Google Play detail page."""
|
|
63
|
+
url = f"https://play.google.com/store/apps/details?id={app_id}"
|
|
64
|
+
|
|
65
|
+
# Try JSON-LD first (most reliable)
|
|
66
|
+
json_ld = extract_json_ld(html)
|
|
67
|
+
if json_ld:
|
|
68
|
+
name = json_ld.get("name", "")
|
|
69
|
+
description = json_ld.get("description", "")
|
|
70
|
+
rating = None
|
|
71
|
+
review_count = None
|
|
72
|
+
agg = json_ld.get("aggregateRating", {})
|
|
73
|
+
if agg:
|
|
74
|
+
try:
|
|
75
|
+
rating = round(float(agg.get("ratingValue", 0)), 1)
|
|
76
|
+
review_count = int(agg.get("ratingCount", 0))
|
|
77
|
+
except (ValueError, TypeError):
|
|
78
|
+
pass
|
|
79
|
+
|
|
80
|
+
price_text = json_ld.get("offers", {}).get("price", "0")
|
|
81
|
+
is_free = str(price_text) in ("0", "0.00", "")
|
|
82
|
+
|
|
83
|
+
return {
|
|
84
|
+
"name": name,
|
|
85
|
+
"url": url,
|
|
86
|
+
"description": truncate(description),
|
|
87
|
+
"rating": rating if rating else None,
|
|
88
|
+
"reviewCount": review_count if review_count else None,
|
|
89
|
+
"pricing": {
|
|
90
|
+
"free": is_free,
|
|
91
|
+
"other": None if is_free else str(price_text),
|
|
92
|
+
},
|
|
93
|
+
"features": [],
|
|
94
|
+
"reviews": [],
|
|
95
|
+
}
|
|
96
|
+
|
|
97
|
+
# Fallback: meta tags
|
|
98
|
+
name_match = re.search(r'<meta\s+property="og:title"\s+content="([^"]*)"', html)
|
|
99
|
+
desc_match = re.search(r'<meta\s+property="og:description"\s+content="([^"]*)"', html)
|
|
100
|
+
name = name_match.group(1) if name_match else ""
|
|
101
|
+
description = desc_match.group(1) if desc_match else ""
|
|
102
|
+
|
|
103
|
+
if not name:
|
|
104
|
+
return None
|
|
105
|
+
|
|
106
|
+
return {
|
|
107
|
+
"name": name.replace(" - Apps on Google Play", ""),
|
|
108
|
+
"url": url,
|
|
109
|
+
"description": truncate(description),
|
|
110
|
+
"rating": None,
|
|
111
|
+
"reviewCount": None,
|
|
112
|
+
"pricing": {"free": True},
|
|
113
|
+
"features": [],
|
|
114
|
+
"reviews": [],
|
|
115
|
+
}
|
|
116
|
+
|
|
117
|
+
|
|
118
|
+
def main():
|
|
119
|
+
if len(sys.argv) < 2:
|
|
120
|
+
print(json.dumps({"error": "Usage: python3 search-google-play.py <keywords> [limit]"}))
|
|
121
|
+
sys.exit(1)
|
|
122
|
+
|
|
123
|
+
query = sys.argv[1]
|
|
124
|
+
limit = 5
|
|
125
|
+
if len(sys.argv) >= 3:
|
|
126
|
+
try:
|
|
127
|
+
limit = max(1, min(int(sys.argv[2]), 10))
|
|
128
|
+
except ValueError:
|
|
129
|
+
limit = 5
|
|
130
|
+
|
|
131
|
+
errors = []
|
|
132
|
+
search_url = (
|
|
133
|
+
f"https://play.google.com/store/search?"
|
|
134
|
+
f"q={urllib.parse.quote(query)}&c=apps&hl=en&gl=us"
|
|
135
|
+
)
|
|
136
|
+
|
|
137
|
+
try:
|
|
138
|
+
search_html = safe_fetch(search_url)
|
|
139
|
+
except Exception as e:
|
|
140
|
+
print(json.dumps({
|
|
141
|
+
"platform": "google_play", "query": query,
|
|
142
|
+
"timestamp": datetime.now(timezone.utc).isoformat(),
|
|
143
|
+
"results": [], "errors": [f"Google Play search error: {e}"],
|
|
144
|
+
}, indent=2))
|
|
145
|
+
return
|
|
146
|
+
|
|
147
|
+
app_ids = extract_app_ids_from_search(search_html)[:limit]
|
|
148
|
+
|
|
149
|
+
if not app_ids:
|
|
150
|
+
errors.append("No app IDs found — page may be JS-rendered")
|
|
151
|
+
print(json.dumps({
|
|
152
|
+
"platform": "google_play", "query": query,
|
|
153
|
+
"timestamp": datetime.now(timezone.utc).isoformat(),
|
|
154
|
+
"results": [], "errors": errors,
|
|
155
|
+
}, indent=2))
|
|
156
|
+
return
|
|
157
|
+
|
|
158
|
+
results = []
|
|
159
|
+
for app_id in app_ids:
|
|
160
|
+
detail_url = f"https://play.google.com/store/apps/details?id={app_id}&hl=en&gl=us"
|
|
161
|
+
try:
|
|
162
|
+
detail_html = safe_fetch(detail_url)
|
|
163
|
+
entry = extract_app_details(detail_html, app_id)
|
|
164
|
+
if entry:
|
|
165
|
+
results.append(entry)
|
|
166
|
+
except Exception as e:
|
|
167
|
+
errors.append(f"Failed to fetch {app_id}: {e}")
|
|
168
|
+
|
|
169
|
+
print(json.dumps({
|
|
170
|
+
"platform": "google_play",
|
|
171
|
+
"query": query,
|
|
172
|
+
"timestamp": datetime.now(timezone.utc).isoformat(),
|
|
173
|
+
"results": results,
|
|
174
|
+
"errors": errors,
|
|
175
|
+
}, indent=2))
|
|
176
|
+
|
|
177
|
+
|
|
178
|
+
if __name__ == "__main__":
|
|
179
|
+
main()
|
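Note on the file above: `extract_app_details` reads the price with `json_ld.get("offers", {}).get("price", "0")`, which assumes `offers` is a single object. JSON-LD allows a property to hold either one value or a list, and a list would make `.get` raise and the app would be reported under `errors` instead of `results`. Whether Google Play actually emits a list was not checked here. The sketch below tolerates both shapes. `first_offer_price` and `pricing_from_json_ld` are hypothetical names.

```python
def first_offer_price(json_ld):
    """Return the price of the first offer as a string, or None if absent."""
    offers = json_ld.get("offers")
    if isinstance(offers, list):
        offers = offers[0] if offers else None
    if not isinstance(offers, dict):
        return None
    price = offers.get("price")
    return None if price is None else str(price)


def pricing_from_json_ld(json_ld):
    """Build a pricing dict; 'free' stays None when no price was found."""
    price = first_offer_price(json_ld)
    if price is None:
        return {"free": None, "other": None}
    try:
        is_free = float(price) == 0.0
    except ValueError:
        # Non-numeric text: an empty string counts as free, as in the script.
        is_free = price.strip() == ""
    return {"free": is_free, "other": None if is_free else price}
```

Returning `None` for an unknown price, rather than defaulting to free as the script's meta-tag fallback does, lets the report follow the SKILL.md rule of marking missing pricing as "Unknown".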