bubu-xhs-scraper-skill 1.1.0
- package/README.en.md +198 -0
- package/README.md +205 -0
- package/README.zh-CN.md +190 -0
- package/bin/install.mjs +77 -0
- package/package.json +39 -0
- package/skills/bubu-xhs-scraper-skill/SKILL.md +367 -0
package/README.en.md
ADDED
@@ -0,0 +1,198 @@
# xhs-scraper-skill

[**中文文档**](./README.md)

> A [Claude Code](https://claude.ai/claude-code) skill for scraping Xiaohongshu (小红书 / RedNote) data via [TikHub API](https://tikhub.io).
> Compatible with **Claude Code** and **Openclaw**.

---

## Why I Built This

I got my Xiaohongshu account restricted, and that is why this skill exists.

> **Violation notice (translated):** *"Your account has repeatedly used AI automation for posting/interactions. We will restrict your account functions. The platform does not support any third-party tools, software, or mini programs for auto-publishing content or inflating data. Repeated violations may result in a permanent ban."*

Previously, I was scraping Xiaohongshu data using tools that required **logging in with my account**, including Playwright-based scrapers, Openclaw monitoring workflows, and `xiaohongshu-cli` type tools. All of them triggered XHS's upgraded AI automation detection. The problem is clear: **any tool that logs in with your account can be flagged**, regardless of what it is actually doing.

**This skill takes a different approach.** It uses TikHub's API, a third-party data service that accesses public Xiaohongshu data through its own infrastructure. You never log in to XHS, never run a browser, and never touch your account's session. Your account stays safe.

> Use this skill for reading and analyzing **public** Xiaohongshu data only, not for automating interactions (posting, liking, commenting) on your account.

---

## What it does

Talk to Claude in plain language, with no commands and no code. Claude automatically picks the right API, extracts IDs from URLs, paginates results, and exports data.

```
Scrape the data for this Xiaohongshu note: https://www.xiaohongshu.com/explore/68304ca2...

Fetch all notes from the user 「美食探店小王」 and export them to Excel

Search Xiaohongshu for users related to 「护肤」 (skincare) and rank them by follower count

Get all comments on this note
```

**Supported Platforms:**

| Platform | User Info | Content List | Content Detail | Comments | Search |
|----------|-----------|-------------|----------------|----------|--------|
| Xiaohongshu (XHS) | ✅ | ✅ | ✅ | ✅ | ✅ |
| Douyin | ✅ | ✅ | ✅ | ✅ | ✅ |
| TikTok | ✅ | ✅ | ✅ | ✅ | ✅ |
| Bilibili | ✅ | ✅ | ✅ | ✅ | ✅ |
| Weibo | ✅ | ✅ | ✅ | — | ✅ |
| WeChat MP (公众号) | — | ✅ | ✅ | ✅ | — |
| WeChat Channels (视频号) | ✅ | — | ✅ | ✅ | ✅ |
| Twitter / X | ✅ | ✅ | — | — | ✅ |
| YouTube | — | — | — | — | ✅ |

**Supports:**
- Search users / content / topics across platforms
- Get user profile (followers, post count, bio)
- Fetch all content from a user (auto-pagination)
- Get content detail: title, description, stats, cover image, topics
- Get comments
- Export to Excel / JSON

---

## Quick Start

### 1. Install the skill

```bash
npx bubu-xhs-scraper-skill
```

This copies the skill into `~/.claude/skills/bubu-xhs-scraper-skill/`.

---

### 2. Get a TikHub API key

1. Register at [user.tikhub.io](https://user.tikhub.io)
2. Copy your API key from the dashboard

> TikHub provides the underlying Xiaohongshu data API. Pricing: ~**$0.001 per call**, with 24h response caching (repeated identical requests are free). See [full pricing](https://user.tikhub.io/dashboard/pricing).

---

### 3. Add the API key to Claude Code

Edit `~/.claude/settings.json`:

```json
{
  "env": {
    "TIKHUB_API_KEY": "your-api-key-here"
  }
}
```

> **Lazy mode:** Don't want to edit the file manually? Paste your API key into Claude Code or Openclaw and say "add this to my config: `your-key`", and it will handle the rest.
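
Editing the file by hand amounts to merging one key into the `env` object while leaving every other setting intact. The Python sketch below illustrates that merge under the assumption that the settings file is plain JSON. The helper name `set_tikhub_key`, the `OTHER` and `theme` keys, and the temporary demo file are all made up for illustration and are not part of the skill.

```python
import json
import tempfile
from pathlib import Path


def set_tikhub_key(settings_path: Path, api_key: str) -> dict:
    """Merge TIKHUB_API_KEY into the "env" object of a settings.json-style file.

    Hypothetical helper: existing keys are preserved, and the file is created
    if it does not exist yet.
    """
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    settings.setdefault("env", {})["TIKHUB_API_KEY"] = api_key
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Demo against a throwaway file with invented keys; for real use, pass
# Path.home() / ".claude" / "settings.json" instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "settings.json"
    demo_path.write_text(json.dumps({"env": {"OTHER": "1"}, "theme": "dark"}), encoding="utf-8")
    merged = set_tikhub_key(demo_path, "your-api-key-here")
    reloaded = json.loads(demo_path.read_text(encoding="utf-8"))
```

Using `setdefault` keeps any environment variables that are already configured, which is the main thing a manual edit can get wrong.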

---

### 4. Install the Python dependency

```bash
pip3 install httpx
```

---

### 5. Restart Claude Code

The skill activates automatically. Try asking:

```
Scrape the data for this Xiaohongshu note: https://www.xiaohongshu.com/explore/68304ca2...
```

---

## Requirements

| Requirement | Details |
|-------------|---------|
| Claude Code | [claude.ai/claude-code](https://claude.ai/claude-code) |
| Node.js | ≥ 18 (for installer) |
| Python 3 + `httpx` | `pip3 install httpx` |
| TikHub API key | [user.tikhub.io](https://user.tikhub.io) |

---

## Supported Endpoints

| Feature | Endpoint |
|---------|----------|
| Search users | `GET /api/v1/xiaohongshu/web/search_users` |
| Get user info | `GET /api/v1/xiaohongshu/app/get_user_info` |
| Get user notes | `GET /api/v1/xiaohongshu/app/get_user_notes` |
| Get note detail | `GET /api/v1/xiaohongshu/app/get_note_info` |
| Get note comments | `GET /api/v1/xiaohongshu/app/get_note_comments` |
| Get notes by topic | `GET /api/v1/xiaohongshu/app/get_notes_by_topic` |
| Home feed | `GET /api/v1/xiaohongshu/web/get_home_recommend` |

---

## How to Extract IDs from URLs

**User ID** (from the profile URL):
```
https://www.xiaohongshu.com/user/profile/5c1b1234...
                                         ^^^^^^^^^^^
```

**Note ID** (from the note URL):
```
https://www.xiaohongshu.com/explore/68304ca200000000...
                                    ^^^^^^^^^^^^^^^^
```

> You don't need to extract IDs manually. Paste the full URL and Claude will handle it.
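
For scripts that run outside Claude, the same extraction can be sketched with two regular expressions. This is a minimal illustration that only covers the two URL shapes shown above; the function names are hypothetical, the assumption that IDs are alphanumeric is mine, and `abc123` / `def456` are made-up placeholder IDs rather than real accounts or notes.

```python
import re
from typing import Optional

# Patterns follow the two URL shapes shown above. Other forms (for example
# xhslink.com short links) are deliberately not handled here.
USER_RE = re.compile(r"xiaohongshu\.com/user/profile/([0-9a-zA-Z]+)")
NOTE_RE = re.compile(r"xiaohongshu\.com/explore/([0-9a-zA-Z]+)")


def extract_user_id(url: str) -> Optional[str]:
    """Return the user_id segment of a profile URL, or None if absent."""
    match = USER_RE.search(url)
    return match.group(1) if match else None


def extract_note_id(url: str) -> Optional[str]:
    """Return the note_id segment of a note URL, or None if absent."""
    match = NOTE_RE.search(url)
    return match.group(1) if match else None


# Placeholder IDs; query strings after the ID are ignored by the patterns.
user_id = extract_user_id("https://www.xiaohongshu.com/user/profile/abc123?tab=note")
note_id = extract_note_id("https://www.xiaohongshu.com/explore/def456")
```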

---

## Note Detail Fields

| Field | Description |
|-------|-------------|
| `title` | Note title |
| `desc` | Note description |
| `type` | `"normal"` (image) or `"video"` |
| `liked_count` | Like count |
| `collected_count` | Collect/save count |
| `comments_count` | Comment count |
| `shared_count` | Share count |
| `view_count` | View count |
| `topics` | Topic tags |
| `time` | Publish time as a Unix timestamp (seconds) |
| `ip_location` | Author's IP location |
| `images_list` | Image URLs (multiple resolutions) |
| `video` | Video info (for video notes) |

> ⚠️ `interact_info` in the raw response is always empty. The skill reads the top-level fields (`liked_count`, etc.) instead.
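
To make the warning concrete, the sketch below reads the engagement counts from the top level of a note object and ignores `interact_info`. It is an illustration only: `summarize_note` is a hypothetical helper, the sample note is invented, and it assumes the counts arrive as integers or plain numeric strings.

```python
from typing import Any

# Top-level count fields listed in the table above.
COUNT_FIELDS = (
    "liked_count",
    "collected_count",
    "comments_count",
    "shared_count",
    "view_count",
)


def summarize_note(note: dict[str, Any]) -> dict[str, Any]:
    """Collect title, type and counts from top-level fields only."""
    summary: dict[str, Any] = {
        "title": note.get("title", ""),
        "type": note.get("type", ""),
    }
    for field in COUNT_FIELDS:
        # Missing or empty values fall back to 0; interact_info is never read.
        summary[field] = int(note.get(field) or 0)
    return summary


# Invented sample shaped like the field table, with an empty interact_info.
sample_note = {
    "title": "Demo",
    "type": "normal",
    "liked_count": 12,
    "collected_count": 3,
    "comments_count": "4",
    "interact_info": {},
}
summary = summarize_note(sample_note)
```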

---

## Troubleshooting

| Error | Likely Cause | Fix |
|-------|-------------|-----|
| `400` | Note deleted, or needs `share_text` | Paste the full share URL instead of just the note ID |
| `401` | API key missing | Check `TIKHUB_API_KEY` in `~/.claude/settings.json` |
| `429` | Rate limited | The skill adds delays automatically; wait a moment and retry |
| Timeout | VPN / proxy conflict | Try toggling your VPN; `api.tikhub.io` may need direct access |
| Cover URL broken | CDN URLs expire | Download cover images right away instead of saving URLs for later use |
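
The `429` row says to wait and retry. One common way to do that is exponential backoff. The sketch below illustrates the idea under that assumption; it is not the skill's own delay logic. `call_with_backoff` is a hypothetical helper that takes any callable returning a `(status, payload)` pair, so it can be exercised without network access.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    request: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Retry `request` while it reports HTTP 429, doubling the delay each time."""
    status, payload = request()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, payload = request()
    return status, payload


# Simulated responses: two rate-limit replies followed by a success.
responses = iter([(429, {}), (429, {}), (200, {"ok": True})])
delays: list = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
```

Injecting `sleep` keeps the example fast to run; in real use the default `time.sleep` applies.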

---

## License

MIT
package/README.md
ADDED
@@ -0,0 +1,205 @@
# xhs-scraper-skill

[**English**](./README.en.md)

> A Xiaohongshu data scraping tool built on the [TikHub API](https://tikhub.io), for [Claude Code](https://claude.ai/claude-code) and **Openclaw**.

---

## Why I Built This

Because my Xiaohongshu account was restricted.

**Violation reason:** The account repeatedly used AI-managed automation for posting and interactions. The platform detected the automated behavior and restricted the account's posting, commenting, direct messaging, and other functions for one week.

Before that, I had used quite a few tools to collect Xiaohongshu data, including:

- Playwright-based scraper scripts (which require logging in)
- Openclaw's Xiaohongshu note monitoring workflow
- Various `xiaohongshu-cli` type command-line tools

These tools have one thing in common: **they all require logging in to my Xiaohongshu account**. Xiaohongshu has since upgraded its AI automation detection, and any account that shows suspicious automated behavior is restricted indiscriminately, even if you are only collecting data and not publishing anything.

**This skill takes a completely different approach.** It calls TikHub's third-party data API, and TikHub accesses public Xiaohongshu data through its own infrastructure. You do not need to log in to Xiaohongshu, you do not need to open a browser, and no activity is recorded against your account. Your account stays safe.

> This tool is only for reading and analyzing **public** Xiaohongshu data. Do not use it for automated interactions (posting, liking, commenting) or other behavior that may violate platform rules.

---

## What It Can Do

Tell Claude in natural language what you want to scrape. There is no code to write and no commands to memorize. Claude automatically identifies the intent, extracts IDs from links, paginates, and exports the results.

```
Scrape the data for this Xiaohongshu note: https://www.xiaohongshu.com/explore/68304ca2...

Fetch all notes from the user 「美食探店小王」 and export them to Excel

Search Xiaohongshu for users related to 「护肤」 (skincare) and rank them by follower count

Get all comments on this note
```

**Supported platforms:**

| Platform | User Info | Content List | Content Detail | Comments | Search |
|------|---------|---------|---------|------|------|
| Xiaohongshu | ✅ | ✅ | ✅ | ✅ | ✅ |
| Douyin | ✅ | ✅ | ✅ | ✅ | ✅ |
| TikTok | ✅ | ✅ | ✅ | ✅ | ✅ |
| Bilibili | ✅ | ✅ | ✅ | ✅ | ✅ |
| Weibo | ✅ | ✅ | ✅ | — | ✅ |
| WeChat Official Accounts | — | ✅ | ✅ | ✅ | — |
| WeChat Channels | ✅ | — | ✅ | ✅ | ✅ |
| Twitter / X | ✅ | ✅ | — | — | ✅ |
| YouTube | — | — | — | — | ✅ |

**Supported features:**
- Search users / content / topics
- Get user profile data (followers, post count, bio, etc.)
- Get all of a user's content (auto-pagination)
- Get content details (title, body text, engagement counts, cover image, topic tags)
- Get comments
- Export to Excel / JSON

---

## Quick Start

### Step 1: Install the Skill

```bash
npx bubu-xhs-scraper-skill
```

After installation, the skill files are copied to `~/.claude/skills/bubu-xhs-scraper-skill/`.

---

### Step 2: Get a TikHub API Key

1. Go to [user.tikhub.io](https://user.tikhub.io) and register an account
2. Copy your API key from the dashboard

> TikHub provides the underlying Xiaohongshu data API. Billing: about **$0.001 per call**; successful requests are cached for 24 hours (repeated requests are not billed again). For detailed pricing, see [TikHub Pricing](https://user.tikhub.io/dashboard/pricing).

---

### Step 3: Configure the API Key

Edit `~/.claude/settings.json` and add the following:

```json
{
  "env": {
    "TIKHUB_API_KEY": "your-api-key"
  }
}
```

> **Lazy way:** Don't want to edit the file by hand? Hand the API key to Claude Code or Openclaw and say "write this into my config: `your-key`", and it will complete the configuration for you.

---

### Step 4: Install the Python Dependency

```bash
pip3 install httpx
```

---

### Step 5: Restart Claude Code

The skill takes effect automatically after the restart. Try this prompt:

```
Scrape the data for this Xiaohongshu note: https://www.xiaohongshu.com/explore/68304ca2...
```

---

## Requirements

| Requirement | Details |
|------|------|
| Claude Code | [claude.ai/claude-code](https://claude.ai/claude-code) |
| Node.js | ≥ 18 (needed by the install script) |
| Python 3 + `httpx` | `pip3 install httpx` |
| TikHub API Key | Register at [user.tikhub.io](https://user.tikhub.io) |

---

## Supported Endpoints

| Feature | Endpoint path |
|------|---------|
| Search users | `GET /api/v1/xiaohongshu/web/search_users` |
| Get user info | `GET /api/v1/xiaohongshu/app/get_user_info` |
| Get user note list | `GET /api/v1/xiaohongshu/app/get_user_notes` |
| Get note detail | `GET /api/v1/xiaohongshu/app/get_note_info` |
| Get note comments | `GET /api/v1/xiaohongshu/app/get_note_comments` |
| Get notes by topic | `GET /api/v1/xiaohongshu/app/get_notes_by_topic` |
| Home feed | `GET /api/v1/xiaohongshu/web/get_home_recommend` |

---

## How to Find IDs in Links

**User ID** (from the profile link):
```
https://www.xiaohongshu.com/user/profile/5c1b1234...
                                         ^^^^^^^^^^^
                                         this is the user_id
```

**Note ID** (from the note link):
```
https://www.xiaohongshu.com/explore/68304ca200000000...
                                    ^^^^^^^^^^^^^^^^
                                    this is the note_id
```

> You do not need to extract IDs by hand. Send the full link to Claude and it will handle it.

---

## Note Detail Fields

| Field | Meaning |
|------|------|
| `title` | Note title |
| `desc` | Note body text |
| `type` | `"normal"` (image and text) or `"video"` (video) |
| `liked_count` | Like count |
| `collected_count` | Save count |
| `comments_count` | Comment count |
| `shared_count` | Share count |
| `view_count` | View count |
| `topics` | List of topic tags |
| `time` | Publish time as a Unix timestamp (seconds) |
| `ip_location` | IP location at publish time |
| `images_list` | Image list (multiple resolutions) |
| `video` | Video info (video notes only) |

> ⚠️ The `interact_info` field in the raw response is always empty. The skill reads the top-level `liked_count` and related fields automatically, so you do not need to handle this yourself.

---

## FAQ

| Error | Likely cause | Fix |
|------|---------|---------|
| `400` | Note deleted, or a share link is required | Use the full share link instead of the bare note_id |
| `401` | API key not configured | Check `TIKHUB_API_KEY` in `~/.claude/settings.json` |
| `429` | Rate limit triggered | The skill already adds delays automatically; wait a moment and retry |
| Request timeout | VPN or proxy conflict | Try toggling your VPN; `api.tikhub.io` may need a direct connection |
| Cover link no longer works | CDN links carry a time-limited signature | Download the cover as soon as you get the link instead of only saving the URL |

---

## License

MIT
package/README.zh-CN.md
ADDED
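@@ -0,0 +1,190 @@
# xhs-scraper-skill

[**English**](./README.en.md)

> A Xiaohongshu data scraping tool built on the [TikHub API](https://tikhub.io), for [Claude Code](https://claude.ai/claude-code) and **Openclaw**.

---

## Why I Built This

Because my Xiaohongshu account was restricted.

**Violation reason:** The account repeatedly used AI-managed automation for posting and interactions. The platform detected the automated behavior and restricted the account's posting, commenting, direct messaging, and other functions for one week.

Before that, I had used quite a few tools to collect Xiaohongshu data, including:

- Playwright-based scraper scripts (which require logging in)
- Openclaw's Xiaohongshu note monitoring workflow
- Various `xiaohongshu-cli` type command-line tools

These tools have one thing in common: **they all require logging in to my Xiaohongshu account**. Xiaohongshu has since upgraded its AI automation detection, and any account that shows suspicious automated behavior is restricted indiscriminately, even if you are only collecting data and not publishing anything.

**This skill takes a completely different approach.** It calls TikHub's third-party data API, and TikHub accesses public Xiaohongshu data through its own infrastructure. You do not need to log in to Xiaohongshu, you do not need to open a browser, and no activity is recorded against your account. Your account stays safe.

> This tool is only for reading and analyzing **public** Xiaohongshu data. Do not use it for automated interactions (posting, liking, commenting) or other behavior that may violate platform rules.

---

## What It Can Do

Tell Claude in natural language what you want to scrape. There is no code to write and no commands to memorize. Claude automatically identifies the intent, extracts IDs from links, paginates, and exports the results.

```
Scrape the data for this Xiaohongshu note: https://www.xiaohongshu.com/explore/68304ca2...

Fetch all notes from the user 「美食探店小王」 and export them to Excel

Search Xiaohongshu for users related to 「护肤」 (skincare) and rank them by follower count

Get all comments on this note
```

**Supported features:**
- Search users by keyword
- Get user profile data (followers, post count, bio, etc.)
- Get all of a user's notes (auto-pagination)
- Get note details (title, body text, engagement counts, cover image, topic tags)
- Get note comments
- Get notes by topic / get home feed recommendations
- Export to Excel / JSON

---
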
## Quick Start

### Step 1: Install the Skill

```bash
npx bubu-xhs-scraper-skill
```

After installation, the skill files are copied to `~/.claude/skills/bubu-xhs-scraper-skill/`.

---

### Step 2: Get a TikHub API Key

1. Go to [user.tikhub.io](https://user.tikhub.io) and register an account
2. Copy your API key from the dashboard

> TikHub provides the underlying Xiaohongshu data API. Billing: about **$0.001 per call**; successful requests are cached for 24 hours (repeated requests are not billed again). For detailed pricing, see [TikHub Pricing](https://user.tikhub.io/dashboard/pricing).

---

### Step 3: Configure the API Key

Edit `~/.claude/settings.json` and add the following:

```json
{
  "env": {
    "TIKHUB_API_KEY": "your-api-key"
  }
}
```

---

### Step 4: Install the Python Dependency

```bash
pip3 install httpx
```

---

### Step 5: Restart Claude Code

The skill takes effect automatically after the restart. Try this prompt:

```
Scrape the data for this Xiaohongshu note: https://www.xiaohongshu.com/explore/68304ca2...
```

---

## Requirements

| Requirement | Details |
|------|------|
| Claude Code | [claude.ai/claude-code](https://claude.ai/claude-code) |
| Node.js | ≥ 18 (needed by the install script) |
| Python 3 + `httpx` | `pip3 install httpx` |
| TikHub API Key | Register at [user.tikhub.io](https://user.tikhub.io) |

---

## Supported Endpoints

| Feature | Endpoint path |
|------|---------|
| Search users | `GET /api/v1/xiaohongshu/web/search_users` |
| Get user info | `GET /api/v1/xiaohongshu/app/get_user_info` |
| Get user note list | `GET /api/v1/xiaohongshu/app/get_user_notes` |
| Get note detail | `GET /api/v1/xiaohongshu/app/get_note_info` |
| Get note comments | `GET /api/v1/xiaohongshu/app/get_note_comments` |
| Get notes by topic | `GET /api/v1/xiaohongshu/app/get_notes_by_topic` |
| Home feed | `GET /api/v1/xiaohongshu/web/get_home_recommend` |

---

## How to Find IDs in Links

**User ID** (from the profile link):
```
https://www.xiaohongshu.com/user/profile/5c1b1234...
                                         ^^^^^^^^^^^
                                         this is the user_id
```

**Note ID** (from the note link):
```
https://www.xiaohongshu.com/explore/68304ca200000000...
                                    ^^^^^^^^^^^^^^^^
                                    this is the note_id
```

> You do not need to extract IDs by hand. Send the full link to Claude and it will handle it.

---

## Note Detail Fields

| Field | Meaning |
|------|------|
| `title` | Note title |
| `desc` | Note body text |
| `type` | `"normal"` (image and text) or `"video"` (video) |
| `liked_count` | Like count |
| `collected_count` | Save count |
| `comments_count` | Comment count |
| `shared_count` | Share count |
| `view_count` | View count |
| `topics` | List of topic tags |
| `time` | Publish time as a Unix timestamp (seconds) |
| `ip_location` | IP location at publish time |
| `images_list` | Image list (multiple resolutions) |
| `video` | Video info (video notes only) |

> ⚠️ The `interact_info` field in the raw response is always empty. The skill reads the top-level `liked_count` and related fields automatically, so you do not need to handle this yourself.

---

## FAQ

| Error | Likely cause | Fix |
|------|---------|---------|
| `400` | Note deleted, or a share link is required | Use the full share link instead of the bare note_id |
| `401` | API key not configured | Check `TIKHUB_API_KEY` in `~/.claude/settings.json` |
| `429` | Rate limit triggered | The skill already adds delays automatically; wait a moment and retry |
| Request timeout | VPN or proxy conflict | Try toggling your VPN; `api.tikhub.io` may need a direct connection |
| Cover link no longer works | CDN links carry a time-limited signature | Download the cover as soon as you get the link instead of only saving the URL |

---

## License

MIT
package/bin/install.mjs
ADDED
@@ -0,0 +1,77 @@
#!/usr/bin/env node
/**
 * bubu-xhs-scraper-skill installer
 * Installs the XHS Scraper skill to ~/.claude/skills/
 *
 * Usage:
 *   npx bubu-xhs-scraper-skill
 *   npx bubu-xhs-scraper-skill install
 */

import { readFileSync, mkdirSync, writeFileSync, existsSync } from 'fs';
import { join } from 'path';
import { homedir } from 'os';
import { fileURLToPath } from 'url';
import { dirname } from 'path';

const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

const SKILL_NAME = 'bubu-xhs-scraper-skill';
const SKILLS_DIR = join(homedir(), '.claude', 'skills');
const TARGET_DIR = join(SKILLS_DIR, SKILL_NAME);
const SKILL_FILE = join(__dirname, '..', 'skills', SKILL_NAME, 'SKILL.md');

function printBanner() {
  console.log('\x1b[36m');
  console.log('╔══════════════════════════════════╗');
  console.log('║ bubu-xhs-scraper-skill v1.1.0 ║');
  console.log('║ Claude Code Skill Installer ║');
  console.log('╚══════════════════════════════════╝');
  console.log('\x1b[0m');
}

function install() {
  printBanner();

  // Create skills directory if needed
  if (!existsSync(SKILLS_DIR)) {
    mkdirSync(SKILLS_DIR, { recursive: true });
    console.log(`\x1b[32m✓ Created ${SKILLS_DIR}\x1b[0m`);
  }

  // Create skill directory
  mkdirSync(TARGET_DIR, { recursive: true });

  // Copy SKILL.md
  const skillContent = readFileSync(SKILL_FILE, 'utf8');
  writeFileSync(join(TARGET_DIR, 'SKILL.md'), skillContent);

  console.log(`\x1b[32m✓ Installed skill to: ${TARGET_DIR}\x1b[0m\n`);

  console.log('\x1b[33m📋 Setup required:\x1b[0m');
  console.log(' 1. Get a TikHub API key at \x1b[4mhttps://user.tikhub.io\x1b[0m');
  console.log(' 2. Add to ~/.claude/settings.json:');
  console.log('\x1b[90m {');
  console.log(' "env": {');
  console.log(' "TIKHUB_API_KEY": "your-api-key-here"');
  console.log(' }');
  console.log(' }\x1b[0m\n');

  console.log(' 3. Install Python dependency:');
  console.log('\x1b[90m pip3 install httpx\x1b[0m\n');

  console.log('\x1b[32m✨ Done! Restart Claude Code and ask it to scrape XHS data.\x1b[0m');
  console.log('\x1b[90m Example: "帮我抓取这条小红书笔记的数据:<url>"\x1b[0m\n');
}

const args = process.argv.slice(2);
const cmd = args[0];

if (cmd === '--help' || cmd === '-h') {
  console.log('Usage: npx bubu-xhs-scraper-skill [install]');
  console.log('Installs the XHS Scraper skill for Claude Code.');
} else {
  // Default action is install
  install();
}
package/package.json
ADDED
@@ -0,0 +1,39 @@
{
  "name": "bubu-xhs-scraper-skill",
  "version": "1.1.0",
  "description": "Multi-platform social media scraper skill for Claude Code & Openclaw, powered by TikHub API. Supports XHS, Douyin, TikTok, Bilibili, Weibo, YouTube, WeChat, Twitter/X.",
  "type": "module",
  "bin": {
    "bubu-xhs-scraper-skill": "bin/install.mjs"
  },
  "files": [
    "bin/",
    "skills/"
  ],
  "keywords": [
    "claude",
    "claude-code",
    "openclaw",
    "skill",
    "xiaohongshu",
    "xhs",
    "scraper",
    "tikhub",
    "douyin",
    "tiktok",
    "bilibili",
    "weibo",
    "youtube",
    "wechat",
    "twitter"
  ],
  "author": "wushijing123",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/wushijing123/xhs-scraper-skill.git"
  },
  "license": "MIT",
  "engines": {
    "node": ">=18"
  }
}
package/skills/bubu-xhs-scraper-skill/SKILL.md
ADDED
@@ -0,0 +1,367 @@
---
name: bubu-xhs-scraper-skill
description: Multi-platform social media data scraping tool built on the TikHub API. Supports Xiaohongshu, Douyin, TikTok, Bilibili, Weibo, YouTube, WeChat Official Accounts, WeChat Channels, and Twitter/X. Trigger words:「小红书」「抖音」「TikTok」「B站」「bilibili」「微博」「YouTube」「公众号」「视频号」「Twitter」「X平台」「抓取数据」「获取用户」「搜索」「评论」「笔记」「视频」. Trigger this skill immediately whenever the user mentions one of these platforms together with a data request.
version: 1.1.0
metadata:
  openclaw:
    requires:
      env:
        - TIKHUB_API_KEY
      bins:
        - python3
    primaryEnv: TIKHUB_API_KEY
    emoji: 🔍
    homepage: https://github.com/wushijing123/xhs-scraper-skill
setup:
  command: pip3 install httpx
---

# Multi-Platform Data Scraping Skill (TikHub API)

## Overview

Scrape multi-platform data through direct HTTP calls to the TikHub API. The API key is stored in the `TIKHUB_API_KEY` environment variable, so **the user does not need to provide it**.

> **Important:** Do not use the `tikhub` Python SDK; its endpoints are outdated. Call the API directly with `httpx`.

## Supported Platforms

| Platform | User Info | Content List | Content Detail | Comments | Search | Trending |
|------|---------|---------|---------|------|------|------|
| Xiaohongshu | ✅ | ✅ | ✅ | ✅ | ✅ | — |
| Douyin | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| TikTok | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Bilibili | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Weibo | ✅ | ✅ | ✅ | — | ✅ | ✅ |
| WeChat Official Accounts | — | ✅ | ✅ | ✅ | — | — |
| WeChat Channels | ✅ | — | ✅ | ✅ | ✅ | ✅ |
| Twitter / X | ✅ | ✅ | — | — | ✅ | — |
| YouTube | — | — | — | — | ✅ | — |

## Install Dependencies

```bash
pip3 install httpx
```

## Standard Code Template

```python
import asyncio
import os

import httpx

BASE_URL = "https://api.tikhub.io"
# Read the key once; a missing key would otherwise be sent as "Bearer None".
API_KEY = os.environ.get("TIKHUB_API_KEY", "")
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


async def api_get(endpoint: str, params: dict) -> dict:
    """GET a TikHub endpoint and return the decoded JSON body."""
    if not API_KEY:
        raise RuntimeError("TIKHUB_API_KEY is not set")
    async with httpx.AsyncClient() as client:
        r = await client.get(f"{BASE_URL}{endpoint}", params=params, headers=HEADERS, timeout=30)
        r.raise_for_status()  # surface 4xx/5xx responses as httpx.HTTPStatusError
        return r.json()
```

---

## Platform 1: Xiaohongshu

### Endpoints

| Feature | Endpoint | Main parameters |
|------|------|---------|
| Search users | `GET /api/v1/xiaohongshu/web/search_users` | `keyword`, `page` |
| Get user info (Web) | `GET /api/v1/xiaohongshu/web/get_user_info` | `user_id` |
| Get user info (App) | `GET /api/v1/xiaohongshu/app/get_user_info` | `user_id` |
| Get user notes (Web) | `GET /api/v1/xiaohongshu/web/get_user_notes_v2` | `user_id`, `cursor` |
| Get user notes (App) | `GET /api/v1/xiaohongshu/app/get_user_notes` | `user_id`, `cursor` |
| Get note detail | `GET /api/v1/xiaohongshu/app/get_note_info` | `note_id` or `share_text` |
| Get note comments | `GET /api/v1/xiaohongshu/app/get_note_comments` | `note_id` |
| Get notes by topic | `GET /api/v1/xiaohongshu/app/get_notes_by_topic` | `topic_id` |
| Home feed | `GET /api/v1/xiaohongshu/web/get_home_recommend` | none |

### Response Notes

- **Note detail path**: `data["data"][0]["note_list"][0]`
- **Engagement counts**: `liked_count` / `collected_count` / `comments_count` / `shared_count` are top-level fields; `interact_info` is always empty
- **Timestamp**: in seconds; convert with `datetime.fromtimestamp(ts)`
- **Cover image**: `images_list[0]["original"]` (original image) or `["url"]` (compressed version); CDN URLs expire, so download immediately
- Prefer `note_id` over `share_text`; an `xhslink.com` short link can be passed directly as `share_text`

### ID Extraction

```
User ID: https://www.xiaohongshu.com/user/profile/{user_id}
Note ID: https://www.xiaohongshu.com/explore/{note_id}
```
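
The response notes for Xiaohongshu can be tied together in a short Python sketch. It is illustrative only: the helper names are hypothetical, `data` stands for whatever object the documented path is applied to, and the sample payload is invented to match that path. The sketch converts the timestamp in UTC so the result does not depend on the local time zone, whereas the plain `datetime.fromtimestamp(ts)` call mentioned in the notes returns local time.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def first_note(data: dict[str, Any]) -> dict[str, Any]:
    """Follow the documented path: data["data"][0]["note_list"][0]."""
    return data["data"][0]["note_list"][0]


def publish_time(note: dict[str, Any]) -> datetime:
    """Convert the second-resolution `time` field to an aware UTC datetime."""
    return datetime.fromtimestamp(note["time"], tz=timezone.utc)


def cover_url(note: dict[str, Any]) -> Optional[str]:
    """Prefer the original image and fall back to the compressed `url`."""
    images = note.get("images_list") or []
    if not images:
        return None
    first = images[0]
    return first.get("original") or first.get("url")


# Invented payload shaped like the documented path; the URLs are placeholders.
sample = {
    "data": [
        {
            "note_list": [
                {
                    "time": 86400,
                    "images_list": [
                        {
                            "url": "https://example.com/small.jpg",
                            "original": "https://example.com/full.jpg",
                        }
                    ],
                }
            ]
        }
    ]
}
note = first_note(sample)
published = publish_time(note)
cover = cover_url(note)
```

Because the CDN URLs expire, a real script would download `cover` right after this step instead of storing the URL.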

---

## Platform 2: Douyin

### Endpoints

| Feature | Endpoint | Main parameters |
|------|------|---------|
| Get user profile | `GET /api/v1/douyin/web/handler_user_profile` | `unique_id` or `sec_user_id` |
| Get user video list | `GET /api/v1/douyin/web/fetch_user_post_videos` | `sec_user_id`, `max_cursor` |
| Get a single video | `GET /api/v1/douyin/web/fetch_one_video` | `aweme_id` |
| Get a video by share link | `GET /api/v1/douyin/web/fetch_one_video_by_share_url` | `share_url` |
| Get video comments | `GET /api/v1/douyin/web/fetch_video_comments` | `aweme_id`, `cursor` |
| Search users | `GET /api/v1/douyin/web/fetch_user_search_result` | `keyword`, `cursor` |
| Search videos | `GET /api/v1/douyin/web/fetch_video_search_result` | `keyword`, `cursor` |
| Trending searches | `GET /api/v1/douyin/web/fetch_hot_search_result` | none |

### Fallback Endpoints (App version, use when Web fails)

| Feature | Endpoint | Main parameters |
|------|------|---------|
| Get user profile | `GET /api/v1/douyin/app/v3/handler_user_profile` | `sec_user_id` |
| Get user videos | `GET /api/v1/douyin/app/v3/fetch_user_post_videos` | `sec_user_id`, `max_cursor` |
| Get a single video | `GET /api/v1/douyin/app/v3/fetch_one_video` | `aweme_id` |
| Get a video by share link | `GET /api/v1/douyin/app/v3/fetch_one_video_by_share_url` | `share_url` |
| Search videos | `GET /api/v1/douyin/app/v3/fetch_video_search_result` | `keyword`, `cursor` |

### Response Notes

- `unique_id` is the Douyin ID (@xxx); `sec_user_id` is the long ID in the URL
- The video list paginates with `max_cursor` (not `cursor`)
- Extract `aweme_id` from the URL: `douyin.com/video/{aweme_id}`

### ID Extraction

```
sec_user_id: https://www.douyin.com/user/{sec_user_id}
aweme_id: https://www.douyin.com/video/{aweme_id}
```
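
The Douyin notes above mix two details that are easy to get wrong: where the IDs sit in a URL, and which cursor name each endpoint expects. A minimal Python sketch of both follows. The function names are hypothetical, the character set allowed in `sec_user_id`, the numeric form of `aweme_id`, and the starting cursor value of `0` are assumptions, and the IDs in the demo are made up.

```python
import re
from typing import Optional

# Assumed ID shapes: sec_user_id as letters, digits, "_" and "-";
# aweme_id as digits only.
SEC_USER_RE = re.compile(r"douyin\.com/user/([A-Za-z0-9_\-]+)")
AWEME_RE = re.compile(r"douyin\.com/video/(\d+)")


def extract_sec_user_id(url: str) -> Optional[str]:
    """Return the sec_user_id segment of a user URL, or None."""
    match = SEC_USER_RE.search(url)
    return match.group(1) if match else None


def extract_aweme_id(url: str) -> Optional[str]:
    """Return the aweme_id segment of a video URL, or None."""
    match = AWEME_RE.search(url)
    return match.group(1) if match else None


def video_list_params(sec_user_id: str, max_cursor: int = 0) -> dict:
    """Query parameters for the user video list, which pages with max_cursor."""
    return {"sec_user_id": sec_user_id, "max_cursor": max_cursor}


def comment_params(aweme_id: str, cursor: int = 0) -> dict:
    """Query parameters for video comments, which page with cursor."""
    return {"aweme_id": aweme_id, "cursor": cursor}


# Made-up IDs for illustration.
sec_id = extract_sec_user_id("https://www.douyin.com/user/MS4wLjABAAAA_demo-id?from=x")
aweme = extract_aweme_id("https://www.douyin.com/video/1234567890")
list_params = video_list_params(sec_id)
```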
|
|
134
|
+
|
|
135
|
+
---
|
|
136
|
+
|
|
137
|
+
## 平台三:TikTok
|
|
138
|
+
|
|
139
|
+
### 端点列表
|
|
140
|
+
|
|
141
|
+
| 功能 | 端点 | 主要参数 |
|
|
142
|
+
|------|------|---------|
|
|
143
|
+
| 获取用户主页 | `GET /api/v1/tiktok/web/fetch_user_profile` | `uniqueId` |
|
|
144
|
+
| 获取用户视频 | `GET /api/v1/tiktok/web/fetch_user_post` | `secUid`, `cursor`, `count` |
|
|
145
|
+
| 获取视频详情 | `GET /api/v1/tiktok/web/fetch_post_detail` | `itemId` |
|
|
146
|
+
| 通过分享链接获取视频 | `GET /api/v1/tiktok/app/v3/fetch_one_video_by_share_url` | `share_url` |
|
|
147
|
+
| 获取视频评论 | `GET /api/v1/tiktok/web/fetch_post_comment` | `aweme_id`, `cursor` |
|
|
148
|
+
| 搜索用户 | `GET /api/v1/tiktok/web/fetch_search_user` | `keyword`, `cursor` |
|
|
149
|
+
| 搜索视频 | `GET /api/v1/tiktok/web/fetch_search_video` | `keyword`, `count`, `offset` |
|
|
150
|
+
| 热门话题 | `GET /api/v1/tiktok/web/fetch_trending_searchwords` | 无 |
|
|
151
|
+
|
|
152
|
+
### 备用端点(App 版,Web 失败时用)
|
|
153
|
+
|
|
154
|
+
| Function | Endpoint | Main parameters |
|------|------|---------|
| Get user profile | `GET /api/v1/tiktok/app/v3/handler_user_profile` | `sec_user_id` |
| Get user videos | `GET /api/v1/tiktok/app/v3/fetch_user_post_videos` | `sec_user_id`, `max_cursor` |
| Get a single video | `GET /api/v1/tiktok/app/v3/fetch_one_video` | `aweme_id` |
| Search users | `GET /api/v1/tiktok/app/v3/fetch_user_search_result` | `keyword`, `cursor` |
| Search videos | `GET /api/v1/tiktok/app/v3/fetch_video_search_result` | `keyword`, `cursor` |
|
|
161
|
+
|
|
162
|
+
### Response notes
|
|
163
|
+
|
|
164
|
+
- `uniqueId` is the @username (without the @); `secUid` is the long ID in the URL
- Call `fetch_user_profile` with `uniqueId` first to obtain `secUid`, then page through the videos
- Paginate video lists with `cursor`
|
|
167
|
+
|
|
168
|
+
### ID extraction
|
|
169
|
+
|
|
170
|
+
```
uniqueId: https://www.tiktok.com/@{uniqueId}
itemId:   https://www.tiktok.com/@xxx/video/{itemId}
```
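
The two-step flow from the response notes (resolve `uniqueId` to `secUid`, then request posts) might look like the sketch below. Everything beyond the two endpoint paths and parameter names is an assumption: the function name is hypothetical, `api_get` is passed in as any async callable taking `(endpoint, params)`, and `SEC_UID_PATH` is a guess at where `secUid` sits in the profile payload, which should be checked against a real response.

```python
from typing import Any, Awaitable, Callable, Dict, Sequence

ApiGet = Callable[[str, Dict[str, Any]], Awaitable[Dict[str, Any]]]

# ASSUMPTION: location of `secUid` in the profile response; adjust to the real payload.
SEC_UID_PATH: Sequence[str] = ("data", "userInfo", "user", "secUid")


async def fetch_first_posts(api_get: ApiGet, unique_id: str, count: int = 20) -> Dict[str, Any]:
    """Resolve uniqueId -> secUid, then request the first page of the user's posts."""
    profile = await api_get(
        "/api/v1/tiktok/web/fetch_user_profile", {"uniqueId": unique_id}
    )
    node: Any = profile
    for key in SEC_UID_PATH:
        node = node[key]  # raises KeyError if the assumed path is wrong
    # The first page is requested without a cursor; later pages pass `cursor`.
    return await api_get(
        "/api/v1/tiktok/web/fetch_user_post", {"secUid": node, "count": count}
    )
```

A caller would run it with something like `asyncio.run(fetch_first_posts(api_get, "someuser"))`, where `api_get` is the skill's own request helper.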
|
|
174
|
+
|
|
175
|
+
---
|
|
176
|
+
|
|
177
|
+
## Platform 4: Bilibili
|
|
178
|
+
|
|
179
|
+
### Endpoint list
|
|
180
|
+
|
|
181
|
+
| Function | Endpoint | Main parameters |
|------|------|---------|
| Get user profile | `GET /api/v1/bilibili/web/fetch_user_profile` | `uid` |
| Get user videos | `GET /api/v1/bilibili/web/fetch_user_post_videos` | `uid`, `page`, `pagesize` |
| Get video details | `GET /api/v1/bilibili/web/fetch_one_video` | `bvid` or `aid` |
| Get video details (v2) | `GET /api/v1/bilibili/web/fetch_video_detail` | `bvid` |
| Get video comments | `GET /api/v1/bilibili/web/fetch_video_comments` | `bvid`, `page` |
| General search | `GET /api/v1/bilibili/web/fetch_general_search` | `keyword`, `page` |
| Hot searches | `GET /api/v1/bilibili/web/fetch_hot_search` | None |
|
|
190
|
+
|
|
191
|
+
### Fallback endpoints (App version; use when Web fails)
|
|
192
|
+
|
|
193
|
+
| Function | Endpoint | Main parameters |
|------|------|---------|
| Get user info | `GET /api/v1/bilibili/app/fetch_user_info` | `uid` |
| Get user videos | `GET /api/v1/bilibili/app/fetch_user_videos` | `uid`, `page` |
| Get video details | `GET /api/v1/bilibili/app/fetch_one_video` | `bvid` |
| Get video comments | `GET /api/v1/bilibili/app/fetch_video_comments` | `bvid`, `page` |
| General search | `GET /api/v1/bilibili/app/fetch_search_all` | `keyword`, `page` |
|
|
200
|
+
|
|
201
|
+
### Response notes
|
|
202
|
+
|
|
203
|
+
- `uid` is the numeric user ID
- `bvid` is the BV number (e.g. `BV1xx411c7mD`); `aid` is the numeric AV number
- Paginate video lists with `page` (starting at 1); `pagesize` defaults to 30
|
|
206
|
+
|
|
207
|
+
### ID extraction
|
|
208
|
+
|
|
209
|
+
```
uid:  https://space.bilibili.com/{uid}
bvid: https://www.bilibili.com/video/{bvid}
```
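
Since `fetch_one_video` accepts either `bvid` or `aid`, a small helper can pick the right parameter from a video URL. This is only a sketch: the function name is hypothetical, and the `av{number}` URL form is an assumption that does not appear in the list above.

```python
import re
from typing import Dict, Optional, Union

# ASSUMPTION: BV IDs are alphanumeric after the "BV" prefix; AV URLs look like /video/av123.
_BILIBILI_VIDEO_RE = re.compile(r"bilibili\.com/video/(?:(BV[0-9A-Za-z]+)|av(\d+))")


def video_params_from_url(url: str) -> Optional[Dict[str, Union[str, int]]]:
    """Build the query parameters for a video-detail call from a Bilibili video URL."""
    match = _BILIBILI_VIDEO_RE.search(url)
    if match is None:
        return None
    bvid, aid = match.groups()
    # Exactly one of the two groups matches, so prefer bvid and fall back to aid.
    return {"bvid": bvid} if bvid else {"aid": int(aid)}
```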
|
|
213
|
+
|
|
214
|
+
---
|
|
215
|
+
|
|
216
|
+
## Platform 5: Weibo
|
|
217
|
+
|
|
218
|
+
### Endpoint list
|
|
219
|
+
|
|
220
|
+
| Function | Endpoint | Main parameters |
|------|------|---------|
| Get user info | `GET /api/v1/weibo/app/fetch_user_info` | `uid` or `screen_name` |
| Get user posts | `GET /api/v1/weibo/web/fetch_user_posts` | `uid`, `page` |
| Get user videos | `GET /api/v1/weibo/web_v2/fetch_user_video_list` | `uid`, `page` |
| General search | `GET /api/v1/weibo/web/fetch_search` | `keyword`, `page` |
| Search users | `GET /api/v1/weibo/web_v2/fetch_user_search` | `keyword`, `page` |
| Search videos | `GET /api/v1/weibo/web_v2/fetch_video_search` | `keyword`, `page` |
| Hot search list | `GET /api/v1/weibo/app/fetch_hot_search` | None |
| AI trending search | `GET /api/v1/weibo/web_v2/fetch_ai_search` | `keyword` |
| Advanced search | `GET /api/v1/weibo/web_v2/fetch_advanced_search` | `keyword`, `page` |
|
|
231
|
+
|
|
232
|
+
### Response notes
|
|
233
|
+
|
|
234
|
+
- `uid` is the numeric ID; `screen_name` is the Weibo display name
- Paginate with `page` (starting at 1)
|
|
236
|
+
|
|
237
|
+
### ID extraction
|
|
238
|
+
|
|
239
|
+
```
uid:         https://weibo.com/u/{uid}
screen_name: https://weibo.com/{screen_name}  (use the display name as-is)
```
|
|
243
|
+
|
|
244
|
+
---
|
|
245
|
+
|
|
246
|
+
## Platform 6: YouTube
|
|
247
|
+
|
|
248
|
+
> ⚠️ **Search only for now**: there are no channel-detail or video-detail endpoints yet.
|
|
249
|
+
|
|
250
|
+
### Endpoint list
|
|
251
|
+
|
|
252
|
+
| Function | Endpoint | Main parameters |
|------|------|---------|
| General search | `GET /api/v1/youtube/web_v2/get_general_search` | `keyword` |
| Search channels | `GET /api/v1/youtube/web_v2/search_channels` | `keyword` |
| Search videos | `GET /api/v1/youtube/web/search_video` | `keyword` |
| Search Shorts | `GET /api/v1/youtube/web_v2/get_shorts_search` | `keyword` |
| Search suggestions | `GET /api/v1/youtube/web_v2/get_search_suggestions` | `keyword` |
|
|
259
|
+
|
|
260
|
+
---
|
|
261
|
+
|
|
262
|
+
## Platform 7: WeChat Official Accounts (WeChat MP)
|
|
263
|
+
|
|
264
|
+
### Endpoint list
|
|
265
|
+
|
|
266
|
+
| Function | Endpoint | Main parameters |
|------|------|---------|
| Get article details (JSON) | `GET /api/v1/wechat_mp/web/fetch_mp_article_detail_json` | `url` |
| Get article details (HTML) | `GET /api/v1/wechat_mp/web/fetch_mp_article_detail_html` | `url` |
| Get an account's article list | `GET /api/v1/wechat_mp/web/fetch_mp_article_list` | `url` or `biz` |
| Get article comments | `GET /api/v1/wechat_mp/web/fetch_mp_article_comment_list` | `url` |
| Get article read count | `GET /api/v1/wechat_mp/web/fetch_mp_article_read_count` | `url` |
| Get related articles | `GET /api/v1/wechat_mp/web/fetch_mp_related_articles` | `url` |
| Get article ads | `GET /api/v1/wechat_mp/web/fetch_mp_article_ad` | `url` |
|
|
275
|
+
|
|
276
|
+
### Response notes
|
|
277
|
+
|
|
278
|
+
- The main parameter for every endpoint is the article `url` (`mp.weixin.qq.com/s/...`)
- `biz` is the account's unique identifier, taken from the article URL: `__biz=MzI...`
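
Pulling `biz` out of an article URL is a plain query-string lookup. The sketch below uses only the standard library; the function name is hypothetical, and it assumes the URL carries `__biz` as a query parameter, which short `mp.weixin.qq.com/s/...` links do not.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_biz(article_url: str) -> Optional[str]:
    """Return the `__biz` query parameter of a WeChat article URL, or None if absent."""
    query = parse_qs(urlparse(article_url).query)
    values = query.get("__biz")
    return values[0] if values else None
```

When this returns `None`, passing the article `url` itself to the endpoint is the remaining option listed in the table.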
|
|
280
|
+
|
|
281
|
+
---
|
|
282
|
+
|
|
283
|
+
## Platform 8: WeChat Channels
|
|
284
|
+
|
|
285
|
+
### Endpoint list
|
|
286
|
+
|
|
287
|
+
| Function | Endpoint | Main parameters |
|------|------|---------|
| Search creators | `GET /api/v1/wechat_channels/fetch_user_search` | `keyword` |
| Search creators (v2) | `GET /api/v1/wechat_channels/fetch_user_search_v2` | `keyword` |
| General search | `GET /api/v1/wechat_channels/fetch_default_search` | `keyword` |
| Latest content search | `GET /api/v1/wechat_channels/fetch_search_latest` | `keyword` |
| Ordinary content search | `GET /api/v1/wechat_channels/fetch_search_ordinary` | `keyword` |
| Get video details | `GET /api/v1/wechat_channels/fetch_video_detail` | `video_id` or `url` |
| Get video comments | `GET /api/v1/wechat_channels/fetch_comments` | `video_id` |
| Home page content | `GET /api/v1/wechat_channels/fetch_home_page` | `username` |
| Hot words | `GET /api/v1/wechat_channels/fetch_hot_words` | None |
| Live stream history | `GET /api/v1/wechat_channels/fetch_live_history` | `username` |
|
|
299
|
+
|
|
300
|
+
---
|
|
301
|
+
|
|
302
|
+
## Platform 9: Twitter / X
|
|
303
|
+
|
|
304
|
+
### Endpoint list
|
|
305
|
+
|
|
306
|
+
| Function | Endpoint | Main parameters |
|------|------|---------|
| Get user profile | `GET /api/v1/twitter/web/fetch_user_profile` | `screen_name` or `user_id` |
| Get user tweets | `GET /api/v1/twitter/web/fetch_user_post_tweet` | `user_id`, `cursor` |
| Search tweets | `GET /api/v1/twitter/web/fetch_search_timeline` | `keyword`, `cursor` |
|
|
311
|
+
|
|
312
|
+
### Response notes
|
|
313
|
+
|
|
314
|
+
- `screen_name` is the @username (without the @); `user_id` is the numeric ID
- Call `fetch_user_profile` with `screen_name` first to obtain `user_id`, then paginate
|
|
316
|
+
|
|
317
|
+
### ID extraction
|
|
318
|
+
|
|
319
|
+
```
screen_name: https://twitter.com/{screen_name}
             https://x.com/{screen_name}
```
|
|
323
|
+
|
|
324
|
+
---
|
|
325
|
+
|
|
326
|
+
## Generic template for batch pagination
|
|
327
|
+
|
|
328
|
+
```python
import asyncio


async def fetch_all_pages(endpoint, base_params, cursor_key="cursor", data_key="list"):
    """Generic cursor-based pagination: collect the items from every page."""
    # `api_get` is the async request helper, assumed to be defined elsewhere in this skill.
    all_items = []
    cursor = None
    while True:
        params = {**base_params}
        if cursor:
            params[cursor_key] = cursor  # the first request is sent without a cursor
        result = await api_get(endpoint, params)
        # Tolerate a missing or null "data" envelope instead of raising.
        inner = (result.get("data") or {}).get("data") or {}
        items = inner.get(data_key) or []
        all_items.extend(items)
        has_more = inner.get("has_more", False)
        cursor = inner.get(cursor_key)
        if not has_more or not cursor:
            break
        await asyncio.sleep(0.5)  # short pause between pages to stay under the rate limit
    return all_items
```
|
|
348
|
+
|
|
349
|
+
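> ⚠️ `cursor_key` differs by platform: Xiaohongshu, TikTok, and Twitter use `cursor`; Douyin uses `max_cursor`; Bilibili and Weibo use `page` (an incrementing number). The template above reads the next cursor from the response, so for `page`-based endpoints the caller has to increment the page number instead.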
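
For the `page`-based platforms, a variant of the template that counts pages itself could look like this. It is a sketch under assumptions: the function name, the `max_pages` safety cap, and the "stop on the first empty page" rule are not from the original, `api_get` is passed in as any async callable taking `(endpoint, params)`, and the response is assumed to have the same `data.data[data_key]` shape as in the template.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

ApiGet = Callable[[str, Dict[str, Any]], Awaitable[Dict[str, Any]]]


async def fetch_all_pages_by_number(
    api_get: ApiGet,
    endpoint: str,
    base_params: Dict[str, Any],
    data_key: str = "list",
    page_key: str = "page",
    max_pages: int = 50,   # ASSUMPTION: safety cap so a misbehaving endpoint cannot loop forever
    delay: float = 0.5,
) -> List[Any]:
    """Request page 1, 2, 3, ... until a page comes back empty or max_pages is reached."""
    all_items: List[Any] = []
    for page in range(1, max_pages + 1):
        result = await api_get(endpoint, {**base_params, page_key: page})
        inner = (result.get("data") or {}).get("data") or {}
        items = inner.get(data_key) or []
        if not items:
            break  # ASSUMPTION: an empty page marks the end of the list
        all_items.extend(items)
        await asyncio.sleep(delay)
    return all_items
```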
|
|
350
|
+
|
|
351
|
+
---
|
|
352
|
+
|
|
353
|
+
## Common issues
|
|
354
|
+
|
|
355
|
+
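| Error | Likely cause | Fix |
|------|---------|---------|
| `400` | The endpoint needs extra parameters, or the content has been deleted | Try the share-link endpoint, or switch to a v2/v3 fallback endpoint |
| `401` | API key not configured | Check the `TIKHUB_API_KEY` environment variable |
| `429` | Rate limited | Add `await asyncio.sleep(0.5)` (0.5 to 1 second) between requests |
| Timeout | VPN/proxy conflict | Toggle the VPN; `api.tikhub.io` may need a direct connection |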
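
Beyond a fixed pause, a `429` can also be handled by retrying with a growing delay. The sketch below is one way to do that and is not part of the skill: `RateLimitedError` is a hypothetical exception that the caller's HTTP layer would raise on a `429`, and the retry count and doubling schedule are arbitrary choices.

```python
import asyncio
from typing import Any, Awaitable, Callable


class RateLimitedError(Exception):
    """Hypothetical: raised by the caller's HTTP layer when the API answers 429."""


async def call_with_backoff(
    make_request: Callable[[], Awaitable[Any]],
    retries: int = 3,
    base_delay: float = 0.5,
) -> Any:
    """Await make_request(); on RateLimitedError wait base_delay * 2**attempt and retry."""
    for attempt in range(retries + 1):
        try:
            return await make_request()
        except RateLimitedError:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            await asyncio.sleep(base_delay * (2 ** attempt))
```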
|
|
361
|
+
|
|
362
|
+
## API basics
|
|
363
|
+
|
|
364
|
+
- **Base URL**: `https://api.tikhub.io` (fallback for mainland China: `https://api.tikhub.dev`)
- **Auth**: `Authorization: Bearer {TIKHUB_API_KEY}`
- **Pricing**: $0.001 per request; see [TikHub Pricing](https://user.tikhub.io/dashboard/pricing)
- **Caching**: successful requests are cached for 24 hours; repeated requests incur no extra charge
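
Put together, the base URL and auth header translate into a request like the one built below. This is a minimal standard-library sketch: `build_request` is a hypothetical helper, and it only constructs the request object without sending it.

```python
import os
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://api.tikhub.io"  # swap in https://api.tikhub.dev if the primary host is unreachable


def build_request(
    endpoint: str,
    params: Dict[str, Any],
    api_key: Optional[str] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for a TikHub endpoint."""
    key = api_key or os.environ.get("TIKHUB_API_KEY")
    if not key:
        raise RuntimeError("TIKHUB_API_KEY is not set")
    url = f"{base_url}{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {key}"})
```

The returned object can be passed to `urllib.request.urlopen`, or its URL and headers reused with whichever HTTP client the skill already relies on.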
|