ituring-fetch 0.1.0__tar.gz
- ituring_fetch-0.1.0/.gitignore +12 -0
- ituring_fetch-0.1.0/.python-version +1 -0
- ituring_fetch-0.1.0/PKG-INFO +71 -0
- ituring_fetch-0.1.0/README.en.md +58 -0
- ituring_fetch-0.1.0/README.md +59 -0
- ituring_fetch-0.1.0/TODO.md +11 -0
- ituring_fetch-0.1.0/docs/superpowers/plans/2026-05-31-ituring-fetch-plan.md +1098 -0
- ituring_fetch-0.1.0/docs/superpowers/specs/2026-05-31-ituring-fetch-design.md +232 -0
- ituring_fetch-0.1.0/pyproject.toml +35 -0
- ituring_fetch-0.1.0/src/ituring_fetch/__init__.py +1 -0
- ituring_fetch-0.1.0/src/ituring_fetch/api.py +88 -0
- ituring_fetch-0.1.0/src/ituring_fetch/auth.py +147 -0
- ituring_fetch-0.1.0/src/ituring_fetch/cli.py +147 -0
- ituring_fetch-0.1.0/src/ituring_fetch/converter.py +151 -0
- ituring_fetch-0.1.0/src/ituring_fetch/debug.py +36 -0
- ituring_fetch-0.1.0/src/ituring_fetch/models.py +17 -0
- ituring_fetch-0.1.0/src/ituring_fetch/py.typed +0 -0
- ituring_fetch-0.1.0/src/ituring_fetch/scraper.py +79 -0
- ituring_fetch-0.1.0/tests/__init__.py +1 -0
- ituring_fetch-0.1.0/tests/test_api.py +86 -0
- ituring_fetch-0.1.0/tests/test_auth.py +67 -0
- ituring_fetch-0.1.0/tests/test_converter.py +43 -0
- ituring_fetch-0.1.0/tests/test_models.py +26 -0
- ituring_fetch-0.1.0/uv.lock +322 -0
@@ -0,0 +1 @@
+3.14
@@ -0,0 +1,71 @@
+Metadata-Version: 2.4
+Name: ituring-fetch
+Version: 0.1.0
+Summary: CLI tool to scrape purchased ebooks from ituring.com.cn
+Author-email: yms_hi <yms_hi@Outlook.com>
+Requires-Python: >=3.11
+Requires-Dist: click>=8
+Requires-Dist: httpx>=0.27
+Requires-Dist: playwright>=1.40
+Requires-Dist: rich>=13
+Description-Content-Type: text/markdown
+
+# ituring-fetch
+
+Grab ebooks you have purchased from 图灵社区 (ituring.com.cn) and convert them to epub/pdf/html.
+
+## Install
+
+```bash
+git clone <repo>
+cd ituring_fetch
+uv sync
+playwright install chromium  # or use the Chrome that ships with your system
+```
+
+Dependencies: Python 3.11+, pandoc (required for epub output), Chrome/Chromium.
+
+## Usage
+
+```bash
+# Log in (a browser window will pop up)
+uv run ituring-fetch login
+
+# Check whether you are logged in
+uv run ituring-fetch status
+
+# List purchased ebooks
+uv run ituring-fetch list
+
+# Fetch a book
+uv run ituring-fetch fetch 1143 --type=epub
+uv run ituring-fetch fetch 1143 --type=pdf
+uv run ituring-fetch fetch 1143 --type=html
+
+# Specify the output filename
+uv run ituring-fetch fetch 1143 --type=pdf -o mybook.pdf
+
+# Log out
+uv run ituring-fetch logout
+```
+
+The first column of the `list` output is the book ID.
+
+## Debugging
+
+When something goes wrong, set an environment variable to get detailed logs:
+
+```bash
+ITURING_DEBUG=1 uv run ituring-fetch fetch 1143 --type=pdf
+```
+
+Request/response details are printed to stderr.
+
+## How it works
+
+Login opens a browser with Playwright. After you log in manually, the program reads the access token from localStorage and saves it, together with the cookies, to `~/.ituring/credentials.json`.
+
+After that:
+- `list` and book info call the `api.ituring.com.cn` API directly, with a Bearer token
+- Chapter content is rendered as a page in headless Playwright, and the HTML of `.article-content` is extracted
+- All chapters are merged; pandoc produces the epub, and Playwright's built-in PDF engine produces the pdf
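The debugging section of the package description above says that setting `ITURING_DEBUG=1` prints request/response details to stderr. The contents of `debug.py` are not shown in this part of the diff, so the following is only a minimal sketch of how such a switch could work. The helper names `debug_enabled` and `debug`, the `[debug]` prefix, and the choice to treat `0` as "off" are all assumptions for illustration.

```python
import os
import sys


def debug_enabled() -> bool:
    """Return True when ITURING_DEBUG is set to something other than "" or "0".

    Treating "0" as disabled is an assumption, not documented behavior.
    """
    return os.environ.get("ITURING_DEBUG", "") not in ("", "0")


def debug(message: str) -> None:
    """Write one diagnostic line to stderr, but only when debugging is on."""
    if debug_enabled():
        # stderr keeps diagnostics separate from normal command output on stdout.
        print(f"[debug] {message}", file=sys.stderr)
```

Writing to stderr means the diagnostics can be captured on their own, for example with `2> debug.log`, without mixing into the regular output of `list` or `fetch`.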
@@ -0,0 +1,58 @@
+# ituring-fetch
+
+Download purchased ebooks from ituring.com.cn and convert to epub/pdf/html.
+
+## Install
+
+```bash
+git clone <repo>
+cd ituring_fetch
+uv sync
+playwright install chromium  # or use system Chrome
+```
+
+Requires Python 3.11+, pandoc (for epub output), Chrome/Chromium.
+
+## Usage
+
+```bash
+# Login (opens a browser window)
+uv run ituring-fetch login
+
+# Check login status
+uv run ituring-fetch status
+
+# List purchased ebooks
+uv run ituring-fetch list
+
+# Download a book
+uv run ituring-fetch fetch 1143 --type=epub
+uv run ituring-fetch fetch 1143 --type=pdf
+uv run ituring-fetch fetch 1143 --type=html
+
+# Custom output filename
+uv run ituring-fetch fetch 1143 --type=pdf -o mybook.pdf
+
+# Logout
+uv run ituring-fetch logout
+```
+
+The first column from `list` is the book ID.
+
+## Debugging
+
+Set `ITURING_DEBUG=1` to see request/response details on stderr:
+
+```bash
+ITURING_DEBUG=1 uv run ituring-fetch fetch 1143 --type=pdf
+```
+
+## How it works
+
+Login opens a browser via Playwright. After you sign in manually, the tool extracts the access token from localStorage and saves it alongside cookies to `~/.ituring/credentials.json`.
+
+From there:
+
+- `list` and book metadata use the `api.ituring.com.cn` REST API with Bearer auth
+- Chapter content is scraped by rendering each chapter page in headless Playwright and extracting `.article-content` innerHTML
+- All chapters are assembled into a single HTML document, then converted via pandoc (epub) or Playwright's built-in PDF engine (pdf)
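The "How it works" section above describes saving an access token to `~/.ituring/credentials.json` and then calling the API with Bearer auth. A rough sketch of that step, using only the standard library, might look like the following. The JSON layout is not shown in this part of the diff, so the `access_token` key and the function names `load_credentials` and `auth_headers` are hypothetical.

```python
import json
from pathlib import Path

# The path comes from the README; the file's JSON layout is an assumption.
CREDENTIALS_PATH = Path.home() / ".ituring" / "credentials.json"


def load_credentials(path: Path = CREDENTIALS_PATH) -> dict:
    """Read the saved credentials, or fail clearly if the user never logged in."""
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run `ituring-fetch login` first")
    return json.loads(path.read_text(encoding="utf-8"))


def auth_headers(credentials: dict) -> dict[str, str]:
    """Build the HTTP headers that carry the Bearer token.

    The "access_token" key name is hypothetical.
    """
    token = credentials["access_token"]
    return {"Authorization": f"Bearer {token}"}
```

The resulting dictionary could then be passed as the `headers` argument of an HTTP client such as httpx, which the package lists as a dependency.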
@@ -0,0 +1,59 @@
+# ituring-fetch
+
+Grab ebooks you have purchased from 图灵社区 (ituring.com.cn) and convert them to epub/pdf/html.
+
+## Install
+
+```bash
+git clone <repo>
+cd ituring_fetch
+uv sync
+playwright install chromium  # or use the Chrome that ships with your system
+```
+
+Dependencies: Python 3.11+, pandoc (required for epub output), Chrome/Chromium.
+
+## Usage
+
+```bash
+# Log in (a browser window will pop up)
+uv run ituring-fetch login
+
+# Check whether you are logged in
+uv run ituring-fetch status
+
+# List purchased ebooks
+uv run ituring-fetch list
+
+# Fetch a book
+uv run ituring-fetch fetch 1143 --type=epub
+uv run ituring-fetch fetch 1143 --type=pdf
+uv run ituring-fetch fetch 1143 --type=html
+
+# Specify the output filename
+uv run ituring-fetch fetch 1143 --type=pdf -o mybook.pdf
+
+# Log out
+uv run ituring-fetch logout
+```
+
+The first column of the `list` output is the book ID.
+
+## Debugging
+
+When something goes wrong, set an environment variable to get detailed logs:
+
+```bash
+ITURING_DEBUG=1 uv run ituring-fetch fetch 1143 --type=pdf
+```
+
+Request/response details are printed to stderr.
+
+## How it works
+
+Login opens a browser with Playwright. After you log in manually, the program reads the access token from localStorage and saves it, together with the cookies, to `~/.ituring/credentials.json`.
+
+After that:
+- `list` and book info call the `api.ituring.com.cn` API directly, with a Bearer token
+- Chapter content is rendered as a page in headless Playwright, and the HTML of `.article-content` is extracted
+- All chapters are merged; pandoc produces the epub, and Playwright's built-in PDF engine produces the pdf
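The last step in the README above merges the scraped chapters and hands the result to pandoc for epub output. The converter source is not shown in this part of the diff, so this is only a sketch of one way to do the merge. The function names `assemble_book` and `pandoc_epub_command` and the `<section>` wrapper are assumptions; the pandoc options `-f` and `-o` are standard pandoc flags.

```python
from html import escape


def assemble_book(title: str, chapters: list[tuple[str, str]]) -> str:
    """Join (chapter_title, chapter_html) pairs into one HTML document.

    Chapter bodies are inserted unchanged because they are already HTML;
    only the titles are escaped.
    """
    sections = [
        f"<section>\n<h1>{escape(name)}</h1>\n{body}\n</section>"
        for name, body in chapters
    ]
    return (
        "<!DOCTYPE html>\n"
        '<html>\n<head>\n<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n</head>\n<body>\n"
        + "\n".join(sections)
        + "\n</body>\n</html>\n"
    )


def pandoc_epub_command(html_path: str, epub_path: str) -> list[str]:
    """Argument list for converting the merged HTML to epub with pandoc.

    pandoc infers the epub output format from the .epub file extension.
    """
    return ["pandoc", html_path, "-f", "html", "-o", epub_path]
```

Returning the command as a list, rather than a shell string, lets it be passed to `subprocess.run` without any shell quoting concerns for filenames that contain spaces.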
@@ -0,0 +1,11 @@
+# Implementation Tasks
+
+- [x] Task 1: Project Setup + Models
+- [x] Task 2: Auth Module (cookie handling)
+- [x] Task 3: API Client
+- [x] Task 4: Auth Operations (login/logout/status)
+- [x] Task 5: CLI (login, logout, status, list)
+- [x] Task 6: Scraper (chapter content)
+- [x] Task 7: Converter (HTML + pandoc)
+- [x] Task 8: CLI fetch command
+- [x] Task 9: Integration Verification