ituring-fetch 0.1.0__tar.gz

@@ -0,0 +1,12 @@
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv
+ *.pdf
+ *.epub
@@ -0,0 +1 @@
+ 3.14
@@ -0,0 +1,71 @@
+ Metadata-Version: 2.4
+ Name: ituring-fetch
+ Version: 0.1.0
+ Summary: CLI tool to scrape purchased ebooks from ituring.com.cn
+ Author-email: yms_hi <yms_hi@Outlook.com>
+ Requires-Python: >=3.11
+ Requires-Dist: click>=8
+ Requires-Dist: httpx>=0.27
+ Requires-Dist: playwright>=1.40
+ Requires-Dist: rich>=13
+ Description-Content-Type: text/markdown
+
+ # ituring-fetch
+
+ Download ebooks you have purchased from the Turing Community (ituring.com.cn) and convert them to epub/pdf/html.
+
+ ## Install
+
+ ```bash
+ git clone <repo>
+ cd ituring_fetch
+ uv sync
+ playwright install chromium  # or use the system's Chrome
+ ```
+
+ Dependencies: Python 3.11+, pandoc (needed for epub output), Chrome/Chromium.
+
+ ## Usage
+
+ ```bash
+ # Log in (opens a browser window)
+ uv run ituring-fetch login
+
+ # Check whether you are logged in
+ uv run ituring-fetch status
+
+ # List purchased ebooks
+ uv run ituring-fetch list
+
+ # Fetch a book
+ uv run ituring-fetch fetch 1143 --type=epub
+ uv run ituring-fetch fetch 1143 --type=pdf
+ uv run ituring-fetch fetch 1143 --type=html
+
+ # Specify the output filename
+ uv run ituring-fetch fetch 1143 --type=pdf -o mybook.pdf
+
+ # Log out
+ uv run ituring-fetch logout
+ ```
+
+ The first column of the `list` output is the book ID.
+
+ ## Debugging
+
+ When something goes wrong, set an environment variable to get detailed logs:
+
+ ```bash
+ ITURING_DEBUG=1 uv run ituring-fetch fetch 1143 --type=pdf
+ ```
+
+ This prints request/response details to stderr.
+
+ ## How it works
+
+ Login opens a browser with Playwright. After you sign in manually, the program reads the access token from localStorage and saves it, together with the cookies, to `~/.ituring/credentials.json`.
+
+ After that:
+ - `list` and book metadata call the `api.ituring.com.cn` API directly, with a Bearer token
+ - Chapter content is rendered in headless Playwright, and the HTML of `.article-content` is extracted
+ - All chapters are merged, then pandoc produces the epub and Playwright's built-in PDF engine produces the pdf
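Reviewer note: the credential handling described in this README could look roughly like the sketch below. It is not taken from the package source. The JSON field names (`access_token`, `cookies`), the helper names, and the file permission handling are all assumptions made for illustration; only the path `~/.ituring/credentials.json` and the use of a Bearer token come from the README.

```python
from __future__ import annotations

import json
import os
from pathlib import Path

# Location named in the README; everything else here is hypothetical.
DEFAULT_PATH = Path.home() / ".ituring" / "credentials.json"


def save_credentials(token: str, cookies: list[dict], path: Path = DEFAULT_PATH) -> None:
    """Write the token and cookies as JSON and restrict access to the owner."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"access_token": token, "cookies": cookies}  # assumed field names
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    os.chmod(path, 0o600)  # best effort; has limited effect on Windows


def load_credentials(path: Path = DEFAULT_PATH) -> dict | None:
    """Return the stored credentials, or None when nothing has been saved."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def auth_headers(credentials: dict) -> dict[str, str]:
    """Build the Bearer header that an API request would carry."""
    return {"Authorization": f"Bearer {credentials['access_token']}"}
```

Under these assumptions, a `status` command would only need to check whether `load_credentials()` returns something, and `logout` would delete the file.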
@@ -0,0 +1,58 @@
+ # ituring-fetch
+
+ Download purchased ebooks from ituring.com.cn and convert to epub/pdf/html.
+
+ ## Install
+
+ ```bash
+ git clone <repo>
+ cd ituring_fetch
+ uv sync
+ playwright install chromium  # or use system Chrome
+ ```
+
+ Requires Python 3.11+, pandoc (for epub output), Chrome/Chromium.
+
+ ## Usage
+
+ ```bash
+ # Login (opens a browser window)
+ uv run ituring-fetch login
+
+ # Check login status
+ uv run ituring-fetch status
+
+ # List purchased ebooks
+ uv run ituring-fetch list
+
+ # Download a book
+ uv run ituring-fetch fetch 1143 --type=epub
+ uv run ituring-fetch fetch 1143 --type=pdf
+ uv run ituring-fetch fetch 1143 --type=html
+
+ # Custom output filename
+ uv run ituring-fetch fetch 1143 --type=pdf -o mybook.pdf
+
+ # Logout
+ uv run ituring-fetch logout
+ ```
+
+ The first column from `list` is the book ID.
+
+ ## Debugging
+
+ Set `ITURING_DEBUG=1` to see request/response details on stderr:
+
+ ```bash
+ ITURING_DEBUG=1 uv run ituring-fetch fetch 1143 --type=pdf
+ ```
+
+ ## How it works
+
+ Login opens a browser via Playwright. After you sign in manually, the tool extracts the access token from localStorage and saves it alongside cookies to `~/.ituring/credentials.json`.
+
+ From there:
+
+ - `list` and book metadata use the `api.ituring.com.cn` REST API with Bearer auth
+ - Chapter content is scraped by rendering each chapter page in headless Playwright and extracting `.article-content` innerHTML
+ - All chapters are assembled into a single HTML document, then converted via pandoc (epub) or Playwright's built-in PDF engine (pdf)
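Reviewer note: the last step above (assemble the chapters into one HTML document, then hand it to pandoc for epub) might be sketched as follows. This is an illustration, not the package's converter. The function names, the `<section>` wrapper, and the choice of pandoc options are assumptions; the README only states that pandoc is used for epub output.

```python
from __future__ import annotations

import html
import subprocess
from pathlib import Path


def assemble_book(title: str, chapters: list[tuple[str, str]]) -> str:
    """Join (chapter_title, chapter_html) pairs into one HTML document."""
    sections = [
        f"<section>\n<h1>{html.escape(name)}</h1>\n{body}\n</section>"
        for name, body in chapters
    ]
    return (
        "<!DOCTYPE html>\n"
        '<html>\n<head>\n<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n</head>\n<body>\n"
        + "\n".join(sections)
        + "\n</body>\n</html>\n"
    )


def pandoc_command(source: Path, target: Path, title: str) -> list[str]:
    """Build a pandoc invocation for HTML to EPUB conversion."""
    return ["pandoc", str(source), "-f", "html", "-t", "epub",
            "-M", f"title={title}", "-o", str(target)]


def convert_to_epub(source: Path, target: Path, title: str) -> None:
    """Run pandoc; raises CalledProcessError if pandoc exits non-zero."""
    subprocess.run(pandoc_command(source, target, title), check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes the pandoc arguments easy to inspect without pandoc being installed. The chapter bodies are inserted unescaped because they are already HTML fragments.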
@@ -0,0 +1,59 @@
+ # ituring-fetch
+
+ Download ebooks you have purchased from the Turing Community (ituring.com.cn) and convert them to epub/pdf/html.
+
+ ## Install
+
+ ```bash
+ git clone <repo>
+ cd ituring_fetch
+ uv sync
+ playwright install chromium  # or use the system's Chrome
+ ```
+
+ Dependencies: Python 3.11+, pandoc (needed for epub output), Chrome/Chromium.
+
+ ## Usage
+
+ ```bash
+ # Log in (opens a browser window)
+ uv run ituring-fetch login
+
+ # Check whether you are logged in
+ uv run ituring-fetch status
+
+ # List purchased ebooks
+ uv run ituring-fetch list
+
+ # Fetch a book
+ uv run ituring-fetch fetch 1143 --type=epub
+ uv run ituring-fetch fetch 1143 --type=pdf
+ uv run ituring-fetch fetch 1143 --type=html
+
+ # Specify the output filename
+ uv run ituring-fetch fetch 1143 --type=pdf -o mybook.pdf
+
+ # Log out
+ uv run ituring-fetch logout
+ ```
+
+ The first column of the `list` output is the book ID.
+
+ ## Debugging
+
+ When something goes wrong, set an environment variable to get detailed logs:
+
+ ```bash
+ ITURING_DEBUG=1 uv run ituring-fetch fetch 1143 --type=pdf
+ ```
+
+ This prints request/response details to stderr.
+
+ ## How it works
+
+ Login opens a browser with Playwright. After you sign in manually, the program reads the access token from localStorage and saves it, together with the cookies, to `~/.ituring/credentials.json`.
+
+ After that:
+ - `list` and book metadata call the `api.ituring.com.cn` API directly, with a Bearer token
+ - Chapter content is rendered in headless Playwright, and the HTML of `.article-content` is extracted
+ - All chapters are merged, then pandoc produces the epub and Playwright's built-in PDF engine produces the pdf
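Reviewer note: the `ITURING_DEBUG` switch from the debugging section of this README could be implemented along the lines below. This is a guess at the shape, not the package's code. The helper names, the `[debug]` prefix, and the rule that an empty value or `0` counts as "off" are assumptions; the README only says that setting `ITURING_DEBUG=1` prints request/response details to stderr.

```python
import os
import sys


def debug_enabled(environ=None) -> bool:
    """True when ITURING_DEBUG is set to anything other than "" or "0" (assumed rule)."""
    env = os.environ if environ is None else environ
    return env.get("ITURING_DEBUG", "") not in ("", "0")


def debug(message: str, environ=None, stream=None) -> None:
    """Write one diagnostic line to stderr, but only when debug mode is on."""
    if debug_enabled(environ):
        out = sys.stderr if stream is None else stream
        print(f"[debug] {message}", file=out)
```

Writing to stderr keeps diagnostics separate from normal output such as the table printed by `list`, so the latter can still be piped or redirected cleanly.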
@@ -0,0 +1,11 @@
+ # Implementation Tasks
+
+ - [x] Task 1: Project Setup + Models
+ - [x] Task 2: Auth Module (cookie handling)
+ - [x] Task 3: API Client
+ - [x] Task 4: Auth Operations (login/logout/status)
+ - [x] Task 5: CLI (login, logout, status, list)
+ - [x] Task 6: Scraper (chapter content)
+ - [x] Task 7: Converter (HTML + pandoc)
+ - [x] Task 8: CLI fetch command
+ - [x] Task 9: Integration Verification