@ariesfish/feedloom 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Quinn
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,282 @@
+ # Feedloom
+
+ Feedloom is a command-line tool for archiving long-form web content. It takes article URLs, URL list files, or RSS/Atom feeds, extracts readable article content, converts it to Markdown with YAML frontmatter, and saves page images as local assets. It is designed for personal knowledge bases, notebook vaults, and offline reading archives.
+
+ ## Features
+
+ - Accept one or more URLs directly from the command line.
+ - Extract URLs from text or Markdown files, with automatic deduplication.
+ - Expand RSS/Atom feeds and optionally filter entries by date.
+ - Clean article HTML and convert it to Markdown.
+ - Download and localize article images.
+ - Generate Markdown notes with `source`, `author`, and `created` frontmatter.
+ - Support static fetch, browser-rendered fetch, and stealth fetch modes.
+ - Optionally use a local Chrome profile for pages that require login state.
+ - Automatically mark Markdown checklist items as done after successful processing.
+
+ ## Requirements
+
+ - Node.js >= 24
+ - npm
+ - macOS, Linux, and Windows should all work; browser-based fetching depends on Patchright/Chromium.
+
+ ## Installation
+
+ ### 1. Clone the repository
+
+ ```bash
+ git clone <this-repository-url>
+ cd feedloom
+ ```
+
+ ### 2. Install dependencies
+
+ ```bash
+ npm install
+ ```
+
+ ### 3. Install the browser runtime
+
+ If you plan to use `browser`, `stealth`, or the browser fallback in `auto` mode, install the Patchright Chromium runtime:
+
+ ```bash
+ npx patchright install chromium
+ ```
+
+ ### 4. Build the CLI
+
+ ```bash
+ npm run build
+ ```
+
+ After building, run:
+
+ ```bash
+ node dist/cli.js --help
+ ```
+
+ During development, you can run the TypeScript source directly:
+
+ ```bash
+ npm run dev -- --help
+ ```
+
+ To make the CLI available globally on your machine:
+
+ ```bash
+ npm link
+ feedloom --help
+ ```
+
+ ## Quick Start
+
+ Archive a single article to the default `clippings/` directory:
+
+ ```bash
+ npm run dev -- "https://example.com/article"
+ ```
+
+ Write output to a custom directory:
+
+ ```bash
+ npm run dev -- --output-dir ./outputs "https://example.com/article"
+ ```
+
+ Use the built CLI:
+
+ ```bash
+ node dist/cli.js --output-dir ./outputs "https://example.com/article"
+ ```
+
+ The generated Markdown will look roughly like this:
+
+ ```markdown
+ ---
+ source: "https://example.com/article"
+ author: "Author Name"
+ created: "2026-04-29"
+ ---
+
+ # Article Title
+
+ Article content...
+ ```
+
+ Images are downloaded into an `assets/` subdirectory under the output directory and rewritten as local Markdown references.
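+
+ For example, a remote image in the article becomes a relative reference like the following (the filename shown is illustrative):
+
+ ```markdown
+ ![Figure caption](assets/figure-caption.png)
+ ```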
+
+ ## Input Methods
+
+ ### Pass multiple URLs directly
+
+ ```bash
+ npm run dev -- \
+   "https://example.com/a" \
+   "https://example.com/b"
+ ```
+
+ ### Read URLs from a file
+
+ `urls.md` can be a plain URL list or a Markdown checklist:
+
+ ```markdown
+ - [ ] https://example.com/a
+ - [ ] https://example.com/b
+ ```
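+
+ A plain URL list works too, for example one URL per line:
+
+ ```text
+ https://example.com/a
+ https://example.com/b
+ ```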
+
+ Run:
+
+ ```bash
+ npm run dev -- --output-dir ./outputs urls.md
+ ```
+
+ After a URL is processed successfully, the matching checklist item is updated automatically:
+
+ ```markdown
+ - [x] https://example.com/a
+ ```
+
+ ### Process RSS/Atom feeds
+
+ By default, `--source-kind auto` tries to detect whether the input is a normal HTML page or a feed. You can also specify the source kind explicitly:
+
+ ```bash
+ npm run dev -- --source-kind rss-feed --since 2026-01-01 "https://example.com/feed.xml"
+ ```
+
+ Useful slicing options:
+
+ ```bash
+ npm run dev -- --start 1 --end 10 "https://example.com/feed.xml"
+ npm run dev -- --limit 5 "https://example.com/feed.xml"
+ ```
+
+ ## Fetch Modes
+
+ Use `--fetch-mode` to control how pages are fetched:
+
+ | Mode | Description |
+ | --- | --- |
+ | `auto` | Default. Try static fetch first, then fall back to browser/stealth when content is insufficient. |
+ | `static` | Use plain HTTP fetching only. Fastest option for static pages. |
+ | `browser` | Render the page in a browser. Useful for JavaScript-heavy sites. |
+ | `stealth` | Use a more realistic browser context. Useful for sites with stronger bot detection. |
+
+ Examples:
+
+ ```bash
+ npm run dev -- --fetch-mode browser "https://example.com/article"
+ npm run dev -- --fetch-mode stealth --solve-cloudflare "https://example.com/article"
+ ```
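+
+ A `static` run uses the same pattern with the mode swapped:
+
+ ```bash
+ npm run dev -- --fetch-mode static "https://example.com/article"
+ ```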
+
+ ## Browser Options
+
+ Wait longer after page load:
+
+ ```bash
+ npm run dev -- --fetch-mode browser --wait-ms 5000 "https://example.com/article"
+ ```
+
+ Wait for a selector before extracting content:
+
+ ```bash
+ npm run dev -- --fetch-mode browser --wait-selector "article" "https://example.com/article"
+ ```
+
+ Click popups or expand buttons before extraction:
+
+ ```bash
+ npm run dev -- --fetch-mode browser --click-selector "button.accept" --click-selector ".expand" "https://example.com/article"
+ ```
+
+ Scroll to the bottom before extraction:
+
+ ```bash
+ npm run dev -- --fetch-mode browser --scroll-to-bottom "https://example.com/article"
+ ```
+
+ Use a proxy:
+
+ ```bash
+ npm run dev -- --fetch-mode stealth --proxy "http://127.0.0.1:7890" "https://example.com/article"
+ ```
+
+ Run with a visible browser window for debugging:
+
+ ```bash
+ npm run dev -- --fetch-mode browser --headful "https://example.com/article"
+ ```
+
+ ## Use Local Chrome Login State
+
+ For pages that require an authenticated browser session, you can try using your local Chrome profile:
+
+ ```bash
+ npm run dev -- \
+   --prefer-browser-state \
+   --chrome-user-data-dir "{CHROME_USER_DATA_DIR}" \
+   --chrome-profile "Default" \
+   --fetch-mode browser \
+   "https://example.com/member-only-article"
+ ```
+
+ Only use this on your own device and accounts. Always respect the target site's terms of service and copyright rules.
+
+ ## Common CLI Options
+
+ ```text
+ --output-dir <dir>              Markdown output directory. Default: clippings
+ --source-kind <kind>            auto, html-page, or rss-feed. Default: auto
+ --since <date>                  Keep only feed entries on or after YYYY-MM-DD
+ --limit <n>                     Process only the first N deduplicated URLs
+ --start <n>                     Start from the Nth deduplicated URL, 1-based
+ --end <n>                       End at the Nth deduplicated URL, 1-based; 0 means no upper bound
+ --fetch-mode <mode>             auto, static, browser, or stealth. Default: auto
+ --wait-ms <ms>                  Extra browser wait after load. Default: 2500
+ --wait-selector <selector>      Wait for a CSS selector
+ --click-selector <selector...>  Click one or more selectors after page load
+ --scroll-to-bottom              Scroll to the bottom before extraction
+ --headful                       Run with a visible browser window
+ --proxy <server>                Proxy server for browser/stealth fetch
+ --solve-cloudflare              In stealth mode, try to handle Cloudflare challenges
+ --disable-resources             In stealth mode, block images/media/fonts/stylesheets for speed
+ --prefer-browser-state          Try local Chrome user state first
+ --chrome-user-data-dir <path>   Chrome User Data directory
+ --chrome-profile <name>         Chrome profile name. Default: Default
+ ```
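+
+ These options compose. For example, a capped trial run over a feed before committing to the full batch:
+
+ ```bash
+ npm run dev -- \
+   --source-kind rss-feed \
+   --since 2026-01-01 \
+   --limit 3 \
+   --output-dir ./outputs \
+   "https://example.com/feed.xml"
+ ```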
+
+ For the full option list, run:
+
+ ```bash
+ npm run dev -- --help
+ ```
+
+ ## Development
+
+ ```bash
+ npm install
+ npm run build
+ npm run typecheck
+ npm test
+ ```
+
+ ## Tips and Notes
+
+ - Respect robots.txt, website terms of service, copyright, and rate limits.
+ - For dynamic pages, try `--fetch-mode browser` first.
+ - For static blogs and news sites, `--fetch-mode static` is usually faster.
+ - If article extraction is poor for a specific site, add or adjust a site rule in `src/site-rules/` (see the sketch after this list).
+ - For large batches, test with `--limit` before running the full job.
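+
+ The rule interface itself is defined in `src/site-rules/`; the sketch below is only a guess at the kind of module a rule might be (the `SiteRule` shape and its field names are assumptions for illustration, not the project's actual API):
+
+ ```ts
+ // Hypothetical sketch only; see src/site-rules/ for the real interface.
+ // A rule pairs a hostname test with extraction hints for that site.
+ interface SiteRule {
+   match: (url: URL) => boolean; // which pages the rule applies to
+   contentSelector: string;      // where the readable article body lives
+   removeSelectors?: string[];   // noise to strip before Markdown conversion
+ }
+
+ const exampleBlogRule: SiteRule = {
+   match: (url) => url.hostname === "blog.example.com",
+   contentSelector: "article .post-body",
+   removeSelectors: [".newsletter-signup", ".related-posts"],
+ };
+
+ export default exampleBlogRule;
+ ```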
+
+ ## Acknowledgements
+
+ Feedloom is inspired by several excellent open-source projects. Special thanks to:
+
+ - [Defuddle](https://github.com/kepano/defuddle), for high-quality readable content extraction ideas.
+ - [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright), for inspiring robust browser automation and realistic page access.
+ - [Scrapling](https://github.com/D4Vinci/Scrapling), for ideas around real browser contexts, anti-detection strategies, and resilient scraping fallbacks.
+
+ Thanks also to Linkedom, Turndown, Commander, Vitest, and the wider TypeScript ecosystem for the reliable building blocks.
+
+ ## License
+
+ MIT License
package/dist/cli.d.ts ADDED
@@ -0,0 +1 @@
+ #!/usr/bin/env node