ebk 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of ebk might be problematic. Click here for more details.

ebk-0.1.0/MANIFEST.in ADDED
@@ -0,0 +1,3 @@
1
+ include README.md
2
+ include LICENSE
3
+ recursive-include ebk/streamlit *
ebk-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,457 @@
1
+ Metadata-Version: 2.2
2
+ Name: ebk
3
+ Version: 0.1.0
4
+ Summary: A lightweight tool for managing eBook metadata
5
+ Home-page: https://github.com/yourusername/ebk
6
+ Author: Alex Towell
7
+ Author-email: lex@metafunctor.com
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: Operating System :: OS Independent
10
+ Requires-Python: >=3.7
11
+ Description-Content-Type: text/markdown
12
+ Requires-Dist: streamlit
13
+ Requires-Dist: lxml
14
+ Requires-Dist: pandas
15
+ Requires-Dist: slugify
16
+ Requires-Dist: pyyaml
17
+ Requires-Dist: pathlib
18
+ Requires-Dist: PyPDF2
19
+ Requires-Dist: ebooklib
20
+ Requires-Dist: altair
21
+ Requires-Dist: Pillow
22
+ Dynamic: author
23
+ Dynamic: author-email
24
+ Dynamic: classifier
25
+ Dynamic: description
26
+ Dynamic: description-content-type
27
+ Dynamic: home-page
28
+ Dynamic: requires-dist
29
+ Dynamic: requires-python
30
+ Dynamic: summary
31
+
32
+ # ebk
33
+
34
+ ![ebk Logo](https://github.com/queelius/ebk/blob/main/logo.png?raw=true)
35
+
36
+ **ebk** is a lightweight and versatile tool for managing eBook metadata. It provides a rich Typer-based CLI (with colorized output courtesy of [Rich](https://github.com/Textualize/rich)), supports import/export of libraries from multiple sources (Calibre, raw ebooks, ZIP archives), enables advanced set-theoretic merges, and offers an interactive Streamlit web dashboard.
37
+
38
+ > **Note**: We have future plans to integrate Large Language Model (LLM) features for automated tagging, summarization, and metadata generation—stay tuned!
39
+
40
+ ---
41
+
42
+ ## Table of Contents
43
+
44
+ - [Features](#features)
45
+ - [Installation](#installation)
46
+ - [Configuration](#configuration)
47
+ - [CLI Usage](#cli-usage)
48
+ - [General CLI Structure](#general-cli-structure)
49
+ - [Importing Libraries](#importing-libraries)
50
+ - [Import from Zip (`import-zip`)](#import-from-zip-import-zip)
51
+ - [Import Calibre Library (`import-calibre`)](#import-calibre-library-import-calibre)
52
+ - [Import Raw Ebooks (`import-ebooks`)](#import-raw-ebooks-import-ebooks)
53
+ - [Exporting Libraries](#exporting-libraries)
54
+ - [Merging Libraries](#merging-libraries)
55
+ - [Searching](#searching)
56
+ - [Regex Search](#regex-search)
57
+ - [JMESPath Search](#jmespath-search)
58
+ - [Listing, Adding, Updating, and Removing Entries](#listing-adding-updating-and-removing-entries)
59
+ - [Launch Streamlit Dashboard](#launch-streamlit-dashboard)
60
+ - [Streamlit Dashboard Usage](#streamlit-dashboard-usage)
61
+ - [Library Management Class (Python API)](#library-management-class-python-api)
62
+ - [Future LLM Integration](#future-llm-integration)
63
+ - [Contributing](#contributing)
64
+ - [License](#license)
65
+ - [Known Issues & TODOs](#known-issues--todos)
66
+ - [Stay Updated](#stay-updated)
67
+ - [Support](#support)
68
+
69
+ ---
70
+
71
+ ## Features
72
+
73
+ - **Typer + Rich CLI**: A colorized, easy-to-use, and extensible command-line interface.
74
+ - **Multiple Import Paths**:
75
+ - Calibre libraries → JSON-based ebk library
76
+ - Raw eBook folders → Basic metadata inference (cover extraction, PDF metadata)
77
+ - Existing ebk libraries in `.zip` format
78
+ - **Advanced Metadata**:
79
+ - Set-theoretic merges (union, intersect, diff, symdiff)
80
+ - Unique entry identification (hash-based)
81
+ - Automatic cover image extraction
82
+ - **Flexible Exports**:
83
+ - Export to ZIP
84
+ - Hugo-compatible Markdown for static site integration
85
+ - **Streamlit Dashboard**:
86
+ - Interactive web interface for browsing, filtering, and managing your eBook library
87
+ - Search by title, author, subjects, language, etc.
88
+ - Download eBooks from the dashboard
89
+ - **Regex & JMESPath Searching**: Perform advanced queries on your metadata (CLI + Streamlit).
90
+ - **(Planned) LLM Extensions**: Automatic summarization, tagging, or classification using large language models.
91
+
92
+ ---
93
+
94
+ ## Installation
95
+
96
+ 1. **Clone the Repository**
97
+
98
+ ```bash
99
+ git clone https://github.com/queelius/ebk.git
100
+ cd ebk
101
+ ```
102
+
103
+ 2. **(Optional) Create a Virtual Environment**
104
+
105
+ Using `venv`:
106
+
107
+ ```bash
108
+ python -m venv venv
109
+ source venv/bin/activate # (On Windows: venv\Scripts\activate)
110
+ ```
111
+
112
+ Using `conda`:
113
+
114
+ ```bash
115
+ conda create -n ebk python=3.8
116
+ conda activate ebk
117
+ ```
118
+
119
+ 3. **Install Dependencies & `ebk`**
120
+
121
+ ```bash
122
+ pip install -r requirements.txt
123
+ pip install .
124
+ ```
125
+
126
+ > **Note**: You need Python 3.8+.
127
+
128
+ ---
129
+
130
+ ## Configuration
131
+
132
+ The primary configuration file should be placed in `~/.ebkrc`.
133
+ Here’s a sample configuration:
134
+
135
+ ```
136
+ [llm]
137
+ endpoint = <your_llm_endpoint>
138
+ api_key = <your_llm_api_key>
139
+ model = <your_llm_model>
140
+
141
+ [streamlit]
142
+ port = 8501
143
+ host = "0.0.0.0" # this allows external access
144
+
145
+ [export]
146
+ hugo = "/path/to/hugo_site"
147
+
148
+
149
+ ```
150
+
151
+ ## CLI Usage
152
+
153
+ ebk uses [Typer](https://typer.tiangolo.com/) under the hood, providing subcommands for imports, exports, merges, searches, listing, updates, etc. The CLI also leverages [Rich](https://github.com/Textualize/rich) for colorized/logging output.
154
+
155
+ ### General CLI Structure
156
+
157
+ ```
158
+ ebk --help
159
+ ebk <command> --help # see specific usage, options
160
+ ```
161
+
162
+ The primary commands include:
163
+ - `import-zip`
164
+ - `import-calibre`
165
+ - `import-ebooks`
166
+ - `export`
167
+ - `merge`
168
+ - `search`
169
+ - `stats`
170
+ - `list`
171
+ - `add`
172
+ - `remove`
173
+ - `remove-index`
174
+ - `update-index`
175
+ - `update-id`
176
+ - `dash`
177
+ - …and more!
178
+
179
+ ---
180
+
181
+ ### Importing Libraries
182
+
183
+ #### Import from Zip (`import-zip`)
184
+
185
+ Load an existing ebk library archive (which has a `metadata.json` plus eBook/cover files) into a folder:
186
+
187
+ ```bash
188
+ ebk import-zip /path/to/ebk_library.zip --output-dir /path/to/output
189
+ ```
190
+
191
+ - If `--output-dir` is omitted, the default will be derived from the zip filename.
192
+ - This unpacks the ZIP while retaining the `metadata.json` structure.
193
+
194
+ #### Import Calibre Library (`import-calibre`)
195
+
196
+ Convert your [Calibre](https://calibre-ebook.com/) library into an ebk JSON library:
197
+
198
+ ```bash
199
+ ebk import-calibre /path/to/calibre/library --output-dir /path/to/output
200
+ ```
201
+
202
+ - Extracts metadata from `metadata.opf` files (if present) or from PDF/EPUB fallback.
203
+ - Copies ebook files + covers into the output directory, producing a consolidated `metadata.json`.
204
+
205
+ #### Import Raw Ebooks (`import-ebooks`)
206
+
207
+ Import a folder of eBooks (PDF, EPUB, etc.) by inferring minimal metadata:
208
+
209
+ ```bash
210
+ ebk import-ebooks /path/to/raw/ebooks --output-dir /path/to/output
211
+ ```
212
+
213
+ - Uses PyPDF2 for PDF metadata and attempts a best-effort cover extraction (first page → thumbnail).
214
+ - Creates `metadata.json` and copies files + covers to `/path/to/output`.
215
+
216
+ ---
217
+
218
+ ### Exporting Libraries
219
+
220
+ Available formats:
221
+ - **Hugo**:
222
+ ```bash
223
+ ebk export hugo /path/to/ebk_library /path/to/hugo_site
224
+ ```
225
+ This writes Hugo-compatible Markdown files (and copies covers/ebooks) into your Hugo `content` + `static` folders.
226
+
227
+ - **Zip**:
228
+ ```bash
229
+ ebk export zip /path/to/ebk_library /path/to/export.zip
230
+ ```
231
+ Creates a `.zip` archive containing the entire library.
232
+
233
+ ---
234
+
235
+ ### Merging Libraries
236
+
237
+ Use set-theoretic operations to combine multiple ebk libraries:
238
+
239
+ ```bash
240
+ ebk merge <operation> /path/to/merged_dir [libs...]
241
+ ```
242
+
243
+ Where `<operation>` can be:
244
+ - `union`: Combine all unique entries
245
+ - `intersect`: Keep only entries common to all libraries
246
+ - `diff`: Keep entries present in the first library but not others
247
+ - `symdiff`: Entries in exactly one library (exclusive-or)
248
+
249
+ **Example**:
250
+
251
+ ```bash
252
+ ebk merge union /path/to/merged_lib /path/to/lib1 /path/to/lib2
253
+ ```
254
+
255
+ ---
256
+
257
+ ### Searching
258
+
259
+ #### Regex Search
260
+
261
+ ```bash
262
+ ebk search <regex> /path/to/ebk_library
263
+ ```
264
+
265
+ By default, it searches the `title` field. You can specify additional fields:
266
+
267
+ ```bash
268
+ ebk search "Python" /path/to/lib --regex-fields title creators
269
+ ```
270
+
271
+ #### JMESPath Search
272
+
273
+ For more powerful, structured searches:
274
+
275
+ ```bash
276
+ ebk search "[?language=='en']" /path/to/lib --jmespath
277
+ ```
278
+
279
+ JMESPath expressions allow you to filter, project fields, etc. If you want to see these results as JSON:
280
+
281
+ ```bash
282
+ ebk search "[?language=='en']" /path/to/lib --jmespath --json
283
+ ```
284
+
285
+ ---
286
+
287
+ ### Listing, Adding, Updating, and Removing Entries
288
+
289
+ - **List**:
290
+ ```bash
291
+ ebk list /path/to/lib
292
+ ```
293
+ Prints all ebooks with indexes, clickable file links (via Rich).
294
+
295
+ - **Add**:
296
+ ```bash
297
+ ebk add /path/to/lib --title "My Book" --creators "Alice" --ebooks "/path/to/book.pdf"
298
+ ```
299
+ or
300
+ ```bash
301
+ ebk add /path/to/lib --json /path/to/new_entries.json
302
+ ```
303
+ to bulk-add entries from a JSON file.
304
+
305
+ - **Update**:
306
+ - By index:
307
+ ```bash
308
+ ebk update-index /path/to/lib 12 --title "New Title"
309
+ ```
310
+ - By unique ID:
311
+ ```bash
312
+ ebk update-id /path/to/lib <unique_id> --cover /path/to/new_cover.jpg
313
+ ```
314
+
315
+ - **Remove**:
316
+ - By regex in `title`, `creators`, or `identifiers`:
317
+ ```bash
318
+ ebk remove /path/to/lib "SomeRegex" --apply-to title creators
319
+ ```
320
+ - By index:
321
+ ```bash
322
+ ebk remove-index /path/to/lib 3 4 5
323
+ ```
324
+ - By unique ID:
325
+ ```bash
326
+ ebk remove-id /path/to/lib <unique_id>
327
+ ```
328
+
329
+ - **Stats**:
330
+ ```bash
331
+ ebk stats /path/to/lib --keywords python data "machine learning"
332
+ ```
333
+ Returns aggregated statistics (common languages, top creators, subject frequency, etc.).
334
+
335
+ ---
336
+
337
+ ### Launch Streamlit Dashboard
338
+
339
+ ```bash
340
+ ebk dash --port 8501
341
+ ```
342
+
343
+ - By default, the dashboard runs at `http://localhost:8501`.
344
+
345
+ ---
346
+
347
+ ## Streamlit Dashboard Usage
348
+
349
+ 1. **Prepare a ZIP Archive**
350
+ From any ebk library folder (containing `metadata.json`), compress the entire folder into a `.zip`. Or use:
351
+ ```bash
352
+ ebk export zip /path/to/lib /path/to/lib.zip
353
+ ```
354
+
355
+ 2. **Upload it** via the Streamlit interface (`ebk dash`).
356
+ 3. **Browse & Filter** your library:
357
+ - Advanced filtering (author, subject, language, year, etc.).
358
+ - View cover images, descriptions, and download eBooks.
359
+ - JMESPath-based advanced search in the “Advanced Search” tab.
360
+ 4. **Enjoy** a modern, interactive interface for eBook exploration.
361
+
362
+ ---
363
+
364
+ ## Library Management Class (Python API)
365
+
366
+ For programmatic usage, `ebk` includes a simple `LibraryManager` class:
367
+
368
+ ```python
369
+ from ebk.manager import LibraryManager
370
+
371
+ manager = LibraryManager("metadata.json")
372
+
373
+ # List all books
374
+ all_books = manager.list_books()
375
+
376
+ # Add a book
377
+ manager.add_book({
378
+ "Title": "Example Book",
379
+ "Author": "Alice",
380
+ "Tags": "fiction"
381
+ })
382
+
383
+ # Delete or update
384
+ manager.delete_book("Old Title")
385
+ manager.update_book("Example Book", {"Tags": "fiction, fantasy"})
386
+ ```
387
+
388
+ ---
389
+
390
+ ## LLM Integration
391
+
392
+ The ebk library may be queried using a natural language interface using the
393
+ streamlit dashboard's chat interface or the command line. For the comamnd line
394
+ interface, the `llm` subcommand is used:
395
+
396
+ ```bash
397
+ ebk llm <ebklib> "What are the books about Python and machine learning published after 2020?"
398
+ ```
399
+
400
+ The `llm` subcommand uses the `ebk` library to answer questions about the library
401
+ using a large language model. The configuration file should contain the endpoint
402
+ of the LLM server, the API key, and the model to use. Either an Ollama compatible
403
+ endpoint or an OpenAI compatible endpoint can be used.
404
+
405
+ ---
406
+
407
+ ## Contributing
408
+
409
+ Contributions are welcome! Here’s how to get involved:
410
+
411
+ 1. **Fork the Repo**
412
+ 2. **Create a Branch** for your feature or fix
413
+ 3. **Commit & Push** your changes
414
+ 4. **Open a Pull Request** describing the changes
415
+
416
+ We appreciate code contributions, bug reports, and doc improvements alike.
417
+
418
+ ---
419
+
420
+ ## License
421
+
422
+ Distributed under the [MIT License](https://github.com/queelius/ebk/blob/main/LICENSE).
423
+
424
+ ---
425
+
426
+ ## Known Issues & TODOs
427
+
428
+ 1. **Exporter Module**:
429
+ - Switch from `os.system` to `shutil` for safer file operations
430
+ - Expand supported eBook formats & metadata fields
431
+ 2. **Merger Module**:
432
+ - Resolve conflicts automatically or allow user-specified conflict resolution
433
+ - Performance optimization for large libraries
434
+ 3. **Consistent Entry Identification**:
435
+ - Support multiple eBook files per entry seamlessly
436
+ - Improve hash-based deduplication for large files
437
+ 4. **LLM-Based Metadata** _(Planned)_:
438
+ - Summaries or tags automatically generated via language models
439
+ - Potential GPU/accelerator support for on-device inference
440
+
441
+ ---
442
+
443
+ ## Stay Updated
444
+
445
+ - **GitHub**: [https://github.com/queelius/ebk](https://github.com/queelius/ebk)
446
+ - **Website**: [https://metafunctor.com](https://metafunctor.com)
447
+
448
+ ---
449
+
450
+ ## Support
451
+
452
+ - **Issues**: [Open an Issue](https://github.com/queelius/ebk/issues) on GitHub
453
+ - **Contact**: <lex@metafunctor.com>
454
+
455
+ ---
456
+
457
+ Happy eBook managing! 📚✨