estravon-backend 0.1.4__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- estravon_backend-0.1.4/LICENSE +22 -0
- estravon_backend-0.1.4/PKG-INFO +19 -0
- estravon_backend-0.1.4/README.md +143 -0
- estravon_backend-0.1.4/estravon/__init__.py +3 -0
- estravon_backend-0.1.4/estravon/__main__.py +5 -0
- estravon_backend-0.1.4/estravon/backends.py +702 -0
- estravon_backend-0.1.4/estravon/chunking.py +162 -0
- estravon_backend-0.1.4/estravon/content_stats.py +284 -0
- estravon_backend-0.1.4/estravon/initialization.py +192 -0
- estravon_backend-0.1.4/estravon/pipeline.py +435 -0
- estravon_backend-0.1.4/estravon/schema_registry.json +71 -0
- estravon_backend-0.1.4/estravon/server.py +382 -0
- estravon_backend-0.1.4/estravon_backend.egg-info/PKG-INFO +19 -0
- estravon_backend-0.1.4/estravon_backend.egg-info/SOURCES.txt +18 -0
- estravon_backend-0.1.4/estravon_backend.egg-info/dependency_links.txt +1 -0
- estravon_backend-0.1.4/estravon_backend.egg-info/entry_points.txt +2 -0
- estravon_backend-0.1.4/estravon_backend.egg-info/requires.txt +14 -0
- estravon_backend-0.1.4/estravon_backend.egg-info/top_level.txt +1 -0
- estravon_backend-0.1.4/pyproject.toml +41 -0
- estravon_backend-0.1.4/setup.cfg +4 -0
estravon_backend-0.1.4/LICENSE
@@ -0,0 +1,22 @@
+GNU AFFERO GENERAL PUBLIC LICENSE
+Version 3, 19 November 2007
+
+Copyright (C) 2026 Zotero Marker Contributors
+
+SPDX-License-Identifier: AGPL-3.0-or-later
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Affero General Public License as published
+by the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Affero General Public License for more details.
+
+You should have received a copy of the GNU Affero General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+The full license text is available at:
+https://www.gnu.org/licenses/agpl-3.0.txt
estravon_backend-0.1.4/PKG-INFO
@@ -0,0 +1,19 @@
+Metadata-Version: 2.4
+Name: estravon-backend
+Version: 0.1.4
+Summary: FastHTML backend for Zotero Book Markdown Extractor
+Requires-Python: >=3.11
+License-File: LICENSE
+Requires-Dist: python-fasthtml<1.0,>=0.12.0
+Requires-Dist: replicate<2.0,>=0.34.0
+Requires-Dist: httpx<1.0,>=0.27.0
+Requires-Dist: python-multipart<1.0,>=0.0.9
+Requires-Dist: python-dotenv<2.0,>=1.0.0
+Requires-Dist: pypdf[cryptography]>=4.0
+Requires-Dist: mistralai<3.0,>=2.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0; extra == "dev"
+Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
+Provides-Extra: nlp
+Requires-Dist: spacy>=3.7; extra == "nlp"
+Dynamic: license-file
estravon_backend-0.1.4/README.md
@@ -0,0 +1,143 @@
+# Estravon — self-hosted backend
+
+Extract nominated sections of a book PDF as Markdown and attach them directly
+to the Zotero item — synced, versioned, always co-located with the source.
+
+The extracted `.md` files include full provenance metadata (pages, backend,
+extraction date, schema version) and content statistics (word counts, vocabulary
+profile). They can be read in Zotero, searched, and fed into downstream tools.
+
+---
+
+## Two ways to use Estravon
+
+**Hosted (no setup):** visit [estravon.com](https://estravon.com), buy a credit pack,
+paste one API key into the plugin preferences, and start extracting. Nothing to install
+or maintain on your machine.
+
+**Self-hosted (this repo):** run the backend on your own machine with your own Mistral
+API key. AGPL-3.0. Full control, no ongoing cost beyond your Mistral usage.
+
+---
+
+## Prerequisites (self-hosted)
+
+- **Zotero 7.0** or newer
+- **Python 3.11+**
+- An API key for one of the supported extraction backends:
+
+| Backend | Pricing | Notes |
+|---|---|---|
+| [Mistral](https://console.mistral.ai/) | Pay-as-you-go | Default. No subscription needed. |
+| [Datalab](https://www.datalab.to/) | $25/month flat | Single-user subscription. Slightly higher throughput at peak. |
+| [Replicate](https://replicate.com/) | Pay-as-you-go | Runs the same Datalab model but without concurrency; lower idle cost. |
+
+Set `_ZM_BACKEND=mistral` (default), `_ZM_BACKEND=datalab`, or `_ZM_BACKEND=replicate`
+in your `.env` file to choose.
+
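The `_ZM_BACKEND` setting could be resolved at startup roughly as follows. This is a minimal sketch, not the package's actual code: the helper name `resolve_backend` and its validation behavior are assumptions for illustration. Only the three `_ZM_BACKEND` values and the key variable names (`MISTRAL_API_KEY`, `DATALAB_API_KEY`, `REPLICATE_API_TOKEN`) come from the README.

```python
import os
from typing import Mapping

# Backend name -> environment variable holding its credential.
# The variable names are the ones the README mentions.
KEY_VARS = {
    "mistral": "MISTRAL_API_KEY",
    "datalab": "DATALAB_API_KEY",
    "replicate": "REPLICATE_API_TOKEN",
}


def resolve_backend(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (backend, api_key). Hypothetical helper, not part of estravon."""
    # Fall back to "mistral", which the README names as the default.
    backend = env.get("_ZM_BACKEND", "mistral").strip().lower()
    if backend not in KEY_VARS:
        raise ValueError(f"unsupported _ZM_BACKEND: {backend!r}")
    key = env.get(KEY_VARS[backend], "")
    if not key:
        raise ValueError(f"{KEY_VARS[backend]} is not set")
    return backend, key
```

Passing the environment in as a mapping keeps the check easy to exercise without touching the real process environment.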
+---
+
+## Installation
+
+### 1 — Install the Zotero plugin
+
+Download `estravon-<version>.xpi` from the
+[latest GitHub Release](../../releases/latest).
+
+In Zotero: **Tools → Plugins → Install Add-on From File…** → select the `.xpi`.
+
+After the initial install the plugin auto-updates via Zotero's built-in update
+mechanism — no manual action needed for future releases.
+
+On first launch the plugin opens `estravon.com/start` in your browser; follow the
+"Make it yourself" path to set up the self-hosted backend.
+
+### 2 — Install and start the Python backend
+
+```bash
+pip install estravon-backend
+```
+
+Copy the example environment file and add your API key:
+
+```bash
+cp .env.example .env
+# Edit .env: set MISTRAL_API_KEY (or DATALAB_API_KEY / REPLICATE_API_TOKEN)
+# Optionally set _ZM_BACKEND=datalab or _ZM_BACKEND=replicate to switch backends
+```
+
+Start the backend:
+
+```bash
+estravon --port 7766
+```
+
+Keep this terminal running while you use the plugin. In Zotero → Settings → Estravon,
+confirm the status indicator is green.
+
+---
+
+## First extraction
+
+1. Open Zotero and right-click a **Book** item that has a PDF attachment.
+2. Select **Extract Section to Markdown…**
+3. Fill in the section name (e.g. `chapter_1`), page range (e.g. `1-40`),
+and extraction mode (`balanced` is a good default).
+4. Click **Extract** and wait. The backend calls the configured extraction API
+(the Mistral OCR API by default) and returns when done (typically 30–90 seconds for 40 pages).
+5. The extracted `.md` file and any images appear as child attachments on the
+Zotero item. An **Extraction log** child note records the provenance.
+
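The page range entered in step 3 (for example `1-40`) has to be turned into a first and last page before any extraction call. A minimal sketch of that parsing is shown here; `parse_page_range` is a hypothetical name, and accepting a single page such as `7` is an assumption rather than documented plugin behavior.

```python
def parse_page_range(spec: str) -> tuple[int, int]:
    """Parse '1-40' into (1, 40); a single page '7' becomes (7, 7).

    Hypothetical helper for illustration, not part of estravon.
    """
    first, sep, last = spec.strip().partition("-")
    start = int(first)                  # raises ValueError on non-numeric input
    end = int(last) if sep else start   # no dash means a one-page range
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {spec!r}")
    return start, end
```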
+---
+
+## Configuration
+
+Plugin preferences are in **Zotero → Settings → Estravon**:
+
+| Preference | Default | Description |
+|---|---|---|
+| Backend URL | `http://localhost:7766` | Self-hosted backend address |
+| Default chunk size | `80` | Pages per API call (reduce for large scanned books) |
+| Default mode | `balanced` | `fast` / `balanced` / `accurate` |
+
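To illustrate what "pages per API call" means for the chunk size preference, the sketch below splits an inclusive page range into consecutive chunks. It is an assumption about how such splitting might work, not the logic in the package's `chunking.py`; the function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(start: int, end: int, chunk_size: int = 80) -> list[tuple[int, int]]:
    """Split the inclusive range [start, end] into chunks of at most chunk_size pages.

    Hypothetical helper for illustration, not part of estravon.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunks = []
    first = start
    while first <= end:
        # Each chunk covers chunk_size pages, except possibly the last one.
        last = min(first + chunk_size - 1, end)
        chunks.append((first, last))
        first = last + 1
    return chunks
```

With the default of 80, a 200-page range would need three calls, while a 40-page chapter fits in one. Lowering the value trades more calls for smaller requests.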
+---
+
+## Checking backend health
+
+```bash
+curl http://localhost:7766/ping
+# {"status":"ok","state":"idle","backend":"mistral"}
+
+curl http://localhost:7766/status
+# {"state":"idle","state_since_s":12.3,"backend":"mistral","last_job":{}}
+```
+
+The `/status` endpoint shows the current server state (idle / running / error) and
+the details of the last job — useful for debugging without SSH access.
+
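The same `/status` check can be scripted. The sketch below uses only the Python standard library and assumes the response has the shape shown in the README sample (`state`, `state_since_s`, `backend`); the function names `summarize_status` and `fetch_status` are hypothetical.

```python
import json
from urllib.request import urlopen


def summarize_status(payload: str) -> str:
    """Turn a /status JSON body into a one-line summary."""
    data = json.loads(payload)
    state = data.get("state", "unknown")
    backend = data.get("backend", "unknown")
    since = data.get("state_since_s")
    # Only append the duration when the server reported a numeric value.
    suffix = f" for {since:.1f}s" if isinstance(since, (int, float)) else ""
    return f"{backend}: {state}{suffix}"


def fetch_status(base_url: str = "http://localhost:7766", timeout: float = 5.0) -> str:
    """Query a running backend; raises URLError if it is not reachable."""
    with urlopen(f"{base_url}/status", timeout=timeout) as resp:
        return summarize_status(resp.read().decode("utf-8"))
```

Keeping the parsing separate from the network call makes the summary logic checkable without a running backend.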
+---
+
+## Roadmap
+
+**v0.4.x (now):** Self-hosted vanilla backend + Zotero plugin. Uses Mistral OCR.
+
+**v0.5.x:** Plugin session telemetry (`/jobs/{id}/ack`); per-chunk quality scores.
+
+**Post-launch:** LLM agent that extracts all chapters from a book autonomously
+(requires the tools server — separate repository, not yet public).
+
+---
+
+## Issues and feedback
+
+Please report bugs and feature requests via
+[GitHub Issues](../../issues).
+
+Feedback on extraction quality for scanned books, non-English texts, or books with
+heavy mathematical notation is especially welcome.
+
+---
+
+## License
+
+[AGPL-3.0](LICENSE) — the same license as Zotero itself.