estravon-backend 0.1.4__tar.gz

@@ -0,0 +1,22 @@
1
+ GNU AFFERO GENERAL PUBLIC LICENSE
2
+ Version 3, 19 November 2007
3
+
4
+ Copyright (C) 2026 Zotero Marker Contributors
5
+
6
+ SPDX-License-Identifier: AGPL-3.0-or-later
7
+
8
+ This program is free software: you can redistribute it and/or modify
9
+ it under the terms of the GNU Affero General Public License as published
10
+ by the Free Software Foundation, either version 3 of the License, or
11
+ (at your option) any later version.
12
+
13
+ This program is distributed in the hope that it will be useful,
14
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
15
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16
+ GNU Affero General Public License for more details.
17
+
18
+ You should have received a copy of the GNU Affero General Public License
19
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
20
+
21
+ The full license text is available at:
22
+ https://www.gnu.org/licenses/agpl-3.0.txt
@@ -0,0 +1,19 @@
1
+ Metadata-Version: 2.4
2
+ Name: estravon-backend
3
+ Version: 0.1.4
4
+ Summary: FastHTML backend for Zotero Book Markdown Extractor
5
+ Requires-Python: >=3.11
6
+ License-File: LICENSE
7
+ Requires-Dist: python-fasthtml<1.0,>=0.12.0
8
+ Requires-Dist: replicate<2.0,>=0.34.0
9
+ Requires-Dist: httpx<1.0,>=0.27.0
10
+ Requires-Dist: python-multipart<1.0,>=0.0.9
11
+ Requires-Dist: python-dotenv<2.0,>=1.0.0
12
+ Requires-Dist: pypdf[cryptography]>=4.0
13
+ Requires-Dist: mistralai<3.0,>=2.0
14
+ Provides-Extra: dev
15
+ Requires-Dist: pytest>=8.0; extra == "dev"
16
+ Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
17
+ Provides-Extra: nlp
18
+ Requires-Dist: spacy>=3.7; extra == "nlp"
19
+ Dynamic: license-file
@@ -0,0 +1,143 @@
1
+ # Estravon — self-hosted backend
2
+
3
+ Extract nominated sections of a book PDF as Markdown and attach them directly
4
+ to the Zotero item — synced, versioned, always co-located with the source.
5
+
6
+ The extracted `.md` files include full provenance metadata (pages, backend,
7
+ extraction date, schema version) and content statistics (word counts, vocabulary
8
+ profile). They can be read in Zotero, searched, and fed into downstream tools.
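As a rough illustration of the content statistics mentioned above, the sketch below computes a word count and a simple vocabulary measure from Markdown text. The function name `content_stats`, the tokenization rule, and the returned field names are assumptions made for this example; the README does not say how Estravon computes its statistics.

```python
import re
from collections import Counter


def content_stats(markdown_text: str, top_n: int = 3) -> dict:
    """Return simple word statistics for a Markdown string (illustrative only)."""
    # Assumption: a "word" is a run of letters, optionally joined by apostrophes.
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", markdown_text.lower())
    counts = Counter(words)
    return {
        "word_count": len(words),          # total number of words
        "unique_words": len(counts),       # size of the vocabulary
        "most_common": counts.most_common(top_n),
    }
```

Calling `content_stats(open("chapter_1.md").read())` on an extracted file would give a quick sanity check that the extraction produced a plausible amount of text.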
9
+
10
+ ---
11
+
12
+ ## Two ways to use Estravon
13
+
14
+ **Hosted (no setup):** visit [estravon.com](https://estravon.com), buy a credit pack,
15
+ paste one API key into the plugin preferences, and start extracting. Nothing to install
16
+ or maintain on your machine.
17
+
18
+ **Self-hosted (this repo):** run the backend on your own machine with your own API
19
+ key (Mistral by default). Licensed under AGPL-3.0. Full control, no ongoing cost beyond what your extraction backend charges.
20
+
21
+ ---
22
+
23
+ ## Prerequisites (self-hosted)
24
+
25
+ - **Zotero 7.0** or newer
26
+ - **Python 3.11+**
27
+ - An API key for one of the supported extraction backends:
28
+
29
+ | Backend | Pricing | Notes |
30
+ |---|---|---|
31
+ | [Mistral](https://console.mistral.ai/) | Pay-as-you-go | Default. No subscription needed. |
32
+ | [Datalab](https://www.datalab.to/) | $25/month flat | Single-user subscription. Slightly higher throughput at peak. |
33
+ | [Replicate](https://replicate.com/) | Pay-as-you-go | Runs the same Datalab model but without concurrency; lower idle cost. |
34
+
35
+ Set `_ZM_BACKEND=mistral` (default), `_ZM_BACKEND=datalab`, or `_ZM_BACKEND=replicate`
36
+ in your `.env` file to choose.
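A minimal sketch of how the backend choice could be resolved from the environment, assuming the variable names given in this README (`_ZM_BACKEND`, `MISTRAL_API_KEY`, `DATALAB_API_KEY`, `REPLICATE_API_TOKEN`). The helper name `resolve_backend` and its error messages are hypothetical and are not taken from the Estravon source.

```python
import os

# Backend name -> environment variable that holds its credential (per this README).
API_KEY_VARS = {
    "mistral": "MISTRAL_API_KEY",
    "datalab": "DATALAB_API_KEY",
    "replicate": "REPLICATE_API_TOKEN",
}


def resolve_backend(env=None) -> tuple[str, str]:
    """Return (backend_name, api_key_variable), defaulting to Mistral."""
    env = os.environ if env is None else env
    # An unset or empty _ZM_BACKEND falls back to the documented default.
    name = (env.get("_ZM_BACKEND") or "mistral").strip().lower()
    if name not in API_KEY_VARS:
        raise ValueError(f"Unsupported backend {name!r}; expected one of {sorted(API_KEY_VARS)}")
    key_var = API_KEY_VARS[name]
    if not env.get(key_var):
        raise ValueError(f"{key_var} must be set when _ZM_BACKEND={name}")
    return name, key_var
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.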
37
+
38
+ ---
39
+
40
+ ## Installation
41
+
42
+ ### 1 — Install the Zotero plugin
43
+
44
+ Download `estravon-<version>.xpi` from the
45
+ [latest GitHub Release](../../releases/latest).
46
+
47
+ In Zotero: **Tools → Plugins → Install Add-on From File…** → select the `.xpi`.
48
+
49
+ After the initial install the plugin auto-updates via Zotero's built-in update
50
+ mechanism — no manual action needed for future releases.
51
+
52
+ On first launch the plugin opens `estravon.com/start` in your browser; follow the
53
+ "Make it yourself" path to set up the self-hosted backend.
54
+
55
+ ### 2 — Install and start the Python backend
56
+
57
+ ```bash
58
+ pip install estravon-backend
59
+ ```
60
+
61
+ Copy the example environment file and add your API key:
62
+
63
+ ```bash
64
+ cp .env.example .env
65
+ # Edit .env: set MISTRAL_API_KEY (or DATALAB_API_KEY / REPLICATE_API_TOKEN)
66
+ # Optionally set _ZM_BACKEND=datalab or _ZM_BACKEND=replicate to switch backends
67
+ ```
68
+
69
+ Start the backend:
70
+
71
+ ```bash
72
+ estravon --port 7766
73
+ ```
74
+
75
+ Keep this terminal running while you use the plugin. In Zotero → Settings → Estravon,
76
+ confirm the status indicator is green.
77
+
78
+ ---
79
+
80
+ ## First extraction
81
+
82
+ 1. Open Zotero and right-click a **Book** item that has a PDF attachment.
83
+ 2. Select **Extract Section to Markdown…**
84
+ 3. Fill in the section name (e.g. `chapter_1`), page range (e.g. `1-40`),
85
+ and extraction mode (`balanced` is a good default).
86
+ 4. Click **Extract** and wait. The backend calls the configured extraction API
87
+ (Mistral OCR by default) and returns when done (typically 30–90 seconds for 40 pages).
88
+ 5. The extracted `.md` file and any images appear as child attachments on the
89
+ Zotero item. An **Extraction log** child note records the provenance.
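The page range in step 3 is entered as text such as `1-40`. The sketch below shows one way such a string could be validated before a job is submitted. The function name `parse_page_range` and the decision to accept a single page number are assumptions for illustration; the plugin's actual validation rules are not documented here.

```python
def parse_page_range(text: str) -> tuple[int, int]:
    """Parse '1-40' into (1, 40); a lone number such as '7' becomes (7, 7)."""
    first_str, sep, last_str = text.strip().partition("-")
    first = int(first_str)                    # raises ValueError on non-numeric input
    last = int(last_str) if sep else first    # assumption: single pages are allowed
    if first < 1 or last < first:
        raise ValueError(f"Invalid page range: {text!r}")
    return first, last
```

For example, `parse_page_range("1-40")` yields the pair `(1, 40)`, while a reversed range such as `40-1` is rejected.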
90
+
91
+ ---
92
+
93
+ ## Configuration
94
+
95
+ Plugin preferences are in **Zotero → Settings → Estravon**:
96
+
97
+ | Preference | Default | Description |
98
+ |---|---|---|
99
+ | Backend URL | `http://localhost:7766` | Self-hosted backend address |
100
+ | Default chunk size | `80` | Pages per API call (reduce for large scanned books) |
101
+ | Default mode | `balanced` | `fast` / `balanced` / `accurate` |
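To make the chunk-size preference concrete, here is a sketch of how a page range might be split into chunks of at most 80 pages per API call. The function name `split_into_chunks` is hypothetical, and the backend's real chunking logic may differ (for example, in how it treats the final short chunk).

```python
def split_into_chunks(first_page: int, last_page: int, chunk_size: int = 80) -> list[tuple[int, int]]:
    """Split an inclusive page range into consecutive (start, end) chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunks = []
    start = first_page
    while start <= last_page:
        # The last chunk is shortened so it never runs past last_page.
        end = min(start + chunk_size - 1, last_page)
        chunks.append((start, end))
        start = end + 1
    return chunks
```

With the default of 80, a 200-page range becomes three calls covering pages 1 to 80, 81 to 160, and 161 to 200. Lowering the chunk size for large scanned books trades more API calls for smaller requests.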
102
+
103
+ ---
104
+
105
+ ## Checking backend health
106
+
107
+ ```bash
108
+ curl http://localhost:7766/ping
109
+ # {"status":"ok","state":"idle","backend":"mistral"}
110
+
111
+ curl http://localhost:7766/status
112
+ # {"state":"idle","state_since_s":12.3,"backend":"mistral","last_job":{}}
113
+ ```
114
+
115
+ The `/status` endpoint shows the current server state (idle / running / error) and
116
+ the details of the last job — useful for debugging without SSH access.
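The same check can be scripted. The sketch below uses only the Python standard library and relies on the `/ping` response shape shown above; the helper names and the rule that "ready" means `status == "ok"` and `state == "idle"` are assumptions for this example.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def is_ready(ping_payload: dict) -> bool:
    """Assumption: the backend can accept a job when it reports ok and idle."""
    return ping_payload.get("status") == "ok" and ping_payload.get("state") == "idle"


def check_backend(base_url: str = "http://localhost:7766") -> bool:
    """Return True if the backend at base_url answers /ping and looks ready."""
    try:
        return is_ready(fetch_json(f"{base_url}/ping"))
    except OSError:
        # Connection refused, timeout, DNS failure, or HTTP error status.
        return False
```

Calling `check_backend()` before starting a long batch avoids submitting work to a backend that is down or still busy.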
117
+
118
+ ---
119
+
120
+ ## Roadmap
121
+
122
+ **v0.4.x (now):** Self-hosted vanilla backend + Zotero plugin. Uses Mistral OCR.
123
+
124
+ **v0.5.x:** Plugin session telemetry (`/jobs/{id}/ack`); per-chunk quality scores.
125
+
126
+ **Post-launch:** LLM agent that extracts all chapters from a book autonomously
127
+ (requires the tools server — separate repository, not yet public).
128
+
129
+ ---
130
+
131
+ ## Issues and feedback
132
+
133
+ Please report bugs and feature requests via
134
+ [GitHub Issues](../../issues).
135
+
136
+ Feedback on extraction quality for scanned books, non-English texts, or books with
137
+ heavy mathematical notation is especially welcome.
138
+
139
+ ---
140
+
141
+ ## License
142
+
143
+ [AGPL-3.0](LICENSE) — the same license as Zotero itself.
@@ -0,0 +1,3 @@
1
+ """Estravon backend — PDF extraction pipeline."""
2
+
3
+ __version__ = "0.1.4"
@@ -0,0 +1,5 @@
1
+ """Entry point for ``python -m estravon``."""
2
+
3
+ from estravon.server import main
4
+
5
+ main()
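The README starts the backend with `estravon --port 7766`, but the body of `estravon.server.main` is not part of this diff. The sketch below shows one plausible way the `--port` flag could be parsed with `argparse`. The function name `build_parser`, the help text, and the default of 7766 (taken from the default backend URL in the README) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser with a single --port option (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="estravon",
        description="Start the Estravon self-hosted backend.",
    )
    parser.add_argument(
        "--port",
        type=int,
        default=7766,  # assumption: matches http://localhost:7766 from the README
        help="TCP port to listen on",
    )
    return parser
```

A `main()` function would then call `build_parser().parse_args()` and pass `args.port` to whatever starts the server.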