pistributer 0.2.0__tar.gz

@@ -0,0 +1,160 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction, and
10
+ distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by the copyright
13
+ owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all other entities
16
+ that control, are controlled by, or are under common control with that entity.
17
+ For the purposes of this definition, "control" means (i) the power, direct or
18
+ indirect, to cause the direction or management of such entity, whether by
19
+ contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the
20
+ outstanding shares, or (iii) beneficial ownership of such entity.
21
+
22
+ "You" (or "Your") shall mean an individual or Legal Entity exercising
23
+ permissions granted by this License.
24
+
25
+ "Source" form shall mean the preferred form for making modifications, including
26
+ but not limited to software source code, documentation source, and
27
+ configuration files.
28
+
29
+ "Object" form shall mean any form resulting from mechanical transformation or
30
+ translation of a Source form, including but not limited to compiled object
31
+ code, generated documentation, and conversions to other media types.
32
+
33
+ "Work" shall mean the work of authorship, whether in Source or Object form,
34
+ made available under the License, as indicated by a copyright notice that is
35
+ included in or attached to the work (an example is provided in the Appendix
36
+ below).
37
+
38
+ "Derivative Works" shall mean any work, whether in Source or Object form, that
39
+ is based on (or derived from) the Work and for which the editorial revisions,
40
+ annotations, elaborations, or other modifications represent, as a whole, an
41
+ original work of authorship. For the purposes of this License, Derivative Works
42
+ shall not include works that remain separable from, or merely link (or bind by
43
+ name) to the interfaces of, the Work and Derivative Works thereof.
44
+
45
+ "Contribution" shall mean any work of authorship, including the original version
46
+ of the Work and any modifications or additions to that Work or Derivative Works
47
+ thereof, that is intentionally submitted to Licensor for inclusion in the Work
48
+ by the copyright owner or by an individual or Legal Entity authorized to submit
49
+ on behalf of the copyright owner. For the purposes of this definition,
50
+ "submitted" means any form of electronic, verbal, or written communication sent
51
+ to the Licensor or its representatives, including but not limited to
52
+ communication on electronic mailing lists, source code control systems, and
53
+ issue tracking systems that are managed by, or on behalf of, the Licensor for
54
+ the purpose of discussing and improving the Work, but excluding communication
55
+ that is conspicuously marked or otherwise designated in writing by the copyright
56
+ owner as "Not a Contribution."
57
+
58
+ "Contributor" shall mean Licensor and any individual or Legal Entity on behalf
59
+ of whom a Contribution has been received by Licensor and subsequently
60
+ incorporated within the Work.
61
+
62
+ 2. Grant of Copyright License. Subject to the terms and conditions of this
63
+ License, each Contributor hereby grants to You a perpetual, worldwide,
64
+ non-exclusive, no-charge, royalty-free, irrevocable copyright license to
65
+ reproduce, prepare Derivative Works of, publicly display, publicly perform,
66
+ sublicense, and distribute the Work and such Derivative Works in Source or
67
+ Object form.
68
+
69
+ 3. Grant of Patent License. Subject to the terms and conditions of this License,
70
+ each Contributor hereby grants to You a perpetual, worldwide, non-exclusive,
71
+ no-charge, royalty-free, irrevocable (except as stated in this section) patent
72
+ license to make, have made, use, offer to sell, sell, import, and otherwise
73
+ transfer the Work, where such license applies only to those patent claims
74
+ licensable by such Contributor that are necessarily infringed by their
75
+ Contribution(s) alone or by combination of their Contribution(s) with the Work
76
+ to which such Contribution(s) was submitted. If You institute patent litigation
77
+ against any entity (including a cross-claim or counterclaim in a lawsuit)
78
+ alleging that the Work or a Contribution incorporated within the Work
79
+ constitutes direct or contributory patent infringement, then any patent
80
+ licenses granted to You under this License for that Work shall terminate as of
81
+ the date such litigation is filed.
82
+
83
+ 4. Redistribution. You may reproduce and distribute copies of the Work or
84
+ Derivative Works thereof in any medium, with or without modifications, and in
85
+ Source or Object form, provided that You meet the following conditions:
86
+
87
+ (a) You must give any other recipients of the Work or Derivative Works a copy
88
+ of this License; and
89
+
90
+ (b) You must cause any modified files to carry prominent notices stating that
91
+ You changed the files; and
92
+
93
+ (c) You must retain, in the Source form of any Derivative Works that You
94
+ distribute, all copyright, patent, trademark, and attribution notices from
95
+ the Source form of the Work, excluding those notices that do not pertain to
96
+ any part of the Derivative Works; and
97
+
98
+ (d) If the Work includes a "NOTICE" text file as part of its distribution, then
99
+ any Derivative Works that You distribute must include a readable copy of the
100
+ attribution notices contained within such NOTICE file, excluding those
101
+ notices that do not pertain to any part of the Derivative Works, in at least
102
+ one of the following places: within a NOTICE text file distributed as part
103
+ of the Derivative Works; within the Source form or documentation, if
104
+ provided along with the Derivative Works; or, within a display generated by
105
+ the Derivative Works, if and wherever such third-party notices normally
106
+ appear. The contents of the NOTICE file are for informational purposes only
107
+ and do not modify the License. You may add Your own attribution notices
108
+ within Derivative Works that You distribute, alongside or as an addendum to
109
+ the NOTICE text from the Work, provided that such additional attribution
110
+ notices cannot be construed as modifying the License.
111
+
112
+ You may add Your own copyright statement to Your modifications and may provide
113
+ additional or different license terms and conditions for use, reproduction, or
114
+ distribution of Your modifications, or for any such Derivative Works as a
115
+ whole, provided Your use, reproduction, and distribution of the Work otherwise
116
+ complies with the conditions stated in this License.
117
+
118
+ 5. Submission of Contributions. Unless You explicitly state otherwise, any
119
+ Contribution intentionally submitted for inclusion in the Work by You to the
120
+ Licensor shall be under the terms and conditions of this License, without any
121
+ additional terms or conditions. Notwithstanding the above, nothing herein shall
122
+ supersede or modify the terms of any separate license agreement you may have
123
+ executed with Licensor regarding such Contributions.
124
+
125
+ 6. Trademarks. This License does not grant permission to use the trade names,
126
+ trademarks, service marks, or product names of the Licensor, except as required
127
+ for reasonable and customary use in describing the origin of the Work and
128
+ reproducing the content of the NOTICE file.
129
+
130
+ 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in
131
+ writing, Licensor provides the Work (and each Contributor provides its
132
+ Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
133
+ KIND, either express or implied, including, without limitation, any warranties
134
+ or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
135
+ PARTICULAR PURPOSE. You are solely responsible for determining the
136
+ appropriateness of using or redistributing the Work and assume any risks
137
+ associated with Your exercise of permissions under this License.
138
+
139
+ 8. Limitation of Liability. In no event and under no legal theory, whether in
140
+ tort (including negligence), contract, or otherwise, unless required by
141
+ applicable law (such as deliberate and grossly negligent acts) or agreed to in
142
+ writing, shall any Contributor be liable to You for damages, including any
143
+ direct, indirect, special, incidental, or consequential damages of any
144
+ character arising as a result of this License or out of the use or inability to
145
+ use the Work (including but not limited to damages for loss of goodwill, work
146
+ stoppage, computer failure or malfunction, or any and all other commercial
147
+ damages or losses), even if such Contributor has been advised of the
148
+ possibility of such damages.
149
+
150
+ 9. Accepting Warranty or Additional Liability. While redistributing the Work or
151
+ Derivative Works thereof, You may choose to offer, and charge a fee for,
152
+ acceptance of support, warranty, indemnity, or other liability obligations
153
+ and/or rights consistent with this License. However, in accepting such
154
+ obligations, You may act only on Your own behalf and on Your sole
155
+ responsibility, not on behalf of any other Contributor, and only if You agree
156
+ to indemnify, defend, and hold each Contributor harmless for any liability
157
+ incurred by, or claims asserted against, such Contributor by reason of your
158
+ accepting any such warranty or additional liability.
159
+
160
+ END OF TERMS AND CONDITIONS
@@ -0,0 +1,333 @@
1
+ Metadata-Version: 2.4
2
+ Name: pistributer
3
+ Version: 0.2.0
4
+ Summary: A local-first file-backed FIFO queue with jsonl, txt, and sqlite drivers.
5
+ Author: Yeong
6
+ License-Expression: Apache-2.0
7
+ Project-URL: Homepage, https://github.com/yeongdev/pistributer
8
+ Project-URL: Repository, https://github.com/yeongdev/pistributer
9
+ Project-URL: Issues, https://github.com/yeongdev/pistributer/issues
10
+ Project-URL: Changelog, https://github.com/yeongdev/pistributer/blob/main/CHANGELOG.md
11
+ Keywords: queue,file,fifo,jsonl,streaming
12
+ Classifier: Development Status :: 3 - Alpha
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3 :: Only
16
+ Classifier: Programming Language :: Python :: 3.9
17
+ Classifier: Programming Language :: Python :: 3.10
18
+ Classifier: Programming Language :: Python :: 3.11
19
+ Classifier: Programming Language :: Python :: 3.12
20
+ Classifier: Topic :: Software Development :: Libraries
21
+ Requires-Python: >=3.9
22
+ Description-Content-Type: text/markdown
23
+ License-File: LICENSE
24
+ Dynamic: license-file
25
+
26
+ # pistributer
27
+
28
+ `pistributer` is a local-first FIFO queue toolkit for file-based workflows.
29
+
30
+ **Hard position:** `pistributer` exists for developers who want a usable queue across servers or local jobs without standing up Redis, Kafka, or another heavy service.
31
+
32
+ **Performance contract:** this project prefers faster write/read throughput over a larger or more polished interface. The core value is still the same: write, read, high concurrency, multi-file workloads, and as little extra overhead as possible.
33
+
34
+ It started as a high-throughput local queue used on large datasets, and it now ships three practical drivers instead of pretending one storage format is ideal for every workload:
35
+
36
+ - `txt` for the shortest raw file path
37
+ - `jsonl` for structured, inspectable records
38
+ - `sqlite` for stronger integrity under contention
39
+
40
+ ## When to use each driver
41
+
42
+ | Driver | Use it when | Avoid it when |
43
+ | --- | --- | --- |
44
+ | `txt` | You want the shortest plain-text file path and staged or single-writer-friendly queueing | You need structured records or stronger overlap correctness |
45
+ | `jsonl` | You want readable structured payloads and the default publishable workflow | You need the rawest append path or heavy overlapping write/read integrity |
46
+ | `sqlite` | You want stronger local correctness under contention | You want the lightest append-heavy file path |
47
+
48
+ ## Project background
49
+
50
+ The earliest version of this tool was not created as a polished open-source package. It was created by a developer who wanted a queue but did not want to install or operate a heavier system such as Redis or Kafka.
51
+
52
+ The original approach was simple: use the filesystem itself as the message layer and use plain `.txt` files for data exchange. That early code was not cleaned up for publication, but the underlying queue logic was already useful in practice and was used on large amounts of real data.
53
+
54
+ Later, after the developer started building AI-oriented workflows with `nanobot`, the need for a cleaner installable package became more obvious. That is what pushed the repackaging effort: not a rewrite for the sake of novelty, but a practical need for `pip` installability, clearer docs, and a structured default format.
55
+
56
+ That is why the project now has two historical truths at the same time:
57
+
58
+ - the `0.1.x` line represents long-used file-queue logic that existed before the packaging cleanup
59
+ - the `0.2.0` line makes `.jsonl` the default public driver and adds wider tests, benchmarks, and release tooling
60
+
61
+ The current repository keeps that history visible instead of hiding it. The old behavior is preserved for comparison in `benchmarks/pistributer_bak.py`, while the main package is documented and tested as the public installable tool.
62
+
63
+ ## `0.1.x` and `0.2.0` in practice
64
+
65
+ The simplest way to understand the version difference is this:
66
+
67
+ - `0.1.x` means the long-used plain-text file-queue era
68
+ - `0.2.0` means the public packaged era with `.jsonl` as the default driver
69
+
70
+ What changed:
71
+
72
+ - `0.1.x` was centered on the shortest `.txt` path and real-world usage before packaging cleanup
73
+ - `0.2.0` keeps the same basic file-queue idea but makes `.jsonl` the default public format
74
+ - `0.2.0` adds tests, benchmark documentation, packaging, and a clearer boundary for when to use `txt`, `jsonl`, and `sqlite`
75
+
76
+ What did **not** change:
77
+
78
+ - the project still values write/read throughput over interface growth
79
+ - the core file-queue idea is still append, rotate into `.in_use`, and consume with a small API
80
+ - the hot path is still expected to stay small and fast
81
+
82
+ Measured conclusion from the current benchmark work:
83
+
84
+ - the visible slowdown from `0.1.x` to `0.2.0` is real, but it is not mostly caused by the `.jsonl` extension itself
85
+ - most of the remaining overhead comes from the modern hot path doing a little more work around path validation and file-operation safeguards
86
+ - JSON serialization adds a smaller extra cost when callers pass Python objects instead of already-serialized strings
87
+
88
+ In short: `0.2.0` is slower than the historical plain-text path, but the gap is now mostly the cost of a cleaner public default and structured serialization, not a dramatic penalty from JSONL as a format.
89
+
90
+ ## Design priorities
91
+
92
+ The author's preference is very explicit:
93
+
94
+ 1. keep the queue fast under heavy write/read workloads
95
+ 2. keep the public surface small
96
+ 3. avoid changes that reduce throughput unless the trade-off is clearly worth it
97
+
98
+ That means this project does **not** try to win by offering a large interface. It tries to win by doing a very small number of things well:
99
+
100
+ - append records fast
101
+ - read records fast
102
+ - handle many files
103
+ - stay useful under high write pressure
104
+
105
+ The current `.jsonl` default is accepted even though it is slower than the historical `.txt` path, because the measured overhead is still within an acceptable range for the intended workflows. Outside of that specific trade-off, performance regressions are treated as unacceptable.
106
+
107
+ ## Project position
108
+
109
+ `pistributer` is best positioned as a lightweight local queue for scripts, batch jobs, and single-host pipelines.
110
+
111
+ It is a good fit when:
112
+
113
+ - your workload is file-centric
114
+ - you want simple deployment with no external service
115
+ - you care about readable queue state on disk
116
+ - you want to choose between throughput-first and integrity-first local drivers
117
+ - your file-driver usage is staged or single-writer-friendly
118
+
119
+ It is not the best fit when:
120
+
121
+ - you need a multi-node distributed queue
122
+ - you need strict correctness under heavy overlapping readers and writers with the file drivers
123
+ - you want managed persistence, replication, or cross-host coordination
124
+
125
+ ## Why use it
126
+
127
+ Use `pistributer` when you want a small queue abstraction for a local or single-host pipeline without introducing Redis, Kafka, or a separate database service.
128
+
129
+ What it is good at:
130
+
131
+ - append-heavy local workloads
132
+ - high-concurrency write-focused workloads
133
+ - simple batch pipelines and scripts
134
+ - readable on-disk data formats
135
+ - lightweight deployment with minimal operational overhead
136
+
137
+ The file-based drivers rotate active data into `.in_use`, which helps reduce direct producer/consumer contention on the same file.
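
The rotation idea can be illustrated with the standard library alone. This is a minimal sketch, not the package's implementation: the helper names (`append_line`, `claim_for_reading`, `read_lines`) and the `<path>.in_use` naming are assumptions made for illustration.

```python
import os
import tempfile


def append_line(path, line):
    """Producer side: append one record to the active file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


def claim_for_reading(path):
    """Consumer side: rename the active file so later appends start a fresh one.

    The '<path>.in_use' suffix is an assumed naming scheme for this sketch.
    """
    in_use = path + ".in_use"
    if not os.path.exists(in_use) and os.path.exists(path):
        os.replace(path, in_use)
    return in_use if os.path.exists(in_use) else None


def read_lines(path):
    """Return the records stored in a file, one per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "channel.jsonl")
    append_line(path, "a")
    append_line(path, "b")
    claimed = claim_for_reading(path)   # consumer takes over the current data
    append_line(path, "c")              # producer keeps writing to a new file
    consumed = read_lines(claimed)
    pending = read_lines(path)
```

In this sketch the check-then-rename step is not atomic across processes, which is consistent with treating the pattern as best suited to staged workloads.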
138
+
139
+ The two file drivers, `Pistributer` and `PistributerTxt`, are best described as staged or single-writer-friendly. They work well when appends and reads are mostly separated, but they are not the strongest option under heavy overlapping writer and reader contention.
140
+
141
+ For hot-path performance, `put()` in the two file drivers assumes the parent directory already exists. Directory preparation is intentionally kept outside the hot path.
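
One way to work with that contract is to create the queue directory once at startup and then hand the resulting path to `put()`. A minimal sketch, assuming only the standard library; `prepare_queue_path` and the directory names are hypothetical and not part of the package:

```python
import tempfile
from pathlib import Path


def prepare_queue_path(directory, filename):
    """Create the parent directory once, before any appends, and return the queue path."""
    queue_dir = Path(directory)
    queue_dir.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
    return str(queue_dir / filename)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    queue_path = prepare_queue_path(Path(tmp) / "queues" / "jobs", "channel.jsonl")
    parent_ready = Path(queue_path).parent.is_dir()
```

The returned path would then be passed to calls such as `Pistributer.put(queue_path, {"value": "hello"})`, so the append loop itself never touches directory creation.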
142
+
143
+ The intent is not to keep adding more actions and more surface area. The intent is to keep the important path small: write, read, and move a lot of data without unnecessary slowdown.
144
+
145
+ ## Driver modes
146
+
147
+ ### `jsonl` driver
148
+
149
+ ```python
150
+ from pistributer import Pistributer
151
+ ```
152
+
153
+ Use this as the default mode when you want structured records and a modern `.jsonl` workflow.
154
+
155
+ ### `txt` driver
156
+
157
+ ```python
158
+ from pistributer_txt import PistributerTxt
159
+ ```
160
+
161
+ Use this when raw text throughput matters more than structure.
162
+
163
+ ### `sqlite` driver
164
+
165
+ ```python
166
+ from pistributer_sqlite import PistributerSqlite
167
+ ```
168
+
169
+ Use this when queue correctness matters more than the shortest append path.
170
+
171
+ See `DRIVERS.md` for a full comparison.
172
+
173
+ See `EXAMPLES.md` for small copyable examples for all three drivers.
174
+
175
+ The Python docstrings are written to be `help()`-friendly, so `help(Pistributer)`, `help(Pistributer.put)`, and the equivalent driver methods should now give usable inline guidance.
176
+
177
+ ## Choose a driver
178
+
179
+ | Driver | Best for | Main trade-off |
180
+ | --- | --- | --- |
181
+ | `Pistributer` (`jsonl`) | Structured local queues and readable payloads | More overhead than raw text |
182
+ | `PistributerTxt` | Fast plain-text append-heavy workloads | Least structured format |
183
+ | `PistributerSqlite` | Stronger local correctness under contention | Higher transaction overhead |
184
+
185
+ ## API naming note
186
+
187
+ Public API names stay stable on purpose.
188
+
189
+ - `Pistributer` and `PistributerTxt` keep the historical camelCase names such as `isEmpty()` and `getIndex()`.
190
+ - `PistributerSqlite` keeps its newer snake_case method `is_empty()`.
191
+
192
+ The naming is not perfectly uniform, but the project keeps the existing public contract instead of breaking working code.
193
+
194
+ ## Install
195
+
196
+ ```bash
197
+ pip install pistributer
198
+ ```
199
+
200
+ ## Quick start
201
+
202
+ ```python
203
+ from pistributer import Pistributer
204
+
205
+ Pistributer.put("channel.jsonl", {"value": "hello"})
206
+ Pistributer.put("channel.jsonl", {"value": "world"})
207
+
208
+ queue = Pistributer("channel.jsonl")
209
+
210
+ print(queue.next())
211
+ print(queue.next())
212
+ print(queue.isEmpty())
213
+ ```
214
+
215
+ ## Core API
216
+
217
+ The practical core of the project is intentionally small: append with `put()` and consume with `next()`.
218
+
219
+ The other methods exist to support queue lifecycle and compatibility, not to turn the library into a broad interface framework.
220
+
221
+ - `Pistributer(path)`: open a queue backed by a `.jsonl` file
222
+ - `Pistributer.put(path, value)`: append one message; the parent directory must already exist
223
+ - `Pistributer.new(path, value, overwrite=False, sep="")`: create a file with initial content
224
+ - `Pistributer.next()`: return the next message, or raise `StopIteration` if empty
225
+ - `Pistributer.isEmpty()`: return whether the queue has no unread messages left
226
+ - `Pistributer.size()`: count queued messages across active files
227
+ - `Pistributer.remaining()`: count unread messages
228
+ - `Pistributer.getIndex()`: return the consumed message count
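
Because `next()` signals exhaustion by raising `StopIteration`, a consumer typically wraps it in a small drain loop. The sketch below assumes only that documented contract; `drain` is a hypothetical helper, and `ListQueue` is a stand-in used so the example runs without the package installed.

```python
def drain(queue):
    """Collect messages until the queue raises StopIteration."""
    items = []
    while True:
        try:
            items.append(queue.next())
        except StopIteration:
            return items


class ListQueue:
    """Stand-in with the same next() contract; not part of pistributer."""

    def __init__(self, items):
        self._items = list(items)
        self._index = 0

    def next(self):
        if self._index >= len(self._items):
            raise StopIteration
        item = self._items[self._index]
        self._index += 1
        return item


messages = drain(ListQueue([{"value": "hello"}, {"value": "world"}]))
```

With the real driver, the same helper would be called as `drain(Pistributer("channel.jsonl"))`.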
229
+
230
+ ## Validation
231
+
232
+ The repository includes both correctness tests and direct benchmarks against the preserved backup implementation at `benchmarks/pistributer_bak.py`.
233
+
234
+ ### Functional tests
235
+
236
+ The main regression test writes `300` `.jsonl` files, each with `30` distinct JSON rows, and then consumes everything through `next()`.
237
+
238
+ That covers `9000` records and checks:
239
+
240
+ - FIFO ordering
241
+ - empty-queue behavior
242
+ - remaining-count behavior
243
+ - reopen/index persistence
244
+ - rotated `.in_use` files plus newly appended data
245
+
246
+ Run the test suite with:
247
+
248
+ ```bash
249
+ python -m unittest discover -s tests -v
250
+ ```
251
+
252
+ ### Staged benchmark
253
+
254
+ The staged benchmark compares:
255
+
256
+ - `benchmarks/pistributer_bak.py` with `300` `.txt` files × `30` rows
257
+ - `pistributer.py` with `300` `.jsonl` files × `30` rows
258
+
259
+ Latest measured result in this workspace:
260
+
261
+ - backup total: about `0.956s`
262
+ - current total: about `1.020s`
263
+ - current write ratio: about `1.18x` of backup
264
+ - current read ratio: about `1.02x` of backup
265
+ - current total ratio: about `1.07x` of backup
266
+
267
+ In this staged workload, the current `jsonl` rewrite is slightly slower than the backup `txt` path, which is expected because the structured driver pays extra serialization and validation overhead.
268
+
269
+ That overhead is currently accepted because the repository intentionally chose `.jsonl` as the public default. Beyond that known trade-off, the project should resist changes that make the hot path slower.
270
+
271
+ The important version-difference detail is that this benchmark compares the historical backup implementation against the current public package. It is not a pure `jsonl` vs `txt` format test in isolation.
272
+
273
+ After moving parent-directory preparation out of the file-driver `put()` hot path, a focused write-only microbenchmark in this workspace produced these averages over five runs of `20000` appends:
274
+
275
+ - backup `txt` string append: about `0.617s`
276
+ - current `txt` string append: about `0.586s`
277
+ - current `jsonl` string append: about `0.608s`
278
+ - current `jsonl` dictionary append: about `0.652s`
279
+
280
+ That means the current `jsonl` hot path is now much closer to the historical reference point when the caller passes pre-serialized strings, and the remaining extra cost mainly shows up when the caller asks the driver to serialize Python objects.
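
The relative cost can be read directly from the four averages above. A short calculation, using only the numbers reported in this section (the variable names are illustrative):

```python
# Averages in seconds for 20000 appends, as reported above.
backup_txt = 0.617
timings = {
    "current txt (string)": 0.586,
    "current jsonl (string)": 0.608,
    "current jsonl (dict)": 0.652,
}

# Ratio of each current path to the backup txt reference.
ratios = {name: round(seconds / backup_txt, 2) for name, seconds in timings.items()}

# Extra cost of letting the driver serialize a dict instead of passing a string.
dict_vs_string = round(timings["current jsonl (dict)"] / timings["current jsonl (string)"], 2)

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x of backup txt")
print(f"jsonl dict vs jsonl string: {dict_vs_string}x")
```

On these figures the string paths land at or slightly below the backup reference, and the dictionary path is the only one noticeably above it.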
281
+
282
+ Run it with:
283
+
284
+ ```bash
285
+ python benchmarks/compare_versions.py
286
+ ```
287
+
288
+ Detailed notes are in `BENCHMARKS.md`.
289
+
290
+ ### Interleaved write/read stress benchmark
291
+
292
+ The repository also includes a heavier benchmark where writes and reads overlap:
293
+
294
+ - `300` writer threads
295
+ - `64` consumer threads
296
+ - `30` logical file assignments per writer
297
+ - `10` shared files per writer
298
+ - `300` rows per file assignment
299
+ - `2,700,000` total rows attempted
300
+ - `6010` physical files after shared-path collapse
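
The totals in this list follow from the per-writer parameters. The check below assumes that the `10` shared files are the same files for every writer, which is an inference from the reported `6010` figure rather than something stated above:

```python
writers = 300
assignments_per_writer = 30
shared_per_writer = 10        # assumed to be the same 10 files for all writers
rows_per_assignment = 300

# Every assignment receives the same number of rows.
total_rows = writers * assignments_per_writer * rows_per_assignment

# Non-shared assignments map to one file each; shared ones collapse to 10 files.
private_files = writers * (assignments_per_writer - shared_per_writer)
physical_files = private_files + shared_per_writer

print(total_rows, physical_files)
```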
301
+
302
+ Latest measured result in this workspace:
303
+
304
+ - backup `.txt` consumed `2,699,852 / 2,700,000`
305
+ - current `.jsonl` consumed `2,699,351 / 2,700,000`
306
+ - both modes produced `0` malformed JSON rows
307
+ - both file drivers failed full integrity under simultaneous write/read pressure
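
To put the integrity gap in proportion, the shortfall can be computed from the consumed counts reported above (variable names are illustrative):

```python
attempted = 2_700_000
consumed = {"backup txt": 2_699_852, "current jsonl": 2_699_351}

# Rows that were written but never consumed, per driver.
missing = {name: attempted - count for name, count in consumed.items()}

# The same shortfall as a percentage of all attempted rows.
missing_pct = {name: 100 * rows / attempted for name, rows in missing.items()}

for name in consumed:
    print(f"{name}: {missing[name]} rows missing ({missing_pct[name]:.4f}%)")
```

The shortfall is small in relative terms, but any nonzero value means the file drivers cannot be relied on for exact delivery under this kind of overlap.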
308
+
309
+ That is the practical reason the `sqlite` driver exists: once the workload shifts from staged local queueing to stronger overlapping write/read correctness, a transactional driver becomes the better fit.
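
The general technique behind a transactional queue can be sketched with the standard `sqlite3` module. This is an illustration of the idea only, not the `PistributerSqlite` implementation; the table layout and the `open_queue`, `put`, and `take` helpers are assumptions.

```python
import sqlite3


def open_queue(path):
    """Open a connection in autocommit mode so transactions are managed explicitly."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    return conn


def put(conn, payload):
    """Append one message; the insert commits on its own in autocommit mode."""
    conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))


def take(conn):
    """Remove and return the oldest message, or None if the queue is empty.

    BEGIN IMMEDIATE takes the write lock up front, so the select and delete
    happen as one unit and two consumers cannot claim the same row.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[1]


conn = open_queue(":memory:")
for payload in ("a", "b"):
    put(conn, payload)
taken = [take(conn), take(conn), take(conn)]
conn.close()
```

The price of this guarantee is a transaction per dequeue, which matches the "higher transaction overhead" trade-off listed for the `sqlite` driver.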
310
+
311
+ Run it with:
312
+
313
+ ```bash
314
+ python benchmarks/threaded_interleaved_compare.py
315
+ ```
316
+
317
+ ## Release
318
+
319
+ Release steps and commands are documented in `RELEASE.md`.
320
+
321
+ Project history is tracked in `CHANGELOG.md`.
322
+
323
+ ## Contributing
324
+
325
+ Small, focused contributions are welcome. Start with `CONTRIBUTING.md` for development and review expectations.
326
+
327
+ ## Notes
328
+
329
+ - Messages are stored as one JSON object per line in the default driver.
330
+ - Active data files rotate into `.in_use` so new appends stay separate from active consumption.
331
+ - `txt` and `jsonl` are strongest in local staged workloads where writes and reads are mostly separated.
332
+ - `sqlite` targets a different optimization goal: stronger correctness under concurrency.
333
+ - The project is intentionally lightweight and local-first.