pistributer 0.2.0__tar.gz

Files changed (14)

pistributer-0.2.0/LICENSE +160 -0
pistributer-0.2.0/PKG-INFO +333 -0
pistributer-0.2.0/README.md +308 -0
pistributer-0.2.0/pistributer.egg-info/PKG-INFO +333 -0
pistributer-0.2.0/pistributer.egg-info/SOURCES.txt +12 -0
pistributer-0.2.0/pistributer.egg-info/dependency_links.txt +1 -0
pistributer-0.2.0/pistributer.egg-info/top_level.txt +3 -0
pistributer-0.2.0/pistributer.py +348 -0
pistributer-0.2.0/pistributer_sqlite.py +210 -0
pistributer-0.2.0/pistributer_txt.py +281 -0
pistributer-0.2.0/pyproject.toml +31 -0
pistributer-0.2.0/setup.cfg +4 -0
pistributer-0.2.0/tests/test_driver_modes.py +57 -0
pistributer-0.2.0/tests/test_pistributer.py +116 -0

pistributer-0.2.0/LICENSE ADDED

@@ -0,0 +1,160 @@
+Apache License
+Version 2.0, January 2004
+http://www.apache.org/licenses/
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+1. Definitions.
+"License" shall mean the terms and conditions for use, reproduction, and
+ distribution as defined by Sections 1 through 9 of this document.
+"Licensor" shall mean the copyright owner or entity authorized by the copyright
+ owner that is granting the License.
+"Legal Entity" shall mean the union of the acting entity and all other entities
+ that control, are controlled by, or are under common control with that entity.
+ For the purposes of this definition, "control" means (i) the power, direct or
+ indirect, to cause the direction or management of such entity, whether by
+ contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+"You" (or "Your") shall mean an individual or Legal Entity exercising
+ permissions granted by this License.
+"Source" form shall mean the preferred form for making modifications, including
+ but not limited to software source code, documentation source, and
+ configuration files.
+"Object" form shall mean any form resulting from mechanical transformation or
+ translation of a Source form, including but not limited to compiled object
+ code, generated documentation, and conversions to other media types.
+"Work" shall mean the work of authorship, whether in Source or Object form,
+ made available under the License, as indicated by a copyright notice that is
+ included in or attached to the work (an example is provided in the Appendix
+ below).
+"Derivative Works" shall mean any work, whether in Source or Object form, that
+ is based on (or derived from) the Work and for which the editorial revisions,
+ annotations, elaborations, or other modifications represent, as a whole, an
+ original work of authorship. For the purposes of this License, Derivative Works
+ shall not include works that remain separable from, or merely link (or bind by
+ name) to the interfaces of, the Work and Derivative Works thereof.
+"Contribution" shall mean any work of authorship, including the original version
+ of the Work and any modifications or additions to that Work or Derivative Works
+ thereof, that is intentionally submitted to Licensor for inclusion in the Work
+ by the copyright owner or by an individual or Legal Entity authorized to submit
+ on behalf of the copyright owner. For the purposes of this definition,
+ "submitted" means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems, and
+ issue tracking systems that are managed by, or on behalf of, the Licensor for
+ the purpose of discussing and improving the Work, but excluding communication
+ that is conspicuously marked or otherwise designated in writing by the copyright
+ owner as "Not a Contribution."
+"Contributor" shall mean Licensor and any individual or Legal Entity on behalf
+ of whom a Contribution has been received by Licensor and subsequently
+ incorporated within the Work.
+2. Grant of Copyright License. Subject to the terms and conditions of this
+ License, each Contributor hereby grants to You a perpetual, worldwide,
+ non-exclusive, no-charge, royalty-free, irrevocable copyright license to
+ reproduce, prepare Derivative Works of, publicly display, publicly perform,
+ sublicense, and distribute the Work and such Derivative Works in Source or
+ Object form.
+3. Grant of Patent License. Subject to the terms and conditions of this License,
+ each Contributor hereby grants to You a perpetual, worldwide, non-exclusive,
+ no-charge, royalty-free, irrevocable (except as stated in this section) patent
+ license to make, have made, use, offer to sell, sell, import, and otherwise
+ transfer the Work, where such license applies only to those patent claims
+ licensable by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s) with the Work
+ to which such Contribution(s) was submitted. If You institute patent litigation
+ against any entity (including a cross-claim or counterclaim in a lawsuit)
+ alleging that the Work or a Contribution incorporated within the Work
+ constitutes direct or contributory patent infringement, then any patent
+ licenses granted to You under this License for that Work shall terminate as of
+ the date such litigation is filed.
+4. Redistribution. You may reproduce and distribute copies of the Work or
+ Derivative Works thereof in any medium, with or without modifications, and in
+ Source or Object form, provided that You meet the following conditions:
+ (a) You must give any other recipients of the Work or Derivative Works a copy
+     of this License; and
+ (b) You must cause any modified files to carry prominent notices stating that
+     You changed the files; and
+ (c) You must retain, in the Source form of any Derivative Works that You
+     distribute, all copyright, patent, trademark, and attribution notices from
+     the Source form of the Work, excluding those notices that do not pertain to
+     any part of the Derivative Works; and
+ (d) If the Work includes a "NOTICE" text file as part of its distribution, then
+     any Derivative Works that You distribute must include a readable copy of the
+     attribution notices contained within such NOTICE file, excluding those
+     notices that do not pertain to any part of the Derivative Works, in at least
+     one of the following places: within a NOTICE text file distributed as part
+     of the Derivative Works; within the Source form or documentation, if
+     provided along with the Derivative Works; or, within a display generated by
+     the Derivative Works, if and wherever such third-party notices normally
+     appear. The contents of the NOTICE file are for informational purposes only
+     and do not modify the License. You may add Your own attribution notices
+     within Derivative Works that You distribute, alongside or as an addendum to
+     the NOTICE text from the Work, provided that such additional attribution
+     notices cannot be construed as modifying the License.
+ You may add Your own copyright statement to Your modifications and may provide
+ additional or different license terms and conditions for use, reproduction, or
+ distribution of Your modifications, or for any such Derivative Works as a
+ whole, provided Your use, reproduction, and distribution of the Work otherwise
+ complies with the conditions stated in this License.
+5. Submission of Contributions. Unless You explicitly state otherwise, any
+ Contribution intentionally submitted for inclusion in the Work by You to the
+ Licensor shall be under the terms and conditions of this License, without any
+ additional terms or conditions. Notwithstanding the above, nothing herein shall
+ supersede or modify the terms of any separate license agreement you may have
+ executed with Licensor regarding such Contributions.
+6. Trademarks. This License does not grant permission to use the trade names,
+ trademarks, service marks, or product names of the Licensor, except as required
+ for reasonable and customary use in describing the origin of the Work and
+ reproducing the content of the NOTICE file.
+7. Disclaimer of Warranty. Unless required by applicable law or agreed to in
+ writing, Licensor provides the Work (and each Contributor provides its
+ Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied, including, without limitation, any warranties
+ or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any risks
+ associated with Your exercise of permissions under this License.
+8. Limitation of Liability. In no event and under no legal theory, whether in
+ tort (including negligence), contract, or otherwise, unless required by
+ applicable law (such as deliberate and grossly negligent acts) or agreed to in
+ writing, shall any Contributor be liable to You for damages, including any
+ direct, indirect, special, incidental, or consequential damages of any
+ character arising as a result of this License or out of the use or inability to
+ use the Work (including but not limited to damages for loss of goodwill, work
+ stoppage, computer failure or malfunction, or any and all other commercial
+ damages or losses), even if such Contributor has been advised of the
+ possibility of such damages.
+9. Accepting Warranty or Additional Liability. While redistributing the Work or
+ Derivative Works thereof, You may choose to offer, and charge a fee for,
+ acceptance of support, warranty, indemnity, or other liability obligations
+ and/or rights consistent with this License. However, in accepting such
+ obligations, You may act only on Your own behalf and on Your sole
+ responsibility, not on behalf of any other Contributor, and only if You agree
+ to indemnify, defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason of your
+ accepting any such warranty or additional liability.
+END OF TERMS AND CONDITIONS

pistributer-0.2.0/PKG-INFO ADDED

@@ -0,0 +1,333 @@
+Metadata-Version: 2.4
+Name: pistributer
+Version: 0.2.0
+Summary: A local-first file-backed FIFO queue with jsonl, txt, and sqlite drivers.
+Author: Yeong
+License-Expression: Apache-2.0
+Project-URL: Homepage, https://github.com/yeongdev/pistributer
+Project-URL: Repository, https://github.com/yeongdev/pistributer
+Project-URL: Issues, https://github.com/yeongdev/pistributer/issues
+Project-URL: Changelog, https://github.com/yeongdev/pistributer/blob/main/CHANGELOG.md
+Keywords: queue,file,fifo,jsonl,streaming
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Software Development :: Libraries
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Dynamic: license-file
+# pistributer
+`pistributer` is a local-first FIFO queue toolkit for file-based workflows.
+**Hard position:** `pistributer` exists for developers who want a usable queue across servers or local jobs without standing up Redis, Kafka, or another heavy service.
+**Performance contract:** this project prefers faster write/read throughput over a larger or more polished interface. The core value is still the same: write, read, high concurrency, multi-file workloads, and as little extra overhead as possible.
+It started as a high-throughput local queue used on large datasets, and it now ships three practical drivers instead of pretending one storage format is ideal for every workload:
+- `txt` for the shortest raw file path
+- `jsonl` for structured, inspectable records
+- `sqlite` for stronger integrity under contention
+## When to use each driver
+| Driver | Use it when | Avoid it when |
+| --- | --- | --- |
+| `txt` | You want the shortest plain-text file path and staged or single-writer-friendly queueing | You need structured records or stronger correctness when writes and reads overlap |
+| `jsonl` | You want readable structured payloads and the default publishable workflow | You need the rawest append path or integrity under heavy overlapping writes and reads |
+| `sqlite` | You want stronger local correctness under contention | You want the lightest append-heavy file path |
+## Project background
+The earliest version of this tool was not created as a polished open-source package. It was created by a developer who wanted a queue but did not want to install or operate a heavier system such as Redis or Kafka.
+The original approach was simple: use the filesystem itself as the message layer and use plain `.txt` files for data exchange. That early code was not cleaned up for publication, but the underlying queue logic was already useful in practice and was used on large amounts of real data.
+Later, after the developer started building AI-oriented workflows with `nanobot`, the need for a cleaner installable package became more obvious. That is what pushed the repackaging effort: not a rewrite for the sake of novelty, but a practical need for `pip` installability, clearer docs, and a structured default format.
+That is why the project now has two historical truths at the same time:
+- the `0.1.x` line represents long-used file-queue logic that existed before the packaging cleanup
+- the `0.2.0` line makes `.jsonl` the default public driver and adds wider tests, benchmarks, and release tooling
+The current repository keeps that history visible instead of hiding it. The old behavior is preserved for comparison in `benchmarks/pistributer_bak.py`, while the main package is documented and tested as the public installable tool.
+## `0.1.x` and `0.2.0` in practice
+The simplest way to understand the version difference is this:
+- `0.1.x` means the long-used plain-text file-queue era
+- `0.2.0` means the public packaged era with `.jsonl` as the default driver
+What changed:
+- `0.1.x` was centered on the shortest `.txt` path and real-world usage before packaging cleanup
+- `0.2.0` keeps the same basic file-queue idea but makes `.jsonl` the default public format
+- `0.2.0` adds tests, benchmark documentation, packaging, and a clearer boundary for when to use `txt`, `jsonl`, and `sqlite`
+What did **not** change:
+- the project still values write/read throughput over interface growth
+- the core file-queue idea is still append, rotate into `.in_use`, and consume with a small API
+- the hot path is still expected to stay small and fast
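The append, rotate into `.in_use`, and consume cycle named in the list above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the helper names `put`, `rotate`, and `consume`, and the choice to rotate by appending `.in_use` to the file name, are assumptions made for the example.

```python
import json
import os
import tempfile


def put(path, value):
    """Append one record as a JSON line; the parent directory must already exist."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(value) + "\n")


def rotate(path):
    """Move the active file aside so that new appends land in a fresh file."""
    in_use = path + ".in_use"  # hypothetical naming, chosen for this sketch
    if os.path.exists(in_use):
        return in_use  # an earlier rotation has not been fully consumed yet
    if os.path.exists(path):
        os.replace(path, in_use)
        return in_use
    return None  # nothing has been written


def consume(path):
    """Yield records in FIFO order from the rotated file, then delete it."""
    in_use = rotate(path)
    if in_use is None:
        return
    with open(in_use, encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)
    os.remove(in_use)


with tempfile.TemporaryDirectory() as tmp:
    channel = os.path.join(tmp, "channel.jsonl")
    put(channel, {"value": "hello"})
    put(channel, {"value": "world"})
    first_batch = list(consume(channel))
    put(channel, {"value": "later"})  # lands in a new active file
    second_batch = list(consume(channel))
```

Because the producer only ever touches the active file and the consumer only reads the rotated one, the two sides rarely share a file handle. The sketch does no locking, so it shares the staged, single-writer-friendly limits the README describes for the file drivers.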
+Measured conclusion from the current benchmark work:
+- the visible slowdown from `0.1.x` to `0.2.0` is real, but it is mostly not caused by the `.jsonl` extension itself
+- most of the remaining overhead comes from the modern hot path doing a little more work around path validation and file-operation safeguards
+- JSON serialization adds a smaller extra cost when callers pass Python objects instead of already-serialized strings
+In short: `0.2.0` is slower than the historical plain-text path, but the gap is mostly the cost of extra path validation and file-operation safeguards plus a smaller serialization cost, not a dramatic penalty from JSONL as a format.
+## Design priorities
+The author's preference is very explicit:
+1. keep the queue fast under heavy write/read workloads
+2. keep the public surface small
+3. avoid changes that reduce throughput unless the trade-off is clearly worth it
+That means this project does **not** try to win by offering a large interface. It tries to win by doing a very small number of things well:
+- append records fast
+- read records fast
+- handle many files
+- stay useful under high write pressure
+The current `.jsonl` default is accepted even though it is slower than the historical `.txt` path, because the measured overhead is still within an acceptable range for the intended workflows. Outside of that specific trade-off, performance regressions are treated as unacceptable.
+## Project position
+`pistributer` is best positioned as a lightweight local queue for scripts, batch jobs, and single-host pipelines.
+It is a good fit when:
+- your workload is file-centric
+- you want simple deployment with no external service
+- you care about readable queue state on disk
+- you want to choose between throughput-first and integrity-first local drivers
+- your file-driver usage is staged or single-writer-friendly
+It is not the best fit when:
+- you need a multi-node distributed queue
+- you need strict correctness under heavy overlapping readers and writers with the file drivers
+- you want managed persistence, replication, or cross-host coordination
+## Why use it
+Use `pistributer` when you want a small queue abstraction for a local or single-host pipeline without introducing Redis, Kafka, or a separate database service.
+What it is good at:
+- append-heavy local workloads
+- high-concurrency write-focused workloads
+- simple batch pipelines and scripts
+- readable on-disk data formats
+- lightweight deployment with minimal operational overhead
+The file-based drivers rotate active data into `.in_use`, which helps reduce direct producer/consumer contention on the same file.
+The two file drivers, `Pistributer` and `PistributerTxt`, are best described as staged or single-writer-friendly. They work well when appends and reads are mostly separated, but they are not the strongest option under heavy contention between overlapping writers and readers.
+For hot-path performance, `put()` in the two file drivers assumes the parent directory already exists. Directory preparation is intentionally kept outside the hot path.
+The intent is not to keep adding more actions and more surface area. The intent is to keep the important path small: write, read, and move a lot of data without unnecessary slowdown.
+## Driver modes
+### `jsonl` driver
+```python
+from pistributer import Pistributer
+```
+Use this as the default mode when you want structured records and a modern `.jsonl` workflow.
+### `txt` driver
+```python
+from pistributer_txt import PistributerTxt
+```
+Use this when raw text throughput matters more than structure.
+### `sqlite` driver
+```python
+from pistributer_sqlite import PistributerSqlite
+```
+Use this when queue correctness matters more than the shortest append path.
+See `DRIVERS.md` for a full comparison.
+See `EXAMPLES.md` for small copyable examples for all three drivers.
+The Python docstrings are written to be `help()`-friendly, so `help(Pistributer)`, `help(Pistributer.put)`, and the equivalent driver methods should now give usable inline guidance.
+## Choose a driver
+| Driver | Best for | Main trade-off |
+| --- | --- | --- |
+| `Pistributer` (`jsonl`) | Structured local queues and readable payloads | More overhead than raw text |
+| `PistributerTxt` | Fast plain-text append-heavy workloads | Least structured format |
+| `PistributerSqlite` | Stronger local correctness under contention | Higher transaction overhead |
+## API naming note
+Public API names stay stable on purpose.
+- `Pistributer` and `PistributerTxt` keep the historical camelCase names such as `isEmpty()` and `getIndex()`.
+- `PistributerSqlite` keeps its newer snake_case method `is_empty()`.
+The naming is not perfectly uniform, but the project keeps the existing public contract instead of breaking working code.
+## Install
+```bash
+pip install pistributer
+```
+## Quick start
+```python
+from pistributer import Pistributer
+Pistributer.put("channel.jsonl", {"value": "hello"})
+Pistributer.put("channel.jsonl", {"value": "world"})
+queue = Pistributer("channel.jsonl")
+print(queue.next())
+print(queue.next())
+print(queue.isEmpty())
+```
+## Core API
+The practical core of the project is intentionally small: append with `put()` and consume with `next()`.
+The other methods exist to support queue lifecycle and compatibility, not to turn the library into a broad interface framework.
+- `Pistributer(path)`: open a queue backed by a `.jsonl` file
+- `Pistributer.put(path, value)`: append one message; the parent directory must already exist
+- `Pistributer.new(path, value, overwrite=False, sep="")`: create a file with initial content
+- `Pistributer.next()`: return the next message, or raise `StopIteration` if empty
+- `Pistributer.isEmpty()`: return whether the queue has no unread messages left
+- `Pistributer.size()`: count queued messages across active files
+- `Pistributer.remaining()`: count unread messages
+- `Pistributer.getIndex()`: return the consumed message count
+## Validation
+The repository includes both correctness tests and direct benchmarks against the preserved backup implementation at `benchmarks/pistributer_bak.py`.
+### Functional tests
+The main regression test writes `300` `.jsonl` files, each with `30` distinct JSON rows, and then consumes everything through `next()`.
+That covers `9000` records and checks:
+- FIFO ordering
+- empty-queue behavior
+- remaining-count behavior
+- reopen/index persistence
+- rotated `.in_use` files plus newly appended data
+Run the test suite with:
+```bash
+python -m unittest discover -s tests -v
+```
+### Staged benchmark
+The staged benchmark compares:
+- `benchmarks/pistributer_bak.py` with `300` `.txt` files × `30` rows
+- `pistributer.py` with `300` `.jsonl` files × `30` rows
+Latest measured result in this workspace:
+- backup total: about `0.956s`
+- current total: about `1.020s`
+- current write ratio: about `1.18x` of backup
+- current read ratio: about `1.02x` of backup
+- current total ratio: about `1.07x` of backup
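The total ratio in the list above can be checked from the two totals the README reports. The snippet below is only a small arithmetic check; the variable names are illustrative. The write and read ratios cannot be rederived the same way, because the per-phase timings are not listed.

```python
backup_total = 0.956   # seconds, as reported in the README
current_total = 1.020  # seconds, as reported in the README

total_ratio = current_total / backup_total
overhead_pct = (total_ratio - 1) * 100

print(f"total ratio: {total_ratio:.2f}x")  # prints "total ratio: 1.07x"
print(f"overhead: {overhead_pct:.1f}%")    # prints "overhead: 6.7%"
```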
+In this staged workload, the current `jsonl` rewrite is slightly slower than the backup `txt` path, which is expected because the structured driver pays extra serialization and validation overhead.
+That overhead is currently accepted because the repository intentionally chose `.jsonl` as the public default. Beyond that known trade-off, the project should resist changes that make the hot path slower.
+The important version-difference detail is that this benchmark compares the historical backup implementation against the current public package. It is not a pure `jsonl` vs `txt` format test in isolation.
+After moving parent-directory preparation out of the file-driver `put()` hot path, a focused write-only microbenchmark in this workspace produced these averages over five runs of `20000` appends:
+- backup `txt` string append: about `0.617s`
+- current `txt` string append: about `0.586s`
+- current `jsonl` string append: about `0.608s`
+- current `jsonl` dictionary append: about `0.652s`
+That means the current `jsonl` hot path is now much closer to the historical reference point when the caller passes pre-serialized strings, and the remaining extra cost mainly shows up when the caller asks the driver to serialize Python objects.
+Run it with:
+```bash
+python benchmarks/compare_versions.py
+```
+Detailed notes are in `BENCHMARKS.md`.
+### Interleaved write/read stress benchmark
+The repository also includes a heavier benchmark where writes and reads overlap:
+- `300` writer threads
+- `64` consumer threads
+- `30` logical file assignments per writer
+- `10` shared files per writer
+- `300` rows per file assignment
+- `2,700,000` total rows attempted
+- `6010` physical files after shared-path collapse
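The row and file totals above follow from the other parameters under one assumption that the README does not state outright: each writer's `10` shared assignments point at a single common pool of `10` files, while its remaining `20` assignments are private. A short check of that reading, with illustrative variable names:

```python
writers = 300
assignments_per_writer = 30
shared_per_writer = 10
rows_per_assignment = 300

# Every logical assignment receives the same number of rows.
total_rows = writers * assignments_per_writer * rows_per_assignment

# Assumption: all writers share one pool of 10 files; the rest are private.
shared_pool = 10
private_files = writers * (assignments_per_writer - shared_per_writer)
physical_files = private_files + shared_pool

print(total_rows)      # prints 2700000
print(physical_files)  # prints 6010
```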
+Latest measured result in this workspace:
+- backup `.txt` consumed `2,699,852 / 2,700,000`
+- current `.jsonl` consumed `2,699,351 / 2,700,000`
+- both modes produced `0` malformed JSON rows
+- both file drivers failed full integrity under simultaneous write/read pressure
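To put the shortfall in proportion, the consumed counts above can be turned into missing-row counts and rates. The README does not say whether the missing rows were lost or simply never consumed, so the snippet only reports the difference; the dictionary keys are labels chosen for this example.

```python
attempted = 2_700_000
consumed = {"backup .txt": 2_699_852, "current .jsonl": 2_699_351}

# Rows attempted but not seen by any consumer.
shortfall = {name: attempted - count for name, count in consumed.items()}
# The same gap as a fraction of all attempted rows.
shortfall_rate = {name: missing / attempted for name, missing in shortfall.items()}

for name in consumed:
    print(f"{name}: {shortfall[name]} rows missing ({shortfall_rate[name]:.4%})")
```

Both gaps are well under a tenth of a percent, but any nonzero gap is a failed integrity check, which is the point the README makes in favor of the `sqlite` driver.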
+That is the practical reason the `sqlite` driver exists: once the workload shifts from staged local queueing to stronger overlapping write/read correctness, a transactional driver becomes the better fit.
+Run it with:
+```bash
+python benchmarks/threaded_interleaved_compare.py
+```
+## Release
+Release steps and commands are documented in `RELEASE.md`.
+Project history is tracked in `CHANGELOG.md`.
+## Contributing
+Small, focused contributions are welcome. Start with `CONTRIBUTING.md` for development and review expectations.
+## Notes
+- Messages are stored as one JSON object per line in the default driver.
+- Files rotate from `data` to `.in_use` so new appends stay separate from active consumption.
+- `txt` and `jsonl` are strongest in local staged workloads where writes and reads are mostly separated.
+- `sqlite` targets a different optimization goal: stronger correctness under concurrency.
+- The project is intentionally lightweight and local-first.
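The first note, one JSON object per line, together with the earlier remark about pre-serialized strings, can be illustrated with the standard library. This is a sketch under assumptions: `to_line` is a hypothetical helper, and treating a `str` argument as already-serialized JSON is this example's convention, not a confirmed description of how `put()` behaves.

```python
import io
import json


def to_line(value):
    """Return one JSONL line; strings are assumed to be serialized already."""
    text = value if isinstance(value, str) else json.dumps(value)
    return text + "\n"


buffer = io.StringIO()  # stands in for the queue file
buffer.write(to_line({"value": "hello"}))    # serialized here
buffer.write(to_line('{"value": "world"}'))  # serialized by the caller

# Each line parses independently, which keeps the file inspectable.
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(records)
```

Skipping `json.dumps` for strings is where a caller that already holds serialized text would save work, consistent with the microbenchmark figures quoted earlier.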