PyPI - flowcept - Versions diffs - 0.9.10__tar.gz → 0.9.12__tar.gz - Mend

flowcept 0.9.10tar.gz → 0.9.12tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (218)

{flowcept-0.9.10 → flowcept-0.9.12}/.gitignore RENAMED

@@ -10,6 +10,7 @@
 **/*tensorboard_events*
 **/*.DS_Store*
 **/*.log
+**/*.jsonl
 **/*.pth
 **/*ipynb_checkpoints*
 **/*flowcept_lmdb

{flowcept-0.9.10 → flowcept-0.9.12}/Makefile RENAMED

@@ -46,6 +46,7 @@ clean:
 	@find . -type f -name "*nohup*" -exec sh -c 'rm -f "$$@" 2>/dev/null || true' sh {} +
 	@sh -c 'sphinx-build -M clean docs docs/_build > /dev/null 2>&1 || true'
 	@sh -c 'rm -f docs/generated/* 2>/dev/null || true'
+	@sh -c 'rm -f docs/_build/* 2>/dev/null || true'
 # Build the HTML documentation using Sphinx
 .PHONY: docs

{flowcept-0.9.10 → flowcept-0.9.12}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: flowcept
-Version: 0.9.10
+Version: 0.9.12
 Summary: Capture and query workflow provenance data using data observability
 Author: Oak Ridge National Laboratory
 License-Expression: MIT
@@ -149,11 +149,16 @@ Description-Content-Type: text/markdown
 <p align="center">
   <picture>
+    <!-- Dark theme -->
+    <source srcset="./docs/img/flowcept-logo-dark.png" media="(prefers-color-scheme: dark)" />
+    <!-- Light theme -->
+    <source srcset="./docs/img/flowcept-logo.png" media="(prefers-color-scheme: light)" />
+    <!-- Fallback -->
     <img src="./docs/img/flowcept-logo.png" alt="Flowcept Logo" width="200"/>
   </picture>
 </p>
-<h3 align="center">Lightweight Distributed Workflow Provenance</h3>
+<h3 align="center">Lightweight Distributed Workflow Provenance</h3>
 ---
@@ -162,6 +167,7 @@ Flowcept captures and queries workflow provenance at runtime with minimal code c
 ---
 [![Documentation](https://img.shields.io/badge/docs-readthedocs.io-green.svg)](https://flowcept.readthedocs.io/)
 [![Build](https://github.com/ORNL/flowcept/actions/workflows/create-release-n-publish.yml/badge.svg)](https://github.com/ORNL/flowcept/actions/workflows/create-release-n-publish.yml)
 [![PyPI](https://badge.fury.io/py/flowcept.svg)](https://pypi.org/project/flowcept)
@@ -169,6 +175,15 @@ Flowcept captures and queries workflow provenance at runtime with minimal code c
 [![Code Formatting](https://github.com/ORNL/flowcept/actions/workflows/checks.yml/badge.svg?branch=dev)](https://github.com/ORNL/flowcept/actions/workflows/checks.yml)
 [![License: MIT](https://img.shields.io/github/license/ORNL/flowcept)](LICENSE)
+<h4 align="center">
+  <a href="https://flowcept.readthedocs.io/">Documentation</a> &#8226;
+  <a href="./docs/publications">Publications</a>
+</h4>
 ---
 # Quickstart
@@ -220,7 +235,7 @@ def main():
 if __name__ == "__main__":
     main()
-    prov_messages = Flowcept.read_messages_file()
+    prov_messages = Flowcept.read_buffer_file()
     assert len(prov_messages) == 2
     print(json.dumps(prov_messages, indent=2))
 ```

{flowcept-0.9.10 → flowcept-0.9.12}/README.md RENAMED

@@ -1,10 +1,15 @@
 <p align="center">
   <picture>
+    <!-- Dark theme -->
+    <source srcset="./docs/img/flowcept-logo-dark.png" media="(prefers-color-scheme: dark)" />
+    <!-- Light theme -->
+    <source srcset="./docs/img/flowcept-logo.png" media="(prefers-color-scheme: light)" />
+    <!-- Fallback -->
     <img src="./docs/img/flowcept-logo.png" alt="Flowcept Logo" width="200"/>
   </picture>
 </p>
-<h3 align="center">Lightweight Distributed Workflow Provenance</h3>
+<h3 align="center">Lightweight Distributed Workflow Provenance</h3>
 ---
@@ -13,6 +18,7 @@ Flowcept captures and queries workflow provenance at runtime with minimal code c
 ---
 [![Documentation](https://img.shields.io/badge/docs-readthedocs.io-green.svg)](https://flowcept.readthedocs.io/)
 [![Build](https://github.com/ORNL/flowcept/actions/workflows/create-release-n-publish.yml/badge.svg)](https://github.com/ORNL/flowcept/actions/workflows/create-release-n-publish.yml)
 [![PyPI](https://badge.fury.io/py/flowcept.svg)](https://pypi.org/project/flowcept)
@@ -20,6 +26,15 @@ Flowcept captures and queries workflow provenance at runtime with minimal code c
 [![Code Formatting](https://github.com/ORNL/flowcept/actions/workflows/checks.yml/badge.svg?branch=dev)](https://github.com/ORNL/flowcept/actions/workflows/checks.yml)
 [![License: MIT](https://img.shields.io/github/license/ORNL/flowcept)](LICENSE)
+<h4 align="center">
+  <a href="https://flowcept.readthedocs.io/">Documentation</a> &#8226;
+  <a href="./docs/publications">Publications</a>
+</h4>
 ---
 # Quickstart
@@ -71,7 +86,7 @@ def main():
 if __name__ == "__main__":
     main()
-    prov_messages = Flowcept.read_messages_file()
+    prov_messages = Flowcept.read_buffer_file()
     assert len(prov_messages) == 2
     print(json.dumps(prov_messages, indent=2))
 ```
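Reviewer note on the rename above: the Quickstart now calls `Flowcept.read_buffer_file()` where 0.9.10 called `Flowcept.read_messages_file()`. The diff shown here does not say whether the old name survives as an alias, so code that must run against both releases might resolve the method by name. The sketch below is one hedged way to do that; `read_buffer_compat` and the two stand-in classes are hypothetical and exist only so the example runs without Flowcept installed.

```python
def read_buffer_compat(flowcept_cls, *args, **kwargs):
    """Call read_buffer_file if the class has it, else read_messages_file."""
    reader = getattr(flowcept_cls, "read_buffer_file", None)
    if reader is None:
        # Older releases (0.9.10 in this diff) expose the previous name.
        reader = getattr(flowcept_cls, "read_messages_file")
    return reader(*args, **kwargs)


# Hypothetical stand-ins for the two API generations (not the real Flowcept class).
class _OldApi:
    @staticmethod
    def read_messages_file(path=None):
        return [{"source": "read_messages_file", "path": path}]


class _NewApi:
    @staticmethod
    def read_buffer_file(path=None):
        return [{"source": "read_buffer_file", "path": path}]


old_result = read_buffer_compat(_OldApi, "buf.jsonl")
new_result = read_buffer_compat(_NewApi, "buf.jsonl")
```

With the real package, the call would presumably be `read_buffer_compat(Flowcept)`, which is untested here.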

{flowcept-0.9.10 → flowcept-0.9.12}/docs/api-reference.rst RENAMED

@@ -65,7 +65,9 @@ FlowceptTask
 FlowceptLoop
 -------------------
-.. autoclass:: flowcept.FlowceptLoop
+Can be imported via ``from flowcept import FlowceptLoop``
+.. autoclass:: flowcept.instrumentation.flowcept_loop.FlowceptLoop
    :members:
    :special-members: __init__
    :undoc-members:
@@ -75,7 +77,9 @@ FlowceptLoop
 FlowceptLightweightLoop
 ------------------------------
-.. autoclass:: flowcept.FlowceptLightweightLoop
+Can be imported via ``from flowcept import FlowceptLightweightLoop``
+.. autoclass:: flowcept.instrumentation.flowcept_loop.FlowceptLightweightLoop
    :members:
    :special-members: __init__
    :undoc-members:

{flowcept-0.9.10 → flowcept-0.9.12}/docs/conf.py RENAMED

@@ -1,31 +1,39 @@
 # Configuration file for the Sphinx documentation builder.
 #
-# For the full list of built-in configuration values, see the documentation:
 # https://www.sphinx-doc.org/en/master/usage/configuration.html
 # -- Project information -----------------------------------------------------
-# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
 project = "Flowcept"
 copyright = "Oak Ridge National Lab"
 author = "Oak Ridge National Lab"
 # -- General configuration ---------------------------------------------------
-# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
 extensions = [
     "sphinx.ext.autodoc",
     "sphinx.ext.autosummary",
 ]
 autosummary_generate = True
 templates_path = ["_templates"]
 exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
-# -- Options for HTML output -------------------------------------------------
-# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+# -- HTML output -------------------------------------------------------------
 html_theme = "furo"
 html_title = "Flowcept"
+# Keep using your existing 'img' folder as the static path so you don't have to move files.
+# Sphinx will treat everything inside 'img/' as static assets.
 html_static_path = ["img"]
-html_logo = "img/flowcept-logo.png"
+# Furo supports automatic dark/light logo switching.
+# IMPORTANT: Paths below are relative to the *root* of each static path ('img' here),
+# so do NOT prefix with 'img/'.
+html_theme_options = {
+    "light_logo": "flowcept-logo.png",
+    "dark_logo": "flowcept-logo-dark.png",
+    # Optional extras:
+    "sidebar_hide_name": True,
+    # "light_css_variables": {},
+    # "dark_css_variables": {},
+}
+# html_logo = "img/flowcept-logo.png"
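Reviewer note on the logo options above: the new comments say the `light_logo` and `dark_logo` values are resolved relative to the root of each `html_static_path` entry, so an `img/` prefix would point at a file that does not exist. A small pre-build check could catch that mistake. The helper below is a hypothetical sketch (`missing_logos` is not part of Sphinx or Furo) and it uses a temporary directory in place of the real `docs/` folder.

```python
import tempfile
from pathlib import Path


def missing_logos(docs_dir, static_paths, theme_options):
    """Return logo file names that cannot be found under any static path."""
    docs_dir = Path(docs_dir)
    missing = []
    for key in ("light_logo", "dark_logo"):
        name = theme_options.get(key)
        if name is None:
            continue
        # A logo is resolvable if at least one static directory contains it.
        if not any((docs_dir / static / name).is_file() for static in static_paths):
            missing.append(name)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    img_dir = Path(tmp) / "img"
    img_dir.mkdir()
    (img_dir / "flowcept-logo.png").write_bytes(b"")  # only the light logo exists

    options = {"light_logo": "flowcept-logo.png", "dark_logo": "flowcept-logo-dark.png"}
    result = missing_logos(tmp, ["img"], options)
    # The mistake the conf.py comment warns about: prefixing with 'img/'.
    prefixed = missing_logos(tmp, ["img"], {"light_logo": "img/flowcept-logo.png"})
```

Here `result` reports the dark logo as missing and `prefixed` reports the wrongly prefixed path, which is the failure mode the comment in `conf.py` describes.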

flowcept-0.9.12/docs/img/flowcept-logo-dark.png ADDED

Binary file

flowcept-0.9.12/docs/index.rst ADDED

@@ -0,0 +1,49 @@
+Flowcept
+========
+.. raw:: html
+   <style>
+     /* Show/hide logos based on Furo's theme attribute */
+     .logo-light { display: inline; }
+     .logo-dark  { display: none; }
+     html[data-theme="dark"] .logo-light { display: none; }
+     html[data-theme="dark"] .logo-dark  { display: inline; }
+     /* When Furo is in 'auto', follow the OS preference */
+     html[data-theme="auto"] .logo-light { display: inline; }
+     html[data-theme="auto"] .logo-dark  { display: none; }
+     @media (prefers-color-scheme: dark) {
+       html[data-theme="auto"] .logo-light { display: none; }
+       html[data-theme="auto"] .logo-dark  { display: inline; }
+     }
+   </style>
+   <p align="center">
+     <!-- Keep both images in the DOM and toggle via CSS -->
+     <img src="_static/flowcept-logo.png" alt="Flowcept Logo" width="200" class="logo-light">
+     <img src="_static/flowcept-logo-dark.png" alt="Flowcept Logo (Dark)" width="200" class="logo-dark">
+   </p>
+.. image:: https://img.shields.io/badge/GitHub-Flowcept-black?logo=github&logoColor=white
+   :target: https://github.com/ORNL/flowcept
+   :alt: GitHub
+   :align: center
+   :width: 120px
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+   quick_start
+   architecture
+   setup
+   prov_capture
+   telemetry_capture
+   prov_storage
+   prov_query
+   schemas
+   contributing
+   cli-reference
+   api-reference

{flowcept-0.9.10 → flowcept-0.9.12}/docs/prov_capture.rst RENAMED

@@ -381,10 +381,8 @@ Optimized for **HPC** and tight loops; minimal interception overhead:
 **When to use**: massive iteration counts, sensitive microbenchmarks, or very low overhead needs.
 Loop Instrumentation
-~~~~~~~~~~~~~~~~~~~~
+---------------------
 Instrument iterative loops directly (see
 `loop example <https://github.com/ORNL/flowcept/blob/main/examples/instrumented_loop_example.py>`_).
@@ -403,10 +401,116 @@ Combine the context manager (below) with per-iteration tasks or custom events.
         loop.end_iter({"item": item, "loss": loss})
 FlowceptLoop vs FlowceptLightweightLoop
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Both classes instrument iterative code and attach each iteration to provenance. They differ in how they trade detail for speed.
+- **FlowceptLoop**: opens and closes a tiny “iteration task” around every `next()` call. It can attach `started_at`, `status`, and (if enabled) **per-iteration telemetry** at the end of each iteration. Messages are sent one by one. Works with sized iterables, integers, and iterators. If you pass a pure iterator without a known length, it will materialize it into a list unless you provide `items_length`.
+- **FlowceptLightweightLoop**: pre-allocates a task object for every iteration up front, updates `used` and `generated` as the loop progresses, and **sends everything in a single batch** when the loop ends. No per-iteration telemetry capture. Requires a known length. If you pass a pure iterator, you **must** provide `items_length`.
+When to use which
+~~~~~~~~~~~~~~~~~
+Choose **FlowceptLoop** if you need:
+- Per-iteration telemetry, `started_at`, and fine-grained timing.
+- Streaming of iteration records to the MQ/DB as the loop runs.
+- Constant memory usage independent of the number of iterations.
+Choose **FlowceptLightweightLoop** if you need:
+- The lowest overhead for very large loops.
+- A single batched publish at the end of the loop.
+- A loop whose exact iteration count you already know or can provide up front.
+Behavioral differences at a glance
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+- **Telemetry**: FlowceptLoop records telemetry at the end of each iteration when telemetry is enabled. Lightweight does not.
+- **Publishing**: FlowceptLoop calls `intercept(...)` per iteration. Lightweight calls `intercept_many([...])` once after the loop finishes.
+- **Memory**: FlowceptLoop keeps only the current iteration in memory. Lightweight pre-allocates a list of task objects of size `len(items)`.
+- **Unknown lengths**: FlowceptLoop can materialize an unknown-length iterator into a list if you do not provide `items_length` (may be expensive). Lightweight requires a known `items_length` for iterators.
+API quick links
+~~~~~~~~~~~~~~~
+- `FlowceptLoop API <https://flowcept.readthedocs.io/en/latest/api-reference.html#flowceptloop>`_
+- `FlowceptLightweightLoop API <https://flowcept.readthedocs.io/en/latest/api-reference.html#flowceptlightweightloop>`_
+Examples
+~~~~~~~~
+Per-iteration telemetry and streaming
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: python
+   from time import sleep
+   from flowcept import Flowcept
+   from flowcept import FlowceptLoop
+   with Flowcept(workflow_name="telemetry_stream"):
+       loop = FlowceptLoop(range(5), loop_name="train_loop", item_name="epoch")
+       for epoch in loop:
+           loss = 0.1 * (5 - epoch)
+           sleep(0.02)
+           # Attach values produced inside this iteration
+           loop.end_iter({"loss": loss})
+   # Each iteration is sent with status and, if enabled, telemetry_at_end.
+Ultra-low overhead and batched publish
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: python
+   from flowcept import Flowcept
+   from flowcept import FlowceptLightweightLoop
+   data = [0, 1, 2, 3, 4]
+   with Flowcept(workflow_name="batched_publish"):
+       loop = FlowceptLightweightLoop(data, loop_name="eval_loop", item_name="batch")
+       for batch in loop:
+           metric = batch * 2
+           loop.end_iter({"metric": metric})
+   # All iteration tasks are published together after the loop completes.
+Iterating an unknown-length iterator
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: python
+   import itertools as it
+   from flowcept import Flowcept
+   from flowcept import FlowceptLoop, FlowceptLightweightLoop
+   stream = it.islice(it.count(), 0, 100)  # iterator without __len__
+   with Flowcept(workflow_name="iterators"):
+       # Option A: without items_length, FlowceptLoop would materialize the iterator
+       #           into a list, which may be expensive for large streams, so the
+       #           length is passed explicitly here.
+       loop_a = FlowceptLoop(stream, loop_name="loop_a", item_name="i", items_length=100)
+       for i in loop_a:
+           loop_a.end_iter({"v": i*i})
+       # Option B: Lightweight requires items_length for iterators.
+       stream2 = it.islice(it.count(), 0, 100)
+       loop_b = FlowceptLightweightLoop(stream2, loop_name="loop_b", item_name="i", items_length=100)
+       for i in loop_b:
+           loop_b.end_iter({"v": i*i})
+Tips and caveats
+~~~~~~~~~~~~~~~~
+- Set `item_name` to control the key stored under `used`, for example `{"epoch": 3}` instead of `{"item": 3}`.
+- Use `parent_task_id` to nest loop iterations under another task.
+- For very large loops where you only need `used` and `generated`, prefer Lightweight to reduce interceptor calls.
+- If you use FlowceptLoop with a huge iterator, pass `items_length` to avoid accidental materialization.
+- Both classes honor `INSTRUMENTATION_ENABLED` and `capture_enabled`. If disabled, they behave like regular iterators and `end_iter(...)` becomes a no-op.
-TODO
 PyTorch Models
 ~~~~~~~~~~~~~~
@@ -485,7 +589,7 @@ See `MCP Agent example <https://github.com/ORNL/flowcept/blob/main/examples/agen
 Custom Task Creation (fully customizable)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-----------------------------------------
 Build tasks programmatically with ``FlowceptTask``—useful for non-decorator flows or custom payloads.
 Requires an active workflow (``with Flowcept(...)`` or ``Flowcept().start()``).
@@ -506,10 +610,13 @@ Requires an active workflow (``with Flowcept(...)`` or ``Flowcept().start()``).
        task.end({"records": 42})
        task.send()  # publishes to MQ
+If you need to store something that the public API does not expose yet, you can reach the task object's fields directly through the private attribute ``FlowceptTask._task``. If you end up doing this, open an issue in the repository and we will try to expose that field in the public API.
 **Notes**:
 - Use **context** (``with FlowceptTask(...)``) *or* call ``send()`` explicitly.
 - Flows publish to the MQ; persistence/queries require a DB (e.g., MongoDB).
+- See also: `FlowceptTask API reference <https://flowcept.readthedocs.io/en/latest/api-reference.html#flowcepttask>`_
 - See also: `Consumer example <https://flowcept.readthedocs.io/en/latest/prov_storage.html#example-extending-the-base-consumer>`_
 - See also: `Ping pong example via PubSub with Flowcept <https://github.com/ORNL/flowcept/blob/main/examples/consumers/ping_pong_example.py>`_
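Reviewer note on the FlowceptLoop vs FlowceptLightweightLoop text added in this file: the key behavioral difference is one publish call per iteration versus a single batched publish after the loop ends. The toy classes below illustrate only that trade-off. They are hypothetical stand-ins, not Flowcept's implementation, and the `send` and `send_many` callables merely play the role that the docs assign to `intercept(...)` and `intercept_many([...])`.

```python
class StreamingLoop:
    """Toy analogue of per-iteration publishing: one send per item."""

    def __init__(self, items, send):
        self._items = items
        self._send = send
        self._current = None

    def __iter__(self):
        for item in self._items:
            # Only the current iteration record is kept in memory.
            self._current = {"used": {"item": item}}
            yield item

    def end_iter(self, generated):
        self._current["generated"] = generated
        self._send(self._current)


class BatchedLoop:
    """Toy analogue of batched publishing: records pre-allocated, sent once."""

    def __init__(self, items, send_many):
        self._items = list(items)  # a known length is required up front
        self._tasks = [{"used": {"item": item}} for item in self._items]
        self._send_many = send_many
        self._idx = -1

    def __iter__(self):
        for idx, item in enumerate(self._items):
            self._idx = idx
            yield item
        # Reached only when the loop runs to completion (no early break).
        self._send_many(self._tasks)

    def end_iter(self, generated):
        self._tasks[self._idx]["generated"] = generated


stream_calls, batch_calls = [], []

loop = StreamingLoop(range(3), stream_calls.append)
for i in loop:
    loop.end_iter({"v": i * i})

loop = BatchedLoop(range(3), batch_calls.append)
for i in loop:
    loop.end_iter({"v": i * i})
```

After both loops finish, `stream_calls` holds three separate publishes while `batch_calls` holds one publish carrying three records, mirroring the "Publishing" and "Memory" bullets in the added documentation.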

{flowcept-0.9.10 → flowcept-0.9.12}/docs/prov_query.rst RENAMED

@@ -72,72 +72,85 @@ Below is a typical usage pattern:
 The `DBAPI` exposes many other methods, such as `get_tasks_recursive` to retrieve all descendants of a task, or `dump_tasks_to_file_recursive` to export tasks to Parquet. See the API reference for details.
-Accessing the In‑Memory Buffer
+Accessing the in-memory buffer
 ------------------------------
-During runtime Flowcept stores captured messages in an in‑memory buffer (`Flowcept.buffer`). This buffer is useful for debugging or lightweight scripts because it provides immediate access to the latest tasks and workflows without any additional services. However, if running online, be aware that this buffer is flushed (i.e., emptied) from times to times to the MQ.
+Flowcept keeps recently captured messages in memory as a list of dictionaries. This is handy for debugging and lightweight scripts. In online mode the buffer may be flushed to the MQ periodically.
-In the example below we create two tasks that attach binary data and then inspect the buffer:
+.. code-block:: python
+   from flowcept import Flowcept
+   with Flowcept(workflow_name="demo") as f:
+       # ... run your tasks ...
+       raw_list = f.get_buffer()                 # list[dict]
+       df = f.get_buffer(return_df=True)         # pandas.DataFrame with dotted columns
+       assert "generated.attention" in df.columns
+Dumping the buffer to disk (online or offline)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+You can persist the buffer to a JSON Lines file in both offline and online runs.
 .. code-block:: python
-    from pathlib import Path
-    from flowcept import Flowcept
-    from flowcept.instrumentation.task import FlowceptTask
-    with Flowcept() as f:
-        used_args = {"a": 1}
-        # first task – attach a PDF
-        with FlowceptTask(used=used_args) as t:
-            img_path = Path("docs/img/architecture.pdf")
-            with open(img_path, "rb") as fp:
-                img_data = fp.read()
-            t.end(generated={"b": 2},
-                  data=img_data,
-                  custom_metadata={
-                      "mime_type": "application/pdf",
-                      "file_name": "architecture.pdf",
-                      "file_extension": "pdf"})
-            t.send()
-        # second task – attach a PNG
-        with FlowceptTask(used=used_args) as t:
-            img_path = Path("docs/img/flowcept-logo.png")
-            with open(img_path, "rb") as fp:
-                img_data = fp.read()
-            t.end(generated={"c": 2},
-                  data=img_data,
-                  custom_metadata={
-                      "mime_type": "image/png",
-                      "file_name": "flowcept-logo.png",
-                      "file_extension": "png"})
-            t.send()
-        # inspect the buffer
-        assert len(Flowcept.buffer) == 3  # includes the workflow message
-        assert Flowcept.buffer[1]["data"]  # binary data is captured as bytes
-At any point inside the running workflow you can access `Flowcept.buffer` to retrieve a list of dictionaries representing messages. Each element contains the original JSON payload plus any binary `data` field. Because the buffer lives in memory, it reflects the most recent state of the workflow and is cleared when the process ends.
-Working Offline: Reading a Messages File
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-When persistence is enabled in offline mode, Flowcept dumps the buffer to a JSONL file. Use :func:`Flowcept.read_messages_file` to load these messages later. If you pass `return_df=True` Flowcept will normalise nested fields into dot‑separated columns and return a pandas DataFrame. This is handy for ad‑hoc analysis with pandas.
+   with Flowcept(workflow_name="demo") as f:
+       # ... run your tasks ...
+       f.dump_buffer()                  # uses settings path (see below)
+       f.dump_buffer("my_buffer.jsonl") # custom path
+Default configuration enables dumping to ``flowcept_buffer.jsonl``:
+- ``"project": {"dump_buffer": {"enabled": True, "path": "flowcept_buffer.jsonl"}}``
+You can control DB flushing and the buffer path in your settings:
+.. code-block:: yaml
+   project:
+     db_flush_mode: online   # "online" or "offline"
+     dump_buffer:
+       enabled: true
+       path: flowcept_buffer.jsonl
+- **Offline mode**: set ``project.db_flush_mode: offline`` to keep messages local.
+- **Online mode**: keep ``online``; you can still dump and read the file at any time.
+Reading a buffer file (list or DataFrame)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Use :meth:`Flowcept.read_buffer_file` to load a buffer file later. If no file path is provided, the path configured in ``settings.yaml`` is used.
 .. code-block:: python
-    from flowcept import Flowcept
+   from flowcept import Flowcept
-    # read JSON into a list of dicts
-    msgs = Flowcept.read_messages_file("offline_buffer.jsonl")
-    print(f"{len(msgs)} messages")
+   # 1) List of dicts
+   msgs = Flowcept.read_buffer_file("flowcept_buffer.jsonl")
+   print(f"Loaded {len(msgs)} messages")
+   # 2) DataFrame without flattening (nested dicts stay as objects)
+   df_raw = Flowcept.read_buffer_file("flowcept_buffer.jsonl", return_df=True, normalize_df=False)
+   # 3) DataFrame with dotted columns (normalized)
+   df_norm = Flowcept.read_buffer_file("flowcept_buffer.jsonl", return_df=True, normalize_df=True)
+   assert "generated.attention" in df_norm.columns
+Deleting a buffer file
+^^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: python
-    # read JSON into a pandas DataFrame
-    df = Flowcept.read_messages_file("offline_buffer.jsonl", return_df=True)
-    # dot‑notation columns allow easy selection; e.g., outputs of attention layers
-    print("generated.attention" in df.columns)
+   from flowcept import Flowcept
+   Flowcept.delete_buffer_file()                 # deletes default path from settings
+   Flowcept.delete_buffer_file("my_buffer.jsonl")
-Keep in mind that the JSONL file is only created when using fully offline mode. The path is configured in the settings file under ``DUMP_BUFFER_PATH``. If the file doesn’t exist, `read_messages_file` will raise an error.
+Notes
+^^^^^
+- DataFrame returns require ``pandas``. If you installed Flowcept with optional extras, ``pandas`` is included.
+- Binary payloads, when present, are stored under the ``data`` key in the buffer messages. However, they are not stored in the buffer file.
+- See also: `persisting the in-memory buffer. <https://flowcept.readthedocs.io/en/latest/prov_storage.html#saving-the-in-memory-buffer-to-disk>`_
 Working Directly with MongoDB
 -----------------------------
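Reviewer note on the buffer-file section added in this file: the docs describe a JSON Lines file that can be loaded as a list of dicts or as a DataFrame with dotted columns such as `generated.attention`. The stdlib-only sketch below shows what that shape looks like. It is an illustration, not Flowcept's reader: `read_jsonl` and `flatten` are hypothetical helpers, the sample messages are invented, and the assumption that `normalize_df=True` yields equivalent dotted keys is not confirmed by the diff.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Load a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fp:
        return [json.loads(line) for line in fp if line.strip()]


def flatten(record, prefix=""):
    """Flatten nested dicts into dot-separated keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Invented sample messages, shaped like the used/generated fields in the docs.
messages = [
    {"task_id": "t1", "used": {"a": 1}, "generated": {"attention": 0.5}},
    {"task_id": "t2", "used": {"a": 2}, "generated": {"attention": 0.7}},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_buffer.jsonl"
    path.write_text("\n".join(json.dumps(m) for m in messages) + "\n", encoding="utf-8")
    loaded = read_jsonl(path)

rows = [flatten(message) for message in loaded]
```

Each entry in `rows` is a flat dict whose keys correspond to the dotted column names the documentation asserts on.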

{flowcept-0.9.10 → flowcept-0.9.12}/docs/prov_storage.rst RENAMED

@@ -12,9 +12,9 @@ For optional persistence, you can choose between:
 - `MongoDB <https://www.mongodb.com/>`_
   A robust, service-based database with advanced query support.
   Required to use Flowcept's Query API (``flowcept.Flowcept.db``) for complex queries and features like ML model management or runtime queries (query while writing).
-  To use MongoDB, start the service with ``make services-mongo``.
 Flowcept supports writing to both databases simultaneously (default), individually, or to neither, depending on configuration.
+See `setup instructions <https://flowcept.readthedocs.io/en/latest/setup.html#setup>`_.
 If persistence is disabled, captured data is sent to the MQ without any default consumer subscribing to it.
 In this case, querying requires writing a custom consumer to subscribe and store the data.
@@ -26,6 +26,95 @@ In this case, querying requires writing a custom consumer to subscribe and store
    If neither is enabled, an error occurs.
    Data stored in MongoDB and LMDB are interchangeable and can be transferred between them.
+Saving the In-Memory Buffer to Disk
+-----------------------------------
+Flowcept can persist the in-memory message buffer to a **JSON Lines (JSONL)** file in both **offline** and **online** modes. This is useful for audits, simple centralized runs, and quick ad‑hoc analysis.
+Configuration
+^^^^^^^^^^^^^
+By default, dumping is enabled and writes to ``flowcept_buffer.jsonl``.
+To favor local files (**offline**), set:
+.. code-block:: yaml
+   project:
+     db_flush_mode: offline   # keeps messages local (no DB writes)
+     dump_buffer:
+       enabled: true
+       path: flowcept_buffer.jsonl
+For standard **online** runs (DB writes enabled) while still keeping a file copy:
+.. code-block:: yaml
+   project:
+     db_flush_mode: online    # default
+     dump_buffer:
+       enabled: true
+       path: flowcept_buffer.jsonl
+Usage
+^^^^^
+Dump the buffer (during or at the end of a run):
+.. code-block:: python
+   from flowcept import Flowcept
+   with Flowcept(workflow_name="demo") as f:
+       # ... your tasks ...
+       f.dump_buffer()                   # uses settings path
+       f.dump_buffer("my_buffer.jsonl") # custom path
+Read the buffer file later (as list or DataFrame):
+.. code-block:: python
+   from flowcept import Flowcept
+   # 1) List of dicts
+   msgs = Flowcept.read_buffer_file("flowcept_buffer.jsonl")
+   # 2) DataFrame without flattening (nested dicts stay as objects)
+   df_raw = Flowcept.read_buffer_file("flowcept_buffer.jsonl", return_df=True, normalize_df=False)
+   # 3) DataFrame with dotted columns (normalized)
+   df_norm = Flowcept.read_buffer_file("flowcept_buffer.jsonl", return_df=True, normalize_df=True)
+Delete a buffer file if needed:
+.. code-block:: python
+   from flowcept import Flowcept
+   Flowcept.delete_buffer_file()                  # deletes default path from settings
+   Flowcept.delete_buffer_file("my_buffer.jsonl")
+.. note::
+   The file-based method is **best suited for offline mode** or small, centralized runs.
+   Each ``interceptor`` in a Flowcept instance maintains its own in-memory buffer.
+   In distributed settings (e.g., HPC jobs or distributed workflows), this creates separate buffer
+   files per interceptor. To run an end-to-end analysis, you must manually merge all files.
+   For distributed runs, prefer the **MongoDB** provenance storage option, which consolidates all
+   captured provenance into a single database automatically.
+   Alternatively, implement a **custom consumer** to centralize message ingestion and
+   enable real-time analysis.
+See also
+^^^^^^^^
+- `Buffer querying <https://flowcept.readthedocs.io/en/latest/prov_query.html#accessing-the-in-memory-buffer>`_
+- `Implementing a custom consumer <https://flowcept.readthedocs.io/en/latest/prov_storage.html#example-extending-the-base-consumer>`_
+- `Flowcept API Reference <https://flowcept.readthedocs.io/en/latest/api-reference.html#main-flowcept-object>`_
 ---
 Provenance Consumer
@@ -84,6 +173,7 @@ This can serve as a template for building custom provenance consumers.
        consumer = MyConsumer()
        consumer.start(daemon=False)
 **Notes**:
 - See also: `Explicit publish example <https://flowcept.readthedocs.io/en/latest/prov_capture.html#custom-task-creation-fully-customizable>`_
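Reviewer note on the distributed-run caveat added in this file: the new note says each interceptor keeps its own buffer, so distributed runs produce several JSONL files that must be merged manually before an end-to-end analysis. A minimal merge could look like the sketch below. The `buffer_*.jsonl` naming and the `merge_buffer_files` helper are assumptions for illustration, since the diff does not say how per-interceptor files are named.

```python
import json
import tempfile
from pathlib import Path


def merge_buffer_files(paths, out_path):
    """Concatenate JSONL files into one file and return the record count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as fp:
                for line in fp:
                    line = line.strip()
                    if not line:
                        continue
                    json.loads(line)  # fail early on a corrupt record
                    out.write(line + "\n")
                    count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Two hypothetical per-interceptor buffer files.
    (tmp / "buffer_rank0.jsonl").write_text(
        json.dumps({"task_id": "a"}) + "\n" + json.dumps({"task_id": "b"}) + "\n",
        encoding="utf-8",
    )
    (tmp / "buffer_rank1.jsonl").write_text(
        json.dumps({"task_id": "c"}) + "\n", encoding="utf-8"
    )

    merged_path = tmp / "merged.jsonl"
    merged_count = merge_buffer_files(sorted(tmp.glob("buffer_*.jsonl")), merged_path)
    merged_ids = [
        json.loads(line)["task_id"]
        for line in merged_path.read_text(encoding="utf-8").splitlines()
    ]
```

The merged file keeps records in file order. If a global ordering matters, sorting on a timestamp field would be a natural next step, provided the messages carry one.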
