PyPI - lsst-pipe-base - Versions diffs - 30.2026.200__tar.gz → 30.2026.400__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (193)

{lsst_pipe_base-30.2026.200/python/lsst_pipe_base.egg-info → lsst_pipe_base-30.2026.400}/PKG-INFO RENAMED

@@ -1,10 +1,11 @@
 Metadata-Version: 2.4
 Name: lsst-pipe-base
-Version: 30.2026.200
+Version: 30.2026.400
 Summary: Pipeline infrastructure for the Rubin Science Pipelines.
 Author-email: Rubin Observatory Data Management <dm-admin@lists.lsst.org>
 License-Expression: BSD-3-Clause OR GPL-3.0-or-later
 Project-URL: Homepage, https://github.com/lsst/pipe_base
+Project-URL: Source, https://github.com/lsst/pipe_base
 Keywords: lsst
 Classifier: Intended Audience :: Science/Research
 Classifier: Operating System :: OS Independent

{lsst_pipe_base-30.2026.200 → lsst_pipe_base-30.2026.400}/doc/lsst.pipe.base/CHANGES.rst RENAMED

@@ -1,3 +1,69 @@
+lsst-pipe-base v30.0.0 (2026-01-16)
+===================================
+New Features
+------------
+- Added support for healpix (and other non-database dimensions) in quantum graph builder. (`DM-51176 <https://rubinobs.atlassian.net/browse/DM-51176>`_)
+- Added filtering out dataset refs that the destination butler already knows in ``transfer_from_graph`` as well as dividing the transfer into smaller chunks to speed up restarts. (`DM-51273 <https://rubinobs.atlassian.net/browse/DM-51273>`_)
+- Added handling of ``PrerequisiteInput`` in ``QuantaAdjuster``, with a corresponding unit test. (`DM-51509 <https://rubinobs.atlassian.net/browse/DM-51509>`_)
+- Added ``PredictedQuantumGraph``, a replacement for the ``QuantumGraph`` class with more efficient I/O (via a new file format and more partial-read flexibility).
+  The new ``PredictedQuantumGraph`` is now the default in most tooling, and the new format can be opted into via the ``.qg`` (instead of ``.qgraph``) file extension.
+  New files can be read with the old class and vice versa.
+  The ``QuantumGraph`` class will eventually be deprecated along with much of the current provenance reporting tooling, but only when the new provenance ecosystem is fully in place. (`DM-51850 <https://rubinobs.atlassian.net/browse/DM-51850>`_)
+- Added a ``rename`` dict in ``ImportIR`` to support renaming task labels, with corresponding unit tests. (`DM-52168 <https://rubinobs.atlassian.net/browse/DM-52168>`_)
+- Added the new ``ProvenanceQuantumGraph`` class and the ``aggregate-graph`` tool (a replacement for ``transfer-from-graph``) that writes it at the end of batch runs. (`DM-52360 <https://rubinobs.atlassian.net/browse/DM-52360>`_)
+- Improved provenance tracking for failed quanta and retries.
+  By storing extra information in the log datasets written during execution,
+  we can record caught exceptions, track which other quanta have
+  already executed in the same process, and keep track of previous attempts to
+  run the same quantum. (`DM-53019 <https://rubinobs.atlassian.net/browse/DM-53019>`_)
+- Added provenance writing support to ``MPGraphExecutor`` and ``SeparablePipelineExecutor``. (`DM-53622 <https://rubinobs.atlassian.net/browse/DM-53622>`_)
+API Changes
+-----------
+- Moved pipeline executors and their support code here, from ``lsst.ctrl.mpexec``.
+  This included minor API changes for ``SingleQuantumExecutor`` as well: consistent snake-case naming, keyword-only arguments for construction, and a switch to private instance attributes. (`DM-48980 <https://rubinobs.atlassian.net/browse/DM-48980>`_)
+Bug Fixes
+---------
+- Fixed ``transfer-from-graph`` to update the chain when asked and the output run collection exists, even if it didn't transfer any datasets. (`DM-51821 <https://rubinobs.atlassian.net/browse/DM-51821>`_)
+- Fixed bug in ``transfer_from_graph`` where the input collections were not flattened before adding to a new output chain. (`DM-52004 <https://rubinobs.atlassian.net/browse/DM-52004>`_)
+- Fixed bug where the log's ``MDC.RUN`` was the empty string when using a quantum-backed butler. (`DM-52676 <https://rubinobs.atlassian.net/browse/DM-52676>`_)
+- Fixed a bug that caused ``PipelineGraph`` objects to be marked as unresolved when loaded from disk. (`DM-52787 <https://rubinobs.atlassian.net/browse/DM-52787>`_)
+Other Changes and Additions
+---------------------------
+- The ``Instrument.configPaths`` property can now refer to ``lsst.resources.ResourcePath`` URIs as well as strings (paths or URI strings). (`DM-33226 <https://rubinobs.atlassian.net/browse/DM-33226>`_)
+- Updated pipe_base code to use the constants defined in ``automatic_connection_constants.py``. (`DM-52676 <https://rubinobs.atlassian.net/browse/DM-52676>`_)
+- Uses UUIDs instead of internal integer IDs in new quantum graph storage.
+  This includes backwards compatibility read support for predicted quantum graphs, but not provenance quantum graphs, because those are still experimental anyway.
+  This increases the size of the files by ~6%, but it simplifies the codebase and will make consolidating multiple small provenance quantum graphs (as we
+  currently anticipate doing for prompt processing) much more efficient. (`DM-53174 <https://rubinobs.atlassian.net/browse/DM-53174>`_)
+- Used context managers to ensure that database resources are freed. (`DM-53370 <https://rubinobs.atlassian.net/browse/DM-53370>`_)
+- Added more logging for the later steps of QG building. (`DM-53636 <https://rubinobs.atlassian.net/browse/DM-53636>`_)
+An API Removal or Deprecation
+-----------------------------
+- Removes the ``buildExecutionButler`` function and all supporting code.
+  Execution butlers (read-only SQLite databases used for batch execution) have been fully superseded by ``lsst.daf.butler.QuantumBackedButler``. (`DM-52044 <https://rubinobs.atlassian.net/browse/DM-52044>`_)
 lsst-pipe-base v29.1.0 (2025-06-13)
 ===================================

{lsst_pipe_base-30.2026.200 → lsst_pipe_base-30.2026.400}/doc/lsst.pipe.base/creating-a-pipeline.rst RENAMED

@@ -150,12 +150,12 @@ associated with ``class`` keyword instead of the label directly. The
 the configuration appropriate for this `Pipeline` specified as an additional
 yaml mapping.
-The complete complexity of :ref:`lsst.pex.config` can't be represented with simple
+The complete complexity of `lsst.pex.config` can't be represented with simple
 yaml mapping syntax. To account for this, ``config`` blocks in `Pipeline`\ s
 support two special fields: ``file`` and ``python``.
 The ``file`` key may be associated with either a single value pointing to a
-filesystem path where a :ref:`lsst.pex.config` file can be found, or a yaml list
+filesystem path where a `lsst.pex.config` file can be found, or a yaml list
 of such paths. The file paths can contain environment variables that will be
 expanded prior to loading the file(s). These files will then be applied to
 the task during configuration time to override any default values.
@@ -477,7 +477,7 @@ desired camera, or can serve as a base for further `Pipeline`\ s to import.
 Command line options for running Pipelines
 ------------------------------------------
 This section is not intended to serve as a tutorial for processing data from
-the command line, for that refer to :ref:`lsst.ctrl.mpexec` or :ref:`lsst.ctrl.bps`.
+the command line, for that refer to `lsst.ctrl.mpexec` or `lsst.ctrl.bps`.
 However, both of these tools accept URI pointers to a `Pipeline`.  These URIs
 can be altered with a specific syntax which will control how the `Pipeline`
 is loaded.

{lsst_pipe_base-30.2026.200 → lsst_pipe_base-30.2026.400}/doc/lsst.pipe.base/creating-a-pipelinetask.rst RENAMED

@@ -142,7 +142,7 @@ not tied to the exact band passes of an individual telescope filter).
 Next, take a look at the fields defined on your new connection class. These
 are defined in a similar way as defining a configuration class, but instead
-of using `~lsst.pex.config.Field` types from :ref:`lsst.pex.config`,
+of using `~lsst.pex.config.Field` types from `lsst.pex.config`,
 connection classes make use of connection types defined in
 :py:mod:`lsst.pipe.base.connectionTypes`. These connections define the inputs and outputs that
 a |PipelineTask| will expect to make use of. Each of these connections documents

{lsst_pipe_base-30.2026.200 → lsst_pipe_base-30.2026.400}/doc/lsst.pipe.base/creating-a-task.rst RENAMED

@@ -145,7 +145,7 @@ Use the ``__init__`` method (task constructor) to do the following:
 - Call the parent task's ``__init__`` method
 - Make subtasks by calling ``self.makeSubtask(name)``, where ``name`` is the name of a field of type `lsst.pex.config.ConfigurableField` in your :ref:`task's configuration <creating-a-task-configuration>`.
-- Make a schema if your task uses an :ref:`lsst.afw.table`.
+- Make a schema if your task uses an `lsst.afw.table`.
   For an example of such a task, see `lsst.pipe.tasks.calibrate.CalibrateTask`.
 - Initialize any other instance variables your task needs.

{lsst_pipe_base-30.2026.200 → lsst_pipe_base-30.2026.400}/doc/lsst.pipe.base/index.rst RENAMED

@@ -61,6 +61,14 @@ Developing Pipelines
    testing-pipelines-with-mocks.rst
    working-with-pipeline-graphs.rst
+Running Pipelines
+-----------------
+.. toctree::
+   :maxdepth: 1
+   recording-provenance.rst
 .. _lsst.pipe.base-contributing:
 Contributing
@@ -102,6 +110,10 @@ Python API reference
 .. automodapi:: lsst.pipe.base.quantum_graph
+.. automodapi:: lsst.pipe.base.quantum_graph.aggregator
+.. automodapi:: lsst.pipe.base.quantum_graph.ingest_graph
 .. automodapi:: lsst.pipe.base.quantum_graph.visualization
 QuantumGraph generation API reference

lsst_pipe_base-30.2026.400/doc/lsst.pipe.base/recording-provenance.rst ADDED

@@ -0,0 +1,108 @@
+.. _pipe_base_provenance:
+.. py:currentmodule:: lsst.pipe.base.quantum_graph
+####################
+Recording Provenance
+####################
+The `PredictedQuantumGraph` that is used to predict and control processing also contains a wealth of provenance information, including task configuration and the complete input-output relationships between all datasets.
+Instead of storing these graphs directly in a `~lsst.daf.butler.Butler` repository, however, it is better to first augment them with additional provenance information that is only available after execution has completed, producing a `ProvenanceQuantumGraph` that is ingested instead.
+We store provenance in a ``run_provenance`` dataset type with empty dimensions, which means there is exactly one for each `~lsst.daf.butler.CollectionType.RUN` collection.
+In addition to the input-output graph itself and full configuration for all tasks, `ProvenanceQuantumGraph` stores status information for each attempt to run a quantum, including exception information and caveats on any successes.
+It can also store the full logs and task metadata for each quantum, allowing repositories to store many fewer small files (it is possible to continue to have per-quantum butler datasets for these, all backed by the same file).
+The pipeline system has many different execution contexts, and provenance recording is not supported in all of them at this time.
+Batch Execution / Quantum-Backed Butler
+=======================================
+Provenance recording is fully supported in batch workflows that use the `~lsst.daf.butler.QuantumBackedButler` class (e.g. ``pipetask run-qbb``, as run by the ``bps`` tool) to avoid database writes during execution.
+This involves the following steps:
+- A `PredictedQuantumGraph` is generated as usual (e.g. via ``pipetask qgraph``, as run by ``bps submit``) and saved to a known location.
+- All quanta are executed via ``pipetask run-qbb``, writing their outputs to butler-managed storage without updating the butler database.
+- When all quanta have been attempted, the ``butler aggregate-graph`` tool is run (e.g. in the BPS ``finalJob``) to ingest output datasets into the butler database, and the ``--output`` option is used to save a `ProvenanceQuantumGraph` to a known location.
+  This step and the previous one may be run multiple times (e.g. via ``bps restart``) to retry some failures, and it is only necessary to pass ``--output`` the last time (though usually the user does not know which attempt will be the last one).
+- When all processing attempts are complete, the ``butler ingest-graph`` tool is used to ingest the graph into the butler database and rewrite all metadata, log, and config datasets to also be backed by the same graph file (deleting the original files).
+  This step should not be included in the BPS ``finalJob`` (see below).
+All of the above happens in a single `~lsst.daf.butler.CollectionType.RUN` collection.
+Reference documentation for ``butler aggregate-graph`` and ``butler ingest-graph`` can be found in the `aggregator` and `ingest_graph` modules that implement them (respectively); in both cases there are Python interfaces that closely mirror the command-line ones.
+Parallelization
+---------------
+Aggregating and ingesting a large batch run is expensive, and both tools use parallelism whenever possible to improve performance.
+The aggregator in particular is explicitly parallel, with separate workers (usually subprocesses) assigned to scan and read metadata and log files (any number of workers), ingest datasets (a single worker), write the provenance graph file (a single worker), and coordinate all of these operations.
+Since all information must be passed from the scanners to the ingestion and writer workers, additional parallelism can help when all operations are running at around the same speed (as reported in the logs), but not when ingestion or writing lags significantly behind.
+The writer process has substantial startup overhead and will typically lag the others at the beginning before it catches up later.
+The `ingest_graph` tool mostly performs database write operations, which do not benefit from parallelism, but it also deletes the original metadata, log, and config files as the new graph-backed variants of those datasets are ingested.
+These deletes are delegated to `lsst.resources.ResourcePath.mremove`, which refers to the ``LSST_RESOURCES_NUM_WORKERS``, ``LSST_RESOURCES_EXECUTOR``, and ``LSST_S3_USE_THREADS`` environment variables to control parallelism.
+As with other butler bulk-delete operations, the default parallelism is usually fine.
+.. note::
+  Earlier versions of the `aggregator` would run catastrophically slowly when ``LSST_RESOURCES_EXECUTOR=process``, as this made each scanner process spawn multiple subprocesses constantly.
+  In recent versions all parallelism environment variables are ignored by the aggregator so this should not occur.
+Ingesting Outputs Early
+-----------------------
+The `aggregator` may be run with `~aggregator.AggregatorConfig.incomplete` set to `True` (``--incomplete`` on the command line) to allow it to be safely run before the graph has finished executing.
+Note that while ingestion always picks up where it left off, scanning always has to start at the beginning, and provenance graph writing is disabled when running in ``incomplete`` mode, so while this allows output datasets to be available via the `~lsst.daf.butler.Butler` sooner, it does not generally make the final complete `aggregator` call substantially faster.
+Promising Graph Ingestion
+-------------------------
+By default, the `aggregator` ingests all metadata, log, and config outputs into the butler database in the usual way, i.e. backed by their original individual files.
+The `ingest_graph` tool then has to delete these datasets from the butler database before it can ingest new ones and delete the original files.
+When it is known in advance that `ingest_graph` will be run later, the `~aggregator.AggregatorConfig.promise_ingest_graph` (``--promise-ingest-graph``) option can be used to tell the `aggregator` *not* to ingest these, saving time for both commands.
+This option must be used with care, however: if `ingest_graph` isn't run later, the original files will be orphaned in a butler-managed location without any record in the database, which generally means they'll quietly take up space.
+In addition, because the metadata datasets are used by the middleware system as the indicator of a quantum's success, their absence will make any downstream quantum graphs built using ``--skip-existing-in`` incorrect.
+And of course any downstream quantum graph builds that actually use those datasets as input (only metadata should be) will not see them as available.
+Deferring Graph Ingestion
+-------------------------
+Ingesting the provenance graph is not generally necessary to kick off downstream processing by building new quantum graphs for later pipeline steps, and it is always safe to build downstream quantum graphs if `~aggregator.AggregatorConfig.promise_ingest_graph` is left `False`.
+It can also be done safely if `~aggregator.AggregatorConfig.promise_ingest_graph` is `True` and:
+ - ``--skip-existing-in`` is not used;
+ - the downstream processing does not use metadata, log, or config datasets as an overall input (``pipetask build ... --show inputs`` can be used to check for this).
+These conditions also must be met in order for `ingest_graph` to be safely run *while* a downstream quantum graph is being executed.
+Both of these conditions are *usually* met, and deferring and promising graph ingest each provide significant wall-clock savings, so we recommend the following approach for very large BPS campaigns:
+- Submit ``step(N)`` to BPS with ``--promise-ingest-graph`` in the ``finalJob`` invocation of ``aggregate-graph``.
+- When ready to move on to ``step(N+1)``, run ``pipetask build ... --show inputs`` (on ``step(N+1)``) to scan for metadata, log, and config inputs that may be needed from the previous step.
+- If there are no such inputs, immediately submit that step to BPS, and run `ingest_graph` on ``step(N)`` as soon as the quantum graph for ``step(N+1)`` is built (it could be built at the same time, but waiting a bit may help spread out database load).
+- If there are metadata, log, or config inputs, run `ingest_graph` on ``step(N)`` and wait for it to finish before submitting ``step(N+1)``.
+Note that *independent* quantum graph builds (e.g. same tasks, disjoint data IDs) can always be built before or while `ingest_graph` runs.
+Recovering from Interruptions
+-----------------------------
+If the `aggregator` is interrupted it can simply be started again.
+Database ingestion will pick up where it left off, while scanning and provenance-graph writing will start over from the beginning.
+If `ingest_graph` is interrupted, it can also be started again, and everything will pick up where it left off.
+To guarantee this it always modifies the repository in the following order:
+- if the ``run_provenance`` dataset does not exist in the collection, all existing metadata/log/config datasets are assumed to be backed by their original files and are deleted from the butler database (without deleting the files);
+- the ``run_provenance`` dataset itself is ingested (this ensures the metadata/log/config *content* is safe inside the butler, even if it's not fully accessible);
+- in batches, metadata/log/config datasets are reingested into the butler backed by the graph file, and then the corresponding original files are deleted.
+This means we can use the existence of ``run_provenance`` and any particular metadata/log/config dataset in the butler database to infer the status of the original files.
+In fact, if `ingest_graph` is interrupted at any point, it *must* be tried again until it succeeds, since not doing so can leave metadata/log/config files orphaned, just like when `~aggregator.AggregatorConfig.promise_ingest_graph` is `True`.
+.. note::
+  After the ``run_provenance`` dataset is ingested, it is *not* safe to run the `aggregator`: the `aggregator` reads the original metadata and log files to gather provenance information, and will infer the wrong states for quanta if those are missing because `ingest_graph` has deleted them.
+  This is why it is not safe to run ``bps restart`` after `ingest_graph`, and why we do not recommend adding `ingest_graph` to the BPS ``finalJob``, even if the user is willing to forgo using ``bps restart``: by default, the ``finalJob`` will be retried on failure, causing the `aggregator` to run again when it may not be safe to do so.
+  And if ``finalJob`` retries are disabled, it is too easy for the repository to end up in a state that would require manual `ingest_graph` runs to prevent orphan datasets.
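
Editorial sketch (not part of the diff): the conditions under "Deferring Graph Ingestion" in the file above can be read as a small decision rule. The helper below is one possible reading of that text, not code from `lsst.pipe.base`. The function name, its parameters, and the three state strings are hypothetical, and the caller is assumed to have worked out the inputs by hand (for example from ``pipetask build ... --show inputs``).

```python
from typing import Literal

# Hypothetical labels for where `butler ingest-graph` stands for step(N).
IngestState = Literal["not_started", "running", "finished"]


def can_build_downstream(
    *,
    promise_ingest_graph: bool,
    ingest_graph_state: IngestState,
    uses_skip_existing_in: bool,
    needs_metadata_log_or_config_inputs: bool,
) -> bool:
    """Return True if building the step(N+1) quantum graph looks safe."""
    if ingest_graph_state == "finished":
        # All metadata/log/config datasets are back in the butler database.
        return True
    # The two conditions listed in the documentation.
    conditions_met = (
        not uses_skip_existing_in and not needs_metadata_log_or_config_inputs
    )
    if ingest_graph_state == "running":
        # Assumption: while ingest-graph is rewriting datasets, the same two
        # conditions apply whether or not ingestion was promised.
        return conditions_met
    # ingest-graph has not started: always safe unless ingestion was promised.
    return (not promise_ingest_graph) or conditions_met
```

With ``--promise-ingest-graph`` in effect and ``--skip-existing-in`` in use downstream, the helper returns `False` until `ingest_graph_state` is ``"finished"``, which matches the "run `ingest_graph` and wait" branch of the recommended campaign workflow.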

{lsst_pipe_base-30.2026.200 → lsst_pipe_base-30.2026.400}/pyproject.toml RENAMED

@@ -45,6 +45,7 @@ pipe_base = "lsst.pipe.base.cli:get_cli_subcommands"
 [project.urls]
 "Homepage" = "https://github.com/lsst/pipe_base"
+"Source" = "https://github.com/lsst/pipe_base"
 [project.optional-dependencies]
 test = ["pytest >= 3.2"]

{lsst_pipe_base-30.2026.200 → lsst_pipe_base-30.2026.400}/python/lsst/pipe/base/_instrument.py RENAMED

@@ -31,7 +31,6 @@ __all__ = ("Instrument",)
 import contextlib
 import datetime
-import os.path
 from abc import ABCMeta, abstractmethod
 from collections.abc import Sequence
 from typing import TYPE_CHECKING, Any, Self, cast, final
@@ -39,6 +38,7 @@ from typing import TYPE_CHECKING, Any, Self, cast, final
 from lsst.daf.butler import DataCoordinate, DataId, DimensionPacker, DimensionRecord, Formatter
 from lsst.daf.butler.registry import DataIdError
 from lsst.pex.config import Config, RegistryField
+from lsst.resources import ResourcePath, ResourcePathExpression
 from lsst.utils import doImportType
 from lsst.utils.introspection import get_full_type_name
@@ -65,7 +65,7 @@ class Instrument(metaclass=ABCMeta):
     the base class.
     """
-    configPaths: Sequence[str] = ()
+    configPaths: Sequence[ResourcePathExpression] = ()
     """Paths to config files to read for specific Tasks.
     The paths in this list should contain files of the form `task.py`, for
@@ -109,6 +109,10 @@ class Instrument(metaclass=ABCMeta):
             If `True` (`False` is default), update existing records if they
             differ from the new ones.
+        Returns
+        -------
+        None
         Raises
         ------
         lsst.daf.butler.registry.ConflictingDefinitionError
@@ -127,13 +131,6 @@ class Instrument(metaclass=ABCMeta):
         the level of individual dimension entries; new detectors and filters
         should be added, but changes to any existing record should not be.
         This can generally be achieved via a block like
-        .. code-block:: python
-            with registry.transaction():
-                registry.syncDimensionData("instrument", ...)
-                registry.syncDimensionData("detector", ...)
-                self.registerFilters(registry)
         """
         raise NotImplementedError()
@@ -366,9 +363,10 @@ class Instrument(metaclass=ABCMeta):
             Config instance to which overrides should be applied.
         """
         for root in self.configPaths:
-            path = os.path.join(root, f"{name}.py")
-            if os.path.exists(path):
-                config.load(path)
+            resource = ResourcePath(root, forceDirectory=True, forceAbsolute=True)
+            uri = resource.join(f"{name}.py", forceDirectory=False)
+            if uri.exists():
+                config.load(uri)
     @staticmethod
     def formatCollectionTimestamp(timestamp: str | datetime.datetime) -> str:
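
Editorial sketch (not part of the diff): the last hunk above keeps the same lookup rule while changing how paths are handled. Every entry in ``configPaths`` is treated as a directory, ``<name>.py`` is looked up inside it, and every file that exists is loaded in order, with no early exit. The standard-library stand-in below shows only that rule for local directories. It uses `pathlib` in place of `lsst.resources.ResourcePath`, so it does not cover the remote-URI support that motivated the change, and `find_config_overrides` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def find_config_overrides(config_roots, name):
    """Return every existing ``<root>/<name>.py``, in the order of the roots."""
    found = []
    for root in config_roots:
        # Each root is treated as a directory and made absolute.
        candidate = Path(root).resolve() / f"{name}.py"
        if candidate.exists():
            found.append(candidate)
    return found


# Small demonstration with two populated roots and one missing root.
with tempfile.TemporaryDirectory() as tmp:
    for package in ("obs_a", "obs_b"):
        directory = Path(tmp, package)
        directory.mkdir()
        (directory / "isr.py").write_text("# overrides would go here\n")
    roots = [Path(tmp, "obs_a"), Path(tmp, "obs_b"), Path(tmp, "missing")]
    matches = find_config_overrides(roots, "isr")
    names = [match.parent.name for match in matches]
```

Because all matches are applied in sequence, overrides from a later root are applied after those from an earlier one.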

{lsst_pipe_base-30.2026.200 → lsst_pipe_base-30.2026.400}/python/lsst/pipe/base/_status.py RENAMED

@@ -275,15 +275,23 @@ class ExceptionInfo(pydantic.BaseModel):
 class QuantumAttemptStatus(enum.Enum):
     """Enum summarizing an attempt to run a quantum."""
+    ABORTED = -4
+    """The quantum failed with a hard error that prevented both logs and
+    metadata from being written.
+    This state is only set if information from higher-level tooling (e.g. BPS)
+    is available to distinguish it from ``UNKNOWN``.
+    """
     UNKNOWN = -3
     """The status of this attempt is unknown.
-    This usually means no logs or metadata were written, and it at least could
-    not be determined whether the quantum was blocked by an upstream failure
-    (if it was definitely blocked, `BLOCKED` is set instead).
+    This means no logs or metadata were written, and it at least could not be
+    determined whether the quantum was blocked by an upstream failure (if it
+    was definitely blocked, `BLOCKED` is set instead).
     """
-    LOGS_MISSING = -2
+    ABORTED_SUCCESS = -2
     """Task metadata was written for this attempt but logs were not.
     This is a rare condition that requires a hard failure (i.e. the kind that
@@ -292,20 +300,21 @@ class QuantumAttemptStatus(enum.Enum):
     """
     FAILED = -1
-    """Execution of the quantum failed.
+    """Execution of the quantum failed gracefully.
     This is always set if the task metadata dataset was not written but logs
     were, as is the case when a Python exception is caught and handled by the
-    execution system.  It may also be set in cases where logs were not written
-    either, but other information was available (e.g. from higher-level
-    orchestration tooling) to mark it as a failure.
+    execution system.
+    This status guarantees that the task log dataset was produced but the
+    metadata dataset was not.
     """
     BLOCKED = 0
     """This quantum was not executed because an upstream quantum failed.
-    Upstream quanta with status `UNKNOWN` or `FAILED` are considered blockers;
-    `LOGS_MISSING` is not.
+    Upstream quanta with status `UNKNOWN`, `FAILED`, or `ABORTED` are
+    considered blockers; `ABORTED_SUCCESS` is not.
     """
     SUCCESSFUL = 1
@@ -319,6 +328,16 @@ class QuantumAttemptStatus(enum.Enum):
     these "successes with caveats" are reported.
     """
+    @property
+    def has_metadata(self) -> bool:
+        """Whether the task metadata dataset was produced."""
+        return self is self.SUCCESSFUL or self is self.ABORTED_SUCCESS
+    @property
+    def has_log(self) -> bool:
+        """Whether the log dataset was produced."""
+        return self is self.SUCCESSFUL or self is self.FAILED
 class GetSetDictMetadataHolder(Protocol):
     """Protocol for objects that have a ``metadata`` attribute that satisfies

{lsst_pipe_base-30.2026.200 → lsst_pipe_base-30.2026.400}/python/lsst/pipe/base/automatic_connection_constants.py RENAMED

@@ -26,7 +26,7 @@
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
 """Constants used to define the connections automatically added for each
-PipelineTask by the execution system.
+PipelineTask by the execution system, as well as other special dataset types.
 """
 from __future__ import annotations
@@ -43,6 +43,8 @@ __all__ = (
     "METADATA_OUTPUT_TEMPLATE",
     "PACKAGES_INIT_OUTPUT_NAME",
     "PACKAGES_INIT_OUTPUT_STORAGE_CLASS",
+    "PROVENANCE_DATASET_TYPE_NAME",
+    "PROVENANCE_STORAGE_CLASS",
 )
@@ -91,3 +93,9 @@ type names.
 METADATA_OUTPUT_STORAGE_CLASS: str = "TaskMetadata"
 """Name of the storage class for task metadata output datasets.
 """
+PROVENANCE_DATASET_TYPE_NAME: str = "run_provenance"
+"""Name of the dataset used to store per-RUN provenance."""
+PROVENANCE_STORAGE_CLASS: str = "ProvenanceQuantumGraph"
+"""Name of the storage class used to store provenance."""

{lsst_pipe_base-30.2026.200 → lsst_pipe_base-30.2026.400}/python/lsst/pipe/base/cli/cmd/__init__.py RENAMED

@@ -25,6 +25,20 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see <https://www.gnu.org/licenses/>.
-__all__ = ["register_instrument", "transfer_from_graph", "zip_from_graph", "retrieve_artifacts_for_quanta", "aggregate_graph"]
+__all__ = [
+    "register_instrument",
+    "transfer_from_graph",
+    "zip_from_graph",
+    "retrieve_artifacts_for_quanta",
+    "aggregate_graph",
+    "ingest_graph",
+]
-from .commands import (register_instrument, retrieve_artifacts_for_quanta, transfer_from_graph, zip_from_graph, aggregate_graph)
+from .commands import (
+    register_instrument,
+    retrieve_artifacts_for_quanta,
+    transfer_from_graph,
+    zip_from_graph,
+    aggregate_graph,
+    ingest_graph,
+)

{lsst_pipe_base-30.2026.200 → lsst_pipe_base-30.2026.400}/python/lsst/pipe/base/cli/cmd/commands.py RENAMED

@@ -161,7 +161,7 @@ _AGGREGATOR_DEFAULTS = aggregator.AggregatorConfig()
 @click.command(short_help="Scan for the outputs of an active or completed quantum graph.", cls=ButlerCommand)
 @click.argument("predicted_graph", required=True)
-@repo_argument(required=True, help="Path to the central butler repository.")
+@repo_argument(required=True, help="Path or alias for the butler repository.")
 @click.option(
     "-o",
     "--output",
@@ -181,9 +181,9 @@ _AGGREGATOR_DEFAULTS = aggregator.AggregatorConfig()
     help="Number of processes to use.",
 )
 @click.option(
-    "--complete/--incomplete",
-    "assume_complete",
-    default=_AGGREGATOR_DEFAULTS.assume_complete,
+    "--incomplete/--complete",
+    "incomplete",
+    default=_AGGREGATOR_DEFAULTS.incomplete,
     help="Whether execution has completed (and failures cannot be retried).",
 )
 @click.option(
@@ -249,6 +249,14 @@ _AGGREGATOR_DEFAULTS = aggregator.AggregatorConfig()
     default=_AGGREGATOR_DEFAULTS.mock_storage_classes,
     help="Enable support for storage classes created by the lsst.pipe.base.tests.mocks package.",
 )
+@click.option(
+    "--promise-ingest-graph/--no-promise-ingest-graph",
+    default=_AGGREGATOR_DEFAULTS.promise_ingest_graph,
+    help=(
+        "Promise to run 'butler ingest-graph' later, allowing aggregate-graph "
+        "to skip metadata/log/config ingestion for now."
+    ),
+)
 def aggregate_graph(predicted_graph: str, repo: str, **kwargs: Any) -> None:
     """Scan for quantum graph's outputs to gather provenance, ingest datasets
     into the central butler repository, and delete datasets that are no
@@ -268,3 +276,33 @@ def aggregate_graph(predicted_graph: str, repo: str, **kwargs: Any) -> None:
         # When this exception is raised, we'll have already logged the relevant
         # traceback from a separate worker.
         raise click.ClickException(str(err)) from None
+@click.command(
+    short_help="Ingest a provenance quantum graph into a butler, finalizing a RUN collection.",
+    cls=ButlerCommand,
+)
+@repo_argument(required=True, help="Path or alias for the butler repository.")
+@click.argument("provenance_graph", required=False)
+@transfer_option(default="move")
+@click.option("--batch-size", default=10000, help="How many datasets to process in each transaction.")
+@click.option(
+    "--output-run",
+    default=None,
+    help=(
+        "Name of the output RUN collection.  Must be provided if the provenance graph is not"
+        " provided (so the graph can be found in the butler)."
+    ),
+)
+def ingest_graph(
+    *,
+    repo: str,
+    provenance_graph: str | None,
+    transfer: str | None,
+    batch_size: int,
+    output_run: str | None,
+) -> None:
+    """Ingest a provenance graph into a butler repository."""
+    from ...quantum_graph.ingest_graph import ingest_graph as ingest_graph_py
+    ingest_graph_py(repo, provenance_graph, transfer=transfer, batch_size=batch_size, output_run=output_run)
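
Editorial sketch (not part of the diff): the two commands are meant to be run in sequence, so it may help to see the argument vectors that the decorators above imply. The helpers below only build ``argv`` lists and run nothing. The function names are hypothetical, and the positional order (``PREDICTED_GRAPH REPO`` for ``aggregate-graph``, ``REPO [PROVENANCE_GRAPH]`` for ``ingest-graph``) is inferred from the decorator order shown in the diff. Only options that appear in the diff are used.

```python
def aggregate_graph_argv(
    predicted_graph,
    repo,
    *,
    output=None,
    incomplete=False,
    promise_ingest_graph=False,
):
    """Build an argv list for ``butler aggregate-graph`` (hypothetical helper)."""
    argv = ["butler", "aggregate-graph", predicted_graph, repo]
    if output is not None:
        argv += ["--output", output]
    if incomplete:
        argv.append("--incomplete")
    if promise_ingest_graph:
        argv.append("--promise-ingest-graph")
    return argv


def ingest_graph_argv(repo, provenance_graph=None, *, output_run=None, batch_size=None):
    """Build an argv list for ``butler ingest-graph`` (hypothetical helper)."""
    if provenance_graph is None and output_run is None:
        # Mirrors the --output-run help text: the run is required when no
        # provenance graph file is given.
        raise ValueError("output_run is required when provenance_graph is omitted")
    argv = ["butler", "ingest-graph", repo]
    if provenance_graph is not None:
        argv.append(provenance_graph)
    if output_run is not None:
        argv += ["--output-run", output_run]
    if batch_size is not None:
        argv += ["--batch-size", str(batch_size)]
    return argv
```

A list built this way could be passed to `subprocess.run` in an environment where the ``butler`` command is installed. The file and repository names used with these helpers are placeholders.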
