PyPI - atlas-schema - Versions diffs - 0.2.2__tar.gz → 0.2.3__tar.gz - Mend

atlas-schema 0.2.2tar.gz → 0.2.3tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/PKG-INFO RENAMED

@@ -1,11 +1,11 @@
 Metadata-Version: 2.4
 Name: atlas-schema
-Version: 0.2.2
+Version: 0.2.3
 Summary: Helper python package for ATLAS Common NTuple Analysis work.
 Project-URL: Homepage, https://github.com/scipp-atlas/atlas-schema
 Project-URL: Bug Tracker, https://github.com/scipp-atlas/atlas-schema/issues
 Project-URL: Discussions, https://github.com/scipp-atlas/atlas-schema/discussions
-Project-URL: Documentation, https://atlas-schema.readthedocs.io/en/v0.2.2/
+Project-URL: Documentation, https://atlas-schema.readthedocs.io/en/v0.2.3/
 Project-URL: Releases, https://github.com/scipp-atlas/atlas-schema/releases
 Project-URL: Release Notes, https://atlas-schema.readthedocs.io/en/latest/history.html
 Author-email: Giordon Stark <kratsg@gmail.com>
@@ -251,7 +251,7 @@ Requires-Dist: tbump>=6.7.0; extra == 'test'
 Requires-Dist: twine; extra == 'test'
 Description-Content-Type: text/markdown
-# atlas-schema v0.2.2
+# atlas-schema v0.2.3
 [![Actions Status][actions-badge]][actions-link]
 [![Documentation Status][rtd-badge]][rtd-link]
@@ -279,6 +279,129 @@ Description-Content-Type: text/markdown
 <!-- prettier-ignore-end -->
+This is the python package containing schemas and helper functions enabling
+analyzers to work with ATLAS datasets (Monte Carlo and Data), using
+[coffea](https://coffea-hep.readthedocs.io/en/latest/).
+## Hello World
+The simplest example is to just get started processing the file as expected:
+```python
+from atlas_schema.schema import NtupleSchema
+from coffea import dataset_tools
+import awkward as ak
+fileset = {"ttbar": {"files": {"path/to/ttbar.root": "tree_name"}}}
+samples, report = dataset_tools.preprocess(fileset)
+def noop(events):
+    return ak.fields(events)
+fields = dataset_tools.apply_to_fileset(noop, samples, schemaclass=NtupleSchema)
+print(fields)
+```
+which produces something similar to
+```python
+{
+    "ttbar": [
+        "dataTakingYear",
+        "mcChannelNumber",
+        "runNumber",
+        "eventNumber",
+        "lumiBlock",
+        "actualInteractionsPerCrossing",
+        "averageInteractionsPerCrossing",
+        "truthjet",
+        "PileupWeight",
+        "RandomRunNumber",
+        "met",
+        "recojet",
+        "truth",
+        "generatorWeight",
+        "beamSpotWeight",
+        "trigPassed",
+        "jvt",
+    ]
+}
+```
+However, a more involved example to apply a selection and fill a histogram looks
+like below:
+```python
+import awkward as ak
+import dask
+import hist.dask as had
+import matplotlib.pyplot as plt
+from coffea import processor
+from coffea.nanoevents import NanoEventsFactory
+from distributed import Client
+from atlas_schema.schema import NtupleSchema
+class MyFirstProcessor(processor.ProcessorABC):
+    def __init__(self):
+        pass
+    def process(self, events):
+        dataset = events.metadata["dataset"]
+        h_ph_pt = (
+            had.Hist.new.StrCat(["all", "pass", "fail"], name="isEM")
+            .Regular(200, 0.0, 2000.0, name="pt", label="$pt_{\gamma}$ [GeV]")
+            .Int64()
+        )
+        cut = ak.all(events.ph.isEM, axis=1)
+        h_ph_pt.fill(isEM="all", pt=ak.firsts(events.ph.pt / 1.0e3))
+        h_ph_pt.fill(isEM="pass", pt=ak.firsts(events[cut].ph.pt / 1.0e3))
+        h_ph_pt.fill(isEM="fail", pt=ak.firsts(events[~cut].ph.pt / 1.0e3))
+        return {
+            dataset: {
+                "entries": ak.num(events, axis=0),
+                "ph_pt": h_ph_pt,
+            }
+        }
+    def postprocess(self, accumulator):
+        pass
+if __name__ == "__main__":
+    client = Client()
+    fname = "ntuple.root"
+    events = NanoEventsFactory.from_root(
+        {fname: "analysis"},
+        schemaclass=NtupleSchema,
+        metadata={"dataset": "700352.Zqqgamma.mc20d.v1"},
+    ).events()
+    p = MyFirstProcessor()
+    out = p.process(events)
+    (computed,) = dask.compute(out)
+    print(computed)
+    fig, ax = plt.subplots()
+    computed["700352.Zqqgamma.mc20d.v1"]["ph_pt"].plot1d(ax=ax)
+    ax.set_xscale("log")
+    ax.legend(title="Photon pT for Zqqgamma")
+    fig.savefig("ph_pt.pdf")
+```
+which produces
+<img src="https://raw.githubusercontent.com/scipp-atlas/atlas-schema/main/docs/_static/img/ph_pt.png" alt="three stacked histograms of photon pT, with each stack corresponding to: no selection, requiring the isEM flag, and inverting the isEM requirement" width="500" style="display: block; margin-left: auto; margin-right: auto;">
+<!-- SPHINX-END -->
 ## Developer Notes
 ### Converting Enums from C++ to Python

atlas_schema-0.2.3/README.md ADDED

@@ -0,0 +1,160 @@
+# atlas-schema v0.2.3
+[![Actions Status][actions-badge]][actions-link]
+[![Documentation Status][rtd-badge]][rtd-link]
+[![PyPI version][pypi-version]][pypi-link]
+[![Conda-Forge][conda-badge]][conda-link]
+[![PyPI platforms][pypi-platforms]][pypi-link]
+[![GitHub Discussion][github-discussions-badge]][github-discussions-link]
+<!-- SPHINX-START -->
+<!-- prettier-ignore-start -->
+[actions-badge]:            https://github.com/scipp-atlas/atlas-schema/workflows/CI/badge.svg
+[actions-link]:             https://github.com/scipp-atlas/atlas-schema/actions
+[conda-badge]:              https://img.shields.io/conda/vn/conda-forge/atlas-schema
+[conda-link]:               https://github.com/conda-forge/atlas-schema-feedstock
+[github-discussions-badge]: https://img.shields.io/static/v1?label=Discussions&message=Ask&color=blue&logo=github
+[github-discussions-link]:  https://github.com/scipp-atlas/atlas-schema/discussions
+[pypi-link]:                https://pypi.org/project/atlas-schema/
+[pypi-platforms]:           https://img.shields.io/pypi/pyversions/atlas-schema
+[pypi-version]:             https://img.shields.io/pypi/v/atlas-schema
+[rtd-badge]:                https://readthedocs.org/projects/atlas-schema/badge/?version=latest
+[rtd-link]:                 https://atlas-schema.readthedocs.io/en/latest/?badge=latest
+<!-- prettier-ignore-end -->
+This is the python package containing schemas and helper functions enabling
+analyzers to work with ATLAS datasets (Monte Carlo and Data), using
+[coffea](https://coffea-hep.readthedocs.io/en/latest/).
+## Hello World
+The simplest example is to just get started processing the file as expected:
+```python
+from atlas_schema.schema import NtupleSchema
+from coffea import dataset_tools
+import awkward as ak
+fileset = {"ttbar": {"files": {"path/to/ttbar.root": "tree_name"}}}
+samples, report = dataset_tools.preprocess(fileset)
+def noop(events):
+    return ak.fields(events)
+fields = dataset_tools.apply_to_fileset(noop, samples, schemaclass=NtupleSchema)
+print(fields)
+```
+which produces something similar to
+```python
+{
+    "ttbar": [
+        "dataTakingYear",
+        "mcChannelNumber",
+        "runNumber",
+        "eventNumber",
+        "lumiBlock",
+        "actualInteractionsPerCrossing",
+        "averageInteractionsPerCrossing",
+        "truthjet",
+        "PileupWeight",
+        "RandomRunNumber",
+        "met",
+        "recojet",
+        "truth",
+        "generatorWeight",
+        "beamSpotWeight",
+        "trigPassed",
+        "jvt",
+    ]
+}
+```
+However, a more involved example to apply a selection and fill a histogram looks
+like below:
+```python
+import awkward as ak
+import dask
+import hist.dask as had
+import matplotlib.pyplot as plt
+from coffea import processor
+from coffea.nanoevents import NanoEventsFactory
+from distributed import Client
+from atlas_schema.schema import NtupleSchema
+class MyFirstProcessor(processor.ProcessorABC):
+    def __init__(self):
+        pass
+    def process(self, events):
+        dataset = events.metadata["dataset"]
+        h_ph_pt = (
+            had.Hist.new.StrCat(["all", "pass", "fail"], name="isEM")
+            .Regular(200, 0.0, 2000.0, name="pt", label="$pt_{\gamma}$ [GeV]")
+            .Int64()
+        )
+        cut = ak.all(events.ph.isEM, axis=1)
+        h_ph_pt.fill(isEM="all", pt=ak.firsts(events.ph.pt / 1.0e3))
+        h_ph_pt.fill(isEM="pass", pt=ak.firsts(events[cut].ph.pt / 1.0e3))
+        h_ph_pt.fill(isEM="fail", pt=ak.firsts(events[~cut].ph.pt / 1.0e3))
+        return {
+            dataset: {
+                "entries": ak.num(events, axis=0),
+                "ph_pt": h_ph_pt,
+            }
+        }
+    def postprocess(self, accumulator):
+        pass
+if __name__ == "__main__":
+    client = Client()
+    fname = "ntuple.root"
+    events = NanoEventsFactory.from_root(
+        {fname: "analysis"},
+        schemaclass=NtupleSchema,
+        metadata={"dataset": "700352.Zqqgamma.mc20d.v1"},
+    ).events()
+    p = MyFirstProcessor()
+    out = p.process(events)
+    (computed,) = dask.compute(out)
+    print(computed)
+    fig, ax = plt.subplots()
+    computed["700352.Zqqgamma.mc20d.v1"]["ph_pt"].plot1d(ax=ax)
+    ax.set_xscale("log")
+    ax.legend(title="Photon pT for Zqqgamma")
+    fig.savefig("ph_pt.pdf")
+```
+which produces
+<img src="https://raw.githubusercontent.com/scipp-atlas/atlas-schema/main/docs/_static/img/ph_pt.png" alt="three stacked histograms of photon pT, with each stack corresponding to: no selection, requiring the isEM flag, and inverting the isEM requirement" width="500" style="display: block; margin-left: auto; margin-right: auto;">
+<!-- SPHINX-END -->
+## Developer Notes
+### Converting Enums from C++ to Python
+This useful `vim` substitution helps:
+```
+%s/    \([A-Za-z]\+\)\s\+=  \(\d\+\),\?/    \1: Annotated[int, "\1"] = \2
+```

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/pyproject.toml RENAMED

@@ -60,7 +60,7 @@ docs = [
 Homepage = "https://github.com/scipp-atlas/atlas-schema"
 "Bug Tracker" = "https://github.com/scipp-atlas/atlas-schema/issues"
 Discussions = "https://github.com/scipp-atlas/atlas-schema/discussions"
-Documentation = "https://atlas-schema.readthedocs.io/en/v0.2.2/"
+Documentation = "https://atlas-schema.readthedocs.io/en/v0.2.3/"
 Releases = "https://github.com/scipp-atlas/atlas-schema/releases"
 "Release Notes" = "https://atlas-schema.readthedocs.io/en/latest/history.html"
@@ -115,6 +115,7 @@ filterwarnings = [
 ]
 log_cli_level = "INFO"
 testpaths = [
+  "src",
   "tests",
   "docs",
 ]
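
Adding `src` to `testpaths` makes pytest look for tests under the source tree as well. One plausible reason, which is an assumption here since the relevant pytest options are not visible in this hunk, is to collect docstring examples such as the `>>>` example added to `isin` in `utils.py`. The sketch below uses only the standard-library `doctest` module to show how such an example is checked; `double` is a hypothetical helper and not part of atlas-schema.

```python
import doctest


def double(x):
    """
    Hypothetical helper, used only to show the doctest format.

    >>> double(21)
    42
    """
    return 2 * x


# Parse the examples out of the docstring and run them
parser = doctest.DocTestParser()
test = parser.get_doctest(double.__doc__, {"double": double}, "double", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.attempted, results.failed)
```

If pytest is the collector, doctests in modules are only picked up when doctest collection is enabled in its configuration, so the `testpaths` change alone may not be sufficient.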

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/src/atlas_schema/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.2.2'
-__version_tuple__ = version_tuple = (0, 2, 2)
+__version__ = version = '0.2.3'
+__version_tuple__ = version_tuple = (0, 2, 3)

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/src/atlas_schema/schema.py RENAMED

@@ -49,7 +49,7 @@ class NtupleSchema(BaseSchema):  # type: ignore[misc]
     }
     # These are stored as length-1 vectors unnecessarily
-    singletons: ClassVar[list[str]] = []
+    singletons: ClassVar[set[str]] = set()
     docstrings: ClassVar[dict[str, str]] = {
         "charge": "charge",
@@ -127,8 +127,8 @@ class NtupleSchema(BaseSchema):  # type: ignore[misc]
         output = {}
-        # first, register the event-level stuff directly
-        for name in self.event_ids:
+        # first, register singletons (event-level, others)
+        for name in {*self.event_ids, *self.singletons}:
             if name in missing_event_ids:
                 continue
             output[name] = branch_forms[name]
@@ -163,7 +163,17 @@ class NtupleSchema(BaseSchema):  # type: ignore[misc]
                 }
             )
-            output[name] = zip_forms(content, name, record_name=mixin)
+            if not used and not content:
+                warnings.warn(
+                    f"I identified a branch that likely does not have any leaves: '{name}'. I will treat this as a 'singleton'. To suppress this warning next time, please define your singletons explicitly.",
+                    RuntimeWarning,
+                    stacklevel=2,
+                )
+                self.singletons.add(name)
+                output[name] = branch_forms[name]
+            else:
+                output[name] = zip_forms(content, name, record_name=mixin)
             output[name].setdefault("parameters", {})
             output[name]["parameters"].update({"collection_name": name})
@@ -174,6 +184,9 @@ class NtupleSchema(BaseSchema):  # type: ignore[misc]
             elif output[name]["class"] == "RecordArray":
                 parameters = output[name]["fields"]
                 contents = output[name]["contents"]
+            elif output[name]["class"] == "NumpyArray":
+                # these are singletons that we just pass through
+                continue
             else:
                 msg = f"Unhandled class {output[name]['class']}"
                 raise RuntimeError(msg)
@@ -191,11 +204,6 @@ class NtupleSchema(BaseSchema):  # type: ignore[misc]
                     ),
                 )
-            if name in self.singletons:
-                # flatten! this 'promotes' the content of an inner dimension
-                # upwards, effectively hiding one nested dimension
-                output[name] = output[name]["content"]
         return output.keys(), output.values()
     @classmethod
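
The hunks above change `singletons` from a class-level list to a class-level set and add a fallback that calls `self.singletons.add(name)` for a branch without leaves. A minimal sketch of what that pattern implies in plain Python follows. `SketchSchema` and `register` are hypothetical stand-ins, not the real `NtupleSchema` code, and the condition is simplified to "no leaves". The point is only that mutating a `ClassVar` set through `self` is visible to every instance of the class.

```python
import warnings
from typing import ClassVar


class SketchSchema:
    """Hypothetical stand-in for a schema class; not the real NtupleSchema."""

    # Class-level set, mirroring `singletons: ClassVar[set[str]] = set()`
    singletons: ClassVar[set[str]] = set()

    def register(self, name: str, leaves: list[str]) -> str:
        # A branch with no leaves falls back to being treated as a singleton
        if not leaves:
            warnings.warn(
                f"'{name}' has no leaves; treating it as a singleton.",
                RuntimeWarning,
                stacklevel=2,
            )
            # This mutates the set shared by the class and all its instances
            self.singletons.add(name)
            return "singleton"
        return "collection"


first, second = SketchSchema(), SketchSchema()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kind = first.register("jvt", [])

# The name registered through `first` is also visible through `second`
print(kind, "jvt" in second.singletons, len(caught))
```

Under this reading, a subclass that wants its own list of singletons would assign a fresh set in the subclass body rather than relying on the inherited one.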

atlas_schema-0.2.3/src/atlas_schema/utils.py ADDED

@@ -0,0 +1,49 @@
+from __future__ import annotations
+from enum import Enum
+from typing import TypeVar, Union, cast
+import awkward as ak
+import dask_awkward as dak
+Array = TypeVar("Array", bound=Union[dak.Array, ak.Array])
+_E = TypeVar("_E", bound=Enum)
+def isin(element: Array, test_elements: dak.Array | ak.Array, axis: int = -1) -> Array:
+    """
+    Find test_elements in element. Similar in API as :func:`numpy.isin`.
+    Calculates `element in test_elements`, broadcasting over *element elements only*. Returns a boolean array of the same shape as *element* that is `True` where an element of *element* is in *test_elements* and `False` otherwise.
+    This works by first transforming *test_elements* to an array with one more
+    dimension than the *element*, placing the *test_elements* at *axis*, and then doing a
+    comparison.
+    Args:
+        element (dak.Array or ak.Array): input array of values.
+        test_elements (dak.Array or ak.Array): one-dimensional set of values against which to test each value of *element*.
+        axis (int): the axis along which the comparison is performed
+    Returns:
+        dak.Array or ak.Array: result of comparison for test_elements in *element*
+    Example:
+        >>> import awkward as ak
+        >>> import atlas_schema as ats
+        >>> truth_origins = ak.Array([[1, 2, 3], [4], [5, 6, 7], [1]])
+        >>> prompt_origins = ak.Array([1, 2, 7])
+        >>> ats.isin(truth_origins, prompt_origins).to_list()
+        [[True, True, False], [False], [False, False, True], [True]]
+    """
+    assert test_elements.ndim == 1, "test_elements must be one-dimensional"
+    assert axis >= -1, "axis must be -1 or positive-valued"
+    assert axis < element.ndim + 1, "axis too large for the element"
+    # First, build up the transformation, with slice(None) indicating where to stick the test_elements
+    reshaper: list[None | slice] = [None] * element.ndim
+    axis = element.ndim if axis == -1 else axis
+    reshaper.insert(axis, slice(None))
+    # Note: reshaper needs to be a tuple for indexing purposes
+    return cast(Array, ak.any(element == test_elements[tuple(reshaper)], axis=-1))
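
To make the new `isin` docstring easier to check by hand, here is a dependency-free sketch with two parts. `build_reshaper` reproduces the index-tuple construction from the function body above. `isin_reference` is a hypothetical nested-list reference for the default `axis=-1` case only. Neither is part of atlas-schema, and the reference does not use awkward arrays or broadcasting.

```python
def build_reshaper(ndim, axis=-1):
    """Index tuple that adds `ndim` new axes and keeps the values at `axis`."""
    reshaper = [None] * ndim
    axis = ndim if axis == -1 else axis
    reshaper.insert(axis, slice(None))
    return tuple(reshaper)


def isin_reference(element, test_elements):
    """Nested-list membership test that preserves the shape of `element`."""
    lookup = set(test_elements)

    def walk(node):
        # Recurse through nested lists; test leaves for membership
        if isinstance(node, list):
            return [walk(child) for child in node]
        return node in lookup

    return walk(element)


# Same inputs as the docstring example in the diff
truth_origins = [[1, 2, 3], [4], [5, 6, 7], [1]]
prompt_origins = [1, 2, 7]

reshaper = build_reshaper(ndim=2)
result = isin_reference(truth_origins, prompt_origins)
print(reshaper)
print(result)
```

For a two-dimensional `element`, the index tuple is `(None, None, slice(None))`, so the one-dimensional `test_elements` ends up on a new innermost axis, which is the axis that `ak.any(..., axis=-1)` then reduces.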

atlas_schema-0.2.2/README.md DELETED

@@ -1,37 +0,0 @@
-# atlas-schema v0.2.2
-[![Actions Status][actions-badge]][actions-link]
-[![Documentation Status][rtd-badge]][rtd-link]
-[![PyPI version][pypi-version]][pypi-link]
-[![Conda-Forge][conda-badge]][conda-link]
-[![PyPI platforms][pypi-platforms]][pypi-link]
-[![GitHub Discussion][github-discussions-badge]][github-discussions-link]
-<!-- SPHINX-START -->
-<!-- prettier-ignore-start -->
-[actions-badge]:            https://github.com/scipp-atlas/atlas-schema/workflows/CI/badge.svg
-[actions-link]:             https://github.com/scipp-atlas/atlas-schema/actions
-[conda-badge]:              https://img.shields.io/conda/vn/conda-forge/atlas-schema
-[conda-link]:               https://github.com/conda-forge/atlas-schema-feedstock
-[github-discussions-badge]: https://img.shields.io/static/v1?label=Discussions&message=Ask&color=blue&logo=github
-[github-discussions-link]:  https://github.com/scipp-atlas/atlas-schema/discussions
-[pypi-link]:                https://pypi.org/project/atlas-schema/
-[pypi-platforms]:           https://img.shields.io/pypi/pyversions/atlas-schema
-[pypi-version]:             https://img.shields.io/pypi/v/atlas-schema
-[rtd-badge]:                https://readthedocs.org/projects/atlas-schema/badge/?version=latest
-[rtd-link]:                 https://atlas-schema.readthedocs.io/en/latest/?badge=latest
-<!-- prettier-ignore-end -->
-## Developer Notes
-### Converting Enums from C++ to Python
-This useful `vim` substitution helps:
-```
-%s/    \([A-Za-z]\+\)\s\+=  \(\d\+\),\?/    \1: Annotated[int, "\1"] = \2
-```

atlas_schema-0.2.2/src/atlas_schema/utils.py DELETED

@@ -1,39 +0,0 @@
-from __future__ import annotations
-from enum import Enum
-from typing import TypeVar, Union, cast
-import awkward as ak
-import dask_awkward as dak
-Array = TypeVar("Array", bound=Union[dak.Array, ak.Array])
-_E = TypeVar("_E", bound=Enum)
-def isin(haystack: Array, needles: dak.Array | ak.Array, axis: int = -1) -> Array:
-    """
-    Find needles in haystack.
-    This works by first transforming needles to an array with one more
-    dimension than the haystack, placing the needles at axis, and then doing a
-    comparison.
-    Args:
-        haystack (dak.Array or ak.Array): haystack of values.
-        needles (dak.Array or ak.Array): one-dimensional set of needles to find in haystack.
-        axis (int): the axis along which the comparison is performed
-    Returns:
-        dak.Array or ak.Array: result of comparison for needles in haystack
-    """
-    assert needles.ndim == 1, "Needles must be one-dimensional"
-    assert axis >= -1, "axis must be -1 or positive-valued"
-    assert axis < haystack.ndim + 1, "axis too large for the haystack"
-    # First, build up the transformation, with slice(None) indicating where to stick the needles
-    reshaper: list[None | slice] = [None] * haystack.ndim
-    axis = haystack.ndim if axis == -1 else axis
-    reshaper.insert(axis, slice(None))
-    # Note: reshaper needs to be a tuple for indexing purposes
-    return cast(Array, ak.any(haystack == needles[tuple(reshaper)], axis=-1))

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/.gitignore RENAMED

File without changes

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/LICENSE RENAMED

File without changes

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/src/atlas_schema/__init__.py RENAMED

File without changes

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/src/atlas_schema/_version.pyi RENAMED

File without changes

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/src/atlas_schema/enums.py RENAMED

File without changes

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/src/atlas_schema/methods.py RENAMED

File without changes

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/src/atlas_schema/py.typed RENAMED

File without changes

{atlas_schema-0.2.2 → atlas_schema-0.2.3}/src/atlas_schema/typing_compat.py RENAMED

File without changes
