PyPI - embedl-deploy - Versions diffs - 0.3.0__tar.gz → 0.4.1__tar.gz - Mend

embedl-deploy 0.3.0tar.gz → 0.4.1tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (68)

{embedl_deploy-0.3.0 → embedl_deploy-0.4.1}/MANIFEST.in RENAMED

@@ -3,6 +3,8 @@ graft src
 include LICENSE
 include NOTICE
 include README.md
+prune src/embedl_deploy/tensorrt
+prune src/embedl_deploy/_internal/tensorrt
 global-exclude CLAUDE.md
 global-exclude *.pyc
 global-exclude __pycache__

{embedl_deploy-0.3.0/src/embedl_deploy.egg-info → embedl_deploy-0.4.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: embedl-deploy
-Version: 0.3.0
+Version: 0.4.1
 Summary: Python package to make AI models deployment-ready for any hardware.
 Author-email: Embedl AB <support@embedl.com>
 Project-URL: Homepage, https://www.embedl.com/
@@ -15,7 +15,7 @@ License-File: LICENSE
 License-File: NOTICE
 Requires-Dist: torch
 Provides-Extra: tensorrt
-Requires-Dist: tensorrt; extra == "tensorrt"
+Requires-Dist: embedl-deploy-tensorrt; extra == "tensorrt"
 Dynamic: license-file
 # embedl-deploy
@@ -55,16 +55,16 @@ hardware target ensuring correct quantization and compilation.
 ## Supported Backends
-| Backend             | Status      |
-|---------------------|-------------|
-| NVIDIA TensorRT     | Supported   |
+| Backend                 | Status      |
+|-------------------------|-------------|
+| NVIDIA TensorRT (v10.3) | Supported   |
-Contact us for other backends.
+Contact Embedl for other backends.
 ## Installation
 ```bash
-pip install embedl-deploy
+pip install "embedl-deploy[tensorrt]"
 ```
 Note that you may need to also install `onnx` and `onnx-simplifier` to export
 and get the exported model compiled with TensorRT if using ONNX as an
@@ -86,6 +86,9 @@ model = Model().eval()
 example_input = torch.randn(1, 3, 224, 224)
 # 2. Transform — fuse and optimize for TensorRT in one call
+# For more compatibility you can trace your model with torch.export.export
+# as follows:
+# model = torch.export.export(model, (example_input,)).module()
 res = transform(model, patterns=TENSORRT_PATTERNS)
 print("Model\n", res.model.print_readable())
 print("Matches", "\n".join([str(match) for match in res.matches]))
@@ -112,28 +115,54 @@ torch.onnx.export(
 qat_model = quantized_model.train()
 # Freeze BatchNorm, or apply other QAT utilities as needed
 # train(qat_model)
+```
+### Compile
+Compilation can be done with TensorRT's trtexec tool, which can take the ONNX
+model and compile it for inference. The exported layer info and profile can
+be used for debugging, optimization and visualization.
+Note that the ONNX model might need to be simplified with onnx-simplifier
+before trtexec can compile it. Dynamo-exported models may have compilation
+issues, so exporting with dynamo=False is recommended.
+```bash
+onnxsim model.onnx model.onnx
+/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --fp16 --int8 --useCudaGraph
+```
+Optionally you can get the layer profile with the following flags:
+```
+--exportLayerInfo=layer_info.json
+--exportProfile=profile.json
+--profilingVerbosity=detailed
+```
-# Compile
-# -------
-# Compilation can be done with TensorRT's trtexec tool, which can take the ONNX
-# model and compile it for inference. The exported layer info and profile can
-# be used for debugging, optimization and visualization.
-#
-# Note: that the ONNX model might need to be simplified with onnx-simplifier to
-# make trtexec compile it. Dynamo exported models may have compilation issues,
-# so it's recommended to export with dynamo=False.
-#
-# We are working on a Aten-based export path that should be more robust and
-# support more models in the future.
-# >> onnxsim model.onnx model.onnx
-# >> trtexec \
-#       --onnx=model.onnx \
-#       --exportLayerInfo=layer_info.json \
-#       --exportProfile=profile.json \
-#       --profilingVerbosity=detailed
-# More benchmarking scripts can be found in the examples/ directory
+## Mixed Precision
+To keep a specific layer in higher precision while quantizing the rest to INT8,
+pass its `nn.Conv2d` instance to `ModulesToSkip` after `transform`. Note that
+`torch.fx.GraphModule` deep-copies submodules during tracing, so you must take
+the reference **from the fused graph**, not from the original model:
+```python
+from embedl_deploy.quantize import quantize, QuantConfig, ModulesToSkip
+res = transform(model, patterns=TENSORRT_PATTERNS)
+# Grab the conv instance from the fused graph (not from the original model)
+first_conv = res.model.FusedConvBNActMaxPool_0.conv
+config = QuantConfig(
+    skip=ModulesToSkip(
+        stub={first_conv},    # disables input activation quantization
+        weight={first_conv},  # disables weight fake-quantization
+    )
+)
+quantized_model = quantize(
+    res.model, (example_input,), config=config, forward_loop=calibration_loop
+)
 ```
 ## Design Principles
@@ -150,10 +179,13 @@ qat_model = quantized_model.train()
    `transform()` is a convenience for the common case where you want
    everything applied.
-3. **FX-graph-based.**
-   All graph analysis and surgery uses `torch.fx`. Models are traced once
-   and manipulated as `fx.GraphModule` objects. Support for Aten graphs
-   produced by `torch.export.export` is planned for the future.
+3. **Graph-based models (torch.export.export and symbolic traced).**
+   All graph analysis and surgery uses traced graphs. Models are traced once
+   and manipulated as `fx.GraphModule` objects, with support for tracing via
+   both `torch.fx` (symbolic) and `torch.export.export` (Aten). Aten graphs
+   are supported through Aten recomposition patterns, which compose Aten
+   operations into equivalent `torch.nn` modules before conversions and
+   fusions are applied.
 ## Support
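
The commented `torch.export.export` call in the quick-start hunk passes the example inputs as a tuple of positional arguments. As a plain-Python reminder (no PyTorch needed; `example_input` below is only a stand-in list, not a tensor), parentheses alone do not create a tuple, while a trailing comma does:

```python
# Stand-in for the example tensor; any object behaves the same way here.
example_input = [1.0, 2.0, 3.0]

not_a_tuple = (example_input)   # parentheses only group: this is the list itself
one_tuple = (example_input,)    # the trailing comma makes a one-element tuple

print(type(not_a_tuple).__name__)  # list
print(type(one_tuple).__name__)    # tuple
print(len(one_tuple))              # 1
```

A one-input model would therefore be exported with `(example_input,)` as its argument tuple.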

{embedl_deploy-0.3.0 → embedl_deploy-0.4.1}/README.md RENAMED

@@ -35,16 +35,16 @@ hardware target ensuring correct quantization and compilation.
 ## Supported Backends
-| Backend             | Status      |
-|---------------------|-------------|
-| NVIDIA TensorRT     | Supported   |
+| Backend                 | Status      |
+|-------------------------|-------------|
+| NVIDIA TensorRT (v10.3) | Supported   |
-Contact us for other backends.
+Contact Embedl for other backends.
 ## Installation
 ```bash
-pip install embedl-deploy
+pip install "embedl-deploy[tensorrt]"
 ```
 Note that you may need to also install `onnx` and `onnx-simplifier` to export
 and get the exported model compiled with TensorRT if using ONNX as an
@@ -66,6 +66,9 @@ model = Model().eval()
 example_input = torch.randn(1, 3, 224, 224)
 # 2. Transform — fuse and optimize for TensorRT in one call
+# For more compatibility you can trace your model with torch.export.export
+# as follows:
+# model = torch.export.export(model, (example_input,)).module()
 res = transform(model, patterns=TENSORRT_PATTERNS)
 print("Model\n", res.model.print_readable())
 print("Matches", "\n".join([str(match) for match in res.matches]))
@@ -92,28 +95,54 @@ torch.onnx.export(
 qat_model = quantized_model.train()
 # Freeze BatchNorm, or apply other QAT utilities as needed
 # train(qat_model)
+```
+### Compile
+Compilation can be done with TensorRT's trtexec tool, which can take the ONNX
+model and compile it for inference. The exported layer info and profile can
+be used for debugging, optimization and visualization.
+Note that the ONNX model might need to be simplified with onnx-simplifier
+before trtexec can compile it. Dynamo-exported models may have compilation
+issues, so exporting with dynamo=False is recommended.
+```bash
+onnxsim model.onnx model.onnx
+/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --fp16 --int8 --useCudaGraph
+```
+Optionally you can get the layer profile with the following flags:
+```
+--exportLayerInfo=layer_info.json
+--exportProfile=profile.json
+--profilingVerbosity=detailed
+```
-# Compile
-# -------
-# Compilation can be done with TensorRT's trtexec tool, which can take the ONNX
-# model and compile it for inference. The exported layer info and profile can
-# be used for debugging, optimization and visualization.
-#
-# Note: that the ONNX model might need to be simplified with onnx-simplifier to
-# make trtexec compile it. Dynamo exported models may have compilation issues,
-# so it's recommended to export with dynamo=False.
-#
-# We are working on a Aten-based export path that should be more robust and
-# support more models in the future.
-# >> onnxsim model.onnx model.onnx
-# >> trtexec \
-#       --onnx=model.onnx \
-#       --exportLayerInfo=layer_info.json \
-#       --exportProfile=profile.json \
-#       --profilingVerbosity=detailed
-# More benchmarking scripts can be found in the examples/ directory
+## Mixed Precision
+To keep a specific layer in higher precision while quantizing the rest to INT8,
+pass its `nn.Conv2d` instance to `ModulesToSkip` after `transform`. Note that
+`torch.fx.GraphModule` deep-copies submodules during tracing, so you must take
+the reference **from the fused graph**, not from the original model:
+```python
+from embedl_deploy.quantize import quantize, QuantConfig, ModulesToSkip
+res = transform(model, patterns=TENSORRT_PATTERNS)
+# Grab the conv instance from the fused graph (not from the original model)
+first_conv = res.model.FusedConvBNActMaxPool_0.conv
+config = QuantConfig(
+    skip=ModulesToSkip(
+        stub={first_conv},    # disables input activation quantization
+        weight={first_conv},  # disables weight fake-quantization
+    )
+)
+quantized_model = quantize(
+    res.model, (example_input,), config=config, forward_loop=calibration_loop
+)
 ```
 ## Design Principles
@@ -130,10 +159,13 @@ qat_model = quantized_model.train()
    `transform()` is a convenience for the common case where you want
    everything applied.
-3. **FX-graph-based.**
-   All graph analysis and surgery uses `torch.fx`. Models are traced once
-   and manipulated as `fx.GraphModule` objects. Support for Aten graphs
-   produced by `torch.export.export` is planned for the future.
+3. **Graph-based models (torch.export.export and symbolic traced).**
+   All graph analysis and surgery uses traced graphs. Models are traced once
+   and manipulated as `fx.GraphModule` objects, with support for tracing via
+   both `torch.fx` (symbolic) and `torch.export.export` (Aten). Aten graphs
+   are supported through Aten recomposition patterns, which compose Aten
+   operations into equivalent `torch.nn` modules before conversions and
+   fusions are applied.
 ## Support
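
The Mixed Precision section added in this README warns that tracing deep-copies submodules, so the module placed in `ModulesToSkip` has to come from the fused graph. The sketch below illustrates why, using only the standard library. `Conv` and `Model` are hypothetical stand-ins, `copy.deepcopy` stands in for the copy made during tracing, and the sketch assumes the skip sets compare modules by object identity:

```python
import copy


class Conv:
    """Hypothetical stand-in for a conv layer; compared and hashed by identity."""


class Model:
    def __init__(self):
        self.conv = Conv()


original = Model()
traced = copy.deepcopy(original)          # stands in for the traced/fused copy

skip_from_original = {original.conv}      # reference taken from the original model
skip_from_traced = {traced.conv}          # reference taken from the traced copy

print(traced.conv in skip_from_original)  # False: the copy is a different object
print(traced.conv in skip_from_traced)    # True
```

A reference taken from the original model never matches the copied layer, so under this assumption the skip request would not apply to the layer that actually runs.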

{embedl_deploy-0.3.0 → embedl_deploy-0.4.1}/pyproject.toml RENAMED

@@ -27,16 +27,11 @@ dynamic = ["version"]
 dependencies = ["torch"]
 [project.optional-dependencies]
-tensorrt = ["tensorrt"]
+tensorrt = ["embedl-deploy-tensorrt"]
 [project.urls]
 Homepage = "https://www.embedl.com/"
-[tool.black]
-line-length = 79
-target-version = ["py310"]
-skip-string-normalization = true
 [tool.coverage.html]
 show_contexts = true
@@ -100,19 +95,72 @@ line-length = 79
 quote-style = "preserve"
 [tool.ruff.lint]
-select = [
-    # isort
-    "I",
-    # Use `from X import Y` instead of `import X.Y as Y`
-    "PLR0402",
+select = ["ALL"]
+ignore = [
+    # Dynamic attributes on fx.Node require string-based access for mypy
+    "B009", "B010",
+    # Conflicts with ruff format
+    "COM812",
+    # Descriptive exception messages preferred
+    "EM", "TRY003",
+    # Allow long lines for URLs, Sphinx cross-references, and imports
+    "E501",
+    # Too many false positives
+    "ERA001",
+    # Common in PyTorch-style APIs
+    "FBT",
+    # TODOs are fine
+    "FIX002",
+    # PyTorch naming conventions (N, C, H, W; import F)
+    "N806", "N812",
+    # Allow magic value comparisons
+    "PLR2004",
+    # Intermediate variables before return aid readability
+    "RET504",
+    # Conflicts with quote-style = "preserve"
+    "Q000",
+    # Intentional Unicode in docstrings and comments
+    "RUF002", "RUF003",
+    # Explicit if/return True/return False is clearer for predicate functions
+    "SIM103",
+    # Type-only imports are fine as regular imports
+    "TC001",
+    # Non-cryptographic random is expected in ML code
+    "S311",
+    # Prefer unquoted type expressions in cast()
+    "TC006",
+    # Clashes with dataclass and nn.Module patterns
+    "RUF012",
+    # Too prescriptive about TODO format
+    "TD",
+    # D203/D211 and D212/D213 are mutually exclusive pairs
+    "D203", "D213",
 ]
+[tool.ruff.lint.per-file-ignores]
+"src/**/*.py" = ["S101"]
+"tests/**/*.py" = ["ANN", "D103", "S101"]
+"docs/**/*.py" = ["ANN", "E402", "INP001", "S", "T201"]
+"examples/**/*.py" = ["INP001", "T201"]
+".claude/**/*.py" = ["ALL"]
+[tool.ruff.lint.pylint]
+max-args = 8
 [tool.mypy]
 ignore_missing_imports = false
 strict = true
 [[tool.mypy.overrides]]
-module = ["torch.*", "pytest.*"]
+module = [
+    "torch.*",
+    "pytest.*",
+    "torchvision.*",
+    "tensorrt.*",
+    "onnx.*",
+    "onnxsim.*",
+    "embedl_studio.*",
+]
 ignore_missing_imports = true
 [[tool.mypy.overrides]]
@@ -125,5 +173,8 @@ disable_error_code = ["misc", "no-any-return"]
 module = ["embedl_deploy._internal.tensorrt.modules.*"]
 disable_error_code = ["no-any-return"]
+[tool.setuptools.package-data]
+embedl_deploy = ["py.typed"]
 [tool.setuptools.dynamic]
 version = { attr = "embedl_deploy.version.public.PUBLIC_VERSION" }
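
With the `tensorrt` extra now resolving to the separate `embedl-deploy-tensorrt` distribution, and the TensorRT sources pruned from this sdist, code that depends on the backend may want to check for it before use. A minimal sketch follows, assuming the backend still installs under the module path `embedl_deploy.tensorrt`. The diff does not confirm that path, and `has_module` is a hypothetical helper, not part of the package:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be located (parent packages get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


# Assumed module path; adjust to whatever the extra actually installs.
if has_module("embedl_deploy.tensorrt"):
    print("TensorRT support is installed")
else:
    print('TensorRT support missing; try: pip install "embedl-deploy[tensorrt]"')
```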

{embedl_deploy-0.3.0 → embedl_deploy-0.4.1}/src/embedl_deploy/_internal/core/backend.py RENAMED

@@ -22,7 +22,7 @@ class Backend:
     fusion_patterns: Sequence[Pattern]
     #: SmoothQuant preparation patterns.
     smooth_patterns: Sequence[Pattern]
-    #: Q/DQ stub insertion patterns for quantisation.
+    #: Q/DQ stub insertion patterns for quantization.
     quantized_patterns: Sequence[Pattern]
@@ -120,6 +120,6 @@ def set_backend(name: str) -> None:
     if name not in backends:
         available = ", ".join(sorted(backends)) or "(none)"
         raise ValueError(
-            f"Backend {name!r} not found. " f"Available backends: {available}"
+            f"Backend {name!r} not found. Available backends: {available}"
         )
     _BackendState.backend = backends[name]

{embedl_deploy-0.3.0 → embedl_deploy-0.4.1}/src/embedl_deploy/_internal/core/modules.py RENAMED

@@ -63,6 +63,10 @@ class FusedModule(nn.Module, ABC):
         self.input_quant_stubs: dict[int, QuantStub] = {
             idx: QuantStub({self}) for idx in self.inputs_to_quantize
         }
+        #: Whether this module has been surrounded with input
+        #: ``QuantStub`` entries by
+        #: :class:`~embedl_deploy._internal.tensorrt.patterns.quantizations.SurroundWithQuantStubsPattern`.
+        self.surrounded: bool = False
 class _LeafTracer(fx.Tracer):

embedl_deploy-0.4.1/src/embedl_deploy/_internal/core/pattern.py ADDED

@@ -0,0 +1,204 @@
+# Copyright (C) 2026 Embedl AB
+"""Core abstractions: Pattern base class and PatternMatch dataclass.
+Every fusion, conversion, and quantization rule is a
+:class:`~embedl_deploy._internal.core.pattern.Pattern` subclass.  The two
+methods — :meth:`~embedl_deploy._internal.core.pattern.Pattern.match` and
+:meth:`~embedl_deploy._internal.core.pattern.Pattern.replace` — encapsulate
+what to look for and how to rewrite the graph.
+"""
+from dataclasses import dataclass
+from torch import fx, nn
+from embedl_deploy._internal.core.tree.match import match_tree
+from embedl_deploy._internal.core.tree.replace import replace_tree
+from embedl_deploy._internal.core.tree.types import (
+    Graft,
+    Replacement,
+    Tree,
+    TreeMatch,
+    Wildcard,
+)
+from embedl_deploy._internal.core.tree.utils import get_module
+def _collect_modules(tree_match: TreeMatch) -> list[nn.Module | None]:
+    """Resolve matched modules from a tree match.
+    Walks nested branches first (in input order), then
+    trunk nodes.  For a
+    :class:`~embedl_deploy._internal.core.tree.types.Fork`
+    tree this means the fork-input branches precede the output
+    trunk, so the resulting list matches a constructor signature
+    like
+    ``FusedModule(branch0_mod, branch1_mod, …, output_mod)``.
+    :class:`~embedl_deploy._internal.core.tree.types.Wildcard`
+    entries with ``"?"`` quantifier that matched nothing
+    contribute ``None``.
+    :raises TypeError:
+        If a matched node is not a ``call_module`` node.
+    """
+    modules: list[nn.Module | None] = []
+    for nested in tree_match.nested:
+        modules.extend(_collect_modules(nested))
+    for entry in tree_match.trunk_nodes:
+        if isinstance(entry, Wildcard):
+            if entry.quantifier != "?":
+                raise TypeError(
+                    f"wildcard with quantifier"
+                    f" {entry.quantifier!r} is not"
+                    f" supported — graft only supports"
+                    f" '?' wildcards"
+                )
+            node = entry.nodes[0] if entry.nodes else None
+        else:
+            node = entry
+        if node is None:
+            modules.append(None)
+        else:
+            mod = get_module(node)
+            if mod is None:
+                raise TypeError(
+                    f"node {node.name!r} is not a call_module "
+                    f"node — graft only works with "
+                    f"module-only trees"
+                )
+            modules.append(mod)
+    return modules
+def _get_replacements(
+    graft: Graft,
+    tree_match: TreeMatch,
+) -> list[Replacement]:
+    """Build the replacement list from a graft specification."""
+    if isinstance(graft, tuple):
+        replacements: list[Replacement] = []
+        for rep_maker in graft:
+            replacements.extend(rep_maker(tree_match))
+        return replacements
+    modules = _collect_modules(tree_match)
+    try:
+        return [graft(*modules)]
+    except TypeError as exc:
+        raise TypeError(
+            f"{graft.__name__}() got"
+            f" {len(modules)} modules from"
+            f" the tree match — check that"
+            f" the tree shape matches the"
+            f" constructor signature"
+        ) from exc
+class Pattern:
+    """A graph transformation rule: find a sub-graph and replace it.
+    The default :meth:`match` delegates to
+    :func:`~embedl_deploy._internal.core.tree.match.match_tree` using the
+    class's :attr:`tree`.  The default :meth:`replace` constructs
+    replacements from :attr:`graft` and delegates to
+    :func:`~embedl_deploy._internal.core.tree.replace.replace_tree`.
+    Subclasses override either method when they need custom logic
+    (pre/post side-effects, post-match filtering, etc.).
+    Patterns with
+    :attr:`~embedl_deploy._internal.core.pattern.Pattern.is_conversion` set to
+    ``True`` are applied in a first pass to rewrite graph topology before
+    fusion patterns are matched.
+    """
+    tree: Tree | None = None
+    """The pattern topology to match, if using tree-based matching."""
+    graft: Graft | None = None
+    """The factories to make replacements for each matched tree, if used."""
+    is_conversion: bool = False
+    """If ``True``, this pattern is a structural conversion that must
+    be applied before fusion matching."""
+    symbolic_trace_only: bool = False
+    """If ``True``, this pattern removes nodes that are artifacts of
+    ``symbolic_trace``. This pattern has no effect on graphs exported with
+    ``torch.export`` because the nodes never appear in those graphs."""
+    export_graph_only: bool = False
+    """If ``True``, this pattern targets nodes that only appear in
+    ``torch.export`` aten graphs and has no effect on symbolic-trace output."""
+    def match(self, graph_module: fx.GraphModule) -> list["PatternMatch"]:
+        """Find all occurrences of this pattern in `graph_module`.
+        :raises ValueError:
+            If the pattern has no ``tree``.
+        """
+        tree = self.tree
+        if tree is None:
+            raise ValueError(f"{type(self).__name__} has no tree to match.")
+        tree_matches = match_tree(graph_module, tree)
+        return [
+            PatternMatch(
+                pattern=self,
+                graph_module=graph_module,
+                tree_match=tm,
+            )
+            for tm in tree_matches
+        ]
+    def replace(
+        self,
+        pattern_match: "PatternMatch",
+    ) -> list[fx.Node]:
+        """Replace one matched occurrence in-place.
+        :param pattern_match:
+            The pattern match to replace.
+        :returns:
+            The replacement nodes inserted into the graph.
+        :raises ValueError:
+            If the pattern has no ``graft``.
+        :raises TypeError:
+            If the ``graft`` class constructor rejects the
+            collected modules.
+        """
+        assert pattern_match.pattern is self
+        tree_match = pattern_match.tree_match
+        graft = self.graft
+        if graft is None:
+            raise ValueError(
+                f"{type(self).__name__} has no graft"
+                f" — override replace() or set graft."
+            )
+        replacements = _get_replacements(graft, tree_match)
+        return replace_tree(
+            pattern_match.graph_module, tree_match, replacements
+        )
+@dataclass
+class PatternMatch:
+    """One matched occurrence of a ``Pattern`` in a graph."""
+    #: The pattern that produced this match.
+    pattern: Pattern
+    #: The graph module that produced this match.
+    graph_module: fx.GraphModule
+    #: Structured match result produced by
+    #: :func:`~embedl_deploy._internal.core.tree.match.match_tree`.
+    #: Contains the matched nodes, modules, and nested per-branch
+    #: sub-matches for
+    #: :class:`~embedl_deploy._internal.core.tree.types.Fork`
+    #: topologies.
+    tree_match: TreeMatch
+    #: Whether to apply this match during transformation.
+    apply: bool = True
+    def __repr__(self) -> str:
+        pat = type(self.pattern).__name__
+        node_names = [n.name for n in self.tree_match.get_tree_nodes()]
+        return f"PatternMatch({pat}: {' -> '.join(node_names)})"
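
The ordering that `_collect_modules` documents (nested branches first, then trunk nodes, with an unmatched `"?"` wildcard contributing `None`) can be sketched without PyTorch. The `Wildcard` and `TreeMatch` classes below are simplified stand-ins, not the package's real types, and strings take the place of resolved modules:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Wildcard:
    """Stand-in wildcard entry; `nodes` is empty when nothing matched."""
    quantifier: str
    nodes: list = field(default_factory=list)


@dataclass
class TreeMatch:
    """Stand-in match: trunk entries plus nested per-branch matches."""
    trunk_nodes: list
    nested: list = field(default_factory=list)


def collect(tree_match: TreeMatch) -> list[Optional[str]]:
    out: list[Optional[str]] = []
    for nested in tree_match.nested:      # branches first, in input order
        out.extend(collect(nested))
    for entry in tree_match.trunk_nodes:  # then the trunk
        if isinstance(entry, Wildcard):
            if entry.quantifier != "?":
                raise TypeError("only '?' wildcards are supported")
            out.append(entry.nodes[0] if entry.nodes else None)
        else:
            out.append(entry)
    return out


match = TreeMatch(
    trunk_nodes=["add", Wildcard("?")],   # the optional trailing node did not match
    nested=[TreeMatch(["conv_a", "bn_a"]), TreeMatch(["conv_b"])],
)
print(collect(match))
# ['conv_a', 'bn_a', 'conv_b', 'add', None]
```

The flattened list lines up with a constructor that takes the branch modules first and the output modules last, which is the signature shape the docstring describes.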
