PyPI - tracepipe - Versions diffs - 0.3.5__tar.gz → 0.4.1__tar.gz - Mend

tracepipe 0.3.5.tar.gz → 0.4.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (154)

{tracepipe-0.3.5 → tracepipe-0.4.1}/CHANGELOG.md RENAMED

@@ -5,6 +5,67 @@ All notable changes to TracePipe will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## 0.4.1 - 2026-02-04
+### Fixed
+- Fully implemented `CheckResult` convenience properties (`.passed`, `.retention`, `.n_dropped`, `.n_steps`, `.drops_by_op`)
+- Added comprehensive tests for `CheckResult` API to ensure properties work correctly
+- Properties now properly access underlying `.facts` dictionary for all metrics
+### Changed
+- Cleaned up example files and test scripts
+## 0.4.0 - 2026-02-04
+### Added
+- **Full row provenance for `pd.concat(axis=0)`**: Row IDs are now preserved through concatenation
+  - Each result row maintains its original RID from the source DataFrame
+  - `ConcatMapping` tracks which source DataFrame each row came from
+  - Concat steps are now marked `FULL` completeness (previously `PARTIAL`)
+- **Duplicate drop provenance in debug mode**: `drop_duplicates` now tracks which row "won"
+  - `DuplicateDropMapping` maps dropped rows to their kept representative
+  - Supports `keep='first'`, `keep='last'`, and `keep=False`
+  - Uses `hash_pandas_object` for fast, NaN-safe key comparison
+- **Clean `TraceResult` API for provenance** (UX improvement):
+  - `trace.origin` — Unified origin info: `{"type": "concat", "source_df": 1}` or `{"type": "merge", "left_parent": 10, "right_parent": 20}`
+  - `trace.representative` — For dedup-dropped rows: `{"kept_rid": 42, "subset": ["key"], "keep": "first"}`
+  - No need to access internal `.store` methods — everything is in `tp.trace()` result
+- **Clean `CheckResult` API** (UX improvement):
+  - `result.passed` — Alias for `.ok` (common naming convention)
+  - `result.retention` — Row retention rate (0.0-1.0) from `.facts`
+  - `result.n_dropped` — Total rows dropped
+  - `result.n_steps` — Total pipeline steps recorded
+  - `result.drops_by_op` — Drops broken down by operation name
+  - All properties are now discoverable via autocomplete
+- **New data structures in `core.py`**:
+  - `ConcatMapping`: Tracks row provenance through concat operations
+  - `DuplicateDropMapping`: Tracks dropped->kept relationships in drop_duplicates
+- **Comprehensive test suite**: 38 new tests in `test_row_provenance.py` covering:
+  - Concat RID preservation, ignore_index, after sort, with empty DFs, chained concats
+  - Axis=1 same-RID propagation vs different-RID PARTIAL marking
+  - Drop_duplicates keep='first'/'last'/False mapping correctness
+  - NaN handling parity with pandas `duplicated()`
+  - Integration: concat→merge, filter→concat, dedup→fillna lineage
+  - TraceResult `.origin` and `.representative` property tests
+### Changed
+- `wrap_concat_with_lineage` rewritten for full provenance tracking
+  - Captures source RIDs before operation
+  - Propagates RIDs (not new registration) for axis=0
+  - Stores positional + sorted arrays for both "explain row i" and O(log n) lookup
+  - Axis=1 propagates RIDs if all inputs match, otherwise PARTIAL
+- `_capture_filter_with_mask` enhanced to store `DuplicateDropMapping` in debug mode
+- `TraceResult` enhanced with `.origin` and `.representative` properties
+  - `.to_text()` now displays origin and representative info
+  - `.to_dict()` includes all provenance info
 ## 0.3.5 - 2026-02-03
 ### Fixed
@@ -17,6 +78,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Known Limitations section in README documenting concat/dedup tracking gaps
 - Test for `DataFrame.fillna` single-event logging
+### Changed
+- **Test suite hardened** with exact count assertions and multi-scenario tests:
+  - Changed 15+ assertions from `>= 1` to `== 1` for precise verification
+  - Added `test_integration_scenarios.py` with 16 new tests covering:
+    - Multi-pipeline session isolation
+    - Warning message content verification
+    - Reliability scenarios (fillna, replace, loc, merge)
+    - Cross-pipeline contamination prevention
 ## 0.3.4 - 2026-02-03
 ### Fixed
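
Reviewer note: the 0.4.0 entry above says the rewritten `wrap_concat_with_lineage` stores positional and sorted arrays so that both "explain row i" and an O(log n) lookup by row ID are possible. The diff shown here does not include `core.py`, so the following is only a minimal sketch of that idea. `ConcatMappingSketch`, `build_mapping`, `source_at`, and `source_of` are hypothetical names chosen for illustration, not tracepipe's actual API.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass(eq=False)
class ConcatMappingSketch:
    """Hypothetical stand-in for a concat provenance mapping (not tracepipe's class)."""

    result_rids: np.ndarray   # row ID found at each position of the concat result
    source_index: np.ndarray  # index of the input DataFrame that supplied that position
    _sorted_rids: np.ndarray = field(init=False, repr=False)
    _sorted_sources: np.ndarray = field(init=False, repr=False)

    def __post_init__(self):
        # Keep a second copy sorted by row ID so lookups can use binary search.
        order = np.argsort(self.result_rids, kind="stable")
        self._sorted_rids = self.result_rids[order]
        self._sorted_sources = self.source_index[order]

    def source_at(self, position):
        """Positional lookup: which input produced result row `position`?"""
        return int(self.source_index[position])

    def source_of(self, rid):
        """Binary-search lookup by row ID; returns None if the ID is unknown."""
        i = np.searchsorted(self._sorted_rids, rid)
        if i < len(self._sorted_rids) and self._sorted_rids[i] == rid:
            return int(self._sorted_sources[i])
        return None


def build_mapping(rid_arrays):
    """Build the mapping from one row-ID array per concatenated input."""
    arrays = [np.asarray(r, dtype=np.int64) for r in rid_arrays]
    result_rids = np.concatenate(arrays)
    source_index = np.concatenate(
        [np.full(len(a), k, dtype=np.int64) for k, a in enumerate(arrays)]
    )
    return ConcatMappingSketch(result_rids, source_index)
```

Keeping both layouts trades memory for speed: the positional arrays answer questions about a result row directly, while the sorted copy avoids a linear scan when only a row ID is known.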

{tracepipe-0.3.5 → tracepipe-0.4.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: tracepipe
-Version: 0.3.5
+Version: 0.4.1
 Summary: Row-level data lineage tracking for pandas pipelines
 Project-URL: Homepage, https://github.com/tracepipe/tracepipe
 Project-URL: Documentation, https://tracepipe.github.io/tracepipe/
@@ -278,7 +278,7 @@ tp.enable(mode="debug")  # Full lineage
 ## Known Limitations
-TracePipe tracks **cell mutations** (fillna, replace, loc assignment) and **merge provenance** reliably. However, some patterns are not yet fully supported:
+TracePipe tracks **cell mutations**, **merge provenance**, **concat provenance**, and **duplicate drop decisions** reliably. A few patterns have limited tracking:
 | Pattern | Status | Notes |
 |---------|--------|-------|
@@ -286,13 +286,10 @@ TracePipe tracks **cell mutations** (fillna, replace, loc assignment) and **merg
 | `df = df.fillna({"col": 0})` | ✅ Tracked | DataFrame-level fillna |
 | `df.loc[mask, "col"] = val` | ✅ Tracked | Conditional assignment |
 | `df.merge(other, on="key")` | ✅ Tracked | Full provenance in debug mode |
-| `pd.concat([df1, df2])` | ⚠️ Partial | Row IDs preserved, but no "source DataFrame" tracking |
-| `df.drop_duplicates(keep='last')` | ⚠️ Partial | Which row was kept is not tracked |
-| Sort + dedup patterns | ⚠️ Partial | "Latest record wins" logic not traced |
-**Why?** TracePipe tracks value changes within rows, not row-selection operations. When `drop_duplicates` picks one row over another, that's a provenance decision (not a cell mutation) that isn't currently instrumented.
-**Planned for 0.4**: Full row-provenance tracking for concat, drop_duplicates, and sort operations.
+| `pd.concat([df1, df2])` | ✅ Tracked | Row IDs preserved with source DataFrame tracking (v0.4+) |
+| `df.drop_duplicates()` | ✅ Tracked | Dropped rows map to kept representative (debug mode, v0.4+) |
+| `pd.concat(axis=1)` | ⚠️ Partial | FULL only if all inputs have identical RIDs |
+| Complex `apply`/`pipe` | ⚠️ Partial | Output tracked, internals opaque |
 ---

{tracepipe-0.3.5 → tracepipe-0.4.1}/README.md RENAMED

@@ -209,7 +209,7 @@ tp.enable(mode="debug")  # Full lineage
 ## Known Limitations
-TracePipe tracks **cell mutations** (fillna, replace, loc assignment) and **merge provenance** reliably. However, some patterns are not yet fully supported:
+TracePipe tracks **cell mutations**, **merge provenance**, **concat provenance**, and **duplicate drop decisions** reliably. A few patterns have limited tracking:
 | Pattern | Status | Notes |
 |---------|--------|-------|
@@ -217,13 +217,10 @@ TracePipe tracks **cell mutations** (fillna, replace, loc assignment) and **merg
 | `df = df.fillna({"col": 0})` | ✅ Tracked | DataFrame-level fillna |
 | `df.loc[mask, "col"] = val` | ✅ Tracked | Conditional assignment |
 | `df.merge(other, on="key")` | ✅ Tracked | Full provenance in debug mode |
-| `pd.concat([df1, df2])` | ⚠️ Partial | Row IDs preserved, but no "source DataFrame" tracking |
-| `df.drop_duplicates(keep='last')` | ⚠️ Partial | Which row was kept is not tracked |
-| Sort + dedup patterns | ⚠️ Partial | "Latest record wins" logic not traced |
-**Why?** TracePipe tracks value changes within rows, not row-selection operations. When `drop_duplicates` picks one row over another, that's a provenance decision (not a cell mutation) that isn't currently instrumented.
-**Planned for 0.4**: Full row-provenance tracking for concat, drop_duplicates, and sort operations.
+| `pd.concat([df1, df2])` | ✅ Tracked | Row IDs preserved with source DataFrame tracking (v0.4+) |
+| `df.drop_duplicates()` | ✅ Tracked | Dropped rows map to kept representative (debug mode, v0.4+) |
+| `pd.concat(axis=1)` | ⚠️ Partial | FULL only if all inputs have identical RIDs |
+| Complex `apply`/`pipe` | ⚠️ Partial | Output tracked, internals opaque |
 ---

{tracepipe-0.3.5 → tracepipe-0.4.1}/docs/api/core.md RENAMED

@@ -86,6 +86,9 @@ Manually register DataFrames for tracking.
 Use this when DataFrames are created before `tp.enable()` is called.
+!!! note "Lineage Break"
+    Calling `register()` assigns new row IDs, which breaks lineage from any prior transformations. Use it only for "entry point" DataFrames.
 **Parameters:**
 | Parameter | Type | Description |
@@ -161,14 +164,15 @@ Health check for a DataFrame's lineage.
 | Attribute | Type | Description |
 |-----------|------|-------------|
-| `.passed` | `bool` | True if healthy |
+| `.ok` | `bool` | True if no FACT-level warnings |
+| `.passed` | `bool` | Alias for `.ok` |
 | `.mode` | `str` | Current tracking mode |
-| `.retention` | `float` | Row retention rate (0-1) |
-| `.n_dropped` | `int` | Total dropped rows |
-| `.n_changes` | `int` | Total cell changes |
-| `.warnings` | `list[str]` | Any warnings |
-| `.drops_by_op` | `dict` | Drops by operation |
-| `.changes_by_op` | `dict` | Changes by operation |
+| `.retention` | `float \| None` | Row retention rate (0.0-1.0) |
+| `.n_dropped` | `int` | Total rows dropped |
+| `.n_steps` | `int` | Total pipeline steps recorded |
+| `.drops_by_op` | `dict[str, int]` | Drops by operation name |
+| `.warnings` | `list[CheckWarning]` | Warning objects with details |
+| `.facts` | `dict` | Raw measured facts (for power users) |
 **Example:**
@@ -209,9 +213,11 @@ Trace a row's journey through the pipeline.
 | Attribute | Type | Description |
 |-----------|------|-------------|
 | `.row_id` | `int` | Internal row ID |
-| `.status` | `str` | `"alive"` or `"dropped"` |
+| `.is_alive` | `bool` | True if row exists in current DataFrame |
 | `.events` | `list` | All events for this row |
-| `.dropped_by` | `str` | Operation that dropped (if dropped) |
+| `.dropped_at` | `dict` | Operation that dropped (if dropped) |
+| `.origin` | `dict` | Where row came from: `{"type": "concat", "source_df": 1}` or `{"type": "merge", "left_parent": 10, "right_parent": 20}` |
+| `.representative` | `dict` | If dropped by dedup: `{"kept_rid": 42, "subset": [...], "keep": "first"}` |
 **Example:**

tracepipe-0.4.1/docs/changelog.md ADDED

@@ -0,0 +1,120 @@
+# Changelog
+All notable changes to TracePipe will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.4.1] - 2026-02-04
+### Fixed
+- Fully implemented `CheckResult` convenience properties (`.passed`, `.retention`, `.n_dropped`, `.n_steps`, `.drops_by_op`)
+- Added comprehensive tests for `CheckResult` API to ensure properties work correctly
+- Properties now properly access underlying `.facts` dictionary for all metrics
+### Changed
+- Cleaned up example files and test scripts
+## [0.4.0] - 2026-02-04
+### Added
+- **Full row provenance for `pd.concat(axis=0)`**: Row IDs are now preserved through concatenation
+  - Each result row maintains its original RID from the source DataFrame
+  - `ConcatMapping` tracks which source DataFrame each row came from
+  - Concat steps are now marked `FULL` completeness
+- **Duplicate drop provenance in debug mode**: `drop_duplicates` now tracks which row "won"
+  - `DuplicateDropMapping` maps dropped rows to their kept representative
+  - Supports `keep='first'`, `keep='last'`, and `keep=False`
+  - Uses `hash_pandas_object` for fast, NaN-safe key comparison
+- **Clean `TraceResult` API for provenance**:
+  - `trace.origin` — Unified origin: `{"type": "concat", "source_df": 1}` or `{"type": "merge", ...}`
+  - `trace.representative` — For dedup drops: `{"kept_rid": 42, "subset": ["key"], "keep": "first"}`
+  - No need to access internal `.store` methods
+- **Clean `CheckResult` API**:
+  - `result.passed` — Alias for `.ok`
+  - `result.retention` — Row retention rate (0.0-1.0)
+  - `result.n_dropped`, `result.n_steps`, `result.drops_by_op`
+  - All properties discoverable via autocomplete
+- **Comprehensive test suite**: 38 new tests covering concat, dedup, and TraceResult API
+### Changed
+- `wrap_concat_with_lineage` rewritten for full provenance tracking
+- `axis=1` concat propagates RIDs if all inputs match, otherwise PARTIAL
+- `TraceResult` enhanced with `.origin` and `.representative` properties
+## [0.3.5] - 2026-02-03
+### Fixed
+- **DataFrame.fillna double-logging**: `df.fillna({"col": 0})` now logs exactly 1 event
+- Added `wrap_pandas_transform_method` with `_in_transform_op` flag
+### Added
+- Known Limitations section in README documenting concat/dedup tracking gaps
+### Changed
+- Test suite hardened with exact count assertions and multi-scenario tests
+## [0.3.4] - 2026-02-03
+### Fixed
+- **Event deduplication**: Identical events from parallel pipelines are now deduplicated
+## [0.3.3] - 2026-02-03
+### Fixed
+- **Double-logging bug**: `df['col'] = df['col'].fillna()` now logs exactly one event
+- **Merge warning scoping**: `tp.check(df)` now only shows warnings for merges in df's lineage
+## [0.3.2] - 2026-02-03
+### Fixed
+- Merge duplicate key warnings now correctly identify which table (left/right) has duplicates
+## [0.3.1] - 2026-02-03
+### Fixed
+- Cell history now correctly chains through merge operations via lineage traversal
+- `tp.why()` and `tp.trace()` show pre-merge changes for post-merge rows
+- `enable()` resets accumulated state when called multiple times
+### Added
+- `get_row_history_with_lineage()` and `get_cell_history_with_lineage()` methods
+## [0.3.0] - 2026-02-03
+### Added
+- MkDocs documentation site with Material theme
+- Comprehensive API reference documentation
+- Getting started guides and tutorials
+- `tp.register()` API for manually registering DataFrames
+- Configurable retention threshold in `tp.check()`
+- Ghost row capture for fallback filter paths
+- Data quality contracts with fluent API
+- HTML report generation
+- Snapshot and diff functionality
+- Debug mode with cell-level tracking
+- `tp.why()` for cell provenance
+- `tp.trace()` for row journey
+- Support for all major pandas operations
+### Fixed
+- Recursion bug when accessing hidden column in COLUMN mode
+- Config propagation issues
+- Retention rate calculation for multi-table pipelines
+- Export wrappers correctly strip hidden column

{tracepipe-0.3.5 → tracepipe-0.4.1}/docs/guide/concepts.md RENAMED

@@ -68,6 +68,28 @@ When rows are aggregated:
 grouped = df.groupby("category").sum()  # GROUP event with membership
 ```
+### Concat Events (v0.4+)
+When DataFrames are concatenated, row IDs are preserved:
+```python
+df1 = pd.DataFrame({"a": [1, 2]})  # Rows get IDs: 0, 1
+df2 = pd.DataFrame({"a": [3, 4]})  # Rows get IDs: 2, 3
+result = pd.concat([df1, df2])  # IDs preserved: 0, 1, 2, 3
+# TracePipe tracks which source DataFrame each row came from
+```
+### Duplicate Drop Events (v0.4+)
+In debug mode, `drop_duplicates` tracks which row was kept as representative:
+```python
+df = pd.DataFrame({"key": ["A", "A", "B"], "val": [1, 2, 3]})
+df = df.drop_duplicates(subset=["key"], keep="first")
+# Row with val=2 was dropped, mapped to representative (val=1)
+```
 ---
 ## The Lineage Store

{tracepipe-0.3.5 → tracepipe-0.4.1}/docs/guide/row-tracing.md RENAMED

@@ -109,6 +109,70 @@ if trace.merge_parents:
     print(f"Right parent: {trace.merge_parents.right}")
 ```
+---
+## Concat Origin Tracking (v0.4+)
+When rows come from concatenated DataFrames, TracePipe tracks their source via `trace.origin`:
+```python
+df1 = pd.DataFrame({"a": [1, 2]})
+df2 = pd.DataFrame({"a": [3, 4]})
+result = pd.concat([df1, df2])
+# Trace a row that came from df2
+trace = tp.trace(result, row=2)
+print(trace.origin)
+# {"type": "concat", "source_df": 1, "step_id": 5}
+```
+The `.origin` property returns a unified dict with:
+- `type`: `"concat"`, `"merge"`, or `None` (for original rows)
+- `source_df`: Index in the concat list (0=first DataFrame, 1=second, etc.)
+- `step_id`: Which pipeline step
+Row IDs are preserved through `pd.concat(axis=0)`, so lineage chains correctly:
+```python
+# Transform df1 before concat
+df1["a"] = df1["a"].fillna(0)
+result = pd.concat([df1, df2])
+# Rows from df1 still have their fillna history
+trace = tp.trace(result, row=0)  # Shows fillna event from df1
+```
+---
+## Duplicate Representative Tracking (v0.4+)
+When `drop_duplicates` removes rows, TracePipe tracks which row "won" via `trace.representative`:
+```python
+df = pd.DataFrame({
+    "key": ["A", "A", "B"],
+    "value": [100, 200, 300]
+})
+df = df.drop_duplicates(subset=["key"], keep="first")
+# Trace the dropped row (value=200)
+trace = tp.trace(df, row=dropped_row_id)
+print(trace.representative)
+# {"kept_rid": 42, "subset": ["key"], "keep": "first"}
+```
+The `.representative` property is only set for rows dropped by `drop_duplicates`:
+| `keep` Strategy | `.representative` |
+|-----------------|-------------------|
+| `keep='first'` | `{"kept_rid": 42, ...}` — first occurrence kept |
+| `keep='last'` | `{"kept_rid": 45, ...}` — last occurrence kept |
+| `keep=False` | `{"kept_rid": None, ...}` — all duplicates removed |
+This answers "why did this row disappear?" — it wasn't deleted, it was deduplicated.
 ## Performance Considerations
 - Row tracing in CI mode is limited (no individual row IDs)
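
Reviewer note: the guide text above describes mapping each row dropped by `drop_duplicates` to the row that was kept, and the changelog mentions `hash_pandas_object` for key comparison. The sketch below shows one way such a mapping could be computed with public pandas calls only. `dedup_representatives` is a hypothetical helper, it works on index labels rather than tracepipe row IDs, and it treats equal hashes as equal keys (a real implementation would need to consider hash collisions).

```python
import pandas as pd


def dedup_representatives(df, subset, keep="first"):
    """Map each dropped row's index label to the label of the row kept instead.

    Hypothetical illustration; not tracepipe's DuplicateDropMapping.
    """
    # One 64-bit hash per row, computed over the key columns only.
    keys = pd.util.hash_pandas_object(df[subset], index=False).to_numpy()
    # Boolean mask of rows that drop_duplicates(subset, keep) would remove.
    dropped = df.duplicated(subset=subset, keep=keep).to_numpy()

    if keep is False:
        # Every member of a duplicate group is removed, so nothing "won".
        return {label: None for label in df.index[dropped]}

    representative = {}
    for label, key in zip(df.index, keys):
        if keep == "first":
            representative.setdefault(key, label)  # first occurrence wins
        else:
            representative[key] = label            # last occurrence wins

    return {
        label: representative[key]
        for label, key, is_dropped in zip(df.index, keys, dropped)
        if is_dropped
    }
```

With the three-row example from the guide (`key` values `A`, `A`, `B`), `keep="first"` maps label 1 to label 0, and `keep="last"` maps label 0 to label 1.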

{tracepipe-0.3.5 → tracepipe-0.4.1}/docs/index.md RENAMED

@@ -159,12 +159,14 @@ print(tp.why(df, "price", 0))  # Why price changed
 | Operation | Tracking | Completeness |
 |-----------|----------|--------------|
-| `dropna`, `drop_duplicates` | Dropped row IDs | Full |
-| `query`, `df[mask]` | Dropped row IDs | Full |
+| `dropna`, `query`, `df[mask]` | Dropped row IDs | Full |
+| `drop_duplicates` | Dropped→kept mapping (debug mode) | Full |
 | `head`, `tail`, `sample` | Dropped row IDs | Full |
 | `fillna`, `replace` | Cell diffs (watched cols) | Full |
 | `loc[]=`, `iloc[]=`, `at[]=` | Cell diffs | Full |
 | `merge`, `join` | Parent tracking | Full |
+| `pd.concat(axis=0)` | Row IDs + source DataFrame | Full |
+| `pd.concat(axis=1)` | Row IDs (if aligned) | Partial |
 | `groupby().agg()` | Group membership | Full |
 | `apply`, `pipe` | Output tracked | Partial |
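
Reviewer note: the table above lists `pd.concat(axis=1)` as Partial, and the changelog states the rule as "propagates RIDs if all inputs match, otherwise PARTIAL". A minimal sketch of that decision, assuming each input's row IDs are available as an array, could look like this. `axis1_completeness` is a hypothetical name, and how tracepipe actually stores or compares row IDs is not visible in this diff.

```python
import numpy as np


def axis1_completeness(rid_arrays):
    """Return ("FULL", rids) when every input has identical row IDs, else ("PARTIAL", None).

    Hypothetical illustration of the axis=1 rule described in the changelog.
    """
    first = np.asarray(rid_arrays[0])
    aligned = all(np.array_equal(first, np.asarray(r)) for r in rid_arrays[1:])
    if aligned:
        return "FULL", first   # same rows side by side: row IDs can be reused
    return "PARTIAL", None     # rows differ: no single row ID describes a result row
```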

{tracepipe-0.3.5 → tracepipe-0.4.1}/mkdocs.yml RENAMED

@@ -1,6 +1,6 @@
 site_name: TracePipe
 site_description: Row-level data lineage tracking for pandas pipelines
-site_url: https://tracepipe.github.io/tracepipe/
+site_url: https://gauthierpiarrette.github.io/tracepipe/
 repo_url: https://github.com/gauthierpiarrette/tracepipe
 repo_name: gauthierpiarrette/tracepipe
 edit_uri: edit/main/docs/

{tracepipe-0.3.5 → tracepipe-0.4.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "tracepipe"
-version = "0.3.5"
+version = "0.4.1"
 description = "Row-level data lineage tracking for pandas pipelines"
 readme = "README.md"
 license = {file = "LICENSE"}

{tracepipe-0.3.5 → tracepipe-0.4.1}/tests/test_api.py RENAMED

@@ -304,7 +304,9 @@ class TestRowLineageResult:
         row = dbg().explain_row(0)
         history = row.cell_history("a")
-        assert len(history) >= 1
+        assert (
+            len(history) == 1
+        ), f"Single fillna should record exactly 1 change, got {len(history)}"
     def test_history(self):
         """history() returns full history."""
@@ -487,7 +489,9 @@ class TestPreEnableDataFrameTracking:
         df["a"] = df["a"] * 10
         result = tracepipe.why(df, col="a", row=0)
-        assert len(result.history) >= 1
+        assert (
+            len(result.history) == 1
+        ), f"Single multiply should record exactly 1 change, got {len(result.history)}"
     def test_trace_after_register(self):
         """Row tracing works for registered DataFrames."""

{tracepipe-0.3.5 → tracepipe-0.4.1}/tests/test_convenience_debug.py RENAMED

@@ -192,6 +192,125 @@ class TestCheckResult:
         assert d["facts"]["key"] == "value"
         assert len(d["warnings"]) == 1
+    # === CONVENIENCE PROPERTY TESTS (v0.4+) ===
+    def test_passed_property_alias_for_ok(self):
+        """passed is an alias for ok."""
+        result_ok = CheckResult(ok=True, warnings=[], facts={}, suggestions=[], mode="debug")
+        result_fail = CheckResult(ok=False, warnings=[], facts={}, suggestions=[], mode="debug")
+        assert result_ok.passed is True
+        assert result_ok.passed == result_ok.ok
+        assert result_fail.passed is False
+        assert result_fail.passed == result_fail.ok
+    def test_retention_property(self):
+        """retention returns retention_rate from facts."""
+        result = CheckResult(
+            ok=True,
+            warnings=[],
+            facts={"retention_rate": 0.847},
+            suggestions=[],
+            mode="debug",
+        )
+        assert result.retention == 0.847
+    def test_retention_property_none_when_missing(self):
+        """retention returns None when retention_rate not in facts."""
+        result = CheckResult(ok=True, warnings=[], facts={}, suggestions=[], mode="debug")
+        assert result.retention is None
+    def test_n_dropped_property(self):
+        """n_dropped returns rows_dropped from facts."""
+        result = CheckResult(
+            ok=True,
+            warnings=[],
+            facts={"rows_dropped": 153},
+            suggestions=[],
+            mode="debug",
+        )
+        assert result.n_dropped == 153
+    def test_n_dropped_property_zero_default(self):
+        """n_dropped returns 0 when rows_dropped not in facts."""
+        result = CheckResult(ok=True, warnings=[], facts={}, suggestions=[], mode="debug")
+        assert result.n_dropped == 0
+    def test_n_steps_property(self):
+        """n_steps returns total_steps from facts."""
+        result = CheckResult(
+            ok=True,
+            warnings=[],
+            facts={"total_steps": 5},
+            suggestions=[],
+            mode="debug",
+        )
+        assert result.n_steps == 5
+    def test_drops_by_op_property(self):
+        """drops_by_op returns the _drops_by_op dict."""
+        result = CheckResult(
+            ok=True,
+            warnings=[],
+            facts={},
+            suggestions=[],
+            mode="debug",
+            _drops_by_op={"dropna": 42, "filter": 111},
+        )
+        assert result.drops_by_op == {"dropna": 42, "filter": 111}
+    def test_drops_by_op_empty_default(self):
+        """drops_by_op returns empty dict when not set."""
+        result = CheckResult(ok=True, warnings=[], facts={}, suggestions=[], mode="debug")
+        assert result.drops_by_op == {}
+    def test_to_dict_includes_convenience_fields(self):
+        """to_dict includes convenience property values."""
+        result = CheckResult(
+            ok=True,
+            warnings=[],
+            facts={"retention_rate": 0.85, "rows_dropped": 15, "total_steps": 3},
+            suggestions=[],
+            mode="debug",
+            _drops_by_op={"dropna": 15},
+        )
+        d = result.to_dict()
+        assert d["passed"] is True
+        assert d["retention"] == 0.85
+        assert d["n_dropped"] == 15
+        assert d["n_steps"] == 3
+        assert d["drops_by_op"] == {"dropna": 15}
+class TestCheckResultIntegration:
+    """Integration tests for CheckResult convenience properties."""
+    def test_check_populates_convenience_properties(self):
+        """tp.check() populates convenience properties correctly."""
+        tp.enable(mode="debug")
+        df = pd.DataFrame({"a": [1, 2, None, 4, 5]})
+        df = df.dropna()
+        result = tp.check(df)
+        # Convenience properties should be accessible
+        assert isinstance(result.passed, bool)
+        assert result.retention is None or isinstance(result.retention, float)
+        assert isinstance(result.n_dropped, int)
+        assert isinstance(result.drops_by_op, dict)
+        assert isinstance(result.n_steps, int)
+    def test_check_drops_by_op_populated(self):
+        """tp.check() correctly populates drops_by_op."""
+        tp.enable(mode="debug")
+        df = pd.DataFrame({"a": [1, 2, None, 4, 5], "b": [1, 1, 2, 2, 3]})
+        df = df.dropna()
+        df = df.drop_duplicates(subset=["b"])
+        result = tp.check(df)
+        # Should have drops tracked by operation
+        assert "DataFrame.dropna" in result.drops_by_op or result.n_dropped > 0
 # =============================================================================
 # TraceResult TESTS
@@ -410,7 +529,9 @@ class TestWhyResult:
         assert result is not None
         assert result.column == "amount"
         assert result.current_value == 300.0  # 200 * 1.5
-        assert result.n_changes >= 1
+        assert (
+            result.n_changes == 1
+        ), f"Single multiply should record exactly 1 change, got {result.n_changes}"
     def test_why_with_where_multiple_criteria(self):
         """why() with where= using multiple column criteria."""

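Reviewer note: the new `TestCheckResult` cases above pin down how the convenience properties relate to `.ok`, `.facts`, and `_drops_by_op`. The class below is a sketch reconstructed only from those assertions, to make the contract easier to read at a glance. `CheckResultSketch` is a hypothetical name and is not tracepipe's implementation. The default of `0` for `n_steps` when `total_steps` is missing is an assumption, since no test above covers that case.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CheckResultSketch:
    """Hypothetical model of the behaviour asserted by the tests in this diff."""

    ok: bool
    warnings: list
    facts: dict
    suggestions: list
    mode: str
    _drops_by_op: dict = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # Alias for .ok, per test_passed_property_alias_for_ok.
        return self.ok

    @property
    def retention(self) -> Optional[float]:
        # None when "retention_rate" is absent from facts.
        return self.facts.get("retention_rate")

    @property
    def n_dropped(self) -> int:
        # Defaults to 0 when "rows_dropped" is absent from facts.
        return self.facts.get("rows_dropped", 0)

    @property
    def n_steps(self) -> int:
        # ASSUMPTION: defaults to 0 when "total_steps" is absent.
        return self.facts.get("total_steps", 0)

    @property
    def drops_by_op(self) -> dict:
        # Empty dict when no per-operation drops were supplied.
        return self._drops_by_op
```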