loopgain 0.4.0.tar.gz → 0.4.2.tar.gz

Files changed (34)

{loopgain-0.4.0 → loopgain-0.4.2}/PKG-INFO RENAMED

@@ -1,7 +1,7 @@
 Metadata-Version: 2.4
 Name: loopgain
-Version: 0.4.0
-Summary: Barkhausen stability monitor for AI agent loops. Real-time loop-gain (Aβ) monitoring with five named threshold bands, best-so-far rollback, and ETA prediction.
+Version: 0.4.2
+Summary: An open-source cost controller for AI agent loops. Stops a loop when it has actually converged and rolls back before it degrades — replacing the max_iterations guess with a real-time loop-gain (Aβ) monitor with five named threshold bands and best-so-far rollback.
 Author-email: Dave Fitzsimmons <hello@loopgain.ai>
 License: Apache-2.0
 Project-URL: Homepage, https://loopgain.ai
@@ -49,14 +49,16 @@ Dynamic: license-file
 # LoopGain
-**Barkhausen stability monitor for AI agent loops.**
+**An open-source cost controller for AI agent loops.**
-Replace `max_iterations=5` with a real-time trajectory classifier that reads four features off the loop's error series and routes it into one of five named states — knowing whether your agent loop is converging, stalling, oscillating, or diverging, and what to do in each case.
+AI agent loops waste time and money when they don't know when to stop. LoopGain measures the loop in real time and stops it the moment it has actually converged — and rolls back before it degrades — instead of running to a fixed `max_iterations` cap.
+> **Across 2,000 paired trials over 10 cells**, LoopGain reduced total API spend by **92.8%** vs `max_iter=20`, dropped median wall-clock latency from 30.9s to 2.1s (**~15×**), preserved output quality on natural-distribution workloads (W1–W4: judge winrate 0.50–0.63, CI excluding null on most cells), and improved output quality on engineered-failure workloads (W5: winrate 0.92–0.95 across three adapters). Weighted-average pairwise preference for LG vs B20 across 1,800 judge comparisons: **0.678**. Zero of six kill criteria fired.
 [![PyPI](https://img.shields.io/pypi/v/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![Python](https://img.shields.io/pypi/pyversions/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![License](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](LICENSE)
-[![Tests](https://img.shields.io/badge/tests-157_passing-brightgreen.svg)](tests/)
+[![Tests](https://img.shields.io/badge/tests-200%2B_passing-brightgreen.svg)](tests/)
 **Home:** [loopgain.ai](https://loopgain.ai)
@@ -68,7 +70,7 @@ Works for **any iterative AI workflow with a measurable error signal** — verif
 ## Why
-Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stability monitor based on the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
+Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stop-and-rollback policy grounded in the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
 ---
@@ -145,7 +147,7 @@ It routes the trajectory into one of five named states:
 | State | Condition | Action |
 | --- | --- | --- |
-| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue, predict ETA |
+| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue |
 | `CONVERGING` | negative slope with `p < 0.05`, OR cumulative ≤ 50% | Continue, watch for upward drift |
 | `STALLING` | no significant slope, no detectable oscillation | Stop after 2 consecutive readings — return best-so-far |
 | `OSCILLATING` | high residual variance with flat trend | Stop — return best-so-far |
@@ -161,18 +163,6 @@ The decision is **conservative by design**: requiring both statistical significa
 ---
-## ETA prediction
-When the loop is converging (`Aβ_smooth < 1`), LoopGain produces a closed-form prediction of iterations remaining:
-```
-n_remaining = log(E_target / E_current) / log(Aβ_smooth)
-```
-Available as `lg.eta` mid-loop. Returns `None` when the prediction isn't well-defined (no Aβ yet, target zero, or non-converging gain).
----
 ## Best-so-far rollback
 LoopGain keeps a buffer of all observed outputs paired with their error scores. On termination it returns `argmin(error)`, not the last iteration:
@@ -189,10 +179,10 @@ This transforms divergence detection from "abort with garbage" into "abort with
 ## What LoopGain does and doesn't guarantee
-LoopGain saves money by stopping a loop once it stops improving — fewer iterations, fewer tokens. In our [public benchmark](https://github.com/loopgain-ai/loopgain-bench), that was a **93.5% median cut in API spend** vs `max_iterations=20`, with output quality preserved. Two honest limits:
+LoopGain saves money by stopping a loop once it stops improving — fewer iterations, fewer tokens. In our [public benchmark](https://github.com/loopgain-ai/loopgain-bench), that was a **92.8% median cut in API spend** vs `max_iterations=20`, with output quality preserved. Two honest limits:
-- **Savings depend on your workload.** Loops that usually succeed fast save the most (~96%); adversarial, failure-prone loops save less (~84%). The headline is a blend — run the benchmark on your own loops before quoting a number.
-- **LoopGain detects convergence, not correctness.** It stops when your error signal stops improving — which means more iterations won't help, *not* that the loop succeeded. On the benchmark this preserved quality (it rarely stopped early on a worse output; false-stop rate ≤3.5%), but a loop can stall with the error still above zero — a plateau at, say, 2 failing tests. So check `result.best_error` (or your own pass/fail) before you trust the output: if it plateaued short of your target, that's a quality gap LoopGain can't see, and a false stop that forces a rerun is the one way it eats into the savings. LoopGain decides *when to stop*; you decide *whether the answer is good enough*.
+- **Savings depend on your workload.** Loops that usually succeed fast save the most (~96%); adversarial, failure-prone loops save less (~78–84%). The headline is a blend — run the benchmark on your own loops before quoting a number.
+- **LoopGain detects convergence, not correctness.** It stops when your error signal stops improving — which means more iterations won't help, *not* that the loop succeeded. On the benchmark this preserved quality (it rarely stopped early on a worse output; false-stop rate ≤4.5%), but a loop can stall with the error still above zero — a plateau at, say, 2 failing tests. So check `result.best_error` (or your own pass/fail) before you trust the output: if it plateaued short of your target, that's a quality gap LoopGain can't see, and a false stop that forces a rerun is the one way it eats into the savings. LoopGain decides *when to stop*; you decide *whether the answer is good enough*.
 ---
@@ -224,7 +214,7 @@ Current state name. One of `INIT`, `FAST_CONVERGE`, `CONVERGING`, `STALLING`, `O
 ### `lg.eta -> int | None`
-Predicted iterations to reach target. `None` when not well-defined.
+Best-effort closed-form estimate of iterations remaining, exposed for instrumentation. Returns `None` whenever it isn't well-defined — which is most of the time on real, jump-dominated loops, so don't depend on it for control.
 ### `lg.gain_margin -> float | None`
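
The closed-form estimate behind `lg.eta` (the formula in the removed "ETA prediction" section) can be sketched as below. This is a minimal illustration, not the package's implementation: the function name `eta_iterations`, the rounding up to a whole iteration, and the exact set of `None` guards are assumptions chosen to match the documented behaviour (no Aβ yet, target zero, or non-converging gain).

```python
import math
from typing import Optional


def eta_iterations(e_current: float, e_target: float,
                   gain: Optional[float]) -> Optional[int]:
    """Hypothetical helper: iterations remaining under a constant loop gain.

    Implements n_remaining = log(E_target / E_current) / log(gain),
    returning None whenever the expression is not well-defined.
    """
    # No gain estimate yet, or a gain that does not shrink the error.
    if gain is None or not (0.0 < gain < 1.0):
        return None
    # log() needs strictly positive arguments; a zero target never resolves.
    if e_target <= 0.0 or e_current <= 0.0:
        return None
    # Already at or below the target: nothing left to do.
    if e_current <= e_target:
        return 0
    n_remaining = math.log(e_target / e_current) / math.log(gain)
    # Assumption: round up, since a partial iteration cannot be run.
    return math.ceil(n_remaining)
```

With an error of 1.0, a target of 0.1, and a smoothed gain of 0.5, the ratio is about 3.32, so the sketch reports 4 iterations. The estimate assumes a steady geometric decay, which is why the 0.4.2 text warns against using it for control on jump-dominated loops.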

{loopgain-0.4.0 → loopgain-0.4.2}/README.md RENAMED

@@ -1,13 +1,15 @@
 # LoopGain
-**Barkhausen stability monitor for AI agent loops.**
+**An open-source cost controller for AI agent loops.**
-Replace `max_iterations=5` with a real-time trajectory classifier that reads four features off the loop's error series and routes it into one of five named states — knowing whether your agent loop is converging, stalling, oscillating, or diverging, and what to do in each case.
+AI agent loops waste time and money when they don't know when to stop. LoopGain measures the loop in real time and stops it the moment it has actually converged — and rolls back before it degrades — instead of running to a fixed `max_iterations` cap.
+> **Across 2,000 paired trials over 10 cells**, LoopGain reduced total API spend by **92.8%** vs `max_iter=20`, dropped median wall-clock latency from 30.9s to 2.1s (**~15×**), preserved output quality on natural-distribution workloads (W1–W4: judge winrate 0.50–0.63, CI excluding null on most cells), and improved output quality on engineered-failure workloads (W5: winrate 0.92–0.95 across three adapters). Weighted-average pairwise preference for LG vs B20 across 1,800 judge comparisons: **0.678**. Zero of six kill criteria fired.
 [![PyPI](https://img.shields.io/pypi/v/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![Python](https://img.shields.io/pypi/pyversions/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![License](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](LICENSE)
-[![Tests](https://img.shields.io/badge/tests-157_passing-brightgreen.svg)](tests/)
+[![Tests](https://img.shields.io/badge/tests-200%2B_passing-brightgreen.svg)](tests/)
 **Home:** [loopgain.ai](https://loopgain.ai)
@@ -19,7 +21,7 @@ Works for **any iterative AI workflow with a measurable error signal** — verif
 ## Why
-Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stability monitor based on the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
+Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stop-and-rollback policy grounded in the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
 ---
@@ -96,7 +98,7 @@ It routes the trajectory into one of five named states:
 | State | Condition | Action |
 | --- | --- | --- |
-| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue, predict ETA |
+| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue |
 | `CONVERGING` | negative slope with `p < 0.05`, OR cumulative ≤ 50% | Continue, watch for upward drift |
 | `STALLING` | no significant slope, no detectable oscillation | Stop after 2 consecutive readings — return best-so-far |
 | `OSCILLATING` | high residual variance with flat trend | Stop — return best-so-far |
@@ -112,18 +114,6 @@ The decision is **conservative by design**: requiring both statistical significa
 ---
-## ETA prediction
-When the loop is converging (`Aβ_smooth < 1`), LoopGain produces a closed-form prediction of iterations remaining:
-```
-n_remaining = log(E_target / E_current) / log(Aβ_smooth)
-```
-Available as `lg.eta` mid-loop. Returns `None` when the prediction isn't well-defined (no Aβ yet, target zero, or non-converging gain).
----
 ## Best-so-far rollback
 LoopGain keeps a buffer of all observed outputs paired with their error scores. On termination it returns `argmin(error)`, not the last iteration:
@@ -140,10 +130,10 @@ This transforms divergence detection from "abort with garbage" into "abort with
 ## What LoopGain does and doesn't guarantee
-LoopGain saves money by stopping a loop once it stops improving — fewer iterations, fewer tokens. In our [public benchmark](https://github.com/loopgain-ai/loopgain-bench), that was a **93.5% median cut in API spend** vs `max_iterations=20`, with output quality preserved. Two honest limits:
+LoopGain saves money by stopping a loop once it stops improving — fewer iterations, fewer tokens. In our [public benchmark](https://github.com/loopgain-ai/loopgain-bench), that was a **92.8% median cut in API spend** vs `max_iterations=20`, with output quality preserved. Two honest limits:
-- **Savings depend on your workload.** Loops that usually succeed fast save the most (~96%); adversarial, failure-prone loops save less (~84%). The headline is a blend — run the benchmark on your own loops before quoting a number.
-- **LoopGain detects convergence, not correctness.** It stops when your error signal stops improving — which means more iterations won't help, *not* that the loop succeeded. On the benchmark this preserved quality (it rarely stopped early on a worse output; false-stop rate ≤3.5%), but a loop can stall with the error still above zero — a plateau at, say, 2 failing tests. So check `result.best_error` (or your own pass/fail) before you trust the output: if it plateaued short of your target, that's a quality gap LoopGain can't see, and a false stop that forces a rerun is the one way it eats into the savings. LoopGain decides *when to stop*; you decide *whether the answer is good enough*.
+- **Savings depend on your workload.** Loops that usually succeed fast save the most (~96%); adversarial, failure-prone loops save less (~78–84%). The headline is a blend — run the benchmark on your own loops before quoting a number.
+- **LoopGain detects convergence, not correctness.** It stops when your error signal stops improving — which means more iterations won't help, *not* that the loop succeeded. On the benchmark this preserved quality (it rarely stopped early on a worse output; false-stop rate ≤4.5%), but a loop can stall with the error still above zero — a plateau at, say, 2 failing tests. So check `result.best_error` (or your own pass/fail) before you trust the output: if it plateaued short of your target, that's a quality gap LoopGain can't see, and a false stop that forces a rerun is the one way it eats into the savings. LoopGain decides *when to stop*; you decide *whether the answer is good enough*.
 ---
@@ -175,7 +165,7 @@ Current state name. One of `INIT`, `FAST_CONVERGE`, `CONVERGING`, `STALLING`, `O
 ### `lg.eta -> int | None`
-Predicted iterations to reach target. `None` when not well-defined.
+Best-effort closed-form estimate of iterations remaining, exposed for instrumentation. Returns `None` whenever it isn't well-defined — which is most of the time on real, jump-dominated loops, so don't depend on it for control.
 ### `lg.gain_margin -> float | None`
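
The best-so-far rollback described in the README ("returns `argmin(error)`, not the last iteration") can be illustrated with a small buffer. This is a sketch under assumptions: the class name `BestSoFarBuffer` and its methods are hypothetical and are not the LoopGain API.

```python
from typing import Any, List, Optional, Tuple


class BestSoFarBuffer:
    """Hypothetical buffer pairing each observed output with its error."""

    def __init__(self) -> None:
        self._entries: List[Tuple[float, Any]] = []

    def observe(self, output: Any, error: float) -> None:
        # Record every iteration; nothing is discarded before termination.
        self._entries.append((error, output))

    def best(self) -> Optional[Tuple[int, float, Any]]:
        """Return (index, error, output) of the lowest-error iteration."""
        if not self._entries:
            return None
        # min() keeps the first minimum, so ties resolve to the earliest one.
        index = min(range(len(self._entries)),
                    key=lambda i: self._entries[i][0])
        error, output = self._entries[index]
        return index, error, output


# A loop that improves, then degrades: the rollback picks iteration 1.
buf = BestSoFarBuffer()
for out, err in [("draft-0", 5.0), ("draft-1", 2.0),
                 ("draft-2", 3.0), ("draft-3", 6.0)]:
    buf.observe(out, err)
best_index, best_error, best_output = buf.best()
```

Returning the minimum rather than the last element is what lets a diverging loop still hand back its best result.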

{loopgain-0.4.0 → loopgain-0.4.2}/loopgain/_version.py RENAMED

@@ -7,4 +7,4 @@ from here so the value never drifts between ``__version__`` and the
 ``pyproject.toml``) for each release.
 """
-__version__ = "0.4.0"
+__version__ = "0.4.2"

{loopgain-0.4.0 → loopgain-0.4.2}/loopgain/classifier.py RENAMED

@@ -184,9 +184,14 @@ def _two_sided_t_p(t_abs: float, df: int) -> float:
         # exact: cdf_t(t,1) = 0.5 + arctan(t)/pi
         return 2.0 * (0.5 - math.atan(t_abs) / math.pi)
     if df == 2:
-        # exact one-sided survival: 1 - (1 + t²/2)^(-1) doubled
-        return min(1.0, 2.0 * (1.0 - t_abs / math.sqrt(2.0 + t_abs * t_abs) / 1.0) * 0.5
-                   + 2.0 * (0.5 - 0.5 * t_abs / math.sqrt(2.0 + t_abs * t_abs)))
+        # Exact two-sided p-value for Student-t with df=2. The df=2 CDF is
+        # F(t) = 1/2 + t / (2·√(2 + t²)), so the one-sided survival is
+        # P(T > t) = 1/2 − t / (2·√(2 + t²)) and the two-sided p is
+        #   2·P(T > |t|) = 1 − |t| / √(2 + t²).
+        # (The previous implementation returned twice this — it required
+        # |t| > 6.21 for p<0.05 instead of the correct |t| > 4.30, making
+        # the n=4 classifier far too conservative. See test_classifier.)
+        return max(0.0, 1.0 - t_abs / math.sqrt(2.0 + t_abs * t_abs))
     # Wilson-Hilferty: transform t² ~ F(1, df), then F → chi-square via
     # cube-root approximation. For our purposes the simpler normal-approx
     # to the t with the Hill / Abramowitz adjustment is enough.
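
The effect of the df=2 fix can be checked numerically. The removed expression simplifies to `2·(1 − |t|/√(2 + t²))`, twice the true two-sided p-value. The sketch below inverts both forms to recover the critical |t| at a given significance level; the function names are illustrative and do not appear in the package.

```python
import math


def two_sided_p_df2(t_abs: float) -> float:
    """Exact two-sided p-value for Student-t with df=2."""
    return max(0.0, 1.0 - t_abs / math.sqrt(2.0 + t_abs * t_abs))


def doubled_p_df2(t_abs: float) -> float:
    """The pre-fix behaviour: the same quantity, doubled and capped at 1."""
    return min(1.0, 2.0 * (1.0 - t_abs / math.sqrt(2.0 + t_abs * t_abs)))


def critical_t_df2(alpha: float, doubled: bool = False) -> float:
    """Solve p(t) = alpha for t using the closed form.

    With u = t / sqrt(2 + t^2), the correct form gives u = 1 - alpha and
    the doubled form gives u = 1 - alpha / 2; then t = u * sqrt(2 / (1 - u^2)).
    """
    u = 1.0 - (alpha / 2.0 if doubled else alpha)
    return u * math.sqrt(2.0 / (1.0 - u * u))


correct_crit = critical_t_df2(0.05)               # about 4.30
buggy_crit = critical_t_df2(0.05, doubled=True)   # about 6.21
```

The two critical values match the figures quoted in the new code comment: the fix lowers the bar for significance at n=4 from |t| > 6.21 to |t| > 4.30.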

{loopgain-0.4.0 → loopgain-0.4.2}/loopgain/telemetry.py RENAMED

@@ -178,6 +178,11 @@ def build_payload(
             "savings_vs_fixed_cap": result.savings_vs_fixed_cap,
             "convergence_profile_summary": profile_summary,
             "rollback_triggered": result.outcome in ("oscillating", "diverged"),
+            # Index (0-based) of the lowest-error iteration. Lets the receiver
+            # derive iterations-to-best (best_index+1) and iterations-past-best
+            # (iterations_used-1-best_index) — the "Iteration Waste" view.
+            # Privacy-safe: an integer position, no output/error content.
+            "best_index": result.best_index,
             # v2: first computable eta snapshot, for ETA calibration dashboard.
             # Predicted total iterations = first_eta_at_iteration +
             # first_eta_prediction; compare to iterations_used to compute the
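
On the receiving side, the two quantities named in the new `best_index` comment follow directly from the payload. The helper below is a hypothetical sketch of that arithmetic; the function name and the returned dictionary keys are assumptions, while `best_index` and `iterations_used` are the field names used in the diff.

```python
from typing import Dict


def iteration_waste(best_index: int, iterations_used: int) -> Dict[str, int]:
    """Derive the "Iteration Waste" view from a 0-based best_index."""
    if not 0 <= best_index < iterations_used:
        raise ValueError("best_index must lie within the iterations that ran")
    return {
        # Iterations needed to reach the lowest-error output.
        "iterations_to_best": best_index + 1,
        # Iterations that ran after the best output was already in hand.
        "iterations_past_best": iterations_used - 1 - best_index,
    }


# Best output at 0-based index 2 of a 7-iteration run:
# 3 iterations to reach it, 4 spent past it.
waste = iteration_waste(best_index=2, iterations_used=7)
```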

{loopgain-0.4.0 → loopgain-0.4.2}/loopgain.egg-info/PKG-INFO RENAMED

@@ -1,7 +1,7 @@
 Metadata-Version: 2.4
 Name: loopgain
-Version: 0.4.0
-Summary: Barkhausen stability monitor for AI agent loops. Real-time loop-gain (Aβ) monitoring with five named threshold bands, best-so-far rollback, and ETA prediction.
+Version: 0.4.2
+Summary: An open-source cost controller for AI agent loops. Stops a loop when it has actually converged and rolls back before it degrades — replacing the max_iterations guess with a real-time loop-gain (Aβ) monitor with five named threshold bands and best-so-far rollback.
 Author-email: Dave Fitzsimmons <hello@loopgain.ai>
 License: Apache-2.0
 Project-URL: Homepage, https://loopgain.ai
@@ -49,14 +49,16 @@ Dynamic: license-file
 # LoopGain
-**Barkhausen stability monitor for AI agent loops.**
+**An open-source cost controller for AI agent loops.**
-Replace `max_iterations=5` with a real-time trajectory classifier that reads four features off the loop's error series and routes it into one of five named states — knowing whether your agent loop is converging, stalling, oscillating, or diverging, and what to do in each case.
+AI agent loops waste time and money when they don't know when to stop. LoopGain measures the loop in real time and stops it the moment it has actually converged — and rolls back before it degrades — instead of running to a fixed `max_iterations` cap.
+> **Across 2,000 paired trials over 10 cells**, LoopGain reduced total API spend by **92.8%** vs `max_iter=20`, dropped median wall-clock latency from 30.9s to 2.1s (**~15×**), preserved output quality on natural-distribution workloads (W1–W4: judge winrate 0.50–0.63, CI excluding null on most cells), and improved output quality on engineered-failure workloads (W5: winrate 0.92–0.95 across three adapters). Weighted-average pairwise preference for LG vs B20 across 1,800 judge comparisons: **0.678**. Zero of six kill criteria fired.
 [![PyPI](https://img.shields.io/pypi/v/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![Python](https://img.shields.io/pypi/pyversions/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![License](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](LICENSE)
-[![Tests](https://img.shields.io/badge/tests-157_passing-brightgreen.svg)](tests/)
+[![Tests](https://img.shields.io/badge/tests-200%2B_passing-brightgreen.svg)](tests/)
 **Home:** [loopgain.ai](https://loopgain.ai)
@@ -68,7 +70,7 @@ Works for **any iterative AI workflow with a measurable error signal** — verif
 ## Why
-Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stability monitor based on the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
+Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stop-and-rollback policy grounded in the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
 ---
@@ -145,7 +147,7 @@ It routes the trajectory into one of five named states:
 | State | Condition | Action |
 | --- | --- | --- |
-| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue, predict ETA |
+| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue |
 | `CONVERGING` | negative slope with `p < 0.05`, OR cumulative ≤ 50% | Continue, watch for upward drift |
 | `STALLING` | no significant slope, no detectable oscillation | Stop after 2 consecutive readings — return best-so-far |
 | `OSCILLATING` | high residual variance with flat trend | Stop — return best-so-far |
@@ -161,18 +163,6 @@ The decision is **conservative by design**: requiring both statistical significa
 ---
-## ETA prediction
-When the loop is converging (`Aβ_smooth < 1`), LoopGain produces a closed-form prediction of iterations remaining:
-```
-n_remaining = log(E_target / E_current) / log(Aβ_smooth)
-```
-Available as `lg.eta` mid-loop. Returns `None` when the prediction isn't well-defined (no Aβ yet, target zero, or non-converging gain).
----
 ## Best-so-far rollback
 LoopGain keeps a buffer of all observed outputs paired with their error scores. On termination it returns `argmin(error)`, not the last iteration:
@@ -189,10 +179,10 @@ This transforms divergence detection from "abort with garbage" into "abort with
 ## What LoopGain does and doesn't guarantee
-LoopGain saves money by stopping a loop once it stops improving — fewer iterations, fewer tokens. In our [public benchmark](https://github.com/loopgain-ai/loopgain-bench), that was a **93.5% median cut in API spend** vs `max_iterations=20`, with output quality preserved. Two honest limits:
+LoopGain saves money by stopping a loop once it stops improving — fewer iterations, fewer tokens. In our [public benchmark](https://github.com/loopgain-ai/loopgain-bench), that was a **92.8% median cut in API spend** vs `max_iterations=20`, with output quality preserved. Two honest limits:
-- **Savings depend on your workload.** Loops that usually succeed fast save the most (~96%); adversarial, failure-prone loops save less (~84%). The headline is a blend — run the benchmark on your own loops before quoting a number.
-- **LoopGain detects convergence, not correctness.** It stops when your error signal stops improving — which means more iterations won't help, *not* that the loop succeeded. On the benchmark this preserved quality (it rarely stopped early on a worse output; false-stop rate ≤3.5%), but a loop can stall with the error still above zero — a plateau at, say, 2 failing tests. So check `result.best_error` (or your own pass/fail) before you trust the output: if it plateaued short of your target, that's a quality gap LoopGain can't see, and a false stop that forces a rerun is the one way it eats into the savings. LoopGain decides *when to stop*; you decide *whether the answer is good enough*.
+- **Savings depend on your workload.** Loops that usually succeed fast save the most (~96%); adversarial, failure-prone loops save less (~78–84%). The headline is a blend — run the benchmark on your own loops before quoting a number.
+- **LoopGain detects convergence, not correctness.** It stops when your error signal stops improving — which means more iterations won't help, *not* that the loop succeeded. On the benchmark this preserved quality (it rarely stopped early on a worse output; false-stop rate ≤4.5%), but a loop can stall with the error still above zero — a plateau at, say, 2 failing tests. So check `result.best_error` (or your own pass/fail) before you trust the output: if it plateaued short of your target, that's a quality gap LoopGain can't see, and a false stop that forces a rerun is the one way it eats into the savings. LoopGain decides *when to stop*; you decide *whether the answer is good enough*.
 ---
@@ -224,7 +214,7 @@ Current state name. One of `INIT`, `FAST_CONVERGE`, `CONVERGING`, `STALLING`, `O
 ### `lg.eta -> int | None`
-Predicted iterations to reach target. `None` when not well-defined.
+Best-effort closed-form estimate of iterations remaining, exposed for instrumentation. Returns `None` whenever it isn't well-defined — which is most of the time on real, jump-dominated loops, so don't depend on it for control.
 ### `lg.gain_margin -> float | None`

{loopgain-0.4.0 → loopgain-0.4.2}/pyproject.toml RENAMED

@@ -4,8 +4,10 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "loopgain"
-version = "0.4.0"
-description = "Barkhausen stability monitor for AI agent loops. Real-time loop-gain (Aβ) monitoring with five named threshold bands, best-so-far rollback, and ETA prediction."
+# Single source of truth: loopgain/_version.py (read dynamically below).
+# Bump the version in that one file per release; this no longer duplicates it.
+dynamic = ["version"]
+description = "An open-source cost controller for AI agent loops. Stops a loop when it has actually converged and rolls back before it degrades — replacing the max_iterations guess with a real-time loop-gain (Aβ) monitor with five named threshold bands and best-so-far rollback."
 authors = [{name = "Dave Fitzsimmons", email = "hello@loopgain.ai"}]
 readme = "README.md"
 license = {text = "Apache-2.0"}
@@ -100,6 +102,11 @@ all = [
 # zero-dep. Install with `pip install 'loopgain[examples]'`.
 examples = ["anthropic>=0.40.0"]
+[tool.setuptools.dynamic]
+# Reads the literal ``__version__ = "x.y.z"`` from loopgain/_version.py via AST
+# (no import), so pyproject.toml never duplicates the version string.
+version = {attr = "loopgain._version.__version__"}
 [tool.setuptools.packages.find]
 where = ["."]
 include = ["loopgain*"]
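
The new `[tool.setuptools.dynamic]` comment says the version is read "via AST (no import)". The idea can be sketched as follows. This is not setuptools' own code, only an illustration of statically extracting a literal `__version__` assignment; `read_version` is a hypothetical name.

```python
import ast
from typing import Optional


def read_version(source: str) -> Optional[str]:
    """Find a top-level ``__version__ = "x.y.z"`` without executing the module."""
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "__version__":
                # literal_eval only accepts literals, so no code is run.
                value = ast.literal_eval(node.value)
                return value if isinstance(value, str) else None
    return None


version_source = '"""Version module."""\n__version__ = "0.4.2"\n'
found = read_version(version_source)
```

Because the module is parsed rather than imported, the version can be resolved at build time even when the package's runtime dependencies are not installed.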

{loopgain-0.4.0 → loopgain-0.4.2}/tests/test_classifier_mock_validation.py RENAMED

@@ -223,11 +223,13 @@ def test_loop_length_robustness():
     - n=8 (df=6): ≥ 90% (the default real-loop length)
     - n=12 (df=10): ≥ 95%
     """
-    # n=4 is intentionally excluded: with df=2 the t-test requires |t|>4.3
-    # for p<0.05, which is a fundamental statistical-power floor. The
-    # classifier correctly falls back to STALLING (insufficient evidence)
-    # for most convergent trajectories at n=4. Documented as a
-    # min-recommended-iterations limit, not a bug.
+    # n=4 is intentionally excluded from the high-accuracy thresholds below:
+    # with df=2 the t-test correctly requires |t|>4.30 for p<0.05 (see
+    # test_two_sided_t_p_df2_exact), a fundamental statistical-power floor at
+    # this length. The classifier falls back to cumulative E_ratio when the
+    # slope test is underpowered. This is a min-recommended-iterations limit,
+    # not a bug. (Historically the df=2 p-value was computed at 2x its true
+    # value, requiring |t|>6.21 and worsening this floor — now fixed.)
     LEN_THRESHOLDS = {6: 0.80, 8: 0.90, 12: 0.95}
     for n, threshold in LEN_THRESHOLDS.items():
         for gen, expected in [

{loopgain-0.4.0 → loopgain-0.4.2}/tests/test_classifier_synthetic.py RENAMED

@@ -33,7 +33,42 @@ from loopgain import (
     classify_trajectory,
     extract_features,
 )
-from loopgain.classifier import _ols_slope_and_p
+from loopgain.classifier import _ols_slope_and_p, _two_sided_t_p
+# ----- Two-sided t p-value closed forms -----
+def test_two_sided_t_p_df1_exact():
+    """df=1 is the Cauchy distribution: two-sided p = 1 - 2·atan(t)/pi."""
+    for t in (0.0, 0.5, 1.0, 2.0, 5.0, 12.706):
+        expected = 1.0 - 2.0 * math.atan(t) / math.pi
+        assert _two_sided_t_p(t, 1) == pytest.approx(expected, abs=1e-9)
+    # t=1 is the median of |T| for df=1 → two-sided p = 0.5.
+    assert _two_sided_t_p(1.0, 1) == pytest.approx(0.5, abs=1e-9)
+def test_two_sided_t_p_df2_exact():
+    """df=2 closed form: two-sided p = 1 - |t|/sqrt(2 + t^2).
+    Regression guard for the doubled-p bug: the critical value for p=0.05
+    at df=2 is t=4.302653. The previous implementation returned ~0.10 here
+    (2x too large), which forced |t|>6.21 for significance and made the n=4
+    classifier far too conservative.
+    """
+    for t in (0.0, 0.5, 1.0, 2.0, 5.0):
+        expected = 1.0 - t / math.sqrt(2.0 + t * t)
+        assert _two_sided_t_p(t, 2) == pytest.approx(expected, abs=1e-9)
+    # The exact 5% two-sided critical value for df=2.
+    assert _two_sided_t_p(4.302653, 2) == pytest.approx(0.05, abs=1e-4)
+    # p is a probability: monotone non-increasing in t, bounded to [0, 1].
+    assert _two_sided_t_p(0.0, 2) == pytest.approx(1.0, abs=1e-9)
+    prev = 1.1
+    for t in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 50.0):
+        p = _two_sided_t_p(t, 2)
+        assert 0.0 <= p <= 1.0
+        assert p <= prev + 1e-12
+        prev = p
 # ----- OLS slope / p-value building blocks -----