PyPI - loopgain - Versions diffs - 0.3.0__tar.gz → 0.4.1__tar.gz - Mend

loopgain 0.3.0.tar.gz → 0.4.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

{loopgain-0.3.0 → loopgain-0.4.1}/PKG-INFO RENAMED

@@ -1,7 +1,7 @@
 Metadata-Version: 2.4
 Name: loopgain
-Version: 0.3.0
-Summary: Barkhausen stability monitor for AI agent loops. Real-time loop-gain (Aβ) monitoring with five named threshold bands, best-so-far rollback, and ETA prediction.
+Version: 0.4.1
+Summary: An open-source cost controller for AI agent loops. Stops a loop when it has actually converged and rolls back before it degrades — replacing the max_iterations guess with a real-time loop-gain (Aβ) monitor with five named threshold bands and best-so-far rollback.
 Author-email: Dave Fitzsimmons <hello@loopgain.ai>
 License: Apache-2.0
 Project-URL: Homepage, https://loopgain.ai
@@ -49,14 +49,16 @@ Dynamic: license-file
 # LoopGain
-**Barkhausen stability monitor for AI agent loops.**
+**An open-source cost controller for AI agent loops.**
-Replace `max_iterations=5` with a real-time trajectory classifier that reads four features off the loop's error series and routes it into one of five named states — knowing whether your agent loop is converging, stalling, oscillating, or diverging, and what to do in each case.
+AI agent loops waste time and money when they don't know when to stop. LoopGain measures the loop in real time and stops it the moment it has actually converged — and rolls back before it degrades — instead of running to a fixed `max_iterations` cap.
+> **Across 2,000 paired trials over 10 cells**, LoopGain reduced total API spend by **92.8%** vs `max_iter=20`, dropped median wall-clock latency from 30.9s to 2.1s (**~15×**), preserved output quality on natural-distribution workloads (W1–W4: judge winrate 0.50–0.63, CI excluding null on most cells), and improved output quality on engineered-failure workloads (W5: winrate 0.92–0.95 across three adapters). Weighted-average pairwise preference for LG vs B20 across 1,800 judge comparisons: **0.678**. Zero of six kill criteria fired.
 [![PyPI](https://img.shields.io/pypi/v/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![Python](https://img.shields.io/pypi/pyversions/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![License](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](LICENSE)
-[![Tests](https://img.shields.io/badge/tests-157_passing-brightgreen.svg)](tests/)
+[![Tests](https://img.shields.io/badge/tests-202_passing-brightgreen.svg)](tests/)
 **Home:** [loopgain.ai](https://loopgain.ai)
@@ -68,7 +70,7 @@ Works for **any iterative AI workflow with a measurable error signal** — verif
 ## Why
-Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stability monitor based on the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
+Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stop-and-rollback policy grounded in the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
 ---
@@ -108,6 +110,28 @@ print(result.savings_vs_fixed_cap)
 ---
+## Defining your error signal
+The one thing you provide is the **error signal**: a single non-negative number, every iteration, that says how wrong the current output is. **Lower is better; zero means done.** LoopGain doesn't know what your loop does — it just watches that number's trajectory and decides whether to keep going, stop, or roll back.
+Your loop already has some way of knowing the output isn't good yet (or it wouldn't keep revising). Turn that into a number:
+| Loop | Error signal = |
+| --- | --- |
+| Agentic coding (write code → run tests) | number of **failing tests** (10 → 3 → 0) |
+| JSON / structured extraction | number of **schema violations** |
+| RAG with self-correction | number of **required facts still missing** |
+| Self-refinement with an LLM judge | judge's **gap to target** (e.g. `10 − quality_score`) |
+| Lint / format loop | **lint error count** |
+The only rules: non-negative, and **smaller as the output gets better**. Returning the raw list of problems works directly — `observe()` uses its length as the magnitude (e.g. hand it the list of failing tests).
+If your quality is fuzzy and has no natural "zero," run with `target_error=None`: LoopGain then stops when the number **stops improving**, wherever that plateau is, instead of waiting for an exact target.
+Every stop/continue decision is made from this one number, so **LoopGain is only as good as the error signal you give it** — pick one that genuinely tracks output quality.
+---
 ## How it works
 LoopGain measures empirical loop gain (`Aβ = E(n) / E(n-1)`) at every iteration and exposes it as a smoothed time series for visualization. The decision engine, however, classifies the **full error trajectory** using four features:
@@ -123,7 +147,7 @@ It routes the trajectory into one of five named states:
 | State | Condition | Action |
 | --- | --- | --- |
-| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue, predict ETA |
+| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue |
 | `CONVERGING` | negative slope with `p < 0.05`, OR cumulative ≤ 50% | Continue, watch for upward drift |
 | `STALLING` | no significant slope, no detectable oscillation | Stop after 2 consecutive readings — return best-so-far |
 | `OSCILLATING` | high residual variance with flat trend | Stop — return best-so-far |
@@ -139,18 +163,6 @@ The decision is **conservative by design**: requiring both statistical significa
 ---
-## ETA prediction
-When the loop is converging (`Aβ_smooth < 1`), LoopGain produces a closed-form prediction of iterations remaining:
-```
-n_remaining = log(E_target / E_current) / log(Aβ_smooth)
-```
-Available as `lg.eta` mid-loop. Returns `None` when the prediction isn't well-defined (no Aβ yet, target zero, or non-converging gain).
----
 ## Best-so-far rollback
 LoopGain keeps a buffer of all observed outputs paired with their error scores. On termination it returns `argmin(error)`, not the last iteration:
@@ -165,14 +177,23 @@ This transforms divergence detection from "abort with garbage" into "abort with
 ---
+## What LoopGain does and doesn't guarantee
+LoopGain saves money by stopping a loop once it stops improving — fewer iterations, fewer tokens. In our [public benchmark](https://github.com/loopgain-ai/loopgain-bench), that was a **92.8% median cut in API spend** vs `max_iterations=20`, with output quality preserved. Two honest limits:
+- **Savings depend on your workload.** Loops that usually succeed fast save the most (~96%); adversarial, failure-prone loops save less (~78–84%). The headline is a blend — run the benchmark on your own loops before quoting a number.
+- **LoopGain detects convergence, not correctness.** It stops when your error signal stops improving — which means more iterations won't help, *not* that the loop succeeded. On the benchmark this preserved quality (it rarely stopped early on a worse output; false-stop rate ≤4.5%), but a loop can stall with the error still above zero — a plateau at, say, 2 failing tests. So check `result.best_error` (or your own pass/fail) before you trust the output: if it plateaued short of your target, that's a quality gap LoopGain can't see, and a false stop that forces a rerun is the one way it eats into the savings. LoopGain decides *when to stop*; you decide *whether the answer is good enough*.
+---
 ## API reference
-### `LoopGain(target_error=0.0, max_iterations=None, thresholds=None, trajectory_thresholds=None, classifier='trajectory', smoothing_window=3, assumed_fixed_cap=10)`
+### `LoopGain(target_error=0.0, max_iterations=50, thresholds=None, trajectory_thresholds=None, classifier='trajectory', smoothing_window=3, assumed_fixed_cap=10)`
 Construct the monitor.
 - `target_error` — Stop when an observed error drops at or below this. Default `0.0` short-circuits on exactly zero error (the natural completion signal for verifier-driven loops). Pass `None` to disable the short-circuit entirely.
-- `max_iterations` — Hard safety cap. Default `None` (rely on stability detection). Recommended ~20–50 for production.
+- `max_iterations` — Hard safety backstop. Default `50` so the loop can never run unbounded; a stability verdict normally terminates it well before this. Pass `None` to opt into a fully unbounded loop (only safe if your loop is guaranteed to reach `target_error` or a stop-state), or a smaller integer to cap tighter.
 - `thresholds` — Custom `ThresholdBands` for the legacy single-Aβ-band classifier. Ignored when `classifier='trajectory'`.
 - `trajectory_thresholds` — Custom `TrajectoryThresholds` for the multi-feature classifier (the default). Override only with workload-specific evidence.
 - `classifier` — `'trajectory'` (default, v0.2 multi-feature classifier) or `'legacy_bands'` (v0.1 single-Aβ-band classifier).
@@ -193,7 +214,7 @@ Current state name. One of `INIT`, `FAST_CONVERGE`, `CONVERGING`, `STALLING`, `O
 ### `lg.eta -> int | None`
-Predicted iterations to reach target. `None` when not well-defined.
+Best-effort closed-form estimate of iterations remaining, exposed for instrumentation. Returns `None` whenever it isn't well-defined — which is most of the time on real, jump-dominated loops, so don't depend on it for control.
 ### `lg.gain_margin -> float | None`
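
To make the "Defining your error signal" hunk above concrete, here is a minimal sketch of turning verifier output into the single non-negative number the README asks for. The helper names `error_magnitude` and `judge_gap` are hypothetical and are not part of loopgain. The list-length rule only mirrors what the README says `observe()` does with a raw list of problems.

```python
from typing import Sequence, Union


def error_magnitude(signal: Union[int, float, Sequence[object]]) -> float:
    """Reduce a verifier result to one non-negative number (lower is better).

    Hypothetical helper: a number is used as-is, and a list of problems
    (failing tests, schema violations, lint errors) counts by its length.
    """
    if isinstance(signal, (int, float)):
        value = float(signal)
    else:
        value = float(len(signal))
    if value < 0:
        raise ValueError("error signal must be non-negative")
    return value


def judge_gap(quality_score: float, target: float = 10.0) -> float:
    """Gap to target for an LLM-judge score, clamped at zero."""
    return max(0.0, target - quality_score)


if __name__ == "__main__":
    print(error_magnitude(["test_parse", "test_roundtrip"]))  # two failing tests
    print(judge_gap(7.5))  # judge scored 7.5 out of 10
```

Clamping the judge gap at zero keeps the signal non-negative even when a score overshoots the target, which matches the "zero means done" rule stated in the README.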

{loopgain-0.3.0 → loopgain-0.4.1}/README.md RENAMED

@@ -1,13 +1,15 @@
 # LoopGain
-**Barkhausen stability monitor for AI agent loops.**
+**An open-source cost controller for AI agent loops.**
-Replace `max_iterations=5` with a real-time trajectory classifier that reads four features off the loop's error series and routes it into one of five named states — knowing whether your agent loop is converging, stalling, oscillating, or diverging, and what to do in each case.
+AI agent loops waste time and money when they don't know when to stop. LoopGain measures the loop in real time and stops it the moment it has actually converged — and rolls back before it degrades — instead of running to a fixed `max_iterations` cap.
+> **Across 2,000 paired trials over 10 cells**, LoopGain reduced total API spend by **92.8%** vs `max_iter=20`, dropped median wall-clock latency from 30.9s to 2.1s (**~15×**), preserved output quality on natural-distribution workloads (W1–W4: judge winrate 0.50–0.63, CI excluding null on most cells), and improved output quality on engineered-failure workloads (W5: winrate 0.92–0.95 across three adapters). Weighted-average pairwise preference for LG vs B20 across 1,800 judge comparisons: **0.678**. Zero of six kill criteria fired.
 [![PyPI](https://img.shields.io/pypi/v/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![Python](https://img.shields.io/pypi/pyversions/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![License](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](LICENSE)
-[![Tests](https://img.shields.io/badge/tests-157_passing-brightgreen.svg)](tests/)
+[![Tests](https://img.shields.io/badge/tests-202_passing-brightgreen.svg)](tests/)
 **Home:** [loopgain.ai](https://loopgain.ai)
@@ -19,7 +21,7 @@ Works for **any iterative AI workflow with a measurable error signal** — verif
 ## Why
-Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stability monitor based on the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
+Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stop-and-rollback policy grounded in the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
 ---
@@ -59,6 +61,28 @@ print(result.savings_vs_fixed_cap)
 ---
+## Defining your error signal
+The one thing you provide is the **error signal**: a single non-negative number, every iteration, that says how wrong the current output is. **Lower is better; zero means done.** LoopGain doesn't know what your loop does — it just watches that number's trajectory and decides whether to keep going, stop, or roll back.
+Your loop already has some way of knowing the output isn't good yet (or it wouldn't keep revising). Turn that into a number:
+| Loop | Error signal = |
+| --- | --- |
+| Agentic coding (write code → run tests) | number of **failing tests** (10 → 3 → 0) |
+| JSON / structured extraction | number of **schema violations** |
+| RAG with self-correction | number of **required facts still missing** |
+| Self-refinement with an LLM judge | judge's **gap to target** (e.g. `10 − quality_score`) |
+| Lint / format loop | **lint error count** |
+The only rules: non-negative, and **smaller as the output gets better**. Returning the raw list of problems works directly — `observe()` uses its length as the magnitude (e.g. hand it the list of failing tests).
+If your quality is fuzzy and has no natural "zero," run with `target_error=None`: LoopGain then stops when the number **stops improving**, wherever that plateau is, instead of waiting for an exact target.
+Every stop/continue decision is made from this one number, so **LoopGain is only as good as the error signal you give it** — pick one that genuinely tracks output quality.
+---
 ## How it works
 LoopGain measures empirical loop gain (`Aβ = E(n) / E(n-1)`) at every iteration and exposes it as a smoothed time series for visualization. The decision engine, however, classifies the **full error trajectory** using four features:
@@ -74,7 +98,7 @@ It routes the trajectory into one of five named states:
 | State | Condition | Action |
 | --- | --- | --- |
-| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue, predict ETA |
+| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue |
 | `CONVERGING` | negative slope with `p < 0.05`, OR cumulative ≤ 50% | Continue, watch for upward drift |
 | `STALLING` | no significant slope, no detectable oscillation | Stop after 2 consecutive readings — return best-so-far |
 | `OSCILLATING` | high residual variance with flat trend | Stop — return best-so-far |
@@ -90,18 +114,6 @@ The decision is **conservative by design**: requiring both statistical significa
 ---
-## ETA prediction
-When the loop is converging (`Aβ_smooth < 1`), LoopGain produces a closed-form prediction of iterations remaining:
-```
-n_remaining = log(E_target / E_current) / log(Aβ_smooth)
-```
-Available as `lg.eta` mid-loop. Returns `None` when the prediction isn't well-defined (no Aβ yet, target zero, or non-converging gain).
----
 ## Best-so-far rollback
 LoopGain keeps a buffer of all observed outputs paired with their error scores. On termination it returns `argmin(error)`, not the last iteration:
@@ -116,14 +128,23 @@ This transforms divergence detection from "abort with garbage" into "abort with
 ---
+## What LoopGain does and doesn't guarantee
+LoopGain saves money by stopping a loop once it stops improving — fewer iterations, fewer tokens. In our [public benchmark](https://github.com/loopgain-ai/loopgain-bench), that was a **92.8% median cut in API spend** vs `max_iterations=20`, with output quality preserved. Two honest limits:
+- **Savings depend on your workload.** Loops that usually succeed fast save the most (~96%); adversarial, failure-prone loops save less (~78–84%). The headline is a blend — run the benchmark on your own loops before quoting a number.
+- **LoopGain detects convergence, not correctness.** It stops when your error signal stops improving — which means more iterations won't help, *not* that the loop succeeded. On the benchmark this preserved quality (it rarely stopped early on a worse output; false-stop rate ≤4.5%), but a loop can stall with the error still above zero — a plateau at, say, 2 failing tests. So check `result.best_error` (or your own pass/fail) before you trust the output: if it plateaued short of your target, that's a quality gap LoopGain can't see, and a false stop that forces a rerun is the one way it eats into the savings. LoopGain decides *when to stop*; you decide *whether the answer is good enough*.
+---
 ## API reference
-### `LoopGain(target_error=0.0, max_iterations=None, thresholds=None, trajectory_thresholds=None, classifier='trajectory', smoothing_window=3, assumed_fixed_cap=10)`
+### `LoopGain(target_error=0.0, max_iterations=50, thresholds=None, trajectory_thresholds=None, classifier='trajectory', smoothing_window=3, assumed_fixed_cap=10)`
 Construct the monitor.
 - `target_error` — Stop when an observed error drops at or below this. Default `0.0` short-circuits on exactly zero error (the natural completion signal for verifier-driven loops). Pass `None` to disable the short-circuit entirely.
-- `max_iterations` — Hard safety cap. Default `None` (rely on stability detection). Recommended ~20–50 for production.
+- `max_iterations` — Hard safety backstop. Default `50` so the loop can never run unbounded; a stability verdict normally terminates it well before this. Pass `None` to opt into a fully unbounded loop (only safe if your loop is guaranteed to reach `target_error` or a stop-state), or a smaller integer to cap tighter.
 - `thresholds` — Custom `ThresholdBands` for the legacy single-Aβ-band classifier. Ignored when `classifier='trajectory'`.
 - `trajectory_thresholds` — Custom `TrajectoryThresholds` for the multi-feature classifier (the default). Override only with workload-specific evidence.
 - `classifier` — `'trajectory'` (default, v0.2 multi-feature classifier) or `'legacy_bands'` (v0.1 single-Aβ-band classifier).
@@ -144,7 +165,7 @@ Current state name. One of `INIT`, `FAST_CONVERGE`, `CONVERGING`, `STALLING`, `O
 ### `lg.eta -> int | None`
-Predicted iterations to reach target. `None` when not well-defined.
+Best-effort closed-form estimate of iterations remaining, exposed for instrumentation. Returns `None` whenever it isn't well-defined — which is most of the time on real, jump-dominated loops, so don't depend on it for control.
 ### `lg.gain_margin -> float | None`
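
The README hunks above define the loop gain as `Aβ = E(n) / E(n-1)`, and the removed "ETA prediction" section gives `n_remaining = log(E_target / E_current) / log(Aβ_smooth)`. The sketch below works both formulas through on a short error series. It is an illustration, not the library's code: the function names are hypothetical, the smoothing is assumed to be a plain mean over the last `window` ratios (the diff does not say how `Aβ_smooth` is computed), and the estimate is returned as a float, whereas `lg.eta` is documented as `int | None`.

```python
import math
from typing import List, Optional, Sequence


def loop_gains(errors: Sequence[float]) -> List[float]:
    """Per-iteration gain E(n) / E(n-1); pairs with a zero predecessor are skipped."""
    return [cur / prev for prev, cur in zip(errors, errors[1:]) if prev > 0]


def smoothed_gain(errors: Sequence[float], window: int = 3) -> Optional[float]:
    """Assumed smoothing: arithmetic mean of the last `window` gains."""
    gains = loop_gains(errors)
    if not gains:
        return None
    recent = gains[-window:]
    return sum(recent) / len(recent)


def eta(errors: Sequence[float], target: float, window: int = 3) -> Optional[float]:
    """Closed-form iterations remaining, or None when it is not well-defined."""
    gain = smoothed_gain(errors, window)
    current = errors[-1]
    # Undefined with no gain yet, a zero target or current error,
    # or a gain outside (0, 1), i.e. a loop that is not converging.
    if gain is None or target <= 0 or current <= 0 or not (0 < gain < 1):
        return None
    if current <= target:
        return 0.0
    return math.log(target / current) / math.log(gain)


if __name__ == "__main__":
    history = [8.0, 4.0, 2.0]          # error halves every iteration
    print(loop_gains(history))         # both ratios are 0.5
    print(eta(history, target=0.25))   # about 3 more halvings: 2 -> 1 -> 0.5 -> 0.25
    print(eta(history, target=0.0))    # None: a zero target has no finite estimate
```

The `None` for a zero target shows why the 0.4.1 README stops advertising the estimate: with the default `target_error=0.0` the formula has no finite answer.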

{loopgain-0.3.0 → loopgain-0.4.1}/loopgain/_version.py RENAMED

@@ -7,4 +7,4 @@ from here so the value never drifts between ``__version__`` and the
 ``pyproject.toml``) for each release.
 """
-__version__ = "0.3.0"
+__version__ = "0.4.1"

{loopgain-0.3.0 → loopgain-0.4.1}/loopgain/classifier.py RENAMED

@@ -66,6 +66,20 @@ DEFAULT_OSC_STD_THRESHOLD = 0.30
 # for the oscillation gate.
 DEFAULT_SLOPE_TOL = 0.05
+# Liveness gate: number of iterations a loop may go without achieving a new
+# best (lowest) error before its "continue" verdicts (FAST_CONVERGE /
+# CONVERGING) are withdrawn so it can reach STALLING / OSCILLATING and
+# terminate. Without this, a loop that drops a lot and then plateaus or
+# oscillates *below* the cumulative thresholds keeps its historical win
+# forever and never terminates. Derivation: the continue-states are claims
+# about *ongoing* progress; cumulative reduction (E_current/E_first) and a
+# whole-history slope are claims about the *past* and do not expire. We treat
+# "no new low in N steps" as the loop having stopped improving. N is small
+# (3) so a sustained plateau is caught quickly, but the consecutive-STALLING
+# termination rule (2 readings) still protects a loop that briefly stalls and
+# then resumes hitting new lows.
+DEFAULT_STALL_PATIENCE = 3
 # Numerical floor to avoid log(0).
 _EPS = 1e-12
@@ -85,6 +99,7 @@ class TrajectoryThresholds:
     div_margin: float = DEFAULT_DIV_MARGIN
     osc_std_threshold: float = DEFAULT_OSC_STD_THRESHOLD
     slope_tol: float = DEFAULT_SLOPE_TOL
+    stall_patience: int = DEFAULT_STALL_PATIENCE
 @dataclass(frozen=True)
@@ -276,6 +291,18 @@ def classify_trajectory(
     f = extract_features(error_history)
+    # Liveness signal: how many iterations since the loop last achieved a new
+    # best (lowest) error. A genuinely converging loop keeps hitting new lows,
+    # so this stays small; a loop that dropped a lot and then plateaued (or is
+    # oscillating below the cumulative thresholds) has a large value. We use it
+    # to withdraw the "continue" verdicts (FAST_CONVERGE / CONVERGING) once a
+    # loop has stopped improving, so it can reach STALLING / OSCILLATING and
+    # terminate instead of riding its historical cumulative win forever. See
+    # DEFAULT_STALL_PATIENCE.
+    hist = list(error_history)
+    iters_since_best = (n - 1) - hist.index(min(hist))
+    still_improving = iters_since_best < th.stall_patience
     # n == 2 special case: with two observations, the slope is well defined
     # but its p-value is not (zero residual degrees of freedom). Fall back to
     # the sign of the change. This is the same conservatism as a Wilcoxon
@@ -291,13 +318,20 @@ def classify_trajectory(
         return STALLING
     # Order matters: FAST_CONVERGE precedes CONVERGING; both precede the
-    # remaining gates.
-    if f.e_ratio <= th.e_ratio_fast:
+    # remaining gates. Both continue-verdicts are gated on `still_improving`:
+    # a loop that has stopped hitting new lows is no longer "converging" no
+    # matter how large its historical cumulative reduction was, and must be
+    # allowed to fall through to STALLING / OSCILLATING so it can terminate.
+    if f.e_ratio <= th.e_ratio_fast and still_improving:
         return FAST_CONVERGE
     slope_significant = f.slope_p < th.p_sig
-    if f.slope_log < 0 and (slope_significant or f.e_ratio <= th.e_ratio_conv):
+    if (
+        f.slope_log < 0
+        and still_improving
+        and (slope_significant or f.e_ratio <= th.e_ratio_conv)
+    ):
         return CONVERGING
     if f.slope_log > 0 and slope_significant and f.e_ratio > 1.0 + th.div_margin:
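
The liveness gate added in this file can be tried in isolation. The sketch below reproduces the two added lines (`iters_since_best` and `still_improving`) as standalone functions. It assumes that `n` in the diff is `len(error_history)`, which the hunk does not show, and the function wrappers are illustrative rather than part of `classifier.py`.

```python
from typing import Sequence

STALL_PATIENCE = 3  # same value as DEFAULT_STALL_PATIENCE in the diff


def iters_since_best(error_history: Sequence[float]) -> int:
    """Iterations elapsed since the lowest error was first reached."""
    hist = list(error_history)
    # list.index returns the FIRST occurrence of the minimum, so merely
    # repeating the best value later does not count as a new low.
    return (len(hist) - 1) - hist.index(min(hist))


def still_improving(error_history: Sequence[float],
                    patience: int = STALL_PATIENCE) -> bool:
    """True while the loop has hit a new low within the last `patience` steps."""
    return iters_since_best(error_history) < patience


if __name__ == "__main__":
    print(still_improving([10, 5, 2, 1]))     # new low on the latest step
    print(still_improving([10, 3, 4, 3.5]))   # best was 2 steps ago, within patience
    print(still_improving([10, 3, 3, 3, 3]))  # plateau: best first seen 3 steps ago
```

In the plateau case the cumulative ratio is 0.3, under the 50% `CONVERGING` threshold quoted in the README table, so before this change such a trajectory could keep a "continue" verdict. With the gate the verdict is withdrawn and the loop can fall through to `STALLING`.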

{loopgain-0.3.0 → loopgain-0.4.1}/loopgain/core.py RENAMED

@@ -40,6 +40,16 @@ DEFAULT_STALLING = 0.95
 DEFAULT_OSCILLATING_UPPER = 1.05
+# Bounded-by-default safety backstop. The loop should normally terminate on a
+# stability verdict (target met / oscillating / diverging / stalled) long
+# before this; it exists only so the library can never run truly unbounded if
+# a loop never converges and never stalls (e.g. infinitesimal-but-real progress
+# with target_error=None). Generous relative to typical loop lengths (the
+# bench capped at 20). Pass max_iterations=None to opt into a fully unbounded
+# loop, or a smaller integer to cap tighter.
+DEFAULT_MAX_ITERATIONS = 50
 # State names. Exported for use in switch/case in user code.
 INIT = "INIT"
 FAST_CONVERGE = "FAST_CONVERGE"
@@ -165,8 +175,11 @@ class LoopGain:
             tests, no validation errors, etc.). Pass ``None`` to disable
             the short-circuit entirely and rely only on stability
             detection and ``max_iterations``.
-        max_iterations: Hard safety cap. Default ``None`` (rely on
-            stability detection). Recommended ~20-50 for production.
+        max_iterations: Hard safety backstop. Default
+            ``DEFAULT_MAX_ITERATIONS`` (50) so the loop can never run
+            unbounded; normally a stability verdict terminates it long
+            before this. Pass ``None`` to opt into a fully unbounded loop,
+            or a smaller integer to cap tighter.
         thresholds: Custom ``ThresholdBands`` (legacy single-feature
             classifier only). Default is the canonical 0.3 / 0.85 / 0.95 /
             1.05. Ignored when ``classifier='trajectory'``.
@@ -190,7 +203,7 @@ class LoopGain:
     def __init__(
         self,
         target_error: Optional[float] = 0.0,
-        max_iterations: Optional[int] = None,
+        max_iterations: Optional[int] = DEFAULT_MAX_ITERATIONS,
         thresholds: Optional[ThresholdBands] = None,
         trajectory_thresholds: Optional[TrajectoryThresholds] = None,
         classifier: str = "trajectory",
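
The behavioural effect of the new default is easiest to see with a toy driver. The sketch below is not loopgain's control loop. `run_with_backstop` and its `step` callback are hypothetical, and only the constant name and its value of 50 come from the diff. It shows the three documented choices: the bounded default, a tighter integer cap, and `None` as an explicit opt-in to an unbounded loop.

```python
from typing import Callable, Optional

DEFAULT_MAX_ITERATIONS = 50  # value taken from the diff above


def run_with_backstop(
    step: Callable[[int], bool],
    max_iterations: Optional[int] = DEFAULT_MAX_ITERATIONS,
) -> int:
    """Call step(i) until it asks to stop or the backstop is hit.

    `step` returns True when the loop should stop (standing in for a
    stability verdict). Returns the number of iterations actually run.
    """
    i = 0
    # With max_iterations=None the cap is disabled and only `step` can stop the loop.
    while max_iterations is None or i < max_iterations:
        i += 1
        if step(i):
            break
    return i


if __name__ == "__main__":
    never_stops = lambda i: False
    print(run_with_backstop(never_stops))                      # capped by the default
    print(run_with_backstop(never_stops, max_iterations=5))    # tighter cap
    print(run_with_backstop(lambda i: i >= 3))                 # verdict arrives first
    print(run_with_backstop(lambda i: i >= 70, max_iterations=None))  # opt-in unbounded
```

Callers upgrading from 0.3.0 who relied on the old `None` default now have to pass `max_iterations=None` explicitly to keep a loop that runs past 50 iterations.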

{loopgain-0.3.0 → loopgain-0.4.1}/loopgain.egg-info/PKG-INFO RENAMED

@@ -1,7 +1,7 @@
 Metadata-Version: 2.4
 Name: loopgain
-Version: 0.3.0
-Summary: Barkhausen stability monitor for AI agent loops. Real-time loop-gain (Aβ) monitoring with five named threshold bands, best-so-far rollback, and ETA prediction.
+Version: 0.4.1
+Summary: An open-source cost controller for AI agent loops. Stops a loop when it has actually converged and rolls back before it degrades — replacing the max_iterations guess with a real-time loop-gain (Aβ) monitor with five named threshold bands and best-so-far rollback.
 Author-email: Dave Fitzsimmons <hello@loopgain.ai>
 License: Apache-2.0
 Project-URL: Homepage, https://loopgain.ai
@@ -49,14 +49,16 @@ Dynamic: license-file
 # LoopGain
-**Barkhausen stability monitor for AI agent loops.**
+**An open-source cost controller for AI agent loops.**
-Replace `max_iterations=5` with a real-time trajectory classifier that reads four features off the loop's error series and routes it into one of five named states — knowing whether your agent loop is converging, stalling, oscillating, or diverging, and what to do in each case.
+AI agent loops waste time and money when they don't know when to stop. LoopGain measures the loop in real time and stops it the moment it has actually converged — and rolls back before it degrades — instead of running to a fixed `max_iterations` cap.
+> **Across 2,000 paired trials over 10 cells**, LoopGain reduced total API spend by **92.8%** vs `max_iter=20`, dropped median wall-clock latency from 30.9s to 2.1s (**~15×**), preserved output quality on natural-distribution workloads (W1–W4: judge winrate 0.50–0.63, CI excluding null on most cells), and improved output quality on engineered-failure workloads (W5: winrate 0.92–0.95 across three adapters). Weighted-average pairwise preference for LG vs B20 across 1,800 judge comparisons: **0.678**. Zero of six kill criteria fired.
 [![PyPI](https://img.shields.io/pypi/v/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![Python](https://img.shields.io/pypi/pyversions/loopgain.svg)](https://pypi.org/project/loopgain/)
 [![License](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](LICENSE)
-[![Tests](https://img.shields.io/badge/tests-157_passing-brightgreen.svg)](tests/)
+[![Tests](https://img.shields.io/badge/tests-202_passing-brightgreen.svg)](tests/)
 **Home:** [loopgain.ai](https://loopgain.ai)
@@ -68,7 +70,7 @@ Works for **any iterative AI workflow with a measurable error signal** — verif
 ## Why
-Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stability monitor based on the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
+Production agent loops universally use `max_iterations=N` as their termination policy. It's the embarrassing default of agentic AI: you either waste compute (loop stops too late) or ship bad output (loop stops too early). LoopGain replaces it with a control-theoretic stop-and-rollback policy grounded in the **Barkhausen criterion** — a foundational result from electrical-engineering feedback-oscillator analysis (1921).
 ---
@@ -108,6 +110,28 @@ print(result.savings_vs_fixed_cap)
 ---
+## Defining your error signal
+The one thing you provide is the **error signal**: a single non-negative number, every iteration, that says how wrong the current output is. **Lower is better; zero means done.** LoopGain doesn't know what your loop does — it just watches that number's trajectory and decides whether to keep going, stop, or roll back.
+Your loop already has some way of knowing the output isn't good yet (or it wouldn't keep revising). Turn that into a number:
+| Loop | Error signal = |
+| --- | --- |
+| Agentic coding (write code → run tests) | number of **failing tests** (10 → 3 → 0) |
+| JSON / structured extraction | number of **schema violations** |
+| RAG with self-correction | number of **required facts still missing** |
+| Self-refinement with an LLM judge | judge's **gap to target** (e.g. `10 − quality_score`) |
+| Lint / format loop | **lint error count** |
+The only rules: non-negative, and **smaller as the output gets better**. Returning the raw list of problems works directly — `observe()` uses its length as the magnitude (e.g. hand it the list of failing tests).
+If your quality is fuzzy and has no natural "zero," run with `target_error=None`: LoopGain then stops when the number **stops improving**, wherever that plateau is, instead of waiting for an exact target.
+Every stop/continue decision is made from this one number, so **LoopGain is only as good as the error signal you give it** — pick one that genuinely tracks output quality.
+---
 ## How it works
 LoopGain measures empirical loop gain (`Aβ = E(n) / E(n-1)`) at every iteration and exposes it as a smoothed time series for visualization. The decision engine, however, classifies the **full error trajectory** using four features:
@@ -123,7 +147,7 @@ It routes the trajectory into one of five named states:
 | State | Condition | Action |
 | --- | --- | --- |
-| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue, predict ETA |
+| `FAST_CONVERGE` | cumulative reduction to ≤ 10% of E_first | Continue |
 | `CONVERGING` | negative slope with `p < 0.05`, OR cumulative ≤ 50% | Continue, watch for upward drift |
 | `STALLING` | no significant slope, no detectable oscillation | Stop after 2 consecutive readings — return best-so-far |
 | `OSCILLATING` | high residual variance with flat trend | Stop — return best-so-far |
@@ -139,18 +163,6 @@ The decision is **conservative by design**: requiring both statistical significa
 ---
-## ETA prediction
-When the loop is converging (`Aβ_smooth < 1`), LoopGain produces a closed-form prediction of iterations remaining:
-```
-n_remaining = log(E_target / E_current) / log(Aβ_smooth)
-```
-Available as `lg.eta` mid-loop. Returns `None` when the prediction isn't well-defined (no Aβ yet, target zero, or non-converging gain).
----
 ## Best-so-far rollback
 LoopGain keeps a buffer of all observed outputs paired with their error scores. On termination it returns `argmin(error)`, not the last iteration:
@@ -165,14 +177,23 @@ This transforms divergence detection from "abort with garbage" into "abort with
 ---
+## What LoopGain does and doesn't guarantee
+LoopGain saves money by stopping a loop once it stops improving — fewer iterations, fewer tokens. In our [public benchmark](https://github.com/loopgain-ai/loopgain-bench), that was a **92.8% median cut in API spend** vs `max_iterations=20`, with output quality preserved. Two honest limits:
+- **Savings depend on your workload.** Loops that usually succeed fast save the most (~96%); adversarial, failure-prone loops save less (~78–84%). The headline is a blend — run the benchmark on your own loops before quoting a number.
+- **LoopGain detects convergence, not correctness.** It stops when your error signal stops improving — which means more iterations won't help, *not* that the loop succeeded. On the benchmark this preserved quality (it rarely stopped early on a worse output; false-stop rate ≤4.5%), but a loop can stall with the error still above zero — a plateau at, say, 2 failing tests. So check `result.best_error` (or your own pass/fail) before you trust the output: if it plateaued short of your target, that's a quality gap LoopGain can't see, and a false stop that forces a rerun is the one way it eats into the savings. LoopGain decides *when to stop*; you decide *whether the answer is good enough*.
+---
 ## API reference
-### `LoopGain(target_error=0.0, max_iterations=None, thresholds=None, trajectory_thresholds=None, classifier='trajectory', smoothing_window=3, assumed_fixed_cap=10)`
+### `LoopGain(target_error=0.0, max_iterations=50, thresholds=None, trajectory_thresholds=None, classifier='trajectory', smoothing_window=3, assumed_fixed_cap=10)`
 Construct the monitor.
 - `target_error` — Stop when an observed error drops at or below this. Default `0.0` short-circuits on exactly zero error (the natural completion signal for verifier-driven loops). Pass `None` to disable the short-circuit entirely.
-- `max_iterations` — Hard safety cap. Default `None` (rely on stability detection). Recommended ~20–50 for production.
+- `max_iterations` — Hard safety backstop. Default `50` so the loop can never run unbounded; a stability verdict normally terminates it well before this. Pass `None` to opt into a fully unbounded loop (only safe if your loop is guaranteed to reach `target_error` or a stop-state), or a smaller integer to cap tighter.
 - `thresholds` — Custom `ThresholdBands` for the legacy single-Aβ-band classifier. Ignored when `classifier='trajectory'`.
 - `trajectory_thresholds` — Custom `TrajectoryThresholds` for the multi-feature classifier (the default). Override only with workload-specific evidence.
 - `classifier` — `'trajectory'` (default, v0.2 multi-feature classifier) or `'legacy_bands'` (v0.1 single-Aβ-band classifier).
@@ -193,7 +214,7 @@ Current state name. One of `INIT`, `FAST_CONVERGE`, `CONVERGING`, `STALLING`, `O
 ### `lg.eta -> int | None`
-Predicted iterations to reach target. `None` when not well-defined.
+Best-effort closed-form estimate of iterations remaining, exposed for instrumentation. Returns `None` whenever it isn't well-defined — which is most of the time on real, jump-dominated loops, so don't depend on it for control.
 ### `lg.gain_margin -> float | None`

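The README hunks above define the per-iteration gain as `Aβ = E(n) / E(n-1)` and give `n_remaining = log(E_target / E_current) / log(Aβ_smooth)` as the closed form behind `lg.eta`. A minimal, library-free sketch of that arithmetic follows. It is not LoopGain's implementation: the helper names are hypothetical, the smoothing is assumed to be a plain mean over the last `smoothing_window` gains, and rounding up to an integer is also an assumption.

```python
import math
from typing import List, Optional, Sequence


def loop_gains(errors: Sequence[float]) -> List[float]:
    """Per-iteration empirical gain E(n) / E(n-1), skipping zero denominators."""
    return [cur / prev for prev, cur in zip(errors, errors[1:]) if prev > 0]


def smoothed_gain(errors: Sequence[float], window: int = 3) -> Optional[float]:
    """Mean of the last `window` gains (assumed smoothing, not confirmed by the source)."""
    gains = loop_gains(errors)
    if not gains:
        return None
    tail = gains[-window:]
    return sum(tail) / len(tail)


def eta(errors: Sequence[float], target: float, window: int = 3) -> Optional[int]:
    """Closed-form iterations remaining, or None when the estimate is undefined."""
    gain = smoothed_gain(errors, window)
    if gain is None or target <= 0 or errors[-1] <= 0:
        return None  # no gain yet, or a zero target/error makes the log undefined
    if not 0 < gain < 1:
        return None  # a non-converging gain has no finite estimate
    if errors[-1] <= target:
        return 0
    # Rounding up is an assumption; the source only says the value is an int.
    return math.ceil(math.log(target / errors[-1]) / math.log(gain))


print(eta([100, 50, 25, 12.5], target=1.0))
```

With a steady gain of 0.5 the error halves each step, so the estimate is simply the number of halvings needed. On a plateau or a jump-dominated series the gain sits at or above 1 and the function returns `None`, which matches the 0.4.1 wording that the estimate is undefined most of the time on real loops.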
{loopgain-0.3.0 → loopgain-0.4.1}/loopgain.egg-info/SOURCES.txt RENAMED

@@ -28,4 +28,5 @@ tests/test_core.py
 tests/test_funnel.py
 tests/test_integrations.py
 tests/test_stress.py
-tests/test_telemetry.py
+tests/test_telemetry.py
+tests/test_termination_safety.py

{loopgain-0.3.0 → loopgain-0.4.1}/pyproject.toml RENAMED

@@ -4,8 +4,10 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "loopgain"
-version = "0.3.0"
-description = "Barkhausen stability monitor for AI agent loops. Real-time loop-gain (Aβ) monitoring with five named threshold bands, best-so-far rollback, and ETA prediction."
+# Single source of truth: loopgain/_version.py (read dynamically below).
+# Bump the version in that one file per release; this no longer duplicates it.
+dynamic = ["version"]
+description = "An open-source cost controller for AI agent loops. Stops a loop when it has actually converged and rolls back before it degrades — replacing the max_iterations guess with a real-time loop-gain (Aβ) monitor with five named threshold bands and best-so-far rollback."
 authors = [{name = "Dave Fitzsimmons", email = "hello@loopgain.ai"}]
 readme = "README.md"
 license = {text = "Apache-2.0"}
@@ -100,6 +102,11 @@ all = [
 # zero-dep. Install with `pip install 'loopgain[examples]'`.
 examples = ["anthropic>=0.40.0"]
+[tool.setuptools.dynamic]
+# Reads the literal ``__version__ = "x.y.z"`` from loopgain/_version.py via AST
+# (no import), so pyproject.toml never duplicates the version string.
+version = {attr = "loopgain._version.__version__"}
 [tool.setuptools.packages.find]
 where = ["."]
 include = ["loopgain*"]
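The `[tool.setuptools.dynamic]` comment above says the version is read from the literal `__version__ = "x.y.z"` via AST, without importing the package. The sketch below illustrates that idea with the standard `ast` module. It is not setuptools' own code: `read_static_version` is a hypothetical helper, and the module text passed to it is an assumed shape for `loopgain/_version.py`, which this diff does not show.

```python
import ast
from typing import Optional


def read_static_version(source: str, name: str = "__version__") -> Optional[str]:
    """Return the string literal assigned to `name` at module level, without importing."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if name not in targets:
            continue
        try:
            value = ast.literal_eval(node.value)  # accepts literals only, never runs code
        except ValueError:
            return None  # e.g. a function call: not statically readable
        return value if isinstance(value, str) else None
    return None


# Assumed contents of loopgain/_version.py (not shown in the diff).
example_module = '"""Package version."""\n__version__ = "0.4.1"\n'
print(read_static_version(example_module))
```

Reading a literal this way keeps the build from executing package code just to learn the version, which is why a one-line `_version.py` works well as the single source of truth.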

{loopgain-0.3.0 → loopgain-0.4.1}/tests/test_classifier_synthetic.py RENAMED

@@ -158,12 +158,22 @@ def test_pure_stall_no_trend():
     )
-def test_floor_convergence_already_at_target():
-    """If error is already ≈ 0 at observation 1, classifier returns
-    FAST_CONVERGE (cumulative reduction to floor)."""
+def test_floor_convergence_already_flat_at_floor_stalls():
+    """A loop already pinned at the numerical floor from iteration 0, flat,
+    classifies as STALLING — not FAST_CONVERGE.
+    Updated 2026-06 with the liveness-gate fix (see DEFAULT_STALL_PATIENCE).
+    Previously this returned FAST_CONVERGE on the strength of cumulative
+    reduction alone — but FAST_CONVERGE is a *continue* verdict, so an
+    at-floor flat loop would have continued (and, with no max_iterations,
+    run unbounded) instead of stopping. STALLING is the correct verdict: the
+    loop has made no progress for `stall_patience` iterations, so it
+    terminates via the consecutive-stall rule and returns best-so-far (the
+    floor value — a fine answer). In real use the `target_error`
+    short-circuit (next test) handles the at-target case directly."""
     trajectory = [1e-15] * 5
     state = classify_trajectory(trajectory)
-    assert state == FAST_CONVERGE
+    assert state == STALLING
 def test_target_met_short_circuit():

loopgain-0.4.1/tests/test_termination_safety.py ADDED

@@ -0,0 +1,115 @@
+"""Termination-safety tests: a loop must not run unbounded.
+Regression coverage for the FAST_CONVERGE/CONVERGING liveness bug (2026-06):
+the trajectory classifier used *cumulative* reduction (E_current/E_first) and a
+*whole-history* slope to emit the "continue" verdicts FAST_CONVERGE and
+CONVERGING. A loop that reduced its error and then plateaued (or oscillated)
+*below* the cumulative thresholds kept its historical win forever — it was
+pinned in a continue-state, never reached STALLING/OSCILLATING, and with the
+(then-default) max_iterations=None it ran forever.
+The fix has two independent layers, each tested here:
+  1. A liveness gate on the continue-verdicts: a loop that has not achieved a
+     new best error in `stall_patience` iterations is no longer treated as
+     "improving", so it can reach STALLING/OSCILLATING and terminate.
+  2. A bounded default max_iterations backstop, so the library can never run
+     truly unbounded even if a future classifier path regresses.
+Output quality was never at risk (best-so-far rollback held the good answer);
+the bug was a *liveness* failure — the loop never returned to hand it back.
+"""
+from __future__ import annotations
+import pytest
+from loopgain import CONVERGING, FAST_CONVERGE, LoopGain, classify_trajectory
+# Hard test guard: large enough that a *correctly* terminating loop never hits
+# it, small enough that a regression (unbounded loop) fails fast instead of
+# hanging the suite.
+GUARD = 500
+def _run_to_termination(lg: LoopGain, errors, guard: int = GUARD):
+    """Drive a loop, plateauing/repeating the last error, until it terminates
+    or hits the guard. Returns (iterations_run, hit_guard)."""
+    i = 0
+    while lg.should_continue():
+        e = errors[i] if i < len(errors) else errors[-1]
+        lg.observe(e, output=f"o{i}")
+        i += 1
+        if i >= guard:
+            return i, True
+    return i, False
+# ----- Layer 1: classifier liveness gate -----
+def test_plateau_below_fast_floor_terminates_without_max_iter():
+    """Error drops to 8% of initial then plateaus. e_ratio<=0.1 used to pin
+    FAST_CONVERGE forever. Must now terminate via STALLING."""
+    lg = LoopGain(max_iterations=None, target_error=None)
+    n, hit_guard = _run_to_termination(lg, [100, 8, 8, 8, 8, 8, 8, 8])
+    assert not hit_guard, f"loop did not terminate within {GUARD} iters (unbounded)"
+    assert not lg.should_continue()
+    assert lg.result.best_error == 8.0  # best-so-far still returned
+def test_plateau_above_fast_floor_terminates_without_max_iter():
+    """Error drops to 30% of initial (below E_RATIO_CONV=0.5) then plateaus.
+    e_ratio<=0.5 with a whole-history negative slope used to pin CONVERGING
+    forever. Must now terminate."""
+    lg = LoopGain(max_iterations=None, target_error=None)
+    n, hit_guard = _run_to_termination(lg, [100, 30, 30, 30, 30, 30, 30, 30])
+    assert not hit_guard, f"loop did not terminate within {GUARD} iters (unbounded)"
+    assert not lg.should_continue()
+def test_oscillation_below_floor_terminates_without_max_iter():
+    """Oscillation entirely below the 10% cumulative floor used to be shadowed
+    by FAST_CONVERGE. Must now terminate (OSCILLATING or STALLING)."""
+    lg = LoopGain(max_iterations=None, target_error=None)
+    n, hit_guard = _run_to_termination(lg, [100, 5, 8, 5, 8, 5, 8, 5, 8])
+    assert not hit_guard, f"loop did not terminate within {GUARD} iters (unbounded)"
+    assert not lg.should_continue()
+def test_classifier_flags_plateau_after_big_drop_as_terminable():
+    """Direct classifier check: a big drop followed by a flat tail must NOT be
+    reported as a continue-state (FAST_CONVERGE/CONVERGING)."""
+    plateau_low = [100, 8, 8, 8, 8, 8]
+    plateau_mid = [100, 30, 30, 30, 30, 30]
+    assert classify_trajectory(plateau_low) not in (FAST_CONVERGE, CONVERGING)
+    assert classify_trajectory(plateau_mid) not in (FAST_CONVERGE, CONVERGING)
+def test_genuine_fast_converge_still_continues():
+    """Guard against over-correction: a monotone steep decline that keeps
+    hitting new lows must still read FAST_CONVERGE (continue), not be
+    prematurely stalled."""
+    monotone = [100, 25, 6, 1.5, 0.4, 0.1]  # new low every step
+    assert classify_trajectory(monotone) == FAST_CONVERGE
+def test_genuine_converging_still_continues():
+    """A steady decline landing between the two cumulative thresholds must
+    still read CONVERGING while it is still hitting new lows."""
+    converging = [10.0, 8.0, 6.4, 5.1, 4.1, 3.3]  # ~0.8x/step, new low every step
+    assert classify_trajectory(converging) == CONVERGING
+# ----- Layer 2: bounded default backstop -----
+def test_default_max_iterations_is_a_bounded_backstop():
+    """The default config must not be able to run unbounded. A never-improving
+    loop under all-default construction must terminate at the backstop."""
+    lg = LoopGain()  # all defaults
+    assert lg.max_iterations is not None, "default max_iterations must be bounded"
+    # A strictly increasing error never converges/stalls into best-so-far early
+    # under every classifier path; the backstop must still stop it.
+    i, hit_guard = _run_to_termination(lg, list(range(1, GUARD + 5)))
+    assert not hit_guard, "default backstop failed to bound the loop"
+    assert not lg.should_continue()
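The two layers these tests exercise, a liveness gate on "no new best error" and a bounded iteration backstop, can be pictured with a toy monitor. This is a sketch under stated assumptions, not LoopGain's classifier: `BoundedLoop` and `run` are hypothetical names, the patience value of 2 is borrowed from the README's "stop after 2 consecutive readings" and may not equal `DEFAULT_STALL_PATIENCE`, and the statistical features (slope, residual variance) are left out entirely.

```python
from typing import Any, List, Optional, Sequence, Tuple


class BoundedLoop:
    """Toy monitor: stop after `stall_patience` steps without a new best, or at the cap."""

    def __init__(self, stall_patience: int = 2, max_iterations: Optional[int] = 50):
        self.stall_patience = stall_patience  # assumed value, see lead-in
        self.max_iterations = max_iterations
        self.history: List[Tuple[float, Any]] = []
        self.since_best = 0

    def best(self) -> Tuple[float, Any]:
        """Best-so-far: the (error, output) pair with the lowest error, earliest on ties."""
        return min(self.history, key=lambda pair: pair[0])

    def observe(self, error: float, output: Any = None) -> None:
        # Layer 1 (liveness gate): only a strictly new low resets the counter.
        if not self.history or error < self.best()[0]:
            self.since_best = 0
        else:
            self.since_best += 1
        self.history.append((error, output))

    def should_continue(self) -> bool:
        # Layer 2 (backstop): a hard cap that holds even if the gate never fires.
        if self.max_iterations is not None and len(self.history) >= self.max_iterations:
            return False
        return self.since_best < self.stall_patience


def run(loop: BoundedLoop, errors: Sequence[float]) -> int:
    """Drive the loop, repeating the last error once the series is exhausted."""
    i = 0
    while loop.should_continue():
        e = errors[i] if i < len(errors) else errors[-1]
        loop.observe(e, output=f"o{i}")
        i += 1
    return i


plateau = BoundedLoop(max_iterations=None)
print(run(plateau, [100, 8, 8, 8, 8]), plateau.best())
```

A drop to 8 followed by a flat tail terminates through the gate alone, with the cap disabled, and still hands back the best output. A series that keeps setting new lows never trips the gate, so only the cap stops it; that is the case the bounded default exists for.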