@rigour-labs/core 4.3.5 → 5.0.0
- package/README.md +46 -10
- package/dist/gates/base.d.ts +3 -0
- package/dist/gates/checkpoint.d.ts +23 -8
- package/dist/gates/checkpoint.js +109 -45
- package/dist/gates/checkpoint.test.js +6 -3
- package/dist/gates/dependency.d.ts +39 -0
- package/dist/gates/dependency.js +212 -5
- package/dist/gates/duplication-drift.d.ts +101 -6
- package/dist/gates/duplication-drift.js +427 -33
- package/dist/gates/logic-drift.d.ts +70 -0
- package/dist/gates/logic-drift.js +280 -0
- package/dist/gates/runner.js +29 -1
- package/dist/gates/style-drift.d.ts +53 -0
- package/dist/gates/style-drift.js +305 -0
- package/dist/index.d.ts +4 -0
- package/dist/index.js +4 -0
- package/dist/inference/model-manager.js +5 -1
- package/dist/inference/types.d.ts +6 -1
- package/dist/inference/types.js +6 -1
- package/dist/services/adaptive-thresholds.d.ts +54 -10
- package/dist/services/adaptive-thresholds.js +161 -35
- package/dist/services/adaptive-thresholds.test.js +24 -20
- package/dist/services/filesystem-cache.d.ts +50 -0
- package/dist/services/filesystem-cache.js +124 -0
- package/dist/services/temporal-drift.d.ts +101 -0
- package/dist/services/temporal-drift.js +386 -0
- package/dist/templates/universal-config.js +17 -0
- package/dist/types/index.d.ts +196 -0
- package/dist/types/index.js +19 -0
- package/dist/utils/scanner.d.ts +6 -1
- package/dist/utils/scanner.js +8 -1
- package/package.json +6 -6
package/README.md
CHANGED
|
@@ -3,23 +3,46 @@
|
|
|
3
3
|
[](https://www.npmjs.com/package/@rigour-labs/core)
|
|
4
4
|
[](https://opensource.org/licenses/MIT)
|
|
5
5
|
|
|
6
|
-
**
|
|
6
|
+
**AI Agent Governance Engine — deterministic quality gates, drift detection, and LLM-powered deep analysis.**
|
|
7
7
|
|
|
8
|
-
The core library powering [Rigour](https://rigour.run) —
|
|
8
|
+
The core library powering [Rigour](https://rigour.run) — 27+ quality gates, five-signal deep analysis pipeline, temporal drift engine, and AI agent DLP across TypeScript, JavaScript, Python, Go, Ruby, and C#/.NET.
|
|
9
9
|
|
|
10
10
|
> This package is the engine. For the CLI, use [`@rigour-labs/cli`](https://www.npmjs.com/package/@rigour-labs/cli). For MCP integration, use [`@rigour-labs/mcp`](https://www.npmjs.com/package/@rigour-labs/mcp).
|
|
11
11
|
|
|
12
12
|
## What's Inside
|
|
13
13
|
|
|
14
|
-
###
|
|
14
|
+
### 27+ Deterministic Quality Gates
|
|
15
15
|
|
|
16
16
|
**Structural:** File size, cyclomatic complexity, method count, parameter count, nesting depth, required docs, content hygiene.
|
|
17
17
|
|
|
18
|
-
**Security:** Hardcoded secrets, SQL injection, XSS, command injection, path traversal.
|
|
18
|
+
**Security:** Hardcoded secrets, SQL injection, XSS, command injection, path traversal, frontend secret exposure.
|
|
19
19
|
|
|
20
|
-
**AI
|
|
20
|
+
**AI Drift Detection:**
|
|
21
|
+
- **Three-pass duplication drift** — MD5 exact → AST Jaccard (tree-sitter) → semantic embedding (all-MiniLM-L6-v2, 384D cosine). Catches `.find()` vs `.filter()[0]` — same intent, different implementation.
|
|
22
|
+
- **Hallucinated imports** — language-aware resolution for relative + package imports.
|
|
23
|
+
- **Phantom APIs** — non-existent stdlib/framework methods the LLM invented.
|
|
24
|
+
- **Style drift** — fingerprints naming, error handling, import style, quote preferences against project baseline.
|
|
25
|
+
- **Logic drift** — tracks comparison operators (>= → >), branch counts, return statements per function across scans.
|
|
26
|
+
- **Dependency bloat** — unused deps, heavy alternatives (moment→dayjs), duplicate purpose packages.
|
|
27
|
+
- **Context-window artifacts**, inconsistent error handling, promise safety, deprecated APIs.
|
|
21
28
|
|
|
22
|
-
**Agent Governance:** Multi-agent scope isolation, checkpoint supervision, context drift, retry loop breaker.
|
|
29
|
+
**Agent Governance:** Multi-agent scope isolation, EWMA-based checkpoint supervision, context drift, retry loop breaker, memory & skills governance with DLP scanning.
|
|
30
|
+
|
|
31
|
+
### Five-Signal Deep Analysis Pipeline
|
|
32
|
+
|
|
33
|
+
Rigour's deep analysis is not a wrapper around a generic LLM. The model operates within a cage of deterministic facts:
|
|
34
|
+
|
|
35
|
+
1. **Extract** — five independent signal streams (AST facts, semantic embeddings, style fingerprints, logic baselines, dependency graphs) computed deterministically before the LLM sees anything.
|
|
36
|
+
2. **Interpret** — the model receives structured facts (not raw source), focuses on SOLID, design patterns, language idioms, architecture. Constrained input prevents hallucination.
|
|
37
|
+
3. **Verify** — every LLM finding is cross-referenced against all five signal streams. Wrong line numbers, phantom patterns, non-existent functions → discarded. Only verified findings with confidence scores reach the report.
|
|
38
|
+
|
|
39
|
+
Both model tiers (lite sidecar + pro code-specialized) are fine-tuned via the [DriftBench RLAIF pipeline](https://github.com/rigour-labs/driftbench) where the five signal streams serve as the teacher signal.
|
|
40
|
+
|
|
41
|
+
### Temporal Drift Engine (v5.1)
|
|
42
|
+
|
|
43
|
+
Cross-session trend analysis powered by EWMA and Z-score anomaly detection. Tracks three independent provenance streams (AI drift, structural, security) with separate trend directions. Reads from the SQLite brain for month-over-month analysis.
|
|
44
|
+
|
|
45
|
+
Key capabilities: per-provenance EWMA streams (alpha=0.3), Z-score anomaly detection (|Z| > 2.0), monthly/weekly rollups, semantic duplicate tracking, style + logic baseline evolution, human-readable narrative generation.
|
|
23
46
|
|
|
24
47
|
### Multi-Language Support
|
|
25
48
|
|
|
@@ -41,14 +64,27 @@ Machine-readable JSON diagnostics with severity, provenance, file, line number,
|
|
|
41
64
|
```typescript
|
|
42
65
|
import { GateRunner } from '@rigour-labs/core';
|
|
43
66
|
|
|
44
|
-
const runner = new GateRunner(config
|
|
45
|
-
const report = await runner.run();
|
|
67
|
+
const runner = new GateRunner(config);
|
|
68
|
+
const report = await runner.run(projectRoot);
|
|
46
69
|
|
|
47
|
-
console.log(report.
|
|
48
|
-
console.log(report.score); // 0-100
|
|
70
|
+
console.log(report.status); // 'PASS' or 'FAIL'
|
|
71
|
+
console.log(report.stats.score); // 0-100
|
|
49
72
|
console.log(report.failures); // Failure[]
|
|
50
73
|
```
|
|
51
74
|
|
|
75
|
+
### With Deep Analysis
|
|
76
|
+
|
|
77
|
+
```typescript
|
|
78
|
+
import { GateRunner } from '@rigour-labs/core';
|
|
79
|
+
|
|
80
|
+
const runner = new GateRunner(config);
|
|
81
|
+
const report = await runner.run(projectRoot, undefined, {
|
|
82
|
+
enabled: true,
|
|
83
|
+
pro: false, // true for full-power model
|
|
84
|
+
provider: 'local', // or 'claude', 'openai', etc.
|
|
85
|
+
});
|
|
86
|
+
```
|
|
87
|
+
|
|
52
88
|
## Documentation
|
|
53
89
|
|
|
54
90
|
**[Full docs at docs.rigour.run](https://docs.rigour.run)**
|
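The README's temporal drift section names Z-score anomaly detection with a |Z| > 2.0 cutoff but shows no code for it. A minimal sketch of that check follows, assuming a population standard deviation over prior scan scores. The helper names `zScore` and `isAnomalous` and the sample data are hypothetical and are not taken from the package.

```typescript
// Hypothetical helpers for illustration; not part of @rigour-labs/core.
// Z-score of a new value against the mean and spread of prior observations.
function zScore(history: number[], value: number): number {
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance =
    history.reduce((a, b) => a + (b - mean) * (b - mean), 0) / history.length;
  const std = Math.sqrt(variance);
  // Assumption: a flat history has no spread, so nothing is flagged.
  return std === 0 ? 0 : (value - mean) / std;
}

// Flags the value when |Z| exceeds the cutoff (2.0 in the README).
function isAnomalous(history: number[], value: number, cutoff = 2.0): boolean {
  return Math.abs(zScore(history, value)) > cutoff;
}

const weeklyScores = [88, 90, 87, 91, 89]; // made-up data
const typicalWeek = isAnomalous(weeklyScores, 90); // Z is about 0.71
const badWeek = isAnomalous(weeklyScores, 70); // Z is about -13.4
```

Whether the engine uses a population or sample deviation, or applies the Z-score to raw scores or to EWMA residuals, is not visible in this diff; the sketch only illustrates the thresholding idea.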
package/dist/gates/base.d.ts
CHANGED
|
@@ -1,10 +1,13 @@
|
|
|
1
1
|
import { GoldenRecord } from '../services/context-engine.js';
|
|
2
2
|
import { Failure, Severity, Provenance } from '../types/index.js';
|
|
3
|
+
import type { FileSystemCache } from '../services/filesystem-cache.js';
|
|
3
4
|
export interface GateContext {
|
|
4
5
|
cwd: string;
|
|
5
6
|
record?: GoldenRecord;
|
|
6
7
|
ignore?: string[];
|
|
7
8
|
patterns?: string[];
|
|
9
|
+
/** Shared file cache across gates — reduces memory ~80% on large repos */
|
|
10
|
+
fileCache?: FileSystemCache;
|
|
8
11
|
}
|
|
9
12
|
export declare abstract class Gate {
|
|
10
13
|
readonly id: string;
|
package/dist/gates/checkpoint.d.ts
CHANGED
|
@@ -1,16 +1,25 @@
|
|
|
1
1
|
/**
|
|
2
|
-
* Checkpoint Supervision Gate
|
|
2
|
+
* Checkpoint Supervision Gate (v2)
|
|
3
3
|
*
|
|
4
4
|
* Monitors agent quality during extended execution for frontier models
|
|
5
5
|
* like GPT-5.3-Codex "coworking mode" that run autonomously for long periods.
|
|
6
6
|
*
|
|
7
|
-
*
|
|
8
|
-
* -
|
|
9
|
-
* -
|
|
10
|
-
* -
|
|
11
|
-
* -
|
|
7
|
+
* v2 upgrades:
|
|
8
|
+
* - EWMA (Exponentially Weighted Moving Average) replaces linear regression
|
|
9
|
+
* - One bad checkpoint no longer tanks the whole trend
|
|
10
|
+
* - Configurable smoothing factor (α=0.3 default — recent data weighted more)
|
|
11
|
+
* - Separate "sudden drop" detection from "gradual decline" detection
|
|
12
12
|
*
|
|
13
|
-
*
|
|
13
|
+
* EWMA formula:
|
|
14
|
+
* ewma_t = α × score_t + (1 - α) × ewma_(t-1)
|
|
15
|
+
*
|
|
16
|
+
* Why EWMA > Linear Regression:
|
|
17
|
+
* - Linear regression on 5 points: one outlier shifts the slope dramatically
|
|
18
|
+
* - EWMA: one bad score dampened by history, persistent drops amplified
|
|
19
|
+
* - α=0.3: ~70% weight on history, 30% on new data → noise-resistant
|
|
20
|
+
*
|
|
21
|
+
* @since v2.14.0 (original, linear regression)
|
|
22
|
+
* @since v5.0.0 (EWMA drift detection)
|
|
14
23
|
*/
|
|
15
24
|
import { Gate, GateContext } from './base.js';
|
|
16
25
|
import { Failure, Provenance } from '../types/index.js';
|
|
@@ -22,6 +31,8 @@ export interface CheckpointEntry {
|
|
|
22
31
|
summary: string;
|
|
23
32
|
qualityScore: number;
|
|
24
33
|
warnings: string[];
|
|
34
|
+
/** EWMA value at this checkpoint (computed on record) */
|
|
35
|
+
ewma?: number;
|
|
25
36
|
}
|
|
26
37
|
export interface CheckpointSession {
|
|
27
38
|
sessionId: string;
|
|
@@ -36,6 +47,10 @@ export interface CheckpointConfig {
|
|
|
36
47
|
quality_threshold?: number;
|
|
37
48
|
drift_detection?: boolean;
|
|
38
49
|
auto_save_on_failure?: boolean;
|
|
50
|
+
/** EWMA smoothing factor. Higher = more weight on recent data. Default 0.3 */
|
|
51
|
+
ewma_alpha?: number;
|
|
52
|
+
/** Drop from EWMA that triggers drift warning. Default 15 */
|
|
53
|
+
drift_drop_threshold?: number;
|
|
39
54
|
}
|
|
40
55
|
/**
|
|
41
56
|
* Get or create checkpoint session
|
|
@@ -44,7 +59,7 @@ export declare function getOrCreateCheckpointSession(cwd: string): CheckpointSes
|
|
|
44
59
|
/**
|
|
45
60
|
* Record a checkpoint with quality evaluation
|
|
46
61
|
*/
|
|
47
|
-
export declare function recordCheckpoint(cwd: string, progressPct: number, filesChanged: string[], summary: string, qualityScore: number): {
|
|
62
|
+
export declare function recordCheckpoint(cwd: string, progressPct: number, filesChanged: string[], summary: string, qualityScore: number, config?: CheckpointConfig): {
|
|
48
63
|
continue: boolean;
|
|
49
64
|
warnings: string[];
|
|
50
65
|
checkpoint: CheckpointEntry;
|
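The `ewma_alpha` option declared above controls how quickly older checkpoints fade. Unrolling the recurrence ewma_t = α × score_t + (1 - α) × ewma_(t-1) gives the score from k checkpoints ago a weight of α(1 - α)^k, with the seed checkpoint keeping the remainder. A small sketch of those weights follows; `ewmaWeights` and `weightedAverage` are hypothetical helpers for illustration, not part of the package.

```typescript
// Hypothetical helper: weights applied to the last `n` scores, newest first.
// The oldest score seeds the average, so it keeps (1 - alpha)^(n - 1).
function ewmaWeights(n: number, alpha: number): number[] {
  const weights: number[] = [];
  for (let k = 0; k < n; k++) {
    const decay = Math.pow(1 - alpha, k);
    weights.push(k === n - 1 ? decay : alpha * decay);
  }
  return weights;
}

// Applying the weights reproduces the recursive EWMA.
function weightedAverage(scoresOldestFirst: number[], alpha: number): number {
  const w = ewmaWeights(scoresOldestFirst.length, alpha);
  const newestFirst = scoresOldestFirst.slice().reverse();
  return newestFirst.reduce((sum, s, i) => sum + s * w[i], 0);
}

const weightsAtDefault = ewmaWeights(4, 0.3); // about [0.3, 0.21, 0.147, 0.343]
const smoothed = weightedAverage([95, 90, 55], 0.3); // about 81.95
```

With α = 0.3 the newest score carries 30% of the result, which is why a single low checkpoint moves the average only part of the way toward it.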
package/dist/gates/checkpoint.js
CHANGED
|
@@ -1,16 +1,25 @@
|
|
|
1
1
|
/**
|
|
2
|
-
* Checkpoint Supervision Gate
|
|
2
|
+
* Checkpoint Supervision Gate (v2)
|
|
3
3
|
*
|
|
4
4
|
* Monitors agent quality during extended execution for frontier models
|
|
5
5
|
* like GPT-5.3-Codex "coworking mode" that run autonomously for long periods.
|
|
6
6
|
*
|
|
7
|
-
*
|
|
8
|
-
* -
|
|
9
|
-
* -
|
|
10
|
-
* -
|
|
11
|
-
* -
|
|
7
|
+
* v2 upgrades:
|
|
8
|
+
* - EWMA (Exponentially Weighted Moving Average) replaces linear regression
|
|
9
|
+
* - One bad checkpoint no longer tanks the whole trend
|
|
10
|
+
* - Configurable smoothing factor (α=0.3 default — recent data weighted more)
|
|
11
|
+
* - Separate "sudden drop" detection from "gradual decline" detection
|
|
12
12
|
*
|
|
13
|
-
*
|
|
13
|
+
* EWMA formula:
|
|
14
|
+
* ewma_t = α × score_t + (1 - α) × ewma_(t-1)
|
|
15
|
+
*
|
|
16
|
+
* Why EWMA > Linear Regression:
|
|
17
|
+
* - Linear regression on 5 points: one outlier shifts the slope dramatically
|
|
18
|
+
* - EWMA: one bad score dampened by history, persistent drops amplified
|
|
19
|
+
* - α=0.3: ~70% weight on history, 30% on new data → noise-resistant
|
|
20
|
+
*
|
|
21
|
+
* @since v2.14.0 (original, linear regression)
|
|
22
|
+
* @since v5.0.0 (EWMA drift detection)
|
|
14
23
|
*/
|
|
15
24
|
import { Gate } from './base.js';
|
|
16
25
|
import { Logger } from '../utils/logger.js';
|
|
@@ -22,8 +31,78 @@ let currentCheckpointSession = null;
|
|
|
22
31
|
* Generate unique checkpoint ID
|
|
23
32
|
*/
|
|
24
33
|
function generateCheckpointId() {
|
|
25
|
-
return `cp-${Date.now()}-${Math.random().toString(36).
|
|
34
|
+
return `cp-${Date.now()}-${Math.random().toString(36).substring(2, 8)}`;
|
|
35
|
+
}
|
|
36
|
+
// ─── EWMA Computation ──────────────────────────────────────────────
|
|
37
|
+
/**
|
|
38
|
+
* Compute EWMA for a new data point.
|
|
39
|
+
*
|
|
40
|
+
* ewma_t = α × value + (1 - α) × ewma_(t-1)
|
|
41
|
+
*
|
|
42
|
+
* For the first data point, ewma = value (no history to smooth against).
|
|
43
|
+
*/
|
|
44
|
+
function computeEWMA(value, previousEWMA, alpha) {
|
|
45
|
+
if (previousEWMA === undefined)
|
|
46
|
+
return value;
|
|
47
|
+
return alpha * value + (1 - alpha) * previousEWMA;
|
|
48
|
+
}
|
|
49
|
+
/**
|
|
50
|
+
* Detect quality drift using EWMA.
|
|
51
|
+
*
|
|
52
|
+
* Two-signal detection:
|
|
53
|
+
*
|
|
54
|
+
* 1. Sudden Drop: Current score is significantly below the EWMA.
|
|
55
|
+
* This catches "agent suddenly started producing garbage."
|
|
56
|
+
* Threshold: score < ewma - drift_drop_threshold
|
|
57
|
+
*
|
|
58
|
+
* 2. Gradual Decline: EWMA itself is trending downward.
|
|
59
|
+
* Compare current EWMA to EWMA from N checkpoints ago.
|
|
60
|
+
* This catches "agent is slowly getting worse over 30 minutes."
|
|
61
|
+
* Threshold: ewma_now < ewma_5ago - drift_drop_threshold
|
|
62
|
+
*
|
|
63
|
+
* Returns trend assessment and whether drift alarm should fire.
|
|
64
|
+
*/
|
|
65
|
+
function detectDrift(checkpoints, alpha, dropThreshold) {
|
|
66
|
+
if (checkpoints.length < 3) {
|
|
67
|
+
return { hasDrift: false, trend: 'stable', ewma: checkpoints.length > 0 ? checkpoints[checkpoints.length - 1].qualityScore : 0 };
|
|
68
|
+
}
|
|
69
|
+
// Recompute EWMA chain to ensure consistency
|
|
70
|
+
let ewma = checkpoints[0].qualityScore;
|
|
71
|
+
for (let i = 1; i < checkpoints.length; i++) {
|
|
72
|
+
ewma = computeEWMA(checkpoints[i].qualityScore, ewma, alpha);
|
|
73
|
+
}
|
|
74
|
+
const currentScore = checkpoints[checkpoints.length - 1].qualityScore;
|
|
75
|
+
// Signal 1: Sudden drop from smoothed average
|
|
76
|
+
if (currentScore < ewma - dropThreshold) {
|
|
77
|
+
return {
|
|
78
|
+
hasDrift: true,
|
|
79
|
+
trend: 'degrading',
|
|
80
|
+
ewma,
|
|
81
|
+
reason: `Sudden quality drop: score ${currentScore}% vs EWMA ${ewma.toFixed(1)}% (gap: ${(ewma - currentScore).toFixed(1)})`,
|
|
82
|
+
};
|
|
83
|
+
}
|
|
84
|
+
// Signal 2: Gradual EWMA decline (compare to EWMA 5 checkpoints ago)
|
|
85
|
+
if (checkpoints.length >= 6) {
|
|
86
|
+
let ewmaAt5Ago = checkpoints[0].qualityScore;
|
|
87
|
+
const stopAt = checkpoints.length - 5;
|
|
88
|
+
for (let i = 1; i <= stopAt; i++) {
|
|
89
|
+
ewmaAt5Ago = computeEWMA(checkpoints[i].qualityScore, ewmaAt5Ago, alpha);
|
|
90
|
+
}
|
|
91
|
+
if (ewma < ewmaAt5Ago - dropThreshold) {
|
|
92
|
+
return {
|
|
93
|
+
hasDrift: true,
|
|
94
|
+
trend: 'degrading',
|
|
95
|
+
ewma,
|
|
96
|
+
reason: `Gradual decline: EWMA dropped from ${ewmaAt5Ago.toFixed(1)}% to ${ewma.toFixed(1)}% over last 5 checkpoints`,
|
|
97
|
+
};
|
|
98
|
+
}
|
|
99
|
+
if (ewma > ewmaAt5Ago + dropThreshold) {
|
|
100
|
+
return { hasDrift: false, trend: 'improving', ewma };
|
|
101
|
+
}
|
|
102
|
+
}
|
|
103
|
+
return { hasDrift: false, trend: 'stable', ewma };
|
|
26
104
|
}
|
|
105
|
+
// ─── Session Management ─────────────────────────────────────────────
|
|
27
106
|
/**
|
|
28
107
|
* Get or create checkpoint session
|
|
29
108
|
*/
|
|
@@ -45,22 +124,29 @@ export function getOrCreateCheckpointSession(cwd) {
|
|
|
45
124
|
/**
|
|
46
125
|
* Record a checkpoint with quality evaluation
|
|
47
126
|
*/
|
|
48
|
-
export function recordCheckpoint(cwd, progressPct, filesChanged, summary, qualityScore) {
|
|
127
|
+
export function recordCheckpoint(cwd, progressPct, filesChanged, summary, qualityScore, config) {
|
|
49
128
|
const session = getOrCreateCheckpointSession(cwd);
|
|
50
129
|
const warnings = [];
|
|
51
|
-
|
|
52
|
-
const qualityThreshold = 80;
|
|
130
|
+
const alpha = config?.ewma_alpha ?? 0.3;
|
|
131
|
+
const qualityThreshold = config?.quality_threshold ?? 80;
|
|
53
132
|
// Check if quality is below threshold
|
|
54
133
|
const shouldContinue = qualityScore >= qualityThreshold;
|
|
55
134
|
if (!shouldContinue) {
|
|
56
135
|
warnings.push(`Quality score ${qualityScore}% is below threshold ${qualityThreshold}%`);
|
|
57
136
|
}
|
|
58
|
-
//
|
|
137
|
+
// Compute EWMA for this checkpoint
|
|
138
|
+
const previousEWMA = session.checkpoints.length > 0
|
|
139
|
+
? session.checkpoints[session.checkpoints.length - 1].ewma
|
|
140
|
+
: undefined;
|
|
141
|
+
const currentEWMA = computeEWMA(qualityScore, previousEWMA, alpha);
|
|
142
|
+
// Detect drift using EWMA
|
|
59
143
|
if (session.checkpoints.length >= 2) {
|
|
60
|
-
const
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
144
|
+
const dropThreshold = config?.drift_drop_threshold ?? 15;
|
|
145
|
+
// Temporarily add this checkpoint for drift analysis
|
|
146
|
+
const tempCheckpoints = [...session.checkpoints, { qualityScore }];
|
|
147
|
+
const { hasDrift, reason } = detectDrift(tempCheckpoints, alpha, dropThreshold);
|
|
148
|
+
if (hasDrift && reason) {
|
|
149
|
+
warnings.push(reason);
|
|
64
150
|
}
|
|
65
151
|
}
|
|
66
152
|
const checkpoint = {
|
|
@@ -71,6 +157,7 @@ export function recordCheckpoint(cwd, progressPct, filesChanged, summary, qualit
|
|
|
71
157
|
summary,
|
|
72
158
|
qualityScore,
|
|
73
159
|
warnings,
|
|
160
|
+
ewma: Math.round(currentEWMA * 10) / 10,
|
|
74
161
|
};
|
|
75
162
|
session.checkpoints.push(checkpoint);
|
|
76
163
|
session.lastCheckpoint = new Date();
|
|
@@ -101,7 +188,6 @@ export function completeCheckpointSession(cwd) {
|
|
|
101
188
|
export function abortCheckpointSession(cwd, reason) {
|
|
102
189
|
if (currentCheckpointSession) {
|
|
103
190
|
currentCheckpointSession.status = 'aborted';
|
|
104
|
-
// Add final checkpoint with abort reason
|
|
105
191
|
currentCheckpointSession.checkpoints.push({
|
|
106
192
|
checkpointId: generateCheckpointId(),
|
|
107
193
|
timestamp: new Date(),
|
|
@@ -162,30 +248,7 @@ function timeSinceLastCheckpoint(session) {
|
|
|
162
248
|
const lastTime = session.lastCheckpoint || session.startedAt;
|
|
163
249
|
return (Date.now() - lastTime.getTime()) / 1000 / 60; // minutes
|
|
164
250
|
}
|
|
165
|
-
|
|
166
|
-
* Detect quality drift pattern
|
|
167
|
-
*/
|
|
168
|
-
function detectDrift(checkpoints) {
|
|
169
|
-
if (checkpoints.length < 3) {
|
|
170
|
-
return { hasDrift: false, trend: 'stable' };
|
|
171
|
-
}
|
|
172
|
-
const recent = checkpoints.slice(-5);
|
|
173
|
-
const scores = recent.map(cp => cp.qualityScore);
|
|
174
|
-
// Calculate trend using simple linear regression
|
|
175
|
-
const n = scores.length;
|
|
176
|
-
const sumX = (n * (n - 1)) / 2;
|
|
177
|
-
const sumY = scores.reduce((a, b) => a + b, 0);
|
|
178
|
-
const sumXY = scores.reduce((sum, y, x) => sum + x * y, 0);
|
|
179
|
-
const sumX2 = (n * (n - 1) * (2 * n - 1)) / 6;
|
|
180
|
-
const slope = (n * sumXY - sumX * sumY) / (n * sumX2 - sumX * sumX);
|
|
181
|
-
if (slope < -2) {
|
|
182
|
-
return { hasDrift: true, trend: 'degrading' };
|
|
183
|
-
}
|
|
184
|
-
else if (slope > 2) {
|
|
185
|
-
return { hasDrift: false, trend: 'improving' };
|
|
186
|
-
}
|
|
187
|
-
return { hasDrift: false, trend: 'stable' };
|
|
188
|
-
}
|
|
251
|
+
// ─── Gate Implementation ────────────────────────────────────────────
|
|
189
252
|
export class CheckpointGate extends Gate {
|
|
190
253
|
config;
|
|
191
254
|
constructor(config = {}) {
|
|
@@ -196,6 +259,8 @@ export class CheckpointGate extends Gate {
|
|
|
196
259
|
quality_threshold: config.quality_threshold ?? 80,
|
|
197
260
|
drift_detection: config.drift_detection ?? true,
|
|
198
261
|
auto_save_on_failure: config.auto_save_on_failure ?? true,
|
|
262
|
+
ewma_alpha: config.ewma_alpha ?? 0.3,
|
|
263
|
+
drift_drop_threshold: config.drift_drop_threshold ?? 15,
|
|
199
264
|
};
|
|
200
265
|
}
|
|
201
266
|
get provenance() { return 'governance'; }
|
|
@@ -206,7 +271,6 @@ export class CheckpointGate extends Gate {
|
|
|
206
271
|
const failures = [];
|
|
207
272
|
const session = getCheckpointSession(context.cwd);
|
|
208
273
|
if (!session || session.checkpoints.length === 0) {
|
|
209
|
-
// No checkpoints yet, skip
|
|
210
274
|
return [];
|
|
211
275
|
}
|
|
212
276
|
Logger.info(`Checkpoint Gate: ${session.checkpoints.length} checkpoints in session`);
|
|
@@ -220,11 +284,11 @@ export class CheckpointGate extends Gate {
|
|
|
220
284
|
if (lastCheckpoint.qualityScore < (this.config.quality_threshold ?? 80)) {
|
|
221
285
|
failures.push(this.createFailure(`Quality score ${lastCheckpoint.qualityScore}% is below threshold ${this.config.quality_threshold}%`, lastCheckpoint.filesChanged, 'Review recent changes and address quality issues before continuing', 'Quality Below Threshold', undefined, undefined, 'high'));
|
|
222
286
|
}
|
|
223
|
-
// Check 3:
|
|
287
|
+
// Check 3: EWMA-based drift detection (v5)
|
|
224
288
|
if (this.config.drift_detection) {
|
|
225
|
-
const { hasDrift, trend } = detectDrift(session.checkpoints);
|
|
289
|
+
const { hasDrift, trend, ewma, reason } = detectDrift(session.checkpoints, this.config.ewma_alpha ?? 0.3, this.config.drift_drop_threshold ?? 15);
|
|
226
290
|
if (hasDrift && trend === 'degrading') {
|
|
227
|
-
failures.push(this.createFailure(`Quality drift detected: scores are degrading over time`, undefined, 'Agent performance is declining. Consider pausing and reviewing recent work.', 'Quality Drift Detected', undefined, undefined, 'high'));
|
|
291
|
+
failures.push(this.createFailure(`Quality drift detected (EWMA: ${ewma.toFixed(1)}%): ${reason || 'scores are degrading over time'}`, undefined, 'Agent performance is declining. Consider pausing and reviewing recent work.', 'Quality Drift Detected', undefined, undefined, 'high'));
|
|
228
292
|
}
|
|
229
293
|
}
|
|
230
294
|
return failures;
|
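The removed `detectDrift` fitted a least-squares slope to the last five scores, while the v5 version compares the latest score against its EWMA. The sketch below contrasts the two on a single mild dip, using the thresholds visible in this diff (slope < -2, α = 0.3, drop threshold 15). The function names and the score series are hypothetical.

```typescript
// Least-squares slope over the last five scores (mirrors the removed logic).
function regressionSlope(scores: number[]): number {
  const recent = scores.slice(-5);
  const n = recent.length;
  const sumX = (n * (n - 1)) / 2;
  const sumY = recent.reduce((a, b) => a + b, 0);
  const sumXY = recent.reduce((sum, y, x) => sum + x * y, 0);
  const sumX2 = (n * (n - 1) * (2 * n - 1)) / 6;
  return (n * sumXY - sumX * sumY) / (n * sumX2 - sumX * sumX);
}

// EWMA over the whole series, seeded with the first score.
function finalEwma(scores: number[], alpha: number): number {
  return scores
    .slice(1)
    .reduce((prev, s) => alpha * s + (1 - alpha) * prev, scores[0]);
}

// Signal 1 from the diff: latest score sits well below the smoothed average.
function suddenDrop(scores: number[], alpha = 0.3, dropThreshold = 15): boolean {
  const latest = scores[scores.length - 1];
  return latest < finalEwma(scores, alpha) - dropThreshold;
}

const oneDip = [90, 90, 90, 90, 75]; // made-up series with one mild dip
const flaggedByRegression = regressionSlope(oneDip) < -2; // slope is -3
const flaggedByEwma = suddenDrop(oneDip); // EWMA is about 85.5, gap about 10.5
```

On this series the old slope rule would report degradation, while the EWMA rule stays quiet until the gap exceeds the configured threshold, which matches the "one bad checkpoint no longer tanks the whole trend" note in the header comment.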
|
package/dist/gates/checkpoint.test.js
CHANGED
|
@@ -72,11 +72,14 @@ describe('CheckpointGate', () => {
|
|
|
72
72
|
});
|
|
73
73
|
describe('drift detection', () => {
|
|
74
74
|
it('should detect quality degradation', () => {
|
|
75
|
-
// Record
|
|
75
|
+
// Record checkpoints with a sharp quality drop.
|
|
76
|
+
// EWMA with α=0.3: after 95, 90, the EWMA ≈ 93.5.
|
|
77
|
+
// A sudden drop to 55 pulls the EWMA down to ≈ 82, a gap of ~27 that still exceeds the
|
|
78
|
+
// drift_drop_threshold of 15, triggering the "Sudden quality drop" warning.
|
|
76
79
|
recordCheckpoint(testDir, 20, [], 'Start', 95);
|
|
77
80
|
recordCheckpoint(testDir, 40, [], 'Middle', 90);
|
|
78
|
-
const result = recordCheckpoint(testDir, 60, [], 'Decline',
|
|
79
|
-
expect(result.warnings.some(w => w.includes('
|
|
81
|
+
const result = recordCheckpoint(testDir, 60, [], 'Decline', 55);
|
|
82
|
+
expect(result.warnings.some(w => w.includes('Sudden quality drop'))).toBe(true);
|
|
80
83
|
});
|
|
81
84
|
it('should not flag stable quality', () => {
|
|
82
85
|
recordCheckpoint(testDir, 20, [], 'Start', 85);
|
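The test above exercises only the sudden-drop signal. The second signal described in `checkpoint.js`, a gradual decline of the EWMA itself, can be sketched as follows. The helper names and the score series are hypothetical; the index arithmetic mirrors the `stopAt = checkpoints.length - 5` line in the diff.

```typescript
// EWMA value after each checkpoint, seeded with the first score.
function ewmaSeries(scores: number[], alpha: number): number[] {
  const series: number[] = [];
  scores.forEach((score, i) => {
    series.push(i === 0 ? score : alpha * score + (1 - alpha) * series[i - 1]);
  });
  return series;
}

// Signal 2: the EWMA has fallen by more than the threshold relative to
// its value at index (length - 5). Needs at least six checkpoints.
function gradualDecline(
  scores: number[],
  alpha = 0.3,
  dropThreshold = 15,
): boolean {
  if (scores.length < 6) return false;
  const series = ewmaSeries(scores, alpha);
  const now = series[series.length - 1];
  const earlier = series[series.length - 5];
  return now < earlier - dropThreshold;
}

const slowSlide = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]; // made-up scores
// No single checkpoint falls 15 points below its own EWMA...
const noStepLooksSudden = ewmaSeries(slowSlide, 0.3).every(
  (ewma, i) => slowSlide[i] >= ewma - 15,
);
// ...but the smoothed average has still slid far enough to raise the alarm.
const declineDetected = gradualDecline(slowSlide);
```

A steady five-point slide never trips the sudden-drop rule, so a series like this one is only caught by the second signal.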
|
package/dist/gates/dependency.d.ts
CHANGED
|
@@ -1,7 +1,46 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Dependency Guardian Gate (v2)
|
|
3
|
+
*
|
|
4
|
+
* Detects dependency issues that AI agents commonly introduce:
|
|
5
|
+
* 1. Forbidden dependencies (existing) — packages banned by project standards
|
|
6
|
+
* 2. Unused dependencies (NEW) — installed but never imported
|
|
7
|
+
* 3. Heavy alternatives (NEW) — bloated packages with lighter alternatives
|
|
8
|
+
* 4. Duplicate purpose (NEW) — multiple packages solving the same problem
|
|
9
|
+
*
|
|
10
|
+
* AI agents are particularly prone to:
|
|
11
|
+
* - Adding packages they've seen in training data without checking existing deps
|
|
12
|
+
* - Using heavy/popular packages when lighter alternatives exist
|
|
13
|
+
* - Installing multiple HTTP clients, date libs, etc. across different sessions
|
|
14
|
+
*
|
|
15
|
+
* @since v2.0.0 (forbidden deps)
|
|
16
|
+
* @since v5.1.0 (unused, heavy alternatives, duplicate purpose)
|
|
17
|
+
*/
|
|
1
18
|
import { Failure, Config } from '../types/index.js';
|
|
2
19
|
import { Gate, GateContext } from './base.js';
|
|
3
20
|
export declare class DependencyGate extends Gate {
|
|
4
21
|
private config;
|
|
5
22
|
constructor(config: Config);
|
|
6
23
|
run(context: GateContext): Promise<Failure[]>;
|
|
24
|
+
/**
|
|
25
|
+
* Detect dependencies listed in package.json but never imported.
|
|
26
|
+
* Scans all source files for import/require statements.
|
|
27
|
+
*
|
|
28
|
+
* Allowlist handles side-effect imports like:
|
|
29
|
+
* - @types/* (TypeScript type packages)
|
|
30
|
+
* - polyfills (core-js, regenerator-runtime)
|
|
31
|
+
* - PostCSS/Babel plugins (used in config files, not source)
|
|
32
|
+
*/
|
|
33
|
+
private detectUnusedDeps;
|
|
34
|
+
/**
|
|
35
|
+
* Detect heavy/bloated packages that have lighter modern alternatives.
|
|
36
|
+
* AI agents tend to reach for the most popular (heaviest) package
|
|
37
|
+
* because that's what they've seen most in training data.
|
|
38
|
+
*/
|
|
39
|
+
private detectHeavyAlternatives;
|
|
40
|
+
/**
|
|
41
|
+
* Detect when multiple packages serve the same purpose.
|
|
42
|
+
* This is a classic AI drift symptom — different sessions install different
|
|
43
|
+
* packages for the same task (e.g., axios in one PR, got in another).
|
|
44
|
+
*/
|
|
45
|
+
private detectDuplicatePurpose;
|
|
7
46
|
}
|
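The declaration above documents `detectDuplicatePurpose` only through its comment. One way such a check could work is sketched below, assuming a static table that groups packages by purpose. The table contents, the `PackageManifest` type, and the function name are all hypothetical; the package's real lists are not shown in this diff.

```typescript
// Hypothetical purpose groups for illustration only.
const PURPOSE_GROUPS: Record<string, string[]> = {
  'http client': ['axios', 'got', 'node-fetch'],
  'date library': ['moment', 'dayjs', 'date-fns'],
};

interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns each purpose for which two or more listed packages are installed.
function findDuplicatePurpose(
  manifest: PackageManifest,
): Record<string, string[]> {
  const installed = new Set([
    ...Object.keys(manifest.dependencies ?? {}),
    ...Object.keys(manifest.devDependencies ?? {}),
  ]);
  const duplicates: Record<string, string[]> = {};
  for (const [purpose, packages] of Object.entries(PURPOSE_GROUPS)) {
    const present = packages.filter((name) => installed.has(name));
    if (present.length > 1) duplicates[purpose] = present;
  }
  return duplicates;
}

// Made-up manifest: two HTTP clients, one date library.
const overlapping = findDuplicatePurpose({
  dependencies: { axios: '^1.6.0', got: '^14.0.0', dayjs: '^1.11.0' },
});
// overlapping is { 'http client': ['axios', 'got'] }
```

A lookup table keeps the check deterministic, in line with the gate's stated design, at the cost of only catching overlaps someone has already catalogued.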