pi-crew 0.5.12 → 0.5.13
- package/CHANGELOG.md +22 -0
- package/README.md +1 -1
- package/docs/pi-crew-v0.5.13-audit-fix-plan.md +75 -0
- package/package.json +1 -1
- package/src/benchmark/benchmark-runner.ts +22 -7
- package/src/utils/bm25-search.ts +28 -10
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,27 @@
 # Changelog
 
+## [0.5.13] — Round 18 Audit Fixes (2026-06-02)
+
+### Phase 1: Switch to execFileSync (HIGH security)
+- `src/benchmark/benchmark-runner.ts` — Replaced `execSync` with `execFileSync(program, args)`. This prevents shell parsing of command strings, even if `validateCommand` is bypassed.
+- `validateCommand` retained as defense-in-depth (blocks shell metacharacters).
+- New `splitCommand()` helper safely splits validated commands.
+
+### Phase 2: Precompute document frequency (MEDIUM performance)
+- `src/utils/bm25-search.ts` — `BM25Search.df()` is now precomputed once in the constructor via `precomputeDocumentFrequencies()`. Lookup is O(1) via `dfCache: Map<term, number>`.
+- Per-search complexity: O(Q * N) instead of O(Q² * N²).
+
+### Phase 3+4: Test coverage for 3 untested modules
+- 15 tests in `test/unit/bm25-search.test.ts`
+- 15 tests in `test/unit/scan-cache.test.ts`
+- 20 tests in `test/unit/benchmark.test.ts`
+- **Total: 50 new tests**
+
+### Tests
+- 2352/2352 pass (was 2313 in v0.5.12; +39 net)
+- 50 new tests across 3 new test files
+- TypeScript: 0 errors
+
 ## [0.5.12] — Round 17 Audit Fixes (2026-06-02)
 
 ### Phase 1: Signal Handler Stacking (HIGH)
package/README.md
CHANGED
package/docs/pi-crew-v0.5.13-audit-fix-plan.md
ADDED
@@ -0,0 +1,75 @@
+# pi-crew v0.5.13 Audit Fix Plan (Round 18)
+
+## Source Verification Findings
+
+I read the following files and identified 4 confirmed real issues:
+
+### Issue 1: `benchmark-runner.ts` uses `execSync` instead of `execFileSync` (HIGH security)
+**File**: `src/benchmark/benchmark-runner.ts:4,110,119,128`
+
+```ts
+import { execSync } from "child_process";
+// ...
+output = execSync(judge.command, { ... });
+```
+
+`execSync(command, ...)` invokes a shell to parse the command, even when `validateCommand` is run first. The `validateCommand` function only checks for shell metacharacters in the *arguments* (after the first space), but:
+- It does not escape/quote arguments safely
+- A bug in `validateCommand` or a clever input could bypass
+- `cwd: process.cwd()` could be inherited from a parent context
+- Best practice: use `execFileSync` with `command.split(' ')[0]` and the rest as args, so no shell is invoked
+
+**Fix**: Switch to `execFileSync` with command split into program + args. Keep `validateCommand` as defense-in-depth but no longer rely on it alone.
+
+### Issue 2: `BM25Search.df()` is O(N) per call and called inside the search loop (MEDIUM performance)
+**File**: `src/utils/bm25-search.ts:47-65, 75-104`
+
+The `df()` function is called for every query term in the search loop, and itself iterates over all documents. This means:
+- For a query with `Q` terms and `N` documents, `df()` is called `Q * N` times
+- Each `df()` call iterates over `N` documents and `field_count` fields
+- Total complexity: **O(Q² * N² * field_count)**
+
+This is quadratic when it should be linear. Document frequencies don't change between `search()` calls for the same document set, so they should be cached.
+
+**Fix**: Precompute `df` once in the constructor (or lazily on first search) and cache it as a Map<term, number>. Re-compute only when documents change.
+
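The two counts in the Issue 2 bullets can be multiplied out as a check. This is a worked sketch that takes the plan's own figures at face value; $F$ is shorthand introduced here for `field_count`, and the "after" line assumes the scoring loop still visits every document for every query term.

$$
\begin{aligned}
\text{before: } & \underbrace{Q \cdot N}_{\text{calls to df()}} \times \underbrace{N \cdot F}_{\text{work per call}} = Q \cdot N^{2} \cdot F \\
\text{after: } & \underbrace{Q \cdot N}_{\text{calls to df()}} \times \underbrace{1}_{\text{map lookup}} = Q \cdot N
\end{aligned}
$$

On these figures the pre-fix cost is quadratic in $N$ but only linear in $Q$, so the $Q^2$ factor in the stated bound does not follow from the bullets as written. The one-time constructor cost of building the map is not included in either line.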
+### Issue 3: `SharedScanCache.set()` LRU eviction is by insertion order, not access order (LOW)
+**File**: `src/utils/scan-cache.ts:62-69`
+
+The eviction policy evicts the *oldest inserted* entry, not the *least recently accessed*. So if a frequently-updated entry is inserted, then later entries are inserted, the frequently-updated one (which is the *same* Map key) won't be moved to the end of the insertion order — it stays at the head and is the next to be evicted.
+
+This is a minor issue because:
+- In practice, scan cache entries are short-lived (TTL=1s by default)
+- The eviction only matters when entries hit the `maxEntries` cap
+
+**Fix**: Either document the limitation or implement proper LRU. For now, document it.
+
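For reference, the "proper LRU" alternative mentioned in Issue 3 is commonly built on the same `Map` by deleting and re-inserting a key whenever it is read or written. The sketch below is illustrative only: `LruCache` and its methods are hypothetical names, it is not the package's `SharedScanCache`, and it leaves out the TTL handling the plan mentions.

```typescript
// Hypothetical class for illustration; not the SharedScanCache from the package.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Delete and re-insert so the key moves to the end of iteration order.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    // Deleting first matters: Map.set on an existing key keeps its old position.
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is now the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  keys(): K[] {
    return [...this.entries.keys()];
  }
}

const cache = new LruCache<string, number>(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a"); // touch "a", so "b" becomes the eviction candidate
cache.set("c", 3); // over the cap: evicts "b"
console.log(cache.keys()); // ["a", "c"]
```

With insertion-order eviction, the same sequence would have dropped "a" instead, which is the behavior Issue 3 describes.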
+### Issue 4: `bm25-search.ts` has no tests (LOW coverage)
+**File**: `test/unit/bm25-search.test.ts` — does not exist
+
+BM25Search is a non-trivial search algorithm. Currently zero test coverage. Should add tests for:
+- Basic search returns relevant results
+- Field weighting affects ranking
+- minScore threshold
+- limit cap
+- Empty query returns empty results
+- df() precomputation (after Issue 2 fix)
+
+## Plan (4 phases)
+
+### Phase 1: Switch `benchmark-runner.ts` to `execFileSync`
+- Replace `execSync(judge.command, ...)` with `execFileSync(program, args, ...)`
+- Keep `validateCommand` as defense-in-depth
+- Add new tests for benchmark-runner
+
+### Phase 2: Precompute `df` in BM25Search
+- Cache `df` map per corpus
+- Invalidate when documents change (or recompute on construction)
+- Add tests to verify behavior unchanged
+
+### Phase 3: Add tests for scan-cache, benchmark, bm25-search
+- `test/unit/scan-cache.test.ts`
+- `test/unit/benchmark.test.ts`
+- `test/unit/bm25-search.test.ts`
+
+### Phase 4: Release v0.5.13
package/package.json
CHANGED
package/src/benchmark/benchmark-runner.ts
CHANGED
@@ -3,7 +3,7 @@
  * Provides tiered evaluation for workflow tasks.
  */
 
-import {
+import { execFileSync } from "node:child_process";
 
 export interface BenchmarkJudge {
   type: "pytest" | "grep" | "command";
@@ -78,6 +78,16 @@ function validateCommand(command: string): void {
  * Tier 3: command execution
  * Fails fast on first tier failure.
  */
+function splitCommand(command: string): { program: string; args: string[] } {
+  // Naive split on whitespace. validateCommand already rejects shell
+  // metacharacters, so a simple split is safe.
+  const parts = command.trim().split(/\s+/);
+  if (parts.length === 0) {
+    throw new Error("Empty command");
+  }
+  return { program: parts[0]!, args: parts.slice(1) };
+}
+
 export async function runBenchmark(task: BenchmarkTask): Promise<BenchmarkResult> {
   const startTime = Date.now();
   const judgeResults: BenchmarkResult["judgeResults"] = [];
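One edge case in the `splitCommand` helper added in this hunk: `String.prototype.split` returns `[""]` for an empty string, so `parts.length === 0` cannot be true and a blank command would yield `program: ""`. A minimal sketch of a stricter variant follows; `splitCommandStrict` is a hypothetical name and is not part of the package.

```typescript
// Hypothetical variant of the splitCommand helper shown in the hunk above.
function splitCommandStrict(command: string): { program: string; args: string[] } {
  // filter(Boolean) drops the single "" that split() returns for blank input.
  const parts = command.trim().split(/\s+/).filter(Boolean);
  const [program, ...args] = parts;
  if (program === undefined) {
    throw new Error("Empty command");
  }
  return { program, args };
}

// A blank string still splits into one (empty) element, so its length is 1.
console.log("   ".trim().split(/\s+/).length); // 1

const parsed = splitCommandStrict("pytest -q tests/unit");
console.log(parsed.program, parsed.args); // pytest [ '-q', 'tests/unit' ]
```

Whitespace splitting also has no notion of quoting, so an argument that itself contains a space cannot be expressed with either version.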
@@ -88,10 +98,13 @@ export async function runBenchmark(task: BenchmarkTask): Promise<BenchmarkResult
     let output: string | undefined;
 
     if (judge.type === "pytest" && judge.command) {
-      // Validate command before execution
+      // Validate command before execution (defense-in-depth)
       validateCommand(judge.command);
+      // Use execFileSync to avoid shell parsing. validateCommand
+      // already rejects metacharacters, so a simple split is safe.
+      const { program, args } = splitCommand(judge.command);
       // Tier 1: pytest - fast deterministic check
-      output =
+      output = execFileSync(program, args, {
         timeout: 5000,
         encoding: "utf-8",
         cwd: process.cwd(),
@@ -99,20 +112,22 @@ export async function runBenchmark(task: BenchmarkTask): Promise<BenchmarkResult
       // Look for pytest summary line with passed count
       passed = output.includes("passed");
     } else if (judge.type === "grep" && judge.pattern && judge.command) {
-      // Validate command before execution
+      // Validate command before execution (defense-in-depth)
       validateCommand(judge.command);
+      const { program, args } = splitCommand(judge.command);
       // Tier 2: grep pattern matching
-      output =
+      output = execFileSync(program, args, {
         timeout: 5000,
         encoding: "utf-8",
         cwd: process.cwd(),
       });
       passed = output.includes(judge.pattern);
     } else if (judge.type === "command" && judge.command) {
-      // Validate command before execution
+      // Validate command before execution (defense-in-depth)
       validateCommand(judge.command);
+      const { program, args } = splitCommand(judge.command);
       // Tier 3: command execution
-      output =
+      output = execFileSync(program, args, {
         timeout: 10000,
         encoding: "utf-8",
         cwd: process.cwd(),
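The property these hunks rely on is that `execFileSync(program, args)` hands each array element to the child process as one literal argument, with no shell to interpret `;`, `|` or `$()`. The sketch below shows that in isolation. It is not taken from the package: it runs the current Node binary (`process.execPath`) purely so the example needs no external program, and `hostile` is a made-up input.

```typescript
import { execFileSync } from "node:child_process";

// A hostile-looking argument: a shell would treat ";" as a command separator.
const hostile = "hello; echo INJECTED";

// Each element of the args array becomes exactly one argv entry in the child.
// With `node -e <script> <arg>`, the extra argument is visible to the script
// as process.argv[1], and the script writes it back unchanged.
const echoed = execFileSync(
  process.execPath,
  ["-e", "process.stdout.write(process.argv[1])", hostile],
  { encoding: "utf-8", timeout: 5000 },
);

// The metacharacters came back as plain text; nothing after ";" was executed.
console.log(echoed === hostile);
```

Passing the same string to `execSync` as part of a command line would instead let the shell split it at the `;`, which is the risk Phase 1 of the changelog describes.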
package/src/utils/bm25-search.ts
CHANGED
@@ -25,6 +25,13 @@ export class BM25Search<T extends SearchDocument> {
   private readonly b: number;
   private readonly docLenMap: Map<string, number>;
   private readonly N: number;
+  /**
+   * Precomputed document frequency per term. Cached at construction time
+   * to avoid O(N) recomputation on every search() call. The cache is
+   * immutable for a given document corpus, so it's safe to share across
+   * search() invocations.
+   */
+  private readonly dfCache: Map<string, number>;
 
   constructor(documents: T[], fieldWeights: Record<string, number> = {}, config: BM25Config = {}) {
     this.documents = documents;
@@ -34,6 +41,7 @@ export class BM25Search<T extends SearchDocument> {
     this.N = documents.length;
 
     this.docLenMap = new Map();
+    this.dfCache = new Map();
 
     for (const doc of documents) {
       const fieldValues = Object.values(doc.fields).join(" ");
@@ -43,26 +51,36 @@ export class BM25Search<T extends SearchDocument> {
 
     const totalLen = [...this.docLenMap.values()].reduce((a, b) => a + b, 0);
     this.avgDocLen = totalLen / this.N || 1;
+
+    // Precompute df for all terms in the corpus. We do this once instead
+    // of on-demand to avoid the O(Q * N * field_count) cost per search call.
+    this.precomputeDocumentFrequencies();
   }
 
   /**
-   *
-   *
+   * Build a map of term -> document frequency. O(N * avg_terms * field_count).
+   * Called once in the constructor.
    */
-  private
-    const termLower = term.toLowerCase();
-    let count = 0;
+  private precomputeDocumentFrequencies(): void {
     for (const doc of this.documents) {
       for (const field of Object.keys(this.fieldWeights)) {
         const text = (doc.fields[field] ?? "").toLowerCase();
-        //
-
-
-
+        // Extract unique terms via split on whitespace
+        const terms = new Set(text.split(/\s+/).filter(Boolean));
+        for (const term of terms) {
+          if (term.length === 0) continue;
+          this.dfCache.set(term, (this.dfCache.get(term) ?? 0) + 1);
         }
       }
     }
-
+  }
+
+  /**
+   * Get document frequency for a term. Returns the precomputed value.
+   * O(1) lookup.
+   */
+  private df(term: string): number {
+    return this.dfCache.get(term.toLowerCase()) ?? 0;
   }
 
   search(query: string, options?: { limit?: number; minScore?: number }): SearchResult<T>[] {
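The precompute-then-look-up pattern from this hunk can be sketched on its own as follows. `Doc` and `buildDocumentFrequencies` are hypothetical names introduced for illustration. One deliberate difference from the hunk: as written there, the loop increments once per (document, field) pair, so a term present in two fields of one document adds 2, whereas this sketch counts each document at most once per term. Whether the per-field count is intended is not stated in the diff.

```typescript
// Hypothetical minimal document shape; the package's SearchDocument may differ.
interface Doc {
  id: string;
  fields: Record<string, string>;
}

function buildDocumentFrequencies(docs: Doc[], fieldNames: string[]): Map<string, number> {
  const df = new Map<string, number>();
  for (const doc of docs) {
    // Collect each term once per document, across all indexed fields.
    const seen = new Set<string>();
    for (const field of fieldNames) {
      const text = (doc.fields[field] ?? "").toLowerCase();
      for (const term of text.split(/\s+/)) {
        if (term) seen.add(term);
      }
    }
    for (const term of seen) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }
  return df;
}

const docs: Doc[] = [
  { id: "1", fields: { title: "Fast search", body: "search with bm25" } },
  { id: "2", fields: { title: "Cache notes", body: "scan cache" } },
];

// Built once; every later lookup is a single Map.get.
const df = buildDocumentFrequencies(docs, ["title", "body"]);
console.log(df.get("search")); // 1: both occurrences are in document 1
console.log(df.get("cache")); // 1: both occurrences are in document 2
console.log(df.get("missing") ?? 0); // 0
```

Because the map depends only on the corpus, it stays valid across searches and needs rebuilding only when the document set changes, as the plan's Phase 2 notes.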