npm - pullfrog - Versions diffs - 0.1.5 → 0.1.6 - Mend

pullfrog 0.1.5 → 0.1.6


Files changed (8)

package/dist/agents/sessionLabeler.d.ts +38 -18
package/dist/cli.mjs +294 -107
package/dist/index.js +282 -95
package/dist/internal.js +4 -2
package/dist/utils/normalizeEnv.d.ts +21 -1
package/dist/utils/subprocess.d.ts +40 -0
package/dist/utils/timer.d.ts +11 -0
package/package.json +1 -1

package/dist/internal.js CHANGED

@@ -601,6 +601,8 @@ For simple, well-defined tasks, skip the plan phase and go straight to build.`
    - **4\u20135 lenses (high-stakes subsystem touches)** \u2014 any billing/payments change (billing-subsystem + correctness + security + operational-readiness); new auth flow (auth-subsystem + correctness + security + test-integrity); schema migration (schema-migration-subsystem + correctness + operational-readiness + impact); cross-subsystem PR that touches billing AND auth AND schema (one subsystem lens per domain + correctness)
    - **6+ lenses** \u2014 almost always a smell; you're either covering overlapping ground or this PR should have been split. push back via the review body rather than expanding lens count.
+   **lens-add discipline.** Each lens needs to clear a specific bar before you dispatch it: name the concrete failure mode this lens would catch *that the diff plausibly introduces*, in one sentence. "Could apply", "good to have", "for completeness" do not qualify. If you can't name what the lens is going to find, drop it. The "when unsure, treat as non-trivial" rule above is for the trivial-vs-non-trivial gate at step 3 \u2014 it does not license expanding lens count without articulated risk. Every extra lens adds wall-time, log noise, and pulls subagent attention onto speculative angles, which biases the final review toward bloat-shaped findings.
    lenses come in two flavors, and you can mix them:
    - **themed lenses** \u2014 a perspective applied across the whole diff (correctness, security, user-journey, performance, etc.).
    - **subsystem lenses** \u2014 a domain-scoped frame for high-stakes subsystems the PR touches (e.g. "the auth lens", "the billing lens", "the schema-migration lens"). a subsystem lens is "review the PR specifically for what could go wrong in this subsystem" and naturally combines theme + scope. **for high-stakes domains, lead with the subsystem lens rather than the generic themed equivalent** \u2014 "billing-subsystem" outperforms "correctness on billing code" because the framing primes the subagent to remember domain-specific failure modes (double-charges, refund races, currency rounding, dispute flows) the generic lens misses.
@@ -608,7 +610,7 @@ For simple, well-defined tasks, skip the plan phase and go straight to build.`
    starter menu (combine, omit, or invent your own):
    - **correctness & invariants** \u2014 bugs, races, error handling, edge cases, state-machine boundaries
    - **impact** \u2014 when the PR removes features, deletes exports, renames identifiers, or changes architectural patterns: stale references in code, tests, docs (\`docs/\`, \`wiki/\`), comments, configs, UI
-   - **research-validated assumptions** \u2014 third-party API contracts, SDK semantics, framework directives, version-gated behavior. the subagent must verify load-bearing claims via web search and quote source URLs.
+   - **research-validated assumptions** \u2014 third-party API contracts, SDK semantics, framework directives, version-gated behavior. **only pick when the PR's correctness depends on the contract behaving a specific way** \u2014 not when the API is merely used. An idempotency key as a backstop, a timeout as a hint, a retry as belt-and-suspenders: not load-bearing, skip this lens. The bar is "if the third-party contract differs from what the diff assumes, the PR is incorrect." When dispatched, the subagent must verify load-bearing claims via web search and quote source URLs.
    - **security** \u2014 new endpoints, authZ, input validation, secrets handling, replay/CSRF/injection, cross-tenant isolation
    - **user-journey** \u2014 UX-touching flows: walk through happy path and failure modes as a user
    - **operational readiness** \u2014 observability, alerting, migrations (forward + rollback), feature flags, on-call burden
@@ -697,7 +699,7 @@ ${PR_SUMMARY_FORMAT}`
    "Looks trivial but isn't" (do NOT skip \u2014 same anti-patterns as Review mode): 1-line changes to SQL/regex/auth/billing/permissions/signature-verification code; flipping feature-flag defaults or retry/timeout constants; money/tax/HTTP-method/redirect changes; tightening or loosening a comparison operator; mixed diffs with a semantic line buried in formatting.
    When unsure, treat as non-trivial.
-   otherwise pick lenses by where the new commits concentrate risk \u2014 **there's no fixed count**, same calibration as Review mode (1 lens for pure refactor / isolated fix; 2\u20133 for typical features; 4\u20135 for high-stakes subsystem touches; 6+ is a smell). lens framing follows Review mode: themed lenses (correctness & invariants, impact when new commits remove/rename/deprecate things, research-validated assumptions, security, user-journey, operational readiness, integration & cross-cutting, test integrity, performance, holistic) and subsystem lenses (auth, billing, schema migration, etc.) \u2014 for high-stakes domains lead with the subsystem lens rather than the generic themed equivalent.
+   otherwise pick lenses by where the new commits concentrate risk \u2014 **there's no fixed count**, same calibration as Review mode (1 lens for pure refactor / isolated fix; 2\u20133 for typical features; 4\u20135 for high-stakes subsystem touches; 6+ is a smell). same **lens-add discipline** as Review mode applies: each lens needs to name the concrete failure mode it would catch *that the new commits plausibly introduce* \u2014 "could apply" doesn't qualify, drop it. **research-validated assumptions** specifically: only pick when the new commits' correctness depends on a third-party contract behaving a specific way; merely using an API doesn't qualify. lens framing follows Review mode: themed lenses (correctness & invariants, impact when new commits remove/rename/deprecate things, research-validated assumptions, security, user-journey, operational readiness, integration & cross-cutting, test integrity, performance, holistic) and subsystem lenses (auth, billing, schema migration, etc.) \u2014 for high-stakes domains lead with the subsystem lens rather than the generic themed equivalent.
    dispatch one \`${REVIEWER_AGENT_NAME}\` subagent per lens \u2014 its baked-in system prompt enforces the non-mutative + non-recursive contract (read-only file/search/web tools and read-only MCP queries; no writes, shell side effects, state-changing MCP calls, or nested subagent dispatch). dispatch them in a **single assistant turn with multiple parallel subagent calls** (serial dispatch collapses the fan-out). if a subagent errors out, times out, or returns nothing usable, retry once with the same lens; if it still fails, proceed with partial coverage and note the missing lens in the review body \u2014 do not skip step 5 entirely on a single subagent failure. each subagent gets:
    - the diff scope (incremental diff path if available, full diff otherwise). do NOT tell them to skip pre-existing issues \u2014 that suppresses regressions the new commits amplified; the "issues must be NEW" filter lives at aggregation time (step 6), not in the subagent prompt

package/dist/utils/normalizeEnv.d.ts CHANGED

@@ -1,3 +1,22 @@
+/**
+ * Trim surrounding whitespace from a sensitive value and register it as a
+ * GitHub Actions log mask. Trailing newlines from terminal-copy paste are a
+ * common footgun: the value travels through GH Actions logs and any tool
+ * that re-emits parts of it leaks the unmasked tail. Trimming canonicalises
+ * the value so the mask matches exactly what downstream tools will print.
+ *
+ * Masking is delegated to `core.setSecret` (not raw `console.log`) so the
+ * toolkit percent-encodes `\r`/`\n`; the runner V2 parser decodes them and
+ * registers the full value plus every non-empty line as separate masks. That
+ * keeps us safe for embedded-newline values (PEMs, kubeconfigs, JSON blobs)
+ * even though they aren't currently used.
+ *
+ * Returns the trimmed value, or `null` when the input was whitespace-only —
+ * callers must leave `process.env` untouched in that case so a misconfigured
+ * value surfaces as a clear "missing key" downstream rather than silently
+ * mutating to the empty string.
+ */
+export declare function sanitizeSecret(key: string, value: string): string | null;
 /**
  * Normalize environment variables to uppercase.
  * This handles case-insensitive env var names (e.g., `anthropic_api_key` -> `ANTHROPIC_API_KEY`).
@@ -5,6 +24,7 @@
  * If there are conflicts (same key with different capitalizations but different values),
  * logs a warning and keeps the uppercase version.
  *
- * Also registers sensitive values as masks in GitHub Actions.
+ * Also trims and masks sensitive values so accidental trailing whitespace
+ * doesn't defeat GitHub Actions log masking.
  */
 export declare function normalizeEnv(): void;
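
The doc comment on `sanitizeSecret` describes a trim-then-mask contract with a `null` return for whitespace-only input. Only the declaration ships in this diff, so the following is a minimal sketch of that contract and not the package's implementation. The name `sanitizeSecretSketch` and the injected `registerMask` callback are assumptions made for illustration; the real function takes `(key, value)` and, per its comment, delegates masking to `core.setSecret`.

```typescript
// Hypothetical sketch of the trim-then-mask contract described above.
// `registerMask` stands in for the masking call (the doc comment names
// `core.setSecret`); it is injected here so the example has no dependencies.
type MaskFn = (value: string) => void;

function sanitizeSecretSketch(value: string, registerMask: MaskFn): string | null {
  // Canonicalise first, so the mask matches what downstream tools will print.
  const trimmed = value.trim();
  if (trimmed.length === 0) {
    // Whitespace-only input: register nothing and signal the caller to leave
    // the environment variable untouched.
    return null;
  }
  registerMask(trimmed);
  return trimmed;
}

// Usage: a value pasted with a trailing newline.
const maskedValues: string[] = [];
const cleanedSecret = sanitizeSecretSketch("sk-test-123\n", (v) => {
  maskedValues.push(v);
});
const blankSecret = sanitizeSecretSketch("  \n", (v) => {
  maskedValues.push(v);
});
```

In this sketch the mask is registered on the trimmed value only, which is the property the comment calls out: the masked string and the string later written to logs are identical.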

package/dist/utils/subprocess.d.ts CHANGED

@@ -14,6 +14,27 @@ export declare function trackChild(options: TrackChildOptions): void;
 export declare function untrackChild(child: ChildProcess): void;
 export declare function setSignalHandler(handler: SignalHandler | null): void;
 export declare function killTrackedChildren(): void;
+/**
+ * Controls what the wrapper retains in memory across the child's lifetime
+ * for the post-hoc `SpawnResult.stdout` / `SpawnResult.stderr` snapshots.
+ *
+ * Streaming callbacks (`onStdout` / `onStderr`) fire regardless — `retain`
+ * only governs the buffered snapshot returned in `SpawnResult`.
+ *
+ * - `"tail"` (default): keep the last `maxRetainedBytes` UTF-16 code units
+ *   of each stream. Once the cap is exceeded, oldest bytes are sliced off
+ *   and the result is prefixed with a `... [N MiB truncated] ...` sentinel.
+ *   Right default for short-lived commands whose failure mode is in their
+ *   final output (git errors, install failures, hook scripts).
+ * - `"none"`: skip the buffer entirely. `SpawnResult.stdout` / `.stderr`
+ *   are empty strings. Use this for long-lived streaming agents that already
+ *   drain via `onStdout` / `onStderr` and never read the buffered snapshot.
+ *
+ * Default cap is 8 MiB — well below V8's ~1 GiB `kMaxLength` so `+= chunk`
+ * can never throw `RangeError: Invalid string length`.
+ */
+export type RetainMode = "tail" | "none";
+export declare const DEFAULT_MAX_RETAINED_BYTES: number;
 export interface SpawnOptions {
     cmd: string;
     args: string[];
@@ -27,6 +48,25 @@ export interface SpawnOptions {
     onStdout?: (chunk: string) => void;
     onStderr?: (chunk: string) => void;
     killGroup?: boolean;
+    retain?: RetainMode;
+    maxRetainedBytes?: number;
+}
+/**
+ * Bounded string accumulator that keeps the tail of appended chunks.
+ * Once the cap is exceeded, oldest bytes are sliced off and `toString()`
+ * prefixes the survivors with a sentinel describing the elided byte count.
+ *
+ * Exported because long-lived agent runtimes (opencode, claude) also
+ * accumulate per-run narration strings independently of the spawn wrapper
+ * and need the same protection against V8's `kMaxLength`.
+ */
+export declare class TailBuffer {
+    private readonly cap;
+    private buffer;
+    private truncatedBytes;
+    constructor(cap: number);
+    append(chunk: string): void;
+    toString(): string;
 }
 export interface SpawnResult {
     stdout: string;
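
The `TailBuffer` declaration above exposes only `append` and `toString`; the body is not part of this diff. As a rough illustration of the tail-retention behaviour its comment describes, here is a sketch under stated assumptions: the class name `TailBufferSketch` is hypothetical, the cap is counted in UTF-16 code units (string `length`), and the sentinel reports code units, not the `N MiB` wording quoted in the `RetainMode` comment.

```typescript
// Hypothetical sketch of a bounded tail accumulator; not the shipped code.
class TailBufferSketch {
  private readonly cap: number;
  private buffer = "";
  private truncatedUnits = 0;

  constructor(cap: number) {
    this.cap = cap;
  }

  append(chunk: string): void {
    this.buffer += chunk;
    const overflow = this.buffer.length - this.cap;
    if (overflow > 0) {
      // Drop the oldest code units so at most `cap` remain.
      // Note: slicing by code unit can split a surrogate pair at the boundary.
      this.buffer = this.buffer.slice(overflow);
      this.truncatedUnits += overflow;
    }
  }

  toString(): string {
    if (this.truncatedUnits === 0) {
      return this.buffer;
    }
    // Assumed sentinel format, for illustration only.
    return `... [${this.truncatedUnits} code units truncated] ...\n${this.buffer}`;
  }
}

// Usage: a cap of 5 keeps only the last five code units.
const tailDemo = new TailBufferSketch(5);
tailDemo.append("hello");
const beforeOverflow = tailDemo.toString();
tailDemo.append(" world");
const afterOverflow = tailDemo.toString();
```

Because the buffer is trimmed on every `append`, its length never exceeds the cap plus one chunk, which is the property that keeps `+= chunk` away from V8's string length limit.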

package/dist/utils/timer.d.ts CHANGED

@@ -4,9 +4,20 @@ export declare class Timer {
     constructor();
     checkpoint(name: string): void;
 }
+/**
+ * Measures wall-clock gap between the last tool_result and the next tool_call,
+ * surfacing it as a "thought for Xs" log when over `THINKING_THRESHOLD`.
+ *
+ * Use one instance per logical session (orchestrator, each subagent) — sharing
+ * a single timer across sessions conflates cross-session interleaving as
+ * thinking time. The optional `formatLine` lets the caller prefix output with
+ * a session label so attribution is visible in the merged log stream.
+ */
 export declare class ThinkingTimer {
     private readonly durationFormatter;
     private lastToolResultTimestamp;
+    private readonly formatLine;
+    constructor(formatLine?: (line: string) => string);
     markToolResult(): void;
     markToolCall(): void;
 }
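
The new `formatLine` constructor parameter lets each session prefix its own "thought for Xs" line. The declaration does not show how the gap is measured or logged, so the sketch below is only one plausible reading. `ThinkingTimerSketch`, the 3-second threshold, and the injected `now` and `log` parameters are all assumptions introduced to make the example deterministic; the real class exposes only `formatLine`.

```typescript
// Hypothetical sketch of a per-session thinking timer; not the shipped code.
const ASSUMED_THRESHOLD_MS = 3000; // stand-in for THINKING_THRESHOLD

class ThinkingTimerSketch {
  private lastToolResultAt: number | null = null;
  private readonly formatLine: (line: string) => string;
  private readonly now: () => number;
  private readonly log: (line: string) => void;

  constructor(
    formatLine: (line: string) => string = (line) => line,
    now: () => number = () => Date.now(),
    log: (line: string) => void = (line) => console.log(line),
  ) {
    this.formatLine = formatLine;
    this.now = now;
    this.log = log;
  }

  markToolResult(): void {
    this.lastToolResultAt = this.now();
  }

  markToolCall(): void {
    if (this.lastToolResultAt === null) {
      return; // no preceding tool_result in this session
    }
    const elapsedMs = this.now() - this.lastToolResultAt;
    this.lastToolResultAt = null;
    if (elapsedMs > ASSUMED_THRESHOLD_MS) {
      this.log(this.formatLine(`thought for ${(elapsedMs / 1000).toFixed(1)}s`));
    }
  }
}

// Usage: one timer per session, with a session label and a fake clock.
let fakeNowMs = 0;
const loggedLines: string[] = [];
const subagentTimer = new ThinkingTimerSketch(
  (line) => `[subagent-1] ${line}`,
  () => fakeNowMs,
  (line) => {
    loggedLines.push(line);
  },
);

subagentTimer.markToolResult();
fakeNowMs = 5000; // 5 s gap: over the assumed threshold, logged
subagentTimer.markToolCall();

subagentTimer.markToolResult();
fakeNowMs = 5100; // 0.1 s gap: under the threshold, not logged
subagentTimer.markToolCall();
```

Giving each session its own instance keeps `lastToolResultAt` from being overwritten by another session's tool result, which is the interleaving problem the doc comment warns about.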

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pullfrog",
-  "version": "0.1.5",
+  "version": "0.1.6",
   "type": "module",
   "bin": {
     "pullfrog": "dist/cli.mjs",