loki-mode 7.41.5 → 7.42.0
- package/README.md +18 -1
- package/SKILL.md +2 -2
- package/VERSION +1 -1
- package/autonomy/completion-council.sh +22 -13
- package/autonomy/hooks/migration-hooks.sh +131 -7
- package/autonomy/loki +54 -43
- package/dashboard/__init__.py +1 -1
- package/dashboard/server.py +102 -0
- package/docs/INSTALLATION.md +70 -1
- package/loki-ts/dist/loki.js +2 -2
- package/mcp/__init__.py +1 -1
- package/mcp/lsp_proxy.py +274 -89
- package/package.json +1 -1
- package/plugins/loki-mode/.claude-plugin/plugin.json +1 -1
- package/references/core-workflow.md +7 -0
- package/references/quality-control.md +6 -0
- package/skills/agents.md +1 -0
package/README.md
CHANGED
|
@@ -29,7 +29,7 @@ _The free, source-available autonomous coding agent by [Autonomi](https://www.au
|
|
|
29
29
|
- **Production quality built in** -- 11 quality gates (`skills/quality-gates.md`), blind 3-reviewer code review (`run.sh:run_code_review()`), anti-sycophancy checks
|
|
30
30
|
- **Standalone verification: `loki verify`** -- Run Loki's deterministic gates (build, tests, static analysis, secret scan, dependency audit) against any branch or PR diff, including code written by other agents or humans. CI-ready exit codes (0 VERIFIED, 1 CONCERNS, 2 BLOCKED), machine-readable evidence at `.loki/verify/evidence.json`. Inconclusive evidence is never reported as VERIFIED (v7.27.0).
|
|
31
31
|
- **Living spec and pre-build interrogation** -- `loki spec` locks a spec and detects drift deterministically (`spec.lock`, `drift-report.json`, and a `SPEC_DRIFT` finding in `loki verify` with CI exit codes), so you can tell when the build diverges from what was agreed. `loki grill` runs a Devil's-Advocate interrogation of the spec before you build, surfacing gaps and contradictions early (v7.28.0).
|
|
32
|
-
- **Mid-flight model switching
|
|
32
|
+
- **Mid-flight model switching** -- switch the model a live run uses from the dashboard (applies at the next iteration, current run only). A Fable tier lever exists in the CLI, dashboard, and override paths, but Claude Fable 5 is not yet available at the API, so selecting Fable currently collapses to Opus at every dispatch chokepoint and the `loki plan` quote reflects Opus accordingly. For every model lever (session pin, mid-flight override, architect pass) and every `LOKI_MAX_TIER` path, the `loki plan` quote, the dashboard's reported model, and the actual dispatched model agree, with the ceiling enforced (v7.31.0; Fable-to-Opus collapse v7.39.1).
|
|
33
33
|
- **A calmer CLI** -- the help surface is ~20 grouped workflow entries instead of a 70-command wall; merged commands live on as aliases that forward byte-identically with a one-line stderr pointer, so no script breaks (v7.31.0).
|
|
34
34
|
- **Guided first build: `loki quickstart`** -- four quick questions (setup check, one-line idea, template pick, plan review) and your build starts; pressing Enter through every step builds the sample Todo app. The plan step quotes the real cost/time estimate before anything is spent, and `loki demo` now confirms its estimate the same way. If no AI provider CLI is installed, Loki offers to install Claude Code (consent-gated, interactive terminals only) (v7.29.0).
|
|
35
35
|
- **Live App Preview** -- The dashboard embeds the locally-running app in an iframe so you can interact with it immediately during a build. Use `loki preview` (alias `loki open`) to print the URL and open it in your browser. Local-first: no hosted service, no vendor lock (v7.24.0).
|
|
@@ -391,6 +391,23 @@ Run `loki --help` for all options. Full reference: [CLI Reference](wiki/CLI-Refe
|
|
|
391
391
|
|
|
392
392
|
---
|
|
393
393
|
|
|
394
|
+
<details>
|
|
395
|
+
<summary><strong>Configuration env vars (intelligent defaults, opt-out knobs)</strong></summary>
|
|
396
|
+
|
|
397
|
+
Loki Mode's accuracy and autonomy behaviors are default-on. Each is an opt-out escape hatch, not a setting you have to discover. The most relevant knobs from the v7.41.x accuracy/autonomy hardening:
|
|
398
|
+
|
|
399
|
+
| Env var | Default | Effect |
|
|
400
|
+
|---------|---------|--------|
|
|
401
|
+
| `LOKI_REVIEW_INCONCLUSIVE_BLOCK` | `1` | Blocks completion when a code-review round returns zero usable verdicts (an all-empty review proves nothing). Set `0` to record the inconclusive result without blocking. |
|
|
402
|
+
| `LOKI_COMPLETION_TEST_CAPTURE` | `1` | Captures fresh test results before the verified-completion evidence gate evaluates. Set `0` to skip the pre-gate capture. |
|
|
403
|
+
| `LOKI_AUTO_DOCS` | `true` | Generates the `.loki/docs/` suite before the documentation gate scores it (bounded: once per run when docs are missing, and again only when >10 commits stale). Set `false` to opt out. |
|
|
404
|
+
| `LOKI_CAVEMAN` | `1` (on) | Output-token compressor for free-form generation only (never trust-gate subcalls). Set `0` to opt out. |
|
|
405
|
+
| `LOKI_CAVEMAN_LEVEL` | inferred | Compression level for the compressor. Auto-inferred per invocation from the run's RARV tier; set explicitly (`lite` / `full` / `ultra`) to override the inference. |
|
|
406
|
+
|
|
407
|
+
This is a subset. See the [wiki](wiki/Home.md) for the full env-var reference and the RARV-C closure knobs (`LOKI_INJECT_FINDINGS`, `LOKI_OVERRIDE_COUNCIL`, `LOKI_AUTO_LEARNINGS`, `LOKI_HANDOFF_MD`).
|
|
408
|
+
|
|
409
|
+
</details>
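As a rough illustration of the opt-out pattern in the table above, here is a minimal Python sketch of how a wrapper script might interpret these variables. The defaults are copied from the table; the parsing rules (treating `0` and `false` as the only opt-out values) and the helper names `flag_enabled` and `caveman_level` are assumptions for illustration, not Loki's actual implementation.

```python
import os

# Defaults taken from the README table; everything else here is hypothetical.
DEFAULTS = {
    "LOKI_REVIEW_INCONCLUSIVE_BLOCK": "1",
    "LOKI_COMPLETION_TEST_CAPTURE": "1",
    "LOKI_AUTO_DOCS": "true",
    "LOKI_CAVEMAN": "1",
}


def flag_enabled(name, env=None):
    """Return True unless the variable is explicitly set to an opt-out value.

    Assumption: '0' and 'false' (case-insensitive) are the opt-out values.
    """
    env = os.environ if env is None else env
    value = env.get(name, DEFAULTS[name]).strip().lower()
    return value not in ("0", "false")


def caveman_level(env=None):
    """Return an explicit compression level, or None to mean 'inferred'."""
    env = os.environ if env is None else env
    level = env.get("LOKI_CAVEMAN_LEVEL", "").strip().lower()
    return level if level in ("lite", "full", "ultra") else None


if __name__ == "__main__":
    for name in DEFAULTS:
        print(name, "enabled" if flag_enabled(name) else "opted out")
    print("LOKI_CAVEMAN_LEVEL", caveman_level() or "inferred")
```

Because every behavior defaults to on, an unset variable and an unrecognized value both leave the behavior enabled in this sketch; only an explicit opt-out turns it off.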
|
|
410
|
+
|
|
394
411
|
<details>
|
|
395
412
|
<summary><strong>BMAD Method Integration</strong></summary>
|
|
396
413
|
|
package/SKILL.md
CHANGED
|
@@ -3,7 +3,7 @@ name: loki-mode
|
|
|
3
3
|
description: Autonomous spec-driven build system with a built-in trust layer. It does not call work done until it is verified (RARV-C closure loop, 11 quality gates, completion council, verified-completion evidence gate). Triggers on "Loki Mode". Takes a spec (PRD, GitHub issue, OpenAPI doc, etc.) to deployed product with minimal human intervention. Provider-agnostic. Requires --dangerously-skip-permissions flag.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# Loki Mode v7.
|
|
6
|
+
# Loki Mode v7.42.0
|
|
7
7
|
|
|
8
8
|
**You are an autonomous agent. You make decisions. You do not ask questions. You do not stop.**
|
|
9
9
|
|
|
@@ -398,4 +398,4 @@ See `CHANGELOG.md` entries [7.5.7], [7.5.8], [7.5.13] for the per-fix list and r
|
|
|
398
398
|
|
|
399
399
|
---
|
|
400
400
|
|
|
401
|
-
**v7.
|
|
401
|
+
**v7.42.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
|
package/VERSION
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
7.
|
|
1
|
+
7.42.0
|
package/autonomy/completion-council.sh
CHANGED
|
@@ -710,8 +710,18 @@ print('true' if ratio > budget else 'false')
|
|
|
710
710
|
((member++))
|
|
711
711
|
done
|
|
712
712
|
|
|
713
|
-
# Anti-sycophancy check: if unanimous APPROVE, run devil's advocate
|
|
713
|
+
# Anti-sycophancy check: if unanimous APPROVE, run devil's advocate.
|
|
714
|
+
#
|
|
715
|
+
# Audit-trail snapshots (these do NOT affect the live vote): capture whether
|
|
716
|
+
# the council was unanimous BEFORE the decrement below, and whether the DA
|
|
717
|
+
# actually fired and flipped the verdict. The transcript fields
|
|
718
|
+
# _ct_triggered/_ct_flipped used to be re-derived from approve_count AFTER
|
|
719
|
+
# this block decremented it, so on rounds where the DA fired AND flipped they
|
|
720
|
+
# were mis-recorded as false/false, corrupting the trust-metrics audit trail.
|
|
721
|
+
local _da_was_unanimous="false"
|
|
722
|
+
local _da_flipped="false"
|
|
714
723
|
if [ $approve_count -eq $COUNCIL_SIZE ] && [ $COUNCIL_SIZE -ge 2 ]; then
|
|
724
|
+
_da_was_unanimous="true"
|
|
715
725
|
log_warn "Unanimous approval detected - running anti-sycophancy check..."
|
|
716
726
|
local contrarian_verdict
|
|
717
727
|
contrarian_verdict=$(council_devils_advocate "$evidence_file" "$vote_dir")
|
|
@@ -731,6 +741,7 @@ print('true' if ratio > budget else 'false')
|
|
|
731
741
|
log_warn "Overriding to require one more iteration for verification"
|
|
732
742
|
approve_count=$((approve_count - 1))
|
|
733
743
|
reject_count=$((reject_count + 1))
|
|
744
|
+
_da_flipped="true"
|
|
734
745
|
fi
|
|
735
746
|
fi
|
|
736
747
|
|
|
@@ -795,20 +806,18 @@ with open(state_file, 'w') as f:
|
|
|
795
806
|
>/dev/null 2>&1 || true
|
|
796
807
|
fi
|
|
797
808
|
|
|
798
|
-
# Write transcript for this council round (Path A: council_vote path)
|
|
809
|
+
# Write transcript for this council round (Path A: council_vote path).
|
|
810
|
+
#
|
|
811
|
+
# Drive contrarian_triggered/_flipped off the snapshots captured in the
|
|
812
|
+
# anti-sycophancy block ABOVE, not off the now-mutated approve_count. The DA
|
|
813
|
+
# fires exactly when the council was unanimous (_da_was_unanimous), and it
|
|
814
|
+
# flips exactly when it did not confirm the approval (_da_flipped). Re-deriving
|
|
815
|
+
# from approve_count was wrong because the flip path already decremented it,
|
|
816
|
+
# so triggered/flipped were both recorded as false on flip rounds.
|
|
799
817
|
local _ct_outcome
|
|
800
818
|
_ct_outcome=$([ $approve_count -ge $effective_threshold ] && echo "APPROVED" || echo "REJECTED")
|
|
801
|
-
local _ct_triggered="
|
|
802
|
-
local _ct_flipped="
|
|
803
|
-
if [ $approve_count -eq $COUNCIL_SIZE ] && [ $COUNCIL_SIZE -ge 2 ]; then
|
|
804
|
-
_ct_triggered="true"
|
|
805
|
-
fi
|
|
806
|
-
# contrarian_flipped: DA voted REJECT/CANNOT_VALIDATE causing approve_count drop
|
|
807
|
-
# Detect by checking if approve dropped from unanimous (COUNCIL_SIZE) to less
|
|
808
|
-
# We infer flip if triggered AND final approve < COUNCIL_SIZE
|
|
809
|
-
if [ "$_ct_triggered" = "true" ] && [ $approve_count -lt $COUNCIL_SIZE ]; then
|
|
810
|
-
_ct_flipped="true"
|
|
811
|
-
fi
|
|
819
|
+
local _ct_triggered="$_da_was_unanimous"
|
|
820
|
+
local _ct_flipped="$_da_flipped"
|
|
812
821
|
council_write_transcript "${ITERATION_COUNT:-0}" "$_ct_outcome" "$_ct_triggered" "$_ct_flipped" "$effective_threshold"
|
|
813
822
|
|
|
814
823
|
if [ $approve_count -ge $effective_threshold ]; then
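The audit-trail fix in the hunks above can be modeled in a few lines of Python. This is a simplified sketch of the logic described in the comments, not the shell implementation itself: `tally_round` and `rederive_after_mutation` are hypothetical names, and the devil's-advocate verdict is reduced to a single boolean.

```python
def tally_round(approve_count, council_size, da_confirms):
    """Model one council round, snapshotting the flags BEFORE mutation.

    Returns (triggered, flipped, final_approve_count).
    """
    was_unanimous = approve_count == council_size and council_size >= 2
    flipped = False
    if was_unanimous and not da_confirms:
        approve_count -= 1  # the devil's advocate overrides one approval
        flipped = True
    return was_unanimous, flipped, approve_count


def rederive_after_mutation(final_approve_count, council_size):
    """Model the old inference: recompute both flags from the mutated count."""
    triggered = final_approve_count == council_size and council_size >= 2
    flipped = triggered and final_approve_count < council_size
    return triggered, flipped


if __name__ == "__main__":
    triggered, flipped, final = tally_round(3, 3, da_confirms=False)
    print("snapshot:", triggered, flipped)  # snapshot: True True
    print("re-derived:", rederive_after_mutation(final, 3))  # re-derived: (False, False)
```

On a flip round the count is no longer unanimous by the time the transcript is written, so the re-derived flags can never both be true; capturing them before the decrement avoids that.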
|
package/autonomy/hooks/migration-hooks.sh
CHANGED
|
@@ -317,14 +317,38 @@ hook_pre_healing_modify() {
|
|
|
317
317
|
if [[ -f "$heal_dir/friction-map.json" ]]; then
|
|
318
318
|
local blocked
|
|
319
319
|
blocked=$(python3 -c "
|
|
320
|
-
import json, sys
|
|
320
|
+
import json, os, sys
|
|
321
321
|
file_path = sys.argv[1]
|
|
322
322
|
strict = sys.argv[2] == 'true'
|
|
323
323
|
with open(sys.argv[3]) as f:
|
|
324
324
|
data = json.load(f)
|
|
325
|
+
|
|
326
|
+
# Path-aware match (not raw substring 'in', which over-matched app.py against
|
|
327
|
+
# myapp.py and under-matched src/foo.py against a foo.py:10 location). Friction
|
|
328
|
+
# locations are formatted 'path:line' (or just 'path'); strip a trailing
|
|
329
|
+
# ':<line>' then compare by basename and normalized path so the same file is
|
|
330
|
+
# matched regardless of how it was referenced.
|
|
331
|
+
def norm(p):
|
|
332
|
+
# Drop a trailing ':<line>' (and optional ':<col>') suffix from a location.
|
|
333
|
+
parts = p.rsplit(':', 1)
|
|
334
|
+
while len(parts) == 2 and parts[1].isdigit():
|
|
335
|
+
p = parts[0]
|
|
336
|
+
parts = p.rsplit(':', 1)
|
|
337
|
+
return p
|
|
338
|
+
|
|
339
|
+
def matches(target, loc):
|
|
340
|
+
loc = norm(loc)
|
|
341
|
+
if not target or not loc:
|
|
342
|
+
return False
|
|
343
|
+
# Exact normalized-path match, or same basename. Basename equality is the
|
|
344
|
+
# path-aware replacement for substring containment.
|
|
345
|
+
if os.path.normpath(target) == os.path.normpath(loc):
|
|
346
|
+
return True
|
|
347
|
+
return os.path.basename(target) == os.path.basename(loc)
|
|
348
|
+
|
|
325
349
|
for friction in data.get('frictions', []):
|
|
326
350
|
loc = friction.get('location', '')
|
|
327
|
-
if file_path
|
|
351
|
+
if matches(file_path, loc):
|
|
328
352
|
cls = friction.get('classification', 'unknown')
|
|
329
353
|
safe = friction.get('safe_to_remove', False)
|
|
330
354
|
if cls in ('business_rule', 'unknown') and not safe:
|
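The path-aware match added in this hunk can be exercised on its own. The sketch below reproduces the `norm` and `matches` helpers from the added lines as a standalone script; the sample paths are made up for illustration.

```python
import os


def norm(p):
    # Drop a trailing ':<line>' (and optional ':<col>') suffix from a location.
    parts = p.rsplit(':', 1)
    while len(parts) == 2 and parts[1].isdigit():
        p = parts[0]
        parts = p.rsplit(':', 1)
    return p


def matches(target, loc):
    loc = norm(loc)
    if not target or not loc:
        return False
    # Exact normalized-path match, or same basename.
    if os.path.normpath(target) == os.path.normpath(loc):
        return True
    return os.path.basename(target) == os.path.basename(loc)


if __name__ == "__main__":
    # Substring containment would have matched this pair; basename comparison does not.
    print(matches("app.py", "myapp.py:10"))            # False
    # Substring containment would have missed this pair; basename comparison catches it.
    print(matches("src/foo.py", "foo.py:10"))          # True
    # Line and column suffixes are both stripped before comparing.
    print(matches("src/foo.py", "./src/foo.py:10:4"))  # True
```

One consequence of comparing by basename is that two same-named files in different directories (for example `a/foo.py` and `b/foo.py`) also count as a match, which errs on the side of blocking.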
|
@@ -343,9 +367,101 @@ print('OK')
|
|
|
343
367
|
fi
|
|
344
368
|
fi
|
|
345
369
|
|
|
370
|
+
# Capture a pre-edit snapshot so post_healing_modify can revert ONLY the
|
|
371
|
+
# healing edit on test failure (not unrelated uncommitted changes, and not
|
|
372
|
+
# via git checkout which discards everything). Keyed by file path.
|
|
373
|
+
_heal_snapshot_save "$heal_dir" "$file_path"
|
|
374
|
+
|
|
375
|
+
return 0
|
|
376
|
+
}
|
|
377
|
+
|
|
378
|
+
# Snapshot path helper: maps a target file path to its snapshot blob location.
|
|
379
|
+
# Uses a flat directory with the path's basename plus a hash of the full path
|
|
380
|
+
# to avoid collisions between same-named files in different directories.
|
|
381
|
+
_heal_snapshot_path() {
|
|
382
|
+
local heal_dir="$1"
|
|
383
|
+
local file_path="$2"
|
|
384
|
+
local key
|
|
385
|
+
key=$(printf '%s' "$file_path" | cksum | awk '{print $1"-"$2}')
|
|
386
|
+
printf '%s/snapshots/%s.%s' "$heal_dir" "$(basename "$file_path")" "$key"
|
|
387
|
+
}
|
|
388
|
+
|
|
389
|
+
# Save a pre-edit snapshot of file_path. If the file does not exist yet (the
|
|
390
|
+
# healing edit will CREATE it), write a sentinel marker instead so the revert
|
|
391
|
+
# path knows to remove the file rather than restore content.
|
|
392
|
+
#
|
|
393
|
+
# Pairing contract: hook_pre_healing_modify (which calls this) MUST run for a
|
|
394
|
+
# file before hook_post_healing_modify reverts it. The snapshot is refreshed on
|
|
395
|
+
# every pre call, so a post without a matching fresh pre could restore a stale
|
|
396
|
+
# blob. On the success path the snapshot is intentionally left in place; the
|
|
397
|
+
# next pre overwrites it.
|
|
398
|
+
_heal_snapshot_save() {
|
|
399
|
+
local heal_dir="$1"
|
|
400
|
+
local file_path="$2"
|
|
401
|
+
[[ -z "$file_path" ]] && return 0
|
|
402
|
+
local snap_dir="$heal_dir/snapshots"
|
|
403
|
+
mkdir -p "$snap_dir" 2>/dev/null || return 0
|
|
404
|
+
local snap
|
|
405
|
+
snap=$(_heal_snapshot_path "$heal_dir" "$file_path")
|
|
406
|
+
if [[ -f "$file_path" ]]; then
|
|
407
|
+
cp "$file_path" "$snap" 2>/dev/null || return 0
|
|
408
|
+
rm -f "$snap.absent" 2>/dev/null || true
|
|
409
|
+
else
|
|
410
|
+
# File does not exist pre-edit: record an "absent" marker, drop any
|
|
411
|
+
# stale content snapshot.
|
|
412
|
+
rm -f "$snap" 2>/dev/null || true
|
|
413
|
+
: > "$snap.absent" 2>/dev/null || true
|
|
414
|
+
fi
|
|
346
415
|
return 0
|
|
347
416
|
}
|
|
348
417
|
|
|
418
|
+
# Restore file_path from its pre-edit snapshot, reverting ONLY the healing edit.
|
|
419
|
+
# Echoes an accurate human-readable message describing what actually happened
|
|
420
|
+
# (content restored / healing-added file removed / could not revert). Returns 0
|
|
421
|
+
# when the revert succeeded as reported, 1 when it could not be performed.
|
|
422
|
+
_heal_snapshot_restore() {
|
|
423
|
+
local heal_dir="$1"
|
|
424
|
+
local file_path="$2"
|
|
425
|
+
if [[ -z "$file_path" ]]; then
|
|
426
|
+
echo "No file path given; nothing reverted."
|
|
427
|
+
return 1
|
|
428
|
+
fi
|
|
429
|
+
local snap
|
|
430
|
+
snap=$(_heal_snapshot_path "$heal_dir" "$file_path")
|
|
431
|
+
|
|
432
|
+
if [[ -f "$snap" ]]; then
|
|
433
|
+
# Pre-edit content snapshot exists: restore exactly that content, which
|
|
434
|
+
# preserves any unrelated uncommitted changes present before the edit.
|
|
435
|
+
if cp "$snap" "$file_path" 2>/dev/null; then
|
|
436
|
+
echo "Healing edit reverted to pre-edit snapshot."
|
|
437
|
+
return 0
|
|
438
|
+
fi
|
|
439
|
+
echo "Could not restore pre-edit snapshot for ${file_path}; file left as-is."
|
|
440
|
+
return 1
|
|
441
|
+
fi
|
|
442
|
+
|
|
443
|
+
if [[ -f "$snap.absent" ]]; then
|
|
444
|
+
# File did not exist pre-edit: the healing edit created it. Remove only
|
|
445
|
+
# that file, not unrelated state.
|
|
446
|
+
if [[ ! -e "$file_path" ]]; then
|
|
447
|
+
echo "Healing-added file ${file_path} no longer present; nothing to remove."
|
|
448
|
+
return 0
|
|
449
|
+
fi
|
|
450
|
+
if rm -f "$file_path" 2>/dev/null; then
|
|
451
|
+
echo "Healing-added file ${file_path} removed."
|
|
452
|
+
return 0
|
|
453
|
+
fi
|
|
454
|
+
echo "Could not remove healing-added file ${file_path}; file left as-is."
|
|
455
|
+
return 1
|
|
456
|
+
fi
|
|
457
|
+
|
|
458
|
+
# No snapshot was captured (pre_healing_modify did not run for this file).
|
|
459
|
+
# Be honest: do not claim a revert that did not happen, and do NOT fall back
|
|
460
|
+
# to a destructive git checkout.
|
|
461
|
+
echo "No pre-edit snapshot found for ${file_path}; could not revert (left as-is)."
|
|
462
|
+
return 1
|
|
463
|
+
}
|
|
464
|
+
|
|
349
465
|
# Hook: post_healing_modify - runs AFTER agent modifies a file in healing mode
|
|
350
466
|
# Verifies characterization tests still pass after modification
|
|
351
467
|
hook_post_healing_modify() {
|
|
@@ -384,9 +500,17 @@ hook_post_healing_modify() {
|
|
|
384
500
|
test_output=$(cat "$test_result_file")
|
|
385
501
|
rm -f "$test_result_file"
|
|
386
502
|
|
|
387
|
-
# Revert the
|
|
388
|
-
|
|
389
|
-
|
|
503
|
+
# Revert ONLY the healing edit using the pre-edit snapshot captured by
|
|
504
|
+
# hook_pre_healing_modify. Do NOT use `git checkout -- "$file_path"`:
|
|
505
|
+
# that discards ALL uncommitted changes to the file (not just the
|
|
506
|
+
# healing edit) and silently no-ops for an untracked file while still
|
|
507
|
+
# claiming the change was reverted. Report exactly what happened.
|
|
508
|
+
local revert_msg
|
|
509
|
+
# _heal_snapshot_restore returns nonzero when it could not revert; we
|
|
510
|
+
# surface the outcome via its message (recorded below) rather than a
|
|
511
|
+
# code, and must not let a nonzero return abort under set -e.
|
|
512
|
+
revert_msg=$(_heal_snapshot_restore "$heal_dir" "$file_path") || true
|
|
513
|
+
echo "HOOK_BLOCKED: Characterization tests failed after healing modification to ${file_path}. ${revert_msg}"
|
|
390
514
|
echo "Test output: ${test_output}"
|
|
391
515
|
|
|
392
516
|
# Record failure in failure-modes.json
|
|
@@ -404,12 +528,12 @@ data.setdefault('modes', []).append({
|
|
|
404
528
|
'trigger': 'healing_modification',
|
|
405
529
|
'file': sys.argv[2],
|
|
406
530
|
'behavior': 'Characterization tests failed after modification',
|
|
407
|
-
'recovery':
|
|
531
|
+
'recovery': sys.argv[3],
|
|
408
532
|
'is_intentional': False
|
|
409
533
|
})
|
|
410
534
|
with open(sys.argv[1], 'w') as f:
|
|
411
535
|
json.dump(data, f, indent=2)
|
|
412
|
-
" "$heal_dir/failure-modes.json" "$file_path" 2>/dev/null || true
|
|
536
|
+
" "$heal_dir/failure-modes.json" "$file_path" "$revert_msg" 2>/dev/null || true
|
|
413
537
|
fi
|
|
414
538
|
|
|
415
539
|
return 1
|
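The snapshot-and-restore contract introduced in this file (save a copy before the edit, or an "absent" marker if the file does not exist yet, then revert only from that snapshot) can be sketched in Python as follows. This is an illustrative port under stated assumptions, not the shipped shell code: the function names are hypothetical, and a SHA-256 prefix stands in for the `cksum`-based key used by `_heal_snapshot_path`.

```python
import hashlib
import os
import shutil


def snapshot_path(heal_dir, file_path):
    # Basename plus a hash of the full path, so same-named files do not collide.
    key = hashlib.sha256(file_path.encode()).hexdigest()[:16]
    name = f"{os.path.basename(file_path)}.{key}"
    return os.path.join(heal_dir, "snapshots", name)


def snapshot_save(heal_dir, file_path):
    """Record the pre-edit state: a content copy, or an 'absent' marker."""
    snap = snapshot_path(heal_dir, file_path)
    absent = snap + ".absent"
    os.makedirs(os.path.dirname(snap), exist_ok=True)
    if os.path.isfile(file_path):
        shutil.copyfile(file_path, snap)
        if os.path.exists(absent):
            os.remove(absent)
    else:
        if os.path.exists(snap):
            os.remove(snap)
        open(absent, "w").close()


def snapshot_restore(heal_dir, file_path):
    """Revert only the edit. Returns (ok, message) describing what happened."""
    snap = snapshot_path(heal_dir, file_path)
    if os.path.isfile(snap):
        shutil.copyfile(snap, file_path)
        return True, "content restored from pre-edit snapshot"
    if os.path.isfile(snap + ".absent"):
        if os.path.exists(file_path):
            os.remove(file_path)
        return True, "file created by the edit removed"
    # No snapshot: report honestly instead of falling back to a destructive revert.
    return False, "no pre-edit snapshot found; file left as-is"
```

Restoring from the snapshot rather than from version control is what keeps unrelated uncommitted changes intact: the snapshot already contains them, so only the edit made after the save is undone.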
package/autonomy/loki
CHANGED
|
@@ -13178,13 +13178,18 @@ FEOF
|
|
|
13178
13178
|
;;
|
|
13179
13179
|
--disable)
|
|
13180
13180
|
if [ -f "$failover_file" ]; then
|
|
13181
|
-
python3 -c "
|
|
13182
|
-
import json
|
|
13183
|
-
|
|
13181
|
+
if _FAILOVER_FILE="$failover_file" python3 -c "
|
|
13182
|
+
import json, os
|
|
13183
|
+
failover_file = os.environ['_FAILOVER_FILE']
|
|
13184
|
+
with open(failover_file) as f: d = json.load(f)
|
|
13184
13185
|
d['enabled'] = False
|
|
13185
|
-
with open(
|
|
13186
|
-
"
|
|
13187
|
-
|
|
13186
|
+
with open(failover_file, 'w') as f: json.dump(d, f, indent=2)
|
|
13187
|
+
"; then
|
|
13188
|
+
echo -e "${YELLOW}Failover disabled${NC}"
|
|
13189
|
+
else
|
|
13190
|
+
echo -e "${RED}Error: failed to disable failover${NC}"
|
|
13191
|
+
return 1
|
|
13192
|
+
fi
|
|
13188
13193
|
else
|
|
13189
13194
|
echo "Failover not initialized."
|
|
13190
13195
|
fi
|
|
@@ -13212,13 +13217,19 @@ with open('$failover_file', 'w') as f: json.dump(d, f, indent=2)
|
|
|
13212
13217
|
return 1
|
|
13213
13218
|
fi
|
|
13214
13219
|
|
|
13215
|
-
python3 -c "
|
|
13216
|
-
import json
|
|
13217
|
-
|
|
13218
|
-
|
|
13219
|
-
with open(
|
|
13220
|
-
|
|
13221
|
-
|
|
13220
|
+
if _FAILOVER_FILE="$failover_file" _NEW_CHAIN="$new_chain" python3 -c "
|
|
13221
|
+
import json, os
|
|
13222
|
+
failover_file = os.environ['_FAILOVER_FILE']
|
|
13223
|
+
new_chain = os.environ['_NEW_CHAIN']
|
|
13224
|
+
with open(failover_file) as f: d = json.load(f)
|
|
13225
|
+
d['chain'] = new_chain.split(',')
|
|
13226
|
+
with open(failover_file, 'w') as f: json.dump(d, f, indent=2)
|
|
13227
|
+
"; then
|
|
13228
|
+
echo "Failover chain updated: $new_chain"
|
|
13229
|
+
else
|
|
13230
|
+
echo -e "${RED}Error: failed to update failover chain${NC}"
|
|
13231
|
+
return 1
|
|
13232
|
+
fi
|
|
13222
13233
|
shift
|
|
13223
13234
|
;;
|
|
13224
13235
|
--test)
|
|
@@ -18601,16 +18612,16 @@ else:
|
|
|
18601
18612
|
exit 1
|
|
18602
18613
|
fi
|
|
18603
18614
|
|
|
18604
|
-
python3 -c "
|
|
18615
|
+
_REGISTRY_FILE="$registry_file" _PROJ_PATH="$path" _PROJ_NAME="$name" _PROJ_ALIAS="$alias" python3 -c "
|
|
18605
18616
|
import json
|
|
18606
18617
|
import os
|
|
18607
18618
|
import hashlib
|
|
18608
18619
|
from datetime import datetime, timezone
|
|
18609
18620
|
|
|
18610
|
-
registry_file = '
|
|
18611
|
-
path = '
|
|
18612
|
-
name = '
|
|
18613
|
-
alias = '
|
|
18621
|
+
registry_file = os.environ['_REGISTRY_FILE']
|
|
18622
|
+
path = os.environ['_PROJ_PATH']
|
|
18623
|
+
name = os.environ['_PROJ_NAME'] or os.path.basename(path)
|
|
18624
|
+
alias = os.environ['_PROJ_ALIAS'] or None
|
|
18614
18625
|
|
|
18615
18626
|
# Generate project ID
|
|
18616
18627
|
project_id = hashlib.md5(path.encode()).hexdigest()[:12]
|
|
@@ -18651,7 +18662,7 @@ with open(registry_file, 'w') as f:
|
|
|
18651
18662
|
print(f' Path: {path}')
|
|
18652
18663
|
if alias:
|
|
18653
18664
|
print(f' Alias: {alias}')
|
|
18654
|
-
"
|
|
18665
|
+
"
|
|
18655
18666
|
;;
|
|
18656
18667
|
|
|
18657
18668
|
remove|rm)
|
|
@@ -18662,12 +18673,12 @@ if alias:
|
|
|
18662
18673
|
exit 1
|
|
18663
18674
|
fi
|
|
18664
18675
|
|
|
18665
|
-
python3 -c "
|
|
18676
|
+
_REGISTRY_FILE="$registry_file" _IDENTIFIER="$identifier" python3 -c "
|
|
18666
18677
|
import json
|
|
18667
18678
|
import os
|
|
18668
18679
|
|
|
18669
|
-
registry_file = '
|
|
18670
|
-
identifier = '
|
|
18680
|
+
registry_file = os.environ['_REGISTRY_FILE']
|
|
18681
|
+
identifier = os.environ['_IDENTIFIER']
|
|
18671
18682
|
|
|
18672
18683
|
with open(registry_file, 'r') as f:
|
|
18673
18684
|
data = json.load(f)
|
|
@@ -18690,7 +18701,7 @@ if found_id:
|
|
|
18690
18701
|
else:
|
|
18691
18702
|
print(f'Not found: {identifier}')
|
|
18692
18703
|
exit(1)
|
|
18693
|
-
"
|
|
18704
|
+
"
|
|
18694
18705
|
;;
|
|
18695
18706
|
|
|
18696
18707
|
discover)
|
|
@@ -18842,12 +18853,12 @@ print(f'Added: {added}, Missing: {missing}, Total: {len(projects)}')
|
|
|
18842
18853
|
health)
|
|
18843
18854
|
local identifier="${2:-$(pwd)}"
|
|
18844
18855
|
|
|
18845
|
-
python3 -c "
|
|
18856
|
+
_REGISTRY_FILE="$registry_file" _IDENTIFIER="$identifier" python3 -c "
|
|
18846
18857
|
import json
|
|
18847
18858
|
import os
|
|
18848
18859
|
|
|
18849
|
-
registry_file = '
|
|
18850
|
-
identifier = '
|
|
18860
|
+
registry_file = os.environ['_REGISTRY_FILE']
|
|
18861
|
+
identifier = os.environ['_IDENTIFIER']
|
|
18851
18862
|
|
|
18852
18863
|
# If it's a path, resolve it
|
|
18853
18864
|
if os.path.isdir(identifier):
|
|
@@ -18886,7 +18897,7 @@ print('Health Checks:')
|
|
|
18886
18897
|
for check, passed in checks.items():
|
|
18887
18898
|
icon = '[OK]' if passed else '[FAIL]'
|
|
18888
18899
|
print(f' {icon} {check}')
|
|
18889
|
-
"
|
|
18900
|
+
"
|
|
18890
18901
|
;;
|
|
18891
18902
|
|
|
18892
18903
|
--help|-h|help)
|
|
@@ -19040,17 +19051,17 @@ cmd_enterprise() {
|
|
|
19040
19051
|
esac
|
|
19041
19052
|
done
|
|
19042
19053
|
|
|
19043
|
-
python3 -c "
|
|
19054
|
+
_TOKEN_FILE="$token_file" _TOKEN_NAME="$name" _TOKEN_SCOPES="$scopes" _TOKEN_EXPIRES="$expires" python3 -c "
|
|
19044
19055
|
import json
|
|
19045
19056
|
import secrets
|
|
19046
19057
|
import hashlib
|
|
19047
19058
|
from datetime import datetime, timezone, timedelta
|
|
19048
19059
|
import os
|
|
19049
19060
|
|
|
19050
|
-
token_file = '
|
|
19051
|
-
name = '
|
|
19052
|
-
scopes_str = '
|
|
19053
|
-
expires_str = '
|
|
19061
|
+
token_file = os.environ['_TOKEN_FILE']
|
|
19062
|
+
name = os.environ['_TOKEN_NAME']
|
|
19063
|
+
scopes_str = os.environ['_TOKEN_SCOPES']
|
|
19064
|
+
expires_str = os.environ['_TOKEN_EXPIRES']
|
|
19054
19065
|
|
|
19055
19066
|
# Parse scopes
|
|
19056
19067
|
scopes = scopes_str.split(',') if scopes_str else ['*']
|
|
@@ -19105,7 +19116,7 @@ if expires_at:
|
|
|
19105
19116
|
print('')
|
|
19106
19117
|
print('Token (save this - shown only once):')
|
|
19107
19118
|
print(f' {raw_token}')
|
|
19108
|
-
"
|
|
19119
|
+
"
|
|
19109
19120
|
;;
|
|
19110
19121
|
|
|
19111
19122
|
list|ls)
|
|
@@ -19174,12 +19185,12 @@ else:
|
|
|
19174
19185
|
exit 2
|
|
19175
19186
|
fi
|
|
19176
19187
|
|
|
19177
|
-
python3 -c "
|
|
19178
|
-
import json
|
|
19188
|
+
_TOKEN_FILE="$token_file" _IDENTIFIER="$identifier" python3 -c "
|
|
19189
|
+
import json, os
|
|
19179
19190
|
from datetime import datetime, timezone
|
|
19180
19191
|
|
|
19181
|
-
token_file = '
|
|
19182
|
-
identifier = '
|
|
19192
|
+
token_file = os.environ['_TOKEN_FILE']
|
|
19193
|
+
identifier = os.environ['_IDENTIFIER']
|
|
19183
19194
|
|
|
19184
19195
|
with open(token_file, 'r') as f:
|
|
19185
19196
|
data = json.load(f)
|
|
@@ -19202,7 +19213,7 @@ if found_id:
|
|
|
19202
19213
|
else:
|
|
19203
19214
|
print(f'Token not found: {identifier}')
|
|
19204
19215
|
exit(1)
|
|
19205
|
-
"
|
|
19216
|
+
"
|
|
19206
19217
|
;;
|
|
19207
19218
|
|
|
19208
19219
|
delete)
|
|
@@ -19213,11 +19224,11 @@ else:
|
|
|
19213
19224
|
exit 2
|
|
19214
19225
|
fi
|
|
19215
19226
|
|
|
19216
|
-
python3 -c "
|
|
19217
|
-
import json
|
|
19227
|
+
_TOKEN_FILE="$token_file" _IDENTIFIER="$identifier" python3 -c "
|
|
19228
|
+
import json, os
|
|
19218
19229
|
|
|
19219
|
-
token_file = '
|
|
19220
|
-
identifier = '
|
|
19230
|
+
token_file = os.environ['_TOKEN_FILE']
|
|
19231
|
+
identifier = os.environ['_IDENTIFIER']
|
|
19221
19232
|
|
|
19222
19233
|
with open(token_file, 'r') as f:
|
|
19223
19234
|
data = json.load(f)
|
|
@@ -19241,7 +19252,7 @@ if found_id:
|
|
|
19241
19252
|
else:
|
|
19242
19253
|
print(f'Token not found: {identifier}')
|
|
19243
19254
|
exit(1)
|
|
19244
|
-
"
|
|
19255
|
+
"
|
|
19245
19256
|
;;
|
|
19246
19257
|
|
|
19247
19258
|
*)
|
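The recurring change in this file passes values to the embedded Python through environment variables instead of splicing shell variables into the `python3 -c` source text. The sketch below shows, under simplified assumptions, why that matters: a value containing a quote breaks (or could alter) interpolated source, while the environment-variable route treats it as plain data. The helper names and the sample value are hypothetical; only the `_PROJ_NAME` variable name is borrowed from the diff.

```python
import os
import subprocess
import sys

# Child program that reads its input from the environment, as the new code does.
CHILD_SOURCE = "import os; print(os.environ['_PROJ_NAME'])"


def run_with_env(value):
    """Pass the value out-of-band; it is never parsed as Python source."""
    env = dict(os.environ, _PROJ_NAME=value)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")


def run_interpolated(value):
    """Splice the value into a quoted literal, as string interpolation would."""
    source = "name = '" + value + "'\nprint(name)"
    return subprocess.run(
        [sys.executable, "-c", source], capture_output=True, text=True
    )


if __name__ == "__main__":
    tricky = "it's a test"
    print(run_with_env(tricky))                 # the value survives unchanged
    print(run_interpolated(tricky).returncode)  # nonzero: the quote broke the source
```

The same reasoning applies to paths, aliases and token names supplied by the user: once they travel through the environment, their contents cannot change what the embedded program does.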
package/dashboard/__init__.py
CHANGED
package/dashboard/server.py
CHANGED
|
@@ -7034,6 +7034,96 @@ def _pid_is_alive(pid):
|
|
|
7034
7034
|
return None
|
|
7035
7035
|
|
|
7036
7036
|
|
|
7037
|
+
# Margin (seconds) added to the recorded reference time before a live pid is
|
|
7038
|
+
# judged to be a recycled (different) process. Must comfortably exceed clock
|
|
7039
|
+
# skew plus the launch-to-first-state-write gap so a genuine app is never
|
|
7040
|
+
# downgraded. A PID recycled after a crash typically belongs to a process that
|
|
7041
|
+
# started minutes or hours later, so a generous margin still catches recycles
|
|
7042
|
+
# while strongly biasing against the far worse false-positive of killing a live
|
|
7043
|
+
# app's status. See _reconcile_app_runner_liveness.
|
|
7044
|
+
_APP_RUNNER_PID_RECYCLE_MARGIN_SECONDS = 120
|
|
7045
|
+
|
|
7046
|
+
|
|
7047
|
+
def _pid_start_time(pid):
|
|
7048
|
+
"""Best-effort wall-clock start time of pid, as epoch seconds, or None.
|
|
7049
|
+
|
|
7050
|
+
Reads `ps -o lstart= -p <pid>`, which is available on both macOS and Linux
|
|
7051
|
+
and prints the process start time in local time (e.g. "Sun Jun 14 18:39:15
|
|
7052
|
+
2026"). The string is locale-dependent (%a/%b), so any parse failure, empty
|
|
7053
|
+
output, or missing process returns None and the caller degrades gracefully
|
|
7054
|
+
to its prior behavior. The returned epoch is timezone-correct because the
|
|
7055
|
+
naive local timestamp is interpreted in the system's local zone before
|
|
7056
|
+
conversion (ps reports local time; never mix it with a UTC value directly).
|
|
7057
|
+
"""
|
|
7058
|
+
try:
|
|
7059
|
+
pid = int(pid)
|
|
7060
|
+
except (TypeError, ValueError):
|
|
7061
|
+
return None
|
|
7062
|
+
if pid <= 0:
|
|
7063
|
+
return None
|
|
7064
|
+
try:
|
|
7065
|
+
out = subprocess.run(["ps", "-o", "lstart=", "-p", str(pid)],
|
|
7066
|
+
capture_output=True, text=True, timeout=5)
|
|
7067
|
+
except (OSError, subprocess.SubprocessError):
|
|
7068
|
+
return None
|
|
7069
|
+
raw = (out.stdout or "").strip()
|
|
7070
|
+
if not raw:
|
|
7071
|
+
return None
|
|
7072
|
+
try:
|
|
7073
|
+
# lstart is local time without a zone; parse naive then attach the
|
|
7074
|
+
# local zone so .timestamp() yields a correct epoch regardless of TZ.
|
|
7075
|
+
naive = datetime.strptime(raw, "%a %b %d %H:%M:%S %Y")
|
|
7076
|
+
local = naive.replace(tzinfo=datetime.now().astimezone().tzinfo)
|
|
7077
|
+
return local.timestamp()
|
|
7078
|
+
except (ValueError, OverflowError, OSError):
|
|
7079
|
+
return None
|
|
7080
|
+
|
|
7081
|
+
|
|
7082
|
+
def _state_reference_epoch(state):
|
|
7083
|
+
"""Epoch seconds for state.json's recorded reference time, or None.
|
|
7084
|
+
|
|
7085
|
+
Uses `started_at` (rewritten by the app-runner on every state write; it is
|
|
7086
|
+
the last-state-write time, not pure launch time). For a genuine process the
|
|
7087
|
+
real start time is always <= this value, so it is a safe upper bound to
|
|
7088
|
+
compare a live pid's start time against. The value is UTC (Z-suffixed).
|
|
7089
|
+
"""
|
|
7090
|
+
if not isinstance(state, dict):
|
|
7091
|
+
return None
|
|
7092
|
+
started_at = state.get("started_at")
|
|
7093
|
+
if not started_at:
|
|
7094
|
+
return None
|
|
7095
|
+
try:
|
|
7096
|
+
ts = datetime.fromisoformat(str(started_at).replace("Z", "+00:00"))
|
|
7097
|
+
except (ValueError, TypeError):
|
|
7098
|
+
return None
|
|
7099
|
+
if ts.tzinfo is None:
|
|
7100
|
+
ts = ts.replace(tzinfo=timezone.utc)
|
|
7101
|
+
return ts.timestamp()
|
|
7102
|
+
|
|
7103
|
+
|
|
7104
|
+
def _pid_is_recycled(state):
|
|
7105
|
+
"""True if the recorded main_pid is alive but is a DIFFERENT process now.
|
|
7106
|
+
|
|
7107
|
+
After the recorded app dies, the OS can recycle its numeric pid for an
|
|
7108
|
+
unrelated process; os.kill(pid, 0) then reports the stale pid "alive"
|
|
7109
|
+
forever and a dead run is never reconciled. We detect this by comparing the
|
|
7110
|
+
live pid's real start time against the recorded reference time: a genuine
|
|
7111
|
+
process started at or before the reference, so a live pid whose start time
|
|
7112
|
+
is comfortably AFTER the reference cannot be the original.
|
|
7113
|
+
|
|
7114
|
+
Returns True only with positive evidence of recycling. Any missing data
|
|
7115
|
+
(no recorded reference, start time unavailable) returns False so the caller
|
|
7116
|
+
keeps its prior behavior -- best-effort, biased against false positives.
|
|
7117
|
+
"""
|
|
7118
|
+
reference = _state_reference_epoch(state)
|
|
7119
|
+
if reference is None:
|
|
7120
|
+
return False
|
|
7121
|
+
pid_start = _pid_start_time(state.get("main_pid"))
|
|
7122
|
+
if pid_start is None:
|
|
7123
|
+
return False
|
|
7124
|
+
return pid_start > reference + _APP_RUNNER_PID_RECYCLE_MARGIN_SECONDS
|
|
7125
|
+
|
|
7126
|
+
|
|
7037
7127
|
def _health_checked_age_seconds(state):
|
|
7038
7128
|
"""Seconds since last_health.checked_at, or None if unparseable/absent."""
|
|
7039
7129
|
health = state.get("last_health")
|
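The recycle check added above boils down to one comparison: a live pid whose start time is later than the recorded reference time plus a margin cannot be the original process. The following sketch isolates that comparison so it can be tried without calling `ps`; `reference_epoch` and `is_recycled` are hypothetical names, the process start time is passed in directly as an epoch value, and the timestamp is an arbitrary example.

```python
from datetime import datetime, timezone

RECYCLE_MARGIN_SECONDS = 120  # same value as the constant in the diff


def reference_epoch(started_at):
    """Parse a Z-suffixed ISO-8601 timestamp into epoch seconds (UTC)."""
    ts = datetime.fromisoformat(started_at.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.timestamp()


def is_recycled(pid_start_epoch, started_at, margin=RECYCLE_MARGIN_SECONDS):
    """True only with positive evidence that the pid belongs to a newer process."""
    if pid_start_epoch is None or not started_at:
        return False  # missing data: keep the prior behavior
    try:
        reference = reference_epoch(started_at)
    except ValueError:
        return False
    return pid_start_epoch > reference + margin


if __name__ == "__main__":
    recorded = "2026-06-14T18:39:15Z"
    ref = reference_epoch(recorded)
    print(is_recycled(ref - 30, recorded))    # started before the reference: genuine
    print(is_recycled(ref + 60, recorded))    # inside the margin: still treated as genuine
    print(is_recycled(ref + 3600, recorded))  # an hour later: treated as recycled
```

Returning `False` whenever data is missing mirrors the bias stated in the docstrings: wrongly marking a live app as stopped is considered worse than missing a recycled pid.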
|
@@ -7059,6 +7149,9 @@ def _reconcile_app_runner_liveness(state):
|
|
|
7059
7149
|
Here we cross-check the recorded main_pid against the real OS before
|
|
7060
7150
|
returning, and only ever downgrade -- never upgrade -- the status:
|
|
7061
7151
|
- recorded running/starting + pid genuinely gone -> "stopped"
|
|
7152
|
+
- recorded running/starting + pid "alive" but its real start time is
|
|
7153
|
+
after the recorded reference (the OS recycled a dead run's pid for an
|
|
7154
|
+
unrelated process) -> "stopped"
|
|
7062
7155
|
- recorded running/starting + pid not verifiable +
|
|
7063
7156
|
last_health.checked_at older than the threshold -> "stale"
|
|
7064
7157
|
Any failure falls back to the raw recorded status (fail open to the writer's
|
|
@@ -7076,6 +7169,15 @@ def _reconcile_app_runner_liveness(state):
|
|
|
7076
7169
|
state["status"] = "stopped"
|
|
7077
7170
|
state["liveness"] = "pid_gone"
|
|
7078
7171
|
return state
|
|
7172
|
+
if alive is True:
|
|
7173
|
+
# The numeric pid exists, but os.kill(pid, 0) cannot tell whether it
|
|
7174
|
+
# is still the SAME process. After a dead run the OS can recycle the
|
|
7175
|
+
# pid; detect that via the process start time so a recycled pid is
|
|
7176
|
+
# treated as gone rather than reported "running" forever.
|
|
7177
|
+
if _pid_is_recycled(state):
|
|
7178
|
+
state["status"] = "stopped"
|
|
7179
|
+
state["liveness"] = "pid_recycled"
|
|
7180
|
+
return state
|
|
7079
7181
|
if alive is None:
|
|
7080
7182
|
# Cannot verify via pid (e.g. compose subshell pid). Fall back to
|
|
7081
7183
|
# the health-beat freshness with a generous threshold.
|