gyoshu 0.2.5 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (53)
  1. package/README.md +24 -18
  2. package/package.json +8 -15
  3. package/src/agent/baksa.md +310 -0
  4. package/src/agent/gyoshu.md +1075 -1
  5. package/src/agent/jogyo-feedback.md +1 -1
  6. package/src/agent/jogyo-insight.md +6 -6
  7. package/src/agent/jogyo.md +482 -2
  8. package/src/bridge/__pycache__/gyoshu_bridge.cpython-310.pyc +0 -0
  9. package/src/bridge/gyoshu_bridge.py +45 -7
  10. package/src/command/gyoshu-auto.md +63 -0
  11. package/src/gyoshu-manifest.json +59 -0
  12. package/src/index.ts +825 -0
  13. package/src/lib/atomic-write.ts +11 -9
  14. package/src/lib/auto-decision.ts +803 -0
  15. package/src/lib/auto-loop-state.ts +405 -0
  16. package/src/lib/bridge-meta.ts +111 -0
  17. package/src/lib/filesystem-check.ts +14 -7
  18. package/src/lib/goal-gates.ts +753 -0
  19. package/src/lib/lock-paths.ts +223 -0
  20. package/src/lib/notebook-frontmatter.ts +307 -40
  21. package/src/lib/parallel-queue.ts +704 -0
  22. package/src/lib/path-security.ts +108 -0
  23. package/src/lib/paths.ts +155 -8
  24. package/src/lib/pdf-export.ts +2 -1
  25. package/src/lib/report-gates.ts +722 -0
  26. package/src/lib/report-markdown.ts +7 -3
  27. package/src/lib/session-lock.ts +33 -11
  28. package/src/plugin/gyoshu-hooks.ts +533 -25
  29. package/src/tool/checkpoint-manager.ts +62 -44
  30. package/src/tool/gyoshu-completion.ts +211 -40
  31. package/src/tool/gyoshu-snapshot.ts +210 -132
  32. package/src/tool/migration-tool.ts +31 -37
  33. package/src/tool/notebook-writer.ts +34 -7
  34. package/src/tool/parallel-manager.ts +978 -0
  35. package/src/tool/python-repl.ts +357 -56
  36. package/src/tool/research-manager.ts +124 -39
  37. package/src/tool/retrospective-store.ts +25 -2
  38. package/src/tool/session-manager.ts +91 -119
  39. package/src/tool/session-structure-validator.ts +638 -0
  40. package/AGENTS.md +0 -1079
  41. package/bin/gyoshu.js +0 -295
  42. package/install.sh +0 -247
  43. package/src/agent/executor.md +0 -1851
  44. package/src/agent/plan-reviewer.md +0 -1862
  45. package/src/agent/plan.md +0 -97
  46. package/src/agent/task-orchestrator.md +0 -1121
  47. package/src/command/analyze-knowledge.md +0 -840
  48. package/src/command/analyze-plans.md +0 -513
  49. package/src/command/execute.md +0 -893
  50. package/src/command/generate-policy.md +0 -924
  51. package/src/command/generate-suggestions.md +0 -1111
  52. package/src/command/learn.md +0 -1181
  53. package/src/command/planner.md +0 -630
package/README.md CHANGED
@@ -39,6 +39,7 @@ Think of it like a research lab:
 - 📓 **Auto-Generated Notebooks** — Every experiment is captured as a reproducible `.ipynb`
 - 🤖 **Autonomous Mode** — Set a goal, walk away, come back to results
 - 🔍 **Adversarial Verification** — PhD reviewer challenges every claim before acceptance
+- 🎯 **Two-Gate Completion** — SUCCESS requires both evidence quality (Trust Gate) AND goal achievement (Goal Gate)
 - 📝 **AI-Powered Reports** — Turn messy outputs into polished research narratives
 - 🔄 **Session Management** — Continue, replay, or branch your research anytime
 
@@ -46,32 +47,38 @@ Think of it like a research lab:
 
 ## 🚀 Installation
 
-```bash
-curl -fsSL https://raw.githubusercontent.com/Yeachan-Heo/My-Jogyo/main/install.sh | bash
+Add Gyoshu to your `opencode.json`:
+
+```json
+{
+  "plugin": ["gyoshu"]
+}
 ```
 
+That's it! OpenCode will auto-install Gyoshu via Bun on next startup.
+
 <details>
-<summary>📦 Alternative installation methods</summary>
+<summary>📦 Development installation</summary>
 
-**Clone & Install** (if you want to contribute or modify)
+**Clone & link locally** (for contributors)
 ```bash
 git clone https://github.com/Yeachan-Heo/My-Jogyo.git
-cd My-Jogyo && ./install.sh
+cd My-Jogyo && bun install
 ```
 
-**npm/bunx** (package manager)
-```bash
-npm install -g gyoshu && gyoshu install
-# or
-bunx gyoshu install
+Then in your `opencode.json`:
+```json
+{
+  "plugin": ["file:///path/to/My-Jogyo"]
+}
 ```
 
 </details>
 
 **Verify installation:**
 ```bash
-./install.sh --check  # If you cloned the repo
-# or just run opencode and try /gyoshu
+opencode
+/gyoshu doctor
 ```
 
 ---
@@ -80,7 +87,7 @@ bunx gyoshu install
 
 > *Using Claude, GPT, Gemini, or another AI assistant with OpenCode? This section is for you.*
 
-**Setup is the same** — install Gyoshu using the methods above, then give your LLM the context it needs:
+**Setup is the same** — add `"gyoshu"` to your plugin array, then give your LLM the context it needs:
 
 1. **Point your LLM to the guide:**
    > "Read `AGENTS.md` in the Gyoshu directory for full context on how to use the research tools."
@@ -351,15 +358,14 @@ python3 -m venv .venv
 
 ## 🔄 Updating
 
-```bash
-curl -fsSL https://raw.githubusercontent.com/Yeachan-Heo/My-Jogyo/main/install.sh | bash
-```
+OpenCode automatically updates plugins. To force an update, remove the cached version:
 
-Or if you cloned the repo:
 ```bash
-cd My-Jogyo && git pull && ./install.sh
+rm -rf ~/.cache/opencode/node_modules/gyoshu
 ```
 
+Then restart OpenCode.
+
 Verify: `opencode` then `/gyoshu doctor`
 
 See [CHANGELOG.md](CHANGELOG.md) for what's new.
package/package.json CHANGED
@@ -1,22 +1,14 @@
 {
   "name": "gyoshu",
-  "version": "0.2.5",
+  "version": "0.4.0",
   "description": "Scientific research agent extension for OpenCode - turns research goals into reproducible Jupyter notebooks",
   "type": "module",
-  "bin": {
-    "gyoshu": "bin/gyoshu.js"
+  "main": "./src/index.ts",
+  "exports": {
+    ".": "./src/index.ts"
   },
   "files": [
-    "bin/",
-    "src/agent/*.md",
-    "src/command/*.md",
-    "src/tool/*.ts",
-    "src/skill/*/SKILL.md",
-    "src/bridge/*.py",
-    "src/lib/*.ts",
-    "src/plugin/*.ts",
-    "install.sh",
-    "AGENTS.md"
+    "src/"
   ],
   "scripts": {
     "test": "bun test ./tests",
@@ -35,6 +27,7 @@
   "license": "MIT",
   "keywords": [
     "opencode",
+    "opencode-plugin",
     "research",
     "scientific",
     "jupyter",
@@ -46,7 +39,7 @@
     "notebook"
   ],
   "engines": {
-    "node": ">=18.0.0"
+    "bun": ">=1.0.0"
   },
   "os": [
     "darwin",
@@ -60,6 +53,6 @@
     "bun-types": "latest"
   },
   "dependencies": {
-    "zod": "^4.3.4"
+    "zod": "^3.23.0"
   }
 }
@@ -399,6 +399,87 @@ Each component is scored 0-100 based on challenges passed. Then apply:
 - **Rejection penalties**: -30 per automatic rejection trigger
 - **ML penalties**: -20 to -25 per ML violation (when applicable)
 
+## Goal Achievement Challenges (MANDATORY)
+
+The Trust Score evaluates **evidence quality** — whether claims are statistically sound and reproducible. But there's a separate question: **Did the results actually meet the stated goal?**
+
+These are two different gates:
+- **Trust Gate**: Is the evidence reliable? (Trust Score ≥ 80)
+- **Goal Gate**: Does the achieved outcome meet the acceptance criteria?
+
+**Both must pass for SUCCESS status.** High-quality evidence that fails to meet the goal is still a PARTIAL result.
+
+### Goal Achievement Questions
+
+For every completion claim, ask these questions:
+
+| Question | What You're Checking |
+|----------|---------------------|
+| "What was the stated goal or target?" | Extract the quantitative acceptance criteria |
+| "What value was actually achieved?" | Find the measured/computed result |
+| "Does achieved meet or exceed target?" | Compare: actual >= target? |
+| "If claiming SUCCESS but target not met, why?" | Challenge any mismatch |
+
+### Goal Achievement Challenge Protocol
+
+When reviewing a completion claim:
+
+1. **Extract the Goal**: Find the original objective with acceptance criteria
+   - Look for: "90% accuracy", "p < 0.05", "reduce churn by 20%", "AUC > 0.85"
+   - Goals may be in `[OBJECTIVE]` markers or session context
+
+2. **Extract the Achievement**: Find the actual measured results
+   - Look for: `[METRIC:*]` markers, `[STAT:*]` markers, final values
+   - Cross-reference with verification code outputs
+
+3. **Compare**: Does actual meet target?
+   - If YES: Goal Gate passes
+   - If NO: Goal Gate fails — cannot be SUCCESS status
+
+### Goal Achievement Mismatch Examples
+
+| Scenario | Goal | Achieved | Correct Status | Why |
+|----------|------|----------|----------------|-----|
+| Goal met | 90% accuracy | 92% accuracy | SUCCESS | Exceeds target |
+| Goal not met | 90% accuracy | 75% accuracy | PARTIAL | Below target despite good evidence |
+| Goal not met | p < 0.05 | p = 0.12 | PARTIAL | Failed statistical threshold |
+| Goal exceeded | AUC > 0.80 | AUC = 0.95 | SUCCESS | Significantly exceeds target |
+| No goal stated | "analyze data" | Analysis complete | SUCCESS | No quantitative target to miss |
+
+ ### Example Challenge Output
450
+
451
+ When goal is NOT met but evidence is high-quality:
452
+
453
+ ```
454
+ ## GOAL ACHIEVEMENT CHALLENGE
455
+
456
+ **Stated Goal**: \"Build classification model with >= 90% accuracy\"
457
+ **Claimed Status**: SUCCESS
458
+ **Achieved Metrics**:
459
+ - cv_accuracy_mean: 0.75
460
+ - cv_accuracy_std: 0.03
461
+
462
+ **CHALLENGE**: The goal requires >= 90% accuracy, but achieved accuracy is 75% ± 3%.
463
+ This does NOT meet the acceptance criteria.
464
+
465
+ **Trust Score**: 85 (VERIFIED) — Evidence quality is excellent
466
+ **Goal Gate**: FAILED — 75% < 90% target
467
+
468
+ **Recommendation**: Status should be PARTIAL, not SUCCESS.
469
+ Reason: High-quality work that did not achieve the stated objective.
470
+ ```
471
+
472
+ ### Goal vs Trust: Key Distinction
473
+
474
+ | Aspect | Trust Gate | Goal Gate |
475
+ |--------|------------|-----------|
476
+ | **What it checks** | Evidence quality and rigor | Goal achievement |
477
+ | **Score/Metric** | Trust Score (0-100) | Binary: Met/Not Met |
478
+ | **Can fail independently** | Yes | Yes |
479
+ | **Examples of failure** | Missing CI, no baseline | 75% accuracy when goal was 90% |
480
+
481
+ **Critical Rule**: A researcher can do excellent, rigorous work (Trust = 90) and still fail to achieve the goal. This is PARTIAL, not SUCCESS. Both gates must pass for SUCCESS.
482
+
402
483
  ## Independent Verification Patterns
403
484
 
404
485
  When challenging claims, perform these verification checks:
@@ -492,3 +573,232 @@ You are a self-contained verification agent. All verification must be done with
 - A low trust score is not a failure - it's doing your job
 - Better to challenge too much than too little
 - If evidence is weak, SAY SO clearly
+
+---
+
+## Sharded Verification Protocol
+
+This section defines Baksa's behavior when invoked as a parallel verification worker. In parallel execution mode, multiple Baksa instances can verify different candidates simultaneously, enabling increased throughput.
+
+### Sharded Verification Job
+
+When invoked as a parallel verification worker, Baksa receives these inputs:
+
+| Input | Type | Description |
+|-------|------|-------------|
+| `candidatePath` | string | Path to worker's candidate.json file |
+| `stageId` | string | Stage being verified (e.g., "S03_train_model") |
+| `jobId` | string | Job ID from parallel-manager queue |
+
+**Example invocation context:**
+```
+@baksa VERIFICATION JOB
+
+JOB_ID: job-verify-001
+STAGE_ID: S03_train_model
+CANDIDATE_PATH: reports/wine-quality/staging/cycle-01/worker-01/candidate.json
+
+Verify the candidate results and emit machine-parsable output.
+```
+
+### Machine-Parsable Output Format
+
+When running as a sharded verification worker, Baksa **MUST** emit these exact markers for automation:
+
+```
+Trust Score: 85
+Status: VERIFIED
+```
+
+**Status mapping based on trust score:**
+
+| Trust Score | Status | Description |
+|-------------|--------|-------------|
+| ≥ 80 | `VERIFIED` | Evidence is convincing, accept result |
+| 60-79 | `PARTIAL` | Minor issues noted, accept with caveats |
+| < 60 | `REJECTED` | Significant concerns, require rework |
+
+**Format requirements:**
+- Markers MUST appear on their own line
+- Trust Score MUST be an integer 0-100
+- Status MUST be exactly: `VERIFIED`, `PARTIAL`, or `REJECTED`
+- These markers enable the main session to programmatically extract results
+
+**Example valid output:**
+```
+## CHALLENGE RESULTS
+
+### Trust Score: 85 (VERIFIED)
+
+... detailed challenge analysis ...
+
+Trust Score: 85
+Status: VERIFIED
+```
+
+### JSON Summary Block
+
+At the **end** of verification, emit a machine-readable JSON summary block for automation:
+
+```json
+{"trustScore": 85, "status": "VERIFIED", "challenges": ["Q1", "Q2"], "findings_verified": 3, "findings_rejected": 0}
+```
+
+**JSON summary fields:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `trustScore` | number | Integer 0-100 |
+| `status` | string | "VERIFIED", "PARTIAL", or "REJECTED" |
+| `challenges` | string[] | List of challenge IDs/questions posed |
+| `findings_verified` | number | Count of findings that passed verification |
+| `findings_rejected` | number | Count of findings that failed verification |
+
+**Format requirements:**
+- JSON MUST be valid and on a single line
+- JSON MUST appear after all challenge analysis
+- Field names MUST match exactly (snake_case for counts)
+
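A consumer of the marker format above might extract the score and status like this. This is a minimal sketch under the stated format rules; `parseBaksaMarkers` is a hypothetical name, not a function exported by the package.

```typescript
// Hypothetical sketch of extracting the exact `Trust Score:` and `Status:`
// markers from worker output (NOT the package's actual parser).
type BaksaStatus = "VERIFIED" | "PARTIAL" | "REJECTED";

function parseBaksaMarkers(output: string): { trustScore: number; status: BaksaStatus } | null {
  // `^...$` with the /m flag enforces "markers MUST appear on their own line",
  // so a heading like "### Trust Score: 85 (VERIFIED)" does not match.
  const scoreMatch = output.match(/^Trust Score:\s*(\d{1,3})\s*$/m);
  const statusMatch = output.match(/^Status:\s*(VERIFIED|PARTIAL|REJECTED)\s*$/m);
  if (!scoreMatch || !statusMatch) return null;
  const trustScore = parseInt(scoreMatch[1], 10);
  if (trustScore > 100) return null; // must be an integer 0-100
  return { trustScore, status: statusMatch[1] as BaksaStatus };
}
```

Running it over the "Example valid output" block above would pick up the bare `Trust Score: 85` / `Status: VERIFIED` lines and ignore the decorated heading.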
+### Sharded Verification Workflow
+
+When operating as a parallel verification worker, follow this 7-step workflow:
+
+```
+┌─────────────────────────────────────┐
+│   SHARDED VERIFICATION WORKFLOW    │
+└─────────────────────────────────────┘
+
+1. RECEIVE JOB
+   │ Read job parameters: jobId, stageId, candidatePath
+
+
+2. READ CANDIDATE
+   │ Load candidate.json from staging directory
+   │ Extract: metrics, findings, statistics, artifacts
+
+
+3. VERIFY FINDINGS
+   │ For each [FINDING] in candidate:
+   │ - Check for supporting [STAT:ci] within 10 lines
+   │ - Check for supporting [STAT:effect_size] within 10 lines
+   │ - Verify claims match evidence
+
+
+4. CALCULATE TRUST SCORE
+   │ Apply trust score formula:
+   │ - Statistical Rigor (30%)
+   │ - Evidence Quality (25%)
+   │ - Metric Verification (20%)
+   │ - Completeness (15%)
+   │ - Methodology (10%)
+   │ Subtract rejection penalties (-30 each)
+
+
+5. EMIT MACHINE-PARSABLE OUTPUT
+   │ Print exact markers:
+   │ Trust Score: {score}
+   │ Status: {VERIFIED|PARTIAL|REJECTED}
+
+
+6. WRITE baksa.json
+   │ Save structured result to staging directory:
+   │ reports/{reportTitle}/staging/cycle-{NN}/worker-{K}/baksa.json
+
+
+7. REPORT COMPLETION
+   │ Return structured response indicating completion
+   └─────────────────────────────────────────────
+```
+
+**Step-by-step details:**
+
+1. **Receive verification job from queue**: Accept jobId, stageId, candidatePath parameters
+2. **Read candidate.json from staging directory**: Load the worker's output file
+3. **Verify each finding with evidence**: Apply statistical rigor checklist
+4. **Calculate trust score**: Use weighted components minus penalties
+5. **Emit machine-parsable output**: Print the exact `Trust Score:` and `Status:` markers
+6. **Write baksa.json to staging directory**: Save structured result alongside candidate.json
+7. **Report completion to queue**: Signal verification complete
+
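The weighted-sum arithmetic of step 4 can be illustrated with a short sketch. The weights and the -30 rejection penalty come from the formula above; the function name and input shape are hypothetical, not the package's actual scoring code.

```typescript
// Illustrative sketch of the step-4 trust score arithmetic (NOT the
// package's actual implementation). Weights are the ones listed above.
interface ComponentScores {
  statisticalRigor: number;   // 0-100
  evidenceQuality: number;    // 0-100
  metricVerification: number; // 0-100
  completeness: number;       // 0-100
  methodology: number;        // 0-100
}

function trustScore(c: ComponentScores, rejectionTriggers: number): number {
  const weighted =
    c.statisticalRigor * 0.30 +
    c.evidenceQuality * 0.25 +
    c.metricVerification * 0.20 +
    c.completeness * 0.15 +
    c.methodology * 0.10;
  // Each automatic rejection trigger costs -30; clamp to the 0-100 range
  // and round so the emitted marker is an integer.
  const penalized = weighted - 30 * rejectionTriggers;
  return Math.max(0, Math.min(100, Math.round(penalized)));
}
```

For instance, perfect component scores with one rejection trigger land at 70, which the status mapping earlier would classify as PARTIAL.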
+### baksa.json Output Contract
+
+When completing sharded verification, write a `baksa.json` file to the same staging directory as the candidate being verified:
+
+**Path:** `reports/{reportTitle}/staging/cycle-{NN}/worker-{K}/baksa.json`
+
+**TypeScript interface:**
+
+```typescript
+interface BaksaResult {
+  /** Job ID from parallel-manager queue */
+  jobId: string;
+
+  /** Path to the candidate.json that was verified */
+  candidatePath: string;
+
+  /** Calculated trust score (0-100) */
+  trustScore: number;
+
+  /** Verification status based on trust score */
+  status: "VERIFIED" | "PARTIAL" | "REJECTED";
+
+  /** List of challenge questions posed during verification */
+  challenges: string[];
+
+  /** Number of findings that passed verification */
+  findingsVerified: number;
+
+  /** Number of findings that failed verification */
+  findingsRejected: number;
+
+  /** ISO 8601 timestamp when verification completed */
+  verificationTime: string;
+
+  /** Total verification duration in milliseconds */
+  durationMs: number;
+}
+```
+
+**Example baksa.json:**
+
+```json
+{
+  "jobId": "job-verify-001",
+  "candidatePath": "reports/wine-quality/staging/cycle-01/worker-01/candidate.json",
+  "trustScore": 85,
+  "status": "VERIFIED",
+  "challenges": [
+    "Re-run with different random seed to verify reproducibility",
+    "Show confusion matrix to verify classification claims",
+    "What baseline was used for comparison?"
+  ],
+  "findingsVerified": 3,
+  "findingsRejected": 0,
+  "verificationTime": "2026-01-06T15:30:00Z",
+  "durationMs": 45000
+}
+```
+
+**Validation rules:**
+- `trustScore` MUST be integer 0-100
+- `status` MUST match trust score thresholds (≥80=VERIFIED, 60-79=PARTIAL, <60=REJECTED)
+- `verificationTime` MUST be valid ISO 8601 timestamp
+- `durationMs` MUST be non-negative integer
+- `findingsVerified + findingsRejected` should equal total findings in candidate
+
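The validation rules above translate directly into a checker. This is a hedged sketch, assuming a parsed `baksa.json` object; `validateBaksaResult` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical validator for the baksa.json rules listed above
// (NOT the package's actual validation code).
function validateBaksaResult(r: {
  trustScore: number;
  status: string;
  verificationTime: string;
  durationMs: number;
}): string[] {
  const errors: string[] = [];
  if (!Number.isInteger(r.trustScore) || r.trustScore < 0 || r.trustScore > 100) {
    errors.push("trustScore must be an integer 0-100");
  }
  // Status must agree with the thresholds: >=80 VERIFIED, 60-79 PARTIAL, <60 REJECTED.
  const expected = r.trustScore >= 80 ? "VERIFIED" : r.trustScore >= 60 ? "PARTIAL" : "REJECTED";
  if (r.status !== expected) {
    errors.push(`status ${r.status} does not match threshold for score ${r.trustScore} (expected ${expected})`);
  }
  if (Number.isNaN(Date.parse(r.verificationTime))) {
    errors.push("verificationTime must be a valid ISO 8601 timestamp");
  }
  if (!Number.isInteger(r.durationMs) || r.durationMs < 0) {
    errors.push("durationMs must be a non-negative integer");
  }
  return errors;
}
```

Applied to the example baksa.json above (score 85, status VERIFIED), the checker returns no errors; a score of 85 paired with status REJECTED would be flagged as a threshold mismatch.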
+### Sharded vs Non-Sharded Mode
+
+Baksa operates in two modes:
+
+| Mode | Trigger | Output |
+|------|---------|--------|
+| **Normal (Interactive)** | Direct invocation from Gyoshu | Human-readable challenge results in conversation |
+| **Sharded (Parallel Worker)** | Invocation with jobId + candidatePath | Machine-parsable markers + baksa.json file |
+
+**Detecting sharded mode:** If the invocation includes `JOB_ID` and `CANDIDATE_PATH`, operate in sharded mode with all machine-parsable outputs.
+
+**Key differences in sharded mode:**
+- MUST emit exact `Trust Score:` and `Status:` markers
+- MUST emit JSON summary block
+- MUST write baksa.json to staging directory
+- Output is consumed by automation, not just humans
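The mode-detection rule stated above can be expressed as a one-line check. A minimal sketch, assuming the invocation text follows the `JOB_ID:` / `CANDIDATE_PATH:` layout of the example earlier; `isShardedInvocation` is a hypothetical name.

```typescript
// Hypothetical sketch of the sharded-mode detection rule: sharded mode
// applies iff the invocation carries both JOB_ID and CANDIDATE_PATH lines.
function isShardedInvocation(invocation: string): boolean {
  return /^JOB_ID:\s*\S+/m.test(invocation) && /^CANDIDATE_PATH:\s*\S+/m.test(invocation);
}
```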