pi-evalset-lab 0.1.0
- package/.copier-answers.yml +5 -0
- package/.githooks/pre-commit +12 -0
- package/.github/CODEOWNERS +12 -0
- package/.github/ISSUE_TEMPLATE/bug-report.yml +63 -0
- package/.github/ISSUE_TEMPLATE/config.yml +5 -0
- package/.github/ISSUE_TEMPLATE/docs.yml +39 -0
- package/.github/ISSUE_TEMPLATE/feature-request.yml +41 -0
- package/.github/VOUCHED.td +8 -0
- package/.github/dependabot.yml +13 -0
- package/.github/pull_request_template.md +34 -0
- package/.github/workflows/ci.yml +37 -0
- package/.github/workflows/publish.yml +60 -0
- package/.github/workflows/release-please.yml +25 -0
- package/.github/workflows/vouch-check-pr.yml +29 -0
- package/.github/workflows/vouch-manage.yml +34 -0
- package/.pi/extensions/startup-intake-router.ts +151 -0
- package/.pi/prompts/init-project-docs.md +32 -0
- package/.release-please-config.json +11 -0
- package/.release-please-manifest.json +3 -0
- package/AGENTS.md +39 -0
- package/CHANGELOG.md +43 -0
- package/CODE_OF_CONDUCT.md +50 -0
- package/CONTRIBUTING.md +28 -0
- package/NEXT_SESSION_PROMPT.md +14 -0
- package/README.md +246 -0
- package/SECURITY.md +34 -0
- package/SUPPORT.md +37 -0
- package/docs/dev/CONTRIBUTING.md +37 -0
- package/docs/dev/EXTENSION_SOP.md +43 -0
- package/docs/dev/next_steps.md +17 -0
- package/docs/dev/plans/001-initial-plan.md +24 -0
- package/docs/dev/status.md +21 -0
- package/docs/org/operating_model.md +39 -0
- package/docs/org/project-docs-intake.questions.json +60 -0
- package/docs/project/foundation.md +28 -0
- package/docs/project/incentives.md +17 -0
- package/docs/project/resources.md +26 -0
- package/docs/project/skills.md +17 -0
- package/docs/project/strategic_goals.md +18 -0
- package/docs/project/tactical_goals.md +39 -0
- package/docs/project/vision.md +21 -0
- package/examples/.gitkeep +0 -0
- package/examples/fixed-task-set-v2.json +127 -0
- package/examples/fixed-task-set-v3.json +126 -0
- package/examples/fixed-task-set.json +22 -0
- package/examples/system-baseline.txt +1 -0
- package/examples/system-candidate.txt +6 -0
- package/extensions/evalset.ts +1090 -0
- package/external/.gitkeep +0 -0
- package/ontology/.gitkeep +0 -0
- package/package.json +31 -0
- package/policy/security-policy.json +10 -0
- package/prek.toml +15 -0
- package/prompts/implementation-planning.md +17 -0
- package/prompts/init-project-docs.md +32 -0
- package/prompts/security-review.md +17 -0
- package/scripts/docs-list.sh +50 -0
- package/scripts/init-project-docs.sh +56 -0
- package/scripts/install-hooks.sh +13 -0
- package/scripts/sync-to-live.sh +91 -0
- package/scripts/validate-structure.sh +325 -0
- package/src/.gitkeep +0 -0
- package/tests/.gitkeep +0 -0
package/CHANGELOG.md
ADDED
@@ -0,0 +1,43 @@
---
summary: "Changelog for scaffold evolution."
read_when:
  - "Preparing a release or reviewing history."
system4d:
  container: "Release log for this extension package."
  compass: "Track meaningful deltas per version."
  engine: "Document changes at release boundaries."
  fog: "Versioning policy may evolve with team preference."
---

# Changelog

All notable changes to this project should be documented here.

## [Unreleased]

### Added

- Added `/evalset` MVP command with subcommands:
  - `init` to generate a starter fixed-task-set dataset
  - `run` to evaluate one variant against a dataset
  - `compare` to evaluate baseline vs candidate system prompts
- Added example files in `examples/`:
  - `fixed-task-set.json`
  - `fixed-task-set-v2.json`
  - `fixed-task-set-v3.json`
  - `system-baseline.txt`
  - `system-candidate.txt`
- Added report output support to `.evalset/reports/*.json` with per-case and aggregate metrics.
- Added run identity metadata to reports (`runId`, `datasetHash`, `casesHash`, `variantHash`).
- Reduced session message payload size by storing only lightweight report metadata instead of full report bodies.

### Changed

- Clarified `/evalset` invocation docs: use `pi -p` (or `pi -e ... -p`) for non-interactive runs; `/evalset` is not a standalone shell binary.
- Added the same non-interactive invocation note to `/evalset help` output.

## [0.1.0] - 2026-02-08

### Added

- Initial production-ready scaffold generated from template v2.

package/CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,50 @@
---
summary: "Community behavior expectations and enforcement path."
read_when:
  - "Participating in issues, discussions, or pull requests."
  - "Handling conduct incidents."
system4d:
  container: "Shared standards for respectful collaboration."
  compass: "Safety, respect, and clarity over conflict escalation."
  engine: "Observe -> report -> review -> enforce -> document."
  fog: "Context around incidents can be incomplete at first report."
---

# Code of Conduct

## Our commitment

We want this repository to be a respectful, harassment-free space for everyone,
regardless of background or identity.

## Expected behavior

- Be respectful and constructive.
- Assume good intent, ask clarifying questions, and stay technical.
- Accept feedback and correct mistakes quickly.
- Keep discussions focused on project outcomes.

## Unacceptable behavior

- Harassment, threats, hate speech, or intimidation.
- Personal attacks, repeated hostile behavior, or trolling.
- Sharing private information without consent.
- Sexualized language or unwanted attention.

## Reporting incidents

If you experience or witness unacceptable behavior:

1. Contact maintainers privately using the channels listed in [SUPPORT.md](SUPPORT.md).
2. Share relevant links/screenshots and timeline details.
3. Do not post sensitive personal data publicly.

## Enforcement

Maintainers may remove or edit comments, close threads, or block contributors for
behavior that violates this policy. Responses may include a warning, a temporary ban,
or a permanent ban, depending on severity and repetition.

## Attribution

Adapted from the [Contributor Covenant](https://www.contributor-covenant.org/version/2/0/code_of_conduct.html).

package/CONTRIBUTING.md
ADDED
@@ -0,0 +1,28 @@
---
summary: "Top-level contribution entrypoint linking to the detailed contributor guide."
read_when:
  - "Preparing to submit code or docs changes."
  - "Looking for contribution quality gates."
system4d:
  container: "Contribution intake and quality policy."
  compass: "Small, reviewable, verified changes."
  engine: "Read guide -> implement -> validate -> open PR."
  fog: "Project-specific constraints may evolve with release policy changes."
---

# Contributing

Primary contributor guide: [docs/dev/CONTRIBUTING.md](docs/dev/CONTRIBUTING.md)

## Minimum checklist

1. Read applicable docs (`npm run docs:list`).
2. Keep changes scoped.
3. Run `npm run check`.
4. Update docs/changelog when behavior changes.
5. Open a PR with validation output.

## Conduct + support

- [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md)
- [SUPPORT.md](SUPPORT.md)

package/NEXT_SESSION_PROMPT.md
ADDED
@@ -0,0 +1,14 @@
---
summary: "Session handoff prompt for pi-evalset-lab."
read_when:
  - "Starting the next focused development session."
system4d:
  container: "Session handoff artifact."
  compass: "Resume work quickly with explicit priorities."
  engine: "Capture context, constraints, and next actions."
  fog: "Staleness risk if not updated after major changes."
---

# Next session prompt for pi-evalset-lab

Use this file to capture exact follow-up tasks for the next coding session.

package/README.md
ADDED
@@ -0,0 +1,246 @@
---
summary: "Overview and quickstart for pi-evalset-lab."
read_when:
  - "Starting work in this repository."
system4d:
  container: "Repository scaffold for a pi extension package."
  compass: "Ship small, safe, testable extension iterations."
  engine: "Plan -> implement -> verify with docs and hooks in sync."
  fog: "Unknown runtime integration edge cases until first live sync."
---

# pi-evalset-lab

A pi extension package for evaluating system prompts against fixed task sets, built on a production-ready starter scaffold.

## Quickstart

1. Install dependencies (if you add any):

   ```bash
   npm install
   ```

2. Try the extension with pi:

   ```bash
   pi -e ./extensions/evalset.ts
   ```

3. Install the package into pi:

   ```bash
   pi install /absolute/path/to/pi-evalset-lab
   ```

## `/evalset` command (MVP)

This extension adds `/evalset` for fixed-task-set evaluation runs.

### Commands

```text
/evalset help
/evalset init [dataset-path] [--force]
/evalset run <dataset.json> [--system-file <path>] [--system-text <text>] [--variant <name>] [--max-cases <n>] [--temperature <n>] [--out <report.json>]
/evalset compare <dataset.json> <baseline-system.txt> <candidate-system.txt> [--baseline-name <name>] [--candidate-name <name>] [--max-cases <n>] [--temperature <n>] [--out <report.json>]
```
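
The `run` flags above map naturally onto a small argument parser. The sketch below is a hypothetical illustration, not the code in `extensions/evalset.ts`: it assumes pi hands the command its arguments as an already-tokenized string array, that every flag takes exactly one value, and that unknown flags should be rejected. `RunOptions`, `RUN_FLAGS`, and `parseRunArgs` are illustrative names.

```typescript
// Hypothetical option shape for `/evalset run`; field names are illustrative.
interface RunOptions {
  dataset: string;
  systemFile?: string;
  systemText?: string;
  variant?: string;
  maxCases?: number;
  temperature?: number;
  out?: string;
}

const RUN_FLAGS = new Set(["system-file", "system-text", "variant", "max-cases", "temperature", "out"]);

// Assumes arguments arrive pre-tokenized and every flag takes exactly one value.
function parseRunArgs(args: string[]): RunOptions {
  const positional: string[] = [];
  const flags = new Map<string, string>();
  for (let i = 0; i < args.length; i++) {
    const token = args[i];
    if (!token.startsWith("--")) {
      positional.push(token);
      continue;
    }
    const name = token.slice(2);
    if (!RUN_FLAGS.has(name)) throw new Error(`Unknown flag: ${token}`);
    const value = args[i + 1];
    if (value === undefined) throw new Error(`Missing value for ${token}`);
    flags.set(name, value);
    i++; // skip the value just consumed
  }
  if (positional.length !== 1) throw new Error("Expected exactly one <dataset.json> argument");

  // Numeric flags are validated here so bad input fails at parse time.
  const num = (name: string): number | undefined => {
    const raw = flags.get(name);
    if (raw === undefined) return undefined;
    const n = Number(raw);
    if (!Number.isFinite(n)) throw new Error(`--${name} must be a number, got "${raw}"`);
    return n;
  };

  return {
    dataset: positional[0],
    systemFile: flags.get("system-file"),
    systemText: flags.get("system-text"),
    variant: flags.get("variant"),
    maxCases: num("max-cases"),
    temperature: num("temperature"),
    out: flags.get("out"),
  };
}
```

For example, `parseRunArgs(["examples/fixed-task-set.json", "--variant", "baseline"])` yields `dataset: "examples/fixed-task-set.json"` and `variant: "baseline"`, leaving the numeric options `undefined`.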

### Running modes

`/evalset` is a pi slash command, not a shell executable.

Interactive mode:

```bash
pi -e ./extensions/evalset.ts
# then inside pi:
/evalset compare examples/fixed-task-set.json examples/system-baseline.txt examples/system-candidate.txt
```

Non-interactive mode (scripts/CI):

```bash
pi -e ./extensions/evalset.ts -p "/evalset compare examples/fixed-task-set.json examples/system-baseline.txt examples/system-candidate.txt"
# or, if the extension is already installed and enabled:
pi -p "/evalset compare examples/fixed-task-set.json examples/system-baseline.txt examples/system-candidate.txt"
```

### Example workflow (inside pi)

```text
/evalset run examples/fixed-task-set.json --variant baseline
/evalset compare examples/fixed-task-set.json examples/system-baseline.txt examples/system-candidate.txt
```
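
The `compare` step runs the same dataset under a baseline and a candidate system prompt. As a hedged illustration of how per-case outcomes could roll up into aggregate metrics, the sketch below assumes a simple pass/fail result per case; `CaseResult`, `compareVariants`, and all field names are hypothetical and not taken from the extension's report schema.

```typescript
// Hypothetical per-case outcome; the real report schema may differ.
interface CaseResult {
  caseId: string;
  passed: boolean;
}

interface VariantSummary {
  total: number;
  passed: number;
  passRate: number;
}

function summarize(results: CaseResult[]): VariantSummary {
  const passed = results.filter((r) => r.passed).length;
  return { total: results.length, passed, passRate: results.length === 0 ? 0 : passed / results.length };
}

// Aggregates both variants and lists cases whose outcome flipped between them.
function compareVariants(baseline: CaseResult[], candidate: CaseResult[]) {
  const before = new Map<string, boolean>(baseline.map((r): [string, boolean] => [r.caseId, r.passed]));
  const improved: string[] = [];
  const regressed: string[] = [];
  for (const r of candidate) {
    const prior = before.get(r.caseId);
    if (prior === undefined) continue; // case not present in the baseline run
    if (!prior && r.passed) improved.push(r.caseId);
    if (prior && !r.passed) regressed.push(r.caseId);
  }
  const base = summarize(baseline);
  const cand = summarize(candidate);
  return { baseline: base, candidate: cand, passRateDelta: cand.passRate - base.passRate, improved, regressed };
}
```

A zero pass-rate delta can still hide offsetting flips (one case fixed, another broken), which is why listing `improved` and `regressed` case IDs next to the aggregate can be worthwhile.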

### Included datasets

- `examples/fixed-task-set.json`: tiny smoke set (3 cases)
- `examples/fixed-task-set-v2.json`: larger first-pass set
- `examples/fixed-task-set-v3.json`: less brittle checks (recommended)

The command writes JSON reports to:

- the path given by `--out <path>`, when provided
- otherwise, `.evalset/reports/*.json` under the current project directory
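
The destination rule above can be sketched as a small path resolver. This is a hypothetical helper, not the extension's code: the `<runId>.json` default file name and resolving a relative `--out` against the project directory are both assumptions.

```typescript
import { isAbsolute, join, resolve } from "node:path";

// Hypothetical helper mirroring the documented rule: `--out` wins, otherwise
// write under .evalset/reports/ in the project directory. The `<runId>.json`
// file name is an assumption, not the extension's confirmed naming scheme.
function resolveReportPath(projectDir: string, runId: string, out?: string): string {
  if (out) {
    // Relative --out paths are resolved against the project directory.
    return isAbsolute(out) ? out : resolve(projectDir, out);
  }
  return join(projectDir, ".evalset", "reports", `${runId}.json`);
}
```

Before writing, the parent directory has to exist, for example via `mkdirSync(dirname(path), { recursive: true })` from `node:fs`.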

Each report includes run identity metadata:

- `runId`
- `datasetHash`
- `casesHash`
- `variantHash` for `run`, or separate baseline and candidate variant hashes for `compare`
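
One way to derive identity fields like these is to hash a canonical JSON serialization, so that key order does not change the result. The sketch below is an assumption about the approach, not the extension's confirmed algorithm: the `Dataset` shape is hypothetical, and hashing the system prompt plus temperature for `variantHash` is an illustrative choice.

```typescript
import { createHash, randomUUID } from "node:crypto";

// Serializes JSON with sorted object keys so logically equal inputs hash identically.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  // JSON.stringify(undefined) yields undefined; encode it as null, as JSON does inside arrays.
  return JSON.stringify(value) ?? "null";
}

const sha256 = (text: string): string => createHash("sha256").update(text, "utf8").digest("hex");

// Hypothetical dataset shape; the real schema may carry more fields.
interface Dataset {
  name?: string;
  cases: unknown[];
}

function runIdentity(dataset: Dataset, systemPrompt: string, temperature?: number) {
  return {
    runId: randomUUID(), // unique per execution, not a content hash
    datasetHash: sha256(canonicalJson(dataset)),
    casesHash: sha256(canonicalJson(dataset.cases)),
    variantHash: sha256(canonicalJson({ systemPrompt, temperature })),
  };
}
```

Under this scheme, renaming a dataset changes `datasetHash` but leaves `casesHash` untouched, which is one plausible reason to track both.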

Session messages keep only lightweight report metadata (`reportPath`, IDs, summary metrics), not full report bodies.
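
A minimal sketch of that split, assuming a hypothetical report shape (`Report`, `SessionReportMeta`, and `toSessionMeta` are illustrative names): the session entry carries a pointer to the JSON file plus headline numbers, while per-case outputs stay on disk.

```typescript
// Hypothetical report shape; only the fields needed for this sketch are modeled.
interface CaseRecord {
  caseId: string;
  output: string;
  passed: boolean;
}

interface Report {
  runId: string;
  datasetHash: string;
  casesHash: string;
  cases: CaseRecord[];
  aggregate: { total: number; passed: number; passRate: number };
}

interface SessionReportMeta {
  reportPath: string;
  runId: string;
  datasetHash: string;
  casesHash: string;
  summary: Report["aggregate"];
}

// Keeps a pointer to the on-disk report plus headline numbers; per-case outputs stay in the file.
function toSessionMeta(report: Report, reportPath: string): SessionReportMeta {
  return {
    reportPath,
    runId: report.runId,
    datasetHash: report.datasetHash,
    casesHash: report.casesHash,
    summary: { ...report.aggregate },
  };
}
```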

## Optional core hooks (future, not required for this MVP)

This extension works today without core changes. If we decide to harden it further, optional core support could include:

1. Stable agent-level lineage IDs (`runId`/`traceId`) across extension events.
2. Explicit reproducibility capability metadata in `pi-ai` (e.g. seed support and determinism caveats per provider/model).
3. A shared canonical provider-payload hash helper in `pi-ai`.
4. A headless agent-eval API for tool-heavy/full agent-loop benchmark runs.

## Repository checks

Run:

```bash
npm run check
```

This executes [scripts/validate-structure.sh](scripts/validate-structure.sh).

## Release + security baseline

By default, this scaffold uses **release-please** for a single-package release flow (release PR, then a `vX.Y.Z` tag) and npm Trusted Publishing via OIDC.

Included files:

- [CI workflow](.github/workflows/ci.yml)
- [release-please workflow](.github/workflows/release-please.yml)
- [publish workflow](.github/workflows/publish.yml)
- [Dependabot config](.github/dependabot.yml)
- [CODEOWNERS](.github/CODEOWNERS)
- [release-please config](.release-please-config.json)
- [release-please manifest](.release-please-manifest.json)
- [Security policy](SECURITY.md)

Before the first production release:

1. Confirm/adjust owners in [.github/CODEOWNERS](.github/CODEOWNERS).
2. Enable branch protection on `main`.
3. Configure npm Trusted Publishing for this repo and the [publish workflow](.github/workflows/publish.yml).
4. Merge the release PR opened by release-please, then publish from the resulting GitHub release.

## Issue + PR intake baseline

Included files:

- [Bug report form](.github/ISSUE_TEMPLATE/bug-report.yml)
- [Feature request form](.github/ISSUE_TEMPLATE/feature-request.yml)
- [Docs request form](.github/ISSUE_TEMPLATE/docs.yml)
- [Issue template config](.github/ISSUE_TEMPLATE/config.yml)
- [PR template](.github/pull_request_template.md)
- [Code of conduct](CODE_OF_CONDUCT.md)
- [Support guide](SUPPORT.md)
- [Top-level contributing guide](CONTRIBUTING.md)

## Vouch trust gate baseline

Included files:

- [Vouched contributors list](.github/VOUCHED.td)
- [PR trust gate workflow](.github/workflows/vouch-check-pr.yml)
- [Issue-comment trust management workflow](.github/workflows/vouch-manage.yml)

Default behavior:

- The PR workflow runs on `pull_request_target` (`opened`, `reopened`).
- `require-vouch: true` and `auto-close: true` are enabled by default.
- Maintainers can comment `vouch`, `denounce`, or `unvouch` on issues to update trust state.
- Vouch actions are SHA-pinned (`0e11a71bba23218a284d3ecca162e75a110fd7e3`) for reproducibility and supply-chain review.

Bootstrap step:

- Confirm/adjust entries in [.github/VOUCHED.td](.github/VOUCHED.td) before enforcing production policy.

## Docs discovery

Run:

```bash
npm run docs:list
npm run docs:list:workspace
npm run docs:list:json
```

Wrapper script: [scripts/docs-list.sh](scripts/docs-list.sh)

Resolution order:

1. `DOCS_LIST_SCRIPT`
2. `./scripts/docs-list.mjs` (if vendored)
3. `~/ai-society/core/agent-scripts/scripts/docs-list.mjs`
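
The resolution order reads as a first-match search over candidate paths. The sketch below is a hypothetical TypeScript model of what the shell wrapper does. It assumes `DOCS_LIST_SCRIPT` is an environment variable and that a candidate is used only if the file exists; the wrapper may instead trust the variable unconditionally. The `exists` parameter is injectable so the order can be tested without touching the filesystem.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Returns the first existing candidate in the documented order, or undefined if none is found.
function resolveDocsListScript(
  env: Record<string, string | undefined>,
  repoRoot: string,
  home: string,
  exists: (p: string) => boolean = existsSync,
): string | undefined {
  const candidates = [
    env.DOCS_LIST_SCRIPT, // 1. explicit override (assumed to be an env var)
    join(repoRoot, "scripts", "docs-list.mjs"), // 2. vendored copy
    join(home, "ai-society", "core", "agent-scripts", "scripts", "docs-list.mjs"), // 3. shared install
  ];
  return candidates.find((p): p is string => p !== undefined && p !== "" && exists(p));
}
```

In real use this would be called as `resolveDocsListScript(process.env, process.cwd(), homedir())`, with `homedir` from `node:os`.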

## Copier lifecycle policy

- Keep `.copier-answers.yml` committed.
- Do not edit `.copier-answers.yml` manually.
- Run update/recopy from a clean destination repo (commit or stash pending changes first).
- Use `copier update --trust` when `.copier-answers.yml` includes `_commit` and update is supported.
- In non-interactive shells/CI, append `--defaults` to update/recopy.
- Use `copier recopy --trust` when update is unavailable (for example, with a local non-VCS source) or cannot reconcile cleanly.
- After recopy, re-apply local deltas intentionally and run `npm run check`.
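
The policy above boils down to a small decision. The sketch below is a hypothetical encoding of it that returns an argv array: `CopierContext` and `copierCommand` are illustrative names, and the "update cannot reconcile cleanly" fallback is left to human judgment because it only shows up after an attempted update.

```typescript
// Hypothetical inputs to the Copier policy decision.
interface CopierContext {
  worktreeClean: boolean; // nothing uncommitted or unstashed
  answersHaveCommit: boolean; // .copier-answers.yml contains `_commit`
  updateSupported: boolean; // e.g. false for a local non-VCS template source
  interactive: boolean; // false in CI or other non-interactive shells
}

function copierCommand(ctx: CopierContext): string[] {
  if (!ctx.worktreeClean) {
    throw new Error("Commit or stash pending changes before running copier.");
  }
  const args =
    ctx.answersHaveCommit && ctx.updateSupported
      ? ["copier", "update", "--trust"]
      : ["copier", "recopy", "--trust"];
  if (!ctx.interactive) args.push("--defaults"); // avoid prompts in CI
  return args;
}
```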

## Hook behavior

- Git uses `.githooks/pre-commit` (configured by [scripts/install-hooks.sh](scripts/install-hooks.sh)).
- If `prek` is available, the hook runs `prek` using [prek.toml](prek.toml).
- If `prek` is not available, the hook falls back to `scripts/validate-structure.sh`.
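
The real hook is a shell script. For readers who prefer a typed model, here is a hypothetical TypeScript sketch of the same fallback: `isOnPath` approximates `command -v` by checking for an executable file on `PATH` (a POSIX assumption that ignores Windows `.exe` lookup), and `selectHookRunner` only reports which runner would be chosen, not how it is invoked.

```typescript
import { accessSync, constants } from "node:fs";
import { delimiter, join } from "node:path";

// Rough equivalent of `command -v <bin>`: true if an executable named `bin` exists on PATH.
function isOnPath(bin: string, pathEnv: string = process.env.PATH ?? ""): boolean {
  return pathEnv.split(delimiter).some((dir) => {
    if (dir === "") return false;
    try {
      accessSync(join(dir, bin), constants.X_OK);
      return true;
    } catch {
      return false;
    }
  });
}

type HookRunner = "prek" | "validate-structure";

// Mirrors the documented fallback: prefer prek, else the structure validation script.
function selectHookRunner(hasBinary: (bin: string) => boolean = isOnPath): HookRunner {
  return hasBinary("prek") ? "prek" : "validate-structure";
}
```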

Install options for `prek`:

```bash
npm add -D @j178/prek
# or
npm install -g @j178/prek
```

## Startup interview flow (project-local)

- [`.pi/extensions/startup-intake-router.ts`](.pi/extensions/startup-intake-router.ts) watches the first non-command message in a session.
- It converts your startup intent into a prefilled command:
  - `/init-project-docs "<your intent>"`
- [`.pi/prompts/init-project-docs.md`](.pi/prompts/init-project-docs.md) then drives the `interview` tool using [docs/org/project-docs-intake.questions.json](docs/org/project-docs-intake.questions.json).
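
The routing rule can be sketched as a pure function. This is a hypothetical model, not the code in `.pi/extensions/startup-intake-router.ts`: the caller is assumed to track whether the session has already been routed, and the whitespace collapsing and quote escaping are illustrative choices for keeping the intent a single quoted argument.

```typescript
// Hypothetical routing rule; the real extension may implement it differently.
// Returns the prefilled command for the first non-command message, or undefined to leave input alone.
function routeStartupIntent(message: string, alreadyRouted: boolean): string | undefined {
  const intent = message.trim().replace(/\s+/g, " "); // keep the command on one line
  if (alreadyRouted || intent === "" || intent.startsWith("/")) return undefined;
  // Escape backslashes first, then double quotes, so the intent stays one quoted argument.
  const quoted = intent.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `/init-project-docs "${quoted}"`;
}
```

For example, `routeStartupIntent("  Build a CLI  ", false)` returns `/init-project-docs "Build a CLI"`, while a message starting with `/` is passed through untouched.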

Utility commands:

- `/startup-intake-router-status`
- `/startup-intake-router-reset`

## Live sync helper

Use [scripts/sync-to-live.sh](scripts/sync-to-live.sh) to copy the package extension to
`~/.pi/agent/extensions/`.

Optional flags:

- `--with-prompts`
- `--with-policy`
- `--all` (prompts + policy)

After sync, run `/reload` in pi.
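
The flag semantics above (the extension is always synced, `--all` implies both extras) can be modeled as a tiny planning function. This is a hypothetical TypeScript sketch of the shell script's behavior; `SyncPlan` and `planSync` are illustrative names, and rejecting unknown flags is an assumption.

```typescript
// Hypothetical model of what sync-to-live.sh copies for a given flag set.
interface SyncPlan {
  extension: true; // the extension itself is always synced
  prompts: boolean;
  policy: boolean;
}

function planSync(flags: string[]): SyncPlan {
  const plan: SyncPlan = { extension: true, prompts: false, policy: false };
  for (const flag of flags) {
    switch (flag) {
      case "--with-prompts":
        plan.prompts = true;
        break;
      case "--with-policy":
        plan.policy = true;
        break;
      case "--all": // prompts + policy
        plan.prompts = true;
        plan.policy = true;
        break;
      default:
        throw new Error(`Unknown flag: ${flag}`);
    }
  }
  return plan;
}
```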

## Docs map

- [Organization operating model](docs/org/operating_model.md)
- [Project foundation model](docs/project/foundation.md)
- [Project vision](docs/project/vision.md)
- [Project incentives](docs/project/incentives.md)
- [Project resources](docs/project/resources.md)
- [Project skills](docs/project/skills.md)
- [Strategic goals](docs/project/strategic_goals.md)
- [Tactical goals](docs/project/tactical_goals.md)
- [Contributor guide](docs/dev/CONTRIBUTING.md)
- [Extension SOP](docs/dev/EXTENSION_SOP.md)
- [Next steps](docs/dev/next_steps.md)
- [Status](docs/dev/status.md)

package/SECURITY.md
ADDED
@@ -0,0 +1,34 @@
---
summary: "Security reporting process and release hardening baseline."
read_when:
  - "Reporting a vulnerability."
  - "Reviewing release and workflow security controls."
system4d:
  container: "Security policy for maintainers and contributors."
  compass: "Private reporting, least privilege, auditable releases."
  engine: "Report privately -> triage -> patch -> verify -> disclose."
  fog: "Dependency and ecosystem risk shifts over time."
---

# Security Policy

## Supported versions

Security fixes target the latest release and the `main` branch.

## Reporting a vulnerability

Use **private reporting**.

1. Preferred: GitHub Security tab -> **Report a vulnerability**.
2. If private reporting is unavailable, open a minimal issue titled
   `Security contact request` without exploit details and request a private channel.
3. Include impact, affected versions, and reproduction steps.
4. Avoid public disclosure until maintainers confirm a fix/release plan.

## Release and supply-chain baseline

- The release flow uses release-please PRs before tags/releases.
- The publish flow uses npm Trusted Publishing (OIDC) and `npm publish --provenance`.
- Workflow permissions default to read-only and are elevated per job only.
- Third-party actions must be referenced explicitly; actions on high-risk paths should be SHA-pinned.

package/SUPPORT.md
ADDED
@@ -0,0 +1,37 @@
---
summary: "How users and contributors request help or report problems."
read_when:
  - "Needing help, troubleshooting, or reporting non-security issues."
  - "Deciding where to file bug/feature/docs requests."
system4d:
  container: "Support intake and routing guidance."
  compass: "Route requests to the right channel with enough context."
  engine: "Self-check -> search -> file focused issue -> iterate."
  fog: "Reproduction context is often incomplete in first reports."
---

# Support

## Before opening an issue

1. Read [README.md](README.md) and relevant docs in `docs/`.
2. Search existing issues/PRs for duplicates.
3. Re-test on the latest release.

## Open the right issue type

Use the GitHub issue forms for:

- Bug reports
- Feature requests
- Documentation improvements

## Security reports

For vulnerabilities, follow [SECURITY.md](SECURITY.md) and use private reporting.
Do **not** post exploit details in public issues.

## Maintainer response expectations

This project may be maintained part-time. Triage and response times can vary,
but actionable reports with reproduction details are prioritized.

package/docs/dev/CONTRIBUTING.md
ADDED
@@ -0,0 +1,37 @@
---
summary: "Contribution workflow for this extension repository."
read_when:
  - "Before opening PRs or submitting local changes."
system4d:
  container: "Contributor process and quality gates."
  compass: "Small, validated, documented changes."
  engine: "Branch -> implement -> check -> document -> review."
  fog: "Process details may adjust with team scale."
---

# Contributing

## Workflow

1. Create a focused branch.
2. Run `npm run docs:list` and read matched docs before cross-cutting changes.
3. Implement one scoped change.
4. Run `npm run check`.
5. Update docs/changelog where relevant.
6. Open a PR with a concise rationale and validation output.

## Standards

- Keep diffs small and reviewable.
- Preserve markdown frontmatter in generated docs.
- Prefer explicit scripts over manual one-off commands.

## Copier policy

- Keep `.copier-answers.yml` in version control.
- Do not edit `.copier-answers.yml` manually.
- Run update/recopy from a clean destination repo (commit or stash pending changes first).
- Use `copier update --trust` when `.copier-answers.yml` includes `_commit` and update is supported.
- In non-interactive shells/CI, append `--defaults` to update/recopy.
- Use `copier recopy --trust` when update is unavailable (for example, with a local non-VCS source) or cannot reconcile cleanly.
- After recopy, re-apply local deltas intentionally and run `npm run check`.

package/docs/dev/EXTENSION_SOP.md
ADDED
@@ -0,0 +1,43 @@
---
summary: "Lifecycle SOP for extension delivery and maintenance."
read_when:
  - "Planning, implementing, verifying, releasing, or maintaining extension work."
system4d:
  container: "End-to-end extension operating procedure."
  compass: "Consistent quality from idea to maintenance."
  engine: "plan -> implement -> verify -> release -> maintain."
  fog: "Unknowns resolved through incremental validation loops."
---

# Extension SOP

## 1) Plan

- Define scope and acceptance criteria.
- Run `npm run docs:list` and read docs matching your task domain.
- Capture work in `docs/dev/plans/`.
- Confirm risks and dependencies.

## 2) Implement

- Build in small commits.
- Keep command/tool behavior explicit.
- Update docs as behavior changes.

## 3) Verify

- Run `npm run check`.
- Execute relevant extension tests.
- Validate prompt templates if changed.

## 4) Release

- Update `CHANGELOG.md`.
- Tag/version according to team policy.
- Sync extension to live pi when needed.

## 5) Maintain

- Monitor regressions and user feedback.
- Re-run validation after dependency/script changes.
- Keep `docs/dev/status.md` and `docs/dev/next_steps.md` current.

package/docs/dev/next_steps.md
ADDED
@@ -0,0 +1,17 @@
---
summary: "Prioritized next actions for active development."
read_when:
  - "Starting a coding session or grooming tasks."
system4d:
  container: "Execution queue for maintainers."
  compass: "Maintain momentum with clear, ordered tasks."
  engine: "Do highest-leverage item first, then validate."
  fog: "Task order may change after discovery."
---

# Next steps

1. Complete the npm publish to npmjs (log in, set the registry, publish, verify the install).
2. Add at least one automated test for argument parsing and expectation scoring.
3. Add a small, repeatable report-share helper (JSON -> static HTML export command/script).
4. Evaluate an optional LLM-judge scoring mode (`expectJudgePrompt`) after parser/scoring tests exist.

package/docs/dev/plans/001-initial-plan.md
ADDED
@@ -0,0 +1,24 @@
---
summary: "Initial implementation plan for first extension iteration."
read_when:
  - "Executing the first feature slice from scaffold state."
system4d:
  container: "Plan artifact for incremental delivery."
  compass: "Move from scaffold to validated feature quickly."
  engine: "Plan -> implement -> verify -> document."
  fog: "Scope creep risk if tasks are not constrained."
---

# Plan 001: first feature slice

## Objective

Ship one useful command behavior end-to-end.

## Steps

1. Define expected command input/output.
2. Implement logic in `extensions/`.
3. Add tests in `tests/`.
4. Run `npm run check`.
5. Update `docs/dev/status.md` and `CHANGELOG.md`.

package/docs/dev/status.md
ADDED
@@ -0,0 +1,21 @@
---
summary: "Current project status snapshot."
read_when:
  - "Checking project health or preparing handoff updates."
system4d:
  container: "State report for current branch/project."
  compass: "Keep stakeholders aligned on progress and blockers."
  engine: "Update status after meaningful implementation slices."
  fog: "Status can go stale quickly without disciplined updates."
---

# Status

- Scaffold: complete
- Extension behavior: `/evalset` MVP implemented (`init`, `run`, `compare`)
- Example datasets: smoke + `fixed-task-set-v2.json` + `fixed-task-set-v3.json`
- Invocation docs: clarified non-interactive usage via `pi -p` / `pi -e ... -p`
- GitHub publish: `tryingET/pi-evalset-lab` created with release `v0.1.0`
- npm publish: pending (`npm` auth + registry set to `https://registry.npmjs.org/`)
- Validation hooks: installed
- Tests: pending

package/docs/org/operating_model.md
ADDED
@@ -0,0 +1,39 @@
---
summary: "Compact organization operating model and terminology."
read_when:
  - "Aligning organization-level purpose, mission, and strategy."
system4d:
  container: "Organization-level concepts shared across projects."
  compass: "Keep strategy and culture aligned with organization purpose."
  engine: "Purpose -> mission -> vision -> strategic objectives."
  fog: "Terminology drift can create cross-project confusion."
---

# Organization operating model

## Organization purpose
Enable teams to build and operate AI-assisted software workflows that are reliable, auditable, and easy to improve.

## Organization mission (current)
Provide practical tooling, standards, and documentation that make safe AI engineering the default path.

## Organization vision
A trusted ecosystem where AI coding workflows are reproducible, understandable, and continuously improving.

## Organization strategic objectives
1. Standardize evaluation workflows across active projects.
2. Improve run traceability and reproducibility metadata.
3. Keep onboarding friction low through concise, current documentation.
4. Maintain secure-by-default release and dependency hygiene.
5. Reduce ambiguity between interactive and non-interactive execution modes.

## Core values
- Clarity first
- Evidence over assumption
- Small, verifiable increments
- Safety for high-impact changes
- Respectful collaboration

## Boundary
Organization purpose is cross-project.
Project-specific purpose is defined in [Project foundation model](../project/foundation.md).