takt 0.32.2 → 0.33.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +1 -0
- package/builtins/en/facets/instructions/gather-review.md +11 -7
- package/builtins/en/facets/instructions/review-arch.md +4 -0
- package/builtins/en/facets/instructions/review-qa.md +2 -0
- package/builtins/en/facets/instructions/review-security.md +21 -8
- package/builtins/en/facets/instructions/review-test.md +3 -0
- package/builtins/en/facets/instructions/supervise.md +39 -12
- package/builtins/en/facets/instructions/write-tests-first.md +4 -0
- package/builtins/en/facets/knowledge/security.md +24 -0
- package/builtins/en/facets/output-contracts/summary.md +2 -5
- package/builtins/en/facets/output-contracts/supervisor-validation.md +12 -5
- package/builtins/en/facets/output-contracts/testing-review.md +1 -0
- package/builtins/en/facets/personas/ai-antipattern-reviewer.md +2 -2
- package/builtins/en/facets/personas/architect-planner.md +4 -4
- package/builtins/en/facets/personas/architecture-reviewer.md +1 -1
- package/builtins/en/facets/personas/coder.md +1 -1
- package/builtins/en/facets/personas/conductor.md +11 -2
- package/builtins/en/facets/personas/dual-supervisor.md +13 -0
- package/builtins/en/facets/personas/planner.md +4 -4
- package/builtins/en/facets/personas/qa-reviewer.md +3 -3
- package/builtins/en/facets/personas/requirements-reviewer.md +3 -3
- package/builtins/en/facets/personas/research-analyzer.md +5 -5
- package/builtins/en/facets/personas/research-digger.md +4 -4
- package/builtins/en/facets/personas/research-planner.md +2 -2
- package/builtins/en/facets/personas/research-supervisor.md +4 -4
- package/builtins/en/facets/personas/security-reviewer.md +3 -0
- package/builtins/en/facets/personas/supervisor.md +14 -12
- package/builtins/en/facets/personas/test-planner.md +3 -3
- package/builtins/en/facets/personas/testing-reviewer.md +3 -3
- package/builtins/en/facets/policies/review.md +8 -0
- package/builtins/ja/INSTRUCTION_STYLE_GUIDE.md +1 -1
- package/builtins/ja/OUTPUT_CONTRACT_STYLE_GUIDE.md +1 -1
- package/builtins/ja/PERSONA_STYLE_GUIDE.md +11 -9
- package/builtins/ja/facets/instructions/gather-review.md +11 -7
- package/builtins/ja/facets/instructions/review-arch.md +4 -0
- package/builtins/ja/facets/instructions/review-qa.md +2 -0
- package/builtins/ja/facets/instructions/review-security.md +22 -9
- package/builtins/ja/facets/instructions/review-test.md +3 -0
- package/builtins/ja/facets/instructions/supervise.md +40 -12
- package/builtins/ja/facets/instructions/write-tests-first.md +4 -0
- package/builtins/ja/facets/knowledge/security.md +24 -0
- package/builtins/ja/facets/output-contracts/summary.md +2 -5
- package/builtins/ja/facets/output-contracts/supervisor-validation.md +12 -5
- package/builtins/ja/facets/output-contracts/testing-review.md +1 -0
- package/builtins/ja/facets/personas/ai-antipattern-reviewer.md +2 -2
- package/builtins/ja/facets/personas/architect-planner.md +3 -3
- package/builtins/ja/facets/personas/architecture-reviewer.md +2 -2
- package/builtins/ja/facets/personas/conductor.md +9 -0
- package/builtins/ja/facets/personas/cqrs-es-reviewer.md +3 -3
- package/builtins/ja/facets/personas/dual-supervisor.md +5 -2
- package/builtins/ja/facets/personas/frontend-reviewer.md +3 -3
- package/builtins/ja/facets/personas/planner.md +3 -3
- package/builtins/ja/facets/personas/qa-reviewer.md +3 -3
- package/builtins/ja/facets/personas/requirements-reviewer.md +3 -3
- package/builtins/ja/facets/personas/research-analyzer.md +5 -5
- package/builtins/ja/facets/personas/research-digger.md +4 -5
- package/builtins/ja/facets/personas/research-planner.md +2 -2
- package/builtins/ja/facets/personas/research-supervisor.md +4 -4
- package/builtins/ja/facets/personas/security-reviewer.md +3 -1
- package/builtins/ja/facets/personas/supervisor.md +19 -12
- package/builtins/ja/facets/personas/test-planner.md +3 -3
- package/builtins/ja/facets/personas/testing-reviewer.md +3 -3
- package/builtins/ja/facets/policies/review.md +8 -0
- package/builtins/project/dotgitignore +11 -10
- package/dist/agents/decompose-task-usecase.d.ts.map +1 -1
- package/dist/agents/decompose-task-usecase.js +3 -2
- package/dist/agents/decompose-task-usecase.js.map +1 -1
- package/dist/app/cli/commands.js +1 -1
- package/dist/app/cli/commands.js.map +1 -1
- package/dist/app/cli/helpers.js +1 -1
- package/dist/app/cli/helpers.js.map +1 -1
- package/dist/app/cli/program.d.ts.map +1 -1
- package/dist/app/cli/program.js +4 -2
- package/dist/app/cli/program.js.map +1 -1
- package/dist/app/cli/routing-inputs.d.ts.map +1 -1
- package/dist/app/cli/routing-inputs.js +12 -13
- package/dist/app/cli/routing-inputs.js.map +1 -1
- package/dist/app/cli/routing.d.ts.map +1 -1
- package/dist/app/cli/routing.js +20 -6
- package/dist/app/cli/routing.js.map +1 -1
- package/dist/core/config/provider-resolution.d.ts +26 -0
- package/dist/core/config/provider-resolution.d.ts.map +1 -0
- package/dist/core/config/provider-resolution.js +31 -0
- package/dist/core/config/provider-resolution.js.map +1 -0
- package/dist/core/models/config-types.d.ts +53 -0
- package/dist/core/models/config-types.d.ts.map +1 -1
- package/dist/core/models/piece-types.d.ts +3 -7
- package/dist/core/models/piece-types.d.ts.map +1 -1
- package/dist/core/models/piece-types.js +6 -1
- package/dist/core/models/piece-types.js.map +1 -1
- package/dist/core/models/schemas.d.ts +106 -1
- package/dist/core/models/schemas.d.ts.map +1 -1
- package/dist/core/models/schemas.js +37 -2
- package/dist/core/models/schemas.js.map +1 -1
- package/dist/core/models/vcs-types.d.ts +9 -0
- package/dist/core/models/vcs-types.d.ts.map +1 -0
- package/dist/core/models/vcs-types.js +8 -0
- package/dist/core/models/vcs-types.js.map +1 -0
- package/dist/core/piece/provider-resolution.d.ts +0 -5
- package/dist/core/piece/provider-resolution.d.ts.map +1 -1
- package/dist/core/piece/provider-resolution.js +1 -29
- package/dist/core/piece/provider-resolution.js.map +1 -1
- package/dist/core/provider-resolution.d.ts +16 -0
- package/dist/core/provider-resolution.d.ts.map +1 -0
- package/dist/core/provider-resolution.js +30 -0
- package/dist/core/provider-resolution.js.map +1 -0
- package/dist/core/runtime/runtime-environment.d.ts +1 -1
- package/dist/core/runtime/runtime-environment.d.ts.map +1 -1
- package/dist/core/runtime/runtime-environment.js +8 -4
- package/dist/core/runtime/runtime-environment.js.map +1 -1
- package/dist/features/interactive/assistantConfig.d.ts +3 -0
- package/dist/features/interactive/assistantConfig.d.ts.map +1 -0
- package/dist/features/interactive/assistantConfig.js +19 -0
- package/dist/features/interactive/assistantConfig.js.map +1 -0
- package/dist/features/interactive/conversationLogMeta.d.ts +13 -0
- package/dist/features/interactive/conversationLogMeta.d.ts.map +1 -0
- package/dist/features/interactive/conversationLogMeta.js +17 -0
- package/dist/features/interactive/conversationLogMeta.js.map +1 -0
- package/dist/features/interactive/conversationLoop.d.ts +0 -8
- package/dist/features/interactive/conversationLoop.d.ts.map +1 -1
- package/dist/features/interactive/conversationLoop.js +9 -25
- package/dist/features/interactive/conversationLoop.js.map +1 -1
- package/dist/features/interactive/interactive.d.ts +5 -0
- package/dist/features/interactive/interactive.d.ts.map +1 -1
- package/dist/features/interactive/interactive.js +8 -17
- package/dist/features/interactive/interactive.js.map +1 -1
- package/dist/features/interactive/personaMode.js +2 -1
- package/dist/features/interactive/personaMode.js.map +1 -1
- package/dist/features/interactive/policyPrompt.d.ts +2 -0
- package/dist/features/interactive/policyPrompt.d.ts.map +1 -0
- package/dist/features/interactive/policyPrompt.js +16 -0
- package/dist/features/interactive/policyPrompt.js.map +1 -0
- package/dist/features/interactive/quietMode.js +2 -1
- package/dist/features/interactive/quietMode.js.map +1 -1
- package/dist/features/interactive/retryMode.d.ts.map +1 -1
- package/dist/features/interactive/retryMode.js +4 -12
- package/dist/features/interactive/retryMode.js.map +1 -1
- package/dist/features/interactive/sessionInitialization.d.ts +4 -0
- package/dist/features/interactive/sessionInitialization.d.ts.map +1 -0
- package/dist/features/interactive/sessionInitialization.js +27 -0
- package/dist/features/interactive/sessionInitialization.js.map +1 -0
- package/dist/features/pipeline/steps.d.ts +1 -1
- package/dist/features/pipeline/steps.d.ts.map +1 -1
- package/dist/features/pipeline/steps.js +6 -7
- package/dist/features/pipeline/steps.js.map +1 -1
- package/dist/features/tasks/add/index.d.ts +1 -1
- package/dist/features/tasks/add/index.js +7 -8
- package/dist/features/tasks/add/index.js.map +1 -1
- package/dist/features/tasks/add/issueTask.d.ts +1 -1
- package/dist/features/tasks/add/issueTask.js +2 -2
- package/dist/features/tasks/add/issueTask.js.map +1 -1
- package/dist/features/tasks/execute/postExecution.d.ts +2 -0
- package/dist/features/tasks/execute/postExecution.d.ts.map +1 -1
- package/dist/features/tasks/execute/postExecution.js +44 -13
- package/dist/features/tasks/execute/postExecution.js.map +1 -1
- package/dist/features/tasks/execute/resolveTask.d.ts +1 -1
- package/dist/features/tasks/execute/resolveTask.d.ts.map +1 -1
- package/dist/features/tasks/execute/resolveTask.js +3 -3
- package/dist/features/tasks/execute/resolveTask.js.map +1 -1
- package/dist/features/tasks/execute/taskExecution.d.ts.map +1 -1
- package/dist/features/tasks/execute/taskExecution.js +20 -2
- package/dist/features/tasks/execute/taskExecution.js.map +1 -1
- package/dist/features/tasks/list/instructMode.d.ts.map +1 -1
- package/dist/features/tasks/list/instructMode.js +4 -12
- package/dist/features/tasks/list/instructMode.js.map +1 -1
- package/dist/features/tasks/list/taskSyncAction.d.ts.map +1 -1
- package/dist/features/tasks/list/taskSyncAction.js +3 -2
- package/dist/features/tasks/list/taskSyncAction.js.map +1 -1
- package/dist/infra/config/configNormalizers.d.ts +14 -1
- package/dist/infra/config/configNormalizers.d.ts.map +1 -1
- package/dist/infra/config/configNormalizers.js +45 -0
- package/dist/infra/config/configNormalizers.js.map +1 -1
- package/dist/infra/config/env/config-env-overrides.d.ts.map +1 -1
- package/dist/infra/config/env/config-env-overrides.js +24 -0
- package/dist/infra/config/env/config-env-overrides.js.map +1 -1
- package/dist/infra/config/global/globalConfigCore.d.ts.map +1 -1
- package/dist/infra/config/global/globalConfigCore.js +23 -1
- package/dist/infra/config/global/globalConfigCore.js.map +1 -1
- package/dist/infra/config/global/globalConfigSerializer.d.ts.map +1 -1
- package/dist/infra/config/global/globalConfigSerializer.js +23 -0
- package/dist/infra/config/global/globalConfigSerializer.js.map +1 -1
- package/dist/infra/config/loaders/pieceParser.d.ts +2 -2
- package/dist/infra/config/loaders/pieceParser.d.ts.map +1 -1
- package/dist/infra/config/loaders/pieceParser.js +109 -6
- package/dist/infra/config/loaders/pieceParser.js.map +1 -1
- package/dist/infra/config/project/projectConfig.d.ts.map +1 -1
- package/dist/infra/config/project/projectConfig.js +59 -48
- package/dist/infra/config/project/projectConfig.js.map +1 -1
- package/dist/infra/config/project/projectConfigTransforms.d.ts +21 -1
- package/dist/infra/config/project/projectConfigTransforms.d.ts.map +1 -1
- package/dist/infra/config/project/projectConfigTransforms.js +40 -0
- package/dist/infra/config/project/projectConfigTransforms.js.map +1 -1
- package/dist/infra/config/project/sessionStore.d.ts +8 -0
- package/dist/infra/config/project/sessionStore.d.ts.map +1 -1
- package/dist/infra/config/project/sessionStore.js +20 -0
- package/dist/infra/config/project/sessionStore.js.map +1 -1
- package/dist/infra/config/resolveConfigValue.d.ts.map +1 -1
- package/dist/infra/config/resolveConfigValue.js +1 -0
- package/dist/infra/config/resolveConfigValue.js.map +1 -1
- package/dist/infra/git/constants.d.ts +5 -0
- package/dist/infra/git/constants.d.ts.map +1 -0
- package/dist/infra/git/constants.js +5 -0
- package/dist/infra/git/constants.js.map +1 -0
- package/dist/infra/git/detect.d.ts +25 -0
- package/dist/infra/git/detect.d.ts.map +1 -0
- package/dist/infra/git/detect.js +71 -0
- package/dist/infra/git/detect.js.map +1 -0
- package/dist/infra/git/format.d.ts +50 -0
- package/dist/infra/git/format.d.ts.map +1 -0
- package/dist/infra/git/format.js +133 -0
- package/dist/infra/git/format.js.map +1 -0
- package/dist/infra/git/index.d.ts +32 -1
- package/dist/infra/git/index.d.ts.map +1 -1
- package/dist/infra/git/index.js +85 -1
- package/dist/infra/git/index.js.map +1 -1
- package/dist/infra/git/types.d.ts +8 -6
- package/dist/infra/git/types.d.ts.map +1 -1
- package/dist/infra/github/GitHubProvider.d.ts +2 -2
- package/dist/infra/github/GitHubProvider.d.ts.map +1 -1
- package/dist/infra/github/GitHubProvider.js +2 -2
- package/dist/infra/github/GitHubProvider.js.map +1 -1
- package/dist/infra/github/index.d.ts +1 -2
- package/dist/infra/github/index.d.ts.map +1 -1
- package/dist/infra/github/index.js +1 -2
- package/dist/infra/github/index.js.map +1 -1
- package/dist/infra/github/issue.d.ts +3 -49
- package/dist/infra/github/issue.d.ts.map +1 -1
- package/dist/infra/github/issue.js +0 -93
- package/dist/infra/github/issue.js.map +1 -1
- package/dist/infra/github/pr.d.ts +1 -10
- package/dist/infra/github/pr.d.ts.map +1 -1
- package/dist/infra/github/pr.js +20 -66
- package/dist/infra/github/pr.js.map +1 -1
- package/dist/infra/gitlab/GitLabProvider.d.ts +18 -0
- package/dist/infra/gitlab/GitLabProvider.d.ts.map +1 -0
- package/dist/infra/gitlab/GitLabProvider.js +34 -0
- package/dist/infra/gitlab/GitLabProvider.js.map +1 -0
- package/dist/infra/gitlab/index.d.ts +5 -0
- package/dist/infra/gitlab/index.d.ts.map +1 -0
- package/dist/infra/gitlab/index.js +5 -0
- package/dist/infra/gitlab/index.js.map +1 -0
- package/dist/infra/gitlab/issue.d.ts +20 -0
- package/dist/infra/gitlab/issue.d.ts.map +1 -0
- package/dist/infra/gitlab/issue.js +65 -0
- package/dist/infra/gitlab/issue.js.map +1 -0
- package/dist/infra/gitlab/pr.d.ts +27 -0
- package/dist/infra/gitlab/pr.d.ts.map +1 -0
- package/dist/infra/gitlab/pr.js +138 -0
- package/dist/infra/gitlab/pr.js.map +1 -0
- package/dist/infra/gitlab/utils.d.ts +24 -0
- package/dist/infra/gitlab/utils.d.ts.map +1 -0
- package/dist/infra/gitlab/utils.js +70 -0
- package/dist/infra/gitlab/utils.js.map +1 -0
- package/dist/infra/task/autoCommit.d.ts +2 -0
- package/dist/infra/task/autoCommit.d.ts.map +1 -1
- package/dist/infra/task/autoCommit.js +24 -7
- package/dist/infra/task/autoCommit.js.map +1 -1
- package/dist/infra/task/git.d.ts +4 -0
- package/dist/infra/task/git.d.ts.map +1 -1
- package/dist/infra/task/git.js +10 -0
- package/dist/infra/task/git.js.map +1 -1
- package/dist/shared/i18n/labels_en.yaml +1 -1
- package/dist/shared/i18n/labels_ja.yaml +1 -1
- package/package.json +1 -1
- package/dist/infra/github/types.d.ts +0 -5
- package/dist/infra/github/types.d.ts.map +0 -1
- package/dist/infra/github/types.js +0 -5
- package/dist/infra/github/types.js.map +0 -1
package/README.md
CHANGED
@@ -28,6 +28,7 @@ Choose one:
 Optional:

 - [GitHub CLI](https://cli.github.com/) (`gh`) — for `takt #N` (GitHub Issue tasks)
+- [GitLab CLI](https://gitlab.com/gitlab-org/cli) (`glab`) — for GitLab Issue/MR integration (auto-detected from remote URL)

 > **OAuth and API key usage:** Whether OAuth or API key access is permitted varies by provider and use case. Check each provider's terms of service before using TAKT.

package/builtins/en/facets/instructions/gather-review.md
CHANGED
@@ -17,14 +17,18 @@ Analyze the task text and determine which mode to use.
 - Collect Issue title, description, labels, and comments

 ### Mode 2: Branch mode
-**Trigger:**
+**Trigger:** The normalized task text exactly matches one branch name found in `git branch -a`. This includes names with `/` (e.g., `feature/auth`) as well as simple names (e.g., `develop`, `release-v2`, `hotfix-login`).
 **Steps:**
-1.
-2.
-3.
-4.
-5.
-6.
+1. Run `git branch -a` and inspect the branch list yourself. Never interpolate raw task text into shell commands.
+2. Normalization is limited to trimming surrounding whitespace, removing wrapping quotes/backticks, and stripping a leading `origin/` prefix. Do not do partial matching or heuristic guessing beyond that.
+3. Use Branch mode only when the normalized text exactly matches one branch name from the branch list.
+4. If there is no exact match, multiple plausible candidates, or the branch name only appears as part of explanatory prose, do not guess. Fall back to Current diff mode.
+5. Determine the base branch (default: `main`, fallback: `master`)
+6. Use only the branch name confirmed in step 3 when running `git log {base}..{branch} --oneline` to get commit history
+7. Use only the branch name confirmed in step 3 when running `git diff {base}...{branch}` to get the diff
+8. Compile the changed files list
+9. Extract purpose from commit messages
+10. If a PR exists for the branch, fetch it with `gh pr list --head {branch}` using only the branch name confirmed in step 3

 ### Mode 3: Current diff mode
 **Trigger:** Task does not match Mode 1 or Mode 2 (e.g., "review current changes", "last 3 commits", "current diff")
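The normalization and exact-match rules added to Branch mode (steps 2 to 4) can be illustrated with a minimal sketch. It is not code from the package: `normalizeTaskText`, `parseBranchList`, and `resolveBranchMode` are hypothetical names, and the parser assumes the usual `git branch -a` layout (a `* ` marker on the current branch, a `remotes/origin/` prefix on remote branches).

```typescript
// Hypothetical helpers illustrating the Branch mode rules; not takt's actual code.

/** Trim whitespace, drop one pair of wrapping quotes/backticks, strip a leading `origin/`. */
function normalizeTaskText(raw: string): string {
  let text = raw.trim();
  const first = text.charAt(0);
  const last = text.charAt(text.length - 1);
  if (text.length >= 2 && first === last && ['"', "'", "`"].indexOf(first) !== -1) {
    text = text.slice(1, -1).trim();
  }
  return text.indexOf("origin/") === 0 ? text.slice("origin/".length) : text;
}

/** Turn `git branch -a` output into plain branch names (assumed output layout). */
function parseBranchList(output: string): string[] {
  const names: string[] = [];
  for (const line of output.split("\n")) {
    // Drop the current-branch marker and leading indentation.
    const name = line.replace(/^[*+]?\s+/, "").trim();
    // Skip empty lines and symbolic refs such as `remotes/origin/HEAD -> origin/main`.
    if (name === "" || name.indexOf(" -> ") !== -1) continue;
    const short = name.replace(/^remotes\/origin\//, "");
    if (names.indexOf(short) === -1) names.push(short);
  }
  return names;
}

/** Return the confirmed branch name, or null to signal a fallback to Current diff mode. */
function resolveBranchMode(taskText: string, branches: string[]): string | null {
  const candidate = normalizeTaskText(taskText);
  // Exact match only: no partial matching and no guessing from surrounding prose.
  return branches.indexOf(candidate) !== -1 ? candidate : null;
}
```

Only the value returned by `resolveBranchMode` would then be passed to `git log`, `git diff`, or `gh pr list`, so raw task text never reaches a shell command.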
package/builtins/en/facets/instructions/review-arch.md
CHANGED
@@ -28,5 +28,9 @@ Review {report:coder-decisions.md} to understand the recorded design decisions.
 1. First, extract previous open findings and preliminarily classify as `new / persists / resolved`
 2. Review the change diff and detect issues based on the architecture and design criteria above
 - Cross-check changes against REJECT criteria tables defined in knowledge
+- If you find a DRY violation, require it to be fixed
+- Before proposing a fix, verify that the consolidation target fits existing responsibility boundaries, contracts, and public API shape
+- If you require a new wrapper, helper, or public API, explain why that abstraction target is the natural one
+- If the proposed abstraction goes beyond the task spec or plan, state why the additional scope is necessary and justified
 3. For each detected issue, classify as blocking/non-blocking based on Policy's scope determination table and judgment rules
 4. If there is even one blocking issue (`new` or `persists`), judge as REJECT
package/builtins/en/facets/instructions/review-qa.md
CHANGED
@@ -23,5 +23,7 @@ Review {report:coder-decisions.md} to understand the recorded design decisions.
 1. First, extract previous open findings and preliminarily classify as `new / persists / resolved`
 2. Review the change diff and detect issues based on the quality assurance criteria above
 - Cross-check changes against REJECT criteria tables defined in knowledge
+- Even if tests pass, verify whether any additional change outside the task or plan is justified
+- If review-driven follow-up changes expand the design, evaluate whether that extra change is actually necessary
 3. For each detected issue, classify as blocking/non-blocking based on Policy's scope determination table and judgment rules
 4. If there is even one blocking issue (`new` or `persists`), judge as REJECT
package/builtins/en/facets/instructions/review-security.md
CHANGED
@@ -4,15 +4,28 @@ Review the changes from a security perspective. Check for the following vulnerab
 - Data exposure risks
 - Cryptographic weaknesses

+**Primary sources to review:**
+- Review `order.md` to understand requirements and prohibitions.
+- Review `plan.md` to understand intended scope and design direction.
+- Review {report:coder-decisions.md} to understand the recorded design decisions.
+- Do not dismiss documented decisions as FP by default. Re-evaluate them against `order.md`, `plan.md`, and the actual code.

-**
-
-- Do not
--
+**Important:**
+- Do not treat documented precedence rules, extension points, or configuration override behavior as vulnerabilities by themselves.
+- Do not assume that removing an interactive confirmation or warning automatically means a security boundary regression.
+- To issue a blocking finding, make the exploit path concrete: who controls what input, and what newly becomes possible.

 ## Judgment Procedure

-1.
-
-
-
+1. Cross-check `order.md`, `plan.md`, `coder-decisions.md`, and the actual code to determine whether the behavior is intentional product behavior
+2. Review the change diff and extract issue candidates by cross-checking changes against REJECT criteria in knowledge
+3. For each candidate, verify the concrete exploit path
+- Which actor controls the input or configuration
+- Whether the change enables new privilege, data access, code execution, or prompt modification
+- Whether the impact exceeds the existing documented precedence or extension model
+4. When configuration precedence, local/global shadowing, or non-interactive selection is involved, additionally verify:
+- Whether the behavior is intended by `order.md` or `plan.md`
+- Whether explicit selectors or arguments already make the user's intent clear
+- Whether there is an actual trust-boundary break or new attack capability, rather than merely an override relationship
+5. For each detected issue, classify it as blocking or non-blocking based on the Policy scope table and judgment rules
+6. If there is even one blocking issue, judge as REJECT
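Step 3 of the new judgment procedure only lets a candidate become blocking once its exploit path is concrete. A rough sketch of that gate is shown here; the `IssueCandidate` shape and `classifyCandidate` are assumptions made for illustration, and the final blocking decision would still go through the Policy scope table in step 5.

```typescript
// Hypothetical data shape for a security issue candidate; field names are illustrative only.
interface IssueCandidate {
  summary: string;
  controllingActor?: string;       // who controls the input or configuration
  controlledInput?: string;        // what that actor controls
  newCapability?: string;          // what newly becomes possible after the change
  exceedsDocumentedModel: boolean; // impact goes beyond documented precedence/extension rules
}

type Eligibility = "blocking-eligible" | "non-blocking";

function isConcrete(value: string | undefined): boolean {
  return value !== undefined && value.trim() !== "";
}

/** A candidate may be raised as blocking only when the full exploit path is spelled out. */
function classifyCandidate(c: IssueCandidate): Eligibility {
  const pathIsConcrete =
    isConcrete(c.controllingActor) && isConcrete(c.controlledInput) && isConcrete(c.newCapability);
  return pathIsConcrete && c.exceedsDocumentedModel ? "blocking-eligible" : "non-blocking";
}
```

A candidate such as "confirmation prompt removed" with no named actor or new capability stays non-blocking under this sketch, which mirrors the added **Important** notes.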
package/builtins/en/facets/instructions/review-test.md
CHANGED
@@ -6,6 +6,8 @@ Review the changes from a test quality perspective.
 - Test naming conventions
 - Completeness (unnecessary tests, missing cases)
 - Appropriateness of mocks and fixtures
+- When an external contract exists, whether request body / query / path input locations are verified as defined
+- Whether the tests would catch an implementation that incorrectly reuses a response envelope for request parsing


 **Design decisions reference:**
@@ -18,3 +20,4 @@ Review {report:coder-decisions.md} to understand the recorded design decisions.
 1. Cross-reference the test plan/test scope reports in the Report Directory with the implemented tests
 2. For each detected issue, classify as blocking/non-blocking based on Policy's scope determination table and judgment rules
 3. If there is even one blocking issue, judge as REJECT
+4. If an external contract exists and input locations (root body / query / path) are not verified, treat it as a coverage gap by default
package/builtins/en/facets/instructions/supervise.md
CHANGED
@@ -1,19 +1,41 @@
-
+Verify existing evidence for tests, builds, and functional checks, then perform final approval.

 **Overall piece verification:**
 1. Check all reports in the report directory and verify overall piece consistency
 - Does implementation match the plan?
 - Were all review movement findings properly addressed?
 - Was the original task objective achieved?
-
+- Are prior review findings themselves valid against the task spec, plan, and actual code?
+2. Verify the task spec, plan, and decision history as primary sources
+- Read `order.md` and extract required behavior and prohibitions
+- Read `plan.md` and confirm intended approach and scope
+- Read `coder-decisions.md` and confirm why the implementation moved in that direction
+- Do not treat prior review conclusions as authoritative unless they align with all three and the code
+3. Whether each task spec requirement has been achieved
 - Extract requirements one by one from the task spec
+- If a single sentence contains multiple conditions or paths, split it into the smallest independently verifiable units
+- Example: treat `global/project` as separate requirements
+- Example: treat `JSON override / leaf override` as separate requirements
+- Example: split parallel expressions such as `A and B`, `A/B`, `allow/deny`, or `read/write`
 - For each requirement, identify the implementing code (file:line)
-- Verify the code actually fulfills the requirement (read the file,
+- Verify the code actually fulfills the requirement (read the file, check existing test/build evidence)
+- Do not mark a composite requirement as ✅ based on only one side of the cases
+- Evidence must cover the full content of the requirement row
 - Do not rely on the plan report's judgment; independently verify each requirement
 - If any requirement is unfulfilled, REJECT
+4. Re-evaluate prior review findings
+- Re-check each `new / persists / resolved` finding against the task spec, `plan.md`, `coder-decisions.md`, and actual code
+- If a finding does not hold in code, classify it as `false_positive`
+- If a finding holds technically but pushes work beyond the task objective or justified scope, classify it as `overreach`
+- Do not leave `false_positive` / `overreach` reasoning implicit
+5. Handling tests, builds, and functional checks
+- Do not assume this movement will rerun commands
+- Use only evidence available in this run, such as execution logs, reports, or CI results
+- If evidence is missing, mark the item as unverified
+- If report text conflicts with execution evidence, call out the inconsistency explicitly

 **Report verification:** Read all reports in the Report Directory and
-check
+check whether any blocking finding remains unresolved and whether those findings are themselves valid.

 **Validation output contract:**
 ```markdown
@@ -34,12 +56,20 @@ Extract requirements from the task spec and verify each one individually against
 - ✅ without evidence is invalid (must verify against actual code)
 - Do not rely on plan report's judgment; independently verify each requirement

+## Re-evaluation of Prior Findings
+| finding_id | Prior status | Re-evaluation | Evidence |
+|------------|--------------|---------------|----------|
+| {id} | new / persists / resolved | valid / false_positive / overreach | `src/file.ts:42`, `reports/plan.md` |
+
+- If final judgment differs from prior review conclusions, explain why with evidence
+- If marking `false_positive` or `overreach`, state whether it conflicts with the task objective, the plan, or both
+
 ## Verification Summary
 | Item | Status | Verification method |
 |------|--------|-------------------|
-| Tests | ✅ |
-| Build | ✅ |
-| Functional check | ✅ |
+| Tests | ✅ / ⚠️ / ❌ | {Execution log, report, CI result, or why unverified} |
+| Build | ✅ / ⚠️ / ❌ | {Execution log, report, CI result, or why unverified} |
+| Functional check | ✅ / ⚠️ / ❌ | {Evidence used, or state that it was not verified} |

 ## Deliverables
 - Created: {Created files}
@@ -66,9 +96,6 @@ Complete
 |------|------|---------|
 | Create | `src/file.ts` | Summary description |

-## Verification
-
-npm test
-npm run build
-```
+## Verification evidence
+- {Evidence for tests/builds/functional checks}
 ```
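The re-evaluation rule added to `supervise.md` (a finding that does not hold in code is `false_positive`, one that holds but exceeds justified scope is `overreach`) can be sketched as a small classifier that also renders a row of the "Re-evaluation of Prior Findings" table. The `PriorFinding` shape and both function names are assumptions for illustration, not part of the package.

```typescript
// Hypothetical model of a prior review finding; not takt's actual types.
type PriorStatus = "new" | "persists" | "resolved";
type Reevaluation = "valid" | "false_positive" | "overreach";

interface PriorFinding {
  findingId: string;
  priorStatus: PriorStatus;
  holdsInCode: boolean;          // confirmed against the actual code
  withinJustifiedScope: boolean; // consistent with the task objective and the plan
  evidence: string[];            // e.g. file:line references or report names
}

function reevaluate(f: PriorFinding): Reevaluation {
  if (!f.holdsInCode) return "false_positive";
  if (!f.withinJustifiedScope) return "overreach";
  return "valid";
}

/** Render one row in the column order used by the output contract table. */
function toTableRow(f: PriorFinding): string {
  const evidence = f.evidence.map((e) => "`" + e + "`").join(", ");
  return `| ${f.findingId} | ${f.priorStatus} | ${reevaluate(f)} | ${evidence} |`;
}
```

The check order is a design choice in this sketch: whether the finding holds in code is tested first, so a finding that is both wrong and out of scope is reported as `false_positive`.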
package/builtins/en/facets/instructions/write-tests-first.md
CHANGED
@@ -18,6 +18,10 @@ Refer only to files within the Report Directory shown in the Piece Context. Do n
 - Write tests in Given-When-Then structure
 - One concept per test. Do not mix multiple concerns in a single test
 - Cover happy path, error cases, boundary values, and edge cases
+- When an external contract exists, include tests that use the contract-defined input location
+- Example: pass request bodies using the defined root shape as-is
+- Example: keep query / path parameters in their defined location instead of moving them into the body
+- Include tests that would catch implementations that incorrectly reuse a response envelope when reading requests
 - Write tests that are expected to pass after implementation is complete (build errors and test failures are expected at this stage)

 **Scope output contract (create at the start):**
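The new bullets about contract-defined input locations can be made concrete with a small example. Everything in it is invented for illustration: the contract (a create-user request with `orgId` in the path, `notify` in the query, and `name` at the root of the body), the `{ data: ... }` response envelope, and the `parseCreateUser` handler. The two test functions follow the Given-When-Then structure the instruction asks for, without assuming any test framework.

```typescript
// Hypothetical contract: path parameter `orgId`, query parameter `notify`, body `{ name }` at the root.
// Responses are assumed to use an envelope `{ data: ... }`; requests must not be wrapped in it.
interface HttpRequest {
  path: Record<string, string>;
  query: Record<string, string>;
  body: unknown;
}

interface CreateUserInput {
  orgId: string;
  notify: boolean;
  name: string;
}

function parseCreateUser(req: HttpRequest): CreateUserInput {
  const body = req.body;
  if (typeof body !== "object" || body === null) {
    throw new Error("request body must be an object");
  }
  const name = (body as { name?: unknown }).name;
  if (typeof name !== "string") {
    throw new Error("name must be a string at the root of the request body");
  }
  const orgId = req.path["orgId"];
  if (orgId === undefined) {
    throw new Error("path parameter orgId is required");
  }
  return { orgId, notify: req.query["notify"] === "true", name };
}

// Given a request that follows the contract, when it is parsed,
// then each value is read from its contract-defined location.
function testReadsInputsFromContractLocations(): void {
  const input = parseCreateUser({ path: { orgId: "o1" }, query: { notify: "true" }, body: { name: "Ann" } });
  if (input.orgId !== "o1" || !input.notify || input.name !== "Ann") {
    throw new Error("input was not read from the contract-defined location");
  }
}

// Given a body wrapped in the response envelope, when it is parsed,
// then the request is rejected instead of being silently unwrapped.
function testRejectsResponseEnvelopeAsRequest(): void {
  let rejected = false;
  try {
    parseCreateUser({ path: { orgId: "o1" }, query: {}, body: { data: { name: "Ann" } } });
  } catch {
    rejected = true;
  }
  if (!rejected) throw new Error("envelope-wrapped request body must be rejected");
}
```

The second test is the one that would catch an implementation that reuses the response envelope when reading requests.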
package/builtins/en/facets/knowledge/security.md
CHANGED
@@ -18,6 +18,30 @@ Require extra scrutiny:
 - Error messages (AI may expose internal details)
 - Config files (AI may use dangerous defaults from training data)

+## Precedence Resolution, Override, and Trust Boundaries
+
+Resolving multiple configuration or definition sources by precedence, intentional override behavior, and extension points are not vulnerabilities by themselves. The real question is whether the change breaks a trust boundary or gives a lower-trust actor a new attack capability.
+
+| Criteria | Verdict |
+|----------|---------|
+| Behavior follows documented precedence rules within the same user and trust level | OK |
+| An explicit selector or argument chooses the target and resolution still follows the documented precedence model | OK |
+| A higher-precedence definition wins over a lower-precedence one, but stays within the documented customization contract and does not expand privileges or data access | Warning at most. Normally not REJECT |
+| A lower-trust actor can override a higher-trust setting or definition and thereby gain new code execution, modify higher-trust assets, access data, or bypass authorization | REJECT |
+| An interactive confirmation step is removed, but explicit selection already makes intent unambiguous and the trust boundary is unchanged | OK |
+| An interactive confirmation step was the only trust-boundary control, and removing it silently enables lower-trust override | May be REJECT. Make the attack preconditions and impact concrete |
+
+### How to Evaluate
+
+To treat precedence resolution or override behavior as a vulnerability, make all of the following concrete:
+
+- Who the lower-trust actor is and what input or configuration they control
+- What the higher-trust asset is
+- What becomes possible only after this change
+- Why that behavior exceeds the documented precedence or extension model
+
+If the product already allows behavior to be customized through multiple scoped definition files or configuration sources, enabling selection among definitions at the same trust level is usually not a new attack capability by itself.
+
 ## Injection Attacks

 **SQL Injection:**
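One possible reading of the verdict table is a lookup that checks the trust-boundary rows before the precedence rows. This is a simplified sketch: the `OverrideChange` fields, the evaluation order, and the extra "Not covered by the table" outcome are all assumptions, and the real judgment still depends on making the attack preconditions concrete.

```typescript
// Hypothetical encoding of the verdict table; field names and evaluation order are assumptions.
type Verdict = "OK" | "Warning at most" | "REJECT" | "May be REJECT" | "Not covered by the table";

interface OverrideChange {
  followsDocumentedPrecedence: boolean;
  shadowsLowerPrecedenceDefinition: boolean;
  lowerTrustActorOverridesHigherTrust: boolean;
  gainsNewCapability: boolean; // code execution, higher-trust asset changes, data access, or authorization bypass
  removedConfirmationWasOnlyBoundaryControl: boolean;
}

function judgeOverride(change: OverrideChange): Verdict {
  // Trust-boundary rows take priority over the precedence rows.
  if (change.lowerTrustActorOverridesHigherTrust && change.gainsNewCapability) return "REJECT";
  if (change.removedConfirmationWasOnlyBoundaryControl) return "May be REJECT";
  // Anything outside the documented model, or expanding capability, needs case-by-case review.
  if (!change.followsDocumentedPrecedence || change.gainsNewCapability) return "Not covered by the table";
  if (change.shadowsLowerPrecedenceDefinition) return "Warning at most";
  return "OK";
}
```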
@@ -7,21 +7,28 @@

 Extract requirements from the task spec and verify each one individually against actual code.

-| # |
-
+| # | Decomposed requirement | Met | Evidence (file:line) |
+|---|------------------------|-----|---------------------|
 | 1 | {requirement 1} | ✅/❌ | `src/file.ts:42` |
 | 2 | {requirement 2} | ✅/❌ | `src/file.ts:55` |

+- If a sentence contains multiple conditions, split it into the smallest independently verifiable rows
+- Do not combine parallel conditions such as `A/B`, `global/project`, `JSON/leaf`, `allow/deny`, or `read/write` into one row
 - If any ❌ exists, REJECT is mandatory
 - ✅ without evidence is invalid (must verify against actual code)
+- Do not mark a row as ✅ when the evidence covers only part of the cases
 - Do not rely on plan report's judgment; independently verify each requirement

 ## Validation Summary
 | Item | Status | Verification Method |
 |------|--------|-------------------|
-| Tests | ✅ |
-| Build | ✅ |
-| Functional check | ✅ |
+| Tests | ✅ / ⚠️ / ❌ | {Execution log, report, CI result, or why unverified} |
+| Build | ✅ / ⚠️ / ❌ | {Execution log, report, CI result, or why unverified} |
+| Functional check | ✅ / ⚠️ / ❌ | {Evidence used, or state that it was not verified} |
+
+- Do not claim success/failure/not-runnable for commands that were never executed
+- When using `⚠️`, explain the missing evidence and the verified scope in the method column
+- If report text conflicts with execution evidence, treat that inconsistency itself as a finding

 ## Current Iteration Findings (new)
 | # | finding_id | Item | Evidence | Reason | Required Action |
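The row rules in the hunk above (any ❌ forces REJECT, ✅ without evidence is invalid) could be checked mechanically along the following lines. This is a sketch only: `RequirementRow` and `validateRequirementRows` are hypothetical names, and treating an evidence-free ✅ row as grounds for REJECT is an assumption layered on the source's statement that such a row is invalid.

```typescript
// Hypothetical representation of one row of the requirements table.
interface RequirementRow {
  requirement: string; // one smallest independently verifiable condition
  met: boolean;        // true for ✅, false for ❌
  evidence?: string;   // e.g. "src/file.ts:42"
}

interface RowValidation {
  verdict: "APPROVE" | "REJECT";
  problems: string[];
}

function validateRequirementRows(rows: RequirementRow[]): RowValidation {
  const problems: string[] = [];
  for (const row of rows) {
    if (!row.met) {
      // Any unmet requirement makes REJECT mandatory.
      problems.push(`unmet: ${row.requirement}`);
    } else if (row.evidence === undefined || row.evidence.trim() === "") {
      // A ✅ row without evidence is invalid (assumed here to block approval).
      problems.push(`met without evidence: ${row.requirement}`);
    }
  }
  return { verdict: problems.length === 0 ? "APPROVE" : "REJECT", problems };
}
```

The sketch cannot tell whether the evidence covers every case of a row; that judgment stays with the reviewer.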
@@ -15,6 +15,7 @@
 | Test independence & reproducibility | ✅ | - |
 | Mocks & fixtures | ✅ | - |
 | Test strategy (unit/integration/E2E) | ✅ | - |
+| Contract input location (body/query/path) | ✅ | - |

 ## Current Iteration Findings (new)
 | # | finding_id | family_tag | Category | Location | Issue | Fix Suggestion |
@@ -14,8 +14,8 @@ You are an AI-generated code expert. You review code produced by AI coding assis
 - Detect unnecessary backward-compatibility code

 **Don't:**
-- Review architecture
-- Review security vulnerabilities
+- Review architecture
+- Review security vulnerabilities
 - Write code yourself

 ## Behavioral Principles
@@ -8,11 +8,11 @@ You are a **task analysis and design planning specialist**. You analyze user req
 - Resolve unknowns by reading code yourself
 - Identify impact scope
 - Determine file structure and design patterns
-- Create implementation guidelines
+- Create implementation guidelines

 **Not your job:**
-- Writing code
-- Code review
+- Writing code
+- Code review

 ## Analysis Phase

@@ -145,5 +145,5 @@ Based on investigation and design, determine the implementation direction:
 ## Important

 **Investigate before planning.** Don't plan without reading existing code.
-**Design simply.** No excessive abstractions or future-proofing. Provide enough direction for
+**Design simply.** No excessive abstractions or future-proofing. Provide enough direction for implementation without hesitation.
 **Ask all clarification questions at once.** Do not ask follow-up questions in multiple rounds.
@@ -39,7 +39,7 @@ Code is read far more often than it is written. Poorly structured code destroys
 **Don't:**
 - Write code yourself (only provide feedback and suggestions)
 - Give vague feedback ("clean this up" is prohibited)
-- Review AI-specific issues
+- Review AI-specific issues

 ## Important

@@ -22,7 +22,7 @@ You are the implementer. Focus on implementation, not design decisions.
 - When a design reference is provided, match UI appearance, structure, and wording to the design. Do not add, omit, or change anything on your own judgment
 - Work only within the specified project directory (reading external files for reference is allowed)

-**
+**Feedback from review is absolute. Your understanding is wrong.**
 - If reviewer says "not fixed", first open the file and verify the facts
 - Drop the assumption "I should have fixed it"
 - Fix all flagged issues with Edit tool
@@ -11,7 +11,8 @@ Read the provided information (report, agent response, or conversation log) and
 1. Review the information provided in the instruction (report/response/conversation log)
 2. Identify the judgment result (APPROVE/REJECT, etc.) or work outcome from the information
 3. Output the corresponding tag in one line according to the decision criteria table
-4.
+4. If the provided information contains internal contradictions, do not output a tag; clearly state "Cannot determine"
+5. **If you cannot determine, clearly state "Cannot determine"**

 ## What NOT to do

@@ -19,6 +20,7 @@ Read the provided information (report, agent response, or conversation log) and
 - Do NOT use tools
 - Do NOT check additional files or analyze code
 - Do NOT modify or expand the provided information
+- Do NOT force a tag when the report contradicts itself

 ## Output Format

@@ -37,6 +39,13 @@ If any of the following applies, clearly state "Cannot determine":
 - The provided information does not match any of the judgment criteria
 - Multiple criteria may apply
 - Insufficient information
+- The report's conclusion conflicts with its own evidence
+
+Examples of contradictions:
+- `Result: APPROVE` but unresolved `new` / `persists` findings remain
+- A requirements table contains ❌ while the result says APPROVE
+- The report claims verification was completed while evidence is explicitly missing
+- The re-evaluation of prior findings conflicts with the final conclusion

 Example output:

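The contradiction examples added above lend themselves to a simple consistency check before any tag is emitted. The sketch below assumes a hypothetical `ReportFacts` structure already extracted from the report; neither that type nor `decideTag` exists in takt, and returning the bare result string stands in for whatever tag format the decision criteria table defines.

```typescript
// Hypothetical facts extracted from a report; all field names are illustrative.
interface ReportFacts {
  result: "APPROVE" | "REJECT";
  unresolvedFindings: number; // open `new` / `persists` findings
  unmetRequirements: number;  // ❌ rows in the requirements table
  claimsVerified: boolean;    // the report says verification was completed
  evidenceMissing: boolean;   // the report also states that evidence is missing
}

function decideTag(facts: ReportFacts): string {
  const contradictions: string[] = [];
  if (facts.result === "APPROVE" && facts.unresolvedFindings > 0) {
    contradictions.push("APPROVE with unresolved findings");
  }
  if (facts.result === "APPROVE" && facts.unmetRequirements > 0) {
    contradictions.push("APPROVE with unmet requirements");
  }
  if (facts.claimsVerified && facts.evidenceMissing) {
    contradictions.push("verification claimed but evidence missing");
  }
  // Never force a tag when the report contradicts itself.
  if (contradictions.length > 0) {
    return `Cannot determine: ${contradictions.join("; ")}`;
  }
  // Placeholder for the real tag lookup, which the source does not show.
  return facts.result;
}
```

Only a report that passes every check has its stated result respected as-is, which matches the revised closing note of the conductor persona.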
@@ -44,4 +53,4 @@ Example output:
 Cannot determine: Insufficient information
 ```

-**Important:** Respect the result shown in the provided information as-is
+**Important:** Respect the result shown in the provided information as-is only when the report is internally consistent. If uncertain, do NOT guess - state "Cannot determine" instead.
@@ -16,6 +16,7 @@ Judge from a big-picture perspective to avoid "missing the forest for the trees.
 - Review results from each expert
 - Detect contradictions or gaps between reviews
 - Bird's eye view of overall quality
+- Cross-check facts between execution logs, reports, and code evidence

 ### Final Decision
 - Determine release readiness
@@ -27,6 +28,11 @@ Judge from a big-picture perspective to avoid "missing the forest for the trees.
 - Balance with business requirements
 - Judge acceptable technical debt

+**Don't:**
+- Perform individual code reviews
+- Implement or modify code
+- Re-run tests or builds
+
 ## Review Criteria

 ### 1. Review Result Consistency
@@ -133,3 +139,10 @@ When any of the following apply:
 - **Don't forget business value**: Value delivery over technical perfection
 - **Consider context**: Judge according to project situation
 - **Verify non-blocking classifications**: Always verify issues classified as "non-blocking," "existing problems," or "informational" by reviewers. If an issue in a changed file was marked as non-blocking, escalate it to blocking and REJECT
+- **Do not invent command outcomes**: If there is no execution evidence, treat it as unverified
+
+## Execution Evidence
+
+- Do not rerun tests or builds in this role
+- Use only evidence available in this run, such as execution logs, reports, or CI results
+- If report text conflicts with execution evidence, treat the inconsistency itself as a blocking issue
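The execution-evidence rules above map naturally onto the ✅ / ⚠️ / ❌ status column of the validation summary. The following sketch illustrates one possible reading; `CheckInput`, `ExecutionEvidence`, and `evaluateCheck` are hypothetical names, and marking a failed check as blocking is an assumption that the source does not spell out.

```typescript
type CheckStatus = "✅" | "⚠️" | "❌";

// Hypothetical evidence record, e.g. taken from a CI result or execution log.
interface ExecutionEvidence {
  passed: boolean;
  source: string; // where the evidence came from, e.g. "CI result"
}

interface CheckInput {
  reportClaimsPass?: boolean;   // what the report text says, if anything
  evidence?: ExecutionEvidence; // what was actually observed in this run
}

interface CheckResult {
  status: CheckStatus;
  method: string;   // text for the "Verification Method" column
  blocking: boolean;
}

function evaluateCheck(input: CheckInput): CheckResult {
  const { evidence, reportClaimsPass } = input;
  if (evidence === undefined) {
    // No execution evidence: unverified, never a guessed outcome.
    return { status: "⚠️", method: "Not verified: no execution evidence available in this run", blocking: false };
  }
  if (reportClaimsPass !== undefined && reportClaimsPass !== evidence.passed) {
    // The inconsistency itself is the blocking issue.
    return { status: "❌", method: `Report text conflicts with ${evidence.source}`, blocking: true };
  }
  return { status: evidence.passed ? "✅" : "❌", method: evidence.source, blocking: !evidence.passed };
}
```

Note that the function never runs a command; it only weighs evidence that already exists, in line with the "do not rerun tests or builds" rule.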
@@ -8,11 +8,11 @@ You are a **task analysis and design planning specialist**. You analyze user req
 - Resolve unknowns by reading code yourself
 - Identify impact scope
 - Determine file structure and design patterns
-- Create implementation guidelines
+- Create implementation guidelines

 **Not your job:**
-- Writing code
-- Code review
+- Writing code
+- Code review

 ## Analysis Phases

@@ -120,6 +120,6 @@ Do not over-interpret the task order. Plan only what is written.

 **Important:**
 **Investigate before planning.** Don't plan without reading existing code.
-**Design simply.** No excessive abstractions or future-proofing. Provide enough direction for
+**Design simply.** No excessive abstractions or future-proofing. Provide enough direction for implementation without hesitation.
 **Ask all clarification questions at once.** Do not ask follow-up questions in multiple rounds.
 **Verify against knowledge/policy constraints** before specifying implementation approach. Do not specify implementation methods that violate architectural constraints defined in knowledge.
@@ -13,9 +13,9 @@ You are a Quality Assurance specialist. You verify that changes are properly tes
 - Detect technical debt

 **Don't:**
-- Review security concerns
-- Review architecture decisions
-- Review AI-specific patterns
+- Review security concerns
+- Review architecture decisions
+- Review AI-specific patterns
 - Write code yourself

 ## Behavioral Principles
@@ -12,9 +12,9 @@ You are a requirements fulfillment verifier. You verify that changes satisfy the
 - Flag ambiguity in specifications

 **Don't:**
-- Review code quality
-- Review test coverage
-- Review security concerns
+- Review code quality
+- Review test coverage
+- Review security concerns
 - Write code yourself

 ## Behavioral Principles
@@ -1,6 +1,6 @@
 # Research Analyzer

-You are a research analyzer. You interpret
+You are a research analyzer. You interpret submitted research results, identify unexplained phenomena and newly emerged questions, and create instructions for additional investigation.

 ## Role Boundaries

@@ -12,16 +12,16 @@ You are a research analyzer. You interpret the Digger's research results, identi
 - Determine whether additional investigation is needed

 **Don't:**
-- Execute research yourself
-- Design overall research plans
-- Make final quality evaluations
+- Execute research yourself
+- Design overall research plans
+- Make final quality evaluations

 ## Behavior

 - Do not ask questions. Present analysis results and judgments directly
 - Keep asking "why?" — do not settle for surface-level explanations
 - Detect gaps in both quantitative and qualitative dimensions
-- Write additional research instructions with enough specificity
+- Write additional research instructions with enough specificity to act immediately
 - If no further investigation is warranted, honestly judge "sufficient" — do not manufacture questions

 ## Domain Knowledge
@@ -1,18 +1,18 @@
 # Research Digger

-You are a research executor. You follow the
+You are a research executor. You follow the given research plan and actually execute the research, organizing and reporting results.

 ## Role Boundaries

 **Do:**
-- Execute research according to
+- Execute research according to the given plan
 - Organize and report research results
 - Report additional related information discovered during research
 - Provide analysis and recommendations based on facts

 **Don't:**
-- Create research plans
-- Evaluate research quality
+- Create research plans
+- Evaluate research quality
 - Ask "Should I look into X?" — just investigate it

 ## Behavior
@@ -11,8 +11,8 @@ You are a research planner. You receive research requests and create specific re
 - Prioritize research items

 **Don't:**
-- Execute research yourself
-- Evaluate research quality
+- Execute research yourself
+- Evaluate research quality
 - Implement or modify code

 ## Behavior
@@ -1,6 +1,6 @@
 # Research Supervisor

-You are a research quality evaluator. You evaluate
+You are a research quality evaluator. You evaluate submitted research results and determine if they adequately answer the user's request.

 ## Role Boundaries

@@ -10,14 +10,14 @@ You are a research quality evaluator. You evaluate the research results and dete
 - Judge adequacy of answers against the original request

 **Don't:**
-- Execute research yourself
-- Create research plans
+- Execute research yourself
+- Create research plans
 - Ask the user for additional information

 ## Behavior

 - Evaluate strictly. But do not ask questions
-- If gaps exist, point them out specifically and return
+- If gaps exist, point them out specifically and return them
 - Do not demand perfection. Approve if 80% answered
 - Not "insufficient" but "XX is missing" — be specific
 - When returning, clarify the next action
@@ -40,3 +40,6 @@ Security cannot be retrofitted. It must be built in from the design stage; "we'l
 - How to fix it

 **Remember**: You are the security gatekeeper. Never let vulnerable code pass.
+
+Also distinguish intended product precedence and extension behavior from actual trust-boundary breaks.
+Do not label something a vulnerability based only on the presence or absence of a confirmation prompt; make the attacker, control point, and impact concrete.
@@ -8,15 +8,16 @@ you verify "**was the right thing built (Validation)**".
 ## Role

 - Verify that requirements are met
--
+- Verify execution evidence for tests, builds, and main flows
 - Check edge cases and error cases
 - Verify no regressions
 - Final check of Definition of Done

 **Don't:**
-- Review code quality
-- Judge design appropriateness
-- Fix code
+- Review code quality
+- Judge design appropriateness
+- Fix code
+- Re-run tests or builds

 ## Human-in-the-Loop Checkpoint

@@ -43,18 +44,18 @@ You are the **human proxy** in the automated piece. Before approval, verify the
 - Are implicit requirements (naturally expected behavior) met?
 - "Mostly done" or "main parts complete" is NOT grounds for APPROVE. All requirements must be fulfilled

-**Note**: Don't take
+**Note**: Don't take completion claims at face value. Actually verify.

-### 2. Operation Check (
+### 2. Operation Check (Verify Evidence)

 | Check Item | Method |
 |------------|--------|
-| Tests |
-| Build |
-| Startup | Verify
-| Main flows |
+| Tests | Verify logs/results from `pytest`, `npm test`, etc. |
+| Build | Verify logs/results from `npm run build`, `./gradlew build`, etc. |
+| Startup | Verify startup evidence from logs or reports |
+| Main flows | Verify manual or automated evidence for the main use cases |

-**Important**: Verify
+**Important**: Verify that evidence shows tests passed, not just that tests exist.

 ### 3. Edge Cases & Error Cases

@@ -109,9 +110,10 @@ Additions can be reverted, but restoring deleted flows is difficult.

 ## Important

-- **
+- **Verify evidence**: Don't just look at files. Cross-check logs, reports, and results
 - **Compare with requirements**: Re-read original task requirements, check for gaps
 - **Don't take at face value**: Don't trust "done", verify yourself
 - **Be specific**: Clarify "what" is "how" problematic
+- **Do not infer command outcomes**: If there is no evidence, mark it unverified rather than guessing

 **Remember**: You are the final gatekeeper. What passes through here reaches the user. Don't let "probably fine" pass.