handovergap 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- handovergap-0.1.0/.github/workflows/ci.yml +44 -0
- handovergap-0.1.0/.github/workflows/publish.yml +49 -0
- handovergap-0.1.0/.github/workflows/test-publish.yml +50 -0
- handovergap-0.1.0/.gitignore +9 -0
- handovergap-0.1.0/AGENTS.md +42 -0
- handovergap-0.1.0/CLAUDE.md +80 -0
- handovergap-0.1.0/CODEX.md +39 -0
- handovergap-0.1.0/LICENSE +21 -0
- handovergap-0.1.0/LOOP_ENGINEERING.md +254 -0
- handovergap-0.1.0/PKG-INFO +219 -0
- handovergap-0.1.0/README.ja.md +124 -0
- handovergap-0.1.0/README.md +182 -0
- handovergap-0.1.0/article/key_phrases.md +8 -0
- handovergap-0.1.0/article/openai_slot_filling_results.json +319 -0
- handovergap-0.1.0/article/results.md +59 -0
- handovergap-0.1.0/article/zenn_draft.md +255 -0
- handovergap-0.1.0/article/zenn_draft_skeleton.md +75 -0
- handovergap-0.1.0/article/zenn_outline.md +94 -0
- handovergap-0.1.0/design-qa.md +45 -0
- handovergap-0.1.0/docs/01_product_and_research_brief.md +82 -0
- handovergap-0.1.0/docs/02_method_handovergap_rag.md +77 -0
- handovergap-0.1.0/docs/03_tidb_architecture.md +71 -0
- handovergap-0.1.0/docs/04_tidb_schema.sql +122 -0
- handovergap-0.1.0/docs/05_slot_schema.md +118 -0
- handovergap-0.1.0/docs/06_handover_gap_bench.md +83 -0
- handovergap-0.1.0/docs/07_evaluation_metrics.md +86 -0
- handovergap-0.1.0/docs/08_pypi_package_design.md +105 -0
- handovergap-0.1.0/docs/09_cli_and_api_spec.md +103 -0
- handovergap-0.1.0/docs/10_demo_design.md +78 -0
- handovergap-0.1.0/docs/11_implementation_plan.md +59 -0
- handovergap-0.1.0/docs/12_zenn_article_outline.md +94 -0
- handovergap-0.1.0/docs/13_competitor_analysis.md +75 -0
- handovergap-0.1.0/docs/14_risks_and_notes.md +73 -0
- handovergap-0.1.0/docs/15_loop_engineering.md +100 -0
- handovergap-0.1.0/docs/16_agent_operating_contract.md +98 -0
- handovergap-0.1.0/docs/17_winning_strategy.md +95 -0
- handovergap-0.1.0/docs/18_judging_scorecard.md +38 -0
- handovergap-0.1.0/docs/19_competitor_battlecard.md +80 -0
- handovergap-0.1.0/docs/20_research_positioning.md +74 -0
- handovergap-0.1.0/docs/21_method_deep_dive.md +125 -0
- handovergap-0.1.0/docs/22_article_storyboard.md +55 -0
- handovergap-0.1.0/docs/23_demo_script.md +79 -0
- handovergap-0.1.0/docs/24_pypi_release_playbook.md +56 -0
- handovergap-0.1.0/docs/25_failure_modes.md +95 -0
- handovergap-0.1.0/docs/26_final_submission_checklist.md +82 -0
- handovergap-0.1.0/docs/assets/demo-ja.png +0 -0
- handovergap-0.1.0/docs/assets/design-comparison.png +0 -0
- handovergap-0.1.0/examples/streamlit_app.py +3 -0
- handovergap-0.1.0/harness/agent_prompt_claude.md +20 -0
- handovergap-0.1.0/harness/agent_prompt_codex.md +15 -0
- handovergap-0.1.0/harness/loops/00_winning_gate.md +18 -0
- handovergap-0.1.0/harness/loops/01_package_skeleton_loop.md +36 -0
- handovergap-0.1.0/harness/loops/02_schema_models_loop.md +31 -0
- handovergap-0.1.0/harness/loops/03_sample_dataset_loop.md +31 -0
- handovergap-0.1.0/harness/loops/04_detect_cli_loop.md +33 -0
- handovergap-0.1.0/harness/loops/05_evaluate_loop.md +33 -0
- handovergap-0.1.0/harness/loops/06_baseline_compare_loop.md +33 -0
- handovergap-0.1.0/harness/loops/07_tidb_store_loop.md +33 -0
- handovergap-0.1.0/harness/loops/08_streamlit_demo_loop.md +33 -0
- handovergap-0.1.0/harness/loops/09_article_loop.md +33 -0
- handovergap-0.1.0/harness/loops/10_release_loop.md +35 -0
- handovergap-0.1.0/harness/templates/failure_report.md +31 -0
- handovergap-0.1.0/harness/templates/handoff.md +29 -0
- handovergap-0.1.0/harness/templates/loop_report.md +30 -0
- handovergap-0.1.0/harness/templates/task_contract.md +31 -0
- handovergap-0.1.0/harness/validation/eval_check.sh +5 -0
- handovergap-0.1.0/harness/validation/openai_slot_filling_check.py +214 -0
- handovergap-0.1.0/harness/validation/release_check.sh +7 -0
- handovergap-0.1.0/harness/validation/smoke_test.sh +7 -0
- handovergap-0.1.0/harness/validation/tidb_live_check.py +209 -0
- handovergap-0.1.0/project/acceptance_criteria.md +44 -0
- handovergap-0.1.0/project/handoff.md +52 -0
- handovergap-0.1.0/project/loop_report_06_baseline_compare.md +39 -0
- handovergap-0.1.0/project/loop_report_07_tidb_store.md +45 -0
- handovergap-0.1.0/project/loop_report_08_streamlit_demo.md +46 -0
- handovergap-0.1.0/project/loop_report_09_article.md +41 -0
- handovergap-0.1.0/project/loop_report_10_release.md +45 -0
- handovergap-0.1.0/project/loop_report_p0_mvp.md +45 -0
- handovergap-0.1.0/project/release_checklist.md +43 -0
- handovergap-0.1.0/project/tasks.md +54 -0
- handovergap-0.1.0/prompts/extract_memory.md +26 -0
- handovergap-0.1.0/prompts/fill_slot.md +15 -0
- handovergap-0.1.0/prompts/generate_questions.md +17 -0
- handovergap-0.1.0/pyproject.toml +51 -0
- handovergap-0.1.0/src/handovergap/__init__.py +8 -0
- handovergap-0.1.0/src/handovergap/cli.py +160 -0
- handovergap-0.1.0/src/handovergap/core/__init__.py +1 -0
- handovergap-0.1.0/src/handovergap/core/baselines.py +62 -0
- handovergap-0.1.0/src/handovergap/core/detector.py +79 -0
- handovergap-0.1.0/src/handovergap/core/evaluator.py +104 -0
- handovergap-0.1.0/src/handovergap/data/__init__.py +1 -0
- handovergap-0.1.0/src/handovergap/data/handover_gap_bench.json +434 -0
- handovergap-0.1.0/src/handovergap/data/handover_gap_bench_holdout.json +180 -0
- handovergap-0.1.0/src/handovergap/data/schema.sql +133 -0
- handovergap-0.1.0/src/handovergap/demo_app.py +277 -0
- handovergap-0.1.0/src/handovergap/schemas/__init__.py +21 -0
- handovergap-0.1.0/src/handovergap/schemas/models.py +70 -0
- handovergap-0.1.0/src/handovergap/slot_rules.py +65 -0
- handovergap-0.1.0/src/handovergap/store.py +43 -0
- handovergap-0.1.0/src/handovergap/stores/__init__.py +3 -0
- handovergap-0.1.0/src/handovergap/stores/tidb.py +139 -0
- handovergap-0.1.0/tests/test_baselines.py +32 -0
- handovergap-0.1.0/tests/test_cli_detect.py +15 -0
- handovergap-0.1.0/tests/test_cli_help.py +15 -0
- handovergap-0.1.0/tests/test_dataset.py +27 -0
- handovergap-0.1.0/tests/test_demo_app.py +25 -0
- handovergap-0.1.0/tests/test_evaluate.py +50 -0
- handovergap-0.1.0/tests/test_schema_sql.py +43 -0
- handovergap-0.1.0/tests/test_schemas.py +29 -0
- handovergap-0.1.0/tests/test_serve_command.py +15 -0
- handovergap-0.1.0/tests/test_tidb_persistence.py +80 -0
@@ -0,0 +1,44 @@
+name: CI
+
+on:
+  push:
+  pull_request:
+
+permissions:
+  contents: read
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: pip
+      - run: python -m pip install --upgrade pip
+      - run: python -m pip install -e ".[dev,demo]"
+      - run: ruff check .
+      - run: pytest
+      - run: handovergap demo
+      - run: handovergap evaluate --compare
+
+  package:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
+        with:
+          python-version: "3.12"
+          cache: pip
+      - run: python -m pip install --upgrade pip
+      - run: python -m pip install build twine
+      - run: python -m build
+      - run: twine check dist/*
+      - uses: actions/upload-artifact@v7
+        with:
+          name: distributions
+          path: dist/
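The `test` job in the CI workflow runs its checks in sequence, and a GitHub Actions job stops at the first failing step by default. A minimal local sketch of that behavior follows. The command list is copied from the workflow; the names `CI_COMMANDS`, `_run`, and `first_failure` are hypothetical helpers introduced here for illustration and are not part of the package.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Check commands copied from the `test` job of the CI workflow.
CI_COMMANDS: List[List[str]] = [
    ["ruff", "check", "."],
    ["pytest"],
    ["handovergap", "demo"],
    ["handovergap", "evaluate", "--compare"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code (hypothetical helper)."""
    return subprocess.run(list(cmd)).returncode


def first_failure(
    commands: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], int] = _run,
) -> Optional[List[str]]:
    """Run commands in order; return the first failing one, or None."""
    for cmd in commands:
        if run(cmd) != 0:
            return list(cmd)  # stop here, as a CI job would
    return None
```

Passing a fake `run` callable makes the ordering logic checkable without installing the tools, for example `first_failure(CI_COMMANDS, run=lambda cmd: 0)`.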
@@ -0,0 +1,49 @@
+name: Publish to PyPI
+
+on:
+  workflow_dispatch:
+  release:
+    types: [published]
+
+permissions:
+  contents: read
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
+        with:
+          python-version: "3.12"
+      - run: python -m pip install build twine
+      - run: python -m build
+      - run: twine check dist/*
+      - uses: actions/upload-artifact@v7
+        with:
+          name: distributions
+          path: dist/
+
+  publish:
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/handovergap
+    permissions:
+      id-token: write
+    env:
+      PYPI_API_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
+    steps:
+      - uses: actions/download-artifact@v8
+        with:
+          name: distributions
+          path: dist/
+      - name: Publish to PyPI with API token
+        if: ${{ env.PYPI_API_TOKEN != '' }}
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          password: ${{ env.PYPI_API_TOKEN }}
+      - name: Publish to PyPI with trusted publishing
+        if: ${{ env.PYPI_API_TOKEN == '' }}
+        uses: pypa/gh-action-pypi-publish@release/v1
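The `publish` job chooses between two mutually exclusive steps based on whether `PYPI_API_TOKEN` is empty. The sketch below restates that branch in Python so the rule is easy to read. The function name `choose_publish_method` and its return labels are assumptions made for illustration; the variable names `PYPI_API_TOKEN` and `TEST_PYPI_API_TOKEN` come from the workflows.

```python
from typing import Mapping


def choose_publish_method(
    env: Mapping[str, str],
    token_var: str = "PYPI_API_TOKEN",
) -> str:
    """Mirror the workflow's `if:` conditions (hypothetical helper).

    A non-empty token selects the API-token step; an unset or empty
    token falls through to the trusted-publishing step.
    """
    token = env.get(token_var, "")
    return "api-token" if token != "" else "trusted-publishing"
```

For example, `choose_publish_method({})` selects trusted publishing, which is the path that relies on the job's `id-token: write` permission.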
@@ -0,0 +1,50 @@
+name: Publish to TestPyPI
+
+on:
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
+        with:
+          python-version: "3.12"
+      - run: python -m pip install build twine
+      - run: python -m build
+      - run: twine check dist/*
+      - uses: actions/upload-artifact@v7
+        with:
+          name: test-distributions
+          path: dist/
+
+  publish:
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: testpypi
+      url: https://test.pypi.org/p/handovergap
+    permissions:
+      id-token: write
+    env:
+      TEST_PYPI_API_TOKEN: ${{ secrets.TEST_PYPI_API_TOKEN }}
+    steps:
+      - uses: actions/download-artifact@v8
+        with:
+          name: test-distributions
+          path: dist/
+      - name: Publish to TestPyPI with API token
+        if: ${{ env.TEST_PYPI_API_TOKEN != '' }}
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          repository-url: https://test.pypi.org/legacy/
+          password: ${{ env.TEST_PYPI_API_TOKEN }}
+      - name: Publish to TestPyPI with trusted publishing
+        if: ${{ env.TEST_PYPI_API_TOKEN == '' }}
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          repository-url: https://test.pypi.org/legacy/
@@ -0,0 +1,42 @@
+# AGENTS.md
+
+All agents must optimize for contest-winning evidence, not feature volume.
+
+## Core Thesis
+
+Correct memories are not always transferable.
+
+## Required Work Loop
+
+Plan -> Act -> Observe -> Validate -> Reflect -> Update Context -> Handoff
+
+Work one loop at a time.
+
+## Winning Filter
+
+Before implementing any feature, answer:
+
+```text
+Which article claim, evaluation metric, TiDB-specific learning, PyPI first-run experience, or demo clarity does this improve?
+```
+
+If the answer is unclear, do not implement it.
+
+## Forbidden in P0
+
+- real company data
+- employee scoring
+- OpenAI-required runtime
+- TiDB-required runtime
+- full web app
+- Slack/GitHub integration
+- broad refactors without validation
+
+## MVP Commands
+
+```bash
+handovergap demo
+handovergap detect --scenario S001 --role CS
+handovergap evaluate --compare
+pytest
+```
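The "Winning Filter" in AGENTS.md is a simple gate: a feature is implemented only if it clearly improves one of five named areas. A minimal sketch of that gate follows. The five area labels are taken from AGENTS.md; the names `WINNING_AREAS` and `passes_winning_filter` are hypothetical, and exact string matching is an assumption chosen for simplicity, since the document leaves the judgment to the agent.

```python
from typing import Optional

# The five improvement areas named in the Winning Filter question.
WINNING_AREAS = frozenset(
    {
        "article claim",
        "evaluation metric",
        "TiDB-specific learning",
        "PyPI first-run experience",
        "demo clarity",
    }
)


def passes_winning_filter(improves: Optional[str]) -> bool:
    """Return True only when the stated improvement is a named area.

    `None` or any other answer counts as "unclear", which the
    document says means the feature should not be implemented.
    """
    return improves is not None and improves.strip() in WINNING_AREAS
```

For example, `passes_winning_filter("demo clarity")` passes, while an answer such as `"more features"` does not.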
@@ -0,0 +1,80 @@
+# CLAUDE.md
+
+## Mission
+
+Build HandoverGap RAG as a contest-grade, PyPI-installable toolkit and mini benchmark.
+
+## Core Thesis
+
+Correct memories are not always transferable.
+
+## Working Contract
+
+For every task:
+
+1. State objective.
+2. Inspect required files.
+3. Implement smallest useful change.
+4. Validate.
+5. Update docs/tasks if behavior changed.
+6. Produce loop report.
+7. Stop.
+
+## Winning Mode
+
+Optimize for:
+
+- article claim
+- evaluation metric
+- TiDB-specific learning
+- PyPI first-run experience
+- demo clarity
+
+Do not optimize for feature count.
+
+## Required Framing
+
+Do not describe this as a generic RAG app.
+
+Describe it as:
+
+> A TiDB-backed evaluation toolkit for detecting when a retrieved organizational memory is correct but not yet transferable to a successor.
+
+## MVP Focus
+
+P0:
+
+- CLI
+- sample dataset
+- rule-based detector
+- baseline comparison
+- evaluation metrics
+- tests
+
+P1:
+
+- TiDB schema and optional store
+- Streamlit demo
+- PyPI package
+- article assets
+
+## Loop Report Format
+
+```md
+## Loop Report
+
+### Objective
+...
+
+### Files Changed
+...
+
+### Validation
+...
+
+### What This Improves for the Article
+...
+
+### Next Recommended Loop
+...
+```
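The loop report format in CLAUDE.md has a fixed set of five sections, so it can be generated rather than typed by hand. The sketch below renders that template from a small dataclass. The section titles come from CLAUDE.md; the class name `LoopReport`, its field names, and the choice to emit `...` for empty sections are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoopReport:
    """Hypothetical container for one loop report."""

    objective: str
    files_changed: List[str] = field(default_factory=list)
    validation: List[str] = field(default_factory=list)
    article_improvement: str = ""
    next_loop: str = ""

    def to_markdown(self) -> str:
        """Render the five sections in the order CLAUDE.md lists them."""

        def bullets(items: List[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("Objective", self.objective),
            ("Files Changed", bullets(self.files_changed)),
            ("Validation", bullets(self.validation)),
            ("What This Improves for the Article", self.article_improvement),
            ("Next Recommended Loop", self.next_loop),
        ]
        lines = ["## Loop Report", ""]
        for title, body in sections:
            # Empty sections keep the template's "..." placeholder.
            lines += [f"### {title}", body or "...", ""]
        return "\n".join(lines).rstrip() + "\n"
```

A report built as `LoopReport(objective="Add detect CLI", files_changed=["src/handovergap/cli.py"])` renders all five headings, with placeholders where nothing was supplied.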
@@ -0,0 +1,39 @@
+# CODEX.md
+
+## Mission
+
+Implement `handovergap`, a Python package and CLI for detecting tacit context gaps in handover-oriented RAG.
+
+## Priority Order
+
+1. CLI first-run experience
+2. evaluation reproducibility
+3. baseline comparison
+4. TiDB-specific implementation
+5. article assets
+6. demo polish
+
+## Core Commands
+
+```bash
+handovergap demo
+handovergap detect --scenario S001 --role CS
+handovergap evaluate --compare
+handovergap schema --dialect tidb
+```
+
+## Constraints
+
+- Python >= 3.10
+- Typer CLI
+- Pydantic v2
+- Rich output
+- pytest
+- no required external LLM in P0
+- no required TiDB in local MVP
+- synthetic data only
+
+## Stop Rule
+
+Do not move to the next loop automatically.
+End with a loop report.
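The four core commands in CODEX.md imply a small command surface: one bare subcommand and three that take options. The sketch below models that surface with the standard library's `argparse` as a stand-in, so it runs without extra dependencies. It is not the package's actual CLI, which CODEX.md says is built on Typer. The subcommand and flag names are copied from the command list; whether each option is required, and the `tidb` default for `--dialect`, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an argparse stand-in for the command surface in CODEX.md."""
    parser = argparse.ArgumentParser(prog="handovergap")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("demo")

    detect = sub.add_parser("detect")
    detect.add_argument("--scenario", required=True)  # e.g. S001 (assumed required)
    detect.add_argument("--role", required=True)      # e.g. CS (assumed required)

    evaluate = sub.add_parser("evaluate")
    evaluate.add_argument("--compare", action="store_true")

    schema = sub.add_parser("schema")
    schema.add_argument("--dialect", default="tidb")  # default is an assumption

    return parser
```

Parsing `["detect", "--scenario", "S001", "--role", "CS"]` with this parser yields a namespace with `command`, `scenario`, and `role` set accordingly.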
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Masa Mura
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,254 @@
+# Loop Engineering Harness
+
+This document defines the operating loop for AI coding agents working on HandoverGap RAG.
+
+## What Loop Engineering Means Here
+
+Loop engineering is not a task list.
+
+It is the design of a closed development loop where an AI agent repeatedly:
+
+```text
+Plan
+→ Act
+→ Observe
+→ Validate
+→ Reflect
+→ Update Context
+→ Handoff
+```
+
+The goal is to prevent the agent from drifting into broad, unverified implementation.
+
+Every loop must have:
+
+- objective
+- input files
+- output files
+- forbidden changes
+- validation commands
+- failure handling
+- context update rule
+- handoff format
+- stop condition
+
+## Global Loop Contract
+
+For every task, the agent must:
+
+1. Read the task contract.
+2. Restate the loop objective in one sentence.
+3. Inspect only the necessary files first.
+4. Prefer tests before implementation when possible.
+5. Implement the smallest useful change.
+6. Run the validation commands.
+7. If validation fails, fix the smallest cause only.
+8. Update docs or task status if behavior changed.
+9. Write a loop report.
+10. Stop.
+
+## Forbidden Behavior
+
+The agent must not:
+
+- implement multiple loops in one pass
+- introduce external LLM dependency in P0
+- require TiDB for local MVP
+- use real company or customer data
+- create employee scoring or surveillance features
+- silently change public CLI behavior
+- skip validation commands
+- hide failing tests
+- broaden the architecture without updating docs
+
+## Loop Types
+
+### Research Loop
+
+Purpose: Verify positioning and novelty.
+
+```text
+Hypothesis
+→ Related work / competitor check
+→ Difference statement
+→ Article thesis update
+→ Risk note
+```
+
+Output:
+
+- `docs/13_competitor_analysis.md`
+- `article/key_phrases.md`
+- `article/zenn_outline.md`
+
+### Product Loop
+
+Purpose: Improve 5-minute package experience.
+
+```text
+CLI idea
+→ Implement
+→ Run quickstart
+→ Observe friction
+→ Simplify
+→ Update README
+```
+
+Output:
+
+- CLI command
+- README quickstart
+- smoke test
+
+### Evaluation Loop
+
+Purpose: Make the claim measurable.
+
+```text
+Scenario
+→ Gold gaps
+→ Run baseline
+→ Run proposed method
+→ Compare
+→ Inspect failure
+→ Improve schema
+```
+
+Output:
+
+- benchmark scenarios
+- metrics
+- result table
+
+### Engineering Loop
+
+Purpose: Keep code maintainable.
+
+```text
+Test
+→ Implement
+→ Lint
+→ Smoke test
+→ Docs update
+→ Handoff
+```
+
+Output:
+
+- passing tests
+- small diff
+- loop report
+
+### Article Loop
+
+Purpose: Turn implementation results into article content.
+
+```text
+Draft claim
+→ Link to implementation result
+→ Add table/screenshot
+→ Remove unsupported claim
+→ Update article
+```
+
+Output:
+
+- article section
+- result table
+- screenshot checklist
+
+## Validation Levels
+
+### L0: Local Smoke
+
+```bash
+handovergap --help
+handovergap demo
+handovergap detect --scenario S001 --role CS
+handovergap evaluate
+```
+
+### L1: Test
+
+```bash
+pytest
+```
+
+### L2: Quality
+
+```bash
+ruff check .
+mypy src
+```
+
+### L3: Package
+
+```bash
+python -m build
+twine check dist/*
+```
+
+## Failure Handling
+
+### If tests fail
+
+Do not add new features.
+
+1. Identify the failing test.
+2. Explain the likely cause.
+3. Fix the smallest cause.
+4. Rerun the exact failing test.
+5. Rerun full tests if fixed.
+
+### If CLI UX is confusing
+
+1. Simplify command.
+2. Update README.
+3. Add smoke test.
+4. Do not add a new command unless necessary.
+
+### If novelty claim is unsupported
+
+1. Downgrade the claim.
+2. Add limitation.
+3. Add evaluation or evidence.
+4. Update article wording.
+
+### If TiDB integration blocks progress
+
+1. Keep InMemoryStore working.
+2. Add a clear TODO.
+3. Isolate TiDB-specific code.
+4. Do not break local MVP.
+
+## Loop Report Format
+
+At the end of every loop, produce:
+
+```md
+## Loop Report
+
+### Objective
+...
+
+### Files Changed
+- ...
+
+### Validation
+- [ ] command: result
+
+### Observations
+...
+
+### Failures
+...
+
+### Next Recommended Loop
+...
+```
+
+## Stop Rule
+
+Stop after one loop.
+
+If tempted to continue, write the next loop contract instead.
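LOOP_ENGINEERING.md requires every loop to declare nine items before work starts, which makes an incomplete contract easy to detect mechanically. The sketch below checks a contract for missing items. The nine field names are copied from the "Every loop must have" list; the function name `missing_fields` and the plain-dict representation are assumptions, since the diff does not show how `harness/templates/task_contract.md` is parsed.

```python
from typing import Dict, List

# The nine items every loop must declare, in document order.
REQUIRED_FIELDS = (
    "objective",
    "input files",
    "output files",
    "forbidden changes",
    "validation commands",
    "failure handling",
    "context update rule",
    "handoff format",
    "stop condition",
)


def missing_fields(contract: Dict[str, str]) -> List[str]:
    """Return required fields that are absent or blank (hypothetical helper)."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not contract.get(name, "").strip()
    ]
```

An agent could refuse to start a loop whenever `missing_fields(contract)` is non-empty, which fits the document's rule that the next step is to write the loop contract rather than continue.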