@champpaba/claude-agent-kit 1.5.1 → 1.6.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/CLAUDE.md +39 -13
- package/.claude/commands/csetup.md +82 -3
- package/.claude/commands/pageplan.md +233 -6
- package/.claude/lib/agent-executor.md +449 -0
- package/.claude/lib/detailed-guides/incremental-testing.md +460 -0
- package/.claude/lib/task-analyzer.md +398 -2
- package/.claude/templates/phase-templates.json +50 -1
- package/README.md +125 -39
- package/package.json +1 -1
package/README.md
CHANGED
@@ -917,52 +917,138 @@ Built with:

 ---

-## What's New in v1.
+## What's New in v1.6.0

-**Feature:
+**Feature: Incremental Testing - Milestone-based Validation for High-Risk Tasks**

-###
+### The Problem: All-or-Nothing Testing

-**
-
-
+**Before v1.6.0:**
+```
+Task: "Integrate Google Maps API"
+→ Agent implements complete solution (1000 locations)
+→ Tests with full dataset
+→ Bug found → Hard to debug (which part failed?)
+→ Fix → Retest full dataset → Slow iteration
+
+Problem:
+❌ Large scope = hard to debug
+❌ Late bug detection (at scale)
+❌ Rework expensive (threw away 1000-location implementation)
+❌ No confidence in progressive scaling
+```

-**
-
-
-
-
+**After v1.6.0:**
+```
+Task: "Integrate Google Maps API"
+→ Milestone 1: Test 1 location (hardcoded)
+→ Bug found → Easy to debug (small scope)
+→ Fix → Retest 1 location → Fast iteration
+→ Milestone 2: Test 10 locations (parameterized)
+→ Works! Confidence++
+→ Milestone 3: Error handling
+→ Refine edge cases
+→ Milestone 4: Scale to 1000
+→ Already confident (1 and 10 worked)
+
+Benefits:
+✅ Small scope = easy debugging
+✅ Early bug detection (at milestone 1)
+✅ Low rework (fix before scaling)
+✅ Progressive confidence
+```

-
-- **50-90% fewer confirmations** (1x per workflow vs 2x per phase)
-- **25% faster execution** (no waiting for redundant approvals)
-- **Better UX** (approve once, system handles the rest)
-- **Lean implementation** (80 lines, 1 file, +0.1% context)
+### The Solution: Milestone-based Validation

-
+**Automatic Detection:** `/csetup` detects high-risk tasks automatically
+- Risk = HIGH (payment, auth, security)
+- Risk = MEDIUM + Complexity ≥ 7 (complex forms)
+- External API dependency (Google Maps, Stripe, OpenAI)
+- Data-intensive operation (ETL, migration, batch processing)
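The four detection criteria above can be read as a single predicate. The following is a minimal sketch of that reading, not the package's implementation: the `Task` fields and the function name `needs_incremental_testing` are hypothetical, and the 1-10 complexity scale is assumed.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description; field names are illustrative only."""
    risk: str                        # "LOW", "MEDIUM", or "HIGH"
    complexity: int                  # assumed 1-10 score
    uses_external_api: bool = False
    data_intensive: bool = False


def needs_incremental_testing(task: Task) -> bool:
    """Return True when any of the four listed criteria matches."""
    if task.risk == "HIGH":
        return True
    if task.risk == "MEDIUM" and task.complexity >= 7:
        return True
    return task.uses_external_api or task.data_intensive
```

Under this reading, a LOW-risk task still qualifies if it depends on an external API, because the criteria are treated as alternatives rather than as a combined score.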

-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
--
--
+**3 Milestone Patterns:**
+
+1. **Backend API Integration** (4 milestones)
+   - M1: Core implementation (1 record, hardcoded)
+   - M2: Parameterized query (10 records, dynamic)
+   - M3: Error handling (invalid input, timeouts)
+   - M4: Scale + performance (100-1000 records)
+
+2. **Complex Form** (3 milestones)
+   - M1: Architecture + skeleton (2-3 critical fields)
+   - M2: E2E flow validation (submit → API → DB)
+   - M3: Complete all fields (all 20 fields + validation)
+
+3. **Database Migration / ETL** (3 milestones)
+   - M1: Dry-run (10 records)
+   - M2: Scale to 100 records
+   - M3: Full dataset (staging)
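For illustration, the three patterns above could be held as plain data and expanded into a numbered plan. This is a sketch only: the dictionary keys and the `milestone_plan` helper are hypothetical and are not taken from the package's `phase-templates.json`; only the milestone labels come from the README text.

```python
from typing import Dict, List

# Milestone labels copied from the README; the keys are invented for this sketch.
MILESTONE_PATTERNS: Dict[str, List[str]] = {
    "backend_api_integration": [
        "Core implementation (1 record, hardcoded)",
        "Parameterized query (10 records, dynamic)",
        "Error handling (invalid input, timeouts)",
        "Scale + performance (100-1000 records)",
    ],
    "complex_form": [
        "Architecture + skeleton (2-3 critical fields)",
        "E2E flow validation (submit -> API -> DB)",
        "Complete all fields (all 20 fields + validation)",
    ],
    "database_migration_etl": [
        "Dry-run (10 records)",
        "Scale to 100 records",
        "Full dataset (staging)",
    ],
}


def milestone_plan(pattern: str) -> List[str]:
    """Return the milestones of one pattern, labelled M1, M2, ... in order."""
    return [f"M{i}: {name}" for i, name in enumerate(MILESTONE_PATTERNS[pattern], 1)]
```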
+
+### Round-based Retry Logic
+
+**Per-Milestone Quota:**
+- **2 attempts per round** (not total)
+- **Unlimited rounds** (Main Claude decides when to stop)
+- **Hints reset quota** (fresh start)
+
+**Example:**
+```
+Milestone 1: Core implementation
+→ Round 1: Attempt 1 ❌ (API key missing)
+→ Round 1: Attempt 2 ❌ (Still missing)
+→ Main Claude: "Check API_KEY env variable" 💡
+→ Round 2: Attempt 1 ✅ (Fixed!)
+
+Total attempts: 3 (2 in Round 1, 1 in Round 2)
+```
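The quota rules above (two attempts per round, a hint opening a fresh round, no fixed cap on rounds) can be sketched as a small loop. This is an assumption-laden illustration rather than the kit's actual executor: `run_milestone`, `attempt`, and `get_hint` are hypothetical names, and "Main Claude decides when to stop" is modelled as `get_hint` returning `None`.

```python
from typing import Callable, Optional, Tuple

ATTEMPTS_PER_ROUND = 2  # quota from the README: 2 attempts per round, not in total


def run_milestone(
    attempt: Callable[[Optional[str]], bool],
    get_hint: Callable[[int], Optional[str]],
) -> Tuple[bool, int]:
    """Run rounds until an attempt succeeds or no further hint is offered.

    attempt(hint) performs one try and reports success.
    get_hint(round_no) returns a hint that opens the next round, or None to stop.
    Returns (succeeded, total_attempts).
    """
    hint: Optional[str] = None
    total_attempts = 0
    round_no = 1
    while True:
        for _ in range(ATTEMPTS_PER_ROUND):
            total_attempts += 1
            if attempt(hint):
                return True, total_attempts
        hint = get_hint(round_no)  # a hint resets the per-round quota
        if hint is None:
            return False, total_attempts
        round_no += 1


# Replays the README example: both Round 1 attempts fail, one hint, then success.
succeeded, total = run_milestone(
    attempt=lambda hint: hint is not None,
    get_hint=lambda round_no: "Check API_KEY env variable" if round_no == 1 else None,
)
```

With these stand-in callables the loop ends after three attempts, matching the "2 in Round 1, 1 in Round 2" count in the example.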
+
+### Main Claude Intervention
+
+**Decision Matrix:**
+
+| Error Pattern | Complexity | Confidence | Action |
+|---------------|------------|------------|--------|
+| Same error 2x | SIMPLE | HIGH | Give Hints |
+| Same error 2x | COMPLEX | LOW | Ask Human |
+| Different errors | ANY | ANY | Ask Human |
+| Intermittent | ANY | ANY | Ask Human |
+| 2+ rounds no progress | ANY | ANY | Ask Human |
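Read as code, the matrix above has exactly one row that leads to hints; every other listed row escalates. The sketch below encodes that, with one loudly labelled assumption: combinations the matrix does not list (for example a repeated error that is SIMPLE but LOW confidence) fall back to asking a human. The function name and string constants are hypothetical.

```python
GIVE_HINTS = "give_hints"
ASK_HUMAN = "ask_human"


def decide_intervention(error_pattern: str, complexity: str, confidence: str) -> str:
    """Map one matrix row to an action.

    error_pattern: "same_error_2x", "different_errors", "intermittent",
                   or "no_progress_2_rounds" (labels invented for this sketch).
    complexity:    "SIMPLE" or "COMPLEX".
    confidence:    "HIGH" or "LOW".
    """
    if (
        error_pattern == "same_error_2x"
        and complexity == "SIMPLE"
        and confidence == "HIGH"
    ):
        return GIVE_HINTS
    # Assumption: anything else, including unlisted combinations, goes to a human.
    return ASK_HUMAN
```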
+
+**Pattern-based Hints:**
+- 401 Unauthorized → Check API_KEY, verify key validity
+- Timeout → Increase threshold, check network
+- Schema mismatch → Compare actual vs expected, check API version
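A lookup table is one simple way to model the three hint patterns above. In this sketch the hint texts come from the README, while the substring matching, the `HINT_TABLE` name, and the `hints_for` helper are assumptions made for illustration.

```python
from typing import List, Tuple

# (substring to look for in the lowercased error text, hints from the README)
HINT_TABLE: List[Tuple[str, List[str]]] = [
    ("401", ["Check API_KEY", "verify key validity"]),
    ("timeout", ["Increase threshold", "check network"]),
    ("schema mismatch", ["Compare actual vs expected", "check API version"]),
]


def hints_for(error_message: str) -> List[str]:
    """Return the hints of the first matching pattern, or an empty list."""
    message = error_message.lower()
    for needle, hints in HINT_TABLE:
        if needle in message:
            return hints
    return []
```

An unmatched error returns no hints, which under the decision matrix would be a natural point to ask a human instead.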
+
+### Benefits & Trade-offs
+
+**Benefits:**
+- ✅ **75% faster debug** - Catch bugs at M1 (1 record) vs M4 (1000 records)
+- ✅ **60-70% rework reduction** - Fix before scaling
+- ✅ **80% faster debugging** - Small scope (1 record) vs full dataset
+- ✅ **90% success rate** - Progressive confidence at M4
+- ✅ **40-50% net speedup** - +15-20% time upfront → -60-70% rework time
+
+**Trade-offs:**
+- ⚠️ **Timeline:** +15-20% upfront (but saves 60-70% rework)
+- ⚠️ **Complexity:** phases.md 2-3x longer (summary table at top)
+- ⚠️ **Learning curve:** More coordination (automated by `/csetup`)
+
+**Net benefit:** +15-20% time → -60-70% rework = **40-50% faster overall**
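How the upfront cost and the rework saving combine depends on how much of the baseline time is rework, which the README does not state. The sketch below makes that dependency explicit under assumptions that are mine, not the package's: baseline time is normalised to 1.0 and split into implementation plus rework, the upfront overhead applies to the implementation part only, and the reduction applies to the rework part only. The function name and parameters are hypothetical.

```python
def net_speedup(rework_share: float, upfront_overhead: float, rework_reduction: float) -> float:
    """Fraction of baseline time saved, with baseline normalised to 1.0.

    rework_share:      assumed fraction of baseline time spent on rework (0..1)
    upfront_overhead:  extra implementation time, e.g. 0.175 for +15-20%
    rework_reduction:  fraction of rework avoided, e.g. 0.65 for 60-70%
    """
    implementation = (1.0 - rework_share) * (1.0 + upfront_overhead)
    rework = rework_share * (1.0 - rework_reduction)
    return 1.0 - (implementation + rework)


# Midpoints of the README ranges, with an assumed rework share of 75%.
example = net_speedup(rework_share=0.75, upfront_overhead=0.175, rework_reduction=0.65)
```

Under this model the midpoint figures give a saving of roughly 44% when rework is about three quarters of the baseline, and a smaller saving when the rework share is lower.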
+
+### When to Use Incremental Testing
+
+**✅ Use for:**
+- Payment integration, Auth systems (HIGH risk)
+- Complex forms with 20+ fields (Complexity ≥ 7)
+- External APIs (Google Maps, Stripe, OpenAI)
+- Data migrations, ETL pipelines (data-intensive)
+
+**❌ Skip for:**
+- Simple CRUD operations (LOW risk, Complexity < 5)
+- UI components (standard testing sufficient)
+- Configuration changes (no integration testing needed)
+
+**Detection Rate:** ~20-30% of tasks (only high-risk)

 ---