flyee 0.1.0 → 0.2.0

@@ -0,0 +1,134 @@
1
+ ---
2
+ name: roadmap-reassessment
3
+ description: Automatically reassess the project roadmap after each phase/sprint completes. Checks if learned information changes the plan, reorders priorities, and adjusts scope. Prevents executing an outdated plan when reality has shifted.
4
+ ---
5
+
6
+ # Roadmap Reassessment
7
+
8
+ > Reassess the plan after each completed phase.
9
+
10
+ ## Problem
11
+
12
+ Plans are made with incomplete information. As work progresses, the agent learns:
13
+ - A dependency is harder than expected
14
+ - A feature is simpler than planned
15
+ - A new requirement emerged
16
+ - A technical constraint was discovered
17
+ - An approach was abandoned in favor of a better one
18
+
19
+ Without reassessment, the agent blindly follows the original plan even when it no longer makes sense.
20
+
21
+ ## When to Trigger
22
+
23
+ | Trigger | Reassessment Type |
24
+ |---------|-------------------|
25
+ | Phase/Sprint completed | **Full reassessment** |
26
+ | Major blocker found | **Emergency reassessment** |
27
+ | User changes requirements | **Scope reassessment** |
28
+ | Budget milestone (50%, 75%, 90%) | **Budget reassessment** |
29
+
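The trigger table can be read as a lookup from event to reassessment type. A minimal sketch follows. The trigger keys and the `crossed_milestones` helper are hypothetical names used for illustration, and the default milestone percentages follow the Budget Reassessment section (50, 75, 90).

```python
from enum import Enum


class Reassessment(Enum):
    FULL = "full"
    EMERGENCY = "emergency"
    SCOPE = "scope"
    BUDGET = "budget"


# Hypothetical trigger keys; the table only gives prose descriptions.
TRIGGER_TO_TYPE = {
    "phase_completed": Reassessment.FULL,
    "major_blocker": Reassessment.EMERGENCY,
    "requirements_changed": Reassessment.SCOPE,
    "budget_milestone": Reassessment.BUDGET,
}


def crossed_milestones(prev_pct, new_pct, milestones=(50, 75, 90)):
    """Return the milestones passed when budget usage moves from prev_pct to new_pct."""
    return [m for m in milestones if prev_pct < m <= new_pct]
```

Comparing the previous and current percentage, rather than testing equality, means a milestone is not missed when a single phase jumps past it.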
30
+ ## Reassessment Protocol
31
+
32
+ ### Step 1: Gather Evidence
33
+
34
+ ```markdown
35
+ What changed since the plan was made?
36
+
37
+ 1. **Completed work**: What was done? What was learned?
38
+ 2. **Discovered complexity**: What was harder/easier than expected?
39
+ 3. **New dependencies**: What new dependencies were discovered?
40
+ 4. **Abandoned approaches**: What was tried and didn't work?
41
+ 5. **Cost reality**: How much budget was used vs planned?
42
+ ```
43
+
44
+ ### Step 2: Evaluate Remaining Plan
45
+
46
+ For each remaining phase/task, ask:
47
+
48
+ ```markdown
49
+ | Phase | Still Valid? | Priority Changed? | Effort Changed? | Action |
50
+ |-------|-------------|-------------------|-----------------|--------|
51
+ | Phase N+1 | Yes/No | ↑↓→ | ↑↓→ | Keep/Reorder/Split/Remove |
52
+ | Phase N+2 | Yes/No | ↑↓→ | ↑↓→ | Keep/Reorder/Split/Remove |
53
+ ```
54
+
55
+ ### Step 3: Decide
56
+
57
+ | Decision | When |
58
+ |----------|------|
59
+ | **Keep** | Plan is still valid. No changes. |
60
+ | **Reorder** | Priorities shifted. Move phases up/down. |
61
+ | **Split** | A phase is too large. Break it into smaller phases. |
62
+ | **Merge** | Two phases are related and should be combined. |
63
+ | **Remove** | Phase is no longer needed. |
64
+ | **Add** | New phase needed based on discoveries. |
65
+ | **Pause** | Need user input before continuing. |
66
+
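To make the decision table concrete, here is a sketch of applying decisions to an ordered list of phase names. The `(action, args)` tuple shapes and the `apply_decisions` function are assumptions made for this example, not part of the skill.

```python
def apply_decisions(phases, decisions):
    """Apply (action, args) decisions to an ordered list of phase names.

    Returns (new_plan, paused). Processing stops at the first "pause".
    """
    plan = list(phases)
    for action, args in decisions:
        if action == "keep":
            continue
        elif action == "reorder":
            name, new_index = args
            plan.remove(name)
            plan.insert(new_index, name)
        elif action == "split":
            name, parts = args
            i = plan.index(name)
            plan[i:i + 1] = parts  # replace one phase with several
        elif action == "merge":
            first, second, merged = args
            plan.remove(second)
            plan[plan.index(first)] = merged
        elif action == "remove":
            plan.remove(args)
        elif action == "add":
            name, index = args
            plan.insert(index, name)
        elif action == "pause":
            return plan, True  # wait for user input
        else:
            raise ValueError(f"unknown action: {action}")
    return plan, False
```

For instance, `apply_decisions(["auth", "billing"], [("split", ("billing", ["invoices", "payments"]))])` yields `(["auth", "invoices", "payments"], False)`.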
67
+ ### Step 4: Update Artifacts
68
+
69
+ If the plan changed:
70
+
71
+ 1. Update `implementation_plan.md` or equivalent planning doc
72
+ 2. Update `.flyee/STATE.md` with new phase order
73
+ 3. Add entry to `.flyee/DECISIONS.md`:
74
+ ```markdown
75
+ ## [DATE] Roadmap Reassessment after Phase N
76
+
77
+ **Trigger:** Phase N completed
78
+ **Changes:**
79
+ - Phase N+2 moved before N+1 (dependency discovered)
80
+ - Phase N+3 removed (handled by Phase N)
81
+ - New Phase N+4 added (edge case found)
82
+ **Rationale:** [explanation]
83
+ ```
84
+ 4. Emit event for Flyee SaaS
85
+
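The `.flyee/DECISIONS.md` entry in item 3 could be generated rather than written by hand. The sketch below assumes an ISO date for the `[DATE]` placeholder; `render_decision_entry` is a hypothetical helper.

```python
from datetime import date


def render_decision_entry(phase, changes, rationale, when=None):
    """Build a DECISIONS.md entry in the layout shown in item 3."""
    when = when or date.today()
    lines = [
        f"## [{when.isoformat()}] Roadmap Reassessment after Phase {phase}",
        "",
        f"**Trigger:** Phase {phase} completed",
        "**Changes:**",
    ]
    lines += [f"- {change}" for change in changes]
    lines.append(f"**Rationale:** {rationale}")
    return "\n".join(lines) + "\n"
```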
86
+ ### Step 5: Notify
87
+
88
+ ```markdown
89
+ 📋 **Roadmap Reassessment Complete**
90
+
91
+ Phase {N} finished. Here's what changed:
92
+
93
+ | Change | Detail |
94
+ |--------|--------|
95
+ | ✅ Completed | Phase {N}: {description} |
96
+ | 🔄 Reordered | Phase {X} moved up (blocking dependency) |
97
+ | ➕ Added | Phase {Y}: {new discovery} |
98
+ | 🗑️ Removed | Phase {Z}: {no longer needed} |
99
+
100
+ Budget: ${used} / ${ceiling} ({percentage}%)
101
+ Remaining phases: {count}
102
+ Estimated remaining cost: ${projection}
103
+
104
+ Continue? (yes / review changes / pause)
105
+ ```
106
+
107
+ ## Budget Reassessment
108
+
109
+ At cost milestones (50%, 75%, 90%):
110
+
111
+ ```markdown
112
+ 💰 **Budget Reassessment — {percentage}% Used**
113
+
114
+ Spent: ${used} on {completed_phases} phases
115
+ Remaining: ${remaining} for {remaining_phases} phases
116
+ Avg cost per phase: ${avg}
117
+ Projected total: ${projection}
118
+
119
+ | Action | Description |
120
+ |--------|-------------|
121
+ | Continue | Budget is sufficient |
122
+ | Trim scope | Remove low-priority phases to stay in budget |
123
+ | Request increase | Ask user for more budget |
124
+ | Switch models | Use cheaper models for remaining work |
125
+ ```
126
+
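The figures in the budget message can be derived from four inputs. The skill does not define how "Projected total" is computed, so this sketch assumes a linear projection from the average cost per completed phase. `budget_snapshot` and its field names are illustrative.

```python
def budget_snapshot(used, ceiling, completed_phases, remaining_phases):
    """Compute the figures shown in the budget reassessment message."""
    # Assumption: remaining phases cost the same on average as completed ones.
    avg = used / completed_phases if completed_phases else 0.0
    projection = used + avg * remaining_phases
    return {
        "percentage": round(100 * used / ceiling, 1),
        "remaining": ceiling - used,
        "avg": avg,
        "projection": projection,
        "within_budget": projection <= ceiling,
    }
```

When `within_budget` is false, the options in the action table apply: trim scope, request an increase, or switch models.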
127
+ ## Integration
128
+
129
+ - **state-machine**: Reads phases from STATE.md
130
+ - **cost-tracking**: Provides budget data for budget reassessment
131
+ - **quality-gates**: Sprint Gate triggers reassessment
132
+ - **knowledge-persistence**: Reassessment decisions go to KNOWLEDGE.md
133
+ - **context-budget**: Validates new phases fit in context windows
134
+ - **Flyee SaaS**: Reassessment events update project timeline (S-02, S-03)
@@ -0,0 +1,152 @@
1
+ ---
2
+ name: skill-discovery
3
+ description: Automatic discovery and suggestion of relevant skills based on the current task context. Modes - auto (load silently), suggest (recommend to user), off (manual only). Replaces static frontmatter-only loading with dynamic detection.
4
+ ---
5
+
6
+ # Skill Discovery
7
+
8
+ > Auto-detect and load relevant skills based on task context.
9
+
10
+ ## Problem
11
+
12
+ Current approach: skills are loaded via agent frontmatter (`skills: [a, b, c]`). This is static — if the agent lists 10 skills, ALL 10 are loaded regardless of whether they're relevant to the current task.
13
+
14
+ GSD-2 approach: skills are discovered dynamically based on the task context. A "Fix CSS bug" task loads frontend skills. A "Database migration" task loads backend skills.
15
+
16
+ ## Modes
17
+
18
+ Configure in `.flyee/config.json`:
19
+
20
+ ```json
21
+ {
22
+ "skill_discovery": "suggest"
23
+ }
24
+ ```
25
+
26
+ | Mode | Behavior |
27
+ |------|----------|
28
+ | `auto` | Discover and load relevant skills silently |
29
+ | `suggest` | Discover skills, show to user, ask before loading |
30
+ | `off` | Only load skills explicitly listed in agent frontmatter |
31
+
32
+ ## Discovery Algorithm
33
+
34
+ ### Step 1: Context Analysis
35
+
36
+ Analyze the user's request for signals:
37
+
38
+ ```
39
+ Input: "Fix the login form validation on mobile"
40
+
41
+ Signals detected:
42
+ - "fix" → debugging domain
43
+ - "login form" → frontend/UI domain
44
+ - "validation" → form patterns
45
+ - "mobile" → mobile-responsive/mobile-design
46
+ ```
47
+
48
+ ### Step 2: Skill Matching
49
+
50
+ Match signals to skill domains:
51
+
52
+ ```
53
+ Signal → Skill Mapping:
54
+
55
+ frontend/UI/CSS/React → frontend-design, nextjs-react-expert, tailwind-patterns
56
+ backend/API/database → api-patterns, database-design, nodejs-best-practices
57
+ mobile/iOS/Android → mobile-design
58
+ security/auth/OWASP → vulnerability-scanner, red-team-tactics
59
+ testing/test/jest → testing-patterns, tdd-workflow, webapp-testing
60
+ git/branch/commit → git-workflow
61
+ debug/fix/error → systematic-debugging
62
+ design/UI/component → design-system-enforcement, atomic-design
63
+ deploy/production/CI → deployment-procedures
64
+ performance/slow/optimize → performance-profiling, nextjs-react-expert
65
+ SEO/search/meta → seo-fundamentals, geo-fundamentals
66
+ ```
67
+
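A minimal sketch of the signal-to-skill lookup, using a few rows from the mapping above. It assumes whole-word, case-insensitive keyword matching. Inferences such as "login form" implying the frontend domain (Step 1) would need richer analysis than this.

```python
import re

# A subset of the mapping above; keywords are lowercased for matching.
SIGNAL_SKILLS = {
    ("frontend", "ui", "css", "react"): [
        "frontend-design", "nextjs-react-expert", "tailwind-patterns",
    ],
    ("mobile", "ios", "android"): ["mobile-design"],
    ("debug", "fix", "error"): ["systematic-debugging"],
    ("testing", "test", "jest"): [
        "testing-patterns", "tdd-workflow", "webapp-testing",
    ],
}


def match_skills(request):
    """Return {skill: [matched keywords]} for whole-word keyword hits."""
    words = set(re.findall(r"[a-z0-9]+", request.lower()))
    hits = {}
    for keywords, skills in SIGNAL_SKILLS.items():
        matched = sorted(words.intersection(keywords))
        if matched:
            for skill in skills:
                hits.setdefault(skill, []).extend(matched)
    return hits
```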
68
+ ### Step 3: File Context
69
+
70
+ Also check which files are being modified:
71
+
72
+ ```
73
+ *.tsx, *.jsx, *.css → frontend-design
74
+ *.py → python-patterns
75
+ *.ts (backend) → nodejs-best-practices
76
+ package.json → project-setup
77
+ Dockerfile → server-management
78
+ *.test.*, *.spec.* → testing-patterns
79
+ ```
80
+
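The file patterns above map naturally onto glob matching. This sketch uses the standard-library `fnmatch` on each file's basename. The `*.ts (backend)` rule is omitted because it depends on knowing which directories hold backend code, which the skill does not specify.

```python
from fnmatch import fnmatch

# Patterns copied from the list above, minus the "*.ts (backend)" rule.
FILE_SKILLS = [
    (("*.tsx", "*.jsx", "*.css"), "frontend-design"),
    (("*.py",), "python-patterns"),
    (("package.json",), "project-setup"),
    (("Dockerfile",), "server-management"),
    (("*.test.*", "*.spec.*"), "testing-patterns"),
]


def skills_for_files(paths):
    """Return the set of skills whose patterns match any file's basename."""
    found = set()
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        for patterns, skill in FILE_SKILLS:
            if any(fnmatch(name, pattern) for pattern in patterns):
                found.add(skill)
    return found
```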
81
+ ### Step 4: Confidence Scoring
82
+
83
+ Each potential skill gets a confidence score:
84
+
85
+ | Factor | Weight |
86
+ |--------|--------|
87
+ | Direct keyword match | 0.4 |
88
+ | File extension match | 0.3 |
89
+ | Agent frontmatter includes it | 0.2 |
90
+ | Recently used (last 3 sessions) | 0.1 |
91
+
92
+ **Load threshold:** score >= 0.5
93
+
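One way to read the weight table is as an additive score. The sketch below assumes each factor is all-or-nothing. The sample scores in Step 5 (for example 0.85) suggest the real scoring may allow partial credit, which is not modeled here. The factor keys are hypothetical.

```python
WEIGHTS = {
    "keyword": 0.4,      # direct keyword match
    "file": 0.3,         # file extension match
    "frontmatter": 0.2,  # agent frontmatter includes the skill
    "recent": 0.1,       # used in the last 3 sessions
}
LOAD_THRESHOLD = 0.5


def confidence(factors):
    """Sum the weights of the factors that apply (each counted once)."""
    return round(sum(WEIGHTS[f] for f in set(factors)), 2)


def should_load(factors):
    """True when the score reaches the load threshold."""
    return confidence(factors) >= LOAD_THRESHOLD
```

Under this reading a keyword match alone (0.4) is not enough to load a skill. It needs at least one supporting factor.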
94
+ ### Step 5: Present/Load
95
+
96
+ **Mode: auto**
97
+ ```
98
+ 🧩 Auto-loaded skills: frontend-design, testing-patterns
99
+ ```
100
+
101
+ **Mode: suggest**
102
+ ```markdown
103
+ 🧩 **Suggested skills for this task:**
104
+
105
+ | Skill | Confidence | Reason |
106
+ |-------|-----------|--------|
107
+ | frontend-design | 0.85 | Keywords: "form", "mobile"; Files: .tsx |
108
+ | testing-patterns | 0.65 | Keyword: "validation" |
109
+ | mobile-design | 0.55 | Keyword: "mobile" |
110
+
111
+ Load these skills? (yes / select / no)
112
+ ```
113
+
114
+ ## Always-Load Skills
115
+
116
+ Some skills should ALWAYS be loaded regardless of discovery:
117
+
118
+ ```json
119
+ {
120
+ "always_load_skills": [
121
+ "clean-code",
122
+ "knowledge-persistence",
123
+ "quality-gates"
124
+ ]
125
+ }
126
+ ```
127
+
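Putting the modes and the always-load list together, the final skill set might be resolved as sketched below. This assumes that `auto` and `suggest` load discovered skills instead of the frontmatter list, and that always-load skills apply in every mode. The skill does not state either point explicitly, and `resolve_skills` is a hypothetical helper.

```python
def resolve_skills(mode, discovered, frontmatter, always_load, approved=None):
    """Return the skills to load, in a stable order without duplicates.

    mode: "auto", "suggest", or "off" (the values of `skill_discovery`).
    approved: in "suggest" mode, the discovered skills the user accepted.
    """
    if mode == "off":
        chosen = frontmatter
    elif mode == "auto":
        chosen = discovered
    elif mode == "suggest":
        chosen = [s for s in discovered if s in (approved or ())]
    else:
        raise ValueError(f"unknown skill_discovery mode: {mode}")
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys([*always_load, *chosen]))
```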
128
+ ## Staleness (Future — F-18)
129
+
130
+ Skills not used for N sessions get deprioritized:
131
+
132
+ ```json
133
+ {
134
+ "skill_staleness_sessions": 20
135
+ }
136
+ ```
137
+
138
+ Usage tracking in `.flyee/skill-usage.json`:
139
+ ```json
140
+ {
141
+ "frontend-design": { "last_used": "2026-03-31", "count": 15 },
142
+ "game-development": { "last_used": "2026-01-15", "count": 2 }
143
+ }
144
+ ```
145
+
146
+ ## Integration
147
+
148
+ - **intelligent-routing**: Skill discovery runs AFTER agent routing
149
+ - **context-budget**: Discovered skills count against context budget
150
+ - **knowledge-persistence**: Skill effectiveness feeds into KNOWLEDGE.md
151
+ - **cost-tracking**: Track if skill loading improved task outcomes
152
+ - **Flyee SaaS**: Skill usage analytics in dashboard
@@ -0,0 +1,125 @@
1
+ ---
2
+ name: sprint-validation
3
+ description: Sprint/milestone completion gate. Compares the roadmap's success criteria against actual results before sealing the sprint. Unlike quality-gates (which run per task), this is the FINAL gate for the entire sprint.
4
+ ---
5
+
6
+ # Sprint Validation Gate
7
+
8
+ > The final gate before marking a sprint/milestone as completed.
9
+
10
+ ## Purpose
11
+
12
+ Individual tasks can pass their quality gates while the sprint as a whole fails. Examples:
13
+ - Each task works in isolation, but they don't work together
14
+ - All code is written, but the feature doesn't meet the original requirements
15
+ - Tests pass, but the user story isn't fulfilled
16
+
17
+ Sprint validation catches these by comparing **planned outcomes** against **actual results**.
18
+
19
+ ## When to Trigger
20
+
21
+ - Before marking any sprint/milestone as "completed"
22
+ - After ALL tasks in the sprint have their own Completion Gates passed
23
+ - Before the final git commit/merge for the sprint
24
+
25
+ ## Validation Protocol
26
+
27
+ ### Step 1: Gather Success Criteria
28
+
29
+ Read the sprint planning document for defined success criteria:
30
+
31
+ ```markdown
32
+ ## Sprint S09 — Blog & CMS
33
+
34
+ ### Success Criteria (from planning)
35
+ 1. Blog posts can be created, edited, and published
36
+ 2. CMS admin panel with CRUD operations
37
+ 3. SEO metadata auto-generated for each post
38
+ 4. RSS feed available at /feed.xml
39
+ 5. Performance: LCP < 2.5s on blog pages
40
+ ```
41
+
42
+ ### Step 2: Verify Each Criterion
43
+
44
+ For each success criterion, provide **evidence**:
45
+
46
+ ```markdown
47
+ ## Sprint Validation — S09
48
+
49
+ | # | Criterion | Status | Evidence |
50
+ |---|-----------|--------|----------|
51
+ | 1 | Blog CRUD | ✅ | Routes exist: POST/GET/PUT/DELETE /api/posts |
52
+ | 2 | CMS admin | ✅ | Page exists: /admin/posts with DataTable |
53
+ | 3 | SEO metadata | ✅ | generateMetadata() in app/blog/[slug]/page.tsx |
54
+ | 4 | RSS feed | ❌ | /feed.xml returns 404 — NOT IMPLEMENTED |
55
+ | 5 | LCP < 2.5s | ⚠️ | Not measurable without deployment |
56
+ ```
57
+
58
+ ### Step 3: Integration Check
59
+
60
+ Beyond individual criteria, verify components work together:
61
+
62
+ ```markdown
63
+ ## Integration Verification
64
+
65
+ - [ ] E2E flow works: Create post → Edit → Publish → View on blog → Appears in RSS
66
+ - [ ] All routes registered in router/navigation
67
+ - [ ] Database migrations applied without conflict
68
+ - [ ] No import cycles between modules
69
+ - [ ] Shared types/interfaces consistent across components
70
+ ```
71
+
72
+ ### Step 4: Decision
73
+
74
+ | Result | Action |
75
+ |--------|--------|
76
+ | All criteria ✅ | Sprint PASS → mark completed |
77
+ | 1-2 criteria ❌ or ⚠️ | Sprint PARTIAL → create fix tasks, DON'T mark completed |
78
+ | 3+ criteria ❌ | Sprint FAIL → reassess, DON'T mark completed |
79
+
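The decision table reduces to a small function over per-criterion statuses. The status strings are hypothetical labels for ✅, ❌, and ⚠️. The table does not say what happens with three or more ⚠️ and no ❌, so this sketch assumes that case is PARTIAL.

```python
def sprint_result(statuses):
    """Map statuses ("pass", "fail", "unverifiable") to a sprint result."""
    failed = statuses.count("fail")
    unverifiable = statuses.count("unverifiable")
    if failed >= 3:
        return "fail"      # reassess, do not mark completed
    if failed or unverifiable:
        return "partial"   # create fix tasks, do not mark completed
    return "pass"          # mark completed
```

For the S09 example (three passed, one failed, one unverifiable) this returns `"partial"`.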
80
+ ### Step 5: Sprint Summary
81
+
82
+ ```markdown
83
+ ## Sprint S09 — Validation Result: PARTIAL
84
+
85
+ ### Passed (3/5)
86
+ - ✅ Blog CRUD — fully functional
87
+ - ✅ CMS admin — complete with DataTable
88
+ - ✅ SEO metadata — auto-generated
89
+
90
+ ### Failed (1/5)
91
+ - ❌ RSS feed — not implemented
92
+
93
+ ### Unverifiable (1/5)
94
+ - ⚠️ LCP performance — requires deployment
95
+
96
+ ### Action
97
+ - Created fix task: "Implement RSS feed at /feed.xml"
98
+ - Sprint remains OPEN until RSS is complete
99
+ - LCP deferred to post-deployment verification
100
+ ```
101
+
102
+ ## Event Emission
103
+
104
+ ```python
105
+ bridge.emit_event("sprint.validation", {
106
+ "sprint": "S09",
107
+ "result": "partial",
108
+ "criteria_total": 5,
109
+ "criteria_passed": 3,
110
+ "criteria_failed": 1,
111
+ "criteria_unverifiable": 1,
112
+ "action": "created_fix_tasks",
113
+ "fix_tasks": ["Implement RSS feed"]
114
+ })
115
+ ```
116
+
117
+ ## Integration
118
+
119
+ - **quality-gates**: Tasks must pass Completion Gate BEFORE sprint validation runs
120
+ - **roadmap-reassessment**: Failed sprint triggers reassessment
121
+ - **verification-gate**: Sprint validation uses verification commands
122
+ - **cost-tracking**: Sprint cost is part of the validation summary
123
+ - **knowledge-persistence**: Record what worked/failed in KNOWLEDGE.md
124
+ - **task-complete**: Sprint validation BLOCKS `/task-complete` for the sprint itself
125
+ - **Flyee SaaS**: Sprint validation results visible in progress dashboard
@@ -0,0 +1,176 @@
1
+ ---
2
+ name: stuck-detection
3
+ description: Detects when the agent enters an infinite loop — repeating the same actions, failing the same way, or making zero progress. Uses a sliding-window pattern detector to identify cycles and break them before wasting budget.
4
+ ---
5
+
6
+ # Stuck Detection
7
+
8
+ > Detects and breaks infinite loops in agent execution.
9
+
10
+ ## Problem
11
+
12
+ Without stuck detection, an agent can:
13
+ - Retry the same failing command indefinitely
14
+ - Edit → undo → re-edit the same file in a loop
15
+ - Make the same search query repeatedly
16
+ - Generate the same code that fails the same test
17
+
18
+ Each iteration burns tokens and budget with zero progress.
19
+
20
+ ## Detection Algorithm
21
+
22
+ ### Sliding Window Pattern Detector
23
+
24
+ Monitor the last N tool calls (window size = 10) for:
25
+
26
+ ```
27
+ Pattern 1: EXACT REPEAT
28
+ tool_calls[i] == tool_calls[i-1] == tool_calls[i-2]
29
+ → Same tool, same arguments, 3+ times in a row
30
+
31
+ Pattern 2: CYCLE
32
+ Sequence A-B-C-A-B-C detected
33
+ → Agent alternating between the same actions
34
+
35
+ Pattern 3: ZERO PROGRESS
36
+ Last 5 tool calls produced no file changes
37
+ AND no new information was gathered
38
+ → Agent is "thinking" but not "doing"
39
+
40
+ Pattern 4: FAIL-RETRY-FAIL
41
+ Same command fails → agent retries → same failure
42
+ → 3+ identical failures without changing approach
43
+ ```
44
+
45
+ ### Detection Thresholds
46
+
47
+ | Pattern | Threshold | Confidence |
48
+ |---------|-----------|------------|
49
+ | Exact repeat | 3 consecutive | HIGH |
50
+ | Cycle | 2 full cycles (6+ calls) | HIGH |
51
+ | Zero progress | 5 calls, 0 changes | MEDIUM |
52
+ | Fail-retry-fail | 3 identical failures | HIGH |
53
+
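Pattern 2 (cycle) can be checked by looking for a block that repeats back to back at the end of the action list. The sketch below is one possible reading of the "2 full cycles (6+ calls)" threshold: a block must repeat at least twice and span at least six calls. `find_cycle` and its parameters are assumptions for illustration.

```python
import math


def find_cycle(actions, min_calls=6, min_repeats=2):
    """Return the repeating block at the tail of `actions`, or None."""
    n = len(actions)
    for period in range(2, n // min_repeats + 1):
        # Short blocks need more repeats to reach the minimum call count.
        repeats = max(min_repeats, math.ceil(min_calls / period))
        span = period * repeats
        if span > n:
            continue
        block = actions[n - period:]
        if actions[n - span:] == block * repeats:
            return block
    return None
```

A run of identical calls also matches as a cycle, so in practice the exact-repeat check would run first.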
54
+ ## Response Protocol
55
+
56
+ ### Phase 1: Soft Intervention
57
+
58
+ On first detection:
59
+
60
+ ```markdown
61
+ ⚠️ **Stuck Detection Triggered**
62
+
63
+ Pattern: {pattern_type}
64
+ Evidence: {last N tool calls summarized}
65
+ Window: {call_count} calls, {time_elapsed}
66
+
67
+ Attempting recovery: Retry with diagnostic prompt.
68
+ ```
69
+
70
+ Inject a **diagnostic prompt** that:
71
+ 1. Summarizes what was attempted
72
+ 2. Explains WHY it's stuck
73
+ 3. Asks the agent to try a **different approach**
74
+
75
+ ### Phase 2: Hard Stop
76
+
77
+ If soft intervention fails (same pattern recurs within 5 tool calls):
78
+
79
+ ```markdown
80
+ 🛑 **Stuck Detection — Hard Stop**
81
+
82
+ The agent has been stuck in a loop for {call_count} calls.
83
+ Pattern: {pattern_type}
84
+ Estimated wasted cost: ${cost}
85
+
86
+ Action: Pausing execution.
87
+ Next steps:
88
+ 1. Review the session log
89
+ 2. Provide manual guidance
90
+ 3. Resume with `/execute`
91
+ ```
92
+
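The step from soft intervention to hard stop depends on whether the same pattern recurs within 5 tool calls. A minimal sketch of that bookkeeping follows. The `Intervention` class and the use of a running call index are assumptions.

```python
class Intervention:
    """Decide between a soft intervention and a hard stop."""

    def __init__(self, recurrence_window=5):
        self.recurrence_window = recurrence_window
        self.last_pattern = None
        self.last_call_index = None

    def on_detection(self, pattern, call_index):
        """Return "soft" or "hard_stop" for a pattern seen at call_index."""
        recurred = (
            pattern == self.last_pattern
            and call_index - self.last_call_index <= self.recurrence_window
        )
        self.last_pattern = pattern
        self.last_call_index = call_index
        return "hard_stop" if recurred else "soft"
```

A different pattern, or the same pattern after a longer gap, is treated as a fresh first detection.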
93
+ ### Phase 3: Escalation
94
+
95
+ Emit event for Flyee SaaS:
96
+
97
+ ```json
98
+ {
99
+ "event_type": "stuck_detection.triggered",
100
+ "payload": {
101
+ "task_id": "T-001",
102
+ "pattern": "fail_retry_fail",
103
+ "tool_calls_in_loop": 9,
104
+ "wasted_cost_usd": 0.45,
105
+ "intervention": "hard_stop",
106
+ "last_5_calls": [
107
+ "run_command: npm test",
108
+ "replace_file_content: src/utils.ts",
109
+ "run_command: npm test",
110
+ "replace_file_content: src/utils.ts",
111
+ "run_command: npm test"
112
+ ]
113
+ }
114
+ }
115
+ ```
116
+
117
+ ## Implementation
118
+
119
+ ### For Prompt-Based Agents (Flyee approach)
120
+
121
+ Since flyee-agent runs INSIDE the host runtime (not as a standalone app), stuck detection works through **skill instructions**:
122
+
123
+ ```markdown
124
+ ## Self-Monitoring Protocol
125
+
126
+ You MUST monitor your own behavior for loops:
127
+
128
+ After every 5 tool calls, ask yourself:
129
+ 1. Am I making progress? (new files, passing tests, new info)
130
+ 2. Am I repeating the same action? (same edit, same command)
131
+ 3. Have I changed approach since the last failure?
132
+
133
+ If the answer to #1 is NO, and either #2 is YES or #3 is NO:
134
+ → STOP. State what you're stuck on. Ask the user for guidance.
135
+
136
+ DO NOT:
137
+ - Retry the same command more than 2 times
138
+ - Edit the same file more than 3 times for the same issue
139
+ - Run the same failing test more than 2 times without changing the code
140
+ ```
141
+
142
+ ### For Bridge-Level Detection
143
+
144
+ `local_tracker.py` can implement pattern detection on the event log. The example covers Patterns 1 and 4:
145
+
146
+ ```python
147
+ def check_stuck(events, window=10):
148
+     recent = events[-window:]
149
+
150
+     # Pattern 1: Exact repeat (requires at least 3 calls in the window)
151
+     if len(recent) >= 3 and len(set(e['action'] for e in recent[-3:])) == 1:
152
+         return 'exact_repeat'
153
+
154
+     # Pattern 4: Fail-retry-fail (the last 3 failures share the same action)
155
+     failures = [e for e in recent if e.get('result') == 'failure']
156
+     if len(failures) >= 3:
157
+         cmds = [f['action'] for f in failures[-3:]]
158
+         if len(set(cmds)) == 1:
159
+             return 'fail_retry_fail'
160
+
161
+     return None
162
+ ```
163
+
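Pattern 3 (zero progress) could be added to the same event log along these lines. The `files_changed` and `new_info` event fields are hypothetical, since the skill does not define the event schema.

```python
def zero_progress(events, span=5):
    """True when the last `span` events changed no files and added no information."""
    recent = events[-span:]
    if len(recent) < span:
        return False  # not enough history to judge
    return not any(e.get("files_changed") or e.get("new_info") for e in recent)
```

Research tasks, listed under Exclusions, would need to set `new_info` on their search events to avoid false positives.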
164
+ ## Integration
165
+
166
+ - **cost-tracking**: Log wasted cost during stuck periods
167
+ - **session-resilience**: Stuck detection can trigger session save
168
+ - **knowledge-persistence**: Record what caused the loop in Lessons Learned
169
+ - **task-complete**: Block completion if agent was stuck and didn't resolve
170
+ - **Flyee SaaS**: Dashboard shows stuck incidents timeline (S-03)
171
+
172
+ ## Exclusions
173
+
174
+ - **Research tasks**: Repeated searches are normal during research
175
+ - **Interactive debugging**: User may ask to retry — that's not a loop
176
+ - **Build/test cycles**: Edit → test → edit → test is normal IF the edits are different