ai-fob 1.3.0 → 1.3.2

@@ -26,6 +26,10 @@ You are a rigorous build validator. Your job is to run validation checks provide
26
26
 
27
27
  Run every validation check. Report exactly what you observe. If a check fails, describe the failure precisely -- what was expected vs. what actually happened. Do not attempt to fix the code or suggest fixes. Your report goes back to the orchestrator who will route failures to a builder agent for correction.
28
28
 
29
+ ## Browser Tool Constraint
30
+
31
+ NEVER use the macOS `open` command to open URLs in a browser. ALWAYS use `agent-browser open <url>` for all browser-based checks. The `open` command launches Safari, which cannot be automated, snapshotted, or device-emulated. All browser interactions MUST go through `agent-browser` (Chromium via Playwright).
32
+
29
33
  ## Checks
30
34
 
31
35
  The calling prompt provides a numbered list of checks to run. Execute every check listed -- no more, no fewer. For each check:
@@ -38,3 +38,4 @@ Present your findings as:
38
38
  - **Data Flow**: How data moves through the relevant files
39
39
  - **Integration Points**: Where new code would connect to existing code (file path + line)
40
40
  - **Shared Utilities**: Existing helpers, hooks, or patterns that should be reused
41
+ - **File Size Audit**: List every key file with its line count. Flag files over 300 lines (warning) and over 500 lines (critical -- must be decomposed).
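The 300-line and 500-line thresholds introduced here recur throughout this release. As a rough illustration only (not part of the package), the audit could be sketched in Python as below. The function names are hypothetical, and reading "over N lines" as strictly greater than N is an assumption.

```python
from pathlib import Path

WARNING_LIMIT = 300   # lines; warning threshold quoted in the audit text
CRITICAL_LIMIT = 500  # lines; critical threshold quoted in the audit text


def classify_line_count(line_count: int) -> str:
    """Map a line count to an audit label; 'over N' is read as strictly greater."""
    if line_count > CRITICAL_LIMIT:
        return "CRITICAL"
    if line_count > WARNING_LIMIT:
        return "WARNING"
    return "OK"


def count_lines(path: Path) -> int:
    """Count newline bytes, which is what `wc -l` reports."""
    with path.open("rb") as handle:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: handle.read(65536), b""))


def audit_files(paths: list[Path]) -> list[tuple[str, int, str]]:
    """Return (path, line count, label) for each file, largest first."""
    rows = [(str(p), count_lines(p)) for p in paths]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(name, n, classify_line_count(n)) for name, n in rows]
```

Sorting largest first is a design choice for readability of the report; the diff itself only asks that every key file be listed with its line count.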
@@ -291,12 +291,13 @@ Run MARK_STEP_START(1).
291
291
  4. Shared utilities -- helpers, hooks, or patterns to reuse rather than rebuild
292
292
  5. Potential conflicts -- code this phase may need to modify that other work also touches
293
293
  6. Success criteria grounding -- for each success criterion, document what currently exists or is missing
294
+ 7. File size audit -- for every file this phase will modify or extend, report its current line count. Flag files over 300 lines (WARNING) and over 500 lines (CRITICAL -- must be decomposed before adding code).
294
295
 
295
296
  Do NOT explore areas unrelated to this phase.
296
297
 
297
298
  ## Output
298
299
  Write your complete findings to: {PHASE_DIR}/explorer_findings.md
299
- Structure your report with these sections: Prerequisites Status, Key Files, Existing Patterns, Integration Points, Shared Utilities, Potential Conflicts, Success Criteria Grounding, Data Flow.
300
+ Structure your report with these sections: Prerequisites Status, Key Files, Existing Patterns, Integration Points, Shared Utilities, Potential Conflicts, Success Criteria Grounding, Data Flow, File Size Audit.
300
301
  Write the file using the Write tool.
301
302
  Return the file path in your response.
302
303
  ```
@@ -417,6 +418,7 @@ After research is complete, spawn the architect agent to create the implementati
417
418
  - Every frontend domain MUST include `agent-browser` browser verification steps in the Phase Validation section. If the phase includes a Frontend domain, browser-based validation is MANDATORY, not optional.
418
419
  - This plan covers a SINGLE PHASE only -- do not plan work beyond this phase's scope
419
420
  - If the phase has multiple domains that can be built independently (no shared files, no cross-domain dependencies), mark them with `| PARALLEL` on the domain header line (e.g., `### Frontend | PARALLEL`). If uncertain, do NOT mark as parallel.
421
+ - NEVER plan to add significant logic to a file already over 300 lines without first splitting it into focused modules. Files over 500 lines MUST have a decomposition task BEFORE any new code is added. Check the Explorer's File Size Audit section for flagged files.
420
422
 
421
423
  ## Output Format
422
424
  Write the implementation plan in the Single-Phase Implementation Plan format (see below) to: {PHASE_DIR}/plan_V1.md
@@ -541,6 +543,7 @@ Anti-stories from the HL plan relevant to this phase:
541
543
 
542
544
  ## Important Notes
543
545
  [Gotchas, constraints, security considerations, version pinning, dependency ordering between tasks]
546
+ [Flag any files from the File Size Audit that are over 300 lines and will be modified. Note decomposition strategy for any over 500 lines.]
544
547
  ```
545
548
 
546
549
  ### Step 3: Validate Plan (Correction Loop)
@@ -590,7 +593,7 @@ Read the plan at: {PHASE_DIR}/plan_V1.md
590
593
  ### Prior Phase Context
591
594
  {PRIOR_PHASE_CONTEXT if N > 1, otherwise "N/A -- this is Phase 1"}
592
595
 
593
- ## Validation Checks (10 -- run ALL of these)
596
+ ## Validation Checks (11 -- run ALL of these)
594
597
 
595
598
  1. **File reference accuracy** -- Do referenced files, functions, and patterns actually exist in the codebase? Use Glob/Grep/Read to verify every file path mentioned in the plan. Flag any references to files, functions, or patterns that do not exist.
596
599
 
@@ -612,6 +615,8 @@ Read the plan at: {PHASE_DIR}/plan_V1.md
612
615
 
613
616
  10. **Self-containment check** -- Can a building agent execute this plan without further research? Verify all file paths to create/modify are explicit (no "find the appropriate file"). Verify code blocks are complete enough to implement (no "add similar logic here"). Verify dependency install commands are specified where new packages are introduced. Flag any task that requires the builder to make architectural decisions or do additional research.
614
617
 
618
+ 11. **File size check** -- Does the plan respect file size limits? Read the Explorer's File Size Audit section from `{PHASE_DIR}/explorer_findings.md`. For any file flagged over 300 lines that the plan modifies, verify the plan acknowledges the size concern (in Important Notes or as a refactoring task). For any file flagged over 500 lines, verify the plan includes an explicit decomposition task BEFORE adding new code to that file. FAIL if the plan adds code to a 500+ line file without decomposing it first.
619
+
615
620
  ## Validation Parameters
616
621
  - task: {TASK_NAME}
617
622
  - phase: {N}
@@ -621,7 +626,7 @@ Read the plan at: {PHASE_DIR}/plan_V1.md
621
626
  ## Output
622
627
  Write your validation report to: {PHASE_DIR}/plan_validation_report.md
623
628
 
624
- Use the YAML frontmatter format from your agent instructions with the parameters above. Include `checks-passed: X/10` in the frontmatter.
629
+ Use the YAML frontmatter format from your agent instructions with the parameters above. Include `checks-passed: X/11` in the frontmatter.
625
630
 
626
631
  Write the file using the Write tool.
627
632
  Return the file path AND the overall result (pass or fail) in your response.
@@ -737,7 +742,7 @@ If validation has failed `MAX_CYCLES` (3) times, abort. Present:
737
742
  PLAN VALIDATION FAILED after 3 cycles. Aborting.
738
743
 
739
744
  Phase: {N} -- {PHASE_NAME}
740
- Checks passed: {X}/10 (cycle 3)
745
+ Checks passed: {X}/11 (cycle 3)
741
746
 
742
747
  Remaining failures:
743
748
  - {check name -- problem summary}
@@ -903,9 +908,11 @@ Construct the final numbered check list by combining standard checks, plan-speci
903
908
 
904
909
  3. **Build succeeds** -- Run `./scripts/build.sh` via Bash. PASS if exit code is 0; FAIL if non-zero. Include the first 50 lines of output on failure.
905
910
 
911
+ 4. **No oversized files introduced** -- Identify files changed during this phase: if `git-available` is true, run `git diff --name-only {pre-phase-sha}..HEAD` via Bash to get the list of phase-modified files; if `git-available` is false, read the build report(s) and extract the "Files Created/Modified" lists. Then run `wc -l` on each identified file (skipping binary files and deleted files). PASS if all files are under 500 lines. FAIL listing each file that exceeds 500 lines with its line count. Also WARN (but do not FAIL) for files between 300-500 lines -- include these in the findings as advisories.
912
+
906
913
  **Plan-specific checks** (from `## Phase {N} Validation`):
907
914
 
908
- Number these sequentially starting at 4. Preserve the original check descriptions from the plan verbatim. For each check, the validator determines the check type (shell command, file verification, or browser verification) based on the description content.
915
+ Number these sequentially starting at 5. Preserve the original check descriptions from the plan verbatim. For each check, the validator determines the check type (shell command, file verification, or browser verification) based on the description content.
909
916
 
910
917
  **Browser console error check** (always included AFTER plan-specific checks, but ONLY if any plan-specific check involves browser/UI verification OR the phase has a Frontend domain):
911
918
 
@@ -927,6 +934,12 @@ Store the total check count as `BUILD_CHECK_COUNT`. Store the count of HL criter
927
934
 
928
935
  Determine if browser checks exist: scan the assembled check list for any check that mentions "browser", "agent-browser", "navigate", "page", "UI", or "localhost". Store as `HAS_BROWSER_CHECKS` (true/false).
929
936
 
937
+ If `HAS_BROWSER_CHECKS` is true: Read the Mobile Test Devices section from the testing-and-validation skill. Extract the Primary Device and Secondary Device values. Store as `MOBILE_PRIMARY_DEVICE` and `MOBILE_SECONDARY_DEVICE`. Determine `HAS_MOBILE_CHECKS`:
938
+ - If `MOBILE_PRIMARY_DEVICE` is "NONE": set `HAS_MOBILE_CHECKS = false`.
939
+ - Otherwise: set `HAS_MOBILE_CHECKS = true`.
940
+
941
+ If `HAS_BROWSER_CHECKS` is false: set `HAS_MOBILE_CHECKS = false`.
942
+
930
943
  #### 5b. Start Dev Server (if needed)
931
944
 
932
945
  If `HAS_BROWSER_CHECKS` is true:
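The gating logic added in this hunk (browser checks first, then the configured primary device) can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, and the matching rules (case-insensitive substring match for most keywords, case-sensitive whole-word match for `UI`) are an assumption, since the prompt only says a check "mentions" a keyword.

```python
import re

# Keywords listed in the prompt; "UI" is handled separately below.
BROWSER_KEYWORDS = ("browser", "agent-browser", "navigate", "page", "localhost")


def mentions_browser(check: str) -> bool:
    """Return True if a check description mentions any browser keyword."""
    lowered = check.lower()
    if any(keyword in lowered for keyword in BROWSER_KEYWORDS):
        return True
    # Assumption: match "UI" only as a whole, upper-case word so that
    # words such as "build" do not trigger a false positive.
    return re.search(r"\bUI\b", check) is not None


def has_browser_checks(checks: list[str]) -> bool:
    """HAS_BROWSER_CHECKS: true if any assembled check mentions a keyword."""
    return any(mentions_browser(check) for check in checks)


def has_mobile_checks(browser_checks: bool, primary_device: str) -> bool:
    """HAS_MOBILE_CHECKS: false without browser checks or with device NONE."""
    if not browser_checks:
        return False
    return primary_device.strip() != "NONE"
```

Whatever matching rule is used, the key property from the diff is that `HAS_MOBILE_CHECKS` can never be true when `HAS_BROWSER_CHECKS` is false.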
@@ -997,6 +1010,27 @@ Read the build report(s) for context on what was built:
997
1010
  If Username is "REPLACE_WITH_TEST_USERNAME":
998
1011
  "Test credentials have NOT been configured in the testing-and-validation skill. The placeholder values have not been replaced. Browser checks that require authentication MUST be marked as BLOCKED with reason: 'Test credentials not configured in testing-and-validation skill -- user must replace placeholder values.'"}
999
1012
 
1013
+ ## Mobile Device Testing
1014
+ {Read the Mobile Test Devices section from the testing-and-validation skill. Extract the Primary Device and Secondary Device values.
1015
+
1016
+ If HAS_MOBILE_CHECKS is true (Primary Device is NOT "NONE"):
1017
+ "Mobile viewport testing is configured for this project. After completing each browser check at the default desktop viewport, you MUST repeat the visual/layout portions of that check at the mobile viewport.
1018
+
1019
+ Mobile testing procedure (use the agent-browser skill):
1020
+ 1. Complete the browser check at the default desktop viewport first
1021
+ 2. Set the mobile device: `agent-browser set device \"{MOBILE_PRIMARY_DEVICE}\"`
1022
+ 3. Reload the page: `agent-browser reload`
1023
+ 4. Take a snapshot to verify layout at mobile viewport: `agent-browser snapshot -i`
1024
+ 5. Take a screenshot for visual evidence: `agent-browser screenshot`
1025
+ 6. Verify the page renders correctly at the mobile viewport -- no overlapping elements, no horizontal scrolling, no truncated content, no inaccessible interactive elements
1026
+ 7. Reset to desktop viewport when done: `agent-browser set viewport 1920 1080`{If MOBILE_SECONDARY_DEVICE is not "NONE":
1027
+ 8. Repeat steps 2-7 with the secondary device: `agent-browser set device \"{MOBILE_SECONDARY_DEVICE}\"`}
1028
+
1029
+ For each browser check, report desktop and mobile results separately. If a check passes at desktop but fails at mobile, the overall check result is FAIL. Include the device name in the findings (e.g., 'FAIL at iPhone 12 Pro: navigation menu overlaps content')."
1030
+
1031
+ If HAS_MOBILE_CHECKS is false:
1032
+ "Mobile viewport testing is not configured (Primary Device is NONE in the testing-and-validation skill). All browser checks run at the default desktop viewport only."}
1033
+
1000
1034
  ## Validation Checks ({BUILD_CHECK_COUNT} -- run ALL of these)
1001
1035
 
1002
1036
  {The assembled numbered check list from step 5a -- paste the full list verbatim, including the [HL] prefixed checks at the end}
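To make the mobile testing procedure above concrete, here is a minimal Python sketch that expands the configured devices into the command sequence from steps 2 through 7. The `agent-browser` commands are copied from the prompt text; the function name `mobile_check_commands` is hypothetical, and step 6 (the visual verification) is a judgment made by the validator, so it produces no command.

```python
def mobile_check_commands(primary: str, secondary: str = "NONE") -> list[str]:
    """Build the agent-browser command sequence for each configured device.

    A device value of "NONE" means that device is not configured and is skipped.
    """
    commands: list[str] = []
    for device in (primary, secondary):
        if device == "NONE":
            continue
        commands += [
            f'agent-browser set device "{device}"',   # step 2: emulate the device
            "agent-browser reload",                   # step 3: reload the page
            "agent-browser snapshot -i",              # step 4: snapshot the layout
            "agent-browser screenshot",               # step 5: capture visual evidence
            "agent-browser set viewport 1920 1080",   # step 7: reset to desktop
        ]
    return commands
```

With the default configuration shown elsewhere in this diff (primary `iPhone 12 Pro`, secondary `NONE`), this yields five commands per browser check, run after the desktop pass.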
@@ -1019,6 +1053,8 @@ For each check, report one of three results:
1019
1053
  - phase-name: {PHASE_NAME}
1020
1054
  - cycle: {BUILD_VALIDATION_CYCLE}
1021
1055
  - hl-criteria-count: {HL_CRITERIA_COUNT}
1056
+ - pre-phase-sha: {PRE_PHASE_SHA}
1057
+ - git-available: {GIT_AVAILABLE}
1022
1058
 
1023
1059
  ## Output
1024
1060
  Write your validation report to: {PHASE_DIR}/build_validation_report.md
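The new `pre-phase-sha` and `git-available` parameters feed standard check 4 ("No oversized files introduced"). A hedged Python sketch of that check is shown below. The function names are hypothetical, the NUL-byte test for binary files is a common heuristic rather than anything the package specifies, and treating a file of exactly 500 lines as an advisory is an assumption, because the prompt says PASS is "under 500" while FAIL is "exceeds 500".

```python
import subprocess
from pathlib import Path


def files_changed_since(pre_phase_sha: str) -> list[str]:
    """List files changed since the given commit (the git-available branch)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{pre_phase_sha}..HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def looks_binary(path: Path) -> bool:
    """Heuristic (an assumption): a NUL byte in the first 8 KiB means binary."""
    with path.open("rb") as handle:
        return b"\0" in handle.read(8192)


def oversized_file_check(paths: list[str]) -> dict:
    """Apply the PASS/FAIL/WARN rule to each existing, non-binary file."""
    failures, advisories = [], []
    for name in paths:
        path = Path(name)
        if not path.is_file() or looks_binary(path):
            continue  # deleted and binary files are skipped
        line_count = path.read_bytes().count(b"\n")  # same count as `wc -l`
        if line_count > 500:
            failures.append((name, line_count))
        elif line_count >= 300:
            advisories.append((name, line_count))
    return {
        "result": "FAIL" if failures else "PASS",
        "failures": failures,
        "advisories": advisories,
    }
```

When `git-available` is false, the same `oversized_file_check` would be called with the paths extracted from the build report's "Files Created/Modified" lists instead of the output of `files_changed_since`.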
@@ -1309,7 +1345,7 @@ Implementation Plan Summary:
1309
1345
  - Source Citations: {count of `// per:` occurrences}
1310
1346
 
1311
1347
  Plan Validation: {VALIDATION_RESULT} (cycle {VALIDATION_CYCLE})
1312
- - Checks passed: {X}/10
1348
+ - Checks passed: {X}/11
1313
1349
 
1314
1350
  Build Summary:
1315
1351
  - Builder(s): {BUILDER_COUNT} spawned {" (parallel)" if > 1}
@@ -93,7 +93,8 @@ The user provided a feature document from a reverse-engineering analysis. Use it
93
93
  - **Name**: short descriptive name
94
94
  - **Goal**: 1-2 sentences on what this phase achieves and why it comes at this position
95
95
  - **Dependencies**: which prior phase(s) must complete first (or "None" for the first)
96
- - **Success criteria**: concrete, verifiable statements (can be user stories, anti-stories, API checks, state checks, UI checks, build checks)
96
+ - **Success criteria**: concrete, verifiable statements (can be user stories, anti-stories, API checks, state checks, UI checks, build checks, mobile viewport checks)
97
+ - For phases with UI work, ask: "Does this need to work on mobile viewports?" If yes, include mobile-specific success criteria (e.g., "Navigation menu is usable on iPhone 12 Pro viewport")
97
98
  Target 3-5 phases. Present the suggested phases to the user:
98
99
  ```
99
100
  Based on the feature document, I suggest this phase breakdown:
@@ -155,7 +156,9 @@ The user provided a feature document from a reverse-engineering analysis. Use it
155
156
  - State/infrastructure checks: "Database tables exist and are accessible"
156
157
  - UI checks: "Login form renders with email and password fields"
157
158
  - Build/tooling checks: "`bun dev` starts without errors"
159
+ - Mobile viewport checks: "Navigation is usable at iPhone 12 Pro viewport"
158
160
  - For phases with user-facing behavior, success criteria MUST include both user stories (what users CAN do) and anti-stories (what users CANNOT do)
161
+ - For phases with UI work, ask the user: "Does this need to work on mobile viewports?" If yes, include mobile-specific success criteria. Mobile viewport testing is configured in the testing-and-validation skill.
159
162
  - Each criterion must be testable -- a future validator should be able to determine PASS/FAIL
160
163
  - The last phase should typically be "Integration & Polish"
161
164
  8. Do NOT proceed to Phase 2 until you and the user explicitly agree on the task description, user stories, anti-stories, phase breakdown, and detailed specifications (if any were provided).
@@ -180,6 +183,7 @@ Spawn a teammate using the `explorer-agent` agent definition. Include this conte
180
183
  - Anti-stories (from Phase 1)
181
184
  - Phase breakdown with success criteria (from Phase 1)
182
185
  - Specific areas of the codebase to focus on (if known)
186
+ - Directive: "For every key file you report, include its line count. Flag any file over 300 lines as a WARNING and any file over 500 lines as CRITICAL -- these MUST be noted for the architect to plan decomposition."
183
187
 
184
188
  If IS_RE_ENGINEERING is true, also include in the Explorer's spawn prompt:
185
189
  - A note: "This task is a reimplementation based on a reverse-engineered feature document. The feature document describes how the original feature works in a different codebase. Your job is to explore THIS codebase (not the source repo at {SOURCE_REPO}) to understand what exists here that is relevant to building this feature."
@@ -214,6 +218,8 @@ Wait for the Explorer (and Docs Researcher if spawned) to complete their work. T
214
218
 
215
219
  Instruct the Architect to produce the plan in the specified format and post it to the shared task list. Instruct the Architect to preserve user-provided detailed specifications in the "Detailed Specifications" section of the plan (section 8) -- these are requirements, not implementation details.
216
220
 
221
+ Instruct the Architect: "If the Explorer flagged any files over 300 lines, address them in Key Considerations (section 7). For files over 500 lines, the plan MUST include decomposition as an explicit task in the relevant phase. NEVER plan to add significant logic to a file already over 300 lines without first splitting it."
222
+
217
223
  #### If agent teams are NOT available (fallback):
218
224
 
219
225
  Use the Task tool to run these sequentially:
@@ -264,7 +270,7 @@ You are validating the high-level plan for {TASK_NAME}.
264
270
  Category: {task category}
265
271
  Description: {task description}
266
272
 
267
- ## Validation Checks (9 -- run ALL of these)
273
+ ## Validation Checks (10 -- run ALL of these)
268
274
 
269
275
  1. **Current State accuracy** -- Does the plan's description of the codebase match reality? Use Grep/Glob to verify referenced files exist. Use Read to verify descriptions of file contents are accurate. Flag any references to files, functions, or patterns that don't exist.
270
276
 
@@ -284,6 +290,8 @@ Description: {task description}
284
290
 
285
291
  9. **Phase dependency ordering** -- Is the dependency chain a valid DAG? No circular dependencies. Does the ordering make logical sense? Can each phase start once its dependencies complete?
286
292
 
293
+ 10. **File size awareness** -- Does the plan address oversized files flagged by the Explorer? Check the Explorer's file size data. For any file over 300 lines that the plan modifies or extends, verify the plan acknowledges the size concern in Key Considerations (section 7). For any file over 500 lines, verify the plan includes explicit decomposition. Flag any plan that adds work to a 500+ line file without decomposing it first.
294
+
287
295
  ## Output
288
296
  Present the validation report directly in your response (or post to the shared task list if using agent teams).
289
297
  ```
@@ -421,6 +429,7 @@ Note: User-provided specifications (schemas, API requirements, business rules) a
421
429
 
422
430
  ## 7. Key Considerations
423
431
  Risks, gotchas, dependencies, things to watch out for. Include security considerations.
432
+ Flag any existing files over 300 lines that this plan touches. Files over 500 lines MUST have a decomposition plan.
424
433
  These apply across all phases.
425
434
 
426
435
  ## 8. Detailed Specifications
@@ -461,7 +470,7 @@ Anti-Stories: {count}
461
470
 
462
471
  High-Level Approach: {1-2 sentence summary}
463
472
  Key Considerations: {count} identified
464
- Validation: PASSED ({count}/9 checks verified)
473
+ Validation: PASSED ({count}/10 checks verified)
465
474
 
466
475
  State Files:
467
476
  - STATE.md: [read | scaffolded]
@@ -472,5 +481,5 @@ Team Members Used:
472
481
  - Explorer: completed
473
482
  - Docs Researcher: [completed | skipped]
474
483
  - Architect: completed ({N} iterations)
475
- - Validator: PASSED ({count}/9 checks verified)
484
+ - Validator: PASSED ({count}/10 checks verified)
476
485
  ```
@@ -49,6 +49,11 @@ You are investigating a bug in this codebase.
49
49
  - Determine whether the issue is localized (single file/module) or cross-cutting (multiple layers/modules)
50
50
  - Note any architectural implications (data model changes, API surface changes, auth flow changes)
51
51
 
52
+ 4. Audit file sizes:
53
+ - For every affected file, count its lines using `wc -l`
54
+ - Flag any file over 300 lines as WARNING (risk of cascading bugs from changes)
55
+ - Flag any file over 500 lines as CRITICAL (file must be decomposed -- fix may cause more breakage)
56
+
52
57
  ## Output Format
53
58
 
54
59
  Report your findings with these sections:
@@ -58,6 +63,7 @@ Report your findings with these sections:
58
63
  - **Localized vs Cross-Cutting**: Is this a single-point fix or does it span multiple modules/layers?
59
64
  - **Architectural Implications**: Any data model, API, auth, or structural concerns (or "None")
60
65
  - **Relevant Patterns**: How similar code is handled correctly elsewhere in the codebase
66
+ - **File Size Warnings**: Line counts of affected files. Flag any over 300 lines (WARNING) or over 500 lines (CRITICAL).
61
67
 
62
68
  Do NOT write your findings to a file. Return them directly.
63
69
  ```
@@ -107,6 +113,7 @@ Evaluate the following checklist against the research findings:
107
113
  - Change type is config, import, small logic fix, CSS/styling, typo, missing null check, or similar
108
114
  - No architectural implications (fix does not change data model, API surface, auth flow, or component hierarchy)
109
115
  - No risk of cascading side effects (change is localized)
116
+ - No affected file exceeds 500 lines (from explorer's File Size Warnings -- oversized files have high cascading-bug risk)
110
117
 
111
118
  **Complex-Fix Indicators** (ANY triggers escalation):
112
119
  - Root cause is unclear or ambiguous after research
@@ -117,6 +124,7 @@ Evaluate the following checklist against the research findings:
117
124
  - Fix involves race conditions, concurrency, or timing issues
118
125
  - Research reveals the issue is a symptom of a deeper design problem
119
126
  - Fix requires changes across multiple layers (frontend + backend + database)
127
+ - Any affected file exceeds 500 lines (oversized file -- high risk of cascading bugs; needs decomposition before fixing)
120
128
 
121
129
  Produce a triage verdict: **EASY** or **COMPLEX** with a one-line justification.
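The triage rule with the new file-size conditions can be read as: every easy-fix criterion must hold, and any complex-fix indicator escalates. The Python sketch below illustrates that reading. It is not the package's code; the function name and the dictionary-based inputs are hypothetical, and treating an unmet easy-fix criterion as COMPLEX is an interpretation of "ALL must be true".

```python
SIZE_LIMIT = 500  # lines; the hard limit named in both checklists


def triage_verdict(
    easy_criteria: dict[str, bool],
    complex_indicators: dict[str, bool],
    affected_line_counts: dict[str, int],
) -> tuple[str, str]:
    """Return (verdict, one-line justification) for a bug under triage."""
    # New in this release: any affected file over the limit forces escalation.
    oversized = [path for path, n in affected_line_counts.items() if n > SIZE_LIMIT]
    if oversized:
        return "COMPLEX", f"affected file exceeds {SIZE_LIMIT} lines: {oversized[0]}"
    # ANY complex-fix indicator triggers escalation.
    triggered = [name for name, hit in complex_indicators.items() if hit]
    if triggered:
        return "COMPLEX", f"complex-fix indicator: {triggered[0]}"
    # ALL easy-fix criteria must hold for an EASY verdict.
    unmet = [name for name, ok in easy_criteria.items() if not ok]
    if unmet:
        return "COMPLEX", f"easy-fix criterion not met: {unmet[0]}"
    return "EASY", "all easy-fix criteria hold and no complex-fix indicator applies"
```

For example, a one-line null-check fix in a 620-line file would come back as COMPLEX under this reading, even if every other easy-fix criterion holds.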
122
130
 
@@ -229,6 +237,7 @@ You MUST produce a plan using EXACTLY this format. No additional sections, no om
229
237
  - Keep it minimal -- this is a quick fix
230
238
  - The plan must address the root cause, not just the symptom
231
239
  - The verification step must be concrete and executable (a command to run, a test to check, a behavior to observe -- not "it should work now")
240
+ - If any affected file is over 300 lines, note the file size risk in the Side Effects section. NEVER add significant logic to a file already over 300 lines without splitting it first.
232
241
  - Do NOT write the plan to a file. Return it directly.
233
242
  ```
234
243
 
@@ -289,6 +298,8 @@ For each cycle:
289
298
  - If the plan specifies a lint/type-check command: run it via Bash
290
299
  - If the plan specifies browser verification or manual UI checks: note this for the user in the final report (the main agent cannot do browser checks autonomously)
291
300
 
301
+ **6c-pre. File size check**: For every file listed in "Files Affected", run `wc -l` via Bash. If any file exceeds 500 lines, report it as a WARNING in the final report with the message: "File {path} is {N} lines -- exceeds 500-line limit. Consider decomposition to prevent future cascading bugs." If any file exceeds 300 lines, note it as an advisory.
302
+
292
303
  **6c. Assess result**:
293
304
  - **PASS**: Verification succeeds (command exits 0, expected output matches). Proceed to Step 7.
294
305
  - **FAIL**: If VALIDATION_CYCLE < MAX_CYCLES:
@@ -345,6 +356,9 @@ Present the final report.
345
356
 
346
357
  ### Notes
347
358
  {Any caveats, things to watch for, or manual verification the user should do (e.g., browser checks noted in Step 6b)}
359
+
360
+ ### File Size Warnings
361
+ {List any files over 300 lines from the file size check in Step 6c-pre, or "None -- all modified files are under 300 lines."}
348
362
  ```
349
363
 
350
364
  **If build validation FAILED after MAX_CYCLES:**
@@ -56,6 +56,7 @@ Ask these questions about every proposed approach. The more "no" answers, the mo
56
56
  - [ ] Are there clear boundaries between this feature and the rest of the codebase?
57
57
  - [ ] Does this approach make future changes easier or harder?
58
58
  - [ ] Is the testing strategy straightforward?
59
+ - [ ] Are individual files kept under 300 lines? Files over 300 lines are a warning sign; files over 500 lines MUST be decomposed before proceeding.
59
60
 
60
61
  ### Upgrade Path
61
62
  - [ ] Can the underlying framework/library be upgraded without rewriting this feature?
@@ -99,6 +100,12 @@ Watch for these patterns during brainstorming — they almost always indicate un
99
100
  - Caching everything by default instead of where profiling shows need
100
101
  - Choosing complex data structures for datasets under 1000 items
101
102
 
103
+ ### Oversized Files
104
+ - Any single file exceeding 300 lines without a clear justification (warning threshold)
105
+ - Any single file exceeding 500 lines regardless of justification (hard limit -- MUST be decomposed)
106
+ - Putting multiple unrelated concerns in one file instead of splitting into focused modules
107
+ - Growing an existing file rather than extracting a new module when adding features
108
+
102
109
  ## Paired Examples
103
110
 
104
111
  ### Example: User Preferences Storage
@@ -70,6 +70,7 @@ ALL of the following must be true for a bug to qualify as an easy fix:
70
70
  - Change type is config, import, small logic fix, CSS/styling, typo, missing null check, or similar
71
71
  - No architectural implications (fix does not change data model, API surface, auth flow, or component hierarchy)
72
72
  - No risk of cascading side effects (change is localized)
73
+ - No affected file exceeds 500 lines (oversized files have high risk of cascading bugs from even small changes)
73
74
 
74
75
  ### Complex-Fix Indicators
75
76
 
@@ -83,6 +84,7 @@ ANY of the following makes a bug complex (triggers escalation):
83
84
  - Fix involves race conditions, concurrency, or timing issues that need careful design
84
85
  - Research reveals the issue is a symptom of a deeper design problem
85
86
  - Fix requires changes across multiple layers (frontend + backend + database)
87
+ - Any affected file exceeds 500 lines (oversized file -- high cascading-bug risk; needs decomposition before fixing)
86
88
 
87
89
  ## Lightweight Bug-Fix Plan Format
88
90
 
@@ -58,6 +58,22 @@ Credentials for automated browser-based authentication. These are used by valida
58
58
  3. The Post-Login URL is the page the browser should land on after successful login -- used to confirm authentication succeeded
59
59
  4. If your app does not require authentication, set Username to `NONE` -- validators will skip the authentication step
60
60
 
61
+ ## Mobile Test Devices
62
+
63
+ Device configurations for mobile viewport testing via Chrome DevTools device emulation (Playwright). These are used by validator agents to run browser checks at mobile viewports in addition to desktop. If your application has responsive layouts, configure a device here so automated tests verify mobile rendering.
64
+
65
+ | Setting | Value |
66
+ |---------|-------|
67
+ | Primary Device | `iPhone 12 Pro` |
68
+ | Secondary Device | `NONE` |
69
+
70
+ **Setup instructions:**
71
+ 1. The Primary Device is the Chrome DevTools device name used for mobile viewport checks (e.g., `iPhone 12 Pro`, `Pixel 5`, `iPad Air`)
72
+ 2. The Secondary Device is an optional second device for additional viewport coverage. Set to `NONE` to skip.
73
+ 3. Device emulation is set via `agent-browser set device "{device name}"` before navigating to a page
74
+ 4. Reset to desktop after mobile checks with `agent-browser set viewport 1920 1080`
75
+ 5. If your app does not need mobile testing, set Primary Device to `NONE` -- validators will skip mobile viewport checks
76
+
61
77
  ## Front-End Testing
62
78
 
63
79
  For visual and interactive front-end testing, use the `agent-browser` skill.
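The Mobile Test Devices table added above is meant to be read by the build command. As a minimal sketch of how the two values could be extracted from the skill's markdown, assuming the two-column table layout shown in this diff (the function name `parse_device_table` is hypothetical):

```python
def parse_device_table(markdown: str) -> dict[str, str]:
    """Extract Primary Device and Secondary Device from a two-column table."""
    settings: dict[str, str] = {}
    for line in markdown.splitlines():
        # Split "| Setting | Value |" into its two cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        key, value = cells
        if key in ("Primary Device", "Secondary Device"):
            settings[key] = value.strip("`")  # drop the inline-code backticks
    return settings
```

Header and separator rows are ignored because their first cell is neither setting name, so the function returns only the two device values, with `NONE` preserved as the sentinel for "not configured".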
@@ -0,0 +1,136 @@
1
+ ---
2
+ type: diagnosis-report
3
+ workflow: repair-workflow
4
+ target-asset: .claude/skills/testing-and-validation/SKILL.md, .claude/commands/build-phase-V2.md, .claude/commands/create-highlevel-plan-phases.md, .claude/agents/build-validator-agent.md
5
+ asset-type: skill, command, command, agent
6
+ date: 2026-03-28
7
+ status: validated
8
+ failure-patterns: [Implicit Assumptions, Missing Guardrails]
9
+ repair-strategies: [Explicit Instruction, Constraint Injection]
10
+ root-cause-layer: Command-Layer
11
+ ---
12
+
13
+ # Diagnosis Report: Mobile viewport testing absent from build validation pipeline
14
+
15
+ ## Problem Summary
16
+
17
+ Mobile viewport validation never happened during build-phase-V2 browser checks -- all checks ran at the default desktop viewport only, missing responsive layout issues on smaller screens. Additionally, the build validator agent occasionally used the macOS `open` command (launching Safari) instead of `agent-browser open` (Chromium), creating unpredictable, unautomatable browser behavior. The root cause spans the full prompt chain: mobile testing intent was never captured during planning, no device configuration existed in the testing skill, and the build command never instructed the validator to set a mobile viewport.
18
+
19
+ ## Prompt Chain Trace
20
+
21
+ ### Command Markdown (Chain A: Planning)
22
+ - **File**: `.claude/commands/create-highlevel-plan-phases.md`
23
+ - **Finding**: ABSENT. Zero mentions of mobile, viewport, device, or responsive anywhere in the 477-line file. User story and success criteria sections never prompted for mobile testing needs. Because mobile intent was never captured, no mobile success criteria could flow downstream.
24
+
25
+ ### Command Markdown (Chain B: Validation)
26
+ - **File**: `.claude/commands/build-phase-V2.md`
27
+ - **Finding**: ABSENT. Browser check assembly (Step 5a) and validator delegation prompt (Step 5c) had no viewport/device instructions. The Test Credentials pattern existed and worked but was not replicated for device configuration.
28
+
29
+ ### Delegation Prompt Text
30
+ - **Agent spawned**: build-validator-agent (via Step 5c of build-phase-V2)
31
+ - **Finding**: ABSENT. The delegation prompt included dev server info and test credentials but no viewport/device configuration. Without instructions, the validator ran all browser checks at the default Chromium viewport.
32
+
33
+ ### Agent System Prompt
34
+ - **File**: `.claude/agents/build-validator-agent.md`
35
+ - **Finding**: ABSENT for mobile. The browser verification workflow (line 37) described navigate → snapshot → interact → re-snapshot but never mentioned setting a viewport. Also ABSENT for Safari guardrail: the agent had unrestricted `Bash` access with no prohibition on macOS `open`.
36
+
37
+ ### Skill Content
38
+ - **Skills examined**: testing-and-validation, agent-browser
39
+ - **Finding**: testing-and-validation INCOMPLETE -- mentioned "responsiveness" at line 71 but had no device config table. agent-browser PRESENT but UNUSED -- `set device` and `set viewport` commands were fully documented in references/commands.md but never invoked by any workflow.
40
+
41
+ ### Agent Behavior
42
+ - **Observed**: Validator ran all browser checks at default desktop viewport. Occasionally used macOS `open` (Safari) instead of `agent-browser open` (Chromium).
43
+ - **Expected**: Validator should also check at mobile viewport(s) when configured. Should always use `agent-browser open` for automatable, consistent browser testing.
44
+
45
+ ## Failure Pattern Classification
46
+
47
+ ### Primary Pattern: Implicit Assumptions
48
+ - **Evidence**: The entire prompt chain implicitly assumed desktop-only browser testing was sufficient. The capability existed (`agent-browser set device`), the need existed (mobile layout issues), but no file in the chain connected them. The assumption was invisible because each file looked correct in isolation.
49
+ - **Prompt chain link**: All links -- the assumption propagated through the entire chain from planning (no mobile question) through config (no device table) to validation (no viewport instructions).
50
+
51
+ ### Contributing Pattern: Missing Guardrails
52
+ - **Evidence**: build-validator-agent had unrestricted `Bash` in its tools and no explicit prohibition on macOS `open`. The agent-browser skill's `allowed-tools: Bash(agent-browser:*)` was additive, not restrictive. The soft instruction at line 37 to use `agent-browser open` was insufficient.
53
+ - **Prompt chain link**: Agent system prompt (build-validator-agent.md) -- missing negative constraint.
54
+
55
+ ## Root Cause Analysis
56
+
57
+ - **Root cause layer**: Command-Layer (primary), Skill-Layer (contributing)
58
+ - **Root cause description**: Both commands (create-highlevel-plan-phases and build-phase-V2) lacked workflow steps to establish and execute mobile testing. The planning command never asked about mobile viewports, so no mobile success criteria entered the plan. The build command never read device config or injected viewport instructions, so the validator had no reason to test at mobile dimensions. The testing-and-validation skill lacked a device configuration table, so even if the commands had tried to read device config, there was nothing to read.
59
+ - **Why shallow fixes failed**: N/A -- first repair attempt. However, fixing only build-phase-V2 would have been insufficient because mobile intent must originate at the planning level to flow into success criteria.
60
+
61
+ ## Impact Analysis
62
+
63
+ - **Target assets**: `.claude/skills/testing-and-validation/SKILL.md`, `.claude/commands/build-phase-V2.md`, `.claude/commands/create-highlevel-plan-phases.md`, `.claude/agents/build-validator-agent.md`
64
+ - **Blast radius**: Contained
65
+ - **Affected consumers**: build-validator-agent (gains constraint + mobile config), setup-project.md (not modified -- new section not in its scope), plan-validator-agent (will see plans with mobile criteria -- no behavior change)
66
+ - **High-risk aspects**: None. All changes are additive. Projects with Primary Device = `NONE` behave identically to before.
67
+
68
+ ## Repair Applied
69
+
70
+ ### Fix Specification
71
+ - **Repair strategy**: Explicit Instruction
72
+ - **Additional strategies**: Constraint Injection
73
+ - **What was changed**: Added mobile device config table to testing-and-validation skill. Added mobile device detection and conditional testing instructions to build-phase-V2. Added mobile testing awareness prompts to create-highlevel-plan-phases (both paths). Added NEVER/ALWAYS Safari guardrail to build-validator-agent.
74
+ - **File(s) modified**:
75
+ - `.claude/skills/testing-and-validation/SKILL.md`
76
+ - `.claude/commands/build-phase-V2.md`
77
+ - `.claude/commands/create-highlevel-plan-phases.md`
78
+ - `.claude/agents/build-validator-agent.md`
79
+
80
+ ### Fix Details
81
+
82
+ 6 changes across 4 files. Most significant change shown below.
83
+
84
+ **Before** (build-phase-V2.md, after Test Credentials block):
85
+
86
+ ```
87
+ If Username is "REPLACE_WITH_TEST_USERNAME":
88
+ "Test credentials have NOT been configured..."}
89
+
90
+ ## Validation Checks ({BUILD_CHECK_COUNT} -- run ALL of these)
91
+ ```
92
+
93
+ **After** (build-phase-V2.md, Mobile Device Testing block inserted):
94
+
95
+ ```
96
+ If Username is "REPLACE_WITH_TEST_USERNAME":
97
+ "Test credentials have NOT been configured..."}
98
+
99
+ ## Mobile Device Testing
100
+ {Read the Mobile Test Devices section from the testing-and-validation skill.
101
+
102
+ If HAS_MOBILE_CHECKS is true (Primary Device is NOT "NONE"):
103
+ "Mobile viewport testing is configured. After completing each browser check at
104
+ desktop, you MUST repeat visual/layout checks at the mobile viewport.
105
+
106
+ Mobile testing procedure:
107
+ 1. Complete desktop check first
108
+ 2. Set mobile device: `agent-browser set device "{MOBILE_PRIMARY_DEVICE}"`
109
+ 3. Reload: `agent-browser reload`
110
+ 4. Snapshot: `agent-browser snapshot -i`
111
+ 5. Screenshot: `agent-browser screenshot`
112
+ 6. Verify: no overlapping elements, no horizontal scrolling, no truncated content
113
+ 7. Reset to desktop: `agent-browser set viewport 1920 1080`
114
+
115
+ If a check passes at desktop but fails at mobile, overall result is FAIL."
116
+
117
+ If HAS_MOBILE_CHECKS is false:
118
+ "Mobile viewport testing is not configured. Desktop viewport only."}
119
+
120
+ ## Validation Checks ({BUILD_CHECK_COUNT} -- run ALL of these)
121
+ ```
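The seven-step mobile procedure in the block above can be sketched as a small dry-run script. This is a minimal illustration, not the package's implementation: `mobile_recheck` and the `RUNNER` variable are hypothetical names, and the device name "iPhone 14" is an assumed example value for `{MOBILE_PRIMARY_DEVICE}`. The `agent-browser` subcommands are the ones quoted in the procedure.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the mobile re-check sequence.
# RUNNER is a hypothetical knob: it defaults to `echo`, so commands are
# printed rather than executed.
RUNNER=${RUNNER:-echo}

mobile_recheck() {
  local device=$1   # value substituted for {MOBILE_PRIMARY_DEVICE}
  # Step 1 (the desktop check) is assumed to have completed already.
  $RUNNER agent-browser set device "$device"     # step 2
  $RUNNER agent-browser reload                   # step 3
  $RUNNER agent-browser snapshot -i              # step 4
  $RUNNER agent-browser screenshot               # step 5
  # Step 6 (overlap, horizontal scrolling, truncation) is a judgment the
  # validator makes from the snapshot and screenshot; it is not scripted here.
  $RUNNER agent-browser set viewport 1920 1080   # step 7
}

# "iPhone 14" is an assumed example device name.
mobile_recheck "iPhone 14"
```

With the default `RUNNER`, the script only prints the five commands in order. Running it with `RUNNER=env` would execute them, assuming `agent-browser` is installed and a page is already open.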
122
+
123
+ ## Validation Results
124
+
125
+ - **Fix addresses root cause**: Yes -- mobile testing is now explicitly wired through all 4 prompt chain links (planning → config → validation → agent constraint)
126
+ - **No new failure patterns introduced**: Yes -- the `set device reset` issue caught during pre-implementation validation was corrected to `set viewport 1920 1080` before implementation
127
+ - **Impact analysis clear**: Yes -- all consumers verified, no regressions found
128
+ - **Test result**: Pass -- post-implementation validation passed 4/4 checks
129
+
130
+ ## Prevention Recommendations
131
+
132
+ 1. **Capability-to-instruction audit**: When a skill documents a capability (like `agent-browser set device`), trace every consuming command/agent and verify there is an explicit instruction to use it when relevant. A documented capability that no workflow invokes is dead weight.
133
+
134
+ 2. **Config table completeness check**: The testing-and-validation skill is the config hub for validators. New testing dimensions (mobile viewports, accessibility, performance budgets) should follow the established pattern: config table in skill → conditional read in build-phase-V2 Step 5a → conditional injection in Step 5c.
135
+
136
+ 3. **Negative constraint review for unrestricted Bash agents**: Any agent with `tools: Bash` should have explicit NEVER constraints for known-bad alternatives to intended tools (e.g., macOS `open` vs `agent-browser open`).
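The third recommendation could be approximated with a simple scan. The sketch below is a heuristic under stated assumptions: `audit_agents` is a hypothetical helper, it assumes each agent file declares its tools on a line starting with `tools:`, and it treats any occurrence of the word `NEVER` as evidence of a negative constraint. It cannot tell whether the constraint covers the right command.

```shell
#!/usr/bin/env bash
# Hypothetical audit: list agent prompts that grant Bash but contain no
# NEVER constraint anywhere in the file.
audit_agents() {
  local dir=$1 f
  for f in "$dir"/*.md; do
    [ -f "$f" ] || continue   # skip when the glob matched nothing
    if grep -Eq '^tools:.*Bash' "$f" && ! grep -q 'NEVER' "$f"; then
      echo "$f"
    fi
  done
}

# Default path mirrors the .claude/agents directory named in the report.
audit_agents "${1:-.claude/agents}"
```

Any file it prints is a candidate for manual review rather than a confirmed defect.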
@@ -49,26 +49,12 @@ fi
49
49
 
50
50
  if [ "$should_check" = true ]; then
51
51
  (
52
- # Read installed version from .ai-fob.json tracking file
53
- current=""
54
- # Check local project first, then global
55
- for tracking in "$CLAUDE_PROJECT_DIR/.claude/.ai-fob.json" "$HOME/.claude/.ai-fob.json"; do
56
- if [ -f "$tracking" ]; then
57
- current=$(jq -r '.version // empty' "$tracking" 2>/dev/null)
58
- [ -n "$current" ] && break
59
- fi
60
- done
61
-
62
- if [ -z "$current" ]; then
63
- exit 0
64
- fi
65
-
66
- # Query npm registry
52
+ # Query npm registry for latest version
67
53
  latest=$(npm view ai-fob version 2>/dev/null)
68
54
 
69
55
  if [ -n "$latest" ]; then
70
- printf '{"current":"%s","latest":"%s","checked_at":%d}\n' \
71
- "$current" "$latest" "$(date +%s)" > "$version_cache"
56
+ printf '{"latest":"%s","checked_at":%d}\n' \
57
+ "$latest" "$(date +%s)" > "$version_cache"
72
58
  fi
73
59
  ) &
74
60
  fi
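After this hunk the background check stores only the registry version and a timestamp, and the installed version is resolved by the status line at render time. A minimal sketch of the new record shape and of the one-hour freshness test the status line applies; the helper names `write_cache_record` and `cache_is_fresh` are hypothetical, while the JSON shape and the 3600-second window come from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the cache record and freshness window.

write_cache_record() {   # $1 = latest version, $2 = epoch seconds
  printf '{"latest":"%s","checked_at":%d}\n' "$1" "$2"
}

cache_is_fresh() {       # $1 = checked_at, $2 = now (both epoch seconds)
  [ $(( $2 - $1 )) -lt 3600 ]
}

now=$(date +%s)
write_cache_record "1.3.2" "$now"
if cache_is_fresh "$now" "$now"; then
  echo "fresh"
fi
```

Keeping `current` out of the cache means a stale record can no longer report an installed version that has since been upgraded.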
@@ -1,6 +1,8 @@
1
1
  {
2
2
  "env": {
3
- "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
3
+ "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1",
4
+ "CLAUDE_CODE_NO_FLICKER": "1",
5
+ "CLAUDE_CODE_SCROLL_SPEED": "1"
4
6
  },
5
7
  "statusLine": {
6
8
  "type": "command",
@@ -2,10 +2,12 @@
2
2
 
3
3
  # ai-fob status line for Claude Code
4
4
  # Reads JSON from stdin with workspace and context_window data
5
- # Displays: cwd, git branch, model, context bar + tokens, update indicator
5
+ # Displays: cwd, git branch, update indicator, model, context bar + tokens
6
6
 
7
7
  input=$(cat)
8
8
 
9
+ CACHE_DIR="$HOME/.claude/data/ai-fob-cache"
10
+
9
11
  # --- Working directory ---
10
12
  cwd=$(echo "$input" | jq -r '.workspace.current_dir')
11
13
  short_cwd=$(echo "$cwd" | sed "s|^$HOME|~|")
@@ -19,7 +21,30 @@ if git -C "$cwd" rev-parse --git-dir > /dev/null 2>&1; then
19
21
  fi
20
22
  fi
21
23
 
22
- # --- Extract context window size early (needed for model suffix + bar) ---
24
+ # --- ai-fob update indicator (early so it's never truncated) ---
25
+ update_display=""
26
+ version_cache="$CACHE_DIR/version-check.json"
27
+ if [ -f "$version_cache" ]; then
28
+ cached_ts=$(jq -r '.checked_at // 0' "$version_cache" 2>/dev/null)
29
+ now=$(date +%s)
30
+ age=$(( now - cached_ts ))
31
+ if [ "$age" -lt 3600 ]; then
32
+ latest_ver=$(jq -r '.latest // empty' "$version_cache" 2>/dev/null)
33
+ # Read installed version directly from .ai-fob.json (local first, then global)
34
+ current_ver=""
35
+ for tracking in "$cwd/.claude/.ai-fob.json" "$HOME/.claude/.ai-fob.json"; do
36
+ if [ -f "$tracking" ]; then
37
+ current_ver=$(jq -r '.version // empty' "$tracking" 2>/dev/null)
38
+ [ -n "$current_ver" ] && break
39
+ fi
40
+ done
41
+ if [ -n "$latest_ver" ] && [ -n "$current_ver" ] && [ "$latest_ver" != "$current_ver" ]; then
42
+ update_display=" \033[36m\xe2\xac\x86 ai-fob ${latest_ver}\033[0m"
43
+ fi
44
+ fi
45
+ fi
46
+
47
+ # --- Extract context window size (needed for model suffix + bar) ---
23
48
  usage=$(echo "$input" | jq '.context_window.current_usage')
24
49
  size=$(echo "$input" | jq '.context_window.context_window_size')
25
50
  current=0
@@ -28,7 +53,6 @@ if [ "$usage" != "null" ] && [ -n "$usage" ]; then
28
53
  fi
29
54
 
30
55
  # --- Model info (from session-start cache) ---
31
- CACHE_DIR="$HOME/.claude/data/ai-fob-cache"
32
56
  model_display=""
33
57
  session_id=$(echo "$input" | jq -r '.session_id // empty')
34
58
 
@@ -63,14 +87,24 @@ if [ -n "$model_display" ]; then
63
87
  model_display=" \033[35m${model_display}\033[0m"
64
88
  fi
65
89
 
66
- # --- Context window bar + percentage + total tokens ---
90
+ # --- Context window bar + percentage + compact tokens ---
67
91
  context_display=""
68
92
  if [ "$usage" != "null" ] && [ -n "$usage" ] && [ "$size" != "null" ] && [ "$size" -gt 0 ] 2>/dev/null; then
69
93
  used=$((current * 100 / size))
70
94
 
71
- # Format numbers with commas (bash builtin printf doesn't support %'d)
72
- fmt_current=$(LC_ALL=en_US.UTF-8 /usr/bin/printf "%'d" "$current" 2>/dev/null || echo "$current")
73
- fmt_size=$(LC_ALL=en_US.UTF-8 /usr/bin/printf "%'d" "$size" 2>/dev/null || echo "$size")
95
+ # Format numbers compactly (1M, 200K, etc.)
96
+ fmt_compact() {
97
+ local n=$1
98
+ if [ "$n" -ge 1000000 ]; then
99
+ echo "$((n / 1000000))M"
100
+ elif [ "$n" -ge 1000 ]; then
101
+ echo "$((n / 1000))K"
102
+ else
103
+ echo "$n"
104
+ fi
105
+ }
106
+ fmt_current=$(fmt_compact "$current")
107
+ fmt_size=$(fmt_compact "$size")
74
108
 
75
109
  # Build progress bar (10 segments)
76
110
  filled=$((used / 10))
@@ -79,32 +113,15 @@ if [ "$usage" != "null" ] && [ -n "$usage" ] && [ "$size" != "null" ] && [ "$siz
79
113
 
80
114
  # Color based on usage
81
115
  if [ "$used" -lt 50 ]; then
82
- context_display=" \033[32m${bar} ${used}% ${fmt_current} of ${fmt_size}\033[0m"
116
+ context_display=" \033[32m${bar} ${used}% ${fmt_current}/${fmt_size}\033[0m"
83
117
  elif [ "$used" -lt 65 ]; then
84
- context_display=" \033[33m${bar} ${used}% ${fmt_current} of ${fmt_size}\033[0m"
118
+ context_display=" \033[33m${bar} ${used}% ${fmt_current}/${fmt_size}\033[0m"
85
119
  elif [ "$used" -lt 80 ]; then
86
- context_display=" \033[38;5;208m${bar} ${used}% ${fmt_current} of ${fmt_size}\033[0m"
120
+ context_display=" \033[38;5;208m${bar} ${used}% ${fmt_current}/${fmt_size}\033[0m"
87
121
  else
88
- context_display=" \033[5;31m\xf0\x9f\x92\x80 ${bar} ${used}% ${fmt_current} of ${fmt_size}\033[0m"
89
- fi
90
- fi
91
-
92
- # --- ai-fob update indicator ---
93
- update_display=""
94
- version_cache="$CACHE_DIR/version-check.json"
95
- if [ -f "$version_cache" ]; then
96
- cached_ts=$(jq -r '.checked_at // 0' "$version_cache" 2>/dev/null)
97
- now=$(date +%s)
98
- age=$(( now - cached_ts ))
99
- # Only use cache if less than 1 hour old
100
- if [ "$age" -lt 3600 ]; then
101
- latest_ver=$(jq -r '.latest // empty' "$version_cache" 2>/dev/null)
102
- current_ver=$(jq -r '.current // empty' "$version_cache" 2>/dev/null)
103
- if [ -n "$latest_ver" ] && [ -n "$current_ver" ] && [ "$latest_ver" != "$current_ver" ]; then
104
- update_display=" \033[36m\xe2\xac\x86 ai-fob ${latest_ver}\033[0m"
105
- fi
122
+ context_display=" \033[5;31m\xf0\x9f\x92\x80 ${bar} ${used}% ${fmt_current}/${fmt_size}\033[0m"
106
123
  fi
107
124
  fi
108
125
 
109
- # --- Output ---
110
- printf '%s%s%b%b%b' "$short_cwd" "$git_branch" "$model_display" "$context_display" "$update_display"
126
+ # --- Output: cwd (branch) [update] model bar% tokens ---
127
+ printf '%s%s%b%b%b' "$short_cwd" "$git_branch" "$update_display" "$model_display" "$context_display"
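The compact formatter added in this file relies on shell integer division, so values truncate rather than round. A standalone sketch that copies `fmt_compact` from the hunk above and runs it on a few illustrative inputs (the sample numbers are assumptions, not taken from the package):

```shell
#!/usr/bin/env bash
# fmt_compact as it appears in the diff: integer division truncates.
fmt_compact() {
  local n=$1
  if [ "$n" -ge 1000000 ]; then
    echo "$((n / 1000000))M"
  elif [ "$n" -ge 1000 ]; then
    echo "$((n / 1000))K"
  else
    echo "$n"
  fi
}

# Illustrative inputs around the K and M boundaries.
for n in 999 1000 199999 200000 1000000 1999999; do
  printf '%s -> %s\n' "$n" "$(fmt_compact "$n")"
done
```

Note that 199999 renders as `199K` and 1999999 as `1M`, so the status line can understate usage by almost one unit at the displayed scale.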
package/manifest.json CHANGED
@@ -1,5 +1,5 @@
1
1
  {
2
- "version": "1.3.0",
2
+ "version": "1.3.2",
3
3
  "presets": {
4
4
  "coding": {
5
5
  "description": "Research-driven coding workflow",
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "ai-fob",
3
- "version": "1.3.0",
3
+ "version": "1.3.2",
4
4
  "description": "Deploy research-driven AI coding assistant assets (skills, agents, commands) into your projects",
5
5
  "bin": {
6
6
  "ai-fob": "bin/install.js"