@tayo-dev/rtl 1.4.0 → 1.4.1

Files changed (46)
  1. package/README.md +16 -75
  2. package/assets/claude/commands/@tayo-dev/rtl/generate.md +42 -5
  3. package/assets/claude/commands/@tayo-dev/rtl/help.md +2 -2
  4. package/assets/codex/@tayo-dev/rtl-conventions/SKILL.md +38 -6
  5. package/assets/codex/@tayo-dev/rtl-generate/SKILL.md +125 -13
  6. package/assets/codex/@tayo-dev/rtl-generate/references/assertion-markers.md +62 -0
  7. package/assets/codex/@tayo-dev/rtl-generate/references/auth.md +92 -0
  8. package/assets/codex/@tayo-dev/rtl-generate/references/conventions-schema.md +184 -0
  9. package/assets/codex/@tayo-dev/rtl-generate/references/entry-path-fidelity.md +68 -0
  10. package/assets/codex/@tayo-dev/rtl-generate/references/intent-model.md +232 -0
  11. package/assets/codex/@tayo-dev/rtl-generate/references/mock-store.md +18 -0
  12. package/assets/codex/@tayo-dev/rtl-generate/references/quality-scoring.md +189 -0
  13. package/assets/codex/@tayo-dev/rtl-generate/references/state-schema.md +119 -0
  14. package/assets/codex/@tayo-dev/rtl-generate/references/test-index.md +12 -0
  15. package/assets/codex/@tayo-dev/rtl-generate/references/verification-gate.md +93 -0
  16. package/assets/codex/@tayo-dev/rtl-help/SKILL.md +21 -7
  17. package/assets/codex/@tayo-dev/rtl-mocks/SKILL.md +55 -9
  18. package/assets/gemini/commands/@tayo-dev/rtl/generate.toml +27 -5
  19. package/assets/gemini/commands/@tayo-dev/rtl/help.toml +2 -2
  20. package/assets/opencode/commands/@tayo-dev/rtl-generate.md +31 -5
  21. package/assets/opencode/commands/@tayo-dev/rtl-help.md +2 -2
  22. package/dist/cli/commands/generate.d.ts +1 -7
  23. package/dist/cli/commands/generate.d.ts.map +1 -1
  24. package/dist/cli/commands/generate.js +50 -96
  25. package/dist/cli/commands/generate.js.map +1 -1
  26. package/dist/core/generator.d.ts +0 -2
  27. package/dist/core/generator.d.ts.map +1 -1
  28. package/dist/core/generator.js +1 -1
  29. package/dist/core/generator.js.map +1 -1
  30. package/dist/core/input-loader.d.ts +2 -2
  31. package/dist/core/input-loader.d.ts.map +1 -1
  32. package/dist/core/input-loader.js +6 -16
  33. package/dist/core/input-loader.js.map +1 -1
  34. package/dist/core/js-parser.js +1 -1
  35. package/dist/core/js-parser.js.map +1 -1
  36. package/dist/core/writer.d.ts +0 -1
  37. package/dist/core/writer.d.ts.map +1 -1
  38. package/dist/core/writer.js +3 -3
  39. package/dist/core/writer.js.map +1 -1
  40. package/dist/index.d.ts +1 -1
  41. package/dist/index.js +18 -14
  42. package/dist/index.js.map +1 -1
  43. package/dist/install/runtimes/codex.d.ts.map +1 -1
  44. package/dist/install/runtimes/codex.js +18 -0
  45. package/dist/install/runtimes/codex.js.map +1 -1
  46. package/package.json +1 -1
@@ -0,0 +1,184 @@
1
+ # Conventions Schema (Self-Evolving Signals)
2
+
3
+ Stored in unified state at:
4
+ .tayo/state.json → conventions.signals
5
+
6
+ Goal:
7
+ Learn conventions over time without imposing assumptions.
8
+
9
+ Principle:
10
+
11
+ - Record observations (evidence)
12
+ - Derive candidates
13
+ - Track confidence
14
+ - Keep ranked alternatives when mixed
15
+ - Stay project-agnostic
16
+
17
+ ---
18
+
19
+ ## Signal Shape
20
+
21
+ Each signal uses this shape:
22
+
23
+ ```jsonc
24
+ {
25
+ "value": "any",
26
+ "confidence": 0.0,
27
+ "evidence": [
28
+ {
29
+ "file": "path",
30
+ "kind": "import|call|usage|pathPrefix|wrapper",
31
+ "match": "string"
32
+ }
33
+ ],
34
+ "updatedAt": "ISO-8601"
35
+ }
36
+ ```
37
+
38
+ Signals may store a single value or a ranked list of candidates.
39
+
40
+ ---
41
+
42
+ ## Required Signals
43
+
44
+ ### testFramework
45
+
46
+ ```jsonc
47
+ {
48
+ "value": "vitest|jest|unknown",
49
+ "confidence": 0.0,
50
+ "evidence": []
51
+ }
52
+ ```
53
+
54
+ Evidence examples:
55
+
56
+ - import from "vitest"
57
+ - import from "@jest/globals"
58
+ - describe/test usage patterns when imports are absent
59
+
60
+ ### mockStrategy
61
+
62
+ ```jsonc
63
+ {
64
+ "value": "vi.mock|jest.mock|msw|unknown",
65
+ "confidence": 0.0,
66
+ "evidence": []
67
+ }
68
+ ```
69
+
70
+ Evidence examples:
71
+
72
+ - vi.mock(
73
+ - jest.mock(
74
+ - setupServer(
75
+ - http.get(
76
+
77
+ ### importAliases
78
+
79
+ Ranked candidates:
80
+
81
+ ```jsonc
82
+ {
83
+ "value": [
84
+ { "alias": "@/", "count": 12, "confidence": 0.8 },
85
+ { "alias": "~/", "count": 3, "confidence": 0.2 }
86
+ ],
87
+ "confidence": 0.8,
88
+ "evidence": []
89
+ }
90
+ ```
91
+
92
+ ### renderHelpers
93
+
94
+ Ranked candidates:
95
+
96
+ ```jsonc
97
+ {
98
+ "value": [
99
+ {
100
+ "name": "renderWithProviders",
101
+ "path": "@/tests/utils/render",
102
+ "count": 5,
103
+ "confidence": 0.7
104
+ }
105
+ ],
106
+ "confidence": 0.7,
107
+ "evidence": []
108
+ }
109
+ ```
110
+
111
+ ### queryStyle
112
+
113
+ Tracks observed querying patterns:
114
+
115
+ ```jsonc
116
+ {
117
+ "value": {
118
+ "usesRoleQueries": true,
119
+ "usesLabelQueries": true,
120
+ "usesTestId": false
121
+ },
122
+ "confidence": 0.6,
123
+ "evidence": []
124
+ }
125
+ ```
126
+
127
+ ### sharedMockSetups
128
+
129
+ Ranked reusable mock setup imports:
130
+
131
+ ```jsonc
132
+ {
133
+ "value": [
134
+ {
135
+ "path": "@/tests/mocks/digitax-components",
136
+ "count": 4,
137
+ "confidence": 0.8
138
+ }
139
+ ],
140
+ "confidence": 0.8,
141
+ "evidence": []
142
+ }
143
+ ```
144
+
145
+ Evidence examples:
146
+
147
+ - side-effect mock setup imports in test files
148
+ - centralized mock bootstrap modules reused across directories
149
+
150
+ Generation preference:
151
+
152
+ - If `sharedMockSetups` has confidence >= 0.8, prefer importing the setup over re-mocking the same UI package locally.
153
+
154
+ ---
155
+
156
+ ## Confidence Rules (Deterministic)
157
+
158
+ Suggested confidence computation:
159
+
160
+ - For categorical signals:
161
+ confidence = topCount / totalRelevantCount (clamp 0..1)
162
+ - For ranked lists:
163
+ overall confidence = topCount / totalCount
164
+ - If totalRelevantCount < 3:
165
+ cap confidence at 0.5 (insufficient evidence)
166
+
167
+ If two candidates are close (difference <= 10%):
168
+
169
+ - keep both candidates
170
+ - reduce confidence by 0.1
171
+
172
+ ---
173
+
174
+ ## Update/Merge Rules
175
+
176
+ On each run:
177
+
178
+ 1. Add new evidence to existing evidence (bounded cap).
179
+ 2. Recompute counts and confidence.
180
+ 3. Update updatedAt.
181
+ 4. Never delete a signal; mark unknown if confidence falls.
182
+
183
+ Generation must prefer signals with confidence >= 0.8.
184
+ Otherwise use safe fallback and log limitation.
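A rough sketch of the update/merge rules follows. The rules leave two values unspecified, so both are assumptions here: `EVIDENCE_CAP` (the "bounded cap") and `UNKNOWN_BELOW` (how far confidence must fall before a signal is marked unknown). The `mergeSignal` and `isPreferred` names are hypothetical, and the recomputed value and confidence are passed in rather than derived.

```typescript
// Hypothetical shapes and constants for illustration only.
type Evidence = { file: string; kind: string; match: string };
type Signal<T> = {
  value: T | "unknown";
  confidence: number;
  evidence: Evidence[];
  updatedAt: string; // ISO-8601
};
type Recomputed<T> = { value: T; confidence: number; evidence: Evidence[] };

const EVIDENCE_CAP = 50;   // ASSUMPTION: the source only says "bounded cap"
const UNKNOWN_BELOW = 0.2; // ASSUMPTION: the source gives no threshold
const PREFER_AT = 0.8;     // from the rules: prefer signals with confidence >= 0.8

function mergeSignal<T>(prev: Signal<T>, next: Recomputed<T>, now: Date): Signal<T> {
  // 1. Append new evidence, keeping only the most recent entries.
  const evidence = [...prev.evidence, ...next.evidence].slice(-EVIDENCE_CAP);
  // 4. Never delete the signal; mark it unknown when confidence has fallen.
  const value = next.confidence < UNKNOWN_BELOW ? "unknown" : next.value;
  // 2 and 3. Store the recomputed confidence and refresh updatedAt.
  return { value, confidence: next.confidence, evidence, updatedAt: now.toISOString() };
}

function isPreferred(signal: Signal<unknown>): boolean {
  return signal.value !== "unknown" && signal.confidence >= PREFER_AT;
}
```

When `isPreferred` returns false, the generator would take the safe fallback and log the limitation, as the rules require.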
@@ -0,0 +1,68 @@
1
+ # Entry Path Fidelity (Tayo)
2
+
3
+ Purpose:
4
+ Ensure generated tests preserve the real user entry path captured by recording
5
+ preconditions (especially trigger actions before form interaction).
6
+
7
+ ---
8
+
9
+ ## Why This Exists
10
+
11
+ Recordings often start with a parent trigger action (button/tab/link) that
12
+ opens the target UI. If generation skips that and renders child UI directly,
13
+ tests lose behavioral fidelity and miss integration regressions.
14
+
15
+ ---
16
+
17
+ ## Detection Rules
18
+
19
+ Classify early-step preconditions from the recording:
20
+
21
+ 1. Locate the first meaningful interaction steps (ignore viewport/title checks).
22
+ 2. If these steps open a panel/modal/tab and later steps interact with a child
23
+ form/content area, mark the trigger steps as required preconditions.
24
+
25
+ Examples:
26
+
27
+ - click "Add API KEY" then interact with `#addAPIKeyForm`
28
+ - click "Create Sale" then interact with sale modal form
29
+
30
+ ---
31
+
32
+ ## Generation Rules
33
+
34
+ When required preconditions exist:
35
+
36
+ 1. Prefer rendering the parent component that contains the trigger.
37
+ 2. Reproduce trigger action in test setup before child interaction.
38
+ 3. Avoid direct harness shortcuts (for example `<Dialog open>`) unless parent
39
+ composition is unavailable in source.
40
+
41
+ Fallback when parent cannot be rendered:
42
+
43
+ - Document limitation explicitly in output and state evidence.
44
+ - Use the closest harness, but keep a warning that fidelity is reduced.
45
+
46
+ ---
47
+
48
+ ## Verification Rules
49
+
50
+ Fail generation if all are true:
51
+
52
+ - recording has required precondition trigger(s),
53
+ - parent source composition is resolvable,
54
+ - generated test bypasses trigger path.
55
+
56
+ Expected repair behavior:
57
+
58
+ - regenerate once with parent-level render and trigger action.
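The verification and repair rules above amount to a small decision function. The sketch below is one possible reading, with hypothetical names (`EntryPathCheck`, `verifyEntryPath`); in particular, treating a second failure after the single regeneration as a hard fail is an assumption.

```typescript
// Hypothetical names for illustration; the package's real types are not shown in this diff.
type EntryPathCheck = {
  hasRequiredTrigger: boolean; // recording has required precondition trigger(s)
  parentResolvable: boolean;   // parent source composition is resolvable
  bypassesTrigger: boolean;    // generated test skips the trigger path
};

type EntryPathVerdict = "pass" | "warn-reduced-fidelity" | "regenerate" | "fail";

function verifyEntryPath(check: EntryPathCheck, alreadyRegenerated: boolean): EntryPathVerdict {
  // Nothing to preserve, or the trigger path was reproduced.
  if (!check.hasRequiredTrigger || !check.bypassesTrigger) return "pass";
  // Trigger bypassed but the parent cannot be rendered: documented fallback with a warning.
  if (!check.parentResolvable) return "warn-reduced-fidelity";
  // All three conditions hold: repair once, then fail (ASSUMPTION about the second failure).
  return alreadyRegenerated ? "fail" : "regenerate";
}

const firstAttempt = verifyEntryPath(
  { hasRequiredTrigger: true, parentResolvable: true, bypassesTrigger: true },
  false
);
```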
59
+
60
+ ---
61
+
62
+ ## Evidence to Store
63
+
64
+ For each generated test, record:
65
+
66
+ - detected precondition trigger summary
67
+ - whether fidelity was preserved
68
+ - fallback reason (if any)
@@ -0,0 +1,232 @@
1
+ # Interaction Intent Model
2
+
3
+ Purpose:
4
+ Convert Puppeteer Replay `runStep` objects into semantic user-level intents that can drive:
5
+
6
+ - component discovery hints
7
+ - robust RTL query generation
8
+ - user-visible assertion extraction from marker actions
9
+ - screenshot milestone selection
10
+
11
+ This model must remain project-agnostic.
12
+ It stores intent + evidence + confidence.
13
+
14
+ ---
15
+
16
+ ## 1) ParsedStep Schema (input)
17
+
18
+ A ParsedStep is extracted deterministically from the recording using AST parsing.
19
+
20
+ ```ts
21
+ type ParsedStep = {
22
+ index: number;
23
+ type: string; // e.g. "navigate" | "click" | "change" | ...
24
+ url?: string;
25
+ value?: string;
26
+ keys?: string;
27
+ selectors?: string[][];
28
+ location?: { line: number; column: number };
29
+ };
30
+ ```
31
+
32
+ Notes:
33
+
34
+ - `selectors` is an ordered list of selector chains from Puppeteer Replay.
35
+ - The parser must preserve selector order and grouping.
36
+
37
+ ---
38
+
39
+ ## 2) InteractionIntent Schema (output)
40
+
41
+ ```ts
42
+ type QueryHint =
43
+ | { kind: "role"; role: string; name?: string }
44
+ | { kind: "label"; text: string }
45
+ | { kind: "placeholder"; text: string }
46
+ | { kind: "text"; text: string }
47
+ | { kind: "css"; selector: string };
48
+
49
+ type Evidence = {
50
+ kind: "selector" | "value" | "stepType";
51
+ detail: string;
52
+ };
53
+
54
+ type InteractionIntent = {
55
+ index: number;
56
+
57
+ // Normalized action type
58
+ type:
59
+ | "navigate"
60
+ | "click"
61
+ | "type"
62
+ | "select"
63
+ | "assertExists"
64
+ | "assertNotExists";
65
+
66
+ // For navigation
67
+ url?: string;
68
+
69
+ // For typing/selecting
70
+ value?: string;
71
+
72
+ // For assertion intents
73
+ assertionValue?: string;
74
+
75
+ // Best guess of semantic target
76
+ role?: string;
77
+ name?: string;
78
+ label?: string;
79
+
80
+ // Query hints for RTL generation (ordered best → worst)
81
+ queryHints: QueryHint[];
82
+
83
+ // Raw selector chosen as best fallback (if any)
84
+ rawSelector?: string;
85
+
86
+ // Confidence score in [0..1]
87
+ confidence: number;
88
+
89
+ // Evidence used to derive the intent
90
+ evidence: Evidence[];
91
+ };
92
+ ```
93
+
94
+ Key principle:
95
+
96
+ - Downstream generation must prefer high-confidence intents and the earliest queryHints.
97
+ - Low-confidence intents must trigger conservative fallbacks and stronger verification.
98
+
99
+ ---
100
+
101
+ ## 3) Normalization Rules (ParsedStep → InteractionIntent)
102
+
103
+ ### Step type mapping
104
+
105
+ - `navigate` → intent.type = "navigate", intent.url = step.url
106
+ - `click` → intent.type = "click"
107
+ - `change` (or typing-like step) → intent.type = "type", intent.value = step.value
108
+ - If step indicates option selection (if detectable) → intent.type = "select"
109
+ - A copy keyboard chord (`Meta+C` or `Control+C`) pressed after a text selection or on a focused target
110
+ may emit intent.type = "assertExists" when the marker rules match.
111
+
112
+ If the step type is unknown:
113
+
114
+ - map to the closest of click/type/select based on presence of value/selectors.
115
+ - set confidence low (<= 0.4).
116
+
117
+ ---
118
+
119
+ ## 3.1) Assertion Marker Detection (Non-Technical Flow)
120
+
121
+ Preferred marker pattern:
122
+
123
+ - user highlights visible text on the page
124
+ - user presses copy (`Meta+C` on macOS, `Control+C` on Windows/Linux)
125
+
126
+ Deterministic interpretation rules:
127
+
128
+ 1. Detect keyboard copy chord steps.
129
+ 2. Look back up to 5 prior non-navigation steps for the focused/selected target.
130
+ 3. If a semantic selector (`aria/` or `text/`) exists on that target, create:
131
+ - `intent.type = "assertExists"`
132
+ - query hint derived from that semantic selector
133
+ 4. If no semantic selector exists, do not invent one:
134
+ - fallback to conservative text query only when the recording includes visible text evidence
135
+ - otherwise skip marker conversion and log low-confidence evidence
136
+
137
+ Notes:
138
+
139
+ - Marker conversion is additive and never blocks normal intent extraction.
140
+ - Highlight-only without copy is best-effort and lower confidence.
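As an illustration of the interpretation rules above, the sketch below looks back from a copy-chord step for a semantic selector. It rests on several assumptions: that the chord appears in `keys` exactly as `"Meta+C"` or `"Control+C"`, that the "target" is the nearest prior non-navigation step carrying selectors, and that the `MarkerIntent` shape and function names (all hypothetical) are adequate stand-ins. The text-evidence fallback from rule 4 is left out.

```typescript
// Reduced ParsedStep plus a hypothetical MarkerIntent, for illustration only.
type ParsedStep = { index: number; type: string; keys?: string; selectors?: string[][] };
type MarkerIntent = { index: number; type: "assertExists"; selector: string };

const LOOKBACK_LIMIT = 5;

// ASSUMPTION: the parser normalizes the chord into `keys` as "Meta+C" or "Control+C".
function isCopyChord(step: ParsedStep): boolean {
  return step.keys === "Meta+C" || step.keys === "Control+C";
}

function firstSemanticSelector(step: ParsedStep): string | undefined {
  for (const chain of step.selectors || []) {
    for (const selector of chain) {
      if (selector.indexOf("aria/") === 0 || selector.indexOf("text/") === 0) return selector;
    }
  }
  return undefined;
}

function detectMarker(steps: ParsedStep[], copyPosition: number): MarkerIntent | undefined {
  const copyStep = steps[copyPosition];
  if (!copyStep || !isCopyChord(copyStep)) return undefined;

  let inspected = 0;
  for (let i = copyPosition - 1; i >= 0 && inspected < LOOKBACK_LIMIT; i--) {
    const step = steps[i];
    if (!step || step.type === "navigate") continue; // navigation steps are not counted
    inspected++;
    if (!step.selectors || step.selectors.length === 0) continue; // no target on this step
    // This step is treated as the target; never invent a selector it does not have.
    const selector = firstSemanticSelector(step);
    return selector === undefined
      ? undefined
      : { index: copyStep.index, type: "assertExists", selector };
  }
  return undefined;
}
```

Returning `undefined` corresponds to skipping marker conversion; a caller would log low-confidence evidence at that point.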
141
+
142
+ ---
143
+
144
+ ## 4) Selector Resolution Ladder (deterministic)
145
+
146
+ Given `selectors: string[][]`, flatten into a prioritized list of individual selectors preserving order.
147
+
148
+ For each selector candidate:
149
+
150
+ 1. If starts with `aria/`:
151
+ - Extract accessible name text after `aria/`
152
+ - Add queryHint:
153
+ - `{ kind: "role", role: "button", name: <extracted> }` IF used in a click step
154
+ - `{ kind: "role", role: "textbox", name: <extracted> }` IF used in a type step
155
+
156
+ - Add evidence: selector
157
+ - Increase confidence (+0.3)
158
+
159
+ 2. If starts with `text/`:
160
+ - Extract visible label after `text/`
161
+ - Add queryHint: `{ kind: "text", text: <extracted> }`
162
+ - Increase confidence (+0.2)
163
+
164
+ 3. If selector looks like an input hint (heuristic, project-agnostic):
165
+ - contains `input`, `textarea`, `[type=` or similar
166
+ - Add queryHint `{ kind: "placeholder", text: <placeholder> }` only once the placeholder text is known from another source; until then, add no placeholder hint
167
+ - Confidence unchanged unless label is discovered elsewhere
168
+
169
+ 4. CSS fallback:
170
+ - If selector is CSS and not aria/text, keep as `{ kind: "css", selector }`
171
+ - Confidence +0.05 (tiny)
172
+ - Set `rawSelector` if no better option exists
173
+
174
+ Important:
175
+
176
+ - Do not invent labels/placeholders from CSS selectors.
177
+ - Do not assume a role from CSS unless step context supports it (click → likely button/link; type → likely textbox).
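A minimal sketch of the ladder, covering rules 1, 2 and 4 (no placeholder hint is emitted, since the placeholder text is not known at this stage). The `toQueryHints` name is hypothetical, and skipping Puppeteer Replay's `xpath/` and `pierce/` selectors is an assumption, because the ladder does not mention them.

```typescript
type QueryHint =
  | { kind: "role"; role: string; name?: string }
  | { kind: "label"; text: string }
  | { kind: "placeholder"; text: string }
  | { kind: "text"; text: string }
  | { kind: "css"; selector: string };

function toQueryHints(selectors: string[][], stepType: "click" | "type"): QueryHint[] {
  const roleHints: QueryHint[] = [];
  const textHints: QueryHint[] = [];
  const cssHints: QueryHint[] = [];

  // Flatten selector chains in their recorded order.
  for (const chain of selectors) {
    for (const selector of chain) {
      if (selector.indexOf("aria/") === 0) {
        // Role comes from the step context, never from the selector text itself.
        const role = stepType === "click" ? "button" : "textbox";
        roleHints.push({ kind: "role", role, name: selector.slice("aria/".length) });
      } else if (selector.indexOf("text/") === 0) {
        textHints.push({ kind: "text", text: selector.slice("text/".length) });
      } else if (selector.indexOf("xpath/") === 0 || selector.indexOf("pierce/") === 0) {
        continue; // ASSUMPTION: not CSS, so no hint is derived
      } else {
        cssHints.push({ kind: "css", selector });
      }
    }
  }

  // Ordered best to worst: role, then text, then css as the last resort.
  return roleHints.concat(textHints, cssHints);
}

const addKeyHints = toQueryHints(
  [["#addKey > span"], ["aria/Add API KEY"], ["text/Add API KEY"]],
  "click"
);
```

No label or placeholder is ever derived from a CSS selector here, in line with the "Important" notes above.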
178
+
179
+ ---
180
+
181
+ ## 5) Confidence Calculation
182
+
183
+ Start confidence at:
184
+
185
+ - 0.5 if step type is known
186
+ - 0.3 if step type is unknown
187
+
188
+ Then:
189
+
190
+ - +0.3 if aria/ exists
191
+ - +0.2 if text/ exists
192
+ - +0.1 if both aria/ and text/ corroborate similar name
193
+ - -0.2 if only CSS selectors exist
194
+ - Clamp to [0..1]
195
+
196
+ If multiple selector types exist, keep multiple queryHints in ranked order.
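The arithmetic in this section can be written out as follows. This sketch applies only the Section 5 adjustments; `intentConfidence` is a hypothetical name, and "corroborate similar name" is assumed to mean the names are identical after trimming and lowercasing.

```typescript
// Collect the names carried by selectors with a given prefix, normalized for comparison.
function namesWithPrefix(selectors: string[], prefix: string): string[] {
  return selectors
    .filter((s) => s.indexOf(prefix) === 0)
    .map((s) => s.slice(prefix.length).trim().toLowerCase());
}

function intentConfidence(stepTypeKnown: boolean, selectors: string[]): number {
  let confidence = stepTypeKnown ? 0.5 : 0.3;

  const ariaNames = namesWithPrefix(selectors, "aria/");
  const textNames = namesWithPrefix(selectors, "text/");

  if (ariaNames.length > 0) confidence += 0.3;
  if (textNames.length > 0) confidence += 0.2;

  // ASSUMPTION: "corroborate" = same name after trim + lowercase.
  if (ariaNames.some((name) => textNames.indexOf(name) !== -1)) confidence += 0.1;

  // Only non-semantic (CSS) selectors are present.
  if (selectors.length > 0 && ariaNames.length === 0 && textNames.length === 0) {
    confidence -= 0.2;
  }

  return Math.min(1, Math.max(0, confidence)); // clamp to [0..1]
}

// 0.5 + 0.3 + 0.2 + 0.1 = 1.1, clamped to 1.
const corroborated = intentConfidence(true, ["aria/Save", "text/Save", "#save"]);
```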
197
+
198
+ ---
199
+
200
+ ## 6) Output Requirements
201
+
202
+ For every ParsedStep (except setViewport):
203
+
204
+ - Emit an InteractionIntent with:
205
+ - index
206
+ - normalized type
207
+ - confidence
208
+ - evidence[]
209
+ - queryHints[] (may include css as last resort)
210
+
211
+ Never emit an intent with empty queryHints for click/type/select.
212
+ For assertion intents (`assertExists`/`assertNotExists`), queryHints must include at least one non-css hint when possible.
213
+ If nothing is available, set:
214
+
215
+ - queryHints = [{ kind: "css", selector: "<unknown>" }]
216
+ - confidence = 0.1
217
+
218
+ ---
219
+
220
+ ## 7) How Downstream Modules Use This
221
+
222
+ - Component discovery:
223
+ - prefer intents with confidence >= 0.6
224
+ - use role/name hints to infer component context
225
+
226
+ - Screenshot milestones:
227
+ - prefer click intents with confidence >= 0.6
228
+
229
+ - RTL generation:
230
+ - use the first viable queryHint:
231
+ - role → label → placeholder → text → css (only if allowed by conventions)
232
+ - for marker assertion intents, emit explicit `expect(...)` assertions tied to user-visible outcomes.
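The RTL generation order above might be applied like this. The sketch emits query source text using React Testing Library's `getByRole`, `getByLabelText`, `getByPlaceholderText` and `getByText`; the `pickQuery` and `renderQuery` names, the `allowCss` flag standing in for "allowed by conventions", and the `container.querySelector` fallback are assumptions for illustration.

```typescript
type QueryHint =
  | { kind: "role"; role: string; name?: string }
  | { kind: "label"; text: string }
  | { kind: "placeholder"; text: string }
  | { kind: "text"; text: string }
  | { kind: "css"; selector: string };

// role → label → placeholder → text → css
const KIND_ORDER: QueryHint["kind"][] = ["role", "label", "placeholder", "text", "css"];

function renderQuery(hint: QueryHint): string {
  switch (hint.kind) {
    case "role":
      return hint.name === undefined
        ? "screen.getByRole(" + JSON.stringify(hint.role) + ")"
        : "screen.getByRole(" + JSON.stringify(hint.role) + ", { name: " + JSON.stringify(hint.name) + " })";
    case "label":
      return "screen.getByLabelText(" + JSON.stringify(hint.text) + ")";
    case "placeholder":
      return "screen.getByPlaceholderText(" + JSON.stringify(hint.text) + ")";
    case "text":
      return "screen.getByText(" + JSON.stringify(hint.text) + ")";
    case "css":
      // ASSUMPTION: css falls back to querying the render container.
      return "container.querySelector(" + JSON.stringify(hint.selector) + ")";
  }
}

function pickQuery(hints: QueryHint[], allowCss: boolean): string | undefined {
  for (const kind of KIND_ORDER) {
    if (kind === "css" && !allowCss) continue;
    for (const hint of hints) {
      if (hint.kind === kind) return renderQuery(hint);
    }
  }
  return undefined; // no viable hint under the current conventions
}

const saveQuery = pickQuery(
  [{ kind: "css", selector: "#save" }, { kind: "role", role: "button", name: "Save" }],
  false
);
```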
@@ -0,0 +1,18 @@
1
+ # Persistent Mock Store
2
+
3
+ Location:
4
+ packages/**/src/tests/mock-store/
5
+
6
+ Rules:
7
+ - Deterministic IDs (ORG_001, INV_001)
8
+ - No random UUIDs
9
+ - Export seed objects
10
+ - Central index.ts exports all fixtures
11
+
12
+ Example:
13
+
14
+ export const ORG_001 = {
15
+ id: "ORG_001",
16
+ name: "Test Organisation",
17
+ active: true
18
+ };
@@ -0,0 +1,189 @@
1
+ # Test Quality Scoring (Tayo)
2
+
3
+ Purpose:
4
+ Provide a deterministic, explainable score for each generated test file so Tayo can:
5
+
6
+ - measure whether changes improve quality,
7
+ - avoid regressions,
8
+ - and prioritize upgrades (rewrite suggestions) over time.
9
+
10
+ Scoring must be:
11
+
12
+ - project-agnostic,
13
+ - deterministic (same input => same score),
14
+ - explainable (every point has a reason),
15
+ - bounded and comparable across runs.
16
+
17
+ ---
18
+
19
+ ## Output
20
+
21
+ Each generated test must produce:
22
+
23
+ ```jsonc
24
+ {
25
+ "overall": 0, // 0-100
26
+ "grade": "F|D|C|B|A",
27
+ "dimensions": {
28
+ "robustness": 0, // 0-25
29
+ "readability": 0, // 0-15
30
+ "assertionStrength": 0, // 0-20
31
+ "mockFidelity": 0, // 0-20
32
+ "maintainability": 0 // 0-20
33
+ },
34
+ "signals": {
35
+ "usesCssSelectors": false,
36
+ "usesTestId": false,
37
+ "usesRoleQueries": true,
38
+ "hasMeaningfulAssertions": true,
39
+ "hasMarkerDerivedAssertions": true,
40
+ "hasDeterministicFixtures": true,
41
+ "hasProviderWrapper": true,
42
+ "hasUiLibraryReimplementation": false
43
+ },
44
+ "reasons": [
45
+ {
46
+ "dimension": "robustness",
47
+ "delta": -8,
48
+ "reason": "Uses brittle CSS selectors for primary queries."
49
+ },
50
+ {
51
+ "dimension": "assertionStrength",
52
+ "delta": 6,
53
+ "reason": "Asserts user-visible success outcome (toast/dialog close)."
54
+ }
55
+ ]
56
+ }
57
+ ```
58
+
59
+ ---
60
+
61
+ ## Dimension Scoring Rubric
62
+
63
+ ### A) Robustness (0–25)
64
+
65
+ Start at 25. Subtract:
66
+
67
+ - -10: uses CSS selectors for user interactions or assertions
68
+ - -6: primary queries use text-only selectors where role/label exists
69
+ - -6: heavy reliance on exact, fragile UI text (not regex or role-based name)
70
+ - -3: missing `findBy*` / waits where async UI is expected (flakiness risk)
71
+ - -15: reimplements UI-library components in test mocks
72
+
73
+ Add back (up to cap 25):
74
+
75
+ - +5: uses getByRole with accessible names for main interactions
76
+ - +3: uses getByLabelText for form fields
77
+ - +2: avoids querying implementation details
78
+
79
+ ### B) Readability (0–15)
80
+
81
+ Start at 10. Adjust:
82
+
83
+ - +3: helper functions are used for repeated flows (setup/fill/submit)
84
+ - +2: test names align with domain behavior (create organisation, etc.)
85
+ - +2: clear Arrange/Act/Assert separation
86
+ - -4: confusing naming mismatch ("profile" vs "organisation")
87
+ - -3: large monolithic tests with repeated code
88
+
89
+ Cap 15.
90
+
91
+ ### C) Assertion Strength (0–20)
92
+
93
+ Start at 8. Add:
94
+
95
+ - +6: asserts user-visible success outcome (toast, navigation, dialog close, list update)
96
+ - +6: asserts correct error outcome (validation message, error toast)
97
+ - +4: asserts API call was made with expected payload shape
98
+ - +3: includes marker-derived assertions from non-technical checkpoints (semantic dblClick markers)
99
+ - +2: asserts disabled state / loading state when relevant
100
+
101
+ Subtract:
102
+
103
+ - -8: only asserts mock called (no user-visible assertion)
104
+ - -6: asserts internal implementation details only
105
+
106
+ Cap 20.
107
+
108
+ ### D) Mock Fidelity (0–20)
109
+
110
+ Start at 10. Add:
111
+
112
+ - +6: mocks match real API hook signature (callbacks/args)
113
+ - +4: uses persistent deterministic fixtures (mock-store)
114
+ - +3: covers both success and error branches with realistic responses
115
+ - +2: clears mocks properly between tests
116
+
117
+ Subtract:
118
+
119
+ - -8: random/inline fixtures created ad hoc each run
120
+ - -6: mocks don’t reflect actual dependency contract (false positives)
121
+ - -4: mocks rely on global state without reset
122
+ - -20: UI-library component reimplementation detected (policy violation)
123
+
124
+ Cap 20.
125
+
126
+ ### E) Maintainability (0–20)
127
+
128
+ Start at 10. Add:
129
+
130
+ - +5: uses centralized fixtures (mock-store)
131
+ - +4: minimal coupling to UI structure (role/label-based)
132
+ - +3: test file structure matches project conventions (imports, cleanup)
133
+ - +2: avoids duplicated test generation (indexed in state)
134
+
135
+ Subtract:
136
+
137
+ - -6: hardcoded selectors tied to layout/CSS
138
+ - -4: missing shared fixtures; repeated data creation
139
+ - -4: reruns regenerate different data or duplicate files
140
+ - -10: replaces design-system/UI-library modules with custom stand-ins
141
+
142
+ Cap 20.
143
+
144
+ ---
145
+
146
+ ## Grade Mapping
147
+
148
+ - A: 90–100
149
+ - B: 80–89
150
+ - C: 70–79
151
+ - D: 60–69
152
+ - F: 0–59
153
+
154
+ Hard fail cap:
155
+
156
+ - If `hasUiLibraryReimplementation` is true, cap final `overall` at 59 and `grade` at `F`.
157
+ - Always add reason:
158
+ - `Reimplemented UI library components; behavioral fidelity reduced.`
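The grade mapping and hard fail cap can be sketched as below. The rubric does not say how `overall` is derived; summing the five dimensions (whose caps add up to 100) is an assumption, as are the `gradeFor` and `finalizeScore` names. Reasons are simplified to plain strings here.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";
type Dimensions = {
  robustness: number;        // 0-25
  readability: number;       // 0-15
  assertionStrength: number; // 0-20
  mockFidelity: number;      // 0-20
  maintainability: number;   // 0-20
};

function gradeFor(overall: number): Grade {
  if (overall >= 90) return "A";
  if (overall >= 80) return "B";
  if (overall >= 70) return "C";
  if (overall >= 60) return "D";
  return "F";
}

function finalizeScore(d: Dimensions, hasUiLibraryReimplementation: boolean) {
  // ASSUMPTION: overall is the sum of the dimension scores (25+15+20+20+20 = 100).
  let overall =
    d.robustness + d.readability + d.assertionStrength + d.mockFidelity + d.maintainability;
  const reasons: string[] = [];

  if (hasUiLibraryReimplementation) {
    // Hard fail cap: at most 59, which always maps to grade F.
    overall = Math.min(overall, 59);
    reasons.push("Reimplemented UI library components; behavioral fidelity reduced.");
  }

  return { overall, grade: gradeFor(overall), reasons };
}

const cappedScore = finalizeScore(
  { robustness: 25, readability: 15, assertionStrength: 20, mockFidelity: 20, maintainability: 20 },
  true
);
```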
159
+
160
+ ---
161
+
162
+ ## Deterministic Extraction Rules
163
+
164
+ To score, Tayo inspects the generated test file text and checks for patterns:
165
+
166
+ - CSS selectors: `container.querySelector`, `document.querySelector`, or `screen.*` calls using selectors (should be absent)
167
+ - Role queries: `getByRole`, `findByRole`
168
+ - Label queries: `getByLabelText`, `findByLabelText`
169
+ - TestId queries: `getByTestId`
170
+ - User-visible assertions: `toBeInTheDocument` on toast/dialog/message; `queryByRole('dialog')` absence; route change assertions if present
171
+ - Marker-derived assertions: inline marker comments or explicit checkpoint assertion helpers
172
+ - Mock store usage: imports from detected mock-store path
173
+ - Mock reset: `beforeEach`, `afterEach`, `cleanup`, `vi.clearAllMocks`, etc.
174
+ - UI-library reimplementation:
175
+ - `vi.mock`/`jest.mock` targeting known UI-library modules and returning replacement component objects/functions.
176
+
177
+ This scoring is heuristic but deterministic.
178
+
179
+ ---
180
+
181
+ ## Evolution Rules
182
+
183
+ - Every run stores a score snapshot in `.tayo/state.json`.
184
+ - When Tayo changes its generation logic, compare:
185
+ - latest score vs previous score for same component (or same recording)
186
+
187
+ - If score drops by >= 5 points:
188
+ - warn about regression
189
+ - keep old test unless user explicitly opts in to overwrite
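The regression rule above reduces to a comparison of two score snapshots. A minimal sketch, with hypothetical names (`decideOnRescore`, `userOptedIn`):

```typescript
type RescoreDecision = { regression: boolean; keepOldTest: boolean };

const REGRESSION_DROP = 5; // from the rule: a drop of >= 5 points

function decideOnRescore(previous: number, latest: number, userOptedIn: boolean): RescoreDecision {
  const regression = previous - latest >= REGRESSION_DROP;
  // On regression the old test stays unless the user explicitly opts in to overwrite.
  return { regression, keepOldTest: regression && !userOptedIn };
}

const afterLogicChange = decideOnRescore(82, 76, false);
```

A caller would emit the regression warning whenever `regression` is true, independently of whether the old test is kept.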