npm - @tayo-dev/rtl - Versions diffs - 1.4.0 → 1.4.1 - Mend

@tayo-dev/rtl 1.4.0 → 1.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (46)

package/README.md +16 -75
package/assets/claude/commands/@tayo-dev/rtl/generate.md +42 -5
package/assets/claude/commands/@tayo-dev/rtl/help.md +2 -2
package/assets/codex/@tayo-dev/rtl-conventions/SKILL.md +38 -6
package/assets/codex/@tayo-dev/rtl-generate/SKILL.md +125 -13
package/assets/codex/@tayo-dev/rtl-generate/references/assertion-markers.md +62 -0
package/assets/codex/@tayo-dev/rtl-generate/references/auth.md +92 -0
package/assets/codex/@tayo-dev/rtl-generate/references/conventions-schema.md +184 -0
package/assets/codex/@tayo-dev/rtl-generate/references/entry-path-fidelity.md +68 -0
package/assets/codex/@tayo-dev/rtl-generate/references/intent-model.md +232 -0
package/assets/codex/@tayo-dev/rtl-generate/references/mock-store.md +18 -0
package/assets/codex/@tayo-dev/rtl-generate/references/quality-scoring.md +189 -0
package/assets/codex/@tayo-dev/rtl-generate/references/state-schema.md +119 -0
package/assets/codex/@tayo-dev/rtl-generate/references/test-index.md +12 -0
package/assets/codex/@tayo-dev/rtl-generate/references/verification-gate.md +93 -0
package/assets/codex/@tayo-dev/rtl-help/SKILL.md +21 -7
package/assets/codex/@tayo-dev/rtl-mocks/SKILL.md +55 -9
package/assets/gemini/commands/@tayo-dev/rtl/generate.toml +27 -5
package/assets/gemini/commands/@tayo-dev/rtl/help.toml +2 -2
package/assets/opencode/commands/@tayo-dev/rtl-generate.md +31 -5
package/assets/opencode/commands/@tayo-dev/rtl-help.md +2 -2
package/dist/cli/commands/generate.d.ts +1 -7
package/dist/cli/commands/generate.d.ts.map +1 -1
package/dist/cli/commands/generate.js +50 -96
package/dist/cli/commands/generate.js.map +1 -1
package/dist/core/generator.d.ts +0 -2
package/dist/core/generator.d.ts.map +1 -1
package/dist/core/generator.js +1 -1
package/dist/core/generator.js.map +1 -1
package/dist/core/input-loader.d.ts +2 -2
package/dist/core/input-loader.d.ts.map +1 -1
package/dist/core/input-loader.js +6 -16
package/dist/core/input-loader.js.map +1 -1
package/dist/core/js-parser.js +1 -1
package/dist/core/js-parser.js.map +1 -1
package/dist/core/writer.d.ts +0 -1
package/dist/core/writer.d.ts.map +1 -1
package/dist/core/writer.js +3 -3
package/dist/core/writer.js.map +1 -1
package/dist/index.d.ts +1 -1
package/dist/index.js +18 -14
package/dist/index.js.map +1 -1
package/dist/install/runtimes/codex.d.ts.map +1 -1
package/dist/install/runtimes/codex.js +18 -0
package/dist/install/runtimes/codex.js.map +1 -1
package/package.json +1 -1

package/assets/codex/@tayo-dev/rtl-generate/references/conventions-schema.md ADDED

@@ -0,0 +1,184 @@
+# Conventions Schema (Self-Evolving Signals)
+Stored in unified state at:
+.tayo/state.json → conventions.signals
+Goal:
+Learn conventions over time without imposing assumptions.
+Principle:
+- Record observations (evidence)
+- Derive candidates
+- Track confidence
+- Keep ranked alternatives when mixed
+- Stay project-agnostic
+---
+## Signal Shape
+Each signal uses this shape:
+```jsonc
+{
+  "value": "any",
+  "confidence": 0.0,
+  "evidence": [
+    {
+      "file": "path",
+      "kind": "import|call|usage|pathPrefix|wrapper",
+      "match": "string"
+    }
+  ],
+  "updatedAt": "ISO-8601"
+}
+```
+Signals may store a single value or a ranked list of candidates.
+---
+## Required Signals
+### testFramework
+```jsonc
+{
+  "value": "vitest|jest|unknown",
+  "confidence": 0.0,
+  "evidence": []
+}
+```
+Evidence examples:
+- import from "vitest"
+- import from "@jest/globals"
+- describe/test usage patterns when imports are absent
+### mockStrategy
+```jsonc
+{
+  "value": "vi.mock|jest.mock|msw|unknown",
+  "confidence": 0.0,
+  "evidence": []
+}
+```
+Evidence examples:
+- vi.mock(
+- jest.mock(
+- setupServer(
+- http.get(
+### importAliases
+Ranked candidates:
+```jsonc
+{
+  "value": [
+    { "alias": "@/", "count": 12, "confidence": 0.8 },
+    { "alias": "~/", "count": 3, "confidence": 0.2 }
+  ],
+  "confidence": 0.8,
+  "evidence": []
+}
+```
+### renderHelpers
+Ranked candidates:
+```jsonc
+{
+  "value": [
+    {
+      "name": "renderWithProviders",
+      "path": "@/tests/utils/render",
+      "count": 5,
+      "confidence": 0.7
+    }
+  ],
+  "confidence": 0.7,
+  "evidence": []
+}
+```
+### queryStyle
+Tracks observed querying patterns:
+```jsonc
+{
+  "value": {
+    "usesRoleQueries": true,
+    "usesLabelQueries": true,
+    "usesTestId": false
+  },
+  "confidence": 0.6,
+  "evidence": []
+}
+```
+### sharedMockSetups
+Ranked reusable mock setup imports:
+```jsonc
+{
+  "value": [
+    {
+      "path": "@/tests/mocks/digitax-components",
+      "count": 4,
+      "confidence": 0.8
+    }
+  ],
+  "confidence": 0.8,
+  "evidence": []
+}
+```
+Evidence examples:
+- side-effect mock setup imports in test files
+- centralized mock bootstrap modules reused across directories
+Generation preference:
+- If `sharedMockSetups` has confidence >= 0.8, prefer importing the setup over re-mocking the same UI package locally.
+---
+## Confidence Rules (Deterministic)
+Suggested confidence computation:
+- For categorical signals:
+  confidence = topCount / totalRelevantCount (clamp 0..1)
+- For ranked lists:
+  overall confidence = topCount / totalCount
+- If totalRelevantCount < 3:
+  cap confidence at 0.5 (insufficient evidence)
+If two candidates are close (difference <= 10%):
+- keep both candidates
+- reduce confidence by 0.1
+---
+## Update/Merge Rules
+On each run:
+1. Add new evidence to existing evidence (bounded cap).
+2. Recompute counts and confidence.
+3. Update updatedAt.
+4. Never delete a signal; mark unknown if confidence falls.
+Generation must prefer signals with confidence >= 0.8.
+Otherwise use safe fallback and log limitation.
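
Reviewer sketch (not part of the package): the confidence rules in this file can be read as the small TypeScript function below. The names `Candidate`, `SignalResult`, and `computeConfidence` are hypothetical. The file does not say what "difference <= 10%" is measured against, so the sketch assumes it means the gap between the top two candidates' shares of the total count. It also assumes the closeness penalty is applied before the low-evidence cap.

```typescript
// Hypothetical helper for illustration; not part of @tayo-dev/rtl.
type Candidate = { value: string; count: number };

type SignalResult = {
  kept: string[]; // candidates retained, ranked by count
  confidence: number; // always within [0, 1]
};

function computeConfidence(candidates: Candidate[]): SignalResult {
  const ranked = [...candidates].sort((a, b) => b.count - a.count);
  const total = ranked.reduce((sum, c) => sum + c.count, 0);
  const [top, second] = ranked;
  if (!top || total === 0) {
    return { kept: [], confidence: 0 };
  }

  // Base rule: topCount / totalRelevantCount.
  let confidence = top.count / total;
  const kept = [top.value];

  // Assumption: "close" means the top two shares differ by at most 0.10.
  if (second && (top.count - second.count) / total <= 0.1) {
    kept.push(second.value);
    confidence -= 0.1;
  }

  // Fewer than 3 relevant observations: cap at 0.5 (insufficient evidence).
  if (total < 3) {
    confidence = Math.min(confidence, 0.5);
  }

  return { kept, confidence: Math.min(1, Math.max(0, confidence)) };
}
```

With the `importAliases` counts shown in the file (12 and 3), this yields a single kept alias at roughly 0.8, which matches the example value in the schema.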

package/assets/codex/@tayo-dev/rtl-generate/references/entry-path-fidelity.md ADDED

@@ -0,0 +1,68 @@
+# Entry Path Fidelity (Tayo)
+Purpose:
+Ensure generated tests preserve the real user entry path captured by recording
+preconditions (especially trigger actions before form interaction).
+---
+## Why This Exists
+Recordings often start with a parent trigger action (button/tab/link) that
+opens the target UI. If generation skips that and renders child UI directly,
+tests lose behavioral fidelity and miss integration regressions.
+---
+## Detection Rules
+Classify early-step preconditions from the recording:
+1. Locate the first meaningful interaction steps (ignore viewport/title checks).
+2. If these steps open a panel/modal/tab and later steps interact with a child
+   form/content area, mark the trigger steps as required preconditions.
+Examples:
+- click "Add API KEY" then interact with `#addAPIKeyForm`
+- click "Create Sale" then interact with sale modal form
+---
+## Generation Rules
+When required preconditions exist:
+1. Prefer rendering the parent component that contains the trigger.
+2. Reproduce trigger action in test setup before child interaction.
+3. Avoid direct harness shortcuts (for example `<Dialog open>`) unless parent
+   composition is unavailable in source.
+Fallback when parent cannot be rendered:
+- Document limitation explicitly in output and state evidence.
+- Use the closest harness, but keep a warning that fidelity is reduced.
+---
+## Verification Rules
+Fail generation if all of the following are true:
+- recording has required precondition trigger(s),
+- parent source composition is resolvable,
+- generated test bypasses trigger path.
+Expected repair behavior:
+- regenerate once with parent-level render and trigger action.
+---
+## Evidence to Store
+For each generated test, record:
+- detected precondition trigger summary
+- whether fidelity was preserved
+- fallback reason (if any)
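
Reviewer sketch (not part of the package): the verification and fallback rules in this file reduce to a three-input decision. The type and function names below are hypothetical, and the `"pass-with-warning"` verdict is an assumed way to represent the documented fallback, where the parent cannot be rendered and reduced fidelity must be recorded.

```typescript
// Hypothetical shapes for illustration; not part of @tayo-dev/rtl.
type FidelityCheck = {
  hasRequiredTriggers: boolean; // recording has required precondition trigger(s)
  parentResolvable: boolean; // parent source composition is resolvable
  testBypassesTrigger: boolean; // generated test skips the trigger path
};

type FidelityVerdict = "pass" | "pass-with-warning" | "fail";

function checkEntryPathFidelity(check: FidelityCheck): FidelityVerdict {
  // No required triggers, or the test reproduces them: nothing to enforce.
  if (!check.hasRequiredTriggers || !check.testBypassesTrigger) {
    return "pass";
  }
  // All three conditions hold: fail so that generation is repaired once
  // with a parent-level render and the trigger action.
  if (check.parentResolvable) {
    return "fail";
  }
  // Parent cannot be rendered: the closest harness is allowed, but the
  // limitation should be documented in output and state evidence.
  return "pass-with-warning";
}
```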

package/assets/codex/@tayo-dev/rtl-generate/references/intent-model.md ADDED

@@ -0,0 +1,232 @@
+# Interaction Intent Model
+Purpose:
+Convert Puppeteer Replay `runStep` objects into semantic user-level intents that can drive:
+- component discovery hints
+- robust RTL query generation
+- user-visible assertion extraction from marker actions
+- screenshot milestone selection
+This model must remain project-agnostic.
+It stores intent + evidence + confidence.
+---
+## 1) ParsedStep Schema (input)
+A ParsedStep is extracted deterministically from the recording using AST parsing.
+```ts
+type ParsedStep = {
+  index: number;
+  type: string; // e.g. "navigate" | "click" | "change" | ...
+  url?: string;
+  value?: string;
+  keys?: string;
+  selectors?: string[][];
+  location?: { line: number; column: number };
+};
+```
+Notes:
+- `selectors` is an ordered list of selector chains from Puppeteer Replay.
+- The parser must preserve selector order and grouping.
+---
+## 2) InteractionIntent Schema (output)
+```ts
+type QueryHint =
+  | { kind: "role"; role: string; name?: string }
+  | { kind: "label"; text: string }
+  | { kind: "placeholder"; text: string }
+  | { kind: "text"; text: string }
+  | { kind: "css"; selector: string };
+type Evidence = {
+  kind: "selector" | "value" | "stepType";
+  detail: string;
+};
+type InteractionIntent = {
+  index: number;
+  // Normalized action type
+  type:
+    | "navigate"
+    | "click"
+    | "type"
+    | "select"
+    | "assertExists"
+    | "assertNotExists";
+  // For navigation
+  url?: string;
+  // For typing/selecting
+  value?: string;
+  // For assertion intents
+  assertionValue?: string;
+  // Best guess of semantic target
+  role?: string;
+  name?: string;
+  label?: string;
+  // Query hints for RTL generation (ordered best → worst)
+  queryHints: QueryHint[];
+  // Raw selector chosen as best fallback (if any)
+  rawSelector?: string;
+  // Confidence score in [0..1]
+  confidence: number;
+  // Evidence used to derive the intent
+  evidence: Evidence[];
+};
+```
+Key principle:
+- Downstream generation must prefer high-confidence intents and the earliest queryHints.
+- Low-confidence intents must trigger conservative fallbacks and stronger verification.
+---
+## 3) Normalization Rules (ParsedStep → InteractionIntent)
+### Step type mapping
+- `navigate` → intent.type = "navigate", intent.url = step.url
+- `click` → intent.type = "click"
+- `change` (or typing-like step) → intent.type = "type", intent.value = step.value
+- If step indicates option selection (if detectable) → intent.type = "select"
+- A copy keyboard chord marker (`Meta+C` or `Control+C`) that follows a text selection or a focused target
+  may emit intent.type = "assertExists" when the marker rules match.
+If the step type is unknown:
+- map to the closest of click/type/select based on presence of value/selectors.
+- set confidence low (<= 0.4).
+---
+## 3.1) Assertion Marker Detection (Non-Technical Flow)
+Preferred marker pattern:
+- user highlights visible text on the page
+- user presses copy (`Meta+C` on macOS, `Control+C` on Windows/Linux)
+Deterministic interpretation rules:
+1. Detect keyboard copy chord steps.
+2. Look back up to 5 prior non-navigation steps for the focused/selected target.
+3. If a semantic selector (`aria/` or `text/`) exists on that target, create:
+   - `intent.type = "assertExists"`
+   - query hint derived from that semantic selector
+4. If no semantic selector exists, do not invent one:
+   - fall back to a conservative text query only when the recording includes visible text evidence
+   - otherwise skip marker conversion and log low-confidence evidence
+Notes:
+- Marker conversion is additive and never blocks normal intent extraction.
+- Highlight-only without copy is best-effort and lower confidence.
+---
+## 4) Selector Resolution Ladder (deterministic)
+Given `selectors: string[][]`, flatten into a prioritized list of individual selectors preserving order.
+For each selector candidate:
+1. If starts with `aria/`:
+   - Extract accessible name text after `aria/`
+   - Add queryHint:
+     - `{ kind: "role", role: "button", name: <extracted> }` IF used in a click step
+     - `{ kind: "role", role: "textbox", name: <extracted> }` IF used in a type step
+   - Add evidence: selector
+   - Increase confidence (+0.3)
+2. If starts with `text/`:
+   - Extract visible label after `text/`
+   - Add queryHint: `{ kind: "text", text: <extracted> }`
+   - Increase confidence (+0.2)
+3. If selector looks like an input hint (heuristic, project-agnostic):
+   - contains `input`, `textarea`, `[type=` or similar
+   - Add queryHint: `{ kind: "placeholder", text: <unknown> }` only if placeholder is known later
+   - Confidence unchanged unless label is discovered elsewhere
+4. CSS fallback:
+   - If selector is CSS and not aria/text, keep as `{ kind: "css", selector }`
+   - Confidence +0.05 (tiny)
+   - Set `rawSelector` if no better option exists
+Important:
+- Do not invent labels/placeholders from CSS selectors.
+- Do not assume a role from CSS unless step context supports it (click → likely button/link; type → likely textbox).
+---
+## 5) Confidence Calculation
+Start confidence at:
+- 0.5 if step type is known
+- 0.3 if step type is unknown
+Then:
+- +0.3 if aria/ exists
+- +0.2 if text/ exists
+- +0.1 if both aria/ and text/ corroborate similar name
+- -0.2 if only CSS selectors exist
+- Clamp to [0..1]
+If multiple selector types exist, keep multiple queryHints in ranked order.
+---
+## 6) Output Requirements
+For every ParsedStep (except setViewport):
+- Emit an InteractionIntent with:
+  - index
+  - normalized type
+  - confidence
+  - evidence[]
+  - queryHints[] (may include css as last resort)
+Never emit an intent with empty queryHints for click/type/select.
+For assertion intents (`assertExists`/`assertNotExists`), queryHints must include at least one non-css hint when possible.
+If nothing is available, set:
+- queryHints = [{ kind: "css", selector: "<unknown>" }]
+- confidence = 0.1
+---
+## 7) How Downstream Modules Use This
+- Component discovery:
+  - prefer intents with confidence >= 0.6
+  - use role/name hints to infer component context
+- Screenshot milestones:
+  - prefer click intents with confidence >= 0.6
+- RTL generation:
+  - use the first viable queryHint:
+    - role → label → placeholder → text → css (only if allowed by conventions)
+  - for marker assertion intents, emit explicit `expect(...)` assertions tied to user-visible outcomes.
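
Reviewer sketch (not part of the package): sections 4 to 6 of this file can be combined into one function that turns recorded selectors into ranked query hints and a confidence score. `deriveHints`, `DerivedIntent`, and the `typeWasKnown` flag are hypothetical names. The sketch makes these assumptions:

- Confidence follows the section 5 rules only; the per-selector increments listed in section 4 are not added on top.
- Any selector without an `aria/` or `text/` prefix is treated as CSS.
- "Similar name" is read as equality after trimming and lowercasing.
- The placeholder heuristic is left out.

```typescript
// Illustrative sketch only; not the @tayo-dev/rtl implementation.
type QueryHint =
  | { kind: "role"; role: string; name?: string }
  | { kind: "text"; text: string }
  | { kind: "css"; selector: string };

type DerivedIntent = { queryHints: QueryHint[]; confidence: number };

function deriveHints(
  type: "click" | "type",
  selectors: string[][],
  typeWasKnown = true,
): DerivedIntent {
  // Flatten selector chains while preserving their recorded order.
  const flat = selectors.reduce<string[]>((acc, chain) => acc.concat(chain), []);
  if (flat.length === 0) {
    // Section 6 fallback when nothing is available.
    return { queryHints: [{ kind: "css", selector: "<unknown>" }], confidence: 0.1 };
  }

  const roleHints: QueryHint[] = [];
  const textHints: QueryHint[] = [];
  const cssHints: QueryHint[] = [];
  const ariaNames: string[] = [];
  const textNames: string[] = [];

  for (const selector of flat) {
    if (selector.startsWith("aria/")) {
      const name = selector.slice("aria/".length);
      ariaNames.push(name);
      // Role is inferred from step context: type -> textbox, click -> button.
      roleHints.push({ kind: "role", role: type === "type" ? "textbox" : "button", name });
    } else if (selector.startsWith("text/")) {
      const text = selector.slice("text/".length);
      textNames.push(text);
      textHints.push({ kind: "text", text });
    } else {
      // Assumption: everything else is CSS; labels are never invented from it.
      cssHints.push({ kind: "css", selector });
    }
  }

  let confidence = typeWasKnown ? 0.5 : 0.3;
  if (ariaNames.length > 0) confidence += 0.3;
  if (textNames.length > 0) confidence += 0.2;
  const norm = (s: string) => s.trim().toLowerCase();
  if (ariaNames.some((a) => textNames.some((t) => norm(a) === norm(t)))) confidence += 0.1;
  if (ariaNames.length === 0 && textNames.length === 0) confidence -= 0.2;

  return {
    // Ranked best to worst: role, then text, then css.
    queryHints: [...roleHints, ...textHints, ...cssHints],
    confidence: Math.min(1, Math.max(0, confidence)),
  };
}
```

A click step recorded with only `#save-btn` ends up at about 0.3 under these rules. That is below the 0.6 threshold section 7 sets for component discovery and screenshot milestones.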

package/assets/codex/@tayo-dev/rtl-generate/references/mock-store.md ADDED

@@ -0,0 +1,18 @@
+# Persistent Mock Store
+Location:
+packages/**/src/tests/mock-store/
+Rules:
+- Deterministic IDs (ORG_001, INV_001)
+- No random UUIDs
+- Export seed objects
+- Central index.ts exports all fixtures
+Example:
+export const ORG_001 = {
+  id: "ORG_001",
+  name: "Test Organisation",
+  active: true
+};

package/assets/codex/@tayo-dev/rtl-generate/references/quality-scoring.md ADDED

@@ -0,0 +1,189 @@
+# Test Quality Scoring (Tayo)
+Purpose:
+Provide a deterministic, explainable score for each generated test file so Tayo can:
+- measure whether changes improve quality,
+- avoid regressions,
+- and prioritize upgrades (rewrite suggestions) over time.
+Scoring must be:
+- project-agnostic,
+- deterministic (same input => same score),
+- explainable (every point has a reason),
+- bounded and comparable across runs.
+---
+## Output
+Each generated test must produce:
+```jsonc
+{
+  "overall": 0, // 0-100
+  "grade": "F|D|C|B|A",
+  "dimensions": {
+    "robustness": 0, // 0-25
+    "readability": 0, // 0-15
+    "assertionStrength": 0, // 0-20
+    "mockFidelity": 0, // 0-20
+    "maintainability": 0, // 0-20
+  },
+  "signals": {
+    "usesCssSelectors": false,
+    "usesTestId": false,
+    "usesRoleQueries": true,
+    "hasMeaningfulAssertions": true,
+    "hasMarkerDerivedAssertions": true,
+    "hasDeterministicFixtures": true,
+    "hasProviderWrapper": true,
+    "hasUiLibraryReimplementation": false
+  },
+  "reasons": [
+    {
+      "dimension": "robustness",
+      "delta": -8,
+      "reason": "Uses brittle CSS selectors for primary queries."
+    },
+    {
+      "dimension": "assertionStrength",
+      "delta": 6,
+      "reason": "Asserts user-visible success outcome (toast/dialog close)."
+    }
+  ]
+}
+```
+---
+## Dimension Scoring Rubric
+### A) Robustness (0–25)
+Start at 25. Subtract:
+- -10: uses CSS selectors for user interactions or assertions
+- -6: primary queries use text-only selectors where role/label exists
+- -6: heavy reliance on exact, fragile UI text (not regex or role-based name)
+- -3: missing `findBy*` / waits where async UI is expected (flakiness risk)
+- -15: reimplements UI-library components in test mocks
+Add back (up to cap 25):
+- +5: uses getByRole with accessible names for main interactions
+- +3: uses getByLabelText for form fields
+- +2: avoids querying implementation details
+### B) Readability (0–15)
+Start at 10. Adjust:
+- +3: helper functions are used for repeated flows (setup/fill/submit)
+- +2: test names align with domain behavior (create organisation, etc.)
+- +2: clear Arrange/Act/Assert separation
+- -4: confusing naming mismatch ("profile" vs "organisation")
+- -3: large monolithic tests with repeated code
+Cap 15.
+### C) Assertion Strength (0–20)
+Start at 8. Add:
+- +6: asserts user-visible success outcome (toast, navigation, dialog close, list update)
+- +6: asserts correct error outcome (validation message, error toast)
+- +4: asserts API call was made with expected payload shape
+- +3: includes marker-derived assertions from non-technical checkpoints (semantic dblClick markers)
+- +2: asserts disabled state / loading state when relevant
+Subtract:
+- -8: only asserts mock called (no user-visible assertion)
+- -6: asserts internal implementation details only
+Cap 20.
+### D) Mock Fidelity (0–20)
+Start at 10. Add:
+- +6: mocks match real API hook signature (callbacks/args)
+- +4: uses persistent deterministic fixtures (mock-store)
+- +3: covers both success and error branches with realistic responses
+- +2: clears mocks properly between tests
+Subtract:
+- -8: random/inline fixtures created ad hoc each run
+- -6: mocks don’t reflect actual dependency contract (false positives)
+- -4: mocks rely on global state without reset
+- -20: UI-library component reimplementation detected (policy violation)
+Cap 20.
+### E) Maintainability (0–20)
+Start at 10. Add:
+- +5: uses centralized fixtures (mock-store)
+- +4: minimal coupling to UI structure (role/label-based)
+- +3: test file structure matches project conventions (imports, cleanup)
+- +2: avoids duplicated test generation (indexed in state)
+Subtract:
+- -6: hardcoded selectors tied to layout/CSS
+- -4: missing shared fixtures; repeated data creation
+- -4: reruns regenerate different data or duplicate files
+- -10: replaces design-system/UI-library modules with custom stand-ins
+Cap 20.
+---
+## Grade Mapping
+- A: 90–100
+- B: 80–89
+- C: 70–79
+- D: 60–69
+- F: 0–59
+Hard fail cap:
+- If `hasUiLibraryReimplementation` is true, cap final `overall` at 59 and `grade` at `F`.
+- Always add reason:
+  - `Reimplemented UI library components; behavioral fidelity reduced.`
+---
+## Deterministic Extraction Rules
+To score, Tayo inspects the generated test file text and checks for patterns:
+- CSS selectors: `container.querySelector`, `document.querySelector`, or `screen.*` calls using selectors (should be absent)
+- Role queries: `getByRole`, `findByRole`
+- Label queries: `getByLabelText`, `findByLabelText`
+- TestId queries: `getByTestId`
+- User-visible assertions: `toBeInTheDocument` on toast/dialog/message; `queryByRole('dialog')` absence; route change assertions if present
+- Marker-derived assertions: inline marker comments or explicit checkpoint assertion helpers
+- Mock store usage: imports from detected mock-store path
+- Mock reset: `beforeEach`, `afterEach`, `cleanup`, `vi.clearAllMocks`, etc.
+- UI-library reimplementation:
+  - `vi.mock`/`jest.mock` targeting known UI-library modules and returning replacement component objects/functions.
+This scoring is heuristic but deterministic.
+---
+## Evolution Rules
+- Every run stores a score snapshot in `.tayo/state.json`.
+- When Tayo changes its generation logic, compare:
+  - latest score vs previous score for same component (or same recording)
+- If score drops by >= 5 points:
+  - warn about regression
+  - keep old test unless user explicitly opts in to overwrite
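
Reviewer sketch (not part of the package): the grade mapping, hard fail cap, and regression rule in this file can be expressed as the two functions below. The file never states how `overall` is derived, so the sketch assumes it is the sum of the five dimension scores after each is clamped to its documented range; the caps do add up to 100. `finalizeScore` and `isRegression` are hypothetical names.

```typescript
// Illustrative sketch only; not the @tayo-dev/rtl implementation.
type Dimension =
  | "robustness"
  | "readability"
  | "assertionStrength"
  | "mockFidelity"
  | "maintainability";

type Grade = "A" | "B" | "C" | "D" | "F";

// Per-dimension caps taken from the rubric (25 + 15 + 20 + 20 + 20 = 100).
const CAPS: Record<Dimension, number> = {
  robustness: 25,
  readability: 15,
  assertionStrength: 20,
  mockFidelity: 20,
  maintainability: 20,
};

function finalizeScore(
  dimensions: Record<Dimension, number>,
  hasUiLibraryReimplementation: boolean,
): { overall: number; grade: Grade } {
  // Assumption: overall is the sum of the clamped dimension scores.
  let overall = 0;
  for (const key of Object.keys(CAPS) as Dimension[]) {
    overall += Math.min(CAPS[key], Math.max(0, dimensions[key]));
  }

  // Hard fail cap: UI-library reimplementation limits the result to 59 / F.
  if (hasUiLibraryReimplementation) {
    overall = Math.min(overall, 59);
  }

  const grade: Grade =
    overall >= 90 ? "A" : overall >= 80 ? "B" : overall >= 70 ? "C" : overall >= 60 ? "D" : "F";
  return { overall, grade };
}

// Evolution rule: a drop of 5 or more points counts as a regression.
function isRegression(previousOverall: number, latestOverall: number): boolean {
  return previousOverall - latestOverall >= 5;
}
```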