@sun-asterisk/sungen 3.0.1 → 3.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (132)
  1. package/dist/cli/commands/challenge.d.ts.map +1 -1
  2. package/dist/cli/commands/challenge.js +9 -2
  3. package/dist/cli/commands/challenge.js.map +1 -1
  4. package/dist/cli/commands/delivery.d.ts.map +1 -1
  5. package/dist/cli/commands/delivery.js +3 -2
  6. package/dist/cli/commands/delivery.js.map +1 -1
  7. package/dist/cli/commands/generate.d.ts.map +1 -1
  8. package/dist/cli/commands/generate.js +8 -0
  9. package/dist/cli/commands/generate.js.map +1 -1
  10. package/dist/exporters/csv-exporter.d.ts.map +1 -1
  11. package/dist/exporters/csv-exporter.js +92 -76
  12. package/dist/exporters/csv-exporter.js.map +1 -1
  13. package/dist/exporters/spec-parser.d.ts.map +1 -1
  14. package/dist/exporters/spec-parser.js +3 -1
  15. package/dist/exporters/spec-parser.js.map +1 -1
  16. package/dist/generators/test-generator/adapters/adapter-interface.d.ts +2 -0
  17. package/dist/generators/test-generator/adapters/adapter-interface.d.ts.map +1 -1
  18. package/dist/generators/test-generator/adapters/playwright/playwright-adapter.d.ts +1 -0
  19. package/dist/generators/test-generator/adapters/playwright/playwright-adapter.d.ts.map +1 -1
  20. package/dist/generators/test-generator/adapters/playwright/playwright-adapter.js.map +1 -1
  21. package/dist/generators/test-generator/adapters/playwright/templates/imports.hbs +3 -0
  22. package/dist/generators/test-generator/adapters/playwright/templates/scenario.hbs +19 -1
  23. package/dist/generators/test-generator/code-generator.d.ts +12 -0
  24. package/dist/generators/test-generator/code-generator.d.ts.map +1 -1
  25. package/dist/generators/test-generator/code-generator.js +137 -4
  26. package/dist/generators/test-generator/code-generator.js.map +1 -1
  27. package/dist/generators/test-generator/patterns/database-patterns.d.ts +6 -0
  28. package/dist/generators/test-generator/patterns/database-patterns.d.ts.map +1 -0
  29. package/dist/generators/test-generator/patterns/database-patterns.js +95 -0
  30. package/dist/generators/test-generator/patterns/database-patterns.js.map +1 -0
  31. package/dist/generators/test-generator/patterns/expect-patterns.d.ts +3 -0
  32. package/dist/generators/test-generator/patterns/expect-patterns.d.ts.map +1 -0
  33. package/dist/generators/test-generator/patterns/expect-patterns.js +54 -0
  34. package/dist/generators/test-generator/patterns/expect-patterns.js.map +1 -0
  35. package/dist/generators/test-generator/patterns/index.d.ts +1 -0
  36. package/dist/generators/test-generator/patterns/index.d.ts.map +1 -1
  37. package/dist/generators/test-generator/patterns/index.js +8 -1
  38. package/dist/generators/test-generator/patterns/index.js.map +1 -1
  39. package/dist/generators/test-generator/step-mapper.d.ts +6 -0
  40. package/dist/generators/test-generator/step-mapper.d.ts.map +1 -1
  41. package/dist/generators/test-generator/step-mapper.js +8 -0
  42. package/dist/generators/test-generator/step-mapper.js.map +1 -1
  43. package/dist/generators/test-generator/template-engine.d.ts +4 -0
  44. package/dist/generators/test-generator/template-engine.d.ts.map +1 -1
  45. package/dist/generators/test-generator/template-engine.js +1 -1
  46. package/dist/generators/test-generator/template-engine.js.map +1 -1
  47. package/dist/generators/test-generator/utils/runtime-data-transformer.d.ts +1 -1
  48. package/dist/generators/test-generator/utils/runtime-data-transformer.d.ts.map +1 -1
  49. package/dist/generators/test-generator/utils/runtime-data-transformer.js +5 -5
  50. package/dist/generators/test-generator/utils/runtime-data-transformer.js.map +1 -1
  51. package/dist/harness/audit.js +1 -1
  52. package/dist/harness/capability-plan.js +1 -1
  53. package/dist/harness/catalog/drivers.yaml +1 -1
  54. package/dist/harness/catalog/universal-viewpoints.yaml +1 -1
  55. package/dist/harness/challenge.d.ts +1 -0
  56. package/dist/harness/challenge.d.ts.map +1 -1
  57. package/dist/harness/challenge.js +49 -2
  58. package/dist/harness/challenge.js.map +1 -1
  59. package/dist/harness/data-driven-lint.d.ts +7 -0
  60. package/dist/harness/data-driven-lint.d.ts.map +1 -0
  61. package/dist/harness/data-driven-lint.js +153 -0
  62. package/dist/harness/data-driven-lint.js.map +1 -0
  63. package/dist/harness/flow-plan.js +1 -1
  64. package/dist/harness/parse.d.ts +2 -0
  65. package/dist/harness/parse.d.ts.map +1 -1
  66. package/dist/harness/parse.js +16 -0
  67. package/dist/harness/parse.js.map +1 -1
  68. package/dist/harness/query-catalog.d.ts +48 -0
  69. package/dist/harness/query-catalog.d.ts.map +1 -0
  70. package/dist/harness/query-catalog.js +0 -0
  71. package/dist/harness/query-catalog.js.map +1 -0
  72. package/dist/harness/script-check.d.ts.map +1 -1
  73. package/dist/harness/script-check.js +11 -5
  74. package/dist/harness/script-check.js.map +1 -1
  75. package/dist/orchestrator/templates/ai-instructions/claude-agent-challenge.md +3 -2
  76. package/dist/orchestrator/templates/ai-instructions/claude-skill-gherkin-syntax.md +40 -0
  77. package/dist/orchestrator/templates/ai-instructions/claude-skill-harness-audit.md +1 -1
  78. package/dist/orchestrator/templates/ai-instructions/claude-skill-tc-generation.md +19 -0
  79. package/dist/orchestrator/templates/ai-instructions/claude-skill-tc-review.md +1 -0
  80. package/dist/orchestrator/templates/ai-instructions/claude-skill-test-design-techniques.md +6 -0
  81. package/dist/orchestrator/templates/ai-instructions/github-skill-sungen-gherkin-syntax.md +40 -0
  82. package/dist/orchestrator/templates/ai-instructions/github-skill-sungen-harness-audit.md +1 -1
  83. package/dist/orchestrator/templates/ai-instructions/github-skill-sungen-tc-generation.md +19 -0
  84. package/dist/orchestrator/templates/ai-instructions/github-skill-sungen-tc-review.md +1 -0
  85. package/dist/orchestrator/templates/ai-instructions/github-skill-sungen-test-design-techniques.md +6 -0
  86. package/dist/orchestrator/templates/specs-db.d.ts +26 -0
  87. package/dist/orchestrator/templates/specs-db.d.ts.map +1 -0
  88. package/dist/orchestrator/templates/specs-db.js +193 -0
  89. package/dist/orchestrator/templates/specs-db.js.map +1 -0
  90. package/dist/orchestrator/templates/specs-db.ts +169 -0
  91. package/dist/orchestrator/templates/specs-test-data.ts +76 -15
  92. package/docs/orchestration-spec.md +3 -3
  93. package/package.json +2 -2
  94. package/src/cli/commands/challenge.ts +6 -2
  95. package/src/cli/commands/delivery.ts +3 -2
  96. package/src/cli/commands/generate.ts +8 -0
  97. package/src/exporters/csv-exporter.ts +22 -6
  98. package/src/exporters/spec-parser.ts +3 -1
  99. package/src/generators/test-generator/adapters/adapter-interface.ts +2 -1
  100. package/src/generators/test-generator/adapters/playwright/playwright-adapter.ts +1 -1
  101. package/src/generators/test-generator/adapters/playwright/templates/imports.hbs +3 -0
  102. package/src/generators/test-generator/adapters/playwright/templates/scenario.hbs +19 -1
  103. package/src/generators/test-generator/code-generator.ts +133 -4
  104. package/src/generators/test-generator/patterns/database-patterns.ts +96 -0
  105. package/src/generators/test-generator/patterns/expect-patterns.ts +49 -0
  106. package/src/generators/test-generator/patterns/index.ts +5 -0
  107. package/src/generators/test-generator/step-mapper.ts +9 -0
  108. package/src/generators/test-generator/template-engine.ts +5 -2
  109. package/src/generators/test-generator/utils/runtime-data-transformer.ts +5 -5
  110. package/src/harness/audit.ts +1 -1
  111. package/src/harness/capability-plan.ts +1 -1
  112. package/src/harness/catalog/drivers.yaml +1 -1
  113. package/src/harness/catalog/universal-viewpoints.yaml +1 -1
  114. package/src/harness/challenge.ts +47 -2
  115. package/src/harness/data-driven-lint.ts +119 -0
  116. package/src/harness/flow-plan.ts +1 -1
  117. package/src/harness/parse.ts +12 -0
  118. package/src/harness/query-catalog.ts +0 -0
  119. package/src/harness/script-check.ts +12 -6
  120. package/src/orchestrator/templates/ai-instructions/claude-agent-challenge.md +3 -2
  121. package/src/orchestrator/templates/ai-instructions/claude-skill-gherkin-syntax.md +40 -0
  122. package/src/orchestrator/templates/ai-instructions/claude-skill-harness-audit.md +1 -1
  123. package/src/orchestrator/templates/ai-instructions/claude-skill-tc-generation.md +19 -0
  124. package/src/orchestrator/templates/ai-instructions/claude-skill-tc-review.md +1 -0
  125. package/src/orchestrator/templates/ai-instructions/claude-skill-test-design-techniques.md +6 -0
  126. package/src/orchestrator/templates/ai-instructions/github-skill-sungen-gherkin-syntax.md +40 -0
  127. package/src/orchestrator/templates/ai-instructions/github-skill-sungen-harness-audit.md +1 -1
  128. package/src/orchestrator/templates/ai-instructions/github-skill-sungen-tc-generation.md +19 -0
  129. package/src/orchestrator/templates/ai-instructions/github-skill-sungen-tc-review.md +1 -0
  130. package/src/orchestrator/templates/ai-instructions/github-skill-sungen-test-design-techniques.md +6 -0
  131. package/src/orchestrator/templates/specs-db.ts +169 -0
  132. package/src/orchestrator/templates/specs-test-data.ts +76 -15
@@ -34,6 +34,8 @@ export interface ChallengeReport {
    shallowThemes: string[];
    // Depth critic
    collectionClaimSingular: ChallengeFinding[];
+   // Data-driven critic — @cases-worthy gaps (spec-independent)
+   dataDriven: ChallengeFinding[];
    // Novelty critic (deterministic prompts → AI agent fills candidates)
    noveltyPrompts: string[];
    // Roll-up
@@ -47,6 +49,10 @@ const PLURAL_NOUN = /\b(cards|items|products|rows|results|prices|entries|records
  const QUANTIFIER = /\b(all|every|each)\b/i;
  const DISPLAY_VERB = /\b(displays?|shows?|lists?|grid|contains?)\b/i;

+ // Title lexicon implying a CLASS of inputs (→ a data-driven @cases candidate). Used by
+ // Detector A. Spec-independent — reads the title, not the spec.
+ const INPUT_CLASS_LEXICON = /\b(invalid|formats?|boundary|range|min|max|too (?:long|short)|special char|each|various|classes)\b/i;
+
  /** Risk lenses the Novelty critic prompts the AI to explore (beyond the catalog). */
  const NOVELTY_LENSES = [
    'double-submit / rapid repeat of the primary action (duplicate side-effects?)',
@@ -88,17 +94,49 @@ export function buildChallenge(screenDir: string, screenName: string): Challenge
      }
    }

-   // 3. Novelty critic — deterministic prompts; the AI agent expands these into candidates.
+   // 3. Data-driven critic — surface @cases-worthy gaps (spec-independent).
+   const dataDriven: ChallengeFinding[] = [];
+   const clustered = new Set<string>();
+   // Detector B (high precision): ≥2 non-@cases scenarios sharing the SAME step skeleton
+   // (stepSkeleton normalizes {{vars}}/quoted values) → they differ only by data → collapse.
+   const bySkeleton = new Map<string, ScenarioInfo[]>();
+   for (const s of scenarios) {
+     if (s.manual || s.casesDataset || !s.stepSkeleton) continue;
+     (bySkeleton.get(s.stepSkeleton) ?? bySkeleton.set(s.stepSkeleton, []).get(s.stepSkeleton)!).push(s);
+   }
+   for (const group of bySkeleton.values()) {
+     if (group.length < 2) continue;
+     group.forEach((s) => clustered.add(s.name));
+     dataDriven.push({
+       scenario: group.map((s) => s.name).join(' | '),
+       issue: `${group.length} scenarios share the same steps and differ only by input value (data-variants).`,
+       suggestion: `Collapse into ONE \`@cases:<dataset>\` — ${group.length} rows in test-data, each \`{{col}}\` a column. See Gherkin → Advanced → Data-driven.`,
+     });
+   }
+   // Detector A (advisory): a lone scenario whose title implies a class of inputs → suggest @cases.
+   for (const s of scenarios) {
+     if (s.manual || s.casesDataset || clustered.has(s.name)) continue;
+     if (INPUT_CLASS_LEXICON.test(s.name)) {
+       dataDriven.push({
+         scenario: s.name,
+         issue: 'Title reads like a CLASS of inputs (validation/boundary/error) but tests a single value.',
+         suggestion: 'Consider `@cases` to cover the EP/boundary classes (one row per valid/invalid class), not just one value.',
+       });
+     }
+   }
+
+   // 4. Novelty critic — deterministic prompts; the AI agent expands these into candidates.
    const noveltyPrompts = NOVELTY_LENSES.map((l) => `Find 1 non-obvious, valuable scenario via: ${l}`);

    // Roll-up — exploration readiness signals (not a fake score).
    const explorationReadiness: string[] = [];
+   if (dataDriven.length) explorationReadiness.push(`${dataDriven.length} data-driven gap(s) — scenarios that should be one \`@cases\` (collapse data-variants / cover EP-boundary classes).`);
    if (collectionClaimSingular.length) explorationReadiness.push(`${collectionClaimSingular.length} title↔assertion gap(s) — deterministic depth critic flagged these; an AI Business-Depth critic should confirm + fix.`);
    if (overCovered.length) explorationReadiness.push(`${overCovered.length} possibly over-covered area(s) — rebalance toward correctness.`);
    if (shallowThemes.length) explorationReadiness.push(`Shallow themes: ${shallowThemes.join(', ')}.`);
    explorationReadiness.push('Novelty candidates are NOT generated deterministically — run the `sungen-challenge` agent (Claude) or its inline criteria (Copilot) to propose them, then QA accept/reject (≤20% of official, no auto-merge).');

-   return { screen: screenName, overCovered, shallowThemes, collectionClaimSingular, noveltyPrompts, explorationReadiness };
+   return { screen: screenName, overCovered, shallowThemes, collectionClaimSingular, dataDriven, noveltyPrompts, explorationReadiness };
  }

  /** Render the Challenge Report as Markdown (advisory — not part of the official suite). */
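Detector B in the hunk above is essentially a group-by on a normalized step string. The following is a minimal sketch of that idea under stated assumptions: `Scn` is a hypothetical, trimmed-down stand-in for `ScenarioInfo`, and `skeletonOf` is an assumed normalizer (the package's real `stepSkeleton` is computed outside this diff and may differ).

```typescript
// Hypothetical, simplified scenario shape; the real ScenarioInfo has more fields.
interface Scn {
  name: string;
  steps: string[];
  casesDataset?: string;
}

// Assumed normalizer: replaces {{vars}} and quoted values with a placeholder so that
// scenarios differing only by data produce the same skeleton string.
function skeletonOf(steps: string[]): string {
  return steps
    .map((s) => s.replace(/\{\{[^}]*\}\}|"[^"]*"/g, '<v>').trim().toLowerCase())
    .join('\n');
}

// Groups of two or more non-@cases scenarios that share an identical skeleton.
function findDataVariants(scenarios: Scn[]): string[][] {
  const bySkeleton = new Map<string, Scn[]>();
  scenarios.forEach((s) => {
    if (s.casesDataset) return; // already data-driven, nothing to collapse
    const key = skeletonOf(s.steps);
    const group = bySkeleton.get(key);
    if (group) group.push(s);
    else bySkeleton.set(key, [s]);
  });
  const out: string[][] = [];
  bySkeleton.forEach((group) => {
    if (group.length >= 2) out.push(group.map((s) => s.name));
  });
  return out;
}

const variants = findDataVariants([
  { name: 'Rejects plain address', steps: ['User fill [Email] field with "plainaddress"'] },
  { name: 'Rejects missing domain', steps: ['User fill [Email] field with "a@"'] },
  { name: 'Opens help', steps: ['User click [Help] link'] },
]);
```

The sketch uses an explicit `get`/`set` branch instead of the one-line `??` chain in the hunk; the behavior is intended to be the same, only easier to read.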
@@ -114,6 +152,13 @@ export function renderChallengeMarkdown(r: ChallengeReport): string {
    } else lines.push('_none_');
    lines.push('');

+   lines.push('## Data-driven — scenarios that should be `@cases` (one test case, many inputs)');
+   if (r.dataDriven.length) {
+     lines.push('| Scenario(s) | Issue | Suggested |', '|---|---|---|');
+     for (const f of r.dataDriven) lines.push(`| ${f.scenario} | ${f.issue} | ${f.suggestion} |`);
+   } else lines.push('_none_');
+   lines.push('');
+
    lines.push('## Coverage — possibly over-covered / shallow');
    if (r.overCovered.length) for (const o of r.overCovered) lines.push(`- **${o.bucket}** — ${o.note}`);
    if (r.shallowThemes.length) lines.push(`- Shallow themes: ${r.shallowThemes.join(', ')}`);
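One thing to watch in the table rendering above: Detector B joins scenario names with ` | `, and in GitHub-flavored Markdown an unescaped pipe inside a table cell is read as a column separator. A minimal sketch of escaping cell text follows; `cell` and `renderRow` are hypothetical helpers, not part of the package.

```typescript
// Hypothetical helper: escape characters that would break a GFM table cell.
function cell(text: string): string {
  return text.replace(/\|/g, '\\|').replace(/\r?\n/g, ' ');
}

// Hypothetical helper: render one table row from already-collected cell values.
function renderRow(cells: string[]): string {
  return `| ${cells.map(cell).join(' | ')} |`;
}

const row = renderRow(['Scenario A | Scenario B', 'same steps', 'collapse to @cases']);
```

With this, the joined scenario names stay inside the first column instead of spilling into the next ones.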
@@ -0,0 +1,119 @@
+ /**
+  * Deterministic lint for the data-driven features (`@cases` + `@query`).
+  *
+  * Advisory (warn, never block) — surfaced after `sungen generate`. Catches the silent
+  * footguns: a `{{var}}` in a @cases scenario that is neither a row column nor a top-level
+  * key, a name collision that shadows a top-level value, inconsistent dataset rows, and a
+  * @query whose param has no value / whose name doesn't resolve.
+  */
+ import * as fs from 'fs';
+ import * as path from 'path';
+ import { parse as parseYaml } from 'yaml';
+ import { GherkinParser, ParsedScenario } from '../generators/gherkin-parser';
+ import { resolveQuery, lintCatalog } from './query-catalog';
+
+ export interface DataDrivenWarning {
+   scenario?: string;
+   message: string;
+ }
+
+ const ROW_META = new Set(['case', 'name', 'label', '__label']);
+
+ /** Collect the inner text of every `{{…}}` reference in a scenario's steps. */
+ function collectRefs(sc: ParsedScenario): string[] {
+   const refs = new Set<string>();
+   for (const st of sc.steps || []) {
+     for (const m of (st.text || '').matchAll(/\{\{\s*([^}]+?)\s*\}\}/g)) refs.add(m[1]);
+   }
+   return [...refs];
+ }
+
+ /** Keys overridden in a `@query:name(a=…,b=…)` annotation. */
+ function overrideKeys(raw?: string): Set<string> {
+   const out = new Set<string>();
+   if (!raw) return out;
+   for (const part of raw.split(',')) {
+     const eq = part.indexOf('=');
+     if (eq > 0) out.add(part.slice(0, eq).trim());
+   }
+   return out;
+ }
+
+ /** Lint the @cases/@query usage of one screen/flow directory. Returns advisory warnings. */
+ export function lintDataDriven(screenDir: string, cwd: string = process.cwd()): DataDrivenWarning[] {
+   const base = path.basename(screenDir);
+   const isFlow = path.basename(path.dirname(screenDir)) === 'flows';
+   const screenName = isFlow ? `flows/${base}` : base;
+   const featurePath = path.join(screenDir, 'features', `${base}.feature`);
+   if (!fs.existsSync(featurePath)) return [];
+
+   let scenarios: ParsedScenario[];
+   try {
+     scenarios = new GherkinParser().parseFeatureFile(featurePath).scenarios || [];
+   } catch {
+     return [];
+   }
+
+   const tdPath = path.join(screenDir, 'test-data', `${base}.yaml`);
+   const td: Record<string, any> = tdPath && fs.existsSync(tdPath) ? parseYaml(fs.readFileSync(tdPath, 'utf8')) || {} : {};
+   const topKeys = new Set(Object.keys(td));
+   const warns: DataDrivenWarning[] = [];
+
+   for (const sc of scenarios) {
+     const tags: string[] = sc.tags || [];
+     const refs = collectRefs(sc);
+
+     // --- @cases -----------------------------------------------------------
+     const casesTag = tags.find((t) => t.startsWith('@cases:'));
+     if (casesTag) {
+       const ds = casesTag.slice('@cases:'.length).trim();
+       const rows = td[ds];
+       if (!Array.isArray(rows)) {
+         warns.push({ scenario: sc.name, message: `@cases:${ds} → dataset "${ds}" is missing or not a list in test-data.` });
+       } else {
+         const colSets = rows.map((r) => new Set(Object.keys(r || {}).filter((k) => !ROW_META.has(k))));
+         const allCols = new Set<string>();
+         colSets.forEach((s) => s.forEach((c) => allCols.add(c)));
+         for (const c of allCols) {
+           const missing = colSets.filter((s) => !s.has(c)).length;
+           if (missing) warns.push({ scenario: sc.name, message: `@cases:${ds} → column "${c}" is missing in ${missing}/${rows.length} row(s) — rows are inconsistent.` });
+           if (topKeys.has(c)) warns.push({ scenario: sc.name, message: `@cases:${ds} → "${c}" is both a dataset column and a top-level test-data key — the row value shadows the top-level one.` });
+         }
+         for (const r of refs) {
+           const head = r.split(/[.[]/)[0];
+           if (!allCols.has(head) && !topKeys.has(head)) {
+             warns.push({ scenario: sc.name, message: `@cases:${ds} → {{${r}}} is neither a dataset column nor a top-level test-data key.` });
+           }
+         }
+       }
+     }
+
+     // --- @query -----------------------------------------------------------
+     for (const t of tags) {
+       const m = t.match(/^@query:([A-Za-z_][A-Za-z0-9_]*)(?:\((.*)\))?$/);
+       if (!m) continue;
+       const name = m[1];
+       const overrides = overrideKeys(m[2]);
+       let entry;
+       try {
+         entry = resolveQuery(name, screenName, cwd);
+       } catch (e: any) {
+         warns.push({ scenario: sc.name, message: e?.message || `@query:${name} → cannot resolve query.` });
+         continue;
+       }
+       for (const p of entry.params || []) {
+         if (!overrides.has(p) && !topKeys.has(p)) {
+           warns.push({ scenario: sc.name, message: `@query:${name} → param "${p}" has no value: not in the annotation and not a top-level test-data key.` });
+         }
+       }
+     }
+   }
+
+   // Catalog-level lint (SELECT-only, params declared/used, datasource present).
+   try {
+     for (const e of lintCatalog(screenName, null, cwd).errors) warns.push({ message: e });
+   } catch {
+     /* no catalog → nothing to lint */
+   }
+   return warns;
+ }
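The unresolved-placeholder check in the file above can be exercised without the file system or the Gherkin parser. The following is a minimal sketch assuming plain string arrays in place of the `Set`s used by the package; `refsIn` and `unresolved` are hypothetical names, and the two regular expressions are copied from the hunk.

```typescript
// Collect the inner text of every {{...}} placeholder in one step text.
function refsIn(text: string): string[] {
  const re = /\{\{\s*([^}]+?)\s*\}\}/g;
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) out.push(m[1]);
  return out;
}

// Return the references whose head segment (before the first "." or "[") is
// neither a dataset column nor a top-level test-data key.
function unresolved(text: string, columns: string[], topKeys: string[]): string[] {
  return refsIn(text).filter((r) => {
    const head = r.split(/[.[]/)[0];
    return columns.indexOf(head) === -1 && topKeys.indexOf(head) === -1;
  });
}

const bad = unresolved(
  'User fill [Email] field with {{email}} and see {{user.name}} then {{typo}}',
  ['email'], // dataset columns
  ['user'], // top-level test-data keys
);
```

Here `{{email}}` resolves as a column, `{{user.name}}` resolves through its head `user`, and only `{{typo}}` is reported.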
@@ -5,7 +5,7 @@
   * leg's SELECTOR READINESS + capability, folds in the manual-reason taxonomy
   * (capability-plan) and the run-test contract (flow-check), and emits a run-test
   * PLAN. Automates the manual diagnosis done while healing cart-and-filter.
-  * See reports/sungen_phase2c_spec.md.
+  * See docs/spec/sungen_phase2c_spec.md.
   */
  import * as fs from 'fs';
  import * as path from 'path';
@@ -30,6 +30,8 @@ export interface ScenarioInfo {
    haystack: string; // lowercase name + steps text (for keyword coverage)
    stepsText: string; // lowercase steps ONLY (name excluded) — for claim-proof
    vpId?: string; // raw leading ID token of the title (project's scheme: VP0-001, MS-HP-001, VP-LIST-001)
+   casesDataset?: string; // @cases:<dataset> — data-driven; one scenario expands to N row-tests
+   queryRefs?: string[]; // named queries referenced by this scenario (inline `query [name]` + @query: tags)
  }

  /** Format-tolerant: is this token an ID (project's scheme), not a prose word?
@@ -98,6 +100,14 @@ const PRIORITY_TAGS: Record<string, Priority> = { '@high': 'high', '@normal': 'n
  function classifyScenario(sc: ParsedScenario): ScenarioInfo {
    const tags = sc.tags || [];
    const manual = tags.includes('@manual');
+   const casesTag = tags.find((t) => t.startsWith('@cases:'));
+   const casesDataset = casesTag ? casesTag.slice('@cases:'.length).trim() : undefined;
+   // Named-query references: @query:<name> tags + inline `query [name]` step refs.
+   const queryRefs = new Set<string>();
+   for (const t of tags) if (t.startsWith('@query:')) { const n = t.slice('@query:'.length).trim(); if (n) queryRefs.add(n); }
+   for (const step of (sc.steps as ParsedStep[]) || []) {
+     for (const m of (step.text || '').matchAll(/\bquery\s+\[([A-Za-z_][A-Za-z0-9_]*)\]/gi)) queryRefs.add(m[1]);
+   }
    let priority: Priority = 'unknown';
    for (const t of tags) if (PRIORITY_TAGS[t]) priority = PRIORITY_TAGS[t];

@@ -152,6 +162,8 @@ function classifyScenario(sc: ParsedScenario): ScenarioInfo {
      haystack: textParts.join(' ').toLowerCase(),
      stepsText: stepTextParts.join(' ').toLowerCase(),
      vpId,
+     casesDataset,
+     queryRefs: queryRefs.size ? [...queryRefs] : undefined,
    };
  }
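The query-reference collection added to `classifyScenario` can be sketched on plain arrays. This is a minimal illustration, with `queryRefsOf` as a hypothetical name and strings standing in for parsed steps. It mirrors one detail of the hunk as written: a tag that carries parameter overrides keeps its parenthesised suffix, because the tag branch slices everything after `@query:` (the lint file, by contrast, matches the bare name with a regex).

```typescript
// Collect named-query references from tags and from inline `query [name]` step text.
function queryRefsOf(tags: string[], steps: string[]): string[] {
  const refs: string[] = [];
  const add = (n: string): void => {
    if (n && refs.indexOf(n) === -1) refs.push(n); // de-duplicate, keep first-seen order
  };
  tags.forEach((t) => {
    if (t.indexOf('@query:') === 0) add(t.slice('@query:'.length).trim());
  });
  steps.forEach((text) => {
    const re = /\bquery\s+\[([A-Za-z_][A-Za-z0-9_]*)\]/gi;
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) add(m[1]);
  });
  return refs;
}

const refs = queryRefsOf(
  ['@high', '@query:active_user', '@query:orders(buyer={{email}})'],
  ['Then expect query [orders] count is {{expected}}'],
);
```

In this example the tag yields `orders(buyer={{email}})` while the inline step yields `orders`, so the same query appears under two different keys.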
 
Binary file
@@ -15,7 +15,7 @@
  import * as fs from 'fs';
  import * as path from 'path';
  import * as os from 'os';
- import { loadScenarios } from './parse';
+ import { loadScenarios, ScenarioInfo } from './parse';

  export interface ScriptCheckResult {
    screen: string;
@@ -68,7 +68,10 @@ export function analyzeFaithfulness(specSrc: string, automatedTitles: Set<string
    for (const blk of extractTestBlocks(specSrc)) {
      if (!automatedTitles.has(blk.title)) continue; // only non-@manual scenarios
      const body = blk.body;
-     if (!body.some((l) => /expect\(/.test(l))) assertionlessTests.push(blk.title);
+     // An assertion is a Playwright `expect(...)` OR a Data Driver DB assertion
+     // (`db.assertRow/assertNoRow/assertCount/...`) — a DB check is a real oracle, so a
+     // DB-only scenario (no UI expect) is NOT a bypass.
+     if (!body.some((l) => /expect\(|\bdb\.assert\w*\s*\(/.test(l))) assertionlessTests.push(blk.title);
      // hollow step: a `// step` whose region (until the NEXT step-comment / block end)
      // contains no executable code. The region — not just the next line — is checked,
      // so block-style steps (`// Assert all … { … expect … }`) are correctly counted.
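The widened assertion check is a single regular expression over the lines of a test body. A minimal sketch of how it classifies bodies follows; the regex is copied from the hunk, while `hasAssertion` and the sample lines (including the argument shape passed to `db.assertRow`) are hypothetical.

```typescript
// Regex from the hunk: a Playwright expect( call OR a db.assert*( call.
const ASSERTION = /expect\(|\bdb\.assert\w*\s*\(/;

// True when at least one line of the test body contains an assertion.
function hasAssertion(body: string[]): boolean {
  return body.some((line) => ASSERTION.test(line));
}

const actionOnly = hasAssertion(["await page.click('#submit');"]);
const uiAssert = hasAssertion(["await expect(page.locator('#msg')).toBeVisible();"]);
const dbAssert = hasAssertion(["await db.assertRow('users', { email });"]);
const notDbAssert = hasAssertion(["await mydb.assertRow('users', { email });"]);
```

The `\b` before `db` means an identifier that merely ends in `db`, such as `mydb`, is not counted as a DB assertion.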
@@ -143,10 +146,13 @@ export async function runScriptCheck(screenDir: string, screenName: string, flow
    }

    // A. Structural 1:1
+   // A @cases scenario emits ONE source test() inside a per-row loop, titled
+   // `<name> — ${__row.__label}` — match that literal title, not the bare name.
+   const expectedTitle = (s: ScenarioInfo) => (s.casesDataset ? `${s.name} — ${'${'}__row.__label}` : s.name);
    const specTitleSet = new Set(specTitles);
-   const scenTitleSet = new Set(automated.map((s) => s.name));
-   const missingInSpec = automated.filter((s) => !specTitleSet.has(s.name)).map((s) => s.name);
-   const extraInSpec = specTitles.filter((t) => !scenTitleSet.has(t));
+   const expectedSet = new Set(automated.map(expectedTitle));
+   const missingInSpec = automated.filter((s) => !specTitleSet.has(expectedTitle(s))).map((s) => s.name);
+   const extraInSpec = specTitles.filter((t) => !expectedSet.has(t));
    const countMatch = committedSpec ? automated.length === specTitles.length : false;
    if (committedSpec && !countMatch) {
      findings.push(`Count mismatch: ${automated.length} automated scenarios vs ${specTitles.length} test() blocks.`);
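The `${'${'}` fragment in `expectedTitle` is easy to misread. It interpolates the two-character string `${` so that the result contains the literal text `${__row.__label}`, which is how the title appears in the generated spec source. A minimal sketch, with `Scn` as a hypothetical stand-in for `ScenarioInfo`:

```typescript
// Hypothetical, trimmed-down scenario shape.
interface Scn {
  name: string;
  casesDataset?: string;
}

// Same expression as in the hunk: build the literal source title of a @cases test.
const expectedTitle = (s: Scn): string =>
  s.casesDataset ? `${s.name} — ${'${'}__row.__label}` : s.name;

const plain = expectedTitle({ name: 'VP-001 Login works' });
const dataDriven = expectedTitle({ name: 'VP-VAL-001 Email rejects', casesDataset: 'email_validation' });
```

The data-driven title is compared against the unevaluated title text found in the spec file, not against any per-row runtime title.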
@@ -190,7 +196,7 @@ export async function runScriptCheck(screenDir: string, screenName: string, flow

    // C. Anti-bypass / faithfulness
    const { assertionlessTests, hollowSteps } = committedSpec
-     ? analyzeFaithfulness(specSrc, scenTitleSet)
+     ? analyzeFaithfulness(specSrc, expectedSet)
      : { assertionlessTests: [], hollowSteps: [] };
    for (const t of assertionlessTests) {
      findings.push(`BYPASS: test "${t}" has 0 assertions (action-only — proves nothing). The testcase is not really automated.`);
@@ -15,11 +15,12 @@ Run `sungen challenge --screen <name>` (Bash) and read its report (`.sungen/repo
  - `.sungen/reports/<name>-audit.json` — what the gate already measured.
  - Blind-spot patterns — run `sungen blindspot list --prompt` (Bash) and check the suite against each known pattern.

- ## Three critics
+ ## Four critics

  1. **Coverage critic** — viewpoints that are missing or covered only shallowly; areas over-covered with low value (e.g. many subscription edge cases while cart correctness is thin). Recommend rebalancing, not just adding.
  2. **Business-Depth critic** — scenarios whose **title claims more than the steps prove** (a set/collection asserted by one element; "correct X" asserted by mere visibility). For each, give the exact deep step to add. Confirm or dismiss the deterministic flags from `sungen challenge`.
- 3. **Novelty critic** — 3–5 **non-obvious, valuable** scenarios outside the existing pattern, via risk lenses (double-submit, partial-load, boundary/unusual data, concurrency/back-button, historical incidents). Each must map to a risk or viewpoint and explain why it isn't a duplicate.
+ 3. **Data-driven critic** — surface `@cases`-worthy gaps, *spec-independent*: (a) **confirm the deterministic collapse suggestions** (`sungen challenge` flags ≥2 scenarios with the same step skeleton; propose the one `@cases:<dataset>` with the rows + a `case` label each); (b) for any action reaching a backend/logic (login, search, create, an API/error path), propose the **corner/error matrix** as `@cases` rows — *invalid · empty · boundary · injection · duplicate · not-found · unauthorized · malformed · rate-limit*, picking the family that fits the screen. These are the cases a spec/viewpoint usually under-specifies.
+ 4. **Novelty critic** — 3–5 **non-obvious, valuable** scenarios outside the existing pattern, via risk lenses (double-submit, partial-load, boundary/unusual data, concurrency/back-button, historical incidents). Each must map to a risk or viewpoint and explain why it isn't a duplicate.

  ## Guardrails (hard)
  - **Read-only.** Never edit the feature or any file. You return findings; the QA/orchestrator decides.
@@ -102,6 +102,22 @@ User see [Table] table match data:

  Row scope: `see [Ref] row in [Table] table with {{v}}` enters scope. Subsequent `see [Col] column with {{v}}` checks cell in that row. Use `table match data:` for multi-row verification.

+ ### Database verification (optional Data Driver)
+
+ Read-only DB-state checks. **Prefer named queries** — SQL lives in `qa/screens/<screen>/database/queries.yaml` (reviewed once, parameterized). Invoke with the `@query:<name>` annotation; it binds the result rows to `{{name}}`, then assert with `expect`:
+
+ ```gherkin
+ @query:active_user # precondition: run query, bind {{active_user}}
+ @query:orders(buyer={{email}}) # …with explicit param override
+ Scenario: ...
+   Then expect {{active_user.count}} is at least {{one}} # ≥1 row
+   And expect {{active_user.first.status}} is "active" # first row's column
+   And expect {{orders.count}} is {{expected}} # exact count
+   And User see [Total] text is {{orders.first.total}} # UI ↔ DB
+ ```
+
+ Path access on a bound result: `{{q.count}}`/`{{q.length}}`, `{{q.first.col}}`, `{{q.last.col}}`, `{{q[2].col}}`, `{{q.col}}` (= first row's col). `expect A is B` also supports `is at least` / `is at most` / `is not`. Tier-2 declarative (trivial inline, no catalog): `User see [<table>] row where [<col>] is {{v}} [has [<col2>] = "x"]`, `… no row where …`, `… count is {{n}}`. Full grammar + catalog/datasource/secret rules → **Advanced → Database** doc. Only emit DB steps when the project has a `database/` catalog / `datasources.yaml`.
+
  ### States

  `hidden` `visible` `disabled` `enabled` `checked` `unchecked` `focused` `empty` `loading` `selected` `sorted ascending` `sorted descending`
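The path forms listed under "Database verification" in the hunk above (`count`/`length`, `first.col`, `last.col`, `[n].col`, and bare `col`) can be illustrated with a small resolver. This is only a sketch and not the package's implementation, which this diff does not show. `resolvePath` is a hypothetical name, the path argument is the part after the query name, and treating `[n]` as a zero-based index is an assumption.

```typescript
type Row = Record<string, unknown>;

// Hypothetical resolver for the documented path forms on a bound query result.
function resolvePath(rows: Row[], path: string): unknown {
  if (path === 'count' || path === 'length') return rows.length;
  const indexed = /^\[(\d+)\]\.(\w+)$/.exec(path);
  if (indexed) {
    const row = rows[Number(indexed[1])]; // assumption: zero-based index
    return row ? row[indexed[2]] : undefined;
  }
  const parts = path.split('.');
  if (parts.length === 2 && (parts[0] === 'first' || parts[0] === 'last')) {
    const row = parts[0] === 'first' ? rows[0] : rows[rows.length - 1];
    return row ? row[parts[1]] : undefined;
  }
  const first = rows[0];
  return first ? first[path] : undefined; // bare column name = first row's column
}

const orders: Row[] = [
  { status: 'active', total: 10 },
  { status: 'closed', total: 20 },
  { status: 'refunded', total: 30 },
];
const count = resolvePath(orders, 'count');
const firstStatus = resolvePath(orders, 'first.status');
const lastTotal = resolvePath(orders, 'last.total');
const thirdStatus = resolvePath(orders, '[2].status');
const bareStatus = resolvePath(orders, 'status');
```

Returning `undefined` for an empty result or an out-of-range index is a choice made for this sketch; a real resolver might prefer to raise an error so that a missing row fails the test loudly.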
@@ -195,6 +211,30 @@ Options: `nth` `exact` `scope` `match` `variant` `frame` `contenteditable` `colu
  | `@afterEach` | Hook: runs after each test → `test.afterEach()` (custom cleanup) |
  | `@afterAll` | Hook: runs once after all tests → `test.afterAll()` |
  | `@flow` | Mark feature as E2E flow (cross-screen testing) |
+ | `@cases:dataset` | Data-driven: run the scenario once per row of the `dataset` LIST in test-data → one `test()` per row |
+ | `@query:name` | Database: run the named query from `database/queries.yaml` (precondition) and bind its rows to `{{name}}`; assert with `expect {{name.count}} …` + path access. Override params `@query:name(p={{v}})`. Repeatable. (Optional Data Driver — see Database verification above) |
+
+ ### Data-driven scenarios (`@cases`)
+
+ For one test case × many inputs (email/format/boundary validation, decision tables), tag the
+ scenario `@cases:<dataset>` and reference each row's columns as `{{col}}`. Put the rows as a LIST
+ in test-data — NOT inline; data stays runtime + env-overlayable.
+
+ ```gherkin
+ @high @cases:email_validation
+ Scenario: VP-VAL-001 The email field rejects invalid formats
+   When User fill [Email] field with {{email}}
+   Then User see [Login Error] message with {{expected_error}}
+ ```
+ ```yaml
+ # test-data/<screen>.yaml
+ email_validation:
+   - { case: "no @", email: "plainaddress", expected_error: "Invalid email" }
+   - { case: "valid", email: "ok@x.com", expected_error: "" }
+ ```
+ An optional `case`/`name`/`label` column labels each run. Each row → its own pass/fail. Prefer
+ `@cases` over duplicating a scenario per value. (Gherkin `Scenario Outline`/`Examples` is NOT
+ supported — use `@cases`.)

  ### Pass-through tags (filter at runtime via Playwright --grep)
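To make the "one `test()` per row" behavior of `@cases` concrete, here is a minimal sketch of expanding a dataset into labelled cases. It is an illustration under assumptions, not the generator's code: `expand`, `labelOf`, and `substitute` are hypothetical names, the `case` then `name` then `label` fallback order and the `row N` default are assumed, and placeholders are substituted eagerly here whereas the hunk states that the real data stays runtime.

```typescript
type CaseRow = Record<string, string>;

// Assumed label rule: first of case / name / label, else a positional fallback.
function labelOf(row: CaseRow, index: number): string {
  return row.case || row.name || row.label || `row ${index + 1}`;
}

// Replace {{col}} with the row's value; unknown placeholders are left untouched.
function substitute(step: string, row: CaseRow): string {
  return step.replace(/\{\{\s*(\w+)\s*\}\}/g, (whole: string, key: string) => {
    const value = row[key];
    return value === undefined ? whole : value;
  });
}

// Expand one scenario into one titled case per dataset row.
function expand(title: string, steps: string[], rows: CaseRow[]): { title: string; steps: string[] }[] {
  return rows.map((row, i) => ({
    title: `${title} — ${labelOf(row, i)}`,
    steps: steps.map((s) => substitute(s, row)),
  }));
}

const cases = expand(
  'VP-VAL-001 The email field rejects invalid formats',
  ['User fill [Email] field with {{email}}', 'User see [Login Error] message with {{expected_error}}'],
  [
    { case: 'no @', email: 'plainaddress', expected_error: 'Invalid email' },
    { email: 'ok@x.com', expected_error: '' },
  ],
);
```

The second row has no label column, so it falls back to the positional label, which shows why an explicit `case` value on every row keeps reports readable.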
 
@@ -81,4 +81,4 @@ domain risk+defect? → YES: Defect-first
  → ask QA; if QA has not responded → OUTPUT with an explicit ASSUMPTION LIST (do not stall)
  ```

- See `docs/orchestration-spec.md` for the full flow and `reports/sungen_refactor_spec.md` for the design rationale.
+ See `docs/orchestration-spec.md` for the full flow and `docs/spec/sungen_refactor_spec.md` for the design rationale.
@@ -54,6 +54,25 @@ user-invocable: false
  OR condition: generate 1 scenario per branch where that branch alone triggers the outcome.
  → Happy-path only = missing the most common multi-condition implementation bug.
 
+ - **Many inputs, same steps → ONE data-driven scenario (`@cases`), not N copies:**
+ When a rule needs lots of inputs with the *same* step shape (email/format validation,
+ BVA boundary triples, EP classes, decision-table rows), tag one scenario `@cases:<dataset>`,
+ reference each row's columns as `{{col}}`, and put the rows as a LIST in test-data:
+ ```gherkin
+ @high @cases:email_validation
+ Scenario: VP-VAL-001 The email field rejects invalid formats
+ When User fill [Email] field with {{email}}
+ Then User see [Error] message with {{expected_error}}
+ ```
+ ```yaml
+ email_validation:
+ - { case: "no @", email: "plainaddress", expected_error: "Invalid email" }
+ - { case: "valid", email: "ok@x.com", expected_error: "" }
+ ```
+ → one `test()` per row, each labelled by `case`. Adding inputs = editing test-data (no recompile),
+ and env overlays apply. Prefer this over duplicating a scenario per value. (Gherkin
+ `Scenario Outline`/`Examples` is NOT supported — use `@cases`.)
+
  ---
 
  ## Tier System
@@ -120,6 +120,7 @@ Build a mapping table: for each applicable group, does the feature have a matchi
  - **EP**: keep only **one representative** per invalid class; same-class duplicates → flag as redundant.
  - **BVA**: spec defines min/max → cover `min-1`, `min`, `max`, `max+1` (Maxlength, counts…).
  - Error messages must match the spec **word-for-word**, not generic.
+ - **Data-driven (`@cases`)**: a `@cases:<dataset>` scenario legitimately covers many inputs in ONE scenario (one row per EP class / boundary / rule). Do **not** flag it as "too few negative cases" or as duplication — instead review the **dataset rows**: are all EP classes / boundary triples present, each labelled, expected values exact? N near-identical scenarios that differ only by input value → flag and recommend collapsing to `@cases`.
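
The BVA bullet in this checklist (`min-1`, `min`, `max`, `max+1`) maps naturally onto dataset rows for a `@cases` scenario. A minimal sketch, assuming integer limits with a step of 1; the helper `boundaryRows` and the `value`/`valid` column names are hypothetical and only illustrate the shape of such a dataset.

```typescript
// Hypothetical helper: turn a spec's min/max into @cases dataset rows.
// Assumes integer limits (step of 1); column names are illustrative only.
interface BoundaryRow {
  case: string;   // label shown per run
  value: number;  // input to try
  valid: boolean; // whether the spec accepts it
}

function boundaryRows(min: number, max: number): BoundaryRow[] {
  if (min > max) throw new RangeError("min must not exceed max");
  return [
    { case: "min-1", value: min - 1, valid: false },
    { case: "min", value: min, valid: true },
    { case: "max", value: max, valid: true },
    { case: "max+1", value: max + 1, valid: false },
  ];
}

// Example: a field whose spec says length must be between 8 and 64.
const lengthRows = boundaryRows(8, 64);
```

A reviewer can then check the four labels against the dataset instead of counting near-identical scenarios.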
 
  ---
 
@@ -17,6 +17,12 @@ Apply selectively — not every screen needs all four techniques. Use the techni
 
  **Rule:** These techniques determine **how many** and **which** scenarios to generate. `sungen-viewpoint` determines **which viewpoints** to cover.
 
+ **Implementing the data table → `@cases` (data-driven):** when EP classes / BVA boundary triples /
+ decision-table rows share the *same step shape* and differ only by input/expected values, encode
+ them as ONE `@cases:<dataset>` scenario (each class/boundary/rule = one row in the test-data list,
+ labelled by a `case` column) instead of N near-duplicate scenarios. The technique still decides the
+ rows; `@cases` is how you write them compactly. See `sungen-gherkin-syntax` → Data-driven.
+
  ---
 
  ## 1. Equivalence Partitioning (EP)
@@ -102,6 +102,22 @@ User see [Table] table match data:
 
  Row scope: `see [Ref] row in [Table] table with {{v}}` enters scope. Subsequent `see [Col] column with {{v}}` checks cell in that row. Use `table match data:` for multi-row verification.
 
+ ### Database verification (optional Data Driver)
+
+ Read-only DB-state checks. **Prefer named queries** — SQL lives in `qa/screens/<screen>/database/queries.yaml` (reviewed once, parameterized). Invoke with the `@query:<name>` annotation; it binds the result rows to `{{name}}`, then assert with `expect`:
+
+ ```gherkin
+ @query:active_user # precondition: run query, bind {{active_user}}
+ @query:orders(buyer={{email}}) # …with explicit param override
+ Scenario: ...
+ Then expect {{active_user.count}} is at least {{one}} # ≥1 row
+ And expect {{active_user.first.status}} is "active" # first row's column
+ And expect {{orders.count}} is {{expected}} # exact count
+ And User see [Total] text is {{orders.first.total}} # UI ↔ DB
+ ```
+
+ Path access on a bound result: `{{q.count}}`/`{{q.length}}`, `{{q.first.col}}`, `{{q.last.col}}`, `{{q[2].col}}`, `{{q.col}}` (= first row's col). `expect A is B` also supports `is at least` / `is at most` / `is not`. Tier-2 declarative (trivial inline, no catalog): `User see [<table>] row where [<col>] is {{v}} [has [<col2>] = "x"]`, `… no row where …`, `… count is {{n}}`. Full grammar + catalog/datasource/secret rules → **Advanced → Database** doc. Only emit DB steps when the project has a `database/` catalog / `datasources.yaml`.
+
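
To make the path-access forms listed above concrete, here is a rough sketch of how a bound result might be resolved. It is not sungen's resolver. The function name `resolvePath` is hypothetical, and two behaviours are assumptions because the diff does not state them: `q[2]` is treated as zero-based, and `count`/`length` take precedence over a column with the same name.

```typescript
// Hypothetical resolver for path access on a bound query result.
// Assumptions: [n] is zero-based; "count"/"length" win over same-named columns.
type DbRow = Record<string, unknown>;

// `path` is the part after the binding name, e.g. "count", "first.status", "[2].total".
function resolvePath(rows: DbRow[], path: string): unknown {
  if (path === "count" || path === "length") return rows.length;

  // {{q[2].col}}: explicit row index, then column
  const indexed = /^\[(\d+)\]\.(\w+)$/.exec(path);
  if (indexed) return rows[Number(indexed[1])]?.[indexed[2]];

  const parts = path.split(".");
  if (parts.length === 2 && parts[0] === "first") return rows[0]?.[parts[1]];
  if (parts.length === 2 && parts[0] === "last") return rows[rows.length - 1]?.[parts[1]];
  if (parts.length === 1) return rows[0]?.[parts[0]]; // {{q.col}} = first row's col
  return undefined;
}

// Illustrative rows, as if returned by a named query bound to {{orders}}.
const orders: DbRow[] = [
  { status: "active", total: 10 },
  { status: "closed", total: 20 },
  { status: "active", total: 30 },
];

const orderCount = resolvePath(orders, "count");
const firstStatus = resolvePath(orders, "first.status");
const lastTotal = resolvePath(orders, "last.total");
```

An empty result resolves every column path to `undefined` in this sketch, which is why the Gherkin example asserts on `count` before reading `first.status`.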
  ### States
 
  `hidden` `visible` `disabled` `enabled` `checked` `unchecked` `focused` `empty` `loading` `selected` `sorted ascending` `sorted descending`
@@ -195,6 +211,30 @@ Options: `nth` `exact` `scope` `match` `variant` `frame` `contenteditable` `colu
  | `@afterEach` | Hook: runs after each test → `test.afterEach()` (custom cleanup) |
  | `@afterAll` | Hook: runs once after all tests → `test.afterAll()` |
  | `@flow` | Mark feature as E2E flow (cross-screen testing) |
+ | `@cases:dataset` | Data-driven: run the scenario once per row of the `dataset` LIST in test-data → one `test()` per row |
+ | `@query:name` | Database: run the named query from `database/queries.yaml` (precondition) and bind its rows to `{{name}}`; assert with `expect {{name.count}} …` + path access. Override params `@query:name(p={{v}})`. Repeatable. (Optional Data Driver — see Database verification above) |
+
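
The two tag shapes added to the table, `@cases:dataset` and `@query:name(p={{v}})`, can be read with a small parser. The sketch below is hypothetical: `parseTag` and `ParsedTag` are illustrative names, and the comma separator for multiple parameters is an assumption, since the diff only shows a single override.

```typescript
// Hypothetical parser for the @cases / @query tag shapes in the table above.
// Assumption: multiple params are comma-separated; the source shows only one.
type ParsedTag =
  | { kind: "cases"; dataset: string }
  | { kind: "query"; name: string; params: Record<string, string> };

function parseTag(tag: string): ParsedTag | null {
  const cases = /^@cases:(\w+)$/.exec(tag);
  if (cases) return { kind: "cases", dataset: cases[1] };

  // Optional "(k=v, ...)" suffix carries the parameter overrides.
  const query = /^@query:(\w+)(?:\((.*)\))?$/.exec(tag);
  if (!query) return null; // e.g. pass-through tags such as @high

  const params: Record<string, string> = {};
  for (const pair of (query[2] ?? "").split(",")) {
    const eq = pair.indexOf("=");
    if (eq > 0) params[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
  }
  return { kind: "query", name: query[1], params };
}

const casesTag = parseTag("@cases:email_validation");
const queryTag = parseTag("@query:orders(buyer={{email}})");
```

Placeholder values such as `{{email}}` are kept verbatim here; resolving them against test-data would be a separate step.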
+ ### Data-driven scenarios (`@cases`)
+
+ For one test case × many inputs (email/format/boundary validation, decision tables), tag the
+ scenario `@cases:<dataset>` and reference each row's columns as `{{col}}`. Put the rows as a LIST
+ in test-data — NOT inline; data stays runtime + env-overlayable.
+
+ ```gherkin
+ @high @cases:email_validation
+ Scenario: VP-VAL-001 The email field rejects invalid formats
+ When User fill [Email] field with {{email}}
+ Then User see [Login Error] message with {{expected_error}}
+ ```
+ ```yaml
+ # test-data/<screen>.yaml
+ email_validation:
+ - { case: "no @", email: "plainaddress", expected_error: "Invalid email" }
+ - { case: "valid", email: "ok@x.com", expected_error: "" }
+ ```
+ An optional `case`/`name`/`label` column labels each run. Each row → its own pass/fail. Prefer
+ `@cases` over duplicating a scenario per value. (Gherkin `Scenario Outline`/`Examples` is NOT
+ supported — use `@cases`.)
 
  ### Pass-through tags (filter at runtime via Playwright --grep)
 
@@ -81,4 +81,4 @@ risk domain + defect? → YES: Defect-first
  → ask QA; if QA has not responded → OUTPUT with an explicit ASSUMPTION LIST (do not stall)
  ```
 
- See `docs/orchestration-spec.md` for the full flow and `reports/sungen_refactor_spec.md` for the design rationale.
+ See `docs/orchestration-spec.md` for the full flow and `docs/spec/sungen_refactor_spec.md` for the design rationale.
@@ -54,6 +54,25 @@ user-invocable: false
  OR condition: generate 1 scenario per branch where that branch alone triggers the outcome.
  → Happy-path only = missing the most common multi-condition implementation bug.
 
+ - **Many inputs, same steps → ONE data-driven scenario (`@cases`), not N copies:**
+ When a rule needs lots of inputs with the *same* step shape (email/format validation,
+ BVA boundary triples, EP classes, decision-table rows), tag one scenario `@cases:<dataset>`,
+ reference each row's columns as `{{col}}`, and put the rows as a LIST in test-data:
+ ```gherkin
+ @high @cases:email_validation
+ Scenario: VP-VAL-001 The email field rejects invalid formats
+ When User fill [Email] field with {{email}}
+ Then User see [Error] message with {{expected_error}}
+ ```
+ ```yaml
+ email_validation:
+ - { case: "no @", email: "plainaddress", expected_error: "Invalid email" }
+ - { case: "valid", email: "ok@x.com", expected_error: "" }
+ ```
+ → one `test()` per row, each labelled by `case`. Adding inputs = editing test-data (no recompile),
+ and env overlays apply. Prefer this over duplicating a scenario per value. (Gherkin
+ `Scenario Outline`/`Examples` is NOT supported — use `@cases`.)
+
  ---
 
  ## Tier System
@@ -120,6 +120,7 @@ Build a mapping table: for each applicable group, does the feature have a matchi
  - **EP**: keep only **one representative** per invalid class; same-class duplicates → flag as redundant.
  - **BVA**: spec defines min/max → cover `min-1`, `min`, `max`, `max+1` (Maxlength, counts…).
  - Error messages must match the spec **word-for-word**, not generic.
+ - **Data-driven (`@cases`)**: a `@cases:<dataset>` scenario legitimately covers many inputs in ONE scenario (one row per EP class / boundary / rule). Do **not** flag it as "too few negative cases" or as duplication — instead review the **dataset rows**: are all EP classes / boundary triples present, each labelled, expected values exact? N near-identical scenarios that differ only by input value → flag and recommend collapsing to `@cases`.
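
The dataset-row review asked for in the last bullet (all classes present, each row labelled, no same-class duplicates) could be automated along these lines. This is a sketch under assumptions: the `class` column naming each row's EP class is hypothetical (the docs only require an optional label column), as are the names `reviewDataset`, `ReviewRow`, and `Review`.

```typescript
// Hypothetical reviewer check for a @cases dataset.
// Assumption: each row has a `class` column naming its EP class / boundary;
// the source only mentions an optional case/name/label column.
interface ReviewRow {
  class: string;
  [col: string]: string | undefined;
}

interface Review {
  unlabelled: number[]; // indexes of rows without case/name/label
  missing: string[];    // required classes with no row
  redundant: string[];  // classes covered by more than one row
}

function reviewDataset(rows: ReviewRow[], required: string[]): Review {
  const seen = new Map<string, number>();
  const unlabelled: number[] = [];
  rows.forEach((row, i) => {
    if (!(row.case ?? row.name ?? row.label)) unlabelled.push(i);
    seen.set(row.class, (seen.get(row.class) ?? 0) + 1);
  });
  return {
    unlabelled,
    missing: required.filter((c) => !seen.has(c)),
    redundant: Array.from(seen.entries())
      .filter(([, n]) => n > 1)
      .map(([c]) => c),
  };
}

const emailRows: ReviewRow[] = [
  { case: "no @", class: "missing-at", email: "plainaddress" },
  { class: "missing-at", email: "foo.bar" },
  { case: "valid", class: "valid", email: "ok@x.com" },
];

const review = reviewDataset(emailRows, ["missing-at", "valid", "too-long"]);
```

Exact expected values still need a human comparison against the spec wording; the sketch only covers the structural part of the review.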
 
  ---
 
@@ -17,6 +17,12 @@ Apply selectively — not every screen needs all four techniques. Use the techni
 
  **Rule:** These techniques determine **how many** and **which** scenarios to generate. `sungen-viewpoint` determines **which viewpoints** to cover.
 
+ **Implementing the data table → `@cases` (data-driven):** when EP classes / BVA boundary triples /
+ decision-table rows share the *same step shape* and differ only by input/expected values, encode
+ them as ONE `@cases:<dataset>` scenario (each class/boundary/rule = one row in the test-data list,
+ labelled by a `case` column) instead of N near-duplicate scenarios. The technique still decides the
+ rows; `@cases` is how you write them compactly. See `sungen-gherkin-syntax` → Data-driven.
+
  ---
 
  ## 1. Equivalence Partitioning (EP)