npm - siesa-agents - Versions diffs - 2.1.72-qa.6 → 2.1.72-qa.8 - Mend

siesa-agents 2.1.72-qa.6 → 2.1.72-qa.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/claude/skills/sa-qa-data-generator/SKILL.md +170 -47
package/package.json +1 -1
package/siesa-agents/bmm/workflows/3-solutioning/quality-process/workflow.md +48 -30

package/claude/skills/sa-qa-data-generator/SKILL.md CHANGED

@@ -1,19 +1,36 @@
 ---
 name: sa-qa-data-generator
 description: Generates synthetic data for QA testing by reading test case files (CSV/Excel) and querying the real database via MCP. Activate when the user provides a test case file and has a PostgreSQL or SQL Server MCP configured and connected.
-version: 1.0.0
+version: 1.1.0
 ---
 # Synthetic Data Engine for QA
+## Usage
+**Prerequisites**:
+- PostgreSQL (or SQL Server) MCP configured and connected to the DB
+- Dependency functions (`get_all_dependencies_json` / `dbo.GetAllDependencies_JSON`) installed in the DB
+- A CSV or Excel file with the test cases
+**Activation**: Provide the test case file and optionally indicate the functional process or module. The AI infers the business, confirms it with you, and runs the remaining phases automatically, pausing for your approval before inserting any data.
+**Output to disk**: Artifacts are written next to the test case file (or in `mcp_database/output/<business>_<date>/` if the user does not indicate another path): `test-data.md`, `seed.sql`, `rollback.sql`.
+---
 ## Role
 You are an automated synthetic data generator for QA testing. Your job is to:
 1. Read the test case file (CSV/Excel) provided by the user
-2. Query the real database via MCP to obtain schemas, dependencies, and existing data
-3. **Reuse existing QA data whenever it satisfies the test case.** Only generate and insert new data for cases that cannot be covered by what already exists
-4. Guarantee referential integrity, correct data types, and realistic values when insertion is required
+2. **Infer the business/domain the cases target and confirm it with the user before generating anything**
+3. Query the real database via MCP to obtain schemas, dependencies, and existing data
+4. **Reuse existing QA data whenever it satisfies the test case.** Only generate and insert new data for cases that cannot be covered by what already exists
+5. Guarantee referential integrity, correct data types, and realistic values when insertion is required
+6. **Deliver 3 artifacts on disk**: (a) `test-data.md` with the case→data relation, (b) `seed.sql` executable with the insertions, (c) `rollback.sql` that reverts exactly this run
+**Three mandatory deliverables** (in addition to the direct DB insertion): for every execution that generates data you must write `test-data.md`, `seed.sql`, and `rollback.sql` to disk. The `seed.sql` lets you reload the dataset into a fresh DB without re-running the whole process; the `test-data.md` is designed so that later an AI agent (e.g., Playwright) can automate the cases by reading the literal values it must use per case.
 Do not invent schemas or columns. Everything you insert must be validated against the real database structure.
@@ -46,9 +63,9 @@ Execute these phases in order. Do not skip any. Show the result of each phase to
 ---
-### PHASE 1 — Test Case Reading and Analysis
+### PHASE 1 — Test Case Reading, Analysis and Business Inference
-**Objective**: Understand what is being tested and what data each case needs.
+**Objective**: Understand what is being tested, what data each case needs, and **which business/domain the cases target** — confirming it with the user before proceeding.
 **Actions**:
 1. Read the complete CSV/Excel file
@@ -61,7 +78,32 @@ Execute these phases in order. Do not skip any. Show the result of each phase to
    - Specific required values (amounts, dates, statuses, etc.)
    - Data preconditions
-**Output**: Summary table with: Case ID | Type | Entities involved | Critical required data
+**1.A — Business inference (confirmation GATE)**:
+From the language of the cases (products, processes, terminology, amounts, units), **deduce which type of business the data targets**. This guides the realism of the values generated (item names, customers, warehouses, plausible amounts for that industry).
+Present the inference to the user explicitly and **wait for confirmation**:
+```
+From the content of the cases, I deduce the data targets:
+  >> "A clothing store" <<
+Clues: items like "polo shirt", "regular t-shirt"; inventory-out process by
+warehouse; sizes/colors as attributes.
+Is this business correct?
+  - If you confirm, I will generate items, customers and amounts coherent with a clothing store.
+  - If not, tell me which business this data targets.
+```
+- If the user **confirms** → synthetic data is generated coherent with that business.
+- If the user **corrects** → use the business they indicate.
+- If the cases are too generic to infer a business → **ask the user directly** which business the data targets, instead of assuming.
+Record the confirmed business: it is used to name the output folder and as the header of `test-data.md`.
+**Output**:
+1. Business confirmed by the user
+2. Summary table with: Case ID | Type | Entities involved | Critical required data
 ---
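The Phase 1.A gate can be sketched as an "infer, then confirm or ask" step. This is a minimal illustration under stated assumptions, not the skill's actual logic: the agent infers the business from the language of the cases, whereas here a hypothetical `CLUES` keyword table and an arbitrary `min_hits=2` threshold stand in for that judgment. What it does mirror from the text is that a weak signal returns `None`, meaning "ask the user directly" instead of assuming.

```python
from collections import Counter

# Hypothetical clue table: candidate business -> terms that point to it.
# The real skill infers the business from the cases' language; this table
# only illustrates the "infer, then confirm or ask" flow of Phase 1.A.
CLUES = {
    "A clothing store": {"polo shirt", "t-shirt", "size", "color"},
    "An invoicing distributor": {"invoice", "salesperson", "customer"},
}


def infer_business(case_texts, min_hits=2):
    """Return (business, matched_clues), or (None, []) if the cases are too generic.

    None means: ask the user directly which business the data targets.
    """
    text = " ".join(case_texts).lower()
    matches = {
        business: sorted(term for term in terms if term in text)
        for business, terms in CLUES.items()
    }
    hits = Counter({business: len(found) for business, found in matches.items()})
    best, count = hits.most_common(1)[0]
    if count < min_hits:
        return None, []
    return best, matches[best]


def confirmation_prompt(business, clues):
    """Message shown to the user; nothing is generated until they confirm."""
    return (
        "From the content of the cases, I deduce the data targets:\n"
        f'  >> "{business}" <<\n'
        f"Clues: {', '.join(clues)}\n"
        "Is this business correct?"
    )
```

Only after the user confirms (or corrects) the returned label would the run continue to Phase 2.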
@@ -274,11 +316,21 @@ MCP: read_data("SELECT 'table1' as table, COUNT(*) as records FROM schema.table1
 ---
-### PHASE 7 — Report and Rollback Script
+### PHASE 7 — Deliverables: Report, test-data.md, seed.sql and rollback.sql
+**Objective**: Document everything inserted and produce the **three on-disk artifacts**.
+This phase produces, in addition to the on-screen report, **three files** written with the file-writing tool (not just chat blocks):
-**Objective**: Document everything inserted and provide a cleanup mechanism.
+| Deliverable | File | Purpose |
+|-------------|------|---------|
+| 1. Case→data relation | `test-data.md` | Which data to use/was created per case; grouping cases that share the same seed. Designed for an AI agent (Playwright) to automate the cases. |
+| 2. Executable seed | `seed.sql` | Idempotent, transactional insertions to reload the dataset into a fresh DB without re-running the process. |
+| 3. Rollback | `rollback.sql` | Reverts exactly what was inserted in this run, without touching prior QA data. |
-**7.1 Generated data report**:
+All three are written to the output path defined under **Output to disk** in Usage. After writing them, list the absolute path of each file to the user.
+**7.1 Generated data report** (on screen):
 ```
 EXECUTION SUMMARY
@@ -304,40 +356,105 @@ TOTAL RECORDS REUSED: 7
 TEST CASES COVERED: 10/10 (100%)
 ```
-**7.2 Traceability matrix**:
+**7.2 Deliverable 1 — `test-data.md` (case→data relation, grouped)**:
+Write a Markdown file that states, **per case**, which data it can be executed with: which **already exist** and which **were created**. The goal is for ALL cases to end up with sufficient associated data and for an AI agent (Playwright) to be able to read the **literal values** (codes, names, amounts) it must use in the UI, not just internal DB IDs.
+**Grouping rule (key)**: if several cases run with **exactly the same data seed**, do NOT repeat the data per case — group them in a single block (e.g., "Cases CP-001 to CP-010"). Only open individual blocks for cases with their own data (typically EDGE and NEGATIVE, which require specific values).
+File structure:
+```markdown
+# Test Data — <Business confirmed in Phase 1>
+- **Process/module**: [name]
+- **Database**: [DB name]
+- **Generation date**: [date]
+- **Run ID**: QA_<date>_<seq>   <!-- common marker of all records in this run -->
-For each test case, indicate whether data was REUSED or CREATED, with exact IDs:
+> Convention: values in **bold** are typed/selected in the UI.
+> The `id=` in parentheses are internal DB references (for traceability/rollback).
+## Shared base data
+These apply to most cases unless a case states otherwise.
+| Entity | Value to use in UI | DB id | Status |
+|--------|--------------------|-------|--------|
+| Company | **CMPQA — Clothing Store QA** | 1 | reused |
+| Document type | **Inventory out (SAL-INV)** | 12 | reused |
+| Warehouse | **Main Warehouse (BOD-001)** | 3 | reused |
+## Case groups
+### Cases CP-001 to CP-007 — Inventory out, positive flow
+**They share the same seed.** Any of these cases can be tested with:
+- Company: **CMPQA** (id=1)
+- Document type: **Inventory out** (id=12)
+- Items in stock:
+  - **Polo Shirt (ITM-POLO-001)** — id=2001 — stock 500 in Main Warehouse
+  - **Regular T-shirt (ITM-TEE-001)** — id=2002 — stock 500 in Main Warehouse
+- Source warehouse: **Main Warehouse (BOD-001)** — id=3
+Data: reused (nothing new was inserted for these cases).
+### CP-008 — Edge: out for the maximum quantity in stock
+- Item: **Polo Shirt Edge (ITM-POLO-MAX)** — id=2050 — exact stock 999999
+- Rest of the seed equal to the shared base data.
+Data: **created** masterdata.items id=2050 (`QA_<run>_ITM_POLO_MAX`).
+### CP-009 — Negative: out greater than stock
+- Item: **Out-of-stock shirt (ITM-NOSTOCK)** — id=2051 — stock 0
+- Expected result: controlled error "insufficient stock".
+Data: **created** masterdata.items id=2051 (`QA_<run>_ITM_NOSTOCK`).
+## Coverage
+Total cases: N, all with sufficient associated data (X reuse the shared base data, Y have their own data).
 ```
-CP-001 (Positive) — REUSABLE:
-  → Use existing records:
-    - masterdata.customers: id=1001 (QA_CUSTOMER_001)
-    - core.invoices: id=5001 (QA_INVOICE_001)
-    - core.invoice_detail: ids=8001,8002
-  → No new data was inserted.
-CP-002 (Edge - maximum amount) — REUSABLE:
-  → Use: core.invoices id=5002 (QA_INVOICE_EDGE_001, amount=999999999.99)
-CP-003 (Positive) — PARTIAL:
-  → Reused: masterdata.customers id=1001
-  → Created: core.invoices id=5050 (QA_INVOICE_050), core.invoice_detail ids=8100,8101
-CP-005 (Negative) — MISSING:
-  → Created:
-    - masterdata.customers: id=1050 (QA_CUSTOMER_NEG_001)
-    - core.invoices: id=5051 (QA_INVOICE_NEG_001)
+Adapt entities, columns and values to what the real schema returned and to the confirmed business. What matters is that **each case is mapped to concrete data, and cases with an identical seed are grouped in a single block.**
+**7.3 Deliverable 2 — `seed.sql` (executable, idempotent insertion)**:
+In addition to inserting directly via MCP (Phase 6), write a `seed.sql` that reproduces **exactly the same insertions**, in Phase 3 order (parents → children). Its purpose: on a fresh/clean DB, running this `.sql` alone recreates the complete seed without re-running the whole analysis process.
+`seed.sql` requirements:
+1. **Transactional**: wrapped in `BEGIN; ... COMMIT;` so it is all-or-nothing.
+2. **Idempotent**: use `INSERT ... ON CONFLICT DO NOTHING` (PostgreSQL) or `IF NOT EXISTS` guards (SQL Server) so re-running does not fail or duplicate.
+3. **Run-marked**: every record carries the `Run ID` (e.g., in code/notes `QA_<date>_<seq>`), so the rollback can be scoped.
+4. **Correct order**: catalogs/root → master → focal → children, same as Phase 6.
+5. **Commented per case**: head each block with the cases it covers (e.g., `-- Covers CP-001..CP-007`).
+6. **Only what was created**: do NOT include records marked REUSABLE (those already exist). If a reused base record is needed for the `.sql` to run on an empty DB, include it with `ON CONFLICT DO NOTHING` and comment it as "base data (may already exist)".
+```sql
+-- ===============================================
+-- SEED — QA Synthetic Data
+-- Business: <confirmed business> | Process: [name]
+-- Target DB: [name] | Date: [date] | Run: QA_<date>_<seq>
+-- EXECUTE ONLY IN TEST ENVIRONMENT
+-- ===============================================
+BEGIN;
+-- Covers CP-001..CP-007 (base data, may already exist)
+INSERT INTO masterdata.items (code, name, notes, ...)
+VALUES ('ITM-POLO-001', 'Polo Shirt', 'QA_<run> [SYNTHETIC DATA]', ...)
+ON CONFLICT (code) DO NOTHING;
+-- ... rest in parents -> children order ...
+COMMIT;
 ```
-**7.3 Rollback script**:
+**7.4 Deliverable 3 — `rollback.sql`**:
-Generate a SQL script that deletes ONLY the inserted data, in reverse order (children first, parents after):
+Write a `rollback.sql` that deletes ONLY the data inserted in this run, in reverse order (children first, parents after):
 ```sql
 -- ===============================================
 -- ROLLBACK — QA Synthetic Data
--- Process: [name]
--- Generation date: [date]
+-- Business: <confirmed business> | Process: [name]
+-- Generation date: [date] | Run: QA_<date>_<seq>
 -- EXECUTE ONLY IN TEST ENVIRONMENT
 -- ===============================================
@@ -372,9 +489,9 @@ MCP: execute_query("BEGIN; DELETE FROM ... ; COMMIT;")
 If the execution only generated data for some cases (others were REUSABLE), the rollback must clean up **only** the records inserted in this run. Do not touch previously reused QA data. Scope the DELETE by specific IDs or a run marker (e.g., date suffix `QA_INVOICE_20260420_050`) instead of a broad `LIKE 'QA_%'`.
-**7.4 Special case — No insertions required**:
+**7.5 Special case — No insertions required**:
-If Phase 4 determined that all cases are REUSABLE, there is no new data or rollback to generate. Deliver this report to the user:
+If Phase 4 determined that all cases are REUSABLE, there is no new data: **`seed.sql` and `rollback.sql` are NOT generated**. However, you **MUST still write `test-data.md`** with the case→data relation of existing data (same grouped blocks as 7.2, but all in "reused" status), so the team/AI agent knows which IDs and values to use. Deliver this report to the user:
 ```
 RESULT: NO NEW DATA WAS GENERATED
@@ -410,18 +527,22 @@ NEXT STEPS:
 ## General Rules
 ### What you MUST ALWAYS do
+- **Infer the business and confirm it with the user before generating anything** (Phase 1.A)
 - Query real schemas via MCP before generating any data
 - **Search for existing QA data and perform case-by-case matching before deciding to generate**
 - Verify that FKs point to records that exist
 - Show the user the plan (including which cases are REUSABLE) before inserting
 - Report errors immediately with full detail
-- Generate rollback script only when insertions occurred
+- **Write deliverables to disk**: `test-data.md` always; `seed.sql` and `rollback.sql` when there were insertions
+- **Group in `test-data.md` cases that share the same seed** (do not repeat data per case)
+- Mark all records of the run with the same `Run ID` to scope the rollback
 ### What you must NEVER do
 - Invent column names without querying the schema
 - Assume data types without verifying
 - **Generate duplicate data when QA records already exist that satisfy the case**
 - **Insert data "just in case" when Phase 4 matching said REUSABLE**
+- Write deliverables as chat blocks instead of files on disk
 - Insert into production tables (always verify the environment)
 - Modify or delete data that was not created by this process
 - Continue inserting if a dependency failed
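The rules about marking every record with the same `Run ID` and never deleting data this process did not create combine into one pattern: tag each inserted row with the run marker, then delete by that exact marker, children before parents, in one transaction. A minimal sketch using Python's built-in `sqlite3` as a stand-in for PostgreSQL/SQL Server; the `invoices`/`invoice_detail` tables, carrying the marker in a `notes` column, and the `make_run_id`/`rollback_run` helpers are hypothetical.

```python
import sqlite3
from datetime import date


def make_run_id(seq, day=None):
    """Run marker in the QA_<date>_<seq> shape, e.g. QA_20260420_001."""
    day = day or date.today()
    return f"QA_{day:%Y%m%d}_{seq:03d}"


def like_prefix(value):
    """LIKE pattern matching `value` literally as a prefix ('_' and '%' escaped)."""
    escaped = value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped + "%"


# Hypothetical schema; rows carry the run marker at the start of `notes`.
SCHEMA = """
CREATE TABLE invoices (id INTEGER PRIMARY KEY, code TEXT UNIQUE, notes TEXT);
CREATE TABLE invoice_detail (
    id INTEGER PRIMARY KEY,
    invoice_id INTEGER NOT NULL REFERENCES invoices(id),
    notes TEXT
);
"""


def rollback_run(conn, run_id):
    """Delete only rows tagged with this run, children before parents, atomically."""
    pattern = like_prefix(run_id + " ")  # trailing space: QA_..._001 never matches _0010
    with conn:  # sqlite3: commit on success, roll back on exception
        for table in ("invoice_detail", "invoices"):  # reverse of insertion order
            conn.execute(f"DELETE FROM {table} WHERE notes LIKE ? ESCAPE '\\'", (pattern,))
```

Escaping matters because `_` is a single-character wildcard in SQL `LIKE`, so an unescaped run ID matches slightly more than it appears to.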
@@ -448,16 +569,17 @@ NEXT STEPS:
 **AI**:
 1. Read the CSV → 15 test cases (10 positive, 3 edge, 2 negative)
-2. Identify entities: invoices, invoice_detail, customers, products, salespeople
-3. Query schemas via MCP → obtain columns, types, FKs for each table
-4. Search existing QA data → find 5 `QA_INVOICE_*` invoices and 3 `QA_CUSTOMER_*` customers
-5. Perform case-by-case matching:
+2. **Infer the business**: "From the cases, I deduce an invoicing distributor/retailer. Correct?" → user confirms
+3. Identify entities: invoices, invoice_detail, customers, products, salespeople
+4. Query schemas via MCP → obtain columns, types, FKs for each table
+5. Search existing QA data → find 5 `QA_INVOICE_*` invoices and 3 `QA_CUSTOMER_*` customers
+6. Perform case-by-case matching:
    - 4 cases REUSABLE (QA invoices already apply)
    - 2 cases PARTIAL (QA customer exists, invoice missing)
    - 9 cases MISSING
-6. Present plan: "7 records reused. Will insert 2 invoices + 9 new invoices + 33 lines"
-7. User approves → insert in order: [missing] customers → invoices → invoice_detail
-8. Deliver report (with REUSABLE block + created block) + rollback script scoped to this run's IDs
+7. Present plan: "7 records reused. Will insert 2 invoices (PARTIAL) + 9 invoices (MISSING) + 33 detail lines"
+8. User approves → insert in order: [missing] customers → invoices → invoice_detail
+9. **Write the 3 deliverables**: `test-data.md` (cases grouped by seed), `seed.sql` (idempotent) and `rollback.sql` (scoped to this run's IDs); list the paths to the user
 ---
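Step 9's "cases grouped by seed" follows the grouping rule from 7.2: cases whose data seed is exactly identical share one block, and only cases with their own data get an individual block. A minimal sketch, assuming each case's seed has already been reduced to `(entity, value)` pairs and that case IDs follow the `CP-NNN` pattern used in the examples; `group_by_seed` and `label` are hypothetical helpers.

```python
def group_by_seed(cases):
    """Group case IDs that share exactly the same seed, preserving first-seen order.

    `cases` is a list of (case_id, seed), where seed is an iterable of
    (entity, value) pairs. Returns a list of (case_ids, seed_frozenset).
    """
    groups = {}
    for case_id, seed in cases:
        key = frozenset(seed)  # order-insensitive, hashable seed identity
        groups.setdefault(key, []).append(case_id)
    return [(ids, key) for key, ids in groups.items()]


def label(case_ids):
    """Block title: 'Cases CP-001 to CP-007' for a consecutive run, else a list."""
    if len(case_ids) == 1:
        return case_ids[0]
    nums = [int(cid.split("-")[1]) for cid in case_ids]
    if nums == list(range(nums[0], nums[0] + len(nums))):
        return f"Cases {case_ids[0]} to {case_ids[-1]}"
    return "Cases " + ", ".join(case_ids)
```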
@@ -471,8 +593,8 @@ NEXT STEPS:
 2. Identify entities and query schemas via MCP
 3. Search existing QA data → find 12 `QA_INVOICE_*` invoices, 5 `QA_CUSTOMER_*` customers, associated detail
 4. Case-by-case matching → all 10 cases are REUSABLE
-5. Deliver report 7.4: **"No new data was generated"** with the list of existing IDs to use per test case
-6. Phases 5, 6 are not executed, no rollback script is generated
+5. Deliver report 7.5: **"No new data was generated"** + **write `test-data.md`** with the list of existing IDs to use per test case
+6. Phases 5, 6 are not executed; no `seed.sql` or `rollback.sql` is generated
 ---
@@ -481,5 +603,6 @@ NEXT STEPS:
 If the configured MCP is for SQL Server (`mcp_mssql`), the flow is identical but:
 - Tool names remain the same
 - The dependency function is `dbo.GetAllDependencies_JSON` instead of `get_all_dependencies_json`
-- Rollback scripts use `BEGIN TRANSACTION` / `COMMIT` instead of `BEGIN` / `COMMIT`
+- The scripts (`seed.sql` and `rollback.sql`) use `BEGIN TRANSACTION` / `COMMIT` instead of `BEGIN` / `COMMIT`
+- For idempotency in `seed.sql`, use `IF NOT EXISTS (SELECT 1 FROM ... WHERE ...) INSERT ...` instead of `INSERT ... ON CONFLICT DO NOTHING`
 - Use `TOP N` instead of `LIMIT N` for queries
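The idempotency requirement for `seed.sql` (`ON CONFLICT DO NOTHING` in PostgreSQL, `IF NOT EXISTS` guards in SQL Server) is easiest to check by running the same script twice. A minimal sketch using Python's `sqlite3`, whose `ON CONFLICT (...) DO NOTHING` upsert clause (SQLite 3.24 or later) mirrors the PostgreSQL form; the `items`/`stock` schema and the fixed run marker are hypothetical. One design choice worth noting: the child `stock` row looks up its parent by natural key (`code`) rather than a hardcoded `id`, so the script does not depend on IDs generated in the original database.

```python
import sqlite3

# Hypothetical schema standing in for the real one discovered in Phase 2.
SCHEMA_SQL = """
CREATE TABLE items (id INTEGER PRIMARY KEY, code TEXT UNIQUE NOT NULL, name TEXT, notes TEXT);
CREATE TABLE stock (
    item_id INTEGER NOT NULL REFERENCES items(id),
    warehouse TEXT NOT NULL,
    qty INTEGER NOT NULL,
    notes TEXT,
    UNIQUE (item_id, warehouse)
);
"""

# Hypothetical seed: parents (items) before children (stock), every row run-marked.
SEED_SQL = """
BEGIN;
-- Covers CP-001..CP-007 (base data, may already exist)
INSERT INTO items (code, name, notes)
VALUES ('ITM-POLO-001', 'Polo Shirt', 'QA_20260420_001 [SYNTHETIC DATA]')
ON CONFLICT (code) DO NOTHING;
-- Covers CP-008
INSERT INTO items (code, name, notes)
VALUES ('ITM-POLO-MAX', 'Polo Shirt Edge', 'QA_20260420_001 [SYNTHETIC DATA]')
ON CONFLICT (code) DO NOTHING;
-- Child row: parent resolved by natural key, not by a hardcoded id
INSERT INTO stock (item_id, warehouse, qty, notes)
SELECT id, 'BOD-001', 999999, 'QA_20260420_001 [SYNTHETIC DATA]'
FROM items WHERE code = 'ITM-POLO-MAX'
ON CONFLICT (item_id, warehouse) DO NOTHING;
COMMIT;
"""


def apply_seed(conn, script=SEED_SQL):
    """Run the seed script; executescript honors the script's own BEGIN/COMMIT."""
    conn.executescript(script)


def row_counts(conn):
    """Row count per seeded table, used to prove a re-run adds nothing."""
    return {
        table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in ("items", "stock")
    }
```

In SQLite, the `WHERE` clause on the `INSERT ... SELECT` is also what lets the parser read the trailing `ON CONFLICT` as an upsert rather than a join constraint.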

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "siesa-agents",
-  "version": "2.1.72-qa.6",
+  "version": "2.1.72-qa.8",
   "description": "Paquete para instalar y configurar agentes SIESA en tu proyecto",
   "main": "index.js",
   "bin": {

package/siesa-agents/bmm/workflows/3-solutioning/quality-process/workflow.md CHANGED

@@ -2,7 +2,7 @@
 name: quality-process
 description: "QA Quality Process — Phase 1 Planning (BMAD V6.0 test design), Phase 2 Design (QA Test Plan & Strategy), Phase 3 AgileTest Registration (push validated design to Jira/AgileTest), Phase 4 DOR Gate (verify Definition of Ready checklist), Phase 5 Playwright Implementation (generate E2E test code from test-cases.csv gaps), or Phase 6 QA Data Generation (provision synthetic test data in the real DB via MCP from the Phase 2 test-cases.csv, reusing existing QA data before generating). Asks at startup which phase to execute."
 web_bundle: true
-version: 1.1.0
+version: 1.2.0
 parameters:
   feature_id:
     description: 'Optional: Feature ID to process (e.g., "feature-1"). Applies only to Phase 2 — Design.'
@@ -19,7 +19,7 @@ parameters:
 - **Phase 3 — AgileTest Registration:** Pushes the validated design to Jira/AgileTest with full traceability.
 - **Phase 4 — DOR Gate:** Verifies that the requirement meets the Definition of Ready (DoR) before any build work starts. Produces a pass/fail report per item.
 - **Phase 5 — Playwright Implementation:** Takes the `test-cases.csv` generated in Phase 2, automatically identifies tests without coverage in the existing spec files, classifies each test by technology (Playwright E2E, Vitest Unit, .NET xUnit), generates Playwright code for the gaps, injects it into the spec files with human approval, and updates the CSV.
-- **Phase 6 — QA Data Generation:** Takes the `test-cases.csv` generated in Phase 2 and, connecting to the real database via MCP (PostgreSQL/SQL Server), reuses existing QA data before generating; provisions only the missing synthetic data with referential integrity, under human approval, and delivers a traceability report + rollback script. Delegates to the `sa-qa-data-generator` skill.
+- **Phase 6 — QA Data Generation:** Takes the `test-cases.csv` generated in Phase 2 and, connecting to the real database via MCP (PostgreSQL/SQL Server), infers and confirms the business and reuses existing QA data before generating; provisions only the missing synthetic data with referential integrity, under human approval, and delivers 3 artifacts on disk: `test-data.md` (case→data relation grouped by seed, ready for Playwright), an idempotent `seed.sql`, and a `rollback.sql` scoped to the run. Delegates to the `sa-qa-data-generator` skill.
 **Your Role:** In addition to your name, communication_style and persona, you act as a **Senior QA Architect** with 10+ years of experience in complex enterprise systems (ERP, HCM, CRM, financial platforms, regulatory compliance). You think as a business advocate, not just as a test technician. You work under the BMAD v6.0 methodology.
@@ -84,8 +84,9 @@ Ask the user which phase of the quality process they want to run:
   6️⃣  QA Data Generation
                           — Provisions synthetic data in the real database (via MCP)
                             so that the cases designed in Phase 2 can be executed.
-                            Reuses existing QA data before generating, inserts only what
-                            is missing with human approval, and delivers a report + SQL rollback.
+                            Infers and confirms the business, reuses existing QA data before
+                            generating, inserts only what is missing with human approval, and
+                            delivers 3 artifacts: test-data.md, seed.sql and rollback.sql.
                             (Requires: Phase 2 completed, test-cases.csv + DB MCP connected)
 Which phase do you want to run?
@@ -111,7 +112,7 @@ Ask the user which phase of the quality process they want to run:
         },
         {
           "label": "Phase 6 — QA Data Generation",
-          "description": "Provisions synthetic data in the real DB (via MCP) from the Phase 2 test-cases.csv: reuses existing QA data, inserts only what is missing behind an approval gate, and delivers a report + SQL rollback"
+          "description": "Provisions synthetic data in the real DB (via MCP) from the Phase 2 test-cases.csv: infers and confirms the business, reuses existing QA data, inserts only what is missing behind an approval gate, and delivers test-data.md + seed.sql + rollback.sql"
         }
       ]
     }
@@ -659,7 +660,7 @@ After saving `shards/test-design-phase4-test-matrix.md` (Section V — M
 **Export the complete design to structured YAML:**
 In addition to the `test-design.md` (narrative) and the `test-cases.csv` (tabular cases), generate a
-`test-design.yaml` that represents **the entire design as data** (machine-readable). The
+`test-design.yml` that represents **the entire design as data** (machine-readable). The
 agent already holds Sections I→VI of the megaprompt in memory; serialize them to YAML with this
 structure:
@@ -733,7 +734,7 @@ traceability:                     # Appendix — Traceability Matrix
   in Feature Mode → `dependencies: []`; in Full Mode → `interface_type: null`.
 - ✅ The `test_matrix` must have **one entry per row** of the matrix (same count as the CSV).
-Save as: `{implementation_artifacts}/quality-process/diseno/test-design-YYYY-MM-DD-HHmmss/test-design.yaml` (at the root, next to `test-cases.csv`, NOT in shards/)
+Save as: `{implementation_artifacts}/quality-process/diseno/test-design-YYYY-MM-DD-HHmmss/test-design.yml` (at the root, next to `test-cases.csv`, NOT in shards/)
 ---
@@ -748,7 +749,7 @@ Present to the user:
 📄 Root files:
   • test-design.md   — Unified document (phases 1–5)
-  • test-design.yaml — Complete structured design (machine-readable)
+  • test-design.yml  — Complete structured design (machine-readable)
   • test-cases.csv   — Exported cases (Siesa FT-SD-007 v5.0)
 📂 shards/ (individual documents per phase):
@@ -1850,9 +1851,9 @@ Present to the user:
 ### Objective
-Provision in the real database (via MCP) the synthetic data that the test cases designed in Phase 2 need in order to run. This phase **reuses existing QA data** before generating anything, inserts only what is missing while respecting referential integrity and real types, and delivers a per-case traceability report plus a rollback script scoped to the run.
+Provision in the real database (via MCP) the synthetic data that the test cases designed in Phase 2 need in order to run. This phase **infers and confirms the business** with the user, **reuses existing QA data** before generating anything, inserts only what is missing while respecting referential integrity and real types, and delivers **3 artifacts on disk**: `test-data.md` (case→data relation grouped by seed, oriented to Playwright), an idempotent `seed.sql`, and a `rollback.sql` scoped to the run.
-**Principle:** "Reuse before generating. Nothing is inserted without human approval. It is valid and expected for a run to end with **0 insertions** if all coverage already exists."
+**Principle:** "Reuse before generating. Nothing is inserted without human approval. It is valid and expected for a run to end with **0 insertions** if all coverage already exists, but `test-data.md` is written even then."
 **Source of truth:** The 7-phase protocol lives in the `sa-qa-data-generator` skill. This phase orchestrates that skill within the quality cycle so that QA does not have to invoke it as a separate command; the input is the `test-cases.csv` that Phase 2 itself produced.
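The "reuse before generating" principle reduces to a small decision over the PHASE 4 matching result: which cases are REUSABLE, PARTIAL, or MISSING, and therefore which files F6.6 must write. A minimal sketch, assuming each case is described by the set of entities it needs and the subset already covered by existing QA data; the status labels come from the protocol, while `classify` and `plan_deliverables` are hypothetical helpers.

```python
def classify(required, existing):
    """PHASE 4 status for one case: all needed data exists, some of it, or none."""
    covered = required & existing
    if covered == required:
        return "REUSABLE"
    if covered:
        return "PARTIAL"
    return "MISSING"


def plan_deliverables(cases):
    """Map {case_id: (required, existing)} to per-case statuses and files to write.

    test-data.md is always written; seed.sql and rollback.sql only when at
    least one case needs insertions (PARTIAL or MISSING).
    """
    statuses = {cid: classify(req, have) for cid, (req, have) in cases.items()}
    needs_insert = any(status != "REUSABLE" for status in statuses.values())
    files = ["test-data.md"]
    if needs_insert:
        files += ["seed.sql", "rollback.sql"]
    return statuses, files
```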
@@ -1949,6 +1950,9 @@ Once the MCP is connected, run Phase 6 again.
 3. **Run Phases 1–4 of the protocol** (no insertion yet):
    - PHASE 1 — Case reading and analysis (POSITIVE / EDGE / NEGATIVE classification)
+   - **PHASE 1.A — Business inference and confirmation (GATE):** deduce the business/domain
+     the cases target, present it to the user, and **wait for confirmation** before continuing.
+     The confirmed business heads `test-data.md` and names the output folder (F6.6).
    - PHASE 2 — Schema discovery via MCP (NEVER assume columns/types)
    - PHASE 3 — Entity tree construction (insertion order)
    - PHASE 4 — Existing data verification and case-by-case matching (REUSABLE / PARTIAL / MISSING)
@@ -1956,12 +1960,16 @@ Once the MCP is connected, run Phase 6 again.
    Notify the user during execution:
    ```
    ⚙️ QA Data Agent — analyzing cases against the real DB:
-      • PHASE 1: Case reading and classification
-      • PHASE 2: Schema discovery (MCP)
-      • PHASE 3: Entity tree and insertion order
-      • PHASE 4: Matching against existing QA data (reuse before generating)
+      • PHASE 1:   Case reading and classification
+      • PHASE 1.A: Business inference (requires user confirmation)
+      • PHASE 2:   Schema discovery (MCP)
+      • PHASE 3:   Entity tree and insertion order
+      • PHASE 4:   Matching against existing QA data (reuse before generating)
    ```
+   > The PHASE 1.A GATE **cannot be skipped**: if the user neither confirms the business nor
+   > provides a correction, do not proceed to PHASE 2.
 ---
 ### F6.4: Human Approval Gate (MANDATORY)
@@ -1983,7 +1991,7 @@ Present the PHASE 4 matching result:
 ```
 **If the decision is "0 insertions":**
-- There is nothing to approve. Skip F6.5 and go straight to F6.6 with report 7.4 ("No new data was generated") listing the existing IDs to use per case.
+- There is nothing to approve. Skip F6.5 and go straight to F6.6 with report 7.5 ("No new data was generated"); **`test-data.md`** is written with all blocks in "reused" status and the existing IDs to use per case (neither `seed.sql` nor `rollback.sql` is generated).
 **If there is at least one PARTIAL or MISSING case**, ask:
 ```json
@@ -2022,35 +2030,40 @@ Present the PHASE 4 matching result:
 Run only for the approved PARTIAL/MISSING cases:
-- PHASE 5 — Data generation (real types, NOT NULL columns, valid FKs, `QA_`/`TEST_`/`[SYNTHETIC DATA]` marker, no real PII)
-- PHASE 6 — DB insertion in strict tree order (root/catalog → master → focal → detail → SPs/functions)
+- PHASE 5 — Data generation (real types, NOT NULL columns, valid FKs, `QA_`/`TEST_`/`[SYNTHETIC DATA]` marker, no real PII). Assign a common **run ID** (`QA_<fecha>_<seq>`) to all records so the rollback can be scoped.
+- PHASE 6 — DB insertion in strict tree order (root/catalog → master → focal → detail → SPs/functions) **and, in parallel, writing of `seed.sql`**, which reproduces the same insertions (transactional, idempotent with `ON CONFLICT DO NOTHING` / `IF NOT EXISTS`, commented per case, ordered parents→children). It is saved in F6.6.
 **After each insertion:** verify success. If it fails due to an FK, constraint, or type error, do NOT continue with dependents: diagnose by re-querying the schema, fix, and retry; if it still cannot be resolved, inform the user and stop.
 ---
-### F6.6: Save Report and Rollback Script
+### F6.6: Save the 3 Deliverables
 **Generate timestamp:** `YYYY-MM-DD-HHmmss`
-**Create folder:** `{implementation_artifacts}/quality-process/datos/data-gen-YYYY-MM-DD-HHmmss/`
+**Business slug:** derive from PHASE 1.A a kebab-case slug of the confirmed business (e.g., "Una tienda de ropa" (a clothing store) → `tienda-de-ropa`).
+**Create folder:** `{implementation_artifacts}/quality-process/datos/data-gen-{negocio-slug}-YYYY-MM-DD-HHmmss/`
-Save two files (use the **Write tool**, UTF-8 encoding):
+Save the files (use the **Write tool**, UTF-8 encoding). Build each complete string in memory and call Write only once per file; **NEVER** use bash/cat/echo/sed:
-1. **`data-generation-report.md`** — the protocol's PHASE 7 report:
-   - Execution summary (REUSABLE / PARTIAL / MISSING, totals inserted vs reused)
-   - Per-case traceability matrix (reused and/or created IDs)
-   - If there were 0 insertions: report 7.4 with the existing IDs to use per case
+1. **`test-data.md`** (deliverable 1, PHASE 7.2): **ALWAYS written**, even with 0 insertions:
+   - Header `# Test Data — {confirmed business}` + metadata (process, DB, date, **run ID**).
+   - Execution summary table at the top (REUSABLE / PARTIAL / MISSING, inserted vs reused).
+   - Case→data relation **grouped by seed**: cases with an identical seed share a single block; only edge/negative cases open an individual block. Literal values (bold = typed in the UI) + DB `id=` for traceability.
+   - If there were 0 insertions: all blocks in "reused" status (report 7.5).
    **Frontmatter:**
    ```yaml
    ---
    workflow: quality-process
    phase: datos
-   version: 1.0.0
+   version: 1.1.0
    engine: sa-qa-data-generator
    generated_date: [ISO 8601 date]
    project_name: {project_name}
+   negocio: [business confirmed in PHASE 1.A]
+   run_id: QA_<fecha>_<seq>
    source_csv: [relative path]/test-cases.csv
    db_engine: postgresql | mssql
    inserciones: [N]
@@ -2059,7 +2072,9 @@ Save two files (use the **Write tool**, UTF-8 encoding):
    ---
    ```
-2. **`rollback.sql`** — PHASE 7.3 rollback script, **only if there were insertions**, scoped to the IDs/marker of **this** run (not a broad `LIKE 'QA_%'` that would delete reused data from previous runs). If there were 0 insertions, do not generate this file.
+2. **`seed.sql`** (deliverable 2, PHASE 7.3): **only if there were insertions**. Transactional, idempotent (`ON CONFLICT DO NOTHING` in PostgreSQL / `IF NOT EXISTS` in SQL Server), ordered parents→children, commented per case (`-- Covers CP-...`), marked with the run ID, and containing only created records (not REUSABLE). If there were 0 insertions, do not generate this file.
+3. **`rollback.sql`** (deliverable 3, PHASE 7.4): **only if there were insertions**, scoped to the IDs/marker of **this** run (not a broad `LIKE 'QA_%'` that would delete reused data from previous runs). If there were 0 insertions, do not generate this file.
 **Do not touch** previously reused data or any data not created by this run.
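The F6.6 folder name combines a kebab-case business slug with the timestamp. A minimal sketch, assuming the slug strips accents and drops a leading article (as the "Una tienda de ropa" → `tienda-de-ropa` example implies); the article list, the `business_slug`/`output_folder` helpers, and passing `implementation_artifacts` in as a plain string are assumptions.

```python
import re
import unicodedata
from datetime import datetime
from pathlib import PurePosixPath

# Assumption: leading articles (Spanish and English) are dropped from the slug.
LEADING_ARTICLES = {"un", "una", "el", "la", "los", "las", "a", "an", "the"}


def business_slug(business):
    """'Una tienda de ropa' -> 'tienda-de-ropa' (accents stripped, kebab-case)."""
    ascii_text = (
        unicodedata.normalize("NFKD", business).encode("ascii", "ignore").decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    if len(words) > 1 and words[0] in LEADING_ARTICLES:
        words = words[1:]
    return "-".join(words)


def output_folder(implementation_artifacts, business, now=None):
    """Build .../quality-process/datos/data-gen-{slug}-YYYY-MM-DD-HHmmss."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d-%H%M%S")
    name = f"data-gen-{business_slug(business)}-{stamp}"
    return PurePosixPath(implementation_artifacts) / "quality-process" / "datos" / name
```

Passing a fixed `now` makes the folder name deterministic, which is convenient when testing the naming rule.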
@@ -2072,7 +2087,9 @@ Present to the user:
 ```
 ✅ PHASE 6 — QA DATA GENERATION COMPLETED
-📁 Folder: {implementation_artifacts}/quality-process/datos/data-gen-YYYY-MM-DD-HHmmss/
+📁 Folder: {implementation_artifacts}/quality-process/datos/data-gen-{negocio-slug}-YYYY-MM-DD-HHmmss/
+🏷️  Confirmed business: [business from PHASE 1.A]
 📊 Summary:
   • Test cases:             [M]
@@ -2084,10 +2101,11 @@ Present to the user:
   • Data coverage:          [N/M] cases
 📄 Files:
-  • data-generation-report.md — per-case data traceability
-  • rollback.sql              — cleanup scoped to this run {or "not generated (0 insertions)"}
+  • test-data.md  — case→data relation grouped by seed (ready for Playwright)
+  • seed.sql      — reloadable idempotent insertions {or "not generated (0 insertions)"}
+  • rollback.sql  — cleanup scoped to this run {or "not generated (0 insertions)"}
-⚠️  rollback.sql must be run ONLY in test environments.
+⚠️  seed.sql and rollback.sql must be run ONLY in test environments.
 ```
 **⚠️ The workflow ends here. The QA team can run the cases using the reported IDs.**