@memberjunction/query-gen 0.0.1 → 2.126.1

Files changed (138)
  1. package/.turbo/turbo-build.log +4 -0
  2. package/CHANGELOG.md +34 -0
  3. package/COORDINATOR.md +768 -0
  4. package/IMPLEMENTATION_PLAN.md +1753 -0
  5. package/LLM_ENTITY_GROUPING_PLAN.md +977 -0
  6. package/README.md +675 -29
  7. package/dist/cli/commands/export.d.ts +15 -0
  8. package/dist/cli/commands/export.d.ts.map +1 -0
  9. package/dist/cli/commands/export.js +178 -0
  10. package/dist/cli/commands/export.js.map +1 -0
  11. package/dist/cli/commands/generate.d.ts +19 -0
  12. package/dist/cli/commands/generate.d.ts.map +1 -0
  13. package/dist/cli/commands/generate.js +282 -0
  14. package/dist/cli/commands/generate.js.map +1 -0
  15. package/dist/cli/commands/validate.d.ts +17 -0
  16. package/dist/cli/commands/validate.d.ts.map +1 -0
  17. package/dist/cli/commands/validate.js +193 -0
  18. package/dist/cli/commands/validate.js.map +1 -0
  19. package/dist/cli/config.d.ts +51 -0
  20. package/dist/cli/config.d.ts.map +1 -0
  21. package/dist/cli/config.js +142 -0
  22. package/dist/cli/config.js.map +1 -0
  23. package/dist/cli/index.d.ts +13 -0
  24. package/dist/cli/index.d.ts.map +1 -0
  25. package/dist/cli/index.js +57 -0
  26. package/dist/cli/index.js.map +1 -0
  27. package/dist/core/EntityGrouper.d.ts +74 -0
  28. package/dist/core/EntityGrouper.d.ts.map +1 -0
  29. package/dist/core/EntityGrouper.js +246 -0
  30. package/dist/core/EntityGrouper.js.map +1 -0
  31. package/dist/core/MetadataExporter.d.ts +59 -0
  32. package/dist/core/MetadataExporter.d.ts.map +1 -0
  33. package/dist/core/MetadataExporter.js +151 -0
  34. package/dist/core/MetadataExporter.js.map +1 -0
  35. package/dist/core/QueryDatabaseWriter.d.ts +50 -0
  36. package/dist/core/QueryDatabaseWriter.d.ts.map +1 -0
  37. package/dist/core/QueryDatabaseWriter.js +152 -0
  38. package/dist/core/QueryDatabaseWriter.js.map +1 -0
  39. package/dist/core/QueryFixer.d.ts +48 -0
  40. package/dist/core/QueryFixer.d.ts.map +1 -0
  41. package/dist/core/QueryFixer.js +115 -0
  42. package/dist/core/QueryFixer.js.map +1 -0
  43. package/dist/core/QueryRefiner.d.ts +94 -0
  44. package/dist/core/QueryRefiner.d.ts.map +1 -0
  45. package/dist/core/QueryRefiner.js +267 -0
  46. package/dist/core/QueryRefiner.js.map +1 -0
  47. package/dist/core/QueryTester.d.ts +70 -0
  48. package/dist/core/QueryTester.d.ts.map +1 -0
  49. package/dist/core/QueryTester.js +243 -0
  50. package/dist/core/QueryTester.js.map +1 -0
  51. package/dist/core/QueryWriter.d.ts +57 -0
  52. package/dist/core/QueryWriter.d.ts.map +1 -0
  53. package/dist/core/QueryWriter.js +184 -0
  54. package/dist/core/QueryWriter.js.map +1 -0
  55. package/dist/core/QuestionGenerator.d.ts +58 -0
  56. package/dist/core/QuestionGenerator.d.ts.map +1 -0
  57. package/dist/core/QuestionGenerator.js +145 -0
  58. package/dist/core/QuestionGenerator.js.map +1 -0
  59. package/dist/data/schema.d.ts +230 -0
  60. package/dist/data/schema.d.ts.map +1 -0
  61. package/dist/data/schema.js +6 -0
  62. package/dist/data/schema.js.map +1 -0
  63. package/dist/index.d.ts +28 -0
  64. package/dist/index.d.ts.map +1 -0
  65. package/dist/index.js +77 -0
  66. package/dist/index.js.map +1 -0
  67. package/dist/prompts/PromptNames.d.ts +32 -0
  68. package/dist/prompts/PromptNames.d.ts.map +1 -0
  69. package/dist/prompts/PromptNames.js +35 -0
  70. package/dist/prompts/PromptNames.js.map +1 -0
  71. package/dist/utils/category-builder.d.ts +28 -0
  72. package/dist/utils/category-builder.d.ts.map +1 -0
  73. package/dist/utils/category-builder.js +90 -0
  74. package/dist/utils/category-builder.js.map +1 -0
  75. package/dist/utils/entity-helpers.d.ts +49 -0
  76. package/dist/utils/entity-helpers.d.ts.map +1 -0
  77. package/dist/utils/entity-helpers.js +189 -0
  78. package/dist/utils/entity-helpers.js.map +1 -0
  79. package/dist/utils/error-handlers.d.ts +19 -0
  80. package/dist/utils/error-handlers.d.ts.map +1 -0
  81. package/dist/utils/error-handlers.js +41 -0
  82. package/dist/utils/error-handlers.js.map +1 -0
  83. package/dist/utils/graph-helpers.d.ts +51 -0
  84. package/dist/utils/graph-helpers.d.ts.map +1 -0
  85. package/dist/utils/graph-helpers.js +82 -0
  86. package/dist/utils/graph-helpers.js.map +1 -0
  87. package/dist/utils/prompt-helpers.d.ts +25 -0
  88. package/dist/utils/prompt-helpers.d.ts.map +1 -0
  89. package/dist/utils/prompt-helpers.js +66 -0
  90. package/dist/utils/prompt-helpers.js.map +1 -0
  91. package/dist/utils/query-helpers.d.ts +23 -0
  92. package/dist/utils/query-helpers.d.ts.map +1 -0
  93. package/dist/utils/query-helpers.js +34 -0
  94. package/dist/utils/query-helpers.js.map +1 -0
  95. package/dist/utils/user-helpers.d.ts +15 -0
  96. package/dist/utils/user-helpers.d.ts.map +1 -0
  97. package/dist/utils/user-helpers.js +32 -0
  98. package/dist/utils/user-helpers.js.map +1 -0
  99. package/dist/vectors/EmbeddingService.d.ts +58 -0
  100. package/dist/vectors/EmbeddingService.d.ts.map +1 -0
  101. package/dist/vectors/EmbeddingService.js +90 -0
  102. package/dist/vectors/EmbeddingService.js.map +1 -0
  103. package/dist/vectors/SimilaritySearch.d.ts +51 -0
  104. package/dist/vectors/SimilaritySearch.d.ts.map +1 -0
  105. package/dist/vectors/SimilaritySearch.js +85 -0
  106. package/dist/vectors/SimilaritySearch.js.map +1 -0
  107. package/docs/API.md +1040 -0
  108. package/docs/ARCHITECTURE.md +1120 -0
  109. package/examples/advanced-usage.ts +401 -0
  110. package/examples/basic-usage.ts +285 -0
  111. package/package.json +48 -6
  112. package/src/cli/commands/export.ts +173 -0
  113. package/src/cli/commands/generate.ts +330 -0
  114. package/src/cli/commands/validate.ts +185 -0
  115. package/src/cli/config.ts +203 -0
  116. package/src/cli/index.ts +63 -0
  117. package/src/core/EntityGrouper.ts +318 -0
  118. package/src/core/MetadataExporter.ts +148 -0
  119. package/src/core/QueryDatabaseWriter.ts +187 -0
  120. package/src/core/QueryFixer.ts +153 -0
  121. package/src/core/QueryRefiner.ts +382 -0
  122. package/src/core/QueryTester.ts +264 -0
  123. package/src/core/QueryWriter.ts +239 -0
  124. package/src/core/QuestionGenerator.ts +199 -0
  125. package/src/data/golden-queries.json +1371 -0
  126. package/src/data/schema.ts +252 -0
  127. package/src/index.ts +49 -0
  128. package/src/prompts/PromptNames.ts +36 -0
  129. package/src/utils/category-builder.ts +97 -0
  130. package/src/utils/entity-helpers.ts +203 -0
  131. package/src/utils/error-handlers.ts +41 -0
  132. package/src/utils/graph-helpers.ts +99 -0
  133. package/src/utils/prompt-helpers.ts +79 -0
  134. package/src/utils/query-helpers.ts +32 -0
  135. package/src/utils/user-helpers.ts +39 -0
  136. package/src/vectors/EmbeddingService.ts +109 -0
  137. package/src/vectors/SimilaritySearch.ts +108 -0
  138. package/tsconfig.json +39 -0
@@ -0,0 +1,1753 @@
1
+ # Query Generation Package Implementation Plan
2
+
3
+ ## Package Overview
4
+ **Package Name**: `@memberjunction/query-gen`
5
+ **Purpose**: AI-powered generation of domain-specific SQL query templates with automatic testing, refinement, and metadata export
6
+ **CLI Command**: `mj querygen`
7
+
8
+ ---
9
+
10
+ ## Phase 1: Project Setup & Infrastructure (Week 1)
11
+
12
+ ### 1.1 Package Structure Creation
13
+ ```
14
+ packages/QueryGen/
15
+ ├── src/
16
+ │ ├── cli/
17
+ │ │ ├── commands/
18
+ │ │ │ ├── generate.ts # Main generation command
19
+ │ │ │ ├── validate.ts # Query validation command
20
+ │ │ │ └── export.ts # Metadata export command
21
+ │ │ ├── config.ts # Configuration loader
22
+ │ │ └── index.ts # CLI entry point
23
+ │ ├── core/
24
+ │ │ ├── EntityGrouper.ts # Entity relationship analysis
25
+ │ │ ├── QuestionGenerator.ts # Business question generation
26
+ │ │ ├── QueryWriter.ts # SQL template generation
27
+ │ │ ├── QueryTester.ts # Query execution & validation
28
+ │ │ ├── QueryRefiner.ts # Query refinement logic
29
+ │ │ └── MetadataExporter.ts # MJ metadata file export
30
+ │ ├── prompts/
31
+ │ │ └── PromptNames.ts # Static prompt name constants
32
+ │ ├── vectors/
33
+ │ │ └── SimilaritySearch.ts # Weighted similarity logic for golden queries
34
+ │ ├── data/
35
+ │ │ ├── golden-queries.json # 20 example queries
36
+ │ │ └── schema.ts # Type definitions
37
+ │ ├── utils/
38
+ │ │ ├── sql-helpers.ts # SQL parsing utilities
39
+ │ │ └── error-handlers.ts # Error handling helpers
40
+ │ └── index.ts
41
+ ├── package.json
42
+ ├── tsconfig.json
43
+ └── README.md
44
+ ```
45
+
46
+ ### 1.2 Configuration System
47
+ **Decision**: Integrate with `mj.config.cjs` for consistency with existing MJ packages
48
+
49
+ ```typescript
50
+ // Add to mj.config.cjs
51
+ queryGeneration: {
52
+ // Entity Filtering
53
+ includeEntities: ['*'], // Default: all entities
54
+ excludeEntities: [], // Default: none
55
+ excludeSchemas: ['__mj'], // Default: exclude MJ core schema
56
+
57
+ // Entity Grouping
58
+ maxEntitiesPerGroup: 3, // Default: 3 related entities
59
+ minEntitiesPerGroup: 1, // Default: 1 (single entity queries)
60
+ questionsPerGroup: 2, // Default: 1-2 questions per group
61
+ entityGroupStrategy: 'breadth', // 'breadth' | 'depth' - prefer breadth-first grouping
62
+
63
+ // AI Configuration
64
+ modelOverride: undefined, // Optional: prefer specific model
65
+ vendorOverride: undefined, // Optional: prefer specific vendor
66
+ embeddingModel: 'all-MiniLM-L6-v2', // Local embedding model
67
+
68
+ // Iteration Limits
69
+ maxRefinementIterations: 3, // Default: 3 refinement cycles
70
+ maxFixingIterations: 5, // Default: 5 error-fixing attempts
71
+
72
+ // Few-Shot Learning
73
+ topSimilarQueries: 5, // Default: top 5 example queries
74
+ similarityThreshold: 0.7, // Similarity threshold (still returns topN even if below threshold)
75
+
76
+ // Similarity Weighting
77
+ similarityWeights: {
78
+ name: 0.1, // 10% weight for name similarity
79
+ userQuestion: 0.2, // 20% weight for user question similarity
80
+ description: 0.35, // 35% weight for description similarity
81
+ technicalDescription: 0.35 // 35% weight for technical description similarity
82
+ },
83
+
84
+ // Output Configuration
85
+ outputMode: 'metadata', // 'metadata' | 'database' | 'both'
86
+ outputDirectory: './metadata/queries',
87
+
88
+ // Performance
89
+ parallelGenerations: 3, // Generate 3 queries in parallel
90
+ enableCaching: true, // Cache prompt results
91
+
92
+ // Validation
93
+ testWithSampleData: true, // Test queries before export
94
+ requireMinRows: 1, // Queries must return at least 1 row
95
+ maxRefinementRows: 10, // Maximum rows to use for refinement evaluation
96
+
97
+ // Verbose Logging
98
+ verbose: false
99
+ }
100
+ ```
101
+
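The configuration block above could be merged with defaults and sanity-checked along the following lines. This is a minimal sketch, not the package's actual loader: the `QueryGenConfig` interface, `DEFAULT_CONFIG`, and `resolveConfig` are hypothetical names, only a subset of the options is modeled, and the rule that the similarity weights must sum to 1 is an assumption inferred from the percentages in the comments.

```typescript
// Hypothetical types; only the field names and default values come from the plan.
interface SimilarityWeights {
  name: number;
  userQuestion: number;
  description: number;
  technicalDescription: number;
}

interface QueryGenConfig {
  minEntitiesPerGroup: number;
  maxEntitiesPerGroup: number;
  maxRefinementIterations: number;
  maxFixingIterations: number;
  topSimilarQueries: number;
  similarityWeights: SimilarityWeights;
}

type ConfigOverrides = Partial<Omit<QueryGenConfig, 'similarityWeights'>> & {
  similarityWeights?: Partial<SimilarityWeights>;
};

const DEFAULT_CONFIG: QueryGenConfig = {
  minEntitiesPerGroup: 1,
  maxEntitiesPerGroup: 3,
  maxRefinementIterations: 3,
  maxFixingIterations: 5,
  topSimilarQueries: 5,
  similarityWeights: { name: 0.1, userQuestion: 0.2, description: 0.35, technicalDescription: 0.35 },
};

// Merge user overrides over the defaults, then validate the result.
function resolveConfig(overrides: ConfigOverrides = {}): QueryGenConfig {
  const merged: QueryGenConfig = {
    ...DEFAULT_CONFIG,
    ...overrides,
    similarityWeights: { ...DEFAULT_CONFIG.similarityWeights, ...overrides.similarityWeights },
  };

  if (merged.minEntitiesPerGroup < 1 || merged.minEntitiesPerGroup > merged.maxEntitiesPerGroup) {
    throw new Error('minEntitiesPerGroup must be >= 1 and <= maxEntitiesPerGroup');
  }

  // Assumption: weights are meant to sum to 1 (10% + 20% + 35% + 35%).
  const w = merged.similarityWeights;
  const total = w.name + w.userQuestion + w.description + w.technicalDescription;
  if (Math.abs(total - 1) > 1e-6) {
    throw new Error(`similarityWeights must sum to 1, got ${total}`);
  }
  return merged;
}
```

Validating at load time surfaces a mistyped weight before any prompts are run, rather than as skewed few-shot rankings later.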
102
+ ### 1.3 Golden Queries Data Structure
103
+ **Decision**: Embed as JSON file in `src/data/` directory (distributed with npm package)
104
+
105
+ ```typescript
106
+ // src/data/golden-queries.json structure
107
+ [
108
+ {
109
+ "name": "Customer Orders Summary",
110
+ "userQuestion": "Show me a summary of customer orders by region",
111
+ "description": "Aggregates order data by customer region with totals",
112
+ "technicalDescription": "Groups orders by customer region, calculates total orders and revenue per region",
113
+ "sql": "SELECT ...",
114
+ "parameters": [...],
115
+ "selectClause": [...]
116
+ },
117
+ // ... 19 more queries
118
+ ]
119
+
120
+ // Note: Embeddings for each field will be generated at runtime using AIEngine
121
+ // We don't pre-compute embeddings - they're computed on-demand during CLI execution
122
+ // This allows flexibility if embedding models change
123
+ ```
124
+
125
+ ### 1.4 Dependencies
126
+ ```json
127
+ {
128
+ "dependencies": {
129
+ "@memberjunction/core": "workspace:*",
130
+ "@memberjunction/core-entities": "workspace:*",
131
+ "@memberjunction/ai": "workspace:*",
132
+ "@memberjunction/ai-engine": "workspace:*",
133
+ "@memberjunction/ai-prompts": "workspace:*",
134
+ "@memberjunction/ai-vectors-memory": "workspace:*",
135
+ "@memberjunction/sql-server-dataprovider": "workspace:*",
136
+ "commander": "^11.0.0",
137
+ "chalk": "^5.3.0",
138
+ "ora": "^7.0.0",
139
+ "nunjucks": "^3.2.4"
140
+ }
141
+ }
142
+ ```
143
+
144
+ ---
145
+
146
+ ## Phase 2: Entity Analysis & Grouping (Week 2)
147
+
148
+ ### 2.1 EntityGrouper Implementation
149
+ **Purpose**: Create logical groups of 1-N related entities for query generation
150
+
151
+ **Key Features**:
152
+ - Load all entities from Metadata (respecting include/exclude filters)
153
+ - Analyze foreign key relationships to identify related entities
154
+ - Generate all valid combinations of 1-N entities
155
+ - Ensure no duplicate entity groups
156
+ - Allow same entity in multiple groups (different combinations)
157
+
158
+ **Algorithm**:
159
+ ```typescript
160
+ class EntityGrouper {
161
+ async generateEntityGroups(
162
+ entities: EntityInfo[],
163
+ minSize: number,
164
+ maxSize: number
165
+ ): Promise<EntityGroup[]> {
166
+ // 1. Build relationship graph from foreign keys
167
+ // 2. For each entity, find all connected entities using BREADTH-FIRST traversal
168
+ // - Prefer entities with direct relationships (1 hop away)
169
+ // - Then add entities 2 hops away, etc.
170
+ // - This creates more focused, practical entity groups
171
+ // 3. Generate combinations of size minSize to maxSize
172
+ // 4. Deduplicate groups (same entities = same group)
173
+ // 5. Return unique groups with relationship metadata
174
+ }
175
+ }
176
+
177
+ interface EntityGroup {
178
+ entities: EntityInfo[];
179
+ relationships: RelationshipInfo[];
180
+ primaryEntity: EntityInfo; // The "main" entity
181
+ relationshipType: 'single' | 'parent-child' | 'many-to-many';
182
+ }
183
+ ```
184
+
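The breadth-first grouping steps listed in the algorithm comments can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's `EntityGrouper`: entities are plain name strings instead of `EntityInfo`, the `Edge` type and all function names are hypothetical, and each group is a prefix of the seed's BFS order (which keeps every group connected) instead of every possible combination.

```typescript
// Hypothetical edge type: one foreign key from one entity to another.
type Edge = { from: string; to: string };

// Step 1: build an undirected relationship graph from foreign keys.
function buildGraph(entities: string[], edges: Edge[]): Map<string, Set<string>> {
  const graph = new Map<string, Set<string>>();
  for (const name of entities) graph.set(name, new Set<string>());
  for (const { from, to } of edges) {
    const a = graph.get(from);
    const b = graph.get(to);
    if (!a || !b) continue; // skip edges that touch filtered-out entities
    a.add(to);
    b.add(from);
  }
  return graph;
}

// Step 2: breadth-first order from a seed, so 1-hop neighbors come before 2-hop ones.
function bfsOrder(graph: Map<string, Set<string>>, start: string, limit: number): string[] {
  const visited = new Set<string>([start]);
  const order: string[] = [start];
  const queue: string[] = [start];
  while (queue.length > 0 && order.length < limit) {
    const current = queue.shift();
    if (current === undefined) break;
    for (const next of Array.from(graph.get(current) ?? new Set<string>())) {
      if (visited.has(next)) continue;
      visited.add(next);
      order.push(next);
      queue.push(next);
      if (order.length >= limit) break;
    }
  }
  return order;
}

// Steps 3-4: emit groups of size minSize..maxSize and deduplicate by member set.
function generateGroups(entities: string[], edges: Edge[], minSize: number, maxSize: number): string[][] {
  const graph = buildGraph(entities, edges);
  const seen = new Set<string>();
  const groups: string[][] = [];
  for (const seed of entities) {
    const order = bfsOrder(graph, seed, maxSize);
    for (let size = minSize; size <= order.length; size++) {
      const group = order.slice(0, size);
      const key = [...group].sort().join('|'); // same entities = same group
      if (!seen.has(key)) {
        seen.add(key);
        groups.push(group);
      }
    }
  }
  return groups;
}
```

Taking BFS prefixes trades completeness for focus: it never produces a group whose members are unrelated, at the cost of skipping some valid combinations.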
185
+ **Output Example**:
186
+ ```typescript
187
+ [
188
+ {
189
+ entities: [CustomersEntity],
190
+ relationships: [],
191
+ primaryEntity: CustomersEntity,
192
+ relationshipType: 'single'
193
+ },
194
+ {
195
+ entities: [CustomersEntity, OrdersEntity],
196
+ relationships: [{ from: 'Orders', to: 'Customers', via: 'CustomerID' }],
197
+ primaryEntity: OrdersEntity,
198
+ relationshipType: 'parent-child'
199
+ },
200
+ {
201
+ entities: [CustomersEntity, OrdersEntity, OrderDetailsEntity],
202
+ relationships: [...],
203
+ primaryEntity: OrdersEntity,
204
+ relationshipType: 'parent-child'
205
+ }
206
+ ]
207
+ ```
208
+
209
+ ### 2.2 Metadata Preparation
210
+ **Purpose**: Format entity metadata for AI prompts
211
+
212
+ **⚠️ CRITICAL**: Entity metadata MUST include SchemaName and BaseView for functional SQL generation
213
+
214
+ **Data Structure**:
215
+ ```typescript
216
+ interface EntityMetadataForPrompt {
217
+ entityName: string;
218
+ description: string;
219
+ schemaName: string; // REQUIRED: e.g., "dbo", "sales", "hr"
220
+ baseTable: string;
221
+ baseView: string; // REQUIRED: e.g., "vwCustomers", "vwOrders"
222
+ fields: {
223
+ name: string;
224
+ displayName: string;
225
+ type: string;
226
+ description: string;
227
+ isPrimaryKey: boolean;
228
+ isForeignKey: boolean;
229
+ relatedEntity?: string;
230
+ isRequired: boolean;
231
+ defaultValue?: string;
232
+ }[];
233
+ relationships: {
234
+ type: 'one-to-many' | 'many-to-one' | 'many-to-many';
235
+ relatedEntity: string;
236
+ relatedEntityView: string; // Include view name for joins
237
+ relatedEntitySchema: string; // Include schema for joins
238
+ foreignKeyField: string;
239
+ description: string;
240
+ }[];
241
+ }
242
+
243
+ // Example formatted metadata:
244
+ {
245
+ entityName: "Customers",
246
+ description: "Customer information and contact details",
247
+ schemaName: "dbo",
248
+ baseTable: "Customer",
249
+ baseView: "vwCustomers", // Query FROM [dbo].[vwCustomers]
250
+ fields: [...],
251
+ relationships: [
252
+ {
253
+ type: "one-to-many",
254
+ relatedEntity: "Orders",
255
+ relatedEntityView: "vwOrders",
256
+ relatedEntitySchema: "sales",
257
+ foreignKeyField: "CustomerID",
258
+ description: "Customer orders"
259
+ }
260
+ ]
261
+ }
262
+ ```
263
+
264
+ **Why This Matters**:
265
+ - SQL queries MUST reference `[SchemaName].[BaseView]` to work
266
+ - Without schema: Query can fail with "Invalid object name" when the view is not in the caller's default schema
267
+ - Without BaseView: Query uses base table instead of view (missing computed fields)
268
+ - Relationships need schema/view info for proper JOINs
269
+
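One way to make the `[SchemaName].[BaseView]` rule hard to violate is to build the qualified name in a single helper. The sketch below is illustrative only: `quoteIdentifier` and `qualifiedViewName` are hypothetical names, and the input shape borrows just the `schemaName` and `baseView` fields from `EntityMetadataForPrompt`.

```typescript
// Bracket-quote a SQL Server identifier, doubling any closing bracket inside it.
function quoteIdentifier(name: string): string {
  return `[${name.replace(/]/g, ']]')}]`;
}

// Build the [schema].[view] reference, failing fast when either part is missing.
function qualifiedViewName(entity: { schemaName: string; baseView: string }): string {
  if (!entity.schemaName || !entity.baseView) {
    throw new Error('Entity metadata must include both schemaName and baseView');
  }
  return `${quoteIdentifier(entity.schemaName)}.${quoteIdentifier(entity.baseView)}`;
}

// Usage: a FROM clause for the Customers entity from the example metadata.
const fromClause = `FROM ${qualifiedViewName({ schemaName: 'dbo', baseView: 'vwCustomers' })} c`;
```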
270
+ **Example Query Fragment**:
271
+ ```sql
272
+ -- ✅ CORRECT: Includes schema and uses view
273
+ SELECT c.Name, c.Email, COUNT(o.ID) as OrderCount
274
+ FROM [dbo].[vwCustomers] c
275
+ LEFT JOIN [sales].[vwOrders] o ON o.CustomerID = c.ID
276
+ GROUP BY c.Name, c.Email
277
+
278
+ -- ❌ WRONG: Missing schema or using table name
279
+ SELECT c.Name, c.Email
280
+ FROM Customers c -- Will fail with "Invalid object name"!
281
+ ```
282
+
283
+ ---
284
+
285
+ ## Phase 3: Business Question Generation (Week 2-3)
286
+
287
+ **⚠️ IMPORTANT: Use Nunjucks Templates for All Prompts**
288
+
289
+ All AI prompts in this package MUST use Nunjucks template syntax to format data for readability:
290
+ - ✅ Use `{% for %}` loops to iterate over arrays
291
+ - ✅ Use `{{ variable }}` for simple values
292
+ - ✅ Use conditional logic with `{% if %}`
293
+ - ✅ Format structured data as markdown (not JSON dumps)
294
+ - ❌ AVOID `{{ data | json }}` - This makes prompts harder for LLMs to read
295
+ - ✅ PREFER structured markdown with loops and conditionals
296
+
297
+ **Why**: Structured markdown is much easier for LLMs to parse than raw JSON, leading to better AI responses.
298
+
299
+ ### 3.1 QuestionGenerator Implementation
300
+ **Purpose**: Generate 1-2 domain-specific business questions per entity group
301
+
302
+ **AI Prompt**: `metadata/prompts/templates/query-gen/business-question-generator.template.md`
303
+
304
+ **Prompt Content**:
305
+ ```markdown
306
+ # Business Question Generator
307
+
308
+ You are an expert data analyst helping to generate meaningful business questions that can be answered with SQL queries.
309
+
310
+ ## Entity Group Context
311
+
312
+ {% for entity in entityGroupMetadata %}
313
+ ### Entity: {{ entity.entityName }}
314
+ - **Schema**: {{ entity.schemaName }}
315
+ - **View**: {{ entity.baseView }}
316
+ - **Description**: {{ entity.description }}
317
+
318
+ **Fields**:
319
+ {% for field in entity.fields %}
320
+ - `{{ field.name }}` ({{ field.type }}){% if field.description %} - {{ field.description }}{% endif %}{% if field.isPrimaryKey %} [PRIMARY KEY]{% endif %}{% if field.isForeignKey %} [FK to {{ field.relatedEntity }}]{% endif %}
321
+ {% endfor %}
322
+
323
+ {% if entity.relationships.length > 0 %}
324
+ **Relationships**:
325
+ {% for rel in entity.relationships %}
326
+ - {{ rel.type }}: {{ rel.relatedEntity }} via `{{ rel.foreignKeyField }}`{% if rel.description %} - {{ rel.description }}{% endif %}
327
+ {% endfor %}
328
+ {% endif %}
329
+
330
+ ---
331
+ {% endfor %}
332
+
333
+ ## Instructions
334
+ Generate 1-2 realistic business questions that:
335
+ 1. Use the available entities and their relationships
336
+ 2. Are answerable with the data in these tables
337
+ 3. Are practical questions a business user would ask
338
+ 4. Vary in complexity (simple aggregations vs. complex joins)
339
+ 5. Leverage entity descriptions to understand domain context
340
+
341
+ ## Output Format
342
+ Return a JSON object with a `questions` array:
343
+ ```json
344
+ {
345
+ "questions": [
346
+ {
347
+ "userQuestion": "What are the top 5 customers by order volume?",
348
+ "description": "Identify customers with the most orders",
349
+ "technicalDescription": "Count orders per customer, sort descending, limit 5",
350
+ "complexity": "simple",
351
+ "requiresAggregation": true,
352
+ "requiresJoins": true,
353
+ "entities": ["Customers", "Orders"]
354
+ }
355
+ ]
356
+ }
357
+ ```
358
+ ```
359
+
360
+ **AI Prompt Configuration** (`.prompts.json`):
361
+ ```json
362
+ {
363
+ "fields": {
364
+ "Name": "Business Question Generator",
365
+ "Description": "Generates domain-specific business questions for entity groups",
366
+ "TypeID": "@lookup:AI Prompt Types.Name=Chat",
367
+ "TemplateText": "@file:templates/query-gen/business-question-generator.template.md",
368
+ "Status": "Active",
369
+ "ResponseFormat": "JSON",
370
+ "SelectionStrategy": "Specific",
371
+ "PowerPreference": "Highest",
372
+ "ParallelizationMode": "None",
373
+ "OutputType": "object",
374
+ "ValidationBehavior": "Strict",
375
+ "MaxRetries": 3,
376
+ "FailoverMaxAttempts": 5,
377
+ "PromptRole": "System",
378
+ "PromptPosition": "First",
379
+ "CategoryID": "@lookup:AI Prompt Categories.Name=Query Generation?create&Description=Prompts for QueryGen system"
380
+ },
381
+ "relatedEntities": {
382
+ "MJ: AI Prompt Models": [
383
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Claude 4.5 Sonnet", "VendorID": "@lookup:MJ: AI Vendors.Name=Anthropic", "Priority": 1 } },
384
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Kimi K2", "VendorID": "@lookup:MJ: AI Vendors.Name=Groq", "Priority": 2 } },
385
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Kimi K2", "VendorID": "@lookup:MJ: AI Vendors.Name=Cerebras", "Priority": 3 } },
386
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Gemini 2.5 Flash", "VendorID": "@lookup:MJ: AI Vendors.Name=Google", "Priority": 4 } },
387
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=GPT-OSS-120B", "VendorID": "@lookup:MJ: AI Vendors.Name=Groq", "Priority": 5 } },
388
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=GPT 5-nano", "VendorID": "@lookup:MJ: AI Vendors.Name=OpenAI", "Priority": 6 } }
389
+ ]
390
+ }
391
+ }
392
+ ```
393
+
394
+ **Using AIEngine for Prompts**:
395
+ ```typescript
396
+ // Static prompt name constant
397
+ export const PROMPT_BUSINESS_QUESTION_GENERATOR = 'Business Question Generator';
398
+
399
+ // In QuestionGenerator class:
400
+ class QuestionGenerator {
401
+ async generateQuestions(entityGroup: EntityGroup): Promise<BusinessQuestion[]> {
402
+ // 1. Ensure AIEngine is configured
403
+ const aiEngine = AIEngine.Instance;
404
+ await aiEngine.Config(false, this.contextUser);
405
+
406
+ // 2. Find the prompt by name (AIEngine caches all prompts)
407
+ const prompt = aiEngine.Prompts.find(p => p.Name === PROMPT_BUSINESS_QUESTION_GENERATOR);
408
+ if (!prompt) {
409
+ throw new Error(`Prompt '${PROMPT_BUSINESS_QUESTION_GENERATOR}' not found`);
410
+ }
411
+
412
+ // 3. Use AIPromptRunner to execute
413
+ const promptRunner = new AIPromptRunner();
414
+ const result = await promptRunner.ExecutePrompt({
415
+ prompt,
416
+ data: { entityGroupMetadata: formatEntityGroupForPrompt(entityGroup) },
417
+ contextUser: this.contextUser
418
+ });
419
+
420
+ return result.result.questions;
421
+ }
422
+ }
423
+ ```
424
+
425
+ **Note**: AIEngine loads all prompts during Config() - no need for separate PromptManager or caching.
426
+
427
+ ### 3.2 Question Validation
428
+ **Purpose**: Filter out low-quality or unanswerable questions
429
+
430
+ **Validation Criteria**:
431
+ - Question must reference entities in the group
432
+ - Question should be specific enough to generate a query
433
+ - Avoid overly generic questions ("Show me all data")
434
+ - Prefer questions with measurable outcomes
435
+
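The validation criteria above could be approximated with a few cheap checks before any SQL is generated. This is a rough sketch under stated assumptions: the `QuestionCandidate` shape uses only the `userQuestion` and `entities` fields from the prompt's output format, while `validateQuestion`, the four-word minimum, and the generic-phrase pattern are illustrative heuristics that do not come from the plan.

```typescript
// Subset of the question shape returned by the Business Question Generator prompt.
interface QuestionCandidate {
  userQuestion: string;
  entities: string[];
}

interface ValidationResult {
  valid: boolean;
  reasons: string[];
}

// Assumed heuristic: phrases like "all data" or "every record" signal a generic question.
const GENERIC_PATTERN = /\b(all|every)\s+(the\s+)?(data|records?|rows?)\b/i;

function validateQuestion(question: QuestionCandidate, groupEntityNames: string[]): ValidationResult {
  const reasons: string[] = [];
  const allowed = new Set(groupEntityNames);

  // The question must reference entities in the group, and only those.
  if (question.entities.length === 0) {
    reasons.push('Question references no entities');
  }
  const unknown = question.entities.filter(name => !allowed.has(name));
  if (unknown.length > 0) {
    reasons.push(`Question references entities outside the group: ${unknown.join(', ')}`);
  }

  // The question should be specific enough to generate a query (assumed minimum length).
  const wordCount = question.userQuestion.trim().split(/\s+/).filter(w => w.length > 0).length;
  if (wordCount < 4) {
    reasons.push('Question is too short to be specific');
  }

  // Avoid overly generic questions such as "Show me all data".
  if (GENERIC_PATTERN.test(question.userQuestion)) {
    reasons.push('Question is too generic');
  }

  return { valid: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the verdict makes it easy to log why a question was dropped when `verbose` is enabled.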
436
+ ---
437
+
438
+ ## Phase 4: Vector Similarity Search (Week 3)
439
+
440
+ ### 4.1 Using AIEngine for Embeddings
441
+ **Purpose**: Use MemberJunction's AIEngine for all embedding operations
442
+
443
+ **Key Features**:
444
+ ```typescript
445
+ // Use AIEngine.Instance.EmbedTextLocal() for all embeddings
446
+ // AIEngine is already configured with local embedding models
447
+ // No need for a separate EmbeddingService wrapper
448
+
449
+ // Example usage:
450
+ const aiEngine = AIEngine.Instance;
451
+ await aiEngine.Config(false, contextUser);
452
+
453
+ // Embed a query field
454
+ const nameEmbedding = await aiEngine.EmbedTextLocal(query.name);
455
+ const descEmbedding = await aiEngine.EmbedTextLocal(query.description);
456
+ const techDescEmbedding = await aiEngine.EmbedTextLocal(query.technicalDescription);
457
+ ```
458
+
459
+ **Note**: We embed each field separately (name, userQuestion, description, technicalDescription) for weighted similarity scoring, not as a concatenated string.
460
+
461
+ ### 4.2 Weighted Similarity Search Implementation
462
+ **Purpose**: Find top-K most similar golden queries using weighted field similarity
463
+
464
+ **Algorithm**: Weighted cosine similarity across multiple fields
465
+
466
+ ```typescript
467
+ class SimilaritySearch {
468
+ private weights = {
469
+ name: 0.1,
470
+ userQuestion: 0.2,
471
+ description: 0.35,
472
+ technicalDescription: 0.35
473
+ };
474
+
475
+ async findSimilarQueries(
476
+ queryEmbeddings: {
477
+ name: number[],
478
+ userQuestion: number[],
479
+ description: number[],
480
+ technicalDescription: number[]
481
+ },
482
+ goldenEmbeddings: Array<{
483
+ query: GoldenQuery,
484
+ embeddings: {
485
+ name: number[],
486
+ userQuestion: number[],
487
+ description: number[],
488
+ technicalDescription: number[]
489
+ }
490
+ }>,
491
+ topK: number = 5
492
+ ): Promise<SimilarQuery[]> {
493
+ // 1. Calculate weighted similarity for each golden query
494
+ const similarities = goldenEmbeddings.map(golden => {
495
+ // Calculate cosine similarity for each field
496
+ const nameSim = this.cosineSimilarity(
497
+ queryEmbeddings.name,
498
+ golden.embeddings.name
499
+ );
500
+ const userQuestionSim = this.cosineSimilarity(
501
+ queryEmbeddings.userQuestion,
502
+ golden.embeddings.userQuestion
503
+ );
504
+ const descSim = this.cosineSimilarity(
505
+ queryEmbeddings.description,
506
+ golden.embeddings.description
507
+ );
508
+ const techDescSim = this.cosineSimilarity(
509
+ queryEmbeddings.technicalDescription,
510
+ golden.embeddings.technicalDescription
511
+ );
512
+
513
+ // Calculate weighted sum
514
+ const weightedScore =
515
+ (nameSim * this.weights.name) +
516
+ (userQuestionSim * this.weights.userQuestion) +
517
+ (descSim * this.weights.description) +
518
+ (techDescSim * this.weights.technicalDescription);
519
+
520
+ return {
521
+ query: golden.query,
522
+ similarity: weightedScore,
523
+ fieldScores: { nameSim, userQuestionSim, descSim, techDescSim }
524
+ };
525
+ });
526
+
527
+ // 2. Sort descending by weighted similarity
528
+ const sorted = similarities.sort((a, b) => b.similarity - a.similarity);
529
+
530
+ // 3. Return top K (ALWAYS return topK results, even if below threshold)
531
+ // Threshold is informational only - we still use the best matches available
532
+ return sorted.slice(0, topK);
533
+ }
534
+
535
+ private cosineSimilarity(a: number[], b: number[]): number {
536
+ // Use SimpleVectorService.CosineSimilarity() from @memberjunction/ai-vectors-memory
537
+ // Or implement: dot product / (magnitude(a) * magnitude(b))
538
+ const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
539
+ const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
540
+ const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
541
+ return magA === 0 || magB === 0 ? 0 : dotProduct / (magA * magB); // avoid NaN for zero-magnitude vectors
542
+ }
543
+ }
544
+ ```
545
+
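The weighted scoring in `SimilaritySearch` can be tried in isolation with small hand-written vectors. The following is a self-contained sketch, not the package's implementation: `FieldEmbeddings`, `weightedSimilarity`, and `rankTopK` are hypothetical names, the weights are copied from the plan, and returning 0 for zero-magnitude vectors is an assumption made to avoid `NaN` scores.

```typescript
type FieldName = 'name' | 'userQuestion' | 'description' | 'technicalDescription';
type FieldEmbeddings = Record<FieldName, number[]>;

// Weights taken from the plan's similarityWeights defaults.
const FIELD_WEIGHTS: Record<FieldName, number> = {
  name: 0.1,
  userQuestion: 0.2,
  description: 0.35,
  technicalDescription: 0.35,
};

// Cosine similarity: dot(a, b) / (|a| * |b|), with 0 for zero-magnitude input.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector length mismatch');
  let dot = 0;
  let sumSqA = 0;
  let sumSqB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    sumSqA += a[i] * a[i];
    sumSqB += b[i] * b[i];
  }
  if (sumSqA === 0 || sumSqB === 0) return 0;
  return dot / (Math.sqrt(sumSqA) * Math.sqrt(sumSqB));
}

// Weighted sum of per-field cosine similarities.
function weightedSimilarity(query: FieldEmbeddings, golden: FieldEmbeddings): number {
  const fields = Object.keys(FIELD_WEIGHTS) as FieldName[];
  return fields.reduce((sum, f) => sum + FIELD_WEIGHTS[f] * cosineSimilarity(query[f], golden[f]), 0);
}

// Rank candidates and keep the best topK, regardless of any threshold.
function rankTopK<T>(
  query: FieldEmbeddings,
  candidates: Array<{ item: T; embeddings: FieldEmbeddings }>,
  topK: number
): Array<{ item: T; similarity: number }> {
  return candidates
    .map(c => ({ item: c.item, similarity: weightedSimilarity(query, c.embeddings) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}
```

Because `description` and `technicalDescription` carry 70% of the weight between them, a golden query with a matching name but unrelated descriptions ranks below one whose descriptions match.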
546
+ ### 4.3 Few-Shot Example Selection
547
+ **Output**: 3-5 most relevant golden queries to include in SQL generation prompt
548
+
549
+ **Example**:
550
+ ```typescript
551
+ // User question: "What are the top customers by revenue?"
552
+ // Similar golden queries:
553
+ [
554
+ { name: "Top Customers by Order Count", similarity: 0.92, sql: "..." },
555
+ { name: "Revenue by Customer Segment", similarity: 0.87, sql: "..." },
556
+ { name: "Customer Purchase Analysis", similarity: 0.83, sql: "..." }
557
+ ]
558
+ ```
559
+
560
+ ---
561
+
562
+ ## Phase 5: SQL Query Generation (Week 4)
563
+
564
+ ### 5.1 QueryWriter Implementation
565
+ **Purpose**: Generate Nunjucks SQL templates using AI with few-shot learning
566
+
567
+ **AI Prompt**: `metadata/prompts/templates/query-gen/sql-query-writer.template.md`
568
+
569
+ **Prompt Content** (based on Skip's query-writer.md):
570
+ ```markdown
571
+ # SQL Query Template Writer
572
+
573
+ You are an expert SQL developer specializing in creating MemberJunction-compatible Nunjucks SQL query templates.
574
+
575
+ ## Task
576
+ Generate a SQL query template that answers the following business question:
577
+
578
+ **User Question**: {{ userQuestion }}
579
+ **Description**: {{ description }}
580
+ **Technical Description**: {{ technicalDescription }}
581
+
582
+ ## Available Entities
583
+
584
+ {% for entity in entityMetadata %}
585
+ ### {{ entity.entityName }}
586
+ - **Schema.View**: `[{{ entity.schemaName }}].[{{ entity.baseView }}]`
587
+ - **Description**: {{ entity.description }}
588
+
589
+ **Available Fields**:
590
+ {% for field in entity.fields %}
591
+ - `{{ field.name }}` ({{ field.type }}){% if field.description %} - {{ field.description }}{% endif %}{% if field.isPrimaryKey %} [PK]{% endif %}{% if field.isForeignKey %} [FK→{{ field.relatedEntity }}]{% endif %}
592
+ {% endfor %}
593
+
594
+ {% if entity.relationships.length > 0 %}
595
+ **Join Information**:
596
+ {% for rel in entity.relationships %}
597
+ - To join `{{ rel.relatedEntity }}`: `LEFT JOIN [{{ rel.relatedEntitySchema }}].[{{ rel.relatedEntityView }}] alias ON alias.{{ rel.foreignKeyField }} = {{ entity.entityName.substring(0,1).toLowerCase() }}.ID`
598
+ {% endfor %}
599
+ {% endif %}
600
+
601
+ ---
602
+ {% endfor %}
603
+
604
+ ## Example Queries (Similar to Your Task)
605
+
606
+ {% for example in fewShotExamples %}
607
+ ### Example {{ loop.index }}: {{ example.name }}
608
+ **User Question**: {{ example.userQuestion }}
609
+ **Description**: {{ example.description }}
610
+
611
+ **SQL Template**:
612
+ ```sql
613
+ {{ example.sql }}
614
+ ```
615
+
616
+ **Parameters**:
617
+ {% for param in example.parameters %}
618
+ - `{{ param.name }}` ({{ param.type }}){% if param.isRequired %} [REQUIRED]{% endif %} - {{ param.description }}
619
+ - Sample: `{{ param.sampleValue }}`
620
+ {% endfor %}
621
+
622
+ **Output Fields**:
623
+ {% for field in example.selectClause %}
624
+ - `{{ field.name }}` ({{ field.type }}) - {{ field.description }}
625
+ {% endfor %}
626
+
627
+ ---
628
+ {% endfor %}
629
+
630
+ ## Requirements
631
+ 1. **Use Nunjucks Syntax**: Parameters use `{{ paramName }}` syntax
632
+ 2. **Use SQL Filters**: Apply `| sqlString`, `| sqlNumber`, `| sqlDate`, `| sqlIn` filters
633
+ 3. **Use Base Views**: Query from `vw*` views, not base tables
634
+ 4. **Include Comments**: Document query purpose and logic
635
+ 5. **Handle NULLs**: Use COALESCE or ISNULL for aggregations
636
+ 6. **Performance**: Include appropriate WHERE clauses and JOINs
637
+ 7. **Parameterize**: Make queries reusable with parameters
638
+
639
+ ## Output Format
640
+ Return JSON with three properties:
641
+
642
+ ```json
643
+ {
644
+ "sql": "SELECT ... FROM ... WHERE ...",
645
+ "selectClause": [
646
+ {
647
+ "name": "CustomerName",
648
+ "description": "Name of the customer",
649
+ "type": "string",
650
+ "optional": false
651
+ }
652
+ ],
653
+ "parameters": [
654
+ {
655
+ "name": "minRevenue",
656
+ "type": "number",
657
+ "isRequired": true,
658
+ "description": "Minimum revenue threshold",
659
+ "usage": ["WHERE clause: Revenue >= {{ minRevenue | sqlNumber }}"],
660
+ "defaultValue": null,
661
+ "sampleValue": "10000"
662
+ }
663
+ ]
664
+ }
665
+ ```
666
+ ```
667
+
668
+ **AI Prompt Configuration** (`.prompts.json`):
669
+ ```json
670
+ {
671
+ "fields": {
672
+ "Name": "SQL Query Writer",
673
+ "Description": "Generates Nunjucks SQL query templates from business questions",
674
+ "TypeID": "@lookup:AI Prompt Types.Name=Chat",
675
+ "TemplateText": "@file:templates/query-gen/sql-query-writer.template.md",
676
+ "Status": "Active",
677
+ "ResponseFormat": "JSON",
678
+ "SelectionStrategy": "Specific",
679
+ "PowerPreference": "Highest",
680
+ "ParallelizationMode": "None",
681
+ "OutputType": "object",
682
+ "ValidationBehavior": "Strict",
683
+ "MaxRetries": 3,
684
+ "FailoverMaxAttempts": 5,
685
+ "PromptRole": "System",
686
+ "PromptPosition": "First",
687
+ "CategoryID": "@lookup:AI Prompt Categories.Name=Query Generation"
688
+ },
689
+ "relatedEntities": {
690
+ "MJ: AI Prompt Models": [
691
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Claude 4.5 Sonnet", "VendorID": "@lookup:MJ: AI Vendors.Name=Anthropic", "Priority": 1 } },
692
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Kimi K2", "VendorID": "@lookup:MJ: AI Vendors.Name=Groq", "Priority": 2 } },
693
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Kimi K2", "VendorID": "@lookup:MJ: AI Vendors.Name=Cerebras", "Priority": 3 } },
694
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Gemini 2.5 Flash", "VendorID": "@lookup:MJ: AI Vendors.Name=Google", "Priority": 4 } },
695
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=GPT-OSS-120B", "VendorID": "@lookup:MJ: AI Vendors.Name=Groq", "Priority": 5 } },
696
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=GPT 5-nano", "VendorID": "@lookup:MJ: AI Vendors.Name=OpenAI", "Priority": 6 } }
697
+ ]
698
+ }
699
+ }
700
+ ```
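Requirement 2 of the SQL Query Writer prompt asks for `sqlString`, `sqlNumber`, `sqlDate`, and `sqlIn` filters on every parameter. The sketch below shows the kind of escaping and validation such filters typically perform. These functions are hypothetical stand-ins written for illustration; the actual MemberJunction filters may behave differently, and `sqlDate` is omitted.

```typescript
// Hypothetical stand-ins for the `sqlString`, `sqlNumber`, and `sqlIn` filters.
// Assumption: the real filters escape or validate values in a similar spirit.

/** Quote a value as a SQL string literal, doubling embedded single quotes. */
function sqlString(value: unknown): string {
  if (value === null || value === undefined) return "NULL";
  return `'${String(value).replace(/'/g, "''")}'`;
}

/** Accept only finite numbers so arbitrary text can never reach the SQL. */
function sqlNumber(value: unknown): string {
  const text = String(value).trim();
  const parsed = Number(text);
  if (text === "" || !Number.isFinite(parsed)) {
    throw new Error(`Expected a finite number, got: ${text}`);
  }
  return String(parsed);
}

/** Render a list for an IN (...) clause, escaping each element. */
function sqlIn(values: readonly unknown[]): string {
  if (values.length === 0) return "(NULL)"; // IN (NULL) matches no rows
  const rendered = values.map((v) =>
    typeof v === "number" ? sqlNumber(v) : sqlString(v)
  );
  return `(${rendered.join(", ")})`;
}

// Rendering the `minRevenue` usage from the output-format example by hand:
const whereClause = `WHERE Revenue >= ${sqlNumber("10000")}`;
```

A value such as `1; DROP TABLE x` fails `sqlNumber` instead of being interpolated, which is the reason the prompt insists on a filter for every parameter.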
701
+
702
+ ### 5.2 QueryParameterProcessor Integration
703
+ **Purpose**: Render Nunjucks templates with sample parameter values for testing
704
+
705
+ ```typescript
706
+ class QueryWriter {
707
+ async generateQuery(
708
+ businessQuestion: BusinessQuestion,
709
+ entityMetadata: EntityMetadataForPrompt[],
710
+ fewShotExamples: GoldenQuery[]
711
+ ): Promise<GeneratedQuery> {
712
+ // 1. Prepare prompt data
713
+ const promptData = {
714
+ userQuestion: businessQuestion.userQuestion,
715
+ description: businessQuestion.description,
716
+ technicalDescription: businessQuestion.technicalDescription,
717
+ entityMetadata,
718
+ fewShotExamples
719
+ };
720
+
721
+ // 2. Execute SQL Query Writer prompt
722
+ const promptRunner = new AIPromptRunner();
723
+ const result = await promptRunner.ExecutePrompt({
724
+ prompt: await this.getPrompt('SQL Query Writer'),
725
+ data: promptData,
726
+ contextUser: this.contextUser
727
+ });
728
+
729
+ // 3. Parse result
730
+ const generated = result.result as GeneratedQuery;
731
+
732
+ // 4. Validate structure
733
+ this.validateGeneratedQuery(generated);
734
+
735
+ return generated;
736
+ }
737
+
738
+ private validateGeneratedQuery(query: GeneratedQuery): void {
739
+ if (!query.sql || !query.parameters || !query.selectClause) {
740
+ throw new Error('Invalid query structure returned from AI');
741
+ }
742
+ }
743
+ }
744
+ ```
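`validateGeneratedQuery` above only checks that the three properties are truthy. A stricter check could be sketched as follows; the field names come from the prompt's JSON output format, while the interface and function names (`GeneratedQueryShape`, `findStructureProblems`, `isGeneratedQuery`) are illustrative and not part of the package.

```typescript
// Field names follow the JSON output format of the SQL Query Writer prompt;
// the interface and function names are illustrative assumptions.
interface SelectField { name: string; description: string; type: string; optional: boolean; }
interface QueryParam {
  name: string; type: string; isRequired: boolean; description: string;
  usage: string[]; defaultValue: string | null; sampleValue: string;
}
interface GeneratedQueryShape { sql: string; selectClause: SelectField[]; parameters: QueryParam[]; }

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

/** Collect every structural problem instead of stopping at the first one. */
function findStructureProblems(candidate: unknown): string[] {
  if (!isRecord(candidate)) return ["response is not a JSON object"];
  const problems: string[] = [];

  const sql = candidate.sql;
  if (typeof sql !== "string" || sql.trim() === "") {
    problems.push("sql must be a non-empty string");
  }

  const selectClause = candidate.selectClause;
  if (!Array.isArray(selectClause) || selectClause.length === 0) {
    problems.push("selectClause must be a non-empty array");
  } else {
    selectClause.forEach((field: unknown, i: number) => {
      if (!isRecord(field) || typeof field.name !== "string") {
        problems.push(`selectClause[${i}] needs a string name`);
      }
    });
  }

  const parameters = candidate.parameters;
  if (!Array.isArray(parameters)) {
    problems.push("parameters must be an array (it may be empty)");
  } else {
    parameters.forEach((param: unknown, i: number) => {
      if (!isRecord(param) || typeof param.name !== "string" || typeof param.type !== "string") {
        problems.push(`parameters[${i}] needs string name and type`);
      }
    });
  }
  return problems;
}

/** Type guard built on the problem list. */
function isGeneratedQuery(candidate: unknown): candidate is GeneratedQueryShape {
  return findStructureProblems(candidate).length === 0;
}
```

Returning the full list of problems is a design choice: the messages can be logged or fed back into a retry prompt rather than surfacing a single generic error.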
745
+
746
+ ---
747
+
748
+ ## Phase 6: Query Testing & Fixing (Week 5)
749
+
750
+ ### 6.1 QueryTester Implementation
751
+ **Purpose**: Render and execute SQL queries to validate they work correctly
752
+
753
+ **Key Features**:
754
+ ```typescript
755
+ class QueryTester {
756
+ private processor: QueryParameterProcessor;
757
+
758
+ async testQuery(
759
+ query: GeneratedQuery,
760
+ maxAttempts: number = 5
761
+ ): Promise<QueryTestResult> {
762
+ let attempt = 0;
763
+ let lastError: string | undefined;
764
+
765
+ while (attempt < maxAttempts) {
766
+ attempt++;
767
+
768
+ try {
769
+ // 1. Render template with sample parameter values
770
+ const renderedSQL = this.renderQueryTemplate(query);
771
+
772
+ // 2. Execute SQL on database
773
+ const results = await this.executeSQLQuery(renderedSQL);
774
+
775
+ // 3. Validate results
776
+ if (results.length === 0) {
777
+ throw new Error('Query returned no results (may need sample data)');
778
+ }
779
+
780
+ // 4. Success!
781
+ return {
782
+ success: true,
783
+ renderedSQL,
784
+ rowCount: results.length,
785
+ sampleRows: results.slice(0, 10),
786
+ attempts: attempt
787
+ };
788
+
789
+ } catch (error) {
790
+ lastError = extractErrorMessage(error, 'Query Testing');
791
+ console.error(`Attempt ${attempt}/${maxAttempts} failed:`, lastError);
792
+
793
+ // 5. If not last attempt, try to fix the query
794
+ if (attempt < maxAttempts) {
795
+ query = await this.fixQuery(query, lastError);
796
+ }
797
+ }
798
+ }
799
+
800
+ // Failed after max attempts
801
+ return {
802
+ success: false,
803
+ error: lastError,
804
+ attempts: maxAttempts
805
+ };
806
+ }
807
+
808
+ private renderQueryTemplate(query: GeneratedQuery): string {
809
+ // Use QueryParameterProcessor to render Nunjucks template
810
+ const paramValues: Record<string, any> = {};
811
+
812
+ // Convert sampleValue strings to proper types
813
+ for (const param of query.parameters) {
814
+ paramValues[param.name] = this.parseSampleValue(param.sampleValue, param.type);
815
+ }
816
+
817
+ const result = QueryParameterProcessor.processQueryTemplate(
818
+ { SQL: query.sql, Parameters: query.parameters } as any,
819
+ paramValues
820
+ );
821
+
822
+ if (!result.success) {
823
+ throw new Error(`Template rendering failed: ${result.error}`);
824
+ }
825
+
826
+ return result.processedSQL;
827
+ }
828
+
829
+ private parseSampleValue(value: string, type: string): any {
830
+ switch (type) {
831
+ case 'number': return Number(value);
832
+ case 'boolean': return value.toLowerCase() === 'true';
833
+ case 'date': return new Date(value);
834
+ case 'array': return JSON.parse(value);
835
+ default: return value;
836
+ }
837
+ }
838
+
839
+ private async executeSQLQuery(sql: string): Promise<any[]> {
840
+ // Execute SQL against database
841
+ const result = await this.dataProvider.ExecuteSQL(sql);
842
+ return result.Results;
843
+ }
844
+
845
+ private async fixQuery(
846
+ query: GeneratedQuery,
847
+ errorMessage: string
848
+ ): Promise<GeneratedQuery> {
849
+ // Use SQL Query Fixer prompt to correct the query
850
+ const fixer = new QueryFixer();
851
+ return await fixer.fixQuery(query, errorMessage);
852
+ }
853
+ }
854
+ ```
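In `testQuery` above, a repaired query lives only in the local `query` variable, so a caller cannot tell which SQL finally succeeded. One way to expose it is sketched below as a generic test-and-fix loop. `runWithFixes` and `FixLoopResult` are hypothetical names, and the `run` and `fix` callbacks stand in for query execution and the SQL Query Fixer prompt.

```typescript
// A minimal sketch of a test-and-fix loop that returns the final candidate.
// All names here are illustrative assumptions, not part of the package.
interface FixLoopResult<C, R> {
  success: boolean;
  candidate: C;   // the last candidate tried, including any fixes applied
  output?: R;     // result of the successful run, if any
  error?: string; // last error message when every attempt failed
  attempts: number;
}

async function runWithFixes<C, R>(
  initial: C,
  run: (candidate: C) => Promise<R>,
  fix: (candidate: C, errorMessage: string) => Promise<C>,
  maxAttempts = 5
): Promise<FixLoopResult<C, R>> {
  let candidate = initial;
  let lastError: string | undefined;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const output = await run(candidate);
      return { success: true, candidate, output, attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // Only ask for a fix when another attempt remains
      if (attempt < maxAttempts) {
        candidate = await fix(candidate, lastError);
      }
    }
  }
  return { success: false, candidate, error: lastError, attempts: maxAttempts };
}
```

Errors thrown by `fix` itself are not caught in this sketch and propagate to the caller.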
855
+
856
+ ### 6.2 SQL Query Fixer Prompt
857
+ **AI Prompt**: `metadata/prompts/templates/query-gen/sql-query-fixer.template.md`
858
+
859
+ **Prompt Content** (based on Skip's query-fixer.md):
860
+ ```markdown
861
+ # SQL Query Fixer
862
+
863
+ You are an expert SQL developer tasked with fixing a broken SQL query.
864
+
865
+ ## Original Query
866
+ ```sql
867
+ {{ originalSQL }}
868
+ ```
869
+
870
+ ## Error Message
871
+ ```
872
+ {{ errorMessage }}
873
+ ```
874
+
875
+ ## Entity Metadata
876
+
877
+ {% for entity in entityMetadata %}
878
+ ### {{ entity.entityName }}
879
+ - **Schema.View**: `[{{ entity.schemaName }}].[{{ entity.baseView }}]`
880
+ - **Description**: {{ entity.description }}
881
+
882
+ **Available Fields**:
883
+ {% for field in entity.fields %}
884
+ - `{{ field.name }}` ({{ field.type }}){% if field.isPrimaryKey %} [PK]{% endif %}{% if field.isForeignKey %} [FK→{{ field.relatedEntity }}]{% endif %}
885
+ {% endfor %}
886
+ {% endfor %}
887
+
888
+ ## Query Parameters
889
+
890
+ {% if parameters.length > 0 %}
891
+ {% for param in parameters %}
892
+ - `{{ param.name }}` ({{ param.type }}){% if param.isRequired %} [REQUIRED]{% endif %} - {{ param.description }}
893
+ - Sample value: `{{ param.sampleValue }}`
894
+ {% endfor %}
895
+ {% else %}
896
+ No parameters defined for this query.
897
+ {% endif %}
898
+
899
+ ## Instructions
900
+ Analyze the error and fix the SQL query. Common issues:
901
+ - Syntax errors (missing commas, parentheses, keywords)
902
+ - Invalid column names (check entity metadata)
903
+ - Type mismatches (ensure correct types for parameters)
904
+ - Missing JOINs or incorrect JOIN conditions
905
+ - Aggregation errors (missing GROUP BY, invalid aggregate usage)
906
+ - Subquery issues
907
+
908
+ ## Requirements
909
+ 1. Preserve the query's intent and logic
910
+ 2. Fix only what's broken (minimal changes)
911
+ 3. Maintain Nunjucks parameter syntax
912
+ 4. Ensure SQL is valid for SQL Server
913
+ 5. Update parameters array if needed
914
+
915
+ ## Output Format
916
+ Return JSON with corrected query:
917
+
918
+ ```json
919
+ {
920
+ "sql": "SELECT ... (corrected)",
921
+ "selectClause": [...],
922
+ "parameters": [...],
923
+ "changesSummary": "Fixed missing GROUP BY clause for aggregate functions"
924
+ }
925
+ ```
926
+ ```
927
+
928
+ **AI Prompt Configuration** (`.prompts.json`):
929
+ ```json
930
+ {
931
+ "fields": {
932
+ "Name": "SQL Query Fixer",
933
+ "Description": "Fixes SQL syntax and logic errors in generated queries",
934
+ "TypeID": "@lookup:AI Prompt Types.Name=Chat",
935
+ "TemplateText": "@file:templates/query-gen/sql-query-fixer.template.md",
936
+ "Status": "Active",
937
+ "ResponseFormat": "JSON",
938
+ "SelectionStrategy": "Specific",
939
+ "PowerPreference": "Highest",
940
+ "ParallelizationMode": "None",
941
+ "OutputType": "object",
942
+ "ValidationBehavior": "Strict",
943
+ "MaxRetries": 3,
944
+ "FailoverMaxAttempts": 5,
945
+ "PromptRole": "System",
946
+ "PromptPosition": "First",
947
+ "CategoryID": "@lookup:AI Prompt Categories.Name=Query Generation"
948
+ },
949
+ "relatedEntities": {
950
+ "MJ: AI Prompt Models": [
951
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Claude 4.5 Sonnet", "VendorID": "@lookup:MJ: AI Vendors.Name=Anthropic", "Priority": 1 } },
952
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Kimi K2", "VendorID": "@lookup:MJ: AI Vendors.Name=Groq", "Priority": 2 } },
953
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Kimi K2", "VendorID": "@lookup:MJ: AI Vendors.Name=Cerebras", "Priority": 3 } },
954
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Gemini 2.5 Flash", "VendorID": "@lookup:MJ: AI Vendors.Name=Google", "Priority": 4 } },
955
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=GPT-OSS-120B", "VendorID": "@lookup:MJ: AI Vendors.Name=Groq", "Priority": 5 } },
956
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=GPT 5-nano", "VendorID": "@lookup:MJ: AI Vendors.Name=OpenAI", "Priority": 6 } }
957
+ ]
958
+ }
959
+ }
960
+ ```
961
+
962
+ ---
963
+
964
+ ## Phase 7: Query Refinement & Evaluation (Week 6)
965
+
966
+ ### 7.1 Query Evaluator Prompt
967
+ **Purpose**: Assess if the query answers the business question correctly
968
+
969
+ **AI Prompt**: `metadata/prompts/templates/query-gen/query-evaluator.template.md`
970
+
971
+ **Prompt Content**:
972
+ ```markdown
973
+ # Query Result Evaluator
974
+
975
+ You are a data analyst evaluating whether a SQL query answers a business question correctly.
976
+
977
+ ## Business Question
978
+ **User Question**: {{ userQuestion }}
979
+ **Description**: {{ description }}
980
+ **Technical Description**: {{ technicalDescription }}
981
+
982
+ ## Generated SQL Query
983
+ ```sql
984
+ {{ generatedSQL }}
985
+ ```
986
+
987
+ ## Query Parameters
988
+ {% if parameters.length > 0 %}
989
+ {% for param in parameters %}
990
+ - `{{ param.name }}` ({{ param.type }}){% if param.isRequired %} [REQUIRED]{% endif %} - {{ param.description }}
991
+ - Sample value used: `{{ param.sampleValue }}`
992
+ {% endfor %}
993
+ {% else %}
994
+ No parameters defined for this query.
995
+ {% endif %}
996
+
997
+ ## Sample Results (Limited to Top 10 Rows for Efficiency)
998
+
999
+ {% if sampleResults.length > 0 %}
1000
+ **Sample rows shown**: {{ sampleResults.length }}
1001
+
1002
+ {% for row in sampleResults %}
1003
+ ### Row {{ loop.index }}
1004
+ {% for key, value in row %}
1005
+ - **{{ key }}**: {{ value }}
1006
+ {% endfor %}
1007
+ {% if not loop.last %}---{% endif %}
1008
+ {% endfor %}
1009
+
1010
+ **Note**: Only the first 10 rows are shown to keep the prompt size manageable and reduce token costs.
1011
+ {% else %}
1012
+ ⚠️ Query returned no results.
1013
+ {% endif %}
1014
+
1015
+ ## Instructions
1016
+ Evaluate if the query answers the business question:
1017
+ 1. **Result Relevance**: Do the results match what was asked?
1018
+ 2. **Data Completeness**: Are all necessary columns present?
1019
+ 3. **Correctness**: Are calculations and aggregations correct?
1020
+ 4. **Usability**: Are results formatted appropriately?
1021
+
1022
+ ## Output Format
1023
+ Return JSON evaluation:
1024
+
1025
+ ```json
1026
+ {
1027
+ "answersQuestion": true,
1028
+ "confidence": 0.95,
1029
+ "reasoning": "Query correctly aggregates orders by customer and sorts by total revenue descending. Sample results show expected data.",
1030
+ "suggestions": [
1031
+ "Consider adding customer contact info for better usability",
1032
+ "Add date range parameter to filter orders by time period"
1033
+ ],
1034
+ "needsRefinement": false
1035
+ }
1036
+ ```
1037
+ ```
1038
+
1039
+ **AI Prompt Configuration** (`.prompts.json`):
1040
+ ```json
1041
+ {
1042
+ "fields": {
1043
+ "Name": "Query Result Evaluator",
1044
+ "Description": "Evaluates if a query correctly answers the business question",
1045
+ "TypeID": "@lookup:AI Prompt Types.Name=Chat",
1046
+ "TemplateText": "@file:templates/query-gen/query-evaluator.template.md",
1047
+ "Status": "Active",
1048
+ "ResponseFormat": "JSON",
1049
+ "SelectionStrategy": "Specific",
1050
+ "PowerPreference": "Highest",
1051
+ "ParallelizationMode": "None",
1052
+ "OutputType": "object",
1053
+ "ValidationBehavior": "Strict",
1054
+ "MaxRetries": 3,
1055
+ "FailoverMaxAttempts": 5,
1056
+ "PromptRole": "System",
1057
+ "PromptPosition": "First",
1058
+ "CategoryID": "@lookup:AI Prompt Categories.Name=Query Generation"
1059
+ },
1060
+ "relatedEntities": {
1061
+ "MJ: AI Prompt Models": [
1062
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Claude 4.5 Sonnet", "VendorID": "@lookup:MJ: AI Vendors.Name=Anthropic", "Priority": 1 } },
1063
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Kimi K2", "VendorID": "@lookup:MJ: AI Vendors.Name=Groq", "Priority": 2 } },
1064
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Kimi K2", "VendorID": "@lookup:MJ: AI Vendors.Name=Cerebras", "Priority": 3 } },
1065
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Gemini 2.5 Flash", "VendorID": "@lookup:MJ: AI Vendors.Name=Google", "Priority": 4 } },
1066
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=GPT-OSS-120B", "VendorID": "@lookup:MJ: AI Vendors.Name=Groq", "Priority": 5 } },
1067
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=GPT 5-nano", "VendorID": "@lookup:MJ: AI Vendors.Name=OpenAI", "Priority": 6 } }
1068
+ ]
1069
+ }
1070
+ }
1071
+ ```
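The evaluator prompt receives at most ten rows (QueryTester keeps `results.slice(0, 10)`), so `sampleResults.length` alone cannot tell the model how many rows the query actually returned. A sketch of a payload builder that carries both numbers follows; `buildEvaluatorPayload`, `totalRowCount`, and `truncated` are hypothetical additions, not fields the template currently reads.

```typescript
type Row = Record<string, unknown>;

interface EvaluatorPayload {
  sampleResults: Row[];  // what the template iterates over
  totalRowCount: number; // hypothetical extra field, not in the original template data
  truncated: boolean;    // true when rows were dropped from the sample
}

/** Keep the first `limit` rows for the prompt and remember the full count. */
function buildEvaluatorPayload(rows: readonly Row[], limit = 10): EvaluatorPayload {
  return {
    sampleResults: rows.slice(0, limit),
    totalRowCount: rows.length,
    truncated: rows.length > limit,
  };
}
```

With such a payload the template could state the real row count and mention truncation only when it happened.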
1072
+
1073
+ ### 7.2 Query Refiner Implementation
1074
+ **Purpose**: Iteratively improve queries based on evaluation feedback
1075
+
1076
+ **AI Prompt**: `metadata/prompts/templates/query-gen/query-refiner.template.md`
1077
+
1078
+ **Prompt Content**:
1079
+ ```markdown
1080
+ # Query Refiner
1081
+
1082
+ You are an expert SQL developer refining a query based on evaluation feedback.
1083
+
1084
+ ## Original Business Question
1085
+ **User Question**: {{ userQuestion }}
1086
+ **Description**: {{ description }}
1087
+
1088
+ ## Current Query
1089
+ ```sql
1090
+ {{ currentSQL }}
1091
+ ```
1092
+
1093
+ ## Evaluation Feedback
1094
+
1095
+ **Answers Question**: {% if evaluationFeedback.answersQuestion %}✅ Yes{% else %}❌ No{% endif %}
1096
+ **Confidence**: {{ (evaluationFeedback.confidence * 100) | round }}%
1097
+ **Needs Refinement**: {% if evaluationFeedback.needsRefinement %}Yes{% else %}No{% endif %}
1098
+
1099
+ **Reasoning**: {{ evaluationFeedback.reasoning }}
1100
+
1101
+ {% if evaluationFeedback.suggestions.length > 0 %}
1102
+ **Suggestions for Improvement**:
1103
+ {% for suggestion in evaluationFeedback.suggestions %}
1104
+ {{ loop.index }}. {{ suggestion }}
1105
+ {% endfor %}
1106
+ {% endif %}
1107
+
1108
+ ## Entity Metadata
1109
+
1110
+ {% for entity in entityMetadata %}
1111
+ ### {{ entity.entityName }}
1112
+ - **Schema.View**: `[{{ entity.schemaName }}].[{{ entity.baseView }}]`
1113
+ - **Description**: {{ entity.description }}
1114
+
1115
+ **Available Fields**:
1116
+ {% for field in entity.fields %}
1117
+ - `{{ field.name }}` ({{ field.type }}){% if field.isPrimaryKey %} [PK]{% endif %}{% if field.isForeignKey %} [FK→{{ field.relatedEntity }}]{% endif %}
1118
+ {% endfor %}
1119
+ {% endfor %}
1120
+
1121
+ ## Instructions
1122
+ Refine the query based on suggestions:
1123
+ 1. Address concerns raised in evaluation
1124
+ 2. Implement suggested improvements
1125
+ 3. Maintain query correctness and performance
1126
+ 4. Preserve existing parameters unless changing them improves the query
1127
+
1128
+ ## Output Format
1129
+ Return JSON with refined query:
1130
+
1131
+ ```json
1132
+ {
1133
+ "sql": "SELECT ... (refined)",
1134
+ "selectClause": [...],
1135
+ "parameters": [...],
1136
+ "improvementsSummary": "Added customer contact columns and date range filter as suggested"
1137
+ }
1138
+ ```
1139
+ ```
1140
+
1141
+ **AI Prompt Configuration** (`.prompts.json`):
1142
+ ```json
1143
+ {
1144
+ "fields": {
1145
+ "Name": "Query Refiner",
1146
+ "Description": "Refines queries based on evaluation feedback",
1147
+ "TypeID": "@lookup:AI Prompt Types.Name=Chat",
1148
+ "TemplateText": "@file:templates/query-gen/query-refiner.template.md",
1149
+ "Status": "Active",
1150
+ "ResponseFormat": "JSON",
1151
+ "SelectionStrategy": "Specific",
1152
+ "PowerPreference": "Highest",
1153
+ "ParallelizationMode": "None",
1154
+ "OutputType": "object",
1155
+ "ValidationBehavior": "Strict",
1156
+ "MaxRetries": 3,
1157
+ "FailoverMaxAttempts": 5,
1158
+ "PromptRole": "System",
1159
+ "PromptPosition": "First",
1160
+ "CategoryID": "@lookup:AI Prompt Categories.Name=Query Generation"
1161
+ },
1162
+ "relatedEntities": {
1163
+ "MJ: AI Prompt Models": [
1164
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Claude 4.5 Sonnet", "VendorID": "@lookup:MJ: AI Vendors.Name=Anthropic", "Priority": 1 } },
1165
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Kimi K2", "VendorID": "@lookup:MJ: AI Vendors.Name=Groq", "Priority": 2 } },
1166
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Kimi K2", "VendorID": "@lookup:MJ: AI Vendors.Name=Cerebras", "Priority": 3 } },
1167
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=Gemini 2.5 Flash", "VendorID": "@lookup:MJ: AI Vendors.Name=Google", "Priority": 4 } },
1168
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=GPT-OSS-120B", "VendorID": "@lookup:MJ: AI Vendors.Name=Groq", "Priority": 5 } },
1169
+ { "fields": { "PromptID": "@parent:ID", "ModelID": "@lookup:AI Models.Name=GPT 5-nano", "VendorID": "@lookup:MJ: AI Vendors.Name=OpenAI", "Priority": 6 } }
1170
+ ]
1171
+ }
1172
+ }
1173
+ ```
1174
+
1175
+ ### 7.3 Refinement Loop Implementation
1176
+ ```typescript
1177
+ class QueryRefiner {
1178
+ async refineQuery(
1179
+ query: GeneratedQuery,
1180
+ businessQuestion: BusinessQuestion,
1181
+ entityMetadata: EntityMetadataForPrompt[],
1182
+ maxRefinements: number = 3
1183
+ ): Promise<RefinedQuery> {
1184
+ let currentQuery = query;
1185
+ let refinementCount = 0;
1186
+
1187
+ while (refinementCount < maxRefinements) {
1188
+ // 1. Test the current query
1189
+ const testResult = await this.tester.testQuery(currentQuery);
1190
+
1191
+ if (!testResult.success) {
1192
+ throw new Error(`Query testing failed: ${testResult.error}`);
1193
+ }
1194
+
1195
+ // 2. Evaluate if it answers the question
1196
+ const evaluation = await this.evaluateQuery(
1197
+ currentQuery,
1198
+ businessQuestion,
1199
+ testResult.sampleRows
1200
+ );
1201
+
1202
+ // 3. If evaluation passes, we're done!
1203
+ if (evaluation.answersQuestion && !evaluation.needsRefinement) {
1204
+ return {
1205
+ query: currentQuery,
1206
+ testResult,
1207
+ evaluation,
1208
+ refinementCount
1209
+ };
1210
+ }
1211
+
1212
+ // 4. Refine the query based on suggestions
1213
+ refinementCount++;
1214
+ console.log(`Refinement iteration ${refinementCount}/${maxRefinements}`);
1215
+
1216
+ currentQuery = await this.performRefinement(
1217
+ currentQuery,
1218
+ businessQuestion,
1219
+ evaluation,
1220
+ entityMetadata
1221
+ );
1222
+ }
1223
+
1224
+ // Reached max refinements - re-test the final query and evaluate it against real sample rows
+ const finalTestResult = await this.tester.testQuery(currentQuery);
+ return {
+ query: currentQuery,
+ testResult: finalTestResult,
+ evaluation: await this.evaluateQuery(currentQuery, businessQuestion, finalTestResult.sampleRows ?? []),
+ refinementCount,
+ reachedMaxRefinements: true
+ };
1232
+ }
1233
+
1234
+ private async evaluateQuery(
1235
+ query: GeneratedQuery,
1236
+ businessQuestion: BusinessQuestion,
1237
+ sampleResults: any[]
1238
+ ): Promise<QueryEvaluation> {
1239
+ const promptRunner = new AIPromptRunner();
1240
+ const result = await promptRunner.ExecutePrompt({
1241
+ prompt: await this.getPrompt('Query Result Evaluator'),
1242
+ data: {
1243
+ userQuestion: businessQuestion.userQuestion,
1244
+ description: businessQuestion.description,
1245
+ technicalDescription: businessQuestion.technicalDescription,
1246
+ generatedSQL: query.sql,
1247
+ parameters: query.parameters,
1248
+ sampleResults
1249
+ },
1250
+ contextUser: this.contextUser
1251
+ });
1252
+
1253
+ return result.result as QueryEvaluation;
1254
+ }
1255
+
1256
+ private async performRefinement(
1257
+ query: GeneratedQuery,
1258
+ businessQuestion: BusinessQuestion,
1259
+ evaluation: QueryEvaluation,
1260
+ entityMetadata: EntityMetadataForPrompt[]
1261
+ ): Promise<GeneratedQuery> {
1262
+ const promptRunner = new AIPromptRunner();
1263
+ const result = await promptRunner.ExecutePrompt({
1264
+ prompt: await this.getPrompt('Query Refiner'),
1265
+ data: {
1266
+ userQuestion: businessQuestion.userQuestion,
1267
+ description: businessQuestion.description,
1268
+ currentSQL: query.sql,
1269
+ evaluationFeedback: evaluation,
1270
+ entityMetadata
1271
+ },
1272
+ contextUser: this.contextUser
1273
+ });
1274
+
1275
+ return result.result as GeneratedQuery;
1276
+ }
1277
+ }
1278
+ ```
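The loop above returns whichever query was produced last once `maxRefinements` is reached, and that is not necessarily the strongest one. If each iteration's evaluation is kept, the best attempt can be selected afterwards. The sketch below assumes a hypothetical `Attempt` record and an illustrative scoring rule; only the `answersQuestion`, `confidence`, and `needsRefinement` fields come from the evaluator's output format.

```typescript
interface EvaluationLike {
  answersQuestion: boolean;
  confidence: number; // expected in [0, 1]
  needsRefinement: boolean;
}

// Hypothetical record of one refinement iteration
interface Attempt<Q> {
  query: Q;
  evaluation: EvaluationLike;
}

/** Illustrative ranking: answering the question dominates, then "no refinement needed", then confidence. */
function score(e: EvaluationLike): number {
  return (e.answersQuestion ? 4 : 0) + (e.needsRefinement ? 0 : 2) + e.confidence;
}

/** Return the highest-scoring attempt; the earliest attempt wins ties. */
function pickBestAttempt<Q>(attempts: readonly Attempt<Q>[]): Attempt<Q> | undefined {
  let best: Attempt<Q> | undefined;
  for (const attempt of attempts) {
    if (best === undefined || score(attempt.evaluation) > score(best.evaluation)) {
      best = attempt;
    }
  }
  return best;
}
```

The weights 4 and 2 are chosen so that a confidence value between 0 and 1 can only break ties within the same category.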
1279
+
1280
+ ---
1281
+
1282
+ ## Phase 8: Metadata Export (Week 7)
1283
+
1284
+ ### 8.1 MetadataExporter Implementation
1285
+ **Purpose**: Export validated queries to MJ metadata format
1286
+
1287
+ **Output Format**: MJ Queries metadata JSON file
1288
+
1289
+ ```typescript
1290
+ import * as path from 'path';
+ import { promises as fs } from 'fs';
+
+ class MetadataExporter {
1291
+ async exportQueries(
1292
+ validatedQueries: ValidatedQuery[],
1293
+ outputDirectory: string
1294
+ ): Promise<ExportResult> {
1295
+ // 1. Transform to MJ Query metadata format
1296
+ const metadata = validatedQueries.map(q => this.toQueryMetadata(q));
1297
+
1298
+ // 2. Create metadata file structure
1299
+ const metadataFile = {
1300
+ timestamp: new Date().toISOString(),
1301
+ generatedBy: 'query-gen',
1302
+ version: '1.0',
1303
+ queries: metadata
1304
+ };
1305
+
1306
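+ // 3. Write to file (create the output directory first if it is missing)
+ await fs.mkdir(outputDirectory, { recursive: true });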
1307
+ const outputPath = path.join(outputDirectory, `queries-${Date.now()}.json`);
1308
+ await fs.writeFile(
1309
+ outputPath,
1310
+ JSON.stringify(metadataFile, null, 2),
1311
+ 'utf-8'
1312
+ );
1313
+
1314
+ return {
1315
+ success: true,
1316
+ outputPath,
1317
+ queryCount: metadata.length
1318
+ };
1319
+ }
1320
+
1321
+ private toQueryMetadata(query: ValidatedQuery): QueryMetadataRecord {
1322
+ return {
1323
+ fields: {
1324
+ Name: this.generateQueryName(query.businessQuestion),
1325
+ CategoryID: '@lookup:Query Categories.Name=Auto-Generated',
1326
+ UserQuestion: query.businessQuestion.userQuestion,
1327
+ Description: query.businessQuestion.description,
1328
+ TechnicalDescription: query.businessQuestion.technicalDescription,
1329
+ SQL: query.query.sql,
1330
+ OriginalSQL: query.query.sql,
1331
+ UsesTemplate: true,
1332
+ Status: 'Active'
1333
+ },
1334
+ relatedEntities: {
1335
+ 'Query Fields': query.query.selectClause.map((field, i) => ({
1336
+ fields: {
1337
+ QueryID: '@parent:ID',
1338
+ Name: field.name,
1339
+ Description: field.description,
1340
+ SQLBaseType: field.type,
1341
+ Sequence: i + 1
1342
+ }
1343
+ })),
1344
+ 'Query Params': query.query.parameters.map((param, i) => ({
1345
+ fields: {
1346
+ QueryID: '@parent:ID',
1347
+ Name: param.name,
1348
+ Type: param.type,
1349
+ Description: param.description,
1350
+ ValidationFilters: param.usage.join(', '),
1351
+ IsRequired: param.isRequired,
1352
+ DefaultValue: param.defaultValue,
1353
+ Sequence: i + 1
1354
+ }
1355
+ }))
1356
+ }
1357
+ };
1358
+ }
1359
+
1360
+ private generateQueryName(question: BusinessQuestion): string {
1361
+ // Convert user question to a concise name
1362
+ // "What are the top customers by revenue?" -> "What Are The Top Customers" (words of 2 or fewer characters are dropped, first 5 kept)
1363
+ return question.userQuestion
1364
+ .replace(/\?/g, '')
1365
+ .split(' ')
1366
+ .filter(word => word.length > 2)
1367
+ .map(word => word.charAt(0).toUpperCase() + word.slice(1))
1368
+ .slice(0, 5)
1369
+ .join(' ');
1370
+ }
1371
+ }
1372
+ ```
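`generateQueryName` filters words by length only, so leading question words such as "What", "are", and "the" survive and use up the five-word budget. A stop-word filter is one way to obtain a name like "Top Customers By Revenue". The sketch below is an assumption about how that could look; `STOP_WORDS` and `questionToQueryName` are illustrative names.

```typescript
// Illustrative stop-word list; tune it for the questions the generator produces.
const STOP_WORDS = new Set([
  "what", "which", "who", "how", "are", "is", "the", "a", "an", "do", "does", "show", "me",
]);

/** Turn a user question into a short title-cased query name. */
function questionToQueryName(userQuestion: string, maxWords = 5): string {
  return userQuestion
    .replace(/[^\w\s]/g, "") // drop punctuation such as "?"
    .split(/\s+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word.toLowerCase()))
    .slice(0, maxWords)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}

const exampleName = questionToQueryName("What are the top customers by revenue?");
// exampleName === "Top Customers By Revenue"
```

Generated names are not guaranteed to be unique, so a real exporter would probably still need a collision check before saving.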
1373
+
1374
+ ### 8.2 Database Direct Insert (Optional)
1375
+ **Purpose**: Alternative to metadata files - insert directly into database
1376
+
1377
+ ```typescript
1378
+ class QueryDatabaseWriter {
1379
+ async writeQueriesToDatabase(
1380
+ validatedQueries: ValidatedQuery[],
1381
+ contextUser: UserInfo
1382
+ ): Promise<WriteResult> {
1383
+ const md = new Metadata();
1384
+ const results: string[] = [];
1385
+
1386
+ for (const vq of validatedQueries) {
1387
+ try {
1388
+ // 1. Create Query entity
1389
+ const query = await md.GetEntityObject<QueryEntity>('Queries', contextUser);
1390
+ query.NewRecord();
1391
+ query.Name = this.generateQueryName(vq.businessQuestion);
1392
+ query.CategoryID = await this.findOrCreateCategory('Auto-Generated');
1393
+ query.UserQuestion = vq.businessQuestion.userQuestion;
1394
+ query.Description = vq.businessQuestion.description;
1395
+ query.TechnicalDescription = vq.businessQuestion.technicalDescription;
1396
+ query.SQL = vq.query.sql;
1397
+ query.OriginalSQL = vq.query.sql;
1398
+ query.UsesTemplate = true;
1399
+ query.Status = 'Active';
1400
+
1401
+ const saved = await query.Save();
1402
+ if (!saved) {
1403
+ throw new Error(`Failed to save query: ${query.LatestResult?.Message}`);
1404
+ }
1405
+
1406
+ // 2. Create Query Fields
1407
+ for (let i = 0; i < vq.query.selectClause.length; i++) {
1408
+ const field = vq.query.selectClause[i];
1409
+ const qf = await md.GetEntityObject<QueryFieldEntity>('Query Fields', contextUser);
1410
+ qf.NewRecord();
1411
+ qf.QueryID = query.ID;
1412
+ qf.Name = field.name;
1413
+ qf.Description = field.description;
1414
+ qf.SQLBaseType = field.type;
1415
+ qf.Sequence = i + 1;
1416
+ if (!(await qf.Save())) throw new Error(`Failed to save query field ${field.name}: ${qf.LatestResult?.Message}`);
1417
+ }
1418
+
1419
+ // 3. Create Query Params
1420
+ for (let i = 0; i < vq.query.parameters.length; i++) {
1421
+ const param = vq.query.parameters[i];
1422
+ const qp = await md.GetEntityObject<QueryParamEntity>('Query Params', contextUser);
1423
+ qp.NewRecord();
1424
+ qp.QueryID = query.ID;
1425
+ qp.Name = param.name;
1426
+ qp.Type = param.type;
1427
+ qp.Description = param.description;
1428
+ qp.IsRequired = param.isRequired;
1429
+ qp.DefaultValue = param.defaultValue;
1430
+ qp.Sequence = i + 1;
1431
+ if (!(await qp.Save())) throw new Error(`Failed to save query param ${param.name}: ${qp.LatestResult?.Message}`);
1432
+ }
1433
+
1434
+ results.push(`✓ ${query.Name} (ID: ${query.ID})`);
1435
+
1436
+ } catch (error) {
1437
+ results.push(`✗ ${vq.businessQuestion.userQuestion}: ${extractErrorMessage(error, 'Database Write')}`);
1438
+ }
1439
+ }
1440
+
1441
+ return {
1442
+ success: results.every(r => r.startsWith('✓')),
1443
+ results
1444
+ };
1445
+ }
1446
+ }
1447
+ ```
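The writer above records each outcome as a prefixed string, which makes an overall success flag and per-query counts awkward to derive. A structured alternative could be sketched as follows; `WriteOutcome`, `WriteSummary`, and `summarizeWrites` are hypothetical names, not part of the package.

```typescript
// Hypothetical structured outcome for one query written to the database
interface WriteOutcome {
  name: string;
  ok: boolean;
  id?: string;    // set when the save succeeded
  error?: string; // set when the save failed
}

interface WriteSummary {
  success: boolean; // true only when nothing failed
  saved: number;
  failed: number;
  lines: string[];  // human-readable lines in the same style as the original results
}

function summarizeWrites(outcomes: readonly WriteOutcome[]): WriteSummary {
  const failed = outcomes.filter((o) => !o.ok).length;
  return {
    success: failed === 0,
    saved: outcomes.length - failed,
    failed,
    lines: outcomes.map((o) =>
      o.ok
        ? `✓ ${o.name} (ID: ${o.id ?? "unknown"})`
        : `✗ ${o.name}: ${o.error ?? "unknown error"}`
    ),
  };
}
```

The CLI could then print `lines` for the user and use `failed` to choose an exit code.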
1448
+
1449
+ ---
1450
+
1451
+ ## Phase 9: CLI Implementation (Week 8)
1452
+
1453
+ ### 9.1 CLI Command Structure
1454
+ ```typescript
1455
+ // src/cli/index.ts
1456
+ import { Command } from 'commander';
1457
+
1458
+ const program = new Command();
1459
+
1460
+ program
1461
+ .name('mj querygen')
1462
+ .description('AI-powered SQL query template generation for MemberJunction')
1463
+ .version('1.0.0');
1464
+
1465
+ program
1466
+ .command('generate')
1467
+ .description('Generate queries for entities')
1468
+ .option('-e, --entities <names...>', 'Specific entities to generate queries for')
1469
+ .option('-x, --exclude-entities <names...>', 'Entities to exclude')
1470
+ .option('-s, --exclude-schemas <names...>', 'Schemas to exclude')
1471
+ .option('-m, --max-entities <number>', 'Max entities per group', '3')
1472
+ .option('-r, --max-refinements <number>', 'Max refinement iterations', '3')
1473
+ .option('-f, --max-fixes <number>', 'Max error-fixing attempts', '5')
1474
+ .option('--model <name>', 'Preferred AI model')
1475
+ .option('--vendor <name>', 'Preferred AI vendor')
1476
+ .option('-o, --output <path>', 'Output directory', './metadata/queries')
1477
+ .option('--mode <mode>', 'Output mode: metadata|database|both', 'metadata')
1478
+ .option('-v, --verbose', 'Verbose output')
1479
+ .action(generateCommand);
1480
+
1481
+ program
1482
+ .command('validate')
1483
+ .description('Validate existing query templates')
1484
+ .option('-p, --path <path>', 'Path to queries metadata file')
1485
+ .action(validateCommand);
1486
+
1487
+ program
1488
+ .command('export')
1489
+ .description('Export queries from database to metadata files')
1490
+ .option('-o, --output <path>', 'Output directory')
1491
+ .action(exportCommand);
1492
+
1493
+ program.parse();
1494
+ ```
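The numeric options above (`--max-entities`, `--max-refinements`, `--max-fixes`) carry string defaults, and Commander hands option values to the action as strings unless a parser is supplied. A small validating parser could look like the sketch below; `parsePositiveInt` is a hypothetical helper name.

```typescript
/** Parse a CLI option value as a positive integer, rejecting anything else. */
function parsePositiveInt(value: string): number {
  const parsed = Number(value);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`Expected a positive integer, got "${value}"`);
  }
  return parsed;
}
```

Commander's `.option()` accepts a custom processing function as its third argument with the default value after it, so the helper could be wired in as `.option('-m, --max-entities <number>', 'Max entities per group', parsePositiveInt, 3)`. That wiring is a suggestion and not what the plan currently specifies.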
1495
+
1496
+ ### 9.2 Generate Command Implementation
1497
+ ```typescript
1498
+ import ora from 'ora';
+
+ async function generateCommand(options: any): Promise<void> {
1499
+ const spinner = ora('Initializing query generation...').start();
1500
+
1501
+ try {
1502
+ // 1. Load configuration
1503
+ const config = await loadConfig(options);
1504
+
1505
+ // 2. Connect to database and load metadata
1506
+ spinner.text = 'Loading metadata...';
1507
+ await Metadata.Provider.Config(false, contextUser);
1508
+
1509
+ // 3. Build entity groups
1510
+ spinner.text = 'Analyzing entity relationships...';
1511
+ const grouper = new EntityGrouper(config);
1512
+ const entityGroups = await grouper.generateEntityGroups();
1513
+ spinner.succeed(`Found ${entityGroups.length} entity groups`);
1514
+
1515
+ // 4. Initialize vector similarity search
1516
+ spinner.start('Embedding golden queries...');
1517
+ const embeddingService = new EmbeddingService(config.embeddingModel);
1518
+ const goldenQueries = await loadGoldenQueries();
1519
+ const embeddedGolden = await embeddingService.embedGoldenQueries(goldenQueries);
1520
+ spinner.succeed(`Embedded ${goldenQueries.length} golden queries`);
1521
+
1522
+ // 5. Generate queries for each entity group
1523
+ const totalGroups = entityGroups.length;
1524
+ let processedGroups = 0;
1525
+ const allValidatedQueries: ValidatedQuery[] = [];
1526
+
1527
+ for (const group of entityGroups) {
1528
+ processedGroups++;
1529
+ spinner.start(`[${processedGroups}/${totalGroups}] Processing ${group.primaryEntity.Name}...`);
1530
+
1531
+ try {
1532
+ // 5a. Generate business questions
1533
+ const questionGen = new QuestionGenerator(config);
1534
+ const questions = await questionGen.generateQuestions(group);
1535
+
1536
+ // 5b. For each question, generate and validate query
1537
+ for (const question of questions) {
1538
+ spinner.text = `[${processedGroups}/${totalGroups}] Generating query: ${question.userQuestion}`;
1539
+
1540
+ // Embed question for similarity search
1541
+ const questionEmbedding = await embeddingService.embedQuery({
1542
+ name: '',
1543
+ userQuestion: question.userQuestion,
1544
+ description: question.description,
1545
+ technicalDescription: question.technicalDescription,
1546
+ sql: ''
1547
+ });
1548
+
1549
+ // Find similar golden queries
1550
+ const similaritySearch = new SimilaritySearch();
1551
+ const fewShotExamples = await similaritySearch.findSimilarQueries(
1552
+ questionEmbedding,
1553
+ embeddedGolden,
1554
+ config.topSimilarQueries,
1555
+ config.similarityThreshold
1556
+ );
1557
+
1558
+ // Generate SQL query
1559
+ const queryWriter = new QueryWriter(config);
1560
+ const generatedQuery = await queryWriter.generateQuery(
1561
+ question,
1562
+ group.entities.map(e => formatEntityMetadata(e)),
1563
+ fewShotExamples.map(s => s.query)
1564
+ );
1565
+
1566
+ // Test and fix query
1567
+ const queryTester = new QueryTester(config);
1568
+ const testResult = await queryTester.testQuery(
1569
+ generatedQuery,
1570
+ config.maxFixingIterations
1571
+ );
1572
+
1573
+ if (!testResult.success) {
1574
+ spinner.warn(`Query failed after ${config.maxFixingIterations} attempts: ${question.userQuestion}`);
1575
+ continue;
1576
+ }
1577
+
1578
+ // Refine query
1579
+ const queryRefiner = new QueryRefiner(config);
1580
+ const refinedResult = await queryRefiner.refineQuery(
1581
+ generatedQuery,
1582
+ question,
1583
+ group.entities.map(e => formatEntityMetadata(e)),
1584
+ config.maxRefinementIterations
1585
+ );
1586
+
1587
+ allValidatedQueries.push({
1588
+ businessQuestion: question,
1589
+ query: refinedResult.query,
1590
+ testResult: refinedResult.testResult,
1591
+ evaluation: refinedResult.evaluation,
1592
+ entityGroup: group
1593
+ });
1594
+
1595
+ spinner.text = `[${processedGroups}/${totalGroups}] ✓ ${question.userQuestion}`;
1596
+ }
1597
+
1598
+ spinner.succeed(`[${processedGroups}/${totalGroups}] ${group.primaryEntity.Name} complete (${questions.length} questions processed)`);
1599
+
1600
+ } catch (error) {
1601
+ spinner.warn(`[${processedGroups}/${totalGroups}] Error processing ${group.primaryEntity.Name}: ${extractErrorMessage(error, 'Query Generation')}`);
1602
+ }
1603
+ }
1604
+
1605
+ // 6. Export results
1606
+ spinner.start(`Exporting ${allValidatedQueries.length} queries...`);
1607
+
1608
+ if (config.outputMode === 'metadata' || config.outputMode === 'both') {
1609
+ const exporter = new MetadataExporter();
1610
+ const exportResult = await exporter.exportQueries(
1611
+ allValidatedQueries,
1612
+ config.outputDirectory
1613
+ );
1614
+ spinner.succeed(`Exported to ${exportResult.outputPath}`);
1615
+ }
1616
+
1617
+ if (config.outputMode === 'database' || config.outputMode === 'both') {
1618
+ const dbWriter = new QueryDatabaseWriter();
1619
+ const writeResult = await dbWriter.writeQueriesToDatabase(
1620
+ allValidatedQueries,
1621
+ contextUser
1622
+ );
1623
+ spinner.succeed(`Wrote ${allValidatedQueries.length} queries to database`);
1624
+ }
1625
+
1626
+ // 7. Summary
1627
+ console.log('\n✅ Query generation complete!\n');
1628
+ console.log(`Entity Groups Processed: ${processedGroups}`);
1629
+ console.log(`Queries Generated: ${allValidatedQueries.length}`);
1630
+ console.log(`Output Location: ${config.outputDirectory}`);
1631
+
1632
+ } catch (error) {
1633
+ spinner.fail('Query generation failed');
1634
+ console.error(extractErrorMessage(error, 'Query Generation'));
1635
+ process.exit(1);
1636
+ }
1637
+ }
1638
+ ```
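The similarity step in the workflow above (`findSimilarQueries`, driven by `config.topSimilarQueries` and `config.similarityThreshold`) can be pictured as a cosine-similarity ranking over the embedded golden queries. What follows is a minimal sketch under that assumption, not the package's `SimilaritySearch` implementation; the `EmbeddedItem` shape and the `cosineSimilarity` and `topKSimilar` helpers are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape: an item (e.g. a golden query) paired with its embedding.
interface EmbeddedItem<T> {
  item: T;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Keep candidates scoring at least `threshold`, best first, capped at `topK`.
function topKSimilar<T>(
  query: number[],
  candidates: EmbeddedItem<T>[],
  topK: number,
  threshold: number
): { item: T; similarity: number }[] {
  return candidates
    .map(c => ({ item: c.item, similarity: cosineSimilarity(query, c.embedding) }))
    .filter(r => r.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK);
}
```

Filtering by threshold before truncating to `topK` means a question with no close golden query gets fewer (possibly zero) few-shot examples rather than unrelated ones.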
1639
+
1640
+ ### 9.3 Progress Reporting
1641
+ **Features**:
1642
+ - Use `ora` for spinners during long operations
1643
+ - Use `chalk` for colored console output
1644
+ - Show progress for each entity group: `[3/15] Processing Customers...`
1645
+ - Display summary statistics at the end
1646
+ - Save detailed logs to a file when verbose mode is enabled
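The progress labels and end-of-run statistics listed above can be kept in one small helper that is independent of the spinner library. This is a sketch only: `ProgressTracker` and its methods are hypothetical names, and the returned strings would be passed to the spinner (for example as `spinner.text`) by the caller.

```typescript
// Hypothetical tracker for per-group labels and summary counts.
class ProgressTracker {
  private total: number;
  private current = 0;
  private succeeded = 0;
  private failed = 0;

  constructor(total: number) {
    this.total = total;
  }

  // Advance to the next entity group and return its label,
  // e.g. "[3/15] Processing Customers...".
  startGroup(name: string): string {
    this.current++;
    return `[${this.current}/${this.total}] Processing ${name}...`;
  }

  // Record whether one generated query passed testing.
  recordQuery(ok: boolean): void {
    if (ok) {
      this.succeeded++;
    } else {
      this.failed++;
    }
  }

  // Lines for the summary printed at the end of the run.
  summary(): string[] {
    return [
      `Entity Groups Processed: ${this.current}`,
      `Queries Generated: ${this.succeeded}`,
      `Queries Failed: ${this.failed}`,
    ];
  }
}
```

Counting failures separately keeps the summary accurate when some questions never produce a passing query.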
1647
+
1648
+ ---
1649
+
1650
+ ## Phase 10: Testing & Documentation (Week 9)
1651
+
1652
+ ### 10.1 Unit Tests
1653
+ **Test Coverage**:
1654
+ - Entity grouping logic
1655
+ - Vector similarity search
1656
+ - Query parameter rendering
1657
+ - SQL execution and error handling
1658
+ - Metadata export format
1659
+
1660
+ ### 10.2 Integration Tests
1661
+ **Test Scenarios**:
1662
+ - Full generation workflow on test database
1663
+ - AI prompt failover scenarios
1664
+ - Query refinement iterations
1665
+ - Database vs. metadata output modes
1666
+
1667
+ ### 10.3 Documentation
1668
+ **README.md Contents**:
1669
+ - Installation instructions
1670
+ - Configuration guide
1671
+ - CLI command reference
1672
+ - Example workflows
1673
+ - Troubleshooting guide
1674
+
1675
+ **Example Usage**:
1676
+ ```bash
1677
+ # Generate queries for all entities
1678
+ mj querygen generate
1679
+
1680
+ # Generate for specific entities
1681
+ mj querygen generate -e Customers Orders Products
1682
+
1683
+ # Exclude schemas
1684
+ mj querygen generate -s __mj internal
1685
+
1686
+ # Override AI model
1687
+ mj querygen generate --model "Claude 4.5 Sonnet" --vendor Anthropic
1688
+
1689
+ # Output to database
1690
+ mj querygen generate --mode database
1691
+
1692
+ # Verbose output
1693
+ mj querygen generate -v
1694
+ ```
1695
+
1696
+ ---
1697
+
1698
+ ## Phase 11: Optimization & Polish (Week 10)
1699
+
1700
+ ### 11.1 Performance Optimizations
1701
+ - **Parallel Processing**: Generate queries for multiple entity groups in parallel (config: `parallelGenerations: 3`)
1702
+ - **Caching**: Cache AI prompt results to avoid re-running identical prompts
1703
+ - **Connection Pooling**: Reuse database connections efficiently
1704
+ - **Batching**: Process large entity lists in batches
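One way to read the `parallelGenerations: 3` setting is as a cap on how many entity groups are in flight at once. The sketch below shows that idea with a small worker pool; `mapWithConcurrency` is a hypothetical helper, not part of the package, and the per-group work is left to the caller.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight.
// Results keep the input order; a rejected worker rejects the whole call.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the list is exhausted.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  const runners = Array.from({ length: runnerCount }, () => runner());
  await Promise.all(runners);
  return results;
}
```

Because a single rejection fails the whole call, a worker that should tolerate per-group errors (as the sequential loop in Phase 9 does) would catch them internally and return a result describing the failure.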
1705
+
1706
+ ### 11.2 Error Handling Improvements
1707
+ - Graceful degradation when AI models are unavailable
1708
+ - Detailed error logs with context
1709
+ - Retry logic with exponential backoff
1710
+ - User-friendly error messages
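The retry item above can be sketched as a generic wrapper around any async operation, such as an AI prompt call. This is an illustration under assumed names: `RetryOptions`, `backoffDelay`, and `withRetry` are hypothetical, and the delay schedule (doubling from a base, with a cap) is one common choice rather than a setting taken from the plan.

```typescript
// Hypothetical options; none of these names come from the package.
interface RetryOptions {
  maxAttempts: number; // total tries, including the first
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay before retry number `retry` (1-based): base * 2^(retry - 1), capped.
function backoffDelay(retry: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * Math.pow(2, retry - 1), opts.maxDelayMs);
}

// Calls `operation` until it resolves or `maxAttempts` is reached,
// sleeping with exponential backoff between attempts.
async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = ms =>
    new Promise<void>(resolve => setTimeout(resolve, ms))
): Promise<T> {
  if (opts.maxAttempts < 1) {
    throw new Error("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // No point sleeping after the final failed attempt.
      if (attempt < opts.maxAttempts) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

The `sleep` parameter is injectable so the schedule can be unit-tested without real delays.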
1711
+
1712
+ ### 11.3 Code Quality
1713
+ - ESLint/Prettier formatting
1714
+ - TypeScript strict mode
1715
+ - Comprehensive JSDoc comments
1716
+ - Refactor long functions (follow functional decomposition guidelines)
1717
+
1718
+ ---
1719
+
1720
+ ## Summary Timeline
1721
+
1722
+ | Phase | Duration | Key Deliverables |
1723
+ |-------|----------|------------------|
1724
+ | 1. Project Setup | Week 1 | Package structure, config system, dependencies |
1725
+ | 2. Entity Analysis | Week 2 | EntityGrouper, relationship graph |
1726
+ | 3. Business Questions | Week 2-3 | QuestionGenerator, AI prompt |
1727
+ | 4. Vector Similarity | Week 3 | EmbeddingService, SimilaritySearch |
1728
+ | 5. SQL Generation | Week 4 | QueryWriter, few-shot learning |
1729
+ | 6. Query Testing | Week 5 | QueryTester, QueryFixer, error handling |
1730
+ | 7. Query Refinement | Week 6 | QueryRefiner, evaluation loop |
1731
+ | 8. Metadata Export | Week 7 | MetadataExporter, database writer |
1732
+ | 9. CLI Implementation | Week 8 | Command structure, progress reporting |
1733
+ | 10. Testing & Docs | Week 9 | Unit tests, integration tests, README |
1734
+ | 11. Optimization | Week 10 | Performance tuning, error handling, polish |
1735
+
1736
+ **Total Duration**: ~10 weeks
1737
+
1738
+ ---
1739
+
1740
+ ## Key Design Decisions Summary
1741
+
1742
+ 1. **Configuration**: Integrate with `mj.config.cjs` for consistency
1743
+ 2. **Golden Queries**: Bundle as a JSON file in the `src/data/` directory
1744
+ 3. **AI Prompts**: 5 new prompts with 6-model failover configuration
1745
+ 4. **Vector Search**: Use local embeddings (`all-MiniLM-L6-v2`) for similarity
1746
+ 5. **Testing Strategy**: Render with sample values → execute → fix → refine
1747
+ 6. **Output Modes**: Metadata files (default), database, or both
1748
+ 7. **Parallelization**: Process multiple entity groups concurrently
1749
+ 8. **Error Handling**: Follow MJ standards with `extractErrorMessage` utility
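The execute-and-fix portion of the testing strategy (decision 5) can be sketched as a bounded loop. This is an illustration only: `ExecutionResult`, `FixLoopResult`, and `testAndFix` are hypothetical names, and the `execute` and `fix` callbacks stand in for running the rendered SQL and asking an AI prompt for a correction.

```typescript
// Hypothetical result of executing a rendered query.
type ExecutionResult =
  | { success: true; rowCount: number }
  | { success: false; error: string };

// Hypothetical outcome of the loop, including the SQL that was last tested.
interface FixLoopResult {
  success: boolean;
  sql: string;
  attempts: number;
  lastError?: string;
}

// Executes `sql`; on failure, asks `fix` for a corrected version and retries,
// up to `maxIterations` executions in total.
async function testAndFix(
  sql: string,
  maxIterations: number,
  execute: (sql: string) => Promise<ExecutionResult>,
  fix: (sql: string, error: string) => Promise<string>
): Promise<FixLoopResult> {
  let current = sql;
  let lastError: string | undefined;
  for (let attempt = 1; attempt <= maxIterations; attempt++) {
    const result = await execute(current);
    if (result.success) {
      return { success: true, sql: current, attempts: attempt };
    }
    lastError = result.error;
    // Only request a fix if another attempt remains to test it.
    if (attempt < maxIterations) {
      current = await fix(current, result.error);
    }
  }
  return { success: false, sql: current, attempts: maxIterations, lastError };
}
```

Returning the last tested SQL alongside the outcome lets the refinement step start from the corrected query rather than the original draft.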
1750
+
1751
+ ---
1752
+
1753
+ This plan lays out a phased roadmap for implementing the `@memberjunction/query-gen` package, with testable milestones at each stage.