@memberjunction/query-gen 0.0.1 → 2.126.1

Files changed (138)
  1. package/.turbo/turbo-build.log +4 -0
  2. package/CHANGELOG.md +34 -0
  3. package/COORDINATOR.md +768 -0
  4. package/IMPLEMENTATION_PLAN.md +1753 -0
  5. package/LLM_ENTITY_GROUPING_PLAN.md +977 -0
  6. package/README.md +675 -29
  7. package/dist/cli/commands/export.d.ts +15 -0
  8. package/dist/cli/commands/export.d.ts.map +1 -0
  9. package/dist/cli/commands/export.js +178 -0
  10. package/dist/cli/commands/export.js.map +1 -0
  11. package/dist/cli/commands/generate.d.ts +19 -0
  12. package/dist/cli/commands/generate.d.ts.map +1 -0
  13. package/dist/cli/commands/generate.js +282 -0
  14. package/dist/cli/commands/generate.js.map +1 -0
  15. package/dist/cli/commands/validate.d.ts +17 -0
  16. package/dist/cli/commands/validate.d.ts.map +1 -0
  17. package/dist/cli/commands/validate.js +193 -0
  18. package/dist/cli/commands/validate.js.map +1 -0
  19. package/dist/cli/config.d.ts +51 -0
  20. package/dist/cli/config.d.ts.map +1 -0
  21. package/dist/cli/config.js +142 -0
  22. package/dist/cli/config.js.map +1 -0
  23. package/dist/cli/index.d.ts +13 -0
  24. package/dist/cli/index.d.ts.map +1 -0
  25. package/dist/cli/index.js +57 -0
  26. package/dist/cli/index.js.map +1 -0
  27. package/dist/core/EntityGrouper.d.ts +74 -0
  28. package/dist/core/EntityGrouper.d.ts.map +1 -0
  29. package/dist/core/EntityGrouper.js +246 -0
  30. package/dist/core/EntityGrouper.js.map +1 -0
  31. package/dist/core/MetadataExporter.d.ts +59 -0
  32. package/dist/core/MetadataExporter.d.ts.map +1 -0
  33. package/dist/core/MetadataExporter.js +151 -0
  34. package/dist/core/MetadataExporter.js.map +1 -0
  35. package/dist/core/QueryDatabaseWriter.d.ts +50 -0
  36. package/dist/core/QueryDatabaseWriter.d.ts.map +1 -0
  37. package/dist/core/QueryDatabaseWriter.js +152 -0
  38. package/dist/core/QueryDatabaseWriter.js.map +1 -0
  39. package/dist/core/QueryFixer.d.ts +48 -0
  40. package/dist/core/QueryFixer.d.ts.map +1 -0
  41. package/dist/core/QueryFixer.js +115 -0
  42. package/dist/core/QueryFixer.js.map +1 -0
  43. package/dist/core/QueryRefiner.d.ts +94 -0
  44. package/dist/core/QueryRefiner.d.ts.map +1 -0
  45. package/dist/core/QueryRefiner.js +267 -0
  46. package/dist/core/QueryRefiner.js.map +1 -0
  47. package/dist/core/QueryTester.d.ts +70 -0
  48. package/dist/core/QueryTester.d.ts.map +1 -0
  49. package/dist/core/QueryTester.js +243 -0
  50. package/dist/core/QueryTester.js.map +1 -0
  51. package/dist/core/QueryWriter.d.ts +57 -0
  52. package/dist/core/QueryWriter.d.ts.map +1 -0
  53. package/dist/core/QueryWriter.js +184 -0
  54. package/dist/core/QueryWriter.js.map +1 -0
  55. package/dist/core/QuestionGenerator.d.ts +58 -0
  56. package/dist/core/QuestionGenerator.d.ts.map +1 -0
  57. package/dist/core/QuestionGenerator.js +145 -0
  58. package/dist/core/QuestionGenerator.js.map +1 -0
  59. package/dist/data/schema.d.ts +230 -0
  60. package/dist/data/schema.d.ts.map +1 -0
  61. package/dist/data/schema.js +6 -0
  62. package/dist/data/schema.js.map +1 -0
  63. package/dist/index.d.ts +28 -0
  64. package/dist/index.d.ts.map +1 -0
  65. package/dist/index.js +77 -0
  66. package/dist/index.js.map +1 -0
  67. package/dist/prompts/PromptNames.d.ts +32 -0
  68. package/dist/prompts/PromptNames.d.ts.map +1 -0
  69. package/dist/prompts/PromptNames.js +35 -0
  70. package/dist/prompts/PromptNames.js.map +1 -0
  71. package/dist/utils/category-builder.d.ts +28 -0
  72. package/dist/utils/category-builder.d.ts.map +1 -0
  73. package/dist/utils/category-builder.js +90 -0
  74. package/dist/utils/category-builder.js.map +1 -0
  75. package/dist/utils/entity-helpers.d.ts +49 -0
  76. package/dist/utils/entity-helpers.d.ts.map +1 -0
  77. package/dist/utils/entity-helpers.js +189 -0
  78. package/dist/utils/entity-helpers.js.map +1 -0
  79. package/dist/utils/error-handlers.d.ts +19 -0
  80. package/dist/utils/error-handlers.d.ts.map +1 -0
  81. package/dist/utils/error-handlers.js +41 -0
  82. package/dist/utils/error-handlers.js.map +1 -0
  83. package/dist/utils/graph-helpers.d.ts +51 -0
  84. package/dist/utils/graph-helpers.d.ts.map +1 -0
  85. package/dist/utils/graph-helpers.js +82 -0
  86. package/dist/utils/graph-helpers.js.map +1 -0
  87. package/dist/utils/prompt-helpers.d.ts +25 -0
  88. package/dist/utils/prompt-helpers.d.ts.map +1 -0
  89. package/dist/utils/prompt-helpers.js +66 -0
  90. package/dist/utils/prompt-helpers.js.map +1 -0
  91. package/dist/utils/query-helpers.d.ts +23 -0
  92. package/dist/utils/query-helpers.d.ts.map +1 -0
  93. package/dist/utils/query-helpers.js +34 -0
  94. package/dist/utils/query-helpers.js.map +1 -0
  95. package/dist/utils/user-helpers.d.ts +15 -0
  96. package/dist/utils/user-helpers.d.ts.map +1 -0
  97. package/dist/utils/user-helpers.js +32 -0
  98. package/dist/utils/user-helpers.js.map +1 -0
  99. package/dist/vectors/EmbeddingService.d.ts +58 -0
  100. package/dist/vectors/EmbeddingService.d.ts.map +1 -0
  101. package/dist/vectors/EmbeddingService.js +90 -0
  102. package/dist/vectors/EmbeddingService.js.map +1 -0
  103. package/dist/vectors/SimilaritySearch.d.ts +51 -0
  104. package/dist/vectors/SimilaritySearch.d.ts.map +1 -0
  105. package/dist/vectors/SimilaritySearch.js +85 -0
  106. package/dist/vectors/SimilaritySearch.js.map +1 -0
  107. package/docs/API.md +1040 -0
  108. package/docs/ARCHITECTURE.md +1120 -0
  109. package/examples/advanced-usage.ts +401 -0
  110. package/examples/basic-usage.ts +285 -0
  111. package/package.json +48 -6
  112. package/src/cli/commands/export.ts +173 -0
  113. package/src/cli/commands/generate.ts +330 -0
  114. package/src/cli/commands/validate.ts +185 -0
  115. package/src/cli/config.ts +203 -0
  116. package/src/cli/index.ts +63 -0
  117. package/src/core/EntityGrouper.ts +318 -0
  118. package/src/core/MetadataExporter.ts +148 -0
  119. package/src/core/QueryDatabaseWriter.ts +187 -0
  120. package/src/core/QueryFixer.ts +153 -0
  121. package/src/core/QueryRefiner.ts +382 -0
  122. package/src/core/QueryTester.ts +264 -0
  123. package/src/core/QueryWriter.ts +239 -0
  124. package/src/core/QuestionGenerator.ts +199 -0
  125. package/src/data/golden-queries.json +1371 -0
  126. package/src/data/schema.ts +252 -0
  127. package/src/index.ts +49 -0
  128. package/src/prompts/PromptNames.ts +36 -0
  129. package/src/utils/category-builder.ts +97 -0
  130. package/src/utils/entity-helpers.ts +203 -0
  131. package/src/utils/error-handlers.ts +41 -0
  132. package/src/utils/graph-helpers.ts +99 -0
  133. package/src/utils/prompt-helpers.ts +79 -0
  134. package/src/utils/query-helpers.ts +32 -0
  135. package/src/utils/user-helpers.ts +39 -0
  136. package/src/vectors/EmbeddingService.ts +109 -0
  137. package/src/vectors/SimilaritySearch.ts +108 -0
  138. package/tsconfig.json +39 -0
@@ -0,0 +1,1120 @@
1
+ # QueryGen Architecture
2
+
3
+ This document describes the QueryGen package's architecture, design decisions, and implementation details.
4
+
5
+ ## Table of Contents
6
+
7
+ - [Overview](#overview)
8
+ - [11-Phase Pipeline](#11-phase-pipeline)
9
+ - [Core Components](#core-components)
10
+ - [Data Flow](#data-flow)
11
+ - [AI Integration](#ai-integration)
12
+ - [Database Integration](#database-integration)
13
+ - [Error Handling Strategy](#error-handling-strategy)
14
+ - [Performance Considerations](#performance-considerations)
15
+ - [Design Decisions](#design-decisions)
16
+
17
+ ## Overview
18
+
19
+ QueryGen is an AI-powered system for automatically generating, testing, and refining SQL query templates. It leverages MemberJunction's metadata system, AIEngine for LLM interactions, and vector embeddings for few-shot learning.
20
+
21
+ ### Architecture Principles
22
+
23
+ 1. **Modular Design** - Each phase is independent and testable
24
+ 2. **Error Resilience** - Comprehensive error handling with AI-powered fixing
25
+ 3. **Type Safety** - Explicit TypeScript types throughout
26
+ 4. **MJ Integration** - Deep integration with MemberJunction patterns
27
+ 5. **AI-First** - Leverages AI at every decision point
28
+
29
+ ### Technology Stack
30
+
31
+ - **Language**: TypeScript 5.0+
32
+ - **AI**: MemberJunction AIEngine with 6-model failover
33
+ - **Vector Embeddings**: Local embeddings via `text-embedding-3-small`
34
+ - **Templates**: Nunjucks for SQL templates
35
+ - **Database**: SQL Server 2016+
36
+ - **CLI**: Commander.js, ora, chalk
37
+
38
+ ## 11-Phase Pipeline
39
+
40
+ ### Phase 1: Entity Analysis
41
+
42
+ **Purpose**: Load and filter entities from MemberJunction metadata
43
+
44
+ **Implementation**:
45
+ ```typescript
46
+ const md = new Metadata();
47
+ const allEntities = md.Entities.filter(
48
+ e => !config.excludeSchemas.includes(e.SchemaName || '')
49
+ );
50
+ ```
51
+
52
+ **Key Operations**:
53
+ - Load all entities from Metadata.Provider
54
+ - Apply include/exclude filters from config
55
+ - Exclude system schemas (sys, INFORMATION_SCHEMA)
56
+ - Validate entity metadata completeness
57
+
58
+ **Output**: Filtered array of `EntityInfo` objects ready for grouping
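The filtering operations listed above could be combined roughly as follows. This is a minimal sketch, not the package's code: `EntityLike`, `FilterConfig`, `includeEntities`, and `excludeEntities` are hypothetical names introduced for illustration (only `excludeSchemas` appears in the snippet above), and schema matching is assumed to be case-insensitive.

```typescript
// Minimal stand-in for the EntityInfo fields this sketch needs (assumed shape).
interface EntityLike {
  Name: string;
  SchemaName: string | null;
}

// Hypothetical config shape; only excludeSchemas is taken from the document.
interface FilterConfig {
  includeEntities: string[]; // empty list means "include everything"
  excludeEntities: string[];
  excludeSchemas: string[];
}

// System schemas named in the Key Operations list.
const SYSTEM_SCHEMAS = ['sys', 'INFORMATION_SCHEMA'];

function filterEntities(entities: EntityLike[], config: FilterConfig): EntityLike[] {
  const blockedSchemas = new Set(
    SYSTEM_SCHEMAS.concat(config.excludeSchemas).map(s => s.toLowerCase())
  );
  const excluded = new Set(config.excludeEntities);
  const included = new Set(config.includeEntities);

  return entities.filter(e => {
    const schema = (e.SchemaName || '').toLowerCase();
    if (blockedSchemas.has(schema)) return false; // system or excluded schema
    if (excluded.has(e.Name)) return false;       // explicitly excluded entity
    return included.size === 0 || included.has(e.Name);
  });
}
```

Exclusions are checked before inclusions here, so an entity listed in both is dropped; that precedence is a design choice of the sketch, not something the document specifies.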
59
+
60
+ ---
61
+
62
+ ### Phase 2: Entity Grouping
63
+
64
+ **Purpose**: Create logical groups of 1-N related entities based on foreign key relationships
65
+
66
+ **Implementation**: `EntityGrouper` class
67
+
68
+ **Algorithm**:
69
+ 1. **Build Relationship Graph**:
70
+ ```typescript
71
+ // Map entity names to their related entities
72
+ const graph = new Map<string, RelationshipInfo[]>();
73
+
74
+ for (const entity of entities) {
75
+   const relationships: RelationshipInfo[] = [];
76
+   for (const field of entity.Fields) {
77
+     if (isForeignKeyField(field)) {
78
+       relationships.push({
79
+         from: entity.Name,
80
+         to: relatedEntityName,
81
+         via: field.Name,
82
+         type: 'many-to-one'
83
+       });
84
+     }
85
+   }
86
+   graph.set(entity.Name, relationships);
87
+ }
88
+ ```
89
+
90
+ 2. **Breadth-First Traversal**:
91
+ - Start from each entity as "primary entity"
92
+ - Find directly related entities (1 hop away)
93
+ - Then find entities 2 hops away, etc.
94
+ - Generate combinations up to `maxEntitiesPerGroup`
95
+
96
+ 3. **Deduplication**:
97
+ - Sort entity IDs in each group
98
+ - Use sorted IDs as unique key
99
+ - Remove duplicate groups
100
+
101
+ **Output**:
102
+ ```typescript
103
+ interface EntityGroup {
104
+ entities: EntityInfo[];
105
+ relationships: RelationshipInfo[];
106
+ primaryEntity: EntityInfo;
107
+ relationshipType: 'single' | 'parent-child' | 'many-to-many';
108
+ }
109
+ ```
110
+
111
+ **Example**:
112
+ ```
113
+ Customers (single) → [Customers]
114
+ Customers + Orders (parent-child) → [Customers, Orders]
115
+ Customers + Orders + OrderDetails (parent-child) → [Customers, Orders, OrderDetails]
116
+ ```
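The traversal and deduplication steps can be sketched on a plain adjacency map. This is an illustrative sketch under stated assumptions, not the `EntityGrouper` implementation: `buildGroups` and `Graph` are hypothetical names, the graph is assumed to be undirected (each relationship listed from both sides), and groups are grown one connected neighbor at a time.

```typescript
// Entity name -> names of directly related entities (assumed undirected).
type Graph = Map<string, string[]>;

function buildGroups(graph: Graph, maxEntitiesPerGroup: number): string[][] {
  const seenKeys = new Set<string>();
  const groups: string[][] = [];

  // Breadth-first queue of candidate groups: every entity starts as a
  // single-entity group, and larger groups are enqueued behind smaller ones.
  const queue: string[][] = Array.from(graph.keys()).map(name => [name]);

  while (queue.length > 0) {
    const group = queue.shift()!;

    // Deduplicate: the sorted member names form the group's unique key.
    const key = group.slice().sort().join('|');
    if (seenKeys.has(key)) continue;
    seenKeys.add(key);
    groups.push(group);

    if (group.length >= maxEntitiesPerGroup) continue;

    // Extend the group by one hop from any current member.
    for (const member of group) {
      for (const neighbor of graph.get(member) ?? []) {
        if (group.indexOf(neighbor) === -1) {
          queue.push(group.concat(neighbor));
        }
      }
    }
  }
  return groups;
}
```

Because groups only grow through existing relationships, unrelated pairs such as `Customers` + `OrderDetails` never appear without the `Orders` entity that links them.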
117
+
118
+ ---
119
+
120
+ ### Phase 3: Business Question Generation
121
+
122
+ **Purpose**: Generate domain-specific business questions using AI
123
+
124
+ **Implementation**: `QuestionGenerator` class
125
+
126
+ **AI Prompt**: `Business Question Generator`
127
+ - Location: `metadata/prompts/templates/query-gen/business-question-generator.template.md`
128
+ - Uses Nunjucks templates to format entity metadata as structured markdown
129
+ - Includes entity descriptions, fields, relationships
130
+
131
+ **Prompt Structure**:
132
+ ```markdown
133
+ # Business Question Generator
134
+
135
+ ## Entity Group Context
136
+
137
+ {% for entity in entityGroupMetadata %}
138
+ ### Entity: {{ entity.entityName }}
139
+ - **Schema**: {{ entity.schemaName }}
140
+ - **View**: {{ entity.baseView }}
141
+ - **Description**: {{ entity.description }}
142
+
143
+ **Fields**:
144
+ {% for field in entity.fields %}
145
+ - `{{ field.name }}` ({{ field.type }})...
146
+ {% endfor %}
147
+
148
+ **Relationships**:
149
+ {% for rel in entity.relationships %}
150
+ - {{ rel.type }}: {{ rel.relatedEntity }} via `{{ rel.foreignKeyField }}`
151
+ {% endfor %}
152
+ {% endfor %}
153
+
154
+ ## Instructions
155
+ Generate 1-2 realistic business questions...
156
+ ```
157
+
158
+ **Output**:
159
+ ```typescript
160
+ interface BusinessQuestion {
161
+ userQuestion: string;
162
+ description: string;
163
+ technicalDescription: string;
164
+ complexity: 'simple' | 'medium' | 'complex';
165
+ requiresAggregation: boolean;
166
+ requiresJoins: boolean;
167
+ entities: string[];
168
+ }
169
+ ```
170
+
171
+ ---
172
+
173
+ ### Phase 4: Vector Similarity Search
174
+
175
+ **Purpose**: Find similar golden queries for few-shot learning
176
+
177
+ **Implementation**:
178
+ - `EmbeddingService` - Wraps AIEngine's `EmbedTextLocal()`
179
+ - `SimilaritySearch` - Weighted cosine similarity
180
+
181
+ **Algorithm**:
182
+ 1. **Embed Question Fields**:
183
+ ```typescript
184
+ const embeddings = {
185
+ name: await aiEngine.EmbedTextLocal(question.name),
186
+ userQuestion: await aiEngine.EmbedTextLocal(question.userQuestion),
187
+ description: await aiEngine.EmbedTextLocal(question.description),
188
+ technicalDescription: await aiEngine.EmbedTextLocal(question.technicalDescription)
189
+ };
190
+ ```
191
+
192
+ 2. **Weighted Similarity Calculation**:
193
+ ```typescript
194
+ const weights = {
195
+ name: 0.1, // 10%
196
+ userQuestion: 0.2, // 20%
197
+ description: 0.35, // 35%
198
+ technicalDescription: 0.35 // 35%
199
+ };
200
+
201
+ for (const golden of goldenQueries) { // golden.embeddings: precomputed field vectors (assumed property name)
202
+   const nameSim = cosineSimilarity(embeddings.name, golden.embeddings.name);
203
+   const userQuestionSim = cosineSimilarity(embeddings.userQuestion, golden.embeddings.userQuestion);
204
+   const descSim = cosineSimilarity(embeddings.description, golden.embeddings.description);
205
+   const techDescSim = cosineSimilarity(embeddings.technicalDescription, golden.embeddings.technicalDescription);
206
+
207
+   const weightedScore =
208
+     (nameSim * weights.name) +
209
+     (userQuestionSim * weights.userQuestion) +
210
+     (descSim * weights.description) +
211
+     (techDescSim * weights.technicalDescription);
212
+
213
+   scores.push({ query: golden, similarity: weightedScore });
214
+ }
215
+ ```
216
+
217
+ 3. **Top-K Selection**:
218
+ - Sort by weighted similarity descending
219
+ - Return top K results (default: 5)
220
+ - Always return the top K results, even when their scores fall below a similarity threshold
221
+
222
+ **Why Weighted Similarity?**
223
+ - `description` and `technicalDescription` are more semantically rich than `name`
224
+ - `userQuestion` captures intent but may vary in wording
225
+ - Weights reflect information density of each field
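The weighted scoring and top-K selection can be written as a small self-contained sketch. The weights are the ones given above; everything else is an assumption for illustration: `FieldEmbeddings`, `weightedSimilarity`, and `topK` are hypothetical names, and each golden query is assumed to carry precomputed vectors in an `embeddings` property.

```typescript
type FieldName = 'name' | 'userQuestion' | 'description' | 'technicalDescription';
type FieldEmbeddings = Record<FieldName, number[]>;

// Weights from the document: 10% / 20% / 35% / 35%.
const WEIGHTS: Record<FieldName, number> = {
  name: 0.1,
  userQuestion: 0.2,
  description: 0.35,
  technicalDescription: 0.35
};

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Weighted sum of the per-field cosine similarities.
function weightedSimilarity(question: FieldEmbeddings, golden: FieldEmbeddings): number {
  let score = 0;
  for (const field of Object.keys(WEIGHTS) as FieldName[]) {
    score += WEIGHTS[field] * cosineSimilarity(question[field], golden[field]);
  }
  return score;
}

// Rank golden queries by weighted similarity and keep the best k.
function topK<T extends { embeddings: FieldEmbeddings }>(
  question: FieldEmbeddings,
  goldenQueries: T[],
  k: number = 5
) {
  return goldenQueries
    .map(query => ({ query, similarity: weightedSimilarity(question, query.embeddings) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```

Since the weights sum to 1 and each cosine similarity lies in [-1, 1], the weighted score stays in the same range, which keeps scores comparable across questions.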
226
+
227
+ ---
228
+
229
+ ### Phase 5: SQL Query Generation
230
+
231
+ **Purpose**: Generate Nunjucks SQL templates using AI with few-shot learning
232
+
233
+ **Implementation**: `QueryWriter` class
234
+
235
+ **AI Prompt**: `SQL Query Writer`
236
+ - Location: `metadata/prompts/templates/query-gen/sql-query-writer.template.md`
237
+ - Includes entity metadata, few-shot examples, query requirements
238
+ - Uses Nunjucks loops to format data as readable markdown
239
+
240
+ **Prompt Structure**:
241
+ ```markdown
242
+ # SQL Query Template Writer
243
+
244
+ ## Task
245
+ Generate SQL query for: "{{ userQuestion }}"
246
+
247
+ ## Available Entities
248
+ {% for entity in entityMetadata %}
249
+ ### {{ entity.entityName }}
250
+ - **Schema.View**: `[{{ entity.schemaName }}].[{{ entity.baseView }}]`
251
+ **Available Fields**:
252
+ {% for field in entity.fields %}
253
+ - `{{ field.name }}` ({{ field.type }})...
254
+ {% endfor %}
255
+ {% endfor %}
256
+
257
+ ## Example Queries (Similar to Your Task)
258
+ {% for example in fewShotExamples %}
259
+ ### Example {{ loop.index }}: {{ example.name }}
260
+ **SQL Template**:
261
+ ```sql
262
+ {{ example.sql }}
263
+ ```
264
+ **Parameters**: ...
265
+ **Output Fields**: ...
266
+ {% endfor %}
267
+
268
+ ## Requirements
269
+ 1. Use Nunjucks syntax: `{{ paramName | sqlString }}`
270
+ 2. Use SQL filters: sqlString, sqlNumber, sqlDate, sqlIn
271
+ 3. Query from views (vw*), not tables
272
+ 4. Handle NULLs with COALESCE/ISNULL
273
+ 5. Include appropriate WHERE clauses
274
+ ```
275
+
276
+ **Output**:
277
+ ```typescript
278
+ interface GeneratedQuery {
279
+ sql: string;
280
+ selectClause: QueryOutputField[];
281
+ parameters: QueryParameter[];
282
+ }
283
+ ```
284
+
285
+ **Example Output**:
286
+ ```typescript
287
+ {
288
+ sql: `
289
+ SELECT
290
+ c.Name as CustomerName,
291
+ COUNT(o.ID) as OrderCount,
292
+ COALESCE(SUM(o.Total), 0) as TotalRevenue
293
+ FROM [dbo].[vwCustomers] c
294
+ LEFT JOIN [sales].[vwOrders] o ON o.CustomerID = c.ID
295
+ WHERE c.Name LIKE {{ searchTerm | sqlString }}
296
+ AND o.OrderDate >= {{ startDate | sqlDate }}
297
+ GROUP BY c.Name
298
+ ORDER BY TotalRevenue DESC
299
+ `,
300
+ selectClause: [
301
+ { name: 'CustomerName', description: 'Name of the customer', type: 'string', optional: false },
302
+ { name: 'OrderCount', description: 'Number of orders', type: 'number', optional: false },
303
+ { name: 'TotalRevenue', description: 'Sum of order totals', type: 'number', optional: false }
304
+ ],
305
+ parameters: [
306
+ {
307
+ name: 'searchTerm',
308
+ type: 'string',
309
+ isRequired: false,
310
+ description: 'Customer name search term',
311
+ usage: ['WHERE clause: c.Name LIKE searchTerm'],
312
+ defaultValue: null,
313
+ sampleValue: '%Smith%'
314
+ },
315
+ {
316
+ name: 'startDate',
317
+ type: 'date',
318
+ isRequired: true,
319
+ description: 'Start date for order filter',
320
+ usage: ['WHERE clause: o.OrderDate >= startDate'],
321
+ defaultValue: null,
322
+ sampleValue: '2024-01-01'
323
+ }
324
+ ]
325
+ }
326
+ ```
327
+
328
+ ---
329
+
330
+ ### Phase 6: Query Testing
331
+
332
+ **Purpose**: Execute generated queries to validate they work correctly
333
+
334
+ **Implementation**: `QueryTester` class
335
+
336
+ **Process**:
337
+ 1. **Template Rendering**:
338
+ ```typescript
339
+ const paramValues: Record<string, any> = {};
340
+ for (const param of query.parameters) {
341
+ paramValues[param.name] = parseSampleValue(param.sampleValue, param.type);
342
+ }
343
+
344
+ const result = QueryParameterProcessor.processQueryTemplate(
345
+ { SQL: query.sql, Parameters: query.parameters },
346
+ paramValues
347
+ );
348
+ ```
349
+
350
+ 2. **SQL Execution**:
351
+ ```typescript
352
+ const result = await dataProvider.ExecuteSQL(renderedSQL);
353
+ ```
354
+
355
+ 3. **Result Validation**:
356
+ - Check if query returns results
357
+ - Validate result schema matches `selectClause`
358
+ - Count rows returned
359
+
360
+ **Output**:
361
+ ```typescript
362
+ interface QueryTestResult {
363
+ success: boolean;
364
+ renderedSQL?: string;
365
+ rowCount?: number;
366
+ sampleRows?: unknown[];
367
+ attempts?: number;
368
+ error?: string;
369
+ }
370
+ ```
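The result-validation step (comparing returned columns against `selectClause`) might look like the sketch below. It is an assumption-laden illustration rather than the `QueryTester` code: `checkResultSchema` and `SchemaCheck` are hypothetical names, only the first row is inspected, and an empty result set is treated as valid because there are no columns to compare.

```typescript
// Subset of the output-field shape shown in the Phase 5 example.
interface OutputField {
  name: string;
  optional: boolean;
}

interface SchemaCheck {
  valid: boolean;
  missing: string[];    // required fields absent from the result
  unexpected: string[]; // returned columns not declared in selectClause
}

function checkResultSchema(
  rows: Array<Record<string, unknown>>,
  selectClause: OutputField[]
): SchemaCheck {
  if (rows.length === 0) {
    // Nothing to compare against; this sketch does not fail empty results.
    return { valid: true, missing: [], unexpected: [] };
  }

  const returnedColumns = Object.keys(rows[0]);
  const returned = new Set(returnedColumns);
  const expected = new Set(selectClause.map(f => f.name));

  const missing = selectClause
    .filter(f => !f.optional && !returned.has(f.name))
    .map(f => f.name);
  const unexpected = returnedColumns.filter(c => !expected.has(c));

  return { valid: missing.length === 0 && unexpected.length === 0, missing, unexpected };
}
```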
371
+
372
+ ---
373
+
374
+ ### Phase 7: Error Fixing
375
+
376
+ **Purpose**: Automatically fix SQL errors using AI
377
+
378
+ **Implementation**: `QueryFixer` class
379
+
380
+ **AI Prompt**: `SQL Query Fixer`
381
+ - Receives original SQL, error message, entity metadata
382
+ - AI analyzes error and proposes fix
383
+ - Returns corrected query
384
+
385
+ **Common Error Types**:
386
+ - Syntax errors (missing commas, parentheses)
387
+ - Invalid column names
388
+ - Type mismatches
389
+ - Missing JOINs
390
+ - Incorrect GROUP BY clauses
391
+ - Subquery issues
392
+
393
+ **Process**:
394
+ ```typescript
395
+ async fixQuery(query: GeneratedQuery, errorMessage: string): Promise<GeneratedQuery> {
396
+ const promptData = {
397
+ originalSQL: query.sql,
398
+ errorMessage,
399
+ entityMetadata,
400
+ parameters: query.parameters
401
+ };
402
+
403
+ const result = await promptRunner.ExecutePrompt({
404
+ prompt: await this.getPrompt('SQL Query Fixer'),
405
+ data: promptData,
406
+ contextUser: this.contextUser
407
+ });
408
+
409
+ return result.result as GeneratedQuery;
410
+ }
411
+ ```
412
+
413
+ **Retry Loop** (in QueryTester):
414
+ ```typescript
415
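+ let attempt = 0;
416
+ while (attempt < maxAttempts) {
417
+   try {
418
+     const result = await this.executeSQLQuery(renderedSQL);
419
+     return { success: true, result };
420
+   } catch (error) {
421
+     if (attempt < maxAttempts - 1) { // request a fix only if another attempt remains
422
+       query = await this.fixQuery(query, error instanceof Error ? error.message : String(error));
423
+     }
424
+   }
425
+   attempt++; // note: renderedSQL must be regenerated from the fixed query before the next attempt
426
+ }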
427
+ ```
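A self-contained version of the test-and-fix loop is sketched below, with the SQL execution and the AI fixer injected as functions so the control flow can be read in isolation. `runWithFixes` and `AttemptResult` are hypothetical names, not part of the package; the sketch only illustrates the retry logic described above, including the explicit failure result once attempts are exhausted.

```typescript
interface AttemptResult<R> {
  success: boolean;
  result?: R;
  attempts: number;
  error?: string;
}

async function runWithFixes<Q, R>(
  initialQuery: Q,
  maxAttempts: number,
  execute: (query: Q) => Promise<R>,                  // e.g. render + run the SQL
  fix: (query: Q, errorMessage: string) => Promise<Q> // e.g. ask the AI fixer
): Promise<AttemptResult<R>> {
  let query = initialQuery;
  let lastError = '';

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await execute(query);
      return { success: true, result, attempts: attempt };
    } catch (error) {
      lastError = error instanceof Error ? error.message : String(error);
      // Only spend an AI call on a fix if there is an attempt left to use it.
      if (attempt < maxAttempts) {
        query = await fix(query, lastError);
      }
    }
  }

  // All attempts failed: report the last error instead of throwing.
  return { success: false, attempts: maxAttempts, error: lastError };
}
```

Passing the query into `execute` on every iteration means the fixed query is always the one that gets rendered and run next, which avoids retrying stale SQL.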
428
+
429
+ ---
430
+
431
+ ### Phase 8: Query Evaluation
432
+
433
+ **Purpose**: Assess if query answers the business question correctly
434
+
435
+ **Implementation**: `QueryRefiner` class (evaluation method)
436
+
437
+ **AI Prompt**: `Query Result Evaluator`
438
+ - Receives query, business question, sample results (top 10 rows)
439
+ - AI evaluates relevance, completeness, correctness
440
+ - Generates improvement suggestions
441
+
442
+ **Evaluation Criteria**:
443
+ 1. **Result Relevance**: Do results match what was asked?
444
+ 2. **Data Completeness**: Are all necessary columns present?
445
+ 3. **Correctness**: Are calculations and aggregations correct?
446
+ 4. **Usability**: Are results formatted appropriately?
447
+
448
+ **Output**:
449
+ ```typescript
450
+ interface QueryEvaluation {
451
+ answersQuestion: boolean;
452
+ confidence: number; // 0-1
453
+ reasoning: string;
454
+ suggestions: string[];
455
+ needsRefinement: boolean;
456
+ }
457
+ ```
458
+
459
+ **Example**:
460
+ ```typescript
461
+ {
462
+ answersQuestion: true,
463
+ confidence: 0.95,
464
+ reasoning: "Query correctly aggregates orders by customer and sorts by revenue descending. Results show expected data.",
465
+ suggestions: [
466
+ "Consider adding customer contact info for better usability",
467
+ "Add date range parameter to filter orders by time period"
468
+ ],
469
+ needsRefinement: false
470
+ }
471
+ ```
472
+
473
+ ---
474
+
475
+ ### Phase 9: Query Refinement
476
+
477
+ **Purpose**: Iteratively improve queries based on evaluation feedback
478
+
479
+ **Implementation**: `QueryRefiner` class
480
+
481
+ **AI Prompt**: `Query Refiner`
482
+ - Receives current query, evaluation feedback, entity metadata
483
+ - AI implements suggested improvements
484
+ - Returns refined query
485
+
486
+ **Refinement Loop**:
487
+ ```typescript
488
+ async refineQuery(
489
+ query: GeneratedQuery,
490
+ businessQuestion: BusinessQuestion,
491
+ entityMetadata: EntityMetadataForPrompt[],
492
+ maxRefinements: number
493
+ ): Promise<RefinedQuery> {
494
+ let currentQuery = query;
495
+ let refinementCount = 0;
496
+
497
+ while (refinementCount < maxRefinements) {
498
+ // Test query
499
+ const testResult = await this.tester.testQuery(currentQuery);
500
+ if (!testResult.success) {
501
+ throw new Error(`Query testing failed: ${testResult.error}`);
502
+ }
503
+
504
+ // Evaluate query
505
+ const evaluation = await this.evaluateQuery(
506
+ currentQuery,
507
+ businessQuestion,
508
+ testResult.sampleRows
509
+ );
510
+
511
+ // If evaluation passes, we're done!
512
+ if (evaluation.answersQuestion && !evaluation.needsRefinement) {
513
+ return {
514
+ query: currentQuery,
515
+ testResult,
516
+ evaluation,
517
+ refinementCount
518
+ };
519
+ }
520
+
521
+ // Refine query based on suggestions
522
+ refinementCount++;
523
+ currentQuery = await this.performRefinement(
524
+ currentQuery,
525
+ businessQuestion,
526
+ evaluation,
527
+ entityMetadata
528
+ );
529
+ }
530
+
531
+ // Reached max refinements
532
+ return { query: currentQuery, ..., reachedMaxRefinements: true };
533
+ }
534
+ ```
535
+
536
+ **Termination Conditions**:
537
+ - Query passes evaluation (`answersQuestion: true`, `needsRefinement: false`)
538
+ - Reached `maxRefinementIterations`
539
+ - Query testing fails after fixes
540
+
541
+ ---
542
+
543
+ ### Phase 10: Validation
544
+
545
+ **Purpose**: Comprehensive validation of generated queries
546
+
547
+ **Implementation**: `validate` command (CLI)
548
+
549
+ **Validation Checks**:
550
+ 1. **SQL Syntax**: Query compiles without errors
551
+ 2. **Parameter Validation**: All parameters have valid types and sample values
552
+ 3. **Output Field Validation**: selectClause matches query output
553
+ 4. **Execution Testing**: Query runs successfully against database
554
+ 5. **Result Schema Validation**: Returned columns match expected schema
555
+
556
+ **Process**:
557
+ ```typescript
558
+ async validateCommand(options: Record<string, unknown>): Promise<void> {
559
+ // Load query metadata files
560
+ const queryFiles = await loadQueryFiles(queryPath);
561
+
562
+ for (const { file, queries } of queryFiles) {
563
+ for (const queryRecord of queries) {
564
+ // Convert metadata to GeneratedQuery format
565
+ const query = convertMetadataToGeneratedQuery(queryRecord);
566
+
567
+ // Test query execution
568
+ const tester = new QueryTester(...);
569
+ const testResult = await tester.testQuery(query, 1);
570
+
571
+ if (testResult.success) {
572
+ passCount++;
573
+ } else {
574
+ failCount++;
575
+ errors.push({ file, error: testResult.error });
576
+ }
577
+ }
578
+ }
579
+
580
+ // Report results
581
+ console.log(`Passed: ${passCount}, Failed: ${failCount}`);
582
+ }
583
+ ```
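The parameter-validation check (valid types and sample values) is not shown in the process above; one way it could be approached is sketched here. This is a hypothetical helper, not the CLI's implementation: `validateParameter`, `ParamLike`, and the list of accepted type names are assumptions (the document itself only shows `string` and `date` parameters).

```typescript
// Subset of the parameter shape shown in the Phase 5 example.
interface ParamLike {
  name: string;
  type: string;
  sampleValue: string | null;
}

// Assumed set of supported parameter types.
const KNOWN_PARAM_TYPES = ['string', 'number', 'date', 'boolean', 'array'];

// Returns a list of problems; an empty list means the parameter looks valid.
function validateParameter(param: ParamLike): string[] {
  if (KNOWN_PARAM_TYPES.indexOf(param.type) === -1) {
    return [`unknown type "${param.type}"`];
  }
  if (param.sampleValue === null || param.sampleValue.trim() === '') {
    return ['missing sampleValue'];
  }

  const problems: string[] = [];
  if (param.type === 'number' && !Number.isFinite(Number(param.sampleValue))) {
    problems.push('sampleValue is not numeric');
  }
  if (param.type === 'date' && Number.isNaN(Date.parse(param.sampleValue))) {
    problems.push('sampleValue is not a parseable date');
  }
  return problems;
}
```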
584
+
585
+ ---
586
+
587
+ ### Phase 11: Metadata Export
588
+
589
+ **Purpose**: Export validated queries to MJ metadata format or database
590
+
591
+ **Implementation**:
592
+ - `MetadataExporter` - Exports to JSON files
593
+ - `QueryDatabaseWriter` - Inserts into database
594
+
595
+ **Metadata Format**:
596
+ ```typescript
597
+ interface QueryMetadataRecord {
598
+ fields: {
599
+ Name: string;
600
+ CategoryID: string;
601
+ UserQuestion: string;
602
+ Description: string;
603
+ TechnicalDescription: string;
604
+ SQL: string;
605
+ OriginalSQL: string;
606
+ UsesTemplate: boolean;
607
+ Status: string;
608
+ };
609
+ relatedEntities: {
610
+ 'Query Fields': Array<{ fields: QueryFieldRecord }>;
611
+ 'Query Params': Array<{ fields: QueryParamRecord }>;
612
+ };
613
+ }
614
+ ```
615
+
616
+ **MetadataExporter Process**:
617
+ ```typescript
618
+ async exportQueries(
619
+ validatedQueries: ValidatedQuery[],
620
+ outputDirectory: string
621
+ ): Promise<ExportResult> {
622
+ // Transform to MJ metadata format
623
+ const metadata = validatedQueries.map(q => this.toQueryMetadata(q));
624
+
625
+ // Create metadata file
626
+ const metadataFile = {
627
+ timestamp: new Date().toISOString(),
628
+ generatedBy: 'query-gen',
629
+ version: '1.0',
630
+ queries: metadata
631
+ };
632
+
633
+ // Write to file
634
+ const outputPath = path.join(outputDirectory, `queries-${Date.now()}.json`);
635
+ await fs.writeFile(outputPath, JSON.stringify(metadataFile, null, 2));
636
+
637
+ return { success: true, outputPath, queryCount: metadata.length };
638
+ }
639
+ ```
640
+
641
+ **QueryDatabaseWriter Process**:
642
+ ```typescript
643
+ async writeQueriesToDatabase(
644
+ validatedQueries: ValidatedQuery[],
645
+ contextUser: UserInfo
646
+ ): Promise<WriteResult> {
647
+ const md = new Metadata();
648
+
649
+ for (const vq of validatedQueries) {
650
+ // Create Query entity
651
+ const query = await md.GetEntityObject<QueryEntity>('Queries', contextUser);
652
+ query.NewRecord();
653
+ query.Name = generateQueryName(vq.businessQuestion);
654
+ query.SQL = vq.query.sql;
655
+ // ... set other fields
656
+ await query.Save();
657
+
658
+ // Create Query Fields
659
+ for (const field of vq.query.selectClause) {
660
+ const qf = await md.GetEntityObject<QueryFieldEntity>('Query Fields', contextUser);
661
+ qf.NewRecord();
662
+ qf.QueryID = query.ID;
663
+ qf.Name = field.name;
664
+ // ... set other fields
665
+ await qf.Save();
666
+ }
667
+
668
+ // Create Query Params
669
+ for (const param of vq.query.parameters) {
670
+ const qp = await md.GetEntityObject<QueryParamEntity>('Query Params', contextUser);
671
+ qp.NewRecord();
672
+ qp.QueryID = query.ID;
673
+ qp.Name = param.name;
674
+ // ... set other fields
675
+ await qp.Save();
676
+ }
677
+ }
678
+ }
679
+ ```
680
+
681
+ ---
682
+
683
+ ## Data Flow
684
+
685
+ ### End-to-End Flow Diagram
686
+
687
+ ```
688
+ Database Schema (SQL Server)
689
+     ↓
690
+ MemberJunction Metadata
691
+     ↓
692
+ EntityGrouper
693
+     ↓
694
+ EntityGroup[]
695
+     ↓
696
+ QuestionGenerator (AI)
697
+     ↓
698
+ BusinessQuestion[]
699
+     ↓
700
+ EmbeddingService (Vector Embeddings)
701
+     ↓
702
+ SimilaritySearch
703
+     ↓
704
+ GoldenQuery[] (few-shot examples)
705
+     ↓
706
+ QueryWriter (AI + few-shot)
707
+     ↓
708
+ GeneratedQuery
709
+     ↓
710
+ QueryTester (SQL execution)
711
+ ├─ Success → QueryRefiner
712
+ └─ Error → QueryFixer (AI) → Retry
713
+     ↓
714
+ QueryRefiner (AI evaluation & refinement)
715
+     ↓
716
+ ValidatedQuery
717
+     ↓
718
+ MetadataExporter / QueryDatabaseWriter
719
+     ↓
720
+ JSON files or Database records
721
+ ```
722
+
723
+ ### Data Structures
724
+
725
+ **EntityInfo** (from MJ Metadata):
726
+ - Name, SchemaName, BaseTable, BaseView
727
+ - Fields: EntityFieldInfo[]
728
+ - Relationships: EntityRelationshipInfo[]
729
+
730
+ **EntityGroup**:
731
+ - entities: EntityInfo[]
732
+ - relationships: RelationshipInfo[]
733
+ - primaryEntity: EntityInfo
734
+ - relationshipType: 'single' | 'parent-child' | 'many-to-many'
735
+
736
+ **BusinessQuestion**:
737
+ - userQuestion: string
738
+ - description: string
739
+ - technicalDescription: string
740
+ - complexity: 'simple' | 'medium' | 'complex'
741
+ - requiresAggregation: boolean
742
+ - requiresJoins: boolean
743
+ - entities: string[]
744
+
745
+ **GeneratedQuery**:
746
+ - sql: string (Nunjucks template)
747
+ - selectClause: QueryOutputField[]
748
+ - parameters: QueryParameter[]
749
+
750
+ **ValidatedQuery**:
751
+ - businessQuestion: BusinessQuestion
752
+ - query: GeneratedQuery
753
+ - testResult: QueryTestResult
754
+ - evaluation: QueryEvaluation
755
+ - entityGroup: EntityGroup
756
+
757
+ ---
758
+
759
+ ## AI Integration
760
+
761
+ ### AIEngine Configuration
762
+
763
+ QueryGen uses MemberJunction's AIEngine for all AI interactions:
764
+
765
+ ```typescript
766
+ // Initialize AIEngine
767
+ const aiEngine = AIEngine.Instance;
768
+ await aiEngine.Config(false, contextUser);
769
+
770
+ // Prompts are already cached by AIEngine
771
+ const prompt = aiEngine.Prompts.find(p => p.Name === 'Business Question Generator');
772
+
773
+ // Execute prompt via AIPromptRunner
774
+ const promptRunner = new AIPromptRunner();
775
+ const result = await promptRunner.ExecutePrompt({
776
+ prompt,
777
+ data: { entityGroupMetadata, ... },
778
+ contextUser
779
+ });
780
+ ```
781
+
782
+ ### Prompt Configuration
783
+
784
+ Each prompt is configured with 6 AI models in priority order:
785
+
786
+ 1. **Claude 4.5 Sonnet** (Anthropic) - Priority 1
787
+ - Best quality, highest reasoning capability
788
+ - Used for complex refinement tasks
789
+
790
+ 2. **Kimi K2** (Groq) - Priority 2
791
+ - Fast, cost-effective
792
+ - Good balance of speed and quality
793
+
794
+ 3. **Kimi K2** (Cerebras) - Priority 3
795
+ - Extremely fast inference
796
+ - Backup for Groq
797
+
798
+ 4. **Gemini 2.5 Flash** (Google) - Priority 4
799
+ - Very cost-effective
800
+ - Good for high-volume generation
801
+
802
+ 5. **GPT-OSS-120B** (Groq) - Priority 5
803
+ - Open-source model
804
+ - Fallback option
805
+
806
+ 6. **GPT 5-nano** (OpenAI) - Priority 6
807
+ - Smallest, fastest OpenAI model
808
+ - Final fallback
809
+
810
+ **Failover Strategy**:
811
+ - If model 1 fails (rate limit, error, timeout), try model 2
812
+ - Continue down the list until successful response
813
+ - If all models fail, throw error
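The failover strategy amounts to trying candidates in priority order until one succeeds. The sketch below shows that control flow in isolation; it is not how AIEngine or `AIPromptRunner` implement failover, and `ModelCandidate` and `runWithFailover` are hypothetical names introduced for illustration.

```typescript
interface ModelCandidate<T> {
  name: string;
  priority: number;       // lower number = tried first
  call: () => Promise<T>; // performs the actual model request
}

async function runWithFailover<T>(
  models: ModelCandidate<T>[]
): Promise<{ model: string; value: T }> {
  // Copy before sorting so the caller's array is left untouched.
  const ordered = models.slice().sort((a, b) => a.priority - b.priority);
  const failures: string[] = [];

  for (const candidate of ordered) {
    try {
      const value = await candidate.call();
      return { model: candidate.name, value };
    } catch (error) {
      // Rate limit, timeout, or any other error: record it and move on.
      const message = error instanceof Error ? error.message : String(error);
      failures.push(`${candidate.name}: ${message}`);
    }
  }

  // Every candidate failed; surface all collected reasons in one error.
  throw new Error(`All models failed: ${failures.join('; ')}`);
}
```

Collecting the per-model failure messages makes the final error actionable, since a shared cause (for example, an expired credential) shows up across every entry.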
814
+
815
+ ### Prompt Engineering Best Practices
816
+
817
+ 1. **Use Nunjucks Templates**:
818
+ - Format data as structured markdown, not raw JSON
819
+ - Use `{% for %}` loops to iterate over arrays
820
+ - Use `{% if %}` conditionals for optional sections
821
+ - Makes prompts much easier for LLMs to parse
822
+
823
+ 2. **Provide Context**:
824
+ - Entity descriptions and business domain
825
+ - Field names, types, and descriptions
826
+ - Relationship information (foreign keys, JOINs)
827
+
828
+ 3. **Few-Shot Examples**:
829
+ - Include 3-5 similar golden queries
830
+ - Show both SQL and parameter definitions
831
+ - Demonstrate expected output format
832
+
833
+ 4. **Clear Instructions**:
834
+ - Explicit requirements (Nunjucks syntax, SQL filters)
835
+ - Output format specification (JSON schema)
836
+ - Validation rules (use views, handle NULLs, etc.)
837
+
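To make the "structured markdown, not raw JSON" advice concrete, the sketch below builds the kind of markdown a prompt template might render for one entity. It uses plain TypeScript string building in place of a Nunjucks template so that it stays self-contained: the `if` mirrors an `{% if %}` block and the loop mirrors a `{% for %}` block. `EntityInfo`, `FieldInfo`, and `renderEntityMarkdown` are hypothetical names, not QueryGen APIs.

```typescript
// Hypothetical, simplified metadata shapes for illustration only.
interface FieldInfo {
  name: string;
  type: string;
  description?: string;
}

interface EntityInfo {
  name: string;
  description?: string;
  fields: FieldInfo[];
}

function renderEntityMarkdown(entity: EntityInfo): string {
  const lines: string[] = [`## ${entity.name}`];

  // Optional section, as an {% if %} conditional would handle it.
  if (entity.description) {
    lines.push(entity.description);
  }

  lines.push('', '### Fields');

  // One bullet per field, as a {% for %} loop would produce.
  for (const field of entity.fields) {
    const suffix = field.description ? `: ${field.description}` : '';
    lines.push(`- **${field.name}** (${field.type})${suffix}`);
  }

  return lines.join('\n');
}
```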
838
+ ---
839
+
840
+ ## Database Integration
841
+
842
+ ### MemberJunction Patterns
843
+
844
+ QueryGen follows MJ best practices:
845
+
846
+ 1. **GetEntityObject Pattern**:
847
+ ```typescript
848
+ // ✅ Correct
849
+ const query = await md.GetEntityObject<QueryEntity>('Queries', contextUser);
850
+
851
+ // ❌ Wrong
852
+ const query = new QueryEntity();
853
+ ```
854
+
855
+ 2. **Server-Side contextUser**:
856
+ ```typescript
857
+ // ✅ Correct (server-side)
858
+ const result = await rv.RunView({ EntityName: 'Queries' }, contextUser);
859
+
860
+ // ❌ Wrong (server-side)
861
+ const result = await rv.RunView({ EntityName: 'Queries' });
862
+ ```
863
+
864
+ 3. **RunView Error Handling**:
865
+ ```typescript
866
+ // ✅ Correct
867
+ const result = await rv.RunView({...});
868
+ if (result.Success) {
869
+ const data = result.Results || [];
870
+ } else {
871
+ console.error('Failed:', result.ErrorMessage);
872
+ }
873
+
874
+ // ❌ Wrong (assumes success)
875
+ const result = await rv.RunView({...});
876
+ const data = result.Results;
877
+ ```
878
+
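The `Success` check above tends to be repeated at every call site, so it can be folded into a small helper. The sketch below is one possible shape, assuming only the three result properties used in the example (`Success`, `Results`, `ErrorMessage`); `RunViewResultLike` and `unwrapRunViewResult` are hypothetical names rather than MemberJunction exports.

```typescript
// Local stand-in for the result shape used in the example above.
interface RunViewResultLike<T> {
  Success: boolean;
  Results?: T[] | null;
  ErrorMessage?: string | null;
}

// Returns the rows on success (never null) and throws on failure,
// so callers cannot accidentally read Results from a failed call.
function unwrapRunViewResult<T>(result: RunViewResultLike<T>, viewLabel: string): T[] {
  if (!result.Success) {
    const reason = result.ErrorMessage ?? 'unknown error';
    throw new Error(`RunView failed for ${viewLabel}: ${reason}`);
  }
  return result.Results ?? [];
}
```

Whether to throw or log and continue is a design choice; throwing fits steps where later work depends on the data being present.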
879
+ ### SQL Execution
880
+
881
+ QueryGen uses `DatabaseProviderBase.ExecuteSQL()` for query testing:
882
+
883
+ ```typescript
884
+ const dataProvider = Metadata.Provider.DatabaseConnection as DatabaseProviderBase;
885
+ const result = await dataProvider.ExecuteSQL(renderedSQL);
886
+ // Returns: { Results: any[], RowCount: number }
887
+ ```
888
+
889
+ ### Schema Requirements
890
+
891
+ QueryGen requires:
892
+ - `SchemaName` on all entities (e.g., 'dbo', 'sales')
893
+ - `BaseView` on all entities (e.g., 'vwCustomers', 'vwOrders')
894
+ - Foreign key metadata on EntityField records
895
+ - Sample data for query testing (optional but recommended)
896
+
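A pre-flight check for the first two requirements could look like the sketch below. It assumes entity metadata exposes `Name`, `SchemaName`, and `BaseView` as in the list above; `EntityLike` and `findSchemaProblems` are hypothetical names. Foreign key metadata is deliberately not checked here, since an entity can legitimately have no foreign keys.

```typescript
// Local stand-in for the entity metadata properties named above.
interface EntityLike {
  Name: string;
  SchemaName?: string | null;
  BaseView?: string | null;
}

// Returns one human-readable problem per missing requirement.
function findSchemaProblems(entities: EntityLike[]): string[] {
  const problems: string[] = [];
  for (const entity of entities) {
    if (!entity.SchemaName) {
      problems.push(`${entity.Name}: missing SchemaName`);
    }
    if (!entity.BaseView) {
      problems.push(`${entity.Name}: missing BaseView`);
    }
  }
  return problems;
}
```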
897
+ ---
898
+
899
+ ## Error Handling Strategy
900
+
901
+ ### Error Handling Utilities
902
+
903
+ QueryGen exposes its error handling utilities from the `@memberjunction/query-gen` package:
904
+
905
+ ```typescript
906
+ import { extractErrorMessage, requireValue, getPropertyOrDefault } from '@memberjunction/query-gen';
907
+
908
+ try {
909
+ await operation();
910
+ } catch (error: unknown) {
911
+ const errorMsg = extractErrorMessage(error, 'Operation Name');
912
+ console.error(errorMsg);
913
+ }
914
+ ```
915
+
916
+ ### Agent Run Step Pattern
917
+
918
+ All AI operations follow the agent run step pattern:
919
+
920
+ ```typescript
921
+ const step = await this.createStep('Step Name', inputData, contextUser, 'Validation');
922
+
923
+ try {
924
+ const result = await work();
925
+
926
+ step.Success = true;
927
+ step.Status = 'Completed';
928
+ step.CompletedAt = new Date();
929
+ step.OutputData = JSON.stringify(result, null, 2);
930
+ await step.Save();
931
+
932
+ return result;
933
+ } catch (error: unknown) {
934
+ step.Success = false;
935
+ step.Status = 'Failed';
936
+ step.CompletedAt = new Date();
937
+ step.ErrorMessage = extractErrorMessage(error, 'Step Name');
938
+ await step.Save();
939
+
940
+ throw error;
941
+ }
942
+ ```
943
+
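Because the success and failure branches above share the same bookkeeping, the pattern can be wrapped in a reusable helper. The following is a sketch under assumptions: `StepLike` mirrors only the properties used in the example, and `runTrackedStep` is a hypothetical name. Unlike the example, it uses the raw error message instead of `extractErrorMessage` so that it has no dependencies.

```typescript
// Local stand-in for the step record used in the example above.
interface StepLike {
  Success: boolean;
  Status: string;
  CompletedAt: Date | null;
  OutputData: string | null;
  ErrorMessage: string | null;
  Save(): Promise<boolean>;
}

async function runTrackedStep<T>(step: StepLike, work: () => Promise<T>): Promise<T> {
  try {
    const result = await work();
    step.Success = true;
    step.Status = 'Completed';
    step.OutputData = JSON.stringify(result, null, 2);
    return result;
  } catch (error: unknown) {
    step.Success = false;
    step.Status = 'Failed';
    step.ErrorMessage = error instanceof Error ? error.message : String(error);
    throw error; // re-throw so callers still see the failure
  } finally {
    // Runs on both paths: stamp completion time and persist the step.
    step.CompletedAt = new Date();
    await step.Save();
  }
}
```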
944
+ ### Error Categories
945
+
946
+ 1. **Database Errors**:
947
+ - Connection failures
948
+ - SQL syntax errors
949
+ - Schema validation errors
950
+ - Permission errors
951
+
952
+ 2. **AI Errors**:
953
+ - Prompt not found
954
+ - Model failures / rate limits
955
+ - Invalid JSON responses
956
+ - Timeout errors
957
+
958
+ 3. **Template Errors**:
959
+ - Nunjucks syntax errors
960
+ - Unknown filter errors
961
+ - Parameter type mismatches
962
+
963
+ 4. **Validation Errors**:
964
+ - Query returns no results
965
+ - Schema mismatch
966
+ - Missing required parameters
967
+
968
+ ---
969
+
970
+ ## Performance Considerations
971
+
972
+ ### Optimization Strategies
973
+
974
+ 1. **Parallel Processing**:
975
+ ```typescript
976
+ // Process multiple entity groups concurrently
977
+ const parallelGenerations = 3;
978
+ const chunks = chunkArray(entityGroups, parallelGenerations);
979
+
980
+ for (const chunk of chunks) {
981
+ await Promise.all(chunk.map(group => processGroup(group)));
982
+ }
983
+ ```
984
+
985
+ 2. **Prompt Caching**:
986
+ ```typescript
987
+ // AIEngine caches prompts after Config()
988
+ await aiEngine.Config(false, contextUser);
989
+ // Subsequent prompt lookups read from the in-memory cache
990
+ const prompt = aiEngine.Prompts.find(p => p.Name === 'Business Question Generator');
991
+ ```
992
+
993
+ 3. **Embedding Caching**:
994
+ ```typescript
995
+ // Cache golden query embeddings
996
+ const embeddedGolden = await embeddingService.embedGoldenQueries(goldenQueries);
997
+ // Reuse for all questions in session
998
+ ```
999
+
1000
+ 4. **Database Connection Pooling**:
1001
+ - MJ automatically pools connections
1002
+ - Configure pool size in `mj.config.cjs`
1003
+ - Default: max 50, min 5
1004
+
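The parallel processing snippet above calls a `chunkArray` helper that is not shown. A minimal sketch of such a helper, assuming it simply splits the array into consecutive fixed-size batches, might be:

```typescript
// Splits items into consecutive batches of at most `size` elements.
function chunkArray<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```

With this batching approach, each batch waits for its slowest member before the next batch starts, so overall throughput depends on the slowest group in every batch.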
1005
+ ### Cost Optimization
1006
+
1007
+ 1. **Use Cheaper Models**:
1008
+ - Gemini 2.5 Flash: $0.075 per 1M input tokens
1009
+ - GPT 5-nano: Low-cost OpenAI option
1010
+ - Groq/Cerebras: Very fast, low cost
1011
+
1012
+ 2. **Reduce AI Calls**:
1013
+ - Lower `maxRefinementIterations` (3 → 2)
1014
+ - Lower `maxFixingIterations` (5 → 3)
1015
+ - Lower `topSimilarQueries` (5 → 3)
1016
+
1017
+ 3. **Batch Operations**:
1018
+ - Generate queries for multiple entities at once
1019
+ - Use `parallelGenerations` for concurrent processing
1020
+
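As a configuration fragment, the reduced settings above might be expressed as follows. The option names come from the list above and the starting values are read off its arrows (3, 5, 5) and the `parallelGenerations = 3` snippet earlier; the `QueryGenTuning` interface and where these options actually live in the configuration file are assumptions.

```typescript
// Hypothetical grouping of the tuning options named above.
interface QueryGenTuning {
  maxRefinementIterations: number;
  maxFixingIterations: number;
  topSimilarQueries: number;
  parallelGenerations: number;
}

// Starting values as implied by this document.
const documentedDefaults: QueryGenTuning = {
  maxRefinementIterations: 3,
  maxFixingIterations: 5,
  topSimilarQueries: 5,
  parallelGenerations: 3,
};

// Lower-cost profile: fewer refinement and fixing rounds, fewer few-shot examples.
const lowCostProfile: QueryGenTuning = {
  ...documentedDefaults,
  maxRefinementIterations: 2,
  maxFixingIterations: 3,
  topSimilarQueries: 3,
};
```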
1021
+ ### Scalability
1022
+
1023
+ QueryGen scales to handle:
1024
+ - **Entities**: 100+ entities with good performance
1025
+ - **Entity Groups**: 1000+ groups (depends on relationship complexity)
1026
+ - **Queries**: 10-100 queries per minute (depends on AI model speed)
1027
+ - **Database Size**: No hard limit imposed by QueryGen (queries run against base views, so performance depends on the underlying tables and indexes)
1028
+
1029
+ ---
1030
+
1031
+ ## Design Decisions
1032
+
1033
+ ### Why 11 Phases?
1034
+
1035
+ Each phase has a specific purpose and can be independently tested/validated:
1036
+ 1. **Separation of Concerns**: Each phase does one thing well
1037
+ 2. **Error Isolation**: Failures don't cascade
1038
+ 3. **Flexibility**: Can skip phases (e.g., no refinement)
1039
+ 4. **Observability**: Clear progress tracking
1040
+
1041
+ ### Why Nunjucks Templates?
1042
+
1043
+ 1. **MJ Standard**: MemberJunction uses Nunjucks for SQL templates
1044
+ 2. **Powerful**: Supports loops, conditionals, filters
1045
+ 3. **SQL Filters**: Built-in `sqlString`, `sqlNumber`, `sqlDate`, `sqlIn` filters
1046
+ 4. **Type Safety**: QueryParameterProcessor validates parameters
1047
+
1048
+ ### Why Weighted Similarity?
1049
+
1050
+ Not all fields are equally important:
1051
+ - `description` and `technicalDescription` contain richer semantic information
1052
+ - `name` is often too short/generic
1053
+ - `userQuestion` captures intent but varies in wording
1054
+ - Weighted approach gives better few-shot examples
1055
+
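One way to picture the weighted approach is a per-field cosine similarity combined with field weights. The sketch below assumes each query has one embedding vector per field; the weight values are illustrative placeholders chosen only to reflect the ordering described above, not QueryGen's actual weights, and `weightedSimilarity` is a hypothetical name.

```typescript
type Vector = number[];
type FieldName = 'name' | 'userQuestion' | 'description' | 'technicalDescription';
type FieldEmbeddings = Record<FieldName, Vector>;

// Placeholder weights (assumption): richer fields count for more.
const FIELD_WEIGHTS: Record<FieldName, number> = {
  name: 0.1,
  userQuestion: 0.2,
  description: 0.35,
  technicalDescription: 0.35,
};

function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Weighted average of the per-field cosine similarities.
function weightedSimilarity(
  query: FieldEmbeddings,
  golden: FieldEmbeddings,
  weights: Record<FieldName, number> = FIELD_WEIGHTS
): number {
  let total = 0;
  let weightSum = 0;
  for (const field of Object.keys(weights) as FieldName[]) {
    total += weights[field] * cosineSimilarity(query[field], golden[field]);
    weightSum += weights[field];
  }
  return weightSum === 0 ? 0 : total / weightSum;
}
```

Dividing by the weight sum keeps the score in the same range as plain cosine similarity even when the weights do not add up to exactly 1.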
1056
+ ### Why Breadth-First Traversal?
1057
+
1058
+ 1. **Focused Groups**: Prefer directly related entities (1 hop)
1059
+ 2. **Practical Queries**: Most queries use 1-2 entities
1060
+ 3. **Reduced Complexity**: Avoid deeply nested JOINs
1061
+ 4. **Better Performance**: Simpler queries run faster
1062
+
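The hop-limited, breadth-first idea can be sketched as follows. This is a generic BFS over an adjacency list and not the `EntityGrouper` implementation; `RelationshipGraph` and `entitiesWithinHops` are hypothetical names, and the graph is assumed to list related entities in whichever directions should be traversable.

```typescript
// Entity name -> names of directly related entities.
type RelationshipGraph = Map<string, string[]>;

// Returns every entity reachable from `start` within `maxHops`,
// mapped to its hop distance. Nearer entities are discovered first.
function entitiesWithinHops(
  graph: RelationshipGraph,
  start: string,
  maxHops: number
): Map<string, number> {
  const distance = new Map<string, number>();
  distance.set(start, 0);
  const queue: string[] = [start];

  while (queue.length > 0) {
    const current = queue.shift() as string;
    const hops = distance.get(current) as number;
    if (hops >= maxHops) continue; // do not expand beyond the hop limit

    for (const neighbor of graph.get(current) ?? []) {
      if (!distance.has(neighbor)) {
        distance.set(neighbor, hops + 1);
        queue.push(neighbor);
      }
    }
  }
  return distance;
}
```

Because BFS visits entities in order of distance, capping `maxHops` at 1 or 2 naturally yields the small, directly related groups described above.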
1063
+ ### Why 6-Model Failover?
1064
+
1065
+ 1. **High Availability**: If one model is down, the others remain available
1066
+ 2. **Rate Limit Protection**: Spread load across vendors
1067
+ 3. **Cost Optimization**: Cheaper models as fallbacks
1068
+ 4. **Speed Variation**: Fast models (Cerebras) for simple tasks
1069
+
1070
+ ### Why Local Embeddings?
1071
+
1072
+ 1. **No API Calls**: Faster, no rate limits
1073
+ 2. **Privacy**: Data doesn't leave the system
1074
+ 3. **Cost**: Free (no per-token charges)
1075
+ 4. **Sufficient Quality**: `text-embedding-3-small` works well for similarity
1076
+
1077
+ ---
1078
+
1079
+ ## Future Enhancements
1080
+
1081
+ ### Planned Improvements
1082
+
1083
+ 1. **Query Optimization**:
1084
+ - Analyze execution plans
1085
+ - Suggest indexes
1086
+ - Detect expensive operations
1087
+
1088
+ 2. **Golden Query Learning**:
1089
+ - Automatically promote good queries to golden set
1090
+ - User feedback loop
1091
+ - Continuous improvement
1092
+
1093
+ 3. **Multi-Database Support**:
1094
+ - PostgreSQL support
1095
+ - MySQL support
1096
+ - Abstract SQL dialect differences
1097
+
1098
+ 4. **Advanced Filtering**:
1099
+ - Exclude specific entity combinations
1100
+ - Require specific entities in groups
1101
+ - Custom grouping strategies
1102
+
1103
+ 5. **Monitoring & Analytics**:
1104
+ - Track query generation success rates
1105
+ - Measure AI token usage
1106
+ - Performance metrics dashboard
1107
+
1108
+ ---
1109
+
1110
+ ## Conclusion
1111
+
1112
+ QueryGen's architecture is designed for:
1113
+ - **Reliability**: Comprehensive error handling and AI failover
1114
+ - **Quality**: Iterative refinement ensures queries work correctly
1115
+ - **Scalability**: Handles hundreds of entities efficiently
1116
+ - **Maintainability**: Modular design, clear separation of concerns
1117
+ - **Extensibility**: Easy to add new phases or customize existing ones
1118
+
1119
+ For API usage examples, see [API.md](./API.md).
1120
+ For user documentation, see [../README.md](../README.md).