@memberjunction/query-gen 0.0.1 → 2.126.1
- package/.turbo/turbo-build.log +4 -0
- package/CHANGELOG.md +34 -0
- package/COORDINATOR.md +768 -0
- package/IMPLEMENTATION_PLAN.md +1753 -0
- package/LLM_ENTITY_GROUPING_PLAN.md +977 -0
- package/README.md +675 -29
- package/dist/cli/commands/export.d.ts +15 -0
- package/dist/cli/commands/export.d.ts.map +1 -0
- package/dist/cli/commands/export.js +178 -0
- package/dist/cli/commands/export.js.map +1 -0
- package/dist/cli/commands/generate.d.ts +19 -0
- package/dist/cli/commands/generate.d.ts.map +1 -0
- package/dist/cli/commands/generate.js +282 -0
- package/dist/cli/commands/generate.js.map +1 -0
- package/dist/cli/commands/validate.d.ts +17 -0
- package/dist/cli/commands/validate.d.ts.map +1 -0
- package/dist/cli/commands/validate.js +193 -0
- package/dist/cli/commands/validate.js.map +1 -0
- package/dist/cli/config.d.ts +51 -0
- package/dist/cli/config.d.ts.map +1 -0
- package/dist/cli/config.js +142 -0
- package/dist/cli/config.js.map +1 -0
- package/dist/cli/index.d.ts +13 -0
- package/dist/cli/index.d.ts.map +1 -0
- package/dist/cli/index.js +57 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/core/EntityGrouper.d.ts +74 -0
- package/dist/core/EntityGrouper.d.ts.map +1 -0
- package/dist/core/EntityGrouper.js +246 -0
- package/dist/core/EntityGrouper.js.map +1 -0
- package/dist/core/MetadataExporter.d.ts +59 -0
- package/dist/core/MetadataExporter.d.ts.map +1 -0
- package/dist/core/MetadataExporter.js +151 -0
- package/dist/core/MetadataExporter.js.map +1 -0
- package/dist/core/QueryDatabaseWriter.d.ts +50 -0
- package/dist/core/QueryDatabaseWriter.d.ts.map +1 -0
- package/dist/core/QueryDatabaseWriter.js +152 -0
- package/dist/core/QueryDatabaseWriter.js.map +1 -0
- package/dist/core/QueryFixer.d.ts +48 -0
- package/dist/core/QueryFixer.d.ts.map +1 -0
- package/dist/core/QueryFixer.js +115 -0
- package/dist/core/QueryFixer.js.map +1 -0
- package/dist/core/QueryRefiner.d.ts +94 -0
- package/dist/core/QueryRefiner.d.ts.map +1 -0
- package/dist/core/QueryRefiner.js +267 -0
- package/dist/core/QueryRefiner.js.map +1 -0
- package/dist/core/QueryTester.d.ts +70 -0
- package/dist/core/QueryTester.d.ts.map +1 -0
- package/dist/core/QueryTester.js +243 -0
- package/dist/core/QueryTester.js.map +1 -0
- package/dist/core/QueryWriter.d.ts +57 -0
- package/dist/core/QueryWriter.d.ts.map +1 -0
- package/dist/core/QueryWriter.js +184 -0
- package/dist/core/QueryWriter.js.map +1 -0
- package/dist/core/QuestionGenerator.d.ts +58 -0
- package/dist/core/QuestionGenerator.d.ts.map +1 -0
- package/dist/core/QuestionGenerator.js +145 -0
- package/dist/core/QuestionGenerator.js.map +1 -0
- package/dist/data/schema.d.ts +230 -0
- package/dist/data/schema.d.ts.map +1 -0
- package/dist/data/schema.js +6 -0
- package/dist/data/schema.js.map +1 -0
- package/dist/index.d.ts +28 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +77 -0
- package/dist/index.js.map +1 -0
- package/dist/prompts/PromptNames.d.ts +32 -0
- package/dist/prompts/PromptNames.d.ts.map +1 -0
- package/dist/prompts/PromptNames.js +35 -0
- package/dist/prompts/PromptNames.js.map +1 -0
- package/dist/utils/category-builder.d.ts +28 -0
- package/dist/utils/category-builder.d.ts.map +1 -0
- package/dist/utils/category-builder.js +90 -0
- package/dist/utils/category-builder.js.map +1 -0
- package/dist/utils/entity-helpers.d.ts +49 -0
- package/dist/utils/entity-helpers.d.ts.map +1 -0
- package/dist/utils/entity-helpers.js +189 -0
- package/dist/utils/entity-helpers.js.map +1 -0
- package/dist/utils/error-handlers.d.ts +19 -0
- package/dist/utils/error-handlers.d.ts.map +1 -0
- package/dist/utils/error-handlers.js +41 -0
- package/dist/utils/error-handlers.js.map +1 -0
- package/dist/utils/graph-helpers.d.ts +51 -0
- package/dist/utils/graph-helpers.d.ts.map +1 -0
- package/dist/utils/graph-helpers.js +82 -0
- package/dist/utils/graph-helpers.js.map +1 -0
- package/dist/utils/prompt-helpers.d.ts +25 -0
- package/dist/utils/prompt-helpers.d.ts.map +1 -0
- package/dist/utils/prompt-helpers.js +66 -0
- package/dist/utils/prompt-helpers.js.map +1 -0
- package/dist/utils/query-helpers.d.ts +23 -0
- package/dist/utils/query-helpers.d.ts.map +1 -0
- package/dist/utils/query-helpers.js +34 -0
- package/dist/utils/query-helpers.js.map +1 -0
- package/dist/utils/user-helpers.d.ts +15 -0
- package/dist/utils/user-helpers.d.ts.map +1 -0
- package/dist/utils/user-helpers.js +32 -0
- package/dist/utils/user-helpers.js.map +1 -0
- package/dist/vectors/EmbeddingService.d.ts +58 -0
- package/dist/vectors/EmbeddingService.d.ts.map +1 -0
- package/dist/vectors/EmbeddingService.js +90 -0
- package/dist/vectors/EmbeddingService.js.map +1 -0
- package/dist/vectors/SimilaritySearch.d.ts +51 -0
- package/dist/vectors/SimilaritySearch.d.ts.map +1 -0
- package/dist/vectors/SimilaritySearch.js +85 -0
- package/dist/vectors/SimilaritySearch.js.map +1 -0
- package/docs/API.md +1040 -0
- package/docs/ARCHITECTURE.md +1120 -0
- package/examples/advanced-usage.ts +401 -0
- package/examples/basic-usage.ts +285 -0
- package/package.json +48 -6
- package/src/cli/commands/export.ts +173 -0
- package/src/cli/commands/generate.ts +330 -0
- package/src/cli/commands/validate.ts +185 -0
- package/src/cli/config.ts +203 -0
- package/src/cli/index.ts +63 -0
- package/src/core/EntityGrouper.ts +318 -0
- package/src/core/MetadataExporter.ts +148 -0
- package/src/core/QueryDatabaseWriter.ts +187 -0
- package/src/core/QueryFixer.ts +153 -0
- package/src/core/QueryRefiner.ts +382 -0
- package/src/core/QueryTester.ts +264 -0
- package/src/core/QueryWriter.ts +239 -0
- package/src/core/QuestionGenerator.ts +199 -0
- package/src/data/golden-queries.json +1371 -0
- package/src/data/schema.ts +252 -0
- package/src/index.ts +49 -0
- package/src/prompts/PromptNames.ts +36 -0
- package/src/utils/category-builder.ts +97 -0
- package/src/utils/entity-helpers.ts +203 -0
- package/src/utils/error-handlers.ts +41 -0
- package/src/utils/graph-helpers.ts +99 -0
- package/src/utils/prompt-helpers.ts +79 -0
- package/src/utils/query-helpers.ts +32 -0
- package/src/utils/user-helpers.ts +39 -0
- package/src/vectors/EmbeddingService.ts +109 -0
- package/src/vectors/SimilaritySearch.ts +108 -0
- package/tsconfig.json +39 -0
@@ -0,0 +1,1120 @@
# QueryGen Architecture

This document provides a comprehensive technical deep-dive into the QueryGen package architecture, design decisions, and implementation details.

## Table of Contents

- [Overview](#overview)
- [11-Phase Pipeline](#11-phase-pipeline)
- [Core Components](#core-components)
- [Data Flow](#data-flow)
- [AI Integration](#ai-integration)
- [Database Integration](#database-integration)
- [Error Handling Strategy](#error-handling-strategy)
- [Performance Considerations](#performance-considerations)
- [Design Decisions](#design-decisions)

## Overview

QueryGen is an AI-powered system for automatically generating, testing, and refining SQL query templates. It leverages MemberJunction's metadata system, AIEngine for LLM interactions, and vector embeddings for few-shot learning.

### Architecture Principles

1. **Modular Design** - Each phase is independent and testable
2. **Error Resilience** - Comprehensive error handling with AI-powered fixing
3. **Type Safety** - Explicit TypeScript types throughout
4. **MJ Integration** - Deep integration with MemberJunction patterns
5. **AI-First** - Leverages AI at every decision point

### Technology Stack

- **Language**: TypeScript 5.0+
- **AI**: MemberJunction AIEngine with 6-model failover
- **Vector Embeddings**: Local embeddings via `text-embedding-3-small`
- **Templates**: Nunjucks for SQL templates
- **Database**: SQL Server 2016+
- **CLI**: Commander.js, ora, chalk

## 11-Phase Pipeline

### Phase 1: Entity Analysis

**Purpose**: Load and filter entities from MemberJunction metadata

**Implementation**:
```typescript
const md = new Metadata();
const allEntities = md.Entities.filter(
  e => !config.excludeSchemas.includes(e.SchemaName || '')
);
```

**Key Operations**:
- Load all entities from Metadata.Provider
- Apply include/exclude filters from config
- Exclude system schemas (sys, INFORMATION_SCHEMA)
- Validate entity metadata completeness
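
The filtering operations above can be sketched as a pure function. This is a minimal sketch under stated assumptions, not the package's implementation: `EntityLike`, `FilterConfig`, `includeEntities`, and `filterEntities` are hypothetical names, and only `excludeSchemas` and `SchemaName` appear in the snippet above.

```typescript
// Hypothetical, trimmed-down stand-in for MemberJunction's EntityInfo.
interface EntityLike {
  Name: string;
  SchemaName?: string;
}

// Hypothetical config shape; only excludeSchemas appears in the original snippet.
interface FilterConfig {
  includeEntities: string[]; // an empty array means "include everything"
  excludeSchemas: string[];
}

// System schemas named in the Key Operations list.
const SYSTEM_SCHEMAS = ['sys', 'INFORMATION_SCHEMA'];

function filterEntities(entities: EntityLike[], config: FilterConfig): EntityLike[] {
  // Compare schema names case-insensitively
  const excluded = new Set(
    [...SYSTEM_SCHEMAS, ...config.excludeSchemas].map(s => s.toLowerCase())
  );
  const included = new Set(config.includeEntities);
  return entities.filter(e => {
    const schema = (e.SchemaName || '').toLowerCase();
    if (excluded.has(schema)) return false;
    return included.size === 0 || included.has(e.Name);
  });
}

const sample: EntityLike[] = [
  { Name: 'Customers', SchemaName: 'dbo' },
  { Name: 'Orders', SchemaName: 'sales' },
  { Name: 'all_objects', SchemaName: 'sys' },
];
const kept = filterEntities(sample, { includeEntities: [], excludeSchemas: ['sales'] });
console.log(kept.map(e => e.Name)); // [ 'Customers' ]
```

Keeping the filter free of database access makes this phase easy to unit test, in line with the Modular Design principle.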

**Output**: Filtered array of `EntityInfo` objects ready for grouping

---

### Phase 2: Entity Grouping

**Purpose**: Create logical groups of 1-N related entities based on foreign key relationships

**Implementation**: `EntityGrouper` class

**Algorithm**:
1. **Build Relationship Graph**:
   ```typescript
   // Map entity names to their related entities
   const graph = new Map<string, RelationshipInfo[]>();

   for (const entity of entities) {
     const relationships: RelationshipInfo[] = [];
     for (const field of entity.Fields) {
       if (isForeignKeyField(field)) {
         relationships.push({
           from: entity.Name,
           to: relatedEntityName,
           via: field.Name,
           type: 'many-to-one'
         });
       }
     }
     graph.set(entity.Name, relationships);
   }
   ```

2. **Breadth-First Traversal**:
   - Start from each entity as "primary entity"
   - Find directly related entities (1 hop away)
   - Then find entities 2 hops away, etc.
   - Generate combinations up to `maxEntitiesPerGroup`

3. **Deduplication**:
   - Sort entity IDs in each group
   - Use sorted IDs as unique key
   - Remove duplicate groups
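
The traversal and deduplication steps above can be sketched as follows. This is one possible reading of the algorithm, not the `EntityGrouper` source: `Edge`, `buildAdjacency`, and `generateGroups` are hypothetical names, entities are reduced to their names, and the sketch assumes foreign keys can be followed in both directions.

```typescript
// Hypothetical simplified edge; the real RelationshipInfo also carries `via` and `type`.
interface Edge {
  from: string;
  to: string;
}

// Build an undirected adjacency map so traversal can follow a foreign key either way.
function buildAdjacency(edges: Edge[]): Map<string, Set<string>> {
  const adj = new Map<string, Set<string>>();
  const link = (a: string, b: string): void => {
    if (!adj.has(a)) adj.set(a, new Set<string>());
    adj.get(a)!.add(b);
  };
  for (const { from, to } of edges) {
    link(from, to);
    link(to, from);
  }
  return adj;
}

// Grow connected groups outward from each primary entity, one hop at a time,
// and deduplicate groups by their sorted member names.
function generateGroups(names: string[], edges: Edge[], maxEntitiesPerGroup: number): string[][] {
  const adj = buildAdjacency(edges);
  const seen = new Set<string>();
  const groups: string[][] = [];
  const queue: string[][] = names.map(n => [n]);

  while (queue.length > 0) {
    const group = queue.shift()!;
    const key = [...group].sort().join('|');
    if (seen.has(key)) continue; // same members already reached via another path
    seen.add(key);
    groups.push(group);
    if (group.length >= maxEntitiesPerGroup) continue;

    // Extend the group with every neighbor of any current member
    for (const member of group) {
      for (const next of Array.from(adj.get(member) ?? [])) {
        if (group.indexOf(next) === -1) queue.push([...group, next]);
      }
    }
  }
  return groups;
}

const exampleGroups = generateGroups(
  ['Customers', 'Orders', 'OrderDetails'],
  [
    { from: 'Orders', to: 'Customers' },
    { from: 'OrderDetails', to: 'Orders' },
  ],
  3
);
console.log(exampleGroups);
```

Because a group is only extended through neighbors of its members, every group stays connected: `Customers` and `OrderDetails` never appear together without `Orders`.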

**Output**:
```typescript
interface EntityGroup {
  entities: EntityInfo[];
  relationships: RelationshipInfo[];
  primaryEntity: EntityInfo;
  relationshipType: 'single' | 'parent-child' | 'many-to-many';
}
```

**Example**:
```
Customers (single) → [Customers]
Customers + Orders (parent-child) → [Customers, Orders]
Customers + Orders + OrderDetails (parent-child) → [Customers, Orders, OrderDetails]
```

---

### Phase 3: Business Question Generation

**Purpose**: Generate domain-specific business questions using AI

**Implementation**: `QuestionGenerator` class

**AI Prompt**: `Business Question Generator`
- Location: `metadata/prompts/templates/query-gen/business-question-generator.template.md`
- Uses Nunjucks templates to format entity metadata as structured markdown
- Includes entity descriptions, fields, relationships

**Prompt Structure**:
```markdown
# Business Question Generator

## Entity Group Context

{% for entity in entityGroupMetadata %}
### Entity: {{ entity.entityName }}
- **Schema**: {{ entity.schemaName }}
- **View**: {{ entity.baseView }}
- **Description**: {{ entity.description }}

**Fields**:
{% for field in entity.fields %}
- `{{ field.name }}` ({{ field.type }})...
{% endfor %}

**Relationships**:
{% for rel in entity.relationships %}
- {{ rel.type }}: {{ rel.relatedEntity }} via `{{ rel.foreignKeyField }}`
{% endfor %}
{% endfor %}

## Instructions
Generate 1-2 realistic business questions...
```

**Output**:
```typescript
interface BusinessQuestion {
  userQuestion: string;
  description: string;
  technicalDescription: string;
  complexity: 'simple' | 'medium' | 'complex';
  requiresAggregation: boolean;
  requiresJoins: boolean;
  entities: string[];
}
```

---

### Phase 4: Vector Similarity Search

**Purpose**: Find similar golden queries for few-shot learning

**Implementation**:
- `EmbeddingService` - Wraps AIEngine's `EmbedTextLocal()`
- `SimilaritySearch` - Weighted cosine similarity

**Algorithm**:
1. **Embed Question Fields**:
   ```typescript
   const embeddings = {
     name: await aiEngine.EmbedTextLocal(question.name),
     userQuestion: await aiEngine.EmbedTextLocal(question.userQuestion),
     description: await aiEngine.EmbedTextLocal(question.description),
     technicalDescription: await aiEngine.EmbedTextLocal(question.technicalDescription)
   };
   ```

2. **Weighted Similarity Calculation**:
   ```typescript
   const weights = {
     name: 0.1,                  // 10%
     userQuestion: 0.2,          // 20%
     description: 0.35,          // 35%
     technicalDescription: 0.35  // 35%
   };

   // Compare embedding vectors, not raw strings. This assumes each golden query
   // carries precomputed per-field vectors in `golden.embeddings`.
   const scores: Array<{ query: GoldenQuery; similarity: number }> = [];

   for (const golden of goldenQueries) {
     const nameSim = cosineSimilarity(embeddings.name, golden.embeddings.name);
     const userQuestionSim = cosineSimilarity(embeddings.userQuestion, golden.embeddings.userQuestion);
     const descSim = cosineSimilarity(embeddings.description, golden.embeddings.description);
     const techDescSim = cosineSimilarity(embeddings.technicalDescription, golden.embeddings.technicalDescription);

     const weightedScore =
       (nameSim * weights.name) +
       (userQuestionSim * weights.userQuestion) +
       (descSim * weights.description) +
       (techDescSim * weights.technicalDescription);

     scores.push({ query: golden, similarity: weightedScore });
   }
   ```

3. **Top-K Selection**:
   - Sort by weighted similarity descending
   - Return top K results (default: 5)
   - Always return topK even if below threshold

**Why Weighted Similarity?**
- `description` and `technicalDescription` are more semantically rich than `name`
- `userQuestion` captures intent but may vary in wording
- Weights reflect information density of each field
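
For reference, the cosine similarity and top-K steps can be written out in a self-contained form. This is a minimal sketch, assuming embeddings are plain `number[]` vectors of equal length. `FieldEmbeddings`, `weightedSimilarity`, and `topK` are illustrative names rather than the `SimilaritySearch` API. The weights are the ones listed above.

```typescript
type Vector = number[];
type FieldName = 'name' | 'userQuestion' | 'description' | 'technicalDescription';
type FieldEmbeddings = Record<FieldName, Vector>;

// Weights from the document: 10% / 20% / 35% / 35%
const WEIGHTS: Record<FieldName, number> = {
  name: 0.1,
  userQuestion: 0.2,
  description: 0.35,
  technicalDescription: 0.35,
};

// cos(a, b) = (a . b) / (|a| * |b|); returns 0 when either vector has zero length
function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) throw new Error('Vector length mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Weighted sum of the per-field cosine similarities
function weightedSimilarity(q: FieldEmbeddings, g: FieldEmbeddings): number {
  return (Object.keys(WEIGHTS) as FieldName[]).reduce(
    (sum, field) => sum + WEIGHTS[field] * cosineSimilarity(q[field], g[field]),
    0
  );
}

// Score every golden query, sort descending, keep the first k (no threshold cut-off)
function topK<T extends { embeddings: FieldEmbeddings }>(
  question: FieldEmbeddings,
  golden: T[],
  k: number = 5
): Array<{ query: T; similarity: number }> {
  return golden
    .map(query => ({ query, similarity: weightedSimilarity(question, query.embeddings) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);
}
```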

---

### Phase 5: SQL Query Generation

**Purpose**: Generate Nunjucks SQL templates using AI with few-shot learning

**Implementation**: `QueryWriter` class

**AI Prompt**: `SQL Query Writer`
- Location: `metadata/prompts/templates/query-gen/sql-query-writer.template.md`
- Includes entity metadata, few-shot examples, query requirements
- Uses Nunjucks loops to format data as readable markdown

**Prompt Structure**:
````markdown
# SQL Query Template Writer

## Task
Generate SQL query for: "{{ userQuestion }}"

## Available Entities
{% for entity in entityMetadata %}
### {{ entity.entityName }}
- **Schema.View**: `[{{ entity.schemaName }}].[{{ entity.baseView }}]`
**Available Fields**:
{% for field in entity.fields %}
- `{{ field.name }}` ({{ field.type }})...
{% endfor %}
{% endfor %}

## Example Queries (Similar to Your Task)
{% for example in fewShotExamples %}
### Example {{ loop.index }}: {{ example.name }}
**SQL Template**:
```sql
{{ example.sql }}
```
**Parameters**: ...
**Output Fields**: ...
{% endfor %}

## Requirements
1. Use Nunjucks syntax: `{{ paramName | sqlString }}`
2. Use SQL filters: sqlString, sqlNumber, sqlDate, sqlIn
3. Query from views (vw*), not tables
4. Handle NULLs with COALESCE/ISNULL
5. Include appropriate WHERE clauses
````

**Output**:
```typescript
interface GeneratedQuery {
  sql: string;
  selectClause: QueryOutputField[];
  parameters: QueryParameter[];
}
```

**Example Output**:
```typescript
{
  sql: `
    SELECT
      c.Name as CustomerName,
      COUNT(o.ID) as OrderCount,
      COALESCE(SUM(o.Total), 0) as TotalRevenue
    FROM [dbo].[vwCustomers] c
    LEFT JOIN [sales].[vwOrders] o ON o.CustomerID = c.ID
    WHERE c.Name LIKE {{ searchTerm | sqlString }}
      AND o.OrderDate >= {{ startDate | sqlDate }}
    GROUP BY c.Name
    ORDER BY TotalRevenue DESC
  `,
  selectClause: [
    { name: 'CustomerName', description: 'Name of the customer', type: 'string', optional: false },
    { name: 'OrderCount', description: 'Number of orders', type: 'number', optional: false },
    { name: 'TotalRevenue', description: 'Sum of order totals', type: 'number', optional: false }
  ],
  parameters: [
    {
      name: 'searchTerm',
      type: 'string',
      isRequired: false,
      description: 'Customer name search term',
      usage: ['WHERE clause: c.Name LIKE searchTerm'],
      defaultValue: null,
      sampleValue: '%Smith%'
    },
    {
      name: 'startDate',
      type: 'date',
      isRequired: true,
      description: 'Start date for order filter',
      usage: ['WHERE clause: o.OrderDate >= startDate'],
      defaultValue: null,
      sampleValue: '2024-01-01'
    }
  ]
}
```
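
To illustrate why the prompt insists on the SQL filters, the sketch below shows how a filter such as `sqlString` might quote and escape a value before it reaches the SQL text. These are hypothetical re-implementations written for this example only. MemberJunction's real filters may behave differently, and `renderTemplate` is a regex stand-in that handles only the `{{ name | filter }}` form, not the Nunjucks engine.

```typescript
// Hypothetical filter implementations, for illustration only.
function sqlString(value: unknown): string {
  if (value === null || value === undefined) return 'NULL';
  // Double any single quote so the value cannot terminate the string literal
  return `'${String(value).replace(/'/g, "''")}'`;
}

function sqlNumber(value: unknown): string {
  const n = Number(value);
  if (!Number.isFinite(n)) throw new Error(`Not a number: ${String(value)}`);
  return String(n);
}

function sqlDate(value: string | Date): string {
  const d = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(d.getTime())) throw new Error(`Not a date: ${String(value)}`);
  return `'${d.toISOString()}'`;
}

function sqlIn(values: unknown[]): string {
  const items = values.map(v => (typeof v === 'number' ? sqlNumber(v) : sqlString(v)));
  return `(${items.join(', ')})`;
}

const FILTERS: Record<string, (value: any) => string> = { sqlString, sqlNumber, sqlDate, sqlIn };

// Minimal stand-in renderer: replaces each `{{ name | filter }}` with the filtered value.
function renderTemplate(template: string, params: Record<string, unknown>): string {
  return template.replace(
    /\{\{\s*(\w+)\s*\|\s*(\w+)\s*\}\}/g,
    (_match: string, name: string, filter: string) => {
      const fn = FILTERS[filter];
      if (!fn) throw new Error(`Unknown filter: ${filter}`);
      return fn(params[name]);
    }
  );
}

console.log(
  renderTemplate('WHERE c.Name LIKE {{ searchTerm | sqlString }}', { searchTerm: "%O'Brien%" })
); // WHERE c.Name LIKE '%O''Brien%'
```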

---

### Phase 6: Query Testing

**Purpose**: Execute generated queries to validate they work correctly

**Implementation**: `QueryTester` class

**Process**:
1. **Template Rendering**:
   ```typescript
   const paramValues: Record<string, any> = {};
   for (const param of query.parameters) {
     paramValues[param.name] = parseSampleValue(param.sampleValue, param.type);
   }

   const result = QueryParameterProcessor.processQueryTemplate(
     { SQL: query.sql, Parameters: query.parameters },
     paramValues
   );
   ```

2. **SQL Execution**:
   ```typescript
   const result = await dataProvider.ExecuteSQL(renderedSQL);
   ```

3. **Result Validation**:
   - Check if query returns results
   - Validate result schema matches `selectClause`
   - Count rows returned
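
The schema check in step 3 could look roughly like the following. This is a sketch under assumptions: `validateResultSchema` and `SchemaCheck` are hypothetical names, column names are compared case-insensitively, and an empty result set is treated as "nothing to check" rather than as a failure.

```typescript
// Shape taken from the selectClause entries shown in Phase 5.
interface QueryOutputField {
  name: string;
  description: string;
  type: string;
  optional: boolean;
}

// Hypothetical result type for this sketch.
interface SchemaCheck {
  valid: boolean;
  missing: string[];    // required fields absent from the result
  unexpected: string[]; // result columns not declared in selectClause
}

function validateResultSchema(
  rows: Array<Record<string, unknown>>,
  selectClause: QueryOutputField[]
): SchemaCheck {
  if (rows.length === 0) {
    // No row to inspect; this sketch does not treat an empty result as a schema error
    return { valid: true, missing: [], unexpected: [] };
  }
  const columns = Object.keys(rows[0]);
  const actual = new Set(columns.map(c => c.toLowerCase()));
  const expected = new Set(selectClause.map(f => f.name.toLowerCase()));

  const missing = selectClause
    .filter(f => !f.optional && !actual.has(f.name.toLowerCase()))
    .map(f => f.name);
  const unexpected = columns.filter(c => !expected.has(c.toLowerCase()));

  return { valid: missing.length === 0 && unexpected.length === 0, missing, unexpected };
}

const check = validateResultSchema(
  [{ CustomerName: 'Smith', OrderCount: 2 }],
  [
    { name: 'CustomerName', description: 'Name of the customer', type: 'string', optional: false },
    { name: 'OrderCount', description: 'Number of orders', type: 'number', optional: false },
    { name: 'TotalRevenue', description: 'Sum of order totals', type: 'number', optional: false },
  ]
);
console.log(check); // valid: false, missing: ['TotalRevenue'], unexpected: []
```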

**Output**:
```typescript
interface QueryTestResult {
  success: boolean;
  renderedSQL?: string;
  rowCount?: number;
  sampleRows?: unknown[];
  attempts?: number;
  error?: string;
}
```

---

### Phase 7: Error Fixing

**Purpose**: Automatically fix SQL errors using AI

**Implementation**: `QueryFixer` class

**AI Prompt**: `SQL Query Fixer`
- Receives original SQL, error message, entity metadata
- AI analyzes error and proposes fix
- Returns corrected query

**Common Error Types**:
- Syntax errors (missing commas, parentheses)
- Invalid column names
- Type mismatches
- Missing JOINs
- Incorrect GROUP BY clauses
- Subquery issues

**Process**:
```typescript
async fixQuery(query: GeneratedQuery, errorMessage: string): Promise<GeneratedQuery> {
  const promptData = {
    originalSQL: query.sql,
    errorMessage,
    entityMetadata,
    parameters: query.parameters
  };

  const result = await promptRunner.ExecutePrompt({
    prompt: await this.getPrompt('SQL Query Fixer'),
    data: promptData,
    contextUser: this.contextUser
  });

  return result.result as GeneratedQuery;
}
```

**Retry Loop** (in QueryTester):
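```typescript
let attempt = 0;
let lastError: string | undefined;
while (attempt < maxAttempts) {
  attempt++;
  try {
    // Re-render on every attempt so that a fixed query is what actually runs.
    // `renderQuery` is a placeholder name for the template rendering step in Phase 6.
    const renderedSQL = this.renderQuery(query);
    const result = await this.executeSQLQuery(renderedSQL);
    return { success: true, result };
  } catch (error) {
    lastError = error instanceof Error ? error.message : String(error);
    // Request an AI fix only if another attempt remains
    if (attempt < maxAttempts) {
      query = await this.fixQuery(query, lastError);
    }
  }
}
// All attempts failed
return { success: false, error: lastError };
```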

---

### Phase 8: Query Evaluation

**Purpose**: Assess if query answers the business question correctly

**Implementation**: `QueryRefiner` class (evaluation method)

**AI Prompt**: `Query Result Evaluator`
- Receives query, business question, sample results (top 10 rows)
- AI evaluates relevance, completeness, correctness
- Generates improvement suggestions

**Evaluation Criteria**:
1. **Result Relevance**: Do results match what was asked?
2. **Data Completeness**: Are all necessary columns present?
3. **Correctness**: Are calculations and aggregations correct?
4. **Usability**: Are results formatted appropriately?

**Output**:
```typescript
interface QueryEvaluation {
  answersQuestion: boolean;
  confidence: number; // 0-1
  reasoning: string;
  suggestions: string[];
  needsRefinement: boolean;
}
```

**Example**:
```typescript
{
  answersQuestion: true,
  confidence: 0.95,
  reasoning: "Query correctly aggregates orders by customer and sorts by revenue descending. Results show expected data.",
  suggestions: [
    "Consider adding customer contact info for better usability",
    "Add date range parameter to filter orders by time period"
  ],
  needsRefinement: false
}
```
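
Because the evaluation comes back from an LLM, it can help to validate its shape before acting on it. The following is a minimal sketch, assuming the model returns the evaluation as a JSON string. `parseEvaluation` is a hypothetical helper and not part of `QueryRefiner`.

```typescript
interface QueryEvaluation {
  answersQuestion: boolean;
  confidence: number; // 0-1
  reasoning: string;
  suggestions: string[];
  needsRefinement: boolean;
}

// Parse a raw JSON string and reject anything that does not match QueryEvaluation.
function parseEvaluation(raw: string): QueryEvaluation {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== 'object' || data === null) {
    throw new Error('Evaluation must be a JSON object');
  }
  const { answersQuestion, confidence, reasoning, suggestions, needsRefinement } =
    data as Record<string, unknown>;

  if (typeof answersQuestion !== 'boolean') throw new Error('answersQuestion must be a boolean');
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1) {
    throw new Error('confidence must be a number between 0 and 1');
  }
  if (typeof reasoning !== 'string') throw new Error('reasoning must be a string');
  if (!Array.isArray(suggestions) || !suggestions.every(s => typeof s === 'string')) {
    throw new Error('suggestions must be an array of strings');
  }
  if (typeof needsRefinement !== 'boolean') throw new Error('needsRefinement must be a boolean');

  return { answersQuestion, confidence, reasoning, suggestions, needsRefinement };
}
```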

---

### Phase 9: Query Refinement

**Purpose**: Iteratively improve queries based on evaluation feedback

**Implementation**: `QueryRefiner` class

**AI Prompt**: `Query Refiner`
- Receives current query, evaluation feedback, entity metadata
- AI implements suggested improvements
- Returns refined query

**Refinement Loop**:
```typescript
async refineQuery(
  query: GeneratedQuery,
  businessQuestion: BusinessQuestion,
  entityMetadata: EntityMetadataForPrompt[],
  maxRefinements: number
): Promise<RefinedQuery> {
  let currentQuery = query;
  let refinementCount = 0;

  while (refinementCount < maxRefinements) {
    // Test query
    const testResult = await this.tester.testQuery(currentQuery);
    if (!testResult.success) {
      throw new Error(`Query testing failed: ${testResult.error}`);
    }

    // Evaluate query
    const evaluation = await this.evaluateQuery(
      currentQuery,
      businessQuestion,
      testResult.sampleRows
    );

    // If evaluation passes, we're done!
    if (evaluation.answersQuestion && !evaluation.needsRefinement) {
      return {
        query: currentQuery,
        testResult,
        evaluation,
        refinementCount
      };
    }

    // Refine query based on suggestions
    refinementCount++;
    currentQuery = await this.performRefinement(
      currentQuery,
      businessQuestion,
      evaluation,
      entityMetadata
    );
  }

  // Reached max refinements
  return { query: currentQuery, ..., reachedMaxRefinements: true };
}
```

**Termination Conditions**:
- Query passes evaluation (`answersQuestion: true`, `needsRefinement: false`)
- Reached `maxRefinementIterations`
- Query testing fails after fixes

---

### Phase 10: Validation

**Purpose**: Comprehensive validation of generated queries

**Implementation**: `validate` command (CLI)

**Validation Checks**:
1. **SQL Syntax**: Query compiles without errors
2. **Parameter Validation**: All parameters have valid types and sample values
3. **Output Field Validation**: selectClause matches query output
4. **Execution Testing**: Query runs successfully against database
5. **Result Schema Validation**: Returned columns match expected schema
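
As an illustration of check 2, a parameter validator might confirm that each sample value parses as its declared type. This is a sketch, not the `validate` command's code: `validateParameter` is a hypothetical helper, and the list of known types is an assumption, since this document only shows `'string'` and `'date'` parameters.

```typescript
// Subset of the parameter shape shown in the Phase 5 example output.
interface QueryParameter {
  name: string;
  type: string;
  isRequired: boolean;
  sampleValue: string | null;
}

// Assumed set of parameter types; only 'string' and 'date' appear in this document.
const KNOWN_TYPES = ['string', 'number', 'date', 'boolean'];

// Returns a list of human-readable problems; an empty list means the parameter passed.
function validateParameter(p: QueryParameter): string[] {
  const problems: string[] = [];
  if (KNOWN_TYPES.indexOf(p.type) === -1) {
    problems.push(`${p.name}: unknown type '${p.type}'`);
    return problems;
  }
  if (p.sampleValue === null || p.sampleValue === '') {
    if (p.isRequired) problems.push(`${p.name}: required parameter has no sample value`);
    return problems;
  }
  const v = p.sampleValue;
  if (p.type === 'number' && !Number.isFinite(Number(v))) {
    problems.push(`${p.name}: sample value '${v}' is not a number`);
  } else if (p.type === 'date' && Number.isNaN(Date.parse(v))) {
    problems.push(`${p.name}: sample value '${v}' is not a date`);
  } else if (p.type === 'boolean' && v !== 'true' && v !== 'false') {
    problems.push(`${p.name}: sample value '${v}' is not a boolean`);
  }
  return problems;
}

console.log(
  validateParameter({ name: 'startDate', type: 'date', isRequired: true, sampleValue: '2024-01-01' })
); // []
```

Returning a list of problems instead of throwing lets a caller report every issue in a file at once, which suits a batch-style validation command.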

**Process**:
```typescript
async validateCommand(options: Record<string, unknown>): Promise<void> {
  // Load query metadata files
  const queryFiles = await loadQueryFiles(queryPath);

  for (const { file, queries } of queryFiles) {
    for (const queryRecord of queries) {
      // Convert metadata to GeneratedQuery format
      const query = convertMetadataToGeneratedQuery(queryRecord);

      // Test query execution
      const tester = new QueryTester(...);
      const testResult = await tester.testQuery(query, 1);

      if (testResult.success) {
        passCount++;
      } else {
        failCount++;
        errors.push({ file, error: testResult.error });
      }
    }
  }

  // Report results
  console.log(`Passed: ${passCount}, Failed: ${failCount}`);
}
```

---

### Phase 11: Metadata Export

**Purpose**: Export validated queries to MJ metadata format or database

**Implementation**:
- `MetadataExporter` - Exports to JSON files
- `QueryDatabaseWriter` - Inserts into database

**Metadata Format**:
```typescript
interface QueryMetadataRecord {
  fields: {
    Name: string;
    CategoryID: string;
    UserQuestion: string;
    Description: string;
    TechnicalDescription: string;
    SQL: string;
    OriginalSQL: string;
    UsesTemplate: boolean;
    Status: string;
  };
  relatedEntities: {
    'Query Fields': Array<{ fields: QueryFieldRecord }>;
    'Query Params': Array<{ fields: QueryParamRecord }>;
  };
}
```
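
A transformation into this record shape might look like the sketch below. It is illustrative only: the real `toQueryMetadata` is not shown in this document, the input types are trimmed-down guesses, the property names inside the child records (`Type`, `Sequence`, `IsRequired`) are hypothetical, and detecting `UsesTemplate` by looking for Nunjucks delimiters is an assumption.

```typescript
// Trimmed-down, hypothetical input shapes based on the interfaces in this document.
interface OutputField {
  name: string;
  description: string;
  type: string;
}
interface Param {
  name: string;
  description: string;
  type: string;
  isRequired: boolean;
}
interface ValidatedQueryLike {
  businessQuestion: { userQuestion: string; description: string; technicalDescription: string };
  query: { sql: string; selectClause: OutputField[]; parameters: Param[] };
}

function toQueryMetadata(vq: ValidatedQueryLike, name: string, categoryId: string, status: string) {
  const { businessQuestion: bq, query } = vq;
  return {
    fields: {
      Name: name,
      CategoryID: categoryId,
      UserQuestion: bq.userQuestion,
      Description: bq.description,
      TechnicalDescription: bq.technicalDescription,
      SQL: query.sql,
      OriginalSQL: query.sql,
      // Assumption: a query "uses a template" when it contains Nunjucks delimiters
      UsesTemplate: /\{\{|\{%/.test(query.sql),
      Status: status,
    },
    relatedEntities: {
      // Property names inside these child records are illustrative guesses
      'Query Fields': query.selectClause.map((f, i) => ({
        fields: { Name: f.name, Description: f.description, Type: f.type, Sequence: i + 1 },
      })),
      'Query Params': query.parameters.map(p => ({
        fields: { Name: p.name, Description: p.description, Type: p.type, IsRequired: p.isRequired },
      })),
    },
  };
}
```

Passing `name`, `categoryId`, and `status` in as arguments keeps the mapping free of lookups, so it can be tested without a database connection.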

**MetadataExporter Process**:
```typescript
async exportQueries(
  validatedQueries: ValidatedQuery[],
  outputDirectory: string
): Promise<ExportResult> {
  // Transform to MJ metadata format
  const metadata = validatedQueries.map(q => this.toQueryMetadata(q));

  // Create metadata file
  const metadataFile = {
    timestamp: new Date().toISOString(),
    generatedBy: 'query-gen',
    version: '1.0',
    queries: metadata
  };

  // Write to file
  const outputPath = path.join(outputDirectory, `queries-${Date.now()}.json`);
  await fs.writeFile(outputPath, JSON.stringify(metadataFile, null, 2));

  return { success: true, outputPath, queryCount: metadata.length };
}
```

**QueryDatabaseWriter Process**:
```typescript
async writeQueriesToDatabase(
  validatedQueries: ValidatedQuery[],
  contextUser: UserInfo
): Promise<WriteResult> {
  const md = new Metadata();

  for (const vq of validatedQueries) {
    // Create Query entity
    const query = await md.GetEntityObject<QueryEntity>('Queries', contextUser);
    query.NewRecord();
    query.Name = generateQueryName(vq.businessQuestion);
    query.SQL = vq.query.sql;
    // ... set other fields
    await query.Save();

    // Create Query Fields
    for (const field of vq.query.selectClause) {
      const qf = await md.GetEntityObject<QueryFieldEntity>('Query Fields', contextUser);
      qf.NewRecord();
      qf.QueryID = query.ID;
      qf.Name = field.name;
      // ... set other fields
      await qf.Save();
    }

    // Create Query Params
    for (const param of vq.query.parameters) {
      const qp = await md.GetEntityObject<QueryParamEntity>('Query Params', contextUser);
      qp.NewRecord();
      qp.QueryID = query.ID;
      qp.Name = param.name;
      // ... set other fields
      await qp.Save();
    }
  }
}
```

---

## Data Flow

### End-to-End Flow Diagram

```
Database Schema (SQL Server)
↓
MemberJunction Metadata
↓
EntityGrouper
↓
EntityGroup[]
↓
QuestionGenerator (AI)
↓
BusinessQuestion[]
↓
EmbeddingService (Vector Embeddings)
↓
SimilaritySearch
↓
GoldenQuery[] (few-shot examples)
↓
QueryWriter (AI + few-shot)
↓
GeneratedQuery
↓
QueryTester (SQL execution)
├─ Success → QueryRefiner
└─ Error → QueryFixer (AI) → Retry
↓
QueryRefiner (AI evaluation & refinement)
↓
ValidatedQuery
↓
MetadataExporter / QueryDatabaseWriter
↓
JSON files or Database records
```
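
The QueryTester / QueryFixer branch in the diagram can be sketched as a bounded retry loop. This is a minimal illustration under stated assumptions, not the package's implementation: `testWithFixes`, `TestOutcome`, `TestFn`, and `FixFn` are hypothetical names, and `maxFixingIterations` is treated here as the maximum number of test attempts.

```typescript
// Hypothetical shapes; the real QueryTester / QueryFixer signatures may differ.
interface TestOutcome {
  success: boolean;
  rowCount?: number;
  error?: string;
}

type TestFn = (sql: string) => Promise<TestOutcome>;
type FixFn = (sql: string, error: string) => Promise<string>;

// Test the SQL; on failure, ask the fixer for a corrected version and retry.
async function testWithFixes(
  sql: string,
  test: TestFn,
  fix: FixFn,
  maxFixingIterations = 5
): Promise<{ sql: string; attempts: number }> {
  let current = sql;
  for (let attempt = 1; attempt <= maxFixingIterations; attempt++) {
    const outcome = await test(current);
    if (outcome.success) {
      return { sql: current, attempts: attempt };
    }
    if (attempt === maxFixingIterations) {
      break; // no point fixing a query that will not be tested again
    }
    current = await fix(current, outcome.error ?? 'unknown error');
  }
  throw new Error(`Query still failing after ${maxFixingIterations} attempts`);
}
```

Bounding the loop keeps a query that cannot be repaired from consuming AI calls indefinitely; the caller decides whether to skip the question or surface the error.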

### Data Structures

**EntityInfo** (from MJ Metadata):
- Name, SchemaName, BaseTable, BaseView
- Fields: EntityFieldInfo[]
- Relationships: EntityRelationshipInfo[]

**EntityGroup**:
- entities: EntityInfo[]
- relationships: RelationshipInfo[]
- primaryEntity: EntityInfo
- relationshipType: 'single' | 'parent-child' | 'many-to-many'

**BusinessQuestion**:
- userQuestion: string
- description: string
- technicalDescription: string
- complexity: 'simple' | 'medium' | 'complex'
- requiresAggregation: boolean
- requiresJoins: boolean
- entities: string[]

**GeneratedQuery**:
- sql: string (Nunjucks template)
- selectClause: QueryOutputField[]
- parameters: QueryParameter[]

**ValidatedQuery**:
- businessQuestion: BusinessQuestion
- query: GeneratedQuery
- testResult: QueryTestResult
- evaluation: QueryEvaluation
- entityGroup: EntityGroup
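
The structures listed above can be written out as TypeScript interfaces. This is a sketch assuming only the fields named in the lists: `EntityInfo` is reduced to four properties, and `RelationshipInfo`, `QueryTestResult`, and `QueryEvaluation` are opaque placeholders because their fields are not given here. The sample values are invented for illustration.

```typescript
// Placeholder types: the lists above name these but do not give their fields.
type EntityInfo = { Name: string; SchemaName: string; BaseTable: string; BaseView: string };
type RelationshipInfo = unknown;
type QueryOutputField = { name: string };
type QueryParameter = { name: string };
type QueryTestResult = unknown;
type QueryEvaluation = unknown;

interface EntityGroup {
  entities: EntityInfo[];
  relationships: RelationshipInfo[];
  primaryEntity: EntityInfo;
  relationshipType: 'single' | 'parent-child' | 'many-to-many';
}

interface BusinessQuestion {
  userQuestion: string;
  description: string;
  technicalDescription: string;
  complexity: 'simple' | 'medium' | 'complex';
  requiresAggregation: boolean;
  requiresJoins: boolean;
  entities: string[];
}

interface GeneratedQuery {
  sql: string; // Nunjucks template
  selectClause: QueryOutputField[];
  parameters: QueryParameter[];
}

interface ValidatedQuery {
  businessQuestion: BusinessQuestion;
  query: GeneratedQuery;
  testResult: QueryTestResult;
  evaluation: QueryEvaluation;
  entityGroup: EntityGroup;
}

// Illustrative instance (all values are made up).
const customers: EntityInfo = {
  Name: 'Customers',
  SchemaName: 'dbo',
  BaseTable: 'Customer',
  BaseView: 'vwCustomers',
};

const example: ValidatedQuery = {
  businessQuestion: {
    userQuestion: 'Which customers are in a given region?',
    description: 'Lists customers filtered by region',
    technicalDescription: 'Filter vwCustomers on Region',
    complexity: 'simple',
    requiresAggregation: false,
    requiresJoins: false,
    entities: ['Customers'],
  },
  query: {
    sql: 'SELECT Name FROM [dbo].[vwCustomers] WHERE Region = {{ region | sqlString }}',
    selectClause: [{ name: 'Name' }],
    parameters: [{ name: 'region' }],
  },
  testResult: null,
  evaluation: null,
  entityGroup: {
    entities: [customers],
    relationships: [],
    primaryEntity: customers,
    relationshipType: 'single',
  },
};
```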

---

## AI Integration

### AIEngine Configuration

QueryGen uses MemberJunction's AIEngine for all AI interactions:

```typescript
// Initialize AIEngine
const aiEngine = AIEngine.Instance;
await aiEngine.Config(false, contextUser);

// Prompts are already cached by AIEngine
const prompt = aiEngine.Prompts.find(p => p.Name === 'Business Question Generator');

// Execute prompt via AIPromptRunner
const promptRunner = new AIPromptRunner();
const result = await promptRunner.ExecutePrompt({
  prompt,
  data: { entityGroupMetadata /* , ...other template data */ },
  contextUser
});
```

### Prompt Configuration

Each prompt is configured with 6 AI models in priority order:

1. **Claude 4.5 Sonnet** (Anthropic) - Priority 1
   - Best quality, highest reasoning capability
   - Used for complex refinement tasks

2. **Kimi K2** (Groq) - Priority 2
   - Fast, cost-effective
   - Good balance of speed and quality

3. **Kimi K2** (Cerebras) - Priority 3
   - Extremely fast inference
   - Backup for Groq

4. **Gemini 2.5 Flash** (Google) - Priority 4
   - Very cost-effective
   - Good for high-volume generation

5. **GPT-OSS-120B** (Groq) - Priority 5
   - Open-source model
   - Fallback option

6. **GPT 5-nano** (OpenAI) - Priority 6
   - Smallest, fastest OpenAI model
   - Final fallback

**Failover Strategy**:
- If model 1 fails (rate limit, error, timeout), try model 2
- Continue down the list until a model returns a successful response
- If all models fail, throw an error
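
The failover strategy can be sketched as a loop over the models in priority order. This only illustrates the behavior described above and is not how AIPromptRunner implements it; `ModelCandidate`, `Invoke`, and `runWithFailover` are hypothetical names.

```typescript
// Hypothetical model descriptor; a lower priority number is tried first.
interface ModelCandidate {
  name: string;
  priority: number;
}

type Invoke<T> = (model: ModelCandidate) => Promise<T>;

// Try each model in priority order and return the first successful result.
async function runWithFailover<T>(
  models: ModelCandidate[],
  invoke: Invoke<T>
): Promise<{ model: string; result: T }> {
  const ordered = [...models].sort((a, b) => a.priority - b.priority);
  const errors: string[] = [];

  for (const model of ordered) {
    try {
      const result = await invoke(model);
      return { model: model.name, result };
    } catch (error: unknown) {
      // Record the failure (rate limit, error, timeout) and move on.
      const message = error instanceof Error ? error.message : String(error);
      errors.push(`${model.name}: ${message}`);
    }
  }

  throw new Error(`All models failed: ${errors.join('; ')}`);
}
```

Collecting the per-model messages before throwing makes the final error show why each model was skipped.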

### Prompt Engineering Best Practices

1. **Use Nunjucks Templates**:
   - Format data as structured markdown, not raw JSON
   - Use `{% for %}` loops to iterate over arrays
   - Use `{% if %}` conditionals for optional sections
   - Makes prompts much easier for LLMs to parse

2. **Provide Context**:
   - Entity descriptions and business domain
   - Field names, types, and descriptions
   - Relationship information (foreign keys, JOINs)

3. **Few-Shot Examples**:
   - Include 3-5 similar golden queries
   - Show both SQL and parameter definitions
   - Demonstrate expected output format

4. **Clear Instructions**:
   - Explicit requirements (Nunjucks syntax, SQL filters)
   - Output format specification (JSON schema)
   - Validation rules (use views, handle NULLs, etc.)
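
To illustrate the first practice (structured markdown rather than raw JSON), here is a sketch that renders entity metadata as a markdown table. In QueryGen this formatting lives in Nunjucks prompt templates; the plain TypeScript below only shows the kind of output such a template might produce. `EntitySummary`, `FieldSummary`, and `formatEntityForPrompt` are hypothetical names.

```typescript
// Hypothetical, reduced view of entity metadata for prompt building.
interface FieldSummary {
  name: string;
  type: string;
  description?: string;
}

interface EntitySummary {
  name: string;
  description?: string;
  fields: FieldSummary[];
}

// Render one entity as a markdown heading plus a field table.
function formatEntityForPrompt(entity: EntitySummary): string {
  const lines: string[] = [`## ${entity.name}`];
  if (entity.description) {
    lines.push(entity.description); // optional section, like an {% if %} block
  }
  lines.push('', '| Field | Type | Description |', '| --- | --- | --- |');
  for (const field of entity.fields) {
    // One row per field, like a {% for %} loop over the field array
    lines.push(`| ${field.name} | ${field.type} | ${field.description ?? ''} |`);
  }
  return lines.join('\n');
}
```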

---

## Database Integration

### MemberJunction Patterns

QueryGen follows MJ best practices:

1. **GetEntityObject Pattern**:
   ```typescript
   // ✅ Correct
   const query = await md.GetEntityObject<QueryEntity>('Queries', contextUser);

   // ❌ Wrong
   const query = new QueryEntity();
   ```

2. **Server-Side contextUser**:
   ```typescript
   // ✅ Correct (server-side)
   const result = await rv.RunView({ EntityName: 'Queries' }, contextUser);

   // ❌ Wrong (server-side)
   const result = await rv.RunView({ EntityName: 'Queries' });
   ```

3. **RunView Error Handling**:
   ```typescript
   // ✅ Correct
   const result = await rv.RunView({ EntityName: 'Queries' }, contextUser);
   if (result.Success) {
     const data = result.Results || [];
   } else {
     console.error('Failed:', result.ErrorMessage);
   }

   // ❌ Wrong (assumes success)
   const result = await rv.RunView({ EntityName: 'Queries' }, contextUser);
   const data = result.Results;
   ```

### SQL Execution

QueryGen uses `DatabaseProviderBase.ExecuteSQL()` for query testing:

```typescript
const dataProvider = Metadata.Provider.DatabaseConnection as DatabaseProviderBase;
const result = await dataProvider.ExecuteSQL(renderedSQL);
// Returns: { Results: any[], RowCount: number }
```

### Schema Requirements

QueryGen requires:
- `SchemaName` on all entities (e.g., 'dbo', 'sales')
- `BaseView` on all entities (e.g., 'vwCustomers', 'vwOrders')
- Foreign key metadata on EntityField records
- Sample data for query testing (optional but recommended)
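
The first two requirements are easy to check before a generation run. The sketch below assumes a reduced entity shape (`EntityLike`) and a hypothetical helper `findSchemaProblems`; it is not part of the package and does not cover the foreign key or sample data requirements.

```typescript
// Reduced, hypothetical view of an entity's metadata.
interface EntityLike {
  Name: string;
  SchemaName?: string | null;
  BaseView?: string | null;
}

// Return one message per missing requirement; an empty array means all entities pass.
function findSchemaProblems(entities: EntityLike[]): string[] {
  const problems: string[] = [];
  for (const entity of entities) {
    if (!entity.SchemaName) {
      problems.push(`${entity.Name}: missing SchemaName`);
    }
    if (!entity.BaseView) {
      problems.push(`${entity.Name}: missing BaseView`);
    }
  }
  return problems;
}
```

Running a check like this up front reports metadata gaps in one pass instead of surfacing them later as SQL errors during query testing.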

---

## Error Handling Strategy

### Error Handling Utilities

QueryGen provides error handling utilities, exported from `@memberjunction/query-gen`:

```typescript
import { extractErrorMessage, requireValue, getPropertyOrDefault } from '@memberjunction/query-gen';

try {
  await operation();
} catch (error: unknown) {
  const errorMsg = extractErrorMessage(error, 'Operation Name');
  console.error(errorMsg);
}
```
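
For readers unfamiliar with the `catch (error: unknown)` pattern, the sketch below shows one way a helper of this kind could normalize an unknown thrown value into a message. `describeError` is a hypothetical stand-in written for illustration; it is not the implementation of `extractErrorMessage`, whose exact output format is not documented here.

```typescript
// Hypothetical helper: turn any thrown value into a readable, prefixed message.
function describeError(error: unknown, context: string): string {
  if (error instanceof Error) {
    return `${context}: ${error.message}`;
  }
  if (typeof error === 'string') {
    return `${context}: ${error}`;
  }
  let detail: string;
  try {
    // JSON.stringify returns undefined for values such as undefined or functions.
    const json = JSON.stringify(error) as string | undefined;
    detail = json ?? String(error);
  } catch {
    // Circular structures cannot be serialized; fall back to String().
    detail = String(error);
  }
  return `${context}: ${detail}`;
}
```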

### Agent Run Step Pattern

All AI operations follow the agent run step pattern:

```typescript
const step = await this.createStep('Step Name', inputData, contextUser, 'Validation');

try {
  const result = await work();

  step.Success = true;
  step.Status = 'Completed';
  step.CompletedAt = new Date();
  step.OutputData = JSON.stringify(result, null, 2);
  await step.Save();

  return result;
} catch (error: unknown) {
  step.Success = false;
  step.Status = 'Failed';
  step.CompletedAt = new Date();
  step.ErrorMessage = extractErrorMessage(error, 'Step Name');
  await step.Save();

  throw error;
}
```

### Error Categories

1. **Database Errors**:
   - Connection failures
   - SQL syntax errors
   - Schema validation errors
   - Permission errors

2. **AI Errors**:
   - Prompt not found
   - Model failures / rate limits
   - Invalid JSON responses
   - Timeout errors

3. **Template Errors**:
   - Nunjucks syntax errors
   - Unknown filter errors
   - Parameter type mismatches

4. **Validation Errors**:
   - Query returns no results
   - Schema mismatch
   - Missing required parameters

---

## Performance Considerations

### Optimization Strategies

1. **Parallel Processing**:
   ```typescript
   // Process multiple entity groups concurrently
   const parallelGenerations = 3;
   const chunks = chunkArray(entityGroups, parallelGenerations);

   for (const chunk of chunks) {
     await Promise.all(chunk.map(group => processGroup(group)));
   }
   ```

2. **Prompt Caching**:
   ```typescript
   // AIEngine caches prompts after Config()
   await aiEngine.Config(false, contextUser);
   // Subsequent prompt lookups are instant
   const prompt = aiEngine.Prompts.find(p => p.Name === 'Business Question Generator');
   ```

3. **Embedding Caching**:
   ```typescript
   // Cache golden query embeddings
   const embeddedGolden = await embeddingService.embedGoldenQueries(goldenQueries);
   // Reuse for all questions in session
   ```

4. **Database Connection Pooling**:
   - MJ automatically pools connections
   - Configure pool size in `mj.config.cjs`
   - Default: max 50, min 5
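
The parallel processing snippet above relies on a `chunkArray` helper that is not shown. A minimal sketch follows, assuming the second argument is the chunk size (so `parallelGenerations = 3` means at most three groups run at once). `processInBatches` is a hypothetical wrapper added for illustration.

```typescript
// Split an array into consecutive chunks of at most `size` items.
function chunkArray<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Run `worker` over all items, one chunk at a time; items within a chunk run concurrently.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (const chunk of chunkArray(items, batchSize)) {
    results.push(...(await Promise.all(chunk.map(worker))));
  }
  return results;
}
```

Note that `Promise.all` rejects as soon as any item in a chunk fails. If one failing entity group should not abort its neighbors, `Promise.allSettled` is the usual alternative.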

### Cost Optimization

1. **Use Cheaper Models**:
   - Gemini 2.5 Flash: $0.075 per 1M input tokens
   - GPT 5-nano: Low-cost OpenAI option
   - Groq/Cerebras: Very fast, low cost

2. **Reduce AI Calls**:
   - Lower `maxRefinementIterations` (3 → 2)
   - Lower `maxFixingIterations` (5 → 3)
   - Lower `topSimilarQueries` (5 → 3)

3. **Batch Operations**:
   - Generate queries for multiple entities at once
   - Use `parallelGenerations` for concurrent processing

### Scalability

QueryGen scales to handle:
- **Entities**: 100+ entities with good performance
- **Entity Groups**: 1000+ groups (depends on relationship complexity)
- **Queries**: 10-100 queries per minute (depends on AI model speed)
- **Database Size**: No hard limit (queries run against entity base views; test query runtime depends on the underlying tables and indexes)

---

## Design Decisions

### Why 11 Phases?

Each phase has a specific purpose and can be independently tested/validated:
1. **Separation of Concerns**: Each phase does one thing well
2. **Error Isolation**: Failures don't cascade
3. **Flexibility**: Can skip phases (e.g., no refinement)
4. **Observability**: Clear progress tracking

### Why Nunjucks Templates?

1. **MJ Standard**: MemberJunction uses Nunjucks for SQL templates
2. **Powerful**: Supports loops, conditionals, filters
3. **SQL Filters**: Built-in `sqlString`, `sqlNumber`, `sqlDate`, `sqlIn` filters
4. **Type Safety**: QueryParameterProcessor validates parameters

### Why Weighted Similarity?

Not all fields are equally important:
- `description` and `technicalDescription` contain richer semantic information
- `name` is often too short/generic
- `userQuestion` captures intent but varies in wording
- Weighted approach gives better few-shot examples
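
A weighted similarity of this kind can be sketched as a weighted average of per-field cosine similarities. The weights below are hypothetical values chosen only to reflect the ordering described above (description fields highest, `name` lowest); the actual weights used by QueryGen are not stated here, and the assumption that both sides carry one embedding per field is also illustrative.

```typescript
type FieldName = 'name' | 'userQuestion' | 'description' | 'technicalDescription';
type FieldVectors = Record<FieldName, number[]>;

// Hypothetical weights, for illustration only.
const WEIGHTS: Record<FieldName, number> = {
  name: 0.1,
  userQuestion: 0.2,
  description: 0.35,
  technicalDescription: 0.35,
};

// Standard cosine similarity; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Weighted average of the per-field similarities.
function weightedSimilarity(
  question: FieldVectors,
  golden: FieldVectors,
  weights: Record<FieldName, number> = WEIGHTS
): number {
  let total = 0;
  let weightSum = 0;
  for (const field of Object.keys(weights) as FieldName[]) {
    total += weights[field] * cosineSimilarity(question[field], golden[field]);
    weightSum += weights[field];
  }
  return weightSum === 0 ? 0 : total / weightSum;
}
```

Dividing by the weight sum keeps the score in the same range as plain cosine similarity even if the weights do not add up to 1.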

### Why Breadth-First Traversal?

1. **Focused Groups**: Prefer directly related entities (1 hop)
2. **Practical Queries**: Most queries use 1-2 entities
3. **Reduced Complexity**: Avoid deeply nested JOINs
4. **Better Performance**: Simpler queries run faster
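
The traversal idea can be sketched as a breadth-first search with a hop limit over an entity relationship graph. This is an illustration under assumptions, not EntityGrouper's code: the graph is a plain adjacency map keyed by entity name, and `relatedWithinHops` is a hypothetical function.

```typescript
// Adjacency list: entity name -> names of directly related entities.
type Graph = Map<string, string[]>;

// Return every entity reachable from `start` within `maxHops`, with its hop distance.
function relatedWithinHops(graph: Graph, start: string, maxHops: number): Map<string, number> {
  const distance = new Map<string, number>([[start, 0]]);
  const queue: string[] = [start];

  while (queue.length > 0) {
    const current = queue.shift() as string;
    const hops = distance.get(current) as number;
    if (hops >= maxHops) {
      continue; // do not expand beyond the hop limit
    }
    for (const neighbor of graph.get(current) ?? []) {
      if (!distance.has(neighbor)) {
        distance.set(neighbor, hops + 1);
        queue.push(neighbor);
      }
    }
  }
  return distance;
}
```

Because breadth-first search visits entities in order of distance, directly related entities are always found before more distant ones, which is what keeps groups focused.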

### Why 6-Model Failover?

1. **High Availability**: If one model is down, others are available
2. **Rate Limit Protection**: Spread load across vendors
3. **Cost Optimization**: Cheaper models as fallbacks
4. **Speed Variation**: Fast models (Cerebras) for simple tasks

### Why Local Embeddings?

1. **No API Calls**: Faster, no rate limits
2. **Privacy**: Data doesn't leave the system
3. **Cost**: Free (no per-token charges)
4. **Sufficient Quality**: `text-embedding-3-small` works well for similarity

---

## Future Enhancements

### Planned Improvements

1. **Query Optimization**:
   - Analyze execution plans
   - Suggest indexes
   - Detect expensive operations

2. **Golden Query Learning**:
   - Automatically promote good queries to golden set
   - User feedback loop
   - Continuous improvement

3. **Multi-Database Support**:
   - PostgreSQL support
   - MySQL support
   - Abstract SQL dialect differences

4. **Advanced Filtering**:
   - Exclude specific entity combinations
   - Require specific entities in groups
   - Custom grouping strategies

5. **Monitoring & Analytics**:
   - Track query generation success rates
   - Measure AI token usage
   - Performance metrics dashboard

---

## Conclusion

QueryGen's architecture is designed for:
- **Reliability**: Comprehensive error handling and AI failover
- **Quality**: Iterative refinement ensures queries work correctly
- **Scalability**: Handles hundreds of entities efficiently
- **Maintainability**: Modular design, clear separation of concerns
- **Extensibility**: Easy to add new phases or customize existing ones

For API usage examples, see [API.md](./API.md).
For user documentation, see [../README.md](../README.md).