npm - @memberjunction/db-auto-doc - Versions diffs - 5.14.0 → 5.15.0 - Mend

@memberjunction/db-auto-doc 5.14.0 → 5.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (58)

package/README.md +169 -29
package/bin/run.js +1 -1
package/dist/commands/analyze.d.ts +3 -0
package/dist/commands/analyze.d.ts.map +1 -1
package/dist/commands/analyze.js +33 -3
package/dist/commands/analyze.js.map +1 -1
package/dist/commands/prune.d.ts +17 -0
package/dist/commands/prune.d.ts.map +1 -0
package/dist/commands/prune.js +153 -0
package/dist/commands/prune.js.map +1 -0
package/dist/core/AnalysisEngine.d.ts +44 -0
package/dist/core/AnalysisEngine.d.ts.map +1 -1
package/dist/core/AnalysisEngine.js +427 -1
package/dist/core/AnalysisEngine.js.map +1 -1
package/dist/core/AnalysisOrchestrator.d.ts.map +1 -1
package/dist/core/AnalysisOrchestrator.js +33 -10
package/dist/core/AnalysisOrchestrator.js.map +1 -1
package/dist/discovery/FKDetector.d.ts +6 -0
package/dist/discovery/FKDetector.d.ts.map +1 -1
package/dist/discovery/FKDetector.js +101 -4
package/dist/discovery/FKDetector.js.map +1 -1
package/dist/discovery/PKDetector.d.ts +7 -0
package/dist/discovery/PKDetector.d.ts.map +1 -1
package/dist/discovery/PKDetector.js +121 -6
package/dist/discovery/PKDetector.js.map +1 -1
package/dist/drivers/MySQLDriver.d.ts.map +1 -1
package/dist/drivers/MySQLDriver.js +2 -0
package/dist/drivers/MySQLDriver.js.map +1 -1
package/dist/drivers/PostgreSQLDriver.d.ts.map +1 -1
package/dist/drivers/PostgreSQLDriver.js +2 -0
package/dist/drivers/PostgreSQLDriver.js.map +1 -1
package/dist/drivers/SQLServerDriver.d.ts.map +1 -1
package/dist/drivers/SQLServerDriver.js +2 -0
package/dist/drivers/SQLServerDriver.js.map +1 -1
package/dist/prompts/PromptEngine.d.ts +19 -0
package/dist/prompts/PromptEngine.d.ts.map +1 -1
package/dist/prompts/PromptEngine.js +91 -7
package/dist/prompts/PromptEngine.js.map +1 -1
package/dist/types/analysis.d.ts +10 -0
package/dist/types/analysis.d.ts.map +1 -1
package/dist/types/config.d.ts +47 -0
package/dist/types/config.d.ts.map +1 -1
package/dist/types/config.js.map +1 -1
package/dist/types/prompts.d.ts +26 -0
package/dist/types/prompts.d.ts.map +1 -1
package/dist/utils/config-loader.js +2 -2
package/dist/utils/config-loader.js.map +1 -1
package/dist/utils/ensureArray.d.ts +13 -0
package/dist/utils/ensureArray.d.ts.map +1 -0
package/dist/utils/ensureArray.js +39 -0
package/dist/utils/ensureArray.js.map +1 -0
package/package.json +5 -5
package/prompts/fk-evaluation.md +94 -0
package/prompts/fk-pruning-holistic.md +57 -0
package/prompts/fk-pruning-table.md +51 -0
package/prompts/pk-pruning-holistic.md +26 -0
package/prompts/pk-pruning-table.md +35 -0
package/prompts/table-analysis.md +28 -3

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "@memberjunction/db-auto-doc",
   "type": "module",
-  "version": "5.14.0",
+  "version": "5.15.0",
   "description": "AI-powered database documentation generator for SQL Server, MySQL, and PostgreSQL. Analyzes your database structure, uses AI to generate comprehensive descriptions, and saves them as metadata. Works standalone - no MemberJunction runtime required.",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",
@@ -72,10 +72,10 @@
   },
   "dependencies": {
     "@inquirer/prompts": "^8.2.0",
-    "@memberjunction/ai": "5.14.0",
-    "@memberjunction/server-bootstrap": "5.14.0",
-    "@memberjunction/core": "5.14.0",
-    "@memberjunction/global": "5.14.0",
+    "@memberjunction/ai": "5.15.0",
+    "@memberjunction/server-bootstrap": "5.15.0",
+    "@memberjunction/core": "5.15.0",
+    "@memberjunction/global": "5.15.0",
     "@oclif/core": "^3.27.0",
     "@oclif/plugin-help": "^6.2.37",
     "chalk": "^5.6.2",

package/prompts/fk-evaluation.md ADDED

@@ -0,0 +1,94 @@
+You are a database expert evaluating foreign key candidates that were identified by statistical analysis. Your job is to confirm or reject each candidate based on semantic plausibility, directionality, and database design principles.
+## Database Context
+{% if seedContext %}
+{% if seedContext.overallPurpose %}- **Purpose**: {{ seedContext.overallPurpose }}{% endif %}
+{% if seedContext.businessDomains %}- **Business Domains**: {{ seedContext.businessDomains | join(', ') }}{% endif %}
+{% if seedContext.industryContext %}- **Industry**: {{ seedContext.industryContext }}{% endif %}
+{% endif %}
+## All Tables
+{% for tbl in allTables %}
+- **{{ tbl.schema }}.{{ tbl.name }}**{% if tbl.description %}: {{ tbl.description }}{% endif %}{% if tbl.pk %} (PK: {{ tbl.pk }}){% endif %}
+{% endfor %}
+## FK Candidates to Evaluate
+The following candidates were found by statistical analysis (value overlap, naming patterns, cardinality). Each has statistical evidence but needs semantic validation.
+{% for fk in candidates %}
+{{ loop.index }}. **{{ fk.sourceSchema }}.{{ fk.sourceTable }}.{{ fk.sourceColumn }}** → **{{ fk.targetSchema }}.{{ fk.targetTable }}.{{ fk.targetColumn }}**
+   - Statistical confidence: {{ fk.confidence }}%
+   - Value overlap: {{ (fk.valueOverlap * 100) | round(1) }}%
+   - Source nulls: {{ (fk.nullPercentage * 100) | round(1) }}%
+   - Cardinality ratio (source/target distinct): {{ fk.cardinalityRatio | round(2) }}
+{% endfor %}
+## Evaluation Rules
+### Rule 1: Value overlap is the strongest signal — respect it
+High value overlap (>85%) means the source column's values are almost entirely contained within the target column's values. This is near-proof of a FK relationship. **Only reject high-overlap candidates if you have a concrete, specific reason** (e.g., clearly wrong direction, or an obviously better target exists for the same source column).
+Do NOT reject candidates just because column names don't match. Real databases frequently use alias names for FK columns:
+- `PersonID` referencing `BusinessEntityID` (alias for the same concept)
+- `ComponentID` referencing `ProductID` (a component IS a product)
+- `Owner` referencing `BusinessEntityID` (semantic alias)
+- `SizeUnitMeasureCode` referencing `UnitMeasureCode` (prefixed FK)
+- `FromCurrencyCode` / `ToCurrencyCode` referencing `CurrencyCode` (role-based aliases)
+These are all valid FKs despite name mismatches. The statistical overlap proves the relationship.
+### Rule 2: Directionality — child references parent
+FKs point FROM the child table TO the parent table. The child table contains the FK column referencing the parent's PK/unique key.
+- `OrderDetail.ProductID → Product.ProductID` is CORRECT (child → parent, many-to-one)
+- `Product.ProductID → OrderDetail.ProductID` is WRONG (parent → child, one-to-many)
+**How to determine direction**: The target table should generally have FEWER or EQUAL distinct values in the referenced column compared to the source. A cardinality ratio > 1.0 suggests correct child→parent direction.
+### Rule 3: Inheritance / specialization — prefer the most specific target
+When a source column has candidates pointing to multiple target tables (e.g., `BusinessEntityID` exists in both `BusinessEntity` and `Person`), the correct FK is usually to the **most specialized table**, not the root/base table. This is the Table-Per-Type inheritance pattern common in databases:
+- `Employee.BusinessEntityID → Person.BusinessEntityID` is CORRECT (Employee IS-A Person)
+- `Employee.BusinessEntityID → BusinessEntity.BusinessEntityID` is WRONG (too generic — Employee relates to Person specifically)
+Look at the table names and relationships to identify inheritance chains. The FK should point to the table that the source table has the most specific relationship with.
+### Rule 4: Transitive hops — reject indirect relationships
+If `A.col` and `B.col` both reference the same parent table but A and B have no direct relationship, reject `A.col → B.col`. Example:
+- `EmployeePayHistory.BusinessEntityID → PersonPhone.BusinessEntityID` — REJECT (both reference Person independently; PayHistory doesn't depend on PersonPhone)
+**Key distinction**: A transitive hop has LOW cardinality ratio (close to 1.0) and makes no business sense. A real FK has a meaningful dependency.
+### Rule 5: Semantic plausibility
+Does the relationship make business sense? Consider the table purposes. But remember: statistical evidence (high value overlap) outweighs naming concerns. If the data proves the relationship, confirm it even if the naming seems unusual.
+### Rule 6: Multiple candidates for same source column
+When a source column has multiple FK candidates, you may confirm MORE THAN ONE if they are genuinely valid (e.g., a column that references different tables in different contexts). But typically, prefer the single best target and reject the others.
+## Response Format
+Return a JSON array where each object represents your evaluation of ONE candidate. Use the same index as the input list. Only include candidates you are confirming — omit rejected ones entirely.
+```json
+[
+  {
+    "index": 1,
+    "verdict": "confirm",
+    "confidence": 0.95,
+    "reasoning": "Brief explanation"
+  },
+  {
+    "index": 3,
+    "verdict": "confirm",
+    "confidence": 0.80,
+    "reasoning": "Brief explanation"
+  }
+]
+```
+- **index**: The 1-based index from the candidate list above
+- **verdict**: Always "confirm" (omit rejected candidates entirely)
+- **confidence**: Your adjusted confidence (0-1 scale)
+- **reasoning**: Brief explanation of why this is a valid FK
+**IMPORTANT**: Err on the side of confirming when statistical evidence is strong. It is better to include a borderline FK than to miss a real one. Only reject when you are confident the relationship is wrong.
+Return ONLY valid JSON. Do not include markdown code fences or explanatory text.
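
The response format in fk-evaluation.md implies that a consumer maps each returned 1-based `index` back onto the candidate list and treats every omitted candidate as rejected. A minimal sketch of that mapping follows. The names `FKCandidate`, `ConfirmedFK`, and `applyFKVerdicts` are hypothetical and are not taken from the package; only the JSON field names come from the prompt.

```typescript
// Hypothetical shapes: field names mirror the template variables and the
// response format in fk-evaluation.md, not the package's real types.
interface FKCandidate {
  sourceTable: string;
  sourceColumn: string;
  targetTable: string;
  targetColumn: string;
}

interface ConfirmedFK extends FKCandidate {
  llmConfidence: number; // 0-1 scale, as requested in the prompt
  reasoning: string;
}

// Map a raw model response back onto the candidate list. Candidates that the
// response omits are treated as rejected, per the prompt's response format.
function applyFKVerdicts(candidates: FKCandidate[], rawResponse: string): ConfirmedFK[] {
  const parsed: unknown = JSON.parse(rawResponse);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of verdicts");
  }
  const confirmed: ConfirmedFK[] = [];
  const seen = new Set<number>();
  for (const item of parsed) {
    if (item === null || typeof item !== "object") continue;
    const index: unknown = item.index;
    // Skip malformed, out-of-range, or duplicate indices instead of failing.
    if (typeof index !== "number" || !Number.isInteger(index)) continue;
    if (index < 1 || index > candidates.length || seen.has(index)) continue;
    if (item.verdict !== "confirm") continue;
    const candidate = candidates[index - 1];
    if (!candidate) continue;
    seen.add(index);
    confirmed.push({
      ...candidate,
      llmConfidence: typeof item.confidence === "number" ? item.confidence : 0,
      reasoning: typeof item.reasoning === "string" ? item.reasoning : "",
    });
  }
  return confirmed;
}
```

Skipping malformed entries rather than throwing is a design choice made for this sketch: one bad element does not discard the rest of the batch.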

package/prompts/fk-pruning-holistic.md ADDED

@@ -0,0 +1,57 @@
+You are a database expert making final decisions on proposed FK removals. A per-table analysis has proposed removing certain foreign keys. Your job is to review ALL proposed removals holistically and make final keep/remove decisions.
+## Database Context
+{% if seedContext %}
+{% if seedContext.overallPurpose %}- **Purpose**: {{ seedContext.overallPurpose }}{% endif %}
+{% if seedContext.businessDomains %}- **Business Domains**: {{ seedContext.businessDomains | join(', ') }}{% endif %}
+{% if seedContext.industryContext %}- **Industry**: {{ seedContext.industryContext }}{% endif %}
+{% endif %}
+## All Database Tables
+{% for tbl in allTables %}
+- **{{ tbl.schema }}.{{ tbl.name }}**{% if tbl.pk %} (PK: {{ tbl.pk }}){% endif %}{% if tbl.description %}: {{ tbl.description }}{% endif %}
+{% endfor %}
+## Proposed FK Removals
+The per-table analysis proposed removing these FKs. Review each one and decide whether to confirm the removal or keep the FK.
+{% for proposal in proposals %}
+{{ loop.index }}. **{{ proposal.sourceSchema }}.{{ proposal.sourceTable }}.{{ proposal.sourceColumn }}** → **{{ proposal.targetSchema }}.{{ proposal.targetTable }}.{{ proposal.targetColumn }}** (confidence: {{ proposal.confidence }}%)
+   - **Removal reason**: {{ proposal.reasoning }}
+{% endfor %}
+## Review Guidelines
+Consider the FULL relationship graph when making decisions:
+- If removing an FK would leave a table with NO outgoing relationships, reconsider — most tables have at least one FK
+- If the per-table pass proposed removing an FK because a "better" target exists, verify that the better target FK actually exists in the confirmed set
+- If multiple tables have the same column pointing to the same target and the per-table pass wants to remove only some, consider consistency
+- Reverse-direction FKs (parent→child) should almost always be removed
+- Transitive hops (A→B when both reference C independently) should almost always be removed
+## Response Format
+Return a JSON array with your final decision for EACH proposed removal:
+```json
+[
+  {
+    "index": 1,
+    "action": "remove",
+    "reasoning": "Confirmed — reverse direction FK, Department is the parent"
+  },
+  {
+    "index": 3,
+    "action": "keep",
+    "reasoning": "On reflection, this is a valid FK — the per-table analysis missed that these tables are directly related"
+  }
+]
+```
+- **index**: The 1-based index from the proposed removals list above
+- **action**: `"remove"` to confirm removal, `"keep"` to override and keep the FK
+- **reasoning**: Brief explanation
+**Every proposed removal must have a decision.** Do not omit any.
+Return ONLY valid JSON. Do not include markdown code fences or explanatory text.
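
fk-pruning-holistic.md requires a decision for every proposed removal, so a consumer presumably needs to detect proposals that the response skipped. The sketch below shows one way to do that. `HolisticResult` and `resolveRemovals` are hypothetical names, and leaving undecided proposals in place (not removing them) is an assumption of this example, not documented package behavior.

```typescript
// Hypothetical decision shape mirroring the response format in
// fk-pruning-holistic.md.
type PruneAction = "remove" | "keep";

interface HolisticResult {
  remove: number[];    // 1-based proposal indices confirmed for removal
  keep: number[];      // 1-based proposal indices the model chose to keep
  undecided: number[]; // proposals the response skipped
}

// Assumption: a proposal without a valid decision is reported as undecided
// and is NOT removed, since wrongly dropping a real FK is the costlier error.
function resolveRemovals(proposalCount: number, rawResponse: string): HolisticResult {
  const parsed: unknown = JSON.parse(rawResponse);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of decisions");
  }
  const decisions = new Map<number, PruneAction>();
  for (const item of parsed) {
    if (item === null || typeof item !== "object") continue;
    const index: unknown = item.index;
    if (typeof index !== "number" || !Number.isInteger(index)) continue;
    if (index < 1 || index > proposalCount) continue;
    if (item.action === "remove" || item.action === "keep") {
      decisions.set(index, item.action); // a later duplicate overrides an earlier one
    }
  }
  const result: HolisticResult = { remove: [], keep: [], undecided: [] };
  for (let i = 1; i <= proposalCount; i++) {
    const action = decisions.get(i);
    if (action === "remove") result.remove.push(i);
    else if (action === "keep") result.keep.push(i);
    else result.undecided.push(i);
  }
  return result;
}
```

A caller could retry the prompt when `undecided` is non-empty, or log those indices and keep the FKs.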

package/prompts/fk-pruning-table.md ADDED

@@ -0,0 +1,51 @@
+You are a database expert reviewing foreign key relationships for a single table. Your job is to identify ONLY the clearly incorrect FKs that should be removed.
+## Table: {{ sourceSchema }}.{{ sourceTable }}
+{% if tableDescription %}**Description**: {{ tableDescription }}{% endif %}
+## All Database Tables (for reference)
+{% for tbl in allTables %}
+- **{{ tbl.schema }}.{{ tbl.name }}**{% if tbl.pk %} (PK: {{ tbl.pk }}){% endif %}{% if tbl.description %}: {{ tbl.description }}{% endif %}
+{% endfor %}
+## Foreign Keys to Review
+These FKs were identified by statistical analysis and/or LLM analysis. Some are correct, some may be false positives.
+**NOTE**: FKs marked as **[LOCKED]** have very high confidence and CANNOT be removed. Only evaluate the unlocked ones.
+{% for fk in candidates %}
+{{ loop.index }}. {% if fk.locked %}**[LOCKED]** {% endif %}**{{ fk.sourceColumn }}** → {{ fk.targetSchema }}.{{ fk.targetTable }}.{{ fk.targetColumn }} (confidence: {{ fk.confidence }}%)
+{% endfor %}
+## What to look for when proposing removals:
+1. **Reverse direction**: FK goes from parent→child instead of child→parent. Example: `Department.DepartmentID → EmployeeDepartmentHistory.DepartmentID` is backwards — the history table should reference Department, not the other way around.
+2. **Transitive/indirect relationships**: Two tables both reference a common parent but aren't directly related. Example: `EmployeePayHistory.BusinessEntityID → PersonPhone.BusinessEntityID` — both reference Person, but PayHistory doesn't depend on PersonPhone.
+3. **Wrong target when better target exists**: If a column points to a generic table but a more specific table is the correct target. Example: `Employee.BusinessEntityID → BusinessEntity.BusinessEntityID` when `Employee.BusinessEntityID → Person.BusinessEntityID` is the correct FK (Person is more specific).
+4. **Column name mismatch creating false match**: Same data type and overlapping values but no real relationship. Example: `OrderQty → OnOrderQty` — both are integers with overlapping ranges but aren't referential.
+5. **Sibling fan-out**: When a source column has multiple FK targets with the same column name, usually only ONE is the correct FK (to the parent/lookup table). The others are sibling tables that independently reference the same parent. Look for the pattern: `A.TerritoryID → SalesTerritory.TerritoryID` (correct — SalesTerritory is the lookup) vs `A.TerritoryID → SalesTerritoryHistory.TerritoryID` (wrong — History is a sibling, not a parent). The correct target is typically:
+   - The table whose PK matches the FK column
+   - The shorter/simpler table name (lookup/master vs history/detail)
+   - The table with fewer columns (lookup tables are small)
+## Response Format
+Return a JSON array of FKs you propose to REMOVE. Only include FKs you are confident are wrong. Do NOT include locked FKs. If all unlocked FKs look correct, return an empty array `[]`.
+```json
+[
+  {
+    "index": 2,
+    "action": "remove",
+    "reasoning": "Reverse direction — Department is the parent table, not the child"
+  }
+]
+```
+**Be moderately aggressive** — remove FKs that follow the sibling/reverse/transitive patterns described above. The locked FKs protect the high-confidence correct relationships, so your job is to clean up the lower-confidence noise.
+Return ONLY valid JSON. Do not include markdown code fences or explanatory text.
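
The **[LOCKED]** marker in fk-pruning-table.md suggests two steps around the prompt: flag high-confidence FKs before rendering, and ignore any removal the model proposes for a locked entry afterwards. The sketch below illustrates both. `TableFK`, `markLocked`, `applyTableRemovals`, and the threshold of 95 are all hypothetical; the prompt only says locked FKs have "very high confidence" and does not state a cutoff.

```typescript
// Hypothetical per-table FK record; `confidence` uses the 0-100 scale that
// the template renders with a percent sign.
interface TableFK {
  sourceColumn: string;
  target: string; // e.g. "Sales.SalesTerritory.TerritoryID"
  confidence: number;
  locked?: boolean;
}

// Assumed cutoff for illustration only; the diff does not reveal the real one.
const LOCK_THRESHOLD = 95;

// Flag FKs at or above the threshold so the template can print [LOCKED].
function markLocked(fks: TableFK[], threshold: number = LOCK_THRESHOLD): TableFK[] {
  return fks.map((fk) => ({ ...fk, locked: fk.confidence >= threshold }));
}

// Drop the FKs at the proposed 1-based indices. Proposals that point at a
// locked FK or at an index outside the list are silently ignored.
function applyTableRemovals(fks: TableFK[], proposedIndices: number[]): TableFK[] {
  const toRemove = new Set<number>(
    proposedIndices.filter((i) => {
      const fk = fks[i - 1];
      return fk !== undefined && !fk.locked;
    }),
  );
  return fks.filter((_, position) => !toRemove.has(position + 1));
}
```

Enforcing the lock in code as well as in the prompt means a model that ignores the instruction still cannot remove a locked FK.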

package/prompts/pk-pruning-holistic.md ADDED

@@ -0,0 +1,26 @@
+You are performing a final review of proposed primary key removals across a database.
+## Proposed PK Removals:
+{% for p in proposals %}
+{{ loop.index }}. {{ p.sourceSchema }}.{{ p.sourceTable }}: columns [{{ p.columns | join(", ") }}] (confidence: {{ p.confidence }}%)
+   Reasoning: {{ p.reasoning }}
+{% endfor %}
+## All Database Tables:
+{% for tbl in allTables %}
+- {{ tbl.schema }}.{{ tbl.name }}{% if tbl.pk %} (PK: {{ tbl.pk }}){% endif %}{% if tbl.description %} — {{ tbl.description }}{% endif %}
+{% endfor %}
+## Your Task:
+Review each proposed removal. Consider:
+1. Would removing this PK leave the table without any primary key?
+2. Is this PK actually correct and should not be removed?
+3. Are there any cross-table consistency issues?
+For each proposal, confirm or reject the removal:
+[
+  { "index": 1, "action": "remove", "reasoning": "Confirmed: not the real PK" },
+  { "index": 2, "action": "keep", "reasoning": "Actually correct, do not remove" }
+]
+Return ONLY valid JSON. Do not include markdown code fences or explanatory text.

package/prompts/pk-pruning-table.md ADDED

@@ -0,0 +1,35 @@
+You are evaluating primary key candidates for a database table.
+## Table: {{ sourceSchema }}.{{ sourceTable }}
+{{ tableDescription }}
+## PK Candidates (evaluate these):
+{% for pk in candidates %}
+{{ loop.index }}. Columns: {{ pk.columns | join(", ") }} (confidence: {{ pk.confidence }}%{% if pk.locked %}, LOCKED - do not modify{% endif %})
+{% endfor %}
+## All Database Tables (for context):
+{% for tbl in allTables %}
+- {{ tbl.schema }}.{{ tbl.name }}{% if tbl.pk %} (PK: {{ tbl.pk }}){% endif %}
+{% endfor %}
+## Your Task:
+Evaluate each UNLOCKED PK candidate. A valid primary key must:
+1. Uniquely identify every row in the table
+2. Be the most natural identifier for the entity (prefer table-specific IDs over generic ones)
+3. For junction/bridge tables, be the combination of the foreign key columns
+4. Only ONE primary key should exist per table
+For each candidate, respond with:
+- `"action": "keep"` or `"action": "remove"`
+- `"reasoning": "why this is/isnt the correct PK"`
+If multiple candidates exist for a table, only one should be kept.
+Return a JSON array:
+[
+  { "index": 1, "action": "keep", "reasoning": "..." },
+  { "index": 2, "action": "remove", "reasoning": "..." }
+]
+Return ONLY valid JSON. Do not include markdown code fences or explanatory text.
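
pk-pruning-table.md states that only one primary key should remain per table and that locked candidates must not be modified. A minimal sketch of enforcing both constraints on the model's decisions is shown here. `PKCandidate`, `PKDecision`, and `selectPrimaryKey` are hypothetical names, and the tie-breaking rule (highest confidence wins, first candidate on ties) is an assumption of this example.

```typescript
// Hypothetical shapes mirroring the template variables and response format
// in pk-pruning-table.md.
interface PKCandidate {
  columns: string[];
  confidence: number; // 0-100, as rendered in the prompt
  locked?: boolean;
}

interface PKDecision {
  index: number; // 1-based position in the candidate list
  action: "keep" | "remove";
}

// Apply keep/remove decisions and return at most one primary key.
// Assumptions: "remove" is ignored for locked candidates; if several
// candidates survive, the highest confidence wins; if none survive, the
// function returns null and the caller decides what to do next.
function selectPrimaryKey(candidates: PKCandidate[], decisions: PKDecision[]): PKCandidate | null {
  const removed = new Set<number>(
    decisions.filter((d) => d.action === "remove").map((d) => d.index),
  );
  const survivors = candidates.filter(
    (pk, position) => pk.locked === true || !removed.has(position + 1),
  );
  if (survivors.length === 0) return null;
  return survivors.reduce((best, pk) => (pk.confidence > best.confidence ? pk : best));
}
```

Returning `null` when every candidate is removed makes the "table left without a primary key" case explicit, which is the first question the holistic PK prompt asks the reviewer to consider.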

package/prompts/table-analysis.md CHANGED

@@ -83,6 +83,17 @@ The database owner has provided the following authoritative documentation. Your
 {% if seedContext.customInstructions %}- **Special Instructions**: {{ seedContext.customInstructions }}{% endif %}
 {% endif %}
+{% if fkCandidateStats and fkCandidateStats.length > 0 %}
+## FK Evidence from Statistical Analysis
+The following columns in this table were identified as potential foreign keys by statistical analysis. Use this evidence to inform (but not limit) your FK assessment — you may identify additional FKs not listed here.
+{% for fk in fkCandidateStats %}
+- **{{ fk.sourceColumn }}** → {{ fk.targetSchema }}.{{ fk.targetTable }}.{{ fk.targetColumn }} (value overlap: {{ (fk.valueOverlap * 100) | round(1) }}%, cardinality ratio: {{ fk.cardinalityRatio | round(2) }}, confidence: {{ fk.confidence }}%)
+{% endfor %}
+*Value overlap = % of source values that exist in the target column. 100% = strong FK evidence. Cardinality ratio = source distinct / target distinct — values > 1 suggest child→parent direction.*
+{% endif %}
 {% if allTables %}
 ## All Database Tables
 **IMPORTANT**: When referring to foreign key relationships, you MUST use one of these exact table names:
@@ -111,6 +122,11 @@ Based on the evidence above, generate a JSON response with this exact structure:
       "reasoning": "Brief explanation of the evidence"
     }
   ],
+  "primaryKey": {
+    "columns": ["CustomerID"],
+    "confidence": 0.95,
+    "reasoning": "Single auto-increment column with 100% uniqueness, named after the table"
+  },
   "foreignKeys": [
     {
       "columnName": "prd_id",
@@ -136,14 +152,23 @@ Based on the evidence above, generate a JSON response with this exact structure:
 2. **Reasoning**: Reference specific evidence (column names, FK relationships, sample values, cardinality patterns)
 3. **Confidence**: 0-1 scale. Be conservative. Use < 0.7 if ambiguous.
 4. **Column Descriptions**: Every column should be described. Explain its role and meaning.
-5. **Foreign Keys**: **CRITICAL** - Use structured format for ALL foreign key relationships:
+5. **Primary Key**: Identify the column(s) that most likely form this table's primary key.
+   - Look for columns with 100% uniqueness, zero nulls, and names like `ID`, `TableNameID`, or `Code`
+   - For junction/bridge tables (e.g., `ProductModelIllustration`), the PK is likely a composite of the FK columns
+   - If the table inherits an ID from a parent (e.g., `Employee` using `BusinessEntityID` from `BusinessEntity`), that inherited column IS the PK
+   - Use `"columns": ["Col1", "Col2"]` for composite keys
+   - Confidence should reflect how certain you are (0-1 scale)
+   - If a PK is already marked in the column list above, you may confirm it or propose a different one
+6. **Foreign Keys**: **CRITICAL** - Use structured format for ALL foreign key relationships:
    - Include EVERY column that references another table
    - Use EXACT schema and table names from the "All Database Tables" list above
    - Specify confidence (0-1 scale) based on evidence strength
    - Example: If `prd_id` exists, add: `{"columnName": "prd_id", "referencesSchema": "inv", "referencesTable": "prd", "referencesColumn": "prd_id", "confidence": 0.95}`
    - **Leave empty array if no foreign keys detected**
-6. **Business Domain**: Infer from table name and purpose (e.g., "Sales", "HR", "Inventory", "Billing", "Security")
-7. **Parent Table Insights**: If analyzing this child table reveals new information about parent tables, include it. Examples:
+   - **Inheritance/specialization**: When a column could reference either a generic base table (e.g., BusinessEntity) or a more specialized table (e.g., Person, Employee, Vendor) that inherits from it, **always prefer the most specialized table**. The specialized table is the one that adds domain-specific columns beyond the base table. For example, `Employee.BusinessEntityID` should reference `Person.BusinessEntityID` (not `BusinessEntity.BusinessEntityID`) because Person is the specialized entity that Employee relates to.
+   - **Polymorphic FKs**: Some columns may reference different tables depending on the row (e.g., a `ReferenceOrderID` that could point to a SalesOrder, PurchaseOrder, or WorkOrder). If you detect this pattern, pick the **single most common/likely target** and note the polymorphic nature in your reasoning. Do not create multiple FK entries for the same column pointing to different tables unless you are highly confident each is valid.
+7. **Business Domain**: Infer from table name and purpose (e.g., "Sales", "HR", "Inventory", "Billing", "Security")
+8. **Parent Table Insights**: If analyzing this child table reveals new information about parent tables, include it. Examples:
    - Discovering enum values in the parent (e.g., "Member table has a 'Type' column with values: Individual, Corporate, Student")
    - Revealing parent table classification/purpose (e.g., "BoardMember reveals that Member table includes leadership roles, not just general members")
    - Identifying parent table patterns (e.g., "Multiple child tables suggest Organization serves as a multi-tenant partition key")
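
The footnote added to table-analysis.md defines value overlap as the share of source values that exist in the target column, and cardinality ratio as source distinct over target distinct. The sketch below computes both from in-memory column samples. `Cell`, `FKStats`, and `computeFKStats` are hypothetical names, and counting distinct non-null values for the overlap (not rows) is an assumption; the package may compute these statistics differently, for example in SQL.

```typescript
type Cell = string | number | null;

// Fractions are on a 0-1 scale; the templates multiply by 100 for display.
interface FKStats {
  valueOverlap: number;     // share of distinct source values found in target
  cardinalityRatio: number; // distinct source values / distinct target values
  nullPercentage: number;   // share of source rows that are null
}

// Assumption: overlap is measured over distinct non-null source values.
function computeFKStats(source: Cell[], target: Cell[]): FKStats {
  const sourceDistinct = new Set<Cell>(source.filter((v) => v !== null));
  const targetDistinct = new Set<Cell>(target.filter((v) => v !== null));

  let matched = 0;
  sourceDistinct.forEach((value) => {
    if (targetDistinct.has(value)) matched++;
  });

  const nullCount = source.filter((v) => v === null).length;

  return {
    valueOverlap: sourceDistinct.size === 0 ? 0 : matched / sourceDistinct.size,
    cardinalityRatio: targetDistinct.size === 0 ? 0 : sourceDistinct.size / targetDistinct.size,
    nullPercentage: source.length === 0 ? 0 : nullCount / source.length,
  };
}

// Usage: a child column whose values all exist in the parent key column.
const stats = computeFKStats([1, 1, 2, 3, null], [1, 2, 3, 4]);
console.log(stats);
```

Under this distinct-value reading, a source column fully contained in its target cannot have more distinct values than the target, so its ratio is at most 1.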