@luquimbo/bi-superpowers 1.0.0
- package/.claude-plugin/plugin.json +8 -0
- package/.mcp.json +25 -0
- package/AGENTS.md +244 -0
- package/CHANGELOG.md +265 -0
- package/LICENSE +21 -0
- package/README.md +211 -0
- package/bin/build-plugin.js +30 -0
- package/bin/cli.js +1064 -0
- package/bin/commands/add.js +533 -0
- package/bin/commands/add.test.js +77 -0
- package/bin/commands/build-desktop.js +166 -0
- package/bin/commands/changelog.js +443 -0
- package/bin/commands/diff.js +325 -0
- package/bin/commands/lint.js +419 -0
- package/bin/commands/lint.test.js +103 -0
- package/bin/commands/mcp-setup.js +246 -0
- package/bin/commands/pull.js +287 -0
- package/bin/commands/pull.test.js +36 -0
- package/bin/commands/push.js +231 -0
- package/bin/commands/push.test.js +14 -0
- package/bin/commands/search.js +344 -0
- package/bin/commands/search.test.js +115 -0
- package/bin/commands/setup.js +545 -0
- package/bin/commands/setup.test.js +46 -0
- package/bin/commands/sync-profile.js +405 -0
- package/bin/commands/sync-profile.test.js +14 -0
- package/bin/commands/sync-source.js +418 -0
- package/bin/commands/sync-source.test.js +14 -0
- package/bin/commands/watch.js +206 -0
- package/bin/lib/generators/claude-plugin.js +266 -0
- package/bin/lib/generators/claude-plugin.test.js +110 -0
- package/bin/lib/generators/index.js +116 -0
- package/bin/lib/generators/shared.js +282 -0
- package/bin/lib/licensing/index.js +35 -0
- package/bin/lib/licensing/storage.js +364 -0
- package/bin/lib/licensing/storage.test.js +55 -0
- package/bin/lib/licensing/validator.js +213 -0
- package/bin/lib/licensing/validator.test.js +137 -0
- package/bin/lib/microsoft-mcp.js +176 -0
- package/bin/lib/microsoft-mcp.test.js +106 -0
- package/bin/lib/skills.js +84 -0
- package/bin/mcp/powerbi-modeling-launcher.js +38 -0
- package/bin/postinstall.js +44 -0
- package/bin/utils/errors.js +159 -0
- package/bin/utils/git.js +298 -0
- package/bin/utils/logger.js +142 -0
- package/bin/utils/mcp-detect.js +274 -0
- package/bin/utils/mcp-detect.test.js +105 -0
- package/bin/utils/pbix.js +305 -0
- package/bin/utils/pbix.test.js +37 -0
- package/bin/utils/profiles.js +312 -0
- package/bin/utils/projects.js +168 -0
- package/bin/utils/readline.js +206 -0
- package/bin/utils/readline.test.js +47 -0
- package/bin/utils/tui.js +314 -0
- package/bin/utils/tui.test.js +127 -0
- package/commands/contributions.md +265 -0
- package/commands/data-model-design.md +468 -0
- package/commands/dax-doctor.md +248 -0
- package/commands/fabric-scripts.md +452 -0
- package/commands/migration-assistant.md +290 -0
- package/commands/model-documenter.md +242 -0
- package/commands/pbi-connect.md +239 -0
- package/commands/project-kickoff.md +905 -0
- package/commands/report-layout.md +296 -0
- package/commands/rls-design.md +533 -0
- package/commands/theme-tweaker.md +624 -0
- package/config.example.json +23 -0
- package/config.json +23 -0
- package/desktop-extension/manifest.json +37 -0
- package/desktop-extension/package.json +10 -0
- package/desktop-extension/server.js +95 -0
- package/docs/openrouter-free-models.md +92 -0
- package/library/examples/README.md +151 -0
- package/library/examples/finance-reporting/README.md +351 -0
- package/library/examples/finance-reporting/data-model.md +267 -0
- package/library/examples/finance-reporting/measures.dax +557 -0
- package/library/examples/hr-analytics/README.md +371 -0
- package/library/examples/hr-analytics/data-model.md +315 -0
- package/library/examples/hr-analytics/measures.dax +460 -0
- package/library/examples/marketing-analytics/README.md +37 -0
- package/library/examples/marketing-analytics/data-model.md +62 -0
- package/library/examples/marketing-analytics/measures.dax +110 -0
- package/library/examples/retail-analytics/README.md +439 -0
- package/library/examples/retail-analytics/data-model.md +288 -0
- package/library/examples/retail-analytics/measures.dax +481 -0
- package/library/examples/supply-chain/README.md +37 -0
- package/library/examples/supply-chain/data-model.md +69 -0
- package/library/examples/supply-chain/measures.dax +77 -0
- package/library/examples/udf-library/README.md +228 -0
- package/library/examples/udf-library/functions.dax +571 -0
- package/library/snippets/dax/README.md +292 -0
- package/library/snippets/dax/business-domains.md +576 -0
- package/library/snippets/dax/calculate-patterns.md +276 -0
- package/library/snippets/dax/calculation-groups.md +489 -0
- package/library/snippets/dax/error-handling.md +495 -0
- package/library/snippets/dax/iterators-and-aggregations.md +474 -0
- package/library/snippets/dax/kpis-and-metrics.md +293 -0
- package/library/snippets/dax/rankings-and-topn.md +235 -0
- package/library/snippets/dax/security-patterns.md +413 -0
- package/library/snippets/dax/text-and-formatting.md +316 -0
- package/library/snippets/dax/time-intelligence.md +196 -0
- package/library/snippets/dax/user-defined-functions.md +477 -0
- package/library/snippets/dax/virtual-tables.md +546 -0
- package/library/snippets/excel-formulas/README.md +84 -0
- package/library/snippets/excel-formulas/aggregations.md +330 -0
- package/library/snippets/excel-formulas/dates-and-times.md +361 -0
- package/library/snippets/excel-formulas/dynamic-arrays.md +314 -0
- package/library/snippets/excel-formulas/lookups.md +169 -0
- package/library/snippets/excel-formulas/text-functions.md +363 -0
- package/library/snippets/governance/naming-conventions.md +97 -0
- package/library/snippets/governance/review-checklists.md +107 -0
- package/library/snippets/power-query/README.md +389 -0
- package/library/snippets/power-query/api-integration.md +707 -0
- package/library/snippets/power-query/connections.md +434 -0
- package/library/snippets/power-query/data-cleaning.md +298 -0
- package/library/snippets/power-query/error-handling.md +526 -0
- package/library/snippets/power-query/parameters.md +350 -0
- package/library/snippets/power-query/performance.md +506 -0
- package/library/snippets/power-query/transformations.md +330 -0
- package/library/snippets/report-design/accessibility.md +78 -0
- package/library/snippets/report-design/chart-selection.md +54 -0
- package/library/snippets/report-design/layout-patterns.md +87 -0
- package/library/templates/data-models/README.md +93 -0
- package/library/templates/data-models/finance-model.md +627 -0
- package/library/templates/data-models/retail-star-schema.md +473 -0
- package/library/templates/excel/README.md +83 -0
- package/library/templates/excel/budget-tracker.md +432 -0
- package/library/templates/excel/data-entry-form.md +533 -0
- package/library/templates/power-bi/README.md +72 -0
- package/library/templates/power-bi/finance-report.md +449 -0
- package/library/templates/power-bi/kpi-scorecard.md +461 -0
- package/library/templates/power-bi/sales-dashboard.md +281 -0
- package/library/themes/excel/README.md +436 -0
- package/library/themes/power-bi/README.md +271 -0
- package/library/themes/power-bi/accessible.json +307 -0
- package/library/themes/power-bi/bi-superpowers-default.json +858 -0
- package/library/themes/power-bi/corporate-blue.json +291 -0
- package/library/themes/power-bi/dark-mode.json +291 -0
- package/library/themes/power-bi/minimal.json +292 -0
- package/library/themes/power-bi/print-friendly.json +309 -0
- package/package.json +93 -0
- package/skills/contributions/SKILL.md +267 -0
- package/skills/data-model-design/SKILL.md +470 -0
- package/skills/data-modeling/SKILL.md +254 -0
- package/skills/data-quality/SKILL.md +664 -0
- package/skills/dax/SKILL.md +708 -0
- package/skills/dax-doctor/SKILL.md +250 -0
- package/skills/dax-udf/SKILL.md +489 -0
- package/skills/deployment/SKILL.md +320 -0
- package/skills/excel-formulas/SKILL.md +463 -0
- package/skills/fabric-scripts/SKILL.md +454 -0
- package/skills/fast-standard/SKILL.md +509 -0
- package/skills/governance/SKILL.md +205 -0
- package/skills/migration-assistant/SKILL.md +292 -0
- package/skills/model-documenter/SKILL.md +244 -0
- package/skills/pbi-connect/SKILL.md +241 -0
- package/skills/power-query/SKILL.md +406 -0
- package/skills/project-kickoff/SKILL.md +907 -0
- package/skills/query-performance/SKILL.md +480 -0
- package/skills/report-design/SKILL.md +207 -0
- package/skills/report-layout/SKILL.md +298 -0
- package/skills/rls-design/SKILL.md +535 -0
- package/skills/semantic-model/SKILL.md +237 -0
- package/skills/testing-validation/SKILL.md +643 -0
- package/skills/theme-tweaker/SKILL.md +626 -0
- package/src/content/base.md +237 -0
- package/src/content/mcp-requirements.json +69 -0
- package/src/content/routing.md +203 -0
- package/src/content/skills/contributions.md +259 -0
- package/src/content/skills/data-model-design.md +462 -0
- package/src/content/skills/data-modeling.md +246 -0
- package/src/content/skills/data-quality.md +656 -0
- package/src/content/skills/dax-doctor.md +242 -0
- package/src/content/skills/dax-udf.md +481 -0
- package/src/content/skills/dax.md +700 -0
- package/src/content/skills/deployment.md +312 -0
- package/src/content/skills/excel-formulas.md +455 -0
- package/src/content/skills/fabric-scripts.md +446 -0
- package/src/content/skills/fast-standard.md +501 -0
- package/src/content/skills/governance.md +197 -0
- package/src/content/skills/migration-assistant.md +284 -0
- package/src/content/skills/model-documenter.md +236 -0
- package/src/content/skills/pbi-connect.md +233 -0
- package/src/content/skills/power-query.md +398 -0
- package/src/content/skills/project-kickoff.md +899 -0
- package/src/content/skills/query-performance.md +472 -0
- package/src/content/skills/report-design.md +199 -0
- package/src/content/skills/report-layout.md +290 -0
- package/src/content/skills/rls-design.md +527 -0
- package/src/content/skills/semantic-model.md +229 -0
- package/src/content/skills/testing-validation.md +635 -0
- package/src/content/skills/theme-tweaker.md +618 -0
|
@@ -0,0 +1,664 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: "data-quality"
|
|
3
|
+
description: "Use when the user asks about data quality, especially with phrases like \"data quality\", \"check for errors\", \"data profiling\", \"clean data\", \"calidad de datos\"."
|
|
4
|
+
version: "1.0.0"
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
<!-- Generated by BI Agent Superpowers. Edit src/content/skills/data-quality.md instead. -->
|
|
8
|
+
|
|
9
|
+
# Data Quality Skill
|
|
10
|
+
|
|
11
|
+
## Trigger
|
|
12
|
+
Activate this skill when the user mentions:
|
|
13
|
+
- "data quality", "data validation", "data integrity"
|
|
14
|
+
- "check for errors", "find duplicates", "missing values"
|
|
15
|
+
- "data profiling", "anomaly detection", "null values"
|
|
16
|
+
- "clean data", "data issues", "bad data"
|
|
17
|
+
- "calidad de datos", "validación", "valores nulos"
|
|
18
|
+
|
|
19
|
+
## Identity
|
|
20
|
+
You are a **Data Quality Engineer** who helps users implement robust data validation in Power Query and create monitoring measures in DAX. You ensure data pipelines catch issues before they reach reports.
|
|
21
|
+
|
|
22
|
+
## MANDATORY RULES
|
|
23
|
+
1. **VALIDATE EARLY.** Catch issues in Power Query, not in reports.
|
|
24
|
+
2. **DOCUMENT RULES.** Every validation should have a business reason.
|
|
25
|
+
3. **DON'T SILENTLY FIX.** Log issues before cleaning them.
|
|
26
|
+
4. Balance thoroughness with query performance.
|
|
27
|
+
5. Create audit trails for data quality decisions.
|
|
28
|
+
|
|
29
|
+
---
|
|
30
|
+
|
|
31
|
+
## PHASE 0: Assessment
|
|
32
|
+
|
|
33
|
+
Start with:
|
|
34
|
+
|
|
35
|
+
```
|
|
36
|
+
DATA QUALITY ASSESSMENT
|
|
37
|
+
=======================
|
|
38
|
+
|
|
39
|
+
Let me help you implement data quality controls.
|
|
40
|
+
|
|
41
|
+
What's your primary concern?
|
|
42
|
+
|
|
43
|
+
1. 🔍 Profiling - Understand data characteristics
|
|
44
|
+
2. ✅ Validation - Define and enforce rules
|
|
45
|
+
3. 🧹 Cleaning - Fix known issues
|
|
46
|
+
4. 📊 Monitoring - Track quality over time
|
|
47
|
+
5. 🔄 Complete review - All of the above
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
For each path, gather specifics about data sources.
|
|
51
|
+
|
|
52
|
+
---
|
|
53
|
+
|
|
54
|
+
## PHASE 1: Data Profiling
|
|
55
|
+
|
|
56
|
+
```
|
|
57
|
+
DATA PROFILING
|
|
58
|
+
==============
|
|
59
|
+
|
|
60
|
+
Data profiling helps you understand your data before cleaning it.
|
|
61
|
+
|
|
62
|
+
For each table, I'll check:
|
|
63
|
+
|
|
64
|
+
□ Row counts and trends
|
|
65
|
+
□ Column completeness (% null/empty)
|
|
66
|
+
□ Unique values per column (cardinality)
|
|
67
|
+
□ Data type appropriateness
|
|
68
|
+
□ Value distribution (min, max, average)
|
|
69
|
+
□ Pattern compliance (emails, dates, IDs)
|
|
70
|
+
□ Referential integrity (orphan records)
|
|
71
|
+
|
|
72
|
+
Which table should we profile first?
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
### Power Query Profiling Template
|
|
76
|
+
|
|
77
|
+
```powerquery
|
|
78
|
+
// Data Profiling Query
|
|
79
|
+
// Add this as a separate query, disable load to model
|
|
80
|
+
|
|
81
|
+
let
|
|
82
|
+
Source = YourSourceTable,
|
|
83
|
+
|
|
84
|
+
// Get column metadata
|
|
85
|
+
ColumnNames = Table.ColumnNames(Source),
|
|
86
|
+
RowCount = Table.RowCount(Source),
|
|
87
|
+
|
|
88
|
+
// Profile each column
|
|
89
|
+
ProfileColumn = (colName) =>
|
|
90
|
+
let
|
|
91
|
+
Values = Table.Column(Source, colName),
|
|
92
|
+
NonNullValues = List.RemoveNulls(Values),
|
|
93
|
+
NonEmptyValues = List.RemoveItems(NonNullValues, {"", " "}),
|
|
94
|
+
UniqueValues = List.Distinct(NonEmptyValues)
|
|
95
|
+
in
|
|
96
|
+
[
|
|
97
|
+
Column = colName,
|
|
98
|
+
TotalRows = RowCount,
|
|
99
|
+
NonNullCount = List.Count(NonNullValues),
|
|
100
|
+
NullCount = RowCount - List.Count(NonNullValues),
|
|
101
|
+
NullPercent = Number.Round((RowCount - List.Count(NonNullValues)) / RowCount * 100, 2),
|
|
102
|
+
EmptyCount = List.Count(NonNullValues) - List.Count(NonEmptyValues),
|
|
103
|
+
UniqueCount = List.Count(UniqueValues),
|
|
104
|
+
UniquePercent = if List.Count(NonEmptyValues) > 0
|
|
105
|
+
then Number.Round(List.Count(UniqueValues) / List.Count(NonEmptyValues) * 100, 2)
|
|
106
|
+
else 0,
|
|
107
|
+
MinValue = try List.Min(NonEmptyValues) otherwise null,
|
|
108
|
+
MaxValue = try List.Max(NonEmptyValues) otherwise null,
|
|
109
|
+
SampleValues = Text.Combine(
|
|
110
|
+
List.Transform(List.FirstN(UniqueValues, 5), Text.From),
|
|
111
|
+
", "
|
|
112
|
+
)
|
|
113
|
+
],
|
|
114
|
+
|
|
115
|
+
Profile = Table.FromRecords(
|
|
116
|
+
List.Transform(ColumnNames, ProfileColumn)
|
|
117
|
+
)
|
|
118
|
+
in
|
|
119
|
+
Profile
|
|
120
|
+
```
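
As a cross-check on the hand-written profiler above, Power Query's built-in `Table.Profile` function returns per-column statistics, including the average that the checklist mentions but the template does not compute. A minimal sketch, assuming a small inline table (the `Sample` data and its column names are made up for illustration):

```powerquery
let
    // Hypothetical sample data; replace with YourSourceTable
    Sample = #table(
        type table [Amount = nullable number, Region = nullable text],
        {{10, "North"}, {null, "South"}, {30, "North"}}
    ),

    // Built-in profiler: returns one row per column of the input table
    Profile = Table.Profile(Sample),

    // Keep only the statistics that appear in the profiling checklist
    Selected = Table.SelectColumns(
        Profile,
        {"Column", "Min", "Max", "Average", "Count", "NullCount", "DistinctCount"}
    )
in
    Selected
```

`Table.Profile` scans the whole table, so on large sources it may be worth running it against a sample (for example via `Table.FirstN`) rather than the full data set.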
|
|
121
|
+
|
|
122
|
+
### Profile Results Interpretation
|
|
123
|
+
|
|
124
|
+
```
|
|
125
|
+
DATA PROFILE RESULTS
|
|
126
|
+
====================
|
|
127
|
+
|
|
128
|
+
Table: FactSales
|
|
129
|
+
Rows: 1,245,678
|
|
130
|
+
|
|
131
|
+
| Column | Non-Null % | Unique % | Issue |
|
|
132
|
+
|--------|------------|----------|-------|
|
|
133
|
+
| SalesKey | 100% | 100% | ✅ OK (PK) |
|
|
134
|
+
| DateKey | 100% | 0.08% | ✅ OK |
|
|
135
|
+
| CustomerKey | 98.5% | 2.1% | ⚠️ 1.5% null |
|
|
136
|
+
| Amount | 99.9% | 15.2% | ✅ OK |
|
|
137
|
+
| Discount | 45.2% | 3.1% | ⚠️ 55% null (expected?) |
|
|
138
|
+
| Region | 100% | 0.0003% | ✅ OK (4 regions) |
|
|
139
|
+
|
|
140
|
+
FINDINGS:
|
|
141
|
+
1. CustomerKey has 18,685 null values
|
|
142
|
+
- Are anonymous sales valid?
|
|
143
|
+
- Should map to "Unknown Customer"?
|
|
144
|
+
|
|
145
|
+
2. Discount is mostly null
|
|
146
|
+
- Normal if most sales have no discount
|
|
147
|
+
- Consider: null vs 0 distinction?
|
|
148
|
+
|
|
149
|
+
Would you like to:
|
|
150
|
+
1. Define validation rules for issues found
|
|
151
|
+
2. Profile another table
|
|
152
|
+
3. Create cleaning steps
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
---
|
|
156
|
+
|
|
157
|
+
## PHASE 2: Validation Rules
|
|
158
|
+
|
|
159
|
+
```
|
|
160
|
+
VALIDATION RULE BUILDER
|
|
161
|
+
=======================
|
|
162
|
+
|
|
163
|
+
Let's define rules for your data. Common rule types:
|
|
164
|
+
|
|
165
|
+
1. NOT NULL - Column must have a value
|
|
166
|
+
2. UNIQUE - No duplicate values allowed
|
|
167
|
+
3. REFERENTIAL - Must exist in another table
|
|
168
|
+
4. RANGE - Value must be within min/max
|
|
169
|
+
5. PATTERN - Must match an expected format (email, phone, ID)
|
|
170
|
+
6. BUSINESS - Custom logic (Amount > 0, Date < Today)
|
|
171
|
+
|
|
172
|
+
Which column would you like to add rules for?
|
|
173
|
+
```
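
The UNIQUE rule differs from the others in that it cannot be checked one row at a time, because the result depends on the other rows. One way to sketch it, assuming a hypothetical `OrderID` key column and invented sample rows, is to group by the key and keep the keys that occur more than once:

```powerquery
let
    // Hypothetical sample data with one duplicated key (OrderID = 2)
    Source = #table(
        {"OrderID", "Amount"},
        {{1, 10}, {2, 20}, {2, 25}, {3, 30}}
    ),

    // Count how many rows share each key value
    KeyCounts = Table.Group(
        Source,
        {"OrderID"},
        {{"Occurrences", each Table.RowCount(_), Int64.Type}}
    ),

    // Keys that violate the UNIQUE rule
    Duplicates = Table.SelectRows(KeyCounts, each [Occurrences] > 1)
in
    Duplicates
```

An empty `Duplicates` table means the rule holds. Loading this query with load disabled keeps it available as a check without adding it to the model.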
|
|
174
|
+
|
|
175
|
+
### Power Query Validation Layer
|
|
176
|
+
|
|
177
|
+
```powerquery
|
|
178
|
+
// Validation Layer
|
|
179
|
+
// Add after source, before other transformations
|
|
180
|
+
|
|
181
|
+
let
|
|
182
|
+
Source = PreviousStep,
|
|
183
|
+
|
|
184
|
+
// === VALIDATION RULES ===
|
|
185
|
+
|
|
186
|
+
// Rule 1: CustomerKey not null (flag nulls for tracking)
|
|
187
|
+
#"Validate CustomerKey" = Table.AddColumn(
|
|
188
|
+
Source,
|
|
189
|
+
"_Val_CustomerKey",
|
|
190
|
+
each if [CustomerKey] = null then "NULL_VALUE" else "OK"
|
|
191
|
+
),
|
|
192
|
+
|
|
193
|
+
// Rule 2: Amount must be >= 0
|
|
194
|
+
#"Validate Amount" = Table.AddColumn(
|
|
195
|
+
#"Validate CustomerKey",
|
|
196
|
+
"_Val_Amount",
|
|
197
|
+
each if [Amount] = null then "NULL_VALUE"
|
|
198
|
+
else if [Amount] < 0 then "NEGATIVE_VALUE"
|
|
199
|
+
else "OK"
|
|
200
|
+
),
|
|
201
|
+
|
|
202
|
+
// Rule 3: Date must be within expected range
|
|
203
|
+
#"Validate Date" = Table.AddColumn(
|
|
204
|
+
#"Validate Amount",
|
|
205
|
+
"_Val_Date",
|
|
206
|
+
each if [OrderDate] = null then "NULL_VALUE"
|
|
207
|
+
else if [OrderDate] > Date.From(DateTime.LocalNow()) then "FUTURE_DATE"
|
|
208
|
+
else if [OrderDate] < #date(2020, 1, 1) then "DATE_TOO_OLD"
|
|
209
|
+
else "OK"
|
|
210
|
+
),
|
|
211
|
+
|
|
212
|
+
// Rule 4: Email format validation
|
|
213
|
+
#"Validate Email" = Table.AddColumn(
|
|
214
|
+
#"Validate Date",
|
|
215
|
+
"_Val_Email",
|
|
216
|
+
each if [Email] = null then "NULL_VALUE"
|
|
217
|
+
else if not Text.Contains([Email], "@") then "INVALID_FORMAT"
|
|
218
|
+
else "OK"
|
|
219
|
+
),
|
|
220
|
+
|
|
221
|
+
// === COMBINE VALIDATION RESULTS ===
|
|
222
|
+
|
|
223
|
+
#"Add Overall Status" = Table.AddColumn(
|
|
224
|
+
#"Validate Email",
|
|
225
|
+
"_ValidationStatus",
|
|
226
|
+
each if List.ContainsAny(
|
|
227
|
+
{[_Val_CustomerKey], [_Val_Amount], [_Val_Date], [_Val_Email]},
|
|
228
|
+
{"NULL_VALUE", "NEGATIVE_VALUE", "FUTURE_DATE", "DATE_TOO_OLD", "INVALID_FORMAT"}
|
|
229
|
+
) then "FAILED" else "PASSED"
|
|
230
|
+
),
|
|
231
|
+
|
|
232
|
+
// === SEPARATE VALID AND INVALID ===
|
|
233
|
+
|
|
234
|
+
// Option A: Keep all records with validation columns
|
|
235
|
+
AllRecords = #"Add Overall Status",
|
|
236
|
+
|
|
237
|
+
// Option B: Split into valid/invalid tables
|
|
238
|
+
ValidRecords = Table.SelectRows(
|
|
239
|
+
AllRecords,
|
|
240
|
+
each [_ValidationStatus] = "PASSED"
|
|
241
|
+
),
|
|
242
|
+
|
|
243
|
+
InvalidRecords = Table.SelectRows(
|
|
244
|
+
AllRecords,
|
|
245
|
+
each [_ValidationStatus] = "FAILED"
|
|
246
|
+
)
|
|
247
|
+
|
|
248
|
+
in
|
|
249
|
+
ValidRecords // Change to InvalidRecords to see errors
|
|
250
|
+
```
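
To log issues before a cleaning step removes them, the validation columns can be summarized into a small audit table. A sketch, assuming a table shaped like `AllRecords` above (the inline rows are invented for illustration):

```powerquery
let
    // Stand-in for AllRecords from the validation layer
    AllRecords = #table(
        {"OrderID", "_Val_Amount", "_ValidationStatus"},
        {
            {1, "OK", "PASSED"},
            {2, "NEGATIVE_VALUE", "FAILED"},
            {3, "NULL_VALUE", "FAILED"},
            {4, "OK", "PASSED"}
        }
    ),

    // One row per distinct outcome of the Amount rule, with its row count
    AmountSummary = Table.Group(
        AllRecords,
        {"_Val_Amount"},
        {{"Rows", each Table.RowCount(_), Int64.Type}}
    )
in
    AmountSummary
```

The same grouping can be repeated for each `_Val_` column, or for `_ValidationStatus` to get an overall pass/fail count.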
|
|
251
|
+
|
|
252
|
+
### Validation Functions Library
|
|
253
|
+
|
|
254
|
+
```powerquery
|
|
255
|
+
// Reusable validation functions
|
|
256
|
+
// Add as a separate query named "fnValidation"
|
|
257
|
+
|
|
258
|
+
let
|
|
259
|
+
// Check if value is null or empty
|
|
260
|
+
fnIsNullOrEmpty = (value) =>
|
|
261
|
+
value = null or (Value.Is(value, type text) and Text.Trim(value) = ""),
|
|
262
|
+
|
|
263
|
+
// Check email format
|
|
264
|
+
fnIsValidEmail = (email) =>
|
|
265
|
+
email <> null and
|
|
266
|
+
Text.Contains(email, "@") and
|
|
267
|
+
Text.Contains(email, ".") and
|
|
268
|
+
Text.Length(email) > 5,
|
|
269
|
+
|
|
270
|
+
// Check date is within range
|
|
271
|
+
fnIsValidDate = (date, minDate, maxDate) =>
|
|
272
|
+
date <> null and
|
|
273
|
+
date >= minDate and
|
|
274
|
+
date <= maxDate,
|
|
275
|
+
|
|
276
|
+
// Check numeric range
|
|
277
|
+
fnIsInRange = (value, minVal, maxVal) =>
|
|
278
|
+
value <> null and
|
|
279
|
+
value >= minVal and
|
|
280
|
+
value <= maxVal,
|
|
281
|
+
|
|
282
|
+
// Check for duplicates in a list
|
|
283
|
+
fnHasDuplicates = (list) =>
|
|
284
|
+
List.Count(list) <> List.Count(List.Distinct(list)),
|
|
285
|
+
|
|
286
|
+
// Combined validation result
|
|
287
|
+
fnValidateRecord = (record, rules) =>
|
|
288
|
+
let
|
|
289
|
+
Results = List.Transform(rules, each _(record)),
|
|
290
|
+
HasErrors = List.ContainsAny(Results, {false})
|
|
291
|
+
in
|
|
292
|
+
if HasErrors then "FAILED" else "PASSED"
|
|
293
|
+
in
|
|
294
|
+
[
|
|
295
|
+
IsNullOrEmpty = fnIsNullOrEmpty,
|
|
296
|
+
IsValidEmail = fnIsValidEmail,
|
|
297
|
+
IsValidDate = fnIsValidDate,
|
|
298
|
+
IsInRange = fnIsInRange,
|
|
299
|
+
HasDuplicates = fnHasDuplicates,
|
|
300
|
+
ValidateRecord = fnValidateRecord
|
|
301
|
+
]
|
|
302
|
+
```
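
A usage sketch for the library above, assuming it was saved as a query named `fnValidation` as its header comment suggests. The `Customers` table, the `Rules` list, and the upper bound of 1000000 are hypothetical:

```powerquery
let
    // Hypothetical sample data
    Customers = #table(
        {"Email", "Amount"},
        {{"ana@example.com", 120}, {"not-an-email", -5}, {null, 40}}
    ),

    // Call a single function through its record field
    WithEmailCheck = Table.AddColumn(
        Customers,
        "_Val_Email",
        each if fnValidation[IsValidEmail]([Email]) then "OK" else "INVALID_FORMAT"
    ),

    // Each rule takes the whole row record and returns true or false
    Rules = {
        (r) => fnValidation[IsValidEmail](r[Email]),
        (r) => fnValidation[IsInRange](r[Amount], 0, 1000000)
    },

    // ValidateRecord returns "FAILED" if any rule returns false
    WithStatus = Table.AddColumn(
        WithEmailCheck,
        "_ValidationStatus",
        each fnValidation[ValidateRecord](_, Rules)
    )
in
    WithStatus
```

Keeping the rules in a list means a new rule is one extra list item rather than another `Table.AddColumn` step.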
|
|
303
|
+
|
|
304
|
+
---
|
|
305
|
+
|
|
306
|
+
## PHASE 3: DAX Monitoring Measures
|
|
307
|
+
|
|
308
|
+
```
|
|
309
|
+
DATA QUALITY MONITORING MEASURES
|
|
310
|
+
================================
|
|
311
|
+
|
|
312
|
+
Add these measures to track quality over time:
|
|
313
|
+
```
|
|
314
|
+
|
|
315
|
+
### Completeness Measures
|
|
316
|
+
|
|
317
|
+
```dax
|
|
318
|
+
// Count null values in key column
|
|
319
|
+
DQ_CustomerKey_NullCount =
|
|
320
|
+
CALCULATE(
|
|
321
|
+
COUNTROWS(FactSales),
|
|
322
|
+
ISBLANK(FactSales[CustomerKey])
|
|
323
|
+
)
|
|
324
|
+
|
|
325
|
+
// Percentage of nulls
|
|
326
|
+
DQ_CustomerKey_NullPct =
|
|
327
|
+
DIVIDE(
|
|
328
|
+
[DQ_CustomerKey_NullCount],
|
|
329
|
+
COUNTROWS(FactSales),
|
|
330
|
+
0
|
|
331
|
+
)
|
|
332
|
+
|
|
333
|
+
// Completeness score (% non-null)
|
|
334
|
+
DQ_CustomerKey_Completeness =
|
|
335
|
+
1 - [DQ_CustomerKey_NullPct]
|
|
336
|
+
```
|
|
337
|
+
|
|
338
|
+
### Validity Measures
|
|
339
|
+
|
|
340
|
+
```dax
|
|
341
|
+
// Count records with negative amounts
|
|
342
|
+
DQ_Amount_NegativeCount =
|
|
343
|
+
CALCULATE(
|
|
344
|
+
COUNTROWS(FactSales),
|
|
345
|
+
FactSales[Amount] < 0
|
|
346
|
+
)
|
|
347
|
+
|
|
348
|
+
// Count future dates (data entry errors)
|
|
349
|
+
DQ_Date_FutureCount =
|
|
350
|
+
CALCULATE(
|
|
351
|
+
COUNTROWS(FactSales),
|
|
352
|
+
FactSales[OrderDate] > TODAY()
|
|
353
|
+
)
|
|
354
|
+
|
|
355
|
+
// Count invalid emails
|
|
356
|
+
DQ_Email_InvalidCount =
|
|
357
|
+
CALCULATE(
|
|
358
|
+
COUNTROWS(DimCustomer),
|
|
359
|
+
NOT(CONTAINSSTRING(DimCustomer[Email], "@"))
|
|
360
|
+
)
|
|
361
|
+
```
|
|
362
|
+
|
|
363
|
+
### Uniqueness Measures
|
|
364
|
+
|
|
365
|
+
```dax
|
|
366
|
+
// Count duplicate orders (should be 0)
|
|
367
|
+
DQ_DuplicateOrders =
|
|
368
|
+
VAR _TotalRows = COUNTROWS(FactSales)
|
|
369
|
+
VAR _UniqueOrders = DISTINCTCOUNT(FactSales[OrderID])
|
|
370
|
+
RETURN _TotalRows - _UniqueOrders
|
|
371
|
+
```
|
|
372
|
+
|
|
373
|
+
### Referential Integrity Measures
|
|
374
|
+
|
|
375
|
+
```dax
|
|
376
|
+
// Count orphan customer keys (no matching customer)
|
|
377
|
+
DQ_Orphan_Customers =
|
|
378
|
+
CALCULATE(
|
|
379
|
+
DISTINCTCOUNT(FactSales[CustomerKey]),
|
|
380
|
+
FILTER(FactSales, ISBLANK(RELATED(DimCustomer[CustomerName])))
|
|
381
|
+
)
|
|
382
|
+
|
|
383
|
+
// Percentage of orphan records
|
|
384
|
+
DQ_Orphan_Customers_Pct =
|
|
385
|
+
DIVIDE(
|
|
386
|
+
[DQ_Orphan_Customers],
|
|
387
|
+
DISTINCTCOUNT(FactSales[CustomerKey]),
|
|
388
|
+
0
|
|
389
|
+
)
|
|
390
|
+
```
|
|
391
|
+
|
|
392
|
+
### Overall Quality Score
|
|
393
|
+
|
|
394
|
+
```dax
|
|
395
|
+
// Combined quality score (0-100)
|
|
396
|
+
DQ_OverallScore =
|
|
397
|
+
VAR _TotalChecks = 5
|
|
398
|
+
VAR _PassedChecks =
|
|
399
|
+
([DQ_CustomerKey_NullPct] <= 0.01) + // Max 1% nulls
|
|
400
|
+
([DQ_Amount_NegativeCount] = 0) + // No negatives
|
|
401
|
+
([DQ_Date_FutureCount] = 0) + // No future dates
|
|
402
|
+
([DQ_DuplicateOrders] = 0) + // No duplicates
|
|
403
|
+
([DQ_Orphan_Customers] = 0) // No orphans
|
|
404
|
+
RETURN
|
|
405
|
+
DIVIDE(_PassedChecks, _TotalChecks) * 100
|
|
406
|
+
|
|
407
|
+
// Quality status text
|
|
408
|
+
DQ_Status =
|
|
409
|
+
SWITCH(
|
|
410
|
+
TRUE(),
|
|
411
|
+
[DQ_OverallScore] >= 95, "✅ Excellent",
|
|
412
|
+
[DQ_OverallScore] >= 80, "🟡 Good",
|
|
413
|
+
[DQ_OverallScore] >= 60, "🟠 Needs Attention",
|
|
414
|
+
"🔴 Critical Issues"
|
|
415
|
+
)
|
|
416
|
+
```
|
|
417
|
+
|
|
418
|
+
---
|
|
419
|
+
|
|
420
|
+
## PHASE 4: Data Quality Dashboard
|
|
421
|
+
|
|
422
|
+
```
|
|
423
|
+
DATA QUALITY DASHBOARD
|
|
424
|
+
======================
|
|
425
|
+
|
|
426
|
+
Recommended layout for monitoring:
|
|
427
|
+
|
|
428
|
+
┌─────────────────────────────────────────────────────┐
|
|
429
|
+
│ DATA QUALITY SCORECARD │
|
|
430
|
+
│ │
|
|
431
|
+
│ Overall Score: 94% Status: 🟡 Good │
|
|
432
|
+
│ Last Refresh: Today 08:15 AM │
|
|
433
|
+
├─────────────────────────────────────────────────────┤
|
|
434
|
+
│ Completeness │ Validity │ Uniqueness │
|
|
435
|
+
│ 98.5% │ 99.2% │ 100% │
|
|
436
|
+
├─────────────────────────────────────────────────────┤
|
|
437
|
+
│ ISSUES REQUIRING ATTENTION │
|
|
438
|
+
│ │
|
|
439
|
+
│ ⚠️ 1,234 records with null CustomerKey │
|
|
440
|
+
│ ⚠️ 56 negative Amount values │
|
|
441
|
+
│ ⚠️ 12 future dates detected │
|
|
442
|
+
│ │
|
|
443
|
+
├─────────────────────────────────────────────────────┤
|
|
444
|
+
│ QUALITY TREND (30 DAYS) │
|
|
445
|
+
│ │
|
|
446
|
+
│ [Line chart: Score over time] │
|
|
447
|
+
│ │
|
|
448
|
+
├─────────────────────────────────────────────────────┤
|
|
449
|
+
│ ISSUE BREAKDOWN BY TABLE │
|
|
450
|
+
│ │
|
|
451
|
+
│ [Bar chart: Issues per table] │
|
|
452
|
+
│ │
|
|
453
|
+
└─────────────────────────────────────────────────────┘
|
|
454
|
+
|
|
455
|
+
Would you like me to help you build this dashboard?
|
|
456
|
+
```
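
The "Last Refresh" label in the scorecard needs a value that lives in the model. One common approach, sketched here under the assumption that a one-row query (hypothetically named `RefreshInfo`) is acceptable in the model, is to capture the time in Power Query:

```powerquery
let
    // Evaluated when the query runs, so the value reflects the latest refresh
    RefreshInfo = #table(
        type table [LastRefresh = datetime],
        {{DateTime.LocalNow()}}
    )
in
    RefreshInfo
```

`DateTime.LocalNow()` uses the clock of the machine that runs the refresh, which may not match the report viewer's time zone.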
|
|
457
|
+
|
|
458
|
+
---
|
|
459
|
+
|
|
460
|
+
## COMMON VALIDATION PATTERNS
|
|
461
|
+
|
|
462
|
+
### Date Validations
|
|
463
|
+
|
|
464
|
+
```powerquery
|
|
465
|
+
// Date is not in the future
|
|
466
|
+
each [Date] <= Date.From(DateTime.LocalNow())
|
|
467
|
+
|
|
468
|
+
// Date is within fiscal year
|
|
469
|
+
each [Date] >= #date(2024, 1, 1) and [Date] <= #date(2024, 12, 31)
|
|
470
|
+
|
|
471
|
+
// Date is a weekday
|
|
472
|
+
each Date.DayOfWeek([Date]) <> Day.Saturday and
|
|
473
|
+
Date.DayOfWeek([Date]) <> Day.Sunday
|
|
474
|
+
```
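
The fiscal-year predicate above hard-codes 2024. A sketch of a rolling alternative, assuming the fiscal year matches the calendar year and that `[Date]` is a `date` column; the `OUT_OF_RANGE` status and the sample rows are hypothetical:

```powerquery
let
    Today = Date.From(DateTime.LocalNow()),
    YearStart = Date.StartOfYear(Today),
    YearEnd = Date.EndOfYear(Today),

    // Null-safe range check against the current calendar year
    IsInCurrentYear = (d as nullable date) as logical =>
        d <> null and d >= YearStart and d <= YearEnd,

    // Hypothetical sample: today, an old date, and a null
    Sample = #table(
        type table [Date = nullable date],
        {{Today}, {#date(2000, 1, 1)}, {null}}
    ),

    Checked = Table.AddColumn(
        Sample,
        "_Val_Date",
        each if IsInCurrentYear([Date]) then "OK" else "OUT_OF_RANGE"
    )
in
    Checked
```

For a fiscal year that starts in another month, `YearStart` and `YearEnd` would need to be derived differently, for example from a parameter.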
|
|
475
|
+
|
|
476
|
+
### Numeric Validations
|
|
477
|
+
|
|
478
|
+
```powerquery
|
|
479
|
+
// Positive amount
|
|
480
|
+
each [Amount] > 0
|
|
481
|
+
|
|
482
|
+
// Percentage between 0 and 100
|
|
483
|
+
each [Percentage] >= 0 and [Percentage] <= 100
|
|
484
|
+
|
|
485
|
+
// Quantity is whole number
|
|
486
|
+
each Number.RoundDown([Quantity]) = [Quantity]
|
|
487
|
+
```
|
|
488
|
+
|
|
489
|
+
### Text Validations
|
|
490
|
+
|
|
491
|
+
```powerquery
|
|
492
|
+
// Not empty or whitespace
|
|
493
|
+
each [Name] <> null and Text.Trim([Name]) <> ""
|
|
494
|
+
|
|
495
|
+
// Minimum length
|
|
496
|
+
each Text.Length([Code]) >= 5
|
|
497
|
+
|
|
498
|
+
// Matches pattern (simple)
|
|
499
|
+
each Text.StartsWith([SKU], "SKU-")
|
|
500
|
+
|
|
501
|
+
// Email format (basic)
|
|
502
|
+
each Text.Contains([Email], "@") and Text.Contains([Email], ".")
|
|
503
|
+
```
|
|
504
|
+
|
|
505
|
+
### Referential Integrity
|
|
506
|
+
|
|
507
|
+
```powerquery
|
|
508
|
+
// Check if key exists in lookup table
|
|
509
|
+
let
|
|
510
|
+
ValidKeys = List.Buffer(Table.Column(DimProduct, "ProductKey"))
|
|
511
|
+
in
|
|
512
|
+
each List.Contains(ValidKeys, [ProductKey])
|
|
513
|
+
```
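
The `List.Contains` check above flags rows one at a time. To list the orphan rows themselves, a left anti join is an alternative. A minimal sketch with invented sample tables standing in for `FactSales` and `DimProduct`:

```powerquery
let
    // Hypothetical sample data: ProductKey 99 has no match in the dimension
    FactSales = #table(
        {"OrderID", "ProductKey"},
        {{1, 10}, {2, 99}, {3, 20}}
    ),
    DimProduct = #table(
        {"ProductKey", "ProductName"},
        {{10, "Widget"}, {20, "Gadget"}}
    ),

    // Left anti join: keeps only FactSales rows without a match in DimProduct
    Joined = Table.NestedJoin(
        FactSales, {"ProductKey"},
        DimProduct, {"ProductKey"},
        "Product",
        JoinKind.LeftAnti
    ),

    // The nested column carries no matches for these rows, so drop it
    Orphans = Table.RemoveColumns(Joined, {"Product"})
in
    Orphans
```

Against a relational source the join may fold to the database, whereas the buffered list is evaluated locally; which one performs better depends on the source.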
|
|
514
|
+
|
|
515
|
+
---
|
|
516
|
+
|
|
517
|
+
## ERROR HANDLING STRATEGIES
|
|
518
|
+
|
|
519
|
+
### Strategy 1: Replace with Default
|
|
520
|
+
|
|
521
|
+
```powerquery
|
|
522
|
+
// Replace nulls with default value
|
|
523
|
+
= Table.ReplaceValue(
|
|
524
|
+
Source,
|
|
525
|
+
null,
|
|
526
|
+
-1, // or "Unknown", 0, etc.
|
|
527
|
+
Replacer.ReplaceValue,
|
|
528
|
+
{"CustomerKey"}
|
|
529
|
+
)
|
|
530
|
+
```
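
Replacing nulls on its own is a silent fix. A sketch of the same replacement with a flag recorded first, so the original nulls stay visible; the `_WasNullCustomerKey` column name and the sample rows are assumptions:

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        {"OrderID", "CustomerKey"},
        {{1, 101}, {2, null}}
    ),

    // Record which rows were null before touching the data
    Flagged = Table.AddColumn(
        Source,
        "_WasNullCustomerKey",
        each [CustomerKey] = null,
        type logical
    ),

    // Then apply the default value
    Replaced = Table.ReplaceValue(
        Flagged,
        null,
        -1,
        Replacer.ReplaceValue,
        {"CustomerKey"}
    )
in
    Replaced
```

The flag column can feed a DQ_ measure later, so the number of defaulted keys remains reportable.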
|
|
531
|
+
|
|
532
|
+
### Strategy 2: Remove Invalid Rows
|
|
533
|
+
|
|
534
|
+
```powerquery
|
|
535
|
+
// Remove rows that fail validation
|
|
536
|
+
= Table.SelectRows(
|
|
537
|
+
Source,
|
|
538
|
+
each [_ValidationStatus] = "PASSED"
|
|
539
|
+
)
|
|
540
|
+
```
|
|
541
|
+
|
|
542
|
+
### Strategy 3: Route to Error Table
|
|
543
|
+
|
|
544
|
+
```powerquery
|
|
545
|
+
// Create separate error output
|
|
546
|
+
let
|
|
547
|
+
Errors = Table.SelectRows(Source, each [_ValidationStatus] = "FAILED"),
|
|
548
|
+
// Add error timestamp and details
|
|
549
|
+
WithTimestamp = Table.AddColumn(Errors, "ErrorDate", each DateTime.LocalNow()),
|
|
550
|
+
WithDetails = Table.AddColumn(WithTimestamp, "ErrorDetails",
|
|
551
|
+
each Text.Combine(
|
|
552
|
+
List.Select(
|
|
553
|
+
{[_Val_CustomerKey], [_Val_Amount], [_Val_Date], [_Val_Email]},
|
|
554
|
+
each _ <> "OK"
|
|
555
|
+
),
|
|
556
|
+
"; "
|
|
557
|
+
)
|
|
558
|
+
)
|
|
559
|
+
in
|
|
560
|
+
WithDetails
|
|
561
|
+
```
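
The `ErrorDetails` text above lists only the failure codes, not which rule produced them. A sketch that prefixes each code with its column name, assuming every validation column starts with `_Val_`; the sample row is invented:

```powerquery
let
    // Hypothetical error row with two failed rules
    Errors = #table(
        {"OrderID", "_Val_CustomerKey", "_Val_Amount", "_Val_Date"},
        {{7, "NULL_VALUE", "OK", "FUTURE_DATE"}}
    ),

    // Discover the validation columns by prefix instead of listing them
    ValColumns = List.Select(
        Table.ColumnNames(Errors),
        each Text.StartsWith(_, "_Val_")
    ),

    WithDetails = Table.AddColumn(
        Errors,
        "ErrorDetails",
        (row) =>
            let
                // Names of the validation columns that did not pass
                Failed = List.Select(ValColumns, (c) => Record.Field(row, c) <> "OK")
            in
                Text.Combine(
                    List.Transform(Failed, (c) => c & "=" & Record.Field(row, c)),
                    "; "
                )
    )
in
    WithDetails
```

For the sample row this yields `_Val_CustomerKey=NULL_VALUE; _Val_Date=FUTURE_DATE`. Because the columns are found by prefix, a newly added rule shows up in the details without editing this step.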
|
|
562
|
+
|
|
563
|
+
### Strategy 4: Flag and Continue
|
|
564
|
+
|
|
565
|
+
```powerquery
|
|
566
|
+
// Keep all records but flag issues
|
|
567
|
+
= Table.AddColumn(
|
|
568
|
+
Source,
|
|
569
|
+
"DataQualityFlag",
|
|
570
|
+
each if [_ValidationStatus] = "FAILED" then "Review Required" else "Clean"
|
|
571
|
+
)
|
|
572
|
+
```
|
|
573
|
+
|
|
574
|
+
---
|
|
575
|
+
|
|
576
|
+
## BEST PRACTICES
|
|
577
|
+
|
|
578
|
+
### Do's
|
|
579
|
+
|
|
580
|
+
| Practice | Reason |
|
|
581
|
+
|----------|--------|
|
|
582
|
+
| Profile before cleaning | Understand data first |
|
|
583
|
+
| Document all rules | Traceability and maintenance |
|
|
584
|
+
| Keep error logs | Audit trail for data issues |
|
|
585
|
+
| Monitor trends | Catch degradation early |
|
|
586
|
+
| Test with edge cases | Ensure rules work correctly |
|
|
587
|
+
| Use consistent naming | _Val_, DQ_ prefixes |
|
|
588
|
+
|
|
589
|
+
### Don'ts
|
|
590
|
+
|
|
591
|
+
| Anti-Pattern | Problem |
|
|
592
|
+
|--------------|---------|
|
|
593
|
+
| Silent fixes | Hides data issues from business |
|
|
594
|
+
| Overly strict rules | Rejects valid edge cases |
|
|
595
|
+
| Validation in DAX only | Too late in pipeline |
|
|
596
|
+
| No error handling | Query failures crash reports |
|
|
597
|
+
| Ignoring nulls | Can cause wrong calculations |
|
|
598
|
+
|
|
599
|
+
---
|
|
600
|
+
|
|
601
|
+
## DATA QUALITY REPORT TEMPLATE
|
|
602
|
+
|
|
603
|
+
```markdown
|
|
604
|
+
# Data Quality Report
|
|
605
|
+
Generated: [Date]
|
|
606
|
+
Model: [Name]
|
|
607
|
+
Period: [Start Date] - [End Date]
|
|
608
|
+
|
|
609
|
+
## Executive Summary
|
|
610
|
+
- Overall Quality Score: [X]%
|
|
611
|
+
- Critical Issues: [X]
|
|
612
|
+
- Warnings: [X]
|
|
613
|
+
- Tables Analyzed: [X]
|
|
614
|
+
|
|
615
|
+
## Detailed Findings
|
|
616
|
+
|
|
617
|
+
### Table: FactSales
|
|
618
|
+
| Metric | Value | Threshold | Status |
|
|
619
|
+
|--------|-------|-----------|--------|
|
|
620
|
+
| Completeness | 98.5% | > 95% | ✅ Pass |
|
|
621
|
+
| Validity | 99.2% | > 99% | ✅ Pass |
|
|
622
|
+
| Uniqueness | 100% | = 100% | ✅ Pass |
|
|
623
|
+
|
|
624
|
+
Issues:
|
|
625
|
+
1. 1,234 null CustomerKey values
|
|
626
|
+
- Impact: Cannot link to customer dimension
|
|
627
|
+
- Recommendation: Map to "Unknown Customer"
|
|
628
|
+
|
|
629
|
+
### Table: DimCustomer
|
|
630
|
+
[Similar structure]
|
|
631
|
+
|
|
632
|
+
## Trend Analysis
|
|
633
|
+
[Quality score over last 30 days]
|
|
634
|
+
|
|
635
|
+
## Recommendations
|
|
636
|
+
1. [High Priority] Fix CustomerKey nulls
|
|
637
|
+
2. [Medium] Add validation for Amount range
|
|
638
|
+
3. [Low] Consider data cleansing for legacy records
|
|
639
|
+
|
|
640
|
+
## Next Review Date: [Date]
|
|
641
|
+
```
|
|
642
|
+
|
|
643
|
+
---
|
|
644
|
+
|
|
645
|
+
## Complexity Adaptation
|
|
646
|
+
|
|
647
|
+
Adjust depth based on `config.json → experienceLevel`:
|
|
648
|
+
- **beginner**: Step-by-step with explanations, reference library examples
|
|
649
|
+
- **intermediate**: Standard depth, explain non-obvious decisions
|
|
650
|
+
- **advanced**: Concise, skip basics, focus on edge cases and optimization
|
|
651
|
+
|
|
652
|
+
## Related Skills
|
|
653
|
+
|
|
654
|
+
- `/power-query` — Fix data issues in transformations
|
|
655
|
+
- `/testing-validation` — Automated data tests
|
|
656
|
+
- `/governance` — Data quality standards
|
|
657
|
+
|
|
658
|
+
---
|
|
659
|
+
|
|
660
|
+
## RELATED RESOURCES
|
|
661
|
+
|
|
662
|
+
- [Power Query Patterns](../power-query/SKILL.md)
|
|
663
|
+
- [DAX Best Practices](../dax/SKILL.md)
|
|
664
|
+
- [Data Modeling](../data-modeling/SKILL.md)
|