@quarri/claude-data-tools 1.0.2 → 1.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,184 @@
---
description: Explain SQL queries in plain English
globs:
alwaysApply: false
---

# /quarri-explain - SQL Explanation

Explain SQL queries in plain, understandable English, including what data they retrieve and how they work.

## When to Use

Use `/quarri-explain` when users need to understand queries:
- "What does this query do?"
- "Explain this SQL"
- "Help me understand this query"
- "Break down this SQL statement"

## Explanation Structure

### 1. One-Line Summary

Start with a concise summary of what the query does:
> "This query shows total revenue by product category for the last 12 months."

### 2. Data Source Explanation

Explain where the data comes from:
- Which tables are being queried
- How tables are joined (if applicable)
- What each table represents

### 3. Column Breakdown

For each column in SELECT:
- What it represents
- Any transformations or calculations
- Aliases and their meaning

### 4. Filter Explanation

For each WHERE condition:
- What records are being filtered
- The logic of each condition
- Combined effect of multiple conditions

### 5. Grouping and Aggregation

If GROUP BY is present:
- What defines each group
- How measures are aggregated
- Effect on result granularity

### 6. Ordering and Limits

Explain the result ordering:
- Sort columns and direction
- Why this ordering makes sense
- Limit effects

## Query Pattern Recognition

### Aggregation Query
```sql
SELECT region, SUM(revenue) as total_revenue
FROM sales
GROUP BY region
ORDER BY total_revenue DESC;
```

**Explanation**: "This query calculates total revenue for each region, sorted from highest to lowest revenue."

### Join Query
```sql
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
```

**Explanation**: "This query counts how many orders each customer has placed. It uses a LEFT JOIN to include customers even if they have no orders."

### Time-Based Query
```sql
SELECT DATE_TRUNC('month', order_date) as month,
       SUM(revenue) as monthly_revenue
FROM orders
WHERE order_date >= DATE '2024-01-01'
GROUP BY month
ORDER BY month;
```

**Explanation**: "This query shows monthly revenue totals starting from January 2024, organized chronologically."

### Subquery
```sql
SELECT *
FROM orders
WHERE customer_id IN (
    SELECT id FROM customers WHERE region = 'North'
);
```

**Explanation**: "This query finds all orders from customers in the North region. The inner query first identifies those customers, then the outer query retrieves their orders."

### Window Function
```sql
SELECT product, revenue,
       RANK() OVER (ORDER BY revenue DESC) as rank
FROM products;
```

**Explanation**: "This query ranks products by revenue, with the highest revenue getting rank 1. Products with equal revenue get the same rank."

## Explanation Template

```markdown
## Query Explanation

### Summary
[One-line plain English description]

### Data Sources
- **[Table name]**: [What it contains]
- **Join**: [How tables connect]

### What It Retrieves
| Column | Meaning |
|--------|---------|
| [column1] | [explanation] |
| [column2] | [explanation] |

### Filters Applied
- [Condition 1]: [Plain English meaning]
- [Condition 2]: [Plain English meaning]

### Grouping
[Explanation of aggregation level]

### Ordering
[How results are sorted and why]

### Expected Results
[Description of what the output will look like]
```

## Common SQL Elements to Explain

### Functions
- `SUM()`: "Adds up all values"
- `COUNT()`: "Counts how many records"
- `AVG()`: "Calculates the average"
- `MAX()/MIN()`: "Finds the highest/lowest value"
- `DATE_TRUNC()`: "Groups dates by [period]"
- `COALESCE()`: "Uses the first non-null value"

### Joins
- `INNER JOIN`: "Only includes records that match in both tables"
- `LEFT JOIN`: "Includes all records from the first table, matching records from the second"
- `RIGHT JOIN`: "Includes all records from the second table, matching records from the first"
- `FULL JOIN`: "Includes all records from both tables"

### Operators
- `IN`: "Matches any value in the list"
- `BETWEEN`: "Within the specified range"
- `LIKE`: "Matches the pattern"
- `IS NULL`: "Has no value"

## Error Explanation

When queries have errors, explain:
1. What the error message means
2. Where the problem likely is
3. How to fix it

Example:
> "The error 'column not found' means the query references a column that doesn't exist in the specified table. Check if 'revenue' should be 'total_revenue' based on your schema."

## Context Integration

Enhance explanations with Quarri context:
- Reference the actual schema for table descriptions
- Explain business meaning of columns
- Connect to defined metrics when applicable
@@ -0,0 +1,353 @@
---
description: Build and test data extraction pipelines using dlt
globs:
alwaysApply: false
---

# /quarri-extract - Data Extraction Pipelines

Build data extraction pipelines using dlt (data load tool) for pulling data from APIs and other sources.

## When to Use

Use `/quarri-extract` when users need to set up data pipelines:
- "Set up extraction from Stripe"
- "Pull data from our Salesforce"
- "Create a pipeline for HubSpot data"
- "Build a custom API connector"

## Supported Sources

### Pre-built Connectors
- **Payments**: Stripe, Square, PayPal
- **CRM**: Salesforce, HubSpot, Pipedrive
- **Marketing**: Google Analytics, Facebook Ads, Mailchimp
- **Support**: Zendesk, Intercom, Freshdesk
- **E-commerce**: Shopify, WooCommerce
- **Databases**: PostgreSQL, MySQL, MongoDB

### Custom APIs
Build custom extractors for any REST API.

## Pipeline Architecture

### dlt Pipeline Structure

```python
import dlt
import requests

# Define the source
@dlt.source
def my_source(api_key: str):
    """Extract data from My API"""

    @dlt.resource(write_disposition="merge", primary_key="id")
    def customers():
        """Extract customer records"""
        response = requests.get(
            "https://api.example.com/customers",
            headers={"Authorization": f"Bearer {api_key}"}
        )
        yield from response.json()["data"]

    @dlt.resource(write_disposition="append")
    def events(created_at=dlt.sources.incremental("created_at")):
        """Extract event records incrementally"""
        response = requests.get(
            "https://api.example.com/events",
            params={"since": created_at.last_value}
        )
        yield from response.json()["data"]

    return customers, events

# Create and run pipeline
pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination="motherduck",
    dataset_name="raw"
)

# Load data
load_info = pipeline.run(my_source(api_key="..."))
```

## Extraction Workflow

### Step 1: Discover Available Sources

List available data sources:
```
quarri_list_extraction_sources
```

Returns:
- Pre-built connectors
- Required credentials for each
- Available resources (tables/endpoints)

### Step 2: Configure Credentials

Store credentials securely:
```
quarri_configure_extraction({
  source_name: "stripe",
  credentials: {
    api_key: "sk_live_..."
  },
  resources: ["customers", "payments", "subscriptions"]
})
```

### Step 3: Discover Tables

Explore available data:
```
quarri_discover_tables({
  source_name: "stripe"
})
```

Returns available endpoints/tables with:
- Field names and types
- Primary keys
- Relationships

### Step 4: Generate Pipeline Code

Generate the extraction code:

```python
# Generated dlt pipeline for Stripe
import dlt
from dlt.sources.rest_api import rest_api_source

@dlt.source(name="stripe")
def stripe_source(api_key: str = dlt.secrets.value):
    """Extract data from Stripe API"""

    config = {
        "client": {
            "base_url": "https://api.stripe.com/v1",
            "auth": {"type": "bearer", "token": api_key}
        },
        "resources": [
            {
                "name": "customers",
                "endpoint": {"path": "customers"},
                "primary_key": "id",
                "write_disposition": "merge"
            },
            {
                "name": "payments",
                "endpoint": {
                    "path": "payment_intents",
                    "params": {"created[gte]": "{incremental.created}"}
                },
                "primary_key": "id",
                "write_disposition": "append"
            }
        ]
    }

    return rest_api_source(config)

if __name__ == "__main__":
    pipeline = dlt.pipeline(
        pipeline_name="stripe_pipeline",
        destination="motherduck",
        dataset_name="raw_stripe"
    )

    load_info = pipeline.run(stripe_source())
    print(load_info)
```

### Step 5: Test Locally

Before deploying, test the pipeline locally:

1. Save the generated code to a file
2. Set environment variables for credentials
3. Run with a small data subset (see the sketch below)
4. Verify data in MotherDuck

```bash
# Test run
python stripe_pipeline.py

# Check results (requires a MotherDuck token in the environment)
duckdb "md:" -c "SELECT * FROM raw_stripe.customers LIMIT 10"
```

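For step 3, a quick smoke test can cap the volume and write to a local DuckDB file instead of MotherDuck. This is an illustrative sketch, not part of the package: it assumes the generated code from Step 4 was saved as `stripe_pipeline.py`, that credentials are available via `.dlt/secrets.toml` or environment variables, and it uses dlt's `add_limit()` to keep the run small:

```python
# Local smoke test - a sketch, assuming stripe_pipeline.py from Step 4 exists.
import dlt
from stripe_pipeline import stripe_source

# Use a local DuckDB file instead of MotherDuck for the test run
pipeline = dlt.pipeline(
    pipeline_name="stripe_pipeline_test",
    destination="duckdb",
    dataset_name="raw_stripe_test",
)

# add_limit() caps how much data each resource yields, so the test stays small
load_info = pipeline.run(stripe_source().add_limit(10))
print(load_info)
```
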
### Step 6: Deploy to Quarri

Submit the validated pipeline:
```
quarri_schedule_extraction({
  source_name: "stripe",
  pipeline_code: "...",
  schedule: "0 2 * * *", // Daily at 2 AM
  resources: ["customers", "payments"]
})
```

## Custom API Extraction

For APIs without pre-built connectors:

### Define the API Configuration

+ config = {
205
+ "client": {
206
+ "base_url": "https://api.example.com",
207
+ "auth": {
208
+ "type": "api_key",
209
+ "api_key": dlt.secrets["api_key"],
210
+ "location": "header",
211
+ "name": "X-API-Key"
212
+ },
213
+ "paginator": {
214
+ "type": "page_number",
215
+ "page_param": "page",
216
+ "total_path": "meta.total_pages"
217
+ }
218
+ },
219
+ "resources": [
220
+ {
221
+ "name": "users",
222
+ "endpoint": {
223
+ "path": "users",
224
+ "params": {
225
+ "per_page": 100
226
+ }
227
+ },
228
+ "primary_key": "id"
229
+ },
230
+ {
231
+ "name": "orders",
232
+ "endpoint": {
233
+ "path": "orders",
234
+ "params": {
235
+ "updated_since": "{incremental.updated_at}"
236
+ }
237
+ },
238
+ "primary_key": "order_id",
239
+ "write_disposition": "merge"
240
+ }
241
+ ]
242
+ }
243
+ ```
244
+
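The config above is plain data. As an illustrative sketch (not part of the package), it can be handed to the same `rest_api_source` helper used in the generated Stripe code and run like any other pipeline; the pipeline and dataset names below are made up, and `config` is assumed to be the dictionary defined above in the same module:

```python
# Sketch: running the declarative config defined above.
import dlt
from dlt.sources.rest_api import rest_api_source

pipeline = dlt.pipeline(
    pipeline_name="example_api_pipeline",
    destination="motherduck",
    dataset_name="raw_example_api",
)

# rest_api_source() turns the config dict into dlt resources
load_info = pipeline.run(rest_api_source(config))
print(load_info)
```
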
### Handle Pagination Types

**Offset Pagination**
```python
"paginator": {
    "type": "offset",
    "limit": 100,
    "offset_param": "skip",
    "limit_param": "take"
}
```

**Cursor Pagination**
```python
"paginator": {
    "type": "cursor",
    "cursor_path": "meta.next_cursor",
    "cursor_param": "cursor"
}
```

**Link Header Pagination**
```python
"paginator": {
    "type": "link_header"
}
```

### Handle Authentication Types

**Bearer Token**
```python
"auth": {"type": "bearer", "token": dlt.secrets["token"]}
```

**API Key (Header)**
```python
"auth": {"type": "api_key", "api_key": "...", "location": "header", "name": "X-API-Key"}
```

**API Key (Query)**
```python
"auth": {"type": "api_key", "api_key": "...", "location": "query", "name": "api_key"}
```

**OAuth 2.0**
```python
"auth": {
    "type": "oauth2_client_credentials",
    "client_id": "...",
    "client_secret": "...",
    "token_url": "https://api.example.com/oauth/token"
}
```

## Incremental Loading

Configure incremental extraction to avoid re-processing:

```python
import dlt
import requests

@dlt.resource(write_disposition="merge", primary_key="id")
def orders(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01")
):
    """Extract orders incrementally"""
    response = requests.get(
        "https://api.example.com/orders",
        params={"updated_since": updated_at.last_value}
    )
    yield from response.json()["data"]
```

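dlt stores the incremental cursor in the pipeline's state, so repeated runs only request records newer than what was already loaded. A small sketch (assuming the `orders` resource above and a reachable API; the pipeline name is illustrative):

```python
# Sketch: the same incremental resource run twice.
import dlt

pipeline = dlt.pipeline(
    pipeline_name="orders_pipeline",
    destination="duckdb",
    dataset_name="raw",
)

# First run fetches everything since initial_value ("2024-01-01")
pipeline.run(orders())

# Later runs restore the saved cursor, so updated_at.last_value is the highest
# value seen so far and only newer records are requested
pipeline.run(orders())
```
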
## Error Handling

Handle common extraction errors:

```python
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def fetch_with_retry(url, headers):
    """GET a URL, retrying up to 3 times with exponential backoff."""
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json()
```

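A resource can then call the helper instead of a bare `requests.get`. This is a sketch, with the endpoint, secret handling, and response shape assumed to match the earlier examples:

```python
# Sketch: using fetch_with_retry inside a dlt resource.
import dlt

@dlt.resource(write_disposition="append")
def events(api_key: str = dlt.secrets.value):
    """Extract events, retrying transient HTTP failures."""
    data = fetch_with_retry(
        "https://api.example.com/events",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    yield from data["data"]
```
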
## Output Format

````markdown
## Extraction Pipeline: [Source Name]

### Configuration
- Source: [Name]
- Resources: [List]
- Schedule: [Cron expression]

### Generated Code
```python
[Complete dlt pipeline code]
```

### Testing Instructions
1. [Step to test locally]
2. [Step to verify data]

### Deployment
[How to deploy to Quarri for scheduled runs]
````