@quarri/claude-data-tools 1.0.2 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +12 -1
- package/dist/api/client.d.ts +36 -1
- package/dist/api/client.d.ts.map +1 -1
- package/dist/api/client.js +58 -2
- package/dist/api/client.js.map +1 -1
- package/dist/auth-cli.d.ts +7 -0
- package/dist/auth-cli.d.ts.map +1 -0
- package/dist/auth-cli.js +361 -0
- package/dist/auth-cli.js.map +1 -0
- package/dist/index.js +227 -17
- package/dist/index.js.map +1 -1
- package/dist/tools/definitions.d.ts.map +1 -1
- package/dist/tools/definitions.js +199 -283
- package/dist/tools/definitions.js.map +1 -1
- package/package.json +8 -2
- package/skills/SKILL_CHAINING_DEMO.md +335 -0
- package/skills/TEST_SCENARIOS.md +189 -0
- package/skills/quarri-analyze/SKILL.md +274 -0
- package/skills/quarri-chart/SKILL.md +415 -0
- package/skills/quarri-debug-connector/SKILL.md +338 -0
- package/skills/quarri-diagnose/SKILL.md +372 -0
- package/skills/quarri-explain/SKILL.md +184 -0
- package/skills/quarri-extract/SKILL.md +353 -0
- package/skills/quarri-insights/SKILL.md +328 -0
- package/skills/quarri-metric/SKILL.md +400 -0
- package/skills/quarri-query/SKILL.md +159 -0
package/skills/quarri-explain/SKILL.md
@@ -0,0 +1,184 @@
---
description: Explain SQL queries in plain English
globs:
alwaysApply: false
---

# /quarri-explain - SQL Explanation

Explain SQL queries in plain, understandable English, including what data they retrieve and how they work.

## When to Use

Use `/quarri-explain` when users need to understand queries:
- "What does this query do?"
- "Explain this SQL"
- "Help me understand this query"
- "Break down this SQL statement"

## Explanation Structure

### 1. One-Line Summary

Start with a concise summary of what the query does:
> "This query shows total revenue by product category for the last 12 months."

### 2. Data Source Explanation

Explain where the data comes from:
- Which tables are being queried
- How tables are joined (if applicable)
- What each table represents

### 3. Column Breakdown

For each column in SELECT:
- What it represents
- Any transformations or calculations
- Aliases and their meaning

### 4. Filter Explanation

For each WHERE condition:
- What records are being filtered
- The logic of each condition
- Combined effect of multiple conditions

### 5. Grouping and Aggregation

If GROUP BY is present:
- What defines each group
- How measures are aggregated
- Effect on result granularity

### 6. Ordering and Limits

Explain the result ordering:
- Sort columns and direction
- Why this ordering makes sense
- Limit effects

## Query Pattern Recognition

### Aggregation Query
```sql
SELECT region, SUM(revenue) as total_revenue
FROM sales
GROUP BY region
ORDER BY total_revenue DESC;
```

**Explanation**: "This query calculates total revenue for each region, sorted from highest to lowest revenue."

### Join Query
```sql
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
```

**Explanation**: "This query counts how many orders each customer has placed. It uses a LEFT JOIN to include customers even if they have no orders."

### Time-Based Query
```sql
SELECT DATE_TRUNC('month', order_date) as month,
       SUM(revenue) as monthly_revenue
FROM orders
WHERE order_date >= DATE '2024-01-01'
GROUP BY month
ORDER BY month;
```

**Explanation**: "This query shows monthly revenue totals starting from January 2024, organized chronologically."

### Subquery
```sql
SELECT *
FROM orders
WHERE customer_id IN (
    SELECT id FROM customers WHERE region = 'North'
);
```

**Explanation**: "This query finds all orders from customers in the North region. The inner query first identifies those customers, then the outer query retrieves their orders."

### Window Function
```sql
SELECT product, revenue,
       RANK() OVER (ORDER BY revenue DESC) as rank
FROM products;
```

**Explanation**: "This query ranks products by revenue, with the highest revenue getting rank 1. Products with equal revenue get the same rank."

## Explanation Template

```markdown
## Query Explanation

### Summary
[One-line plain English description]

### Data Sources
- **[Table name]**: [What it contains]
- **Join**: [How tables connect]

### What It Retrieves
| Column | Meaning |
|--------|---------|
| [column1] | [explanation] |
| [column2] | [explanation] |

### Filters Applied
- [Condition 1]: [Plain English meaning]
- [Condition 2]: [Plain English meaning]

### Grouping
[Explanation of aggregation level]

### Ordering
[How results are sorted and why]

### Expected Results
[Description of what the output will look like]
```

## Common SQL Elements to Explain

### Functions
- `SUM()`: "Adds up all values"
- `COUNT()`: "Counts how many records"
- `AVG()`: "Calculates the average"
- `MAX()/MIN()`: "Finds the highest/lowest value"
- `DATE_TRUNC()`: "Groups dates by [period]"
- `COALESCE()`: "Uses the first non-null value"

### Joins
- `INNER JOIN`: "Only includes records that match in both tables"
- `LEFT JOIN`: "Includes all records from the first table, matching records from the second"
- `RIGHT JOIN`: "Includes all records from the second table, matching records from the first"
- `FULL JOIN`: "Includes all records from both tables"

### Operators
- `IN`: "Matches any value in the list"
- `BETWEEN`: "Within the specified range"
- `LIKE`: "Matches the pattern"
- `IS NULL`: "Has no value"

## Error Explanation

When queries have errors, explain:
1. What the error message means
2. Where the problem likely is
3. How to fix it

Example:
> "The error 'column not found' means the query references a column that doesn't exist in the specified table. Check if 'revenue' should be 'total_revenue' based on your schema."

## Context Integration

Enhance explanations with Quarri context:
- Reference the actual schema for table descriptions
- Explain business meaning of columns
- Connect to defined metrics when applicable
package/skills/quarri-extract/SKILL.md
@@ -0,0 +1,353 @@
---
description: Build and test data extraction pipelines using dlt
globs:
alwaysApply: false
---

# /quarri-extract - Data Extraction Pipelines

Build data extraction pipelines using dlt (data load tool) for pulling data from APIs and other sources.

## When to Use

Use `/quarri-extract` when users need to set up data pipelines:
- "Set up extraction from Stripe"
- "Pull data from our Salesforce"
- "Create a pipeline for HubSpot data"
- "Build a custom API connector"

## Supported Sources

### Pre-built Connectors
- **Payments**: Stripe, Square, PayPal
- **CRM**: Salesforce, HubSpot, Pipedrive
- **Marketing**: Google Analytics, Facebook Ads, Mailchimp
- **Support**: Zendesk, Intercom, Freshdesk
- **E-commerce**: Shopify, WooCommerce
- **Databases**: PostgreSQL, MySQL, MongoDB

### Custom APIs
Build custom extractors for any REST API.

## Pipeline Architecture

### dlt Pipeline Structure

```python
import dlt
import requests

# Define the source
@dlt.source
def my_source(api_key: str):
    """Extract data from My API"""

    @dlt.resource(write_disposition="merge", primary_key="id")
    def customers():
        """Extract customer records; merge dedupes on the primary key"""
        response = requests.get(
            "https://api.example.com/customers",
            headers={"Authorization": f"Bearer {api_key}"}
        )
        yield from response.json()["data"]

    @dlt.resource(write_disposition="append")
    def events(
        # Incremental loading: dlt tracks the newest created_at between runs
        created_at=dlt.sources.incremental("created_at", initial_value="2024-01-01")
    ):
        """Extract event records"""
        response = requests.get(
            "https://api.example.com/events",
            params={"since": created_at.last_value}
        )
        yield from response.json()["data"]

    return customers, events

# Create and run pipeline
pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination="motherduck",
    dataset_name="raw"
)

# Load data
load_info = pipeline.run(my_source(api_key="..."))
```

## Extraction Workflow

### Step 1: Discover Available Sources

List available data sources:
```
quarri_list_extraction_sources
```

Returns:
- Pre-built connectors
- Required credentials for each
- Available resources (tables/endpoints)

### Step 2: Configure Credentials

Store credentials securely:
```
quarri_configure_extraction({
  source_name: "stripe",
  credentials: {
    api_key: "sk_live_..."
  },
  resources: ["customers", "payments", "subscriptions"]
})
```

### Step 3: Discover Tables

Explore available data:
```
quarri_discover_tables({
  source_name: "stripe"
})
```

Returns available endpoints/tables with:
- Field names and types
- Primary keys
- Relationships

### Step 4: Generate Pipeline Code

Generate the extraction code:

```python
# Generated dlt pipeline for Stripe
import dlt
from dlt.sources.rest_api import rest_api_resources

@dlt.source(name="stripe")
def stripe_source(api_key: str = dlt.secrets.value):
    """Extract data from Stripe API"""

    config = {
        "client": {
            "base_url": "https://api.stripe.com/v1",
            "auth": {"type": "bearer", "token": api_key}
        },
        "resources": [
            {
                "name": "customers",
                "endpoint": {"path": "customers"},
                "primary_key": "id",
                "write_disposition": "merge"
            },
            {
                "name": "payments",
                "endpoint": {
                    "path": "payment_intents",
                    "params": {"created[gte]": "{incremental.created}"}
                },
                "primary_key": "id",
                "write_disposition": "append"
            }
        ]
    }

    # Expand the declarative config into dlt resources
    yield from rest_api_resources(config)

if __name__ == "__main__":
    pipeline = dlt.pipeline(
        pipeline_name="stripe_pipeline",
        destination="motherduck",
        dataset_name="raw_stripe"
    )

    load_info = pipeline.run(stripe_source())
    print(load_info)
```

### Step 5: Test Locally

Before deploying, test the pipeline locally:

1. Save the generated code to a file
2. Set environment variables for credentials
3. Run with a small data subset
4. Verify data in MotherDuck

```bash
# Test run
python stripe_pipeline.py

# Check results
duckdb -c "SELECT * FROM raw_stripe.customers LIMIT 10"
```
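
For a quick check from Python instead of the CLI, a minimal sketch (this assumes the pipeline was pointed at a local `duckdb` destination while testing, which writes to `<pipeline_name>.duckdb`; swap the connection string for your MotherDuck database, e.g. `md:my_db`, otherwise):

```python
import duckdb

# Local test database created by dlt's duckdb destination
con = duckdb.connect("stripe_pipeline.duckdb")

# Spot-check that the load produced rows
print(con.sql("SELECT COUNT(*) AS n FROM raw_stripe.customers").fetchall())
print(con.sql("SELECT * FROM raw_stripe.customers LIMIT 10").fetchall())
```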

### Step 6: Deploy to Quarri

Submit the validated pipeline:
```
quarri_schedule_extraction({
  source_name: "stripe",
  pipeline_code: "...",
  schedule: "0 2 * * *", // Daily at 2 AM
  resources: ["customers", "payments"]
})
```

## Custom API Extraction

For APIs without pre-built connectors:

### Define the API Configuration

```python
config = {
    "client": {
        "base_url": "https://api.example.com",
        "auth": {
            "type": "api_key",
            "api_key": dlt.secrets["api_key"],
            "location": "header",
            "name": "X-API-Key"
        },
        "paginator": {
            "type": "page_number",
            "page_param": "page",
            "total_path": "meta.total_pages"
        }
    },
    "resources": [
        {
            "name": "users",
            "endpoint": {
                "path": "users",
                "params": {
                    "per_page": 100
                }
            },
            "primary_key": "id"
        },
        {
            "name": "orders",
            "endpoint": {
                "path": "orders",
                "params": {
                    "updated_since": "{incremental.updated_at}"
                }
            },
            "primary_key": "order_id",
            "write_disposition": "merge"
        }
    ]
}
```
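
A hedged sketch of turning a config like this into a runnable pipeline with dlt's `rest_api_source` helper (the pipeline and dataset names are illustrative):

```python
import dlt
from dlt.sources.rest_api import rest_api_source

# Wrap the config dict defined above in a REST API source and load it
pipeline = dlt.pipeline(
    pipeline_name="example_api_pipeline",
    destination="motherduck",
    dataset_name="raw_example",
)
load_info = pipeline.run(rest_api_source(config))
print(load_info)
```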

### Handle Pagination Types

**Offset Pagination**
```python
"paginator": {
    "type": "offset",
    "limit": 100,
    "offset_param": "skip",
    "limit_param": "take"
}
```

**Cursor Pagination**
```python
"paginator": {
    "type": "cursor",
    "cursor_path": "meta.next_cursor",
    "cursor_param": "cursor"
}
```

**Link Header Pagination**
```python
"paginator": {
    "type": "link_header"
}
```

### Handle Authentication Types

**Bearer Token**
```python
"auth": {"type": "bearer", "token": dlt.secrets["token"]}
```

**API Key (Header)**
```python
"auth": {"type": "api_key", "api_key": "...", "location": "header", "name": "X-API-Key"}
```

**API Key (Query)**
```python
"auth": {"type": "api_key", "api_key": "...", "location": "query", "name": "api_key"}
```

**OAuth 2.0**
```python
"auth": {
    "type": "oauth2_client_credentials",
    "client_id": "...",
    "client_secret": "...",
    "token_url": "https://api.example.com/oauth/token"
}
```
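
Rather than hard-coding credentials in these fragments, dlt can resolve them from its secrets store. A minimal sketch, assuming an illustrative `sources.my_api.*` key path in `.dlt/secrets.toml` (or the equivalent environment variables):

```python
import dlt

# Resolved from .dlt/secrets.toml, or from environment variables using
# dlt's double-underscore convention, e.g. SOURCES__MY_API__CLIENT_ID
auth = {
    "type": "oauth2_client_credentials",
    "client_id": dlt.secrets["sources.my_api.client_id"],
    "client_secret": dlt.secrets["sources.my_api.client_secret"],
    "token_url": "https://api.example.com/oauth/token"
}
```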

## Incremental Loading

Configure incremental extraction to avoid re-processing:

```python
import dlt
import requests

@dlt.resource(write_disposition="merge", primary_key="id")
def orders(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01")
):
    """Extract orders incrementally"""
    response = requests.get(
        "https://api.example.com/orders",
        params={"updated_since": updated_at.last_value}
    )
    yield from response.json()["data"]
```
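
On the first run this backfills everything from `initial_value` onward; dlt then stores the highest `updated_at` seen in the pipeline state, so later runs request only newer records via `updated_at.last_value`. A minimal sketch of running the resource (names are illustrative):

```python
import dlt

pipeline = dlt.pipeline(
    pipeline_name="orders_pipeline",
    destination="motherduck",
    dataset_name="raw_orders"
)

# Subsequent runs resume from the cursor stored in the pipeline state
load_info = pipeline.run(orders)
print(load_info)
```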

## Error Handling

Handle common extraction errors:

```python
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def fetch_with_retry(url, headers):
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json()
```
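
A minimal sketch of using such a helper inside a resource (the endpoint and payload shape are illustrative, not a real API):

```python
import dlt

@dlt.resource(write_disposition="merge", primary_key="id")
def customers():
    """Yield customer records, retrying transient HTTP failures."""
    # fetch_with_retry (defined above) retries up to 3 times with
    # exponential backoff before letting the error propagate to dlt
    payload = fetch_with_retry(
        "https://api.example.com/customers",  # hypothetical endpoint
        headers={"Authorization": "Bearer ..."}
    )
    yield from payload["data"]
```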

## Output Format

````markdown
## Extraction Pipeline: [Source Name]

### Configuration
- Source: [Name]
- Resources: [List]
- Schedule: [Cron expression]

### Generated Code
```python
[Complete dlt pipeline code]
```

### Testing Instructions
1. [Step to test locally]
2. [Step to verify data]

### Deployment
[How to deploy to Quarri for scheduled runs]
````