@quarri/claude-data-tools 1.0.2 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +12 -1
- package/dist/api/client.d.ts +36 -1
- package/dist/api/client.d.ts.map +1 -1
- package/dist/api/client.js +58 -2
- package/dist/api/client.js.map +1 -1
- package/dist/auth-cli.d.ts +7 -0
- package/dist/auth-cli.d.ts.map +1 -0
- package/dist/auth-cli.js +361 -0
- package/dist/auth-cli.js.map +1 -0
- package/dist/index.js +227 -17
- package/dist/index.js.map +1 -1
- package/dist/tools/definitions.d.ts.map +1 -1
- package/dist/tools/definitions.js +199 -283
- package/dist/tools/definitions.js.map +1 -1
- package/package.json +8 -2
- package/skills/SKILL_CHAINING_DEMO.md +335 -0
- package/skills/TEST_SCENARIOS.md +189 -0
- package/skills/quarri-analyze/SKILL.md +274 -0
- package/skills/quarri-chart/SKILL.md +415 -0
- package/skills/quarri-debug-connector/SKILL.md +338 -0
- package/skills/quarri-diagnose/SKILL.md +372 -0
- package/skills/quarri-explain/SKILL.md +184 -0
- package/skills/quarri-extract/SKILL.md +353 -0
- package/skills/quarri-insights/SKILL.md +328 -0
- package/skills/quarri-metric/SKILL.md +400 -0
- package/skills/quarri-query/SKILL.md +159 -0
@@ -0,0 +1,338 @@ package/skills/quarri-debug-connector/SKILL.md
---
description: Debug and heal failing data extraction connectors
globs:
alwaysApply: false
---

# /quarri-debug-connector - Connector Healing

Debug and fix failing data extraction connectors by retrieving their code, running it locally, identifying errors, and submitting healed versions.

## When to Use

Use `/quarri-debug-connector` when:
- A scheduled extraction job is failing
- Users report data not updating
- Connector logs show errors
- A pipeline needs updates for API changes

## Debugging Workflow

### Step 1: Identify the Problem

Get information about the failing connector:

```
quarri_get_connector_logs({
  connector_id: "stripe_connector_123",
  lines: 100
})
```

This returns:
- Recent execution logs
- Error messages
- Last successful run timestamp
- Configuration details

### Step 2: Retrieve Connector Code

Get the current connector source:

```
quarri_get_connector_code({
  connector_id: "stripe_connector_123"
})
```

Returns the full Python pipeline code.

### Step 3: Analyze the Error

Common error categories:

#### Authentication Errors
```
401 Unauthorized
403 Forbidden
Invalid API key
Token expired
```

**Diagnosis**: Check if API credentials are valid/expired

**Fix**: Update credentials or refresh OAuth tokens

#### API Changes
```
404 Not Found - endpoint /v1/old_endpoint
Field 'old_field' not found
```

**Diagnosis**: API version changed or endpoint deprecated

**Fix**: Update endpoint paths and field mappings

#### Rate Limiting
```
429 Too Many Requests
Rate limit exceeded
```

**Diagnosis**: Too many API calls

**Fix**: Add rate limiting, backoff logic

#### Data Format Changes
```
KeyError: 'expected_field'
TypeError: cannot unpack
JSONDecodeError
```

**Diagnosis**: Response structure changed

**Fix**: Update parsing logic for the new format

#### Network/Timeout
```
ConnectionError
Timeout
DNS resolution failed
```

**Diagnosis**: Network issues or long-running requests

**Fix**: Add retry logic, increase timeouts
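
The categories above can be sketched as a simple log-line classifier. This is purely illustrative — the category names and regex patterns are assumptions drawn from the examples in this section, not part of the Quarri API:

```python
import re

# Ordered (pattern, category) pairs mirroring the error taxonomy above.
ERROR_PATTERNS = [
    (r"401 Unauthorized|403 Forbidden|Invalid API key|Token expired", "authentication"),
    (r"429 Too Many Requests|Rate limit exceeded", "rate_limiting"),
    (r"404 Not Found|not found", "api_change"),
    (r"KeyError|TypeError|JSONDecodeError", "data_format"),
    (r"ConnectionError|Timeout|DNS resolution failed", "network"),
]

def classify_error(log_line: str) -> str:
    """Map a connector log line to one of the error categories above."""
    for pattern, category in ERROR_PATTERNS:
        if re.search(pattern, log_line):
            return category
    return "unknown"
```

A classifier like this lets a debugging session jump straight to the matching fix pattern in Step 5.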

### Step 4: Test Locally

Run the connector locally to reproduce and fix the issue:

1. **Set up environment**:
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate
pip install dlt requests
```

2. **Set credentials**:
```bash
export STRIPE_API_KEY="sk_test_..."
# or use .dlt/secrets.toml
```

3. **Run with debugging**:
```python
import dlt
import logging

# Enable detailed logging
logging.basicConfig(level=logging.DEBUG)

# Run pipeline with small dataset
pipeline = dlt.pipeline(
    pipeline_name="debug_stripe",
    destination="duckdb",  # Local for testing
    dataset_name="test"
)

# Test specific resource; stripe_source() comes from the
# connector code retrieved in Step 2
source = stripe_source()
load_info = pipeline.run(source.with_resources("customers").add_limit(10))
print(load_info)
```

4. **Iterate on fixes**:
   - Make changes to the code
   - Test with small data samples
   - Verify data loads correctly

### Step 5: Apply Fixes

Common fix patterns:

#### Add Retry Logic
```python
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=60)
)
def fetch_data(url, headers):
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json()
```

#### Handle Rate Limiting
```python
import time
import requests

def rate_limited_fetch(url, headers, max_retries=3):
    """Retry on 429 using the server-provided Retry-After, up to max_retries."""
    response = requests.get(url, headers=headers)

    if response.status_code == 429 and max_retries > 0:
        retry_after = int(response.headers.get('Retry-After', 60))
        time.sleep(retry_after)
        return rate_limited_fetch(url, headers, max_retries - 1)

    return response.json()
```

#### Update Field Mappings
```python
# Old
data['old_field_name']

# New - with fallback for backwards compatibility
data.get('new_field_name', data.get('old_field_name'))
```
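
One way to apply this fallback pattern across a whole record is a small normalization helper. A minimal sketch — the field names and aliases here are placeholder assumptions, not fields from a real connector:

```python
# Maps canonical field names to possible source keys, newest first.
FIELD_ALIASES = {
    "amount": ["amount_total", "amount"],     # e.g. API renamed amount -> amount_total
    "customer": ["customer_id", "customer"],
}

def normalize_record(raw: dict) -> dict:
    """Return a record keyed by canonical names, trying each alias in order."""
    out = {}
    for canonical, aliases in FIELD_ALIASES.items():
        out[canonical] = next((raw[k] for k in aliases if k in raw), None)
    return out
```

Centralizing the aliases in one table keeps the pipeline working through an API rename without touching every call site.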

#### Handle Missing Data
```python
from datetime import datetime

import dlt

@dlt.resource
def customers():
    for customer in fetch_customers():
        yield {
            'id': customer['id'],
            'name': customer.get('name', ''),  # Handle missing
            'email': customer.get('email'),    # Allow null
            'created_at': customer.get('created_at', datetime.now().isoformat())
        }
```

#### Update API Version
```python
# Old
base_url = "https://api.example.com/v1"

# New
base_url = "https://api.example.com/v2"

# Update endpoints
config = {
    "resources": [
        {
            "name": "customers",
            "endpoint": {
                "path": "customers",  # was: "v1/customers"
                "params": {"version": "2024-01"}
            }
        }
    ]
}
```

### Step 6: Validate Fix

Before submitting, validate thoroughly:

1. **Run full extraction** (not just a sample):
```python
load_info = pipeline.run(source)
print(load_info)  # summary of load packages and jobs
```

2. **Verify data quality**:
```sql
-- Check row counts
SELECT COUNT(*) FROM test.customers;

-- Check for nulls in required fields
SELECT COUNT(*) FROM test.customers WHERE id IS NULL;

-- Verify date ranges
SELECT MIN(created_at), MAX(created_at) FROM test.customers;
```

3. **Compare with previous data**:
```sql
-- Ensure no data loss
SELECT
  'before' as source, COUNT(*) FROM production.customers
UNION ALL
SELECT
  'after' as source, COUNT(*) FROM test.customers;
```

### Step 7: Submit Healed Code

Once validated, submit the fix:

```
quarri_update_connector_code({
  connector_id: "stripe_connector_123",
  pipeline_code: "[full fixed Python code]",
  change_summary: "Fixed rate limiting by adding exponential backoff"
})
```

## Error Pattern Reference

### Authentication
| Error | Cause | Fix |
|-------|-------|-----|
| 401 Unauthorized | Invalid/expired credentials | Update API key |
| 403 Forbidden | Insufficient permissions | Check API scopes |
| OAuth token expired | Token TTL exceeded | Implement refresh flow |

### Rate Limiting
| Error | Cause | Fix |
|-------|-------|-----|
| 429 Too Many Requests | Exceeded rate limit | Add backoff/throttling |
| Quota exceeded | Daily/monthly limit hit | Batch requests, spread over time |

### Data Format
| Error | Cause | Fix |
|-------|-------|-----|
| KeyError | Missing field | Use .get() with default |
| TypeError | Wrong data type | Add type conversion |
| JSONDecodeError | Invalid JSON response | Handle non-JSON responses |

### Network
| Error | Cause | Fix |
|-------|-------|-----|
| ConnectionError | Network failure | Add retry logic |
| Timeout | Request too slow | Increase timeout, paginate |
| DNS error | Resolution failure | Check URL, add retry |

## Output Format

```markdown
## Connector Debug Report: [Connector Name]

### Error Summary
- **Status**: [Failing/Fixed]
- **Last Success**: [Timestamp]
- **Error Type**: [Category]
- **Error Message**: [Actual error]

### Root Cause Analysis
[Explanation of why the connector is failing]

### Fix Applied
```python
[Code changes - before/after]
```

### Validation Results
- Test run: [Success/Failure]
- Records loaded: [Count]
- Data quality: [Pass/Fail with details]

### Next Steps
- [ ] Submit healed code
- [ ] Monitor next scheduled run
- [ ] [Any additional actions]
```

## Best Practices

1. **Always test locally first** - Don't submit untested fixes
2. **Keep change logs** - Document what changed and why
3. **Preserve backwards compatibility** - Handle old and new formats when possible
4. **Add defensive coding** - Handle missing fields, rate limits, retries
5. **Monitor after fix** - Verify the next scheduled run succeeds

@@ -0,0 +1,372 @@ package/skills/quarri-diagnose/SKILL.md
---
description: Root cause analysis using metric trees to diagnose KPI changes
globs:
alwaysApply: false
---

# /quarri-diagnose - Root Cause Analysis

Perform systematic root cause analysis using metric trees to diagnose why KPIs changed.

## When to Use

Use `/quarri-diagnose` when users ask diagnostic questions:
- "Why did revenue drop last month?"
- "What's causing churn to increase?"
- "Why is conversion rate declining?"
- "What's driving the growth in customer acquisition?"

This skill is different from `/quarri-analyze`:
- `/quarri-analyze`: General analysis with statistics and insights
- `/quarri-diagnose`: Focused root cause investigation using metric decomposition

## Diagnostic Workflow

```
1. Identify the metric of concern
        ↓
2. Build/retrieve metric tree (decompose KPI)
        ↓
3. Query each component for current vs previous period
        ↓
4. Calculate period-over-period changes
        ↓
5. Identify component with largest negative impact
        ↓
6. Drill down recursively if needed
        ↓
7. Generate root cause hypothesis with evidence
        ↓
8. Recommend actions to address root cause
```

## Step 1: Identify the Metric

Parse the user's question to determine:
- **Target metric**: What KPI are they concerned about?
- **Direction**: Is it a drop, increase, or unexpected behavior?
- **Time period**: When did this happen? What's the comparison period?

**Examples:**
- "Why did revenue drop?" → Metric: Revenue, Direction: Decrease
- "What's causing churn to increase?" → Metric: Churn Rate, Direction: Increase
- "Conversion tanked last week" → Metric: Conversion Rate, Direction: Decrease, Period: Last week
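
A minimal sketch of this parsing step. The keyword tables and metric names below are illustrative assumptions, not the skill's actual implementation:

```python
# Keyword -> canonical metric name (assumed examples from this section).
METRIC_KEYWORDS = {
    "revenue": "Revenue",
    "churn": "Churn Rate",
    "conversion": "Conversion Rate",
}
DECREASE_WORDS = ("drop", "declin", "tanked", "down")
INCREASE_WORDS = ("increase", "growth", "spike")

def identify_metric(question: str) -> dict:
    """Extract the target metric and direction from a diagnostic question."""
    q = question.lower()
    metric = next((name for kw, name in METRIC_KEYWORDS.items() if kw in q), None)
    if any(w in q for w in DECREASE_WORDS):
        direction = "decrease"
    elif any(w in q for w in INCREASE_WORDS):
        direction = "increase"
    else:
        direction = "unknown"
    return {"metric": metric, "direction": direction}
```

In practice an LLM does this parsing; the sketch just makes the extracted fields concrete.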

## Step 2: Build the Metric Tree

Either retrieve an existing metric tree or build one dynamically:

### Revenue Tree
```
Revenue = Customers × Orders/Customer × Revenue/Order

├── Customers
│   ├── New Customers
│   └── Returning Customers
│
├── Orders per Customer
│   └── Purchase frequency
│
└── Revenue per Order
    ├── Units per order
    └── Price per unit
```

### Conversion Rate Tree
```
Conversion = Conversions / Visitors

├── Visitors
│   ├── Organic traffic
│   ├── Paid traffic
│   └── Direct traffic
│
└── Conversions (by funnel stage)
    ├── View → Add to Cart
    ├── Cart → Checkout
    └── Checkout → Purchase
```

### Churn Tree
```
Churn Rate = Churned Customers / Total Customers

├── Churned Customers
│   ├── By tenure (new vs established)
│   ├── By segment (enterprise vs SMB)
│   └── By product usage
│
└── Total Customers
    └── (Denominator context)
```

## Step 3: Query Components

Generate SQL to calculate each component for current and previous periods:

```sql
-- Root cause analysis: Revenue components
WITH current_period AS (
  SELECT
    COUNT(DISTINCT customer_id) as customers,
    COUNT(DISTINCT CASE WHEN is_new_customer THEN customer_id END) as new_customers,
    COUNT(DISTINCT CASE WHEN NOT is_new_customer THEN customer_id END) as returning_customers,
    COUNT(*) as orders,
    COUNT(*)::float / NULLIF(COUNT(DISTINCT customer_id), 0) as orders_per_customer,
    SUM(revenue) as revenue,
    SUM(revenue)::float / NULLIF(COUNT(*), 0) as revenue_per_order,
    SUM(units)::float / NULLIF(COUNT(*), 0) as units_per_order,
    SUM(revenue)::float / NULLIF(SUM(units), 0) as price_per_unit
  FROM quarri.bridge
  WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
),
previous_period AS (
  SELECT
    COUNT(DISTINCT customer_id) as customers,
    COUNT(DISTINCT CASE WHEN is_new_customer THEN customer_id END) as new_customers,
    COUNT(DISTINCT CASE WHEN NOT is_new_customer THEN customer_id END) as returning_customers,
    COUNT(*) as orders,
    COUNT(*)::float / NULLIF(COUNT(DISTINCT customer_id), 0) as orders_per_customer,
    SUM(revenue) as revenue,
    SUM(revenue)::float / NULLIF(COUNT(*), 0) as revenue_per_order,
    SUM(units)::float / NULLIF(COUNT(*), 0) as units_per_order,
    SUM(revenue)::float / NULLIF(SUM(units), 0) as price_per_unit
  FROM quarri.bridge
  WHERE order_date >= CURRENT_DATE - INTERVAL '60 days'
    AND order_date < CURRENT_DATE - INTERVAL '30 days'
)
SELECT
  metric,
  previous_value,
  current_value,
  change_pct,
  impact_pct
FROM (
  SELECT 'customers' as metric, p.customers as previous_value, c.customers as current_value,
    (c.customers - p.customers)::float / NULLIF(p.customers, 0) * 100 as change_pct,
    ((c.customers - p.customers) * p.orders_per_customer * p.revenue_per_order)::float / NULLIF(p.revenue, 0) * 100 as impact_pct
  FROM current_period c, previous_period p
  UNION ALL
  -- Continue for all components...
) metrics
ORDER BY ABS(impact_pct) DESC;
```
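
The change arithmetic in the query can be sketched in Python as well — a toy version of the comparison the SQL produces, with component names as placeholders and the zero-denominator guarded the way `NULLIF` guards it:

```python
def compare_periods(previous: dict, current: dict) -> list:
    """Build metric/previous/current/change_pct rows like the query returns,
    sorted by absolute change (a stand-in for impact ordering)."""
    rows = []
    for metric in previous:
        p, c = previous[metric], current[metric]
        # Guard the zero denominator, as NULLIF does in the SQL.
        pct = (c - p) / p * 100 if p else None
        rows.append({"metric": metric, "previous": p, "current": c, "change_pct": pct})
    rows.sort(key=lambda r: abs(r["change_pct"] or 0), reverse=True)
    return rows
```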

## Step 4: Calculate Impact Attribution

For each component, calculate its contribution to the overall change:

### Multiplicative Decomposition
For `Revenue = A × B × C`:

```
Total Change = Revenue_current - Revenue_previous

Impact of A = (A_curr - A_prev) × B_prev × C_prev
Impact of B = A_curr × (B_curr - B_prev) × C_prev
Impact of C = A_curr × B_curr × (C_curr - C_prev)

(Sum of impacts = Total Change exactly — the cross terms telescope)
```

### Additive Decomposition
For `Revenue = A + B + C`:

```
Total Change = Revenue_current - Revenue_previous

Impact of A = A_curr - A_prev
Impact of B = B_curr - B_prev
Impact of C = C_curr - C_prev

(Sum of impacts = Total Change exactly)
```
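
The multiplicative formulas can be written out directly (a sketch; the numbers in the usage below are the illustrative revenue figures used in this skill, not real data):

```python
def multiplicative_impacts(prev, curr):
    """Impact attribution for Metric = A × B × C, per the formulas above.

    prev and curr are (A, B, C) tuples; returns (impact_A, impact_B, impact_C).
    """
    a_p, b_p, c_p = prev
    a_c, b_c, c_c = curr
    impact_a = (a_c - a_p) * b_p * c_p
    impact_b = a_c * (b_c - b_p) * c_p
    impact_c = a_c * b_c * (c_c - c_p)
    return impact_a, impact_b, impact_c
```

With customers 1,000 → 920, orders/customer 2.5 → 2.45, and revenue/order $40 → $39.90, this attributes roughly -$8,000, -$1,840, and -$225 respectively, and the three impacts sum to the total revenue change.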

## Step 5: Identify Primary Driver

Rank components by their impact on the overall metric:

```
Revenue dropped 10% ($100K → $90K = -$10K)

Impact Attribution:
┌─────────────────────┬──────────┬─────────┬──────────┬──────────────┐
│ Component           │ Previous │ Current │ Change % │ Impact $     │
├─────────────────────┼──────────┼─────────┼──────────┼──────────────┤
│ Customers           │ 1,000    │ 920     │ -8%      │ -$8,000  ◀── │
│ Orders/Customer     │ 2.5      │ 2.45    │ -2%      │ -$1,800      │
│ Revenue/Order       │ $40      │ $39.90  │ -0.25%   │ -$200        │
└─────────────────────┴──────────┴─────────┴──────────┴──────────────┘

PRIMARY DRIVER: Customer count (-8%, -$8K of -$10K total)
```

## Step 6: Drill Down

If the primary driver has sub-components, recurse:

```
Customer Count dropped 8%

Sub-component Analysis:
┌─────────────────────┬──────────┬─────────┬──────────┐
│ Component           │ Previous │ Current │ Change % │
├─────────────────────┼──────────┼─────────┼──────────┤
│ New Customers       │ 300      │ 200     │ -33% ◀── │
│ Returning Customers │ 700      │ 720     │ +3%      │
└─────────────────────┴──────────┴─────────┴──────────┘

PRIMARY DRIVER: New customer acquisition (-33%)
```

Continue drilling until reaching an actionable root cause:

```
New Customer Acquisition dropped 33%

Sub-component Analysis (by channel):
┌─────────────────────┬──────────┬─────────┬──────────┐
│ Channel             │ Previous │ Current │ Change % │
├─────────────────────┼──────────┼─────────┼──────────┤
│ Paid Search         │ 150      │ 80      │ -47% ◀── │
│ Paid Social         │ 80       │ 60      │ -25%     │
│ Organic             │ 70       │ 60      │ -14%     │
└─────────────────────┴──────────┴─────────┴──────────┘

ROOT CAUSE IDENTIFIED: Paid search acquisition dropped 47%
```
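
The recursive drill-down can be sketched as a walk down the tree that always follows the largest relative change (the tree structure and numbers are the illustrative ones from this section):

```python
def drill_down(node):
    """Follow the largest relative change through nested components
    until reaching a leaf, returning the root-cause chain."""
    chain = [node["name"]]
    while node.get("children"):
        node = max(
            node["children"],
            key=lambda c: abs(c["current"] - c["previous"]) / c["previous"],
        )
        chain.append(node["name"])
    return chain

# Illustrative tree matching the drill-down example above.
tree = {
    "name": "Customers", "previous": 1000, "current": 920,
    "children": [
        {"name": "New Customers", "previous": 300, "current": 200,
         "children": [
             {"name": "Paid Search", "previous": 150, "current": 80},
             {"name": "Paid Social", "previous": 80, "current": 60},
             {"name": "Organic", "previous": 70, "current": 60},
         ]},
        {"name": "Returning Customers", "previous": 700, "current": 720},
    ],
}
```

On this tree the walk reproduces the chain Customers → New Customers → Paid Search. A production version would rank by dollar impact rather than relative change, as Step 5 does.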

## Step 7: Generate Hypothesis

Based on the analysis, generate an actionable root cause hypothesis:

```markdown
## Root Cause Analysis: Revenue Decline

### Summary
Revenue dropped 10% ($100K → $90K) in the last 30 days.

### Root Cause Chain
```
Revenue ↓10%
└── Customer Count ↓8% (80% of impact)
    └── New Customers ↓33%
        └── Paid Search ↓47% ← ROOT CAUSE
```

### Evidence
- Paid search was the largest acquisition channel (50% of new customers)
- Paid search cost per acquisition increased 35%
- Conversion rate from paid search stable (not a landing page issue)

### Confidence Level
**High** - Clear attribution path with consistent data

### Hypothesis
Paid search performance degraded due to increased competition or a
changed bid strategy. The drop in paid search volume directly
explains the majority of the revenue decline.
```

## Step 8: Recommend Actions

Provide actionable recommendations:

```markdown
### Recommended Actions

**Immediate (This Week)**
1. Review paid search campaign performance in Google Ads
2. Check for recent bid strategy or budget changes
3. Analyze competitor activity in auction insights

**Short-term (This Month)**
1. Optimize underperforming ad groups
2. Test new ad copy and landing pages
3. Consider increasing budget if ROAS is still profitable

**Investigation Needed**
1. Did CPC increase? (external market pressure)
2. Did quality score drop? (internal issue)
3. Were there any campaign changes around the decline date?
```

## Output Format

```markdown
## Diagnosis: [Metric] [Direction] [Magnitude]

### Metric Tree
```
[Top-level metric]
├── [Component 1] [change] ← [marker if primary]
├── [Component 2] [change]
└── [Component 3] [change]
```

### Impact Attribution
| Component | Previous | Current | Change % | Impact |
|-----------|----------|---------|----------|--------|
| ... | ... | ... | ... | ... |

### Root Cause Chain
```
[Top metric] [change]
└── [Driver 1] [change] (X% of impact)
    └── [Driver 2] [change]
        └── [ROOT CAUSE] [change]
```

### Confidence Level
[High/Medium/Low] - [Reasoning]

### Evidence
- [Supporting data point 1]
- [Supporting data point 2]
- [Supporting data point 3]

### Hypothesis
[Clear statement of what caused the change]

### Recommended Actions

**Immediate**
1. [Action 1]
2. [Action 2]

**Short-term**
1. [Action 1]
2. [Action 2]

**Investigation Needed**
1. [Question to answer]
2. [Data to gather]
```

## Confidence Levels

### High Confidence
- Clear attribution path (single dominant driver)
- Consistent data across dimensions
- Change magnitude is significant (>20%)
- Root cause is specific and actionable

### Medium Confidence
- Multiple contributing drivers
- Some data inconsistencies
- Need additional context for certainty
- Root cause is somewhat general

### Low Confidence
- No clear dominant driver
- Data quality issues
- Multiple possible explanations
- Need more investigation

## Integration

This skill works well with:
- `/quarri-metric`: Use existing metric definitions and trees
- `/quarri-query`: Get additional data to validate hypotheses
- `/quarri-analyze`: Follow up with detailed segment analysis
- `/quarri-chart`: Visualize the change over time