@quarri/claude-data-tools 1.0.2 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +12 -1
- package/dist/api/client.d.ts +36 -1
- package/dist/api/client.d.ts.map +1 -1
- package/dist/api/client.js +58 -2
- package/dist/api/client.js.map +1 -1
- package/dist/auth-cli.d.ts +7 -0
- package/dist/auth-cli.d.ts.map +1 -0
- package/dist/auth-cli.js +361 -0
- package/dist/auth-cli.js.map +1 -0
- package/dist/index.js +227 -17
- package/dist/index.js.map +1 -1
- package/dist/tools/definitions.d.ts.map +1 -1
- package/dist/tools/definitions.js +199 -283
- package/dist/tools/definitions.js.map +1 -1
- package/package.json +8 -2
- package/skills/SKILL_CHAINING_DEMO.md +335 -0
- package/skills/TEST_SCENARIOS.md +189 -0
- package/skills/quarri-analyze/SKILL.md +274 -0
- package/skills/quarri-chart/SKILL.md +415 -0
- package/skills/quarri-debug-connector/SKILL.md +338 -0
- package/skills/quarri-diagnose/SKILL.md +372 -0
- package/skills/quarri-explain/SKILL.md +184 -0
- package/skills/quarri-extract/SKILL.md +353 -0
- package/skills/quarri-insights/SKILL.md +328 -0
- package/skills/quarri-metric/SKILL.md +400 -0
- package/skills/quarri-query/SKILL.md +159 -0
|
@@ -0,0 +1,328 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: Statistical analysis and business insights from data
|
|
3
|
+
globs:
|
|
4
|
+
alwaysApply: false
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# /quarri-insights - Statistical Analysis & Business Insights
|
|
8
|
+
|
|
9
|
+
Perform statistical analysis on data and generate actionable business insights with recommendations.
|
|
10
|
+
|
|
11
|
+
## When to Use
|
|
12
|
+
|
|
13
|
+
Use `/quarri-insights` when users need:
|
|
14
|
+
- Statistical analysis: "What's the distribution of order values?"
|
|
15
|
+
- Business interpretation: "What insights can you give me from this data?"
|
|
16
|
+
- Correlation analysis: "Is there a relationship between price and quantity?"
|
|
17
|
+
- Actionable recommendations: "What should we do based on these numbers?"
|
|
18
|
+
|
|
19
|
+
## Part 1: Statistical Analysis
|
|
20
|
+
|
|
21
|
+
### Analysis Types
|
|
22
|
+
|
|
23
|
+
#### 1. Descriptive Statistics
|
|
24
|
+
|
|
25
|
+
For numeric columns, calculate:
|
|
26
|
+
- **Central tendency**: mean, median, mode
|
|
27
|
+
- **Spread**: std, variance, range, IQR
|
|
28
|
+
- **Shape**: skewness, kurtosis
|
|
29
|
+
- **Percentiles**: 25th, 50th, 75th, 90th, 95th, 99th
|
|
30
|
+
|
|
31
|
+
```python
|
|
32
|
+
import pandas as pd
|
|
33
|
+
import numpy as np
|
|
34
|
+
|
|
35
|
+
def descriptive_stats(df, column):
    """Summarize one column: count, central tendency, spread, and shape.

    Args:
        df: Source DataFrame.
        column: Name of the numeric column to summarize.

    Returns:
        Dict of summary statistics (count, mean, median, std, min, max,
        25th/75th percentiles, skewness, kurtosis).
    """
    series = df[column]
    summary = {}
    summary['count'] = series.count()
    summary['mean'] = series.mean()
    summary['median'] = series.median()
    summary['std'] = series.std()
    summary['min'] = series.min()
    summary['max'] = series.max()
    summary['q25'] = series.quantile(0.25)
    summary['q75'] = series.quantile(0.75)
    summary['skew'] = series.skew()
    summary['kurtosis'] = series.kurtosis()
    return summary
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
#### 2. Distribution Analysis
|
|
51
|
+
|
|
52
|
+
Analyze how values are distributed:
|
|
53
|
+
- **Histogram bins**: Frequency distribution
|
|
54
|
+
- **Normality test**: Shapiro-Wilk or D'Agostino
|
|
55
|
+
- **Distribution fit**: Best-fit distribution type
|
|
56
|
+
|
|
57
|
+
```python
|
|
58
|
+
from scipy import stats
|
|
59
|
+
|
|
60
|
+
def distribution_analysis(df, column):
    """Describe how a column's values are distributed.

    Builds an auto-binned histogram and, for samples of at least 20 values,
    runs a Shapiro-Wilk normality test on the first 5000 points.

    Args:
        df: Source DataFrame.
        column: Name of the numeric column to analyze.

    Returns:
        Dict with histogram counts/edges, a normality flag (None when the
        sample is too small to test), and the test's p-value.
    """
    values = df[column].dropna()
    counts, edges = np.histogram(values, bins='auto')

    # Too few points for a meaningful normality test.
    if len(values) < 20:
        is_normal = None
        p_value = None
    else:
        # Shapiro-Wilk degrades on huge samples; cap at 5000 points.
        _, p_value = stats.shapiro(values[:5000])
        is_normal = p_value > 0.05

    return {
        'histogram': {'counts': counts.tolist(), 'edges': edges.tolist()},
        'is_normal': is_normal,
        'normality_p_value': p_value
    }
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
#### 3. Correlation Analysis
|
|
79
|
+
|
|
80
|
+
Find relationships between numeric columns:
|
|
81
|
+
- **Pearson correlation**: Linear relationships
|
|
82
|
+
- **Spearman correlation**: Monotonic relationships
|
|
83
|
+
- **Strong correlations**: |r| > 0.5
|
|
84
|
+
|
|
85
|
+
```python
|
|
86
|
+
def correlation_analysis(df, columns):
    """Compute pairwise correlations and flag strong relationships.

    Args:
        df: Source DataFrame.
        columns: Column names to consider; non-numeric ones are dropped.

    Returns:
        Dict with the full Pearson correlation matrix and a list of
        column pairs whose Pearson |r| exceeds 0.5 (Spearman r included
        for each flagged pair).
    """
    numeric = df[columns].select_dtypes(include=[np.number])
    pearson = numeric.corr(method='pearson')
    spearman = numeric.corr(method='spearman')

    cols = list(numeric.columns)
    strong = []
    # Walk each unordered pair exactly once.
    for idx, col_a in enumerate(cols):
        for col_b in cols[idx + 1:]:
            r = pearson.loc[col_a, col_b]
            if abs(r) > 0.5:
                strong.append({
                    'columns': [col_a, col_b],
                    'pearson_r': r,
                    'spearman_r': spearman.loc[col_a, col_b]
                })

    return {
        'correlation_matrix': pearson.to_dict(),
        'strong_correlations': strong
    }
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
#### 4. Outlier Detection
|
|
110
|
+
|
|
111
|
+
Identify unusual values:
|
|
112
|
+
- **Z-score method**: Values > 3 std from mean
|
|
113
|
+
- **IQR method**: Values outside 1.5*IQR from quartiles
|
|
114
|
+
|
|
115
|
+
```python
|
|
116
|
+
def detect_outliers(df, column, method='iqr'):
    """Find outliers in one column by z-score or IQR fences.

    Args:
        df: Source DataFrame.
        column: Name of the numeric column to scan.
        method: 'iqr' (values beyond 1.5*IQR from the quartiles, default)
            or 'zscore' (values more than 3 standard deviations from mean).

    Returns:
        Dict with outlier count, percentage of the non-null sample, up to
        20 sample outlier values, and (for 'iqr' only) the fence bounds.

    Raises:
        ValueError: If `method` is unrecognized, or the column has no
            non-null values.
    """
    data = df[column].dropna()
    # Guard: percentage below divides by len(data).
    if len(data) == 0:
        raise ValueError(f"Column {column!r} has no non-null values")

    lower_bound = upper_bound = None
    if method == 'zscore':
        z_scores = np.abs(stats.zscore(data))
        outliers = data[z_scores > 3]
    elif method == 'iqr':
        q1, q3 = data.quantile([0.25, 0.75])
        iqr = q3 - q1
        lower_bound = q1 - 1.5 * iqr
        upper_bound = q3 + 1.5 * iqr
        outliers = data[(data < lower_bound) | (data > upper_bound)]
    else:
        # Previously an unknown method fell through and raised a confusing
        # NameError on `outliers`; fail fast with a clear message instead.
        raise ValueError(f"Unknown method {method!r}; expected 'iqr' or 'zscore'")

    return {
        'outlier_count': len(outliers),
        'outlier_percentage': len(outliers) / len(data) * 100,
        'outlier_values': outliers.head(20).tolist(),
        'bounds': {'lower': lower_bound, 'upper': upper_bound} if method == 'iqr' else None
    }
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
#### 5. Time Series Analysis
|
|
138
|
+
|
|
139
|
+
For time-based data:
|
|
140
|
+
- **Trend detection**: Linear regression on time
|
|
141
|
+
- **Growth rate**: Period-over-period changes
|
|
142
|
+
- **Volatility**: Coefficient of variation
|
|
143
|
+
|
|
144
|
+
```python
|
|
145
|
+
def time_series_analysis(df, date_column, value_column):
    """Trend, growth rate, and volatility for a value series ordered by date.

    Args:
        df: Source DataFrame.
        date_column: Column used to order observations chronologically.
        value_column: Numeric column to analyze.

    Returns:
        Dict with trend direction/slope/R², average period-over-period
        growth rate, and volatility (coefficient of variation).
    """
    ordered = df.sort_values(date_column)
    ordered['pct_change'] = ordered[value_column].pct_change()

    # Fit a straight line against the observation index as a trend proxy.
    idx = np.arange(len(ordered))
    fit = stats.linregress(idx, ordered[value_column])

    if fit.slope > 0:
        direction = 'increasing'
    elif fit.slope < 0:
        direction = 'decreasing'
    else:
        direction = 'stable'

    return {
        'trend_direction': direction,
        'trend_slope': fit.slope,
        'trend_r_squared': fit.rvalue ** 2,
        'average_growth_rate': ordered['pct_change'].mean(),
        'volatility': ordered[value_column].std() / ordered[value_column].mean()
    }
|
|
161
|
+
```
|
|
162
|
+
|
|
163
|
+
#### 6. Segment Comparison
|
|
164
|
+
|
|
165
|
+
Compare groups within data:
|
|
166
|
+
- **Group statistics**: Mean, median by group
|
|
167
|
+
- **Statistical tests**: t-test, ANOVA for differences
|
|
168
|
+
- **Effect size**: Cohen's d for magnitude
|
|
169
|
+
|
|
170
|
+
```python
|
|
171
|
+
def segment_comparison(df, group_column, value_column):
    """Per-group summary statistics plus a one-way ANOVA across groups.

    Args:
        df: Source DataFrame.
        group_column: Column defining the segments.
        value_column: Numeric column to compare across segments.

    Returns:
        Dict with per-group count/mean/median/std, the ANOVA F statistic
        and p-value, and a significance flag (None with fewer than two
        groups, where ANOVA is undefined).
    """
    grouped = df.groupby(group_column)[value_column]
    per_group = grouped.agg(['count', 'mean', 'median', 'std']).to_dict('index')

    samples = [vals.values for _, vals in grouped]
    if len(samples) < 2:
        # ANOVA needs at least two groups to compare.
        f_stat = None
        p_value = None
        significant = None
    else:
        f_stat, p_value = stats.f_oneway(*samples)
        significant = p_value < 0.05

    return {
        'group_statistics': per_group,
        'anova_f_statistic': f_stat,
        'anova_p_value': p_value,
        'significant_difference': significant
    }
|
|
189
|
+
```
|
|
190
|
+
|
|
191
|
+
## Part 2: Business Insight Generation
|
|
192
|
+
|
|
193
|
+
### Pattern Recognition
|
|
194
|
+
|
|
195
|
+
Identify these patterns in the data:
|
|
196
|
+
|
|
197
|
+
**Trends**
|
|
198
|
+
- Is the metric growing, declining, or stable?
|
|
199
|
+
- What's the rate of change?
|
|
200
|
+
- Are there inflection points?
|
|
201
|
+
|
|
202
|
+
**Concentrations**
|
|
203
|
+
- Does a small portion drive most results? (Pareto principle)
|
|
204
|
+
- Are there dominant segments?
|
|
205
|
+
|
|
206
|
+
**Anomalies**
|
|
207
|
+
- Are there outliers?
|
|
208
|
+
- Are there unexpected values?
|
|
209
|
+
- Are there gaps or missing patterns?
|
|
210
|
+
|
|
211
|
+
**Relationships**
|
|
212
|
+
- Do variables correlate?
|
|
213
|
+
- Are there surprising connections?
|
|
214
|
+
- What drives what?
|
|
215
|
+
|
|
216
|
+
### Insight Categories
|
|
217
|
+
|
|
218
|
+
#### Key Finding
|
|
219
|
+
The single most important takeaway:
|
|
220
|
+
> "Electronics drives 68% of total revenue but represents only 25% of product categories."
|
|
221
|
+
|
|
222
|
+
#### Performance Insights
|
|
223
|
+
How things are performing:
|
|
224
|
+
> "Revenue grew 23% YoY, outpacing the industry average of 15%."
|
|
225
|
+
|
|
226
|
+
#### Comparison Insights
|
|
227
|
+
How segments differ:
|
|
228
|
+
> "Enterprise customers spend 4.2x more per order than SMB customers."
|
|
229
|
+
|
|
230
|
+
#### Trend Insights
|
|
231
|
+
What's changing over time:
|
|
232
|
+
> "Mobile orders increased from 12% to 47% of total orders over 18 months."
|
|
233
|
+
|
|
234
|
+
#### Risk Insights
|
|
235
|
+
Warning signs and concerns:
|
|
236
|
+
> "Three of top 10 customers reduced orders by >50% this quarter."
|
|
237
|
+
|
|
238
|
+
#### Opportunity Insights
|
|
239
|
+
Potential for growth or improvement:
|
|
240
|
+
> "Cross-sell rate for Product A is only 8%, compared to 28% category average."
|
|
241
|
+
|
|
242
|
+
### Insight Quality Criteria
|
|
243
|
+
|
|
244
|
+
Good insights are:
|
|
245
|
+
|
|
246
|
+
**Specific**: Include actual numbers
|
|
247
|
+
- Bad: "Revenue increased"
|
|
248
|
+
- Good: "Revenue increased 23% from $4.2M to $5.2M"
|
|
249
|
+
|
|
250
|
+
**Contextual**: Provide comparison points
|
|
251
|
+
- Bad: "We have 1,200 customers"
|
|
252
|
+
- Good: "Customer count grew 15% to 1,200, vs. 8% industry average"
|
|
253
|
+
|
|
254
|
+
**Actionable**: Suggest what to do
|
|
255
|
+
- Bad: "Conversion rate varies by channel"
|
|
256
|
+
- Good: "Email conversion is 2.3x higher than social - consider reallocating ad spend"
|
|
257
|
+
|
|
258
|
+
**Relevant**: Connect to business goals
|
|
259
|
+
- Bad: "The median is 45"
|
|
260
|
+
- Good: "Half of orders are under $45, suggesting opportunity for upselling"
|
|
261
|
+
|
|
262
|
+
## Workflow
|
|
263
|
+
|
|
264
|
+
1. **Receive data**: From a previous query or via `quarri_execute_sql`
|
|
265
|
+
2. **Identify analysis type**: Based on data shape and user question
|
|
266
|
+
3. **Perform statistical analysis**: Run appropriate calculations
|
|
267
|
+
4. **Generate insights**: Interpret results in business context
|
|
268
|
+
5. **Prioritize findings**: Rank by impact, actionability, urgency
|
|
269
|
+
6. **Frame recommendations**: Suggest specific actions
|
|
270
|
+
|
|
271
|
+
## Output Format
|
|
272
|
+
|
|
273
|
+
```markdown
|
|
274
|
+
## Analysis: [Data Description]
|
|
275
|
+
|
|
276
|
+
### Data Overview
|
|
277
|
+
- Rows: [count]
|
|
278
|
+
- Numeric columns: [list]
|
|
279
|
+
- Categorical columns: [list]
|
|
280
|
+
- Date range: [if applicable]
|
|
281
|
+
|
|
282
|
+
### Statistical Findings
|
|
283
|
+
|
|
284
|
+
#### Descriptive Statistics
|
|
285
|
+
| Metric | Value |
|
|
286
|
+
|--------|-------|
|
|
287
|
+
| Mean | X |
|
|
288
|
+
| Median | Y |
|
|
289
|
+
| Std | Z |
|
|
290
|
+
|
|
291
|
+
#### Key Patterns
|
|
292
|
+
- [Pattern 1 with numbers]
|
|
293
|
+
- [Pattern 2 with numbers]
|
|
294
|
+
- [Pattern 3 with numbers]
|
|
295
|
+
|
|
296
|
+
### Business Insights
|
|
297
|
+
|
|
298
|
+
#### Key Finding
|
|
299
|
+
[The single most important insight - bolded and specific]
|
|
300
|
+
|
|
301
|
+
#### Insights
|
|
302
|
+
|
|
303
|
+
**1. [Category]: [Insight Title]**
|
|
304
|
+
[Specific insight with numbers]
|
|
305
|
+
- **Implication**: [What this means]
|
|
306
|
+
- **Recommended Action**: [What to do]
|
|
307
|
+
|
|
308
|
+
**2. [Category]: [Insight Title]**
|
|
309
|
+
[Specific insight with numbers]
|
|
310
|
+
- **Implication**: [What this means]
|
|
311
|
+
- **Recommended Action**: [What to do]
|
|
312
|
+
|
|
313
|
+
### Risks to Monitor
|
|
314
|
+
- [Risk 1 with trigger condition]
|
|
315
|
+
- [Risk 2 with trigger condition]
|
|
316
|
+
|
|
317
|
+
### Recommended Next Steps
|
|
318
|
+
1. [Action 1]
|
|
319
|
+
2. [Action 2]
|
|
320
|
+
3. [Suggested follow-up analysis]
|
|
321
|
+
```
|
|
322
|
+
|
|
323
|
+
## Integration
|
|
324
|
+
|
|
325
|
+
This skill works well with:
|
|
326
|
+
- `/quarri-query`: Get data first, then analyze
|
|
327
|
+
- `/quarri-chart`: Visualize statistical findings
|
|
328
|
+
- `/quarri-analyze`: Called as part of the full analysis pipeline
|
|
@@ -0,0 +1,400 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: Define business metrics and build metric trees for KPI decomposition
|
|
3
|
+
globs:
|
|
4
|
+
alwaysApply: false
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# /quarri-metric - Metric Definition & Metric Trees
|
|
8
|
+
|
|
9
|
+
Define new business metrics and build metric trees that decompose KPIs into component drivers for root cause analysis.
|
|
10
|
+
|
|
11
|
+
## When to Use
|
|
12
|
+
|
|
13
|
+
Use `/quarri-metric` when users want to:
|
|
14
|
+
- Create metrics: "Create a metric for customer lifetime value"
|
|
15
|
+
- Define KPIs: "Define a retention rate metric"
|
|
16
|
+
- Build metric trees: "Decompose revenue into its drivers"
|
|
17
|
+
- Understand relationships: "What metrics drive conversion rate?"
|
|
18
|
+
|
|
19
|
+
## Part 1: Metric Definition
|
|
20
|
+
|
|
21
|
+
### Step 1: Understand the Metric
|
|
22
|
+
|
|
23
|
+
Gather these details through conversation:
|
|
24
|
+
|
|
25
|
+
1. **Name**: What should this metric be called?
|
|
26
|
+
- Use clear, business-friendly names
|
|
27
|
+
- Examples: "Monthly Recurring Revenue", "Customer Churn Rate"
|
|
28
|
+
|
|
29
|
+
2. **Description**: What does it measure and why is it important?
|
|
30
|
+
|
|
31
|
+
3. **Calculation**: How is it computed?
|
|
32
|
+
- What's the formula?
|
|
33
|
+
- What columns are involved?
|
|
34
|
+
|
|
35
|
+
4. **Dimensions**: How can it be broken down?
|
|
36
|
+
- By region, product, customer segment?
|
|
37
|
+
- Time granularity (daily, monthly, yearly)?
|
|
38
|
+
|
|
39
|
+
5. **Synonyms**: What else might users call this?
|
|
40
|
+
- "MRR" for "Monthly Recurring Revenue"
|
|
41
|
+
- "AOV" for "Average Order Value"
|
|
42
|
+
|
|
43
|
+
### Step 2: Map to Schema
|
|
44
|
+
|
|
45
|
+
1. Fetch schema using `quarri_get_schema`
|
|
46
|
+
2. Identify relevant tables and columns
|
|
47
|
+
3. Validate column names and types
|
|
48
|
+
|
|
49
|
+
### Step 3: Write SQL Template
|
|
50
|
+
|
|
51
|
+
```sql
|
|
52
|
+
-- Template with placeholders for dimensions
|
|
53
|
+
SELECT
|
|
54
|
+
{dimension_columns},
|
|
55
|
+
SUM(order_total) / COUNT(DISTINCT customer_id) as average_order_value
|
|
56
|
+
FROM quarri.bridge
|
|
57
|
+
WHERE {filter_conditions}
|
|
58
|
+
GROUP BY {dimension_columns}
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
### Step 4: Save
|
|
62
|
+
|
|
63
|
+
Create the metric using `quarri_create_metric`:
|
|
64
|
+
```json
|
|
65
|
+
{
|
|
66
|
+
"name": "Average Order Value",
|
|
67
|
+
"description": "Average revenue per order",
|
|
68
|
+
"sql_template": "SELECT SUM(order_total) / COUNT(*) as aov FROM quarri.bridge",
|
|
69
|
+
"dimensions": ["region", "product_category", "month"]
|
|
70
|
+
}
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
## Part 2: Metric Trees
|
|
74
|
+
|
|
75
|
+
Metric trees decompose a top-level KPI into its component drivers, enabling systematic root cause analysis.
|
|
76
|
+
|
|
77
|
+
### What is a Metric Tree?
|
|
78
|
+
|
|
79
|
+
A metric tree shows how a high-level metric breaks down into component parts:
|
|
80
|
+
|
|
81
|
+
```
|
|
82
|
+
Revenue
|
|
83
|
+
├── = Customers × Orders/Customer × Revenue/Order
|
|
84
|
+
│
|
|
85
|
+
├── Customers
|
|
86
|
+
│ ├── New Customers (acquisition)
|
|
87
|
+
│ └── Returning Customers (retention)
|
|
88
|
+
│
|
|
89
|
+
├── Orders per Customer (frequency)
|
|
90
|
+
│ ├── Purchase occasions
|
|
91
|
+
│ └── Repeat purchase rate
|
|
92
|
+
│
|
|
93
|
+
└── Revenue per Order (basket size)
|
|
94
|
+
├── Units per order
|
|
95
|
+
├── Price per unit
|
|
96
|
+
└── Discount rate
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
### Common Metric Tree Patterns
|
|
100
|
+
|
|
101
|
+
#### E-Commerce Revenue Tree
|
|
102
|
+
```
|
|
103
|
+
Revenue = Traffic × Conversion Rate × Average Order Value
|
|
104
|
+
|
|
105
|
+
├── Traffic
|
|
106
|
+
│ ├── Organic (SEO, direct)
|
|
107
|
+
│ ├── Paid (ads, affiliates)
|
|
108
|
+
│ └── Referral (social, email)
|
|
109
|
+
│
|
|
110
|
+
├── Conversion Rate
|
|
111
|
+
│ ├── Add-to-cart rate
|
|
112
|
+
│ ├── Cart-to-checkout rate
|
|
113
|
+
│ └── Checkout completion rate
|
|
114
|
+
│
|
|
115
|
+
└── Average Order Value
|
|
116
|
+
├── Items per order
|
|
117
|
+
└── Price per item
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
#### SaaS Revenue Tree
|
|
121
|
+
```
|
|
122
|
+
MRR = Customers × ARPU
|
|
123
|
+
|
|
124
|
+
├── Customers
|
|
125
|
+
│ ├── New MRR (new customers)
|
|
126
|
+
│ ├── Expansion MRR (upgrades)
|
|
127
|
+
│ ├── Contraction MRR (downgrades)
|
|
128
|
+
│ └── Churned MRR (cancellations)
|
|
129
|
+
│
|
|
130
|
+
└── ARPU (Average Revenue Per User)
|
|
131
|
+
├── Plan mix
|
|
132
|
+
├── Add-on adoption
|
|
133
|
+
└── Usage-based fees
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
#### Customer Lifetime Value Tree
|
|
137
|
+
```
|
|
138
|
+
CLV = ARPU × Avg Lifetime × Gross Margin
|
|
139
|
+
|
|
140
|
+
├── ARPU (Average Revenue Per User)
|
|
141
|
+
│ ├── Base subscription
|
|
142
|
+
│ └── Additional services
|
|
143
|
+
│
|
|
144
|
+
├── Average Customer Lifetime
|
|
145
|
+
│ ├── 1 / Churn Rate
|
|
146
|
+
│ └── Retention by cohort
|
|
147
|
+
│
|
|
148
|
+
└── Gross Margin
|
|
149
|
+
├── Revenue
|
|
150
|
+
└── Cost of goods/service
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
#### Sales Pipeline Tree
|
|
154
|
+
```
|
|
155
|
+
Revenue = Leads × Conversion Rate × Deal Size
|
|
156
|
+
|
|
157
|
+
├── Leads
|
|
158
|
+
│ ├── Marketing Qualified (MQL)
|
|
159
|
+
│ ├── Sales Qualified (SQL)
|
|
160
|
+
│ └── Opportunity Created
|
|
161
|
+
│
|
|
162
|
+
├── Conversion Rate
|
|
163
|
+
│ ├── MQL → SQL rate
|
|
164
|
+
│ ├── SQL → Opportunity rate
|
|
165
|
+
│ ├── Opportunity → Proposal rate
|
|
166
|
+
│ └── Proposal → Close rate
|
|
167
|
+
│
|
|
168
|
+
└── Deal Size
|
|
169
|
+
├── Contract value
|
|
170
|
+
├── Upsell/cross-sell
|
|
171
|
+
└── Discounting
|
|
172
|
+
```
|
|
173
|
+
|
|
174
|
+
### Building a Metric Tree
|
|
175
|
+
|
|
176
|
+
#### Step 1: Identify the Top-Level Metric
|
|
177
|
+
- What KPI are you trying to understand or improve?
|
|
178
|
+
- Example: "Revenue", "Conversion Rate", "Retention"
|
|
179
|
+
|
|
180
|
+
#### Step 2: Find the Mathematical Relationship
|
|
181
|
+
Express the metric as a formula:
|
|
182
|
+
- **Multiplicative**: Revenue = Customers × ARPU
|
|
183
|
+
- **Additive**: Revenue = Product A + Product B + Product C
|
|
184
|
+
- **Ratio**: Conversion = Conversions / Visitors
|
|
185
|
+
|
|
186
|
+
#### Step 3: Decompose Each Component
|
|
187
|
+
For each component, ask: "What drives this?"
|
|
188
|
+
- Keep decomposing until you reach actionable metrics
|
|
189
|
+
- Stop when you reach metrics you can directly measure and influence
|
|
190
|
+
|
|
191
|
+
#### Step 4: Validate the Tree
|
|
192
|
+
- Components should be MECE (mutually exclusive, collectively exhaustive)
|
|
193
|
+
- Math should work: components should sum/multiply to parent
|
|
194
|
+
- Each leaf should be measurable in your data
|
|
195
|
+
|
|
196
|
+
### Using Metric Trees for Analysis
|
|
197
|
+
|
|
198
|
+
Once you have a metric tree, use it for:
|
|
199
|
+
|
|
200
|
+
**1. Performance Attribution**
|
|
201
|
+
When a metric changes, quantify how much each driver contributed:
|
|
202
|
+
```
|
|
203
|
+
Revenue dropped 10% ($100K → $90K)
|
|
204
|
+
|
|
205
|
+
Attribution:
|
|
206
|
+
├── Customer count: -5% impact ($5K)
|
|
207
|
+
├── Order frequency: -3% impact ($3K)
|
|
208
|
+
└── Average order value: -2% impact ($2K)
|
|
209
|
+
```
|
|
210
|
+
|
|
211
|
+
**2. Root Cause Analysis**
|
|
212
|
+
Drill into the largest impact driver:
|
|
213
|
+
```
|
|
214
|
+
Customer count dropped 5%
|
|
215
|
+
├── New customers: -8% (PRIMARY CAUSE)
|
|
216
|
+
│ └── Paid acquisition: -15%
|
|
217
|
+
│ └── Organic: +2%
|
|
218
|
+
└── Retention: +3%
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
**3. Opportunity Sizing**
|
|
222
|
+
Identify highest-leverage improvements:
|
|
223
|
+
```
|
|
224
|
+
If we improve conversion rate by 10%:
|
|
225
|
+
├── Current: 2.5% conversion
|
|
226
|
+
├── Target: 2.75% conversion
|
|
227
|
+
└── Revenue impact: +$50K/month
|
|
228
|
+
```
|
|
229
|
+
|
|
230
|
+
### SQL for Metric Tree Analysis
|
|
231
|
+
|
|
232
|
+
Generate SQL that calculates all components of a metric tree:
|
|
233
|
+
|
|
234
|
+
```sql
|
|
235
|
+
-- Revenue metric tree components
|
|
236
|
+
WITH metrics AS (
|
|
237
|
+
SELECT
|
|
238
|
+
period,
|
|
239
|
+
COUNT(DISTINCT customer_id) as customers,
|
|
240
|
+
COUNT(*) as orders,
|
|
241
|
+
SUM(revenue) as revenue,
|
|
242
|
+
-- Derived metrics
|
|
243
|
+
COUNT(*)::float / COUNT(DISTINCT customer_id) as orders_per_customer,
|
|
244
|
+
SUM(revenue)::float / COUNT(*) as revenue_per_order
|
|
245
|
+
FROM quarri.bridge
|
|
246
|
+
WHERE order_date >= DATE '2024-01-01'
|
|
247
|
+
GROUP BY period
|
|
248
|
+
)
|
|
249
|
+
SELECT
|
|
250
|
+
period,
|
|
251
|
+
customers,
|
|
252
|
+
orders_per_customer,
|
|
253
|
+
revenue_per_order,
|
|
254
|
+
revenue,
|
|
255
|
+
-- Verify: customers * orders_per_customer * revenue_per_order ≈ revenue
|
|
256
|
+
customers * orders_per_customer * revenue_per_order as calculated_revenue
|
|
257
|
+
FROM metrics
|
|
258
|
+
ORDER BY period;
|
|
259
|
+
```
|
|
260
|
+
|
|
261
|
+
### Period-over-Period Comparison
|
|
262
|
+
|
|
263
|
+
```sql
|
|
264
|
+
-- Compare current vs previous period for root cause analysis
|
|
265
|
+
WITH current_period AS (
|
|
266
|
+
SELECT
|
|
267
|
+
COUNT(DISTINCT customer_id) as customers,
|
|
268
|
+
COUNT(*)::float / COUNT(DISTINCT customer_id) as frequency,
|
|
269
|
+
SUM(revenue)::float / COUNT(*) as aov,
|
|
270
|
+
SUM(revenue) as revenue
|
|
271
|
+
FROM quarri.bridge
|
|
272
|
+
WHERE order_date >= DATE '2024-12-01'
|
|
273
|
+
),
|
|
274
|
+
previous_period AS (
|
|
275
|
+
SELECT
|
|
276
|
+
COUNT(DISTINCT customer_id) as customers,
|
|
277
|
+
COUNT(*)::float / COUNT(DISTINCT customer_id) as frequency,
|
|
278
|
+
SUM(revenue)::float / COUNT(*) as aov,
|
|
279
|
+
SUM(revenue) as revenue
|
|
280
|
+
FROM quarri.bridge
|
|
281
|
+
WHERE order_date >= DATE '2024-11-01' AND order_date < DATE '2024-12-01'
|
|
282
|
+
)
|
|
283
|
+
SELECT
|
|
284
|
+
'Customers' as metric,
|
|
285
|
+
p.customers as previous,
|
|
286
|
+
c.customers as current,
|
|
287
|
+
(c.customers - p.customers)::float / p.customers * 100 as pct_change
|
|
288
|
+
FROM current_period c, previous_period p
|
|
289
|
+
UNION ALL
|
|
290
|
+
SELECT
|
|
291
|
+
'Frequency' as metric,
|
|
292
|
+
p.frequency as previous,
|
|
293
|
+
c.frequency as current,
|
|
294
|
+
(c.frequency - p.frequency)::float / p.frequency * 100 as pct_change
|
|
295
|
+
FROM current_period c, previous_period p
|
|
296
|
+
-- ... continue for all components
|
|
297
|
+
```
|
|
298
|
+
|
|
299
|
+
## Common Metric Patterns
|
|
300
|
+
|
|
301
|
+
### Simple Aggregation
|
|
302
|
+
```sql
|
|
303
|
+
-- Total Revenue
|
|
304
|
+
SELECT SUM(revenue) as total_revenue FROM quarri.bridge
|
|
305
|
+
|
|
306
|
+
-- Order Count
|
|
307
|
+
SELECT COUNT(*) as order_count FROM quarri.bridge
|
|
308
|
+
```
|
|
309
|
+
|
|
310
|
+
### Ratio Metrics
|
|
311
|
+
```sql
|
|
312
|
+
-- Conversion Rate
|
|
313
|
+
SELECT
|
|
314
|
+
COUNT(CASE WHEN status = 'completed' THEN 1 END)::float /
|
|
315
|
+
COUNT(*) as conversion_rate
|
|
316
|
+
FROM orders
|
|
317
|
+
|
|
318
|
+
-- Gross Margin
|
|
319
|
+
SELECT
|
|
320
|
+
(SUM(revenue) - SUM(cost)) / SUM(revenue) as gross_margin
|
|
321
|
+
FROM quarri.bridge
|
|
322
|
+
```
|
|
323
|
+
|
|
324
|
+
### Time-Based Metrics
|
|
325
|
+
```sql
|
|
326
|
+
-- Monthly Recurring Revenue
|
|
327
|
+
SELECT
|
|
328
|
+
DATE_TRUNC('month', subscription_date) as month,
|
|
329
|
+
SUM(monthly_amount) as mrr
|
|
330
|
+
FROM subscriptions
|
|
331
|
+
WHERE status = 'active'
|
|
332
|
+
GROUP BY month
|
|
333
|
+
|
|
334
|
+
-- Year-over-Year Growth
|
|
335
|
+
SELECT
|
|
336
|
+
current.period,
|
|
337
|
+
(current.revenue - previous.revenue) / previous.revenue as yoy_growth
|
|
338
|
+
FROM (...) current
|
|
339
|
+
JOIN (...) previous ON current.period = previous.period + INTERVAL '1 year'
|
|
340
|
+
```
|
|
341
|
+
|
|
342
|
+
### Customer Metrics
|
|
343
|
+
```sql
|
|
344
|
+
-- Customer Lifetime Value
|
|
345
|
+
SELECT
|
|
346
|
+
customer_id,
|
|
347
|
+
SUM(order_total) as lifetime_value,
|
|
348
|
+
COUNT(*) as order_count,
|
|
349
|
+
MIN(order_date) as first_order,
|
|
350
|
+
MAX(order_date) as last_order
|
|
351
|
+
FROM orders
|
|
352
|
+
GROUP BY customer_id
|
|
353
|
+
```
|
|
354
|
+
|
|
355
|
+
## Output Format
|
|
356
|
+
|
|
357
|
+
```markdown
|
|
358
|
+
## Metric Definition: [Metric Name]
|
|
359
|
+
|
|
360
|
+
### Summary
|
|
361
|
+
**Name**: [Metric Name]
|
|
362
|
+
**Description**: [What it measures]
|
|
363
|
+
**Synonyms**: [Alternative names]
|
|
364
|
+
|
|
365
|
+
### Calculation
|
|
366
|
+
[Plain English explanation of the formula]
|
|
367
|
+
|
|
368
|
+
### SQL Template
|
|
369
|
+
```sql
|
|
370
|
+
[SQL code]
|
|
371
|
+
```
|
|
372
|
+
|
|
373
|
+
### Dimensions
|
|
374
|
+
- [Dimension 1]: [Description]
|
|
375
|
+
- [Dimension 2]: [Description]
|
|
376
|
+
|
|
377
|
+
### Metric Tree (if applicable)
|
|
378
|
+
```
|
|
379
|
+
[Top-level metric]
|
|
380
|
+
├── [Component 1]
|
|
381
|
+
│ ├── [Sub-component 1a]
|
|
382
|
+
│ └── [Sub-component 1b]
|
|
383
|
+
├── [Component 2]
|
|
384
|
+
└── [Component 3]
|
|
385
|
+
```
|
|
386
|
+
|
|
387
|
+
### Validation Results
|
|
388
|
+
- Query executed successfully
|
|
389
|
+
- Sample result: [Sample value]
|
|
390
|
+
|
|
391
|
+
### Status
|
|
392
|
+
[Ready to save / Needs revision]
|
|
393
|
+
```
|
|
394
|
+
|
|
395
|
+
## Integration
|
|
396
|
+
|
|
397
|
+
After creating a metric:
|
|
398
|
+
- Use with `/quarri-query` for natural language queries
|
|
399
|
+
- Reference in `/quarri-analyze` for comprehensive analysis
|
|
400
|
+
- Use `/quarri-diagnose` for root cause analysis with metric trees
|