npm - crushdataai - Versions diffs - 1.2.10 → 1.2.12 - Mend

crushdataai 1.2.10 → 1.2.12

This diff shows the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/dist/commands.js +8 -2
package/package.json +1 -1
package/assets/.shared/data-analyst/charts.csv +0 -31
package/assets/.shared/data-analyst/cleaning.csv +0 -21
package/assets/.shared/data-analyst/databases.csv +0 -35
package/assets/.shared/data-analyst/industries/ecommerce.csv +0 -25
package/assets/.shared/data-analyst/industries/finance.csv +0 -24
package/assets/.shared/data-analyst/industries/marketing.csv +0 -25
package/assets/.shared/data-analyst/industries/saas.csv +0 -24
package/assets/.shared/data-analyst/metrics.csv +0 -74
package/assets/.shared/data-analyst/python-patterns.csv +0 -31
package/assets/.shared/data-analyst/report-ux.csv +0 -26
package/assets/.shared/data-analyst/sql-patterns.csv +0 -36
package/assets/.shared/data-analyst/validation.csv +0 -21
package/assets/.shared/data-analyst/workflows.csv +0 -55

package/dist/commands.js CHANGED

@@ -42,26 +42,32 @@ const AI_TYPES = ['claude', 'cursor', 'windsurf', 'antigravity', 'copilot', 'kir
 const AI_PATHS = {
     claude: {
         dir: '.claude/skills/data-analyst',
+        sourceDir: '.claude/skills/data-analyst',
         files: ['SKILL.md']
     },
     cursor: {
         dir: '.cursor/commands',
+        sourceDir: '.cursor/commands',
         files: ['data-analyst.md']
     },
     windsurf: {
         dir: '.windsurf/workflows',
+        sourceDir: '.windsurf/workflows',
         files: ['data-analyst.md']
     },
     antigravity: {
         dir: '.agent/workflows',
+        sourceDir: '.agent/workflows',
         files: ['data-analyst.md']
     },
     copilot: {
         dir: '.github/prompts',
+        sourceDir: '.github/prompts',
         files: ['data-analyst.prompt.md']
     },
     kiro: {
         dir: '.kiro/steering',
+        sourceDir: '.kiro/steering',
         files: ['data-analyst.md']
     }
 };
@@ -71,7 +77,7 @@ function getAssetsDir() {
     return path.join(__dirname, '..', 'assets');
 }
 function copySharedFiles(targetDir, force) {
-    const sharedSource = path.join(getAssetsDir(), 'shared');
+    const sharedSource = path.join(getAssetsDir(), '.shared', 'data-analyst');
     const sharedTarget = path.join(targetDir, SHARED_DIR);
     if (fs.existsSync(sharedTarget) && !force) {
         console.log(`  ⏭️  ${SHARED_DIR} already exists (use --force to overwrite)`);
@@ -82,7 +88,7 @@ function copySharedFiles(targetDir, force) {
 }
 function copyAIFiles(aiType, targetDir, force) {
     const config = AI_PATHS[aiType];
-    const sourceDir = path.join(getAssetsDir(), aiType);
+    const sourceDir = path.join(getAssetsDir(), config.sourceDir);
     const targetPath = path.join(targetDir, config.dir);
     // Ensure directory exists
     fs.ensureDirSync(targetPath);

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
     "name": "crushdataai",
-    "version": "1.2.10",
+    "version": "1.2.12",
     "description": "CLI to install CrushData AI data analyst skill for AI coding assistants",
     "main": "dist/index.js",
     "bin": {

package/assets/.shared/data-analyst/charts.csv DELETED

@@ -1,31 +0,0 @@
-Chart Type,Best For,Data Type,Comparison Type,Python Code,Color Guidance,Accessibility,Dashboard Tip
-Line Chart,Trends over time and continuous data,Time-series,Trend,"plt.plot(df['date'], df['value']); plt.xlabel('Date'); plt.ylabel('Value')","Sequential blue/green for single metric; categorical for multiple series","Add markers for key data points; use sufficient line thickness","Place in middle section for trend visibility"
-Bar Chart,Comparing categories or rankings,Categorical,Ranking Comparison,"plt.bar(df['category'], df['value']); plt.xticks(rotation=45)","Single color for one series; categorical for grouped","Ensure sufficient contrast between bars; label values directly","Use horizontal layout if labels are long"
-Horizontal Bar Chart,Ranking with long labels,Categorical,Ranking,"plt.barh(df['category'], df['value'])","Single sequential color from light to dark by value","Order bars by value for easy scanning","Great for top-N lists and leaderboards"
-Stacked Bar Chart,Part-to-whole over categories,Categorical,Composition,"df.plot(kind='bar', stacked=True)","Use distinct colors for each segment; max 5-6 segments","Include legend; consider labels on large segments","Good for showing composition across time periods"
-Grouped Bar Chart,Comparing multiple series by category,Categorical,Comparison,"df.plot(kind='bar', position='dodge')","Categorical palette with clear distinction between groups","Limit to 3-4 groups per category; use legend","Best for A/B comparisons or time period comparisons"
-Pie Chart,Simple part-to-whole (max 5 segments),Categorical,Composition,"plt.pie(df['value'], labels=df['category'], autopct='%1.1f%%')","High contrast between adjacent segments","Limit to 5 segments; order by size; include percentages","Avoid in dashboards - use donut or bar instead"
-Donut Chart,Part-to-whole with center metric,Categorical,Composition,"plt.pie(df['value'], wedgeprops={'width': 0.4})","High contrast colors; use brand colors if relevant","Include total or key metric in center","Good for single KPI with breakdown"
-Area Chart,Trends with volume emphasis,Time-series,Trend Volume,"plt.fill_between(df['date'], df['value'], alpha=0.3)","Light fill with darker line; sequential colors","Ensure baseline is visible; use transparency","Shows magnitude over time better than line"
-Stacked Area Chart,Composition over time,Time-series,Composition Trend,"df.plot(kind='area', stacked=True)","Distinct colors for each layer; limit layers to 5","Consider 100% stacked for proportion focus","Good for market share or category mix over time"
-Scatter Plot,Relationship between two variables,Numerical,Correlation,"plt.scatter(df['x'], df['y'])","Single color with alpha for density; color by category if needed","Add trendline for correlation; include R-squared","Use for identifying outliers and patterns"
-Bubble Chart,Three-variable relationships,Numerical,Correlation Size,"plt.scatter(df['x'], df['y'], s=df['size']*100)","Color by category if applicable; size legend essential","Ensure bubbles don't overlap too much","Include size legend; limit to important points"
-Heatmap,Correlations or matrix data,Numerical Matrix,Distribution,"sns.heatmap(df.corr(), annot=True, cmap='RdBu_r')","Diverging palette for correlation (-1 to 1); sequential for counts","Include value annotations; use colorblind-safe palette","Perfect for cohort retention tables"
-Histogram,Distribution of single variable,Numerical,Distribution,"plt.hist(df['value'], bins=30)","Single color; consider outlier highlighting","Include mean/median line; label bin count","Use to understand data distribution before analysis"
-Box Plot,Distribution comparison,Numerical,Distribution Comparison,"sns.boxplot(data=df, x='category', y='value')","One color per category; highlight outliers","Explain quartile meanings; include n count","Great for comparing distributions across groups"
-Violin Plot,Distribution with density,Numerical,Distribution,"sns.violinplot(data=df, x='category', y='value')","Paired colors for split violins; sequential otherwise","More intuitive than box plots for some users","Good for showing bimodal distributions"
-Funnel Chart,Sequential step conversion,Categorical,Drop-off,"import plotly.express as px; px.funnel(df, x='count', y='stage')","Blues from dark to light (top to bottom); or brand colors","Label conversion percentages between stages","Essential for showing conversion drop-off"
-Waterfall Chart,Cumulative effect of values,Categorical,Contribution,"Use plotly or custom matplotlib with positive/negative coloring","Green for positive; red for negative; gray for subtotals","Start with total; show increases and decreases clearly","Great for bridge charts (start to end explanation)"
-Gauge Chart,Single KPI with target,Single Value,Target,"Use plotly Indicator or custom graphic","Green/yellow/red zones based on targets","Include actual value and target","Use sparingly - one per major KPI"
-KPI Card,Single important metric,Single Value,Status,"Text display with conditional formatting","Color based on performance (green/amber/red)","Large font; include trend arrow and context","Top of dashboard for most important metrics"
-Sparkline,Compact trend indicator,Time-series,Trend,"Line chart rendered small without axes","Single color; consistent across dashboard","May be too small for some users","Great alongside KPI cards to show trend"
-Table,Detailed data viewing,Multi-dimensional,Detail,"df.style.format() with conditional formatting","Alternate row colors; highlight important values","Ensure sufficient contrast; limit columns","Place at bottom of dashboard for drill-down"
-Pivot Table,Cross-tabulation analysis,Multi-dimensional,Comparison,"pd.pivot_table(df, values='metric', index='row', columns='col')","Heatmap coloring on values if applicable","Include row/column totals","Good for exploration; use charts for communication"
-Treemap,Hierarchical part-to-whole,Hierarchical,Composition,"import plotly.express as px; px.treemap(df, path=['parent', 'child'], values='value')","Distinct colors per category; size shows proportion","Include value labels on large segments","Good for budget/allocation visualization"
-Sankey Diagram,Flow between categories,Flow,Flow,"Use plotly Sankey for flow visualization","Distinct colors per source node","Limit to 5-10 nodes for readability","Perfect for showing customer journeys"
-Radar/Spider Chart,Multi-variable comparison,Multi-dimensional,Profile,"Use matplotlib radar chart or plotly","One color per entity being compared","Include reference lines; limit to 5-8 axes","Good for segment profiles or competitive analysis"
-Geographic Map,Location-based data,Geographic,Distribution,"Use folium or plotly for choropleth maps","Sequential color scale for values","Ensure color scale is clear; include legend","Use for regional performance comparisons"
-Calendar Heatmap,Activity over time,Time-series,Pattern,"Use calplot or custom heatmap by day","Sequential palette; highlight weekends differently","Label axes clearly; include color legend","Good for showing seasonal patterns"
-Combination Chart,Mixed data types,Mixed,Correlation Trend,"Use secondary y-axis: ax2 = ax1.twinx()","Distinct colors for each series; clear legend","Ensure both scales are readable","Use when showing related but different units"
-Small Multiples,Comparison across categories,Multi-dimensional,Comparison,"Use facet plot: sns.FacetGrid(df, col='category')","Consistent scale and colors across all charts","Keep individual charts simple","Great for comparing patterns across segments"
-Bullet Chart,KPI vs target and comparison,Single Value,Target Comparison,"Use plotly for bullet chart implementation","Gray for comparison; bars for actual/target","Include labels for all elements","Compact alternative to gauge for multiple KPIs"

package/assets/.shared/data-analyst/cleaning.csv DELETED

@@ -1,21 +0,0 @@
-Issue Type,Detection Method,Solution,Python Code,SQL Code,Impact
-Missing Values,df.isnull().sum() and df.isnull().mean(),"Drop rows, impute with mean/median/mode, forward fill, or flag with indicator","df['col'].fillna(df['col'].median(), inplace=True) or df.dropna(subset=['required_col'])","COALESCE(column, 0) or WHERE column IS NOT NULL","Missing data can skew aggregations, break joins, and cause errors"
-Duplicate Rows,df.duplicated().sum() and df[df.duplicated()],"Remove exact duplicates or dedupe by key keeping latest","df.drop_duplicates(subset=['id'], keep='last', inplace=True)","WITH ranked AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) as rn FROM table) SELECT * FROM ranked WHERE rn = 1","Duplicates inflate counts, sums, and cause join multiplication"
-Outliers,df.describe() and boxplot IQR method,"Remove, cap/Winsorize, or investigate - depends on domain","Q1, Q3 = df['col'].quantile([0.25, 0.75]); IQR = Q3-Q1; df = df[(df['col'] >= Q1-1.5*IQR) & (df['col'] <= Q3+1.5*IQR)]","WHERE value BETWEEN (SELECT PERCENTILE_CONT(0.25) - 1.5*IQR) AND (SELECT PERCENTILE_CONT(0.75) + 1.5*IQR)","Outliers can dominate averages and distort visualizations"
-Data Type Mismatch,df.dtypes and df['col'].apply(type).value_counts(),"Convert to correct type with error handling","df['date'] = pd.to_datetime(df['date'], errors='coerce'); df['amount'] = pd.to_numeric(df['amount'], errors='coerce')","CAST(column AS DATE) or TRY_CAST for safe conversion","Wrong types cause sorting, filtering, and aggregation errors"
-Inconsistent Date Formats,df['date'].str.contains(pattern).value_counts(),"Standardize to ISO format YYYY-MM-DD","df['date'] = pd.to_datetime(df['date'], format='mixed').dt.strftime('%Y-%m-%d')","TO_DATE(date_string, 'format pattern')","Inconsistent dates cause parsing errors and incorrect sorting"
-Leading/Trailing Whitespace,df['col'].str.len() vs df['col'].str.strip().str.len(),"Strip whitespace from string columns","df['col'] = df['col'].str.strip()","TRIM(column)","Whitespace causes join failures and lookup misses"
-Case Inconsistency,df['col'].str.lower().nunique() vs df['col'].nunique(),"Standardize to lowercase or title case","df['col'] = df['col'].str.lower() or .str.title()","LOWER(column) or UPPER(column)","Case differences cause grouping errors and aggregation issues"
-Invalid Categories,df['category'].isin(valid_list).value_counts(),"Map invalid values or flag/remove","df['category'] = df['category'].replace({'invalid': 'Unknown'}); df = df[df['category'].isin(valid_list)]","CASE WHEN category IN ('valid1', 'valid2') THEN category ELSE 'Other' END","Invalid categories skew analysis and break filters"
-Negative Values Where Impossible,df[df['col'] < 0].count(),"Flag, remove, or convert to absolute value","df = df[df['quantity'] >= 0] or df['quantity'] = df['quantity'].abs()","WHERE quantity >= 0 or ABS(quantity)","Negative quantities/prices indicate data entry errors"
-Future Dates in Historical Data,df[df['date'] > pd.Timestamp.today()],"Remove or flag future-dated records","df = df[df['date'] <= pd.Timestamp.today()]","WHERE date <= CURRENT_DATE","Future dates indicate data pipeline or entry errors"
-Zero Division Risk,df[df['denominator'] == 0].count(),"Handle zeros before division with NULLIF or fillna","df['ratio'] = df['numerator'] / df['denominator'].replace(0, np.nan)","numerator / NULLIF(denominator, 0)","Division by zero causes errors or inf values"
-Encoding Issues,df['col'].str.contains('[^\x00-\x7F]'),"Fix encoding or remove special characters","df['col'] = df['col'].str.encode('ascii', 'ignore').str.decode('ascii')","REGEXP_REPLACE(col, '[^[:ascii:]]', '')","Encoding issues cause display and processing errors"
-Null vs Zero Ambiguity,df['col'].isin([0, None]).value_counts(),"Decide semantic meaning: is 0 different from null?","Document decision: df['col'] = df['col'].fillna(0) # if semantically equivalent","Add explicit flag: CASE WHEN col IS NULL THEN 'Unknown' ELSE 'Known' END","Confusing null with zero leads to calculation errors"
-Data Entry Typos,df['name'].str.lower().value_counts() looking for similar values,"Use fuzzy matching to identify and merge typos","from fuzzywuzzy import fuzz; identify similar strings","Use pg_trgm or Levenshtein distance functions","Typos split metrics that should be grouped together"
-Orphan Records,df.merge(reference_df, how='left', indicator=True).query('_merge == \"left_only\"'),"Remove orphans or add to reference table","df = df[df['foreign_key'].isin(reference_df['id'])]","WHERE foreign_key IN (SELECT id FROM reference_table)","Orphan records indicate referential integrity issues"
-Mixed Numeric Formats,df['col'].str.contains(r'[\$,€%]'),"Extract numeric values removing currency/percent symbols","df['amount'] = df['amount'].str.replace('[$,]', '', regex=True).astype(float)","CAST(REPLACE(REPLACE(col, '$', ''), ',', '') AS DECIMAL)","Mixed formats prevent numeric operations"
-Boolean Inconsistency,df['flag'].unique() showing mixed True/False/1/0/Yes/No,"Standardize to consistent boolean representation","df['flag'] = df['flag'].map({'Yes': True, 'No': False, 1: True, 0: False})","CASE WHEN flag IN ('Yes', 'Y', '1', 'true') THEN TRUE ELSE FALSE END","Inconsistent booleans cause filtering errors"
-Data Freshness,df['updated_at'].max() vs expected freshness,"Alert if data is stale beyond threshold","assert (pd.Timestamp.today() - df['updated_at'].max()).days < 1, 'Data is stale'","WHERE updated_at >= CURRENT_DATE - INTERVAL '1 day'","Stale data leads to incorrect analysis and decisions"
-Cardinality Changes,Compare df['col'].nunique() to historical baseline,"Alert if cardinality changes unexpectedly","assert df['category'].nunique() == expected_count, f'Expected {expected_count} categories'","SELECT COUNT(DISTINCT col) and compare to metadata","New or missing categories indicate upstream issues"
-Range Violations,df[~df['col'].between(min_val, max_val)],"Flag or remove out-of-range values","df = df[df['age'].between(0, 120)]","WHERE age BETWEEN 0 AND 120","Out-of-range values indicate data quality issues"
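
Note on the deleted cleaning.csv rows: the IQR outlier rule and the zero-division guard they describe can be sketched without pandas. This is a standard-library approximation, not code from the package. The helper names `iqr_filter` and `safe_ratio` are hypothetical, and `statistics.quantiles(..., method="inclusive")` is chosen on the assumption that linear interpolation between data points is the intended quartile definition.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    # "inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def safe_ratio(numerator, denominator):
    """Return None instead of raising when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


kept = iqr_filter([1, 2, 3, 4, 100])  # 100 falls outside the upper fence
ratio = safe_ratio(10, 0)             # None rather than ZeroDivisionError
```

Whether outliers should be dropped, capped, or only flagged depends on the domain, as the Solution column of that row already says; the sketch shows only the drop variant.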

package/assets/.shared/data-analyst/databases.csv DELETED

@@ -1,35 +0,0 @@
-Database,Category,Guideline,Do,Don't,Code Example
-PostgreSQL,Connection,Use connection pooling for efficiency,"Use psycopg2 pool or SQLAlchemy with pool_size","Create new connection for each query","from sqlalchemy import create_engine; engine = create_engine('postgresql://...', pool_size=5)"
-PostgreSQL,Query,Use EXPLAIN ANALYZE for query tuning,"EXPLAIN ANALYZE your slow queries; check for seq scans","Optimize blindly without understanding execution plan","EXPLAIN ANALYZE SELECT * FROM orders WHERE date > '2024-01-01'"
-PostgreSQL,Indexing,Create indexes on filtered and joined columns,"CREATE INDEX idx_date ON orders(order_date)","Index every column; forget to ANALYZE after","CREATE INDEX CONCURRENTLY to avoid locking"
-PostgreSQL,Dates,Use date_trunc for time grouping,"date_trunc('month', order_date)","String manipulation on dates","SELECT date_trunc('day', ts) as day, COUNT(*) FROM events GROUP BY 1"
-PostgreSQL,Window,Use window functions for analytics,"OVER (PARTITION BY ... ORDER BY ...)","Self-joins for running totals","SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)"
-PostgreSQL,CTEs,Use CTEs for readable complex queries,"WITH step1 AS (...), step2 AS (...)","Deeply nested subqueries","WITH monthly AS (SELECT date_trunc('month', date) ...) SELECT * FROM monthly"
-BigQuery,Cost,Limit scanned data with partitions and clustering,"Use WHERE on partition column; SELECT only needed columns","SELECT * without partition filter","WHERE _PARTITIONDATE BETWEEN '2024-01-01' AND '2024-01-31'"
-BigQuery,Partitioning,Partition tables by date for cost and performance,"PARTITION BY DATE(timestamp_column)","Query without partition filter","CREATE TABLE ... PARTITION BY DATE(created_at)"
-BigQuery,Slots,Understand slot allocation for query performance,"Use INFORMATION_SCHEMA for slot usage; optimize large scans","Ignore slot exhaustion warnings","SELECT * FROM project.INFORMATION_SCHEMA.JOBS_BY_PROJECT"
-BigQuery,UDFs,Use standard SQL before custom JavaScript UDFs,"Built-in functions are optimized","JavaScript UDFs for simple operations","Use SAFE_DIVIDE, IF, CASE instead of JS"
-BigQuery,Approximate,Use approximate functions for large datasets,"APPROX_COUNT_DISTINCT, APPROX_QUANTILES","Exact distinct counts on huge tables","SELECT APPROX_COUNT_DISTINCT(user_id) FROM events"
-BigQuery,Qualify,Use QUALIFY for window function filtering,"QUALIFY ROW_NUMBER() OVER (...) = 1","Subquery wrapper for filtering","SELECT * FROM table QUALIFY RANK() OVER (PARTITION BY id ORDER BY ts DESC) = 1"
-Snowflake,Warehouse,Size warehouse appropriately for workload,"Use X-Small for dev; auto-suspend after 60s","Leave large warehouse running idle","ALTER WAREHOUSE SET WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60"
-Snowflake,Clustering,Use clustering keys for large tables,"Cluster on frequently filtered columns","Cluster on high-cardinality columns","ALTER TABLE orders CLUSTER BY (region, order_date)"
-Snowflake,Time Travel,Use time travel for data recovery and analysis,"Query historical data: AT (TIMESTAMP => ...)","Forget Time Travel is available for debugging","SELECT * FROM table AT (OFFSET => -3600)"
-Snowflake,Zero Copy Clone,Clone tables for testing without storage cost,"CREATE TABLE test_copy CLONE production","Full physical copies for testing","CREATE TABLE dev.orders CLONE prod.orders"
-MySQL,Indexes,Use composite indexes for multi-column queries,"Create index matching WHERE + ORDER BY columns","Too many indexes slow writes","CREATE INDEX idx_user_date ON orders(user_id, order_date)"
-MySQL,Query Cache,Understand query cache behavior (deprecated in 8.0),"Use application-level caching instead","Rely on MySQL query cache","Use Redis or Memcached for caching"
-MySQL,Limits,Use LIMIT with ORDER BY for pagination,"ORDER BY id LIMIT 100 OFFSET 200","LIMIT without ORDER BY (non-deterministic)","SELECT * FROM users ORDER BY id LIMIT 100 OFFSET 0"
-MySQL,Joins,Prefer JOINs over subqueries when possible,"Rewrite correlated subqueries as JOINs","Correlated subqueries for large datasets","JOIN instead of WHERE x IN (SELECT ...)"
-SQLite,Limitations,Understand SQLite limitations for analytics,"Good for local dev and small datasets","Use for production with concurrent writes","Maximum practical size ~1TB; limited concurrency"
-SQLite,Types,SQLite uses dynamic typing,"Check actual types: typeof(column)","Assume strict typing like other databases","Be aware: '1' and 1 may both be stored"
-Redshift,Distribution,Choose distribution key for join performance,"Distribute on frequently joined column","ALL distribution for large tables","CREATE TABLE ... DISTKEY(user_id)"
-Redshift,Sort Keys,Use sort keys for range queries,"Sort on commonly filtered date columns","Too many sort key columns","CREATE TABLE ... SORTKEY(created_date)"
-Redshift,Vacuum,Run VACUUM and ANALYZE regularly,"Schedule VACUUM DELETE ONLY weekly","Forget to reclaim space after deletes","VACUUM FULL table_name; ANALYZE table_name"
-Redshift,Spectrum,Use Spectrum for querying S3 data directly,"External tables for cold/historical data","Load all data into Redshift","CREATE EXTERNAL TABLE pointing to S3"
-MongoDB,Aggregation,Use aggregation pipeline for analytics,"$match early to reduce documents","Process large result sets in application","db.collection.aggregate([{$match: ...}, {$group: ...}])"
-MongoDB,Indexes,Create indexes for query patterns,"Compound indexes matching query predicates","Index fields not used in queries","db.collection.createIndex({user_id: 1, date: -1})"
-DynamoDB,Queries,Design for query patterns not data model,"Access patterns determine table design","Scan operations on large tables","Use Query with partition key; avoid Scan"
-DynamoDB,GSI,Use Global Secondary Indexes for alternate access,"GSI for different query patterns","Too many GSIs (cost and write amplification)","Create GSI for each major access pattern"
-General,Testing,Test queries on sample data first,"Use LIMIT or sampling for initial development","Run untested queries on full production data","SELECT * FROM table TABLESAMPLE (1 PERCENT)"
-General,Transactions,Use transactions for data integrity,"Wrap related changes in transactions","Auto-commit for multi-statement updates","BEGIN; UPDATE ...; UPDATE ...; COMMIT;"
-General,Comments,Document complex queries with comments,"Add comments explaining business logic","Uncommented complex SQL","-- Calculate 30-day rolling revenue per customer"
-General,Parameterization,Use parameterized queries to prevent SQL injection,"Use ? or :param placeholders","String concatenation for query building","cursor.execute('SELECT * FROM users WHERE id = ?', (user_id,))"
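
Note on the Parameterization row: the `?` placeholder in its code example matches the qmark style used by Python's built-in `sqlite3` module, so the guideline can be tried locally. The following is a minimal sketch under that assumption; the `users` table, its rows, and the `find_user` helper are made up for illustration and are not part of the package.

```python
import sqlite3


def find_user(conn, user_id):
    """Look up a user with a bound parameter instead of string concatenation."""
    cursor = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    )
    return cursor.fetchall()


# Illustrative in-memory database; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "ada"), (2, "grace")],
)

rows = find_user(conn, 1)
# The whole string is bound as a single value, so it is never parsed as SQL.
injected = find_user(conn, "1 OR 1=1")
```

Other drivers use different placeholder styles (for example `%s` or `:name`), so the exact marker depends on the database library in use.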

package/assets/.shared/data-analyst/industries/ecommerce.csv DELETED

@@ -1,25 +0,0 @@
-Metric Name,Abbreviation,Category,Formula,Interpretation,Good Benchmark,Related Metrics,Visualization
-Conversion Rate,CVR,Conversion,"Orders / Sessions * 100","Percentage of visits resulting in purchase","2-3% average; 5%+ excellent","AOV, Traffic, Add to Cart Rate",Line chart trend
-Add to Cart Rate,ATC Rate,Conversion,"Add to Carts / Sessions * 100","Percentage adding items to cart","5-10% typical","CVR, Cart Abandonment",Funnel
-Cart Abandonment Rate,Cart Abandon,Conversion,"(Carts Created - Purchases) / Carts Created * 100","Percentage of carts not converted","~70% average; < 60% is good","Checkout Conversion, Payment Failure",Funnel
-Checkout Completion Rate,Checkout,Conversion,"Purchases / Checkout Started * 100","Checkout funnel success","70-80% is good","Cart Abandonment, Payment Methods",Funnel
-Average Order Value,AOV,Revenue,"Total Revenue / Number of Orders","Average amount per order","Varies by category; track trend","Revenue, Items per Order",KPI card trend
-Revenue Per Visitor,RPV,Revenue,"Total Revenue / Total Visitors","Revenue generated per visit","AOV * CVR","CVR, AOV, Traffic",KPI card
-Gross Merchandise Value,GMV,Revenue,"Total value of merchandise sold","Platform transaction volume","Growing MoM/YoY","Revenue, Take Rate",Line chart
-Net Revenue,Net Revenue,Revenue,"GMV - Returns - Discounts - Cancellations","Actual revenue after adjustments","Net/Gross ratio trending up","GMV, Return Rate",Line chart
-Customer Acquisition Cost,CAC,Acquisition,"Marketing Spend / New Customers","Cost to acquire new customer","CAC < first order margin","LTV, ROAS",KPI card
-Customer Lifetime Value,CLV,Acquisition,"AOV * Purchase Frequency * Customer Lifespan","Total expected customer revenue","CLV:CAC > 3:1","CAC, Repeat Rate, AOV",KPI card
-Cost Per Order,CPO,Acquisition,"Marketing Spend / Orders","Marketing cost per order","Track by channel","CAC, ROAS",Bar by channel
-Return on Ad Spend,ROAS,Acquisition,"Revenue from Ads / Ad Spend","Revenue per advertising dollar","3:1+ typically profitable","CAC, CPO",Line chart
-Repeat Purchase Rate,Repeat Rate,Retention,"Customers with 2+ Orders / Total Customers * 100","Customer returning to buy again","> 30% for healthy retention","CLV, Purchase Frequency",Line chart
-Purchase Frequency,Frequency,Retention,"Total Orders / Unique Customers (in period)","Average orders per customer","Varies by category; track trend","Repeat Rate, CLV",KPI card
-Time Between Purchases,TBP,Retention,"Average days between customer orders","Purchase cycle length","Use for remarketing timing","Frequency, Repeat Rate",Histogram
-Reactivation Rate,Reactivation,Retention,"Dormant Customers Who Returned / Total Dormant * 100","Success of win-back campaigns","5-15% typical for campaigns","Churn, Win-back Campaigns",Bar chart
-Return Rate,Return Rate,Fulfillment,"Returned Orders / Total Orders * 100","Percentage of orders returned","< 20% most; < 30% apparel","Net Revenue, COGS",Line chart
-On-Time Delivery Rate,OTD,Fulfillment,"Orders Delivered On Time / Total Shipped * 100","Shipping reliability","> 95% is excellent","Customer Satisfaction",Gauge
-Stock-out Rate,Stockout,Inventory,"Items Out of Stock / Total SKUs * 100","Inventory availability","< 5% for popular items","Inventory Turnover, Lost Sales",Line chart
-Inventory Turnover,Inv Turn,Inventory,"COGS / Average Inventory","How fast inventory sells","Higher is better; varies by cat","DOI, Stockout Rate",Bar chart
-Days of Inventory,DOI,Inventory,"Average Inventory / (COGS / 365)","Days to sell current inventory","30-60 days typical","Inventory Turnover",KPI card
-Website Traffic,Traffic,Traffic,"Total Sessions","Visit volume to site","Growing with quality","Bounce Rate, CVR",Line chart
-Bounce Rate,Bounce,Traffic,"Single-page Sessions / Total Sessions * 100","Visitors leaving immediately","< 40% for product pages","Time on Site, Pages/Session",Line chart
-Pages per Session,PPS,Traffic,"Total Pageviews / Sessions","Engagement depth","2-4 typical; higher for discovery","Bounce Rate, Time on Site",KPI card
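
Note on the deleted ecommerce.csv rows: the Revenue Per Visitor row lists "AOV * CVR" in its benchmark column. The sketch below checks that relationship with the formulas from the CVR, AOV, and RPV rows. It assumes visitors and sessions are counted the same way, which the file does not state, and the input numbers and function names are invented for illustration.

```python
import math


def conversion_rate(orders, sessions):
    """CVR as a percentage: Orders / Sessions * 100."""
    return orders / sessions * 100


def average_order_value(revenue, orders):
    """AOV: Total Revenue / Number of Orders."""
    return revenue / orders


def revenue_per_visitor(revenue, visitors):
    """RPV: Total Revenue / Total Visitors."""
    return revenue / visitors


# Illustrative inputs only.
orders, sessions, revenue = 50, 2000, 4000.0

cvr = conversion_rate(orders, sessions)
aov = average_order_value(revenue, orders)
rpv = revenue_per_visitor(revenue, sessions)  # assumes visitors == sessions

# CVR is a percentage here, so divide by 100 before multiplying by AOV.
identity_holds = math.isclose(rpv, aov * cvr / 100)
```

If visitors are deduplicated while sessions are not, the two sides diverge, so the identity is only as good as that assumption.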

package/assets/.shared/data-analyst/industries/finance.csv DELETED

@@ -1,24 +0,0 @@
-Metric Name,Abbreviation,Category,Formula,Interpretation,Good Benchmark,Related Metrics,Visualization
-Gross Profit Margin,GPM,Profitability,"(Revenue - COGS) / Revenue * 100","Profit after direct costs","Varies by industry; 40-60% typical","Net Margin, COGS",KPI card
-Net Profit Margin,NPM,Profitability,"Net Income / Revenue * 100","Bottom line profitability","Positive and stable","GPM, Operating Margin",KPI card
-Operating Margin,Op Margin,Profitability,"Operating Income / Revenue * 100","Core business profitability","Positive for viable business","EBITDA Margin, SG&A",KPI card
-EBITDA Margin,EBITDA,Profitability,"EBITDA / Revenue * 100","Cash-based profitability","Used for comparison across capital structures","Operating Margin, D&A",KPI card
-Return on Assets,ROA,Returns,"Net Income / Total Assets * 100","Asset efficiency","> 5% generally good","ROE, Asset Turnover",KPI card
-Return on Equity,ROE,Returns,"Net Income / Shareholders Equity * 100","Return to shareholders","> 15% considered good","ROA, Leverage",KPI card
-Return on Investment,ROI,Returns,"(Gain - Cost) / Cost * 100","Investment profitability","> 0% means profitable","IRR, Payback Period",KPI card
-Return on Capital Employed,ROCE,Returns,"EBIT / Capital Employed * 100","Efficiency of capital use","> cost of capital","ROIC, Capital Efficiency",KPI card
-Current Ratio,Current,Liquidity,"Current Assets / Current Liabilities","Short-term liquidity","1.5-2.0 typically healthy","Quick Ratio, Working Capital",Gauge
-Quick Ratio,Quick Ratio,Liquidity,"(Current Assets - Inventory) / Current Liabilities","Immediate liquidity","> 1.0 generally healthy","Current Ratio, Cash Ratio",Gauge
-Cash Ratio,Cash Ratio,Liquidity,"Cash / Current Liabilities","Most conservative liquidity","> 0.5 is comfortable","Quick Ratio, Operating Cash",KPI card
-Working Capital,Working Cap,Liquidity,"Current Assets - Current Liabilities","Operating liquidity","Positive and stable","Current Ratio, Cash Conversion",KPI card
-Debt to Equity Ratio,D/E,Leverage,"Total Debt / Shareholders Equity","Financial leverage","< 2.0 generally healthy","Leverage Ratio, Interest Coverage",KPI card
-Debt to Assets Ratio,D/A,Leverage,"Total Debt / Total Assets","Asset leverage","< 0.5 is conservative","D/E, Asset Coverage",KPI card
-Interest Coverage Ratio,ICR,Leverage,"EBIT / Interest Expense","Ability to pay interest","> 3.0 is comfortable","D/E, Debt Service",KPI card
-Asset Turnover,Asset Turn,Efficiency,"Revenue / Average Total Assets","Asset productivity","Higher is more efficient","ROA, Inventory Turnover",KPI card
-Receivables Turnover,AR Turn,Efficiency,"Revenue / Average Accounts Receivable","Collection efficiency","Higher = faster collection","DSO, Cash Conversion",KPI card
-Days Sales Outstanding,DSO,Efficiency,"(Accounts Receivable / Revenue) * 365","Average collection period","Lower is better; industry varies","AR Turnover, Cash Cycle",KPI card
-Cash Conversion Cycle,CCC,Efficiency,"DIO + DSO - DPO","Days to convert inventory to cash","Shorter is better","DSO, DIO, DPO",KPI card
-Revenue Growth Rate,Rev Growth,Growth,"(Current - Prior) / Prior * 100","Revenue increase rate","Depends on stage","YoY, MoM, CAGR",Line chart
-CAGR,CAGR,Growth,"(End Value / Start Value)^(1/Years) - 1","Compound annual growth","Track vs peers and market","Revenue Growth, Projections",KPI card
-Burn Rate,Burn,Cash,"Monthly Operating Expenses - Monthly Revenue","Net cash consumed","Lower = more runway","Runway, Cash Balance",Line chart
-Runway,Runway,Cash,"Cash Balance / Monthly Burn Rate","Months of operations left","> 18 months for fundraising","Burn Rate, Cash Balance",KPI card
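
Note on the deleted finance.csv rows: the CAGR, Burn Rate, and Runway formulas can be worked through in a few lines. This is a sketch of those three formulas as written in the rows above, with made-up inputs. The function names and the choice to return infinity when there is no net burn are assumptions, not package behavior.

```python
import math


def cagr(start_value, end_value, years):
    """CAGR: (End Value / Start Value)^(1/Years) - 1, as a fraction."""
    return (end_value / start_value) ** (1 / years) - 1


def burn_rate(monthly_expenses, monthly_revenue):
    """Net burn: Monthly Operating Expenses - Monthly Revenue."""
    return monthly_expenses - monthly_revenue


def runway_months(cash_balance, monthly_burn):
    """Runway: Cash Balance / Monthly Burn Rate."""
    if monthly_burn <= 0:
        # Assumption: no net burn means cash is not running down.
        return math.inf
    return cash_balance / monthly_burn


# Illustrative inputs only.
growth = cagr(100.0, 121.0, 2)          # about 0.10, i.e. 10% per year
burn = burn_rate(100_000, 50_000)       # 50,000 net cash consumed per month
months = runway_months(900_000, burn)   # 18 months at that burn
```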

package/assets/.shared/data-analyst/industries/marketing.csv DELETED

@@ -1,25 +0,0 @@
-Metric Name,Abbreviation,Category,Formula,Interpretation,Good Benchmark,Related Metrics,Visualization
-Click-Through Rate,CTR,Advertising,"Clicks / Impressions * 100","Ad engagement rate","2-5% search; 0.5-1% display","CPC, Quality Score",Line chart
-Cost Per Click,CPC,Advertising,"Ad Spend / Clicks","Cost for each click","Varies by industry/keyword","CTR, CPA",KPI by channel
-Cost Per Acquisition,CPA,Advertising,"Ad Spend / Conversions","Cost per conversion","Should be < customer value","CAC, LTV, ROAS",Bar by channel
-Cost Per Mille,CPM,Advertising,"(Ad Spend / Impressions) * 1000","Cost per thousand impressions","Varies by channel/targeting","Reach, Frequency",KPI card
-Return on Ad Spend,ROAS,Advertising,"Revenue from Ads / Ad Spend","Revenue per ad dollar","3:1+ typically profitable","ROI, CPA",Line chart
-Impression Share,ImpShare,Advertising,"Your Impressions / Total Eligible Impressions * 100","Ad visibility percentage","Depends on strategy; > 80% for brand","Budget, Quality Score",Gauge
-Quality Score,QS,Advertising,"Platform score 1-10 based on relevance","Ad quality indicator (Google)","7+ is good; 9+ is excellent","CTR, Landing Page Score",Bar chart
-Customer Acquisition Cost,CAC,Acquisition,"Total Marketing Spend / New Customers","Full cost to acquire customer","CAC < LTV/3","LTV, Payback Period",KPI card trend
-Marketing Qualified Lead,MQL,Lead Gen,"COUNT leads meeting marketing criteria","Leads ready for nurturing","Growing count; stable conversion","SQL, Lead Velocity",Line chart
-Sales Qualified Lead,SQL,Lead Gen,"COUNT leads meeting sales criteria","Leads accepted by sales","MQL → SQL > 30%","MQL, Opportunity, Win Rate",Line chart
-Lead Conversion Rate,Lead Conv,Lead Gen,"Customers / Leads * 100","Lead to customer rate","10-20% for B2B","SQL Conv, Sales Cycle",Funnel
-Lead Velocity Rate,LVR,Lead Gen,"(MQLs This Month - MQLs Last Month) / MQLs Last Month * 100","Growth in qualified leads","Positive indicates pipeline growth","MQL Count, Revenue Pipeline",Line chart
-Cost Per Lead,CPL,Lead Gen,"Marketing Spend / Leads Generated","Cost to generate one lead","Varies by industry; track trend","CPA, Lead Quality",Bar by channel
-Email Open Rate,Open Rate,Email,"Opens / Emails Delivered * 100","Email engagement","15-25% typical","Subject Line, Send Time",Line chart
-Email Click Rate,Email CTR,Email,"Clicks / Emails Delivered * 100","Email action rate","2-5% typical","Open Rate, CTA Copy",Line chart
-Email Unsubscribe Rate,Unsub Rate,Email,"Unsubscribes / Emails Delivered * 100","List health indicator","< 0.5% is healthy","Complaint Rate, List Hygiene",Line chart
-Email Bounce Rate,Bounce Rate,Email,"Bounced / Emails Sent * 100","Deliverability issue indicator","< 2% is healthy","List Quality, Deliverability",Line chart
-Social Engagement Rate,Engagement,Social,"(Likes + Comments + Shares) / Followers * 100","Content resonance","1-5% depends on platform","Reach, Follower Growth",Bar by platform
-Social Reach,Reach,Social,"Unique users who saw content","Content visibility","Growing indicates growth","Impressions, Engagement",Line chart
-Follower Growth Rate,Follower Grw,Social,"(New Followers - Lost Followers) / Total Followers * 100","Audience growth","Positive and accelerating","Engagement, Content Quality",Line chart
-Website Bounce Rate,Bounce,Website,"Single-page Sessions / Sessions * 100","Landing page effectiveness","< 40% for content; < 60% for landing","Time on Site, Pages/Session",Line chart by page
-Pages Per Session,PPS,Website,"Pageviews / Sessions","Content engagement depth","> 2 for content sites","Bounce Rate, Session Duration",KPI card
-Average Session Duration,Avg Session,Website,"Total Session Time / Sessions","Time spent per visit","> 2 min for content sites","Bounce Rate, Pages/Session",KPI card
-Conversion Rate,Conv Rate,Website,"Conversions / Visitors * 100","Website goal completion","2-5% for lead gen","Traffic, Landing Page Quality",Funnel
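
The advertising formulas in the removed marketing.csv rows (CTR, CPC, CPA, CPM, ROAS) all divide one campaign total by another, so a small sketch can compute them together. The `AdStats` container and `ad_metrics` function are hypothetical names for illustration, and returning `None` for a zero denominator is an assumption rather than anything the CSV specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdStats:
    """Hypothetical container for one campaign's raw totals."""
    impressions: int
    clicks: int
    conversions: int
    spend: float
    revenue: float


def _ratio(numerator: float, denominator: float) -> Optional[float]:
    # Assumption: an undefined ratio is reported as None instead of raising.
    return numerator / denominator if denominator else None


def ad_metrics(s: AdStats) -> dict:
    """Apply the CSV formulas for CTR, CPC, CPA, CPM and ROAS."""
    ctr = _ratio(s.clicks, s.impressions)
    cpm = _ratio(s.spend, s.impressions)
    return {
        "ctr_pct": None if ctr is None else ctr * 100,  # Clicks / Impressions * 100
        "cpc": _ratio(s.spend, s.clicks),               # Ad Spend / Clicks
        "cpa": _ratio(s.spend, s.conversions),          # Ad Spend / Conversions
        "cpm": None if cpm is None else cpm * 1000,     # (Ad Spend / Impressions) * 1000
        "roas": _ratio(s.revenue, s.spend),             # Revenue from Ads / Ad Spend
    }


if __name__ == "__main__":
    stats = AdStats(impressions=10_000, clicks=250, conversions=10,
                    spend=500.0, revenue=2_000.0)
    for name, value in ad_metrics(stats).items():
        print(name, value)
```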

package/assets/.shared/data-analyst/industries/saas.csv DELETED

@@ -1,24 +0,0 @@
-Metric Name,Abbreviation,Category,Formula,Interpretation,Good Benchmark,Related Metrics,Visualization
-Monthly Recurring Revenue,MRR,Revenue,"SUM(active_subscriptions * monthly_price)","Predictable monthly revenue from subscriptions","Growing 10%+ MoM for early stage; 2-3% for mature","ARR, ARPU, Churn",Line chart with trend
-Annual Recurring Revenue,ARR,Revenue,"MRR * 12","Annualized predictable revenue","ARR > $1M for Series A; $10M for Series B","MRR, Growth Rate",KPI card with YoY
-New MRR,New MRR,Revenue,"SUM(new_customer MRR this period)","Revenue from new customers acquired","Growing indicates sales effectiveness","CAC, Expansion MRR",Stacked bar
-Expansion MRR,Expansion MRR,Revenue,"SUM(upsell + cross-sell MRR this period)","Revenue growth from existing customers","> 30% of total growth is healthy","NRR, Upsell Rate",Stacked bar
-Churned MRR,Churned MRR,Revenue,"SUM(cancelled subscription MRR this period)","Revenue lost from cancellations","< 2% of total MRR monthly","Churn Rate, NRR",Stacked bar
-Net New MRR,Net New MRR,Revenue,"New MRR + Expansion MRR - Churned MRR - Contraction MRR","Net change in recurring revenue","Positive and growing","MRR Growth Rate",Line chart
-Average Revenue Per Account,ARPA,Revenue,"Total MRR / Number of Accounts","Average revenue per customer account","Trending up indicates upselling success","MRR, Customer Count",KPI card
-Customer Churn Rate,Logo Churn,Retention,"Churned Customers / Total Customers at Start * 100","Percentage of customers lost","< 5% monthly for B2B; < 7% for SMB","Revenue Churn, NRR",Line chart
-Revenue Churn Rate,Rev Churn,Retention,"Churned MRR / Starting MRR * 100","Percentage of revenue lost to churn","< 2% monthly for B2B SaaS","Logo Churn, NRR",Line chart
-Net Revenue Retention,NRR,Retention,"(MRR Start + Expansion - Contraction - Churn) / MRR Start * 100","Revenue retained + expanded from existing customers","> 100% (best in class > 120%)","GRR, Expansion Rate",Line chart
-Gross Revenue Retention,GRR,Retention,"(MRR Start - Contraction - Churn) / MRR Start * 100","Revenue retained excluding expansion","> 90% for enterprise; > 80% for SMB","NRR, Churn Rate",Line chart
-Customer Lifetime Value,LTV,Unit Economics,"ARPA * Gross Margin % * (1 / Churn Rate)","Total value of customer relationship","LTV:CAC > 3:1","CAC, Payback Period",KPI card
-Customer Acquisition Cost,CAC,Unit Economics,"(Sales + Marketing Spend) / New Customers","Cost to acquire one customer","Varies by segment; track trend","LTV, Payback Period",KPI card trend
-LTV to CAC Ratio,LTV:CAC,Unit Economics,"LTV / CAC","Health of unit economics","> 3:1 for sustainable growth","LTV, CAC",Gauge or ratio
-CAC Payback Period,Payback,Unit Economics,"CAC / (ARPA * Gross Margin %)","Months to recover acquisition cost","< 12 months for healthy SaaS","CAC, ARPA, GrossMargin",KPI card
-Quick Ratio,Quick Ratio,Growth,"(New MRR + Expansion MRR) / (Churned MRR + Contraction MRR)","Growth efficiency metric","> 4:1 indicates strong growth","MRR components",KPI card
-Rule of 40,Rule of 40,Growth,"Revenue Growth % + Profit Margin %","Combined growth and profitability","> 40% is excellent","Growth Rate, Margin",KPI card
-Trial to Paid Conversion,Trial Conv,Conversion,"Paid Signups / Trial Signups * 100","Trial effectiveness","15-25% for freemium; 40-60% for trials","Activation Rate, TTV",Funnel
-Activation Rate,Activation,Conversion,"Users Reaching Aha Moment / Signups * 100","Percentage completing key action","Varies - define activation first","Trial Conv, Feature Adoption",Funnel
-Daily Active Users,DAU,Engagement,"COUNT(DISTINCT users with activity today)","Daily product engagement","Track trend; compare to MAU","WAU, MAU, DAU/MAU",Line chart
-Monthly Active Users,MAU,Engagement,"COUNT(DISTINCT users with activity this month)","Monthly product engagement","Growing indicates product health","DAU, Stickiness",Line chart
-Stickiness,DAU/MAU,Engagement,"DAU / MAU * 100","Product habit formation","> 20% good; > 50% exceptional","DAU, MAU, Session Frequency",Line chart
-Feature Adoption Rate,Feature Use,Engagement,"Users Using Feature / MAU * 100","Uptake of specific features","> 50% for core features","Activation, Engagement",Bar chart
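
The retention and unit-economics formulas in the removed saas.csv rows share the same MRR components, which the following sketch makes explicit. Function and parameter names are illustrative assumptions. Rates are passed as fractions (for example `0.02` for 2% monthly churn), which is also an assumption because the CSV does not fix a unit.

```python
def nrr(mrr_start: float, expansion: float, contraction: float, churn: float) -> float:
    """(MRR Start + Expansion - Contraction - Churn) / MRR Start * 100."""
    return (mrr_start + expansion - contraction - churn) / mrr_start * 100


def grr(mrr_start: float, contraction: float, churn: float) -> float:
    """(MRR Start - Contraction - Churn) / MRR Start * 100."""
    return (mrr_start - contraction - churn) / mrr_start * 100


def saas_quick_ratio(new_mrr: float, expansion: float,
                     churned: float, contraction: float) -> float:
    """(New MRR + Expansion MRR) / (Churned MRR + Contraction MRR)."""
    return (new_mrr + expansion) / (churned + contraction)


def ltv(arpa: float, gross_margin: float, churn_rate: float) -> float:
    """ARPA * Gross Margin % * (1 / Churn Rate), with rates as fractions."""
    return arpa * gross_margin * (1 / churn_rate)


def cac_payback_months(cac: float, arpa: float, gross_margin: float) -> float:
    """CAC / (ARPA * Gross Margin %), in months when ARPA is monthly."""
    return cac / (arpa * gross_margin)


if __name__ == "__main__":
    print(f"NRR: {nrr(100_000, 8_000, 2_000, 3_000):.1f}%")
    print(f"GRR: {grr(100_000, 2_000, 3_000):.1f}%")
    print(f"Quick ratio: {saas_quick_ratio(12_000, 8_000, 3_000, 2_000):.1f}")
    print(f"LTV: {ltv(500, 0.8, 0.02):,.0f}")
    print(f"Payback: {cac_payback_months(4_000, 500, 0.8):.1f} months")
```

GRR differs from NRR only by leaving out expansion, so under these formulas GRR can never exceed NRR for non-negative expansion.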

package/assets/.shared/data-analyst/metrics.csv DELETED

@@ -1,74 +0,0 @@
-Metric Name,Abbreviation,Industry,Formula,Interpretation,Good Benchmark,Related Metrics,Visualization
-Monthly Recurring Revenue,MRR,SaaS,"SUM(active_subscriptions * monthly_price)","Predictable monthly revenue from subscriptions","Growing 10%+ MoM for early stage","ARR, ARPU, Churn",Line chart with trend
-Annual Recurring Revenue,ARR,SaaS,"MRR * 12","Annualized predictable revenue","ARR > $1M for Series A","MRR, Growth Rate",KPI card with YoY change
-Customer Churn Rate,Churn,SaaS,"Churned Customers / Total Customers at Period Start * 100","Percentage of customers lost in period","< 5% monthly for B2B SaaS","NRR, Retention Rate, Customer Lifetime",Line chart or gauge
-Customer Acquisition Cost,CAC,SaaS,"(Sales Spend + Marketing Spend) / New Customers Acquired","Total cost to acquire one new customer","LTV:CAC > 3:1","LTV, Payback Period, Marketing Spend",KPI card with trend
-Customer Lifetime Value,LTV,SaaS,"ARPU * Customer Lifetime (1/Churn Rate)","Total revenue expected from a customer over their lifetime","LTV:CAC > 3:1","CAC, Churn, ARPU",KPI card
-LTV to CAC Ratio,LTV:CAC,SaaS,"LTV / CAC","Health of unit economics - value vs acquisition cost","> 3:1 for sustainable growth","LTV, CAC",Gauge or ratio display
-Net Revenue Retention,NRR,SaaS,"(MRR Start + Expansion - Contraction - Churn) / MRR Start * 100","Revenue retained from existing customers including expansion","> 100% means growing without new customers","Gross Retention, Expansion Revenue",Line chart
-Average Revenue Per User,ARPU,SaaS,"Total Revenue / Total Customers","Average revenue generated per customer","Depends on pricing model","MRR, LTV, Pricing Tier Distribution",KPI card
-Payback Period,Payback,SaaS,"CAC / (ARPU * Gross Margin)","Months to recover customer acquisition cost","< 12 months for healthy SaaS","CAC, ARPU, Gross Margin",KPI card
-Gross Revenue Retention,GRR,SaaS,"(MRR Start - Contraction - Churn) / MRR Start * 100","Revenue retained excluding expansion","> 90% for enterprise SaaS","NRR, Churn",Line chart
-Trial to Paid Conversion,Trial Conv,SaaS,"Paid Signups / Trial Signups * 100","Percentage of trials that become paying customers","15-25% for freemium, 40-60% for free trial","Activation Rate, Time to Conversion",Funnel chart
-Activation Rate,Activation,SaaS,"Activated Users / Signups * 100","Users who reach aha moment","Varies by product - define activation first","Trial Conversion, Feature Adoption",Funnel chart
-Daily Active Users,DAU,SaaS,"COUNT(DISTINCT users active today)","Users engaging with product daily","Depends on product type","MAU, DAU/MAU Ratio, Stickiness",Line chart
-Monthly Active Users,MAU,SaaS,"COUNT(DISTINCT users active this month)","Users engaging with product monthly","Target based on TAM","DAU, Retention",Line chart
-Stickiness Ratio,DAU/MAU,SaaS,"DAU / MAU * 100","How often users return - product habit","> 20% is good, > 50% is exceptional","DAU, MAU, Retention",Gauge
-Expansion Revenue,Expansion MRR,SaaS,"MRR from upsells + cross-sells","Revenue growth from existing customers","> 30% of total growth is healthy","NRR, Upsell Rate, Cross-sell Rate",Stacked bar chart
-Quick Ratio,Quick Ratio,SaaS,"(New MRR + Expansion MRR) / (Churned MRR + Contraction MRR)","Growth efficiency - new vs lost revenue","> 4:1 indicates strong growth","MRR Growth, Churn, Expansion",KPI card
-Conversion Rate,Conv Rate,E-commerce,"Purchases / Sessions * 100","Percentage of visits that result in purchase","2-3% average, 5%+ is excellent","AOV, Traffic, Cart Abandonment",Line chart with trend
-Average Order Value,AOV,E-commerce,"Total Revenue / Number of Orders","Average amount spent per order","Varies by industry - track trend over time","Conversion Rate, Items per Order",KPI card
-Cart Abandonment Rate,Cart Abandon,E-commerce,"(Carts Created - Purchases) / Carts Created * 100","Percentage of shopping carts not completed","Industry average ~70%","Checkout Conversion, Payment Failure Rate",Funnel chart
-Customer Acquisition Cost,CAC,E-commerce,"Marketing Spend / New Customers","Cost to acquire one new customer","Should be < first order profit","LTV, First Order Margin",KPI card
-Customer Lifetime Value,CLV,E-commerce,"AOV * Purchase Frequency * Customer Lifespan","Total expected revenue from a customer","CLV:CAC > 3:1","CAC, Repeat Rate, AOV",KPI card
-Repeat Purchase Rate,Repeat Rate,E-commerce,"Customers with 2+ Orders / Total Customers * 100","Percentage of customers who buy again","> 30% is healthy for most categories","Retention, CLV, Purchase Frequency",Line chart
-Purchase Frequency,Purchase Freq,E-commerce,"Total Orders / Unique Customers","Average orders per customer per period","Varies by category - track trend","Repeat Rate, CLV",KPI card
-Revenue Per Visitor,RPV,E-commerce,"Total Revenue / Total Visitors","Revenue generated per site visit","Track trend - combines traffic and conversion","Conversion Rate, AOV, Traffic",KPI card
-Return Rate,Return Rate,E-commerce,"Returned Orders / Total Orders * 100","Percentage of orders returned","< 20% for most categories, < 30% for apparel","Net Revenue, Customer Complaints",Line chart
-Gross Margin,Gross Margin,E-commerce,"(Revenue - COGS) / Revenue * 100","Profit after product costs","40-60% typical for retail","Net Margin, COGS, Pricing",KPI card
-Inventory Turnover,Inv Turnover,E-commerce,"COGS / Average Inventory","How often inventory sells and is replaced","Higher is better - varies by category","Days of Inventory, Stock-outs",KPI card
-Customer Satisfaction Score,CSAT,E-commerce,"Satisfied Responses / Total Responses * 100","Customer satisfaction with specific interaction","> 80% is good, > 90% is excellent","NPS, Reviews, Return Rate",Gauge
-Net Promoter Score,NPS,E-commerce,"% Promoters - % Detractors","Likelihood to recommend (-100 to +100)","> 50 is excellent, > 70 is world class","CSAT, Reviews, Retention",Gauge
-Gross Profit Margin,GPM,Finance,"(Revenue - COGS) / Revenue * 100","Profit after direct costs","Varies by industry - compare to peers","Net Margin, Operating Margin",KPI card
-Net Profit Margin,NPM,Finance,"Net Income / Revenue * 100","Profit after all expenses","Positive and growing","Gross Margin, Operating Expenses",KPI card
-Operating Margin,Op Margin,Finance,"Operating Income / Revenue * 100","Profit from core operations","Positive indicates viable business model","Gross Margin, SG&A",KPI card
-Return on Investment,ROI,Finance,"(Gain - Cost) / Cost * 100","Return generated from investment","> 0% means profitable investment","IRR, Payback Period",KPI card
-Return on Assets,ROA,Finance,"Net Income / Total Assets * 100","How efficiently assets generate profit","> 5% is generally good","ROE, Asset Turnover",KPI card
-Return on Equity,ROE,Finance,"Net Income / Shareholders Equity * 100","Return generated on shareholder investment","> 15% is considered good","ROA, Leverage Ratio",KPI card
-Current Ratio,Current Ratio,Finance,"Current Assets / Current Liabilities","Ability to pay short-term obligations","> 1.5 indicates healthy liquidity","Quick Ratio, Working Capital",KPI card
-Debt to Equity Ratio,D/E Ratio,Finance,"Total Debt / Shareholders Equity","Financial leverage - debt vs equity","< 2 is generally healthy","Leverage, Interest Coverage",KPI card
-Working Capital,Working Cap,Finance,"Current Assets - Current Liabilities","Operating liquidity available","Positive and growing","Cash Flow, Current Ratio",KPI card
-Cash Flow from Operations,CFO,Finance,"Net cash from core business operations","Cash generated by business","Positive and growing","Net Income, Free Cash Flow",Line chart
-Free Cash Flow,FCF,Finance,"CFO - Capital Expenditures","Cash available after investments","Positive for mature companies","CFO, CapEx, Dividends",Line chart
-Revenue Growth Rate,Rev Growth,Finance,"(Current Revenue - Prior Revenue) / Prior Revenue * 100","Rate of revenue increase","Depends on stage - 20%+ for growth companies","MoM, YoY, CAGR",Line chart
-Burn Rate,Burn Rate,Finance,"Monthly Operating Expenses - Monthly Revenue","Net cash consumed per month","Runway > 18 months is safe","Runway, Cash Balance",Line chart
-Runway,Runway,Finance,"Cash Balance / Burn Rate","Months of operations remaining","> 18 months for fundraising","Burn Rate, Cash Balance",KPI card
-Click-Through Rate,CTR,Marketing,"Clicks / Impressions * 100","Effectiveness of ad or content","2-5% for search ads, 0.5-1% for display","CPC, Conversion Rate",Line chart
-Cost Per Click,CPC,Marketing,"Ad Spend / Clicks","Cost for each ad click","Varies by industry and platform","CTR, CPA, Quality Score",KPI card
-Cost Per Acquisition,CPA,Marketing,"Marketing Spend / Conversions","Cost to acquire a customer or lead","Should be < customer value","CAC, ROAS, Conversion Rate",KPI card
-Return on Ad Spend,ROAS,Marketing,"Revenue from Ads / Ad Spend","Revenue generated per ad dollar","3:1 or higher is typically profitable","ROI, CPA, Conversion Rate",KPI card
-Cost Per Mille,CPM,Marketing,"(Ad Spend / Impressions) * 1000","Cost per thousand impressions","Varies by channel and targeting","CTR, Reach, Frequency",KPI card
-Conversion Rate,Conv Rate,Marketing,"Conversions / Total Visitors * 100","Percentage of visitors taking desired action","Varies by goal - track improvement","CTR, CPA, Landing Page Views",Funnel chart
-Lead to Customer Rate,Lead Conv,Marketing,"Customers / Leads * 100","Percentage of leads that become customers","10-20% for B2B SaaS","SQL Rate, Sales Cycle",Funnel chart
-Marketing Qualified Leads,MQLs,Marketing,"COUNT leads meeting marketing criteria","Leads ready for sales follow-up","Growing with stable conversion rate","SQLs, Lead Velocity",Line chart
-Sales Qualified Leads,SQLs,Marketing,"COUNT leads meeting sales criteria","Leads accepted by sales team","Conversion from MQL > 30%","MQLs, Opportunities, Win Rate",Line chart
-Email Open Rate,Open Rate,Marketing,"Opens / Emails Delivered * 100","Percentage of emails opened","15-25% is typical","CTR, Unsubscribe Rate",Line chart
-Email Click Rate,Email CTR,Marketing,"Clicks / Emails Delivered * 100","Percentage of emails clicked","2-5% is typical","Open Rate, Conversion Rate",Line chart
-Unsubscribe Rate,Unsub Rate,Marketing,"Unsubscribes / Emails Delivered * 100","Percentage of recipients unsubscribing","< 0.5% is healthy","List Growth, Complaint Rate",Line chart
-Social Engagement Rate,Engagement,Marketing,"(Likes + Comments + Shares) / Followers * 100","Interaction with social content","1-5% depending on platform","Reach, Impressions, Follower Growth",Bar chart
-Bounce Rate,Bounce Rate,Marketing,"Single-page Sessions / Total Sessions * 100","Visitors leaving without interaction","< 40% for content sites, < 60% for landing pages","Time on Site, Pages per Session",Line chart
-Pages per Session,Pages/Session,Marketing,"Total Pageviews / Sessions","Content engagement depth","> 2 indicates good engagement","Bounce Rate, Session Duration",KPI card
-Average Session Duration,Avg Session,Marketing,"Total Session Duration / Sessions","Time spent on site per visit","> 2 minutes for content sites","Bounce Rate, Pages per Session",KPI card
-Brand Awareness,Awareness,Marketing,"Survey-based or search volume","Percentage aware of brand","Track growth over time","Share of Voice, Brand Recall",Line chart
-Share of Voice,SOV,Marketing,"Brand Mentions / Total Category Mentions * 100","Brand visibility vs competitors","Growing share indicates market gains","Brand Awareness, Market Share",Pie chart
-Attribution Rate,Attribution,Marketing,"Attributed Conversions / Total Conversions * 100","Conversions trackable to marketing","Higher is better for optimization","Multi-touch Attribution",Stacked bar
-Customer Retention Rate,Retention,General,"(Customers End - New Customers) / Customers Start * 100","Percentage of customers retained","Depends on industry - 90%+ for SaaS","Churn Rate, LTV",Line chart
-Week 1 Retention,W1 Retention,General,"Users Active Week 1 / Signups * 100","Users returning after first week","25-40% for consumer apps","D1, D7, D30 Retention",Cohort heatmap
-Month 1 Retention,M1 Retention,General,"Users Active Month 1 / Signups * 100","Users returning after first month","20-30% for consumer apps","Week 1, Month 3 Retention",Cohort heatmap
-User Growth Rate,User Growth,General,"(Users End - Users Start) / Users Start * 100","Rate of user base expansion","Depends on stage","DAU, MAU, Signups",Line chart
-Feature Adoption Rate,Feature Adopt,General,"Users Using Feature / Total Users * 100","Uptake of specific feature","> 50% for core features","Activation, Engagement",Bar chart
-Time to Value,TTV,General,"Time from signup to first value moment","Speed of initial user success","Shorter is better - define value moment","Activation Rate, Onboarding Completion",Histogram
-Support Tickets per User,Tickets/User,General,"Total Tickets / Active Users","Support burden per user","Decreasing over time is good","CSAT, Resolution Time",Line chart
-Average Resolution Time,Avg Resolution,General,"Total Resolution Time / Tickets Resolved","Time to resolve support tickets","Depends on complexity - track trend","First Response Time, Ticket Volume",Line chart
-Employee Net Promoter Score,eNPS,General,"% Promoters - % Detractors","Employee satisfaction and loyalty","> 20 is good, > 50 is excellent","Turnover Rate, Engagement",Gauge
-Revenue per Employee,Rev/Employee,General,"Total Revenue / Number of Employees","Efficiency and productivity","$200K+ for SaaS","Headcount, Revenue Growth",KPI card
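
Two of the removed metrics.csv formulas are easy to misapply: Net Promoter Score and Customer Retention Rate. The sketch below assumes the usual 0-10 survey scale in which 9-10 counts as a promoter and 0-6 as a detractor. That bucketing is a common convention and is not stated in the CSV. Function names are illustrative.

```python
from typing import Iterable


def net_promoter_score(scores: Iterable[int]) -> float:
    """% Promoters - % Detractors, on a -100 to +100 scale.

    Assumption: 0-10 survey scale, promoters are 9-10, detractors are 0-6.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("at least one survey response is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return (promoters - detractors) / len(scores) * 100


def customer_retention_rate(customers_start: int, customers_end: int,
                            new_customers: int) -> float:
    """(Customers End - New Customers) / Customers Start * 100."""
    return (customers_end - new_customers) / customers_start * 100


if __name__ == "__main__":
    responses = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
    print(f"NPS: {net_promoter_score(responses):.0f}")
    print(f"Retention: {customer_retention_rate(200, 210, 30):.1f}%")
```

Subtracting new customers in the retention formula keeps customers acquired during the period from masking the ones who left.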

package/assets/.shared/data-analyst/python-patterns.csv DELETED

@@ -1,31 +0,0 @@
-Pattern Name,Use Case,pandas Code,polars Code,Performance
-Load CSV File,Read data from CSV file,"df = pd.read_csv('file.csv', parse_dates=['date_col'])","df = pl.read_csv('file.csv')","Use dtype parameter to reduce memory; usecols for subset"
-Load Excel File,Read data from Excel file,"df = pd.read_excel('file.xlsx', sheet_name='Sheet1')","df = pl.read_excel('file.xlsx')","Specify sheet_name; engine='openpyxl' for xlsx"
-Load Multiple CSVs,Combine CSVs from folder,"df = pd.concat([pd.read_csv(f) for f in glob.glob('*.csv')])","df = pl.concat([pl.read_csv(f) for f in glob.glob('*.csv')])","Use ignore_index=True to reset index"
-Database Connection,Connect to SQL database,"from sqlalchemy import create_engine; engine = create_engine('postgresql://...'); df = pd.read_sql(query, engine)","df = pl.read_database(query, connection_uri)","Use connection pooling for multiple queries"
-Filter Rows,Select rows matching condition,"df = df[df['status'] == 'active']; df = df[df['value'] > 100]","df = df.filter(pl.col('status') == 'active')","Chain filters or use & for multiple conditions"
-Select Columns,Choose specific columns,"df = df[['col1', 'col2', 'col3']]; df = df.drop(columns=['unwanted'])","df = df.select(['col1', 'col2', 'col3'])","Select early to reduce memory"
-Rename Columns,Change column names,"df = df.rename(columns={'old': 'new', 'old2': 'new2'})","df = df.rename({'old': 'new'})","Use dict for multiple renames"
-Sort Data,Order by column values,"df = df.sort_values(['col1', 'col2'], ascending=[True, False])","df = df.sort(['col1', 'col2'], descending=[False, True])","Sort after filtering for efficiency"
-Group By Aggregate,Aggregate data by groups,"df.groupby('category').agg({'value': 'sum', 'count': 'size'})","df.group_by('category').agg([pl.col('value').sum(), pl.len()])","Named aggregations: agg(total=('value', 'sum'))"
-Pivot Table,Create pivot table,"df.pivot_table(index='row', columns='col', values='value', aggfunc='sum', fill_value=0)","df.pivot(index='row', columns='col', values='value')","Use margins=True for totals"
-Melt Unpivot,Convert wide to long format,"df.melt(id_vars=['id'], value_vars=['col1', 'col2'], var_name='variable', value_name='value')","df.melt(id_vars=['id'])","Inverse of pivot operation"
-Join Merge,Combine two dataframes,"df = pd.merge(df1, df2, on='key', how='left')","df = df1.join(df2, on='key', how='left')","Validate: how='left'/'right'/'inner'/'outer'"
-Concatenate DataFrames,Stack dataframes vertically,"df = pd.concat([df1, df2], ignore_index=True)","df = pl.concat([df1, df2])","axis=0 for rows; axis=1 for columns"
-Apply Function,Transform values with function,"df['new'] = df['col'].apply(lambda x: x * 2)","df = df.with_columns(pl.col('col').map_elements(lambda x: x * 2))","Vectorized operations are faster than apply"
-Rolling Window,Calculate rolling statistics,"df['rolling_mean'] = df['value'].rolling(window=7).mean()","df.with_columns(pl.col('value').rolling_mean(7))","Specify min_periods for edge handling"
-Date Extraction,Extract date components,"df['year'] = df['date'].dt.year; df['month'] = df['date'].dt.month; df['weekday'] = df['date'].dt.dayofweek","df.with_columns(pl.col('date').dt.year().alias('year'))","dt accessor for date operations"
-Date Difference,Calculate days between dates,"df['days'] = (df['end_date'] - df['start_date']).dt.days","df.with_columns((pl.col('end_date') - pl.col('start_date')).dt.total_days())","Result is timedelta; use .days for integer"
-Lag/Lead Values,Get previous or next row values,"df['prev_value'] = df.groupby('id')['value'].shift(1)","df.with_columns(pl.col('value').shift(1).over('id'))","shift(-1) for next value (lead)"
-Cumulative Sum,Running total,"df['cumsum'] = df.groupby('category')['value'].cumsum()","df.with_columns(pl.col('value').cum_sum().over('category'))","Order matters - sort first if needed"
-Rank Values,Rank within groups,"df['rank'] = df.groupby('category')['value'].rank(ascending=False)","df.with_columns(pl.col('value').rank().over('category'))","method='min'/'dense'/'first' for tie handling"
-Percent of Total,Calculate percentage of group total,"df['pct'] = df['value'] / df.groupby('category')['value'].transform('sum')","df.with_columns((pl.col('value') / pl.col('value').sum().over('category')).alias('pct'))","transform applies group result back to rows"
-Fill Missing Forward,Forward fill nulls,"df['col'] = df['col'].fillna(method='ffill')","df.with_columns(pl.col('col').forward_fill())","bfill for backward fill"
-Replace Values,Map values to new values,"df['col'] = df['col'].replace({'old1': 'new1', 'old2': 'new2'})","df.with_columns(pl.col('col').replace({'old1': 'new1'}))","Use map for complex transformations"
-Binning Discretize,Convert continuous to categorical,"df['bin'] = pd.cut(df['value'], bins=[0, 10, 50, 100], labels=['low', 'med', 'high'])","df.with_columns(pl.col('value').cut([10, 50, 100]))","qcut for equal-frequency bins"
-One Hot Encoding,Convert categorical to dummies,"df = pd.get_dummies(df, columns=['category'], prefix='cat')","df.to_dummies(columns=['category'])","drop_first=True to avoid multicollinearity"
-Value Counts,Count occurrences of each value,"df['col'].value_counts(normalize=True)","df['col'].value_counts()","normalize=True for percentages"
-Describe Statistics,Summary statistics,"df.describe(include='all', percentiles=[.25, .5, .75])","df.describe()","include='all' for non-numeric columns"
-Correlation Matrix,Calculate correlations,"df.select_dtypes(include='number').corr()","df.select(pl.numeric_columns()).pearson_corr()","Use method='spearman' for non-linear"
-Cross Tabulation,Frequency table for two columns,"pd.crosstab(df['col1'], df['col2'], normalize='index')","N/A - use group_by and pivot","normalize='all'/'index'/'columns'"
-Sample Data,Random sample of rows,"df.sample(n=1000) or df.sample(frac=0.1)","df.sample(n=1000)","random_state for reproducibility"

package/assets/.shared/data-analyst/report-ux.csv DELETED

@@ -1,26 +0,0 @@
-Category,Guideline,Do,Don't,Example
-Layout,5-Second Rule,"Put most important insight at top-left where eyes land first","Bury key metrics at bottom of page","CEO should see revenue trend in 5 seconds without scrolling"
-Layout,Inverted Pyramid,"Structure: KPIs at top → Trends in middle → Details at bottom","Start with detailed tables; hide summary at bottom","Top row: 3-5 KPI cards; Middle: Line charts; Bottom: Detailed table"
-Layout,Visual Hierarchy,"Use size and position to indicate importance","Equal sizing for all elements; no focal point","Largest chart for most important metric; smaller for supporting"
-Layout,White Space,"Give elements room to breathe; avoid cramped layouts","Fill every pixel with data; no margins","Minimum 16px padding between dashboard sections"
-Layout,Grid System,"Align elements to consistent grid for clean appearance","Random positioning of elements","Use 12-column grid; align chart edges"
-Layout,Responsive Design,"Design for multiple screen sizes; test on mobile","Fixed-width layouts that break on small screens","Cards stack vertically on mobile; charts resize"
-Color,Consistent Meaning,"Use same colors for same meanings throughout","Red means growth in one place and decline in another","Red = negative/alert; Green = positive/growth everywhere"
-Color,Limit Palette,"Use 3-5 colors maximum per visualization","Rainbow of colors with no meaning","Primary brand color + 2-3 supporting colors"
-Color,Colorblind Safe,"Test with colorblind simulation; avoid red-green only","Rely solely on red vs green for meaning","Use patterns, labels, or blue-orange instead"
-Color,Sequential Palettes,"Use for continuous data showing magnitude","Categorical colors for numerical ranges","Light to dark blue for low to high values"
-Color,Diverging Palettes,"Use for data with meaningful midpoint (pos/neg)","Sequential palette for data diverging from center","Blue-white-red for profit/loss or sentiment"
-Color,Background Contrast,"Ensure sufficient contrast for readability","Light text on light background; low contrast","WCAG AA contrast ratio minimum (4.5:1 for text)"
-Typography,Hierarchy,"Use font size to establish content hierarchy","Same size for titles, labels, and values","Title: 24px; Subtitle: 16px; Body: 14px; Labels: 12px"
-Typography,Readability,"Choose readable fonts; limit to 2 families","Decorative fonts for data; too many font families","Sans-serif for data (Inter, Roboto); consistent weights"
-Typography,Number Formatting,"Format numbers for readability: 1.2M not 1234567","Raw unformatted numbers","$1.2M; 45.3%; 1,234 users"
-Typography,Axis Labels,"Label axes clearly; include units","Unlabeled axes; cryptic abbreviations","Revenue (USD, Millions) not just 'Rev'"
-Interactivity,Drill Down,"Let users click to explore underlying data","Force users to ask for details separately","Click bar to see breakdown by category"
-Interactivity,Filters,"Provide relevant filters; show active filter state","Too many filters; hidden filter state","Date range, region, segment filters clearly visible"
-Interactivity,Tooltips,"Show details on hover without cluttering view","Tooltips blocking other content","Hover shows: Date, Value, % Change, Comparison"
-Interactivity,Linked Views,"Connect related charts; filter one affects others","Isolated charts with no relationship","Clicking segment in pie filters line chart"
-Content,Title Everything,"Every chart needs a clear descriptive title","Untitled charts relying on context","Revenue by Region (Q4 2024) not just 'Revenue'"
-Content,Annotate Insights,"Highlight anomalies and key points","Let users discover insights alone","Arrow pointing to spike with explanation text"
-Content,Show Context,"Include comparison: vs target, last period, benchmark","Single number with no reference point","Revenue: $1.2M (↑ 15% YoY, 5% above target)"
-Content,Data Freshness,"Clearly show when data was last updated","Stale data without indication","Last updated: 2024-01-15 08:00 UTC"
-Content,Source Attribution,"Cite data source for credibility","Unknown data origin","Source: Sales Database, Marketing API"

package/assets/.shared/data-analyst/sql-patterns.csv DELETED

@@ -1,36 +0,0 @@
-Pattern Name,Use Case,SQL Code,PostgreSQL,BigQuery,Performance
-Running Total,"Cumulative sum over time","SUM(value) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)","Same","Same","Efficient with index on date column"
-Running Average,"Moving average over all prior rows","AVG(value) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)","Same","Same","Consider fixed window for performance"
-Rolling Window Average,"N-period moving average","AVG(value) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)","Same","Same","Fixed window more efficient than unbounded"
-Lag Previous Value,"Compare to previous row","LAG(value, 1) OVER (ORDER BY date)","Same","Same","Useful for period-over-period calculations"
-Lead Next Value,"Look ahead to next row","LEAD(value, 1) OVER (ORDER BY date)","Same","Same","Use for forward-looking comparisons"
-Year over Year,"Compare to same period last year","LAG(value, 12) OVER (ORDER BY month) for monthly; or self-join on date - INTERVAL '1 year'","Same; use date_trunc('year', date)","DATE_SUB(date, INTERVAL 1 YEAR)","Index on date; pre-aggregate to month level"
-Month over Month,"Compare to previous month","LAG(value, 1) OVER (ORDER BY month)","Same","Same","Pre-aggregate daily to monthly first"
-Percent Change,"Calculate growth rate","(value - LAG(value, 1) OVER (ORDER BY date)) / NULLIF(LAG(value, 1) OVER (ORDER BY date), 0) * 100","Same","Same","Handle divide by zero with NULLIF"
-Rank,"Rank rows by value","RANK() OVER (ORDER BY value DESC)","Same","Same","Gaps in ranking for ties"
-Dense Rank,"Rank without gaps","DENSE_RANK() OVER (ORDER BY value DESC)","Same","Same","No gaps - consecutive numbers"
-Row Number,"Unique row identifier","ROW_NUMBER() OVER (ORDER BY date)","Same","Same","Good for pagination"
-Percent Rank,"Percentile position","PERCENT_RANK() OVER (ORDER BY value)","Same","Same","Returns 0-1 scale"
-NTILE Buckets,"Divide into N equal groups","NTILE(4) OVER (ORDER BY value)","Same","Same","Useful for quartile analysis"
-First Value in Group,"Get first value per partition","FIRST_VALUE(value) OVER (PARTITION BY group ORDER BY date)","Same","Same","Useful for cohort first action"
-Last Value in Group,"Get last value per partition","LAST_VALUE(value) OVER (PARTITION BY group ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)","Same","Same","Must specify frame for last value"
-Deduplication,"Get latest record per entity","WITH ranked AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) as rn FROM table) SELECT * FROM ranked WHERE rn = 1","Same","Use QUALIFY instead: SELECT * FROM table QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1","Index on partition and order columns"
-Gap Fill Dates,"Fill missing dates in time series","Use generate_series to create date spine then LEFT JOIN","generate_series(start_date, end_date, '1 day'::interval)","GENERATE_DATE_ARRAY(start_date, end_date)","Generate date spine first, then join data"
-Cohort Definition,"Assign users to signup cohort","SELECT user_id, DATE_TRUNC('month', MIN(signup_date)) OVER (PARTITION BY user_id) as cohort FROM events","Same","DATE_TRUNC(signup_date, MONTH)","Calculate cohort once and store"
-Retention Cohort,"Calculate retention by cohort","WITH cohorts AS (...), activity AS (...) SELECT cohort, DATEDIFF(activity_month, cohort) as period, COUNT(DISTINCT user_id)","Same; use date_part('month', age(...))","DATE_DIFF(activity_date, cohort_date, MONTH)","Pre-compute user cohorts for efficiency"
-Funnel Sequential,"Ensure funnel steps happen in order","WITH step1 AS (...), step2 AS (... WHERE step2_time > step1_time) SELECT ...","Same","Same","Index on user_id and timestamp"
-Funnel Conversion,"Count users at each funnel step","SELECT 'Step1' as step, COUNT(DISTINCT user_id) UNION ALL SELECT 'Step2', COUNT(DISTINCT CASE WHEN completed_step2 THEN user_id END)","Same","Same","One pass aggregation is efficient"
-Sessionization,"Group events into sessions by gap","SUM(CASE WHEN time_since_last > 30 THEN 1 ELSE 0 END) OVER (PARTITION BY user ORDER BY timestamp) as session_id","Same","Same","30 minute gap is common default"
-Pivot Dynamic,"Pivot rows to columns dynamically","Use CASE WHEN for known values or crosstab() extension","crosstab() function from tablefunc","PIVOT operator available","Static CASE WHEN is more portable"
-Unpivot,"Convert columns to rows","Use UNION ALL for each column or UNPIVOT keyword","UNNEST with ARRAY","UNPIVOT operator","UNION ALL works everywhere"
-Self Join for Pairs,"Find related records","SELECT a.*, b.* FROM table a JOIN table b ON a.user_id = b.user_id AND a.id < b.id","Same","Same","Use a.id < b.id to avoid duplicates"
-Recursive CTE,"Hierarchical data traversal","WITH RECURSIVE cte AS (base UNION ALL recursive) SELECT * FROM cte","Same","Does not support - use CONNECT BY or flatten","Limit recursion depth for safety"
-Anti Join,"Find records NOT in another table","SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE a.id = b.id)","Same; also LEFT JOIN WHERE b.id IS NULL","Same","NOT EXISTS often most efficient"
-Conditional Aggregation,"Aggregate with conditions","SUM(CASE WHEN status = 'active' THEN amount ELSE 0 END)","Same; also FILTER clause: SUM(amount) FILTER (WHERE status = 'active')","COUNTIF, SUMIF available","CASE WHEN is most portable"
-Distinct Count Per Group,"Count distinct within groups","COUNT(DISTINCT user_id) OVER (PARTITION BY category)","Same","Same; also APPROX_COUNT_DISTINCT for estimates","Expensive - consider HyperLogLog"
-Median Calculation,"Find median value","PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value)","Same","APPROX_QUANTILES(value, 100)[OFFSET(50)]","Exact median is expensive; approximate is faster"
-Mode Calculation,"Find most frequent value","SELECT value, COUNT(*) as cnt FROM table GROUP BY value ORDER BY cnt DESC LIMIT 1","Also: mode() WITHIN GROUP (ORDER BY value)","APPROX_TOP_COUNT for approximate","Order by count descending, limit 1"
-Time Bucket,"Group timestamps into buckets","DATE_TRUNC('hour', timestamp)","date_trunc('hour', ts)","TIMESTAMP_TRUNC(ts, HOUR)","Reduces granularity for aggregation"
-Date Spine Join,"Ensure all dates present","SELECT d.date, COALESCE(t.value, 0) FROM date_spine d LEFT JOIN table t ON d.date = t.date","generate_series for date spine","GENERATE_DATE_ARRAY","Create date dimension table"
-Weighted Average,"Calculate weighted average","SUM(value * weight) / NULLIF(SUM(weight), 0)","Same","Same","Handle zero weight with NULLIF"
-Compound Growth Rate,"Calculate CAGR","POWER(end_value / start_value, 1.0 / years) - 1","Same; use POWER() function","POWER() function","Need start, end, and period count"
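
Three of the removed patterns (Running Total, Percent Change, Deduplication) can be tried against an in-memory SQLite database, as sketched below. This assumes a Python build whose bundled SQLite supports window functions (3.25 or later). The table names, column names, and sample rows are hypothetical. SQLite is not one of the dialects the CSV covers, so this only illustrates the generic "SQL Code" column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly_revenue (month TEXT, value REAL);
INSERT INTO monthly_revenue VALUES
  ('2024-01', 100.0), ('2024-02', 0.0), ('2024-03', 150.0), ('2024-04', 120.0);

CREATE TABLE customers (id INTEGER, name TEXT, updated_at TEXT);
INSERT INTO customers VALUES
  (1, 'old', '2024-01-01'), (1, 'new', '2024-03-01'), (2, 'only', '2024-02-01');
""")

# Running Total and Percent Change; NULLIF turns a zero denominator into NULL.
trend = conn.execute("""
SELECT month,
       value,
       SUM(value) OVER (ORDER BY month ROWS UNBOUNDED PRECEDING) AS running_total,
       (value - LAG(value, 1) OVER (ORDER BY month))
         / NULLIF(LAG(value, 1) OVER (ORDER BY month), 0) * 100 AS pct_change
FROM monthly_revenue
ORDER BY month
""").fetchall()

# Deduplication: keep the most recently updated row per id.
latest = conn.execute("""
WITH ranked AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
  FROM customers
)
SELECT id, name FROM ranked WHERE rn = 1 ORDER BY id
""").fetchall()

for row in trend:
    print(row)
print(latest)
```

The March row has a NULL percent change because February's value is zero, which is the divide-by-zero case the Percent Change row handles with NULLIF.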

package/assets/.shared/data-analyst/validation.csv DELETED

@@ -1,21 +0,0 @@
-Mistake Type,Description,Symptoms,Prevention Query,User Question
-Duplicate Inflation,Counting same record multiple times due to duplicates or join multiplication,"Total much higher than expected; sum doesn't match source","SELECT id, COUNT(*) as cnt FROM table GROUP BY id HAVING COUNT(*) > 1","Does the total of X seem reasonable compared to other reports?"
-Wrong Granularity,Aggregating at wrong level (user vs session vs event),"Numbers don't match other reports; unexpected row counts","SELECT COUNT(*), COUNT(DISTINCT user_id), COUNT(DISTINCT session_id) FROM table","Is this data one row per user, per session, or per event?"
-Missing Filter,Forgot to exclude test users, internal accounts, or invalid data,"Numbers include test data; higher than expected","SELECT COUNT(*) FROM users WHERE email LIKE '%@company.com' OR email LIKE '%test%'","Should we exclude internal/test users? Any known filters?"
-Timezone Mismatch,Comparing dates in different timezones causing misalignment,"Day totals don't match other reports; off-by-one errors","SELECT DISTINCT date_trunc('day', ts AT TIME ZONE 'UTC') vs AT TIME ZONE 'PST'","What timezone should I use for date calculations?"
-Survivorship Bias,Only analyzing users who completed journey ignoring dropoffs,"Metrics look too good; missing failed attempts","Check: are we only looking at users who converted?","Are we analyzing all users or only those who [completed action]?"
-Simpson's Paradox,Aggregate trend opposite of subgroup trends,"Conflicting conclusions; unexpected direction","Compare aggregate vs segment-level trends","Should we break this down by [segment] to check for hidden patterns?"
-Incomplete Time Period,Comparing full period to partial period,"Latest period looks lower than historical","Check if latest period has full data: WHERE date < current_date","Is the latest time period complete, or should we exclude it?"
-Wrong Join Type,Using INNER when LEFT needed or vice versa,"Missing rows; unexpected nulls; row count changes","Compare row counts before and after join","The join produced X rows from Y original rows. Does this match expectation?"
-Null Handling Errors,NULLs excluded from aggregations unexpectedly,"Lower counts than expected; divisions by zero","SELECT COUNT(*), COUNT(column), SUM(CASE WHEN column IS NULL THEN 1 END)","How should we handle missing/null values in this analysis?"
-Off-by-One Date Errors,BETWEEN includes endpoints; wrong date boundary,"One extra or missing day; period mismatch","Check: date >= start AND date < end (exclusive end)","Should the date range include or exclude the end date?"
-Metric Definition Mismatch,Using different definition than stakeholder expects,"Numbers don't match expectations; confusion","Document exact definition before starting","How does your team define [metric]? What's included/excluded?"
-Currency Unit Confusion,Mixing dollars and cents or different currencies,"Numbers off by factor of 100 or exchange rate","Check: are amounts in dollars or cents? One currency?","Are these amounts in dollars or cents? Same currency throughout?"
-Seasonality Ignored,Comparing periods with different seasonal patterns,"Invalid conclusions; unfair comparisons","Compare same period last year, not sequential periods","Should we compare to same period last year to account for seasonality?"
-Selection Bias,Analyzing non-representative sample,"Conclusions don't generalize; biased insights","Check how sample was selected; compare to population","Is this sample representative of all users, or a specific subset?"
-Correlation vs Causation,Claiming causation from correlation,"Incorrect business recommendations","Check: is there a plausible mechanism? Control for confounders?","Does X actually cause Y, or are they just correlated?"
-Cherry Picking Dates,Choosing date range that shows desired narrative,"Misleading conclusions; not reproducible","Use standard reporting periods; document why dates chosen","Why was this specific date range chosen?"
-Aggregation Level Mismatch,Comparing metrics calculated at different levels,"Apples to oranges comparison; invalid conclusions","Ensure both metrics use same denominator/level","Are both these metrics calculated the same way (same level)?"
-Data Latency Issues,Using stale data that hasn't propagated fully,"Recent periods look incomplete; inconsistent","Check data freshness: MAX(updated_at), pipeline completion","Is this data fully loaded? When was it last updated?"
-Calculation Errors,Wrong formula for complex metrics,"Metrics don't match known correct values","Validate against known correct calculation or source","Can we validate this against another source or manual calculation?"
-Presentation Bias,Chart design exaggerating or hiding patterns,"Misleading visualizations; wrong conclusions","Check: y-axis starts at zero? Scale appropriate?","Does this chart accurately represent the data without distortion?"
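
Several of the prevention queries in the removed validation.csv (Duplicate Inflation, Null Handling Errors, Off-by-One Date Errors) can be exercised together, as sketched below using the standard-library sqlite3 module. The `orders` table and its rows are hypothetical and were made up to trigger each check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, amount REAL, order_date TEXT);
INSERT INTO orders VALUES
  (1, 10.0, '2024-01-01'),
  (2, 20.0, '2024-01-31'),
  (2, 20.0, '2024-01-31'),   -- duplicated id
  (3, NULL, '2024-02-01');   -- missing amount, first day of the next month
""")

# Duplicate Inflation: ids that appear more than once.
duplicates = conn.execute(
    "SELECT id, COUNT(*) AS cnt FROM orders GROUP BY id HAVING COUNT(*) > 1"
).fetchall()

# Null Handling Errors: COUNT(*) counts rows, COUNT(amount) skips NULLs.
total_rows, non_null, null_count = conn.execute(
    "SELECT COUNT(*), COUNT(amount), "
    "SUM(CASE WHEN amount IS NULL THEN 1 END) FROM orders"
).fetchone()

# Off-by-One Date Errors: an exclusive end date versus BETWEEN, which
# includes both endpoints. ISO-8601 text dates compare correctly as strings.
half_open = conn.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01'"
).fetchone()[0]
inclusive = conn.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE order_date BETWEEN '2024-01-01' AND '2024-02-01'"
).fetchone()[0]

print(duplicates, (total_rows, non_null, null_count), half_open, inclusive)
```

The BETWEEN form counts the February 1 row as part of January, which is the "one extra day" symptom that row describes.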

package/assets/.shared/data-analyst/workflows.csv DELETED

@@ -1,55 +0,0 @@
-Workflow Name,Step Number,Step Name,Description,Questions to Ask,Tools/Commands,Outputs,Common Mistakes
-Exploratory Data Analysis,1,Define Objectives,Understand what insights are needed,"What business questions should this EDA answer? Who is the primary audience for these findings?","None - conversation with stakeholder","Clear list of questions to answer","Starting analysis without clear goals"
-Exploratory Data Analysis,2,Data Profiling,Understand data structure shape and types,"How many rows do you expect? What date range should I focus on?","df.info(), df.describe(), df.isnull().sum(), df.dtypes","Save profiling attributes to reports/profiling_report.md","Skipping profiling and diving straight into analysis"
-Exploratory Data Analysis,3,Data Cleaning,Handle missing/duplicates and fix types,"Are there known quality issues? Should we impute or drop?","df.fillna(), df.drop_duplicates(), df.astype(); Save script to etl/ folder","Cleaned dataset ready for analysis","Skipping cleaning leads to wrong distributions"
-Exploratory Data Analysis,4,Univariate Analysis,Analyze individual columns distributions,"Are there any columns I should focus on specifically?","df.hist(), df.value_counts(), df.describe()","Histograms and value distributions for key columns","Not checking for outliers or unexpected values"
-Exploratory Data Analysis,5,Bivariate Analysis,Relationships between variables,"Which relationships are most important to understand?","df.corr(), scatter plots, grouped statistics","Correlation matrix and scatter plots showing relationships","Missing important correlations by not testing all pairs"
-Exploratory Data Analysis,6,Document Findings,Summarize insights,"What format do you prefer for the findings summary?","Markdown report generation","Save summary report to reports/insights.md","Not prioritizing findings by business impact"
-Dashboard Creation,1,Define Audience,Who will use the dashboard and for what purpose,"Is this for executives (high-level KPIs) or analysts (detailed breakdowns)? How often will they view it?","None - conversation","Clear audience definition and use case","Building for wrong audience (too detailed for execs or too simple for analysts)"
-Dashboard Creation,2,Identify KPIs,What metrics matter most to track,"What are your top 5-7 metrics? Do you have targets for each?","Search industry metrics database","Prioritized list of KPIs with targets","Too many metrics (7+ KPIs causes cognitive overload)"
-Dashboard Creation,3,Data Preparation,Get data into usable format,"Which tables contain this data? What granularity (daily/weekly)?","SQL queries, pandas transformations","Clean aggregated data ready for visualization","Not validating data before visualization"
-Dashboard Creation,4,Chart Selection,Choose appropriate visualizations,"Any chart preferences? Need to support mobile viewing?","Search chart database","Chart type selected for each KPI","Using pie charts for more than 5 categories"
-Dashboard Creation,5,Layout Design,Arrange components following best practices,"Should I follow inverted pyramid (KPIs top trends middle details bottom)?","Dashboard layout template","Final dashboard layout","Burying key insights at bottom of page"
-A/B Test Analysis,1,Define Hypothesis,What are we testing and what do we expect,"What is the primary metric? What is the minimum detectable effect you care about?","None - conversation","Clear null and alternative hypothesis documented","Not defining success criteria upfront"
-A/B Test Analysis,2,Check Sample Size,Sufficient data for statistical significance,"How long has the test been running? What is baseline conversion rate?","Power analysis calculator","Required vs actual sample size comparison","Stopping test too early (peeking problem)"
-A/B Test Analysis,3,Validation Checks,Data quality and test validity,"Were users randomly assigned? Any known issues with test setup?","SRM check, novelty effect detection","Test validity report","Ignoring Sample Ratio Mismatch (SRM)"
-A/B Test Analysis,4,Statistical Analysis,Calculate significance and effect size,"What confidence level is required (95% or 99%)?","t-test, chi-square, confidence interval calculation","P-value, confidence interval, effect size","Not accounting for multiple comparisons"
-A/B Test Analysis,5,Interpret Results,What does this mean for business,"Should we roll out, iterate, or abandon based on results?","Business impact calculation","Actionable recommendation with expected impact","Declaring winner without considering practical significance"
-Cohort Analysis,1,Define Cohort,How to group users for analysis,"Should I cohort by signup date, first purchase, or another event?","None - conversation","Cohort definition documented","Using wrong cohort definition for the question"
-Cohort Analysis,2,Define Metric,What to measure over time,"Should I track retention, revenue, or activity? Over what time periods?","None - conversation","Metric and time periods defined","Measuring wrong metric for the business question"
-Cohort Analysis,3,Data QC & Prep,Clean data and handle nulls before grouping,"Are user IDs unique? Any null dates?","check_duplicates(), handled nulls; Save script to etl/ folder","Clean cohort/event dataset","Including null dates in cohort groups"
-Cohort Analysis,4,Build Cohort Table,SQL for cohort pivot table,"Is there a specific date range to analyze?","SQL with window functions, pivot tables","Cohort table with periods as columns","Off-by-one errors in period calculations"
-Cohort Analysis,5,Visualize,Create retention heatmap,"Any specific cohorts to highlight?","Heatmap visualization","Color-coded retention heatmap","Using colors that don't show progression clearly"
-Cohort Analysis,6,Insights,Identify patterns and explain why,"Which cohorts performed best/worst?","Comparative analysis","Insights report with recommended actions","Not investigating WHY cohorts differ"
-Funnel Analysis,1,Define Steps,What are the funnel stages in order,"What is the first step? What is the final conversion event?","None - conversation","Ordered list of funnel steps","Missing steps or having steps out of order"
-Funnel Analysis,2,Data QC & Prep,Ensure event data is clean,"Are event names standardized? Any missing sessions?","Mapping event names, removing bots; Save script to etl/ folder","Clean event stream","Counting duplicate events"
-Funnel Analysis,3,Count Users,How many users at each step,"What time window should I use for the funnel?","SQL to count distinct users per step","User counts at each stage","Counting sessions instead of unique users"
-Funnel Analysis,4,Calculate Drop-off,Where are users leaving,"Are there any known issues at specific steps?","Conversion rate between steps","Drop-off rates between each step","Comparing non-sequential steps"
-Funnel Analysis,5,Visualize,Create funnel chart,"Prefer horizontal bars or funnel shape?","Funnel or horizontal bar visualization","Funnel visualization","Not labeling percentages clearly"
-Funnel Analysis,6,Recommendations,How to improve conversion,"What lever do you have to improve each step?","Analysis of biggest opportunities","Prioritized list of improvement suggestions","Focusing on small improvements instead of biggest drop-offs"
-Time Series Analysis,1,Define Metric,What to analyze over time,"Daily revenue, weekly users, or monthly orders? How far back?","None - conversation","Metric and time range defined","Wrong granularity (too granular hides trends or too aggregated misses patterns)"
-Time Series Analysis,2,Data QC & Prep,Handle gaps and outliers in time series,"Missing dates? Extreme outliers?","Resampling, outlier detection; Save script to etl/ folder","Clean time series","Interpolating missing data incorrectly"
-Time Series Analysis,3,Aggregate,Group by time period,"Any specific date filters? Exclude weekends?","SQL with DATE_TRUNC, GROUP BY","Aggregated time series data","Timezone issues in date grouping"
-Time Series Analysis,4,Decompose,Identify trend seasonality residual,"Is there known seasonality (weekly/monthly/yearly)?","Seasonal decomposition, moving averages","Decomposed components visualization","Ignoring seasonality when comparing periods"
-Time Series Analysis,5,Compare Periods,YoY MoM WoW comparisons,"Which comparison periods matter most?","LAG functions, period-over-period calculations","Comparison table with growth rates","Comparing incomplete periods"
-Time Series Analysis,6,Forecast (optional),Predict future values,"Do you need forecasting? What horizon?","Simple forecasting models","Forecast with confidence intervals","Overfitting on historical data"
-Customer Segmentation,1,Define Variables,What to segment on,"RFM, behavior, or demographics? What actions should differ by segment?","None - conversation","Segmentation variables defined","Choosing variables that don't drive different actions"
-Customer Segmentation,2,Feature Engineering,Calculate segment variables,"What time window for calculating features?","SQL or Python for RFM or other features","Feature table ready for segmentation","Using raw values instead of normalized scores"
-Customer Segmentation,3,Clustering,Group similar customers,"How many segments should we create?","K-means or rule-based segmentation","Cluster assignments","Too many or too few segments"
-Customer Segmentation,4,Profile Segments,Describe each group characteristics,"Which metrics matter most for describing segments?","Aggregate statistics per segment","Segment profile table","Not validating segments are actionable"
-Customer Segmentation,5,Actionable Names,Name the segments memorably,"Any naming conventions to follow?","Creative naming","Named segments (e.g., Champions, At Risk)","Generic names that don't inspire action"
-Data Cleaning Pipeline,1,Profiling,Understand data quality issues,"What quality issues are you aware of?","df.isnull().sum(), df.duplicated().sum()","Data quality report","Assuming data is clean without checking"
-Data Cleaning Pipeline,2,Missing Values,Handle nulls appropriately,"Can I drop rows with missing data or should I impute?","fillna(), dropna(), imputation strategies","Data with handled missing values","Using wrong imputation strategy (mean for skewed data)"
-Data Cleaning Pipeline,3,Duplicates,Remove redundant rows,"What makes a row a duplicate (exact match or by key)?","drop_duplicates(), deduplication logic","Deduplicated data","Removing wrong duplicates (losing valid data)"
-Data Cleaning Pipeline,4,Outliers,Handle extreme values,"Should outliers be removed, capped, or kept?","IQR, Z-score detection, capping","Data with handled outliers","Removing outliers that are valid data points"
-Data Cleaning Pipeline,5,Validation,Verify clean data meets expectations,"What validation checks should pass?","Assertions, before/after comparison","Validation report confirming data quality","Not comparing before/after statistics"
-Ad-hoc Query Analysis,1,Clarify Question,What exactly do they need,"Can you give me an example of the desired output format?","None - conversation","Clear requirements documented","Assuming you understand without confirming"
-Ad-hoc Query Analysis,2,Identify Tables,Where is the data located,"Which database/schema/tables contain this data? Any documentation?","Schema exploration, data dictionary","Table and column mapping","Joining wrong tables or using outdated sources"
-Ad-hoc Query Analysis,3,Write Query,Draft SQL or Python code,"None - writing code","SQL or Python script","Working query with explanation","Not explaining the logic behind complex queries"
-Ad-hoc Query Analysis,4,Validate,Check results make sense,"Does the output look correct? Check this sample.","Sample verification, total checks","Validated results","Delivering without sanity checking totals"
-Ad-hoc Query Analysis,5,Iterate,Refine based on feedback,"Does this answer your question? Any adjustments needed?","Query modifications","Final refined query and results","Not iterating when initial results are wrong"
-KPI Reporting,1,Define KPIs,Which metrics to report,"What are the most important KPIs for this report? Any targets?","Search industry metrics","Selected KPIs with targets","Too many KPIs dilute focus"
-KPI Reporting,2,Calculate,Compute current values,"What time period should I calculate for?","SQL for each KPI calculation","Current KPI values","Calculation errors in complex KPIs"
-KPI Reporting,3,Compare,vs previous period or target,"Compare to last period, last year, or target?","YoY, MoM, vs goal calculations","Comparison table with deltas","Comparing to wrong baseline"
-KPI Reporting,4,Format,Create readable report,"Prefer table, cards, or dashboard format?","Report formatting","Formatted KPI report","Poor formatting reduces readability"
-KPI Reporting,5,Highlight,What needs attention,"What threshold triggers a red flag?","Conditional formatting, alerts","Highlighted issues needing action","Not drawing attention to problems"
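
The A/B Test Analysis workflow above lists an "SRM check" under Validation Checks without saying how to run one. One common approach is a chi-square goodness-of-fit test on the assignment counts, sketched below with the standard library only. The function name, the 50/50 default split, and the 0.001 alert threshold are assumptions made for illustration and are not taken from the package.

```python
import math


def srm_p_value(control_n: int, treatment_n: int,
                expected_control_share: float = 0.5) -> float:
    """P-value of a chi-square goodness-of-fit test with 1 degree of freedom.

    For 1 degree of freedom the chi-square survival function reduces to
    erfc(sqrt(x / 2)), so no external statistics library is needed.
    """
    total = control_n + treatment_n
    expected = (total * expected_control_share,
                total * (1 - expected_control_share))
    observed = (control_n, treatment_n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))


ALERT_THRESHOLD = 0.001  # assumed cut-off; teams choose their own

for control, treatment in ((5000, 5000), (5200, 4800)):
    p = srm_p_value(control, treatment)
    status = "possible SRM" if p < ALERT_THRESHOLD else "ok"
    print(control, treatment, f"p={p:.6f}", status)
```

A very small p-value suggests that assignment did not follow the intended split, in which case the test setup deserves investigation before the metric results are interpreted.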