PyPI - churnkit - Versions diffs - 0.75.0a1__py3-none-any.whl - Mend

churnkit 0.75.0a1__py3-none-any.whl

Files changed (302)

churnkit-0.75.0a1.data/data/share/churnkit/exploration_notebooks/01b_temporal_quality.ipynb ADDED

@@ -0,0 +1,679 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "cell-0",
+   "metadata": {
+    "papermill": {
+     "duration": 0.002487,
+     "end_time": "2026-02-02T13:00:52.603186",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:52.600699",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "# Chapter 1b: Temporal Quality Assessment (Event Bronze Track)\n",
+    "\n",
+    "**Purpose:** Run quality checks specific to event-level datasets to identify data issues before feature engineering.\n",
+    "\n",
+    "**When to use this notebook:**\n",
+    "- You have completed 01a_temporal_deep_dive.ipynb\n",
+    "- Your dataset has EVENT_LEVEL granularity\n",
+    "- You want to validate temporal data integrity before aggregation\n",
+    "\n",
+    "| Check | What It Detects | Why It Matters for ML |\n",
+    "|-------|-----------------|----------------------|\n",
+    "| **TQ001** | Duplicate events (same entity + timestamp) | Inflates counts, skews aggregations, creates artificial sequence patterns |\n",
+    "| **TQ002** | Unexpected temporal gaps | Rolling features become misleading; \"events in last 30d\" drops during gaps |\n",
+    "| **TQ003** | Future dates | Data leakage — model sees future during training |\n",
+    "| **TQ004** | Ambiguous event ordering | Sequence features are undefined when multiple events share a timestamp |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-1",
+   "metadata": {
+    "papermill": {
+     "duration": 0.00158,
+     "end_time": "2026-02-02T13:00:52.606755",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:52.605175",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "## 1b.1 Load Findings and Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-2",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-02-02T13:00:52.611011Z",
+     "iopub.status.busy": "2026-02-02T13:00:52.610876Z",
+     "iopub.status.idle": "2026-02-02T13:00:54.504186Z",
+     "shell.execute_reply": "2026-02-02T13:00:54.503492Z"
+    },
+    "papermill": {
+     "duration": 1.896686,
+     "end_time": "2026-02-02T13:00:54.505137",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:52.608451",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from customer_retention.analysis.notebook_progress import track_and_export_previous\n",
+    "track_and_export_previous(\"01b_temporal_quality.ipynb\")\n",
+    "\n",
+    "from pathlib import Path\n",
+    "import pandas as pd\n",
+    "import plotly.graph_objects as go\n",
+    "from plotly.subplots import make_subplots\n",
+    "\n",
+    "from customer_retention.analysis.auto_explorer import ExplorationFindings, RecommendationEngine\n",
+    "from customer_retention.analysis.visualization import ChartBuilder, display_figure\n",
+    "from customer_retention.core.config.column_config import ColumnType\n",
+    "from customer_retention.stages.profiling import (\n",
+    "    DuplicateEventCheck, TemporalGapCheck, FutureDateCheck, EventOrderCheck,\n",
+    "    TemporalQualityReporter, SegmentAwareOutlierAnalyzer\n",
+    ")\n",
+    "from customer_retention.stages.temporal import load_data_with_snapshot_preference, TEMPORAL_METADATA_COLS\n",
+    "from customer_retention.core.config.experiments import FINDINGS_DIR, EXPERIMENTS_DIR, OUTPUT_DIR, setup_experiments_structure\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-3",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-02-02T13:00:54.509631Z",
+     "iopub.status.busy": "2026-02-02T13:00:54.509304Z",
+     "iopub.status.idle": "2026-02-02T13:00:55.198236Z",
+     "shell.execute_reply": "2026-02-02T13:00:55.197651Z"
+    },
+    "papermill": {
+     "duration": 0.692154,
+     "end_time": "2026-02-02T13:00:55.199216",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:54.507062",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# FINDINGS_DIR imported from customer_retention.core.config.experiments\n",
+    "findings_files = sorted(\n",
+    "    [f for f in FINDINGS_DIR.glob(\"*_findings.yaml\") if \"multi_dataset\" not in f.name],\n",
+    "    key=lambda f: f.stat().st_mtime, reverse=True\n",
+    ")\n",
+    "if not findings_files:\n",
+    "    raise FileNotFoundError(f\"No findings in {FINDINGS_DIR}. Run notebook 01 first.\")\n",
+    "\n",
+    "FINDINGS_PATH = str(findings_files[0])\n",
+    "findings = ExplorationFindings.load(FINDINGS_PATH)\n",
+    "print(f\"Using: {FINDINGS_PATH}\")\n",
+    "\n",
+    "ts_meta = findings.time_series_metadata\n",
+    "ENTITY_COLUMN, TIME_COLUMN = ts_meta.entity_column, ts_meta.time_column\n",
+    "print(f\"Entity: {ENTITY_COLUMN}, Time: {TIME_COLUMN}\")\n",
+    "\n",
+    "df, data_source = load_data_with_snapshot_preference(findings, output_dir=str(FINDINGS_DIR))\n",
+    "charts = ChartBuilder()\n",
+    "print(f\"Loaded {len(df):,} rows ({data_source})\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-4",
+   "metadata": {
+    "papermill": {
+     "duration": 0.0014,
+     "end_time": "2026-02-02T13:00:55.202447",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:55.201047",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "## 1b.2 Configure Quality Checks"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-5",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-02-02T13:00:55.206889Z",
+     "iopub.status.busy": "2026-02-02T13:00:55.206640Z",
+     "iopub.status.idle": "2026-02-02T13:00:55.209364Z",
+     "shell.execute_reply": "2026-02-02T13:00:55.208747Z"
+    },
+    "papermill": {
+     "duration": 0.005584,
+     "end_time": "2026-02-02T13:00:55.209814",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:55.204230",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
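+    "REFERENCE_DATE = pd.Timestamp.now()  # cutoff for the future-date check; pin it for reproducible runs, e.g. pd.Timestamp(\"2024-01-01\")\n",
+    "EXPECTED_FREQUENCY = \"D\"  # D=daily, W=weekly, M=monthly, H=hourly\n",
+    "MAX_GAP_MULTIPLE = 3.0  # gap threshold, as a multiple of the expected frequency\n",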
+    "\n",
+    "print(f\"Reference: {REFERENCE_DATE.date()}, Frequency: {EXPECTED_FREQUENCY}, Gap threshold: {MAX_GAP_MULTIPLE}x\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-6",
+   "metadata": {
+    "papermill": {
+     "duration": 0.001492,
+     "end_time": "2026-02-02T13:00:55.213170",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:55.211678",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "## 1b.3 Run Temporal Quality Checks\n",
+    "\n",
+    "| Issue Type | ML Impact | Mitigation |\n",
+    "|------------|-----------|------------|\n",
+    "| Duplicates | Sum/count features inflated; artificial patterns in sequences | Deduplicate or add sequence index |\n",
+    "| Gaps | Rolling aggregations drop; recency features spike | Document gaps; add gap indicator feature |\n",
+    "| Future dates | Model trains on leaked future info | Filter to reference date; check timezone handling |\n",
+    "| Ordering | \"Previous event\" features undefined | Add tiebreaker column; use stable sort |"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-7",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-02-02T13:00:55.217026Z",
+     "iopub.status.busy": "2026-02-02T13:00:55.216922Z",
+     "iopub.status.idle": "2026-02-02T13:00:55.267263Z",
+     "shell.execute_reply": "2026-02-02T13:00:55.266748Z"
+    },
+    "papermill": {
+     "duration": 0.053177,
+     "end_time": "2026-02-02T13:00:55.267824",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:55.214647",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "checks = [\n",
+    "    DuplicateEventCheck(entity_column=ENTITY_COLUMN, time_column=TIME_COLUMN),\n",
+    "    TemporalGapCheck(time_column=TIME_COLUMN, expected_frequency=EXPECTED_FREQUENCY, max_gap_multiple=MAX_GAP_MULTIPLE),\n",
+    "    FutureDateCheck(time_column=TIME_COLUMN, reference_date=REFERENCE_DATE),\n",
+    "    EventOrderCheck(entity_column=ENTITY_COLUMN, time_column=TIME_COLUMN),\n",
+    "]\n",
+    "results = [check.run(df) for check in checks]\n",
+    "reporter = TemporalQualityReporter(results, len(df))\n",
+    "reporter.print_results()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-8",
+   "metadata": {
+    "papermill": {
+     "duration": 0.001479,
+     "end_time": "2026-02-02T13:00:55.270977",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:55.269498",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "## 1b.4 Quality Score\n",
+    "\n",
+    "Each of the four checks contributes 25% of the overall score. A check scores 100 when it finds no issues; otherwise points are deducted in proportion to the percentage affected.\n",
+    "\n",
+    "| Grade | Score | Action |\n",
+    "|-------|-------|--------|\n",
+    "| A | 90-100 | Proceed with confidence |\n",
+    "| B | 75-89 | Document issues, proceed with caution |\n",
+    "| C | 60-74 | Address issues before feature engineering |\n",
+    "| D | <60 | Investigation required |"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-9",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-02-02T13:00:55.275294Z",
+     "iopub.status.busy": "2026-02-02T13:00:55.275192Z",
+     "iopub.status.idle": "2026-02-02T13:00:55.277551Z",
+     "shell.execute_reply": "2026-02-02T13:00:55.276851Z"
+    },
+    "papermill": {
+     "duration": 0.005227,
+     "end_time": "2026-02-02T13:00:55.278101",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:55.272874",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "reporter.print_score()\n",
+    "quality_score, grade, passed = reporter.quality_score, reporter.grade, reporter.passed"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-10",
+   "metadata": {
+    "papermill": {
+     "duration": 0.001615,
+     "end_time": "2026-02-02T13:00:55.281714",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:55.280099",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "## 1b.5 Event Volume Analysis\n",
+    "\n",
+    "| What to Look For | Indicates | Action |\n",
+    "|-----------------|-----------|--------|\n",
+    "| Missing bars | Data gaps (TQ002) | Document; add gap indicator |\n",
+    "| Declining trend | Population shrinkage or data cutoff | Check if intentional |\n",
+    "| Spikes | Campaigns, seasonality, or data issues | Investigate cause |\n",
+    "| Flat periods | Possible logging outages | Verify with data source |"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-11",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-02-02T13:00:55.286039Z",
+     "iopub.status.busy": "2026-02-02T13:00:55.285927Z",
+     "iopub.status.idle": "2026-02-02T13:00:55.315871Z",
+     "shell.execute_reply": "2026-02-02T13:00:55.315137Z"
+    },
+    "papermill": {
+     "duration": 0.033261,
+     "end_time": "2026-02-02T13:00:55.316776",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:55.283515",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "df_temp = df.copy()\n",
+    "df_temp[TIME_COLUMN] = pd.to_datetime(df_temp[TIME_COLUMN])\n",
+    "time_span = (df_temp[TIME_COLUMN].max() - df_temp[TIME_COLUMN].min()).days\n",
+    "\n",
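+    "# Pick the bar width from the time span: daily up to 90 days, weekly up to 365, monthly beyond.\n",
+    "# The \"ME\" (month-end) alias requires pandas 2.2 or later.\n",
+    "freq, label = (\"D\", \"Daily\") if time_span <= 90 else (\"W\", \"Weekly\") if time_span <= 365 else (\"ME\", \"Monthly\")\n",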
+    "counts = df_temp.groupby(pd.Grouper(key=TIME_COLUMN, freq=freq)).size()\n",
+    "\n",
+    "fig = go.Figure(go.Bar(x=counts.index, y=counts.values, marker_color=\"#4682B4\"))\n",
+    "fig.update_layout(title=f\"{label} Event Volume (gaps = missing bars)\", height=300, template=\"plotly_white\")\n",
+    "display_figure(fig)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-14",
+   "metadata": {
+    "papermill": {
+     "duration": 0.003275,
+     "end_time": "2026-02-02T13:00:55.323645",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:55.320370",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "## 1b.6 Outlier Analysis\n",
+    "\n",
+    "| Approach | When to Use | Why It Matters |\n",
+    "|----------|-------------|----------------|\n",
+    "| Global detection | Homogeneous data | Simple threshold works |\n",
+    "| Segment-aware | Data has natural groups | Avoids false positives when segments have different scales |\n",
+    "\n",
+    "Segment-aware detection clusters entities by target (or other segment) and detects outliers within each group separately."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-15",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-02-02T13:00:55.331243Z",
+     "iopub.status.busy": "2026-02-02T13:00:55.331125Z",
+     "iopub.status.idle": "2026-02-02T13:01:05.464597Z",
+     "shell.execute_reply": "2026-02-02T13:01:05.463713Z"
+    },
+    "papermill": {
+     "duration": 10.138458,
+     "end_time": "2026-02-02T13:01:05.465385",
+     "exception": false,
+     "start_time": "2026-02-02T13:00:55.326927",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "numeric_cols = [n for n, c in findings.columns.items()\n",
+    "    if c.inferred_type in [ColumnType.NUMERIC_CONTINUOUS, ColumnType.NUMERIC_DISCRETE]\n",
+    "    and n not in [ENTITY_COLUMN, TIME_COLUMN] and n not in TEMPORAL_METADATA_COLS]\n",
+    "\n",
+    "if numeric_cols:\n",
+    "    analyzer = SegmentAwareOutlierAnalyzer(max_segments=5)\n",
+    "    result = analyzer.analyze(df, feature_cols=numeric_cols, segment_col=None, target_col=findings.target_column)\n",
+    "    \n",
+    "    print(f\"Segments detected: {result.n_segments}\")\n",
+    "    if result.n_segments > 1:\n",
+    "        data = [{\"Feature\": c, \"Global\": result.global_analysis[c].outliers_detected,\n",
+    "            \"Segment\": sum(s[c].outliers_detected for s in result.segment_analysis.values() if c in s)}\n",
+    "            for c in numeric_cols]\n",
+    "        display(pd.DataFrame(data))\n",
+    "        if result.segmentation_recommended:\n",
+    "            print(\"\\n💡 Segment-specific outlier treatment recommended\")\n",
+    "    else:\n",
+    "        print(\"Data appears homogeneous - using global outlier detection\")\n",
+    "else:\n",
+    "    print(\"No numeric columns for outlier analysis.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-16",
+   "metadata": {
+    "papermill": {
+     "duration": 0.003472,
+     "end_time": "2026-02-02T13:01:05.472843",
+     "exception": false,
+     "start_time": "2026-02-02T13:01:05.469371",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "## 1b.7 Data Validation\n",
+    "\n",
+    "| Check | Issue | Impact |\n",
+    "|-------|-------|--------|\n",
+    "| Binary fields | Values outside {0, 1} | Model crashes or silent errors |\n",
+    "| String consistency | Case/spacing variants (\"Yes\" vs \"yes\") | Inflated cardinality; split categories |\n",
+    "| Missing patterns | Systematic missingness | Bias in imputation |"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-17",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-02-02T13:01:05.481150Z",
+     "iopub.status.busy": "2026-02-02T13:01:05.481017Z",
+     "iopub.status.idle": "2026-02-02T13:01:05.648331Z",
+     "shell.execute_reply": "2026-02-02T13:01:05.647595Z"
+    },
+    "papermill": {
+     "duration": 0.172374,
+     "end_time": "2026-02-02T13:01:05.649066",
+     "exception": false,
+     "start_time": "2026-02-02T13:01:05.476692",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
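+    "# Binary field validation: report the 0/1 split and flag any other non-null values\n",
+    "binary_cols = [n for n, c in findings.columns.items() if c.inferred_type == ColumnType.BINARY and n not in TEMPORAL_METADATA_COLS]\n",
+    "for col in binary_cols:\n",
+    "    c0, c1 = (df[col] == 0).sum(), (df[col] == 1).sum()\n",
+    "    other = df[col].notna().sum() - c0 - c1  # non-null values outside {0, 1}\n",
+    "    if c0 + c1 == 0:\n",
+    "        print(f\"⚠️ {col}: no 0/1 values found ({other:,} other non-null values)\")\n",
+    "        continue\n",
+    "    mark = \"✓\" if other == 0 else \"⚠️\"\n",
+    "    print(f\"{mark} {col}: 0={c0:,} ({c0/(c0+c1)*100:.1f}%), 1={c1:,} ({c1/(c0+c1)*100:.1f}%), other={other:,}\")\n",
+    "\n",
+    "# String consistency check: group values that differ only by case or surrounding whitespace\n",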
+    "issues = []\n",
+    "for col in df.select_dtypes(include=['object']).columns:\n",
+    "    if col in [ENTITY_COLUMN, TIME_COLUMN]: continue\n",
+    "    variants = {}\n",
+    "    for v in df[col].dropna().unique():\n",
+    "        key = str(v).lower().strip()\n",
+    "        variants.setdefault(key, []).append(v)\n",
+    "    issues.extend([{\"Column\": col, \"Variants\": vs} for vs in variants.values() if len(vs) > 1])\n",
+    "\n",
+    "print(f\"\\n{'⚠️ Consistency issues: ' + str(len(issues)) if issues else '✅ No consistency issues'}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-18",
+   "metadata": {
+    "papermill": {
+     "duration": 0.003229,
+     "end_time": "2026-02-02T13:01:05.655939",
+     "exception": false,
+     "start_time": "2026-02-02T13:01:05.652710",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "## 1b.8 Recommendations\n",
+    "\n",
+    "Framework-generated recommendations based on column-level issues detected during exploration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-19",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-02-02T13:01:05.663919Z",
+     "iopub.status.busy": "2026-02-02T13:01:05.663801Z",
+     "iopub.status.idle": "2026-02-02T13:01:05.667016Z",
+     "shell.execute_reply": "2026-02-02T13:01:05.666376Z"
+    },
+    "papermill": {
+     "duration": 0.00802,
+     "end_time": "2026-02-02T13:01:05.667651",
+     "exception": false,
+     "start_time": "2026-02-02T13:01:05.659631",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "rec_engine = RecommendationEngine()\n",
+    "recs = rec_engine.recommend_cleaning(findings)\n",
+    "\n",
+    "if recs:\n",
+    "    for r in sorted(recs, key=lambda x: {\"high\": 0, \"medium\": 1, \"low\": 2}.get(x.severity, 3)):\n",
+    "        icon = {\"high\": \"🔴\", \"medium\": \"🟡\", \"low\": \"🟢\"}.get(r.severity, \"⚪\")\n",
+    "        print(f\"{icon} [{r.severity.upper()}] {r.column_name}: {r.description}\")\n",
+    "        label = r.strategy_label if r.strategy_label else r.strategy.replace(\"_\", \" \").title()\n",
+    "        print(f\"   Strategy: {label}\")\n",
+    "else:\n",
+    "    print(\"✅ No cleaning recommendations\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-20",
+   "metadata": {
+    "papermill": {
+     "duration": 0.004024,
+     "end_time": "2026-02-02T13:01:05.675373",
+     "exception": false,
+     "start_time": "2026-02-02T13:01:05.671349",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "## 1b.9 Save Results"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-21",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-02-02T13:01:05.683302Z",
+     "iopub.status.busy": "2026-02-02T13:01:05.683171Z",
+     "iopub.status.idle": "2026-02-02T13:01:05.699141Z",
+     "shell.execute_reply": "2026-02-02T13:01:05.698285Z"
+    },
+    "papermill": {
+     "duration": 0.021202,
+     "end_time": "2026-02-02T13:01:05.699979",
+     "exception": false,
+     "start_time": "2026-02-02T13:01:05.678777",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "if not findings.metadata:\n",
+    "    findings.metadata = {}\n",
+    "findings.metadata[\"temporal_quality\"] = reporter.to_dict()\n",
+    "findings.save(FINDINGS_PATH)\n",
+    "print(f\"Saved to: {FINDINGS_PATH}\")\n",
+    "print(f\"Score: {quality_score:.0f}/100 (Grade {grade})\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-22",
+   "metadata": {
+    "papermill": {
+     "duration": 0.003113,
+     "end_time": "2026-02-02T13:01:05.706587",
+     "exception": false,
+     "start_time": "2026-02-02T13:01:05.703474",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "---\n",
+    "\n",
+    "## Summary: What We Learned\n",
+    "\n",
+    "In this notebook, we validated temporal data quality:\n",
+    "\n",
+    "1. **Temporal Quality Checks** — Detected duplicates, gaps, future dates, ordering issues\n",
+    "2. **Quality Score** — Quantified overall data health with a 0-100 score, a letter grade (A-D), and a pass/fail flag\n",
+    "3. **Event Volume** — Visualized data coverage over time\n",
+    "4. **Outlier Analysis** — Compared global vs segment-aware detection\n",
+    "5. **Data Validation** — Verified binary fields and string consistency\n",
+    "\n",
+    "## Quality Score Interpretation\n",
+    "\n",
+    "| Grade | Score | Meaning | Action |\n",
+    "|-------|-------|---------|--------|\n",
+    "| A | 90-100 | Excellent | Proceed with confidence |\n",
+    "| B | 75-89 | Good | Document issues, proceed |\n",
+    "| C | 60-74 | Fair | Address issues before aggregation |\n",
+    "| D | <60 | Poor | Investigation required |\n",
+    "\n",
+    "---\n",
+    "\n",
+    "## Next Steps\n",
+    "\n",
+    "Continue with the **Event Bronze Track**:\n",
+    "\n",
+    "1. **01c_temporal_patterns.ipynb** — Detect trends, seasonality, cohort effects\n",
+    "2. **01d_event_aggregation.ipynb** — Aggregate events to entity-level features\n",
+    "\n",
+    "After 01d, continue with the **Entity Bronze Track** (02 → 03 → 04) on aggregated data."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f1ec2c4e",
+   "metadata": {
+    "papermill": {
+     "duration": 0.00297,
+     "end_time": "2026-02-02T13:01:05.712602",
+     "exception": false,
+     "start_time": "2026-02-02T13:01:05.709632",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "source": [
+    "> **Save Reminder:** Save this notebook (Ctrl+S / Cmd+S) before running the next one.\n",
+    "> The next notebook will automatically export this notebook's HTML documentation from the saved file."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.4"
+  },
+  "papermill": {
+   "default_parameters": {},
+   "duration": 16.265903,
+   "end_time": "2026-02-02T13:01:08.333031",
+   "environment_variables": {},
+   "exception": null,
+   "input_path": "/Users/Vital/python/CustomerRetention/exploration_notebooks/01b_temporal_quality.ipynb",
+   "output_path": "/Users/Vital/python/CustomerRetention/exploration_notebooks/01b_temporal_quality.ipynb",
+   "parameters": {},
+   "start_time": "2026-02-02T13:00:52.067128",
+   "version": "2.6.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
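
The notebook above delegates its four checks (TQ001 to TQ004) and the quality score to `customer_retention.stages.profiling`, whose internals are not shown in this diff. The following is a minimal pandas sketch of what such checks could look like, not churnkit's implementation. The function names, the `expected_gap` parameter, the toy columns `customer_id` and `event_time`, and the scoring rule (equal 25% weights, points lost in proportion to the count flagged relative to the row count) are all assumptions made for illustration, based only on the tables in the notebook.

```python
import pandas as pd


def temporal_quality_sketch(df, entity_col, time_col, reference_date,
                            expected_gap=pd.Timedelta(days=1),
                            max_gap_multiple=3.0):
    """Count what four illustrative temporal checks would flag (hypothetical helper)."""
    ts = pd.to_datetime(df[time_col])
    work = df.assign(**{time_col: ts})
    key = [entity_col, time_col]

    # TQ001-style: rows repeating an (entity, timestamp) pair already seen.
    duplicates = int(work.duplicated(subset=key, keep="first").sum())

    # TQ002-style: gaps between consecutive distinct timestamps that are
    # longer than max_gap_multiple times the expected spacing.
    gaps = ts.drop_duplicates().sort_values().diff().dropna()
    large_gaps = int((gaps > expected_gap * max_gap_multiple).sum())

    # TQ003-style: timestamps later than the reference date.
    future_dates = int((ts > reference_date).sum())

    # TQ004-style: rows whose order within an entity is ambiguous because
    # another row for the same entity carries the same timestamp.
    ambiguous_order = int(work.duplicated(subset=key, keep=False).sum())

    return {"duplicates": duplicates, "large_gaps": large_gaps,
            "future_dates": future_dates, "ambiguous_order": ambiguous_order}


def quality_score(flagged, n_rows):
    """Equal-weight score: each check starts at 100 and loses points in
    proportion to its flagged count relative to the row count (assumed rule)."""
    per_check = [100.0 * (1 - min(count / n_rows, 1.0)) for count in flagged.values()]
    return sum(per_check) / len(per_check)


def grade(score):
    """Map a 0-100 score to the A-D bands listed in the notebook."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    return "D"


# Toy event log: one duplicated event, two long gaps, one future-dated event.
events = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2, 3, 3],
    "event_time": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03",
                   "2024-01-10", "2024-01-11", "2024-01-12", "2024-02-15"],
})
flagged = temporal_quality_sketch(events, "customer_id", "event_time",
                                  reference_date=pd.Timestamp("2024-01-31"))
score = quality_score(flagged, len(events))
print(flagged, round(score, 2), grade(score))
```

Pinning `reference_date` to a fixed timestamp, rather than calling `pd.Timestamp.now()`, keeps the future-date count stable between runs, which matters when the score is saved alongside the findings.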