PyPI - detectkit - Versions diffs - 0.8.2__tar.gz → 0.10.0__tar.gz - Mend

detectkit 0.8.2.tar.gz → 0.10.0.tar.gz


Files changed (100)

{detectkit-0.8.2 → detectkit-0.10.0}/MANIFEST.in RENAMED

@@ -2,6 +2,7 @@ include README.md
 include LICENSE
 include requirements.txt
 recursive-include detectkit *.py
+recursive-include detectkit/cli/assets *.md
 recursive-exclude tests *
 recursive-exclude * __pycache__
 recursive-exclude * *.pyc

{detectkit-0.8.2/detectkit.egg-info → detectkit-0.10.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: detectkit
-Version: 0.8.2
+Version: 0.10.0
 Summary: Metric monitoring with automatic anomaly detection
 Author: detectkit team
 License: MIT
@@ -85,6 +85,7 @@ Dynamic: license-file
 - **Database agnostic** — ClickHouse, PostgreSQL, MySQL
 - **Idempotent** — resume from interruptions, no duplicate processing
 - **CLI** — `dtk init`, `dtk run --select`, `dtk unlock`, `dtk clean`, tag-based selectors
+- **AI-native onboarding** — `dtk init-claude` sets up Claude Code context (CLAUDE.md + rules + a metric-scaffolding skill) so an assistant can help you build metrics out of the box
 ## Installation
@@ -108,6 +109,10 @@ pip install detectkit[all-db]       # All databases
 dtk init my_monitoring
 cd my_monitoring
+# Optional: set up Claude Code context so an AI assistant can help you
+# write metrics, tune detectors and configure alerts (re-run after upgrades)
+dtk init-claude
 # Configure database in profiles.yml, then:
 dtk run --select cpu_usage
 dtk run --select tag:critical

{detectkit-0.8.2 → detectkit-0.10.0}/README.md RENAMED

@@ -19,6 +19,7 @@
 - **Database agnostic** — ClickHouse, PostgreSQL, MySQL
 - **Idempotent** — resume from interruptions, no duplicate processing
 - **CLI** — `dtk init`, `dtk run --select`, `dtk unlock`, `dtk clean`, tag-based selectors
+- **AI-native onboarding** — `dtk init-claude` sets up Claude Code context (CLAUDE.md + rules + a metric-scaffolding skill) so an assistant can help you build metrics out of the box
 ## Installation
@@ -42,6 +43,10 @@ pip install detectkit[all-db]       # All databases
 dtk init my_monitoring
 cd my_monitoring
+# Optional: set up Claude Code context so an AI assistant can help you
+# write metrics, tune detectors and configure alerts (re-run after upgrades)
+dtk init-claude
 # Configure database in profiles.yml, then:
 dtk run --select cpu_usage
 dtk run --select tag:critical

{detectkit-0.8.2 → detectkit-0.10.0}/detectkit/__init__.py RENAMED

@@ -4,7 +4,7 @@ detectk - Anomaly Detection for Time-Series Metrics
 A Python library for data analysts and engineers to monitor metrics with automatic anomaly detection.
 """
-__version__ = "0.8.2"
+__version__ = "0.10.0"
 from detectkit.core.interval import Interval
 from detectkit.core.models import ColumnDefinition, TableModel

{detectkit-0.8.2 → detectkit-0.10.0}/detectkit/alerting/channels/base.py RENAMED

@@ -36,6 +36,15 @@ class AlertData:
             as ``{project_name}`` in templates and as a ``[name] `` prefix
             in the default error title. Lets multiple projects share the
             same alert channel without ambiguity.
+    Alert-rule fields (``min_detectors``, ``direction_policy``,
+    ``consecutive_required``, ``detector_count``) describe *why the alert
+    fired* — the configured quorum/direction/consecutive thresholds plus
+    the observed number of agreeing detectors. They are filled by the
+    orchestrator from :class:`AlertConditions` and are deliberately kept
+    distinct from the observed ``direction``/``consecutive_count`` above so
+    templates can contrast "required vs actual". They default to ``None``
+    so direct-API callers (and non-anomaly alerts) still render cleanly.
     """
     metric_name: str
@@ -58,6 +67,11 @@ class AlertData:
     description: str | None = None
     mentions: list[str] = field(default_factory=list)
     project_name: str | None = None
+    # Alert rule (the parameters the alert fired with) — see class docstring.
+    min_detectors: int | None = None
+    direction_policy: str | None = None
+    consecutive_required: int | None = None
+    detector_count: int = 1
 class BaseAlertChannel(ABC):
@@ -123,10 +137,17 @@ class BaseAlertChannel(ABC):
         - {value} / {value_display}
         - {confidence_lower}
         - {confidence_upper}
+        - {confidence_interval} — "[lower, upper]" or "N/A"
+        - {expected_range} — one-sided aware: ">= lo", "<= hi",
+          "[lo, hi]" or "N/A" (renders one-sided detector bounds cleanly)
         - {detector_name}
-        - {direction}
+        - {detector_count} — observed detectors that agreed (the quorum)
+        - {direction} — observed/locked direction of the anomaly
+        - {direction_policy} — configured direction rule ("same"/"any"/...)
+        - {min_detectors} — configured quorum threshold (the rule)
+        - {consecutive_count} — observed consecutive points
+        - {consecutive_required} — configured consecutive threshold (rule)
         - {severity}
-        - {consecutive_count}
         - {status}
         Args:
@@ -177,6 +198,40 @@ class BaseAlertChannel(ABC):
         else:
             confidence_str = "N/A"
+        # One-sided-aware expected range. A NaN/inf bound means "no bound on
+        # that side" (e.g. ManualBounds with only ``lower_bound`` set), so we
+        # render ">= lo" / "<= hi" instead of the confusing "[7.00, nan]".
+        def _bounded(b: Any) -> bool:
+            return b is not None and not (isinstance(b, float) and (math.isnan(b) or math.isinf(b)))
+        lo_ok = _bounded(alert_data.confidence_lower)
+        hi_ok = _bounded(alert_data.confidence_upper)
+        if lo_ok and hi_ok:
+            expected_range = (
+                f"[{alert_data.confidence_lower:.2f}, {alert_data.confidence_upper:.2f}]"
+            )
+        elif lo_ok:
+            expected_range = f">= {alert_data.confidence_lower:.2f}"
+        elif hi_ok:
+            expected_range = f"<= {alert_data.confidence_upper:.2f}"
+        else:
+            expected_range = "N/A"
+        # Alert-rule display values. The orchestrator fills these from the
+        # configured AlertConditions; for direct-API/non-anomaly callers that
+        # leave them unset we fall back to the observed counts so the default
+        # templates never render a bare "None".
+        detector_count = alert_data.detector_count
+        min_detectors = (
+            alert_data.min_detectors if alert_data.min_detectors is not None else detector_count
+        )
+        consecutive_required = (
+            alert_data.consecutive_required
+            if alert_data.consecutive_required is not None
+            else alert_data.consecutive_count
+        )
+        direction_policy = alert_data.direction_policy or alert_data.direction
         # Display-safe value: stays usable even when value is None/NaN (no-data).
         raw_value = alert_data.value
         if raw_value is None or (isinstance(raw_value, float) and math.isnan(raw_value)):
@@ -221,11 +276,16 @@ class BaseAlertChannel(ABC):
                 confidence_lower=alert_data.confidence_lower,
                 confidence_upper=alert_data.confidence_upper,
                 confidence_interval=confidence_str,
+                expected_range=expected_range,
                 detector_name=alert_data.detector_name,
+                detector_count=detector_count,
                 detector_params=alert_data.detector_params,
                 direction=alert_data.direction,
+                direction_policy=direction_policy,
+                min_detectors=min_detectors,
                 severity=alert_data.severity,
                 consecutive_count=alert_data.consecutive_count,
+                consecutive_required=consecutive_required,
                 status=status,
                 error_type=alert_data.error_type or "",
                 error_message=alert_data.error_message or "",
@@ -312,12 +372,19 @@ class BaseAlertChannel(ABC):
             Default template string
         """
         return (
-            "Anomaly detected in metric: {metric_name}\n"
+            "⚠ Alert: {metric_name}\n"
             "{description_line}"
-            "Time: {timestamp}\n"
-            "Value: {value} | CI: {confidence_interval}\n"
-            "Direction: {direction} | Severity: {severity:.2f} | Consecutive: {consecutive_count}\n"
-            "Detector: {detector_name}\n"
+            "Quorum {detector_count}/{min_detectors} · "
+            "direction {direction} (policy {direction_policy}) · "
+            "consecutive {consecutive_count}/{consecutive_required}\n"
+            "Rule: min_detectors={min_detectors} · "
+            "direction={direction_policy} · consecutive={consecutive_required}\n"
+            "\n"
+            "Latest point (evidence):\n"
+            "· Time: {timestamp}\n"
+            "· Value: {value_display} | Expected: {expected_range}\n"
+            "· Severity: {severity:.2f}\n"
+            "Detectors: {detector_name}\n"
             "Parameters: {detector_params}"
             "{mentions_line}"
         )
@@ -330,12 +397,17 @@ class BaseAlertChannel(ABC):
             Default recovery template string
         """
         return (
-            "Metric recovered: {metric_name}\n"
+            "✅ Alert cleared: {metric_name}\n"
             "{description_line}"
-            "Time: {timestamp}\n"
-            "Value: {value} | CI: {confidence_interval}\n"
-            "Detector: {detector_name}\n"
-            "Status: metric returned to normal"
+            "The alert condition no longer holds — "
+            "the metric is back within expected bounds.\n"
+            "Rule: min_detectors={min_detectors} · "
+            "direction={direction_policy} · consecutive={consecutive_required}\n"
+            "\n"
+            "Latest point:\n"
+            "· Time: {timestamp}\n"
+            "· Value: {value_display} | Expected: {expected_range}\n"
+            "Detectors: {detector_name}"
             "{mentions_line}"
         )
@@ -348,7 +420,7 @@ class BaseAlertChannel(ABC):
         Returns:
             Default title template string
         """
-        return "Anomaly detected: {metric_name}"
+        return "⚠ Alert: {metric_name}"
     def get_default_recovery_title_template(self) -> str:
         """
@@ -357,7 +429,7 @@ class BaseAlertChannel(ABC):
         Returns:
             Default recovery title template string
         """
-        return "Metric recovered: {metric_name}"
+        return "✅ Alert cleared: {metric_name}"
     def get_default_no_data_template(self) -> str:
         """

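The one-sided `{expected_range}` rendering added in the `base.py` hunk above can be tried in isolation. The following is a minimal standalone sketch that mirrors the branch logic shown in the diff; the function names `is_bounded` and `format_expected_range` are hypothetical (the diff uses an inner helper `_bounded` and fields of `alert_data`), so treat it as an illustration rather than the library's API.

```python
import math
from typing import Optional


def is_bounded(bound: Optional[float]) -> bool:
    """True when the bound is usable: not None, NaN, or +/-inf."""
    return bound is not None and not (
        isinstance(bound, float) and (math.isnan(bound) or math.isinf(bound))
    )


def format_expected_range(lower: Optional[float], upper: Optional[float]) -> str:
    """Render a two-sided, one-sided, or missing confidence band."""
    lo_ok = is_bounded(lower)
    hi_ok = is_bounded(upper)
    if lo_ok and hi_ok:
        return f"[{lower:.2f}, {upper:.2f}]"
    if lo_ok:
        return f">= {lower:.2f}"
    if hi_ok:
        return f"<= {upper:.2f}"
    return "N/A"


# Hypothetical bounds, chosen to hit each branch once.
print(format_expected_range(7.0, float("nan")))   # >= 7.00
print(format_expected_range(None, 1.1))           # <= 1.10
print(format_expected_range(0.5, 2.0))            # [0.50, 2.00]
print(format_expected_range(None, float("inf")))  # N/A
```

Treating NaN and infinity the same as `None` is what keeps a detector with only a lower bound from rendering as `[7.00, nan]`.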
{detectkit-0.8.2 → detectkit-0.10.0}/detectkit/alerting/channels/email.py RENAMED

@@ -54,7 +54,7 @@ class EmailChannel(BaseAlertChannel):
         smtp_username: str | None = None,
         smtp_password: str | None = None,
         use_tls: bool = True,
-        subject_template: str = "Anomaly Alert: {metric_name}",
+        subject_template: str = "⚠ Alert: {metric_name}",
         template: str | None = None,
         **kwargs,
     ):

{detectkit-0.8.2 → detectkit-0.10.0}/detectkit/alerting/channels/webhook.py RENAMED

@@ -155,10 +155,17 @@ class WebhookChannel(BaseAlertChannel):
         """
         return (
             "{description_line}"
-            "Time: {timestamp}\n"
-            "Value: {value} | CI: {confidence_interval}\n"
-            "Direction: {direction} | Severity: {severity:.2f} | Consecutive: {consecutive_count}\n"
-            "Detector: {detector_name}\n"
+            "Quorum {detector_count}/{min_detectors} · "
+            "direction {direction} (policy {direction_policy}) · "
+            "consecutive {consecutive_count}/{consecutive_required}\n"
+            "Rule: min_detectors={min_detectors} · "
+            "direction={direction_policy} · consecutive={consecutive_required}\n"
+            "\n"
+            "Latest point (evidence):\n"
+            "· Time: {timestamp}\n"
+            "· Value: {value_display} | Expected: {expected_range}\n"
+            "· Severity: {severity:.2f}\n"
+            "Detectors: {detector_name}\n"
             "Parameters: {detector_params}"
             "{mentions_line}"
         )
@@ -171,10 +178,15 @@ class WebhookChannel(BaseAlertChannel):
         """
         return (
             "{description_line}"
-            "Time: {timestamp}\n"
-            "Value: {value} | CI: {confidence_interval}\n"
-            "Detector: {detector_name}\n"
-            "Status: metric returned to normal"
+            "The alert condition no longer holds — "
+            "the metric is back within expected bounds.\n"
+            "Rule: min_detectors={min_detectors} · "
+            "direction={direction_policy} · consecutive={consecutive_required}\n"
+            "\n"
+            "Latest point:\n"
+            "· Time: {timestamp}\n"
+            "· Value: {value_display} | Expected: {expected_range}\n"
+            "Detectors: {detector_name}"
             "{mentions_line}"
         )
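To see what the new default anomaly body looks like once filled in, the sketch below copies the template string from the `webhook.py` hunk and formats it with plain `str.format`. The helper `rule_display_values` is hypothetical and only mirrors the fallbacks shown in the `base.py` hunk (unset rule fields echo the observed values). All sample values, including the detector name and parameters, are made up for illustration, and the real channel may format `description_line`, `mentions_line`, and `detector_params` differently.

```python
from typing import Any, Dict, Optional

# Anomaly body as it appears in the webhook.py hunk.
ANOMALY_TEMPLATE = (
    "{description_line}"
    "Quorum {detector_count}/{min_detectors} · "
    "direction {direction} (policy {direction_policy}) · "
    "consecutive {consecutive_count}/{consecutive_required}\n"
    "Rule: min_detectors={min_detectors} · "
    "direction={direction_policy} · consecutive={consecutive_required}\n"
    "\n"
    "Latest point (evidence):\n"
    "· Time: {timestamp}\n"
    "· Value: {value_display} | Expected: {expected_range}\n"
    "· Severity: {severity:.2f}\n"
    "Detectors: {detector_name}\n"
    "Parameters: {detector_params}"
    "{mentions_line}"
)


def rule_display_values(
    detector_count: int,
    consecutive_count: int,
    direction: str,
    min_detectors: Optional[int] = None,
    consecutive_required: Optional[int] = None,
    direction_policy: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: unset rule fields fall back to observed values."""
    return {
        "detector_count": detector_count,
        "consecutive_count": consecutive_count,
        "direction": direction,
        "min_detectors": min_detectors if min_detectors is not None else detector_count,
        "consecutive_required": (
            consecutive_required if consecutive_required is not None else consecutive_count
        ),
        "direction_policy": direction_policy or direction,
    }


# Made-up sample alert: one detector, three consecutive points below a lower bound.
values = rule_display_values(
    detector_count=1, consecutive_count=3, direction="down",
    min_detectors=1, consecutive_required=3, direction_policy="same",
)
values.update(
    description_line="",
    timestamp="2026-04-11 12:00:00",
    value_display="5.20",
    expected_range=">= 7.00",
    severity=1.8,
    detector_name="manual_bounds",
    detector_params="{'lower_bound': 7}",
    mentions_line="",
)
message = ANOMALY_TEMPLATE.format(**values)
print(message)
```

With these inputs the first rendered line reads `Quorum 1/1 · direction down (policy same) · consecutive 3/3`, which puts the "required vs actual" contrast from the `AlertData` docstring at the top of the message.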

{detectkit-0.8.2 → detectkit-0.10.0}/detectkit/alerting/orchestrator/_decision.py RENAMED

@@ -207,6 +207,23 @@ class _DecisionMixin(_OrchestratorBase):
             detector_params = primary.detector_params
             combined_metadata = primary.detection_metadata
+        # Observed direction shown in the message. For "same"/"up"/"down" the
+        # caller passes the locked/policy direction. For "any" it passes None
+        # because the quorum may combine directions — collapse to the shared
+        # side only when every quorum member agrees, otherwise label it
+        # "mixed" so the message never claims an agreement that did not happen
+        # (e.g. one up + one down satisfying min_detectors=2).
+        if direction:
+            observed_direction = direction
+        else:
+            quorum_dirs = {d.direction for d in anomalies if d.direction in ("up", "down")}
+            if len(quorum_dirs) == 1:
+                observed_direction = next(iter(quorum_dirs))
+            elif len(quorum_dirs) >= 2:
+                observed_direction = "mixed"
+            else:
+                observed_direction = primary.direction
         return AlertData(
             metric_name=self.metric_name,
             timestamp=primary.timestamp,
@@ -216,12 +233,18 @@ class _DecisionMixin(_OrchestratorBase):
             confidence_upper=primary.confidence_upper,
             detector_name=detector_name,
             detector_params=detector_params,
-            direction=direction or primary.direction,
+            direction=observed_direction,
             severity=max_severity,
             detection_metadata=combined_metadata,
             consecutive_count=consecutive_count,
             description=self.description,
             mentions=self.mentions,
+            # Alert rule the message foregrounds: configured thresholds plus
+            # the observed quorum size that satisfied them.
+            min_detectors=self.conditions.min_detectors,
+            direction_policy=self.conditions.direction,
+            consecutive_required=self.conditions.consecutive_anomalies,
+            detector_count=len(anomalies),
         )
     def should_alert_no_data(
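The observed-direction logic added to `_decision.py` can be read as a small pure function. The sketch below restates the branch from the hunk with plain arguments instead of detection objects; the name `observed_direction` and its parameters are hypothetical, chosen only for this illustration.

```python
from typing import Iterable, Optional


def observed_direction(
    policy_direction: Optional[str],
    quorum_directions: Iterable[Optional[str]],
    primary_direction: Optional[str],
) -> Optional[str]:
    """Pick the direction label shown in the alert message."""
    # "same"/"up"/"down": the caller already knows the locked or policy direction.
    if policy_direction:
        return policy_direction
    # "any": collapse to one side only when every quorum member agrees.
    sides = {d for d in quorum_directions if d in ("up", "down")}
    if len(sides) == 1:
        return next(iter(sides))
    if len(sides) >= 2:
        return "mixed"
    # No usable direction among the quorum members: fall back to the primary detection.
    return primary_direction


print(observed_direction("up", ["up", "down"], "down"))  # up
print(observed_direction(None, ["up", "up"], "up"))      # up
print(observed_direction(None, ["up", "down"], "up"))    # mixed
print(observed_direction(None, [], "none"))              # none
```

The `"mixed"` label only appears on the `"any"` path, which matches the comment in the hunk: one up plus one down can satisfy `min_detectors=2`, and the message should not claim agreement in that case.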

{detectkit-0.8.2 → detectkit-0.10.0}/detectkit/alerting/orchestrator/_recovery.py RENAMED

@@ -176,4 +176,9 @@ class _RecoveryMixin(_OrchestratorBase):
             is_recovery=True,
             description=self.description,
             mentions=self.mentions,
+            # Echo the rule that had fired so the recovery message names the
+            # same alert condition that just cleared.
+            min_detectors=self.conditions.min_detectors,
+            direction_policy=self.conditions.direction,
+            consecutive_required=self.conditions.consecutive_anomalies,
         )

detectkit-0.10.0/detectkit/cli/assets/claude/CLAUDE.section.md ADDED

@@ -0,0 +1,54 @@
+## detectkit — metric anomaly monitoring
+This workspace contains one or more **detectkit** projects. detectkit is a
+dbt-like Python tool for monitoring time-series metrics: each metric is a SQL
+query plus one or more anomaly **detectors** defined in YAML, run through a
+`load → detect → alert` pipeline with the `dtk` CLI. A directory is a detectkit
+project when it contains a `detectkit_project.yml` file.
+**Help the user operate detectkit**: create and edit metrics, tune detectors,
+configure alerting and channels, run the pipeline, and debug why an alert did
+(or didn't) fire. Stay numpy/SQL/YAML-first and follow the project's existing
+conventions.
+### Where to look (read the matching file before answering)
+The full, authoritative reference lives in `.claude/rules/detectkit/`. These
+files are generated by `dtk init-claude` and track the installed detectkit
+version — **read the relevant one on demand** instead of guessing:
+| If the task is about… | Read |
+|---|---|
+| What detectkit is, the pipeline, internal tables, glossary | `.claude/rules/detectkit/overview.md` |
+| `dtk` commands, selectors, backfills, locks, cleanup | `.claude/rules/detectkit/cli.md` |
+| `detectkit_project.yml`, `profiles.yml`, DB connections, channels | `.claude/rules/detectkit/project.md` |
+| A metric YAML: query, interval, seasonality, loading | `.claude/rules/detectkit/metrics.md` |
+| Choosing/tuning detectors, preprocessing, trends, seasonality | `.claude/rules/detectkit/detectors.md` |
+| Alert rules (quorum/direction/consecutive), cooldown, recovery, templates | `.claude/rules/detectkit/alerting.md` |
+### Set up & scaffold (skills)
+- **First-time setup** — use the **`dtk-setup-project`** skill to configure the
+  database connection in `profiles.yml` (the `dtk init` placeholder ships example
+  values that need your real connection details) and, optionally, a first alert
+  channel.
+- **A new metric** — use the **`dtk-new-metric`** skill; it walks the config out
+  to a YAML file that validates and is ready to run.
+### Gotchas that bite (keep these in mind)
+- **Every loading query MUST filter its time range** on `{{ dtk_start_time }}`
+  and `{{ dtk_end_time }}` (rendered as `'YYYY-MM-DD HH:MM:SS'`, so quote them).
+  Without it, incremental/batched loading cannot work.
+- **Metric `name` must be unique** across the whole project — it is the
+  database key, not the filename. Keep filename and `name` in sync.
+- **Changing a detector parameter changes the detector's identity** and
+  recomputes its detections from scratch; the old rows are orphaned. After
+  retuning a live metric, run `dtk clean --select <metric>` to prune them.
+- **`alert_cooldown` defaults to `null`** = a persisting anomaly re-alerts on
+  *every* `dtk run`. Always set a cooldown for production metrics.
+- The pipeline is **idempotent**: it resumes from the last saved timestamp.
+  Don't reprocess history unless you mean to (`--full-refresh` / `--from`).
+> Generated by `dtk init-claude`. Re-run it after upgrading detectkit to refresh
+> these instructions and the files under `.claude/rules/detectkit/`.

detectkit-0.10.0/detectkit/cli/assets/claude/rules/alerting.md ADDED

@@ -0,0 +1,192 @@
+# detectkit — Alerting
+detectkit is **alert-centric**: the *alert* is the primary entity and a detector
+anomaly is secondary evidence a rule interprets (the same anomaly means
+different things under different rules). Configure alerting per metric under
+`alerting:`. Channels themselves are defined in `profiles.yml` (see
+`project.md`).
+```yaml
+alerting:
+  enabled: true
+  channels: [mattermost_ops]
+  min_detectors: 1
+  direction: "same"
+  consecutive_anomalies: 3
+  alert_cooldown: "30min"
+```
+## The alert rule (quorum × direction × consecutive)
+At the alert step, detectkit looks at the most recent detections and applies one
+combined contract:
+1. **Quorum** — at each timestamp, group all detectors' anomalies. The point
+   satisfies the quorum when at least `min_detectors` of them match the
+   `direction` policy.
+2. **Consecutive** — an alert fires only when the latest `consecutive_anomalies`
+   timestamps each satisfy the quorum **and** are grid-adjacent (exactly one
+   `interval` apart). A missing detection row between two anomalies breaks the
+   chain.
+### `min_detectors` (default 1)
+How many detectors must qualify at **every** point in the chain. `1` = any one
+detector (high recall); `N` = all must agree (high precision).
+### `direction` (default `"same"`)
+Which anomalies count toward the quorum:
+- `"same"` — at the latest point, ≥`min_detectors` detectors must agree on **one**
+  direction (up and down counted separately — disagreement is not consensus).
+  The winning direction is **locked for the whole chain**. Ties: more detectors
+  win, then the more severe side.
+- `"any"` — every anomaly counts regardless of direction (1 up + 1 down
+  satisfies `min_detectors: 2`).
+- `"up"` — only anomalies above the interval count (others ignored, never block).
+- `"down"` — only anomalies below the interval count.
+Pick by meaning: `"up"` for CPU/error rate (high is bad), `"down"` for cache hit
+rate/uptime (low is bad), `"any"` for single-detector "any deviation matters",
+`"same"` for multi-detector consensus.
+### `consecutive_anomalies` (default 3)
+Grid-adjacent quorum points required before alerting. `1` = alert immediately
+(critical metrics); `3` = balanced; `5+` = noisy metrics. Gaps in the detection
+grid break the chain.
+### Worked example (two detectors A, B; `min_detectors: 2`)
+| `direction` | A | B | Result |
+|---|---|---|---|
+| `same` | up | down | no alert (disagreement) |
+| `same` | up | up | quorum; "up" locked for the chain |
+| `up` | up | down | no quorum (only one "up", needs 2) |
+| `down` | up | up | no quorum ("up" ignored) |
+| `any` | up | down | quorum (every anomaly counts) |
+## Cooldown (spam control) — **set it in production**
+`alert_cooldown` defaults to **`null` = no cooldown**, meaning a persisting
+anomaly re-alerts on **every** `dtk run` (e.g. every cron tick). Always set a
+cooldown for production metrics.
+```yaml
+alert_cooldown: "30min"            # or seconds: 1800
+cooldown_reset_on_recovery: true   # default — reset the timer when the metric recovers
+```
+- With `cooldown_reset_on_recovery: true` (recommended): alert on first
+  occurrence, suppress duplicates while it persists, alert again on a fresh
+  incident after recovery.
+- With `false` (strict): an absolute minimum time between any alerts, regardless
+  of recovery — for very noisy metrics.
+- No-data and anomaly alerts **share** the same cooldown state within an alert
+  block. State lives in `_dtk_alert_states`.
+## Recovery notifications
+```yaml
+notify_on_recovery: true        # default false
+template_recovery: null         # optional custom body
+```
+Sends one notification per incident when the metric returns to normal after an
+alert fired. **Direction-aware**: after a "down" alert, a fresh "up" anomaly
+does not block recovery (the original condition no longer holds). Independent of
+`alert_cooldown` (recovery always sends once per incident). Default body is
+alert-centric (`✅ Alert cleared: <metric>`).
+## No-data alerts
+```yaml
+no_data_alert: true             # default false
+template_no_data: null          # optional custom body
+```
+Fires when the **last complete interval** (now floored to a boundary, minus one
+interval) has no datapoint, or the row's value is `NULL`/`NaN`. `min_detectors`
+and `consecutive_anomalies` do **not** apply (it's a single binary signal).
+Honors `alert_cooldown` and `suppress_until`. Webhook channels render it amber.
+Use for cron loaders where source absence is a real failure; **don't** enable on
+naturally sparse metrics.
+## Temporary suppression
+```yaml
+suppress_until: "2026-04-11 18:00:00"   # UTC; default null
+```
+Load and detect keep running; only alerting is paused until that time, then it
+auto-resumes (no second edit needed). For permanent off, use `enabled: false`.
+## Mentions
+```yaml
+mentions: [oncall_engineer, here]   # plain names, no @
+```
+Channel-agnostic: you write plain usernames and each channel renders them
+natively. Special broadcast keywords: `here`, `channel`, `all`. Available as
+`{mentions}` / `{mentions_line}` template variables (appended automatically if
+not placed in a template). Slack `@username` is display-only — use Slack user
+IDs (`U…`) for real pings.
+## Multiple alert configs per metric
+`alerting:` may be a **list** of independent blocks, each with its own channels,
+timezone, template, and rule — evaluated and sent independently:
+```yaml
+alerting:
+  - {enabled: true, channels: [mattermost_ops], consecutive_anomalies: 3}
+  - {enabled: true, channels: [slack_critical], consecutive_anomalies: 1, direction: "up"}
+```
+Each block's state is keyed by a hash of its functional fields; editing those
+fields or removing a block orphans its `_dtk_alert_states` row (prune with
+`dtk clean`). Disabling with `enabled: false` keeps the hash, so a paused alert
+is never treated as orphaned.
+## Templates
+Defaults are alert-centric. Override with:
+- `template_single` — alerts with `consecutive_count` ≤ 1.
+- `template_consecutive` — streaks (`> 1`); falls back to `template_single`.
+- `template_recovery`, `template_no_data` — recovery / no-data bodies.
+Templates are plain `{var}` strings (or Jinja2 `.j2` files under `templates_dir`
+referenced by path). Key variables:
+| Variable | Meaning |
+|---|---|
+| `{metric_name}`, `{description}` / `{description_line}` | identity |
+| `{timestamp}`, `{timezone}` | when (display tz via `alerting.timezone`, default UTC) |
+| `{value}` / `{value_display}` | metric value (`value_display` is NaN-safe) |
+| `{confidence_lower}` / `{confidence_upper}` / `{confidence_interval}` | bounds |
+| `{expected_range}` | one-sided-aware band (`>= 7.00`, `<= 1.10`, `[lo, hi]`, `N/A`) |
+| `{detector_name}`, `{detector_count}` | who fired (`"N detectors"` for multi) |
+| `{min_detectors}` / `{direction_policy}` / `{consecutive_required}` | the configured rule |
+| `{direction}`, `{consecutive_count}`, `{severity}` | observed values |
+| `{status}` | `ANOMALY` / `RECOVERED` / `NO_DATA` / `ERROR` |
+| `{mentions}` / `{mentions_line}` | formatted mentions |
+> For no-data/error alerts there is no numeric value — avoid `{value:.2f}` in
+> those templates (detectkit falls back to the default template rather than
+> crashing, but write kind-appropriate templates).
+## Test, tune, debug
+```bash
+dtk test-alert <metric>     # mock alert through the real channels, using this rule
+```
+- **Too many alerts** → raise `consecutive_anomalies`, raise detector
+  `threshold`, use `min_detectors: 2`, add seasonality, or set a `direction`.
+- **No alerts** → check `enabled: true`, channels exist in `profiles.yml`,
+  detections exist (`dtk run --steps detect`), the quorum/consecutive thresholds
+  aren't too high, and `direction` isn't filtering the move out.
+- **Wrong direction** (alerting when CPU drops) → set `direction: "up"`.
+- Aim for **< 5 alerts/day/team** to avoid fatigue.
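The quorum step described in `alerting.md` (the `min_detectors` and `direction` sections and the worked example table) can be sketched for a single timestamp as follows. This is not detectkit's implementation: the function name `quorum_direction`, the `(direction, severity)` tuple shape, the use of the maximum severity as the tie-breaker for `"same"`, and the `"any"` return label are all assumptions made for illustration. Chain locking and the grid-adjacency check for `consecutive_anomalies` are deliberately left out.

```python
from typing import List, Optional, Tuple

Anomaly = Tuple[str, float]  # (direction, severity); hypothetical shape


def quorum_direction(
    anomalies: List[Anomaly], min_detectors: int, policy: str
) -> Optional[str]:
    """Return the satisfied direction label for one timestamp, or None."""
    if min_detectors < 1:
        raise ValueError("min_detectors must be >= 1")
    ups = [sev for direction, sev in anomalies if direction == "up"]
    downs = [sev for direction, sev in anomalies if direction == "down"]
    if policy == "any":
        # Every anomaly counts, regardless of side.
        return "any" if len(ups) + len(downs) >= min_detectors else None
    if policy == "up":
        return "up" if len(ups) >= min_detectors else None
    if policy == "down":
        return "down" if len(downs) >= min_detectors else None
    if policy == "same":
        # Each side is counted separately; more detectors win, then higher severity.
        candidates = []
        if len(ups) >= min_detectors:
            candidates.append((len(ups), max(ups), "up"))
        if len(downs) >= min_detectors:
            candidates.append((len(downs), max(downs), "down"))
        return max(candidates)[2] if candidates else None
    raise ValueError(f"unknown direction policy: {policy!r}")


# Rows of the worked example (detectors A and B, min_detectors=2); severities are made up.
rows = [
    ("same", [("up", 1.0), ("down", 1.0)]),
    ("same", [("up", 1.0), ("up", 2.0)]),
    ("up", [("up", 1.0), ("down", 1.0)]),
    ("down", [("up", 1.0), ("up", 2.0)]),
    ("any", [("up", 1.0), ("down", 1.0)]),
]
for policy, anomalies in rows:
    print(policy, [d for d, _ in anomalies], "->", quorum_direction(anomalies, 2, policy))
```

Under these assumptions the five rows give `None`, `up`, `None`, `None`, and `any`, which lines up with the "Result" column of the table.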
