detectkit 0.2.8.tar.gz → 0.3.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (59)
  1. {detectkit-0.2.8/detectkit.egg-info → detectkit-0.3.1}/PKG-INFO +20 -12
  2. {detectkit-0.2.8 → detectkit-0.3.1}/README.md +19 -11
  3. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/alerting/orchestrator.py +165 -1
  4. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/cli/commands/run.py +29 -13
  5. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/config/metric_config.py +13 -0
  6. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/database/internal_tables.py +122 -0
  7. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/database/tables.py +10 -1
  8. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/orchestration/task_manager.py +2 -0
  9. {detectkit-0.2.8 → detectkit-0.3.1/detectkit.egg-info}/PKG-INFO +20 -12
  10. {detectkit-0.2.8 → detectkit-0.3.1}/pyproject.toml +1 -1
  11. {detectkit-0.2.8 → detectkit-0.3.1}/LICENSE +0 -0
  12. {detectkit-0.2.8 → detectkit-0.3.1}/MANIFEST.in +0 -0
  13. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/__init__.py +0 -0
  14. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/alerting/__init__.py +0 -0
  15. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/alerting/channels/__init__.py +0 -0
  16. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/alerting/channels/base.py +0 -0
  17. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/alerting/channels/email.py +0 -0
  18. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/alerting/channels/factory.py +0 -0
  19. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/alerting/channels/mattermost.py +0 -0
  20. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/alerting/channels/slack.py +0 -0
  21. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/alerting/channels/telegram.py +0 -0
  22. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/alerting/channels/webhook.py +0 -0
  23. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/cli/__init__.py +0 -0
  24. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/cli/commands/__init__.py +0 -0
  25. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/cli/commands/init.py +0 -0
  26. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/cli/commands/test_alert.py +0 -0
  27. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/cli/main.py +0 -0
  28. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/config/__init__.py +0 -0
  29. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/config/profile.py +0 -0
  30. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/config/project_config.py +0 -0
  31. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/config/validator.py +0 -0
  32. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/core/__init__.py +0 -0
  33. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/core/interval.py +0 -0
  34. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/core/models.py +0 -0
  35. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/database/__init__.py +0 -0
  36. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/database/clickhouse_manager.py +0 -0
  37. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/database/manager.py +0 -0
  38. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/detectors/__init__.py +0 -0
  39. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/detectors/base.py +0 -0
  40. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/detectors/factory.py +0 -0
  41. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/detectors/statistical/__init__.py +0 -0
  42. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/detectors/statistical/iqr.py +0 -0
  43. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/detectors/statistical/mad.py +0 -0
  44. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/detectors/statistical/manual_bounds.py +0 -0
  45. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/detectors/statistical/zscore.py +0 -0
  46. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/loaders/__init__.py +0 -0
  47. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/loaders/metric_loader.py +0 -0
  48. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/loaders/query_template.py +0 -0
  49. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/orchestration/__init__.py +0 -0
  50. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/utils/__init__.py +0 -0
  51. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit/utils/stats.py +0 -0
  52. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit.egg-info/SOURCES.txt +0 -0
  53. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit.egg-info/dependency_links.txt +0 -0
  54. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit.egg-info/entry_points.txt +0 -0
  55. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit.egg-info/requires.txt +0 -0
  56. {detectkit-0.2.8 → detectkit-0.3.1}/detectkit.egg-info/top_level.txt +0 -0
  57. {detectkit-0.2.8 → detectkit-0.3.1}/requirements.txt +0 -0
  58. {detectkit-0.2.8 → detectkit-0.3.1}/setup.cfg +0 -0
  59. {detectkit-0.2.8 → detectkit-0.3.1}/setup.py +0 -0
--- detectkit-0.2.8/detectkit.egg-info/PKG-INFO
+++ detectkit-0.3.1/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: detectkit
-Version: 0.2.8
+Version: 0.3.1
 Summary: Metric monitoring with automatic anomaly detection
 Author: detectkit team
 License: MIT
@@ -68,12 +68,19 @@ Dynamic: license-file
 
 ## Status
 
-✅ **Production Ready** - Version 0.1.2
+✅ **Production Ready** - Version 0.3.0
 
 Published to PyPI: https://pypi.org/project/detectkit/
 
 Complete rewrite with modern architecture and full documentation (2025).
 
+### What's New in v0.3.0
+
+🎯 **Alert Cooldown** - Prevent alert spam from persistent anomalies
+- Configure minimum time between alerts (`alert_cooldown: "30min"`)
+- Automatic recovery detection (`cooldown_reset_on_recovery: true`)
+- Stops duplicate alerts during long-running issues
+
 ## Features
 
 - ✅ **Pure numpy arrays** - No pandas dependency in core logic
@@ -225,13 +232,14 @@ This project is currently in active development. Contributions are welcome once
 
 ## Changelog
 
-### 0.1.0 (2025-11-07)
-- Initial release with complete rewrite
-- Core foundation: models, database, config
-- ✅ Metric loading with gap filling and seasonality extraction
-- Statistical detectors (Z-Score, MAD, IQR, Manual Bounds)
-- Alert channels (Webhook, Mattermost, Slack)
-- Alert orchestration with consecutive anomaly logic
-- Task manager for pipeline execution
-- ✅ CLI commands (dtk init, dtk run)
-- 📊 287 unit tests, 87% coverage
+See [CHANGELOG.md](CHANGELOG.md) for complete version history.
+
+### Recent Releases
+
+**[0.3.0]** (2025-11-10) - Alert cooldown system, spam prevention
+**[0.2.8]** (2025-11-10) - Fix incomplete interval detection
+**[0.2.7]** (2025-11-10) - Add _dtk_metrics table
+**[0.2.0]** (2025-11-06) - Detector preprocessing and value weighting
+**[0.1.0]** (2025-11-03) - Initial release
+
+[Full changelog →](CHANGELOG.md)
--- detectkit-0.2.8/README.md
+++ detectkit-0.3.1/README.md
@@ -6,12 +6,19 @@
 
 ## Status
 
-✅ **Production Ready** - Version 0.1.2
+✅ **Production Ready** - Version 0.3.0
 
 Published to PyPI: https://pypi.org/project/detectkit/
 
 Complete rewrite with modern architecture and full documentation (2025).
 
+### What's New in v0.3.0
+
+🎯 **Alert Cooldown** - Prevent alert spam from persistent anomalies
+- Configure minimum time between alerts (`alert_cooldown: "30min"`)
+- Automatic recovery detection (`cooldown_reset_on_recovery: true`)
+- Stops duplicate alerts during long-running issues
+
 ## Features
 
 - ✅ **Pure numpy arrays** - No pandas dependency in core logic
@@ -163,13 +170,14 @@ This project is currently in active development. Contributions are welcome once
 
 ## Changelog
 
-### 0.1.0 (2025-11-07)
-- Initial release with complete rewrite
-- Core foundation: models, database, config
-- ✅ Metric loading with gap filling and seasonality extraction
-- Statistical detectors (Z-Score, MAD, IQR, Manual Bounds)
-- Alert channels (Webhook, Mattermost, Slack)
-- Alert orchestration with consecutive anomaly logic
-- Task manager for pipeline execution
-- ✅ CLI commands (dtk init, dtk run)
-- 📊 287 unit tests, 87% coverage
+See [CHANGELOG.md](CHANGELOG.md) for complete version history.
+
+### Recent Releases
+
+**[0.3.0]** (2025-11-10) - Alert cooldown system, spam prevention
+**[0.2.8]** (2025-11-10) - Fix incomplete interval detection
+**[0.2.7]** (2025-11-10) - Add _dtk_metrics table
+**[0.2.0]** (2025-11-06) - Detector preprocessing and value weighting
+**[0.1.0]** (2025-11-03) - Initial release
+
+[Full changelog →](CHANGELOG.md)
--- detectkit-0.2.8/detectkit/alerting/orchestrator.py
+++ detectkit-0.3.1/detectkit/alerting/orchestrator.py
@@ -73,6 +73,8 @@ class AlertOrchestrator:
         interval: Interval,
         conditions: Optional[AlertConditions] = None,
         timezone_display: str = "UTC",
+        internal=None,  # InternalTablesManager (optional, for cooldown tracking)
+        alert_config=None,  # AlertConfig (optional, for cooldown settings)
     ):
         """
         Initialize alert orchestrator.
@@ -82,11 +84,15 @@
             interval: Metric interval
             conditions: Alert conditions (defaults to AlertConditions())
             timezone_display: Timezone for alert display (default: UTC)
+            internal: InternalTablesManager instance (optional, for cooldown tracking)
+            alert_config: AlertConfig instance (optional, for cooldown settings)
         """
         self.metric_name = metric_name
         self.interval = interval
         self.conditions = conditions or AlertConditions()
         self.timezone_display = timezone_display
+        self.internal = internal
+        self.alert_config = alert_config
 
     def should_alert(
         self,
@@ -106,11 +112,16 @@
         Logic:
         1. Check if enough detectors triggered (min_detectors)
         2. Check consecutive anomalies with direction matching
-        3. Return decision and formatted AlertData
+        3. Check alert cooldown (if configured)
+        4. Return decision and formatted AlertData
         """
         if not recent_detections:
             return False, None
 
+        # NEW: Check cooldown FIRST (before expensive checks)
+        if self._is_in_cooldown():
+            return False, None
+
         # Group detections by timestamp
         detections_by_time = self._group_by_timestamp(recent_detections)
 
@@ -316,6 +327,15 @@
                 print(f"Error sending alert via {channel_name}: {e}")
                 results[channel_name] = False
 
+        # NEW: Update alert timestamp after sending (for cooldown tracking)
+        if any(results.values()) and self.internal:
+            # At least one channel succeeded - update timestamp
+            self.internal.update_alert_timestamp(
+                metric_name=self.metric_name,
+                timestamp=datetime.utcnow(),
+                increment_count=True
+            )
+
         return results
 
     def get_last_complete_point(self, now: Optional[datetime] = None) -> datetime:
@@ -357,6 +377,150 @@
 
         return datetime.fromtimestamp(last_complete_seconds, tz=timezone.utc)
 
+    def _is_in_cooldown(self) -> bool:
+        """
+        Check if alert is currently in cooldown period.
+
+        Returns:
+            True if in cooldown (should NOT send alert), False otherwise
+
+        Logic:
+        1. If alert_cooldown not configured → return False (no cooldown)
+        2. Get last_alert_sent timestamp from database
+        3. If never sent → return False (no cooldown)
+        4. Calculate elapsed time since last alert
+        5. If cooldown_reset_on_recovery=True:
+           - Check if recovery happened since last alert
+           - If yes → return False (cooldown reset)
+        6. If elapsed < cooldown_interval → return True (in cooldown)
+        7. Otherwise → return False (cooldown expired)
+        """
+        # No cooldown configured
+        if not self.alert_config or not self.alert_config.alert_cooldown:
+            return False
+
+        # No internal manager (can't check cooldown)
+        if not self.internal:
+            return False
+
+        # Get last alert timestamp
+        last_sent = self.internal.get_last_alert_timestamp(self.metric_name)
+
+        if not last_sent:
+            return False  # Never sent alert before
+
+        # Parse cooldown interval
+        from detectkit.core.interval import Interval
+        cooldown_interval = Interval(self.alert_config.alert_cooldown)
+        cooldown_seconds = cooldown_interval.seconds
+
+        # Calculate elapsed time
+        now = datetime.utcnow()
+        elapsed = (now - last_sent).total_seconds()
+
+        # Check recovery reset (if enabled)
+        if self.alert_config.cooldown_reset_on_recovery:
+            # Check if recovery happened since last alert
+            has_recovery = self._check_recovery_since_last_alert(last_sent)
+
+            if has_recovery:
+                return False  # Cooldown reset by recovery
+
+        # Check if still in cooldown
+        return elapsed < cooldown_seconds
+
+    def _check_recovery_since_last_alert(
+        self,
+        last_alert_timestamp: datetime
+    ) -> bool:
+        """
+        Check if recovery happened since last alert was sent.
+
+        Recovery means: consecutive anomalies count dropped below threshold,
+        indicating the metric returned to normal state.
+
+        Args:
+            last_alert_timestamp: Timestamp when last alert was sent
+
+        Returns:
+            True if recovery detected, False otherwise
+
+        Logic:
+        1. Load detections created after last_alert_timestamp
+        2. Count consecutive anomalies using same logic as should_alert()
+        3. If consecutive < required → recovery happened
+        4. If consecutive >= required → still in anomaly state
+        """
+        if not self.internal:
+            return False
+
+        # Get last complete point
+        last_point = self.get_last_complete_point()
+
+        # Load detections created AFTER last alert
+        # We need enough points to check consecutive anomalies
+        num_points = self.conditions.consecutive_anomalies + 5  # +5 for margin
+
+        recent_detections = self.internal.get_recent_detections(
+            metric_name=self.metric_name,
+            last_point=last_point,
+            num_points=num_points,
+            created_after=last_alert_timestamp  # Only detections AFTER last alert
+        )
+
+        if not recent_detections:
+            # No new detections → assume recovery
+            return True
+
+        # Convert to DetectionRecord format
+        detection_records = []
+        for det in recent_detections:
+            # Group has multiple detectors per timestamp
+            for i in range(len(det["detector_ids"])):
+                # Parse detection metadata
+                try:
+                    import json
+                    metadata = json.loads(det["detector_params_list"][i])
+                except:
+                    metadata = {}
+
+                # Determine direction
+                value = det["value"]
+                conf_lower = det["confidence_lowers"][i]
+                conf_upper = det["confidence_uppers"][i]
+
+                if value < conf_lower:
+                    direction = "down"
+                elif value > conf_upper:
+                    direction = "up"
+                else:
+                    direction = "none"
+
+                record = DetectionRecord(
+                    timestamp=np.datetime64(det["timestamp"]),
+                    detector_name=det["detector_names"][i],
+                    detector_id=det["detector_ids"][i],
+                    detector_params=det["detector_params_list"][i],
+                    value=value,
+                    is_anomaly=det["is_anomaly_flags"][i],
+                    confidence_lower=conf_lower,
+                    confidence_upper=conf_upper,
+                    direction=direction,
+                    severity=0.0,  # Not used for recovery check
+                    detection_metadata=metadata
+                )
+                detection_records.append(record)
+
+        # Count consecutive anomalies (same logic as should_alert)
+        consecutive = self._count_consecutive_anomalies(
+            detections=detection_records,
+            min_detectors=self.conditions.min_detectors,
+            direction=self.conditions.direction
+        )
+
+        # Recovery = consecutive dropped below threshold
+        return consecutive < self.conditions.consecutive_anomalies
+
     def __repr__(self) -> str:
         """String representation."""
         return (
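The cooldown decision added in `_is_in_cooldown` above reduces to a timestamp comparison plus an optional recovery reset. A minimal standalone sketch of that logic (plain datetimes standing in for detectkit's `Interval` and database lookups, which this diff does not fully show):

```python
from datetime import datetime, timedelta

def is_in_cooldown(last_sent, cooldown_seconds, recovered=False, now=None):
    """Simplified stand-in for AlertOrchestrator._is_in_cooldown."""
    if last_sent is None:
        return False  # never alerted -> no cooldown to apply
    if recovered:
        return False  # models cooldown_reset_on_recovery=True behaviour
    now = now or datetime.utcnow()
    # Still inside the cooldown window -> suppress the alert
    return (now - last_sent).total_seconds() < cooldown_seconds

t0 = datetime(2025, 11, 10, 12, 0, 0)
# 10 minutes after an alert, with a 30-minute cooldown: suppressed
print(is_in_cooldown(t0, 1800, now=t0 + timedelta(minutes=10)))  # True
# After the metric recovered, the cooldown no longer applies
print(is_in_cooldown(t0, 1800, recovered=True, now=t0 + timedelta(minutes=10)))  # False
```

Note the ordering in the real method: the recovery check runs before the elapsed-time comparison, so a recovered-then-re-anomalous metric alerts immediately rather than waiting out the window.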
--- detectkit-0.2.8/detectkit/cli/commands/run.py
+++ detectkit-0.3.1/detectkit/cli/commands/run.py
@@ -369,17 +369,26 @@ def find_metrics_by_tag(metrics_dir: Path, tag: str) -> List[Path]:
 
     matching_metrics = []
 
-    for metric_file in metrics_dir.glob("**/*.yml"):
-        try:
-            with open(metric_file) as f:
-                config = yaml.safe_load(f)
-
-            if config and "tags" in config:
-                if tag in config["tags"]:
-                    matching_metrics.append(metric_file)
-        except Exception:
-            # Skip files that can't be parsed
-            continue
+    # Search both .yml and .yaml extensions (consistent with find_metric_by_name)
+    for pattern in ["**/*.yml", "**/*.yaml"]:
+        for metric_file in metrics_dir.glob(pattern):
+            try:
+                with open(metric_file) as f:
+                    config = yaml.safe_load(f)
+
+                if config and "tags" in config:
+                    if tag in config["tags"]:
+                        matching_metrics.append(metric_file)
+            except Exception as e:
+                # Warn about unparseable files but continue searching
+                click.echo(
+                    click.style(
+                        f"Warning: Skipping {metric_file.relative_to(metrics_dir.parent)}: {e}",
+                        fg="yellow"
+                    ),
+                    err=True
+                )
+                continue
 
     return matching_metrics
 
@@ -406,8 +415,15 @@ def find_metric_by_name(metrics_dir: Path, name: str) -> Optional[Path]:
 
                 if config and config.get("name") == name:
                     return metric_file
-            except Exception:
-                # Skip files that can't be parsed
+            except Exception as e:
+                # Warn about unparseable files but continue searching
+                click.echo(
+                    click.style(
+                        f"Warning: Skipping {metric_file.relative_to(metrics_dir.parent)}: {e}",
+                        fg="yellow"
+                    ),
+                    err=True
+                )
                 continue
 
     return None
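The fix above makes tag search cover both YAML suffixes. A tiny standalone illustration of the two-pattern glob (throwaway directory and file names are hypothetical):

```python
from pathlib import Path
import tempfile

# Build a throwaway metrics dir containing one file of each extension
tmp = Path(tempfile.mkdtemp())
(tmp / "a.yml").write_text("tags: [core]")
(tmp / "b.yaml").write_text("tags: [core]")

# A single "**/*.yml" glob would silently miss b.yaml
found = sorted(
    p.name
    for pattern in ["**/*.yml", "**/*.yaml"]
    for p in tmp.glob(pattern)
)
print(found)  # ['a.yml', 'b.yaml']
```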
--- detectkit-0.2.8/detectkit/config/metric_config.py
+++ detectkit-0.3.1/detectkit/config/metric_config.py
@@ -141,6 +141,8 @@ class AlertConfig(BaseModel):
         no_data_alert: Whether to alert when data is missing
         template_single: Custom template for single anomaly alert
         template_consecutive: Custom template for consecutive anomalies alert
+        alert_cooldown: Minimum interval between alerts (e.g., "30min", 1800 seconds)
+        cooldown_reset_on_recovery: Whether to reset cooldown when anomaly recovers
     """
 
     enabled: bool = Field(default=True, description="Enable alerting")
@@ -168,6 +170,17 @@
     template_consecutive: Optional[str] = Field(
         default=None, description="Custom template for consecutive anomalies"
     )
+    alert_cooldown: Optional[Union[str, int]] = Field(
+        default=None,
+        description="Minimum interval between alerts (e.g., '30min', 1800). "
+        "If None, no cooldown is applied (alerts sent every time conditions are met)."
+    )
+    cooldown_reset_on_recovery: bool = Field(
+        default=True,
+        description="Reset cooldown timer when anomaly recovers to normal. "
+        "Only applies if alert_cooldown is set. "
+        "True = cooldown resets on recovery, False = strict cooldown independent of recovery."
+    )
 
     @field_validator("consecutive_anomalies")
     @classmethod
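`alert_cooldown` accepts either an interval string or an integer number of seconds; the parsing is delegated to detectkit's `Interval` class, whose implementation is not shown in this diff. A rough stand-in illustrating the accepted shapes (the unit table and regex here are assumptions, not the library's actual parser):

```python
import re

# Hypothetical unit table; detectkit's real Interval class may differ.
_UNITS = {"s": 1, "sec": 1, "min": 60, "h": 3600, "hour": 3600, "d": 86400, "day": 86400}

def cooldown_seconds(value):
    """Parse an alert_cooldown value ('30min', '1h', 1800) into seconds."""
    if isinstance(value, int):
        return value  # already seconds
    m = re.fullmatch(r"(\d+)\s*([a-z]+?)s?", value.strip().lower())
    if not m or m.group(2) not in _UNITS:
        raise ValueError(f"Unrecognized interval: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]

print(cooldown_seconds("30min"))  # 1800
print(cooldown_seconds(1800))     # 1800
```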
--- detectkit-0.2.8/detectkit/database/internal_tables.py
+++ detectkit-0.3.1/detectkit/database/internal_tables.py
@@ -841,3 +841,125 @@ class InternalTablesManager:
             key_columns={"metric_name": metric_config.name},
             data=data
         )
+
+    def get_last_alert_timestamp(
+        self,
+        metric_name: str
+    ) -> Optional[datetime]:
+        """
+        Get timestamp of last sent alert for a metric.
+
+        Used for alert cooldown tracking - prevents sending alerts
+        too frequently for the same metric.
+
+        Args:
+            metric_name: Metric identifier
+
+        Returns:
+            Timestamp of last sent alert, or None if never sent
+
+        Example:
+            >>> last_sent = internal.get_last_alert_timestamp("cpu_usage")
+            >>> if last_sent:
+            ...     elapsed = (datetime.utcnow() - last_sent).total_seconds()
+            ...     print(f"Last alert sent {elapsed}s ago")
+        """
+        full_table_name = self._manager.get_full_table_name(
+            TABLE_TASKS, use_internal=True
+        )
+
+        # Query for pipeline task (detector_id="pipeline", process_type="pipeline")
+        query = f"""
+            SELECT last_alert_sent
+            FROM {full_table_name}
+            WHERE metric_name = %(metric_name)s
+              AND detector_id = 'pipeline'
+              AND process_type = 'pipeline'
+            LIMIT 1
+        """
+
+        results = self._manager.execute_query(
+            query,
+            params={"metric_name": metric_name}
+        )
+
+        if not results or not results[0]["last_alert_sent"]:
+            return None
+
+        last_sent = results[0]["last_alert_sent"]
+
+        # Normalize to naive datetime if needed
+        if hasattr(last_sent, 'tzinfo') and last_sent.tzinfo is not None:
+            last_sent = last_sent.replace(tzinfo=None)
+
+        return last_sent
+
+    def update_alert_timestamp(
+        self,
+        metric_name: str,
+        timestamp: datetime,
+        increment_count: bool = True
+    ) -> int:
+        """
+        Update last_alert_sent timestamp and optionally increment alert_count.
+
+        Called after successfully sending an alert to track cooldown state.
+
+        Args:
+            metric_name: Metric identifier
+            timestamp: Timestamp when alert was sent (typically datetime.utcnow())
+            increment_count: Whether to increment alert_count (default: True)
+
+        Returns:
+            Number of rows updated (typically 1)
+
+        Example:
+            >>> # After sending alert
+            >>> internal.update_alert_timestamp(
+            ...     "cpu_usage",
+            ...     datetime.utcnow(),
+            ...     increment_count=True
+            ... )
+        """
+        full_table_name = self._manager.get_full_table_name(
+            TABLE_TASKS, use_internal=True
+        )
+
+        # Normalize timestamp to naive if needed
+        if hasattr(timestamp, 'tzinfo') and timestamp.tzinfo is not None:
+            timestamp = timestamp.replace(tzinfo=None)
+
+        if increment_count:
+            # Update with alert_count increment
+            update_query = f"""
+                ALTER TABLE {full_table_name}
+                UPDATE
+                    last_alert_sent = %(timestamp)s,
+                    alert_count = alert_count + 1,
+                    updated_at = %(timestamp)s
+                WHERE metric_name = %(metric_name)s
+                  AND detector_id = 'pipeline'
+                  AND process_type = 'pipeline'
+            """
+        else:
+            # Update without alert_count increment
+            update_query = f"""
+                ALTER TABLE {full_table_name}
+                UPDATE
+                    last_alert_sent = %(timestamp)s,
+                    updated_at = %(timestamp)s
+                WHERE metric_name = %(metric_name)s
+                  AND detector_id = 'pipeline'
+                  AND process_type = 'pipeline'
+            """
+
+        self._manager.execute_query(
+            update_query,
+            params={
+                "metric_name": metric_name,
+                "timestamp": timestamp
+            }
+        )
+
+        # ClickHouse ALTER TABLE UPDATE is async, return 1 (optimistic)
+        return 1
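Taken together with the orchestrator changes, the intended call sequence is: read `last_alert_sent` before deciding, send, then write the new timestamp. An illustrative sketch with an in-memory store in place of the ClickHouse-backed manager (the `FakeInternal` class is hypothetical and only mirrors the two method signatures added above):

```python
from datetime import datetime

class FakeInternal:
    """In-memory stand-in for InternalTablesManager's cooldown methods."""
    def __init__(self):
        self._store = {}  # metric_name -> (last_alert_sent, alert_count)

    def get_last_alert_timestamp(self, metric_name):
        entry = self._store.get(metric_name)
        return entry[0] if entry else None

    def update_alert_timestamp(self, metric_name, timestamp, increment_count=True):
        _, count = self._store.get(metric_name, (None, 0))
        self._store[metric_name] = (timestamp, count + 1 if increment_count else count)
        return 1  # mirrors the optimistic return value above

internal = FakeInternal()
assert internal.get_last_alert_timestamp("cpu_usage") is None  # never alerted
internal.update_alert_timestamp("cpu_usage", datetime(2025, 11, 10, 12, 0))
print(internal.get_last_alert_timestamp("cpu_usage"))  # 2025-11-10 12:00:00
```

One caveat worth noting from the real implementation: ClickHouse `ALTER TABLE ... UPDATE` is an asynchronous mutation, so a read issued immediately after the write may briefly observe the old timestamp.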
--- detectkit-0.2.8/detectkit/database/tables.py
+++ detectkit-0.3.1/detectkit/database/tables.py
@@ -97,12 +97,15 @@ def get_tasks_table_model() -> TableModel:
     - last_processed_timestamp: Last successfully processed timestamp
     - error_message: Error message if failed (nullable)
     - timeout_seconds: Task timeout in seconds
+    - last_alert_sent: Timestamp of last sent alert (nullable, for cooldown tracking)
+    - alert_count: Number of alerts sent for this metric (for statistics)
 
     Primary Key: (metric_name, detector_id, process_type)
 
-    This table serves dual purpose:
+    This table serves multiple purposes:
     1. Locking: Only one process can run for a given (metric, detector, type)
     2. Resume: Stores last_processed_timestamp to resume from interruptions
+    3. Alert cooldown: Tracks last_alert_sent timestamp to prevent alert spam
     """
     return TableModel(
         columns=[
@@ -119,6 +122,12 @@
             ),
             ColumnDefinition("error_message", "Nullable(String)", nullable=True),
             ColumnDefinition("timeout_seconds", "Int32"),
+            ColumnDefinition(
+                "last_alert_sent",
+                "Nullable(DateTime64(3, 'UTC'))",
+                nullable=True
+            ),
+            ColumnDefinition("alert_count", "UInt32", default="0"),
         ],
         primary_key=["metric_name", "detector_id", "process_type"],
         engine="MergeTree",
--- detectkit-0.2.8/detectkit/orchestration/task_manager.py
+++ detectkit-0.3.1/detectkit/orchestration/task_manager.py
@@ -577,6 +577,8 @@ class TaskManager:
                 consecutive_anomalies=alerting_config.consecutive_anomalies,
             ),
             timezone_display="UTC",
+            internal=self.internal,  # For cooldown tracking
+            alert_config=alerting_config,  # For cooldown settings
         )
 
         # Get last complete point
--- detectkit-0.2.8/PKG-INFO
+++ detectkit-0.3.1/detectkit.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: detectkit
-Version: 0.2.8
+Version: 0.3.1
 Summary: Metric monitoring with automatic anomaly detection
 Author: detectkit team
 License: MIT
@@ -68,12 +68,19 @@ Dynamic: license-file
 
 ## Status
 
-✅ **Production Ready** - Version 0.1.2
+✅ **Production Ready** - Version 0.3.0
 
 Published to PyPI: https://pypi.org/project/detectkit/
 
 Complete rewrite with modern architecture and full documentation (2025).
 
+### What's New in v0.3.0
+
+🎯 **Alert Cooldown** - Prevent alert spam from persistent anomalies
+- Configure minimum time between alerts (`alert_cooldown: "30min"`)
+- Automatic recovery detection (`cooldown_reset_on_recovery: true`)
+- Stops duplicate alerts during long-running issues
+
 ## Features
 
 - ✅ **Pure numpy arrays** - No pandas dependency in core logic
@@ -225,13 +232,14 @@ This project is currently in active development. Contributions are welcome once
 
 ## Changelog
 
-### 0.1.0 (2025-11-07)
-- Initial release with complete rewrite
-- Core foundation: models, database, config
-- ✅ Metric loading with gap filling and seasonality extraction
-- Statistical detectors (Z-Score, MAD, IQR, Manual Bounds)
-- Alert channels (Webhook, Mattermost, Slack)
-- Alert orchestration with consecutive anomaly logic
-- Task manager for pipeline execution
-- ✅ CLI commands (dtk init, dtk run)
-- 📊 287 unit tests, 87% coverage
+See [CHANGELOG.md](CHANGELOG.md) for complete version history.
+
+### Recent Releases
+
+**[0.3.0]** (2025-11-10) - Alert cooldown system, spam prevention
+**[0.2.8]** (2025-11-10) - Fix incomplete interval detection
+**[0.2.7]** (2025-11-10) - Add _dtk_metrics table
+**[0.2.0]** (2025-11-06) - Detector preprocessing and value weighting
+**[0.1.0]** (2025-11-03) - Initial release
+
+[Full changelog →](CHANGELOG.md)
--- detectkit-0.2.8/pyproject.toml
+++ detectkit-0.3.1/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "detectkit"
-version = "0.2.8"
+version = "0.3.1"
 description = "Metric monitoring with automatic anomaly detection"
 readme = "README.md"
 requires-python = ">=3.10"