mvp24hours-dotnet-mcp 1.0.0
# Observability with Graphs Template - Semantic Kernel Graph

> **Purpose**: This template provides AI agents with patterns for implementing comprehensive observability in graph-based workflows using Semantic Kernel Graph.

---

## Overview

Observability enables monitoring, analysis, and visualization of graph execution performance. This template covers:

- Performance metrics collection
- Node execution monitoring
- Execution path analysis
- Metrics export and visualization
- Alerting and dashboards

---

## When to Use This Template

| Scenario | Recommendation |
|----------|----------------|
| Production monitoring | ✅ Recommended |
| Performance optimization | ✅ Recommended |
| Debugging workflows | ✅ Recommended |
| Capacity planning | ✅ Recommended |
| Simple scripts | ⚠️ May add overhead |
| Development only | ⚠️ Use simpler logging |

---

## Required NuGet Packages

```xml
<ItemGroup>
  <PackageReference Include="Microsoft.SemanticKernel" Version="1.*" />
  <PackageReference Include="SemanticKernel.Graph" Version="1.*" />
  <PackageReference Include="Microsoft.Extensions.Logging" Version="8.*" />
  <PackageReference Include="System.Diagnostics.DiagnosticSource" Version="8.*" />
</ItemGroup>
```
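
The `System.Diagnostics.DiagnosticSource` package listed above also provides the `System.Diagnostics.Metrics` API (`Meter`, `Counter<T>`, `Histogram<T>`), which tools such as `dotnet-counters` and OpenTelemetry exporters can consume. A minimal sketch, independent of the template classes below, that publishes node executions through it; the meter name, the instrument names and the `node`/`success` tags are our own assumptions, not names defined by SemanticKernel.Graph:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

/// <summary>
/// Sketch: publishes node-level measurements through System.Diagnostics.Metrics.
/// Meter and instrument names are illustrative, not defined by SemanticKernel.Graph.
/// </summary>
public sealed class GraphNodeInstrumentation : IDisposable
{
    public const string MeterName = "SemanticKernel.Graph.Observability"; // hypothetical name

    private readonly Meter _meter;
    private readonly Counter<long> _executions;
    private readonly Histogram<double> _durationMs;

    public GraphNodeInstrumentation()
    {
        _meter = new Meter(MeterName, "1.0");
        _executions = _meter.CreateCounter<long>(
            "graph.node.executions", unit: "{execution}", description: "Completed node executions");
        _durationMs = _meter.CreateHistogram<double>(
            "graph.node.duration", unit: "ms", description: "Node execution duration");
    }

    /// <summary>Records one completed node execution, tagged by node id and outcome.</summary>
    public void RecordNodeExecution(string nodeId, TimeSpan duration, bool success)
    {
        var node = new KeyValuePair<string, object?>("node", nodeId);
        var outcome = new KeyValuePair<string, object?>("success", success);
        _executions.Add(1, node, outcome);
        _durationMs.Record(duration.TotalMilliseconds, node, outcome);
    }

    public void Dispose() => _meter.Dispose();
}
```

In a host that uses the OpenTelemetry SDK, the meter would then be enabled by name, for example with `AddMeter(GraphNodeInstrumentation.MeterName)`.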

---

## Observability Architecture

```
┌──────────────────────────────────────────────────────────────────────┐
│                         Observability System                         │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│ ┌────────────────┐   ┌───────────────────────┐                       │
│ │ Graph Executor │──▶│ Metrics Collector     │                       │
│ └────────────────┘   │  - Execution metrics  │                       │
│                      │  - Node metrics       │                       │
│                      │  - Resource metrics   │                       │
│                      └───────────┬───────────┘                       │
│                                  │                                   │
│           ┌──────────────────────┼──────────────────────┐            │
│           │                      │                      │            │
│ ┌─────────▼──────────┐ ┌─────────▼──────────┐ ┌─────────▼──────────┐ │
│ │ Aggregator         │ │ Exporter           │ │ Dashboard          │ │
│ │ - Summarize        │ │ - JSON             │ │ - Real-time        │ │
│ │ - Trends           │ │ - Prometheus       │ │ - Historical       │ │
│ │ - Alerts           │ │ - CSV              │ │ - Alerts           │ │
│ └────────────────────┘ └────────────────────┘ └────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
```
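
The diagram assigns alerts to the Aggregator and the Dashboard without saying how they are evaluated. One simple option is a fixed threshold rule per node, applied to values that per-node aggregates already expose (execution count, success rate, p95 duration). A sketch under that assumption; `AlertRule`, `NodeAlert` and `AlertEvaluator` are hypothetical types, not part of SemanticKernel.Graph:

```csharp
using System;
using System.Collections.Generic;

/// <summary>Hypothetical per-node thresholds.</summary>
public sealed record AlertRule(double MinSuccessRate, TimeSpan MaxP95Duration);

/// <summary>Hypothetical alert raised when a node breaks a threshold.</summary>
public sealed record NodeAlert(string NodeId, string Reason);

public static class AlertEvaluator
{
    /// <summary>
    /// Returns one alert per violated threshold. Nodes without executions are skipped,
    /// so an idle node is not reported as having a 0% success rate.
    /// </summary>
    public static IReadOnlyList<NodeAlert> Evaluate(
        string nodeId, int totalExecutions, double successRate, TimeSpan p95Duration, AlertRule rule)
    {
        var alerts = new List<NodeAlert>();
        if (totalExecutions == 0) return alerts;

        if (successRate < rule.MinSuccessRate)
        {
            alerts.Add(new NodeAlert(nodeId,
                $"success rate {successRate:P1} is below {rule.MinSuccessRate:P1}"));
        }

        if (p95Duration > rule.MaxP95Duration)
        {
            alerts.Add(new NodeAlert(nodeId,
                $"p95 {p95Duration.TotalMilliseconds:F0} ms exceeds {rule.MaxP95Duration.TotalMilliseconds:F0} ms"));
        }

        return alerts;
    }
}
```

Fixed thresholds are the simplest choice; trend-based rules (for example, comparing the current p95 with a rolling baseline) would fit the Aggregator's "Trends" role but need retained history.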

---

## Core Components

### Configuration Models

```csharp
/// <summary>
/// Configuration options for metrics collection.
/// </summary>
public class GraphMetricsOptions
{
    /// <summary>
    /// Enable node-level metrics collection.
    /// </summary>
    public bool EnableNodeMetrics { get; set; } = true;

    /// <summary>
    /// Enable execution-level (execution path) metrics.
    /// </summary>
    public bool EnableExecutionMetrics { get; set; } = true;

    /// <summary>
    /// Enable periodic sampling of process CPU and memory usage.
    /// </summary>
    public bool EnableResourceMonitoring { get; set; } = true;

    /// <summary>
    /// Interval between resource-usage samples.
    /// </summary>
    public TimeSpan MetricsCollectionInterval { get; set; } = TimeSpan.FromMilliseconds(100);

    /// <summary>
    /// Maximum number of execution records kept per node.
    /// </summary>
    public int MaxMetricsHistory { get; set; } = 10000;

    /// <summary>
    /// Enable metrics compression.
    /// </summary>
    public bool EnableMetricsCompression { get; set; } = true;

    /// <summary>
    /// Enable metrics aggregation.
    /// </summary>
    public bool EnableMetricsAggregation { get; set; } = true;

    /// <summary>
    /// Aggregation interval.
    /// </summary>
    public TimeSpan AggregationInterval { get; set; } = TimeSpan.FromMinutes(1);

    /// <summary>
    /// Creates development-friendly options (small history, resource monitoring disabled).
    /// </summary>
    public static GraphMetricsOptions CreateDevelopmentOptions()
    {
        return new GraphMetricsOptions
        {
            MetricsCollectionInterval = TimeSpan.FromMilliseconds(50),
            MaxMetricsHistory = 1000,
            EnableResourceMonitoring = false
        };
    }

    /// <summary>
    /// Creates production-optimized options (1 s sampling interval, large history).
    /// </summary>
    public static GraphMetricsOptions CreateProductionOptions()
    {
        return new GraphMetricsOptions
        {
            MetricsCollectionInterval = TimeSpan.FromSeconds(1),
            MaxMetricsHistory = 100000,
            EnableResourceMonitoring = true,
            EnableMetricsCompression = true,
            EnableMetricsAggregation = true
        };
    }
}

/// <summary>
/// Export format for metrics.
/// </summary>
public enum MetricsExportFormat
{
    Json,
    Csv,
    Prometheus,
    OpenTelemetry // Not handled by GraphMetricsExporter; it throws NotSupportedException.
}

/// <summary>
/// Options for metrics export.
/// </summary>
public class GraphMetricsExportOptions
{
    /// <summary>Pretty-print JSON output.</summary>
    public bool IndentedOutput { get; set; } = false;

    /// <summary>Include timestamps in exported output.</summary>
    public bool IncludeTimestamps { get; set; } = true;

    /// <summary>Include metadata in exported output.</summary>
    public bool IncludeMetadata { get; set; } = true;
}
```
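
`MaxMetricsHistory` caps how many records are kept per node. Evicting from the front of a `List<T>` with `RemoveAt(0)` shifts every remaining element, which becomes noticeable at the production default of 100,000 entries, whereas a `Queue<T>` drops its oldest item in O(1). A minimal thread-safe sketch; `BoundedHistory<T>` is a hypothetical helper, not part of SemanticKernel.Graph:

```csharp
using System;
using System.Collections.Generic;

/// <summary>
/// Keeps at most <c>capacity</c> items and discards the oldest first.
/// Every member locks on a private gate, so one instance can be shared across threads.
/// </summary>
public sealed class BoundedHistory<T>
{
    private readonly Queue<T> _items = new();
    private readonly object _gate = new();
    private readonly int _capacity;

    public BoundedHistory(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public int Count
    {
        get { lock (_gate) return _items.Count; }
    }

    public void Add(T item)
    {
        lock (_gate)
        {
            _items.Enqueue(item);
            if (_items.Count > _capacity)
            {
                _items.Dequeue(); // O(1) eviction of the oldest item
            }
        }
    }

    /// <summary>Copy of the current items, oldest first; safe to enumerate without the lock.</summary>
    public T[] Snapshot()
    {
        lock (_gate) return _items.ToArray();
    }
}
```

Each node would own one instance sized from `MaxMetricsHistory`; `Snapshot()` returns a copy, so aggregation can run without holding the lock.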

---

## Implementation Patterns

### 1. Node Execution Tracker

```csharp
using System.Diagnostics;

/// <summary>
/// Tracks a single execution of one node.
/// </summary>
public class NodeExecutionTracker
{
    // Monotonic clock for the duration; StartTime/EndTime stay wall-clock for reporting.
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();

    public string TrackerId { get; } = Guid.NewGuid().ToString();
    public string NodeId { get; }
    public string NodeName { get; }
    public string ExecutionId { get; }
    public DateTimeOffset StartTime { get; }
    public DateTimeOffset? EndTime { get; private set; }
    public bool? Success { get; private set; }
    public TimeSpan? Duration { get; private set; }
    public object? Result { get; private set; }
    public string? ErrorMessage { get; private set; }
    public Dictionary<string, object> Metadata { get; } = new();

    public NodeExecutionTracker(string nodeId, string nodeName, string executionId)
    {
        NodeId = nodeId;
        NodeName = nodeName;
        ExecutionId = executionId;
        StartTime = DateTimeOffset.UtcNow;
    }

    /// <summary>
    /// Marks the execution as finished. Only the first call has an effect.
    /// </summary>
    public void Complete(bool success, object? result = null, string? errorMessage = null)
    {
        if (EndTime.HasValue) return;

        _stopwatch.Stop();
        EndTime = DateTimeOffset.UtcNow;
        Duration = _stopwatch.Elapsed;
        Success = success;
        Result = result;
        ErrorMessage = errorMessage;
    }
}
```
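
A tracker only produces a record if it is completed on every path, including when the node throws. One way to guarantee that is to wrap the node's work and capture the outcome in all cases. A minimal sketch with hypothetical `NodeTiming` and `NodeRunResult<T>` helpers; it measures with `Stopwatch`, which is monotonic, so system clock adjustments cannot produce negative durations:

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading.Tasks;

/// <summary>Outcome of one timed node run (hypothetical helper type).</summary>
public sealed record NodeRunResult<T>(T? Value, bool Success, TimeSpan Duration, Exception? Error);

public static class NodeTiming
{
    /// <summary>
    /// Runs <paramref name="work"/> and reports its outcome and elapsed time.
    /// Every exception, including cancellation, is captured in the result rather than rethrown.
    /// </summary>
    public static async Task<NodeRunResult<T>> MeasureAsync<T>(Func<Task<T>> work)
    {
        var stopwatch = Stopwatch.StartNew(); // monotonic, unaffected by clock changes
        try
        {
            var value = await work().ConfigureAwait(false);
            return new NodeRunResult<T>(value, true, stopwatch.Elapsed, null);
        }
        catch (Exception ex)
        {
            return new NodeRunResult<T>(default, false, stopwatch.Elapsed, ex);
        }
    }
}
```

The result's `Success`, `Value` and `Error?.Message` correspond to the `success`, `result` and `errorMessage` arguments of `NodeExecutionTracker.Complete`.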

### 2. Performance Metrics Collector

```csharp
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;

/// <summary>
/// Collects and manages performance metrics for graph execution.
/// </summary>
public class GraphPerformanceMetrics : IDisposable
{
    private readonly GraphMetricsOptions _options;
    private readonly ConcurrentDictionary<string, NodeExecutionTracker> _activeTrackers = new();
    // List<T> is not thread-safe, so each per-node list is only touched under lock(list).
    private readonly ConcurrentDictionary<string, List<NodeExecutionMetrics>> _nodeMetrics = new();
    private readonly ConcurrentDictionary<string, ExecutionPathMetrics> _executionPaths = new();
    private readonly Timer? _resourceTimer;
    private TimeSpan _lastCpuTime;
    private DateTime _lastSampleUtc;
    private bool _disposed;

    /// <summary>
    /// Process CPU usage over the last sampling interval, in percent of total machine capacity.
    /// </summary>
    public double CurrentCpuUsage { get; private set; }

    /// <summary>
    /// Approximate memory still available to the process, in MB, as of the last garbage collection.
    /// </summary>
    public double CurrentAvailableMemoryMB { get; private set; }

    public GraphPerformanceMetrics(GraphMetricsOptions? options = null)
    {
        _options = options ?? new GraphMetricsOptions();

        if (_options.EnableResourceMonitoring)
        {
            // Baseline for the first CPU delta.
            using (var process = Process.GetCurrentProcess())
            {
                _lastCpuTime = process.TotalProcessorTime;
            }
            _lastSampleUtc = DateTime.UtcNow;

            _resourceTimer = new Timer(
                _ => SampleResourceUsage(),
                null,
                _options.MetricsCollectionInterval,
                _options.MetricsCollectionInterval);
        }
    }

    /// <summary>
    /// Starts tracking a node execution.
    /// </summary>
    public NodeExecutionTracker StartNodeTracking(string nodeId, string nodeName, string executionId)
    {
        var tracker = new NodeExecutionTracker(nodeId, nodeName, executionId);
        _activeTrackers[tracker.TrackerId] = tracker;
        return tracker;
    }

    /// <summary>
    /// Completes tracking a node execution and stores its metrics.
    /// </summary>
    public void CompleteNodeTracking(
        NodeExecutionTracker tracker, bool success, object? result = null, string? errorMessage = null)
    {
        tracker.Complete(success, result, errorMessage);
        _activeTrackers.TryRemove(tracker.TrackerId, out _);

        if (!_options.EnableNodeMetrics) return;

        // GetOrAdd avoids the check-then-insert race of ContainsKey followed by an indexer write.
        var history = _nodeMetrics.GetOrAdd(tracker.NodeId, _ => new List<NodeExecutionMetrics>());
        lock (history)
        {
            history.Add(new NodeExecutionMetrics
            {
                NodeId = tracker.NodeId,
                NodeName = tracker.NodeName,
                ExecutionId = tracker.ExecutionId,
                StartTime = tracker.StartTime,
                EndTime = tracker.EndTime!.Value,
                Duration = tracker.Duration!.Value,
                Success = tracker.Success!.Value
            });

            // Trim the oldest entries once the history cap is exceeded.
            var excess = history.Count - _options.MaxMetricsHistory;
            if (excess > 0)
            {
                history.RemoveRange(0, excess);
            }
        }
    }

    /// <summary>
    /// Records an execution path. Entries are keyed by execution id and are not trimmed,
    /// so this dictionary grows with the number of executions.
    /// </summary>
    public void RecordExecutionPath(string executionId, string[] nodeIds, TimeSpan duration, bool success)
    {
        if (!_options.EnableExecutionMetrics) return;

        _executionPaths[executionId] = new ExecutionPathMetrics
        {
            ExecutionId = executionId,
            NodeIds = nodeIds,
            TotalDuration = duration,
            Success = success,
            Timestamp = DateTimeOffset.UtcNow
        };
    }

    /// <summary>
    /// Gets aggregated metrics for a node.
    /// </summary>
    public NodeAggregatedMetrics GetNodeAggregatedMetrics(string nodeId)
    {
        if (!_nodeMetrics.TryGetValue(nodeId, out var history))
        {
            return new NodeAggregatedMetrics { NodeId = nodeId };
        }

        // Work on a snapshot so concurrent writers cannot modify the list mid-enumeration.
        NodeExecutionMetrics[] metrics;
        lock (history) metrics = history.ToArray();

        if (metrics.Length == 0)
        {
            return new NodeAggregatedMetrics { NodeId = nodeId };
        }

        var durations = metrics.Select(m => m.Duration).ToList();
        return new NodeAggregatedMetrics
        {
            NodeId = nodeId,
            TotalExecutions = metrics.Length,
            SuccessCount = metrics.Count(m => m.Success),
            FailureCount = metrics.Count(m => !m.Success),
            AverageDuration = TimeSpan.FromTicks((long)metrics.Average(m => m.Duration.Ticks)),
            MinDuration = durations.Min(),
            MaxDuration = durations.Max(),
            P95Duration = CalculatePercentile(durations, 0.95),
            P99Duration = CalculatePercentile(durations, 0.99)
        };
    }

    /// <summary>
    /// Gets a copy of all node metrics.
    /// </summary>
    public IReadOnlyDictionary<string, List<NodeExecutionMetrics>> GetAllNodeMetrics()
    {
        return _nodeMetrics.ToDictionary(
            kvp => kvp.Key,
            kvp => { lock (kvp.Value) return kvp.Value.ToList(); });
    }

    /// <summary>
    /// Gets all execution paths.
    /// </summary>
    public IReadOnlyDictionary<string, ExecutionPathMetrics> GetAllExecutionPaths()
    {
        return _executionPaths.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
    }

    private void SampleResourceUsage()
    {
        try
        {
            using var process = Process.GetCurrentProcess();
            var now = DateTime.UtcNow;
            var cpuTime = process.TotalProcessorTime;

            // TotalProcessorTime is cumulative, so usage is the CPU time consumed since the
            // previous sample divided by the wall-clock time available on all cores.
            var wallMs = (now - _lastSampleUtc).TotalMilliseconds;
            if (wallMs > 0)
            {
                CurrentCpuUsage = (cpuTime - _lastCpuTime).TotalMilliseconds
                    / (wallMs * Environment.ProcessorCount) * 100.0;
            }

            _lastCpuTime = cpuTime;
            _lastSampleUtc = now;

            // GC memory info describes the most recent collection (zeros if none has run yet).
            var gcInfo = GC.GetGCMemoryInfo();
            CurrentAvailableMemoryMB =
                (gcInfo.TotalAvailableMemoryBytes - gcInfo.MemoryLoadBytes) / (1024.0 * 1024.0);
        }
        catch
        {
            // Resource sampling is best-effort; never let it crash the timer callback.
        }
    }

    // Nearest-rank percentile: the value at 1-based rank ceil(p * n) of the sorted samples.
    private static TimeSpan CalculatePercentile(List<TimeSpan> values, double percentile)
    {
        if (values.Count == 0) return TimeSpan.Zero;

        var sorted = values.OrderBy(v => v).ToList();
        var index = (int)Math.Ceiling(percentile * sorted.Count) - 1;
        return sorted[Math.Max(0, Math.Min(index, sorted.Count - 1))];
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        _resourceTimer?.Dispose();
    }
}

public class NodeExecutionMetrics
{
    public string NodeId { get; set; } = string.Empty;
    public string NodeName { get; set; } = string.Empty;
    public string ExecutionId { get; set; } = string.Empty;
    public DateTimeOffset StartTime { get; set; }
    public DateTimeOffset EndTime { get; set; }
    public TimeSpan Duration { get; set; }
    public bool Success { get; set; }
}

public class NodeAggregatedMetrics
{
    public string NodeId { get; set; } = string.Empty;
    public int TotalExecutions { get; set; }
    public int SuccessCount { get; set; }
    public int FailureCount { get; set; }
    public double SuccessRate => TotalExecutions > 0 ? (double)SuccessCount / TotalExecutions : 0;
    public TimeSpan AverageDuration { get; set; }
    public TimeSpan MinDuration { get; set; }
    public TimeSpan MaxDuration { get; set; }
    public TimeSpan P95Duration { get; set; }
    public TimeSpan P99Duration { get; set; }
}

public class ExecutionPathMetrics
{
    public string ExecutionId { get; set; } = string.Empty;
    public string[] NodeIds { get; set; } = Array.Empty<string>();
    public TimeSpan TotalDuration { get; set; }
    public bool Success { get; set; }
    public DateTimeOffset Timestamp { get; set; }
}
```
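
`CalculatePercentile` uses the nearest-rank method: sort the samples and take the one at 1-based rank ⌈p × n⌉. With 20 samples of 1 ms to 20 ms, p95 is rank ⌈19⌉ = 19 (19 ms) and p99 is rank ⌈19.8⌉ = 20 (20 ms), so with small sample counts p99 collapses to the maximum. The sketch below reproduces that calculation outside the collector and adds a path summary that groups recorded routes by their node sequence, as one possible reading of "execution path analysis"; `MetricsMath` and `PathSummary` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

/// <summary>Hypothetical summary of one distinct route through the graph.</summary>
public sealed record PathSummary(string Route, int Count, TimeSpan AverageDuration);

public static class MetricsMath
{
    /// <summary>
    /// Nearest-rank percentile: the smallest sample such that at least
    /// <paramref name="percentile"/> of all samples are less than or equal to it.
    /// </summary>
    public static TimeSpan NearestRank(IReadOnlyCollection<TimeSpan> values, double percentile)
    {
        if (percentile is <= 0.0 or > 1.0) throw new ArgumentOutOfRangeException(nameof(percentile));
        if (values.Count == 0) return TimeSpan.Zero;

        var sorted = values.OrderBy(v => v).ToArray();
        var rank = (int)Math.Ceiling(percentile * sorted.Length); // 1-based rank
        return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
    }

    /// <summary>
    /// Groups recorded executions by their node sequence, most frequent route first.
    /// </summary>
    public static IReadOnlyList<PathSummary> SummarizePaths(
        IEnumerable<(string[] NodeIds, TimeSpan Duration)> paths) =>
        paths
            .GroupBy(p => string.Join(" -> ", p.NodeIds))
            .Select(g => new PathSummary(
                g.Key,
                g.Count(),
                TimeSpan.FromTicks((long)g.Average(p => p.Duration.Ticks))))
            .OrderByDescending(s => s.Count)
            .ThenBy(s => s.Route, StringComparer.Ordinal)
            .ToList();
}
```

Routes with a high count but a much larger average duration than their siblings are usually the first candidates for optimization.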
|
|
411
|
+
|
|
412
|
+
### 3. Metrics Exporter
|
|
413
|
+
|
|
414
|
+
```csharp
|
|
415
|
+
using System.Text;
|
|
416
|
+
using System.Text.Json;
|
|
417
|
+
|
|
418
|
+
/// <summary>
|
|
419
|
+
/// Exports metrics to various formats.
|
|
420
|
+
/// </summary>
|
|
421
|
+
public class GraphMetricsExporter : IDisposable
|
|
422
|
+
{
|
|
423
|
+
private readonly GraphMetricsExportOptions _options;
|
|
424
|
+
private readonly JsonSerializerOptions _jsonOptions;
|
|
425
|
+
|
|
426
|
+
public GraphMetricsExporter(GraphMetricsExportOptions? options = null)
|
|
427
|
+
{
|
|
428
|
+
_options = options ?? new GraphMetricsExportOptions();
|
|
429
|
+
_jsonOptions = new JsonSerializerOptions
|
|
430
|
+
{
|
|
431
|
+
WriteIndented = _options.IndentedOutput,
|
|
432
|
+
PropertyNamingPolicy = JsonNamingPolicy.CamelCase
|
|
433
|
+
};
|
|
434
|
+
}
|
|
435
|
+
|
|
436
|
+
/// <summary>
|
|
437
|
+
/// Exports metrics to the specified format.
|
|
438
|
+
/// </summary>
|
|
439
|
+
public string ExportMetrics(
|
|
440
|
+
GraphPerformanceMetrics metrics,
|
|
441
|
+
MetricsExportFormat format,
|
|
442
|
+
TimeSpan? timeRange = null)
|
|
443
|
+
{
|
|
444
|
+
return format switch
|
|
445
|
+
{
|
|
446
|
+
MetricsExportFormat.Json => ExportToJson(metrics, timeRange),
|
|
447
|
+
MetricsExportFormat.Csv => ExportToCsv(metrics, timeRange),
|
|
448
|
+
MetricsExportFormat.Prometheus => ExportToPrometheus(metrics, timeRange),
|
|
449
|
+
_ => throw new NotSupportedException($"Format {format} is not supported")
|
|
450
|
+
};
|
|
451
|
+
}
|
|
452
|
+
|
|
453
|
+
private string ExportToJson(GraphPerformanceMetrics metrics, TimeSpan? timeRange)
|
|
454
|
+
{
|
|
455
|
+
var nodeMetrics = metrics.GetAllNodeMetrics();
|
|
456
|
+
var executionPaths = metrics.GetAllExecutionPaths();
|
|
457
|
+
|
|
458
|
+
var export = new
|
|
459
|
+
{
|
|
460
|
+
ExportedAt = DateTimeOffset.UtcNow,
|
|
461
|
+
TimeRange = timeRange?.ToString(),
|
|
462
|
+
NodeMetrics = nodeMetrics.ToDictionary(
|
|
463
|
+
kvp => kvp.Key,
|
|
464
|
+
kvp => new
|
|
465
|
+
{
|
|
466
|
+
Executions = kvp.Value.Count,
|
|
467
|
+
Aggregated = metrics.GetNodeAggregatedMetrics(kvp.Key)
|
|
468
|
+
}),
|
|
469
|
+
ExecutionPaths = executionPaths.Values.ToList(),
|
|
470
|
+
ResourceMetrics = new
|
|
471
|
+
{
|
|
472
|
+
CpuUsage = metrics.CurrentCpuUsage,
|
|
473
|
+
AvailableMemoryMB = metrics.CurrentAvailableMemoryMB
|
|
474
|
+
}
|
|
475
|
+
};
|
|
476
|
+
|
|
477
|
+
return JsonSerializer.Serialize(export, _jsonOptions);
|
|
478
|
+
}
|
|
479
|
+
|
|
480
|
+
private string ExportToCsv(GraphPerformanceMetrics metrics, TimeSpan? timeRange)
|
|
481
|
+
{
|
|
482
|
+
var sb = new StringBuilder();
|
|
483
|
+
sb.AppendLine("NodeId,NodeName,ExecutionId,StartTime,EndTime,DurationMs,Success");
|
|
484
|
+
|
|
485
|
+
foreach (var (nodeId, nodeMetricsList) in metrics.GetAllNodeMetrics())
|
|
486
|
+
{
|
|
487
|
+
foreach (var m in nodeMetricsList)
|
|
488
|
+
{
|
|
489
|
+
sb.AppendLine($"{m.NodeId},{m.NodeName},{m.ExecutionId},{m.StartTime:O},{m.EndTime:O},{m.Duration.TotalMilliseconds:F2},{m.Success}");
|
|
490
|
+
}
|
|
491
        }

        return sb.ToString();
    }

    private string ExportToPrometheus(GraphPerformanceMetrics metrics, TimeSpan? timeRange)
    {
        var sb = new StringBuilder();
        var timestamp = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
        var inv = System.Globalization.CultureInfo.InvariantCulture;

        // Snapshot the aggregates once so every metric family reports the same values.
        var nodes = new List<(string NodeId, NodeAggregatedMetrics Agg)>();
        foreach (var (nodeId, _) in metrics.GetAllNodeMetrics())
        {
            nodes.Add((nodeId, metrics.GetNodeAggregatedMetrics(nodeId)));
        }

        // The exposition format allows one HELP/TYPE pair per metric family, followed by
        // all samples of that family as a single group: iterate families first, then nodes.
        void AppendFamily(string name, string type, string help, Func<NodeAggregatedMetrics, string> value)
        {
            sb.AppendLine($"# HELP {name} {help}");
            sb.AppendLine($"# TYPE {name} {type}");
            foreach (var (nodeId, agg) in nodes)
            {
                sb.AppendLine($"{name}{{node=\"{EscapeLabelValue(nodeId)}\"}} {value(agg)} {timestamp}");
            }
        }

        // Format with the invariant culture: "0,9500" (comma decimal separator) is not a valid sample value.
        AppendFamily("graph_node_executions_total", "counter", "Total number of node executions",
            a => a.TotalExecutions.ToString(inv));
        AppendFamily("graph_node_success_rate", "gauge", "Success rate of node executions",
            a => a.SuccessRate.ToString("F4", inv));
        AppendFamily("graph_node_duration_ms_avg", "gauge", "Average execution duration in milliseconds",
            a => a.AverageDuration.TotalMilliseconds.ToString("F2", inv));
        AppendFamily("graph_node_duration_ms_p95", "gauge", "95th percentile execution duration in milliseconds",
            a => a.P95Duration.TotalMilliseconds.ToString("F2", inv));
        AppendFamily("graph_node_duration_ms_p99", "gauge", "99th percentile execution duration in milliseconds",
            a => a.P99Duration.TotalMilliseconds.ToString("F2", inv));

        return sb.ToString();
    }

    // Label values must escape backslash, double quote and line feed.
    private static string EscapeLabelValue(string value) =>
        value.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\n", "\\n");

    public void Dispose()
    {
        // Nothing to release yet; kept so callers can wrap the exporter in `using`.
    }
}
```
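
The exporter publishes `P95Duration` and `P99Duration` exactly as `GetNodeAggregatedMetrics` reports them. This template does not show how Semantic Kernel Graph computes those values, and the library may use a different method, such as interpolation or a streaming estimator. As a reference point, here is a minimal sketch of the nearest-rank definition applied to a set of recorded durations. `PercentileSketch` and `NearestRank` are hypothetical names, not library APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

/// <summary>
/// Hypothetical helper (not part of Semantic Kernel Graph): nearest-rank percentile
/// of a set of recorded durations.
/// </summary>
public static class PercentileSketch
{
    public static TimeSpan NearestRank(IReadOnlyCollection<TimeSpan> samples, double percentile)
    {
        if (samples.Count == 0)
            throw new ArgumentException("At least one sample is required.", nameof(samples));
        if (percentile <= 0 || percentile > 100)
            throw new ArgumentOutOfRangeException(nameof(percentile), "Must be in (0, 100].");

        var sorted = samples.OrderBy(s => s).ToArray();

        // 1-based rank n = ceil(p / 100 * N): the smallest sample with at least p% of
        // all samples at or below it. Multiplying before dividing keeps whole-number
        // ranks exact (95 * 100 / 100.0 == 95).
        var rank = (int)Math.Ceiling(percentile * sorted.Length / 100.0);
        return sorted[rank - 1];
    }
}
```

Given 100 samples from 1 ms to 100 ms, `NearestRank(samples, 95)` returns 95 ms. Nearest-rank always returns a value that was actually observed. Interpolating methods can return a value between two samples, so different tools may report slightly different p95/p99 figures for the same data.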

### 4. Metrics Dashboard

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

/// <summary>
/// Real-time metrics dashboard that publishes a snapshot of the collected metrics on a fixed interval.
/// </summary>
public class MetricsDashboard : IDisposable
{
    private readonly GraphPerformanceMetrics _metrics;
    private readonly Timer _updateTimer;
    private readonly TimeSpan _updateInterval;
    private int _isEmitting; // 1 while a snapshot is being built

    /// <summary>
    /// Raised on a thread-pool thread. An exception escaping a handler is unhandled
    /// and terminates the process, so handlers should catch their own errors.
    /// </summary>
    public event EventHandler<DashboardUpdate>? Updated;

    public MetricsDashboard(
        GraphPerformanceMetrics metrics,
        TimeSpan? updateInterval = null)
    {
        _metrics = metrics;
        _updateInterval = updateInterval ?? TimeSpan.FromMilliseconds(500);

        _updateTimer = new Timer(
            _ => EmitUpdate(),
            null,
            _updateInterval,
            _updateInterval);
    }

    private void EmitUpdate()
    {
        // Timer callbacks overlap when a tick runs longer than the interval; skip such ticks.
        if (Interlocked.Exchange(ref _isEmitting, 1) == 1)
        {
            return;
        }

        try
        {
            var update = new DashboardUpdate
            {
                Timestamp = DateTimeOffset.UtcNow,
                NodeSummaries = GetNodeSummaries(),
                ResourceUsage = new ResourceUsage
                {
                    CpuUsage = _metrics.CurrentCpuUsage,
                    MemoryMB = _metrics.CurrentAvailableMemoryMB
                },
                RecentExecutions = GetRecentExecutions()
            };

            Updated?.Invoke(this, update);
        }
        finally
        {
            Volatile.Write(ref _isEmitting, 0);
        }
    }

    private List<NodeSummary> GetNodeSummaries()
    {
        var summaries = new List<NodeSummary>();

        foreach (var (nodeId, _) in _metrics.GetAllNodeMetrics())
        {
            var agg = _metrics.GetNodeAggregatedMetrics(nodeId);
            summaries.Add(new NodeSummary
            {
                NodeId = nodeId,
                TotalExecutions = agg.TotalExecutions,
                SuccessRate = agg.SuccessRate,
                AverageDurationMs = agg.AverageDuration.TotalMilliseconds,
                P99DurationMs = agg.P99Duration.TotalMilliseconds
            });
        }

        return summaries;
    }

    // The ten most recent executions, newest first.
    private List<RecentExecution> GetRecentExecutions()
    {
        return _metrics.GetAllExecutionPaths()
            .OrderByDescending(kvp => kvp.Value.Timestamp)
            .Take(10)
            .Select(kvp => new RecentExecution
            {
                ExecutionId = kvp.Key,
                NodeCount = kvp.Value.NodeIds.Length,
                DurationMs = kvp.Value.TotalDuration.TotalMilliseconds,
                Success = kvp.Value.Success,
                Timestamp = kvp.Value.Timestamp
            })
            .ToList();
    }

    // Suspends updates; call Dispose() to release the timer.
    public void Stop()
    {
        _updateTimer.Change(Timeout.Infinite, Timeout.Infinite);
    }

    public void Dispose()
    {
        _updateTimer.Dispose();
    }
}

public class DashboardUpdate
{
    public DateTimeOffset Timestamp { get; set; }
    public List<NodeSummary> NodeSummaries { get; set; } = new();
    public ResourceUsage ResourceUsage { get; set; } = new();
    public List<RecentExecution> RecentExecutions { get; set; } = new();
}

public class NodeSummary
{
    public string NodeId { get; set; } = string.Empty;
    public int TotalExecutions { get; set; }
    public double SuccessRate { get; set; }
    public double AverageDurationMs { get; set; }
    public double P99DurationMs { get; set; }
}

public class ResourceUsage
{
    public double CpuUsage { get; set; }

    /// <summary>Available memory in MB, taken from <c>CurrentAvailableMemoryMB</c>.</summary>
    public double MemoryMB { get; set; }
}

public class RecentExecution
{
    public string ExecutionId { get; set; } = string.Empty;
    public int NodeCount { get; set; }
    public double DurationMs { get; set; }
    public bool Success { get; set; }
    public DateTimeOffset Timestamp { get; set; }
}
```
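
`MetricsDashboard` produces its snapshots from a `System.Threading.Timer`, and that timer's callbacks can overlap whenever one tick runs longer than the interval. On .NET 6 or later, `PeriodicTimer` offers an alternative that awaits one tick at a time. The sketch below is a hypothetical generic publisher; `PeriodicSnapshotPublisher<T>` is not part of the library. Its `snapshot` and `publish` delegates stand in for building a `DashboardUpdate` and raising `Updated`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

/// <summary>
/// Hypothetical publisher (not part of Semantic Kernel Graph): builds and publishes a
/// snapshot once per interval using PeriodicTimer (.NET 6+). Ticks never overlap,
/// because the next wait starts only after publish returns.
/// </summary>
public sealed class PeriodicSnapshotPublisher<T> : IAsyncDisposable
{
    private readonly PeriodicTimer _timer;
    private readonly CancellationTokenSource _cts = new();
    private readonly Task _loop;

    public PeriodicSnapshotPublisher(TimeSpan interval, Func<T> snapshot, Action<T> publish)
    {
        _timer = new PeriodicTimer(interval);
        _loop = RunAsync(snapshot, publish, _cts.Token);
    }

    private async Task RunAsync(Func<T> snapshot, Action<T> publish, CancellationToken cancellationToken)
    {
        try
        {
            // WaitForNextTickAsync returns false once the timer has been disposed.
            while (await _timer.WaitForNextTickAsync(cancellationToken))
            {
                publish(snapshot());
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown via DisposeAsync.
        }
    }

    public async ValueTask DisposeAsync()
    {
        _cts.Cancel();
        _timer.Dispose();
        try
        {
            // Wait for the loop to end; rethrows if snapshot or publish threw.
            await _loop;
        }
        finally
        {
            _cts.Dispose();
        }
    }
}
```

Any exception thrown by `snapshot` or `publish` ends the loop and resurfaces from `DisposeAsync`. The Timer-based version behaves differently: the same exception would crash the process.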

### 5. Alerting System

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;

/// <summary>
/// Evaluates threshold rules against aggregated node metrics.
/// </summary>
public class MetricsAlerting
{
    private readonly List<AlertRule> _rules = new();
    private readonly ILogger _logger;

    public event EventHandler<Alert>? AlertRaised;

    public MetricsAlerting(ILogger? logger = null)
    {
        _logger = logger ?? NullLogger.Instance;
    }

    public void AddRule(AlertRule rule)
    {
        _rules.Add(rule);
    }

    /// <param name="threshold">Maximum tolerated failure fraction (0.1 = 10%).</param>
    public void AddHighErrorRateRule(string nodeId, double threshold = 0.1)
    {
        _rules.Add(new AlertRule
        {
            RuleId = $"high_error_rate_{nodeId}",
            NodeId = nodeId,
            Severity = AlertSeverity.Critical,
            // Require at least one execution, so a node with no data (whose success
            // rate may be reported as 0) does not trigger the alert.
            Condition = metrics => metrics.TotalExecutions > 0 && metrics.SuccessRate < (1 - threshold),
            Message = $"High error rate detected for node {nodeId}"
        });
    }

    public void AddSlowExecutionRule(string nodeId, TimeSpan threshold)
    {
        _rules.Add(new AlertRule
        {
            RuleId = $"slow_execution_{nodeId}",
            NodeId = nodeId,
            Severity = AlertSeverity.Warning,
            Condition = metrics => metrics.P95Duration > threshold,
            Message = $"Slow execution detected for node {nodeId}"
        });
    }

    public List<Alert> CheckAlerts(GraphPerformanceMetrics metrics)
    {
        var alerts = new List<Alert>();

        foreach (var rule in _rules)
        {
            var nodeMetrics = metrics.GetNodeAggregatedMetrics(rule.NodeId);

            if (!rule.Condition(nodeMetrics))
            {
                continue;
            }

            var alert = new Alert
            {
                RuleId = rule.RuleId,
                Severity = rule.Severity,
                Message = rule.Message,
                NodeId = rule.NodeId,
                Timestamp = DateTimeOffset.UtcNow,
                Metrics = nodeMetrics
            };

            alerts.Add(alert);

            // Log at a level matching the severity, and before notifying subscribers,
            // so a throwing handler cannot suppress the log entry.
            var level = alert.Severity switch
            {
                AlertSeverity.Critical => LogLevel.Error,
                AlertSeverity.Warning => LogLevel.Warning,
                _ => LogLevel.Information
            };
            _logger.Log(level, "Alert raised: {Message} (Severity: {Severity})",
                alert.Message, alert.Severity);

            AlertRaised?.Invoke(this, alert);
        }

        return alerts;
    }
}

public class AlertRule
{
    public string RuleId { get; set; } = string.Empty;
    public string NodeId { get; set; } = string.Empty;
    public AlertSeverity Severity { get; set; }
    public Func<NodeAggregatedMetrics, bool> Condition { get; set; } = _ => false;
    public string Message { get; set; } = string.Empty;
}

public class Alert
{
    public string RuleId { get; set; } = string.Empty;
    public AlertSeverity Severity { get; set; }
    public string Message { get; set; } = string.Empty;
    public string NodeId { get; set; } = string.Empty;
    public DateTimeOffset Timestamp { get; set; }
    public NodeAggregatedMetrics? Metrics { get; set; }
}

public enum AlertSeverity
{
    Info,
    Warning,
    Critical
}
```
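
As written, `CheckAlerts` raises an alert on every call for as long as a rule's condition holds. A success rate that hovers around `1 - threshold` can also alternate between raising and not raising on successive checks. A common mitigation is hysteresis: fire when the value drops below one threshold, and clear only after it recovers above a higher one. The sketch below is hypothetical (`HysteresisGate` is not part of the library) and targets "lower is worse" values such as `SuccessRate`.

```csharp
using System;

/// <summary>
/// Hypothetical hysteresis gate (not part of Semantic Kernel Graph): fires when a value
/// drops below fireBelow and clears only once it recovers above clearAbove, so a value
/// hovering near a single threshold does not produce a stream of alerts.
/// </summary>
public sealed class HysteresisGate
{
    private readonly double _fireBelow;
    private readonly double _clearAbove;

    public HysteresisGate(double fireBelow, double clearAbove)
    {
        if (clearAbove < fireBelow)
            throw new ArgumentException("clearAbove must be greater than or equal to fireBelow.");
        _fireBelow = fireBelow;
        _clearAbove = clearAbove;
    }

    public bool IsFiring { get; private set; }

    /// <summary>Feeds the latest value; returns true only on the transition into the firing state.</summary>
    public bool Update(double value)
    {
        if (!IsFiring && value < _fireBelow)
        {
            IsFiring = true;
            return true;
        }

        if (IsFiring && value > _clearAbove)
        {
            IsFiring = false;
        }

        return false;
    }
}
```

Take a high-error-rate rule with `threshold = 0.1` as an example. A `new HysteresisGate(fireBelow: 0.90, clearAbove: 0.95)`, fed `agg.SuccessRate` on each check, would raise one alert per incident instead of one per check.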

---

## Service Layer Integration

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public interface IObservabilityService
{
    Task TrackNodeExecutionAsync(string nodeId, string nodeName, string executionId, Func<Task> action);
    Task<string> ExportMetricsAsync(MetricsExportFormat format, CancellationToken cancellationToken = default);
    DashboardUpdate GetCurrentDashboard();
    List<Alert> CheckAlerts();
}

// Register as a singleton so all requests share one metrics store and one dashboard timer, e.g.
// builder.Services.AddSingleton<IObservabilityService, ObservabilityService>();
public class ObservabilityService : IObservabilityService, IDisposable
{
    private readonly GraphPerformanceMetrics _metrics;
    private readonly GraphMetricsExporter _exporter;
    private readonly MetricsDashboard _dashboard;
    private readonly MetricsAlerting _alerting;

    // Written on the dashboard timer thread, read on request threads.
    private volatile DashboardUpdate? _lastUpdate;

    public ObservabilityService(ILoggerFactory loggerFactory)
    {
        var options = GraphMetricsOptions.CreateProductionOptions();
        _metrics = new GraphPerformanceMetrics(options);
        _exporter = new GraphMetricsExporter(new GraphMetricsExportOptions { IndentedOutput = true });
        _dashboard = new MetricsDashboard(_metrics);
        _alerting = new MetricsAlerting(loggerFactory.CreateLogger<MetricsAlerting>());

        // No alert rules are registered by default; add them here, for example:
        // _alerting.AddHighErrorRateRule("node-1", threshold: 0.1);

        _dashboard.Updated += (_, update) => _lastUpdate = update;
    }

    public async Task TrackNodeExecutionAsync(string nodeId, string nodeName, string executionId, Func<Task> action)
    {
        var tracker = _metrics.StartNodeTracking(nodeId, nodeName, executionId);

        try
        {
            // Await rather than block with GetAwaiter().GetResult(), which ties up a
            // thread-pool thread and can deadlock under a synchronization context.
            await action();
        }
        catch
        {
            _metrics.CompleteNodeTracking(tracker, success: false);
            throw;
        }

        _metrics.CompleteNodeTracking(tracker, success: true);
    }

    public Task<string> ExportMetricsAsync(MetricsExportFormat format, CancellationToken cancellationToken = default)
    {
        if (cancellationToken.IsCancellationRequested)
        {
            return Task.FromCanceled<string>(cancellationToken);
        }

        var export = _exporter.ExportMetrics(_metrics, format, TimeSpan.FromHours(1));
        return Task.FromResult(export);
    }

    public DashboardUpdate GetCurrentDashboard()
    {
        // Empty until the first timer tick (500 ms by default).
        return _lastUpdate ?? new DashboardUpdate();
    }

    public List<Alert> CheckAlerts()
    {
        return _alerting.CheckAlerts(_metrics);
    }

    public void Dispose()
    {
        // Stop the dashboard timer first so no new snapshot is scheduled against disposed metrics.
        _dashboard.Stop();
        _metrics.Dispose();
        _exporter.Dispose();
    }
}
```

---

## Web API Integration

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class MetricsController : ControllerBase
{
    private readonly IObservabilityService _service;

    public MetricsController(IObservabilityService service)
    {
        _service = service;
    }

    // Example: GET /api/metrics/export/Prometheus
    [HttpGet("export/{format}")]
    public async Task<IActionResult> ExportMetrics(
        MetricsExportFormat format,
        CancellationToken cancellationToken)
    {
        var export = await _service.ExportMetricsAsync(format, cancellationToken);

        var contentType = format switch
        {
            MetricsExportFormat.Json => "application/json",
            MetricsExportFormat.Csv => "text/csv",
            // Content type of the Prometheus text exposition format.
            MetricsExportFormat.Prometheus => "text/plain; version=0.0.4; charset=utf-8",
            _ => "text/plain"
        };

        return Content(export, contentType);
    }

    [HttpGet("dashboard")]
    public IActionResult GetDashboard()
    {
        var dashboard = _service.GetCurrentDashboard();
        return Ok(dashboard);
    }

    // Evaluates every rule on each request, so each call can log and raise AlertRaised again.
    [HttpGet("alerts")]
    public IActionResult GetAlerts()
    {
        var alerts = _service.CheckAlerts();
        return Ok(alerts);
    }
}
```

---

## Testing

```csharp
using System;
using System.Threading;
using Xunit;

public class ObservabilityTests
{
    [Fact]
    public void PerformanceMetrics_TracksNodeExecutions()
    {
        // Arrange
        using var metrics = new GraphPerformanceMetrics();
        var tracker = metrics.StartNodeTracking("node-1", "Test Node", "exec-1");

        // Act
        Thread.Sleep(50); // Simulate work
        metrics.CompleteNodeTracking(tracker, success: true);

        // Assert
        var agg = metrics.GetNodeAggregatedMetrics("node-1");
        Assert.Equal(1, agg.TotalExecutions);
        Assert.Equal(1, agg.SuccessCount);
        // Leave some slack: depending on clock resolution, a 50 ms sleep can measure slightly under 50 ms.
        Assert.True(agg.AverageDuration.TotalMilliseconds >= 45);
    }

    [Fact]
    public void MetricsExporter_ExportsToJson()
    {
        // Arrange
        using var metrics = new GraphPerformanceMetrics();
        using var exporter = new GraphMetricsExporter(new GraphMetricsExportOptions { IndentedOutput = true });

        var tracker = metrics.StartNodeTracking("node-1", "Test", "exec-1");
        metrics.CompleteNodeTracking(tracker, success: true);

        // Act
        var json = exporter.ExportMetrics(metrics, MetricsExportFormat.Json, TimeSpan.FromHours(1));

        // Assert
        Assert.Contains("node-1", json);
        Assert.Contains("exportedAt", json);
    }

    [Fact]
    public void MetricsExporter_ExportsToPrometheus()
    {
        // Arrange
        using var metrics = new GraphPerformanceMetrics();
        using var exporter = new GraphMetricsExporter();

        var tracker = metrics.StartNodeTracking("node-1", "Test", "exec-1");
        metrics.CompleteNodeTracking(tracker, success: true);

        // Act
        var prometheus = exporter.ExportMetrics(metrics, MetricsExportFormat.Prometheus, TimeSpan.FromHours(1));

        // Assert
        Assert.Contains("graph_node_executions_total", prometheus);
        Assert.Contains("graph_node_success_rate", prometheus);
    }

    [Fact]
    public void Alerting_RaisesAlertOnHighErrorRate()
    {
        // Arrange
        using var metrics = new GraphPerformanceMetrics();
        var alerting = new MetricsAlerting();
        alerting.AddHighErrorRateRule("node-1", threshold: 0.1);

        // Simulate failures: only i = 0 and i = 5 succeed, an 80% failure rate
        for (int i = 0; i < 10; i++)
        {
            var tracker = metrics.StartNodeTracking("node-1", "Test", $"exec-{i}");
            metrics.CompleteNodeTracking(tracker, success: i % 5 == 0);
        }

        // Act
        var alerts = alerting.CheckAlerts(metrics);

        // Assert
        Assert.Single(alerts);
        Assert.Equal(AlertSeverity.Critical, alerts[0].Severity);
    }
}
```

---

## Best Practices

### Metrics Collection

1. **Selective Collection**: Collect only the metrics that a dashboard, alert, or investigation actually uses.
2. **Sampling**: Sample high-volume metrics instead of recording every event.
3. **Aggregation**: Store aggregates (counts, averages, percentiles) instead of raw samples to reduce storage.
4. **Retention**: Set an explicit retention period for each kind of stored metric.
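
One way to apply sampling is to decide once per execution whether to record it, so every kept execution has metrics for all of its nodes. The sketch below is hypothetical (`ExecutionSampler` is not a library type). It bases the decision on a stable FNV-1a hash of the execution id, because `string.GetHashCode()` is randomized per process in .NET Core and would give a different answer on each instance. Aggregates computed from sampled data describe only the sample, so counts such as `TotalExecutions` would need to be scaled by `1 / rate` to estimate totals.

```csharp
using System;
using System.Text;

/// <summary>
/// Hypothetical sampler (not part of Semantic Kernel Graph): keeps roughly
/// <c>rate</c> of executions, deciding per execution id so the decision is the
/// same for every node of an execution, on every instance, and after restarts.
/// </summary>
public sealed class ExecutionSampler
{
    private readonly double _rate;

    public ExecutionSampler(double rate)
    {
        if (rate < 0 || rate > 1) throw new ArgumentOutOfRangeException(nameof(rate));
        _rate = rate;
    }

    public bool ShouldSample(string executionId)
    {
        // 32-bit FNV-1a over the UTF-8 bytes of the id.
        uint hash = 2166136261u;
        unchecked
        {
            foreach (var b in Encoding.UTF8.GetBytes(executionId))
            {
                hash ^= b;
                hash *= 16777619u;
            }
        }

        // Map the hash to [0, 1) and keep the execution if it falls below the rate.
        return hash / 4294967296.0 < _rate;
    }
}
```

A caller would check `sampler.ShouldSample(executionId)` before calling `StartNodeTracking` and skip tracking for executions that are not sampled.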

### Performance

1. **Async Operations**: Write metrics to storage asynchronously so recording never blocks node execution.
2. **Batching**: Buffer samples and write them in batches rather than issuing one write per sample.
3. **Compression**: Compress historical metrics.
4. **Indexing**: Index stored metrics (for example by node id and timestamp) so dashboard and alert queries stay fast.
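
The **Async Operations** and **Batching** items can be combined with `System.Threading.Channels`, which ships in-box with .NET Core 3.0 and later. Recording a sample becomes a non-blocking enqueue, and a background consumer writes the queued samples to storage in batches. `BatchingMetricWriter<T>` and its `flush` callback are hypothetical; this template does not show how Semantic Kernel Graph persists metrics.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

/// <summary>
/// Hypothetical batched writer (not part of Semantic Kernel Graph): producers enqueue
/// samples without waiting on storage, and one background consumer hands them to
/// <c>flush</c> in batches of at most <c>maxBatchSize</c>.
/// </summary>
public sealed class BatchingMetricWriter<T> : IAsyncDisposable
{
    private readonly Channel<T> _channel;
    private readonly Func<IReadOnlyList<T>, Task> _flush;
    private readonly int _maxBatchSize;
    private readonly Task _consumer;

    public BatchingMetricWriter(Func<IReadOnlyList<T>, Task> flush, int maxBatchSize = 100, int capacity = 10_000)
    {
        if (maxBatchSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxBatchSize));
        _flush = flush;
        _maxBatchSize = maxBatchSize;

        // Bounded so a slow store cannot grow memory without limit;
        // when the queue is full, the oldest queued sample is dropped.
        _channel = Channel.CreateBounded<T>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,
            FullMode = BoundedChannelFullMode.DropOldest
        });

        _consumer = Task.Run(() => ConsumeAsync());
    }

    /// <summary>Enqueues a sample; returns false once the writer has been disposed.</summary>
    public bool TryWrite(T sample) => _channel.Writer.TryWrite(sample);

    private async Task ConsumeAsync()
    {
        var batch = new List<T>(_maxBatchSize);

        // WaitToReadAsync returns false after Complete() once every queued item has been read.
        while (await _channel.Reader.WaitToReadAsync())
        {
            while (batch.Count < _maxBatchSize && _channel.Reader.TryRead(out var item))
            {
                batch.Add(item);
            }

            if (batch.Count > 0)
            {
                await _flush(batch.ToArray()); // copy so flush may keep the list
                batch.Clear();
            }
        }
    }

    public async ValueTask DisposeAsync()
    {
        // Stop accepting samples, then wait until everything queued has been flushed.
        _channel.Writer.Complete();
        await _consumer;
    }
}
```

Node tracking code would call `TryWrite` for each completed sample, and `flush` would issue one bulk insert into whichever store holds the metrics.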

### Alerting

1. **Meaningful Thresholds**: Alert only on thresholds that call for a concrete action.
2. **Avoid Alert Fatigue**: Limit how often the same alert can be delivered.
3. **Severity Levels**: Reserve `Critical` for conditions that need immediate attention; use `Warning` and `Info` for the rest.
4. **Escalation**: Define escalation policies for alerts that are not acknowledged.
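
The **Avoid Alert Fatigue** item can be implemented as a per-rule cooldown: after an alert is delivered, repeats of the same `RuleId` are suppressed until the window has passed. `AlertThrottle` below is a hypothetical helper, not a library type. Its clock is injectable, so the behavior can be tested without waiting.

```csharp
using System;
using System.Collections.Generic;

/// <summary>
/// Hypothetical throttle (not part of Semantic Kernel Graph): suppresses repeats of the
/// same alert rule within a cooldown window.
/// </summary>
public sealed class AlertThrottle
{
    private readonly TimeSpan _cooldown;
    private readonly Func<DateTimeOffset> _clock;
    private readonly Dictionary<string, DateTimeOffset> _lastSent = new();
    private readonly object _gate = new();

    public AlertThrottle(TimeSpan cooldown, Func<DateTimeOffset>? clock = null)
    {
        _cooldown = cooldown;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    /// <summary>Returns true if an alert for <paramref name="ruleId"/> should be delivered now.</summary>
    public bool ShouldNotify(string ruleId)
    {
        var now = _clock();
        lock (_gate)
        {
            if (_lastSent.TryGetValue(ruleId, out var last) && now - last < _cooldown)
            {
                return false; // still inside the cooldown window
            }

            _lastSent[ruleId] = now;
            return true;
        }
    }
}
```

A subscriber to `MetricsAlerting.AlertRaised` could then forward an alert only when `throttle.ShouldNotify(alert.RuleId)` returns true.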

### Visualization

1. **Real-Time Updates**: Push dashboard snapshots on a fixed interval (`MetricsDashboard` defaults to 500 ms).
2. **Historical Views**: Keep past snapshots so trends can be compared over time.
3. **Drill-Down**: Let users move from node summaries to individual executions.
4. **Export Options**: Offer every `MetricsExportFormat` (JSON, CSV, Prometheus).
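
For **Historical Views**, a lightweight option is to keep the last N dashboard snapshots in memory, for example by subscribing to `MetricsDashboard.Updated`. The sketch below is a hypothetical fixed-size ring buffer (`SnapshotHistory<T>` is not part of the library). Longer history is better served by a persistent store.

```csharp
using System;
using System.Collections.Generic;

/// <summary>
/// Hypothetical fixed-size history (not part of Semantic Kernel Graph): stores the most
/// recent snapshots and overwrites the oldest once capacity is reached.
/// </summary>
public sealed class SnapshotHistory<T>
{
    private readonly T[] _buffer;
    private readonly object _gate = new();
    private int _next;  // index the next snapshot is written to
    private int _count; // number of stored snapshots, at most _buffer.Length

    public SnapshotHistory(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _buffer = new T[capacity];
    }

    public void Add(T snapshot)
    {
        lock (_gate)
        {
            _buffer[_next] = snapshot;
            _next = (_next + 1) % _buffer.Length;
            if (_count < _buffer.Length) _count++;
        }
    }

    /// <summary>Returns the stored snapshots from oldest to newest.</summary>
    public List<T> ToList()
    {
        lock (_gate)
        {
            var result = new List<T>(_count);
            var start = (_next - _count + _buffer.Length) % _buffer.Length;
            for (var i = 0; i < _count; i++)
            {
                result.Add(_buffer[(start + i) % _buffer.Length]);
            }
            return result;
        }
    }
}
```

At the default 500 ms update interval, a capacity of 7,200 snapshots covers the last hour: `dashboard.Updated += (_, update) => history.Add(update);`.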

---

## Related Templates

- [Graph Executor](template-skg-graph-executor.md) - Basic graph execution
- [Streaming](template-skg-streaming.md) - Real-time events
- [Checkpointing](template-skg-checkpointing.md) - State persistence
- [Multi-Agent](template-skg-multi-agent.md) - Coordinated agents

---

## External References

- [Semantic Kernel Graph](https://github.com/kallebelins/semantic-kernel-graph)
- [OpenTelemetry .NET](https://opentelemetry.io/docs/instrumentation/net/)
- [Prometheus .NET Client](https://github.com/prometheus-net/prometheus-net)