agentic-qe 1.9.3 → 1.9.4
- package/CHANGELOG.md +54 -0
- package/README.md +30 -5
- package/config/.env.otel.example +25 -0
- package/config/OTEL-QUICK-REFERENCE.md +137 -0
- package/config/README-OTEL.md +222 -0
- package/config/alerting-rules.yml +518 -0
- package/config/docker-compose.otel.yml +187 -0
- package/config/grafana/dashboards/agentic-qe-overview.json +286 -0
- package/config/grafana/provisioning/dashboards/dashboards.yml +19 -0
- package/config/grafana/provisioning/datasources/datasources.yml +53 -0
- package/config/otel-collector-config.yaml.example +145 -0
- package/config/prometheus.yml.example +106 -0
- package/dist/alerting/AlertManager.d.ts +120 -0
- package/dist/alerting/AlertManager.d.ts.map +1 -0
- package/dist/alerting/AlertManager.js +345 -0
- package/dist/alerting/AlertManager.js.map +1 -0
- package/dist/alerting/FeedbackRouter.d.ts +98 -0
- package/dist/alerting/FeedbackRouter.d.ts.map +1 -0
- package/dist/alerting/FeedbackRouter.js +331 -0
- package/dist/alerting/FeedbackRouter.js.map +1 -0
- package/dist/alerting/StrategyApplicator.d.ts +120 -0
- package/dist/alerting/StrategyApplicator.d.ts.map +1 -0
- package/dist/alerting/StrategyApplicator.js +299 -0
- package/dist/alerting/StrategyApplicator.js.map +1 -0
- package/dist/alerting/index.d.ts +68 -0
- package/dist/alerting/index.d.ts.map +1 -0
- package/dist/alerting/index.js +112 -0
- package/dist/alerting/index.js.map +1 -0
- package/dist/alerting/types.d.ts +118 -0
- package/dist/alerting/types.d.ts.map +1 -0
- package/dist/alerting/types.js +11 -0
- package/dist/alerting/types.js.map +1 -0
- package/dist/cli/init/claude-config.d.ts.map +1 -1
- package/dist/cli/init/claude-config.js +12 -7
- package/dist/cli/init/claude-config.js.map +1 -1
- package/dist/core/memory/IPatternStore.d.ts +209 -0
- package/dist/core/memory/IPatternStore.d.ts.map +1 -0
- package/dist/core/memory/IPatternStore.js +15 -0
- package/dist/core/memory/IPatternStore.js.map +1 -0
- package/dist/core/memory/MigrationTools.d.ts +192 -0
- package/dist/core/memory/MigrationTools.d.ts.map +1 -0
- package/dist/core/memory/MigrationTools.js +615 -0
- package/dist/core/memory/MigrationTools.js.map +1 -0
- package/dist/core/memory/NeuralEnhancement.d.ts +154 -0
- package/dist/core/memory/NeuralEnhancement.d.ts.map +1 -0
- package/dist/core/memory/NeuralEnhancement.js +598 -0
- package/dist/core/memory/NeuralEnhancement.js.map +1 -0
- package/dist/core/memory/PatternStoreFactory.d.ts +143 -0
- package/dist/core/memory/PatternStoreFactory.d.ts.map +1 -0
- package/dist/core/memory/PatternStoreFactory.js +370 -0
- package/dist/core/memory/PatternStoreFactory.js.map +1 -0
- package/dist/core/memory/RealAgentDBAdapter.d.ts +1 -0
- package/dist/core/memory/RealAgentDBAdapter.d.ts.map +1 -1
- package/dist/core/memory/RealAgentDBAdapter.js +28 -20
- package/dist/core/memory/RealAgentDBAdapter.js.map +1 -1
- package/dist/core/memory/RuVectorPatternStore.d.ts +198 -0
- package/dist/core/memory/RuVectorPatternStore.d.ts.map +1 -0
- package/dist/core/memory/RuVectorPatternStore.js +605 -0
- package/dist/core/memory/RuVectorPatternStore.js.map +1 -0
- package/dist/core/memory/SelfHealingMonitor.d.ts +186 -0
- package/dist/core/memory/SelfHealingMonitor.d.ts.map +1 -0
- package/dist/core/memory/SelfHealingMonitor.js +451 -0
- package/dist/core/memory/SelfHealingMonitor.js.map +1 -0
- package/dist/core/memory/SwarmMemoryManager.d.ts +62 -0
- package/dist/core/memory/SwarmMemoryManager.d.ts.map +1 -1
- package/dist/core/memory/SwarmMemoryManager.js +97 -0
- package/dist/core/memory/SwarmMemoryManager.js.map +1 -1
- package/dist/core/memory/index.d.ts +11 -0
- package/dist/core/memory/index.d.ts.map +1 -1
- package/dist/core/memory/index.js +36 -1
- package/dist/core/memory/index.js.map +1 -1
- package/dist/reasoning/RuVectorReasoningAdapter.d.ts +232 -0
- package/dist/reasoning/RuVectorReasoningAdapter.d.ts.map +1 -0
- package/dist/reasoning/RuVectorReasoningAdapter.js +585 -0
- package/dist/reasoning/RuVectorReasoningAdapter.js.map +1 -0
- package/dist/reasoning/index.d.ts +2 -0
- package/dist/reasoning/index.d.ts.map +1 -1
- package/dist/reasoning/index.js +6 -1
- package/dist/reasoning/index.js.map +1 -1
- package/dist/reporting/ResultAggregator.d.ts +107 -0
- package/dist/reporting/ResultAggregator.d.ts.map +1 -0
- package/dist/reporting/ResultAggregator.js +435 -0
- package/dist/reporting/ResultAggregator.js.map +1 -0
- package/dist/reporting/index.d.ts +48 -0
- package/dist/reporting/index.d.ts.map +1 -0
- package/dist/reporting/index.js +154 -0
- package/dist/reporting/index.js.map +1 -0
- package/dist/reporting/reporters/ControlLoopReporter.d.ts +128 -0
- package/dist/reporting/reporters/ControlLoopReporter.d.ts.map +1 -0
- package/dist/reporting/reporters/ControlLoopReporter.js +417 -0
- package/dist/reporting/reporters/ControlLoopReporter.js.map +1 -0
- package/dist/reporting/reporters/HumanReadableReporter.d.ts +140 -0
- package/dist/reporting/reporters/HumanReadableReporter.d.ts.map +1 -0
- package/dist/reporting/reporters/HumanReadableReporter.js +524 -0
- package/dist/reporting/reporters/HumanReadableReporter.js.map +1 -0
- package/dist/reporting/reporters/JSONReporter.d.ts +193 -0
- package/dist/reporting/reporters/JSONReporter.d.ts.map +1 -0
- package/dist/reporting/reporters/JSONReporter.js +324 -0
- package/dist/reporting/reporters/JSONReporter.js.map +1 -0
- package/dist/reporting/reporters/index.d.ts +14 -0
- package/dist/reporting/reporters/index.d.ts.map +1 -0
- package/dist/reporting/reporters/index.js +19 -0
- package/dist/reporting/reporters/index.js.map +1 -0
- package/dist/reporting/types.d.ts +427 -0
- package/dist/reporting/types.d.ts.map +1 -0
- package/dist/reporting/types.js +12 -0
- package/dist/reporting/types.js.map +1 -0
- package/package.json +9 -1
package/CHANGELOG.md
CHANGED
@@ -7,6 +7,60 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.9.4] - 2025-11-30
+
+### 🔧 Critical Fixes: Memory/Learning/Patterns System
+
+This release delivers critical fixes to the memory, learning, and patterns system based on thorough investigation (Sherlock Investigation Report). All QE agents now have a fully functional learning system with proper vector embeddings, Q-value reinforcement learning, and persistent pattern storage.
+
+### Fixed
+
+- **Vector embeddings now stored correctly** (was storing NULL): Fixed `RealAgentDBAdapter.store()` to properly store 384-dimension embeddings as BLOB data instead of NULL
+- **SQL parameter style bug**: Fixed agentdb's `SqlJsDatabase` wrapper to use spread params (`stmt.run(a, b, c)`) instead of array params (`stmt.run([a,b,c])`) which caused "NOT NULL constraint failed" errors
+- **HNSW index schema mismatch**: Added `pattern_id` generated column for agentdb's HNSWIndex compatibility which requires this column for vector search
+- **Learning experience retrieval**: Added missing getter methods that were referenced but didn't exist
+- **Hooks saving to wrong database**: Fixed all Claude Code hooks to explicitly export `AGENTDB_PATH=.agentic-qe/agentdb.db` so learning data is saved to the project database instead of the root directory
+- **CI failures due to ARM64-only ruvector packages**: Moved `@ruvector/node-linux-arm64-gnu` and `ruvector-core-linux-arm64-gnu` from dependencies to optionalDependencies. Added x64 variants for CI compatibility
+
+### Added
+
+- **New SwarmMemoryManager methods for learning data retrieval**:
+  - `getBestAction(agentId, stateKey)` - Q-learning best action selection
+  - `getRecentLearningExperiences(agentId, limit)` - Recent experience retrieval
+  - `getLearningExperiencesByTaskType(agentId, taskType, limit)` - Task-filtered experiences
+  - `getHighRewardExperiences(agentId, minReward, limit)` - Successful experience extraction
+  - `getLearningStats(agentId)` - Aggregate learning statistics (total, avg, max, min rewards)
+
+- **Hooks integration**: Added `AGENTDB_PATH` environment variable to connect Claude Code hooks to the QE database
+
+- **New modules (Phase 4 Alerting & Reporting)**:
+  - `src/alerting/` - AlertManager, FeedbackRouter, StrategyApplicator (1,394 LOC)
+  - `src/reporting/` - ResultAggregator, reporters (3,030 LOC)
+  - Quality gate scripts and GitHub Actions workflow
+
+- **Integration test**: `tests/integration/memory-learning-loop.test.ts` - Comprehensive 7-phase test validating the full learning cycle:
+  1. Pattern storage with embeddings
+  2. Learning experience capture
+  3. Q-value reinforcement learning
+  4. Memory persistence
+  5. Pattern retrieval
+  6. Vector similarity search
+  7. Full learning loop simulation
+
+### Changed
+
+- **RealAgentDBAdapter**: Now properly retrieves stored embeddings when querying patterns instead of using placeholder values
+- **Pattern table schema**: Added generated column `pattern_id TEXT GENERATED ALWAYS AS (id) STORED` for HNSW compatibility
+
+### Technical Details
+
+- Vector embeddings: 384 dimensions × 4 bytes = 1,536 bytes per pattern
+- AgentDB version: v1.6.1 with ReasoningBank (16 learning tables)
+- HNSW index: 150x faster vector search enabled
+- All 12 integration tests pass
+
+---
+
 ## [1.9.3] - 2025-11-26
 
 ### 🐛 Bugfix: NPM Package Missing Files
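The 1,536-byte figure in the changelog's Technical Details follows from storing each 384-dimension embedding as raw 32-bit floats. The sketch below shows one way such a round trip could look in Node.js. `embeddingToBlob` and `blobToEmbedding` are hypothetical helper names chosen for illustration; they are not taken from `RealAgentDBAdapter`, whose actual code is not shown in this diff.

```javascript
// Hypothetical helpers for illustration only; not the package's implementation.
const EMBEDDING_DIM = 384;

// Expose a Float32Array's bytes as a Buffer that can be bound to a BLOB column.
function embeddingToBlob(embedding) {
  if (embedding.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dimensions, got ${embedding.length}`);
  }
  // Cover only the bytes of this view (byteOffset matters for sub-views).
  return Buffer.from(embedding.buffer, embedding.byteOffset, embedding.byteLength);
}

// Rebuild the Float32Array from a BLOB read back from the database.
function blobToEmbedding(blob) {
  // Copy into a fresh ArrayBuffer so the float view starts at an aligned offset.
  const copy = new Uint8Array(blob.byteLength);
  copy.set(blob);
  return new Float32Array(copy.buffer);
}
```

A 384-element vector passed through `embeddingToBlob` yields a 1,536-byte buffer, which matches the per-pattern size stated in the changelog.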
package/README.md
CHANGED
@@ -9,11 +9,11 @@
 <img alt="NPM Downloads" src="https://img.shields.io/npm/dw/agentic-qe">
 
 
-**Version 1.9.
+**Version 1.9.4** (Memory & Learning System Fixes) | [Changelog](CHANGELOG.md) | [Issues](https://github.com/proffesor-for-testing/agentic-qe/issues) | [Discussions](https://github.com/proffesor-for-testing/agentic-qe/discussions)
 
 > Agentic test automation with AI learning, real-time visualization, OpenTelemetry observability, persistent event storage, constitutional AI governance, and intelligent model routing.
 
-🎨 **Real-Time Visualization** | 📊 **Interactive Dashboards** | 🧠 **QE Agent Learning** | 💾 **Event Sourcing** | 📋 **Constitution System** | 📚 **
+🎨 **Real-Time Visualization** | 📊 **Interactive Dashboards** | 🧠 **QE Agent Learning** | 💾 **Event Sourcing** | 📋 **Constitution System** | 📚 **38 QE Skills** | 🎯 **Flaky Detection** | 💰 **Multi-Model Router**
 
 </div>
 
@@ -193,7 +193,7 @@ open http://localhost:3000
 - **Performance Testing**: k6, JMeter, Gatling integration
 - **Real-Time Streaming**: Live progress updates for all operations
 
-### 🎓
+### 🎓 38 QE Skills Library (v1.9.0)
 **95%+ coverage of modern QE practices**
 
 <details>
@@ -206,7 +206,7 @@ open http://localhost:3000
 - **Code Quality**: code-review-quality, refactoring-patterns, quality-metrics
 - **Communication**: bug-reporting-excellence, technical-writing, consultancy-practices
 
-**Phase 2: Expanded QE Skills Library (
+**Phase 2: Expanded QE Skills Library (16 skills)**
 - **Testing Methodologies (7)**: regression-testing, shift-left-testing, shift-right-testing, test-design-techniques, mutation-testing, test-data-management, verification-quality
 - **Specialized Testing (9)**: accessibility-testing, mobile-testing, database-testing, contract-testing, chaos-engineering-resilience, compatibility-testing, localization-testing, compliance-testing, visual-testing-advanced
 - **Testing Infrastructure (2)**: test-environment-management, test-reporting-analytics
@@ -214,7 +214,7 @@ open http://localhost:3000
 **Phase 3: Advanced Quality Engineering Skills (4 skills)**
 - **Strategic Testing Methodologies (4)**: six-thinking-hats, brutal-honesty-review, sherlock-review, cicd-pipeline-qe-orchestrator
 
-**Total:
+**Total: 38 QE Skills** - Includes accessibility testing, shift-left/right testing, verification & quality assurance, visual testing advanced, XP practices, and technical writing
 
 </details>
 
@@ -645,6 +645,31 @@ The test generator automatically delegates to subagents for a complete RED-GREEN
 
 ---
 
+## 📝 What's New in v1.9.4
+
+🔧 **Critical Memory & Learning System Fixes** (2025-11-30)
+
+This release delivers critical fixes to the memory, learning, and patterns system. All QE agents now have a fully functional learning system with proper vector embeddings, Q-value reinforcement learning, and persistent pattern storage.
+
+### Key Fixes
+
+- **Vector embeddings now stored correctly**: Fixed `RealAgentDBAdapter.store()` to properly store 384-dimension embeddings as BLOB data
+- **SQL parameter style bug**: Fixed agentdb's `SqlJsDatabase` wrapper to use spread params instead of array params
+- **HNSW index schema mismatch**: Added `pattern_id` generated column for agentdb's HNSWIndex compatibility
+- **Learning experience retrieval**: Added missing getter methods for Q-learning and experience replay
+- **Hooks saving to wrong database**: Fixed all Claude Code hooks to explicitly export `AGENTDB_PATH` so learning data is saved correctly
+- **CI platform compatibility**: Moved ARM64-only ruvector packages to optionalDependencies for x64 CI compatibility
+
+### New Features
+
+- **SwarmMemoryManager learning methods**: `getBestAction()`, `getRecentLearningExperiences()`, `getLearningStats()`, and more
+- **Phase 4 Alerting & Reporting**: AlertManager, FeedbackRouter, StrategyApplicator modules
+- **Quality Gate CI workflow**: GitHub Actions integration for automated quality validation
+
+**Upgrade**: `npm install agentic-qe@1.9.4`
+
+---
+
 ## 📝 What's New in v1.9.3
 
 📦 **NPM Package Fix** (2025-11-26)
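The README describes `getBestAction()` as Q-learning best-action selection but the diff does not show how it works. As a rough illustration of the general technique, the sketch below keeps Q-values in memory and picks the action with the highest value for a state. The `QTable` class, its constructor options, and the return shape are all assumptions made for this example; the real `SwarmMemoryManager` persists to a database and may differ in every detail.

```javascript
// Hypothetical in-memory Q-table; illustrates the technique, not the package's code.
class QTable {
  constructor({ learningRate = 0.1, discount = 0.9 } = {}) {
    this.learningRate = learningRate;
    this.discount = discount;
    this.table = new Map(); // "agentId::stateKey" -> Map(action -> Q-value)
  }

  // Return (creating if needed) the action map for one agent and state.
  actions(agentId, stateKey) {
    const key = `${agentId}::${stateKey}`;
    if (!this.table.has(key)) this.table.set(key, new Map());
    return this.table.get(key);
  }

  // Highest known Q-value in a state, or 0 when nothing has been learned yet.
  maxQ(agentId, stateKey) {
    const values = [...this.actions(agentId, stateKey).values()];
    return values.length ? Math.max(...values) : 0;
  }

  // Standard update: Q(s,a) += alpha * (r + gamma * max Q(s',a') - Q(s,a)).
  update(agentId, stateKey, action, reward, nextStateKey) {
    const actions = this.actions(agentId, stateKey);
    const old = actions.get(action) ?? 0;
    const target = reward + this.discount * this.maxQ(agentId, nextStateKey);
    actions.set(action, old + this.learningRate * (target - old));
  }

  // Greedy selection: the action with the largest Q-value, or null if none.
  getBestAction(agentId, stateKey) {
    let best = null;
    for (const [action, qValue] of this.actions(agentId, stateKey)) {
      if (best === null || qValue > best.qValue) best = { action, qValue };
    }
    return best;
  }
}
```

Returning `null` for an unseen state leaves the fallback policy (for example, random exploration) to the caller, which is one possible design among several.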
package/config/.env.otel.example
@@ -0,0 +1,25 @@
+# OTEL Stack Environment Variables
+# Agentic QE Fleet - Issue #71
+#
+# Copy this file to .env.otel and customize as needed
+# Usage: docker-compose -f config/docker-compose.otel.yml --env-file config/.env.otel up -d
+
+# Deployment environment
+DEPLOYMENT_ENVIRONMENT=development
+
+# Grafana credentials (CHANGE IN PRODUCTION!)
+GRAFANA_ADMIN_USER=admin
+GRAFANA_ADMIN_PASSWORD=admin
+
+# OTEL Collector settings
+OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
+OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
+
+# Service configuration
+SERVICE_NAME=agentic-qe-fleet
+SERVICE_NAMESPACE=agentic-qe
+SERVICE_VERSION=1.9.3
+
+# Prometheus retention
+PROMETHEUS_RETENTION_TIME=15d
+PROMETHEUS_RETENTION_SIZE=10GB
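The env file sets `OTEL_EXPORTER_OTLP_ENDPOINT` to a base URL, while the documentation elsewhere in this diff posts traces to `/v1/traces`. Under the OpenTelemetry exporter conventions, the per-signal path is appended to the base endpoint when OTLP over HTTP is used. The helper below is a small sketch of that derivation; `otlpTracesUrl` is a hypothetical name, and the fallback value simply mirrors the example file.

```javascript
// Hypothetical helper: derive the traces URL from the base OTLP endpoint.
function otlpTracesUrl(env = process.env) {
  // Fallback mirrors the value in config/.env.otel.example.
  const base = env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318';
  // Strip trailing slashes so the result never contains "//v1/traces".
  return base.replace(/\/+$/, '') + '/v1/traces';
}
```

When an exporter is given an explicit `url` option, as in the Node.js examples in this package's docs, the full path is supplied by the caller and no path is appended.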
package/config/OTEL-QUICK-REFERENCE.md
@@ -0,0 +1,137 @@
+# OTEL Stack Quick Reference Card
+
+## 🚀 One-Line Start
+
+```bash
+docker-compose -f config/docker-compose.otel.yml up -d
+```
+
+## 🌐 Service URLs
+
+| Service | URL | Login |
+|---------|-----|-------|
+| **Grafana** | http://localhost:3001 | admin/admin |
+| **Prometheus** | http://localhost:9090 | - |
+| **Jaeger** | http://localhost:16686 | - |
+| **OTEL Health** | http://localhost:13133/health | - |
+
+## 📡 Send Telemetry
+
+### OTLP Endpoints
+- gRPC: `localhost:4317`
+- HTTP: `localhost:4318`
+
+### Node.js Example
+```javascript
+const exporter = new OTLPTraceExporter({
+  url: 'http://localhost:4318/v1/traces'
+});
+```
+
+### cURL Test
+```bash
+curl http://localhost:4318/v1/traces \
+  -H "Content-Type: application/json" \
+  -d @trace.json
+```
+
+## 🔍 Quick Checks
+
+```bash
+# Verify all services
+./scripts/verify-otel-stack.sh
+
+# Check health
+curl http://localhost:13133/health # OTEL Collector
+curl http://localhost:9090/-/healthy # Prometheus
+curl http://localhost:14269/ # Jaeger
+curl http://localhost:3001/api/health # Grafana
+
+# View logs
+docker-compose -f config/docker-compose.otel.yml logs -f
+```
+
+## 🛠️ Common Commands
+
+```bash
+# Start
+docker-compose -f config/docker-compose.otel.yml up -d
+
+# Stop
+docker-compose -f config/docker-compose.otel.yml down
+
+# Restart
+docker-compose -f config/docker-compose.otel.yml restart
+
+# View status
+docker-compose -f config/docker-compose.otel.yml ps
+
+# Logs
+docker-compose -f config/docker-compose.otel.yml logs -f [service]
+
+# Remove everything (INCLUDING DATA!)
+docker-compose -f config/docker-compose.otel.yml down -v
+```
+
+## 📊 Port Reference
+
+### OTEL Collector
+- 4317 - OTLP gRPC
+- 4318 - OTLP HTTP
+- 8889 - Prometheus metrics
+- 13133 - Health check
+
+### Prometheus
+- 9090 - Web UI
+
+### Jaeger
+- 16686 - UI
+- 14269 - Metrics
+
+### Grafana
+- 3001 - Web UI
+
+## 🔧 Configuration Files
+
+- **Docker Compose**: `config/docker-compose.otel.yml`
+- **OTEL Collector**: `config/otel-collector-config.yaml.example`
+- **Prometheus**: `config/prometheus.yml.example`
+- **Grafana Datasources**: `config/grafana/provisioning/datasources/datasources.yml`
+- **Grafana Dashboards**: `config/grafana/provisioning/dashboards/dashboards.yml`
+
+## 📚 Documentation
+
+- **Quick Start**: `config/README-OTEL.md`
+- **Full Summary**: `docs/implementation-plans/issue-71-completion-summary.md`
+- **Architecture**: `docs/architecture/otel-stack-architecture.md`
+
+## 🐛 Quick Troubleshooting
+
+| Issue | Solution |
+|-------|----------|
+| Services not starting | Check logs: `docker-compose -f config/docker-compose.otel.yml logs` |
+| Port already in use | Change external port in `docker-compose.otel.yml` |
+| Grafana can't connect | Check datasources: Grafana → Configuration → Data Sources |
+| No metrics in Prometheus | Check targets: http://localhost:9090/targets |
+| No traces in Jaeger | Verify OTLP endpoint: `curl http://localhost:4318` |
+
+## 🎯 Grafana Datasources
+
+Pre-configured and auto-loaded:
+
+1. **Prometheus** (default) - `http://prometheus:9090`
+2. **Jaeger** - `http://jaeger:16686`
+3. **OTEL Collector Metrics** - `http://otel-collector:8889`
+
+## ✅ Verification Checklist
+
+- [ ] All 4 services running: `docker-compose -f config/docker-compose.otel.yml ps`
+- [ ] Health checks passing: `./scripts/verify-otel-stack.sh`
+- [ ] OTLP endpoints accessible: `curl http://localhost:4318`
+- [ ] Prometheus targets green: http://localhost:9090/targets
+- [ ] Grafana datasources connected: Grafana UI → Data Sources
+- [ ] Sample dashboard visible: Grafana → Dashboards
+
+---
+
+**Issue #71 - COMPLETED** ✅
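The cURL test in the quick reference posts a `trace.json` file that the package does not appear to ship. The sketch below builds a minimal payload in the OTLP/JSON encoding (hex-encoded IDs, nanosecond timestamps as strings) that could serve as that file. `buildTracePayload`, the scope name `manual-test`, and the default duration are assumptions for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical helper: build a one-span OTLP/JSON trace export request.
function buildTracePayload(serviceName, spanName, durationMs = 5) {
  // OTLP timestamps are nanoseconds since the Unix epoch, sent as strings in JSON.
  const endNs = BigInt(Date.now()) * 1000000n;
  const startNs = endNs - BigInt(durationMs) * 1000000n;
  return {
    resourceSpans: [{
      resource: {
        attributes: [{ key: 'service.name', value: { stringValue: serviceName } }],
      },
      scopeSpans: [{
        scope: { name: 'manual-test' }, // assumed scope name
        spans: [{
          traceId: crypto.randomBytes(16).toString('hex'), // 32 hex characters
          spanId: crypto.randomBytes(8).toString('hex'), // 16 hex characters
          name: spanName,
          kind: 1, // SPAN_KIND_INTERNAL
          startTimeUnixNano: startNs.toString(),
          endTimeUnixNano: endNs.toString(),
        }],
      }],
    }],
  };
}
```

Writing `JSON.stringify(buildTracePayload('agentic-qe-fleet', 'smoke-test'))` to `trace.json` gives the cURL command something to send; the service name matches `SERVICE_NAME` in the env example.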
package/config/README-OTEL.md
@@ -0,0 +1,222 @@
+# OTEL Observability Stack - Quick Start Guide
+
+Complete observability stack for the Agentic QE Fleet with OpenTelemetry, Prometheus, Jaeger, and Grafana.
+
+## 🚀 Quick Start
+
+### 1. Start the OTEL Stack
+
+```bash
+# Start only the OTEL stack
+docker-compose -f config/docker-compose.otel.yml up -d
+
+# Or combine with the main application
+docker-compose -f docker-compose.yml -f config/docker-compose.otel.yml up -d
+```
+
+### 2. Access the Services
+
+| Service | URL | Purpose |
+|---------|-----|---------|
+| **Grafana** | http://localhost:3001 | Dashboards and visualization |
+| **Prometheus** | http://localhost:9090 | Metrics storage and querying |
+| **Jaeger UI** | http://localhost:16686 | Distributed tracing |
+| **OTEL Collector** | http://localhost:13133/health | Health check |
+
+### 3. Default Credentials
+
+- **Grafana**: `admin` / `admin` (change on first login)
+
+## 📊 Service Endpoints
+
+### OTEL Collector
+- **OTLP gRPC**: `localhost:4317` - Send traces/metrics via gRPC
+- **OTLP HTTP**: `localhost:4318` - Send traces/metrics via HTTP
+- **Prometheus Exporter**: `localhost:8889` - Metrics endpoint
+- **Health Check**: `localhost:13133` - Collector health
+- **pprof**: `localhost:1777` - Performance profiling
+- **zPages**: `localhost:55679` - Debug interface
+
+### Prometheus
+- **Web UI**: `localhost:9090` - Query and explore metrics
+- **API**: `localhost:9090/api/v1/` - Prometheus HTTP API
+
+### Jaeger
+- **UI**: `localhost:16686` - Trace visualization
+- **OTLP gRPC**: `localhost:4327` - Receive traces (forwarded from collector)
+- **Metrics**: `localhost:14269/metrics` - Jaeger metrics
+- **Health**: `localhost:14269/` - Health check
+
+### Grafana
+- **Web UI**: `localhost:3001` - Dashboards and visualization
+- **API**: `localhost:3001/api/` - Grafana HTTP API
+
+## 🔧 Configuration Files
+
+### Required Files (Already Created)
+- `config/docker-compose.otel.yml` - Docker Compose configuration
+- `config/otel-collector-config.yaml.example` - OTEL Collector config
+- `config/prometheus.yml.example` - Prometheus scrape config
+- `config/grafana/provisioning/datasources/datasources.yml` - Grafana datasources
+- `config/grafana/provisioning/dashboards/dashboards.yml` - Dashboard provisioning
+- `config/grafana/dashboards/agentic-qe-overview.json` - Sample dashboard
+
+### Environment Variables (Optional)
+Copy and customize:
+```bash
+cp config/.env.otel.example config/.env.otel
+```
+
+Then use:
+```bash
+docker-compose -f config/docker-compose.otel.yml --env-file config/.env.otel up -d
+```
+
+## 📈 Using the Stack
+
+### Send Telemetry to OTEL Collector
+
+#### Via HTTP (curl example)
+```bash
+curl -X POST http://localhost:4318/v1/traces \
+  -H "Content-Type: application/json" \
+  -d @trace-data.json
+```
+
+#### Via Node.js Application
+```javascript
+const { NodeTracerProvider, BatchSpanProcessor } = require('@opentelemetry/sdk-trace-node');
+const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
+
+const provider = new NodeTracerProvider();
+provider.addSpanProcessor(
+  new BatchSpanProcessor(
+    new OTLPTraceExporter({
+      url: 'http://localhost:4318/v1/traces'
+    })
+  )
+);
+provider.register();
+```
+
+### Query Metrics in Prometheus
+
+1. Open http://localhost:9090
+2. Try queries:
+   - `aqe_requests_total` - Total requests
+   - `rate(aqe_requests_total[5m])` - Request rate
+   - `histogram_quantile(0.95, rate(aqe_request_duration_bucket[5m]))` - P95 latency
+
+### View Traces in Jaeger
+
+1. Open http://localhost:16686
+2. Select service: `agentic-qe-fleet`
+3. Click "Find Traces"
+4. Explore trace details and service dependencies
+
+### Create Dashboards in Grafana
+
+1. Open http://localhost:3001
+2. Login with `admin` / `admin`
+3. Navigate to Dashboards → Agentic QE Fleet → Overview
+4. Or create new dashboards using Prometheus and Jaeger datasources
+
+## 🛠️ Management Commands
+
+### View Logs
+```bash
+# All services
+docker-compose -f config/docker-compose.otel.yml logs -f
+
+# Specific service
+docker-compose -f config/docker-compose.otel.yml logs -f otel-collector
+docker-compose -f config/docker-compose.otel.yml logs -f prometheus
+docker-compose -f config/docker-compose.otel.yml logs -f jaeger
+docker-compose -f config/docker-compose.otel.yml logs -f grafana
+```
+
+### Check Service Health
+```bash
+# OTEL Collector
+curl http://localhost:13133/health
+
+# Prometheus
+curl http://localhost:9090/-/healthy
+
+# Jaeger
+curl http://localhost:14269/
+
+# Grafana
+curl http://localhost:3001/api/health
+```
+
+### Stop Services
+```bash
+# Stop OTEL stack
+docker-compose -f config/docker-compose.otel.yml down
+
+# Stop and remove volumes (CAUTION: deletes all data)
+docker-compose -f config/docker-compose.otel.yml down -v
+```
+
+### Restart Services
+```bash
+# Restart all
+docker-compose -f config/docker-compose.otel.yml restart
+
+# Restart specific service
+docker-compose -f config/docker-compose.otel.yml restart otel-collector
+```
+
+## 🔍 Troubleshooting
+
+### OTEL Collector Not Receiving Data
+1. Check collector logs: `docker-compose -f config/docker-compose.otel.yml logs otel-collector`
+2. Verify endpoints: `curl http://localhost:13133/health`
+3. Check application OTLP endpoint: `http://localhost:4318`
+
+### Prometheus Not Scraping Metrics
+1. Check Prometheus targets: http://localhost:9090/targets
+2. Verify OTEL Collector is exposing metrics: `curl http://localhost:8889/metrics`
+3. Check Prometheus config: `docker-compose -f config/docker-compose.otel.yml exec prometheus cat /etc/prometheus/prometheus.yml`
+
+### Jaeger Not Showing Traces
+1. Check Jaeger logs: `docker-compose -f config/docker-compose.otel.yml logs jaeger`
+2. Verify OTEL Collector is forwarding traces (check collector logs)
+3. Ensure application is sending traces to OTLP endpoint
+
+### Grafana Datasources Not Working
+1. Check datasource configuration: Grafana UI → Configuration → Data Sources
+2. Test datasource connection (should show green checkmark)
+3. Verify Prometheus/Jaeger are accessible from Grafana container
+
+### Performance Issues
+1. Adjust OTEL Collector batch size in `otel-collector-config.yaml.example`
+2. Increase the Prometheus scrape interval (scrape less often) in `prometheus.yml.example`
+3. Adjust memory limits for services in `docker-compose.otel.yml`
+
+## 📚 Next Steps
+
+1. **Integrate with Application**: Configure your app to send telemetry to OTLP endpoints
+2. **Create Custom Dashboards**: Build Grafana dashboards for your specific metrics
+3. **Set Up Alerting**: Configure Prometheus alerting rules (see Phase 4 docs)
+4. **Production Hardening**:
+   - Change default passwords
+   - Enable TLS/authentication
+   - Configure persistent storage
+   - Set up backup/restore procedures
+
+## 📖 Related Documentation
+
+- [OTEL Stack Architecture](../docs/architecture/otel-stack-architecture.md)
+- [Phase 4 Alerting Implementation Plan](../docs/implementation-plans/phase4-alerting-implementation-plan.md)
+- [OpenTelemetry Documentation](https://opentelemetry.io/docs/)
+- [Prometheus Documentation](https://prometheus.io/docs/)
+- [Jaeger Documentation](https://www.jaegertracing.io/docs/)
+- [Grafana Documentation](https://grafana.com/docs/)
+
+## 🐛 Issue Tracking
+
+This implementation resolves **Issue #71**: Complete OTEL Stack Docker Compose Configuration
+
+For issues or improvements, please file an issue on the repository.
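The four `curl` health checks in the guide can also be run from one script. The sketch below polls the same URLs in parallel and reports which services answered with a success status. `checkStack` and `HEALTH_ENDPOINTS` are hypothetical names; the script assumes Node.js 18 or later for the global `fetch`, or a compatible function passed in by the caller. It is not a replacement for the package's own `./scripts/verify-otel-stack.sh`, whose contents are not part of this diff.

```javascript
// Health URLs copied from the "Check Service Health" section of the guide.
const HEALTH_ENDPOINTS = {
  'otel-collector': 'http://localhost:13133/health',
  prometheus: 'http://localhost:9090/-/healthy',
  jaeger: 'http://localhost:14269/',
  grafana: 'http://localhost:3001/api/health',
};

// Hypothetical helper: resolve to { serviceName: boolean } for each endpoint.
async function checkStack(fetchImpl = fetch, endpoints = HEALTH_ENDPOINTS) {
  const results = await Promise.all(
    Object.entries(endpoints).map(async ([name, url]) => {
      try {
        const res = await fetchImpl(url);
        return [name, res.ok]; // true for any 2xx status
      } catch {
        return [name, false]; // connection refused, DNS failure, and similar
      }
    })
  );
  return Object.fromEntries(results);
}
```

Calling `checkStack().then(console.log)` after `docker-compose ... up -d` prints one boolean per service; accepting the fetch function as a parameter keeps the helper testable without a running stack.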