agentic-qe 1.9.3 → 1.9.4

Files changed (108)
  1. package/CHANGELOG.md +54 -0
  2. package/README.md +30 -5
  3. package/config/.env.otel.example +25 -0
  4. package/config/OTEL-QUICK-REFERENCE.md +137 -0
  5. package/config/README-OTEL.md +222 -0
  6. package/config/alerting-rules.yml +518 -0
  7. package/config/docker-compose.otel.yml +187 -0
  8. package/config/grafana/dashboards/agentic-qe-overview.json +286 -0
  9. package/config/grafana/provisioning/dashboards/dashboards.yml +19 -0
  10. package/config/grafana/provisioning/datasources/datasources.yml +53 -0
  11. package/config/otel-collector-config.yaml.example +145 -0
  12. package/config/prometheus.yml.example +106 -0
  13. package/dist/alerting/AlertManager.d.ts +120 -0
  14. package/dist/alerting/AlertManager.d.ts.map +1 -0
  15. package/dist/alerting/AlertManager.js +345 -0
  16. package/dist/alerting/AlertManager.js.map +1 -0
  17. package/dist/alerting/FeedbackRouter.d.ts +98 -0
  18. package/dist/alerting/FeedbackRouter.d.ts.map +1 -0
  19. package/dist/alerting/FeedbackRouter.js +331 -0
  20. package/dist/alerting/FeedbackRouter.js.map +1 -0
  21. package/dist/alerting/StrategyApplicator.d.ts +120 -0
  22. package/dist/alerting/StrategyApplicator.d.ts.map +1 -0
  23. package/dist/alerting/StrategyApplicator.js +299 -0
  24. package/dist/alerting/StrategyApplicator.js.map +1 -0
  25. package/dist/alerting/index.d.ts +68 -0
  26. package/dist/alerting/index.d.ts.map +1 -0
  27. package/dist/alerting/index.js +112 -0
  28. package/dist/alerting/index.js.map +1 -0
  29. package/dist/alerting/types.d.ts +118 -0
  30. package/dist/alerting/types.d.ts.map +1 -0
  31. package/dist/alerting/types.js +11 -0
  32. package/dist/alerting/types.js.map +1 -0
  33. package/dist/cli/init/claude-config.d.ts.map +1 -1
  34. package/dist/cli/init/claude-config.js +12 -7
  35. package/dist/cli/init/claude-config.js.map +1 -1
  36. package/dist/core/memory/IPatternStore.d.ts +209 -0
  37. package/dist/core/memory/IPatternStore.d.ts.map +1 -0
  38. package/dist/core/memory/IPatternStore.js +15 -0
  39. package/dist/core/memory/IPatternStore.js.map +1 -0
  40. package/dist/core/memory/MigrationTools.d.ts +192 -0
  41. package/dist/core/memory/MigrationTools.d.ts.map +1 -0
  42. package/dist/core/memory/MigrationTools.js +615 -0
  43. package/dist/core/memory/MigrationTools.js.map +1 -0
  44. package/dist/core/memory/NeuralEnhancement.d.ts +154 -0
  45. package/dist/core/memory/NeuralEnhancement.d.ts.map +1 -0
  46. package/dist/core/memory/NeuralEnhancement.js +598 -0
  47. package/dist/core/memory/NeuralEnhancement.js.map +1 -0
  48. package/dist/core/memory/PatternStoreFactory.d.ts +143 -0
  49. package/dist/core/memory/PatternStoreFactory.d.ts.map +1 -0
  50. package/dist/core/memory/PatternStoreFactory.js +370 -0
  51. package/dist/core/memory/PatternStoreFactory.js.map +1 -0
  52. package/dist/core/memory/RealAgentDBAdapter.d.ts +1 -0
  53. package/dist/core/memory/RealAgentDBAdapter.d.ts.map +1 -1
  54. package/dist/core/memory/RealAgentDBAdapter.js +28 -20
  55. package/dist/core/memory/RealAgentDBAdapter.js.map +1 -1
  56. package/dist/core/memory/RuVectorPatternStore.d.ts +198 -0
  57. package/dist/core/memory/RuVectorPatternStore.d.ts.map +1 -0
  58. package/dist/core/memory/RuVectorPatternStore.js +605 -0
  59. package/dist/core/memory/RuVectorPatternStore.js.map +1 -0
  60. package/dist/core/memory/SelfHealingMonitor.d.ts +186 -0
  61. package/dist/core/memory/SelfHealingMonitor.d.ts.map +1 -0
  62. package/dist/core/memory/SelfHealingMonitor.js +451 -0
  63. package/dist/core/memory/SelfHealingMonitor.js.map +1 -0
  64. package/dist/core/memory/SwarmMemoryManager.d.ts +62 -0
  65. package/dist/core/memory/SwarmMemoryManager.d.ts.map +1 -1
  66. package/dist/core/memory/SwarmMemoryManager.js +97 -0
  67. package/dist/core/memory/SwarmMemoryManager.js.map +1 -1
  68. package/dist/core/memory/index.d.ts +11 -0
  69. package/dist/core/memory/index.d.ts.map +1 -1
  70. package/dist/core/memory/index.js +36 -1
  71. package/dist/core/memory/index.js.map +1 -1
  72. package/dist/reasoning/RuVectorReasoningAdapter.d.ts +232 -0
  73. package/dist/reasoning/RuVectorReasoningAdapter.d.ts.map +1 -0
  74. package/dist/reasoning/RuVectorReasoningAdapter.js +585 -0
  75. package/dist/reasoning/RuVectorReasoningAdapter.js.map +1 -0
  76. package/dist/reasoning/index.d.ts +2 -0
  77. package/dist/reasoning/index.d.ts.map +1 -1
  78. package/dist/reasoning/index.js +6 -1
  79. package/dist/reasoning/index.js.map +1 -1
  80. package/dist/reporting/ResultAggregator.d.ts +107 -0
  81. package/dist/reporting/ResultAggregator.d.ts.map +1 -0
  82. package/dist/reporting/ResultAggregator.js +435 -0
  83. package/dist/reporting/ResultAggregator.js.map +1 -0
  84. package/dist/reporting/index.d.ts +48 -0
  85. package/dist/reporting/index.d.ts.map +1 -0
  86. package/dist/reporting/index.js +154 -0
  87. package/dist/reporting/index.js.map +1 -0
  88. package/dist/reporting/reporters/ControlLoopReporter.d.ts +128 -0
  89. package/dist/reporting/reporters/ControlLoopReporter.d.ts.map +1 -0
  90. package/dist/reporting/reporters/ControlLoopReporter.js +417 -0
  91. package/dist/reporting/reporters/ControlLoopReporter.js.map +1 -0
  92. package/dist/reporting/reporters/HumanReadableReporter.d.ts +140 -0
  93. package/dist/reporting/reporters/HumanReadableReporter.d.ts.map +1 -0
  94. package/dist/reporting/reporters/HumanReadableReporter.js +524 -0
  95. package/dist/reporting/reporters/HumanReadableReporter.js.map +1 -0
  96. package/dist/reporting/reporters/JSONReporter.d.ts +193 -0
  97. package/dist/reporting/reporters/JSONReporter.d.ts.map +1 -0
  98. package/dist/reporting/reporters/JSONReporter.js +324 -0
  99. package/dist/reporting/reporters/JSONReporter.js.map +1 -0
  100. package/dist/reporting/reporters/index.d.ts +14 -0
  101. package/dist/reporting/reporters/index.d.ts.map +1 -0
  102. package/dist/reporting/reporters/index.js +19 -0
  103. package/dist/reporting/reporters/index.js.map +1 -0
  104. package/dist/reporting/types.d.ts +427 -0
  105. package/dist/reporting/types.d.ts.map +1 -0
  106. package/dist/reporting/types.js +12 -0
  107. package/dist/reporting/types.js.map +1 -0
  108. package/package.json +9 -1
package/CHANGELOG.md CHANGED
@@ -7,6 +7,60 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7
7
 
8
8
  ## [Unreleased]
9
9
 
10
+ ## [1.9.4] - 2025-11-30
11
+
12
+ ### 🔧 Critical Fixes: Memory/Learning/Patterns System
13
+
14
+ This release delivers critical fixes to the memory, learning, and patterns system based on thorough investigation (Sherlock Investigation Report). All QE agents now have a fully functional learning system with proper vector embeddings, Q-value reinforcement learning, and persistent pattern storage.
15
+
16
+ ### Fixed
17
+
18
+ - **Vector embeddings now stored correctly** (was storing NULL): Fixed `RealAgentDBAdapter.store()` to properly store 384-dimension embeddings as BLOB data instead of NULL
19
+ - **SQL parameter style bug**: Fixed agentdb's `SqlJsDatabase` wrapper to use spread params (`stmt.run(a, b, c)`) instead of array params (`stmt.run([a,b,c])`) which caused "NOT NULL constraint failed" errors
20
+ - **HNSW index schema mismatch**: Added `pattern_id` generated column for agentdb's HNSWIndex compatibility which requires this column for vector search
21
+ - **Learning experience retrieval**: Added missing getter methods that were referenced but didn't exist
22
+ - **Hooks saving to wrong database**: Fixed all Claude Code hooks to explicitly export `AGENTDB_PATH=.agentic-qe/agentdb.db` so learning data is saved to the project database instead of the root directory
23
+ - **CI failures due to ARM64-only ruvector packages**: Moved `@ruvector/node-linux-arm64-gnu` and `ruvector-core-linux-arm64-gnu` from dependencies to optionalDependencies. Added x64 variants for CI compatibility
24
+
25
+ ### Added
26
+
27
+ - **New SwarmMemoryManager methods for learning data retrieval**:
28
+ - `getBestAction(agentId, stateKey)` - Q-learning best action selection
29
+ - `getRecentLearningExperiences(agentId, limit)` - Recent experience retrieval
30
+ - `getLearningExperiencesByTaskType(agentId, taskType, limit)` - Task-filtered experiences
31
+ - `getHighRewardExperiences(agentId, minReward, limit)` - Successful experience extraction
32
+ - `getLearningStats(agentId)` - Aggregate learning statistics (total, avg, max, min rewards)
33
+
34
+ - **Hooks integration**: Added `AGENTDB_PATH` environment variable to connect Claude Code hooks to the QE database
35
+
36
+ - **New modules (Phase 4 Alerting & Reporting)**:
37
+ - `src/alerting/` - AlertManager, FeedbackRouter, StrategyApplicator (1,394 LOC)
38
+ - `src/reporting/` - ResultAggregator, reporters (3,030 LOC)
39
+ - Quality gate scripts and GitHub Actions workflow
40
+
41
+ - **Integration test**: `tests/integration/memory-learning-loop.test.ts` - Comprehensive 7-phase test validating the full learning cycle:
42
+ 1. Pattern storage with embeddings
43
+ 2. Learning experience capture
44
+ 3. Q-value reinforcement learning
45
+ 4. Memory persistence
46
+ 5. Pattern retrieval
47
+ 6. Vector similarity search
48
+ 7. Full learning loop simulation
49
+
50
+ ### Changed
51
+
52
+ - **RealAgentDBAdapter**: Now properly retrieves stored embeddings when querying patterns instead of using placeholder values
53
+ - **Pattern table schema**: Added generated column `pattern_id TEXT GENERATED ALWAYS AS (id) STORED` for HNSW compatibility
54
+
55
+ ### Technical Details
56
+
57
+ - Vector embeddings: 384 dimensions × 4 bytes = 1,536 bytes per pattern
58
+ - AgentDB version: v1.6.1 with ReasoningBank (16 learning tables)
59
+ - HNSW index: 150x faster vector search enabled
60
+ - All 12 integration tests pass
61
+
62
+ ---
63
+
10
64
  ## [1.9.3] - 2025-11-26
11
65
 
12
66
  ### 🐛 Bugfix: NPM Package Missing Files
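Note on the embedding fix in the changelog above: the figure of 1,536 bytes per pattern follows from storing each 384-dimension embedding as 32-bit floats. The sketch below shows that round trip in Node.js. It assumes Float32 encoding, and the helper names `embeddingToBlob` and `blobToEmbedding` are hypothetical; they are not taken from `RealAgentDBAdapter`.

```javascript
// Hypothetical helpers illustrating the 384 x 4 = 1,536-byte layout.
const DIMENSIONS = 384;

function embeddingToBlob(embedding) {
  if (embedding.length !== DIMENSIONS) {
    throw new Error(`expected ${DIMENSIONS} values, got ${embedding.length}`);
  }
  const floats = Float32Array.from(embedding);
  // Expose the float bytes as a Buffer, the usual shape for a BLOB parameter.
  return Buffer.from(floats.buffer, floats.byteOffset, floats.byteLength);
}

function blobToEmbedding(blob) {
  // Copy into a fresh array so the Float32Array view starts at offset 0.
  const copy = Uint8Array.from(blob);
  return new Float32Array(copy.buffer);
}

const blob = embeddingToBlob(
  Array.from({ length: DIMENSIONS }, (_, i) => i / DIMENSIONS)
);
console.log(blob.length); // 1536
```

A NULL or wrongly sized BLOB would fail the length check when read back, which is one cheap way to detect the storage bug the changelog describes.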
package/README.md CHANGED
@@ -9,11 +9,11 @@
9
9
  <img alt="NPM Downloads" src="https://img.shields.io/npm/dw/agentic-qe">
10
10
 
11
11
 
12
- **Version 1.9.3** (NPM Package Fix) | [Changelog](CHANGELOG.md) | [Issues](https://github.com/proffesor-for-testing/agentic-qe/issues) | [Discussions](https://github.com/proffesor-for-testing/agentic-qe/discussions)
12
+ **Version 1.9.4** (Memory & Learning System Fixes) | [Changelog](CHANGELOG.md) | [Issues](https://github.com/proffesor-for-testing/agentic-qe/issues) | [Discussions](https://github.com/proffesor-for-testing/agentic-qe/discussions)
13
13
 
14
14
  > Agentic test automation with AI learning, real-time visualization, OpenTelemetry observability, persistent event storage, constitutional AI governance, and intelligent model routing.
15
15
 
16
- 🎨 **Real-Time Visualization** | 📊 **Interactive Dashboards** | 🧠 **QE Agent Learning** | 💾 **Event Sourcing** | 📋 **Constitution System** | 📚 **40 QE Skills** | 🎯 **Flaky Detection** | 💰 **Multi-Model Router**
16
+ 🎨 **Real-Time Visualization** | 📊 **Interactive Dashboards** | 🧠 **QE Agent Learning** | 💾 **Event Sourcing** | 📋 **Constitution System** | 📚 **38 QE Skills** | 🎯 **Flaky Detection** | 💰 **Multi-Model Router**
17
17
 
18
18
  </div>
19
19
 
@@ -193,7 +193,7 @@ open http://localhost:3000
193
193
  - **Performance Testing**: k6, JMeter, Gatling integration
194
194
  - **Real-Time Streaming**: Live progress updates for all operations
195
195
 
196
- ### 🎓 40 QE Skills Library (v1.9.0)
196
+ ### 🎓 38 QE Skills Library (v1.9.0)
197
197
  **95%+ coverage of modern QE practices**
198
198
 
199
199
  <details>
@@ -206,7 +206,7 @@ open http://localhost:3000
206
206
  - **Code Quality**: code-review-quality, refactoring-patterns, quality-metrics
207
207
  - **Communication**: bug-reporting-excellence, technical-writing, consultancy-practices
208
208
 
209
- **Phase 2: Expanded QE Skills Library (18 skills)**
209
+ **Phase 2: Expanded QE Skills Library (16 skills)**
210
210
  - **Testing Methodologies (7)**: regression-testing, shift-left-testing, shift-right-testing, test-design-techniques, mutation-testing, test-data-management, verification-quality
211
211
  - **Specialized Testing (9)**: accessibility-testing, mobile-testing, database-testing, contract-testing, chaos-engineering-resilience, compatibility-testing, localization-testing, compliance-testing, visual-testing-advanced
212
212
  - **Testing Infrastructure (2)**: test-environment-management, test-reporting-analytics
@@ -214,7 +214,7 @@ open http://localhost:3000
214
214
  **Phase 3: Advanced Quality Engineering Skills (4 skills)**
215
215
  - **Strategic Testing Methodologies (4)**: six-thinking-hats, brutal-honesty-review, sherlock-review, cicd-pipeline-qe-orchestrator
216
216
 
217
- **Total: 40 QE Skills** - Includes accessibility testing, shift-left/right testing, verification & quality assurance, visual testing advanced, XP practices, and technical writing
217
+ **Total: 38 QE Skills** - Includes accessibility testing, shift-left/right testing, verification & quality assurance, visual testing advanced, XP practices, and technical writing
218
218
 
219
219
  </details>
220
220
 
@@ -645,6 +645,31 @@ The test generator automatically delegates to subagents for a complete RED-GREEN
645
645
 
646
646
  ---
647
647
 
648
+ ## 📝 What's New in v1.9.4
649
+
650
+ 🔧 **Critical Memory & Learning System Fixes** (2025-11-30)
651
+
652
+ This release delivers critical fixes to the memory, learning, and patterns system. All QE agents now have a fully functional learning system with proper vector embeddings, Q-value reinforcement learning, and persistent pattern storage.
653
+
654
+ ### Key Fixes
655
+
656
+ - **Vector embeddings now stored correctly**: Fixed `RealAgentDBAdapter.store()` to properly store 384-dimension embeddings as BLOB data
657
+ - **SQL parameter style bug**: Fixed agentdb's `SqlJsDatabase` wrapper to use spread params instead of array params
658
+ - **HNSW index schema mismatch**: Added `pattern_id` generated column for agentdb's HNSWIndex compatibility
659
+ - **Learning experience retrieval**: Added missing getter methods for Q-learning and experience replay
660
+ - **Hooks saving to wrong database**: Fixed all Claude Code hooks to explicitly export `AGENTDB_PATH` so learning data is saved correctly
661
+ - **CI platform compatibility**: Moved ARM64-only ruvector packages to optionalDependencies for x64 CI compatibility
662
+
663
+ ### New Features
664
+
665
+ - **SwarmMemoryManager learning methods**: `getBestAction()`, `getRecentLearningExperiences()`, `getLearningStats()`, and more
666
+ - **Phase 4 Alerting & Reporting**: AlertManager, FeedbackRouter, StrategyApplicator modules
667
+ - **Quality Gate CI workflow**: GitHub Actions integration for automated quality validation
668
+
669
+ **Upgrade**: `npm install agentic-qe@1.9.4`
670
+
671
+ ---
672
+
648
673
  ## 📝 What's New in v1.9.3
649
674
 
650
675
  📦 **NPM Package Fix** (2025-11-26)
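Note on `getBestAction()`: this diff describes it only as Q-learning best action selection. As a rough illustration of that idea, and not the package's implementation, the sketch below picks the action with the highest Q-value for one agent and state from in-memory rows. The row shape (`agentId`, `stateKey`, `actionKey`, `qValue`), the sample values, and the function name `selectBestAction` are all assumptions.

```javascript
// Hypothetical rows; the real table layout is not shown in this diff.
const qValues = [
  { agentId: 'test-generator', stateKey: 'unit:ts', actionKey: 'property-based', qValue: 0.42 },
  { agentId: 'test-generator', stateKey: 'unit:ts', actionKey: 'example-based', qValue: 0.71 },
  { agentId: 'test-generator', stateKey: 'e2e:web', actionKey: 'record-replay', qValue: 0.93 },
];

function selectBestAction(rows, agentId, stateKey) {
  let best = null;
  for (const row of rows) {
    // Only consider Q-values recorded for this agent in this state.
    if (row.agentId !== agentId || row.stateKey !== stateKey) continue;
    if (best === null || row.qValue > best.qValue) best = row;
  }
  // null means the agent has no recorded Q-values for this state.
  return best;
}

const best = selectBestAction(qValues, 'test-generator', 'unit:ts');
console.log(best ? best.actionKey : 'no data');
```

A real agent would typically mix this greedy choice with some exploration; the diff does not say how the package handles that.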
package/config/.env.otel.example ADDED
@@ -0,0 +1,25 @@
1
+ # OTEL Stack Environment Variables
2
+ # Agentic QE Fleet - Issue #71
3
+ #
4
+ # Copy this file to .env.otel and customize as needed
5
+ # Usage: docker-compose -f config/docker-compose.otel.yml --env-file config/.env.otel up -d
6
+
7
+ # Deployment environment
8
+ DEPLOYMENT_ENVIRONMENT=development
9
+
10
+ # Grafana credentials (CHANGE IN PRODUCTION!)
11
+ GRAFANA_ADMIN_USER=admin
12
+ GRAFANA_ADMIN_PASSWORD=admin
13
+
14
+ # OTEL Collector settings
15
+ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
16
+ OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
17
+
18
+ # Service configuration
19
+ SERVICE_NAME=agentic-qe-fleet
20
+ SERVICE_NAMESPACE=agentic-qe
21
+ SERVICE_VERSION=1.9.3
22
+
23
+ # Prometheus retention
24
+ PROMETHEUS_RETENTION_TIME=15d
25
+ PROMETHEUS_RETENTION_SIZE=10GB
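Note on `OTEL_EXPORTER_OTLP_ENDPOINT`: for OTLP over HTTP, the OpenTelemetry specification treats this variable as a base URL and appends `v1/traces` for trace export, while a signal-specific `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is used as-is. The official exporters apply this rule themselves; the sketch below only spells it out. The function name `resolveTracesUrl` is hypothetical, and the fallback value mirrors the example file above.

```javascript
// Hypothetical helper; mirrors the OTLP/HTTP endpoint rules for traces.
function resolveTracesUrl(env = process.env) {
  // A signal-specific endpoint is taken verbatim.
  if (env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT) {
    return env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT;
  }
  // The generic endpoint is a base URL; the signal path is appended.
  const base = env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318';
  return `${base.replace(/\/+$/, '')}/v1/traces`;
}

console.log(resolveTracesUrl({ OTEL_EXPORTER_OTLP_ENDPOINT: 'http://localhost:4318' }));
```

With the values from `.env.otel.example`, the result matches the `http://localhost:4318/v1/traces` URL used in the stack's curl and Node.js examples.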
package/config/OTEL-QUICK-REFERENCE.md ADDED
@@ -0,0 +1,137 @@
1
+ # OTEL Stack Quick Reference Card
2
+
3
+ ## 🚀 One-Line Start
4
+
5
+ ```bash
6
+ docker-compose -f config/docker-compose.otel.yml up -d
7
+ ```
8
+
9
+ ## 🌐 Service URLs
10
+
11
+ | Service | URL | Login |
12
+ |---------|-----|-------|
13
+ | **Grafana** | http://localhost:3001 | admin/admin |
14
+ | **Prometheus** | http://localhost:9090 | - |
15
+ | **Jaeger** | http://localhost:16686 | - |
16
+ | **OTEL Health** | http://localhost:13133/health | - |
17
+
18
+ ## 📡 Send Telemetry
19
+
20
+ ### OTLP Endpoints
21
+ - gRPC: `localhost:4317`
22
+ - HTTP: `localhost:4318`
23
+
24
+ ### Node.js Example
25
+ ```javascript
26
+ const exporter = new OTLPTraceExporter({
27
+ url: 'http://localhost:4318/v1/traces'
28
+ });
29
+ ```
30
+
31
+ ### cURL Test
32
+ ```bash
33
+ curl http://localhost:4318/v1/traces \
34
+ -H "Content-Type: application/json" \
35
+ -d @trace.json
36
+ ```
37
+
38
+ ## 🔍 Quick Checks
39
+
40
+ ```bash
41
+ # Verify all services
42
+ ./scripts/verify-otel-stack.sh
43
+
44
+ # Check health
45
+ curl http://localhost:13133/health # OTEL Collector
46
+ curl http://localhost:9090/-/healthy # Prometheus
47
+ curl http://localhost:14269/ # Jaeger
48
+ curl http://localhost:3001/api/health # Grafana
49
+
50
+ # View logs
51
+ docker-compose -f config/docker-compose.otel.yml logs -f
52
+ ```
53
+
54
+ ## 🛠️ Common Commands
55
+
56
+ ```bash
57
+ # Start
58
+ docker-compose -f config/docker-compose.otel.yml up -d
59
+
60
+ # Stop
61
+ docker-compose -f config/docker-compose.otel.yml down
62
+
63
+ # Restart
64
+ docker-compose -f config/docker-compose.otel.yml restart
65
+
66
+ # View status
67
+ docker-compose -f config/docker-compose.otel.yml ps
68
+
69
+ # Logs
70
+ docker-compose -f config/docker-compose.otel.yml logs -f [service]
71
+
72
+ # Remove everything (INCLUDING DATA!)
73
+ docker-compose -f config/docker-compose.otel.yml down -v
74
+ ```
75
+
76
+ ## 📊 Port Reference
77
+
78
+ ### OTEL Collector
79
+ - 4317 - OTLP gRPC
80
+ - 4318 - OTLP HTTP
81
+ - 8889 - Prometheus metrics
82
+ - 13133 - Health check
83
+
84
+ ### Prometheus
85
+ - 9090 - Web UI
86
+
87
+ ### Jaeger
88
+ - 16686 - UI
89
+ - 14269 - Metrics
90
+
91
+ ### Grafana
92
+ - 3001 - Web UI
93
+
94
+ ## 🔧 Configuration Files
95
+
96
+ - **Docker Compose**: `config/docker-compose.otel.yml`
97
+ - **OTEL Collector**: `config/otel-collector-config.yaml.example`
98
+ - **Prometheus**: `config/prometheus.yml.example`
99
+ - **Grafana Datasources**: `config/grafana/provisioning/datasources/datasources.yml`
100
+ - **Grafana Dashboards**: `config/grafana/provisioning/dashboards/dashboards.yml`
101
+
102
+ ## 📚 Documentation
103
+
104
+ - **Quick Start**: `config/README-OTEL.md`
105
+ - **Full Summary**: `docs/implementation-plans/issue-71-completion-summary.md`
106
+ - **Architecture**: `docs/architecture/otel-stack-architecture.md`
107
+
108
+ ## 🐛 Quick Troubleshooting
109
+
110
+ | Issue | Solution |
111
+ |-------|----------|
112
+ | Services not starting | Check logs: `docker-compose -f config/docker-compose.otel.yml logs` |
113
+ | Port already in use | Change external port in `docker-compose.otel.yml` |
114
+ | Grafana can't connect | Check datasources: Grafana → Configuration → Data Sources |
115
+ | No metrics in Prometheus | Check targets: http://localhost:9090/targets |
116
+ | No traces in Jaeger | Verify OTLP endpoint: `curl http://localhost:4318` |
117
+
118
+ ## 🎯 Grafana Datasources
119
+
120
+ Pre-configured and auto-loaded:
121
+
122
+ 1. **Prometheus** (default) - `http://prometheus:9090`
123
+ 2. **Jaeger** - `http://jaeger:16686`
124
+ 3. **OTEL Collector Metrics** - `http://otel-collector:8889`
125
+
126
+ ## ✅ Verification Checklist
127
+
128
+ - [ ] All 4 services running: `docker-compose -f config/docker-compose.otel.yml ps`
129
+ - [ ] Health checks passing: `./scripts/verify-otel-stack.sh`
130
+ - [ ] OTLP endpoints accessible: `curl http://localhost:4318`
131
+ - [ ] Prometheus targets green: http://localhost:9090/targets
132
+ - [ ] Grafana datasources connected: Grafana UI → Data Sources
133
+ - [ ] Sample dashboard visible: Grafana → Dashboards
134
+
135
+ ---
136
+
137
+ **Issue #71 - COMPLETED** ✅
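Note on the cURL test: the quick reference posts `@trace.json` but the file itself is not part of this diff. The sketch below generates a minimal OTLP/JSON trace payload that could serve as that file. The function name `buildTracePayload`, the scope name, the span name, and the 5 ms duration are illustrative assumptions; the service name is the one used in `.env.otel.example`.

```javascript
const crypto = require('node:crypto');

// Hypothetical generator for a one-span OTLP/JSON payload.
function buildTracePayload(serviceName, spanName, now = Date.now()) {
  const startNano = BigInt(now) * 1000000n; // milliseconds to nanoseconds
  const endNano = startNano + 5000000n; // pretend the span lasted 5 ms
  return {
    resourceSpans: [{
      resource: {
        attributes: [{ key: 'service.name', value: { stringValue: serviceName } }],
      },
      scopeSpans: [{
        scope: { name: 'manual-smoke-test' },
        spans: [{
          traceId: crypto.randomBytes(16).toString('hex'), // 32 hex characters
          spanId: crypto.randomBytes(8).toString('hex'), // 16 hex characters
          name: spanName,
          kind: 1, // SPAN_KIND_INTERNAL
          startTimeUnixNano: startNano.toString(),
          endTimeUnixNano: endNano.toString(),
        }],
      }],
    }],
  };
}

const payload = buildTracePayload('agentic-qe-fleet', 'smoke-test');
console.log(JSON.stringify(payload, null, 2));
```

Saved as, say, `make-trace.js`, it could be run as `node make-trace.js > trace.json` before the curl command.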
package/config/README-OTEL.md ADDED
@@ -0,0 +1,222 @@
1
+ # OTEL Observability Stack - Quick Start Guide
2
+
3
+ Complete observability stack for the Agentic QE Fleet with OpenTelemetry, Prometheus, Jaeger, and Grafana.
4
+
5
+ ## 🚀 Quick Start
6
+
7
+ ### 1. Start the OTEL Stack
8
+
9
+ ```bash
10
+ # Start only the OTEL stack
11
+ docker-compose -f config/docker-compose.otel.yml up -d
12
+
13
+ # Or combine with the main application
14
+ docker-compose -f docker-compose.yml -f config/docker-compose.otel.yml up -d
15
+ ```
16
+
17
+ ### 2. Access the Services
18
+
19
+ | Service | URL | Purpose |
20
+ |---------|-----|---------|
21
+ | **Grafana** | http://localhost:3001 | Dashboards and visualization |
22
+ | **Prometheus** | http://localhost:9090 | Metrics storage and querying |
23
+ | **Jaeger UI** | http://localhost:16686 | Distributed tracing |
24
+ | **OTEL Collector** | http://localhost:13133/health | Health check |
25
+
26
+ ### 3. Default Credentials
27
+
28
+ - **Grafana**: `admin` / `admin` (change on first login)
29
+
30
+ ## 📊 Service Endpoints
31
+
32
+ ### OTEL Collector
33
+ - **OTLP gRPC**: `localhost:4317` - Send traces/metrics via gRPC
34
+ - **OTLP HTTP**: `localhost:4318` - Send traces/metrics via HTTP
35
+ - **Prometheus Exporter**: `localhost:8889` - Metrics endpoint
36
+ - **Health Check**: `localhost:13133` - Collector health
37
+ - **pprof**: `localhost:1777` - Performance profiling
38
+ - **zPages**: `localhost:55679` - Debug interface
39
+
40
+ ### Prometheus
41
+ - **Web UI**: `localhost:9090` - Query and explore metrics
42
+ - **API**: `localhost:9090/api/v1/` - Prometheus HTTP API
43
+
44
+ ### Jaeger
45
+ - **UI**: `localhost:16686` - Trace visualization
46
+ - **OTLP gRPC**: `localhost:4327` - Receive traces (forwarded from collector)
47
+ - **Metrics**: `localhost:14269/metrics` - Jaeger metrics
48
+ - **Health**: `localhost:14269/` - Health check
49
+
50
+ ### Grafana
51
+ - **Web UI**: `localhost:3001` - Dashboards and visualization
52
+ - **API**: `localhost:3001/api/` - Grafana HTTP API
53
+
54
+ ## 🔧 Configuration Files
55
+
56
+ ### Required Files (Already Created)
57
+ - `config/docker-compose.otel.yml` - Docker Compose configuration
58
+ - `config/otel-collector-config.yaml.example` - OTEL Collector config
59
+ - `config/prometheus.yml.example` - Prometheus scrape config
60
+ - `config/grafana/provisioning/datasources/datasources.yml` - Grafana datasources
61
+ - `config/grafana/provisioning/dashboards/dashboards.yml` - Dashboard provisioning
62
+ - `config/grafana/dashboards/agentic-qe-overview.json` - Sample dashboard
63
+
64
+ ### Environment Variables (Optional)
65
+ Copy and customize:
66
+ ```bash
67
+ cp config/.env.otel.example config/.env.otel
68
+ ```
69
+
70
+ Then use:
71
+ ```bash
72
+ docker-compose -f config/docker-compose.otel.yml --env-file config/.env.otel up -d
73
+ ```
74
+
75
+ ## 📈 Using the Stack
76
+
77
+ ### Send Telemetry to OTEL Collector
78
+
79
+ #### Via HTTP (curl example)
80
+ ```bash
81
+ curl -X POST http://localhost:4318/v1/traces \
82
+ -H "Content-Type: application/json" \
83
+ -d @trace-data.json
84
+ ```
85
+
86
+ #### Via Node.js Application
87
+ ```javascript
88
+ const { NodeTracerProvider, BatchSpanProcessor } = require('@opentelemetry/sdk-trace-node');
89
+ const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
90
+
91
+ const provider = new NodeTracerProvider();
92
+ provider.addSpanProcessor(
93
+ new BatchSpanProcessor(
94
+ new OTLPTraceExporter({
95
+ url: 'http://localhost:4318/v1/traces'
96
+ })
97
+ )
98
+ );
99
+ provider.register();
100
+ ```
101
+
102
+ ### Query Metrics in Prometheus
103
+
104
+ 1. Open http://localhost:9090
105
+ 2. Try queries:
106
+ - `aqe_requests_total` - Total requests
107
+ - `rate(aqe_requests_total[5m])` - Request rate
108
+ - `histogram_quantile(0.95, rate(aqe_request_duration_bucket[5m]))` - P95 latency
109
+
110
+ ### View Traces in Jaeger
111
+
112
+ 1. Open http://localhost:16686
113
+ 2. Select service: `agentic-qe-fleet`
114
+ 3. Click "Find Traces"
115
+ 4. Explore trace details and service dependencies
116
+
117
+ ### Create Dashboards in Grafana
118
+
119
+ 1. Open http://localhost:3001
120
+ 2. Login with `admin` / `admin`
121
+ 3. Navigate to Dashboards → Agentic QE Fleet → Overview
122
+ 4. Or create new dashboards using Prometheus and Jaeger datasources
123
+
124
+ ## 🛠️ Management Commands
125
+
126
+ ### View Logs
127
+ ```bash
128
+ # All services
129
+ docker-compose -f config/docker-compose.otel.yml logs -f
130
+
131
+ # Specific service
132
+ docker-compose -f config/docker-compose.otel.yml logs -f otel-collector
133
+ docker-compose -f config/docker-compose.otel.yml logs -f prometheus
134
+ docker-compose -f config/docker-compose.otel.yml logs -f jaeger
135
+ docker-compose -f config/docker-compose.otel.yml logs -f grafana
136
+ ```
137
+
138
+ ### Check Service Health
139
+ ```bash
140
+ # OTEL Collector
141
+ curl http://localhost:13133/health
142
+
143
+ # Prometheus
144
+ curl http://localhost:9090/-/healthy
145
+
146
+ # Jaeger
147
+ curl http://localhost:14269/
148
+
149
+ # Grafana
150
+ curl http://localhost:3001/api/health
151
+ ```
152
+
153
+ ### Stop Services
154
+ ```bash
155
+ # Stop OTEL stack
156
+ docker-compose -f config/docker-compose.otel.yml down
157
+
158
+ # Stop and remove volumes (CAUTION: deletes all data)
159
+ docker-compose -f config/docker-compose.otel.yml down -v
160
+ ```
161
+
162
+ ### Restart Services
163
+ ```bash
164
+ # Restart all
165
+ docker-compose -f config/docker-compose.otel.yml restart
166
+
167
+ # Restart specific service
168
+ docker-compose -f config/docker-compose.otel.yml restart otel-collector
169
+ ```
170
+
171
+ ## 🔍 Troubleshooting
172
+
173
+ ### OTEL Collector Not Receiving Data
174
+ 1. Check collector logs: `docker-compose -f config/docker-compose.otel.yml logs otel-collector`
175
+ 2. Verify endpoints: `curl http://localhost:13133/health`
176
+ 3. Check application OTLP endpoint: `http://localhost:4318`
177
+
178
+ ### Prometheus Not Scraping Metrics
179
+ 1. Check Prometheus targets: http://localhost:9090/targets
180
+ 2. Verify OTEL Collector is exposing metrics: `curl http://localhost:8889/metrics`
181
+ 3. Check Prometheus config: `docker-compose -f config/docker-compose.otel.yml exec prometheus cat /etc/prometheus/prometheus.yml`
182
+
183
+ ### Jaeger Not Showing Traces
184
+ 1. Check Jaeger logs: `docker-compose -f config/docker-compose.otel.yml logs jaeger`
185
+ 2. Verify OTEL Collector is forwarding traces (check collector logs)
186
+ 3. Ensure application is sending traces to OTLP endpoint
187
+
188
+ ### Grafana Datasources Not Working
189
+ 1. Check datasource configuration: Grafana UI → Configuration → Data Sources
190
+ 2. Test datasource connection (should show green checkmark)
191
+ 3. Verify Prometheus/Jaeger are accessible from Grafana container
192
+
193
+ ### Performance Issues
194
+ 1. Adjust OTEL Collector batch size in `otel-collector-config.yaml.example`
195
+ 2. Reduce Prometheus scrape interval in `prometheus.yml.example`
196
+ 3. Adjust memory limits for services in `docker-compose.otel.yml`
197
+
198
+ ## 📚 Next Steps
199
+
200
+ 1. **Integrate with Application**: Configure your app to send telemetry to OTLP endpoints
201
+ 2. **Create Custom Dashboards**: Build Grafana dashboards for your specific metrics
202
+ 3. **Set Up Alerting**: Configure Prometheus alerting rules (see Phase 4 docs)
203
+ 4. **Production Hardening**:
204
+ - Change default passwords
205
+ - Enable TLS/authentication
206
+ - Configure persistent storage
207
+ - Set up backup/restore procedures
208
+
209
+ ## 📖 Related Documentation
210
+
211
+ - [OTEL Stack Architecture](../docs/architecture/otel-stack-architecture.md)
212
+ - [Phase 4 Alerting Implementation Plan](../docs/implementation-plans/phase4-alerting-implementation-plan.md)
213
+ - [OpenTelemetry Documentation](https://opentelemetry.io/docs/)
214
+ - [Prometheus Documentation](https://prometheus.io/docs/)
215
+ - [Jaeger Documentation](https://www.jaegertracing.io/docs/)
216
+ - [Grafana Documentation](https://grafana.com/docs/)
217
+
218
+ ## 🐛 Issue Tracking
219
+
220
+ This implementation resolves **Issue #71**: Complete OTEL Stack Docker Compose Configuration
221
+
222
+ For issues or improvements, please file an issue on the repository.
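Note on the P95 query under "Query Metrics in Prometheus": `histogram_quantile` estimates a quantile by linear interpolation inside cumulative histogram buckets. The sketch below is a simplified version of that calculation with made-up bucket counts. It ignores Prometheus edge cases such as negative bucket bounds and native histograms, and the function name and sample data are assumptions.

```javascript
// Cumulative buckets sorted by upper bound `le`; the last bucket is +Inf.
const buckets = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 98 },
  { le: Infinity, count: 100 },
];

function histogramQuantile(q, sorted) {
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  // First bucket whose cumulative count reaches the target rank.
  const i = sorted.findIndex((b) => b.count >= rank);
  // Inside the +Inf bucket there is no upper bound to interpolate towards,
  // so fall back to the highest finite bound.
  if (i === sorted.length - 1) return sorted[sorted.length - 2].le;
  const lowerBound = i === 0 ? 0 : sorted[i - 1].le;
  const lowerCount = i === 0 ? 0 : sorted[i - 1].count;
  const inBucket = sorted[i].count - lowerCount;
  if (inBucket === 0) return lowerBound;
  // Assume observations are spread evenly within the bucket.
  return lowerBound + (sorted[i].le - lowerBound) * ((rank - lowerCount) / inBucket);
}

console.log(histogramQuantile(0.95, buckets));
```

With these sample counts the 95th percentile falls in the 0.5 to 1 bucket, so the estimate lands between those two bounds. The accuracy of a real P95 depends on how finely the `aqe_request_duration` buckets are configured.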