@intentsolutionsio/fairdb-operations-kit 1.0.0
- package/.claude-plugin/plugin.json +26 -0
- package/LICENSE +21 -0
- package/README.md +298 -0
- package/agents/fairdb-automation-agent.md +307 -0
- package/commands/fairdb-emergency-response.md +480 -0
- package/commands/fairdb-health-check.md +459 -0
- package/commands/fairdb-onboard-customer.md +446 -0
- package/commands/fairdb-setup-backup.md +420 -0
- package/package.json +48 -0
- package/skills/fairdb-backup-manager/SKILL.md +72 -0
- package/skills/fairdb-backup-manager/assets/README.md +26 -0
- package/skills/fairdb-backup-manager/references/README.md +26 -0
- package/skills/fairdb-backup-manager/scripts/README.md +24 -0
- package/skills/skill-adapter/assets/README.md +4 -0
- package/skills/skill-adapter/assets/config-template.json +32 -0
- package/skills/skill-adapter/assets/skill-schema.json +28 -0
- package/skills/skill-adapter/assets/test-data.json +27 -0
- package/skills/skill-adapter/references/README.md +4 -0
- package/skills/skill-adapter/references/best-practices.md +69 -0
- package/skills/skill-adapter/references/examples.md +73 -0
- package/skills/skill-adapter/scripts/README.md +10 -0
- package/skills/skill-adapter/scripts/helper-template.sh +42 -0
- package/skills/skill-adapter/scripts/validation.sh +32 -0
package/.claude-plugin/plugin.json
ADDED
@@ -0,0 +1,26 @@
{
  "name": "fairdb-operations-kit",
  "version": "1.0.0",
  "description": "Complete operations kit for FairDB PostgreSQL as a Service - VPS setup, PostgreSQL management, customer provisioning, monitoring, and backup automation",
  "author": {
    "name": "Jeremy Longshore",
    "email": "jeremy@intentsolutions.io"
  },
  "repository": "https://github.com/jeremylongshore/claude-code-plugins",
  "license": "MIT",
  "keywords": [
    "fairdb",
    "postgresql",
    "database",
    "saas",
    "operations",
    "devops",
    "backup",
    "monitoring",
    "vps",
    "contabo",
    "wasabi",
    "pgbackrest",
    "automation"
  ]
}
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Jeremy Longshore

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,298 @@
# FairDB Operations Kit

A comprehensive Claude Code plugin suite for managing FairDB PostgreSQL as a Service operations. This plugin automates VPS provisioning, PostgreSQL management, backup configuration, customer onboarding, and monitoring workflows.

## Overview

FairDB is a managed PostgreSQL-as-a-Service platform built on Contabo VPS infrastructure with pgBackRest backups to Wasabi S3 storage. This plugin kit provides Claude with the ability to execute complex operational tasks through natural language commands.

## Features

- **VPS Provisioning**: Automated Contabo VPS setup with security hardening
- **PostgreSQL Management**: Install, configure, and optimize PostgreSQL 16
- **Backup System**: pgBackRest configuration with Wasabi S3 integration
- **Customer Provisioning**: Automated database and user creation workflows
- **Monitoring**: Health checks, performance monitoring, and alerting
- **Incident Response**: Guided troubleshooting and recovery procedures
- **Intelligent Automation**: AI-powered agent for proactive management

## Installation

```bash
/plugin install fairdb-operations-kit@claude-code-plugins-plus
```

## Commands

### Infrastructure Setup

- `/fairdb-provision-vps` - Complete VPS setup with security hardening (implements SOP-001)
- `/fairdb-install-postgres` - Install and configure PostgreSQL 16 for production (implements SOP-002)
- `/fairdb-setup-backup` - Configure pgBackRest with Wasabi S3 storage (implements SOP-003)

### Customer Management

- `/fairdb-onboard-customer` - Complete customer provisioning workflow
  - Creates database and users
  - Configures network access
  - Sets up backups
  - Generates SSL certificates
  - Provides connection documentation

### Operations & Monitoring

- `/fairdb-health-check` - Comprehensive system health verification
  - Server resources check
  - Database performance metrics
  - Backup status verification
  - Security audit

- `/fairdb-emergency-response` - Critical incident response procedures
  - Service recovery
  - Data integrity checks
  - Performance triage
  - Root cause analysis

## Agent Capabilities

The `fairdb-automation-agent` provides intelligent automation for:

- **Proactive Monitoring**: Continuous analysis and prediction of issues
- **Automated Problem Resolution**: Pattern-based diagnosis and fixes
- **Resource Optimization**: Dynamic parameter tuning and workload balancing
- **Automated Operations**: Routine maintenance and backup management

## Skills

### FairDB Backup Manager

An Agent Skill that automatically activates when working with backups:
- Manages pgBackRest configurations
- Executes scheduled backups
- Performs test restores
- Monitors backup health
- Optimizes storage costs

## Architecture

```
FairDB Infrastructure Stack
├── Contabo VPS
│   ├── Ubuntu 24.04 LTS
│   ├── PostgreSQL 16
│   ├── pgBackRest
│   └── Monitoring (Prometheus/Grafana)
├── Wasabi S3 Storage
│   ├── Full backups (weekly)
│   ├── Differential backups (daily)
│   └── WAL archives (continuous)
└── Security Layer
    ├── UFW Firewall
    ├── Fail2ban IPS
    ├── SSL/TLS encryption
    └── Key-based SSH
```
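
The backup cadence in the architecture tree (weekly full, daily differential) could be driven by a small wrapper along these lines. This is only a sketch under assumptions: the stanza name `fairdb` is borrowed from the troubleshooting commands in this README, while the choice of Sunday for the full backup and both function names are illustrative and not specified by the package. WAL archiving is continuous and is normally handled through PostgreSQL's `archive_command`, so it needs no schedule here.

```bash
#!/usr/bin/env bash
# Sketch: pick a pgBackRest backup type from the ISO weekday (1=Mon .. 7=Sun).
# Assumption: full backups run on Sunday; every other day gets a differential.
backup_type_for_day() {
  case "$1" in
    7) echo "full" ;;
    1|2|3|4|5|6) echo "diff" ;;
    *) echo "invalid weekday: $1" >&2; return 1 ;;
  esac
}

# Print (rather than run) the command so the sketch is safe to try anywhere.
backup_command_for_day() {
  btype=$(backup_type_for_day "$1") || return 1
  echo "pgbackrest --stanza=fairdb --type=${btype} backup"
}

# Show the command that would apply today.
backup_command_for_day "$(date +%u)"
```

Printing the command instead of executing it keeps the weekday logic testable on a machine without pgBackRest; a real job would run the printed command as the `postgres` user.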

## Standard Operating Procedures

This plugin implements three core SOPs:

### SOP-001: VPS Hardening
- OS security updates
- Firewall configuration (UFW)
- Intrusion prevention (Fail2ban)
- SSH hardening
- Monitoring setup

### SOP-002: PostgreSQL Installation
- PostgreSQL 16 from official repos
- Production configuration tuning
- SSL certificate generation
- User and permission management
- Performance optimization

### SOP-003: Backup Configuration
- pgBackRest installation
- Wasabi S3 integration
- Retention policy setup
- Automated scheduling
- Recovery testing

## Configuration

Required environment variables:

```bash
# Contabo API
CONTABO_API_KEY=<your-api-key>

# Wasabi S3
WASABI_ACCESS_KEY=<access-key>
WASABI_SECRET_KEY=<secret-key>
WASABI_BUCKET=<bucket-name>
WASABI_ENDPOINT=<region-endpoint>

# PostgreSQL
FAIRDB_ADMIN_USER=<admin-username>
FAIRDB_ADMIN_PASS=<admin-password>

# Monitoring
FAIRDB_MONITORING_WEBHOOK=<webhook-url>
```
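
Before running any command it may help to confirm that all of these variables are actually set. The following is a minimal sketch, not part of the plugin: the variable names are the ones listed above, and `missing_env_vars` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: report which of the required variables are unset or empty.
# The variable names come from the list above; the function name is hypothetical.
REQUIRED_VARS="CONTABO_API_KEY WASABI_ACCESS_KEY WASABI_SECRET_KEY WASABI_BUCKET WASABI_ENDPOINT FAIRDB_ADMIN_USER FAIRDB_ADMIN_PASS FAIRDB_MONITORING_WEBHOOK"

missing_env_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing}${missing:+ }${name}"
    fi
  done
  echo "$missing"
}

# Word splitting of the name list is intended here.
missing=$(missing_env_vars $REQUIRED_VARS)
if [ -n "$missing" ]; then
  echo "Missing: $missing"
else
  echo "All required variables are set"
fi
```

The check only tests for presence; it never prints a value, so secrets such as `WASABI_SECRET_KEY` stay out of logs.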

## Usage Examples

### Complete New Customer Setup

```
User: "I need to onboard Acme Corp as a new FairDB customer"

Claude: I'll help you onboard Acme Corp. Let me use the fairdb-onboard-customer command...

[Executes complete workflow]:
✅ Database created: acme_corp_db
✅ Users created with secure passwords
✅ Network access configured for customer IPs
✅ Backups scheduled (daily at 3 AM)
✅ SSL certificates generated
✅ Monitoring enabled
✅ Connection documentation generated

Customer can now connect using:
postgresql://acme_corp_user:<password>@fairdb.example.com:5432/acme_corp_db?sslmode=require
```
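
The example derives `acme_corp_db` and `acme_corp_user` from the customer name "Acme Corp". One way such names and the connection string might be produced is sketched below. The slug rule (lowercase, runs of non-alphanumeric characters collapsed to `_`) is an assumption inferred from that single example, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: derive identifiers like acme_corp_db from a customer name.
# Assumption: lowercase, runs of non-alphanumerics become "_", ends trimmed.
customer_slug() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9][^a-z0-9]*/_/g' -e 's/^_//' -e 's/_$//'
}

# Build the connection URI with a literal <password> placeholder,
# so no secret appears in generated documentation.
connection_uri() {
  slug=$(customer_slug "$1")
  host=$2
  echo "postgresql://${slug}_user:<password>@${host}:5432/${slug}_db?sslmode=require"
}

connection_uri "Acme Corp" "fairdb.example.com"
```

Keeping the password as a placeholder means the generated string can be pasted into customer documentation, with the real credential delivered separately.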

### Emergency Response

```
User: "Critical: customers reporting connection failures!"

Claude: Initiating emergency response with fairdb-emergency-response...

[STEP 1] SERVICE STATUS CHECK
❌ CRITICAL: PostgreSQL is DOWN
Attempting emergency restart...
✅ PostgreSQL restarted successfully

[STEP 2] CONNECTION DIAGNOSTICS
Connections: 198/200 (99% utilized)
Terminating idle connections...
✅ Freed 47 connections

[STEP 3] PERFORMANCE TRIAGE
Found 3 long-running queries (>30 minutes)
Cancelling problematic queries...
✅ Performance restored

Root Cause: Connection pool exhaustion due to application bug
Resolution: Service restored, customer notified, RCA documented
```

### Health Monitoring

```
User: "Run a health check on FairDB infrastructure"

Claude: Running comprehensive health check with fairdb-health-check...

[1/10] SERVER RESOURCES
✅ CPU: 42% | Memory: 68% | Disk: 71%

[2/10] POSTGRESQL SERVICE
✅ Running | Version: 16.1 | Uptime: 47 days

[3/10] DATABASE CONNECTIONS
Active: 87/200 (43.5%)

[4/10] BACKUP STATUS
✅ Last backup: 2 hours ago
✅ All customer stanzas current

[10/10] OVERALL HEALTH: GOOD
```
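
The connection figures in the two transcripts (87/200 and 198/200) are plain utilization percentages. A sketch of that arithmetic follows; the warning and critical thresholds (80% and 95%) are hypothetical values chosen for illustration, not thresholds documented by the plugin.

```bash
#!/usr/bin/env bash
# Sketch: connection utilization as a percentage with one decimal place.
conn_utilization() {
  awk -v used="$1" -v max="$2" 'BEGIN {
    if (max <= 0) { print "max must be positive" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", used * 100 / max
  }'
}

# Hypothetical thresholds (not from the package): warn at 80%, critical at 95%.
conn_status() {
  pct=$(conn_utilization "$1" "$2") || return 1
  awk -v p="$pct" 'BEGIN {
    if (p >= 95) print "CRITICAL"; else if (p >= 80) print "WARNING"; else print "OK"
  }'
}

echo "$(conn_utilization 87 200)% -> $(conn_status 87 200)"
echo "$(conn_utilization 198 200)% -> $(conn_status 198 200)"
```

In practice the two inputs would come from `SELECT count(*) FROM pg_stat_activity` and `SHOW max_connections`.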

## Best Practices

### Security
- Always use SSL/TLS connections
- Rotate passwords quarterly
- Keep IP allowlists updated
- Regular security audits
- Encrypted backups only

### Performance
- Monitor connection pools
- Regular VACUUM ANALYZE
- Index optimization
- Query performance reviews
- Resource usage tracking

### Reliability
- Test restores monthly
- Document all procedures
- Maintain runbooks
- Practice incident response
- Keep backups current

## Troubleshooting

### Common Issues

**PostgreSQL Won't Start**
```bash
# Check logs (on Ubuntu the cluster runs as the postgresql@16-main unit)
sudo journalctl -u postgresql@16-main -n 50

# Verify data directory (Ubuntu packages do not put pg_ctl on the PATH)
sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl -D /var/lib/postgresql/16/main status

# Check port conflicts (ss replaces netstat, which Ubuntu 24.04 does not install by default)
sudo ss -tulpn | grep 5432
```

**Backup Failures**
```bash
# Check pgBackRest status
sudo -u postgres pgbackrest --stanza=fairdb check

# Verify S3 connectivity
aws s3 ls s3://bucket --endpoint-url=https://s3.wasabisys.com

# Review backup logs
tail -f /var/log/pgbackrest/fairdb-backup.log
```

**High Connection Usage**
```bash
# View active connections
sudo -u postgres psql -c "SELECT * FROM pg_stat_activity;"

# Kill connections idle for more than 10 minutes (never the current session)
sudo -u postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND pid <> pg_backend_pid() AND state_change < NOW() - INTERVAL '10 minutes';"
```
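
Terminating sessions is disruptive, so it can be worth counting the affected sessions before killing them. The helper below is a hedged sketch: `idle_sessions_sql` is a hypothetical name, it only prints SQL, and the extra `pid <> pg_backend_pid()` guard is a precaution added here rather than something the original command includes.

```bash
#!/usr/bin/env bash
# Sketch: build the idle-session query for a configurable age threshold.
# Hypothetical helper; it prints SQL and never connects to a database.
idle_sessions_sql() {
  action=$1   # "count" (dry run) or "terminate"
  minutes=$2
  # Accept digits only, so the value can be embedded in the SQL safely.
  case "$minutes" in
    ''|*[!0-9]*) echo "minutes must be a non-negative integer" >&2; return 1 ;;
  esac
  case "$action" in
    count)     projection="count(*)" ;;
    terminate) projection="pg_terminate_backend(pid)" ;;
    *) echo "unknown action: $action" >&2; return 1 ;;
  esac
  echo "SELECT ${projection} FROM pg_stat_activity WHERE state = 'idle' AND pid <> pg_backend_pid() AND state_change < NOW() - INTERVAL '${minutes} minutes';"
}

# Dry run first: print the query that counts the sessions that would be affected.
idle_sessions_sql count 10
# Once the number looks right, the terminate form can be piped to psql, e.g.:
#   idle_sessions_sql terminate 10 | sudo -u postgres psql
```

Both forms share one `WHERE` clause, so the dry-run count and the later termination select the same sessions.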

## Support & Contributions

- **Issues**: Report via GitHub Issues
- **Documentation**: See command help for details
- **Updates**: Check for plugin updates regularly
- **Contributing**: PRs welcome for improvements

## License

MIT License - See LICENSE file for details

## Author

Jeremy Longshore (jeremy@intentsolutions.io)

## Acknowledgments

- PostgreSQL community for excellent documentation
- pgBackRest team for reliable backup solution
- Wasabi for cost-effective S3 storage
- Contabo for reliable VPS infrastructure

---

*Part of the Claude Code Plugins ecosystem - https://claudecodeplugins.io*
package/agents/fairdb-automation-agent.md
ADDED
@@ -0,0 +1,307 @@
---
name: fairdb-automation-agent
description: Intelligent automation agent for FairDB PostgreSQL operations
model: sonnet
---

# FairDB Automation Agent

I am an intelligent automation agent specialized in managing FairDB PostgreSQL as a Service operations. I can analyze situations, make decisions, and execute complex workflows autonomously.

## Core Capabilities

### 1. Proactive Monitoring
- Continuously analyze system health metrics
- Predict potential issues before they occur
- Automatically trigger preventive maintenance
- Optimize performance based on usage patterns

### 2. Intelligent Problem Resolution
- Diagnose issues using pattern recognition
- Apply appropriate fixes based on historical data
- Escalate to humans only when necessary
- Learn from each incident for future prevention

### 3. Resource Optimization
- Dynamically adjust PostgreSQL parameters
- Manage connection pools efficiently
- Balance workload across customers
- Optimize query performance automatically

### 4. Automated Operations
- Handle routine maintenance tasks
- Execute backup and recovery procedures
- Manage customer provisioning workflows
- Perform security audits and updates

## Decision Framework

When handling any FairDB operation, I follow this decision tree:

1. **Assess Situation**
   - Gather all relevant metrics
   - Check historical patterns
   - Evaluate risk levels

2. **Determine Action**
   - Can this be automated safely? → Execute
   - Does it require human approval? → Request permission
   - Is it outside my scope? → Escalate with recommendations

3. **Execute & Monitor**
   - Perform the action with safety checks
   - Monitor the results in real-time
   - Roll back if unexpected outcomes occur

4. **Learn & Improve**
   - Document the outcome
   - Update knowledge base
   - Refine future responses
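
The "Determine Action" step can be read as a three-way branch. The sketch below is one possible encoding, not the agent's actual logic: the yes/no inputs are hypothetical flags that would come out of the assessment step, and the order of the checks (scope first, then approval, then safety) is a deliberately conservative assumption.

```bash
#!/usr/bin/env bash
# Sketch of the "Determine Action" step. The three yes/no inputs are
# hypothetical flags that a real agent would derive from its assessment.
determine_action() {
  in_scope=$1          # yes|no: is the task within the agent's remit?
  needs_approval=$2    # yes|no: does policy require a human sign-off?
  safe_to_automate=$3  # yes|no: did the safety checks pass?
  if [ "$in_scope" != "yes" ]; then
    echo "escalate"
  elif [ "$needs_approval" = "yes" ]; then
    echo "request-permission"
  elif [ "$safe_to_automate" = "yes" ]; then
    echo "execute"
  else
    echo "escalate"
  fi
}

determine_action yes no yes
```

Checking scope and approval before safety means an action is executed only when every gate passes; anything ambiguous falls through to escalation.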

## Automated Workflows

### Daily Operations Cycle

```bash
# Morning Health Check (6 AM)
/fairdb-health-check
# Analyze results and address any issues

# Backup Verification (8 AM)
pgbackrest --stanza=fairdb check
# Ensure all customer backups are current

# Performance Tuning (10 AM)
# Analyze query patterns and adjust parameters
# Vacuum and analyze tables as needed

# Capacity Planning (2 PM)
# Review growth trends
# Predict resource needs
# Alert if scaling required

# Security Audit (4 PM)
# Check for vulnerabilities
# Review access logs
# Update security policies

# Evening Report (6 PM)
# Generate daily summary
# Highlight any concerns
# Plan next day's priorities
```

### Incident Response Workflow

When an incident is detected:

1. **Immediate Assessment**
   - Determine severity (P1-P4)
   - Identify affected customers
   - Check for data integrity issues

2. **Automatic Remediation**
   - Apply known fixes for common issues
   - Restart services if safe to do so
   - Clear blocking locks or queries
   - Free up resources if needed

3. **Escalation Decision**
   - If auto-fix successful → Monitor and document
   - If auto-fix failed → Alert on-call engineer
   - If data at risk → Immediate human intervention

4. **Post-Incident Actions**
   - Generate incident report
   - Update runbooks
   - Schedule preventive measures
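
The escalation rules in step 3 map onto a short conditional. In this sketch the function name and the string values are hypothetical, and giving "data at risk" precedence over the auto-fix result is an assumption about how the three rules are meant to combine.

```bash
#!/usr/bin/env bash
# Sketch of the escalation decision; names and return strings are hypothetical.
escalation_decision() {
  autofix=$1       # succeeded|failed
  data_at_risk=$2  # yes|no
  if [ "$data_at_risk" = "yes" ]; then
    # Data risk overrides everything else, even a successful auto-fix.
    echo "immediate-human-intervention"
  elif [ "$autofix" = "succeeded" ]; then
    echo "monitor-and-document"
  else
    echo "alert-on-call"
  fi
}

escalation_decision failed no
```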

### Customer Onboarding Automation

When a new customer signs up:

1. **Validate Requirements**
   - Check resource availability
   - Verify plan limits
   - Assess special requirements

2. **Provision Resources**
   - Execute `/fairdb-onboard-customer`
   - Configure backups
   - Set up monitoring
   - Generate credentials

3. **Quality Assurance**
   - Test all connections
   - Verify backup functionality
   - Check performance baselines

4. **Customer Communication**
   - Send welcome email
   - Provide connection details
   - Schedule onboarding call

## Intelligence Patterns

### Performance Optimization

I analyze patterns to optimize performance:

- **Query Pattern Analysis**: Identify frequently run queries and suggest indexes
- **Connection Pattern Recognition**: Adjust pool sizes based on usage patterns
- **Resource Usage Prediction**: Anticipate peak loads and pre-scale resources
- **Maintenance Window Selection**: Choose optimal times for maintenance based on activity

### Security Monitoring

I continuously monitor for security threats:

- **Anomaly Detection**: Identify unusual access patterns
- **Vulnerability Scanning**: Check for known PostgreSQL vulnerabilities
- **Access Audit**: Review and report suspicious login attempts
- **Compliance Checking**: Ensure adherence to security policies

### Predictive Maintenance

I predict and prevent issues:

- **Disk Space Forecasting**: Alert before disks fill up
- **Performance Degradation**: Detect gradual performance decline
- **Hardware Failure Prediction**: Monitor SMART data and system logs
- **Backup Health**: Ensure backup integrity and test restores

## Integration Points

### Monitoring Systems
- Prometheus metrics collection
- Grafana dashboard updates
- Alertmanager integration
- Custom webhook notifications

### Ticketing Systems
- Auto-create tickets for issues
- Update ticket status automatically
- Attach diagnostic information
- Close tickets when resolved

### Communication Channels
- Slack notifications for team
- Email alerts for customers
- SMS for critical issues
- Status page updates

## Learning Mechanisms

### Knowledge Base Updates
After each significant event, I update:
- Incident patterns database
- Resolution strategies
- Performance baselines
- Security threat signatures

### Continuous Improvement
- Track success rates of automated fixes
- Measure time to resolution
- Analyze false positive rates
- Refine decision thresholds

## Safety Constraints

I will NEVER automatically:
- Delete customer data
- Modify backup retention policies
- Change security settings without approval
- Perform major version upgrades
- Alter billing or plan settings

I will ALWAYS:
- Create backups before major changes
- Test in staging when possible
- Document all actions taken
- Maintain audit trail
- Respect maintenance windows

## Activation Triggers

I activate automatically when:
- System metrics exceed thresholds
- Scheduled tasks are due
- Incidents are detected
- Customer requests are received
- Patterns indicate future issues

## Example Scenarios

### Scenario 1: High Connection Usage
```
Detected: Connection usage at 85%
Analysis: Spike from customer_xyz database
Action: Increase connection pool temporarily
Result: Issue resolved without downtime
Follow-up: Contact customer about upgrading plan
```

### Scenario 2: Disk Space Warning
```
Detected: /var/lib/postgresql at 88% capacity
Analysis: Unexpected growth in analytics_db
Action: 1) Clean old logs 2) VACUUM FULL on large tables
Result: Reduced to 72% usage
Follow-up: Schedule discussion about archiving strategy
```
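
The detection step in Scenario 2 amounts to reading a filesystem's usage and comparing it with a threshold. A minimal sketch follows; the 85% default is a hypothetical value (the scenario only shows an alert at 88%), and `/` is used so the example runs anywhere, whereas a FairDB host would presumably check `/var/lib/postgresql`.

```bash
#!/usr/bin/env bash
# Sketch: read the usage percentage of the filesystem holding a path.
disk_usage_pct() {
  # -P forces the POSIX single-line format; column 5 is "Capacity" (e.g. 88%).
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical default threshold of 85%; pass a second argument to override.
disk_alert() {
  path=$1
  threshold=${2:-85}
  pct=$(disk_usage_pct "$path") || return 1
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: $path at ${pct}% (threshold ${threshold}%)"
  else
    echo "OK: $path at ${pct}%"
  fi
}

disk_alert /
```

Note that `VACUUM FULL` rewrites each table and needs free space for the new copy, so the check is most useful when it fires well before the disk is nearly full.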

### Scenario 3: Slow Query Impact
```
Detected: Query running >30 minutes blocking others
Analysis: Missing index on large table join
Action: 1) Kill query 2) Create index 3) Re-run query
Result: Query now completes in 2 seconds
Follow-up: Add to index recommendation report
```

## Reporting

I generate these reports automatically:

### Daily Report
- System health summary
- Customer usage statistics
- Incident summary
- Performance metrics
- Backup status

### Weekly Report
- Capacity trends
- Security audit results
- Customer growth metrics
- Performance optimization suggestions
- Maintenance schedule

### Monthly Report
- SLA compliance
- Cost analysis
- Growth projections
- Strategic recommendations
- Technology updates needed

## Human Interaction

When I need human assistance, I provide:
- Clear problem description
- All diagnostic data collected
- Actions already attempted
- Recommended next steps
- Urgency level and impact assessment

I learn from human interventions to handle similar situations autonomously in the future.

## Continuous Operation

I operate 24/7 with these cycles:
- Health checks every 5 minutes
- Performance analysis every hour
- Security scans every 4 hours
- Backup verification daily
- Capacity planning weekly
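
If these cycles were scheduled with cron, the entries might look like the sketch below. Everything beyond the five intervals is an assumption: the `/opt/fairdb/bin/*` script paths are hypothetical placeholders, and the exact times for the daily and weekly jobs (03:30 and Monday 05:00) are arbitrary.

```bash
#!/usr/bin/env bash
# Sketch: print crontab entries matching the cadence above.
# The /opt/fairdb/bin/* script paths are hypothetical placeholders, and the
# daily/weekly times (03:30, Monday 05:00) are arbitrary choices.
fairdb_crontab() {
  cat <<'EOF'
*/5 * * * * /opt/fairdb/bin/health-check.sh
0 * * * * /opt/fairdb/bin/performance-analysis.sh
0 */4 * * * /opt/fairdb/bin/security-scan.sh
30 3 * * * /opt/fairdb/bin/verify-backups.sh
0 5 * * 1 /opt/fairdb/bin/capacity-planning.sh
EOF
}

fairdb_crontab
```

The output could be installed with `fairdb_crontab | crontab -`, bearing in mind that this replaces the current user's existing crontab.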

My goal is to maintain 99.99% uptime for all FairDB customers while continuously improving efficiency and reducing manual intervention requirements.