@meshxdata/fops 0.1.54 → 0.1.55

package/CHANGELOG.md CHANGED
@@ -1,3 +1,186 @@
+ ## [0.1.55] - 2026-03-26
+
+ - feat(azure): add 'fops azure reconcile <name>' command for VM drift fix (79ba6e2)
+ - fix(otel,loki): remove duplicate spanmetrics dimensions, use .env for loki S3 creds (e3d1def)
+ - fix(loki): pass S3 credentials from .env so loki works without vault-init (c57906d)
+ - fix(azure): improve VM provisioning reliability (2ddd669)
+ - cluster discovery (009257d)
+ - feat(storage): add loki container to provisioning (898c544)
+ - feat(azure): add ping command to check backend health (8336825)
+ - operator cli bump 0.1.52 (f052cb5)
+ - fix(doctor): set KUBECONFIG for k3s kubectl commands (db9359b)
+ - fix(azure): move --landscape to test run command, not separate subcommand (4b9b089)
+ - feat(azure): add test integration command with landscape support (b2990a0)
+ - fix(fleet): skip VMs without public IPs in fleet exec (39acbaa)
+ - feat(azure): detect and fix External Secrets identity issues (f907d11)
+ - operator cli bump 0.1.51 (db55bdc)
+ - feat: add postgres-exporter and Azure tray menu improvements (2a337ac)
+ - operator cli plugin fix (4dae908)
+ - operator cli plugin fix (25620cc)
+ - operator cli test fixes (1d1c18f)
+ - feat(test): add setup-users command for QA test user creation (b929507)
+ - feat(aks): show HA standby clusters with visual grouping (8fb640c)
+ - refactor(provision): extract VM provisioning to dedicated module (af321a7)
+ - refactor(provision): extract post-start health checks to dedicated module (6ed5f2d)
+ - fix: ping timeout 15s, fix prometheus sed escaping (d11ac14)
+ - refactor(vm): extract terraform HCL generation to dedicated module (896a64b)
+ - refactor(keyvault): extract key operations to dedicated module (716bbe4)
+ - refactor(azure): extract swarm functions to azure-fleet-swarm.js (4690e34)
+ - refactor(azure): extract SSH/remote functions to azure-ops-ssh.js (e62b8f0)
+ - refactor(azure): split azure-ops.js into smaller modules (4515425)
+ - feat(aks): add --ha flag for full cross-region HA setup (ece68c5)
+ - feat(fops): inject ENVIRONMENT_NAME on VM provisioning (6ef2a27)
+ - fix(postgres): disable SSL mode to fix connection issues (c789ae9)
+ - feat(trino): add caching configuration for docker-compose (3668224)
+ - fix(fops-azure): run pytest directly instead of missing scripts (29f8410)
+ - add -d detach option for local frontend dev, remove hive cpu limits (3306667)
+ - release 0.1.49 (dcca32b)
+ - release 0.1.48 (9b195e5)
+ - stash on updates (2916c01)
+ - stash on updates (b5c14df)
+ - stash on updates (d0453d1)
+ - frontend dev fixes (0ca7b00)
+ - fix: update azure test commands (77c81da)
+ - default locust to CLI mode, add --web for UI (ca35bff)
+ - add locust command for load testing AKS clusters (1278722)
+ - update spot node pool default autoscaling to 1-20 (617c182)
+ - module for aks (3dd1a61)
+ - add hive to PG_SERVICE_DBS for fops pg-setup (afccb16)
+ - feat(azure): enhance aks doctor with ExternalSecrets and PGSSLMODE checks (8b14861)
+ - add foundation-postgres ExternalName service to reconciler (ea88e11)
+ - new flux templates (0e2e372)
+ - feat(azure): add storage-engine secrets to Key Vault (a4f488e)
+ - feat(azure-aks): add AUTH0_DOMAIN to template rendering variables (216c37e)
+ - feat(azure): add storage account creation per cluster (aa1b138)
+ - bump watcher (ab24473)
+ - fix: concurrent compute calls (#66) (03e2edf)
+ - bump backend version (5058ff5)
+ - bump fops to 0.1.44 (8c0ef5d)
+ - Mlflow and azure plugin fix (176881f)
+ - fix lifecycle (a2cb9e7)
+ - callback url for localhost (821fb94)
+ - disable 4 scaffolding plugin by default. (bfb2b76)
+ - jaccard improvements (b7494a0)
+ - refactor azure plugin (68dfef4)
+ - refactor azure plugin (b24a008)
+ - fix trino catalog missing (4928a55)
+ - v36 bump and changelog generation on openai (37a0440)
+ - v36 bump and changelog generation on openai (a3b02d9)
+ - bump (a990058)
+ - status bar fix and new plugin for ttyd (27dde1e)
+ - file demo and tray (1a3e704)
+ - electron app (59ad0bb)
+ - compose and fops file plugin (1cf0e81)
+ - bump (346ffc1)
+ - localhost replaced by 127.0.0.1 (82b9f30)
+ - .29 (587b0e1)
+ - improve up down and bootstrap script (b79ebaf)
+ - checksum (22c8086)
+ - checksum (96b434f)
+ - checksum (15ed3c0)
+ - checksum (8a6543a)
+ - bump embed trino linksg (8440504)
+ - bump data (765ffd9)
+ - bump (cb8b232)
+ - broken tests (c532229)
+ - release 0.1.18, preflight checks (d902249)
+ - fix compute display bug (d10f5d9)
+ - cleanup packer files (6330f18)
+ - plan mode (cb36a8a)
+ - bump to 0.1.16 - agent ui (41ac1a2)
+ - bump to 0.1.15 - agent ui (4ebe2e1)
+ - bump to 0.1.14 (6c3a7fa)
+ - bump to 0.1.13 (8db570f)
+ - release 0.1.12 (c1c79e5)
+ - bump (11aa3b0)
+ - git keep and bump tui (be1678e)
+ - skills, index, rrf, compacted context (100k > 10k) (7b2fffd)
+ - cloudflare and token consumption, graphs indexing (0ad9eec)
+ - bump storage default (22c83ba)
+ - storage fix (68a22a0)
+ - skills update (7f56500)
+ - v9 bump (3864446)
+ - bump (c95eedc)
+ - rrf (dbf8c95)
+ - feat: warning when running predictions (95e8c52)
+ - feat: support for local predictions (45cf26b)
+ - feat: wip support for predictions + mlflow (3457052)
+ - add Reciprocal Rank Fusion (RRF) to knowledge and skill retrieval (61549bc)
+ - validate CSV headers in compute_run readiness check (a8c7a43)
+ - fix corrupted Iceberg metadata: probe tables + force cleanup on re-apply (50578af)
+ - enforce: never use foundation_apply to fix broken products (2e049bf)
+ - update SKILL.md with complete tool reference for knowledge retrieval (30b1924)
+ - add storage read, input DP table probe, and compute_run improvements (34e6c4c)
+ - skills update (1220385)
+ - skills update (bb66958)
+ - some tui improvement and tools apply overwrite (e90c35c)
+ - skills update (e9227a1)
+ - skills update (669c4b3)
+ - fix plugin pre-flight checks (f741743)
+ - increase agent context (6479aaa)
+ - skills and init sql fixes (5fce35e)
+ - checksum (3518b56)
+ - pending job limit (a139861)
+ - checksum (575d28c)
+ - bump (92049ba)
+ - fix bug per tab status (0a33657)
+ - fix bug per tab status (50457c6)
+ - checksumming (0ad842e)
+ - shot af mardkwon overlapping (51f63b9)
+ - add spark dockerfile for multiarch builds (95abbd1)
+ - fix plugin initialization (16b9782)
+ - split index.js (50902a2)
+ - cloudflare cidr (cc4e021)
+ - cloudflare restrictions (2f6ba2d)
+ - sequential start (86b496e)
+ - sequential start (4930fe1)
+ - sequential start (353f014)
+ - qa tests (2dc6a1a)
+ - bump sha for .85 (dc2edfe)
+ - preserve env on sudo (7831227)
+ - bump sha for .84 (6c052f9)
+ - non interactive for azure vms (0aa8a2f)
+ - keep .env if present (d072450)
+ - bump (7a8e732)
+ - ensure opa is on compose if not set (f4a5228)
+ - checksum bump (a2ccc20)
+ - netrc defensive checks (a0b0ccc)
+ - netrc defensive checks (ae37403)
+ - checksum (ec45d11)
+ - update sync and fix up (7f9af72)
+ - expand test for azure and add new per app tag support (388a168)
+ - checksum on update (44005fc)
+ - cleanup for later (15e5313)
+ - cleanup for later (11c9597)
+ - switch branch feature (822fecc)
+ - add pull (d1c19ab)
+ - Bump hono from 4.11.9 to 4.12.0 in /operator-cli (ad25144)
+ - tests (f180a9a)
+ - cleanup (39c49a3)
+ - registry (7b7126a)
+ - reconcile kafka (832d0db)
+ - gh login bug (025886c)
+ - cleanup (bb96cab)
+ - strip envs from process (2421180)
+ - force use of gh creds not tokens in envs var (fff7787)
+ - resolve import between npm installs and npm link (79522e1)
+ - fix gh scope and azure states (afd846c)
+ - refactoring (da50352)
+ - split fops repo (d447638)
+ - aks (b791f8f)
+ - refactor azure (67d3bad)
+ - wildcard (391f023)
+ - azure plugin (c074074)
+ - zap (d7e6e7f)
+ - fix knock (cf89c05)
+ - azure (4adec98)
+ - Bump tar from 7.5.7 to 7.5.9 in /operator-cli (e41e98e)
+ - azure stack index.js split (de12272)
+ - Bump ajv from 8.17.1 to 8.18.0 in /operator-cli (76da21f)
+ - packer (9665fbc)
+ - remove stack api (db0fd4d)
+ - packer cleanup (fe1bf14)
+
  # Changelog
 
  All notable changes to @meshxdata/fops (Foundation Operator CLI) are documented here.
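The changelog entry "add Reciprocal Rank Fusion (RRF) to knowledge and skill retrieval" names a standard rank-merging technique. As a rough illustration only, and not the package's own implementation, the sketch below fuses several ranked result lists by summing `1 / (k + rank)` per item. The function name `reciprocalRankFusion`, the sample document IDs, and the default `k = 60` (a commonly used constant) are all assumptions made for this example.

```javascript
// Hypothetical sketch of Reciprocal Rank Fusion (RRF); not taken from @meshxdata/fops.
// Each ranking is an array of item IDs ordered best-first.
// score(id) = sum over rankings of 1 / (k + rank), with rank starting at 1.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) || 0) + contribution);
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Illustrative inputs: one keyword-based ranking and one embedding-based ranking.
const keywordHits = ["doc-a", "doc-b", "doc-c"];
const vectorHits = ["doc-b", "doc-d", "doc-a"];
const fused = reciprocalRankFusion([keywordHits, vectorHits]);
console.log(fused);
```

Items that appear near the top of several lists accumulate the largest scores, so the fusion needs no score normalization across retrievers, only their rank orders.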
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@meshxdata/fops",
- "version": "0.1.54",
+ "version": "0.1.55",
  "description": "CLI to install and manage data mesh platforms",
  "keywords": [
  "fops",
@@ -486,42 +486,54 @@ You manage Docker Compose stacks: inspect containers, read logs, restart service
  ## Role
  You investigate alerts, diagnose service failures, and suggest fixes. You have direct access to Docker containers, logs, and system metrics. You are called by the Glue bot when monitoring alerts fire.
 
+ ## CRITICAL: Always Use Tools — Never Guess
+ You MUST use your tools to investigate. NEVER give generic checklists or ask the user to check things manually.
+ - Don't say "check the logs for X" — run compose_logs and find X yourself.
+ - Don't say "verify auth config" — run compose_inspect or compose_exec to read the actual config.
+ - Don't say "correlate with metrics" — run compose_stats and report the numbers.
+ - If you need to grep logs, use compose_exec with grep/jq inside the container.
+ - Every finding in your response must be backed by tool output, not speculation.
+
  ## Tools Available
  - **compose_ps**: List all containers and their status (start here)
- - **compose_logs**: Read container logs (check for errors, crashes, OOM)
+ - **compose_logs**: Read container logs (check for errors, crashes, OOM). Use the tail parameter to get recent logs, and grep for specific patterns.
  - **compose_inspect**: Get container details (health checks, env vars, mounts, restarts)
  - **compose_stats**: CPU/memory/network usage per container
- - **compose_exec**: Run commands inside containers (e.g. check disk, network, processes)
+ - **compose_exec**: Run commands inside containers (e.g. grep logs, check disk, curl endpoints, read config files, test connectivity)
  - **compose_images**: List images and versions
- - **compose_restart**: Restart specific services
+ - **compose_restart**: Restart specific services (only after diagnosing the issue)
  - **embeddings_search**: Search docs, configs, and past knowledge for context
 
  ## Investigation Approach
  1. **Triage**: Run compose_ps to see overall stack health. Identify unhealthy/restarting containers.
- 2. **Diagnose**: For each affected container:
- - compose_logs to find errors, exceptions, OOM kills, crash traces
- - compose_inspect for health check failures, restart count, resource limits
- - compose_stats for CPU/memory spikes
- 3. **Context**: Use embeddings_search to find relevant docs or known issues.
- 4. **Root cause**: Correlate findings: is it a code bug, resource exhaustion, dependency failure, config issue?
- 5. **Fix**: Suggest specific actions (restart, config change, scale, rollback).
+ 2. **Deep dive**: For each affected container, use MULTIPLE tools:
+ - compose_logs with tail=200 to find errors, then grep for specific patterns (4xx, 5xx, OOM, connection refused, timeout)
+ - compose_exec to grep logs for specific status codes: e.g. grep -c "HTTP/1.1 4" or check config files
+ - compose_inspect for health check failures, restart count, resource limits, env vars
+ - compose_stats for CPU/memory: report actual numbers (e.g. "backend: 450MB/512MB, 88% memory")
+ 3. **Correlate**: If you see errors, trace them to the root service. Check dependencies (postgres, kafka, storage-engine).
+ 4. **Root cause**: State the specific cause with evidence from tool output.
+ 5. **Fix**: Take action if safe (restart a crashed container) or give a specific command to run.
 
  ## Output Format
  Structure your response with blank lines between each section:
 
  **Status:** One-line summary (e.g. "Processor container restarting due to OOM")
 
- **Findings:** What you discovered from each tool
+ **Findings:** Specific evidence from tools (include actual log lines, numbers, status codes)
 
- **Root Cause:** Most likely cause
+ **Root Cause:** Most likely cause, backed by evidence
 
- **Actions:** Specific steps to fix
+ **Actions:** Specific steps to fix (commands, not vague suggestions)
 
  **Prevention:** How to avoid this in the future
 
  ## Rules
  - Always check compose_ps first.
+ - USE TOOLS AGGRESSIVELY. Run 5-10 tool calls per investigation, not 1-2.
  - Check logs BEFORE suggesting restarts.
+ - When investigating HTTP errors: grep the actual logs for status codes and show the top error endpoints.
+ - When investigating performance: show actual CPU/memory numbers from compose_stats.
  - Look for patterns: repeated restarts, OOM kills, connection refused, timeout errors.
  - If a dependency is down (postgres, kafka), flag it — fixing the dependency fixes the dependent.
  - Be concise — this output goes into a Glue chat thread.
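
The new rule about HTTP errors asks the agent to grep logs for status codes and show the top error endpoints. The sketch below shows one way such a summary could be computed from raw log text. It is illustrative only: the helper name `topErrorEndpoints`, the sample lines, and the assumed access-log shape (`"METHOD /path HTTP/1.1" STATUS`) are not taken from the package, and real container logs may need a different pattern.

```javascript
// Hypothetical helper: count 4xx/5xx responses per endpoint in raw log text.
// Assumes each request line contains `"METHOD /path HTTP/x.y" STATUS`.
function topErrorEndpoints(logText, limit = 5) {
  const pattern = /"([A-Z]+) (\S+) HTTP\/\d(?:\.\d)?" (\d{3})/;
  const counts = new Map();
  for (const line of logText.split("\n")) {
    const match = pattern.exec(line);
    if (!match) continue; // not a request line
    const [, method, path, status] = match;
    if (status[0] !== "4" && status[0] !== "5") continue; // keep errors only
    const key = `${method} ${path} ${status}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Most frequent first; ties broken alphabetically for stable output.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([endpoint, count]) => ({ endpoint, count }));
}

// Made-up sample log lines for illustration.
const sample = [
  '10.0.0.1 - - "GET /api/health HTTP/1.1" 200 12',
  '10.0.0.2 - - "POST /api/login HTTP/1.1" 401 48',
  '10.0.0.2 - - "POST /api/login HTTP/1.1" 401 48',
  '10.0.0.3 - - "GET /api/data HTTP/1.1" 502 0',
].join("\n");

const summary = topErrorEndpoints(sample);
console.log(summary);
```

Grouping by method, path, and status keeps the result short enough to quote directly in a Findings section, which fits the rule that output should stay concise.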