toga-ai 1.0.39 → 1.0.41

@@ -6,3 +6,4 @@
  | [ClickUp Project & Opportunity Multi-List Routing](features/clickup-project-routing.md) | Routes ClickUp tasks into the correct **secondary multi-list memberships** based on their custom-field values, via the `clickup` webhook. | worker2/Worker/Clickup/Project.php, worker2/Worker/Clickup.php |
  | [Creating Worker Actions](features/creating-worker-actions.md) | How to add a new callable Worker action — a PHP class whose `public static` methods are invoked as background jobs (via webhook, cron, or `_Worker::runTask()`). | worker2/Worker/, worker2/Controller/Index.php, _underscore/Worker.php |
  | [Elite Freshservice Sync (worker2)](features/elite-freshservice-sync.md) | `_Worker_Elite` processes Freshservice webhook events and syncs them into TOGA 2. | worker2/Worker/Elite.php, worker2/Config/dev-kmaramreddy-laptop.ini |
+ | [Monitoring Framework (Orchestrator + Child Monitors)](features/monitoring-framework.md) | A unified, DB-driven monitoring framework for business-critical data flows (Compass POs, Prudential asset imports, AIG closed claims, …). | worker2/Worker/Monitor.php, worker2/Worker/Monitors/, worker2/Worker/Notification/Email.php, dbchanges2/Core/2026-05-21 - Monitors.sql |
@@ -0,0 +1,183 @@
+ ---
+ title: Monitoring Framework (Orchestrator + Child Monitors)
+ framework: "2.0"
+ repo: worker2
+ project: Worker
+ client: shared
+ type: feature
+ status: active
+ updated: 2026-06-10
+ owners: [mhammontree]
+ files:
+ - worker2/Worker/Monitor.php
+ - worker2/Worker/Monitors/
+ - worker2/Worker/Notification/Email.php
+ - dbchanges2/Core/2026-05-21 - Monitors.sql
+ related:
+ - ../architecture.md
+ - ./creating-worker-actions.md
+ - ../../dbchanges2/architecture.md
+ ---
+
+ ## Summary
+
+ A unified, DB-driven monitoring framework for business-critical data flows (Compass POs,
+ Prudential asset imports, AIG closed claims, …). One orchestrator applies consistent
+ anti-flap logic and notification rules to any registered "is X healthy?" check. Replaces
+ scattered/ad-hoc monitoring where failures surfaced only when a client complained.
+
+ **Status (2026-06-10):** v1.0 framework + orchestrator landed; **first child monitor
+ pending**. Migration applied to local Core_2 only — **not yet on staging/production Core**
+ (coordinate before merge). Dashboard/acknowledgment layer under discussion (see Pending
+ scope).
+
+ Design principles: async isolation (one cron per monitor — a slow monitor can't block
+ others) · anti-flap on the recovery side only (N consecutive OKs before declaring
+ recovery) · immediate alert on first failure, throttled reminders, one recovery email per
+ incident · dumb-and-light children (all state/escalation/email logic lives once in the
+ orchestrator) · runtime config in the `Core.Monitors` table (no redeploy).
+
+ ## Key files / entry points
+
+ | File | Purpose |
+ |---|---|
+ | `worker2/Worker/Monitor.php` | Orchestrator — `abstract class _Worker_Monitor`, action route `Monitor/Run` |
+ | `worker2/Worker/Monitors/<Name>.php` | Child classes `_Worker_Monitors_<Name>`, one per monitor |
+ | `worker2/Worker/Notification/Email.php` | Reused `_Worker_Notification_Email::Send` for all notification mail |
+ | `dbchanges2/Core/2026-05-21 - Monitors.sql` | `Core.Monitors` table definition + INSERT templates |
+
+ Reused with **no changes**: `Core.CronJobs`, `Core.WorkerJobs`, the `WorkerCronScheduler`
+ Lambda, the EB worker tier + SQS delivery (see [worker2 architecture](../architecture.md)).
+
+ ## How it works
+
+ One `Core.CronJobs` row per monitor (`action = 'Monitor/Run'`, `parameters =
+ {"monitorId": N}`, schedule evaluated in **Central** time). Each tick, the orchestrator:
+
+ 1. Fetches the `Core.Monitors` row by `monitorId` (not found → return message, no writes).
+ 2. Guards `isActive` — if 0, returns early; no child invocation, no mail.
+ 3. Invokes the child: `class_exists` / `method_exists` checks, then `$phpClass::Run()`
+ inside try/catch. **Any `Throwable` becomes `{isOk: false}`** with the exception
+ message — a child can never crash the orchestrator. The return must be an object with
+ `isOk` and `message`; anything else is treated as failure.
+ 4. Runs the state machine (below) to compute new state, counter, and notification.
+ 5. Sends mail if needed via `_Worker_Notification_Email::Send` with
+ `clientIdentifier: 'Core'` (infrastructure mail, never client-scoped).
+ 6. Persists one UPDATE: `state`, `consecutiveOkCount`, `lastRunDt`, `lastResultMessage`,
+ plus `lastNotificationDt`/`lastNotificationType` when a notification was sent.
+ 7. Returns a one-line summary that lands in `WorkerJobs` for diagnostics.
+
+ The orchestrator uses `_Query($sql, _underscore::DB_CORE)` and `_Database::escape()` for interpolated values.
+
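Step 3 above (guarded child invocation) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the PHP orchestrator code: `invoke_child` is a hypothetical name, Python's `Exception` stands in for PHP's `Throwable`, and the wording of the invalid-result message is an assumption.

```python
from types import SimpleNamespace


def invoke_child(run):
    """Call a child's run callable and normalize whatever happens.

    Hypothetical helper: any exception becomes a failed result, and any
    return value lacking `isOk`/`message` attributes is treated as failure.
    """
    try:
        result = run()
    except Exception as exc:  # stand-in for PHP's catch (Throwable $e)
        return SimpleNamespace(isOk=False, message=str(exc))

    # A bare dict (the analogue of a PHP array) has no such attributes,
    # so it falls through to the failure branch here.
    if not (hasattr(result, "isOk") and hasattr(result, "message")):
        return SimpleNamespace(isOk=False, message="Child returned an invalid result")

    return SimpleNamespace(isOk=bool(result.isOk), message=str(result.message))
```

Because every path returns the same `{isOk, message}` shape, the rest of the tick only ever deals with one kind of value, whether the child succeeded, failed, or threw.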
+ ### State machine (the agreed contract)
+
+ | Previous | Current | New state | Counter | Notification |
+ |---|---|---|---|---|
+ | ok | ok | ok | reset to 0 | none |
+ | ok | alert | **alert** | reset to 0 | **alert** — always, immediate |
+ | alert | alert | alert | unchanged (0) | **reminder** — only if `reminderFrequencyMinutes` elapsed since `lastNotificationDt` |
+ | alert | ok | alert until counter hits threshold | +1 | **recovery** — only when `consecutiveOkCount >= requiredConsecutiveOks`; on send, flip to ok and reset counter |
+
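The table above can be read as a pure function from the stored row and the latest check result to a new state, counter, and notification. The following is a minimal Python sketch, not the orchestrator's PHP: `MonitorRow` and `next_state` are hypothetical names. Two points are assumptions: a failure while in alert sets the counter to 0 (the table says "unchanged (0)"), and a missing `lastNotificationDt` counts as "reminder due".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MonitorRow:
    state: str                                 # "ok" or "alert"
    consecutive_ok_count: int
    required_consecutive_oks: int
    reminder_frequency_minutes: int
    last_notification_dt: Optional[datetime]


def next_state(row, is_ok, now):
    """Return (new_state, new_counter, notification or None) for one tick."""
    if row.state == "ok":
        if is_ok:
            return "ok", 0, None
        # No anti-flap on the alarm side: the first failure alerts at once.
        return "alert", 0, "alert"

    # Stored state is "alert" from here on.
    if not is_ok:
        window = timedelta(minutes=row.reminder_frequency_minutes)
        # Assumption: no recorded notification means a reminder is due.
        due = (row.last_notification_dt is None
               or now - row.last_notification_dt >= window)
        return "alert", 0, ("reminder" if due else None)

    # An OK while in alert: count it, and recover only at the threshold.
    count = row.consecutive_ok_count + 1
    if count >= row.required_consecutive_oks:
        # Recovery is deliberately not gated on the reminder window.
        return "ok", 0, "recovery"
    return "alert", count, None
```

Keeping the transition free of I/O means each row of the table can be checked in isolation, with mail sending and the single UPDATE left to the caller.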
+ Three **intentional product decisions** (deviations from the original meeting summary —
+ if you change any, update the plan doc and announce in #engineering):
+ 1. No anti-flap on the alarm side — first failure alerts immediately.
+ 2. Recovery fires only on an actual alert → ok transition (the meeting's "ok→ok sends
+ recovery" branch was folded out).
+ 3. Recovery is **not** gated on the reminder window — gating could swallow the
+ "all clear" email entirely.
+
+ Reminder-window math uses PHP server time vs `lastNotificationDt` written via DB `NOW()`
+ — consistent on a single DB server, **not explicitly UTC**; revisit if a cross-region DB
+ is ever introduced.
+
+ ### Notifications
+
+ Subjects: `[Monitor ALERT|REMINDER|RECOVERED] <monitor name>`. Plain-text body with
+ monitor name, state, notification type, timestamp, and the child's message verbatim.
+ From `donotreply@goagilant.com`; recipients parsed from `Monitors.peopleToNotify` (JSON
+ array of emails — empty array means no mail sent, no error raised).
+
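As an illustration of the subject format and recipient handling described above, here is a small Python sketch. `build_subject` and `parse_recipients` are hypothetical helpers, not functions from `_Worker_Notification_Email`. Treating malformed or non-array JSON the same as an empty array is an assumption; the source only specifies the empty-array case.

```python
import json

# Maps lastNotificationType values to the tag used in the subject line.
SUBJECT_TAGS = {"alert": "ALERT", "reminder": "REMINDER", "recovery": "RECOVERED"}


def build_subject(notification_type, monitor_name):
    """Build '[Monitor <TAG>] <monitor name>'."""
    return f"[Monitor {SUBJECT_TAGS[notification_type]}] {monitor_name}"


def parse_recipients(people_to_notify):
    """Parse the peopleToNotify JSON array into a list of addresses.

    An empty result means "send nothing, raise nothing".
    """
    try:
        parsed = json.loads(people_to_notify or "[]")
    except (TypeError, ValueError):  # assumption: bad JSON means no recipients
        return []
    if not isinstance(parsed, list):
        return []
    return [p for p in parsed if isinstance(p, str) and p.strip()]
```

A caller would skip the send entirely when `parse_recipients(...)` returns an empty list, which matches the "pause email only" behavior of `peopleToNotify = JSON_ARRAY()`.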
+ ## Data model
+
+ ### `Core.Monitors` (new)
+
+ | Column | Type | Purpose |
+ |---|---|---|
+ | `id` / `uuid` | INT UNSIGNED PK / char(36) UNIQUE | Standard identifiers |
+ | `isActive` | TINYINT default 1 | 0 = orchestrator returns early |
+ | `dtCreated` | DATETIME | Record create time |
+ | `name` | VARCHAR(255) | Human-readable; appears in email subjects |
+ | `phpClass` | VARCHAR(255) | Child class, e.g. `_Worker_Monitors_Compass` |
+ | `state` | ENUM('ok','alert') default 'ok' | **Confirmed** current state |
+ | `consecutiveOkCount` | INT UNSIGNED default 0 | Anti-flap counter (alert→ok only) |
+ | `requiredConsecutiveOks` | TINYINT UNSIGNED default 2 | Threshold to flip alert→ok |
+ | `reminderFrequencyMinutes` | SMALLINT UNSIGNED default 60 | Reminder cadence while in alert |
+ | `peopleToNotify` | JSON | Array of email addresses |
+ | `lastRunDt` / `lastResultMessage` | DATETIME / TEXT | Last run + last child message (or exception text) |
+ | `lastNotificationDt` / `lastNotificationType` | DATETIME / ENUM('alert','reminder','recovery') | Most recent notification |
+
+ Index `Monitors_isActive_IDX (isActive)`. `Core.CronJobs` reused as-is — one row per monitor.
+
+ ## Child monitor contract
+
+ - File `worker2/Worker/Monitors/<Name>.php`, declared as `abstract class _Worker_Monitors_<Name>`
+ with `public static function Run(): object` returning
+ `(object)['isOk' => bool, 'message' => string]` — matches the worker2 action convention.
+ - `Run()` takes **no parameters**; all per-monitor config lives in the `Monitors` row.
+ - **Don't**: send notifications, write to `Monitors`, implement escalation, or wrap
+ everything in try/catch (the orchestrator catches all `Throwable`s — bubbling up is the
+ simplest way to fail loudly).
+ - **Do**: check exactly one signal per monitor; put actionable context in `message`
+ (recipients see it verbatim); register any non-Core DB connection inline at the top of
+ `Run()` — the framework does **not** call `initialize()` on child monitors.
+
+ ### Adding a new monitor (runbook)
+
+ 1. Write the child class (one check, returns `{isOk, message}`).
+ 2. INSERT the `Core.Monitors` row (uuid, name, phpClass, thresholds, peopleToNotify).
+ 3. INSERT the `Core.CronJobs` row (`action = 'Monitor/Run'`,
+ `parameters = JSON_OBJECT('monitorId', <id>)`).
+ 4. Verify end-to-end in dev (8-step checklist: migration sanity, cron registration,
+ first-failure alert, reminder cadence, anti-flap recovery, exception path, inactive
+ monitor, missing class — all must pass before production).
+ 5. Deploy class to staging+production worker2; apply the two INSERTs to each Core.
+
+ No Lambda, SQS, or orchestrator changes needed per monitor.
+
+ ### Operational notes
+
+ - Force a recheck: set `lastRunDt = NULL`, or POST `{"action":"Monitor/Run","parameters":{"monitorId":N}}` to the worker2 test endpoint (debug path, bypasses WorkerJobs).
+ - Pause a monitor: `isActive = 0` (cron row can stay active). Pause email only: `peopleToNotify = JSON_ARRAY()`.
+ - Manual state reset: clear `state`/`consecutiveOkCount`/`lastNotificationDt`/`lastNotificationType` — sparingly; the state machine self-corrects.
+
+ ## Client variations
+
+ None — the framework is shared Core infrastructure. Individual monitors target specific
+ clients' data flows (Compass, Prudential, AIG, …) but live as separate child classes.
+
+ ## Gotchas / known issues
+
+ - **First child monitor not yet built** — `worker2/Worker/Monitors/` does not exist yet
+ (as of 2026-06-10).
+ - **Migration not applied to staging/production Core** — only local Core_2. Coordinate
+ before merge.
+ - `worker2/MONITORING_PLAN.md` is referenced by the design doc but **missing on disk** —
+ stale reference to resolve.
+ - Child return value must be an **object** with both `isOk` and `message`; a bare array
+ or missing property is treated as failure by design.
+ - Reminder math is server-time based, not UTC — see state-machine section.
+
+ ## Pending scope (NOT implemented)
+
+ - **Dashboard + acknowledgments** — ack columns on `Monitors` (or a history table),
+ reminder gate becomes "elapsed AND not acknowledged", recovery clears the ack. Open
+ questions: ack survival across flap-back, auto-expiry, which auth system owns dashboard
+ identity. Owner: mhammontree, pending team discussion.
+ - Slack / SMS / PagerDuty (email-only today) · `MonitorRuns` history table ·
+ HTML email + dashboard deep-links · anti-flap on the alarm side.
+
+ ## Related docs
+
+ - [worker2 architecture](../architecture.md) — cron/WorkerJobs pipeline the orchestrator rides on
+ - [Creating Worker Actions](./creating-worker-actions.md) — the action class/method conventions child monitors follow
+ - [dbchanges2 architecture](../../dbchanges2/architecture.md) — migration naming/ordering rules for the Monitors table change
@@ -10,7 +10,7 @@ _Auto-generated by `knowledge.js index`. Do not hand-edit._
  ## 2.0 framework
 
  - **_underscore** (_Underscore) _(framework core)_ — 3 doc(s) → [2.0/apps/_underscore/INDEX.md](2.0/apps/_underscore/INDEX.md)
- - **worker2** (Worker) — 4 doc(s) → [2.0/apps/worker2/INDEX.md](2.0/apps/worker2/INDEX.md)
+ - **worker2** (Worker) — 5 doc(s) → [2.0/apps/worker2/INDEX.md](2.0/apps/worker2/INDEX.md)
  - **api2** (API) — 1 doc(s) → [2.0/apps/api2/INDEX.md](2.0/apps/api2/INDEX.md)
  - **dbchanges2** (Database Changes) _(framework core)_ — 1 doc(s) → [2.0/apps/dbchanges2/INDEX.md](2.0/apps/dbchanges2/INDEX.md)
  - **toga2-supply** (TOGa Supply) — 2 doc(s) → [2.0/apps/toga2-supply/INDEX.md](2.0/apps/toga2-supply/INDEX.md)
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "toga-ai",
- "version": "1.0.39",
+ "version": "1.0.41",
  "description": "TOGA Technology Team Claude Knowledge System — shared AI coding harness with skills, knowledge base CLI, and project installer for Claude Code.",
  "keywords": [
  "claude",
@@ -86,10 +86,16 @@ every doc in this capture — call the result `AUTHOR_USERNAME`:
  3. **Persist it** to Claude memory as a `user` memory named `author-username` (and add its
  `MEMORY.md` pointer) so future captures never ask again.
 
- Use `AUTHOR_USERNAME` for `owners` in every doc you write: `owners: ["<AUTHOR_USERNAME>"]`.
- When updating an existing doc that already lists owners, **add** `AUTHOR_USERNAME` if it is
- not already present (do not replace the existing owners). If an existing doc has an empty
- or missing `owners`, fill it with `AUTHOR_USERNAME`.
+ `owners` is a **list of everyone who has worked on the doc; it may hold several names**,
+ not just one. Always include `AUTHOR_USERNAME`, and treat the field as cumulative:
+
+ - **New doc:** start `owners` with `["<AUTHOR_USERNAME>"]`.
+ - **Existing doc:** **union** — keep all existing owners and add `AUTHOR_USERNAME` if not
+ already present. Never replace or drop an existing owner.
+ - **Co-authors:** if more than one person worked on this feature (e.g. pairing, or a handoff
+ this session), ask "Anyone else who should be listed as an owner? (usernames, comma-separated,
+ same first-initial+last-name convention)" and add each — deduplicated, lowercase — to the list.
+ - Never leave `owners` empty.
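The cumulative `owners` rule above amounts to an order-preserving union. A minimal Python sketch follows; `merge_owners` and the example usernames are hypothetical, and comparing existing owners case-insensitively while leaving their spelling untouched is an assumption.

```python
def merge_owners(existing, author_username, co_authors=()):
    """Union existing owners with the author and any co-authors.

    Existing owners are kept first and never dropped; newly added names
    are trimmed, lowercased, and deduplicated.
    """
    merged = []
    seen = set()

    # Keep every existing owner as written (assumption: compare case-insensitively).
    for name in existing or []:
        name = name.strip()
        if name and name.lower() not in seen:
            seen.add(name.lower())
            merged.append(name)

    # Add the author, then co-authors, in lowercase.
    for name in [author_username, *co_authors]:
        name = name.strip().lower()
        if name and name not in seen:
            seen.add(name)
            merged.append(name)

    if not merged:
        raise ValueError("owners must never be empty")
    return merged
```

For example, `merge_owners(["mhammontree"], "jdoe")` yields `["mhammontree", "jdoe"]`, and calling it again with the same author leaves the list unchanged.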
 
  ## Step 2 — Determine framework / repo / project (ask if unknown)