@query-ai/digital-workers 1.0.0

---
name: [your-org]-[runbook-name]
description: Use when [specific trigger condition -- e.g., "VPN alerts fire from contractor accounts"]
---

# [Runbook Name]

> **How to use this template:**
> 1. Copy this file into a new skill folder: `skills/<your-org>-<runbook-name>/SKILL.md`
> 2. Rename the skill in the YAML frontmatter (e.g., `name: acme-vpn-investigation`)
> 3. Write a `description` that tells the framework exactly when to invoke this runbook
> 4. Fill in every section below -- the examples show the level of detail you should aim for
> 5. Reference this runbook in your org-policy under **Custom Investigation Runbooks**
>
> Delete all example/placeholder content before deploying. Keep only what applies
> to your investigation type.
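
If you create runbooks often, steps 1 and 2 are easy to script. The sketch below is a hypothetical helper, not part of the framework. It assumes the template starts with a `---` frontmatter block containing a `name:` line (as this file does), validates that before touching the filesystem, and refuses to overwrite an existing skill folder.

```python
from pathlib import Path


def scaffold_runbook(template: Path, skills_dir: Path, org: str, runbook: str) -> Path:
    """Copy the template to skills/<org>-<runbook>/SKILL.md and set its `name:` field.

    Hypothetical helper: it automates checklist steps 1 and 2 only.
    """
    skill_name = f"{org}-{runbook}"
    target = skills_dir / skill_name / "SKILL.md"

    lines = template.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("template must start with YAML frontmatter")
    end = lines.index("---", 1)  # closing frontmatter delimiter (ValueError if absent)

    # Rewrite the first `name:` line inside the frontmatter only.
    for i in range(1, end):
        if lines[i].startswith("name:"):
            lines[i] = f"name: {skill_name}"
            break
    else:
        raise ValueError("frontmatter has no `name:` field")

    # exist_ok=False: never overwrite a runbook that already exists.
    target.parent.mkdir(parents=True, exist_ok=False)
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

For example, `scaffold_runbook(Path("template.md"), Path("skills"), "acme", "vpn-investigation")` would write `skills/acme-vpn-investigation/SKILL.md` with `name: acme-vpn-investigation`. Steps 3 to 5 still need a human.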

---

## Iron Law

<!-- State the ONE non-negotiable rule for this investigation type. -->
<!-- This is the constraint the framework must never violate, no matter what. -->
<!-- Keep it short, memorable, and uppercase. -->

```
NEVER [action that must never happen] WITHOUT [safeguard that must always be performed].
```

**Examples for inspiration (delete these):**

- `NEVER CLOSE A VPN ALERT WITHOUT CHECKING THE CONTRACTOR SCHEDULE.`
- `NEVER MARK A PRIVILEGE ESCALATION AS BENIGN WITHOUT VERIFYING THE CHANGE TICKET.`
- `NEVER DISMISS A DATA EXFILTRATION ALERT WITHOUT CONFIRMING VOLUME IS BELOW 50 MB.`
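
In this template the Iron Law is a rule written for the framework to follow. To make the intent concrete, here is a minimal sketch of the first example rule expressed as a hard precondition on closing an alert. `Investigation`, `close_alert`, `IronLawViolation`, and the check name `contractor_schedule_checked` are hypothetical. The only closing disposition assumed is **Benign Activity** from the Decision Matrix.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of the first example rule:
# NEVER CLOSE A VPN ALERT WITHOUT CHECKING THE CONTRACTOR SCHEDULE.
REQUIRED_SAFEGUARD = "contractor_schedule_checked"
CLOSING_DISPOSITIONS = {"Benign Activity"}


class IronLawViolation(Exception):
    """Raised when a disposition would break the runbook's Iron Law."""


@dataclass
class Investigation:
    alert_id: str
    completed_checks: set = field(default_factory=set)  # names of safeguards actually run


def close_alert(inv: Investigation, disposition: str) -> str:
    """Close the alert only if the Iron Law safeguard has been recorded."""
    if disposition in CLOSING_DISPOSITIONS and REQUIRED_SAFEGUARD not in inv.completed_checks:
        raise IronLawViolation(
            f"{inv.alert_id}: cannot close as {disposition!r} without {REQUIRED_SAFEGUARD}"
        )
    return f"{inv.alert_id} closed as {disposition}"
```

Whatever your rule says, the pattern is the same: record the safeguard when it runs, and have the closing path check for that record instead of trusting the narrative.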

---

## When to Invoke

This runbook is called by `alert-investigation` at **Gate 4** (Decide & Act) during
DEEP investigations when the following conditions are met:

### Trigger Conditions

<!-- All conditions must be true for this runbook to activate. -->

| Condition | Match Value |
|-----------|-------------|
| Alert type (from alert-classifier) | `[e.g., Identity/Access]` |
| Alert source / connector | `[e.g., okta, crowdstrike]` |
| Additional condition | `[e.g., user is in "contractors" group]` |
| Additional condition | `[e.g., source country is not in approved-countries list]` |

### Example Trigger Scenario

<!-- Describe a concrete scenario where this runbook fires. -->
<!-- This helps analysts understand the intent. -->

> An Okta alert fires for a failed MFA challenge followed by a successful login
> from a new device. The user belongs to the "contractors" group. The login
> originates from a country not on the approved-countries list. The framework
> invokes this runbook because all four conditions match: Identity/Access alert
> type, Okta source, a user in the "contractors" group, and a login from an
> unapproved country.
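
Because every row must hold, the trigger table is a logical AND. A minimal sketch, assuming a simplified alert record: the `Alert` fields, the connector names, and the approved-countries set are illustrative values taken from this template's examples, not the framework's schema. Returning the failed conditions (rather than a bare `False`) lets Step 1 explain why the runbook stood down.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical, simplified alert record; field names are assumptions.
    alert_type: str
    connector: str
    user_groups: frozenset[str]
    source_country: str


# Example values from this template; replace with your own.
APPROVED_COUNTRIES = {"US", "CA", "GB", "DE"}

TRIGGERS = [
    ("alert type is Identity/Access", lambda a: a.alert_type == "Identity/Access"),
    ("connector is okta or crowdstrike", lambda a: a.connector in {"okta", "crowdstrike"}),
    ("user is in contractors group", lambda a: "contractors" in a.user_groups),
    ("source country not approved", lambda a: a.source_country not in APPROVED_COUNTRIES),
]


def should_invoke(alert: Alert) -> tuple[bool, list[str]]:
    """All trigger conditions must hold; also return the names of those that failed."""
    failed = [name for name, check in TRIGGERS if not check(alert)]
    return (not failed, failed)


# The example scenario: an Okta login by a contractor from an unapproved country (BR, arbitrary).
print(should_invoke(Alert("Identity/Access", "okta", frozenset({"contractors"}), "BR")))
# prints (True, [])
```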

---

## Prerequisites

### Required Data Sources

<!-- List every data source this runbook queries. -->
<!-- If a required source is missing, the runbook should stop and report which source is unavailable (fail gracefully), not silently skip steps. -->

| Data Source | Purpose | Required? |
|-------------|---------|-----------|
| `[e.g., okta_logs]` | Authentication events, MFA status | Yes |
| `[e.g., crowdstrike_detections]` | Endpoint activity post-login | Yes |
| `[e.g., hrms_contractor_schedule]` | Validate contractor active dates | Yes |
| `[e.g., geo_ip_enrichment]` | Resolve login source location | Recommended |
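
One way to honor "fail gracefully, not silently" is a preflight check that runs before Step 1. A sketch, assuming you can obtain the set of connected source names from your environment (how you do that is deployment-specific and not shown); `preflight` and `Preflight` are hypothetical, and the source names are the placeholders from the table.

```python
from dataclasses import dataclass

# Mirrors the example table above; source names are placeholders.
DATA_SOURCES = {
    "okta_logs": "required",
    "crowdstrike_detections": "required",
    "hrms_contractor_schedule": "required",
    "geo_ip_enrichment": "recommended",
}


@dataclass
class Preflight:
    ok: bool                        # False if any required source is missing
    missing_required: list[str]
    missing_recommended: list[str]  # reported, but does not block the run


def preflight(available: set[str]) -> Preflight:
    """Report every missing source up front instead of skipping steps later."""
    def missing(level: str) -> list[str]:
        return sorted(s for s, lvl in DATA_SOURCES.items() if lvl == level and s not in available)

    missing_req = missing("required")
    return Preflight(ok=not missing_req, missing_required=missing_req,
                     missing_recommended=missing("recommended"))
```

A failed preflight caused by the schedule source maps naturally to the *Schedule data unavailable* row of the Escalation table rather than to a disposition.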

### Org-Specific Knowledge

<!-- List any internal knowledge this runbook depends on. -->
<!-- These are things the framework cannot discover from data alone. -->

- **Contractor schedule source:** Where to find the current contractor roster and active engagement dates (e.g., HRMS export, SharePoint list, ServiceNow table).
- **Approved countries list:** Which countries are authorized for remote access by contractors (e.g., US, CA, GB, DE).
- **VPN gateway mapping:** Which VPN concentrators serve which office locations or contractor pools.
- **Escalation contacts:** Role names or on-call rotations (not individual names) for the teams listed in the Escalation section.

---

## Context

<!-- Provide the background an analyst needs to understand WHY this investigation type -->
<!-- matters and WHAT to expect. This section gives the framework domain knowledge -->
<!-- that is not available in the alert data itself. -->

### Why This Matters

[Explain the business risk. Why does your organization care about this specific
alert pattern? What is the worst-case outcome if it is missed?]

**Example (delete this):**

> Contractor accounts carry elevated risk because they are often provisioned with
> broad access for short engagements, rarely monitored after the engagement ends,
> and targeted by adversaries who know that contractor credentials receive less
> scrutiny. A compromised contractor VPN session can give an attacker persistent
> access to internal networks without triggering the usual employee-focused
> detections.

### Common Scenarios

<!-- List the 3-5 most common reasons this alert fires. -->
<!-- Include both malicious and benign scenarios so the framework calibrates correctly. -->

1. **Benign -- travel:** The contractor is traveling and logs in from a hotel in a country not on the approved list. MFA succeeds on the second attempt after a typo. The intent is benign, but the Decision Matrix still treats an unapproved country as a policy violation.
2. **Benign -- schedule mismatch:** The contractor's engagement was extended, but the schedule was not updated. The alert fires because the contractor appears inactive.
3. **Suspicious -- credential stuffing:** An attacker uses contractor credentials stolen in a third-party breach. Multiple failed MFA attempts are followed by a successful bypass.
4. **Malicious -- session hijacking:** An attacker intercepts a valid session token and replays it from a different geographic location within minutes of the legitimate login.

### What to Watch For

<!-- Key indicators that shift the investigation from benign toward malicious. -->

- Several failed authentications followed by a success in under 2 minutes (a rapid burst suggests automated tooling; a single retry after a typo is normal human behavior)
- The login location is more than 500 miles from the contractor's registered home location AND no travel request is on file
- Post-login activity includes access to systems outside the contractor's documented scope of work
- The contractor account was supposed to be deactivated more than 7 days ago
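
Each indicator above reduces to a small, testable predicate. A sketch under stated assumptions: timestamps are comparable `datetime` values in one timezone, locations are (latitude, longitude) pairs from geo-IP enrichment, and the failure indicator is read as a burst (at least `min_failures` failures, 3 here as an illustrative choice, in the two minutes before the success). `impossible_travel` and its `max_mph=600` cutoff are an addition: a common speed-based way to define the Decision Matrix's *Impossible travel* column, not a value from this template. All function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def rapid_failure_burst(failures: list[datetime], success: datetime,
                        window: timedelta = timedelta(minutes=2), min_failures: int = 3) -> bool:
    """At least `min_failures` failed attempts inside the window just before the success."""
    recent = [t for t in failures if success - window <= t <= success]
    return len(recent) >= min_failures


def far_from_home(login: tuple[float, float], home: tuple[float, float],
                  travel_on_file: bool, threshold_miles: float = 500) -> bool:
    """More than `threshold_miles` from home AND no travel request on file."""
    return haversine_miles(*login, *home) > threshold_miles and not travel_on_file


def overdue_deactivation(deactivate_by: datetime, now: datetime, grace_days: int = 7) -> bool:
    """The account should have been deactivated more than `grace_days` ago."""
    return now - deactivate_by > timedelta(days=grace_days)


def impossible_travel(loc_a: tuple[float, float], time_a: datetime,
                      loc_b: tuple[float, float], time_b: datetime, max_mph: float = 600) -> bool:
    """Two logins whose separation would require travelling faster than `max_mph`."""
    miles = haversine_miles(*loc_a, *loc_b)
    hours = abs((time_b - time_a).total_seconds()) / 3600
    return miles > 0 if hours == 0 else miles / hours > max_mph
```

Geo-IP locations are approximate, so treat `far_from_home` and `impossible_travel` as evidence to weigh, not as verdicts on their own.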

---

## Investigation Steps

<!-- Numbered steps the framework follows. Each step should include: -->
<!-- 1. What to do (the query or action) -->
<!-- 2. What to look for in the results -->
<!-- 3. Decision point: what happens based on findings -->
<!-- -->
<!-- Use FSQL query patterns wherever possible. The framework executes these directly. -->

### Step 1: Pull the Alert and Confirm Trigger Conditions

Run the initial alert query to retrieve the full alert record and confirm that this
runbook's trigger conditions are actually met.

```fsql
-- Pull the triggering alert with full detail
SELECT *
FROM alerts
WHERE alert_id = '{alert_id}'
```

**What to look for:**
- Confirm the alert type matches `[your alert type]`
- Confirm the source connector matches `[your connector]`
- Extract the `user_id`, `source_ip`, `timestamp` (later queries refer to it as `{alert_timestamp}`), and `session_id` for use in subsequent steps

**Decision point:**
- If the trigger conditions are NOT met (e.g., the user is not a contractor), stop and return to the standard investigation flow.
- If the conditions are met, continue to Step 2.
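
Step 1 both gates the runbook and harvests the identifiers every later query needs. A minimal sketch, assuming the alert record arrives as a flat dictionary with the field names listed above and an ISO 8601 timestamp with an explicit offset; `extract_context` is hypothetical. Raising on a missing field keeps a partial alert from silently producing empty queries downstream.

```python
from datetime import datetime

REQUIRED_FIELDS = ("user_id", "source_ip", "timestamp", "session_id")


def extract_context(alert_record: dict) -> dict:
    """Pull the fields later steps need; fail loudly if any are absent or empty."""
    missing = [f for f in REQUIRED_FIELDS if not alert_record.get(f)]
    if missing:
        raise KeyError(f"alert record missing: {', '.join(missing)}")
    ctx = {f: alert_record[f] for f in REQUIRED_FIELDS}
    # Later queries use {alert_timestamp}; parse it once here.
    ctx["alert_timestamp"] = datetime.fromisoformat(ctx.pop("timestamp"))
    return ctx
```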

### Step 2: Establish the User's Authentication Baseline

Query the user's authentication history to understand what "normal" looks like
for this account.

```fsql
-- 14-day authentication baseline for the user, excluding the alerting login itself
SELECT timestamp, source_ip, geo_country, geo_city, user_agent, mfa_status, outcome
FROM [auth_source]
WHERE user_id = '{user_id}'
  AND timestamp > ago(14d)
  AND timestamp < '{alert_timestamp}'
ORDER BY timestamp DESC
```

**What to look for:**
- Typical login times (business hours vs. off-hours)
- Usual source countries and cities
- Consistent user-agent strings (browser/OS)
- MFA success rate (frequent failures may indicate shared credentials)
- Any previous logins from the same `source_ip` as the alert

**Decision point:**
- If the alert's source IP, location, and user-agent are consistent with the baseline, lower suspicion and continue to Step 3 for schedule verification.
- If the alert's source IP or location has never been seen before for this user, raise suspicion and carry that flag into Steps 3 and 4.
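
The baseline comparison is mostly set membership over the 14-day history. A sketch, assuming the query rows have been loaded into `Login` objects (a hypothetical type), that location is approximated by country, and that `mfa_status` uses the values `success` and `failure`; substitute whatever your auth source actually emits. The history must exclude the alerting login itself, or every IP will look previously seen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Login:
    source_ip: str
    geo_country: str
    user_agent: str
    mfa_status: str  # assumed values: "success" or "failure"


def compare_to_baseline(alert: Login, history: list[Login]) -> dict:
    """Flag which attributes of the alerting login are new for this user."""
    new_ip = alert.source_ip not in {h.source_ip for h in history}
    new_country = alert.geo_country not in {h.geo_country for h in history}
    new_agent = alert.user_agent not in {h.user_agent for h in history}
    mfa = [h.mfa_status for h in history if h.mfa_status in ("success", "failure")]
    rate: Optional[float] = mfa.count("success") / len(mfa) if mfa else None
    return {
        "new_ip": new_ip,
        "new_country": new_country,
        "new_user_agent": new_agent,
        "mfa_success_rate": rate,  # None when there is no MFA history to judge
        # Decision point: a never-seen IP or location raises suspicion.
        "raise_suspicion": new_ip or new_country,
    }
```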

### Step 3: Verify Against the Contractor Schedule

Check whether the contractor is currently authorized to be active.

```fsql
-- Check contractor engagement status (most recent engagement only)
SELECT user_id, contractor_name, engagement_start, engagement_end, status, manager
FROM [contractor_schedule_source]
WHERE user_id = '{user_id}'
ORDER BY engagement_end DESC
LIMIT 1
```

**What to look for:**
- Is `engagement_end` in the past? If so, how long ago?
- Is `status` set to "active", or has it been changed to "terminated" / "suspended"?
- Who is the listed `manager`? (needed for escalation)

**Decision point:**
- If the contractor is active and within their engagement window, continue to Step 4.
- If the engagement ended within the last 7 days, this may be a schedule update lag -- flag as **suspicious** and continue to Step 4.
- If the engagement ended more than 7 days ago, flag as **high suspicion** -- the account should have been deactivated. Continue to Step 4 with elevated urgency.
- If the query returns no record, the engagement cannot be verified: escalate as **Schedule data unavailable** (see Escalation) and continue to Step 4.
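
The date-based decision points form a small classifier over `engagement_end`. A sketch, assuming the query row is a dict with `datetime.date` values and that exactly 7 days past the end still counts as "within the last 7 days" (the text does not say which side the boundary falls on). `classify_schedule` and `ScheduleStatus` are hypothetical; `UNKNOWN` covers an empty result, which the Escalation table routes to the SOC Lead.

```python
from datetime import date
from enum import Enum
from typing import Optional


class ScheduleStatus(Enum):
    ACTIVE = "active"
    EXPIRED_RECENT = "expired 7 days or less"   # possible schedule-update lag: suspicious
    EXPIRED_STALE = "expired more than 7 days"  # account should be gone: high suspicion
    UNKNOWN = "unverifiable"                    # no schedule record found


def classify_schedule(record: Optional[dict], today: date, grace_days: int = 7) -> ScheduleStatus:
    """Map the Step 3 query row onto the decision-point buckets, using engagement_end only."""
    if record is None:
        return ScheduleStatus.UNKNOWN
    days_past_end = (today - record["engagement_end"]).days
    if days_past_end <= 0:
        return ScheduleStatus.ACTIVE
    if days_past_end <= grace_days:
        return ScheduleStatus.EXPIRED_RECENT
    return ScheduleStatus.EXPIRED_STALE
```

Like the decision points, this looks only at `engagement_end`. You may also want a `status` of "terminated" or "suspended" to raise suspicion even when the end date is still in the future.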

### Step 4: Examine Post-Login Activity

Query endpoint and application logs to see what happened AFTER the authenticated
session began.

```fsql
-- Post-login activity in the hour after the alert
SELECT timestamp, event_type, target_resource, action, outcome, source_ip
FROM [activity_source]
WHERE user_id = '{user_id}'
  AND timestamp BETWEEN '{alert_timestamp}' AND dateadd('{alert_timestamp}', 1h)
ORDER BY timestamp ASC
```

**What to look for:**
- Access to resources outside the contractor's documented scope of work
- Reconnaissance patterns: listing users, groups, shares, or permissions
- Data access or download volume exceeding normal patterns
- Lateral movement: connections to internal systems the contractor does not typically access
- Any attempt to modify account permissions or create new accounts

**Decision point:**
- If post-login activity is consistent with the contractor's normal work, proceed to the Decision Matrix with **low suspicion**.
- If post-login activity includes out-of-scope access or reconnaissance, proceed with **high suspicion**.
- If post-login activity includes data exfiltration or lateral movement, proceed with **critical suspicion** and recommend immediate session termination.
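
The decision points are ordered by severity, so the check should test for the worst finding first. A sketch, assuming each activity row has already been mapped to a coarse category. The category names (`bulk_download`, `lateral_connection`, and so on) and the `in_scope` resource set are hypothetical stand-ins for your event taxonomy and the contractor's documented scope of work. Treating permission changes and account creation as **high** is an assumption; the decision points above do not place them.

```python
from dataclasses import dataclass

# Hypothetical event categories; map your activity_source's event_type values onto these.
RECON = {"list_users", "list_groups", "list_shares", "list_permissions"}
ACCOUNT_CHANGES = {"modify_permissions", "create_account"}
CRITICAL = {"lateral_connection", "bulk_download"}  # lateral movement / exfiltration


@dataclass
class Event:
    event_type: str
    target_resource: str


def assess_activity(events: list[Event], in_scope: set[str]) -> str:
    """Return the Step 4 suspicion level: 'low', 'high', or 'critical'."""
    types = {e.event_type for e in events}
    if types & CRITICAL:
        return "critical"  # recommend immediate session termination
    out_of_scope = any(e.target_resource not in in_scope for e in events)
    if out_of_scope or types & (RECON | ACCOUNT_CHANGES):
        return "high"
    return "low"
```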

---

## Decision Matrix

<!-- Map combinations of findings to dispositions and recommended actions. -->
<!-- The framework uses this table to determine its proposed disposition. -->

| Schedule Status | Login Location | Post-Login Activity | Suspicion | Proposed Disposition | Recommended Action |
|-----------------|----------------|---------------------|-----------|----------------------|--------------------|
| Active | Known/approved | Normal, in-scope | Low | **Benign Activity** | Close. Document baseline confirmation. |
| Active | New but plausible | Normal, in-scope | Low-Medium | **Benign Activity** | Close. Note new location for baseline update. |
| Active | Unapproved country | Normal, in-scope | Medium | **Policy Violation** | Notify contractor manager. Review access policy. |
| Active | Any | Out-of-scope access | High | **Critical Threat** | Suspend session. Escalate to IR. |
| Expired (7 days or less) | Any | Any | Medium-High | **Policy Violation** | Disable account. Notify IT and contractor manager. Audit recent activity. |
| Expired (more than 7 days) | Any | Any | High | **Critical Threat** | Disable account immediately. Escalate to IR. Full forensic review. |
| Any | Impossible travel | Any | High | **Critical Threat** | Suspend session. Escalate to IR. Assume credential compromise. |
| Any | Any | Lateral movement or data exfiltration | Critical | **Critical Threat** | Terminate session. Isolate endpoint. Invoke incident response. |

When more than one row matches (for example, a recently expired account that also shows lateral movement), use the row with the highest suspicion.
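
The matrix can be evaluated mechanically. A sketch that checks every row and keeps the highest-suspicion match, so overlapping rows resolve toward the more severe disposition. The `Findings` labels are hypothetical encodings of the table's columns, `"*"` stands in for the table's "Any", and the ordering of suspicion levels is an assumption read off the table. Combinations no row covers raise an error rather than defaulting to benign.

```python
from dataclasses import dataclass

SEVERITY = {"Low": 0, "Low-Medium": 1, "Medium": 2, "Medium-High": 3, "High": 4, "Critical": 5}


@dataclass
class Findings:
    # Hypothetical encodings of the table columns.
    schedule: str  # "active" | "expired_recent" | "expired_stale"
    location: str  # "known" | "new_plausible" | "unapproved_country" | "impossible_travel"
    activity: str  # "normal" | "out_of_scope" | "lateral_or_exfil"


def row(schedule="*", location="*", activity="*"):
    """Build a row matcher; "*" plays the role of the table's "Any"."""
    wanted = (schedule, location, activity)
    return lambda f: all(w in ("*", g) for w, g in zip(wanted, (f.schedule, f.location, f.activity)))


# One entry per table row, in table order: (matcher, suspicion, disposition, action).
RULES = [
    (row("active", "known", "normal"), "Low", "Benign Activity",
     "Close. Document baseline confirmation."),
    (row("active", "new_plausible", "normal"), "Low-Medium", "Benign Activity",
     "Close. Note new location for baseline update."),
    (row("active", "unapproved_country", "normal"), "Medium", "Policy Violation",
     "Notify contractor manager. Review access policy."),
    (row("active", activity="out_of_scope"), "High", "Critical Threat",
     "Suspend session. Escalate to IR."),
    (row("expired_recent"), "Medium-High", "Policy Violation",
     "Disable account. Notify IT and contractor manager. Audit recent activity."),
    (row("expired_stale"), "High", "Critical Threat",
     "Disable account immediately. Escalate to IR. Full forensic review."),
    (row(location="impossible_travel"), "High", "Critical Threat",
     "Suspend session. Escalate to IR. Assume credential compromise."),
    (row(activity="lateral_or_exfil"), "Critical", "Critical Threat",
     "Terminate session. Isolate endpoint. Invoke incident response."),
]


def decide(f: Findings) -> tuple[str, str, str]:
    """Return (suspicion, disposition, action) from the most severe matching row."""
    matches = [rule for rule in RULES if rule[0](f)]
    if not matches:
        raise ValueError(f"no Decision Matrix row covers {f}; route to a human analyst")
    # max() keeps the first of equally severe rows, i.e. the earlier table row.
    _, suspicion, disposition, action = max(matches, key=lambda rule: SEVERITY[rule[1]])
    return suspicion, disposition, action
```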

---

## Escalation

<!-- Define who to contact for each escalation condition. -->
<!-- Use role names or on-call rotation names, not individual names (people change). -->

| Condition | Escalate To | Method | SLA |
|-----------|-------------|--------|-----|
| Credential compromise confirmed | IR On-Call | PagerDuty | 15 minutes |
| Contractor account active past engagement end | IT Identity Team | Slack #identity-ops | 1 hour |
| Policy violation (unapproved country) | Contractor's Manager + Security Governance | Email | 4 hours |
| Lateral movement or data exfiltration | IR On-Call + CISO | PagerDuty + Phone | Immediate |
| Schedule data unavailable (cannot verify) | SOC Lead | Slack #soc-escalations | 1 hour |

> **Note:** If you cannot reach the escalation target within the SLA, escalate to
> the next level up. Never let an unresolved critical finding sit without an owner.
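
The SLA column plus the note above amount to a deadline and a one-step fallback. A sketch, assuming "Immediate" means a zero-minute SLA and that acknowledgment is tracked elsewhere (for example, in your paging tool). The condition keys and `escalation_status` are hypothetical, and the bracketed next-level owners are placeholders to fill in like the rest of this template.

```python
from datetime import datetime, timedelta

# Rows from the table above; "Immediate" is modeled as a zero SLA.
ESCALATION = {
    "credential_compromise": ("IR On-Call", timedelta(minutes=15)),
    "account_past_engagement_end": ("IT Identity Team", timedelta(hours=1)),
    "policy_violation_country": ("Contractor's Manager + Security Governance", timedelta(hours=4)),
    "lateral_or_exfil": ("IR On-Call + CISO", timedelta(0)),
    "schedule_unavailable": ("SOC Lead", timedelta(hours=1)),
}

# Placeholder "next level up" for each target; define yours in org-policy.
NEXT_LEVEL = {
    "IR On-Call": "[IR manager rotation]",
    "IT Identity Team": "[identity team lead]",
    "Contractor's Manager + Security Governance": "[security governance lead]",
    "IR On-Call + CISO": "[executive crisis contact]",
    "SOC Lead": "[SOC manager]",
}


def escalation_status(condition: str, raised_at: datetime, now: datetime,
                      acknowledged: bool) -> tuple[str, datetime]:
    """Return (current owner, SLA deadline), moving one level up after an unacknowledged breach."""
    target, sla = ESCALATION[condition]
    deadline = raised_at + sla
    if acknowledged or now <= deadline:
        return target, deadline
    # SLA missed with no acknowledgment: hand the finding up so it is never ownerless.
    return NEXT_LEVEL[target], deadline
```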

---

## Red Flags

<!-- Common mistakes, shortcuts, or assumptions that lead to bad outcomes -->
<!-- for this specific investigation type. These are guardrails for the framework -->
<!-- and for human analysts reviewing the output. -->

1. **Trusting the schedule alone.** A contractor marked "active" in the schedule may still be compromised. Schedule status tells you whether the account *should* exist, not whether the *current session* is legitimate. Always complete the post-login activity check (Step 4).

2. **Ignoring "benign" logins from new locations.** Contractor travel is common, but it is also the easiest cover story for a compromised credential. If the location is new AND the contractor has no travel on file, treat it as suspicious until verified.

3. **Closing on MFA success alone.** MFA is not proof of identity. SIM swapping and MFA fatigue attacks produce successful MFA events, and session hijacking skips MFA entirely by replaying a token from a session that already passed it. Look at the full authentication sequence, not just the outcome.

4. **Skipping the post-login activity check for "low severity" alerts.** The alert severity reflects the detection tool's assessment. Your runbook exists because your organization knows this alert pattern carries risk that the detection tool underweights. Always run Step 4.

5. **Assuming deactivated accounts are harmless.** An account that should have been deactivated but was not is a high-value target for attackers. Treat active sessions on such accounts as critical, not as an IT hygiene issue.

---

## Revision History

<!-- Track changes to this runbook so analysts know when procedures were last updated. -->

| Date | Author | Change |
|------|--------|--------|
| YYYY-MM-DD | [your name] | Initial version |