@query-ai/digital-workers 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,170 @@
+ ---
+ name: detection-engineer
+ description: Use when hunt findings need to be converted into production-ready detection artifacts (FSQL queries, Sigma rules, Query recipes) and coverage gaps need remediation plans
+ ---
+
+ # Detection Engineer
+
+ ## Iron Law
+
+ **EVERY DETECTION MUST HAVE A DOCUMENTED FALSE POSITIVE CONDITION.** A detection without tuning guidance is a noise generator.
+
+ ## When to Invoke
+
+ Called by the `threat-hunt` orchestrator at Phase 4, after hunt-pattern-analyzer classifies findings.
+
+ ## Process
+
+ ### Step 1: Receive Patterns
+
+ Receive patterns from hunt-pattern-analyzer — both findings (things we CAN see) and coverage gaps (things we CAN'T see).
+
+ ### Step 2: Convert Findings into Detections
+
+ For each finding:
+
+ a. **Generate FSQL detection queries** — iterate with `FSQL_Query_Generation` -> `Search_FSQL_SCHEMA` -> `Validate_FSQL_Query` until correct. Do NOT fire-and-forget. Treat MCP tools as collaborators.
+
+ b. **Generate Query platform recipes** via `RECIPE_FROM_FSQL_Query_Generation`.
+
+ c. **Generate Sigma rules** for portable detections.
+
+ d. **Test each detection** against the hunt time window — document hit rate and false positives. Use SUMMARIZE to measure hit rate and false positive distribution efficiently:
+
+ ```
+ -- Hit rate: how many events match the detection?
+ SUMMARIZE COUNT detection_finding.message
+ WITH <your_detection_filter> AFTER 7d
+
+ -- False positive rate: what percentage are already resolved/benign?
+ SUMMARIZE COUNT detection_finding.status_id GROUP BY detection_finding.status_id
+ WITH <your_detection_filter> AFTER 7d
+
+ -- Host distribution: is this firing on one host or many?
+ SUMMARIZE COUNT detection_finding.message GROUP BY detection_finding.device.hostname
+ WITH <your_detection_filter> AFTER 7d
+ ```
+
+ A detection that fires 500 times on 1 host is different from one that fires 5 times across 100 hosts. SUMMARIZE tells you this in one query instead of manually counting QUERY results.
+
+ > **Constraints:** SUMMARIZE has known execution limits — `status_id` filtering fails on detection_finding (use GROUP BY instead), `FROM` is not supported, and high-cardinality GROUP BY can overflow. If SUMMARIZE returns empty, fall back to QUERY. See fsql-expert Layer 1c for workarounds, and check `summarize_support` in the environment profile.
+
+ e. **Document false positive conditions** for EVERY detection.
+
+ #### MCP Tool Interaction — Iterative, Not Fire-and-Forget
+
+ For each detection query:
+
+ 1. Call `FSQL_Query_Generation` with the behavioral pattern description
+ 2. Review the generated query — is it accurate? Does it use the right fields?
+ 3. Call `Search_FSQL_SCHEMA` to verify field paths exist
+ 4. Call `Validate_FSQL_Query` — if validation fails, adjust and re-validate
+ 5. Call `Execute_FSQL_Query` against the hunt time window to test hit rate
+ 6. Call `RECIPE_FROM_FSQL_Query_Generation` with the validated query
+ 7. Document the detection with FP conditions
+
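+ As a sketch, one pass through this loop might end with a validated query like the one below. The behavior and field paths here are hypothetical placeholders, not fields verified against your schema:
+
+ ```
+ -- Hypothetical detection from one iteration: encoded PowerShell execution
+ -- (field paths to be confirmed via Search_FSQL_SCHEMA, then validated before execution)
+ QUERY process_activity.process.cmd_line, process_activity.device.hostname,
+       process_activity.time
+ WITH process_activity.process.cmd_line CONTAINS '-EncodedCommand' AFTER 7d
+ ```
+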
+ ### Step 3: Build Gap Remediation Plan
+
+ For each coverage gap:
+
+ a. **Document the gap**: what data is missing, which TTPs it blocks, and what connector, event type, or field would fill it.
+
+ b. **Assess impact**: what threats are invisible because of this gap.
+
+ c. **Propose remediation**: a specific connector deployment, field mapping request, or platform config change.
+
+ d. **Prioritize by risk**: gaps that blind us to high-impact TTPs rank higher.
+
+ ## Output
+
+ Two deliverables are returned to the calling orchestrator.
+
+ ### Deliverable 1 — Detection Package
+
+ ```
+ DETECTION PACKAGE
+ ━━━━━━━━━━━━━━━━━
+ Hunt: [hunt ID]
+ Date: [date]
+ Findings converted: [N]
+
+ Detection 1: [name]
+ Trigger: [what behavior this detects]
+ MITRE ATT&CK: [technique ID] — [technique name]
+
+ FSQL Detection Query:
+ [the validated FSQL query]
+
+ Sigma Rule:
+ title: [detection name]
+ status: experimental
+ description: [what it detects]
+ logsource:
+   category: [category]
+   product: [product]
+ detection:
+   selection:
+     [field]: [value]
+   condition: selection
+ falsepositives:
+   - [documented FP condition 1]
+   - [documented FP condition 2]
+ level: [medium/high/critical]
+ tags:
+   - attack.[tactic]
+   - attack.[technique_id]
+
+ Query Recipe: [recipe output from RECIPE_FROM_FSQL_Query_Generation]
+
+ Validation Results:
+ FSQL validation: PASS
+ Test against hunt window: [N] hits, [N] false positives
+ Estimated FP rate: [percentage or qualitative assessment]
+
+ False Positive Conditions:
+ 1. [specific condition that would trigger this detection benignly]
+ 2. [another FP condition]
+ Tuning guidance: [how to reduce FPs without losing true positives]
+
+ Detection 2: [...]
+ ```
+
+ ### Deliverable 2 — Gap Remediation Plan
+
+ ```
+ GAP REMEDIATION PLAN
+ ━━━━━━━━━━━━━━━━━━━━
+ Hunt: [hunt ID]
+ Date: [date]
+
+ Gap 1: [description]
+ Missing: [event type / field / connector capability]
+ Blocks: [MITRE techniques that cannot be hunted]
+ Impact: [HIGH/MEDIUM/LOW] — [what threats are invisible]
+ Remediation: [specific action — deploy X, configure Y, request mapping Z]
+ Priority: [1-5, 1 = critical blind spot]
+
+ Gap 2: [...]
+
+ SUMMARY:
+ Detections created: [N]
+ Gaps identified: [N]
+ TTPs fully covered: [list]
+ TTPs partially covered: [list]
+ TTPs blind: [list — these are the risks]
+ ```
+
+ **Return the detection package and gap remediation plan to the threat-hunt orchestrator. Do not present them to the user or wait for input.**
+
+ ## Red Flags
+
+ | Red Flag | Correct Action |
+ |----------|---------------|
+ | "Detection query works, ship it" without testing | STOP. Every detection must be tested against the hunt time window. Document hit rate AND false positives. |
+ | Detection without false positive documentation | STOP. A detection without FP conditions is a noise generator. Document at least one FP scenario. |
+ | Accepting the first FSQL_Query_Generation output without review | STOP. Iterate. Verify fields via Search_FSQL_SCHEMA. Validate. Cross-reference. |
+ | Skipping Sigma rule generation | STOP. Sigma rules provide portability. Not every environment uses Query. Generate both FSQL and Sigma. |
+ | Coverage gap without a remediation recommendation | STOP. Identifying a gap without proposing a fix is half the job. Be specific: which connector, which field, which configuration. |
+ | Gap remediation without priority | STOP. Not all gaps are equal. A gap that blinds you to lateral movement is more critical than one that blocks geolocation. Prioritize by risk. |
+ | Hardcoding connector names in detections | STOP. Detections should use OCSF event types and field paths, not connector-specific references. They must work across any connector that maps to the same OCSF schema. |
+ | Not generating a Query recipe | STOP. The recipe is the deployment artifact for the Query platform. Always call RECIPE_FROM_FSQL_Query_Generation. |
@@ -0,0 +1,109 @@
+ ---
+ name: evidence-quality-checker
+ description: Use at Gate 2 and Gate 3 exits to verify data quality and analytical reasoning before proceeding — catches status filtering errors, vendor-label anchoring, and missing-data gaps before they cascade into false escalations
+ ---
+
+ # Evidence Quality Checker
+
+ ## Iron Law
+
+ **CHECK THE EVIDENCE QUALITY BEFORE BUILDING THE NARRATIVE.**
+
+ Reasoning errors cascade. A missing `status_id` filter in Gate 2 becomes a false APT attribution in Gate 4 and a wrong escalation in Gate 6. These checkpoints exist to catch errors before they compound.
+
+ ## When to Invoke
+
+ - **Gate 2 exit**: Data quality pass — invoked by `alert-investigation` after enrichment, before severity scoring
+ - **Gate 3 exit**: Analytical reasoning pass — invoked by `alert-investigation` after severity scoring, before specialist invocation
+
+ ## Reasoning Principles
+
+ Five principles anchor all checks. Understand the *why* before running the *what*.
+
+ ### 1. Status before narrative
+
+ An alert's `status_id` determines whether it's actionable. RESOLVED/Benign alerts are closed investigations, not active threats. Always separate findings by status before drawing conclusions.
+
+ This applies to **findings events only** (`detection_finding`, `security_finding`, `vulnerability_finding`, `incident_finding`). Note: `authentication.status_id` means SUCCESS/FAILURE (an auth result, a different concept). Telemetry events (`network_activity`, `process_activity`, `file_activity`) don't have an alert-lifecycle `status_id`.
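+
+ A minimal sketch of this principle as a query, using `status_id` as a GROUP BY dimension (the alert filter is a hypothetical placeholder):
+
+ ```
+ -- Bucket findings by lifecycle status before drawing any conclusion
+ SUMMARIZE COUNT detection_finding.message
+ GROUP BY detection_finding.status_id
+ WITH <your_alert_filter> AFTER 7d
+ -- Only the NEW bucket feeds the narrative; RESOLVED rows are closed investigations
+ ```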
+
+ ### 2. Vendor labels are hypotheses, not facts
+
+ "Dukozy malware detected" means a signature matched. "RESOLVED/Benign" means the same vendor investigated and dismissed it. The label and the resolution come from the same source — you can't trust one and ignore the other.
+
+ ### 3. Volume is not evidence
+
+ "~600 alerts" is a count, not a finding. 600 resolved alerts are less concerning than 3 unresolved ones. Signal/noise separation must happen before any alert count appears in a report.
+
+ ### 4. Absence is a finding
+
+ If file hashes are empty, authentication logs are unavailable, or a connector returns nothing, that's not a blank to skip over. It constrains what conclusions you can reach. State it explicitly: "Cannot confirm X because Y data is unavailable."
+
+ ### 5. Confidence tracks evidence, not severity
+
+ High-severity source labels don't mean high-confidence findings. Confidence comes from corroboration across independent sources: hash verification, behavioral correlation. If your only evidence is vendor labels, confidence is MEDIUM at best. Five detections from the same platform are one source, not five.
+
+ ---
+
+ ## Gate 2 Checkpoint — Data Quality
+
+ Run at Gate 2 exit, before severity scoring. Five binary checks, designed to take under 60 seconds.
+
+ Review the enrichment queries and results from Gate 2, then evaluate each check:
+
+ | # | Check | If No |
+ |---|-------|-------|
+ | 1 | Do all **findings** lookback queries (time range beyond the initial intake window, e.g., 7d/30d) include `status_id` in field selectors? | Re-run with `status_id` before proceeding |
+ | 2 | Have findings results been separated into NEW vs. RESOLVED vs. null buckets? | Separate now — count only NEW as active |
+ | 3 | Are any IOC types expected but absent (e.g., no file hashes in malware detections)? | Document as a data gap with impact on conclusions |
+ | 4 | Did any enrichment queries return zero results? | Note which data sources returned empty and why |
+ | 5 | Is the alert count reported without a status breakdown? | Add a status breakdown before any count reaches the report |
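+
+ As a sketch, failing check 1 is fixed by re-running the lookback with the status field included (the filter here is a hypothetical placeholder):
+
+ ```
+ -- Re-run of a findings lookback with status_id added to the field selectors
+ QUERY detection_finding.message, detection_finding.status_id, detection_finding.time
+ WITH <original_lookback_filter> AFTER 30d
+ ```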
+
+ ---
+
+ ## Gate 3 Checkpoint — Analytical Reasoning
+
+ Run at Gate 3 exit, after severity scoring and before specialist invocation. Eight checks, ~90 seconds.
+
+ Review the severity scoring, Five W's, and accumulated evidence, then evaluate each check:
+
+ | # | Check | If No |
+ |---|-------|-------|
+ | 1 | For each vendor detection label cited as evidence: have its `status_id` and `status_detail` been checked? | Check now — a RESOLVED/Benign detection is not evidence of active compromise |
+ | 2 | Is the threat actor attribution supported by NEW/unresolved detections, not just vendor labels? | Downgrade attribution confidence or remove the claim |
+ | 3 | Does the severity score reflect actual evidence quality, or is it anchored to source severity labels? | Re-score the Confidence dimension based on evidence, not labels |
+ | 4 | Are all "data not available" findings documented with their impact on conclusions? | Add to Five W's and report |
+ | 5 | Have the appropriate specialist skills been identified for Gate 4 invocation based on alert types? | Identify now and ensure Gate 4 invokes them |
+ | 6 | Can every conclusion in the Five W's be traced to a specific query result — and, if sourced from findings data, verified as `status_id = NEW`? | Remove or downgrade unsupported conclusions |
+ | 7 | If `status_detail = "UnsupportedAlertType"`, has this been flagged as a potential integration gap rather than a confirmed threat? | Add a caveat to findings |
+ | 8 | If the environment profile was used to skip queries (known FPs, unpopulated fields, broken observables), is each skip justified by a profile entry with `last_verified` < 7 days ago? | Re-verify stale entries before relying on them — run the query instead of skipping |
+
+ ---
+
+ ## Output Format
+
+ After running each checkpoint, output:
+
+ ```
+ EVIDENCE QUALITY CHECK — [Gate 2 | Gate 3]
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ [1] [Check description]: PASS | FAIL
+ [If FAIL: what was found and what action to take]
+
+ [2] [Check description]: PASS | FAIL
+ ...
+
+ Result: ALL PASS | [N] FAILURES — fix before proceeding
+ ```
+
+ If any check FAILs, fix the issue before proceeding to the next gate. Do not defer fixes to later gates or to the report.
+
+ **After the check completes — whether all checks pass or after failures are fixed — immediately continue to the next gate without pausing.** This is an inline quality gate, not a stopping point. Do not wait for user input. Print the results and keep going.
+
+ ## Red Flags
+
+ | Red Flag | Correct Action |
+ |----------|---------------|
+ | "All checks passed" in under 10 seconds | STOP. You didn't actually check. Review each item against the evidence. |
+ | "This check doesn't apply to this investigation" | It might not. But state WHY it doesn't apply — don't just skip. |
+ | "I'll fix this in the report" | STOP. Fix it now. The point is to prevent errors from cascading, not to annotate them later. |
@@ -0,0 +1,308 @@
+ ---
+ name: fsql-expert
+ description: Use when any FSQL query needs to be authored, validated, or executed against the Query Data Mesh — handles all data access for investigations
+ ---
+
+ # FSQL Expert
+
+ ## ABSOLUTE RULES — Read These First
+
+ **1. NEVER use Bash, cat, python, jq, or any shell command to process query results.** If results overflow to a file, your query was too broad. Re-run with specific field selectors or tighter filters. DO NOT cat the file. DO NOT pipe to python. NEVER.
+
+ **2. NEVER use `**` on broad queries.** Use specific field selectors: `detection_finding.message, detection_finding.severity_id, detection_finding.time, detection_finding.observables`. Only use `**` when scoped to a single host or a single event type.
+
+ **3. ALWAYS validate before execute.** Every query goes through `Validate_FSQL_Query` first. The query you validate must be identical to the query you execute.
+
+ ## Iron Law
+
+ **NO QUERY HITS THE MESH WITHOUT VALIDATION FIRST.**
+
+ Every FSQL query must be validated via `Validate_FSQL_Query` before execution via `Execute_FSQL_Query`.
+
+ ## Overview
+
+ You are the data access layer for all Digital Workers investigations. Every piece of evidence comes through you. Your job is to author precise FSQL queries, validate them against the live schema, execute them against the mesh, and return structured results.
+
+ ## Available MCP Tools
+
+ The Query Data Mesh provides these MCP tools. **Call them directly as MCP tools — do NOT use ToolSearch to find them.** They are provided by the QueryDemoMCP server configured in `.mcp.json`.
+
+ | Tool | Purpose |
+ |------|---------|
+ | `Execute_FSQL_Query` | Run a validated FSQL query against the mesh. Returns OCSF events. |
+ | `Validate_FSQL_Query` | Check whether an FSQL query (QUERY or SUMMARIZE) is syntactically valid before execution. |
+ | `Search_FSQL_SCHEMA` | Search the schema vector database for attributes and event types. |
+ | `FSQL_Connectors` | List all available connectors in the mesh (data sources). |
+ | `FSQL_Query_Generation` | Generate FSQL from natural language (useful for complex queries). |
+ | `KQL_TO_FSQL_Query_Generation` | Convert KQL queries to FSQL. |
+ | `SIGMA_TO_FSQL_Query_Generation` | Convert Sigma detection rules to FSQL. |
+
+ ## Reference — MANDATORY Session Start
+
+ **Step 1 (REQUIRED): Read the MCP syntax reference.** At the start of every investigation session, read `fsql://docs/syntax-reference` via `ReadMcpResourceTool` (server: `QueryDemoMCP`). This unlocks advanced operators you will need — set operations (UNION/INTERSECTION/EXCEPT), type filters (@ip, @user), ~40 observable types, and depth modifiers. Sessions that skip this step produce simpler, less comprehensive queries. Do not proceed to query authoring until you have read this resource.
+
+ **Step 2: Read the investigation cheat sheet.** See `fsql-reference.md` in this skill directory for investigation-specific patterns — the Layer 1a/1b discovery approach, event-type-specific field selectors, telemetry pivot patterns, and status filtering rules. This is your playbook for how to structure investigation queries.
+
+ **How these two resources work together:**
+ - The **cheat sheet** gives you the investigation methodology — Layer 1a discovery, pivot patterns, field selectors per event type, status_id rules
+ - The **MCP syntax reference** gives you the full operator toolkit — set operations, type filters, advanced observables, category selectors
+ - Use both. Agents that only use the cheat sheet write correct but basic queries. Agents that also internalize the syntax reference write sophisticated queries using set operations and type filters that find more evidence.
+
+ **Additional MCP resources** (read as needed during investigation):
+ - `fsql://docs/categories-and-events` — all OCSF categories and event classes (read when pivoting to unfamiliar event types)
+ - `fsql://docs/kql-conversion-guide` — converting KQL queries to FSQL
+ - `fsql://docs/sigma-logsource-mappings` — converting Sigma detection rules to FSQL
+
+ **Step 3: Read the environment profile.** If `digital-workers/learned/environment-profile.json` exists, read it. It records what previous investigations learned about this mesh — which fields are populated per connector, which observables work, which event types have data, query performance limits, and known false positives. Use this knowledge to avoid re-discovering known gaps.
+
+ Key profile sections and how they affect query authoring:
+ - **`field_population[connector_id]`** — Skip known-unpopulated field paths on specific connectors. Use the `workaround` field for alternatives (e.g., `%ip` instead of `device.hostname`). Check whether other connectors producing the same event type have the field before giving up entirely.
+ - **`event_type_availability[connector_id]`** — Skip event types marked `no_data` on all connectors. If some connectors are `untested`, probe one with a `FROM <connector_id>` query before declaring a gap.
+ - **`observable_support`** — Skip observables marked `not_working`. Check `connectors_tested` — if not all relevant connectors have been tested, consider probing an untested one.
+ - **`query_performance`** — Respect batch size limits, known overflow patterns, and query reliability notes. For `%ip` discovery scans, always use single-IP queries (the IN operator is unreliable).
+ - **`known_false_positives`** — If a detection message is listed with `last_verified` less than 7 days ago, skip re-verification and cite the profile (include the last_verified date and sample size in your findings).
+ - **`systemic_patterns`** — Recurring alert patterns that are un-actionable but not vendor-resolved FPs (e.g., `UnsupportedAlertType` alerts with no enrichment data). Fast-track these through investigation — minimal enrichment, profile-cited disposition.
+ - **`connector_behaviors`** — Connector-level quirks that affect all event types (e.g., "status_id always null" on CrowdStrike). Use these to interpret findings correctly — a null status_id from a connector that never populates it is different from a genuinely unknown status.
+ - **`summarize_support[event_type]`** — Check before writing SUMMARIZE queries. If the event type is `not_working`, skip SUMMARIZE and use QUERY. If `partial`, check `known_issues` for the specific filter or GROUP BY you plan to use. If `untested`, try SUMMARIZE and update the profile with the result.
+
+ If the profile doesn't exist, proceed normally. Your discoveries during this investigation will create it.
+
+ ## The Three-Layer Process
+
+ ### Layer 1: Author the Query
+
+ Using the FSQL syntax reference, construct a query for the investigation need:
+ - **Use specific field selectors, not `**`.** Select only the fields you need (e.g., `detection_finding.message, detection_finding.severity_id, detection_finding.time, detection_finding.observables`). The `**` wildcard returns the entire OCSF event with all nested objects, which produces massive payloads. Only use `**` when you genuinely need every field.
+ - Choose the right event type(s) or use category selectors (`#network`, `#findings`)
+ - Use observable shortcuts (`%ip`, `%hash`, `%email`) for broad searches
+ - Apply appropriate filters and time ranges
+ - Scope with a `FROM` clause only when targeting specific connectors
+ - For complex queries, use `FSQL_Query_Generation` to generate from natural language, then review and adjust
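+
+ A minimal sketch of the first bullet in practice, using the severity filter pattern that appears later in this reference:
+
+ ```
+ -- Too broad: full OCSF events with all nested objects, likely to overflow
+ QUERY ** WITH detection_finding.severity_id IN HIGH, CRITICAL AFTER 7d
+
+ -- Scoped: only the fields the investigation needs
+ QUERY detection_finding.message, detection_finding.severity_id,
+       detection_finding.time, detection_finding.observables
+ WITH detection_finding.severity_id IN HIGH, CRITICAL AFTER 7d
+ ```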
+
+ ### Layer 2: Discover Schema (When Needed)
+
+ Before querying unfamiliar event types or filtering on fields you haven't used, call `Search_FSQL_SCHEMA`:
+
+ ```
+ Search_FSQL_SCHEMA(query="detection_finding severity", limit=10)
+ ```
+
+ This tells you:
+ - What fields actually exist for this event type (`fsql_path`)
+ - Whether a field is a string, enum, array, IP, etc. (`data_type`)
+ - Enum values, if applicable (`enum_values`)
+ - Attribute descriptions for understanding field semantics
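+
+ For illustration, a single hit might look like the sketch below. The key names come from the list above, but the exact response shape and values are illustrative, not a documented contract:
+
+ ```
+ { "fsql_path": "detection_finding.severity_id",
+   "data_type": "enum",
+   "enum_values": ["LOW", "MEDIUM", "HIGH", "CRITICAL"] }
+ ```
+
+ An enum like this is filtered with `IN` (e.g., `severity_id IN HIGH, CRITICAL`); a string field would be matched with `CONTAINS` instead.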
+
+ Also use `FSQL_Connectors` to see what data sources are available in the mesh.
+
+ **ALWAYS run schema search when:**
+ - Querying an event type for the first time in this investigation (before writing the query, not after it fails)
+ - Layer 1a discovery reveals an event type you haven't queried before — look up its fields before writing the Layer 1b query
+ - Unsure whether a field path is correct
+ - You need to know whether a field is an enum (use `IN`) or a string (use `CONTAINS`)
+
+ **Skip schema search when:**
+ - Using well-known patterns from the reference (e.g., `detection_finding.severity_id IN HIGH, CRITICAL`)
+ - Repeating a query pattern that already succeeded in this investigation
+
+ ### Layer 3: Validate Then Execute
+
+ 1. **Validate:** Call `Validate_FSQL_Query` with your query — it returns `is_valid: true/false` with error detail
+ 2. **Check:** If validation fails, read the error detail, fix the query, and validate again
+ 3. **Execute:** Only after `is_valid: true`, call `Execute_FSQL_Query` to run against the mesh
+ 4. **Self-correct (MANDATORY):** If execution returns empty results when data is expected, OR returns an error about field paths, you MUST call `Search_FSQL_SCHEMA` to discover the correct fields and retry with fixed paths. Do not document a data gap without first attempting schema search and retry. The schema search tells you the exact field paths — use them.
+ 5. **IOC extraction fallback chain.** When querying for IOCs (users, IPs, hostnames) from detection findings:
+    - **Try structured fields first:** `detection_finding.actor.user.email_addr`, `detection_finding.finding_info`, `detection_finding.observables`
+    - **If structured fields are empty:** Query `detection_finding.raw_data` for the same alerts. Extract IOCs from JSON keys — common patterns: `impacted_assets` (user: hostname), `impacted_ips`, `entities`, `evidence`
+    - **Check the profile:** If `field_population` shows these fields as `unpopulated` on the relevant connector, skip straight to raw_data (this saves a wasted query)
+    - **Do NOT probe multiple field paths.** One structured query + one raw_data query max. If both are empty, document the gap.
+ 6. **Update the environment profile.** After any of these outcomes, update `digital-workers/learned/environment-profile.json` immediately (do not wait until the end of the investigation):
+    - **Query returned 0 results on a field path** → Add a `field_population` entry for the connector(s) that produce this event type. If the query used a `FROM` clause, attribute to that connector with `confidence: "high"`. Otherwise, attribute to all connectors producing the event type with `confidence: "medium"`.
+    - **Observable query errored or returned empty** → Add an `observable_support` entry with `status: "not_working"` and note which `connectors_tested`.
+    - **Query overflowed (results saved to file)** → Add a `query_performance` entry with the pattern description and threshold.
+    - **Event type returned 0 results** → Add an `event_type_availability` entry for the relevant connector(s).
+    - **Query returned data for a field/event type previously marked unpopulated** → Update the existing entry to `status: "populated"` or `status: "has_data"` and refresh `last_verified`.
+    - **SUMMARIZE succeeded on an event type** → Add or update the `summarize_support` entry with `status: "working"`; note `filters_tested` and `group_by_tested`.
+    - **SUMMARIZE failed or returned empty** → Add or update the `summarize_support` entry with `status: "not_working"` or `"partial"` and document the failure in `known_issues`.
+    - **SUMMARIZE succeeded on a previously `not_working` event type** → Update to `status: "working"` (the platform may have fixed the issue).
+
+ When updating, always set `last_verified` to today's date, set `source_investigation` to the current investigation ID, and preserve existing entries for other connectors.
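+
+ For illustration, a single `field_population` entry written after the first outcome above might look like this sketch. The connector ID, date, and investigation ID are hypothetical; the key names follow the profile sections described earlier:
+
+ ```
+ "field_population": {
+   "example_edr_connector": {
+     "detection_finding.actor.user.email_addr": {
+       "status": "unpopulated",
+       "workaround": "extract IOCs from detection_finding.raw_data",
+       "confidence": "high",
+       "last_verified": "2025-01-15",
+       "source_investigation": "INV-0042"
+     }
+   }
+ }
+ ```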
+
+ ## Red Flags
+
+ | Red Flag | Correct Action |
+ |----------|---------------|
+ | "I'll just run this query and see" | STOP. Call `Validate_FSQL_Query` first. |
+ | "This field name looks right" | STOP. Use `Search_FSQL_SCHEMA` to verify. |
+ | "No results — the data isn't there" | STOP. Try a broader query, a different time range, or an observable search. Check `FSQL_Connectors` for available data sources. |
+ | "I'll skip validation, this is a simple query" | STOP. ALL queries get validated. Simple queries have simple validations. |
+ | Running the same failed query a second time | STOP. Use `Search_FSQL_SCHEMA` to understand why it failed. Fix before retrying. |
+ | Using Bash, cat, or Python to parse MCP tool results | STOP. **Never use Bash commands to process query results.** Analyze inline results directly. If results overflow to a file, DO NOT read the file — re-run the query with specific field selectors to get a smaller result that fits inline. The file is a signal that your query was too broad. |
+ | Query result was saved to a file (too large for context) | STOP. **Do not cat, Read, or process the file.** Your query was too broad. Re-run with specific field selectors, tighter filters (single host, single event type), or a narrower time range. |
+ | Using ToolSearch to find MCP tools | STOP. Call `Execute_FSQL_Query`, `Validate_FSQL_Query`, etc. directly. They are MCP tools, not deferred tools. |
+
+ ## Query Strategy for Investigations
+
+ ### Layered Query Approach
+
+ #### Layer 1a — Discovery Scan (always start here)
+
+ Find ALL activity for IOCs across the mesh without overflowing. Use `*.message, *.time` — this returns one lightweight row per matching event, with the `__event` field showing which event type it came from.
+
+ **BATCH same-type IOCs using `IN` where reliable — but use single queries for `%ip`:**
+
+ **`%ip` discovery: one query per IP.** The `%ip IN` operator is unreliable on discovery scans — 4-IP batches always error, and 2-IP batches are intermittent. Use single-IP queries for guaranteed results:
+
+ ```
+ -- IP discovery: one query per IP (reliable)
+ QUERY *.message, *.time WITH %ip = '10.0.0.1' AFTER 24h
+ QUERY *.message, *.time WITH %ip = '10.0.0.2' AFTER 24h
+
+ -- Hashes: IN batching works reliably
+ QUERY *.message, *.time WITH %hash IN 'f6c3023f', 'a1b2c3d4' AFTER 7d
+ ```
+
+ Read the `__event` field in the results — it tells you where your IOCs appeared (e.g., `detection_finding`, `email_activity`, `osint_inventory_info`, `process_activity`). This is how you discover event types you wouldn't have thought to query.
+
+ **Never use bare `QUERY %hash = 'x'` or `QUERY ** WITH %hash = 'x'` for discovery.** These return full OCSF events and will overflow on any IOC with significant activity.
+
+ #### Layer 1b — Targeted Detail
+
+ Once Layer 1a tells you which event types have hits, query those specific event types with the fields you need:
+
+ ```
+ -- Layer 1a showed hits in email_activity and detection_finding
+ QUERY email_activity.message, email_activity.time, email_activity.actor.user.email_addr
+ WITH %hash = 'f6c3023f' AFTER 7d
+
+ QUERY detection_finding.message, detection_finding.severity_id, detection_finding.status_id,
+       detection_finding.device.hostname
+ WITH %hash = 'f6c3023f' AND detection_finding.status_id = NEW AFTER 7d
+ ```
+
+ If you don't know the field paths for an event type, run `Search_FSQL_SCHEMA` first (see Layer 2).
+
+ #### Layer 1c — Aggregation & Counting
+
+ When Layer 1a/1b results return many records and you need distributions or unique counts rather than individual events, use SUMMARIZE:
+
+ ```
+ -- Status distribution (see which alerts are NEW vs RESOLVED)
+ SUMMARIZE COUNT detection_finding.status_id
+ GROUP BY detection_finding.status_id
+ WITH detection_finding.severity_id IN HIGH, CRITICAL AFTER 7d
+
+ -- Alert type breakdown with status (GROUP BY status_id instead of filtering on it)
+ SUMMARIZE COUNT detection_finding.message
+ GROUP BY detection_finding.message, detection_finding.status_id
+ WITH detection_finding.severity_id IN HIGH, CRITICAL AFTER 24h
+
+ -- Per-host alert distribution
+ SUMMARIZE COUNT detection_finding.message
+ GROUP BY detection_finding.device.hostname, detection_finding.severity_id
+ AFTER 7d
+
+ -- Authentication: unique IPs per user (works reliably with all filters)
+ SUMMARIZE COUNT DISTINCT authentication.device.ip
+ GROUP BY authentication.actor.user.email_addr
+ WITH authentication.status_id = FAILURE AFTER 24h
+ ```
210
+
211
+ **When to use SUMMARIZE vs QUERY:**
212
+ - Need individual events (IOCs, timelines, raw_data)? → QUERY with field selectors
213
+ - Need counts, distributions, or unique entity counts? → SUMMARIZE
214
+ - Unsure? Start with QUERY. Switch to SUMMARIZE when you catch yourself manually counting results.
215
+
216
+ **Rules:**
217
+ - All fields must reference the same OCSF event class (cross-event-class fails validation)
218
+ - The validate-before-execute rule still applies — run SUMMARIZE queries through `Validate_FSQL_Query` before executing
219
+ - SUMMARIZE queries do NOT need the `VALIDATE` prefix — the tool adds it automatically
220
+ - Include status_id in GROUP BY on findings lookbacks to separate NEW from RESOLVED (see constraints below)
221
+
222
+ **Known execution constraints:**
223
+ - **detection_finding + `status_id = NEW`:** The executor errors or returns empty results. Workaround — omit `status_id` from the WITH filter and add it to GROUP BY instead:
224
+ ```
225
+ -- FAILS: status_id as a filter
226
+ SUMMARIZE COUNT detection_finding.message WITH detection_finding.status_id = NEW AFTER 24h
227
+
228
+ -- WORKS: status_id as a GROUP BY dimension
229
+ SUMMARIZE COUNT detection_finding.message GROUP BY detection_finding.message, detection_finding.status_id AFTER 24h
230
+ -- Read the status_id column to identify which rows are NEW vs RESOLVED
231
+ ```
232
+ - **`FROM` not supported:** SUMMARIZE always queries all connectors. For connector-specific analysis, use QUERY with FROM.
233
+ - **High-cardinality GROUP BY overflows:** Grouping by severity_id, status_id, or hostname is safe (low cardinality, under 100 values). Grouping by IPs, hashes, or usernames must always be scoped with a WITH filter first — an unfiltered GROUP BY on network_activity.src_endpoint.ip overflowed at 3.1M characters in testing.
234
+ - **email_activity, file_activity:** SUMMARIZE execution fails on these event types. Use QUERY.
235
+ - **Fallback rule:** If SUMMARIZE returns an empty `{}` or a "No data were processed" error, fall back to QUERY with field selectors. Do not document a data gap based on empty SUMMARIZE results.
236
+ - **Check the environment profile** (`summarize_support` section) before writing SUMMARIZE queries. If the event type is `not_working`, skip SUMMARIZE. If `partial`, check `known_issues`. If `untested`, try SUMMARIZE and update the profile with the result.
237
+
238
+ #### Layer 2 — Category with Key Fields
239
+
240
+ Query by OCSF category when investigating a class of activity rather than a specific IOC:
241
+
242
+ ```
243
+ QUERY #network.src_endpoint.ip, #network.dst_endpoint.ip, #network.message, #network.time
244
+ WITH #network.src_endpoint.ip = '10.0.0.1' AFTER 48h
245
+ ```
246
+
247
+ #### Layer 3 — Full Event (scoped, rare)
248
+
249
+ Use `**` only when scoped to a single host AND a single event type AND a narrow time window:
250
+
251
+ ```
252
+ QUERY process_activity.** WITH process_activity.device.hostname = 'BD-2578' AFTER 24h
253
+ ```
254
+
255
+ **Never** use `**` on broad observable searches or multi-host queries.
256
+
257
+ ### Follow-Up Queries
258
+
259
+ When initial results reveal IOCs, entities, or patterns:
260
+ - Extract IOCs (IPs, hashes, domains, usernames) from results
261
+ - Immediately author follow-up queries to search for those IOCs across the mesh
262
+ - Continue until the investigation picture is complete or no new leads emerge
263
+
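+ As a sketch of the pivot step, suppose an earlier query surfaced a suspicious destination IP (the address and hostname below are hypothetical placeholders, not real indicators). A follow-up using the Layer 2 category syntax could look like:
+
+ ```
+ -- Hypothetical pivot: search a previously observed IP across all network telemetry
+ QUERY #network.src_endpoint.ip, #network.dst_endpoint.ip, #network.message, #network.time
+ WITH #network.dst_endpoint.ip = '203.0.113.50' AFTER 7d
+
+ -- Then check whether a host that contacted it also raised findings
+ QUERY detection_finding.message, detection_finding.status_id, detection_finding.device.hostname
+ WITH detection_finding.device.hostname = 'BD-2578' AFTER 7d
+ ```
+
+ Each pivot should go through the same validate-before-execute loop as the initial query.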
264
+ ### Time Range Strategy
265
+
266
+ - Start with the alert's time window (usually `AFTER 24h`)
267
+ - Expand to `AFTER 7d` if looking for patterns or persistence
268
+ - Use `AFTER 30d` for threat hunting or campaign correlation
269
+ - Narrow with `BEFORE` and `AFTER` for precise timeline reconstruction
270
+
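+ As an illustration of timeline narrowing (the exact `BEFORE` syntax is an assumption here, mirroring `AFTER`'s relative-time form; confirm with `Validate_FSQL_Query` before executing):
+
+ ```
+ -- Hypothetical narrowed window: events within the last 7 days but older than 1 day
+ QUERY process_activity.message, process_activity.time
+ WITH process_activity.device.hostname = 'BD-2578' AFTER 7d BEFORE 1d
+ ```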
271
+ ### Status Awareness on Lookback Queries
272
+
273
+ **When expanding time ranges beyond the initial 24h window on findings queries (`detection_finding`, `security_finding`, `vulnerability_finding`, `incident_finding`), always include `status_id` in your field selectors.** Historical findings may have been resolved as benign. If you build an investigation narrative on RESOLVED/Benign alerts, you will produce a false escalation.
274
+
275
+ Note: `authentication.status_id` means auth result (SUCCESS/FAILURE) — a different concept. Telemetry events (`network_activity`, `process_activity`, `file_activity`) don't have an alert lifecycle `status_id`. This rule applies to findings events only.
276
+
277
+ ```
278
+ -- WRONG: no status_id — treats resolved alerts as active threats
279
+ QUERY detection_finding.message, detection_finding.severity_id
280
+ WITH detection_finding.device.hostname = 'BD-2578' AFTER 30d
281
+
282
+ -- RIGHT: includes status_id so you can separate active from resolved
283
+ QUERY detection_finding.message, detection_finding.severity_id, detection_finding.status_id
284
+ WITH detection_finding.device.hostname = 'BD-2578' AFTER 30d
285
+ ```
286
+
287
+ When analyzing results, separate findings by status:
288
+ - `status_id = NEW` → actionable, include in investigation
289
+ - `status_id = RESOLVED` + `status_detail = "Benign"` → already closed, do NOT cite as evidence of active compromise
290
+ - `status_id = null` → unknown, investigate cautiously
291
+
292
+ ### Status Detail Interpretation
293
+
294
+ When analyzing `status_id` results, also check `status_detail`. Common patterns:
295
+ - `status_detail = "Benign"` with `status_id = RESOLVED` — vendor investigated and closed as false positive
296
+ - `status_detail = "UnsupportedAlertType"` with `status_id = NEW` — the integration cannot auto-resolve this alert type. `NEW` may reflect an integration gap, not a genuine untriaged alert. Flag this ambiguity in findings.
297
+
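+ To see which dispositions dominate before reading individual alerts, a status breakdown can be sketched with SUMMARIZE (this assumes `status_detail` is groupable like `status_id`; validate first, and note the detection_finding filter constraints listed above):
+
+ ```
+ -- Distribution of lifecycle status and vendor disposition across recent findings
+ SUMMARIZE COUNT detection_finding.message
+ GROUP BY detection_finding.status_id, detection_finding.status_detail
+ AFTER 7d
+ ```
+
+ Rows pairing RESOLVED with "Benign" are closed false positives; rows pairing NEW with "UnsupportedAlertType" carry the integration-gap ambiguity described above.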
298
+ ## Output Format
299
+
300
+ **Return results to the calling skill and continue. Do not present to the user or wait for input — the calling skill determines next steps.**
301
+
302
+ When returning results to the calling skill, always include:
303
+ 1. **The FSQL query executed** (exact text)
304
+ 2. **Validation status** (passed EXPLAIN GRAPHQL)
305
+ 3. **Result summary** (count, key fields, notable findings)
306
+ 4. **Raw results** (structured data for further analysis)
307
+ 5. **Suggested follow-up queries** (if results suggest additional investigation paths)
308
+ 6. **Data completeness flags** — If expected fields are empty (e.g., all file hash fields are null on malware detection findings), flag this in the summary: "WARNING: [field] is empty across all results — [impact on investigation]"