@josephyan/qingflow-app-user-mcp 0.2.0-beta.2 → 0.2.0-beta.21
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +12 -2
- package/npm/lib/runtime.mjs +37 -0
- package/npm/scripts/postinstall.mjs +5 -1
- package/package.json +3 -2
- package/pyproject.toml +1 -1
- package/skills/qingflow-app-user/SKILL.md +230 -0
- package/skills/qingflow-app-user/agents/openai.yaml +4 -0
- package/skills/qingflow-app-user/references/data-gotchas.md +49 -0
- package/skills/qingflow-app-user/references/environments.md +63 -0
- package/skills/qingflow-app-user/references/record-patterns.md +110 -0
- package/skills/qingflow-app-user/references/workflow-usage.md +26 -0
- package/skills/qingflow-record-analysis/SKILL.md +253 -0
- package/skills/qingflow-record-analysis/agents/openai.yaml +4 -0
- package/skills/qingflow-record-analysis/references/analysis-gotchas.md +141 -0
- package/skills/qingflow-record-analysis/references/analysis-patterns.md +113 -0
- package/skills/qingflow-record-analysis/references/confidence-reporting.md +92 -0
- package/src/qingflow_mcp/__init__.py +1 -1
- package/src/qingflow_mcp/builder_facade/models.py +294 -1
- package/src/qingflow_mcp/builder_facade/service.py +2727 -235
- package/src/qingflow_mcp/server.py +7 -5
- package/src/qingflow_mcp/server_app_builder.py +80 -4
- package/src/qingflow_mcp/server_app_user.py +8 -182
- package/src/qingflow_mcp/solution/compiler/form_compiler.py +1 -1
- package/src/qingflow_mcp/solution/compiler/workflow_compiler.py +21 -2
- package/src/qingflow_mcp/solution/executor.py +34 -7
- package/src/qingflow_mcp/tools/ai_builder_tools.py +1038 -30
- package/src/qingflow_mcp/tools/app_tools.py +1 -2
- package/src/qingflow_mcp/tools/approval_tools.py +357 -75
- package/src/qingflow_mcp/tools/directory_tools.py +158 -28
- package/src/qingflow_mcp/tools/record_tools.py +1954 -973
- package/src/qingflow_mcp/tools/task_tools.py +376 -225
- package/src/qingflow_mcp/tools/workflow_tools.py +78 -4
@@ -0,0 +1,253 @@
+---
+name: qingflow-record-analysis
+description: Analyze Qingflow record data safely after the MCP is already connected and authenticated. Use when the user wants grouped distributions, ratios, averages, rankings, trends, insights, or any final statistical conclusion across an existing app's data. Do not use this skill for schema changes, app design, or ordinary record CRUD unless they are strictly supporting an analysis flow.
+metadata:
+  short-description: Analyze Qingflow record data with schema-first DSL execution
+---
+
+# Qingflow Record Analysis
+
+## Overview
+
+This skill is for record analysis inside existing Qingflow apps. Use it when the task is about `分析 / 洞察 / 分布 / 占比 / 平均 / 排名 / 趋势 / 所有 / 全部 / 全国 / 高价值` or any final statistical conclusion.
+
+This skill assumes the MCP is already connected and authenticated. If not, switch to `$qingflow-mcp-setup` first. If the task is about creating, updating, deleting, or approving records rather than analyzing them, switch back to `$qingflow-app-user`.
+
+Before running analysis in `prod`, confirm the intended environment and compare `request_route` with the browser route if browser parity matters.
+
+## Tool Scope
+
+Use these tools as the core analysis surface:
+
+- `record_schema_get`
+- `record_analyze`
+
+Use `record_list` or `record_get` only when you need sample rows or a specific supporting example after the main analysis path.
+
+## Hard Rules
+
+- Analysis tasks must start with `record_schema_get`
+- Build one or more small DSLs, then run `record_analyze` separately for each question
+- DSL field references must use `field_id` only
+- Normalize relative time phrases into explicit legal date ranges before building the DSL
+- If the user asks for `最近一个完整自然月 / 上个月 / 最近30天 / 本季度 / 去年同期`, first convert that phrase into concrete dates, then verify the dates are legal before calling MCP
+- Never send impossible dates such as `2026-02-29`; if the intended month is February 2026, the legal upper bound is `2026-02-28`
+- If the schema still leaves multiple plausible fields, stop and ask the user to confirm from a short candidate list instead of guessing
+- Do not keep retrying different guessed field names in a loop
+- `record_list` is never the basis for a final statistical conclusion
+- If `record_list` is capped or paged, treat it as sample-only evidence
+- Do not mix full totals from `record_analyze` with sample-only list observations as one combined `全量结论`
+- Do not manually tune paging or scan-budget parameters for analysis; `record_analyze` hides them
+- For final conclusions, prefer `strict_full=true`
+- Before choosing a DSL shape, first decide whether the question needs `count`, `sum`, `avg`, `distinct_count`, `ratio`, or `ranking`
+- Do not guess a metric just because the user said `数量`, `单量`, `人数`, or `金额`
+- If one business question depends on multiple metrics, split it into smaller structured questions and build multiple focused DSLs
+- Penetration / conversion / share-of-total conclusions (`渗透率 / 转化率 / 占比`) must define the numerator and the denominator first
+- Do not claim a metric you did not query
+- Derived ratios must be computed outside the DSL after trusted numerator and denominator queries complete; do not invent `div`, `formula`, or expression metrics inside `record_analyze`
+- If the requested business question requires unsupported derived math, split it into multiple DSLs and compute the final ratio only in the reasoning layer after the source metrics are confirmed
+- If the user asks for multiple conclusions and only part of them is completed reliably, explicitly disclose which parts are complete and which parts remain unresolved
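The derived-ratio rules in the hunk above can be sketched as a reasoning-layer helper. This is an illustration only; the dict shapes (`value`, `safe_for_final_conclusion`) are hypothetical stand-ins for whatever the real `record_analyze` response carries:

```python
# Sketch: compute a derived ratio outside the DSL, only from trusted sources.
# The dict shapes below are hypothetical, not the exact MCP payload.

def safe_ratio(numerator: dict, denominator: dict) -> float:
    """Return numerator/denominator only when both source queries are complete."""
    for side, result in (("numerator", numerator), ("denominator", denominator)):
        if not result.get("safe_for_final_conclusion"):
            raise ValueError(f"{side} is incomplete: do not report a ratio")
    if denominator["value"] == 0:
        raise ValueError("denominator is zero: ratio undefined")
    return numerator["value"] / denominator["value"]

# Conversion rate = closed deals / all leads, each from its own DSL.
rate = safe_ratio(
    {"value": 135, "safe_for_final_conclusion": True},
    {"value": 540, "safe_for_final_conclusion": True},
)
print(f"{rate:.1%}")  # 25.0%
```

The point of the guard is that an incomplete source metric aborts the ratio entirely rather than producing a plausible-looking but untrustworthy percentage.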
+
+## Standard Operating Order
+
+For analysis:
+
+1. Confirm target app and environment
+2. Run `record_schema_get`
+3. Inspect fields, aliases, suggested dimensions, suggested metrics, and suggested time fields
+4. Generate one or more field_id-based DSLs
+5. Run `record_analyze` once per DSL
+6. Run `record_list` only if you still need sample rows, examples, or manual inspection
+7. Before answering, separate:
+   - `全量可信结论`
+   - `样本观察`
+   - `待验证假设`
+
+## Semantic Guardrails
+
+- If the user asks for penetration, conversion, share-of-total, win rate, non-standard ratio, or any `%` metric, first write down:
+  - numerator definition
+  - denominator definition
+  - whether each side needs its own DSL
+- If you cannot name the denominator from real schema fields and filters, do not use words like `渗透率`, `转化率`, `占比`, `比例`, or `%`
+- If a field is still ambiguous after `record_schema_get`, do not guess; either select one unique `field_id` from the schema or ask the user to confirm from a short candidate list
+- If a statement depends on `count`, query `count`
+- If a statement depends on total amount, query `sum`
+- If a statement depends on average level, query `avg` or derive it from trusted `sum + count`
+- If a statement depends on trend, query a time dimension with `bucket`
+- If a statement depends on a ratio that the DSL cannot express directly, run the numerator and denominator separately, then compute the ratio outside MCP only after both sides are complete and compatible
+- Rankings must come from structured sorted results, not from loose natural-language restatement
+- When grouped rows are truncated, describe them as `已返回分组中` or `主要分组`
+- If `presentation.rows_truncated=true` or `presentation.statement_scope=returned_groups_only`, do not use words like `各部门`, `所有分组`, `完整名单`, or `全部渠道`
+- If grouped rows are truncated, explicitly downgrade the wording to `前 N 个分组` or `主要分组`, never `全部`
+- Complex answers should default to `先结构、后解读` (structure first, interpretation second): present the table / metrics / ordering first, then add concise interpretation
+- Final wording should stay as close as possible to schema titles, dimension aliases, and metric aliases; do not rename the business object or field title unless the user asked for a rewrite
+
+## DSL Contract
+
+Use `record_schema_get` as the source of truth for every DSL field reference:
+
+- Use `fields[].field_id` in `dimensions[].field_id`, `metrics[].field_id`, and `filters[].field_id`
+- Treat `suggested_dimensions`, `suggested_metrics`, and `suggested_time_fields` as hints, not as executable DSL by themselves
+- Do not pass field titles, aliases, or guessed ids where `field_id` is required
+
+The `record_analyze` call should be built from this argument shape:
+
+```json
+{
+  "app_key": "APP_1",
+  "dimensions": [],
+  "metrics": [],
+  "filters": [],
+  "sort": [],
+  "limit": 50,
+  "strict_full": true,
+  "view_key": null,
+  "view_name": null,
+  "output_profile": "normal"
+}
+```
+
+Top-level argument rules:
+
+- `app_key`: required. The target Qingflow app.
+- `dimensions`: required list. Use `[]` for a whole-table summary. Use one item per grouping dimension for grouped analysis.
+- `metrics`: optional list. If omitted or empty, `record_analyze` defaults to a single `count` metric.
+- `filters`: optional list. Filters restrict the analyzed dataset before results are interpreted.
+- `sort`: optional list. Sorting applies to result rows, not raw source rows.
+- `limit`: positive integer. It only limits returned result rows; it does not reduce the internal scan scope.
+- `strict_full`: boolean. Prefer `true` for final conclusions. If `true`, incomplete scans return an error; if `false`, incomplete scans return partial results.
+- `view_key` / `view_name`: optional. Use a view to narrow scope before analysis. Prefer `view_key` when both are available.
+- `output_profile`: `normal` or `verbose`. Prefer `normal` unless you are debugging completeness or route issues.
+
+Item contracts:
+
+- `dimensions` item:
+  - shape: `{ "field_id": 2, "alias": "状态", "bucket": null }`
+  - `field_id`: required integer from `record_schema_get`
+  - `alias`: optional but recommended; if omitted, the field title becomes the alias
+  - `bucket`: optional; allowed values are `day`, `week`, `month`, `quarter`, `year`, or omitted / `null`
+  - `bucket` may only be used on fields from `suggested_time_fields`
+- `metrics` item:
+  - shape: `{ "op": "sum", "field_id": 7, "alias": "总金额" }`
+  - `op`: one of `count`, `sum`, `avg`, `min`, `max`, `distinct_count`
+  - `field_id`: required for `sum`, `avg`, `min`, `max`, `distinct_count`; do not pass it for `count`
+  - `alias`: optional but strongly recommended because `sort.by` must reference aliases
+- `filters` item:
+  - shape: `{ "field_id": 2, "op": "eq", "value": "进行中" }`
+  - `field_id`: required integer from `record_schema_get`
+  - `op`: optional; defaults to `eq`
+  - supported ops: `eq`, `neq`, `in`, `not_in`, `gt`, `gte`, `lt`, `lte`, `between`, `contains`, `is_null`, `not_null`
+  - value rules:
+    - `eq`, `neq`, `gt`, `gte`, `lt`, `lte`, `contains`: pass a single scalar value
+    - `in`, `not_in`: pass an array
+    - `between`: pass a two-item array like `[min, max]`
+    - `is_null`, `not_null`: omit `value`
+- `sort` item:
+  - shape: `{ "by": "记录数", "order": "desc" }`
+  - `by`: required and must reference an alias already defined in `dimensions` or `metrics`
+  - `order`: optional; use `asc` or `desc`; default is `asc`
+  - do not sort by raw field title or `field_id`
+
+Practical rules:
+
+- Keep one DSL focused on one question. Prefer multiple small DSLs over one overloaded request.
+- Always set explicit aliases for metrics you may sort by, compare, or quote in the final answer.
+- For trend analysis, use one time dimension with `bucket`, then sort by that time alias ascending.
+- For cross analysis, use multiple `dimensions` and a small set of metrics.
+- Do not attempt formulas, joins, having clauses, cohort analysis, or manual paging controls in this DSL.
+- Do not pass unsupported keys such as `formula`, `expr`, `numerator`, `denominator`, `left`, `right`, or `operator` inside metric items.
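A lightweight pre-flight check can enforce these item contracts before any call is made. This is a sketch of a hypothetical local helper, not an MCP API:

```python
# Sketch: pre-flight validation of a record_analyze DSL per the contract above.
ALLOWED_OPS = {"count", "sum", "avg", "min", "max", "distinct_count"}
FORBIDDEN_METRIC_KEYS = {"formula", "expr", "numerator", "denominator",
                         "left", "right", "operator"}
ALLOWED_BUCKETS = {None, "day", "week", "month", "quarter", "year"}

def validate_dsl(dsl: dict) -> list[str]:
    """Return a list of contract violations; empty means the DSL passes."""
    errors = []
    for dim in dsl.get("dimensions", []):
        if not isinstance(dim.get("field_id"), int):
            errors.append("dimension items need an integer field_id")
        if dim.get("bucket") not in ALLOWED_BUCKETS:
            errors.append(f"unsupported bucket: {dim.get('bucket')!r}")
    for metric in dsl.get("metrics", []):
        if metric.get("op") not in ALLOWED_OPS:
            errors.append(f"unsupported metric op: {metric.get('op')!r}")
        if metric.get("op") == "count" and "field_id" in metric:
            errors.append("count must not carry a field_id")
        if FORBIDDEN_METRIC_KEYS & metric.keys():
            errors.append("derived-math keys are not supported in metrics")
    aliases = ({d.get("alias") for d in dsl.get("dimensions", [])} |
               {m.get("alias") for m in dsl.get("metrics", [])})
    for item in dsl.get("sort", []):
        if item.get("by") not in aliases:
            errors.append(f"sort.by must be a defined alias: {item.get('by')!r}")
    return errors
```

For example, the single-dimension distribution template in this skill validates cleanly, while a `{"op": "div", ...}` metric is rejected before the call ever leaves the reasoning layer.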
+
+## Minimal DSL Templates
+
+Summary:
+
+```json
+{
+  "dimensions": [],
+  "metrics": [
+    { "op": "count", "alias": "记录数" }
+  ],
+  "filters": [],
+  "sort": [],
+  "limit": 1,
+  "strict_full": true
+}
+```
+
+Single-dimension distribution:
+
+```json
+{
+  "dimensions": [
+    { "field_id": 2, "alias": "状态" }
+  ],
+  "metrics": [
+    { "op": "count", "alias": "记录数" }
+  ],
+  "filters": [],
+  "sort": [
+    { "by": "记录数", "order": "desc" }
+  ],
+  "limit": 50,
+  "strict_full": true
+}
+```
+
+Time trend:
+
+```json
+{
+  "dimensions": [
+    { "field_id": 3, "alias": "月份", "bucket": "month" }
+  ],
+  "metrics": [
+    { "op": "count", "alias": "记录数" }
+  ],
+  "filters": [],
+  "sort": [
+    { "by": "月份", "order": "asc" }
+  ],
+  "limit": 24,
+  "strict_full": true
+}
+```
+
+Two-dimensional cross analysis:
+
+```json
+{
+  "dimensions": [
+    { "field_id": 2, "alias": "状态" },
+    { "field_id": 5, "alias": "负责人" }
+  ],
+  "metrics": [
+    { "op": "count", "alias": "记录数" },
+    { "op": "sum", "field_id": 7, "alias": "总金额" }
+  ],
+  "filters": [],
+  "sort": [
+    { "by": "记录数", "order": "desc" }
+  ],
+  "limit": 100,
+  "strict_full": true
+}
+```
+
+## Output Gate
+
+- Only write `全量可信结论` when the supporting `record_analyze` calls report `completeness.status=complete` and `safe_for_final_conclusion=true`
+- If any key analysis call is incomplete, downgrade the answer to `初步观察` or `部分结果`
+- Treat `safe_for_final_conclusion=true` as necessary but not sufficient when the metric definition is incomplete or grouped rows are truncated
+- If `presentation.statement_scope=returned_groups_only`, you may still give full-population conclusions about totals or ratios, but not a full grouped enumeration claim
+- If aggregate-style output is full but list evidence is sample-only, split the answer into:
+  - `全量可信结论`
+  - `样本观察(不作为最终结论)`
+  - optional `待验证假设`
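As a decision rule, the gate above collapses to a small function. The field names (`completeness.status`, `safe_for_final_conclusion`, `presentation.*`) are the ones this section already relies on; the helper itself is hypothetical:

```python
# Sketch: pick the strongest wording an analyze result can support.
def conclusion_level(result: dict) -> str:
    completeness = result.get("completeness", {})
    presentation = result.get("presentation", {})
    # Incomplete scan: downgrade, never a full-population conclusion.
    if (completeness.get("status") != "complete"
            or not result.get("safe_for_final_conclusion")):
        return "部分结果"
    # Complete totals, but truncated groups: no full enumeration claims.
    if (presentation.get("rows_truncated")
            or presentation.get("statement_scope") == "returned_groups_only"):
        return "全量结论(仅限已返回分组)"
    return "全量可信结论"

full = {"completeness": {"status": "complete"},
        "safe_for_final_conclusion": True,
        "presentation": {"rows_truncated": False}}
print(conclusion_level(full))  # 全量可信结论
```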
+
+## Resources
+
+- Analysis patterns: [references/analysis-patterns.md](references/analysis-patterns.md)
+- Confidence reporting: [references/confidence-reporting.md](references/confidence-reporting.md)
+- Analysis gotchas: [references/analysis-gotchas.md](references/analysis-gotchas.md)
+- Shared environment guidance: [/Users/yanqidong/Documents/qingflow-next/.codex/skills/qingflow-app-user/references/environments.md](/Users/yanqidong/Documents/qingflow-next/.codex/skills/qingflow-app-user/references/environments.md)
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Qingflow Record Analysis"
+  short_description: "Analyze Qingflow record data with schema-first DSL execution"
+  default_prompt: "Use $qingflow-record-analysis for grouped distributions, ratios, rankings, trends, and final statistical conclusions in Qingflow apps. Start with record_schema_get, build one or more field_id-based DSLs, then run record_analyze. Treat record_list as sample-only when capped or paged, and separate full conclusions from sample observations."
@@ -0,0 +1,141 @@
+# Analysis Gotchas
+
+## Do not skip schema
+
+If the task is analysis-style and you jump straight to `record_list` or `record_analyze`, you are already off the stable path.
+
+Correct recovery:
+
+1. `record_schema_get`
+2. inspect the schema and choose fields
+3. build one or more small DSLs
+4. run `record_analyze`
+
+## Normalize relative time phrases before building the DSL
+
+Examples:
+
+- `最近一个完整自然月` (most recent full calendar month) -> convert to an explicit full-month date range
+- `上个月` (last month) -> convert to a concrete month range
+- `最近30天` (last 30 days) -> convert to exact start/end dates
+
+Do not pass vague time phrases or impossible dates into MCP.
+
+## Do not treat 200-row list output as full data
+
+`record_list` can hit:
+
+- `row_cap=200`
+- `row_cap_hit=true`
+- `sample_only=true`
+
+When this happens, it is sample-only evidence.
+
+It is not acceptable to use that result alone for:
+
+- 平均值 (averages)
+- 占比 (shares)
+- 排名 (rankings)
+- 趋势 (trends)
+- 地域分布 (regional distribution)
+- any business insight framed as “基于全部数据” (based on all the data)
+
+## Do not mix full analyze totals with sample rows
+
+If `record_analyze` gives full-population coverage but list rows are capped, do not merge them into one final statement.
+
+Split them into:
+
+- `全量可信结论`
+- `样本观察`
+
+## Do not present truncated grouped rows as a full grouped list
+
+If `presentation.rows_truncated=true` or `presentation.statement_scope=returned_groups_only`:
+
+- do not say `各部门` (every department)
+- do not say `所有分组` (all groups)
+- do not say `完整名单` (the complete list)
+
+Correct recovery:
+
+- do not describe the answer as complete grouped coverage
+- keep the wording inside the returned group scope
+
+## Do not guess fields under ambiguity
+
+If the field is uncertain:
+
+- do not bounce across tools
+- do not guess ids
+- do not switch from one read tool to another by trial and error
+- do not keep retrying different guessed field names in a loop
+
+Correct recovery:
+
+1. `record_schema_get`
+2. if several plausible candidates remain, ask the user to confirm from a short list
+3. build the DSL only after the field is clear
+
+Examples of the right recovery question:
+
+- “我找到两个可能的字段:`线索来源`、`来源渠道`。你要按哪个字段统计?” ("I found two possible fields: `线索来源`, `来源渠道`. Which one should the count use?")
+- “目前最像‘来源’的字段有这三个:`来源`、`来源渠道`、`获客来源`。请确认你要按哪个字段分析。” ("The three fields closest to 'source' are `来源`, `来源渠道`, and `获客来源`. Please confirm which one to analyze by.")
+
+## Do not try to control paging manually
+
+`record_analyze` hides paging and scan budget on purpose.
+
+- Do not invent `page_size`
+- Do not invent `requested_pages`
+- Do not invent `scan_max_pages`
+- Do not invent `auto_expand_pages`
+
+When the result is incomplete:
+
+1. narrow the scope with views or filters
+2. reduce the analysis problem into smaller DSLs
+3. keep the answer at `初步观察` or `部分结果` if completeness is still not enough
+
+## Do not guess metric semantics from loose business wording
+
+Before building the DSL, first decide whether the question needs:
+
+- `count`
+- `sum`
+- `avg`
+- `distinct_count`
+- a ratio with numerator + denominator
+- a sorted ranking result
+
+Do not jump straight from words like `数量`, `人数`, `单量`, or `金额` to one assumed metric.
+
+## Do not hide partial completion
+
+If the user asked for several outputs and only part of them is stable:
+
+- say which parts are complete
+- say which parts are still unresolved
+- do not present the answer as fully finished
+
+## Do not send unsupported formula or div-style metrics into `record_analyze`
+
+Examples to avoid:
+
+- `{"op":"div", ...}`
+- metric items with `formula`, `expr`, `numerator`, or `denominator`
+
+Correct recovery:
+
+1. query the source metrics with separate DSLs
+2. confirm both sides are complete and compatible
+3. compute the derived ratio outside MCP in the reasoning layer
+
+## Do not call something a ratio without the denominator
+
+If the user asks for penetration / conversion / 占比:
+
+1. define the numerator
+2. define the denominator
+3. query both sides explicitly
+4. only then compute and report the ratio
@@ -0,0 +1,113 @@
+# Analysis Patterns
+
+## When to use this skill
+
+Use this skill when the user asks for:
+
+- 分布 (distributions)
+- 占比 (shares / ratios)
+- 平均值 (averages)
+- 排名 / top-N (rankings)
+- 趋势 (trends)
+- 洞察 (insights)
+- 最终统计结论 (final statistical conclusions)
+- a business summary over the full dataset (全量范围内)
+
+## Canonical analysis sequence
+
+1. `record_schema_get`
+2. decide whether the question needs `count`, `sum`, `avg`, `distinct_count`, `ratio`, or `ranking`
+3. build one or more field_id-based DSLs
+4. `record_analyze`
+5. `record_list` only for sample inspection
+
+## Distribution / ratio pattern
+
+1. Run `record_schema_get`
+2. Inspect candidate fields and aliases
+3. If several plausible candidates remain, stop and ask the user to confirm the field from a short list
+4. Build a DSL with:
+   - one dimension
+   - `count`
+   - sort by the count alias
+5. Run `record_analyze`
+6. Report:
+   - `scanned_count`
+   - `safe_for_final_conclusion`
+   - `presentation.statement_scope`
+   - `completeness.local_filtering_applied` when it affects how the result should be framed
+7. If grouped rows are truncated, describe the answer as `主要分组` or `已返回分组中`, not `各部门` or `全部`
+
+## Penetration / conversion / share-of-total pattern
+
+1. Run `record_schema_get`
+2. Write down the business definition in plain language:
+   - numerator
+   - denominator
+   - grouping dimension, if any
+3. Build separate DSLs when the numerator and denominator are not the same filtered population
+4. Query the numerator first
+5. Query the denominator second
+6. Only compute the ratio outside MCP after both source results are complete and use compatible scopes
+7. If the denominator is missing, do not call the output `渗透率`, `转化率`, `占比`, or `%`
+
+## Average / ranking pattern
+
+1. Run `record_schema_get`
+2. Choose one dimension field and one numeric metric field
+3. Build a DSL with:
+   - `dimensions=[...]`
+   - `metrics=[count,sum]` or `metrics=[count,avg,min,max]`
+4. Run `record_analyze`
+5. If the answer uses ranking language, make the ranking come from structured sorted results
+6. Use list mode only to inspect examples after the aggregate result is understood
+
+## Trend pattern
+
+1. Run `record_schema_get`
+2. Choose a date/time field from `suggested_time_fields`
+3. Build a DSL with `bucket=day|week|month|quarter|year`
+4. Run `record_analyze`
+5. Treat the result as final only if `safe_for_final_conclusion=true`
+6. If the user asked for a relative time phrase such as `最近一个完整自然月`, translate it into an explicit legal date range before building the DSL
+
+## Sample inspection pattern
+
+Only use `record_list` after schema/analyze when you need:
+
+- example rows
+- spot checks
+- representative samples
+- manual inspection of records behind an aggregate bucket
+
+Never use list mode alone to justify final averages, shares, rankings, or regional distribution claims.
+
+## Statement-to-query discipline
+
+- If you want to say `单量低` or `volume low`, query `count`
+- If you want to say `金额高` (high amount), query `sum`
+- If you want to say `客单价高` (high average order value), query `avg` or trusted `sum + count`
+- If you want to say `增长` or `下降` (growth or decline), query a time bucket
+- If you want to say `渗透率` or `占比`, query both numerator and denominator
+- If you want to say `各部门` / `全部渠道` / `完整名单`, make sure `presentation.statement_scope=full_population` and `presentation.rows_truncated=false`
+- If you want to say `Top N` or `排名`, make sure the result is explicitly sorted and the conclusion follows that returned order
+- If the task is complex, default to `先结构、后解读` (structure first, interpretation second)
+
+## Ambiguous field recovery
+
+If the user asks for something like “来源分布” (source distribution) or “类型占比” (type share) and the exact field is unclear:
+
+1. run `record_schema_get`
+2. inspect titles, aliases, and suggested fields
+3. if one candidate is clearly dominant, proceed
+4. if multiple candidates are still plausible, ask the user to confirm which field they want
+
+Do not keep retrying different guessed field names in a loop.
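One schema pass is enough to build the candidate shortlist for that confirmation question. A sketch with a hypothetical schema shape (`title` and `aliases` keys are assumed for illustration):

```python
# Sketch: collect candidate fields for an ambiguous term such as "来源".
def field_candidates(schema_fields: list[dict], term: str) -> list[tuple[int, str]]:
    """Return (field_id, title) pairs whose title or alias mentions the term."""
    hits = []
    for field in schema_fields:
        names = [field.get("title", ""), *field.get("aliases", [])]
        if any(term in name for name in names):
            hits.append((field["field_id"], field["title"]))
    return hits

fields = [
    {"field_id": 4, "title": "线索来源", "aliases": []},
    {"field_id": 9, "title": "来源渠道", "aliases": ["渠道"]},
    {"field_id": 11, "title": "金额", "aliases": []},
]
# Two plausible hits -> stop and ask the user to confirm, per the rule above.
print(field_candidates(fields, "来源"))  # [(4, '线索来源'), (9, '来源渠道')]
```

A single hit can proceed directly; anything else becomes the short list shown to the user.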
+
+## Partial completion discipline
+
+If the user asked for several conclusions and only some of them are fully supported:
+
+1. state which parts are complete
+2. state which parts are still unresolved
+3. do not present the answer as fully complete
@@ -0,0 +1,92 @@
+# Confidence Reporting
+
+## Required output structure
+
+When analysis is intended as a final answer, use this order:
+
+1. `全量可信结论` (full-population trusted conclusions)
+2. `样本观察` (sample observations)
+3. `待验证假设` (hypotheses to verify)
+
+## Full conclusion gate
+
+Only write `全量可信结论` when:
+
+- `record_schema_get` was used
+- the analysis path used one or more `record_analyze` calls
+- every key analysis result has `safe_for_final_conclusion=true`
+- `safe_for_final_conclusion=true` is treated as necessary but not sufficient
+- no key result depends on an invalid time phrase, an undefined denominator, or an unsupported derived metric
+- the result is not just a capped list sample
+
+## Sample observation gate
+
+Put evidence into `样本观察` when:
+
+- it came from `record_list`
+- the tool reports `row_cap_hit`
+- the tool reports `sample_only`
+- the result is compact/capped and not complete
+
+## Downgrade rule
+
+If `record_schema_get` was not used for an analysis task, downgrade the overall framing to `初步观察` instead of `洞察` or `结论`.
+
+## Anti-mixing rule
+
+Do not combine:
+
+- full totals from `record_analyze`
+- sample-only details from `record_list`
+
+into one sentence like “基于全部数据分析...” ("based on all the data...").
+
+Instead:
+
+- full totals and distributions go into `全量可信结论`
+- illustrative rows go into `样本观察`
+
+## Semantic gate
+
+Even when `safe_for_final_conclusion=true`, do not overstate the answer if:
+
+- the metric definition is incomplete
+- the denominator was not queried
+- the conclusion mentions trend but no time bucket was queried
+- the conclusion mentions 单量/volume but no `count` metric was queried
+- the conclusion depends on a derived metric the DSL cannot natively express
+- `presentation.statement_scope=returned_groups_only`
+- `presentation.rows_truncated=true`
+
+## Grouped enumeration gate
+
+If grouped rows were truncated:
+
+- do not call the grouped output `各部门`, `全部渠道`, `完整名单`, or `所有分组`
+- use `已返回分组中`, `主要分组`, or `前 N 个分组`
+- keep full-population statements only for metrics that still cover the full analyzed population
+
+## Partial completion disclosure
+
+If the user asked for multiple conclusions but only some are complete:
+
+- explicitly disclose which parts are complete
+- explicitly disclose which parts are not yet complete
+- do not collapse the answer into one all-clear conclusion
+
+## Example skeleton
+
+### 全量可信结论
+
+- `scanned_count=1134`
+- `safe_for_final_conclusion=true`
+- 这里写最终业务结论 (the final business conclusion goes here)
+
+### 样本观察
+
+- 以下来自样本明细浏览,不作为最终统计结论 (the following comes from sample-row browsing and is not a final statistical conclusion)
+- 这里写代表性样本现象 (representative sample observations go here)
+
+### 待验证假设
+
+- 这里写还需要进一步验证的推测 (hypotheses that still need verification go here)