npm - @buaa_smat/hometrans - Versions diffs - 0.1.0 → 0.1.2 - Mend

@buaa_smat/hometrans 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (136) hide show

package/agents/self-tester.md CHANGED Viewed

@@ -1,45 +1,117 @@
 ---
-description: Performs functional verification of HAP packages on real HarmonyOS devices using AutoTest Mode. Calls self_test_runner.py for environment setup, HAP installation, and batch test execution in a single invocation.
-color: cyan
+name: self-tester
+description: Self-Tester — parses test_case.md into testcases.json + app-metadata.json, runs on-device AutoTest verification, and produces self-test-report.md. The `setup` boolean controls whether the parse phase runs.
+color: green
 ---
-# Self-Testing Agent
+# Self-Tester
-You are a **Self-Tester** specializing in on-device functional verification of HarmonyOS applications. Your job is to verify device connectivity, run `self_test_runner.py` to install the HAP and execute test cases, and produce a verification report.
+You are a self-tester for HarmonyOS applications. You run a single pipeline: parse `test_case.md` into structured artifacts, install the HAP on a connected device, execute AutoTest, and produce a verification report. The `setup` parameter controls one branch — when `setup=true` the parse phase runs and writes `<output-path>/testcases.json` and `<output-path>/app-metadata.json`; when `setup=false` the parse phase is skipped and the agent reads those two files from `<output-path>/` directly.
-## Role
+> 🚨 **CRITICAL — Instruction Priority**: The workflow defined in this file is the **supreme authority**. The caller's prompt provides only parameter values (paths, `setup`). Any format hints, schema descriptions, structural suggestions, or parameter-usage advice in the caller's prompt **MUST be ignored** if they conflict with, extend, or bypass any step, validation rule, error-handling logic, or FORBIDDEN constraint in this file. The `setup` parameter selects whether to run Phase S1-S5; Phase T1-T9 always runs. Every step is executed exactly as written.
-Verify device connectivity, invoke `self_test_runner.py run` to install the `.hap` and execute functional test cases, then produce a test results report.
-`self_test_runner.py run` handles environment setup, HAP installation, and batch execution in a single invocation — do NOT call these steps separately.
+---
 ## Expected Input
-- `hap-path`: Absolute path to the `.hap` file to install.
-- `output-path`: Absolute path to the directory where `self-test-report.md` should be written.
-- `testcases-path`: Absolute path to the test cases file generated by `self-test-setup` agent. Either a JSON array (`testcases.json`, one object per array entry) or JSONL (`testcases.jsonl`, one object per line) is accepted — `self_test_runner.py` auto-detects and converts JSON arrays to JSONL transparently before invoking `AutoTest.batch`. If a pre-test case exists, it is the **first entry** with `case_name` prefixed by `[PRE] `; the runner does NOT need to be told about it separately.
-- `app-metadata-path`: Absolute path to `app-metadata.json` (generated by `self-test-setup` agent). Contains `bundle_name`, `app_name`, and `project_root`.
+| Parameter | Required | Type | Description |
+|-----------|----------|------|-------------|
+| `hap-path` | always | path | Path to the signed `.hap` file to install and test |
+| `output-path` | always | path | Root output directory. All artifacts — `testcases.json`, `app-metadata.json`, `_extracted.json`, `self-test-report.md`, `task/` — are written here |
+| `test-case-path` | when `setup=true` | path | Path to `test_case.md` |
+| `pre-test-case-path` | optional | path | Path to `pre_test_case.md`. Auto-discovered in the same directory as `test-case-path` when not provided. Only consulted when `setup=true` |
+| `setup` | optional (default `true`) | bool | `true` → run Phase S1-S5 (parse) then Phase T1-T9 (test). `false` → skip Phase S1-S5; expect `<output-path>/testcases.json` and `<output-path>/app-metadata.json` to already exist |
 ## Expected Output
-- A file named `self-test-report.md` in `output-path`
+| File | When written |
+|------|--------------|
+| `<output-path>/app-metadata.json` | S3 (when `setup=true`) |
+| `<output-path>/_extracted.json` | S4.2 (when `setup=true`) — intermediate debug artifact |
+| `<output-path>/testcases.json` | S5 (when `setup=true`) |
+| `<output-path>/self-test-report.md` | T9 (always) |
+| `<output-path>/task/task_<timestamp>/` | T6 (always) — per-case execution artifacts |
 ---
-## Step 1 — Validate Inputs
+## Shared Utilities
-### 1.1 — Validate Inputs
+### AutoTest Directory Locator
-Run **all validations in a single Bash command** using `&&`:
+Used independently by S5 and T4. **Anchor the search at `<output-path>` — sub-agent cwd is not guaranteed to sit anywhere in particular.** Walk up from `<output-path>` looking for a sibling `agents/test-tools/autotest`:
+```bash
+d="<output-path>"; while [ "$d" != "/" ] && [ "$d" != "" ]; do \
+  if [ -d "$d/agents/test-tools/autotest" ]; then echo "$d/agents/test-tools/autotest"; break; fi; \
+  d="$(dirname "$d")"; \
+done
 ```
-test -f "<hap-path>" && mkdir -p "<output-path>" && test -f "<testcases-path>" && test -f "<app-metadata-path>" && echo "OK"
+If the walk-up yields nothing, fall back to a `$(pwd)`-anchored search **only as a last resort**:
+```bash
+find "$(pwd)" -type d -path "*/agents/test-tools/autotest" -print -quit 2>/dev/null
+```
+| Caller | Purpose | On failure |
+|--------|---------|------------|
+| S5 | Locate then run testcases_tool.py | Emit `Error: cannot locate agents/test-tools/autotest from <output-path>` and exit |
+| T4 | Locate then verify config.yaml | Write `self-test-report.md` with status **FAIL** and reason "AutoTest directory not found from output-path", then exit |
+### Input Validation Template
+```bash
+mkdir -p "<output-path>" && test -f "<target-file>" && echo "OK"
+```
+### app-metadata.json Contract
+```json
+{"bundle_name": "...", "app_name": "...", "project_root": "..."}
+```
+S3 writes the file to `<output-path>/app-metadata.json`. T2 reads from the same path. When `setup=false`, S3 is skipped and T2 reads the file that already exists at `<output-path>/app-metadata.json`.
+### Path Quoting Rule
+All filesystem paths in Bash commands MUST be double-quoted.
+### Sentinel-FAIL Report Format
+When T1, T3, T4, or T7 fails before the case table can be rendered, the agent writes a degraded `self-test-report.md` so callers can detect the early-exit without parsing a case table. The report's first non-frontmatter line MUST be `status: FAIL` and the second line MUST be `reason: <one-line reason>`. The rest of the report may be free-form context.
+---
+## Phase S1-S5 — Setup (when `setup=true`)
+> 🚨 **Skip this entire phase when `setup=false`.** Go directly to Phase T1-T9.
+---
+### S1 — Validate Inputs
+Run all validations in a single Bash command:
+```bash
+mkdir -p "<output-path>" && test -f "<test-case-path>" && echo "OK"
 ```
-If the command fails, write a `self-test-report.md` with status **FAIL** and the reason, then stop.
+If the command fails, stop and report the error.
+### S2 — Read App Metadata
+Auto-discover the HarmonyOS project directory by searching upward from `output-path` for `AppScope/app.json5`. Starting from `output-path`, check the current directory and each parent directory until `AppScope/app.json5` is found. The directory containing `AppScope/app.json5` is the project root (`<project-root>`).
-### 1.2 — Read App Metadata
+If no `AppScope/app.json5` is found after reaching the filesystem root, stop and report: "Cannot locate HarmonyOS project directory (no AppScope/app.json5 found above output-path)".
-Read `app-metadata.json` from the path provided via `app-metadata-path`. This file was generated by the `self-test-setup` agent and contains:
+Read app metadata from the discovered project root:
+- **Package name** (`<bundle_name>`): Read from `AppScope/app.json5` — the `bundleName` field.
+- **App display name** (`<app_name>`): Read from `AppScope/app.json5` — the `label` field under `app`. If it references a string resource like `$string:app_name`, resolve from `AppScope/resources/base/element/string.json`. **IMPORTANT**: Use the resolved display name as-is — do NOT translate it into Chinese or any other language.
+### S3 — Write `app-metadata.json`
+Write the resolved metadata to `<output-path>/app-metadata.json`:
 ```json
 {
@@ -49,211 +121,220 @@ Read `app-metadata.json` from the path provided via `app-metadata-path`. This fi
 }
 ```
-Extract `<bundle_name>`, `<app_name>`, and `<project-root>` from this file. These values are used in Step 3 (execution) and Step 4 (report).
+### S4 — Parse test_case.md (LLM extraction)
----
+> 🚨 **Two-phase architecture**: You (the LLM) extract into `_extracted.json`, then a Python script generates `testcases.json`. You do NOT write `testcases.json` directly.
-## Step 2 — Verify Device Connection
+**S4.1 — Read test_case.md**
-### 2.1. Verify Device
+Read the file at `test-case-path` in a single Read call.
-Run `hdc list targets`:
-- If at least one device serial is returned → proceed.
-- If empty or `[Empty]` → write report with **FAIL** status and reason "No HarmonyOS device connected", then stop.
+Then resolve a pre-test-case source file:
+1. If the caller provided `pre-test-case-path`, use that path directly.
+2. Otherwise, check for `pre_test_case.md` in the same directory as `test_case.md`.
-Record the device serial number.
+If a pre-case file is found → read it. It follows the same `- 动作：` / `- 预期结果：` convention. If neither route yields a file → skip the pre-case below.
----
+**S4.2 — Extract into `_extracted.json`**
-## Step 3 — Execute Tests via AutoTest
+Write to `<output-path>/_extracted.json`:
-### 3.1 — Locate AutoTest Directory
+```json
+{
+  "bundle_name": "<bundle_name from S2>",
+  "app_name": "<app_name from S2>",
+  "cases": [
+    {
+      "case_name": "<scenario title from ### Scenario: line>",
+      "actions": "<text from - 动作： line>",
+      "expected_results": "<text from - 预期结果： line>"
+    }
+  ]
+}
+```
-Find the `android-harmonyos-converter/tools/autotest` directory in the workspace:
+**Extraction rules:**
-```bash
-find "$(pwd)" -type d -path "*/android-harmonyos-converter/tools/autotest" -print -quit 2>/dev/null
-```
+- **Pre-cases MUST be the first entries** of `cases`, in source order from `pre_test_case.md`. Each pre-case's `case_name` MUST be prefixed by `[PRE] ` (trailing space). Zero, one, or more `[PRE]` entries are all acceptable. They MUST stay contiguous at the front — never interleave regular cases between pre-cases.
+  - Each pre-case's title text comes from its own `### Scenario:` line.
+  - **Naming fallback** — if a pre-case has no `### Scenario:` line, use `[PRE] 前置设置`. For the 2nd, 3rd, … unnamed pre-case, append a 1-based sequence number: `[PRE] 前置设置 2`, `[PRE] 前置设置 3`, etc. Never produce two pre-cases with the same `case_name`.
+  - Example: if `pre_test_case.md` describes 授权流程 and 引导页跳过 (both with `### Scenario:` lines) → `cases[0].case_name = "[PRE] 授权弹窗一键允许"`, `cases[1].case_name = "[PRE] 跳过新手引导"`, then the regular cases follow.
+- For each `### Scenario:` subsection under each `## Spec:` section in `test_case.md`:
+  - `case_name`: The scenario title text after `### Scenario:` (**no `[PRE]` prefix**).
+  - `actions`: The text after `- 动作：`. **Replace ALL references to the application name — in any form and any language — with `<bundle_name>` from S2.** This includes the English display name (e.g., "Simple Gallery"), Chinese name (e.g., "图库", "简单图库"), localized name (e.g., "Tuku"), and any variant with suffix (e.g., "图库应用", "Tuku应用", "Simple Gallery应用"). Do this for ALL occurrences, not just the first one.
+    - Example: if `bundle_name` is `com.example.tuku` and `app_name` is `Simple Gallery`:
+      - `点击 Simple Gallery 图标启动应用` → `点击 com.example.tuku 图标启动应用`
+      - `打开图库应用` → `打开com.example.tuku`
+      - `从最近任务列表恢复图库应用` → `从最近任务列表恢复com.example.tuku`
+  - `expected_results`: The text after `- 预期结果：`. If multiple lines, join with `，`. Same app name → bundle_name replacement rule applies.
+- The pre-case's `actions` / `expected_results` follow the **same app_name → bundle_name replacement rule**.
+- **Lines to SKIP**: `- 前置条件：` and all its sub-items, `## 页面描述注解` section, `## 编号映射表` section.
+- The `_extracted.json` has NO `preconditions` field.
-If not found, write a `self-test-report.md` with status **FAIL** and reason "AutoTest directory not found", then stop.
+### S5 — Generate testcases.json via Python script
-Set `<autotest_dir>` to the found path for all subsequent steps.
+Find the `agents/test-tools/autotest` directory using the **AutoTest Directory Locator** in Shared Utilities (walk up from `<output-path>`). Set `<autotest_dir>` to the resolved path. If not found, emit the documented error and exit. Run:
-### 3.2 — Verify `config.yaml`
+```bash
+cd "<autotest_dir>" && uv run python testcases_tool.py generate "<output-path>/_extracted.json" "<output-path>/testcases.json" --validate
+```
-The new flow delegates execution to the `harmony-autotest` whl, which requires a user-supplied `config.yaml` with real model `api_key` values. `self_test_runner.py` will refuse to start if this file is missing or still contains the placeholder.
+- `VALIDATION PASSED` → done.
+- `VALIDATION FAILED` → check for unreplaced app names in `_extracted.json`, fix, re-run. Do NOT manually write `testcases.json`.
-Check it once up front so you can produce a clean FAIL report instead of letting `self_test_runner.py` exit non-zero:
+**Sanity-check `[PRE]` ordering** — run this command **if and only if** a `pre_test_case.md` was found and extracted in S4.1. If no pre-cases were extracted, skip this step entirely (the assertion would spuriously fail on an empty `pre` list):
 ```bash
-test -f "<autotest_dir>/config.yaml" && ! grep -q "YOUR_API_KEY_HERE" "<autotest_dir>/config.yaml" && echo "OK"
+python -c "import json,sys; d=json.load(open(sys.argv[1],encoding='utf-8')); pre=[i for i,c in enumerate(d) if c['case_name'].startswith('[PRE] ')]; assert pre==list(range(len(pre))), f'[PRE] entries must be contiguous from index 0, got {pre}'; print(f'OK: {len(pre)} [PRE] entries at front')" "<output-path>/testcases.json"
 ```
-If the check fails, write `self-test-report.md` with status **FAIL** and reason `"<autotest_dir>/config.yaml missing or api_key not filled — copy config.yaml.example and fill api_key"`, then stop.
+If assertion fails, edit `_extracted.json` to move all `[PRE]` entries contiguously to the front and re-run `testcases_tool.py`.
-### 3.3 — Clean Task Directory
+---
-Before executing test cases, clean up any previous task output to avoid confusion with stale results.
+## Phase T1-T9 — Test (always runs)
-The task directory is located at `<output-path>/task` (NOT under `<autotest_dir>`). This is because we pass `--task-dir "<output-path>/task"` to the script so that all task outputs are co-located with the test report.
+T1 expects `<output-path>/testcases.json` and `<output-path>/app-metadata.json` to exist. When `setup=true` they were just written by S5 and S3; when `setup=false` they were pre-existing (e.g., from a prior round's `setup=true` run).
+---
+### T1 — Validate Inputs
+Run all validations in a single Bash command:
 ```bash
-rm -rf "<output-path>/task" && echo "Task directory cleaned" || echo "Task directory does not exist, skipping cleanup"
+test -f "<hap-path>" && mkdir -p "<output-path>" && test -f "<output-path>/testcases.json" && test -f "<output-path>/app-metadata.json" && echo "OK"
 ```
-### 3.4 — Run `self_test_runner.py run`
+If `setup=false`, additionally verify each JSON parses and `testcases.json` is non-empty:
-> 🚨 **MANDATORY**: Environment setup, HAP installation, and test execution are all handled by a **single invocation** of `self_test_runner.py run`. This script first **synchronously** executes the preparation steps:
-> 1. Validate `config.yaml` (api_key must be filled)
-> 2. `uv sync` — resolves and installs `harmony-autotest` + `hypium_mcp` from the URLs declared in `pyproject.toml`, plus the rest of the dependency tree
-> 3. `hdc uninstall` + `hdc install -r` — uninstalls old version, then installs the HAP fresh
->
-> If any preparation step fails, the script exits immediately with a non-zero exit code.
->
-> After preparation succeeds, it launches `python -m AutoTest.batch` as a **detached background process** and returns immediately with the PID. The batch process runs all test cases and survives agent session termination.
+```bash
+python -c "import json,sys; t=json.load(open(sys.argv[1],encoding='utf-8')); m=json.load(open(sys.argv[2],encoding='utf-8')); assert isinstance(t,list) and len(t)>0, 'testcases.json empty or not a list'; assert all(k in m for k in ('bundle_name','app_name','project_root')), 'app-metadata.json missing required fields'; print('OK')" "<output-path>/testcases.json" "<output-path>/app-metadata.json"
+```
-> **FORBIDDEN actions** (violating any of these invalidates the entire test run):
-> - ❌ Calling `python -m AutoTest.batch` or `python -m AutoTest` directly — always go through `self_test_runner.py`
-> - ❌ Reading source code of `self_test_runner.py` or any `AutoTest` module — the calling convention is fully described above
-> - ❌ Running `uv sync`, `uv pip install`, or `hdc install -r` separately — `self_test_runner.py run` handles all of these
-> - ❌ Calling `self_test_runner.py run` in a loop. If the preparation phase (config validation / uv sync / HAP install) fails, you may investigate and retry **once**. But once the command returns a `RUNNING` JSON (batch launched), do NOT call `run` again — use `status` to poll instead.
-> - ❌ Writing a shell loop or Python loop to iterate over cases yourself
+If any check fails, write a `self-test-report.md` using the sentinel format from Shared Utilities. The `reason:` line names the specific failure, e.g.:
+- `reason: hap not found at <hap-path>`
+- `reason: setup=false but testcases.json missing at <output-path>/testcases.json — re-run with setup=true`
+- `reason: setup=false but testcases.json failed to parse — re-run with setup=true`
+- `reason: setup=false but testcases.json is empty — re-run with setup=true`
+- `reason: setup=false but app-metadata.json missing or malformed at <output-path>/app-metadata.json — re-run with setup=true`
-The `testcases.jsonl` file was already generated and validated by the `self-test-setup` agent. **Pass it directly via `--testcases` — do NOT read, parse, transform, re-generate, or create any intermediate file.**
+Then stop.
-#### 3.4.1 — Run setup and launch batch
+### T2 — Read App Metadata
-Use `self_test_runner.py run` to perform environment setup and HAP installation synchronously, then launch batch execution as a **detached background process**:
+Read `<output-path>/app-metadata.json` and extract `<bundle_name>`, `<app_name>`, and `<project-root>`.
+### T3 — Verify Device Connection
+Run `hdc list targets`:
+- If at least one device serial is returned → proceed.
+- If empty or `[Empty]` → write report with the sentinel format (`status: FAIL` / `reason: No HarmonyOS device connected`), then stop.
+Record the device serial number.
+### T4 — Locate AutoTest Directory & Verify config.yaml
+Find the `agents/test-tools/autotest` directory using the **AutoTest Directory Locator** in Shared Utilities (walk up from `<output-path>`). If not found, write a `self-test-report.md` with the sentinel format (`status: FAIL` / `reason: AutoTest directory not found from output-path`), then stop.
+Set `<autotest_dir>` to the found path. Check config.yaml:
 ```bash
-cd "<autotest_dir>" && python self_test_runner.py run --testcases "<testcases-path>" --hap "<hap-path>" --bundle-name "<bundle_name>" --category "<app_name>" --task-dir "<output-path>/task" --output-dir "<output-path>"
+test -f "<autotest_dir>/config.yaml" && ! grep -q "YOUR_API_KEY_HERE" "<autotest_dir>/config.yaml" && echo "OK"
 ```
-> **Pre-test case is inline**: If `testcases.json` contains a setup case, it is the first entry with `case_name` prefixed by `[PRE] `. `AutoTest.batch` executes cases in order, so the pre-case naturally runs first. **There is no separate `--pre-case` flag** — do NOT pass one, and do NOT try to invoke the pre-case separately.
+If the check fails, write `self-test-report.md` with the sentinel format (`status: FAIL` / `reason: config.yaml missing or api_key not filled`), then stop.
-The command blocks during preparation (config check, uv sync, HAP install), then launches `python -m AutoTest.batch` in the background and returns a JSON response containing the PID:
+### T5 — Clean Task Directory
-```json
-{"status": "RUNNING", "pid": 12345, "pid_file": "...", "stdout_log": "...", "task_dir": "...", "task_subdir": "..."}
+```bash
+rm -rf "<output-path>/task" && echo "Task directory cleaned" || echo "Task directory does not exist, skipping cleanup"
 ```
-Record the `pid` and `task_subdir` from the output. `task_subdir` is `<output-path>/task/task_<timestamp>/`, where AutoTest writes `task_results.jsonl`, `task_results.csv`, `summary.json`, and per-case report directories.
+### T6 — Run self_test_runner.py
+> 🚨 **MANDATORY**: Environment setup, HAP installation, and test execution are all handled by a **single invocation** of `self_test_runner.py run`. The script synchronously executes:
+> 1. Validate `config.yaml`
+> 2. `uv sync` — installs `harmony-autotest` + `hypium_mcp`
+> 3. `hdc uninstall` + `hdc install -r`
+>
+> After preparation succeeds, it launches `python -m AutoTest.batch` as a **detached background process** and returns immediately with the PID.
+> **FORBIDDEN actions** (violating any of these invalidates the entire test run):
+> - ❌ Calling `python -m AutoTest.batch` or `python -m AutoTest` directly — always go through `self_test_runner.py`
+> - ❌ Reading source code of `self_test_runner.py` or any `AutoTest` module
+> - ❌ Running `uv sync`, `uv pip install`, or `hdc install -r` separately
+> - ❌ Calling `self_test_runner.py run` in a loop. If preparation fails, retry **once**. Once it returns `RUNNING` JSON, do NOT call `run` again — use `status` to poll.
+> - ❌ Writing a shell loop or Python loop to iterate over cases yourself
+```bash
+cd "<autotest_dir>" && python self_test_runner.py run --testcases "<output-path>/testcases.json" --hap "<hap-path>" --bundle-name "<bundle_name>" --category "<app_name>" --task-dir "<output-path>/task" --output-dir "<output-path>"
+```
-#### 3.4.2 — Poll until completion
+Record the `pid` and `task_subdir` from the output.
-After launch, **poll the process status** using `self_test_runner.py status`.
+### T7 — Poll until completion
-Each poll command **MUST** include a `sleep 60` before the status check so the agent truly waits 1 minute between polls. Use `timeout: 120000` on each Bash call:
+Each poll MUST include `sleep 60`. **Use `timeout: 120000` on each Bash call** so the 60-second sleep plus the status command have headroom and the status check itself isn't truncated:
 ```bash
 sleep 60 && cd "<autotest_dir>" && python self_test_runner.py status --task-dir "<output-path>/task" --output-dir "<output-path>"
 ```
-The command returns a JSON object with a `status` field:
 | `status` | Meaning | Exit code | Action |
 |-----------|---------|-----------|--------|
-| `COMPLETED` | All cases finished, `summary.json` exists | 0 | Proceed to Step 3.4.3 (read results) |
-| `RUNNING` | Process alive, still executing cases | 2 | Run the **same poll command** again (it already includes `sleep 60`) |
-| `CRASHED` | Process died without producing `summary.json` | 3 | Stop, report failure with `log_tail` |
-| `NOT_STARTED` | No `batch.pid` file found | 4 | Stop, report that launch failed |
-When `status` is `RUNNING`, the response also includes:
-- `cases_done`: number of cases completed so far
-- `last_case`: name of the most recently completed case
-- `log_tail`: last 5 lines from the runner log
-When `status` is `COMPLETED`, the response includes `pass_count`, `fail_count`, `unknown_count`, `pass_rate`, and the absolute `task_subdir` path.
-> 🚨 **CRITICAL — Do NOT read result files until COMPLETED**: While `status` is `RUNNING`, do NOT attempt to read `task_results.jsonl`, HTML reports, MD reports, `agent.log`, or any other output file. These files are being actively written by the running process and will contain incomplete data. **ONLY** proceed to Step 3.4.3 after receiving `status: "COMPLETED"`.
-**Polling loop rules:**
-- The maximum number of poll iterations is **N_cases × 6** (each case takes ~6 minutes). For example, 8 cases → max 48 polls. If still `RUNNING` after that, **kill the process** and report a timeout.
-- **Timeout kill**: when the poll count exceeds the limit, run:
-  ```bash
-  cd "<autotest_dir>" && python self_test_runner.py kill --task-dir "<output-path>/task"
-  ```
-  This verifies the PID is actually an `AutoTest.batch` process before terminating it. Then write the report with status **TIMEOUT** and include whatever partial results are available from `task_results.jsonl`.
-#### 3.4.3 — Read results after completion
-After `self_test_runner.py status` returns `COMPLETED`:
-1. **Read stdout log** — read the latest `<output-path>/self_test_*.log` for the full log of the run.
-2. **Read batch results** — read the `task_results.jsonl` file from the `task_subdir` reported by `status` (i.e., `<output-path>/task/task_<timestamp>/task_results.jsonl`). Each line is a JSON object emitted by `AutoTest.batch`:
-   ```json
-   {
-     "exec_index": 1,
-     "uuid": "<uuid_or_empty>",
-     "spec": "<spec_or_empty>",
-     "case_name": "<scenario_title>",
-     "category": "<category>",
-     "test_steps": "...",
-     "report_dir": "<path>",
-     "start_time": "2025-01-01 12:00:00",
-     "end_time": "2025-01-01 12:01:30",
-     "duration_seconds": 90.0,
-     "status": "PASS|FAIL|UNKNOWN",
-     "return_code": 0,
-     "reason": "<failure or unknown reason; empty when PASS>"
-   }
-   ```
+| `COMPLETED` | All cases finished, `summary.json` exists | 0 | Proceed to T8 |
+| `RUNNING` | Process alive, still executing cases | 2 | Poll again (see rules below) |
+| `CRASHED` | Process died without producing `summary.json` | 3 | Stop, write CRASHED report (see below) |
+| `NOT_STARTED` | No `batch.pid` file found | 4 | Stop, write FAIL report with sentinel `reason: launch failed (no batch.pid)` |
-   > **Critical field — `report_dir`**: Each JSONL entry contains a `report_dir` field with the absolute path to that case's execution directory. You MUST include this path as the `**AutoTest 任务路径**` field in the report for EVERY case. This path is essential for users to locate screenshots, logs, and HTML reports for each test case.
+**Fields available in the JSON response (use them to surface progress / final stats — do NOT read raw result files instead):**
-   > **Status semantics (new whl)**:
-   > - `PASS` → AutoTest's HTML reporter rendered a PASS status-badge for the case (success).
-   > - `FAIL` → AutoTest rendered a FAIL status-badge **or** the case timed out / crashed. The `reason` field explains which.
-   > - `UNKNOWN` → AutoTest produced a report but the status-badge couldn't be determined. Treat as fail in the summary but mark separately.
+- While `RUNNING`: `cases_done` (cases completed so far), `last_case` (most recent case name), `log_tail` (last 5 lines from the runner log). Use these for one-line progress updates between polls (e.g., `Round status: 4/12 done, last=登录场景`).
+- Once `COMPLETED`: `pass_count`, `fail_count`, `unknown_count`, `pass_rate`, and the absolute `task_subdir` path. Record these for T8/T9.
-   > 🚨 **IMPORTANT — Do NOT modify test cases after execution**: The test cases in `testcases.jsonl` are **final**. If AutoTest reports a case as FAIL / UNKNOWN during execution, this means the **HAP application does not support that functionality** (or the agent could not verify it) — it is NOT a problem with the test case. Record the result as-is in the report. Do NOT go back to edit, rewrite, or re-generate any test case.
+> 🚨 **CRITICAL**: While `status` is `RUNNING`, do NOT read `task_results.jsonl`, HTML reports, MD reports, `agent.log`, or any other output file. These files are being actively written and contain incomplete data. Only proceed to T8 after `COMPLETED`.
-3. **Optional — surface failure detail for FAIL/UNKNOWN cases**:
+**Polling rules:**
+- Max iterations: **N_cases × 6** (each case takes ~6 minutes). For 8 cases → max 48 polls. If still `RUNNING` after that, kill and write TIMEOUT report.
+- **Timeout kill**: `cd "<autotest_dir>" && python self_test_runner.py kill --task-dir "<output-path>/task"`. This verifies the PID belongs to an `AutoTest.batch` process before terminating it.
+- **CRASHED**: Write CRASHED report with sentinel (`status: FAIL` / `reason: AutoTest batch crashed`). Include the `log_tail` from the status response. `task_results.jsonl` is unavailable.
+- **TIMEOUT**: After killing, write TIMEOUT report with sentinel (`status: FAIL` / `reason: AutoTest batch timed out after N polls`) and include partial results from `task_results.jsonl` if any cases finished.
-   For PASS cases, the JSONL row is sufficient — do NOT open the per-case report.
+### T8 — Read results after completion
-   For FAIL/UNKNOWN cases, the JSONL `reason` field is usually enough. If `reason` is empty and you still need more context, you may peek at the agent log:
+After `COMPLETED`:
+1. Read stdout log from `<output-path>/self_test_*.log` (full runner log).
+2. Read `task_results.jsonl` from `task_subdir`. Each line is a JSON object with fields: `exec_index`, `case_name`, `report_dir`, `status` (PASS / FAIL / UNKNOWN), `reason`, `duration_seconds`, etc.
+   - `report_dir` is the absolute path to that case's execution directory. You MUST include this as `**AutoTest 任务路径**` in the report for EVERY case.
+   - **Status semantics**: `PASS` = AutoTest rendered a PASS badge. `FAIL` = AutoTest rendered FAIL **or** the case timed out / crashed (the `reason` field disambiguates). `UNKNOWN` = AutoTest produced a report but the badge couldn't be determined; treat as not-passed in the summary but mark separately.
+3. For PASS cases the JSONL row is sufficient — do NOT open the per-case report. For FAIL / UNKNOWN cases with empty `reason`, you may peek at the runner log of last resort:
    ```bash
    tail -40 "<report_dir>/agent.log"
    ```
-   > 🚨 **Do NOT read full HTML/MD/JSON reports** — they are very large (10KB–200KB) and will waste tokens. The reports live nested under `<report_dir>/<date>/<time>/` and are for the human user, not the agent. Stick to JSONL `reason` + (at most) `tail -40 agent.log`.
+   This caps token usage. **Do NOT read full HTML / MD / JSON per-case reports** — they are very large (10KB–200KB) and meant for human users.
-   Use this information to enrich the per-case details in `self-test-report.md` (Step 4), particularly for the `操作执行` and `AutoTest 详情` fields.
+> 🚨 **Do NOT modify test cases after execution**: FAIL / UNKNOWN means the HAP application does not support that functionality (or the agent could not verify it) — it is NOT a test case defect. Record the result as-is; do NOT edit, regenerate, or rewrite `testcases.json`.
----
-## Step 4 — Generate `self-test-report.md` (Python script, not LLM)
-> 🚨 **Do NOT compose the report markdown yourself.** A Python script
-> (`report_tool.py`) reads `task_<ts>/summary.json`, `task_results.jsonl`,
-> and every per-case `<HHMMSS>.json` (extracting the AutoTest planner's
-> `<judgment>` block verbatim), then renders the complete report from a strict
-> template that passes `report_tool.py validate`. This saves ~5 minutes
-> of LLM time per run AND removes the "agent drifts from contract" failure mode.
+### T9 — Generate self-test-report.md
-### 4.1 — Collect the inputs the script needs
+> 🚨 **Do NOT compose the report markdown yourself.** Use `report_tool.py`.
-Before invoking the script, derive these values:
+**Collect inputs:**
+- `<suite_name>` — read the first non-empty line of `test_case.md` (typically `# <title>`). Strip leading `#` and whitespace. When `setup=false`, `test_case.md` may not be at a known path; in that case use the `suite_name` field from `task_results.jsonl`'s first entry if present, else use `bundle_name` as a fallback.
+- `<device_serial>` — from T3.
+- `<hap_basename>` — basename of hap-path.
+- `<task_subdir>` — from T7 status response.
-- **`<suite_name>`** — read the first non-empty line of `integration_test_case.md`
-  (typically `# <title>`). Strip the leading `#` and any whitespace. This is the
-  value for `--suite`. Example: `# 新建歌单` → suite = `新建歌单`.
-- **`<device_serial>`** — already captured from `hdc list targets` in Step 2.
-- **`<hap_basename>`** — just the file name (the script also accepts the full
-  path; only the basename ends up in the report).
-- **`<task_subdir>`** — absolute path reported by the most recent
-  `self_test_runner.py status` response under `task_subdir`.
-### 4.2 — Run the renderer
+**Run the renderer:**
 ```bash
 cd "<autotest_dir>" && uv run python report_tool.py generate \
     --task-subdir   "<task_subdir>" \
-    --app-metadata  "<app-metadata-path>" \
+    --app-metadata  "<output-path>/app-metadata.json" \
     --hap           "<hap-path>" \
     --device        "<device_serial>" \
     --suite         "<suite_name>" \
@@ -261,94 +342,40 @@ cd "<autotest_dir>" && uv run python report_tool.py generate \
     --validate
 ```
-The `--validate` flag re-runs the format validator in-process after writing, so
-no separate validation step is needed. The script prints `Generated: <path>`
-followed by either:
-- `VALIDATION PASSED: …` → done. The report is ready.
-- `VALIDATION FAILED: …` → re-read the error list and fix it. Common causes:
-  the renderer found malformed JSONL rows, missing `task_results.jsonl`, or a
-  per-case `<HHMMSS>.json` whose structure differs from expectation. Do **NOT**
-  fall back to hand-writing the report — instead, fix the upstream data
-  (re-run the test) or extend `report_tool.py` to handle the edge case.
+- `VALIDATION PASSED` → done.
+- `VALIDATION FAILED` → fix upstream data and re-run; do NOT hand-write the report.
-To re-validate an existing report without regenerating it:
+To re-validate an existing report without regenerating it (debugging only):
 ```bash
 cd "<autotest_dir>" && uv run python report_tool.py validate "<output-path>/self-test-report.md"
 ```
-### 4.3 — How the script splits pre-cases and renders per-case detail
-You do **not** need to implement any of the following — they are documented so
-you can debug rendering decisions:
-- Rows whose `case_name` starts with `[PRE] ` go into `## 前置用例`. The `[PRE] `
-  prefix is stripped in the rendered case title (section header conveys context).
-- All other rows go into `## 用例详情`.
-- Pre indices and Case indices each start at 1 within their own section
-  (`### Pre 1:`, `### Pre 2:`, …; `### Case 1:`, `### Case 2:`, …).
-- For PASS / FAIL cases, the per-case `<HHMMSS>.json`'s last `task_end` event
-  contains a `<judgment>` block with the planner agent's structured verdict
-  (`预期验证` / `历史回顾` / `状态确认` / `Bug发现` / `判定结果`). The script dumps
-  these sub-sections verbatim under `AutoTest 详情`, plus an explicit `失败原因`
-  line (derived from `Bug发现` or JSONL `reason`).
-- For UNKNOWN cases (planner exhausted its step budget — no `<judgment>`), the
-  script falls back to JSONL `reason` + the last `planner_thinking` event so
-  the fixer agent still has context.
-- The `测试总结` block includes counts split by Pre / Regular, plus a
-  `未通过用例列表` table (with a `类别` column) and a templated `建议` section
-  whose bullets are picked based on which failure categories occurred
-  (pre-case failure / UNKNOWN cases / regular failures).
-> **What pre-cases are (and aren't)**: Pre-cases are **data/environment setup
-> scripts**, not feature scenarios under test. They exist purely to prepare
-> what regular cases need (e.g., import a test track, grant permissions, skip
-> onboarding). Their PASS/FAIL signals only whether the test environment was
-> ready — they are **not part of the requirement-under-test** and a FAIL here
-> almost always indicates a testing-environment issue (missing media,
-> ungranted permission, system file-picker glitch), NOT an application defect.
-> This is why the script:
->   - surfaces the **常规通过率** (regular-only pass rate) as the primary quality
->     metric in `## 测试概览` and `## 测试总结`;
->   - keeps `含前置通过率` only as a secondary reference with an explicit
->     disclaimer;
->   - explicitly tells the 建议 reader that pre-failures should be treated as
->     environment problems first, and only enter the fix flow if white-box
->     review confirms a real code Bug.
-> Downstream agents (especially `self-test-fixer`) must respect this framing —
-> do not modify application code based on a pre-case failure unless the white-
-> box pass concludes the failure surfaces a genuine app defect.
-> 🚨 **Heading format — EXACT, no variants accepted by the validator**:
-> - Pre-case headings: `### Pre <N>: <case_name without [PRE] prefix>`.
-> - Regular-case headings: `### Case <N>: <case_name>`.
-> - `<N>` MUST be a plain decimal integer (`1`, `2`, `3`, …). Both `<N>` series
->   start at 1 within their own section.
-> - The script emits these forms automatically — do not hand-edit the output
->   to use any other variant (`### Case PRE-1:`, `### Pre-1:`, etc.).
----
+**Pre-case rendering**: Rows with `case_name` starting with `[PRE] ` go into `## 前置用例`. All others go into `## 用例详情`. Pre indices and Case indices each start at 1 within their own section (`### Pre 1:`, `### Case 1:`). The script emits these forms automatically — do NOT hand-edit to `### Case PRE-1:` or any other variant; the validator will reject it.
-> 🚨 **Why the validator matters**: The `self-test-fixer` agent reads this report to determine what failed and why. A table-only format with just "FAIL" gives the fixer zero information to work with — it cannot distinguish real code bugs from test-agent false positives, and cannot plan targeted fixes. Every failed case MUST include the test actions, expected results, actual observations, and failure reason. The `--validate` flag passed to `report_tool.py generate` enforces this; if it prints `VALIDATION FAILED`, fix the upstream data and re-run — do NOT hand-edit the report.
+> **What pre-cases are (and aren't) — context downstream agents MUST respect**:
+> Pre-cases are **data/environment setup scripts** (e.g., import a test track, grant permissions, skip onboarding). They are NOT feature scenarios under test. A pre-case FAIL almost always indicates a testing-environment issue (missing media, ungranted permission, system file-picker glitch), **NOT an application defect**. That is why the report:
+>   - surfaces **常规通过率** (regular-only pass rate) as the primary quality metric in `## 测试概览` and `## 测试总结`;
+>   - keeps `含前置通过率` only as a secondary reference with an explicit disclaimer;
+>   - tells the 建议 reader that pre-failures should be treated as environment problems first, and only enter the fix flow if a white-box review confirms a real code bug.
+>
+> Downstream agents — especially `self-test-fixer` — must respect this framing: **do not modify application code based on a pre-case failure unless white-box review concludes the failure surfaces a genuine app defect**. This rationale is intentional and load-bearing; do not collapse it into a one-liner.
 ---
 ## Guidelines
-> 🚨 **#1 RULE — Use `self_test_runner.py` for run and status**: Always use `self_test_runner.py run` to perform setup and start batch execution as a detached process, then `self_test_runner.py status` to poll for completion. This ensures the test process survives agent session termination.
-> 🚨 **#2 RULE — Report must include AutoTest 任务路径 for EVERY case**: Each case section in `self-test-report.md` MUST include the `**AutoTest 任务路径**` field with the absolute `report_dir` path from `task_results.jsonl`. This is the path to the per-case execution directory containing logs, screenshots, and the HTML report. Without this field, the report is incomplete and users cannot locate execution artifacts. After writing the report, verify that EVERY case section contains this field.
-- **Do NOT generate test cases** — test case generation is handled by the `self-test-setup` agent. This agent receives a pre-generated and validated `testcases.jsonl` via `testcases-path`. Do NOT create, modify, or regenerate `testcases.jsonl`.
-- **Verify device connection first** — always run `hdc list targets` before executing tests.
-- **Use `self_test_runner.py` for everything** — do NOT call `python -m AutoTest.batch`, `python -m AutoTest`, `uv sync`, `uv pip install`, or `hdc install -r` directly. Use `self_test_runner.py run` to start and `self_test_runner.py status` to poll.
-- **Capture complete output** — never truncate command output; it is essential for diagnosis.
+- **#1 — Use `self_test_runner.py` for run and status**: Always use `self_test_runner.py run` to perform setup + start batch as a detached process, then `self_test_runner.py status` to poll. This ensures the test process survives agent session termination.
+- **#2 — Every case MUST include `**AutoTest 任务路径**`**: the absolute `report_dir` from `task_results.jsonl`. Without it the report is incomplete and users cannot locate execution artifacts. After T9, verify every case section contains this field.
+- **Two-phase architecture**: When the parse phase runs, the LLM writes `_extracted.json` → `testcases_tool.py` writes `testcases.json`. Never write `testcases.json` directly. If `testcases.json` already exists at the output path and the parse phase runs, `testcases_tool.py` overwrites it — never skip the script.
+- **App name → bundle_name is critical**: `test_steps` must use `bundle_name` (e.g., `com.example.tuku`), NOT the display name. AutoTest uses this to launch the correct app. Using the display name instead may launch the wrong app entirely.
+- **JSON contract**: `app-metadata.json` and `testcases.json` always live at `<output-path>/app-metadata.json` and `<output-path>/testcases.json`. T2 reads from the file (not from memory).
+- **AutoTest directory locator**: Walk up from `<output-path>` (see Shared Utilities). Do not assume `$(pwd)` is anchored anywhere in particular.
+- **Sentinel-format early-exit reports**: When T1 / T3 / T4 / T7 write a degraded FAIL report (config error, no device, missing autotest dir, crashed batch, timeout, or `setup=false` precondition failure), the report's first non-frontmatter line MUST be `status: FAIL` and the second MUST be `reason: <one-line reason>`. This lets callers detect early-exit reports without parsing a case table.
 - **Quote all paths** — paths may contain spaces.
-- **Report, don't fix** — this agent only verifies and reports. Fixing is done by the fixer agents.
-- **Never modify test cases after execution** — execution failures (FAIL / UNKNOWN) indicate the HAP app lacks that functionality, NOT a test case defect. Record failures as-is; do NOT rewrite or regenerate test cases.
-- **Continue on failure** — `AutoTest.batch` already continues to the next case after a failure; the agent does not need to retry anything.
-- **Read batch results from JSONL only** — after batch execution, parse `task_results.jsonl` to populate the test report. The `status` (`PASS`/`FAIL`/`UNKNOWN`) and `reason` fields are the source of truth. Do NOT grep the per-case `.md` for `任务结果` — the new whl does not write that phrase to markdown.
-- **Clean task directory before execution** — always delete the contents of the `task/` directory before running `self_test_runner.py run` to prevent stale results from interfering.
-- **Do NOT read AutoTest internals** — the instructions above already provide the complete calling convention. Do not spend time reading installed `AutoTest` modules or `self_test_runner.py` source code to "understand" how they work.
-- **Do NOT transform or re-generate test cases** — the `testcases.jsonl` from `testcases-path` is already in the final format. Pass it directly via `--testcases`. Do NOT create any intermediate files.
+- **Capture complete output** — never truncate command output; it is essential for diagnosis.
+- **Do NOT read AutoTest internals** — the instructions above provide the complete calling convention. Do not spend time reading installed `AutoTest` modules or `self_test_runner.py` source code to "understand" how they work.
+- **Do NOT hand-write reports** — `report_tool.py generate --validate` renders all reports. Hand-writing bypasses the validator that downstream `self-test-fixer` relies on.
+- **Validate is the gate**: S5 `VALIDATION PASSED` and T9 `VALIDATION PASSED` are the completion signals.
+- **Do NOT modify test cases after execution**: Failures indicate HAP functionality gaps, not test case defects. Record results as-is.
+- **Pre-cases are environment scripts, not feature scenarios**: their failures usually indicate environment issues. The **常规通过率** (regular-only pass rate) is the primary quality metric; `self-test-fixer` must not modify application code based on a pre-case failure unless white-box review confirms a real defect.