npm - @htechcs/harness-kit - Versions diffs - 0.1.0 → 0.1.1 - Mend

@htechcs/harness-kit 0.1.0 → 0.1.1


Files changed (20)

package/README.en.md +8 -8
package/README.md +8 -8
package/bin/cli.js +43 -43
package/docs/harness-engineering-tutorial.en.md +1 -1
package/docs/harness-engineering-tutorial.md +1 -1
package/package.json +1 -1
package/skills/init-harness/SKILL.md +74 -74
package/templates/agents/README.md +25 -24
package/templates/agents/repo-explorer.md +16 -16
package/templates/evals/README.md +39 -35
package/templates/evals/cases/example-task.md +22 -22
package/templates/evals/observability.md +43 -42
package/templates/guardrails/README.md +59 -57
package/templates/long-running/README.md +29 -28
package/templates/long-running/TASK.md +19 -19
package/templates/mcp-audit.md +16 -16
package/templates/new-worktree.sh +16 -16
package/templates/setup.sh +25 -25
package/templates/spec/FEATURE.md +19 -19
package/templates/spec/README.md +20 -20

package/templates/agents/README.md CHANGED

@@ -1,43 +1,44 @@
-# Subagents — quy tắc viết đúng (Mức 2)
+# Subagents — how to write a good one (Level 2)
-Subagent là **đòn bẩy chính của Mức 2**: đẩy việc nặng (đọc nhiều file, refactor lớn,
-chạy thử lặp) sang một **context riêng**, để context chính chỉ nhận *kết luận* — không bị ngập.
+A subagent is **Level 2's main lever**: push heavy work (reading many files, big refactors,
+repeated test runs) into a **separate context**, so the main context receives only *conclusions* —
+it doesn't flood.
-## Cài đặt
+## Install
-Mỗi subagent là một file `.md` trong `.claude/agents/` (cấp repo) hoặc `~/.claude/agents/` (cấp user).
-Copy file mẫu vào đó là Claude Code tự nhận:
+Each subagent is a `.md` file in `.claude/agents/` (repo-level) or `~/.claude/agents/` (user-level).
+Copy the sample file there and Claude Code picks it up automatically:
 ```bash
 mkdir -p .claude/agents
 cp repo-explorer.md .claude/agents/
 ```
-## Cấu trúc file
+## File structure
 ```md
 ---
 name: <kebab-case>
-description: <KHI NÀO dùng — Claude đọc dòng này để tự quyết có gọi agent này không>
-tools: Read, Grep, Glob        # (tuỳ chọn) chỉ trang bị tool agent CẦN, để nó gọn
+description: <WHEN to use it — Claude reads this line to decide whether to call this agent>
+tools: Read, Grep, Glob        # (optional) give it only the tools it NEEDS, to keep it focused
 ---
-<system prompt: vai trò + cách làm + ĐẶC BIỆT là cách TRẢ VỀ>
+<system prompt: role + how to work + ESPECIALLY how to RETURN>
 ```
-## 4 quy tắc không được quên
+## 4 rules you can't forget
-1. **`description` viết theo "khi nào dùng", không phải "là gì".** Đây là thứ agent chính
-   đọc để quyết định có ủy thác hay không. Mơ hồ → không bao giờ được gọi.
-2. **Subagent phải trả về *kết luận chắt lọc*, không phải nhật ký.** Cả lý do tồn tại của
-   subagent là để context chính KHÔNG phải nuốt thứ nó đọc. Trả raw dump = phá mục tiêu.
-3. **`tools` chỉ liệt kê cái thật cần.** Đây KHÔNG phải guardrail an toàn (đó là Mức 3) —
-   mà để agent gọn, tập trung. Agent đọc-hiểu chỉ cần `Read, Grep, Glob`.
-4. **Một subagent = một việc rõ.** Đừng gộp "đọc + sửa + test" vào một agent. Việc nặng
-   khác nhau → context riêng khác nhau.
+1. **Write `description` as "when to use", not "what it is".** This is what the main agent reads
+   to decide whether to delegate. Vague → it never gets called.
+2. **A subagent must return a *distilled conclusion*, not a log.** The whole reason a subagent
+   exists is so the main context does NOT have to swallow what it read. A raw dump defeats the purpose.
+3. **`tools` lists only what's truly needed.** This is NOT a safety guardrail (that's Level 3) —
+   it keeps the agent lean and focused. A comprehension agent only needs `Read, Grep, Glob`.
+4. **One subagent = one clear job.** Don't merge "read + edit + test" into one agent. Different
+   heavy jobs → different separate contexts.
-## Khi nào KHÔNG cần viết subagent mới
+## When you DON'T need a new subagent
-Claude Code đã có sẵn `Explore` (quét read-only), `Plan` (thiết kế), `general-purpose`.
-Chúng lo phần lớn nhu cầu cô lập context rồi. **Chỉ viết subagent riêng khi** bạn có một
-việc lặp lại, đặc thù domain mà built-in không nắm được — đừng đẻ bản generic trùng lặp,
-vì chính tool thừa là cái nhiễu context mà Mức 2 dạy phải tránh.
+Claude Code already ships `Explore` (read-only sweeps), `Plan` (design), and `general-purpose`.
+They cover most context-isolation needs already. **Only write your own subagent when** you have a
+recurring, domain-specific job the built-ins don't handle — don't spawn a redundant generic copy,
+because surplus tools are exactly the context noise Level 2 teaches you to avoid.
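
The frontmatter rules in this README (a kebab-case `name`, a non-empty `description`) can be checked mechanically. The following is a minimal sketch, not part of the package: `check_subagent` is a hypothetical helper, and it assumes the frontmatter sits between the first two `---` lines with one `key: value` pair per line.

```shell
#!/usr/bin/env bash
# Hypothetical lint for a subagent file (not shipped by the kit).
# Usage: check_subagent <path-to-agent.md>
check_subagent() {
  local file=$1 front name description
  local kebab='^[a-z0-9]+(-[a-z0-9]+)*$'

  # Keep only the frontmatter: the lines between the first two '---' markers.
  front=$(awk '/^---[[:space:]]*$/ { n++; next } n == 1 { print } n >= 2 { exit }' "$file")
  name=$(printf '%s\n' "$front" | sed -n 's/^name:[[:space:]]*//p' | head -n 1)
  description=$(printf '%s\n' "$front" | sed -n 's/^description:[[:space:]]*//p' | head -n 1)

  if [[ ! $name =~ $kebab ]]; then
    echo "FAIL: name '$name' is not kebab-case"
    return 1
  fi
  if [[ -z $description ]]; then
    echo "FAIL: description is empty"
    return 1
  fi
  echo "OK: $name"
}
```

Something like `check_subagent .claude/agents/repo-explorer.md` would then report `OK: repo-explorer`. The check cannot judge whether the description really says *when* to use the agent; that part stays a human review.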

package/templates/agents/repo-explorer.md CHANGED

@@ -1,27 +1,27 @@
 ---
 name: repo-explorer
-description: Đọc nhiều file để trả lời câu hỏi "cái gì nằm ở đâu / luồng này chạy ra sao" mà KHÔNG làm ngập context chính. Dùng khi cần quét rộng codebase và chỉ cần kết luận, không cần nội dung từng file. Read-only.
+description: Read many files to answer "where is X / how does this flow work" WITHOUT flooding the main context. Use when you need a broad sweep of the codebase and only need conclusions, not the contents of each file. Read-only.
 tools: Read, Grep, Glob
 ---
-Bạn là agent **đọc-hiểu codebase**. Việc của bạn là quét nhiều file rồi trả về
-một **kết luận chắt lọc** cho agent chính — agent chính KHÔNG thấy được những gì
-bạn đọc, chỉ thấy phần bạn trả về. Vì vậy:
+You are a **codebase-comprehension** agent. Your job is to scan many files and return a
+**distilled conclusion** to the main agent — the main agent CANNOT see what you read, only what
+you return. Therefore:
-## Cách làm
+## How to work
-1. Bám sát đúng câu hỏi được giao. Đừng mở rộng phạm vi.
-2. Dùng `Grep`/`Glob` để khoanh vùng trước, chỉ `Read` file thật sự liên quan.
-3. Đọc *đủ để kết luận*, không đọc cho hết.
+1. Stick to the exact question you were given. Don't widen the scope.
+2. Use `Grep`/`Glob` to narrow down first; only `Read` files that are genuinely relevant.
+3. Read *enough to conclude*, not everything.
-## Cách trả về (quan trọng nhất)
+## How to return (the most important part)
-Trả về **kết luận, không phải nhật ký**. Cụ thể:
+Return **a conclusion, not a log**. Specifically:
-- Câu trả lời trực tiếp cho câu hỏi, lên đầu.
-- Các con trỏ `đường-dẫn/file.ts:dòng` cho mỗi điểm quan trọng (để agent chính tự mở khi cần).
-- TUYỆT ĐỐI không dán nguyên nội dung file dài vào câu trả lời — đó chính là cái
-  làm ngập context mà subagent này sinh ra để tránh.
-- Nếu không tìm thấy, nói thẳng "không tìm thấy X", đừng đoán.
+- The direct answer to the question, up top.
+- Pointers `path/file.ts:line` for each important spot (so the main agent can open them itself when needed).
+- NEVER paste long file contents into your answer — that is exactly the context flooding this
+  subagent exists to prevent.
+- If you can't find it, say plainly "couldn't find X", don't guess.
-Mục tiêu: agent chính đọc 15 dòng của bạn là hiểu, thay vì phải tự đọc 50 file.
+Goal: the main agent reads your 15 lines and understands, instead of having to read 50 files itself.

package/templates/evals/README.md CHANGED

@@ -1,51 +1,55 @@
-# Evals & Observability — đo agent có làm đúng không (Mức 5)
+# Evals & Observability — measure whether the agent does the right thing (Level 5)
-Mức 1–4 *dựng* harness. Mức 5 là **vòng phản hồi**: làm sao biết khi bạn sửa CLAUDE.md /
-settings / skill thì agent **tốt lên hay tệ đi**? Không có Mức 5, mọi thay đổi harness chỉ là đoán.
+Levels 1–4 *build* the harness. Level 5 is the **feedback loop**: how do you know whether changing
+CLAUDE.md / settings / a skill made the agent **better or worse**? Without Level 5, every harness
+change is a guess.
-## Hai nửa
+## Two halves
-| Nửa | Trả lời | File |
-|-----|---------|------|
-| **Evals** | "Agent ra kết quả *đúng* không?" (pass/fail đo được) | [`cases/`](./cases/) |
-| **Observability** | "Agent đã *làm gì*, vì sao hỏng?" | [`observability.md`](./observability.md) |
+| Half | Answers | File |
+|------|---------|------|
+| **Evals** | "Does the agent produce the *right* result?" (measurable pass/fail) | [`cases/`](./cases/) |
+| **Observability** | "What did the agent *do*, and why did it fail?" | [`observability.md`](./observability.md) |
-Evals nói **đúng/sai**. Observability nói **tại sao** — và thường lộ bệnh ở mức thấp hơn
-(đọc lại file hoài = context bẩn → Mức 2; hỏi quyền liên tục → chỉnh `allow` ở Mức 3).
+Evals tell you **right/wrong**. Observability tells you **why** — and often reveals a problem at a
+lower level (re-reading files endlessly = dirty context → Level 2; constant permission prompts → fix
+`allow` at Level 3).
-## Vòng lặp (đây mới là cốt lõi, không phải file)
+## The loop (this is the core, not the files)
 ```
-define  →  run  →  read  →  fix  →  (chạy lại)
-golden     cho       đọc      sửa
+define  →  run  →  read  →  fix  →  (re-run)
+golden     the       the      the
 task       agent     trace    harness
-           chạy      / kết    (CLAUDE.md,
-                     quả      settings, skill)
+                     /result  (CLAUDE.md,
+                              settings, skill)
 ```
-Đòn bẩy thật: bộ golden task là **lưới chống regression cho chính harness**. Sửa CLAUDE.md xong,
-chạy lại bộ case — case trước đậu mà giờ rớt → gần như chắc chắn do thay đổi của mình (xác nhận
-bằng cách chạy lại vài lần để loại nhiễu), không phải đoán.
+The real lever: the golden-task set is a **regression net for the harness itself**. After you change
+CLAUDE.md, re-run the set — a case that used to pass and now fails → almost certainly your change
+(confirm by re-running a few times to rule out noise), not a guess.
-## Sự thật: Mức 5 ít "gói thành file" nhất
+## The truth: Level 5 is the least file-able
-Eval *thật* thì **đặc thù domain** — không kit nào ship sẵn "agent của bạn làm đúng chưa".
-Kit này chỉ cho **khung + kỷ luật**:
+A *real* eval is **domain-specific** — no kit ships "is your agent correct yet" out of the box. This
+kit only gives you the **scaffold + discipline**:
-- ✅ File: cấu trúc thư mục + template golden-task + guide observability.
-- 🧠 Discipline (phần lớn): *định nghĩa* "đúng" cho task của bạn, dựng bộ case, đọc trace,
-  **đóng vòng lặp** (eval phát hiện regression → sửa harness → chạy lại).
+- **File:** the folder structure + a golden-task template + an observability guide.
+- **Discipline (most of it):** *define* what "correct" means for your task, build the case set, read
+  traces, **close the loop** (eval finds a regression → fix the harness → re-run).
-Kit cố ý **không** ship runner code — runner là đặc thù repo, giả vờ generic sẽ thành rác.
+The kit deliberately ships **no** runner code — a runner is repo-specific, and faking a generic one
+would just be junk.
-## Bắt đầu
+## Getting started
-1. Copy `cases/example-task.md` thành một case thật, điền done-criteria *khách quan*.
-2. Gom 3–5 task **đại diện** (việc agent làm thường xuyên nhất) — không cần nhiều, cần đúng.
-3. **Lấy baseline "không harness" TRƯỚC.** Chạy bộ case một lần ở trạng thái chưa áp harness
-   (CLAUDE.md trống / trước khi cài Mức 1–4) và ghi điểm. Đây là cái **định lượng ROI của harness**:
-   chạy lại sau khi đã áp harness rồi so delta — chính là luận điểm mở đầu của ngành (đổi harness
-   làm điểm nhảy; xem `docs/harness-engineering-tutorial.md`). *Khác* với regression ở bước sau.
-4. Sau đó, trước/sau **mỗi lần sửa** harness, chạy lại bộ case và đối chiếu (chạy vài lần để loại
-   nhiễu trước khi kết luận nhân-quả).
-5. Khi một case rớt, mở [`observability.md`](./observability.md) để truy *vì sao*.
+1. Copy `cases/example-task.md` into a real case and fill in *objective* done-criteria.
+2. Gather 3–5 **representative** tasks (what the agent does most often) — you don't need many, you need the right ones.
+3. **Take a "no-harness" baseline FIRST.** Run the set once with the harness not yet applied (empty
+   CLAUDE.md / before installing Levels 1–4) and record the score. This is what **quantifies the
+   harness's ROI**: re-run after applying the harness and compare the delta — exactly the field's
+   opening thesis (changing the harness moves the score; see `docs/harness-engineering-tutorial.md`).
+   This is *different* from the regression check below.
+4. After that, before/after **each change** to the harness, re-run the set and compare (run it a few
+   times to rule out noise before concluding cause).
+5. When a case fails, open [`observability.md`](./observability.md) to trace *why*.
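
The "run it a few times, then compare" discipline in steps 3 and 4 can be sketched in a few lines of bash. This is only an illustration under stated assumptions: the kit deliberately ships no runner, so `pass_count` and `compare_runs` are hypothetical helpers, and the grading command passed to `pass_count` is whatever exit-code command your own case file names.

```shell
#!/usr/bin/env bash
# Count how many of N runs of a grading command exit 0.
# Usage: pass_count <runs> <command> [args...]
pass_count() {
  local runs=$1; shift
  local passes=0 i
  for ((i = 0; i < runs; i++)); do
    if "$@" >/dev/null 2>&1; then
      passes=$((passes + 1))
    fi
  done
  echo "$passes"
}

# Compare a recorded baseline pass count with the current one.
# Usage: compare_runs <baseline_passes> <current_passes>
compare_runs() {
  local baseline=$1 current=$2
  local delta=$((current - baseline))
  if ((delta < 0)); then
    echo "regression: $baseline -> $current"
  elif ((delta > 0)); then
    echo "improved: $baseline -> $current"
  else
    echo "unchanged: $current"
  fi
}
```

A typical use would be to record `pass_count 5 ./grade-case.sh` (where `./grade-case.sh` is a placeholder for your own grader) before a harness change, run it again afterwards, and feed both numbers to `compare_runs`. Repeating the run several times is what separates a real regression from noise.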

package/templates/evals/cases/example-task.md CHANGED

@@ -1,33 +1,33 @@
 <!--
-Một "golden task" = một việc đại diện + tiêu chí đậu KHÁCH QUAN, để chạy lại được sau mỗi
-lần sửa harness. Copy file này cho mỗi case. Đặt tên theo việc: add-endpoint.md, fix-flaky-test.md...
+A "golden task" = one representative job + OBJECTIVE pass criteria, so it can be re-run after every
+harness change. Copy this file per case. Name it after the job: add-endpoint.md, fix-flaky-test.md...
-Nguyên tắc: done-criteria phải đo được bằng MÁY nếu có thể (lệnh chạy ra pass/fail), chỉ rớt
-xuống chấm bằng người khi bất khả kháng. "Trông có vẻ đúng" KHÔNG phải tiêu chí.
-Xoá comment này khi dùng thật.
+Principle: done-criteria must be MACHINE-checkable where possible (a command that returns pass/fail);
+fall back to human grading only when unavoidable. "Looks right" is NOT a criterion.
+Delete this comment when you use it for real.
 -->
-# Case: <tên việc — vd "thêm endpoint GET /users/:id">
+# Case: <the job — e.g. "add endpoint GET /users/:id">
-## Task (đưa nguyên văn cho agent)
-<Lời nhắc y như bạn sẽ gõ cho agent. Càng giống thực tế càng tốt.>
+## Task (give this verbatim to the agent)
+<The exact prompt you'd type to the agent. The closer to reality, the better.>
-## Setup (repo phải ở trạng thái nào trước khi chạy)
-<Branch/commit nền, dữ liệu seed, biến môi trường. Để case lặp lại được giống nhau mỗi lần.>
+## Setup (what state the repo must be in before running)
+<Base branch/commit, seed data, env vars. So the case repeats identically every time.>
 - Base: <commit/branch>
 -
-## Done-criteria (KHÁCH QUAN — đậu/rớt rõ ràng)
-<Cái PHẢI đúng sau khi agent xong. Ưu tiên lệnh chạy ra pass/fail.>
-- [ ] `<lệnh test>` xanh
-- [ ] `<lệnh lint / typecheck>` sạch
-- [ ] <thay đổi cụ thể tồn tại — vd "route mới trả 200 với id hợp lệ, 404 nếu không">
-- [ ] KHÔNG đụng <file/đường ngoài phạm vi>
+## Done-criteria (OBJECTIVE — clear pass/fail)
+<What MUST be true after the agent finishes. Prefer commands that return pass/fail.>
+- [ ] `<test command>` green
+- [ ] `<lint / typecheck command>` clean
+- [ ] <a concrete change exists — e.g. "new route returns 200 for a valid id, 404 otherwise">
+- [ ] Did NOT touch <out-of-scope file/path>
-## Cách chấm
-<Tự động được thì ghi đúng lệnh. Phải chấm tay thì ghi rubric ngắn, đừng để "tự hiểu".>
-- Tự động: `<lệnh trả exit code>`
-- Tay (nếu cần): <1–2 câu rubric>
+## How to grade
+<If automatable, write the exact command. If it must be manual, write a short rubric; don't leave it "implied".>
+- Automated: `<command that returns an exit code>`
+- Manual (if needed): <1–2 sentence rubric>
-## Tham chiếu (tùy chọn)
-<Commit/PR "đúng" để so, nếu có. Giúp thấy agent lệch ở đâu.>
+## Reference (optional)
+<A "correct" commit/PR to compare against, if any. Helps see where the agent diverged.>

package/templates/evals/observability.md CHANGED

@@ -1,42 +1,43 @@
-# Observability — thấy agent đã làm gì (Mức 5)
-Khi một eval rớt — hoặc agent "làm gì đó lạ" — bạn cần *nhìn* được nó đã làm gì, không đoán.
-Đây là các chỗ mà nhìn, từ rẻ tới sâu.
-## Chỗ nào mà nhìn
-1. **Transcript phiên** — bản ghi *mọi* tool call agent đã gọi (đọc file nào, chạy lệnh gì,
-   sửa gì). Đây là nguồn sự thật số một khi truy "vì sao nó làm vậy". Claude Code lưu transcript
-   mỗi session dưới thư mục project trong `~/.claude/`.
-2. **`/cost`** — token & chi phí của phiên. Tăng vọt bất thường = dấu hiệu context phình
-   (đọc lại file thừa, MCP nhồi tool) → kéo về Mức 2.
-3. **Telemetry (cho cả team / chạy nền)** — Claude Code xuất được metrics/logs qua OpenTelemetry:
-   bật bằng biến môi trường `CLAUDE_CODE_ENABLE_TELEMETRY` rồi trỏ OTLP exporter sang backend
-   của bạn (Grafana, Honeycomb, Datadog…). Dùng khi cần theo dõi nhiều phiên/agent, không chỉ một.
-   → Tra cấu hình chính xác trong docs Claude Code mục *monitoring / telemetry*.
-4. **Log hook (audit chủ động)** — gắn `PostToolUse` hook ghi mỗi tool call ra file (xem
-   [guardrails/README.md](../guardrails/README.md) mục Hooks). Hữu ích khi chạy nền và muốn xem lại sau.
-## Dấu hiệu bệnh & nó chỉ về mức nào
-Observability không chỉ để debug một phiên — nó **lộ ra lỗ hổng harness**:
-| Thấy gì trong trace | Bệnh | Sửa ở mức |
-|---------------------|------|-----------|
-| Đọc đi đọc lại cùng file; token leo thang | context bẩn | **Mức 2** (subagent / `/clear` / cắt MCP) |
-| Bị hỏi quyền liên tục cho lệnh an toàn | allowlist thiếu | **Mức 3** (thêm vào `allow`) |
-| Suýt chạy lệnh phá hoại | thiếu chốt | **Mức 3** (thêm vào `deny`/`ask`) |
-| Mất ngữ cảnh giữa phiên dài, làm lại từ đầu | không checkpoint | **Mức 4** (`TASK.md`) |
-| Làm sai mà không ai biết tới lúc muộn | thiếu eval | **Mức 5** (thêm golden task) |
-| Mơ hồ "build/test chạy sao" | chỉ dẫn thiếu | **Mức 1** (CLAUDE.md) |
-| Lặp đi lặp lại cùng một lỗi qua nhiều phiên | luật chưa được ghi | **Mức 1** (thêm 1 dòng vào CLAUDE.md) |
-**Đóng vòng về Mức 1 — `CLAUDE.md` là tài liệu sống.** Khi trace cho thấy agent **lặp lại** một
-lỗi Z (vd quên chạy migration, sửa nhầm file generated), đừng chỉ sửa tay lần này: thêm **một dòng**
-guardrail/quy ước vào `CLAUDE.md` để chặn lần sau, rồi **chạy lại golden task** xác nhận hết
-regression. Đó là cách `CLAUDE.md` lớn lên *từ lỗi thật*, thay vì phình theo phỏng đoán.
-## Nguyên tắc
-> Đừng cải thiện harness bằng cảm giác. **Đọc trace, để nó chỉ đúng mức cần sửa**, sửa, rồi
-> chạy lại golden task để xác nhận tốt lên thật. Đó là toàn bộ vòng lặp Mức 5.
+# Observability — see what the agent did (Level 5)
+When an eval fails — or the agent "does something weird" — you need to *see* what it did, not guess.
+Here's where to look, cheapest to deepest.
+## Where to look
+1. **The session transcript** — a record of *every* tool call the agent made (which files it read,
+   which commands it ran, what it edited). This is the number-one source of truth for "why did it do
+   that". Claude Code stores a per-session transcript under the project folder in `~/.claude/`.
+2. **`/cost`** — the session's tokens & cost. An unusual spike = a sign of context bloat (re-reading
+   surplus files, MCP tool stuffing) → pull it back to Level 2.
+3. **Telemetry (for a whole team / background runs)** — Claude Code can export metrics/logs via
+   OpenTelemetry: enable it with the `CLAUDE_CODE_ENABLE_TELEMETRY` env var, then point an OTLP
+   exporter at your backend (Grafana, Honeycomb, Datadog…). Use it when you need to watch many
+   sessions/agents, not just one. → Find the exact config in the Claude Code docs under *monitoring / telemetry*.
+4. **A log hook (proactive audit)** — attach a `PostToolUse` hook that writes each tool call to a file
+   (see [guardrails/README.md](../guardrails/README.md), the Hooks section). Handy for background runs you want to review later.
+## Symptoms & which level they point to
+Observability isn't just for debugging one session — it **surfaces harness gaps**:
+| What you see in the trace | Problem | Fix at level |
+|---------------------------|---------|--------------|
+| Re-reading the same files; tokens climbing | dirty context | **Level 2** (subagent / `/clear` / prune MCP) |
+| Constant permission prompts for safe commands | missing allowlist | **Level 3** (add to `allow`) |
+| Nearly ran a destructive command | missing guard | **Level 3** (add to `deny`/`ask`) |
+| Loses context mid-long-session, starts over | no checkpoint | **Level 4** (`TASK.md`) |
+| Does it wrong and nobody notices until late | missing eval | **Level 5** (add a golden task) |
+| Vague about "how do build/test run" | missing guidance | **Level 1** (CLAUDE.md) |
+| Repeats the same mistake across sessions | rule never written down | **Level 1** (add one line to CLAUDE.md) |
+**Closing the loop back to Level 1 — `CLAUDE.md` is a living document.** When the trace shows the agent
+**repeating** a mistake Z (e.g. forgetting to run a migration, editing a generated file), don't just
+fix it by hand this time: add **one line** of guardrail/convention to `CLAUDE.md` to prevent it next
+time, then **re-run the golden task** to confirm the regression is gone. That's how `CLAUDE.md` grows
+*from real mistakes*, instead of bloating on speculation.
+## Principle
+> Don't improve the harness on vibes. **Read the trace, let it point to the exact level to fix**, fix
+> it, then re-run the golden task to confirm it actually got better. That's the whole Level 5 loop.
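
The first symptom in the table (re-reading the same files) can be spotted from an audit log rather than by eye. A rough sketch follows, assuming the log holds one JSON payload per line and that each payload carries a `tool_name` field and, for reads, a `file_path` field; those field names are an assumption to verify against your own log. `repeated_reads` is a hypothetical helper, and it uses plain text matching instead of a JSON parser, so it will miss paths containing escaped quotes.

```shell
#!/usr/bin/env bash
# List files that were read more than once, most-read first.
# Output: "<count><TAB><path>" per line.
# Usage: repeated_reads <audit-log.jsonl>
repeated_reads() {
  local log=$1
  # 1. keep Read calls  2. extract the file_path pair  3. strip it down to the path
  # 4. count duplicates 5. keep counts above 1         6. sort by count, descending
  grep -E '"tool_name"[[:space:]]*:[[:space:]]*"Read"' "$log" \
    | grep -oE '"file_path"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/^"file_path"[[:space:]]*:[[:space:]]*"//; s/"$//' \
    | sort | uniq -c \
    | awk '{ c = $1 + 0; sub(/^[[:space:]]*[0-9]+[[:space:]]+/, ""); if (c > 1) print c "\t" $0 }' \
    | sort -rn
}
```

A file that shows up with a high count is a candidate for the Level 2 fixes in the table: delegate the reading to a subagent, or clear the context before continuing.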

package/templates/guardrails/README.md CHANGED

@@ -1,88 +1,90 @@
-# Guardrails — permission baseline (Mức 3)
+# Guardrails — permission baseline (Level 3)
-Mức 3 kiểm soát agent **được phép làm gì** — ranh giới an toàn. Khác Mức 2 (context *sạch*),
-Mức 3 lo hành động *an toàn*: chặn lệnh phá hoại, hỏi trước việc rủi ro, cho việc an toàn chạy thẳng.
+Level 3 controls **what the agent is allowed to do** — the safety boundary. Unlike Level 2 (a *clean*
+context), Level 3 is about *safe* actions: block destructive commands, ask before risky ones, let
+safe ones run straight through.
-## Cài đặt
+## Install
-Copy `settings.json` vào `.claude/` của repo:
+Copy `settings.json` into the repo's `.claude/`:
 ```bash
 mkdir -p .claude
 cp settings.json .claude/settings.json
 ```
-**Vì sao là `.claude/settings.json` (không phải `settings.local.json`):** file này **check vào repo**,
-nên cả team clone về là **tự thừa hưởng cùng một bộ guardrail**. Còn `settings.local.json` là
-ghi đè cá nhân (đã gitignore) — để dành cho tinh chỉnh riêng máy bạn, không ép lên team.
+**Why `.claude/settings.json` (not `settings.local.json`):** this file is **checked into the repo**,
+so everyone who clones it **inherits the same guardrails automatically**. `settings.local.json` is a
+personal override (gitignored) — use it for machine-specific tweaks, not to impose on the team.
-> ⚡ **Việc ĐẦU TIÊN sau khi copy — thêm lệnh test/lint/build vào `allow`.** Baseline cố ý chỉ
-> allow git read-only. Nếu không thêm vòng feedback của repo, Claude sẽ hỏi quyền *mỗi lần* chạy
-> test → bạn rơi đúng thói quen "bấm yes cho xong" mà mục *Insight* bên dưới gọi là nguy hiểm nhất.
-> Mở `.claude/settings.json`, thêm vào `allow` đúng lệnh của stack bạn:
+> **The FIRST thing to do after copying — add test/lint/build commands to `allow`.** The baseline
+> deliberately only allows read-only git. Without your repo's feedback loop, Claude asks permission
+> *every time* it runs tests → you fall into the "click yes to get it over with" habit the *Insight*
+> section below calls the real danger. Open `.claude/settings.json` and add your stack's commands to `allow`:
 >
 > - **Node:** `"Bash(npm run test:*)"`, `"Bash(npm run lint:*)"`, `"Bash(npm run build:*)"`
 > - **Python:** `"Bash(pytest:*)"`, `"Bash(ruff:*)"`, `"Bash(mypy:*)"`
 > - **Go:** `"Bash(go test:*)"`, `"Bash(go build:*)"`, `"Bash(go vet:*)"`
 >
-> Đây là **bắt buộc**, không phải tuỳ chọn — vòng feedback nhanh là tinh tuý xuyên suốt cả kit.
+> This is **mandatory, not optional** — a fast feedback loop is the thread running through the whole kit.
-## Mô hình 3 rổ
+## The 3-bucket model
-Mọi hành động (chạy bash, đọc/sửa file) rơi vào 1 trong 3 rổ:
+Every action (run bash, read/edit a file) falls into one of 3 buckets:
-| Rổ | Nghĩa | Ví dụ trong baseline |
-|----|-------|----------------------|
-| **deny** | cấm tuyệt đối, agent không gọi được | `rm -rf`, đọc `.env`/`secrets/**`, đọc key `*.pem` |
-| **ask** | dừng lại hỏi bạn trước | `git push`, `git reset --hard`, `git clean`, `rm` |
-| **allow** | chạy thẳng, không hỏi | `git status`, `git diff`, `git log`, `git branch` |
+| Bucket | Meaning | Examples in the baseline |
+|--------|---------|--------------------------|
+| **deny** | absolutely forbidden, the agent can't call it | `rm -rf`, read `.env`/`secrets/**`, read `*.pem` keys |
+| **ask** | stop and ask you first | `git push`, `git reset --hard`, `git clean`, `rm` |
+| **allow** | run straight through, no prompt | `git status`, `git diff`, `git log`, `git branch` |
-Cú pháp rule: `Tool(specifier)`.
-- Bash khớp theo **tiền tố**: `Bash(npm run test:*)` khớp mọi lệnh bắt đầu bằng `npm run test`.
-- File theo kiểu **gitignore**: `Read(./secrets/**)`, `Edit(./dist/**)`.
+Rule syntax: `Tool(specifier)`.
+- Bash matches by **prefix**: `Bash(npm run test:*)` matches any command starting with `npm run test`.
+- Files match **gitignore-style**: `Read(./secrets/**)`, `Edit(./dist/**)`.
-## Cách mở rộng cho repo của bạn
+## Extend it for your repo
-Baseline cố ý **tối giản và universal**. Hãy thêm cái đặc thù repo:
+The baseline is deliberately **minimal and universal**. Add what's specific to your repo:
-- **Vào `allow`** — lệnh chạy hằng ngày, an toàn, để khỏi bị hỏi liên tục:
+- **Into `allow`** — safe, daily commands so you aren't asked constantly:
   `Bash(npm run test:*)`, `Bash(npm run lint:*)`, `Bash(pytest:*)`, `Bash(make:*)`.
-- **Vào `ask`** — việc đặc thù repo mà *hệ quả lớn*:
-  chạy migration (`Bash(npm run migrate:*)`), deploy, `Bash(docker compose down:*)`.
-- **Vào `deny`** — path không bao giờ được sửa/đọc:
+- **Into `ask`** — repo-specific actions with *big consequences*:
+  running a migration (`Bash(npm run migrate:*)`), deploys, `Bash(docker compose down:*)`.
+- **Into `deny`** — paths that must never be edited/read:
   `Edit(./dist/**)`, `Edit(./vendor/**)`, `Read(./**/*.key)`.
-> Mẹo: đừng nhồi `allow` quá rộng. Mỗi lần Claude hỏi là một lần bạn *review* — nhồi allow
-> nhiều quá là tự bỏ chốt review của chính mình.
+> Tip: don't over-stuff `allow`. Every time Claude asks is a chance for you to *review* — over-stuffing
+> `allow` throws away your own review checkpoint.
-## Insight: deny ≠ bảo mật kín kẽ
+## Insight: deny ≠ airtight security
-Deny-list **không** chống được kẻ địch (agent có thể lách: viết `rm` qua script, base64…).
-Nó là **lưới an toàn + giảm ma sát**:
-- `deny`/`ask` chặn **tai nạn** (xoá nhầm, push nhầm) — phòng *lỗi*, không phòng *tấn công*.
-- `allow` cho lệnh an toàn chạy thẳng → bạn đỡ thói quen "bấm yes cho xong" (thói quen đó mới
-  là cái nguy hiểm thật).
+A deny-list **doesn't** stop an adversary (an agent can route around it: write `rm` via a script,
+base64…). It's a **safety net + friction reducer**:
+- `deny`/`ask` block **accidents** (deleting/pushing by mistake) — they guard against *errors*, not *attacks*.
+- `allow` lets safe commands run straight through → you avoid the "click yes to get it over with"
+  habit (that habit is the real danger).
-**An toàn thật sự** vẫn là **review diff + plan mode** trước khi cho agent hành động — đó là
-kỷ luật runtime, không gói thành file được.
+**Real safety** is still **reviewing diffs + plan mode** before letting the agent act — that's runtime
+discipline, not something you can package into a file.
-## Nội dung ngoài & prompt injection
+## External content & prompt injection
-`deny`/`ask` ở trên chặn *tai nạn*, **không** chặn được prompt injection. Nội dung agent đọc từ
-web, issue, PR, log… là **dữ liệu — không phải lệnh**, nhưng kẻ xấu có thể giấu chỉ thị trong đó
-để lái agent. Đây là **kỷ luật runtime** (nên kit không nhồi hook/CI-scan sẵn), vài chốt thực dụng:
+The `deny`/`ask` rules above block *accidents*; they **don't** stop prompt injection. Content the agent
+reads from the web, issues, PRs, logs… is **data — not commands**, but an attacker can hide instructions
+in it to steer the agent. This is **runtime discipline** (so the kit ships no pre-baked hook/CI-scan), a
+few practical guards:
-- Đọc nội dung ngoài (web/issue/PR) trong **plan mode** — agent *đề xuất* trước khi *hành động*.
-- KHÔNG cho agent tự chạy lệnh / `curl` lấy ra từ nội dung nó vừa fetch về.
-- Input không tin cậy → tách **session riêng**, đừng trộn vào session đang có quyền cao.
+- Read external content (web/issue/PR) in **plan mode** — the agent *proposes* before it *acts*.
+- DON'T let the agent auto-run a command / `curl` pulled out of content it just fetched.
+- Untrusted input → split into a **separate session**; don't mix it into a high-privilege session.
-Quét tự động ở CI và phần nền sâu hơn: xem `docs/harness-engineering-tutorial.md` (link **Lurkr**
-cho CI-scan, **OpenHands — mitigating prompt injection** cho nền tảng).
+Automated CI scanning and deeper background: see `docs/harness-engineering-tutorial.md` (the **Lurkr**
+link for CI-scan, **OpenHands — mitigating prompt injection** for the background).
-## Nâng cao (tùy chọn): Hooks
+## Advanced (optional): Hooks
-Khi cần *logic* phức tạp hơn allow/deny tĩnh — vd "chặn mọi edit vào path bảo vệ", "tự chạy
-lint sau mỗi lần sửa" — dùng **hook**: một script chạy *trước/sau* mỗi tool call. Khai báo trong
+When you need *logic* beyond static allow/deny — e.g. "block every edit to a protected path", "auto-run
+lint after each edit" — use a **hook**: a script that runs *before/after* every tool call. Declare it in
 `settings.json`:
 ```json
@@ -95,12 +97,12 @@ lint sau mỗi lần sửa" — dùng **hook**: một script chạy *trước/sa
 }
 ```
-Script đọc JSON tool-call từ stdin; **exit code 2 = chặn**, kèm message ra stderr. Vì hook chặn
-là script đặc thù từng repo, kit này **không nhồi sẵn** — chỉ chỉ đường. Viết khi bạn thật sự có
-một quy tắc lặp lại mà allow/deny tĩnh không diễn đạt nổi.
+The script reads the tool-call JSON from stdin; **exit code 2 = block**, with a message on stderr.
+Because a blocking hook is repo-specific, the kit **ships none** — it just points the way. Write one
+when you genuinely have a recurring rule that static allow/deny can't express.
-**Audit-log (`PostToolUse`)** — ghi lại *mọi* tool call để xem lại sau. Đây là thứ
-`evals/observability.md` (Mức 5) trỏ tới; hook này **generic, không đặc thù repo**:
+**Audit-log (`PostToolUse`)** — record *every* tool call to review later. This is what
+`evals/observability.md` (Level 5) points to; this hook is **generic, not repo-specific**:
 ```json
 {
@@ -112,7 +114,7 @@ một quy tắc lặp lại mà allow/deny tĩnh không diễn đạt nổi.
 }
 ```
-`audit-log.sh` chỉ cần nối payload stdin vào một file — mỗi dòng là JSON một tool call:
+`audit-log.sh` just appends the stdin payload to a file — one JSON tool-call per line:
 ```bash
 #!/usr/bin/env bash