npm - @nanhara/hara - Versions diffs - 0.33.0 → 0.48.0 - Mend

@nanhara/hara 0.33.0 → 0.48.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15) hide show

package/CHANGELOG.md +152 -1
package/README.md +12 -4
package/dist/index.js +303 -76
package/dist/org/planner.js +19 -0
package/dist/search/semindex.js +62 -11
package/dist/session/store.js +14 -0
package/dist/tools/computer.js +156 -16
package/dist/tui/App.js +40 -5
package/dist/tui/InputBox.js +2 -2
package/dist/vision.js +52 -3
package/package.json +3 -2
package/plugins/browser/.hara-plugin/plugin.json +9 -0
package/plugins/browser/skills/web/SKILL.md +27 -0
package/plugins/chrome/.hara-plugin/plugin.json +9 -0
package/plugins/chrome/skills/chrome/SKILL.md +26 -0

package/CHANGELOG.md CHANGED Viewed

@@ -5,7 +5,158 @@ All notable changes to `@nanhara/hara`.
 > Versioning (pre-1.0, SemVer-style): the **minor** (middle) number bumps for a **new feature**; the
 > **patch** (last) number bumps for **optimizations/fixes of existing features**.
-## 0.33.0 — unreleased (semantic recall + memory)
+## 0.48.0 — unreleased (chrome plugin: drive your real logged-in Chrome)
+- New first-party **`chrome` plugin** — web automation via **`chrome-devtools-mcp`** against a **real Chrome with
+  a persistent-login profile** (sign into a site once, reused across runs), or attach to your running Chrome via
+  `--browserUrl http://127.0.0.1:9222`. The "drive my actual sessions" complement to the isolated-Playwright
+  `browser` plugin (enable one, not both) — this is the openclaw/cc-haha route.
+- Shipped as an option (not auto-installed — `browser` stays the default). `chrome-devtools-mcp` verified
+  resolvable; both plugin manifests validated.
+## 0.47.0 — unreleased (browser plugin: reliable web automation via Playwright MCP)
+- New first-party **`browser` plugin** wires the **Playwright MCP** (`@playwright/mcp`) into hara → the agent gets
+  reliable web automation: `mcp__browser__navigate / snapshot / click / type / fill_form …` acting on the page's
+  **DOM/accessibility tree** (selectors, auto-waiting), NOT screenshots or pixel coordinates. This is the
+  reliable counterpart to the fragile desktop `computer` tool — no permission walls, no coordinate-guessing.
+- Ships a `web-automation` skill (snapshot-driven workflow; notes the `chrome-devtools-mcp` alternative for
+  driving your real logged-in Chrome, à la openclaw/cc-haha).
+- Install: `hara plugin add file:<repo>/plugins/browser`; `npx playwright install chromium` once. Verified
+  `@playwright/mcp@0.0.76` resolves + the plugin loads (`hara doctor` → plugins: browser).
+## 0.46.0 — unreleased (screen control: bounded-failure circuit breaker)
+- The `computer` tool now **stops after 3 consecutive failures** instead of letting the agent loop forever on a
+  broken setup (learned from codex, which bounds Computer Use attempts then gives up). After 3 in a row it
+  returns a clear stop + the likely cause (missing Accessibility/Screen Recording permission, or the app isn't
+  reachable) + how to fix; resets on any success. Each failure shows the running `[n/3]` count.
+## 0.45.1 — unreleased (activate via `open -a`; Accessibility gotcha)
+- `activateApp` uses `open -a <app>` on macOS — `osascript … to activate` often left another window on top.
+- Documented (gotcha #0 in `computer.ts`) that **cliclick needs the Accessibility permission, separate from
+  Screen Recording** — without it, clicks/keys silently no-op (the #1 cause of "it does nothing").
+## 0.45.0 — unreleased (screen control: activate, IME-safe typing)
+- **`activate` action** — bring the target app to the foreground before screenshot/click. Fixes clicks landing
+  on the terminal hara runs in (the "Ghostty" problem): the agent must `activate WeChat` *first*.
+- **IME-safe typing** — `type` now sets the clipboard and pastes (Cmd/Ctrl+V) instead of injecting keystrokes,
+  which a Chinese input method garbles. Reliable for **CJK + emoji** (verified pbcopy round-trip: `你好 hello 😀`);
+  falls back to keystrokes for ASCII if the clipboard set fails.
+- The hard-won **RPA gotchas** (foreground trap, IME, Retina coords, grounding fragility, placeholder text like
+  "AAAA") are documented at the top of `computer.ts`.
+- TUI: the type-ahead pool shows each queued line **highlighted** (accent color) above the input — no verbose
+  header (per feedback).
+## 0.44.0 — unreleased (type-ahead pool: visible + coalesced)
+- The type-ahead queue is now a **visible pool**: messages typed while the agent works are listed above the
+  input (`📥 pool (N) — sent together when this turn finishes`), so Enter visibly *enters the pool* instead of
+  appearing to vanish (the reported "回车消失了/没显示在对话池").
+- On turn-end the pool is **coalesced into one turn** — your "also do X" / "and Y" additions reach the agent
+  together, in order, rather than as separate sequential turns.
+- Esc still clears the pool (stop means stop). 130 tests (+1 coalesce; existing type-ahead tests updated).
+## 0.43.0 — unreleased (grounding for screen control — accurate clicks)
+- The `computer` tool now **locates UI elements by description** instead of guessing pixels from a text read.
+  Pass `target` to `click`/`move` (e.g. "the Send button") — hara screenshots, asks a vision model for the
+  element's position (resolution-independent fractions, Retina-safe), and clicks there. New **`find`** action
+  returns coordinates without clicking.
+- This is codex's "native computer-use" lesson applied **locally**: codex's `computer_use` is a remote browser
+  sandbox; hara grounds against your own screen + apps. Needs a grounding-capable vision model (e.g. a qwen-VL).
+- `screenSize()` per OS converts fractions → click coords; `parseLocate` accepts per-mille/percent/fraction
+  replies (tested). cliclick installed → `hara doctor` shows screencapture ✓ + cliclick ✓.
+- **Still requires you to grant macOS Screen Recording + Accessibility** to actually drive the screen — those
+  toggles can only be set by you in System Settings.
+## 0.42.0 — unreleased (type-ahead: keep typing while the agent works)
+- You can now **type while the agent is working** — the message enters a **FIFO queue** and is sent
+  automatically when the current turn finishes (the input box stays active mid-turn; a "⌨ working — Enter
+  queues" hint shows the depth). Fixes the "input does nothing while working" dogfooding feedback.
+- **Esc stops everything** — interrupts the turn AND clears the queue, so a stopped turn never fires queued
+  messages. The queue drain is idempotent (guarded against double-send under React StrictMode).
+- Expert-reviewed for queue correctness (FIFO, exactly-once), the Esc/abort UX, and input-handler conflicts.
+## 0.41.0 — unreleased (English session names, auto-summarized)
+- After the first turn a session gets a short **English kebab-case name** summarizing what it's about
+  (e.g. `add-semantic-search`) via one tiny model call — replacing the literal first-message title. A non-English
+  conversation is translated to an English gist (pinyin only if untranslatable). Names stay short + ASCII.
+- The stable session **id is still the UUID** (unchanged — this only improves the human-friendly name); falls
+  back to the lexical title if the naming call fails. New `slugify()` helper (tested).
+## 0.40.0 — unreleased (TUI polish: markdown rendering + numbered choices)
+- The ink TUI now **renders assistant Markdown** (headers, bold, inline code, bullets; code fences kept
+  verbatim) instead of showing raw `**`/`##`/backticks. The renderer (`md.ts`) had only been wired into the
+  classic REPL; the default TUI showed markdown literally.
+- **Selection prompts are numbered**: each choice shows `1.`, `2.`, … and you can **press the number to pick it
+  directly** (in addition to ↑↓ + Enter). The hint reads "↑↓ or 1–N to choose".
+## 0.39.0 — unreleased (hara commit — AI commit messages)
+- **`hara commit`** generates a conventional-commits message from your staged diff, shows it, and commits after
+  a `Y/n` confirm. `-a` stages tracked changes first; the global `-y` skips the confirm. Pairs with `hara
+  review` (review → commit). Verified live (glm-5): generated `feat(util): add mul function` and committed it.
+- Note: the skip-confirm reuses the global `-y/--yes` (a subcommand `-y` would collide with it — same lesson as
+  `hara plan resume`).
+## 0.38.0 — unreleased (hara review — review your changes)
+- **`hara review`** reviews your uncommitted changes (`git diff HEAD`) for correctness bugs, security issues,
+  missing error handling, naming, and missing tests — grouped by severity (**Blocker / Should-fix / Nit**) with
+  file:line and concrete fixes. **Read-only**: it can read files for context but never edits. `--staged`
+  reviews staged changes; `--base <ref>` reviews against a ref (e.g. `main`).
+- Verified live (glm-5): on a planted diff it flagged a hardcoded secret (Blocker), an unguarded divide, and
+  dead code, then gave a clear "do not merge" verdict.
+- `codebase_search` added to the read-only tool set (so reviewers / sub-agents can search the repo).
+## 0.37.0 — unreleased (task-aware screenshots for screen control)
+- Screenshots from the `computer` tool are now read with a **screenshot-tuned prompt** aimed at *acting*, not
+  transcribing: interactive elements (buttons/fields/menus) with labels and approximate positions, the active
+  element, and any errors. A text-only main model driving the desktop gets something it can actually click.
+- New optional **`focus`** on the screenshot action ("the Login button") narrows the read to the current goal.
+- Internal: `describeImages` gains `system`/`hint` options, `SCREENSHOT_SYSTEM` added, `ctx.describeImage`
+  takes a hint. (For contrast: codex's `computer_use` is a remote/hosted *browser* MCP plugin with no local
+  syscalls — hara stays **native + local** so it can operate your own desktop software.)
+## 0.36.0 — unreleased (resumable plans)
+- **`hara plan resume`** continues the saved plan (`.hara/org/plan.json`): atoms already marked done are
+  skipped, pending/failed ones run. When a verify gate stops a plan midway, fix the issue and resume instead
+  of starting from scratch. Interrupted atoms (running/failed) reset to pending; works with `--parallel` too.
+- Internal: execution extracted into a shared `executePlan` (skips completed atoms) used by both fresh runs and
+  resume; `loadPlan` wired into the CLI. Verified: a half-done plan resumed, skipped the done atom, ran only
+  the pending one.
+## 0.35.0 — unreleased (parallel plan execution — the org works in parallel)
+- **`hara plan --parallel`** runs independent atoms concurrently. The planner already builds a dependency DAG;
+  now `topoWaves` groups atoms into dependency *waves* (every atom in a wave depends only on earlier waves), and
+  each wave's atoms execute at the same time. A diamond plan `a1 → (a2,a3) → a4` runs a2 and a3 together.
+- This is the org differentiator made literal: not one agent stepping through a list, but a team working the
+  independent parts at once. Verified live (glm-5): two independent atoms ran in one wave and completed
+  out-of-order; both check-gates passed.
+- Sequential remains the default (and is what interactive approval uses, since concurrent atoms can't share a
+  prompt). `hara plan` is full-auto, so `--parallel` is safe there. A wave stops the run if any of its atoms fail.
+- Internal: `executeAtom` extracted (shared by both paths); `topoWaves(atoms)` added alongside `topoOrder`.
+## 0.34.0 — unreleased (incremental indexing)
+- **`hara index` is now incremental.** Re-running it re-embeds only the files whose mtime changed since the
+  last build; unchanged files keep their existing vectors, and deleted files drop out. A changed embedding
+  model still forces a full rebuild. Output reports `(N embedded, M reused)`.
+- Turns indexing from a run-once-and-go-stale command into something you can re-run after every edit. Measured
+  on hara's own repo with local `bge-m3`: full build **~68s** → unchanged rebuild **~0.4s** (~150×); editing one
+  file re-embeds just that file's chunks.
+- Internal: each chunk records its source file's mtime; `buildIndex` returns `{total, embedded, reused}`.
+## 0.33.0 — 2026-06-20 · first public release (semantic recall + memory)
 - **`recall` and `memory_search` go hybrid too.** The semantic layer added in 0.32 now also powers your
   code-asset library and durable memory — `hara index --assets` embeds `~/.hara/code-assets`, global skills,

package/README.md CHANGED Viewed

@@ -9,7 +9,7 @@
 🚧 **v0.33** · TypeScript · local-first · Apache-2.0
 **Highlights**
-- **An org, not just an agent** — `hara org "<task>"` routes work to the role that *owns* it; `hara plan "<task>"` decomposes a task into a verified DAG of atoms (frame → atomize → sequence → execute → **verify gate**).
+- **An org, not just an agent** — `hara org "<task>"` routes work to the role that *owns* it; `hara plan "<task>"` decomposes a task into a verified DAG of atoms (frame → atomize → sequence → execute → **verify gate**), and `hara plan --parallel` runs independent atoms concurrently.
 - **Real terminal UX** — an **ink TUI**: bottom-pinned input box, **plan mode** (read-only → propose a plan → approve → execute), selectable approvals with "don't ask again", windowed reasoning, **paste images** (Ctrl+V) for vision models, light/dark theme.
 - **Persistent memory + self-evolution** — `memory_*` tools over global/project `MEMORY.md`; the agent recalls before acting, **proactively saves** durable facts, and grows its own playbooks (a lexical guard screens what it writes).
 - **Multi-provider, all streamed** — Anthropic (Claude) or any OpenAI-compatible endpoint (Qwen/DashScope, GLM, Kimi, OpenAI) with live Markdown + visible reasoning.
@@ -112,6 +112,10 @@ hara doctor                # check your setup (auth / model / node / assets / ro
 hara roles init            # scaffold role-agents (implementer / reviewer / docs)
 hara org "review src/ for bugs"   # dispatch a task to the role that owns it (or --role <id>)
 hara plan "add a /health endpoint with a test"   # decompose → sequence (DAG) → run each step + verify
+hara plan --parallel "..."  # run independent atoms concurrently  ·  hara plan resume  # continue a stopped plan
+hara review                 # review uncommitted changes for bugs/security/missing tests (--staged · --base main)
+hara commit                 # AI commit message from staged changes, then commit (-a to stage all · -y to skip confirm)
+hara index                 # build the semantic search index (after: hara config set embedProvider ollama|qwen)
 hara -p "summarize @README.md and fix the lint errors in src/"   # one-shot; @path attaches a file
 hara --approval auto-edit  # suggest (default) | auto-edit | full-auto   (-y = full-auto)
 hara --sandbox workspace-write   # confine shell writes to the project (macOS Seatbelt)
@@ -163,14 +167,16 @@ not just keywords. By default they're lexical (zero setup). Configure an embeddi
 then `hara index` (repo, for `codebase_search`) / `hara index --assets` (code-assets, skills & memory) / `hara
 index --all`. A query like "read an image pasted from the clipboard" then surfaces `src/images.ts` even with no
 shared words. Indexes are rebuildable `.hara/index/` artifacts (self-`.gitignore`d, never committed); no native
-vector DB needed, and lexical still works when there's no index.
+vector DB needed, and lexical still works when there's no index. Re-running `hara index` is **incremental** —
+only changed files re-embed (a full repo rebuild that takes ~a minute re-runs in well under a second).
 **Approval modes**: `suggest` confirms edits & shell · `auto-edit` auto-applies file edits but confirms shell · `full-auto` runs everything.
 **Sandbox** (macOS): `--sandbox workspace-write|read-only` runs the `bash` tool under Seatbelt (writes confined to the project / blocked).
 **Screen control** (opt-in): the `computer` tool drives desktop software (screenshot → click/type), native per OS
 (mac `screencapture`+`cliclick` · Windows PowerShell · Linux `scrot`+`xdotool`). Off by default — enable a tier with
 `hara config set computerUse read|click|full` and allowlist apps with `hara config set computerApps "App, …"`. Guarded
-by the tier, the frontmost-app allowlist, a dangerous-key blocklist, and a once-per-session grant; screenshots are read via your vision model.
+by the tier, the frontmost-app allowlist, a dangerous-key blocklist, and a once-per-session grant. Screenshots are read via your
+vision model into **actionable** output — interactive elements + positions (pass `focus` to target what you're after) — so even a text-only main model can click.
 **Sessions**: conversations are saved automatically — `-c` / `--resume <id>` to continue, `hara sessions` to list.
 **MCP**: add an `mcpServers` map to config (global or project `.hara/config.json`); their tools appear to the agent as `mcp__<server>__<tool>`.
 **Profiles**: add a `profiles` map to `~/.hara/config.json` (`--profile <name>`), or drop a project-level `.hara/config.json` that overrides the global config.
@@ -190,7 +196,9 @@ sequences them as a DAG, and executes each step (optionally routed to a role) be
 **verify gate** — frame → atomize → sequence → execute → verify. Each atom may carry a `check` shell
 command, so verification is **objective** (e.g. `npm test`, `tsc --noEmit`) rather than a
 self-assessment. Plan state is the SSOT at `.hara/org/plan.json` (inspectable; execution stops on the
-first failed verification).
+first failed verification — fix it and **`hara plan resume`** continues, skipping the atoms already done).
+With **`hara plan --parallel`**, independent atoms (the same dependency wave) run **concurrently** — the org
+works the independent parts at once, not one step at a time.
 ### What it can do