PyPI - pythinker-code - Versions diffs - 2.3.0__py3-none-any.whl → 2.5.0__py3-none-any.whl - Mend

pythinker-code 2.3.0 (py3-none-any.whl) → 2.5.0 (py3-none-any.whl)


Files changed (115)

pythinker_code/CHANGELOG.md CHANGED

@@ -2,6 +2,88 @@
 ## Unreleased
+## 2.5.0 (2026-05-13)
+bk_box_main coding-agent runtime port, Windows self-upgrade fix, FetchURL SSRF hardening, and a broad reliability/security pass.
+### Subagent runtime & permissions
+- Runtime-enforced permission profiles for every built-in role: **read-only**, **plan**, **ask**, **implement**, **review**, **verify**. Profiles are snapshot per LLM step in the new `src/pythinker_code/soul/permission.py` so a mid-step model switch can't escalate. Plan mode now **hard-denies** non-plan writes and dangerous shell mutations instead of relying on prompt-deny.
+- New plan-handoff workflow in `src/pythinker_code/tools/plan/handoff.py` with dynamic injection through `soul/dynamic_injections/plan_mode.py`. Smooth handoff from `plan` → `implement` without re-priming the context.
+- New smart-search grep variant; new subagent metadata plumbing (`subagents/models.py`, `subagents/store.py`, `subagents/builder.py`, `subagents/runner.py`).
+### Background tasks
+- Recovery distinguishes **`recoverable`** (resumable via a stored `agent_id`) from **`lost`** (worker is gone with no resume target). Agent instances are parked as `idle` rather than failed when the underlying task is recoverable.
+- Guards against overwriting terminal task states; subagent races on instance transitions closed.
+- `pythinker-host`: subprocess teardown now kills the **entire child process tree** and creates a new session group, so background workers can no longer survive their parent on Linux/macOS.
+### FetchURL — SSRF + resource-exhaustion hardening
+- `pythinker_code.tools.web.fetch._validate_fetch_url` blocks **private, loopback, link-local, multicast, and reserved** IPv4/IPv6 ranges; rejects non-`http`/`https` schemes and host-less URLs up front.
+- Responses are streamed with a hard **5 MB** ceiling (`_read_limited`) honoring `Content-Length`. Both the direct path and the configured fetch-service path enforce the same caps.
+### Web / vis surface
+- Upload limits, open-in path escaping, and vis auth all hardened (`src/pythinker_code/web/`, `src/pythinker_code/vis/`, `vis/src/lib/api.ts`).
+### Plugin
+- Plugin definitions no longer persist host credentials. Plugin **name validation** tightened to reject path-traversal and shell-meta characters.
+### Telemetry & observability
+- OTel `service.name` normalized to a stable value, decoupled from the configured display name, so SigNoz dashboards keep working across rebrands.
+- Sentry filters drop test-process noise and benign shutdown errors; `pythinker_code/telemetry/config.py` and `pythinker_code/telemetry/crash.py` updated accordingly.
+- New `tests/telemetry/test_otel_resource.py` asserts the resource identity used by the dashboards.
+### Windows
+- `pythinker update` on Windows now spawns the upgrade in a **detached console** and exits the parent process before `uv tool upgrade` runs, releasing the lock on the running `pythinker.exe`. Fixes the `os error 32: The process cannot access the file because it is being used by another process` error that blocked self-upgrade.
+- New CI matrix entry on **`windows-2025-vs2026`** (experimental, non-blocking) for the pythinker-host and pythinker-cli build, validating Visual Studio 2026 / MSVC v144 forward-compat before GitHub eventually deprecates `windows-2022`.
+### Feedback
+- New `feedback` config block: `endpoint_url`, `api_key`, `custom_headers`. The `/feedback` slash command now routes user submissions to a user-configured HTTP endpoint instead of being a no-op.
+### UI
+- Pythinker version is shown on the welcome screen.
+### CI
+- Pre-push hooks mirror CI's `check` target (`ruff format --check`, `ruff check`, `pyright`) so local pushes catch the same regressions CI does.
+- README + CHANGELOG release-validate gate hardened; the GitHub Release publish step is now resilient to transient upstream failures.
+- Spell-check vocabulary fix in `soul/permission.py` for an internal error string the typos crate flagged; experimental `windows-2025-vs2026` build no longer collides with `windows-2022` on the shared `pythinker-x86_64-pc-windows-msvc` artifact name.
+### Compatibility
+- `pythinker_core.contrib.chat_provider.anthropic`: handle the six new tool-result block types added by anthropic SDK 0.101 (`web_fetch_tool_result`, `code_execution_tool_result`, `bash_code_execution_tool_result`, `text_editor_code_execution_tool_result`, `tool_search_tool_result`, `container_upload`). pyright is exhaustive again.
+Upgrade with `pythinker update` or `pip install --upgrade pythinker-code==2.5.0`.
+## 2.4.0 (2026-05-11)
+Subagent roles overhaul, Moonshot/Kimi K2 provider support, and a ripgrep-free Grep fallback.
+- New built-in subagents under `src/pythinker_code/agents/default/`:
+  - `implementer.yaml` — scoped code changes with minimum surrounding edits and a quick verification pass.
+  - `review.yaml` — read-only code review with severity-scored findings (BLOCKER / MAJOR / MINOR / NIT).
+  - `verifier.yaml` — read-only validation runner that reports `PASS` / `FAIL` / `FLAKY` without applying fixes.
+- `coder.yaml`, `explore.yaml`, and `plan.yaml` now emit a standard `### SUMMARY / EVIDENCE / CHANGES / RISKS / BLOCKERS` response contract so the parent agent can consume subagent output without re-parsing prose.
+- `agent.yaml` registers the three new roles; `tools/agent/description.md` documents the Scout → Plan → Implement → Review → Verify workflow and the parallel review/verification pattern.
+- `agents/default/system.md`: adds decomposition guidance (preview → todo list → parallel chunks), enforces post-tool-call verification before acting on results, and tells the agent to cross-check at least one load-bearing subagent finding before editing from it.
+- Kimi K2.5 / K2.6 (Moonshot) and other strict interleaved-thinking providers:
+  - `packages/pythinker-core/.../chat_provider/pythinker.py`: always emit `reasoning_content` on assistant tool-call replays so Moonshot's "thinking is enabled but reasoning_content is missing in assistant tool call message at index N" error no longer trips multi-step tool flows.
+  - `packages/pythinker-core/.../contrib/chat_provider/openai_legacy.py`: replay reasoning metadata on every assistant turn for `kimi-k2*` / `deepseek*` models (falls back to the assistant text or `"[reasoning unavailable]"` when reasoning content was not retained).
+  - `src/pythinker_code/llm.py`: route Kimi K2 thinking through the provider-specific `extra_body={"thinking": {"type": "enabled"|"disabled"}}` body field instead of OpenAI's `reasoning_effort` (which Kimi ignores), and persist `LLM.thinking` across `clone_llm_with_model_alias` so model switches preserve the user's thinking choice.
+- `tools/file/grep_local.py`:
+  - Pure-Python `rg`-free fallback (`_python_grep`) honoring `pattern`, `path`, `glob`, `type` (bash / c / cpp / go / java / js / json / md / py / rust / sh / toml / ts / txt / yaml / zsh), `ignore_case`, `multiline`, `context` / `before_context` / `after_context`, `line_number`, `output_mode` (`content` / `files_with_matches` / `count_matches`), `offset`, `head_limit`, and the standard sensitive-file redaction. `.gitignore` / `.ignore` and the VCS metadata directories (`.git`, `.svn`, `.hg`, `.bzr`, `.jj`, `.sl`) are respected unless `include_ignored=true`.
+  - `_find_existing_rg` now honors `PYTHINKER_RG_PATH` and additionally probes `/usr/bin`, `/usr/local/bin`, `~/.cargo/bin`, `~/.local/bin`, and `~/.pi/agent/bin` before falling through to download.
+  - Downloader retries against the upstream GitHub releases mirror (`https://github.com/BurntSushi/ripgrep/releases/download/<version>/...`) when the CDN mirror is unreachable, and the failure path now degrades into the Python fallback instead of raising.
+- `.gitignore`: ignore `graphify-out*/`, `.graphify_*.json`, `.graphify_*.txt`, and the local `blackbox/` scratch area.
+- `AGENTS.md` rewritten to reflect the new subagent roster and workflow.
 ## 2.3.0 (2026-05-09)
 Telemetry & observability audit.
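The FetchURL entry in the 2.5.0 notes names the checks performed by `_validate_fetch_url`, but the function body is not part of this excerpt. As a rough sketch only, assuming the validation is built on the standard `ipaddress` module, the rule set could look like this. The names `validate_fetch_url`, `is_fetch_allowed`, `FetchURLError`, and the injectable `resolve` parameter are hypothetical and not taken from the package.

```python
from __future__ import annotations

import ipaddress
import socket
from urllib.parse import urlsplit


class FetchURLError(ValueError):
    """Hypothetical error type: the URL was rejected before any request."""


def is_blocked_address(ip: ipaddress.IPv4Address | ipaddress.IPv6Address) -> bool:
    """True for the address classes the changelog lists as blocked."""
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) so the IPv4 rules apply.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
    )


def validate_fetch_url(url: str, resolve=socket.getaddrinfo) -> str:
    """Return the URL unchanged if it passes, else raise FetchURLError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise FetchURLError(f"unsupported scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise FetchURLError("URL has no host")
    try:
        # A literal IP address needs no DNS lookup.
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        try:
            infos = resolve(host, parts.port or 443, type=socket.SOCK_STREAM)
        except OSError as exc:
            raise FetchURLError(f"cannot resolve {host!r}") from exc
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    if not addresses:
        raise FetchURLError(f"{host!r} resolved to no addresses")
    for ip in addresses:
        if is_blocked_address(ip):
            raise FetchURLError(f"{host!r} maps to blocked address {ip}")
    return url


def is_fetch_allowed(url: str, resolve=socket.getaddrinfo) -> bool:
    """Boolean convenience wrapper around validate_fetch_url."""
    try:
        validate_fetch_url(url, resolve=resolve)
    except ValueError:  # FetchURLError and malformed-URL errors
        return False
    return True
```

A check like this only covers the address seen at validation time. A real client would also need to connect to the address it validated, otherwise a second DNS lookup could return a different answer.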

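The same changelog entry describes a streamed read with a hard 5 MB ceiling (`_read_limited`) that honors `Content-Length`. The helper itself is not in this excerpt; the sketch below assumes "honoring" means rejecting early when the declared length already exceeds the cap, then counting bytes while streaming. `read_limited`, `ResponseTooLarge`, and `MAX_BYTES` are hypothetical names.

```python
from __future__ import annotations

from collections.abc import Iterable

MAX_BYTES = 5 * 1024 * 1024  # the 5 MB ceiling named in the changelog


class ResponseTooLarge(Exception):
    """Hypothetical error: the body exceeded the configured ceiling."""


def read_limited(
    chunks: Iterable[bytes],
    content_length: int | None = None,
    limit: int = MAX_BYTES,
) -> bytes:
    """Collect streamed chunks, refusing to buffer more than `limit` bytes."""
    # Fail fast when the server declares an oversized body.
    if content_length is not None and content_length > limit:
        raise ResponseTooLarge(f"declared {content_length} bytes, limit {limit}")
    buffer = bytearray()
    for chunk in chunks:
        # Check before extending so the buffer never grows past the limit,
        # even if Content-Length was missing or understated.
        if len(buffer) + len(chunk) > limit:
            raise ResponseTooLarge(f"body exceeded {limit} bytes")
        buffer.extend(chunk)
    return bytes(buffer)
```

Taking an iterable of chunks keeps the sketch independent of any particular HTTP client; whatever streaming iterator the client exposes can be passed in.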
pythinker_code/acp/tools.py CHANGED

@@ -8,6 +8,7 @@ from pythinker_host.local import local_host
 from pythinker_code.soul.agent import Runtime
 from pythinker_code.soul.approval import Approval
+from pythinker_code.soul.permission import check_shell_command_allowed
 from pythinker_code.soul.toolset import PythinkerToolset
 from pythinker_code.tools.shell import Params as ShellParams
 from pythinker_code.tools.shell import Shell
@@ -35,6 +36,7 @@ def replace_tools(
                 acp_conn,
                 acp_session_id,
                 runtime.approval,
+                runtime,
             )
         )
@@ -52,6 +54,7 @@ class Terminal(CallableTool2[ShellParams]):
         acp_conn: acp.Client,
         acp_session_id: str,
         approval: Approval,
+        runtime: Runtime,
     ) -> None:
         # Use the `name`, `description`, and `params` from the existing Shell tool,
         # so that when this is added to the toolset, it replaces the original Shell tool.
@@ -59,6 +62,7 @@ class Terminal(CallableTool2[ShellParams]):
         self._acp_conn = acp_conn
         self._acp_session_id = acp_session_id
         self._approval = approval
+        self._runtime = runtime
     async def __call__(self, params: ShellParams) -> ToolReturnValue:
         from pythinker_code.acp.session import get_current_acp_tool_call_id_or_none
@@ -71,6 +75,9 @@ class Terminal(CallableTool2[ShellParams]):
         if not params.command:
             return builder.error("Command cannot be empty.", brief="Empty command")
+        if err := check_shell_command_allowed(self._runtime, params.command):
+            return err
         approval_result = await self._approval.request(
             self.name,
             "run shell command",

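The `acp/tools.py` hunks above run `check_shell_command_allowed` before the approval request, so a denied command never reaches the user prompt. The checker lives in `soul/permission.py`, which is not shown here. Purely as an illustration of the guard pattern, and assuming a plan profile that refuses mutating commands and output redirection, a conservative version might look like this. `check_shell_command`, `Denied`, and the deny-list are hypothetical.

```python
from __future__ import annotations

import posixpath
import shlex
from dataclasses import dataclass

# Hypothetical deny-list; the real rule set is not shown in the diff.
MUTATING_COMMANDS = frozenset(
    {"rm", "mv", "cp", "chmod", "chown", "dd", "truncate", "tee"}
)
# After one of these tokens the next word is a command name again.
COMMAND_SEPARATORS = frozenset({"|", "||", "&", "&&", ";", "("})


@dataclass(frozen=True)
class Denied:
    message: str
    brief: str


def check_shell_command(profile: str, command: str) -> Denied | None:
    """Return a Denied value to short-circuit the tool call, or None to proceed."""
    if profile != "plan":
        return None
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:
        # Unbalanced quotes and similar: refuse rather than guess.
        return Denied("Command could not be parsed in plan mode.", "Unparseable")
    expect_command = True
    for token in tokens:
        if token in COMMAND_SEPARATORS:
            expect_command = True
        elif token.startswith(">"):
            return Denied("Output redirection is denied in plan mode.", "Redirect")
        elif expect_command:
            name = posixpath.basename(token)
            if name in MUTATING_COMMANDS:
                return Denied(f"`{name}` is denied in plan mode.", "Mutating command")
            expect_command = False
    return None
```

The caller can then use the same shape as the diff: `if err := check_shell_command(profile, command): return err`. The sketch errs on the side of denying (for example, any redirect is refused), which suits a hard-deny gate better than a permissive parse would.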
pythinker_code/agents/default/agent.yaml CHANGED

@@ -18,6 +18,7 @@ agent:
     - "pythinker_code.tools.file:ReadMediaFile"
     - "pythinker_code.tools.file:Glob"
     - "pythinker_code.tools.file:Grep"
+    - "pythinker_code.tools.file:SmartSearch"
     - "pythinker_code.tools.file:WriteFile"
     - "pythinker_code.tools.file:StrReplaceFile"
     - "pythinker_code.tools.web:SearchWeb"
@@ -34,3 +35,12 @@ agent:
     plan:
       path: ./plan.yaml
       description: "Read-only implementation planning and architecture design."
+    review:
+      path: ./review.yaml
+      description: "Read-only code review with severity-scored findings."
+    implementer:
+      path: ./implementer.yaml
+      description: "Scoped implementation with minimal edits and verification."
+    verifier:
+      path: ./verifier.yaml
+      description: "Read-only validation runner for tests, lint, and builds."

pythinker_code/agents/default/coder.yaml CHANGED

@@ -4,6 +4,22 @@ agent:
   system_prompt_args:
     ROLE_ADDITIONAL: |
       You are now running as a subagent. All the `user` messages are sent by the main agent. The main agent cannot see your context, it can only see your last message when you finish the task. You must treat the parent agent as your caller. Do not directly ask the end user questions. If something is unclear, explain the ambiguity in your final summary to the parent agent.
+      Stay tightly scoped to exactly what the parent assigned. Do not expand into adjacent cleanup or refactors. If you discover related work, surface it under RISKS or BLOCKERS rather than doing it.
+      Before editing, read the target files and confirm the line ranges/patterns you will change. Prefer the minimum edit that satisfies the brief. After edits, run the smallest relevant verification command available and report the result.
+      Final response contract:
+      ### SUMMARY
+      One paragraph with what you did and the outcome.
+      ### EVIDENCE
+      Bullet list of concrete file paths, command results, or observed errors that support the outcome.
+      ### CHANGES
+      Bullet list of every file you modified, or `None.` if read-only.
+      ### RISKS
+      Bullet list of remaining risks or `None observed.`.
+      ### BLOCKERS
+      Bullet list of anything that stopped completion, or `None.`.
   when_to_use: |
     Use this agent for non-trivial software engineering work that may require reading files, editing code, running commands, and returning a compact but technically complete summary to the parent agent.
   allowed_tools:
@@ -12,6 +28,7 @@ agent:
     - "pythinker_code.tools.file:ReadMediaFile"
     - "pythinker_code.tools.file:Glob"
     - "pythinker_code.tools.file:Grep"
+    - "pythinker_code.tools.file:SmartSearch"
     - "pythinker_code.tools.file:WriteFile"
     - "pythinker_code.tools.file:StrReplaceFile"
     - "pythinker_code.tools.web:SearchWeb"
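The response contract added to `coder.yaml` fixes five `###` headings so the parent agent can consume subagent output without re-parsing prose. How the parent actually reads it is not shown in this diff; one plausible consumer, sketched here with the hypothetical names `parse_contract` and `SECTIONS`, splits the final message on those headings and rejects replies that omit one.

```python
from __future__ import annotations

import re

SECTIONS = ("SUMMARY", "EVIDENCE", "CHANGES", "RISKS", "BLOCKERS")

# A contract heading is a line consisting of "### " plus a known section name.
_HEADING = re.compile(
    r"^###[ \t]+(SUMMARY|EVIDENCE|CHANGES|RISKS|BLOCKERS)[ \t]*$",
    re.MULTILINE,
)


def parse_contract(text: str) -> dict[str, str]:
    """Split a subagent reply into its contract sections, keyed by heading."""
    matches = list(_HEADING.finditer(text))
    sections: dict[str, str] = {}
    for index, match in enumerate(matches):
        # A section body runs until the next heading or the end of the text.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    missing = [name for name in SECTIONS if name not in sections]
    if missing:
        raise ValueError(f"missing contract sections: {', '.join(missing)}")
    return sections
```

Raising on a missing section makes a malformed reply visible to the caller instead of silently yielding an empty field.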

pythinker_code/agents/default/explore.yaml CHANGED

@@ -5,7 +5,7 @@ agent:
     ROLE_ADDITIONAL: |
       You are now running as a subagent. All the `user` messages are sent by the main agent. The main agent cannot see your context, it can only see your last message when you finish the task. You must treat the parent agent as your caller. Do not directly ask the end user questions. If something is unclear, explain the ambiguity in your final summary to the parent agent.
-      You are a codebase exploration specialist. Your role is EXCLUSIVELY to search, read, and analyze existing code and resources. You do NOT have access to file editing tools.
+      You are a codebase exploration specialist. Your role is EXCLUSIVELY to search, read, and analyze existing code and resources. You do NOT have access to file editing tools. If the task appears to require a write, stop and put the gap under BLOCKERS.
       Your strengths:
       - Rapidly finding files using glob patterns
@@ -24,7 +24,19 @@ agent:
       If the prompt includes a <git-context> block, use it to orient yourself about the repository state before starting your investigation.
-      You are meant to be a fast agent. Complete the search request efficiently and report your findings clearly in a structured format.
+      You are meant to be a fast agent. Complete the search request efficiently and report your findings clearly in a structured format. EVIDENCE is the load-bearing section: cite each important finding as `path:line-range` when possible, and stop once you have enough evidence rather than exhaustively reading the whole repository.
+      Final response contract:
+      ### SUMMARY
+      One paragraph with the headline answer.
+      ### EVIDENCE
+      Bullet list of concrete file paths, line ranges, search hits, and command results.
+      ### CHANGES
+      Always write `None.`.
+      ### RISKS
+      Bullet list of uncertainties or `None observed.`.
+      ### BLOCKERS
+      Bullet list of missing context/capabilities or `None.`.
   when_to_use: |
     Fast agent specialized for exploring codebases. Use this when you need to quickly find files by patterns (e.g. "src/**/*.yaml"), search code for keywords (e.g. "database connection"), or answer questions about the codebase (e.g. "how does the auth module work?"). When calling this agent, specify the desired thoroughness level: "quick" for basic searches, "medium" for moderate exploration, or "thorough" for comprehensive analysis across multiple locations and naming conventions. Use this agent for any read-only exploration that will clearly require more than 3 tool calls. Prefer launching multiple explore agents concurrently when investigating independent questions.
   allowed_tools:
@@ -33,6 +45,7 @@ agent:
     - "pythinker_code.tools.file:ReadMediaFile"
     - "pythinker_code.tools.file:Glob"
     - "pythinker_code.tools.file:Grep"
+    - "pythinker_code.tools.file:SmartSearch"
     - "pythinker_code.tools.web:SearchWeb"
     - "pythinker_code.tools.web:FetchURL"
   exclude_tools:

pythinker_code/agents/default/implementer.yaml ADDED

@@ -0,0 +1,46 @@
+version: 1
+agent:
+  extend: ./agent.yaml
+  system_prompt_args:
+    ROLE_ADDITIONAL: |
+      You are now running as a subagent. All the `user` messages are sent by the main agent. The main agent cannot see your context, it can only see your last message when you finish the task. Treat the parent agent as your caller. Do not directly ask the end user questions.
+      You are an implementation specialist. Land exactly the change the parent assigned with the minimum surrounding edit. Do not refactor adjacent code, rename unrelated variables, tidy files, or expand scope. Put related follow-up work under RISKS or BLOCKERS instead.
+      Method:
+      - Read target files before editing.
+      - Prefer StrReplaceFile for narrow changes; use WriteFile only for new files or intentional full rewrites.
+      - Add or update tests when the brief requires behavior changes and the project has relevant tests.
+      - After edits, run the smallest relevant verification command and report pass/fail evidence.
+      Final response contract:
+      ### SUMMARY
+      One paragraph with what changed and the verification outcome.
+      ### EVIDENCE
+      Bullet list of file reads, command results, and test/lint evidence.
+      ### CHANGES
+      Bullet list of every modified path with a one-line reason.
+      ### RISKS
+      Bullet list of remaining risks or `None observed.`.
+      ### BLOCKERS
+      Bullet list of anything that stopped completion, or `None.`.
+  when_to_use: |
+    Use this agent when the required code change is already specified and should be implemented with minimal edits and a quick verification pass.
+  allowed_tools:
+    - "pythinker_code.tools.shell:Shell"
+    - "pythinker_code.tools.file:ReadFile"
+    - "pythinker_code.tools.file:ReadMediaFile"
+    - "pythinker_code.tools.file:Glob"
+    - "pythinker_code.tools.file:Grep"
+    - "pythinker_code.tools.file:SmartSearch"
+    - "pythinker_code.tools.file:WriteFile"
+    - "pythinker_code.tools.file:StrReplaceFile"
+    - "pythinker_code.tools.web:SearchWeb"
+    - "pythinker_code.tools.web:FetchURL"
+  exclude_tools:
+    - "pythinker_code.tools.agent:Agent"
+    - "pythinker_code.tools.ask_user:AskUserQuestion"
+    - "pythinker_code.tools.todo:SetTodoList"
+    - "pythinker_code.tools.plan:ExitPlanMode"
+    - "pythinker_code.tools.plan.enter:EnterPlanMode"
+  subagents:

pythinker_code/agents/default/plan.yaml CHANGED

@@ -5,10 +5,21 @@ agent:
     ROLE_ADDITIONAL: |
       You are now running as a subagent. All the `user` messages are sent by the main agent. The main agent cannot see your context, it can only see your last message when you finish the task. You must treat the parent agent as your caller. Do not directly ask the end user questions. If something is unclear, explain the ambiguity in your final summary to the parent agent.
-      Before designing your implementation plan, consider whether you fully understand the codebase areas relevant to the task. If not, recommend the parent agent to use the explore agent (subagent_type="explore") to investigate key questions first. In your response, clearly state:
-      1. What you already know from the information provided
-      2. What questions remain unanswered that would benefit from explore agent investigation
-      3. Your implementation plan (either preliminary if questions remain, or final if sufficient context exists)
+      Before designing your implementation plan, consider whether you fully understand the codebase areas relevant to the task. If not, recommend the parent agent to use the explore agent (subagent_type="explore") to investigate key questions first.
+      Ground the plan in evidence. Read enough files to avoid guessing, name the trade-offs, and choose one path with a reason. Each step should name the artifact it changes and the verification that proves it worked. Order steps by dependency first, then by risk reduced per effort.
+      Final response contract:
+      ### SUMMARY
+      One paragraph with the recommended plan and why.
+      ### EVIDENCE
+      Bullet list of concrete file paths, line ranges, docs, or search hits that shaped the plan.
+      ### CHANGES
+      Always write `None.` unless you wrote a plan artifact.
+      ### RISKS
+      Bullet list of trade-offs, unknowns, or rollout risks.
+      ### BLOCKERS
+      Bullet list of questions that must be answered before execution, or `None.`.
   when_to_use: |
     Use this agent when the parent agent needs a step-by-step implementation plan, key file identification, and architectural trade-off analysis before code changes are made.
   allowed_tools:
@@ -16,6 +27,7 @@ agent:
     - "pythinker_code.tools.file:ReadMediaFile"
     - "pythinker_code.tools.file:Glob"
     - "pythinker_code.tools.file:Grep"
+    - "pythinker_code.tools.file:SmartSearch"
     - "pythinker_code.tools.web:SearchWeb"
     - "pythinker_code.tools.web:FetchURL"
   exclude_tools:

pythinker_code/agents/default/review.yaml ADDED

@@ -0,0 +1,47 @@
+version: 1
+agent:
+  extend: ./agent.yaml
+  system_prompt_args:
+    ROLE_ADDITIONAL: |
+      You are now running as a subagent. All the `user` messages are sent by the main agent. The main agent cannot see your context, it can only see your last message when you finish the task. Treat the parent agent as your caller. Do not directly ask the end user questions.
+      You are a code review specialist. Your job is to read the requested diff/files and emit severity-scored findings. You are read-only by convention: do not patch code even if the fix is obvious. Describe the fix so the parent can dispatch an implementer.
+      Method:
+      - Read the diff or target files before scoring.
+      - Use Grep/Glob to check sibling call sites, similar patterns, and existing tests.
+      - Score each finding as BLOCKER, MAJOR, MINOR, or NIT.
+      - Order findings by severity, BLOCKER first.
+      - Be constructive: cite failure modes and evidence, not author intent.
+      Final response contract:
+      ### SUMMARY
+      One paragraph. If there are no MAJOR/BLOCKER issues, say that plainly.
+      ### EVIDENCE
+      Bullet list. Format review findings as `[SEVERITY] path:line-range — issue; suggested fix`.
+      ### CHANGES
+      Always write `None.`.
+      ### RISKS
+      Bullet list of residual review limitations or `None observed.`.
+      ### BLOCKERS
+      Bullet list of missing context/capabilities or `None.`.
+  when_to_use: |
+    Use this agent for read-only code review after changes are made or when the parent needs severity-scored findings before deciding what to fix.
+  allowed_tools:
+    - "pythinker_code.tools.shell:Shell"
+    - "pythinker_code.tools.file:ReadFile"
+    - "pythinker_code.tools.file:ReadMediaFile"
+    - "pythinker_code.tools.file:Glob"
+    - "pythinker_code.tools.file:Grep"
+    - "pythinker_code.tools.file:SmartSearch"
+    - "pythinker_code.tools.web:SearchWeb"
+    - "pythinker_code.tools.web:FetchURL"
+  exclude_tools:
+    - "pythinker_code.tools.agent:Agent"
+    - "pythinker_code.tools.ask_user:AskUserQuestion"
+    - "pythinker_code.tools.todo:SetTodoList"
+    - "pythinker_code.tools.plan:ExitPlanMode"
+    - "pythinker_code.tools.plan.enter:EnterPlanMode"
+    - "pythinker_code.tools.file:WriteFile"
+    - "pythinker_code.tools.file:StrReplaceFile"
+  subagents:
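`review.yaml` asks for findings in the form `[SEVERITY] path:line-range`, a dash, then `issue; suggested fix`, ordered with BLOCKER first. A parent agent could check and re-sort that list mechanically. The following is a sketch under that assumption, not code from the package; `Finding`, `parse_findings`, and `SEVERITY_ORDER` are hypothetical names, and lines that do not match the format are simply skipped.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

SEVERITY_ORDER = {"BLOCKER": 0, "MAJOR": 1, "MINOR": 2, "NIT": 3}

# Optional bullet, severity tag, path:start[-end], a dash (U+2014 or "-"), free text.
_FINDING = re.compile(
    r"^\s*(?:[-*]\s+)?\[(?P<severity>BLOCKER|MAJOR|MINOR|NIT)\]\s+"
    r"(?P<path>\S+?):(?P<start>\d+)(?:-(?P<end>\d+))?"
    r"\s+[\u2014-]\s+(?P<rest>.+)$"
)


@dataclass(frozen=True)
class Finding:
    severity: str
    path: str
    start: int
    end: int
    issue: str
    fix: str | None


def parse_findings(evidence: str) -> list[Finding]:
    """Extract review findings from an EVIDENCE section, most severe first."""
    findings: list[Finding] = []
    for line in evidence.splitlines():
        match = _FINDING.match(line)
        if match is None:
            continue  # not a finding line
        # Everything after the first ";" is treated as the suggested fix.
        issue, _, fix = match["rest"].partition(";")
        start = int(match["start"])
        findings.append(
            Finding(
                severity=match["severity"],
                path=match["path"],
                start=start,
                end=int(match["end"] or start),
                issue=issue.strip(),
                fix=fix.strip() or None,
            )
        )
    # list.sort is stable, so findings of equal severity keep their order.
    findings.sort(key=lambda finding: SEVERITY_ORDER[finding.severity])
    return findings
```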

pythinker_code/agents/default/system.md CHANGED

@@ -10,10 +10,26 @@ The user's messages may contain questions and/or task descriptions in natural la
 When handling the user's request, if it involves creating, modifying, or running code or files, you MUST use the appropriate tools (e.g., `WriteFile`, `Shell`) to make actual changes — do not just describe the solution in text. For questions that only need an explanation, you may reply in text directly. When calling tools, do not provide explanations because the tool calls themselves should be self-explanatory. You MUST follow the description of each tool and its parameters when calling tools.
-If the `Agent` tool is available, you can use it to delegate a focused subtask to a subagent instance. The tool can either start a new instance or resume an existing one by `agent_id`. Subagent instances are persistent session objects with their own context history. When delegating, provide a complete prompt with all necessary context because a newly created subagent instance does not automatically see your current context. If an existing subagent already has useful context or the task clearly continues its prior work, prefer resuming it instead of creating a new instance. Default to foreground subagents. Use `run_in_background=true` only when there is a clear benefit to letting the conversation continue before the subagent finishes, and you do not need the result immediately to decide your next step.
+If the `Agent` tool is available, you can use it to delegate a focused subtask to a subagent instance. Treat subagents as focused roles, not just extra capacity: use `explore` for read-only mapping, `plan` for strategy, `coder` or `implementer` for scoped edits, `review` for severity-scored critique, and `verifier` for validation gates. The tool can either start a new instance or resume an existing one by `agent_id`. Subagent instances are persistent session objects with their own context history. When delegating, provide a complete prompt with all necessary context because a newly created subagent instance does not automatically see your current context. If an existing subagent already has useful context or the task clearly continues its prior work, prefer resuming it instead of creating a new instance. Default to foreground subagents. Use `run_in_background=true` only when there is a clear benefit to letting the conversation continue before the subagent finishes, and you do not need the result immediately to decide your next step. Spawn multiple subagents in the same turn when they can investigate independent regions concurrently.
 You have the capability to output any number of tool calls in a single response. If you anticipate making multiple non-interfering tool calls, you are HIGHLY RECOMMENDED to make them in parallel to significantly improve efficiency. This is very important to your performance.
+For any non-trivial request, decompose before acting:
+- Preview the terrain first: scan the directory structure, file headers, and relevant module boundaries before choosing an implementation path.
+- Use `SetTodoList` for multi-step work so the user can see the active plan and progress.
+- Split broad work into independent chunks; use parallel tool calls or focused subagents for chunks that do not depend on each other.
+- Re-read the plan after each phase and adjust it when new evidence changes the approach.
+Before every tool response, ask whether another independent read/search/check can run in the same turn. Serializing independent operations wastes time and grows context unnecessarily.
+After every tool call whose result you will act on, verify the result before proceeding:
+- File reads: confirm the path and line range you are about to modify match what you read.
+- Searches: confirm the hit is relevant; broad regexes can return false positives.
+- Shell commands: inspect stdout/stderr, not just the exit code.
+- Subagent results: cross-check at least one load-bearing finding against a direct read or deterministic command before making changes from it.
 The results of the tool calls will be returned to you in a tool message. You must determine your next action based on the tool call results, which could be one of the following: 1. Continue working on the task, 2. Inform the user that the task is completed or has failed, or 3. Ask the user for more information.
 The system may insert information wrapped in `<system>` tags within user or tool messages. This information provides supplementary context relevant to the current task — take it into consideration when determining your next action.

pythinker_code/agents/default/verifier.yaml ADDED

@@ -0,0 +1,46 @@
+version: 1
+agent:
+  extend: ./agent.yaml
+  system_prompt_args:
+    ROLE_ADDITIONAL: |
+      You are now running as a subagent. All the `user` messages are sent by the main agent. The main agent cannot see your context, it can only see your last message when you finish the task. Treat the parent agent as your caller. Do not directly ask the end user questions.
+      You are a verification specialist. Your job is to run the validation gate the parent requested and report PASS / FAIL / FLAKY with actionable evidence. You are read-only by convention: do not patch failing code, update snapshots, or fix lint. If a fix is obvious, describe it under RISKS.
+      Method:
+      - Run the narrowest relevant gate when the parent gives one; otherwise choose the standard project command from AGENTS.md.
+      - Capture exact failing assertions, stack traces, and file:line references.
+      - Do not run expensive full suites unless requested or clearly necessary.
+      - If a result looks flaky, mention how many runs were attempted.
+      Final response contract:
+      ### SUMMARY
+      Start with `PASS`, `FAIL`, or `FLAKY`, then one paragraph explaining the outcome.
+      ### EVIDENCE
+      Bullet list of commands, exit codes, important stdout/stderr, and file:line failures.
+      ### CHANGES
+      Always write `None.`.
+      ### RISKS
+      Bullet list of likely causes or follow-up fixes, or `None observed.`.
+      ### BLOCKERS
+      Bullet list of missing dependencies, unavailable commands, or `None.`.
+  when_to_use: |
+    Use this agent when the parent needs tests, lint, type checks, builds, or other validation gates run and reported without applying fixes.
+  allowed_tools:
+    - "pythinker_code.tools.shell:Shell"
+    - "pythinker_code.tools.file:ReadFile"
+    - "pythinker_code.tools.file:ReadMediaFile"
+    - "pythinker_code.tools.file:Glob"
+    - "pythinker_code.tools.file:Grep"
+    - "pythinker_code.tools.file:SmartSearch"
+  exclude_tools:
+    - "pythinker_code.tools.agent:Agent"
+    - "pythinker_code.tools.ask_user:AskUserQuestion"
+    - "pythinker_code.tools.todo:SetTodoList"
+    - "pythinker_code.tools.plan:ExitPlanMode"
+    - "pythinker_code.tools.plan.enter:EnterPlanMode"
+    - "pythinker_code.tools.file:WriteFile"
+    - "pythinker_code.tools.file:StrReplaceFile"
+    - "pythinker_code.tools.web:SearchWeb"
+    - "pythinker_code.tools.web:FetchURL"
+  subagents:
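The verifier prompt leaves the PASS / FAIL / FLAKY call to the model and asks it to mention how many runs were attempted. One way to make that distinction mechanical, shown here only as an assumption about how such a gate could be scripted, is to rerun a failing command and classify the collected exit codes. `classify` and `run_gate` are hypothetical helpers.

```python
from __future__ import annotations

import subprocess
import sys
from collections.abc import Sequence


def classify(exit_codes: Sequence[int]) -> str:
    """All zero is PASS, all nonzero is FAIL, a mixture is FLAKY."""
    if not exit_codes:
        raise ValueError("no runs recorded")
    passed = sum(1 for code in exit_codes if code == 0)
    if passed == len(exit_codes):
        return "PASS"
    if passed == 0:
        return "FAIL"
    return "FLAKY"


def run_gate(
    command: Sequence[str], max_runs: int = 3, timeout: float = 600.0
) -> tuple[str, list[int]]:
    """Run a validation command; repeat only if the first run fails."""
    codes: list[int] = []
    for _ in range(max_runs):
        result = subprocess.run(
            list(command), capture_output=True, text=True, timeout=timeout
        )
        codes.append(result.returncode)
        if len(codes) == 1 and result.returncode == 0:
            break  # clean first run, no need to spend more time
    return classify(codes), codes


if __name__ == "__main__":
    verdict, codes = run_gate([sys.executable, "-c", "raise SystemExit(0)"])
    print(verdict, f"after {len(codes)} run(s)")
```

Stopping after a clean first run keeps the gate cheap, in line with the prompt's instruction not to run expensive suites unless necessary.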

pythinker_code/app.py CHANGED

@@ -38,6 +38,9 @@ if TYPE_CHECKING:
     from fastmcp.mcp_config import MCPConfig
+_CWD_LOCK = asyncio.Lock()
 def _patch_session_id(record: dict[str, Any]) -> None:
     """Inject the current session ID (from ContextVar) into log records."""
     try:
@@ -522,15 +525,16 @@ class PythinkerCLI:
     @contextlib.asynccontextmanager
     async def _env(self) -> AsyncGenerator[None]:
-        original_cwd = HostPath.cwd()
-        await pythinker_host.chdir(self._runtime.session.work_dir)
-        try:
-            # to ignore possible warnings from dateparser
-            warnings.filterwarnings("ignore", category=DeprecationWarning)
-            async with self._runtime.oauth.refreshing(self._runtime):
-                yield
-        finally:
-            await pythinker_host.chdir(original_cwd)
+        async with _CWD_LOCK:
+            original_cwd = HostPath.cwd()
+            await pythinker_host.chdir(self._runtime.session.work_dir)
+            try:
+                # to ignore possible warnings from dateparser
+                warnings.filterwarnings("ignore", category=DeprecationWarning)
+                async with self._runtime.oauth.refreshing(self._runtime):
+                    yield
+            finally:
+                await pythinker_host.chdir(original_cwd)
     async def run(
         self,
@@ -703,9 +707,13 @@ class PythinkerCLI:
         from pythinker_code.ui.shell import Shell, WelcomeInfoItem
         if command is None:
-            from pythinker_code.ui.shell.update import print_update_banner
+            from pythinker_code.ui.shell.update import (
+                print_update_banner,
+                schedule_auto_update_check,
+            )
             print_update_banner()
+            schedule_auto_update_check()
         welcome_info = [
             WelcomeInfoItem(

pythinker_code/background/manager.py CHANGED Viewed

@@ -215,6 +215,9 @@ class BackgroundTaskManager:
         model_override: str | None,
         timeout_s: int | None = None,
         resumed: bool = False,
+        dependencies: list[str] | None = None,
+        budget_seconds: int | None = None,
+        isolation: str | None = None,
     ) -> TaskView:
         from .agent_runner import BackgroundAgentRunner
@@ -244,12 +247,19 @@ class BackgroundTaskManager:
             # an explicit per-agent timeout instead of always falling back to
             # ``config.background.agent_task_timeout_s``.
             timeout_s=effective_timeout,
+            dependencies=list(dependencies or ()),
+            budget_seconds=budget_seconds,
+            synthesis_state="pending",
+            isolation=isolation,
             kind_payload={
                 "agent_id": agent_id,
                 "subagent_type": subagent_type,
                 "prompt": prompt,
                 "model_override": model_override,
                 "launch_mode": "background",
+                "dependencies": list(dependencies or ()),
+                "budget_seconds": budget_seconds,
+                "isolation": isolation,
             },
         )
         self._store.create_task(spec)
@@ -427,10 +437,15 @@ class BackgroundTaskManager:
                 runtime = view.runtime.model_copy()
                 runtime.finished_at = now
                 runtime.updated_at = now
-                runtime.status = "lost"
-                runtime.failure_reason = "In-process background agent is no longer running"
-                self._store.write_runtime(view.spec.id, runtime)
                 agent_id = (view.spec.kind_payload or {}).get("agent_id")
+                runtime.status = "recoverable" if isinstance(agent_id, str) else "lost"
+                runtime.failure_reason = (
+                    "In-process background agent is no longer running; resume the stored agent "
+                    f"instance {agent_id} to continue."
+                    if isinstance(agent_id, str)
+                    else "In-process background agent is no longer running"
+                )
+                self._store.write_runtime(view.spec.id, runtime)
                 if (
                     isinstance(agent_id, str)
                     and self._runtime is not None
@@ -438,7 +453,7 @@ class BackgroundTaskManager:
                 ):
                     record = self._runtime.subagent_store.get_instance(agent_id)
                     if record is not None and record.status == "running_background":
-                        self._runtime.subagent_store.update_instance(agent_id, status="failed")
+                        self._runtime.subagent_store.update_instance(agent_id, status="idle")
                 continue
             last_progress_at = (
                 view.runtime.heartbeat_at
@@ -506,6 +521,9 @@ class BackgroundTaskManager:
                 case "lost":
                     severity = "warning"
                     title = f"Background task lost: {view.spec.description}"
+                case "recoverable":
+                    severity = "warning"
+                    title = f"Background task recoverable: {view.spec.description}"
                 case _:
                     severity = "info"
                     title = f"Background task updated: {view.spec.description}"

pythinker_code/background/models.py CHANGED Viewed

@@ -15,10 +15,17 @@ type TaskStatus = Literal[
     "failed",
     "killed",
     "lost",
+    "recoverable",
 ]
 type TaskOwnerRole = Literal["root", "subagent"]
-TERMINAL_TASK_STATUSES: tuple[TaskStatus, ...] = ("completed", "failed", "killed", "lost")
+TERMINAL_TASK_STATUSES: tuple[TaskStatus, ...] = (
+    "completed",
+    "failed",
+    "killed",
+    "lost",
+    "recoverable",
+)
 def is_terminal_status(status: TaskStatus) -> bool:
@@ -50,6 +57,12 @@ class TaskSpec(BaseModel):
     shell_path: str | None = None
     cwd: str | None = None
     timeout_s: int | None = None
+    parent_task_id: str | None = None
+    child_task_ids: list[str] = Field(default_factory=list)
+    dependencies: list[str] = Field(default_factory=list)
+    budget_seconds: int | None = None
+    synthesis_state: str | None = None
+    isolation: str | None = None
     kind_payload: dict[str, Any] | None = None

pythinker_code/background/store.py CHANGED Viewed

@@ -17,6 +17,7 @@ from .models import (
     TaskSpec,
     TaskStatus,
     TaskView,
+    is_terminal_status,
 )
 _VALID_TASK_ID = re.compile(r"^[a-z0-9][a-z0-9\-]{1,24}$")
@@ -101,7 +102,12 @@ class BackgroundTaskStore:
         return TaskSpec.model_validate_json(self.spec_path(task_id).read_text(encoding="utf-8"))
     def write_runtime(self, task_id: str, runtime: TaskRuntime) -> None:
-        atomic_json_write(runtime.model_dump(mode="json"), self.runtime_path(task_id))
+        path = self.runtime_path(task_id)
+        if path.exists():
+            current = self.read_runtime(task_id)
+            if is_terminal_status(current.status) and not is_terminal_status(runtime.status):
+                return
+        atomic_json_write(runtime.model_dump(mode="json"), path)
     def read_runtime(self, task_id: str) -> TaskRuntime:
         path = self.runtime_path(task_id)
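
Note on the `write_runtime` hunk above: the guard silently drops a write that would move a task from a terminal status back to a non-terminal one. The sketch below reproduces that rule against an in-memory dictionary. It is an illustration under assumptions: `InMemoryRuntimeStore` is hypothetical, statuses are plain strings rather than `TaskRuntime` models, the body of `is_terminal_status` is assumed to be a membership test (the diff does not show it), and the boolean return value is added here only to make the outcome visible.

```python
from dataclasses import dataclass, field

# Terminal statuses as listed in the models.py hunk, including "recoverable".
TERMINAL_TASK_STATUSES: tuple[str, ...] = (
    "completed",
    "failed",
    "killed",
    "lost",
    "recoverable",
)


def is_terminal_status(status: str) -> bool:
    # Assumed implementation: simple membership in the tuple above.
    return status in TERMINAL_TASK_STATUSES


@dataclass
class InMemoryRuntimeStore:
    """Hypothetical stand-in for the on-disk store, keyed by task id."""

    _runtimes: dict[str, str] = field(default_factory=dict)

    def write_runtime(self, task_id: str, status: str) -> bool:
        """Store `status`; return False when the guard rejects the write."""
        current = self._runtimes.get(task_id)
        if (
            current is not None
            and is_terminal_status(current)
            and not is_terminal_status(status)
        ):
            # A finished task must not be resurrected by a late writer.
            return False
        self._runtimes[task_id] = status
        return True

    def read_runtime(self, task_id: str) -> str:
        return self._runtimes[task_id]
```

As in the diff, only terminal to non-terminal transitions are blocked: a write that replaces one terminal status with another still goes through.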
