nighthawk-python 0.3.1__tar.gz → 0.4.0__tar.gz

Files changed (107)
  1. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.claude/rules/docs.md +4 -2
  2. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/CHANGELOG.md +10 -1
  3. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/PKG-INFO +1 -1
  4. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/docs/api.md +4 -0
  5. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/docs/coding-agent-backends.md +6 -6
  6. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/docs/for-coding-agents.md +139 -117
  7. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/docs/index.md +1 -13
  8. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/docs/providers.md +6 -6
  9. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/docs/quickstart.md +3 -9
  10. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/docs/roadmap.md +1 -1
  11. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/docs/tutorial.md +144 -34
  12. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/pyproject.toml +1 -1
  13. nighthawk_python-0.4.0/src/nighthawk/testing.py +212 -0
  14. nighthawk_python-0.4.0/tests/docs/test_coding_agent_examples.py +263 -0
  15. nighthawk_python-0.4.0/tests/test_testing.py +556 -0
  16. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/uv.lock +4 -4
  17. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.claude/rules/coding.md +0 -0
  18. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.claude/settings.json +0 -0
  19. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.claude/unset_envs.sh +0 -0
  20. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.devcontainer/Dockerfile.devcontainer +0 -0
  21. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.devcontainer/Dockerfile.litellm +0 -0
  22. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.devcontainer/devcontainer.json +0 -0
  23. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.devcontainer/docker-compose.yaml +0 -0
  24. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.devcontainer/litellm-config.yaml +0 -0
  25. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.github/dependabot.yml +0 -0
  26. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.github/workflows/ci.yml +0 -0
  27. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.github/workflows/docs.yml +0 -0
  28. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.github/workflows/publish.yml +0 -0
  29. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.gitignore +0 -0
  30. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/.python-version +0 -0
  31. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/AGENTS.md +0 -0
  32. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/CLAUDE.md +0 -0
  33. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/CONTRIBUTING.md +0 -0
  34. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/LICENSE +0 -0
  35. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/README.md +0 -0
  36. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/docs/assets/nighthawk_logo-128x128.png +0 -0
  37. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/docs/design.md +0 -0
  38. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/mkdocs.yml +0 -0
  39. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/pyrightconfig.json +0 -0
  40. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/__init__.py +0 -0
  41. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/backends/__init__.py +0 -0
  42. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/backends/base.py +0 -0
  43. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/backends/claude_code_cli.py +0 -0
  44. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/backends/claude_code_sdk.py +0 -0
  45. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/backends/codex.py +0 -0
  46. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/backends/mcp_boundary.py +0 -0
  47. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/backends/mcp_server.py +0 -0
  48. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/backends/tool_bridge.py +0 -0
  49. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/configuration.py +0 -0
  50. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/errors.py +0 -0
  51. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/identifier_path.py +0 -0
  52. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/json_renderer.py +0 -0
  53. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/natural/__init__.py +0 -0
  54. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/natural/blocks.py +0 -0
  55. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/natural/decorator.py +0 -0
  56. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/natural/transform.py +0 -0
  57. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/runtime/__init__.py +0 -0
  58. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/runtime/async_bridge.py +0 -0
  59. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/runtime/prompt.py +0 -0
  60. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/runtime/runner.py +0 -0
  61. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/runtime/scoping.py +0 -0
  62. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/runtime/step_context.py +0 -0
  63. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/runtime/step_contract.py +0 -0
  64. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/runtime/step_executor.py +0 -0
  65. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/runtime/tool_calls.py +0 -0
  66. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/tools/__init__.py +0 -0
  67. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/tools/assignment.py +0 -0
  68. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/tools/contracts.py +0 -0
  69. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/tools/execution.py +0 -0
  70. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/tools/provided.py +0 -0
  71. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/tools/registry.py +0 -0
  72. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/src/nighthawk/ulid.py +0 -0
  73. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/__init__.py +0 -0
  74. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/backends/__init__.py +0 -0
  75. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/backends/test_claude_code_cli.py +0 -0
  76. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/backends/test_claude_code_sdk.py +0 -0
  77. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/backends/test_codex.py +0 -0
  78. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/conftest.py +0 -0
  79. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/docs/__init__.py +0 -0
  80. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/docs/test_prompt_examples.py +0 -0
  81. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/execution/__init__.py +0 -0
  82. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/execution/prompt_test_helpers.py +0 -0
  83. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/execution/stub_executor.py +0 -0
  84. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/execution/test_execution_outcome_prompt_fragment.py +0 -0
  85. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/execution/test_globals_prompt.py +0 -0
  86. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/execution/test_natural_block_ordering.py +0 -0
  87. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/execution/test_natural_traceback.py +0 -0
  88. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/execution/test_runtime.py +0 -0
  89. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/execution/test_variables_prompt.py +0 -0
  90. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/integration/__init__.py +0 -0
  91. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/integration/skip_helpers.py +0 -0
  92. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/integration/test_carry_pattern.py +0 -0
  93. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/integration/test_claude_code_cli_integration.py +0 -0
  94. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/integration/test_claude_code_sdk_integration.py +0 -0
  95. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/integration/test_codex_integration.py +0 -0
  96. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/integration/test_llm_integration.py +0 -0
  97. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/natural/__init__.py +0 -0
  98. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/natural/test_blocks.py +0 -0
  99. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/public/__init__.py +0 -0
  100. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/public/test_public_api.py +0 -0
  101. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/public/test_readme_example.py +0 -0
  102. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/test_renderer.py +0 -0
  103. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/tools/__init__.py +0 -0
  104. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/tools/test_assignment_async.py +0 -0
  105. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/tools/test_contracts.py +0 -0
  106. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/tools/test_registry.py +0 -0
  107. {nighthawk_python-0.3.1 → nighthawk_python-0.4.0}/tests/tools/test_tool_boundary.py +0 -0
@@ -13,7 +13,7 @@ Each file has a distinct audience and scope. Content belongs in exactly one file
13
13
  |---|---|---|---|
14
14
  | `index.md` | First-time visitors | Project overview, motivation, workflow styles | What Nighthawk is and why. No API details, no how-to. |
15
15
  | `quickstart.md` | New users | Shortest path to running a Natural block | Setup, first example, backends table, credentials, troubleshooting. No deep explanations. |
16
- | `tutorial.md` | Users learning the system | Build understanding from first principles | Bindings, tools, control flow, composition, configuration, guidelines. Assumes quickstart is done. |
16
+ | `tutorial.md` | Users learning the system | Build understanding from first principles | Bindings, functions and discoverability, control flow, composition, configuration, guidelines. Assumes quickstart is done. |
17
17
  | `design.md` | Implementors and advanced users | Canonical specification (target behavior) | Full technical detail: syntax rules, state layers, prompt rendering, tool contracts, outcome schema, frontmatter. |
18
18
  | `providers.md` | Users choosing and configuring models | Provider selection, Pydantic AI setup, custom backends | Provider categories, capability matrix, model identifiers, Pydantic AI model settings, step executor protocols. No coding-agent-backend-specific content. |
19
19
  | `coding-agent-backends.md` | Users of Claude Code or Codex backends | Coding agent backend configuration and features | Backend-specific settings, skills, MCP tool exposure, working directory, project-scoped files. |
@@ -43,6 +43,8 @@ Each file has a distinct audience and scope. Content belongs in exactly one file
43
43
  - When tutorial.md and design.md cover the same concept, tutorial.md shows the "what and how" with examples; design.md specifies the "exact rules and edge cases".
44
44
  - Keep code examples self-contained: a reader should understand the example without reading surrounding prose.
45
45
  - Built-in tool names (`nh_eval`, `nh_exec`, `nh_assign`) are implementation details. Only `design.md` may expose them. All other files describe behavior instead (e.g., "the LLM can set a new value" rather than "use `nh_assign`").
46
+ - `@nh.tool` is discouraged. Binding functions are the preferred callable exposure mechanism. `design.md` documents `@nh.tool` as part of the specification. `tutorial.md` may mention it with a "prefer binding functions" note. All other files should not add examples, recommendations, or references to `@nh.tool`.
47
+ - The PyPI package name is `nighthawk-python`. Always use `nighthawk-python` (not `nighthawk`) in `pip install` commands and extras references (e.g., `nighthawk-python[claude-code-sdk]`).
46
48
 
47
49
  ### index.md specifics
48
50
 
@@ -91,7 +93,7 @@ Each file has a distinct audience and scope. Content belongs in exactly one file
91
93
  - This file should be self-contained: a coding agent reading only this file should be able to write correct Nighthawk code without consulting other docs.
92
94
  - This file is consumed standalone (`@docs/for-coding-agents.md` in CLAUDE.md/AGENTS.md, GitHub raw URL, etc.). Do not assume sibling files exist at relative paths.
93
95
  - All external references to other docs use absolute URLs based on `site_url` from `mkdocs.yml` (currently `https://kurusugawa-computer.github.io/nighthawk-python/`). If `site_url` changes, update the URLs in this file.
94
- - `@nh.tool` is deprecated. Do not add examples, recommendations, or references to `@nh.tool` in this file. Binding functions are the only recommended callable exposure mechanism.
96
+ - `@nh.tool` must not appear in this file (see General rule on `@nh.tool`). Binding functions are the only callable exposure mechanism presented here.
95
97
  - Filter content for coding-agent relevance. Omit infrastructure-level concerns (scoped overrides parameter lists, exception hierarchy beyond `ExecutionError`, observability/tracing) that do not affect how an agent writes Natural blocks or binding functions. Mention existence and link to Tutorial or Design for details.
96
98
 
97
99
  ### api.md specifics
@@ -7,6 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7
7
 
8
8
  ## [Unreleased]
9
9
 
10
+ ## [0.4.0] - 2026-03-20
11
+
12
+ ### Added
13
+ - `nighthawk.testing` module with test executors and convenience factories for deterministic Natural function testing without LLM API calls.
14
+
15
+ ### Changed
16
+ - Rewrote testing documentation in `tutorial.md` (Section 8) and `for-coding-agents.md` (Section 8): replaced incorrect `TestModel` usage with `nighthawk.testing` utilities, added testing strategy guidance distinguishing mock tests (Python logic) from integration tests (Natural block judgment).
17
+
10
18
  ## [0.3.1] - 2026-03-19
11
19
 
12
20
  ### Changed
@@ -49,7 +57,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
49
57
  - Step executor abstraction and provider integration foundation.
50
58
  - Core documentation and project scaffolding.
51
59
 
52
- [Unreleased]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.3.1...HEAD
60
+ [Unreleased]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.4.0...HEAD
61
+ [0.4.0]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.3.1...v0.4.0
53
62
  [0.3.1]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.3.0...v0.3.1
54
63
  [0.3.0]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.2.0...v0.3.0
55
64
  [0.2.0]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.1.0...v0.2.0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: nighthawk-python
3
- Version: 0.3.1
3
+ Version: 0.4.0
4
4
  Summary: An experimental Python library that embeds Natural blocks inside Python functions and executes them using an LLM.
5
5
  Project-URL: Repository, https://github.com/kurusugawa-computer/nighthawk-python
6
6
  Project-URL: Documentation, https://kurusugawa-computer.github.io/nighthawk-python/
@@ -88,3 +88,7 @@
88
88
  - ErrorKind
89
89
  - ToolResultWrapperToolset
90
90
 
91
+ ## Testing
92
+
93
+ ::: nighthawk.testing
94
+
@@ -4,7 +4,7 @@ The `claude-code-sdk`, `claude-code-cli`, and `codex` backends delegate Natural
4
4
 
5
5
  Minimal configuration:
6
6
 
7
- ```python
7
+ ```py
8
8
  from nighthawk.configuration import StepExecutorConfiguration
9
9
 
10
10
  # Claude Code (SDK)
@@ -62,7 +62,7 @@ pip install nighthawk-python[claude-code-sdk]
62
62
 
63
63
  ### Settings
64
64
 
65
- ```python
65
+ ```py
66
66
  from nighthawk.backends.claude_code_sdk import ClaudeCodeSdkModelSettings
67
67
 
68
68
  configuration = StepExecutorConfiguration(
@@ -79,7 +79,7 @@ configuration = StepExecutorConfiguration(
79
79
 
80
80
  | Field | Type | Default | Description |
81
81
  |---|---|---|---|
82
- | `permission_mode` | `"default"` \| `"acceptEdits"` \| `"plan"` \| `"bypassPermissions"` | `"default"` | Claude Code permission mode |
82
+ | `permission_mode` | `"default"` \| `"acceptEdits"` \| `"plan"` \| `"bypassPermissions"` | `"default"` | Claude Code permission mode (always passed to the SDK) |
83
83
  | `setting_sources` | `list[SettingSource]` \| `None` | `None` | Setting source scopes to load (`SettingSource` is `"user"`, `"project"`, or `"local"`) |
84
84
  | `allowed_tool_names` | `tuple[str, ...]` \| `None` | `None` | Nighthawk tool names exposed to the model |
85
85
  | `claude_allowed_tool_names` | `tuple[str, ...]` \| `None` | `None` | Additional Claude Code native tool names to allow (SDK only; CLI does not support this field) |
@@ -108,7 +108,7 @@ The `claude` CLI must be installed separately (it is a system tool, not a Python
108
108
 
109
109
  ### Settings
110
110
 
111
- ```python
111
+ ```py
112
112
  from nighthawk.backends.claude_code_cli import ClaudeCodeCliModelSettings
113
113
 
114
114
  configuration = StepExecutorConfiguration(
@@ -153,7 +153,7 @@ pip install nighthawk-python[codex]
153
153
 
154
154
  ### Settings
155
155
 
156
- ```python
156
+ ```py
157
157
  from nighthawk.backends.codex import CodexModelSettings
158
158
 
159
159
  configuration = StepExecutorConfiguration(
@@ -213,7 +213,7 @@ Use the returned groups to set <:summary_markdown> as exactly 3 bullet points.
213
213
 
214
214
  Example Natural function that invokes the skill:
215
215
 
216
- ```python
216
+ ```py
217
217
  import nighthawk as nh
218
218
 
219
219
  @nh.natural_function
@@ -14,8 +14,6 @@ Key invariants:
14
14
  - Each Natural block executes independently. There is no implicit message history between blocks. Cross-block context must be explicit.
15
15
  - Write bindings (`<:name>`) are the only way the LLM commits values back into Python locals. The LLM is physically constrained to operate on interpreter-visible objects.
16
16
 
17
- ## 2. When to use Natural blocks
18
-
19
17
  **Use Natural when the task requires LLM judgment** -- decisions that depend on interpretation, world knowledge, or subjective evaluation:
20
18
 
21
19
  - Classification and routing (e.g., categorize a support ticket).
@@ -32,7 +30,7 @@ Key invariants:
32
30
 
33
31
  **Decision rule:** if the correct output can be computed without an LLM, use Python. Natural blocks add latency, cost, and non-determinism.
34
32
 
35
- ## 3. Writing Natural blocks
33
+ ## 2. Writing Natural blocks
36
34
 
37
35
  ### Anatomy
38
36
 
@@ -60,7 +58,11 @@ Each Natural block should make exactly one independent judgment. If a block make
60
58
  - Use f-string injection for static config, pre-formatted context, computed values.
61
59
  - Use `<name>` bindings for mutable state and objects the LLM needs to inspect or modify.
62
60
 
63
- ## 4. Designing binding functions
61
+ ### Async
62
+
63
+ Async natural functions work identically to sync ones, with two additions: expressions evaluated by tools may use `await`, and return values that are awaitable are automatically awaited before validation.
64
+
65
+ ## 3. Designing binding functions
64
66
 
65
67
  Binding functions (local or module-level callables) are the preferred way to expose functions to the LLM. The LLM discovers them from the LOCALS/GLOBALS sections of the prompt, rendered as their signature with the first docstring line as `# intent:`.
66
68
 
@@ -68,21 +70,11 @@ Binding functions (local or module-level callables) are the preferred way to exp
68
70
 
69
71
  Module-level names that are stable across invocations (constants, classes, utility functions) should stay in GLOBALS via `<name>` read bindings. Reserve function parameters for data that genuinely varies per call.
70
72
 
71
- Wrong -- `fetch_data` loses its signature in LOCALS:
72
-
73
- ```python
74
- @nh.natural_function
75
- async def summarize(query: str, fetch_data: object) -> str:
76
- result = ""
77
- """natural
78
- Use <fetch_data> to get data for <query> and set <:result>.
79
- """
80
- return result
81
- ```
82
-
83
- Correct -- `fetch_data` keeps its full signature in GLOBALS:
73
+ ```py
74
+ # Wrong -- fetch_data loses its signature in LOCALS:
75
+ async def summarize(query: str, fetch_data: object) -> str: ...
84
76
 
85
- ```python
77
+ # Correct -- fetch_data keeps its full signature in GLOBALS:
86
78
  @nh.natural_function
87
79
  async def summarize(query: str) -> str:
88
80
  result = ""
@@ -96,7 +88,7 @@ async def summarize(query: str) -> str:
96
88
 
97
89
  Each parameter in a binding function signature is a decision point the LLM must evaluate. Compose complex operations in Python and expose simple binding functions:
98
90
 
99
- ```python
91
+ ```py
100
92
  # Wrong -- too many parameters
101
93
  def find_items(category: str, min_score: float, max_score: float,
102
94
  tags: list[str], created_after: str, sort_by: str) -> list[dict]:
@@ -114,7 +106,7 @@ def find_top_items(category: str) -> list[dict]:
114
106
 
115
107
  Write short docstrings explaining intent and boundaries. The first line appears as `# intent:` in the prompt. Clear function names and accurate type annotations complete discoverability.
116
108
 
117
- ## 5. Control flow and error handling
109
+ ## 4. Control flow and error handling
118
110
 
119
111
  ### Outcomes
120
112
 
@@ -132,7 +124,7 @@ Each Natural block returns exactly one outcome:
132
124
 
133
125
  Restrict allowed outcomes with YAML frontmatter:
134
126
 
135
- ```python
127
+ ```py
136
128
  """natural
137
129
  ---
138
130
  deny: [raise, return]
@@ -145,7 +137,7 @@ Read <text> and set <:result> to a summary.
145
137
 
146
138
  The LLM signals errors via the `raise` outcome. Catch with standard Python:
147
139
 
148
- ```python
140
+ ```py
149
141
  try:
150
142
  validate(data)
151
143
  except nh.ExecutionError as e:
@@ -154,13 +146,13 @@ except nh.ExecutionError as e:
154
146
 
155
147
  Custom exception types referenced in step locals or globals are available as raise targets. Catch `nh.ExecutionError` for Natural block failures; all Nighthawk exceptions inherit from `nh.NighthawkError`.
156
148
 
157
- ## 6. Cross-block composition
149
+ ## 5. Cross-block composition
158
150
 
159
151
  ### The carry pattern
160
152
 
161
153
  Pass a mutable object as a read binding (`<carry>`, not `<:carry>`) and instruct the LLM to mutate it in-place:
162
154
 
163
- ```python
155
+ ```py
164
156
  @nh.natural_function
165
157
  def step_1(carry: list[str]) -> int:
166
158
  result = 0
@@ -177,35 +169,16 @@ r2 = step_2(carry) # carry now has 2 entries
177
169
 
178
170
  Critical: use `<carry>` (read binding), not `<:carry>` (write binding). Read bindings prevent rebinding, preserving the caller's reference.
179
171
 
180
- ### Branching
172
+ - Branch by copying the carry (`carry_a = carry.copy()`). Each copy continues independently.
173
+ - When the carry's token footprint is too large, inject context via f-string instead ([Section 2](#interpolation)).
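The branching bullet above can be illustrated in plain Python, without Nighthawk. This is a minimal sketch, assuming the carry is a `list[str]` as in `step_1`; `append_note` is a hypothetical stand-in for a Natural function that mutates the carry in place.

```python
def append_note(carry: list[str], note: str) -> int:
    """Hypothetical stand-in for a Natural function that mutates the carry in place."""
    carry.append(note)
    return len(carry)


carry = ["step 1 done"]

# Branch: each copy continues independently of the other and of the original.
carry_a = carry.copy()
carry_b = carry.copy()

append_note(carry_a, "branch A")
append_note(carry_b, "branch B")

assert carry == ["step 1 done"]
assert carry_a == ["step 1 done", "branch A"]
assert carry_b == ["step 1 done", "branch B"]
```

Note that `list.copy()` is shallow: if the carry holds nested mutable objects, the branches still share them, and `copy.deepcopy` may be the safer choice.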
181
174
 
182
- Copy the carry to create independent branches:
183
-
184
- ```python
185
- carry_a = carry.copy()
186
- carry_b = carry.copy()
187
- result_a = branch_add(carry_a)
188
- result_b = branch_multiply(carry_b)
189
- ```
190
-
191
- ### f-string injection as alternative
192
-
193
- When the carry's locals summary footprint is too large, inject pre-formatted context via f-string:
194
-
195
- ```python
196
- f"""natural
197
- Prior context: {context_text}
198
- Set <:result> based on the context.
199
- """
200
- ```
201
-
202
- ## 7. Execution configuration
175
+ ## 6. Execution configuration
203
176
 
204
177
  ### Run context
205
178
 
206
179
  Natural functions must be called inside `with nh.run(step_executor):`. For backend-specific settings, see [Coding agent backends](https://kurusugawa-computer.github.io/nighthawk-python/coding-agent-backends/).
207
180
 
208
- ```python
181
+ ```py
209
182
  step_executor = nh.AgentStepExecutor.from_configuration(
210
183
  configuration=nh.StepExecutorConfiguration(model="openai-responses:gpt-5-mini"),
211
184
  )
@@ -217,7 +190,7 @@ Use `nh.scope()` to override model, prompts, or context limits within an existin
217
190
 
218
191
  LOCALS and GLOBALS sections are bounded by `StepContextLimits`. When bindings are missing or truncated (`<snipped>`), adjust the limits:
219
192
 
220
- ```python
193
+ ```py
221
194
  configuration = nh.StepExecutorConfiguration(
222
195
  model="openai-responses:gpt-5-mini",
223
196
  context_limits=nh.StepContextLimits(
@@ -227,113 +200,162 @@ configuration = nh.StepExecutorConfiguration(
227
200
  )
228
201
  ```
229
202
 
230
- ## 8. Testing
203
+ ## 7. Testing
204
+
205
+ ### Testing strategy
206
+
207
+ Mock tests exercise the Python logic around Natural blocks -- control flow, error handling, composition, binding wiring. They do **not** exercise the Natural blocks themselves. Since Natural blocks are the core of a Nighthawk application, mock tests alone are insufficient.
208
+
209
+ | Layer | What it tests | What it cannot test |
210
+ |---|---|---|
211
+ | **Mock tests** (`nighthawk.testing`) | Python logic: control flow, error handling, composition, binding wiring | Natural block effectiveness, prompt quality, LLM behavior |
212
+ | **Integration tests** (real LLM) | Whether the Natural block text actually produces correct judgments | Deterministic reproducibility (LLMs are non-deterministic) |
213
+
214
+ **Guideline:** use mock tests to lock down the deterministic Python shell, then use integration tests to validate that each Natural block's prompt elicits the intended judgment. Do not rely on mock tests as the primary quality gate -- a mock test passes even when the Natural block text is completely wrong.
231
215
 
232
- Use Pydantic AI's `TestModel` for deterministic unit tests without API calls:
216
+ ### Mock tests
233
217
 
234
- ```python
235
- from nighthawk.runtime.step_executor import AgentStepExecutor
236
- from nighthawk.configuration import StepExecutorConfiguration
237
- from pydantic_ai.models.test import TestModel
218
+ `ScriptedExecutor` returns scripted responses and records every call. Use it for Python logic that surrounds Natural blocks.
238
219
 
239
- configuration = StepExecutorConfiguration(model="openai-responses:gpt-5-nano")
240
- executor = AgentStepExecutor(configuration=configuration, agent=TestModel())
220
+ ```py
221
+ from nighthawk.testing import ScriptedExecutor, pass_response, raise_response
241
222
 
223
+ executor = ScriptedExecutor(responses=[
224
+ pass_response(result="Three key points: ..."),
225
+ ])
242
226
  with nh.run(executor):
243
- # Natural functions use TestModel -- deterministic, no API calls
244
- ...
227
+ output = summarize("long document")
228
+
229
+ assert output == "Three key points: ..."
230
+
231
+ # Inspect what was passed to the executor
232
+ call = executor.calls[0]
233
+ assert "result" in call.binding_names # write binding registered
234
+ assert call.step_locals["text"] == "long document" # locals visible
245
235
  ```
246
236
 
247
- ## 9. Type boundary placement
237
+ For multi-step functions, pass `default_response` to avoid enumerating every response:
248
238
 
249
- For deterministic functions (no Natural blocks), the type boundary is at the function entry point -- use typed inputs.
239
+ ```py
240
+ executor = ScriptedExecutor(default_response=pass_response(result=""))
241
+ ```
250
242
 
251
- For judgment-heavy functions (containing Natural blocks), the type boundary moves inside the function. Accept flexible inputs at the entry point and let the Natural block interpret them into typed intermediates via write bindings:
243
+ #### Outcome factories
252
244
 
253
- ```python
254
- from pydantic import BaseModel
245
+ | Factory | Outcome | Use case |
246
+ |---|---|---|
247
+ | `pass_response(**bindings)` | pass | Normal completion with binding values |
248
+ | `raise_response(message, *, error_type=None)` | raise | Test error handling paths |
249
+ | `return_response(reference_path, **bindings)` | return | Early return from Natural function |
250
+ | `break_response()` | break | Exit enclosing loop |
251
+ | `continue_response()` | continue | Skip to next iteration |
252
+
253
+ ```py
254
+ # Error handling:
255
+ executor = ScriptedExecutor(responses=[
256
+ raise_response("invalid input", error_type="ValueError"),
257
+ ])
258
+
259
+ # Early return:
260
+ executor = ScriptedExecutor(responses=[
261
+ return_response("result", result="early exit"),
262
+ ])
263
+ ```
255
264
 
256
- class ReviewVerdict(BaseModel):
257
- approved: bool
258
- reason: str
259
- risk_level: str
265
+ #### Callback executor
260
266
 
261
- @nh.natural_function
262
- def judge_review(review_data: str | nh.JsonableValue) -> ReviewVerdict:
263
- verdict: ReviewVerdict
264
- """natural
265
- Analyze <review_data> and produce a structured <:verdict>.
266
- """
267
- return verdict
267
+ `CallbackExecutor` delegates to a callback when response logic depends on input. Like `ScriptedExecutor`, it records calls in `executor.calls`:
268
+
269
+ ```py
270
+ from nighthawk.testing import CallbackExecutor, StepCall, StepResponse
271
+
272
+ def handler(call: StepCall) -> StepResponse:
273
+ text = call.step_locals.get("text", "")
274
+ if isinstance(text, str) and "urgent" in text:
275
+ return pass_response(priority="high")
276
+ return pass_response(priority="normal")
277
+
278
+ executor = CallbackExecutor(handler)
279
+ with nh.run(executor):
280
+ assert triage("urgent outage") == "high"
268
281
  ```
269
282
 
270
- ## 10. Common mistakes to avoid
283
+ #### Binding wiring verification
271
284
 
272
- | Mistake | Why it breaks | Fix |
273
- |---|---|---|
274
- | Pass a callable as a parameter with generic type (`object`, `Any`) | Signature erased in LOCALS; LLM cannot discover arguments | Reference via `<name>` read binding so it appears in GLOBALS with full signature |
275
- | Use `<:carry>` (write binding) for mutable context | Rebinding breaks the caller's reference | Use `<carry>` (read binding); mutate in-place |
276
- | Put two independent judgments in one block | Non-deterministic, hard to test, unclear contract | Split into two blocks connected by Python |
277
- | Use Natural for deterministic computation | Wastes latency/cost, adds non-determinism | Use Python |
278
- | Forget type annotations on write bindings | No validation or coercion at commit time | Always annotate `<:name>` bindings |
279
- | Duplicate module-level constants as function parameters | Moves stable values from GLOBALS to LOCALS, wastes tokens | Reference via `<name>` read binding |
285
+ Use recorded calls to verify that the right data is visible to the LLM:
286
+
287
+ ```py
288
+ executor = ScriptedExecutor(responses=[pass_response(result="")])
289
+ with nh.run(executor):
290
+ process(query="test")
280
291
 
281
- ## 11. Quick reference
292
+ call = executor.calls[0]
293
+ assert "helper" in call.step_globals # binding function visible in GLOBALS
294
+ assert "query" in call.step_locals # parameter visible in LOCALS
295
+ assert "result" in call.binding_names # write binding registered
296
+ ```
282
297
 
283
- ### Imports and setup
298
+ ### Integration tests
284
299
 
285
- ```python
286
- import nighthawk as nh
300
+ Integration tests call a real LLM and validate the judgment. This is where Natural block quality is actually tested.
287
301
 
302
+ ```py
288
303
  step_executor = nh.AgentStepExecutor.from_configuration(
289
304
  configuration=nh.StepExecutorConfiguration(model="openai-responses:gpt-5-mini"),
290
305
  )
291
306
  with nh.run(step_executor):
292
- ...
307
+ verdict = judge_review("The code has no error handling and uses eval().")
308
+
309
+ assert not verdict.approved
310
+ assert verdict.risk_level in ("high", "critical")
293
311
  ```
294
312
 
295
- ### Natural function template
313
+ For structured outputs, assert on type, value range, and semantic consistency rather than exact string matches. LLMs are non-deterministic; brittle equality checks cause flaky tests.
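A tolerant check of this kind might look like the following sketch. `ReviewVerdict` here is a plain dataclass stand-in that mirrors the fields used in this guide; `check_verdict` and `ALLOWED_RISK_LEVELS` are hypothetical names, and the set of allowed risk levels is an assumption.

```python
from dataclasses import dataclass

# Assumed vocabulary for illustration; the guide itself only mentions "high" and "critical".
ALLOWED_RISK_LEVELS = {"low", "medium", "high", "critical"}


@dataclass
class ReviewVerdict:
    """Stand-in with the same fields as the guide's Pydantic model."""

    approved: bool
    reason: str
    risk_level: str


def check_verdict(verdict: ReviewVerdict) -> list[str]:
    """Return a list of problems; an empty list means the verdict is plausible."""
    problems: list[str] = []
    # Type check: the shape is deterministic even when the wording is not.
    if not isinstance(verdict.approved, bool):
        problems.append("approved must be a bool")
    # Value range: membership in a closed set rather than one exact expected string.
    if verdict.risk_level not in ALLOWED_RISK_LEVELS:
        problems.append(f"unexpected risk_level: {verdict.risk_level!r}")
    # Semantic consistency: fields must not contradict each other.
    if verdict.approved and verdict.risk_level in {"high", "critical"}:
        problems.append("approved verdict should not carry high or critical risk")
    if not verdict.reason.strip():
        problems.append("reason must not be empty")
    return problems


plausible = ReviewVerdict(approved=False, reason="Uses eval() on user input.", risk_level="critical")
contradictory = ReviewVerdict(approved=True, reason="Looks fine.", risk_level="critical")
```

None of the checks depend on the exact text of `reason`, so a rephrased but equivalent answer still passes.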
296
314
 
297
- ```python
298
- @nh.natural_function
299
- def my_function(input_data: str) -> str:
300
- result: str = ""
301
- """natural
302
- Read <input_data> and set <:result> to the processed output.
303
- """
304
- return result
315
+ Gate integration tests behind an environment variable so they do not run in every CI job:
316
+
317
+ ```py
318
+ import os
319
+ import pytest
320
+
321
+ if os.getenv("NIGHTHAWK_RUN_INTEGRATION_TESTS") != "1":
322
+ pytest.skip("Integration tests disabled", allow_module_level=True)
305
323
  ```
306
324
 
307
- ### Async natural function
325
+ ## 8. Type boundary placement
308
326
 
309
- Async natural functions work identically to sync ones, with two additions: expressions evaluated by tools may use `await`, and return values that are awaitable are automatically awaited before validation.
327
+ For deterministic functions (no Natural blocks), the type boundary is at the function entry point -- use typed inputs.
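For the deterministic case, a minimal sketch of a type boundary at the entry point could look like this. `LineItem` and `order_total` are hypothetical names chosen for illustration and are not part of Nighthawk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """Hypothetical typed input; callers must construct it before calling."""

    unit_price_cents: int
    quantity: int


def order_total(items: list[LineItem]) -> int:
    """Deterministic computation: no interpretation of loose input happens inside."""
    return sum(item.unit_price_cents * item.quantity for item in items)


total = order_total(
    [LineItem(unit_price_cents=250, quantity=2), LineItem(unit_price_cents=100, quantity=3)]
)
```

Any parsing or coercion of loosely structured input happens in the caller, before the boundary.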
310
328
 
311
- ```python
312
- @nh.natural_function
313
- async def my_async_function(text: str) -> str:
314
- result: str = ""
315
- """natural
316
- Summarize <text> and set <:result>.
317
- """
318
- return result
319
- ```
329
+ For judgment-heavy functions (containing Natural blocks), the type boundary moves inside the function. Accept flexible inputs at the entry point and let the Natural block interpret them into typed intermediates via write bindings:
320
330
 
321
- ### Binding function pattern
331
+ ```py
332
+ from pydantic import BaseModel
322
333
 
323
- ```python
324
- def helper(query: str) -> list[str]:
325
- """Fetch items matching the query."""
326
- ...
334
+ class ReviewVerdict(BaseModel):
335
+ approved: bool
336
+ reason: str
337
+ risk_level: str
327
338
 
328
339
  @nh.natural_function
329
- def process(query: str) -> str:
330
- result = ""
340
+ def judge_review(review_data: str | nh.JsonableValue) -> ReviewVerdict:
341
+ verdict: ReviewVerdict
331
342
  """natural
332
- Call <helper> with <query> and set <:result> to a summary of the results.
343
+ Analyze <review_data> and produce a structured <:verdict>.
333
344
  """
334
- return result
345
+ return verdict
335
346
  ```
336
347
 
348
+ ## 9. Common mistakes to avoid
349
+
350
+ | Mistake | Why it breaks | Fix |
351
+ |---|---|---|
352
+ | Pass a callable as a parameter with generic type (`object`, `Any`) | Signature erased in LOCALS; LLM cannot discover arguments | Reference via `<name>` read binding so it appears in GLOBALS with full signature |
353
+ | Use `<:carry>` (write binding) for mutable context | Rebinding breaks the caller's reference | Use `<carry>` (read binding); mutate in-place |
354
+ | Put two independent judgments in one block | Non-deterministic, hard to test, unclear contract | Split into two blocks connected by Python |
355
+ | Use Natural for deterministic computation | Wastes latency/cost, adds non-determinism | Use Python |
356
+ | Forget type annotations on write bindings | No validation or coercion at commit time | Always annotate `<:name>` bindings |
357
+ | Duplicate module-level constants as function parameters | Moves stable values from GLOBALS to LOCALS, wastes tokens | Reference via `<name>` read binding |
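The `<:carry>` row rests on ordinary Python semantics, which the plain-Python sketch below illustrates without any Nighthawk code. How Nighthawk commits a write binding is not shown; `rebind` and `mutate_in_place` are hypothetical stand-ins for rebinding a name versus mutating the shared object.

```python
def rebind(carry: list[str]) -> None:
    # Assigning to the parameter builds a new list; the caller's list is untouched.
    carry = carry + ["new entry"]


def mutate_in_place(carry: list[str]) -> None:
    # Appending changes the same object the caller holds a reference to.
    carry.append("new entry")


caller_context: list[str] = ["start"]

rebind(caller_context)
after_rebind = list(caller_context)  # snapshot after the rebinding attempt

mutate_in_place(caller_context)
after_mutation = list(caller_context)  # snapshot after the in-place change
```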
358
+
337
359
  ## References
338
360
 
339
361
  - [Tutorial](https://kurusugawa-computer.github.io/nighthawk-python/tutorial/) -- learn Nighthawk from first principles (human-oriented).
@@ -127,19 +127,7 @@ calculate_average([1, "2", "three", "cuatro", "五"]) # 3.0
127
127
 
128
128
  ## Natural blocks
129
129
 
130
- A Natural block is a Python docstring or a standalone string literal whose underlying string value begins with `natural\n`.
131
-
132
- Bindings:
133
-
134
- - `<name>` is a read binding.
135
- - `<:name>` is a write binding.
136
-
137
- Write bindings control which values are committed back into Python locals at Natural block boundaries.
138
-
139
- Interpolation:
140
-
141
- - Natural blocks are literal by default. Interpolation is opt-in via f-string syntax.
142
- - See [Tutorial Section 2](tutorial.md#2-providing-data-to-a-block) for details.
130
+ A Natural block is a Python docstring or a standalone string literal beginning with `natural\n`. Inside the block, `<name>` read bindings expose Python values to the LLM, and `<:name>` write bindings let the LLM commit values back into Python locals. Natural blocks are literal by default; interpolation is opt-in via f-string syntax. See the [Tutorial](tutorial.md#2-providing-data-to-a-block) for details.
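To make the two binding forms concrete, the sketch below extracts them from a block's text with a regular expression. It is an illustration only, not Nighthawk's parser: the pattern, the `split_bindings` helper, and the assumption that binding names are plain identifiers are all hypothetical.

```python
import re

# Assumption for illustration: a binding is <name> or <:name> with an identifier-like name.
BINDING_PATTERN = re.compile(r"<(:?)([A-Za-z_][A-Za-z0-9_]*)>")


def split_bindings(block: str) -> tuple[list[str], list[str]]:
    """Return (read_names, write_names), each in order of first appearance."""
    reads: list[str] = []
    writes: list[str] = []
    for colon, name in BINDING_PATTERN.findall(block):
        target = writes if colon else reads  # a leading colon marks a write binding
        if name not in target:
            target.append(name)
    return reads, writes


block = "natural\nRead <input_data> and set <:result> to the processed output.\n"
is_natural = block.startswith("natural\n")
reads, writes = split_bindings(block)
```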
143
131
 
144
132
  ## References
145
133
 
@@ -2,7 +2,7 @@
2
2
 
3
3
  Nighthawk delegates Natural block execution to an LLM. The model is selected through the `model` field of `StepExecutorConfiguration` using the `provider:model` format:
4
4
 
5
- ```python
5
+ ```py
6
6
  from nighthawk.configuration import StepExecutorConfiguration
7
7
 
8
8
  configuration = StepExecutorConfiguration(model="openai-responses:gpt-5-nano")
@@ -36,7 +36,7 @@ Any provider that [Pydantic AI supports](https://ai.pydantic.dev/models/overview
36
36
 
37
37
  Examples:
38
38
 
39
- ```python
39
+ ```py
40
40
  # OpenAI
41
41
  configuration = StepExecutorConfiguration(model="openai-responses:gpt-5-nano")
42
42
 
@@ -71,7 +71,7 @@ See the [Pydantic AI documentation](https://ai.pydantic.dev/models/overview/) fo
71
71
 
72
72
  Pydantic AI providers accept standard Pydantic AI model settings via the `model_settings` field:
73
73
 
74
- ```python
74
+ ```py
75
75
  configuration = StepExecutorConfiguration(
76
76
  model="openai-responses:gpt-5-nano",
77
77
  model_settings={"temperature": 0.5},
@@ -80,7 +80,7 @@ configuration = StepExecutorConfiguration(
80
80
 
81
81
  ## Coding agent backends
82
82
 
83
- The `claude-code-sdk`, `claude-code-cli`, and `codex` backends implement the Pydantic AI `Model` protocol internally but delegate inference to a coding agent CLI rather than a Pydantic AI provider. Install with `nighthawk[claude-code-sdk]`, `nighthawk[claude-code-cli]`, or `nighthawk[codex]`. See [Coding agent backends](coding-agent-backends.md) for configuration, skill behavior, and backend-specific settings.
83
+ The `claude-code-sdk`, `claude-code-cli`, and `codex` backends implement the Pydantic AI `Model` protocol internally but delegate inference to a coding agent CLI rather than a Pydantic AI provider. Install with `nighthawk-python[claude-code-sdk]`, `nighthawk-python[claude-code-cli]`, or `nighthawk-python[codex]`. See [Coding agent backends](coding-agent-backends.md) for configuration, skill behavior, and backend-specific settings.
84
84
 
85
85
  ## Custom backends
86
86
 
@@ -88,7 +88,7 @@ Nighthawk's `SyncStepExecutor` and `AsyncStepExecutor` protocols define the step
88
88
 
89
89
  For most cases, wrap a Pydantic AI `Agent` using `AgentStepExecutor`:
90
90
 
91
- ```python
91
+ ```py
92
92
  from pydantic_ai import Agent
93
93
  from nighthawk.runtime.step_executor import AgentStepExecutor
94
94
 
@@ -98,7 +98,7 @@ executor = AgentStepExecutor.from_agent(agent=agent)
98
98
 
99
99
  For full control, implement `AsyncStepExecutor` (or `SyncStepExecutor` for synchronous use) directly:
100
100
 
101
- ```python
101
+ ```py
102
102
  from nighthawk.runtime.step_executor import AsyncStepExecutor
103
103
  from nighthawk.runtime.step_context import StepContext
104
104
  from nighthawk.runtime.step_contract import StepOutcome
@@ -78,8 +78,11 @@ See [Providers](providers.md) for the default and recommended models.
78
78
  Credential configuration for Pydantic AI providers follows [Pydantic AI conventions](https://ai.pydantic.dev/models/overview/). Common environment variables:
79
79
 
80
80
  - `OPENAI_API_KEY` — required for OpenAI models ([details](https://ai.pydantic.dev/models/openai/))
81
+ - `ANTHROPIC_API_KEY` — required for Anthropic models ([details](https://ai.pydantic.dev/models/anthropic/))
81
82
  - `GOOGLE_API_KEY` — required for Google AI (Gemini API) models ([details](https://ai.pydantic.dev/models/gemini/))
82
83
  - Google Vertex AI uses Application Default Credentials, not an API key ([details](https://ai.pydantic.dev/models/gemini/#vertex-ai))
84
+ - AWS Bedrock uses AWS credentials, not an API key ([details](https://ai.pydantic.dev/models/bedrock/))
85
+ - `GROQ_API_KEY` — required for Groq models ([details](https://ai.pydantic.dev/models/groq/))
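A small pre-flight check can surface a missing credential before the first Natural block runs. The sketch below is not part of Nighthawk or Pydantic AI: `REQUIRED_KEY_BY_PROVIDER` and `missing_credential` are hypothetical, and the provider prefixes used as keys (other than `openai-responses`, which appears in these docs) are assumptions that should be checked against the Pydantic AI documentation.

```python
import os
from typing import Mapping, Optional

# Environment variable names come from the list above. The provider prefixes used as
# keys are assumptions (only "openai-responses" appears in these docs).
REQUIRED_KEY_BY_PROVIDER = {
    "openai-responses": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google-gla": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
}


def missing_credential(model: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the unset variable for a `provider:model` string, else None."""
    provider, _, _ = model.partition(":")
    variable = REQUIRED_KEY_BY_PROVIDER.get(provider)
    if variable is None:
        # Unknown provider, or one that does not use an API key (Vertex AI, Bedrock).
        return None
    return None if environ.get(variable) else variable


unset = missing_credential("openai-responses:gpt-5-nano", environ={})
present = missing_credential("openai-responses:gpt-5-nano", environ={"OPENAI_API_KEY": "sk-test"})
```

Passing `environ` explicitly keeps the helper testable without touching the real process environment.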
83
86
 
84
87
  ## Safety model
85
88
 
@@ -105,12 +108,3 @@ Set the environment variable before running: `export OPENAI_API_KEY=sk-xxxxxxxxx
105
108
 
106
109
  Install the required provider package. For Pydantic AI providers: `pip install pydantic-ai-slim[openai]`. For coding agent backends: `pip install nighthawk-python[claude-code-sdk]`.
107
110
 
108
- ## Next Steps
109
-
110
- - **[Tutorial](tutorial.md)** — Learn Nighthawk from first principles.
111
- - **[Providers](providers.md)** — LLM providers and configuration.
112
- - **[Coding agent backends](coding-agent-backends.md)** — Claude Code and Codex backend configuration.
113
- - **[Design](design.md)** — Canonical specification.
114
- - **[API Reference](api.md)** — Auto-generated API documentation.
115
- - **[Roadmap](roadmap.md)** — Future directions.
116
- - **[For coding agents](for-coding-agents.md)** — Nighthawk development guide for coding agents (LLM reference).
@@ -70,4 +70,4 @@ The f-string binding span validation uses a NUL byte (`\x00`) as a placeholder f
70
70
  ## Open questions
71
71
 
72
72
  - How to best represent tool results in the prompt for robust reasoning.
73
- - How to debug Natural blocks deterministically (unit testing is addressed via `TestModel`; debugging the LLM's reasoning path remains open).
73
+ - How to debug Natural blocks deterministically (unit testing is addressed via `nighthawk.testing`; debugging the LLM's reasoning path remains open).