nighthawk-python 0.6.1__tar.gz → 0.8.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (167)
  1. nighthawk_python-0.8.0/.claude/rules/src.md +17 -0
  2. nighthawk_python-0.8.0/.claude/rules/tests.md +13 -0
  3. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.github/workflows/publish.yml +1 -1
  4. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/CHANGELOG.md +28 -1
  5. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/CONTRIBUTING.md +1 -3
  6. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/PKG-INFO +8 -8
  7. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/README.md +4 -4
  8. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/api.md +6 -0
  9. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/for-coding-agents.md +15 -4
  10. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/index.md +4 -4
  11. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/patterns.md +52 -6
  12. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/philosophy.md +25 -19
  13. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/roadmap.md +21 -33
  14. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/runtime-configuration.md +41 -3
  15. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/specification.md +43 -6
  16. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/verification.md +9 -1
  17. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/prompts/eval_default.txt +1 -1
  18. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/provider.py +1 -1
  19. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/mkdocs.yml +16 -0
  20. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/pyproject.toml +4 -4
  21. nighthawk_python-0.8.0/src/AGENTS.md +1 -0
  22. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/__init__.py +4 -2
  23. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/configuration.py +9 -3
  24. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/resilience/__init__.py +8 -4
  25. nighthawk_python-0.8.0/src/nighthawk/resilience/_budget.py +419 -0
  26. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/resilience/_circuit_breaker.py +11 -1
  27. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/resilience/_fallback.py +1 -1
  28. nighthawk_python-0.8.0/src/nighthawk/resilience/_retry.py +306 -0
  29. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/resilience/_timeout.py +21 -3
  30. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/resilience/_vote.py +1 -1
  31. nighthawk_python-0.8.0/src/nighthawk/runtime/prompt.py +641 -0
  32. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/runner.py +14 -3
  33. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/scoping.py +99 -7
  34. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/step_context.py +6 -11
  35. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/step_executor.py +9 -2
  36. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/testing.py +3 -1
  37. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/contracts.py +1 -1
  38. nighthawk_python-0.8.0/tests/AGENTS.md +1 -0
  39. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/backends/test_codex.py +1 -1
  40. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/docs/test_coding_agent_examples.py +4 -1
  41. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/docs/test_prompt_examples.py +1 -1
  42. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/prompt_test_helpers.py +14 -5
  43. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_globals_prompt.py +55 -1
  44. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_runtime.py +50 -0
  45. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_variables_prompt.py +150 -0
  46. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/public/test_public_api.py +68 -0
  47. nighthawk_python-0.8.0/tests/public/test_usage_meter.py +79 -0
  48. nighthawk_python-0.8.0/tests/resilience/test_budget.py +464 -0
  49. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/test_circuit_breaker.py +61 -0
  50. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/test_composition.py +80 -0
  51. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/test_retry.py +77 -4
  52. nighthawk_python-0.8.0/tests/resilience/test_timeout.py +215 -0
  53. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/test_assignment_async.py +4 -4
  54. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/test_provided_async.py +1 -1
  55. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/test_tool_boundary.py +1 -1
  56. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/uv.lock +90 -90
  57. nighthawk_python-0.6.1/.claude/rules/coding.md +0 -14
  58. nighthawk_python-0.6.1/.claude/rules/tests.md +0 -27
  59. nighthawk_python-0.6.1/docs/philosophy.ja.md +0 -172
  60. nighthawk_python-0.6.1/src/nighthawk/resilience/_retry.py +0 -196
  61. nighthawk_python-0.6.1/src/nighthawk/runtime/prompt.py +0 -345
  62. nighthawk_python-0.6.1/tests/resilience/test_timeout.py +0 -120
  63. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.claude/rules/docs.md +0 -0
  64. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.claude/rules/promptfoo.md +0 -0
  65. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.claude/settings.json +0 -0
  66. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.claude/unset_envs.sh +0 -0
  67. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.devcontainer/Dockerfile.devcontainer +0 -0
  68. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.devcontainer/Dockerfile.litellm +0 -0
  69. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.devcontainer/devcontainer.json +0 -0
  70. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.devcontainer/docker-compose.yaml +0 -0
  71. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.devcontainer/litellm-config.yaml +0 -0
  72. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.github/dependabot.yml +0 -0
  73. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.github/workflows/ci.yml +0 -0
  74. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.github/workflows/docs.yml +0 -0
  75. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.gitignore +0 -0
  76. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.python-version +0 -0
  77. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/AGENTS.md +0 -0
  78. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/CLAUDE.md +0 -0
  79. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/LICENSE +0 -0
  80. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/AGENTS.md +0 -0
  81. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/assets/nighthawk_logo-128x128.png +0 -0
  82. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/coding-agent-backends.md +0 -0
  83. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/executors.md +0 -0
  84. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/natural-blocks.md +0 -0
  85. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/pydantic-ai-providers.md +0 -0
  86. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/quickstart.md +0 -0
  87. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/assertions/__init__.py +0 -0
  88. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/assertions/binding_value.py +0 -0
  89. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/assertions/outcome_kind.py +0 -0
  90. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/assertions/raise_message.py +0 -0
  91. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/evidence/2026-03-26-baseline-prompt-ab.md +0 -0
  92. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/evidence/2026-03-26-baseline-regression.md +0 -0
  93. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/evidence/2026-03-26-baseline-suffix-ab.md +0 -0
  94. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/promptfooconfig-agents.yaml +0 -0
  95. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/promptfooconfig-prompt-ab.yaml +0 -0
  96. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/promptfooconfig-suffix-ab.yaml +0 -0
  97. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/promptfooconfig.yaml +0 -0
  98. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/prompts/eval_coding_agent.txt +0 -0
  99. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/prompts/eval_mutation_aware.txt +0 -0
  100. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/prompts/eval_sequenced.txt +0 -0
  101. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/research-result.md +0 -0
  102. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/binding_operations.yaml +0 -0
  103. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/edge_cases.yaml +0 -0
  104. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/loop_outcomes.yaml +0 -0
  105. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/multi_step.yaml +0 -0
  106. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/null_handling.yaml +0 -0
  107. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/outcome_kinds.yaml +0 -0
  108. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/tool_selection.yaml +0 -0
  109. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/pyrightconfig.json +0 -0
  110. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/__init__.py +0 -0
  111. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/base.py +0 -0
  112. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/claude_code_cli.py +0 -0
  113. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/claude_code_sdk.py +0 -0
  114. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/claude_code_settings.py +0 -0
  115. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/codex.py +0 -0
  116. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/mcp_boundary.py +0 -0
  117. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/mcp_server.py +0 -0
  118. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/tool_bridge.py +0 -0
  119. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/errors.py +0 -0
  120. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/identifier_path.py +0 -0
  121. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/json_renderer.py +0 -0
  122. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/natural/__init__.py +0 -0
  123. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/natural/blocks.py +0 -0
  124. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/natural/decorator.py +0 -0
  125. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/natural/transform.py +0 -0
  126. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/__init__.py +0 -0
  127. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/async_bridge.py +0 -0
  128. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/step_contract.py +0 -0
  129. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/tool_calls.py +0 -0
  130. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/__init__.py +0 -0
  131. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/assignment.py +0 -0
  132. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/execution.py +0 -0
  133. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/provided.py +0 -0
  134. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/registry.py +0 -0
  135. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/ulid.py +0 -0
  136. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/__init__.py +0 -0
  137. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/backends/__init__.py +0 -0
  138. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/backends/test_claude_code_cli.py +0 -0
  139. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/backends/test_claude_code_sdk.py +0 -0
  140. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/conftest.py +0 -0
  141. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/docs/__init__.py +0 -0
  142. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/docs/test_docs_architecture.py +0 -0
  143. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/__init__.py +0 -0
  144. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/stub_executor.py +0 -0
  145. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_execution_outcome_prompt_fragment.py +0 -0
  146. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_infer_binding_types.py +0 -0
  147. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_natural_block_ordering.py +0 -0
  148. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_natural_traceback.py +0 -0
  149. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/__init__.py +0 -0
  150. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/skip_helpers.py +0 -0
  151. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/test_carry_pattern.py +0 -0
  152. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/test_claude_code_cli_integration.py +0 -0
  153. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/test_claude_code_sdk_integration.py +0 -0
  154. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/test_codex_integration.py +0 -0
  155. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/test_llm_integration.py +0 -0
  156. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/natural/__init__.py +0 -0
  157. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/natural/test_blocks.py +0 -0
  158. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/public/__init__.py +0 -0
  159. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/public/test_readme_example.py +0 -0
  160. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/__init__.py +0 -0
  161. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/test_fallback.py +0 -0
  162. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/test_vote.py +0 -0
  163. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/test_renderer.py +0 -0
  164. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/test_testing.py +0 -0
  165. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/__init__.py +0 -0
  166. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/test_contracts.py +0 -0
  167. {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/test_registry.py +0 -0
@@ -0,0 +1,17 @@
+ ---
+ paths:
+ - "src/**/*.py"
+ ---
+
+ # Coding standards
+
+ - Prefer concrete code. Add a new abstraction only when the same change uses it from non-test code.
+ - Default to module-private names. Export via `__all__` only for stable non-test consumers.
+ - If a change expands or changes public API, update or confirm `tests/public/`.
+ - Prefer async implementations in `runtime/` and `backends/`; keep sync bridges only at compatibility boundaries.
+ - Reuse the existing `NighthawkError` hierarchy before adding a new exception class.
+ - Prefer Pydantic (`BaseModel`, `TypeAdapter`) and Pydantic AI primitives over custom validation, parsing, schema, or agent/tool plumbing.
+ - Use `opentelemetry.trace` spans at run/scope/step/tool boundaries and `logging.getLogger("nighthawk")` for diagnostics. Do not import `logfire` in `src/`.
+ - Use PEP 695 `type` statements for new type aliases.
+ - Ask before adding a new `src/` subpackage for a single module.
+ - Follow `CONTRIBUTING.md` § Docstring Guide for docstring scope and format.
@@ -0,0 +1,13 @@
+ ---
+ paths:
+ - "tests/**"
+ ---
+
+ # Testing (pytest)
+
+ - Prefer deterministic pytest coverage by default. Use helpers from `nighthawk.testing` before reaching for live LLM calls.
+ - Use `tests/execution/stub_executor.py` only for envelope and runtime parser checks; prefer `nighthawk.testing` for normal Natural-function tests.
+ - Keep live-LLM tests in `tests/integration/` and behind the documented environment gates.
+ - For Python behavior changes, add or update pytest coverage in the same change and run `uv run pytest -q`.
+ - If a change affects public API or README examples, confirm `tests/public/`. If it affects docs examples or anchors, confirm `tests/docs/`.
+ - If a change affects prompt rendering, system prompt text, suffix generation, or tool exposure behavior, follow `.claude/rules/promptfoo.md`.
@@ -3,7 +3,7 @@ name: Publish to PyPI
  on:
  push:
  tags:
- - "v[0-9]+.[0-9]+.[0-9]+"
+ - "v[0-9]+.[0-9]+.[0-9]+*"
 
  permissions:
  contents: read
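The publish trigger's tag filter gains a trailing `*`. As a result, tags with a suffix after the patch number, such as a pre-release like `v0.8.0rc1`, should now also start the workflow. The sketch below shows the effect. It assumes GitHub's filter-pattern rules treat `[0-9]` as a character class, `+` as one or more of the preceding item, `*` as zero or more characters other than `/`, and everything else (including `.`) literally. `tag_filter_to_regex` is a hypothetical helper that covers only that subset.

```python
import re


def tag_filter_to_regex(pattern: str) -> re.Pattern[str]:
    """Hypothetical helper: translate a small subset of GitHub filter-pattern syntax.

    Assumed rules: `[...]` is a character class, `+` repeats the preceding item
    one or more times, `*` matches zero or more characters except `/`, and every
    other character is literal.
    """
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            end = pattern.index("]", i)  # copy the whole class, e.g. "[0-9]"
            parts.append(pattern[i : end + 1])
            i = end + 1
            continue
        if ch == "+":
            parts.append("+")
        elif ch == "*":
            parts.append("[^/]*")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


OLD = tag_filter_to_regex("v[0-9]+.[0-9]+.[0-9]+")
NEW = tag_filter_to_regex("v[0-9]+.[0-9]+.[0-9]+*")

for tag in ("v0.8.0", "v0.8.0rc1", "v1.0.0-beta.2"):
    # Prints the tag, whether the old filter matches, and whether the new one does.
    print(tag, bool(OLD.fullmatch(tag)), bool(NEW.fullmatch(tag)))
```

Under these assumptions, plain `vX.Y.Z` tags match both filters. Suffixed tags match only the new one.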
@@ -7,6 +7,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [0.8.0]
+
+ ### Added
+ - Unit tests covering prompt token-budget injection: system prompt resolves `$tool_result_max_tokens`, and custom user prompt templates can resolve the same placeholder.
+
+ ### Changed
+ - Default step system prompt now states that tool result `value` is a preview and includes the injected max-token limit.
+ - User prompt template rendering now uses `Template.safe_substitute`, aligned with system prompt injection behavior and compatible with optional `$tool_result_max_tokens` placeholders.
+
+ ## [0.7.0]
+
+ ### Added
+ - `nighthawk.UsageMeter`: run-scoped, thread-safe LLM token usage accumulator. Created automatically by `nh.run()` and readable via `nh.get_current_usage_meter()`.
+ - `nighthawk.resilience.budget` transformer: composable token and cost budget enforcement with pre-call and post-call checks. Parameters: `tokens`, `tokens_per_call`, `cost`, `cost_per_call`, `cost_function`, `estimate_usage`.
+ - `BudgetExceededError`, `BudgetLimitKind`, `CostFunction` supporting types.
+ - OpenTelemetry span event `nighthawk.resilience.budget.exceeded` and `nighthawk.resilience` logger warning on budget violation.
+ - Resilience OpenTelemetry events for retry/timeout/circuit paths: `nighthawk.resilience.retry.attempt`, `nighthawk.resilience.retry.exhausted`, `nighthawk.resilience.timeout.triggered`, `nighthawk.resilience.circuit.opened`.
+
+ ### Changed
+ - Project status promoted from Alpha to Beta.
+ - Updated one-line description.
+ - Removed "experimental" language from README and documentation.
+ - Updated PyPI keywords for improved discoverability.
+ - Generalized `StepContext` implicit references to value-based mappings (`implicit_reference_name_to_value`), and added additive scope injection via `nh.scope(implicit_references={...})` across nested scopes.
+
  ## [0.6.1]
 
  ### Added
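The 0.8.0 entry moves user prompt rendering to `Template.safe_substitute`. The standard-library difference between that method and `substitute` can be shown on its own. The template text here is invented for illustration; only the `$tool_result_max_tokens` placeholder name comes from the changelog.

```python
from string import Template

# Illustrative template; not nighthawk's actual prompt text.
template = Template("Answer briefly. Tool result previews hold at most $tool_result_max_tokens tokens.")

# substitute() insists on a value for every placeholder and raises KeyError otherwise.
try:
    template.substitute({})
    strict_raised = False
except KeyError:
    strict_raised = True

# safe_substitute() leaves an unresolved placeholder in place instead of raising.
unresolved = template.safe_substitute({})
resolved = template.safe_substitute(tool_result_max_tokens=2000)

print(strict_raised)  # True
print(resolved)
```

With `safe_substitute`, a placeholder the renderer has no value for stays in the text as written instead of aborting the render.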
@@ -102,7 +127,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  - Step executor abstraction and provider integration foundation.
  - Core documentation and project scaffolding.
 
- [Unreleased]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.6.1...HEAD
+ [Unreleased]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.8.0...HEAD
+ [0.8.0]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.7.0...v0.8.0
+ [0.7.0]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.6.1...v0.7.0
  [0.6.1]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.6.0...v0.6.1
  [0.6.0]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.5.0...v0.6.0
  [0.5.0]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.4.0...v0.5.0
@@ -27,15 +27,13 @@ uv run python
  # Format code
  uv run ruff format .
 
- # Lint (check / auto-fix)
- uv run ruff check .
+ # Lint (auto-fix)
  uv run ruff check --fix .
 
  # Type check
  uv run pyright
 
  # Run tests
- uv run pytest # full suite
  uv run pytest -q # quiet output
  ```
 
@@ -1,7 +1,7 @@
  Metadata-Version: 2.4
  Name: nighthawk-python
- Version: 0.6.1
- Summary: An experimental Python library that embeds Natural blocks inside Python functions and executes them using an LLM.
+ Version: 0.8.0
+ Summary: A Python library where Python controls flow and LLMs or coding agents reason within constrained Natural blocks.
  Project-URL: Repository, https://github.com/kurusugawa-computer/nighthawk-python
  Project-URL: Documentation, https://kurusugawa-computer.github.io/nighthawk-python/
  Project-URL: Changelog, https://github.com/kurusugawa-computer/nighthawk-python/blob/main/CHANGELOG.md
@@ -9,8 +9,8 @@ Project-URL: Bug Tracker, https://github.com/kurusugawa-computer/nighthawk-pytho
  Author-email: "Kurusugawa Computer Inc." <oss@kurusugawa.jp>
  License-Expression: MIT
  License-File: LICENSE
- Keywords: embedded-dsl,interoperability,llm,natural-language,pydantic-ai
- Classifier: Development Status :: 3 - Alpha
+ Keywords: agent,ai,anthropic,dsl,llm,natural-language,openai,prompt-engineering,pydantic-ai,structured-output
+ Classifier: Development Status :: 4 - Beta
  Classifier: Intended Audience :: Developers
  Classifier: License :: OSI Approved :: MIT License
  Classifier: Programming Language :: Python :: 3
@@ -44,12 +44,12 @@ Description-Content-Type: text/markdown
  <img src="https://github.com/kurusugawa-computer/nighthawk-python/raw/main/docs/assets/nighthawk_logo-128x128.png" alt="nighthawk-logo" width="128px" margin="10px"></img>
  </div>
 
- Nighthawk is an experimental Python library exploring a clear separation:
+ Nighthawk is a Python library where Python controls flow and LLMs or coding agents reason within constrained Natural blocks.
 
- - Use **hard control** (Python code) for strict procedure, verification, and deterministic flow.
- - Use **soft reasoning** (an LLM or coding agent) for semantic interpretation inside small embedded "Natural blocks".
+ - **Hard control** (Python code): strict procedure, verification, and deterministic flow.
+ - **Soft reasoning** (an LLM or coding agent): semantic interpretation inside small embedded "Natural blocks".
 
- Python controls all flow; the LLM or coding agent is constrained to small Natural blocks with explicit input/output boundaries. The same mechanism handles lightweight LLM judgments ("classify this sentiment") and autonomous agent executions ("refactor this module and write tests"). See **[Philosophy](https://kurusugawa-computer.github.io/nighthawk-python/philosophy/)** for the full design rationale.
+ The same mechanism handles lightweight LLM judgments ("classify this sentiment") and autonomous agent executions ("refactor this module and write tests"). See **[Philosophy](https://kurusugawa-computer.github.io/nighthawk-python/philosophy/)** for the full design rationale.
 
  This repository is a compact reimplementation of the core ideas of [Nightjar](https://github.com/psg-mit/nightjarpy).
 
@@ -9,12 +9,12 @@
  <img src="https://github.com/kurusugawa-computer/nighthawk-python/raw/main/docs/assets/nighthawk_logo-128x128.png" alt="nighthawk-logo" width="128px" margin="10px"></img>
  </div>
 
- Nighthawk is an experimental Python library exploring a clear separation:
+ Nighthawk is a Python library where Python controls flow and LLMs or coding agents reason within constrained Natural blocks.
 
- - Use **hard control** (Python code) for strict procedure, verification, and deterministic flow.
- - Use **soft reasoning** (an LLM or coding agent) for semantic interpretation inside small embedded "Natural blocks".
+ - **Hard control** (Python code): strict procedure, verification, and deterministic flow.
+ - **Soft reasoning** (an LLM or coding agent): semantic interpretation inside small embedded "Natural blocks".
 
- Python controls all flow; the LLM or coding agent is constrained to small Natural blocks with explicit input/output boundaries. The same mechanism handles lightweight LLM judgments ("classify this sentiment") and autonomous agent executions ("refactor this module and write tests"). See **[Philosophy](https://kurusugawa-computer.github.io/nighthawk-python/philosophy/)** for the full design rationale.
+ The same mechanism handles lightweight LLM judgments ("classify this sentiment") and autonomous agent executions ("refactor this module and write tests"). See **[Philosophy](https://kurusugawa-computer.github.io/nighthawk-python/philosophy/)** for the full design rationale.
 
  This repository is a compact reimplementation of the core ideas of [Nightjar](https://github.com/psg-mit/nightjarpy).
 
@@ -21,8 +21,10 @@
  - to_jsonable_value
  - ExecutionContext
  - get_current_step_context
+ - get_current_usage_meter
  - get_execution_context
  - get_step_executor
+ - UsageMeter
 
  ## Errors
 
@@ -106,6 +108,10 @@
  ::: nighthawk.resilience
  options:
  members:
+ - budget
+ - BudgetExceededError
+ - BudgetLimitKind
+ - CostFunction
  - retrying
  - timeout
  - fallback
@@ -109,6 +109,9 @@ deep_executor = nh.AgentStepExecutor.from_configuration(
  )
 
 
+ def search_repository(query: str) -> list[str]: ...
+
+
  @nh.natural_function
  def classify_ticket(text: str) -> str:
  label: str = ""
@@ -136,10 +139,16 @@ def write_analysis_report(ticket_text: str, product_context: str) -> str:
 
  with nh.run(fast_executor):
  label = classify_ticket(ticket_text)
- with nh.scope(step_executor=deep_executor):
+ with nh.scope(
+ step_executor=deep_executor,
+ implicit_references={"search_repository": search_repository},
+ ):
  report = write_analysis_report(ticket_text, product_summary)
  ```
 
+ `implicit_references` can inject global helper functions as block capabilities.
+ Nested scopes still merge additively (set union by key).
+
  ## 4. The standard contract shape
 
  Prefer the post-block logic pattern. Let the block write a typed value, then validate or transform it in Python.
@@ -264,6 +273,8 @@ Do not inject untrusted raw text into Natural source. If input is user-controlle
  Rules:
 
  - The model sees callable signatures from both LOCALS and GLOBALS.
+ - For object read bindings, the model also sees a capability view: object header, public methods (with signatures), and public fields (with typed previews).
+ - Object capability views expose public members only. Private/dunder members are omitted, and properties are not evaluated.
  - Put per-invocation data in function parameters. Put stable, reusable capabilities at module level.
  - Do not annotate callable parameters as `object` or `Any` -- this erases the signature the model needs:
 
@@ -311,7 +322,7 @@ Async rule:
 
  Resilience rule:
 
- - Keep retry, fallback, timeout, and circuit-breaker policy in Python, not inside Natural text.
+ - Keep retry, fallback, timeout, budget, and circuit-breaker policy in Python, not inside Natural text.
  - Import from `nighthawk.resilience` (not re-exported from `nighthawk`):
 
  ```py
@@ -323,11 +334,11 @@ with nh.run(executor):
  label = resilient_classify(ticket_text)
  ```
 
- See [Patterns: Resilience](https://kurusugawa-computer.github.io/nighthawk-python/patterns/#resilience-patterns) for `fallback`, `vote`, `timeout`, and `circuit_breaker`.
+ See [Patterns: Resilience](https://kurusugawa-computer.github.io/nighthawk-python/patterns/#resilience-patterns) for `fallback`, `vote`, `timeout`, `budget`, and `circuit_breaker`.
 
  ## 9. Context budget discipline
 
- Prompt context is finite. When you see `<snipped>`, fix in this order:
+ Prompt context is finite. When you see `<snipped>`, the marked data is truncated from the prompt but remains in Python memory -- the model can still reach it through binding functions. Fix context pressure in this order:
 
  1. Remove irrelevant locals and globals from the function scope.
  2. Split the block into smaller, focused blocks.
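The revised section 9 says `<snipped>` marks data cut from the prompt while the full value stays in Python memory. The sketch below separates a bounded preview from the untouched value. It assumes a rough four-characters-per-token heuristic; nighthawk's actual token counting and marker placement are not shown in this diff. `render_preview` is a hypothetical name.

```python
def render_preview(value: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Hypothetical preview renderer: cut `value` to an approximate token budget."""
    # Assumed heuristic: about four characters per token.
    limit = max_tokens * chars_per_token
    if len(value) <= limit:
        return value
    return value[:limit] + "<snipped>"


document = "x" * 1_000  # the full value, kept in Python memory
preview = render_preview(document, max_tokens=50)  # what the prompt would carry
short = render_preview("ok", max_tokens=50)  # short values pass through unchanged
```

Only `preview` is bounded. `document` is never modified, which is why a binding function can still hand the model the complete value.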
@@ -4,12 +4,12 @@
  <img src="assets/nighthawk_logo-128x128.png" alt="logo" width="128px">
  </div>
 
- Nighthawk is an experimental Python library exploring a clear separation:
+ Nighthawk is a Python library where Python controls flow and LLMs or coding agents reason within constrained Natural blocks.
 
- - Use **hard control** (Python code) for strict procedure, verification, and deterministic flow.
- - Use **soft reasoning** (an LLM or coding agent) for semantic interpretation inside small embedded "Natural blocks".
+ - **Hard control** (Python code): strict procedure, verification, and deterministic flow.
+ - **Soft reasoning** (an LLM or coding agent): semantic interpretation inside small embedded "Natural blocks".
 
- Python controls all flow; the LLM or coding agent is constrained to small Natural blocks with explicit input/output boundaries. The same mechanism handles lightweight LLM judgments ("classify this sentiment") and autonomous agent executions ("refactor this module and write tests").
+ The same mechanism handles lightweight LLM judgments ("classify this sentiment") and autonomous agent executions ("refactor this module and write tests").
 
  ```py
  import nighthawk as nh
@@ -323,7 +323,7 @@ def compute_with_context(context_text: str) -> int:
  Natural blocks are non-deterministic by nature. Production deployments need strategies to handle transient failures, unstable outputs, and provider outages. The `nighthawk.resilience` module provides composable **function transformers** -- each takes a callable and returns a new callable with the same signature.
 
  ```py
- from nighthawk.resilience import retrying, fallback, vote, timeout, circuit_breaker
+ from nighthawk.resilience import retrying, fallback, vote, timeout, budget, circuit_breaker
  ```
 
  Import directly from `nighthawk.resilience`. Resilience primitives are not re-exported from the top-level `nighthawk` namespace.
@@ -349,7 +349,12 @@ for attempt in retrying(attempts=3):
  result = classify(text)
  ```
 
- Customize which exceptions trigger retries and the backoff strategy:
+ `retrying` separates retry control into four roles:
+
+ - `on`: type-level retry eligibility.
+ - `retry_if`: content-level retry eligibility after `on` matches.
+ - `wait`: retry interval strategy.
+ - `on_retry`: side-effect hook when a retry is decided.
 
  ```py
  from tenacity import wait_fixed
@@ -357,10 +362,14 @@ from tenacity import wait_fixed
  resilient = retrying(
      attempts=5,
      on=(ExecutionError, TimeoutError),
+     retry_if=lambda exception: "transient" in str(exception).lower(),
      wait=wait_fixed(2),
+     on_retry=lambda retry_state: logger.info("retrying", extra={"attempt": retry_state.attempt_number}),
  )(classify)
  ```
 
+ Use only what you need. For most cases, `retrying(attempts=3)(fn)` is enough.
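The division of labor between the four roles can be illustrated with a self-contained sketch. `retrying_sketch` is a hypothetical name, not part of `nighthawk.resilience`, and it simplifies two things: `wait` here is a plain function of the attempt number, and `on_retry` receives the attempt number and the exception, whereas the example above passes tenacity's `wait_fixed` and a callback that reads `retry_state.attempt_number`.

```python
import functools
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retrying_sketch(
    attempts: int = 3,
    on: Tuple[Type[BaseException], ...] = (Exception,),
    retry_if: Callable[[BaseException], bool] = lambda exc: True,
    wait: Callable[[int], float] = lambda attempt: 0.0,
    on_retry: Callable[[int, BaseException], None] = lambda attempt, exc: None,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Hypothetical sketch of the four retry roles; not Nighthawk's implementation."""

    def transform(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except on as exc:  # `on`: only these exception types are candidates
                    # `retry_if`: content check, consulted only after `on` matched
                    if attempt == attempts or not retry_if(exc):
                        raise
                    on_retry(attempt, exc)  # `on_retry`: side effect once a retry is decided
                    time.sleep(wait(attempt))  # `wait`: pause before the next attempt
            raise ValueError("attempts must be at least 1")

        return wrapper

    return transform


# Usage: a fake classifier that fails twice with a transient error, then succeeds.
calls = {"count": 0}
retried_attempts = []


def flaky_classify(text: str) -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("Transient provider hiccup")
    return "positive"


resilient = retrying_sketch(
    attempts=5,
    on=(TimeoutError,),
    retry_if=lambda exc: "transient" in str(exc).lower(),
    on_retry=lambda attempt, exc: retried_attempts.append(attempt),
)(flaky_classify)

label = resilient("Great product")  # "positive" after two retries
```

Keeping `on` and `retry_if` separate means the type filter lets unrelated exceptions (a `KeyError` from a bug, for example) propagate immediately, before any message inspection happens.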
+
  ### Fallback
 
  Try multiple functions in order. The first success wins.
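A first-success chain needs only a loop. `fallback_sketch` below is a hypothetical stand-in, not Nighthawk's `fallback`; it assumes that only exception types listed in `on` advance to the next function, and that the last error is re-raised when every function fails.

```python
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def fallback_sketch(
    *candidates: Callable[..., T],
    on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Callable[..., T]:
    """Hypothetical first-success chain; not Nighthawk's implementation."""
    if not candidates:
        raise ValueError("fallback_sketch needs at least one function")

    def wrapper(*args, **kwargs) -> T:
        last_error: Optional[BaseException] = None
        for candidate in candidates:
            try:
                return candidate(*args, **kwargs)
            except on as exc:  # listed types move on to the next candidate
                last_error = exc
        assert last_error is not None
        raise last_error  # every candidate failed: surface the last error

    return wrapper


def classify_primary(text: str) -> str:
    raise TimeoutError("primary provider unavailable")


def classify_backup(text: str) -> str:
    return "neutral"


classify = fallback_sketch(classify_primary, classify_backup, on=(TimeoutError,))
label = classify("The package arrived on Tuesday.")  # "neutral", from the backup
```

An exception type missing from `on` (a `ValueError` from a bug, say) escapes the chain at once instead of silently falling through to the next function.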
@@ -405,6 +414,42 @@ async with timeout(seconds=30):
      result = await slow_operation()
  ```
 
+ ### Budget
+
+ Enforce token or monetary cost limits on wrapped functions. Limits take effect only inside an active `nh.run()` context, where the run-scoped `UsageMeter` tracks cumulative usage automatically.
+
+ ```py
+ from nighthawk.resilience import budget
+
+ safe_classify = budget(tokens=50_000, tokens_per_call=5_000)(classify)
+ result = safe_classify(text)
+ ```
+
+ `tokens` caps cumulative usage across all calls; `tokens_per_call` caps a single call. Both are checked before and after each invocation. When a limit is breached, `BudgetExceededError` is raised -- combine with `fallback` to degrade gracefully:
+
+ ```py
+ from nighthawk.resilience import budget, fallback, BudgetExceededError
+
+ composed = fallback(
+     budget(tokens=50_000)(classify_gpt4),
+     classify_mini,
+     on=(BudgetExceededError,),
+ )
+ ```
+
+ For monetary budgets, supply a `cost_function` that converts `RunUsage` to a float:
+
+ ```py
+ from pydantic_ai.usage import RunUsage
+
+ def dollar_cost(usage: RunUsage) -> float:
+     return usage.input_tokens * 3e-6 + usage.output_tokens * 15e-6
+
+ budgeted = budget(cost=1.00, cost_function=dollar_cost)(classify)
+ ```
+
+ Outside a `nh.run()` context, the transformer is a no-op.
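The interplay of cumulative and per-call caps can be sketched without Nighthawk. Every name below (`budget_sketch`, `UsageSketch`, `BudgetExceededSketch`) is hypothetical. The sketch assumes usage is recorded on a meter passed in explicitly, whereas the real transformer reads the run-scoped `UsageMeter` from the active `nh.run()` context. It also checks only the cumulative caps before a call, because a single call's usage is unknown until the call returns.

```python
import functools
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class BudgetExceededSketch(Exception):
    """Hypothetical stand-in for BudgetExceededError."""


@dataclass
class UsageSketch:
    """Hypothetical stand-in for RunUsage, tracking only token counts."""

    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def budget_sketch(
    meter: UsageSketch,
    tokens: Optional[int] = None,
    tokens_per_call: Optional[int] = None,
    cost: Optional[float] = None,
    cost_function: Optional[Callable[[UsageSketch], float]] = None,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    if cost is not None and cost_function is None:
        raise ValueError("a cost limit needs a cost_function")

    def spent() -> float:
        assert cost_function is not None
        return cost_function(meter)

    def transform(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            # Before the call: refuse to start once a cumulative cap is used up.
            if tokens is not None and meter.total_tokens >= tokens:
                raise BudgetExceededSketch("token budget already used up")
            if cost is not None and spent() >= cost:
                raise BudgetExceededSketch("cost budget already used up")
            before = meter.total_tokens
            result = fn(*args, **kwargs)
            # After the call: check this call's usage and the new cumulative totals.
            if tokens_per_call is not None and meter.total_tokens - before > tokens_per_call:
                raise BudgetExceededSketch("single call exceeded tokens_per_call")
            if tokens is not None and meter.total_tokens > tokens:
                raise BudgetExceededSketch("cumulative token budget exceeded")
            if cost is not None and spent() > cost:
                raise BudgetExceededSketch("cost budget exceeded")
            return result

        return wrapper

    return transform


# Usage: a fake classifier that records 4,000 input and 1,000 output tokens per call.
meter = UsageSketch()


def fake_classify(text: str) -> str:
    meter.input_tokens += 4_000
    meter.output_tokens += 1_000
    return "positive"


safe_classify = budget_sketch(meter, tokens=12_000, tokens_per_call=5_000)(fake_classify)

outcomes = []
for _ in range(4):
    try:
        outcomes.append(safe_classify("Great product"))
    except BudgetExceededSketch as exc:
        outcomes.append(str(exc))


def dollar_cost(usage: UsageSketch) -> float:
    # Same illustrative per-token prices as the `dollar_cost` example above.
    return usage.input_tokens * 3e-6 + usage.output_tokens * 15e-6


one_call_cost = dollar_cost(UsageSketch(input_tokens=4_000, output_tokens=1_000))  # about 0.027
```

The third call shows the limit of checking around a call: it runs, pushes the total to 15,000 tokens, and only then raises, so those tokens are already spent. The before-call check is what stops the fourth call from running at all.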
+
  ### Circuit breaker
 
  Prevent repeated calls to a failing service. After `fail_threshold` consecutive failures, the circuit opens and rejects calls immediately with `CircuitOpenError`. After `reset_timeout` seconds, one probe call is allowed.
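The state machine described above fits in a few lines. `circuit_breaker_sketch` and `CircuitOpenSketch` are hypothetical names, not Nighthawk API. The sketch assumes that any exception counts as a failure, that a failed probe re-opens the circuit and restarts the timer, and that callers are single-threaded (concurrent callers could each be let through as probes). The injectable `clock` exists only to make the example deterministic.

```python
import functools
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenSketch(Exception):
    """Hypothetical stand-in for CircuitOpenError."""


def circuit_breaker_sketch(
    fail_threshold: int,
    reset_timeout: float,
    clock: Callable[[], float] = time.monotonic,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Hypothetical single-threaded sketch; not Nighthawk's implementation."""

    def transform(fn: Callable[..., T]) -> Callable[..., T]:
        failures = 0
        opened_at: Optional[float] = None

        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            nonlocal failures, opened_at
            if opened_at is not None and clock() - opened_at < reset_timeout:
                raise CircuitOpenSketch("circuit open: call rejected without running")
            # Either closed, or open with reset_timeout elapsed (this call is the probe).
            try:
                result = fn(*args, **kwargs)
            except Exception:
                failures += 1
                if opened_at is not None or failures >= fail_threshold:
                    opened_at = clock()  # open, or re-open after a failed probe
                raise
            failures, opened_at = 0, None  # any success closes the circuit
            return result

        return wrapper

    return transform


# Usage with a fake clock: the provider is down until t=10.
now = [0.0]
attempt_times = []


def flaky_provider(prompt: str) -> str:
    attempt_times.append(now[0])
    if now[0] < 10:
        raise ConnectionError("provider down")
    return "ok"


guarded = circuit_breaker_sketch(fail_threshold=3, reset_timeout=30, clock=lambda: now[0])(flaky_provider)

outcomes = []
for t in [0, 1, 2, 3, 20, 33]:
    now[0] = float(t)
    try:
        outcomes.append(guarded("ping"))
    except ConnectionError:
        outcomes.append("failed")
    except CircuitOpenSketch:
        outcomes.append("rejected")
```

The calls at t=3 and t=20 never reach the provider; the call at t=33 is the single probe, and its success closes the circuit.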
@@ -440,10 +485,11 @@ Recommended composition order (innermost to outermost):
  | Order | Transformer | Why |
  |---|---|---|
  | 1 | `timeout` | Bound each individual call |
- | 2 | `vote` | Aggregate multiple bounded calls |
- | 3 | `retrying` | Retry the aggregated operation |
- | 4 | `circuit_breaker` | Protect against persistent failure |
- | 5 | `fallback` | Switch to alternative on exhaustion |
+ | 2 | `budget` | Cap token or monetary cost |
+ | 3 | `vote` | Aggregate multiple bounded calls |
+ | 4 | `retrying` | Retry the aggregated operation |
+ | 5 | `circuit_breaker` | Protect against persistent failure |
+ | 6 | `fallback` | Switch to alternative on exhaustion |
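"Innermost to outermost" means the first row is applied to the bare function and each later row wraps the result. The sketch below makes the resulting call order visible using a hypothetical `traced` transformer in place of the real ones, which keeps it runnable without guessing at their parameters.

```python
import functools
from typing import Callable, List

trace: List[str] = []


def traced(name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Hypothetical transformer that only records when it is entered and exited."""

    def transform(fn: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> str:
            trace.append(f"enter {name}")
            try:
                return fn(*args, **kwargs)
            finally:
                trace.append(f"exit {name}")

        return wrapper

    return transform


def classify(text: str) -> str:
    trace.append("call classify")
    return "positive"


# Wrap innermost first, in the table's order (1 = innermost, 6 = outermost).
composed = classify
for name in ["timeout", "budget", "vote", "retrying", "circuit_breaker", "fallback"]:
    composed = traced(name)(composed)

result = composed("Great product")
```

The outermost `fallback` is entered first and `timeout` last, so every individual call is time-bounded; and because `retrying` sits outside `vote`, a retry re-runs the whole vote rather than a single sample.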
 
  ### Caching LLM results
 
@@ -1,10 +1,10 @@
  # Philosophy
 
- Python controls orchestration; the LLM operates inside typed blocks with explicit state transfer.
+ Python owns the control flow. The LLM works inside typed blocks, receiving inputs and returning outputs through explicit bindings.
 
  ## Execution model
 
- Nighthawk embeds Natural blocks inside ordinary Python functions. Each block is a typed boundary: read bindings (`<name>`) inject input state from Python variables, write bindings (`<:name>`) commit output state back with type validation, and binding functions give the LLM composable access to Python callables during block execution. Python controls the sequencing -- loops, conditionals, error handling, retries -- and the LLM operates inside each block with no implicit message history carried across blocks.
+ Nighthawk embeds Natural blocks inside ordinary Python functions. Each block has a typed boundary. Read bindings (`<name>`) pass Python values in. Write bindings (`<:name>`) pass results back out, validated against their type annotations. Binding functions let the LLM call Python functions during execution. Python controls the sequencing -- loops, conditionals, error handling, retries -- and the LLM operates inside each block with no implicit message history carried across blocks.
 
  ```py
  def python_average(numbers):
@@ -23,7 +23,7 @@ calculate_average([1, "2", "three", "cuatro", "五"]) # 3.0
 
  Binding functions like `<python_average>` appear in the prompt as a compact signature line. The LLM's pre-trained Python knowledge lets it reason about types, return values, and composition from the signature alone, without JSON Schema or protocol overhead. See [Tool exposure efficiency](#tool-exposure-efficiency) for the quantitative comparison with MCP and CLI tool exposure.
 
- With provider-backed executors, each Natural block is a single LLM call where typed bindings do the heavy lifting. A sentiment classifier whose write binding is typed as `Literal["positive", "negative", "neutral"]` will reject any output that falls outside the declared set -- the type annotation is not a hint but a runtime-enforced contract via Pydantic validation. The same mechanism applies to numeric extraction (`int`, `float`), structured parsing (Pydantic models), and any task where the judgment space is bounded. Because the host program owns the loop, a misclassified result can be retried, logged, or routed to a fallback -- all in ordinary Python.
+ With provider-backed executors, each Natural block is a single LLM call. A sentiment classifier whose write binding is typed as `Literal["positive", "negative", "neutral"]` rejects any output outside the declared set: the annotation is not a hint but a contract that Pydantic enforces at runtime. The same mechanism applies to numeric extraction (`int`, `float`), structured parsing (Pydantic models), and any task where the judgment space is bounded. Because the host program owns the loop, a misclassified result can be retried, logged, or routed to a fallback -- all in ordinary Python.
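The contract can be illustrated without a provider. The sketch below swaps Pydantic for a plain membership check built from `typing.get_args` and uses a fake model; `validate_sentiment`, `classify_with_retries`, and `InvalidOutput` are hypothetical names, not Nighthawk API. It shows both halves of the paragraph above: out-of-set output is rejected, and the surrounding Python decides to retry.

```python
from typing import Callable, Literal, Optional, get_args

Sentiment = Literal["positive", "negative", "neutral"]
ALLOWED = frozenset(get_args(Sentiment))


class InvalidOutput(ValueError):
    """Hypothetical error for an answer outside the declared set."""


def validate_sentiment(raw: str) -> Sentiment:
    # Plain-Python stand-in for Pydantic validation against the Literal type.
    if raw not in ALLOWED:
        raise InvalidOutput(f"{raw!r} is not one of {sorted(ALLOWED)}")
    return raw  # type: ignore[return-value]


def classify_with_retries(llm: Callable[[str], str], text: str, attempts: int = 3) -> Sentiment:
    """Host-owned loop: the Python caller decides what happens to invalid output."""
    last_error: Optional[InvalidOutput] = None
    for _ in range(attempts):
        try:
            return validate_sentiment(llm(text))
        except InvalidOutput as exc:
            last_error = exc  # could also log here or route to a fallback
    assert last_error is not None, "attempts must be at least 1"
    raise last_error


# Usage: a fake model that first answers outside the set, then inside it.
answers = iter(["mixed", "negative"])
label = classify_with_retries(lambda text: next(answers), "Fast shipping, flimsy box.")
```

With Pydantic v2, `pydantic.TypeAdapter(Sentiment).validate_python(raw)` performs the equivalent check and raises `pydantic.ValidationError` on failure.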
 
  With [coding agent backends](coding-agent-backends.md), the same boundary contract applies, but each Natural block becomes an autonomous agent execution. The agent can read files, run commands, and invoke skills -- while typed bindings enforce what crosses the boundary back to Python. The same `scope()` and `run()` context managers that structure human-written workflows are equally legible to a coding agent constructing workflows programmatically. When a coding agent operates inside a Natural block, binding functions appear as Python signatures in the prompt:
 
@@ -32,13 +32,17 @@ fetch_items: (category: str, limit: int = 10) -> list[Item]
  merge_results: (primary: list[Item], secondary: list[Item]) -> list[Item]
  ```
 
- The underlying LLM's pre-trained Python knowledge lets it infer that `Item` has attributes, that the return value supports iteration and indexing, and that `merge_results` accepts the output of `fetch_items` directly -- all from the type annotations alone. An equivalent CLI tool description (`fetch-items --category X --limit 10`) conveys invocation syntax but not output structure; the model must infer or discover the output format separately.
+ The underlying LLM's pre-trained Python knowledge lets it infer that `Item` has attributes, that the return value supports iteration and indexing, and that `merge_results` accepts the output of `fetch_items` directly -- all from the type annotations alone. A CLI tool description (`fetch-items --category X --limit 10`) is optimized for invocation syntax; output structure is left to the model's training data.
 
- Coding agent backends make this especially practical because the agent can immediately apply that inferred structure while reading workflow code, invoking tools, editing implementations, running `pytest`, and iterating within the same Python codebase. No framework-specific tooling, no graph serialization format, no separate configuration language.
+ Coding agent backends make this especially practical because the agent can immediately apply that inferred structure while reading workflow code, invoking tools, editing implementations, running `pytest`, and iterating within the same Python codebase. The agent works directly in Python with standard tooling -- debugger, test runner, type checker -- rather than through a separate orchestration layer.
+
+ When the rendered prompt would exceed its token limit, the runtime omits the remaining context entries and appends a `<snipped>` marker. The underlying data stays in Python memory, so binding functions can still query it at runtime. Truncation shortens the rendered prompt; it never discards data.
+
+ Because each Natural block is a fresh prompt with no implicit history, the entire prompt surface -- block text (including f-string interpolation), bindings, and scope configuration -- is determined by the host program at each invocation. Changing any of these between invocations has no side effects on other blocks.
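The snip-but-keep behavior described above can be sketched as follows. `render_context`, its `name = repr(value)` line format, and the word-count token estimate are all hypothetical simplifications; Nighthawk's actual rendering and tokenization may differ. The point is that the limit applies to the rendered text while the dictionary itself stays untouched.

```python
from typing import Callable, Dict, List


def render_context(
    entries: Dict[str, object],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> str:
    """Render `name = repr(value)` lines until the budget runs out, then add `<snipped>`."""
    lines: List[str] = []
    used = 0
    for name, value in entries.items():
        line = f"{name} = {value!r}"
        cost = count_tokens(line)  # crude word count standing in for a real tokenizer
        if used + cost > max_tokens:
            lines.append("<snipped>")  # remaining entries are omitted from the prompt only
            break
        lines.append(line)
        used += cost
    return "\n".join(lines)


# Usage: the long `history` entry does not fit, but it is still in `state`.
state = {"user": "ana", "order_id": 1042, "history": list(range(50))}
prompt_context = render_context(state, max_tokens=10)
```

Here `prompt_context` holds the `user` and `order_id` lines followed by `<snipped>`, while `state["history"]` still has all 50 items for any binding function that needs them.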
 
  ## The harness matters more than the model
 
- The strongest direct evidence comes from agentic coding tasks; extending the principle to provider-backed judgments is a design inference, not a measured claim.
+ The strongest direct evidence comes from agentic coding tasks. The subsections below separate what has been measured from where Nighthawk extends the principle.
 
  ### Observed evidence
 
@@ -50,27 +54,29 @@ The direct evidence concerns LLM-driven code editing and file management tasks,
 
  ### Design inference for Nighthawk
 
- Extending the principle to provider-backed lightweight judgments (sentiment classification, numeric interpretation) is a design inference, not an empirical claim: typed bindings structurally constrain hallucination, and resilience transformers absorb transient failures, but these benefits have not been independently measured in the same controlled fashion.
+ We think the same principle applies to provider-backed judgments like sentiment classification and numeric interpretation, but we have not measured it directly. Typed bindings limit what the LLM can return, and resilience transformers handle transient failures -- both should help, but neither has been tested in the same controlled way as the coding-task evidence above.
 
- Regardless of scope, the practical question is how harness improvements are expressed. Configuration-file-based guardrail systems -- rule files, lifecycle hooks, permission modes, tool filtering -- are effective for restricting behavior but cannot express dynamic orchestration: conditional retry strategies, type-level input/output contracts, scope-dependent tool visibility, or prompts that adapt to runtime state. The constraint vocabulary is limited to what the configuration format allows.
+ Regardless of scope, the practical question is how harness improvements are expressed. Configuration-file guardrails -- rule files, lifecycle hooks, permission modes, tool filtering -- are effective at restricting behavior. They are optimized for static constraints. Dynamic orchestration (conditional retries, typed input/output contracts, scope-dependent tool visibility, prompts that adapt at runtime) requires a programming language, which is where Nighthawk's Python-first approach fits.
 
- The primitives described in the [Execution model](#execution-model) and the following sections -- typed bindings, resilience transformers, scoped execution contexts -- are Nighthawk's implementation of the principle, through Python programming rather than configuration.
+ The primitives described in the [Execution model](#execution-model) and the following sections -- typed bindings, resilience transformers, scoped execution contexts -- are how Nighthawk implements the principle in Python.
 
  ## Design consequences
 
- The execution model introduced typed bindings as the boundary mechanism between Python and LLM reasoning. The following subsections explore what design consequences follow from that choice -- from resilience and scoping to tool exposure, multi-agent coordination, and the tradeoffs the design accepts.
+ The sections below explore what follows from the typed-binding execution model: resilience, scoping, tool exposure, multi-agent coordination, and the tradeoffs the design accepts.
 
  ### Resilience as composable functions
 
- Production LLM applications need strategies for handling transient failures, unstable outputs, and provider outages. Workflow engines build retry, checkpointing, and human-in-the-loop into the graph runtime -- resilience is inseparable from the orchestration layer. Nighthawk takes a different approach: resilience primitives (`nighthawk.resilience`) are ordinary Python function transformers that wrap any callable. Each transformer takes a function and returns a new function with the same signature. Retry, fallback, voting, timeout, and circuit breaker logic composes by nesting -- no graph DSL, no framework-managed state, and no implicit retry policy. The host controls exactly which calls are retried, how many times, and what happens on exhaustion -- using the same Python debugger, pytest, and code review workflows as the rest of the application. This applies equally to lightweight provider-backed judgments and autonomous agent executions. See [Patterns](patterns.md#resilience-patterns) for usage patterns and composition examples.
+ Production LLM applications need strategies for transient failures, unstable outputs, and provider outages. Workflow engines build retry, checkpointing, and human-in-the-loop into the graph runtime. Nighthawk takes a different approach: resilience primitives (`nighthawk.resilience`) are ordinary Python function transformers that wrap any callable. Each transformer takes a function and returns a new function with the same signature. Retry, fallback, voting, timeout, budget, and circuit breaker logic composes by nesting -- no graph DSL, no framework-managed state, and no implicit retry policy. The host controls exactly which calls are retried, how many times, and what happens on exhaustion -- using the same Python debugger, pytest, and code review workflows as the rest of the application. This applies equally to lightweight provider-backed judgments and autonomous agent executions. See [Patterns](patterns.md#resilience-patterns) for usage patterns and composition examples.
 
  ### Scoped execution contexts
 
- `run()` establishes the execution boundary: it links a step executor to the current context as an explicit Python `with` statement rather than as a global configuration or implicit thread-local. `scope()` narrows configuration within an existing run -- model override, prompt suffix, or executor replacement -- each taking effect only within the nested `with` block. Nesting is natural Python lexical scoping: the host program's control flow, not a framework runtime, determines which configuration is active at any point. This connects directly to the philosophy that runtime behavior should live in Python structures rather than in prose-only instructions or static configuration. See [Runtime configuration](runtime-configuration.md) for details and examples.
+ `run()` establishes the execution boundary: it links a step executor to the current context as an explicit Python `with` statement rather than as a global configuration or implicit thread-local. `scope()` narrows configuration within an existing run -- model override, prompt suffix, or executor replacement -- each taking effect only within the nested `with` block. Nesting is natural Python lexical scoping: the host program's control flow, not a framework runtime, determines which configuration is active at any point. Runtime behavior lives in Python structures rather than in prose-only instructions or static configuration. See [Runtime configuration](runtime-configuration.md) for details and examples.
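The nesting described above can be sketched with the standard library's `contextvars` and `contextlib.contextmanager`. `run_sketch`, `scope_sketch`, `ConfigSketch`, and the model names are hypothetical. This is not Nighthawk's `run()`/`scope()`: it covers only a model override and a prompt suffix (assumed here to be appended to any outer suffix) and omits executor replacement.

```python
import contextvars
from contextlib import contextmanager
from dataclasses import dataclass, replace
from typing import Iterator, Optional


@dataclass(frozen=True)
class ConfigSketch:
    model: str
    prompt_suffix: str = ""


_active: "contextvars.ContextVar[Optional[ConfigSketch]]" = contextvars.ContextVar(
    "active_config", default=None
)


@contextmanager
def run_sketch(model: str) -> Iterator[ConfigSketch]:
    """Establish the base configuration for the enclosed block."""
    config = ConfigSketch(model=model)
    token = _active.set(config)
    try:
        yield config
    finally:
        _active.reset(token)  # restore whatever was active before


@contextmanager
def scope_sketch(*, model: Optional[str] = None, prompt_suffix: Optional[str] = None) -> Iterator[ConfigSketch]:
    """Narrow the active configuration; the override ends with the `with` block."""
    current = _active.get()
    if current is None:
        raise RuntimeError("scope_sketch() requires an enclosing run_sketch()")
    narrowed = replace(
        current,
        model=model if model is not None else current.model,
        prompt_suffix=current.prompt_suffix + (prompt_suffix or ""),
    )
    token = _active.set(narrowed)
    try:
        yield narrowed
    finally:
        _active.reset(token)


def active_model() -> Optional[str]:
    config = _active.get()
    return config.model if config is not None else None


# Usage: the override is visible only inside the nested block.
observed = []
with run_sketch(model="model-a"):
    observed.append(active_model())
    with scope_sketch(model="model-b", prompt_suffix=" Answer briefly.") as inner:
        observed.append(active_model())
        inner_suffix = inner.prompt_suffix
    observed.append(active_model())
observed.append(active_model())
```

Because the active configuration lives in a `ContextVar` rather than a module-level global, an override set inside one asyncio task does not leak into other tasks.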
 
  ### Tool exposure efficiency
 
- Because binding functions are Python signatures rather than JSON Schema objects or CLI descriptions, the per-tool context cost is on the order of a single signature line. MCP tool definitions carry per-request JSON Schema overhead that grows with the number of exposed tools. CLI tools reduce definition overhead but carry hidden costs -- Mario Zechner's [2025 benchmark](https://mariozechner.at/posts/2025-08-15-mcp-vs-cli/) found that CLI invocations in Claude Code trigger per-command security classification that consumed an order of magnitude more tokens than equivalent MCP calls. In both approaches, substantial context budget is spent on tool plumbing before the model sees the actual task.
+ Binding functions carry higher information density per token than JSON Schema or CLI descriptions (see [Execution model](#execution-model) for how they appear in the prompt). This section compares the per-tool context cost across approaches.
+
+ MCP tool definitions carry per-request JSON Schema overhead that grows with the number of exposed tools. CLI tools reduce definition overhead but carry hidden costs -- Mario Zechner's [2025 benchmark](https://mariozechner.at/posts/2025-08-15-mcp-vs-cli/) found that CLI invocations in Claude Code trigger per-command security classification that consumed an order of magnitude more tokens than equivalent MCP calls. In both approaches, substantial context budget is spent on tool plumbing before the model sees the actual task.
 
  **MCP** defines tools as JSON Schema objects served over a protocol layer. Each tool definition consumes tokens in every request.
 
@@ -82,7 +88,7 @@ Because binding functions are Python signatures rather than JSON Schema objects
  find_top_items: (category: str) -> list[dict] # Return the highest-scored recent items in a category.
  ```
 
- This is on the order of a single signature line -- comparable in token cost to the most compact CLI description, but carrying higher information density. The type annotations let the LLM reason structurally: a `list[dict]` return supports iteration and key access, an `Item` return type has discoverable attributes, and typed parameters make it clear what another binding function will accept. A CLI description of similar compactness conveys invocation syntax but leaves output structure to inference from training data. There is no protocol layer, no serialization boundary, and no per-tool JSON Schema overhead. The same type annotations serve as targets for optional static analysis (pyright) and as hooks for Nighthawk's runtime validation (via Pydantic). Testing, debugging, and composition use standard Python tooling.
+ The type annotations let the LLM reason structurally: a `list[dict]` return supports iteration and key access, an `Item` return type has discoverable attributes, and typed parameters make it clear what another binding function will accept. There is no protocol layer, no serialization boundary, and no per-tool JSON Schema overhead. The same type annotations serve as targets for optional static analysis (pyright) and as hooks for Nighthawk's runtime validation (via Pydantic). Testing, debugging, and composition use standard Python tooling.
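A compact line like the `find_top_items` example can be derived from an ordinary Python function with `inspect.signature` and `inspect.getdoc`. `signature_line` and its `name: (params) -> return  # summary` format are a hypothetical sketch modeled on that example, not a description of how Nighthawk actually builds its prompt.

```python
import inspect
from typing import Callable


def signature_line(fn: Callable[..., object]) -> str:
    """Render `name: (params) -> return  # first docstring line`."""
    line = f"{fn.__name__}: {inspect.signature(fn)}"
    doc = inspect.getdoc(fn)
    if doc:
        line += f"  # {doc.splitlines()[0]}"  # keep only the summary line
    return line


def find_top_items(category: str) -> list[dict]:
    """Return the highest-scored recent items in a category.

    Only the first docstring line is rendered.
    """
    return []


def count_items(category: str, limit: int = 10) -> int:
    return 0


print(signature_line(find_top_items))
print(signature_line(count_items))  # count_items: (category: str, limit: int = 10) -> int
```

The exact text produced for builtin generics such as `list[dict]` can vary across Python versions, which is one reason a real renderer may format annotations itself rather than rely on `str(inspect.signature(...))`.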
 
  | Approach | Per-tool context cost | Information density | Type safety | Composability | Testing | Interoperability |
  |---|---|---|---|---|---|---|
@@ -92,7 +98,7 @@ This is on the order of a single signature line -- comparable in token cost to t
 
  ### Multi-agent coordination without a framework
 
- Multi-agent systems face three structural challenges: how agents communicate state, how agents are isolated from each other, and how results from multiple agents are merged. Existing workflow engines address these through framework-specific mechanisms -- graph state for communication, managed runtimes for isolation, message aggregation for merging -- but each solution locks users into the framework's abstractions, and no single framework provides all three comprehensively.
+ Multi-agent systems face three structural challenges: how agents communicate state, how agents are isolated from each other, and how results from multiple agents are merged. Existing workflow engines address these through framework-specific mechanisms -- graph state for communication, managed runtimes for isolation, message aggregation for merging -- but each ties communication, isolation, and merging to the framework's own abstractions.
 
  Nighthawk is not a multi-agent framework. It is a building block that composes with Python's existing ecosystem for each challenge.
 
@@ -100,7 +106,7 @@ Nighthawk is not a multi-agent framework. It is a building block that composes w
 
  **Isolation.** Nighthawk provides logical isolation at binding boundaries: read bindings prevent name rebinding, write bindings are type-validated, and each Natural block executes with an independent step context carrying no implicit message history. Read bindings do not prevent in-place mutation of mutable objects -- this is intentional and underlies the [carry pattern](patterns.md#the-carry-pattern). OS-level isolation -- sandboxing, filesystem scoping, permission control -- is delegated to the execution backend. Coding agent backends provide their own sandbox modes and working directory scoping, which Nighthawk configures but does not reimplement.
 
- **Result merging.** The resilience module provides composable patterns for common cases: `vote` for majority consensus across repeated invocations, `fallback` for sequential first-success chaining. Domain-specific merging -- reconciling edits from multiple agents, aggregating heterogeneous outputs, resolving conflicts -- belongs in user code, because merge semantics are inherently domain-dependent. Nighthawk's role is to ensure that each agent's output crosses the boundary as a typed, validated Python object that merge logic can operate on directly.
+ **Result merging.** The resilience module provides composable patterns for common cases: `vote` for majority consensus across repeated invocations, `fallback` for sequential first-success chaining. Domain-specific merging -- reconciling edits from multiple agents, aggregating heterogeneous outputs, resolving conflicts -- belongs in user code, because merge semantics are inherently domain-dependent. Nighthawk ensures that each agent's output crosses the boundary as a typed, validated Python object that merge logic can operate on directly.
 
  ### Tradeoffs
 
@@ -108,7 +114,7 @@ The boundary-centric design has costs:
 
  - **Python lock-in.** Binding functions, type annotations, and resilience transformers are Python constructs. Nighthawk does not offer a language-neutral protocol; interoperability with non-Python systems requires explicit bridging (e.g., REST endpoints wrapping Natural functions).
  - **Per-invocation cost.** Every Natural block invocation calls the LLM. There is no compilation step that amortizes cost across inputs. For high-throughput, low-judgment tasks where a deterministic Python function would suffice, a Natural block is the wrong tool. See [Why evaluate every time](#why-evaluate-every-time) for the design rationale.
- - **Integration tests are essential.** Mock tests verify Python logic around Natural blocks, but verifying that the LLM produces correct judgments requires integration tests against a real provider. The [two-layer testing strategy](verification.md) is not optional -- it is a structural consequence of delegating judgment to an LLM.
+ - **Integration tests are essential.** Mock tests verify Python logic around Natural blocks, but verifying that the LLM produces correct judgments requires integration tests against a real provider. The [two-layer testing strategy](verification.md) is not optional -- because the LLM produces the judgment, only a real LLM call can verify it.
  - **Manual orchestration burden.** Nighthawk leaves branching, retries, merge logic, and recovery policy in user code rather than a graph runtime. This is a direct cost of the "Python controls all flow" principle.
  - **Python API design discipline.** Binding functions are only as effective as their signatures, type annotations, and naming. Poor API design degrades the LLM's ability to reason about composition.
 
@@ -118,7 +124,7 @@ A natural question: why not use an LLM once to translate a Natural block into eq
 
  The answer is that Natural blocks exist precisely for tasks that cannot be reduced to deterministic code. "Classify the sentiment of this review" or "interpret this ambiguous user input" require judgment that depends on the specific input, world knowledge, and context. If a task could be written as deterministic Python, it should be -- this is the core design principle (see [Natural blocks](natural-blocks.md#responsibility-split)).
 
- One-time compilation has additional structural limitations:
+ One-time compilation has additional limitations:
 
  - The generated code would freeze the LLM's world knowledge at compilation time.
  - The input space is unbounded: "three apples, a dozen eggs, and cinco naranjas" requires open-ended interpretation that no finite code generation can fully anticipate.
@@ -153,7 +159,7 @@ Target: `[1, "2", "three", "cuatro", "五"]`
  Store the computed average in `result`.
  ````
 
- The instruction references embedded code, but there is no explicit boundary for how `result` crosses back to the host program. The narrative assumes the value will be available to subsequent steps, but the mechanism for state transfer is implicit -- the reader must infer it from convention rather than from a declared contract.
+ The instruction references embedded code, but there is no explicit boundary for how `result` crosses back to the host program. The narrative assumes the value will be available to subsequent steps, but that hand-off relies on convention rather than a declared contract.
 
  ### Nighthawk