nighthawk-python 0.6.1__tar.gz → 0.8.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- nighthawk_python-0.8.0/.claude/rules/src.md +17 -0
- nighthawk_python-0.8.0/.claude/rules/tests.md +13 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.github/workflows/publish.yml +1 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/CHANGELOG.md +28 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/CONTRIBUTING.md +1 -3
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/PKG-INFO +8 -8
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/README.md +4 -4
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/api.md +6 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/for-coding-agents.md +15 -4
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/index.md +4 -4
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/patterns.md +52 -6
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/philosophy.md +25 -19
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/roadmap.md +21 -33
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/runtime-configuration.md +41 -3
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/specification.md +43 -6
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/verification.md +9 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/prompts/eval_default.txt +1 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/provider.py +1 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/mkdocs.yml +16 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/pyproject.toml +4 -4
- nighthawk_python-0.8.0/src/AGENTS.md +1 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/__init__.py +4 -2
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/configuration.py +9 -3
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/resilience/__init__.py +8 -4
- nighthawk_python-0.8.0/src/nighthawk/resilience/_budget.py +419 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/resilience/_circuit_breaker.py +11 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/resilience/_fallback.py +1 -1
- nighthawk_python-0.8.0/src/nighthawk/resilience/_retry.py +306 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/resilience/_timeout.py +21 -3
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/resilience/_vote.py +1 -1
- nighthawk_python-0.8.0/src/nighthawk/runtime/prompt.py +641 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/runner.py +14 -3
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/scoping.py +99 -7
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/step_context.py +6 -11
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/step_executor.py +9 -2
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/testing.py +3 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/contracts.py +1 -1
- nighthawk_python-0.8.0/tests/AGENTS.md +1 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/backends/test_codex.py +1 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/docs/test_coding_agent_examples.py +4 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/docs/test_prompt_examples.py +1 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/prompt_test_helpers.py +14 -5
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_globals_prompt.py +55 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_runtime.py +50 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_variables_prompt.py +150 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/public/test_public_api.py +68 -0
- nighthawk_python-0.8.0/tests/public/test_usage_meter.py +79 -0
- nighthawk_python-0.8.0/tests/resilience/test_budget.py +464 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/test_circuit_breaker.py +61 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/test_composition.py +80 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/test_retry.py +77 -4
- nighthawk_python-0.8.0/tests/resilience/test_timeout.py +215 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/test_assignment_async.py +4 -4
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/test_provided_async.py +1 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/test_tool_boundary.py +1 -1
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/uv.lock +90 -90
- nighthawk_python-0.6.1/.claude/rules/coding.md +0 -14
- nighthawk_python-0.6.1/.claude/rules/tests.md +0 -27
- nighthawk_python-0.6.1/docs/philosophy.ja.md +0 -172
- nighthawk_python-0.6.1/src/nighthawk/resilience/_retry.py +0 -196
- nighthawk_python-0.6.1/src/nighthawk/runtime/prompt.py +0 -345
- nighthawk_python-0.6.1/tests/resilience/test_timeout.py +0 -120
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.claude/rules/docs.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.claude/rules/promptfoo.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.claude/settings.json +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.claude/unset_envs.sh +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.devcontainer/Dockerfile.devcontainer +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.devcontainer/Dockerfile.litellm +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.devcontainer/devcontainer.json +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.devcontainer/docker-compose.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.devcontainer/litellm-config.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.github/dependabot.yml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.github/workflows/ci.yml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.github/workflows/docs.yml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.gitignore +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/.python-version +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/AGENTS.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/CLAUDE.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/LICENSE +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/AGENTS.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/assets/nighthawk_logo-128x128.png +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/coding-agent-backends.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/executors.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/natural-blocks.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/pydantic-ai-providers.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/docs/quickstart.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/assertions/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/assertions/binding_value.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/assertions/outcome_kind.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/assertions/raise_message.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/evidence/2026-03-26-baseline-prompt-ab.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/evidence/2026-03-26-baseline-regression.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/evidence/2026-03-26-baseline-suffix-ab.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/promptfooconfig-agents.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/promptfooconfig-prompt-ab.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/promptfooconfig-suffix-ab.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/promptfooconfig.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/prompts/eval_coding_agent.txt +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/prompts/eval_mutation_aware.txt +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/prompts/eval_sequenced.txt +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/research-result.md +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/binding_operations.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/edge_cases.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/loop_outcomes.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/multi_step.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/null_handling.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/outcome_kinds.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/evals/promptfoo/test_cases/tool_selection.yaml +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/pyrightconfig.json +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/base.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/claude_code_cli.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/claude_code_sdk.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/claude_code_settings.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/codex.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/mcp_boundary.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/mcp_server.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/backends/tool_bridge.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/errors.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/identifier_path.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/json_renderer.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/natural/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/natural/blocks.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/natural/decorator.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/natural/transform.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/async_bridge.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/step_contract.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/runtime/tool_calls.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/assignment.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/execution.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/provided.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/tools/registry.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/src/nighthawk/ulid.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/backends/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/backends/test_claude_code_cli.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/backends/test_claude_code_sdk.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/conftest.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/docs/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/docs/test_docs_architecture.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/stub_executor.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_execution_outcome_prompt_fragment.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_infer_binding_types.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_natural_block_ordering.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/execution/test_natural_traceback.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/skip_helpers.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/test_carry_pattern.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/test_claude_code_cli_integration.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/test_claude_code_sdk_integration.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/test_codex_integration.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/integration/test_llm_integration.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/natural/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/natural/test_blocks.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/public/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/public/test_readme_example.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/test_fallback.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/resilience/test_vote.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/test_renderer.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/test_testing.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/__init__.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/test_contracts.py +0 -0
- {nighthawk_python-0.6.1 → nighthawk_python-0.8.0}/tests/tools/test_registry.py +0 -0
@@ -0,0 +1,17 @@
+---
+paths:
+- "src/**/*.py"
+---
+
+# Coding standards
+
+- Prefer concrete code. Add a new abstraction only when the same change uses it from non-test code.
+- Default to module-private names. Export via `__all__` only for stable non-test consumers.
+- If a change expands or changes public API, update or confirm `tests/public/`.
+- Prefer async implementations in `runtime/` and `backends/`; keep sync bridges only at compatibility boundaries.
+- Reuse the existing `NighthawkError` hierarchy before adding a new exception class.
+- Prefer Pydantic (`BaseModel`, `TypeAdapter`) and Pydantic AI primitives over custom validation, parsing, schema, or agent/tool plumbing.
+- Use `opentelemetry.trace` spans at run/scope/step/tool boundaries and `logging.getLogger("nighthawk")` for diagnostics. Do not import `logfire` in `src/`.
+- Use PEP 695 `type` statements for new type aliases.
+- Ask before adding a new `src/` subpackage for a single module.
+- Follow `CONTRIBUTING.md` § Docstring Guide for docstring scope and format.
@@ -0,0 +1,13 @@
+---
+paths:
+- "tests/**"
+---
+
+# Testing (pytest)
+
+- Prefer deterministic pytest coverage by default. Use helpers from `nighthawk.testing` before reaching for live LLM calls.
+- Use `tests/execution/stub_executor.py` only for envelope and runtime parser checks; prefer `nighthawk.testing` for normal Natural-function tests.
+- Keep live-LLM tests in `tests/integration/` and behind the documented environment gates.
+- For Python behavior changes, add or update pytest coverage in the same change and run `uv run pytest -q`.
+- If a change affects public API or README examples, confirm `tests/public/`. If it affects docs examples or anchors, confirm `tests/docs/`.
+- If a change affects prompt rendering, system prompt text, suffix generation, or tool exposure behavior, follow `.claude/rules/promptfoo.md`.
@@ -7,6 +7,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.8.0]
+
+### Added
+- Unit tests covering prompt token-budget injection: system prompt resolves `$tool_result_max_tokens`, and custom user prompt templates can resolve the same placeholder.
+
+### Changed
+- Default step system prompt now states that tool result `value` is a preview and includes the injected max-token limit.
+- User prompt template rendering now uses `Template.safe_substitute`, aligned with system prompt injection behavior and compatible with optional `$tool_result_max_tokens` placeholders.
+
+## [0.7.0]
+
+### Added
+- `nighthawk.UsageMeter`: run-scoped, thread-safe LLM token usage accumulator. Created automatically by `nh.run()` and readable via `nh.get_current_usage_meter()`.
+- `nighthawk.resilience.budget` transformer: composable token and cost budget enforcement with pre-call and post-call checks. Parameters: `tokens`, `tokens_per_call`, `cost`, `cost_per_call`, `cost_function`, `estimate_usage`.
+- `BudgetExceededError`, `BudgetLimitKind`, `CostFunction` supporting types.
+- OpenTelemetry span event `nighthawk.resilience.budget.exceeded` and `nighthawk.resilience` logger warning on budget violation.
+- Resilience OpenTelemetry events for retry/timeout/circuit paths: `nighthawk.resilience.retry.attempt`, `nighthawk.resilience.retry.exhausted`, `nighthawk.resilience.timeout.triggered`, `nighthawk.resilience.circuit.opened`.
+
+### Changed
+- Project status promoted from Alpha to Beta.
+- Updated one-line description.
+- Removed "experimental" language from README and documentation.
+- Updated PyPI keywords for improved discoverability.
+- Generalized `StepContext` implicit references to value-based mappings (`implicit_reference_name_to_value`), and added additive scope injection via `nh.scope(implicit_references={...})` across nested scopes.
+
 ## [0.6.1]
 
 ### Added
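The 0.8.0 entry switches user prompt templates to `Template.safe_substitute`. The standard-library sketch below illustrates the difference between the two methods; it is not Nighthawk's renderer. `substitute` raises `KeyError` when a template names a placeholder missing from the mapping, while `safe_substitute` leaves that placeholder as literal text. Only `$tool_result_max_tokens` comes from the changelog. The template text, `$ticket`, `$house_rules`, and the values are made up for illustration.

```python
from string import Template

# Values a renderer might supply; only `tool_result_max_tokens` is named in the changelog.
values = {"ticket": "Printer jams on page 2", "tool_result_max_tokens": 2048}

with_limit = Template("Classify: $ticket (tool previews capped at $tool_result_max_tokens tokens)")
with_unknown = Template("Classify: $ticket. Follow $house_rules.")  # `$house_rules` is never supplied

# safe_substitute fills what it can and leaves unresolved placeholders as literal text.
rendered_limit = with_limit.safe_substitute(values)
rendered_unknown = with_unknown.safe_substitute(values)

# substitute is strict: any placeholder missing from the mapping raises KeyError.
try:
    with_unknown.substitute(values)
    strict_raised = False
except KeyError:
    strict_raised = True
```

Both methods ignore extra keys in the mapping. The practical difference is only how unresolved placeholders are handled.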
@@ -102,7 +127,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Step executor abstraction and provider integration foundation.
 - Core documentation and project scaffolding.
 
-[Unreleased]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.
+[Unreleased]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.8.0...HEAD
+[0.8.0]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.7.0...v0.8.0
+[0.7.0]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.6.1...v0.7.0
 [0.6.1]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.6.0...v0.6.1
 [0.6.0]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.5.0...v0.6.0
 [0.5.0]: https://github.com/kurusugawa-computer/nighthawk-python/compare/v0.4.0...v0.5.0
@@ -27,15 +27,13 @@ uv run python
 # Format code
 uv run ruff format .
 
-# Lint (
-uv run ruff check .
+# Lint (auto-fix)
 uv run ruff check --fix .
 
 # Type check
 uv run pyright
 
 # Run tests
-uv run pytest # full suite
 uv run pytest -q # quiet output
 ```
 
@@ -1,7 +1,7 @@
 Metadata-Version: 2.4
 Name: nighthawk-python
-Version: 0.
-Summary:
+Version: 0.8.0
+Summary: A Python library where Python controls flow and LLMs or coding agents reason within constrained Natural blocks.
 Project-URL: Repository, https://github.com/kurusugawa-computer/nighthawk-python
 Project-URL: Documentation, https://kurusugawa-computer.github.io/nighthawk-python/
 Project-URL: Changelog, https://github.com/kurusugawa-computer/nighthawk-python/blob/main/CHANGELOG.md
@@ -9,8 +9,8 @@ Project-URL: Bug Tracker, https://github.com/kurusugawa-computer/nighthawk-pytho
 Author-email: "Kurusugawa Computer Inc." <oss@kurusugawa.jp>
 License-Expression: MIT
 License-File: LICENSE
-Keywords:
-Classifier: Development Status ::
+Keywords: agent,ai,anthropic,dsl,llm,natural-language,openai,prompt-engineering,pydantic-ai,structured-output
+Classifier: Development Status :: 4 - Beta
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
@@ -44,12 +44,12 @@ Description-Content-Type: text/markdown
 <img src="https://github.com/kurusugawa-computer/nighthawk-python/raw/main/docs/assets/nighthawk_logo-128x128.png" alt="nighthawk-logo" width="128px" margin="10px"></img>
 </div>
 
-Nighthawk is
+Nighthawk is a Python library where Python controls flow and LLMs or coding agents reason within constrained Natural blocks.
 
--
--
+- **Hard control** (Python code): strict procedure, verification, and deterministic flow.
+- **Soft reasoning** (an LLM or coding agent): semantic interpretation inside small embedded "Natural blocks".
 
-
+The same mechanism handles lightweight LLM judgments ("classify this sentiment") and autonomous agent executions ("refactor this module and write tests"). See **[Philosophy](https://kurusugawa-computer.github.io/nighthawk-python/philosophy/)** for the full design rationale.
 
 This repository is a compact reimplementation of the core ideas of [Nightjar](https://github.com/psg-mit/nightjarpy).
 
@@ -9,12 +9,12 @@
 <img src="https://github.com/kurusugawa-computer/nighthawk-python/raw/main/docs/assets/nighthawk_logo-128x128.png" alt="nighthawk-logo" width="128px" margin="10px"></img>
 </div>
 
-Nighthawk is
+Nighthawk is a Python library where Python controls flow and LLMs or coding agents reason within constrained Natural blocks.
 
--
--
+- **Hard control** (Python code): strict procedure, verification, and deterministic flow.
+- **Soft reasoning** (an LLM or coding agent): semantic interpretation inside small embedded "Natural blocks".
 
-
+The same mechanism handles lightweight LLM judgments ("classify this sentiment") and autonomous agent executions ("refactor this module and write tests"). See **[Philosophy](https://kurusugawa-computer.github.io/nighthawk-python/philosophy/)** for the full design rationale.
 
 This repository is a compact reimplementation of the core ideas of [Nightjar](https://github.com/psg-mit/nightjarpy).
 
@@ -21,8 +21,10 @@
 - to_jsonable_value
 - ExecutionContext
 - get_current_step_context
+- get_current_usage_meter
 - get_execution_context
 - get_step_executor
+- UsageMeter
 
 ## Errors
 
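The changelog describes `nighthawk.UsageMeter` as a run-scoped, thread-safe accumulator that `nh.run()` creates and `nh.get_current_usage_meter()` reads. The sketch below shows that general shape using only the standard library: a lock around the counters, plus a `ContextVar` that holds the meter for the duration of a run. `SketchUsageMeter`, `sketch_run`, and `get_sketch_usage_meter` are hypothetical names. This is not Nighthawk's implementation, whose internals this diff does not show.

```python
from __future__ import annotations

import threading
from collections.abc import Iterator
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenTotals:
    input_tokens: int = 0
    output_tokens: int = 0


class SketchUsageMeter:
    """Hypothetical stand-in for a run-scoped, thread-safe token accumulator."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._input_tokens = 0
        self._output_tokens = 0

    def add(self, input_tokens: int, output_tokens: int) -> None:
        # The lock keeps read-modify-write updates from worker threads atomic.
        with self._lock:
            self._input_tokens += input_tokens
            self._output_tokens += output_tokens

    def snapshot(self) -> TokenTotals:
        with self._lock:
            return TokenTotals(self._input_tokens, self._output_tokens)


_current_meter: ContextVar[SketchUsageMeter | None] = ContextVar("current_meter", default=None)


@contextmanager
def sketch_run() -> Iterator[SketchUsageMeter]:
    # Every run installs a fresh meter; code outside the block sees none.
    meter = SketchUsageMeter()
    token = _current_meter.set(meter)
    try:
        yield meter
    finally:
        _current_meter.reset(token)


def get_sketch_usage_meter() -> SketchUsageMeter | None:
    return _current_meter.get()


with sketch_run() as meter:
    same_meter = get_sketch_usage_meter() is meter
    # New threads generally start without the parent's ContextVar values,
    # so the meter is handed to the workers explicitly.
    workers = [threading.Thread(target=meter.add, args=(120, 30)) for _ in range(8)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    totals = meter.snapshot()

meter_after_run = get_sketch_usage_meter()
```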
@@ -106,6 +108,10 @@
 ::: nighthawk.resilience
 options:
 members:
+- budget
+- BudgetExceededError
+- BudgetLimitKind
+- CostFunction
 - retrying
 - timeout
 - fallback
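The `CostFunction` member listed here pairs with the `cost_function` parameter. The Budget section further down documents that parameter as converting `RunUsage` to a float. The worked sketch below illustrates only the arithmetic. It uses a hypothetical `UsageStandIn` dataclass instead of `RunUsage`, so it runs without Pydantic AI. It reuses the per-token rates from the docs example ($3 per million input tokens, $15 per million output tokens) and sums costs across calls against a `cost=1.00` cap. `CostFunctionSketch` and `within_cost_budget` are also hypothetical, and the real `CostFunction` type may be defined differently.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageStandIn:
    """Hypothetical stand-in exposing the two fields the docs example reads."""

    input_tokens: int
    output_tokens: int


# Assumed shape of a cost function: usage in, cost as a float out.
CostFunctionSketch = Callable[[UsageStandIn], float]


def dollar_cost(usage: UsageStandIn) -> float:
    # Same per-token rates as the docs example.
    return usage.input_tokens * 3e-6 + usage.output_tokens * 15e-6


def within_cost_budget(calls: list[UsageStandIn], cost: float, cost_function: CostFunctionSketch) -> bool:
    # Cumulative check: sum the cost of every call made so far.
    return sum(cost_function(usage) for usage in calls) <= cost


calls = [UsageStandIn(100_000, 10_000), UsageStandIn(100_000, 10_000)]
spent = sum(dollar_cost(usage) for usage in calls)
within_two = within_cost_budget(calls, cost=1.00, cost_function=dollar_cost)
within_three = within_cost_budget(calls + [UsageStandIn(100_000, 10_000)], cost=1.00, cost_function=dollar_cost)
```

Each call here costs about $0.30 + $0.15 = $0.45. Two calls (about $0.90) fit under $1.00, and a third (about $1.35) does not.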
@@ -109,6 +109,9 @@ deep_executor = nh.AgentStepExecutor.from_configuration(
 )
 
 
+def search_repository(query: str) -> list[str]: ...
+
+
 @nh.natural_function
 def classify_ticket(text: str) -> str:
     label: str = ""
@@ -136,10 +139,16 @@ def write_analysis_report(ticket_text: str, product_context: str) -> str:
 
 with nh.run(fast_executor):
     label = classify_ticket(ticket_text)
-    with nh.scope(
+    with nh.scope(
+        step_executor=deep_executor,
+        implicit_references={"search_repository": search_repository},
+    ):
         report = write_analysis_report(ticket_text, product_summary)
 ```
 
+`implicit_references` can inject global helper functions as block capabilities.
+Nested scopes still merge additively (set union by key).
+
 ## 4. The standard contract shape
 
 Prefer the post-block logic pattern. Let the block write a typed value, then validate or transform it in Python.
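The new note says `implicit_references` injects helpers as block capabilities, and that nested scopes merge additively (set union by key). The standard-library sketch below shows that merge rule, assuming a `ContextVar`-based scope stack. `sketch_scope` is hypothetical and is not `nh.scope`. The collision rule (the inner value wins) is an assumption that the docs do not state.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator, Mapping
from contextlib import contextmanager
from contextvars import ContextVar
from types import MappingProxyType

# A read-only empty mapping as the default, so no scope can mutate shared state.
_references: ContextVar[Mapping[str, Callable[..., object]]] = ContextVar(
    "implicit_references", default=MappingProxyType({})
)


@contextmanager
def sketch_scope(
    implicit_references: Mapping[str, Callable[..., object]] | None = None,
) -> Iterator[Mapping[str, Callable[..., object]]]:
    parent = _references.get()
    # Additive merge: keep every outer name and add the inner ones.
    # Assumption: on a name collision the inner scope's value wins.
    merged = MappingProxyType({**parent, **(implicit_references or {})})
    token = _references.set(merged)
    try:
        yield merged
    finally:
        _references.reset(token)  # leaving the scope restores the outer view


def search_repository(query: str) -> list[str]:
    return [f"src/{query}.py"]


def read_file(path: str) -> str:
    return f"<contents of {path}>"


with sketch_scope({"search_repository": search_repository}) as outer:
    with sketch_scope({"read_file": read_file}) as inner:
        inner_names = sorted(inner)
    outer_names = sorted(outer)

names_after_exit = sorted(_references.get())
```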
@@ -264,6 +273,8 @@ Do not inject untrusted raw text into Natural source. If input is user-controlle
 Rules:
 
 - The model sees callable signatures from both LOCALS and GLOBALS.
+- For object read bindings, the model also sees a capability view: object header, public methods (with signatures), and public fields (with typed previews).
+- Object capability views expose public members only. Private/dunder members are omitted, and properties are not evaluated.
 - Put per-invocation data in function parameters. Put stable, reusable capabilities at module level.
 - Do not annotate callable parameters as `object` or `Any` -- this erases the signature the model needs:
 
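The two added rules describe an object capability view. The sketch below approximates those rules with `inspect`, assuming a view made of a class-name header, public method signatures, and truncated `repr` previews of public fields. `capability_view` and `Ticket` are hypothetical. Nighthawk's actual view format is not shown in this diff, and listing property names without reading them is this sketch's own choice.

```python
import inspect
from typing import Any


def capability_view(obj: Any, max_preview: int = 40) -> dict[str, Any]:
    """Hypothetical capability view: public methods and fields, nothing private."""
    methods: list[str] = []
    fields: list[str] = []
    properties: list[str] = []
    for name in sorted(dir(obj)):
        if name.startswith("_"):
            continue  # private and dunder members are omitted
        # getattr_static reads the raw attribute without running descriptors,
        # so a property object is seen as-is instead of being evaluated.
        raw = inspect.getattr_static(obj, name)
        if isinstance(raw, property):
            properties.append(name)
        elif callable(raw):
            methods.append(f"{name}{inspect.signature(getattr(obj, name))}")
        else:
            preview = repr(raw)
            if len(preview) > max_preview:
                preview = preview[: max_preview - 3] + "..."
            fields.append(f"{name}: {type(raw).__name__} = {preview}")
    return {"header": type(obj).__name__, "methods": methods, "fields": fields, "properties": properties}


class Ticket:
    def __init__(self, text: str) -> None:
        self.text = text
        self.priority = 2
        self._cache: dict[str, str] = {}

    @property
    def summary(self) -> str:
        raise RuntimeError("a capability view must not evaluate properties")

    def search(self, query: str, limit: int = 5) -> str:
        return f"{query}:{limit}"


view = capability_view(Ticket("Printer jams on page 2"))
```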
@@ -311,7 +322,7 @@ Async rule:
 
 Resilience rule:
 
-- Keep retry, fallback, timeout, and circuit-breaker policy in Python, not inside Natural text.
+- Keep retry, fallback, timeout, budget, and circuit-breaker policy in Python, not inside Natural text.
 - Import from `nighthawk.resilience` (not re-exported from `nighthawk`):
 
 ```py
@@ -323,11 +334,11 @@ with nh.run(executor):
     label = resilient_classify(ticket_text)
 ```
 
-See [Patterns: Resilience](https://kurusugawa-computer.github.io/nighthawk-python/patterns/#resilience-patterns) for `fallback`, `vote`, `timeout`, and `circuit_breaker`.
+See [Patterns: Resilience](https://kurusugawa-computer.github.io/nighthawk-python/patterns/#resilience-patterns) for `fallback`, `vote`, `timeout`, `budget`, and `circuit_breaker`.
 
 ## 9. Context budget discipline
 
-Prompt context is finite. When you see `<snipped>`,
+Prompt context is finite. When you see `<snipped>`, the marked data is truncated from the prompt but remains in Python memory -- the model can still reach it through binding functions. Fix context pressure in this order:
 
 1. Remove irrelevant locals and globals from the function scope.
 2. Split the block into smaller, focused blocks.
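The updated paragraph says `<snipped>` marks data trimmed from the prompt, while the full value stays in Python memory and remains reachable through binding functions. The sketch below shows that split in miniature: the renderer shortens a `repr` preview, and a hypothetical `read_binding` accessor still returns the untouched value. The character limit is a simplification; the changelog describes a token-based limit (`$tool_result_max_tokens`) for tool result previews. `render_binding` and `read_binding` are illustrative names, not Nighthawk functions.

```python
from typing import Any

SNIP_MARKER = "<snipped>"


def render_binding(name: str, value: Any, max_chars: int = 32) -> str:
    """Render one prompt line, truncating long previews (character-based, for illustration)."""
    preview = repr(value)
    if len(preview) > max_chars:
        preview = preview[:max_chars] + SNIP_MARKER
    return f"{name} = {preview}"


# The full value lives in Python memory; only the prompt line is shortened.
bindings: dict[str, Any] = {"log_text": "ERROR disk full\n" * 200, "retries": 3}

prompt_lines = [render_binding(name, value) for name, value in bindings.items()]


def read_binding(name: str) -> Any:
    # Hypothetical accessor standing in for a binding function the model could call.
    return bindings[name]
```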
@@ -4,12 +4,12 @@
 <img src="assets/nighthawk_logo-128x128.png" alt="logo" width="128px">
 </div>
 
-Nighthawk is
+Nighthawk is a Python library where Python controls flow and LLMs or coding agents reason within constrained Natural blocks.
 
--
--
+- **Hard control** (Python code): strict procedure, verification, and deterministic flow.
+- **Soft reasoning** (an LLM or coding agent): semantic interpretation inside small embedded "Natural blocks".
 
-
+The same mechanism handles lightweight LLM judgments ("classify this sentiment") and autonomous agent executions ("refactor this module and write tests").
 
 ```py
 import nighthawk as nh
@@ -323,7 +323,7 @@ def compute_with_context(context_text: str) -> int:
 Natural blocks are non-deterministic by nature. Production deployments need strategies to handle transient failures, unstable outputs, and provider outages. The `nighthawk.resilience` module provides composable **function transformers** -- each takes a callable and returns a new callable with the same signature.
 
 ```py
-from nighthawk.resilience import retrying, fallback, vote, timeout, circuit_breaker
+from nighthawk.resilience import retrying, fallback, vote, timeout, budget, circuit_breaker
 ```
 
 Import directly from `nighthawk.resilience`. Resilience primitives are not re-exported from the top-level `nighthawk` namespace.
@@ -349,7 +349,12 @@ for attempt in retrying(attempts=3):
 result = classify(text)
 ```
 
-
+`retrying` separates retry control into four roles:
+
+- `on`: type-level retry eligibility.
+- `retry_if`: content-level retry eligibility after `on` matches.
+- `wait`: retry interval strategy.
+- `on_retry`: side-effect hook when a retry is decided.
 
 ```py
 from tenacity import wait_fixed
@@ -357,10 +362,14 @@ from tenacity import wait_fixed
 resilient = retrying(
     attempts=5,
     on=(ExecutionError, TimeoutError),
+    retry_if=lambda exception: "transient" in str(exception).lower(),
     wait=wait_fixed(2),
+    on_retry=lambda retry_state: logger.info("retrying", extra={"attempt": retry_state.attempt_number}),
 )(classify)
 ```
 
+Use only what you need. For most cases, `retrying(attempts=3)(fn)` is enough.
+
 ### Fallback
 
 Try multiple functions in order. The first success wins.
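The four retry roles listed in the hunks above map onto a plain retry loop. The stand-alone sketch below keeps the same four parameter names so each role is visible in one place. It is not Nighthawk's `retrying`: the docs show that transformer accepting tenacity strategies such as `wait_fixed`, while here `wait` is assumed to be a plain function of the attempt number so the sketch stays within the standard library. The `RetryState` dataclass only mimics the `attempt_number` attribute used in the docs example.

```python
from __future__ import annotations

import functools
import time
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RetryState:
    attempt_number: int
    exception: BaseException


def sketch_retrying(
    attempts: int = 3,
    on: tuple[type[BaseException], ...] = (Exception,),
    retry_if: Callable[[BaseException], bool] = lambda exception: True,
    wait: Callable[[int], float] = lambda attempt_number: 0.0,
    on_retry: Callable[[RetryState], None] = lambda retry_state: None,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt_number in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except on as exception:  # `on`: type-level eligibility
                    # `retry_if`: content-level eligibility, consulted only after `on` matched.
                    if attempt_number == attempts or not retry_if(exception):
                        raise
                    on_retry(RetryState(attempt_number, exception))  # `on_retry`: side effect
                    time.sleep(wait(attempt_number))  # `wait`: interval strategy
            raise RuntimeError("attempts must be at least 1")

        return wrapper

    return decorate


calls: list[str] = []
retried_attempts: list[int] = []


def flaky_classify(text: str) -> str:
    calls.append(text)
    if len(calls) < 3:
        raise RuntimeError("Transient upstream error")
    return "billing"


resilient = sketch_retrying(
    attempts=5,
    on=(RuntimeError,),
    retry_if=lambda exception: "transient" in str(exception).lower(),
    wait=lambda attempt_number: 0.0,  # no sleep in the demo; real code might back off
    on_retry=lambda retry_state: retried_attempts.append(retry_state.attempt_number),
)(flaky_classify)
label = resilient("refund please")

invalid_calls: list[str] = []


def always_invalid(text: str) -> str:
    invalid_calls.append(text)
    raise RuntimeError("invalid input")


# `on` matches, but `retry_if` rejects the message, so the error surfaces after one call.
try:
    sketch_retrying(attempts=5, on=(RuntimeError,), retry_if=lambda e: "transient" in str(e).lower())(always_invalid)("x")
    invalid_raised = False
except RuntimeError:
    invalid_raised = True
```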
@@ -405,6 +414,42 @@ async with timeout(seconds=30):
|
|
|
405
414
|
result = await slow_operation()
|
|
406
415
|
```
|
|
407
416
|
|
|
417
|
+
### Budget
|
|
418
|
+
|
|
419
|
+
Enforce token or monetary cost limits on wrapped functions. Requires an active `nh.run()` context (the run-scoped `UsageMeter` tracks cumulative usage automatically).
|
|
420
|
+
|
|
421
|
+
```py
|
|
422
|
+
from nighthawk.resilience import budget
|
|
423
|
+
|
|
424
|
+
safe_classify = budget(tokens=50_000, tokens_per_call=5_000)(classify)
|
|
425
|
+
result = safe_classify(text)
|
|
426
|
+
```
|
|
427
|
+
|
|
428
|
+
`tokens` caps cumulative usage across all calls; `tokens_per_call` caps a single call. Both are checked before and after each invocation. When a limit is breached, `BudgetExceededError` is raised -- combine with `fallback` to degrade gracefully:
|
|
429
|
+
|
|
430
|
+
```py
|
|
431
|
+
from nighthawk.resilience import budget, fallback, BudgetExceededError
|
|
432
|
+
|
|
433
|
+
composed = fallback(
|
|
434
|
+
budget(tokens=50_000)(classify_gpt4),
|
|
435
|
+
classify_mini,
|
|
436
|
+
on=(BudgetExceededError,),
|
|
437
|
+
)
|
|
438
|
+
```
|
|
439
|
+
|
|
440
|
+
For monetary budgets, supply a `cost_function` that converts `RunUsage` to a float:
|
|
441
|
+
|
|
442
|
+
```py
|
|
443
|
+
from pydantic_ai.usage import RunUsage
|
|
444
|
+
|
|
445
|
+
def dollar_cost(usage: RunUsage) -> float:
|
|
446
|
+
return usage.input_tokens * 3e-6 + usage.output_tokens * 15e-6
|
|
447
|
+
|
|
448
|
+
budgeted = budget(cost=1.00, cost_function=dollar_cost)(classify)
|
|
449
|
+
```
|
|
450
|
+
|
|
451
|
+
Outside a `nh.run()` context, the transformer is a no-op.
|
|
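The before/after checks can be sketched without Nighthawk. In this hypothetical version, an explicit `Meter` object stands in for the run-scoped `UsageMeter`. The wrapped function is assumed to add its own token usage to the meter. `BudgetExceeded`, `Meter` and `budget_sketch` are illustrative names, not the library's API.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, Optional


class BudgetExceeded(Exception):
    """Hypothetical stand-in for Nighthawk's BudgetExceededError."""


@dataclass
class Meter:
    tokens_used: int = 0  # cumulative usage, updated by the wrapped calls


def budget_sketch(meter: Meter, tokens: Optional[int] = None, tokens_per_call: Optional[int] = None) -> Callable:
    def decorate(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(*args, **kwargs):
            # Before: refuse to start if the cumulative cap is already spent.
            if tokens is not None and meter.tokens_used >= tokens:
                raise BudgetExceeded(f"cumulative budget {tokens} already spent")
            start = meter.tokens_used
            result = fn(*args, **kwargs)
            used = meter.tokens_used - start
            # After: check the single-call cap, then the cumulative cap.
            if tokens_per_call is not None and used > tokens_per_call:
                raise BudgetExceeded(f"call used {used} tokens > {tokens_per_call}")
            if tokens is not None and meter.tokens_used > tokens:
                raise BudgetExceeded(f"cumulative usage {meter.tokens_used} > {tokens}")
            return result

        return wrapper

    return decorate
```

The after-call check has a consequence. The call that crosses a limit has already spent its tokens, so in this sketch its result is discarded and the error is raised instead.

For monetary budgets, the same checks would apply to the output of a cost function. Using the `dollar_cost` function above, a call with 100,000 input tokens and 20,000 output tokens costs 100,000 × 3e-6 + 20,000 × 15e-6 = $0.30 + $0.30 = $0.60.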
452
|
+
|
|
408
453
|
### Circuit breaker
|
|
409
454
|
|
|
410
455
|
Prevent repeated calls to a failing service. After `fail_threshold` consecutive failures, the circuit opens and rejects calls immediately with `CircuitOpenError`. After `reset_timeout` seconds, one probe call is allowed.
|
|
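The open/probe cycle can be modeled as a small state machine. This sketch is hypothetical and single-threaded. The names `circuit_breaker_sketch` and `CircuitOpen` and the injectable `clock` are not Nighthawk's. With concurrent callers, more than one call could slip through as a probe, which a real implementation would need to prevent.

```python
import time
from functools import wraps
from typing import Callable, Optional


class CircuitOpen(Exception):
    """Hypothetical stand-in for Nighthawk's CircuitOpenError."""


def circuit_breaker_sketch(
    fail_threshold: int = 3,
    reset_timeout: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
) -> Callable:
    def decorate(fn: Callable) -> Callable:
        failures = 0
        opened_at: Optional[float] = None  # None means the circuit is closed

        @wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal failures, opened_at
            if opened_at is not None and clock() - opened_at < reset_timeout:
                raise CircuitOpen("circuit open; call rejected without calling the service")
            # Either closed, or reset_timeout has elapsed and this call is the probe.
            try:
                result = fn(*args, **kwargs)
            except Exception:
                failures += 1
                if opened_at is not None or failures >= fail_threshold:
                    opened_at = clock()  # trip, or re-trip after a failed probe
                raise
            failures = 0
            opened_at = None  # any success closes the circuit
            return result

        return wrapper

    return decorate
```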
@@ -440,10 +485,11 @@ Recommended composition order (innermost to outermost):
|
|
|
440
485
|
| Order | Transformer | Why |
|
|
441
486
|
|---|---|---|
|
|
442
487
|
| 1 | `timeout` | Bound each individual call |
|
|
443
|
-
| 2 | `
|
|
444
|
-
| 3 | `
|
|
445
|
-
| 4 | `
|
|
446
|
-
| 5 | `
|
|
488
|
+
| 2 | `budget` | Cap token or monetary cost |
|
|
489
|
+
| 3 | `vote` | Aggregate multiple bounded calls |
|
|
490
|
+
| 4 | `retrying` | Retry the aggregated operation |
|
|
491
|
+
| 5 | `circuit_breaker` | Protect against persistent failure |
|
|
492
|
+
| 6 | `fallback` | Switch to alternative on exhaustion |
|
|
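"Innermost to outermost" means the first row is applied first and the last row wraps everything else. At call time, the layers are therefore entered in reverse table order. The toy `tag` transformer below only records when its layer is entered. It is a hypothetical illustration of nesting order, not a working `timeout`, `budget` or `vote`.

```python
from functools import wraps


def tag(name: str, trace: list):
    """Hypothetical transformer that only records when its layer is entered."""

    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            trace.append(name)
            return fn(*args, **kwargs)

        return wrapper

    return decorate


def classify(text: str) -> str:
    return "positive"  # placeholder for a Natural function


trace: list = []
composed = classify
for layer in ["timeout", "budget", "vote", "retrying", "circuit_breaker", "fallback"]:
    composed = tag(layer, trace)(composed)  # each step wraps the previous result

composed("great product")
# trace == ["fallback", "circuit_breaker", "retrying", "vote", "budget", "timeout"]
```

Written as one expression, with configuration arguments omitted, the nesting reads `fallback(circuit_breaker(retrying(vote(budget(timeout(classify))))))`.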
447
493
|
|
|
448
494
|
### Caching LLM results
|
|
449
495
|
|
|
@@ -1,10 +1,10 @@
|
|
|
1
1
|
# Philosophy
|
|
2
2
|
|
|
3
|
-
Python
|
|
3
|
+
Python owns the control flow. The LLM works inside typed blocks, receiving inputs and returning outputs through explicit bindings.
|
|
4
4
|
|
|
5
5
|
## Execution model
|
|
6
6
|
|
|
7
|
-
Nighthawk embeds Natural blocks inside ordinary Python functions. Each block
|
|
7
|
+
Nighthawk embeds Natural blocks inside ordinary Python functions. Each block has a typed boundary. Read bindings (`<name>`) pass Python values in. Write bindings (`<:name>`) pass results back out, validated against their type annotations. Binding functions let the LLM call Python functions during execution. Python controls the sequencing -- loops, conditionals, error handling, retries -- and the LLM operates inside each block with no implicit message history carried across blocks.
|
|
8
8
|
|
|
9
9
|
```py
|
|
10
10
|
def python_average(numbers):
|
|
@@ -23,7 +23,7 @@ calculate_average([1, "2", "three", "cuatro", "五"]) # 3.0
|
|
|
23
23
|
|
|
24
24
|
Binding functions like `<python_average>` appear in the prompt as a compact signature line. The LLM's pre-trained Python knowledge lets it reason about types, return values, and composition from the signature alone, without JSON Schema or protocol overhead. See [Tool exposure efficiency](#tool-exposure-efficiency) for the quantitative comparison with MCP and CLI tool exposure.
|
|
25
25
|
|
|
26
|
-
With provider-backed executors, each Natural block is a single LLM call
|
|
26
|
+
With provider-backed executors, each Natural block is a single LLM call. A sentiment classifier whose write binding is typed as `Literal["positive", "negative", "neutral"]` rejects any output outside the declared set -- Pydantic validates the type annotation at runtime, not as a hint. The same mechanism applies to numeric extraction (`int`, `float`), structured parsing (Pydantic models), and any task where the judgment space is bounded. Because the host program owns the loop, a rejected or misclassified result can be retried, logged, or routed to a fallback -- all in ordinary Python.
|
|
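The essence of "rejects any output outside the declared set" is a membership check against the `Literal` members. Nighthawk delegates this to Pydantic. For instance, in Pydantic v2, `TypeAdapter(Sentiment).validate_python("angry")` raises a `ValidationError`. Whether Nighthawk uses `TypeAdapter` internally is not stated here. The stdlib-only sketch below, built around a hypothetical `validate_literal` helper, mimics just that check.

```python
from typing import Literal, get_args

Sentiment = Literal["positive", "negative", "neutral"]


def validate_literal(value: object, literal_type: object) -> object:
    """Accept `value` only if it is one of the members declared by `literal_type`."""
    allowed = get_args(literal_type)  # ("positive", "negative", "neutral") for Sentiment
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed}")
    return value
```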
27
27
|
|
|
28
28
|
With [coding agent backends](coding-agent-backends.md), the same boundary contract applies, but each Natural block becomes an autonomous agent execution. The agent can read files, run commands, and invoke skills -- while typed bindings enforce what crosses the boundary back to Python. The same `scope()` and `run()` context managers that structure human-written workflows are equally legible to a coding agent constructing workflows programmatically. When a coding agent operates inside a Natural block, binding functions appear as Python signatures in the prompt:
|
|
29
29
|
|
|
@@ -32,13 +32,17 @@ fetch_items: (category: str, limit: int = 10) -> list[Item]
|
|
|
32
32
|
merge_results: (primary: list[Item], secondary: list[Item]) -> list[Item]
|
|
33
33
|
```
|
|
34
34
|
|
|
35
|
-
The underlying LLM's pre-trained Python knowledge lets it infer that `Item` has attributes, that the return value supports iteration and indexing, and that `merge_results` accepts the output of `fetch_items` directly -- all from the type annotations alone.
|
|
35
|
+
The underlying LLM's pre-trained Python knowledge lets it infer that `Item` has attributes, that the return value supports iteration and indexing, and that `merge_results` accepts the output of `fetch_items` directly -- all from the type annotations alone. A CLI tool description (`fetch-items --category X --limit 10`) is optimized for invocation syntax; the model has to infer the output structure from its training data.
|
|
36
36
|
|
|
37
|
-
Coding agent backends make this especially practical because the agent can immediately apply that inferred structure while reading workflow code, invoking tools, editing implementations, running `pytest`, and iterating within the same Python codebase.
|
|
37
|
+
Coding agent backends make this especially practical because the agent can immediately apply that inferred structure while reading workflow code, invoking tools, editing implementations, running `pytest`, and iterating within the same Python codebase. The agent works directly in Python with standard tooling -- debugger, test runner, type checker -- rather than through a separate orchestration layer.
|
|
38
|
+
|
|
39
|
+
When the prompt exceeds token limits, the runtime omits remaining entries from the rendered context and appends a `<snipped>` marker. The underlying data stays in Python memory -- binding functions can still query it at runtime. Truncation affects only what the LLM sees in the prompt; no data is lost.
|
|
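The hypothetical renderer below illustrates the idea. It approximates the token limit with a character budget (an assumption; a real runtime would count tokens) and stops at the first entry that does not fit. The `entries` dict is only read, never modified, so everything past the marker remains available to Python code.

```python
def render_entries(entries: dict, max_chars: int) -> str:
    """Hypothetical: render `name = value` lines until the budget runs out."""
    lines = []
    used = 0
    for name, value in entries.items():
        line = f"{name} = {value!r}"
        if used + len(line) + 1 > max_chars:  # +1 for the newline
            lines.append("<snipped>")  # the marker itself is not charged to the budget
            break
        lines.append(line)
        used += len(line) + 1
    return "\n".join(lines)
```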
40
|
+
|
|
41
|
+
Because each Natural block is a fresh prompt with no implicit history, the entire prompt surface -- block text (including f-string interpolation), bindings, and scope configuration -- is determined by the host program at each invocation. Changing any of these between invocations has no side effects on other blocks.
|
|
38
42
|
|
|
39
43
|
## The harness matters more than the model
|
|
40
44
|
|
|
41
|
-
The strongest direct evidence comes from agentic coding tasks
|
|
45
|
+
The strongest direct evidence comes from agentic coding tasks. The subsections below separate what has been measured from where Nighthawk extends the principle.
|
|
42
46
|
|
|
43
47
|
### Observed evidence
|
|
44
48
|
|
|
@@ -50,27 +54,29 @@ The direct evidence concerns LLM-driven code editing and file management tasks,
|
|
|
50
54
|
|
|
51
55
|
### Design inference for Nighthawk
|
|
52
56
|
|
|
53
|
-
|
|
57
|
+
We think the same principle applies to provider-backed judgments like sentiment classification and numeric interpretation, but we have not measured it directly. Typed bindings limit what the LLM can return, and resilience transformers handle transient failures -- both should help, but neither has been tested in the same controlled way as the coding-task evidence above.
|
|
54
58
|
|
|
55
|
-
Regardless of scope, the practical question is how harness improvements are expressed. Configuration-file
|
|
59
|
+
Regardless of scope, the practical question is how harness improvements are expressed. Configuration-file guardrails -- rule files, lifecycle hooks, permission modes, tool filtering -- are effective at restricting behavior. They are optimized for static constraints. Dynamic orchestration (conditional retries, typed input/output contracts, scope-dependent tool visibility, prompts that adapt at runtime) requires a programming language, which is where Nighthawk's Python-first approach fits.
|
|
56
60
|
|
|
57
|
-
The primitives described in the [Execution model](#execution-model) and the following sections -- typed bindings, resilience transformers, scoped execution contexts -- are Nighthawk
|
|
61
|
+
The primitives described in the [Execution model](#execution-model) and the following sections -- typed bindings, resilience transformers, scoped execution contexts -- are how Nighthawk implements the principle in Python.
|
|
58
62
|
|
|
59
63
|
## Design consequences
|
|
60
64
|
|
|
61
|
-
The
|
|
65
|
+
The sections below explore what follows from the typed-binding execution model: resilience, scoping, tool exposure, multi-agent coordination, and the tradeoffs the design accepts.
|
|
62
66
|
|
|
63
67
|
### Resilience as composable functions
|
|
64
68
|
|
|
65
|
-
Production LLM applications need strategies for
|
|
69
|
+
Production LLM applications need strategies for transient failures, unstable outputs, and provider outages. Workflow engines build retry, checkpointing, and human-in-the-loop into the graph runtime. Nighthawk takes a different approach. Resilience primitives (`nighthawk.resilience`) are ordinary Python function transformers that wrap any callable. Each transformer takes a function and returns a new function with the same signature. Retry, fallback, voting, timeout, budget, and circuit breaker logic composes by nesting -- no graph DSL, no framework-managed state, and no implicit retry policy. The host controls exactly which calls are retried, how many times, and what happens on exhaustion -- using the same Python debugger, pytest, and code review workflows as the rest of the application. This applies equally to lightweight provider-backed judgments and autonomous agent executions. See [Patterns](patterns.md#resilience-patterns) for usage patterns and composition examples.
|
|
66
70
|
|
|
67
71
|
### Scoped execution contexts
|
|
68
72
|
|
|
69
|
-
`run()` establishes the execution boundary: it links a step executor to the current context as an explicit Python `with` statement rather than as a global configuration or implicit thread-local. `scope()` narrows configuration within an existing run -- model override, prompt suffix, or executor replacement -- each taking effect only within the nested `with` block. Nesting is natural Python lexical scoping: the host program's control flow, not a framework runtime, determines which configuration is active at any point.
|
|
73
|
+
`run()` establishes the execution boundary: it links a step executor to the current context as an explicit Python `with` statement rather than as a global configuration or implicit thread-local. `scope()` narrows configuration within an existing run -- model override, prompt suffix, or executor replacement -- each taking effect only within the nested `with` block. Nesting is natural Python lexical scoping: the host program's control flow, not a framework runtime, determines which configuration is active at any point. Runtime behavior lives in Python structures rather than in prose-only instructions or static configuration. See [Runtime configuration](runtime-configuration.md) for details and examples.
|
|
70
74
|
|
|
71
75
|
### Tool exposure efficiency
|
|
72
76
|
|
|
73
|
-
|
|
77
|
+
Binding functions carry higher information density per token than JSON Schema or CLI descriptions (see [Execution model](#execution-model) for how they appear in the prompt). This section compares the per-tool context cost across approaches.
|
|
78
|
+
|
|
79
|
+
MCP tool definitions carry per-request JSON Schema overhead that grows with the number of exposed tools. CLI tools reduce definition overhead but carry hidden costs -- Mario Zechner's [2025 benchmark](https://mariozechner.at/posts/2025-08-15-mcp-vs-cli/) found that CLI invocations in Claude Code trigger per-command security classification that consumed an order of magnitude more tokens than equivalent MCP calls. In both approaches, substantial context budget is spent on tool plumbing before the model sees the actual task.
|
|
74
80
|
|
|
75
81
|
**MCP** defines tools as JSON Schema objects served over a protocol layer. Each tool definition consumes tokens in every request.
|
|
76
82
|
|
|
@@ -82,7 +88,7 @@ Because binding functions are Python signatures rather than JSON Schema objects
|
|
|
82
88
|
find_top_items: (category: str) -> list[dict] # Return the highest-scored recent items in a category.
|
|
83
89
|
```
|
|
84
90
|
|
|
85
|
-
|
|
91
|
+
The type annotations let the LLM reason structurally: a `list[dict]` return supports iteration and key access, an `Item` return type has discoverable attributes, and typed parameters make it clear what another binding function will accept. There is no protocol layer, no serialization boundary, and no per-tool JSON Schema overhead. The same type annotations serve as targets for optional static analysis (pyright) and as hooks for Nighthawk's runtime validation (via Pydantic). Testing, debugging, and composition use standard Python tooling.
|
|
86
92
|
|
|
87
93
|
| Approach | Per-tool context cost | Information density | Type safety | Composability | Testing | Interoperability |
|
|
88
94
|
|---|---|---|---|---|---|---|
|
|
@@ -92,7 +98,7 @@ This is on the order of a single signature line -- comparable in token cost to t
|
|
|
92
98
|
|
|
93
99
|
### Multi-agent coordination without a framework
|
|
94
100
|
|
|
95
|
-
Multi-agent systems face three structural challenges: how agents communicate state, how agents are isolated from each other, and how results from multiple agents are merged. Existing workflow engines address these through framework-specific mechanisms -- graph state for communication, managed runtimes for isolation, message aggregation for merging -- but each
|
|
101
|
+
Multi-agent systems face three structural challenges: how agents communicate state, how agents are isolated from each other, and how results from multiple agents are merged. Existing workflow engines address these through framework-specific mechanisms -- graph state for communication, managed runtimes for isolation, message aggregation for merging -- but each ties communication, isolation, and merging to the framework's own abstractions.
|
|
96
102
|
|
|
97
103
|
Nighthawk is not a multi-agent framework. It is a building block that composes with Python's existing ecosystem for each challenge.
|
|
98
104
|
|
|
@@ -100,7 +106,7 @@ Nighthawk is not a multi-agent framework. It is a building block that composes w
|
|
|
100
106
|
|
|
101
107
|
**Isolation.** Nighthawk provides logical isolation at binding boundaries: read bindings prevent name rebinding, write bindings are type-validated, and each Natural block executes with an independent step context carrying no implicit message history. Read bindings do not prevent in-place mutation of mutable objects -- this is intentional and underlies the [carry pattern](patterns.md#the-carry-pattern). OS-level isolation -- sandboxing, filesystem scoping, permission control -- is delegated to the execution backend. Coding agent backends provide their own sandbox modes and working directory scoping, which Nighthawk configures but does not reimplement.
|
|
102
108
|
|
|
103
|
-
**Result merging.** The resilience module provides composable patterns for common cases: `vote` for majority consensus across repeated invocations, `fallback` for sequential first-success chaining. Domain-specific merging -- reconciling edits from multiple agents, aggregating heterogeneous outputs, resolving conflicts -- belongs in user code, because merge semantics are inherently domain-dependent. Nighthawk
|
|
109
|
+
**Result merging.** The resilience module provides composable patterns for common cases: `vote` for majority consensus across repeated invocations, `fallback` for sequential first-success chaining. Domain-specific merging -- reconciling edits from multiple agents, aggregating heterogeneous outputs, resolving conflicts -- belongs in user code, because merge semantics are inherently domain-dependent. Nighthawk ensures that each agent's output crosses the boundary as a typed, validated Python object that merge logic can operate on directly.
|
|
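`vote` is described here only as "majority consensus across repeated invocations". A hypothetical stdlib version follows. The name `vote_sketch`, the `n` parameter and the tie-breaking rule are assumptions, not Nighthawk's API. Because each result crosses the boundary as an ordinary Python value, `Counter` works on any hashable output, such as `Literal` labels.

```python
from collections import Counter
from collections.abc import Hashable
from functools import wraps
from typing import Callable


def vote_sketch(fn: Callable[..., Hashable], n: int = 3) -> Callable[..., Hashable]:
    """Hypothetical majority vote: call `fn` n times and return the most common result."""

    @wraps(fn)
    def wrapper(*args, **kwargs):
        results = [fn(*args, **kwargs) for _ in range(n)]
        # Equal counts keep first-encountered order, so ties go to the value seen first.
        winner, _count = Counter(results).most_common(1)[0]
        return winner

    return wrapper
```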
104
110
|
|
|
105
111
|
### Tradeoffs
|
|
106
112
|
|
|
@@ -108,7 +114,7 @@ The boundary-centric design has costs:
|
|
|
108
114
|
|
|
109
115
|
- **Python lock-in.** Binding functions, type annotations, and resilience transformers are Python constructs. Nighthawk does not offer a language-neutral protocol; interoperability with non-Python systems requires explicit bridging (e.g., REST endpoints wrapping Natural functions).
|
|
110
116
|
- **Per-invocation cost.** Every Natural block invocation calls the LLM. There is no compilation step that amortizes cost across inputs. For high-throughput, low-judgment tasks where a deterministic Python function would suffice, a Natural block is the wrong tool. See [Why evaluate every time](#why-evaluate-every-time) for the design rationale.
|
|
111
|
-
- **Integration tests are essential.** Mock tests verify Python logic around Natural blocks, but verifying that the LLM produces correct judgments requires integration tests against a real provider. The [two-layer testing strategy](verification.md) is not optional --
|
|
117
|
+
- **Integration tests are essential.** Mock tests verify Python logic around Natural blocks, but verifying that the LLM produces correct judgments requires integration tests against a real provider. The [two-layer testing strategy](verification.md) is not optional -- because the LLM produces the judgment, only a real LLM call can verify it.
|
|
112
118
|
- **Manual orchestration burden.** Nighthawk leaves branching, retries, merge logic, and recovery policy in user code rather than a graph runtime. This is a direct cost of the "Python controls all flow" principle.
|
|
113
119
|
- **Python API design discipline.** Binding functions are only as effective as their signatures, type annotations, and naming. Poor API design degrades the LLM's ability to reason about composition.
|
|
114
120
|
|
|
@@ -118,7 +124,7 @@ A natural question: why not use an LLM once to translate a Natural block into eq
|
|
|
118
124
|
|
|
119
125
|
The answer is that Natural blocks exist precisely for tasks that cannot be reduced to deterministic code. "Classify the sentiment of this review" or "interpret this ambiguous user input" require judgment that depends on the specific input, world knowledge, and context. If a task could be written as deterministic Python, it should be -- this is the core design principle (see [Natural blocks](natural-blocks.md#responsibility-split)).
|
|
120
126
|
|
|
121
|
-
One-time compilation has additional
|
|
127
|
+
One-time compilation has additional limitations:
|
|
122
128
|
|
|
123
129
|
- The generated code would freeze the LLM's world knowledge at compilation time.
|
|
124
130
|
- The input space is unbounded: "three apples, a dozen eggs, and cinco naranjas" requires open-ended interpretation that no finite code generation can fully anticipate.
|
|
@@ -153,7 +159,7 @@ Target: `[1, "2", "three", "cuatro", "五"]`
|
|
|
153
159
|
Store the computed average in `result`.
|
|
154
160
|
````
|
|
155
161
|
|
|
156
|
-
The instruction references embedded code, but there is no explicit boundary for how `result` crosses back to the host program. The narrative assumes the value will be available to subsequent steps, but
|
|
162
|
+
The instruction references embedded code, but there is no explicit boundary for how `result` crosses back to the host program. The narrative assumes the value will be available to subsequent steps, but getting `result` back to the host program is implicit -- it depends on convention, not a declared contract.
|
|
157
163
|
|
|
158
164
|
### Nighthawk
|
|
159
165
|
|