npm - stable-harness - Versions diffs - 0.0.8 → 0.0.9 - Mend

stable-harness 0.0.8 → 0.0.9

Files changed (39)

package/README.md +10 -0
package/docs/0.1.0-p0-runtime-control-plane-plan.zh.md +171 -0
package/docs/0.1.0-retry-policy.zh.md +87 -0
package/docs/0.1.0-stable-runtime-development-roadmap.zh.md +393 -0
package/docs/0.1.0-tool-guard-benchmark.zh.md +42 -0
package/docs/adapter-contract.md +199 -0
package/docs/architecture/backend-comparison.md +41 -0
package/docs/architecture/runtime-events.md +263 -0
package/docs/architecture/runtime-events.zh.md +248 -0
package/docs/architecture/system-architecture.zh.md +435 -0
package/docs/compatibility-matrix.md +139 -0
package/docs/engineering-rules.md +111 -0
package/docs/evaluation/0.1.0-bfcl-targeted-model-matrix.zh.md +1632 -0
package/docs/evaluation/0.1.0-bfcl-targeted-review-matrix.zh.md +1952 -0
package/docs/evaluation/0.1.0-bfcl-tool-guard.zh.md +1427 -0
package/docs/granite-tool-calling-comparison.zh.md +206 -0
package/docs/guides/getting-started.md +126 -0
package/docs/guides/index.md +40 -0
package/docs/guides/integration-guide.md +126 -0
package/docs/guides/operator-runbook.md +153 -0
package/docs/guides/workspace-authoring.md +212 -0
package/docs/implementation-blueprint.md +233 -0
package/docs/memory/0.1.0-memory-design.zh.md +719 -0
package/docs/memory/0.1.0-step-09-deepagents-native-memory.zh.md +146 -0
package/docs/memory/0.1.0-step-09-langmem-shaped-provider.zh.md +169 -0
package/docs/memory/0.1.0-step-09-memory-adapter-projection.zh.md +123 -0
package/docs/memory/0.1.0-step-09-memory-contract.zh.md +169 -0
package/docs/memory/0.1.0-step-09-memory-governance-approval.zh.md +143 -0
package/docs/memory/0.1.0-step-09-memory-lifecycle-hooks.zh.md +150 -0
package/docs/memory/0.1.0-step-09-memory-maintenance-boundary.zh.md +118 -0
package/docs/memory/0.1.0-step-09-memory-persistence-boundary.zh.md +118 -0
package/docs/product/adoption-playbook.md +145 -0
package/docs/product/market-positioning.md +137 -0
package/docs/product-boundary.md +258 -0
package/docs/protocols/http-runtime.md +37 -0
package/docs/protocols/langgraph-compatible.md +107 -0
package/docs/protocols/openai-compatible.md +121 -0
package/docs/tooling/0.1.0-bettercall-tool-quality.zh.md +231 -0
package/package.json +2 -1

package/docs/product/market-positioning.md ADDED

@@ -0,0 +1,137 @@
+# Market Positioning
+Stable Harness sits between agent frameworks and production applications.
+It is not trying to replace the framework that decides what an agent does next.
+It gives application teams the runtime boundary they need once the prototype
+must become inspectable, governable, recoverable, and callable through stable
+interfaces.
+## Category
+Stable Harness is a stable agent application runtime and operator control plane.
+It combines:
+- workspace inventory
+- runtime lifecycle
+- tool-gateway reliability
+- event traces
+- governance hooks
+- memory lifecycle
+- protocol access
+- backend adapters
+## The Problem
+Agent frameworks usually focus on execution semantics:
+- model calls
+- planning loops
+- tool calls
+- graph transitions
+- delegation
+- memory primitives
+Production applications need additional surfaces:
+- repeatable workspace definition
+- request and session lifecycle
+- operator inspection
+- approval and sandbox policy
+- tool repair and validation before execution
+- trace and artifact capture
+- protocol facades for existing clients
+- backend portability without rewriting the product
+Stable Harness owns those production surfaces.
+## What It Is Not
+Stable Harness is not:
+- a new planning framework
+- a LangGraph replacement
+- a DeepAgents replacement
+- a model router based on prompt keywords
+- a hosted product by itself
+- an unbounded tool-call repair layer
+The boundary matters because users should be able to trust that upstream
+framework semantics remain upstream-native.
+## Competitive Map
+| Category | Examples | Stable Harness Position |
+| --- | --- | --- |
+| Agent execution frameworks | DeepAgents, LangGraph, OpenAI Agents SDK | Keep them. Stable Harness wraps their runtime boundary. |
+| Protocol gateways | OpenAI-compatible servers, MCP servers | Stable Harness exposes protocols over one workspace runtime. |
+| Workflow engines | LangGraph workflows, custom DAG runners | Stable Harness can expose explicit topology while preserving adapter ownership. |
+| Observability tools | tracing and logging platforms | Stable Harness emits runtime evidence that can feed those systems. |
+| Governance systems | approval queues, policy engines, sandbox managers | Stable Harness provides runtime hooks and policy boundaries for agent work. |
+## Differentiation
+### Passthrough-first adapters
+Stable Harness does not copy every backend feature into core runtime. It keeps a
+thin adapter boundary and passes through upstream-native behavior when the
+backend already owns it.
+### YAML workspace inventory
+Agents, models, tools, memory, workflows, and protocol exposure live in
+workspace config. This makes agent applications easier to inspect, move, review,
+and operate.
+### Tool reliability at the runtime edge
+Tool calls are validated and can be repaired before execution. The gateway
+protects execution with inventory, schemas, semantic validators, and governance
+policy.
+### Operator control plane
+Applications can inspect requests, sessions, events, artifacts, approvals,
+memory lifecycle, and runs through stable runtime surfaces.
+### Multi-protocol access
+The same workspace can be used through CLI, SDK, HTTP, and OpenAI-compatible
+clients without creating separate execution paths.
+## Buyer Narrative
+For a product team:
+> We already have an agent prototype. Stable Harness makes it shippable by adding
+> runtime inventory, sessions, traces, governance, memory lifecycle, and protocol
+> access without rewriting the backend.
+For a platform team:
+> Stable Harness gives us one runtime contract across multiple agent apps while
+> letting each team keep its backend framework.
+For an engineering leader:
+> Stable Harness reduces the cost of operating agent systems by making behavior
+> inspectable, recoverable, testable, and governed through explicit runtime
+> boundaries.
+## Claims To Avoid
+Avoid claims that imply Stable Harness owns correctness it cannot guarantee:
+- "agents always call tools correctly"
+- "drop-in production for every agent"
+- "framework-independent behavior with no adapter work"
+- "automatic routing from natural language"
+Use claims that are true:
+- "stable runtime boundary for agent workspaces"
+- "framework-generic operator control plane"
+- "validated and repairable tool gateway"
+- "YAML-defined inventory and protocol exposure"
+- "passthrough-first backend adapters"

package/docs/product-boundary.md ADDED

@@ -0,0 +1,258 @@
+# Stable Harness Product Boundary
+`stable-harness` is a generic stable application runtime and operator control plane for agent workspaces.
+It is not a new agent framework. It wraps upstream frameworks such as DeepAgents, LangChain v1, OpenAI Agents SDK, Gemini SDK, and future runtimes through adapters.
+DeepAgents is the first backend target. The current DeepAgents feature set must be audited before adding any overlapping `stable-harness` behavior.
+## Product Mission
+The product should let a team define a production agent workspace in YAML, choose an execution backend, and get stable runtime operations without rewriting the backend framework.
+The product must stay framework-generic. A native runtime capability is valid only when it is useful across workspaces and backends, has a stable owner, and does not duplicate an upstream execution primitive.
+The public surface should stay small:
+- load a workspace
+- start a runtime
+- run a request
+- inspect runs, events, approvals, memory, and artifacts
+- stop the runtime
+Everything else should be configured through YAML or handled internally by the runtime.
+## Complete Product Boundary
+### 1. YAML-defined workspace
+The primary product surface is a workspace folder with Kubernetes-style YAML.
+The workspace owns:
+- agents
+- models
+- tools
+- MCP servers
+- skills and resources
+- routing
+- backend adapter selection
+- memory policy
+- approval policy
+- sandbox policy
+- recovery policy
+- protocol exposure
+- maintenance policy
+The YAML should express upstream concepts directly when the upstream framework already defines them.
+### 2. Pluggable backend runtime adapters
+The product contract must not be DeepAgents-shaped, LangGraph-shaped, Microsoft-Agent-Framework-shaped, or downstream-workspace-shaped.
+The runtime should support backend adapters such as:
+- DeepAgents
+- LangChain v1 or LangGraph agents
+- OpenAI Agents SDK
+- Gemini client SDK
+- local model runtimes
+- customer-owned internal frameworks
+Adapters translate stable runtime requests into upstream framework calls. They should be internal integration layers, not new public semantics.
+The default adapter strategy is passthrough first:
+- expose upstream-native primitives through typed workspace config when needed
+- preserve upstream execution semantics instead of normalizing them into a local imitation
+- add only small runtime wrappers for lifecycle, governance, observability, persistence, replay, protocol access, or operator inspection
+- keep backend-specific options behind the adapter boundary unless they are intentionally part of workspace configuration
+Every adapter feature should be classified before implementation:
+- `passthrough`: the upstream framework already owns the capability
+- `runtime wrapper`: `stable-harness` adds lifecycle, governance, observability, persistence, or protocol access around upstream execution
+- `plugin capability`: the feature is runtime-owned but optional and replaceable
+- `downstream workspace`: the feature is application-specific and should not enter the generic runtime
+- `do not build`: the feature would duplicate upstream agent execution semantics
+No DeepAgents feature should be rebuilt locally when upstream passthrough or upstream-native config is sufficient.
+The same standard applies to every backend. Do not recreate OpenAI Agents SDK, Gemini SDK, LangGraph, Microsoft Agent Framework, or customer-framework concepts inside core runtime when an adapter can pass them through.
+### 2.1 Agent graph overlay
+The native user-facing definition should stay `Agent`. DeepAgents agents keep
+their existing definition shape. LangGraph agents may append `edges` to connect
+existing workspace inventory such as tools, skills, and subagents; the edge list
+is topology, not a new execution language.
+Standalone workflow documents are an operator/control-plane compatibility
+surface for explicit graph inventory, inspection, and migration. They must not
+become the default product concept when an `Agent` with backend-native graph
+configuration is sufficient.
+The first implementation layer should stay small:
+- load and validate optional Agent edges for graph-capable backends
+- verify graph edges against existing Agent inventory references
+- render graph structure for inspection without redefining inventory
+- expose explicit graph routing tables as typed runtime inventory when needed
+- dispatch graph-capable agents to a pluggable backend adapter
+Graph routing is a static control-plane table. It may name an explicit graph
+route when a protocol needs it, but it must not infer a route from user prose,
+prompt keywords, downstream domains, tool names, or benchmark cases.
+When a workflow is compiled to LangGraph, Microsoft Agent Framework, or another backend, that compiler remains an adapter capability. Core runtime keeps lifecycle, governance, observability, persistence, request IDs, session IDs, events, and protocol ownership.
+The LangGraph adapter is a concrete example of this boundary: it compiles the
+workflow topology to upstream LangGraph and calls injected node handlers. It does
+not decide what an `agents.*`, `tools.*`, or `skills.*` node means by itself.
+Sub-workflows use the same rule. A `workflows.*` node is just an inventory
+reference until a workflow adapter explicitly opts in and supplies bounded
+execution behavior.
+### 3. Long-term memory lifecycle
+`stable-harness` owns runtime-wide memory lifecycle:
+- memory namespace management
+- memory persistence policy
+- recall orchestration
+- import, export, backup, and compaction
+- memory observability
+- cross-run memory governance
+When a backend has native memory primitives, the adapter should use them directly. Runtime memory is an operational substrate, not a replacement for upstream memory semantics.
+### 4. Runtime stability layer
+The runtime should make upstream frameworks production-safe without reimplementing agent logic.
+It owns:
+- persisted runs and threads
+- checkpoint discovery and recovery
+- retry and resume lifecycle
+- cancellation and timeout handling
+- event normalization
+- artifact capture
+- structured error records
+- background maintenance
+It must not infer execution behavior from natural-language text.
+### 5. Multi-protocol access
+The same runtime should be reachable through multiple protocol surfaces:
+- in-process SDK
+- CLI
+- HTTP server
+- MCP
+- ACP
+- A2A
+- AG-UI
+- future protocol adapters
+Protocol adapters expose stable runtime concepts. They should not expose backend-specific internal details unless explicitly required for compatibility.
+### 6. Operator control plane
+The runtime should give operators direct control over application lifecycle:
+- list active and historical runs
+- inspect events
+- inspect pending approvals
+- approve, deny, cancel, or retry work
+- view memory and artifacts
+- inspect backend health
+- run maintenance jobs
+Users should think in terms of runs, requests, approvals, events, and outcomes rather than raw checkpoint internals.
+### 7. Governance and sandbox policy
+The runtime owns application-level governance:
+- approval decisions
+- sandbox selection
+- resource limits
+- secret access policy
+- tool execution policy
+- audit records
+- tenant or workspace isolation
+Policy decisions must be driven by typed config, tool metadata, runtime state, approval state, or explicit request metadata.
+### 8. Tool, MCP, skill, and resource integration
+The runtime owns application inventory and registration:
+- local tools
+- MCP tools
+- remote tools
+- skills
+- files and workspace resources
+- artifact stores
+- credentials and secret references
+Tool execution semantics still belong to the selected backend or tool gateway.
+Each runtime inventory capability must stay independently pluggable. A tool registry, MCP registry, skill registry, artifact store, memory store, approval policy, sandbox policy, event sink, replay store, or protocol surface should be replaceable without forcing a different backend adapter.
+Pluggability is a design gate, not an implementation detail. A new capability should have a narrow interface, explicit config, focused tests, and a replacement point. If enabling one capability silently requires unrelated memory, approval, replay, protocol, or adapter behavior, the design is too coupled.
+### 9. Evaluation, replay, traces, and artifacts
+The runtime should make production behavior inspectable and repeatable:
+- event traces
+- tool traces
+- replay manifests
+- regression cases
+- evaluation datasets
+- artifact capture
+- run export and import
+Replay should use structured events and recorded runtime state. It should not rebuild intent by parsing final prose.
+### 10. Distribution and scaffold experience
+The project should ship a clean day-one experience:
+- `stable-harness init`
+- example workspaces
+- typed packages
+- minimal SDK
+- clear adapter templates
+- local development scripts
+- verification scripts
+The default scaffold should teach the product boundary by example.
+## Anti-goals
+`stable-harness` must not:
+- become a third agent execution framework
+- reimplement DeepAgents, LangChain, or LangGraph semantics
+- reimplement Microsoft Agent Framework, OpenAI Agents SDK, Gemini SDK, or any customer backend semantics
+- invent a second subagent planning language
+- introduce a stable-owned concept when an upstream primitive can be passed through
+- use natural-language keyword matching to drive runtime control flow
+- synthesize tool calls from TODO text
+- locally replay upstream custom tool calls
+- hardcode downstream domains, tickers, tools, or product-specific workflows
+- expose raw checkpoint manipulation as a primary product API
+- mirror every upstream helper export as product surface
+- bundle unrelated runtime capabilities into one non-replaceable subsystem
+## Boundary Rule
+Upstream frameworks own agent-level execution behavior.
+`stable-harness` owns application-level runtime orchestration, lifecycle, observability, governance, and protocol access.
+When a feature sits near the boundary, the default answer is: pass upstream execution semantics through, then add only a small optional runtime capability around them.
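
The adapter feature classification in section 2 of `product-boundary.md` can be read as a small decision procedure. The sketch below is one possible reading, not code from the package: the five labels come from the document, while `FeatureProposal`, its boolean fields, `classifyFeature`, and the order of the checks are all hypothetical.

```ts
// Hypothetical sketch: only the five labels below appear in product-boundary.md.
type FeatureClass =
  | "passthrough"
  | "runtime wrapper"
  | "plugin capability"
  | "downstream workspace"
  | "do not build";

// Assumed review questions; the document does not define this shape.
interface FeatureProposal {
  name: string;
  upstreamOwnsIt: boolean; // upstream framework already provides the capability
  duplicatesExecution: boolean; // would re-create upstream agent execution semantics
  applicationSpecific: boolean; // only meaningful for one downstream workspace
  optionalAndReplaceable: boolean; // runtime-owned, but can be swapped out
}

// Assumed ordering: prefer passthrough, reject duplication, then look for a home.
function classifyFeature(p: FeatureProposal): FeatureClass {
  if (p.upstreamOwnsIt) return "passthrough";
  if (p.duplicatesExecution) return "do not build";
  if (p.applicationSpecific) return "downstream workspace";
  if (p.optionalAndReplaceable) return "plugin capability";
  // What remains is lifecycle, governance, observability, persistence,
  // or protocol access wrapped around upstream execution.
  return "runtime wrapper";
}
```

Checking `upstreamOwnsIt` first mirrors the passthrough-first rule: a capability the backend already owns is never rebuilt, whatever the other answers are.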

package/docs/protocols/http-runtime.md ADDED

@@ -0,0 +1,37 @@
+# HTTP Runtime Protocol
+The native HTTP server exposes stable runtime state and request submission.
+`POST /requests` is a protocol adapter over `RuntimeRequest`. It may pass these
+stable fields through to the runtime:
+- `input`
+- `agentId`
+- `requestId`
+- `sessionId`
+- `parentRunId`
+- `metadata`
+- `memory`
+- `toolCall`
+- `workflow`
+The endpoint must not infer backend behavior, choose tools from prose, or
+rewrite workflow routes. Tool execution and workflow dispatch remain explicit
+runtime requests.
+Inspection endpoints expose normalized runtime state:
+- `GET /inspect`
+- `GET /requests`
+- `GET /requests/:id`
+- `GET /runs/:id/trace`
+- `GET /sessions`
+- `GET /workflows`
+- `GET /workflows/:id/mermaid`
+- `GET /workflows/:id/plan`
+When the workspace enables `specDrivenWorkflow`, `GET /inspect` exposes the
+configured spec-driven policy summary, and `GET /requests/:id` includes the
+run-derived `specDrivenWorkflow` phase state. That state is projected from
+stored `runtime.specDriven.phase.*` events and preserves the raw events in the
+request timeline.
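
The stable-field list for `POST /requests` in `http-runtime.md` lends itself to an allow-list. The following is a minimal sketch under stated assumptions: the nine field names are taken from the document, but the `unknown` value types, the `RuntimeRequestSketch` type, the `pickStableFields` helper, and the choice to drop other fields silently are illustrative only. The real `RuntimeRequest` type is not shown in this diff.

```ts
// Field names come from http-runtime.md; everything else here is hypothetical.
const STABLE_REQUEST_FIELDS = [
  "input",
  "agentId",
  "requestId",
  "sessionId",
  "parentRunId",
  "metadata",
  "memory",
  "toolCall",
  "workflow",
] as const;

type StableField = (typeof STABLE_REQUEST_FIELDS)[number];

// Value types are left as unknown because the document does not specify them.
type RuntimeRequestSketch = Partial<Record<StableField, unknown>>;

// Copies only the stable fields from a parsed JSON body. Nothing is inferred
// from the content of `input`, and no field is rewritten.
function pickStableFields(body: Record<string, unknown>): RuntimeRequestSketch {
  const request: RuntimeRequestSketch = {};
  for (const field of STABLE_REQUEST_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(body, field)) {
      request[field] = body[field];
    }
  }
  return request;
}
```

For example, a body of `{ input: "hi", agentId: "orchestra", tools: [] }` would be reduced to its `input` and `agentId` fields under this sketch. Whether the actual server ignores or rejects unknown fields is not stated in the document.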

package/docs/protocols/langgraph-compatible.md ADDED

@@ -0,0 +1,107 @@
+# LangGraph-Compatible Server
+`stable-harness start` starts the official LangGraph Agent Server when
+`protocols.langgraph` is enabled. The LangGraph HTTP protocol, assistant API,
+thread API, run API, persistence model, and streaming behavior remain owned by
+upstream LangGraph.
+`stable-harness` owns only the runtime assembly:
+- read YAML workspace inventory
+- decide which agents to expose
+- generate the thin LangGraph graph entry file
+- call the official `@langchain/langgraph-api/server` startup API
+- keep OpenAI-compatible and LangGraph-compatible services under one runtime
+  start command
+## Defaults
+`stable-harness start` starts both services by default:
+| Service | Default host | Default port |
+| --- | --- | --- |
+| OpenAI-compatible | `127.0.0.1` | `8642` |
+| LangGraph-compatible | `127.0.0.1` | `2024` |
+Configure or disable services from runtime YAML:
+```yaml
+apiVersion: stable-harness.dev/v1
+kind: Runtime
+spec:
+  protocols:
+    openaiCompatible:
+      enabled: true
+      host: 127.0.0.1
+      port: 8642
+      bearerToken: ${env:STABLE_HARNESS_OPENAI_API_KEY:-}
+    langgraph:
+      enabled: true
+      host: 127.0.0.1
+      port: 2024
+      nWorkers: 10
+      env: .env
+      exposeAgents:
+        - orchestra
+```
+If `exposeAgents` is omitted, all YAML agents are exposed as LangGraph graph
+IDs.
+Then run:
+```bash
+stable-harness start -w "$PWD"
+```
+## LangSmith and Studio
+The LangGraph service is started through the official
+`@langchain/langgraph-api/server`, so LangSmith Studio and LangSmith tracing use
+upstream LangGraph behavior. `stable-harness` only loads the environment before
+that server starts.
+By default, `stable-harness start` reads `.env` from the workspace root for the
+LangGraph service. Existing shell environment variables win over `.env` values.
+For Studio, put your key in the workspace `.env` file:
+```text
+LANGSMITH_API_KEY=lsv2...
+LANGSMITH_TRACING=true
+LANGSMITH_PROJECT=stable-harness-local
+```
+You can choose a different file or inline environment through runtime YAML:
+```yaml
+apiVersion: stable-harness.dev/v1
+kind: Runtime
+spec:
+  protocols:
+    langgraph:
+      enabled: true
+      env: .env.langsmith
+      # or:
+      # envFile: .env.langsmith
+      # env:
+      #   LANGSMITH_PROJECT: local-studio
+```
+When the server prints the LangGraph URL, open Studio with:
+```text
+https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
+```
+## Generated Bridge
+The official LangGraph server requires each graph to be loaded from a file
+entrypoint. `stable-harness start` generates that file under:
+```text
+.stable-harness/langgraph/bridge.mjs
+```
+The generated file exports one graph per exposed YAML agent. Each graph is a
+thin adapter that calls the stable runtime for the selected `agentId`. It does
+not implement the LangGraph HTTP protocol.
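
Two defaults in `langgraph-compatible.md` are easy to misread: shell environment variables win over `.env` values, and an omitted `exposeAgents` exposes every YAML agent. The sketch below shows both rules in isolation. `mergeEnv` and `resolveExposedAgents` are hypothetical helper names, not functions from the package, and how the real runtime handles unknown agent IDs is not covered here.

```ts
// Hypothetical helpers illustrating two documented defaults.
type Env = Record<string, string | undefined>;

// Values already set in the shell take precedence over values read from .env.
function mergeEnv(dotenvValues: Env, shellEnv: Env): Env {
  const merged: Env = { ...dotenvValues };
  for (const key of Object.keys(shellEnv)) {
    const value = shellEnv[key];
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}

// When exposeAgents is omitted, every YAML agent becomes a LangGraph graph ID.
function resolveExposedAgents(
  allAgentIds: string[],
  exposeAgents?: string[],
): string[] {
  return exposeAgents === undefined ? [...allAgentIds] : [...exposeAgents];
}
```

With `.env` containing `LANGSMITH_PROJECT=stable-harness-local` and a shell that exports a different `LANGSMITH_PROJECT`, the merged environment in this sketch carries the shell value.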

package/docs/protocols/openai-compatible.md ADDED

@@ -0,0 +1,121 @@
+# OpenAI-Compatible Protocol Facade
+`stable-harness` exposes OpenAI-compatible endpoints as protocol adapters over the stable runtime contract.
+This surface is for client compatibility. It is not a model provider, a backend adapter, or a new execution framework.
+## Supported MVP Surface
+- `GET /v1/models`
+- `GET /v1/capabilities`
+- `POST /v1/chat/completions`
+- `stream: true` Server-Sent Events for chat completion chunks
+- `stable_harness.tool.progress` SSE events for runtime tool progress
+- `stable_harness.progress.narration` SSE events for runtime progress narration
+The `model` field maps to workspace agent IDs. `/v1/models` lists all workspace agents by default.
+## Local Launch
+From a workspace checkout:
+```sh
+stable-harness start -w "$PWD" --port 8642 --api-key change-me-local-dev
+```
+For embedded applications, the server can read auth settings from runtime YAML:
+```yaml
+apiVersion: stable-harness.dev/v1
+kind: Runtime
+metadata:
+  name: app-runtime
+spec:
+  protocols:
+    openaiCompatible:
+      bearerToken: ${env:STABLE_HARNESS_API_KEY}
+```
+Then the application can use:
+```ts
+const runtime = await createStableHarnessRuntime(workspaceRoot);
+const server = createOpenAiCompatibleHttpServer(runtime);
+```
+Then point OpenAI-compatible clients at:
+```text
+http://127.0.0.1:8642/v1
+```
+For local-only development, auth is optional. When binding beyond loopback, run behind a trusted network boundary and set a bearer token in runtime YAML or pass `--api-key` as an override.
+## Runtime Boundary
+The protocol adapter calls:
+```ts
+runtime.request({
+  input,
+  agentId,
+  metadata: { protocol: "openai-compatible" }
+});
+```
+It must not:
+- create backend-specific model clients
+- bypass workspace agent selection
+- execute client-supplied tools directly
+- mutate DeepAgents or other backend adapter behavior
+- add prompt, keyword, or domain-specific routing logic
+Client `tools` and `tool_choice` fields are ignored by this facade. Runtime tools still come from workspace inventory, governance policy, and the tool gateway.
+## Message Mapping
+`messages` are flattened into a text transcript:
+```text
+system: ...
+user: ...
+assistant: ...
+```
+Text content parts are preserved. Image URL parts are represented as `[image:<url>]` until a backend-native multimodal projection is added.
+Unsupported content types return an OpenAI-shaped `invalid_request_error`.
+## Streaming
+Streaming responses use `chat.completion.chunk` events followed by `data: [DONE]`.
+Runtime tool lifecycle events are emitted as a separate custom SSE event:
+```text
+event: stable_harness.tool.progress
+```
+When runtime progress narration is enabled, narration events are emitted as:
+```text
+event: stable_harness.progress.narration
+```
+These keep tool progress and human-readable narration visible without polluting
+the final assistant message.
+## Next Surfaces
+Add these only after the chat-completions subset is stable:
+- `POST /v1/responses`
+- `previous_response_id` state handling
+- persisted response lookup
+- run event polling endpoints
+- idempotency-key response caching
+Each addition should remain a protocol adapter over native runtime state.
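
The Message Mapping rules in `openai-compatible.md` (role-prefixed transcript lines, `[image:<url>]` placeholders, and an `invalid_request_error` for unsupported content types) can be sketched as a pure function. This is an illustration, not the package's implementation: `flattenMessages`, `renderPart`, the result shape, and the choice to join multiple parts with a newline are assumptions. The `text` and `image_url` part shapes follow the OpenAI chat completions request format.

```ts
// Hypothetical sketch of the documented message flattening.
interface ContentPart {
  type: string;
  text?: string;
  image_url?: { url: string };
}

interface ChatMessage {
  role: string;
  content: string | ContentPart[];
}

type FlattenResult =
  | { ok: true; transcript: string }
  | { ok: false; error: { type: "invalid_request_error"; message: string } };

// Returns the text for one part, or null when the part type is unsupported.
function renderPart(part: ContentPart): string | null {
  if (part.type === "text" && typeof part.text === "string") return part.text;
  if (part.type === "image_url" && part.image_url) {
    return `[image:${part.image_url.url}]`;
  }
  return null;
}

function flattenMessages(messages: ChatMessage[]): FlattenResult {
  const lines: string[] = [];
  for (const message of messages) {
    const parts: ContentPart[] =
      typeof message.content === "string"
        ? [{ type: "text", text: message.content }]
        : message.content;
    const rendered: string[] = [];
    for (const part of parts) {
      const text = renderPart(part);
      if (text === null) {
        return {
          ok: false,
          error: {
            type: "invalid_request_error",
            message: `Unsupported content type: ${part.type}`,
          },
        };
      }
      rendered.push(text);
    }
    // Assumption: parts within one message are joined with a newline.
    lines.push(`${message.role}: ${rendered.join("\n")}`);
  }
  return { ok: true, transcript: lines.join("\n") };
}
```

Returning a result object rather than throwing keeps the error in the same shape the facade would serialize, so the HTTP layer only has to wrap it in a response.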