npm - onto-mcp - Versions diffs - 0.3.0 → 0.3.2 - Mend

onto-mcp 0.3.0 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (61)

package/.onto/domains/software-engineering/logic_rules.md CHANGED

@@ -1,6 +1,6 @@
 ---
-version: 3
-last_updated: "2026-03-31"
+version: 7
+last_updated: "2026-05-28"
 source: bundled-domain-baseline
 status: established
 ---
@@ -125,6 +125,14 @@ See concepts.md §Type System Terms for definitions of structural and nominal ty
 - **Error middleware (Express, Koa, ASP.NET)**: Centralized error handling in request pipeline. Transforms internal errors into user-facing responses. Must not swallow errors silently
 - **Anti-pattern**: catch-and-ignore (`catch (e) {}`) creates silent failures. Every catch block must log, re-throw, or return an error value
+### LLM-Native Failure Posture
+- In LLM-native development, review, and authority-update paths, the default posture is **fail-loud**: a malformed model output, missing context source, invalid tool result, schema mismatch, provider preflight failure, unbudgeted truncation, or absent artifact ref must halt or surface a structured diagnostic naming the failing boundary
+- Silent fallback is a defect when it prevents a maintainer from identifying whether the failure originated in prompt design, context assembly, retrieval, model/provider behavior, tool schema, runtime validation, or artifact persistence
+- Graceful degradation is allowed only as explicit product behavior. It must declare trigger condition, degraded output semantics, lost capability, user/operator-visible marker, diagnostic artifact, and recovery/escalation path
+- Output repair must not erase evidence of the original failure. If runtime repairs malformed LLM output for a non-authority user flow, the repair must record the original validation failure and repaired fields. For review/authority paths, repaired output cannot become trusted output unless the contract explicitly permits that repair
+- Fail-loud and fail-close are complementary: fail-close blocks untrusted output from being accepted, while fail-loud preserves enough diagnostic context to fix the failing source quickly
 ### User-Facing Error Requirements
 - User-facing errors must include: (1) what went wrong (human-readable), (2) what the user can do (recommended action), (3) correlation ID (for support/debugging)
@@ -209,7 +217,7 @@ See concepts.md §Type System Terms for definitions of structural and nominal ty
 ### Load Testing Criteria
 - For quantitative load testing thresholds (P99 > 1s triggers review), see structure_spec.md §Quantitative Thresholds (SSOT for performance thresholds)
-- For SLI/SLO definitions, see domain_scope.md §Operations, Deployment & Maintenance (SSOT for service level targets)
+- For SLI/SLO definitions, see domain_scope.md §Major Sub-areas → Operations and maintenance
 - This section owns only the caching, query optimization, and indexing rules; structure_spec.md owns the thresholds and domain_scope.md owns the SLO framework
 ## Dependency Logic
@@ -231,10 +239,74 @@ See dependency_rules.md for all dependency-related rules.
 - In batch processing, placing transaction boundaries at the Phase level and performing consistency verification at each Phase completion is a practical pattern
 - For backward compatibility classification (breaking vs non-breaking changes), see dependency_rules.md §Breaking vs Non-breaking Changes Classification
+## LLM Boundary Logic
+### Ownership Split Rules
+- Use concepts.md §LLM-Native Engineering Terms for the domain projection of LLM Boundary, Runtime Boundary, Middleware Boundary, and Ownership Non-Interference. This section owns operational prohibitions and contradiction rules, not the base concept definitions
+- LLM semantic output can become evidence or a reasoning source only with declared provenance and trust status
+- Runtime must not produce new semantic meaning, infer evidence sufficiency, or silently repair semantic drift while presenting the result as original LLM judgment
+- Middleware must not reinterpret, upgrade, downgrade, or repair meaning, and must not become a hidden policy owner or second source of truth
+- The LLM must not own persistence, final authority-seat assembly, authorization, cost control, deterministic state transition, or idempotency unless the contract explicitly frames its output as non-authoritative semantic input to runtime-owned gates
+- Boundary crossing must be declared with owner, input/output shape, enforcement profile, artifact or trust status, and diagnostic behavior. Undeclared crossing is a design defect because it makes failures and authority conflicts hard to localize
+### Model and Provider Rules
+- Model provider, model id, model version, auth mode, and route realization are external dependency facts. Production and review records must preserve the values needed to reproduce or explain model behavior
+- Model version aliases are acceptable for exploration but not for reproducibility claims. When aliases are used, the run artifact must record that behavior may drift
+- Model routing must declare the routing condition, selected model capability requirement, fallback behavior, and diagnostic behavior when no route qualifies
+### Prompt and Context Rules
+- Prompt templates, tool schemas, agent instructions, retrieval policy, and context assembly rules are behavior-affecting artifacts. Changes to them require the same review discipline as code/config changes
+- Structured model output must be validated before consumption. A schema instruction in a prompt is not a validation guarantee
+- Token budget must be checked before model invocation when truncation would remove instructions, evidence, or output schema requirements. Silent truncation is a contract failure
+- Retrieved context used as evidence must carry source provenance. If provenance is missing, the output may still be useful as a draft but cannot be treated as evidence-backed
+### Output Zero-Trust Rules
+- LLM output is untrusted until runtime validates it for the downstream sink. Schema validity only proves shape; it does not prove shell safety, SQL safety, HTML safety, path safety, email safety, authorization, provenance, or semantic truth
+- A prompt instruction such as "return valid JSON" or "do not call unsafe tools" must not be treated as a security, authorization, or sink-validation guarantee
+- If LLM output can influence a command, database query, file path, rendered HTML, API request, email, authority artifact, or user-visible decision, the sink must have explicit validation/encoding/authorization before use
+- Runtime may classify trust status but must not silently upgrade untrusted output into trusted output. Trust upgrades require a declared gate, evidence, and artifact or audit record
+### External Content and Prompt Injection Rules
+- External content entering a prompt through user input, webpages, files, email, documents, tool output, or retrieval is data, not instruction authority
+- Instruction hierarchy must be enforced by runtime/context assembly. A system cannot rely on the model alone to ignore hostile or hidden instructions in external content
+- External content must not grant tool permission, override role instructions, change output authority, request secret disclosure, or authorize exfiltration sinks unless a runtime-owned policy explicitly allows it
+- If external content can influence tool calls or authority artifacts, the design must include a prompt-injection test or red-team scenario that proves the boundary is enforceable
+### Retrieval and Vector Rules
+- Retrieval relevance does not imply evidence authority. Retrieved material becomes evidence only when source identity, permission scope, retrieval path, and provenance are recorded
+- RAG must apply permission filtering before content enters model context. Filtering after generation is too late for tenant/user boundary protection
+- Ingestion must validate source trust, document lifecycle, metadata, and poisoning risk before indexing. A poisoned corpus is a behavior dependency, not merely bad data
+- Embedding model, chunking strategy, preprocessing, index version, and retrieval/reranking policy are coupled. Changing one requires compatibility analysis and an eval or migration path
+- Retrieved text containing instructions must remain quoted or otherwise bounded as source content. It must not become a higher-priority prompt instruction
+### Agent and Tool Rules
+- Agent tool schemas must be self-describing enough for an agent to determine applicability without hidden documentation
+- Agent loops must define termination conditions, timeout behavior, and progress persistence. A loop without a termination condition is an unbounded runtime risk
+- Tool call results must be validated before they influence further reasoning. Tool failure, partial output, or unavailable capability must be represented explicitly in agent state
+- Context-isolated reasoning units must receive contracted input and produce contracted output. They must not depend on hidden main-context state for correctness
+- Agent functionality, permission, and autonomy must be minimized separately. A tool being technically available does not mean the agent is authorized to use it, and authorization does not mean it may act without human approval
+- High-impact tool actions require a human approval gate or a documented risk acceptance. High-impact includes irreversible writes, external communication, payment, credential use, data deletion, deployment, security policy change, and user-affecting authority artifacts
+- Agent retries must be bounded by retry safety. Retrying a non-idempotent or high-impact action without idempotency or approval is a logic defect
+### AI Governance and Risk Rules
+- AI behavior that can materially affect users, operators, security, privacy, release decisions, or authority artifacts must have an AI risk owner or explicit non-applicability rationale
+- A release gate that claims AI quality must distinguish deterministic correctness, schema validity, semantic evaluation, safety/security testing, and production drift monitoring
+- Red-team findings, incidents, semantic eval failures, and drift signals must feed back into prompts, policies, tests, controls, or release gates. Treating them as disconnected observations breaks continuous improvement
+- AI incident disclosure must be defined when AI behavior can mislead users, expose data, trigger unsafe action, or break a trust claim. Hidden incident handling is not compatible with artifact truth
 ## Related Documents
 - concepts.md — term definitions for type system, constraint design, concurrency, security, and testing
 - dependency_rules.md — dependency direction rules, API dependency management, build/package dependency rules
 - structure_spec.md — module structure, layer structure principles, verification structure
+- prompt_interface.md — prompt, role, tool, response format, and context interface criteria
 - competency_qs.md — CQ-T-01~CQ-T-08 (Types and Constraints verification questions)
 - competency_qs.md — CQ-E-01~CQ-E-08 (Error Handling verification questions)
 - competency_qs.md — CQ-P-01~CQ-P-04 (Performance verification questions)
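
The fail-loud posture and the "validate before consumption" rule added to logic_rules.md above can be illustrated with a minimal Python sketch. It is not taken from the onto-mcp package: `BoundaryFailure`, `parse_review_output`, and the `REQUIRED_FIELDS` contract are hypothetical names chosen for illustration. The sketch shows a validator that halts with a structured diagnostic naming the failing boundary and keeps the original output as evidence, instead of repairing or silently falling back.

```python
import json


class BoundaryFailure(Exception):
    """Structured diagnostic naming the failing boundary (hypothetical shape)."""

    def __init__(self, boundary: str, reason: str, raw_output: str) -> None:
        super().__init__(f"[{boundary}] {reason}")
        self.boundary = boundary      # e.g. "model_output" or "schema"
        self.reason = reason
        self.raw_output = raw_output  # original evidence is preserved, not repaired


# Hypothetical output contract: field name -> expected Python type.
REQUIRED_FIELDS = {"verdict": str, "evidence_refs": list}


def parse_review_output(raw: str) -> dict:
    """Validate model output before consumption; never fall back silently."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise BoundaryFailure("model_output", f"not valid JSON: {exc.msg}", raw) from exc
    if not isinstance(data, dict):
        raise BoundaryFailure("schema", "top-level value is not an object", raw)
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise BoundaryFailure("schema", f"missing field '{name}'", raw)
        if not isinstance(data[name], expected):
            raise BoundaryFailure("schema", f"field '{name}' is not {expected.__name__}", raw)
    return data
```

A passing shape check here only proves shape. Per the Output Zero-Trust Rules, any downstream sink would still need its own validation, encoding, or authorization gate.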

package/.onto/domains/software-engineering/problem_framing_profile.md CHANGED

@@ -1,6 +1,6 @@
 ---
-version: 1
-last_updated: "2026-05-21"
+version: 3
+last_updated: "2026-05-27"
 source: issue-stance-deliberation-contract
 status: design_target
 doc_type: custom:problem_framing_profile
@@ -29,6 +29,14 @@ Required when an issue affects a concrete software artifact, runtime path, or de
 | `test_verification` | tests, conformance checks, smoke checks, validation harness |
 | `authority_docs` | `.onto` authority/process/principle docs used by runtime or agents |
 | `developer_experience` | setup, commands, diagnostics, handoff ergonomics |
+| `llm_agent_workflow` | LLM/agent orchestration, prompt/context assembly, tool use, or multi-agent coordination |
+| `model_provider_boundary` | model/provider/version/auth/routing dependency boundary |
+| `semantic_evaluation` | rubric, eval set, AI-as-judge, human review, or quality baseline |
+| `failure_diagnostics` | fail-loud diagnostics, structured failure artifacts, observability, or degraded-state surfacing |
+| `output_sink_boundary` | shell, SQL, HTML, file, email, API/tool, or authority-artifact sink that consumes generated or external output |
+| `rag_retrieval_boundary` | retrieval, embedding index, corpus, permission filter, source validation, or retrieval audit |
+| `ai_governance` | AI risk owner, approval gate, human oversight, incident disclosure, red-team/eval loop, or governance evidence |
+| `provenance_artifact` | source refs, builder/agent, input set, transformation path, verification state, or generated-artifact trust status |
 | `future_work` | reconstruct, evolve, learn, govern, or later product area |
 ### defect_kind
@@ -44,6 +52,15 @@ Required when the issue can be expressed as a software-development problem type.
 | `integration_failure` | independently valid parts do not compose into the intended path |
 | `verification_gap` | implementation or contract lacks a reliable check |
 | `observability_gap` | failure or state cannot be inspected well enough to operate or debug |
+| `silent_degradation` | fallback, repair, or graceful degradation hides the origin, trust loss, or incomplete behavior |
+| `semantic_quality_gap` | route/schema succeeds but usefulness, faithfulness, or output quality is unproven or degraded |
+| `output_trust_gap` | LLM/generated output reaches a downstream sink without sink-specific validation, encoding, authorization, provenance, or trust classification |
+| `prompt_injection_boundary_gap` | external content can override role, tool, permission, output, disclosure, or authority rules |
+| `rag_permission_gap` | retrieved material can cross permission, tenant, source-trust, or provenance boundaries before context injection |
+| `agency_overreach` | agent functionality, permission, or autonomy is broader than the task/risk justifies |
+| `provenance_gap` | authority-affecting generated/retrieved artifacts cannot be traced to source, builder, inputs, transformation path, and verification state |
+| `governance_gap` | material AI risk lacks owner, approval/acceptance gate, human oversight, incident path, or improvement loop |
+| `value_tradeoff_gap` | a local optimization hides or distorts stakeholder value, user/operator agency, accessibility, diagnosability, accountability, or artifact truth |
 | `quality_debt` | issue increases maintenance, drift, or coordination cost without immediate breakage |
 | `implementation_task` | design is sufficiently closed and can move to build work |
@@ -56,8 +73,15 @@ Optional. Use when the next useful evidence path matters to closure.
 | `schema_validation` | parser or schema check should validate the artifact shape |
 | `unit_test` | focused behavior test should cover the issue |
 | `integration_smoke` | end-to-end or cross-module smoke check is needed |
+| `semantic_eval` | rubric/golden-set or pairwise model/agent output evidence is needed |
+| `failure_artifact_smoke` | a fail-loud or degraded-state artifact should prove the failure remains diagnosable |
 | `package_install_smoke` | packaged install or executable path must be verified |
 | `provider_conformance` | provider-specific behavior needs a conformance check |
+| `sink_validation_smoke` | downstream sink validation/encoding/authorization should be exercised with generated or hostile input |
+| `prompt_injection_redteam` | hostile external-content scenario should verify instruction hierarchy and exfiltration boundaries |
+| `rag_permission_smoke` | retrieval should prove permission filtering, source provenance, poisoning control, and audit refs |
+| `provenance_audit` | artifact or claim provenance should be traced through source, builder/agent, input set, transformation, and verification state |
+| `governance_review` | risk owner, risk treatment, approval gate, incident path, and continuous-improvement loop should be reviewed |
 | `human_design_decision` | maintainer/user decision is the next verification gate |
 ## Rules
@@ -66,3 +90,6 @@ Optional. Use when the next useful evidence path matters to closure.
 2. `stale_authority_text` must be paired with `implementation_surface` or an explicit rationale explaining that runtime behavior is unaffected.
 3. `implementation_task` is not a fix proposal; it means the issue is framed well enough to become implementation input.
 4. `future_work` should be used when an issue belongs to reconstruct, evolve, learn, govern, or another planned capability rather than the current review path.
+5. `value_tradeoff_gap` must explain which value commitment is affected: diagnosability, artifact truth, accountability, evidence, explicit loss, least agency, governance, accessibility, or user/operator agency.
+6. `governance_gap` should not be downgraded to documentation-only when AI behavior affects release, security/privacy, user decisions, or authority artifacts.
+7. `output_trust_gap`, `prompt_injection_boundary_gap`, and `rag_permission_gap` should prefer fail-close plus fail-loud closure unless the product explicitly accepts visible degraded behavior.
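
Rules 5 and 7 added to problem_framing_profile.md above could be checked mechanically. The following is a minimal Python sketch under stated assumptions: only `defect_kind` and the listed enum values come from the diff, while the record shape and the `value_commitment`, `closure`, and `accepts_visible_degradation` field names are hypothetical.

```python
# All field names except `defect_kind` are hypothetical, chosen for illustration.
VALUE_COMMITMENTS = {
    "diagnosability", "artifact truth", "accountability", "evidence",
    "explicit loss", "least agency", "governance", "accessibility",
    "user/operator agency",
}
FAIL_CLOSE_KINDS = {
    "output_trust_gap", "prompt_injection_boundary_gap", "rag_permission_gap",
}


def check_framing(record: dict) -> list[str]:
    """Return rule findings for one framing record (empty list means none found)."""
    problems: list[str] = []
    kind = record.get("defect_kind")
    # Rule 5: a value_tradeoff_gap must name the affected value commitment.
    if kind == "value_tradeoff_gap" and record.get("value_commitment") not in VALUE_COMMITMENTS:
        problems.append("rule 5: name the affected value commitment")
    # Rule 7: prefer fail-close plus fail-loud unless degradation is explicitly accepted.
    if kind in FAIL_CLOSE_KINDS:
        closure = set(record.get("closure", []))
        accepted = record.get("accepts_visible_degradation", False)
        if not accepted and not {"fail-close", "fail-loud"} <= closure:
            problems.append("rule 7: prefer fail-close plus fail-loud closure")
    return problems
```

Because rule 7 is worded as a preference ("should prefer"), the sketch reports a finding rather than rejecting the record.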

package/.onto/domains/software-engineering/prompt_interface.md ADDED

@@ -0,0 +1,122 @@
+---
+version: 4
+last_updated: "2026-05-28"
+source: merged-from-llm-native-development
+status: established
+---
+# Software Engineering Domain — Prompt and Agent Interface Criteria
+This document defines design criteria for prompts, role instructions, tool schemas, response formats, and agent handoffs used inside software-engineering workflows.
+## System Prompt Structure
+- System prompts must separate stable role/rules from dynamic task input.
+- Prompts that affect behavior must be versioned, reviewable, and tied to an artifact seat.
+- Context loading must be explicit: either the agent reads declared files/resources, or runtime injects a bounded context bundle.
+- Historical notes, deprecated behavior, and migration rationale must not be loaded into active execution context unless the task specifically needs that history.
+## Role Definition Structure
+Role definitions for LLM workers or agents must include:
+- purpose and non-purpose
+- inputs the role is allowed to use
+- outputs the role must produce
+- tools/resources available to the role
+- forbidden actions
+- failure behavior and escalation path
+## Ownership Boundary Structure
+Prompt, role, handoff, and tool interfaces must reference the LLM/runtime/middleware boundary vocabulary in `concepts.md` and state the task-specific ownership split when more than one layer participates in behavior. This section owns interface obligations, not the canonical boundary definitions.
+An interface that crosses the boundary must declare:
+- the semantic work delegated to the LLM role
+- the deterministic gates and authority seats owned by runtime
+- the transport/adaptation work owned by middleware
+- accepted input and output shape for each boundary crossing
+- enforcement profile, trust/artifact status, and diagnostic behavior
+- forbidden crossovers, especially semantic repair by runtime/middleware and LLM bypass of runtime-owned validation, persistence, authority assembly, authorization, idempotency, or cost/security gates
+## Tool Definition Structure
+Every tool exposed to an agent must include:
+- name and concise purpose
+- parameter schema with required/optional fields
+- result shape and trust status
+- failure modes and retry safety
+- permission or side-effect boundary
+- examples only when they reduce ambiguity
+Tool definitions with overlapping capability must include routing guidance or be consolidated.
+Tool definitions for high-impact actions must additionally include:
+- required human approval condition
+- idempotency or rollback expectation
+- audit artifact emitted on success/failure
+- forbidden use cases
+- sensitive input/output handling
+## Response Format Constraints
+- Structured output must be validated by runtime before consumption.
+- Format instructions in a prompt do not replace schema validation.
+- When output becomes an authority artifact, malformed output must fail-close and fail-loud unless a documented repair rule exists.
+- If a response is degraded, partial, or draft-only, that status must be visible in the output and artifact metadata.
+## Output Sink Constraints
+Prompt and response contracts must name any downstream sink that will consume model output.
+| Sink | Required runtime gate |
+|---|---|
+| Shell/CLI | command allowlist or parser, argument escaping, approval for destructive actions |
+| SQL/database | parameterization, authorization, transaction/idempotency handling |
+| HTML/Markdown/user display | output encoding/sanitization, trust/status markers where needed |
+| File path/filesystem | path normalization, root-boundary validation, overwrite/destructive-action policy |
+| Email/chat/external message | recipient authorization, disclosure policy, approval for sensitive/high-impact content |
+| API/tool call | schema validation, permission check, side-effect classification |
+| Authority artifact | schema validation, provenance, trust status, deterministic assembly gate |
+If no sink is known at prompt time, the output must be treated as draft/untrusted until the sink is declared and validated.
+## Context Window Utilization
+- Static prompt material should be small enough to leave room for user input, retrieved context, and output.
+- Token budget should be checked before dispatch when truncation would remove instructions, evidence, or schemas.
+- Retrieved context used as evidence must carry provenance.
+- Critical instructions and output schemas should be placed where the model is least likely to lose them under long context.
+## External Content Handling
+- User input, webpage text, file contents, retrieved snippets, email bodies, logs, and tool output must be framed as data unless a runtime-owned policy explicitly grants instruction authority.
+- Prompts should label untrusted external content and instruct the model not to treat it as role, tool, permission, or output-format authority.
+- Runtime/context assembly must preserve source refs and permission scope for external content used as evidence.
+- Hidden instructions found in external content are a prompt-injection case, not a valid override.
+## Agent Permission and Autonomy
+Agent-facing instructions must distinguish:
+- functionality: what the tool/runtime can do
+- permission: what the agent is authorized to do in this task/user/tenant scope
+- autonomy: what the agent may do without human approval
+An agent prompt that says "use tools as needed" is under-specified unless tool permission, autonomy, retry safety, and high-impact approval boundaries are declared elsewhere in the contracted input.
+## Fail-Loud Interface Rule
+For development, review, and authority-update paths, an interface that cannot provide the required prompt, context, tool, model, or output contract should stop with a diagnostic artifact. Silent fallback is more costly than visible failure because it hides the failing boundary and forces later exploration.
+Graceful degradation is allowed for user-facing product behavior only when the reduced capability, cause, diagnostic reference, and recovery path are explicit.
+## Related Documents
+- concepts.md — LLM-native engineering terms
+- logic_rules.md — LLM boundary logic and failure posture
+- structure_spec.md — LLM-native system structure
+- competency_qs.md — CQ-A questions for AI agent and LLM-native collaboration
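
The "File path/filesystem" row of the Output Sink Constraints table in prompt_interface.md above (path normalization, root-boundary validation) can be sketched in Python. This is an illustrative sketch rather than code from the package: `SinkRejected` and `resolve_inside_root` are hypothetical names, and the overwrite/destructive-action policy from the same row is left out of scope.

```python
from pathlib import Path


class SinkRejected(Exception):
    """Raised when model output fails a sink-specific gate (hypothetical name)."""


def resolve_inside_root(root: Path, model_supplied: str) -> Path:
    """Normalize a model-supplied path and enforce the root boundary.

    The model-supplied string is treated as untrusted data: it is joined to
    the root, normalized, and rejected if the result leaves the root.
    """
    root = root.resolve()
    candidate = (root / model_supplied).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise SinkRejected(f"path escapes root: {model_supplied!r}")
    return candidate
```

Resolving both the root and the candidate before comparing them means `..` segments, absolute paths, and existing symlinks are all evaluated in their normalized form. The gate fails closed by raising, and the exception message keeps the offending input visible for diagnosis.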

package/.onto/domains/software-engineering/structure_spec.md CHANGED

@@ -1,6 +1,6 @@
 ---
-version: 2
-last_updated: "2026-03-30"
+version: 6
+last_updated: "2026-05-28"
 source: bundled-domain-baseline
 status: established
 ---
@@ -45,7 +45,7 @@ Classification axis: **structural component** — specifications classified by t
 ## Required Relationships
 - See §Golden Relationships for module-interface, test-code, and config-code coherence rules.
-- All external dependencies (libraries, APIs) must be abstracted via interfaces for replaceability
+- External dependencies must follow dependency_rules.md. Use owned interfaces, ports, adapters, or anti-corruption layers when replacement, testing, security, policy isolation, or model translation matters. Direct coupling to a stable or low-risk dependency may be accepted with an explicit tradeoff rationale
 - When structural verification (code) and execution procedures (protocol) are in separate documents, the linking reference must be back-referenced in the protocol document for enforcement to be complete
 ## Golden Relationships
@@ -60,7 +60,12 @@ Golden relationships are cross-component validation rules. Each rule connects tw
 ## Layer Structure Principles
-Layer dependency direction rules are defined in dependency_rules.md §Direction Rules. The key principle: upper layers depend on lower layers, never the reverse.
+Layer dependency direction rules are defined in dependency_rules.md §Direction Rules. There is no single global "upper -> lower" rule:
+- Conventional layered architecture may allow presentation/application layers to depend on lower service/data-access layers.
+- Clean and Hexagonal architectures constrain source-code dependencies to point inward or toward ports/abstractions.
+- Runtime call/data-flow direction is separate from source-code dependency direction.
+- Reviews must apply the direction rule for the declared architecture pattern and dependency kind.
 ## Authority and Layer Separation
@@ -137,9 +142,52 @@ These thresholds are structural health indicators derived from industry practice
 | Test coverage (line) | < 60% | Critical verification gap | Immediate action required |
 | API response time | P99 > 1s | Performance degradation | Performance review and optimization |
 | Class inheritance depth | > 5 levels | Inheritance hierarchy is too deep | Prefer composition over inheritance |
+| Agent tool count | > 20 tools per agent | Tool selection quality drops and routing becomes non-deterministic | Split tools, add routing, or narrow the agent role |
+| Prompt template length | > 25% of target context window | User input/retrieved evidence/output schema may be squeezed or truncated | Refactor prompt, move stable material to refs, or choose a larger context model |
 Cross-reference: logic_rules.md 'Testing Logic' (test boundary rules inform coverage measurement strategy).
+## LLM-Native System Structure
+This section applies when a software system, development workflow, or review workflow depends on LLMs, agents, prompt/context contracts, retrieval, model providers, or tool-call boundaries.
+### Required Components
+| Component | Structure | Failure if Missing |
+|---|---|---|
+| Model connection | Provider/client boundary with model id, auth mode, version, rate-limit handling | The system cannot reproduce or explain model behavior |
+| Prompt/context assembly | Prompt templates, instruction hierarchy, context sources, token budget, output schema | Model input becomes an unreviewable prompt blob |
+| Output validation and sink gates | Schema validation, semantic checks, sink-specific validation/encoding/authorization, trust boundary, failure artifact | Malformed or unsafe output becomes trusted behavior or unsafe downstream input |
+| Evaluation harness | Golden set, rubric, baseline, comparison method | Route success is mistaken for output quality |
+| Observability | Prompt/output/model/tool facts, correlation id, cost, latency, failure reason | Failures become expensive to diagnose |
+| Provenance record | Source refs, builder/agent, inputs, transformation path, verification state, model/provider facts | Generated or retrieved claims become unverifiable authority |
+| Ownership boundary map | LLM semantic delegation, runtime deterministic gates and authority seats, middleware transport/adaptation, trust status, diagnostics | LLM, runtime, or middleware can silently take over another layer's authority |
+### Optional Components Required When Applicable
+| Component | Required When | Structure |
+|---|---|---|
+| Retrieval/RAG pipeline | External knowledge is selected for model context | ingestion -> processing -> indexing -> retrieval -> reranking/context handoff, with provenance at each stage |
+| RAG permission layer | Retrieved material crosses users, tenants, projects, sensitivity classes, or authority levels | pre-context permission filtering, source validation, poison checks, retrieval audit, redaction/exclusion path |
+| Agent tool registry | An LLM can choose actions or call tools | tool name, purpose, parameter schema, result shape, failure semantics, permission boundary |
+| Agent state/progress | Work spans multiple steps, tools, or sessions | explicit state object or artifact with completed/current/remaining steps and accumulated refs |
+| Multi-agent profile | Multiple reasoning units collaborate | coordination pattern, isolated inputs/outputs, termination conditions, conflict-resolution authority |
+| Safety guardrails | User input, model output, or agent action can cause harm | input guardrail, output guardrail, action permission model, logging, false-positive review path |
+| AI governance record | AI behavior materially affects users, operators, release, security, privacy, or authority artifacts | risk owner, risk treatment, approval gate, human oversight, transparency/audit evidence |
+| Red-team/incident loop | AI behavior can fail semantically, disclose data, mislead users, or trigger unsafe action | test scenario, finding intake, incident disclosure path, remediation owner, updated prompt/policy/eval/release gate |
+| Human approval gate | Agent output or action is high-impact, irreversible, external, privileged, or user-affecting | approver role, approval input, audit record, denial path, idempotency or rollback expectation |
+### Golden Relationships
+- **Model capability -> prompt/tool requirement**: The chosen model must support the prompt's required capabilities: context length, structured output, tool use, modality, and reasoning level. If not, invocation must fail-loud before dispatch or record a degraded route explicitly
+- **Retrieved context -> evidence claim**: Any evidence-backed claim must trace to retrieved context provenance. Generated text without provenance may be a draft but not evidence
+- **Tool schema -> agent instruction**: Every tool named in agent instructions must exist with a valid schema, and every exposed tool must either be referenced by an agent role or justified as discoverable reserve capacity
+- **Evaluation baseline -> production drift**: Production quality drift detection must compare against an evaluation baseline. Monitoring without a baseline cannot classify quality movement
+- **External content -> model context**: Any external content entering model context must pass through context assembly that preserves instruction hierarchy and treats the content as data, not authority
+- **LLM semantic output -> runtime authority gate -> middleware adapter**: LLM output may provide semantic input, but runtime owns validation, authority-seat assembly, persistence, authorization, idempotency, and cost/security gates. Middleware may adapt envelopes, routes, and observability plumbing, but must not repair meaning, become hidden policy authority, or bypass runtime-owned gates
+- **Agent capability -> permission -> autonomy**: Capability, authorization, and approval are distinct structure seats. A design that exposes a tool without separately declaring permission and autonomy is structurally under-specified
+- **AI risk -> owner -> gate -> feedback loop**: Material AI risk must connect to an owner, approval or acceptance gate, incident/red-team intake, and update path for controls or evals
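
The "Tool schema -> agent instruction" relationship above lends itself to a mechanical check. The following is a minimal TypeScript sketch, not code from the package: `ToolSpec`, `AgentRole`, `ToolDiagnostic`, and `checkToolRelationships` are hypothetical names, and the `hasValidSchema` flag stands in for whatever schema validation a real runtime would perform.

```typescript
// Hypothetical shapes for illustration; the package does not define these types.
interface ToolSpec {
  name: string;
  hasValidSchema: boolean;
  // Why an unreferenced tool is still exposed ("discoverable reserve capacity").
  reserveJustification?: string;
}

interface AgentRole {
  name: string;
  referencedTools: string[];
}

interface ToolDiagnostic {
  boundary: "tool-schema";
  kind: "missing-tool" | "invalid-schema" | "unjustified-exposure";
  tool: string;
  role?: string;
}

// Returns one structured diagnostic per violated relationship instead of
// silently dropping unknown tools or ignoring unused ones.
function checkToolRelationships(tools: ToolSpec[], roles: AgentRole[]): ToolDiagnostic[] {
  const byName = new Map<string, ToolSpec>();
  for (const tool of tools) byName.set(tool.name, tool);

  const referenced = new Set<string>();
  const diagnostics: ToolDiagnostic[] = [];

  // Every tool named in agent instructions must exist with a valid schema.
  for (const role of roles) {
    for (const toolName of role.referencedTools) {
      referenced.add(toolName);
      const spec = byName.get(toolName);
      if (spec === undefined) {
        diagnostics.push({ boundary: "tool-schema", kind: "missing-tool", tool: toolName, role: role.name });
      } else if (!spec.hasValidSchema) {
        diagnostics.push({ boundary: "tool-schema", kind: "invalid-schema", tool: toolName, role: role.name });
      }
    }
  }

  // Every exposed tool must be referenced by a role or carry a justification.
  for (const tool of tools) {
    if (!referenced.has(tool.name) && !tool.reserveJustification) {
      diagnostics.push({ boundary: "tool-schema", kind: "unjustified-exposure", tool: tool.name });
    }
  }
  return diagnostics;
}
```

A caller following the fail-loud posture would presumably refuse to dispatch when the returned list is non-empty, rather than filtering the offending tools out.
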
 ## Verification Structure
 ### Static Analysis Integration
@@ -182,4 +230,5 @@ Cross-reference: logic_rules.md 'Testing Logic' (test boundary rules inform cove
 - concepts.md — term definitions for module, interface, layer, architecture patterns, etc.
 - dependency_rules.md — dependency direction and circular dependency rules, build/package dependency management
 - logic_rules.md — type system logic, constraint design, security logic, testing logic
+- prompt_interface.md — prompt, role, tool, response format, and context interface criteria
 - competency_qs.md — CQ-S-01~CQ-S-10 (Structural Understanding verification questions)

package/.onto/principles/llm-native-development-guideline.md CHANGED

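@@ -45,6 +45,7 @@ If a mixed stage appears, it must be split into the two below.
 7. Work that cannot be safely automated with a `script` should be owned by the `LLM`, not the runtime.
 8. The prompt path must be the **reference realization** of the designed process, not a rough version of the design.
 9. A system under development must remain in an actually working state at every stage.
+10. In LLM-native development, review, and authority-update paths, **fail-loud** is the default over hidden fallback.
 ## 3. Do not overstate the runtime's role
@@ -69,6 +70,23 @@ What the runtime must not do:
 In other words, the runtime is not the layer that produces semantic quality;
 it is the layer that keeps semantic drift from leaking outside the contract.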
+### 3.1 Fail-loud over silent degradation
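+In LLM-native development, the traditional "fail-safe" intuition does not always hold.
+In conventional software, adding a fallback or graceful degradation so that the user can keep working often reduces cost. In LLM-native development, review, and authority-update paths, however, silent failure costs more, because someone must search again for whether the failure point was the prompt, context assembly, retrieval, the model/provider, the tool schema, or the runtime validator.
+In this environment, the **cost of locating the cause of a failure** is a bottleneck more often than the cost of coding or maintenance. Because LLMs and agents lower the cost of making a fix, it is usually cheaper to fail loudly where the problem occurs and fix it right away.
+The default rules are therefore as follows.
+- For malformed LLM output, missing context, schema mismatch, invalid tool result, provider preflight failure, and token budget overflow, do not hide the failure; record where it failed and why.
+- A fallback must not be an "internal trick to keep running"; it must be product behavior with a declared trigger, lost capability, trust status, diagnostic artifact, and recovery path.
+- On paths that create artifact truth, such as review, canonicalization, and authority update, degraded output must not pass as if it were complete output.
+- Graceful degradation may be the default only in user-facing production flows. Even then, partial results, reduced quality, and insufficient evidence must be visible to the user or operator.
+`fail-close` is the gate that keeps contract-violating output from entering the trust boundary; `fail-loud` is the diagnostic posture that exposes why the gate closed so it can be fixed immediately. The two are not substitutes; they are used together.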
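
The pairing of `fail-close` and `fail-loud` in section 3.1 can be illustrated with a small validation gate. This is a minimal TypeScript sketch under stated assumptions, not the package's runtime: `ReviewFinding`, `GateResult`, `GateDiagnostic`, and `gateReviewOutput` are hypothetical names, and the two failure labels are chosen only to mirror "malformed LLM output" and "schema mismatch" from the list above.

```typescript
// Hypothetical contract for one review finding; not taken from the package.
interface ReviewFinding {
  severity: "low" | "high";
  summary: string;
}

interface GateDiagnostic {
  failure: "malformed-output" | "schema-mismatch";
  reason: string;
  rawOutput: string; // original evidence is preserved, never repaired away
}

type GateResult =
  | { status: "trusted"; value: ReviewFinding }
  | { status: "blocked"; diagnostic: GateDiagnostic };

function isSeverity(x: unknown): x is "low" | "high" {
  return x === "low" || x === "high";
}

function blocked(failure: GateDiagnostic["failure"], reason: string, rawOutput: string): GateResult {
  return { status: "blocked", diagnostic: { failure, reason, rawOutput } };
}

// fail-close: output that misses the contract never becomes a trusted value.
// fail-loud: the blocked result names what failed and keeps the raw output.
function gateReviewOutput(raw: string): GateResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (_err) {
    return blocked("malformed-output", "output is not valid JSON", raw);
  }
  if (typeof parsed !== "object" || parsed === null) {
    return blocked("schema-mismatch", "expected a JSON object", raw);
  }
  const record = parsed as Record<string, unknown>;
  const severity = record["severity"];
  const summary = record["summary"];
  if (!isSeverity(severity) || typeof summary !== "string") {
    return blocked("schema-mismatch", "severity must be low|high and summary must be a string", raw);
  }
  return { status: "trusted", value: { severity, summary } };
}
```

Note that the gate does not coerce an unknown severity to a default or substitute an empty summary; either repair would let degraded output pass as complete output, which the rules above rule out for review and authority paths.
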
 ## 4. Decision frame
 When designing a new task or feature, consider the three questions below in order.
@@ -233,6 +251,7 @@ For LLM features, treat the following assets as equal in standing to code.
 - retrieval policy
 - tool use policy
 - fallback policy
+- fail-loud policy
 - reviewer workflow
 - promote / canonicalize criteria
 - declared boundary policy
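@@ -356,6 +375,7 @@ Things the ontology must not fix too early:
 - Trying to prove quality with exact-match tests alone
 - Starting with runtime hardening without evals
 - Treating expressions of uncertainty or abstention as failure
+- Hiding the failure point with silent fallback, hidden output repair, or unmarked graceful degradation
 - Refining only the schema without prompt/context/retrieval experiments
 - Improving only format stability, not quality, and calling it an "improvement"
 - Drawing the boundary wrongly so that the runtime takes over semantic tasks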

package/.onto/principles/productization-charter.md CHANGED

@@ -338,6 +338,12 @@ canonical is:
 `synthesize` is not a new lens;
 it must conservatively synthesize the existing lens results.
+`New Perspectives` is not a mechanism for changing the active lens set of the current review run.
+Domain documents and domain concerns do not decide whether a lens is added. A domain supplies its concerns as
+case evidence, CQs, rules, and value commitments, and the existing lenses consume that
+material. Adding, removing, splitting, or merging lenses is not domain work but a change to review process
+governance, and its impact on all domains and runtime artifacts must be assessed separately.
 ### 10.3 Context-isolated reasoning unit
 → canonical location: `.onto/principles/ontology-as-code-guideline.md` §7 (structural rules) + `.onto/principles/llm-native-development-guideline.md` (design guide)

package/.onto/processes/evolve/material-kind-adapter-contract.md CHANGED

@@ -20,6 +20,7 @@ The shared material contract is:
 ```text
 .onto/processes/shared/target-material-kind-contract.md
+.onto/processes/shared/pipeline-execution-ledger-contract.md
 ```
 No `evolve` runtime or MCP tool is active in this repository. This document is
@@ -52,6 +53,7 @@ Runtime may own:
 - source and artifact refs
 - deterministic metrics and validation reports
 - unsupported or unknown material failure records
+- pipeline execution ledger projection for artifact trust and provenance
 Host LLM and user-mediated flow own:
@@ -87,6 +89,7 @@ design-stage output:
 | `evolve-adapter-selection.yaml` | runtime | selected adapter id, material kind, support status, and unsupported reason |
 | `evolve-context-observations.yaml` | runtime | material-specific current-state observations without design recommendations |
 | `evolve-specification.yaml` | host LLM, user confirmed | proposed design change after inquiry and scope agreement |
+| future `pipeline-execution-ledger.yaml` or status projection | runtime | trust status for target profile, adapter selection, observations, specification, validation, and final disposition units |
 | `evolve-record.yaml` | runtime assembly | artifact refs, material status, validation summaries, and final disposition refs |
 These names are future contract placeholders. Runtime implementation must either
@@ -111,3 +114,6 @@ When future evolve implementation starts, the first tests should prove:
 - unsupported material states produce explicit structured output
 - generated artifacts preserve `target_material_kind`
 - runtime outputs bounded observations and refs, not design decisions
+- status/result surfaces expose a pipeline execution ledger projection so callers
+  can see which evolve artifacts are trusted, untrusted, or blocked by upstream
+  failure
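
The last test expectation above (callers can see which artifacts are trusted, untrusted, or blocked by upstream failure) can be sketched as a pure projection over ordered pipeline units. This is only an illustration in TypeScript: the contract names no concrete schema, so `LedgerUnit`, `LedgerProjectionEntry`, `projectLedger`, the status strings, and the assumption of a strictly linear pipeline are all hypothetical.

```typescript
// Hypothetical ledger shapes; the contract in this diff defines no schema yet.
type UnitOutcome = "ok" | "failed";
type TrustStatus = "trusted" | "untrusted" | "blocked_by_upstream";

interface LedgerUnit {
  unit: string; // e.g. "adapter-selection", "context-observations"
  outcome: UnitOutcome;
}

interface LedgerProjectionEntry {
  unit: string;
  trust: TrustStatus;
  blockedBy?: string; // first upstream unit that failed, if any
}

// Assumes units are listed in execution order and each depends on all
// earlier ones. A failed unit is untrusted; everything after it is blocked
// regardless of its own outcome, so the failure cannot be masked downstream.
function projectLedger(units: LedgerUnit[]): LedgerProjectionEntry[] {
  let firstFailure: string | undefined;
  const projection: LedgerProjectionEntry[] = [];
  for (const u of units) {
    if (firstFailure !== undefined) {
      projection.push({ unit: u.unit, trust: "blocked_by_upstream", blockedBy: firstFailure });
    } else if (u.outcome === "failed") {
      firstFailure = u.unit;
      projection.push({ unit: u.unit, trust: "untrusted" });
    } else {
      projection.push({ unit: u.unit, trust: "trusted" });
    }
  }
  return projection;
}
```

A real implementation would likely need an explicit dependency graph rather than list order; the linear form is used here only to keep the trust propagation visible.
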