onto-mcp 0.3.1 → 0.3.2

Files changed (52)
  1. package/.onto/authority/core-lexicon.yaml +1 -0
  2. package/.onto/domains/software-engineering/competency_qs.md +192 -63
  3. package/.onto/domains/software-engineering/concepts.md +67 -5
  4. package/.onto/domains/software-engineering/conciseness_rules.md +22 -2
  5. package/.onto/domains/software-engineering/dependency_rules.md +78 -8
  6. package/.onto/domains/software-engineering/domain_scope.md +181 -150
  7. package/.onto/domains/software-engineering/extension_cases.md +318 -542
  8. package/.onto/domains/software-engineering/logic_rules.md +75 -3
  9. package/.onto/domains/software-engineering/problem_framing_profile.md +29 -2
  10. package/.onto/domains/software-engineering/prompt_interface.md +122 -0
  11. package/.onto/domains/software-engineering/structure_spec.md +53 -4
  12. package/.onto/principles/llm-native-development-guideline.md +20 -0
  13. package/.onto/principles/productization-charter.md +6 -0
  14. package/.onto/processes/reconstruct/reconstruct-boundary-contract.md +278 -91
  15. package/.onto/processes/reconstruct/reconstruct-execution-ux-contract.md +45 -12
  16. package/.onto/processes/reconstruct/source-profile-contract.md +39 -6
  17. package/.onto/processes/reconstruct/top-level-concept-discovery-contract.md +387 -0
  18. package/.onto/processes/review/lens-registry.md +16 -0
  19. package/.onto/processes/shared/target-material-kind-contract.md +18 -2
  20. package/.onto/roles/axiology.md +7 -2
  21. package/AGENTS.md +3 -2
  22. package/README.md +39 -33
  23. package/dist/core-api/reconstruct-api.js +22 -5
  24. package/dist/core-api/review-api.js +1288 -533
  25. package/dist/core-runtime/cli/mock-review-unit-executor.js +17 -0
  26. package/dist/core-runtime/cli/review-invoke.js +23 -48
  27. package/dist/core-runtime/cli/run-review-prompt-execution.js +122 -0
  28. package/dist/core-runtime/path-boundary.js +58 -0
  29. package/dist/core-runtime/reconstruct/artifact-types.js +5 -0
  30. package/dist/core-runtime/reconstruct/materialize-preparation.js +54 -4
  31. package/dist/core-runtime/reconstruct/pipeline-execution-ledger.js +38 -2
  32. package/dist/core-runtime/reconstruct/post-seed-validation.js +13 -0
  33. package/dist/core-runtime/reconstruct/record.js +11 -0
  34. package/dist/core-runtime/reconstruct/run.js +1133 -26
  35. package/dist/core-runtime/reconstruct/seed-candidate-validation.js +29 -0
  36. package/dist/core-runtime/review/execution-plan-boundary.js +123 -0
  37. package/dist/core-runtime/review/materializers.js +8 -3
  38. package/dist/core-runtime/review/review-artifact-utils.js +15 -2
  39. package/dist/core-runtime/review/review-invocation-runner.js +604 -0
  40. package/dist/core-runtime/target-material-kind.js +43 -5
  41. package/dist/mcp/server.js +158 -39
  42. package/dist/mcp/tool-schemas.js +22 -2
  43. package/package.json +3 -1
  44. package/.onto/domains/llm-native-development/competency_qs.md +0 -430
  45. package/.onto/domains/llm-native-development/concepts.md +0 -242
  46. package/.onto/domains/llm-native-development/conciseness_rules.md +0 -163
  47. package/.onto/domains/llm-native-development/dependency_rules.md +0 -216
  48. package/.onto/domains/llm-native-development/domain_scope.md +0 -197
  49. package/.onto/domains/llm-native-development/extension_cases.md +0 -474
  50. package/.onto/domains/llm-native-development/logic_rules.md +0 -123
  51. package/.onto/domains/llm-native-development/prompt_interface.md +0 -49
  52. package/.onto/domains/llm-native-development/structure_spec.md +0 -245
@@ -1,6 +1,6 @@
  ---
- version: 3
- last_updated: "2026-03-31"
+ version: 7
+ last_updated: "2026-05-28"
  source: bundled-domain-baseline
  status: established
  ---
@@ -125,6 +125,14 @@ See concepts.md §Type System Terms for definitions of structural and nominal ty
125
125
  - **Error middleware (Express, Koa, ASP.NET)**: Centralized error handling in request pipeline. Transforms internal errors into user-facing responses. Must not swallow errors silently
126
126
  - **Anti-pattern**: catch-and-ignore (`catch (e) {}`) creates silent failures. Every catch block must log, re-throw, or return an error value
127
127
 
128
+ ### LLM-Native Failure Posture
129
+
130
+ - In LLM-native development, review, and authority-update paths, the default posture is **fail-loud**: a malformed model output, missing context source, invalid tool result, schema mismatch, provider preflight failure, unbudgeted truncation, or absent artifact ref must halt or surface a structured diagnostic naming the failing boundary
131
+ - Silent fallback is a defect when it prevents a maintainer from identifying whether the failure originated in prompt design, context assembly, retrieval, model/provider behavior, tool schema, runtime validation, or artifact persistence
132
+ - Graceful degradation is allowed only as explicit product behavior. It must declare trigger condition, degraded output semantics, lost capability, user/operator-visible marker, diagnostic artifact, and recovery/escalation path
133
+ - Output repair must not erase evidence of the original failure. If runtime repairs malformed LLM output for a non-authority user flow, the repair must record the original validation failure and repaired fields. For review/authority paths, repaired output cannot become trusted output unless the contract explicitly permits that repair
134
+ - Fail-loud and fail-close are complementary: fail-close blocks untrusted output from being accepted, while fail-loud preserves enough diagnostic context to fix the failing source quickly
135
+
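The fail-loud posture above can be illustrated with a minimal sketch. It assumes a hypothetical output contract (`verdict`, `findings`) and a hypothetical `BoundaryFailure` exception; neither name comes from the package. The point is only that a malformed model output stops processing and names the boundary that rejected it.

```python
import json

# Hypothetical output contract, used only for illustration.
REQUIRED_FIELDS = {"verdict": str, "findings": list}


class BoundaryFailure(Exception):
    """Structured diagnostic that names the failing boundary."""

    def __init__(self, boundary, reason, evidence=None):
        super().__init__(f"[{boundary}] {reason}")
        self.boundary = boundary
        self.reason = reason
        self.evidence = evidence or {}

    def to_artifact(self):
        # Plain dict so the diagnostic can be persisted next to the run.
        return {"boundary": self.boundary, "reason": self.reason,
                "evidence": self.evidence}


def parse_model_output(raw):
    """Parse and validate raw model text; raise instead of repairing silently."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise BoundaryFailure("model_output_parse", "output is not valid JSON",
                              {"raw_prefix": raw[:80], "error": str(exc)}) from exc
    if not isinstance(data, dict):
        raise BoundaryFailure("schema_validation",
                              "top-level value is not an object",
                              {"type": type(data).__name__})
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise BoundaryFailure(
                "schema_validation",
                f"field '{name}' is missing or not {expected.__name__}",
                {"raw_prefix": raw[:80]})
    return data
```

A caller would catch `BoundaryFailure` at the edge of the run and write `to_artifact()` to a diagnostic file rather than substituting a default value.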
128
136
  ### User-Facing Error Requirements
129
137
 
130
138
  - User-facing errors must include: (1) what went wrong (human-readable), (2) what the user can do (recommended action), (3) correlation ID (for support/debugging)
@@ -209,7 +217,7 @@ See concepts.md §Type System Terms for definitions of structural and nominal ty
  ### Load Testing Criteria
 
  - For quantitative load testing thresholds (P99 > 1s triggers review), see structure_spec.md §Quantitative Thresholds (SSOT for performance thresholds)
- - For SLI/SLO definitions, see domain_scope.md §Operations, Deployment & Maintenance (SSOT for service level targets)
+ - For SLI/SLO definitions, see domain_scope.md §Major Sub-areas Operations and maintenance
  - This section owns only the caching, query optimization, and indexing rules; structure_spec.md owns the thresholds and domain_scope.md owns the SLO framework
 
  ## Dependency Logic
@@ -231,10 +239,74 @@ See dependency_rules.md for all dependency-related rules.
231
239
  - In batch processing, placing transaction boundaries at the Phase level and performing consistency verification at each Phase completion is a practical pattern
232
240
  - For backward compatibility classification (breaking vs non-breaking changes), see dependency_rules.md §Breaking vs Non-breaking Changes Classification
233
241
 
242
+ ## LLM Boundary Logic
243
+
244
+ ### Ownership Split Rules
245
+
246
+ - Use concepts.md §LLM-Native Engineering Terms for the domain projection of LLM Boundary, Runtime Boundary, Middleware Boundary, and Ownership Non-Interference. This section owns operational prohibitions and contradiction rules, not the base concept definitions
247
+ - LLM semantic output can become evidence or a reasoning source only with declared provenance and trust status
248
+ - Runtime must not produce new semantic meaning, infer evidence sufficiency, or silently repair semantic drift while presenting the result as original LLM judgment
249
+ - Middleware must not reinterpret, upgrade, downgrade, or repair meaning, and must not become a hidden policy owner or second source of truth
250
+ - The LLM must not own persistence, final authority-seat assembly, authorization, cost control, deterministic state transition, or idempotency unless the contract explicitly frames its output as non-authoritative semantic input to runtime-owned gates
251
+ - Boundary crossing must be declared with owner, input/output shape, enforcement profile, artifact or trust status, and diagnostic behavior. Undeclared crossing is a design defect because it makes failures and authority conflicts hard to localize
252
+
253
+ ### Model and Provider Rules
254
+
255
+ - Model provider, model id, model version, auth mode, and route realization are external dependency facts. Production and review records must preserve the values needed to reproduce or explain model behavior
256
+ - Model version aliases are acceptable for exploration but not for reproducibility claims. When aliases are used, the run artifact must record that behavior may drift
257
+ - Model routing must declare the routing condition, selected model capability requirement, fallback behavior, and diagnostic behavior when no route qualifies
258
+
259
+ ### Prompt and Context Rules
260
+
261
+ - Prompt templates, tool schemas, agent instructions, retrieval policy, and context assembly rules are behavior-affecting artifacts. Changes to them require the same review discipline as code/config changes
262
+ - Structured model output must be validated before consumption. A schema instruction in a prompt is not a validation guarantee
263
+ - Token budget must be checked before model invocation when truncation would remove instructions, evidence, or output schema requirements. Silent truncation is a contract failure
264
+ - Retrieved context used as evidence must carry source provenance. If provenance is missing, the output may still be useful as a draft but cannot be treated as evidence-backed
265
+
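The token-budget rule above can be sketched as a pre-dispatch check. This is an illustration under stated assumptions: `estimate_tokens` is a crude four-characters-per-token stand-in for a real tokenizer, and `check_budget` and `TokenBudgetExceeded` are hypothetical names, not part of the package.

```python
class TokenBudgetExceeded(Exception):
    """Raised before model invocation when the prompt would be truncated."""


def estimate_tokens(text):
    # Assumption: roughly 4 characters per token. Replace with the
    # provider's tokenizer for real budgets.
    return (len(text) + 3) // 4


def check_budget(parts, context_window, reserved_for_output):
    """Return the remaining token headroom, or raise if the prompt does not fit.

    parts maps a label (instructions, evidence, schema, ...) to its text so
    the diagnostic can show which part consumed the budget.
    """
    used = {name: estimate_tokens(text) for name, text in parts.items()}
    total = sum(used.values())
    available = context_window - reserved_for_output
    if total > available:
        raise TokenBudgetExceeded(
            f"prompt needs {total} tokens but only {available} are available; "
            f"per-part usage: {used}")
    return available - total
```

Raising here, instead of letting the provider truncate, keeps the failure attributable to context assembly rather than to model behavior.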
266
+ ### Output Zero-Trust Rules
267
+
268
+ - LLM output is untrusted until runtime validates it for the downstream sink. Schema validity only proves shape; it does not prove shell safety, SQL safety, HTML safety, path safety, email safety, authorization, provenance, or semantic truth
269
+ - A prompt instruction such as "return valid JSON" or "do not call unsafe tools" must not be treated as a security, authorization, or sink-validation guarantee
270
+ - If LLM output can influence a command, database query, file path, rendered HTML, API request, email, authority artifact, or user-visible decision, the sink must have explicit validation/encoding/authorization before use
271
+ - Runtime may classify trust status but must not silently upgrade untrusted output into trusted output. Trust upgrades require a declared gate, evidence, and artifact or audit record
272
+
273
+ ### External Content and Prompt Injection Rules
274
+
275
+ - External content entering a prompt through user input, webpages, files, email, documents, tool output, or retrieval is data, not instruction authority
276
+ - Instruction hierarchy must be enforced by runtime/context assembly. A system cannot rely on the model alone to ignore hostile or hidden instructions in external content
277
+ - External content must not grant tool permission, override role instructions, change output authority, request secret disclosure, or authorize exfiltration sinks unless a runtime-owned policy explicitly allows it
278
+ - If external content can influence tool calls or authority artifacts, the design must include a prompt-injection test or red-team scenario that proves the boundary is enforceable
279
+
280
+ ### Retrieval and Vector Rules
281
+
282
+ - Retrieval relevance does not imply evidence authority. Retrieved material becomes evidence only when source identity, permission scope, retrieval path, and provenance are recorded
283
+ - RAG must apply permission filtering before content enters model context. Filtering after generation is too late for tenant/user boundary protection
284
+ - Ingestion must validate source trust, document lifecycle, metadata, and poisoning risk before indexing. A poisoned corpus is a behavior dependency, not merely bad data
285
+ - Embedding model, chunking strategy, preprocessing, index version, and retrieval/reranking policy are coupled. Changing one requires compatibility analysis and an eval or migration path
286
+ - Retrieved text containing instructions must remain quoted or otherwise bounded as source content. It must not become a higher-priority prompt instruction
287
+
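A minimal sketch of the pre-context permission filter described above. The `Chunk` shape, the `tenant_id` field, and `build_context` are assumptions made for illustration. Chunks without provenance are excluded from the evidence context here, which is one possible reading of the rule that such material cannot be treated as evidence-backed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Chunk:
    text: str
    source_ref: Optional[str]  # provenance pointer, None when unknown
    tenant_id: str             # hypothetical permission scope


def build_context(chunks: List[Chunk], tenant_id: str) -> Tuple[str, list]:
    """Filter BEFORE anything enters model context; report what was dropped."""
    allowed, excluded = [], []
    for chunk in chunks:
        if chunk.tenant_id != tenant_id:
            excluded.append((chunk.source_ref, "tenant_mismatch"))
        elif not chunk.source_ref:
            excluded.append((chunk.source_ref, "missing_provenance"))
        else:
            allowed.append(chunk)
    # Each retained chunk stays bounded and labeled with its source.
    context = "\n".join(f"[{c.source_ref}] {c.text}" for c in allowed)
    return context, excluded
```

The returned `excluded` list doubles as a retrieval audit record, so a dropped chunk is visible rather than silently missing.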
288
+ ### Agent and Tool Rules
289
+
290
+ - Agent tool schemas must be self-describing enough for an agent to determine applicability without hidden documentation
291
+ - Agent loops must define termination conditions, timeout behavior, and progress persistence. A loop without a termination condition is an unbounded runtime risk
292
+ - Tool call results must be validated before they influence further reasoning. Tool failure, partial output, or unavailable capability must be represented explicitly in agent state
293
+ - Context-isolated reasoning units must receive contracted input and produce contracted output. They must not depend on hidden main-context state for correctness
294
+ - Agent functionality, permission, and autonomy must be minimized separately. A tool being technically available does not mean the agent is authorized to use it, and authorization does not mean it may act without human approval
295
+ - High-impact tool actions require a human approval gate or a documented risk acceptance. High-impact includes irreversible writes, external communication, payment, credential use, data deletion, deployment, security policy change, and user-affecting authority artifacts
296
+ - Agent retries must be bounded by retry safety. Retrying a non-idempotent or high-impact action without idempotency or approval is a logic defect
297
+
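The retry-safety rule above can be sketched as follows. `call_with_retry` and its `idempotent` flag are hypothetical; the sketch assumes the tool definition already declares whether an action is safe to repeat, and it does not model the human approval gate.

```python
def call_with_retry(action, *, idempotent, max_attempts=3):
    """Run action(), retrying only when the action is declared idempotent.

    A non-idempotent action gets exactly one attempt; its failure is
    surfaced instead of being retried.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempts_allowed = max_attempts if idempotent else 1
    last_error = None
    for _ in range(attempts_allowed):
        try:
            return action()
        except Exception as exc:  # a real runtime would catch narrower errors
            last_error = exc
    raise RuntimeError(
        f"action failed after {attempts_allowed} attempt(s)") from last_error
```

The loop is bounded in both cases, so a failing tool cannot turn into an unbounded agent loop.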
298
+ ### AI Governance and Risk Rules
299
+
300
+ - AI behavior that can materially affect users, operators, security, privacy, release decisions, or authority artifacts must have an AI risk owner or explicit non-applicability rationale
301
+ - A release gate that claims AI quality must distinguish deterministic correctness, schema validity, semantic evaluation, safety/security testing, and production drift monitoring
302
+ - Red-team findings, incidents, semantic eval failures, and drift signals must feed back into prompts, policies, tests, controls, or release gates. Treating them as disconnected observations breaks continuous improvement
303
+ - AI incident disclosure must be defined when AI behavior can mislead users, expose data, trigger unsafe action, or break a trust claim. Hidden incident handling is not compatible with artifact truth
304
+
234
305
  ## Related Documents
235
306
  - concepts.md — term definitions for type system, constraint design, concurrency, security, and testing
236
307
  - dependency_rules.md — dependency direction rules, API dependency management, build/package dependency rules
237
308
  - structure_spec.md — module structure, layer structure principles, verification structure
309
+ - prompt_interface.md — prompt, role, tool, response format, and context interface criteria
238
310
  - competency_qs.md — CQ-T-01~CQ-T-08 (Types and Constraints verification questions)
239
311
  - competency_qs.md — CQ-E-01~CQ-E-08 (Error Handling verification questions)
240
312
  - competency_qs.md — CQ-P-01~CQ-P-04 (Performance verification questions)
@@ -1,6 +1,6 @@
  ---
- version: 1
- last_updated: "2026-05-21"
+ version: 3
+ last_updated: "2026-05-27"
  source: issue-stance-deliberation-contract
  status: design_target
  doc_type: custom:problem_framing_profile
@@ -29,6 +29,14 @@ Required when an issue affects a concrete software artifact, runtime path, or de
  | `test_verification` | tests, conformance checks, smoke checks, validation harness |
  | `authority_docs` | `.onto` authority/process/principle docs used by runtime or agents |
  | `developer_experience` | setup, commands, diagnostics, handoff ergonomics |
+ | `llm_agent_workflow` | LLM/agent orchestration, prompt/context assembly, tool use, or multi-agent coordination |
+ | `model_provider_boundary` | model/provider/version/auth/routing dependency boundary |
+ | `semantic_evaluation` | rubric, eval set, AI-as-judge, human review, or quality baseline |
+ | `failure_diagnostics` | fail-loud diagnostics, structured failure artifacts, observability, or degraded-state surfacing |
+ | `output_sink_boundary` | shell, SQL, HTML, file, email, API/tool, or authority-artifact sink that consumes generated or external output |
+ | `rag_retrieval_boundary` | retrieval, embedding index, corpus, permission filter, source validation, or retrieval audit |
+ | `ai_governance` | AI risk owner, approval gate, human oversight, incident disclosure, red-team/eval loop, or governance evidence |
+ | `provenance_artifact` | source refs, builder/agent, input set, transformation path, verification state, or generated-artifact trust status |
  | `future_work` | reconstruct, evolve, learn, govern, or later product area |
 
  ### defect_kind
@@ -44,6 +52,15 @@ Required when the issue can be expressed as a software-development problem type.
  | `integration_failure` | independently valid parts do not compose into the intended path |
  | `verification_gap` | implementation or contract lacks a reliable check |
  | `observability_gap` | failure or state cannot be inspected well enough to operate or debug |
+ | `silent_degradation` | fallback, repair, or graceful degradation hides the origin, trust loss, or incomplete behavior |
+ | `semantic_quality_gap` | route/schema succeeds but usefulness, faithfulness, or output quality is unproven or degraded |
+ | `output_trust_gap` | LLM/generated output reaches a downstream sink without sink-specific validation, encoding, authorization, provenance, or trust classification |
+ | `prompt_injection_boundary_gap` | external content can override role, tool, permission, output, disclosure, or authority rules |
+ | `rag_permission_gap` | retrieved material can cross permission, tenant, source-trust, or provenance boundaries before context injection |
+ | `agency_overreach` | agent functionality, permission, or autonomy is broader than the task/risk justifies |
+ | `provenance_gap` | authority-affecting generated/retrieved artifacts cannot be traced to source, builder, inputs, transformation path, and verification state |
+ | `governance_gap` | material AI risk lacks owner, approval/acceptance gate, human oversight, incident path, or improvement loop |
+ | `value_tradeoff_gap` | a local optimization hides or distorts stakeholder value, user/operator agency, accessibility, diagnosability, accountability, or artifact truth |
  | `quality_debt` | issue increases maintenance, drift, or coordination cost without immediate breakage |
  | `implementation_task` | design is sufficiently closed and can move to build work |
 
@@ -56,8 +73,15 @@ Optional. Use when the next useful evidence path matters to closure.
  | `schema_validation` | parser or schema check should validate the artifact shape |
  | `unit_test` | focused behavior test should cover the issue |
  | `integration_smoke` | end-to-end or cross-module smoke check is needed |
+ | `semantic_eval` | rubric/golden-set or pairwise model/agent output evidence is needed |
+ | `failure_artifact_smoke` | a fail-loud or degraded-state artifact should prove the failure remains diagnosable |
  | `package_install_smoke` | packaged install or executable path must be verified |
  | `provider_conformance` | provider-specific behavior needs a conformance check |
+ | `sink_validation_smoke` | downstream sink validation/encoding/authorization should be exercised with generated or hostile input |
+ | `prompt_injection_redteam` | hostile external-content scenario should verify instruction hierarchy and exfiltration boundaries |
+ | `rag_permission_smoke` | retrieval should prove permission filtering, source provenance, poisoning control, and audit refs |
+ | `provenance_audit` | artifact or claim provenance should be traced through source, builder/agent, input set, transformation, and verification state |
+ | `governance_review` | risk owner, risk treatment, approval gate, incident path, and continuous-improvement loop should be reviewed |
  | `human_design_decision` | maintainer/user decision is the next verification gate |
 
  ## Rules
@@ -66,3 +90,6 @@ Optional. Use when the next useful evidence path matters to closure.
66
90
  2. `stale_authority_text` must be paired with `implementation_surface` or an explicit rationale explaining that runtime behavior is unaffected.
67
91
  3. `implementation_task` is not a fix proposal; it means the issue is framed well enough to become implementation input.
68
92
  4. `future_work` should be used when an issue belongs to reconstruct, evolve, learn, govern, or another planned capability rather than the current review path.
93
+ 5. `value_tradeoff_gap` must explain which value commitment is affected: diagnosability, artifact truth, accountability, evidence, explicit loss, least agency, governance, accessibility, or user/operator agency.
94
+ 6. `governance_gap` should not be downgraded to documentation-only when AI behavior affects release, security/privacy, user decisions, or authority artifacts.
95
+ 7. `output_trust_gap`, `prompt_injection_boundary_gap`, and `rag_permission_gap` should prefer fail-close plus fail-loud closure unless the product explicitly accepts visible degraded behavior.
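Rule 5 above lends itself to a mechanical check. The sketch below assumes a framing record is a plain dict with a `defect_kind` key and a hypothetical `value_commitment` key; the profile text shown in this diff does not name the field that carries the explanation, so that key is an assumption. The allowed values are copied from rule 5.

```python
# Value commitments listed in rule 5 of the profile.
VALUE_COMMITMENTS = {
    "diagnosability", "artifact truth", "accountability", "evidence",
    "explicit loss", "least agency", "governance", "accessibility",
    "user/operator agency",
}


def check_value_tradeoff(framing):
    """Return a list of problems; an empty list means rule 5 is satisfied."""
    problems = []
    if framing.get("defect_kind") == "value_tradeoff_gap":
        named = framing.get("value_commitment")  # hypothetical field name
        if named not in VALUE_COMMITMENTS:
            problems.append(
                "value_tradeoff_gap must name the affected value commitment; "
                f"got {named!r}")
    return problems
```

Returning a list of problems rather than a boolean keeps the result diagnosable when several rules are checked together.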
@@ -0,0 +1,122 @@
1
+ ---
2
+ version: 4
3
+ last_updated: "2026-05-28"
4
+ source: merged-from-llm-native-development
5
+ status: established
6
+ ---
7
+
8
+ # Software Engineering Domain — Prompt and Agent Interface Criteria
9
+
10
+ This document defines design criteria for prompts, role instructions, tool schemas, response formats, and agent handoffs used inside software-engineering workflows.
11
+
12
+ ## System Prompt Structure
13
+
14
+ - System prompts must separate stable role/rules from dynamic task input.
15
+ - Prompts that affect behavior must be versioned, reviewable, and tied to an artifact seat.
16
+ - Context loading must be explicit: either the agent reads declared files/resources, or runtime injects a bounded context bundle.
17
+ - Historical notes, deprecated behavior, and migration rationale must not be loaded into active execution context unless the task specifically needs that history.
18
+
19
+ ## Role Definition Structure
20
+
21
+ Role definitions for LLM workers or agents must include:
22
+
23
+ - purpose and non-purpose
24
+ - inputs the role is allowed to use
25
+ - outputs the role must produce
26
+ - tools/resources available to the role
27
+ - forbidden actions
28
+ - failure behavior and escalation path
29
+
30
+ ## Ownership Boundary Structure
31
+
32
+ Prompt, role, handoff, and tool interfaces must reference the LLM/runtime/middleware boundary vocabulary in `concepts.md` and state the task-specific ownership split when more than one layer participates in behavior. This section owns interface obligations, not the canonical boundary definitions.
33
+
34
+ An interface that crosses the boundary must declare:
35
+
36
+ - the semantic work delegated to the LLM role
37
+ - the deterministic gates and authority seats owned by runtime
38
+ - the transport/adaptation work owned by middleware
39
+ - accepted input and output shape for each boundary crossing
40
+ - enforcement profile, trust/artifact status, and diagnostic behavior
41
+ - forbidden crossovers, especially semantic repair by runtime/middleware and LLM bypass of runtime-owned validation, persistence, authority assembly, authorization, idempotency, or cost/security gates
42
+
43
+ ## Tool Definition Structure
44
+
45
+ Every tool exposed to an agent must include:
46
+
47
+ - name and concise purpose
48
+ - parameter schema with required/optional fields
49
+ - result shape and trust status
50
+ - failure modes and retry safety
51
+ - permission or side-effect boundary
52
+ - examples only when they reduce ambiguity
53
+
54
+ Tool definitions with overlapping capability must include routing guidance or be consolidated.
55
+
56
+ Tool definitions for high-impact actions must additionally include:
57
+
58
+ - required human approval condition
59
+ - idempotency or rollback expectation
60
+ - audit artifact emitted on success/failure
61
+ - forbidden use cases
62
+ - sensitive input/output handling
63
+
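The two checklists above can be expressed as a small completeness check. `ToolDefinition` and `missing_fields` are hypothetical names, and the field set is a reduced illustration of the listed items rather than the package's actual tool schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolDefinition:
    name: str
    purpose: str
    parameters: dict                 # parameter schema, required/optional
    result_shape: str
    side_effects: str                # e.g. "none", "read", "write", "external"
    retry_safe: bool
    failure_modes: List[str] = field(default_factory=list)
    high_impact: bool = False
    approval_condition: str = ""     # required only for high-impact tools
    audit_artifact: str = ""         # required only for high-impact tools


def missing_fields(tool: ToolDefinition) -> List[str]:
    """List the checklist items a tool definition leaves empty."""
    missing = [n for n in ("name", "purpose", "result_shape", "side_effects")
               if not getattr(tool, n)]
    if not tool.failure_modes:
        missing.append("failure_modes")
    if tool.high_impact:
        missing += [n for n in ("approval_condition", "audit_artifact")
                    if not getattr(tool, n)]
    return missing
```

A registry could refuse to expose any tool for which `missing_fields` returns a non-empty list.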
64
+ ## Response Format Constraints
65
+
66
+ - Structured output must be validated by runtime before consumption.
67
+ - Format instructions in a prompt do not replace schema validation.
68
+ - When output becomes an authority artifact, malformed output must fail-close and fail-loud unless a documented repair rule exists.
69
+ - If a response is degraded, partial, or draft-only, that status must be visible in the output and artifact metadata.
70
+
71
+ ## Output Sink Constraints
72
+
73
+ Prompt and response contracts must name any downstream sink that will consume model output.
74
+
75
+ | Sink | Required runtime gate |
76
+ |---|---|
77
+ | Shell/CLI | command allowlist or parser, argument escaping, approval for destructive actions |
78
+ | SQL/database | parameterization, authorization, transaction/idempotency handling |
79
+ | HTML/Markdown/user display | output encoding/sanitization, trust/status markers where needed |
80
+ | File path/filesystem | path normalization, root-boundary validation, overwrite/destructive-action policy |
81
+ | Email/chat/external message | recipient authorization, disclosure policy, approval for sensitive/high-impact content |
82
+ | API/tool call | schema validation, permission check, side-effect classification |
83
+ | Authority artifact | schema validation, provenance, trust status, deterministic assembly gate |
84
+
85
+ If no sink is known at prompt time, the output must be treated as draft/untrusted until the sink is declared and validated.
86
+
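As one instance of the sink gates in the table above, the file path row can be sketched with the standard library. `resolve_inside_root` is a hypothetical helper; the overwrite and destructive-action policy from the same row is out of scope here.

```python
from pathlib import Path


def resolve_inside_root(root, candidate):
    """Normalize a model-supplied path and reject anything outside root."""
    root_path = Path(root).resolve()
    # Joining an absolute candidate discards root, so it is rejected below.
    target = (root_path / candidate).resolve()
    if target != root_path and root_path not in target.parents:
        raise ValueError(f"path escapes root boundary: {candidate!r}")
    return target
```

The check runs on the resolved path, so `..` segments and symlinks are evaluated before the comparison rather than after.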
87
+ ## Context Window Utilization
88
+
89
+ - Static prompt material should be small enough to leave room for user input, retrieved context, and output.
90
+ - Token budget should be checked before dispatch when truncation would remove instructions, evidence, or schemas.
91
+ - Retrieved context used as evidence must carry provenance.
92
+ - Critical instructions and output schemas should be placed where the model is least likely to lose them under long context.
93
+
94
+ ## External Content Handling
95
+
96
+ - User input, webpage text, file contents, retrieved snippets, email bodies, logs, and tool output must be framed as data unless a runtime-owned policy explicitly grants instruction authority.
97
+ - Prompts should label untrusted external content and instruct the model not to treat it as role, tool, permission, or output-format authority.
98
+ - Runtime/context assembly must preserve source refs and permission scope for external content used as evidence.
99
+ - Hidden instructions found in external content are a prompt-injection case, not a valid override.
100
+
101
+ ## Agent Permission and Autonomy
102
+
103
+ Agent-facing instructions must distinguish:
104
+
105
+ - functionality: what the tool/runtime can do
106
+ - permission: what the agent is authorized to do in this task/user/tenant scope
107
+ - autonomy: what the agent may do without human approval
108
+
109
+ An agent prompt that says "use tools as needed" is under-specified unless tool permission, autonomy, retry safety, and high-impact approval boundaries are declared elsewhere in the contracted input.
110
+
111
+ ## Fail-Loud Interface Rule
112
+
113
+ For development, review, and authority-update paths, an interface that cannot provide the required prompt, context, tool, model, or output contract should stop with a diagnostic artifact. Silent fallback is more costly than visible failure because it hides the failing boundary and forces later exploration.
114
+
115
+ Graceful degradation is allowed for user-facing product behavior only when the reduced capability, cause, diagnostic reference, and recovery path are explicit.
116
+
117
+ ## Related Documents
118
+
119
+ - concepts.md — LLM-native engineering terms
120
+ - logic_rules.md — LLM boundary logic and failure posture
121
+ - structure_spec.md — LLM-native system structure
122
+ - competency_qs.md — CQ-A questions for AI agent and LLM-native collaboration
@@ -1,6 +1,6 @@
  ---
- version: 2
- last_updated: "2026-03-30"
+ version: 6
+ last_updated: "2026-05-28"
  source: bundled-domain-baseline
  status: established
  ---
@@ -45,7 +45,7 @@ Classification axis: **structural component** — specifications classified by t
  ## Required Relationships
 
  - See §Golden Relationships for module-interface, test-code, and config-code coherence rules.
- - All external dependencies (libraries, APIs) must be abstracted via interfaces for replaceability
+ - External dependencies must follow dependency_rules.md. Use owned interfaces, ports, adapters, or anti-corruption layers when replacement, testing, security, policy isolation, or model translation matters. Direct coupling to a stable or low-risk dependency may be accepted with an explicit tradeoff rationale
  - When structural verification (code) and execution procedures (protocol) are in separate documents, the linking reference must be back-referenced in the protocol document for enforcement to be complete
 
  ## Golden Relationships
@@ -60,7 +60,12 @@ Golden relationships are cross-component validation rules. Each rule connects tw
 
  ## Layer Structure Principles
 
- Layer dependency direction rules are defined in dependency_rules.md §Direction Rules. The key principle: upper layers depend on lower layers, never the reverse.
+ Layer dependency direction rules are defined in dependency_rules.md §Direction Rules. There is no single global "upper -> lower" rule:
+
+ - Conventional layered architecture may allow presentation/application layers to depend on lower service/data-access layers.
+ - Clean and Hexagonal architectures constrain source-code dependencies to point inward or toward ports/abstractions.
+ - Runtime call/data-flow direction is separate from source-code dependency direction.
+ - Reviews must apply the direction rule for the declared architecture pattern and dependency kind.
 
  ## Authority and Layer Separation
 
@@ -137,9 +142,52 @@ These thresholds are structural health indicators derived from industry practice
137
142
  | Test coverage (line) | < 60% | Critical verification gap | Immediate action required |
138
143
  | API response time | P99 > 1s | Performance degradation | Performance review and optimization |
139
144
  | Class inheritance depth | > 5 levels | Inheritance hierarchy is too deep | Prefer composition over inheritance |
145
+ | Agent tool count | > 20 tools per agent | Tool selection quality drops and routing becomes non-deterministic | Split tools, add routing, or narrow the agent role |
146
+ | Prompt template length | > 25% of target context window | User input/retrieved evidence/output schema may be squeezed or truncated | Refactor prompt, move stable material to refs, or choose a larger context model |
140
147
 
141
148
  Cross-reference: logic_rules.md 'Testing Logic' (test boundary rules inform coverage measurement strategy).
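The two thresholds added to the table above (more than 20 tools per agent, prompt template over 25% of the target context window) reduce to simple arithmetic. A minimal sketch, with a hypothetical function name and warning labels:

```python
def structural_warnings(tool_count, prompt_tokens, context_window):
    """Return threshold labels that the given agent configuration exceeds."""
    warnings = []
    if tool_count > 20:
        warnings.append("agent_tool_count")
    if prompt_tokens > 0.25 * context_window:
        warnings.append("prompt_template_length")
    return warnings
```

For an 8000-token window the prompt threshold sits at 2000 tokens, so a 2001-token template is flagged and a 2000-token one is not.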
142
149
 
150
+ ## LLM-Native System Structure
151
+
152
+ This section applies when a software system, development workflow, or review workflow depends on LLMs, agents, prompt/context contracts, retrieval, model providers, or tool-call boundaries.
153
+
154
+ ### Required Components
155
+
156
+ | Component | Structure | Failure if Missing |
157
+ |---|---|---|
158
+ | Model connection | Provider/client boundary with model id, auth mode, version, rate-limit handling | The system cannot reproduce or explain model behavior |
159
+ | Prompt/context assembly | Prompt templates, instruction hierarchy, context sources, token budget, output schema | Model input becomes an unreviewable prompt blob |
160
+ | Output validation and sink gates | Schema validation, semantic checks, sink-specific validation/encoding/authorization, trust boundary, failure artifact | Malformed or unsafe output becomes trusted behavior or unsafe downstream input |
161
+ | Evaluation harness | Golden set, rubric, baseline, comparison method | Route success is mistaken for output quality |
162
+ | Observability | Prompt/output/model/tool facts, correlation id, cost, latency, failure reason | Failures become expensive to diagnose |
163
+ | Provenance record | Source refs, builder/agent, inputs, transformation path, verification state, model/provider facts | Generated or retrieved claims become unverifiable authority |
164
+ | Ownership boundary map | LLM semantic delegation, runtime deterministic gates and authority seats, middleware transport/adaptation, trust status, diagnostics | LLM, runtime, or middleware can silently take over another layer's authority |
165
+
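
To make the "Output validation and sink gates" row more concrete, here is a small sketch of a schema gate that rejects malformed model output and emits a failure artifact instead of passing it downstream. The required fields, the `OutputRejected` exception, and the artifact keys are illustrative assumptions, not part of the package.

```python
import json

# Hypothetical output schema: field name -> expected Python type.
REQUIRED_FIELDS = {"verdict": str, "evidence_refs": list}


class OutputRejected(Exception):
    """Raised when model output fails the gate; carries a failure artifact."""

    def __init__(self, artifact: dict):
        super().__init__(artifact["reason"])
        self.artifact = artifact


def validate_model_output(raw_text: str, correlation_id: str) -> dict:
    def reject(reason: str) -> OutputRejected:
        return OutputRejected({
            "stage": "schema_validation",
            "reason": reason,
            "correlation_id": correlation_id,
            "raw_output": raw_text,
        })

    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise reject(f"invalid JSON: {exc.msg}")
    if not isinstance(data, dict):
        raise reject("top-level value is not an object")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected):
            raise reject(f"field '{field}' missing or not {expected.__name__}")
    return data
```

Semantic checks and sink-specific encoding or authorization would sit behind this gate; the sketch covers only the schema step.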
166
+ ### Optional Components Required When Applicable
167
+
168
+ | Component | Required When | Structure |
169
+ |---|---|---|
170
+ | Retrieval/RAG pipeline | External knowledge is selected for model context | ingestion -> processing -> indexing -> retrieval -> reranking/context handoff, with provenance at each stage |
171
+ | RAG permission layer | Retrieved material crosses users, tenants, projects, sensitivity classes, or authority levels | pre-context permission filtering, source validation, poison checks, retrieval audit, redaction/exclusion path |
172
+ | Agent tool registry | An LLM can choose actions or call tools | tool name, purpose, parameter schema, result shape, failure semantics, permission boundary |
173
+ | Agent state/progress | Work spans multiple steps, tools, or sessions | explicit state object or artifact with completed/current/remaining steps and accumulated refs |
174
+ | Multi-agent profile | Multiple reasoning units collaborate | coordination pattern, isolated inputs/outputs, termination conditions, conflict-resolution authority |
175
+ | Safety guardrails | User input, model output, or agent action can cause harm | input guardrail, output guardrail, action permission model, logging, false-positive review path |
176
+ | AI governance record | AI behavior materially affects users, operators, release, security, privacy, or authority artifacts | risk owner, risk treatment, approval gate, human oversight, transparency/audit evidence |
177
+ | Red-team/incident loop | AI behavior can fail semantically, disclose data, mislead users, or trigger unsafe action | test scenario, finding intake, incident disclosure path, remediation owner, updated prompt/policy/eval/release gate |
178
+ | Human approval gate | Agent output or action is high-impact, irreversible, external, privileged, or user-affecting | approver role, approval input, audit record, denial path, idempotency or rollback expectation |
179
+
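
The "Agent tool registry" row lists the fields each tool entry should carry. One possible shape is sketched below; `ToolSpec`, `ToolRegistry`, and the flat name-to-type parameter schema are assumptions made for illustration, not the package's actual registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    purpose: str
    parameter_schema: dict      # parameter name -> expected Python type
    result_shape: str
    failure_semantics: str
    required_permission: str    # the permission boundary for this tool


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, spec: ToolSpec) -> None:
        if spec.name in self._tools:
            raise ValueError(f"duplicate tool: {spec.name}")
        self._tools[spec.name] = spec

    def check_call(self, name: str, args: dict, granted_permissions: set) -> ToolSpec:
        """Validate a proposed tool call before it is dispatched."""
        spec = self._tools.get(name)
        if spec is None:
            raise KeyError(f"unknown tool: {name}")
        if spec.required_permission not in granted_permissions:
            raise PermissionError(f"{name} requires {spec.required_permission}")
        unexpected = set(args) - set(spec.parameter_schema)
        if unexpected:
            raise TypeError(f"{name}: unexpected parameters {sorted(unexpected)}")
        for param, expected in spec.parameter_schema.items():
            if not isinstance(args.get(param), expected):
                raise TypeError(f"{name}: parameter '{param}' must be {expected.__name__}")
        return spec
```

Keeping the permission check separate from the schema check mirrors the table's distinction between what a tool accepts and who may call it.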
180
+ ### Golden Relationships
181
+
182
+ - **Model capability -> prompt/tool requirement**: The chosen model must support the prompt's required capabilities: context length, structured output, tool use, modality, and reasoning level. If not, invocation must fail-loud before dispatch or record a degraded route explicitly
183
+ - **Retrieved context -> evidence claim**: Any evidence-backed claim must trace to retrieved context provenance. Generated text without provenance may be a draft but not evidence
184
+ - **Tool schema -> agent instruction**: Every tool named in agent instructions must exist with a valid schema, and every exposed tool must either be referenced by an agent role or justified as discoverable reserve capacity
185
+ - **Evaluation baseline -> production drift**: Production quality drift detection must compare against an evaluation baseline. Monitoring without a baseline cannot classify quality movement
186
+ - **External content -> model context**: Any external content entering model context must pass through context assembly that preserves instruction hierarchy and treats the content as data, not authority
187
+ - **LLM semantic output -> runtime authority gate -> middleware adapter**: LLM output may provide semantic input, but runtime owns validation, authority-seat assembly, persistence, authorization, idempotency, and cost/security gates. Middleware may adapt envelopes, routes, and observability plumbing, but must not repair meaning, become hidden policy authority, or bypass runtime-owned gates
188
+ - **Agent capability -> permission -> autonomy**: Capability, authorization, and approval are distinct structure seats. A design that exposes a tool without separately declaring permission and autonomy is structurally under-specified
189
+ - **AI risk -> owner -> gate -> feedback loop**: Material AI risk must connect to an owner, approval or acceptance gate, incident/red-team intake, and update path for controls or evals
190
+
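
The first golden relationship (model capability -> prompt/tool requirement) lends itself to a preflight check that runs before dispatch. The sketch below assumes hypothetical `ModelProfile` and `PromptRequirement` records and capability names; it shows only the fail-before-dispatch branch, not a recorded degraded route.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    model_id: str
    context_window_tokens: int
    capabilities: frozenset  # e.g. {"structured_output", "tool_use"}


@dataclass(frozen=True)
class PromptRequirement:
    required_tokens: int
    capabilities: frozenset


class PreflightError(Exception):
    """Raised before dispatch when the model cannot satisfy the prompt."""


def preflight(model: ModelProfile, req: PromptRequirement) -> None:
    problems = []
    if req.required_tokens > model.context_window_tokens:
        problems.append(
            f"needs {req.required_tokens} tokens, window is {model.context_window_tokens}"
        )
    missing = sorted(req.capabilities - model.capabilities)
    if missing:
        problems.append("missing capabilities: " + ", ".join(missing))
    if problems:
        raise PreflightError(f"{model.model_id}: " + "; ".join(problems))
```

Collecting every mismatch before raising keeps the error message useful as a diagnostic, rather than reporting only the first failed requirement.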
143
191
  ## Verification Structure
144
192
 
145
193
  ### Static Analysis Integration
@@ -182,4 +230,5 @@ Cross-reference: logic_rules.md 'Testing Logic' (test boundary rules inform cove
182
230
  - concepts.md — term definitions for module, interface, layer, architecture patterns, etc.
183
231
  - dependency_rules.md — dependency direction and circular dependency rules, build/package dependency management
184
232
  - logic_rules.md — type system logic, constraint design, security logic, testing logic
233
+ - prompt_interface.md — prompt, role, tool, response format, and context interface criteria
185
234
  - competency_qs.md — CQ-S-01~CQ-S-10 (Structural Understanding verification questions)
@@ -45,6 +45,7 @@ If a mixed stage appears, it must be split into the two below.
45
45
  7. Work that cannot be safely automated with a `script` is better owned by the `LLM` than by the runtime.
46
46
  8. The prompt path must be the **reference realization** of the designed process, not a rough version of the design.
47
47
  9. The system under development must remain in a working state at every step.
48
+ 10. In LLM-native development, review, and authority-update paths, **fail-loud** is the default over hidden fallback.
48
49
 
49
50
  ## 3. Do not overstate the runtime's role
50
51
 
@@ -69,6 +70,23 @@ What the runtime must not do:
69
70
  In other words, the runtime is not the layer that produces semantic quality;
70
71
  it is the layer that keeps semantic drift from leaking outside the contract.
71
72
 
73
+ ### 3.1 Fail-loud over silent degradation
74
+
75
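+ In LLM-native development, the traditional "fail-safe" intuition does not always hold.
76
+
77
+ In conventional software, adding a fallback or graceful degradation so that users can keep working often reduces cost. In LLM-native development, review, and authority-update paths, however, silent failure creates a larger cost, because someone has to search again for whether the failure point was the prompt, context assembly, retrieval, the model/provider, the tool schema, or the runtime validator.
78
+
79
+ In this environment, the **cost of locating the cause of a failure** is the bottleneck more often than the cost of coding and maintenance. Because LLMs and agents lower the cost of making a fix, it is usually cheaper to fail loudly where the problem occurred and fix it right away.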
80
+
81
+ The default rules are therefore as follows.
82
+
83
+ - Do not hide malformed LLM output, missing context, schema mismatch, invalid tool result, provider preflight failure, or token budget overflow; record where the failure occurred and why.
84
+ - A fallback must not be an "internal trick to keep running"; it must be a product behavior with a declared trigger, lost capability, trust status, diagnostic artifact, and recovery path.
85
+ - On paths that produce artifact truth, such as review, canonicalization, and authority update, degraded output must not pass as complete output.
86
+ - Graceful degradation can be the default only in user-facing production flows. Even then, partial results, reduced quality, and missing evidence must be visible to the user or operator.
87
+
88
+ `fail-close` is the gate that keeps contract-violating output from entering the trust boundary, and `fail-loud` is the diagnostic posture that exposes why the gate closed so that it can be fixed immediately. The two are not substitutes; they are used together.
89
+
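
The fallback rules in §3.1 can be read as a small acceptance policy. The following sketch assumes hypothetical path names and a `DeclaredFallback` record whose fields mirror the list in the text (trigger, lost capability, trust status, diagnostic artifact, recovery path); it is one way to encode the rule, not the package's runtime.

```python
from dataclasses import dataclass
from typing import Optional

# Paths that produce artifact truth (assumed identifiers).
TRUTH_PATHS = {"review", "canonicalization", "authority_update"}


@dataclass(frozen=True)
class DeclaredFallback:
    trigger: str
    lost_capability: str
    trust_status: str
    diagnostic_artifact: str
    recovery_path: str


class FailLoud(Exception):
    """Raised where degraded output would otherwise pass as complete."""


def accept_result(path: str, output: str,
                  fallback: Optional[DeclaredFallback] = None) -> dict:
    if fallback is None:
        return {"output": output, "status": "complete"}
    if path in TRUTH_PATHS:
        # fail-close: keep the degraded output out; fail-loud: say where and why.
        raise FailLoud(
            f"{path}: degraded output rejected (trigger={fallback.trigger}); "
            f"see {fallback.diagnostic_artifact}"
        )
    # User-facing flow: degradation is allowed but stays visibly marked.
    return {"output": output, "status": "degraded", "fallback": fallback}
```

Because `DeclaredFallback` has no default values, a fallback cannot be constructed without stating all five fields, which is the sense in which it is declared rather than hidden.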
72
90
  ## 4. Decision frame
73
91
 
74
92
  When designing a new task or feature, consider the following three questions in order.
@@ -233,6 +251,7 @@ For LLM features, treat the following assets as equal in standing to code.
233
251
  - retrieval policy
234
252
  - tool use policy
235
253
  - fallback policy
254
+ - fail-loud policy
236
255
  - reviewer workflow
237
256
  - promote / canonicalize criteria
238
257
  - declared boundary policy
@@ -356,6 +375,7 @@ What the ontology must not fix too early:
356
375
  - Trying to prove quality with exact-match tests alone
357
376
  - Starting runtime hardening without evals
358
377
  - Treating expressions of uncertainty or abstention as failure
378
+ - Hiding the failure point behind silent fallback, hidden output repair, or unmarked graceful degradation
359
379
  - Refining only the schema without prompt/context/retrieval experiments
360
380
  - Improving only format stability rather than quality and calling it an "improvement"
361
381
  - Cutting the boundary wrongly so that the runtime takes over the semantic task
@@ -338,6 +338,12 @@ canonical is:
338
338
  `synthesize` is not a new lens;
339
339
  it must synthesize the existing lens results conservatively.
340
340
 
341
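+ `New Perspectives` is not a mechanism for changing the active lens set of the current review run.
342
+ Domain documents and domain concerns do not decide whether a lens is added. A domain supplies its concerns as
343
+ case evidence, CQs, rules, and value commitments, and the existing lenses consume that
344
+ material. Adding, removing, splitting, or merging lenses is not domain work but a change to review process
345
+ governance, and its impact on every domain and runtime artifact must be judged separately.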
346
+
341
347
  ### 10.3 Context-isolated reasoning units
342
348
 
343
349
  → canonical location: `.onto/principles/ontology-as-code-guideline.md` §7 (structural rules) + `.onto/principles/llm-native-development-guideline.md` (design guide)