onto-mcp 0.3.0 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (61)
  1. package/.onto/authority/core-lexicon.yaml +12 -0
  2. package/.onto/domains/software-engineering/competency_qs.md +192 -63
  3. package/.onto/domains/software-engineering/concepts.md +67 -5
  4. package/.onto/domains/software-engineering/conciseness_rules.md +22 -2
  5. package/.onto/domains/software-engineering/dependency_rules.md +78 -8
  6. package/.onto/domains/software-engineering/domain_scope.md +181 -150
  7. package/.onto/domains/software-engineering/extension_cases.md +318 -542
  8. package/.onto/domains/software-engineering/logic_rules.md +75 -3
  9. package/.onto/domains/software-engineering/problem_framing_profile.md +29 -2
  10. package/.onto/domains/software-engineering/prompt_interface.md +122 -0
  11. package/.onto/domains/software-engineering/structure_spec.md +53 -4
  12. package/.onto/principles/llm-native-development-guideline.md +20 -0
  13. package/.onto/principles/productization-charter.md +6 -0
  14. package/.onto/processes/evolve/material-kind-adapter-contract.md +6 -0
  15. package/.onto/processes/reconstruct/reconstruct-boundary-contract.md +468 -81
  16. package/.onto/processes/reconstruct/reconstruct-execution-ux-contract.md +177 -0
  17. package/.onto/processes/reconstruct/source-profile-contract.md +39 -6
  18. package/.onto/processes/reconstruct/top-level-concept-discovery-contract.md +387 -0
  19. package/.onto/processes/review/binding-contract.md +8 -0
  20. package/.onto/processes/review/lens-registry.md +16 -0
  21. package/.onto/processes/review/pre-dispatch-contracts.md +34 -13
  22. package/.onto/processes/review/productized-live-path.md +3 -1
  23. package/.onto/processes/shared/pipeline-execution-ledger-contract.md +185 -0
  24. package/.onto/processes/shared/target-material-kind-contract.md +24 -2
  25. package/.onto/roles/axiology.md +7 -2
  26. package/AGENTS.md +4 -2
  27. package/README.md +52 -29
  28. package/dist/core-api/reconstruct-api.js +92 -5
  29. package/dist/core-api/review-api.js +1744 -371
  30. package/dist/core-runtime/cli/mock-review-unit-executor.js +17 -0
  31. package/dist/core-runtime/cli/render-review-final-output.js +9 -0
  32. package/dist/core-runtime/cli/review-invoke.js +387 -55
  33. package/dist/core-runtime/cli/run-review-prompt-execution.js +361 -90
  34. package/dist/core-runtime/path-boundary.js +58 -0
  35. package/dist/core-runtime/pipeline-execution-ledger.js +100 -0
  36. package/dist/core-runtime/reconstruct/artifact-types.js +33 -1
  37. package/dist/core-runtime/reconstruct/materialize-preparation.js +54 -4
  38. package/dist/core-runtime/reconstruct/pipeline-execution-ledger.js +342 -0
  39. package/dist/core-runtime/reconstruct/post-seed-validation.js +630 -0
  40. package/dist/core-runtime/reconstruct/record.js +105 -1
  41. package/dist/core-runtime/reconstruct/run.js +1594 -38
  42. package/dist/core-runtime/reconstruct/seed-candidate-validation.js +29 -0
  43. package/dist/core-runtime/review/continuation-plan.js +160 -0
  44. package/dist/core-runtime/review/execution-plan-boundary.js +123 -0
  45. package/dist/core-runtime/review/materializers.js +8 -3
  46. package/dist/core-runtime/review/pipeline-execution-ledger.js +250 -0
  47. package/dist/core-runtime/review/review-artifact-utils.js +15 -2
  48. package/dist/core-runtime/review/review-invocation-runner.js +604 -0
  49. package/dist/core-runtime/target-material-kind.js +43 -5
  50. package/dist/mcp/server.js +289 -59
  51. package/dist/mcp/tool-schemas.js +28 -2
  52. package/package.json +4 -2
  53. package/.onto/domains/llm-native-development/competency_qs.md +0 -430
  54. package/.onto/domains/llm-native-development/concepts.md +0 -242
  55. package/.onto/domains/llm-native-development/conciseness_rules.md +0 -163
  56. package/.onto/domains/llm-native-development/dependency_rules.md +0 -216
  57. package/.onto/domains/llm-native-development/domain_scope.md +0 -197
  58. package/.onto/domains/llm-native-development/extension_cases.md +0 -474
  59. package/.onto/domains/llm-native-development/logic_rules.md +0 -123
  60. package/.onto/domains/llm-native-development/prompt_interface.md +0 -49
  61. package/.onto/domains/llm-native-development/structure_spec.md +0 -245
@@ -1,245 +0,0 @@
1
- ---
2
- version: 2
3
- last_updated: "2026-03-30"
4
- source: manual
5
- status: established
6
- ---
7
-
8
- # LLM-Native Development Domain — Structure Specification
9
-
10
- ## Project Required Files (Meta Files) (→Area 3)
11
-
12
- | Role | Filename | Location | Description | Constraints |
13
- |------|---------|----------|-------------|-------------|
14
- | LLM Instruction File | Per project convention | Project root | Delivers project instructions to the LLM. Include only information without which work would fail | 300 lines or fewer |
15
- | Structure Map | ARCHITECTURE.md or llms.txt | Project root | Structure map of the entire system. Directory structure, each directory's role, key file list | All top-level directories mentioned |
16
- | Navigation Index | INDEX.md | 1 per directory | File list within the directory and a one-line description of each file. Source of truth for file existence | — |
17
- | Human-Readable Guide | README.md | Each directory (optional) | Can be used instead of INDEX.md. Human-readability focused | When coexisting with INDEX.md, INDEX.md takes precedence |
18
- | Feature Specification | spec.md | Feature directory | Feature specification before implementation. Includes requirements, architecture decisions, test strategy | Same directory as implementation files |
19
-
20
- **Vendor-Specific LLM Instruction File Conventions (→Area 3):**
21
-
22
- | LLM Ecosystem | Filename | Notes |
23
- |--------------|---------|-------|
24
- | Claude Code | CLAUDE.md | Can be placed at project root + per subdirectory |
25
- | Cursor | .cursorrules | Project root |
26
- | GitHub Copilot | .github/copilot-instructions.md | .github directory |
27
- | Windsurf | .windsurfrules | Project root |
28
- | General (AGENTS.md standard) | AGENTS.md | Managed by Linux Foundation, multi-agent support |
29
-
30
- Select the filename matching the LLM ecosystem used by the project. When using multiple LLM ecosystems simultaneously, instruction files for each ecosystem can be placed in parallel.
31
-
32
- ## Frontmatter Specification (Source of Truth) (→Area 3)
33
-
34
- This section is the source of truth for all rules regarding frontmatter. When other files (logic_rules.md, dependency_rules.md) reference frontmatter, they follow this definition.
35
-
36
- ### Required Fields
37
- - `title`: File title (string)
38
- - `type`: File type. Allowed values: `concept`, `rule`, `spec`, `index`, `architecture`, `config`
39
- - `description`: One-line description (string)
40
-
41
- ### Relationship Fields (Optional)
42
- - `depends_on`: List of file paths this file depends on
43
- - `related_to`: List of related file paths (undirected association)
44
- - `parent`: Parent concept file path
45
-
46
- ### Change Tracking Fields (Optional)
47
- - `last_updated`: Last update date (YYYY-MM-DD). Can be omitted when delegated to git
48
- - `update_reason`: Last update reason (string). Can be omitted when delegated to git
49
-
50
- ## File Structure Required Elements (→Area 3)
51
-
52
- - **Body**: Markdown format. One H1 title per file, structured with subsections (H2~H3)
53
- - **File size**: Single file 500 lines or fewer recommended. When exceeded, review concept separation. This value is an empirical criterion considering context efficiency of current major LLMs (128K~1M tokens)
54
-
55
- ## Directory Structure Rules (→Area 3)
56
-
57
- - Maximum depth: 3 levels (e.g., .onto/domains/software-engineering/concepts.md). This value is an empirical criterion for minimizing LLM traversal cost
58
- - Each directory expresses a single concern
59
- - Directory names use plural nouns or domain names (e.g., domains, roles, processes)
60
- - Do not create directories containing only a single file. Place the file in the parent directory
61
- - Recommended files per directory: 3~20. When exceeding 20, review subdirectory separation or sub-index introduction
62
-
63
- ## Filename Rules (→Area 3)
64
-
65
- - Use snake_case (e.g., domain_scope.md, logic_rules.md)
66
- - Filenames directly express the content's role
67
- - Reserved filenames and roles (see "Project Required Files" table above): ARCHITECTURE.md, INDEX.md, README.md, spec.md, llms.txt, and each LLM ecosystem's instruction file
68
- - Avoid order representation using numeric prefixes. Order is specified in INDEX.md
69
-
70
- ## Reference Chain Upper Limit (→Area 3)
71
-
72
- - Files that must be read to perform one task: 5 or fewer recommended. "Task" is defined per scenario in extension_cases.md. This value is an empirical criterion considering LLM context window utilization efficiency
73
-
74
- ## Required Relationships (→Area 3)
75
-
76
- - Every concept file must be referenced by at least 1 other document, or registered in INDEX.md
77
- - Every directory must contain INDEX.md or README.md
78
- - ARCHITECTURE.md must mention all top-level directories and their roles
79
-
80
- ## Isolated Element Prohibition (→Area 3)
81
-
82
- - Concept files referenced by nothing → warning (isolated document)
83
- - Files not registered in INDEX.md → warning (unregistered file)
84
- - Concept files with no relationships in frontmatter → warning (relationships undefined)
85
-
86
- ## LLM System Architecture Structure (→Area 1, Area 2)
87
-
88
- ### Required Architectural Components
89
-
90
- Every LLM-powered system must have at minimum:
91
-
92
- | Component | Purpose | Risk if Missing |
93
- |-----------|---------|-----------------|
94
- | Model connection | Interface to the LLM (API client, SDK wrapper) | System cannot communicate with the model |
95
- | Input design | Prompt construction, context assembly | Inconsistent or inefficient model usage |
96
- | Output handling | Response parsing, validation, error handling | Malformed output propagates to downstream systems |
97
-
98
- ### Optional Components (required when applicable)
99
-
100
- | Component | When Required | Primary Area |
101
- |-----------|--------------|-------------|
102
- | Retrieval pipeline | System needs external knowledge beyond training data | Area 3 |
103
- | Tool integration | Model must take actions in external systems | Area 4 |
104
- | Evaluation pipeline | System output quality must be measured | Area 5 |
105
- | Safety guardrails | System faces adversarial users or produces user-facing content | Area 6 |
106
- | Monitoring and logging | System runs in production | Area 7 |
107
- | Fine-tuning pipeline | Prompting alone cannot meet quality/cost/latency targets | Area 8 |
108
-
109
- ### Component Interaction Patterns
110
-
111
- - **Synchronous (request-response)**: Client sends prompt → waits for model response → processes output. Appropriate when latency tolerance exists and the result is needed immediately
112
- - **Asynchronous (event-driven)**: Request is queued → model processes when available → result is delivered via callback or polling. Appropriate for batch processing, high-throughput systems, or when model response time is unpredictable
113
- - **Streaming**: Model generates tokens incrementally → client receives and displays partial output. Appropriate for user-facing applications where perceived latency matters. Streaming introduces complexity: partial output cannot be validated until the stream completes
114
-
115
- ## RAG Pipeline Structure (→Area 3)
116
-
117
- ### Required Stages
118
-
119
- Every RAG (Retrieval-Augmented Generation) pipeline must define these stages explicitly. Each stage has defined input/output contracts.
120
-
121
- | Stage | Input | Output | Key Design Decision |
122
- |-------|-------|--------|-------------------|
123
- | Ingestion | Raw documents (PDF, HTML, code, etc.) | Normalized text with metadata | Document parser selection, metadata extraction |
124
- | Processing | Normalized text | Chunks with embeddings | Chunking strategy, embedding model selection |
125
- | Storage | Chunks with embeddings | Indexed vector store | Vector database selection, index configuration |
126
- | Retrieval | User query | Ranked relevant chunks with scores | Search algorithm (semantic, keyword, hybrid), reranking |
127
- | Generation | Query + retrieved chunks | Model response | Prompt assembly, citation handling |
128
-
129
- ### Stage Boundary Rules
130
-
131
- - Each stage must be independently testable. If testing retrieval quality requires running generation, the stages are too tightly coupled
132
- - Stage outputs must carry metadata sufficient for debugging: stage name, processing timestamp, and stage-specific metrics (e.g., chunk count, relevance scores)
133
- - Retrieval results must carry relevance scores. Results without scores cannot be filtered, reranked, or used for quality diagnostics
134
-
135
- ## Agent Architecture Structure (→Area 4)
136
-
137
- ### Required Components
138
-
139
- Every agent-based system must define:
140
-
141
- | Component | Description | Constraints |
142
- |-----------|-------------|-------------|
143
- | Model | The LLM powering the agent's reasoning | Must specify model identity, version, and required capabilities |
144
- | Tools | External functions the agent can invoke | Each tool must have a schema (see Tool Definition Structure below) |
145
- | Instructions | System prompt defining agent behavior, goals, and constraints | Must list available tools and specify when each should be used |
146
- | State management | How the agent tracks progress and accumulated information | Must be explicit — implicit state via conversation history is fragile (see logic_rules.md) |
147
-
148
- ### Tool Definition Structure
149
-
150
- Every tool exposed to an agent must include:
151
-
152
- ```
153
- name: Unique identifier (snake_case, no spaces)
154
- description: What the tool does and when to use it (self-describing)
155
- parameters: JSON Schema defining input parameters (types, required fields, constraints)
156
- return_type: Description of what the tool returns (structure, possible error states)
157
- ```
158
-
159
- If any of these fields is missing, the agent cannot reliably determine when or how to use the tool.
160
-
161
- ### Multi-Agent Execution profile
162
-
163
- When a system uses multiple agents, the coordination execution profile must be explicitly chosen and documented:
164
-
165
- | Execution profile | Structure | When to Use |
166
- |----------|-----------|-------------|
167
- | Hub-spoke (orchestrator-workers) | One orchestrator agent delegates tasks to specialized worker agents | Tasks are decomposable into independent subtasks with clear boundaries |
168
- | Peer-to-peer | Agents communicate directly with each other, no central coordinator | Agents have equal authority and need to negotiate or collaborate |
169
- | Hierarchical | Multiple layers of orchestration (manager → team leads → workers) | Large-scale systems with many agents and complex task decomposition |
170
-
171
- The choice must be documented with rationale. Default recommendation: hub-spoke (simplest, easiest to debug, clear responsibility boundaries).
172
-
173
- ## Evaluation Structure (→Area 5)
174
-
175
- ### Golden Set Requirements
176
-
177
- | Attribute | Requirement |
178
- |-----------|-------------|
179
- | Minimum size | Sufficient to detect meaningful quality differences (recommend ≥50 examples for statistical validity) |
180
- | Diversity | Must cover all expected input categories. If the system handles 5 task types, the golden set must include examples of all 5 |
181
- | Update frequency | Must be reviewed when the system's scope changes. Stale golden sets produce misleading evaluation results |
182
- | Separation | Must not overlap with training or fine-tuning data. Contamination invalidates evaluation results |
183
-
184
- ### Evaluation Pipeline Structure
185
-
186
- | Stage | Input | Output |
187
- |-------|-------|--------|
188
- | Data source | Golden set + system under evaluation | Evaluation inputs (query-expected pairs) |
189
- | Evaluation execution | System outputs + expected outputs | Per-example scores and judgments |
190
- | Result storage | Raw evaluation results | Persistent evaluation records (for trend analysis) |
191
- | Analysis | Stored results (current + historical) | Quality metrics, regression detection, trend reports |
192
-
193
- Each stage must be automated for production systems. Manual evaluation is acceptable only during initial development or for semantic quality aspects that resist automation.
194
-
195
- ## Safety Architecture Structure (→Area 6)
196
-
197
- ### Required Pipeline
198
-
199
- User input → **Input guardrails** → Model → **Output guardrails** → User
200
-
201
- Each guardrail must define:
202
-
203
- | Element | Description | Example |
204
- |---------|-------------|---------|
205
- | Trigger condition | What input/output pattern activates the guardrail | "Input contains known prompt injection patterns" |
206
- | Action | What happens when triggered | "Block request, return safe fallback response" |
207
- | Logging | What is recorded when triggered | "Timestamp, trigger rule ID, input hash, action taken" |
208
-
209
- ### Guardrail Placement Rules
210
-
211
- - Input guardrails run before the model processes the request. They protect the model from adversarial inputs. Cost: added latency before model invocation
212
- - Output guardrails run after the model generates a response. They protect the user from harmful outputs. Cost: added latency after model invocation, before delivery
213
- - Both must be present for defense-in-depth. A system with only input guardrails cannot catch harmful outputs the model generates from benign inputs. A system with only output guardrails cannot prevent resource waste from processing adversarial inputs
214
-
215
- ## Golden Relationships (Cross-Component Validations)
216
-
217
- Golden relationships are structural invariants that connect components across different areas. A violation of any golden relationship indicates a design defect that will manifest as a runtime failure or quality degradation.
218
-
219
- - **Model capability ↔ Prompt complexity (Area 1 ↔ Area 2)**: The prompt must not exceed the model's capabilities. If a prompt requires tool use but the model does not support function calling, the system will fail. If a prompt requires reasoning over 100K tokens but the model's context window is 8K, the input will be truncated. Verification: for each prompt template, confirm the target model supports all required features (tool use, structured output, context length)
220
- - **Retrieval relevance ↔ Generation quality (Area 3 ↔ Area 2)**: If retrieval returns irrelevant or insufficient context, generation quality degrades regardless of prompt quality. This relationship is unidirectional: good retrieval does not guarantee good generation, but bad retrieval guarantees degraded generation. Verification: evaluate retrieval quality independently (precision@k, recall@k) before evaluating end-to-end generation quality
221
- - **Tool schema ↔ Agent instructions (Area 4 ↔ Area 4)**: Every tool referenced in agent instructions must exist in the tool registry with a valid schema. Every tool in the registry should be referenced in at least one agent's instructions (unused tools add cognitive load to the agent and may cause unintended tool selection). Verification: cross-reference instruction tool mentions against registered tool names
222
- - **Evaluation criteria ↔ Safety policy (Area 5 ↔ Area 6)**: Safety violations must be a subset of evaluation criteria. If the safety policy prohibits a behavior, the evaluation pipeline must be able to detect that behavior. A safety rule that cannot be evaluated cannot be verified in production. Verification: for each safety policy rule, confirm at least one evaluation metric or test case covers it
223
- - **Cost budget ↔ Model selection (Area 7 ↔ Area 1)**: The selected model's per-token cost, multiplied by expected usage volume, must fit within the operational budget. If the cost projection exceeds the budget, either the model must be downgraded, usage must be reduced (caching, batching), or the budget must be increased. Verification: cost projection = (avg tokens per request × requests per period × cost per token). Projection must be ≤ budget
224
-
225
- ## Quantitative Thresholds
226
-
227
- These thresholds are structural health indicators. Exceeding a threshold does not automatically indicate a defect, but requires explicit justification.
228
-
229
- | Metric | Threshold | Source | Rationale |
230
- |--------|-----------|--------|-----------|
231
- | Single file size | 500 lines max | Area 3 | Context efficiency for current LLMs (128K–1M tokens) |
232
- | Directory depth | 3 levels max | Area 3 | Minimizes LLM traversal cost |
233
- | Reference chain | 5 files max | Area 3 | Context window utilization efficiency |
234
- | Agent tool count | < 20 tools per agent | Area 4 | Cognitive load — agents with too many tools make worse tool selection decisions. Empirical finding from Anthropic and OpenAI agent design guides |
235
- | RAG chunk size | 256–2048 tokens | Area 3 | Empirical range. Smaller chunks lose context; larger chunks dilute relevance. Optimal size is model-dependent and task-dependent |
236
- | Golden set size | ≥ 50 examples | Area 5 | Minimum for detecting statistically meaningful quality differences |
237
- | Guardrail false positive rate | < 1% on legitimate traffic | Area 6 | Higher rates degrade user experience unacceptably |
238
- | Prompt template length | ≤ 25% of context window | Area 2 | Leaves room for retrieved context, user input, and model output. Systems exceeding this ratio risk truncation under heavy usage |
239
-
240
- ## Related Documents
241
- - concepts.md — Term definitions within this domain
242
- - logic_rules.md — Logic rules referencing this file's frontmatter specification and structural constraints
243
- - dependency_rules.md — Referential integrity rules referencing this file's specification
244
- - domain_scope.md — Scope definition, sub-area membership criteria, and 8 sub-areas
245
- - competency_qs.md — Questions this domain's structure must support answering
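The "Cost budget ↔ Model selection" verification in the removed structure_spec.md states that cost projection = (avg tokens per request × requests per period × cost per token) and that the projection must be ≤ budget. A minimal sketch of that check follows. The names `UsageEstimate`, `projected_cost`, and `fits_budget` are hypothetical and do not come from onto-mcp; the numbers in the usage lines are made-up illustration values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEstimate:
    """Hypothetical container for the three inputs named in the removed spec."""
    avg_tokens_per_request: int
    requests_per_period: int
    cost_per_token: float  # same currency as the budget (assumption)


def projected_cost(usage: UsageEstimate) -> float:
    # cost projection = avg tokens per request x requests per period x cost per token
    return (
        usage.avg_tokens_per_request
        * usage.requests_per_period
        * usage.cost_per_token
    )


def fits_budget(usage: UsageEstimate, budget: float) -> bool:
    # The spec requires the projection to be less than or equal to the budget.
    return projected_cost(usage) <= budget


# Illustration values only: 2,000 tokens per request, 10,000 requests per period.
estimate = UsageEstimate(
    avg_tokens_per_request=2_000,
    requests_per_period=10_000,
    cost_per_token=0.000002,
)
print(f"projected cost: {projected_cost(estimate):.2f}")
print("within a budget of 50:", fits_budget(estimate, budget=50.0))
```

When the check fails, the removed spec lists three remedies: downgrade the model, reduce usage (caching, batching), or increase the budget. The sketch only reports the result and does not choose among them.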