npm - onto-mcp - Versions diffs - 0.3.0 → 0.3.2 - Mend

onto-mcp 0.3.0 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (61)

package/.onto/domains/llm-native-development/concepts.md DELETED

@@ -1,242 +0,0 @@
----
-version: 2
-last_updated: "2026-03-30"
-source: manual
-status: established
----
-# LLM-Native Development Domain — Concept Dictionary and Interpretation Rules
-Classification axis: **normative system** — classification by the standard/protocol/framework layer to which each term belongs.
-## Normative System Classification
-Standards and frameworks in LLM-powered system development form a layered system. Each layer builds on the one below it.
-### Layer 1 — Foundation Protocols
-These define how systems communicate with models and with each other.
-- HTTP/REST = the transport protocol for model API calls. All commercial LLM providers expose REST endpoints
-- MCP (Model Context Protocol) = an open protocol standardizing how agents connect to external tools, data sources, and resources. Three primitives: Tools (executable functions), Resources (read-only data), Prompts (reusable prompt templates)
-- A2A (Agent-to-Agent Protocol) = a protocol for inter-agent communication. Complements MCP (agents↔tools) by connecting agents↔agents
-- OpenAPI / JSON Schema = specification formats for API contracts and structured data shapes. Used for function calling schemas and MCP tool parameter definitions
-### Layer 2 — Framework Abstractions
-Reusable libraries and SDKs that wrap Layer 1 protocols into higher-level programming constructs.
-- OpenAI SDK / Anthropic SDK = vendor-specific client libraries for model API access. Handle authentication, request formatting, streaming, retry logic
-- LangChain = a framework providing abstractions for chains (sequential prompt pipelines), agents (decision loops), and memory (conversation state)
-- LlamaIndex = a framework for data ingestion, chunking, indexing, and querying. Specialized for RAG pipeline construction
-- LangGraph = a framework for stateful multi-step agent workflows as graphs. Adds explicit state management and cycle support to LangChain
-- Other notable SDKs: Semantic Kernel (Microsoft, enterprise .NET/Python), Vercel AI SDK (TypeScript, streaming UI)
-### Layer 3 — Application Patterns
-Architectural patterns that combine Layer 1 and Layer 2 into recognizable system designs.
-- RAG (Retrieval Augmented Generation) = retrieving relevant documents and including them in model context before generation
-- Agent = a system where an LLM autonomously decides actions to accomplish a goal, distinguished by the presence of a decision loop
-- Chatbot = a conversational interface backed by an LLM, maintaining history across turns
-- Code Assistant = an LLM-powered system that reads, generates, and modifies code
-- Multi-Agent System = multiple specialized agents collaborating with distinct roles, tools, and instructions
-### Layer 4 — Quality and Governance
-Frameworks for measuring, controlling, and governing LLM system behavior.
-- Evaluation Frameworks = tools for measuring LLM output quality (RAGAS, DeepEval, Promptfoo)
-- EU AI Act = EU regulation classifying AI systems by risk level with corresponding obligations
-- NIST AI RMF = voluntary framework for managing AI risks. Four functions: Govern, Map, Measure, Manage
-- Model Cards / System Cards = standardized documentation for model capabilities, limitations, and evaluation results
-### Relationship Between Layers
-Layer 1 defines communication. Layer 2 wraps it into developer abstractions. Layer 3 combines abstractions into architectures. Layer 4 governs quality and safety. Skipping Layer 1 understanding makes debugging opaque; ignoring Layer 4 leaves quality and compliance unmanaged.
-## Model Integration Terms
-- Model = a trained neural network that generates text given input. In this domain, unqualified "model" refers to an LLM
-- API Endpoint = a URL that accepts model requests and returns responses
-- Model Routing = directing requests to different models based on task complexity, cost, or latency requirements
-- Fallback = switching to an alternative model when the primary is unavailable or rate-limited
-- Model Version = a specific model release (e.g., `gpt-4-0613`, `claude-3-5-sonnet-20241022`). Version pinning prevents unexpected behavior changes
-- Inference = the process of generating output from input. All API-based model usage is inference (vs. training, which updates weights)
-- Latency = time between request and complete response. For streaming, Time to First Token (TTFT) measures perceived responsiveness
-- Throughput = requests or tokens processed per unit time
-- Token = the atomic text unit a model processes. Subword units whose boundaries are model-specific. ~1.3 tokens per English word; non-Latin scripts require more
-- Context Window = maximum tokens a model processes per request (input + output). Ranges from 4K to 1M+
-- Rate Limit = maximum requests or tokens per time period (RPM, TPM, RPD). Exceeding returns HTTP 429
-- Model Capability = what a model can do (text generation, tool use, vision, structured output). Varies by model and version
-## Prompt & Context Design Terms
-- System Prompt = instructions defining model role, behavior constraints, and output format. Processed before user input
-- User Prompt = the end-user's input, combined with system prompt and retrieved context
-- Few-Shot Prompting = including example input-output pairs to demonstrate desired behavior. Contrasted with zero-shot (no examples)
-- Zero-Shot Prompting = providing instructions without examples, relying on pre-trained knowledge
-- Chain-of-Thought (CoT) = instructing the model to show intermediate reasoning steps before the final answer. Variants: zero-shot CoT ("think step by step"), few-shot CoT (with reasoning examples)
-- Instruction Hierarchy = precedence order when instructions conflict: system prompt > developer instructions > user prompt
-- Structured Output = constraining output to a specific format (JSON, XML). Implemented via JSON mode, response format schemas, or tool call schemas
-- JSON Mode = a model feature guaranteeing valid JSON output. Does not guarantee schema conformance
-- Tool Call Schema = a JSON Schema describing a tool's name, description, and parameters for the model's function calling
-- Token Budget = maximum tokens allocated to a specific prompt component, preventing any single part from consuming the entire context window
-- Context Rot = output quality degradation when the context contains too much irrelevant or contradictory information
-- Prompt Template = a reusable prompt structure with variable placeholders filled at runtime
-- Prompt Injection (as input design concern) = the risk that user input overrides the system prompt. Addressed via input validation, delimiters, and instruction hierarchy
-## Retrieval & Knowledge Systems Terms
-- RAG (Retrieval Augmented Generation) = augmenting model input with retrieved external information. Three stages: indexing, retrieval, generation. A pattern, not a product — implementation varies widely
-- Chunking = splitting documents into segments for indexing. Strategies: fixed-size, semantic, recursive, sentence-level. Chunk size affects retrieval precision and recall
-- Embedding = a dense vector representation capturing semantic meaning, generated by embedding models. Similar texts → similar vectors. Model-specific and not interchangeable across models
-- Vector Database = a database for storing and querying high-dimensional vectors via approximate nearest neighbor (ANN) search. Examples: Pinecone, Weaviate, Chroma, pgvector
-- Semantic Search = finding documents by meaning similarity using embedding distance metrics (cosine similarity, dot product)
-- Hybrid Search = combining keyword search (BM25) with semantic search to leverage strengths of both
-- Reranking = second-pass scoring of retrieved documents using a cross-encoder model for improved relevance
-- Knowledge Base = a structured information collection for LLM retrieval. Distinguished from the model's parametric knowledge (learned during training)
-- Knowledge Graph = a structured representation of entities and relationships, providing relational knowledge that complements unstructured retrieval
-- File=Concept = each file defines exactly one concept; each concept is defined in exactly one file. Applies to concept files, not meta files
-- Frontmatter = YAML metadata block at file top. Source of truth is structure_spec.md
-- Navigation Index = a meta file (INDEX.md) summarizing directory contents and each file's role
-- System Map = a meta file (ARCHITECTURE.md, llms.txt) representing entire system structure as one document
-- llms.txt = a specification for LLM-friendly project information, placed at project root
-- CLAUDE.md = a project-level instruction file for Claude-based tools. Analogous to .cursorrules (Cursor), AGENTS.md (general AI agents)
-- Persistent Memory = information storage persisting across agent sessions. Distinguished from conversation history (single-session) and model parameters (training-time)
-## Agentic Systems Terms
-- Agent = an LLM-powered system that autonomously decides actions. Core loop: observe → think → act. Distinguished from prompt-response by iterative decision-making
-- Tool = a function an agent invokes to interact with external systems. Defined by schema (name, description, parameters)
-- Tool Use = the model generating structured tool call requests (function name + arguments). The model does not execute tools — application code does
-- ReAct (Reasoning + Acting) = alternating reasoning steps and action steps, with the reasoning trace in context for self-correction
-- Planning = decomposing a complex goal into steps. May be explicit (output a plan) or implicit (decide next step from current state)
-- Reflection = the agent evaluating its own outputs and deciding to revise, retry, or proceed
-- Workflow = a predefined sequence of LLM calls where control flow is code-determined, not model-determined. Patterns: prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer
-- Orchestration = coordinating multiple LLM calls, tool invocations, or agent interactions. Code-driven (workflows) or model-driven (agents)
-- Multi-Agent = multiple agents with distinct roles collaborating. Patterns: orchestrator-specialist, peer-to-peer, hierarchical
-- MCP Server / MCP Client = server exposes tools/resources/prompts; client discovers and invokes them. One agent can connect to multiple servers
-- ACI (Agent-Computer Interface) = designing tools optimized for agent use: clear descriptions, specific parameters, structured results, informative errors
-- Agent State = information maintained during execution: goal, completed steps, pending actions, results
-- Scratchpad = working memory for intermediate reasoning and partial results, included in model context
-- Long-Running Agent = a task spanning multiple sessions. Requires context compaction, progress persistence, and continuity strategies
-- Context Compaction = summarizing accumulated context to fit the window. Techniques: summarization, selective retention, structured progress notes
-- Sub-Agent = an agent spawned by a parent to handle a delegated sub-task, returning results to the parent
-## Evaluation & Testing Terms
-- Evaluation (Eval) = systematic measurement of LLM output quality. Non-deterministic outputs require different methods than traditional testing
-- Golden Set = curated input-output pairs representing expected behavior, used as a regression suite
-- AI-as-Judge = using an LLM to evaluate another LLM's output on criteria like relevance, accuracy, safety
-- Comparative Evaluation = evaluating system variants side-by-side on the same inputs. Less susceptible to absolute scoring biases
-- A/B Testing = deploying variants to real users and measuring outcomes from actual interactions
-- Regression Testing = re-running a golden set after changes to detect quality degradation
-- Benchmark = standardized test suite for model capabilities (MMLU, HumanEval, MATH). Useful for model selection, not application-specific quality
-- Hallucination = fluent, confident output that is factually incorrect or unsupported by context. Intrinsic (contradicts source) vs extrinsic (unverifiable)
-- Faithfulness = degree to which output is supported by provided context. RAG-specific metric
-- Relevance = degree to which output addresses the user's question. Distinct from faithfulness
-- HITL (Human-in-the-Loop) = incorporating human judgment for evaluation, feedback, or high-stakes approval
-- Evaluation Pipeline = automated system for test case management, scoring, result aggregation, and trend tracking
-## Safety & Alignment Terms
-- Prompt Injection = malicious input causing the model to ignore its system prompt. Direct (attacker input) or indirect (embedded in retrieved documents)
-- Jailbreak = bypassing a model's training-time safety constraints. Targets the model itself, not the application's system prompt
-- Guardrail = runtime validation of inputs/outputs against rules. Input guardrails filter prompts; output guardrails filter responses
-- Content Policy = rules defining what the system should and should not generate
-- Output Filtering = post-processing to detect and handle policy-violating content
-- Red Teaming = adversarial testing to identify vulnerabilities standard testing misses
-- Alignment = degree to which model behavior matches intended purpose and values
-- PII (Personally Identifiable Information) = data identifying an individual. Must be detected and handled in both inputs and outputs
-- Responsible AI = developing AI that is fair, transparent, accountable, safe, and privacy-preserving
-- EU AI Act = EU regulation with risk-tier classification. High-risk systems require conformity assessments, documentation, and human oversight
-## Production Operations Terms
-- Observability = understanding system state from external outputs. Three pillars: logs, metrics, traces
-- Logging = recording LLM call details (prompts, outputs, model version, tokens, latency, errors)
-- Tracing = tracking a single request through all processing steps for end-to-end debugging
-- Monitoring = real-time observation of latency, error rates, throughput, token consumption, and cost
-- Cost Tracking = measuring model usage cost from token consumption × per-token pricing. Often the largest variable cost
-- Latency Management = reducing response time via model selection, caching, streaming, prompt optimization, batching
-- Quality Drift = gradual output quality degradation from model updates, knowledge staleness, or changing user patterns
-- Feedback Loop = collecting user signals (ratings, edits, regenerations) for system improvement
-- Incident Response = handling LLM-specific failures (outages, safety violations, quality degradation, cost spikes)
-- Deployment Strategy = rolling out changes via blue-green, canary, or shadow deployment with evaluation gates
-- Caching = reusing results at two levels: response-level (identical inputs) and KV cache (prompt prefix reuse, exposed as "prompt caching")
-## Data & Model Adaptation Terms
-- Fine-Tuning = further training a pre-trained model on task-specific data. Modifies weights, unlike prompting
-- Full Fine-Tuning = updating all parameters. Highest quality but most compute; risks catastrophic forgetting
-- LoRA (Low-Rank Adaptation) = parameter-efficient fine-tuning via low-rank matrices added to frozen model weights
-- QLoRA = LoRA on a quantized base model. Enables fine-tuning large models on consumer hardware
-- Adapter = a small trainable module inserted into a pre-trained model. Multiple adapters can be swapped at inference time
-- RLHF (Reinforcement Learning from Human Feedback) = training via human preference rankings → reward model → policy optimization
-- DPO (Direct Preference Optimization) = optimizing directly on preference pairs without a separate reward model
-- RLAIF (Reinforcement Learning from AI Feedback) = using AI instead of humans for the feedback signal
-- Dataset Engineering = curating, cleaning, and constructing training datasets. Data quality often matters more than model architecture
-- Data Curation = selecting training examples by relevance, diversity, accuracy, and representativeness
-- Data Augmentation = creating training examples by transforming existing ones (paraphrasing, back-translation, substitution)
-- Synthetic Data = training data generated by AI. Scalable and privacy-safe but risks bias amplification
-- Distillation = training a smaller "student" model to reproduce a larger "teacher" model's outputs
-- Model Compression = reducing model size via quantization, pruning, or distillation
-## Cross-Cutting Terms
-These span multiple sub-areas and are not attributed to any single area.
-- CI/CD for LLM Systems = CI/CD adapted for LLM applications: evaluation gates, prompt regression testing, model version tracking, non-deterministic output validation
-- Version Management = tracking prompt versions, tool schema versions, model versions, and agent configuration versions independently
-- Experiment Tracking = recording experiment configurations and metrics. Attribution: model parameters → Area 8, output quality → Area 5, infrastructure → Area 7
-- Prompt Versioning = managing prompt template evolution with identifiers, changelogs, and rollback capability
-- Spec-First Development = defining specifications (tool schemas, prompt templates, evaluation criteria) before implementation
-## Domain Inheritance
-This domain inherits from `software-engineering/concepts.md`. Inherited terms follow the parent domain's definitions unless explicitly redefined.
-| Term | Parent Domain Definition | This Domain Redefinition | Change Scope |
-|------|------------------------|-------------------------|-------------|
-| model | domain model (business object) | LLM model (trained neural network for text generation) | Default meaning changed — unqualified "model" refers to LLM |
-| agent | software agent (generic autonomous program) | LLM agent (autonomous system with LLM-driven decision loop) | Narrowed to LLM-powered agents |
-| token | authentication token / API key | LLM token (subword unit processed by a model) | Default meaning changed — unqualified "token" refers to LLM token |
-| pipeline | CI/CD pipeline | May refer to CI/CD, RAG, or evaluation pipeline. Qualification required | Ambiguity introduced — context-dependent |
-| context | execution context (runtime state) | Context window content (information provided to the model) | Default meaning changed — unqualified "context" refers to model input |
-| memory | system memory (RAM) | Agent memory (information persisted across interactions) | Default meaning changed — unqualified "memory" refers to agent memory |
-## Homonyms Requiring Attention
-- "context": context window (model input capacity) ≠ execution context (runtime state) ≠ bounded context (DDD) ≠ React context
-- "model": LLM model (neural network) ≠ domain model (DDD business objects) ≠ data model (schema) ≠ ML model (non-LLM)
-- "agent": LLM agent (autonomous, LLM decision loop) ≠ software agent (generic) ≠ user agent (browser HTTP header)
-- "embedding": vector embedding (dense semantic vector) ≠ UI embedding (iframe) ≠ word embedding (static, Word2Vec/GloVe)
-- "token": LLM token (subword unit) ≠ authentication token (JWT, OAuth) ≠ API token (API key)
-- "prompt": system prompt ≠ user prompt ≠ prompt template ≠ MCP prompt primitive
-- "memory": agent memory (cross-session) ≠ system memory (RAM) ≠ persistent storage (disk/DB) ≠ conversation history (single-session)
-- "tool": MCP tool (agent-invocable function) ≠ development tool (IDE, linter) ≠ CLI tool
-- "chain": prompt chain (sequential LLM calls) ≠ blockchain ≠ certificate chain (TLS) ≠ LangChain chain
-- "index": navigation index (INDEX.md) ≠ database index (B-tree) ≠ vector index (HNSW, IVF) ≠ array index
-- "alignment": model alignment (values/behavior) ≠ text alignment (visual) ≠ ontology alignment (concept mapping)
-## Interpretation Principles
-These principles apply when interpreting terms and concepts within this domain.
-- Model capability descriptions are version-specific. Documentation must specify the model version when describing capabilities
-- Prompt engineering patterns are empirical, not universal. A technique effective for one model may degrade performance on another
-- "RAG" is a pattern, not a product. Two "RAG systems" may share almost no implementation details
-- Agent architecture patterns (ReAct, planning, reflection, tool use) are not mutually exclusive. They are composable design techniques
-- Token counts are model-specific because tokenization differs between models. Budget calculations must use the target model's tokenizer
-- "Best practices" in LLM development have short half-lives. Treat patterns as empirically validated, not permanent
-- The boundary between prompting and fine-tuning is a design decision based on cost, quality, data, and latency — not task nature
-- Evaluation results are meaningful only relative to the evaluation methodology
-- LLM architecture decisions are interdependent. Changing retrieval (Area 3) affects prompt design (Area 2) and evaluation (Area 5)
-## Related Documents
-- domain_scope.md — the scope definition where these terms are used, including sub-area membership criteria
-- structure_spec.md — source of truth for frontmatter specifications and structural rules
-- dependency_rules.md — details of reference chains and direction rules
-- prompt_interface.md — prompt/interface design criteria
-- competency_qs.md — questions these concepts must be able to answer
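
For readers of this diff, the Fallback and Rate Limit entries in the deleted concepts.md can be illustrated with a small sketch. It is not part of onto-mcp and does not appear in either package version. `RateLimitError`, `call_with_fallback`, `fake_call`, and the model names are hypothetical, chosen only to show the pattern of switching to an alternative model when the primary returns HTTP 429.

```python
from typing import Callable, Sequence


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 response."""


def call_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in priority order; return (model_used, response)."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except RateLimitError as exc:
            # Primary is rate-limited: remember the error, try the next model.
            last_error = exc
    raise RuntimeError("all configured models were rate-limited") from last_error


def fake_call(model: str, prompt: str) -> str:
    """Hypothetical client: the primary model is always rate-limited."""
    if model == "primary-model":
        raise RateLimitError("429 Too Many Requests")
    return f"[{model}] echo: {prompt}"


if __name__ == "__main__":
    used, reply = call_with_fallback(
        "hello", ["primary-model", "backup-model"], fake_call
    )
    print(used, reply)
```

A real client would likely also retry with backoff before falling back, and would log which model served each request so that cost tracking and quality monitoring stay attributable.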

package/.onto/domains/llm-native-development/conciseness_rules.md DELETED

@@ -1,163 +0,0 @@
----
-version: 2
-last_updated: "2026-05-27"
-source: manual
-status: established
----
-# Conciseness Rules (llm-native-development)
-This document contains the domain-specific rules that conciseness references during conciseness verification.
-It is organized in the order of **type (allow/remove) → verification criteria → role boundaries → measurement method**.
----
-## 1. Allowed Duplication
-Each rule is tagged with a strength level:
-- **[MUST-ALLOW]**: Duplication that breaks the system if removed. Must be retained.
-- **[MAY-ALLOW]**: Duplication retained for convenience. Can be consolidated, but remove only when the benefit clearly outweighs the consolidation cost.
-### Traceability
-- [MUST-ALLOW] Spec → implementation → frontmatter triple traceability — spec.md requirements, implementation file code, and frontmatter metadata express the same facts in different formats. The three layers have different consumers (human judgment, LLM generation, machine verification), so removing any one severs the traceability chain.
-- [MUST-ALLOW] AI/human dual consumption paths — The same information coexists in human-facing documents (README, explanatory text) and LLM-facing structures (frontmatter, system map). Since consumers read differently, consolidation destroys accessibility for one side.
-### Navigation Structure
-- [MUST-ALLOW] File list overlap between system map (ARCHITECTURE.md) and navigation index (INDEX.md) — The system map provides a structural overview of the whole, while the navigation index provides per-directory detail. Since their purposes and navigation depth differ, independent maintenance is mandatory.
-- [MAY-ALLOW] Frontmatter relationship declarations and body "Related Documents" links pointing to the same file. Frontmatter is for machine verification, body links are for human navigation, so retention is acceptable, but there is a risk of inconsistency during updates.
-### Role and Domain Document (Role-Document Mapping)
-- [MUST-ALLOW] Role names appearing in both the agent role definitions (roles/*.md) and the role-document mapping in the process document (process.md). Role files define judgment criteria while process documents define execution order, so their concerns differ.
-- [MAY-ALLOW] The same file (e.g., concepts.md) appearing in multiple agents' domain document lists. If the reference context differs per agent, retain; if it is mere listing, extraction into a common reference is possible.
-### Learning and Promotion
-- [MAY-ALLOW] The same content temporarily coexisting in project learning items and promoted domain documents. Project learning items should be removed after promotion is complete, but temporary duplication during the promotion process is allowed.
-### Model and Configuration Overlap (→Area 1, Area 4)
-- [MUST-ALLOW] Model fallback configuration overlap: A model routing table (→Area 1: Model Integration) and an agent tool configuration (→Area 4: Agentic Systems) may both reference the same model identifier. The routing table determines which model to call and when to fall back, while the agent configuration determines which model the agent is permitted to use for tool calls. Different consumers (router vs agent) justify the duplication — removing either breaks its consumer's decision logic.
-### Safety Defense-in-Depth (→Area 2, Area 6)
-- [MUST-ALLOW] Safety policy in prompt and guardrails: The same safety rule may appear in both the system prompt (→Area 2: Prompt & Context Design) and the output guardrail layer (→Area 6: Safety & Alignment). The system prompt constrains model generation behavior, while the guardrail filters the output post-generation. This is defense-in-depth — removing the prompt-side rule increases the chance of harmful generation; removing the guardrail-side rule removes the safety net when prompt-side prevention fails.
-### Evaluation and Operations Threshold Overlap (→Area 5, Area 7)
-- [MAY-ALLOW] Evaluation criteria in Area 5 and monitoring thresholds in Area 7: Both reference quality thresholds (e.g., hallucination rate, response relevance score), but they serve different lifecycle stages. Area 5 (Evaluation & Testing) applies thresholds pre-deployment to determine release readiness, while Area 7 (Production Operations) applies thresholds at runtime to trigger alerts and rollback. Consolidation into a shared threshold definition is preferred, but independent maintenance is acceptable when evaluation and operations teams have different update cadences.
----
-## 2. Removal Target Patterns
-Each rule is tagged with a strength level:
-- **[MUST-REMOVE]**: Duplication whose existence causes errors or incorrect reasoning.
-- **[SHOULD-REMOVE]**: Duplication that is not significantly harmful but adds unnecessary complexity.
-### Role Duplication
-- [MUST-REMOVE] Duplicate agent definitions for the same role — Two or more agents that have different names but identical judgment criteria and assigned documents. Maintaining both causes conflicting verification results or unnecessary repeated execution.
-- [SHOULD-REMOVE] Judgment criteria described in both role definition files and process documents — The authoritative source for judgment criteria is the role definition file, so process documents should reference only the role name.
-### Navigation and History Mixing
-- [MUST-REMOVE] Change history mixed into navigation index — INDEX.md describing both file lists (navigation) and change logs (history). Since navigation structure and change history have different update cycles and consumption purposes, separation is mandatory.
-- [SHOULD-REMOVE] Change history fields and navigation relationship fields mixed in frontmatter — Frontmatter describes current-state metadata, so change history should be delegated to git or a separate history file.
-- [MUST-REMOVE] Backward-compatibility notes, deprecated behavior, migration rationale, and historical alternatives in documents loaded for current execution — These materials inflate model context with non-current behavior. Keep execution context focused on current behavior, current contracts, current authority, and current failure handling; place historical material in isolated archive, deprecated, or development-record paths and link only when a task needs that history.
-### Structural Duplication
-- [MUST-REMOVE] Multiple paths to the same concept file — The same concept existing in different directories under different filenames. By the "File = Concept" equation, one concept must be expressed as exactly one file.
-- [SHOULD-REMOVE] System map structural descriptions that list only information completely identical to individual file frontmatter — The system map should show overviews and relationships; simple copies of frontmatter cause update omissions.
-### Constraint Duplication
-- [SHOULD-REMOVE] Rules from structure_spec.md re-described in individual file bodies — The authoritative source for constraints is structure_spec.md, so individual files should maintain only references.
-### Tool and MCP Duplication (→Area 4)
-- [MUST-REMOVE] Same tool defined in multiple MCP servers with identical schemas — The agent cannot deterministically select which server to call, producing inconsistent behavior across invocations. Consolidate into a single MCP server, or differentiate schemas if the tools genuinely serve different purposes.
-### Prompt Template Duplication (→Area 2, Area 4)
-- [MUST-REMOVE] Same prompt template defined in both Area 2 (prompt design) and Area 4 (agent instructions) — Violates single source of truth. When the template is updated in one location but not the other, the agent's behavior diverges from the intended design. Define the template in one authoritative location and reference it from the other.
-### Metric Definition Duplication (→Area 5, Area 7)
-- [SHOULD-REMOVE] Same evaluation metric computed in both Area 5 (evaluation pipeline) and Area 7 (monitoring) — Independent definition in both causes definition drift. Consolidate the metric definition (formula, thresholds, labels) into a shared location. Different computation frequencies (batch vs real-time) are acceptable.
----
-## 3. Minimum Granularity Criteria
-A sub-classification is allowed only if it satisfies **one or more** of the following. If none are satisfied, merge with the parent.
-1. **Competency question difference**: Does it produce a different answer to a question in competency_qs.md? Each of the 8 sub-areas in domain_scope.md is designed to answer distinct competency questions — if two classifications answer the same question set, they are merge candidates.
-2. **Constraint difference**: Do different rules from logic_rules.md or structure_spec.md apply? For example, Area 2 (Prompt & Context Design) has token budget constraints that do not apply to Area 1 (Model Integration), and Area 6 (Safety & Alignment) has compliance-specific constraints absent from Area 5 (Evaluation & Testing).
-3. **Consumer difference**: Do different sub-areas consume the classification? A concept consumed only by Area 3 (Retrieval & Knowledge Systems) and a concept consumed only by Area 4 (Agentic Systems) justify separate classifications even if they appear similar, because their consumers apply different logic.
-Examples:
-- `Concept file` and `meta file` have different constraints (frontmatter schema, "File = Concept" equation applicability), so the classification is justified.
-- If `navigation index` and `system map` list only the same files in the same directory with no additional information, they are candidates for merging.
-- A "model capability assessment" concept consumed by Area 1 (routing decisions) and Area 5 (evaluation benchmarks) may justify separate classifications if the competency questions and constraints differ. If they are identical in both respects, consolidate.
----
-## 4. Boundaries — Domain-Specific Application Cases
-The authoritative source for boundary definitions is `roles/conciseness.md`. This section describes only the specific application cases in the llm-native-development domain.
-### pragmatics Boundary
-- conciseness: Does an unnecessary element **exist**? (structural level)
-- pragmatics: Does unnecessary information **waste** the LLM's context window? (execution level)
-- In the 8-area structure: conciseness asks "does this element need to exist in this area?" while pragmatics asks "does including this element in the current execution context consume tokens without contributing to the task?"
-- Example: A deprecated concept file remains in the directory → conciseness. A valid but currently unnecessary file is included in the reference chain → pragmatics.
-- Example (8-area): Agent config (→Area 4) embeds a full prompt template copy → conciseness. Agent loads unused MCP tool schemas → pragmatics.
-### coverage Boundary
-- conciseness: Is there something that should not be there? (reduction direction)
-- coverage: Is there something missing that should be there? (expansion direction)
-- In the 8-area structure: coverage checks whether all 8 areas are represented (missing area = gap), while conciseness checks whether any area contains duplicated or misplaced elements.
-- Example: An agent role is defined but has no assigned domain document → coverage. Two agents with identical judgment criteria are defined → conciseness.
-- Example (8-area): Area 5 has no evaluation methodology → coverage. Same metric defined in Area 5 and Area 7 with identical formulas → conciseness.
-### logic Boundary (preceding/following relationship)
-- logic precedes: Determines logical equivalence (implication)
-- conciseness follows: Decides whether to remove after equivalence is confirmed
-- In the 8-area structure: logic determines whether two rules across areas are logically equivalent; conciseness then decides whether one should be removed. Particularly relevant for cross-area constraints (e.g., Area 6 safety rule implied by Area 2 prompt instruction).
-- Example: A rule in structure_spec.md implies a constraint in an individual file body → logic determines equivalence → conciseness determines "re-description in individual file is unnecessary."
-- Example (8-area): Area 1 fallback rule "if X fails, use Y" and Area 4 agent config "try X first, then Y" → logic determines equivalence → conciseness removes the non-authoritative copy.
-### semantics Boundary (preceding/following relationship)
-- semantics precedes: Determines semantic identity (synonym status)
-- conciseness follows: Decides whether merging is needed after synonym is confirmed
-- In the 8-area structure: semantics identifies when two terms across different areas refer to the same concept; conciseness then decides whether to merge into a single canonical definition with cross-area references.
-- Example: system map / architecture overview / structure guide are the same concept → semantics determines synonym → conciseness determines "consolidate into one canonical term."
-- Example (8-area): "response quality score" (→Area 5) and "output quality metric" (→Area 7) are the same measurement → semantics determines synonym → conciseness consolidates the definition.
----
-## 5. Quantitative Criteria
-Thresholds observed in this domain are recorded as they accumulate.
-- **Tool overlap threshold** (→Area 4): If 2 tools share >80% of their parameter schemas (measured by field name and type overlap), they are candidates for consolidation. Review whether they serve genuinely different purposes before removing.
-- **Prompt template similarity** (→Area 2): If 2 prompt templates differ by <10% of tokens (measured by edit distance / max token count), they should be consolidated into a single parameterized template. The remaining differences should be expressed as template variables.
-- (Additional thresholds are accumulated through reviews)
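The two thresholds above can be sketched as small helper functions. This is a minimal illustration, not the package's implementation. The function names are hypothetical. The use of Jaccard overlap as the denominator for "share >80%" and whitespace tokenization for prompt templates are both assumptions, since the text does not specify either.

```python
def schema_overlap(a: dict[str, str], b: dict[str, str]) -> float:
    """Overlap of (field name, type) pairs. Jaccard is an assumed denominator."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0


def token_edit_distance(x: list[str], y: list[str]) -> int:
    """Levenshtein distance computed over tokens rather than characters."""
    prev = list(range(len(y) + 1))
    for i, tx in enumerate(x, start=1):
        curr = [i]
        for j, ty in enumerate(y, start=1):
            cost = 0 if tx == ty else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def template_difference(p: str, q: str) -> float:
    """Edit distance / max token count, with whitespace tokens (an assumption)."""
    tp, tq = p.split(), q.split()
    longest = max(len(tp), len(tq))
    return token_edit_distance(tp, tq) / longest if longest else 0.0


def is_tool_merge_candidate(a: dict[str, str], b: dict[str, str]) -> bool:
    return schema_overlap(a, b) > 0.80


def is_template_merge_candidate(p: str, q: str) -> bool:
    return template_difference(p, q) < 0.10
```

A real check would likely use the model's own tokenizer instead of `str.split`, which changes the ratio for templates with long identifiers or punctuation.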
----
-## Related Documents
-- `concepts.md` — Term definitions, synonym mappings, homonym list (semantic criteria for duplication determination)
-- `structure_spec.md` — Frontmatter specifications, file type classification (structural-perspective removal criteria)
-- `competency_qs.md` — Competency question list (criteria for judging "actual difference" in minimum granularity)
-- `dependency_rules.md` — Reference chains and direction rules (basis for allowing reference copies)
-- `domain_scope.md` — Domain scope, required concept categories (scope reference for duplication determination)

package/.onto/domains/llm-native-development/dependency_rules.md DELETED

@@ -1,216 +0,0 @@
----
-version: 2
-last_updated: "2026-03-30"
-source: manual
-status: established
----
-# LLM-Native Development Domain — Dependency Rules
-Classification axis: **linkage type** — dependencies and connections classified by the type of relationship between components, documents, and sub-areas within LLM-powered systems.
-## Inter-Area Dependency Map
-The 8 sub-areas defined in domain_scope.md interact through three distinct dependency flows: runtime data flow, design-time constraint flow, and feedback loops.
-### Runtime Data Flow
-Direction: data flows from left to right toward model output.
-```
-3 (Retrieval) → 2 (Prompt) → 1 (Model) → output
-                                   ↑
-4 (Agentic) ──── tool calls ──────┘
-```
-- Area 3 (Retrieval & Knowledge Systems) produces retrieved content that Area 2 (Prompt & Context Design) assembles into model input
-- Area 2 constructs the complete prompt and sends it to Area 1 (Model Integration) for inference
-- Area 4 (Agentic Systems) orchestrates multi-step workflows by issuing tool calls through Area 1
-### Design-Time Constraint Flow
-Direction: constraints propagate from the constraining area to the constrained area.
-```
-1 (Model capability) → constrains → 2 (Prompt design)
-1 (Model capability) → constrains → 3 (Retrieval chunk size)
-6 (Safety policy)    → constrains → 2 (Prompt design), 4 (Agent behavior)
-5 (Eval criteria)    → constrains → all areas
-```
-- Area 1 constraints (context window size, supported output formats, tool calling capabilities) determine what Area 2 can construct and what chunk sizes Area 3 must produce
-- Area 6 (Safety & Alignment) constraints (content policies, guardrail requirements) restrict what Area 2 may include in prompts and what actions Area 4 agents may take
-- Area 5 (Evaluation & Testing) criteria serve as acceptance conditions for all areas — each area's output must satisfy its relevant evaluation criteria
-### Feedback Loops
-Intentional cycles that exist between areas. Each must have an explicit termination condition.
-**Loop 1 — Evaluation ↔ Adaptation (→Area 5, →Area 8)**:
-- Area 5 evaluation results drive fine-tuning decisions in Area 8 (Data & Model Adaptation). The fine-tuned model is then re-evaluated by Area 5.
-- Termination: quality threshold met OR training budget exhausted.
-**Loop 2 — Drift Correction (→Area 7, →Area 2, →Area 1)**:
-- Area 7 (Production Operations) detects output quality drift → Area 2 adjusts prompts → Area 1 produces new output → Area 7 monitors the result.
-- Termination: drift metric returns to baseline.
-**Loop 3 — Production Re-evaluation (→Area 7, →Area 5)**:
-- Area 7 production metrics trigger re-evaluation by Area 5. This is one-directional (not a cycle): Area 7 informs Area 5, and Area 5 may then inform other areas independently.
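The termination condition of Loop 1 (quality threshold met OR training budget exhausted) can be sketched as a bounded loop. This is an illustrative sketch only. `evaluate` and `fine_tune` are hypothetical callables standing in for Area 5 and Area 8, and expressing the training budget as a maximum number of rounds is an assumption.

```python
from typing import Callable


def run_adaptation_loop(
    evaluate: Callable[[], float],
    fine_tune: Callable[[], None],
    quality_threshold: float,
    max_rounds: int,
) -> tuple[float, int]:
    """Alternate evaluation and fine-tuning until one termination condition holds."""
    rounds = 0
    score = evaluate()  # Area 5 evaluates the current model
    while score < quality_threshold and rounds < max_rounds:
        fine_tune()  # Area 8 adapts the model
        rounds += 1
        score = evaluate()  # Area 5 re-evaluates the adapted model
    return score, rounds
```

Because both conditions are checked before every round, the loop cannot run unbounded even if the quality threshold is never reached.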
-## Inter-Area Direction Rules
-Each rule specifies the direction of dependency or data flow between two areas.
-| Source | Target | Relationship | Direction |
-|--------|--------|-------------|-----------|
-| Area 3 (Retrieval) | Area 2 (Prompt) | Retrieval provides content to prompt construction | 3 feeds 2 |
-| Area 2 (Prompt) | Area 1 (Model) | Constructed prompt is sent to model for inference | 2 feeds 1 |
-| Area 4 (Agentic) | Area 1 (Model) | Agent issues tool calls through model | 4 orchestrates 1 |
-| Area 6 (Safety) | Area 2 (Prompt), Area 4 (Agentic) | Safety constrains prompt design and agent behavior | 6 constrains 2, 4 |
-| Area 5 (Evaluation) | All areas | Evaluation criteria are references, not hard dependencies. All areas may be evaluated | 5 evaluates all |
-| Area 8 (Adaptation) | Area 1 (Model) | Adapted model replaces or supplements base model | 8 produces, 1 consumes |
-| Area 7 (Operations) | Area 5 (Evaluation) | Operational metrics inform evaluation priorities | 7 informs 5 |
-**Reverse direction prohibition**: An area that is fed, constrained, or consumed must not impose structural requirements back on the feeding/constraining area. For example, Area 1 (Model) must not require Area 2 (Prompt) to use a specific prompt template — Area 1 exposes capabilities, and Area 2 decides how to use them.
-## Diamond Dependencies
-A diamond dependency occurs when two areas share a common upstream dependency, creating a potential for conflicting requirements.
-**Primary diamond: 4 ← {2, 3} ← 1**
-Area 4 (Agentic Systems) depends on both Area 2 (Prompt & Context Design) and Area 3 (Retrieval & Knowledge Systems). Both Area 2 and Area 3 are constrained by Area 1 (Model Integration) — specifically by context window size, supported output formats, and tool calling capabilities.
-Resolution:
-- Area 1 constraints propagate independently to Area 2 and Area 3. Each area adapts to model constraints within its own scope.
-- Conflict scenario: Area 2 and Area 3 could impose contradictory requirements on each other (e.g., Area 2 demands more token budget for instructions while Area 3 demands more for retrieved content).
-- Conflict prevention: The handoff point defined in domain_scope.md (Area 3 boundary ends at retrieval results returned; Area 2 boundary begins at assembling results into prompt) ensures that token budget allocation is Area 2's responsibility, not Area 3's. Area 3 returns ranked results, and Area 2 decides how many to include.
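The conflict-prevention rule above, where Area 3 returns ranked results and Area 2 decides how many to include, might look like the following. This is a sketch under stated assumptions: `RankedResult`, `select_results`, and the budget parameters are hypothetical names, and stopping at the first result that does not fit (so that the selection is always a prefix of the ranking) is one possible policy among several.

```python
from dataclasses import dataclass


@dataclass
class RankedResult:
    doc_id: str
    tokens: int  # token count of the retrieved chunk


def select_results(
    ranked: list[RankedResult],
    context_window: int,
    instruction_tokens: int,
    reserved_output_tokens: int,
) -> list[RankedResult]:
    """Area 2 owns the budget: keep results in rank order until it runs out."""
    budget = context_window - instruction_tokens - reserved_output_tokens
    chosen: list[RankedResult] = []
    for result in ranked:
        if result.tokens > budget:
            break  # assumed policy: never skip ahead of a higher-ranked result
        chosen.append(result)
        budget -= result.tokens
    return chosen
```

Area 3 never sees the budget in this arrangement, which matches the handoff point described above.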
-## Truth Source Hierarchy
-When multiple sources provide guidance on the same LLM system design question, defer to the higher-priority source. Document any conflict and its resolution.
-| Priority | Source | Authority Level |
-|----------|--------|----------------|
-| 1 | Model provider documentation (Anthropic, OpenAI, Google) | Normative for model behavior. Defines what the model can and cannot do, correct API usage, and known limitations. Overrides all other sources for model-specific questions |
-| 2 | Protocol specifications (MCP, A2A, OpenAPI for tool schemas) | Normative for integration interfaces. Defines correct interaction patterns between system components |
-| 3 | Framework documentation (LangChain, LlamaIndex, LangGraph) | Informative for implementation. Describes available abstractions and their intended usage. May deviate from model provider guidance — when conflict occurs, defer to level 1 |
-| 4 | Community best practices (Chip Huyen "AI Engineering", applied-llms.org, Anthropic/OpenAI guides) | Informative for patterns and trade-offs. Reflects practitioner experience. Follow unless domain-specific requirements dictate otherwise |
-| 5 | Internal conventions (project CLAUDE.md, spec.md, onto domain files) | Local authority. Must not contradict levels 1-2. May deviate from levels 3-4 with documented rationale |
-## External Dependency Management
-LLM-powered systems depend on external services and libraries that evolve independently. Each dependency type has specific management requirements.
-### Model API Dependencies (→Area 1)
-- Model version must be pinned (e.g., `claude-sonnet-4-20250514`, not `claude-sonnet-4-latest`) in production environments. Unpinned versions may change behavior without warning.
-- Breaking changes include: model deprecation, behavior changes between versions, pricing changes, rate limit changes, API format changes.
-- Each breaking change requires a migration plan: (1) identify affected components, (2) evaluate replacement model, (3) run Area 5 evaluation suite against new model, (4) deploy with canary rollout via Area 7.
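A pinning check for the rule above could be as simple as the following sketch. It assumes that a pinned identifier ends in an eight-digit date suffix, as in the `claude-sonnet-4-20250514` example. Other providers use different naming schemes, so the pattern and the function names are illustrative assumptions rather than a general rule.

```python
import re

# Assumed convention: pinned model identifiers end in -YYYYMMDD.
PINNED_SUFFIX = re.compile(r"-\d{8}$")


def is_pinned(model_id: str) -> bool:
    return bool(PINNED_SUFFIX.search(model_id))


def unpinned_models(config: dict[str, str]) -> list[str]:
    """Return the config keys whose model identifier is not pinned."""
    return sorted(key for key, model_id in config.items() if not is_pinned(model_id))
```

Running such a check in CI for production configuration is one way to catch an unpinned alias before deployment.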
-### MCP Server Dependencies (→Area 4)
-- Server availability must be checked at startup. The agent must define fallback behavior for each MCP server: degrade gracefully (skip tools provided by the unavailable server) or halt (if the server provides critical capabilities).
-- MCP server schema changes (tool parameter changes, new required fields) are breaking changes. Pin server versions or validate schemas at connection time.
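The startup behavior described above (skip tools from an unavailable non-critical server, halt when a critical server is down) can be sketched without committing to any particular MCP client library. Everything here is hypothetical: the `critical` flag, the `is_available` probe, and the exception name are assumptions made for illustration, not part of the MCP specification or any SDK.

```python
from typing import Callable


class CriticalServerUnavailable(RuntimeError):
    """Raised when a server marked critical cannot be reached at startup."""


def resolve_servers(
    servers: dict[str, dict],
    is_available: Callable[[str], bool],
) -> list[str]:
    """Return the names of usable servers, halting on a critical outage."""
    active: list[str] = []
    for name, spec in servers.items():
        if is_available(name):
            active.append(name)
        elif spec.get("critical", False):
            raise CriticalServerUnavailable(name)  # halt
        # Non-critical and unavailable: degrade gracefully by skipping its tools.
    return active
```

In practice `is_available` would wrap whatever connection or health probe the chosen client exposes, and schema validation at connection time would sit alongside it.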
-### Embedding Model Dependencies (→Area 3)
-- Changing the embedding model requires re-indexing all stored vectors. Embeddings from different models are incompatible — cosine similarity between vectors from different models is meaningless.
-- Migration path: (1) generate new embeddings with new model, (2) run parallel retrieval evaluation (Area 5) comparing old and new, (3) switch over atomically, (4) delete old embeddings after validation period.
-### Framework Dependencies (→Area 4, →Area 3)
-- LangChain/LlamaIndex/LangGraph version upgrades may change default behavior (e.g., default prompt templates, default chunking strategies, default retrieval parameters).
-- Pin framework versions in production. Upgrade only with explicit testing against Area 5 evaluation suite.
-- When a framework abstracts away model-specific behavior, verify that the abstraction matches model provider documentation (Truth Source Hierarchy level 1 takes precedence over level 3).
-## Metric Ownership Rules
-Each metric in the system has exactly one owning area. Ownership is determined by the measurement purpose, not by the observable being measured.
-| Measurement Purpose | Owning Area | Example Metric |
-|---|---|---|
-| Is the output correct? | Area 5 (Evaluation) | Answer accuracy, hallucination rate, factual grounding score |
-| Is the output safe? | Area 6 (Safety) | Prompt injection detection rate, PII leak rate, policy violation rate |
-| Is the system running efficiently? | Area 7 (Operations) | Response latency (p50/p95/p99), error rate, cost per request, throughput |
-| Is the model training improving? | Area 8 (Adaptation) | Training loss, validation accuracy, fine-tuning convergence rate |
-**Shared observable rule**: When the same observable (e.g., response latency) is measured for different purposes, each purpose's metric is owned by the respective area. For example:
-- Response latency measured for SLA compliance → Area 7 (operational concern)
-- Response latency measured as part of model comparison evaluation → Area 5 (evaluation concern)
-These are distinct metrics with distinct owners, even though they measure the same underlying quantity.
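One way to enforce "exactly one owning area per metric" while still allowing a shared observable is to key ownership by the pair (observable, purpose). The registry below is a hypothetical sketch, and the class and method names are assumptions.

```python
class MetricRegistry:
    """Maps (observable, purpose) to a single owning area."""

    def __init__(self) -> None:
        self._owners: dict[tuple[str, str], str] = {}

    def register(self, observable: str, purpose: str, owner: str) -> None:
        key = (observable, purpose)
        existing = self._owners.get(key)
        if existing is not None and existing != owner:
            raise ValueError(f"{key} is already owned by {existing}")
        self._owners[key] = owner

    def owner_of(self, observable: str, purpose: str) -> str:
        return self._owners[(observable, purpose)]
```

With this keying, response latency for SLA compliance and response latency for model comparison are separate entries with separate owners, as the shared observable rule requires.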
-## Document Reference Direction Rules
-### Inter-Layer Direction
-- System Map (ARCHITECTURE.md) → Directory Index (INDEX.md) → Individual Documents: Allowed (upper → lower)
-- Individual Documents → System Map: Prohibited (lower → upper reversal). Individual documents may reference only same-level or lower-level documents
-### Inter-Type Direction
-- Domain Documents → Process Documents: Prohibited. Domain must be independent of process
-- Process Documents → Domain Documents: Allowed. Processes reference domain knowledge
-- Role Documents → Domain Documents: Allowed. Roles read domain documents as judgment criteria
-### Same-Type Document Direction
-- References between domain documents of the same type: Allowed (e.g., concepts.md ↔ structure_spec.md)
-- Two-node bidirectional references (A↔B) are allowed, but cycles of three or more nodes are prohibited (see Cycle Exception Clause below)
-- List every such reference in the Related Documents section
-### Inter-Domain Direction
-- Child (specialized) domain → Parent (general) domain: Allowed (e.g., llm-native-development → software-engineering)
-- Parent (general) domain → Child (specialized) domain: Prohibited
-- Same-level domains: Bidirectional allowed, but cycles are prohibited
-## Acyclicity
-- Document reference graph: Acyclic (DAG). Circular references can cause an LLM to traverse documents without terminating
-- Inter-area dependency graph: Acyclic except for declared feedback loops (see Feedback Loops above). Each declared loop must specify a termination condition
-### Cycle Exception Clause
-- Bidirectional references (A↔B) arising from the bidirectional relationship requirement in logic_rules.md are cycle exceptions
-- When cycles are unavoidable: Declare bidirectional references in frontmatter and record the reference reason
-- Cycle exceptions allow only 2-node bidirectional (A↔B). 3-node or more cycles (A→B→C→A) are prohibited
-- Feedback loops between areas (declared in Inter-Area Dependency Map) are also cycle exceptions, but must include termination conditions
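The exception clause above permits 2-node bidirectional references but prohibits cycles of three or more nodes. A checker for the document reference graph could be sketched as follows. The function name and the adjacency-set representation are assumptions, and the exhaustive search over simple paths is only suitable for small graphs such as a documentation directory.

```python
from typing import Optional


def find_long_cycle(graph: dict[str, set[str]]) -> Optional[list[str]]:
    """Return a cycle through three or more distinct nodes, or None if there is none."""

    def dfs(start: str, node: str, path: list[str], on_path: set[str]) -> Optional[list[str]]:
        for nxt in sorted(graph.get(node, ())):
            if nxt == start and len(path) >= 3:
                return path + [start]  # closed a cycle of length >= 3
            if nxt not in on_path:
                found = dfs(start, nxt, path + [nxt], on_path | {nxt})
                if found:
                    return found
        return None

    for start in sorted(graph):
        found = dfs(start, start, [start], {start})
        if found:
            return found
    return None
```

A pair such as `concepts.md` ↔ `structure_spec.md` passes this check, while A→B→C→A is reported. Declared feedback loops between areas would need a separate allowlist, which this sketch does not model.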
-## Referential Integrity
-- File paths listed in frontmatter's `depends_on`, `related_to`, etc. must actually exist
-- When a file is moved/deleted, all frontmatter referencing that file must be updated
-- Files registered in INDEX.md must actually exist in that directory
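The existence checks above can be automated once the referenced paths have been extracted. The sketch below assumes that frontmatter parsing has already produced a mapping from each source file to the paths it lists in `depends_on`, `related_to`, or INDEX.md. The function name and that input shape are hypothetical.

```python
from pathlib import Path


def missing_references(root: Path, references: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each source file to the referenced paths that do not exist under root."""
    missing: dict[str, list[str]] = {}
    for source, targets in references.items():
        absent = [target for target in targets if not (root / target).exists()]
        if absent:
            missing[source] = absent
    return missing
```

An empty result means every listed path resolves. Running the check after a file is moved or deleted surfaces the frontmatter entries that still need updating.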
-### Inheritance Referential Integrity
-- The parent domain path referenced in concepts.md's inheritance declaration must actually exist
-- Terms in the redefinition list must actually exist in the parent domain's concepts.md
-- When parent domain terms are deleted/moved, child domain inheritance declarations must also be updated
-### Source of Truth for Redefinitions
-- Terms in the redefinition list: The child domain (llm-native-development) is the source of truth
-- Terms not in the redefinition list: The parent domain (software-engineering) is the source of truth
-## Duplication Prevention Rules
-- If the same content exists in 2 or more files → Designate one as the source of truth and replace others with references
-- If similar concepts are defined under different names in separate files → Consolidate or explicitly state the differences
-- If INDEX.md descriptions and the original file's frontmatter description conflict → Frontmatter takes precedence (INDEX.md is the source of truth for file existence and location; frontmatter is the source of truth for file attributes)
-## Change Atomicity
-- When a single concept change spans multiple files, all changes are reflected in a single commit
-- The unit of change is "one concept addition/modification/deletion"
-- When an inter-area dependency changes (e.g., Area 1 model capability changes that affect Area 2 prompt design and Area 3 chunk size), all affected areas' documentation must be updated atomically
-## Related Documents
-- domain_scope.md — Sub-area definitions, membership criteria, and handoff points referenced by inter-area rules
-- logic_rules.md — Bidirectional relationship requirement (linked to this file's cycle exception clause)
-- structure_spec.md — Frontmatter specifications, INDEX.md role definition
-- concepts.md — Domain inheritance declaration, term definitions
-- competency_qs.md — Dependency verification questions