compound-agent 1.7.6 → 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +45 -1
- package/README.md +70 -47
- package/bin/ca +32 -0
- package/package.json +19 -78
- package/scripts/postinstall.cjs +221 -0
- package/dist/cli.d.ts +0 -1
- package/dist/cli.js +0 -13158
- package/dist/cli.js.map +0 -1
- package/dist/index.d.ts +0 -3730
- package/dist/index.js +0 -3240
- package/dist/index.js.map +0 -1
- package/docs/research/AgenticAiCodebaseGuide.md +0 -1206
- package/docs/research/BuildingACCompilerAnthropic.md +0 -116
- package/docs/research/HarnessEngineeringOpenAi.md +0 -220
- package/docs/research/code-review/systematic-review-methodology.md +0 -409
- package/docs/research/index.md +0 -76
- package/docs/research/learning-systems/knowledge-compounding-for-agents.md +0 -695
- package/docs/research/property-testing/property-based-testing-and-invariants.md +0 -742
- package/docs/research/scenario-testing/advanced-and-emerging.md +0 -470
- package/docs/research/scenario-testing/core-foundations.md +0 -507
- package/docs/research/scenario-testing/domain-specific-and-human-factors.md +0 -474
- package/docs/research/security/auth-patterns.md +0 -138
- package/docs/research/security/data-exposure.md +0 -185
- package/docs/research/security/dependency-security.md +0 -91
- package/docs/research/security/injection-patterns.md +0 -249
- package/docs/research/security/overview.md +0 -81
- package/docs/research/security/secrets-checklist.md +0 -92
- package/docs/research/security/secure-coding-failure.md +0 -297
- package/docs/research/software_architecture/01-science-of-decomposition.md +0 -615
- package/docs/research/software_architecture/02-architecture-under-uncertainty.md +0 -649
- package/docs/research/software_architecture/03-emergent-behavior-in-composed-systems.md +0 -644
- package/docs/research/spec_design/decision_theory_specifications_and_multi_criteria_tradeoffs.md +0 -0
- package/docs/research/spec_design/design_by_contract.md +0 -251
- package/docs/research/spec_design/domain_driven_design_strategic_modeling.md +0 -183
- package/docs/research/spec_design/formal_specification_methods.md +0 -161
- package/docs/research/spec_design/logic_and_proof_theory_under_the_curry_howard_correspondence.md +0 -250
- package/docs/research/spec_design/natural_language_formal_semantics_abuguity_in_specifications.md +0 -259
- package/docs/research/spec_design/requirements_engineering.md +0 -234
- package/docs/research/spec_design/systems_engineering_specifications_emergent_behavior_interface_contracts.md +0 -149
- package/docs/research/spec_design/what_is_this_about.md +0 -305
- package/docs/research/tdd/test-driven-development-methodology.md +0 -547
- package/docs/research/test-optimization-strategies.md +0 -401
- package/scripts/postinstall.mjs +0 -102

@@ -1,474 +0,0 @@

# Scenario Testing for CS Development: Domain-Specific Applications & Human Factors

*2026-03-07*

---

## Abstract

This survey examines how scenario testing adapts when deployed within five high-stakes domains: safety-critical and embedded systems, distributed systems and microservices, security and adversarial contexts, real-time and performance engineering, and human-centered design. Where Paper 1 of this series established the combinatorial, model-based, and behavioral foundations of scenario testing, and Paper 2 addressed mutation testing, formal verification, AI-assisted generation, and chaos engineering, this paper advances a unifying thesis: the same foundational scenario primitives -- precondition, trigger, expected outcome -- are universally preserved across domains, but the surrounding constraints of regulation, failure consequence, adversarial intent, temporal precision, and cognitive process transform both the generation and evaluation of scenarios so fundamentally that domain-naive approaches fail systematically. Each domain imposes a distinct epistemological contract on what it means to have tested "enough."

The analysis proceeds by characterizing each domain's regulatory pressure, failure-consequence topology, and scenario-complexity drivers before examining specific tools, standards, and empirical evidence. Safety-critical domains demand exhaustive traceability and structural coverage proofs; distributed systems require contractual inter-service verification and resilience to partial failure; security demands adversarial imagination beyond what specification-driven processes naturally produce; real-time systems require statistical confidence in timing behavior that functional testing cannot provide; and human factors research reveals that every scenario set reflects the biases, blind spots, and social dynamics of the team that authored it. A cross-domain synthesis table and an enumeration of open problems conclude the survey.

The paper draws on standards bodies (RTCA, ISO, IEC, OWASP, MITRE), canonical academic literature in safety engineering, security engineering, distributed systems, performance modeling, and cognitive science, as well as industrial tool ecosystems. It is positioned as a practitioner-accessible but academically rigorous complement to Papers 1 and 2, grounding abstract testing theory in the irreducible messiness of real deployment domains.

---

## 1. Introduction

### 1.1 The Insufficiency of Generic Scenario Testing

A scenario, in its most abstract form, is a description of a system traversal: a starting state, a sequence of stimuli, and a set of assertions about the resulting state or behavior. This abstraction is powerful precisely because it is domain-agnostic -- the same grammatical structure supports a login workflow, a fuel injection timing sequence, an HTTP contract between two microservices, and an SQL injection attack. Papers 1 and 2 in this series examined the structural and algorithmic machinery for generating, selecting, and evaluating such scenarios.

What those papers could not fully address is the question of *adequacy under constraint*. The aerospace engineer asking whether flight management software has been tested well enough faces a legally binding answer encoded in RTCA DO-178C, where "well enough" means, among many other things, that every condition within every decision in the source code has been shown to independently affect that decision's outcome -- a property called Modified Condition/Decision Coverage (MC/DC) -- that every requirement has a traceable test case, and that the Designated Engineering Representative (DER) can sign off on an evidence package. The security researcher asking the same question faces a fundamentally different epistemological problem: the adversary is not a random perturbation but an intelligent, adaptive agent whose scenario space is unbounded. The performance engineer faces yet a third regime: the system may pass all functional scenarios while still failing under load.

These are not superficial variations. They represent categorically different theories of what testing is for, what counts as coverage, and what constitutes evidence of quality.

### 1.2 Scope of This Survey

This survey covers five domains:

1. **Safety-Critical and Embedded Systems** -- DO-178C, ISO 26262, IEC 62304, IEC 61508; requirements-based testing; MC/DC coverage; HIL and SIL platforms.
2. **Distributed Systems and Microservices** -- Consumer-driven contract testing; service virtualization; saga testing; eventual consistency and partition scenarios.
3. **Security and Adversarial Scenarios** -- Threat modeling (STRIDE, PASTA); abuse cases; OWASP testing scenarios; fuzzing; MITRE ATT&CK; red team exercises.
4. **Real-Time and Performance Scenarios** -- Load profile design; SLA-driven scenarios; WCET analysis; scheduling theory.
5. **Human Factors in Scenario Design** -- Cognitive biases; collaborative discovery workshops; domain expert roles; scenario-based design.

### 1.3 Key Definitions

**Safety Integrity Level (SIL):** A discrete level (1-4 in IEC 61508) representing the required risk reduction. Analogous concepts: DAL (DO-178C), ASIL (ISO 26262).

**Contract Testing:** Codifying service interactions as machine-verifiable contracts, allowing services to be tested independently.

**Threat Model:** A structured representation of potential adversarial actions, including assets, threat actors, attack vectors, and mitigations.

**Load Profile:** A parameterized description of the demand pattern for performance testing.

**Cognitive Bias:** A systematic deviation from rational judgment that causes coverage blind spots in scenario authoring.

### 1.4 The Domain Adaptation Thesis

The scenario primitive is universal, but the *generation oracle* (how we decide which scenarios to write), the *adequacy oracle* (how we decide when we have enough), and the *evaluation oracle* (how we judge whether a scenario passed) are all domain-specific constructs that must be engineered with the same rigor as the system under test itself.

---

## 2. Foundations

### 2.1 Domain Engineering

Czarnecki and Eisenecker (2000) established the distinction between problem space and solution space. Every domain encodes its own problem-space ontology: "flight phase" is first-class in aviation testing; "saga" is central to microservice testing. Kang et al.'s FODA method (1990) showed that domain variability captured in feature models enables feature-model-driven test selection.

### 2.2 Safety Engineering

Leveson's *Engineering a Safer World* (2011) reframes safety as the enforcement of safety constraints on system behavior via STAMP (System-Theoretic Accident Model and Processes). STPA (System-Theoretic Process Analysis) generates safety constraints from a control-theoretic model and derives test scenarios as constraint-exercise requirements. Heimdahl and Leveson (1996) established formal criteria for requirements completeness at mode transitions.

### 2.3 Security Engineering

Anderson's *Security Engineering* (2008; 3rd ed. 2020) provides the attack tree formalism: adversarial goals decomposed recursively into executable attack scenarios. McGraw's *Software Security* (2006) introduced "evil user stories." Sindre and Opdahl (2005) formalized misuse cases.

### 2.4 Human Factors

Reason's Swiss Cheese Model (1990) explains multi-layered defense -- each scenario is a layer, and gaps are the holes. Norman's *Design of Everyday Things* (1988; rev. 2013) provides the Gulf of Evaluation concept. Carroll's *Making Use* (2000) established scenario-based design. Kahneman's *Thinking, Fast and Slow* (2011) provides the cognitive bias framework.

---

## 3. Taxonomy of Domains

| Dimension | Safety-Critical | Distributed Systems | Security | Real-Time/Performance | Human Factors |
|---|---|---|---|---|---|
| **Regulatory pressure** | Extreme (legally binding) | Low (de facto) | Moderate (GDPR, PCI-DSS) | Low-Moderate (SLA) | Negligible |
| **Failure consequence** | Catastrophic (loss of life) | High (data loss, unavailability) | High-Extreme (breach, fraud) | Moderate-High (SLA breach) | Indirect (shipped defects) |
| **Complexity driver** | Mode interaction, rare events | Topology, partitions, consistency | Adversarial creativity | Concurrency, load distribution | Cognitive biases, tacit knowledge |
| **Test cycle duration** | Months to years | Hours to days | Days to weeks | Hours to days | Hours to weeks |
| **Tool maturity** | Very high (decades) | High-Medium (~10 years) | High (exploitation); medium (scenario libraries) | High (JMeter 20+ years) | Low (artisanal facilitation) |
| **Adequacy oracle** | Traceability + MC/DC | Contract compliance + saga invariants | ATT&CK coverage + vuln discovery rate | SLA thresholds + percentile targets | Team consensus + heuristics |

---

## 4. Analysis

### 4.1 Safety-Critical and Embedded Systems

#### 4.1.1 Theory and Mechanism

**Requirements-based testing**: Every scenario must trace to a requirement; every requirement must be covered. The bidirectional traceability matrix is a primary deliverable.

**Structural coverage** by Development Assurance Level (DO-178C Table A-7):

- **DAL A** (catastrophic): MC/DC coverage required
- **DAL B** (hazardous): Decision coverage required
- **DAL C** (major): Statement coverage required
- **DAL D** (minor): No structural coverage required

**MC/DC** (Chilenski & Miller, 1994): Requires each condition to independently affect the decision outcome. For N conditions, N+1 test cases typically suffice (vs. 2^N for exhaustive Boolean coverage).
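
To make the independence requirement concrete, here is a minimal Python sketch. The decision `(a and b) or c` and the test vectors are illustrative examples, not drawn from any certification standard or tool:

```python
# A minimal MC/DC check for an illustrative three-condition decision.
# The decision and test vectors are examples, not from DO-178C.

def decision(a, b, c):
    return (a and b) or c

def satisfies_mcdc(fn, n, suite):
    """True if, for every condition i, the suite contains a pair of
    tests differing only in condition i whose decision outcomes
    differ (i.e., each condition independently affects the decision)."""
    covered = set()
    for v in suite:
        for w in suite:
            diff = [i for i in range(n) if v[i] != w[i]]
            if len(diff) == 1 and fn(*v) != fn(*w):
                covered.add(diff[0])
    return covered == set(range(n))

# N + 1 = 4 tests for N = 3 conditions:
suite = [(True, True, False),   # decision True
         (False, True, False),  # flips a -> independence pair for a
         (True, False, False),  # flips b -> independence pair for b
         (True, False, True)]   # flips c -> independence pair for c
print(satisfies_mcdc(decision, 3, suite))  # True
```

Dropping any one of the four vectors leaves some condition without an independence pair, which is why N+1 is typically the minimum.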

**Tool qualification**: Any tool used to produce certification evidence must be qualified under DO-330, adding recursive verification obligations.

**HIL and SIL**: Software-in-the-loop (SIL) replaces target hardware with simulation (faster, higher volume, but no hardware-specific timing). Hardware-in-the-loop (HIL) retains actual hardware with a simulated environment (real-time, detects hardware interaction defects, expensive).

#### 4.1.2 Literature Evidence

DO-178C (RTCA, 2011) with supplements DO-330 (tool qualification), DO-331 (model-based development), and DO-333 (formal methods). ISO 26262 Part 6 Clause 9 mandates methods by ASIL. IEC 62304 mandates safety class assignment (A/B/C) with manufacturer-justified testing methods.

Leveson (2011, Chapter 14) argues the dominant failure mode is incomplete hazard identification -- scenarios written for imagined behaviors, not actual accident-causing ones. STPA addresses this systematically.

Hatton (1997) demonstrated that static analysis and testing in combination were both required for safety-critical C code.

#### 4.1.3 Implementations and Benchmarks

- **LDRA TBrun/TBvision**: Requirements traceability + MC/DC coverage. DO-178C/ISO 26262 certified.
- **VectorCAST**: C/C++/Ada unit and integration testing. DO-330 qualified. CANoe/CANalyzer integration for automotive HIL.
- **Parasoft C/C++test**: Policy enforcement + coverage with continuous monitoring.
- **Simulink Test**: Model-level scenario specification in terms of physical signals (DO-331).
- **dSPACE SCALEXIO/VEOS**: Industry-standard HIL/SIL platforms for automotive/aerospace.
- **NI VeriStand**: Aerospace HIL with LabVIEW-based test orchestration.

Kanstren (2012) found the primary tool differentiator was the quality of traceability coverage, not execution capability.

#### 4.1.4 Strengths and Limitations

**Strengths**: Dual adequacy oracle (requirements + structural coverage). Auditable evidence. Regulatory certainty.

**Limitations**: Cost (~$1,000/LOC for DAL A per Rierson, 2013). MC/DC plus requirements coverage cannot guarantee all failure modes are considered -- system-level interactions remain a gap. STPA addresses this but increases burden and is not yet mandated.

---

### 4.2 Distributed Systems and Microservices

#### 4.2.1 Theory and Mechanism

**Consumer-Driven Contract Testing** (Clemson, 2014): The consumer defines the expected interaction; the provider verifies against the consumer's expectation. This transforms O(n^2) integration testing into O(n) verification. It assumes contract completeness and currency; "contract drift" is the primary failure mode.
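
A minimal sketch of the idea in plain Python. The contract format and `verify` helper are invented for illustration; real tools such as Pact define their own contract formats and APIs:

```python
# Consumer-driven contract, sketched with invented structures
# (not the Pact API). The consumer publishes only the interaction
# it actually depends on; the provider replays it in its own build.

contract = {
    "request": {"method": "GET", "path": "/orders/42"},
    "response": {"status": 200, "body_keys": ["id", "state"]},
}

def provider_handler(method, path):
    # Stand-in for the real provider endpoint under verification.
    if method == "GET" and path.startswith("/orders/"):
        return 200, {"id": 42, "state": "shipped", "total": 99.5}
    return 404, {}

def verify(contract, handler):
    """Replay the consumer's expectation against the provider.
    Extra response fields are allowed; missing ones fail."""
    req, exp = contract["request"], contract["response"]
    status, body = handler(req["method"], req["path"])
    if status != exp["status"]:
        return False
    return all(k in body for k in exp["body_keys"])

print(verify(contract, provider_handler))  # True
```

Because each consumer verifies only the fields it reads, a provider can evolve its response freely as long as every published contract still passes -- the O(n) property the text describes.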

**Service Virtualization**: Simulation services mimic dependency behavior, including stateful behavior, latency, and fault conditions.

**Saga Testing** (Richardson, 2018): For each saga step, generate a failure scenario and assert correct compensating behavior. Multi-party transactional scenarios require coordination across services.
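
The per-step failure-injection pattern can be sketched as follows. The saga runner, step names, and compensation labels are hypothetical, standing in for real cross-service calls:

```python
# Hypothetical saga runner illustrating the pattern: for each step,
# one generated scenario injects a failure there and asserts that the
# compensations of all previously completed steps run in reverse order.

def run_saga(steps, fail_at=None):
    """steps: list of (step_name, compensating_action). Returns the
    compensating actions executed after an injected failure."""
    completed, compensations = [], []
    for i, (name, comp) in enumerate(steps):
        if i == fail_at:
            # Failure injected here: undo completed steps, newest first.
            for done_name, done_comp in reversed(completed):
                compensations.append(done_comp)
            return compensations
        completed.append((name, comp))
    return compensations  # saga succeeded, nothing to undo

steps = [("reserve-credit", "release-credit"),
         ("create-order", "cancel-order"),
         ("ship", "recall-shipment")]

# One generated failure scenario per saga step:
for i in range(len(steps)):
    undo = run_saga(steps, fail_at=i)
    assert undo == [c for _, c in reversed(steps[:i])]
print(run_saga(steps, fail_at=2))  # ['cancel-order', 'release-credit']
```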

**Eventual Consistency Scenarios**: Require temporal assertions: "after a write to A, B returns the updated value within T ms with probability P."
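
One way to encode the "within T" half of such an assertion is a polling helper; the probability P comes from repeating the scenario. The `LaggingReplica` stand-in and its timing parameters are invented for this sketch:

```python
import time

def eventually(read, expected, timeout_s=2.0, interval_s=0.05):
    """Poll read() until it returns expected or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read() == expected:
            return True
        time.sleep(interval_s)
    return read() == expected

class LaggingReplica:
    """Toy replica that applies a write after a fixed delay -- a
    stand-in (assumption of this sketch) for real replication lag."""
    def __init__(self, lag_s):
        self.lag_s, self.value = lag_s, None
        self.pending, self.applied_at = None, 0.0
    def write(self, value):
        self.pending = value
        self.applied_at = time.monotonic() + self.lag_s
    def read(self):
        if self.pending is not None and time.monotonic() >= self.applied_at:
            self.value, self.pending = self.pending, None
        return self.value

replica = LaggingReplica(lag_s=0.2)
replica.write("v2")                    # write to A...
print(eventually(replica.read, "v2"))  # ...B converges within T: True
```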

**Network Partition Testing**: Tests CAP theorem behavior -- the system either refuses writes (CP) or accepts and reconciles them (AP).

#### 4.2.2 Literature Evidence

Newman's *Building Microservices* (2015; 2nd ed. 2019): Testing pyramid with a contract layer, and the observation that E2E flakiness grows with the square of the services involved.

Richardson's *Microservices Patterns* (2018): A saga failure taxonomy that maps to a scenario generation grammar.

Zampetti et al. (2020): 78% of 138 microservices projects had no inter-service contract tests. Projects with contracts had significantly lower integration defect rates.

#### 4.2.3 Implementations and Benchmarks

- **Pact**: Reference consumer-driven contract testing. Multi-language, with the Pact Broker as a shared contract registry.
- **Spring Cloud Contract**: Provider-driven alternative, Spring-native.
- **WireMock**: HTTP service virtualization with stateful scenarios.
- **Hoverfly**: Transparent proxy + simulation with interaction journals.
- **Testcontainers**: On-demand Docker containers for isolated multi-service scenarios.

#### 4.2.4 Strengths and Limitations

**Strengths**: Independent team delivery. Decoupled deployment.

**Limitations**: Contract drift. E2E flakiness. Observability gap for failure diagnosis.

---

### 4.3 Security and Adversarial Scenarios

#### 4.3.1 Theory and Mechanism

**STRIDE** (Shostack, 2014): Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege. Each category maps to scenario patterns.

**PASTA** (UcedaVelez & Morana, 2015): Seven-stage risk-centric threat modeling with an explicit attack simulation stage.

**Attack Trees** (Schneier, 1999; Mauw & Oostdijk, 2005): Hierarchical scenario decomposition. Each root-to-leaf path is a complete scenario.
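
Enumerating concrete scenarios from a small OR/AND attack tree can be sketched directly; the tree content below is illustrative, not a real threat model:

```python
# Small OR/AND attack tree. Enumerating it yields concrete attack
# scenarios: each scenario is the set of leaf actions an attacker
# must perform. Tree content is illustrative.

tree = ("OR", "obtain admin access", [
    ("AND", "steal credentials", [
        ("LEAF", "phish password", []),
        ("LEAF", "bypass 2FA", []),
    ]),
    ("LEAF", "exploit unpatched RCE", []),
])

def scenarios(node):
    kind, label, children = node
    if kind == "LEAF":
        return [[label]]
    if kind == "OR":  # any one child's scenarios suffice
        out = []
        for child in children:
            out.extend(scenarios(child))
        return out
    # AND: combine one scenario from every child
    combos = [[]]
    for child in children:
        combos = [base + extra for base in combos
                  for extra in scenarios(child)]
    return combos

for s in scenarios(tree):
    print(s)
# ['phish password', 'bypass 2FA']
# ['exploit unpatched RCE']
```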

**Misuse Cases** (Sindre & Opdahl, 2005): The "evil twin" of use cases, with "threatens" and "mitigates" relationships.

**Security Fuzzing**: AFL (2013) popularized coverage-guided fuzzing. AFL++ (Fioraldi et al., 2020) extended it with improved strategies. LibFuzzer provides in-process fuzzing with sanitizer integration.

**MITRE ATT&CK** (Strom et al., 2018): A comprehensive TTP library drawn from observed real-world attacks. Coverage metric: percentage of applicable techniques tested.

#### 4.3.2 Literature Evidence

OWASP Testing Guide v4.2: 90+ specific test scenarios for web applications.

Bau et al. (2010): Web application scanner false negative rates of 44-97%, with tools most effective for syntactically simple vulnerabilities and least effective for semantic/business logic flaws.

Böhme et al. (2017): Fuzzing discovers more bugs per CPU-hour for large input spaces; symbolic execution is better for small, complex spaces.

#### 4.3.3 Implementations and Benchmarks

- **OWASP ZAP**: Open-source web security scanner with configurable scan policies.
- **Burp Suite**: Industry-standard commercial web security testing.
- **AFL++/libFuzzer**: Coverage-guided fuzzing. Google's OSS-Fuzz continuously fuzzes hundreds of projects.
- **Metasploit**: Exploit framework operationalizing ATT&CK-aligned attack scenarios.
- **MITRE ATT&CK Navigator**: Coverage visualization and gap identification tool.

#### 4.3.4 Strengths and Limitations

**Strengths**: ATT&CK scenarios are grounded in actual attacker behavior, with continuous framework updates from new threat intelligence.

**Limitations**: The unbounded adversary problem -- no finite library covers all attacks, and zero-days are by definition absent. Automated scanners have high false negative rates for semantic vulnerabilities.

---

### 4.4 Real-Time and Performance Scenarios

#### 4.4.1 Theory and Mechanism

**Load Profile Design** (Jain, 1991): An arrival rate model (constant, ramp, spike, step), a think time distribution (commonly negative exponential), a request mix (derived from production logs), and data characteristics.
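
A ramp arrival model can be sketched with a thinned non-homogeneous Poisson process, a standard construction; the rates and duration below are illustrative:

```python
import random

def ramp_arrivals(duration_s, start_rps, end_rps, seed=1):
    """Generate request arrival timestamps for a linear ramp from
    start_rps to end_rps, via a thinned non-homogeneous Poisson
    process: draw candidates at the peak rate, then accept each
    with probability rate(t) / rate_max."""
    rng = random.Random(seed)
    rate_max = max(start_rps, end_rps)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_max)  # next candidate at peak rate
        if t >= duration_s:
            break
        rate_t = start_rps + (end_rps - start_rps) * (t / duration_s)
        if rng.random() < rate_t / rate_max:  # thin to the ramp rate
            arrivals.append(t)
    return arrivals

arrivals = ramp_arrivals(duration_s=60, start_rps=1, end_rps=20)
# Under a ramp, the second half should be denser than the first:
first_half = sum(1 for t in arrivals if t < 30)
print(first_half < len(arrivals) - first_half)  # True
```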

**SLA-Driven Design**: "95th percentile < 200 ms at 1,000 concurrent users" defines both the pass/fail criterion and the load parameters. Concurrent users are estimated via Little's Law: N = lambda * W.
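
Applying Little's Law to the SLA example above (the derived arrival rate is a worked illustration, not a figure from the text):

```python
# Little's Law (N = lambda * W) sizes the load generator for the SLA
# example: holding ~1,000 users concurrently in the system with a
# 200 ms mean residence time requires ~5,000 requests/s.

def littles_law_concurrency(arrival_rate_rps, residence_s):
    """N = lambda * W."""
    return arrival_rate_rps * residence_s

target_users = 1000
residence_s = 0.200                       # 200 ms mean residence time
required_rps = target_users / residence_s
print(round(required_rps))                # 5000
print(round(littles_law_concurrency(required_rps, residence_s)))  # 1000
```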

**WCET Analysis** (Wilhelm et al., 2008): Static analysis plus processor architecture modeling yields provable upper bounds on execution time. There is a tension between precision (tight bounds) and soundness (guaranteed bounds).

**Scheduling Theory** (Liu & Layland, 1973): Rate Monotonic scheduling is optimal among fixed-priority policies; its utilization bound is sum(Ci/Ti) <= n(2^(1/n) - 1).
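
The utilization test applies directly to a candidate task set (the task parameters below are illustrative; note the bound is sufficient, not necessary):

```python
def rm_schedulable(tasks):
    """Liu & Layland sufficient schedulability test for Rate
    Monotonic: tasks is a list of (C_i, T_i) pairs, worst-case
    execution time C and period T."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization, bound, utilization <= bound

# Three illustrative periodic tasks:
u, bound, ok = rm_schedulable([(1, 8), (2, 10), (3, 24)])
print(round(u, 3), round(bound, 3), ok)  # 0.45 0.78 True
```

A task set that fails this test may still be schedulable; exact response-time analysis is needed to decide.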
|
|
222
|
-
|
|
223
|
-
#### 4.4.2 Literature Evidence
|
|
224
|
-
|
|
225
|
-
Molyneaux (2014): Taxonomy of performance test types (load, stress, soak, spike, volume).
|
|
226
|
-
|
|
227
|
-
Iosup et al. (2011): Up to 40% difference in reported response times between tools on identical scenarios, due to client-side overhead differences.
|
|
228
|
-
|
|
229
|
-
Tene (2015): "How NOT to Measure Latency" -- conventional tools systematically under-report high-percentile latency via coordinated omission.
|
|
230
|
-
|
|
231
|
-
#### 4.4.3 Implementations and Benchmarks
|
|
232
|
-
|
|
233
|
-
- **Apache JMeter**: Most deployed. 20+ years. Broad protocol support. XML test plans difficult to version-control.
|
|
234
|
-
- **Gatling**: Scala DSL. Version-controllable. Akka-based high concurrency.
|
|
235
|
-
- **k6**: JavaScript scenarios. Thresholds, scenarios API, browser testing. Grafana Labs.
|
|
236
|
-
- **Locust**: Python-based. Maximum expressive flexibility.
|
|
237
|
-
- **aiT WCET Analyzer** (AbsInt): Commercial, DO-178C/ISO 26262 qualified.
|
|
238
|
-
|
|
239
|
-
#### 4.4.4 Strengths and Limitations
|
|
240
|
-
|
|
241
|
-
**Strengths**: Direct connection to business outcomes (SLA failure = customer impact).
|
|
242
|
-
|
|
243
|
-
**Limitations**: Production representativeness decays as traffic patterns shift. "Black swan" load events not in test suite. Statistical validity frequently violated -- p50 reported without p95/p99.
|
|
244
|
-
|
|
245
|
-
---
|
|
246
|
-
|
|
247
|
-
### 4.5 Human Factors in Scenario Design
|
|
248
|
-
|
|
249
|
-
#### 4.5.1 Theory and Mechanism
|
|
250
|
-
|
|
251
|
-
**Cognitive Biases**:
|
|
252
|
-
- *Confirmation bias*: Preference for happy-path scenarios confirming code correctness (Weinberg, 2008).
|
|
253
|
-
- *Availability heuristic*: Over-testing past incident patterns; under-testing unmemorable failure modes.
|
|
254
|
-
- *Anchoring*: First proposed scenario constrains group's subsequent exploration.
|
|
255
|
-
- *Optimism bias*: Scenario suites complete for nominal behavior, sparse for degraded/error modes.
|
|
256
|
-
- *Framing effects*: "Verify max payload handling" vs. "verify attacker can't crash with oversized payload" -- same test, different discovery likelihood.
|
|
257
|
-
|
|
258
|
-
**Collaborative Discovery**:
|
|
259
|
-
- *Example Mapping* (Wynne, 2015): Rules (yellow), examples (green), questions (red) on index cards. Red card clusters identify poorly understood areas.
|
|
260
|
-
- *Event Storming* (Brandolini, 2013): Domain events on timeline. 15-50 participants. Extracts normal flow, branch conditions, and missing behavior scenarios.
|
|
261
|
-
- *Specification by Example* (Adzic, 2011): Collaborative discovery -> executable specifications -> regression tests. 40-70% reduction in requirement misunderstanding rate.
|
|
262
|
-
|
|
263
|
-
**Domain Expert vs. Developer Asymmetry**: Domain experts have tacit edge-case knowledge but lack technical expressiveness. Developers can implement but lack domain coverage intuition.
|
|
264
|
-
|
|
265
|
-
**Rapid Software Testing** (Bach & Bolton, 2014): Session-based exploratory testing with domain experts as live oracles.
|
|
266
|
-
|
|
267
|
-
#### 4.5.2 Literature Evidence
|
|
268
|
-
|
|
269
|
-
Shull et al. (2000): Perspective-Based Reading reduced defect escape rates by 35% vs. unstructured review.
|
|
270
|
-
|
|
271
|
-
Zannier et al. (2007): Structured workshops produced more complete, less redundant artifacts than unstructured brainstorming.
|
|
272
|
-
|
|
273
|
-
Endsley (1995): Situation Awareness theory explains why testers with low domain awareness miss safety-critical state transitions.
|
|
274
|
-
|
|
275
|
-
#### 4.5.3 Implementations and Benchmarks
|
|
276
|
-
|
|
277
|
-
- **Example Mapping**: No tooling required beyond index cards or Miro.
|
|
278
|
-
- **Event Storming**: Sticky notes or Miro with community templates.
|
|
279
|
-
- **Exploratory Testing**: TestRail, Zephyr for charter management.
|
|
280
|
-
- **Bias mitigation**: Perspective-Based Reading (persona assignment), Devil's Advocate sessions, Pre-mortem Analysis (Klein, 1993).
|
|
281
|
-
|
|
282
|
-
#### 4.5.4 Strengths and Limitations
|
|
283
|
-
|
|
284
|
-
**Strengths**: Access to tacit knowledge. Cross-functional scenario discovery.
|
|
285
|
-
|
|
286
|
-
**Limitations**: Workshop fatigue (2-4 hours effective). Scaling beyond small teams. Tacit-to-explicit translation depends on scribe quality. No automated scenario extraction from workshops.
|
|
287
|
-
|
|
288
|
-
---
|
|
289
|
-
|
|
290
|
-
## 5. Comparative Synthesis
|
|
291
|
-
|
|
292
|
-
| Dimension | Safety-Critical | Distributed Systems | Security | Real-Time/Performance | Human Factors |
|
|
293
|
-
|---|---|---|---|---|---|
|
|
294
|
-
| **Regulatory overhead** | Extreme | Minimal | Moderate | Low | None |
|
|
295
|
-
| **Failure cost** | Catastrophic | High | Variable | Moderate-High | Indirect |
|
|
296
|
-
| **Generation method** | Requirements + STPA | Contracts + topology | Threat modeling + ATT&CK | Workload characterization | Collaborative workshops |
|
|
297
|
-
| **Adequacy oracle** | MC/DC + traceability | Contract compliance | ATT&CK coverage | SLA thresholds | Team consensus |
|
|
298
|
-
| **Tool maturity** | Very high | High-Medium | High (exploitation) | High | Low |
|
|
299
|
-
| **Human bottleneck** | Safety analysts + DER | Platform architects | Red teamers | Performance engineers | Facilitators + domain experts |
|
|
300
|
-
| **Primary failure mode** | Completeness illusion | Contract drift | Adversary surprise | Production mismatch | Facilitation quality |
|
|
301
|
-
|
|
302
|
-
**Cross-domain observations**:
|
|
303
|
-
|
|
304
|
-
*The adequacy oracle determines technique selection.* Safety-critical uses a dual oracle (requirements + structural coverage). Security uses transparent-about-incompleteness metrics (ATT&CK coverage). Performance uses threshold metrics (SLA). Human factors lacks a compelling objective oracle.
|
|
305
|
-
|
|
306
|
-
*Human bottlenecks persist across all domains.* Tool automation reduces per-scenario execution cost but has not automated generating the *right* scenarios in any domain.
|
|
307
|
-
|
|
308
|
-
*Scenario reuse remains unsolved.* ATT&CK and OWASP provide domain-agnostic scenario libraries; safety-critical and performance scenarios remain system-specific.
|
|
309
|
-
|
|
310
|
-
---
|
|
311
|
-
|
|
312
|
-
## 6. Open Problems and Gaps
|
|
313
|
-
|
|
314
|
-
### 6.1 Cross-Domain Scenario Reuse
|
|
315
|
-
|
|
316
|
-
Automotive ADAS scenario libraries have potential applicability to drone navigation and robotic surgery. Standards frameworks are incompatible (DAL vs. ASIL vs. IEC 62304 classes). OpenSCENARIO (ASAM, 2020) is a first step for automotive; adoption in other domains is nascent.
|
|
317
|
-
|
|
318
|
-
### 6.2 Automated Compliance Evidence Generation
|
|
319
|
-
|
|
320
|
-
LLMs could automate requirements parsing, scenario suggestion, and traceability link generation, but no system produces evidence directly acceptable to regulators without human review.
|
|
321
|
-
|
|
322
|
-
### 6.3 Security Scenario Completeness Metrics
|
|
323
|
-
|
|
324
|
-
No widely accepted completeness metric exists. Research directions: attack graph completeness from formal threat models, adversarial ML for scenario generation from vulnerability databases, red team vs. scanner coverage correlation studies.
|
|
325
|
-
|
|
326
|
-
### 6.4 Performance Scenario Drift
|
|
327
|
-
|
|
328
|
-
Load profiles calibrated at launch become progressively less representative. Continuous comparison of production traffic against test profiles remains an engineering gap.
|
|
329
|
-
|
|
330
|
-
### 6.5 Debiasing Scenario Authoring at Scale
|
|
331
|
-
|
|
332
|
-
Perspective-Based Reading works for small groups but doesn't scale to 500-developer organizations. Automated bias detection tools (analyzing scenario suite patterns like "95% happy-path outcomes") could provide scalable signals but require labeled training data.
|
|
333
|
-
|
|
334
|
-
### 6.6 AI-Assisted Domain-Specific Scenario Generation
|
|
335
|
-
|
|
336
|
-
Open problems: LLM-generated MC/DC-adequate scenarios for safety-critical software, threat-model-grounded security scenarios beyond ATT&CK enumeration, performance scenarios from architecture descriptions calibrated against realistic load distributions.
|
|
337
|
-
|
|
338
|
-
---
|
|
339
|
-
|
|
340
|
-
## 7. Conclusion
|
|
341
|
-
|
|
342
|
-
The scenario primitive is preserved across all five domains, but the generation oracle, adequacy oracle, and evaluation oracle are domain-specific constructs that must be deliberately engineered.
|
|
343
|
-
|
|
344
|
-
Safety-critical testing is the most rigorously defined, at a cost justified only by failure consequences. Distributed systems fragment testing across service boundaries, with contract testing providing elegant but drift-prone decomposition. Security testing is structurally incomplete by necessity -- the adversary's scenario space is unbounded. Performance testing connects most directly to business outcomes but depends on production representativeness that decays over time. Human factors represent the least formalized domain, yet are the root cause of quality failures in all others.
|
|
345
|
-
|
|
346
|
-
Scenario testing quality is bounded not primarily by algorithmic or tooling limitations but by the quality of domain knowledge that informs generation and the adequacy oracles that judge completeness. Investing in domain knowledge -- through standards alignment, threat modeling, workload characterization, and collaborative discovery -- produces greater quality improvements than investing in execution tooling alone.

---

## References

Adzic, G. (2011). *Specification by Example*. Manning Publications.
Anderson, R. (2020). *Security Engineering* (3rd ed.). Wiley.
ASAM. (2020). *OpenSCENARIO 1.0*. Association for Standardization of Automation and Measuring Systems.
Bach, J., & Bolton, M. (2014). *Rapid Software Testing*. Satisfice, Inc.
Bass, L., Clements, P., & Kazman, R. (2021). *Software Architecture in Practice* (4th ed.). Addison-Wesley.
Bau, J., et al. (2010). State of the art: Automated black-box web application vulnerability testing. *IEEE S&P*, 332-345.
Böhme, M., et al. (2017). Coverage-based greybox fuzzing as Markov chain. *IEEE TSE*, 45(5), 489-506.
Brandolini, A. (2021). *Introducing EventStorming*. Leanpub.
Carroll, J. M. (2000). *Making Use: Scenario-Based Design of Human-Computer Interactions*. MIT Press.
Chilenski, J. J., & Miller, S. P. (1994). Applicability of modified condition/decision coverage to software testing. *Software Engineering Journal*, 9(5), 193-200.
Clemson, B. (2014). *Pact: Consumer-driven contract testing*. Pact Foundation.
Czarnecki, K., & Eisenecker, U. W. (2000). *Generative Programming: Methods, Tools, and Applications*. Addison-Wesley.
Endsley, M. R. (1995). Toward a theory of situation awareness in dynamic systems. *Human Factors*, 37(1), 32-64.
Fioraldi, A., et al. (2020). AFL++: Combining incremental steps of fuzzing research. *WOOT '20*.
Hatton, L. (1997). The T experiments: Errors in scientific software. *IEEE CSE*, 4(2), 27-38.
Heimdahl, M. P. E., & Leveson, N. G. (1996). Completeness and consistency in hierarchical state-based requirements. *IEEE TSE*, 22(6), 363-377.
IEC. (2006, amended 2015). *IEC 62304: Medical Device Software -- Software Life Cycle Processes*.
IEC. (2010). *IEC 61508: Functional Safety*.
Iosup, A., et al. (2011). On the performance variability of production cloud services. *CCGrid*, 104-113.
ISO. (2018). *ISO 26262: Road Vehicles -- Functional Safety* (2nd ed.).
Jain, R. (1991). *The Art of Computer Systems Performance Analysis*. Wiley.
Kahneman, D. (2011). *Thinking, Fast and Slow*. Farrar, Straus and Giroux.
Kang, K. C., et al. (1990). *Feature-Oriented Domain Analysis (FODA) Feasibility Study*. CMU/SEI-90-TR-21.
Kanstren, T. (2012). Testing approaches in industrial safety-critical software. *ENASE*, 159-168.
Klein, G. A. (1993). A recognition-primed decision (RPD) model of rapid decision making. In *Decision Making in Action: Models and Methods* (pp. 138-147). Ablex.
Klees, G., et al. (2018). Evaluating fuzz testing. *CCS '18*, 2123-2138.
Leveson, N. G. (2011). *Engineering a Safer World*. MIT Press.
Liu, C. L., & Layland, J. W. (1973). Scheduling algorithms for multiprogramming in a hard-real-time environment. *JACM*, 20(1), 46-61.
Mauw, S., & Oostdijk, M. (2005). Foundations of attack trees. *ICISC 2005* (LNCS 3935), 186-198.
McGraw, G. (2006). *Software Security: Building Security In*. Addison-Wesley.
Molyneaux, I. (2014). *The Art of Application Performance Testing* (2nd ed.). O'Reilly.
Newman, S. (2019). *Building Microservices* (2nd ed.). O'Reilly.
Norman, D. A. (2013). *The Design of Everyday Things* (rev. ed.). Basic Books.
OWASP Foundation. (2020). *OWASP Web Security Testing Guide* (v4.2).
Reason, J. (1990). *Human Error*. Cambridge University Press.
Richardson, C. (2018). *Microservices Patterns*. Manning.
Rierson, L. (2013). *Developing Safety-Critical Software*. CRC Press.
RTCA. (2011). *DO-178C: Software Considerations in Airborne Systems and Equipment Certification*.
Schneier, B. (1999). Attack trees. *Dr. Dobb's Journal*, 24(12), 21-29.
Shostack, A. (2014). *Threat Modeling: Designing for Security*. Wiley.
Shull, F., et al. (2000). How perspective-based reading can improve requirements inspections. *IEEE Computer*, 33(7), 73-79.
Sindre, G., & Opdahl, A. L. (2005). Eliciting security requirements with misuse cases. *Requirements Engineering*, 10(1), 34-44.
Strom, B. E., et al. (2018). *MITRE ATT&CK: Design and Philosophy*. MTR180035.
Tene, G. (2015). *How NOT to Measure Latency*. QCon presentation.
UcedaVelez, T., & Morana, M. M. (2015). *Risk Centric Threat Modeling: Process for Attack Simulation and Threat Analysis*. Wiley.
Weinberg, G. M. (2008). *Perfect Software and Other Illusions About Testing*. Dorset House.
Wilhelm, R., et al. (2008). The worst-case execution-time problem -- overview of methods and survey of tools. *ACM TECS*, 7(3), Article 36.
Wynne, M. (2015). *Example Mapping*. Cucumber Limited.
Wynne, M., & Hellesøy, A. (2012). *The Cucumber Book*. Pragmatic Bookshelf.
Zampetti, F., et al. (2020). Interplay between pull request review and CI builds. *SANER 2020*, 11-21.
Zannier, C., et al. (2007). On the success of agile practices. *Agile 2007*, 209-213.

---

## Practitioner Resources

### Safety-Critical and Embedded Systems

**Tools**
- **LDRA** (ldra.com): Requirements traceability + MC/DC coverage. Tool qualification support for DO-178C/ISO 26262.
- **VectorCAST** (vector.com): C/C++/Ada testing. DO-330 tool qualification.
- **Parasoft C/C++test** (parasoft.com): Policy enforcement + continuous coverage.
- **Simulink Test** (mathworks.com): Model-level scenario specification (DO-331).
- **dSPACE SCALEXIO/VEOS** (dspace.com): HIL/SIL platforms.
- **AbsInt aiT** (absint.com): WCET analysis. DO-178C qualification support.

**Key Texts**
- Rierson (2013). *Developing Safety-Critical Software*. CRC Press.
- Leveson (2011). *Engineering a Safer World*. MIT Press (free PDF).

### Distributed Systems

**Tools**
- **Pact** (pact.io): Consumer-driven contracts. Multi-language support + PactFlow broker.
- **Spring Cloud Contract** (spring.io): Provider-driven contracts, Spring-native.
- **WireMock** (wiremock.org): HTTP service virtualization with stateful scenarios.
- **Hoverfly** (hoverfly.io): Transparent-proxy service simulation.
- **Testcontainers** (testcontainers.com): Throwaway Docker containers for integration scenarios.
- **Toxiproxy** (github.com/Shopify/toxiproxy): Network condition simulation.

**Key Texts**
- Newman (2019). *Building Microservices* (2nd ed.). O'Reilly.
- Richardson (2018). *Microservices Patterns*. Manning.

### Security

**Tools**
- **OWASP ZAP** (zaproxy.org): Open-source web security scanner.
- **Burp Suite** (portswigger.net): Industry-standard commercial web security toolkit.
- **AFL++** (github.com/AFLplusplus): Coverage-guided fuzzer.
- **libFuzzer** (llvm.org/docs/LibFuzzer.html): In-process, coverage-guided fuzzing.
- **Metasploit** (metasploit.com): Exploitation framework.
- **ATT&CK Navigator** (mitre-attack.github.io/attack-navigator): Technique-coverage visualization.

**Key Texts**
- Shostack (2014). *Threat Modeling*. Wiley.
- Anderson (2020). *Security Engineering* (3rd ed.). Wiley (free chapter PDFs).

### Performance

**Tools**
- **Apache JMeter** (jmeter.apache.org): Mature, broad protocol support.
- **Gatling** (gatling.io): Scala DSL, high concurrency.
- **k6** (k6.io): JavaScript scenarios, CI/CD-native.
- **Locust** (locust.io): Python-based, distributed load generation.
- **HdrHistogram** (hdrhistogram.org): High-dynamic-range latency histograms; supports coordinated-omission correction.

**Key Texts**
- Jain (1991). *The Art of Computer Systems Performance Analysis*. Wiley.
- Molyneaux (2014). *The Art of Application Performance Testing*. O'Reilly.

### Human Factors

**Techniques**
- **Example Mapping**: No tooling required. cucumber.io/blog/bdd/example-mapping-introduction
- **Event Storming**: Miro templates available. eventstorming.com
- **Specification by Example**: Adzic (2011). specificationbyexample.com

**Key Texts**
- Carroll (2000). *Making Use*. MIT Press.
- Kahneman (2011). *Thinking, Fast and Slow*.
- Weinberg (2008). *Perfect Software*. Dorset House.
- Brandolini (2021). *Introducing EventStorming*. Leanpub.
# Authentication & Authorization Patterns

*February 23, 2026*

Reference for AI agents reviewing access control in web applications.
Derived from [secure-coding-failure.md](secure-coding-failure.md) section 4.7.

---

## Authentication vs Authorization

- **Authentication**: Verifying *who* the caller is (identity). Handled by login flows, tokens, sessions.
- **Authorization**: Verifying *what* the caller can do (permissions). Handled by middleware, guards, ownership checks.

Most vulnerabilities occur in authorization -- the caller is authenticated but accesses resources or performs operations they should not.

---

## Common Broken Patterns

### Missing Auth Middleware

Route handlers that lack an auth guard present on sibling routes.

```typescript
// Express -- unprotected route among protected ones
router.get('/api/users', authMiddleware, listUsers);
router.get('/api/users/:id', authMiddleware, getUser);
router.delete('/api/users/:id', deleteUser); // BUG: missing authMiddleware
```

**Detection**: List all route registrations. Flag any route that omits auth middleware when the majority of routes at the same path prefix include it.
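
This majority-vote heuristic can be sketched as a small audit pass. The `Route` shape and the middleware name `authMiddleware` are assumptions of this sketch, extracted from framework registration code by whatever means the reviewer has, not a real framework API:

```typescript
// Hypothetical route-table entry extracted from route registrations.
interface Route {
  method: string;
  path: string;
  middleware: string[];
}

// Flag routes that omit the auth middleware when the majority of routes
// sharing the same path prefix include it.
function flagMissingAuth(routes: Route[], authName = "authMiddleware"): Route[] {
  // Group by the first two path segments, e.g. "/api/users/:id" -> "/api/users".
  const prefix = (p: string) => p.split("/").slice(0, 3).join("/");
  const groups = new Map<string, Route[]>();
  for (const r of routes) {
    const key = prefix(r.path);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key)!.push(r);
  }
  const flagged: Route[] = [];
  for (const group of groups.values()) {
    const withAuth = group.filter((r) => r.middleware.includes(authName)).length;
    if (withAuth > group.length / 2) {
      flagged.push(...group.filter((r) => !r.middleware.includes(authName)));
    }
  }
  return flagged;
}
```

On the Express example above, the `DELETE /api/users/:id` route is the lone outlier in its prefix group and gets flagged.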

### Insecure Direct Object Reference (IDOR)

User-supplied IDs used in queries without ownership verification.

```typescript
// Unsafe -- any authenticated user can access any order
app.get('/api/orders/:id', authMiddleware, async (req, res) => {
  const order = await db.get('SELECT * FROM orders WHERE id = ?', [req.params.id]);
  res.json(order);
});

// Safe -- ownership check included
app.get('/api/orders/:id', authMiddleware, async (req, res) => {
  const order = await db.get(
    'SELECT * FROM orders WHERE id = ? AND user_id = ?',
    [req.params.id, req.user.id]
  );
  if (!order) return res.status(404).json({ error: 'Not found' });
  res.json(order);
});
```

**Detection**: Find DB queries that use a URL/body parameter as the primary filter but do not include the authenticated user's ID in the WHERE clause.

### Role Escalation

Admin endpoints accessible with regular user tokens.

```typescript
// Unsafe -- no role check
app.post('/api/admin/promote', authMiddleware, promoteUser);

// Safe -- role guard added
app.post('/api/admin/promote', authMiddleware, requireRole('admin'), promoteUser);
```

**Detection**: Flag routes under `/admin` or similar prefixes that lack role-checking middleware beyond basic auth.

### JWT Pitfalls

| Pitfall | Description |
|---------|-------------|
| No signature verification | Accepting tokens without calling `jwt.verify()` |
| Algorithm confusion | Accepting `alg: "none"`, or letting an attacker downgrade RS256 verification to HS256 |
| No expiry check | Missing `exp` claim validation, so tokens remain valid forever |
| Secret in source code | JWT signing secret hardcoded (see [secrets-checklist.md](secrets-checklist.md)) |
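
As one illustration of the first three rows, a structural pre-check can reject obviously malformed tokens before any library call. This is a sketch only -- it does NOT verify signatures, which must always be done with a vetted JWT library:

```typescript
// Structural checks on a JWT covering three pitfalls: missing signature
// segment, "alg: none", and a missing/expired exp claim. Not a verifier.
function structuralJwtChecks(token: string, nowSec: number): string[] {
  const problems: string[] = [];
  const parts = token.split(".");
  if (parts.length !== 3 || parts[2] === "") {
    problems.push("missing signature segment");
    return problems;
  }
  const decode = (s: string) =>
    JSON.parse(Buffer.from(s, "base64url").toString("utf8"));
  const header = decode(parts[0]);
  const payload = decode(parts[1]);
  if (typeof header.alg !== "string" || header.alg.toLowerCase() === "none") {
    problems.push('forbidden "alg: none"');
  }
  if (typeof payload.exp !== "number") {
    problems.push("missing exp claim");
  } else if (payload.exp <= nowSec) {
    problems.push("token expired");
  }
  return problems;
}
```

Algorithm confusion is prevented at the library level by pinning the expected algorithm list when verifying, never by trusting the token's own `alg` header.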

### CORS Misconfiguration

```typescript
// Unsafe -- wildcard origin with credentials
app.use(cors({ origin: '*', credentials: true }));

// Unsafe -- reflecting the request origin without validation
app.use(cors({ origin: req.headers.origin, credentials: true }));

// Safe -- explicit allowlist
app.use(cors({ origin: ['https://app.example.com'], credentials: true }));
```

---

## Framework-Specific Patterns

### Express / NestJS (JS/TS)

| Check | What to look for |
|-------|-----------------|
| Missing global guard | Routes outside `app.use(authMiddleware)` scope |
| NestJS missing `@UseGuards` | Controller methods without `@UseGuards(AuthGuard)` |
| Route ordering | Unprotected catch-all route registered before auth middleware |

### Django / FastAPI (Python)

| Check | What to look for |
|-------|-----------------|
| Missing `@login_required` | Django view functions without the decorator |
| Missing `Depends(get_current_user)` | FastAPI endpoints without auth dependency |
| Missing permission classes | DRF views with `permission_classes = []` or `AllowAny` on sensitive endpoints |
| Django `@csrf_exempt` | State-changing views exempted from CSRF without justification |

---

## Detection Approach

### Route Audit

1. Extract all registered routes (paths + HTTP methods).
2. For each route, check whether auth middleware/decorator/guard is applied.
3. Flag routes that modify data (POST/PUT/PATCH/DELETE) without auth.
4. Flag routes that return user-specific data without auth.
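
The steps above reduce to a single filter over the extracted route table. The `AuditRoute` shape, including the `userScoped` flag for step 4, is an assumption of this sketch:

```typescript
// Hypothetical extracted route entry; `userScoped` marks routes that
// return user-specific data and is an assumption of this example.
interface AuditRoute {
  method: "GET" | "POST" | "PUT" | "PATCH" | "DELETE";
  path: string;
  hasAuth: boolean;
  userScoped: boolean;
}

// Steps 1-4: every mutating route, and every user-scoped read,
// must carry an auth guard.
function auditRoutes(routes: AuditRoute[]): AuditRoute[] {
  const mutating = new Set(["POST", "PUT", "PATCH", "DELETE"]);
  return routes.filter(
    (r) => !r.hasAuth && (mutating.has(r.method) || r.userScoped)
  );
}
```

Public read-only routes (health checks, docs) pass; everything else without auth is flagged for review.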

### Ownership Check Audit

1. Find all DB queries that accept a user-supplied ID parameter.
2. Check whether the query also filters by the authenticated user's ID.
3. If no ownership filter exists and the resource is user-scoped, flag as IDOR (P1).
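
Steps 1-3 can be approximated textually for parameterized SQL. The column name `user_id` and the `?` placeholder style are assumptions here; a real audit should operate on parsed queries, not regexes:

```typescript
// Crude textual heuristic: a query that filters by a raw id parameter
// but never mentions the owner column is an IDOR candidate.
function looksLikeIdor(sql: string, ownerColumn = "user_id"): boolean {
  const q = sql.toLowerCase();
  // WHERE clause filtering on a bare `id = ?` placeholder.
  const filtersById = /where[^;]*\bid\s*=\s*\?/.test(q);
  const filtersByOwner = q.includes(ownerColumn.toLowerCase());
  return filtersById && !filtersByOwner;
}
```

Applied to the order-lookup examples earlier, the unsafe query is flagged and the ownership-checked variant passes.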

## Source

Derived from [secure-coding-failure.md](secure-coding-failure.md) section 4.7.