npm - @harness-engineering/cli - Versions diffs - 1.13.0 → 1.13.1 - Mend

@harness-engineering/cli 1.13.0 → 1.13.1


Files changed (267)

package/dist/agents/skills/claude-code/harness-product-spec/SKILL.md ADDED

@@ -0,0 +1,285 @@
+# Harness Product Spec
+> Generate structured product specifications from feature requests, issues, or descriptions. Produces user stories with EARS acceptance criteria, Given-When-Then scenarios, and PRD documents with traceable requirements.
+## When to Use
+- When a new feature needs formal specification before implementation begins
+- When GitHub issues or feature requests need to be translated into actionable user stories
+- When acceptance criteria are missing, vague, or untestable for existing stories
+- NOT for technical architecture decisions (use harness-architecture-advisor)
+- NOT for implementation planning with task breakdown (use harness-planning)
+- NOT for bug triage or root cause analysis (use harness-debugging)
+## Process
+### Phase 1: PARSE -- Extract Feature Intent
+1. **Resolve input source.** Accept one of:
+   - GitHub issue URL: fetch via `gh issue view <number> --json title,body,labels,comments`
+   - Feature description file: read the provided file path
+   - Inline text: use the provided description directly
+2. **Extract core elements.** From the input, identify:
+   - **Goal:** What is the user trying to accomplish?
+   - **Actor(s):** Who are the users or systems involved? (e.g., "admin user," "API consumer," "billing system")
+   - **Trigger:** What initiates the feature? (user action, system event, time-based)
+   - **Constraints:** What limitations exist? (performance, platform, backward compatibility)
+   - **Context:** What existing system components are involved?
+3. **Identify ambiguities.** Flag any element that is missing or unclear:
+   - "This issue mentions 'notifications' but does not specify the channel (email, in-app, push)"
+   - "No success metric defined -- what does 'working correctly' mean?"
+   - "Edge case not addressed: what happens when the user has no payment method?"
+4. **Resolve ambiguities.** Use `emit_interaction` to present questions when critical information is missing:
+   ```
+   The feature request mentions "user notifications" but does not specify:
+   1. Notification channel (email, in-app, push, SMS)
+   2. Whether notifications are configurable by the user
+   3. Retry behavior for failed deliveries
+   Please clarify before proceeding.
+   ```
+5. **Load project context.** Scan the project for existing specs, user stories, or PRDs to maintain consistency in format and terminology:
+   - Check `docs/specs/`, `docs/requirements/`, `docs/prd/` for existing documents
+   - Check `.github/ISSUE_TEMPLATE/` for the project's preferred issue format
+   - Identify domain terminology used in existing specs
+6. **Classify feature type.** Categorize the feature as:
+   - **New capability:** Something the system cannot do today
+   - **Enhancement:** Improvement to an existing capability
+   - **Integration:** Connecting with an external system
+   - **Migration:** Moving from one approach to another
+---
+### Phase 2: CRAFT -- Generate User Stories and Acceptance Criteria
+1. **Write user stories.** For each actor-goal pair, produce a story in standard format:
+   ```
+   As a [actor],
+   I want to [action],
+   so that [benefit].
+   ```
+   Break large features into multiple stories. Each story must be independently deliverable and testable.
+2. **Write EARS acceptance criteria.** Apply the EARS (Easy Approach to Requirements Syntax) patterns:
+   - **Ubiquitous:** "The [system] shall [behavior]" -- for unconditional requirements
+   - **Event-driven:** "When [trigger], the [system] shall [behavior]" -- for responses to events
+   - **State-driven:** "While [state], the [system] shall [behavior]" -- for ongoing conditions
+   - **Optional:** "Where [feature is enabled], the [system] shall [behavior]" -- for configurable behavior
+   - **Unwanted:** "If [condition], then the [system] shall [response]" -- for error handling and edge cases
+3. **Write Given-When-Then scenarios.** For each acceptance criterion, produce at least one BDD scenario:
+   ```
+   Given [precondition],
+   When [action],
+   Then [expected outcome].
+   ```
+   Include:
+   - Happy path scenario
+   - At least one error/edge case scenario
+   - Boundary condition scenarios where applicable
+4. **Define edge cases.** For each story, enumerate:
+   - What happens with empty input?
+   - What happens with maximum input?
+   - What happens when the user lacks permission?
+   - What happens during concurrent access?
+   - What happens when a dependency is unavailable?
+5. **Assign story metadata.** For each story:
+   - **Priority:** Must-have, Should-have, Could-have, Won't-have (MoSCoW)
+   - **Size estimate:** S, M, L, XL (relative to other stories)
+   - **Dependencies:** Other stories or systems this depends on
+   - **Risk:** Low, Medium, High (with risk description)
+---
+### Phase 3: GENERATE -- Produce PRD Document
+1. **Structure the PRD.** Generate a document with these sections:
+   - **Title and version** (feature name, PRD version, date, author)
+   - **Problem statement** (what problem does this solve, who has it, how painful is it)
+   - **Goals and non-goals** (explicit scope boundaries)
+   - **User stories** (from Phase 2, organized by priority)
+   - **Acceptance criteria** (EARS format, traceable to stories)
+   - **Technical constraints** (performance requirements, platform constraints, backward compatibility)
+   - **Success metrics** (measurable outcomes that define "done")
+   - **Open questions** (unresolved ambiguities from Phase 1)
+   - **Out of scope** (explicitly excluded items)
+2. **Write the problem statement.** Include:
+   - Who is affected (specific user segments)
+   - How the problem manifests today (current workaround or pain)
+   - Quantified impact if available (time lost, error rate, support tickets)
+3. **Define success metrics.** Every metric must be:
+   - **Measurable:** Can be tracked with existing or planned instrumentation
+   - **Time-bound:** Has a target timeline for evaluation
+   - **Specific:** Not "improve user experience" but "reduce checkout abandonment by 15% within 30 days"
+4. **Map requirements to stories.** Create a traceability matrix:
+   ```
+   REQ-001 -> US-001, US-003 (must-have)
+   REQ-002 -> US-002 (should-have)
+   REQ-003 -> US-004, US-005 (could-have)
+   ```
+5. **Write the PRD to file.** Save to the project's spec directory (detected in Phase 1 or defaulting to `docs/specs/`). Use a filename pattern: `YYYY-MM-DD-feature-name-prd.md`.
+---
+### Phase 4: VALIDATE -- Verify Completeness and Testability
+1. **Check story independence.** Verify each user story can be delivered independently:
+   - Does the story depend on another story being completed first?
+   - If yes, is the dependency documented?
+   - Can the story be tested in isolation?
+2. **Check acceptance criteria testability.** Every EARS criterion must be verifiable:
+   - Can an automated test be written for this criterion?
+   - Is the expected behavior specific enough to distinguish pass from fail?
+   - Are boundary values defined (not "handles large files" but "handles files up to 100MB")?
+     Flag untestable criteria: "Criterion AC-003 says 'the system should be fast' -- this is not testable. Recommend: 'the system shall respond within 200ms for the 95th percentile.'"
+3. **Check coverage completeness.** Verify all parsed elements from Phase 1 are addressed:
+   - Every actor has at least one story
+   - Every constraint has a corresponding acceptance criterion
+   - Every ambiguity is either resolved or listed in open questions
+   - Error handling is specified for every user-facing action
+4. **Check format consistency.** Verify the output matches existing project conventions:
+   - Story format matches templates in `.github/ISSUE_TEMPLATE/` if present
+   - Terminology matches existing specs (do not introduce "user" when the project uses "member")
+   - Priority scheme matches existing stories
+5. **Output validation summary:**
+   ```
+   Product Spec Validation: [COMPLETE/INCOMPLETE]
+   Stories: N generated (M must-have, K should-have)
+   Acceptance criteria: N (all testable: YES/NO)
+   BDD scenarios: N (covering N criteria)
+   Coverage: all actors covered, all constraints addressed
+   Open questions: N remaining
+   Generated: docs/specs/2026-03-27-notifications-prd.md
+   ```
+---
+## Harness Integration
+- **`harness skill run harness-product-spec`** -- Primary command for generating product specifications.
+- **`harness validate`** -- Run after generating specs to verify project health.
+- **`Bash`** -- Used to fetch GitHub issues via `gh` CLI and check existing spec files.
+- **`Read`** -- Used to read input feature descriptions, existing specs, and issue templates.
+- **`Write`** -- Used to generate PRD documents and user story files.
+- **`Glob`** -- Used to locate existing spec directories, issue templates, and requirement documents.
+- **`Grep`** -- Used to extract domain terminology from existing specs and find related stories.
+- **`emit_interaction`** -- Used to present ambiguities for clarification and confirm spec structure before writing.
+## Success Criteria
+- Every feature input produces at least one user story with EARS acceptance criteria
+- All acceptance criteria are testable (specific, measurable, with defined boundaries)
+- BDD scenarios cover happy path and at least one error/edge case per criterion
+- PRD document includes all required sections with traceable requirements
+- Ambiguities are surfaced and either resolved or tracked as open questions
+- Output format matches existing project conventions when they exist
+- Generated PRD is saved to the correct directory with consistent naming
+## Examples
+### Example: GitHub Issue to PRD for Team Notifications
+```
+Phase 1: PARSE
+  Source: gh issue view 234 (title: "Add team notification preferences")
+  Actor: team admin, team member
+  Goal: control which notifications team members receive
+  Ambiguities found:
+    - Channel not specified (resolved: email + in-app per comment #3)
+    - "Important notifications" undefined (flagged as open question)
+Phase 2: CRAFT
+  US-001: As a team admin, I want to set default notification preferences for my team,
+          so that new members receive appropriate notifications without manual setup.
+    AC-001 (Ubiquitous): The system shall apply team-default preferences to new members on join.
+    AC-002 (Event-driven): When a team admin updates default preferences, the system shall
+            prompt whether to apply to existing members.
+    AC-003 (Unwanted): If a team member has custom preferences, then the system shall
+            preserve them when team defaults change.
+  US-002: As a team member, I want to override team notification defaults,
+          so that I receive only notifications relevant to my role.
+    Scenario: Given a team member with default preferences,
+              When they disable "deployment" notifications,
+              Then they shall not receive deployment notifications
+              And their other preferences remain unchanged.
+Phase 3: GENERATE
+  Written: docs/specs/2026-03-27-team-notifications-prd.md
+  Sections: problem statement, 4 user stories, 12 acceptance criteria, 8 BDD scenarios
+  Traceability: REQ-001 -> US-001, US-002 | REQ-002 -> US-003, US-004
+Phase 4: VALIDATE
+  Stories: 4 (2 must-have, 1 should-have, 1 could-have)
+  Acceptance criteria: 12 (all testable: YES)
+  Open questions: 1 ("important notifications" needs product definition)
+  Result: COMPLETE
+```
+### Example: Inline Feature Description for Stripe Webhook Integration
+```
+Phase 1: PARSE
+  Source: inline text "We need to handle Stripe webhooks for subscription changes"
+  Actor: billing system (automated), finance admin (human oversight)
+  Constraints: idempotency required, webhook signature verification, 5-second response SLA
+  Ambiguities:
+    - Which subscription events? (resolved via clarification: created, updated, canceled, past_due)
+    - Retry handling? (Stripe retries for 72 hours)
+Phase 2: CRAFT
+  US-001: As the billing system, I want to process Stripe subscription.updated webhooks,
+          so that user plan changes are reflected within 60 seconds.
+    AC-001 (Event-driven): When a subscription.updated webhook arrives, the system shall
+            update the user's plan within 60 seconds.
+    AC-002 (Unwanted): If a duplicate webhook event ID is received, then the system shall
+            return 200 OK without reprocessing.
+    AC-003 (Unwanted): If webhook signature verification fails, then the system shall
+            return 400 and log a security warning.
+Phase 3: GENERATE
+  Written: docs/specs/2026-03-27-stripe-webhooks-prd.md
+  Technical constraints section includes: idempotency keys, signature verification,
+    5-second response SLA, Stripe retry behavior documentation
+Phase 4: VALIDATE
+  All 4 webhook event types have stories: YES
+  Idempotency criterion is testable: YES (duplicate event ID -> no side effects)
+  Result: COMPLETE
+```
+## Gates
+- **No generating specs from ambiguous input without clarification.** If the input lacks a clear actor, goal, or trigger, pause and ask. Do not invent requirements that were not stated or implied.
+- **No untestable acceptance criteria.** Every criterion must be verifiable by an automated test or a specific manual procedure. "The system should be user-friendly" is not an acceptance criterion.
+- **No skipping edge cases for user-facing actions.** Every action that a user can trigger must have at least one unwanted-behavior criterion (EARS "If" pattern) covering the error case.
+- **No overwriting existing specs.** If a PRD already exists for this feature, present the diff rather than replacing the file. Existing specs may have been reviewed and approved.
+## Escalation
+- **When the feature request is too vague to parse:** Present what was extracted and what is missing: "This issue contains a title but no description. I need at minimum: who is the user, what action they want to perform, and why. Please add details to the issue or provide them here."
+- **When requirements conflict with existing system behavior:** Flag the conflict: "AC-003 requires real-time sync, but the existing event system uses eventual consistency with up to 30-second delay. This needs an architectural decision before the spec can be finalized."
+- **When the feature scope is too large for a single PRD:** Recommend splitting: "This feature contains 3 independent capabilities (notifications, preferences, audit log). Recommend splitting into 3 PRDs that can be prioritized and delivered independently."
+- **When acceptance criteria require metrics that do not exist yet:** Flag the instrumentation gap: "Success metric 'reduce checkout time by 20%' requires checkout timing instrumentation that does not currently exist. Add an instrumentation story as a prerequisite."
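
Reviewer sketch (not part of the package): the EARS templates in Phase 2 and the testability check in Phase 4 of the SKILL.md above can be approximated mechanically. The Python below is a minimal illustration under stated assumptions: `classify_ears`, `vague_terms`, and the `VAGUE_TERMS` word list are hypothetical names invented here, and the regular expressions only mirror the five sentence templates quoted in the file. It is not how the harness CLI evaluates criteria.

```python
import re

# One regex per EARS template quoted in SKILL.md. Each template is anchored
# on its leading keyword and must contain "the <system> shall <behavior>".
EARS_PATTERNS = [
    ("Unwanted", re.compile(r"^If\b.+?,\s*then\s+the\s+.+?\s+shall\s+.+", re.I)),
    ("Event-driven", re.compile(r"^When\b.+?,\s*the\s+.+?\s+shall\s+.+", re.I)),
    ("State-driven", re.compile(r"^While\b.+?,\s*the\s+.+?\s+shall\s+.+", re.I)),
    ("Optional", re.compile(r"^Where\b.+?,\s*the\s+.+?\s+shall\s+.+", re.I)),
    ("Ubiquitous", re.compile(r"^The\s+.+?\s+shall\s+.+", re.I)),
]

# Hypothetical word list: terms that usually signal an untestable criterion.
VAGUE_TERMS = ("fast", "user-friendly", "large", "quickly", "easy")


def classify_ears(criterion):
    """Return the EARS pattern name for a criterion, or None if none fits."""
    text = " ".join(criterion.split())  # collapse wrapped lines into one
    for name, pattern in EARS_PATTERNS:
        if pattern.match(text):
            return name
    return None


def vague_terms(criterion):
    """Return the vague terms that appear as whole words in the criterion."""
    words = set(re.findall(r"[a-z][a-z-]*", criterion.lower()))
    return [term for term in VAGUE_TERMS if term in words]
```

A criterion for which `classify_ears` returns `None`, or for which `vague_terms` is non-empty, would be a candidate for the "flag untestable criteria" step; a human still has to decide whether the rewritten criterion has a defined boundary value.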

package/dist/agents/skills/claude-code/harness-product-spec/skill.yaml ADDED

@@ -0,0 +1,72 @@
+name: harness-product-spec
+version: "1.0.0"
+description: User story generation, EARS acceptance criteria, and PRD creation from issues
+cognitive_mode: constructive-architect
+triggers:
+  - manual
+  - on_new_feature
+platforms:
+  - claude-code
+  - gemini-cli
+tools:
+  - Bash
+  - Read
+  - Write
+  - Edit
+  - Glob
+  - Grep
+  - emit_interaction
+cli:
+  command: harness skill run harness-product-spec
+  args:
+    - name: path
+      description: Project root path
+      required: false
+    - name: source
+      description: "Input source: issue URL, feature description file, or inline text"
+      required: true
+    - name: format
+      description: "Output format: prd, user-stories, acceptance-criteria, all. Defaults to all."
+      required: false
+mcp:
+  tool: run_skill
+  input:
+    skill: harness-product-spec
+    path: string
+type: rigid
+tier: 3
+internal: false
+keywords:
+  - product spec
+  - user story
+  - acceptance criteria
+  - EARS
+  - PRD
+  - requirements
+  - product requirements
+  - BDD
+  - given-when-then
+  - feature specification
+stack_signals:
+  - "docs/specs/"
+  - "docs/requirements/"
+  - "specs/"
+  - ".github/ISSUE_TEMPLATE/"
+  - "docs/prd/"
+phases:
+  - name: parse
+    description: Extract feature intent, stakeholders, and constraints from input source
+    required: true
+  - name: craft
+    description: Generate user stories with EARS acceptance criteria and edge cases
+    required: true
+  - name: generate
+    description: Produce PRD document with scope, requirements, and success metrics
+    required: true
+  - name: validate
+    description: Verify completeness, testability of acceptance criteria, and traceability to source
+    required: true
+state:
+  persistent: false
+  files: []
+depends_on: []
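
Reviewer sketch (not part of the package): the `phases` and `cli` entries in this manifest are expected to line up with the phase headings in the companion SKILL.md. The Python below shows one way to cross-check them. It assumes the manifest has already been parsed into a dict (a hand-copied subset is used here so the sketch stays stdlib-only), and `heading_phases`, `required_args`, and `check_manifest` are hypothetical helpers, not harness CLI functions.

```python
import re

# Hand-copied subset of the skill.yaml hunk above. A real check would parse
# the YAML file; a literal dict keeps this sketch free of third-party imports.
MANIFEST = {
    "name": "harness-product-spec",
    "cli": {
        "command": "harness skill run harness-product-spec",
        "args": [
            {"name": "path", "required": False},
            {"name": "source", "required": True},
            {"name": "format", "required": False},
        ],
    },
    "phases": [
        {"name": "parse", "required": True},
        {"name": "craft", "required": True},
        {"name": "generate", "required": True},
        {"name": "validate", "required": True},
    ],
}

# Phase headings from SKILL.md, with the leading diff "+" marker removed.
SKILL_MD_HEADINGS = [
    "### Phase 1: PARSE -- Extract Feature Intent",
    "### Phase 2: CRAFT -- Generate User Stories and Acceptance Criteria",
    "### Phase 3: GENERATE -- Produce PRD Document",
    "### Phase 4: VALIDATE -- Verify Completeness and Testability",
]


def heading_phases(headings):
    """Extract lowercase phase names from 'Phase N: NAME' headings."""
    names = []
    for heading in headings:
        match = re.match(r"#+\s*Phase\s+\d+:\s*([A-Z]+)\b", heading)
        if match:
            names.append(match.group(1).lower())
    return names


def required_args(manifest):
    """List the CLI argument names the manifest marks as required."""
    return [arg["name"] for arg in manifest["cli"]["args"] if arg["required"]]


def check_manifest(manifest, headings):
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    if not manifest["cli"]["command"].endswith(manifest["name"]):
        problems.append("cli.command does not end with the skill name")
    declared = [phase["name"] for phase in manifest["phases"]]
    documented = heading_phases(headings)
    if declared != documented:
        problems.append(f"phases differ: manifest={declared}, SKILL.md={documented}")
    return problems
```

For the two files in this diff the check reports no problems, and `source` is the only required CLI argument.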

package/dist/agents/skills/claude-code/harness-property-test/SKILL.md ADDED

@@ -0,0 +1,281 @@
+# Harness Property Test
+> Property-based and generative testing with fast-check, hypothesis, and automatic shrinking. Discovers edge cases that example-based tests miss by generating thousands of random inputs and verifying invariants hold for all of them.
+## When to Use
+- Testing functions with large input spaces (parsers, serializers, encoders, validators)
+- Verifying mathematical or algebraic properties (commutativity, associativity, round-trip encoding)
+- Finding edge cases in data transformation, sorting, or filtering logic
+- NOT when testing UI rendering or visual output (use harness-visual-regression instead)
+- NOT when testing simple CRUD operations with well-defined inputs (use harness-tdd instead)
+- NOT when testing external service integrations (use harness-integration-test instead)
+## Process
+### Phase 1: IDENTIFY -- Discover Testable Properties and Invariants
+1. **Catalog candidate functions.** Search for functions that exhibit testable properties:
+   - **Pure functions** with deterministic output for given input
+   - **Serializers/deserializers** where `decode(encode(x)) === x` (round-trip property)
+   - **Sorting/filtering** where output maintains invariants (sorted order, subset relationship)
+   - **Validators** where valid input always passes and specific invalid inputs always fail
+   - **Mathematical functions** with known algebraic properties
+2. **Identify properties for each candidate.** Common property categories:
+   - **Round-trip:** `deserialize(serialize(x)) === x` for any valid `x`
+   - **Idempotence:** `f(f(x)) === f(x)` (applying the function twice gives the same result)
+   - **Invariant preservation:** output always satisfies a postcondition regardless of input
+   - **Commutativity:** `f(a, b) === f(b, a)` for operations where order should not matter
+   - **No-crash (robustness):** function does not throw for any input in the domain
+   - **Monotonicity:** if `a <= b`, then `f(a) <= f(b)` for order-preserving functions
+   - **Equivalence:** `fastImpl(x) === referenceImpl(x)` for optimized implementations
+3. **Define input domains.** For each property, specify:
+   - The type and range of valid inputs
+   - Constraints that inputs must satisfy (e.g., non-empty arrays, positive integers)
+   - Edge cases that the generator should emphasize (empty strings, zero, max int, Unicode)
+4. **Prioritize by risk.** Focus property tests on:
+   - Functions where bugs have high business impact
+   - Functions with complex branching logic
+   - Functions that have had historical bugs or regression issues
+5. **Report findings.** List candidate functions, their properties, and the expected generator configuration.
+### Phase 2: DEFINE -- Write Property Specifications and Custom Generators
+1. **Select the property testing framework.** Based on the project's language:
+   - **TypeScript/JavaScript:** fast-check
+   - **Python:** hypothesis
+   - **Rust:** proptest or quickcheck
+   - **Scala:** ScalaCheck
+   - **Haskell:** QuickCheck
+   - **Java/Kotlin:** jqwik
+2. **Define custom generators (arbitraries) for domain types.** For each domain model:
+   - Build a generator that produces valid instances with realistic field values
+   - Add constraints matching the model's validation rules
+   - Compose generators for nested structures using `map`, `flatMap`, and `filter`
+3. **Write property test specifications.** For each property identified in Phase 1:
+   - State the property as a universally quantified assertion: "For all inputs X satisfying constraint C, property P holds"
+   - Use the framework's property definition syntax
+   - Configure iteration count (default: 100 iterations for fast properties, 1000 for critical properties)
+4. **Configure shrinking.** Ensure the framework's automatic shrinking is enabled:
+   - Shrinking reduces failing inputs to the minimal counterexample
+   - Custom generators should support shrinking (prefer `map` over `filter` where possible, since heavy `filter` use discards candidate values and makes shrinking slower and less effective)
+   - Set a shrink limit to prevent infinite shrinking on complex inputs
+5. **Write seed values for reproducibility.** Configure:
+   - A fixed seed for CI to ensure deterministic reruns
+   - Seed logging so that any failure can be reproduced exactly
+   - Replay capability: failed seeds are stored and replayed on subsequent runs
+### Phase 3: EXECUTE -- Run Property Tests and Collect Counterexamples
+1. **Run property tests with verbose output.** Execute the test suite and observe:
+   - Number of test cases generated per property
+   - Any counterexamples found (failing inputs)
+   - Shrinking progress (how the framework reduces counterexamples)
+2. **Analyze counterexamples.** For each failing property:
+   - Read the shrunk counterexample -- this is the minimal input that violates the property
+   - Understand why this input causes a failure
+   - Classify: is this a real bug, or is the property specification too strict?
+3. **Reproduce counterexamples deterministically.** For each counterexample:
+   - Record the failing seed value
+   - Write an explicit example-based test using the shrunk counterexample as a regression test
+   - This regression test serves as documentation and prevents the same bug from recurring
+4. **Handle flaky property tests.** If a property test fails intermittently:
+   - Increase the iteration count to reproduce more reliably
+   - Check if the property is sensitive to floating-point precision
+   - Verify that the generator does not produce inputs outside the valid domain
+5. **Iterate on generator quality.** If the generator frequently produces uninteresting inputs:
+   - Add bias toward edge cases (empty collections, boundary values)
+   - Use `filter` sparingly (it discards inputs, wasting iterations)
+   - Prefer `map` and `flatMap` to construct valid inputs directly
+### Phase 4: ANALYZE -- Diagnose Root Causes and Harden Implementations
+1. **Fix bugs exposed by counterexamples.** For each real bug found:
+   - Understand the root cause using the minimal counterexample
+   - Fix the implementation
+   - Verify the property now holds (rerun with the same seed)
+   - Keep the regression test with the explicit counterexample
+2. **Strengthen property specifications.** After fixing bugs:
+   - Consider whether additional properties are now testable
+   - Tighten existing properties if the fix enables stricter invariants
+   - Add properties for edge cases revealed by the counterexamples
+3. **Measure property test effectiveness.** Evaluate:
+   - Number of unique bugs found by property tests vs. example-based tests
+   - Types of bugs found (off-by-one, overflow, Unicode handling, null handling)
+   - Generator coverage: what percentage of the input domain is being explored
+4. **Integrate property tests into CI.** Configure:
+   - Property tests run on every PR with a moderate iteration count (100)
+   - Nightly runs use a higher iteration count (10,000) for deeper exploration
+   - Failed seeds are stored as artifacts for reproduction
+5. **Run `harness validate`.** Confirm the project passes all harness checks with property tests in place.
+### Graph Refresh
+If a knowledge graph exists at `.harness/graph/`, refresh it after code changes to keep graph queries accurate:
+```
+harness scan [path]
+```
+## Harness Integration
+- **`harness validate`** -- Run in ANALYZE phase after property tests are written and bugs are fixed. Confirms project health.
+- **`harness check-deps`** -- Run after DEFINE phase to verify property testing framework is in devDependencies.
+- **`emit_interaction`** -- Used to present counterexample analysis and property specification decisions to the human.
+- **Grep** -- Used in IDENTIFY phase to find pure functions, serializers, validators, and mathematical operations.
+- **Glob** -- Used to catalog existing property test files and domain type definitions.
+## Success Criteria
+- Every function with large input space has at least one property test
+- Custom generators produce valid domain objects without relying heavily on `filter`
+- All counterexamples are investigated: real bugs are fixed, property specs are adjusted for false positives
+- Shrunk counterexamples are preserved as explicit regression tests
+- Property tests are deterministic in CI (fixed seed) while still exploring randomly in local development
+- `harness validate` passes with property tests in place
+## Examples
+### Example: fast-check for a TypeScript URL Parser
+**IDENTIFY -- Properties of a URL parser:**
+```
+Function: parseUrl(input: string): ParsedUrl
+Properties:
+  1. Round-trip: formatUrl(parseUrl(url)) === url for any valid URL
+  2. No-crash: parseUrl(arbitrary_string) never throws (returns Result type)
+  3. Invariant: parsed.protocol is always lowercase
+  4. Invariant: parsed.host never contains a trailing slash
+```
+**DEFINE -- Custom generator and property tests:**
+```typescript
+// tests/property/url-parser.prop.test.ts
+import fc from 'fast-check';
+import { parseUrl, formatUrl } from '../../src/url-parser';
+// Custom generator for valid URLs
+const urlArb = fc
+  .record({
+    protocol: fc.constantFrom('http', 'https', 'ftp'),
+    host: fc.domain(),
+    port: fc.option(fc.integer({ min: 1, max: 65535 }), { nil: undefined }),
+    path: fc
+      .array(
+        // Note: newer fast-check releases replace fc.stringOf with fc.string({ unit: ... })
+        fc.stringOf(fc.constantFrom(...'abcdefghijklmnopqrstuvwxyz0123456789-_'.split('')), {
+          minLength: 1,
+        })
+      )
+      .map((segments) => '/' + segments.join('/')),
+  })
+  .map(({ protocol, host, port, path }) => `${protocol}://${host}${port ? ':' + port : ''}${path}`);
+describe('URL parser properties', () => {
+  it('round-trips valid URLs', () => {
+    fc.assert(
+      fc.property(urlArb, (url) => {
+        const parsed = parseUrl(url);
+        if (!parsed.ok) return false; // fail the property: urlArb should only generate parseable URLs
+        return formatUrl(parsed.value) === url;
+      }),
+      { numRuns: 1000, seed: 42 }
+    );
+  });
+  it('never throws on arbitrary string input', () => {
+    fc.assert(
+      fc.property(fc.string(), (input) => {
+        const result = parseUrl(input);
+        // Must return a Result, never throw
+        return result.ok === true || result.ok === false;
+      }),
+      { numRuns: 5000 }
+    );
+  });
+  it('always produces lowercase protocol', () => {
+    fc.assert(
+      fc.property(urlArb, (url) => {
+        const parsed = parseUrl(url.toUpperCase());
+        if (!parsed.ok) return true; // skip failures
+        return parsed.value.protocol === parsed.value.protocol.toLowerCase();
+      })
+    );
+  });
+});
+```
+### Example: hypothesis for a Python Sorting Algorithm
+**DEFINE -- Property tests with hypothesis:**
+```python
+# tests/property/test_sort_properties.py
+from hypothesis import given, settings
+from hypothesis import strategies as st
+from myapp.sorting import merge_sort
+@given(st.lists(st.integers()))
+def test_sort_preserves_length(xs):
+    """Sorted output has the same length as input."""
+    assert len(merge_sort(xs)) == len(xs)
+@given(st.lists(st.integers()))
+def test_sort_preserves_elements(xs):
+    """Sorted output contains exactly the same elements as input."""
+    assert sorted(merge_sort(xs)) == sorted(xs)
+@given(st.lists(st.integers(), min_size=1))
+def test_sort_produces_ordered_output(xs):
+    """Every element is less than or equal to the next."""
+    result = merge_sort(xs)
+    for i in range(len(result) - 1):
+        assert result[i] <= result[i + 1]
+@given(st.lists(st.integers()))
+def test_sort_is_idempotent(xs):
+    """Sorting an already-sorted list produces the same result."""
+    once = merge_sort(xs)
+    twice = merge_sort(once)
+    assert once == twice
+@settings(max_examples=5000)
+@given(st.lists(st.floats(allow_nan=False, allow_infinity=False)))
+def test_sort_handles_floats(xs):
+    """Sort works correctly with floating-point numbers."""
+    result = merge_sort(xs)
+    for i in range(len(result) - 1):
+        assert result[i] <= result[i + 1]
+```
+## Gates
+- **No property tests without shrinking.** If the framework's automatic shrinking is disabled or the generator uses patterns that break shrinking (excessive `filter`), counterexamples will be unhelpfully large. Fix the generator to support shrinking.
+- **No ignoring counterexamples.** Every counterexample produced by a property test must be investigated. If it reveals a real bug, fix it. If it is a false positive, adjust the property specification or generator. Never just increase the iteration count to make it "less likely to fail."
+- **No property tests that always pass trivially.** A property that returns `true` for every input is useless. Review that properties make substantive assertions. If a property has a `return true` fallback for most inputs, the generator is producing too many invalid inputs.
+- **Regression tests are mandatory for counterexamples.** Every shrunk counterexample that revealed a bug must be preserved as an explicit example-based test, even after the property test passes. The explicit test serves as documentation and prevents regression.
+## Escalation
+- **When the generator cannot produce valid inputs efficiently (> 50% rejection rate):** Rewrite the generator to construct valid inputs directly rather than filtering. Use `flatMap` to build constrained structures incrementally. If the domain constraints are too complex for a generator, consider whether the function's API needs simplification.
+- **When a counterexample is too complex to understand even after shrinking:** The shrinking strategy may be insufficient for the data type. Write a custom shrinker that targets the specific structure. Alternatively, add intermediate logging to the property to trace which sub-property fails.
+- **When property tests are too slow for CI (> 5 minutes):** Reduce the iteration count for PR runs (100 iterations). Run high-iteration tests (10,000+) as a nightly job. Consider whether some properties can be tested with smaller input ranges without losing coverage.
+- **When the team debates whether a property is correct:** The property may be encoding an assumption that does not hold. Review the specification or domain requirements. If the correct behavior is ambiguous, escalate to product/domain experts before encoding the property in a test.
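
Reviewer sketch (not part of the package): the fixed-seed, counterexample, and shrinking steps described in Phases 2 and 3 can be seen in miniature without any framework. The Python below is a stdlib-only toy, not how fast-check or hypothesis are implemented. `buggy_sort`, `gen_int_list`, `shrink`, and `check` are hypothetical names, and the shrinker is a simple greedy one that drops elements and then moves values one step toward zero.

```python
import random


def buggy_sort(xs):
    """Deliberately wrong: going through a set drops duplicate elements."""
    return sorted(set(xs))


def preserves_length(xs):
    """Property under test: sorting must not change the number of elements."""
    return len(buggy_sort(xs)) == len(xs)


def gen_int_list(rng, max_len=8, lo=-5, hi=5):
    """Generate a short list of small integers from a seeded RNG."""
    return [rng.randint(lo, hi) for _ in range(rng.randint(0, max_len))]


def shrink_candidates(xs):
    """Yield simpler variants: one element removed, or one value nearer zero."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        if x != 0:
            yield xs[:i] + [x - 1 if x > 0 else x + 1] + xs[i + 1:]


def shrink(xs, prop):
    """Greedily replace the input with any simpler variant that still fails."""
    current = xs
    improved = True
    while improved:
        improved = False
        for candidate in shrink_candidates(current):
            if not prop(candidate):
                current = candidate
                improved = True
                break
    return current


def check(prop, seed, num_runs=100):
    """Run prop on num_runs generated inputs; return failure details or None."""
    rng = random.Random(seed)  # fixed seed makes the whole run reproducible
    for _ in range(num_runs):
        xs = gen_int_list(rng)
        if not prop(xs):
            return {"seed": seed, "original": xs, "shrunk": shrink(xs, prop)}
    return None
```

Calling `check(preserves_length, seed=42)` twice returns the same failure record, which is the reproducibility the seed guidance asks for. For this property the greedy shrinker stops at a two-element list holding one repeated value, which is the kind of minimal counterexample worth copying into an explicit regression test.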