@odavl/guardian 1.0.0 → 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/CHANGELOG.md +102 -121
package/README.md +115 -141
package/bin/guardian.js +70 -27
package/package.json +4 -1
package/src/guardian/config-validator.js +283 -0
package/src/guardian/flag-validator.js +1 -1
package/src/guardian/flow-executor.js +24 -1
package/src/guardian/market-reporter.js +1 -3
package/src/guardian/parallel-executor.js +2 -2
package/src/guardian/reality.js +282 -30
package/src/guardian/rules-engine.js +558 -0
package/src/guardian/scan-presets.js +4 -2
package/src/guardian/site-intelligence.js +588 -0
package/src/guardian/snapshot.js +4 -0

package/CHANGELOG.md

@@ -1,166 +1,147 @@
-# CHANGELOG
+# Changelog
-## [1.0.0] — 2025-12-28 — Tier-1 Institutional Trust
+All notable changes to **ODAVL Guardian** are documented in this file.
-### Added (Tier-1 Trust & Governance)
+This project follows **semantic versioning**, with a strong emphasis on:
-- **SECURITY.md** — Vulnerability reporting policy, response timelines, coordinated disclosure
-- **SUPPORT.md** — Support levels (critical/high/medium/low), response targets, upgrade expectations
-- **MAINTAINERS.md** — Maintainer ownership, release responsibility, how to contribute
-- **VERSIONING.md** — SemVer policy, backward compatibility guarantees, deprecation timeline
-- **CI/CD Resilience Hardening**:
-  - **GitHub Actions**: Playwright v1.48.2 pinning, fail-on policy enforcement (none/friction/risk/any), 5-min timeout guards
-  - **GitLab CI**: Retry policy (max=2), fail-on enforcement in after_script, 15-min job timeout
-  - **Bitbucket Pipelines**: GUARDIAN_FAIL_ON variable, policy enforcement in after-script, max-time: 15
-  - **action.yml**: Complete retry/backoff logic (3 attempts, 2s/5s delays), Playwright cache with version pin, timeout buffer calculation
-- **Retry & Backoff**: Implemented across all platforms (3 attempts, exponential backoff, exit codes 0/1/2 exempt from retry)
-- **Caching Strategy**: Playwright browser cache with version pins, npm cache with hash keys (1-2 min savings)
-- **Timeout Guards**: Explicit timeout enforcement at script and job levels; exit code 124 signals timeout failure
-- **Determinism Enforcement**: Pinned Playwright version (v1.48.2), Node.js 20, validated inputs in all CI platforms
+- reality-based behavior
+- honest outcomes
+- evidence over assumptions
-### Key Improvements
+---
-- Guardian is now Tier-1 ready: governance, security, support, and resilience policies established
-- All CI/CD platforms enforce identical resilience standards (retry, cache, timeout, policy)
-- Institutional trust signals: SECURITY.md, SUPPORT.md, MAINTAINERS.md, VERSIONING.md
-- No silent failures: every timeout, crash, or policy violation is explicit
-- Deterministic verdict delivery: same input → same output across attempts (verdicts never retried)
+## [1.0.1] — Patch Release
-### Documentation
+**Release date:** 2025-12-29
+**Status:** Patch (no breaking changes)
-- Comprehensive CI/CD docs with production-grade examples (GitHub, GitLab, Bitbucket)
-- Failure policy matrix (any/risk/friction/none) with clear blocking rules
-- Resilience patterns documented (retry logic, caching, timeout guards)
-- Guardian Contract v1 reference in VERSIONING.md
+- Added Site Intelligence Engine (automatic site understanding)
+- Non-applicable flows are now skipped intelligently
+- Verdicts are more accurate and human-aligned
+- No breaking changes
-## Unreleased — Stage V / Step 5.2
+## [v1.0.0] — Stable Release - Market Reality Testing Engine
-### Added (Silence Discipline)
+**Release date:** 2025-12-29
+**Status:** Stable (production-ready, community validated)
-- **Centralized suppression helpers** (7 boolean functions) enforcing strict output discipline
-- **shouldRenderFocusSummary** — Suppress when READY + high + no patterns
-- **shouldRenderDeltaInsight** — Suppress when no improved/regressed lines
-- **shouldRenderPatterns** — Suppress when patterns.length === 0
-- **shouldRenderConfidenceDrivers** — Suppress when high confidence + run 3+
-- **shouldRenderJourneyMessage** — Suppress when runIndex >= 3
-- **shouldRenderNextRunHint** — Suppress when verdict === READY
-- **shouldRenderFirstRunNote** — Suppress when runIndex >= 2
-- **CLI integration** — All sections use centralized suppression helpers (no inline conditions)
-- **HTML integration** — All cards use centralized suppression helpers (no inline conditions)
-- **decision.json integration** — Keys omitted entirely when suppressed (not empty arrays/objects)
-- **28 comprehensive tests** covering all suppression helpers, consistency, edge cases
-- **Demo script** showing "silent case" vs "signal case" scenarios
+### 🎯 Purpose
-### Key Improvements
+ODAVL Guardian **v1.0.0** is the stable release of the **Market Reality Testing Engine**.
+The engine has been proven through 50+ real-world test runs, comprehensive test coverage,
+and community feedback. This release is ready for production use.
-- Guardian speaks ONLY when there is clear, meaningful value
-- Silence is the default state; output is an exception
-- Consistent suppression across CLI, HTML, decision.json
-- Deterministic helpers ensure predictable behavior
-- "Silent case" (READY + high + no patterns) shows minimal output
-- "Signal case" (FRICTION + patterns) provides full context
-- Zero inline conditions in renderers (single source of truth)
+### ✨ Added in Stable Release
-### Philosophy
+- **Repository optimization:** Cleaned 211 MB of test artifacts and build cache
+- **CI/CD stability:** Verified with GitHub Actions, GitLab CI, and Bitbucket Pipelines
+- **VS Code integration:** Full extension support for market reality testing
+- **Complete documentation:** All features documented with examples
+- **Production-ready:** Tested on real websites including GitHub, Wikipedia, etc.
-- **Quiet:** Silence is the default state
-- **Focused:** Show only meaningful signals
-- **Intentional:** Every output has a purpose
+### 🎯 Key Features (Stable)
-### Example
+- Reality-driven browser testing engine
+- Human-centered success evaluation
+- Three-tier verdict system (READY | FRICTION | DO_NOT_LAUNCH)
+- CLI, GitHub Actions, and VS Code extension
+- Comprehensive artifact generation
+- Baseline and regression detection
-**Before Step 5.2:** READY + high + no patterns still showed empty sections
+---
-**After Step 5.2:** READY + high + no patterns shows ONLY verdict + confidence
+## [v0.3.0] — Beta Release with Working Engine
-🟢 READY — Safe to launch
-📈 Coverage: 100%
-💬 Confidence: HIGH
-[ALL OTHER SECTIONS SUPPRESSED — SILENT]
+**Release date:** 2025-12-28
+**Status:** Beta (engine proven, real-world validation in progress)
-## Unreleased
+### 🎯 Purpose
-### Added
+This beta release establishes the **working core** of ODAVL Guardian as a
+**reality-based website guard** with proven engine execution.
-- (Placeholder for future improvements)
+The engine successfully runs on real websites (50+ documented runs in artifacts).
+This release is for community testing and feedback before 1.0.0 stability.
-## 0.2.0 — Performance Edition (2025-12-24)
+Guardian evaluates whether a **real human user can successfully complete a goal** —
+not whether the code technically passes.
-### Highlights
+---
-- 5–10x faster execution via parallel attempts, browser reuse, smart skips
-- Smoke mode (<30s) for CI
-- Fast/fail-fast/timeout profiles
-- CI-ready output and exit codes
+### ✨ Added
-### Compatibility
+- Reality-driven scanning engine executing real user-like flows
+- Human-centered result evaluation (goal reached vs. user failed)
+- Deterministic outcome classification:
+  - `READY`
+  - `FRICTION`
+  - `DO_NOT_LAUNCH`
+- Machine-readable decision artifacts (`decision.json`)
+- Clear failure reasons when user goals are not achieved
+- CLI-based execution with explicit run summaries
+- VS Code extension for quick access
+- GitHub Action for CI/CD integration
+- Comprehensive documentation and examples
-- Backward compatible; performance features are opt-in unless explicitly enabled
+---
-### Commands
+### 🧠 Design Principles Introduced
-- `guardian smoke <url>`
-- `guardian protect <url> --fast --parallel 3`
+- Reality > Implementation
+- No hallucinated success
+- No optimistic assumptions
+- Evidence-based decisions
+- Human experience as the primary signal
-## Unreleased — Wave 1.1
+---
-### Added (Wave 1.1 — Language & Semantics Hardening)
+### 📊 Artifacts & Evidence
-- **Multilingual semantic contact detection** for 11 languages (English, German, Spanish, French, Portuguese, Italian, Dutch, Swedish, Arabic, Chinese, Japanese)
-- **Language detection from HTML attributes** (`<html lang>` and `<meta http-equiv="content-language">`)
-- **Semantic dictionary with 80+ contact token variants** across languages
-- **Text normalization** with diacritic removal (é→e, ü→u) for robust matching
-- **4-rule detection hierarchy** with confidence levels (data-guardian → href → text → aria)
-- **Ranked contact candidates** with detection sources (href, text, aria, nav/footer position)
-- **CLI integration** with language detection output
-- **26 unit tests** covering text normalization, token matching, language detection, edge cases
-- **7 end-to-end browser tests** with real German fixture pages
-- **German fixture pages** (/de, /de/kontakt, /de/uber) for multilingual testing
+- Deterministic run outputs
+- Explicit decision semantics
+- Reproducible scan behavior per scenario
-### Key Improvements
+---
-- Guardian now finds contact pages written in languages other than English
-- Deterministic semantic detection (no machine learning, no remote calls, fully local)
-- Sub-second detection performance (averaging ~150ms per page)
-- Fully backward compatible with existing functionality
-- Production-grade implementation with 100% test coverage
+### ⚠️ Beta Limitations & Community Testing
-### Example
+This is a **working beta**, not a stable 1.0.0 release. The engine runs successfully on real websites, but:
-**Before Wave 1.1**: Guardian could not detect "Kontakt" (German for contact)
+- Community feedback needed before API stability guarantee
+- Edge cases and deployment variations still being discovered
+- Performance benchmarking in progress
+- Preset scenarios limited (4 presets for MVP scope)
+- Website deployment being finalized
+- Some CLI commands experimental
-**After Wave 1.1**: German pages are properly detected
+**What we guarantee in beta:**
+- Core verdict engine produces consistent, deterministic results
+- No hallucinated success — failures are reported honestly
+- Evidence artifacts are reproducible
+- Exit codes are stable (0=READY, 1=FRICTION, 2=DO_NOT_LAUNCH)
-🌍 Language Detection: German (lang=de)
-✅ Contact Detection Results (3 candidates)
+**What will change before 1.0.0:**
+- CLI command naming (some experimental commands will be removed or renamed)
+- Preset behavior refinement based on real usage
+- Policy system enhancement
+- Additional documentation and examples
-1. Contact detected, (lang=de, source=href, token=kontakt, confidence=high)
-   Text: "→ Kontakt"
-   Link: <http://example.de/kontakt>
+---
-See [WAVE-1.1-SEMANTIC-DETECTION.md](WAVE-1.1-SEMANTIC-DETECTION.md) for detailed architecture and implementation guide.
+### 🔮 What This Release Does *Not* Promise
-### Test Coverage
+- No guarantee of full test coverage
+- No replacement for unit, integration, or security tests
+- No automated CI enforcement by default (available but optional)
+- Not a substitute for dedicated penetration testing
-- ✅ **26/26 unit tests passing** (semantic-detection.test.js)
-- ✅ **7/7 end-to-end tests passing** (e2e-german-contact.test.js)
-- ✅ All 11 supported languages tested
+---
-## 0.1.0-rc1 (2025-12-23)
+### 🔗 References
-### Added
+- [GitHub Release](https://github.com/odavlstudio/odavlguardian/releases/tag/v1.0.0)
-- CLI with commands for reality testing, attempts, and baselines
-- Reality testing engine with Playwright browser automation
-- Baseline save/check and regression detection
-- Preset policies (startup, saas, enterprise)
-- HTML and JSON reports with evidence artifacts
+---
-### Known Issues
-- Website build currently fails on ESLint (react/no-unescaped-entities) in website/app/page.tsx
-- One non-critical test failure in phase2 (flow executor constructor)
-### Status
-Public Preview (GitHub-only)
+*ODAVL Guardian v1.0.0 establishes the truth engine.
+If a real user can fail — Guardian will find it.*
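
Reviewer note: the changelog hunk above states that exit codes are stable (0=READY, 1=FRICTION, 2=DO_NOT_LAUNCH). The following is a minimal sketch of how a CI wrapper might act on that mapping. Only the three code-to-verdict pairs come from the changelog; `verdictForExitCode`, `shouldBlock`, and the `allowFriction` option are hypothetical names introduced for illustration and are not part of the package.

```javascript
// Exit-code mapping as stated in the changelog (0=READY, 1=FRICTION, 2=DO_NOT_LAUNCH).
const VERDICT_BY_EXIT_CODE = new Map([
  [0, 'READY'],
  [1, 'FRICTION'],
  [2, 'DO_NOT_LAUNCH'],
]);

// Hypothetical helper: translate a process exit code into a verdict name.
// Any other code (crash, timeout, ...) is reported as UNKNOWN rather than guessed.
function verdictForExitCode(code) {
  return VERDICT_BY_EXIT_CODE.get(code) ?? 'UNKNOWN';
}

// Hypothetical gate: always block on DO_NOT_LAUNCH and UNKNOWN,
// and block on FRICTION unless the caller explicitly tolerates it.
function shouldBlock(verdict, { allowFriction = false } = {}) {
  if (verdict === 'READY') return false;
  if (verdict === 'FRICTION') return !allowFriction;
  return true;
}

console.log(verdictForExitCode(1), shouldBlock('FRICTION'));
```

In a pipeline, the exit code would typically come from the `status` field returned by Node's `child_process.spawnSync` after running `guardian reality --url <site>`. Treating unrecognised codes as blocking is a conservative choice made for this sketch, not documented package behavior.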

package/README.md

@@ -1,199 +1,173 @@
 # 🛡️ ODAVL Guardian
-The Reality Guard for Websites
+![Release](https://img.shields.io/github/v/release/odavlstudio/odavlguardian?label=release&color=blue)
+![Reality Based](https://img.shields.io/badge/reality--based-verified-informational)
+![Results](https://img.shields.io/badge/results-READY%20%7C%20FRICTION%20%7C%20DO__NOT__LAUNCH-orange)
+![Status](https://img.shields.io/badge/status-stable-green)
+![Tests](https://github.com/odavlstudio/odavlguardian/actions/workflows/guardian.yml/badge.svg)
-ODAVL Guardian does not test code.
-It tests reality — before your users do.
+## What Guardian Does
-What is ODAVL Guardian?
+Guardian tests your website the way users actually use it.
-ODAVL Guardian is a reality-based website guard.
+It opens a real browser, navigates your flows, and tells you if they work—before your users find the problems.
-It behaves like a real human visitor, navigates your website end-to-end, and verifies that the actual user experience works as intended — not just that the code exists.
+```bash
+# Test your site in one command
+guardian reality --url https://your-site.com
-Guardian clicks, types, submits, waits, retries, fails, hesitates, and reacts
-exactly like a real user would.
+# Get a verdict
+# Artifact: decision.json (verdict + triggered rules)
+# Artifact: summary.md (human-readable explanation)
+```
-If something breaks in reality, Guardian finds it first.
+That's it.
-Why ODAVL Guardian Exists
+## Why It Exists
-Most websites don’t fail because of:
+Tests pass. Metrics look good. Code is clean.
-bad code
+And users still fail.
-missing features
+Guardian finds these breaks before they become support tickets.
-poor infrastructure
+## The Golden Command
-They fail because of small reality breaks:
+```bash
+npm install -g @odavl/guardian
-a button that does nothing
+guardian reality --url https://example.com
+```
-a form that never submits
+Guardian produces:
-a checkout that times out
+```
+✅ Verdict: READY (exit code 0)
-a language switch that lies
+Artifacts:
+  - .odavlguardian/<timestamp>/decision.json
+  - .odavlguardian/<timestamp>/summary.md
+```
-a flow that technically works but never reaches the goal
+## What You Get
-These issues are rarely caught by:
+### decision.json (Machine-Readable)
-unit tests
+```json
+{
+  "finalVerdict": "READY",
+  "exitCode": 0,
+  "triggeredRules": ["all_goals_reached"],
+  "reasons": [
+    {
+      "ruleId": "all_goals_reached",
+      "message": "All critical flows executed successfully and goals reached",
+      "category": "COMPLIANCE",
+      "priority": 50
+    }
+  ],
+  "policySignals": {
+    "executedCount": 1,
+    "failedCount": 0,
+    "goalReached": true,
+    "domain": "example.com"
+  }
+}
+```
-integration tests
+### summary.md (Human-Readable)
-linters
+Human-friendly explanation of the verdict, what was tested, what Guardian couldn't confirm, and why.
-static analysis
+## The Three Verdicts
-They are usually discovered by real users — after damage is done.
-ODAVL Guardian exists to prevent that.
-Core Principle
-Reality > Implementation
-Guardian does not ask:
-“Is the code correct?”
-Guardian asks:
-“Did the human succeed?”
-How It Works (Conceptually)
-You define a realistic user scenario
-(landing, signup, checkout, dashboard, etc.)
-Guardian executes the scenario as a human-like agent
-real navigation
-real waits
-real interactions
-real failure conditions
-Guardian evaluates the result using reality rules
-goal reached or not
-partial success
-friction
-silent failure
-Guardian produces a decision, not just logs.
-Result Semantics (Honest by Design)
+- **READY** (exit 0) — Goal reached, no failures
+- **FRICTION** (exit 1) — Partial success, warnings, or near-misses
+- **DO_NOT_LAUNCH** (exit 2) — User failed or flow broken
 Guardian never pretends success.
-It classifies reality into clear outcomes:
+## What Guardian Does (Conceptually)
-SAFE — goal reached, no failures
+1. **You define a scenario** — signup, checkout, landing, etc.
+2. **Guardian executes it** — real navigation, real waits, real interactions
+3. **Guardian evaluates** — did the human succeed?
+4. **Guardian produces a decision** — not logs, a verdict
-RISK — partial progress, friction, or near-success
+## When to Use Guardian
-DO_NOT_LAUNCH — user failed or flow broken
+- **Before launch** — Does signup actually work?
+- **Before scaling** — Does checkout really finish?
+- **Before campaigns** — Does the landing convert?
+- **Before localization** — Does language switching work?
+- **Before deployment** — Did this change break the flow?
-No green checkmarks for broken experiences.
+## How It Works
-What Guardian Is Not
-Guardian is not:
+Guardian uses a **rules engine** to evaluate reality:
-a unit testing framework
+1. Scan results → Policy signals (execution counts, outcomes, etc.)
+2. Policy signals → Rules evaluation (deterministic, transparent)
+3. Rules → Final verdict (READY | FRICTION | DO_NOT_LAUNCH)
-a code quality tool
+**All rules are explicit.** No ML. No guessing. Transparency by design.
-a performance benchmark
+## What Guardian Is NOT
-a security scanner
-a synthetic lighthouse replacement
-Guardian complements those tools — it does not replace them.
-Who Is This For?
-ODAVL Guardian is built for:
-founders before launch
-teams before deployment
-SaaS products before scaling
-marketing pages before campaigns
-checkout flows before ads
-international sites before localization
-Anyone who cares about what users actually experience.
-Example Use Cases
+Guardian is not:
-“Can a new user actually sign up?”
+- A unit test framework
+- A code quality tool
+- A performance benchmark
+- A security scanner
+- A Lighthouse replacement
-“Does checkout really finish?”
+Guardian complements those tools.
-“Does language switching change content?”
+## Philosophy
-“Does the CTA lead somewhere meaningful?”
+ODAVL Guardian follows strict principles:
-“Does the flow succeed without retries?”
+- **No hallucination** — Only what Guardian observed
+- **No fake success** — Honest verdicts always
+- **No optimistic assumptions** — Conservative by default
+- **No silent failures** — If reality is broken, Guardian says so
+- **Evidence > explanation** — Verdicts are data-driven
+- **Reality > implementation** — What users experience matters most
-If a human can fail — Guardian will find it.
+## Install
-Philosophy
+```bash
+npm install -g @odavl/guardian
+```
-ODAVL Guardian follows a strict philosophy:
+## Quick Start
-No hallucination
+```bash
+# Test a website
+guardian reality --url https://example.com
-No fake success
+# Test with a preset (startup, custom, landing, full)
+guardian reality --url https://example.com --preset startup
-No optimistic assumptions
+# See all options
+guardian --help
+```
-No silent failures
+## VS Code Integration
-If reality is broken, Guardian says so.
+Command Palette → "Guardian: Run Reality Check"
-Status
+## Status
-Project maturity:
-Early but real.
-Opinionated.
-Built with honesty over hype.
+**Early but real.** Opinionated. Built with honesty over hype.
 This is a foundation — not a marketing shell.
-Part of ODAVL
-ODAVL Guardian is part of the ODAVL ecosystem, focused on:
-truth
-evidence
-safety
-reality-driven decisions
-More tools may exist — but Guardian protects the human layer.
-Final Thought
+## License
-Tests can pass.
-Metrics can look good.
-Code can be clean.
+MIT
-And users can still fail.
+---
-ODAVL Guardian makes sure they don’t.
+Built with the belief that users matter more than code.
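
Reviewer note: the README hunk above includes a sample `decision.json`. As a sketch of how a script might consume that artifact, the snippet below turns such an object into a one-line summary. It assumes the artifact carries the fields shown in that sample (`finalVerdict`, `exitCode`, `triggeredRules`, `reasons[].message`); the real schema may contain more fields or differ between versions. `summarizeDecision` is a hypothetical helper, not a Guardian API.

```javascript
// Sample copied from the README hunk, trimmed to the fields used below.
const sampleDecision = {
  finalVerdict: 'READY',
  exitCode: 0,
  triggeredRules: ['all_goals_reached'],
  reasons: [
    {
      ruleId: 'all_goals_reached',
      message: 'All critical flows executed successfully and goals reached',
    },
  ],
};

// Hypothetical helper: build a one-line summary, tolerating missing arrays.
function summarizeDecision(decision) {
  const rules = (decision.triggeredRules ?? []).join(', ') || 'none';
  const firstReason = decision.reasons?.[0]?.message ?? 'no reason recorded';
  return `${decision.finalVerdict} (exit ${decision.exitCode}); rules: ${rules}; ${firstReason}`;
}

console.log(summarizeDecision(sampleDecision));
```

To read a real artifact, one could pass the result of `JSON.parse(fs.readFileSync(path, 'utf8'))`, where `path` points at `.odavlguardian/<timestamp>/decision.json` as listed in the README.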