websec-validator (PyPI) version diff: 0.4.2.tar.gz → 0.5.0.tar.gz

Files changed (77)

{websec_validator-0.4.2/src/websec_validator.egg-info → websec_validator-0.5.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: websec-validator
-Version: 0.4.2
+Version: 0.5.0
 Summary: Local-first security recon that briefs your AI coding agent: facts + tailored probe scripts, code-in / artifacts-out. No LLM, no server, no running app.
 Author: Ricardo Accioly
 License: MIT
@@ -84,7 +84,7 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 > That's the whole user surface: **`run`** (plus the optional, advanced **`dynamic`** live-probing step below). `recon`/`proof`/`calibrate` exist for developing the tool itself and are hidden from `--help` — you never need them.
-## What it extracts (15 deterministic extractors, no LLM)
+## What it extracts (16 deterministic extractors, no LLM)
 | | Dimension | Notable output |
 |---|---|---|
@@ -99,10 +99,11 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 | schemas | data models + **privileged fields** | Pydantic/SQLAlchemy/Django/Prisma/Mongoose/TypeORM/Zod → `role`/`isAdmin`/`groupId` for mass-assignment targeting |
 | iac_ci | IaC + CI/CD | GHA injection, unpinned actions, tfstate, **CDK AppSync `API_KEY` anonymous-default-auth + WAF-as-control smell** |
 | client_exposure | browser leakage | public-var secrets by **name + value-shape (`da2-…`) + CDK build-injection**, server-secret-in-client, source maps |
-| **client_integrity** | tamperable display + **WS auth model** | wallet value without strict CSP / out-of-band anchor **+ the CSWSH determinant (ambient-cookie WS auth)** |
+| **client_integrity** | tamperable display (client trust boundary) + **WS auth model** | any security-critical sink value (address/IBAN/2FA-seed/API-key/webhook) the user reads or copies, without strict CSP / out-of-band anchor **+ client-tamper-vector, grindable-fingerprint, over-claimed-control, the CSWSH determinant** |
+| **transport_security** | CSP + HSTS header baseline | missing/weak CSP, inline event handlers, **partial HSTS (set on /api but not the HTML page)** |
 | **pii_exposure** | unmasked PII at the output boundary | `res.json(rawEntity)` with PII + **a masking control defined but with zero live call sites** (value-shape, not field-name) |
 | graphql | GraphQL surface | introspection (**AppSync `introspectionConfig: DISABLED`-aware**) / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA)** |
-| integrations | third-party + webhooks | webhooks missing signature verification |
+| integrations | third-party + webhooks **+ outbound-action endpoints** | unsigned webhooks **+ email/SMS/push handlers with no auth or IP-only rate-limit + redundant secret-fetch** |
 Plus **derived targeting** — IDOR / SSRF / open-redirect / upload / write / auth-endpoint
 candidates — so probes get pointed at the *exact* endpoints, not fired blindly.
@@ -184,7 +185,7 @@ upload, cross-tenant BOLA, role/authz gaps).
 ## Tests
 ```bash
-python3 -m unittest discover -s tests    # stdlib only, no Noir/network — 103 tests
+python3 -m unittest discover -s tests    # stdlib only, no Noir/network — 126 tests
 ```
 ## Releasing (maintainer)
@@ -216,7 +217,7 @@ managed-AppSync / VTL boundary**, **upload-security** + **PII-output-boundary**
 de-dup + **bundled Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger
 with **calibrated confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all
 scanners + Noir, arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA —
-validated live, reproduced a hand-pentest's 14/14). Validated against the **PTREQ0013000 pen test +
+validated live, reproduced a hand-pentest's 14/14). Validated against the **REF-PENTEST pen test +
 retest** (incl. correcting two findings the retest disproved: AppSync introspection *is* disablable
 engine-level, and API_KEY-default is anonymous-auth, not CSWSH).
 **Next:** dynamic write-verb BOLA + JWT/auth probes + ZAP/Nuclei two-role diff (gated, they mutate),

{websec_validator-0.4.2 → websec_validator-0.5.0}/README.md RENAMED

@@ -72,7 +72,7 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 > That's the whole user surface: **`run`** (plus the optional, advanced **`dynamic`** live-probing step below). `recon`/`proof`/`calibrate` exist for developing the tool itself and are hidden from `--help` — you never need them.
-## What it extracts (15 deterministic extractors, no LLM)
+## What it extracts (16 deterministic extractors, no LLM)
 | | Dimension | Notable output |
 |---|---|---|
@@ -87,10 +87,11 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 | schemas | data models + **privileged fields** | Pydantic/SQLAlchemy/Django/Prisma/Mongoose/TypeORM/Zod → `role`/`isAdmin`/`groupId` for mass-assignment targeting |
 | iac_ci | IaC + CI/CD | GHA injection, unpinned actions, tfstate, **CDK AppSync `API_KEY` anonymous-default-auth + WAF-as-control smell** |
 | client_exposure | browser leakage | public-var secrets by **name + value-shape (`da2-…`) + CDK build-injection**, server-secret-in-client, source maps |
-| **client_integrity** | tamperable display + **WS auth model** | wallet value without strict CSP / out-of-band anchor **+ the CSWSH determinant (ambient-cookie WS auth)** |
+| **client_integrity** | tamperable display (client trust boundary) + **WS auth model** | any security-critical sink value (address/IBAN/2FA-seed/API-key/webhook) the user reads or copies, without strict CSP / out-of-band anchor **+ client-tamper-vector, grindable-fingerprint, over-claimed-control, the CSWSH determinant** |
+| **transport_security** | CSP + HSTS header baseline | missing/weak CSP, inline event handlers, **partial HSTS (set on /api but not the HTML page)** |
 | **pii_exposure** | unmasked PII at the output boundary | `res.json(rawEntity)` with PII + **a masking control defined but with zero live call sites** (value-shape, not field-name) |
 | graphql | GraphQL surface | introspection (**AppSync `introspectionConfig: DISABLED`-aware**) / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA)** |
-| integrations | third-party + webhooks | webhooks missing signature verification |
+| integrations | third-party + webhooks **+ outbound-action endpoints** | unsigned webhooks **+ email/SMS/push handlers with no auth or IP-only rate-limit + redundant secret-fetch** |
 Plus **derived targeting** — IDOR / SSRF / open-redirect / upload / write / auth-endpoint
 candidates — so probes get pointed at the *exact* endpoints, not fired blindly.
@@ -172,7 +173,7 @@ upload, cross-tenant BOLA, role/authz gaps).
 ## Tests
 ```bash
-python3 -m unittest discover -s tests    # stdlib only, no Noir/network — 103 tests
+python3 -m unittest discover -s tests    # stdlib only, no Noir/network — 126 tests
 ```
 ## Releasing (maintainer)
@@ -204,7 +205,7 @@ managed-AppSync / VTL boundary**, **upload-security** + **PII-output-boundary**
 de-dup + **bundled Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger
 with **calibrated confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all
 scanners + Noir, arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA —
-validated live, reproduced a hand-pentest's 14/14). Validated against the **PTREQ0013000 pen test +
+validated live, reproduced a hand-pentest's 14/14). Validated against the **REF-PENTEST pen test +
 retest** (incl. correcting two findings the retest disproved: AppSync introspection *is* disablable
 engine-level, and API_KEY-default is anonymous-auth, not CSWSH).
 **Next:** dynamic write-verb BOLA + JWT/auth probes + ZAP/Nuclei two-role diff (gated, they mutate),

{websec_validator-0.4.2 → websec_validator-0.5.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "websec-validator"
-version = "0.4.2"
+version = "0.5.0"
 description = "Local-first security recon that briefs your AI coding agent: facts + tailored probe scripts, code-in / artifacts-out. No LLM, no server, no running app."
 readme = "README.md"
 requires-python = ">=3.11"

{websec_validator-0.4.2 → websec_validator-0.5.0}/src/websec_validator/constitution.py RENAMED

@@ -60,7 +60,7 @@ def build(facts: dict, ledger: dict | None = None) -> list:
     add("Secret hygiene", "Given the repo + git history, Then no live credential is present and no secret "
         "reaches the client bundle", "recon")
-    # P6 — Signing-secret integrity (forgeable JWT, PTREQ0013000 #8)
+    # P6 — Signing-secret integrity (forgeable JWT, REF-PENTEST #8)
     for sd in ((facts.get("auth", {}) or {}).get("insecure_secret_defaults", []) or [])[:5]:
         add("Signing-secret integrity", f"Given the signing-secret env var is unset, When the app boots, Then it "
             f"FAILS CLOSED — no hard-coded fallback ({sd.get('literal')!r} in {sd.get('file')})",

{websec_validator-0.4.2 → websec_validator-0.5.0}/src/websec_validator/extractors/__init__.py RENAMED

@@ -24,6 +24,7 @@ from .schemas import SchemasExtractor
 from .stack import StackExtractor
 from .surface import SurfaceExtractor
 from .tenant import TenantExtractor
+from .transport_security import TransportSecurityExtractor
 from .upload_security import UploadSecurityExtractor
 # Order matters: stack first (others read facts['stack']); authz after routes
@@ -41,6 +42,7 @@ REGISTRY: list[Extractor] = [
     IacCiExtractor(),
     ClientExposureExtractor(),
     ClientIntegrityExtractor(),
+    TransportSecurityExtractor(),
     PiiExposureExtractor(),
     GraphQLExtractor(),
     IntegrationsExtractor(),
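
The "order matters" comment in this hunk can be illustrated with a minimal sketch of an ordered registry sharing one `facts` dict. It is not the package's runner: `DemoStackExtractor`, `DemoTransportExtractor`, `run_all`, and the plain-dict `ctx` are hypothetical stand-ins, and storing each result under the extractor's `name` is an assumption. Only the `extract(ctx, facts)` shape and the `name` attribute come from the diff.

```python
class DemoStackExtractor:
    """Hypothetical stand-in: detects the stack and publishes it for later extractors."""
    name = "stack"

    def extract(self, ctx: dict, facts: dict) -> dict:
        found = ["express"] if "express" in ctx["package_json"] else []
        return {"frameworks": found}


class DemoTransportExtractor:
    """Hypothetical stand-in: reads facts['stack'], so it must run after the stack step."""
    name = "transport_security"

    def extract(self, ctx: dict, facts: dict) -> dict:
        frameworks = facts.get("stack", {}).get("frameworks", [])
        return {"checked_frameworks": frameworks}


# Order matters: the stack extractor runs first because later ones read its output.
REGISTRY = [DemoStackExtractor(), DemoTransportExtractor()]


def run_all(ctx: dict) -> dict:
    facts: dict = {}
    for extractor in REGISTRY:
        # Assumption: each result is stored under the extractor's name.
        facts[extractor.name] = extractor.extract(ctx, facts)
    return facts


facts = run_all({"package_json": '{"dependencies": {"express": "^4"}}'})
print(facts)
```

Reversing the two entries in this sketch would leave `checked_frameworks` empty, which is the kind of silent degradation the ordering comment guards against.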

{websec_validator-0.4.2 → websec_validator-0.5.0}/src/websec_validator/extractors/auth.py RENAMED

@@ -28,7 +28,7 @@ _COOKIE_RESERVED = {"get", "set", "getall", "has", "delete", "clear", "tostring"
                     "foreach", "entries", "keys", "values", "size", "name", "value", "length"}
 # Insecure DEFAULT signing secret — a hard-coded fallback on a secret/key var (the forgeable-JWT
-# class, PTREQ0013000 #8). JS/TS: `process.env.JWT_SECRET || 'dev-secret-do-not-use-in-prod'`;
+# class, REF-PENTEST #8). JS/TS: `process.env.JWT_SECRET || 'dev-secret-do-not-use-in-prod'`;
 # Python: os.environ.get('JWT_SECRET', 'dev-secret'). A quoted fallback on a *SECRET/*KEY var is
 # almost never benign — and if it's a dev-ish placeholder AND the repo actually signs JWTs, anyone
 # who reads the source can forge tokens for any user/role.
@@ -92,7 +92,7 @@ class AuthExtractor(Extractor):
                 for mm in SECRET_DEFAULT_PY.finditer(text):
                     secret_defaults.append((rel, mm.group(1)))
-        # Hard-coded fallback signing secret → forgeable-JWT lead (PTREQ0013000 #8). De-dup by
+        # Hard-coded fallback signing secret → forgeable-JWT lead (REF-PENTEST #8). De-dup by
         # (file, literal); mark dev-ish placeholders. findings.py escalates dev-ish + jwt-in-use to
         # CRITICAL; probes.stage seeds the literal into the hs256 brute-force candidate list.
         seen_sd: set = set()
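
The fallback-secret detection described in these comments can be sketched roughly as below. The package's real `SECRET_DEFAULT_*` regexes are not shown in this diff, so `JS_FALLBACK`, `PY_FALLBACK`, `DEVISH`, and `find_secret_defaults` are hypothetical names with illustrative patterns; only the two code shapes being matched and the de-dup by (file, literal) come from the comments above.

```python
import re

# Illustrative patterns only; the real regexes are not visible in this diff.
JS_FALLBACK = re.compile(
    r"process\.env\.[A-Z0-9_]*(?:SECRET|KEY)[A-Z0-9_]*\s*(?:\|\||\?\?)\s*['\"]([^'\"]+)['\"]"
)
PY_FALLBACK = re.compile(
    r"os\.environ\.get\(\s*['\"][A-Z0-9_]*(?:SECRET|KEY)[A-Z0-9_]*['\"]\s*,\s*['\"]([^'\"]+)['\"]"
)
# Assumed notion of a "dev-ish" placeholder literal.
DEVISH = re.compile(r"dev|test|secret|changeme|example", re.I)


def find_secret_defaults(files: dict[str, str]) -> list[dict]:
    """Collect hard-coded fallback literals, de-duplicated by (file, literal)."""
    seen: set[tuple[str, str]] = set()
    leads = []
    for rel, text in files.items():
        for pattern in (JS_FALLBACK, PY_FALLBACK):
            for match in pattern.finditer(text):
                literal = match.group(1)
                if (rel, literal) in seen:
                    continue
                seen.add((rel, literal))
                leads.append({"file": rel, "literal": literal,
                              "devish": bool(DEVISH.search(literal))})
    return leads


# Made-up sample sources keyed by relative path.
sample_files = {
    "api/auth.ts": "const s = process.env.JWT_SECRET || 'dev-secret-do-not-use-in-prod';\n"
                   "const t = process.env.JWT_SECRET || 'dev-secret-do-not-use-in-prod';",
    "svc/settings.py": "KEY = os.environ.get('JWT_SECRET', 'dev-secret')",
    "svc/ok.py": "KEY = os.environ['JWT_SECRET']  # no fallback: fails closed",
}
leads = find_secret_defaults(sample_files)
for lead in leads:
    print(lead)
```

The subscript form in `svc/ok.py` raises when the variable is unset, so it produces no lead; that is the fail-closed behaviour the constitution entry for signing-secret integrity asks for.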

{websec_validator-0.4.2 → websec_validator-0.5.0}/src/websec_validator/extractors/base.py RENAMED

@@ -16,14 +16,14 @@ SKIP_DIRS = {".git", "node_modules", "dist", "build", ".next", ".nuxt", "venv",
              ".venv", "__pycache__", ".mypy_cache", ".pytest_cache", "coverage",
              ".turbo", "out", "target", ".gradle", "vendor", "site-packages",
              ".terraform", "security", ".websec-out", "websec-out", ".cache",
-             ".svelte-kit", "storybook-static", ".serverless",
+             ".svelte-kit", "storybook-static", ".serverless", ".aws-sam", "cdk.out", ".sst", ".amplify",
              # agent tooling + editor dirs + worktree copies — not the target app
              ".wolf", ".claude", ".worktrees", ".idea", ".vscode", ".agent", ".agents"}
 CODE_EXT = {".js", ".jsx", ".ts", ".tsx", ".mjs", ".cjs", ".py", ".go", ".rb",
             ".java", ".php", ".prisma",
             # Managed-cloud surfaces: AppSync GraphQL SDL (@aws_* auth directives) + VTL
             # resolvers (where realtime/subscription authz actually lives, or is missing).
-            # PTREQ0013000 #2/#5 lived in these file types — previously invisible to every
+            # REF-PENTEST #2/#5 lived in these file types — previously invisible to every
             # iter_code()-based extractor. routes.py SPEC_PATH still splits .graphql/.gql out
             # of the route list so SDL doesn't generate phantom endpoints.
             ".graphql", ".gql", ".vtl"}

{websec_validator-0.4.2 → websec_validator-0.5.0}/src/websec_validator/extractors/client_exposure.py RENAMED

@@ -16,21 +16,32 @@ SECRETISH = re.compile(r"SECRET|PRIVATE|TOKEN|PASSWORD|PASSWD|API_?KEY|ACCESS_?K
 SERVER_SECRET = re.compile(r"process\.env\.([A-Z0-9_]*(?:SECRET|PRIVATE|TOKEN|PASSWORD|API_?KEY|ACCESS_?KEY)[A-Z0-9_]*)")
 # VALUE-aware leak detection — hardens the name-based scan above so it survives a benign rename
-# (the PTREQ0013000 #3 gap: a real key carried in a non-secret-named public var slips the name scan).
-# We match distinctive secret SHAPES, not var names. AppSync's `da2-` key has NO scanner rule at all,
-# so we always flag it; the generic shapes (which trivy/gitleaks already catch) are only flagged when
-# the file is client-reachable, to add the ships-to-browser angle without duplicating those scanners.
+# (the REF-PENTEST #3 gap: a real key carried in a non-secret-named public var slips the name scan).
+# We match distinctive secret SHAPES, not var names — CLOUD-AGNOSTIC by design (AWS + Azure + GCP +
+# generic), so the same value-leak detector works on a Next.js-on-Vercel, an Azure SWA, or a GCP
+# Firebase app alike. AppSync's `da2-` key has NO scanner rule at all, so we always flag it; the
+# generic shapes (which trivy/gitleaks already catch) are only flagged when the file is
+# client-reachable, to add the ships-to-browser angle without duplicating those scanners.
 SECRET_SHAPES = [
+    # AWS
     (re.compile(r"\bda2-[a-z0-9]{26}\b"), "AWS AppSync API key (da2-…)", True),
     (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "AWS access key id (AKIA)", False),
+    # GCP / Google
     (re.compile(r"\bAIza[0-9A-Za-z_\-]{35}\b"), "Google API key (AIza…)", False),
+    (re.compile(r"""["']type["']\s*:\s*["']service_account["']"""), "GCP service-account credential JSON", False),
+    # Azure
+    (re.compile(r"AccountKey=[A-Za-z0-9+/]{86}=="), "Azure Storage account key (AccountKey=…)", False),
+    (re.compile(r"DefaultEndpointsProtocol=https;AccountName="), "Azure Storage connection string", False),
+    (re.compile(r"[?&]sig=[A-Za-z0-9%/+]{43,}&se="), "Azure SAS token (sig=…&se=…)", False),
+    # cloud-neutral
+    (re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"), "Private-key PEM block (TLS / SSH / SA key)", False),
     (re.compile(r"\bsk_live_[0-9A-Za-z]{16,}\b"), "Stripe live secret key (sk_live_…)", False),
     (re.compile(r"\beyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{4,}\b"), "JWT (eyJ…)", False),
 ]
 # CDK build-time injection: a CloudFormation output / SSM param / Secret wired INTO a public build
 # var — e.g. CodeBuild `envFromCfnOutputs: { VITE_APPSYNC_API_KEY: appsyncApiKeyOutput }`. Invisible
 # to every secret scanner because the value isn't in source; it's injected at build time (the exact
-# mechanism that shipped the AppSync key to the browser in PTREQ0013000 #3).
+# mechanism that shipped the AppSync key to the browser in REF-PENTEST #3).
 CFN_TO_PUBLIC = re.compile(
     r"(?:envFromCfnOutputs|buildEnvironment|environmentVariables|partialBuildSpec)"
     r"[\s\S]{0,400}?((?:NEXT_PUBLIC_|VITE_|REACT_APP_|GATSBY_|EXPO_PUBLIC_)\w*)\s*[:=]\s*"
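
The value-shape scan in this hunk can be sketched as follows. This is a minimal illustration rather than the package's code: it reuses three of the patterns shown in the diff, while `scan_value_shapes`, its `client_reachable` parameter, and the sample strings are hypothetical. The third tuple element is assumed to mean "always flag", based on the comment that the `da2-` key is flagged regardless of where the file ships.

```python
import re

# Three patterns copied from the diff; the bool is assumed to mean "always flag".
SECRET_SHAPES = [
    (re.compile(r"\bda2-[a-z0-9]{26}\b"), "AWS AppSync API key (da2-…)", True),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "AWS access key id (AKIA)", False),
    (re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
     "Private-key PEM block (TLS / SSH / SA key)", False),
]


def scan_value_shapes(text: str, client_reachable: bool) -> list[str]:
    """Hypothetical helper: return the labels of secret shapes found in text."""
    hits = []
    for pattern, label, always_flag in SECRET_SHAPES:
        # Generic shapes are only reported when the file ships to the browser.
        if (always_flag or client_reachable) and pattern.search(text):
            hits.append(label)
    return hits


# Made-up values; the variable names deliberately do not look secret.
appsync_sample = 'const endpointTag = "da2-' + "a" * 26 + '";'
akia_sample = 'const region = "AKIA' + "A" * 16 + '";'
source = appsync_sample + "\n" + akia_sample

server_hits = scan_value_shapes(source, client_reachable=False)
client_hits = scan_value_shapes(source, client_reachable=True)
print(server_hits)
print(client_hits)
```

Because the match is on the value rather than the variable name, renaming `VITE_APPSYNC_API_KEY` to something innocuous does not hide the key from this kind of scan.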

websec_validator-0.5.0/src/websec_validator/extractors/client_integrity.py ADDED

@@ -0,0 +1,272 @@
+"""Client-trust-boundary / tamperable-display extractor — the man-in-the-browser (MITB) class.
+Generalized from the agent-wallet lesson: when an app renders a **security-critical sink value** —
+ANY value the user ACTS ON by reading or copying, where a silent swap causes irreversible loss or
+misdirection — that on-screen value is rewritable by code running in the victim's own browser
+(malware, a rogue extension, a poisoned JS dependency in the app's own bundle). TLS protects the
+wire, not the DOM.
+The sink set is deliberately GENERIC and classified by BLAST RADIUS, not by app type — the pen-test
+team's principle: detect by **data-flow role**, never by keyword/category. The same probe that finds
+a swapped crypto address finds a swapped IBAN, a swapped 2FA seed, or a swapped webhook URL. The
+keyword lists below are a STARTING SET, not the whole detector:
+  - money-movement   : crypto/wallet address, IBAN/routing/account/SWIFT, payee/pay-to        → HIGH
+  - credential       : 2FA/TOTP seed, recovery/mnemonic phrase, private/API/license key        → HIGH
+  - config/integrity : webhook/callback URL, DNS record, invoice payment instructions          → MEDIUM
+Severity tracks IRREVERSIBILITY; confidence stays LOW — this is an architectural "verify the
+compensating controls" lead, never a "your app is broken" claim. No web app can make on-screen
+display cryptographically tamper-proof; that's an inherent platform limit (it's why hardware wallets
+exist), accepted by Coinbase/MetaMask/banks alike.
+The two controls that actually move the needle:
+  Layer A (kill the SCALABLE vector): a strict Content-Security-Policy (`script-src 'self'` + a
+           nonce, no `unsafe-inline`/`unsafe-eval`) so an injected/supply-chain script can't run.
+           (The framework-agnostic CSP/HSTS *baseline* audit lives in `transport_security.py`.)
+  Layer B (anchor trust OFF the browser surface): an out-of-band verification path — emailed
+           canonical value, a short safety code / fingerprint, a server-rendered identicon, an
+           EIP-55 / IBAN checksum — so a single-surface tamper is at least *detectable*.
+Also emitted here (same trust boundary):
+  - weak-fingerprint  : a safety-code/fingerprint truncated to too few bits is grindable offline (#7).
+  - overclaimed-control: code or UI copy asserting a CLIENT-SIDE check is "tamper-proof" / "MitB-proof"
+    is a genuine finding — it makes teams overtrust a tripwire and under-invest in the real,
+    out-of-band/server-side control (#8).
+  - cswsh             : a WebSocket authenticated via an AMBIENT COOKIE (the CSWSH determinant).
+"""
+from __future__ import annotations
+import re
+from .base import Extractor, RepoContext
+# --- Security-critical sink values, classified by blast radius (severity ∝ irreversibility) ---
+SINK_MONEY = re.compile(
+    r"\b(?:wallet|receive|receiving|deposit|recipient|payout|beneficiary|payment|destination|payee)[_-]?address\b"
+    r"|\bwalletAddress\b|\btoAddress\b|\bpayTo\b|\bpayee\b|\brouting[_-]?number\b|\baccount[_-]?number\b"
+    r"|\biban\b|\bswift[_-]?code\b|\bsort[_-]?code\b|\b0x[0-9a-fA-F]{40}\b"
+    r"|crypto.{0,12}address|blockchain.{0,12}address", re.I)
+SINK_CREDENTIAL = re.compile(
+    r"\b(?:totp|2fa|mfa|authenticator)[_-]?(?:seed|secret|key)\b|\botpauth://"
+    r"|\b(?:recovery|seed|mnemonic)[_-]?phrase\b|\bmnemonic\b|\bprivate[_-]?key\b|\brecovery[_-]?code\b"
+    r"|\b(?:api|license|licence|access)[_-]?key\b|\bclient[_-]?secret\b", re.I)
+SINK_CONFIG = re.compile(
+    r"\bwebhook[_-]?url\b|\bcallback[_-]?url\b|\bdns[_-]?record\b|\bnameserver\b|\bcname[_-]?record\b"
+    r"|\binvoice[\s\S]{0,18}(?:account|iban|instructions|number)\b", re.I)
+# --- Sink-role signals: the value is demonstrably SEEN/COPIED/LINKED (data-flow gate) ---
+QR_SIGNAL = re.compile(r"\bqr[\s_-]?code\b|QRCode|react-qr|qrcode\.react|toDataURL\(", re.I)
+CLIPBOARD = re.compile(r"navigator\.clipboard|clipboard\.writeText|copyToClipboard|useCopyToClipboard"
+                       r"|writeText\(|execCommand\(\s*['\"]copy")
+HREF_SINK = re.compile(r"href=\{|href=['\"](?:tel:|mailto:|bitcoin:|ethereum:|lightning:)"
+                       r"|\b(?:to|toAddress|recipient|amount|payee)\s*=\s*\{")
+# #2 — the sink value arrives over a client-side round-trip the browser (and a MitB) can intercept,
+# rather than being server-rendered. A newly-added client fetch for a once-server-rendered value is a
+# regression in itself (manufactures a tamper vector).
+CLIENT_FETCH = re.compile(r"\bfetch\(|\baxios\b|useSWR\b|useQuery\b|useLazyQuery\b|\$\.(?:ajax|get|post)\b"
+                          r"|XMLHttpRequest|\.get\(['\"]/(?:api|v\d)|graphql\b", re.I)
+# Layer A — strict CSP detection (kept self-contained; transport_security.py owns the baseline audit)
+CSP_PRESENT = re.compile(r"Content-Security-Policy|contentSecurityPolicy", re.I)
+CSP_SCRIPT_SELF = re.compile(r"script-src[^;'\"]*'self'", re.I)
+CSP_NONCE = re.compile(r"'nonce-|nonce-\$\{|\bstrict-dynamic\b", re.I)
+CSP_UNSAFE = re.compile(r"'unsafe-(?:inline|eval)'", re.I)
+# Layer B — out-of-band trust anchor detection
+OOB_ANCHOR = re.compile(
+    r"safety[_-]?code|safetyCode|fingerprint|identicon|blockie|jazzicon|emoji[_-]?code"
+    r"|out[_-]of[_-]band|toChecksumAddress|getAddress\(|checksumAddress|\beip[_-]?55\b|verifyAddress"
+    r"|address[_-]?verif|verif\w*[_-]?address|sendVerificationEmail|canonical[_-]?address|mod[_-]?97", re.I)
+# #7 — a fingerprint / safety-code derived from a TRUNCATED hash is grindable offline. Flag a hash/HMAC
+# sliced to a small char count (hex → 4 bits/char, so .slice(0,12) ≈ 48 bits < the 60-bit floor), or a
+# *code variable sliced short. Heuristic robustness note, not a deterministic vuln.
+WEAK_FINGERPRINT = re.compile(
+    r"(?:sha256|sha1|sha512|md5|createHash|createHmac|\bhmac\b|digest)\b[\s\S]{0,90}?"
+    r"\.(?:slice|substring|substr)\(\s*0\s*,\s*([1-9]|1[0-4])\b"
+    r"|(?:safety|finger|verif|short|otp)[_-]?code\b[\s\S]{0,50}?\.(?:slice|substring|substr)\(\s*0\s*,\s*([1-9]|1[0-4])\b",
+    re.I)
+# #8 — dishonest control framing: a CLIENT-side check asserted to be unbeatable. Genuine finding.
+OVERCLAIM = re.compile(
+    r"tamper[\s_-]?proof|tamper[\s_-]?resistant|mitb[\s_-]?proof|man-in-the-browser[\s_-]?proof"
+    r"|impossible to (?:tamper|forge|fake|modify|intercept)|cryptographically (?:guaranteed|proven|secure)"
+    r"|can(?:'|no)?t be (?:tampered|forged|faked|modified|intercepted)|unhackable|100% (?:secure|safe)", re.I)
+# WebSocket / realtime auth model — the CSWSH determinant (REF-PENTEST #4). CSWSH is only
+# exploitable when the socket authenticates via an AMBIENT COOKIE the browser auto-attaches
+# cross-origin. A token in the connection payload / subprotocol, stored origin-scoped, is NOT
+# exploitable (SOP blocks a cross-origin page from reading it).
+WS_USAGE = re.compile(r"new\s+WebSocket\(|socket\.io|graphql-ws|subscriptions-transport-ws|appsync-realtime"
+                      r"|\bwss?://", re.I)
+WS_COOKIE_AUTH = re.compile(r"withCredentials\s*:\s*true|credentials\s*:\s*['\"]include['\"]"
+                            r"|document\.cookie[\s\S]{0,80}?(?:socket|ws\b|websocket)", re.I)
+class ClientIntegrityExtractor(Extractor):
+    name = "client_integrity"
+    category = "exposure"
+    def extract(self, ctx: RepoContext, facts: dict) -> dict:
+        sinks: dict[str, str] = {}          # rel -> blast radius (money|credential|config)
+        qr_files, clip_files = [], []
+        csp_present = csp_self = csp_nonce = csp_unsafe = False
+        oob, weak_fp, overclaim, tamper_vectors = [], [], [], []
+        ws_usage = ws_cookie = False
+        for _p, rel, text in ctx.iter_code():
+            has_copy = bool(CLIPBOARD.search(text) or QR_SIGNAL.search(text) or HREF_SINK.search(text))
+            # genuine browser-DISPLAY surface: a frontend file by extension, an explicit client component,
+            # or a known client-framework marker — NOT a backend service/repository that merely references
+            # an `account`/`recipient` field (the real-repo FP: backend message processors, SDK models).
+            client_file = (rel.lower().endswith((".tsx", ".jsx", ".vue", ".svelte", ".astro", ".html", ".hbs"))
+                           or "use client" in text[:400] or "@Component(" in text
+                           or "customElements.define" in text or "LitElement" in text)
+            # money sinks are specific on a client surface; the broader credential/config set additionally
+            # requires a copy/QR/href signal so a stray `apiKey` reference isn't noise.
+            radius = None
+            if client_file and SINK_MONEY.search(text):
+                radius = "money"
+            elif client_file and has_copy and SINK_CREDENTIAL.search(text):
+                radius = "credential"
+            elif client_file and has_copy and SINK_CONFIG.search(text):
+                radius = "config"
+            if radius:
+                sinks.setdefault(rel, radius)
+                if CLIENT_FETCH.search(text):   # #2 — sink fed by an interceptable client round-trip
+                    tamper_vectors.append(rel)
+            if QR_SIGNAL.search(text) and len(qr_files) < 30:
+                qr_files.append(rel)
+            if CLIPBOARD.search(text) and len(clip_files) < 30:
+                clip_files.append(rel)
+            if CSP_PRESENT.search(text):
+                csp_present = True
+                if CSP_SCRIPT_SELF.search(text):
+                    csp_self = True
+                if CSP_NONCE.search(text):
+                    csp_nonce = True
+                if CSP_UNSAFE.search(text):
+                    csp_unsafe = True
+            if OOB_ANCHOR.search(text) and len(oob) < 20:
+                oob.append(rel)
+            if client_file and WEAK_FINGERPRINT.search(text) and len(weak_fp) < 20:
+                weak_fp.append(rel)   # client-side safety code only — a backend HMAC truncation is out of scope
+            if client_file and OVERCLAIM.search(text) and len(overclaim) < 20:
+                overclaim.append(rel)
+            if WS_USAGE.search(text):
+                ws_usage = True
+            if WS_COOKIE_AUTH.search(text):
+                ws_cookie = True
+        # strict = a real `script-src 'self'` (+ a nonce / strict-dynamic) with NO unsafe-inline/eval
+        strict_csp = bool(csp_present and csp_self and csp_nonce and not csp_unsafe)
+        out_of_band = bool(oob)
+        ws_cookie_auth = bool(ws_usage and ws_cookie)   # the CSWSH determinant (ambient-cookie WS auth)
+        radii = set(sinks.values())
+        present = bool(sinks)
+        # severity tracks blast radius: a money/credential sink swap is irreversible → HIGH.
+        high_blast = bool(radii & {"money", "credential"})
+        sev_csp = "HIGH" if high_blast else "MEDIUM"
+        sev_oob = "MEDIUM" if high_blast else "LOW"
+        findings = []
+        if present:
+            shown = ", ".join(sorted(sinks)[:5])
+            kinds = "/".join(sorted(radii))
+            if not strict_csp:
+                why = ("no Content-Security-Policy found" if not csp_present
+                       else "CSP allows 'unsafe-inline'/'unsafe-eval' in script-src" if csp_unsafe
+                       else "CSP present but not a strict script-src 'self' + nonce policy")
+                findings.append({
+                    "severity": sev_csp, "confidence": "LOW", "attack_class": "tamperable-display",
+                    "file": sorted(sinks)[0],
+                    "issue": "security-critical value rendered client-side without a strict CSP",
+                    "detail": f"This app renders a {kinds}-class sink value the user reads/copies ({shown}) but "
+                              f"{why}. A poisoned dependency or injected script (man-in-the-browser) can then "
+                              "rewrite the displayed/copied value or swap the QR for EVERY user at once (the scalable "
+                              "vector). Add Layer A: `script-src 'self'` + per-request nonce + `strict-dynamic`, no "
+                              "unsafe-inline/eval, object-src 'none'. (Ship report-only first to avoid breaking SDKs, "
+                              "then enforce.) Severity tracks irreversibility — a swapped money/credential value is "
+                              "unrecoverable."})
+            if not out_of_band:
+                findings.append({
+                    "severity": sev_oob, "confidence": "LOW", "attack_class": "tamperable-display",
+                    "file": sorted(sinks)[0],
+                    "issue": "no out-of-band trust anchor for the displayed security-critical value",
+                    "detail": f"No second, browser-independent source of truth was found for {shown} "
+                              "(emailed canonical value, a short safety code / fingerprint, a server-rendered "
+                              "identicon, an EIP-55 / IBAN-mod-97 checksum). Without one, a single-surface tamper is "
+                              "undetectable by the user. Add Layer B: anchor trust OFF the browser surface so the user "
+                              "can cross-check. NOTE: on-screen display can never be made cryptographically "
+                              "tamper-proof on the web — the goal is detectable, not impossible."})
+        # #2 — sink value arrives via an interceptable client round-trip (server-render or sign it)
+        if present and tamper_vectors:
+            findings.append({
+                "severity": sev_oob, "confidence": "LOW", "attack_class": "client-tamper-vector",
+                "file": sorted(set(tamper_vectors))[0],
+                "issue": "security-critical value populated by a client-side fetch (interceptable in the browser)",
+                "detail": f"The sink value in {', '.join(sorted(set(tamper_vectors))[:4])} is populated by a client-side "
+                          "fetch/XHR whose response the browser — and a man-in-the-browser — can intercept and rewrite, "
+                          "rather than being server-rendered. Prefer server-render; if a round-trip is unavoidable, SIGN "
+                          "the payload and verify integrity, don't trust raw response fields. A NEWLY-added client "
+                          "round-trip for a once-server-rendered value is itself a regression."})
+        # #7 — grindable fingerprint/safety-code (robustness note, only meaningful when a sink exists)
+        if present and weak_fp:
+            findings.append({
+                "severity": "LOW", "confidence": "LOW", "attack_class": "weak-fingerprint",
+                "file": sorted(set(weak_fp))[0],
+                "issue": "safety-code / fingerprint derived from a truncated hash (grindable)",
+                "detail": f"A fingerprint/safety-code in {', '.join(sorted(set(weak_fp))[:4])} is a hash/HMAC sliced "
+                          "to a small character count. ~40-48 bits is brute-forceable on a commodity GPU in hours, so "
+                          "an attacker can grind a tampered value that yields a MATCHING code. Target >=60 bits, kept "
+                          "human-comparable (grouped base32, e.g. XXXX-XXXX-XXXX). Verify the slice length / encoding."})
+        # #8 — over-claimed control framing (genuine finding: it manufactures misplaced trust)
+        if overclaim:
+            findings.append({
+                "severity": "LOW", "confidence": "MEDIUM", "attack_class": "overclaimed-control",
+                "file": sorted(set(overclaim))[0],
+                "issue": "client-side check framed as tamper-proof / cryptographically guaranteed",
+                "detail": f"Code or UI copy in {', '.join(sorted(set(overclaim))[:4])} asserts a CLIENT-SIDE control "
+                          "is tamper-proof / MitB-proof / cryptographically guaranteed. On the web that claim is false "
+                          "(the DOM is rewritable post-TLS) and it's a real finding: it makes teams and auditors "
+                          "OVERTRUST a tripwire and under-invest in the actual out-of-band / server-side control. "
+                          "Scope the claim honestly ('opportunistic tamper tripwire, not a guarantee') and ensure the "
+                          "trust root is out-of-band or server-side."})
+        # CSWSH is ONLY real when the WS auth is an ambient cookie (REF-PENTEST #4).
+        if ws_cookie_auth:
+            findings.append({
+                "severity": "MEDIUM", "confidence": "LOW", "attack_class": "cswsh",
+                "issue": "WebSocket authenticated via an ambient cookie (Cross-Site WebSocket Hijacking)",
+                "detail": "A WebSocket/realtime connection appears to authenticate via a cookie "
+                          "(withCredentials / credentials:'include'), which the browser auto-attaches "
+                          "cross-origin — so a page on any origin can open an authenticated socket (CSWSH, #4). "
+                          "Validate the Origin on the handshake, or move the credential into the connection "
+                          "payload / subprotocol and store it origin-scoped (not a cookie). If WS auth is "
+                          "already a token in the payload, CSWSH is NOT exploitable."})
+        return {
+            "sensitive_display": sorted(sinks),
+            "sink_blast_radius": dict(sorted(sinks.items())),
+            "websocket_auth": ("cookie (CSWSH-exposed — validate Origin)" if ws_cookie_auth
+                               else "token-or-none (CSWSH not exploitable)" if ws_usage
+                               else "no websocket detected"),
+            "qr_generation": sorted(set(qr_files)),
+            "clipboard_copy": sorted(set(clip_files)),
+            "strict_csp": strict_csp,
+            "csp_present": csp_present,
+            "csp_has_unsafe": csp_unsafe,
+            "out_of_band_anchor": out_of_band,
+            "anchors_found": sorted(set(oob)),
+            "weak_fingerprints": sorted(set(weak_fp)),
+            "overclaimed_controls": sorted(set(overclaim)),
+            "client_fetch_sinks": sorted(set(tamper_vectors)),
+            "findings": findings,
+            "note": (f"Renders {'/'.join(sorted(radii))}-class security-critical value(s) — review man-in-the-browser "
+                     "exposure: strict CSP (kill the scalable vector) + an out-of-band anchor (make tamper "
+                     "detectable). Inherent web-platform limit; treat as architectural, LOW-confidence." if present else
+                     "No security-critical display values detected — MITB/tamperable-display class N/A."),
+        }
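
The weak-fingerprint finding above turns on simple arithmetic: a truncated code carries at most (characters × log2(alphabet size)) bits. The sketch below works that out for a 10-character hex slice and for the grouped-base32 shape the finding recommends. `fingerprint_bits` and `safety_code` are hypothetical helpers written for illustration, not part of websec-validator, and the use of SHA-256 is an assumption.

```python
import base64
import hashlib
import math


def fingerprint_bits(n_chars: int, alphabet_size: int) -> float:
    """Upper bound on the entropy of a code of n_chars symbols from one alphabet."""
    return n_chars * math.log2(alphabet_size)


def safety_code(value: bytes, groups: int = 3, group_len: int = 4) -> str:
    """Hypothetical helper: SHA-256 the value, base32-encode it, keep
    groups * group_len characters and join them as XXXX-XXXX-XXXX."""
    encoded = base64.b32encode(hashlib.sha256(value).digest()).decode("ascii")
    kept = encoded[: groups * group_len]
    return "-".join(kept[i:i + group_len] for i in range(0, len(kept), group_len))


print(fingerprint_bits(10, 16))  # 40.0 -> inside the range the finding calls grindable
print(fingerprint_bits(12, 32))  # 60.0 -> meets the ">=60 bits" target
print(safety_code(b"example-security-critical-value"))
```

Twelve base32 characters reach the 60-bit target while staying short enough to compare by eye, which is presumably why the finding suggests that grouping.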

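The CSWSH finding above recommends validating the Origin on the WebSocket handshake. A minimal, framework-neutral sketch of that check follows. `origin_allowed` and `ALLOWED_ORIGINS` are hypothetical names, and how the handshake headers reach the function depends on the WebSocket server in use.

```python
from typing import Mapping

# Hypothetical allowlist: replace with the origins that really serve the app.
ALLOWED_ORIGINS = frozenset({"https://app.example.com"})


def origin_allowed(headers: Mapping[str, str],
                   allowed: frozenset = ALLOWED_ORIGINS) -> bool:
    """Return True only when the handshake's Origin header is on the allowlist."""
    # Header names are case-insensitive, so normalise the keys first.
    normalised = {name.lower(): value for name, value in headers.items()}
    origin = normalised.get("origin")
    if origin is None:
        # Browsers send Origin on WebSocket handshakes. Rejecting its absence
        # is a policy choice that also blocks non-browser clients.
        return False
    return origin.lower() in allowed


print(origin_allowed({"Origin": "https://app.example.com"}))  # True
print(origin_allowed({"Origin": "https://evil.example"}))     # False
print(origin_allowed({}))                                     # False
```

An exact-match allowlist is used here instead of a substring or suffix test, since loose matching on the Origin string is a common way for such checks to be bypassed.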
{websec_validator-0.4.2 → websec_validator-0.5.0}/src/websec_validator/extractors/graphql.py RENAMED

@@ -22,7 +22,7 @@ PLAYGROUND = re.compile(r"playground\s*:\s*true|graphiql\s*:\s*true|LandingPageG
 LIMITING = re.compile(r"graphql-depth-limit|depthLimit|costAnalysis|graphql-cost-analysis|"
                       r"createComplexityLimitRule|query-complexity|graphql-armor")
-# --- AppSync / managed GraphQL (PTREQ0013000 #2 introspection-via-WAF-bypass, #5 sub-authz) ---
+# --- AppSync / managed GraphQL (REF-PENTEST #2 introspection-via-WAF-bypass, #5 sub-authz) ---
 APPSYNC_MARK = re.compile(r"appsync\.GraphqlApi|CfnGraphQLApi|Definition\.fromSchema|aws-appsync|aws_appsync", re.I)
 AWS_AUTH_DIRECTIVE = re.compile(r"@aws_(?:api_key|iam|oidc|cognito_user_pools|auth|subscribe)")
 # A Subscription field that carries a tenant-scoping arg MUST be authz-bound in its resolver, or any
@@ -33,7 +33,7 @@ TENANT_ARG = re.compile(r"\b(\w+)\s*\(([^)]*\b(?:groupId|group_id|orgId|org_id|t
 # Identity-binding signals in a VTL resolver — the field is tied to the CALLER, not a free arg.
 VTL_AUTHZ = re.compile(r"\$ctx(?:tx)?\.identity|\$context\.identity|identity\.(?:sub|username|claims|resolverContext)"
                        r"|util\.unauthorized|\bgroupIds?\b[\s\S]{0,80}?\bcontains\b|#if\s*\(\s*!?\s*\$ctx\.identity")
-# Engine-level introspection disable on aws-cdk-lib appsync.GraphqlApi. The PTREQ0013000 RETEST
+# Engine-level introspection disable on aws-cdk-lib appsync.GraphqlApi. The REF-PENTEST RETEST
 # proved this IS available and un-bypassable (unlike a WAF string-match) — so a correctly-configured
 # AppSync API must NOT be flagged. This corrects the 0.3.0 false positive that always cried wolf.
 APPSYNC_INTROSPECTION_OFF = re.compile(r"introspectionConfig\s*:\s*[\w.]*\bDISABLED\b")
@@ -93,7 +93,12 @@ class GraphQLExtractor(Extractor):
                                  "detail": "Set `introspectionConfig: appsync.IntrospectionConfig.DISABLED` so the engine "
                                            "rejects __schema/__type regardless of encoding. A WAF byte-match on `__schema` "
                                            "is NOT sufficient — bypassable via Unicode/JSON escapes and it only fronts one "
-                                           "endpoint (PTREQ0013000 #2). Run the appsync-introspection probe to confirm."})
+                                           "endpoint (REF-PENTEST #2). Fronting AppSync with API Gateway is ALSO not the "
+                                           "fix: it proxies POST /graphql opaquely (it can't parse the query to block "
+                                           "introspection without the same bypassable string-match) and does not cover the "
+                                           "SEPARATE realtime WebSocket endpoint, so subscription-BOLA / CSWSH remain — fix "
+                                           "at the engine/auth layer, treat any gateway/WAF as defense-in-depth only. Run "
+                                           "the appsync-introspection probe to confirm."})
             if not (appsync_limiting or limiting):
                 findings.append({"severity": "LOW", "issue": "AppSync has no query depth / resolver-count limit",
                                  "attack_class": "graphql",
@@ -131,7 +136,7 @@ class GraphQLExtractor(Extractor):
     def _subscription_authz(self, ctx: RepoContext, schema_texts: list, findings: list) -> list:
         """For each Subscription field carrying a tenant-scoping arg, check a co-located VTL resolver
         binds that arg to the caller's identity. Missing/passthrough VTL → cross-group BOLA: any
-        authenticated user subscribes to any tenant's stream (PTREQ0013000 #5). Verified shape:
+        authenticated user subscribes to any tenant's stream (REF-PENTEST #5). Verified shape:
         the fixed (identity-bound) VTL PASSES; the pre-fix passthrough FIRES."""
         vtl_corpus = {ctx.rel(p): ctx.text(p) for p in ctx.glob("**/*.vtl", 300)}
         results = []
@@ -155,7 +160,7 @@ class GraphQLExtractor(Extractor):
                     detail = (f"Subscription `{field}({args})` accepts a tenant arg but its VTL resolver does NOT bind "
                               f"it to the caller's identity ($ctx.identity / groupIds.contains / util.unauthorized) — "
                               f"any authenticated user can subscribe to ANY tenant's stream (cross-group BOLA, "
-                              f"PTREQ0013000 #5).")
+                              f"REF-PENTEST #5).")
                 results.append({"field": field, "verdict": verdict, "severity": sev})
                 if sev != "OK":
                     findings.append({"severity": sev, "attack_class": "bola",
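
The introspection finding in this file states that a WAF byte-match on `__schema` is bypassable via JSON escapes. The sketch below shows why, under the assumption that the WAF rule is a plain substring test on the raw request body. `waf_byte_match` is a hypothetical stand-in for such a rule, not an AWS API.

```python
import json

# Raw HTTP body as a byte-matching rule would see it: both underscores are
# written as JSON \u005f escapes, so the literal bytes "__schema" never appear.
raw_body = b'{"query": "{ \\u005f\\u005fschema { types { name } } }"}'


def waf_byte_match(body: bytes, needle: bytes = b"__schema") -> bool:
    """Hypothetical stand-in for a WAF byte-match rule on the request body."""
    return needle in body


# The GraphQL engine only sees the query after JSON decoding.
parsed_query = json.loads(raw_body)["query"]

print(waf_byte_match(raw_body))    # False: the rule does not fire
print("__schema" in parsed_query)  # True: the engine receives an introspection query
```

Because the decoded query is what the engine executes, disabling introspection at the engine is the control that holds regardless of how the request is encoded.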

{websec_validator-0.4.2 → websec_validator-0.5.0}/src/websec_validator/extractors/iac_ci.py RENAMED

@@ -19,7 +19,7 @@ UNTRUSTED = re.compile(
 USES = re.compile(r"uses:\s*([^\s@#]+)@([^\s#'\"]+)")
 SHA40 = re.compile(r"^[0-9a-f]{40}$")
-# CDK / managed-AppSync auth (PTREQ0013000 #4 CSWSH, + the #2/#5 attack surface). Regex over CDK
+# CDK / managed-AppSync auth (REF-PENTEST #4 CSWSH, + the #2/#5 attack surface). Regex over CDK
 # TypeScript, not an AST — aliased/helper-extracted constructs can evade it (honest FN risk).
 APPSYNC_API = re.compile(r"appsync\.GraphqlApi|new\s+GraphqlApi|CfnGraphQLApi|aws-cdk-lib/aws-appsync|@aws-cdk/aws-appsync")
 # defaultAuthorization block resolving to API_KEY → the realtime/WebSocket endpoint takes a static
@@ -30,7 +30,7 @@ APPSYNC_APIKEY_MODE = re.compile(r"AuthorizationType\.API_KEY|authorizationType\
 WAFV2 = re.compile(r"wafv2\.CfnWebACL|\bCfnWebACL\b|aws_wafv2|wafv2\.CfnWebACLAssociation")
 WAF_ASSOC = re.compile(r"CfnWebACLAssociation|WebACLAssociation")
 # WAF used as the PRIMARY control for an app-layer flaw — a bypassable band-aid, not a remediation
-# (PTREQ0013000 #2/#11). A byteMatchStatement/regex matching `__schema`, SQL keywords or `<script`
+# (REF-PENTEST #2/#11). A byteMatchStatement/regex matching `__schema`, SQL keywords or `<script`
 # means the app-layer bug is still there; the string-match is evadable via encoding + only one door.
 WAF_APPLAYER_MATCH = re.compile(
     r"(?:byteMatchStatement|searchString|RegexPatternSet|regexString)[\s\S]{0,220}?"
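
The `USES` and `SHA40` patterns in the context lines of the first hunk of this file are enough to sketch an unpinned-action check. How the extractor itself combines them is not visible in this diff, so `unpinned_actions` below is a hypothetical illustration and the workflow text is made up.

```python
import re

# Both patterns are copied verbatim from the context lines of the hunk.
USES = re.compile(r"uses:\s*([^\s@#]+)@([^\s#'\"]+)")
SHA40 = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list:
    """Return (action, ref) pairs whose ref is not a full 40-hex commit SHA."""
    return [(action, ref)
            for action, ref in USES.findall(workflow_text)
            if not SHA40.match(ref)]


# Made-up workflow fragment: one tag reference, one full-SHA reference.
WORKFLOW = """\
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@0123456789abcdef0123456789abcdef01234567
"""

print(unpinned_actions(WORKFLOW))  # [('actions/checkout', 'v4')]
```

A tag such as `v4` can be moved to a different commit by whoever controls the action's repository, whereas a full commit SHA cannot, which is why only the 40-hex form counts as pinned here.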
