PyPI - codejury - Versions diffs - 0.5.0__tar.gz → 0.6.0__tar.gz - Mend

codejury 0.5.0__tar.gz → 0.6.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (153)

{codejury-0.5.0 → codejury-0.6.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codejury
-Version: 0.5.0
+Version: 0.6.0
 Summary: General-purpose Application Security AI audit framework -- five-layer architecture, capabilities as first-class data
 Author: AISecLabs
 License-Expression: MIT
@@ -25,6 +25,7 @@ Provides-Extra: litellm
 Requires-Dist: litellm>=1.0; extra == "litellm"
 Provides-Extra: dev
 Requires-Dist: pytest>=8.0; extra == "dev"
+Requires-Dist: jsonschema>=4.0; extra == "dev"
 Dynamic: license-file
 # codejury
@@ -80,15 +81,32 @@ git diff | codejury audit --provider anthropic
 | `codejury audit [diff]` | Audit a unified diff from a file or stdin (`-`). |
 | `codejury scan <dir>` | Audit a whole directory tree, capability by capability. |
 | `codejury run <task>` | Run a named task preset (see [Tasks](#tasks)). |
-| `codejury eval` | Score the golden cases and report precision / recall. |
+| `codejury eval` | Score the golden cases; report precision / recall / F1, overall and per capability. |
 Shared flags: `--orchestrator {single,pipeline,debate,reflexion,challenge}`,
-`--provider {anthropic,openai,litellm}`, `--model`, `--format {text,markdown,json}`.
+`--provider {anthropic,openai,litellm}`, `--model`,
+`--format {text,markdown,json,sarif}`.
+`--format sarif` emits a SARIF 2.1.0 log (validates against the official schema)
+for CI and security dashboards: each problem with a code location becomes a
+result carrying its capability (as the rule id), CWE, and a precise location.
 Findings in known-noise categories (availability/DoS, rate limiting, memory safety
 outside C/C++) are dropped by versioned rules in
 `codejury/data/suppressions.yaml`; disable with `--no-suppress`.
+`codejury eval` takes `--dataset <dir>` (golden YAML directory), `--split <name>`
+(score only cases tagged with that `split:`, e.g. a held-out set), and
+`--format {text,json}` -- the JSON report is a stable schema (overall plus
+per-capability confusion matrix and precision / recall / F1).
+Runs are deterministic: providers query at temperature 0, and `audit` / `scan`
+cache each verdict on a hash of the normalized code, the in-scope capability
+versions, and the orchestration. Re-auditing unchanged code returns the recorded
+verdicts without re-querying the model; editing a capability YAML changes its
+fingerprint and invalidates affected entries. Pass `--no-cache` to always
+re-query.
 ```bash
 # Multi-round adversarial debate, rendered as Markdown
 git diff | codejury audit --orchestrator debate --format markdown - > report.md
@@ -167,11 +185,14 @@ independently.
 - **Local-pattern checks are sharper than data-flow ones.** Capabilities judged
   from one spot (weak crypto, hardcoded secrets) are reliable; taint / data-flow
   ones like path traversal over-flag in single-file review because the verifier
-  can't see whether a value is attacker-controlled. Mitigations that help but do
-  not fully solve it: `scan --callers` (cross-file call sites for provenance),
-  `--orchestrator challenge` (a recall-safe refutation pass that drops only
-  provably-safe flags), `--only` to scope, or `--orchestrator debate`. Real taint
-  precision needs data-flow analysis, not model skepticism.
+  can't see whether a value is attacker-controlled. Mitigations that add context
+  but do not fully solve it: `scan --callers` (where this file's functions are
+  called) and `scan --callees` (the called code it delegates to, so a sink in
+  another file is visible) -- pair them for both directions; `--orchestrator
+  challenge` (a recall-safe
+  refutation pass that drops only provably-safe flags); `--only` to scope; or
+  `--orchestrator debate`. Real taint precision still needs data-flow analysis,
+  not model skepticism.
 - **`scan` cost scales as files x capabilities.** It is a periodic deep audit,
   not a quick check -- scope it with `--only`. Day to day, audit the diff.
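
The SARIF mapping described above (capability as the rule id, CWE, precise location) can be pictured with a hand-built log. This is a sketch, not codejury's `to_sarif`: `Finding` and `sarif_log` are hypothetical names, and carrying the CWE in a result-level `properties` bag is an assumed convention; the package may attach it differently (for example as rule tags).

```python
import json
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in for one located problem in a verdict."""
    capability: str  # becomes the SARIF ruleId
    cwe: str
    path: str
    line: int
    message: str


def sarif_log(findings: list[Finding]) -> dict:
    """Build a minimal SARIF 2.1.0 log: one run, one result per finding."""
    rule_ids = sorted({f.capability for f in findings})
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {
                "name": "codejury",
                "rules": [{"id": rid} for rid in rule_ids],
            }},
            "results": [{
                "ruleId": f.capability,
                "level": "error",
                "message": {"text": f.message},
                "locations": [{"physicalLocation": {
                    "artifactLocation": {"uri": f.path},
                    "region": {"startLine": f.line},
                }}],
                "properties": {"cwe": f.cwe},  # assumed place for the CWE
            } for f in findings],
        }],
    }


log = sarif_log([Finding("dependency_config.transport_security", "CWE-295",
                         "client.py", 2, "TLS certificate verification disabled")])
print(json.dumps(log, indent=2))
```

Each `ruleId` uses the same `capability.sub_capability` form the verifier assigns to verdicts, so a dashboard can group results per capability.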

{codejury-0.5.0 → codejury-0.6.0}/README.md RENAMED

@@ -51,15 +51,32 @@ git diff | codejury audit --provider anthropic
 | `codejury audit [diff]` | Audit a unified diff from a file or stdin (`-`). |
 | `codejury scan <dir>` | Audit a whole directory tree, capability by capability. |
 | `codejury run <task>` | Run a named task preset (see [Tasks](#tasks)). |
-| `codejury eval` | Score the golden cases and report precision / recall. |
+| `codejury eval` | Score the golden cases; report precision / recall / F1, overall and per capability. |
 Shared flags: `--orchestrator {single,pipeline,debate,reflexion,challenge}`,
-`--provider {anthropic,openai,litellm}`, `--model`, `--format {text,markdown,json}`.
+`--provider {anthropic,openai,litellm}`, `--model`,
+`--format {text,markdown,json,sarif}`.
+`--format sarif` emits a SARIF 2.1.0 log (validates against the official schema)
+for CI and security dashboards: each problem with a code location becomes a
+result carrying its capability (as the rule id), CWE, and a precise location.
 Findings in known-noise categories (availability/DoS, rate limiting, memory safety
 outside C/C++) are dropped by versioned rules in
 `codejury/data/suppressions.yaml`; disable with `--no-suppress`.
+`codejury eval` takes `--dataset <dir>` (golden YAML directory), `--split <name>`
+(score only cases tagged with that `split:`, e.g. a held-out set), and
+`--format {text,json}` -- the JSON report is a stable schema (overall plus
+per-capability confusion matrix and precision / recall / F1).
+Runs are deterministic: providers query at temperature 0, and `audit` / `scan`
+cache each verdict on a hash of the normalized code, the in-scope capability
+versions, and the orchestration. Re-auditing unchanged code returns the recorded
+verdicts without re-querying the model; editing a capability YAML changes its
+fingerprint and invalidates affected entries. Pass `--no-cache` to always
+re-query.
 ```bash
 # Multi-round adversarial debate, rendered as Markdown
 git diff | codejury audit --orchestrator debate --format markdown - > report.md
@@ -138,11 +155,14 @@ independently.
 - **Local-pattern checks are sharper than data-flow ones.** Capabilities judged
   from one spot (weak crypto, hardcoded secrets) are reliable; taint / data-flow
   ones like path traversal over-flag in single-file review because the verifier
-  can't see whether a value is attacker-controlled. Mitigations that help but do
-  not fully solve it: `scan --callers` (cross-file call sites for provenance),
-  `--orchestrator challenge` (a recall-safe refutation pass that drops only
-  provably-safe flags), `--only` to scope, or `--orchestrator debate`. Real taint
-  precision needs data-flow analysis, not model skepticism.
+  can't see whether a value is attacker-controlled. Mitigations that add context
+  but do not fully solve it: `scan --callers` (where this file's functions are
+  called) and `scan --callees` (the called code it delegates to, so a sink in
+  another file is visible) -- pair them for both directions; `--orchestrator
+  challenge` (a recall-safe
+  refutation pass that drops only provably-safe flags); `--only` to scope; or
+  `--orchestrator debate`. Real taint precision still needs data-flow analysis,
+  not model skepticism.
 - **`scan` cost scales as files x capabilities.** It is a periodic deep audit,
   not a quick check -- scope it with `--only`. Day to day, audit the diff.
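
The overall and per-capability figures that `codejury eval` reports follow the standard confusion-matrix definitions. Below is a self-contained sketch, assuming each scored case reduces to a `(capability, expected_vulnerable, predicted_vulnerable)` triple; that tuple shape, the `Counts`/`score` names, and reporting 0.0 on a zero denominator are assumptions, not taken from the package.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of P and R

    def add(self, expected: bool, predicted: bool) -> None:
        if expected and predicted:
            self.tp += 1
        elif predicted:
            self.fp += 1
        elif expected:
            self.fn += 1
        else:
            self.tn += 1


def score(cases):
    """cases: iterable of (capability, expected_vulnerable, predicted_vulnerable)."""
    overall, by_cap = Counts(), defaultdict(Counts)
    for cap, expected, predicted in cases:
        overall.add(expected, predicted)
        by_cap[cap].add(expected, predicted)
    return overall, dict(by_cap)


overall, by_cap = score([
    ("authn", True, True),
    ("business_logic", True, False),
    ("business_logic", False, False),
    ("input_validation", False, True),
])
print(f"P={overall.precision:.2f} R={overall.recall:.2f} F1={overall.f1:.2f}")  # P=0.50 R=0.50 F1=0.50
```

Scoring per capability as well as overall keeps a strong local-pattern capability from masking a weak taint-style one in the aggregate.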

{codejury-0.5.0 → codejury-0.6.0}/codejury/__init__.py RENAMED

@@ -5,4 +5,9 @@ Domain knowledge lives in YAML capability files as a first-class citizen,
 aligned with OWASP ASVS.
 """
-__version__ = "0.0.0"
+from importlib.metadata import PackageNotFoundError, version
+try:
+    __version__ = version("codejury")
+except PackageNotFoundError:  # running from a source tree without an install
+    __version__ = "0.0.0"
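
The same metadata-with-fallback pattern can be exercised on its own. In this sketch `installed_version` is a hypothetical helper, and `no-such-dist-xyz` is a deliberately fake distribution name so that the `PackageNotFoundError` branch runs.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist: str, fallback: str = "0.0.0") -> str:
    """Return the installed distribution's version, or `fallback` from a bare source tree."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return fallback


print(installed_version("no-such-dist-xyz"))  # falls back, since nothing by that name is installed
```

Reading the version from installed metadata means `codejury.__version__` follows the `Version:` field in PKG-INFO instead of a hard-coded string.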

{codejury-0.5.0 → codejury-0.6.0}/codejury/agents/verifier.py RENAMED

@@ -93,15 +93,27 @@ def _build_prompt(path: str, content: str, cap: Capability, context: str = "") -
     )
+def _anti_pattern_cwes(cap: Capability) -> dict[str, str]:
+    """Map anti_pattern id -> CWE, so a verdict can inherit the CWE it matched."""
+    return {
+        p.id: p.cwe
+        for sub in cap.sub_capabilities.values()
+        for p in sub.anti_patterns
+        if p.cwe
+    }
 def _parse_verdicts(text: str, cap: Capability) -> list[Verdict]:
     obj = extract_json_object(text)
     if not obj:
         return []
+    cwe_by_id = _anti_pattern_cwes(cap)
     out: list[Verdict] = []
     for v in obj.get("verdicts", []):
         if not isinstance(v, dict):
             continue
         sub = str(v.get("sub_capability", "")).strip()
+        matched_anti = str_list(v.get("matched_anti"))
         out.append(
             Verdict(
                 capability=f"{cap.id}.{sub}" if sub else cap.id,
@@ -109,7 +121,8 @@ def _parse_verdicts(text: str, cap: Capability) -> list[Verdict]:
                 status=one_of(v.get("status"), _VALID_STATUS, "UNKNOWN"),
                 reasoning=str(v.get("reasoning", "")),
                 matched_correct=str_list(v.get("matched_correct")),
-                matched_anti=str_list(v.get("matched_anti")),
+                matched_anti=matched_anti,
+                cwe=next((cwe_by_id[a] for a in matched_anti if a in cwe_by_id), ""),
                 evidence=to_evidence(v.get("evidence")),
                 confidence=to_float(v.get("confidence"), 0.5),
             )
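
The CWE lookup above reduces to a small pattern: flatten every anti-pattern that declares a CWE into an id-to-CWE map, then take the CWE of the first matched id found in that map. The sketch uses simplified stand-in dataclasses (`AntiPattern`, `SubCapability`, `Cap`), not codejury's real domain types.

```python
from dataclasses import dataclass, field


@dataclass
class AntiPattern:
    id: str
    cwe: str = ""


@dataclass
class SubCapability:
    anti_patterns: list[AntiPattern] = field(default_factory=list)


@dataclass
class Cap:
    sub_capabilities: dict[str, SubCapability] = field(default_factory=dict)


def anti_pattern_cwes(cap: Cap) -> dict[str, str]:
    """Map anti_pattern id -> CWE, skipping patterns that declare no CWE."""
    return {
        p.id: p.cwe
        for sub in cap.sub_capabilities.values()
        for p in sub.anti_patterns
        if p.cwe
    }


def inherited_cwe(cap: Cap, matched_anti: list[str]) -> str:
    """CWE of the first matched anti-pattern that has one; '' if none do."""
    cwe_by_id = anti_pattern_cwes(cap)
    return next((cwe_by_id[a] for a in matched_anti if a in cwe_by_id), "")


cap = Cap({
    "ssrf": SubCapability([AntiPattern("SSRF-BAD-1", "CWE-918")]),
    "insecure_deserialization": SubCapability([AntiPattern("DESER-BAD-1", "CWE-502")]),
})
print(inherited_cwe(cap, ["UNKNOWN-1", "DESER-BAD-1", "SSRF-BAD-1"]))  # CWE-502
```

Because `next(...)` stops at the first hit, a verdict matching several anti-patterns carries only the CWE of whichever one the model listed first; unknown ids are skipped rather than raising.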

{codejury-0.5.0 → codejury-0.6.0}/codejury/assembly.py RENAMED

@@ -16,6 +16,7 @@ from codejury.domain.artifact import CodeArtifact
 from codejury.domain.capability import Capability
 from codejury.domain.context import AnalysisContext
 from codejury.domain.result import AnalysisResult
+from codejury.infrastructure.cache import VerdictCache, verdict_key
 from codejury.orchestrators.base import Orchestrator
 from codejury.orchestrators.challenge import ChallengeOrchestrator
 from codejury.orchestrators.debate import DebateOrchestrator
@@ -75,17 +76,38 @@ def build_orchestration(
     return verifier, SingleOrchestrator()
+def orchestration_descriptor(strategy: str, model: str, max_tokens: int) -> str:
+    """The non-code, non-capability inputs that affect a verdict, as a cache tag."""
+    return f"{strategy}|{model}|{max_tokens}"
 def run_over_artifacts(
     artifacts: list[CodeArtifact],
     capabilities: list[Capability],
     agents: dict[str, Agent],
     orchestrator: Orchestrator,
+    *,
+    cache: VerdictCache | None = None,
+    orchestration: str = "",
 ) -> list[tuple[str, AnalysisResult]]:
-    """Run the orchestration over each artifact, returning (path, result) per artifact."""
+    """Run the orchestration over each artifact, returning (path, result) per artifact.
+    When ``cache`` is given, an unchanged artifact returns its recorded result
+    instead of re-running the orchestrator (determinism, invariant 2).
+    """
     results = []
     for artifact in artifacts:
+        if cache is not None:
+            key = verdict_key(artifact, capabilities, orchestration=orchestration)
+            hit = cache.get(key)
+            if hit is not None:
+                results.append((artifact.path, hit))
+                continue
         ctx = AnalysisContext(artifact=artifact, capabilities=capabilities)
-        results.append((artifact.path, orchestrator.run(agents, ctx)))
+        result = orchestrator.run(agents, ctx)
+        if cache is not None:
+            cache.put(key, result)
+        results.append((artifact.path, result))
     return results
@@ -94,5 +116,11 @@ def run_over_source(
     capabilities: list[Capability],
     agents: dict[str, Agent],
     orchestrator: Orchestrator,
+    *,
+    cache: VerdictCache | None = None,
+    orchestration: str = "",
 ) -> list[tuple[str, AnalysisResult]]:
-    return run_over_artifacts(source.list_artifacts(), capabilities, agents, orchestrator)
+    return run_over_artifacts(
+        source.list_artifacts(), capabilities, agents, orchestrator,
+        cache=cache, orchestration=orchestration,
+    )
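
A read-through verdict cache along these lines can be sketched as follows. The key derivation is an assumption for illustration: the README says the key hashes the normalized code, the in-scope capability versions, and the orchestration descriptor, but `verdict_key`'s actual normalization and encoding are not shown in this diff. Here "normalized" simply means surrounding whitespace and trailing per-line whitespace are stripped, and `VerdictCacheSketch`, `verdict_key_sketch`, and `run_model` are hypothetical stand-ins.

```python
import hashlib
import json


def orchestration_descriptor(strategy: str, model: str, max_tokens: int) -> str:
    # Same format as assembly.py: strategy|model|max_tokens.
    return f"{strategy}|{model}|{max_tokens}"


def verdict_key_sketch(code: str, cap_versions: dict[str, str], orchestration: str) -> str:
    """Hash code + capability fingerprints + orchestration into a stable key (assumed scheme)."""
    normalized = "\n".join(line.rstrip() for line in code.strip().splitlines())
    payload = json.dumps(
        {"code": normalized, "caps": sorted(cap_versions.items()), "orch": orchestration},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


class VerdictCacheSketch:
    """Hypothetical in-memory cache; the real VerdictCache's storage is not shown."""

    def __init__(self) -> None:
        self._store: dict[str, object] = {}

    def get(self, key: str):
        return self._store.get(key)

    def put(self, key: str, result) -> None:
        self._store[key] = result


calls = 0


def run_model(code: str) -> str:
    """Stand-in for orchestrator.run; counts how often the model is queried."""
    global calls
    calls += 1
    return f"verdict for {len(code)} chars"


def audit_one(cache, code, caps, orch):
    key = verdict_key_sketch(code, caps, orch)
    hit = cache.get(key)
    if hit is not None:
        return hit  # unchanged input: recorded verdict, no model call
    result = run_model(code)
    cache.put(key, result)
    return result


cache = VerdictCacheSketch()
orch = orchestration_descriptor("single", "some-model", 2048)
audit_one(cache, "x = 1\n", {"authn": "v1"}, orch)
audit_one(cache, "x = 1   \n", {"authn": "v1"}, orch)  # whitespace-only edit: hit under this normalization
audit_one(cache, "x = 1\n", {"authn": "v2"}, orch)      # capability fingerprint changed: miss
print(calls)  # 2
```

Folding the orchestration descriptor into the key keeps a `single` verdict from being replayed for a `debate` run on the same code.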

{codejury-0.5.0 → codejury-0.6.0}/codejury/cli.py RENAMED

@@ -9,6 +9,7 @@ library, backed by the Anthropic provider, under a chosen orchestration strategy
 from __future__ import annotations
 import argparse
+import json
 import os
 import sys
@@ -21,6 +22,7 @@ from codejury.assembly import (
     STRATEGIES,
     build_orchestration,
     make_provider,
+    orchestration_descriptor,
     run_over_artifacts,
     run_over_source,
 )
@@ -29,11 +31,12 @@ from codejury.domain.capability import Capability, load_capabilities
 from codejury.domain.context import AnalysisContext
 from codejury.domain.observation import Observation
 from codejury.domain.result import AnalysisResult
-from codejury.evaluation import Metrics, evaluate, load_cases
+from codejury.evaluation import EvalReport, evaluate, load_cases
+from codejury.infrastructure.cache import VerdictCache
 from codejury.orchestrators.single import SingleOrchestrator
 from codejury.providers.base import Provider
 from codejury.providers.mock import MockProvider
-from codejury.reporting import to_json, to_markdown
+from codejury.reporting import to_json, to_markdown, to_sarif
 from codejury.resources import CAPABILITIES_DIR, GOLDEN_DIR, SUPPRESSIONS_FILE, TASKS_DIR
 from codejury.suppression import filter_results, load_suppressions
 from codejury.integrations.github import build_review, parse_pr_ref, post_review
@@ -43,7 +46,7 @@ from codejury.sources.repo import RepoSource
 from codejury.tasks.base import run_task
 from codejury.tasks.registry import load_tasks
-_FORMATS = ("text", "markdown", "json")
+_FORMATS = ("text", "markdown", "json", "sarif")
 def dry_run() -> AnalysisResult:
@@ -69,10 +72,14 @@ def audit(
     model: str,
     max_tokens: int = 2048,
     strategy: str = "single",
+    cache: VerdictCache | None = None,
 ) -> list[tuple[str, AnalysisResult]]:
     """Audit each changed file in `diff_text`, returning (path, result) per file."""
     agents, orchestrator = build_orchestration(strategy, provider=provider, model=model, max_tokens=max_tokens)
-    return run_over_source(DiffSource(diff_text), capabilities, agents, orchestrator)
+    return run_over_source(
+        DiffSource(diff_text), capabilities, agents, orchestrator,
+        cache=cache, orchestration=orchestration_descriptor(strategy, model, max_tokens),
+    )
 def scan(
@@ -86,10 +93,16 @@ def scan(
     extensions: tuple[str, ...] = (".py",),
     max_chars: int = 200_000,
     with_callers: bool = False,
+    with_callees: bool = False,
+    cache: VerdictCache | None = None,
 ) -> list[tuple[str, AnalysisResult]]:
     """Audit every matching file in a directory tree, returning (path, result) per artifact."""
     source = RepoSource(
-        directory, extensions=extensions, chunker=Chunker(max_chars=max_chars), with_callers=with_callers
+        directory,
+        extensions=extensions,
+        chunker=Chunker(max_chars=max_chars),
+        with_callers=with_callers,
+        with_callees=with_callees,
     )
     artifacts = source.list_artifacts()
     calls = len(artifacts) * len(capabilities)
@@ -98,7 +111,10 @@ def scan(
         file=sys.stderr,
     )
     agents, orchestrator = build_orchestration(strategy, provider=provider, model=model, max_tokens=max_tokens)
-    return run_over_artifacts(artifacts, capabilities, agents, orchestrator)
+    return run_over_artifacts(
+        artifacts, capabilities, agents, orchestrator,
+        cache=cache, orchestration=orchestration_descriptor(strategy, model, max_tokens),
+    )
 def _render_dry_run(result: AnalysisResult) -> str:
@@ -137,7 +153,7 @@ def _render_observation(o: Observation) -> str:
 def _render_results(fmt: str, results: list[tuple[str, AnalysisResult]]) -> str:
-    return {"text": _render_audit, "markdown": to_markdown, "json": to_json}[fmt](results)
+    return {"text": _render_audit, "markdown": to_markdown, "json": to_json, "sarif": to_sarif}[fmt](results)
 def _maybe_suppress(results: list[tuple[str, AnalysisResult]], enabled: bool) -> list[tuple[str, AnalysisResult]]:
@@ -184,11 +200,16 @@ def _maybe_post_github(ref: str | None, results: list[tuple[str, AnalysisResult]
         print(f"github review failed: {exc}", file=sys.stderr)
-def _render_metrics(m: Metrics) -> str:
-    return (
-        f"cases: {m.total}  (tp={m.tp} fp={m.fp} tn={m.tn} fn={m.fn})\n"
-        f"precision: {m.precision:.2f}  recall: {m.recall:.2f}  accuracy: {m.accuracy:.2f}"
-    )
+def _render_eval(report: EvalReport) -> str:
+    def line(label: str, m) -> str:
+        return (
+            f"{label:<20} tp={m.tp} fp={m.fp} tn={m.tn} fn={m.fn}  "
+            f"P={m.precision:.2f} R={m.recall:.2f} F1={m.f1:.2f}"
+        )
+    lines = [line(f"overall ({report.overall.total} cases)", report.overall)]
+    lines += [line(cap, m) for cap, m in sorted(report.by_capability.items())]
+    return "\n".join(lines)
 def _read_diff(path: str) -> str:
@@ -216,6 +237,7 @@ def main(argv: list[str] | None = None) -> int:
     audit_p.add_argument("--api-base", default=DEFAULT_API_BASE, help="provider base URL (env: CODEJURY_API_BASE)")
     audit_p.add_argument("--api-key", default=DEFAULT_API_KEY, help="provider API key (env: CODEJURY_API_KEY)")
     audit_p.add_argument("--no-suppress", action="store_true", help="disable the known-noise suppression filter")
+    audit_p.add_argument("--no-cache", action="store_true", help="bypass the verdict cache (always re-query the model)")
     audit_p.add_argument("--fail-on", choices=_FAIL_ON, default=None, dest="fail_on", help="exit 1 if a finding at/above this severity is found")
     audit_p.add_argument("--github", default=None, help="post a PR review: owner/repo#number (needs GITHUB_TOKEN)")
@@ -231,11 +253,15 @@ def main(argv: list[str] | None = None) -> int:
     scan_p.add_argument("--max-tokens", type=int, default=2048)
     scan_p.add_argument("--max-chars", type=int, default=200_000, help="chunk budget; default keeps whole files")
     scan_p.add_argument(
-        "--callers", action="store_true", help="add cross-file call sites as context (cuts taint false positives)"
+        "--callers", action="store_true", help="add cross-file context: where this file's functions are called"
+    )
+    scan_p.add_argument(
+        "--callees", action="store_true", help="add cross-file context: the called code this file delegates to"
     )
     scan_p.add_argument("--api-base", default=DEFAULT_API_BASE, help="provider base URL (env: CODEJURY_API_BASE)")
     scan_p.add_argument("--api-key", default=DEFAULT_API_KEY, help="provider API key (env: CODEJURY_API_KEY)")
     scan_p.add_argument("--no-suppress", action="store_true", help="disable the known-noise suppression filter")
+    scan_p.add_argument("--no-cache", action="store_true", help="bypass the verdict cache (always re-query the model)")
     scan_p.add_argument("--fail-on", choices=_FAIL_ON, default=None, dest="fail_on", help="exit 1 if a finding at/above this severity is found")
     run_p = sub.add_parser("run", help="run a named task preset against a unified diff")
@@ -248,9 +274,11 @@ def main(argv: list[str] | None = None) -> int:
     run_p.add_argument("--fail-on", choices=_FAIL_ON, default=None, dest="fail_on", help="exit 1 if a finding at/above this severity is found")
     eval_p = sub.add_parser("eval", help="score golden cases and report precision/recall")
-    eval_p.add_argument("--golden", default=GOLDEN_DIR, help="golden case YAML directory")
+    eval_p.add_argument("--dataset", default=GOLDEN_DIR, help="golden case YAML directory")
+    eval_p.add_argument("--split", default=None, help="only score cases whose 'split' matches (e.g. held-out)")
     eval_p.add_argument("--capabilities", default=CAPABILITIES_DIR, help="capability YAML directory")
     eval_p.add_argument("--provider", choices=PROVIDERS, default="anthropic")
+    eval_p.add_argument("--format", choices=("text", "json"), default="text", dest="fmt")
     eval_p.add_argument("--model", default=DEFAULT_MODEL)
     eval_p.add_argument("--api-base", default=DEFAULT_API_BASE, help="provider base URL (env: CODEJURY_API_BASE)")
     eval_p.add_argument("--api-key", default=DEFAULT_API_KEY, help="provider API key (env: CODEJURY_API_KEY)")
@@ -267,6 +295,7 @@ def main(argv: list[str] | None = None) -> int:
             model=args.model,
             max_tokens=args.max_tokens,
             strategy=args.orchestrator,
+            cache=None if args.no_cache else VerdictCache(),
         )
         results = _maybe_suppress(results, not args.no_suppress)
         print(_render_results(args.fmt, results))
@@ -289,6 +318,8 @@ def main(argv: list[str] | None = None) -> int:
             extensions=extensions,
             max_chars=args.max_chars,
             with_callers=args.callers,
+            with_callees=args.callees,
+            cache=None if args.no_cache else VerdictCache(),
         )
         results = _maybe_suppress(results, not args.no_suppress)
         print(_render_results(args.fmt, results))
@@ -308,8 +339,8 @@ def main(argv: list[str] | None = None) -> int:
     if args.command == "eval":
         try:
-            metrics = evaluate(
-                load_cases(args.golden),
+            report = evaluate(
+                load_cases(args.dataset, split=args.split),
                 load_capabilities(args.capabilities),
                 provider=make_provider(args.provider, api_key=args.api_key, api_base=args.api_base),
                 model=args.model,
@@ -319,7 +350,7 @@ def main(argv: list[str] | None = None) -> int:
             # as one line, not a traceback (audit gets this via the orchestrator).
             print(f"eval failed: {exc}")
             return 1
-        print(_render_metrics(metrics))
+        print(json.dumps(report.to_dict(), indent=2) if args.fmt == "json" else _render_eval(report))
         return 0
     if args.command in (None, "dry-run"):

{codejury-0.5.0 → codejury-0.6.0}/codejury/data/capabilities/dependency_config.yaml RENAMED

@@ -46,7 +46,34 @@ sub_capabilities:
         signals: ["admin:admin", "password=admin", "changeme"]
         why_bad: Default credentials are public knowledge and trivially abused
+  transport_security:
+    correct_patterns:
+      - id: TLS-OK-1
+        description: >-
+          Leave TLS certificate verification at its secure default -- verify omitted or
+          verify=True, the default SSL context, hostname checking on
+        signals: ["verify=True", "create_default_context", "requests.get(", "requests.post("]
+        why_ok: >-
+          The secure default validates the certificate chain and hostname. An https:// call
+          that does not disable verification is fine; do not flag it just for making a
+          request or for omitting verify.
+    anti_patterns:
+      - id: TLS-BAD-1
+        cwe: CWE-295
+        severity: HIGH
+        description: >-
+          Disable TLS certificate or hostname verification -- verify=False, CERT_NONE,
+          check_hostname=False, or an unverified SSL context
+        signals: ["verify=False", "CERT_NONE", "check_hostname = False", "_create_unverified_context"]
+        why_bad: An unverified TLS connection is open to a man-in-the-middle despite https://
+        example_bad: |
+          requests.get("https://api.partner.com/data", verify=False)
+        example_good: |
+          requests.get("https://api.partner.com/data")  # verify defaults to True
 trigger_signals:
   - dependency manifests and lock files
   - install or bootstrap scripts fetching remote code
   - file permission, bucket ACL, or default credential settings
+  - TLS client calls that set verify or build a custom SSL context
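
In the standard library, the secure default that TLS-OK-1 describes is what `ssl.create_default_context()` returns, and TLS-BAD-1's `CERT_NONE` / `check_hostname = False` signals correspond to switching those checks off. This sketch only inspects context objects and makes no network calls; `verify=` in `requests` is the equivalent knob one level up.

```python
import ssl

# Secure default: certificate-chain validation and hostname checking are both on.
secure = ssl.create_default_context()
print(secure.verify_mode == ssl.CERT_REQUIRED, secure.check_hostname)  # True True

# The pattern TLS-BAD-1 flags. check_hostname must be cleared first;
# setting CERT_NONE while hostname checking is on raises ValueError.
insecure = ssl.create_default_context()
insecure.check_hostname = False
insecure.verify_mode = ssl.CERT_NONE
print(insecure.verify_mode == ssl.CERT_NONE, insecure.check_hostname)  # True False
```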

{codejury-0.5.0 → codejury-0.6.0}/codejury/data/capabilities/input_validation.yaml RENAMED

@@ -105,7 +105,67 @@ sub_capabilities:
           if not target.is_relative_to(UPLOAD_DIR):
               raise ValueError("path escapes upload dir")
+  ssrf:
+    correct_patterns:
+      - id: SSRF-OK-1
+        description: Validate the request URL's host against an allowlist before fetching it
+        signals: ["urlparse(", ".hostname", "ALLOWED", "allowlist"]
+        why_ok: An attacker cannot redirect the fetch to an internal target the list omits
+      - id: SSRF-OK-2
+        description: >-
+          Fetch a URL that is not attacker-controlled -- a constant, a value from trusted
+          config, or an operator-supplied argument
+        why_ok: >-
+          SSRF needs an external attacker to control the destination. A constant URL or one
+          from trusted config is not a finding, even though it goes through a fetch call.
+    anti_patterns:
+      - id: SSRF-BAD-1
+        cwe: CWE-918
+        severity: HIGH
+        description: >-
+          Fetch a URL taken from externally controlled input (HTTP request, form, query, or
+          message field) without validating its host against an allowlist. NOT this: a
+          constant URL, one from trusted config, or an operator-supplied argument.
+        signals: ["requests.get(", "urllib.request.urlopen(", "httpx.", "request.args", "request.json"]
+        why_bad: >-
+          The server makes the request, so attacker input reaches internal-only targets --
+          cloud metadata, localhost admin ports, internal APIs behind the firewall.
+        example_bad: |
+          requests.get(request.args["url"]).text
+        example_good: |
+          if urlparse(url).hostname not in ALLOWED_HOSTS:
+              raise ValueError("host not allowed")
+          requests.get(url).text
+  insecure_deserialization:
+    correct_patterns:
+      - id: DESER-OK-1
+        description: >-
+          Parse untrusted input with a data-only parser -- json.loads or yaml.safe_load --
+          that cannot instantiate arbitrary objects
+        signals: ["json.loads", "yaml.safe_load"]
+        why_ok: A data-only parser builds plain structures and has no code-execution path
+    anti_patterns:
+      - id: DESER-BAD-1
+        cwe: CWE-502
+        severity: CRITICAL
+        description: >-
+          Deserialize externally controlled bytes with an object-constructing deserializer --
+          pickle, marshal, yaml.load (unsafe Loader), or jsonpickle. NOT this: a data-only
+          parser like json.loads or yaml.safe_load.
+        signals: ["pickle.loads", "pickle.load(", "yaml.load(", "marshal.loads", "jsonpickle.decode"]
+        why_bad: These reconstruct arbitrary objects, so crafted input runs code on unpickle
+        example_bad: |
+          pickle.loads(base64.b64decode(request.data))
+        example_good: |
+          json.loads(request.data)
 trigger_signals:
   - raw SQL strings or cursor.execute calls appear
   - imports of os, subprocess, or shlex with process execution
   - file paths built from request, form, or query parameters
+  - outbound HTTP fetches (requests, urllib, httpx) to a non-constant URL
+  - deserialization calls (pickle, yaml.load, marshal) on external input
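
SSRF-OK-1's host allowlist can be made concrete with `urllib.parse.urlparse`. This is a minimal sketch: `ALLOWED_HOSTS` and `is_allowed` are hypothetical names, and it also pins the scheme to `https`, which the YAML does not require. It validates only the parsed host string; an allowlisted name that resolves to an internal address, or a redirect followed after the check, would need separate handling.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.partner.com"}  # hypothetical allowlist


def is_allowed(url: str) -> bool:
    """True only for https URLs whose parsed host is on the allowlist."""
    parsed = urlparse(url)
    # .hostname is lower-cased, with any port and user-info removed,
    # so "https://api.partner.com@127.0.0.1/" yields host "127.0.0.1".
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


for url in (
    "https://api.partner.com/data",
    "http://169.254.169.254/latest/meta-data/",
    "https://api.partner.com@127.0.0.1/admin",
):
    print(url, is_allowed(url))
```

Comparing the parsed host rather than testing `url.startswith(...)` is what rejects the third URL, whose real destination is localhost.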

codejury-0.6.0/codejury/data/golden/authn_weak_hash_indirect_vuln.yaml ADDED

@@ -0,0 +1,14 @@
+# Adversarial positive: weak password hash hidden behind hashlib.new(variable).
+capability: authn
+vulnerable: true
+expected_verdict: VULNERABLE
+cwe: CWE-916
+source: synthetic
+notes: >
+  hashlib.new("md5") is the same weak, unsalted, fast hash as hashlib.md5(), just
+  reached through a variable algorithm name. Unsuitable for password storage
+  (needs bcrypt/scrypt/argon2). The indirection should not hide it.
+code: |
+  def hash_pw(pw):
+      algo = "md5"
+      return hashlib.new(algo, pw.encode()).hexdigest()
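
The case's premise, that `hashlib.new("md5")` produces the same digest as `hashlib.md5()`, is easy to confirm, and the sketch contrasts it with a salted, memory-hard `hashlib.scrypt` hash of the kind the note calls for. The scrypt parameters (n=2**14, r=8, p=1) are common illustrative values, not a tuned recommendation, and `hashlib.scrypt` is only present when Python is built against an OpenSSL that provides it.

```python
import hashlib
import hmac
import os

pw = "hunter2".encode()

# Same fast, unsalted digest whether the algorithm is named directly or via a variable.
algo = "md5"
assert hashlib.new(algo, pw).hexdigest() == hashlib.md5(pw).hexdigest()


def hash_pw(password: str) -> tuple[bytes, bytes]:
    """Salted scrypt hash: returns (salt, derived_key)."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, key


def verify_pw(password: str, salt: bytes, key: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, key)  # constant-time comparison


salt, key = hash_pw("hunter2")
print(verify_pw("hunter2", salt, key), verify_pw("wrong", salt, key))  # True False
```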

codejury-0.6.0/codejury/data/golden/business_logic_price_tamper_vuln.yaml ADDED

@@ -0,0 +1,14 @@
+capability: business_logic
+vulnerable: true
+expected_verdict: VULNERABLE
+cwe: CWE-602
+source: synthetic
+split: held-out
+notes: >
+  Quantity and unit price come straight from the request and are never checked
+  server-side. A negative quantity credits the customer; a client-set price lets
+  them pay anything. Price must come from the catalog and quantity must be > 0.
+code: |
+  def add_to_order(order, item_id, quantity, unit_price):
+      order.lines.append((item_id, quantity, unit_price))
+      order.total += quantity * unit_price

codejury-0.6.0/codejury/data/golden/business_logic_server_checked_safe.yaml ADDED

@@ -0,0 +1,15 @@
+capability: business_logic
+vulnerable: false
+expected_verdict: SECURE
+cwe: ""
+source: synthetic
+notes: >
+  Quantity is validated as positive and the price is looked up server-side from
+  the catalog, so the client cannot tamper with either.
+code: |
+  def add_to_order(order, item_id, quantity):
+      if quantity < 1:
+          raise ValueError("quantity must be positive")
+      unit_price = catalog.price_of(item_id)   # trusted server-side price
+      order.lines.append((item_id, quantity, unit_price))
+      order.total += quantity * unit_price

codejury-0.6.0/codejury/data/golden/cmdi_fixed_argv_safe.yaml ADDED

@@ -0,0 +1,22 @@
+# False-positive-prone negative: a subprocess call driven by a request value
+# looks like command injection, but the value only selects a fixed argv from a
+# table and never reaches a shell.
+capability: input_validation
+vulnerable: false
+expected_verdict: SECURE
+cwe: ""
+source: synthetic
+split: held-out
+notes: >
+  `name` only indexes a fixed dict of argument lists (KeyError on anything
+  unknown); no user string is interpolated into a command and shell=False.
+  Flagging this as command injection is a false positive.
+code: |
+  ACTIONS = {
+      "start": ["systemctl", "start", "web"],
+      "stop": ["systemctl", "stop", "web"],
+  }
+  def run_action(name):
+      argv = ACTIONS[name]
+      subprocess.run(argv, shell=False)

codejury-0.6.0/codejury/data/golden/data_protection_plaintext_pii_vuln.yaml ADDED

@@ -0,0 +1,14 @@
+capability: data_protection
+vulnerable: true
+expected_verdict: VULNERABLE
+cwe: CWE-312
+source: synthetic
+notes: >
+  The full card number and the CVV are written to the database in cleartext.
+  The CVV must never be stored at all, and the PAN must be tokenized/encrypted.
+code: |
+  def save_card(user_id, card_number, cvv):
+      db.execute(
+          "INSERT INTO cards (user_id, pan, cvv) VALUES (?, ?, ?)",
+          (user_id, card_number, cvv),
+      )

codejury-0.6.0/codejury/data/golden/data_protection_tokenized_safe.yaml ADDED

@@ -0,0 +1,16 @@
+capability: data_protection
+vulnerable: false
+expected_verdict: SECURE
+cwe: ""
+source: synthetic
+split: held-out
+notes: >
+  The PAN is replaced by a vault token before storage; the CVV is never
+  persisted. Nothing sensitive reaches the database.
+code: |
+  def save_card(user_id, card_number):
+      token = vault.tokenize(card_number)   # store a token, never the PAN
+      db.execute(
+          "INSERT INTO cards (user_id, token) VALUES (?, ?)",
+          (user_id, token),
+      )

codejury-0.6.0/codejury/data/golden/dependency_config_tls_verify_off_vuln.yaml ADDED

@@ -0,0 +1,11 @@
+capability: dependency_config
+vulnerable: true
+expected_verdict: VULNERABLE
+cwe: CWE-295
+source: synthetic
+notes: >
+  TLS certificate verification is disabled, so the connection is open to a
+  man-in-the-middle despite the https:// scheme.
+code: |
+  def fetch_partner_data():
+      return requests.get("https://api.partner.com/data", verify=False).json()
