PyPI - codejury - Versions diffs - 0.4.1__tar.gz → 0.5.1__tar.gz - Mend

codejury 0.4.1__tar.gz → 0.5.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (122)

{codejury-0.4.1 → codejury-0.5.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codejury
-Version: 0.4.1
+Version: 0.5.1
 Summary: General-purpose Application Security AI audit framework -- five-layer architecture, capabilities as first-class data
 Author: AISecLabs
 License-Expression: MIT
@@ -58,6 +58,12 @@ pip install 'codejury[anthropic]'    # the provider you'll use: anthropic | open
 ## Quickstart
 ```bash
+# CI gate: exit 1 if a high-severity issue is found
+git diff origin/main... | codejury audit --fail-on high -
+# Post inline review comments on a GitHub pull request (needs GITHUB_TOKEN)
+git diff origin/main... | codejury audit --github your-org/your-repo#123 -
 # No API key needed -- prove the pipeline runs end to end with mock layers
 codejury dry-run
@@ -76,9 +82,13 @@ git diff | codejury audit --provider anthropic
 | `codejury run <task>` | Run a named task preset (see [Tasks](#tasks)). |
 | `codejury eval` | Score the golden cases and report precision / recall. |
-Shared flags: `--orchestrator {single,pipeline,debate,reflexion}`,
+Shared flags: `--orchestrator {single,pipeline,debate,reflexion,challenge}`,
 `--provider {anthropic,openai,litellm}`, `--model`, `--format {text,markdown,json}`.
+Findings in known-noise categories (availability/DoS, rate limiting, memory safety
+outside C/C++) are dropped by versioned rules in
+`codejury/data/suppressions.yaml`; disable with `--no-suppress`.
 ```bash
 # Multi-round adversarial debate, rendered as Markdown
 git diff | codejury audit --orchestrator debate --format markdown - > report.md
@@ -157,9 +167,14 @@ independently.
 - **Local-pattern checks are sharper than data-flow ones.** Capabilities judged
   from one spot (weak crypto, hardcoded secrets) are reliable; taint / data-flow
   ones like path traversal over-flag in single-file review because the verifier
-  can't see whether a value is attacker-controlled. `scan --callers` adds
-  cross-file call sites for provenance (helps some cases, not a full fix); also
-  scope with `--only` or challenge findings with `--orchestrator debate`.
+  can't see whether a value is attacker-controlled. Mitigations that add context
+  but do not fully solve it: `scan --callers` (where this file's functions are
+  called) and `scan --callees` (the called code it delegates to, so a sink in
+  another file is visible) -- pair them for both directions; `--orchestrator
+  challenge` (a recall-safe
+  refutation pass that drops only provably-safe flags); `--only` to scope; or
+  `--orchestrator debate`. Real taint precision still needs data-flow analysis,
+  not model skepticism.
 - **`scan` cost scales as files x capabilities.** It is a periodic deep audit,
   not a quick check -- scope it with `--only`. Day to day, audit the diff.

{codejury-0.4.1 → codejury-0.5.1}/README.md RENAMED

@@ -29,6 +29,12 @@ pip install 'codejury[anthropic]'    # the provider you'll use: anthropic | open
 ## Quickstart
 ```bash
+# CI gate: exit 1 if a high-severity issue is found
+git diff origin/main... | codejury audit --fail-on high -
+# Post inline review comments on a GitHub pull request (needs GITHUB_TOKEN)
+git diff origin/main... | codejury audit --github your-org/your-repo#123 -
 # No API key needed -- prove the pipeline runs end to end with mock layers
 codejury dry-run
@@ -47,9 +53,13 @@ git diff | codejury audit --provider anthropic
 | `codejury run <task>` | Run a named task preset (see [Tasks](#tasks)). |
 | `codejury eval` | Score the golden cases and report precision / recall. |
-Shared flags: `--orchestrator {single,pipeline,debate,reflexion}`,
+Shared flags: `--orchestrator {single,pipeline,debate,reflexion,challenge}`,
 `--provider {anthropic,openai,litellm}`, `--model`, `--format {text,markdown,json}`.
+Findings in known-noise categories (availability/DoS, rate limiting, memory safety
+outside C/C++) are dropped by versioned rules in
+`codejury/data/suppressions.yaml`; disable with `--no-suppress`.
 ```bash
 # Multi-round adversarial debate, rendered as Markdown
 git diff | codejury audit --orchestrator debate --format markdown - > report.md
@@ -128,9 +138,14 @@ independently.
 - **Local-pattern checks are sharper than data-flow ones.** Capabilities judged
   from one spot (weak crypto, hardcoded secrets) are reliable; taint / data-flow
   ones like path traversal over-flag in single-file review because the verifier
-  can't see whether a value is attacker-controlled. `scan --callers` adds
-  cross-file call sites for provenance (helps some cases, not a full fix); also
-  scope with `--only` or challenge findings with `--orchestrator debate`.
+  can't see whether a value is attacker-controlled. Mitigations that add context
+  but do not fully solve it: `scan --callers` (where this file's functions are
+  called) and `scan --callees` (the called code it delegates to, so a sink in
+  another file is visible) -- pair them for both directions; `--orchestrator
+  challenge` (a recall-safe
+  refutation pass that drops only provably-safe flags); `--only` to scope; or
+  `--orchestrator debate`. Real taint precision still needs data-flow analysis,
+  not model skepticism.
 - **`scan` cost scales as files x capabilities.** It is a periodic deep audit,
   not a quick check -- scope it with `--only`. Day to day, audit the diff.

{codejury-0.4.1 → codejury-0.5.1}/codejury/agents/debate.py RENAMED

@@ -30,6 +30,20 @@ _FINDING_SHAPE = (
     '"description": "...", "evidence": [{"file": "...", "line": 0, "code": "..."}], "confidence": 0.0}'
 )
+_DEEP_LENS = (
+    "Look past surface patterns for the deepest flaw:\n"
+    "- Trust anchors: what does this code trust to authenticate or authorize -- a key, token, header, "
+    "signature, role, or caller -- and who controls that value? If the attacker supplies what is used to "
+    "verify them (e.g. their own public key, an unconfigured key that disables verification), passing the "
+    "check proves nothing.\n"
+    "- Order of operations: is an external, irreversible, or privileged action performed before the local "
+    "state is committed, or before the check that should guard it? Can a check and the action it guards be "
+    "split apart under concurrency (race / TOCTOU) or partial failure (on-chain done, DB rolled back)?\n"
+    "- Attack chains: combine several weak points into one end-to-end exploit.\n"
+    "Prefer the deepest design/authorization/state flaw over surface issues like missing rate limiting or "
+    "verbose logging; report those only as secondary."
+)
 class _DebateAgent(Agent):
     """Shared provider plumbing for the three debate roles."""
@@ -60,7 +74,7 @@ class FinderAgent(_DebateAgent):
     )
     def run(self, ctx: AnalysisContext) -> list[Observation]:
-        parts = ["Review the code for security vulnerabilities.", _hints(ctx.capabilities), _code(ctx.artifact)]
+        parts = ["Review the code for security vulnerabilities.", _hints(ctx.capabilities), _DEEP_LENS, _code(ctx.artifact)]
         if ctx.round_num > 1 and ctx.history:
             parts.append(_render_history(ctx.history))
             parts.append("Concede findings the rebuttals refute, keep the valid ones, and add any you missed.")
@@ -84,7 +98,9 @@ class ChallengerAgent(_DebateAgent):
     def run(self, ctx: AnalysisContext) -> list[Observation]:
         parts = [
             "Challenge the findings below. For each one you believe is a false positive, write a rebuttal. "
-            "Add new_findings for any real issue that was missed.",
+            "Add new_findings for any real issue that was missed -- especially a deeper flaw the finder "
+            "stopped short of.",
+            _DEEP_LENS,
             _code(ctx.artifact),
             _render_history(ctx.history),
             'Respond as JSON: {"rebuttals": [{"target": "finding title", "reason": "..."}], '

codejury-0.5.1/codejury/agents/refuter.py ADDED

@@ -0,0 +1,76 @@
+"""RefuterAgent -- a skeptic that tries to dismiss flagged verdicts as false positives.
+Used by the challenge orchestrator: the verifier flags issues, then the refuter
+gets the code plus the VULNERABLE verdicts (via ``ctx.history``) and argues which
+are false positives -- e.g. a value that is not actually attacker-controlled or a
+sink that is not reachable. It returns a Concession per verdict it refutes.
+This is the cheap, focused alternative to a full debate: only flagged verdicts
+are challenged, not the whole file.
+"""
+from __future__ import annotations
+from codejury.agents.base import Agent
+from codejury.domain.context import AnalysisContext
+from codejury.domain.observation import Concession, Observation, Verdict
+from codejury.infrastructure.json_parse import extract_json_object
+from codejury.providers.base import Message, Provider
+_SYSTEM = (
+    "You are a careful security reviewer checking flagged issues for false positives. "
+    "Security errs toward keeping a flag: refute one ONLY when the code in front of you "
+    "affirmatively proves the value is not attacker-controlled. If a value's origin is not "
+    "shown, or it could plausibly come from external/untrusted input, KEEP the flag. "
+    "Respond with a single JSON object and nothing else."
+)
+_JSON_SHAPE = '{"refuted": [{"capability": "id.sub", "reason": "proof it is not attacker-controlled"}]}'
+class RefuterAgent(Agent):
+    def __init__(self, *, provider: Provider, model: str, max_tokens: int = 1024) -> None:
+        self._provider = provider
+        self._model = model
+        self._max_tokens = max_tokens
+    def run(self, ctx: AnalysisContext) -> list[Observation]:
+        flagged = [o for o in ctx.history if isinstance(o, Verdict)]
+        if not flagged:
+            return []
+        flags = "\n".join(f"- {v.capability}: {v.reasoning}" for v in flagged)
+        context_block = (
+            f"Call sites elsewhere (for tracing where arguments come from):\n```\n{ctx.artifact.context}\n```\n\n"
+            if ctx.artifact.context
+            else ""
+        )
+        prompt = (
+            f"Code under review ({ctx.artifact.path}):\n```\n{ctx.artifact.content}\n```\n\n"
+            f"{context_block}"
+            f"Flagged issues:\n{flags}\n\n"
+            "This attacker-control reasoning applies ONLY to input-driven issues (injection, path "
+            "traversal, SSRF). For those, refute a flag only if you can affirmatively prove the value "
+            "is not attacker-controlled: a stored data field, or traced (here or in the call sites) to "
+            "a trusted, config, or operator-supplied source. If its origin is not shown or could "
+            "plausibly be external input, do NOT refute. For other issue types (hardcoded secrets, "
+            "weak crypto, ...), a literal value is often the vulnerability itself -- do NOT refute "
+            "those just because a value is constant.\n\n"
+            "Respond with a single JSON object exactly like:\n" + _JSON_SHAPE
+        )
+        result = self._provider.complete(
+            system=_SYSTEM,
+            messages=[Message(role="user", content=prompt)],
+            model=self._model,
+            max_tokens=self._max_tokens,
+        )
+        obj = extract_json_object(result.text) or {}
+        out: list[Observation] = []
+        for item in obj.get("refuted", []):
+            if not isinstance(item, dict):
+                continue
+            capability = str(item.get("capability", "")).strip()
+            if capability:
+                out.append(
+                    Concession(capability=capability, produced_by="refuter", target=capability, reason=str(item.get("reason", "")))
+                )
+        return out
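
The reply handling in `RefuterAgent.run` can be tried on its own. This is a minimal stand-alone sketch, assuming the model returns text containing one JSON object shaped like `_JSON_SHAPE`. `extract_first_json_object` is a hypothetical stand-in for codejury's `extract_json_object`, whose implementation is not part of this diff. Like the agent, it skips malformed entries and blank capability ids. It also tolerates a `null` or non-list `"refuted"` value, which the released loop does not guard against.

```python
from __future__ import annotations

import json
from typing import Any


def extract_first_json_object(text: str) -> dict[str, Any] | None:
    """Hypothetical stand-in for extract_json_object: return the first
    decodable JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from here; try the next brace
        if isinstance(obj, dict):
            return obj
    return None


def refuted_capabilities(reply: str) -> dict[str, str]:
    """Map each refuted capability id to the refuter's stated reason."""
    obj = extract_first_json_object(reply) or {}
    items = obj.get("refuted")
    if not isinstance(items, list):
        items = []  # missing, null, or wrongly typed: nothing refuted
    out: dict[str, str] = {}
    for item in items:
        if not isinstance(item, dict):
            continue  # ignore malformed entries
        capability = str(item.get("capability", "")).strip()
        if capability:
            out[capability] = str(item.get("reason", ""))
    return out


reply = (
    'Sure. {"refuted": [{"capability": "input_validation.path_traversal", '
    '"reason": "path comes from settings"}, "junk", {"capability": "  "}]}'
)
print(refuted_capabilities(reply))
# {'input_validation.path_traversal': 'path comes from settings'}
```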

{codejury-0.4.1 → codejury-0.5.1}/codejury/assembly.py RENAMED

@@ -10,12 +10,14 @@ import os
 from codejury.agents.base import Agent
 from codejury.agents.debate import ChallengerAgent, FinderAgent, JudgeAgent
+from codejury.agents.refuter import RefuterAgent
 from codejury.agents.verifier import VerifierAgent
 from codejury.domain.artifact import CodeArtifact
 from codejury.domain.capability import Capability
 from codejury.domain.context import AnalysisContext
 from codejury.domain.result import AnalysisResult
 from codejury.orchestrators.base import Orchestrator
+from codejury.orchestrators.challenge import ChallengeOrchestrator
 from codejury.orchestrators.debate import DebateOrchestrator
 from codejury.orchestrators.pipeline import PipelineOrchestrator
 from codejury.orchestrators.reflexion import ReflexionOrchestrator
@@ -27,7 +29,7 @@ from codejury.providers.openai import OpenAIProvider
 from codejury.providers.retry import RetryProvider
 from codejury.sources.base import Source
-STRATEGIES = ("single", "pipeline", "debate", "reflexion")
+STRATEGIES = ("single", "pipeline", "debate", "reflexion", "challenge")
 PROVIDERS = ("anthropic", "openai", "litellm")
 DEFAULT_MODEL = os.environ.get("CODEJURY_MODEL", "claude-sonnet-4-6")
 DEFAULT_API_BASE = os.environ.get("CODEJURY_API_BASE")
@@ -61,6 +63,12 @@ def build_orchestration(
             "critic": ChallengerAgent(provider=provider, model=model, max_tokens=max_tokens),
         }
         return agents, ReflexionOrchestrator()
+    if strategy == "challenge":
+        agents = {
+            "verifier": VerifierAgent(provider=provider, model=model, max_tokens=max_tokens),
+            "refuter": RefuterAgent(provider=provider, model=model),
+        }
+        return agents, ChallengeOrchestrator()
     verifier = {"verifier": VerifierAgent(provider=provider, model=model, max_tokens=max_tokens)}
     if strategy == "pipeline":
         return verifier, PipelineOrchestrator()

{codejury-0.4.1 → codejury-0.5.1}/codejury/cli.py RENAMED

@@ -9,6 +9,7 @@ library, backed by the Anthropic provider, under a chosen orchestration strategy
 from __future__ import annotations
 import argparse
+import os
 import sys
 from codejury.agents.mock import MockAgent
@@ -33,7 +34,9 @@ from codejury.orchestrators.single import SingleOrchestrator
 from codejury.providers.base import Provider
 from codejury.providers.mock import MockProvider
 from codejury.reporting import to_json, to_markdown
-from codejury.resources import CAPABILITIES_DIR, GOLDEN_DIR, TASKS_DIR
+from codejury.resources import CAPABILITIES_DIR, GOLDEN_DIR, SUPPRESSIONS_FILE, TASKS_DIR
+from codejury.suppression import filter_results, load_suppressions
+from codejury.integrations.github import build_review, parse_pr_ref, post_review
 from codejury.sources.chunker import Chunker
 from codejury.sources.diff import DiffSource
 from codejury.sources.repo import RepoSource
@@ -83,10 +86,15 @@ def scan(
     extensions: tuple[str, ...] = (".py",),
     max_chars: int = 200_000,
     with_callers: bool = False,
+    with_callees: bool = False,
 ) -> list[tuple[str, AnalysisResult]]:
     """Audit every matching file in a directory tree, returning (path, result) per artifact."""
     source = RepoSource(
-        directory, extensions=extensions, chunker=Chunker(max_chars=max_chars), with_callers=with_callers
+        directory,
+        extensions=extensions,
+        chunker=Chunker(max_chars=max_chars),
+        with_callers=with_callers,
+        with_callees=with_callees,
     )
     artifacts = source.list_artifacts()
     calls = len(artifacts) * len(capabilities)
@@ -137,6 +145,50 @@ def _render_results(fmt: str, results: list[tuple[str, AnalysisResult]]) -> str:
     return {"text": _render_audit, "markdown": to_markdown, "json": to_json}[fmt](results)
+def _maybe_suppress(results: list[tuple[str, AnalysisResult]], enabled: bool) -> list[tuple[str, AnalysisResult]]:
+    if not enabled:
+        return results
+    filtered, suppressed = filter_results(results, load_suppressions(SUPPRESSIONS_FILE))
+    if suppressed:
+        print(f"suppressed {len(suppressed)} known-noise finding(s) by rule", file=sys.stderr)
+    return filtered
+_FAIL_ON = ("critical", "high", "medium", "low")
+_SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1, "info": 0}
+def _problem_rank(o: Observation) -> int:
+    if o.kind == "finding":
+        return _SEVERITY_RANK.get(o.severity.lower(), 2)
+    if o.kind == "verdict" and o.status == "VULNERABLE":
+        return _SEVERITY_RANK["high"]
+    if o.kind == "verdict" and o.status == "PARTIAL":
+        return _SEVERITY_RANK["medium"]
+    return -1
+def _gate_exit(results: list[tuple[str, AnalysisResult]], fail_on: str | None) -> int:
+    if not fail_on:
+        return 0
+    worst = max((_problem_rank(o) for _, r in results for o in r.observations), default=-1)
+    return 1 if worst >= _SEVERITY_RANK[fail_on] else 0
+def _maybe_post_github(ref: str | None, results: list[tuple[str, AnalysisResult]]) -> None:
+    if not ref:
+        return
+    token = os.environ.get("GITHUB_TOKEN")
+    if not token:
+        print("GITHUB_TOKEN not set; skipping PR review", file=sys.stderr)
+        return
+    try:
+        owner, repo, pull = parse_pr_ref(ref)
+        post_review(owner, repo, pull, build_review(results), token=token)
+        print(f"posted review to {ref}", file=sys.stderr)
+    except Exception as exc:
+        print(f"github review failed: {exc}", file=sys.stderr)
 def _render_metrics(m: Metrics) -> str:
     return (
         f"cases: {m.total}  (tp={m.tp} fp={m.fp} tn={m.tn} fn={m.fn})\n"
@@ -168,6 +220,9 @@ def main(argv: list[str] | None = None) -> int:
     audit_p.add_argument("--retries", type=int, default=0, help="provider retry attempts on failure")
     audit_p.add_argument("--api-base", default=DEFAULT_API_BASE, help="provider base URL (env: CODEJURY_API_BASE)")
     audit_p.add_argument("--api-key", default=DEFAULT_API_KEY, help="provider API key (env: CODEJURY_API_KEY)")
+    audit_p.add_argument("--no-suppress", action="store_true", help="disable the known-noise suppression filter")
+    audit_p.add_argument("--fail-on", choices=_FAIL_ON, default=None, dest="fail_on", help="exit 1 if a finding at/above this severity is found")
+    audit_p.add_argument("--github", default=None, help="post a PR review: owner/repo#number (needs GITHUB_TOKEN)")
     scan_p = sub.add_parser("scan", help="audit a whole directory tree (deep, capability by capability)")
     scan_p.add_argument("directory", help="directory to scan")
@@ -181,10 +236,15 @@ def main(argv: list[str] | None = None) -> int:
     scan_p.add_argument("--max-tokens", type=int, default=2048)
     scan_p.add_argument("--max-chars", type=int, default=200_000, help="chunk budget; default keeps whole files")
     scan_p.add_argument(
-        "--callers", action="store_true", help="add cross-file call sites as context (cuts taint false positives)"
+        "--callers", action="store_true", help="add cross-file context: where this file's functions are called"
+    )
+    scan_p.add_argument(
+        "--callees", action="store_true", help="add cross-file context: the called code this file delegates to"
     )
     scan_p.add_argument("--api-base", default=DEFAULT_API_BASE, help="provider base URL (env: CODEJURY_API_BASE)")
     scan_p.add_argument("--api-key", default=DEFAULT_API_KEY, help="provider API key (env: CODEJURY_API_KEY)")
+    scan_p.add_argument("--no-suppress", action="store_true", help="disable the known-noise suppression filter")
+    scan_p.add_argument("--fail-on", choices=_FAIL_ON, default=None, dest="fail_on", help="exit 1 if a finding at/above this severity is found")
     run_p = sub.add_parser("run", help="run a named task preset against a unified diff")
     run_p.add_argument("task", help="task name")
@@ -192,6 +252,8 @@ def main(argv: list[str] | None = None) -> int:
     run_p.add_argument("--tasks", default=TASKS_DIR, help="task YAML directory")
     run_p.add_argument("--capabilities", default=CAPABILITIES_DIR, help="capability YAML directory")
     run_p.add_argument("--format", choices=_FORMATS, default="text", dest="fmt")
+    run_p.add_argument("--no-suppress", action="store_true", help="disable the known-noise suppression filter")
+    run_p.add_argument("--fail-on", choices=_FAIL_ON, default=None, dest="fail_on", help="exit 1 if a finding at/above this severity is found")
     eval_p = sub.add_parser("eval", help="score golden cases and report precision/recall")
     eval_p.add_argument("--golden", default=GOLDEN_DIR, help="golden case YAML directory")
@@ -214,8 +276,10 @@ def main(argv: list[str] | None = None) -> int:
             max_tokens=args.max_tokens,
             strategy=args.orchestrator,
         )
+        results = _maybe_suppress(results, not args.no_suppress)
         print(_render_results(args.fmt, results))
-        return 0
+        _maybe_post_github(args.github, results)
+        return _gate_exit(results, args.fail_on)
     if args.command == "scan":
         capabilities = load_capabilities(args.capabilities)
@@ -233,9 +297,11 @@ def main(argv: list[str] | None = None) -> int:
             extensions=extensions,
             max_chars=args.max_chars,
             with_callers=args.callers,
+            with_callees=args.callees,
         )
+        results = _maybe_suppress(results, not args.no_suppress)
         print(_render_results(args.fmt, results))
-        return 0
+        return _gate_exit(results, args.fail_on)
     if args.command == "run":
         tasks = load_tasks(args.tasks)
@@ -245,8 +311,9 @@ def main(argv: list[str] | None = None) -> int:
         results = run_task(
             tasks[args.task], DiffSource(_read_diff(args.diff)), load_capabilities(args.capabilities)
         )
+        results = _maybe_suppress(results, not args.no_suppress)
         print(_render_results(args.fmt, results))
-        return 0
+        return _gate_exit(results, args.fail_on)
     if args.command == "eval":
         try:
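
The `--fail-on` gate reduces every observation to one integer rank and compares the worst rank with the chosen threshold. Below is a self-contained sketch of that rule. It assumes a simplified `Obs` dataclass, a hypothetical stand-in for codejury's observation types. Findings rank by their own severity, and unknown labels count as medium, as in `_problem_rank`. `VULNERABLE` verdicts rank as high and `PARTIAL` verdicts as medium. Everything else ranks -1 and never trips the gate. Because `info` ranks 0, below the lowest `--fail-on` choice (`low`), info findings never fail a build.

```python
from __future__ import annotations

from dataclasses import dataclass

SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1, "info": 0}


@dataclass
class Obs:
    """Hypothetical simplified observation: a finding or a verdict."""
    kind: str            # "finding" or "verdict"
    severity: str = ""   # used by findings
    status: str = ""     # used by verdicts


def problem_rank(o: Obs) -> int:
    if o.kind == "finding":
        return SEVERITY_RANK.get(o.severity.lower(), 2)  # unknown label -> medium
    if o.kind == "verdict" and o.status == "VULNERABLE":
        return SEVERITY_RANK["high"]
    if o.kind == "verdict" and o.status == "PARTIAL":
        return SEVERITY_RANK["medium"]
    return -1  # SECURE, NOT_PRESENT, concessions, ... never fail the gate


def gate_exit(observations: list[Obs], fail_on: str | None) -> int:
    """Process exit code: 1 if the worst problem meets the threshold, else 0."""
    if not fail_on:
        return 0
    worst = max((problem_rank(o) for o in observations), default=-1)
    return 1 if worst >= SEVERITY_RANK[fail_on] else 0


obs = [Obs("finding", severity="Medium"), Obs("verdict", status="SECURE")]
print(gate_exit(obs, "high"))    # 0: the worst problem is only medium
print(gate_exit(obs, "medium"))  # 1
print(gate_exit([Obs("verdict", status="VULNERABLE")], "high"))  # 1
```

Returning the code instead of calling `sys.exit` keeps the rule testable; `main` simply returns it.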

codejury-0.5.1/codejury/data/suppressions.yaml ADDED

@@ -0,0 +1,43 @@
+# Known-noise suppression rules (data-driven false-positive filter).
+# Each drops a flagged finding whose text matches and whose path condition holds.
+# Keep these to out-of-scope / low-signal CATEGORIES -- never key on a real
+# vulnerability class, or you will drop true findings.
+- id: SUP-AVAILABILITY
+  reason: availability / DoS / rate-limiting findings are out of scope and low-signal here
+  match_any:
+    - "denial of service"
+    - "denial-of-service"
+    - "rate limit"
+    - "rate-limit"
+    - "rate limiting"
+    - "resource exhaustion"
+    - "unbounded"
+    - "amplification"
+- id: SUP-LOGGING-NOISE
+  reason: verbose / insufficient logging is noise unless a secret value is logged
+  match_any:
+    - "verbose logging"
+    - "insufficient logging"
+    - "excessive logging"
+    - "lack of logging"
+    - "log verbosity"
+- id: SUP-MEMORY-SAFETY-NON-C
+  reason: memory-safety issues do not apply outside C/C++
+  match_any:
+    - "buffer overflow"
+    - "use after free"
+    - "use-after-free"
+    - "double free"
+    - "memory corruption"
+    - "out-of-bounds"
+  unless_path_ext: [".c", ".cc", ".cpp", ".cxx", ".h", ".hpp"]
+- id: SUP-REDOS
+  reason: regex denial-of-service / catastrophic backtracking is low-signal here
+  match_any:
+    - "redos"
+    - "catastrophic backtracking"
+    - "regex denial"
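
`filter_results` and `load_suppressions` are not part of this diff, so their exact matching semantics are not shown. The sketch below is one plausible reading of the rule file. It assumes case-insensitive substring matching of `match_any` against a finding's text, and it assumes `unless_path_ext` exempts files with those extensions. `Rule` and `suppressing_rule` are hypothetical names. The rules are an abbreviated subset written inline rather than parsed from YAML, so only the standard library is needed.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class Rule:
    id: str
    match_any: tuple[str, ...]
    unless_path_ext: tuple[str, ...] = ()


# Abbreviated subset of suppressions.yaml (assumed semantics, see above).
RULES = (
    Rule("SUP-AVAILABILITY", ("denial of service", "rate limit", "unbounded")),
    Rule(
        "SUP-MEMORY-SAFETY-NON-C",
        ("buffer overflow", "use after free", "out-of-bounds"),
        unless_path_ext=(".c", ".cc", ".cpp", ".cxx", ".h", ".hpp"),
    ),
)


def suppressing_rule(text: str, path: str, rules: tuple[Rule, ...] = RULES) -> str | None:
    """Return the id of the first rule that drops this finding, else None."""
    lowered = text.lower()
    ext = PurePath(path).suffix.lower()
    for rule in rules:
        if ext in rule.unless_path_ext:
            continue  # the category is real for this language: keep the finding
        if any(phrase in lowered for phrase in rule.match_any):
            return rule.id
    return None


print(suppressing_rule("Possible buffer overflow in parse()", "src/parse.py"))
# SUP-MEMORY-SAFETY-NON-C
print(suppressing_rule("Possible buffer overflow in parse()", "src/parse.c"))
# None
print(suppressing_rule("SQL injection via user id", "app/db.py"))
# None
```

Substring matching is broad: a phrase like `"unbounded"` also matches wording in unrelated findings. This is why the file's header warns against keying a rule on a real vulnerability class.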

codejury-0.5.1/codejury/integrations/__init__.py ADDED

@@ -0,0 +1 @@
+"""codejury.integrations -- post results to external systems (GitHub PR reviews)."""

codejury-0.5.1/codejury/integrations/github.py ADDED

@@ -0,0 +1,88 @@
+"""Post audit results to a GitHub pull request as a review with inline comments.
+``build_review`` is a pure function (results -> GitHub review payload) so it is
+unit-testable; ``post_review`` does the HTTP POST and accepts an injectable
+transport so it can be tested without a token or a live PR. Problems with a
+usable file:line become inline comments; everything else is summarized in the
+review body. The review requests changes when any problem is found.
+"""
+from __future__ import annotations
+import json
+import urllib.request
+from typing import Any, Callable
+from codejury.domain.observation import Observation
+from codejury.domain.result import AnalysisResult
+Results = list[tuple[str, AnalysisResult]]
+def build_review(results: Results, *, max_comments: int = 50) -> dict:
+    comments: list[dict] = []
+    problems = 0
+    for _path, result in results:
+        for o in result.observations:
+            comment = _inline_comment(o)
+            if comment is None:
+                continue
+            problems += 1
+            if len(comments) < max_comments:
+                comments.append(comment)
+    body = (
+        f"codejury found {problems} issue(s)." if problems else "codejury found no issues."
+    )
+    if problems > len(comments):
+        body += f" Showing {len(comments)} inline; {problems - len(comments)} more omitted."
+    return {
+        "body": body,
+        "event": "REQUEST_CHANGES" if problems else "COMMENT",
+        "comments": comments,
+    }
+def _inline_comment(o: Observation) -> dict | None:
+    if o.kind == "finding":
+        evidence = o.evidence[0] if o.evidence else None
+        if evidence and evidence.file and evidence.line:
+            cwe = f" ({o.cwe})" if o.cwe else ""
+            return {"path": evidence.file, "line": evidence.line, "body": f"**{o.severity}{cwe}** {o.title}\n\n{o.description}"}
+    if o.kind == "verdict" and o.status == "VULNERABLE":
+        evidence = o.evidence[0] if o.evidence else None
+        if evidence and evidence.file and evidence.line:
+            return {"path": evidence.file, "line": evidence.line, "body": f"**VULNERABLE** `{o.capability}`\n\n{o.reasoning}"}
+    return None
+def post_review(
+    owner: str,
+    repo: str,
+    pull: int,
+    payload: dict,
+    *,
+    token: str,
+    transport: Callable[[str, bytes, dict], Any] | None = None,
+) -> Any:
+    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pull}/reviews"
+    data = json.dumps(payload).encode()
+    headers = {
+        "Authorization": f"Bearer {token}",
+        "Accept": "application/vnd.github+json",
+        "Content-Type": "application/json",
+    }
+    if transport is not None:
+        return transport(url, data, headers)
+    request = urllib.request.Request(url, data=data, headers=headers, method="POST")
+    with urllib.request.urlopen(request) as response:
+        return response.status
+def parse_pr_ref(ref: str) -> tuple[str, str, int]:
+    """Parse 'owner/repo#123' into (owner, repo, pull_number)."""
+    repo_part, _, number = ref.partition("#")
+    owner, _, repo = repo_part.partition("/")
+    if not owner or not repo or not number.isdigit():
+        raise ValueError(f"expected owner/repo#number, got {ref!r}")
+    return owner, repo, int(number)
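
Because `post_review` takes an injectable `transport`, a test can capture the request instead of calling the GitHub API. The sketch below is self-contained: it carries a verbatim copy of `parse_pr_ref` and a hypothetical `RecordingTransport` (not part of codejury) with the same `(url, data, headers)` call signature. In a real test, the recorder would be passed as `post_review(..., transport=recorder)`. Here it is called directly with the URL and headers that `post_review` builds, so the example runs without codejury installed.

```python
from __future__ import annotations

import json
from typing import Any


class RecordingTransport:
    """Hypothetical test double: records each call instead of sending it."""

    def __init__(self, status: int = 200) -> None:
        self.status = status
        self.calls: list[dict[str, Any]] = []

    def __call__(self, url: str, data: bytes, headers: dict) -> int:
        self.calls.append({"url": url, "payload": json.loads(data), "headers": headers})
        return self.status


def parse_pr_ref(ref: str) -> tuple[str, str, int]:
    """Copy of the diff's parser: 'owner/repo#123' -> (owner, repo, 123)."""
    repo_part, _, number = ref.partition("#")
    owner, _, repo = repo_part.partition("/")
    if not owner or not repo or not number.isdigit():
        raise ValueError(f"expected owner/repo#number, got {ref!r}")
    return owner, repo, int(number)


owner, repo, pull = parse_pr_ref("your-org/your-repo#123")
recorder = RecordingTransport()
payload = {  # the shape build_review returns: body, event, inline comments
    "body": "codejury found 1 issue(s).",
    "event": "REQUEST_CHANGES",
    "comments": [{"path": "app/db.py", "line": 42, "body": "**high** SQL injection"}],
}
recorder(
    f"https://api.github.com/repos/{owner}/{repo}/pulls/{pull}/reviews",
    json.dumps(payload).encode(),
    {
        "Authorization": "Bearer test-token",
        "Accept": "application/vnd.github+json",
        "Content-Type": "application/json",
    },
)
print(recorder.calls[0]["url"])
# https://api.github.com/repos/your-org/your-repo/pulls/123/reviews
```

Note that `parse_pr_ref` splits on the first `/`, so a malformed ref such as `a/b/c#1` parses with `repo == "b/c"` instead of raising.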

codejury-0.5.1/codejury/orchestrators/challenge.py ADDED

@@ -0,0 +1,67 @@
+"""ChallengeOrchestrator -- verify, then challenge the flagged verdicts.
+The verifier rules on every capability; then a refuter is shown only the
+VULNERABLE verdicts and the code, and argues which are false positives. A refuted
+verdict becomes a dismissed Concession (recording why), so the report keeps the
+SECURE/NOT_PRESENT verdicts, the surviving VULNERABLE ones, and a Dismissed list.
+This targets taint-style false positives (which a lone verifier over-reports)
+while paying the extra model call only for flagged verdicts, not the whole file.
+Only verdicts from taint-prone capabilities are challenged. Local-pattern issues
+(hardcoded secrets, weak crypto) are kept as-is: refuting them risks dropping a
+real finding, and they do not have the attacker-control ambiguity that makes
+taint checks over-report.
+"""
+from __future__ import annotations
+import dataclasses
+from codejury.agents.base import Agent
+from codejury.domain.context import AnalysisContext
+from codejury.domain.observation import Concession, Observation, Verdict
+from codejury.domain.result import AnalysisResult
+from codejury.orchestrators.base import Orchestrator
+_REQUIRED_ROLES = ("verifier", "refuter")
+_DEFAULT_TAINT_CAPABILITIES = frozenset({"input_validation"})
+class ChallengeOrchestrator(Orchestrator):
+    def __init__(self, *, taint_capabilities: frozenset[str] = _DEFAULT_TAINT_CAPABILITIES) -> None:
+        self._taint_capabilities = taint_capabilities
+    def run(self, agents: dict[str, Agent], context: AnalysisContext) -> AnalysisResult:
+        missing = [role for role in _REQUIRED_ROLES if role not in agents]
+        if missing:
+            return AnalysisResult(error=f"challenge requires agents: {', '.join(missing)}")
+        verdicts = agents["verifier"].run(context)
+        flagged = [
+            v
+            for v in verdicts
+            if isinstance(v, Verdict)
+            and v.status == "VULNERABLE"
+            and v.capability.split(".")[0] in self._taint_capabilities
+        ]
+        if not flagged:
+            return AnalysisResult(observations=verdicts)
+        refutations = agents["refuter"].run(dataclasses.replace(context, history=flagged))
+        reasons = {c.target: c.reason for c in refutations if isinstance(c, Concession)}
+        observations: list[Observation] = []
+        for v in verdicts:
+            if isinstance(v, Verdict) and v.status == "VULNERABLE" and v.capability in reasons:
+                observations.append(
+                    Concession(
+                        capability=v.capability,
+                        produced_by="refuter",
+                        target=v.capability,
+                        reason=reasons[v.capability] or "refuted as a false positive",
+                    )
+                )
+            else:
+                observations.append(v)
+        return AnalysisResult(observations=observations)
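
The merge step of `ChallengeOrchestrator` can be sketched with plain dataclasses. `V` and `Dismissed` are hypothetical stand-ins for codejury's `Verdict` and `Concession`. `refute` stands in for the refuter agent and returns a capability-to-reason map. As in the diff, only `VULNERABLE` verdicts whose capability family (the part before the first `.`) is taint-prone are sent for challenge, and the refuter is skipped entirely when none are flagged. One deliberate difference: this sketch only accepts refutations for capabilities it actually challenged. In the diff, that restriction comes only from what the refuter is shown, so a reply naming an unchallenged `VULNERABLE` capability would still dismiss it.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class V:
    """Hypothetical stand-in for a Verdict."""
    capability: str
    status: str


@dataclass(frozen=True)
class Dismissed:
    """Hypothetical stand-in for a Concession recording a refuted verdict."""
    capability: str
    reason: str


TAINT = frozenset({"input_validation"})


def challenge(
    verdicts: list[V],
    refute: Callable[[list[V]], dict[str, str]],
    taint: frozenset[str] = TAINT,
) -> list[V | Dismissed]:
    flagged = [
        v for v in verdicts
        if v.status == "VULNERABLE" and v.capability.split(".")[0] in taint
    ]
    if not flagged:
        return list(verdicts)  # nothing to challenge: no extra model call
    challenged = {v.capability for v in flagged}
    # Ignore refutations of anything that was not actually challenged.
    reasons = {c: r for c, r in refute(flagged).items() if c in challenged}
    out: list[V | Dismissed] = []
    for v in verdicts:
        if v.status == "VULNERABLE" and v.capability in reasons:
            out.append(Dismissed(v.capability, reasons[v.capability] or "refuted as a false positive"))
        else:
            out.append(v)
    return out


def fake_refuter(flagged: list[V]) -> dict[str, str]:
    # Refutes the path finding and also (wrongly) names an unchallenged capability.
    return {"input_validation.path_traversal": "path comes from config", "secrets.hardcoded": ""}


verdicts = [
    V("input_validation.path_traversal", "VULNERABLE"),
    V("secrets.hardcoded", "VULNERABLE"),
    V("crypto.weak_hash", "SECURE"),
]
result = challenge(verdicts, fake_refuter)
```

Here the path-traversal verdict becomes a `Dismissed` entry, while the hardcoded-secret verdict survives because it was never challenged.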

{codejury-0.4.1 → codejury-0.5.1}/codejury/resources.py RENAMED

@@ -11,3 +11,4 @@ _DATA = Path(__file__).resolve().parent / "data"
 CAPABILITIES_DIR = _DATA / "capabilities"
 TASKS_DIR = _DATA / "tasks"
 GOLDEN_DIR = _DATA / "golden"
+SUPPRESSIONS_FILE = _DATA / "suppressions.yaml"
