audit-packs-core 0.1.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

audit_packs_core-0.1.1/PKG-INFO ADDED

@@ -0,0 +1,32 @@
+Metadata-Version: 2.4
+Name: audit-packs-core
+Version: 0.1.1
+Summary: Core models, normalization, diff, and data-flow primitives for audit-packs
+License: Apache-2.0
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Requires-Dist: PyYAML>=6.0
+# audit-packs-core
+[![PyPI version](https://img.shields.io/pypi/v/audit-packs-core.svg)](https://pypi.org/project/audit-packs-core/)
+[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](../../LICENSE)
+`audit-packs-core` is the foundational library for the `audit-packs` ecosystem. It provides the core data structures, schema models, parser interfaces, diffing utilities, and normalization primitives used across all other package modules.
+## Installation
+```bash
+pip install audit-packs-core
+```
+## Features
+- **Standardized Schema Models**: Defines standard structures for scanner findings, controls, frameworks, rules, and reports.
+- **Normalization Primitives**: Converts scanner-specific findings into a scanner-agnostic intermediate representation.
+- **Diffing Utilities**: Compares findings between parent and feature branches to detect newly introduced compliance gaps.
+- **YAML Configuration Parser**: Parses standard YAML frameworks and control files.
+## Learn More
+This library is part of the larger `audit-packs` Compliance Intelligence Engine. For the main command-line interface, GitHub Action integration, and framework mappings, see the [main repository](https://github.com/prakharsingh/audit-packs).

audit_packs_core-0.1.1/README.md ADDED

@@ -0,0 +1,23 @@
+# audit-packs-core
+[![PyPI version](https://img.shields.io/pypi/v/audit-packs-core.svg)](https://pypi.org/project/audit-packs-core/)
+[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](../../LICENSE)
+`audit-packs-core` is the foundational library for the `audit-packs` ecosystem. It provides the core data structures, schema models, parser interfaces, diffing utilities, and normalization primitives used across all other package modules.
+## Installation
+```bash
+pip install audit-packs-core
+```
+## Features
+- **Standardized Schema Models**: Defines standard structures for scanner findings, controls, frameworks, rules, and reports.
+- **Normalization Primitives**: Converts scanner-specific findings into a scanner-agnostic intermediate representation.
+- **Diffing Utilities**: Compares findings between parent and feature branches to detect newly introduced compliance gaps.
+- **YAML Configuration Parser**: Parses standard YAML frameworks and control files.
+## Learn More
+This library is part of the larger `audit-packs` Compliance Intelligence Engine. For the main command-line interface, GitHub Action integration, and framework mappings, see the [main repository](https://github.com/prakharsingh/audit-packs).

audit_packs_core-0.1.1/pyproject.toml ADDED

@@ -0,0 +1,15 @@
+[project]
+name = "audit-packs-core"
+version = "0.1.1"
+description = "Core models, normalization, diff, and data-flow primitives for audit-packs"
+readme = "README.md"
+license = { text = "Apache-2.0" }
+requires-python = ">=3.11"
+dependencies = ["PyYAML>=6.0"]
+[build-system]
+requires = ["setuptools>=68"]
+build-backend = "setuptools.build_meta"
+[tool.setuptools.packages.find]
+where = ["src"]

audit_packs_core-0.1.1/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

audit_packs_core-0.1.1/src/audit_packs_core/__init__.py ADDED

File without changes

audit_packs_core-0.1.1/src/audit_packs_core/dataflow.py ADDED

@@ -0,0 +1,171 @@
+from __future__ import annotations
+import re
+from dataclasses import dataclass
+@dataclass(frozen=True)
+class DataFlow:
+    source_line: int
+    source_type: str
+    transforms: tuple[str, ...]
+    sink_line: int
+    sink_type: str
+    has_transform: bool
+_PYTHON_SOURCE_PATTERNS = [
+    # request.form, request.data, request.json
+    (re.compile(r"\brequest\.(form|data|json)\b"), "user_input"),
+    # input() calls
+    (re.compile(r"\binput\s*\("), "user_input"),
+    # os.environ
+    (re.compile(r"\bos\.environ\b"), "env_var"),
+    # ORM .get() / .filter() on known models
+    (re.compile(r"\b(User|Patient|Customer)\.(get|filter|filter_by)\s*\("), "db_read"),
+]
+_PYTHON_TRANSFORM_NAMES = {"encrypt", "mask", "hash", "anonymise", "redact", "bcrypt"}
+_PYTHON_SINK_PATTERNS = [
+    (re.compile(r"\bdb\.session\.add\s*\("), "db_write"),
+    (re.compile(r"\b\w+\.save\s*\(\s*\)"), "db_write"),
+    (re.compile(r"\brequests\.(post|put)\s*\("), "api_call"),
+    (re.compile(r"\blogging\.(info|warning|error|debug|critical)\s*\("), "log"),
+    (re.compile(r"\bprint\s*\("), "log"),
+    (re.compile(r"\bresponse\.json\s*\("), "response"),
+]
+_HCL_SOURCE_PATTERN = re.compile(r'\bvar\.\w+|\bdata\s+"aws_secretsmanager_secret"')
+_HCL_TRANSFORM_PATTERN = re.compile(r"\bkms_key_id\s*=|\bencrypted\s*=\s*true")
+_HCL_SINK_PATTERN = re.compile(
+    r'\bresource\s+"(aws_s3_bucket_object|aws_rds_cluster|aws_lambda_function)"'
+)
+def _extract_python_flows(text: str) -> list[DataFlow]:
+    lines = text.splitlines()
+    flows: list[DataFlow] = []
+    sources: list[tuple[int, str]] = []
+    sinks: list[tuple[int, str]] = []
+    transform_lines: list[int] = []
+    for i, line in enumerate(lines, start=1):
+        for pattern, src_type in _PYTHON_SOURCE_PATTERNS:
+            if pattern.search(line):
+                sources.append((i, src_type))
+                break
+        for name in _PYTHON_TRANSFORM_NAMES:
+            if re.search(rf"\b{name}\s*\(", line):
+                transform_lines.append(i)
+                break
+        for pattern, sink_type in _PYTHON_SINK_PATTERNS:
+            if pattern.search(line):
+                sinks.append((i, sink_type))
+                break
+    for src_line, src_type in sources:
+        for sink_line, sink_type in sinks:
+            if sink_line <= src_line:
+                continue
+            transforms_between = tuple(
+                _name
+                for _name in _PYTHON_TRANSFORM_NAMES
+                for t_line in transform_lines
+                if src_line < t_line < sink_line
+                and re.search(rf"\b{_name}\s*\(", lines[t_line - 1])
+            )
+            has_transform = bool(transforms_between) or any(
+                src_line < t < sink_line for t in transform_lines
+            )
+            flows.append(
+                DataFlow(
+                    source_line=src_line,
+                    source_type=src_type,
+                    transforms=transforms_between,
+                    sink_line=sink_line,
+                    sink_type=sink_type,
+                    has_transform=has_transform,
+                )
+            )
+    return flows
+def _extract_hcl_flows(text: str) -> list[DataFlow]:
+    lines = text.splitlines()
+    sources: list[int] = []
+    sinks: list[int] = []
+    has_transform = False
+    for i, line in enumerate(lines, start=1):
+        if _HCL_SOURCE_PATTERN.search(line):
+            sources.append(i)
+        if _HCL_TRANSFORM_PATTERN.search(line):
+            has_transform = True
+        if _HCL_SINK_PATTERN.search(line):
+            sinks.append(i)
+    flows = []
+    for src in sources:
+        for sink in sinks:
+            if sink > src:
+                flows.append(
+                    DataFlow(
+                        source_line=src,
+                        source_type="env_var",
+                        transforms=(),
+                        sink_line=sink,
+                        sink_type="db_write",
+                        has_transform=has_transform,
+                    )
+                )
+    return flows
+def extract_data_flows(file_text: str, language: str) -> list[DataFlow]:
+    """Extract source→transform→sink chains. language: 'python'|'hcl'|'yaml'|'json'."""
+    if language == "python":
+        return _extract_python_flows(file_text)
+    if language in ("hcl", "yaml", "json"):
+        return _extract_hcl_flows(file_text)
+    return []
+def flow_confidence(flows: list[DataFlow], finding_line: int) -> float:
+    """
+    Compute flow_confidence score for finding at finding_line.
+    Returns 0.5 (neutral) when no flows are within ±50 lines.
+    Among in-range flows, selects closest to finding_line (tie-break: prefer has_transform=False).
+    Classification:
+      has_transform=False, both ends in range  → 0.9
+      has_transform=False, one end in range    → 0.7
+      has_transform=True,  both ends in range  → 0.2
+      has_transform=True,  one end in range    → 0.5
+    """
+    RANGE = 50
+    def in_range(line: int) -> bool:
+        return abs(line - finding_line) <= RANGE
+    in_range_flows = [
+        f for f in flows if in_range(f.source_line) or in_range(f.sink_line)
+    ]
+    if not in_range_flows:
+        return 0.5
+    def sort_key(f: DataFlow) -> tuple:
+        dist = min(abs(f.source_line - finding_line), abs(f.sink_line - finding_line))
+        return (dist, 0 if not f.has_transform else 1)
+    best = sorted(in_range_flows, key=sort_key)[0]
+    both_in_range = in_range(best.source_line) and in_range(best.sink_line)
+    if not best.has_transform:
+        return 0.9 if both_in_range else 0.7
+    else:
+        return 0.2 if both_in_range else 0.5
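
The classification table in the `flow_confidence` docstring above can be checked with a small worked example. The sketch below is a standalone, condensed copy of the scoring logic with a trimmed `DataFlow` (only the fields the score reads); the sample flows and line numbers are made up for illustration and do not come from the package.

```python
from __future__ import annotations

from dataclasses import dataclass

RANGE = 50  # same +/-50 line window as the listing above


@dataclass(frozen=True)
class DataFlow:
    # Trimmed copy of the dataclass in dataflow.py: only the fields scoring needs.
    source_line: int
    sink_line: int
    has_transform: bool


def flow_confidence(flows: list[DataFlow], finding_line: int) -> float:
    def in_range(line: int) -> bool:
        return abs(line - finding_line) <= RANGE

    candidates = [f for f in flows if in_range(f.source_line) or in_range(f.sink_line)]
    if not candidates:
        return 0.5  # neutral: no flow has an end near the finding

    def sort_key(f: DataFlow) -> tuple[int, int]:
        dist = min(abs(f.source_line - finding_line), abs(f.sink_line - finding_line))
        # On equal distance, untransformed flows sort first.
        return (dist, 1 if f.has_transform else 0)

    best = min(candidates, key=sort_key)
    both = in_range(best.source_line) and in_range(best.sink_line)
    if not best.has_transform:
        return 0.9 if both else 0.7
    return 0.2 if both else 0.5


# Hypothetical flows around a finding reported on line 15.
raw_near = DataFlow(source_line=10, sink_line=20, has_transform=False)
masked_near = DataFlow(source_line=10, sink_line=20, has_transform=True)
raw_far_sink = DataFlow(source_line=10, sink_line=200, has_transform=False)
masked_far_sink = DataFlow(source_line=10, sink_line=200, has_transform=True)

for name, flows in [
    ("raw, both ends near", [raw_near]),
    ("masked, both ends near", [masked_near]),
    ("raw, sink far away", [raw_far_sink]),
    ("masked, sink far away", [masked_far_sink]),
    ("tie between masked and raw", [masked_near, raw_near]),
    ("no flows", []),
]:
    print(f"{name}: {flow_confidence(flows, 15)}")
```

Note that the neutral score 0.5 is returned both when nothing is in range and when the best flow has a transform with only one end in range, so a caller cannot tell those two cases apart from the score alone.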

audit_packs_core-0.1.1/src/audit_packs_core/diff.py ADDED

@@ -0,0 +1,21 @@
+import re
+_HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")
+def parse_unified_diff(diff_text: str) -> dict[str, set[int]]:
+    result: dict[str, set[int]] = {}
+    current: str | None = None
+    for line in diff_text.splitlines():
+        if line.startswith("+++ b/"):
+            current = line[len("+++ b/") :].strip()
+            continue
+        if line.startswith("+++ ") or line.startswith("--- "):
+            continue
+        m = _HUNK.match(line)
+        if m and current is not None:
+            start = int(m.group(1))
+            count = int(m.group(2)) if m.group(2) is not None else 1
+            if count > 0:
+                result.setdefault(current, set()).update(range(start, start + count))
+    return {f: lines for f, lines in result.items() if lines}
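
To see what `parse_unified_diff` returns, the sketch below repeats the function from the listing above so it runs standalone and feeds it a small, made-up diff. The file names and hunks are illustrative only.

```python
import re

# Copied from diff.py above: captures the new-file start line and optional count.
_HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def parse_unified_diff(diff_text: str) -> dict[str, set[int]]:
    result: dict[str, set[int]] = {}
    current: str | None = None
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):
            current = line[len("+++ b/"):].strip()
            continue
        if line.startswith("+++ ") or line.startswith("--- "):
            continue
        m = _HUNK.match(line)
        if m and current is not None:
            start = int(m.group(1))
            # A missing count in a hunk header means a single line.
            count = int(m.group(2)) if m.group(2) is not None else 1
            if count > 0:
                result.setdefault(current, set()).update(range(start, start + count))
    return {f: lines for f, lines in result.items() if lines}


# Hypothetical diff: two hunks in app.py, then a deleted file.
sample = "\n".join(
    [
        "--- a/app.py",
        "+++ b/app.py",
        "@@ -1,3 +1,4 @@",
        " import os",
        "+import sys",
        " def main():",
        "     pass",
        "@@ -10 +11 @@",
        "-x = 1",
        "+x = 2",
        "--- a/old.txt",
        "+++ /dev/null",
        "@@ -1,2 +0,0 @@",
        "-a",
        "-b",
    ]
)

touched = parse_unified_diff(sample)
print(touched)
```

Two behaviors are worth keeping in mind when using the result: the set covers every line in each hunk's new-file range, context lines included, not only the `+` lines; and a pure deletion (`+0,0`) contributes nothing because its count is zero.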

audit_packs_core-0.1.1/src/audit_packs_core/models.py ADDED

@@ -0,0 +1,81 @@
+from __future__ import annotations
+from dataclasses import dataclass
+from enum import Enum
+SEVERITIES = ("low", "medium", "high", "critical")
+def severity_rank(severity: str) -> int:
+    return SEVERITIES.index(severity)
+@dataclass(frozen=True)
+class PathNode:
+    file: str
+    line: int
+    snippet: str
+    description: str
+@dataclass(frozen=True)
+class Finding:
+    check_id: str
+    engine: str
+    file: str
+    line: int
+    severity: str
+    message: str
+    evidence: str
+    doc_context: str = ""
+    evidence_path: tuple[PathNode, ...] = ()
+@dataclass(frozen=True)
+class ControlFinding:
+    finding: Finding
+    framework: str
+    control_id: str
+    control_title: str
+    evidence_requirements: tuple = ()
+class AssessmentStatus(str, Enum):
+    """Status of a control after evidence collection."""
+    PASS = "pass"
+    FAIL = "fail"
+    NOT_APPLICABLE = "not_applicable"
+    MANUAL = "manual"
+class AdjudicationMode(str, Enum):
+    OFF = "off"
+    ADVISORY = "advisory"
+    ENFORCE = "enforce"
+@dataclass(frozen=True)
+class AdjudicationResult:
+    control_finding: ControlFinding
+    detector_score: float
+    verifier_argument: str
+    challenger_argument: str
+    consensus_score: float
+    model_consensus: float
+    rationale: str
+@dataclass(frozen=True)
+class ControlStatus:
+    """Status-aware view of a single compliance control after assessment."""
+    framework: str
+    control_id: str
+    control_title: str
+    status: AssessmentStatus
+    # (engine, check_id) pairs that guard this control
+    check_ids: tuple
+    # ControlFinding instances that caused a FAIL
+    findings: tuple
+    # raw evidence strings extracted from findings
+    evidence: tuple
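
A short sketch of how `severity_rank` and the `str`-based enums from the listing above behave in practice. `SEVERITIES`, `severity_rank`, and `AssessmentStatus` are copied from models.py; `meets_threshold` is a hypothetical helper introduced only for this example and is not part of the package.

```python
from enum import Enum

# Copied from models.py above.
SEVERITIES = ("low", "medium", "high", "critical")


def severity_rank(severity: str) -> int:
    return SEVERITIES.index(severity)


class AssessmentStatus(str, Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_APPLICABLE = "not_applicable"
    MANUAL = "manual"


def meets_threshold(severity: str, minimum: str) -> bool:
    # Hypothetical helper: True when severity is at or above the minimum.
    return severity_rank(severity) >= severity_rank(minimum)


observed = ["medium", "critical", "low", "high"]

# Most severe first.
ordered = sorted(observed, key=severity_rank, reverse=True)

# Keep only severities that would block at a "high" threshold.
blocking = [s for s in observed if meets_threshold(s, "high")]

# severity_rank uses tuple.index, so an unknown label raises ValueError.
try:
    severity_rank("info")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True

# The str mixin lets a status round-trip from, and compare equal to, its raw value.
status = AssessmentStatus("fail")
print(ordered, blocking, unknown_rejected, status is AssessmentStatus.FAIL)
```

Because unknown labels raise, callers that accept severities from external tools would likely want to map them first, as normalize.py does with its lookup tables.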

audit_packs_core-0.1.1/src/audit_packs_core/normalize.py ADDED

@@ -0,0 +1,104 @@
+from audit_packs_core.models import Finding, PathNode
+_LEVEL_TO_SEVERITY = {
+    "error": "high",
+    "warning": "medium",
+    "note": "low",
+    "none": "low",
+}
+_PROP_TO_SEVERITY = {
+    "CRITICAL": "critical",
+    "HIGH": "high",
+    "MEDIUM": "medium",
+    "LOW": "low",
+    "INFO": "low",
+}
+_CONFIDENCE_MAP = {"HIGH": 0.9, "MEDIUM": 0.6, "LOW": 0.3}
+def _extract_evidence_path(result: dict) -> tuple[PathNode, ...]:
+    """Parse codeFlows[0].threadFlows[0].locations into PathNode tuples."""
+    code_flows = result.get("codeFlows", [])
+    if not code_flows:
+        return ()
+    thread_flows = code_flows[0].get("threadFlows", [])
+    if not thread_flows:
+        return ()
+    locations = thread_flows[0].get("locations", [])
+    nodes = []
+    for loc_entry in locations:
+        loc = loc_entry.get("location", {})
+        phys = loc.get("physicalLocation", {})
+        uri = phys.get("artifactLocation", {}).get("uri", "")
+        line = phys.get("region", {}).get("startLine", 0)
+        snippet = phys.get("region", {}).get("snippet", {}).get("text", "")
+        description = loc.get("message", {}).get("text", "")
+        nodes.append(
+            PathNode(file=uri, line=int(line), snippet=snippet, description=description)
+        )
+    return tuple(nodes)
+def _normalize_rule_id(rule_id: str, engine: str) -> str:
+    """Strip dotted namespace prefix from semgrep rule IDs (e.g. 'org.foo.bar' → 'bar').
+    Only applied for semgrep because other engines (checkov, codeql, ast) use their
+    own ID schemes and stripping would break pack lookups or collapse distinct rules.
+    """
+    if engine == "semgrep" and "." in rule_id:
+        return rule_id.split(".")[-1]
+    return rule_id
+def sarif_to_findings(sarif: dict, engine: str) -> list[Finding]:
+    findings: list[Finding] = []
+    for run in sarif.get("runs", []):
+        for res in run.get("results", []):
+            locs = res.get("locations", [])
+            if not locs:
+                continue
+            phys = locs[0].get("physicalLocation", {})
+            path = phys.get("artifactLocation", {}).get("uri", "")
+            line = phys.get("region", {}).get("startLine", 1)
+            msg = res.get("message", {}).get("text", "")
+            snippet = phys.get("region", {}).get("snippet", {}).get("text", "")
+            prop_sev = _PROP_TO_SEVERITY.get(
+                res.get("properties", {}).get("severity", "").upper()
+            )
+            level_sev = _LEVEL_TO_SEVERITY.get(res.get("level", "warning"), "medium")
+            evidence_path = _extract_evidence_path(res)
+            raw_id = res.get("ruleId", "")
+            check_id = _normalize_rule_id(raw_id, engine)
+            findings.append(
+                Finding(
+                    check_id=check_id,
+                    engine=engine,
+                    file=path,
+                    line=int(line),
+                    severity=prop_sev or level_sev,
+                    message=msg,
+                    evidence=snippet or msg,
+                    evidence_path=evidence_path,
+                )
+            )
+    return findings
+def extract_rule_confidences(sarif: dict, engine: str = "") -> dict[str, float]:
+    """Return {rule_id → confidence_score} from SARIF tool rule metadata.
+    The engine parameter must match the value passed to sarif_to_findings so that
+    the keys in the returned dict align with Finding.check_id values.
+    """
+    confidences: dict[str, float] = {}
+    for run in sarif.get("runs", []):
+        rules = run.get("tool", {}).get("driver", {}).get("rules", [])
+        for rule in rules:
+            rule_id = rule.get("id", "")
+            norm_id = _normalize_rule_id(rule_id, engine)
+            conf_str = rule.get("properties", {}).get("confidence", "")
+            if conf_str.upper() in _CONFIDENCE_MAP:
+                confidences[norm_id] = _CONFIDENCE_MAP[conf_str.upper()]
+    return confidences
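
The severity precedence and rule-ID handling in `sarif_to_findings` above can be isolated into a small standalone sketch. The lookup tables and `normalize_rule_id` logic are copied from normalize.py; `resolve_severity` is a hypothetical name for the two lines that compute `prop_sev or level_sev`, and the sample results and rule IDs are illustrative inputs, not output from a real scan.

```python
# Copied from normalize.py above.
_LEVEL_TO_SEVERITY = {
    "error": "high",
    "warning": "medium",
    "note": "low",
    "none": "low",
}
_PROP_TO_SEVERITY = {
    "CRITICAL": "critical",
    "HIGH": "high",
    "MEDIUM": "medium",
    "LOW": "low",
    "INFO": "low",
}


def resolve_severity(result: dict) -> str:
    # Hypothetical helper: properties.severity wins, SARIF level is the fallback.
    prop_sev = _PROP_TO_SEVERITY.get(
        result.get("properties", {}).get("severity", "").upper()
    )
    level_sev = _LEVEL_TO_SEVERITY.get(result.get("level", "warning"), "medium")
    return prop_sev or level_sev


def normalize_rule_id(rule_id: str, engine: str) -> str:
    # Same logic as _normalize_rule_id: only semgrep IDs lose their dotted prefix.
    if engine == "semgrep" and "." in rule_id:
        return rule_id.split(".")[-1]
    return rule_id


# Illustrative SARIF result fragments.
samples = [
    {"level": "error"},                                         # level only
    {"level": "note", "properties": {"severity": "critical"}},  # property overrides level
    {},                                                         # nothing set
    {"level": "unrecognized"},                                  # unknown level
]
severities = [resolve_severity(r) for r in samples]

semgrep_id = normalize_rule_id("python.lang.security.audit.eval-detected", "semgrep")
checkov_id = normalize_rule_id("CKV_AWS_20", "checkov")
dotted_other = normalize_rule_id("org.example.rule", "codeql")
print(severities, semgrep_id, checkov_id, dotted_other)
```

As the docstring of `extract_rule_confidences` points out, the same `engine` value has to be used on both sides, otherwise a stripped semgrep ID in one dict will not match the unstripped ID in the other.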

audit_packs_core-0.1.1/src/audit_packs_core.egg-info/PKG-INFO ADDED

@@ -0,0 +1,32 @@
+Metadata-Version: 2.4
+Name: audit-packs-core
+Version: 0.1.1
+Summary: Core models, normalization, diff, and data-flow primitives for audit-packs
+License: Apache-2.0
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Requires-Dist: PyYAML>=6.0
+# audit-packs-core
+[![PyPI version](https://img.shields.io/pypi/v/audit-packs-core.svg)](https://pypi.org/project/audit-packs-core/)
+[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](../../LICENSE)
+`audit-packs-core` is the foundational library for the `audit-packs` ecosystem. It provides the core data structures, schema models, parser interfaces, diffing utilities, and normalization primitives used across all other package modules.
+## Installation
+```bash
+pip install audit-packs-core
+```
+## Features
+- **Standardized Schema Models**: Defines standard structures for scanner findings, controls, frameworks, rules, and reports.
+- **Normalization Primitives**: Converts scanner-specific findings into a scanner-agnostic intermediate representation.
+- **Diffing Utilities**: Compares findings between parent and feature branches to detect newly introduced compliance gaps.
+- **YAML Configuration Parser**: Parses standard YAML frameworks and control files.
+## Learn More
+This library is part of the larger `audit-packs` Compliance Intelligence Engine. For the main command-line interface, GitHub Action integration, and framework mappings, see the [main repository](https://github.com/prakharsingh/audit-packs).

audit_packs_core-0.1.1/src/audit_packs_core.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,12 @@
+README.md
+pyproject.toml
+src/audit_packs_core/__init__.py
+src/audit_packs_core/dataflow.py
+src/audit_packs_core/diff.py
+src/audit_packs_core/models.py
+src/audit_packs_core/normalize.py
+src/audit_packs_core.egg-info/PKG-INFO
+src/audit_packs_core.egg-info/SOURCES.txt
+src/audit_packs_core.egg-info/dependency_links.txt
+src/audit_packs_core.egg-info/requires.txt
+src/audit_packs_core.egg-info/top_level.txt

audit_packs_core-0.1.1/src/audit_packs_core.egg-info/dependency_links.txt ADDED

@@ -0,0 +1 @@
+

audit_packs_core-0.1.1/src/audit_packs_core.egg-info/requires.txt ADDED

@@ -0,0 +1 @@
+PyYAML>=6.0

audit_packs_core-0.1.1/src/audit_packs_core.egg-info/top_level.txt ADDED

@@ -0,0 +1 @@
+audit_packs_core