aisbom-cli 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,101 @@
1
+ Metadata-Version: 2.4
2
+ Name: aisbom-cli
3
+ Version: 0.1.0
4
+ Summary: An AI Supply Chain security tool that detects Pickle bombs and generates CycloneDX SBOMs for Machine Learning models.
5
+ Author: Ajoy L
6
+ Author-email: lab700xdev@gmail.com
7
+ Requires-Python: >=3.11,<4.0
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: Programming Language :: Python :: 3.11
10
+ Classifier: Programming Language :: Python :: 3.12
11
+ Classifier: Programming Language :: Python :: 3.13
12
+ Classifier: Programming Language :: Python :: 3.14
13
+ Requires-Dist: click (<8.2.0)
14
+ Requires-Dist: cyclonedx-python-lib (>=8.5.0,<9.0.0)
15
+ Requires-Dist: pip-requirements-parser (>=32.0.1,<33.0.0)
16
+ Requires-Dist: rich (>=13.7.1,<14.0.0)
17
+ Requires-Dist: typer[all] (>=0.12.5,<0.13.0)
18
+ Project-URL: Repository, https://github.com/Lab700xOrg/aisbom
19
+ Description-Content-Type: text/markdown
20
+
21
+ # AIsbom: The Supply Chain for Artificial Intelligence
22
+
23
+ ![License](https://img.shields.io/badge/license-Apache%202.0-blue)
24
+ ![Python](https://img.shields.io/badge/python-3.11%2B-blue)
25
+ ![Compliance](https://img.shields.io/badge/standard-CycloneDX-green)
26
+
27
+ **AIsbom** is a specialized security scanner for Machine Learning artifacts. Unlike generic SBOM tools that only parse `requirements.txt`, AIsbom performs **Deep Binary Introspection** on model files (`.pt`, `.pkl`, `.safetensors`) to detect risks hidden inside the serialized weights.
28
+
29
+ ---
30
+
31
+ ## 🚀 The Problem
32
+ AI models are not just text files; they are executable programs.
33
+ * **PyTorch (`.pt`)** files are Zip archives containing Pickle bytecode.
34
+ * **Pickle** files can execute arbitrary code (RCE) instantly upon loading.
35
+ * Legacy scanners see a binary blob and ignore it. **We look inside.**
36
+
37
+ ## ✨ Features
38
+ * **🧠 Deep Introspection:** Peeks inside PyTorch Zip structures without loading weights into RAM.
39
+ * **💣 Pickle Bomb Detector:** Disassembles bytecode to detect `os.system`, `subprocess`, and `eval` calls before they run.
40
+ * **🛡️ Compliance Ready:** Generates standard [CycloneDX v1.6](https://cyclonedx.org/) JSON for enterprise integration (Dependency-Track, ServiceNow).
41
+ * **⚡ Blazing Fast:** Scans GB-sized models in milliseconds by reading headers only.
42
+
43
+ ---
44
+
45
+ ## 📦 Installation
46
+
47
+ ```bash
48
+ git clone https://github.com/Lab700xOrg/aisbom.git
49
+ cd aisbom
50
+ pip install -e .
51
+ ```
52
+
53
+ ---
54
+
55
+ ## 🛠️ Usage
56
+
57
+ 1. Scan a directory
58
+ Pass any directory containing your ML project. AIsbom will find requirements files AND model artifacts.
59
+
60
+ ```bash
61
+ aisbom scan ./my-ml-project
62
+ ```
63
+
64
+ 2. Output
65
+ You will see a risk assessment table in your terminal:
66
+
67
+ 🧠 AI Model Artifacts Found
68
+
69
+ | Filename | Framework | Risk Level |
70
+ | :--- | :--- | :--- |
71
+ | `bert_finetune.pt` | PyTorch | 🔴 **CRITICAL** (RCE Detected: posix.system) |
72
+ | `safe_model.safetensors` | SafeTensors | 🟢 **LOW** (Binary Safe) |
73
+
74
+ A compliant `sbom.json` will be generated in the current directory.
75
+
76
+ ---
77
+
78
+ ## 🔒 Security Logic
79
+ AIsbom uses a static analysis engine to disassemble Python Pickle opcodes. It looks for specific GLOBAL and STACK_GLOBAL instructions that reference dangerous modules:
80
+
81
+ * os / posix (System calls)
82
+ * subprocess (Shell execution)
83
+ * builtins.eval / exec (Dynamic code execution)
84
+ * socket (Network reverse shells)
85
+
86
+ ---
87
+
88
+ ## 🧪 Verification & Safety
89
+
90
+ Security tools require trust, so the detection engine should be verifiable against a real payload. To keep this repository safe, **we provide the *source code* to generate a test "Pickle Bomb" locally.** AIsbom detects the *structure* of the threat, not just a known file hash.
91
+
92
+ **To verify the engine yourself:**
93
+ 1. Inspect `demo_data/generate_malware.py`. You will see it uses standard Python libraries to create a payload that simulates an `os.system` call.
94
+ 2. Run the generator:
95
+ ```bash
96
+ python demo_data/generate_malware.py
97
+ ```
98
+ 3. Scan the newly created artifact:
99
+ ```bash
100
+ aisbom scan demo_data
101
+ ```
@@ -0,0 +1,81 @@
1
+ # AIsbom: The Supply Chain for Artificial Intelligence
2
+
3
+ ![License](https://img.shields.io/badge/license-Apache%202.0-blue)
4
+ ![Python](https://img.shields.io/badge/python-3.11%2B-blue)
5
+ ![Compliance](https://img.shields.io/badge/standard-CycloneDX-green)
6
+
7
+ **AIsbom** is a specialized security scanner for Machine Learning artifacts. Unlike generic SBOM tools that only parse `requirements.txt`, AIsbom performs **Deep Binary Introspection** on model files (`.pt`, `.pkl`, `.safetensors`) to detect risks hidden inside the serialized weights.
8
+
9
+ ---
10
+
11
+ ## 🚀 The Problem
12
+ AI models are not just text files; they are executable programs.
13
+ * **PyTorch (`.pt`)** files are Zip archives containing Pickle bytecode.
14
+ * **Pickle** files can execute arbitrary code (RCE) instantly upon loading.
15
+ * Legacy scanners see a binary blob and ignore it. **We look inside.**
16
+
17
+ ## ✨ Features
18
+ * **🧠 Deep Introspection:** Peeks inside PyTorch Zip structures without loading weights into RAM.
19
+ * **💣 Pickle Bomb Detector:** Disassembles bytecode to detect `os.system`, `subprocess`, and `eval` calls before they run.
20
+ * **🛡️ Compliance Ready:** Generates standard [CycloneDX v1.6](https://cyclonedx.org/) JSON for enterprise integration (Dependency-Track, ServiceNow).
21
+ * **⚡ Blazing Fast:** Scans GB-sized models in milliseconds by reading headers only.
22
+
23
+ ---
24
+
25
+ ## 📦 Installation
26
+
27
+ ```bash
28
+ git clone https://github.com/Lab700xOrg/aisbom.git
29
+ cd aisbom
30
+ pip install -e .
31
+ ```
32
+
33
+ ---
34
+
35
+ ## 🛠️ Usage
36
+
37
+ 1. Scan a directory
38
+ Pass any directory containing your ML project. AIsbom will find requirements files AND model artifacts.
39
+
40
+ ```bash
41
+ aisbom scan ./my-ml-project
42
+ ```
43
+
44
+ 2. Output
45
+ You will see a risk assessment table in your terminal:
46
+
47
+ 🧠 AI Model Artifacts Found
48
+
49
+ | Filename | Framework | Risk Level |
50
+ | :--- | :--- | :--- |
51
+ | `bert_finetune.pt` | PyTorch | 🔴 **CRITICAL** (RCE Detected: posix.system) |
52
+ | `safe_model.safetensors` | SafeTensors | 🟢 **LOW** (Binary Safe) |
53
+
54
+ A compliant `sbom.json` will be generated in the current directory.
55
+
56
+ ---
57
+
58
+ ## 🔒 Security Logic
59
+ AIsbom uses a static analysis engine to disassemble Python Pickle opcodes. It looks for specific GLOBAL and STACK_GLOBAL instructions that reference dangerous modules:
60
+
61
+ * os / posix (System calls)
62
+ * subprocess (Shell execution)
63
+ * builtins.eval / exec (Dynamic code execution)
64
+ * socket (Network reverse shells)
65
+
66
+ ---
67
+
68
+ ## 🧪 Verification & Safety
69
+
70
+ Security tools require trust, so the detection engine should be verifiable against a real payload. To keep this repository safe, **we provide the *source code* to generate a test "Pickle Bomb" locally.** AIsbom detects the *structure* of the threat, not just a known file hash.
71
+
72
+ **To verify the engine yourself:**
73
+ 1. Inspect `demo_data/generate_malware.py`. You will see it uses standard Python libraries to create a payload that simulates an `os.system` call.
74
+ 2. Run the generator:
75
+ ```bash
76
+ python demo_data/generate_malware.py
77
+ ```
78
+ 3. Scan the newly created artifact:
79
+ ```bash
80
+ aisbom scan demo_data
81
+ ```
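The verification steps above refer to `demo_data/generate_malware.py`, which is not part of this diff. As a rough, hypothetical sketch of what such a generator could look like (the class name, output filename, and archive layout below are assumptions, not the project's actual script), a benign test payload can be built with only the standard library. Pickling the object records a reference to `os.system` but does not run it; nothing here ever unpickles the result.

```python
import os
import pickle
import tempfile
import zipfile
from pathlib import Path


class EchoPayload:
    """Hypothetical test object: unpickling it would run a harmless `echo`."""

    def __reduce__(self):
        # pickle stores a reference to os.system plus its argument tuple.
        return (os.system, ("echo AIsbom test payload",))


def write_test_checkpoint(out_dir: Path) -> Path:
    """Write a Zip shaped like a PyTorch checkpoint that holds the test pickle."""
    target = out_dir / "mock_malware.pt"  # assumed filename
    with zipfile.ZipFile(target, "w") as z:
        # dumps() only serializes; the echo command is NOT executed here.
        z.writestr("archive/data.pkl", pickle.dumps(EchoPayload(), protocol=4))
    return target


with tempfile.TemporaryDirectory() as tmp:
    artifact = write_test_checkpoint(Path(tmp))
    is_zip = zipfile.is_zipfile(artifact)
    with zipfile.ZipFile(artifact) as z:
        members = z.namelist()
```

On Linux and macOS the pickled reference is recorded under the `posix` module, which is consistent with the `posix.system` finding shown in the sample output table.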
@@ -0,0 +1 @@
1
+ # This file makes aisbom a Python package
@@ -0,0 +1,93 @@
+ import typer
+ import json
+ from rich.console import Console
+ from rich.table import Table
+ from rich.panel import Panel
+ from cyclonedx.model.bom import Bom
+ from cyclonedx.model.component import Component, ComponentType
+ from cyclonedx.output.json import JsonV1Dot5, JsonV1Dot6
+
+
+ # Import our new logic engine
+ from .scanner import DeepScanner
+
+ app = typer.Typer()
+ console = Console()
+
+ @app.callback(invoke_without_command=True)
+ def main(
+     directory: str = typer.Argument(".", help="Target directory to scan"),
+     output: str = typer.Option("sbom.json", help="Output file path"),
+     schema_version: str = typer.Option("1.6", help="CycloneDX schema version (default is 1.6)", case_sensitive=False, rich_help_panel="Advanced Options")
+ ):
+     """
+     Deep Introspection Scan: Analyzes binary headers and dependency manifests.
+     """
+     console.print(Panel.fit(f"🚀 [bold cyan]AIsbom[/bold cyan] Scanning: [underline]{directory}[/underline]"))
+
+     # 1. Run the Logic
+     scanner = DeepScanner(directory)
+     results = scanner.scan()
+
+     # 2. Render Results (UI)
+     if results['artifacts']:
+         table = Table(title="🧠 AI Model Artifacts Found")
+         table.add_column("Filename", style="cyan")
+         table.add_column("Framework", style="magenta")
+         table.add_column("Risk Level", style="bold red")
+         table.add_column("Metadata", style="dim")
+
+         for art in results['artifacts']:
+             risk_style = "green" if "LOW" in art['risk_level'] else "red"
+             table.add_row(
+                 art['name'],
+                 art['framework'],
+                 f"[{risk_style}]{art['risk_level']}[/{risk_style}]",
+                 str(art.get('details', ''))[:40] + "..."
+             )
+         console.print(table)
+     else:
+         console.print("[yellow]No AI models found.[/yellow]")
+
+     if results['dependencies']:
+         console.print(f"\n📦 Found [bold]{len(results['dependencies'])}[/bold] Python libraries.")
+
+     if results['errors']:
+         console.print("\n[bold red]⚠️ Errors Encountered:[/bold red]")
+         for err in results['errors']:
+             console.print(f" - Could not parse [yellow]{err['file']}[/yellow]: {err['error']}")
+
+     # 3. Generate CycloneDX SBOM (Standard Compliance)
+     bom = Bom()
+
+     # Add Models
+     for art in results['artifacts']:
+         c = Component(
+             name=art['name'],
+             type=ComponentType.MACHINE_LEARNING_MODEL,
+             # We shove our risk assessment into the description for now
+             description=f"Risk: {art['risk_level']} | Framework: {art['framework']}"
+         )
+         bom.components.add(c)
+
+     # Add Libraries
+     for dep in results['dependencies']:
+         c = Component(
+             name=dep['name'],
+             version=dep['version'],
+             type=ComponentType.LIBRARY
+         )
+         bom.components.add(c)
+
+     # 4. Save to Disk
+     if schema_version == "1.5":
+         outputter = JsonV1Dot5(bom)
+     else:
+         outputter = JsonV1Dot6(bom)
+
+     with open(output, "w") as f:
+         f.write(outputter.output_as_string())
+
+     console.print(f"\n[bold green]✔ Compliance Artifact Generated:[/bold green] {output} (CycloneDX v{schema_version})")
+
+ if __name__ == "__main__":
+     app()
@@ -0,0 +1,54 @@
+ import pickletools
+ import io
+ from typing import List, Set, Tuple
+
+ # The "Blocklist" of dangerous modules and functions
+ # If a model tries to import these, it is trying to break out of the sandbox.
+ DANGEROUS_GLOBALS = {
+     "os": {"system", "popen", "execl", "execvp"},
+     "subprocess": {"Popen", "call", "check_call", "check_output", "run"},
+     "builtins": {"eval", "exec", "compile", "open"},
+     "posix": {"system", "popen"},
+     "webbrowser": {"open"},
+     "socket": {"socket", "connect"},
+ }
+
+ def scan_pickle_stream(data: bytes) -> List[str]:
+     """
+     Disassembles a pickle stream and checks for dangerous imports.
+     Returns a list of detected threats (e.g., ["os.system"]).
+     """
+     threats = []
+     memo = []  # Used to track recent string literals for STACK_GLOBAL
+
+     try:
+         stream = io.BytesIO(data)
+
+         for opcode, arg, pos in pickletools.genops(stream):
+             # Track the last few string literals we've seen on the stack
+             if opcode.name in ("SHORT_BINUNICODE", "UNICODE", "BINUNICODE"):
+                 memo.append(arg)
+                 if len(memo) > 2:
+                     memo.pop(0)
+
+             if opcode.name == "GLOBAL":
+                 # pickletools.genops reports the arg as "module name" (space-separated),
+                 # not as the raw "module\nname" bytes stored in the stream
+                 if isinstance(arg, str) and " " in arg:
+                     module, name = arg.split(" ", 1)
+                     if module in DANGEROUS_GLOBALS and name in DANGEROUS_GLOBALS[module]:
+                         threats.append(f"{module}.{name}")
+
+             elif opcode.name == "STACK_GLOBAL":
+                 # Takes two arguments from the stack: module and name
+                 if len(memo) == 2:
+                     module, name = memo
+                     if module in DANGEROUS_GLOBALS and name in DANGEROUS_GLOBALS[module]:
+                         threats.append(f"{module}.{name}")
+                 # Clear memo after use to avoid false positives
+                 memo.clear()
+
+     except Exception:
+         # Avoid crashing on malformed pickles; threats found so far are still returned
+         pass
+
+     return threats
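To make the opcode handling in `scan_pickle_stream` concrete, here is a minimal standalone sketch, not the package's code. It pickles the same benign object under protocol 3 and protocol 4 and lists the callable each stream references. The `Payload` class and the `global_refs` helper are illustrative names introduced here. The sketch relies on two `pickletools` behaviors: `GLOBAL` carries its target as a single `"module name"` string, while `STACK_GLOBAL` has no argument and takes module and name from the two most recently pushed strings.

```python
import pickle
import pickletools


class Payload:
    """Benign stand-in: unpickling would call eval("1 + 1"). Never unpickled here."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


def global_refs(data: bytes) -> list[str]:
    """Return 'module.name' for every GLOBAL / STACK_GLOBAL opcode in the stream."""
    refs = []
    strings = []  # string literals in the order they were pushed
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            strings.append(arg)
        elif opcode.name == "GLOBAL":
            # genops joins the module/name pair with a single space.
            module, name = arg.split(" ", 1)
            refs.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Module and name are the two most recent string pushes.
            refs.append(f"{strings[-2]}.{strings[-1]}")
    return refs


# Protocol 3 emits GLOBAL; protocol 4 emits STACK_GLOBAL for the same object.
legacy = global_refs(pickle.dumps(Payload(), protocol=3))
modern = global_refs(pickle.dumps(Payload(), protocol=4))
```

Both lists name `builtins.eval`, so a blocklist lookup keyed on module and function name works for either opcode. Pickles written with protocol 2 or lower may instead name Python 2 modules such as `__builtin__`, which a blocklist keyed only on `builtins` would not match.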
@@ -0,0 +1,126 @@
+ import os
+ import json
+ import zipfile
+ import struct
+ from typing import List, Dict, Any
+ from pathlib import Path
+ from .safety import scan_pickle_stream
+ from pip_requirements_parser import RequirementsFile
+
+ # Constants for file types make the code cleaner and easier to extend
+ PYTORCH_EXTENSIONS = {'.pt', '.pth', '.bin'}
+ SAFETENSORS_EXTENSION = '.safetensors'
+ REQUIREMENTS_FILENAME = 'requirements.txt'
+
+ class DeepScanner:
+     def __init__(self, root_path: str):
+         self.root_path = Path(root_path)
+         self.artifacts = []
+         self.dependencies = []
+         self.errors = []
+
+     def scan(self):
+         """Orchestrates the scan of the directory."""
+         # Use rglob for a more concise way to recursively find files
+         for full_path in self.root_path.rglob("*"):
+             if full_path.is_file():
+                 ext = full_path.suffix.lower()
+
+                 # 1. Scan AI Artifacts
+                 if ext in PYTORCH_EXTENSIONS:
+                     self.artifacts.append(self._inspect_pytorch(full_path))
+                 elif ext == SAFETENSORS_EXTENSION:
+                     self.artifacts.append(self._inspect_safetensors(full_path))
+
+                 # 2. Scan Dependency Manifests
+                 elif full_path.name == REQUIREMENTS_FILENAME:
+                     self._parse_requirements(full_path)
+
+         return {"artifacts": self.artifacts, "dependencies": self.dependencies, "errors": self.errors}
+
+     def _inspect_pytorch(self, path: Path) -> Dict[str, Any]:
+         """Peeks inside a PyTorch file structure and SCANS for malware."""
+         meta = {
+             "name": path.name,
+             "type": "machine-learning-model",
+             "framework": "PyTorch",
+             "risk_level": "UNKNOWN",
+             "details": {}
+         }
+         try:
+             if zipfile.is_zipfile(path):
+                 with zipfile.ZipFile(path, 'r') as z:
+                     files = z.namelist()
+
+                     # 1. Find the data file (usually archive/data.pkl or just data.pkl)
+                     pickle_files = [f for f in files if f.endswith('.pkl')]
+
+                     threats = []
+                     if pickle_files:
+                         # 2. Extract and Scan the pickle bytes
+                         # Only the first .pkl entry is scanned, and only its first 10MB, to stay fast
+                         main_pkl = pickle_files[0]
+                         with z.open(main_pkl) as f:
+                             # Read first 10MB max to prevent zip bombs
+                             content = f.read(10 * 1024 * 1024)
+                             threats = scan_pickle_stream(content)
+
+                     # 3. Assess Risk
+                     if threats:
+                         meta["risk_level"] = f"CRITICAL (RCE Detected: {', '.join(threats)})"
+                     elif pickle_files:
+                         meta["risk_level"] = "MEDIUM (Pickle Present)"
+                     else:
+                         meta["risk_level"] = "LOW (No bytecode found)"
+
+                     meta["details"] = {"internal_files": len(files), "threats": threats}
+             else:
+                 meta["risk_level"] = "CRITICAL (Legacy Binary)"
+         except Exception as e:
+             meta["error"] = str(e)
+         return meta
+
+     def _inspect_safetensors(self, path: Path) -> Dict[str, Any]:
+         """Reads the JSON header from a .safetensors file."""
+         meta = {
+             "name": path.name,
+             "type": "machine-learning-model",
+             "framework": "SafeTensors",
+             "risk_level": "LOW",  # Safe by design
+             "details": {}
+         }
+         try:
+             with open(path, 'rb') as f:
+                 # First 8 bytes = header length (little-endian unsigned 64-bit)
+                 length_bytes = f.read(8)
+                 if len(length_bytes) == 8:
+                     header_len = struct.unpack('<Q', length_bytes)[0]
+                     header_json = json.loads(f.read(header_len))
+                     meta["details"] = {
+                         # "__metadata__" is a reserved key, not a tensor, so exclude it from the count
+                         "tensors": len([k for k in header_json if k != "__metadata__"]),
+                         "metadata": header_json.get("__metadata__", {})
+                     }
+         except Exception as e:
+             meta["error"] = str(e)
+         return meta
+
+     def _parse_requirements(self, path: Path):
+         """Parses requirements.txt into individual components."""
+         try:
+             req_file = RequirementsFile.from_file(path)
+             for req in req_file.requirements:
+                 if req.name:
+                     # Robust version extraction
+                     version = "unknown"
+                     specs = list(req.specifier) if req.specifier else []
+                     if specs:
+                         # Grab the first version number found, e.g. "==1.2.0" -> "1.2.0"
+                         version = specs[0].version
+
+                     self.dependencies.append({
+                         "name": req.name,
+                         "version": version,
+                         "type": "library"
+                     })
+         except Exception as e:
+             self.errors.append({"file": str(path), "error": str(e)})
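The Zip-then-pickle flow in `_inspect_pytorch` can be tried without PyTorch installed. The following is a minimal sketch under stated assumptions: the archive layout (`archive/data.pkl` plus a raw storage entry) is a simplified imitation of a checkpoint, and `build_fake_checkpoint` and `pickle_opcodes_in_archive` are hypothetical helper names. Unlike the scanner above, which reads only the first `.pkl` entry, this variant walks every `.pkl` member, one possible way to avoid missing a payload stored in a later entry.

```python
import io
import pickle
import pickletools
import zipfile

MAX_PICKLE_BYTES = 10 * 1024 * 1024  # same 10 MB read cap as _inspect_pytorch


def build_fake_checkpoint() -> bytes:
    """Create an in-memory Zip that loosely imitates a PyTorch checkpoint."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("archive/data.pkl", pickle.dumps({"layer.weight": [0.0, 1.0]}, protocol=2))
        z.writestr("archive/data/0", b"\x00" * 8)  # stand-in for raw tensor storage
    return buf.getvalue()


def pickle_opcodes_in_archive(blob: bytes) -> dict[str, list[str]]:
    """Map each .pkl member to the opcode names found in its capped byte prefix."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(blob)) as z:
        for member in z.namelist():
            if member.endswith(".pkl"):
                with z.open(member) as f:
                    data = f.read(MAX_PICKLE_BYTES)  # never decompress more than the cap
                found[member] = [op.name for op, _arg, _pos in pickletools.genops(data)]
    return found


report = pickle_opcodes_in_archive(build_fake_checkpoint())
```

A plain dictionary of floats disassembles without any `GLOBAL` or `STACK_GLOBAL` opcode, which is the property the blocklist check in `safety.py` depends on. Only the pickle entry is read; the tensor storage entry is never decompressed.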
@@ -0,0 +1,21 @@
1
+ [tool.poetry]
2
+ name = "aisbom-cli"
3
+ version = "0.1.0"
4
+ description = "An AI Supply Chain security tool that detects Pickle bombs and generates CycloneDX SBOMs for Machine Learning models."
5
+ authors = ["Ajoy L <lab700xdev@gmail.com>"]
6
+ readme = "README.md"
7
+ packages = [{include = "aisbom"}]
8
+ repository = "https://github.com/Lab700xOrg/aisbom"
9
+ [tool.poetry.dependencies]
10
+ python = "^3.11"
11
+ typer = {extras = ["all"], version = "^0.12.5"}
12
+ rich = "^13.7.1"
13
+ cyclonedx-python-lib = "^8.5.0"
14
+ pip-requirements-parser = "^32.0.1"
15
+ click = "<8.2.0"
16
+ [build-system]
17
+ requires = ["poetry-core"]
18
+ build-backend = "poetry.core.masonry.api"
19
+
20
+ [tool.poetry.scripts]
21
+ aisbom = "aisbom.cli:app"