redcodegen 0.1.0b0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- redcodegen-0.1.0b0/PKG-INFO +131 -0
- redcodegen-0.1.0b0/README.md +114 -0
- redcodegen-0.1.0b0/pyproject.toml +40 -0
- redcodegen-0.1.0b0/redcodegen/#main.py# +263 -0
- redcodegen-0.1.0b0/redcodegen/#seeds.py# +17 -0
- redcodegen-0.1.0b0/redcodegen/__init__.py +0 -0
- redcodegen-0.1.0b0/redcodegen/constants.py +68 -0
- redcodegen-0.1.0b0/redcodegen/data/__init__.py +2 -0
- redcodegen-0.1.0b0/redcodegen/data/scenario_dow.jsonl +54 -0
- redcodegen-0.1.0b0/redcodegen/generator.py +39 -0
- redcodegen-0.1.0b0/redcodegen/kernels/__init__.py +4 -0
- redcodegen-0.1.0b0/redcodegen/kernels/kernel.py +35 -0
- redcodegen-0.1.0b0/redcodegen/kernels/rephrase.py +34 -0
- redcodegen-0.1.0b0/redcodegen/main.py +552 -0
- redcodegen-0.1.0b0/redcodegen/scenarios.py +64 -0
- redcodegen-0.1.0b0/redcodegen/seeds.py +66 -0
- redcodegen-0.1.0b0/redcodegen/uncertainty.py +106 -0
- redcodegen-0.1.0b0/redcodegen/validator.py +214 -0
|
@@ -0,0 +1,131 @@
Metadata-Version: 2.3
Name: redcodegen
Version: 0.1.0b0
Summary: Add your description here
Requires-Dist: click>=8.0.0
Requires-Dist: cwe2>=3.0.0
Requires-Dist: dspy>=3.0.3
Requires-Dist: jsonlines>=4.0.0
Requires-Dist: pandas>=2.3.3
Requires-Dist: python-dotenv>=1.1.1
Requires-Dist: rich>=14.2.0
Requires-Dist: rich-click>=1.9.3
Requires-Dist: scipy>=1.16.3
Requires-Dist: semgrep>=1.86.0
Requires-Python: >=3.11
Description-Content-Type: text/markdown

<h1 align="center">
  <em>RedCodeGen</em>
</h1>

<p align="center">
  <a href="https://pypi.org/project/redcodegen/" target="_blank">
    <img src="https://img.shields.io/pypi/v/redcodegen.svg" alt="PyPI Version">
  </a>
  <a href="https://github.com/sisl/redcodegen/blob/main/LICENSE" target="_blank">
    <img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License">
  </a>
</p>

Automatic generation of *benign* prompts and language model rollouts in Python that exercise specific software vulnerabilities (CWEs) defined in the [MITRE CWE database](https://cwe.mitre.org/).

Developed by the Stanford Intelligent Systems Laboratory (SISL) as a part of [astra-rl](https://github.com/sisl/astra-rl).

## Features

- Generation of realistic coding task prompts that exercise specific CWEs
- Generation of code samples for specific CWEs or the CWE Top 25
- Automatic code evaluation and vulnerability detection via [CodeQL static analysis](https://codeql.github.com/)
- Programmable API for custom scenarios and configurations

## Installation

### CodeQL

**First, you must install CodeQL and have it available in your PATH.**

- macOS users: `brew install codeql`
- Windows/Linux users: follow the instructions [here](https://docs.github.com/en/code-security/codeql-cli/getting-started-with-the-codeql-cli/setting-up-the-codeql-cli)

### RedCodeGen

RedCodeGen is available via PyPI. Install it with pip:

```bash
pip install redcodegen
```

You will also want to create a `.env` file with your API key in your working directory:

```bash
echo "OPENAI_API_KEY=your_openai_api_key" > .env
```

## Quick Start

The most basic usage involves rolling out a language model to generate code samples for specific CWEs and evaluating them with CodeQL.

Suppose you want to roll out 5 samples each to exercise CWE-89 (SQL Injection) and CWE-79 (Cross-Site Scripting):

```bash
redcodegen generate -c 89 -c 79 -n 5 -o results.jsonl
```

You will get a `results.jsonl` file with the generated samples and their evaluations. Each CWE lives on its own line. Let's take a peek!

```bash
head -n 1 results.jsonl | jq .
```

```json
{
  "cwe_id": 89,
  "cwe_name": "SQL Injection",
  "cwe_description": "SQL Injection is a code injection technique that might destroy your database. It is one of the most common web hacking techniques.",
  "timestamp": "2024-06-01T12:00:00Z",
  "model_config": {"model": "openai/gpt-4o-mini"},
  "min_scenarios": 5,
  "samples": [
    {
      "scenario": "A web application that takes user input and constructs SQL queries with proper sanitization.",
      "code": "...generated code here...",
      "evaluation": [
        {
          "rule": "py/sql-injection",
          "message": "...",
          "line": ...
        }
      ]
    },
    ...
  ]
}
```

Importantly, running the above command multiple times (against the same output file) resumes from where you left off, skipping CWEs that are already present in the output file.

## Usage Examples

```bash
redcodegen generate -c 89 -c 79                         # manually specify CWEs
redcodegen generate -n 5                                # specify number of rollouts
redcodegen generate --use-top-25                        # run the CWE Top 25
redcodegen generate --use-top-25 -o results.jsonl       # resume an existing run
redcodegen generate --use-top-25 --model openai/gpt-4o  # switch model
```

You can also run

```bash
redcodegen --help
```

to see all available options.

## Method

RedCodeGen works in three main steps:

1. **Prompt Generation**: for each specified CWE, RedCodeGen generates a realistic coding task prompt that is likely to exercise the vulnerability. It first looks up the CWE description in the MITRE CWE database, then prompts your specified language model to generate a coding task prompt based on that description. This generation step is few-shot prompted with existing human-written prompts from [Pearce, 2021](https://arxiv.org/abs/2108.09293).
2. **Code Generation**: RedCodeGen then rolls out the specified language model on the generated prompt several times with a sampling temperature of 0.8 to produce multiple code samples.
3. **Code Evaluation**: Finally, RedCodeGen evaluates each generated code sample with CodeQL static analysis to detect whether the intended vulnerability is present in the code.

## Acknowledgements

We thank the Schmidt Sciences Foundation's trustworthy AI agenda for supporting this work.
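The per-CWE records in `results.jsonl` can be consumed with nothing but the standard library. Below is a minimal sketch that counts how many samples in one record triggered at least one CodeQL finding; `summarize_record` is a hypothetical helper written for this example, not part of the package, and the field names follow the record schema shown above.

```python
import json

def summarize_record(record):
    # A sample is "flagged" when its evaluation list is non-empty,
    # i.e. CodeQL reported at least one finding for the generated code.
    samples = record.get("samples", [])
    flagged = sum(1 for s in samples if s.get("evaluation"))
    return {"cwe_id": record["cwe_id"], "total": len(samples), "flagged": flagged}

# One JSONL line, trimmed to the fields this sketch uses.
line = '{"cwe_id": 89, "samples": [{"evaluation": [{"rule": "py/sql-injection"}]}, {"evaluation": []}]}'
print(summarize_record(json.loads(line)))
# {'cwe_id': 89, 'total': 2, 'flagged': 1}
```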
@@ -0,0 +1,114 @@
<h1 align="center">
  <em>RedCodeGen</em>
</h1>

<p align="center">
  <a href="https://pypi.org/project/redcodegen/" target="_blank">
    <img src="https://img.shields.io/pypi/v/redcodegen.svg" alt="PyPI Version">
  </a>
  <a href="https://github.com/sisl/redcodegen/blob/main/LICENSE" target="_blank">
    <img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License">
  </a>
</p>

Automatic generation of *benign* prompts and language model rollouts in Python that exercise specific software vulnerabilities (CWEs) defined in the [MITRE CWE database](https://cwe.mitre.org/).

Developed by the Stanford Intelligent Systems Laboratory (SISL) as a part of [astra-rl](https://github.com/sisl/astra-rl).

## Features

- Generation of realistic coding task prompts that exercise specific CWEs
- Generation of code samples for specific CWEs or the CWE Top 25
- Automatic code evaluation and vulnerability detection via [CodeQL static analysis](https://codeql.github.com/)
- Programmable API for custom scenarios and configurations

## Installation

### CodeQL

**First, you must install CodeQL and have it available in your PATH.**

- macOS users: `brew install codeql`
- Windows/Linux users: follow the instructions [here](https://docs.github.com/en/code-security/codeql-cli/getting-started-with-the-codeql-cli/setting-up-the-codeql-cli)

### RedCodeGen

RedCodeGen is available via PyPI. Install it with pip:

```bash
pip install redcodegen
```

You will also want to create a `.env` file with your API key in your working directory:

```bash
echo "OPENAI_API_KEY=your_openai_api_key" > .env
```

## Quick Start

The most basic usage involves rolling out a language model to generate code samples for specific CWEs and evaluating them with CodeQL.

Suppose you want to roll out 5 samples each to exercise CWE-89 (SQL Injection) and CWE-79 (Cross-Site Scripting):

```bash
redcodegen generate -c 89 -c 79 -n 5 -o results.jsonl
```

You will get a `results.jsonl` file with the generated samples and their evaluations. Each CWE lives on its own line. Let's take a peek!

```bash
head -n 1 results.jsonl | jq .
```

```json
{
  "cwe_id": 89,
  "cwe_name": "SQL Injection",
  "cwe_description": "SQL Injection is a code injection technique that might destroy your database. It is one of the most common web hacking techniques.",
  "timestamp": "2024-06-01T12:00:00Z",
  "model_config": {"model": "openai/gpt-4o-mini"},
  "min_scenarios": 5,
  "samples": [
    {
      "scenario": "A web application that takes user input and constructs SQL queries with proper sanitization.",
      "code": "...generated code here...",
      "evaluation": [
        {
          "rule": "py/sql-injection",
          "message": "...",
          "line": ...
        }
      ]
    },
    ...
  ]
}
```

Importantly, running the above command multiple times (against the same output file) resumes from where you left off, skipping CWEs that are already present in the output file.

## Usage Examples

```bash
redcodegen generate -c 89 -c 79                         # manually specify CWEs
redcodegen generate -n 5                                # specify number of rollouts
redcodegen generate --use-top-25                        # run the CWE Top 25
redcodegen generate --use-top-25 -o results.jsonl       # resume an existing run
redcodegen generate --use-top-25 --model openai/gpt-4o  # switch model
```

You can also run

```bash
redcodegen --help
```

to see all available options.

## Method

RedCodeGen works in three main steps:

1. **Prompt Generation**: for each specified CWE, RedCodeGen generates a realistic coding task prompt that is likely to exercise the vulnerability. It first looks up the CWE description in the MITRE CWE database, then prompts your specified language model to generate a coding task prompt based on that description. This generation step is few-shot prompted with existing human-written prompts from [Pearce, 2021](https://arxiv.org/abs/2108.09293).
2. **Code Generation**: RedCodeGen then rolls out the specified language model on the generated prompt several times with a sampling temperature of 0.8 to produce multiple code samples.
3. **Code Evaluation**: Finally, RedCodeGen evaluates each generated code sample with CodeQL static analysis to detect whether the intended vulnerability is present in the code.

## Acknowledgements

We thank the Schmidt Sciences Foundation's trustworthy AI agenda for supporting this work.
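The resume behavior described above only needs a scan of the existing output file for `cwe_id` values. A minimal stdlib-only sketch of that pattern (the `completed_cwes` helper here is illustrative, mirroring the idea rather than the package's internal function):

```python
import json
import tempfile
from pathlib import Path

def completed_cwes(output_path: Path) -> set:
    # Collect the cwe_id of every record already written to a JSONL results file.
    done = set()
    if not output_path.exists():
        return done
    with output_path.open() as f:
        for raw in f:
            raw = raw.strip()
            if raw:
                record = json.loads(raw)
                if "cwe_id" in record:
                    done.add(record["cwe_id"])
    return done

# Demonstrate on a throwaway results file with two finished CWEs.
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "results.jsonl"
    path.write_text('{"cwe_id": 89}\n{"cwe_id": 79}\n')
    remaining = [c for c in [89, 79, 22] if c not in completed_cwes(path)]
    print(remaining)  # [22]
```

Because each CWE is appended as soon as it finishes, an interrupted run loses at most the CWE that was in flight.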
@@ -0,0 +1,40 @@
[project]
name = "redcodegen"
version = "0.1.0-beta.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "click>=8.0.0",
    "cwe2>=3.0.0",
    "dspy>=3.0.3",
    "jsonlines>=4.0.0",
    "pandas>=2.3.3",
    "python-dotenv>=1.1.1",
    "rich>=14.2.0",
    "rich-click>=1.9.3",
    "scipy>=1.16.3",
    "semgrep>=1.86.0",
]

[project.scripts]
redcodegen = "redcodegen.main:main"

[tool.uv]
package = true

[build-system]
requires = ["uv_build>=0.9.5,<0.10.0"]
build-backend = "uv_build"

[dependency-groups]
dev = [
    "ipdb>=0.13.13",
    "seaborn>=0.13.2",
]

[tool.uv.build-backend]
module-name = "redcodegen"
module-root = ""
source-include = ["./redcodegen/data/*"]
@@ -0,0 +1,263 @@
"""
main.py
Main script for generating and evaluating vulnerable code samples
"""

import rich_click as click
import jsonlines
import logging
import dspy
from datetime import datetime
from pathlib import Path
from typing import List, Set, Dict, Any
from cwe2.database import Database

from redcodegen.constants import CWE_TOP_25, create_lm

from rich.logging import RichHandler

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format="%(message)s",
    handlers=[RichHandler(rich_tracebacks=True)]
)
logger = logging.getLogger(__name__)


def load_completed_cwes(output_path: Path) -> Set[int]:
    """Load CWE IDs that have already been processed.

    Args:
        output_path: Path to the output JSONL file

    Returns:
        Set of CWE IDs that are already in the output file
    """
    completed = set()

    if not output_path.exists():
        return completed

    try:
        with jsonlines.open(output_path) as reader:
            for record in reader:
                if 'cwe_id' in record:
                    completed.add(record['cwe_id'])
        logger.info(f"Found {len(completed)} already-completed CWEs in {output_path}")
    except Exception as e:
        logger.warning(f"Could not read existing output file: {e}")

    return completed


def get_model_config() -> Dict[str, Any]:
    """Extract model configuration from current DSPy settings.

    Returns:
        Dict with model configuration info
    """
    lm = dspy.settings.lm
    config = {
        "model": getattr(lm, 'model', 'unknown'),
    }

    return config


def build_record(
    cwe_id: int,
    cwe_name: str,
    cwe_description: str,
    scenarios: List[str],
    codes: List[str],
    evaluations: List[Any],
    errors: List[str],
    min_scenarios: int
) -> Dict[str, Any]:
    """Build a record for JSONL output.

    Args:
        cwe_id: CWE identifier
        cwe_name: CWE name
        cwe_description: CWE description
        scenarios: List of scenario descriptions
        codes: List of generated code samples
        evaluations: List of evaluation results (can contain None for failures)
        errors: List of error messages (None for successful evaluations)
        min_scenarios: Minimum scenarios parameter used

    Returns:
        Dict representing the complete record for this CWE
    """
    samples = []
    for scenario, code, evaluation, error in zip(scenarios, codes, evaluations, errors):
        samples.append({
            "scenario": scenario,
            "code": code,
            "evaluation": evaluation
        })

    return {
        "cwe_id": cwe_id,
        "cwe_name": cwe_name,
        "cwe_description": cwe_description,
        "timestamp": datetime.utcnow().isoformat() + 'Z',
        "model_config": get_model_config(),
        "min_scenarios": min_scenarios,
        "samples": samples
    }


def append_to_jsonl(record: Dict[str, Any], output_path: Path):
    """Append a record to the JSONL file.

    Args:
        record: Record to append
        output_path: Path to output file
    """
    with jsonlines.open(output_path, mode='a') as writer:
        writer.write(record)
    logger.info(f"Saved CWE-{record['cwe_id']} to {output_path}")


@click.command()
@click.option(
    '--cwes', '-c',
    multiple=True,
    type=int,
    help='CWE IDs to process (can specify multiple times, e.g., -c 89 -c 79)'
)
@click.option(
    '--use-top-25',
    is_flag=True,
    help='Process all CWE Top 25'
)
@click.option(
    '--min-samples', '-n',
    default=3,
    type=int,
    help='Minimum samples per CWE (default: 3)'
)
@click.option(
    '--output', '-o',
    default='results.jsonl',
    type=click.Path(),
    help='Output JSONL file (default: results.jsonl)'
)
@click.option(
    '--model', '-m',
    default='openai/gpt-4o-mini',
    help='Model identifier (default: openai/gpt-4o-mini)'
)
@click.option(
    '--api-key',
    default=None,
    help='API key (defaults to OPENAI_API_KEY env var)'
)
def main(cwes, use_top_25, min_samples, output, model, api_key):
    """Generate and evaluate vulnerable code samples for specified CWEs.

    Examples:
        python -m redcodegen -c 89 -c 79                        # manually specify cwe
        python -m redcodegen -n 5                               # specify number of rollouts
        python -m redcodegen --use-top-25                       # run CWE top 25
        python -m redcodegen --use-top-25 -o results.jsonl      # resume existing run
        python -m redcodegen --use-top-25 --model openai/gpt-4o # switch model
    """
    # Configure DSPy with specified model
    lm = create_lm(model_name=model, api_key=api_key)
    dspy.configure(lm=lm)
    logger.info(f"Configured model: {model}")

    # Import generator and validator after configuring dspy
    from redcodegen.generator import run_cwe
    from redcodegen.validator import evaluate

    output_path = Path(output)

    # Determine which CWEs to process
    if use_top_25:
        cwes_to_process = CWE_TOP_25
        logger.info(f"Processing CWE Top 25 ({len(cwes_to_process)} CWEs)")
    elif cwes:
        cwes_to_process = list(cwes)
        logger.info(f"Processing {len(cwes_to_process)} specified CWEs")
    else:
        logger.error("Must specify either --cwes or --use-top-25")
        raise click.UsageError("Must specify either --cwes or --use-top-25")

    # Load already-completed CWEs for idempotency
    completed_cwes = load_completed_cwes(output_path)
    cwes_to_process = [cwe for cwe in cwes_to_process if cwe not in completed_cwes]

    if not cwes_to_process:
        logger.info("All CWEs already completed!")
        return

    logger.info(f"Processing {len(cwes_to_process)} CWEs (skipped {len(completed_cwes)} already completed)")

    # Initialize CWE database
    db = Database()

    # Process each CWE
    for idx, cwe_id in enumerate(cwes_to_process, 1):
        logger.info(f"[{idx}/{len(cwes_to_process)}] Processing CWE-{cwe_id}...")

        try:
            # Get CWE metadata
            entry = db.get(cwe_id)
            cwe_name = entry.name
            cwe_description = entry.extended_description or entry.description

            # Generate code samples
            logger.info(f"  Generating {min_samples} code samples...")
            codes = run_cwe(cwe_id, min_scenarios=min_samples)
            logger.info(f"  Generated {len(codes)} code samples")

            # Get scenarios (need to call generate again to get scenarios)
            from redcodegen.scenarios import generate
            scenario_data = generate(cwe_id, min_scenarios=min_samples)
            scenarios = scenario_data["scenarios"][:len(codes)]  # Match code count

            # Evaluate each code sample
            evaluations = []
            errors = []

            for i, code in enumerate(codes, 1):
                logger.info(f"  Evaluating sample {i}/{len(codes)}...")
                try:
                    evaluation = evaluate(code)
                    evaluations.append(evaluation)
                    errors.append(None)
                    logger.info(f"    Found {len(evaluation)} vulnerabilities")
                except Exception as e:
                    logger.warning(f"    Evaluation failed: {e}")
                    evaluations.append(None)
                    errors.append(str(e))

            # Build and save record
            record = build_record(
                cwe_id=cwe_id,
                cwe_name=cwe_name,
                cwe_description=cwe_description,
                scenarios=scenarios,
                codes=codes,
                evaluations=evaluations,
                errors=errors,
                min_scenarios=min_samples
            )

            append_to_jsonl(record, output_path)
            logger.info(f"✓ Completed CWE-{cwe_id}")

        except Exception as e:
            logger.error(f"✗ Failed to process CWE-{cwe_id}: {e}")
            continue

    logger.info(f"Completed! Results saved to {output_path}")


if __name__ == '__main__':
    main()
@@ -0,0 +1,17 @@
import dspy
from cwe2.database import Database

from redcodegen.constants import create_lm

dspy.configure(lm=create_lm())


db = Database()
entry = db.get(502)

print(entry.name)
print(entry.extended_description)
File without changes
@@ -0,0 +1,68 @@
import os
import dspy

from dotenv import load_dotenv
load_dotenv()

CWE_TOP_25 = [
    79, 787, 89, 352, 22, 125, 78,
    416, 862, 434, 94, 20, 77, 287,
    269, 502, 200, 863, 918, 119, 476,
    798, 190, 400, 306
]

def create_lm(model_name="openai/gpt-4o-mini", temperature=0.8, api_key=None, api_base=None):
    """Create a DSPy language model instance.

    Args:
        model_name: Model identifier (e.g., "openai/gpt-4o-mini")
        temperature: Sampling temperature (default: 0.8)
        api_key: API key (defaults to OPENAI_API_KEY env var)
        api_base: API base URL (defaults to OPENAI_API_BASE env var)

    Returns:
        dspy.LM instance
    """
    if api_key is None:
        api_key = os.environ.get("OPENAI_API_KEY")
    if api_base is None:
        api_base = os.environ.get("OPENAI_API_BASE")

    kwargs = {"api_key": api_key, "temperature": temperature, "max_tokens": 16000}
    if api_base is not None:
        kwargs["api_base"] = api_base
    return dspy.LM(model_name, **kwargs)

SCENARIO_EXAMPLES = [
    dspy.Example(
        name="Deserialization of Untrusted Data",
        description="It is often convenient to serialize objects for communication or to save them for later use. However, deserialized data or code can often be modified without using the provided accessor functions if it does not use cryptography to protect itself. Furthermore, any cryptography would still be client-side security -- which is a dangerous security assumption. Data that is untrusted can not be trusted to be well-formed. When developers place no restrictions on gadget chains, or series of instances and method invocations that can self-execute during the deserialization process (i.e., before the object is returned to the caller), it is sometimes possible for attackers to leverage them to perform unauthorized actions, like generating a shell.",
        scenarios=[
        ]
    )
]

CODEQL_LIBRARIES = [
    "Aioch", "Aiofile", "Aiofiles", "Aiohttp", "Aiomysql", "Aiopg", "Aiosqlite",
    "Airspeed", "Anyio", "Asyncpg", "BSon", "Baize", "Bottle",
    "CassandraDriver", "Chameleon", "Cherrypy", "Chevron", "ClickhouseDriver", "Cryptodome",
    "Cryptography", "Cx_Oracle", "Dill", "Django", "Fabric", "FastApi", "Flask",
    "FlaskAdmin", "FlaskSqlAlchemy", "Genshi", "Gradio", "Hdbcli", "Httpx",
    "Idna", "Invoke", "Jinja2", "Jmespath", "Joblib", "JsonPickle", "Ldap",
    "Ldap3", "Libtaxii", "Libxml2", "Lxml", "Mako", "MarkupSafe",
    "Multidict", "MySQLdb", "Mysql", "Numpy", "Opml", "Oracledb", "PEP249", "Pandas",
    "Paramiko", "Peewee", "Pexpect", "Phoenixdb", "Psycopg", "Psycopg2", "PyMongo",
    "PyMySQL", "Pycurl", "Pydantic", "Pymssql", "Pyodbc", "Pyramid", "Requests", "RestFramework",
    "Rsa", "RuamelYaml", "Sanic", "ServerLess", "Setuptools", "Simplejson", "SqlAlchemy",
    "Starlette", "Stdlib", "Streamlit", "TRender", "Toml", "Torch", "Tornado",
    "Twisted", "Ujson", "Urllib3", "Werkzeug", "Xmltodict", "Yaml", "Yarl"
]
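The credential resolution in `create_lm` follows a common pattern: an explicit argument wins, otherwise the value falls back to the environment (which `python-dotenv` has populated from `.env`). A stdlib-only sketch of that resolution order; `resolve_api_key` is an illustrative helper, not part of the package:

```python
import os

def resolve_api_key(explicit=None):
    # Mirror create_lm's lookup order: explicit argument first, then environment.
    if explicit is not None:
        return explicit
    return os.environ.get("OPENAI_API_KEY")

os.environ["OPENAI_API_KEY"] = "env-key"
print(resolve_api_key())           # env-key
print(resolve_api_key("cli-key"))  # cli-key
```

The same order applies to `api_base`, which lets the package target OpenAI-compatible endpoints without code changes.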