reait 0.0.14__tar.gz → 0.0.15__tar.gz
- {reait-0.0.14 → reait-0.0.15}/PKG-INFO +38 -8
- {reait-0.0.14 → reait-0.0.15}/README.md +36 -7
- {reait-0.0.14 → reait-0.0.15}/pyproject.toml +2 -2
- reait-0.0.15/reait/__init__.py +0 -0
- reait-0.0.14/reait → reait-0.0.15/reait/__main__.py +118 -19
- {reait-0.0.14 → reait-0.0.15}/reait.egg-info/PKG-INFO +38 -8
- {reait-0.0.14 → reait-0.0.15}/reait.egg-info/SOURCES.txt +5 -2
- reait-0.0.15/reait.egg-info/entry_points.txt +2 -0
- reait-0.0.15/reait.egg-info/top_level.txt +2 -0
- {reait-0.0.14 → reait-0.0.15}/setup.py +8 -3
- reait-0.0.15/tests/__init__.py +0 -0
- reait-0.0.14/reait.egg-info/top_level.txt +0 -1
- {reait-0.0.14 → reait-0.0.15}/LICENSE +0 -0
- {reait-0.0.14 → reait-0.0.15}/reait.egg-info/dependency_links.txt +0 -0
- {reait-0.0.14 → reait-0.0.15}/reait.egg-info/requires.txt +0 -0
- {reait-0.0.14 → reait-0.0.15}/setup.cfg +0 -0
@@ -1,7 +1,8 @@
|
|
1
1
|
Metadata-Version: 2.1
|
2
2
|
Name: reait
|
3
|
-
Version: 0.0.
|
3
|
+
Version: 0.0.15
|
4
4
|
Home-page: https://github.com/RevEng-AI/reait
|
5
|
+
Author: James Patrick-Evans
|
5
6
|
Project-URL: Homepage, https://github.com/RevEng-AI/reait
|
6
7
|
Project-URL: Bug Tracker, https://github.com/RevEng-AI/reait/issues
|
7
8
|
Project-URL: Organisation Homepage, https://reveng.ai
|
@@ -14,9 +15,10 @@ Description-Content-Type: text/markdown
|
|
14
15
|
License-File: LICENSE
|
15
16
|
|
16
17
|
# reait
|
17
|
-
RevEng.AI Toolkit
|
18
18
|
|
19
|
-
|
19
|
+
## <ins>R</ins>ev<ins>E</ins>ng.<ins>AI</ins> <ins>T</ins>oolkit
|
20
|
+
|
21
|
+
Analyse compiled executable binaries using the RevEng.AI API. This tool allows you to search for similar components across different compiled executable programs, identify known vulnerabilities in stripped executables, and generate "YARA-like" AI signatures for entire binary files. More details about the API can be found at [docs.reveng.ai](https://docs.reveng.ai).
|
20
22
|
|
21
23
|
NB: We are in Alpha. We support GNU/Linux ELF and Windows PE executables for x86_64, and focus our support for x86_64 Linux ELF executables.
|
22
24
|
|
@@ -49,26 +51,53 @@ Once an analysis is complete, you may access RevEng.AI's BinNet embeddings for a
|
|
49
51
|
`reait -b /usr/bin/true -x | jq ".[] | select(.vaddr==$((0x19f0))).embedding" > embedding.json`
|
50
52
|
|
51
53
|
|
52
|
-
### Search for similar symbols
|
53
|
-
To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of
|
54
|
+
### Search for similar symbols using an embedding
|
55
|
+
To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of symbols with their names, distance (similarity), RevEng.AI collection set, source code filename, source code line number, and file creation timestamp is returned.
|
54
56
|
|
55
57
|
`reait -e embedding.json -n`
|
56
58
|
|
57
59
|
NB: A smaller distance indicates a higher degree of similarity.
|
58
60
|
|
59
|
-
####
|
60
|
-
To search for the most similar symbols found in a
|
61
|
+
#### Specific Search
|
62
|
+
To search for the most similar symbols found in a specific binary, use the `--found-in` option with a path to the executable to search from.
|
61
63
|
|
62
64
|
`reait -n --embedding /tmp/sha256_init.json --found-in ~/malware.exe --nns 5`
|
63
65
|
|
64
66
|
This downloads embeddings from `malware.exe` and computes the cosine similarity between all symbols and `sha256_init.json`. The returned results lists the most similar symbol locations by cosine similarity score (1.0 most similar, -1.0 dissimilar).
|
65
67
|
|
68
|
+
The `--from-file` option may also be used to limit the search to a custom file containing a JSON list of embeddings.
|
69
|
+
|
70
|
+
|
71
|
+
#### Limited Search
|
72
|
+
To search for most similar symbols from a set of RevEng.AI collections, use the `--collections` options with a RegEx to match collection names. For example:
|
73
|
+
|
74
|
+
`reait -n --embedding my_func.json --collections "(libc.*|lib.*crypt.*)"`
|
75
|
+
|
76
|
+
RevEng.AI collections are sets of pre-analysed executable objects. To create custom collection sets e.g., malware collections, please create a RevEng.AI account.
|
77
|
+
|
78
|
+
### RevEng.AI embedding models
|
79
|
+
To use specific RevEng.AI AI models, or for training custom models, use `-m` to specify the model. The default option is to use the latest development model. Available models are `binnet-0.1` and `dexter`.
|
80
|
+
|
81
|
+
`reait -b /usr/bin/true -m dexter -a`
|
82
|
+
|
83
|
+
### Software Composition Analysis
|
84
|
+
To identify known open source software components embedded inside a binary, use the `-C` flag.
|
85
|
+
|
86
|
+
#### Stripped Binary CVE Checker
|
87
|
+
To check for known vulnerabilities found with embedded software components, use `-c` or `--cves`.
|
88
|
+
|
89
|
+
|
90
|
+
### RevEng.AI Binary Signature
|
91
|
+
To generate an AI functional description of an entire binary file, use the `-S` flag. NB: Under development.
|
92
|
+
|
93
|
+
|
66
94
|
### Binary embedding
|
67
|
-
Produce a
|
95
|
+
Produce a dumb fingerprint for the whole binary by calculating the arithmetic mean of all symbol embeddings.
|
68
96
|
|
69
97
|
`reait -b /usr/bin/true -s`
|
70
98
|
|
71
99
|
|
100
|
+
|
72
101
|
## Configuration
|
73
102
|
|
74
103
|
`reait` reads the config file stored at `~/.reait.toml`. An example config file looks like:
|
@@ -76,6 +105,7 @@ Produce a smart fingerprint for the whole binary by calculating the arithmetic m
|
|
76
105
|
```
|
77
106
|
apikey = "l1br3"
|
78
107
|
host = "https://api.reveng.ai"
|
108
|
+
model = "binnet-0.1"
|
79
109
|
```
|
80
110
|
|
81
111
|
## Contact
|
@@ -1,7 +1,8 @@
|
|
1
1
|
# reait
|
2
|
-
RevEng.AI Toolkit
|
3
2
|
|
4
|
-
|
3
|
+
## <ins>R</ins>ev<ins>E</ins>ng.<ins>AI</ins> <ins>T</ins>oolkit
|
4
|
+
|
5
|
+
Analyse compiled executable binaries using the RevEng.AI API. This tool allows you to search for similar components across different compiled executable programs, identify known vulnerabilities in stripped executables, and generate "YARA-like" AI signatures for entire binary files. More details about the API can be found at [docs.reveng.ai](https://docs.reveng.ai).
|
5
6
|
|
6
7
|
NB: We are in Alpha. We support GNU/Linux ELF and Windows PE executables for x86_64, and focus our support for x86_64 Linux ELF executables.
|
7
8
|
|
@@ -34,26 +35,53 @@ Once an analysis is complete, you may access RevEng.AI's BinNet embeddings for a
|
|
34
35
|
`reait -b /usr/bin/true -x | jq ".[] | select(.vaddr==$((0x19f0))).embedding" > embedding.json`
|
35
36
|
|
36
37
|
|
37
|
-
### Search for similar symbols
|
38
|
-
To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of
|
38
|
+
### Search for similar symbols using an embedding
|
39
|
+
To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of symbols with their names, distance (similarity), RevEng.AI collection set, source code filename, source code line number, and file creation timestamp is returned.
|
39
40
|
|
40
41
|
`reait -e embedding.json -n`
|
41
42
|
|
42
43
|
NB: A smaller distance indicates a higher degree of similarity.
|
43
44
|
|
44
|
-
####
|
45
|
-
To search for the most similar symbols found in a
|
45
|
+
#### Specific Search
|
46
|
+
To search for the most similar symbols found in a specific binary, use the `--found-in` option with a path to the executable to search from.
|
46
47
|
|
47
48
|
`reait -n --embedding /tmp/sha256_init.json --found-in ~/malware.exe --nns 5`
|
48
49
|
|
49
50
|
This downloads embeddings from `malware.exe` and computes the cosine similarity between all symbols and `sha256_init.json`. The returned results lists the most similar symbol locations by cosine similarity score (1.0 most similar, -1.0 dissimilar).
|
50
51
|
|
52
|
+
The `--from-file` option may also be used to limit the search to a custom file containing a JSON list of embeddings.
|
53
|
+
|
54
|
+
|
55
|
+
#### Limited Search
|
56
|
+
To search for most similar symbols from a set of RevEng.AI collections, use the `--collections` options with a RegEx to match collection names. For example:
|
57
|
+
|
58
|
+
`reait -n --embedding my_func.json --collections "(libc.*|lib.*crypt.*)"`
|
59
|
+
|
60
|
+
RevEng.AI collections are sets of pre-analysed executable objects. To create custom collection sets e.g., malware collections, please create a RevEng.AI account.
|
61
|
+
|
62
|
+
### RevEng.AI embedding models
|
63
|
+
To use specific RevEng.AI AI models, or for training custom models, use `-m` to specify the model. The default option is to use the latest development model. Available models are `binnet-0.1` and `dexter`.
|
64
|
+
|
65
|
+
`reait -b /usr/bin/true -m dexter -a`
|
66
|
+
|
67
|
+
### Software Composition Analysis
|
68
|
+
To identify known open source software components embedded inside a binary, use the `-C` flag.
|
69
|
+
|
70
|
+
#### Stripped Binary CVE Checker
|
71
|
+
To check for known vulnerabilities found with embedded software components, use `-c` or `--cves`.
|
72
|
+
|
73
|
+
|
74
|
+
### RevEng.AI Binary Signature
|
75
|
+
To generate an AI functional description of an entire binary file, use the `-S` flag. NB: Under development.
|
76
|
+
|
77
|
+
|
51
78
|
### Binary embedding
|
52
|
-
Produce a
|
79
|
+
Produce a dumb fingerprint for the whole binary by calculating the arithmetic mean of all symbol embeddings.
|
53
80
|
|
54
81
|
`reait -b /usr/bin/true -s`
|
55
82
|
|
56
83
|
|
84
|
+
|
57
85
|
## Configuration
|
58
86
|
|
59
87
|
`reait` reads the config file stored at `~/.reait.toml`. An example config file looks like:
|
@@ -61,6 +89,7 @@ Produce a smart fingerprint for the whole binary by calculating the arithmetic m
|
|
61
89
|
```
|
62
90
|
apikey = "l1br3"
|
63
91
|
host = "https://api.reveng.ai"
|
92
|
+
model = "binnet-0.1"
|
64
93
|
```
|
65
94
|
|
66
95
|
## Contact
|
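Reviewer note: the README's "Binary embedding" section describes the `-s` fingerprint as the arithmetic mean of all symbol embeddings. The sketch below shows that element-wise mean in plain Python. The helper name `mean_embedding` and the toy vectors are hypothetical; this diff does not show the package's own summary code, which imports `mean` from NumPy.

```python
def mean_embedding(embeddings):
    """Element-wise arithmetic mean of equal-length embedding vectors."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(vec) != dim for vec in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(embeddings)
    # zip(*embeddings) groups the i-th component of every vector together
    return [sum(column) / count for column in zip(*embeddings)]


# Hypothetical two-symbol binary with 2-dimensional embeddings
fingerprint = mean_embedding([[1.0, 2.0], [3.0, 4.0]])
```

Averaging discards which symbol contributed what, which is presumably why the 0.0.15 README calls the result a "dumb" fingerprint.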
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
|
|
4
4
|
|
5
5
|
[project]
|
6
6
|
name = "reait"
|
7
|
-
version = "0.0.
|
7
|
+
version = "0.0.15"
|
8
8
|
readme = "README.md"
|
9
9
|
classifiers=[
|
10
10
|
"Programming Language :: Python :: 3",
|
@@ -19,7 +19,7 @@ dependencies = [
|
|
19
19
|
"tomli",
|
20
20
|
"scikit-learn",
|
21
21
|
"pandas",
|
22
|
-
"numpy"
|
22
|
+
"numpy",
|
23
23
|
]
|
24
24
|
keywords = ["reverse", "engineering", "reveng.ai", "reveng", "machine", "learning", "binary", "analysis", "ml", "ai", "vector", "embedding"]
|
25
25
|
|
File without changes
|
@@ -7,6 +7,7 @@ from hashlib import sha256
|
|
7
7
|
from rich import print_json, print as rich_print
|
8
8
|
from sklearn.metrics.pairwise import cosine_similarity
|
9
9
|
import os
|
10
|
+
import re
|
10
11
|
import argparse
|
11
12
|
import requests
|
12
13
|
from numpy import array, vstack, mean
|
@@ -17,11 +18,12 @@ from os.path import isfile
|
|
17
18
|
from sys import exit
|
18
19
|
from IPython import embed
|
19
20
|
|
20
|
-
__version__ = "0.0.
|
21
|
+
__version__ = "0.0.15"
|
21
22
|
|
22
23
|
re_conf = {
|
23
24
|
'apikey' : 'l1br3',
|
24
|
-
'host' : 'https://api.reveng.ai'
|
25
|
+
'host' : 'https://api.reveng.ai',
|
26
|
+
'model': 'binnet-0.1'
|
25
27
|
}
|
26
28
|
|
27
29
|
def reveng_req(r: requests.request, end_point: str, data=None, ex_headers: dict = None, params=None):
|
@@ -47,11 +49,14 @@ def RE_delete(fpath: str):
|
|
47
49
|
return
|
48
50
|
|
49
51
|
|
50
|
-
def RE_analyse(fpath: str):
|
52
|
+
def RE_analyse(fpath: str, model: str = None):
|
51
53
|
"""
|
52
54
|
Start analysis job for binary file
|
53
55
|
"""
|
54
|
-
|
56
|
+
params={}
|
57
|
+
if model:
|
58
|
+
params['model'] = model
|
59
|
+
res = reveng_req(requests.post, f"analyse", data=open(fpath, 'rb').read(), params=params)
|
55
60
|
if res.status_code == 200:
|
56
61
|
print("[+] Successfully submitted binary for analysis.")
|
57
62
|
print(f"[+] {fpath} - {binary_id(fpath)}")
|
@@ -65,6 +70,24 @@ def RE_analyse(fpath: str):
|
|
65
70
|
res.raise_for_status()
|
66
71
|
|
67
72
|
|
73
|
+
def RE_upload(fpath: str):
|
74
|
+
"""
|
75
|
+
Upload binary to Server
|
76
|
+
"""
|
77
|
+
res = reveng_req(requests.post, f"upload", data=open(fpath, 'rb').read())
|
78
|
+
if res.status_code == 200:
|
79
|
+
print("[+] Successfully uploaded binary to your account.")
|
80
|
+
print(f"[+] {fpath} - {binary_id(fpath)}")
|
81
|
+
return res
|
82
|
+
|
83
|
+
if res.status_code == 400:
|
84
|
+
if 'already exists' in json.loads(res.text)['reason']:
|
85
|
+
print(f"[-] {fpath} already exists. Please check the results log file for {binary_id(fpath)}")
|
86
|
+
return True
|
87
|
+
|
88
|
+
res.raise_for_status()
|
89
|
+
|
90
|
+
|
68
91
|
def RE_embeddings(fpath: str):
|
69
92
|
"""
|
70
93
|
Fetch symbol embeddings
|
@@ -94,11 +117,33 @@ def RE_logs(fpath: str):
|
|
94
117
|
res.raise_for_status()
|
95
118
|
|
96
119
|
|
97
|
-
def
|
120
|
+
def RE_cves(fpath: str):
|
121
|
+
"""
|
122
|
+
Check for known CVEs in Binary
|
123
|
+
"""
|
124
|
+
bin_id = binary_id(fpath)
|
125
|
+
res = reveng_req(requests.get, f"/cves/{bin_id}")
|
126
|
+
if res.status_code == 200:
|
127
|
+
cves = json.loads(res.text)
|
128
|
+
rich_print(f"[bold blue]Checking for known CVEs embedded inside [/bold blue] [bold bright_green]{fpath}[/bold bright_green]:")
|
129
|
+
if len(cves) == 0:
|
130
|
+
rich_print(f"[bold bright_green]0 CVEs found.[/bold bright_green]")
|
131
|
+
else:
|
132
|
+
rich_print(f"[bold red]Warning CVEs found![/bold red]")
|
133
|
+
print_json(data=cves)
|
134
|
+
return
|
135
|
+
elif res.status_code == 404:
|
136
|
+
print(f"[!] Error, binary analysis for {bin_id} not found.")
|
137
|
+
return
|
138
|
+
|
139
|
+
res.raise_for_status()
|
140
|
+
|
141
|
+
|
142
|
+
#def RE_compute_distance(embedding: list, fpath_source: str, nns: int = 5):
|
143
|
+
def RE_compute_distance(embedding: list, embeddings: list, nns: int = 5):
|
98
144
|
"""
|
99
|
-
|
145
|
+
Compute the cosine distance between source embedding and embeddings from binary
|
100
146
|
"""
|
101
|
-
embeddings = RE_embeddings(fpath_source)
|
102
147
|
df = DataFrame(data=embeddings)
|
103
148
|
np_embedding = array(embedding).reshape(1, -1)
|
104
149
|
source_embeddings = vstack(df['embedding'].values)
|
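Reviewer note: in 0.0.15 `RE_compute_distance` receives a list of embedding records instead of a file path, so the same ranking can serve both `--found-in` and `--from-file`. The following is a minimal sketch of ranking by cosine similarity, assuming each record is a dict with an `embedding` key (as the `df['embedding']` access above suggests) and a `name` key (an assumption). It uses only the standard library rather than the scikit-learn and pandas calls the package imports.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(embedding, embeddings, nns=5):
    """Return the nns records most similar to `embedding`, best first."""
    scored = [
        {"name": rec["name"],
         "similarity": cosine_similarity(embedding, rec["embedding"])}
        for rec in embeddings
    ]
    scored.sort(key=lambda rec: rec["similarity"], reverse=True)
    return scored[:nns]


# Hypothetical records; real embeddings have many more dimensions
symbols = [
    {"name": "sha256_init", "embedding": [1.0, 0.0, 0.0]},
    {"name": "memcpy", "embedding": [0.0, 1.0, 0.0]},
    {"name": "sha256_update", "embedding": [0.9, 0.1, 0.0]},
]
top = rank_by_similarity([1.0, 0.0, 0.0], symbols, nns=2)
```

A score of 1.0 means the vectors point in the same direction, matching the README's description of the `--found-in` output.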
@@ -112,15 +157,18 @@ def RE_compute_distance(embedding: list, fpath_source: str, nns: int = 5):
|
|
112
157
|
return json_sims
|
113
158
|
|
114
159
|
|
115
|
-
def RE_nearest_symbols(embedding: list, nns: int = 5):
|
160
|
+
def RE_nearest_symbols(embedding: list, nns: int = 5, collections : list = None):
|
116
161
|
"""
|
117
162
|
Get function name suggestions for an embedding
|
163
|
+
:param embedding: embedding vector as python list
|
118
164
|
:param nns: Number of nearest neighbors
|
119
|
-
:param
|
165
|
+
:param collections: str RegEx to search through RevEng.AI collections
|
120
166
|
"""
|
121
167
|
params={'nns': nns}
|
122
|
-
|
123
|
-
|
168
|
+
|
169
|
+
if collections:
|
170
|
+
params['collections'] = collections
|
171
|
+
|
124
172
|
res = reveng_req(requests.post, "ann", data=json.dumps(embedding), params=params)
|
125
173
|
res.raise_for_status()
|
126
174
|
f_suggestions = res.json()
|
@@ -155,20 +203,35 @@ def version():
|
|
155
203
|
rich_print(f"[bold red]reait[/bold red] [bold bright_green]v{__version__}[/bold bright_green]")
|
156
204
|
print_json(data=re_conf)
|
157
205
|
|
158
|
-
|
206
|
+
|
207
|
+
def main() -> None:
|
208
|
+
"""
|
209
|
+
Tool entry
|
210
|
+
"""
|
159
211
|
parse_config()
|
160
212
|
parser = argparse.ArgumentParser(add_help=False)
|
161
|
-
parser.add_argument("-b", "--binary", default="", help="Path
|
162
|
-
parser.add_argument("-a", "--analyse", action='store_true', help="
|
213
|
+
parser.add_argument("-b", "--binary", default="", help="Path of binary to analyse")
|
214
|
+
parser.add_argument("-a", "--analyse", action='store_true', help="Perform a full analysis and generate embeddings for every symbol")
|
215
|
+
parser.add_argument("--no-embeddings", action='store_true', help="Only perform binary analysis. Do not generate embeddings for symbols")
|
216
|
+
parser.add_argument("--base-address", help="Image base of the executable image to map for remote analysis")
|
217
|
+
parser.add_argument("-A", action='store_true', help="Upload and Analyse a new binary")
|
218
|
+
parser.add_argument("-u", "--upload", action='store_true', help="Upload a new binary to remote server")
|
163
219
|
parser.add_argument("-n", "--ann", action='store_true', help="Fetch Approximate Nearest Neighbours (ANNs) for embedding")
|
164
220
|
parser.add_argument("--embedding", help="Path of JSON file containing a BinNet embedding")
|
165
221
|
parser.add_argument("--nns", default="5", help="Number of approximate nearest neighbors to fetch")
|
222
|
+
parser.add_argument("--collections", default=None, help="Regex string to select RevEng.AI collections for filtering e.g., libc")
|
166
223
|
parser.add_argument("--found-in", help="ANN flag to limit to embeddings returned to those found in specific binary")
|
167
|
-
|
224
|
+
parser.add_argument("--from-file", help="ANN flag to limit to embeddings returned to those found in JSON embeddings file")
|
225
|
+
parser.add_argument("-c", "--cves", action="store_true", help="Check for CVEs found inside binary")
|
226
|
+
parser.add_argument("-C", "--sca", action="store_true", help="Perform Software Composition Analysis to identify common libraries embedded in binary")
|
227
|
+
parser.add_argument("-m", "--model", default="binnet-0.1", help="AI model used to generate embeddings")
|
168
228
|
parser.add_argument("-x", "--extract", action='store_true', help="Fetch embeddings for binary")
|
229
|
+
parser.add_argument("--start-address", help="Start vaddr of the function to extract embeddings")
|
230
|
+
parser.add_argument("--end-address", help="End vaddr of the function to extract embeddings")
|
169
231
|
parser.add_argument("-s", "--summary", action='store_true', help="Average symbol embeddings in binary")
|
232
|
+
parser.add_argument("-S", "--signature", action='store_true', help="Generate a RevEng.AI binary signature")
|
170
233
|
parser.add_argument("-l", "--logs", action='store_true', help="Fetch analysis log file for binary")
|
171
|
-
parser.add_argument("-d", "--delete", action='store_true', help="
|
234
|
+
parser.add_argument("-d", "--delete", action='store_true', help="Delete all metadata associated with binary")
|
172
235
|
parser.add_argument("-k", "--apikey", help="RevEng.AI API key")
|
173
236
|
parser.add_argument("-h", "--host", help="Analysis Host (https://api.reveng.ai)")
|
174
237
|
parser.add_argument("-v", "--version", action="store_true", help="Display version information")
|
@@ -179,19 +242,29 @@ if __name__ == '__main__':
|
|
179
242
|
re_conf['apikey'] = args.apikey
|
180
243
|
if args.host:
|
181
244
|
re_conf['host'] = args.host
|
245
|
+
if args.model:
|
246
|
+
re_conf['model'] = args.model
|
182
247
|
|
183
248
|
# display version and exit
|
184
249
|
if args.version:
|
185
250
|
version()
|
186
251
|
exit(0)
|
187
252
|
|
188
|
-
if args.analyse or args.extract or args.logs or args.delete or args.summary:
|
253
|
+
if args.A or args.analyse or args.extract or args.logs or args.delete or args.summary or args.upload:
|
189
254
|
# verify binary is a file
|
190
255
|
if not os.path.isfile(args.binary):
|
191
256
|
print("[!] Error, please supply a valid binary file using '-b'.")
|
192
257
|
parser.print_help()
|
193
258
|
exit(-1)
|
194
259
|
|
260
|
+
if args.upload:
|
261
|
+
# upload binary first, then carry out actions
|
262
|
+
print(f"[!] RE:upload not implemented. Use analyse.")
|
263
|
+
exit(-1)
|
264
|
+
|
265
|
+
if args.A:
|
266
|
+
RE_analyse(args.binary)
|
267
|
+
|
195
268
|
if args.analyse:
|
196
269
|
RE_analyse(args.binary)
|
197
270
|
|
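Reviewer note: the hunk above layers settings in the order built-in defaults, config file, command-line flags. Because `-m` is declared with `default="binnet-0.1"`, `args.model` is always truthy, so the command-line default may override a `model` value loaded from `~/.reait.toml` (assuming `parse_config`, which this diff does not show, loads that key). The sketch below illustrates one way to keep the precedence explicit by treating unset flags as `None`. The `merge_config` helper is hypothetical and not part of the package.

```python
DEFAULTS = {
    "apikey": "l1br3",
    "host": "https://api.reveng.ai",
    "model": "binnet-0.1",
}


def merge_config(defaults, file_conf, cli_args):
    """Merge settings: defaults < config file < explicitly set CLI flags."""
    conf = dict(defaults)
    # Only accept keys we know about from the config file
    conf.update({k: v for k, v in file_conf.items() if k in defaults})
    # A CLI value of None means "flag not given", so it must not override
    conf.update({k: v for k, v in cli_args.items()
                 if k in defaults and v is not None})
    return conf


# Config file selects dexter and no -m flag is given: the file value survives
from_file = merge_config(DEFAULTS, {"model": "dexter"}, {"model": None})
# An explicit -m on the command line still wins
from_cli = merge_config(DEFAULTS, {"model": "dexter"}, {"model": "binnet-0.1"})
```

With argparse this would correspond to declaring `-m` with `default=None` and letting the defaults dictionary supply the fallback.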
@@ -215,15 +288,32 @@ if __name__ == '__main__':
|
|
215
288
|
|
216
289
|
embedding = json.loads(open(args.embedding, 'r').read())
|
217
290
|
|
291
|
+
# check for valid regex
|
292
|
+
if args.collections:
|
293
|
+
try:
|
294
|
+
re.compile(args.collections)
|
295
|
+
except re.error as e:
|
296
|
+
print(f"[!] Error, invalid regex for collections - {args.collections}")
|
297
|
+
exit(-1)
|
298
|
+
|
218
299
|
if args.found_in:
|
219
300
|
if not os.path.isfile(args.found_in):
|
220
301
|
print("[!] Error, --found-in flag requires a path to a binary to search from")
|
221
302
|
exit(-1)
|
222
303
|
print(f"[+] Searching for symbols similar to embedding in binary {args.found_in}")
|
223
|
-
|
304
|
+
embeddings = RE_embeddings(args.found_in)
|
305
|
+
res = RE_compute_distance(embedding, embeddings, int(args.nns))
|
306
|
+
print_json(data=res)
|
307
|
+
elif args.from_file:
|
308
|
+
if not os.path.isfile(args.from_file):
|
309
|
+
print("[!] Error, --from-file flag requires a path to a JSON embeddings file")
|
310
|
+
exit(-1)
|
311
|
+
print(f"[+] Searching for symbols similar to embedding in binary {args.from_file}")
|
312
|
+
res = RE_compute_distance(embedding, json.load(open(args.from_file, "r")), int(args.nns))
|
224
313
|
print_json(data=res)
|
225
314
|
else:
|
226
|
-
|
315
|
+
print(f"[+] Searching for similar symbols to embedding in {'all' if not args.collections else args.collections} collections.")
|
316
|
+
RE_nearest_symbols(embedding, int(args.nns), collections=args.collections)
|
227
317
|
|
228
318
|
elif args.logs:
|
229
319
|
RE_logs(args.binary)
|
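Reviewer note: the `--collections` value is compiled locally before any request is sent, so a malformed pattern fails fast with a readable message. The check can be isolated as below. The function name `is_valid_collections_regex` is hypothetical; the `re.compile` and `re.error` usage mirrors the hunk above.

```python
import re


def is_valid_collections_regex(pattern: str) -> bool:
    """Return True if `pattern` compiles as a Python regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


ok = is_valid_collections_regex("(libc.*|lib.*crypt.*)")  # README example
bad = is_valid_collections_regex("(libc.*")               # unbalanced parenthesis
```

This only confirms that Python's `re` module accepts the pattern. Whether the server interprets the expression with the same regex dialect is not visible in this diff.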
@@ -231,7 +321,16 @@ if __name__ == '__main__':
|
|
231
321
|
elif args.delete:
|
232
322
|
RE_delete(args.binary)
|
233
323
|
|
324
|
+
elif args.cves:
|
325
|
+
RE_cves(args.binary)
|
326
|
+
elif args.signature:
|
327
|
+
print(f"[!] Error, feature not available yet")
|
328
|
+
exit(-1)
|
329
|
+
|
234
330
|
else:
|
235
331
|
print("[!] Error, please supply an action command")
|
236
332
|
parser.print_help()
|
237
333
|
|
334
|
+
|
335
|
+
if __name__ == '__main__':
|
336
|
+
main()
|
@@ -1,7 +1,8 @@
|
|
1
1
|
Metadata-Version: 2.1
|
2
2
|
Name: reait
|
3
|
-
Version: 0.0.
|
3
|
+
Version: 0.0.15
|
4
4
|
Home-page: https://github.com/RevEng-AI/reait
|
5
|
+
Author: James Patrick-Evans
|
5
6
|
Project-URL: Homepage, https://github.com/RevEng-AI/reait
|
6
7
|
Project-URL: Bug Tracker, https://github.com/RevEng-AI/reait/issues
|
7
8
|
Project-URL: Organisation Homepage, https://reveng.ai
|
@@ -14,9 +15,10 @@ Description-Content-Type: text/markdown
|
|
14
15
|
License-File: LICENSE
|
15
16
|
|
16
17
|
# reait
|
17
|
-
RevEng.AI Toolkit
|
18
18
|
|
19
|
-
|
19
|
+
## <ins>R</ins>ev<ins>E</ins>ng.<ins>AI</ins> <ins>T</ins>oolkit
|
20
|
+
|
21
|
+
Analyse compiled executable binaries using the RevEng.AI API. This tool allows you to search for similar components across different compiled executable programs, identify known vulnerabilities in stripped executables, and generate "YARA-like" AI signatures for entire binary files. More details about the API can be found at [docs.reveng.ai](https://docs.reveng.ai).
|
20
22
|
|
21
23
|
NB: We are in Alpha. We support GNU/Linux ELF and Windows PE executables for x86_64, and focus our support for x86_64 Linux ELF executables.
|
22
24
|
|
@@ -49,26 +51,53 @@ Once an analysis is complete, you may access RevEng.AI's BinNet embeddings for a
|
|
49
51
|
`reait -b /usr/bin/true -x | jq ".[] | select(.vaddr==$((0x19f0))).embedding" > embedding.json`
|
50
52
|
|
51
53
|
|
52
|
-
### Search for similar symbols
|
53
|
-
To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of
|
54
|
+
### Search for similar symbols using an embedding
|
55
|
+
To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of symbols with their names, distance (similarity), RevEng.AI collection set, source code filename, source code line number, and file creation timestamp is returned.
|
54
56
|
|
55
57
|
`reait -e embedding.json -n`
|
56
58
|
|
57
59
|
NB: A smaller distance indicates a higher degree of similarity.
|
58
60
|
|
59
|
-
####
|
60
|
-
To search for the most similar symbols found in a
|
61
|
+
#### Specific Search
|
62
|
+
To search for the most similar symbols found in a specific binary, use the `--found-in` option with a path to the executable to search from.
|
61
63
|
|
62
64
|
`reait -n --embedding /tmp/sha256_init.json --found-in ~/malware.exe --nns 5`
|
63
65
|
|
64
66
|
This downloads embeddings from `malware.exe` and computes the cosine similarity between all symbols and `sha256_init.json`. The returned results lists the most similar symbol locations by cosine similarity score (1.0 most similar, -1.0 dissimilar).
|
65
67
|
|
68
|
+
The `--from-file` option may also be used to limit the search to a custom file containing a JSON list of embeddings.
|
69
|
+
|
70
|
+
|
71
|
+
#### Limited Search
|
72
|
+
To search for most similar symbols from a set of RevEng.AI collections, use the `--collections` options with a RegEx to match collection names. For example:
|
73
|
+
|
74
|
+
`reait -n --embedding my_func.json --collections "(libc.*|lib.*crypt.*)"`
|
75
|
+
|
76
|
+
RevEng.AI collections are sets of pre-analysed executable objects. To create custom collection sets e.g., malware collections, please create a RevEng.AI account.
|
77
|
+
|
78
|
+
### RevEng.AI embedding models
|
79
|
+
To use specific RevEng.AI AI models, or for training custom models, use `-m` to specify the model. The default option is to use the latest development model. Available models are `binnet-0.1` and `dexter`.
|
80
|
+
|
81
|
+
`reait -b /usr/bin/true -m dexter -a`
|
82
|
+
|
83
|
+
### Software Composition Analysis
|
84
|
+
To identify known open source software components embedded inside a binary, use the `-C` flag.
|
85
|
+
|
86
|
+
#### Stripped Binary CVE Checker
|
87
|
+
To check for known vulnerabilities found with embedded software components, use `-c` or `--cves`.
|
88
|
+
|
89
|
+
|
90
|
+
### RevEng.AI Binary Signature
|
91
|
+
To generate an AI functional description of an entire binary file, use the `-S` flag. NB: Under development.
|
92
|
+
|
93
|
+
|
66
94
|
### Binary embedding
|
67
|
-
Produce a
|
95
|
+
Produce a dumb fingerprint for the whole binary by calculating the arithmetic mean of all symbol embeddings.
|
68
96
|
|
69
97
|
`reait -b /usr/bin/true -s`
|
70
98
|
|
71
99
|
|
100
|
+
|
72
101
|
## Configuration
|
73
102
|
|
74
103
|
`reait` reads the config file stored at `~/.reait.toml`. An example config file looks like:
|
@@ -76,6 +105,7 @@ Produce a smart fingerprint for the whole binary by calculating the arithmetic m
|
|
76
105
|
```
|
77
106
|
apikey = "l1br3"
|
78
107
|
host = "https://api.reveng.ai"
|
108
|
+
model = "binnet-0.1"
|
79
109
|
```
|
80
110
|
|
81
111
|
## Contact
|
@@ -1,10 +1,13 @@
|
|
1
1
|
LICENSE
|
2
2
|
README.md
|
3
3
|
pyproject.toml
|
4
|
-
reait
|
5
4
|
setup.py
|
5
|
+
reait/__init__.py
|
6
|
+
reait/__main__.py
|
6
7
|
reait.egg-info/PKG-INFO
|
7
8
|
reait.egg-info/SOURCES.txt
|
8
9
|
reait.egg-info/dependency_links.txt
|
10
|
+
reait.egg-info/entry_points.txt
|
9
11
|
reait.egg-info/requires.txt
|
10
|
-
reait.egg-info/top_level.txt
|
12
|
+
reait.egg-info/top_level.txt
|
13
|
+
tests/__init__.py
|
@@ -5,11 +5,11 @@ with open("README.md", "r") as f:
|
|
5
5
|
|
6
6
|
setuptools.setup(
|
7
7
|
name="reait",
|
8
|
-
version="0.0.
|
9
|
-
scripts=['reait'],
|
8
|
+
version="0.0.15",
|
10
9
|
long_description=long_description,
|
11
10
|
long_description_content_type="text/markdown",
|
12
11
|
url="https://github.com/RevEng-AI/reait",
|
12
|
+
author="James Patrick-Evans",
|
13
13
|
packages=setuptools.find_packages(),
|
14
14
|
classifiers=[
|
15
15
|
"Programming Language :: Python :: 3",
|
@@ -18,6 +18,11 @@ setuptools.setup(
|
|
18
18
|
],
|
19
19
|
install_requires=[
|
20
20
|
'tqdm', 'argparse', 'requests', 'rich', 'tomli', 'scikit-learn', 'pandas', 'numpy'
|
21
|
+
],
|
22
|
+
entry_points={
|
23
|
+
'console_scripts': [
|
24
|
+
'reait = reait.__main__:main'
|
21
25
|
]
|
22
|
-
|
26
|
+
}
|
27
|
+
)
|
23
28
|
|
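Reviewer note: replacing `scripts=['reait']` with a `console_scripts` entry point is what forces the move to a `reait/__main__.py` module with a `main()` function. At install time a small launcher is generated that imports the module named before the colon and calls the attribute named after it. The sketch below is a simplified illustration of that lookup using a standard-library target, not the actual launcher code. `resolve_entry_point` is a hypothetical helper.

```python
import importlib
import json


def resolve_entry_point(spec: str):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    if attr_path:
        # Dotted attribute paths such as "Class.method" are walked step by step
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
    return obj


# With the package installed, "reait.__main__:main" would resolve the same way;
# json:dumps is used here so the example runs without reait present.
dumps = resolve_entry_point("json:dumps")
```

Since `__main__.py` keeps its `if __name__ == '__main__': main()` guard, `python -m reait` should invoke the same function as the installed `reait` command.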
File without changes
|
@@ -1 +0,0 @@
|
|
1
|
-
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|