reait 0.0.14__tar.gz → 0.0.15__tar.gz

@@ -1,7 +1,8 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: reait
3
- Version: 0.0.14
3
+ Version: 0.0.15
4
4
  Home-page: https://github.com/RevEng-AI/reait
5
+ Author: James Patrick-Evans
5
6
  Project-URL: Homepage, https://github.com/RevEng-AI/reait
6
7
  Project-URL: Bug Tracker, https://github.com/RevEng-AI/reait/issues
7
8
  Project-URL: Organisation Homepage, https://reveng.ai
@@ -14,9 +15,10 @@ Description-Content-Type: text/markdown
14
15
  License-File: LICENSE
15
16
 
16
17
  # reait
17
- RevEng.AI Toolkit
18
18
 
19
- Analyse compiled executable binaries using the RevEng.AI API. This tool allows you to search for similar components across different compiled executable programs. More details about the API can be found at [docs.reveng.ai](https://docs.reveng.ai).
19
+ ## <ins>R</ins>ev<ins>E</ins>ng.<ins>AI</ins> <ins>T</ins>oolkit
20
+
21
+ Analyse compiled executable binaries using the RevEng.AI API. This tool allows you to search for similar components across different compiled executable programs, identify known vulnerabilities in stripped executables, and generate "YARA-like" AI signatures for entire binary files. More details about the API can be found at [docs.reveng.ai](https://docs.reveng.ai).
20
22
 
21
23
  NB: We are in Alpha. We support GNU/Linux ELF and Windows PE executables for x86_64, and focus our support for x86_64 Linux ELF executables.
22
24
 
@@ -49,26 +51,53 @@ Once an analysis is complete, you may access RevEng.AI's BinNet embeddings for a
49
51
  `reait -b /usr/bin/true -x | jq ".[] | select(.vaddr==$((0x19f0))).embedding" > embedding.json`
50
52
 
51
53
 
52
- ### Search for similar symbols based on JSON embedding file
53
- To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of symbol names and the distance between each vector is returned.
54
+ ### Search for similar symbols using an embedding
55
+ To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of symbols with their names, distance (similarity), RevEng.AI collection set, source code filename, source code line number, and file creation timestamp is returned.
54
56
 
55
57
  `reait -e embedding.json -n`
56
58
 
57
59
  NB: A smaller distance indicates a higher degree of similarity.
58
60
 
59
- #### Limited Search
60
- To search for the most similar symbols found in a binary to a specific embedding, use the `--found-in` option with a path to the executable.
61
+ #### Specific Search
62
+ To search for the most similar symbols found in a specific binary, use the `--found-in` option with a path to the executable to search from.
61
63
 
62
64
  `reait -n --embedding /tmp/sha256_init.json --found-in ~/malware.exe --nns 5`
63
65
 
64
66
  This downloads embeddings from `malware.exe` and computes the cosine similarity between all symbols and `sha256_init.json`. The returned results lists the most similar symbol locations by cosine similarity score (1.0 most similar, -1.0 dissimilar).
65
67
 
68
+ The `--from-file` option may also be used to limit the search to a custom file containing a JSON list of embeddings.
69
+
70
+
71
+ #### Limited Search
72
+ To search for most similar symbols from a set of RevEng.AI collections, use the `--collections` options with a RegEx to match collection names. For example:
73
+
74
+ `reait -n --embedding my_func.json --collections "(libc.*|lib.*crypt.*)"`
75
+
76
+ RevEng.AI collections are sets of pre-analysed executable objects. To create custom collection sets e.g., malware collections, please create a RevEng.AI account.
77
+
78
+ ### RevEng.AI embedding models
79
+ To use specific RevEng.AI AI models, or for training custom models, use `-m` to specify the model. The default option is to use the latest development model. Available models are `binnet-0.1` and `dexter`.
80
+
81
+ `reait -b /usr/bin/true -m dexter -a`
82
+
83
+ ### Software Composition Analysis
84
+ To identify known open source software components embedded inside a binary, use the `-C` flag.
85
+
86
+ #### Stripped Binary CVE Checker
87
+ To check for known vulnerabilities found with embedded software components, use `-c` or `--cves`.
88
+
89
+
90
+ ### RevEng.AI Binary Signature
91
+ To generate an AI functional description of an entire binary file, use the `-S` flag. NB: Under development.
92
+
93
+
66
94
  ### Binary embedding
67
- Produce a smart fingerprint for the whole binary by calculating the arithmetic mean of all symbol embeddings.
95
+ Produce a dumb fingerprint for the whole binary by calculating the arithmetic mean of all symbol embeddings.
68
96
 
69
97
  `reait -b /usr/bin/true -s`
70
98
 
71
99
 
100
+
72
101
  ## Configuration
73
102
 
74
103
  `reait` reads the config file stored at `~/.reait.toml`. An example config file looks like:
@@ -76,6 +105,7 @@ Produce a smart fingerprint for the whole binary by calculating the arithmetic m
76
105
  ```
77
106
  apikey = "l1br3"
78
107
  host = "https://api.reveng.ai"
108
+ model = "binnet-0.1"
79
109
  ```
80
110
 
81
111
  ## Contact
@@ -1,7 +1,8 @@
1
1
  # reait
2
- RevEng.AI Toolkit
3
2
 
4
- Analyse compiled executable binaries using the RevEng.AI API. This tool allows you to search for similar components across different compiled executable programs. More details about the API can be found at [docs.reveng.ai](https://docs.reveng.ai).
3
+ ## <ins>R</ins>ev<ins>E</ins>ng.<ins>AI</ins> <ins>T</ins>oolkit
4
+
5
+ Analyse compiled executable binaries using the RevEng.AI API. This tool allows you to search for similar components across different compiled executable programs, identify known vulnerabilities in stripped executables, and generate "YARA-like" AI signatures for entire binary files. More details about the API can be found at [docs.reveng.ai](https://docs.reveng.ai).
5
6
 
6
7
  NB: We are in Alpha. We support GNU/Linux ELF and Windows PE executables for x86_64, and focus our support for x86_64 Linux ELF executables.
7
8
 
@@ -34,26 +35,53 @@ Once an analysis is complete, you may access RevEng.AI's BinNet embeddings for a
34
35
  `reait -b /usr/bin/true -x | jq ".[] | select(.vaddr==$((0x19f0))).embedding" > embedding.json`
35
36
 
36
37
 
37
- ### Search for similar symbols based on JSON embedding file
38
- To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of symbol names and the distance between each vector is returned.
38
+ ### Search for similar symbols using an embedding
39
+ To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of symbols with their names, distance (similarity), RevEng.AI collection set, source code filename, source code line number, and file creation timestamp is returned.
39
40
 
40
41
  `reait -e embedding.json -n`
41
42
 
42
43
  NB: A smaller distance indicates a higher degree of similarity.
43
44
 
44
- #### Limited Search
45
- To search for the most similar symbols found in a binary to a specific embedding, use the `--found-in` option with a path to the executable.
45
+ #### Specific Search
46
+ To search for the most similar symbols found in a specific binary, use the `--found-in` option with a path to the executable to search from.
46
47
 
47
48
  `reait -n --embedding /tmp/sha256_init.json --found-in ~/malware.exe --nns 5`
48
49
 
49
50
  This downloads embeddings from `malware.exe` and computes the cosine similarity between all symbols and `sha256_init.json`. The returned results lists the most similar symbol locations by cosine similarity score (1.0 most similar, -1.0 dissimilar).
50
51
 
52
+ The `--from-file` option may also be used to limit the search to a custom file containing a JSON list of embeddings.
53
+
54
+
55
+ #### Limited Search
56
+ To search for most similar symbols from a set of RevEng.AI collections, use the `--collections` options with a RegEx to match collection names. For example:
57
+
58
+ `reait -n --embedding my_func.json --collections "(libc.*|lib.*crypt.*)"`
59
+
60
+ RevEng.AI collections are sets of pre-analysed executable objects. To create custom collection sets e.g., malware collections, please create a RevEng.AI account.
61
+
62
+ ### RevEng.AI embedding models
63
+ To use specific RevEng.AI AI models, or for training custom models, use `-m` to specify the model. The default option is to use the latest development model. Available models are `binnet-0.1` and `dexter`.
64
+
65
+ `reait -b /usr/bin/true -m dexter -a`
66
+
67
+ ### Software Composition Analysis
68
+ To identify known open source software components embedded inside a binary, use the `-C` flag.
69
+
70
+ #### Stripped Binary CVE Checker
71
+ To check for known vulnerabilities found with embedded software components, use `-c` or `--cves`.
72
+
73
+
74
+ ### RevEng.AI Binary Signature
75
+ To generate an AI functional description of an entire binary file, use the `-S` flag. NB: Under development.
76
+
77
+
51
78
  ### Binary embedding
52
- Produce a smart fingerprint for the whole binary by calculating the arithmetic mean of all symbol embeddings.
79
+ Produce a dumb fingerprint for the whole binary by calculating the arithmetic mean of all symbol embeddings.
53
80
 
54
81
  `reait -b /usr/bin/true -s`
55
82
 
56
83
 
84
+
57
85
  ## Configuration
58
86
 
59
87
  `reait` reads the config file stored at `~/.reait.toml`. An example config file looks like:
@@ -61,6 +89,7 @@ Produce a smart fingerprint for the whole binary by calculating the arithmetic m
61
89
  ```
62
90
  apikey = "l1br3"
63
91
  host = "https://api.reveng.ai"
92
+ model = "binnet-0.1"
64
93
  ```
65
94
 
66
95
  ## Contact
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "reait"
7
- version = "0.0.14"
7
+ version = "0.0.15"
8
8
  readme = "README.md"
9
9
  classifiers=[
10
10
  "Programming Language :: Python :: 3",
@@ -19,7 +19,7 @@ dependencies = [
19
19
  "tomli",
20
20
  "scikit-learn",
21
21
  "pandas",
22
- "numpy"
22
+ "numpy",
23
23
  ]
24
24
  keywords = ["reverse", "engineering", "reveng.ai", "reveng", "machine", "learning", "binary", "analysis", "ml", "ai", "vector", "embedding"]
25
25
 
File without changes
@@ -7,6 +7,7 @@ from hashlib import sha256
7
7
  from rich import print_json, print as rich_print
8
8
  from sklearn.metrics.pairwise import cosine_similarity
9
9
  import os
10
+ import re
10
11
  import argparse
11
12
  import requests
12
13
  from numpy import array, vstack, mean
@@ -17,11 +18,12 @@ from os.path import isfile
17
18
  from sys import exit
18
19
  from IPython import embed
19
20
 
20
- __version__ = "0.0.14"
21
+ __version__ = "0.0.15"
21
22
 
22
23
  re_conf = {
23
24
  'apikey' : 'l1br3',
24
- 'host' : 'https://api.reveng.ai'
25
+ 'host' : 'https://api.reveng.ai',
26
+ 'model': 'binnet-0.1'
25
27
  }
26
28
 
27
29
  def reveng_req(r: requests.request, end_point: str, data=None, ex_headers: dict = None, params=None):
@@ -47,11 +49,14 @@ def RE_delete(fpath: str):
47
49
  return
48
50
 
49
51
 
50
- def RE_analyse(fpath: str):
52
+ def RE_analyse(fpath: str, model: str = None):
51
53
  """
52
54
  Start analysis job for binary file
53
55
  """
54
- res = reveng_req(requests.post, f"analyse", data=open(fpath, 'rb').read())
56
+ params={}
57
+ if model:
58
+ params['model'] = model
59
+ res = reveng_req(requests.post, f"analyse", data=open(fpath, 'rb').read(), params=params)
55
60
  if res.status_code == 200:
56
61
  print("[+] Successfully submitted binary for analysis.")
57
62
  print(f"[+] {fpath} - {binary_id(fpath)}")
@@ -65,6 +70,24 @@ def RE_analyse(fpath: str):
65
70
  res.raise_for_status()
66
71
 
67
72
 
73
+ def RE_upload(fpath: str):
74
+ """
75
+ Upload binary to Server
76
+ """
77
+ res = reveng_req(requests.post, f"upload", data=open(fpath, 'rb').read())
78
+ if res.status_code == 200:
79
+ print("[+] Successfully uploaded binary to your account.")
80
+ print(f"[+] {fpath} - {binary_id(fpath)}")
81
+ return res
82
+
83
+ if res.status_code == 400:
84
+ if 'already exists' in json.loads(res.text)['reason']:
85
+ print(f"[-] {fpath} already exists. Please check the results log file for {binary_id(fpath)}")
86
+ return True
87
+
88
+ res.raise_for_status()
89
+
90
+
68
91
  def RE_embeddings(fpath: str):
69
92
  """
70
93
  Fetch symbol embeddings
@@ -94,11 +117,33 @@ def RE_logs(fpath: str):
94
117
  res.raise_for_status()
95
118
 
96
119
 
97
- def RE_compute_distance(embedding: list, fpath_source: str, nns: int = 5):
120
+ def RE_cves(fpath: str):
121
+ """
122
+ Check for known CVEs in Binary
123
+ """
124
+ bin_id = binary_id(fpath)
125
+ res = reveng_req(requests.get, f"/cves/{bin_id}")
126
+ if res.status_code == 200:
127
+ cves = json.loads(res.text)
128
+ rich_print(f"[bold blue]Checking for known CVEs embedded inside [/bold blue] [bold bright_green]{fpath}[/bold bright_green]:")
129
+ if len(cves) == 0:
130
+ rich_print(f"[bold bright_green]0 CVEs found.[/bold bright_green]")
131
+ else:
132
+ rich_print(f"[bold red]Warning CVEs found![/bold red]")
133
+ print_json(data=cves)
134
+ return
135
+ elif res.status_code == 404:
136
+ print(f"[!] Error, binary analysis for {bin_id} not found.")
137
+ return
138
+
139
+ res.raise_for_status()
140
+
141
+
142
+ #def RE_compute_distance(embedding: list, fpath_source: str, nns: int = 5):
143
+ def RE_compute_distance(embedding: list, embeddings: list, nns: int = 5):
98
144
  """
99
- Comput ecosine distance between source embedding and embeddinsg from binary
145
+ Compute the cosine distance between source embedding and embeddinsg from binary
100
146
  """
101
- embeddings = RE_embeddings(fpath_source)
102
147
  df = DataFrame(data=embeddings)
103
148
  np_embedding = array(embedding).reshape(1, -1)
104
149
  source_embeddings = vstack(df['embedding'].values)
@@ -112,15 +157,18 @@ def RE_compute_distance(embedding: list, fpath_source: str, nns: int = 5):
112
157
  return json_sims
113
158
 
114
159
 
115
- def RE_nearest_symbols(embedding: list, nns: int = 5):
160
+ def RE_nearest_symbols(embedding: list, nns: int = 5, collections : list = None):
116
161
  """
117
162
  Get function name suggestions for an embedding
163
+ :param embedding: embedding vector as python list
118
164
  :param nns: Number of nearest neighbors
119
- :param source: Binary file to search embeddings from
165
+ :param collections: str RegEx to search through RevEng.AI collections
120
166
  """
121
167
  params={'nns': nns}
122
- if source:
123
- params['source'] = source
168
+
169
+ if collections:
170
+ params['collections'] = collections
171
+
124
172
  res = reveng_req(requests.post, "ann", data=json.dumps(embedding), params=params)
125
173
  res.raise_for_status()
126
174
  f_suggestions = res.json()
@@ -155,20 +203,35 @@ def version():
155
203
  rich_print(f"[bold red]reait[/bold red] [bold bright_green]v{__version__}[/bold bright_green]")
156
204
  print_json(data=re_conf)
157
205
 
158
- if __name__ == '__main__':
206
+
207
+ def main() -> None:
208
+ """
209
+ Tool entry
210
+ """
159
211
  parse_config()
160
212
  parser = argparse.ArgumentParser(add_help=False)
161
- parser.add_argument("-b", "--binary", default="", help="Path on binary to analyse")
162
- parser.add_argument("-a", "--analyse", action='store_true', help="Analyse new binary")
213
+ parser.add_argument("-b", "--binary", default="", help="Path of binary to analyse")
214
+ parser.add_argument("-a", "--analyse", action='store_true', help="Perform a full analysis and generate embeddings for every symbol")
215
+ parser.add_argument("--no-embeddings", action='store_true', help="Only perform binary analysis. Do not generate embeddings for symbols")
216
+ parser.add_argument("--base-address", help="Image base of the executable image to map for remote analysis")
217
+ parser.add_argument("-A", action='store_true', help="Upload and Analyse a new binary")
218
+ parser.add_argument("-u", "--upload", action='store_true', help="Upload a new binary to remote server")
163
219
  parser.add_argument("-n", "--ann", action='store_true', help="Fetch Approximate Nearest Neighbours (ANNs) for embedding")
164
220
  parser.add_argument("--embedding", help="Path of JSON file containing a BinNet embedding")
165
221
  parser.add_argument("--nns", default="5", help="Number of approximate nearest neighbors to fetch")
222
+ parser.add_argument("--collections", default=None, help="Regex string to select RevEng.AI collections for filtering e.g., libc")
166
223
  parser.add_argument("--found-in", help="ANN flag to limit to embeddings returned to those found in specific binary")
167
- # parser.add_argument("-m", "--model", default="BinNet", help="AI model used to generate embeddings")
224
+ parser.add_argument("--from-file", help="ANN flag to limit to embeddings returned to those found in JSON embeddings file")
225
+ parser.add_argument("-c", "--cves", action="store_true", help="Check for CVEs found inside binary")
226
+ parser.add_argument("-C", "--sca", action="store_true", help="Perform Software Composition Anaysis to identify common libraries embedded in binary")
227
+ parser.add_argument("-m", "--model", default="binnet-0.1", help="AI model used to generate embeddings")
168
228
  parser.add_argument("-x", "--extract", action='store_true', help="Fetch embeddings for binary")
229
+ parser.add_argument("--start-address", help="Start vaddr of the function to extract embeddings")
230
+ parser.add_argument("--end-address", help="End vaddr of the function to extract embeddings")
169
231
  parser.add_argument("-s", "--summary", action='store_true', help="Average symbol embeddings in binary")
232
+ parser.add_argument("-S", "--signature", action='store_true', help="Generate a RevEng.AI binary signature")
170
233
  parser.add_argument("-l", "--logs", action='store_true', help="Fetch analysis log file for binary")
171
- parser.add_argument("-d", "--delete", action='store_true', help="Securely delete all analyses and metadata associated with binary")
234
+ parser.add_argument("-d", "--delete", action='store_true', help="Delete all metadata associated with binary")
172
235
  parser.add_argument("-k", "--apikey", help="RevEng.AI API key")
173
236
  parser.add_argument("-h", "--host", help="Analysis Host (https://api.reveng.ai)")
174
237
  parser.add_argument("-v", "--version", action="store_true", help="Display version information")
@@ -179,19 +242,29 @@ if __name__ == '__main__':
179
242
  re_conf['apikey'] = args.apikey
180
243
  if args.host:
181
244
  re_conf['host'] = args.host
245
+ if args.model:
246
+ re_conf['model'] = args.model
182
247
 
183
248
  # display version and exit
184
249
  if args.version:
185
250
  version()
186
251
  exit(0)
187
252
 
188
- if args.analyse or args.extract or args.logs or args.delete or args.summary:
253
+ if args.A or args.analyse or args.extract or args.logs or args.delete or args.summary or args.upload:
189
254
  # verify binary is a file
190
255
  if not os.path.isfile(args.binary):
191
256
  print("[!] Error, please supply a valid binary file using '-b'.")
192
257
  parser.print_help()
193
258
  exit(-1)
194
259
 
260
+ if args.upload:
261
+ # upload binary first, them carry out actions
262
+ print(f"[!] RE:upload not implemented. Use analyse.")
263
+ exit(-1)
264
+
265
+ if args.A:
266
+ RE_analyse(args.binary)
267
+
195
268
  if args.analyse:
196
269
  RE_analyse(args.binary)
197
270
 
@@ -215,15 +288,32 @@ if __name__ == '__main__':
215
288
 
216
289
  embedding = json.loads(open(args.embedding, 'r').read())
217
290
 
291
+ # check for valid regex
292
+ if args.collections:
293
+ try:
294
+ re.compile(args.collections)
295
+ except re.error as e:
296
+ print(f"[!] Error, invalid regex for collections - {args.collections}")
297
+ exit(-1)
298
+
218
299
  if args.found_in:
219
300
  if not os.path.isfile(args.found_in):
220
301
  print("[!] Error, --found-in flag requires a path to a binary to search from")
221
302
  exit(-1)
222
303
  print(f"[+] Searching for symbols similar to embedding in binary {args.found_in}")
223
- res = RE_compute_distance(embedding, args.found_in, int(args.nns))
304
+ embeddings = RE_embeddings(args.found_in)
305
+ res = RE_compute_distance(embedding, embeddings, int(args.nns))
306
+ print_json(data=res)
307
+ elif args.from_file:
308
+ if not os.path.isfile(args.from_file):
309
+ print("[!] Error, --from-file flag requires a path to a JSON embeddings file")
310
+ exit(-1)
311
+ print(f"[+] Searching for symbols similar to embedding in binary {args.from_file}")
312
+ res = RE_compute_distance(embedding, json.load(open(args.from_file, "r")), int(args.nns))
224
313
  print_json(data=res)
225
314
  else:
226
- RE_nearest_symbols(embedding, int(args.nns))
315
+ print(f"[+] Searching for similar symbols to embedding in {'all' if not args.collections else args.collections} collections.")
316
+ RE_nearest_symbols(embedding, int(args.nns), collections=args.collections)
227
317
 
228
318
  elif args.logs:
229
319
  RE_logs(args.binary)
@@ -231,7 +321,16 @@ if __name__ == '__main__':
231
321
  elif args.delete:
232
322
  RE_delete(args.binary)
233
323
 
324
+ elif args.cves:
325
+ RE_cves(args.binary)
326
+ elif args.signature:
327
+ print(f"[!] Error, feature not available yet")
328
+ exit(-1)
329
+
234
330
  else:
235
331
  print("[!] Error, please supply an action command")
236
332
  parser.print_help()
237
333
 
334
+
335
+ if __name__ == '__main__':
336
+ main()
@@ -1,7 +1,8 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: reait
3
- Version: 0.0.14
3
+ Version: 0.0.15
4
4
  Home-page: https://github.com/RevEng-AI/reait
5
+ Author: James Patrick-Evans
5
6
  Project-URL: Homepage, https://github.com/RevEng-AI/reait
6
7
  Project-URL: Bug Tracker, https://github.com/RevEng-AI/reait/issues
7
8
  Project-URL: Organisation Homepage, https://reveng.ai
@@ -14,9 +15,10 @@ Description-Content-Type: text/markdown
14
15
  License-File: LICENSE
15
16
 
16
17
  # reait
17
- RevEng.AI Toolkit
18
18
 
19
- Analyse compiled executable binaries using the RevEng.AI API. This tool allows you to search for similar components across different compiled executable programs. More details about the API can be found at [docs.reveng.ai](https://docs.reveng.ai).
19
+ ## <ins>R</ins>ev<ins>E</ins>ng.<ins>AI</ins> <ins>T</ins>oolkit
20
+
21
+ Analyse compiled executable binaries using the RevEng.AI API. This tool allows you to search for similar components across different compiled executable programs, identify known vulnerabilities in stripped executables, and generate "YARA-like" AI signatures for entire binary files. More details about the API can be found at [docs.reveng.ai](https://docs.reveng.ai).
20
22
 
21
23
  NB: We are in Alpha. We support GNU/Linux ELF and Windows PE executables for x86_64, and focus our support for x86_64 Linux ELF executables.
22
24
 
@@ -49,26 +51,53 @@ Once an analysis is complete, you may access RevEng.AI's BinNet embeddings for a
49
51
  `reait -b /usr/bin/true -x | jq ".[] | select(.vaddr==$((0x19f0))).embedding" > embedding.json`
50
52
 
51
53
 
52
- ### Search for similar symbols based on JSON embedding file
53
- To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of symbol names and the distance between each vector is returned.
54
+ ### Search for similar symbols using an embedding
55
+ To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of symbols with their names, distance (similarity), RevEng.AI collection set, source code filename, source code line number, and file creation timestamp is returned.
54
56
 
55
57
  `reait -e embedding.json -n`
56
58
 
57
59
  NB: A smaller distance indicates a higher degree of similarity.
58
60
 
59
- #### Limited Search
60
- To search for the most similar symbols found in a binary to a specific embedding, use the `--found-in` option with a path to the executable.
61
+ #### Specific Search
62
+ To search for the most similar symbols found in a specific binary, use the `--found-in` option with a path to the executable to search from.
61
63
 
62
64
  `reait -n --embedding /tmp/sha256_init.json --found-in ~/malware.exe --nns 5`
63
65
 
64
66
  This downloads embeddings from `malware.exe` and computes the cosine similarity between all symbols and `sha256_init.json`. The returned results lists the most similar symbol locations by cosine similarity score (1.0 most similar, -1.0 dissimilar).
65
67
 
68
+ The `--from-file` option may also be used to limit the search to a custom file containing a JSON list of embeddings.
69
+
70
+
71
+ #### Limited Search
72
+ To search for most similar symbols from a set of RevEng.AI collections, use the `--collections` options with a RegEx to match collection names. For example:
73
+
74
+ `reait -n --embedding my_func.json --collections "(libc.*|lib.*crypt.*)"`
75
+
76
+ RevEng.AI collections are sets of pre-analysed executable objects. To create custom collection sets e.g., malware collections, please create a RevEng.AI account.
77
+
78
+ ### RevEng.AI embedding models
79
+ To use specific RevEng.AI AI models, or for training custom models, use `-m` to specify the model. The default option is to use the latest development model. Available models are `binnet-0.1` and `dexter`.
80
+
81
+ `reait -b /usr/bin/true -m dexter -a`
82
+
83
+ ### Software Composition Analysis
84
+ To identify known open source software components embedded inside a binary, use the `-C` flag.
85
+
86
+ #### Stripped Binary CVE Checker
87
+ To check for known vulnerabilities found with embedded software components, use `-c` or `--cves`.
88
+
89
+
90
+ ### RevEng.AI Binary Signature
91
+ To generate an AI functional description of an entire binary file, use the `-S` flag. NB: Under development.
92
+
93
+
66
94
  ### Binary embedding
67
- Produce a smart fingerprint for the whole binary by calculating the arithmetic mean of all symbol embeddings.
95
+ Produce a dumb fingerprint for the whole binary by calculating the arithmetic mean of all symbol embeddings.
68
96
 
69
97
  `reait -b /usr/bin/true -s`
70
98
 
71
99
 
100
+
72
101
  ## Configuration
73
102
 
74
103
  `reait` reads the config file stored at `~/.reait.toml`. An example config file looks like:
@@ -76,6 +105,7 @@ Produce a smart fingerprint for the whole binary by calculating the arithmetic m
76
105
  ```
77
106
  apikey = "l1br3"
78
107
  host = "https://api.reveng.ai"
108
+ model = "binnet-0.1"
79
109
  ```
80
110
 
81
111
  ## Contact
@@ -1,10 +1,13 @@
1
1
  LICENSE
2
2
  README.md
3
3
  pyproject.toml
4
- reait
5
4
  setup.py
5
+ reait/__init__.py
6
+ reait/__main__.py
6
7
  reait.egg-info/PKG-INFO
7
8
  reait.egg-info/SOURCES.txt
8
9
  reait.egg-info/dependency_links.txt
10
+ reait.egg-info/entry_points.txt
9
11
  reait.egg-info/requires.txt
10
- reait.egg-info/top_level.txt
12
+ reait.egg-info/top_level.txt
13
+ tests/__init__.py
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ reait = reait.__main__:main
@@ -0,0 +1,2 @@
1
+ reait
2
+ tests
@@ -5,11 +5,11 @@ with open("README.md", "r") as f:
5
5
 
6
6
  setuptools.setup(
7
7
  name="reait",
8
- version="0.0.14",
9
- scripts=['reait'],
8
+ version="0.0.15",
10
9
  long_description=long_description,
11
10
  long_description_content_type="text/markdown",
12
11
  url="https://github.com/RevEng-AI/reait",
12
+ author="James Patrick-Evans",
13
13
  packages=setuptools.find_packages(),
14
14
  classifiers=[
15
15
  "Programming Language :: Python :: 3",
@@ -18,6 +18,11 @@ setuptools.setup(
18
18
  ],
19
19
  install_requires=[
20
20
  'tqdm', 'argparse', 'requests', 'rich', 'tomli', 'scikit-learn', 'pandas', 'numpy'
21
+ ],
22
+ entry_points={
23
+ 'console_scripts': [
24
+ 'reait = reait.__main__:main'
21
25
  ]
22
- )
26
+ }
27
+ )
23
28
 
File without changes
@@ -1 +0,0 @@
1
-
File without changes
File without changes
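
Note on the `--found-in` / `--from-file` change: the `reait/__main__.py` hunks above show `RE_compute_distance` now receiving a ready-made list of embeddings and ranking them against the query embedding by cosine similarity (1.0 most similar, -1.0 dissimilar). The body of that function is mostly elided in this diff, so the following is only a minimal sketch of the ranking idea, not the package's code. It uses the standard library instead of the scikit-learn, pandas and numpy calls the package imports. The helper names `cosine_similarity` and `rank_by_similarity`, the `name` key, and the sample records are assumptions made for illustration; only `vaddr` and `embedding` appear as field names in the README.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1.0 to 1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; treat it as "unrelated".
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, symbols, nns=5):
    """Return the `nns` symbols whose embeddings are most similar to `query`."""
    scored = [
        {
            "name": s["name"],  # hypothetical key, assumed for this sketch
            "vaddr": s["vaddr"],
            "similarity": cosine_similarity(query, s["embedding"]),
        }
        for s in symbols
    ]
    # Highest similarity first, then keep only the top `nns` entries.
    scored.sort(key=lambda r: r["similarity"], reverse=True)
    return scored[:nns]


# Made-up two-dimensional embeddings, purely for illustration.
symbols = [
    {"name": "sha256_init", "vaddr": 0x19F0, "embedding": [1.0, 0.0]},
    {"name": "main", "vaddr": 0x1000, "embedding": [0.0, 1.0]},
    {"name": "sha256_final", "vaddr": 0x1A40, "embedding": [-1.0, 0.0]},
]
top = rank_by_similarity([2.0, 0.0], symbols, nns=2)
```

Because the ranking only needs a list of records, the same routine can serve embeddings downloaded for a binary and embeddings loaded from a JSON file, which appears to be the motivation for the new function signature.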