reait 0.0.18__tar.gz → 0.0.20__tar.gz
- {reait-0.0.18 → reait-0.0.20}/PKG-INFO +45 -21
- {reait-0.0.18 → reait-0.0.20}/README.md +44 -20
- {reait-0.0.18 → reait-0.0.20}/pyproject.toml +1 -1
- {reait-0.0.18 → reait-0.0.20}/setup.py +4 -2
- {reait-0.0.18 → reait-0.0.20}/src/reait/__init__.py +3 -0
- reait-0.0.20/src/reait/api.py +604 -0
- reait-0.0.20/src/reait/main.py +514 -0
- {reait-0.0.18 → reait-0.0.20}/src/reait.egg-info/PKG-INFO +45 -21
- reait-0.0.18/src/reait/api.py +0 -349
- reait-0.0.18/src/reait/main.py +0 -398
- {reait-0.0.18 → reait-0.0.20}/LICENSE +0 -0
- {reait-0.0.18 → reait-0.0.20}/setup.cfg +0 -0
- {reait-0.0.18 → reait-0.0.20}/src/reait.egg-info/SOURCES.txt +0 -0
- {reait-0.0.18 → reait-0.0.20}/src/reait.egg-info/dependency_links.txt +0 -0
- {reait-0.0.18 → reait-0.0.20}/src/reait.egg-info/entry_points.txt +0 -0
- {reait-0.0.18 → reait-0.0.20}/src/reait.egg-info/requires.txt +0 -0
- {reait-0.0.18 → reait-0.0.20}/src/reait.egg-info/top_level.txt +0 -0
- {reait-0.0.18 → reait-0.0.20}/tests/test_reait.py +0 -0
{reait-0.0.18 → reait-0.0.20}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: reait
-Version: 0.0.
+Version: 0.0.20
 Home-page: https://github.com/RevEng-AI/reait
 Author: James Patrick-Evans
 Author-email: James Patrick-Evans <james@reveng.ai>
@@ -704,6 +704,8 @@ Requires-Dist: scikit-learn
 
 # reait
 
+[](https://github.com/RevEngAI/reait/actions/workflows/python-package.yml)
+
 ## <ins>R</ins>ev<ins>E</ins>ng.<ins>AI</ins> <ins>T</ins>oolkit
 
 Analyse compiled executable binaries using the RevEng.AI API. This tool allows you to search for similar components across different compiled executable programs, identify known vulnerabilities in stripped executables, and generate "YARA++" **REAI** signatures for entire binary files. More details about the API can be found at [docs.reveng.ai](https://docs.reveng.ai).
@@ -712,19 +714,23 @@ NB: We are in Alpha. We support GNU/Linux ELF and Windows PE executables for x86
 
 ## Installation
 
-Install the latest stable version using
+Install the latest stable version using `pip3`.
 
-
+```shell
+pip3 install reait
+```
 
 ### Latest development version
 
-
+```shell
+pip3 install -e .
+```
 
 or
 
-```
+```shell
 python3 -m build .
-
+pip3 install -U dist/reait-*.whl
 ```
 
 ## Using reait
@@ -732,7 +738,9 @@ pip install -U dist/reait-*.whl
 ### Analysing binaries
 To submit a binary for analysis, run `reait` with the `-a` flag:
 
-
+```shell
+reait -b /usr/bin/true -a
+```
 
 This uploads the binary specified by `-b` to RevEng.AI servers for analysis. Depending on the size of the binary, it may take several hours. You may check an analysis jobs progress with the `-l` flag e.g. `reait -b /usr/bin/true -l`.
 
@@ -740,30 +748,42 @@ This uploads the binary specified by `-b` to RevEng.AI servers for analysis. Dep
 Symbol embeddings are numerical vector representations of each component that capture their semantic understanding. Similar functions should be similar to each other in our embedded vector space. They can be thought of as *advanced* AI-based IDA FLIRT signatures or Radare2 Zignatures.
 Once an analysis is complete, you may access RevEng.AI's BinNet embeddings for all symbols extracted with the `-x` flag.
 
-
+```shell
+reait -b /usr/bin/true -x > embeddings.json
+```
 
-#### Extract embedding for symbol at vaddr
-
+#### Extract embedding for symbol at vaddr 0x19F0
+```shell
+reait -b /usr/bin/true -x | jq ".[] | select(.vaddr==$((0x19F0))).embedding" > embedding.json
+```
 
 
 ### Search for similar symbols using an embedding
 To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of symbols with their names, distance (similarity), RevEng.AI collection set, source code filename, source code line number, and file creation timestamp is returned.
 
-
+```shell
+reait --embedding embedding.json -n
+```
 
-The following command searches for the top 10 most similar symbols found in md5sum.gcc.og.dynamic to the symbol starting at
+The following command searches for the top 10 most similar symbols found in md5sum.gcc.og.dynamic to the symbol starting at _0x33E6_ in md5sum.clang.og.dynamic. You may need to pass `--image-base` to ensure virtual addresses are mapped correctly.
 
-
+```shell
+reait -b md5sum.gcc.og.dynamic -n --start-vaddr 0x33E6 --found-in md5sum.gcc.o2.dynamic --nns 10 --base-address 0x100000
+```
 
 Search NN by symbol name.
-
+```shell
+reait -b md5sum.gcc.og.dynamic -n --symbol md5_buffer --found-in md5sum.gcc.o2.dynamic --nns 5
+```
 
 NB: A smaller distance indicates a higher degree of similarity.
 
 #### Specific Search
 To search for the most similar symbols found in a specific binary, use the `--found-in` option with a path to the executable to search from.
 
-
+```shell
+reait -n --embedding /tmp/sha256_init.json --found-in ~/malware.exe --nns 5
+```
 
 This downloads embeddings from `malware.exe` and computes the cosine similarity between all symbols and `sha256_init.json`. The returned results lists the most similar symbol locations by cosine similarity score (1.0 most similar, -1.0 dissimilar).
 
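Note on the hunk above: the README text describes ranking symbols by cosine similarity, scored from 1.0 (most similar) to -1.0 (dissimilar). The sketch below illustrates that metric in plain Python. It is not taken from reait's source; the helper name `cosine_similarity` and the sample vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors: the first pair points the same way, the second pair
# points in opposite directions.
query = [1.0, 2.0, 3.0]
print(cosine_similarity(query, [2.0, 4.0, 6.0]))     # close to 1.0
print(cosine_similarity(query, [-1.0, -2.0, -3.0]))  # close to -1.0
```

Because the score depends only on direction, scaling an embedding does not change its similarity to another one.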
@@ -773,7 +793,9 @@ The `--from-file` option may also be used to limit the search to a custom file c
 #### Limited Search
 To search for most similar symbols from a set of RevEng.AI collections, use the `--collections` options with a RegEx to match collection names. For example:
 
-
+```shell
+reait -n --embedding my_func.json --collections "(libc.*|lib.*crypt.*)"
+```
 
 RevEng.AI collections are sets of pre-analysed executable objects. To create custom collection sets e.g., malware collections, please create a RevEng.AI account.
 
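Note on the hunk above: the `--collections` value is a regular expression matched against collection names. The sketch below shows how the example pattern behaves under Python's `re` module. The collection names are invented, and the use of `fullmatch` is an assumption: the README does not say whether the service anchors the pattern or searches within the name.

```python
import re


def match_collections(pattern, names):
    """Return the names that the pattern matches in full (assumed semantics)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Hypothetical collection names, for illustration only.
names = ["libc6", "libgcrypt20", "openssl", "zlib"]
print(match_collections(r"(libc.*|lib.*crypt.*)", names))
```

With an unanchored search instead of a full match, `zlib` would still be rejected by this pattern, but a name such as `mylibc` would be accepted, so the anchoring choice matters for broad patterns.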
@@ -784,14 +806,16 @@ Find common components between binaries, RevEng.AI collections, or global search
 
 Example usage:
 
-```
+```shell
 reait -M -b 05ff897f430fec0ac17f14c89181c76961993506e5875f2987e9ead13bec58c2.exe --from-file 755a4b2ec15da6bb01248b2dfbad206c340ba937eae9c35f04f6cedfe5e99d63.embeddings.json --confidence high
 ```
 
 ### RevEng.AI embedding models
 To use specific RevEng.AI AI models, or for training custom models, use `-m` to specify the model. The default option is to use the latest development model. Available models are `binnet-0.1` and `dexter`.
 
-
+```shell
+reait -b /usr/bin/true -m dexter -a
+```
 
 ### Software Composition Analysis
 To identify known open source software components embedded inside a binary, use the `-C` flag.
@@ -805,7 +829,7 @@ To generate an AI functional description of an entire binary file, use the `-s`
 
 REAI signatures can be used to compute the binary similarity between entire executables with the `-S` flag. For example:
 
-```
+```shell
 reait -b d24ccf73aabca4192d33a07b4a238c8d40ac97a550c2e65b8074f03455a981ca.exe -S -t 00062cb01088cea245cd5f3eb03f65a0e6b11a8126ce00034d87935a451cf99c.exe,438d64bb831555caadaa92a32c9d62e255001bc8d524721c885f37d750ec3476.exe,755a4b2ec15da6bb01248b2dfbad206c340ba937eae9c35f04f6cedfe5e99d63.exe,05ff897f430fec0ac17f14c89181c76961993506e5875f2987e9ead13bec58c2.exe
 Computing Binary Similarity... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
 Binary Similarity to RedlineInfoStealer/d24ccf73aabca4192d33a07b4a238c8d40ac97a550c2e65b8074f03455a981ca.exe
@@ -824,7 +848,7 @@ Computing Binary Similarity... ━━━━━━━━━━━━━━━━
 
 To perform binary ANN search, pass in `-n` and `-s` flag at the same time. For example:
 
-```
+```shell
 reait -b /usr/bin/true -s -n
 Found /usr/bin/true:elf-x86_64
 [
@@ -856,7 +880,7 @@ Found /usr/bin/true:elf-x86_64
 
 `reait` reads the config file stored at `~/.reait.toml`. An example config file looks like:
 
-```
+```shell
 apikey = "l1br3"
 host = "https://api.reveng.ai"
 model = "binnet-0.1"
{reait-0.0.18 → reait-0.0.20}/README.md
@@ -1,5 +1,7 @@
 # reait
 
+[](https://github.com/RevEngAI/reait/actions/workflows/python-package.yml)
+
 ## <ins>R</ins>ev<ins>E</ins>ng.<ins>AI</ins> <ins>T</ins>oolkit
 
 Analyse compiled executable binaries using the RevEng.AI API. This tool allows you to search for similar components across different compiled executable programs, identify known vulnerabilities in stripped executables, and generate "YARA++" **REAI** signatures for entire binary files. More details about the API can be found at [docs.reveng.ai](https://docs.reveng.ai).
@@ -8,19 +10,23 @@ NB: We are in Alpha. We support GNU/Linux ELF and Windows PE executables for x86
 
 ## Installation
 
-Install the latest stable version using
+Install the latest stable version using `pip3`.
 
-
+```shell
+pip3 install reait
+```
 
 ### Latest development version
 
-
+```shell
+pip3 install -e .
+```
 
 or
 
-```
+```shell
 python3 -m build .
-
+pip3 install -U dist/reait-*.whl
 ```
 
 ## Using reait
@@ -28,7 +34,9 @@ pip install -U dist/reait-*.whl
 ### Analysing binaries
 To submit a binary for analysis, run `reait` with the `-a` flag:
 
-
+```shell
+reait -b /usr/bin/true -a
+```
 
 This uploads the binary specified by `-b` to RevEng.AI servers for analysis. Depending on the size of the binary, it may take several hours. You may check an analysis jobs progress with the `-l` flag e.g. `reait -b /usr/bin/true -l`.
 
@@ -36,30 +44,42 @@ This uploads the binary specified by `-b` to RevEng.AI servers for analysis. Dep
 Symbol embeddings are numerical vector representations of each component that capture their semantic understanding. Similar functions should be similar to each other in our embedded vector space. They can be thought of as *advanced* AI-based IDA FLIRT signatures or Radare2 Zignatures.
 Once an analysis is complete, you may access RevEng.AI's BinNet embeddings for all symbols extracted with the `-x` flag.
 
-
+```shell
+reait -b /usr/bin/true -x > embeddings.json
+```
 
-#### Extract embedding for symbol at vaddr
-
+#### Extract embedding for symbol at vaddr 0x19F0
+```shell
+reait -b /usr/bin/true -x | jq ".[] | select(.vaddr==$((0x19F0))).embedding" > embedding.json
+```
 
 
 ### Search for similar symbols using an embedding
 To query our database of similar symbols based on an embedding, use `-n` to search using Approximate Nearest Neighbours. The `--nns` allows you to specify the number of results returned. A list of symbols with their names, distance (similarity), RevEng.AI collection set, source code filename, source code line number, and file creation timestamp is returned.
 
-
+```shell
+reait --embedding embedding.json -n
+```
 
-The following command searches for the top 10 most similar symbols found in md5sum.gcc.og.dynamic to the symbol starting at
+The following command searches for the top 10 most similar symbols found in md5sum.gcc.og.dynamic to the symbol starting at _0x33E6_ in md5sum.clang.og.dynamic. You may need to pass `--image-base` to ensure virtual addresses are mapped correctly.
 
-
+```shell
+reait -b md5sum.gcc.og.dynamic -n --start-vaddr 0x33E6 --found-in md5sum.gcc.o2.dynamic --nns 10 --base-address 0x100000
+```
 
 Search NN by symbol name.
-
+```shell
+reait -b md5sum.gcc.og.dynamic -n --symbol md5_buffer --found-in md5sum.gcc.o2.dynamic --nns 5
+```
 
 NB: A smaller distance indicates a higher degree of similarity.
 
 #### Specific Search
 To search for the most similar symbols found in a specific binary, use the `--found-in` option with a path to the executable to search from.
 
-
+```shell
+reait -n --embedding /tmp/sha256_init.json --found-in ~/malware.exe --nns 5
+```
 
 This downloads embeddings from `malware.exe` and computes the cosine similarity between all symbols and `sha256_init.json`. The returned results lists the most similar symbol locations by cosine similarity score (1.0 most similar, -1.0 dissimilar).
 
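Note on the hunk above: the added `jq` filter selects the entry whose `vaddr` equals 0x19F0 and emits its `embedding`, which implies that `reait -x` prints a JSON array of objects carrying those two fields. The sketch below performs the same selection in Python. The helper name `embedding_at` and the sample data are assumptions made for illustration, and real output may carry additional fields.

```python
import json


def embedding_at(symbols, vaddr):
    """Return the embedding of the first symbol at `vaddr`, or None if absent."""
    for symbol in symbols:
        if symbol.get("vaddr") == vaddr:
            return symbol.get("embedding")
    return None


# Made-up sample in the shape the jq filter implies; 6640 == 0x19F0.
sample = json.loads(
    '[{"vaddr": 6640, "embedding": [0.1, -0.2]},'
    ' {"vaddr": 4096, "embedding": [0.3, 0.4]}]'
)
print(embedding_at(sample, 0x19F0))  # [0.1, -0.2]
```

To run it against real output, replace `sample` with `json.load(open("embeddings.json"))`, using the file written by the `-x` command shown in the hunk.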
@@ -69,7 +89,9 @@ The `--from-file` option may also be used to limit the search to a custom file c
 #### Limited Search
 To search for most similar symbols from a set of RevEng.AI collections, use the `--collections` options with a RegEx to match collection names. For example:
 
-
+```shell
+reait -n --embedding my_func.json --collections "(libc.*|lib.*crypt.*)"
+```
 
 RevEng.AI collections are sets of pre-analysed executable objects. To create custom collection sets e.g., malware collections, please create a RevEng.AI account.
 
@@ -80,14 +102,16 @@ Find common components between binaries, RevEng.AI collections, or global search
 
 Example usage:
 
-```
+```shell
 reait -M -b 05ff897f430fec0ac17f14c89181c76961993506e5875f2987e9ead13bec58c2.exe --from-file 755a4b2ec15da6bb01248b2dfbad206c340ba937eae9c35f04f6cedfe5e99d63.embeddings.json --confidence high
 ```
 
 ### RevEng.AI embedding models
 To use specific RevEng.AI AI models, or for training custom models, use `-m` to specify the model. The default option is to use the latest development model. Available models are `binnet-0.1` and `dexter`.
 
-
+```shell
+reait -b /usr/bin/true -m dexter -a
+```
 
 ### Software Composition Analysis
 To identify known open source software components embedded inside a binary, use the `-C` flag.
@@ -101,7 +125,7 @@ To generate an AI functional description of an entire binary file, use the `-s`
 
 REAI signatures can be used to compute the binary similarity between entire executables with the `-S` flag. For example:
 
-```
+```shell
 reait -b d24ccf73aabca4192d33a07b4a238c8d40ac97a550c2e65b8074f03455a981ca.exe -S -t 00062cb01088cea245cd5f3eb03f65a0e6b11a8126ce00034d87935a451cf99c.exe,438d64bb831555caadaa92a32c9d62e255001bc8d524721c885f37d750ec3476.exe,755a4b2ec15da6bb01248b2dfbad206c340ba937eae9c35f04f6cedfe5e99d63.exe,05ff897f430fec0ac17f14c89181c76961993506e5875f2987e9ead13bec58c2.exe
 Computing Binary Similarity... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01
 Binary Similarity to RedlineInfoStealer/d24ccf73aabca4192d33a07b4a238c8d40ac97a550c2e65b8074f03455a981ca.exe
@@ -120,7 +144,7 @@ Computing Binary Similarity... ━━━━━━━━━━━━━━━━
 
 To perform binary ANN search, pass in `-n` and `-s` flag at the same time. For example:
 
-```
+```shell
 reait -b /usr/bin/true -s -n
 Found /usr/bin/true:elf-x86_64
 [
@@ -152,7 +176,7 @@ Found /usr/bin/true:elf-x86_64
 
 `reait` reads the config file stored at `~/.reait.toml`. An example config file looks like:
 
-```
+```shell
 apikey = "l1br3"
 host = "https://api.reveng.ai"
 model = "binnet-0.1"
{reait-0.0.18 → reait-0.0.20}/setup.py
@@ -1,11 +1,14 @@
+# -*- coding: utf-8 -*-
 import setuptools
 
+__version__ = "0.0.20"
+
 with open("README.md", "r") as f:
     long_description = f.read()
 
 setuptools.setup(
     name="reait",
-    version=
+    version=__version__,
     long_description=long_description,
     long_description_content_type="text/markdown",
     url="https://github.com/RevEng-AI/reait",
@@ -21,4 +24,3 @@ setuptools.setup(
         'tqdm', 'argparse', 'requests', 'rich', 'tomli', 'pandas', 'numpy', "scipy", "scikit-learn"
     ],
 )
-