thanthus 0.2.3

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: f7ea5694e8673e5182d6f701d2e2aa246f9736bdd61abcf496713d6647c4a03c
4
+ data.tar.gz: 963ea97d58469d033524f336f474b5ff31f1092fa65deee064e638155b36246b
5
+ SHA512:
6
+ metadata.gz: 5f932d46868fab1463e8dc76758b5b7d62d42deebb5dea02119f9bfb474dbb3a0476978fda2c883e6c1ce6e8a8fbb209cc6fca3e86254acf53818cb2884a65ef
7
+ data.tar.gz: b059fd8ae45d1b9c28d6945dbeed360bb4f1f5c4435d2adc957a4075766ebbd2ab4d9cf6fece3ca74fff28ef4969a9ce7d1dd2d21d606158cbaa32548498fbb3
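The SHA256 entries above can be checked against the archive members after the `.gem` file is unpacked. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted into some directory; `verify_checksums` and `EXPECTED_SHA256` are hypothetical names used for illustration and are not part of RubyGems.

```ruby
require "digest"

# Hypothetical helper: returns the names of files that are missing or whose
# SHA256 digest differs from the expected hex string.
def verify_checksums(expected, dir: ".")
  expected.reject do |name, hex|
    path = File.join(dir, name)
    File.file?(path) && Digest::SHA256.file(path).hexdigest == hex
  end.keys
end

# Values copied from the checksum listing above.
EXPECTED_SHA256 = {
  "metadata.gz" => "f7ea5694e8673e5182d6f701d2e2aa246f9736bdd61abcf496713d6647c4a03c",
  "data.tar.gz" => "963ea97d58469d033524f336f474b5ff31f1092fa65deee064e638155b36246b",
}.freeze
```

An empty return value from `verify_checksums(EXPECTED_SHA256, dir: "unpacked")` would indicate that both members match; the directory name here is only an example.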
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2019 Thomas Pasquier
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
@@ -0,0 +1,289 @@
1
+ # Xanthus: Automated Reproducible Data Generation for Evaluating Intrusion Detection Systems
2
+
3
+ > I added simple Windows support. As I finished this work while studying at THU, I named this version `thanthus`.
4
+
5
+ Fairly evaluating and comparing the efficacy of different intrusion detection systems (IDS) requires that experimental data
6
+ be generated in a similar mechanism and/or shared across these systems.
7
+ The reality, unfortunately, is that there exist few public repositories (e.g., DARPA 1998/1999/2000, KDD Cup99, DARPA TC Engagement 3)
8
+ containing experimental data captured solely for the purpose of security analysis.
9
+ Among those public data repositories, most are outdated because a tremendous amount of manual labor is almost always
10
+ necessary to capture the data (e.g., DARPA TC program involves a number of teams from
11
+ across academia and industry and spans many years).
12
+ Consequently, some newly-developed systems, in order to be able to compare against older systems,
13
+ are evaluated using the data that is a decade or two older than the systems themselves
14
+ (and usually and unsurprisingly exhibit good results).
15
+ Given that there is a perpetual arms race between the defenders and the offenders in the realm of cyber security
16
+ and that new cyber-threats are manufactured every day,
17
+ a successful defence against a decade-old exploit is hardly an achievement.
18
+
19
+ Many existing systems, acknowledging this fact and ready to showcase their detection capability,
20
+ design their own experiments and produce their own dataset as a result.
21
+ Although the experiments are sometimes carefully described in their associated publications (e.g., in academic projects),
22
+ such datasets suffer from the following drawbacks:
23
+
24
+ - In cases where the dataset is made public, later systems can consume only a subset of the dataset for analysis.
25
+ Therefore, if they require e.g., additional features from the dataset in the analysis, they must rerun the experiments
26
+ to capture the data themselves again, instead of simply re-using the available dataset.
27
+ Moreover, some systems publish only pre-processed dataset, which usually eliminates information from the original,
28
+ raw dataset that is not relevant to their analysis, even though such information may be relevant for other systems.
29
+
30
+ - When the raw dataset is made public, it provides later systems with richer information content.
31
+ However, the underlying systems that capture the raw dataset (e.g., audit systems) are also constantly evolving,
32
+ generating finer-grained, more accurate information or
33
+ offering a completely different perspective through which one understands system behavior (e.g., provenance systems).
34
+ Security systems that take advantage of such advancement in the underlying systems
35
+ may very well find even the raw data provided by previous systems insufficient.
36
+
37
+ - If later systems must resort to reproducing dataset themselves as a result of the reasons listed above,
38
+ they need to rely on descriptions provided by previous systems to ensure high-fidelity experiment replay.
39
+ Even if we assume that previous systems provide sufficiently detailed descriptions to understand the experiment
40
+ (which certainly is not always the case),
41
+ there still exist a number of challenges.
42
+
43
+ - The experiment must be conducted using the exact software involved with matching versions.
44
+ In many cases, security experts have since identified and patched vulnerabilities in the exploitable software
45
+ used in security-related experiments, and thus the software itself usually has been updated to a newer version.
46
+ Downgrading the target software and its dependencies is therefore necessary to reproduce the experiment. This
47
+ sometimes cannot be automatically configured through existing package management systems and requires significant
48
+ manual configuration.
49
+
50
+ - Some vulnerability may affect only a particular version of the operating system. This requirement no doubt
51
+ further complicates the experimental setup and demands additional engineering effort.
52
+
53
+ - Other controllable factors may be omitted in the description that may or may not affect the final results of the
54
+ experiment. For example, background activities may have been included in the dataset but were not discussed in detail.
55
+
56
+ Before we go into any detail about using **Xanthus** for automated, reproducible data generation for security analysis,
57
+ we describe a pipeline in which we create dataset for a *specific* attack in a push-button fashion. **Xanthus** is
58
+ a higher-level abstracted framework that generates such a pipeline for *any* attack that existing or future IDS intend to
59
+ evaluate.
60
+
61
+ ## Primer to Xanthus: A Specific Pipeline
62
+
63
+ We introduce a specific pipeline that automates data capture for a particular attack.
64
+ In this pipeline, we deploy virtual machines (VM), set up a virtual environment that recreates the attack scenario,
65
+ and run the attack, while capturing data from a whole-system provenance capture system.
66
+ Code is publicly available online at [GitHub](https://github.com/crimson-unicorn/demo/tree/master/wget).
67
+ Please refer to the code while finishing off the rest of this section.
68
+
69
+ ### Prerequisites
70
+
71
+ We assume that you understand the following terms and concepts.
72
+ If not, click on the item that you do not understand to read more about it:
73
+
74
+ * [Virtual machines](https://en.wikipedia.org/wiki/Virtual_machine)
75
+ * [Makefile](https://en.wikipedia.org/wiki/Makefile)
76
+ * [VirtualBox](https://www.virtualbox.org/manual/ch01.html)
77
+ * [Vagrant](https://www.vagrantup.com/intro/index.html), [Vagrantfile](https://www.vagrantup.com/docs/vagrantfile/)
78
+ and [provisioning](https://www.vagrantup.com/docs/provisioning/index.html)
79
+ * [CamFlow](http://camflow.org)
80
+
81
+ You may want to understand the following terms and concepts if you want to fully understand the attack
82
+ that we will describe in the next section:
83
+
84
+ * [Trojan software](https://en.wikipedia.org/wiki/Advanced_persistent_threat)
85
+ and [reverse shell](https://resources.infosecinstitute.com/icmp-reverse-shell/#gref)
86
+
87
+ ### A Brief Attack Description
88
+
89
+ You will understand the pipeline better if you know the attack that we would like to reproduce automatically.
90
+ The attacker aims to invade a victim machine through a vulnerable (or exploitable) `wget`.
91
+ The attacker sets up a malicious (or compromised) `HTTP` server that redirects any requests to a malicious `FTP` server
92
+ that contains a `Debian` package with a Trojan backdoor.
93
+ The package appears to be the same as its legitimate version and may even work the same way,
94
+ but the moment the package is installed on the victim machine, it will initiate a reverse TCP connection to the attacker
95
+ who is listening for connections and create a reverse shell that allows the attacker to infiltrate the victim machine.
96
+
97
+ When the victim machine attempts to download the benign package from the `HTTP` server using `wget`,
98
+ `wget` allows arbitrary remote file upload to the host system.
99
+ This means that, instead of fetching the intended benign package, it follows the `HTTP` server's redirection and downloads
100
+ the malicious one.
101
+ The user is unaware of such behavior and installs the package through the package manager `dpkg`.
102
+ The installed Trojan software establishes a connection to the attacker and the attack succeeds.
103
+
104
+ ### Software Involved
105
+
106
+ * `wget` v1.17 or older
107
+ * Any `Debian` package with a Trojan backdoor. The `Debian` package must be installable (both benign and malicious version).
108
+ * Functioning `HTTP` and `FTP` server
109
+ * `dpkg` package manager
110
+ * `CamFlow` whole-system provenance capture system
111
+
112
+ ### Execution Platform
113
+
114
+ As expected, a `Debian` package can run only on a `Debian`-based operating system. This particular pipeline is run on
115
+ `Ubuntu 18.04` (both the client and the server).
116
+
117
+ ### The Pipeline
118
+
119
+ #### Installation
120
+
121
+ To run this pipeline, you need to install at least the following items:
122
+
123
+ * `Vagrant`
124
+ * Oracle `VirtualBox`
125
+
126
+ #### Usage
127
+
128
+ If you `git clone` the entire repository from [GitHub](https://github.com/crimson-unicorn/demo/), `cd` into the `wget` directory.
129
+ We assume this directory would be your working directory.
130
+
131
+ We write a `Makefile` to run our attack scenario many times. If you want to run it once only,
132
+ modify this line: `[ $${cnt} -lt 25 ]` to `[ $${cnt} -lt 1 ]` in the `Makefile`.
133
+ (In `Xanthus`, we would be able to configure this easily without actually modifying the code.)
134
+
135
+ If you are running on `Mac`:
136
+ ```
137
+ make test_mac
138
+ ```
139
+ On `Linux`, you would run:
140
+ ```
141
+ make test_linux
142
+ ```
143
+ We do *not* support `Windows` operating system for now.
144
+ You can find the output data file in the `data/` directory.
145
+
146
+ #### Behind the Scenes
147
+
148
+ This pipeline seems to be very user-friendly. So, one might ask, why do we bother to design and implement `Xanthus`?
149
+ The truth is, we have done a lot of heavy-lifting for you behind the scenes. Let's take a closer look.
150
+
151
+ The `Makefile` you run starts the `vagrant` process, which would boot up two virtual machines, one `server` and one
152
+ `client` (now, take a look into `Vagrantfile`).
153
+
154
+ The `server` machine is provisioned by `provision/server.sh` script.
155
+ It configures an `FTP` and an `HTTP` server and puts the malicious `Debian` package in the `FTP` server.
156
+ Of course, the user must provide the pipeline with the package.
157
+ We build the package ourselves in [Kali Linux](https://en.wikipedia.org/wiki/Kali_Linux)
158
+ with [TheFatRat](https://github.com/Screetsec/TheFatRat). You are free to use any tools at your disposal.
159
+ We also put the benign one in the `HTTP` server to trick the user into downloading it.
160
+
161
+ The `client` machine involves more operations.
162
+ First, unlike the `server` machine that simply uses an `Ubuntu 18.04` base operating system
163
+ (as seen in `server.vm.box = "bento/ubuntu-18.04"`),
164
+ the `client` machine uses our customized `VirtualBox` box called `michaelh/ubuncam`.
165
+ This box is built with the following specifications:
166
+
167
+ * It is built upon the original `Ubuntu 18.04` base box from `Vagrant`.
168
+ * It is installed with `CamFlow` as its provenance-capture system.
169
+ * It downgrades `wget` to its desired version (`v1.17`) that contains the vulnerability.
170
+ * It can install `Debian` packages in the experiment.
171
+
172
+ Note that it is always desirable to package such a box and upload it to the `VagrantCloud` so that we can
173
+ configure once and reuse many times.
174
+ One can always use a base box and configure the above specifications on-the-fly,
175
+ but it is not guaranteed that the configuration would work in the distant future.
176
+ For example, the link to download an older version of `wget` may expire without notice.
177
+ `Xanthus` allows users to either provide a customized virtual box or configure a base box through provisioning.
178
+ If an online configuration is provided, `Xanthus` would automatically generate a customized box for the user
179
+ to prevent future re-configuration or possible failure in future configuration.
180
+
181
+ The `client` machine runs the script in `provision/attack`.
182
+ The user must provide such a script.
183
+ In our case, we automatically generate attack scripts using `wget-attack-script-gen.py`.
184
+ `Xanthus` allows users to provide logic to generate scripts or simply provide scripts to run during the experiment.
185
+
186
+ ### Installation
187
+
188
+ Add this line to your application's Gemfile:
189
+
190
+ ```ruby
191
+ gem 'xanthus'
192
+ ```
193
+
194
+ And then execute:
195
+
196
+ $ bundle
197
+
198
+ Or install it yourself as:
199
+
200
+ $ gem install xanthus
201
+
202
+ ### Usage
203
+
204
+ ```
205
+ xanthus version | return Xanthus version number.
206
+ xanthus dependencies | installation instructions for system dependencies.
207
+ xanthus init <project name> | initialize a new project.
208
+ xanthus run | run .xanthus file in the current folder.
209
+ ```
210
+
211
+ ### Development
212
+
213
+ To add more features in `Xanthus`,
214
+ clone this repository
215
+ ```
216
+ git clone https://github.com/tfjmp/xanthus
217
+ cd xanthus
218
+ ```
219
+ and build the gem by running
220
+ ```
221
+ gem build xanthus
222
+ ```
223
+ To install this gem locally on your machine, you can also run
224
+ ```
225
+ gem install xanthus
226
+ ```
227
+ After you add a new feature (and test it yourself), you can release a new version of `Xanthus`.
228
+ First, please update the version number in `lib/xanthus/version.rb`, tag the repository `git tag -a x.x.x -m 'x.x.x'`, and push the tag `git push --tags`.
229
+ Then you can run
230
+ ```
231
+ gem push xanthus-x.x.x.gem
232
+ ```
233
+ This last step publishes the gem at [https://rubygems.org/gems/xanthus](https://rubygems.org/gems/xanthus).
234
+
235
+ ### Contribution
236
+
237
+ We welcome bug reports and pull requests on GitHub at https://github.com/[USERNAME]/xanthus.
238
+
239
+ ### License
240
+
241
+ This gem is available as an open source project under the [MIT License](https://opensource.org/licenses/MIT).
242
+
243
+ ### Issues and Solutions with VirtualBox
244
+ VirtualBox Guest Additions is not as well designed as we may hope. If you encounter the following error:
245
+ ```
246
+ Vagrant was unable to mount VirtualBox shared folders. This is usually
247
+ because the filesystem "vboxsf" is not available. This filesystem is
248
+ made available via the VirtualBox Guest Additions and kernel module.
249
+ Please verify that these guest additions are properly installed in the
250
+ guest. This is not a bug in Vagrant and is usually caused by a faulty
251
+ Vagrant box. For context, the command attempted was:
252
+
253
+ mount -t vboxsf -o uid=900,gid=900 vagrant /vagrant
254
+
255
+ The error output from the command was:
256
+
257
+ /sbin/mount.vboxsf: mounting failed with the error: No such device
258
+ ```
259
+ This is most likely caused by incompatible Guest Additions (GA) versions between the VM and the host. Even though the script might have stopped, the VM is still booted. You can `vagrant ssh` into the VM and manually run the following two commands:
260
+ ```
261
+ sudo apt-get -y install dkms build-essential linux-headers-$(uname -r) virtualbox-guest-additions-iso
262
+ sudo /opt/VBoxGuestAdditions*/init/vboxadd setup
263
+ ```
264
+ After this, you may encounter this error:
265
+ ```
266
+ ...
267
+ ==> default: Machine booted and ready!
268
+ [default] GuestAdditions seems to be installed (6.0.20) correctly, but not running.
269
+ bash: line 4: setup: command not found
270
+ ==> default: Checking for guest additions in VM...
271
+ The following SSH command responded with a non-zero exit status.
272
+ Vagrant assumes that this means the command failed!
273
+
274
+ setup
275
+
276
+ Stdout from the command:
277
+
278
+
279
+
280
+ Stderr from the command:
281
+
282
+ bash: line 4: setup: command not found
283
+ ```
284
+ Please add the following into the Vagrant script:
285
+ ```
286
+ if Vagrant.has_plugin?("vagrant-vbguest")
287
+ config.vbguest.auto_update = false
288
+ end
289
+ ```
@@ -0,0 +1,32 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "xanthus"
4
+
5
+ instruction = ARGV[0]
6
+ param1 = ARGV[1]
7
+
8
+ if instruction == 'version'
9
+ Xanthus.version
10
+ elsif instruction == 'init' && !param1.nil?
11
+ Xanthus::Init.init param1
12
+ elsif instruction == 'run'
13
+ xanthus_file = !param1.nil? ? param1 : '.xanthus'
14
+ load("./#{xanthus_file}")
15
+ elsif instruction == 'help'
16
+ puts 'xanthus version | return Xanthus version number.'
17
+ puts 'xanthus dependencies | installation instructions for system dependencies.'
18
+ puts 'xanthus init <project name> | initialize a new project.'
19
+ puts 'xanthus run [xanthus file] | run in the current folder. If not specified, we will try to run .xanthus .'
20
+ elsif instruction == 'dependencies'
21
+ puts 'You need to install the following software on your system for Xanthus to run:'
22
+ puts 'git (see https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)'
23
+ puts 'git lfs (see https://help.github.com/en/articles/installing-git-large-file-storage)'
24
+ puts 'virtualbox (see https://www.virtualbox.org/wiki/Downloads)'
25
+ puts 'vagrant (see https://www.vagrantup.com/docs/installation/)'
26
+ else
27
+ # same output as `xanthus help`
28
+ puts 'xanthus version | return Xanthus version number.'
29
+ puts 'xanthus dependencies | installation instructions for system dependencies.'
30
+ puts 'xanthus init <project name> | initialize a new project.'
31
+ puts 'xanthus run [xanthus file] | run in the current folder. If not specified, we will try to run .xanthus .'
32
+ end
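The `help` branch and the fallback branch above print the same four lines. A minimal sketch of one way to keep that text in a single place follows; `HELP_TEXT` and `dispatch` are hypothetical names, and the sketch deliberately does not load the `xanthus` gem but only reports which action would be taken.

```ruby
# Hypothetical restructuring of the dispatcher above. It returns a small
# array describing the chosen action instead of performing it.
HELP_TEXT = <<~HELP
  xanthus version | return Xanthus version number.
  xanthus dependencies | installation instructions for system dependencies.
  xanthus init <project name> | initialize a new project.
  xanthus run [xanthus file] | run in the current folder. If not specified, we will try to run .xanthus .
HELP

def dispatch(argv)
  instruction, param1 = argv
  case instruction
  when "version"      then [:version]
  when "init"         then param1.nil? ? [:help] : [:init, param1]
  when "run"          then [:run, param1 || ".xanthus"]
  when "dependencies" then [:dependencies]
  else                     [:help] # covers "help" and anything unrecognized
  end
end
```

As in the original script, `init` without a project name falls through to the help output, and `run` without an argument defaults to `.xanthus`.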
@@ -0,0 +1,16 @@
1
+ def os_family
2
+ case RUBY_PLATFORM
3
+ when /ix/i, /ux/i, /gnu/i,
4
+ /sysv/i, /solaris/i,
5
+ /sunos/i, /bsd/i, /darwin/i # "darwin" must match here, before /win/i below
6
+ "unix"
7
+ when /win/i, /ming/i
8
+ "windows"
9
+ else
10
+ "others"
11
+ end
12
+ end
13
+
14
+ def sys_script_ext
15
+ os_family == 'unix' ? 'sh' : 'cmd'
16
+ end
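Because the string `darwin` contains `win`, the order and specificity of the patterns matter on macOS hosts. The sketch below shows one way to classify platform strings with narrower patterns; `os_family_for` is a hypothetical variant that takes the platform string as an argument so it can be exercised without touching `RUBY_PLATFORM`, and the pattern list is an assumption rather than an exhaustive one.

```ruby
# Hypothetical variant of os_family. The Windows patterns are anchored on
# "mswin" and "mingw" so that "darwin" can never match them.
def os_family_for(platform)
  case platform
  when /darwin/i, /linux/i, /bsd/i, /solaris/i, /sunos/i, /aix/i
    "unix"
  when /mswin/i, /mingw/i
    "windows"
  else
    "others"
  end
end

# Script extension for the detected family; "cmd" is used only on Windows.
def script_ext_for(platform)
  os_family_for(platform) == "windows" ? "cmd" : "sh"
end
```

Calling `os_family_for(RUBY_PLATFORM)` would give the classification for the running interpreter. Unlike `sys_script_ext`, this sketch falls back to `sh` for unrecognized platforms, which is a design choice and not something the original code does.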
@@ -0,0 +1,15 @@
1
+ require "xanthus/version"
2
+ require "xanthus/init"
3
+ require "xanthus/script"
4
+ require "xanthus/virtual_machine"
5
+ require "xanthus/job"
6
+ require "xanthus/default"
7
+ require "xanthus/repository"
8
+ require "xanthus/github"
9
+ require "xanthus/dataverse"
10
+ require "xanthus/configuration"
11
+
12
+ module Xanthus
13
+ class Error < StandardError; end
14
+ # Your code goes here...
15
+ end
@@ -0,0 +1,94 @@
1
+ module Xanthus
2
+ class Configuration
3
+ attr_accessor :name
4
+ attr_accessor :authors
5
+ attr_accessor :affiliation
6
+ attr_accessor :email
7
+ attr_accessor :description
8
+ attr_accessor :seed
9
+ attr_accessor :params
10
+ attr_accessor :vms
11
+ attr_accessor :scripts
12
+ attr_accessor :jobs
13
+ attr_accessor :github_conf
14
+ attr_accessor :dataverse_conf
15
+
16
+ def initialize
17
+ @params = Hash.new
18
+ @vms = Hash.new
19
+ @scripts = Hash.new
20
+ @jobs = Hash.new
21
+ end
22
+
23
+ def vm name
24
+ vm = VirtualMachine.new
25
+ yield(vm)
26
+ vm.name = name
27
+ @vms[name] = vm
28
+ end
29
+
30
+ def script name
31
+ @scripts[name] = yield
32
+ end
33
+
34
+ def job name
35
+ v = Job.new
36
+ yield(v)
37
+ v.name = name
38
+ @jobs[name] = v
39
+ end
40
+
41
+ def github
42
+ github = GitHub.new
43
+ yield(github)
44
+ @github_conf = github
45
+ end
46
+
47
+ def dataverse
48
+ dataverse = Dataverse.new
49
+ yield(dataverse)
50
+ @dataverse_conf = dataverse
51
+ end
52
+
53
+ def to_readme_md
54
+ %Q{
55
+ # #{@name}
56
+
57
+ authors: #{@authors}
58
+ affiliation: #{@affiliation}
59
+ email: #{@email}
60
+
61
+ seed: #{@seed}
62
+
63
+ ## Description
64
+
65
+ #{@description}
66
+ }
67
+ end
68
+ end
69
+
70
+ def self.configure
71
+ config = Configuration.new
72
+ yield(config)
73
+ puts "Running experiment #{config.name} with seed #{config.seed}."
74
+ srand config.seed
75
+ config.vms.each do |k, v|
76
+ v.generate_box config
77
+ end
78
+
79
+ # initializing storage backends
80
+ config.github_conf.init(config) unless config.github_conf.nil?
81
+ config.dataverse_conf.init(config) unless config.dataverse_conf.nil?
82
+
83
+ # executing jobs
84
+ config.jobs.each do |name,job|
85
+ job.iterations.times do |i|
86
+ job.execute config, i
87
+ end
88
+ end
89
+
90
+ # finalizing storage backends
91
+ config.github_conf.tag unless config.github_conf.nil?
92
+ config.github_conf.clean unless config.github_conf.nil?
93
+ end
94
+ end
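The configuration code above combines two ideas: a block-based DSL in which each `vm`, `job`, or backend is yielded to the caller for setup, and a global `srand` call so that randomized steps repeat across runs. The following self-contained miniature illustrates both; `MiniJob`, `MiniConfig`, and `mini_configure` are hypothetical stand-ins and not Xanthus classes, and the random draw merely stands in for whatever randomized work a real job would do.

```ruby
# Hypothetical stand-in for a job: a name and an iteration count.
MiniJob = Struct.new(:name, :iterations)

class MiniConfig
  attr_accessor :seed
  attr_reader :jobs

  def initialize
    @jobs = {}
  end

  # Yields a fresh job to the caller's block, then stores it under its name.
  def job(name)
    j = MiniJob.new(nil, 1)
    yield(j)
    j.name = name # set after the block so the block cannot override it
    @jobs[name] = j
  end
end

# Builds a configuration from the block, seeds the global PRNG, and returns
# one [job name, iteration index, random draw] triple per iteration.
def mini_configure
  config = MiniConfig.new
  yield(config)
  srand(config.seed) # same seed, same sequence from Kernel#rand
  config.jobs.flat_map do |name, job|
    job.iterations.times.map { |i| [name, i, rand(1000)] }
  end
end
```

A call such as `mini_configure { |c| c.seed = 42; c.job(:attack) { |j| j.iterations = 3 } }` returns three triples, and repeating the call with the same seed returns the same triples.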