midas-edge 0.2.1 → 0.2.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/NOTICE.txt +1 -0
- data/README.md +1 -1
- data/ext/midas/ext.cpp +2 -4
- data/lib/midas/version.rb +1 -1
- data/vendor/MIDAS/README.md +87 -32
- data/vendor/MIDAS/src/{EdgeHash.hpp → CountMinSketch.hpp} +12 -11
- data/vendor/MIDAS/src/FilteringCore.hpp +29 -25
- data/vendor/MIDAS/src/NormalCore.hpp +6 -6
- data/vendor/MIDAS/src/RelationalCore.hpp +10 -11
- metadata +3 -4
- data/vendor/MIDAS/src/NodeHash.hpp +0 -104
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 591fe96dce393c91a49953aa80ed776b2d794528fb30123beee27acc91e91554
+data.tar.gz: f1105b066ad919ecc1fcf0ebebe88cfab174d31a8884ad6e6335260e476393a1
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: ecf9b5807d75b4d158d7ec5519ec62b1bcf13cf1d735ce8d6b72e10085712766deda93ec11a54b1a5afae32d7deb8ce80758ca8d31ec70f96ac97b61f07c3054
+data.tar.gz: 972779bb0c42189302ad404d31452eff33b6f15b1fdfe51cabb4db2b3d00f5178475dd77a4fa3f9f06a4d2b30cb3c0c04cef36e66f4eb2d8a3ad264fed2009c8
data/CHANGELOG.md
CHANGED
data/NOTICE.txt
CHANGED
data/README.md
CHANGED
@@ -2,7 +2,7 @@
 
 [MIDAS](https://github.com/bhatiasiddharth/MIDAS) - edge stream anomaly detection - for Ruby
 
-[![Build Status](https://travis-ci.org/ankane/midas.svg?branch=master)](https://travis-ci.org/ankane/midas)
+[![Build Status](https://travis-ci.org/ankane/midas.svg?branch=master)](https://travis-ci.org/ankane/midas) [![Build status](https://ci.appveyor.com/api/projects/status/klmqg9pmd3ndo0j5/branch/master?svg=true)](https://ci.appveyor.com/project/ankane/midas/branch/master)
 
 ## Installation
 
data/ext/midas/ext.cpp
CHANGED
@@ -1,7 +1,3 @@
-// stdlib
-#include <iostream>
-#include <vector>
-
 // midas
 #include <FilteringCore.hpp>
 #include <NormalCore.hpp>
@@ -11,6 +7,8 @@
 #include <rice/Module.hpp>
 #include <rice/String.hpp>
 
+using std::vector;
+
 using Rice::Module;
 using Rice::String;
 using Rice::define_module;
data/lib/midas/version.rb
CHANGED
data/vendor/MIDAS/README.md
CHANGED
@@ -4,15 +4,15 @@
 <a href="https://aaai.org/Conferences/AAAI-20/">
 <img src="http://img.shields.io/badge/AAAI-2020-red.svg">
 </a>
-<a href="https://
+<a href="https://arxiv.org/pdf/2009.08452.pdf"><img src="http://img.shields.io/badge/Paper-PDF-brightgreen.svg"></a>
 <a href="https://www.comp.nus.edu.sg/~sbhatia/assets/pdf/midasslides.pdf">
 <img src="http://img.shields.io/badge/Slides-PDF-ff9e18.svg">
 </a>
 <a href="https://youtu.be/Bd4PyLCHrto">
 <img src="http://img.shields.io/badge/Talk-Youtube-ff69b4.svg">
 </a>
-<a href="https://www.
-<img src="https://img.shields.io/badge/
+<a href="https://www.youtube.com/watch?v=DPmN-uPW8qU">
+<img src="https://img.shields.io/badge/Overview-Youtube-orange.svg">
 </a>
 <a href="https://github.com/bhatiasiddharth/MIDAS/blob/master/LICENSE">
 <img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg">
@@ -21,8 +21,8 @@
 
 C++ implementation of
 
-- Real-time Streaming Anomaly Detection in Dynamic Graphs. *Siddharth Bhatia, Rui Liu, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos*. (Under Review)
-- [MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams](
+- [Real-time Streaming Anomaly Detection in Dynamic Graphs](https://arxiv.org/pdf/2009.08452.pdf). *Siddharth Bhatia, Rui Liu, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos*. (Under Review)
+- [MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams](https://arxiv.org/pdf/1911.04464.pdf). *Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos*. AAAI 2020.
 
 The old implementation is in another branch `OldImplementation`, it should be considered as being archived and will hardly receive feature updates.
 
@@ -30,13 +30,20 @@ The old implementation is in another branch `OldImplementation`, it should be co
 
 ## Table of Contents
 
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+
+
 - [Features](#features)
 - [Demo](#demo)
 - [Customization](#customization)
-- [
-- [
+- [Other Files](#other-files)
+- [In Other Languages](#in-other-languages)
+- [Online Coverage](#online-coverage)
 - [Citation](#citation)
 
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
 ## Features
 
 - Finds Anomalies in Dynamic/Time-Evolving Graph: (Intrusion Detection, Fake Ratings, Financial Fraud)
@@ -45,7 +52,7 @@ The old implementation is in another branch `OldImplementation`, it should be co
 - Constant Memory (independent of graph size)
 - Constant Update Time (real-time anomaly detection to minimize harm)
 - Up to 55% more accurate and 929 times faster than the state of the art approaches
--
+- Experiments are performed using the following datasets:
 - [DARPA](https://www.ll.mit.edu/r-d/datasets/1998-darpa-intrusion-detection-evaluation-dataset)
 - [TwitterWorldCup2014](http://odds.cs.stonybrook.edu/twitterworldcup2014-dataset)
 - [TwitterSecurity](http://odds.cs.stonybrook.edu/twittersecurity-dataset)
@@ -56,31 +63,31 @@ If you use Windows:
 
 1. Open a Visual Studio developer command prompt, we want their toolchain
 1. `cd` to the project root `MIDAS/`
-1. `cmake -DCMAKE_BUILD_TYPE=Release -
+1. `cmake -DCMAKE_BUILD_TYPE=Release -GNinja -S . -B build/release`
 1. `cmake --build build/release --target Demo`
-1. `cd` to `MIDAS/build/release
+1. `cd` to `MIDAS/build/release/`
 1. `.\Demo.exe`
 
-If you use Linux/macOS
+If you use Linux/macOS:
 
 1. Open a terminal
 1. `cd` to the project root `MIDAS/`
 1. `cmake -DCMAKE_BUILD_TYPE=Release -S . -B build/release`
 1. `cmake --build build/release --target Demo`
-1. `cd` to `MIDAS/build/release
+1. `cd` to `MIDAS/build/release/`
 1. `./Demo`
 
-The demo runs on `MIDAS/data/DARPA/darpa_processed.csv`, which has 4.5M records, with the filtering core.
+The demo runs on `MIDAS/data/DARPA/darpa_processed.csv`, which has 4.5M records, with the filtering core (MIDAS-F).
 
 The scores will be exported to `MIDAS/temp/Score.txt`, higher means more anomalous.
 
-All file paths are absolute and "hardcoded" by CMake, but it's suggested NOT to run by double
+All file paths are absolute and "hardcoded" by CMake, but it's suggested NOT to run by double clicking on the executable file.
 
 ## Customization
 
 ### Switch Cores
 
-Cores are instantiated at `MIDAS/example/Demo.cpp:
+Cores are instantiated at `MIDAS/example/Demo.cpp:67-69`, uncomment the chosen one.
 
 ### Custom Dataset + `Demo.cpp`
 
@@ -89,48 +96,96 @@ You need to prepare three files:
 - Meta file
 - Only includes an integer `N`, the number of records in the dataset
 - Use its path for `pathMeta`
+- E.g. `MIDAS/data/DARPA/darpa_shape.txt`
 - Data file
 - A header-less csv format file of shape `[N,3]`
 - Columns are sources, destinations, timestamps
 - Use its path for `pathData`
+- E.g. `MIDAS/data/DARPA/darpa_processed.csv`
 - Label file
 - A header-less csv format file of shape `[N,1]`
 - The corresponding label for data records
 - 0 means normal record
 - 1 means anomalous record
-- Use its path for `pathGroundTruth`
+- Use its path for `pathGroundTruth`
+- E.g. `MIDAS/data/DARPA/darpa_ground_truth.csv`
 
 ### Custom Dataset + Custom Runner
 
-1. Include the header `MIDAS/
+1. Include the header `MIDAS/src/NormalCore.hpp`, `MIDAS/src/RelationalCore.hpp` or `MIDAS/src/FilteringCore.hpp`
 1. Instantiate cores with required parameters
-1. Call `operator()` on individual data records, it returns the anomaly score for the input record
+1. Call `operator()` on individual data records, it returns the anomaly score for the input record
+
+## Other Files
+
+### `example/`
+
+#### `Experiment.cpp`
+
+The code we used for experiments.
+It will try to use Intel TBB or OpenMP for parallelization.
+You should comment all but only one runner function call in the `main()` as most results are exported to `MIDAS/temp/Experiiment.csv` together with many intermediate files.
+
+#### `Reproducible.cpp`
+
+Similar to `Demo.cpp`, but with all random parameters hardcoded and always produce the same result.
+It's for other developers and us to test if the implementation in other languages can produce acceptable results.
+
+### `util/`
 
-
+`DeleteTempFile.py`, `EvaluateScore.py` and `ReproduceROC.py` will show their usage and a short description when executed without any argument.
 
-
-2. Towards Data Science: [Controlling Fake News using Graphs and Statistics](https://towardsdatascience.com/controlling-fake-news-using-graphs-and-statistics-31ed116a986f)
-2. Towards Data Science: [Anomaly detection in dynamic graphs using MIDAS](https://towardsdatascience.com/anomaly-detection-in-dynamic-graphs-using-midas-e4f8d0b1db45)
-4. Towards AI: [Anomaly Detection with MIDAS](https://medium.com/towards-artificial-intelligence/anomaly-detection-with-midas-2735a2e6dce8)
-5. [AIhub Interview](https://aihub.org/2020/05/01/interview-with-siddharth-bhatia-a-new-approach-for-anomaly-detection/)
+#### `PreprocessData.py`
 
-
+The code to process the raw dataset into an easy-to-read format.
+Datasets are always assumed to be in a folder in `MIDAS/data/`.
+It can process the following dataset(s)
 
-
-
-
-
-
+- `DARPA/darpa_original.csv` -> `DARPA/darpa_processed.csv`, `DARPA/darpa_ground_truth.csv`, `DARPA/darpa_shape.txt`
+
+## In Other Languages
+
+1. Python: [Rui Liu's MIDAS.Python](https://github.com/liurui39660/MIDAS.Python), [Ritesh Kumar's pyMIDAS](https://github.com/ritesh99rakesh/pyMIDAS)
+1. Golang: [Steve Tan's midas](https://github.com/steve0hh/midas)
+1. Ruby: [Andrew Kane's midas](https://github.com/ankane/midas)
+1. Rust: [Scott Steele's midas_rs](https://github.com/scooter-dangle/midas_rs)
+1. R: [Tobias Heidler's MIDASwrappeR](https://github.com/pteridin/MIDASwrappeR)
+1. Java: [Joshua Tokle's MIDAS-Java](https://github.com/jotok/MIDAS-Java)
+1. Julia: [Ashrya Agrawal's MIDAS.jl](https://github.com/ashryaagr/MIDAS.jl)
+
+## Online Coverage
+
+1. [ACM TechNews](https://technews.acm.org/archives.cfm?fo=2020-05-may/may-06-2020.html)
+1. [AIhub](https://aihub.org/2020/05/01/interview-with-siddharth-bhatia-a-new-approach-for-anomaly-detection/)
+1. [Hacker News](https://news.ycombinator.com/item?id=22802604)
+1. [KDnuggets](https://www.kdnuggets.com/2020/04/midas-new-baseline-anomaly-detection-graphs.html)
+1. [Microsoft](https://techcommunity.microsoft.com/t5/azure-sentinel/announcing-the-azure-sentinel-hackathon-winners/ba-p/1548240)
+1. [Towards Data Science](https://towardsdatascience.com/controlling-fake-news-using-graphs-and-statistics-31ed116a986f)
 
 ## Citation
 
-If you use this code for your research, please consider citing our
+If you use this code for your research, please consider citing our arXiv preprint
+
+```bibtex
+@misc{bhatia2020realtime,
+title={Real-Time Streaming Anomaly Detection in Dynamic Graphs},
+author={Siddharth Bhatia and Rui Liu and Bryan Hooi and Minji Yoon and Kijung Shin and Christos Faloutsos},
+year={2020},
+eprint={2009.08452},
+archivePrefix={arXiv},
+primaryClass={cs.LG}
+}
 
 ```
+
+or our AAAI paper
+
+
+```bibtex
 @inproceedings{bhatia2020midas,
 title="MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams",
 author="Siddharth {Bhatia} and Bryan {Hooi} and Minji {Yoon} and Kijung {Shin} and Christos {Faloutsos}",
 booktitle="AAAI 2020 : The Thirty-Fourth AAAI Conference on Artificial Intelligence",
 year="2020"
 }
-```
+```
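The three input files described in the vendored README hunk above have a simple layout: a meta file holding `N`, a header-less `[N,3]` CSV of source, destination and timestamp, and a header-less `[N,1]` CSV of 0/1 labels. A minimal C++ sketch of a loader for them, assuming plain comma separators with no quoting; `Record`, `ReadMeta`, `ReadData` and `ReadLabels` are hypothetical names, not part of MIDAS, whose own reader in `Demo.cpp` is not shown in this diff:

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One row of the data file (hypothetical type).
struct Record {
	int source;
	int destination;
	int timestamp;
};

// Meta file: a single integer N, the number of records.
int ReadMeta(std::istream& in) {
	int n = 0;
	if (!(in >> n) || n < 0)
		throw std::runtime_error("meta file must contain a non-negative integer");
	return n;
}

// Data file: one "source,destination,timestamp" per line, no header.
std::vector<Record> ReadData(std::istream& in, int n) {
	std::vector<Record> records;
	std::string line;
	while (std::getline(in, line)) {
		if (line.empty()) continue;  // tolerate a trailing blank line
		std::istringstream row(line);
		Record r{};
		char comma1 = 0, comma2 = 0;
		row >> r.source >> comma1 >> r.destination >> comma2 >> r.timestamp;
		if (!row || comma1 != ',' || comma2 != ',')
			throw std::runtime_error("malformed data line: " + line);
		records.push_back(r);
	}
	if (static_cast<int>(records.size()) != n)
		throw std::runtime_error("record count does not match the meta file");
	return records;
}

// Label file: one 0 (normal) or 1 (anomalous) per line, no header.
std::vector<int> ReadLabels(std::istream& in, int n) {
	std::vector<int> labels;
	int label = 0;
	while (in >> label) {
		if (label != 0 && label != 1)
			throw std::runtime_error("labels must be 0 or 1");
		labels.push_back(label);
	}
	if (static_cast<int>(labels.size()) != n)
		throw std::runtime_error("label count does not match the meta file");
	return labels;
}
```

Each loaded `Record` can then be passed to a core's `operator()(source, destination, timestamp)` one at a time, as step 3 of the custom-runner instructions describes; the labels are only needed for evaluating the resulting scores.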
data/vendor/MIDAS/src/{EdgeHash.hpp → CountMinSketch.hpp}
CHANGED
@@ -19,7 +19,7 @@
 #include <algorithm>
 
 namespace MIDAS {
-struct
+struct CountMinSketch {
 // Fields
 // --------------------------------------------------------------------------------
 
@@ -33,10 +33,10 @@ struct EdgeHash {
 // Methods
 // --------------------------------------------------------------------------------
 
-
-
+CountMinSketch() = delete;
+CountMinSketch& operator=(const CountMinSketch& b) = delete;
 
-
+CountMinSketch(int numRow, int numColumn):
 r(numRow),
 c(numColumn),
 lenData(r * c),
@@ -50,7 +50,7 @@ struct EdgeHash {
 std::fill(data, data + lenData, 0);
 }
 
-
+CountMinSketch(const CountMinSketch& b):
 r(b.r),
 c(b.c),
 lenData(b.lenData),
@@ -62,7 +62,7 @@ struct EdgeHash {
 std::copy(b.data, b.data + lenData, data);
 }
 
-~
+~CountMinSketch() {
 delete[] param1;
 delete[] param2;
 delete[] data;
@@ -73,10 +73,11 @@ struct EdgeHash {
 }
 
 void MultiplyAll(float by) const {
-
+for (int i = 0, I = lenData; i < I; i++) // Vectorization
+data[i] *= by;
 }
 
-void Hash(int
+void Hash(int* indexOut, int a, int b = 0) const {
 for (int i = 0; i < r; i++) {
 indexOut[i] = ((a + m * b) * param1[i] + param2[i]) % c;
 indexOut[i] += i * c + (indexOut[i] < 0 ? c : 0);
@@ -90,10 +91,10 @@ struct EdgeHash {
 return least;
 }
 
-float Assign(const int* index, float
+float Assign(const int* index, float with) const {
 for (int i = 0; i < r; i++)
-data[index[i]] =
-return
+data[index[i]] = with;
+return with;
 }
 
 void Add(const int* index, float by = 1) const {
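After the rename, `CountMinSketch::Hash(int* indexOut, int a, int b = 0)` takes the output array first so that `b` can default to 0: the same type hashes an edge `(a, b)` or a single node `a`. The multiplier `m` it uses is declared in the fields section, which this diff does not show. A minimal standalone sketch of the same count-min idea, assuming a hypothetical fixed multiplier `kM`, a seeded `std::mt19937` instead of `rand()`, and 64-bit intermediate arithmetic; `Sketch` is a hypothetical name, not the vendored class:

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Count-min sketch with r rows of c buckets, stored row-major in one array.
class Sketch {
public:
	Sketch(int numRow, int numColumn, unsigned seed = 42)
		: r(numRow), c(numColumn), param1(numRow), param2(numRow),
		  data(static_cast<std::size_t>(numRow) * numColumn, 0.f) {
		std::mt19937 gen(seed);
		std::uniform_int_distribution<int> dist(0, 1 << 15);
		for (int i = 0; i < r; i++) {
			param1[i] = dist(gen) + 1;  // never 0, otherwise a row maps every key to one bucket
			param2[i] = dist(gen);
		}
	}

	// One bucket index per row; leaving b at 0 hashes a single node id.
	void Hash(int* indexOut, int a, int b = 0) const {
		for (int i = 0; i < r; i++) {
			// 64-bit arithmetic avoids signed overflow for large ids
			long long h = ((a + kM * static_cast<long long>(b)) * param1[i] + param2[i]) % c;
			if (h < 0) h += c;  // C++ % keeps the sign of the dividend
			indexOut[i] = i * c + static_cast<int>(h);  // row i owns buckets [i*c, (i+1)*c)
		}
	}

	void Add(const int* index, float by = 1) {
		for (int i = 0; i < r; i++) data[index[i]] += by;
	}

	// Point query: the minimum over rows, which never underestimates the true count.
	float operator()(const int* index) const {
		float least = std::numeric_limits<float>::infinity();
		for (int i = 0; i < r; i++) least = std::min(least, data[index[i]]);
		return least;
	}

private:
	static constexpr int kM = 769;  // hypothetical; the vendored m is not shown in this diff
	int r, c;
	std::vector<int> param1, param2;
	std::vector<float> data;
};
```

With `b = 0` the bucket formula reduces to `(a * param1[i] + param2[i]) % c`, which is what the deleted `NodeHash::Hash` computed, presumably why the separate node type could be dropped.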
data/vendor/MIDAS/src/FilteringCore.hpp
CHANGED
@@ -19,24 +19,27 @@
 #include <cmath>
 #include <algorithm>
 
-#include "
-#include "NodeHash.hpp"
+#include "CountMinSketch.hpp"
 
 namespace MIDAS {
 struct FilteringCore {
 const float threshold;
-int
+int timestamp = 1;
 const float factor;
-int
+const int lenData;
+int* const indexEdge; // Pre-compute the index to-be-modified, thanks to the Same-Layout Assumption
 int* const indexSource;
 int* const indexDestination;
-
-
-
+CountMinSketch numCurrentEdge, numTotalEdge, scoreEdge;
+CountMinSketch numCurrentSource, numTotalSource, scoreSource;
+CountMinSketch numCurrentDestination, numTotalDestination, scoreDestination;
+float timestampReciprocal = 0;
+bool* const shouldMerge;
 
 FilteringCore(int numRow, int numColumn, float threshold, float factor = 0.5):
 threshold(threshold),
 factor(factor),
+lenData(numRow * numColumn), // I assume all CMSs have same size, but Same-Layout Assumption is not that strict
 indexEdge(new int[numRow]),
 indexSource(new int[numRow]),
 indexDestination(new int[numRow]),
@@ -48,42 +51,43 @@ struct FilteringCore {
 scoreSource(numCurrentSource),
 numCurrentDestination(numRow, numColumn),
 numTotalDestination(numCurrentDestination),
-scoreDestination(numCurrentDestination)
+scoreDestination(numCurrentDestination),
+shouldMerge(new bool[numRow * numColumn]) { }
 
 virtual ~FilteringCore() {
 delete[] indexEdge;
 delete[] indexSource;
 delete[] indexDestination;
+delete[] shouldMerge;
 }
 
 static float ComputeScore(float a, float s, float t) {
 return s == 0 ? 0 : pow(a + s - a * t, 2) / (s * (t - 1)); // If t == 1, then s == 0, so no need to check twice
 }
 
+inline void ConditionalMerge(const float* current, float* total, const float* score) const {
+for (int i = 0; i < lenData; i++)
+shouldMerge[i] = score[i] < threshold;
+for (int i = 0, I = lenData; i < I; i++) // Vectorization
+total[i] += shouldMerge[i] * current[i] + (true - shouldMerge[i]) * total[i] * timestampReciprocal;
+}
+
 float operator()(int source, int destination, int timestamp) {
-if (timestamp
-
-
-
-numTotalEdge.data[i] / (timestampCurrent - 1) : 0;
-for (int i = 0; i < numCurrentSource.lenData; i++)
-numTotalSource.data[i] += scoreSource.data[i] < threshold ?
-numCurrentSource.data[i] : timestampCurrent - 1 ?
-numTotalSource.data[i] / (timestampCurrent - 1) : 0;
-for (int i = 0; i < numCurrentDestination.lenData; i++)
-numTotalDestination.data[i] += scoreDestination.data[i] < threshold ?
-numCurrentDestination.data[i] : timestampCurrent - 1 ?
-numTotalDestination.data[i] / (timestampCurrent - 1) : 0;
+if (this->timestamp < timestamp) {
+ConditionalMerge(numCurrentEdge.data, numTotalEdge.data, scoreEdge.data);
+ConditionalMerge(numCurrentSource.data, numTotalSource.data, scoreSource.data);
+ConditionalMerge(numCurrentDestination.data, numTotalDestination.data, scoreDestination.data);
 numCurrentEdge.MultiplyAll(factor);
 numCurrentSource.MultiplyAll(factor);
 numCurrentDestination.MultiplyAll(factor);
-
+timestampReciprocal = 1.f / (timestamp - 1); // So I can skip an if-statement
+this->timestamp = timestamp;
 }
-numCurrentEdge.Hash(source, destination
+numCurrentEdge.Hash(indexEdge, source, destination);
 numCurrentEdge.Add(indexEdge);
-numCurrentSource.Hash(
+numCurrentSource.Hash(indexSource, source);
 numCurrentSource.Add(indexSource);
-numCurrentDestination.Hash(
+numCurrentDestination.Hash(indexDestination, destination);
 numCurrentDestination.Add(indexDestination);
 return std::max({
 scoreEdge.Assign(indexEdge, ComputeScore(numCurrentEdge(indexEdge), numTotalEdge(indexEdge), timestamp)),
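The new `FilteringCore::ConditionalMerge` replaces the removed per-element ternary with arithmetic on a 0/1 mask, so its second loop has no branch to block vectorization, and it precomputes `timestampReciprocal` so the `timestampCurrent - 1 ? ... : 0` guard disappears. Because the reciprocal starts at 0 and is only set after the first merge, the first tick change still adds nothing for rejected buckets, as the old guard did. A standalone sketch comparing the two forms on plain vectors; `MergeBranchy`, `MergeBranchless` and `previousTimestamp` are hypothetical names, and the mask uses `char` where the vendored code uses `bool*`:

```cpp
#include <cstddef>
#include <vector>

// Removed form: one branch per element, with a guard against dividing by zero on the first tick.
void MergeBranchy(const std::vector<float>& current, std::vector<float>& total,
                  const std::vector<float>& score, float threshold, int previousTimestamp) {
	for (std::size_t i = 0; i < total.size(); i++)
		total[i] += score[i] < threshold
			? current[i]  // below threshold: merge this tick's count
			: (previousTimestamp - 1 ? total[i] / (previousTimestamp - 1) : 0.f);  // otherwise add the running mean
}

// New form: build a 0/1 mask, then combine both outcomes arithmetically.
// timestampReciprocal is 1 / (previousTimestamp - 1), or 0 before the first tick change.
void MergeBranchless(const std::vector<float>& current, std::vector<float>& total,
                     const std::vector<float>& score, float threshold, float timestampReciprocal) {
	std::vector<char> shouldMerge(total.size());  // char rather than the bit-packed std::vector<bool>
	for (std::size_t i = 0; i < total.size(); i++)
		shouldMerge[i] = score[i] < threshold;
	for (std::size_t i = 0; i < total.size(); i++)
		total[i] += shouldMerge[i] * current[i] + (1 - shouldMerge[i]) * total[i] * timestampReciprocal;
}
```

The two agree as long as the totals are finite (a masked-out `inf` would give `0 * inf`, which is NaN), and multiplying by a precomputed reciprocal can differ from the old division in the last bit of a float.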
data/vendor/MIDAS/src/NormalCore.hpp
CHANGED
@@ -18,13 +18,13 @@
 
 #include <cmath>
 
-#include "
+#include "CountMinSketch.hpp"
 
 namespace MIDAS {
 struct NormalCore {
-int
+int timestamp = 1;
 int* const index; // Pre-compute the index to-be-modified, thanks to the same structure of CMSs
-
+CountMinSketch numCurrent, numTotal;
 
 NormalCore(int numRow, int numColumn):
 index(new int[numRow]),
@@ -40,11 +40,11 @@ struct NormalCore {
 }
 
 float operator()(int source, int destination, int timestamp) {
-if (timestamp
+if (this->timestamp < timestamp) {
 numCurrent.ClearAll();
-
+this->timestamp = timestamp;
 }
-numCurrent.Hash(source, destination
+numCurrent.Hash(index, source, destination);
 numCurrent.Add(index);
 numTotal.Add(index);
 return ComputeScore(numCurrent(index), numTotal(index), timestamp);
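`NormalCore::operator()` keeps the same flow across this release: on a larger timestamp it calls `ClearAll()` on the current-tick sketch, then adds the record to both the current and the total sketch and scores it with `ComputeScore`. What changes is that the stored tick is now the member `timestamp`, initialized to 1 and read as `this->timestamp` because the parameter has the same name. A simplified sketch of that flow, swapping the count-min sketches for exact `std::unordered_map` counts; the score is the chi-squared formula from the MIDAS paper, (a - s/t)^2 * t^2 / (s * (t - 1)), with an assumed guard returning 0 at t = 1, since NormalCore's own `ComputeScore` body is not part of this diff. `ExactNormalScorer` is a hypothetical name:

```cpp
#include <cstdint>
#include <unordered_map>

// NormalCore-style scorer with exact counts instead of count-min sketches (simplification).
class ExactNormalScorer {
public:
	float operator()(int source, int destination, int timestamp) {
		if (this->timestamp < timestamp) {  // a new tick: drop the previous tick's counts
			current.clear();
			this->timestamp = timestamp;
		}
		const std::uint64_t key =
			(static_cast<std::uint64_t>(static_cast<std::uint32_t>(source)) << 32) |
			static_cast<std::uint32_t>(destination);
		const float a = static_cast<float>(++current[key]);  // occurrences in the current tick
		const float s = static_cast<float>(++total[key]);    // occurrences over all ticks so far
		return Score(a, s, static_cast<float>(timestamp));
	}

private:
	// (a - s/t)^2 * t^2 / (s * (t - 1)); 0 when there is no earlier tick (assumed guard).
	static float Score(float a, float s, float t) {
		if (s == 0 || t <= 1) return 0;
		const float d = a - s / t;
		return d * d * t * t / (s * (t - 1));
	}

	int timestamp = 1;
	std::unordered_map<std::uint64_t, int> current, total;
};
```

A burst of one edge inside a single tick makes `a` grow faster than its running mean `s / t`, which is what drives the score up.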
data/vendor/MIDAS/src/RelationalCore.hpp
CHANGED
@@ -19,19 +19,18 @@
 #include <cmath>
 #include <algorithm>
 
-#include "
-#include "NodeHash.hpp"
+#include "CountMinSketch.hpp"
 
 namespace MIDAS {
 struct RelationalCore {
-int
+int timestamp = 1;
 const float factor;
 int* const indexEdge; // Pre-compute the index to-be-modified, thanks to the same structure of CMSs
 int* const indexSource;
 int* const indexDestination;
-
-
-
+CountMinSketch numCurrentEdge, numTotalEdge;
+CountMinSketch numCurrentSource, numTotalSource;
+CountMinSketch numCurrentDestination, numTotalDestination;
 
 RelationalCore(int numRow, int numColumn, float factor = 0.5):
 factor(factor),
@@ -56,19 +55,19 @@ struct RelationalCore {
 }
 
 float operator()(int source, int destination, int timestamp) {
-if (timestamp
+if (this->timestamp < timestamp) {
 numCurrentEdge.MultiplyAll(factor);
 numCurrentSource.MultiplyAll(factor);
 numCurrentDestination.MultiplyAll(factor);
-
+this->timestamp = timestamp;
 }
-numCurrentEdge.Hash(source, destination
+numCurrentEdge.Hash(indexEdge, source, destination);
 numCurrentEdge.Add(indexEdge);
 numTotalEdge.Add(indexEdge);
-numCurrentSource.Hash(
+numCurrentSource.Hash(indexSource, source);
 numCurrentSource.Add(indexSource);
 numTotalSource.Add(indexSource);
-numCurrentDestination.Hash(
+numCurrentDestination.Hash(indexDestination, destination);
 numCurrentDestination.Add(indexDestination);
 numTotalDestination.Add(indexDestination);
 return std::max({
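Unlike `NormalCore`, which calls `ClearAll()`, `RelationalCore` (like `FilteringCore`) decays its current-tick sketches: when a larger timestamp arrives it calls `MultiplyAll(factor)` once on each of them, with `factor` defaulting to 0.5. Because the check is `this->timestamp < timestamp`, a jump of several ticks still applies the factor only once. A small sketch of that update rule on a plain array; `DecayingCounts`, `Advance` and `Get` are hypothetical names, not MIDAS code:

```cpp
#include <vector>

// Current-tick counts that fade by `factor` whenever the timestamp increases.
class DecayingCounts {
public:
	explicit DecayingCounts(int size, float factor = 0.5f) : factor(factor), data(size, 0.f) {}

	// Mirrors the tick check in RelationalCore::operator(): decay once per timestamp increase.
	void Advance(int timestamp) {
		if (this->timestamp < timestamp) {
			for (float& x : data) x *= factor;
			this->timestamp = timestamp;
		}
	}

	void Add(int index, float by = 1) { data[index] += by; }
	float Get(int index) const { return data[index]; }

private:
	int timestamp = 1;
	float factor;
	std::vector<float> data;
};
```

The decayed counts keep a geometrically fading memory of recent ticks instead of starting from zero, so a burst that straddles a tick boundary is not cut in half by the reset.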
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: midas-edge
 version: !ruby/object:Gem::Version
-version: 0.2.
+version: 0.2.2
 platform: ruby
 authors:
 - Andrew Kane
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-
+date: 2020-09-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: rice
@@ -112,9 +112,8 @@ files:
 - lib/midas/version.rb
 - vendor/MIDAS/LICENSE
 - vendor/MIDAS/README.md
-- vendor/MIDAS/src/
+- vendor/MIDAS/src/CountMinSketch.hpp
 - vendor/MIDAS/src/FilteringCore.hpp
-- vendor/MIDAS/src/NodeHash.hpp
 - vendor/MIDAS/src/NormalCore.hpp
 - vendor/MIDAS/src/RelationalCore.hpp
 homepage: https://github.com/ankane/midas
data/vendor/MIDAS/src/NodeHash.hpp
DELETED
@@ -1,104 +0,0 @@
-// -----------------------------------------------------------------------------
-// Copyright 2020 Rui Liu (liurui39660) and Siddharth Bhatia (bhatiasiddharth)
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-// -----------------------------------------------------------------------------
-
-#pragma once
-
-#include <algorithm>
-
-namespace MIDAS {
-struct NodeHash {
-// Fields
-// --------------------------------------------------------------------------------
-
-const int r, c;
-const int lenData;
-int* const param1;
-int* const param2;
-float* const data;
-constexpr static float infinity = std::numeric_limits<float>::infinity();
-
-// Methods
-// --------------------------------------------------------------------------------
-
-NodeHash() = delete;
-NodeHash& operator=(const NodeHash& b) = delete;
-
-NodeHash(int numRow, int numColumn):
-r(numRow),
-c(numColumn),
-lenData(r * c),
-param1(new int[r]),
-param2(new int[r]),
-data(new float[lenData]) {
-for (int i = 0; i < r; i++) {
-param1[i] = rand() + 1; // ×0 is not a good idea, see Hash()
-param2[i] = rand();
-}
-std::fill(data, data + lenData, 0);
-}
-
-NodeHash(const NodeHash& b):
-r(b.r),
-c(b.c),
-lenData(b.lenData),
-param1(new int[r]),
-param2(new int[r]),
-data(new float[lenData]) {
-std::copy(b.param1, b.param1 + r, param1);
-std::copy(b.param2, b.param2 + r, param2);
-std::copy(b.data, b.data + lenData, data);
-}
-
-~NodeHash() {
-delete[] param1;
-delete[] param2;
-delete[] data;
-}
-
-void ClearAll(float with = 0) const {
-std::fill(data, data + lenData, with);
-}
-
-void MultiplyAll(float by) const {
-std::for_each(data, data + lenData, [&](float& a) { a *= by; }); // Magic of vectorization
-}
-
-void Hash(int a, int* indexOut) const {
-for (int i = 0; i < r; i++) {
-indexOut[i] = (a * param1[i] + param2[i]) % c;
-indexOut[i] += i * c + (indexOut[i] < 0 ? c : 0);
-}
-}
-
-float operator()(const int* index) const {
-float least = infinity;
-for (int i = 0; i < r; i++)
-least = std::min(least, data[index[i]]);
-return least;
-}
-
-float Assign(const int* index, float to) const {
-for (int i = 0; i < r; i++)
-data[index[i]] = to;
-return to;
-}
-
-void Add(const int* index, float by = 1) const {
-for (int i = 0; i < r; i++)
-data[index[i]] += by;
-}
-};
-}