midas-edge 0.2.1 → 0.2.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b0d27786d0b99888ae441ad574d70f3686d206395dacef5273e473e90f35ec78
- data.tar.gz: 8383b227b506effe504636f8284e67f304732c463daa1e5251236c4ddf5b3d4c
+ metadata.gz: 591fe96dce393c91a49953aa80ed776b2d794528fb30123beee27acc91e91554
+ data.tar.gz: f1105b066ad919ecc1fcf0ebebe88cfab174d31a8884ad6e6335260e476393a1
  SHA512:
- metadata.gz: daa76e1ec74019adedb1ac4d187a303e549735024ac5fa1f410805b93cf2f8d8a2902ad94dcb7057f35c7a968bb595a6760b72e245a45a1dfbd0e33914c81702
- data.tar.gz: f49958b7c705a72e1c8af80ac982dc58c272db360efef9632971b00a2f38ca58890728a33b1fcf6ba53bb89fe9f13227d49a34499c492c0a0b7fcd11b4ce3412
+ metadata.gz: ecf9b5807d75b4d158d7ec5519ec62b1bcf13cf1d735ce8d6b72e10085712766deda93ec11a54b1a5afae32d7deb8ce80758ca8d31ec70f96ac97b61f07c3054
+ data.tar.gz: 972779bb0c42189302ad404d31452eff33b6f15b1fdfe51cabb4db2b3d00f5178475dd77a4fa3f9f06a4d2b30cb3c0c04cef36e66f4eb2d8a3ad264fed2009c8
@@ -1,3 +1,7 @@
+ ## 0.2.2 (2020-09-23)
+
+ - Updated MIDAS to 1.1.0
+
  ## 0.2.1 (2020-06-17)
 
  - Fixed installation (missing header files)
data/NOTICE.txt CHANGED
@@ -1,3 +1,4 @@
+ Copyright 2020 Rui Liu (liurui39660) and Siddharth Bhatia (bhatiasiddharth)
  Copyright 2020 Andrew Kane
 
  Licensed under the Apache License, Version 2.0 (the "License");
data/README.md CHANGED
@@ -2,7 +2,7 @@
2
2
 
3
3
  [MIDAS](https://github.com/bhatiasiddharth/MIDAS) - edge stream anomaly detection - for Ruby
4
4
 
5
- [![Build Status](https://travis-ci.org/ankane/midas.svg?branch=master)](https://travis-ci.org/ankane/midas)
5
+ [![Build Status](https://travis-ci.org/ankane/midas.svg?branch=master)](https://travis-ci.org/ankane/midas) [![Build status](https://ci.appveyor.com/api/projects/status/klmqg9pmd3ndo0j5/branch/master?svg=true)](https://ci.appveyor.com/project/ankane/midas/branch/master)
6
6
 
7
7
  ## Installation
8
8
 
@@ -1,7 +1,3 @@
- // stdlib
- #include <iostream>
- #include <vector>
-
  // midas
  #include <FilteringCore.hpp>
  #include <NormalCore.hpp>
@@ -11,6 +7,8 @@
  #include <rice/Module.hpp>
  #include <rice/String.hpp>
 
+ using std::vector;
+
  using Rice::Module;
  using Rice::String;
  using Rice::define_module;
@@ -1,3 +1,3 @@
  module Midas
- VERSION = "0.2.1"
+ VERSION = "0.2.2"
  end
@@ -4,15 +4,15 @@
4
4
  <a href="https://aaai.org/Conferences/AAAI-20/">
5
5
  <img src="http://img.shields.io/badge/AAAI-2020-red.svg">
6
6
  </a>
7
- <a href="https://www.comp.nus.edu.sg/~sbhatia/assets/pdf/midas.pdf"><img src="http://img.shields.io/badge/Paper-PDF-brightgreen.svg"></a>
7
+ <a href="https://arxiv.org/pdf/2009.08452.pdf"><img src="http://img.shields.io/badge/Paper-PDF-brightgreen.svg"></a>
8
8
  <a href="https://www.comp.nus.edu.sg/~sbhatia/assets/pdf/midasslides.pdf">
9
9
  <img src="http://img.shields.io/badge/Slides-PDF-ff9e18.svg">
10
10
  </a>
11
11
  <a href="https://youtu.be/Bd4PyLCHrto">
12
12
  <img src="http://img.shields.io/badge/Talk-Youtube-ff69b4.svg">
13
13
  </a>
14
- <a href="https://www.kdnuggets.com/2020/04/midas-new-baseline-anomaly-detection-graphs.html">
15
- <img src="https://img.shields.io/badge/Press-KDnuggets-orange.svg">
14
+ <a href="https://www.youtube.com/watch?v=DPmN-uPW8qU">
15
+ <img src="https://img.shields.io/badge/Overview-Youtube-orange.svg">
16
16
  </a>
17
17
  <a href="https://github.com/bhatiasiddharth/MIDAS/blob/master/LICENSE">
18
18
  <img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg">
@@ -21,8 +21,8 @@
21
21
 
22
22
  C++ implementation of
23
23
 
24
- - Real-time Streaming Anomaly Detection in Dynamic Graphs. *Siddharth Bhatia, Rui Liu, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos*. (Under Review)
25
- - [MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams](asset/Conference.pdf). *Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos*. AAAI 2020.
24
+ - [Real-time Streaming Anomaly Detection in Dynamic Graphs](https://arxiv.org/pdf/2009.08452.pdf). *Siddharth Bhatia, Rui Liu, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos*. (Under Review)
25
+ - [MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams](https://arxiv.org/pdf/1911.04464.pdf). *Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos*. AAAI 2020.
26
26
 
27
27
  The old implementation is in another branch `OldImplementation`, it should be considered as being archived and will hardly receive feature updates.
28
28
 
@@ -30,13 +30,20 @@ The old implementation is in another branch `OldImplementation`, it should be co
30
30
 
31
31
  ## Table of Contents
32
32
 
33
+ <!-- START doctoc generated TOC please keep comment here to allow auto update -->
34
+ <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
35
+
36
+
33
37
  - [Features](#features)
34
38
  - [Demo](#demo)
35
39
  - [Customization](#customization)
36
- - [Online Articles](#online-articles)
37
- - [MIDAS in other Languages](#midas-in-other-languages)
40
+ - [Other Files](#other-files)
41
+ - [In Other Languages](#in-other-languages)
42
+ - [Online Coverage](#online-coverage)
38
43
  - [Citation](#citation)
39
44
 
45
+ <!-- END doctoc generated TOC please keep comment here to allow auto update -->
46
+
40
47
  ## Features
41
48
 
42
49
  - Finds Anomalies in Dynamic/Time-Evolving Graph: (Intrusion Detection, Fake Ratings, Financial Fraud)
@@ -45,7 +52,7 @@ The old implementation is in another branch `OldImplementation`, it should be co
45
52
  - Constant Memory (independent of graph size)
46
53
  - Constant Update Time (real-time anomaly detection to minimize harm)
47
54
  - Up to 55% more accurate and 929 times faster than state-of-the-art approaches
48
- - Some experiments are performed on the following datasets:
55
+ - Experiments are performed using the following datasets:
49
56
  - [DARPA](https://www.ll.mit.edu/r-d/datasets/1998-darpa-intrusion-detection-evaluation-dataset)
50
57
  - [TwitterWorldCup2014](http://odds.cs.stonybrook.edu/twitterworldcup2014-dataset)
51
58
  - [TwitterSecurity](http://odds.cs.stonybrook.edu/twittersecurity-dataset)
@@ -56,31 +63,31 @@ If you use Windows:
56
63
 
57
64
  1. Open a Visual Studio developer command prompt, we want their toolchain
58
65
  1. `cd` to the project root `MIDAS/`
59
- 1. `cmake -DCMAKE_BUILD_TYPE=Release -G "NMake Makefiles" -S . -B build/release`
66
+ 1. `cmake -DCMAKE_BUILD_TYPE=Release -GNinja -S . -B build/release`
60
67
  1. `cmake --build build/release --target Demo`
61
- 1. `cd` to `MIDAS/build/release/src`
68
+ 1. `cd` to `MIDAS/build/release/`
62
69
  1. `.\Demo.exe`
63
70
 
64
- If you use Linux/macOS systems:
71
+ If you use Linux/macOS:
65
72
 
66
73
  1. Open a terminal
67
74
  1. `cd` to the project root `MIDAS/`
68
75
  1. `cmake -DCMAKE_BUILD_TYPE=Release -S . -B build/release`
69
76
  1. `cmake --build build/release --target Demo`
70
- 1. `cd` to `MIDAS/build/release/src`
77
+ 1. `cd` to `MIDAS/build/release/`
71
78
  1. `./Demo`
72
79
 
73
- The demo runs on `MIDAS/data/DARPA/darpa_processed.csv`, which has 4.5M records, with the filtering core.
80
+ The demo runs on `MIDAS/data/DARPA/darpa_processed.csv`, which has 4.5M records, with the filtering core (MIDAS-F).
74
81
 
75
82
  The scores will be exported to `MIDAS/temp/Score.txt`, higher means more anomalous.
76
83
 
77
- All file paths are absolute and "hardcoded" by CMake, but it's suggested NOT to run by double-click on the executable file.
84
+ All file paths are absolute and "hardcoded" by CMake, but it's suggested NOT to run by double clicking on the executable file.
78
85
 
79
86
  ## Customization
80
87
 
81
88
  ### Switch Cores
82
89
 
83
- Cores are instantiated at `MIDAS/example/Demo.cpp:64-66`, uncomment the chosen one.
90
+ Cores are instantiated at `MIDAS/example/Demo.cpp:67-69`, uncomment the chosen one.
84
91
 
85
92
  ### Custom Dataset + `Demo.cpp`
86
93
 
@@ -89,48 +96,96 @@ You need to prepare three files:
89
96
  - Meta file
90
97
  - Only includes an integer `N`, the number of records in the dataset
91
98
  - Use its path for `pathMeta`
99
+ - E.g. `MIDAS/data/DARPA/darpa_shape.txt`
92
100
  - Data file
93
101
  - A header-less csv format file of shape `[N,3]`
94
102
  - Columns are sources, destinations, timestamps
95
103
  - Use its path for `pathData`
104
+ - E.g. `MIDAS/data/DARPA/darpa_processed.csv`
96
105
  - Label file
97
106
  - A header-less csv format file of shape `[N,1]`
98
107
  - The corresponding label for data records
99
108
  - 0 means normal record
100
109
  - 1 means anomalous record
101
- - Use its path for `pathGroundTruth`
110
+ - Use its path for `pathGroundTruth`
111
+ - E.g. `MIDAS/data/DARPA/darpa_ground_truth.csv`
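
As a rough illustration of the three files described above, the following C++ sketch parses a meta count `N`, a header-less `[N,3]` data table, and an `[N,1]` label column. The `Dataset` struct and `ParseDataset` function are hypothetical names introduced only for this example and are not part of MIDAS. For self-containment the sketch reads from in-memory strings, whereas `Demo.cpp` reads from the paths given in `pathMeta`, `pathData`, and `pathGroundTruth`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the three inputs; not a MIDAS type.
struct Dataset {
	std::vector<int> source, destination, timestamp, label;
};

// Parse the meta text (a single integer N), the data text (N lines of
// "source,destination,timestamp"), and the label text (N lines of 0 or 1).
inline Dataset ParseDataset(const std::string& meta, const std::string& data, const std::string& labels) {
	int n = 0;
	std::istringstream inMeta(meta);
	inMeta >> n;
	assert(n >= 0);

	Dataset d;
	std::istringstream inData(data), inLabel(labels);
	for (int i = 0; i < n; i++) {
		int s, t, ts, y;
		char comma1, comma2;
		if (!(inData >> s >> comma1 >> t >> comma2 >> ts)) break; // fewer data rows than N
		if (!(inLabel >> y)) break;                               // fewer labels than N
		d.source.push_back(s);
		d.destination.push_back(t);
		d.timestamp.push_back(ts);
		d.label.push_back(y);
	}
	return d;
}
```

Reading stops early if either file has fewer rows than the meta file claims, so a mismatch shows up as a shorter `Dataset` rather than as uninitialized values.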
102
112
 
103
113
  ### Custom Dataset + Custom Runner
104
114
 
105
- 1. Include the header `MIDAS/CPU/NormalCore.hpp`, `MIDAS/CPU/RelationalCore.hpp` or `MIDAS/CPU/FilteringCore.hpp`
115
+ 1. Include the header `MIDAS/src/NormalCore.hpp`, `MIDAS/src/RelationalCore.hpp` or `MIDAS/src/FilteringCore.hpp`
106
116
  1. Instantiate cores with required parameters
107
- 1. Call `operator()` on individual data records, it returns the anomaly score for the input record.
117
+ 1. Call `operator()` on individual data records, it returns the anomaly score for the input record
118
+
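
The three steps above could be sketched as follows. This is a minimal outline, not the project's own runner: `ToyCore` is a hypothetical stand-in that only mimics the call shape `float operator()(int source, int destination, int timestamp)` so the example compiles without the MIDAS headers, and `ScoreAll` is likewise a name invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in with the same call shape as the MIDAS cores.
// It only counts records per timestamp; it does NOT compute a MIDAS score.
struct ToyCore {
	int timestamp = 1;
	float countCurrent = 0;

	float operator()(int /*source*/, int /*destination*/, int timestamp) {
		if (this->timestamp < timestamp) { // new time tick: reset the per-tick count
			countCurrent = 0;
			this->timestamp = timestamp;
		}
		return ++countCurrent;
	}
};

// Feed records one at a time and collect one score per record.
template<class Core>
std::vector<float> ScoreAll(Core& core, const std::vector<int>& source, const std::vector<int>& destination, const std::vector<int>& timestamp) {
	assert(source.size() == destination.size() && source.size() == timestamp.size());
	std::vector<float> score(source.size());
	for (std::size_t i = 0; i < source.size(); i++)
		score[i] = core(source[i], destination[i], timestamp[i]);
	return score;
}
```

With the real headers, `ToyCore` would presumably be replaced by one of the cores, for example `MIDAS::NormalCore core(numRow, numColumn);`, and passed to the same loop.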
119
+ ## Other Files
120
+
121
+ ### `example/`
122
+
123
+ #### `Experiment.cpp`
124
+
125
+ The code we used for experiments.
126
+ It will try to use Intel TBB or OpenMP for parallelization.
127
+ You should comment out all but one runner function call in `main()`, as most results are exported to `MIDAS/temp/Experiiment.csv` together with many intermediate files.
128
+
129
+ #### `Reproducible.cpp`
130
+
131
+ Similar to `Demo.cpp`, but with all random parameters hardcoded, so it always produces the same result.
132
+ It's for other developers and us to test if the implementation in other languages can produce acceptable results.
133
+
134
+ ### `util/`
108
135
 
109
- ## Online Articles
136
+ `DeleteTempFile.py`, `EvaluateScore.py` and `ReproduceROC.py` will show their usage and a short description when executed without any argument.
110
137
 
111
- 1. KDnuggets: [Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs](https://www.kdnuggets.com/2020/04/midas-new-baseline-anomaly-detection-graphs.html)
112
- 2. Towards Data Science: [Controlling Fake News using Graphs and Statistics](https://towardsdatascience.com/controlling-fake-news-using-graphs-and-statistics-31ed116a986f)
113
- 2. Towards Data Science: [Anomaly detection in dynamic graphs using MIDAS](https://towardsdatascience.com/anomaly-detection-in-dynamic-graphs-using-midas-e4f8d0b1db45)
114
- 4. Towards AI: [Anomaly Detection with MIDAS](https://medium.com/towards-artificial-intelligence/anomaly-detection-with-midas-2735a2e6dce8)
115
- 5. [AIhub Interview](https://aihub.org/2020/05/01/interview-with-siddharth-bhatia-a-new-approach-for-anomaly-detection/)
138
+ #### `PreprocessData.py`
116
139
 
117
- ## MIDAS in Other Languages
140
+ The code to process the raw dataset into an easy-to-read format.
141
+ Datasets are always assumed to be in a folder in `MIDAS/data/`.
142
+ It can process the following dataset(s)
118
143
 
119
- 1. [Golang](https://github.com/steve0hh/midas) by [Steve Tan](https://github.com/steve0hh)
120
- 2. [Ruby](https://github.com/ankane/midas) by [Andrew Kane](https://github.com/ankane)
121
- 3. [Rust](https://github.com/scooter-dangle/midas_rs) by [Scott Steele](https://github.com/scooter-dangle)
122
- 4. [R](https://github.com/pteridin/MIDASwrappeR) by [Tobias Heidler](https://github.com/pteridin)
123
- 5. [Python](https://github.com/ritesh99rakesh/pyMIDAS) by [Ritesh Kumar](https://github.com/ritesh99rakesh)
144
+ - `DARPA/darpa_original.csv` -> `DARPA/darpa_processed.csv`, `DARPA/darpa_ground_truth.csv`, `DARPA/darpa_shape.txt`
145
+
146
+ ## In Other Languages
147
+
148
+ 1. Python: [Rui Liu's MIDAS.Python](https://github.com/liurui39660/MIDAS.Python), [Ritesh Kumar's pyMIDAS](https://github.com/ritesh99rakesh/pyMIDAS)
149
+ 1. Golang: [Steve Tan's midas](https://github.com/steve0hh/midas)
150
+ 1. Ruby: [Andrew Kane's midas](https://github.com/ankane/midas)
151
+ 1. Rust: [Scott Steele's midas_rs](https://github.com/scooter-dangle/midas_rs)
152
+ 1. R: [Tobias Heidler's MIDASwrappeR](https://github.com/pteridin/MIDASwrappeR)
153
+ 1. Java: [Joshua Tokle's MIDAS-Java](https://github.com/jotok/MIDAS-Java)
154
+ 1. Julia: [Ashrya Agrawal's MIDAS.jl](https://github.com/ashryaagr/MIDAS.jl)
155
+
156
+ ## Online Coverage
157
+
158
+ 1. [ACM TechNews](https://technews.acm.org/archives.cfm?fo=2020-05-may/may-06-2020.html)
159
+ 1. [AIhub](https://aihub.org/2020/05/01/interview-with-siddharth-bhatia-a-new-approach-for-anomaly-detection/)
160
+ 1. [Hacker News](https://news.ycombinator.com/item?id=22802604)
161
+ 1. [KDnuggets](https://www.kdnuggets.com/2020/04/midas-new-baseline-anomaly-detection-graphs.html)
162
+ 1. [Microsoft](https://techcommunity.microsoft.com/t5/azure-sentinel/announcing-the-azure-sentinel-hackathon-winners/ba-p/1548240)
163
+ 1. [Towards Data Science](https://towardsdatascience.com/controlling-fake-news-using-graphs-and-statistics-31ed116a986f)
124
164
 
125
165
  ## Citation
126
166
 
127
- If you use this code for your research, please consider citing our paper.
167
+ If you use this code for your research, please consider citing our arXiv preprint
168
+
169
+ ```bibtex
170
+ @misc{bhatia2020realtime,
171
+ title={Real-Time Streaming Anomaly Detection in Dynamic Graphs},
172
+ author={Siddharth Bhatia and Rui Liu and Bryan Hooi and Minji Yoon and Kijung Shin and Christos Faloutsos},
173
+ year={2020},
174
+ eprint={2009.08452},
175
+ archivePrefix={arXiv},
176
+ primaryClass={cs.LG}
177
+ }
128
178
 
129
179
  ```
180
+
181
+ or our AAAI paper
182
+
183
+
184
+ ```bibtex
130
185
  @inproceedings{bhatia2020midas,
131
186
  title="MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams",
132
187
  author="Siddharth {Bhatia} and Bryan {Hooi} and Minji {Yoon} and Kijung {Shin} and Christos {Faloutsos}",
133
188
  booktitle="AAAI 2020 : The Thirty-Fourth AAAI Conference on Artificial Intelligence",
134
189
  year="2020"
135
190
  }
136
- ```
191
+ ```
@@ -19,7 +19,7 @@
19
19
  #include <algorithm>
20
20
 
21
21
  namespace MIDAS {
22
- struct EdgeHash {
22
+ struct CountMinSketch {
23
23
  // Fields
24
24
  // --------------------------------------------------------------------------------
25
25
 
@@ -33,10 +33,10 @@ struct EdgeHash {
33
33
  // Methods
34
34
  // --------------------------------------------------------------------------------
35
35
 
36
- EdgeHash() = delete;
37
- EdgeHash& operator=(const EdgeHash& b) = delete;
36
+ CountMinSketch() = delete;
37
+ CountMinSketch& operator=(const CountMinSketch& b) = delete;
38
38
 
39
- EdgeHash(int numRow, int numColumn):
39
+ CountMinSketch(int numRow, int numColumn):
40
40
  r(numRow),
41
41
  c(numColumn),
42
42
  lenData(r * c),
@@ -50,7 +50,7 @@ struct EdgeHash {
50
50
  std::fill(data, data + lenData, 0);
51
51
  }
52
52
 
53
- EdgeHash(const EdgeHash& b):
53
+ CountMinSketch(const CountMinSketch& b):
54
54
  r(b.r),
55
55
  c(b.c),
56
56
  lenData(b.lenData),
@@ -62,7 +62,7 @@ struct EdgeHash {
62
62
  std::copy(b.data, b.data + lenData, data);
63
63
  }
64
64
 
65
- ~EdgeHash() {
65
+ ~CountMinSketch() {
66
66
  delete[] param1;
67
67
  delete[] param2;
68
68
  delete[] data;
@@ -73,10 +73,11 @@ struct EdgeHash {
73
73
  }
74
74
 
75
75
  void MultiplyAll(float by) const {
76
- std::for_each(data, data + lenData, [&](float& a) { a *= by; }); // Magic of vectorization
76
+ for (int i = 0, I = lenData; i < I; i++) // Vectorization
77
+ data[i] *= by;
77
78
  }
78
79
 
79
- void Hash(int a, int b, int* indexOut) const {
80
+ void Hash(int* indexOut, int a, int b = 0) const {
80
81
  for (int i = 0; i < r; i++) {
81
82
  indexOut[i] = ((a + m * b) * param1[i] + param2[i]) % c;
82
83
  indexOut[i] += i * c + (indexOut[i] < 0 ? c : 0);
@@ -90,10 +91,10 @@ struct EdgeHash {
90
91
  return least;
91
92
  }
92
93
 
93
- float Assign(const int* index, float to) const {
94
+ float Assign(const int* index, float with) const {
94
95
  for (int i = 0; i < r; i++)
95
- data[index[i]] = to;
96
- return to;
96
+ data[index[i]] = with;
97
+ return with;
97
98
  }
98
99
 
99
100
  void Add(const int* index, float by = 1) const {
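
The new signature `Hash(int* indexOut, int a, int b = 0)` appears to be what lets a single `CountMinSketch` replace both `EdgeHash` and `NodeHash`: with `b` left at its default, the `m * b` term vanishes and the formula reduces to the one in the removed `NodeHash::Hash`. The sketch below isolates that idea. `TinyHasher` is a hypothetical name, the value of `m` is an assumption (its definition is not shown in this diff), and the per-row parameters are fixed by hand here, whereas the removed `NodeHash` constructor drew them from `rand()`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Minimal stand-in for the hashing part of a count-min sketch.
struct TinyHasher {
	int r, c;
	int m = 104729; // assumed value, chosen only for this example
	std::vector<int> param1, param2;

	TinyHasher(int numRow, int numColumn, std::vector<int> p1, std::vector<int> p2):
		r(numRow), c(numColumn), param1(std::move(p1)), param2(std::move(p2)) {
		assert(int(param1.size()) == r && int(param2.size()) == r);
	}

	// Same formula as in the diff: one bucket per row, offset into a flat r*c array.
	// The second line of the loop body shifts negative remainders back into range.
	void Hash(int* indexOut, int a, int b = 0) const {
		for (int i = 0; i < r; i++) {
			indexOut[i] = ((a + m * b) * param1[i] + param2[i]) % c;
			indexOut[i] += i * c + (indexOut[i] < 0 ? c : 0);
		}
	}
};
```

Calling `Hash(index, node)` hashes a single node, while `Hash(index, source, destination)` hashes an edge. In both cases row `i` yields an index in the range `[i * c, (i + 1) * c)`.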
@@ -19,24 +19,27 @@
19
19
  #include <cmath>
20
20
  #include <algorithm>
21
21
 
22
- #include "EdgeHash.hpp"
23
- #include "NodeHash.hpp"
22
+ #include "CountMinSketch.hpp"
24
23
 
25
24
  namespace MIDAS {
26
25
  struct FilteringCore {
27
26
  const float threshold;
28
- int timestampCurrent = 1;
27
+ int timestamp = 1;
29
28
  const float factor;
30
- int* const indexEdge; // Pre-compute the index to-be-modified, thanks to the same structure of CMSs
29
+ const int lenData;
30
+ int* const indexEdge; // Pre-compute the index to-be-modified, thanks to the Same-Layout Assumption
31
31
  int* const indexSource;
32
32
  int* const indexDestination;
33
- EdgeHash numCurrentEdge, numTotalEdge, scoreEdge;
34
- NodeHash numCurrentSource, numTotalSource, scoreSource;
35
- NodeHash numCurrentDestination, numTotalDestination, scoreDestination;
33
+ CountMinSketch numCurrentEdge, numTotalEdge, scoreEdge;
34
+ CountMinSketch numCurrentSource, numTotalSource, scoreSource;
35
+ CountMinSketch numCurrentDestination, numTotalDestination, scoreDestination;
36
+ float timestampReciprocal = 0;
37
+ bool* const shouldMerge;
36
38
 
37
39
  FilteringCore(int numRow, int numColumn, float threshold, float factor = 0.5):
38
40
  threshold(threshold),
39
41
  factor(factor),
42
+ lenData(numRow * numColumn), // I assume all CMSs have same size, but Same-Layout Assumption is not that strict
40
43
  indexEdge(new int[numRow]),
41
44
  indexSource(new int[numRow]),
42
45
  indexDestination(new int[numRow]),
@@ -48,42 +51,43 @@ struct FilteringCore {
48
51
  scoreSource(numCurrentSource),
49
52
  numCurrentDestination(numRow, numColumn),
50
53
  numTotalDestination(numCurrentDestination),
51
- scoreDestination(numCurrentDestination) { }
54
+ scoreDestination(numCurrentDestination),
55
+ shouldMerge(new bool[numRow * numColumn]) { }
52
56
 
53
57
  virtual ~FilteringCore() {
54
58
  delete[] indexEdge;
55
59
  delete[] indexSource;
56
60
  delete[] indexDestination;
61
+ delete[] shouldMerge;
57
62
  }
58
63
 
59
64
  static float ComputeScore(float a, float s, float t) {
60
65
  return s == 0 ? 0 : pow(a + s - a * t, 2) / (s * (t - 1)); // If t == 1, then s == 0, so no need to check twice
61
66
  }
62
67
 
68
+ inline void ConditionalMerge(const float* current, float* total, const float* score) const {
69
+ for (int i = 0; i < lenData; i++)
70
+ shouldMerge[i] = score[i] < threshold;
71
+ for (int i = 0, I = lenData; i < I; i++) // Vectorization
72
+ total[i] += shouldMerge[i] * current[i] + (true - shouldMerge[i]) * total[i] * timestampReciprocal;
73
+ }
74
+
63
75
  float operator()(int source, int destination, int timestamp) {
64
- if (timestamp > timestampCurrent) {
65
- for (int i = 0; i < numCurrentEdge.lenData; i++)
66
- numTotalEdge.data[i] += scoreEdge.data[i] < threshold ?
67
- numCurrentEdge.data[i] : timestampCurrent - 1 ?
68
- numTotalEdge.data[i] / (timestampCurrent - 1) : 0;
69
- for (int i = 0; i < numCurrentSource.lenData; i++)
70
- numTotalSource.data[i] += scoreSource.data[i] < threshold ?
71
- numCurrentSource.data[i] : timestampCurrent - 1 ?
72
- numTotalSource.data[i] / (timestampCurrent - 1) : 0;
73
- for (int i = 0; i < numCurrentDestination.lenData; i++)
74
- numTotalDestination.data[i] += scoreDestination.data[i] < threshold ?
75
- numCurrentDestination.data[i] : timestampCurrent - 1 ?
76
- numTotalDestination.data[i] / (timestampCurrent - 1) : 0;
76
+ if (this->timestamp < timestamp) {
77
+ ConditionalMerge(numCurrentEdge.data, numTotalEdge.data, scoreEdge.data);
78
+ ConditionalMerge(numCurrentSource.data, numTotalSource.data, scoreSource.data);
79
+ ConditionalMerge(numCurrentDestination.data, numTotalDestination.data, scoreDestination.data);
77
80
  numCurrentEdge.MultiplyAll(factor);
78
81
  numCurrentSource.MultiplyAll(factor);
79
82
  numCurrentDestination.MultiplyAll(factor);
80
- timestampCurrent = timestamp;
83
+ timestampReciprocal = 1.f / (timestamp - 1); // So I can skip an if-statement
84
+ this->timestamp = timestamp;
81
85
  }
82
- numCurrentEdge.Hash(source, destination, indexEdge);
86
+ numCurrentEdge.Hash(indexEdge, source, destination);
83
87
  numCurrentEdge.Add(indexEdge);
84
- numCurrentSource.Hash(source, indexSource);
88
+ numCurrentSource.Hash(indexSource, source);
85
89
  numCurrentSource.Add(indexSource);
86
- numCurrentDestination.Hash(destination, indexDestination);
90
+ numCurrentDestination.Hash(indexDestination, destination);
87
91
  numCurrentDestination.Add(indexDestination);
88
92
  return std::max({
89
93
  scoreEdge.Assign(indexEdge, ComputeScore(numCurrentEdge(indexEdge), numTotalEdge(indexEdge), timestamp)),
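
The `ConditionalMerge` helper introduced in this hunk replaces the nested ternaries with two plain loops: the first records which cells scored below the threshold, and the second uses that flag as a 0/1 multiplier instead of a branch. The standalone sketch below mirrors that logic on `std::vector`. The function name `ConditionalMergeSketch` and the `unsigned char` flag buffer are choices made for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Branch-free conditional merge over flat arrays.
// Cells whose score is below the threshold absorb the current count;
// the others are topped up with total * timestampReciprocal instead.
inline void ConditionalMergeSketch(const std::vector<float>& current, std::vector<float>& total, const std::vector<float>& score, float threshold, float timestampReciprocal) {
	assert(current.size() == total.size() && score.size() == total.size());
	std::vector<unsigned char> shouldMerge(total.size()); // 0 or 1 per cell
	for (std::size_t i = 0; i < total.size(); i++)
		shouldMerge[i] = score[i] < threshold;
	for (std::size_t i = 0; i < total.size(); i++) // simple loop body, friendly to auto-vectorization
		total[i] += shouldMerge[i] * current[i] + (1 - shouldMerge[i]) * total[i] * timestampReciprocal;
}
```

In the diff, `timestampReciprocal` starts at 0 and is refreshed to `1.f / (timestamp - 1)` after each merge, which seems to cover the case the old code handled with its inner `timestampCurrent - 1 ? ... : 0` ternary.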
@@ -18,13 +18,13 @@
18
18
 
19
19
  #include <cmath>
20
20
 
21
- #include "EdgeHash.hpp"
21
+ #include "CountMinSketch.hpp"
22
22
 
23
23
  namespace MIDAS {
24
24
  struct NormalCore {
25
- int timestampCurrent = 1;
25
+ int timestamp = 1;
26
26
  int* const index; // Pre-compute the index to-be-modified, thanks to the same structure of CMSs
27
- EdgeHash numCurrent, numTotal;
27
+ CountMinSketch numCurrent, numTotal;
28
28
 
29
29
  NormalCore(int numRow, int numColumn):
30
30
  index(new int[numRow]),
@@ -40,11 +40,11 @@ struct NormalCore {
40
40
  }
41
41
 
42
42
  float operator()(int source, int destination, int timestamp) {
43
- if (timestamp > timestampCurrent) {
43
+ if (this->timestamp < timestamp) {
44
44
  numCurrent.ClearAll();
45
- timestampCurrent = timestamp;
45
+ this->timestamp = timestamp;
46
46
  }
47
- numCurrent.Hash(source, destination, index);
47
+ numCurrent.Hash(index, source, destination);
48
48
  numCurrent.Add(index);
49
49
  numTotal.Add(index);
50
50
  return ComputeScore(numCurrent(index), numTotal(index), timestamp);
@@ -19,19 +19,18 @@
19
19
  #include <cmath>
20
20
  #include <algorithm>
21
21
 
22
- #include "EdgeHash.hpp"
23
- #include "NodeHash.hpp"
22
+ #include "CountMinSketch.hpp"
24
23
 
25
24
  namespace MIDAS {
26
25
  struct RelationalCore {
27
- int timestampCurrent = 1;
26
+ int timestamp = 1;
28
27
  const float factor;
29
28
  int* const indexEdge; // Pre-compute the index to-be-modified, thanks to the same structure of CMSs
30
29
  int* const indexSource;
31
30
  int* const indexDestination;
32
- EdgeHash numCurrentEdge, numTotalEdge;
33
- NodeHash numCurrentSource, numTotalSource;
34
- NodeHash numCurrentDestination, numTotalDestination;
31
+ CountMinSketch numCurrentEdge, numTotalEdge;
32
+ CountMinSketch numCurrentSource, numTotalSource;
33
+ CountMinSketch numCurrentDestination, numTotalDestination;
35
34
 
36
35
  RelationalCore(int numRow, int numColumn, float factor = 0.5):
37
36
  factor(factor),
@@ -56,19 +55,19 @@ struct RelationalCore {
56
55
  }
57
56
 
58
57
  float operator()(int source, int destination, int timestamp) {
59
- if (timestamp > timestampCurrent) {
58
+ if (this->timestamp < timestamp) {
60
59
  numCurrentEdge.MultiplyAll(factor);
61
60
  numCurrentSource.MultiplyAll(factor);
62
61
  numCurrentDestination.MultiplyAll(factor);
63
- timestampCurrent = timestamp;
62
+ this->timestamp = timestamp;
64
63
  }
65
- numCurrentEdge.Hash(source, destination, indexEdge);
64
+ numCurrentEdge.Hash(indexEdge, source, destination);
66
65
  numCurrentEdge.Add(indexEdge);
67
66
  numTotalEdge.Add(indexEdge);
68
- numCurrentSource.Hash(source, indexSource);
67
+ numCurrentSource.Hash(indexSource, source);
69
68
  numCurrentSource.Add(indexSource);
70
69
  numTotalSource.Add(indexSource);
71
- numCurrentDestination.Hash(destination, indexDestination);
70
+ numCurrentDestination.Hash(indexDestination, destination);
72
71
  numCurrentDestination.Add(indexDestination);
73
72
  numTotalDestination.Add(indexDestination);
74
73
  return std::max({
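
The per-tick bookkeeping in `RelationalCore::operator()` can be pictured with a much smaller model. In the hypothetical `DecayingCounter` below (a name and layout invented for this sketch, with direct cell indices standing in for hashed ones), the current counts are scaled by `factor` whenever a later timestamp arrives, while the totals only ever grow. It also reuses the `this->timestamp < timestamp` comparison from the diff, where the parameter shadows the renamed member.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of the decay step; not a MIDAS type.
struct DecayingCounter {
	int timestamp = 1;
	float factor;
	std::vector<float> current, total;

	DecayingCounter(int numCell, float factor = 0.5f):
		factor(factor), current(numCell, 0.f), total(numCell, 0.f) {
		assert(numCell > 0);
	}

	void Observe(int cell, int timestamp) {
		if (this->timestamp < timestamp) { // member on the left, parameter on the right
			for (float& x : current)
				x *= factor; // decay instead of clearing
			this->timestamp = timestamp;
		}
		current[cell] += 1;
		total[cell] += 1;
	}
};
```

By contrast, `NormalCore` in the same diff calls `numCurrent.ClearAll()` at a new timestamp, which corresponds to a `factor` of 0 in this model.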
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: midas-edge
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.1
4
+ version: 0.2.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Kane
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2020-06-18 00:00:00.000000000 Z
11
+ date: 2020-09-23 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rice
@@ -112,9 +112,8 @@ files:
112
112
  - lib/midas/version.rb
113
113
  - vendor/MIDAS/LICENSE
114
114
  - vendor/MIDAS/README.md
115
- - vendor/MIDAS/src/EdgeHash.hpp
115
+ - vendor/MIDAS/src/CountMinSketch.hpp
116
116
  - vendor/MIDAS/src/FilteringCore.hpp
117
- - vendor/MIDAS/src/NodeHash.hpp
118
117
  - vendor/MIDAS/src/NormalCore.hpp
119
118
  - vendor/MIDAS/src/RelationalCore.hpp
120
119
  homepage: https://github.com/ankane/midas
@@ -1,104 +0,0 @@
1
- // -----------------------------------------------------------------------------
2
- // Copyright 2020 Rui Liu (liurui39660) and Siddharth Bhatia (bhatiasiddharth)
3
- //
4
- // Licensed under the Apache License, Version 2.0 (the "License");
5
- // you may not use this file except in compliance with the License.
6
- // You may obtain a copy of the License at
7
- //
8
- // http://www.apache.org/licenses/LICENSE-2.0
9
- //
10
- // Unless required by applicable law or agreed to in writing, software
11
- // distributed under the License is distributed on an "AS IS" BASIS,
12
- // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13
- // See the License for the specific language governing permissions and
14
- // limitations under the License.
15
- // -----------------------------------------------------------------------------
16
-
17
- #pragma once
18
-
19
- #include <algorithm>
20
-
21
- namespace MIDAS {
22
- struct NodeHash {
23
- // Fields
24
- // --------------------------------------------------------------------------------
25
-
26
- const int r, c;
27
- const int lenData;
28
- int* const param1;
29
- int* const param2;
30
- float* const data;
31
- constexpr static float infinity = std::numeric_limits<float>::infinity();
32
-
33
- // Methods
34
- // --------------------------------------------------------------------------------
35
-
36
- NodeHash() = delete;
37
- NodeHash& operator=(const NodeHash& b) = delete;
38
-
39
- NodeHash(int numRow, int numColumn):
40
- r(numRow),
41
- c(numColumn),
42
- lenData(r * c),
43
- param1(new int[r]),
44
- param2(new int[r]),
45
- data(new float[lenData]) {
46
- for (int i = 0; i < r; i++) {
47
- param1[i] = rand() + 1; // ×0 is not a good idea, see Hash()
48
- param2[i] = rand();
49
- }
50
- std::fill(data, data + lenData, 0);
51
- }
52
-
53
- NodeHash(const NodeHash& b):
54
- r(b.r),
55
- c(b.c),
56
- lenData(b.lenData),
57
- param1(new int[r]),
58
- param2(new int[r]),
59
- data(new float[lenData]) {
60
- std::copy(b.param1, b.param1 + r, param1);
61
- std::copy(b.param2, b.param2 + r, param2);
62
- std::copy(b.data, b.data + lenData, data);
63
- }
64
-
65
- ~NodeHash() {
66
- delete[] param1;
67
- delete[] param2;
68
- delete[] data;
69
- }
70
-
71
- void ClearAll(float with = 0) const {
72
- std::fill(data, data + lenData, with);
73
- }
74
-
75
- void MultiplyAll(float by) const {
76
- std::for_each(data, data + lenData, [&](float& a) { a *= by; }); // Magic of vectorization
77
- }
78
-
79
- void Hash(int a, int* indexOut) const {
80
- for (int i = 0; i < r; i++) {
81
- indexOut[i] = (a * param1[i] + param2[i]) % c;
82
- indexOut[i] += i * c + (indexOut[i] < 0 ? c : 0);
83
- }
84
- }
85
-
86
- float operator()(const int* index) const {
87
- float least = infinity;
88
- for (int i = 0; i < r; i++)
89
- least = std::min(least, data[index[i]]);
90
- return least;
91
- }
92
-
93
- float Assign(const int* index, float to) const {
94
- for (int i = 0; i < r; i++)
95
- data[index[i]] = to;
96
- return to;
97
- }
98
-
99
- void Add(const int* index, float by = 1) const {
100
- for (int i = 0; i < r; i++)
101
- data[index[i]] += by;
102
- }
103
- };
104
- }