midas-edge 0.2.1 → 0.2.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b0d27786d0b99888ae441ad574d70f3686d206395dacef5273e473e90f35ec78
- data.tar.gz: 8383b227b506effe504636f8284e67f304732c463daa1e5251236c4ddf5b3d4c
+ metadata.gz: 591fe96dce393c91a49953aa80ed776b2d794528fb30123beee27acc91e91554
+ data.tar.gz: f1105b066ad919ecc1fcf0ebebe88cfab174d31a8884ad6e6335260e476393a1
  SHA512:
- metadata.gz: daa76e1ec74019adedb1ac4d187a303e549735024ac5fa1f410805b93cf2f8d8a2902ad94dcb7057f35c7a968bb595a6760b72e245a45a1dfbd0e33914c81702
- data.tar.gz: f49958b7c705a72e1c8af80ac982dc58c272db360efef9632971b00a2f38ca58890728a33b1fcf6ba53bb89fe9f13227d49a34499c492c0a0b7fcd11b4ce3412
+ metadata.gz: ecf9b5807d75b4d158d7ec5519ec62b1bcf13cf1d735ce8d6b72e10085712766deda93ec11a54b1a5afae32d7deb8ce80758ca8d31ec70f96ac97b61f07c3054
+ data.tar.gz: 972779bb0c42189302ad404d31452eff33b6f15b1fdfe51cabb4db2b3d00f5178475dd77a4fa3f9f06a4d2b30cb3c0c04cef36e66f4eb2d8a3ad264fed2009c8
@@ -1,3 +1,7 @@
+ ## 0.2.2 (2020-09-23)
+
+ - Updated MIDAS to 1.1.0
+
  ## 0.2.1 (2020-06-17)
 
  - Fixed installation (missing header files)
data/NOTICE.txt CHANGED
@@ -1,3 +1,4 @@
+ Copyright 2020 Rui Liu (liurui39660) and Siddharth Bhatia (bhatiasiddharth)
  Copyright 2020 Andrew Kane
 
  Licensed under the Apache License, Version 2.0 (the "License");
data/README.md CHANGED
@@ -2,7 +2,7 @@
 
  [MIDAS](https://github.com/bhatiasiddharth/MIDAS) - edge stream anomaly detection - for Ruby
 
- [![Build Status](https://travis-ci.org/ankane/midas.svg?branch=master)](https://travis-ci.org/ankane/midas)
+ [![Build Status](https://travis-ci.org/ankane/midas.svg?branch=master)](https://travis-ci.org/ankane/midas) [![Build status](https://ci.appveyor.com/api/projects/status/klmqg9pmd3ndo0j5/branch/master?svg=true)](https://ci.appveyor.com/project/ankane/midas/branch/master)
 
  ## Installation
 
@@ -1,7 +1,3 @@
- // stdlib
- #include <iostream>
- #include <vector>
-
  // midas
  #include <FilteringCore.hpp>
  #include <NormalCore.hpp>
@@ -11,6 +7,8 @@
  #include <rice/Module.hpp>
  #include <rice/String.hpp>
 
+ using std::vector;
+
  using Rice::Module;
  using Rice::String;
  using Rice::define_module;
@@ -1,3 +1,3 @@
  module Midas
- VERSION = "0.2.1"
+ VERSION = "0.2.2"
  end
@@ -4,15 +4,15 @@
  <a href="https://aaai.org/Conferences/AAAI-20/">
  <img src="http://img.shields.io/badge/AAAI-2020-red.svg">
  </a>
- <a href="https://www.comp.nus.edu.sg/~sbhatia/assets/pdf/midas.pdf"><img src="http://img.shields.io/badge/Paper-PDF-brightgreen.svg"></a>
+ <a href="https://arxiv.org/pdf/2009.08452.pdf"><img src="http://img.shields.io/badge/Paper-PDF-brightgreen.svg"></a>
  <a href="https://www.comp.nus.edu.sg/~sbhatia/assets/pdf/midasslides.pdf">
  <img src="http://img.shields.io/badge/Slides-PDF-ff9e18.svg">
  </a>
  <a href="https://youtu.be/Bd4PyLCHrto">
  <img src="http://img.shields.io/badge/Talk-Youtube-ff69b4.svg">
  </a>
- <a href="https://www.kdnuggets.com/2020/04/midas-new-baseline-anomaly-detection-graphs.html">
- <img src="https://img.shields.io/badge/Press-KDnuggets-orange.svg">
+ <a href="https://www.youtube.com/watch?v=DPmN-uPW8qU">
+ <img src="https://img.shields.io/badge/Overview-Youtube-orange.svg">
  </a>
  <a href="https://github.com/bhatiasiddharth/MIDAS/blob/master/LICENSE">
  <img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg">
@@ -21,8 +21,8 @@
 
  C++ implementation of
 
- - Real-time Streaming Anomaly Detection in Dynamic Graphs. *Siddharth Bhatia, Rui Liu, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos*. (Under Review)
- - [MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams](asset/Conference.pdf). *Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos*. AAAI 2020.
+ - [Real-time Streaming Anomaly Detection in Dynamic Graphs](https://arxiv.org/pdf/2009.08452.pdf). *Siddharth Bhatia, Rui Liu, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos*. (Under Review)
+ - [MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams](https://arxiv.org/pdf/1911.04464.pdf). *Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos*. AAAI 2020.
 
  The old implementation is in another branch `OldImplementation`, it should be considered as being archived and will hardly receive feature updates.
 
@@ -30,13 +30,20 @@ The old implementation is in another branch `OldImplementation`, it should be co
 
  ## Table of Contents
 
+ <!-- START doctoc generated TOC please keep comment here to allow auto update -->
+ <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+
+
  - [Features](#features)
  - [Demo](#demo)
  - [Customization](#customization)
- - [Online Articles](#online-articles)
- - [MIDAS in other Languages](#midas-in-other-languages)
+ - [Other Files](#other-files)
+ - [In Other Languages](#in-other-languages)
+ - [Online Coverage](#online-coverage)
  - [Citation](#citation)
 
+ <!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
  ## Features
 
  - Finds Anomalies in Dynamic/Time-Evolving Graph: (Intrusion Detection, Fake Ratings, Financial Fraud)
@@ -45,7 +52,7 @@ The old implementation is in another branch `OldImplementation`, it should be co
  - Constant Memory (independent of graph size)
  - Constant Update Time (real-time anomaly detection to minimize harm)
  - Up to 55% more accurate and 929 times faster than the state of the art approaches
- - Some experiments are performed on the following datasets:
+ - Experiments are performed using the following datasets:
  - [DARPA](https://www.ll.mit.edu/r-d/datasets/1998-darpa-intrusion-detection-evaluation-dataset)
  - [TwitterWorldCup2014](http://odds.cs.stonybrook.edu/twitterworldcup2014-dataset)
  - [TwitterSecurity](http://odds.cs.stonybrook.edu/twittersecurity-dataset)
@@ -56,31 +63,31 @@ If you use Windows:
 
  1. Open a Visual Studio developer command prompt, we want their toolchain
  1. `cd` to the project root `MIDAS/`
- 1. `cmake -DCMAKE_BUILD_TYPE=Release -G "NMake Makefiles" -S . -B build/release`
+ 1. `cmake -DCMAKE_BUILD_TYPE=Release -GNinja -S . -B build/release`
  1. `cmake --build build/release --target Demo`
- 1. `cd` to `MIDAS/build/release/src`
+ 1. `cd` to `MIDAS/build/release/`
  1. `.\Demo.exe`
 
- If you use Linux/macOS systems:
+ If you use Linux/macOS:
 
  1. Open a terminal
  1. `cd` to the project root `MIDAS/`
  1. `cmake -DCMAKE_BUILD_TYPE=Release -S . -B build/release`
  1. `cmake --build build/release --target Demo`
- 1. `cd` to `MIDAS/build/release/src`
+ 1. `cd` to `MIDAS/build/release/`
  1. `./Demo`
 
- The demo runs on `MIDAS/data/DARPA/darpa_processed.csv`, which has 4.5M records, with the filtering core.
+ The demo runs on `MIDAS/data/DARPA/darpa_processed.csv`, which has 4.5M records, with the filtering core (MIDAS-F).
 
  The scores will be exported to `MIDAS/temp/Score.txt`, higher means more anomalous.
 
- All file paths are absolute and "hardcoded" by CMake, but it's suggested NOT to run by double-click on the executable file.
+ All file paths are absolute and "hardcoded" by CMake, but it's suggested NOT to run by double clicking on the executable file.
 
  ## Customization
 
  ### Switch Cores
 
- Cores are instantiated at `MIDAS/example/Demo.cpp:64-66`, uncomment the chosen one.
+ Cores are instantiated at `MIDAS/example/Demo.cpp:67-69`, uncomment the chosen one.
 
  ### Custom Dataset + `Demo.cpp`
 
@@ -89,48 +96,96 @@ You need to prepare three files:
  - Meta file
  - Only includes an integer `N`, the number of records in the dataset
  - Use its path for `pathMeta`
+ - E.g. `MIDAS/data/DARPA/darpa_shape.txt`
  - Data file
  - A header-less csv format file of shape `[N,3]`
  - Columns are sources, destinations, timestamps
  - Use its path for `pathData`
+ - E.g. `MIDAS/data/DARPA/darpa_processed.csv`
  - Label file
  - A header-less csv format file of shape `[N,1]`
  - The corresponding label for data records
  - 0 means normal record
  - 1 means anomalous record
- - Use its path for `pathGroundTruth`
+ - Use its path for `pathGroundTruth`
+ - E.g. `MIDAS/data/DARPA/darpa_ground_truth.csv`
 
  ### Custom Dataset + Custom Runner
 
- 1. Include the header `MIDAS/CPU/NormalCore.hpp`, `MIDAS/CPU/RelationalCore.hpp` or `MIDAS/CPU/FilteringCore.hpp`
+ 1. Include the header `MIDAS/src/NormalCore.hpp`, `MIDAS/src/RelationalCore.hpp` or `MIDAS/src/FilteringCore.hpp`
  1. Instantiate cores with required parameters
- 1. Call `operator()` on individual data records, it returns the anomaly score for the input record.
+ 1. Call `operator()` on individual data records, it returns the anomaly score for the input record
+
+ ## Other Files
+
+ ### `example/`
+
+ #### `Experiment.cpp`
+
+ The code we used for experiments.
+ It will try to use Intel TBB or OpenMP for parallelization.
+ You should comment all but only one runner function call in the `main()` as most results are exported to `MIDAS/temp/Experiiment.csv` together with many intermediate files.
+
+ #### `Reproducible.cpp`
+
+ Similar to `Demo.cpp`, but with all random parameters hardcoded and always produce the same result.
+ It's for other developers and us to test if the implementation in other languages can produce acceptable results.
+
+ ### `util/`
 
- ## Online Articles
+ `DeleteTempFile.py`, `EvaluateScore.py` and `ReproduceROC.py` will show their usage and a short description when executed without any argument.
 
- 1. KDnuggets: [Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs](https://www.kdnuggets.com/2020/04/midas-new-baseline-anomaly-detection-graphs.html)
- 2. Towards Data Science: [Controlling Fake News using Graphs and Statistics](https://towardsdatascience.com/controlling-fake-news-using-graphs-and-statistics-31ed116a986f)
- 2. Towards Data Science: [Anomaly detection in dynamic graphs using MIDAS](https://towardsdatascience.com/anomaly-detection-in-dynamic-graphs-using-midas-e4f8d0b1db45)
- 4. Towards AI: [Anomaly Detection with MIDAS](https://medium.com/towards-artificial-intelligence/anomaly-detection-with-midas-2735a2e6dce8)
- 5. [AIhub Interview](https://aihub.org/2020/05/01/interview-with-siddharth-bhatia-a-new-approach-for-anomaly-detection/)
+ #### `PreprocessData.py`
 
- ## MIDAS in Other Languages
+ The code to process the raw dataset into an easy-to-read format.
+ Datasets are always assumed to be in a folder in `MIDAS/data/`.
+ It can process the following dataset(s)
 
- 1. [Golang](https://github.com/steve0hh/midas) by [Steve Tan](https://github.com/steve0hh)
- 2. [Ruby](https://github.com/ankane/midas) by [Andrew Kane](https://github.com/ankane)
- 3. [Rust](https://github.com/scooter-dangle/midas_rs) by [Scott Steele](https://github.com/scooter-dangle)
- 4. [R](https://github.com/pteridin/MIDASwrappeR) by [Tobias Heidler](https://github.com/pteridin)
- 5. [Python](https://github.com/ritesh99rakesh/pyMIDAS) by [Ritesh Kumar](https://github.com/ritesh99rakesh)
+ - `DARPA/darpa_original.csv` -> `DARPA/darpa_processed.csv`, `DARPA/darpa_ground_truth.csv`, `DARPA/darpa_shape.txt`
+
+ ## In Other Languages
+
+ 1. Python: [Rui Liu's MIDAS.Python](https://github.com/liurui39660/MIDAS.Python), [Ritesh Kumar's pyMIDAS](https://github.com/ritesh99rakesh/pyMIDAS)
+ 1. Golang: [Steve Tan's midas](https://github.com/steve0hh/midas)
+ 1. Ruby: [Andrew Kane's midas](https://github.com/ankane/midas)
+ 1. Rust: [Scott Steele's midas_rs](https://github.com/scooter-dangle/midas_rs)
+ 1. R: [Tobias Heidler's MIDASwrappeR](https://github.com/pteridin/MIDASwrappeR)
+ 1. Java: [Joshua Tokle's MIDAS-Java](https://github.com/jotok/MIDAS-Java)
+ 1. Julia: [Ashrya Agrawal's MIDAS.jl](https://github.com/ashryaagr/MIDAS.jl)
+
+ ## Online Coverage
+
+ 1. [ACM TechNews](https://technews.acm.org/archives.cfm?fo=2020-05-may/may-06-2020.html)
+ 1. [AIhub](https://aihub.org/2020/05/01/interview-with-siddharth-bhatia-a-new-approach-for-anomaly-detection/)
+ 1. [Hacker News](https://news.ycombinator.com/item?id=22802604)
+ 1. [KDnuggets](https://www.kdnuggets.com/2020/04/midas-new-baseline-anomaly-detection-graphs.html)
+ 1. [Microsoft](https://techcommunity.microsoft.com/t5/azure-sentinel/announcing-the-azure-sentinel-hackathon-winners/ba-p/1548240)
+ 1. [Towards Data Science](https://towardsdatascience.com/controlling-fake-news-using-graphs-and-statistics-31ed116a986f)
 
  ## Citation
 
- If you use this code for your research, please consider citing our paper.
+ If you use this code for your research, please consider citing our arXiv preprint
+
+ ```bibtex
+ @misc{bhatia2020realtime,
+ title={Real-Time Streaming Anomaly Detection in Dynamic Graphs},
+ author={Siddharth Bhatia and Rui Liu and Bryan Hooi and Minji Yoon and Kijung Shin and Christos Faloutsos},
+ year={2020},
+ eprint={2009.08452},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG}
+ }
 
  ```
+
+ or our AAAI paper
+
+
+ ```bibtex
  @inproceedings{bhatia2020midas,
  title="MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams",
  author="Siddharth {Bhatia} and Bryan {Hooi} and Minji {Yoon} and Kijung {Shin} and Christos {Faloutsos}",
  booktitle="AAAI 2020 : The Thirty-Fourth AAAI Conference on Artificial Intelligence",
  year="2020"
  }
- ```
+ ```
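
The vendored README describes the data file as a header-less CSV of shape `[N,3]` whose columns are sources, destinations and timestamps. A minimal sketch of reading that layout for a custom runner might look like the following; `EdgeRecord` and `ParseEdgeCsv` are hypothetical names chosen for illustration and are not part of MIDAS or this gem.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for illustration; not part of MIDAS.
struct EdgeRecord {
    int source;
    int destination;
    int timestamp;
};

// Parse header-less CSV text where each line is "source,destination,timestamp".
// Lines that do not match this shape are skipped.
std::vector<EdgeRecord> ParseEdgeCsv(const std::string& text) {
    std::vector<EdgeRecord> records;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        EdgeRecord r{};
        char comma1 = 0, comma2 = 0;
        if (fields >> r.source >> comma1 >> r.destination >> comma2 >> r.timestamp
            && comma1 == ',' && comma2 == ',')
            records.push_back(r);
    }
    return records;
}
```

Each parsed record could then be passed, in timestamp order, to the `operator()` of whichever core is chosen.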
@@ -19,7 +19,7 @@
  #include <algorithm>
 
  namespace MIDAS {
- struct EdgeHash {
+ struct CountMinSketch {
  // Fields
  // --------------------------------------------------------------------------------
 
@@ -33,10 +33,10 @@ struct EdgeHash {
  // Methods
  // --------------------------------------------------------------------------------
 
- EdgeHash() = delete;
- EdgeHash& operator=(const EdgeHash& b) = delete;
+ CountMinSketch() = delete;
+ CountMinSketch& operator=(const CountMinSketch& b) = delete;
 
- EdgeHash(int numRow, int numColumn):
+ CountMinSketch(int numRow, int numColumn):
  r(numRow),
  c(numColumn),
  lenData(r * c),
@@ -50,7 +50,7 @@ struct EdgeHash {
  std::fill(data, data + lenData, 0);
  }
 
- EdgeHash(const EdgeHash& b):
+ CountMinSketch(const CountMinSketch& b):
  r(b.r),
  c(b.c),
  lenData(b.lenData),
@@ -62,7 +62,7 @@ struct EdgeHash {
  std::copy(b.data, b.data + lenData, data);
  }
 
- ~EdgeHash() {
+ ~CountMinSketch() {
  delete[] param1;
  delete[] param2;
  delete[] data;
@@ -73,10 +73,11 @@ struct EdgeHash {
  }
 
  void MultiplyAll(float by) const {
- std::for_each(data, data + lenData, [&](float& a) { a *= by; }); // Magic of vectorization
+ for (int i = 0, I = lenData; i < I; i++) // Vectorization
+ data[i] *= by;
  }
 
- void Hash(int a, int b, int* indexOut) const {
+ void Hash(int* indexOut, int a, int b = 0) const {
  for (int i = 0; i < r; i++) {
  indexOut[i] = ((a + m * b) * param1[i] + param2[i]) % c;
  indexOut[i] += i * c + (indexOut[i] < 0 ? c : 0);
@@ -90,10 +91,10 @@ struct EdgeHash {
  return least;
  }
 
- float Assign(const int* index, float to) const {
+ float Assign(const int* index, float with) const {
  for (int i = 0; i < r; i++)
- data[index[i]] = to;
- return to;
+ data[index[i]] = with;
+ return with;
  }
 
  void Add(const int* index, float by = 1) const {
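
The new `Hash(int* indexOut, int a, int b = 0)` signature suggests why `NodeHash` could be dropped: with `b` defaulted to 0, the edge formula `(a + m * b) * param1[i] + param2[i]` reduces to the node formula `a * param1[i] + param2[i]`. The sketch below illustrates that idea under stated assumptions. `SketchDemo` is a hypothetical stand-in, the value of `m` is not shown in the diff and is chosen arbitrarily, the row parameters are fixed instead of drawn from `rand()`, and 64-bit arithmetic is used to avoid signed overflow.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Illustrative count-min sketch; names mirror the diff, but this is not the vendored code.
struct SketchDemo {
    int r, c;
    int m = 104729;  // assumed multiplier for combining (a, b); its real value is not in the diff
    std::vector<std::int64_t> param1, param2;
    std::vector<float> data;

    SketchDemo(int numRow, int numColumn)
        : r(numRow), c(numColumn), param1(numRow), param2(numRow),
          data(numRow * numColumn, 0.f) {
        for (int i = 0; i < r; i++) {  // fixed parameters keep the demo deterministic
            param1[i] = 2 * i + 3;
            param2[i] = 7 * i + 1;
        }
    }

    // One method serves both cases: Hash(out, a, b) for edges, Hash(out, a) for nodes.
    void Hash(int* indexOut, int a, int b = 0) const {
        for (int i = 0; i < r; i++) {
            std::int64_t h =
                ((std::int64_t(a) + std::int64_t(m) * b) * param1[i] + param2[i]) % c;
            if (h < 0) h += c;              // keep the column in [0, c)
            indexOut[i] = int(h) + i * c;   // offset into row i of the flat array
        }
    }

    void Add(const int* index, float by = 1) {
        for (int i = 0; i < r; i++) data[index[i]] += by;
    }

    // Count-min estimate: the smallest cell over all rows.
    float operator()(const int* index) const {
        float least = std::numeric_limits<float>::infinity();
        for (int i = 0; i < r; i++) least = std::min(least, data[index[i]]);
        return least;
    }
};
```

Moving `indexOut` to the front of the parameter list is what allows `b` to take a default value, since C++ default arguments must be trailing.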
@@ -19,24 +19,27 @@
  #include <cmath>
  #include <algorithm>
 
- #include "EdgeHash.hpp"
- #include "NodeHash.hpp"
+ #include "CountMinSketch.hpp"
 
  namespace MIDAS {
  struct FilteringCore {
  const float threshold;
- int timestampCurrent = 1;
+ int timestamp = 1;
  const float factor;
- int* const indexEdge; // Pre-compute the index to-be-modified, thanks to the same structure of CMSs
+ const int lenData;
+ int* const indexEdge; // Pre-compute the index to-be-modified, thanks to the Same-Layout Assumption
  int* const indexSource;
  int* const indexDestination;
- EdgeHash numCurrentEdge, numTotalEdge, scoreEdge;
- NodeHash numCurrentSource, numTotalSource, scoreSource;
- NodeHash numCurrentDestination, numTotalDestination, scoreDestination;
+ CountMinSketch numCurrentEdge, numTotalEdge, scoreEdge;
+ CountMinSketch numCurrentSource, numTotalSource, scoreSource;
+ CountMinSketch numCurrentDestination, numTotalDestination, scoreDestination;
+ float timestampReciprocal = 0;
+ bool* const shouldMerge;
 
  FilteringCore(int numRow, int numColumn, float threshold, float factor = 0.5):
  threshold(threshold),
  factor(factor),
+ lenData(numRow * numColumn), // I assume all CMSs have same size, but Same-Layout Assumption is not that strict
  indexEdge(new int[numRow]),
  indexSource(new int[numRow]),
  indexDestination(new int[numRow]),
@@ -48,42 +51,43 @@ struct FilteringCore {
  scoreSource(numCurrentSource),
  numCurrentDestination(numRow, numColumn),
  numTotalDestination(numCurrentDestination),
- scoreDestination(numCurrentDestination) { }
+ scoreDestination(numCurrentDestination),
+ shouldMerge(new bool[numRow * numColumn]) { }
 
  virtual ~FilteringCore() {
  delete[] indexEdge;
  delete[] indexSource;
  delete[] indexDestination;
+ delete[] shouldMerge;
  }
 
  static float ComputeScore(float a, float s, float t) {
  return s == 0 ? 0 : pow(a + s - a * t, 2) / (s * (t - 1)); // If t == 1, then s == 0, so no need to check twice
  }
 
+ inline void ConditionalMerge(const float* current, float* total, const float* score) const {
+ for (int i = 0; i < lenData; i++)
+ shouldMerge[i] = score[i] < threshold;
+ for (int i = 0, I = lenData; i < I; i++) // Vectorization
+ total[i] += shouldMerge[i] * current[i] + (true - shouldMerge[i]) * total[i] * timestampReciprocal;
+ }
+
  float operator()(int source, int destination, int timestamp) {
- if (timestamp > timestampCurrent) {
- for (int i = 0; i < numCurrentEdge.lenData; i++)
- numTotalEdge.data[i] += scoreEdge.data[i] < threshold ?
- numCurrentEdge.data[i] : timestampCurrent - 1 ?
- numTotalEdge.data[i] / (timestampCurrent - 1) : 0;
- for (int i = 0; i < numCurrentSource.lenData; i++)
- numTotalSource.data[i] += scoreSource.data[i] < threshold ?
- numCurrentSource.data[i] : timestampCurrent - 1 ?
- numTotalSource.data[i] / (timestampCurrent - 1) : 0;
- for (int i = 0; i < numCurrentDestination.lenData; i++)
- numTotalDestination.data[i] += scoreDestination.data[i] < threshold ?
- numCurrentDestination.data[i] : timestampCurrent - 1 ?
- numTotalDestination.data[i] / (timestampCurrent - 1) : 0;
+ if (this->timestamp < timestamp) {
+ ConditionalMerge(numCurrentEdge.data, numTotalEdge.data, scoreEdge.data);
+ ConditionalMerge(numCurrentSource.data, numTotalSource.data, scoreSource.data);
+ ConditionalMerge(numCurrentDestination.data, numTotalDestination.data, scoreDestination.data);
  numCurrentEdge.MultiplyAll(factor);
  numCurrentSource.MultiplyAll(factor);
  numCurrentDestination.MultiplyAll(factor);
- timestampCurrent = timestamp;
+ timestampReciprocal = 1.f / (timestamp - 1); // So I can skip an if-statement
+ this->timestamp = timestamp;
  }
- numCurrentEdge.Hash(source, destination, indexEdge);
+ numCurrentEdge.Hash(indexEdge, source, destination);
  numCurrentEdge.Add(indexEdge);
- numCurrentSource.Hash(source, indexSource);
+ numCurrentSource.Hash(indexSource, source);
  numCurrentSource.Add(indexSource);
- numCurrentDestination.Hash(destination, indexDestination);
+ numCurrentDestination.Hash(indexDestination, destination);
  numCurrentDestination.Add(indexDestination);
  return std::max({
  scoreEdge.Assign(indexEdge, ComputeScore(numCurrentEdge(indexEdge), numTotalEdge(indexEdge), timestamp)),
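
The added `ConditionalMerge` appears to replace the three nested-ternary loops with a 0/1 mask and a precomputed reciprocal. The following sketch compares the two forms on plain vectors. `MergeBranching` and `MergeMasked` are hypothetical names written for this comparison, and the sketch assumes finite inputs. Because the masked form multiplies by a reciprocal instead of dividing, the two can differ in the last bits unless the divisor is a power of two.

```cpp
#include <cstddef>
#include <vector>

// Branching form, modelled on the removed 0.2.1 loops: merge the current counts when
// the score is below the threshold, otherwise add the running average (0 when t == 1).
void MergeBranching(const std::vector<float>& current, std::vector<float>& total,
                    const std::vector<float>& score, float threshold, int timestampCurrent) {
    for (std::size_t i = 0; i < total.size(); i++)
        total[i] += score[i] < threshold ? current[i]
                  : timestampCurrent - 1 ? total[i] / (timestampCurrent - 1) : 0;
}

// Masked form, modelled on the added ConditionalMerge: a 0/1 factor selects the term.
// A reciprocal of 0 plays the role of the old "t == 1" special case.
void MergeMasked(const std::vector<float>& current, std::vector<float>& total,
                 const std::vector<float>& score, float threshold, float timestampReciprocal) {
    for (std::size_t i = 0; i < total.size(); i++) {
        const float keep = score[i] < threshold;  // 1.0f or 0.0f
        total[i] += keep * current[i] + (1 - keep) * total[i] * timestampReciprocal;
    }
}
```

In the diff, `timestampReciprocal` is refreshed after each merge from the newly stored timestamp, so the next merge sees the reciprocal of the same quantity the old code divided by.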
@@ -18,13 +18,13 @@
 
  #include <cmath>
 
- #include "EdgeHash.hpp"
+ #include "CountMinSketch.hpp"
 
  namespace MIDAS {
  struct NormalCore {
- int timestampCurrent = 1;
+ int timestamp = 1;
  int* const index; // Pre-compute the index to-be-modified, thanks to the same structure of CMSs
- EdgeHash numCurrent, numTotal;
+ CountMinSketch numCurrent, numTotal;
 
  NormalCore(int numRow, int numColumn):
  index(new int[numRow]),
@@ -40,11 +40,11 @@ struct NormalCore {
  }
 
  float operator()(int source, int destination, int timestamp) {
- if (timestamp > timestampCurrent) {
+ if (this->timestamp < timestamp) {
  numCurrent.ClearAll();
- timestampCurrent = timestamp;
+ this->timestamp = timestamp;
  }
- numCurrent.Hash(source, destination, index);
+ numCurrent.Hash(index, source, destination);
  numCurrent.Add(index);
  numTotal.Add(index);
  return ComputeScore(numCurrent(index), numTotal(index), timestamp);
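
Renaming the member from `timestampCurrent` to `timestamp` means the `operator()` parameter now shadows it, which is why the new code writes `this->timestamp` on the member side of each comparison and assignment. A small sketch of that pattern follows. `TimestampDemo` is a hypothetical stand-in that only counts how often the per-timestamp state would be reset.

```cpp
// Hypothetical stand-in for a core; only the timestamp handling is modelled.
struct TimestampDemo {
    int timestamp = 1;  // member: the last timestamp seen
    int resets = 0;     // how many times the "current" state would have been cleared

    void operator()(int timestamp) {        // the parameter shadows the member
        if (this->timestamp < timestamp) {  // member < parameter: time advanced
            resets++;                       // NormalCore calls numCurrent.ClearAll() here
            this->timestamp = timestamp;    // store the parameter into the member
        }
    }
};
```

A record carrying an older timestamp than the stored one leaves the state untouched, which matches the strict `<` comparison in the diff.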
@@ -19,19 +19,18 @@
  #include <cmath>
  #include <algorithm>
 
- #include "EdgeHash.hpp"
- #include "NodeHash.hpp"
+ #include "CountMinSketch.hpp"
 
  namespace MIDAS {
  struct RelationalCore {
- int timestampCurrent = 1;
+ int timestamp = 1;
  const float factor;
  int* const indexEdge; // Pre-compute the index to-be-modified, thanks to the same structure of CMSs
  int* const indexSource;
  int* const indexDestination;
- EdgeHash numCurrentEdge, numTotalEdge;
- NodeHash numCurrentSource, numTotalSource;
- NodeHash numCurrentDestination, numTotalDestination;
+ CountMinSketch numCurrentEdge, numTotalEdge;
+ CountMinSketch numCurrentSource, numTotalSource;
+ CountMinSketch numCurrentDestination, numTotalDestination;
 
  RelationalCore(int numRow, int numColumn, float factor = 0.5):
  factor(factor),
@@ -56,19 +55,19 @@ struct RelationalCore {
  }
 
  float operator()(int source, int destination, int timestamp) {
- if (timestamp > timestampCurrent) {
+ if (this->timestamp < timestamp) {
  numCurrentEdge.MultiplyAll(factor);
  numCurrentSource.MultiplyAll(factor);
  numCurrentDestination.MultiplyAll(factor);
- timestampCurrent = timestamp;
+ this->timestamp = timestamp;
  }
- numCurrentEdge.Hash(source, destination, indexEdge);
+ numCurrentEdge.Hash(indexEdge, source, destination);
  numCurrentEdge.Add(indexEdge);
  numTotalEdge.Add(indexEdge);
- numCurrentSource.Hash(source, indexSource);
+ numCurrentSource.Hash(indexSource, source);
  numCurrentSource.Add(indexSource);
  numTotalSource.Add(indexSource);
- numCurrentDestination.Hash(destination, indexDestination);
+ numCurrentDestination.Hash(indexDestination, destination);
  numCurrentDestination.Add(indexDestination);
  numTotalDestination.Add(indexDestination);
  return std::max({
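
Unlike `NormalCore`, which clears its current counters when the timestamp advances, `RelationalCore` scales them by `factor`. The sketch below models that decay with exact per-cell counters in place of a count-min sketch. `DecayDemo` and `Observe` are hypothetical names, and the cell index stands in for the hashed positions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "current" counters of a core; not the vendored class.
struct DecayDemo {
    int timestamp = 1;           // last timestamp seen, as in the diff
    const float factor;          // decay applied when time advances
    std::vector<float> current;  // exact counters instead of a count-min sketch

    DecayDemo(float factor, std::size_t cells) : factor(factor), current(cells, 0.f) {}

    // Same shape as the plain loop that replaced std::for_each in MultiplyAll.
    void MultiplyAll(float by) {
        for (std::size_t i = 0, n = current.size(); i < n; i++)
            current[i] *= by;
    }

    void Observe(std::size_t cell, int timestamp) {
        if (this->timestamp < timestamp) {  // time advanced: decay once, whatever the gap
            MultiplyAll(factor);
            this->timestamp = timestamp;
        }
        current[cell] += 1;
    }
};
```

Note that, as in the diff, the decay is applied once per observed timestamp change, not once per elapsed tick, so a jump from 2 to 5 scales the counters by `factor` a single time.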
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: midas-edge
  version: !ruby/object:Gem::Version
- version: 0.2.1
+ version: 0.2.2
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-06-18 00:00:00.000000000 Z
+ date: 2020-09-23 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rice
@@ -112,9 +112,8 @@ files:
  - lib/midas/version.rb
  - vendor/MIDAS/LICENSE
  - vendor/MIDAS/README.md
- - vendor/MIDAS/src/EdgeHash.hpp
+ - vendor/MIDAS/src/CountMinSketch.hpp
  - vendor/MIDAS/src/FilteringCore.hpp
- - vendor/MIDAS/src/NodeHash.hpp
  - vendor/MIDAS/src/NormalCore.hpp
  - vendor/MIDAS/src/RelationalCore.hpp
  homepage: https://github.com/ankane/midas
@@ -1,104 +0,0 @@
- // -----------------------------------------------------------------------------
- // Copyright 2020 Rui Liu (liurui39660) and Siddharth Bhatia (bhatiasiddharth)
- //
- // Licensed under the Apache License, Version 2.0 (the "License");
- // you may not use this file except in compliance with the License.
- // You may obtain a copy of the License at
- //
- // http://www.apache.org/licenses/LICENSE-2.0
- //
- // Unless required by applicable law or agreed to in writing, software
- // distributed under the License is distributed on an "AS IS" BASIS,
- // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- // See the License for the specific language governing permissions and
- // limitations under the License.
- // -----------------------------------------------------------------------------
-
- #pragma once
-
- #include <algorithm>
-
- namespace MIDAS {
- struct NodeHash {
- // Fields
- // --------------------------------------------------------------------------------
-
- const int r, c;
- const int lenData;
- int* const param1;
- int* const param2;
- float* const data;
- constexpr static float infinity = std::numeric_limits<float>::infinity();
-
- // Methods
- // --------------------------------------------------------------------------------
-
- NodeHash() = delete;
- NodeHash& operator=(const NodeHash& b) = delete;
-
- NodeHash(int numRow, int numColumn):
- r(numRow),
- c(numColumn),
- lenData(r * c),
- param1(new int[r]),
- param2(new int[r]),
- data(new float[lenData]) {
- for (int i = 0; i < r; i++) {
- param1[i] = rand() + 1; // ×0 is not a good idea, see Hash()
- param2[i] = rand();
- }
- std::fill(data, data + lenData, 0);
- }
-
- NodeHash(const NodeHash& b):
- r(b.r),
- c(b.c),
- lenData(b.lenData),
- param1(new int[r]),
- param2(new int[r]),
- data(new float[lenData]) {
- std::copy(b.param1, b.param1 + r, param1);
- std::copy(b.param2, b.param2 + r, param2);
- std::copy(b.data, b.data + lenData, data);
- }
-
- ~NodeHash() {
- delete[] param1;
- delete[] param2;
- delete[] data;
- }
-
- void ClearAll(float with = 0) const {
- std::fill(data, data + lenData, with);
- }
-
- void MultiplyAll(float by) const {
- std::for_each(data, data + lenData, [&](float& a) { a *= by; }); // Magic of vectorization
- }
-
- void Hash(int a, int* indexOut) const {
- for (int i = 0; i < r; i++) {
- indexOut[i] = (a * param1[i] + param2[i]) % c;
- indexOut[i] += i * c + (indexOut[i] < 0 ? c : 0);
- }
- }
-
- float operator()(const int* index) const {
- float least = infinity;
- for (int i = 0; i < r; i++)
- least = std::min(least, data[index[i]]);
- return least;
- }
-
- float Assign(const int* index, float to) const {
- for (int i = 0; i < r; i++)
- data[index[i]] = to;
- return to;
- }
-
- void Add(const int* index, float by = 1) const {
- for (int i = 0; i < r; i++)
- data[index[i]] += by;
- }
- };
- }