datasketches 0.3.1 → 0.3.2

Files changed (113)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +4 -0
  3. data/ext/datasketches/cpc_wrapper.cpp +1 -1
  4. data/lib/datasketches/version.rb +1 -1
  5. data/vendor/datasketches-cpp/CMakeLists.txt +22 -20
  6. data/vendor/datasketches-cpp/NOTICE +1 -1
  7. data/vendor/datasketches-cpp/common/include/MurmurHash3.h +25 -27
  8. data/vendor/datasketches-cpp/common/include/common_defs.hpp +8 -6
  9. data/vendor/datasketches-cpp/common/include/count_zeros.hpp +11 -0
  10. data/vendor/datasketches-cpp/common/include/memory_operations.hpp +5 -4
  11. data/vendor/datasketches-cpp/common/test/CMakeLists.txt +1 -1
  12. data/vendor/datasketches-cpp/common/test/integration_test.cpp +6 -0
  13. data/vendor/datasketches-cpp/count/CMakeLists.txt +42 -0
  14. data/vendor/datasketches-cpp/count/include/count_min.hpp +351 -0
  15. data/vendor/datasketches-cpp/count/include/count_min_impl.hpp +517 -0
  16. data/vendor/datasketches-cpp/count/test/CMakeLists.txt +43 -0
  17. data/vendor/datasketches-cpp/count/test/count_min_allocation_test.cpp +155 -0
  18. data/vendor/datasketches-cpp/count/test/count_min_test.cpp +306 -0
  19. data/vendor/datasketches-cpp/cpc/include/cpc_confidence.hpp +3 -3
  20. data/vendor/datasketches-cpp/cpc/include/cpc_sketch_impl.hpp +1 -1
  21. data/vendor/datasketches-cpp/cpc/include/cpc_util.hpp +16 -8
  22. data/vendor/datasketches-cpp/density/CMakeLists.txt +42 -0
  23. data/vendor/datasketches-cpp/density/include/density_sketch.hpp +236 -0
  24. data/vendor/datasketches-cpp/density/include/density_sketch_impl.hpp +543 -0
  25. data/vendor/datasketches-cpp/density/test/CMakeLists.txt +35 -0
  26. data/vendor/datasketches-cpp/density/test/density_sketch_test.cpp +244 -0
  27. data/vendor/datasketches-cpp/fi/include/reverse_purge_hash_map.hpp +9 -3
  28. data/vendor/datasketches-cpp/hll/include/Hll4Array-internal.hpp +19 -11
  29. data/vendor/datasketches-cpp/hll/include/Hll4Array.hpp +2 -5
  30. data/vendor/datasketches-cpp/hll/include/Hll6Array-internal.hpp +19 -7
  31. data/vendor/datasketches-cpp/hll/include/Hll6Array.hpp +1 -1
  32. data/vendor/datasketches-cpp/hll/include/Hll8Array-internal.hpp +98 -42
  33. data/vendor/datasketches-cpp/hll/include/Hll8Array.hpp +2 -0
  34. data/vendor/datasketches-cpp/hll/include/HllArray-internal.hpp +92 -59
  35. data/vendor/datasketches-cpp/hll/include/HllArray.hpp +16 -6
  36. data/vendor/datasketches-cpp/hll/include/HllSketchImplFactory.hpp +3 -21
  37. data/vendor/datasketches-cpp/hll/include/HllUnion-internal.hpp +8 -0
  38. data/vendor/datasketches-cpp/hll/include/HllUtil.hpp +14 -6
  39. data/vendor/datasketches-cpp/hll/include/coupon_iterator-internal.hpp +1 -1
  40. data/vendor/datasketches-cpp/hll/include/coupon_iterator.hpp +8 -2
  41. data/vendor/datasketches-cpp/hll/include/hll.hpp +9 -8
  42. data/vendor/datasketches-cpp/hll/test/HllUnionTest.cpp +7 -1
  43. data/vendor/datasketches-cpp/kll/include/kll_helper.hpp +0 -1
  44. data/vendor/datasketches-cpp/kll/include/kll_sketch.hpp +8 -3
  45. data/vendor/datasketches-cpp/kll/include/kll_sketch_impl.hpp +2 -2
  46. data/vendor/datasketches-cpp/kll/test/kll_sketch_test.cpp +2 -2
  47. data/vendor/datasketches-cpp/python/CMakeLists.txt +6 -0
  48. data/vendor/datasketches-cpp/python/README.md +5 -5
  49. data/vendor/datasketches-cpp/python/datasketches/DensityWrapper.py +87 -0
  50. data/vendor/datasketches-cpp/python/datasketches/KernelFunction.py +35 -0
  51. data/vendor/datasketches-cpp/python/datasketches/PySerDe.py +15 -9
  52. data/vendor/datasketches-cpp/python/datasketches/TuplePolicy.py +77 -0
  53. data/vendor/datasketches-cpp/python/datasketches/TupleWrapper.py +205 -0
  54. data/vendor/datasketches-cpp/python/datasketches/__init__.py +17 -1
  55. data/vendor/datasketches-cpp/python/include/kernel_function.hpp +98 -0
  56. data/vendor/datasketches-cpp/python/include/py_object_lt.hpp +37 -0
  57. data/vendor/datasketches-cpp/python/include/py_object_ostream.hpp +48 -0
  58. data/vendor/datasketches-cpp/python/include/quantile_conditional.hpp +104 -0
  59. data/vendor/datasketches-cpp/python/include/tuple_policy.hpp +136 -0
  60. data/vendor/datasketches-cpp/python/src/count_wrapper.cpp +101 -0
  61. data/vendor/datasketches-cpp/python/src/cpc_wrapper.cpp +16 -30
  62. data/vendor/datasketches-cpp/python/src/datasketches.cpp +6 -0
  63. data/vendor/datasketches-cpp/python/src/density_wrapper.cpp +95 -0
  64. data/vendor/datasketches-cpp/python/src/fi_wrapper.cpp +127 -73
  65. data/vendor/datasketches-cpp/python/src/hll_wrapper.cpp +28 -36
  66. data/vendor/datasketches-cpp/python/src/kll_wrapper.cpp +108 -160
  67. data/vendor/datasketches-cpp/python/src/py_serde.cpp +5 -4
  68. data/vendor/datasketches-cpp/python/src/quantiles_wrapper.cpp +99 -148
  69. data/vendor/datasketches-cpp/python/src/req_wrapper.cpp +117 -178
  70. data/vendor/datasketches-cpp/python/src/theta_wrapper.cpp +67 -73
  71. data/vendor/datasketches-cpp/python/src/tuple_wrapper.cpp +215 -0
  72. data/vendor/datasketches-cpp/python/src/vo_wrapper.cpp +1 -1
  73. data/vendor/datasketches-cpp/python/tests/count_min_test.py +86 -0
  74. data/vendor/datasketches-cpp/python/tests/cpc_test.py +10 -10
  75. data/vendor/datasketches-cpp/python/tests/density_test.py +93 -0
  76. data/vendor/datasketches-cpp/python/tests/fi_test.py +41 -2
  77. data/vendor/datasketches-cpp/python/tests/hll_test.py +19 -20
  78. data/vendor/datasketches-cpp/python/tests/kll_test.py +40 -6
  79. data/vendor/datasketches-cpp/python/tests/quantiles_test.py +39 -5
  80. data/vendor/datasketches-cpp/python/tests/req_test.py +38 -5
  81. data/vendor/datasketches-cpp/python/tests/theta_test.py +16 -14
  82. data/vendor/datasketches-cpp/python/tests/tuple_test.py +206 -0
  83. data/vendor/datasketches-cpp/python/tests/vo_test.py +7 -0
  84. data/vendor/datasketches-cpp/quantiles/include/quantiles_sketch.hpp +8 -3
  85. data/vendor/datasketches-cpp/quantiles/include/quantiles_sketch_impl.hpp +4 -4
  86. data/vendor/datasketches-cpp/quantiles/test/quantiles_sketch_test.cpp +1 -1
  87. data/vendor/datasketches-cpp/req/include/req_compactor_impl.hpp +0 -2
  88. data/vendor/datasketches-cpp/req/include/req_sketch.hpp +8 -3
  89. data/vendor/datasketches-cpp/req/include/req_sketch_impl.hpp +2 -2
  90. data/vendor/datasketches-cpp/sampling/include/var_opt_sketch.hpp +20 -6
  91. data/vendor/datasketches-cpp/sampling/include/var_opt_sketch_impl.hpp +30 -16
  92. data/vendor/datasketches-cpp/sampling/include/var_opt_union.hpp +5 -1
  93. data/vendor/datasketches-cpp/sampling/include/var_opt_union_impl.hpp +19 -15
  94. data/vendor/datasketches-cpp/sampling/test/var_opt_sketch_test.cpp +33 -14
  95. data/vendor/datasketches-cpp/sampling/test/var_opt_union_test.cpp +0 -2
  96. data/vendor/datasketches-cpp/setup.py +1 -1
  97. data/vendor/datasketches-cpp/theta/CMakeLists.txt +1 -0
  98. data/vendor/datasketches-cpp/theta/include/bit_packing.hpp +6279 -0
  99. data/vendor/datasketches-cpp/theta/include/compact_theta_sketch_parser.hpp +14 -8
  100. data/vendor/datasketches-cpp/theta/include/compact_theta_sketch_parser_impl.hpp +60 -46
  101. data/vendor/datasketches-cpp/theta/include/theta_helpers.hpp +4 -2
  102. data/vendor/datasketches-cpp/theta/include/theta_sketch.hpp +58 -10
  103. data/vendor/datasketches-cpp/theta/include/theta_sketch_impl.hpp +430 -130
  104. data/vendor/datasketches-cpp/theta/include/theta_union_base_impl.hpp +9 -9
  105. data/vendor/datasketches-cpp/theta/include/theta_update_sketch_base.hpp +16 -4
  106. data/vendor/datasketches-cpp/theta/include/theta_update_sketch_base_impl.hpp +2 -2
  107. data/vendor/datasketches-cpp/theta/test/CMakeLists.txt +1 -0
  108. data/vendor/datasketches-cpp/theta/test/bit_packing_test.cpp +80 -0
  109. data/vendor/datasketches-cpp/theta/test/theta_sketch_test.cpp +42 -3
  110. data/vendor/datasketches-cpp/theta/test/theta_union_test.cpp +25 -0
  111. data/vendor/datasketches-cpp/tuple/include/tuple_sketch_impl.hpp +2 -1
  112. data/vendor/datasketches-cpp/version.cfg.in +1 -1
  113. metadata +31 -3
@@ -43,20 +43,44 @@ kxq1_(0.0),
43
43
  hllByteArr_(allocator),
44
44
  curMin_(0),
45
45
  numAtCurMin_(1 << lgConfigK),
46
- oooFlag_(false)
46
+ oooFlag_(false),
47
+ rebuild_kxq_curmin_(false)
48
+ {}
49
+
50
+ template<typename A>
51
+ HllArray<A>::HllArray(const HllArray& other, target_hll_type tgtHllType) :
52
+ HllSketchImpl<A>(other.getLgConfigK(), tgtHllType, hll_mode::HLL, other.isStartFullSize()),
53
+ // remaining fields are initialized to empty sketch defaults
54
+ // and left to subclass constructor to populate
55
+ hipAccum_(0.0),
56
+ kxq0_(1 << other.getLgConfigK()),
57
+ kxq1_(0.0),
58
+ hllByteArr_(other.getAllocator()),
59
+ curMin_(0),
60
+ numAtCurMin_(1 << other.getLgConfigK()),
61
+ oooFlag_(false),
62
+ rebuild_kxq_curmin_(false)
47
63
  {}
48
64
 
49
65
  template<typename A>
50
66
  HllArray<A>* HllArray<A>::copyAs(target_hll_type tgtHllType) const {
51
- if (tgtHllType == this->getTgtHllType()) {
67
+ // we may need to recompute KxQ and curMin data for a union gadget,
68
+ // so only use a direct copy if we have a valid sketch
69
+ if (tgtHllType == this->getTgtHllType() && !this->isRebuildKxqCurminFlag()) {
52
70
  return static_cast<HllArray*>(copy());
53
71
  }
54
- if (tgtHllType == target_hll_type::HLL_4) {
55
- return HllSketchImplFactory<A>::convertToHll4(*this);
56
- } else if (tgtHllType == target_hll_type::HLL_6) {
57
- return HllSketchImplFactory<A>::convertToHll6(*this);
58
- } else { // tgtHllType == HLL_8
59
- return HllSketchImplFactory<A>::convertToHll8(*this);
72
+
73
+ // the factory methods replay the coupons and will always rebuild
74
+ // the sketch in a consistent way
75
+ switch (tgtHllType) {
76
+ case target_hll_type::HLL_4:
77
+ return HllSketchImplFactory<A>::convertToHll4(*this);
78
+ case target_hll_type::HLL_6:
79
+ return HllSketchImplFactory<A>::convertToHll6(*this);
80
+ case target_hll_type::HLL_8:
81
+ return HllSketchImplFactory<A>::convertToHll8(*this);
82
+ default:
83
+ throw std::invalid_argument("Invalid target HLL type");
60
84
  }
61
85
  }
62
86
 
@@ -299,7 +323,7 @@ double HllArray<A>::getEstimate() const {
299
323
  if (oooFlag_) {
300
324
  return getCompositeEstimate();
301
325
  }
302
- return getHipAccum();
326
+ return hipAccum_;
303
327
  }
304
328
 
305
329
  // HLL UPPER AND LOWER BOUNDS
@@ -322,54 +346,20 @@ double HllArray<A>::getLowerBound(uint8_t numStdDev) const {
322
346
  HllUtil<A>::checkNumStdDev(numStdDev);
323
347
  const uint32_t configK = 1 << this->lgConfigK_;
324
348
  const double numNonZeros = ((curMin_ == 0) ? (configK - numAtCurMin_) : configK);
325
-
326
- double estimate;
327
- double rseFactor;
328
- if (oooFlag_) {
329
- estimate = getCompositeEstimate();
330
- rseFactor = hll_constants::HLL_NON_HIP_RSE_FACTOR;
331
- } else {
332
- estimate = hipAccum_;
333
- rseFactor = hll_constants::HLL_HIP_RSE_FACTOR;
334
- }
335
-
336
- double relErr;
337
- if (this->lgConfigK_ > 12) {
338
- relErr = (numStdDev * rseFactor) / sqrt(configK);
339
- } else {
340
- relErr = HllUtil<A>::getRelErr(false, oooFlag_, this->lgConfigK_, numStdDev);
341
- }
342
- return fmax(estimate / (1.0 + relErr), numNonZeros);
349
+ const double relErr = HllUtil<A>::getRelErr(false, this->oooFlag_, this->lgConfigK_, numStdDev);
350
+ return fmax(getEstimate() / (1.0 + relErr), numNonZeros);
343
351
  }
344
352
 
345
353
  template<typename A>
346
354
  double HllArray<A>::getUpperBound(uint8_t numStdDev) const {
347
355
  HllUtil<A>::checkNumStdDev(numStdDev);
348
- const uint32_t configK = 1 << this->lgConfigK_;
349
-
350
- double estimate;
351
- double rseFactor;
352
- if (oooFlag_) {
353
- estimate = getCompositeEstimate();
354
- rseFactor = hll_constants::HLL_NON_HIP_RSE_FACTOR;
355
- } else {
356
- estimate = hipAccum_;
357
- rseFactor = hll_constants::HLL_HIP_RSE_FACTOR;
358
- }
359
-
360
- double relErr;
361
- if (this->lgConfigK_ > 12) {
362
- relErr = (-1.0) * (numStdDev * rseFactor) / sqrt(configK);
363
- } else {
364
- relErr = HllUtil<A>::getRelErr(true, oooFlag_, this->lgConfigK_, numStdDev);
365
- }
366
- return estimate / (1.0 + relErr);
356
+ const double relErr = HllUtil<A>::getRelErr(true, this->oooFlag_, this->lgConfigK_, numStdDev);
357
+ return getEstimate() / (1.0 + relErr);
367
358
  }
368
359
 
369
360
  /**
370
361
  * This is the (non-HIP) estimator.
371
362
  * It is called "composite" because multiple estimators are pasted together.
372
- * @param absHllArr an instance of the AbstractHllArray class.
373
363
  * @return the composite estimate
374
364
  */
375
365
  // Original C: again-two-registers.c hhb_get_composite_estimate L1489
@@ -468,16 +458,6 @@ void HllArray<A>::putNumAtCurMin(uint32_t numAtCurMin) {
468
458
  numAtCurMin_ = numAtCurMin;
469
459
  }
470
460
 
471
- template<typename A>
472
- void HllArray<A>::decNumAtCurMin() {
473
- --numAtCurMin_;
474
- }
475
-
476
- template<typename A>
477
- void HllArray<A>::addToHipAccum(double delta) {
478
- hipAccum_ += delta;
479
- }
480
-
481
461
  template<typename A>
482
462
  bool HllArray<A>::isCompact() const {
483
463
  return false;
@@ -486,7 +466,7 @@ bool HllArray<A>::isCompact() const {
486
466
  template<typename A>
487
467
  bool HllArray<A>::isEmpty() const {
488
468
  const uint32_t configK = 1 << this->lgConfigK_;
489
- return (getCurMin() == 0) && (getNumAtCurMin() == configK);
469
+ return (curMin_ == 0) && (numAtCurMin_ == configK);
490
470
  }
491
471
 
492
472
  template<typename A>
@@ -556,6 +536,11 @@ AuxHashMap<A>* HllArray<A>::getAuxHashMap() const {
556
536
  return nullptr;
557
537
  }
558
538
 
539
+ template<typename A>
540
+ const vector_u8<A>& HllArray<A>::getHllArray() const {
541
+ return hllByteArr_;
542
+ }
543
+
559
544
  template<typename A>
560
545
  void HllArray<A>::hipAndKxQIncrementalUpdate(uint8_t oldValue, uint8_t newValue) {
561
546
  const uint32_t configK = 1 << this->getLgConfigK();
@@ -601,6 +586,52 @@ double HllArray<A>::getHllRawEstimate() const {
601
586
  return hyperEst;
602
587
  }
603
588
 
589
+ template<typename A>
590
+ void HllArray<A>::setRebuildKxqCurminFlag(bool rebuild) {
591
+ rebuild_kxq_curmin_ = rebuild;
592
+ }
593
+
594
+ template<typename A>
595
+ bool HllArray<A>::isRebuildKxqCurminFlag() const {
596
+ return rebuild_kxq_curmin_;
597
+ }
598
+
599
+ template<typename A>
600
+ void HllArray<A>::check_rebuild_kxq_cur_min() {
601
+ if (!rebuild_kxq_curmin_) { return; }
602
+
603
+ uint8_t cur_min = 64;
604
+ uint32_t num_at_cur_min = 0;
605
+ double kxq0 = 1 << this->lgConfigK_;
606
+ double kxq1 = 0;
607
+
608
+ auto it = this->begin(true); // want all points to adjust cur_min
609
+ const auto end = this->end();
610
+ while (it != end) {
611
+ uint8_t v = HllUtil<A>::getValue(*it);
612
+ if (v > 0) {
613
+ if (v < 32) { kxq0 += INVERSE_POWERS_OF_2[v] - 1.0; }
614
+ else { kxq1 += INVERSE_POWERS_OF_2[v] - 1.0; }
615
+ }
616
+ if (v > cur_min) { ++it; continue; }
617
+ if (v < cur_min) {
618
+ cur_min = v;
619
+ num_at_cur_min = 1;
620
+ } else {
621
+ ++num_at_cur_min;
622
+ }
623
+ ++it;
624
+ }
625
+
626
+ kxq0_ = kxq0;
627
+ kxq1_ = kxq1;
628
+ curMin_ = cur_min;
629
+ numAtCurMin_ = num_at_cur_min;
630
+ rebuild_kxq_curmin_ = false;
631
+ // HipAccum is not affected
632
+
633
+ }
634
+
604
635
  template<typename A>
605
636
  typename HllArray<A>::const_iterator HllArray<A>::begin(bool all) const {
606
637
  return const_iterator(hllByteArr_.data(), 1 << this->lgConfigK_, 0, this->tgtHllType_, nullptr, 0, all);
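The rebuild step added in check_rebuild_kxq_cur_min() above can be read as a single pass over the register values. The following is a minimal standalone sketch of that pass, not the library's code: `RebuiltState`, `rebuild_kxq_cur_min` and the plain `std::vector<uint8_t>` input are hypothetical stand-ins for the member fields and the iterator used in the diff, and `std::ldexp` is assumed to be an acceptable replacement for the `INVERSE_POWERS_OF_2` lookup table.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical holder for the derived state; not a type from the library.
struct RebuiltState {
  double kxq0;
  double kxq1;
  uint8_t cur_min;
  uint32_t num_at_cur_min;
};

// Recompute the KxQ sums and the curMin bookkeeping from raw register values,
// following the loop shown in the diff. The number of registers plays the
// role of configK (1 << lgConfigK).
inline RebuiltState rebuild_kxq_cur_min(const std::vector<uint8_t>& registers) {
  RebuiltState s{static_cast<double>(registers.size()), 0.0, 64, 0};
  for (const uint8_t v : registers) {
    if (v > 0) {
      // 2^-v - 1, computed directly instead of via a lookup table
      const double delta = std::ldexp(1.0, -static_cast<int>(v)) - 1.0;
      if (v < 32) { s.kxq0 += delta; } else { s.kxq1 += delta; }
    }
    if (v < s.cur_min) {
      // new minimum found: restart the count
      s.cur_min = v;
      s.num_at_cur_min = 1;
    } else if (v == s.cur_min) {
      ++s.num_at_cur_min;
    }
  }
  return s;
}
```

As in the diff, the HIP accumulator is deliberately left out of the rebuild, since only the KxQ sums, curMin and numAtCurMin are derived from the registers.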
@@ -637,12 +668,14 @@ bool HllArray<A>::const_iterator::operator!=(const const_iterator& other) const
637
668
  }
638
669
 
639
670
  template<typename A>
640
- uint32_t HllArray<A>::const_iterator::operator*() const {
671
+ auto HllArray<A>::const_iterator::operator*() const -> reference {
641
672
  return HllUtil<A>::pair(index_, value_);
642
673
  }
643
674
 
644
675
  template<typename A>
645
676
  uint8_t HllArray<A>::const_iterator::get_value(const uint8_t* array, uint32_t index, target_hll_type hll_type, const AuxHashMap<A>* exceptions, uint8_t offset) {
677
+ // TODO: we should be able to improve efficiency here by reading multiple bytes at a time
678
+ // for HLL4 and HLL6
646
679
  if (hll_type == target_hll_type::HLL_4) {
647
680
  uint8_t value = array[index >> 1];
648
681
  if ((index & 1) > 0) { // odd
@@ -32,6 +32,7 @@ template<typename A>
32
32
  class HllArray : public HllSketchImpl<A> {
33
33
  public:
34
34
  HllArray(uint8_t lgConfigK, target_hll_type tgtHllType, bool startFullSize, const A& allocator);
35
+ explicit HllArray(const HllArray& other, target_hll_type tgtHllType);
35
36
 
36
37
  static HllArray* newHll(const void* bytes, size_t len, const A& allocator);
37
38
  static HllArray* newHll(std::istream& is, const A& allocator);
@@ -52,10 +53,6 @@ class HllArray : public HllSketchImpl<A> {
52
53
  virtual double getLowerBound(uint8_t numStdDev) const;
53
54
  virtual double getUpperBound(uint8_t numStdDev) const;
54
55
 
55
- inline void addToHipAccum(double delta);
56
-
57
- inline void decNumAtCurMin();
58
-
59
56
  inline uint8_t getCurMin() const;
60
57
  inline uint32_t getNumAtCurMin() const;
61
58
  inline double getHipAccum() const;
@@ -90,12 +87,18 @@ class HllArray : public HllSketchImpl<A> {
90
87
 
91
88
  virtual AuxHashMap<A>* getAuxHashMap() const;
92
89
 
90
+ void setRebuildKxqCurminFlag(bool rebuild);
91
+ bool isRebuildKxqCurminFlag() const;
92
+ void check_rebuild_kxq_cur_min();
93
+
93
94
  class const_iterator;
94
95
  virtual const_iterator begin(bool all = false) const;
95
96
  virtual const_iterator end() const;
96
97
 
97
98
  virtual A getAllocator() const;
98
99
 
100
+ const vector_u8<A>& getHllArray() const;
101
+
99
102
  protected:
100
103
  void hipAndKxQIncrementalUpdate(uint8_t oldValue, uint8_t newValue);
101
104
  double getHllBitMapEstimate() const;
@@ -108,17 +111,24 @@ class HllArray : public HllSketchImpl<A> {
108
111
  uint8_t curMin_; //always zero for Hll6 and Hll8, only tracked by Hll4Array
109
112
  uint32_t numAtCurMin_; //interpreted as num zeros when curMin == 0
110
113
  bool oooFlag_; //Out-Of-Order Flag
114
+ bool rebuild_kxq_curmin_; // flag to recompute
111
115
 
112
116
  friend class HllSketchImplFactory<A>;
113
117
  };
114
118
 
115
119
  template<typename A>
116
- class HllArray<A>::const_iterator: public std::iterator<std::input_iterator_tag, uint32_t> {
120
+ class HllArray<A>::const_iterator {
117
121
  public:
122
+ using iterator_category = std::input_iterator_tag;
123
+ using value_type = uint32_t;
124
+ using difference_type = void;
125
+ using pointer = uint32_t*;
126
+ using reference = uint32_t;
127
+
118
128
  const_iterator(const uint8_t* array, uint32_t array_slze, uint32_t index, target_hll_type hll_type, const AuxHashMap<A>* exceptions, uint8_t offset, bool all);
119
129
  const_iterator& operator++();
120
130
  bool operator!=(const const_iterator& other) const;
121
- uint32_t operator*() const;
131
+ reference operator*() const;
122
132
  private:
123
133
  const uint8_t* array_;
124
134
  uint32_t array_size_;
@@ -136,38 +136,20 @@ HllSketchImpl<A>* HllSketchImplFactory<A>::reset(HllSketchImpl<A>* impl, bool st
136
136
 
137
137
  template<typename A>
138
138
  Hll4Array<A>* HllSketchImplFactory<A>::convertToHll4(const HllArray<A>& srcHllArr) {
139
- const uint8_t lgConfigK = srcHllArr.getLgConfigK();
140
139
  using Hll4Alloc = typename std::allocator_traits<A>::template rebind_alloc<Hll4Array<A>>;
141
- Hll4Array<A>* hll4Array = new (Hll4Alloc(srcHllArr.getAllocator()).allocate(1))
142
- Hll4Array<A>(lgConfigK, srcHllArr.isStartFullSize(), srcHllArr.getAllocator());
143
- hll4Array->putOutOfOrderFlag(srcHllArr.isOutOfOrderFlag());
144
- hll4Array->mergeHll(srcHllArr);
145
- hll4Array->putHipAccum(srcHllArr.getHipAccum());
146
- return hll4Array;
140
+ return new (Hll4Alloc(srcHllArr.getAllocator()).allocate(1)) Hll4Array<A>(srcHllArr);
147
141
  }
148
142
 
149
143
  template<typename A>
150
144
  Hll6Array<A>* HllSketchImplFactory<A>::convertToHll6(const HllArray<A>& srcHllArr) {
151
- const uint8_t lgConfigK = srcHllArr.getLgConfigK();
152
145
  using Hll6Alloc = typename std::allocator_traits<A>::template rebind_alloc<Hll6Array<A>>;
153
- Hll6Array<A>* hll6Array = new (Hll6Alloc(srcHllArr.getAllocator()).allocate(1))
154
- Hll6Array<A>(lgConfigK, srcHllArr.isStartFullSize(), srcHllArr.getAllocator());
155
- hll6Array->putOutOfOrderFlag(srcHllArr.isOutOfOrderFlag());
156
- hll6Array->mergeHll(srcHllArr);
157
- hll6Array->putHipAccum(srcHllArr.getHipAccum());
158
- return hll6Array;
146
+ return new (Hll6Alloc(srcHllArr.getAllocator()).allocate(1)) Hll6Array<A>(srcHllArr);
159
147
  }
160
148
 
161
149
  template<typename A>
162
150
  Hll8Array<A>* HllSketchImplFactory<A>::convertToHll8(const HllArray<A>& srcHllArr) {
163
- const uint8_t lgConfigK = srcHllArr.getLgConfigK();
164
151
  using Hll8Alloc = typename std::allocator_traits<A>::template rebind_alloc<Hll8Array<A>>;
165
- Hll8Array<A>* hll8Array = new (Hll8Alloc(srcHllArr.getAllocator()).allocate(1))
166
- Hll8Array<A>(lgConfigK, srcHllArr.isStartFullSize(), srcHllArr.getAllocator());
167
- hll8Array->putOutOfOrderFlag(srcHllArr.isOutOfOrderFlag());
168
- hll8Array->mergeHll(srcHllArr);
169
- hll8Array->putHipAccum(srcHllArr.getHipAccum());
170
- return hll8Array;
152
+ return new (Hll8Alloc(srcHllArr.getAllocator()).allocate(1)) Hll8Array<A>(srcHllArr);
171
153
  }
172
154
 
173
155
  }
@@ -131,21 +131,29 @@ void hll_union_alloc<A>::coupon_update(uint32_t coupon) {
131
131
 
132
132
  template<typename A>
133
133
  double hll_union_alloc<A>::get_estimate() const {
134
+ if (gadget_.sketch_impl->getCurMode() == hll_mode::HLL)
135
+ static_cast<HllArray<A>*>(gadget_.sketch_impl)->check_rebuild_kxq_cur_min();
134
136
  return gadget_.get_estimate();
135
137
  }
136
138
 
137
139
  template<typename A>
138
140
  double hll_union_alloc<A>::get_composite_estimate() const {
141
+ if (gadget_.sketch_impl->getCurMode() == hll_mode::HLL)
142
+ static_cast<HllArray<A>*>(gadget_.sketch_impl)->check_rebuild_kxq_cur_min();
139
143
  return gadget_.get_composite_estimate();
140
144
  }
141
145
 
142
146
  template<typename A>
143
147
  double hll_union_alloc<A>::get_lower_bound(uint8_t num_std_dev) const {
148
+ if (gadget_.sketch_impl->getCurMode() == hll_mode::HLL)
149
+ static_cast<HllArray<A>*>(gadget_.sketch_impl)->check_rebuild_kxq_cur_min();
144
150
  return gadget_.get_lower_bound(num_std_dev);
145
151
  }
146
152
 
147
153
  template<typename A>
148
154
  double hll_union_alloc<A>::get_upper_bound(uint8_t num_std_dev) const {
155
+ if (gadget_.sketch_impl->getCurMode() == hll_mode::HLL)
156
+ static_cast<HllArray<A>*>(gadget_.sketch_impl)->check_rebuild_kxq_cur_min();
149
157
  return gadget_.get_upper_bound(num_std_dev);
150
158
  }
151
159
 
@@ -152,12 +152,6 @@ inline void HllUtil<A>::hash(const void* key, size_t keyLen, uint64_t seed, Hash
152
152
  MurmurHash3_x64_128(key, keyLen, seed, result);
153
153
  }
154
154
 
155
- template<typename A>
156
- inline double HllUtil<A>::getRelErr(bool upperBound, bool unioned,
157
- uint8_t lgConfigK, uint8_t numStdDev) {
158
- return RelativeErrorTables<A>::getRelErr(upperBound, unioned, lgConfigK, numStdDev);
159
- }
160
-
161
155
  template<typename A>
162
156
  inline uint8_t HllUtil<A>::checkLgK(uint8_t lgK) {
163
157
  if ((lgK >= hll_constants::MIN_LOG_K) && (lgK <= hll_constants::MAX_LOG_K)) {
@@ -167,6 +161,20 @@ inline uint8_t HllUtil<A>::checkLgK(uint8_t lgK) {
167
161
  }
168
162
  }
169
163
 
164
+ template<typename A>
165
+ inline double HllUtil<A>::getRelErr(bool upperBound, bool unioned,
166
+ uint8_t lgConfigK, uint8_t numStdDev) {
167
+ checkLgK(lgConfigK);
168
+ if (lgConfigK > 12) {
169
+ const double rseFactor = unioned ?
170
+ hll_constants::HLL_NON_HIP_RSE_FACTOR : hll_constants::HLL_HIP_RSE_FACTOR;
171
+ const uint32_t configK = 1 << lgConfigK;
172
+ return (upperBound ? -1 : 1) * (numStdDev * rseFactor) / sqrt(configK);
173
+ } else {
174
+ return RelativeErrorTables<A>::getRelErr(upperBound, unioned, lgConfigK, numStdDev);
175
+ }
176
+ }
177
+
170
178
  template<typename A>
171
179
  inline void HllUtil<A>::checkMemSize(uint64_t minBytes, uint64_t capBytes) {
172
180
  if (capBytes < minBytes) {
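The relocated getRelErr() above now holds the lgConfigK > 12 formula that getLowerBound() and getUpperBound() used to duplicate. A hedged sketch of that arithmetic follows. The function names are hypothetical, and the RSE factor is passed in as a parameter because the values of hll_constants::HLL_HIP_RSE_FACTOR and HLL_NON_HIP_RSE_FACTOR are not shown in this diff. The sketch covers only the large-K branch; for lgConfigK <= 12 the diff defers to RelativeErrorTables.

```cpp
#include <cmath>
#include <cstdint>

// Relative error for lgConfigK > 12, as in the diff: the sign is negative
// for the upper bound so that dividing by (1 + relErr) enlarges the estimate.
inline double rel_err_large_k(bool upper_bound, double rse_factor,
                              uint8_t lg_config_k, uint8_t num_std_dev) {
  const double config_k = std::ldexp(1.0, lg_config_k); // 2^lgConfigK
  return (upper_bound ? -1.0 : 1.0) * (num_std_dev * rse_factor) / std::sqrt(config_k);
}

// Lower bound: never below the number of non-zero registers.
inline double lower_bound(double estimate, double rel_err, double num_non_zeros) {
  return std::fmax(estimate / (1.0 + rel_err), num_non_zeros);
}

// Upper bound: expects the negative relative error from rel_err_large_k(true, ...).
inline double upper_bound(double estimate, double rel_err) {
  return estimate / (1.0 + rel_err);
}
```

With an illustrative factor of 1.0, lgConfigK = 14 and two standard deviations, the relative error is 2 / sqrt(16384) = 0.015625, negated for the upper bound.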
@@ -47,7 +47,7 @@ bool coupon_iterator<A>::operator!=(const coupon_iterator& other) const {
47
47
  }
48
48
 
49
49
  template<typename A>
50
- uint32_t coupon_iterator<A>::operator*() const {
50
+ auto coupon_iterator<A>::operator*() const -> reference {
51
51
  return array_[index_];
52
52
  }
53
53
 
@@ -23,12 +23,18 @@
23
23
  namespace datasketches {
24
24
 
25
25
  template<typename A>
26
- class coupon_iterator: public std::iterator<std::input_iterator_tag, uint32_t> {
26
+ class coupon_iterator {
27
27
  public:
28
+ using iterator_category = std::input_iterator_tag;
29
+ using value_type = uint32_t;
30
+ using difference_type = void;
31
+ using pointer = uint32_t*;
32
+ using reference = uint32_t;
33
+
28
34
  coupon_iterator(const uint32_t* array, size_t array_slze, size_t index, bool all);
29
35
  coupon_iterator& operator++();
30
36
  bool operator!=(const coupon_iterator& other) const;
31
- uint32_t operator*() const;
37
+ reference operator*() const;
32
38
  private:
33
39
  const uint32_t* array_;
34
40
  size_t array_size_;
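The iterator hunks above drop the std::iterator base class, which is deprecated since C++17, and declare the five member types directly. A minimal sketch of the same pattern follows. `u32_array_iterator` is a hypothetical class written for illustration, and it uses std::ptrdiff_t for difference_type where the diff uses void.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>

// Input iterator over a uint32_t array that declares its traits as member
// types instead of inheriting them from std::iterator.
class u32_array_iterator {
public:
  using iterator_category = std::input_iterator_tag;
  using value_type = uint32_t;
  using difference_type = std::ptrdiff_t; // the diff uses void here
  using pointer = const uint32_t*;
  using reference = uint32_t; // returned by value, as in the diff

  u32_array_iterator(const uint32_t* array, size_t index): array_(array), index_(index) {}
  u32_array_iterator& operator++() { ++index_; return *this; }
  bool operator!=(const u32_array_iterator& other) const { return index_ != other.index_; }
  reference operator*() const { return array_[index_]; }

private:
  const uint32_t* array_;
  size_t index_;
};

// std::iterator_traits picks the member types up without any base class.
static_assert(std::is_same<std::iterator_traits<u32_array_iterator>::value_type, uint32_t>::value,
              "value_type is taken from the member typedef");
```

Declaring `reference` once and returning it from operator*() keeps the declaration and the out-of-class definition in sync, which is what the `-> reference` trailing return types in the diff rely on.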
@@ -23,8 +23,9 @@
23
23
  #include "common_defs.hpp"
24
24
  #include "HllUtil.hpp"
25
25
 
26
- #include <memory>
27
26
  #include <iostream>
27
+ #include <memory>
28
+ #include <string>
28
29
  #include <vector>
29
30
 
30
31
  namespace datasketches {
@@ -144,7 +145,7 @@ class hll_sketch_alloc final {
144
145
 
145
146
  /**
146
147
  * Reconstructs a sketch from a serialized image in a byte array.
147
- * @param is bytes An input array with a binary image of a sketch
148
+ * @param bytes An input array with a binary image of a sketch
148
149
  * @param len Length of the input array, in bytes
149
150
  */
150
151
  static hll_sketch_alloc deserialize(const void* bytes, size_t len, const A& allocator = A());
@@ -197,7 +198,7 @@ class hll_sketch_alloc final {
197
198
  * Human readable summary with optional detail
198
199
  * @param summary if true, output the sketch summary
199
200
  * @param detail if true, output the internal data array
200
- * @param auxDetail if true, output the internal Aux array, if it exists.
201
+ * @param aux_detail if true, output the internal Aux array, if it exists.
201
202
  * @param all if true, outputs all entries including empty ones
202
203
  * @return human readable string with optional detail.
203
204
  */
@@ -358,7 +359,7 @@ class hll_sketch_alloc final {
358
359
  * value can be exceeded in extremely rare cases. If exceeded, it
359
360
  * will be larger by only a few percent.
360
361
  *
361
- * @param lg_config_k The Log2 of K for the target HLL sketch. This value must be
362
+ * @param lg_k The Log2 of K for the target HLL sketch. This value must be
362
363
  * between 4 and 21 inclusively.
363
364
  * @param tgt_type the desired Hll type
364
365
  * @return the maximum size in bytes that this sketch can grow to.
@@ -495,20 +496,20 @@ class hll_union_alloc {
495
496
  /**
496
497
  * Returns the result of this union operator with the specified
497
498
  * #tgt_hll_type.
498
- * @param The tgt_hll_type enum value of the desired result (Default: HLL_4)
499
+ * @param tgt_type The tgt_hll_type enum value of the desired result (Default: HLL_4)
499
500
  * @return The result of this union with the specified tgt_hll_type
500
501
  */
501
502
  hll_sketch_alloc<A> get_result(target_hll_type tgt_type = HLL_4) const;
502
503
 
503
504
  /**
504
505
  * Update this union operator with the given sketch.
505
- * @param The given sketch.
506
+ * @param sketch The given sketch.
506
507
  */
507
508
  void update(const hll_sketch_alloc<A>& sketch);
508
509
 
509
510
  /**
510
511
  * Update this union operator with the given temporary sketch.
511
- * @param The given sketch.
512
+ * @param sketch The given sketch.
512
513
  */
513
514
  void update(hll_sketch_alloc<A>&& sketch);
514
515
 
@@ -608,7 +609,7 @@ class hll_union_alloc {
608
609
  * perform the union. This may involve swapping, down-sampling, transforming, and / or
609
610
  * copying one of the arguments and may completely replace the internals of the union.
610
611
  *
611
- * @param incoming_impl the given incoming sketch, which may not be modified.
612
+ * @param sketch the given incoming sketch, which may not be modified.
612
613
  * @param lg_max_k the maximum value of log2 K for this union.
613
614
  */
614
615
  inline void union_impl(const hll_sketch_alloc<A>& sketch, uint8_t lg_max_k);
@@ -53,11 +53,16 @@ static void basicUnion(uint64_t n1, uint64_t n2,
53
53
  v += n2;
54
54
 
55
55
  hll_union u(lgMaxK);
56
- u.update(std::move(h1));
56
+ u.update(h1);
57
57
  u.update(h2);
58
58
 
59
59
  hll_sketch result = u.get_result(resultType);
60
60
 
61
+ // ensure we check a direct union estimate, without first calling get_result()
62
+ u.reset();
63
+ u.update(std::move(h1));
64
+ u.update(h2);
65
+
61
66
  // force non-HIP estimates to avoid issues with in- vs out-of-order
62
67
  double uEst = result.get_composite_estimate();
63
68
  double uUb = result.get_upper_bound(2);
@@ -74,6 +79,7 @@ static void basicUnion(uint64_t n1, uint64_t n2,
74
79
  REQUIRE((uEst - uLb) >= 0.0);
75
80
 
76
81
  REQUIRE(controlEst == uEst);
82
+ REQUIRE(controlEst == u.get_composite_estimate());
77
83
  }
78
84
 
79
85
  /**
@@ -20,7 +20,6 @@
20
20
  #ifndef KLL_HELPER_HPP_
21
21
  #define KLL_HELPER_HPP_
22
22
 
23
- #include <random>
24
23
  #include <stdexcept>
25
24
 
26
25
  namespace datasketches {
@@ -586,16 +586,21 @@ class kll_sketch {
586
586
  };
587
587
 
588
588
  template<typename T, typename C, typename A>
589
- class kll_sketch<T, C, A>::const_iterator: public std::iterator<std::input_iterator_tag, T> {
589
+ class kll_sketch<T, C, A>::const_iterator {
590
590
  public:
591
+ using iterator_category = std::input_iterator_tag;
591
592
  using value_type = std::pair<const T&, const uint64_t>;
593
+ using difference_type = void;
594
+ using pointer = const return_value_holder<value_type>;
595
+ using reference = const value_type;
596
+
592
597
  friend class kll_sketch<T, C, A>;
593
598
  const_iterator& operator++();
594
599
  const_iterator& operator++(int);
595
600
  bool operator==(const const_iterator& other) const;
596
601
  bool operator!=(const const_iterator& other) const;
597
- const value_type operator*() const;
598
- const return_value_holder<value_type> operator->() const;
602
+ reference operator*() const;
603
+ pointer operator->() const;
599
604
  private:
600
605
  const T* items;
601
606
  const uint32_t* levels;
@@ -1105,12 +1105,12 @@ bool kll_sketch<T, C, A>::const_iterator::operator!=(const const_iterator& other
1105
1105
  }
1106
1106
 
1107
1107
  template<typename T, typename C, typename A>
1108
- auto kll_sketch<T, C, A>::const_iterator::operator*() const -> const value_type {
1108
+ auto kll_sketch<T, C, A>::const_iterator::operator*() const -> reference {
1109
1109
  return value_type(items[index], weight);
1110
1110
  }
1111
1111
 
1112
1112
  template<typename T, typename C, typename A>
1113
- auto kll_sketch<T, C, A>::const_iterator::operator->() const -> const return_value_holder<value_type> {
1113
+ auto kll_sketch<T, C, A>::const_iterator::operator->() const -> pointer {
1114
1114
  return **this;
1115
1115
  }
1116
1116
 
@@ -242,7 +242,7 @@ TEST_CASE("kll sketch", "[kll_sketch]") {
242
242
  FAIL("checking rank vs CDF for value " + std::to_string(i));
243
243
  }
244
244
  subtotal_pmf += pmf[i];
245
- if (abs(ranks[i] - subtotal_pmf) > NUMERIC_NOISE_TOLERANCE) {
245
+ if (std::abs(ranks[i] - subtotal_pmf) > NUMERIC_NOISE_TOLERANCE) {
246
246
  FAIL("CDF vs PMF for value " + std::to_string(i));
247
247
  }
248
248
  }
@@ -257,7 +257,7 @@ TEST_CASE("kll sketch", "[kll_sketch]") {
257
257
  FAIL("checking rank vs CDF for value " + std::to_string(i));
258
258
  }
259
259
  subtotal_pmf += pmf[i];
260
- if (abs(ranks[i] - subtotal_pmf) > NUMERIC_NOISE_TOLERANCE) {
260
+ if (std::abs(ranks[i] - subtotal_pmf) > NUMERIC_NOISE_TOLERANCE) {
261
261
  FAIL("CDF vs PMF for value " + std::to_string(i));
262
262
  }
263
263
  }
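Note on the two test hunks above: they qualify `abs` as `std::abs`. The diff does not state the motivation, but a likely one is that an unqualified `abs` can, depending on the headers included and the toolchain, resolve to the C function `int abs(int)`, which would truncate a small floating-point difference to zero and let the tolerance check pass silently. A minimal sketch of the safe form follows; `kTolerance` and `nearly_equal` are hypothetical names, not part of the test suite.

```cpp
#include <cmath>

// Hypothetical tolerance, chosen for illustration only.
constexpr double kTolerance = 1e-10;

// Returns true when a and b differ by no more than tol.
// std::abs from <cmath> has a double overload, so the difference is never
// converted to an integer before the comparison.
inline bool nearly_equal(double a, double b, double tol = kTolerance) {
  return std::abs(a - b) <= tol;
}
```

With this helper, a comparison such as `nearly_equal(0.1 + 0.2, 0.3)` holds, while `nearly_equal(0.5, 0.75)` does not.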
@@ -42,9 +42,12 @@ target_link_libraries(python
42
42
  cpc
43
43
  fi
44
44
  theta
45
+ tuple
45
46
  sampling
46
47
  req
47
48
  quantiles
49
+ count
50
+ density
48
51
  pybind11::module
49
52
  )
50
53
 
@@ -72,10 +75,13 @@ target_sources(python
72
75
  src/cpc_wrapper.cpp
73
76
  src/fi_wrapper.cpp
74
77
  src/theta_wrapper.cpp
78
+ src/tuple_wrapper.cpp
75
79
  src/vo_wrapper.cpp
76
80
  src/req_wrapper.cpp
77
81
  src/quantiles_wrapper.cpp
82
+ src/density_wrapper.cpp
78
83
  src/ks_wrapper.cpp
84
+ src/count_wrapper.cpp
79
85
  src/vector_of_kll.cpp
80
86
  src/py_serde.cpp
81
87
  )
@@ -12,15 +12,15 @@ This package provides a variety of sketches as described below. Wherever a speci
12
12
 
13
13
  ## Building and Installation
14
14
 
15
- Once cloned, the library can be installed by running `python3 -m pip install .` in the project root directory -- not the python subdirectory -- which will also install the necessary dependencies, namely numpy and [pybind11[global]](https://github.com/pybind/pybind11).
15
+ Once cloned, the library can be installed by running `python3 -m pip install .` in the project root directory -- not the python subdirectory -- which will also install the necessary dependencies, namely NumPy and [pybind11[global]](https://github.com/pybind/pybind11).
16
16
 
17
- If you prefer to call the `setup.py` build script directly, which is discoraged, you must first install `pybind11[global]`, as well as any other dependencies listed under the build-system section in `pyproject.toml`.
17
+ If you prefer to call the `setup.py` build script directly, which is discouraged, you must first install `pybind11[global]`, as well as any other dependencies listed under the build-system section in `pyproject.toml`.
18
18
 
19
19
  The library is also available from PyPI via `python3 -m pip install datasketches`.
20
20
 
21
21
  ## Usage
22
22
 
23
- Having installed the library, loading the Apache Datasketches Library in Python is simple: `import datasketches`.
23
+ Having installed the library, loading the Apache DataSketches Library in Python is simple: `import datasketches`.
24
24
 
25
25
  The unit tests are mostly structured in a tutorial style and can be used as a reference example for how to feed data into and query the different types of sketches.
26
26
 
@@ -76,10 +76,10 @@ The only developer-specific instructions relate to running unit tests.
76
76
 
77
77
  ### Unit tests
78
78
 
79
- The Python unit tests are run via `tox`, with no arguments, from the project root directory -- not the python subdirectory. Tox creates a temporary virtual environment in which to build and run the unit tests. In the event you are missing the necessary pacakge, tox may be installed with `python3 -m pip install --upgrade tox`.
79
+ The Python unit tests are run via `tox`, with no arguments, from the project root directory -- not the python subdirectory. Tox creates a temporary virtual environment in which to build and run the unit tests. In the event you are missing the necessary package, tox may be installed with `python3 -m pip install --upgrade tox`.
80
80
 
81
81
  ## License
82
82
 
83
- The Apache DataSketches Library is distrubted under an Apache 2.0 License.
83
+ The Apache DataSketches Library is distributed under the Apache 2.0 License.
84
84
 
85
85
  There may be precompiled binaries provided as a convenience and distributed through PyPI via [https://pypi.org/project/datasketches/] contain compiled code from [pybind11](https://github.com/pybind/pybind11), which is distributed under a BSD license.