datasketches 0.3.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (113)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +4 -0
  3. data/ext/datasketches/cpc_wrapper.cpp +1 -1
  4. data/lib/datasketches/version.rb +1 -1
  5. data/vendor/datasketches-cpp/CMakeLists.txt +22 -20
  6. data/vendor/datasketches-cpp/NOTICE +1 -1
  7. data/vendor/datasketches-cpp/common/include/MurmurHash3.h +25 -27
  8. data/vendor/datasketches-cpp/common/include/common_defs.hpp +8 -6
  9. data/vendor/datasketches-cpp/common/include/count_zeros.hpp +11 -0
  10. data/vendor/datasketches-cpp/common/include/memory_operations.hpp +5 -4
  11. data/vendor/datasketches-cpp/common/test/CMakeLists.txt +1 -1
  12. data/vendor/datasketches-cpp/common/test/integration_test.cpp +6 -0
  13. data/vendor/datasketches-cpp/count/CMakeLists.txt +42 -0
  14. data/vendor/datasketches-cpp/count/include/count_min.hpp +351 -0
  15. data/vendor/datasketches-cpp/count/include/count_min_impl.hpp +517 -0
  16. data/vendor/datasketches-cpp/count/test/CMakeLists.txt +43 -0
  17. data/vendor/datasketches-cpp/count/test/count_min_allocation_test.cpp +155 -0
  18. data/vendor/datasketches-cpp/count/test/count_min_test.cpp +306 -0
  19. data/vendor/datasketches-cpp/cpc/include/cpc_confidence.hpp +3 -3
  20. data/vendor/datasketches-cpp/cpc/include/cpc_sketch_impl.hpp +1 -1
  21. data/vendor/datasketches-cpp/cpc/include/cpc_util.hpp +16 -8
  22. data/vendor/datasketches-cpp/density/CMakeLists.txt +42 -0
  23. data/vendor/datasketches-cpp/density/include/density_sketch.hpp +236 -0
  24. data/vendor/datasketches-cpp/density/include/density_sketch_impl.hpp +543 -0
  25. data/vendor/datasketches-cpp/density/test/CMakeLists.txt +35 -0
  26. data/vendor/datasketches-cpp/density/test/density_sketch_test.cpp +244 -0
  27. data/vendor/datasketches-cpp/fi/include/reverse_purge_hash_map.hpp +9 -3
  28. data/vendor/datasketches-cpp/hll/include/Hll4Array-internal.hpp +19 -11
  29. data/vendor/datasketches-cpp/hll/include/Hll4Array.hpp +2 -5
  30. data/vendor/datasketches-cpp/hll/include/Hll6Array-internal.hpp +19 -7
  31. data/vendor/datasketches-cpp/hll/include/Hll6Array.hpp +1 -1
  32. data/vendor/datasketches-cpp/hll/include/Hll8Array-internal.hpp +98 -42
  33. data/vendor/datasketches-cpp/hll/include/Hll8Array.hpp +2 -0
  34. data/vendor/datasketches-cpp/hll/include/HllArray-internal.hpp +92 -59
  35. data/vendor/datasketches-cpp/hll/include/HllArray.hpp +16 -6
  36. data/vendor/datasketches-cpp/hll/include/HllSketchImplFactory.hpp +3 -21
  37. data/vendor/datasketches-cpp/hll/include/HllUnion-internal.hpp +8 -0
  38. data/vendor/datasketches-cpp/hll/include/HllUtil.hpp +14 -6
  39. data/vendor/datasketches-cpp/hll/include/coupon_iterator-internal.hpp +1 -1
  40. data/vendor/datasketches-cpp/hll/include/coupon_iterator.hpp +8 -2
  41. data/vendor/datasketches-cpp/hll/include/hll.hpp +9 -8
  42. data/vendor/datasketches-cpp/hll/test/HllUnionTest.cpp +7 -1
  43. data/vendor/datasketches-cpp/kll/include/kll_helper.hpp +0 -1
  44. data/vendor/datasketches-cpp/kll/include/kll_sketch.hpp +8 -3
  45. data/vendor/datasketches-cpp/kll/include/kll_sketch_impl.hpp +2 -2
  46. data/vendor/datasketches-cpp/kll/test/kll_sketch_test.cpp +2 -2
  47. data/vendor/datasketches-cpp/python/CMakeLists.txt +6 -0
  48. data/vendor/datasketches-cpp/python/README.md +5 -5
  49. data/vendor/datasketches-cpp/python/datasketches/DensityWrapper.py +87 -0
  50. data/vendor/datasketches-cpp/python/datasketches/KernelFunction.py +35 -0
  51. data/vendor/datasketches-cpp/python/datasketches/PySerDe.py +15 -9
  52. data/vendor/datasketches-cpp/python/datasketches/TuplePolicy.py +77 -0
  53. data/vendor/datasketches-cpp/python/datasketches/TupleWrapper.py +205 -0
  54. data/vendor/datasketches-cpp/python/datasketches/__init__.py +17 -1
  55. data/vendor/datasketches-cpp/python/include/kernel_function.hpp +98 -0
  56. data/vendor/datasketches-cpp/python/include/py_object_lt.hpp +37 -0
  57. data/vendor/datasketches-cpp/python/include/py_object_ostream.hpp +48 -0
  58. data/vendor/datasketches-cpp/python/include/quantile_conditional.hpp +104 -0
  59. data/vendor/datasketches-cpp/python/include/tuple_policy.hpp +136 -0
  60. data/vendor/datasketches-cpp/python/src/count_wrapper.cpp +101 -0
  61. data/vendor/datasketches-cpp/python/src/cpc_wrapper.cpp +16 -30
  62. data/vendor/datasketches-cpp/python/src/datasketches.cpp +6 -0
  63. data/vendor/datasketches-cpp/python/src/density_wrapper.cpp +95 -0
  64. data/vendor/datasketches-cpp/python/src/fi_wrapper.cpp +127 -73
  65. data/vendor/datasketches-cpp/python/src/hll_wrapper.cpp +28 -36
  66. data/vendor/datasketches-cpp/python/src/kll_wrapper.cpp +108 -160
  67. data/vendor/datasketches-cpp/python/src/py_serde.cpp +5 -4
  68. data/vendor/datasketches-cpp/python/src/quantiles_wrapper.cpp +99 -148
  69. data/vendor/datasketches-cpp/python/src/req_wrapper.cpp +117 -178
  70. data/vendor/datasketches-cpp/python/src/theta_wrapper.cpp +67 -73
  71. data/vendor/datasketches-cpp/python/src/tuple_wrapper.cpp +215 -0
  72. data/vendor/datasketches-cpp/python/src/vo_wrapper.cpp +1 -1
  73. data/vendor/datasketches-cpp/python/tests/count_min_test.py +86 -0
  74. data/vendor/datasketches-cpp/python/tests/cpc_test.py +10 -10
  75. data/vendor/datasketches-cpp/python/tests/density_test.py +93 -0
  76. data/vendor/datasketches-cpp/python/tests/fi_test.py +41 -2
  77. data/vendor/datasketches-cpp/python/tests/hll_test.py +19 -20
  78. data/vendor/datasketches-cpp/python/tests/kll_test.py +40 -6
  79. data/vendor/datasketches-cpp/python/tests/quantiles_test.py +39 -5
  80. data/vendor/datasketches-cpp/python/tests/req_test.py +38 -5
  81. data/vendor/datasketches-cpp/python/tests/theta_test.py +16 -14
  82. data/vendor/datasketches-cpp/python/tests/tuple_test.py +206 -0
  83. data/vendor/datasketches-cpp/python/tests/vo_test.py +7 -0
  84. data/vendor/datasketches-cpp/quantiles/include/quantiles_sketch.hpp +8 -3
  85. data/vendor/datasketches-cpp/quantiles/include/quantiles_sketch_impl.hpp +4 -4
  86. data/vendor/datasketches-cpp/quantiles/test/quantiles_sketch_test.cpp +1 -1
  87. data/vendor/datasketches-cpp/req/include/req_compactor_impl.hpp +0 -2
  88. data/vendor/datasketches-cpp/req/include/req_sketch.hpp +8 -3
  89. data/vendor/datasketches-cpp/req/include/req_sketch_impl.hpp +2 -2
  90. data/vendor/datasketches-cpp/sampling/include/var_opt_sketch.hpp +20 -6
  91. data/vendor/datasketches-cpp/sampling/include/var_opt_sketch_impl.hpp +30 -16
  92. data/vendor/datasketches-cpp/sampling/include/var_opt_union.hpp +5 -1
  93. data/vendor/datasketches-cpp/sampling/include/var_opt_union_impl.hpp +19 -15
  94. data/vendor/datasketches-cpp/sampling/test/var_opt_sketch_test.cpp +33 -14
  95. data/vendor/datasketches-cpp/sampling/test/var_opt_union_test.cpp +0 -2
  96. data/vendor/datasketches-cpp/setup.py +1 -1
  97. data/vendor/datasketches-cpp/theta/CMakeLists.txt +1 -0
  98. data/vendor/datasketches-cpp/theta/include/bit_packing.hpp +6279 -0
  99. data/vendor/datasketches-cpp/theta/include/compact_theta_sketch_parser.hpp +14 -8
  100. data/vendor/datasketches-cpp/theta/include/compact_theta_sketch_parser_impl.hpp +60 -46
  101. data/vendor/datasketches-cpp/theta/include/theta_helpers.hpp +4 -2
  102. data/vendor/datasketches-cpp/theta/include/theta_sketch.hpp +58 -10
  103. data/vendor/datasketches-cpp/theta/include/theta_sketch_impl.hpp +430 -130
  104. data/vendor/datasketches-cpp/theta/include/theta_union_base_impl.hpp +9 -9
  105. data/vendor/datasketches-cpp/theta/include/theta_update_sketch_base.hpp +16 -4
  106. data/vendor/datasketches-cpp/theta/include/theta_update_sketch_base_impl.hpp +2 -2
  107. data/vendor/datasketches-cpp/theta/test/CMakeLists.txt +1 -0
  108. data/vendor/datasketches-cpp/theta/test/bit_packing_test.cpp +80 -0
  109. data/vendor/datasketches-cpp/theta/test/theta_sketch_test.cpp +42 -3
  110. data/vendor/datasketches-cpp/theta/test/theta_union_test.cpp +25 -0
  111. data/vendor/datasketches-cpp/tuple/include/tuple_sketch_impl.hpp +2 -1
  112. data/vendor/datasketches-cpp/version.cfg.in +1 -1
  113. metadata +31 -3
@@ -43,20 +43,44 @@ kxq1_(0.0),
43
43
  hllByteArr_(allocator),
44
44
  curMin_(0),
45
45
  numAtCurMin_(1 << lgConfigK),
46
- oooFlag_(false)
46
+ oooFlag_(false),
47
+ rebuild_kxq_curmin_(false)
48
+ {}
49
+
50
+ template<typename A>
51
+ HllArray<A>::HllArray(const HllArray& other, target_hll_type tgtHllType) :
52
+ HllSketchImpl<A>(other.getLgConfigK(), tgtHllType, hll_mode::HLL, other.isStartFullSize()),
53
+ // remaining fields are initialized to empty sketch defaults
54
+ // and left to subclass constructor to populate
55
+ hipAccum_(0.0),
56
+ kxq0_(1 << other.getLgConfigK()),
57
+ kxq1_(0.0),
58
+ hllByteArr_(other.getAllocator()),
59
+ curMin_(0),
60
+ numAtCurMin_(1 << other.getLgConfigK()),
61
+ oooFlag_(false),
62
+ rebuild_kxq_curmin_(false)
47
63
  {}
48
64
 
49
65
  template<typename A>
50
66
  HllArray<A>* HllArray<A>::copyAs(target_hll_type tgtHllType) const {
51
- if (tgtHllType == this->getTgtHllType()) {
67
+ // we may need to recompute KxQ and curMin data for a union gadget,
68
+ // so only use a direct copy if we have a valid sketch
69
+ if (tgtHllType == this->getTgtHllType() && !this->isRebuildKxqCurminFlag()) {
52
70
  return static_cast<HllArray*>(copy());
53
71
  }
54
- if (tgtHllType == target_hll_type::HLL_4) {
55
- return HllSketchImplFactory<A>::convertToHll4(*this);
56
- } else if (tgtHllType == target_hll_type::HLL_6) {
57
- return HllSketchImplFactory<A>::convertToHll6(*this);
58
- } else { // tgtHllType == HLL_8
59
- return HllSketchImplFactory<A>::convertToHll8(*this);
72
+
73
+ // the factory methods replay the coupons and will always rebuild
74
+ // the sketch in a consistent way
75
+ switch (tgtHllType) {
76
+ case target_hll_type::HLL_4:
77
+ return HllSketchImplFactory<A>::convertToHll4(*this);
78
+ case target_hll_type::HLL_6:
79
+ return HllSketchImplFactory<A>::convertToHll6(*this);
80
+ case target_hll_type::HLL_8:
81
+ return HllSketchImplFactory<A>::convertToHll8(*this);
82
+ default:
83
+ throw std::invalid_argument("Invalid target HLL type");
60
84
  }
61
85
  }
62
86
 
@@ -299,7 +323,7 @@ double HllArray<A>::getEstimate() const {
299
323
  if (oooFlag_) {
300
324
  return getCompositeEstimate();
301
325
  }
302
- return getHipAccum();
326
+ return hipAccum_;
303
327
  }
304
328
 
305
329
  // HLL UPPER AND LOWER BOUNDS
@@ -322,54 +346,20 @@ double HllArray<A>::getLowerBound(uint8_t numStdDev) const {
322
346
  HllUtil<A>::checkNumStdDev(numStdDev);
323
347
  const uint32_t configK = 1 << this->lgConfigK_;
324
348
  const double numNonZeros = ((curMin_ == 0) ? (configK - numAtCurMin_) : configK);
325
-
326
- double estimate;
327
- double rseFactor;
328
- if (oooFlag_) {
329
- estimate = getCompositeEstimate();
330
- rseFactor = hll_constants::HLL_NON_HIP_RSE_FACTOR;
331
- } else {
332
- estimate = hipAccum_;
333
- rseFactor = hll_constants::HLL_HIP_RSE_FACTOR;
334
- }
335
-
336
- double relErr;
337
- if (this->lgConfigK_ > 12) {
338
- relErr = (numStdDev * rseFactor) / sqrt(configK);
339
- } else {
340
- relErr = HllUtil<A>::getRelErr(false, oooFlag_, this->lgConfigK_, numStdDev);
341
- }
342
- return fmax(estimate / (1.0 + relErr), numNonZeros);
349
+ const double relErr = HllUtil<A>::getRelErr(false, this->oooFlag_, this->lgConfigK_, numStdDev);
350
+ return fmax(getEstimate() / (1.0 + relErr), numNonZeros);
343
351
  }
344
352
 
345
353
  template<typename A>
346
354
  double HllArray<A>::getUpperBound(uint8_t numStdDev) const {
347
355
  HllUtil<A>::checkNumStdDev(numStdDev);
348
- const uint32_t configK = 1 << this->lgConfigK_;
349
-
350
- double estimate;
351
- double rseFactor;
352
- if (oooFlag_) {
353
- estimate = getCompositeEstimate();
354
- rseFactor = hll_constants::HLL_NON_HIP_RSE_FACTOR;
355
- } else {
356
- estimate = hipAccum_;
357
- rseFactor = hll_constants::HLL_HIP_RSE_FACTOR;
358
- }
359
-
360
- double relErr;
361
- if (this->lgConfigK_ > 12) {
362
- relErr = (-1.0) * (numStdDev * rseFactor) / sqrt(configK);
363
- } else {
364
- relErr = HllUtil<A>::getRelErr(true, oooFlag_, this->lgConfigK_, numStdDev);
365
- }
366
- return estimate / (1.0 + relErr);
356
+ const double relErr = HllUtil<A>::getRelErr(true, this->oooFlag_, this->lgConfigK_, numStdDev);
357
+ return getEstimate() / (1.0 + relErr);
367
358
  }
368
359
 
369
360
  /**
370
361
  * This is the (non-HIP) estimator.
371
362
  * It is called "composite" because multiple estimators are pasted together.
372
- * @param absHllArr an instance of the AbstractHllArray class.
373
363
  * @return the composite estimate
374
364
  */
375
365
  // Original C: again-two-registers.c hhb_get_composite_estimate L1489
@@ -468,16 +458,6 @@ void HllArray<A>::putNumAtCurMin(uint32_t numAtCurMin) {
468
458
  numAtCurMin_ = numAtCurMin;
469
459
  }
470
460
 
471
- template<typename A>
472
- void HllArray<A>::decNumAtCurMin() {
473
- --numAtCurMin_;
474
- }
475
-
476
- template<typename A>
477
- void HllArray<A>::addToHipAccum(double delta) {
478
- hipAccum_ += delta;
479
- }
480
-
481
461
  template<typename A>
482
462
  bool HllArray<A>::isCompact() const {
483
463
  return false;
@@ -486,7 +466,7 @@ bool HllArray<A>::isCompact() const {
486
466
  template<typename A>
487
467
  bool HllArray<A>::isEmpty() const {
488
468
  const uint32_t configK = 1 << this->lgConfigK_;
489
- return (getCurMin() == 0) && (getNumAtCurMin() == configK);
469
+ return (curMin_ == 0) && (numAtCurMin_ == configK);
490
470
  }
491
471
 
492
472
  template<typename A>
@@ -556,6 +536,11 @@ AuxHashMap<A>* HllArray<A>::getAuxHashMap() const {
556
536
  return nullptr;
557
537
  }
558
538
 
539
+ template<typename A>
540
+ const vector_u8<A>& HllArray<A>::getHllArray() const {
541
+ return hllByteArr_;
542
+ }
543
+
559
544
  template<typename A>
560
545
  void HllArray<A>::hipAndKxQIncrementalUpdate(uint8_t oldValue, uint8_t newValue) {
561
546
  const uint32_t configK = 1 << this->getLgConfigK();
@@ -601,6 +586,52 @@ double HllArray<A>::getHllRawEstimate() const {
601
586
  return hyperEst;
602
587
  }
603
588
 
589
+ template<typename A>
590
+ void HllArray<A>::setRebuildKxqCurminFlag(bool rebuild) {
591
+ rebuild_kxq_curmin_ = rebuild;
592
+ }
593
+
594
+ template<typename A>
595
+ bool HllArray<A>::isRebuildKxqCurminFlag() const {
596
+ return rebuild_kxq_curmin_;
597
+ }
598
+
599
+ template<typename A>
600
+ void HllArray<A>::check_rebuild_kxq_cur_min() {
601
+ if (!rebuild_kxq_curmin_) { return; }
602
+
603
+ uint8_t cur_min = 64;
604
+ uint32_t num_at_cur_min = 0;
605
+ double kxq0 = 1 << this->lgConfigK_;
606
+ double kxq1 = 0;
607
+
608
+ auto it = this->begin(true); // want all points to adjust cur_min
609
+ const auto end = this->end();
610
+ while (it != end) {
611
+ uint8_t v = HllUtil<A>::getValue(*it);
612
+ if (v > 0) {
613
+ if (v < 32) { kxq0 += INVERSE_POWERS_OF_2[v] - 1.0; }
614
+ else { kxq1 += INVERSE_POWERS_OF_2[v] - 1.0; }
615
+ }
616
+ if (v > cur_min) { ++it; continue; }
617
+ if (v < cur_min) {
618
+ cur_min = v;
619
+ num_at_cur_min = 1;
620
+ } else {
621
+ ++num_at_cur_min;
622
+ }
623
+ ++it;
624
+ }
625
+
626
+ kxq0_ = kxq0;
627
+ kxq1_ = kxq1;
628
+ curMin_ = cur_min;
629
+ numAtCurMin_ = num_at_cur_min;
630
+ rebuild_kxq_curmin_ = false;
631
+ // HipAccum is not affected
632
+
633
+ }
634
+
604
635
  template<typename A>
605
636
  typename HllArray<A>::const_iterator HllArray<A>::begin(bool all) const {
606
637
  return const_iterator(hllByteArr_.data(), 1 << this->lgConfigK_, 0, this->tgtHllType_, nullptr, 0, all);
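The new rebuild pass walks every register once. It recomputes `curMin`, `numAtCurMin` and the two KxQ accumulators. The code below is a standalone sketch of the same accumulation under a few assumptions. It uses a plain `std::vector<uint8_t>` of register values instead of HllArray's packed storage. It uses `std::ldexp(1.0, -v)` instead of the `INVERSE_POWERS_OF_2` table. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical summary of an HLL register array; field names echo HllArray's
// kxq0_/kxq1_/curMin_/numAtCurMin_, but this is not that class.
struct KxqCurMin {
  double kxq0;
  double kxq1;
  uint8_t cur_min;
  uint32_t num_at_cur_min;
};

// One pass over every register (zeros included), mirroring the accumulation in
// check_rebuild_kxq_cur_min(): kxq0 starts at K (each register contributes
// 2^-0 = 1), and each nonzero register adds (2^-v - 1) to kxq0 when v < 32,
// otherwise to kxq1. cur_min starts above any possible register value.
KxqCurMin rebuild_kxq_cur_min(const std::vector<uint8_t>& registers) {
  KxqCurMin s{static_cast<double>(registers.size()), 0.0, 64, 0};
  for (uint8_t v : registers) {
    if (v > 0) {
      const double delta = std::ldexp(1.0, -static_cast<int>(v)) - 1.0;  // 2^-v - 1
      if (v < 32) {
        s.kxq0 += delta;
      } else {
        s.kxq1 += delta;
      }
    }
    if (v < s.cur_min) {         // new minimum: restart the count
      s.cur_min = v;
      s.num_at_cur_min = 1;
    } else if (v == s.cur_min) { // another register at the current minimum
      ++s.num_at_cur_min;
    }
  }
  return s;
}
```

Because kxq0 starts at K and every nonzero register moves (2^-v - 1) into one of the two accumulators, kxq0 + kxq1 always equals the sum of 2^-v over all registers. Only the split between the two accumulators depends on the 32 threshold.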
@@ -637,12 +668,14 @@ bool HllArray<A>::const_iterator::operator!=(const const_iterator& other) const
637
668
  }
638
669
 
639
670
  template<typename A>
640
- uint32_t HllArray<A>::const_iterator::operator*() const {
671
+ auto HllArray<A>::const_iterator::operator*() const -> reference {
641
672
  return HllUtil<A>::pair(index_, value_);
642
673
  }
643
674
 
644
675
  template<typename A>
645
676
  uint8_t HllArray<A>::const_iterator::get_value(const uint8_t* array, uint32_t index, target_hll_type hll_type, const AuxHashMap<A>* exceptions, uint8_t offset) {
677
+ // TODO: we should be able to improve efficiency here by reading multiple bytes at a time
678
+ // for HLL4 and HLL6
646
679
  if (hll_type == target_hll_type::HLL_4) {
647
680
  uint8_t value = array[index >> 1];
648
681
  if ((index & 1) > 0) { // odd
@@ -32,6 +32,7 @@ template<typename A>
32
32
  class HllArray : public HllSketchImpl<A> {
33
33
  public:
34
34
  HllArray(uint8_t lgConfigK, target_hll_type tgtHllType, bool startFullSize, const A& allocator);
35
+ explicit HllArray(const HllArray& other, target_hll_type tgtHllType);
35
36
 
36
37
  static HllArray* newHll(const void* bytes, size_t len, const A& allocator);
37
38
  static HllArray* newHll(std::istream& is, const A& allocator);
@@ -52,10 +53,6 @@ class HllArray : public HllSketchImpl<A> {
52
53
  virtual double getLowerBound(uint8_t numStdDev) const;
53
54
  virtual double getUpperBound(uint8_t numStdDev) const;
54
55
 
55
- inline void addToHipAccum(double delta);
56
-
57
- inline void decNumAtCurMin();
58
-
59
56
  inline uint8_t getCurMin() const;
60
57
  inline uint32_t getNumAtCurMin() const;
61
58
  inline double getHipAccum() const;
@@ -90,12 +87,18 @@ class HllArray : public HllSketchImpl<A> {
90
87
 
91
88
  virtual AuxHashMap<A>* getAuxHashMap() const;
92
89
 
90
+ void setRebuildKxqCurminFlag(bool rebuild);
91
+ bool isRebuildKxqCurminFlag() const;
92
+ void check_rebuild_kxq_cur_min();
93
+
93
94
  class const_iterator;
94
95
  virtual const_iterator begin(bool all = false) const;
95
96
  virtual const_iterator end() const;
96
97
 
97
98
  virtual A getAllocator() const;
98
99
 
100
+ const vector_u8<A>& getHllArray() const;
101
+
99
102
  protected:
100
103
  void hipAndKxQIncrementalUpdate(uint8_t oldValue, uint8_t newValue);
101
104
  double getHllBitMapEstimate() const;
@@ -108,17 +111,24 @@ class HllArray : public HllSketchImpl<A> {
108
111
  uint8_t curMin_; //always zero for Hll6 and Hll8, only tracked by Hll4Array
109
112
  uint32_t numAtCurMin_; //interpreted as num zeros when curMin == 0
110
113
  bool oooFlag_; //Out-Of-Order Flag
114
+ bool rebuild_kxq_curmin_; // flag to recompute
111
115
 
112
116
  friend class HllSketchImplFactory<A>;
113
117
  };
114
118
 
115
119
  template<typename A>
116
- class HllArray<A>::const_iterator: public std::iterator<std::input_iterator_tag, uint32_t> {
120
+ class HllArray<A>::const_iterator {
117
121
  public:
122
+ using iterator_category = std::input_iterator_tag;
123
+ using value_type = uint32_t;
124
+ using difference_type = void;
125
+ using pointer = uint32_t*;
126
+ using reference = uint32_t;
127
+
118
128
  const_iterator(const uint8_t* array, uint32_t array_slze, uint32_t index, target_hll_type hll_type, const AuxHashMap<A>* exceptions, uint8_t offset, bool all);
119
129
  const_iterator& operator++();
120
130
  bool operator!=(const const_iterator& other) const;
121
- uint32_t operator*() const;
131
+ reference operator*() const;
122
132
  private:
123
133
  const uint8_t* array_;
124
134
  uint32_t array_size_;
@@ -136,38 +136,20 @@ HllSketchImpl<A>* HllSketchImplFactory<A>::reset(HllSketchImpl<A>* impl, bool st
136
136
 
137
137
  template<typename A>
138
138
  Hll4Array<A>* HllSketchImplFactory<A>::convertToHll4(const HllArray<A>& srcHllArr) {
139
- const uint8_t lgConfigK = srcHllArr.getLgConfigK();
140
139
  using Hll4Alloc = typename std::allocator_traits<A>::template rebind_alloc<Hll4Array<A>>;
141
- Hll4Array<A>* hll4Array = new (Hll4Alloc(srcHllArr.getAllocator()).allocate(1))
142
- Hll4Array<A>(lgConfigK, srcHllArr.isStartFullSize(), srcHllArr.getAllocator());
143
- hll4Array->putOutOfOrderFlag(srcHllArr.isOutOfOrderFlag());
144
- hll4Array->mergeHll(srcHllArr);
145
- hll4Array->putHipAccum(srcHllArr.getHipAccum());
146
- return hll4Array;
140
+ return new (Hll4Alloc(srcHllArr.getAllocator()).allocate(1)) Hll4Array<A>(srcHllArr);
147
141
  }
148
142
 
149
143
  template<typename A>
150
144
  Hll6Array<A>* HllSketchImplFactory<A>::convertToHll6(const HllArray<A>& srcHllArr) {
151
- const uint8_t lgConfigK = srcHllArr.getLgConfigK();
152
145
  using Hll6Alloc = typename std::allocator_traits<A>::template rebind_alloc<Hll6Array<A>>;
153
- Hll6Array<A>* hll6Array = new (Hll6Alloc(srcHllArr.getAllocator()).allocate(1))
154
- Hll6Array<A>(lgConfigK, srcHllArr.isStartFullSize(), srcHllArr.getAllocator());
155
- hll6Array->putOutOfOrderFlag(srcHllArr.isOutOfOrderFlag());
156
- hll6Array->mergeHll(srcHllArr);
157
- hll6Array->putHipAccum(srcHllArr.getHipAccum());
158
- return hll6Array;
146
+ return new (Hll6Alloc(srcHllArr.getAllocator()).allocate(1)) Hll6Array<A>(srcHllArr);
159
147
  }
160
148
 
161
149
  template<typename A>
162
150
  Hll8Array<A>* HllSketchImplFactory<A>::convertToHll8(const HllArray<A>& srcHllArr) {
163
- const uint8_t lgConfigK = srcHllArr.getLgConfigK();
164
151
  using Hll8Alloc = typename std::allocator_traits<A>::template rebind_alloc<Hll8Array<A>>;
165
- Hll8Array<A>* hll8Array = new (Hll8Alloc(srcHllArr.getAllocator()).allocate(1))
166
- Hll8Array<A>(lgConfigK, srcHllArr.isStartFullSize(), srcHllArr.getAllocator());
167
- hll8Array->putOutOfOrderFlag(srcHllArr.isOutOfOrderFlag());
168
- hll8Array->mergeHll(srcHllArr);
169
- hll8Array->putHipAccum(srcHllArr.getHipAccum());
170
- return hll8Array;
152
+ return new (Hll8Alloc(srcHllArr.getAllocator()).allocate(1)) Hll8Array<A>(srcHllArr);
171
153
  }
172
154
 
173
155
  }
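The converters now pass the source array directly to a constructor. They place the result in memory obtained from an allocator rebound to the target type. The code below is a generic sketch of that rebind-then-placement-new pattern, and the helper names are hypothetical. It differs from the factory code in the diff in one respect: it also returns the memory to the allocator if the constructor throws.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <utility>

// Allocate and construct a T through an allocator whose value type may differ,
// rebinding it to T first.
template<typename T, typename A, typename... Args>
T* new_with_allocator(const A& allocator, Args&&... args) {
  using TAlloc = typename std::allocator_traits<A>::template rebind_alloc<T>;
  using Traits = std::allocator_traits<TAlloc>;
  TAlloc alloc(allocator);
  T* p = Traits::allocate(alloc, 1);
  try {
    return ::new (static_cast<void*>(p)) T(std::forward<Args>(args)...);
  } catch (...) {
    Traits::deallocate(alloc, p, 1);  // don't leak if T's constructor throws
    throw;
  }
}

// Matching teardown: destroy the object, then deallocate via a rebound allocator.
template<typename T, typename A>
void delete_with_allocator(const A& allocator, T* p) {
  using TAlloc = typename std::allocator_traits<A>::template rebind_alloc<T>;
  TAlloc alloc(allocator);
  p->~T();
  std::allocator_traits<TAlloc>::deallocate(alloc, p, 1);
}
```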
@@ -131,21 +131,29 @@ void hll_union_alloc<A>::coupon_update(uint32_t coupon) {
131
131
 
132
132
  template<typename A>
133
133
  double hll_union_alloc<A>::get_estimate() const {
134
+ if (gadget_.sketch_impl->getCurMode() == hll_mode::HLL)
135
+ static_cast<HllArray<A>*>(gadget_.sketch_impl)->check_rebuild_kxq_cur_min();
134
136
  return gadget_.get_estimate();
135
137
  }
136
138
 
137
139
  template<typename A>
138
140
  double hll_union_alloc<A>::get_composite_estimate() const {
141
+ if (gadget_.sketch_impl->getCurMode() == hll_mode::HLL)
142
+ static_cast<HllArray<A>*>(gadget_.sketch_impl)->check_rebuild_kxq_cur_min();
139
143
  return gadget_.get_composite_estimate();
140
144
  }
141
145
 
142
146
  template<typename A>
143
147
  double hll_union_alloc<A>::get_lower_bound(uint8_t num_std_dev) const {
148
+ if (gadget_.sketch_impl->getCurMode() == hll_mode::HLL)
149
+ static_cast<HllArray<A>*>(gadget_.sketch_impl)->check_rebuild_kxq_cur_min();
144
150
  return gadget_.get_lower_bound(num_std_dev);
145
151
  }
146
152
 
147
153
  template<typename A>
148
154
  double hll_union_alloc<A>::get_upper_bound(uint8_t num_std_dev) const {
155
+ if (gadget_.sketch_impl->getCurMode() == hll_mode::HLL)
156
+ static_cast<HllArray<A>*>(gadget_.sketch_impl)->check_rebuild_kxq_cur_min();
149
157
  return gadget_.get_upper_bound(num_std_dev);
150
158
  }
151
159
 
@@ -152,12 +152,6 @@ inline void HllUtil<A>::hash(const void* key, size_t keyLen, uint64_t seed, Hash
152
152
  MurmurHash3_x64_128(key, keyLen, seed, result);
153
153
  }
154
154
 
155
- template<typename A>
156
- inline double HllUtil<A>::getRelErr(bool upperBound, bool unioned,
157
- uint8_t lgConfigK, uint8_t numStdDev) {
158
- return RelativeErrorTables<A>::getRelErr(upperBound, unioned, lgConfigK, numStdDev);
159
- }
160
-
161
155
  template<typename A>
162
156
  inline uint8_t HllUtil<A>::checkLgK(uint8_t lgK) {
163
157
  if ((lgK >= hll_constants::MIN_LOG_K) && (lgK <= hll_constants::MAX_LOG_K)) {
@@ -167,6 +161,20 @@ inline uint8_t HllUtil<A>::checkLgK(uint8_t lgK) {
167
161
  }
168
162
  }
169
163
 
164
+ template<typename A>
165
+ inline double HllUtil<A>::getRelErr(bool upperBound, bool unioned,
166
+ uint8_t lgConfigK, uint8_t numStdDev) {
167
+ checkLgK(lgConfigK);
168
+ if (lgConfigK > 12) {
169
+ const double rseFactor = unioned ?
170
+ hll_constants::HLL_NON_HIP_RSE_FACTOR : hll_constants::HLL_HIP_RSE_FACTOR;
171
+ const uint32_t configK = 1 << lgConfigK;
172
+ return (upperBound ? -1 : 1) * (numStdDev * rseFactor) / sqrt(configK);
173
+ } else {
174
+ return RelativeErrorTables<A>::getRelErr(upperBound, unioned, lgConfigK, numStdDev);
175
+ }
176
+ }
177
+
170
178
  template<typename A>
171
179
  inline void HllUtil<A>::checkMemSize(uint64_t minBytes, uint64_t capBytes) {
172
180
  if (capBytes < minBytes) {
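After this change, `getLowerBound` and `getUpperBound` both reduce to `estimate / (1 + relErr)`. The lower bound is additionally clamped below by the number of nonzero registers. `getRelErr` now owns the closed form for lgK > 12.

The code below is a sketch of that arithmetic. It assumes the HIP and non-HIP RSE factors are sqrt(ln 2) and sqrt(3 ln 2 - 1); those values do not appear in this diff. It leaves out the lgK <= 12 table lookup. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <stdexcept>

// Assumed values for hll_constants::HLL_HIP_RSE_FACTOR / HLL_NON_HIP_RSE_FACTOR.
const double kHipRseFactor = std::sqrt(std::log(2.0));                  // ~0.8326
const double kNonHipRseFactor = std::sqrt(3.0 * std::log(2.0) - 1.0);   // ~1.0390

// Closed-form relative error for lg_k > 12. The sign is flipped for the upper
// bound so that estimate / (1 + rel_err) lands above the estimate.
double rel_err(bool is_upper, bool unioned, uint8_t lg_k, uint8_t num_std_dev) {
  if (lg_k <= 12) {
    // The library consults precomputed tables here; not reproduced in this sketch.
    throw std::invalid_argument("lg_k <= 12 table lookup not implemented in this sketch");
  }
  const double rse = unioned ? kNonHipRseFactor : kHipRseFactor;
  const double k = static_cast<double>(1u << lg_k);
  return (is_upper ? -1.0 : 1.0) * (num_std_dev * rse) / std::sqrt(k);
}

// unioned corresponds to the out-of-order flag: composite rather than HIP estimate.
double lower_bound(double estimate, double num_non_zeros, bool unioned,
                   uint8_t lg_k, uint8_t num_std_dev) {
  return std::max(estimate / (1.0 + rel_err(false, unioned, lg_k, num_std_dev)), num_non_zeros);
}

double upper_bound(double estimate, bool unioned, uint8_t lg_k, uint8_t num_std_dev) {
  return estimate / (1.0 + rel_err(true, unioned, lg_k, num_std_dev));
}
```

Returning a negative relErr for the upper bound lets both bounds share the single expression `estimate / (1 + relErr)`. The diff relies on the same trick.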
@@ -47,7 +47,7 @@ bool coupon_iterator<A>::operator!=(const coupon_iterator& other) const {
47
47
  }
48
48
 
49
49
  template<typename A>
50
- uint32_t coupon_iterator<A>::operator*() const {
50
+ auto coupon_iterator<A>::operator*() const -> reference {
51
51
  return array_[index_];
52
52
  }
53
53
 
@@ -23,12 +23,18 @@
23
23
  namespace datasketches {
24
24
 
25
25
  template<typename A>
26
- class coupon_iterator: public std::iterator<std::input_iterator_tag, uint32_t> {
26
+ class coupon_iterator {
27
27
  public:
28
+ using iterator_category = std::input_iterator_tag;
29
+ using value_type = uint32_t;
30
+ using difference_type = void;
31
+ using pointer = uint32_t*;
32
+ using reference = uint32_t;
33
+
28
34
  coupon_iterator(const uint32_t* array, size_t array_slze, size_t index, bool all);
29
35
  coupon_iterator& operator++();
30
36
  bool operator!=(const coupon_iterator& other) const;
31
- uint32_t operator*() const;
37
+ reference operator*() const;
32
38
  private:
33
39
  const uint32_t* array_;
34
40
  size_t array_size_;
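`std::iterator` is deprecated as of C++17, which is presumably why the diff replaces the base class with explicit member aliases. Below is a hypothetical iterator in the same style. It is loosely modeled on `coupon_iterator`: it skips zero slots unless `all` is set. It shows that `std::iterator_traits` still picks up the declared types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical input iterator over a uint32_t slot array. Instead of inheriting
// from std::iterator, it declares the member types iterator_traits looks for.
class slot_iterator {
public:
  using iterator_category = std::input_iterator_tag;
  using value_type = uint32_t;
  using difference_type = void;  // as declared in the diff
  using pointer = uint32_t*;
  using reference = uint32_t;    // operator* returns by value

  slot_iterator(const uint32_t* array, size_t size, size_t index, bool all)
      : array_(array), size_(size), index_(index), all_(all) { skip_empty(); }

  slot_iterator& operator++() { ++index_; skip_empty(); return *this; }
  bool operator!=(const slot_iterator& other) const { return index_ != other.index_; }
  reference operator*() const { return array_[index_]; }

private:
  // Advance past zero (empty) slots unless every slot was requested.
  void skip_empty() {
    if (all_) return;
    while (index_ < size_ && array_[index_] == 0) ++index_;
  }
  const uint32_t* array_;
  size_t size_;
  size_t index_;
  bool all_;
};

static_assert(std::is_same<std::iterator_traits<slot_iterator>::value_type, uint32_t>::value,
              "iterator_traits picks up the member alias");
```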
@@ -23,8 +23,9 @@
23
23
  #include "common_defs.hpp"
24
24
  #include "HllUtil.hpp"
25
25
 
26
- #include <memory>
27
26
  #include <iostream>
27
+ #include <memory>
28
+ #include <string>
28
29
  #include <vector>
29
30
 
30
31
  namespace datasketches {
@@ -144,7 +145,7 @@ class hll_sketch_alloc final {
144
145
 
145
146
  /**
146
147
  * Reconstructs a sketch from a serialized image in a byte array.
147
- * @param is bytes An input array with a binary image of a sketch
148
+ * @param bytes An input array with a binary image of a sketch
148
149
  * @param len Length of the input array, in bytes
149
150
  */
150
151
  static hll_sketch_alloc deserialize(const void* bytes, size_t len, const A& allocator = A());
@@ -197,7 +198,7 @@ class hll_sketch_alloc final {
197
198
  * Human readable summary with optional detail
198
199
  * @param summary if true, output the sketch summary
199
200
  * @param detail if true, output the internal data array
200
- * @param auxDetail if true, output the internal Aux array, if it exists.
201
+ * @param aux_detail if true, output the internal Aux array, if it exists.
201
202
  * @param all if true, outputs all entries including empty ones
202
203
  * @return human readable string with optional detail.
203
204
  */
@@ -358,7 +359,7 @@ class hll_sketch_alloc final {
358
359
  * value can be exceeded in extremely rare cases. If exceeded, it
359
360
  * will be larger by only a few percent.
360
361
  *
361
- * @param lg_config_k The Log2 of K for the target HLL sketch. This value must be
362
+ * @param lg_k The Log2 of K for the target HLL sketch. This value must be
362
363
  * between 4 and 21 inclusively.
363
364
  * @param tgt_type the desired Hll type
364
365
  * @return the maximum size in bytes that this sketch can grow to.
@@ -495,20 +496,20 @@ class hll_union_alloc {
495
496
  /**
496
497
  * Returns the result of this union operator with the specified
497
498
  * #tgt_hll_type.
498
- * @param The tgt_hll_type enum value of the desired result (Default: HLL_4)
499
+ * @param tgt_type The tgt_hll_type enum value of the desired result (Default: HLL_4)
499
500
  * @return The result of this union with the specified tgt_hll_type
500
501
  */
501
502
  hll_sketch_alloc<A> get_result(target_hll_type tgt_type = HLL_4) const;
502
503
 
503
504
  /**
504
505
  * Update this union operator with the given sketch.
505
- * @param The given sketch.
506
+ * @param sketch The given sketch.
506
507
  */
507
508
  void update(const hll_sketch_alloc<A>& sketch);
508
509
 
509
510
  /**
510
511
  * Update this union operator with the given temporary sketch.
511
- * @param The given sketch.
512
+ * @param sketch The given sketch.
512
513
  */
513
514
  void update(hll_sketch_alloc<A>&& sketch);
514
515
 
@@ -608,7 +609,7 @@ class hll_union_alloc {
608
609
  * perform the union. This may involve swapping, down-sampling, transforming, and / or
609
610
  * copying one of the arguments and may completely replace the internals of the union.
610
611
  *
611
- * @param incoming_impl the given incoming sketch, which may not be modified.
612
+ * @param sketch the given incoming sketch, which may not be modified.
612
613
  * @param lg_max_k the maximum value of log2 K for this union.
613
614
  */
614
615
  inline void union_impl(const hll_sketch_alloc<A>& sketch, uint8_t lg_max_k);
@@ -53,11 +53,16 @@ static void basicUnion(uint64_t n1, uint64_t n2,
53
53
  v += n2;
54
54
 
55
55
  hll_union u(lgMaxK);
56
- u.update(std::move(h1));
56
+ u.update(h1);
57
57
  u.update(h2);
58
58
 
59
59
  hll_sketch result = u.get_result(resultType);
60
60
 
61
+ // ensure we check a direct union estimate, without first caling get_result()
62
+ u.reset();
63
+ u.update(std::move(h1));
64
+ u.update(h2);
65
+
61
66
  // force non-HIP estimates to avoid issues with in- vs out-of-order
62
67
  double uEst = result.get_composite_estimate();
63
68
  double uUb = result.get_upper_bound(2);
@@ -74,6 +79,7 @@ static void basicUnion(uint64_t n1, uint64_t n2,
74
79
  REQUIRE((uEst - uLb) >= 0.0);
75
80
 
76
81
  REQUIRE(controlEst == uEst);
82
+ REQUIRE(controlEst == u.get_composite_estimate());
77
83
  }
78
84
 
79
85
  /**
@@ -20,7 +20,6 @@
20
20
  #ifndef KLL_HELPER_HPP_
21
21
  #define KLL_HELPER_HPP_
22
22
 
23
- #include <random>
24
23
  #include <stdexcept>
25
24
 
26
25
  namespace datasketches {
@@ -586,16 +586,21 @@ class kll_sketch {
586
586
  };
587
587
 
588
588
  template<typename T, typename C, typename A>
589
- class kll_sketch<T, C, A>::const_iterator: public std::iterator<std::input_iterator_tag, T> {
589
+ class kll_sketch<T, C, A>::const_iterator {
590
590
  public:
591
+ using iterator_category = std::input_iterator_tag;
591
592
  using value_type = std::pair<const T&, const uint64_t>;
593
+ using difference_type = void;
594
+ using pointer = const return_value_holder<value_type>;
595
+ using reference = const value_type;
596
+
592
597
  friend class kll_sketch<T, C, A>;
593
598
  const_iterator& operator++();
594
599
  const_iterator& operator++(int);
595
600
  bool operator==(const const_iterator& other) const;
596
601
  bool operator!=(const const_iterator& other) const;
597
- const value_type operator*() const;
598
- const return_value_holder<value_type> operator->() const;
602
+ reference operator*() const;
603
+ pointer operator->() const;
599
604
  private:
600
605
  const T* items;
601
606
  const uint32_t* levels;
@@ -1105,12 +1105,12 @@ bool kll_sketch<T, C, A>::const_iterator::operator!=(const const_iterator& other
1105
1105
  }
1106
1106
 
1107
1107
  template<typename T, typename C, typename A>
1108
- auto kll_sketch<T, C, A>::const_iterator::operator*() const -> const value_type {
1108
+ auto kll_sketch<T, C, A>::const_iterator::operator*() const -> reference {
1109
1109
  return value_type(items[index], weight);
1110
1110
  }
1111
1111
 
1112
1112
  template<typename T, typename C, typename A>
1113
- auto kll_sketch<T, C, A>::const_iterator::operator->() const -> const return_value_holder<value_type> {
1113
+ auto kll_sketch<T, C, A>::const_iterator::operator->() const -> pointer {
1114
1114
  return **this;
1115
1115
  }
1116
1116
 
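The KLL iterator's `operator*` builds a `std::pair<const T&, const uint64_t>` on the fly. That leaves `operator->` with nothing addressable to point at. It therefore returns a small holder object, whose own `operator->` yields a pointer to its stored copy. The code below sketches that proxy idiom. `arrow_proxy` and `weighted_iterator` are hypothetical stand-ins for `return_value_holder` and the KLL iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>

// Holds a value produced by operator* and forwards -> to it.
template<typename T>
class arrow_proxy {
public:
  arrow_proxy(const T& value) : value_(value) {}  // implicit, so `return **this;` compiles
  const T* operator->() const { return &value_; }
private:
  T value_;
};

// Minimal input iterator yielding (item, weight) pairs; all items share one weight.
class weighted_iterator {
public:
  using iterator_category = std::input_iterator_tag;
  using value_type = std::pair<const int&, const uint64_t>;
  using difference_type = void;
  using pointer = const arrow_proxy<value_type>;
  using reference = const value_type;

  weighted_iterator(const int* items, uint64_t weight, size_t index)
      : items_(items), weight_(weight), index_(index) {}
  weighted_iterator& operator++() { ++index_; return *this; }
  bool operator!=(const weighted_iterator& other) const { return index_ != other.index_; }
  reference operator*() const { return value_type(items_[index_], weight_); }
  pointer operator->() const { return **this; }  // wrap the temporary in the proxy
private:
  const int* items_;
  uint64_t weight_;
  size_t index_;
};
```

`it->first` works because the compiler keeps applying `operator->` until it reaches a raw pointer. It calls the iterator's `operator->`, then the proxy's, then accesses `first`.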
@@ -242,7 +242,7 @@ TEST_CASE("kll sketch", "[kll_sketch]") {
242
242
  FAIL("checking rank vs CDF for value " + std::to_string(i));
243
243
  }
244
244
  subtotal_pmf += pmf[i];
245
- if (abs(ranks[i] - subtotal_pmf) > NUMERIC_NOISE_TOLERANCE) {
245
+ if (std::abs(ranks[i] - subtotal_pmf) > NUMERIC_NOISE_TOLERANCE) {
246
246
  FAIL("CDF vs PMF for value " + std::to_string(i));
247
247
  }
248
248
  }
@@ -257,7 +257,7 @@ TEST_CASE("kll sketch", "[kll_sketch]") {
257
257
  FAIL("checking rank vs CDF for value " + std::to_string(i));
258
258
  }
259
259
  subtotal_pmf += pmf[i];
260
- if (abs(ranks[i] - subtotal_pmf) > NUMERIC_NOISE_TOLERANCE) {
260
+ if (std::abs(ranks[i] - subtotal_pmf) > NUMERIC_NOISE_TOLERANCE) {
261
261
  FAIL("CDF vs PMF for value " + std::to_string(i));
262
262
  }
263
263
  }
@@ -42,9 +42,12 @@ target_link_libraries(python
42
42
  cpc
43
43
  fi
44
44
  theta
45
+ tuple
45
46
  sampling
46
47
  req
47
48
  quantiles
49
+ count
50
+ density
48
51
  pybind11::module
49
52
  )
50
53
 
@@ -72,10 +75,13 @@ target_sources(python
72
75
  src/cpc_wrapper.cpp
73
76
  src/fi_wrapper.cpp
74
77
  src/theta_wrapper.cpp
78
+ src/tuple_wrapper.cpp
75
79
  src/vo_wrapper.cpp
76
80
  src/req_wrapper.cpp
77
81
  src/quantiles_wrapper.cpp
82
+ src/density_wrapper.cpp
78
83
  src/ks_wrapper.cpp
84
+ src/count_wrapper.cpp
79
85
  src/vector_of_kll.cpp
80
86
  src/py_serde.cpp
81
87
  )
@@ -12,15 +12,15 @@ This package provides a variety of sketches as described below. Wherever a speci
12
12
 
13
13
  ## Building and Installation
14
14
 
15
- Once cloned, the library can be installed by running `python3 -m pip install .` in the project root directory -- not the python subdirectory -- which will also install the necessary dependencies, namely numpy and [pybind11[global]](https://github.com/pybind/pybind11).
15
+ Once cloned, the library can be installed by running `python3 -m pip install .` in the project root directory -- not the python subdirectory -- which will also install the necessary dependencies, namely NumPy and [pybind11[global]](https://github.com/pybind/pybind11).
16
16
 
17
- If you prefer to call the `setup.py` build script directly, which is discoraged, you must first install `pybind11[global]`, as well as any other dependencies listed under the build-system section in `pyproject.toml`.
17
+ If you prefer to call the `setup.py` build script directly, which is discouraged, you must first install `pybind11[global]`, as well as any other dependencies listed under the build-system section in `pyproject.toml`.
18
18
 
19
19
  The library is also available from PyPI via `python3 -m pip install datasketches`.
20
20
 
21
21
  ## Usage
22
22
 
23
- Having installed the library, loading the Apache Datasketches Library in Python is simple: `import datasketches`.
23
+ Having installed the library, loading the Apache DataSketches Library in Python is simple: `import datasketches`.
24
24
 
25
25
  The unit tests are mostly structured in a tutorial style and can be used as a reference example for how to feed data into and query the different types of sketches.
26
26
 
@@ -76,10 +76,10 @@ The only developer-specific instructions relate to running unit tests.
76
76
 
77
77
  ### Unit tests
78
78
 
79
- The Python unit tests are run via `tox`, with no arguments, from the project root directory -- not the python subdirectory. Tox creates a temporary virtual environment in which to build and run the unit tests. In the event you are missing the necessary pacakge, tox may be installed with `python3 -m pip install --upgrade tox`.
79
+ The Python unit tests are run via `tox`, with no arguments, from the project root directory -- not the python subdirectory. Tox creates a temporary virtual environment in which to build and run the unit tests. In the event you are missing the necessary package, tox may be installed with `python3 -m pip install --upgrade tox`.
80
80
 
81
81
  ## License
82
82
 
83
- The Apache DataSketches Library is distrubted under an Apache 2.0 License.
83
+ The Apache DataSketches Library is distributed under the Apache 2.0 License.
84
84
 
85
85
  There may be precompiled binaries, provided as a convenience and distributed through PyPI via [https://pypi.org/project/datasketches/], that contain compiled code from [pybind11](https://github.com/pybind/pybind11), which is distributed under a BSD license.