RubyGems: db_clustering 0.1.4 → 0.1.5


Files changed (8)

checksums.yaml +4 -4
data/README.md +21 -0
data/VERSION +1 -1
data/db_clustering.gemspec +2 -2
data/lib/algorithms/density_based/dbscan.rb +2 -2
data/lib/datasource_adapters/active_record.rb +5 -1
data/lib/datasource_adapters/in_memory.rb +3 -2
metadata +1 -1

checksums.yaml

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 8901dbfe4c461163864eecc8934ca87e7e3485cf
-  data.tar.gz: 200614749469e8548535f9d5b2095e641354b00d
+  metadata.gz: 9aa27e97cd4db7b79c7a6ce30db82436e9553f55
+  data.tar.gz: 61628405866773b9bacaa9f5ac954ba623d48f69
 SHA512:
-  metadata.gz: 90931ba87de5fa19325f4965f166ad62a2650dac55789b7e66e4308b1e6b00bc1331a77e91e5013f0f65492272be1f2966d89bd8e6ebe89343b05858c85145d1
-  data.tar.gz: 5857b3d7f4926a0a7b96f711b93ee3f4ab0fb8442430fb66f279188a359ff2ef691cd67bbd2f7eeaab0f83e46dc9330089a44d3645b8fb7a5e9a3040c91a605f
+  metadata.gz: b126213d40d23b9548e754587da63c27d580358d9b403f96a4628c14820d5513a1af7f3e2679e9871521dc280720ba39466bdd448431bfce4e3f5fb4252fd8db
+  data.tar.gz: c0808f2d68358297e6820b9ecebec10711662c285750ef3419d2a3a60969bc1f6632a8ff108bd0fcf2c4ea522d46a77891b7c3195e36b35f8ade781bd9210ee2
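
The SHA1 and SHA512 entries above are hex digests of the `metadata.gz` and `data.tar.gz` members of the gem archive. As a minimal sketch of how such values can be recomputed with Ruby's standard `Digest` library, assuming the member has already been extracted to disk: the helper name `digests_for` is hypothetical, and the demo hashes a throwaway file rather than a real gem.

```ruby
require "digest/sha1"
require "digest/sha2"
require "tempfile"

# Hypothetical helper: returns the two hex digests that checksums.yaml
# records for one archive member (e.g. an extracted data.tar.gz).
def digests_for(path)
  {
    "SHA1"   => Digest::SHA1.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# Demo on a temporary file, since no real gem archive is assumed to be present.
Tempfile.create("member") do |file|
  file.write("example bytes")
  file.flush
  digests_for(file.path).each { |algorithm, hex| puts "#{algorithm}: #{hex}" }
end
```

Comparing the returned strings with the `+` lines of the hunk would confirm that a downloaded 0.1.5 archive matches the published checksums.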

data/README.md

@@ -58,6 +58,27 @@ This gem was developed to work best in Ruby on Rails projects.
    ```
    The `max_distance` is the epsilon parameter and the `min_neighbors` the minPts parameter from the usual DBSCAN algorithm documentation (e.g. Wikipedia). You might want to try different values here first before you decide for the right values for your purpose.
+   If you're interested in the progress of the algorithm you can run some code after each iteration of it (for DBSCAN this would mean after clustering a single point with its neighbors). Please note though that the current information at that point may be incomplete so don't use this as a method to receive a portion of the final results, treat it more like a partial result or just use it to indicate progress or do some debugging. For example you could do this:
+   ``` ruby
+   last_printed_progress = 0.0
+   dbscan.cluster(max_distance: 10, min_neighbors: 5) do |point, current_index, points_count|
+     progress = (current_index + 1) * 100 / points_count.to_f
+     if progress > last_printed_progress + 1
+       print "[#{progress.to_i}%]"
+       last_printed_progress = progress
+     end
+     if point.cluster
+       print "(#{point.cluster.id}|#{point.cluster.points.count})"
+     else
+       print "(nil|0)"
+     end
+   end
+   ```
    Plase also take note that the `max_distance` value is **highly dependent on the type of metric** you decided to go for. For the `AverageDifference` and `EuclideanDistance` metrics it can be an **open-ended positive value**. For the `CosineSimilarity` and `PearsonCorrelation` types it needs to be a value between 0 and 2 where a value of `0` means "100% positive correlation/similarity", a value of `1` means "no correlation/similarity at all" and a value of `2` means "100% negative correlation/similarity". You can use any decimal value in between (e.g. 0.25) as a partly positive/negative correlation.
 8. Wait for the calculations to finish and use the results the way you want:
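
The progress arithmetic in the new README example can be tried without the gem. The sketch below is a stand-in, not the gem's API: `progress_marks` is a hypothetical helper that replays only the throttling logic (print when progress has advanced by more than one percentage point) over a given number of points.

```ruby
# Hypothetical stand-in for the block passed to dbscan.cluster: it reproduces
# the README's throttling arithmetic only, without clustering anything.
def progress_marks(points_count)
  marks = []
  last_printed_progress = 0.0
  points_count.times do |current_index|
    # current_index is zero-based, so add 1 before converting to a percentage
    progress = (current_index + 1) * 100 / points_count.to_f
    if progress > last_printed_progress + 1
      marks << "[#{progress.to_i}%]"
      last_printed_progress = progress
    end
  end
  marks
end

puts progress_marks(4).join(" ")   # prints "[25%] [50%] [75%] [100%]"
```

With many points, most iterations fall below the one-point threshold and print nothing, which keeps the output to roughly a hundred progress markers.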

data/VERSION

@@ -1 +1 @@
-0.1.4
+0.1.5

data/db_clustering.gemspec

@@ -2,11 +2,11 @@
 # DO NOT EDIT THIS FILE DIRECTLY
 # Instead, edit Jeweler::Tasks in Rakefile, and run 'rake gemspec'
 # -*- encoding: utf-8 -*-
-# stub: db_clustering 0.1.4 ruby lib
+# stub: db_clustering 0.1.5 ruby lib
 Gem::Specification.new do |s|
   s.name = "db_clustering"
-  s.version = "0.1.4"
+  s.version = "0.1.5"
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.require_paths = ["lib"]

data/lib/algorithms/density_based/dbscan.rb

@@ -16,7 +16,7 @@ module DbClustering
         @clusters = []
         cluster = nil
-        @datasource.iterate_all_points do |point|
+        @datasource.iterate_all_points do |point, current_index, points_count|
           neighbors = @datasource.neighbors(point: point, distance_metric: @distance_metric, max_distance: max_distance)
           if neighbors.count < min_neighbors
@@ -36,7 +36,7 @@ module DbClustering
             end
           end
-          yield(point)
+          yield(point, current_index, points_count) if block_given?
         end
       end
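
The hunk above makes the block optional and widens what it receives. A minimal sketch of that pattern in isolation, assuming nothing about the rest of the class: `each_with_progress` is a hypothetical method, not part of the gem.

```ruby
# Hypothetical miniature of the pattern; not the gem's Dbscan class.
def each_with_progress(items)
  visited = 0
  items.each_with_index do |item, current_index|
    visited += 1
    # Without this guard, a call that passes no block raises LocalJumpError.
    yield(item, current_index, items.size) if block_given?
  end
  visited
end

# No block: the loop still runs and nothing is raised.
each_with_progress(%w[a b c])

# Three-parameter block: receives the item, its index, and the total.
each_with_progress(%w[a b c]) { |item, i, n| puts "#{i + 1}/#{n}: #{item}" }

# One-parameter block: the extra yielded values are silently dropped.
each_with_progress(%w[a b c]) { |item| puts item }
```

Because Ruby blocks ignore surplus arguments, callers written against a one-argument `yield(point)` should keep working when the method starts yielding three values.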

data/lib/datasource_adapters/active_record.rb

@@ -7,9 +7,13 @@ module DbClustering
       end
       def iterate_all_points
+        points_count = @relation.count
+        current_index = 0
         @relation.find_each do |datasource_point|
           point = DbClustering::Models::Point.new(datasource_point)
-          yield(point)
+          yield(point, current_index, points_count)
+          current_index += 1
         end
       end
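
The adapter change above counts once up front and then keeps a manual index while iterating. The sketch below replays that shape without Rails; `FakeRelation` is a hypothetical stand-in that offers only `count` and `find_each`, and the free-standing `iterate_all_points` is an illustration rather than the adapter itself.

```ruby
# Hypothetical stand-in for an ActiveRecord relation: it offers only the two
# methods the adapter calls, `count` and `find_each`.
class FakeRelation
  def initialize(rows)
    @rows = rows
  end

  def count
    @rows.size
  end

  def find_each(&block)
    @rows.each(&block)
  end
end

def iterate_all_points(relation)
  points_count = relation.count   # taken once, before iteration starts
  current_index = 0
  relation.find_each do |row|
    yield(row, current_index, points_count)
    current_index += 1
  end
end

iterate_all_points(FakeRelation.new(%w[a b c])) do |row, current_index, points_count|
  puts "#{current_index + 1} of #{points_count}: #{row}"
end
```

On a real relation, `count` runs as its own query before `find_each` starts, so if rows are added or removed during a long clustering run the final index may not line up exactly with `points_count`.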

data/lib/datasource_adapters/in_memory.rb

@@ -7,8 +7,9 @@ module DbClustering
       end
       def iterate_all_points
-        @array.each do |point|
-          yield(point)
+        points_count = @array.count
+        @array.each.with_index do |point, current_index|
+          yield(point, current_index, points_count)
         end
       end
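
For the in-memory case, `each.with_index` supplies the zero-based index directly, so no manual counter is needed. A small sketch of what a caller's block receives, using a hypothetical `InMemorySketch` class that mirrors only the method shown in the hunk:

```ruby
# Hypothetical class mirroring the changed method; not the gem's adapter.
class InMemorySketch
  def initialize(array)
    @array = array
  end

  def iterate_all_points
    points_count = @array.count
    # each without a block returns an Enumerator; with_index adds the index.
    @array.each.with_index do |point, current_index|
      yield(point, current_index, points_count)
    end
  end
end

seen = []
InMemorySketch.new(%w[p q r]).iterate_all_points do |point, current_index, points_count|
  seen << [point, current_index, points_count]
end
p seen   # prints [["p", 0, 3], ["q", 1, 3], ["r", 2, 3]]
```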

metadata

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: db_clustering
 version: !ruby/object:Gem::Version
-  version: 0.1.4
+  version: 0.1.5
 platform: ruby
 authors:
 - Cihat Gündüz