timescaledb 0.2.6 → 0.2.7
- checksums.yaml +4 -4
- data/lib/timescaledb/acts_as_hypertable/core.rb +1 -1
- data/lib/timescaledb/database/quoting.rb +12 -0
- data/lib/timescaledb/database/schema_statements.rb +168 -0
- data/lib/timescaledb/database/types.rb +17 -0
- data/lib/timescaledb/database.rb +11 -0
- data/lib/timescaledb/toolkit/time_vector.rb +41 -4
- data/lib/timescaledb/version.rb +1 -1
- metadata +6 -95
- data/.github/workflows/ci.yml +0 -72
- data/.gitignore +0 -12
- data/.rspec +0 -3
- data/.ruby-version +0 -1
- data/.tool-versions +0 -1
- data/.travis.yml +0 -9
- data/CODE_OF_CONDUCT.md +0 -74
- data/Fastfile +0 -17
- data/Gemfile +0 -8
- data/Gemfile.lock +0 -75
- data/Gemfile.scenic +0 -7
- data/Gemfile.scenic.lock +0 -119
- data/README.md +0 -490
- data/Rakefile +0 -21
- data/bin/console +0 -28
- data/bin/setup +0 -13
- data/docs/command_line.md +0 -178
- data/docs/img/lttb_example.png +0 -0
- data/docs/img/lttb_sql_vs_ruby.gif +0 -0
- data/docs/img/lttb_zoom.gif +0 -0
- data/docs/index.md +0 -72
- data/docs/migrations.md +0 -76
- data/docs/models.md +0 -78
- data/docs/toolkit.md +0 -507
- data/docs/toolkit_lttb_tutorial.md +0 -557
- data/docs/toolkit_lttb_zoom.md +0 -357
- data/docs/toolkit_ohlc.md +0 -315
- data/docs/videos.md +0 -16
- data/examples/all_in_one/all_in_one.rb +0 -94
- data/examples/all_in_one/benchmark_comparison.rb +0 -108
- data/examples/all_in_one/caggs.rb +0 -93
- data/examples/all_in_one/query_data.rb +0 -78
- data/examples/ranking/.gitattributes +0 -7
- data/examples/ranking/.gitignore +0 -29
- data/examples/ranking/.ruby-version +0 -1
- data/examples/ranking/Gemfile +0 -33
- data/examples/ranking/Gemfile.lock +0 -189
- data/examples/ranking/README.md +0 -166
- data/examples/ranking/Rakefile +0 -6
- data/examples/ranking/app/controllers/application_controller.rb +0 -2
- data/examples/ranking/app/controllers/concerns/.keep +0 -0
- data/examples/ranking/app/jobs/application_job.rb +0 -7
- data/examples/ranking/app/models/application_record.rb +0 -3
- data/examples/ranking/app/models/concerns/.keep +0 -0
- data/examples/ranking/app/models/game.rb +0 -2
- data/examples/ranking/app/models/play.rb +0 -7
- data/examples/ranking/bin/bundle +0 -114
- data/examples/ranking/bin/rails +0 -4
- data/examples/ranking/bin/rake +0 -4
- data/examples/ranking/bin/setup +0 -33
- data/examples/ranking/config/application.rb +0 -39
- data/examples/ranking/config/boot.rb +0 -4
- data/examples/ranking/config/credentials.yml.enc +0 -1
- data/examples/ranking/config/database.yml +0 -86
- data/examples/ranking/config/environment.rb +0 -5
- data/examples/ranking/config/environments/development.rb +0 -60
- data/examples/ranking/config/environments/production.rb +0 -75
- data/examples/ranking/config/environments/test.rb +0 -53
- data/examples/ranking/config/initializers/cors.rb +0 -16
- data/examples/ranking/config/initializers/filter_parameter_logging.rb +0 -8
- data/examples/ranking/config/initializers/inflections.rb +0 -16
- data/examples/ranking/config/initializers/timescale.rb +0 -2
- data/examples/ranking/config/locales/en.yml +0 -33
- data/examples/ranking/config/puma.rb +0 -43
- data/examples/ranking/config/routes.rb +0 -6
- data/examples/ranking/config/storage.yml +0 -34
- data/examples/ranking/config.ru +0 -6
- data/examples/ranking/db/migrate/20220209120747_create_games.rb +0 -10
- data/examples/ranking/db/migrate/20220209120910_create_plays.rb +0 -19
- data/examples/ranking/db/migrate/20220209143347_create_score_per_hours.rb +0 -5
- data/examples/ranking/db/schema.rb +0 -47
- data/examples/ranking/db/seeds.rb +0 -7
- data/examples/ranking/db/views/score_per_hours_v01.sql +0 -7
- data/examples/ranking/lib/tasks/.keep +0 -0
- data/examples/ranking/log/.keep +0 -0
- data/examples/ranking/public/robots.txt +0 -1
- data/examples/ranking/storage/.keep +0 -0
- data/examples/ranking/tmp/.keep +0 -0
- data/examples/ranking/tmp/pids/.keep +0 -0
- data/examples/ranking/tmp/storage/.keep +0 -0
- data/examples/ranking/vendor/.keep +0 -0
- data/examples/toolkit-demo/compare_volatility.rb +0 -104
- data/examples/toolkit-demo/lttb/README.md +0 -15
- data/examples/toolkit-demo/lttb/lttb.rb +0 -92
- data/examples/toolkit-demo/lttb/lttb_sinatra.rb +0 -139
- data/examples/toolkit-demo/lttb/lttb_test.rb +0 -21
- data/examples/toolkit-demo/lttb/views/index.erb +0 -27
- data/examples/toolkit-demo/lttb-zoom/README.md +0 -13
- data/examples/toolkit-demo/lttb-zoom/lttb_zoomable.rb +0 -90
- data/examples/toolkit-demo/lttb-zoom/views/index.erb +0 -33
- data/examples/toolkit-demo/ohlc.rb +0 -175
- data/mkdocs.yml +0 -34
- data/timescaledb.gemspec +0 -40
@@ -1,557 +0,0 @@
|
|
1
|
-
|
2
|
-
[Largest Triangle Three Buckets][1] is a downsampling method that tries to retain visual similarity between the downsampled data and the original dataset.
|
3
|
-
|
4
|
-
While most frameworks implement it in the front end, TimescaleDB Toolkit provides an implementation that takes (timestamp, value) pairs, sorts them if needed, and downsamples the values directly in the database.
|
5
|
-
|
6
|
-
In the following steps, you'll learn how to use LTTB both in the database and in Ruby. You'll write the LTTB algorithm in Ruby from scratch to fully understand how it works, and then compare the performance and usability of both solutions.
|
7
|
-
|
8
|
-
Later, we'll benchmark the downsampling methods and the plain data using a real scenario. The data points are actual data from the [weather dataset][4].
|
9
|
-
|
10
|
-
If you want to run it yourself, feel free to use the [example][3] that contains all the steps we will describe here.
|
11
|
-
|
12
|
-
## Set up the dependencies
|
13
|
-
|
14
|
-
Bundler inline lets you prototype code that ships in a single file without creating a separate `Gemfile`. You declare all the gems in the `gemfile` code block, and Bundler installs them dynamically.
|
15
|
-
|
16
|
-
```ruby
|
17
|
-
require 'bundler/inline'
|
18
|
-
|
19
|
-
gemfile(true) do
|
20
|
-
gem 'timescaledb'
|
21
|
-
gem 'pry'
|
22
|
-
gem 'chartkick'
|
23
|
-
gem 'sinatra'
|
24
|
-
end
|
25
|
-
```
|
26
|
-
|
27
|
-
```ruby
|
28
|
-
require 'timescaledb/toolkit'
|
29
|
-
```
|
30
|
-
|
31
|
-
The Timescale gem doesn't require the toolkit by default, so you must require it explicitly to use it.
|
32
|
-
|
33
|
-
!!!warning
|
34
|
-
Note that we do not require the rest of the libraries because Bundler inline already requires the specified libraries by default which is very convenient for examples in a single file.
|
35
|
-
Let's take a look at what dependencies we have for what purpose:
|
36
|
-
|
37
|
-
* [timescaledb][8] gem is the ActiveRecord wrapper for TimescaleDB functions.
|
38
|
-
* [pry][9] is here because it's the best REPL to debug any Ruby code. We add it at the end to make it easier to explore on your own after finishing the tutorial.
|
39
|
-
* [chartkick][11] is the library that makes it easy to plot the resulting data.
|
40
|
-
* [sinatra][19] is a DSL for quickly creating web applications with minimal
|
41
|
-
effort.
|
42
|
-
|
43
|
-
## Set up the database
|
44
|
-
|
45
|
-
Now, it's time to set up the database for this application. Make sure you
|
46
|
-
have TimescaleDB installed or [learn how to install TimescaleDB here][13].
|
47
|
-
|
48
|
-
### Establishing the connection
|
49
|
-
|
50
|
-
The next step is to connect to the database. We will run this example passing the PostgreSQL URI as the last command-line argument.
|
51
|
-
|
52
|
-
```ruby
|
53
|
-
PG_URI = ARGV.last
|
54
|
-
ActiveRecord::Base.establish_connection(PG_URI)
|
55
|
-
```
|
56
|
-
|
57
|
-
If this line works, it means your connection is good.
|
58
|
-
|
59
|
-
### Downloading the dataset
|
60
|
-
|
61
|
-
The weather dataset is available [here][4], and here is a small automation to fetch it in its small, medium, or big variant.
|
62
|
-
|
63
|
-
```ruby
|
64
|
-
VALID_SIZES = %i[small med big]
|
65
|
-
def download_weather_dataset size: :small
|
66
|
-
unless VALID_SIZES.include?(size)
|
67
|
-
fail "Invalid size: #{size}. Valid are #{VALID_SIZES}"
|
68
|
-
end
|
69
|
-
url = "https://timescaledata.blob.core.windows.net/datasets/weather_#{size}.tar.gz"
|
70
|
-
puts "fetching #{size} weather dataset..."
|
71
|
-
system "wget \"#{url}\""
|
72
|
-
puts "done!"
|
73
|
-
end
|
74
|
-
```
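To check the size validation without touching the network, it can help to separate URL building from the download itself. The following is a minimal sketch under that assumption: `weather_dataset_url` is a hypothetical helper that is not part of the tutorial code, and it only reuses the URL pattern shown above.

```ruby
VALID_SIZES = %i[small med big].freeze

# Hypothetical helper: builds the dataset URL and rejects unknown sizes.
def weather_dataset_url(size)
  unless VALID_SIZES.include?(size)
    raise ArgumentError, "Invalid size: #{size}. Valid are #{VALID_SIZES}"
  end
  "https://timescaledata.blob.core.windows.net/datasets/weather_#{size}.tar.gz"
end

puts weather_dataset_url(:med)

begin
  weather_dataset_url(:huge)
rescue ArgumentError => e
  puts e.message # the size is rejected before any download starts
end
```

Raising before the `wget` call keeps a typo in the size from turning into a failed download.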
|
75
|
-
|
76
|
-
Now, let's create a setup method that checks whether the database is created and has
|
77
|
-
the data loaded, fetching the dataset if necessary.
|
78
|
-
|
79
|
-
```ruby
|
80
|
-
def setup size: :small
|
81
|
-
file = "weather_#{size}.tar.gz"
|
82
|
-
download_weather_dataset(size: size) unless File.exist?(file) # File.exists? was removed in Ruby 3.2
|
83
|
-
puts "extracting #{file}"
|
84
|
-
system "tar -xvzf #{file} "
|
85
|
-
puts "creating data structures"
|
86
|
-
system "psql #{PG_URI} < weather.sql"
|
87
|
-
system %|psql #{PG_URI} -c "\\COPY locations FROM weather_#{size}_locations.csv CSV"|
|
88
|
-
system %|psql #{PG_URI} -c "\\COPY conditions FROM weather_#{size}_conditions.csv CSV"|
|
89
|
-
end
|
90
|
-
```
|
91
|
-
|
92
|
-
!!!info
|
93
|
-
You may need to recreate the database if you want to test with a different dataset.
|
94
|
-
|
95
|
-
### Declaring the models
|
96
|
-
|
97
|
-
Now, let's declare the ActiveRecord models. The location is an auxiliary table to control the placement of the device.
|
98
|
-
|
99
|
-
```ruby
|
100
|
-
class Location < ActiveRecord::Base
|
101
|
-
self.primary_key = "device_id"
|
102
|
-
|
103
|
-
has_many :conditions, foreign_key: "device_id"
|
104
|
-
end
|
105
|
-
```
|
106
|
-
|
107
|
-
Every location emits weather conditions with `temperature` and `humidity` every X minutes.
|
108
|
-
|
109
|
-
The `conditions` table holds the time-series data we'll refer to here.
|
110
|
-
|
111
|
-
```ruby
|
112
|
-
class Condition < ActiveRecord::Base
|
113
|
-
acts_as_hypertable time_column: "time"
|
114
|
-
acts_as_time_vector value_column: "temperature", segment_by: "device_id"
|
115
|
-
belongs_to :location, foreign_key: "device_id"
|
116
|
-
end
|
117
|
-
```
|
118
|
-
|
119
|
-
### Putting it all together
|
120
|
-
|
121
|
-
Now it's time to call the methods we implemented before. So, let's set up a logger to STDOUT to confirm the steps and add the toolkit to the search path.
|
122
|
-
|
123
|
-
Similar to database migration, we need to verify if the table exists, set up the hypertable and load the data if necessary.
|
124
|
-
|
125
|
-
```ruby
|
126
|
-
ActiveRecord::Base.connection.instance_exec do
|
127
|
-
ActiveRecord::Base.logger = Logger.new(STDOUT)
|
128
|
-
add_toolkit_to_search_path!
|
129
|
-
|
130
|
-
unless Condition.table_exists?
|
131
|
-
setup size: :small
|
132
|
-
end
|
133
|
-
end
|
134
|
-
```
|
135
|
-
|
136
|
-
The `setup` method can also fetch different datasets, but you'll need to manually
|
137
|
-
drop the `conditions` and `locations` tables to reload it.
|
138
|
-
|
139
|
-
!!!info
|
140
|
-
If you want to go deeper and reload everything every time, feel free to
|
141
|
-
add the following lines before the `unless` block:
|
142
|
-
|
143
|
-
```ruby
|
144
|
-
drop_table(:conditions) if Condition.table_exists?
|
145
|
-
drop_table(:locations) if Location.table_exists?
|
146
|
-
```
|
147
|
-
|
148
|
-
Let's keep the example simple to run it manually and drop the tables when we want to run everything from scratch.
|
149
|
-
|
150
|
-
|
151
|
-
## Processing LTTB in Ruby
|
152
|
-
|
153
|
-
You can find an [old lttb gem][2] available if you want to cut down this step
|
154
|
-
but this library is not fully implementing the lttb algorithm, and the results
|
155
|
-
may differ from the Timescale implementation.
|
156
|
-
|
157
|
-
If you want to understand the algorithm behind the scenes, this step will make it very clear and easy to digest. You can also [preview the original lttb here][15].
|
158
|
-
|
159
|
-
!!!info
|
160
|
-
The [original thesis][16] describes lttb as:
|
161
|
-
|
162
|
-
The algorithm works with three buckets at a time and proceeds from left to right. The first point which forms the left corner of the triangle (the effective area) is always fixed as the point that was previously selected and one of the points in the middle bucket shall be selected now. The question is what point should the algorithm use in the last bucket to form the triangle.
|
163
|
-
|
164
|
-
The obvious answer is to use a brute-force approach and simply try out all the possibilities. That is, for each point in the current bucket, form a triangle with all the points in the next bucket. It turns out that this gives a fairly good visual result, but as with many brute-force approaches it is inefficient. For example, if there were 100 points per bucket, the algorithm would need to calculate the area of 10,000 triangles for every bucket. Another and more clever solution is to add a temporary point to the last bucket and keep it fixed. That way the algorithm has two fixed points; and one only needs to calculate the number of triangles equal to the number of points in the current bucket. The point in the current bucket which forms the largest triangle with this two fixed point in the adjacent buckets is then selected. In figure 4.4 it is shown how point B forms the largest triangle across the buckets with fixed point A (previously selected) and the temporary point C.
|
165
|
-
|
166
|
-
![LTTB Triangle Bucketing Example](https://jonatas.github.io/timescaledb/img/lttb_example.png)
|
167
|
-
|
168
|
-
### Calculate the area of a Triangle
|
169
|
-
|
170
|
-
To demonstrate this, let's create a module `Triangle` with an `area` method that accepts three points `a`, `b`, and `c`, each a pair of `x` and `y` Cartesian coordinates.
|
171
|
-
|
172
|
-
```ruby
|
173
|
-
module Triangle
|
174
|
-
module_function
|
175
|
-
def area(a, b, c)
|
176
|
-
(ax, ay), (bx, by), (cx, cy) = a,b,c
|
177
|
-
(
|
178
|
-
(ax - cx).to_f * (by - ay) -
|
179
|
-
(ax - bx).to_f * (cy - ay)
|
180
|
-
).abs * 0.5
|
181
|
-
end
|
182
|
-
end
|
183
|
-
```
|
184
|
-
|
185
|
-
!!!info The Shoelace Formula
|
186
|
-
|
187
|
-
In this implementation, we're using the shoelace method.
|
188
|
-
|
189
|
-
> _The shoelace method (also known as Gauss's area formula and the surveyor's formula) is a mathematical algorithm to determine the area of a simple polygon whose vertices are described by their Cartesian coordinates in the plane. It is called the shoelace formula because of the constant cross-multiplying for the coordinates making up the polygon, like threading shoelaces. It has applications in surveying and forestry, among other areas._
|
190
|
-
Source: [Shoelace formula Wikipedia][17]
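A quick sanity check makes the formula concrete. The sketch below repeats the `Triangle` module so it runs on its own, then feeds it a right triangle with legs 4 and 3 and a set of collinear points. The sample coordinates are chosen only for illustration.

```ruby
module Triangle
  module_function

  # Area of the triangle formed by three [x, y] points (shoelace formula).
  def area(a, b, c)
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    ((ax - cx).to_f * (by - ay) - (ax - bx).to_f * (cy - ay)).abs * 0.5
  end
end

p Triangle.area([0, 0], [4, 0], [0, 3]) # right triangle: 4 * 3 / 2 = 6.0
p Triangle.area([0, 0], [1, 1], [2, 2]) # collinear points enclose no area: 0.0
```

Collinear points yield a zero area, which is why points lying on a straight line between their neighbours are the first to be dropped by the downsampling.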
|
191
|
-
|
192
|
-
### Initializing the Lttb class
|
193
|
-
|
194
|
-
The `Lttb` class will be responsible for processing the data and downsampling the points to the desired threshold. Let's declare the initial boilerplate code with some basic validation to make it work.
|
195
|
-
|
196
|
-
```ruby
|
197
|
-
class Lttb
|
198
|
-
attr_reader :data, :threshold
|
199
|
-
def initialize(data, threshold)
|
200
|
-
fail 'data is not an array' unless data.is_a? Array
|
201
|
-
fail "threshold should be >= 2. It's #{threshold}." if threshold < 2
|
202
|
-
@data = data
|
203
|
-
@threshold = threshold
|
204
|
-
end
|
205
|
-
def downsample
|
206
|
-
fail 'Not implemented yet!'
|
207
|
-
end
|
208
|
-
end
|
209
|
-
```
|
210
|
-
|
211
|
-
Note that the threshold must be at least 2 because the first and last points are always kept untouched; the algorithm reduces only the points in the middle.
|
212
|
-
|
213
|
-
### Calculating the average of points
|
214
|
-
|
215
|
-
Trying every combination of points to find the largest area would be too expensive. Instead, we need an average method. The average of the points in the next bucket becomes the **temporary** point, as the thesis quoted earlier describes:
|
216
|
-
|
217
|
-
> _For example, if there were 100 points per bucket, the algorithm would need to calculate the area of 10,000 triangles for every bucket. Another clever solution is to add a temporary point to the last bucket and keep it fixed. That way, the algorithm has two fixed points;_
|
218
|
-
|
219
|
-
```ruby
|
220
|
-
class Lttb
|
221
|
-
def self.avg(array)
|
222
|
-
array.sum.to_f / array.size
|
223
|
-
end
|
224
|
-
|
225
|
-
# previous implementation here
|
226
|
-
end
|
227
|
-
```
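To see what the temporary point looks like, here is a small sketch that averages one bucket of `[x, y]` pairs. The `avg` method is a standalone copy of `Lttb.avg` so the snippet runs by itself, and the bucket values are illustrative.

```ruby
# Same averaging as Lttb.avg, written as a standalone method for this sketch.
def avg(array)
  array.sum.to_f / array.size
end

# Points of one bucket as [x, y] pairs (x is a day offset, y a temperature).
bucket = [[3, 32], [4, 12], [5, 14]]

temporary_point = [
  avg(bucket.map(&:first)), # average x: (3 + 4 + 5) / 3
  avg(bucket.map(&:last))   # average y: (32 + 12 + 14) / 3
]

p temporary_point
```

With the temporary point fixed, each candidate in the current bucket needs only one area calculation instead of one per point in the next bucket.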
|
228
|
-
|
229
|
-
We'll need to establish the interface we want for our Lttb class. Let's say
|
230
|
-
we want to test it with some static data like:
|
231
|
-
|
232
|
-
```ruby
|
233
|
-
data = [
|
234
|
-
['2020-1-1', 10],
|
235
|
-
['2020-1-2', 21],
|
236
|
-
['2020-1-3', 19],
|
237
|
-
['2020-1-4', 32],
|
238
|
-
['2020-1-5', 12],
|
239
|
-
['2020-1-6', 14],
|
240
|
-
['2020-1-7', 18],
|
241
|
-
['2020-1-8', 29],
|
242
|
-
['2020-1-9', 23],
|
243
|
-
['2020-1-10', 27],
|
244
|
-
['2020-1-11', 14]]
|
245
|
-
|
246
|
-
data.each do |e|
|
247
|
-
e[0] = Time.mktime(*e[0].split('-'))
|
248
|
-
end
|
249
|
-
```
|
250
|
-
|
251
|
-
To downsample this data from 11 points to 5 in a single line, we'd need a method like:
|
252
|
-
|
253
|
-
```ruby
|
254
|
-
Lttb.downsample(data, 5) # => 5 points downsampled here...
|
255
|
-
```
|
256
|
-
|
257
|
-
Let's add the class method that wraps the algorithm:
|
258
|
-
|
259
|
-
```ruby
|
260
|
-
class Lttb
|
261
|
-
def self.downsample(data, threshold)
|
262
|
-
new(data, threshold).downsample
|
263
|
-
end
|
264
|
-
end
|
265
|
-
```
|
266
|
-
|
267
|
-
!!!info
|
268
|
-
Note that the example reopens the class several times to build it up step by step. If you're following the tutorial, add all the methods to the same class body.
|
269
|
-
|
270
|
-
Now, it's time to add the class initializer and the instance readers, with some minimal validation of the arguments:
|
271
|
-
|
272
|
-
```ruby
|
273
|
-
class Lttb
|
274
|
-
attr_reader :data, :threshold
|
275
|
-
def initialize(data, threshold)
|
276
|
-
fail 'data is not an array' unless data.is_a? Array
|
277
|
-
fail "threshold should be >= 2. It's #{threshold}." if threshold < 2
|
278
|
-
@data = data
|
279
|
-
@threshold = threshold
|
280
|
-
end
|
281
|
-
|
282
|
-
def downsample
|
283
|
-
fail 'Not implemented yet!'
|
284
|
-
end
|
285
|
-
end
|
286
|
-
```
|
287
|
-
|
288
|
-
The `downsample` method fails for now because building its logic is the next step.
|
289
|
-
|
290
|
-
But first, let's add some helper methods that will help us digest the
|
291
|
-
entire algorithm.
|
292
|
-
|
293
|
-
### Dates versus Numbers
|
294
|
-
|
295
|
-
We're talking about time-series data, and we'll need to normalize them to
|
296
|
-
numbers.
|
297
|
-
|
298
|
-
If the data passed to the function uses dates, we'll need to convert them to numbers to calculate the area of the triangles.
|
299
|
-
|
300
|
-
Considering the data is already sorted by time, the strategy here is to save the first date and iterate over all records, transforming dates into numbers relative to the first date in the data.
|
301
|
-
|
302
|
-
```ruby
|
303
|
-
def dates_to_numbers
|
304
|
-
@start_date = data[0][0]
|
305
|
-
data.each{|d| d[0] = d[0] - @start_date} # offset from the first point, so adding it back restores the date
|
306
|
-
end
|
307
|
-
```
|
308
|
-
|
309
|
-
To convert the downsampled data back, we add each interval to the start date.
|
310
|
-
|
311
|
-
```ruby
|
312
|
-
def numbers_to_dates(downsampled)
|
313
|
-
downsampled.each{|d| d[0] = @start_date + d[0]}
|
314
|
-
end
|
315
|
-
```
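The round trip between dates and numbers can be checked in isolation. This is a minimal sketch that assumes `Time` values and measures each offset as `point - start`; the sample dates are illustrative.

```ruby
start = Time.utc(2020, 1, 1)
times = [start, Time.utc(2020, 1, 2), Time.utc(2020, 1, 4)]

# Time - Time returns the difference in seconds as a Float.
offsets = times.map { |t| t - start }

# Time + Numeric returns a Time, so adding the offset restores the original.
restored = offsets.map { |seconds| start + seconds }

p offsets           # => [0.0, 86400.0, 259200.0]
p restored == times # => true
```

Keeping the offsets relative to the first point keeps the x values small and non-negative, and the conversion back is a single addition.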
|
316
|
-
|
317
|
-
### Bucket size
|
318
|
-
|
319
|
-
Now, it's time to define how many points should be analyzed at a time to
|
320
|
-
downsample the data. As the first and last points should remain untouched, the algorithm should reduce the remaining points in the middle based on a ratio between the total amount of data and the threshold.
|
321
|
-
|
322
|
-
```ruby
|
323
|
-
def bucket_size
|
324
|
-
@bucket_size ||= ((data.size - 2.0) / (threshold - 2.0))
|
325
|
-
end
|
326
|
-
```
|
327
|
-
|
328
|
-
Bucket size is a floating-point number, but slicing an array requires an integer number of elements, so we truncate it to get the slice length used when calculating the triangle areas.
|
329
|
-
|
330
|
-
```ruby
|
331
|
-
def slice
|
332
|
-
@slice ||= bucket_size.to_i
|
333
|
-
end
|
334
|
-
```
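As a worked example of the arithmetic, the sketch below recomputes the bucket size and slice length for two cases. The methods are standalone versions that take their inputs as arguments, which is an assumption made here so the snippet runs outside the class.

```ruby
# Standalone versions of bucket_size and slice for a quick calculation.
def bucket_size(total_points, threshold)
  (total_points - 2.0) / (threshold - 2.0)
end

def slice(total_points, threshold)
  bucket_size(total_points, threshold).to_i
end

p bucket_size(11, 5)   # 9.0 / 3.0  => 3.0
p slice(11, 5)         # => 3
p bucket_size(100, 10) # 98.0 / 8.0 => 12.25
p slice(100, 10)       # => 12
```

The `- 2` on both sides accounts for the first and last points, which are always kept and never take part in the bucketing.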
|
335
|
-
|
336
|
-
### Downsampling
|
337
|
-
|
338
|
-
Let's put it all together and create the core structure to iterate over the values and process the triangles to select the largest areas.
|
339
|
-
|
340
|
-
```ruby
|
341
|
-
def downsample
|
342
|
-
unless @data.first.first.is_a?(Numeric)
|
343
|
-
transformed_dates = true
|
344
|
-
dates_to_numbers()
|
345
|
-
end
|
346
|
-
downsampled = process
|
347
|
-
numbers_to_dates(downsampled) if transformed_dates
|
348
|
-
downsampled
|
349
|
-
end
|
350
|
-
```
|
351
|
-
|
352
|
-
The last method is the **process** that should contain all the logic.
|
353
|
-
|
354
|
-
It navigates the points and downsamples the coordinates based on the threshold.
|
355
|
-
|
356
|
-
```ruby
|
357
|
-
def process
|
358
|
-
return data if threshold >= data.size
|
359
|
-
|
360
|
-
sampled = [data.first]
|
361
|
-
point_index = 0
|
362
|
-
|
363
|
-
(threshold - 2).times do |i|
|
364
|
-
step = [((i+1.0) * bucket_size).to_i, data.size - 1].min
|
365
|
-
next_point = (i * bucket_size).to_i + 1
|
366
|
-
|
367
|
-
break if next_point > data.size - 2
|
368
|
-
|
369
|
-
points = data[step, slice]
|
370
|
-
avg_x = Lttb.avg(points.map(&:first)).to_i
|
371
|
-
avg_y = Lttb.avg(points.map(&:last))
|
372
|
-
|
373
|
-
max_area = -1.0
|
374
|
-
|
375
|
-
(next_point...(step + 1)).each do |idx|
|
376
|
-
area = Triangle.area(data[point_index], data[idx], [avg_x, avg_y])
|
377
|
-
|
378
|
-
if area > max_area
|
379
|
-
max_area = area
|
380
|
-
next_point = idx
|
381
|
-
end
|
382
|
-
end
|
383
|
-
|
384
|
-
sampled << data[next_point]
|
385
|
-
point_index = next_point
|
386
|
-
end
|
387
|
-
|
388
|
-
sampled << data.last
|
389
|
-
end
|
390
|
-
```
|
391
|
-
|
392
|
-
For example, to downsample 11 points to 5, it keeps the first and the eleventh points and adds three more points from the middle. It slices the records three at a time, finds the average values for both axes in the next bucket, and picks the point that forms the triangle with the maximum area in each group of 3 points.
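Because the class was built up in pieces, it may help to see everything in one runnable file. The following is a consolidated sketch of the steps above, not the gem's implementation: it skips the date conversion by using day offsets as `x` values, and `Lttb.downsample` calls `process` directly for that reason.

```ruby
module Triangle
  module_function

  # Area of the triangle formed by three [x, y] points (shoelace formula).
  def area(a, b, c)
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    ((ax - cx).to_f * (by - ay) - (ax - bx).to_f * (cy - ay)).abs * 0.5
  end
end

class Lttb
  attr_reader :data, :threshold

  def self.avg(array)
    array.sum.to_f / array.size
  end

  def self.downsample(data, threshold)
    new(data, threshold).process
  end

  def initialize(data, threshold)
    fail 'data is not an array' unless data.is_a? Array
    fail "threshold should be >= 2. It's #{threshold}." if threshold < 2
    @data = data
    @threshold = threshold
  end

  def bucket_size
    @bucket_size ||= (data.size - 2.0) / (threshold - 2.0)
  end

  def slice
    @slice ||= bucket_size.to_i
  end

  def process
    return data if threshold >= data.size

    sampled = [data.first] # the first point is always kept
    point_index = 0

    (threshold - 2).times do |i|
      step = [((i + 1.0) * bucket_size).to_i, data.size - 1].min
      next_point = (i * bucket_size).to_i + 1
      break if next_point > data.size - 2

      # Temporary point: the average of the next bucket.
      points = data[step, slice]
      avg_x = Lttb.avg(points.map(&:first)).to_i
      avg_y = Lttb.avg(points.map(&:last))

      # Pick the candidate that forms the largest triangle.
      max_area = -1.0
      (next_point...(step + 1)).each do |idx|
        area = Triangle.area(data[point_index], data[idx], [avg_x, avg_y])
        if area > max_area
          max_area = area
          next_point = idx
        end
      end

      sampled << data[next_point]
      point_index = next_point
    end

    sampled << data.last # the last point is always kept
  end
end

# x is the day offset from 2020-1-1, y is the value from the static data above.
values = [10, 21, 19, 32, 12, 14, 18, 29, 23, 27, 14]
data = values.each_with_index.map { |y, x| [x, y] }

downsampled = Lttb.downsample(data, 5)
p downsampled # => [[0, 10], [3, 32], [4, 12], [7, 29], [10, 14]]
```

The peaks at 32 and 29 and the dip at 12 survive, which is the visual shape a chart reader would notice first.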
|
393
|
-
|
394
|
-
## Web preview
|
395
|
-
|
396
|
-
Now, it's time to preview and check the functions in action by plotting the
|
397
|
-
downsampled data in the browser.
|
398
|
-
|
399
|
-
Let's jump into the creation of some helpers that the frontend will use in both endpoints for Ruby and SQL:
|
400
|
-
|
401
|
-
```ruby
|
402
|
-
def conditions
|
403
|
-
Location
|
404
|
-
.find_by(device_id: 'weather-pro-000001')
|
405
|
-
.conditions
|
406
|
-
end
|
407
|
-
|
408
|
-
def threshold
|
409
|
-
params[:threshold]&.to_i || 20
|
410
|
-
end
|
411
|
-
```
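The fallback logic in `threshold` is easy to try outside of a request. In this sketch a plain hash stands in for Sinatra's `params`, which is an assumption made only to keep the snippet standalone.

```ruby
# A plain hash stands in for Sinatra's params; real request values are strings.
def threshold(params)
  params[:threshold]&.to_i || 20
end

p threshold({})                   # no parameter: falls back to 20
p threshold({ threshold: "500" }) # numeric string: 500
p threshold({ threshold: "abc" }) # non-numeric string: String#to_i gives 0
```

Note that a non-numeric value becomes `0`, not the default, so the `Lttb` validation (`threshold < 2`) is what ultimately rejects it on the Ruby endpoint.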
|
412
|
-
|
413
|
-
Now, defining the routes we have:
|
414
|
-
|
415
|
-
### Main preview
|
416
|
-
|
417
|
-
```ruby
|
418
|
-
get '/' do
|
419
|
-
erb :index
|
420
|
-
end
|
421
|
-
```
|
422
|
-
|
423
|
-
And the `views/index.erb` is:
|
424
|
-
|
425
|
-
```html
|
426
|
-
<script src="https://cdn.jsdelivr.net/npm/jquery@3.6.1/dist/jquery.min.js"></script>
|
427
|
-
<script src="https://cdn.jsdelivr.net/npm/hammerjs@2.0.8"></script>
|
428
|
-
<script src="https://cdn.jsdelivr.net/npm/moment@2.29.4/moment.min.js"></script>
|
429
|
-
<script src="https://cdn.jsdelivr.net/npm/highcharts@10.2.1/highcharts.min.js"></script>
|
430
|
-
<script src="https://cdn.jsdelivr.net/npm/chartjs-adapter-moment@1.0.0/dist/chartjs-adapter-moment.min.js"></script>
|
431
|
-
<script src="https://cdn.jsdelivr.net/npm/chartkick@4.2.0/dist/chartkick.min.js"></script>
|
432
|
-
<script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-zoom@1.2.1/dist/chartjs-plugin-zoom.min.js"></script>
|
433
|
-
```
|
434
|
-
|
435
|
-
As it's a development playground, we can also show how many records are available in the scope and let the end user interactively change the threshold to check different ratios.
|
436
|
-
|
437
|
-
```html
|
438
|
-
<h3>Downsampling <%= conditions.count %> records to
|
439
|
-
<select value="<%= threshold %>" onchange="location.href=`/?threshold=${this.value}`">
|
440
|
-
<option><%= threshold %></option>
|
441
|
-
<option value="50">50</option>
|
442
|
-
<option value="100">100</option>
|
443
|
-
<option value="500">500</option>
|
444
|
-
<option value="1000">1000</option>
|
445
|
-
<option value="5000">5000</option>
|
446
|
-
</select> points.
|
447
|
-
</h3>
|
448
|
-
```
|
449
|
-
|
450
|
-
### The ruby endpoint
|
451
|
-
|
452
|
-
`/lttb_ruby` is the endpoint that returns the LTTB data processed in Ruby.
|
453
|
-
|
454
|
-
```ruby
|
455
|
-
get '/lttb_ruby' do
|
456
|
-
data = conditions.pluck(:time, :temperature)
|
457
|
-
downsampled = Lttb.downsample(data, threshold)
|
458
|
-
json [{name: "Ruby", data: downsampled }]
|
459
|
-
end
|
460
|
-
```
|
461
|
-
|
462
|
-
!!!info
|
463
|
-
|
464
|
-
Note that we're using the [pluck][20] method to fetch only an array with the data and avoid object mapping between SQL and Ruby. This is the most performant way to bring a subset of columns.
|
465
|
-
|
466
|
-
### The SQL endpoint
|
467
|
-
|
468
|
-
`/lttb_sql` is the endpoint that returns the LTTB data processed by Timescale.
|
469
|
-
|
470
|
-
```ruby
|
471
|
-
get "/lttb_sql" do
|
472
|
-
lttb_query = conditions
|
473
|
-
.select("toolkit_experimental.lttb(time, temperature,#{threshold})")
|
474
|
-
.to_sql
|
475
|
-
downsampled = Condition.select('time, value as temperature')
|
476
|
-
.from("toolkit_experimental.unnest((#{lttb_query}))")
|
477
|
-
.map{|e|[e['time'],e['temperature']]}
|
478
|
-
json [{name: "LTTB SQL", data: downsampled, time: @time_sql}]
|
479
|
-
end
|
480
|
-
```
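To make the nesting of the two queries easier to read, the sketch below assembles roughly the same SQL by hand. `lttb_sql` is a hypothetical helper, the quoting is simplified compared with what ActiveRecord generates, and the device id is a fixed literal. Real code should not interpolate user input this way.

```ruby
# Hypothetical helper that mimics, by hand, the shape of the SQL built by
# the two ActiveRecord calls above. Quoting is simplified on purpose.
def lttb_sql(device_id:, threshold:)
  points = Integer(threshold) # raises on anything that is not an integer
  inner = "SELECT toolkit_experimental.lttb(time, temperature, #{points}) " \
          "FROM conditions WHERE device_id = '#{device_id}'"
  "SELECT time, value AS temperature " \
  "FROM toolkit_experimental.unnest((#{inner}))"
end

puts lttb_sql(device_id: "weather-pro-000001", threshold: 20)
```

The inner query aggregates all rows into a single downsampled series, and the outer `unnest` expands that series back into `time` and `value` rows.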
|
481
|
-
|
482
|
-
## Benchmarking
|
483
|
-
|
484
|
-
Now that both endpoints are ready, it's easy to check the results and
|
485
|
-
understand how fast each solution runs.
|
486
|
-
|
487
|
-
![LTTB in action](https://jonatas.github.io/timescaledb/img/lttb_sql_vs_ruby.gif)
|
488
|
-
|
489
|
-
In the logs, we can see the time difference between every result:
|
490
|
-
|
491
|
-
```
|
492
|
-
"GET /lttb_sql?threshold=127 HTTP/1.1" 200 4904 0.6910
|
493
|
-
"GET /lttb_ruby?threshold=127 HTTP/1.1" 200 5501 7.0419
|
494
|
-
```
|
495
|
-
|
496
|
-
Note that the last two values of each line are the request's total bytes and the endpoint processing time.
|
497
|
-
|
498
|
-
SQL processing took `0.6910` seconds while Ruby took `7.0419` seconds, which makes Ruby about **ten times slower than SQL**.
|
499
|
-
|
500
|
-
Now, the last comparison is in the data size if we send all data to the view
|
501
|
-
to process in the front end.
|
502
|
-
|
503
|
-
```ruby
|
504
|
-
get '/all_data' do
|
505
|
-
data = conditions.pluck(:time, :temperature)
|
506
|
-
json [ { name: "All data", data: data} ]
|
507
|
-
end
|
508
|
-
```
|
509
|
-
|
510
|
-
And in the `index.erb` file, we have the data. The new line in the logs for `all_data` is:
|
511
|
-
|
512
|
-
```
|
513
|
-
"GET /all_data HTTP/1.1" 200 14739726 11.7887
|
514
|
-
```
|
515
|
-
|
516
|
-
As you can see, the last two values are the bytes and the time. `14739726` bytes is around 14MB, while the downsampled response is only about 5KB travelling from the server to the browser client. So, sending all the data consumes roughly 3000 times more bandwidth than sending the downsampled data.
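The ratios can be recomputed directly from the log lines. This is a small sketch that parses the three lines quoted in this section; the `parse` helper and the hash keys are names introduced here for illustration.

```ruby
# Log lines copied from the runs above.
LOG_LINES = {
  sql:  '"GET /lttb_sql?threshold=127 HTTP/1.1" 200 4904 0.6910',
  ruby: '"GET /lttb_ruby?threshold=127 HTTP/1.1" 200 5501 7.0419',
  all:  '"GET /all_data HTTP/1.1" 200 14739726 11.7887'
}.freeze

# The last two fields of each line are the bytes sent and the seconds spent.
def parse(line)
  *_rest, bytes, seconds = line.split
  { bytes: bytes.to_i, seconds: seconds.to_f }
end

stats = LOG_LINES.transform_values { |line| parse(line) }

slowdown  = stats[:ruby][:seconds] / stats[:sql][:seconds]
bandwidth = stats[:all][:bytes].to_f / stats[:sql][:bytes]

puts format("Ruby vs SQL time: %.1fx", slowdown)
puts format("All data vs SQL bytes: %.0fx", bandwidth)
```

These numbers come from a single run, so they indicate the order of magnitude and are not a precise benchmark.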
|
517
|
-
|
518
|
-
Downsampling in the database saves bandwidth on your server, and memory and processing in the front end. It also makes the application render faster and stay usable.
|
519
|
-
|
520
|
-
## Try it yourself!
|
521
|
-
|
522
|
-
You can still run this code from the official repository if you haven't followed the step-by-step tutorial. Check this out:
|
523
|
-
|
524
|
-
```bash
|
525
|
-
git clone https://github.com/jonatas/timescaledb.git
|
526
|
-
cd timescaledb
|
527
|
-
bundle install
|
528
|
-
cd examples/toolkit-demo/lttb
|
529
|
-
gem install sinatra sinatra-contrib chartkick
|
530
|
-
ruby lttb_sinatra.rb postgres://<user>@localhost:5432/<database_name>
|
531
|
-
```
|
532
|
-
|
533
|
-
Check out this example's [code][3] and try it on your local machine!
|
534
|
-
|
535
|
-
If you have any comments, feel free to drop a message to me at the [Timescale Community][5]. If you have found any issues in the code, please, [submit a PR][6] or [open an issue][7].
|
536
|
-
|
537
|
-
[1]: https://github.com/timescale/timescaledb-toolkit/blob/main/docs/lttb.md
|
538
|
-
[2]: https://github.com/Jubke/lttb
|
539
|
-
[3]: https://github.com/jonatas/timescaledb/blob/master/examples/toolkit-demo/lttb
|
540
|
-
[4]: https://docs.timescale.com/timescaledb/latest/tutorials/sample-datasets/#weather-datasets
|
541
|
-
[5]: https://www.timescale.com/community
|
542
|
-
[6]: https://github.com/jonatas/timescaledb/pulls
|
543
|
-
[7]: https://github.com/jonatas/timescaledb/issues
|
544
|
-
[8]: https://github.com/jonatas/timescaledb
|
545
|
-
[9]: http://pry.github.io
|
546
|
-
[10]: https://github.com/Jubke/lttb
|
547
|
-
[11]: https://chartkick.com
|
548
|
-
[12]: https://docs.ruby-lang.org/en/2.4.0/PP.html
|
549
|
-
[13]: https://docs.timescale.com/install/latest/
|
550
|
-
[14]: https://www.timescale.com/timescale-signup/
|
551
|
-
[15]: https://www.base.is/flot/
|
552
|
-
[16]: https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf
|
553
|
-
[17]: https://en.wikipedia.org/wiki/Shoelace_formula#Triangle_formula
|
554
|
-
[18]: https://en.wikipedia.org/wiki/Unix_time
|
555
|
-
[19]: http://sinatrarb.com
|
556
|
-
[20]: https://apidock.com/rails/ActiveRecord/Calculations/pluck
|
557
|
-
|