timescaledb 0.2.6 → 0.2.7
- checksums.yaml +4 -4
- data/lib/timescaledb/acts_as_hypertable/core.rb +1 -1
- data/lib/timescaledb/database/quoting.rb +12 -0
- data/lib/timescaledb/database/schema_statements.rb +168 -0
- data/lib/timescaledb/database/types.rb +17 -0
- data/lib/timescaledb/database.rb +11 -0
- data/lib/timescaledb/toolkit/time_vector.rb +41 -4
- data/lib/timescaledb/version.rb +1 -1
- metadata +6 -95
- data/.github/workflows/ci.yml +0 -72
- data/.gitignore +0 -12
- data/.rspec +0 -3
- data/.ruby-version +0 -1
- data/.tool-versions +0 -1
- data/.travis.yml +0 -9
- data/CODE_OF_CONDUCT.md +0 -74
- data/Fastfile +0 -17
- data/Gemfile +0 -8
- data/Gemfile.lock +0 -75
- data/Gemfile.scenic +0 -7
- data/Gemfile.scenic.lock +0 -119
- data/README.md +0 -490
- data/Rakefile +0 -21
- data/bin/console +0 -28
- data/bin/setup +0 -13
- data/docs/command_line.md +0 -178
- data/docs/img/lttb_example.png +0 -0
- data/docs/img/lttb_sql_vs_ruby.gif +0 -0
- data/docs/img/lttb_zoom.gif +0 -0
- data/docs/index.md +0 -72
- data/docs/migrations.md +0 -76
- data/docs/models.md +0 -78
- data/docs/toolkit.md +0 -507
- data/docs/toolkit_lttb_tutorial.md +0 -557
- data/docs/toolkit_lttb_zoom.md +0 -357
- data/docs/toolkit_ohlc.md +0 -315
- data/docs/videos.md +0 -16
- data/examples/all_in_one/all_in_one.rb +0 -94
- data/examples/all_in_one/benchmark_comparison.rb +0 -108
- data/examples/all_in_one/caggs.rb +0 -93
- data/examples/all_in_one/query_data.rb +0 -78
- data/examples/ranking/.gitattributes +0 -7
- data/examples/ranking/.gitignore +0 -29
- data/examples/ranking/.ruby-version +0 -1
- data/examples/ranking/Gemfile +0 -33
- data/examples/ranking/Gemfile.lock +0 -189
- data/examples/ranking/README.md +0 -166
- data/examples/ranking/Rakefile +0 -6
- data/examples/ranking/app/controllers/application_controller.rb +0 -2
- data/examples/ranking/app/controllers/concerns/.keep +0 -0
- data/examples/ranking/app/jobs/application_job.rb +0 -7
- data/examples/ranking/app/models/application_record.rb +0 -3
- data/examples/ranking/app/models/concerns/.keep +0 -0
- data/examples/ranking/app/models/game.rb +0 -2
- data/examples/ranking/app/models/play.rb +0 -7
- data/examples/ranking/bin/bundle +0 -114
- data/examples/ranking/bin/rails +0 -4
- data/examples/ranking/bin/rake +0 -4
- data/examples/ranking/bin/setup +0 -33
- data/examples/ranking/config/application.rb +0 -39
- data/examples/ranking/config/boot.rb +0 -4
- data/examples/ranking/config/credentials.yml.enc +0 -1
- data/examples/ranking/config/database.yml +0 -86
- data/examples/ranking/config/environment.rb +0 -5
- data/examples/ranking/config/environments/development.rb +0 -60
- data/examples/ranking/config/environments/production.rb +0 -75
- data/examples/ranking/config/environments/test.rb +0 -53
- data/examples/ranking/config/initializers/cors.rb +0 -16
- data/examples/ranking/config/initializers/filter_parameter_logging.rb +0 -8
- data/examples/ranking/config/initializers/inflections.rb +0 -16
- data/examples/ranking/config/initializers/timescale.rb +0 -2
- data/examples/ranking/config/locales/en.yml +0 -33
- data/examples/ranking/config/puma.rb +0 -43
- data/examples/ranking/config/routes.rb +0 -6
- data/examples/ranking/config/storage.yml +0 -34
- data/examples/ranking/config.ru +0 -6
- data/examples/ranking/db/migrate/20220209120747_create_games.rb +0 -10
- data/examples/ranking/db/migrate/20220209120910_create_plays.rb +0 -19
- data/examples/ranking/db/migrate/20220209143347_create_score_per_hours.rb +0 -5
- data/examples/ranking/db/schema.rb +0 -47
- data/examples/ranking/db/seeds.rb +0 -7
- data/examples/ranking/db/views/score_per_hours_v01.sql +0 -7
- data/examples/ranking/lib/tasks/.keep +0 -0
- data/examples/ranking/log/.keep +0 -0
- data/examples/ranking/public/robots.txt +0 -1
- data/examples/ranking/storage/.keep +0 -0
- data/examples/ranking/tmp/.keep +0 -0
- data/examples/ranking/tmp/pids/.keep +0 -0
- data/examples/ranking/tmp/storage/.keep +0 -0
- data/examples/ranking/vendor/.keep +0 -0
- data/examples/toolkit-demo/compare_volatility.rb +0 -104
- data/examples/toolkit-demo/lttb/README.md +0 -15
- data/examples/toolkit-demo/lttb/lttb.rb +0 -92
- data/examples/toolkit-demo/lttb/lttb_sinatra.rb +0 -139
- data/examples/toolkit-demo/lttb/lttb_test.rb +0 -21
- data/examples/toolkit-demo/lttb/views/index.erb +0 -27
- data/examples/toolkit-demo/lttb-zoom/README.md +0 -13
- data/examples/toolkit-demo/lttb-zoom/lttb_zoomable.rb +0 -90
- data/examples/toolkit-demo/lttb-zoom/views/index.erb +0 -33
- data/examples/toolkit-demo/ohlc.rb +0 -175
- data/mkdocs.yml +0 -34
- data/timescaledb.gemspec +0 -40
data/docs/toolkit.md
DELETED
@@ -1,507 +0,0 @@
# The TimescaleDB Toolkit

The [TimescaleDB Toolkit][1] is an extension brought by [Timescale][2] for more
hyperfunctions, fully compatible with TimescaleDB and PostgreSQL.

They have almost no dependency on hypertables, but they play very well in the
hypertable ecosystem. The mission of the toolkit team is to ease all things
analytics when using TimescaleDB, with a particular focus on developer
ergonomics and performance.

Here, we're going to have a small walkthrough of some of the toolkit functions
and the helpers that can simplify the generation of some complex queries.

!!!warning

    Note that we're just starting the toolkit integration in the gem and several
    functions are still experimental.

## The `add_toolkit_to_search_path!` helper

Several toolkit functions are still in the experimental phase, and for that
reason they don't live in the `public` schema but in the `toolkit_experimental`
schema.

To use them without worrying about the schema or prefixing it everywhere,
you can introduce the schema as part of the [search_path][3].

To make this easy on the Ruby side, you can call the method directly from the
ActiveRecord connection:

```ruby
ActiveRecord::Base.connection.add_toolkit_to_search_path!
```

This statement adds the [toolkit_experimental][4] schema to the search
path alongside the `public` schema and the `$user` variable path.
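
A quick way to confirm the change is to inspect the session's `search_path`
(a hypothetical console session; the exact ordering of the output may differ):

```ruby
ActiveRecord::Base.connection.add_toolkit_to_search_path!
# Ask PostgreSQL for the effective search path of this session
ActiveRecord::Base.connection.select_value("SHOW search_path")
# => "\"$user\", public, toolkit_experimental"
```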

The statement can be placed right before your usage of the toolkit. For example,
if a single controller in your Rails app will be using it, you can create a
[filter][5] in the controller to set it up before your action runs.

```ruby
class StatisticsController < ActionController::Base
  before_action :add_timescale_toolkit, only: [:complex_query]

  def complex_query
    # some code that uses the toolkit functions
  end

  protected

  def add_timescale_toolkit
    ActiveRecord::Base.connection.add_toolkit_to_search_path!
  end
end
```

## Example from scratch to use the Toolkit functions

Let's start by working on an example around the [volatility][6] algorithm.
This example is inspired by the [function pipelines][7] blog post, which shows
how to calculate volatility in plain SQL and then does the same with the
toolkit's function pipelines.

!!!success

    Reading the [blog post][7] before trying this is highly recommended,
    and will give you more insights on how to apply and use time vectors,
    which are our next topic.


Let's start by creating the `measurements` hypertable using a regular migration:

```ruby
class CreateMeasurements < ActiveRecord::Migration[7.0]
  def change
    hypertable_options = {
      time_column: 'ts',
      chunk_time_interval: '1 day',
    }
    create_table :measurements, hypertable: hypertable_options, id: false do |t|
      t.integer :device_id
      t.decimal :val
      t.timestamp :ts
    end
  end
end
```

In this example, we just have a hypertable with no compression options: every
`1 day` a new child table, a.k.a. [chunk][8], will be generated.

Now, let's add the model `app/models/measurement.rb`:

```ruby
class Measurement < ActiveRecord::Base
  self.primary_key = nil

  acts_as_hypertable time_column: "ts"
end
```

At this moment, you can jump into the Rails console and start testing the model.

## Seeding some data

Before we build a very complex example, let's build something that is easy to
follow and comprehend. Let's create 3 records for the same device, representing
an hourly measurement from some sensor.

```ruby
yesterday = 1.day.ago
[1, 2, 3].each_with_index do |v, i|
  Measurement.create(device_id: 1, ts: yesterday + i.hour, val: v)
end
```

The values are a simple progression from 1 to 3. Now we can build a query to
fetch them and work through the example in plain Ruby.

```ruby
values = Measurement.order(:ts).pluck(:val) # => [1, 2, 3]
```

Using plain Ruby, we can build this example with a few lines of code:

```ruby
previous = nil
volatilities = values.map do |value|
  if previous
    delta = (value - previous).abs
    volatility = delta
  end
  previous = value
  volatility
end
# volatilities => [nil, 1, 1]
volatility = volatilities.compact.sum # => 2
```

The `compact` step can be skipped, and we can also build the sum in the same
loop. So, a refactored version would be:

```ruby
previous = nil
volatility = 0
values.each do |value|
  if previous
    delta = (value - previous).abs
    volatility += delta
  end
  previous = value
end
volatility # => 2
```
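
For reference, the same computation can be written more idiomatically with
`each_cons`, which yields each consecutive pair of values (a sketch, equivalent
to the loop above):

```ruby
# Sum of absolute differences between consecutive values
volatility = values.each_cons(2).sum { |previous, current| (current - previous).abs }
# => 2 for values [1, 2, 3]
```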

Now it's time to move this to the database level, calculating the volatility
in plain PostgreSQL. A subquery is required to build the calculated delta, so
it looks a bit more convoluted:

```ruby
delta = Measurement.select("device_id, abs(val - lag(val) OVER (PARTITION BY device_id ORDER BY ts)) as abs_delta")
Measurement
  .select("device_id, sum(abs_delta) as volatility")
  .from("(#{delta.to_sql}) as calc_delta")
  .group('device_id')
```

The final query for the example above looks like this:

```sql
SELECT device_id, SUM(abs_delta) AS volatility
FROM (
  SELECT device_id,
    ABS(
      val - LAG(val) OVER (
        PARTITION BY device_id ORDER BY ts)
    ) AS abs_delta
  FROM "measurements"
) AS calc_delta
GROUP BY device_id
```

Going with plain SQL makes the example much harder to understand. Now
let's reproduce the same example using the toolkit pipelines:

```ruby
Measurement
  .select(<<-SQL).group("device_id")
    device_id,
    timevector(ts, val)
      -> sort()
      -> delta()
      -> abs()
      -> sum() as volatility
  SQL
```

As you can see, this version is much easier to read and digest. Now, let's take
a look at how we can generate the queries using the scopes injected by the
`acts_as_time_vector` macro.


## Adding the `acts_as_time_vector` macro

Let's start by changing the model to add the `acts_as_time_vector` macro, which
allows us to stop repeating the parameters of the `timevector(ts, val)` call.

```ruby
class Measurement < ActiveRecord::Base
  self.primary_key = nil

  acts_as_hypertable time_column: "ts"

  acts_as_time_vector segment_by: "device_id",
                      value_column: "val",
                      time_column: "ts"
end
```

If you skip the `time_column` option in `acts_as_time_vector`, it will inherit
the value from `acts_as_hypertable`. It's made explicit here for the sake of
keeping the macros independent.
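
In other words, the following shorter declaration should behave the same,
picking up `ts` from the hypertable configuration (a sketch based on the
inheritance rule described above):

```ruby
class Measurement < ActiveRecord::Base
  self.primary_key = nil

  acts_as_hypertable time_column: "ts"

  # time_column is inherited from acts_as_hypertable
  acts_as_time_vector segment_by: "device_id", value_column: "val"
end
```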

Now that we have it, let's create a scope using it:

```ruby
class Measurement < ActiveRecord::Base
  acts_as_hypertable time_column: "ts"
  acts_as_time_vector segment_by: "device_id",
                      value_column: "val",
                      time_column: "ts"

  scope :volatility, -> do
    select(<<-SQL).group("device_id")
      device_id,
      timevector(#{time_column}, #{value_column})
        -> sort()
        -> delta()
        -> abs()
        -> sum() as volatility
    SQL
  end
end
```

We have now created the volatility scope, always grouping by `device_id`.

The Toolkit helpers ship a similar version, which also contains a default
segmentation based on the `segment_by` configuration passed to the
`acts_as_time_vector` macro. A `segment_by_column` method is added to access
this configuration, so a small change makes the volatility scope fully
configurable:

```ruby
class Measurement < ActiveRecord::Base
  # ... Skipping previous code to focus on the example

  acts_as_time_vector segment_by: "device_id",
                      value_column: "val",
                      time_column: "ts"

  scope :volatility, -> (columns = segment_by_column) do
    _scope = select([*columns,
      "timevector(#{time_column}, #{value_column})
        -> sort()
        -> delta()
        -> abs()
        -> sum() as volatility"
    ].join(", "))
    _scope = _scope.group(columns) if columns
    _scope
  end
end
```

Testing the method:

```ruby
Measurement.volatility.map(&:attributes)
# DEBUG -- : Measurement Load (1.6ms)  SELECT device_id, timevector(ts, val) -> sort() -> delta() -> abs() -> sum() as volatility FROM "measurements" GROUP BY "measurements"."device_id"
# => [{"device_id"=>1, "volatility"=>8.0}]
```

Let's add a few more records with random values:

```ruby
yesterday = 1.day.ago
(2..6).each do |d|
  (1..10).each do |j|
    Measurement.create(device_id: d, ts: yesterday + j.hour, val: rand(10))
  end
end
```

Testing all the values:

```ruby
Measurement.order("device_id").volatility.map(&:attributes)
# DEBUG -- : Measurement Load (1.3ms)  SELECT device_id, timevector(ts, val) -> sort() -> delta() -> abs() -> sum() as volatility FROM "measurements" GROUP BY "measurements"."device_id" ORDER BY device_id
# => [{"device_id"=>1, "volatility"=>8.0},
#     {"device_id"=>2, "volatility"=>24.0},
#     {"device_id"=>3, "volatility"=>30.0},
#     {"device_id"=>4, "volatility"=>32.0},
#     {"device_id"=>5, "volatility"=>44.0},
#     {"device_id"=>6, "volatility"=>23.0}]
```

If the parameter is explicitly `nil`, it will not group:

```ruby
Measurement.volatility(nil).map(&:attributes)
# DEBUG -- : Measurement Load (5.4ms)  SELECT timevector(ts, val) -> sort() -> delta() -> abs() -> sum() as volatility FROM "measurements"
# => [{"volatility"=>186.0, "device_id"=>nil}]
```
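
Since `volatility` is a regular ActiveRecord scope, it also composes with other
relation methods. For example, restricting it to a single device (hypothetical
output, matching the data above):

```ruby
# Chain with where to compute volatility for one device only
Measurement.where(device_id: 1).volatility.map(&:attributes)
# => [{"device_id"=>1, "volatility"=>8.0}]
```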

## Comparing with the Ruby version

Now it's time to benchmark and compare the Ruby and PostgreSQL solutions,
verifying which one is faster:

```ruby
class Measurement < ActiveRecord::Base
  # code you already know
  scope :volatility_by_device_id, -> {
    volatility = Hash.new(0)
    previous = Hash.new
    find_all do |measurement|
      device_id = measurement.device_id
      if previous[device_id]
        delta = (measurement.val - previous[device_id]).abs
        volatility[device_id] += delta
      end
      previous[device_id] = measurement.val
    end
    volatility
  }
end
```

Now, benchmarking the real time in milliseconds to compute it in Ruby:

```ruby
Benchmark.measure { Measurement.volatility_by_device_id }.real * 1000
# => 3.021999917924404
```

## Seeding massive data

Now let's use `generate_series` to quickly insert a lot of records directly
into the database.

Let's just agree on some numbers to have a good start: data for 6 devices
emitting values every 5 minutes for a month, which yields around 53k records
(8,929 timestamps × 6 devices).

Let's use some plain SQL to insert the records now:

```ruby
sql = "INSERT INTO measurements (ts, device_id, val)
  SELECT ts, device_id, random() * 80
  FROM generate_series(TIMESTAMP '2022-01-01 00:00:00',
                       TIMESTAMP '2022-02-01 00:00:00',
                       INTERVAL '5 minutes') AS g1(ts),
       generate_series(0, 5) AS g2(device_id);
"
ActiveRecord::Base.connection.execute(sql)
```

On my Mac M1 it took less than a second to insert the ~53k records:

```ruby
# DEBUG (177.5ms)  INSERT INTO measurements (ts, device_id, val) ..
# => #<PG::Result:0x00007f8152034168 status=PGRES_COMMAND_OK ntuples=0 nfields=0 cmd_tuples=53574>
```

Now, let's compare the time to process the volatility:

```ruby
Benchmark.bm do |x|
  x.report("ruby") { pp Measurement.volatility_by_device_id }
  x.report("sql")  { pp Measurement.volatility("device_id").map(&:attributes) }
end
#           user     system      total        real
# ruby  0.612439   0.061890   0.674329 (  0.727590)
# sql   0.001142   0.000301   0.001443 (  0.060301)
```

Calculating the performance ratio, `0.72 / 0.06` means SQL is 12 times faster
than Ruby at processing the volatility 🎉

Bear in mind this was all on localhost, so no records had to travel over the
wire. Moving to a remote host, look at the numbers:

!!!warning

    Note that the previous numbers were measured against localhost.
    Using a remote connection between different regions, Ruby comes out
    roughly 40 times slower than SQL.

              user     system      total        real
    ruby  0.716321   0.041640   0.757961 (  6.388881)
    sql   0.001156   0.000177   0.001333 (  0.161270)

Let's recap what's time-consuming here. `find_all` fetches every row and
converts it into a full ActiveRecord model, which carries thousands of methods,
and that's where most of the time goes.

That's very comfortable, but we only need the raw attributes for this
calculation.

Let's optimize it by plucking an array of values grouped by device:

```ruby
class Measurement < ActiveRecord::Base
  # ...
  scope :values_from_devices, -> {
    ordered_values = select(:val, :device_id).order(:ts)
    Hash[
      from(ordered_values)
        .group(:device_id)
        .pluck("device_id, array_agg(val)")
    ]
  }
end
```
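
The result is a plain hash mapping each `device_id` to its time-ordered values,
something along these lines (hypothetical output for the seeded data):

```ruby
Measurement.values_from_devices
# => {0 => [12.3, 70.1, ...], 1 => [44.9, 8.2, ...], ...}
```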

Now, let's create a class that processes the volatility from those plain
values:

```ruby
class Volatility
  def self.process(values)
    previous = nil
    deltas = values.map do |value|
      if previous
        delta = (value - previous).abs
        volatility = delta
      end
      previous = value
      volatility
    end
    # deltas => [nil, 1, 1]
    deltas.shift
    volatility = deltas.sum
  end

  def self.process_values(map)
    map.transform_values(&method(:process))
  end
end
```
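
A quick sanity check with the toy series from the beginning of this walkthrough
(a hypothetical console session):

```ruby
Volatility.process([1, 2, 3])               # => 2
Volatility.process_values({1 => [1, 2, 3]}) # => {1=>2}
```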

Now, let's change the benchmark to expose the time spent fetching versus
processing:

```ruby
volatilities = nil

ActiveRecord::Base.logger = nil
Benchmark.bm do |x|
  x.report("ruby")    { Measurement.volatility_by_device_id }
  x.report("sql")     { Measurement.volatility.map(&:attributes) }
  x.report("fetch")   { volatilities = Measurement.values_from_devices }
  x.report("process") { Volatility.process_values(volatilities) }
end
```

Checking the results:

                user     system      total        real
    ruby    0.683654   0.036558   0.720212 (  0.743942)
    sql     0.000876   0.000096   0.000972 (  0.054234)
    fetch   0.078045   0.003221   0.081266 (  0.116693)
    process 0.067643   0.006473   0.074116 (  0.074122)

Much better: fetching (117 ms) plus processing (74 ms) adds up to roughly
190 ms of real time, a quarter of the original 744 ms Ruby version, though
still a few times slower than the 54 ms pure SQL query.


If we break down the SQL part a bit more with `EXPLAIN ANALYSE`, we can see how
much of that time is the database itself:

```sql
EXPLAIN ANALYSE
SELECT device_id, array_agg(val)
FROM (
  SELECT val, device_id
  FROM measurements
  ORDER BY ts ASC
) subquery
GROUP BY device_id;
```

We can check the execution time to make clear how much time is necessary just
for the query itself, isolating the network and the ActiveRecord layer:

    Planning Time: 17.761 ms
    Execution Time: 36.302 ms

So, of the **116ms** needed to fetch the data, only **54ms** was spent in the
database, and the remaining **62ms** was consumed by network + ORM.

[1]: https://github.com/timescale/timescaledb-toolkit
[2]: https://timescale.com
[3]: https://www.postgresql.org/docs/14/runtime-config-client.html#GUC-SEARCH-PATH
[4]: https://github.com/timescale/timescaledb-toolkit/blob/main/docs/README.md#a-note-on-tags-
[5]: https://guides.rubyonrails.org/action_controller_overview.html#filters
[6]: https://en.wikipedia.org/wiki/Volatility_(finance)
[7]: https://www.timescale.com/blog/function-pipelines-building-functional-programming-into-postgresql-using-custom-operators/
[8]: https://docs.timescale.com/timescaledb/latest/overview/core-concepts/hypertables-and-chunks/#partitioning-in-hypertables-with-chunks