timescaledb 0.2.6 → 0.2.7

Files changed (102)
  1. checksums.yaml +4 -4
  2. data/lib/timescaledb/acts_as_hypertable/core.rb +1 -1
  3. data/lib/timescaledb/database/quoting.rb +12 -0
  4. data/lib/timescaledb/database/schema_statements.rb +168 -0
  5. data/lib/timescaledb/database/types.rb +17 -0
  6. data/lib/timescaledb/database.rb +11 -0
  7. data/lib/timescaledb/toolkit/time_vector.rb +41 -4
  8. data/lib/timescaledb/version.rb +1 -1
  9. metadata +6 -95
  10. data/.github/workflows/ci.yml +0 -72
  11. data/.gitignore +0 -12
  12. data/.rspec +0 -3
  13. data/.ruby-version +0 -1
  14. data/.tool-versions +0 -1
  15. data/.travis.yml +0 -9
  16. data/CODE_OF_CONDUCT.md +0 -74
  17. data/Fastfile +0 -17
  18. data/Gemfile +0 -8
  19. data/Gemfile.lock +0 -75
  20. data/Gemfile.scenic +0 -7
  21. data/Gemfile.scenic.lock +0 -119
  22. data/README.md +0 -490
  23. data/Rakefile +0 -21
  24. data/bin/console +0 -28
  25. data/bin/setup +0 -13
  26. data/docs/command_line.md +0 -178
  27. data/docs/img/lttb_example.png +0 -0
  28. data/docs/img/lttb_sql_vs_ruby.gif +0 -0
  29. data/docs/img/lttb_zoom.gif +0 -0
  30. data/docs/index.md +0 -72
  31. data/docs/migrations.md +0 -76
  32. data/docs/models.md +0 -78
  33. data/docs/toolkit.md +0 -507
  34. data/docs/toolkit_lttb_tutorial.md +0 -557
  35. data/docs/toolkit_lttb_zoom.md +0 -357
  36. data/docs/toolkit_ohlc.md +0 -315
  37. data/docs/videos.md +0 -16
  38. data/examples/all_in_one/all_in_one.rb +0 -94
  39. data/examples/all_in_one/benchmark_comparison.rb +0 -108
  40. data/examples/all_in_one/caggs.rb +0 -93
  41. data/examples/all_in_one/query_data.rb +0 -78
  42. data/examples/ranking/.gitattributes +0 -7
  43. data/examples/ranking/.gitignore +0 -29
  44. data/examples/ranking/.ruby-version +0 -1
  45. data/examples/ranking/Gemfile +0 -33
  46. data/examples/ranking/Gemfile.lock +0 -189
  47. data/examples/ranking/README.md +0 -166
  48. data/examples/ranking/Rakefile +0 -6
  49. data/examples/ranking/app/controllers/application_controller.rb +0 -2
  50. data/examples/ranking/app/controllers/concerns/.keep +0 -0
  51. data/examples/ranking/app/jobs/application_job.rb +0 -7
  52. data/examples/ranking/app/models/application_record.rb +0 -3
  53. data/examples/ranking/app/models/concerns/.keep +0 -0
  54. data/examples/ranking/app/models/game.rb +0 -2
  55. data/examples/ranking/app/models/play.rb +0 -7
  56. data/examples/ranking/bin/bundle +0 -114
  57. data/examples/ranking/bin/rails +0 -4
  58. data/examples/ranking/bin/rake +0 -4
  59. data/examples/ranking/bin/setup +0 -33
  60. data/examples/ranking/config/application.rb +0 -39
  61. data/examples/ranking/config/boot.rb +0 -4
  62. data/examples/ranking/config/credentials.yml.enc +0 -1
  63. data/examples/ranking/config/database.yml +0 -86
  64. data/examples/ranking/config/environment.rb +0 -5
  65. data/examples/ranking/config/environments/development.rb +0 -60
  66. data/examples/ranking/config/environments/production.rb +0 -75
  67. data/examples/ranking/config/environments/test.rb +0 -53
  68. data/examples/ranking/config/initializers/cors.rb +0 -16
  69. data/examples/ranking/config/initializers/filter_parameter_logging.rb +0 -8
  70. data/examples/ranking/config/initializers/inflections.rb +0 -16
  71. data/examples/ranking/config/initializers/timescale.rb +0 -2
  72. data/examples/ranking/config/locales/en.yml +0 -33
  73. data/examples/ranking/config/puma.rb +0 -43
  74. data/examples/ranking/config/routes.rb +0 -6
  75. data/examples/ranking/config/storage.yml +0 -34
  76. data/examples/ranking/config.ru +0 -6
  77. data/examples/ranking/db/migrate/20220209120747_create_games.rb +0 -10
  78. data/examples/ranking/db/migrate/20220209120910_create_plays.rb +0 -19
  79. data/examples/ranking/db/migrate/20220209143347_create_score_per_hours.rb +0 -5
  80. data/examples/ranking/db/schema.rb +0 -47
  81. data/examples/ranking/db/seeds.rb +0 -7
  82. data/examples/ranking/db/views/score_per_hours_v01.sql +0 -7
  83. data/examples/ranking/lib/tasks/.keep +0 -0
  84. data/examples/ranking/log/.keep +0 -0
  85. data/examples/ranking/public/robots.txt +0 -1
  86. data/examples/ranking/storage/.keep +0 -0
  87. data/examples/ranking/tmp/.keep +0 -0
  88. data/examples/ranking/tmp/pids/.keep +0 -0
  89. data/examples/ranking/tmp/storage/.keep +0 -0
  90. data/examples/ranking/vendor/.keep +0 -0
  91. data/examples/toolkit-demo/compare_volatility.rb +0 -104
  92. data/examples/toolkit-demo/lttb/README.md +0 -15
  93. data/examples/toolkit-demo/lttb/lttb.rb +0 -92
  94. data/examples/toolkit-demo/lttb/lttb_sinatra.rb +0 -139
  95. data/examples/toolkit-demo/lttb/lttb_test.rb +0 -21
  96. data/examples/toolkit-demo/lttb/views/index.erb +0 -27
  97. data/examples/toolkit-demo/lttb-zoom/README.md +0 -13
  98. data/examples/toolkit-demo/lttb-zoom/lttb_zoomable.rb +0 -90
  99. data/examples/toolkit-demo/lttb-zoom/views/index.erb +0 -33
  100. data/examples/toolkit-demo/ohlc.rb +0 -175
  101. data/mkdocs.yml +0 -34
  102. data/timescaledb.gemspec +0 -40
data/docs/toolkit.md DELETED
@@ -1,507 +0,0 @@

# The TimescaleDB Toolkit

The [TimescaleDB Toolkit][1] is an extension brought by [Timescale][2] for more
hyperfunctions, fully compatible with TimescaleDB and PostgreSQL.

They have almost no dependency on hypertables, but they play very well in the
hypertables ecosystem. The mission of the toolkit team is to ease all things
analytics when using TimescaleDB, with a particular focus on developer
ergonomics and performance.

Here, we're going to walk through some of the toolkit functions and the
helpers that can simplify the generation of some complex queries.

!!!warning

    Note that we're just starting the toolkit integration in the gem and several
    functions are still experimental.

## The `add_toolkit_to_search_path!` helper

Several functions in the toolkit are still in the experimental phase, and for
that reason they're not in the public schema but live in the
`toolkit_experimental` schema.

To use them without worrying about the schema, or prefixing it everywhere,
you can introduce the schema as part of the [search_path][3].

To make it easy on the Ruby side, you can call the method directly from the
ActiveRecord connection:

```ruby
ActiveRecord::Base.connection.add_toolkit_to_search_path!
```

This statement adds the [toolkit_experimental][4] schema to the search path,
alongside the `public` schema and the `$user` variable path.

The statement can be placed right before your usage of the toolkit. For example,
if a single controller in your Rails app will be using it, you can create a
[filter][5] in the controller to set it up before your action runs:

```ruby
class StatisticsController < ActionController::Base
  before_action :add_timescale_toolkit, only: [:complex_query]

  def complex_query
    # some code that uses the toolkit functions
  end

  protected

  def add_timescale_toolkit
    ActiveRecord::Base.connection.add_toolkit_to_search_path!
  end
end
```

## Example from scratch using the Toolkit functions

Let's work through an example based on the [volatility][6] algorithm.
This example is inspired by the [function pipelines][7] blog post, which shows
how to calculate volatility and then how to do the same with the toolkit's
function pipelines.

!!!success

    Reading the [blog post][7] before trying this is highly recommended,
    and will give you more insight into how to apply and use time vectors,
    which are our next topic.

Let's start by creating the `measurements` hypertable using a regular migration:

```ruby
class CreateMeasurements < ActiveRecord::Migration
  def change
    hypertable_options = {
      time_column: 'ts',
      chunk_time_interval: '1 day',
    }
    create_table :measurements, hypertable: hypertable_options, id: false do |t|
      t.integer :device_id
      t.decimal :val
      t.timestamp :ts
    end
  end
end
```

In this example, we have a hypertable with no compression options. Every
`1 day`, a new child table, aka [chunk][8], will be generated.

Now, let's add the model `app/models/measurement.rb`:

```ruby
class Measurement < ActiveRecord::Base
  self.primary_key = nil

  acts_as_hypertable time_column: "ts"
end
```

At this point, you can jump into the Rails console and start testing the model.

## Seeding some data

Before we build a very complex example, let's build something that is easy to
follow and comprehend. Let's create 3 records for the same device, representing
an hourly measurement from some sensor:

```ruby
yesterday = 1.day.ago
[1, 2, 3].each_with_index do |v, i|
  Measurement.create(device_id: 1, ts: yesterday + i.hour, val: v)
end
```

The values are a progression from 1 to 3. Now we can query the values and
build the example using plain Ruby:

```ruby
values = Measurement.order(:ts).pluck(:val) # => [1, 2, 3]
```

Using plain Ruby, we can compute the volatility in a few lines of code:

```ruby
previous = nil
volatilities = values.map do |value|
  if previous
    delta = (value - previous).abs
    volatility = delta
  end
  previous = value
  volatility
end
# volatilities => [nil, 1, 1]
volatility = volatilities.compact.sum # => 2
```

The `compact` can be skipped if we also build the sum in the same loop. So, a
refactored version would be:

```ruby
previous = nil
volatility = 0
values.each do |value|
  if previous
    delta = (value - previous).abs
    volatility += delta
  end
  previous = value
end
volatility # => 2
```

Now it's time to move the calculation to the database, computing the volatility
with plain PostgreSQL. A subquery is required to build the calculated delta, so
it looks a bit more confusing:

```ruby
delta = Measurement.select("device_id, abs(val - lag(val) OVER (PARTITION BY device_id ORDER BY ts)) as abs_delta")
Measurement
  .select("device_id, sum(abs_delta) as volatility")
  .from("(#{delta.to_sql}) as calc_delta")
  .group('device_id')
```

The final query for the example above looks like this:

```sql
SELECT device_id, SUM(abs_delta) AS volatility
FROM (
  SELECT device_id,
         ABS(
           val - LAG(val) OVER (
             PARTITION BY device_id ORDER BY ts)
         ) AS abs_delta
  FROM "measurements"
) AS calc_delta
GROUP BY device_id
```

The example is much harder to understand in plain SQL. Now let's reproduce it
using the toolkit pipelines:

```ruby
Measurement
  .select(<<-SQL).group("device_id")
    device_id,
    timevector(ts, val)
      -> sort()
      -> delta()
      -> abs()
      -> sum() as volatility
  SQL
```
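
To build intuition for what each pipeline stage computes, here is a plain-Ruby
sketch of the same pipeline, with each stage modeled as a lambda. The names and
the `[ts, val]` pair representation are illustrative, not the gem's or the
toolkit's API:

```ruby
# Each pipeline stage as a lambda; names are illustrative, not the toolkit's API.
sort  = ->(pairs) { pairs.sort_by(&:first).map(&:last) }   # sort(): order by ts, keep values
delta = ->(vals)  { vals.each_cons(2).map { |a, b| b - a } } # delta(): pairwise differences
abs   = ->(vals)  { vals.map(&:abs) }                        # abs()
sum   = ->(vals)  { vals.sum }                               # sum()

series = [[2, 2.0], [1, 1.0], [3, 3.0]] # [ts, val] pairs, out of order
result = [sort, delta, abs, sum].reduce(series) { |acc, stage| stage.call(acc) }
# result => 2.0
```

This mirrors the `timevector(ts, val) -> sort() -> delta() -> abs() -> sum()`
chain: each operator feeds its output into the next.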

As you can see, it's much easier to read and digest. Now let's look at how we
can generate these queries using the scopes injected by the
`acts_as_time_vector` macro.

## Adding the `acts_as_time_vector` macro

Let's start by changing the model to add `acts_as_time_vector`, which saves us
from repeating the parameters of the `timevector(ts, val)` call:

```ruby
class Measurement < ActiveRecord::Base
  self.primary_key = nil

  acts_as_hypertable time_column: "ts"

  acts_as_time_vector segment_by: "device_id",
                      value_column: "val",
                      time_column: "ts"
end
```

If you skip the `time_column` option in `acts_as_time_vector`, it will inherit
the value from `acts_as_hypertable`. I'm making it explicit here for the sake
of keeping the macros independent.

Now that we have it, let's create a scope that uses it:

```ruby
class Measurement < ActiveRecord::Base
  acts_as_hypertable time_column: "ts"
  acts_as_time_vector segment_by: "device_id",
                      value_column: "val",
                      time_column: "ts"

  scope :volatility, -> do
    select(<<-SQL).group("device_id")
      device_id,
      timevector(#{time_column}, #{value_column})
        -> sort()
        -> delta()
        -> abs()
        -> sum() as volatility
    SQL
  end
end
```

Now we have a volatility scope, which always groups by `device_id`.

The Toolkit helpers ship a similar version, with a default segmentation based
on the `segment_by` configuration from the `acts_as_time_vector` macro. A
`segment_by_column` method is added to access this configuration, so with a
small change you can fully understand the volatility helper:

```ruby
class Measurement < ActiveRecord::Base
  # ... Skipping previous code to focus on the example

  acts_as_time_vector segment_by: "device_id",
                      value_column: "val",
                      time_column: "ts"

  scope :volatility, -> (columns = segment_by_column) do
    _scope = select([*columns,
      "timevector(#{time_column}, #{value_column})
        -> sort()
        -> delta()
        -> abs()
        -> sum() as volatility"
    ].join(", "))
    _scope = _scope.group(columns) if columns
    _scope
  end
end
```
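
The `[*columns, ...]` splat is what makes the grouping optional: splatting `nil`
yields an empty array, so no extra column is prepended. A minimal sketch (the
`build` lambda is illustrative, not part of the gem):

```ruby
# The splat drops nil and wraps a single value in an array (build is illustrative).
build = ->(columns) { [*columns, "timevector(ts, val) -> sum() as volatility"].join(", ") }

with_segment = build.call("device_id")
# => "device_id, timevector(ts, val) -> sum() as volatility"
without_segment = build.call(nil)
# => "timevector(ts, val) -> sum() as volatility"
```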

Testing the method:

```ruby
Measurement.volatility.map(&:attributes)
# DEBUG -- : Measurement Load (1.6ms)  SELECT device_id, timevector(ts, val) -> sort() -> delta() -> abs() -> sum() as volatility FROM "measurements" GROUP BY "measurements"."device_id"
# => [{"device_id"=>1, "volatility"=>8.0}]
```

Let's add a few more records with random values:

```ruby
yesterday = 1.day.ago
(2..6).each do |d|
  (1..10).each do |j|
    Measurement.create(device_id: d, ts: yesterday + j.hour, val: rand(10))
  end
end
```

Testing with all the values:

```ruby
Measurement.order("device_id").volatility.map(&:attributes)
# DEBUG -- : Measurement Load (1.3ms)  SELECT device_id, timevector(ts, val) -> sort() -> delta() -> abs() -> sum() as volatility FROM "measurements" GROUP BY "measurements"."device_id" ORDER BY device_id
# => [{"device_id"=>1, "volatility"=>8.0},
#     {"device_id"=>2, "volatility"=>24.0},
#     {"device_id"=>3, "volatility"=>30.0},
#     {"device_id"=>4, "volatility"=>32.0},
#     {"device_id"=>5, "volatility"=>44.0},
#     {"device_id"=>6, "volatility"=>23.0}]
```

If the parameter is explicitly `nil`, it will not group:

```ruby
Measurement.volatility(nil).map(&:attributes)
# DEBUG -- : Measurement Load (5.4ms)  SELECT timevector(ts, val) -> sort() -> delta() -> abs() -> sum() as volatility FROM "measurements"
# => [{"volatility"=>186.0, "device_id"=>nil}]
```

## Comparing with the Ruby version

Now it's time to benchmark and compare the Ruby and PostgreSQL solutions,
verifying which is faster:

```ruby
class Measurement < ActiveRecord::Base
  # code you already know
  scope :volatility_by_device_id, -> {
    volatility = Hash.new(0)
    previous = Hash.new
    find_all do |measurement|
      device_id = measurement.device_id
      if previous[device_id]
        delta = (measurement.val - previous[device_id]).abs
        volatility[device_id] += delta
      end
      previous[device_id] = measurement.val
    end
    volatility
  }
end
```

Now, benchmarking the real time, in milliseconds, to compute it in Ruby:

```ruby
Benchmark.measure { Measurement.volatility_by_device_id }.real * 1000
# => 3.021999917924404
```
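
`Benchmark.measure` returns a `Benchmark::Tms` whose `real` attribute is
wall-clock time in seconds, so multiplying by 1000 converts it to milliseconds.
A standalone sketch, with an arbitrary stand-in workload:

```ruby
require "benchmark"

# Time an arbitrary workload; .real is wall-clock seconds, * 1000 gives ms.
elapsed_ms = Benchmark.measure { 100_000.times { Math.sqrt(42) } }.real * 1000
elapsed_ms # some small positive number of milliseconds
```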

## Seeding massive data

Now let's use `generate_series` to quickly insert a lot of records directly
into the database.

Let's agree on some numbers for a good start: data for 6 devices emitting
values every 5 minutes for a month, which generates around 53k records.

Let's use some plain SQL to insert the records:

```ruby
sql = "INSERT INTO measurements (ts, device_id, val)
       SELECT ts, device_id, random() * 80
       FROM generate_series(TIMESTAMP '2022-01-01 00:00:00',
                            TIMESTAMP '2022-02-01 00:00:00',
                            INTERVAL '5 minutes') AS g1(ts),
            generate_series(0, 5) AS g2(device_id);"
ActiveRecord::Base.connection.execute(sql)
```

On my macOS machine with an M1 processor, it took less than a second to insert
the 53k records:

```ruby
# DEBUG (177.5ms)  INSERT INTO measurements (ts, device_id, val) ..
# => #<PG::Result:0x00007f8152034168 status=PGRES_COMMAND_OK ntuples=0 nfields=0 cmd_tuples=53574>
```

Now let's compare the time to process the volatility:

```ruby
Benchmark.bm do |x|
  x.report("ruby") { pp Measurement.volatility_by_device_id }
  x.report("sql") { pp Measurement.volatility("device_id").map(&:attributes) }
end
#            user     system      total        real
# ruby   0.612439   0.061890   0.674329 (  0.727590)
# sql    0.001142   0.000301   0.001443 (  0.060301)
```

Calculating the performance ratio, `0.72 / 0.06` means SQL is 12 times faster
than Ruby at processing the volatility 🎉
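
The 12× figure is simply the ratio of the two `real` columns from the benchmark:

```ruby
# Real (wall-clock) times from the benchmark above, in seconds.
ruby_real = 0.727590
sql_real  = 0.060301
ratio = (ruby_real / sql_real).round(1)
# ratio => 12.1
```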

And that was on localhost: there was no network involved in shipping all the
records over the wire. Moving to a remote host, look at the numbers:

!!!warning

    Note that the previous numbers were using localhost. Using a remote
    connection between different regions, the Ruby version gets around
    40 times slower than SQL:

              user     system      total        real
    ruby  0.716321   0.041640   0.757961 (  6.388881)
    sql   0.001156   0.000177   0.001333 (  0.161270)

Let's recap what's time-consuming here. `find_all` is not optimized for this
job: it fetches all the data and converts every row into an ActiveRecord
model, which has thousands of methods, and that's where most of the time goes.

That's very comfortable, but we only need the attributes.

Let's optimize it by plucking an array of values grouped by device:

```ruby
class Measurement < ActiveRecord::Base
  # ...
  scope :values_from_devices, -> {
    ordered_values = select(:val, :device_id).order(:ts)
    Hash[
      from(ordered_values)
        .group(:device_id)
        .pluck("device_id, array_agg(val)")
    ]
  }
end
```
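
What that scope returns is a hash of device IDs to time-ordered value arrays.
Here's a hypothetical in-memory equivalent, with plain arrays standing in for
database rows:

```ruby
# Hypothetical in-memory rows: [ts, device_id, val].
rows = [
  [3, 1, 5.0],
  [1, 1, 2.0],
  [2, 2, 7.0],
  [4, 2, 4.0],
]

values_by_device =
  rows.sort_by { |ts, _, _| ts }                             # ORDER BY ts
      .group_by { |_, device_id, _| device_id }              # GROUP BY device_id
      .transform_values { |rs| rs.map { |_, _, val| val } }  # array_agg(val)
# values_by_device => {1=>[2.0, 5.0], 2=>[7.0, 4.0]}
```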

Now let's create a class for processing the volatility:

```ruby
class Volatility
  def self.process(values)
    previous = nil
    deltas = values.map do |value|
      if previous
        delta = (value - previous).abs
        volatility = delta
      end
      previous = value
      volatility
    end
    # deltas => [nil, 1, 1]
    deltas.shift # drop the leading nil
    deltas.sum
  end

  def self.process_values(map)
    map.transform_values(&method(:process))
  end
end
```

Now let's change the benchmark to expose the time for fetching and for
processing separately:

```ruby
volatilities = nil

ActiveRecord::Base.logger = nil
Benchmark.bm do |x|
  x.report("ruby")    { Measurement.volatility_by_device_id }
  x.report("sql")     { Measurement.volatility("device_id").map(&:attributes) }
  x.report("fetch")   { volatilities = Measurement.values_from_devices }
  x.report("process") { Volatility.process_values(volatilities) }
end
```

Checking the results:

                 user     system      total        real
    ruby     0.683654   0.036558   0.720212 (  0.743942)
    sql      0.000876   0.000096   0.000972 (  0.054234)
    fetch    0.078045   0.003221   0.081266 (  0.116693)
    process  0.067643   0.006473   0.074116 (  0.074122)

Much better: fetching plus processing now takes around 190ms of real time,
down from ~744ms for the all-in-Ruby version.

If we break the SQL side of the fetch down a bit more:

```sql
EXPLAIN ANALYSE
SELECT device_id, array_agg(val)
FROM (
  SELECT val, device_id
  FROM measurements
  ORDER BY ts ASC
) subquery
GROUP BY device_id;
```

We can check the execution time to see how much is needed for the query
itself, isolating the network and the ActiveRecord layer:

    Planning Time: 17.761 ms
    Execution Time: 36.302 ms

So, of the **116ms** spent fetching the data, only **54ms** was used by the DB,
and the remaining **62ms** was consumed by network + ORM.
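
The 54ms/62ms split above is just arithmetic over the numbers reported by the
benchmark and `EXPLAIN ANALYSE`:

```ruby
# Numbers reported above, in milliseconds.
planning_ms   = 17.761
execution_ms  = 36.302
fetch_real_ms = 116.693

db_ms          = (planning_ms + execution_ms).round(3) # time inside PostgreSQL
network_orm_ms = (fetch_real_ms - db_ms).round(3)      # everything else
# db_ms => 54.063, network_orm_ms => 62.63
```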

[1]: https://github.com/timescale/timescaledb-toolkit
[2]: https://timescale.com
[3]: https://www.postgresql.org/docs/14/runtime-config-client.html#GUC-SEARCH-PATH
[4]: https://github.com/timescale/timescaledb-toolkit/blob/main/docs/README.md#a-note-on-tags-
[5]: https://guides.rubyonrails.org/action_controller_overview.html#filters
[6]: https://en.wikipedia.org/wiki/Volatility_(finance)
[7]: https://www.timescale.com/blog/function-pipelines-building-functional-programming-into-postgresql-using-custom-operators/
[8]: https://docs.timescale.com/timescaledb/latest/overview/core-concepts/hypertables-and-chunks/#partitioning-in-hypertables-with-chunks