timescaledb 0.2.6 → 0.2.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (102)
  1. checksums.yaml +4 -4
  2. data/lib/timescaledb/acts_as_hypertable/core.rb +1 -1
  3. data/lib/timescaledb/database/quoting.rb +12 -0
  4. data/lib/timescaledb/database/schema_statements.rb +168 -0
  5. data/lib/timescaledb/database/types.rb +17 -0
  6. data/lib/timescaledb/database.rb +11 -0
  7. data/lib/timescaledb/toolkit/time_vector.rb +41 -4
  8. data/lib/timescaledb/version.rb +1 -1
  9. metadata +6 -95
  10. data/.github/workflows/ci.yml +0 -72
  11. data/.gitignore +0 -12
  12. data/.rspec +0 -3
  13. data/.ruby-version +0 -1
  14. data/.tool-versions +0 -1
  15. data/.travis.yml +0 -9
  16. data/CODE_OF_CONDUCT.md +0 -74
  17. data/Fastfile +0 -17
  18. data/Gemfile +0 -8
  19. data/Gemfile.lock +0 -75
  20. data/Gemfile.scenic +0 -7
  21. data/Gemfile.scenic.lock +0 -119
  22. data/README.md +0 -490
  23. data/Rakefile +0 -21
  24. data/bin/console +0 -28
  25. data/bin/setup +0 -13
  26. data/docs/command_line.md +0 -178
  27. data/docs/img/lttb_example.png +0 -0
  28. data/docs/img/lttb_sql_vs_ruby.gif +0 -0
  29. data/docs/img/lttb_zoom.gif +0 -0
  30. data/docs/index.md +0 -72
  31. data/docs/migrations.md +0 -76
  32. data/docs/models.md +0 -78
  33. data/docs/toolkit.md +0 -507
  34. data/docs/toolkit_lttb_tutorial.md +0 -557
  35. data/docs/toolkit_lttb_zoom.md +0 -357
  36. data/docs/toolkit_ohlc.md +0 -315
  37. data/docs/videos.md +0 -16
  38. data/examples/all_in_one/all_in_one.rb +0 -94
  39. data/examples/all_in_one/benchmark_comparison.rb +0 -108
  40. data/examples/all_in_one/caggs.rb +0 -93
  41. data/examples/all_in_one/query_data.rb +0 -78
  42. data/examples/ranking/.gitattributes +0 -7
  43. data/examples/ranking/.gitignore +0 -29
  44. data/examples/ranking/.ruby-version +0 -1
  45. data/examples/ranking/Gemfile +0 -33
  46. data/examples/ranking/Gemfile.lock +0 -189
  47. data/examples/ranking/README.md +0 -166
  48. data/examples/ranking/Rakefile +0 -6
  49. data/examples/ranking/app/controllers/application_controller.rb +0 -2
  50. data/examples/ranking/app/controllers/concerns/.keep +0 -0
  51. data/examples/ranking/app/jobs/application_job.rb +0 -7
  52. data/examples/ranking/app/models/application_record.rb +0 -3
  53. data/examples/ranking/app/models/concerns/.keep +0 -0
  54. data/examples/ranking/app/models/game.rb +0 -2
  55. data/examples/ranking/app/models/play.rb +0 -7
  56. data/examples/ranking/bin/bundle +0 -114
  57. data/examples/ranking/bin/rails +0 -4
  58. data/examples/ranking/bin/rake +0 -4
  59. data/examples/ranking/bin/setup +0 -33
  60. data/examples/ranking/config/application.rb +0 -39
  61. data/examples/ranking/config/boot.rb +0 -4
  62. data/examples/ranking/config/credentials.yml.enc +0 -1
  63. data/examples/ranking/config/database.yml +0 -86
  64. data/examples/ranking/config/environment.rb +0 -5
  65. data/examples/ranking/config/environments/development.rb +0 -60
  66. data/examples/ranking/config/environments/production.rb +0 -75
  67. data/examples/ranking/config/environments/test.rb +0 -53
  68. data/examples/ranking/config/initializers/cors.rb +0 -16
  69. data/examples/ranking/config/initializers/filter_parameter_logging.rb +0 -8
  70. data/examples/ranking/config/initializers/inflections.rb +0 -16
  71. data/examples/ranking/config/initializers/timescale.rb +0 -2
  72. data/examples/ranking/config/locales/en.yml +0 -33
  73. data/examples/ranking/config/puma.rb +0 -43
  74. data/examples/ranking/config/routes.rb +0 -6
  75. data/examples/ranking/config/storage.yml +0 -34
  76. data/examples/ranking/config.ru +0 -6
  77. data/examples/ranking/db/migrate/20220209120747_create_games.rb +0 -10
  78. data/examples/ranking/db/migrate/20220209120910_create_plays.rb +0 -19
  79. data/examples/ranking/db/migrate/20220209143347_create_score_per_hours.rb +0 -5
  80. data/examples/ranking/db/schema.rb +0 -47
  81. data/examples/ranking/db/seeds.rb +0 -7
  82. data/examples/ranking/db/views/score_per_hours_v01.sql +0 -7
  83. data/examples/ranking/lib/tasks/.keep +0 -0
  84. data/examples/ranking/log/.keep +0 -0
  85. data/examples/ranking/public/robots.txt +0 -1
  86. data/examples/ranking/storage/.keep +0 -0
  87. data/examples/ranking/tmp/.keep +0 -0
  88. data/examples/ranking/tmp/pids/.keep +0 -0
  89. data/examples/ranking/tmp/storage/.keep +0 -0
  90. data/examples/ranking/vendor/.keep +0 -0
  91. data/examples/toolkit-demo/compare_volatility.rb +0 -104
  92. data/examples/toolkit-demo/lttb/README.md +0 -15
  93. data/examples/toolkit-demo/lttb/lttb.rb +0 -92
  94. data/examples/toolkit-demo/lttb/lttb_sinatra.rb +0 -139
  95. data/examples/toolkit-demo/lttb/lttb_test.rb +0 -21
  96. data/examples/toolkit-demo/lttb/views/index.erb +0 -27
  97. data/examples/toolkit-demo/lttb-zoom/README.md +0 -13
  98. data/examples/toolkit-demo/lttb-zoom/lttb_zoomable.rb +0 -90
  99. data/examples/toolkit-demo/lttb-zoom/views/index.erb +0 -33
  100. data/examples/toolkit-demo/ohlc.rb +0 -175
  101. data/mkdocs.yml +0 -34
  102. data/timescaledb.gemspec +0 -40
data/docs/toolkit.md DELETED
@@ -1,507 +0,0 @@
# The TimescaleDB Toolkit

The [TimescaleDB Toolkit][1] is an extension brought by [Timescale][2] that adds
more hyperfunctions, fully compatible with TimescaleDB and PostgreSQL.

They have almost no dependency on hypertables, but they play very well in the
hypertable ecosystem. The mission of the toolkit team is to ease all things
analytics when using TimescaleDB, with a particular focus on developer
ergonomics and performance.

Here, we're going to walk through some of the toolkit functions and the helpers
that can simplify the generation of some complex queries.

!!!warning

    Note that we're just starting the toolkit integration in the gem and
    several functions are still experimental.

## The `add_toolkit_to_search_path!` helper

Several functions in the toolkit are still in an experimental phase, and for
that reason they don't live in the `public` schema, but in the
`toolkit_experimental` schema.

To use them without worrying about the schema, or prefixing it everywhere, you
can add the schema to the [search_path][3].

To make this easy on the Ruby side, you can call the method directly from the
ActiveRecord connection:

```ruby
ActiveRecord::Base.connection.add_toolkit_to_search_path!
```

This statement adds the [toolkit_experimental][4] schema to the search path,
alongside the `public` schema and the `$user` variable path.

The statement can be placed right before your usage of the toolkit. For
example, if a single controller in your Rails app will be using it, you can
create a [filter][5] in the controller to set it up before your action runs:

```ruby
class StatisticsController < ActionController::Base
  before_action :add_timescale_toolkit, only: [:complex_query]

  def complex_query
    # some code that uses the toolkit functions
  end

  protected

  def add_timescale_toolkit
    ActiveRecord::Base.connection.add_toolkit_to_search_path!
  end
end
```

## Example from scratch to use the Toolkit functions

Let's start by working through an example with the [volatility][6] algorithm.
This example is inspired by the [function pipelines][7] blog post, which shows
how to calculate volatility in plain SQL and then how to do the same with the
toolkit's pipeline functions.

!!!success

    Reading the [blog post][7] before trying this is highly recommended, and
    will give you more insight into how to apply and use time vectors, which
    are our next topic.

Let's start by creating the `measurements` hypertable using a regular
migration:

```ruby
class CreateMeasurements < ActiveRecord::Migration[7.0]
  def change
    hypertable_options = {
      time_column: 'ts',
      chunk_time_interval: '1 day',
    }
    create_table :measurements, hypertable: hypertable_options, id: false do |t|
      t.integer :device_id
      t.decimal :val
      t.timestamp :ts
    end
  end
end
```

In this example, we just have a hypertable with no compression options. Every
`1 day` a new child table, aka a [chunk][8], will be generated.

Now, let's add the model `app/models/measurement.rb`:

```ruby
class Measurement < ActiveRecord::Base
  self.primary_key = nil

  acts_as_hypertable time_column: "ts"
end
```

At this moment, you can jump into the Rails console and start testing the
model.

## Seeding some data

Before we build a very complex example, let's build something that is easy to
follow and comprehend. Let's create 3 records for the same device, representing
an hourly measurement from some sensor.

```ruby
yesterday = 1.day.ago
[1, 2, 3].each_with_index do |v, i|
  Measurement.create(device_id: 1, ts: yesterday + i.hour, val: v)
end
```

The values are a simple progression from 1 to 3. Now we can build a query to
get them back:

```ruby
values = Measurement.order(:ts).pluck(:val) # => [1, 2, 3]
```

Using plain Ruby, we can build this example with a few lines of code:

```ruby
previous = nil
volatilities = values.map do |value|
  if previous
    delta = (value - previous).abs
    volatility = delta
  end
  previous = value
  volatility
end
# volatilities => [nil, 1, 1]
volatility = volatilities.compact.sum # => 2
```

The `compact` can be skipped if we build the sum in the same loop. So, a
refactored version would be:

```ruby
previous = nil
volatility = 0
values.each do |value|
  if previous
    delta = (value - previous).abs
    volatility += delta
  end
  previous = value
end
volatility # => 2
```
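
For reference, the same computation can be written more idiomatically with
`Enumerable#each_cons`, which yields each pair of consecutive values (a
plain-Ruby sketch, independent of the gem):

```ruby
# Volatility as the sum of absolute differences between consecutive values.
def volatility(values)
  values.each_cons(2).sum { |a, b| (b - a).abs }
end

volatility([1, 2, 3]) # => 2
```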

Now it's time to move the calculation to the database level, computing the
volatility in plain PostgreSQL. A subquery is required to build the calculated
delta, so it looks a bit more convoluted:

```ruby
delta = Measurement.select("device_id, abs(val - lag(val) OVER (PARTITION BY device_id ORDER BY ts)) as abs_delta")
Measurement
  .select("device_id, sum(abs_delta) as volatility")
  .from("(#{delta.to_sql}) as calc_delta")
  .group('device_id')
```

The final query for the example above looks like this:

```sql
SELECT device_id, SUM(abs_delta) AS volatility
FROM (
  SELECT device_id,
    ABS(
      val - LAG(val) OVER (
        PARTITION BY device_id ORDER BY ts)
    ) AS abs_delta
  FROM "measurements"
) AS calc_delta
GROUP BY device_id
```

The raw SQL is much harder to follow than the Ruby version. Now let's reproduce
the same example using the toolkit pipelines:

```ruby
Measurement
  .select(<<-SQL).group("device_id")
    device_id,
    timevector(ts, val)
      -> sort()
      -> delta()
      -> abs()
      -> sum() as volatility
  SQL
```

As you can see, it's much easier to read and digest this version. Now, let's
take a look at how we can generate the queries using the scopes injected by the
`acts_as_time_vector` macro.

## Adding the `acts_as_time_vector` macro

Let's start by changing the model to add the `acts_as_time_vector` macro, which
lets us avoid repeating the parameters of the `timevector(ts, val)` call.

```ruby
class Measurement < ActiveRecord::Base
  self.primary_key = nil

  acts_as_hypertable time_column: "ts"

  acts_as_time_vector segment_by: "device_id",
                      value_column: "val",
                      time_column: "ts"
end
```

If you skip the `time_column` option in `acts_as_time_vector`, it will inherit
the same value from `acts_as_hypertable`. We're making it explicit here for the
sake of keeping the macros independent.

Now that we have it, let's create a scope that uses it:

```ruby
class Measurement < ActiveRecord::Base
  acts_as_hypertable time_column: "ts"
  acts_as_time_vector segment_by: "device_id",
                      value_column: "val",
                      time_column: "ts"

  scope :volatility, -> do
    select(<<-SQL).group("device_id")
      device_id,
      timevector(#{time_column}, #{value_column})
        -> sort()
        -> delta()
        -> abs()
        -> sum() as volatility
    SQL
  end
end
```

Now we have created the volatility scope, always grouping by `device_id`.

The Toolkit helpers ship a similar version which also provides a default
segmentation based on the `segment_by` configuration set through the
`acts_as_time_vector` macro. A `segment_by_column` method is added to access
this configuration, so with a small change we can make the grouping columns
configurable:

```ruby
class Measurement < ActiveRecord::Base
  # ... Skipping previous code to focus on the example

  acts_as_time_vector segment_by: "device_id",
                      value_column: "val",
                      time_column: "ts"

  scope :volatility, -> (columns = segment_by_column) do
    _scope = select([*columns,
      "timevector(#{time_column}, #{value_column})
        -> sort()
        -> delta()
        -> abs()
        -> sum() as volatility"
    ].join(", "))
    _scope = _scope.group(columns) if columns
    _scope
  end
end
```

Testing the method:

```ruby
Measurement.volatility.map(&:attributes)
# DEBUG -- : Measurement Load (1.6ms)  SELECT device_id, timevector(ts, val) -> sort() -> delta() -> abs() -> sum() as volatility FROM "measurements" GROUP BY "measurements"."device_id"
# => [{"device_id"=>1, "volatility"=>8.0}]
```

Let's add a few more records with random values:

```ruby
yesterday = 1.day.ago
(2..6).each do |d|
  (1..10).each do |j|
    Measurement.create(device_id: d, ts: yesterday + j.hour, val: rand(10))
  end
end
```

-
300
- ```ruby
301
- Measurement.order("device_id").volatility.map(&:attributes)
302
- # DEBUG -- : Measurement Load (1.3ms) SELECT device_id, timevector(ts, val) -> sort() -> delta() -> abs() -> sum() as volatility FROM "measurements" GROUP BY "measurements"."device_id" ORDER BY device_id
303
- => [{"device_id"=>1, "volatility"=>8.0},
304
- {"device_id"=>2, "volatility"=>24.0},
305
- {"device_id"=>3, "volatility"=>30.0},
306
- {"device_id"=>4, "volatility"=>32.0},
307
- {"device_id"=>5, "volatility"=>44.0},
308
- {"device_id"=>6, "volatility"=>23.0}]
309
- ```
310
-
311
If the parameter is explicitly `nil`, it will not group:

```ruby
Measurement.volatility(nil).map(&:attributes)
# DEBUG -- : Measurement Load (5.4ms)  SELECT timevector(ts, val) -> sort() -> delta() -> abs() -> sum() as volatility FROM "measurements"
# => [{"volatility"=>186.0, "device_id"=>nil}]
```

## Comparing with the Ruby version

Now it's time to benchmark and compare the Ruby and PostgreSQL solutions to
verify which is faster:

```ruby
class Measurement < ActiveRecord::Base
  # code you already know
  scope :volatility_by_device_id, -> {
    volatility = Hash.new(0)
    previous = {}
    find_all do |measurement|
      device_id = measurement.device_id
      if previous[device_id]
        delta = (measurement.val - previous[device_id]).abs
        volatility[device_id] += delta
      end
      previous[device_id] = measurement.val
    end
    volatility
  }
end
```
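
The per-device bookkeeping in that scope can be exercised without a database.
Here is a plain-Ruby sketch over `[device_id, val]` pairs (the method name and
sample data are illustrative only):

```ruby
# Streaming volatility per device: sum the absolute deltas between
# consecutive values, keyed by device_id.
def volatility_by_device(rows)
  volatility = Hash.new(0)
  previous = {}
  rows.each do |device_id, val|
    volatility[device_id] += (val - previous[device_id]).abs if previous.key?(device_id)
    previous[device_id] = val
  end
  volatility
end

rows = [[1, 1], [1, 2], [1, 3], [2, 5], [2, 1]]
volatility_by_device(rows) # => {1=>2, 2=>4}
```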

Now, benchmarking the real time, in milliseconds, to compute it in Ruby:

```ruby
Benchmark.measure { Measurement.volatility_by_device_id }.real * 1000
# => 3.021999917924404
```

## Seeding massive data

Now let's use `generate_series` to insert a lot of records directly into the
database.

Let's agree on some numbers for a good start: data for 6 devices emitting
values every 5 minutes over a month, which will generate around 50k records.

Let's use some plain SQL to insert the records:

```ruby
sql = "INSERT INTO measurements (ts, device_id, val)
  SELECT ts, device_id, random() * 80
  FROM generate_series(TIMESTAMP '2022-01-01 00:00:00',
                       TIMESTAMP '2022-02-01 00:00:00',
                       INTERVAL '5 minutes') AS g1(ts),
       generate_series(0, 5) AS g2(device_id);
"
ActiveRecord::Base.connection.execute(sql)
```
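
A quick sanity check on the expected row count: the series spans 31 days in
5-minute steps with both endpoints included, for each of the 6 devices. The
arithmetic below just mirrors the query above:

```ruby
# 31 days of 5-minute intervals, endpoints inclusive, for 6 devices.
timestamps_per_device = (31 * 24 * 60) / 5 + 1 # => 8929
devices = 6                                    # generate_series(0, 5)
timestamps_per_device * devices                # => 53574
```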

On my macOS machine with an M1 processor, it took less than a second to insert
the 53k records:

```ruby
# DEBUG (177.5ms)  INSERT INTO measurements (ts, device_id, val) ..
# => #<PG::Result:0x00007f8152034168 status=PGRES_COMMAND_OK ntuples=0 nfields=0 cmd_tuples=53574>
```

Now, let's compare the time to process the volatility:

```ruby
Benchmark.bm do |x|
  x.report("ruby") { pp Measurement.volatility_by_device_id }
  x.report("sql")  { pp Measurement.volatility("device_id").map(&:attributes) }
end
#        user     system      total        real
# ruby  0.612439  0.061890   0.674329 (  0.727590)
# sql   0.001142  0.000301   0.001443 (  0.060301)
```

Calculating the performance ratio, `0.72 / 0.06` shows that SQL is about 12
times faster than Ruby at processing the volatility 🎉

Bear in mind that this was on localhost, so no records had to travel over the
wire. Moving to a remote host, look at the numbers:

!!!warning

    Note that the previous numbers were measured on localhost. Using a remote
    connection between different regions, Ruby becomes roughly 40 times slower
    than SQL:

                 user     system      total        real
        ruby   0.716321  0.041640   0.757961 (  6.388881)
        sql    0.001156  0.000177   0.001333 (  0.161270)

- Let’s recap what’s time consuming here. The `find_all` is just not optimized to
407
- fetch the data and also consuming most of the time here. It’s also fetching
408
- the data and converting it to ActiveRecord model which has thousands of methods.
409
-
410
- It’s very comfortable but just need the attributes to make it.
411
-
412
- Let’s optimize it by plucking an array of values grouped by device.
413
-
414
```ruby
class Measurement < ActiveRecord::Base
  # ...
  scope :values_from_devices, -> {
    ordered_values = select(:val, :device_id).order(:ts)
    Hash[
      from(ordered_values)
        .group(:device_id)
        .pluck("device_id, array_agg(val)")
    ]
  }
end
```
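
For intuition, here is what that scope returns, simulated in plain Ruby over
hypothetical rows already ordered by `ts` (the row data is made up for
illustration):

```ruby
# Group ordered [device_id, val] rows into { device_id => [values] },
# mimicking GROUP BY device_id + array_agg(val).
rows = [[1, 1], [2, 5], [1, 2], [2, 1], [1, 3]]
grouped = rows.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
grouped # => {1=>[1, 2, 3], 2=>[5, 1]}
```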

Now, let's create a class for processing the volatility:

```ruby
class Volatility
  def self.process(values)
    previous = nil
    deltas = values.map do |value|
      if previous
        delta = (value - previous).abs
      end
      previous = value
      delta
    end
    # deltas => [nil, 1, 1]
    deltas.shift # drop the nil produced by the first value
    deltas.sum
  end

  def self.process_values(map)
    map.transform_values(&method(:process))
  end
end
```
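
A standalone sanity check of the class (reproduced here with `each_cons` as a
compact equivalent of the loop above, so the snippet runs on its own):

```ruby
class Volatility
  # Sum of absolute deltas between consecutive values.
  def self.process(values)
    values.each_cons(2).sum { |a, b| (b - a).abs }
  end

  def self.process_values(map)
    map.transform_values(&method(:process))
  end
end

Volatility.process_values({ 1 => [1, 2, 3], 2 => [5, 1] }) # => {1=>2, 2=>4}
```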

Now, let's change the benchmark to expose the time spent fetching versus
processing:

```ruby
volatilities = nil

ActiveRecord::Base.logger = nil
Benchmark.bm do |x|
  x.report("ruby")    { Measurement.volatility_by_device_id }
  x.report("sql")     { Measurement.volatility("device_id").map(&:attributes) }
  x.report("fetch")   { volatilities = Measurement.values_from_devices }
  x.report("process") { Volatility.process_values(volatilities) }
end
```

Checking the results:

                 user     system      total        real
    ruby       0.683654  0.036558   0.720212 (  0.743942)
    sql        0.000876  0.000096   0.000972 (  0.054234)
    fetch      0.078045  0.003221   0.081266 (  0.116693)
    process    0.067643  0.006473   0.074116 (  0.074122)

Much better: fetching plus processing now takes around 190ms of real time,
against 744ms for the naive Ruby version, though still more than three times
the 54ms of the pure SQL query.

To break the SQL side down a bit more, we can run `EXPLAIN ANALYSE` on the
fetch query:

```sql
EXPLAIN ANALYSE
SELECT device_id, array_agg(val)
FROM (
  SELECT val, device_id
  FROM measurements
  ORDER BY ts ASC
) subquery
GROUP BY device_id;
```

This makes clear how much time is needed just for the processing part,
isolating the network and the ActiveRecord layer:

    Planning Time: 17.761 ms
    Execution Time: 36.302 ms

So, of the **116ms** spent fetching the data, only about **54ms** was used by
the database, and the remaining **62ms** was consumed by network + ORM.

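The arithmetic behind that breakdown, using the timings reported above:

```ruby
# DB time = planning + execution; the rest of the fetch wall time
# is network + ORM overhead.
planning  = 17.761
execution = 36.302
fetch_ms  = 116.693 # real time of the "fetch" report, in ms

db_ms = (planning + execution).round(3)   # => 54.063
overhead_ms = (fetch_ms - db_ms).round(2) # => 62.63
```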

[1]: https://github.com/timescale/timescaledb-toolkit
[2]: https://timescale.com
[3]: https://www.postgresql.org/docs/14/runtime-config-client.html#GUC-SEARCH-PATH
[4]: https://github.com/timescale/timescaledb-toolkit/blob/main/docs/README.md#a-note-on-tags-
[5]: https://guides.rubyonrails.org/action_controller_overview.html#filters
[6]: https://en.wikipedia.org/wiki/Volatility_(finance)
[7]: https://www.timescale.com/blog/function-pipelines-building-functional-programming-into-postgresql-using-custom-operators/
[8]: https://docs.timescale.com/timescaledb/latest/overview/core-concepts/hypertables-and-chunks/#partitioning-in-hypertables-with-chunks