timescaledb 0.2.1 → 0.2.2
- checksums.yaml +4 -4
- data/README.md +41 -9
- data/bin/console +1 -1
- data/bin/tsdb +2 -2
- data/docs/command_line.md +178 -0
- data/docs/img/lttb_example.png +0 -0
- data/docs/img/lttb_sql_vs_ruby.gif +0 -0
- data/docs/img/lttb_zoom.gif +0 -0
- data/docs/index.md +61 -0
- data/docs/migrations.md +69 -0
- data/docs/models.md +78 -0
- data/docs/toolkit.md +394 -0
- data/docs/toolkit_lttb_tutorial.md +557 -0
- data/docs/toolkit_lttb_zoom.md +357 -0
- data/docs/videos.md +16 -0
- data/examples/all_in_one/all_in_one.rb +39 -5
- data/examples/all_in_one/benchmark_comparison.rb +108 -0
- data/examples/all_in_one/caggs.rb +93 -0
- data/examples/all_in_one/query_data.rb +78 -0
- data/examples/toolkit-demo/compare_volatility.rb +64 -0
- data/examples/toolkit-demo/lttb/README.md +15 -0
- data/examples/toolkit-demo/lttb/lttb.rb +92 -0
- data/examples/toolkit-demo/lttb/lttb_sinatra.rb +139 -0
- data/examples/toolkit-demo/lttb/lttb_test.rb +21 -0
- data/examples/toolkit-demo/lttb/views/index.erb +27 -0
- data/examples/toolkit-demo/lttb-zoom/README.md +13 -0
- data/examples/toolkit-demo/lttb-zoom/lttb_zoomable.rb +90 -0
- data/examples/toolkit-demo/lttb-zoom/views/index.erb +33 -0
- data/lib/timescaledb/acts_as_time_vector.rb +18 -0
- data/lib/timescaledb/dimensions.rb +1 -0
- data/lib/timescaledb/hypertable.rb +5 -1
- data/lib/timescaledb/migration_helpers.rb +11 -0
- data/lib/timescaledb/stats_report.rb +1 -1
- data/lib/timescaledb/toolkit/helpers.rb +20 -0
- data/lib/timescaledb/toolkit/time_vector.rb +66 -0
- data/lib/timescaledb/toolkit.rb +3 -0
- data/lib/timescaledb/version.rb +1 -1
- data/lib/timescaledb.rb +1 -0
- data/mkdocs.yml +33 -0
- metadata +30 -4
- data/examples/all_in_one/Gemfile +0 -11
- data/examples/all_in_one/Gemfile.lock +0 -51
data/docs/toolkit_lttb_zoom.md
ADDED
@@ -0,0 +1,357 @@
# Downsampling and zooming

Less than two decades ago, Google revolutionized digital maps, raising the bar for map rendering and helping people navigate the unknown, dramatically speeding up the time tourists and drivers need to analyze a route and find the next step. With time-series data, several indicators were created to help data scientists digest information faster, like candlesticks and other indicators that easily surface relevant moments in the data.

In this tutorial, we're going to cover data resolution and how to present data at a reasonable resolution.

If you're zooming out over years of time-series data, no matter how wide your monitor is, you probably won't be able to see more than a few thousand points on your screen.

One of the hard challenges of plotting data is downsampling it to a proper resolution. Generally, when we zoom in, we lose resolution as we focus on a slice of the available data points. With fewer data points, the points end up far from each other, and we draw lines between them to suggest a connection between the elements. Fetching all the data is often unreasonable and expensive.

In this tutorial, you'll see how Timescale can help you strike a balance between speed and screen resolution. We'll walk you through a downsampling method that reduces millions of records to your screen resolution for a fast rendering process.

With a threshold that is reasonable for the screen resolution, every zoom-in fetches a new slice of downsampled data.

Downsampling in the front end is pretty common in plotting libraries, but it's still expensive. Delegating it to the back end makes the zooming experience smooth, like zooming on digital maps: you keep watching the old resolution while the client fetches new data, narrowing down to a new slice that represents the selected period.

In this example, we're going to use the [lttb][3] function, which is part of the [function pipelines][4] that can simplify a lot of your data analysis in the database.

If you're not familiar with the LTTB algorithm, feel free to try the [LTTB Tutorial][1] first; it explains how the downsampling algorithm chooses which points to plot.
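For intuition, here is a minimal pure-Ruby sketch of the LTTB idea — bucket the series, then keep the point in each bucket that forms the largest triangle with its neighbors. It's only an illustration; the tutorial itself relies on the toolkit's SQL implementation, and all names here are ours:

```ruby
# Minimal pure-Ruby sketch of LTTB (Largest-Triangle-Three-Buckets).
# `points` is an array of [x, y] pairs sorted by x; `threshold` is the
# number of points to keep. Structure is illustrative only.
def lttb(points, threshold)
  return points if threshold >= points.size || threshold < 3

  bucket_size = (points.size - 2).to_f / (threshold - 2)
  sampled = [points.first] # always keep the first point
  a = 0                    # index of the previously selected point

  (threshold - 2).times do |i|
    # Average of the *next* bucket: the third vertex of the triangle.
    next_start = ((i + 1) * bucket_size).floor + 1
    next_stop  = [((i + 2) * bucket_size).floor + 1, points.size].min
    nxt = points[next_start...next_stop]
    avg_x = nxt.sum { |p| p[0].to_f } / nxt.size
    avg_y = nxt.sum { |p| p[1].to_f } / nxt.size

    # Keep the point of the current bucket with the largest triangle area.
    start = (i * bucket_size).floor + 1
    stop  = ((i + 1) * bucket_size).floor + 1
    best  = (start...stop).max_by do |j|
      ((points[a][0] - avg_x) * (points[j][1] - points[a][1]) -
       (points[a][0] - points[j][0]) * (avg_y - points[a][1])).abs
    end
    sampled << points[best]
    a = best
  end

  sampled << points.last # always keep the last point
  sampled
end

sample = lttb((0..999).map { |i| [i, Math.sin(i / 25.0)] }, 50)
sample.size # => 50
```

Note how the first and last points are always preserved, so the downsampled series keeps the original time range intact.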

The focus of this example is to show how you can build a recursive process that keeps downsampling the data to maintain a good resolution.

The image below corresponds to the step-by-step guide provided here.

![LTTB Zoomable Example](https://jonatas.github.io/timescaledb/img/lttb_zoom.gif)

If you want to just run it directly, you can fetch the complete example [here][2].

Now, we'll split the work into two main sections: preparing the back end and the front end.

## Preparing the back end

The back end will be a Ruby script that fetches the dataset and prepares the database in case it's not ready. It also offers the JSON endpoint with the downsampled data that the front end consumes.

### Set up dependencies

The example uses Bundler inline, which avoids creating a `Gemfile`. It's very handy for prototyping code that you can ship in a single file. You declare all the gems in the `gemfile` code block, and Bundler installs them dynamically.

```ruby
require 'bundler/inline' #require only what you need

gemfile(true) do
  gem 'timescaledb'
  gem 'pry'
  gem 'sinatra', require: false
  gem 'sinatra-reloader'
  gem 'sinatra-cross_origin'
end
```

The timescaledb gem doesn't require the toolkit by default, so you must require it explicitly.

!!!warning
    Note that we don't require the remaining libraries individually because Bundler inline requires the declared gems by default, which is very convenient for single-file examples.

Let's take a look at which dependency serves what purpose:

* The [timescaledb][5] gem is the ActiveRecord wrapper for TimescaleDB functions.
* [Sinatra][7] is a DSL for quickly creating web applications with minimal effort.

Only for development purposes, we also have:

1. The [pry][6] library, widely adopted for debugging Ruby code. It makes it easy to explore the app and troubleshoot any issues you find.
2. The `sinatra-cross_origin` gem, which allows the application to use JavaScript served directly from foreign servers without the browser denying access.
3. The `sinatra-reloader` gem, which is very convenient for updating the code examples without restarting the Ruby process.

```ruby
require 'sinatra'
require 'sinatra/json'
require 'sinatra/contrib'
require 'timescaledb/toolkit'

register Sinatra::Reloader
register Sinatra::Contrib
```

## Setup database

Now, it's time to set up the database for this application. Make sure you have TimescaleDB installed or [learn how to install TimescaleDB here][12].

### Establishing the connection

The next step is to connect to the database. We'll run this example with the PostgreSQL URI as the last argument of the command line.

```ruby
PG_URI = ARGV.last
ActiveRecord::Base.establish_connection(PG_URI)
```

If this line works, your connection is good.

### Downloading the dataset

The data comes from a real scenario: the [weather dataset][8]. It contains several profiles with more or less data, at a reasonable resolution for this example.

Here is a small automation to make it run smoothly with the small, medium, and big datasets.

```ruby
VALID_SIZES = %i[small med big]

def download_weather_dataset size: :small
  unless VALID_SIZES.include?(size)
    fail "Invalid size: #{size}. Valid are #{VALID_SIZES}"
  end
  url = "https://timescaledata.blob.core.windows.net/datasets/weather_#{size}.tar.gz"
  puts "fetching #{size} weather dataset..."
  system "wget \"#{url}\""
  puts "done!"
end
```

Now, let's create the setup method to verify whether the database already has the data loaded, and fetch it if necessary.

```ruby
def setup size: :small
  file = "weather_#{size}.tar.gz"
  download_weather_dataset(size: size) unless File.exist?(file)
  puts "extracting #{file}"
  system "tar -xvzf #{file}"
  puts "creating data structures"
  system "psql #{PG_URI} < weather.sql"
  system %|psql #{PG_URI} -c "\\COPY locations FROM weather_#{size}_locations.csv CSV"|
  system %|psql #{PG_URI} -c "\\COPY conditions FROM weather_#{size}_conditions.csv CSV"|
end
```

!!!info
    You may need to recreate the database if you want to test with a different dataset.

### Declaring the models

Now, let's declare the ActiveRecord models. The `locations` table is an auxiliary table that tracks the placement of each device.

```ruby
class Location < ActiveRecord::Base
  self.primary_key = "device_id"

  has_many :conditions, foreign_key: "device_id"
end
```

Every location emits weather conditions with `temperature` and `humidity` every X minutes.

`conditions` is the time-series table we'll refer to here.

```ruby
class Condition < ActiveRecord::Base
  acts_as_hypertable time_column: "time"
  acts_as_time_vector value_column: "temperature", segment_by: "device_id"
  belongs_to :location, foreign_key: "device_id"
end
```

### Putting it all together

Now it's time to call the methods we implemented before. Let's set up a logger that prints to the standard output (STDOUT) to confirm the steps, and add the toolkit to the search path.

Similar to a database migration, we verify whether the table exists, set up the hypertable, and load the data if necessary.

```ruby
ActiveRecord::Base.connection.instance_exec do
  ActiveRecord::Base.logger = Logger.new(STDOUT)
  add_toolkit_to_search_path!

  unless Condition.table_exists?
    setup size: :small
  end
end
```

The `setup` method can also fetch the other datasets; you'll need to manually drop the `conditions` and `locations` tables to reload them.

### Filtering data

We have two main scenarios for plotting the data: when the user is not filtering at all, and when the user is filtering during a zoom.

To simplify the example, we're going to use only the `weather-pro-000001` device_id to make it easier to follow:

```ruby
def filter_by_request_params
  filter = { device_id: "weather-pro-000001" }
  if params[:filter] && params[:filter] != "null"
    from, to = params[:filter].split(",").map(&Time.method(:parse))
    filter[:time] = from..to
  end
  filter
end
```
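To see what this builds, here is a plain-Ruby run of the same parsing logic outside Sinatra (the `raw_filter` variable and the timestamps are made up for illustration):

```ruby
require 'time'

# Simulating what the endpoint receives: params[:filter] is "from,to".
raw_filter = "2016-11-15T00:00:00Z,2016-11-16T00:00:00Z" # made-up timestamps

filter = { device_id: "weather-pro-000001" }
if raw_filter && raw_filter != "null"
  from, to = raw_filter.split(",").map(&Time.method(:parse))
  filter[:time] = from..to # ActiveRecord turns a Range into a BETWEEN clause
end

filter[:time].cover?(Time.parse("2016-11-15T12:00:00Z")) # => true
```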

This method just builds the proper where clause, ActiveRecord style, for the conditions we want in this example. Now, let's use it to define the scope of the data that will be downsampled from the database.

```ruby
def conditions
  Condition.where(filter_by_request_params).order('time')
end
```

### Downsampling data

The threshold is defined as a method because it's also used later in the front end to render the initial template values.

```ruby
def threshold
  params[:threshold]&.to_i || 50
end
```
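One quirk in this pattern is worth knowing: `&.to_i` only guards against a missing param, not an invalid one. A quick check:

```ruby
# `&.to_i` short-circuits to nil when the param is absent, so the
# fallback applies. But a non-numeric string becomes 0, which is
# truthy in Ruby, so the default is *not* used.
default = 50

nil&.to_i   || default # => 50
"500"&.to_i || default # => 500
"abc"&.to_i || default # => 0 ("abc".to_i is 0, and 0 is truthy)
```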

Now, the most important method of this example: the call to the [lttb][3] function, which is responsible for the downsampling algorithm. It reuses all the logic built so far.

```ruby
def downsampled
  conditions.lttb(threshold: threshold, segment_by: nil)
end
```

The `segment_by` keyword is explicitly `nil` because the `acts_as_time_vector` macro in the model already declares a `segment_by`, which would be inherited here. As the filter specifies a single `device_id`, we can skip this option to simplify the data coming from lttb.

!!!info "The lttb scope"
    The `lttb` method call is in reality an ActiveRecord scope that encapsulates all the logic behind the library. The SQL code is not big, but there are some caveats involved. Behind the scenes, the following SQL query is executed:

```sql
SELECT time AS time, value AS temperature
FROM (
  WITH ordered AS (
    SELECT "conditions"."time",
           "conditions"."temperature"
    FROM "conditions"
    WHERE "conditions"."device_id" = 'weather-pro-000001'
    ORDER BY time, "conditions"."time" ASC)
  SELECT (
    lttb(ordered.time, ordered.temperature, 50) ->
    toolkit_experimental.unnest()
  ).* FROM ordered
) AS ordered
```

The `acts_as_time_vector` macro makes the `lttb` scope available on the model, allowing you to mix in conditions in advance and nest the queries so that the LTTB result can be processed and unnested properly.

Also, note that it uses the `->` pipeline operator to unnest the timevector and transform the data into tuples again.

### Exposing endpoints

Now, let's start on the web part using the Sinatra macros. First, let's configure the server to allow cross-origin requests and fetch the JavaScript libraries directly from their official CDNs.

```ruby
configure do
  enable :cross_origin
end
```

Now, let's declare the root endpoint that renders the index template, and the JSON endpoint that returns the downsampled data.

```ruby
get '/' do
  erb :index
end
```

Note that the erb template should live at `views/index.erb`; it's covered in the front-end section soon.

```ruby
get "/lttb_sql" do
  json downsampled
end
```
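The front-end JavaScript reads each row as `e[0]`/`e[1]`, so the endpoint is assumed to return an array of `[time, value]` pairs. A stdlib sketch of that round trip (the sample rows are made up):

```ruby
require 'json'

# Assumed payload shape: the JS code maps e[0] to x and e[1] to y.
downsampled = [
  ["2016-11-15 12:00:00", 23.4],
  ["2016-11-15 12:30:00", 22.9]
]

payload = JSON.generate(downsampled)
rows = JSON.parse(payload)
x = rows.map { |e| e[0] }
y = rows.map { |e| e[1].to_f }
```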

## Front end

The front end is a simple HTML page with JavaScript that plots the fetched data and asynchronously refreshes it at a new resolution when zooming in.

[Sinatra][7] works with a simple `views` folder and by default renders erb templates, which mix Ruby scriptlets with HTML.

All the following snippets go into the same file; they're just split into separate parts to make it easier to understand what each one does.

Let's start with the header that contains the extra scripts.

We're using just two libraries:

1. **jQuery** to fetch data asynchronously with Ajax calls.
2. [plotly][9] to plot the data.

```html
<head>
  <script src="https://cdn.jsdelivr.net/npm/jquery@3.6.1/dist/jquery.min.js"></script>
  <script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
</head>
```

Now, let's add a small status line showing how many records are present in the database, with a selector to try a different threshold and test different subsets of downsampled data.

```html
<h3>Downsampling <%= conditions.count %> records to
  <select value="<%= threshold %>" onchange="location.href=`/?threshold=${this.value}`">
    <option><%= threshold %></option>
    <option value="50">50</option>
    <option value="100">100</option>
    <option value="500">500</option>
    <option value="1000">1000</option>
    <option value="5000">5000</option>
  </select> points.
</h3>
```

Note that some Ruby snippets wrapped in `<%= ... %>` appear in the middle of the HTML to inherit the defaults established in the back end.

Now, it's time to declare the div that will receive the plot component and the function that fetches data and creates the chart.

```html
<div id='container'></div>
<script>
  let chart = document.getElementById('container');
  function fetch(filter) {
    $.ajax({
      url: `/lttb_sql?threshold=<%= threshold %>&filter=${filter}`,
      success: function(result) {
        let x = result.map((e) => e[0]);
        let y = result.map((e) => parseFloat(e[1]));
        Plotly.newPlot(chart, [{x, y}]);
        chart.on('plotly_relayout',
          function(eventdata) {
            fetch([eventdata['xaxis.range[0]'], eventdata['xaxis.range[1]']]);
          });
      }
    });
  }
  fetch(null);
</script>
```
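To make the zoom loop concrete, here is a plain-Ruby simulation of the back-and-forth: each zoom narrows the time range, and the server always returns at most `threshold` points. A crude every-nth sampler stands in for lttb, and all names here are illustrative:

```ruby
THRESHOLD = 50

# Fake dataset: one point per minute for 30 days.
DATA = (0...(60 * 24 * 30)).map { |i| [i, Math.sin(i / 100.0)] }

# What the /lttb_sql endpoint does conceptually: filter, then downsample.
def downsampled(from: nil, to: nil)
  slice = from ? DATA.select { |t, _| (from..to).cover?(t) } : DATA
  return slice if slice.size <= THRESHOLD
  step = slice.size.fdiv(THRESHOLD).ceil
  slice.each_slice(step).map(&:first)
end

full = downsampled                    # initial render: whole range
zoom = downsampled(from: 0, to: 600)  # user zooms into the first 10 hours

full.size <= THRESHOLD # => true
zoom.size <= THRESHOLD # => true
```

No matter how deep the user zooms, the payload stays bounded by the threshold while the effective resolution inside the visible window keeps improving.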

That's all for today, folks!

[1]: /toolkit_lttb_tutorial
[2]: https://github.com/jonatas/timescaledb/blob/master/examples/toolkit-demo/lttb-zoom
[3]: https://docs.timescale.com/api/latest/hyperfunctions/downsample/lttb/
[4]: https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/function-pipelines/
[5]: https://github.com/jonatas/timescaledb
[6]: http://pry.github.io
[7]: http://sinatrarb.com
[8]: https://docs.timescale.com/timescaledb/latest/tutorials/sample-datasets/#weather-datasets
[9]: https://plotly.com
[12]:
data/docs/videos.md
ADDED
@@ -0,0 +1,16 @@
# Videos about the TimescaleDB Gem

This library was started on [twitch.tv/timescaledb](https://twitch.tv/timescaledb).
You can watch all episodes here:

1. [Wrapping Functions to Ruby Helpers](https://www.youtube.com/watch?v=hGPsUxLFAYk).
2. [Extending ActiveRecord with Timescale Helpers](https://www.youtube.com/watch?v=IEyJIHk1Clk).
3. [Setup Hypertables for Rails testing environment](https://www.youtube.com/watch?v=wM6hVrZe7xA).
4. [Packing the code to this repository](https://www.youtube.com/watch?v=CMdGAl_XlL4).
5. [Working with Timescale continuous aggregates](https://youtu.be/co4HnBkHzVw).
6. [Creating the command-line application in Ruby to explore the Timescale API](https://www.youtube.com/watch?v=I3vM_q2m7T0).

If you create any content related to how to use the TimescaleDB gem, please open a
[Pull Request](https://github.com/jonatas/timescaledb/pulls).

data/examples/all_in_one/all_in_one.rb
CHANGED
@@ -1,4 +1,11 @@
-require 'bundler/
+require 'bundler/inline' #require only what you need
+
+gemfile(true) do
+  gem 'timescaledb', path: '../..'
+  gem 'pry'
+  gem 'faker'
+end
+
 require 'timescaledb'
 require 'pp'
 require 'pry'
@@ -7,7 +14,7 @@ ActiveRecord::Base.establish_connection( ARGV.last)
 
 # Simple example
 class Event < ActiveRecord::Base
-  self.primary_key =
+  self.primary_key = nil
   acts_as_hypertable
 end
 
@@ -15,11 +22,11 @@ end
 ActiveRecord::Base.connection.instance_exec do
   ActiveRecord::Base.logger = Logger.new(STDOUT)
 
-  drop_table(:events) if Event.table_exists?
+  drop_table(:events, cascade: true) if Event.table_exists?
 
   hypertable_options = {
     time_column: 'created_at',
-    chunk_time_interval: '1
+    chunk_time_interval: '1 day',
     compress_segmentby: 'identifier',
     compression_interval: '7 days'
   }
@@ -42,10 +49,37 @@ end
   end
 end
 
+
+def generate_fake_data(total: 100_000)
+  time = 1.month.ago
+  total.times.flat_map do
+    identifier = %w[sign_up login click scroll logout view]
+    time = time + rand(60).seconds
+    {
+      created_at: time,
+      updated_at: time,
+      identifier: identifier.sample,
+      payload: {
+        "name" => Faker::Name.name,
+        "email" => Faker::Internet.email
+      }
+    }
+  end
+end
+
+def supress_logs
+  ActiveRecord::Base.logger = nil
+  yield
+  ActiveRecord::Base.logger = Logger.new(STDOUT)
+end
+
+batch = generate_fake_data total: 10_000
+supress_logs do
+  Event.insert_all(batch, returning: false)
+end
 # Now let's see what we have in the scopes
 Event.last_hour.group(:identifier).count # => {"login"=>2, "click"=>1, "logout"=>1, "sign_up"=>1, "scroll"=>1}
 
-
 puts "compressing #{ Event.chunks.count }"
 Event.chunks.first.compress!
 
data/examples/all_in_one/benchmark_comparison.rb
ADDED
@@ -0,0 +1,108 @@
require 'bundler/inline' #require only what you need

gemfile(true) do
  gem 'timescaledb', path: '../..'
  gem 'pry'
  gem 'faker'
  gem 'benchmark-ips', require: "benchmark/ips", git: 'https://github.com/evanphx/benchmark-ips'
end

require 'pp'
require 'benchmark'
# ruby all_in_one.rb postgres://user:pass@host:port/db_name
ActiveRecord::Base.establish_connection( ARGV.last)

# Simple example
class Event < ActiveRecord::Base
  self.primary_key = nil
  acts_as_hypertable

  # If you want to override the automatic assignment of the `created_at` time series column
  def self.timestamp_attributes_for_create_in_model
    []
  end

  def self.timestamp_attributes_for_update_in_model
    []
  end
end

class Event2 < ActiveRecord::Base
  self.table_name = "events_2"
end

# Setup Hypertable as in a migration
ActiveRecord::Base.connection.instance_exec do
  ActiveRecord::Base.logger = Logger.new(STDOUT)

  drop_table(Event.table_name) if Event.table_exists?
  drop_table(Event2.table_name) if Event2.table_exists?

  hypertable_options = {
    time_column: 'created_at',
    chunk_time_interval: '7 day',
    compress_segmentby: 'identifier',
    compression_interval: '7 days'
  }

  create_table(:events, id: false, hypertable: hypertable_options) do |t|
    t.string :identifier, null: false
    t.jsonb :payload
    t.timestamps
  end

  create_table(Event2.table_name) do |t|
    t.string :identifier, null: false
    t.jsonb :payload
    t.timestamps
  end
end

def generate_fake_data(total: 100_000)
  time = Time.now
  total.times.flat_map do
    identifier = %w[sign_up login click scroll logout view]
    time = time + rand(60).seconds
    {
      created_at: time,
      updated_at: time,
      identifier: identifier.sample,
      payload: {
        "name" => Faker::Name.name,
        "email" => Faker::Internet.email
      }
    }
  end
end

def parallel_inserts clazz: nil, size: 5_000, data: nil
  limit = 8
  threads = []
  while (batch = data.shift(size)).any? do
    threads << Thread.new(batch) do |batch|
      begin
        clazz.insert_all(batch, returning: false)
      ensure
        ActiveRecord::Base.connection.close if ActiveRecord::Base.connection
      end
    end
    if threads.size == limit
      threads.each(&:join)
      threads = []
    end
  end
  threads.each(&:join)
end

payloads = nil
ActiveRecord::Base.logger = nil
Benchmark.ips do |x|
  x.config(time: 500, warmup: 2)

  x.report("gen data") { payloads = generate_fake_data total: 100_000 }
  x.report("normal ") { parallel_inserts(data: payloads.dup, clazz: Event2, size: 5000) }
  x.report("hyper ") { parallel_inserts(data: payloads.dup, clazz: Event, size: 5000) }
  x.compare!
end
ActiveRecord::Base.logger = Logger.new(STDOUT)

data/examples/all_in_one/caggs.rb
ADDED
@@ -0,0 +1,93 @@
require 'bundler/inline' #require only what you need

gemfile(true) do
  gem 'timescaledb', path: '../..'
  gem 'pry'
end

require 'pp'
# ruby caggs.rb postgres://user:pass@host:port/db_name
ActiveRecord::Base.establish_connection( ARGV.last)

class Tick < ActiveRecord::Base
  self.table_name = 'ticks'
  self.primary_key = nil

  acts_as_hypertable time_column: 'time'

  %w[open high low close].each{|name| attribute name, :decimal}

  scope :ohlc, -> (timeframe='1m') do
    select("time_bucket('#{timeframe}', time) as time,
      symbol,
      FIRST(price, time) as open,
      MAX(price) as high,
      MIN(price) as low,
      LAST(price, time) as close,
      SUM(volume) as volume").group("1,2")
  end
end

ActiveRecord::Base.connection.instance_exec do
  drop_table(:ticks, force: :cascade) if Tick.table_exists?

  hypertable_options = {
    time_column: 'time',
    chunk_time_interval: '1 day',
    compress_segmentby: 'symbol',
    compress_orderby: 'time',
    compression_interval: '7 days'
  }

  create_table :ticks, hypertable: hypertable_options, id: false do |t|
    t.timestamp :time
    t.string :symbol
    t.decimal :price
    t.integer :volume
  end
end

FAANG = %w[META AMZN AAPL NFLX GOOG]
OPERATION = [:+, :-]
RAND_VOLUME = -> { (rand(10) * rand(10)) * 100 }
RAND_CENT = -> { (rand / 50.0).round(2) }

def generate_fake_data(total: 100)
  previous_price = {}
  time = Time.now
  (total / FAANG.size).times.flat_map do
    time += rand(10)
    FAANG.map do |symbol|
      if previous_price[symbol]
        price = previous_price[symbol].send(OPERATION.sample, RAND_CENT.()).round(2)
      else
        price = 50 + rand(100)
      end
      payload = { time: time, symbol: symbol, price: price, volume: RAND_VOLUME.() }
      previous_price[symbol] = price
      payload
    end
  end
end

batch = generate_fake_data total: 10_000
ActiveRecord::Base.logger = nil
Tick.insert_all(batch, returning: false)
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Base.connection.instance_exec do
  create_continuous_aggregates('ohlc_1m', Tick.ohlc('1m'), with_data: true)
end

class Ohlc1m < ActiveRecord::Base
  self.table_name = 'ohlc_1m'
  attribute :time, :time
  attribute :symbol, :string
  %w[open high low close volume].each{|name| attribute name, :decimal}

  def readonly?
    true
  end
end

binding.pry
|