clickhouse-ruby 0.2.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +67 -0
- data/README.md +142 -1
- data/lib/clickhouse_ruby/active_record/generators/migration_generator.rb +308 -0
- data/lib/clickhouse_ruby/active_record/generators/templates/create_table.rb.tt +59 -0
- data/lib/clickhouse_ruby/active_record/generators/templates/migration.rb.tt +87 -0
- data/lib/clickhouse_ruby/active_record/railtie.rb +16 -2
- data/lib/clickhouse_ruby/active_record/schema_dumper.rb +458 -0
- data/lib/clickhouse_ruby/active_record.rb +1 -0
- data/lib/clickhouse_ruby/client.rb +212 -1
- data/lib/clickhouse_ruby/connection_pool.rb +73 -1
- data/lib/clickhouse_ruby/instrumentation.rb +176 -0
- data/lib/clickhouse_ruby/version.rb +1 -1
- data/lib/clickhouse_ruby.rb +1 -0
- metadata +20 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 97ae40434fd1078ad34271a031ed28442d831f569200a2b7e71f0f99184d907d
+  data.tar.gz: '002378905be755b1fe9fd54f05478ef6b0f3d7d020beb0b890e5f508b300ec8f'
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 90cd05a510db8a9322e5828a52d33386888f18d2cf40c2b0514514cbb25a3448155ae41eecdaada148f8d16f75bf6ab5570f5aa1c004ca8604475ebd9a1ab404
+  data.tar.gz: 8d39e17f93bfd955eab008fceb13e847fbbaf6e705ab95814b401e10ec91919b4109be8b486f010be4797778298d3c70a47eba586f8e8f89e4f135b9b0c9f09d
data/CHANGELOG.md
CHANGED
@@ -7,6 +7,73 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.3.0] - 2026-02-02
+
+### Added
+
+#### Observability & Instrumentation
+- **ActiveSupport::Notifications Integration** - Event-driven monitoring for APM tools
+  - Events: `clickhouse_ruby.query.complete`, `clickhouse_ruby.query.error`, `clickhouse_ruby.insert.complete`
+  - Pool events: `clickhouse_ruby.pool.checkout`, `clickhouse_ruby.pool.checkin`, `clickhouse_ruby.pool.timeout`
+  - Graceful fallback when ActiveSupport is not available
+  - Query timing with millisecond precision using monotonic clock
+  - Example: `ActiveSupport::Notifications.subscribe(/clickhouse_ruby/) { |*args| ... }`
+
+- **Enhanced Logging** - Debug-level query timing logs
+  - Query duration logging at debug level
+  - Insert timing with row count
+  - Structured payload data for external processing
+
+#### Performance Benchmarking
+- **Benchmark Suite** - Comprehensive performance testing infrastructure
+  - Rake tasks: `rake benchmark`, `rake benchmark:quick`, `rake benchmark:connection`, `rake benchmark:query`, `rake benchmark:insert`
+  - Uses `benchmark-ips` for iterations-per-second measurements
+  - Performance targets from MVP: Connection <100ms, SELECT <50ms, 10K INSERT <1s
+  - Latency statistics: min, max, avg, median, p95, p99
+
+#### ActiveRecord Migration Helpers
+- **Migration Generator** - Rails generator for ClickHouse migrations
+  - Command: `rails generate clickhouse:migration CreateEvents field:type`
+  - ClickHouse-specific options: `--engine`, `--order-by`, `--partition-by`, `--primary-key`, `--settings`
+  - Cluster support with automatic Replicated* engine selection
+  - Auto-detects migration action from name (create_table, add_column, remove_column)
+
+- **Migration Templates** - ClickHouse-aware migration templates
+  - Supports MergeTree, ReplacingMergeTree, SummingMergeTree, AggregatingMergeTree
+  - Partition expressions with proper quoting
+  - Settings block support
+
+- **Schema Dumper** - Rails-compatible schema dumping
+  - Extracts table options from system.tables
+  - Dumps ClickHouse-specific column options (Nullable, LowCardinality, Decimal, DateTime64)
+  - View and index dumping support
+
+#### Query Tools
+- **EXPLAIN Support** - Query plan analysis
+  - Method: `client.explain(sql, type: :plan)`
+  - Types: `:plan`, `:pipeline`, `:estimate`, `:ast`, `:syntax`
+  - Example: `client.explain('SELECT * FROM events', type: :pipeline)`
+
+- **Enhanced Health Check** - Comprehensive server health status
+  - Returns: status, server_version, current_database, server_uptime, pool health
+  - Single method for monitoring dashboards
+  - Example: `client.health_check`
+
+- **Detailed Pool Statistics** - Monitoring-ready metrics
+  - Method: `pool.detailed_stats`
+  - Returns: utilization_percent, checkout rate per minute, timeout rate
+  - Suitable for Prometheus/StatsD export
+
+### Changed
+- Connection pool now publishes instrumentation events on checkout/checkin
+- Query and insert operations track timing automatically
+- Pool timeout errors include instrumentation payload
+
+### Development
+- Added `benchmark-ips` (~> 2.12) as development dependency
+- New `benchmark/` directory with helper and benchmark files
+- RuboCop exclusions for benchmark files
+
 ## [0.2.0] - 2026-02-02
 
 ### Added
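
The instrumentation events listed above plug into any ActiveSupport::Notifications subscriber. A minimal sketch, assuming only the event names from this changelog (the payload is passed through untouched, since its exact keys are not documented here):

```ruby
require "logger"
require "active_support/notifications"

logger = Logger.new($stdout)

# Subscribe to every clickhouse_ruby.* event the gem publishes.
ActiveSupport::Notifications.subscribe(/clickhouse_ruby/) do |name, start, finish, _id, payload|
  duration_ms = ((finish - start) * 1000).round(2)
  # Forward the event name, duration, and payload to a logger or APM client.
  logger.debug("[clickhouse] #{name} #{duration_ms}ms #{payload.inspect}")
end
```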
data/README.md
CHANGED
@@ -6,6 +6,33 @@
 
 A lightweight Ruby client for ClickHouse with optional ActiveRecord integration.
 
+## Why ClickhouseRuby?
+
+ClickhouseRuby is designed from the ground up with production reliability and developer experience in mind. Here's what sets it apart:
+
+**🔒 Security & Reliability**
+- **SSL verification enabled by default** - Secure by default, unlike alternatives that require explicit configuration
+- **Never silently fails** - All errors are properly raised and propagated (fixes [clickhouse-activerecord #230](https://github.com/patrikx3/clickhouse-activerecord/issues/230))
+- **Comprehensive error hierarchy** - 30+ specific error classes mapped from ClickHouse error codes
+
+**⚡ Performance & Architecture**
+- **Zero runtime dependencies** - Uses only Ruby stdlib, making it lightweight and fully auditable
+- **AST-based type parser** - Handles complex nested types correctly (Array(Tuple(String, UInt64)), etc.) unlike regex-based parsers
+- **Thread-safe connection pooling** - Built-in pool with health checks and proper resource management
+- **Result streaming** - Process millions of rows with constant memory usage
+
+**🛠️ Developer Experience**
+- **Clean, intuitive API** - Simple methods for queries, inserts, and DDL operations
+- **Optional ActiveRecord integration** - Familiar model-based access when you need it
+- **ClickHouse-specific query extensions** - PREWHERE, FINAL, SAMPLE, and SETTINGS DSL built-in
+- **Comprehensive type system** - Full support for all ClickHouse types including Nullable, Array, Map, Tuple, Enum, Decimal
+
+**📊 Production Ready**
+- **Automatic retries** - Configurable exponential backoff for transient failures
+- **HTTP compression** - Reduce bandwidth for large payloads
+- **Connection health monitoring** - Pool statistics and health checks
+- **Extensive test coverage** - 80%+ coverage with both unit and integration tests
+
 ## Features
 
 **Core (v0.1.0)**
@@ -98,10 +125,15 @@ client = ClickhouseRuby.client
 | `password` | Authentication password | `nil` |
 | `ssl` | Enable HTTPS | `false` |
 | `ssl_verify` | Verify SSL certificates | `true` |
+| `ssl_ca_path` | Custom CA certificate file path | `nil` |
 | `connect_timeout` | Connection timeout in seconds | `10` |
 | `read_timeout` | Read timeout in seconds | `60` |
 | `write_timeout` | Write timeout in seconds | `60` |
 | `pool_size` | Connection pool size | `5` |
+| `log_level` | Logger level (`:debug`, `:info`, `:warn`, `:error`) | `:info` |
+| `default_settings` | Global ClickHouse settings for all queries | `{}` |
+| `pool_timeout` | Wait time for available connection in seconds | `5` |
+| `retry_jitter` | Jitter strategy (`:full`, `:equal`, `:none`) | `:equal` |
 
 ### Environment Variables
 
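
Taken together with the existing options, the new rows allow a fully locked-down production setup. A hypothetical sketch, assuming a `ClickhouseRuby.configure` block (the option names come from the table above; the block form and the CA path are illustrative, not confirmed by this diff):

```ruby
# Illustrative configuration only -- option names are from the README table,
# the configure block and the certificate path are assumptions.
ClickhouseRuby.configure do |config|
  config.ssl              = true
  config.ssl_verify       = true
  config.ssl_ca_path      = "/etc/ssl/certs/clickhouse-ca.pem"
  config.log_level        = :debug
  config.default_settings = { max_execution_time: 30 } # applied to every query
  config.pool_size        = 10
  config.pool_timeout     = 5                           # seconds to wait for a free connection
  config.retry_jitter     = :full
end
```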
@@ -180,6 +212,39 @@ result = client.execute(
 )
 ```
 
+### Result Metadata
+
+Access query metadata and result information:
+
+```ruby
+result = client.execute('SELECT * FROM events LIMIT 100')
+
+# Query execution metadata
+result.elapsed_time   # => 0.042 (seconds)
+result.rows_read      # => 100
+result.bytes_read     # => 8500
+result.rows_written   # => 0 (for SELECT queries)
+result.bytes_written  # => 0
+
+# Result size information
+result.count   # => 100 (alias for size/length)
+result.size    # => 100
+result.length  # => 100
+
+# Column information
+result.columns       # => ["id", "event_type", "count"]
+result.column_types  # => ["UInt64", "String", "UInt32"]
+result.types         # => ["UInt64", "String", "UInt32"]
+
+# Row access methods
+result.first  # => {"id" => 1, "event_type" => "click", "count" => 100}
+result.last   # => {"id" => 100, "event_type" => "view", "count" => 50}
+result[5]     # => {"id" => 5, "event_type" => "click", "count" => 75}
+
+# Get all values for a specific column
+result.column_values("event_type")  # => ["click", "view", "click", ...]
+```
+
 ### DDL Commands
 
 ```ruby
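
A small usage sketch built only on the accessors shown above: flag slow or heavy queries from the execution metadata (the 50 ms threshold is arbitrary):

```ruby
result = client.execute('SELECT event_type, count() AS count FROM events GROUP BY event_type')

if result.elapsed_time > 0.05
  warn "slow query: #{result.elapsed_time}s, #{result.rows_read} rows read, #{result.bytes_read} bytes"
end

# Tally a column client-side via the documented column accessor.
result.column_values("event_type").tally  # => {"click" => ..., "view" => ...}
```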
@@ -220,12 +285,88 @@ client.ping # => true
 client.server_version # => "24.1.1.123"
 
 # Get pool statistics
-client.pool_stats # => {
+client.pool_stats # => {
+#   size: 5,                # Total pool capacity
+#   available: 4,           # Connections ready to use
+#   in_use: 1,              # Connections currently in use
+#   total_connections: 5,   # Total connections created
+#   total_checkouts: 150,   # Lifetime pool checkout count
+#   total_timeouts: 0,      # Timeouts waiting for connection
+#   uptime_seconds: 3600.5  # Seconds since pool created
+# }
 
 # Close all connections
 client.close
 ```
 
+### Advanced Client Methods
+
+#### Batch Processing
+
+Process large query results in batches to manage memory efficiently:
+
+```ruby
+# Process 500 rows at a time
+client.each_batch('SELECT * FROM huge_table', batch_size: 500) do |batch|
+  # batch is an array of hashes (max 500 rows)
+  puts "Processing batch of #{batch.size} rows"
+  insert_to_cache(batch)
+end
+
+# Default batch size is 500 rows
+client.each_batch('SELECT * FROM data') { |batch| process(batch) }
+```
+
+#### Row-by-Row Processing
+
+Process results one row at a time for maximum memory efficiency:
+
+```ruby
+# Stream processing - constant memory usage
+client.each_row('SELECT * FROM massive_table') do |row|
+  # row is a single hash
+  puts "Processing: #{row['id']}"
+  update_statistics(row)
+end
+
+# Returns Enumerator if no block given
+rows = client.each_row('SELECT * FROM table')
+rows.each { |row| puts row['name'] }
+```
+
+#### Connection Aliases
+
+Additional connection management methods:
+
+```ruby
+# Disconnect all connections in the pool
+client.disconnect
+
+# Check if client is connected
+client.connected? # => true
+```
+
+#### Module-Level Methods
+
+Quick access without explicit client instance:
+
+```ruby
+# Execute query using default client
+result = ClickhouseRuby.execute('SELECT 1 AS num')
+
+# Insert data using default client
+ClickhouseRuby.insert('events', [
+  { date: '2024-01-01', event_type: 'click' }
+])
+
+# Ping default client
+ClickhouseRuby.ping # => true
+```
+
+**Performance Note**: Batch and row-by-row processing methods use result streaming internally,
+which maintains constant memory usage regardless of result size. Use these methods for queries
+that return millions of rows to prevent memory exhaustion.
+
 ## Type Support
 
 ClickhouseRuby supports all ClickHouse types:
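
The `pool_stats` hash above is already shaped for periodic export to a metrics backend. A rough sketch (the `statsd` object and the reporting thread are assumptions; only the stat keys come from the README):

```ruby
# Push pool utilisation to any StatsD-style client exposing #gauge.
Thread.new do
  loop do
    stats = client.pool_stats
    statsd.gauge("clickhouse.pool.size",      stats[:size])
    statsd.gauge("clickhouse.pool.available", stats[:available])
    statsd.gauge("clickhouse.pool.in_use",    stats[:in_use])
    statsd.gauge("clickhouse.pool.timeouts",  stats[:total_timeouts])
    sleep 30
  end
end
```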
data/lib/clickhouse_ruby/active_record/generators/migration_generator.rb
ADDED

@@ -0,0 +1,308 @@
+# frozen_string_literal: true
+
+require "rails/generators"
+require "rails/generators/active_record/migration"
+
+module ClickhouseRuby
+  module Generators
+    # Rails generator for creating ClickHouse migrations
+    #
+    # This generator creates migration files with ClickHouse-specific options
+    # like ENGINE, ORDER BY, PARTITION BY, and PRIMARY KEY.
+    #
+    # @example Generate a migration
+    #   rails generate clickhouse:migration CreateEvents
+    #
+    # @example Generate a migration with columns
+    #   rails generate clickhouse:migration CreateEvents user_id:integer name:string
+    #
+    # @example Generate a migration with ClickHouse options
+    #   rails generate clickhouse:migration CreateEvents user_id:integer --engine=ReplacingMergeTree --order-by=user_id
+    #
+    class MigrationGenerator < Rails::Generators::NamedBase
+      include ActiveRecord::Generators::Migration
+
+      source_root File.expand_path("templates", __dir__)
+
+      argument :attributes, type: :array, default: [], banner: "field:type field:type"
+
+      class_option :engine,
+                   type: :string,
+                   default: "MergeTree",
+                   desc: "ClickHouse table engine (MergeTree, ReplacingMergeTree, SummingMergeTree, etc.)"
+
+      class_option :order_by,
+                   type: :string,
+                   desc: "ORDER BY clause for MergeTree family engines"
+
+      class_option :partition_by,
+                   type: :string,
+                   desc: "PARTITION BY clause for data partitioning"
+
+      class_option :primary_key,
+                   type: :string,
+                   desc: "PRIMARY KEY clause (defaults to ORDER BY if not specified)"
+
+      class_option :settings,
+                   type: :string,
+                   desc: "Table SETTINGS clause"
+
+      class_option :cluster,
+                   type: :string,
+                   desc: "Cluster name for distributed tables"
+
+      # Generate the migration file
+      #
+      # @return [void]
+      def create_migration_file
+        set_local_assigns!
+        validate_engine!
+
+        migration_template "migration.rb.tt", File.join(db_migrate_path, "#{file_name}.rb")
+      end
+
+      private
+
+      # Set local variables for use in templates
+      #
+      # @return [void]
+      def set_local_assigns!
+        @migration_action = detect_migration_action
+      end
+
+      # Detect the type of migration based on the name
+      #
+      # @return [Symbol] :create_table, :add_column, :remove_column, or :change_table
+      def detect_migration_action
+        case file_name
+        when /^create_/
+          :create_table
+        when /^add_.*_to_/
+          :add_column
+        when /^remove_.*_from_/
+          :remove_column
+        else
+          :change_table
+        end
+      end
+
+      # Validate the engine option
+      #
+      # @raise [ArgumentError] if the engine is invalid
+      # @return [void]
+      def validate_engine!
+        return if valid_engines.include?(options[:engine])
+
+        raise ArgumentError, "Invalid engine '#{options[:engine]}'. Valid engines: #{valid_engines.join(", ")}"
+      end
+
+      # List of valid ClickHouse engines
+      #
+      # @return [Array<String>] valid engine names
+      def valid_engines
+        %w[
+          MergeTree ReplacingMergeTree SummingMergeTree AggregatingMergeTree
+          CollapsingMergeTree VersionedCollapsingMergeTree GraphiteMergeTree
+          Log TinyLog StripeLog Memory Null Set Join Buffer Distributed
+          MaterializedView Dictionary
+        ]
+      end
+
+      # Get the table name from the migration name
+      #
+      # @return [String] the table name
+      def table_name
+        @table_name ||= extract_table_name
+      end
+
+      # Extract table name based on migration action
+      #
+      # @return [String] the extracted table name
+      def extract_table_name
+        case @migration_action
+        when :create_table then file_name.sub(/^create_/, "")
+        when :add_column then file_name.sub(/^add_\w+_to_/, "")
+        when :remove_column then file_name.sub(/^remove_\w+_from_/, "")
+        else file_name
+        end
+      end
+
+      # Get the column name for add/remove column migrations
+      #
+      # @return [String, nil] the column name or nil
+      def column_name
+        @column_name ||= extract_column_name
+      end
+
+      # Extract column name based on migration action
+      #
+      # @return [String, nil] the extracted column name
+      def extract_column_name
+        case @migration_action
+        when :add_column then file_name[/^add_(\w+)_to_/, 1]
+        when :remove_column then file_name[/^remove_(\w+)_from_/, 1]
+        end
+      end
+
+      # Get the engine with cluster option if specified
+      #
+      # @return [String] the engine specification
+      def engine_with_cluster
+        engine = options[:engine]
+        return engine unless options[:cluster]
+
+        "Replicated#{engine}('/clickhouse/tables/{shard}/#{table_name}', '{replica}')"
+      end
+
+      # Get the ORDER BY clause
+      #
+      # @return [String, nil] the ORDER BY expression
+      def order_by_clause
+        options[:order_by] || infer_order_by
+      end
+
+      # Infer ORDER BY from primary key or first column
+      #
+      # @return [String, nil] inferred ORDER BY expression
+      def infer_order_by
+        return unless @migration_action == :create_table
+
+        # Use id if present, otherwise first attribute
+        return "id" if attributes.any? { |attr| attr.name == "id" }
+        return attributes.first.name if attributes.any?
+
+        "tuple()"
+      end
+
+      # Get the PARTITION BY clause
+      #
+      # @return [String, nil] the PARTITION BY expression
+      def partition_by_clause
+        options[:partition_by]
+      end
+
+      # Get the PRIMARY KEY clause
+      #
+      # @return [String, nil] the PRIMARY KEY expression
+      def primary_key_clause
+        options[:primary_key]
+      end
+
+      # Get the SETTINGS clause
+      #
+      # @return [String, nil] the SETTINGS expression
+      def settings_clause
+        options[:settings]
+      end
+
+      # Convert Rails type to ClickHouse type
+      #
+      # @param type [String] the Rails type
+      # @param attr_options [Hash] type options
+      # @return [String] the ClickHouse type
+      def clickhouse_type(type, attr_options = {})
+        TypeMapper.to_clickhouse(type, attr_options)
+      end
+
+      # Get the path to db/migrate directory
+      #
+      # @return [String] the migration directory path
+      def db_migrate_path
+        return "db/migrate" unless defined?(Rails.application) && Rails.application
+
+        Rails.application.config.paths["db/migrate"].to_a.first
+      end
+    end
+
+    # Maps Rails types to ClickHouse types
+    module TypeMapper
+      # Type mapping from Rails types to ClickHouse types
+      TYPE_MAP = {
+        primary_key: "UInt64",
+        bigint: "Int64",
+        time: "DateTime",
+        date: "Date",
+        binary: "String",
+        boolean: "UInt8",
+        uuid: "UUID",
+        json: "String",
+      }.freeze
+
+      # Integer size mapping
+      INTEGER_SIZES = {
+        1 => "Int8",
+        2 => "Int16",
+      }.freeze
+
+      class << self
+        # Convert a Rails type to ClickHouse type
+        #
+        # @param type [String, Symbol] the Rails type
+        # @param options [Hash] type options
+        # @return [String] the ClickHouse type
+        def to_clickhouse(type, options = {})
+          type_sym = type.to_sym
+
+          # Check simple type map first
+          return TYPE_MAP[type_sym] if TYPE_MAP.key?(type_sym)
+
+          # Handle complex types
+          complex_type(type_sym, options) || type.to_s
+        end
+
+        private
+
+        # Complex type handlers
+        COMPLEX_TYPE_HANDLERS = %i[string text integer float decimal datetime timestamp].freeze
+
+        # Handle complex type conversions
+        #
+        # @param type [Symbol] the Rails type
+        # @param options [Hash] type options
+        # @return [String, nil] the ClickHouse type or nil
+        def complex_type(type, options)
+          return string_type(options) if %i[string text].include?(type)
+
+          send("#{type}_type", options) if COMPLEX_TYPE_HANDLERS.include?(type)
+        end
+
+        # Get the float type based on limit
+        def float_type(options)
+          options[:limit] == 8 ? "Float64" : "Float32"
+        end
+
+        # Get the timestamp type based on precision
+        def timestamp_type(options)
+          "DateTime64(#{options[:precision] || 3})"
+        end
+
+        # Get the string type based on options
+        def string_type(options)
+          options[:limit] ? "FixedString(#{options[:limit]})" : "String"
+        end
+
+        # Get the integer type based on limit
+        def integer_type(options)
+          limit = options[:limit]
+          return INTEGER_SIZES[limit] if INTEGER_SIZES.key?(limit)
+          return "Int32" if limit.nil? || limit <= 4
+          return "Int64" if limit <= 8
+
+          "Int32"
+        end
+
+        # Get the decimal type based on precision and scale
+        def decimal_type(options)
+          precision = options[:precision] || 10
+          scale = options[:scale] || 0
+          "Decimal(#{precision}, #{scale})"
+        end
+
+        # Get the datetime type based on precision
+        def datetime_type(options)
+          options[:precision] ? "DateTime64(#{options[:precision]})" : "DateTime"
+        end
+      end
+    end
+  end
+end
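
Worked out by hand from the `TYPE_MAP`, `INTEGER_SIZES`, and handler methods above (these examples are not taken from the gem's test suite), the mapper resolves Rails column types like this:

```ruby
mapper = ClickhouseRuby::Generators::TypeMapper

mapper.to_clickhouse(:boolean)                           # => "UInt8" (direct TYPE_MAP hit)
mapper.to_clickhouse(:string)                            # => "String"
mapper.to_clickhouse(:string, limit: 16)                 # => "FixedString(16)"
mapper.to_clickhouse(:integer, limit: 2)                 # => "Int16" (INTEGER_SIZES)
mapper.to_clickhouse(:integer, limit: 8)                 # => "Int64"
mapper.to_clickhouse(:decimal, precision: 12, scale: 4)  # => "Decimal(12, 4)"
mapper.to_clickhouse(:timestamp)                         # => "DateTime64(3)"
```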
data/lib/clickhouse_ruby/active_record/generators/templates/create_table.rb.tt
ADDED

@@ -0,0 +1,59 @@
+# frozen_string_literal: true
+
+class <%= class_name %> < ActiveRecord::Migration[<%= ActiveRecord::Migration.current_version %>]
+  def change
+    create_table :<%= table_name %>, **clickhouse_options do |t|
+<%- attributes.each do |attribute| -%>
+<%- if attribute.type == :references -%>
+      t.bigint :<%= attribute.name %>_id<%= attribute.has_index? ? ", index: true" : "" %>
+<%- else -%>
+      t.<%= attribute.type %> :<%= attribute.name %>
+<%- end -%>
+<%- end -%>
+<%- if attributes.empty? -%>
+      t.uuid :id
+<%- end -%>
+      t.timestamps
+    end
+  end
+
+  private
+
+  # ClickHouse-specific table options
+  #
+  # MergeTree Engine Family:
+  # - MergeTree: Basic engine, good for general analytics
+  # - ReplacingMergeTree: Deduplicates rows by ORDER BY key
+  # - SummingMergeTree: Automatically sums numeric columns
+  # - AggregatingMergeTree: Stores pre-aggregated states
+  # - CollapsingMergeTree: Supports row collapsing with sign column
+  #
+  # ORDER BY:
+  # - Determines physical data sorting
+  # - First columns should be most frequently filtered
+  # - Affects compression and query performance
+  #
+  # PARTITION BY:
+  # - Usually partition by date (toYYYYMM, toYYYYMMDD)
+  # - Enables efficient data retention and queries
+  # - Don't over-partition (thousands of partitions is too many)
+  #
+  # @return [Hash] ClickHouse table options
+  def clickhouse_options
+    {
+      engine: "<%= engine_with_cluster %>",
+<%- if order_by_clause -%>
+      order_by: "<%= order_by_clause %>",
+<%- end -%>
+<%- if partition_by_clause -%>
+      partition_by: "<%= partition_by_clause %>",
+<%- end -%>
+<%- if primary_key_clause -%>
+      primary_key: "<%= primary_key_clause %>",
+<%- end -%>
+<%- if settings_clause -%>
+      settings: "<%= settings_clause %>",
+<%- end -%>
+    }
+  end
+end
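
For orientation, this is roughly what the create_table path renders for `rails generate clickhouse:migration CreateEvents user_id:integer event_type:string --engine=ReplacingMergeTree --order-by=user_id --partition-by=toYYYYMM(created_at)`, assuming the `migration.rb.tt` entry template (not shown in this diff) delegates create_* migrations to the template above; the `[7.1]` version tag stands in for whatever `ActiveRecord::Migration.current_version` returns in the host app, and the long explanatory comment block is omitted:

```ruby
# frozen_string_literal: true

class CreateEvents < ActiveRecord::Migration[7.1]
  def change
    create_table :events, **clickhouse_options do |t|
      t.integer :user_id
      t.string :event_type
      t.timestamps
    end
  end

  private

  def clickhouse_options
    {
      engine: "ReplacingMergeTree",
      order_by: "user_id",
      partition_by: "toYYYYMM(created_at)",
    }
  end
end
```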