clickhouse-ruby 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +74 -1
  3. data/README.md +165 -79
  4. data/lib/clickhouse_ruby/active_record/arel_visitor.rb +205 -76
  5. data/lib/clickhouse_ruby/active_record/connection_adapter.rb +103 -98
  6. data/lib/clickhouse_ruby/active_record/railtie.rb +20 -15
  7. data/lib/clickhouse_ruby/active_record/relation_extensions.rb +398 -0
  8. data/lib/clickhouse_ruby/active_record/schema_statements.rb +90 -104
  9. data/lib/clickhouse_ruby/active_record.rb +24 -10
  10. data/lib/clickhouse_ruby/client.rb +181 -74
  11. data/lib/clickhouse_ruby/configuration.rb +51 -10
  12. data/lib/clickhouse_ruby/connection.rb +180 -64
  13. data/lib/clickhouse_ruby/connection_pool.rb +25 -19
  14. data/lib/clickhouse_ruby/errors.rb +13 -1
  15. data/lib/clickhouse_ruby/result.rb +11 -16
  16. data/lib/clickhouse_ruby/retry_handler.rb +172 -0
  17. data/lib/clickhouse_ruby/streaming_result.rb +309 -0
  18. data/lib/clickhouse_ruby/types/array.rb +11 -64
  19. data/lib/clickhouse_ruby/types/base.rb +59 -0
  20. data/lib/clickhouse_ruby/types/boolean.rb +28 -25
  21. data/lib/clickhouse_ruby/types/date_time.rb +10 -27
  22. data/lib/clickhouse_ruby/types/decimal.rb +173 -0
  23. data/lib/clickhouse_ruby/types/enum.rb +262 -0
  24. data/lib/clickhouse_ruby/types/float.rb +14 -28
  25. data/lib/clickhouse_ruby/types/integer.rb +21 -43
  26. data/lib/clickhouse_ruby/types/low_cardinality.rb +1 -1
  27. data/lib/clickhouse_ruby/types/map.rb +21 -36
  28. data/lib/clickhouse_ruby/types/null_safe.rb +81 -0
  29. data/lib/clickhouse_ruby/types/nullable.rb +2 -2
  30. data/lib/clickhouse_ruby/types/parser.rb +28 -18
  31. data/lib/clickhouse_ruby/types/registry.rb +40 -29
  32. data/lib/clickhouse_ruby/types/string.rb +9 -13
  33. data/lib/clickhouse_ruby/types/string_parser.rb +135 -0
  34. data/lib/clickhouse_ruby/types/tuple.rb +11 -68
  35. data/lib/clickhouse_ruby/types/uuid.rb +15 -22
  36. data/lib/clickhouse_ruby/types.rb +19 -15
  37. data/lib/clickhouse_ruby/version.rb +1 -1
  38. data/lib/clickhouse_ruby.rb +11 -11
  39. metadata +41 -6
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: d7855eb261fe694066b665b2d1408b837a751d257c26d88f74d29847869fe652
-   data.tar.gz: 524b169428f84ab3f24c26433b4b57ead1f41e6147551c412e0f0a1840a8f7c9
+   metadata.gz: 51f95f8f05250a39be47798030cb5c1b9b75ed8b983e37fefafaec0584816481
+   data.tar.gz: 7e1c2f3f581b885a794164b594d8d49ee50c1c3be6a9e7cdae2568996114ba6d
  SHA512:
-   metadata.gz: 76da6aa320dabde964561304404eecfba2f500437ffc763f1bb15f6c5917c1cf703b1c14f2ec76024f7a585749993cd047dd7e6576dc2e33c2754e9677865556
-   data.tar.gz: 7bba509232194cef0a8dd560a08434d16a6ba3fd04a96d7868e1b00080d0421834e1a88c07f76c592589aadf2b473703ae6f3b5dd3cc692fcae6f3798843a446
+   metadata.gz: 7853a713aaa713f9a12bc0306f9c37861827127a94e74cf7bd2ca282c7a61e6a3750eecdb149c9dda80e7eefb367a2b943f48eddb6d60bd814d4bfc5076e2f98
+   data.tar.gz: b7c14c92a8443c9a263da13c1e0e59b52f1827f9959b815306ef5c08a0b8f629c2652a0a6a54fd039f6086d7e14d58d683bd711f62c1cbb42cccc3cf0ad299d9
data/CHANGELOG.md CHANGED
@@ -7,6 +7,79 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [0.2.0] - 2026-02-02
+
+ ### Added
+
+ #### ActiveRecord Query Extensions
+ - **FINAL modifier** - Deduplication support for ReplacingMergeTree and CollapsingMergeTree tables
+   - Methods: `final`, `final!`, `final?`, `unscope_final`
+   - Auto-adds required settings when combined with PREWHERE
+   - Example: `User.final.where(id: 123)`
+
+ - **SAMPLE clause** - Approximate queries on large datasets for performance
+   - Methods: `sample(ratio_or_rows, offset: nil)`, `sample!`, `sample_value`, `sample_offset`
+   - Supports fractional sampling (0.1 = 10%) and absolute row counts
+   - Preserves Integer vs Float distinction (1 = "at least 1 row", 1.0 = "100% of data")
+   - Example: `Event.sample(0.1).count`
+
+ - **PREWHERE clause** - Query optimization that filters before reading all columns
+   - Methods: `prewhere(opts)`, `prewhere!`, `prewhere_values`, `prewhere.not(...)`
+   - Supports hash conditions, string conditions with placeholders, ranges, and Arel nodes
+   - Automatically optimized by ClickHouse when enabled
+   - Example: `Event.prewhere(date: Date.today).where(status: 'active')`
+
+ - **SETTINGS DSL** - Per-query ClickHouse configuration
+   - Methods: `settings(opts)`, `settings!`, `query_settings`
+   - Normalizes boolean values (true/false → 1/0)
+   - Quotes string values automatically
+   - Example: `Event.settings(max_threads: 4, async_insert: true).all`
+
+ #### Internal Improvements
+ - Arel visitor integration for ClickHouse-specific SQL clauses
+ - Proper SQL clause ordering: SELECT FROM [FINAL] [SAMPLE] [PREWHERE] [WHERE] [GROUP BY] [ORDER BY] [LIMIT] [SETTINGS]
+ - RelationExtensions#build_arel override to attach ClickHouse state to Arel AST
+
+ #### Type System Extensions
+ - **Enum Type** - Support for Enum8 and Enum16 with string-to-integer mapping
+   - Methods: `cast`, `serialize`, `deserialize`
+   - Validation of enum values
+   - Example: `field_type = :Enum8` with values mapped via schema
+
+ - **Decimal Type** - Arbitrary precision decimal support via BigDecimal
+   - Auto-mapping to Decimal32/64/128/256 based on precision
+   - Example: `field_type = :Decimal` with precision and scale
+
+ #### Reliability Improvements
+ - **Retry Logic** - Exponential backoff with jitter for transient failures
+   - Default: 1.6x multiplier, up to 120 seconds max backoff
+   - Configurable via `initial_backoff`, `backoff_multiplier`, `max_backoff`, `max_retries`
+   - Only retries transient errors (ConnectionError, Timeout, HTTP 5xx/429)
+   - Non-retriable: QueryError (syntax errors), HTTP 4xx
+
+ - **Result Streaming** - Memory-efficient processing of large result sets
+   - Method: `stream_execute(sql) { |row| ... }`
+   - Yields rows one at a time using JSONEachRow format
+   - Constant memory usage regardless of result size
+   - Example: `client.stream_execute('SELECT * FROM huge_table') { |row| process(row) }`
+
+ #### Performance Improvements
+ - **HTTP Compression** - gzip compression for request/response
+   - Configuration: `compression: 'gzip'`, `compression_threshold: 1024`
+   - Built-in Zlib support (no external dependencies)
+   - Headers: `Content-Encoding: gzip`, `Accept-Encoding: gzip`
+   - Beneficial for large payloads (>1MB)
+
+ ### Changed
+ - ActiveRecord relation extension architecture for better feature organization
+ - Improved documentation with examples for all new features
+
+ ### Known Limitations
+ - PREWHERE doesn't work with multiple JOINs (ClickHouse limitation)
+ - SAMPLE requires table created with SAMPLE BY clause
+ - Streaming cannot be used with FINAL or aggregate functions
+ - HTTP compression has overhead for small payloads
+
  ## [0.1.0] - 2026-01-31
 
  ### Added
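The retry behaviour listed in the 0.2.0 changelog above (exponential backoff with jitter, a 1.6x multiplier, a 120-second cap, retries only for transient failures) can be illustrated with a minimal Ruby sketch. It is not the gem's `retry_handler.rb`: the changelog does not say which jitter variant is used, so this assumes "full jitter" (a uniformly random delay between zero and the capped backoff), and `backoff_delay`, `with_retries`, and `TransientError` are hypothetical names standing in for the real handler and its ConnectionError/5xx/429 cases.

```ruby
# Illustrative sketch only, not the gem's RetryHandler.
# Keyword names mirror the documented config options; "full jitter" is an assumption.

# Hypothetical stand-in for transient failures (ConnectionError, timeouts, HTTP 5xx/429).
class TransientError < StandardError; end

# Delay in seconds before retry number `attempt` (1-based).
def backoff_delay(attempt, initial_backoff: 1.0, backoff_multiplier: 1.6,
                  max_backoff: 120.0, rng: Random.new)
  # Exponential growth, capped: initial * multiplier^(attempt - 1), at most max_backoff
  capped = [initial_backoff * (backoff_multiplier**(attempt - 1)), max_backoff].min
  # Full jitter: a uniformly random delay in [0, capped)
  rng.rand * capped
end

# Runs the block, retrying only TransientError, at most max_retries extra times.
# Any other exception is not rescued here, so it propagates on the first failure.
def with_retries(max_retries: 3, sleeper: ->(seconds) { sleep(seconds) }, **backoff_opts)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientError
    # Out of retries: re-raise the last transient error to the caller
    raise if attempt > max_retries

    sleeper.call(backoff_delay(attempt, **backoff_opts))
    retry
  end
end
```

With `max_retries: 3` the block runs at most four times (one initial attempt plus three retries). Randomizing the whole delay, rather than adding a small jitter term, spreads out retries from many clients that failed at the same moment.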
@@ -53,7 +126,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  #### Project Infrastructure
  - RSpec test suite with unit and integration tests
- - VCR for HTTP interaction recording
+ - WebMock for HTTP mocking in unit tests
  - Docker Compose setup for ClickHouse testing
  - GitHub Actions CI workflow
  - RuboCop configuration
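The SAMPLE and SETTINGS entries in the 0.2.0 changelog describe two small rendering rules: SAMPLE keeps an Integer argument distinct from a Float one (`1` versus `1.0`), and SETTINGS turns `true`/`false` into `1`/`0` and quotes string values. A minimal sketch of those rules as plain Ruby string building follows. `sample_clause`, `settings_value`, and `settings_clause` are hypothetical helpers, not the gem's Arel visitor, and the backslash-based quote escaping is an assumption about how a string value would be made safe.

```ruby
# Hypothetical helpers sketching the SAMPLE / SETTINGS rendering rules from the changelog.

def sample_clause(ratio_or_rows, offset: nil)
  unless ratio_or_rows.is_a?(Integer) || ratio_or_rows.is_a?(Float)
    raise ArgumentError, 'sample expects an Integer or Float'
  end

  # Integer#to_s and Float#to_s keep the literal's type visible: "1" vs "1.0"
  sql = "SAMPLE #{ratio_or_rows}"
  sql += " OFFSET #{offset}" if offset
  sql
end

def settings_value(value)
  case value
  when true then '1' # booleans normalized to 1/0
  when false then '0'
  when Numeric then value.to_s
  else
    # Everything else becomes a single-quoted string literal (escaping is an assumption):
    # double each backslash, then backslash-escape each single quote.
    escaped = value.to_s.gsub('\\') { '\\\\' }.gsub("'") { "\\'" }
    "'#{escaped}'"
  end
end

def settings_clause(opts)
  return '' if opts.empty?

  'SETTINGS ' + opts.map { |key, value| "#{key} = #{settings_value(value)}" }.join(', ')
end
```

For example, `settings_clause(max_threads: 4, async_insert: true)` produces `SETTINGS max_threads = 4, async_insert = 1`, matching the changelog's `Event.settings(max_threads: 4, async_insert: true)` example.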
data/README.md CHANGED
@@ -1,7 +1,27 @@
  # ClickhouseRuby
 
+ [![Tests](https://github.com/Mohamad-Kamar/clickhouse-ruby/actions/workflows/test.yml/badge.svg)](https://github.com/Mohamad-Kamar/clickhouse-ruby/actions/workflows/test.yml)
+ [![Gem Version](https://badge.fury.io/rb/clickhouse-ruby.svg)](https://badge.fury.io/rb/clickhouse-ruby)
+ [![Downloads](https://img.shields.io/gem/dt/clickhouse-ruby.svg)](https://rubygems.org/gems/clickhouse-ruby)
+
  A lightweight Ruby client for ClickHouse with optional ActiveRecord integration.
 
+ ## Features
+
+ **Core (v0.1.0)**
+ - **Simple HTTP client** - Clean API for queries, commands, and bulk inserts
+ - **Connection pooling** - Built-in connection pool with health checks
+ - **Type system** - Full support for ClickHouse types including Nullable, Array, Map, Tuple
+ - **Proper error handling** - Never silently ignores errors (fixes clickhouse-activerecord #230)
+ - **SSL/TLS support** - Certificate verification enabled by default
+ - **ActiveRecord integration** - Optional familiar model-based access
+
+ **Enhanced (v0.2.0)**
+ - **Enum & Decimal types** - Fixed-set values and arbitrary precision arithmetic
+ - **Query optimization** - PREWHERE clause, FINAL deduplication, SAMPLE approximation
+ - **Performance** - HTTP compression, result streaming, automatic retries with backoff
+ - **Query control** - Per-query SETTINGS DSL for ClickHouse configuration
+
  ## Installation
 
  Add this line to your application's Gemfile:
@@ -25,17 +45,19 @@ gem install clickhouse-ruby
  ## Quick Start
 
  ```ruby
- require 'clickhouse-ruby'
+ require 'clickhouse_ruby'
+
+ # Configure the client
+ config = ClickhouseRuby::Configuration.new
+ config.host = 'localhost'
+ config.port = 8123
+ config.database = 'default'
 
  # Create a client
- client = ClickhouseRuby::Client.new(
-   host: 'localhost',
-   port: 8123,
-   database: 'default'
- )
+ client = ClickhouseRuby::Client.new(config)
 
  # Execute a query
- result = client.query('SELECT 1 + 1 AS result')
+ result = client.execute('SELECT 1 + 1 AS result')
  puts result.first['result'] # => 2
 
  # Insert data
@@ -57,7 +79,8 @@ ClickhouseRuby.configure do |config|
    config.username = 'default'
    config.password = ''
    config.ssl = false
-   config.timeout = 60
+   config.connect_timeout = 10
+   config.read_timeout = 60
  end
 
  # Use the default client
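The configuration block above replaces the single `timeout` option with separate `connect_timeout` and `read_timeout` settings, and the options table that follows in the README adds `write_timeout`, `ssl`, and `ssl_verify`. The diff does not show which HTTP library `connection.rb` uses; assuming Ruby's standard `Net::HTTP`, the options map onto its attributes roughly as in this sketch, where `HttpOptions` and `build_http` are hypothetical.

```ruby
require 'net/http'
require 'openssl'

# Hypothetical option holder mirroring the README's configuration keys.
HttpOptions = Struct.new(:host, :port, :ssl, :ssl_verify,
                         :connect_timeout, :read_timeout, :write_timeout,
                         keyword_init: true)

# Hypothetical helper: assumes a Net::HTTP transport; the gem's connection code may differ.
def build_http(opts)
  http = Net::HTTP.new(opts.host, opts.port) # creates the object only, no socket is opened
  http.open_timeout = opts.connect_timeout   # limit for opening the connection
  http.read_timeout = opts.read_timeout      # max wait for each read from the socket
  http.write_timeout = opts.write_timeout    # max wait for each write (Ruby >= 2.6)
  http.use_ssl = opts.ssl
  if opts.ssl
    # ssl_verify: true keeps certificate verification on (the README's default)
    http.verify_mode = opts.ssl_verify ? OpenSSL::SSL::VERIFY_PEER : OpenSSL::SSL::VERIFY_NONE
  end
  http
end
```

Splitting the timeouts lets a client fail fast when the server is unreachable (short connect timeout) while still allowing long-running analytical queries (longer read timeout).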
@@ -72,10 +95,13 @@ client = ClickhouseRuby.client
  | `port` | HTTP interface port | `8123` |
  | `database` | Default database | `default` |
  | `username` | Authentication username | `default` |
- | `password` | Authentication password | `''` |
+ | `password` | Authentication password | `nil` |
  | `ssl` | Enable HTTPS | `false` |
- | `timeout` | Request timeout in seconds | `60` |
- | `max_retries` | Number of retry attempts | `3` |
+ | `ssl_verify` | Verify SSL certificates | `true` |
+ | `connect_timeout` | Connection timeout in seconds | `10` |
+ | `read_timeout` | Read timeout in seconds | `60` |
+ | `write_timeout` | Write timeout in seconds | `60` |
+ | `pool_size` | Connection pool size | `5` |
 
  ### Environment Variables
 
@@ -90,45 +116,75 @@ CLICKHOUSE_PASSWORD=secret
  CLICKHOUSE_SSL=false
  ```
 
- ## Usage
+ ### v0.2.0 Enhancements
 
- ### Querying Data
+ **HTTP Compression** - Reduce bandwidth for large payloads:
+ ```ruby
+ ClickhouseRuby.configure do |config|
+   config.compression = 'gzip'
+   config.compression_threshold = 1024 # Only compress > 1KB
+ end
+ ```
 
+ **Retry Logic** - Automatic retries with exponential backoff:
  ```ruby
- # Simple query
- result = client.query('SELECT * FROM events LIMIT 10')
+ ClickhouseRuby.configure do |config|
+   config.max_retries = 3
+   config.initial_backoff = 1.0
+   config.backoff_multiplier = 1.6
+   config.max_backoff = 120
+ end
+ ```
 
- # Query with parameters (prevents SQL injection)
- result = client.query(
-   'SELECT * FROM events WHERE date = {date:Date}',
-   params: { date: '2024-01-01' }
- )
+ **Result Streaming** - Process large results with constant memory:
+ ```ruby
+ client.stream_execute('SELECT * FROM huge_table') do |row|
+   process_row(row)
+ end
+ ```
+
+ **ActiveRecord Query Extensions** - ClickHouse-specific query methods:
+ ```ruby
+ # Query optimization
+ Event.prewhere(date: Date.today).where(status: 'active')
+
+ # Deduplication
+ User.final.where(id: 123)
+
+ # Approximate queries
+ Event.sample(0.1).count # 10% sample
 
- # Query with specific format
- result = client.query('SELECT * FROM events', format: 'JSONEachRow')
+ # Per-query configuration
+ Event.settings(max_threads: 4).where(active: true)
  ```
 
- ### Inserting Data
+ See [docs/features/README.md](docs/features/README.md) for detailed documentation on all v0.2.0 features.
 
- ```ruby
- # Insert hash array
- client.insert('events', [
-   { date: '2024-01-01', event_type: 'click', count: 100 },
-   { date: '2024-01-02', event_type: 'view', count: 250 }
- ])
+ ## Usage
 
- # Insert with explicit columns
- client.insert('events', data, columns: [:date, :event_type, :count])
+ ### Querying Data
 
- # Bulk insert from CSV
- client.insert_from_file('events', '/path/to/data.csv', format: 'CSV')
+ ```ruby
+ # Simple query - returns Result object
+ result = client.execute('SELECT * FROM events LIMIT 10')
+ result.each { |row| puts row['event_type'] }
+
+ # Access columns and types
+ result.columns # => ['id', 'event_type', 'count']
+ result.types # => ['UInt64', 'String', 'UInt32']
+
+ # Query with settings
+ result = client.execute(
+   'SELECT * FROM events',
+   settings: { max_rows_to_read: 1_000_000 }
+ )
  ```
 
- ### DDL Operations
+ ### DDL Commands
 
  ```ruby
  # Create table
- client.execute(<<~SQL)
+ client.command(<<~SQL)
  CREATE TABLE events (
  date Date,
  event_type String,
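The Result Streaming example in the hunk above (`client.stream_execute ... do |row|`) and the changelog both say rows are yielded one at a time from JSONEachRow output with constant memory. The core of that technique is a line buffer: response chunks arrive at arbitrary byte boundaries, so only complete newline-terminated lines are parsed, and any trailing partial line waits for the next chunk. The following is a minimal sketch under that assumption; `JsonEachRowStreamer` is hypothetical and says nothing about how `streaming_result.rb` actually reads the socket.

```ruby
require 'json'

# Hypothetical sketch: turns arbitrarily split JSONEachRow chunks into parsed rows.
# Memory is bounded by roughly one chunk plus the longest row, not by the result size.
class JsonEachRowStreamer
  def initialize(&on_row)
    @on_row = on_row
    @buffer = String.new # binary (ASCII-8BIT) buffer: chunks may split multibyte characters
  end

  # Feed one chunk of the response body, e.g. from Net::HTTPResponse#read_body.
  def feed(chunk)
    @buffer << chunk.b
    # Emit every complete line; the trailing partial line stays buffered
    while (newline = @buffer.index("\n"))
      line = @buffer.slice!(0..newline).chomp
      emit(line) unless line.empty?
    end
  end

  # Call once the body is exhausted; the last row may lack a trailing newline.
  def finish
    rest = @buffer.strip
    @buffer = String.new
    emit(rest) unless rest.empty?
  end

  private

  def emit(line)
    # "\n" never occurs inside a UTF-8 multibyte sequence, so each line is complete UTF-8
    @on_row.call(JSON.parse(line.force_encoding(Encoding::UTF_8)))
  end
end
```

Working on bytes and splitting only on newlines is what keeps a row whose bytes straddle two chunks (including a split multibyte character) from being parsed too early.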
@@ -137,11 +193,21 @@ client.execute(<<~SQL)
  ORDER BY date
  SQL
 
- # Check if table exists
- client.table_exists?('events') # => true
+ # Drop table
+ client.command('DROP TABLE IF EXISTS events')
+ ```
+
+ ### Inserting Data
+
+ ```ruby
+ # Insert hash array (uses efficient JSONEachRow format)
+ client.insert('events', [
+   { date: '2024-01-01', event_type: 'click', count: 100 },
+   { date: '2024-01-02', event_type: 'view', count: 250 }
+ ])
 
- # Get table schema
- schema = client.describe_table('events')
+ # Insert with explicit columns
+ client.insert('events', data, columns: ['date', 'event_type', 'count'])
  ```
 
  ### Connection Management
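The README hunk above says `client.insert` sends an array of hashes in JSONEachRow format, and the 0.2.0 configuration section adds gzip compression above `compression_threshold`. The following sketch shows how such a request could be assembled: one JSON object per line under ClickHouse's `INSERT INTO ... FORMAT JSONEachRow`, with the body gzipped (and `Content-Encoding: gzip` set) only when it exceeds the threshold. `build_insert_request` is a hypothetical helper, and it does no identifier quoting, so table and column names must be trusted.

```ruby
require 'json'
require 'zlib'

# Hypothetical sketch, not the gem's Client#insert.
# NOTE: table/column names are interpolated as-is (no quoting); illustrative only.
def build_insert_request(table, rows, columns: nil, compression: 'gzip', compression_threshold: 1024)
  column_list = columns ? " (#{columns.join(', ')})" : ''
  query = "INSERT INTO #{table}#{column_list} FORMAT JSONEachRow"

  # JSONEachRow: one JSON object per line
  body = rows.map do |row|
    row = row.transform_keys(&:to_s)
    row = row.slice(*columns.map(&:to_s)) if columns # keep only the listed columns
    JSON.generate(row)
  end.join("\n") + "\n"

  headers = {}
  # Small bodies are sent as-is: gzip overhead outweighs the savings below the threshold
  if compression == 'gzip' && body.bytesize > compression_threshold
    body = Zlib.gzip(body)
    headers['Content-Encoding'] = 'gzip'
  end

  { query: query, body: body, headers: headers }
end
```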
@@ -153,20 +219,66 @@ client.ping # => true
  # Get server version
  client.server_version # => "24.1.1.123"
 
- # Execute multiple queries in a session
- client.with_session do |session|
-   session.execute('SET max_memory_usage = 1000000000')
-   session.query('SELECT * FROM large_table')
+ # Get pool statistics
+ client.pool_stats # => { size: 5, available: 5, in_use: 0 }
+
+ # Close all connections
+ client.close
+ ```
+
+ ## Type Support
+
+ ClickhouseRuby supports all ClickHouse types:
+
+ | ClickHouse Type | Ruby Type |
+ |-----------------|-----------|
+ | Int8-Int64, UInt8-UInt64 | Integer |
+ | Float32, Float64 | Float |
+ | String, FixedString | String |
+ | Date, Date32 | Date |
+ | DateTime, DateTime64 | Time |
+ | UUID | String |
+ | Bool | Boolean |
+ | Nullable(T) | T or nil |
+ | Array(T) | Array |
+ | Map(K, V) | Hash |
+ | Tuple(T...) | Array |
+ | LowCardinality(T) | T |
+
+ ## Error Handling
+
+ ```ruby
+ begin
+   client.execute('SELECT * FROM nonexistent_table')
+ rescue ClickhouseRuby::UnknownTable => e
+   puts "Table not found: #{e.message}"
+   puts "Error code: #{e.code}" # ClickHouse error code
+   puts "HTTP status: #{e.http_status}" # HTTP response code
+ rescue ClickhouseRuby::ConnectionError => e
+   puts "Connection failed: #{e.message}"
+ rescue ClickhouseRuby::QueryError => e
+   puts "Query failed: #{e.message}"
  end
  ```
 
+ ### Error Classes
+
+ - `ClickhouseRuby::Error` - Base error class
+ - `ClickhouseRuby::ConnectionError` - Connection issues
+ - `ClickhouseRuby::ConnectionTimeout` - Timeout errors
+ - `ClickhouseRuby::QueryError` - Query execution errors
+ - `ClickhouseRuby::SyntaxError` - SQL syntax errors
+ - `ClickhouseRuby::UnknownTable` - Table doesn't exist
+ - `ClickhouseRuby::UnknownColumn` - Column doesn't exist
+ - `ClickhouseRuby::UnknownDatabase` - Database doesn't exist
+
  ## ActiveRecord Integration
 
  ClickhouseRuby provides optional ActiveRecord integration for familiar model-based access.
 
  ```ruby
- # config/initializers/clickhouse-ruby.rb
- require 'clickhouse-ruby/active_record'
+ # config/initializers/clickhouse_ruby.rb
+ require 'clickhouse_ruby/active_record'
 
  ClickhouseRuby::ActiveRecord.establish_connection(
    host: 'localhost',
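The new error-handling section in the hunk above rescues specific classes such as `ClickhouseRuby::UnknownTable` and reads `e.code` and `e.http_status`. One way a client could derive those: ClickHouse error responses typically start with `Code: <n>. DB::Exception:`, and recent server versions append the symbolic error name in parentheses, for example `(UNKNOWN_TABLE)`. The sketch below parses that text and picks a class by name. The `ErrorSketch` classes are local stand-ins that mirror the README's list, and the name-to-class mapping is an assumption, not the contents of `errors.rb`.

```ruby
# Hypothetical stand-ins mirroring the README's error classes (not the gem's errors.rb).
module ErrorSketch
  class Error < StandardError
    attr_reader :code, :http_status

    def initialize(message, code: nil, http_status: nil)
      super(message)
      @code = code
      @http_status = http_status
    end
  end

  class QueryError < Error; end
  class SyntaxError < QueryError; end
  class UnknownTable < QueryError; end
  class UnknownDatabase < QueryError; end

  # Assumed mapping from ClickHouse's symbolic error names to classes
  BY_NAME = {
    'SYNTAX_ERROR' => SyntaxError,
    'UNKNOWN_TABLE' => UnknownTable,
    'UNKNOWN_DATABASE' => UnknownDatabase
  }.freeze

  # Parses bodies like "Code: 60. DB::Exception: ... (UNKNOWN_TABLE) (version ...)"
  def self.from_response(body, http_status)
    code = body[/\ACode:\s*(\d+)\./, 1]&.to_i
    # Pick the first parenthesized UPPER_CASE token that has a mapped class
    name = body.scan(/\(([A-Z][A-Z0-9_]+)\)/).flatten.find { |n| BY_NAME.key?(n) }
    klass = BY_NAME.fetch(name, QueryError)
    klass.new(body.strip, code: code, http_status: http_status)
  end
end
```

In this sketch the specific classes subclass `QueryError`, so anything without a mapped name still surfaces as a plain `QueryError`.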
@@ -184,24 +296,6 @@ Event.where(date: '2024-01-01').limit(10).each do |event|
  end
  ```
 
- ## Error Handling
-
- ```ruby
- begin
-   client.query('SELECT * FROM nonexistent_table')
- rescue ClickhouseRuby::ConnectionError => e
-   # Handle connection issues
-   puts "Connection failed: #{e.message}"
- rescue ClickhouseRuby::QueryError => e
-   # Handle query errors
-   puts "Query failed: #{e.message}"
-   puts "Error code: #{e.code}"
- rescue ClickhouseRuby::TimeoutError => e
-   # Handle timeouts
-   puts "Request timed out: #{e.message}"
- end
- ```
-
  ## Development
 
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests.
@@ -212,29 +306,21 @@ After checking out the repo, run `bin/setup` to install dependencies. Then, run
  # Start ClickHouse
  docker-compose up -d
 
- # Run all tests
- bundle exec rake spec
+ # Run unit tests only
+ bundle exec rspec spec/unit
 
- # Run only unit tests
- bundle exec rake spec_unit
-
- # Run integration tests
- bundle exec rake spec_integration
+ # Run all tests including integration
+ CLICKHOUSE_TEST_INTEGRATION=true bundle exec rspec
  ```
 
- ### Code Quality
-
- ```bash
- # Run RuboCop
- bundle exec rake rubocop
+ ## Requirements
 
- # Auto-fix issues
- bundle exec rake rubocop_fix
- ```
+ - Ruby >= 2.6.0
+ - ClickHouse >= 20.x (tested with 24.x)
 
  ## Contributing
 
- Bug reports and pull requests are welcome on GitHub at https://github.com/yourusername/clickhouse-ruby.
+ Bug reports and pull requests are welcome on GitHub at https://github.com/Mohamad-Kamar/clickhouse-ruby.
 
  1. Fork it
  2. Create your feature branch (`git checkout -b feature/my-new-feature`)