parqueteur 1.2.0 → 1.3.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 056b9208a8bffcd163464dbd2cf276a9b0704e96788b77555d545eb339a4e798
-   data.tar.gz: 1e20d31b1fc6f198fee42546939ce289d71d66f65ffa66562cdd7841e0f24f61
+   metadata.gz: ee67e88dd319c32dabd83b2ee9c0b701a9f02e5f6362fada4d017232011d8dd7
+   data.tar.gz: 5a06d816db267a0ef903a607bff35b33a747a268cb04a9c2187e766abeb49182
  SHA512:
-   metadata.gz: fe08a7b282c4ededc08acb5aa9f4b485ead828aee4fd1444e8bb1af80cc56ea8c20411aefe136809f91ad808bee52db261218e8b5e6b7538bfa53d1eb38eb4b5
-   data.tar.gz: 0fee8ec94698b7b4c9d3a089fd0094a52bd83dfda56d0652f8a5b08dfe84a88b251736e62a9da7f510e0fa3d1842e2551161178ce30b5e0f5c6ee9b903917a2c
+   metadata.gz: 399868f168698110c4725a993e066e78fd8730942519af34b8e785a41bbfa51f10e65078c44f3ea7264e41aa84e7de5a48b6652e95467d7e8fabf481595e0cff
+   data.tar.gz: d393e87fc5dec19f2528910a6ed2e2e3714aeb6886124d56acaaaac38102e7e61dde42a1721f1d8c98e852e55a5f73481e5a016cbc191474cf97a188268dc001
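The `checksums.yaml` entries are hex digests of the two archives stored inside the `.gem` file, which is itself a plain tar archive. Below is a minimal sketch for recomputing those digests locally with Ruby's standard library and RubyGems' tar reader. The `gem_member_digests` helper and the `.gem` path in the usage comment are placeholders. They are not part of Parqueteur, and this is not how RubyGems itself verifies packages.

```ruby
require 'digest'
require 'rubygems/package'
require 'stringio'

# Compute SHA256/SHA512 hex digests for the archive members listed in
# checksums.yaml. `gem_io` is any IO positioned at the start of a .gem file.
def gem_member_digests(gem_io, members = %w[metadata.gz data.tar.gz])
  digests = { 'SHA256' => {}, 'SHA512' => {} }
  Gem::Package::TarReader.new(gem_io).each do |entry|
    next unless members.include?(entry.full_name)

    content = entry.read || '' # empty members read as nil
    digests['SHA256'][entry.full_name] = Digest::SHA256.hexdigest(content)
    digests['SHA512'][entry.full_name] = Digest::SHA512.hexdigest(content)
  end
  digests
end

# Usage (placeholder path):
# File.open('parqueteur-1.3.3.gem', 'rb') { |f| pp gem_member_digests(f) }
```

Comparing the output with the `+` values above is a quick sanity check on a downloaded package.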
data/Gemfile.lock CHANGED
@@ -1,32 +1,32 @@
  PATH
    remote: .
    specs:
-     parqueteur (1.2.0)
-       red-parquet (~> 5.0)
+     parqueteur (1.3.3)
+       red-parquet (~> 6.0)

  GEM
    remote: https://rubygems.org/
    specs:
-     bigdecimal (3.0.0)
-     extpp (0.0.9)
-     gio2 (3.4.6)
-       gobject-introspection (= 3.4.6)
-     glib2 (3.4.6)
+     bigdecimal (3.0.2)
+     extpp (0.1.0)
+     gio2 (3.4.9)
+       gobject-introspection (= 3.4.9)
+     glib2 (3.4.9)
        native-package-installer (>= 1.0.3)
        pkg-config (>= 1.3.5)
-     gobject-introspection (3.4.6)
-       glib2 (= 3.4.6)
+     gobject-introspection (3.4.9)
+       glib2 (= 3.4.9)
      native-package-installer (1.1.1)
      pkg-config (1.4.6)
      rake (13.0.6)
-     red-arrow (5.0.0)
+     red-arrow (6.0.0)
        bigdecimal (>= 2.0.3)
        extpp (>= 0.0.7)
-       gio2 (>= 3.4.5)
+       gio2 (>= 3.4.9)
        native-package-installer
        pkg-config
-     red-parquet (5.0.0)
-       red-arrow (= 5.0.0)
+     red-parquet (6.0.0)
+       red-arrow (= 6.0.0)

  PLATFORMS
    ruby
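Two constraints in this release use RubyGems' pessimistic operator: `red-parquet (~> 6.0)` in the lockfile and `gem 'parqueteur', '~> 1.0'` in the README. `~> 6.0` accepts any 6.x release from 6.0 upward but not 7.0, so moving from red-parquet 5.x to 6.x meant editing the constraint. The short sketch below checks this with `Gem::Requirement`. Only 5.0.0, 6.0.0, 1.2.0 and 1.3.3 come from the diff; the other version numbers are made up for illustration.

```ruby
require 'rubygems'

# True when `version` satisfies a Gemfile/gemspec-style constraint string.
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

checks = {
  '~> 6.0' => %w[5.0.0 6.0.0 6.5.1 7.0.0],
  '~> 1.0' => %w[1.2.0 1.3.3 2.0.0]
}

checks.each do |constraint, versions|
  versions.each do |version|
    puts "#{constraint.ljust(8)} #{version.ljust(6)} #{satisfies?(constraint, version)}"
  end
end
```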
data/README.md CHANGED
@@ -1,15 +1,31 @@
  # Parqueteur

- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/parqueteur`. To experiment with that code, run `bin/console` for an interactive prompt.
+ [![Gem Version](https://badge.fury.io/rb/parqueteur.svg)](https://badge.fury.io/rb/parqueteur)

- TODO: Delete this and the text above, and describe your gem
+ Parqueteur enables you to generate Apache Parquet files from raw data.

+ ## Dependencies
+
+ Since I have only tested Parqueteur on Ubuntu, I don't have install scripts for other operating systems.
+ ### Debian/Ubuntu packages
+ - `libgirepository1.0-dev`
+ - `libarrow-dev`
+ - `libarrow-glib-dev`
+ - `libparquet-dev`
+ - `libparquet-glib-dev`
+
+ See the `scripts/apache-arrow-ubuntu-install.sh` script for a quick way to install all of them.
  ## Installation

  Add this line to your application's Gemfile:

  ```ruby
- gem 'parqueteur'
+ gem 'parqueteur', '~> 1.0'
+ ```
+
+ > (optional) If you don't want to require Parqueteur globally, you can add `require: false` to the Gemfile instruction:
+ ```ruby
+ gem 'parqueteur', '~> 1.0', require: false
  ```

  And then execute:
@@ -22,14 +38,127 @@ Or install it yourself as:

  ## Usage

- TODO: Write usage instructions here
+ Parqueteur provides an elegant way to generate Apache Parquet files from a defined schema.
+
+ Converters accept any object that implements `Enumerable` as a data source.
+
+ ### Working example
+
+ ```ruby
+ require 'parqueteur'
+
+ class FooParquetConverter < Parqueteur::Converter
+   column :id, :bigint
+   column :reference, :string
+   column :datetime, :timestamp
+ end
+
+ data = [
+   { 'id' => 1, 'reference' => 'hello world 1', 'datetime' => Time.now },
+   { 'id' => 2, 'reference' => 'hello world 2', 'datetime' => Time.now },
+   { 'id' => 3, 'reference' => 'hello world 3', 'datetime' => Time.now }
+ ]
+
+ # initialize Converter with Parquet GZIP compression mode
+ converter = FooParquetConverter.new(data, compression: :gzip)

- ## Development
+ # write result to file
+ converter.write('hello_world.parquet')

- After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+ # in-memory result (StringIO)
+ converter.to_io
+
+ # write to temporary file (Tempfile)
+ # don't forget to `close` / `unlink` it after usage
+ converter.to_tmpfile
+
+ # convert to Arrow::Table
+ pp converter.to_arrow_table
+ ```
+
+ ### Using transformers
+
+ Transformers let you transform each data item before it is converted.
+
+ From `examples/cars.rb`:
+
+ ```ruby
+ require 'parqueteur'
+
+ class Car
+   attr_reader :name, :production_year
+
+   def initialize(name, production_year)
+     @name = name
+     @production_year = production_year
+   end
+ end
+
+ class CarParquetConverter < Parqueteur::Converter
+   column :name, :string
+   column :production_year, :integer
+
+   transform do |car|
+     {
+       'name' => car.name,
+       'production_year' => car.production_year
+     }
+   end
+ end
+
+ cars = [
+   Car.new('Alfa Romeo 75', 1985),
+   Car.new('Alfa Romeo 33', 1983),
+   Car.new('Audi A3', 1996),
+   Car.new('Audi A4', 1994),
+   Car.new('BMW 503', 1956),
+   Car.new('BMW X5', 1999)
+ ]
+
+ # initialize Converter with Parquet GZIP compression mode
+ converter = CarParquetConverter.new(cars, compression: :gzip)
+
+ # print the result as an Arrow::Table
+ pp converter.to_arrow_table
+ ```
+
+ Output:
+ ```
+ #<Arrow::Table:0x7fc1fb24b958 ptr=0x7fc1faedd910>
+ # name production_year
+ 0 Alfa Romeo 75 1985
+ 1 Alfa Romeo 33 1983
+ 2 Audi A3 1996
+ 3 Audi A4 1994
+ 4 BMW 503 1956
+ 5 BMW X5 1999
+ ```

- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+ ### Available Types
+
+ | Name (Symbol) | Apache Arrow Type |
+ | ------------- | ----------------- |
+ | `:array` | `Array` |
+ | `:bigdecimal` | `Decimal256` |
+ | `:bigint` | `Int64` or `UInt64` with `unsigned: true` option |
+ | `:boolean` | `Boolean` |
+ | `:date` | `Date32` |
+ | `:date32` | `Date32` |
+ | `:date64` | `Date64` |
+ | `:decimal` | `Decimal128` |
+ | `:decimal128` | `Decimal128` |
+ | `:decimal256` | `Decimal256` |
+ | `:int32` | `Int32` or `UInt32` with `unsigned: true` option |
+ | `:int64` | `Int64` or `UInt64` with `unsigned: true` option |
+ | `:integer` | `Int32` or `UInt32` with `unsigned: true` option |
+ | `:map` | `Map` |
+ | `:string` | `String` |
+ | `:struct` | `Struct` |
+ | `:time` | `Time32` |
+ | `:time32` | `Time32` |
+ | `:time64` | `Time64` |
+ | `:timestamp` | `Timestamp` |

  ## Contributing

- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/parqueteur.
+ Bug reports and pull requests are welcome on GitHub at https://github.com/pocketsizesun/parqueteur-ruby.
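The README says converters accept any `Enumerable`. `Converter#write` walks that input with `each_slice(batch_size)`, and `DEFAULT_BATCH_SIZE` goes from 10 to 100 in this release. The pure-Ruby sketch below involves no Arrow. It shows why a lazy enumerator such as `10.times.lazy.map { ... }` works well as input: rows are only built as each batch is consumed. `row_source` and `consume_in_batches` are hypothetical names. The block given to `consume_in_batches` stands in for building and writing one Arrow table per batch.

```ruby
# Hypothetical row source: builds each row hash only when it is requested
# and counts how many rows have been built so far.
def row_source(count, counter)
  (1..count).lazy.map do |i|
    counter[:built] += 1
    { 'id' => i, 'reference' => "hello world #{i}" }
  end
end

# Same traversal as Converter#write: a single each_slice(batch_size) pass.
def consume_in_batches(rows, batch_size)
  batches = 0
  rows.each_slice(batch_size).each do |batch|
    batches += 1
    yield batch if block_given?
  end
  batches
end

counter = { built: 0 }
built_at_each_batch = []
batches = consume_in_batches(row_source(250, counter), 100) do |_batch|
  built_at_each_batch << counter[:built]
end
# batches == 3, built_at_each_batch == [100, 200, 250]
```

At most one batch of rows is held in memory at a time. A plain `Array` works just as well, but it must be fully built up front.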
data/examples/cars.rb ADDED
@@ -0,0 +1,40 @@
+ require 'bundler/setup'
+ require 'parqueteur'
+
+ class Car
+   attr_reader :name, :production_year
+
+   def initialize(name, production_year)
+     @name = name
+     @production_year = production_year
+   end
+ end
+
+ class CarParquetConverter < Parqueteur::Converter
+   column :name, :string
+   column :production_year, :integer
+
+   transform do |car|
+     {
+       'name' => car.name,
+       'production_year' => car.production_year
+     }
+   end
+ end
+
+ cars = [
+   Car.new('Alfa Romeo 75', 1985),
+   Car.new('Alfa Romeo 33', 1983),
+   Car.new('Audi A3', 1996),
+   Car.new('Audi A4', 1994),
+   Car.new('BMW 503', 1956),
+   Car.new('BMW X5', 1999)
+ ]
+
+ # initialize Converter with Parquet GZIP compression mode
+ converter = CarParquetConverter.new(
+   cars, compression: :gzip
+ )
+
+ # print the result as an Arrow::Table
+ pp converter.to_arrow_table
@@ -3,7 +3,7 @@ require 'parqueteur'
  require 'securerandom'
  require 'benchmark'

- class Foo < Parqueteur::Converter
+ class FooParquetConverter < Parqueteur::Converter
    column :id, :bigint
    column :reference, :string
    column :hash, :map, key: :string, value: :string
@@ -48,5 +48,9 @@ data = 10000.times.collect do |i|
  end
  puts "data generation OK"

- path = 'tmp/test.parquet'
- Foo.convert_to(data, path, compression: :gzip)
+ # initialize Converter with Parquet GZIP compression mode
+ converter = FooParquetConverter.new(data, compression: :gzip)
+
+ # write result to file
+ converter.write('tmp/example.gzip-compressed.parquet')
+ converter.write('tmp/example.no-gzip.parquet', compression: false)
@@ -0,0 +1,60 @@
+ require 'securerandom'
+ require 'bundler/setup'
+ require 'parqueteur'
+
+ class FooParquetConverter < Parqueteur::Converter
+   column :id, :bigint
+   column :my_string_array, :array, elements: :string
+   column :my_date, :date
+   column :my_decimal, :decimal, precision: 12, scale: 4
+   column :my_int, :integer
+   column :my_map, :map, key: :string, value: :string
+   column :my_string, :string
+   column :my_struct, :struct do
+     field :my_struct_str, :string
+     field :my_struct_int, :integer
+   end
+   column :my_time, :time
+   column :my_timestamp, :timestamp
+ end
+
+ data = 1000.times.collect do |i|
+   {
+     'id' => i,
+     'my_string_array' => %w[a b c],
+     'my_date' => Date.today,
+     'my_decimal' => BigDecimal('0.03'),
+     'my_int' => rand(1..10),
+     'my_map' => 20.times.each_with_object({}) do |idx, hash|
+       hash["k_#{idx}"] = SecureRandom.urlsafe_base64(64)
+     end,
+     'my_string' => 'Hello World',
+     'my_struct' => {
+       'my_struct_str' => 'Hello World',
+       'my_struct_int' => 1
+     },
+     'my_time' => 3600,
+     'my_timestamp' => Time.now
+   }
+ end
+
+ # initialize Converter with Parquet GZIP compression mode
+ converter = FooParquetConverter.new(data, compression: :gzip)
+
+ # write result to file
+ converter.write('tmp/hello_world.compressed.parquet')
+ converter.write('tmp/hello_world.parquet', compression: false)
+
+ # in-memory result (StringIO)
+ converter.to_io
+
+ # write to temporary file (Tempfile)
+ # don't forget to `close` / `unlink` it after usage
+ converter.to_tmpfile
+
+ # Arrow Table
+ table = converter.to_arrow_table
+ table.each_record do |record|
+   # pp record['my_decimal'].to_f
+   pp record.to_h
+ end
@@ -0,0 +1,44 @@
+ require 'bundler/setup'
+ require 'parqueteur'
+
+ class FooParquetConverter < Parqueteur::Converter
+   column :id, :bigint
+   column :reference, :string
+   column :datetime, :timestamp
+   column :beers_count, :integer
+
+   transform do |item|
+     item.merge(
+       'datetime' => Time.now
+     )
+   end
+
+   transform :add_beers
+
+   private
+
+   def add_beers(item)
+     item['beers_count'] += rand(1..3)
+     item
+   end
+ end
+
+ data = 10.times.lazy.map do |i|
+   { 'id' => i + 1, 'reference' => 'hello world 1', 'beers_count' => 0 }
+ end
+
+ # initialize Converter with Parquet GZIP compression mode
+ converter = FooParquetConverter.new(data, compression: :gzip)
+
+ # write result to file
+ converter.write('tmp/hello_world.parquet')
+
+ # in-memory result (StringIO)
+ converter.to_io
+
+ # write to temporary file (Tempfile)
+ # don't forget to `close` / `unlink` it after usage
+ converter.to_tmpfile
+
+ # convert to Arrow::Table
+ pp converter.to_arrow_table
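This example registers two transformers: a block and the name of a private method. The README's car example uses a block to turn plain objects into row hashes. Parqueteur's transformer implementation is not part of this diff, so the following is only a self-contained sketch of how such a chain can work:

- transformers are stored per class, in declaration order;
- symbols are dispatched to instance methods;
- each row passes through the whole chain.

`MiniConverter` and `BeersConverter` are hypothetical classes, not part of the gem.

```ruby
# Hypothetical, minimal transformer chain (not Parqueteur's implementation).
class MiniConverter
  def self.transformers
    @transformers ||= []
  end

  # Register either a block or the name of an instance method.
  def self.transform(method_name = nil, &block)
    raise ArgumentError, 'pass a method name or a block' if method_name.nil? && block.nil?

    transformers << (block || method_name)
  end

  def initialize(input)
    @input = input
  end

  # Apply every registered transformer to each input item, in order.
  def each_row
    return enum_for(:each_row) unless block_given?

    @input.each do |item|
      row = self.class.transformers.reduce(item) do |acc, transformer|
        transformer.is_a?(Symbol) ? send(transformer, acc) : transformer.call(acc)
      end
      yield row
    end
  end
end

class BeersConverter < MiniConverter
  transform { |item| item.merge('reference' => item['reference'].upcase) }
  transform :add_beers

  private

  def add_beers(item)
    item.merge('beers_count' => item['beers_count'] + 2)
  end
end

rows = BeersConverter.new([{ 'id' => 1, 'reference' => 'hello', 'beers_count' => 0 }]).each_row.to_a
```

Returning a new hash with `merge` keeps the caller's input rows unchanged. By contrast, `add_beers` above updates `item` in place.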
@@ -2,7 +2,7 @@

  module Parqueteur
    class Converter
-     DEFAULT_BATCH_SIZE = 10
+     DEFAULT_BATCH_SIZE = 100

      def self.inline(&block)
        Class.new(self, &block)
@@ -41,12 +41,14 @@ module Parqueteur
        @compression = kwargs.fetch(:compression, nil)&.to_sym
      end

-     def split(size)
+     def split(size, batch_size: nil, compression: nil)
        Enumerator.new do |arr|
+         options = {
+           batch_size: batch_size || @batch_size,
+           compression: compression || @compression
+         }
          @input.each_slice(size) do |records|
-           local_converter = self.class.new(
-             records, batch_size: @batch_size, compression: @compression
-           )
+           local_converter = self.class.new(records, **options)
            file = local_converter.to_tmpfile
            arr << file
            file.close
@@ -55,23 +57,31 @@ module Parqueteur
        end
      end

-     def split_by_io(size)
+     def split_by_io(size, batch_size: nil, compression: nil)
        Enumerator.new do |arr|
+         options = {
+           batch_size: batch_size || @batch_size,
+           compression: compression || @compression
+         }
          @input.each_slice(size) do |records|
-           local_converter = self.class.new(records)
+           local_converter = self.class.new(records, **options)
            arr << local_converter.to_io
          end
        end
      end

-     def write(path)
+     def write(path, batch_size: nil, compression: nil)
+       compression = @compression if compression.nil?
+       batch_size = @batch_size if batch_size.nil?
        arrow_schema = self.class.columns.arrow_schema
        writer_properties = Parquet::WriterProperties.new
-       writer_properties.set_compression(@compression) unless @compression.nil?
+       if !compression.nil? && compression != false
+         writer_properties.set_compression(compression)
+       end

        Arrow::FileOutputStream.open(path, false) do |output|
          Parquet::ArrowFileWriter.open(arrow_schema, output, writer_properties) do |writer|
-           @input.each_slice(@batch_size) do |records|
+           @input.each_slice(batch_size) do |records|
              arrow_table = build_arrow_table(records)
              writer.write_table(arrow_table, 1024)
            end
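After this change, `write` handles its `compression:` argument in three ways:

- `nil` falls back to the value given to the constructor.
- `false` disables compression even when the constructor enabled it, as `converter.write('tmp/hello_world.parquet', compression: false)` does in the examples.
- Any other value is passed to `Parquet::WriterProperties#set_compression`.

`split` and `split_by_io` resolve their argument with `compression || @compression` instead, so passing `false` there falls back to the constructor's value. Both rules are isolated below as small pure-Ruby functions. The function names are hypothetical, and a `nil` result means "do not call `set_compression`".

```ruby
# Mirrors the rule in Converter#write after this change:
#   nil   -> fall back to the constructor's compression
#   false -> write without calling set_compression
#   other -> passed to WriterProperties#set_compression as-is
# Returns nil when set_compression should not be called.
def effective_compression(override, default)
  compression = override.nil? ? default : override
  return nil if compression.nil? || compression == false

  compression
end

# Mirrors `compression || @compression` in split / split_by_io,
# where false behaves like nil and falls back to the default.
def split_compression(override, default)
  override || default
end
```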
@@ -81,32 +91,32 @@ module Parqueteur
        true
      end

-     def to_tmpfile
+     def to_tmpfile(options = {})
        tempfile = Tempfile.new
        tempfile.binmode
-       write(tempfile.path)
+       write(tempfile.path, **options)
        tempfile.rewind
        tempfile
      end

-     def to_io
-       tmpfile = to_tmpfile
+     def to_io(options = {})
+       tmpfile = to_tmpfile(options)
        strio = StringIO.new(tmpfile.read)
        tmpfile.close
        tmpfile.unlink
        strio
      end

-     def to_arrow_table
-       file = to_tmpfile
+     def to_arrow_table(options = {})
+       file = to_tmpfile(options)
        table = Arrow::Table.load(file.path, format: :parquet)
        file.close
        file.unlink
        table
      end

-     def to_blob
-       to_io.read
+     def to_blob(options = {})
+       to_tmpfile(options).read
      end

      private
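Every output helper now forwards its options to `write` through `to_tmpfile`. `to_io` and `to_arrow_table` close and unlink the Tempfile they create. A Tempfile returned by `to_tmpfile` is left to the caller, as the README comment points out. `to_blob` reads one without closing it, leaving removal to Tempfile's finalizer. A block-based wrapper is one way to make that cleanup automatic. This is only a sketch: `with_tmpfile` is hypothetical, and the lambda passed to it stands in for `converter.write(path)`.

```ruby
require 'tempfile'

# Hypothetical helper: create a binary Tempfile, let `writer` fill it by path
# (standing in for `converter.write(tempfile.path)`), yield it rewound, and
# always close and unlink it afterwards, even if writing or the block fails.
def with_tmpfile(writer)
  tempfile = Tempfile.new(['parqueteur', '.parquet'])
  tempfile.binmode
  writer.call(tempfile.path)
  tempfile.rewind
  yield tempfile
ensure
  if tempfile
    tempfile.close
    tempfile.unlink
  end
end

path_seen = nil
# Parquet files begin with the magic bytes "PAR1"; here they are only placeholder content.
bytes = with_tmpfile(->(path) { File.binwrite(path, 'PAR1') }) do |file|
  path_seen = file.path
  file.read
end
```

The method returns the block's value, so callers get the bytes (or an `Arrow::Table`, in the real case) while the file is already gone.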
@@ -7,14 +7,24 @@ module Parqueteur
      def self.registered_types
        @registered_types ||= {
          array: Parqueteur::Types::ArrayType,
+         bigdecimal: Parqueteur::Types::Decimal256Type,
          bigint: Parqueteur::Types::Int64Type,
          boolean: Parqueteur::Types::BooleanType,
+         date: Parqueteur::Types::Date32Type,
+         date32: Parqueteur::Types::Date32Type,
+         date64: Parqueteur::Types::Date64Type,
+         decimal: Parqueteur::Types::Decimal128Type,
+         decimal128: Parqueteur::Types::Decimal128Type,
+         decimal256: Parqueteur::Types::Decimal256Type,
          int32: Parqueteur::Types::Int32Type,
          int64: Parqueteur::Types::Int64Type,
          integer: Parqueteur::Types::Int32Type,
          map: Parqueteur::Types::MapType,
          string: Parqueteur::Types::StringType,
          struct: Parqueteur::Types::StructType,
+         time: Parqueteur::Types::Time32Type,
+         time32: Parqueteur::Types::Time32Type,
+         time64: Parqueteur::Types::Time64Type,
          timestamp: Parqueteur::Types::TimestampType
        }
      end
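`registered_types` now maps twenty symbols onto fourteen type classes. Several symbols are aliases for the same class:

- `:date` and `:date32` share `Date32Type`.
- `:decimal` and `:decimal128` share `Decimal128Type`.
- `:bigdecimal` and `:decimal256` share `Decimal256Type`.
- `:time` and `:time32` share `Time32Type`.
- `:integer` and `:int32` share `Int32Type`.
- `:bigint` and `:int64` share `Int64Type`.

The lookup code that consumes this hash is outside the diff. Below is a standalone sketch of such a lookup that raises on unknown names, in the spirit of the `Parqueteur::TypeNotFound` error declared in `lib/parqueteur.rb`. Class names are plain strings so the sketch runs without red-arrow. `UnknownColumnType`, `resolve_type` and `ALIASES` are hypothetical.

```ruby
# Hypothetical error class standing in for Parqueteur::TypeNotFound.
class UnknownColumnType < StandardError; end

# Column type symbol -> Parqueteur type class, copied from registered_types
# but with the class names as strings.
REGISTERED_TYPES = {
  array: 'ArrayType', bigdecimal: 'Decimal256Type', bigint: 'Int64Type',
  boolean: 'BooleanType', date: 'Date32Type', date32: 'Date32Type',
  date64: 'Date64Type', decimal: 'Decimal128Type', decimal128: 'Decimal128Type',
  decimal256: 'Decimal256Type', int32: 'Int32Type', int64: 'Int64Type',
  integer: 'Int32Type', map: 'MapType', string: 'StringType',
  struct: 'StructType', time: 'Time32Type', time32: 'Time32Type',
  time64: 'Time64Type', timestamp: 'TimestampType'
}.freeze

# Look up a column type by Symbol or String, failing loudly on typos.
def resolve_type(name)
  REGISTERED_TYPES.fetch(name.to_sym) do
    raise UnknownColumnType, "unknown column type: #{name.inspect}"
  end
end

# Type class -> every symbol that resolves to it (aliases share a class).
ALIASES = REGISTERED_TYPES.group_by { |_name, klass| klass }
                          .transform_values { |pairs| pairs.map(&:first) }
```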
@@ -0,0 +1,15 @@
+ # frozen_string_literal: true
+
+ module Parqueteur
+   module Types
+     class Date32Type < Parqueteur::Type
+       def build_value_array(values)
+         Arrow::Date32ArrayBuilder.build(values)
+       end
+
+       def arrow_type_builder
+         Arrow::Date32DataType.new
+       end
+     end
+   end
+ end
@@ -0,0 +1,15 @@
+ # frozen_string_literal: true
+
+ module Parqueteur
+   module Types
+     class Date64Type < Parqueteur::Type
+       def build_value_array(values)
+         Arrow::Date64ArrayBuilder.build([values])
+       end
+
+       def arrow_type_builder
+         Arrow::Date64DataType.new
+       end
+     end
+   end
+ end
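Arrow's `date32` type stores a date as a whole number of days since the Unix epoch (1970-01-01). `date64` stores milliseconds since the epoch. Unlike `Date32Type`, `Date64Type#build_value_array` in this version wraps the incoming values in an extra array (`build([values])`). The stdlib-only sketch below shows both encodings and helps check the values that a `:date`, `:date32` or `:date64` column should hold. The helper names are hypothetical.

```ruby
require 'date'

EPOCH_DATE = Date.new(1970, 1, 1)

# Arrow date32: whole days since 1970-01-01 (negative before the epoch).
def date32_value(date)
  (date - EPOCH_DATE).to_i
end

# Arrow date64: milliseconds since 1970-01-01 00:00:00 UTC;
# for a whole date this is always a multiple of 86_400_000.
def date64_value(date)
  date32_value(date) * 86_400_000
end
```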
@@ -0,0 +1,29 @@
+ # frozen_string_literal: true
+
+ module Parqueteur
+   module Types
+     class Decimal128Type < Parqueteur::Type
+       def initialize(options = {}, &block)
+         @scale = options.fetch(:scale)
+         @precision = options.fetch(:precision)
+         @format_str = "%.#{@scale}f"
+         super(options, &block)
+       end
+
+       def build_value_array(values)
+         Arrow::Decimal128ArrayBuilder.build(
+           @arrow_type,
+           values.map do |value|
+             Arrow::Decimal128.new(format(@format_str, BigDecimal(value)))
+           end
+         )
+       end
+
+       def arrow_type_builder
+         Arrow::Decimal128DataType.new(
+           @precision, @scale
+         )
+       end
+     end
+   end
+ end
@@ -0,0 +1,29 @@
+ # frozen_string_literal: true
+
+ module Parqueteur
+   module Types
+     class Decimal256Type < Parqueteur::Type
+       def initialize(options = {}, &block)
+         @scale = options.fetch(:scale)
+         @precision = options.fetch(:precision)
+         @format_str = "%.#{@scale}f"
+         super(options, &block)
+       end
+
+       def build_value_array(values)
+         Arrow::Decimal256ArrayBuilder.build(
+           @arrow_type,
+           values.map do |value|
+             Arrow::Decimal256.new(format(@format_str, BigDecimal(value)))
+           end
+         )
+       end
+
+       def arrow_type_builder
+         Arrow::Decimal256DataType.new(
+           @precision, @scale
+         )
+       end
+     end
+   end
+ end
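Both decimal types render each value with `format("%.#{scale}f", BigDecimal(value))` before building the Arrow decimal. Ruby's `%f` conversion goes through `Float`. Values with more significant digits than a double can hold may therefore come out different from the `BigDecimal` they started from; Arrow's Decimal128 allows a precision of up to 38 digits.

Below is a sketch of an exact alternative that stays in `BigDecimal` arithmetic:

- `exact_formatted` is hypothetical.
- Half-even rounding is an assumption; the gem does not choose a rounding mode explicitly.
- `float_formatted` only reproduces the current expression, for comparison.

```ruby
require 'bigdecimal'

# Reproduces the expression used by Decimal128Type/Decimal256Type:
# Kernel#format converts the BigDecimal to a Float for %f.
def float_formatted(value, scale)
  format("%.#{scale}f", BigDecimal(value))
end

# Hypothetical exact alternative: round in BigDecimal arithmetic, then pad
# the fractional part to `scale` digits. `value.to_s` also lets Floats through.
def exact_formatted(value, scale)
  rounded = BigDecimal(value.to_s).round(scale, BigDecimal::ROUND_HALF_EVEN)
  sign = rounded.negative? ? '-' : ''
  int_part, frac_part = rounded.abs.to_s('F').split('.')
  return "#{sign}#{int_part}" if scale.zero?

  "#{sign}#{int_part}.#{frac_part.to_s.ljust(scale, '0')}"
end

# For small values both agree; for wide ones only exact_formatted keeps every digit.
puts float_formatted(BigDecimal('12345678901234567.8901'), 4)
puts exact_formatted(BigDecimal('12345678901234567.8901'), 4)
```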
@@ -21,5 +21,3 @@ module Parqueteur
      end
    end
  end
-
- # when :integer
@@ -21,5 +21,3 @@ module Parqueteur
      end
    end
  end
-
- # when :integer
@@ -4,18 +4,7 @@ module Parqueteur
    module Types
      class MapType < Parqueteur::Type
        def build_value_array(values)
-         builder = Arrow::MapArrayBuilder.new(arrow_type)
-         values.each do |entry|
-           builder.append_value
-           next if entry.nil?
-
-           entry.each do |k, v|
-             builder.key_builder.append(k)
-             builder.item_builder.append(v)
-           end
-         end
-
-         builder.finish
+         Arrow::MapArrayBuilder.build(arrow_type, values)
        end

        def arrow_type_builder
@@ -0,0 +1,19 @@
+ # frozen_string_literal: true
+
+ module Parqueteur
+   module Types
+     class Time32Type < Parqueteur::Type
+       def build_value_array(values)
+         Arrow::Time32Array.new(
+           @options.fetch(:precision, :second), values
+         )
+       end
+
+       def arrow_type_builder
+         Arrow::Time32DataType.new(
+           options.fetch(:unit, :second)
+         )
+       end
+     end
+   end
+ end
@@ -0,0 +1,19 @@
+ # frozen_string_literal: true
+
+ module Parqueteur
+   module Types
+     class Time64Type < Parqueteur::Type
+       def build_value_array(values)
+         Arrow::Time64Array.new(
+           @options.fetch(:precision, :second), values
+         )
+       end
+
+       def arrow_type_builder
+         Arrow::Time64DataType.new(
+           options.fetch(:unit, :second)
+         )
+       end
+     end
+   end
+ end
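`Time32Type` and `Time64Type` store a time of day as a count of units since midnight. One of the examples above passes `'my_time' => 3600`, which is 01:00:00 with the default `:second` unit.

Arrow allows only seconds or milliseconds for `time32`, and only microseconds or nanoseconds for `time64`. A `:time64` column therefore most likely needs an explicit unit, even though `Time64Type` defaults to `:second`. Note also that `build_value_array` reads the `:precision` option while `arrow_type_builder` reads `:unit`.

The stdlib-only sketch below turns a Ruby `Time` into such a value. The unit symbols follow Arrow's TimeUnit names (second, milli, micro, nano), and `time_of_day_value` is hypothetical.

```ruby
# Seconds -> unit multipliers, named after Arrow's TimeUnit values.
TIME_UNITS = { second: 1, milli: 1_000, micro: 1_000_000, nano: 1_000_000_000 }.freeze

# Units Arrow accepts for each time-of-day type.
ALLOWED_TIME_UNITS = { time32: %i[second milli], time64: %i[micro nano] }.freeze

# Hypothetical helper: the wall-clock time of `time` as an integer number of
# `unit`s since midnight, truncating anything finer than the unit.
def time_of_day_value(time, unit: :second, type: :time32)
  unless ALLOWED_TIME_UNITS.fetch(type).include?(unit)
    raise ArgumentError, "#{type} does not support unit #{unit.inspect}"
  end

  seconds = (time.hour * 3600) + (time.min * 60) + time.sec + time.subsec
  (seconds * TIME_UNITS.fetch(unit)).floor
end
```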
@@ -9,7 +9,9 @@ module Parqueteur
    module Types
      class TimestampType < Parqueteur::Type
        def build_value_array(values)
-         Arrow::TimestampArray.new(values)
+         Arrow::TimestampArray.new(
+           options.fetch(:unit, :second), values
+         )
        end

        def arrow_type_builder
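`TimestampType` now passes a unit, defaulting to `:second`, to `Arrow::TimestampArray.new`. A timestamp is a count of units since 1970-01-01 00:00:00 UTC. With the default unit, a `Time.now` value such as those in the README example can therefore only be represented to the whole second. The sketch below shows the conversion for each unit. `timestamp_value` is hypothetical, and the unit symbols again follow Arrow's TimeUnit names.

```ruby
# Seconds -> unit multipliers, named after Arrow's TimeUnit values.
TIMESTAMP_UNITS = { second: 1, milli: 1_000, micro: 1_000_000, nano: 1_000_000_000 }.freeze

# Hypothetical helper: integer number of `unit`s since the Unix epoch.
# Time#to_r is exact, and floor keeps pre-1970 values consistent.
def timestamp_value(time, unit = :second)
  (time.to_r * TIMESTAMP_UNITS.fetch(unit)).floor
end

now = Time.now
%i[second milli micro].each { |unit| puts "#{unit}: #{timestamp_value(now, unit)}" }
```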
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Parqueteur
-   VERSION = '1.2.0'
+   VERSION = '1.3.3'
  end
data/lib/parqueteur.rb CHANGED
@@ -3,6 +3,7 @@
  require 'json'
  require 'singleton'
  require 'tempfile'
+ require 'parquet'

  require_relative 'parqueteur/version'
  require 'parqueteur/column'
@@ -14,16 +15,20 @@ require 'parqueteur/type'
  require 'parqueteur/type_resolver'
  require 'parqueteur/types/array_type'
  require 'parqueteur/types/boolean_type'
+ require 'parqueteur/types/date32_type'
+ require 'parqueteur/types/date64_type'
+ require 'parqueteur/types/decimal128_type'
+ require 'parqueteur/types/decimal256_type'
  require 'parqueteur/types/int32_type'
  require 'parqueteur/types/int64_type'
  require 'parqueteur/types/map_type'
  require 'parqueteur/types/string_type'
  require 'parqueteur/types/struct_type'
+ require 'parqueteur/types/time32_type'
+ require 'parqueteur/types/time64_type'
  require 'parqueteur/types/timestamp_type'
- require 'parquet'

  module Parqueteur
    class Error < StandardError; end
    class TypeNotFound < Error; end
-   # Your code goes here...
  end
data/parqueteur.gemspec CHANGED
@@ -8,8 +8,8 @@ Gem::Specification.new do |spec|
    spec.authors = ["Julien D."]
    spec.email = ["julien@pocketsizesun.com"]
    spec.license = 'Apache-2.0'
-   spec.summary = 'Parqueteur - A Ruby gem that convert JSON to Parquet'
-   spec.description = 'Convert JSON to Parquet'
+   spec.summary = 'Parqueteur - A Ruby gem that convert data to Parquet'
+   spec.description = 'Convert data to Parquet'
    spec.homepage = 'https://github.com/pocketsizesun/parqueteur-ruby'
    spec.required_ruby_version = Gem::Requirement.new(">= 2.3.0")

@@ -30,7 +30,7 @@ Gem::Specification.new do |spec|

    # Uncomment to register a new dependency of your gem
    # spec.add_dependency "example-gem", "~> 1.0"
-   spec.add_dependency 'red-parquet', '~> 5.0'
+   spec.add_dependency 'red-parquet', '~> 6.0'

    # For more information and examples about making a new gem, checkout our
    # guide at: https://bundler.io/guides/creating_gem.html
@@ -0,0 +1,18 @@
+ #!/bin/sh
+
+ if [ $(dpkg-query -W -f='${Status}' apache-arrow-apt-source 2>/dev/null | grep -c "ok installed") -eq 1 ]
+ then
+   exit 0
+ fi
+
+ LSB_RELEASE_CODENAME_SHORT=$(lsb_release --codename --short)
+
+ apt-get update
+ apt-get install -y -V ca-certificates lsb-release wget
+ wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+ apt-get install -y -V ./apache-arrow-apt-source-latest-${LSB_RELEASE_CODENAME_SHORT}.deb
+ rm ./apache-arrow-apt-source-latest-${LSB_RELEASE_CODENAME_SHORT}.deb
+ apt-get update
+ apt-get install -y libgirepository1.0-dev libarrow-dev libarrow-glib-dev libparquet-dev libparquet-glib-dev
+
+ exit 0
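The script builds the apt source URL from the lowercased output of `lsb_release --id --short` and the output of `lsb_release --codename --short`. It skips all work when `dpkg-query` reports `apache-arrow-apt-source` as installed. Two details are worth noting:

- That early exit checks only the apt source package, not the five libraries.
- `lsb_release` is called before the `lsb-release` package is installed.

Below is a Ruby sketch of the same two checks, useful for reproducing them outside the shell. The helper names are hypothetical, and the URL pattern is copied from the script.

```ruby
require 'open3'

# URL pattern copied from scripts/apache-arrow-ubuntu-install.sh.
def arrow_apt_source_url(distributor_id, codename)
  "https://apache.jfrog.io/artifactory/arrow/#{distributor_id.downcase}/" \
    "apache-arrow-apt-source-latest-#{codename}.deb"
end

# The script greps dpkg-query's ${Status} output for "ok installed".
def dpkg_status_installed?(status)
  status.to_s.include?('ok installed')
end

# Optional live check; false when dpkg-query is missing or the package is unknown.
def package_installed?(name)
  out, status = Open3.capture2('dpkg-query', '-W', '--showformat=${Status}', name, err: File::NULL)
  status.success? && dpkg_status_installed?(out)
rescue Errno::ENOENT
  false
end
```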
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: parqueteur
  version: !ruby/object:Gem::Version
-   version: 1.2.0
+   version: 1.3.3
  platform: ruby
  authors:
  - Julien D.
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2021-10-02 00:00:00.000000000 Z
+ date: 2021-11-13 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: red-parquet
@@ -16,15 +16,15 @@ dependencies:
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '5.0'
+         version: '6.0'
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '5.0'
- description: Convert JSON to Parquet
+         version: '6.0'
+ description: Convert data to Parquet
  email:
  - julien@pocketsizesun.com
  executables: []
@@ -38,11 +38,14 @@ files:
  - Rakefile
  - bin/console
  - bin/setup
+ - examples/cars.rb
+ - examples/convert-and-compression.rb
  - examples/convert-methods.rb
  - examples/convert-to-io.rb
  - examples/convert-with-chunks.rb
- - examples/convert-with-compression.rb
  - examples/convert-without-compression.rb
+ - examples/hello-world.rb
+ - examples/readme-example.rb
  - lib/parqueteur.rb
  - lib/parqueteur/column.rb
  - lib/parqueteur/column_collection.rb
@@ -53,15 +56,21 @@ files:
  - lib/parqueteur/type_resolver.rb
  - lib/parqueteur/types/array_type.rb
  - lib/parqueteur/types/boolean_type.rb
+ - lib/parqueteur/types/date32_type.rb
+ - lib/parqueteur/types/date64_type.rb
+ - lib/parqueteur/types/decimal128_type.rb
+ - lib/parqueteur/types/decimal256_type.rb
  - lib/parqueteur/types/int32_type.rb
  - lib/parqueteur/types/int64_type.rb
  - lib/parqueteur/types/map_type.rb
  - lib/parqueteur/types/string_type.rb
  - lib/parqueteur/types/struct_type.rb
+ - lib/parqueteur/types/time32_type.rb
+ - lib/parqueteur/types/time64_type.rb
  - lib/parqueteur/types/timestamp_type.rb
  - lib/parqueteur/version.rb
  - parqueteur.gemspec
- - test.json
+ - scripts/apache-arrow-ubuntu-install.sh
  homepage: https://github.com/pocketsizesun/parqueteur-ruby
  licenses:
  - Apache-2.0
@@ -85,5 +94,5 @@ requirements: []
  rubygems_version: 3.2.3
  signing_key:
  specification_version: 4
- summary: Parqueteur - A Ruby gem that convert JSON to Parquet
+ summary: Parqueteur - A Ruby gem that convert data to Parquet
  test_files: []
data/test.json DELETED
@@ -1 +0,0 @@
- [{"id":1,"reference":"coucou","hash":{"a":"b"},"valid":true,"hash2":{},"numbers":[1,2,3],"map_array":[]},{"id":2,"reference":"coucou","hash":{"c":"d"},"valid":false,"hash2":{},"numbers":[4,5,6],"map_array":[]},{"id":3,"reference":"coucou","hash":{"e":"f"},"valid":true,"hash2":{"x":[1,2,3]},"numbers":[7,8,9],"map_array":[{"x":"y"}]}]