metacrunch-file 1.2.1 → 1.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: d530edbf89a1317841d79de98e9e88f6c865c78bb1b26b8f838581308f4f2ddb
-   data.tar.gz: 1cf9fe7d0dfcc65a2e0bc37ecb146d376005bffd4e70aae8822eb92ab50bcce1
+   metadata.gz: 873e28ee7fabd11484426b0282b9857025b95452ec7cabffabf98a163326f8e5
+   data.tar.gz: 46f10b360760fa3259d46a1578ee62bffb9134511d4fd76ca0247d12b91f4866
  SHA512:
-   metadata.gz: 86a35a6bc99eaa6ac6aaca784cf7dfeb626f1c7378c03fa9de5a17f250c51a05fc8c7febda42f34c90302cb6e05f39da08f506b35645095feac585466a1830a8
-   data.tar.gz: 2d96c0cf6eb38eb21ed193a8da58ed06cbd837e8a2d9c799fb96ac5d7cc7871564972dd7d3f2e4b70eb42213730073f052dce01e14ede1be12d71525734b3275
+   metadata.gz: 28f0edd5914fac3b1611943e8d8bbdbb9c2c33a92c4fd45cd3b579a61cbf7eaf2c6db489878ff878d31ceeb75262ff76f95fedf4739e9e90e13cf892e2cf0de8
+   data.tar.gz: 62aec554a36543043909c814d1331dc3951997be24fcc9c6a4839b10d454c7f4287160aaf99c80a7bed33d2a495e8bce676fe17e29c26ee03e7ce8774be40940
data/.circleci/config.yml CHANGED
@@ -1,12 +1,11 @@
- # Ruby CircleCI 2.0 configuration file
- #
- # Check https://circleci.com/docs/2.0/language-ruby/ for more details
- #
- version: 2
+ version: 2.1
+ orbs:
+   ruby: circleci/ruby@1.1.1
+
  jobs:
    build:
      docker:
-       - image: circleci/ruby:2.4.1-node-browsers
+       - image: circleci/ruby:2.6-node-browsers

      working_directory: ~/repo

data/Gemfile CHANGED
@@ -14,6 +14,6 @@ end
  group :test do
    gem "rspec", ">= 3.5.0", "< 4.0.0"
    gem "rspec_junit_formatter", ">= 0.3.0"
-   gem "simplecov", ">= 0.15.0"
+   gem "simplecov", "= 0.17.1"
  end

data/Readme.md CHANGED
@@ -6,7 +6,10 @@ metacrunch-file
  [![Test Coverage](https://codeclimate.com/github/ubpb/metacrunch-file/badges/coverage.svg)](https://codeclimate.com/github/ubpb/metacrunch-file/coverage)
  [![CircleCI](https://circleci.com/gh/ubpb/metacrunch-file.svg?style=svg)](https://circleci.com/gh/ubpb/metacrunch-file)

- This is the official file package for the [metacrunch ETL toolkit](https://github.com/ubpb/metacrunch).
+ This is the official file package for the [metacrunch ETL toolkit](https://github.com/ubpb/metacrunch).
+
+ *Note: For working examples on how to use this package check out our [demo repository](https://github.com/ubpb/metacrunch-demo).*
+

  Installation
  ------------
@@ -14,7 +17,7 @@ Installation
  Include the gem in your `Gemfile`

  ```ruby
- gem "metacrunch-file", "~> 1.2.0"
+ gem "metacrunch-file", "~> 1.5.0"
  ```

  and run `$ bundle install` to install it.
@@ -29,24 +32,20 @@ $ gem install metacrunch-file
  Usage
  -----

- *Note: For working examples on how to use this package check out our [demo repository](https://github.com/ubpb/metacrunch-demo).*
-
- ### `Metacrunch::File::Source`
+ ## `Metacrunch::File::FileSource`

  This class provides a metacrunch `source` implementation that can be used to read data from files in the file system into a metacrunch job. The class can be used to read regular files, compressed files (gzip), tar archives and compressed tar archives (gzip).

- You can access non-option arguments from the command line using the `ARGV` constant.
-
  ```ruby
  # my_job.metacrunch

  # If you call this example like so
  # $ metacrunch my_job.metacrunch *.xml
  # ARGV will contain all the XML files in the current directory.
- source Metacrunch::File::Source.new(ARGV)
+ source Metacrunch::File::FileSource.new(ARGV)

  # ... or you can set the filenames directly
- source Metacrunch::File::Source.new(["my-data.xml", "my-other-data.xml", "..."])
+ source Metacrunch::File::FileSource.new(["my-data.xml", "my-other-data.xml", "..."])
  ```

  **Options**
@@ -67,22 +66,58 @@ transformation ->(file_entry) do
  end
  ```

- ### `Metacrunch::File::Destination`
+ ## `Metacrunch::File::FileDestination`

  This class provides a metacrunch `destination` to write data to a file. All data passed to the destination is appended to the given file. If the data is an `Array`, every element of that array is appended to the file. Nonexistent files are created automatically.

  ```ruby
  # my_job.metacrunch

- source Metacrunch::File::Destination.new("/tmp/my-data.txt" [, OPTIONS])
+ destination Metacrunch::File::FileDestination.new("/tmp/my-data.txt" [, OPTIONS])
+ ```
+
+ **Options**
+
+ * `override_existing_file`: Overrides an existing file if set to `true`. If set to `false`, an error is raised if the file already exists. Defaults to `false`.
+
+ ## `Metacrunch::File::CSVSource`
+
+ This class provides a metacrunch `source` for reading CSV files. It is a simple wrapper around the [smarter_csv](https://github.com/tilo/smarter_csv) gem.
+
+ ```ruby
+ # my_job.metacrunch
+
+ source Metacrunch::File::CSVSource.new("my.csv" [, OPTIONS])
+ ```
+
+ **Options**
+
+ * `headers`: Whether the file contains headers in the first line. Set this to `false` if the file has no headers; otherwise the first line of data is lost. Defaults to `true`.
+ * `col_sep`: Column separator. Defaults to `,`.
+ * `row_sep`: Row separator or record separator. Defaults to `\n`.
+ * `quote_char`: Quotation character. Defaults to `"`.
+ * `file_encoding`: Set the file encoding. Defaults to `utf-8`.
+
+ ## `Metacrunch::File::CSVDestination`
+
+ This class provides a metacrunch `destination` for writing CSV files. Because [smarter_csv](https://github.com/tilo/smarter_csv) can only be used to read CSV, this class uses Ruby's [built-in CSV feature](https://ruby-doc.org/stdlib/libdoc/csv/rdoc/CSV.html) under the hood.
+
+ ```ruby
+ # my_job.metacrunch
+
+ destination Metacrunch::File::CSVDestination.new(
+   "result.csv", # filename
+   ["Header 1", "Header 2", ...], # headers
+   [, OPTIONS]
+ )
  ```

  **Options**

  * `override_existing_file`: Overrides an existing file if set to `true`. If set to `false`, an error is raised if the file already exists. Defaults to `false`.
- *
+ * `csv_options`: Set options for CSV generation, such as `col_sep`. The full list is [here](https://ruby-doc.org/stdlib/libdoc/csv/rdoc/CSV.html#class-CSV-label-Options).

- ### `Metacrunch::File::XLSXDestination`
+ ## `Metacrunch::File::XLSXDestination`

  This class provides a metacrunch `destination` implementation to create simple Excel (xlsx) files.

@@ -95,7 +130,7 @@ transformation ->(data) do
    [data["foo"], data["bar"], ...]
  end

- source Metacrunch::File::XLSXDestination.new(
+ destination Metacrunch::File::XLSXDestination.new(
    "/tmp/my-data.xlsx", # filename
    ["Column 1", "Column 2", ...], # header columns
    OPTIONS
@@ -4,8 +4,12 @@ require "active_support/core_ext"
  module Metacrunch
    module File
      require_relative "file/entry"
+     require_relative "file/csv_source"
+     require_relative "file/file_source"
      require_relative "file/source"
+     require_relative "file/file_destination"
      require_relative "file/destination"
      require_relative "file/xlsx_destination"
+     require_relative "file/csv_destination"
    end
  end
@@ -0,0 +1,49 @@
+ require "metacrunch/file"
+
+ module Metacrunch
+   class File::CSVDestination
+
+     DEFAULT_OPTIONS = {
+       override_existing_file: false,
+       csv_options: {}
+     }
+
+     def initialize(filename, headers, options = {})
+       @filename = ::File.expand_path(filename)
+       @headers = headers
+       @options = DEFAULT_OPTIONS.deep_merge(options)
+
+       if ::File.exists?(@filename) && @options[:override_existing_file] == false
+         raise "File `#{@filename}` exists but `override_existing_file` option was set to `false`"
+       end
+
+       @file = ::File.open(@filename, 'wb+')
+
+       if @headers.present?
+         raise ArgumentError, "Headers must be an Array" unless @headers.is_a?(Array)
+         csv_str = CSV.generate_line(@headers, **@options[:csv_options])
+         @file.write(csv_str)
+       end
+     end
+
+     def write(data)
+       return if data.blank?
+       raise ArgumentError, "Data must be an Array" unless data.is_a?(Array)
+
+       if data.first.is_a?(Array)
+         data.each do |d|
+           csv_str = CSV.generate_line(d, **@options[:csv_options])
+           @file.write(csv_str)
+         end
+       else
+         csv_str = CSV.generate_line(data, **@options[:csv_options])
+         @file.write(csv_str)
+       end
+     end
+
+     def close
+       @file.close if @file
+     end
+
+   end
+ end
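The row handling in `write` above can be tried in isolation. The following is a minimal sketch, not the gem's code: `rows_to_csv` is a hypothetical helper that mirrors the single-row versus list-of-rows branch and passes the options straight through to `CSV.generate_line` from Ruby's standard `csv` library, which the sketch requires explicitly.

```ruby
require "csv"

# Hypothetical helper (not part of metacrunch-file): turns either one row
# or a list of rows into CSV text, following the branch in `write` above.
def rows_to_csv(data, csv_options = {})
  return "" if data.nil?
  raise ArgumentError, "Data must be an Array" unless data.is_a?(Array)
  return "" if data.empty?

  # A flat Array is one row; an Array of Arrays is many rows.
  rows = data.first.is_a?(Array) ? data : [data]
  rows.map { |row| CSV.generate_line(row, **csv_options) }.join
end

puts rows_to_csv(["a", "b"])
puts rows_to_csv([["a", "b"], ["c", "d"]], col_sep: ";")
```

Passing `col_sep: ";"` here plays the role of the `csv_options` hash in the class above.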
@@ -0,0 +1,35 @@
+ require "metacrunch/file"
+ require "smarter_csv"
+
+ module Metacrunch
+   class File::CSVSource
+
+     DEFAULT_OPTIONS = {
+       headers: true,
+       col_sep: ",",
+       row_sep: "\n",
+       quote_char: '"',
+       file_encoding: "utf-8"
+     }
+
+     def initialize(csv_filename, options = {})
+       @filename = csv_filename
+       @options = DEFAULT_OPTIONS.merge(options)
+     end
+
+     def each(&block)
+       return enum_for(__method__) unless block_given?
+
+       SmarterCSV.process(@filename, {
+         headers_in_file: @options[:headers],
+         col_sep: @options[:col_sep],
+         row_sep: @options[:row_sep],
+         quote_char: @options[:quote_char],
+         file_encoding: @options[:file_encoding]
+       }) do |line|
+         yield line
+       end
+     end
+
+   end
+ end
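The shape of this source (merge defaults, return an enumerator when no block is given, yield one record at a time) can be sketched without the gem. This is a stand-in under stated assumptions: `SketchCSVSource` is a hypothetical class, and it uses Ruby's standard `csv` library rather than smarter_csv, so the yielded values are plain Hashes keyed by header strings and may differ from what smarter_csv yields.

```ruby
require "csv"
require "tempfile"

# Stand-in for illustration only: standard-library CSV instead of smarter_csv.
class SketchCSVSource
  DEFAULT_OPTIONS = { col_sep: ",", quote_char: '"' }.freeze

  def initialize(filename, options = {})
    @filename = filename
    @options = DEFAULT_OPTIONS.merge(options)
  end

  def each
    # Without a block, hand back an Enumerator over the same method.
    return enum_for(__method__) unless block_given?

    CSV.foreach(@filename, headers: true, **@options) do |row|
      yield row.to_h
    end
  end
end

Tempfile.create(["people", ".csv"]) do |file|
  file.write("name;age\nAda;36\nAlan;41\n")
  file.flush
  SketchCSVSource.new(file.path, col_sep: ";").each { |row| p row }
end
```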
@@ -1,35 +1,11 @@
  require "metacrunch/file"

  module Metacrunch
-   class File::Destination
-
-     DEFAULT_OPTIONS = {
-       override_existing_file: false
-     }
+   class File::Destination < File::FileDestination

      def initialize(filename, options = {})
-       @filename = ::File.expand_path(filename)
-       @options = DEFAULT_OPTIONS.deep_merge(options)
-
-       if ::File.exists?(@filename) && @options[:override_existing_file] == false
-         raise "File `#{@filename}` exists but `override_existing_file` option was set to `false`"
-       end
-
-       @file = ::File.open(@filename, 'wb+')
-     end
-
-     def write(data)
-       return if data.blank?
-
-       if data.is_a?(Array)
-         data.each { |row| @file.write(row) }
-       else
-         @file.write(data)
-       end
-     end
-
-     def close
-       @file.close if @file
+       warn "[DEPRECATION] `Metacrunch::File::Destination` is deprecated. Please use `Metacrunch::File::FileDestination` instead."
+       super
      end

    end
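The old class name is kept as a thin subclass that warns and then delegates. A minimal sketch of that pattern with hypothetical class names (`NewWriter` and `OldWriter` are not part of the gem):

```ruby
# Hypothetical classes for illustration; only the pattern matches the diff.
class NewWriter
  attr_reader :target

  def initialize(target)
    @target = target
  end
end

class OldWriter < NewWriter
  def initialize(target)
    # Kernel#warn prints to standard error, so regular output stays clean.
    warn "[DEPRECATION] `OldWriter` is deprecated. Please use `NewWriter` instead."
    super # bare `super` forwards the original arguments unchanged
  end
end

writer = OldWriter.new("/tmp/out.txt")
puts writer.target
```

Existing callers keep working because the old name is still a valid class, and `is_a?(NewWriter)` holds for its instances.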
@@ -0,0 +1,36 @@
+ require "metacrunch/file"
+
+ module Metacrunch
+   class File::FileDestination
+
+     DEFAULT_OPTIONS = {
+       override_existing_file: false
+     }
+
+     def initialize(filename, options = {})
+       @filename = ::File.expand_path(filename)
+       @options = DEFAULT_OPTIONS.deep_merge(options)
+
+       if ::File.exists?(@filename) && @options[:override_existing_file] == false
+         raise "File `#{@filename}` exists but `override_existing_file` option was set to `false`"
+       end
+
+       @file = ::File.open(@filename, 'wb+')
+     end
+
+     def write(data)
+       return if data.blank?
+
+       if data.is_a?(Array)
+         data.each { |row| @file.write(row) }
+       else
+         @file.write(data)
+       end
+     end
+
+     def close
+       @file.close if @file
+     end
+
+   end
+ end
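The `override_existing_file` guard can be exercised on its own. The sketch below is a hypothetical stand-in (`SketchFileDestination`), written without ActiveSupport. It uses `File.exist?` because `File.exists?`, as called in the class above, was removed in Ruby 3.2.

```ruby
require "tmpdir"

# Sketch only: copies the guard logic from the class above.
class SketchFileDestination
  DEFAULT_OPTIONS = { override_existing_file: false }.freeze

  def initialize(filename, options = {})
    @filename = File.expand_path(filename)
    @options = DEFAULT_OPTIONS.merge(options)

    if File.exist?(@filename) && @options[:override_existing_file] == false
      raise "File `#{@filename}` exists but `override_existing_file` option was set to `false`"
    end

    # "wb+" creates the file if needed and truncates an existing one.
    @file = File.open(@filename, "wb+")
  end

  def write(data)
    return if data.nil? || data.empty?

    Array(data).each { |row| @file.write(row) }
  end

  def close
    @file.close if @file
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "out.txt")
  destination = SketchFileDestination.new(path)
  destination.write(["first\n", "second\n"])
  destination.close

  # A second instance needs the override flag, since the file now exists.
  SketchFileDestination.new(path, override_existing_file: true).close
end
```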
@@ -0,0 +1,56 @@
+ require "metacrunch/file"
+ require "rubygems/package"
+
+ module Metacrunch
+   class File::FileSource
+
+     def initialize(filenames)
+       @filenames = [*filenames].map{|f| f.presence}.compact
+     end
+
+     def each(&block)
+       return enum_for(__method__) unless block_given?
+
+       @filenames.each do |filename|
+         if is_archive?(filename)
+           read_archive(filename, &block)
+         else
+           read_regular_file(filename, &block)
+         end
+       end
+     end
+
+     private
+
+     def is_archive?(filename)
+       filename.ends_with?(".tar") || filename.ends_with?(".tar.gz") || filename.ends_with?(".tgz")
+     end
+
+     def is_gzip_file?(filename)
+       filename.ends_with?(".gz") || filename.ends_with?(".tgz")
+     end
+
+     def read_regular_file(filename, &block)
+       if ::File.file?(filename)
+         io = is_gzip_file?(filename) ? Zlib::GzipReader.open(filename) : ::File.open(filename, "r")
+         yield File::Entry.new(filename: filename, archive_filename: nil, contents: io.read)
+       end
+     end
+
+     def read_archive(filename, &block)
+       io = is_gzip_file?(filename) ? Zlib::GzipReader.open(filename) : ::File.open(filename, "r")
+       tarReader = Gem::Package::TarReader.new(io)
+
+       tarReader.each do |_tar_entry|
+         if _tar_entry.file?
+           yield File::Entry.new(
+             filename: filename,
+             archive_filename: _tar_entry.full_name,
+             contents: _tar_entry.read
+           )
+         end
+       end
+     end
+
+   end
+ end
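The archive branch pairs `Zlib::GzipReader` with `Gem::Package::TarReader`. The sketch below shows that pairing on an in-memory archive so it runs without any files on disk. `build_tar_gz` and `each_tar_entry` are hypothetical helpers for illustration, not part of the gem.

```ruby
require "rubygems/package"
require "stringio"
require "zlib"

# Hypothetical helper: builds a small gzip-compressed tar archive in memory.
def build_tar_gz(files)
  tar_io = StringIO.new(String.new)
  Gem::Package::TarWriter.new(tar_io) do |tar|
    files.each do |name, contents|
      tar.add_file_simple(name, 0o644, contents.bytesize) { |io| io.write(contents) }
    end
  end

  gz_io = StringIO.new(String.new)
  gz = Zlib::GzipWriter.new(gz_io)
  gz.write(tar_io.string)
  gz.finish # flushes the gzip stream without closing the StringIO
  gz_io.string
end

# Hypothetical helper: yields [name, contents] for every regular file entry,
# the same walk `read_archive` performs above.
def each_tar_entry(gz_bytes)
  return enum_for(__method__, gz_bytes) unless block_given?

  gz = Zlib::GzipReader.new(StringIO.new(gz_bytes))
  Gem::Package::TarReader.new(gz).each do |entry|
    yield entry.full_name, entry.read if entry.file?
  end
ensure
  gz&.close
end

archive = build_tar_gz("a.xml" => "<a/>", "b.xml" => "<b/>")
each_tar_entry(archive) { |name, contents| puts "#{name}: #{contents}" }
```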
@@ -1,55 +1,11 @@
  require "metacrunch/file"
- require "rubygems/package"

  module Metacrunch
-   class File::Source
+   class File::Source < File::FileSource

      def initialize(filenames)
-       @filenames = [*filenames].map{|f| f.presence}.compact
-     end
-
-     def each(&block)
-       return enum_for(__method__) unless block_given?
-
-       @filenames.each do |filename|
-         if is_archive?(filename)
-           read_archive(filename, &block)
-         else
-           read_regular_file(filename, &block)
-         end
-       end
-     end
-
-     private
-
-     def is_archive?(filename)
-       filename.ends_with?(".tar") || filename.ends_with?(".tar.gz") || filename.ends_with?(".tgz")
-     end
-
-     def is_gzip_file?(filename)
-       filename.ends_with?(".gz") || filename.ends_with?(".tgz")
-     end
-
-     def read_regular_file(filename, &block)
-       if ::File.file?(filename)
-         io = is_gzip_file?(filename) ? Zlib::GzipReader.open(filename) : ::File.open(filename, "r")
-         yield File::Entry.new(filename: filename, archive_filename: nil, contents: io.read)
-       end
-     end
-
-     def read_archive(filename, &block)
-       io = is_gzip_file?(filename) ? Zlib::GzipReader.open(filename) : ::File.open(filename, "r")
-       tarReader = Gem::Package::TarReader.new(io)
-
-       tarReader.each do |_tar_entry|
-         if _tar_entry.file?
-           yield File::Entry.new(
-             filename: filename,
-             archive_filename: _tar_entry.full_name,
-             contents: _tar_entry.read
-           )
-         end
-       end
+       warn "[DEPRECATION] `Metacrunch::File::Source` is deprecated. Please use `Metacrunch::File::FileSource` instead."
+       super
      end

    end
@@ -1,5 +1,5 @@
  module Metacrunch
    module File
-     VERSION = "1.2.1"
+     VERSION = "1.5.0"
    end
  end
@@ -20,9 +20,17 @@ module Metacrunch
        @sheet.add_row(columns, types: :string)
      end

-     def write(row)
-       return if row.blank?
-       @sheet.add_row(row, types: :string)
+     def write(data)
+       return if data.blank?
+       raise ArgumentError, "Data must be an Array" unless data.is_a?(Array)
+
+       if data.first.is_a?(Array)
+         data.each do |d|
+           @sheet.add_row(d, types: :string)
+         end
+       else
+         @sheet.add_row(data, types: :string)
+       end
      end

      def close
@@ -15,6 +15,7 @@ Gem::Specification.new do |spec|
    spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
    spec.require_paths = ["lib"]

-   spec.add_dependency "activesupport", ">= 5.1.0"
-   spec.add_dependency "axlsx", "~> 2.0.1"
+   spec.add_dependency "activesupport", ">= 0"
+   spec.add_dependency "caxlsx", "~> 3.0"
+   spec.add_dependency "smarter_csv", "~> 1.2"
  end
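What the new constraints admit can be checked with RubyGems' own `Gem::Requirement` class. A small sketch follows; `allowed?` is a hypothetical helper, and the version numbers passed to it are arbitrary examples rather than real releases of these gems.

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy the requirement string?
def allowed?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

# "~> 3.0" means >= 3.0 and < 4.0 (the caxlsx constraint).
puts allowed?("~> 3.0", "3.4.1")
puts allowed?("~> 3.0", "4.0.0")

# "~> 1.2" means >= 1.2 and < 2.0 (the smarter_csv constraint).
puts allowed?("~> 1.2", "1.9.0")

# ">= 0" accepts any version (the relaxed activesupport constraint).
puts allowed?(">= 0", "0.0.1")
```

The same reading applies to the README's `"~> 1.5.0"`, which admits 1.5.x releases but not 1.6.0.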
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: metacrunch-file
  version: !ruby/object:Gem::Version
-   version: 1.2.1
+   version: 1.5.0
  platform: ruby
  authors:
  - René Sprotte
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-09-18 00:00:00.000000000 Z
+ date: 2021-08-11 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: activesupport
@@ -16,30 +16,44 @@ dependencies:
      requirements:
      - - ">="
        - !ruby/object:Gem::Version
-         version: 5.1.0
+         version: '0'
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - ">="
        - !ruby/object:Gem::Version
-         version: 5.1.0
+         version: '0'
  - !ruby/object:Gem::Dependency
-   name: axlsx
+   name: caxlsx
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: 2.0.1
+         version: '3.0'
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: 2.0.1
- description:
- email:
+         version: '3.0'
+ - !ruby/object:Gem::Dependency
+   name: smarter_csv
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.2'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.2'
+ description:
+ email:
  executables: []
  extensions: []
  extra_rdoc_files: []
@@ -53,8 +67,12 @@ files:
  - Readme.md
  - bin/console
  - lib/metacrunch/file.rb
+ - lib/metacrunch/file/csv_destination.rb
+ - lib/metacrunch/file/csv_source.rb
  - lib/metacrunch/file/destination.rb
  - lib/metacrunch/file/entry.rb
+ - lib/metacrunch/file/file_destination.rb
+ - lib/metacrunch/file/file_source.rb
  - lib/metacrunch/file/source.rb
  - lib/metacrunch/file/version.rb
  - lib/metacrunch/file/xlsx_destination.rb
@@ -63,7 +81,7 @@ homepage: http://github.com/ubpb/metacrunch-file
  licenses:
  - MIT
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -78,9 +96,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.7.7
- signing_key:
+ rubygems_version: 3.2.22
+ signing_key:
  specification_version: 4
  summary: File package for the metacrunch ETL toolkit.
  test_files: []