metacrunch-file 1.2.1 → 1.3.0

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: d530edbf89a1317841d79de98e9e88f6c865c78bb1b26b8f838581308f4f2ddb
4
- data.tar.gz: 1cf9fe7d0dfcc65a2e0bc37ecb146d376005bffd4e70aae8822eb92ab50bcce1
3
+ metadata.gz: 9980829422c838552b9334134e52932f3d70d132859965baa70a93f184ffefaf
4
+ data.tar.gz: 81b3b88c8725eb30469eb1dd40a7f9507865e66a998b76ff2ace5691e424b684
5
5
  SHA512:
6
- metadata.gz: 86a35a6bc99eaa6ac6aaca784cf7dfeb626f1c7378c03fa9de5a17f250c51a05fc8c7febda42f34c90302cb6e05f39da08f506b35645095feac585466a1830a8
7
- data.tar.gz: 2d96c0cf6eb38eb21ed193a8da58ed06cbd837e8a2d9c799fb96ac5d7cc7871564972dd7d3f2e4b70eb42213730073f052dce01e14ede1be12d71525734b3275
6
+ metadata.gz: '090856c4f3cef0b0bc2bedaa2a5fd87ef73005b4678e2bbe5a938475438701b1ed74307dc502bfd5f52614589ed6ef1f55667037d0c1599ecd8ae8c1fb68d8dc'
7
+ data.tar.gz: b940c2ea0555e018303de286a028e975d1cd628cd527d65ca702afb97d7bbac40cc31cad67b27871b3737891b65f016ccff31b960ba761af0cb4157cb033195e
data/Readme.md CHANGED
@@ -6,7 +6,10 @@ metacrunch-file
6
6
  [![Test Coverage](https://codeclimate.com/github/ubpb/metacrunch-file/badges/coverage.svg)](https://codeclimate.com/github/ubpb/metacrunch-file/coverage)
7
7
  [![CircleCI](https://circleci.com/gh/ubpb/metacrunch-file.svg?style=svg)](https://circleci.com/gh/ubpb/metacrunch-file)
8
8
 
9
- This is the official file package for the [metacrunch ETL toolkit](https://github.com/ubpb/metacrunch).
9
+ This is the official file package for the [metacrunch ETL toolkit](https://github.com/ubpb/metacrunch).
10
+
11
+ *Note: For working examples on how to use this package check out our [demo repository](https://github.com/ubpb/metacrunch-demo).*
12
+
10
13
 
11
14
  Installation
12
15
  ------------
@@ -14,7 +17,7 @@ Installation
14
17
  Include the gem in your `Gemfile`
15
18
 
16
19
  ```ruby
17
- gem "metacrunch-file", "~> 1.2.0"
20
+ gem "metacrunch-file", "~> 1.3.0"
18
21
  ```
19
22
 
20
23
  and run `$ bundle install` to install it.
@@ -29,24 +32,20 @@ $ gem install metacrunch-file
29
32
  Usage
30
33
  -----
31
34
 
32
- *Note: For working examples on how to use this package check out our [demo repository](https://github.com/ubpb/metacrunch-demo).*
33
-
34
- ### `Metacrunch::File::Source`
35
+ ## `Metacrunch::File::FileSource`
35
36
 
36
37
  This class provides a metacrunch `source` implementation that can be used to read data from files in the file system into a metacrunch job. The class can be used to read regular files, compressed files (gzip), tar archives and compressed tar archives (gzip).
37
38
 
38
- You can access non-option arguments from the command line using the `ARGV` constant.
39
-
40
39
  ```ruby
41
40
  # my_job.metacrunch
42
41
 
43
42
  # If you call this example like so
44
43
  # $ metacrunch my_job.metacrunch *.xml
45
44
  # ARGV will contain all the XML files in the current directory.
46
- source Metacrunch::File::Source.new(ARGV)
45
+ source Metacrunch::File::FileSource.new(ARGV)
47
46
 
48
47
  # ... or you can set the filenames directly
49
- source Metacrunch::File::Source.new(["my-data.xml", "my-other-data.xml", "..."])
48
+ source Metacrunch::File::FileSource.new(["my-data.xml", "my-other-data.xml", "..."])
50
49
  ```
51
50
 
52
51
  **Options**
@@ -67,22 +66,33 @@ transformation ->(file_entry) do
67
66
  end
68
67
  ```
69
68
 
70
- ### `Metacrunch::File::Destination`
69
+ ## `Metacrunch::File::FileDestination`
71
70
 
72
71
  This class provides a metacrunch `destination` to write data to a file. Every data that gets passed to the destination is appended to the given file. If the data is an `Array` every element of that array is appended to the file. Non existing files will be created automatically.
73
72
 
74
73
  ```ruby
75
74
  # my_job.metacrunch
76
75
 
77
- source Metacrunch::File::Destination.new("/tmp/my-data.txt" [, OPTIONS])
76
+ destination Metacrunch::File::FileDestination.new("/tmp/my-data.txt" [, OPTIONS])
78
77
  ```
79
78
 
80
79
  **Options**
81
80
 
82
81
  * `override_existing_file`: Overrides an existing file if set to `true`. If set to `false` an error is raised if the file already exists. Defaults to `false`.
83
- *
84
82
 
85
- ### `Metacrunch::File::XLSXDestination`
83
+ ## `Metacrunch::File::CSVSource`
84
+
85
+ This class provides a metacrunch `source` for reading CSV files. It is a simple wrapper around [smarter_csv](https://github.com/tilo/smarter_csv) gem.
86
+
87
+ **Options**
88
+
89
+ * `headers`: Whether or not the file contains headers as the first line. Important if the file does not contain headers, otherwise you would lose the first line of data. Defaults to `true`.
90
+ * `col_sep`: Column separator. Defaults to `,`.
91
+ * `row_sep`: Row separator or record separator. Defaults to `\n`.
92
+ * `quote_char`: Quotation character. Defaults to `"`.
93
+ * `file_encoding`: Set the file encoding. Defaults to `utf-8`.
94
+
95
+ ## `Metacrunch::File::XLSXDestination`
86
96
 
87
97
  This class provides a metacrunch `destination` implementation to create simple Excel (xlsx) files.
88
98
 
@@ -95,7 +105,7 @@ transformation ->(data) do
95
105
  [data["foo"], data["bar"], ...]
96
106
  end
97
107
 
98
- source Metacrunch::File::XLSXDestination.new(
108
+ destination Metacrunch::File::XLSXDestination.new(
99
109
  "/tmp/my-data.xlsx", # filename
100
110
  ["Column 1", "Column 2", ...], # header columns
101
111
  OPTIONS
@@ -4,7 +4,10 @@ require "active_support/core_ext"
4
4
  module Metacrunch
5
5
  module File
6
6
  require_relative "file/entry"
7
+ require_relative "file/csv_source"
8
+ require_relative "file/file_source"
7
9
  require_relative "file/source"
10
+ require_relative "file/file_destination"
8
11
  require_relative "file/destination"
9
12
  require_relative "file/xlsx_destination"
10
13
  end
@@ -0,0 +1,35 @@
1
+ require "metacrunch/file"
2
+ require "smarter_csv"
3
+
4
+ module Metacrunch
5
+ class File::CSVSource
6
+
7
+ DEFAULT_OPTIONS = {
8
+ headers: true,
9
+ col_sep: ",",
10
+ row_sep: "\n",
11
+ quote_char: '"',
12
+ file_encoding: "utf-8"
13
+ }
14
+
15
+ def initialize(csv_filename, options = {})
16
+ @filename = csv_filename
17
+ @options = DEFAULT_OPTIONS.merge(options)
18
+ end
19
+
20
+ def each(&block)
21
+ return enum_for(__method__) unless block_given?
22
+
23
+ SmarterCSV.process(@filename, {
24
+ headers_in_file: @options[:headers],
25
+ col_sep: @options[:col_sep],
26
+ row_sep: @options[:row_sep],
27
+ quote_char: @options[:quote_char],
28
+ file_encoding: @options[:file_encoding]
29
+ }) do |line|
30
+ yield line
31
+ end
32
+ end
33
+
34
+ end
35
+ end
@@ -1,35 +1,11 @@
1
1
  require "metacrunch/file"
2
2
 
3
3
  module Metacrunch
4
- class File::Destination
5
-
6
- DEFAULT_OPTIONS = {
7
- override_existing_file: false
8
- }
4
+ class File::Destination < File::FileDestination
9
5
 
10
6
  def initialize(filename, options = {})
11
- @filename = ::File.expand_path(filename)
12
- @options = DEFAULT_OPTIONS.deep_merge(options)
13
-
14
- if ::File.exists?(@filename) && @options[:override_existing_file] == false
15
- raise "File `#{@filename}` exists but `override_existing_file` option was set to `false`"
16
- end
17
-
18
- @file = ::File.open(@filename, 'wb+')
19
- end
20
-
21
- def write(data)
22
- return if data.blank?
23
-
24
- if data.is_a?(Array)
25
- data.each { |row| @file.write(row) }
26
- else
27
- @file.write(data)
28
- end
29
- end
30
-
31
- def close
32
- @file.close if @file
7
+ warn "[DEPRECATION] `Metacrunch::File::Destination` is deprecated. Please use `Metacrunch::File::FileDestination` instead."
8
+ super
33
9
  end
34
10
 
35
11
  end
@@ -0,0 +1,36 @@
1
+ require "metacrunch/file"
2
+
3
+ module Metacrunch
4
+ class File::FileDestination
5
+
6
+ DEFAULT_OPTIONS = {
7
+ override_existing_file: false
8
+ }
9
+
10
+ def initialize(filename, options = {})
11
+ @filename = ::File.expand_path(filename)
12
+ @options = DEFAULT_OPTIONS.deep_merge(options)
13
+
14
+ if ::File.exists?(@filename) && @options[:override_existing_file] == false
15
+ raise "File `#{@filename}` exists but `override_existing_file` option was set to `false`"
16
+ end
17
+
18
+ @file = ::File.open(@filename, 'wb+')
19
+ end
20
+
21
+ def write(data)
22
+ return if data.blank?
23
+
24
+ if data.is_a?(Array)
25
+ data.each { |row| @file.write(row) }
26
+ else
27
+ @file.write(data)
28
+ end
29
+ end
30
+
31
+ def close
32
+ @file.close if @file
33
+ end
34
+
35
+ end
36
+ end
@@ -0,0 +1,56 @@
1
+ require "metacrunch/file"
2
+ require "rubygems/package"
3
+
4
+ module Metacrunch
5
+ class File::FileSource
6
+
7
+ def initialize(filenames)
8
+ @filenames = [*filenames].map{|f| f.presence}.compact
9
+ end
10
+
11
+ def each(&block)
12
+ return enum_for(__method__) unless block_given?
13
+
14
+ @filenames.each do |filename|
15
+ if is_archive?(filename)
16
+ read_archive(filename, &block)
17
+ else
18
+ read_regular_file(filename, &block)
19
+ end
20
+ end
21
+ end
22
+
23
+ private
24
+
25
+ def is_archive?(filename)
26
+ filename.ends_with?(".tar") || filename.ends_with?(".tar.gz") || filename.ends_with?(".tgz")
27
+ end
28
+
29
+ def is_gzip_file?(filename)
30
+ filename.ends_with?(".gz") || filename.ends_with?(".tgz")
31
+ end
32
+
33
+ def read_regular_file(filename, &block)
34
+ if ::File.file?(filename)
35
+ io = is_gzip_file?(filename) ? Zlib::GzipReader.open(filename) : ::File.open(filename, "r")
36
+ yield File::Entry.new(filename: filename, archive_filename: nil, contents: io.read)
37
+ end
38
+ end
39
+
40
+ def read_archive(filename, &block)
41
+ io = is_gzip_file?(filename) ? Zlib::GzipReader.open(filename) : ::File.open(filename, "r")
42
+ tarReader = Gem::Package::TarReader.new(io)
43
+
44
+ tarReader.each do |_tar_entry|
45
+ if _tar_entry.file?
46
+ yield File::Entry.new(
47
+ filename: filename,
48
+ archive_filename: _tar_entry.full_name,
49
+ contents: _tar_entry.read
50
+ )
51
+ end
52
+ end
53
+ end
54
+
55
+ end
56
+ end
@@ -1,55 +1,11 @@
1
1
  require "metacrunch/file"
2
- require "rubygems/package"
3
2
 
4
3
  module Metacrunch
5
- class File::Source
4
+ class File::Source < File::FileSource
6
5
 
7
6
  def initialize(filenames)
8
- @filenames = [*filenames].map{|f| f.presence}.compact
9
- end
10
-
11
- def each(&block)
12
- return enum_for(__method__) unless block_given?
13
-
14
- @filenames.each do |filename|
15
- if is_archive?(filename)
16
- read_archive(filename, &block)
17
- else
18
- read_regular_file(filename, &block)
19
- end
20
- end
21
- end
22
-
23
- private
24
-
25
- def is_archive?(filename)
26
- filename.ends_with?(".tar") || filename.ends_with?(".tar.gz") || filename.ends_with?(".tgz")
27
- end
28
-
29
- def is_gzip_file?(filename)
30
- filename.ends_with?(".gz") || filename.ends_with?(".tgz")
31
- end
32
-
33
- def read_regular_file(filename, &block)
34
- if ::File.file?(filename)
35
- io = is_gzip_file?(filename) ? Zlib::GzipReader.open(filename) : ::File.open(filename, "r")
36
- yield File::Entry.new(filename: filename, archive_filename: nil, contents: io.read)
37
- end
38
- end
39
-
40
- def read_archive(filename, &block)
41
- io = is_gzip_file?(filename) ? Zlib::GzipReader.open(filename) : ::File.open(filename, "r")
42
- tarReader = Gem::Package::TarReader.new(io)
43
-
44
- tarReader.each do |_tar_entry|
45
- if _tar_entry.file?
46
- yield File::Entry.new(
47
- filename: filename,
48
- archive_filename: _tar_entry.full_name,
49
- contents: _tar_entry.read
50
- )
51
- end
52
- end
7
+ warn "[DEPRECATION] `Metacrunch::File::Source` is deprecated. Please use `Metacrunch::File::FileSource` instead."
8
+ super
53
9
  end
54
10
 
55
11
  end
@@ -1,5 +1,5 @@
1
1
  module Metacrunch
2
2
  module File
3
- VERSION = "1.2.1"
3
+ VERSION = "1.3.0"
4
4
  end
5
5
  end
@@ -16,5 +16,6 @@ Gem::Specification.new do |spec|
16
16
  spec.require_paths = ["lib"]
17
17
 
18
18
  spec.add_dependency "activesupport", ">= 5.1.0"
19
- spec.add_dependency "axlsx", "~> 2.0.1"
19
+ spec.add_dependency "axlsx", ">= 3.0.0.pre"
20
+ spec.add_dependency "smarter_csv", "~> 1.2.6"
20
21
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: metacrunch-file
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.2.1
4
+ version: 1.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - René Sprotte
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-09-18 00:00:00.000000000 Z
11
+ date: 2019-04-09 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activesupport
@@ -26,18 +26,32 @@ dependencies:
26
26
  version: 5.1.0
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: axlsx
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: 3.0.0.pre
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: 3.0.0.pre
41
+ - !ruby/object:Gem::Dependency
42
+ name: smarter_csv
29
43
  requirement: !ruby/object:Gem::Requirement
30
44
  requirements:
31
45
  - - "~>"
32
46
  - !ruby/object:Gem::Version
33
- version: 2.0.1
47
+ version: 1.2.6
34
48
  type: :runtime
35
49
  prerelease: false
36
50
  version_requirements: !ruby/object:Gem::Requirement
37
51
  requirements:
38
52
  - - "~>"
39
53
  - !ruby/object:Gem::Version
40
- version: 2.0.1
54
+ version: 1.2.6
41
55
  description:
42
56
  email:
43
57
  executables: []
@@ -53,8 +67,11 @@ files:
53
67
  - Readme.md
54
68
  - bin/console
55
69
  - lib/metacrunch/file.rb
70
+ - lib/metacrunch/file/csv_source.rb
56
71
  - lib/metacrunch/file/destination.rb
57
72
  - lib/metacrunch/file/entry.rb
73
+ - lib/metacrunch/file/file_destination.rb
74
+ - lib/metacrunch/file/file_source.rb
58
75
  - lib/metacrunch/file/source.rb
59
76
  - lib/metacrunch/file/version.rb
60
77
  - lib/metacrunch/file/xlsx_destination.rb
@@ -78,8 +95,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
78
95
  - !ruby/object:Gem::Version
79
96
  version: '0'
80
97
  requirements: []
81
- rubyforge_project:
82
- rubygems_version: 2.7.7
98
+ rubygems_version: 3.0.3
83
99
  signing_key:
84
100
  specification_version: 4
85
101
  summary: File package for the metacrunch ETL toolkit.