chronicle-etl 0.5.1 → 0.5.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 2e9f565004fce3539bdcfec9b98f64e33d8d4f1dde532f573adf54bfb61eadfa
- data.tar.gz: 81da054f627ae084d45b4fe53480d5ef26fadf9cfd3ad4eb53d908741d8214e4
+ metadata.gz: b8faa084cfe4a9f080ee5494c69b268b78bfa8f3502354e740264e6941f13daf
+ data.tar.gz: 1bf4f2751c71cadedc78a2fe3ed5b09bf86cd601a909e2fa2db0a0de8cc2c21d
  SHA512:
- metadata.gz: ffe35b9c55a442610b9832b7f918f16dff0ccad3245248f76c52afbd88c43d671e8724f72a8d181a13e3cafffae6bc97f5963f65eb5c2a0ddb7080d9b6f1d145
- data.tar.gz: c063698040b906f99e57cc3f9e829ff8b8ca1d4e5314a3a4179dc965c17ba70da04155d01b0b1fadf573e6a110a69862e9651435bdcdb1160bfe84cb8a55a10a
+ metadata.gz: ff10779b663a3321b779fb03e07249856174d96fb96e405ae906a47441c288d6a245c852525801ba250cce1125cf05c523ef4ec75fdfb4335cef9003091437ed
+ data.tar.gz: 509f6f92e95341d212c54b6b000bc54e8ba03898497191a3e5d3b14db7bff3ed625d0fee403888fbb6103c1edc14de66b215d9aa84ddb68cefcf51c0e6c74138
data/README.md CHANGED
@@ -16,12 +16,21 @@ If you don’t want to spend all your time writing scrapers, reverse-engineering
  * **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.

  ## Installation
+
+ Using homebrew:
+ ```sh
+ $ brew install chronicle-app/etl/chronicle-etl
+
+ ```
+ Using rubygems:
  ```sh
- # Install chronicle-etl
- gem install chronicle-etl
+ $ gem install chronicle-etl
  ```

- After installation, the `chronicle-etl` command will be available in your shell. Homebrew support [is coming soon](https://github.com/chronicle-app/chronicle-etl/issues/13).
+ Confirm it installed successfully:
+ ```sh
+ $ chronicle-etl --version
+ ```

  ## Basic usage and running jobs

@@ -50,7 +59,6 @@ $ chronicle-etl -e pinboard --since 1mo # Used automatically based on plugin nam
  ### Common options
  ```sh
  Options:
- -j, [--name=NAME] # Job configuration name
  -e, [--extractor=NAME] # Extractor class. Default: stdin
  [--extractor-opts=key:value] # Extractor options
  -t, [--transformer=NAME] # Transformer class. Default: null
@@ -71,6 +79,26 @@ Options:
  [--silent], [--no-silent] # Silence all output
  ```

+ ### Saving jobs
+
+ You can save details about a job to a local config file (saved by default in `~/.config/chronicle/etl/jobs/job_name.yml`) to save yourself the trouble of setting the CLI flags for each run.
+
+ ```sh
+ # Save a job named 'sample' to ~/.config/chronicle/etl/jobs/sample.yml
+ $ chronicle-etl jobs:save sample --extractor pinboard --since 10d
+
+ # Show details about the job
+ $ chronicle-etl jobs:show sample
+
+ # Run the job
+ $ chronicle-etl jobs:run sample
+ # Or more simply:
+ $ chronicle-etl sample
+
+ # Show all saved jobs
+ $ chronicle-etl jobs:list
+ ```
+
  ## Connectors
  Connectors are available to read, process, and load data from different formats or external services.

@@ -97,7 +125,7 @@ $ chronicle-etl connectors:list
  - [`rest`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/rest_loader.rb) - Serialize records with [JSONAPI](https://jsonapi.org/) and send to a REST API

  ## Chronicle Plugins
- Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through `$ gem install` or through the CLI itself.
+ Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through the CLI (which installs the Gems under the hood).

  ### Plugin usage

@@ -131,6 +159,7 @@ If you don't see a plugin for a third-party provider or data source that you're
  | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
  | [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
  | [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
+ | [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |

  #### Coming soon

@@ -206,7 +235,6 @@ $ chronicle-etl secrets:unset pinboard access_token

  ## Roadmap

- - Add **homebrew formula** for easier installation. #13
  - Keep tackling **new plugins**. See: [Chronicle Plugin Tracker](https://github.com/orgs/chronicle-app/projects/1)
  - Add support for **incremental extractions** #37
  - **Improve stdin extractor and shell command transformer** (#5) so that users can easily integrate their own scripts/tools into jobs
@@ -9,8 +9,6 @@ module Chronicle
  default_task "start"
  namespace :jobs

- class_option :name, aliases: '-j', desc: 'Job configuration name'
-
  class_option :extractor, aliases: '-e', desc: "Extractor class. Default: stdin", banner: 'NAME'
  class_option :'extractor-opts', desc: 'Extractor options', type: :hash, default: {}
  class_option :transformer, aliases: '-t', desc: 'Transformer class. Default: null', banner: 'NAME'
@@ -44,8 +42,8 @@ module Chronicle
  If you do not want to use the command line flags, you can also configure a job with a .yml config file. You can either specify the path to this file or use the filename and place the file in ~/.config/chronicle/etl/jobs/NAME.yml and call it with `--job NAME`
  LONG_DESC
  # Run an ETL job
- def start
- job_definition = build_job_definition(options)
+ def start(name = nil)
+ job_definition = build_job_definition(name, options)

  if job_definition.plugins_missing?
  missing_plugins = job_definition.errors[:plugins]
@@ -64,21 +62,39 @@ LONG_DESC
  cli_fail(message: "Error running job.\n#{message}", exception: e)
  end

- desc "create", "Create a job"
+ option :'skip-confirmation', aliases: '-y', type: :boolean
+ desc "save", "Save a job"
  # Create an ETL job
- def create
- job_definition = build_job_definition(options)
+ def save(name)
+ write_config = true
+ job_definition = build_job_definition(name, options)
  job_definition.validate!

- Chronicle::ETL::Config.write("jobs", options[:name], job_definition.definition)
+ if Chronicle::ETL::Config.exists?("jobs", name) && !options[:'skip-confirmation']
+ prompt = TTY::Prompt.new
+ write_config = false
+ message = "Job '#{name}' exists already. Ovewrite it?"
+ begin
+ write_config = prompt.yes?(message)
+ rescue TTY::Reader::InputInterrupt
+ end
+ end
+
+ if write_config
+ Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
+ cli_exit(message: "Job saved. Run it with `$chronicle-etl jobs:run #{name}`")
+ else
+ cli_fail(message: "\nJob not saved")
+ end
+
  rescue Chronicle::ETL::JobDefinitionError => e
  cli_fail(message: "Job definition error", exception: e)
  end

  desc "show", "Show details about a job"
  # Show an ETL job
- def show
- job_definition = build_job_definition(options)
+ def show(name = nil)
+ job_definition = build_job_definition(name, options)
  job_definition.validate!
  puts Chronicle::ETL::Job.new(job_definition)
  rescue Chronicle::ETL::JobDefinitionError => e
@@ -136,9 +152,9 @@ LONG_DESC
  end

  # Create job definition by reading config file and then overwriting with flag options
- def build_job_definition(options)
+ def build_job_definition(name, options)
  definition = Chronicle::ETL::JobDefinition.new
- definition.add_config(load_job_config(options[:name]))
+ definition.add_config(load_job_config(name))
  definition.add_config(process_flag_options(options).transform_keys(&:to_sym))
  definition
  end
@@ -28,6 +28,12 @@ module Chronicle
  end
  end

+ def exists?(type, identifier)
+ base = config_pathname_for_type(type)
+ path = base.join("#{identifier}.yml")
+ return path.exist?
+ end
+
  # Returns all jobs available in ~/.config/chronicle/etl/jobs/*.yml
  def available_jobs
  Dir.glob(File.join(config_pathname_for_type("jobs"), "*.yml")).map do |filename|
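Note: the overwrite guard that `save` builds on top of `exists?` can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the gem's code: `config_exists?`, `write_config?`, and the `confirm` callable are hypothetical names, a plain directory argument stands in for `config_pathname_for_type`, and the callable stands in for the interactive `TTY::Prompt` question.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical stand-in for Config.exists?: true when <base_dir>/<identifier>.yml is present.
def config_exists?(base_dir, identifier)
  Pathname.new(base_dir).join("#{identifier}.yml").exist?
end

# Decide whether a save should write the file. `confirm` is a callable that
# stands in for the interactive prompt and returns true or false.
def write_config?(base_dir, name, skip_confirmation: false, confirm: -> { false })
  return true unless config_exists?(base_dir, name) # nothing to overwrite
  return true if skip_confirmation                  # mirrors the -y flag

  confirm.call
end

Dir.mktmpdir do |dir|
  puts write_config?(dir, 'sample')                           # no file yet
  File.write(File.join(dir, 'sample.yml'), "extractor: pinboard\n")
  puts write_config?(dir, 'sample')                           # file exists, prompt declined
  puts write_config?(dir, 'sample', skip_confirmation: true)  # file exists, confirmation skipped
end
```

Keeping the decision separate from the prompt makes the three cases (new job, existing job declined, existing job with confirmation skipped) easy to exercise without a terminal.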
@@ -3,11 +3,13 @@ require 'csv'
  module Chronicle
  module ETL
  class CSVLoader < Chronicle::ETL::Loader
+ include Chronicle::ETL::Loaders::Helpers::StdoutHelper
+
  register_connector do |r|
  r.description = 'CSV'
  end

- setting :output, default: $stdout
+ setting :output
  setting :headers, default: true
  setting :header_row, default: true

@@ -30,16 +32,7 @@ module Chronicle
  csv_options[:headers] = headers
  end

- if @config.output.is_a?(IO)
- # This might seem like a duplication of the default value ($stdout)
- # but it's because rspec overwrites $stdout (in helper #capture) to
- # capture output.
- io = $stdout.dup
- else
- io = File.open(@config.output, "w+")
- end
-
- output = CSV.generate(**csv_options) do |csv|
+ csv_output = CSV.generate(**csv_options) do |csv|
  records.each do |record|
  csv << record
  .transform_keys(&:to_sym)
@@ -48,8 +41,12 @@ module Chronicle
  end
  end

- io.write(output)
- io.close
+ # TODO: just write to io directly
+ if output_to_stdout?
+ write_to_stdout(csv_output)
+ else
+ File.write(@config.output, csv_output)
+ end
  end
  end
  end
@@ -0,0 +1,36 @@
+ require 'tempfile'
+
+ module Chronicle
+ module ETL
+ module Loaders
+ module Helpers
+ module StdoutHelper
+ # TODO: let users use "stdout" as an option for the `output` setting
+ # Assume we're using stdout if no output is specified
+ def output_to_stdout?
+ !@config.output
+ end
+
+ def create_stdout_temp_file
+ file = Tempfile.new('chronicle-stdout')
+ file.unlink
+ file
+ end
+
+ def write_to_stdout_from_temp_file(file)
+ file.rewind
+ write_to_stdout(file.read)
+ end
+
+ def write_to_stdout(output)
+ # We .dup because rspec overwrites $stdout (in helper #capture) to
+ # capture output.
+ stdout = $stdout.dup
+ stdout.write(output)
+ stdout.flush
+ end
+ end
+ end
+ end
+ end
+ end
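Note: the helper buffers output in a temp file that is unlinked right after creation, then rewinds and copies it to stdout at the end. Below is a minimal standalone sketch of that pattern, assuming a POSIX system where an unlinked file stays usable through its open handle. `buffer_then_emit` and its `out:` parameter are hypothetical names introduced for illustration; the helper itself writes to a dup of `$stdout`.

```ruby
require 'tempfile'
require 'stringio'

# Buffer chunks in an anonymous temp file, then copy everything to `out`.
def buffer_then_emit(chunks, out: $stdout)
  file = Tempfile.new('chronicle-stdout')
  # Remove the directory entry immediately. On POSIX the open handle keeps
  # working; on Windows this call may silently do nothing.
  file.unlink
  chunks.each { |chunk| file.write(chunk) }
  file.rewind
  out.write(file.read)
  out.flush
ensure
  file&.close
end

# Usage: collect the output in a StringIO instead of the real stdout.
sink = StringIO.new
buffer_then_emit(["{\"id\":1}\n", "{\"id\":2}\n"], out: sink)
print sink.string
```

Unlinking early means no stray file is left behind if the process dies mid-run, at the cost of holding the whole output on disk until the final copy.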
@@ -1,22 +1,35 @@
+ require 'tempfile'
+
  module Chronicle
  module ETL
  class JSONLoader < Chronicle::ETL::Loader
+ include Chronicle::ETL::Loaders::Helpers::StdoutHelper
+
  register_connector do |r|
  r.description = 'json'
  end

  setting :serializer
- setting :output, default: $stdout
+ setting :output
+
+ # If true, one JSON record per line. If false, output a single json
+ # object with an array of records
+ setting :line_separated, default: true, type: :boolean
+
+ def initialize(*args)
+ super
+ @first_line = true
+ end

  def start
- if @config.output.is_a?(IO)
- # This might seem like a duplication of the default value ($stdout)
- # but it's because rspec overwrites $stdout (in helper #capture) to
- # capture output.
- @output = $stdout.dup
- else
- @output = File.open(@config.output, "w+")
- end
+ @output_file =
+ if output_to_stdout?
+ create_stdout_temp_file
+ else
+ File.open(@config.output, "w+")
+ end
+
+ @output_file.puts("[\n") unless @config.line_separated
  end

  def load(record)
@@ -30,15 +43,34 @@ module Chronicle

  force_utf8(value)
  end
- @output.puts encoded.to_json
+
+ line = encoded.to_json
+ # For line-separated output, we just put json + newline
+ if @config.line_separated
+ line = "#{line}\n"
+ # Otherwise, we add a comma and newline and then add record to the
+ # array we created in #start (unless it's the first line).
+ else
+ line = ",\n#{line}" unless @first_line
+ end
+
+ @output_file.write(line)
+
+ @first_line = false
  end

  def finish
- @output.close if @output.is_a?(IO)
+ # Close the array unless we're doing line-separated JSON
+ @output_file.puts("\n]") unless @config.line_separated
+
+ write_to_stdout_from_temp_file(@output_file) if output_to_stdout?
+
+ @output_file.close
  end

  private

+ # TODO: implement this
  def serializer
  @config.serializer || Chronicle::ETL::RawSerializer
  end
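Note: the two output modes selected by `line_separated` can be compared with a small standalone sketch. This is an illustration only, not the loader itself: `JsonWriterSketch` and `render_records` are hypothetical names, plain hashes stand in for serialized records, and a `StringIO` replaces the output file.

```ruby
require 'json'
require 'stringio'

# Hypothetical stand-in for the loader's start/load/finish write logic.
class JsonWriterSketch
  def initialize(io, line_separated: true)
    @io = io
    @line_separated = line_separated
    @first_line = true
  end

  # Open the JSON array in array mode; nothing to do for line-separated output.
  def start
    @io.puts('[') unless @line_separated
  end

  def load(record)
    line = record.to_json
    if @line_separated
      line = "#{line}\n"                     # one JSON document per line
    else
      line = ",\n#{line}" unless @first_line # comma goes before every record but the first
    end
    @io.write(line)
    @first_line = false
  end

  # Close the array in array mode.
  def finish
    @io.puts("\n]") unless @line_separated
  end
end

# Run a list of records through the writer and return the resulting text.
def render_records(records, line_separated:)
  io = StringIO.new
  writer = JsonWriterSketch.new(io, line_separated: line_separated)
  writer.start
  records.each { |record| writer.load(record) }
  writer.finish
  io.string
end

records = [{ 'id' => 1 }, { 'id' => 2 }]
puts render_records(records, line_separated: true)
puts render_records(records, line_separated: false)
```

Writing the separator before each record rather than after it avoids a trailing comma, so the array stays valid JSON without knowing in advance which record is last.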
@@ -1,4 +1,5 @@
  require_relative 'helpers/encoding_helper'
+ require_relative 'helpers/stdout_helper'

  module Chronicle
  module ETL
@@ -1,5 +1,5 @@
  module Chronicle
  module ETL
- VERSION = "0.5.1"
+ VERSION = "0.5.2"
  end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: chronicle-etl
  version: !ruby/object:Gem::Version
- version: 0.5.1
+ version: 0.5.2
  platform: ruby
  authors:
  - Andrew Louis
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2022-03-25 00:00:00.000000000 Z
+ date: 2022-03-30 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activesupport
@@ -396,6 +396,7 @@ files:
  - lib/chronicle/etl/job_logger.rb
  - lib/chronicle/etl/loaders/csv_loader.rb
  - lib/chronicle/etl/loaders/helpers/encoding_helper.rb
+ - lib/chronicle/etl/loaders/helpers/stdout_helper.rb
  - lib/chronicle/etl/loaders/json_loader.rb
  - lib/chronicle/etl/loaders/loader.rb
  - lib/chronicle/etl/loaders/rest_loader.rb