chronicle-etl 0.5.1 → 0.5.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 2e9f565004fce3539bdcfec9b98f64e33d8d4f1dde532f573adf54bfb61eadfa
- data.tar.gz: 81da054f627ae084d45b4fe53480d5ef26fadf9cfd3ad4eb53d908741d8214e4
+ metadata.gz: b8faa084cfe4a9f080ee5494c69b268b78bfa8f3502354e740264e6941f13daf
+ data.tar.gz: 1bf4f2751c71cadedc78a2fe3ed5b09bf86cd601a909e2fa2db0a0de8cc2c21d
  SHA512:
- metadata.gz: ffe35b9c55a442610b9832b7f918f16dff0ccad3245248f76c52afbd88c43d671e8724f72a8d181a13e3cafffae6bc97f5963f65eb5c2a0ddb7080d9b6f1d145
- data.tar.gz: c063698040b906f99e57cc3f9e829ff8b8ca1d4e5314a3a4179dc965c17ba70da04155d01b0b1fadf573e6a110a69862e9651435bdcdb1160bfe84cb8a55a10a
+ metadata.gz: ff10779b663a3321b779fb03e07249856174d96fb96e405ae906a47441c288d6a245c852525801ba250cce1125cf05c523ef4ec75fdfb4335cef9003091437ed
+ data.tar.gz: 509f6f92e95341d212c54b6b000bc54e8ba03898497191a3e5d3b14db7bff3ed625d0fee403888fbb6103c1edc14de66b215d9aa84ddb68cefcf51c0e6c74138
data/README.md CHANGED
@@ -16,12 +16,21 @@ If you don’t want to spend all your time writing scrapers, reverse-engineering
  * **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.
 
  ## Installation
+
+ Using homebrew:
+ ```sh
+ $ brew install chronicle-app/etl/chronicle-etl
+
+ ```
+ Using rubygems:
  ```sh
- # Install chronicle-etl
- gem install chronicle-etl
+ $ gem install chronicle-etl
  ```
 
- After installation, the `chronicle-etl` command will be available in your shell. Homebrew support [is coming soon](https://github.com/chronicle-app/chronicle-etl/issues/13).
+ Confirm it installed successfully:
+ ```sh
+ $ chronicle-etl --version
+ ```
 
  ## Basic usage and running jobs
 
@@ -50,7 +59,6 @@ $ chronicle-etl -e pinboard --since 1mo # Used automatically based on plugin nam
  ### Common options
  ```sh
  Options:
- -j, [--name=NAME] # Job configuration name
  -e, [--extractor=NAME] # Extractor class. Default: stdin
  [--extractor-opts=key:value] # Extractor options
  -t, [--transformer=NAME] # Transformer class. Default: null
@@ -71,6 +79,26 @@ Options:
  [--silent], [--no-silent] # Silence all output
  ```
 
+ ### Saving jobs
+
+ You can save details about a job to a local config file (saved by default in `~/.config/chronicle/etl/jobs/job_name.yml`) to save yourself the trouble of setting the CLI flags for each run.
+
+ ```sh
+ # Save a job named 'sample' to ~/.config/chronicle/etl/jobs/sample.yml
+ $ chronicle-etl jobs:save sample --extractor pinboard --since 10d
+
+ # Show details about the job
+ $ chronicle-etl jobs:show sample
+
+ # Run the job
+ $ chronicle-etl jobs:run sample
+ # Or more simply:
+ $ chronicle-etl sample
+
+ # Show all saved jobs
+ $ chronicle-etl jobs:list
+ ```
+
  ## Connectors
  Connectors are available to read, process, and load data from different formats or external services.
 
@@ -97,7 +125,7 @@ $ chronicle-etl connectors:list
  - [`rest`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/rest_loader.rb) - Serialize records with [JSONAPI](https://jsonapi.org/) and send to a REST API
 
  ## Chronicle Plugins
- Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through `$ gem install` or through the CLI itself.
+ Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through the CLI (which installs the Gems under the hood).
 
  ### Plugin usage
 
@@ -131,6 +159,7 @@ If you don't see a plugin for a third-party provider or data source that you're
  | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
  | [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
  | [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
+ | [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |
 
  #### Coming soon
 
@@ -206,7 +235,6 @@ $ chronicle-etl secrets:unset pinboard access_token
 
  ## Roadmap
 
- - Add **homebrew formula** for easier installation. #13
  - Keep tackling **new plugins**. See: [Chronicle Plugin Tracker](https://github.com/orgs/chronicle-app/projects/1)
  - Add support for **incremental extractions** #37
  - **Improve stdin extractor and shell command transformer** (#5) so that users can easily integrate their own scripts/tools into jobs
@@ -9,8 +9,6 @@ module Chronicle
  default_task "start"
  namespace :jobs
 
- class_option :name, aliases: '-j', desc: 'Job configuration name'
-
  class_option :extractor, aliases: '-e', desc: "Extractor class. Default: stdin", banner: 'NAME'
  class_option :'extractor-opts', desc: 'Extractor options', type: :hash, default: {}
  class_option :transformer, aliases: '-t', desc: 'Transformer class. Default: null', banner: 'NAME'
@@ -44,8 +42,8 @@ module Chronicle
  If you do not want to use the command line flags, you can also configure a job with a .yml config file. You can either specify the path to this file or use the filename and place the file in ~/.config/chronicle/etl/jobs/NAME.yml and call it with `--job NAME`
  LONG_DESC
  # Run an ETL job
- def start
- job_definition = build_job_definition(options)
+ def start(name = nil)
+ job_definition = build_job_definition(name, options)
 
  if job_definition.plugins_missing?
  missing_plugins = job_definition.errors[:plugins]
@@ -64,21 +62,39 @@ LONG_DESC
  cli_fail(message: "Error running job.\n#{message}", exception: e)
  end
 
- desc "create", "Create a job"
+ option :'skip-confirmation', aliases: '-y', type: :boolean
+ desc "save", "Save a job"
  # Create an ETL job
- def create
- job_definition = build_job_definition(options)
+ def save(name)
+ write_config = true
+ job_definition = build_job_definition(name, options)
  job_definition.validate!
 
- Chronicle::ETL::Config.write("jobs", options[:name], job_definition.definition)
+ if Chronicle::ETL::Config.exists?("jobs", name) && !options[:'skip-confirmation']
+ prompt = TTY::Prompt.new
+ write_config = false
+ message = "Job '#{name}' exists already. Ovewrite it?"
+ begin
+ write_config = prompt.yes?(message)
+ rescue TTY::Reader::InputInterrupt
+ end
+ end
+
+ if write_config
+ Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
+ cli_exit(message: "Job saved. Run it with `$chronicle-etl jobs:run #{name}`")
+ else
+ cli_fail(message: "\nJob not saved")
+ end
+
  rescue Chronicle::ETL::JobDefinitionError => e
  cli_fail(message: "Job definition error", exception: e)
  end
 
  desc "show", "Show details about a job"
  # Show an ETL job
- def show
- job_definition = build_job_definition(options)
+ def show(name = nil)
+ job_definition = build_job_definition(name, options)
  job_definition.validate!
  puts Chronicle::ETL::Job.new(job_definition)
  rescue Chronicle::ETL::JobDefinitionError => e
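The overwrite guard added to `save` can be read as a small decision function. The following is a minimal sketch of that logic, not the gem's code: `should_write?` and its keyword arguments are hypothetical names, and a lambda stands in for the interactive `prompt.yes?` call so the example runs without a terminal.

```ruby
# Hypothetical helper mirroring the decision made in `save`:
# write immediately unless the job already exists and confirmation
# has not been skipped; otherwise defer to the user's answer.
def should_write?(exists:, skip_confirmation:, confirm:)
  return true unless exists && !skip_confirmation

  # `confirm` stands in for an interactive yes/no prompt.
  confirm.call
end

decline = -> { false }

puts should_write?(exists: false, skip_confirmation: false, confirm: decline)
puts should_write?(exists: true,  skip_confirmation: true,  confirm: decline)
puts should_write?(exists: true,  skip_confirmation: false, confirm: decline)
```

In the diff, an interrupted prompt leaves `write_config` at `false`, which corresponds to the declining lambda here.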
@@ -136,9 +152,9 @@ LONG_DESC
  end
 
  # Create job definition by reading config file and then overwriting with flag options
- def build_job_definition(options)
+ def build_job_definition(name, options)
  definition = Chronicle::ETL::JobDefinition.new
- definition.add_config(load_job_config(options[:name]))
+ definition.add_config(load_job_config(name))
  definition.add_config(process_flag_options(options).transform_keys(&:to_sym))
  definition
  end
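The comment above `build_job_definition` describes a layering order: the saved config is loaded first and flag options are applied on top. A rough sketch of that ordering follows. It assumes a shallow merge and a flat config shape, neither of which is confirmed by the diff; `merge_job_config` and the sample keys are hypothetical.

```ruby
# Hypothetical helper: `file_config` stands in for a saved job file and
# `flag_options` for parsed CLI flags with string keys.
def merge_job_config(file_config, flag_options)
  # Assumption: a shallow merge in which flag values overwrite file values.
  file_config.merge(flag_options.transform_keys(&:to_sym))
end

saved = { extractor: 'pinboard', since: '10d' }
flags = { 'since' => '1mo' }

p merge_job_config(saved, flags)
```

The real `add_config` may merge nested settings differently; the point is only that the later call wins.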
@@ -28,6 +28,12 @@ module Chronicle
  end
  end
 
+ def exists?(type, identifier)
+ base = config_pathname_for_type(type)
+ path = base.join("#{identifier}.yml")
+ return path.exist?
+ end
+
  # Returns all jobs available in ~/.config/chronicle/etl/jobs/*.yml
  def available_jobs
  Dir.glob(File.join(config_pathname_for_type("jobs"), "*.yml")).map do |filename|
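The new `exists?` check amounts to a `Pathname` lookup for `<identifier>.yml` under a base directory. Here is a self-contained sketch of the same idea, using a temporary directory in place of `~/.config/chronicle/etl/jobs`; `config_exists?` is a hypothetical stand-in, not the gem's method.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical stand-in for Config.exists?: look for <base>/<identifier>.yml.
def config_exists?(base, identifier)
  Pathname.new(base).join("#{identifier}.yml").exist?
end

Dir.mktmpdir do |dir|
  # An empty file is enough; only the file's presence is checked.
  File.write(File.join(dir, 'sample.yml'), '')

  puts config_exists?(dir, 'sample')
  puts config_exists?(dir, 'missing')
end
```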
@@ -3,11 +3,13 @@ require 'csv'
  module Chronicle
  module ETL
  class CSVLoader < Chronicle::ETL::Loader
+ include Chronicle::ETL::Loaders::Helpers::StdoutHelper
+
  register_connector do |r|
  r.description = 'CSV'
  end
 
- setting :output, default: $stdout
+ setting :output
  setting :headers, default: true
  setting :header_row, default: true
 
@@ -30,16 +32,7 @@ module Chronicle
  csv_options[:headers] = headers
  end
 
- if @config.output.is_a?(IO)
- # This might seem like a duplication of the default value ($stdout)
- # but it's because rspec overwrites $stdout (in helper #capture) to
- # capture output.
- io = $stdout.dup
- else
- io = File.open(@config.output, "w+")
- end
-
- output = CSV.generate(**csv_options) do |csv|
+ csv_output = CSV.generate(**csv_options) do |csv|
  records.each do |record|
  csv << record
  .transform_keys(&:to_sym)
@@ -48,8 +41,12 @@ module Chronicle
  end
  end
 
- io.write(output)
- io.close
+ # TODO: just write to io directly
+ if output_to_stdout?
+ write_to_stdout(csv_output)
+ else
+ File.write(@config.output, csv_output)
+ end
  end
  end
  end
@@ -0,0 +1,36 @@
+ require 'tempfile'
+
+ module Chronicle
+ module ETL
+ module Loaders
+ module Helpers
+ module StdoutHelper
+ # TODO: let users use "stdout" as an option for the `output` setting
+ # Assume we're using stdout if no output is specified
+ def output_to_stdout?
+ !@config.output
+ end
+
+ def create_stdout_temp_file
+ file = Tempfile.new('chronicle-stdout')
+ file.unlink
+ file
+ end
+
+ def write_to_stdout_from_temp_file(file)
+ file.rewind
+ write_to_stdout(file.read)
+ end
+
+ def write_to_stdout(output)
+ # We .dup because rspec overwrites $stdout (in helper #capture) to
+ # capture output.
+ stdout = $stdout.dup
+ stdout.write(output)
+ stdout.flush
+ end
+ end
+ end
+ end
+ end
+ end
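The helper buffers output in a `Tempfile` that is unlinked right after creation, then rewinds and reads it back when it is time to print. The sketch below isolates that pattern under the assumption of a POSIX system, where an unlinked file stays usable through its open handle; `buffer_and_read` is a hypothetical name used only for illustration.

```ruby
require 'tempfile'

# Hypothetical helper: buffer lines in an anonymous temp file and
# return everything that was written.
def buffer_and_read(lines)
  buffer = Tempfile.new('chronicle-stdout')
  # Remove the directory entry; the open handle remains usable on POSIX.
  buffer.unlink

  lines.each { |line| buffer.write(line) }

  # Rewind before reading back what was buffered.
  buffer.rewind
  buffer.read
ensure
  buffer&.close
end

$stdout.write(buffer_and_read(["first record\n", "second record\n"]))
```

Unlinking early means no stray file is left behind if the process exits before cleanup.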
@@ -1,22 +1,35 @@
+ require 'tempfile'
+
  module Chronicle
  module ETL
  class JSONLoader < Chronicle::ETL::Loader
+ include Chronicle::ETL::Loaders::Helpers::StdoutHelper
+
  register_connector do |r|
  r.description = 'json'
  end
 
  setting :serializer
- setting :output, default: $stdout
+ setting :output
+
+ # If true, one JSON record per line. If false, output a single json
+ # object with an array of records
+ setting :line_separated, default: true, type: :boolean
+
+ def initialize(*args)
+ super
+ @first_line = true
+ end
 
  def start
- if @config.output.is_a?(IO)
- # This might seem like a duplication of the default value ($stdout)
- # but it's because rspec overwrites $stdout (in helper #capture) to
- # capture output.
- @output = $stdout.dup
- else
- @output = File.open(@config.output, "w+")
- end
+ @output_file =
+ if output_to_stdout?
+ create_stdout_temp_file
+ else
+ File.open(@config.output, "w+")
+ end
+
+ @output_file.puts("[\n") unless @config.line_separated
  end
 
  def load(record)
@@ -30,15 +43,34 @@ module Chronicle
 
  force_utf8(value)
  end
- @output.puts encoded.to_json
+
+ line = encoded.to_json
+ # For line-separated output, we just put json + newline
+ if @config.line_separated
+ line = "#{line}\n"
+ # Otherwise, we add a comma and newline and then add record to the
+ # array we created in #start (unless it's the first line).
+ else
+ line = ",\n#{line}" unless @first_line
+ end
+
+ @output_file.write(line)
+
+ @first_line = false
  end
 
  def finish
- @output.close if @output.is_a?(IO)
+ # Close the array unless we're doing line-separated JSON
+ @output_file.puts("\n]") unless @config.line_separated
+
+ write_to_stdout_from_temp_file(@output_file) if output_to_stdout?
+
+ @output_file.close
  end
 
  private
 
+ # TODO: implement this
  def serializer
  @config.serializer || Chronicle::ETL::RawSerializer
  end
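The two output modes introduced by `line_separated` differ only in how records are delimited: a newline after each record, or a comma before every record except the first, wrapped in `[` and `]`. The sketch below reproduces that streaming logic against a `StringIO`. It is an illustration under assumptions, not the loader itself: `write_json` is a hypothetical helper and the sample records are made up.

```ruby
require 'json'
require 'stringio'

# Hypothetical helper: stream records either as one JSON document per
# line or as a single JSON array, without holding all output in memory.
def write_json(records, io, line_separated: true)
  io.puts("[") unless line_separated

  first = true
  records.each do |record|
    line = record.to_json
    if line_separated
      line = "#{line}\n"
    elsif !first
      # Prefix every record after the first with a separator.
      line = ",\n#{line}"
    end
    io.write(line)
    first = false
  end

  io.puts("\n]") unless line_separated
end

out = StringIO.new
write_json([{ id: 1 }, { id: 2 }], out, line_separated: false)
puts out.string
```

Writing the comma before each subsequent record, rather than after each record, avoids a trailing comma without needing to know which record is last.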
@@ -1,4 +1,5 @@
  require_relative 'helpers/encoding_helper'
+ require_relative 'helpers/stdout_helper'
 
  module Chronicle
  module ETL
@@ -1,5 +1,5 @@
  module Chronicle
  module ETL
- VERSION = "0.5.1"
+ VERSION = "0.5.2"
  end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: chronicle-etl
  version: !ruby/object:Gem::Version
- version: 0.5.1
+ version: 0.5.2
  platform: ruby
  authors:
  - Andrew Louis
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2022-03-25 00:00:00.000000000 Z
+ date: 2022-03-30 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activesupport
@@ -396,6 +396,7 @@ files:
  - lib/chronicle/etl/job_logger.rb
  - lib/chronicle/etl/loaders/csv_loader.rb
  - lib/chronicle/etl/loaders/helpers/encoding_helper.rb
+ - lib/chronicle/etl/loaders/helpers/stdout_helper.rb
  - lib/chronicle/etl/loaders/json_loader.rb
  - lib/chronicle/etl/loaders/loader.rb
  - lib/chronicle/etl/loaders/rest_loader.rb