chronicle-etl 0.5.0 → 0.5.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 951fca4c6238d773ec8bc2b9ea474a0cffdabf0c2f5d0c925f78b91b35836224
4
- data.tar.gz: 908a7f01fb215cca9936f072b71315c3b62e0d00b3c8f7ffd938682a4cabe42c
3
+ metadata.gz: d0f305f15f4eda7a5851dfff2155da2c12ee010d4619346a13551f298d5b7991
4
+ data.tar.gz: d44f82b2bd06521740ad2b0e58cad0db840884fc5616858ef857d78fccb2b5dd
5
5
  SHA512:
6
- metadata.gz: 28fc97935e5bd9538877a2057f3201170fdb1eb574385ae6d94901b21abfa5f923618d5fb2caf94395503ec70c0052b607a939b363f27630aaca26df6ca93722
7
- data.tar.gz: 0b8e4dedb79e6cbd23487e2c4482d9a8ad9d1653e015593e6b83cac854d94a6cd4702862eebb11ec9f41e63b774eb6f41db929a1ceb12055af6cb08209a6b8eb
6
+ metadata.gz: 99214409831e2799dffe2e3b096e9406222cf571d4e49fe71d3e3c645ad635e73c3aa42cb6af6569431064ce2750dc38ba051122a826b9feb6b21724ebd31db8
7
+ data.tar.gz: 77a30ecb069906b0e992adbcb1b7470642bcda65b8767dd8ba0145d63e834cb83e2a0a7bc5212edcf8c5028c637b2edbd8476b559f8e28765267516308b160eb
data/README.md CHANGED
@@ -8,20 +8,36 @@ Are you trying to archive your digital history or incorporate it into your own p
8
8
 
9
9
  If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this project is for you! (*If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues).*)
10
10
 
11
- **`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a source (e.g. a CSV file, JSON, external API).
11
+ **`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a destination (e.g. a CSV file, JSON, external API).
12
12
 
13
13
  ## What does `chronicle-etl` give you?
14
14
  * **CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
15
15
  * **Plugins for many third-party providers**. A plugin system allows you to access data from third-party providers and hook it into the shared CLI infrastructure.
16
16
  * **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.
17
17
 
18
+ ## Chronicle-ETL in action
19
+
20
+ ![demo](https://user-images.githubusercontent.com/6291/161410839-b5ce931a-2353-4585-b530-929f46e3f960.svg)
21
+
22
+ ### Longer screencast
23
+
24
+ [![asciicast](https://asciinema.org/a/483455.svg)](https://asciinema.org/a/483455)
25
+
18
26
  ## Installation
27
+
28
+ Using homebrew:
19
29
  ```sh
20
- # Install chronicle-etl
21
- gem install chronicle-etl
30
+ $ brew install chronicle-app/etl/chronicle-etl
31
+ ```
32
+ Using rubygems:
33
+ ```sh
34
+ $ gem install chronicle-etl
22
35
  ```
23
36
 
24
- After installation, the `chronicle-etl` command will be available in your shell. Homebrew support [is coming soon](https://github.com/chronicle-app/chronicle-etl/issues/13).
37
+ Confirm it installed successfully:
38
+ ```sh
39
+ $ chronicle-etl --version
40
+ ```
25
41
 
26
42
  ## Basic usage and running jobs
27
43
 
@@ -33,7 +49,7 @@ $ chronicle-etl help
33
49
  $ chronicle-etl --extractor NAME --transformer NAME --loader NAME
34
50
 
35
51
  # Read data.csv and display it to stdout as a table
36
- $ chronicle-etl --extractor csv --input ./data.csv --loader table
52
+ $ chronicle-etl --extractor csv --input data.csv --loader table
37
53
 
38
54
  # Retrieve shell commands run in the last 5 hours
39
55
  $ chronicle-etl -e shell --since 5h
@@ -50,7 +66,6 @@ $ chronicle-etl -e pinboard --since 1mo # Used automatically based on plugin nam
50
66
  ### Common options
51
67
  ```sh
52
68
  Options:
53
- -j, [--name=NAME] # Job configuration name
54
69
  -e, [--extractor=NAME] # Extractor class. Default: stdin
55
70
  [--extractor-opts=key:value] # Extractor options
56
71
  -t, [--transformer=NAME] # Transformer class. Default: null
@@ -71,6 +86,26 @@ Options:
71
86
  [--silent], [--no-silent] # Silence all output
72
87
  ```
73
88
 
89
+ ### Saving jobs
90
+
91
+ You can save details about a job to a local config file (saved by default in `~/.config/chronicle/etl/jobs/job_name.yml`) to save yourself the trouble of setting the CLI flags for each run.
92
+
93
+ ```sh
94
+ # Save a job named 'sample' to ~/.config/chronicle/etl/jobs/sample.yml
95
+ $ chronicle-etl jobs:save sample --extractor pinboard --since 10d
96
+
97
+ # Show details about the job
98
+ $ chronicle-etl jobs:show sample
99
+
100
+ # Run the job
101
+ $ chronicle-etl jobs:run sample
102
+ # Or more simply:
103
+ $ chronicle-etl sample
104
+
105
+ # Show all saved jobs
106
+ $ chronicle-etl jobs:list
107
+ ```
108
+
74
109
  ## Connectors
75
110
  Connectors are available to read, process, and load data from different formats or external services.
76
111
 
@@ -97,7 +132,7 @@ $ chronicle-etl connectors:list
97
132
  - [`rest`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/rest_loader.rb) - Serialize records with [JSONAPI](https://jsonapi.org/) and send to a REST API
98
133
 
99
134
  ## Chronicle Plugins
100
- Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through `$ gem install` or through the CLI itself.
135
+ Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through the CLI (which installs the Gems under the hood).
101
136
 
102
137
  ### Plugin usage
103
138
 
@@ -126,11 +161,12 @@ If you don't see a plugin for a third-party provider or data source that you're
126
161
 
127
162
  | Name | Description | Availability |
128
163
  |-----------------------------------------------------------------|---------------------------------------------------------------------------------------------|----------------------------------|
164
+ | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available |
165
+ | [github](https://github.com/chronicle-app/chronicle-github) | GitHub activity stream | Available |
129
166
  | [imessage](https://github.com/chronicle-app/chronicle-imessage) | iMessage messages and attachments | Available |
130
- | [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
131
- | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
132
167
  | [pinboard](https://github.com/chronicle-app/chronicle-pinboard) | Bookmarks and tags | Available |
133
168
  | [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
169
+ | [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
134
170
 
135
171
  #### Coming soon
136
172
 
@@ -206,7 +242,6 @@ $ chronicle-etl secrets:unset pinboard access_token
206
242
 
207
243
  ## Roadmap
208
244
 
209
- - Add **homebrew formula** for easier installation. #13
210
245
  - Keep tackling **new plugins**. See: [Chronicle Plugin Tracker](https://github.com/orgs/chronicle-app/projects/1)
211
246
  - Add support for **incremental extractions** #37
212
247
  - **Improve stdin extractor and shell command transformer** (#5) so that users can easily integrate their own scripts/tools into jobs
@@ -6,6 +6,7 @@ module Chronicle
6
6
  no_commands do
7
7
  # Shorthand for cli_exit(status: :failure)
8
8
  def cli_fail(message: nil, exception: nil)
9
+ message = "#{message}\nRe-run the command with --verbose to see details." if Chronicle::ETL::Logger.log_level > Chronicle::ETL::Logger::DEBUG
9
10
  cli_exit(status: :failure, message: message, exception: exception)
10
11
  end
11
12
 
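Note on the hunk above: `cli_fail` declares `message: nil`, so any suffix has to be appended in a way that tolerates a missing message (calling `+` on `nil` raises `NoMethodError`). A minimal sketch of a nil-safe variant follows; `with_verbose_hint` and its `show_hint:` parameter are hypothetical names used only for illustration and are not part of chronicle-etl.

```ruby
# Hypothetical helper (not part of chronicle-etl): append the --verbose hint
# to a message that may be nil, without raising NoMethodError.
def with_verbose_hint(message, show_hint: true)
  hint = 'Re-run the command with --verbose to see details.'
  return message unless show_hint

  # compact drops a nil message, so the hint stands alone in that case
  [message, hint].compact.join("\n")
end

puts with_verbose_hint('Error running job.')
puts with_verbose_hint(nil)
```

In the real CLI the `show_hint` decision would come from the logger's level comparison shown in the diff.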
@@ -9,8 +9,6 @@ module Chronicle
9
9
  default_task "start"
10
10
  namespace :jobs
11
11
 
12
- class_option :name, aliases: '-j', desc: 'Job configuration name'
13
-
14
12
  class_option :extractor, aliases: '-e', desc: "Extractor class. Default: stdin", banner: 'NAME'
15
13
  class_option :'extractor-opts', desc: 'Extractor options', type: :hash, default: {}
16
14
  class_option :transformer, aliases: '-t', desc: 'Transformer class. Default: null', banner: 'NAME'
@@ -44,8 +42,17 @@ module Chronicle
44
42
  If you do not want to use the command line flags, you can also configure a job with a .yml config file. You can either specify the path to this file or use the filename and place the file in ~/.config/chronicle/etl/jobs/NAME.yml and run it with `$ chronicle-etl jobs:run NAME`
45
43
  LONG_DESC
46
44
  # Run an ETL job
47
- def start
48
- job_definition = build_job_definition(options)
45
+ def start(name = nil)
46
+ # If someone runs `$ chronicle-etl` with no arguments, show help menu.
47
+ # TODO: decide if we should check that there's nothing in stdin pipe
48
+ # in case user wants to actually run this sort of job stdin->null->stdout
49
+ if name.nil? && options[:extractor].nil?
50
+ m = Chronicle::ETL::CLI::Main.new
51
+ m.help
52
+ cli_exit
53
+ end
54
+
55
+ job_definition = build_job_definition(name, options)
49
56
 
50
57
  if job_definition.plugins_missing?
51
58
  missing_plugins = job_definition.errors[:plugins]
@@ -59,26 +66,43 @@ LONG_DESC
59
66
  rescue Chronicle::ETL::JobDefinitionError => e
60
67
  message = ""
61
68
  job_definition.errors.each_pair do |category, errors|
62
- message << "Problem with #{category}:\n - #{errors.map(&:to_s).join("\n -")}"
69
+ message << "Problem with #{category}:\n - #{errors.map(&:to_s).join("\n - ")}"
63
70
  end
64
71
  cli_fail(message: "Error running job.\n#{message}", exception: e)
65
72
  end
66
73
 
67
- desc "create", "Create a job"
74
+ option :'skip-confirmation', aliases: '-y', type: :boolean
75
+ desc "save", "Save a job"
68
76
  # Create an ETL job
69
- def create
70
- job_definition = build_job_definition(options)
77
+ def save(name)
78
+ write_config = true
79
+ job_definition = build_job_definition(name, options)
71
80
  job_definition.validate!
72
81
 
73
- Chronicle::ETL::Config.write("jobs", options[:name], job_definition.definition)
82
+ if Chronicle::ETL::Config.exists?("jobs", name) && !options[:'skip-confirmation']
83
+ prompt = TTY::Prompt.new
84
+ write_config = false
85
+ message = "Job '#{name}' exists already. Overwrite it?"
86
+ begin
87
+ write_config = prompt.yes?(message)
88
+ rescue TTY::Reader::InputInterrupt
89
+ end
90
+ end
91
+
92
+ if write_config
93
+ Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
94
+ cli_exit(message: "Job saved. Run it with `$ chronicle-etl jobs:run #{name}`")
95
+ else
96
+ cli_fail(message: "\nJob not saved")
97
+ end
74
98
  rescue Chronicle::ETL::JobDefinitionError => e
75
99
  cli_fail(message: "Job definition error", exception: e)
76
100
  end
77
101
 
78
102
  desc "show", "Show details about a job"
79
103
  # Show an ETL job
80
- def show
81
- job_definition = build_job_definition(options)
104
+ def show(name = nil)
105
+ job_definition = build_job_definition(name, options)
82
106
  job_definition.validate!
83
107
  puts Chronicle::ETL::Job.new(job_definition)
84
108
  rescue Chronicle::ETL::JobDefinitionError => e
@@ -112,12 +136,16 @@ LONG_DESC
112
136
  private
113
137
 
114
138
  def run_job(job_definition)
139
+ # FIXME: have to validate here so next method can work. This is clumsy
140
+ job_definition.validate!
115
141
  # FIXME: clumsy to make CLI responsible for setting secrets here. Think about a better way to do this
116
142
  job_definition.apply_default_secrets
117
143
 
118
144
  job = Chronicle::ETL::Job.new(job_definition)
119
145
  runner = Chronicle::ETL::Runner.new(job)
120
146
  runner.run!
147
+ rescue RunnerError => e
148
+ cli_fail(message: "#{e.message}", exception: e)
121
149
  end
122
150
 
123
151
  # TODO: probably could merge this with something in cli/plugin
@@ -134,9 +162,9 @@ LONG_DESC
134
162
  end
135
163
 
136
164
  # Create job definition by reading config file and then overwriting with flag options
137
- def build_job_definition(options)
165
+ def build_job_definition(name, options)
138
166
  definition = Chronicle::ETL::JobDefinition.new
139
- definition.add_config(load_job_config(options[:name]))
167
+ definition.add_config(load_job_config(name))
140
168
  definition.add_config(process_flag_options(options).transform_keys(&:to_sym))
141
169
  definition
142
170
  end
@@ -54,24 +54,40 @@ module Chronicle
54
54
  klass, task = ::Thor::Util.find_class_and_task_by_namespace("#{meth}:#{meth}")
55
55
  klass.start(['-h', task].compact, shell: shell)
56
56
  else
57
- shell.say "ABOUT".bold
58
- shell.say " #{'chronicle-etl'.italic} is a utility tool for #{'extracting'.underline}, #{'transforming'.underline}, and #{'loading'.underline} personal data."
57
+ shell.say "ABOUT:".bold
58
+ shell.say " #{'chronicle-etl'.italic} is a toolkit for extracting and working with your digital"
59
+ shell.say " history. 📜"
59
60
  shell.say
60
- shell.say "USAGE".bold
61
- shell.say " $ chronicle-etl COMMAND"
61
+ shell.say " A job #{'extracts'.underline} personal data from a source, #{'transforms'.underline} it (Chronicle"
62
+ shell.say " Schema or preserves raw data), and then #{'loads'.underline} it to a destination. Use"
63
+ shell.say " built-in extractors (json, csv, stdin) and loaders (csv, json, table,"
64
+ shell.say " rest) or use plugins to connect to third-party services."
62
65
  shell.say
63
- shell.say "EXAMPLES".bold
64
- shell.say " Show available connectors:".italic.light_black
65
- shell.say " $ chronicle-etl connectors:list"
66
+ shell.say " Plugins: https://github.com/chronicle-app/chronicle-etl#currently-available"
66
67
  shell.say
67
- shell.say " Run a simple job:".italic.light_black
68
- shell.say " $ chronicle-etl jobs:run --extractor stdin --transformer null --loader stdout"
68
+ shell.say "USAGE:".bold
69
+ shell.say " # Basic job usage:".italic.light_black
70
+ shell.say " $ chronicle-etl --extractor NAME --transformer NAME --loader NAME"
69
71
  shell.say
70
- shell.say " Show full job options:".italic.light_black
72
+ shell.say " # Read data.csv and display it to stdout as a table:".italic.light_black
73
+ shell.say " $ chronicle-etl --extractor csv --input data.csv --loader table"
74
+ shell.say
75
+ shell.say " # Show available plugins:".italic.light_black
76
+ shell.say " $ chronicle-etl plugins:list"
77
+ shell.say
78
+ shell.say " # Save an access token as a secret and use it in a job:".italic.light_black
79
+ shell.say " $ chronicle-etl secrets:set pinboard access_token username:foo123"
80
+ shell.say " $ chronicle-etl secrets:list"
81
+ shell.say " $ chronicle-etl -e pinboard --since 1mo"
82
+ shell.say
83
+ shell.say " # Show full job options:".italic.light_black
71
84
  shell.say " $ chronicle-etl jobs help run"
85
+ shell.say
86
+ shell.say "FULL DOCUMENTATION:".bold
87
+ shell.say " https://github.com/chronicle-app/chronicle-etl".blue
88
+ shell.say
72
89
 
73
90
  list = []
74
-
75
91
  ::Thor::Util.thor_classes_in(Chronicle::ETL::CLI).each do |thor_class|
76
92
  list += thor_class.printable_tasks(false)
77
93
  end
@@ -79,25 +95,18 @@ module Chronicle
79
95
  list.unshift ["help", "# This help menu"]
80
96
 
81
97
  shell.say
82
- shell.say 'ALL COMMANDS'.bold
98
+ shell.say 'ALL COMMANDS:'.bold
83
99
  shell.print_table(list, indent: 2, truncate: true)
84
100
  shell.say
85
- shell.say "VERSION".bold
101
+ shell.say "VERSION:".bold
86
102
  shell.say " #{Chronicle::ETL::VERSION}"
87
103
  shell.say
88
104
  shell.say " Display current version:".italic.light_black
89
105
  shell.say " $ chronicle-etl --version"
90
- shell.say
91
- shell.say "FULL DOCUMENTATION".bold
92
- shell.say " https://github.com/chronicle-app/chronicle-etl".blue
93
- shell.say
94
106
  end
95
107
  end
96
108
 
97
109
  no_commands do
98
- def testb
99
- puts "hi"
100
- end
101
110
  def set_color_output
102
111
  String.disable_colorization true if options[:'no-color'] || ENV['NO_COLOR']
103
112
  end
@@ -61,7 +61,7 @@ module Chronicle
61
61
  }
62
62
  end
63
63
 
64
- headers = ['name', 'description', 'latest version'].map{ |h| h.to_s.upcase.bold }
64
+ headers = ['name', 'description', 'version'].map{ |h| h.to_s.upcase.bold }
65
65
  table = TTY::Table.new(headers, info.map(&:values))
66
66
  puts "Installed plugins:"
67
67
  puts table.render(indent: 2, padding: [0, 0])
@@ -28,6 +28,12 @@ module Chronicle
28
28
  end
29
29
  end
30
30
 
31
+ def exists?(type, identifier)
32
+ base = config_pathname_for_type(type)
33
+ path = base.join("#{identifier}.yml")
34
+ return path.exist?
35
+ end
36
+
31
37
  # Returns all jobs available in ~/.config/chronicle/etl/jobs/*.yml
32
38
  def available_jobs
33
39
  Dir.glob(File.join(config_pathname_for_type("jobs"), "*.yml")).map do |filename|
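Note on the hunk above: the new `exists?` check is what lets `jobs:save` ask before overwriting a saved job. The sketch below shows the same decision in isolation. It assumes a `jobs/NAME.yml` layout under some base directory; `job_config_exists?` and `write_job?` are hypothetical names, and the `confirm` lambda stands in for the interactive prompt.

```ruby
require 'pathname'
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for the config lookup: true if jobs/NAME.yml exists.
def job_config_exists?(base_dir, name)
  Pathname.new(base_dir).join('jobs', "#{name}.yml").exist?
end

# Write without asking when nothing would be overwritten or when the user
# passed a skip-confirmation flag; otherwise defer to the confirm callback.
def write_job?(exists:, skip_confirmation:, confirm:)
  return true if !exists || skip_confirmation

  confirm.call
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'jobs'))
  File.write(File.join(dir, 'jobs', 'sample.yml'), "# placeholder\n")

  exists = job_config_exists?(dir, 'sample')
  puts write_job?(exists: exists, skip_confirmation: false, confirm: -> { false })
end
```

Passing the confirmation step in as a callable keeps the decision testable without a terminal.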
@@ -6,6 +6,9 @@ module Chronicle
6
6
 
7
7
  class ConfigError < Error; end
8
8
 
9
+ class RunnerError < Error; end
10
+ class RunInterruptedError < RunnerError; end
11
+
9
12
  class RunnerTypeError < Error; end
10
13
 
11
14
  class JobDefinitionError < Error
@@ -3,11 +3,13 @@ require 'csv'
3
3
  module Chronicle
4
4
  module ETL
5
5
  class CSVLoader < Chronicle::ETL::Loader
6
+ include Chronicle::ETL::Loaders::Helpers::StdoutHelper
7
+
6
8
  register_connector do |r|
7
9
  r.description = 'CSV'
8
10
  end
9
11
 
10
- setting :output, default: $stdout
12
+ setting :output
11
13
  setting :headers, default: true
12
14
  setting :header_row, default: true
13
15
 
@@ -30,16 +32,7 @@ module Chronicle
30
32
  csv_options[:headers] = headers
31
33
  end
32
34
 
33
- if @config.output.is_a?(IO)
34
- # This might seem like a duplication of the default value ($stdout)
35
- # but it's because rspec overwrites $stdout (in helper #capture) to
36
- # capture output.
37
- io = $stdout.dup
38
- else
39
- io = File.open(@config.output, "w+")
40
- end
41
-
42
- output = CSV.generate(**csv_options) do |csv|
35
+ csv_output = CSV.generate(**csv_options) do |csv|
43
36
  records.each do |record|
44
37
  csv << record
45
38
  .transform_keys(&:to_sym)
@@ -48,8 +41,12 @@ module Chronicle
48
41
  end
49
42
  end
50
43
 
51
- io.write(output)
52
- io.close
44
+ # TODO: just write to io directly
45
+ if output_to_stdout?
46
+ write_to_stdout(csv_output)
47
+ else
48
+ File.write(@config.output, csv_output)
49
+ end
53
50
  end
54
51
  end
55
52
  end
@@ -0,0 +1,36 @@
1
+ require 'tempfile'
2
+
3
+ module Chronicle
4
+ module ETL
5
+ module Loaders
6
+ module Helpers
7
+ module StdoutHelper
8
+ # TODO: let users use "stdout" as an option for the `output` setting
9
+ # Assume we're using stdout if no output is specified
10
+ def output_to_stdout?
11
+ !@config.output
12
+ end
13
+
14
+ def create_stdout_temp_file
15
+ file = Tempfile.new('chronicle-stdout')
16
+ file.unlink
17
+ file
18
+ end
19
+
20
+ def write_to_stdout_from_temp_file(file)
21
+ file.rewind
22
+ write_to_stdout(file.read)
23
+ end
24
+
25
+ def write_to_stdout(output)
26
+ # We .dup because rspec overwrites $stdout (in helper #capture) to
27
+ # capture output.
28
+ stdout = $stdout.dup
29
+ stdout.write(output)
30
+ stdout.flush
31
+ end
32
+ end
33
+ end
34
+ end
35
+ end
36
+ end
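Note on the new helper above: it buffers output in a temp file that is unlinked right after creation, then copies the buffer to stdout in one pass. A minimal standalone sketch of that pattern follows. `flush_buffered` is a hypothetical name, and the sketch writes to an injectable `io` rather than a duplicate of `$stdout` so it can be exercised with a `StringIO`.

```ruby
require 'tempfile'
require 'stringio'

# Buffer chunks in an unlinked temp file, then copy everything to `io`.
# `flush_buffered` is a hypothetical name used only for this sketch.
def flush_buffered(chunks, io: $stdout)
  file = Tempfile.new('sketch-stdout')
  # Remove the directory entry right away; on POSIX systems the open handle
  # stays usable, and the data is reclaimed once the handle is closed.
  file.unlink
  chunks.each { |chunk| file.write(chunk) }

  file.rewind
  io.write(file.read)
  io.flush
ensure
  file&.close
end

out = StringIO.new
flush_buffered(["first\n", "second\n"], io: out)
print out.string
```

Buffering this way keeps progress output and record output from interleaving, at the cost of holding the full result on disk until the end.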
@@ -1,19 +1,35 @@
1
+ require 'tempfile'
2
+
1
3
  module Chronicle
2
4
  module ETL
3
5
  class JSONLoader < Chronicle::ETL::Loader
6
+ include Chronicle::ETL::Loaders::Helpers::StdoutHelper
7
+
4
8
  register_connector do |r|
5
9
  r.description = 'json'
6
10
  end
7
11
 
8
12
  setting :serializer
9
- setting :output, default: $stdout
13
+ setting :output
14
+
15
+ # If true, one JSON record per line. If false, output a single json
16
+ # object with an array of records
17
+ setting :line_separated, default: true, type: :boolean
18
+
19
+ def initialize(*args)
20
+ super
21
+ @first_line = true
22
+ end
10
23
 
11
24
  def start
12
- if @config.output == $stdout
13
- @output = @config.output
14
- else
15
- @output = File.open(@config.output, "w")
16
- end
25
+ @output_file =
26
+ if output_to_stdout?
27
+ create_stdout_temp_file
28
+ else
29
+ File.open(@config.output, "w+")
30
+ end
31
+
32
+ @output_file.puts("[\n") unless @config.line_separated
17
33
  end
18
34
 
19
35
  def load(record)
@@ -27,15 +43,34 @@ module Chronicle
27
43
 
28
44
  force_utf8(value)
29
45
  end
30
- @output.puts encoded.to_json
46
+
47
+ line = encoded.to_json
48
+ # For line-separated output, we just put json + newline
49
+ if @config.line_separated
50
+ line = "#{line}\n"
51
+ # Otherwise, we add a comma and newline and then add record to the
52
+ # array we created in #start (unless it's the first line).
53
+ else
54
+ line = ",\n#{line}" unless @first_line
55
+ end
56
+
57
+ @output_file.write(line)
58
+
59
+ @first_line = false
31
60
  end
32
61
 
33
62
  def finish
34
- @output.close
63
+ # Close the array unless we're doing line-separated JSON
64
+ @output_file.puts("\n]") unless @config.line_separated
65
+
66
+ write_to_stdout_from_temp_file(@output_file) if output_to_stdout?
67
+
68
+ @output_file.close
35
69
  end
36
70
 
37
71
  private
38
72
 
73
+ # TODO: implement this
39
74
  def serializer
40
75
  @config.serializer || Chronicle::ETL::RawSerializer
41
76
  end
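Note on the JSON loader changes above: with `line_separated` disabled, the loader opens an array in `start`, prefixes every record after the first with a comma, and closes the array in `finish`. The sketch below reproduces that cycle against an in-memory IO. `TinyJSONWriter` is a hypothetical class for illustration and skips the serializer and UTF-8 handling of the real loader.

```ruby
require 'json'
require 'stringio'

# Hypothetical miniature of the loader's start/load/finish cycle.
class TinyJSONWriter
  def initialize(io, line_separated: true)
    @io = io
    @line_separated = line_separated
    @first_line = true
  end

  # Open the enclosing array when emitting a single JSON document
  def start
    @io.puts("[\n") unless @line_separated
  end

  def load(record)
    line = record.to_json
    if @line_separated
      # One JSON object per line
      line = "#{line}\n"
    else
      # Comma-separate every record after the first
      line = ",\n#{line}" unless @first_line
    end
    @io.write(line)
    @first_line = false
  end

  # Close the enclosing array when emitting a single JSON document
  def finish
    @io.puts("\n]") unless @line_separated
  end
end

array_out = StringIO.new
writer = TinyJSONWriter.new(array_out, line_separated: false)
writer.start
[{ a: 1 }, { a: 2 }].each { |record| writer.load(record) }
writer.finish
puts array_out.string
```

Writing the separator before each record, instead of after it, avoids a trailing comma without needing to know which record is last.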
@@ -1,4 +1,5 @@
1
1
  require_relative 'helpers/encoding_helper'
2
+ require_relative 'helpers/stdout_helper'
2
3
 
3
4
  module Chronicle
4
5
  module ETL
@@ -9,7 +9,7 @@ module Chronicle
9
9
  # @todo Experiment with just mixing in ActiveModel instead of this
10
10
  # this reimplementation
11
11
  class Base
12
- ATTRIBUTES = [:provider, :provider_id, :lat, :lng, :metadata].freeze
12
+ ATTRIBUTES = [:provider, :provider_id, :provider_namespace, :lat, :lng, :metadata].freeze
13
13
  ASSOCIATIONS = [].freeze
14
14
 
15
15
  attr_accessor(:id, :dedupe_on, *ATTRIBUTES)
@@ -10,7 +10,7 @@ module Chronicle
10
10
  # TODO: This desperately needs a validation system
11
11
  ASSOCIATIONS = [
12
12
  :involvements, # inverse of activity's `involved`
13
-
13
+ :analogous,
14
14
  :attachments,
15
15
  :abouts,
16
16
  :aboutables, # inverse of above
@@ -1,5 +1,6 @@
1
1
  require 'colorize'
2
2
  require 'chronic_duration'
3
+ require "tty-spinner"
3
4
 
4
5
  class Chronicle::ETL::Runner
5
6
  def initialize(job)
@@ -8,30 +9,55 @@ class Chronicle::ETL::Runner
8
9
  end
9
10
 
10
11
  def run!
12
+ begin_job
11
13
  validate_job
12
14
  instantiate_connectors
13
15
  prepare_job
14
16
  prepare_ui
15
17
  run_extraction
18
+ rescue Chronicle::ETL::ExtractionError => e
19
+ @job_logger&.error
20
+ raise(Chronicle::ETL::RunnerError, "Extraction failed. #{e.message}")
21
+ rescue Interrupt
22
+ @job_logger&.error
23
+ raise(Chronicle::ETL::RunInterruptedError, "Job interrupted.")
24
+ rescue StandardError => e
25
+ # Just throwing this in here until we have better exception handling in
26
+ # loaders, etc
27
+ @job_logger&.error
28
+ raise(Chronicle::ETL::RunnerError, "Error running job. #{e.message}")
29
+ ensure
16
30
  finish_job
17
31
  end
18
32
 
19
33
  private
20
34
 
35
+ def begin_job
36
+ Chronicle::ETL::Logger.info(tty_log_job_initialize)
37
+ @initialization_spinner = TTY::Spinner.new(":spinner :title", format: :dots_2)
38
+ end
39
+
21
40
  def validate_job
41
+ @initialization_spinner.update(title: "Validating job")
22
42
  @job.job_definition.validate!
23
43
  end
24
44
 
25
45
  def instantiate_connectors
46
+ @initialization_spinner.update(title: "Initializing connectors")
26
47
  @extractor = @job.instantiate_extractor
27
48
  @loader = @job.instantiate_loader
28
49
  end
29
50
 
30
51
  def prepare_job
31
- Chronicle::ETL::Logger.info(tty_log_job_start)
52
+ @initialization_spinner.update(title: "Preparing job")
32
53
  @job_logger.start
33
54
  @loader.start
55
+
56
+ @initialization_spinner.update(title: "Preparing extraction")
57
+ @initialization_spinner.auto_spin
34
58
  @extractor.prepare
59
+ @initialization_spinner.success("(#{'successful'.green})")
60
+ Chronicle::ETL::Logger.info("\n")
35
61
  end
36
62
 
37
63
  def prepare_ui
@@ -40,34 +66,34 @@ class Chronicle::ETL::Runner
40
66
  Chronicle::ETL::Logger.attach_to_progress_bar(@progress_bar)
41
67
  end
42
68
 
43
- # TODO: refactor this further
44
69
  def run_extraction
45
70
  @extractor.extract do |extraction|
46
- unless extraction.is_a?(Chronicle::ETL::Extraction)
47
- raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
48
- end
49
-
50
- transformer = @job.instantiate_transformer(extraction)
51
- record = transformer.transform
52
-
53
- Chronicle::ETL::Logger.info(tty_log_transformation(transformer))
54
- @job_logger.log_transformation(transformer)
55
-
56
- @loader.load(record) unless @job.dry_run?
57
- rescue Chronicle::ETL::TransformationError => e
58
- Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
59
- ensure
71
+ process_extraction(extraction)
60
72
  @progress_bar.increment
61
73
  end
62
74
 
63
75
  @progress_bar.finish
76
+
77
+ # This is typically a slow method (writing to stdout, writing a big file, etc)
78
+ # TODO: consider adding a spinner?
64
79
  @loader.finish
65
80
  @job_logger.finish
66
- rescue Interrupt
67
- Chronicle::ETL::Logger.error("\n#{'Job interrupted'.red}")
68
- @job_logger.error
69
- rescue StandardError => e
70
- raise e
81
+ end
82
+
83
+ def process_extraction(extraction)
84
+ # For each extraction from our extractor, we create a new transformer
85
+ transformer = @job.instantiate_transformer(extraction)
86
+
87
+ # And then transform that record, logging it if we're in debug log level
88
+ record = transformer.transform
89
+ Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
90
+ @job_logger.log_transformation(transformer)
91
+
92
+ # Then send the results to the loader
93
+ @loader.load(record) unless @job.dry_run?
94
+ rescue Chronicle::ETL::TransformationError => e
95
+ # TODO: have an option to cancel job if we encounter an error
96
+ Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
71
97
  end
72
98
 
73
99
  def finish_job
@@ -77,7 +103,7 @@ class Chronicle::ETL::Runner
77
103
  Chronicle::ETL::Logger.info(tty_log_completion)
78
104
  end
79
105
 
80
- def tty_log_job_start
106
+ def tty_log_job_initialize
81
107
  output = "Beginning job "
82
108
  output += "'#{@job.name}'".bold if @job.name
83
109
  output
@@ -95,8 +121,9 @@ class Chronicle::ETL::Runner
95
121
 
96
122
  def tty_log_completion
97
123
  status = @job_logger.success ? 'Success' : 'Failed'
98
- output = "\nCompleted job "
99
- output += "'#{@job.name}'".bold if @job.name
124
+ job_completion = @job_logger.success ? 'Completed' : 'Partially completed'
125
+ output = "\n#{job_completion} job"
126
+ output += " '#{@job.name}'".bold if @job.name
100
127
  output += " in #{ChronicDuration.output(@job_logger.duration)}" if @job_logger.duration
101
128
  output += "\n Status:\t".light_black + status
102
129
  output += "\n Completed:\t".light_black + "#{@job_logger.job_log.num_records_processed}"
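Note on the runner changes above: `run!` now translates low-level failures into `RunnerError` (or `RunInterruptedError` for Ctrl-C) and always runs `finish_job` through `ensure`. The sketch below isolates that control flow. The error classes are standalone stand-ins defined here for illustration, and `guarded_run` is a hypothetical name.

```ruby
# Standalone stand-ins mirroring the error hierarchy added in this release.
class SketchError < StandardError; end
class RunnerError < SketchError; end
class RunInterruptedError < RunnerError; end
class ExtractionError < SketchError; end

# Run the block, translate failures into runner-level errors, and always
# record the cleanup step, the way `ensure` guarantees `finish_job` runs.
def guarded_run(log)
  yield
rescue ExtractionError => e
  raise RunnerError, "Extraction failed. #{e.message}"
rescue Interrupt
  # Interrupt is not a StandardError, so it needs its own clause
  raise RunInterruptedError, 'Job interrupted.'
rescue StandardError => e
  raise RunnerError, "Error running job. #{e.message}"
ensure
  log << :finish_job
end

log = []
begin
  guarded_run(log) { raise ExtractionError, 'source unavailable' }
rescue RunnerError => e
  puts e.message
end
puts log.inspect
```

Because `RunInterruptedError` subclasses `RunnerError`, a caller that rescues `RunnerError` handles both cases with one clause.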
@@ -10,6 +10,10 @@ module Chronicle
10
10
  # options::
11
11
  # Options for configuring this Transformer
12
12
  def initialize(extraction, options = {})
13
+ unless extraction.is_a?(Chronicle::ETL::Extraction)
14
+ raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
15
+ end
16
+
13
17
  @extraction = extraction
14
18
  apply_options(options)
15
19
  end
@@ -1,5 +1,5 @@
1
1
  module Chronicle
2
2
  module ETL
3
- VERSION = "0.5.0"
3
+ VERSION = "0.5.3"
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: chronicle-etl
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.5.0
4
+ version: 0.5.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Louis
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2022-03-24 00:00:00.000000000 Z
11
+ date: 2022-04-04 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activesupport
@@ -396,6 +396,7 @@ files:
396
396
  - lib/chronicle/etl/job_logger.rb
397
397
  - lib/chronicle/etl/loaders/csv_loader.rb
398
398
  - lib/chronicle/etl/loaders/helpers/encoding_helper.rb
399
+ - lib/chronicle/etl/loaders/helpers/stdout_helper.rb
399
400
  - lib/chronicle/etl/loaders/json_loader.rb
400
401
  - lib/chronicle/etl/loaders/loader.rb
401
402
  - lib/chronicle/etl/loaders/rest_loader.rb