chronicle-etl 0.5.0 → 0.5.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 951fca4c6238d773ec8bc2b9ea474a0cffdabf0c2f5d0c925f78b91b35836224
4
- data.tar.gz: 908a7f01fb215cca9936f072b71315c3b62e0d00b3c8f7ffd938682a4cabe42c
3
+ metadata.gz: d0f305f15f4eda7a5851dfff2155da2c12ee010d4619346a13551f298d5b7991
4
+ data.tar.gz: d44f82b2bd06521740ad2b0e58cad0db840884fc5616858ef857d78fccb2b5dd
5
5
  SHA512:
6
- metadata.gz: 28fc97935e5bd9538877a2057f3201170fdb1eb574385ae6d94901b21abfa5f923618d5fb2caf94395503ec70c0052b607a939b363f27630aaca26df6ca93722
7
- data.tar.gz: 0b8e4dedb79e6cbd23487e2c4482d9a8ad9d1653e015593e6b83cac854d94a6cd4702862eebb11ec9f41e63b774eb6f41db929a1ceb12055af6cb08209a6b8eb
6
+ metadata.gz: 99214409831e2799dffe2e3b096e9406222cf571d4e49fe71d3e3c645ad635e73c3aa42cb6af6569431064ce2750dc38ba051122a826b9feb6b21724ebd31db8
7
+ data.tar.gz: 77a30ecb069906b0e992adbcb1b7470642bcda65b8767dd8ba0145d63e834cb83e2a0a7bc5212edcf8c5028c637b2edbd8476b559f8e28765267516308b160eb
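Reviewer note: the digests above can be checked locally once the `.gem` archive has been unpacked. A minimal sketch, assuming `data.tar.gz` is already on disk; the helper name `sha256_matches?` is made up for illustration and is not part of RubyGems or chronicle-etl.

```ruby
require 'digest'

# Hypothetical helper: compare a file's SHA256 digest with an expected hex
# string (case-insensitive, since tools differ in the case they print).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end
```

Calling `sha256_matches?('data.tar.gz', '<digest from checksums.yaml>')` should return `true` for an untampered archive.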
data/README.md CHANGED
@@ -8,20 +8,36 @@ Are you trying to archive your digital history or incorporate it into your own p
8
8
 
9
9
  If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this project is for you! (*If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues).*)
10
10
 
11
- **`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a source (e.g. a CSV file, JSON, external API).
11
+ **`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a destination (e.g. a CSV file, JSON, external API).
12
12
 
13
13
  ## What does `chronicle-etl` give you?
14
14
  * **CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
15
15
  * **Plugins for many third-party providers**. A plugin system allows you to access data from third-party providers and hook it into the shared CLI infrastructure.
16
16
  * **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.
17
17
 
18
+ ## Chronicle-ETL in action
19
+
20
+ ![demo](https://user-images.githubusercontent.com/6291/161410839-b5ce931a-2353-4585-b530-929f46e3f960.svg)
21
+
22
+ ### Longer screencast
23
+
24
+ [![asciicast](https://asciinema.org/a/483455.svg)](https://asciinema.org/a/483455)
25
+
18
26
  ## Installation
27
+
28
+ Using homebrew:
19
29
  ```sh
20
- # Install chronicle-etl
21
- gem install chronicle-etl
30
+ $ brew install chronicle-app/etl/chronicle-etl
31
+ ```
32
+ Using rubygems:
33
+ ```sh
34
+ $ gem install chronicle-etl
22
35
  ```
23
36
 
24
- After installation, the `chronicle-etl` command will be available in your shell. Homebrew support [is coming soon](https://github.com/chronicle-app/chronicle-etl/issues/13).
37
+ Confirm it installed successfully:
38
+ ```sh
39
+ $ chronicle-etl --version
40
+ ```
25
41
 
26
42
  ## Basic usage and running jobs
27
43
 
@@ -33,7 +49,7 @@ $ chronicle-etl help
33
49
  $ chronicle-etl --extractor NAME --transformer NAME --loader NAME
34
50
 
35
51
  # Read data.csv and display it to stdout as a table
36
- $ chronicle-etl --extractor csv --input ./data.csv --loader table
52
+ $ chronicle-etl --extractor csv --input data.csv --loader table
37
53
 
38
54
  # Retrieve shell commands run in the last 5 hours
39
55
  $ chronicle-etl -e shell --since 5h
@@ -50,7 +66,6 @@ $ chronicle-etl -e pinboard --since 1mo # Used automatically based on plugin nam
50
66
  ### Common options
51
67
  ```sh
52
68
  Options:
53
- -j, [--name=NAME] # Job configuration name
54
69
  -e, [--extractor=NAME] # Extractor class. Default: stdin
55
70
  [--extractor-opts=key:value] # Extractor options
56
71
  -t, [--transformer=NAME] # Transformer class. Default: null
@@ -71,6 +86,26 @@ Options:
71
86
  [--silent], [--no-silent] # Silence all output
72
87
  ```
73
88
 
89
+ ### Saving jobs
90
+
91
+ You can save a job's settings to a local config file (by default `~/.config/chronicle/etl/jobs/job_name.yml`) so that you don't have to repeat the CLI flags on each run.
92
+
93
+ ```sh
94
+ # Save a job named 'sample' to ~/.config/chronicle/etl/jobs/sample.yml
95
+ $ chronicle-etl jobs:save sample --extractor pinboard --since 10d
96
+
97
+ # Show details about the job
98
+ $ chronicle-etl jobs:show sample
99
+
100
+ # Run the job
101
+ $ chronicle-etl jobs:run sample
102
+ # Or more simply:
103
+ $ chronicle-etl sample
104
+
105
+ # Show all saved jobs
106
+ $ chronicle-etl jobs:list
107
+ ```
108
+
74
109
  ## Connectors
75
110
  Connectors are available to read, process, and load data from different formats or external services.
76
111
 
@@ -97,7 +132,7 @@ $ chronicle-etl connectors:list
97
132
  - [`rest`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/rest_loader.rb) - Serialize records with [JSONAPI](https://jsonapi.org/) and send to a REST API
98
133
 
99
134
  ## Chronicle Plugins
100
- Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through `$ gem install` or through the CLI itself.
135
+ Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through the CLI (which installs the gems under the hood).
101
136
 
102
137
  ### Plugin usage
103
138
 
@@ -126,11 +161,12 @@ If you don't see a plugin for a third-party provider or data source that you're
126
161
 
127
162
  | Name | Description | Availability |
128
163
  |-----------------------------------------------------------------|---------------------------------------------------------------------------------------------|----------------------------------|
164
+ | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available |
165
+ | [github](https://github.com/chronicle-app/chronicle-github) | GitHub activity stream | Available |
129
166
  | [imessage](https://github.com/chronicle-app/chronicle-imessage) | iMessage messages and attachments | Available |
130
- | [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
131
- | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
132
167
  | [pinboard](https://github.com/chronicle-app/chronicle-pinboard) | Bookmarks and tags | Available |
133
168
  | [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
169
+ | [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
134
170
 
135
171
  #### Coming soon
136
172
 
@@ -206,7 +242,6 @@ $ chronicle-etl secrets:unset pinboard access_token
206
242
 
207
243
  ## Roadmap
208
244
 
209
- - Add **homebrew formula** for easier installation. #13
210
245
  - Keep tackling **new plugins**. See: [Chronicle Plugin Tracker](https://github.com/orgs/chronicle-app/projects/1)
211
246
  - Add support for **incremental extractions** #37
212
247
  - **Improve stdin extractor and shell command transformer** (#5) so that users can easily integrate their own scripts/tools into jobs
@@ -6,6 +6,7 @@ module Chronicle
6
6
  no_commands do
7
7
  # Shorthand for cli_exit(status: :failure)
8
8
  def cli_fail(message: nil, exception: nil)
9
+ message = "#{message}\nRe-run the command with --verbose to see details." if Chronicle::ETL::Logger.log_level > Chronicle::ETL::Logger::DEBUG
9
10
  cli_exit(status: :failure, message: message, exception: exception)
10
11
  end
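Reviewer note: `cli_fail` declares `message: nil`, so whatever appends the `--verbose` hint has to cope with a missing message. A stand-alone sketch of a nil-safe variant; `with_verbose_hint` and the `verbose:` keyword are hypothetical stand-ins for the `Logger.log_level` comparison, and unlike the hunk the sketch adds no leading newline when there is no message.

```ruby
VERBOSE_HINT = "Re-run the command with --verbose to see details."

# Hypothetical helper: append the hint unless we are already verbose.
# Array#compact drops a nil message so the join never touches nil.
def with_verbose_hint(message, verbose:)
  return message if verbose

  [message, VERBOSE_HINT].compact.join("\n")
end
```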
11
12
 
@@ -9,8 +9,6 @@ module Chronicle
9
9
  default_task "start"
10
10
  namespace :jobs
11
11
 
12
- class_option :name, aliases: '-j', desc: 'Job configuration name'
13
-
14
12
  class_option :extractor, aliases: '-e', desc: "Extractor class. Default: stdin", banner: 'NAME'
15
13
  class_option :'extractor-opts', desc: 'Extractor options', type: :hash, default: {}
16
14
  class_option :transformer, aliases: '-t', desc: 'Transformer class. Default: null', banner: 'NAME'
@@ -44,8 +42,17 @@ module Chronicle
44
42
  If you do not want to use the command line flags, you can also configure a job with a .yml config file. You can either specify the path to this file or use the filename and place the file in ~/.config/chronicle/etl/jobs/NAME.yml and run it with `$ chronicle-etl jobs:run NAME`
45
43
  LONG_DESC
46
44
  # Run an ETL job
47
- def start
48
- job_definition = build_job_definition(options)
45
+ def start(name = nil)
46
+ # If someone runs `$ chronicle-etl` with no arguments, show help menu.
47
+ # TODO: decide if we should check that there's nothing in stdin pipe
48
+ # in case user wants to actually run this sort of job stdin->null->stdout
49
+ if name.nil? && options[:extractor].nil?
50
+ m = Chronicle::ETL::CLI::Main.new
51
+ m.help
52
+ cli_exit
53
+ end
54
+
55
+ job_definition = build_job_definition(name, options)
49
56
 
50
57
  if job_definition.plugins_missing?
51
58
  missing_plugins = job_definition.errors[:plugins]
@@ -59,26 +66,43 @@ LONG_DESC
59
66
  rescue Chronicle::ETL::JobDefinitionError => e
60
67
  message = ""
61
68
  job_definition.errors.each_pair do |category, errors|
62
- message << "Problem with #{category}:\n - #{errors.map(&:to_s).join("\n -")}"
69
+ message << "Problem with #{category}:\n - #{errors.map(&:to_s).join("\n - ")}"
63
70
  end
64
71
  cli_fail(message: "Error running job.\n#{message}", exception: e)
65
72
  end
66
73
 
67
- desc "create", "Create a job"
74
+ option :'skip-confirmation', aliases: '-y', type: :boolean
75
+ desc "save", "Save a job"
68
76
  # Save an ETL job
69
- def create
70
- job_definition = build_job_definition(options)
77
+ def save(name)
78
+ write_config = true
79
+ job_definition = build_job_definition(name, options)
71
80
  job_definition.validate!
72
81
 
73
- Chronicle::ETL::Config.write("jobs", options[:name], job_definition.definition)
82
+ if Chronicle::ETL::Config.exists?("jobs", name) && !options[:'skip-confirmation']
83
+ prompt = TTY::Prompt.new
84
+ write_config = false
85
+ message = "Job '#{name}' exists already. Overwrite it?"
86
+ begin
87
+ write_config = prompt.yes?(message)
88
+ rescue TTY::Reader::InputInterrupt
89
+ end
90
+ end
91
+
92
+ if write_config
93
+ Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
94
+ cli_exit(message: "Job saved. Run it with `$ chronicle-etl jobs:run #{name}`")
95
+ else
96
+ cli_fail(message: "\nJob not saved")
97
+ end
74
98
  rescue Chronicle::ETL::JobDefinitionError => e
75
99
  cli_fail(message: "Job definition error", exception: e)
76
100
  end
77
101
 
78
102
  desc "show", "Show details about a job"
79
103
  # Show an ETL job
80
- def show
81
- job_definition = build_job_definition(options)
104
+ def show(name = nil)
105
+ job_definition = build_job_definition(name, options)
82
106
  job_definition.validate!
83
107
  puts Chronicle::ETL::Job.new(job_definition)
84
108
  rescue Chronicle::ETL::JobDefinitionError => e
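Reviewer note: the overwrite guard in `save` reduces to a small decision: write when the job file is new, write when `--skip-confirmation` is set, otherwise write only on an explicit yes. A stdlib-only sketch; `should_write?` is a hypothetical name, and the block stands in for `prompt.yes?(message)` so the logic can be exercised without `TTY::Prompt`.

```ruby
# Hypothetical helper mirroring the decision made in `save`.
# exists:            whether a job file with this name is already on disk
# skip_confirmation: value of the --skip-confirmation / -y flag
# The block is only called when the user actually has to be asked.
def should_write?(exists:, skip_confirmation:)
  return true unless exists
  return true if skip_confirmation

  !!yield
end
```

Keeping the prompt behind a block means a non-interactive caller never triggers it.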
@@ -112,12 +136,16 @@ LONG_DESC
112
136
  private
113
137
 
114
138
  def run_job(job_definition)
139
+ # FIXME: have to validate here so next method can work. This is clumsy
140
+ job_definition.validate!
115
141
  # FIXME: clumsy to make CLI responsible for setting secrets here. Think about a better way to do this
116
142
  job_definition.apply_default_secrets
117
143
 
118
144
  job = Chronicle::ETL::Job.new(job_definition)
119
145
  runner = Chronicle::ETL::Runner.new(job)
120
146
  runner.run!
147
+ rescue RunnerError => e
148
+ cli_fail(message: e.message, exception: e)
121
149
  end
122
150
 
123
151
  # TODO: probably could merge this with something in cli/plugin
@@ -134,9 +162,9 @@ LONG_DESC
134
162
  end
135
163
 
136
164
  # Create job definition by reading config file and then overwriting with flag options
137
- def build_job_definition(options)
165
+ def build_job_definition(name, options)
138
166
  definition = Chronicle::ETL::JobDefinition.new
139
- definition.add_config(load_job_config(options[:name]))
167
+ definition.add_config(load_job_config(name))
140
168
  definition.add_config(process_flag_options(options).transform_keys(&:to_sym))
141
169
  definition
142
170
  end
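Reviewer note: `build_job_definition` layers two sources, the saved config first and the CLI flags second, so flags win on conflict. A sketch of that ordering with plain hashes; `layer_config` is a hypothetical stand-in for repeated `add_config` calls and assumes a shallow merge, which the real `JobDefinition` may not use.

```ruby
# Hypothetical stand-in for JobDefinition#add_config: later layers win.
def layer_config(*layers)
  layers.reduce({}) do |merged, layer|
    merged.merge(layer.transform_keys(&:to_sym))
  end
end

saved_job = { 'extractor' => 'pinboard', 'since' => '10d' } # as if read from sample.yml
cli_flags = { since: '5h', loader: 'table' }                # as if parsed from the flags

definition = layer_config(saved_job, cli_flags)
```

Here the `since` value from the flags replaces the one from the saved job, while `extractor` is kept from the file.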
@@ -54,24 +54,40 @@ module Chronicle
54
54
  klass, task = ::Thor::Util.find_class_and_task_by_namespace("#{meth}:#{meth}")
55
55
  klass.start(['-h', task].compact, shell: shell)
56
56
  else
57
- shell.say "ABOUT".bold
58
- shell.say " #{'chronicle-etl'.italic} is a utility tool for #{'extracting'.underline}, #{'transforming'.underline}, and #{'loading'.underline} personal data."
57
+ shell.say "ABOUT:".bold
58
+ shell.say " #{'chronicle-etl'.italic} is a toolkit for extracting and working with your digital"
59
+ shell.say " history. 📜"
59
60
  shell.say
60
- shell.say "USAGE".bold
61
- shell.say " $ chronicle-etl COMMAND"
61
+ shell.say " A job #{'extracts'.underline} personal data from a source, #{'transforms'.underline} it (Chronicle"
62
+ shell.say " Schema or preserves raw data), and then #{'loads'.underline} it to a destination. Use"
63
+ shell.say " built-in extractors (json, csv, stdin) and loaders (csv, json, table,"
64
+ shell.say " rest) or use plugins to connect to third-party services."
62
65
  shell.say
63
- shell.say "EXAMPLES".bold
64
- shell.say " Show available connectors:".italic.light_black
65
- shell.say " $ chronicle-etl connectors:list"
66
+ shell.say " Plugins: https://github.com/chronicle-app/chronicle-etl#currently-available"
66
67
  shell.say
67
- shell.say " Run a simple job:".italic.light_black
68
- shell.say " $ chronicle-etl jobs:run --extractor stdin --transformer null --loader stdout"
68
+ shell.say "USAGE:".bold
69
+ shell.say " # Basic job usage:".italic.light_black
70
+ shell.say " $ chronicle-etl --extractor NAME --transformer NAME --loader NAME"
69
71
  shell.say
70
- shell.say " Show full job options:".italic.light_black
72
+ shell.say " # Read data.csv and display it to stdout as a table:".italic.light_black
73
+ shell.say " $ chronicle-etl --extractor csv --input data.csv --loader table"
74
+ shell.say
75
+ shell.say " # Show available plugins:".italic.light_black
76
+ shell.say " $ chronicle-etl plugins:list"
77
+ shell.say
78
+ shell.say " # Save an access token as a secret and use it in a job:".italic.light_black
79
+ shell.say " $ chronicle-etl secrets:set pinboard access_token username:foo123"
80
+ shell.say " $ chronicle-etl secrets:list"
81
+ shell.say " $ chronicle-etl -e pinboard --since 1mo"
82
+ shell.say
83
+ shell.say " # Show full job options:".italic.light_black
71
84
  shell.say " $ chronicle-etl jobs help run"
85
+ shell.say
86
+ shell.say "FULL DOCUMENTATION:".bold
87
+ shell.say " https://github.com/chronicle-app/chronicle-etl".blue
88
+ shell.say
72
89
 
73
90
  list = []
74
-
75
91
  ::Thor::Util.thor_classes_in(Chronicle::ETL::CLI).each do |thor_class|
76
92
  list += thor_class.printable_tasks(false)
77
93
  end
@@ -79,25 +95,18 @@ module Chronicle
79
95
  list.unshift ["help", "# This help menu"]
80
96
 
81
97
  shell.say
82
- shell.say 'ALL COMMANDS'.bold
98
+ shell.say 'ALL COMMANDS:'.bold
83
99
  shell.print_table(list, indent: 2, truncate: true)
84
100
  shell.say
85
- shell.say "VERSION".bold
101
+ shell.say "VERSION:".bold
86
102
  shell.say " #{Chronicle::ETL::VERSION}"
87
103
  shell.say
88
104
  shell.say " Display current version:".italic.light_black
89
105
  shell.say " $ chronicle-etl --version"
90
- shell.say
91
- shell.say "FULL DOCUMENTATION".bold
92
- shell.say " https://github.com/chronicle-app/chronicle-etl".blue
93
- shell.say
94
106
  end
95
107
  end
96
108
 
97
109
  no_commands do
98
- def testb
99
- puts "hi"
100
- end
101
110
  def set_color_output
102
111
  String.disable_colorization true if options[:'no-color'] || ENV['NO_COLOR']
103
112
  end
@@ -61,7 +61,7 @@ module Chronicle
61
61
  }
62
62
  end
63
63
 
64
- headers = ['name', 'description', 'latest version'].map{ |h| h.to_s.upcase.bold }
64
+ headers = ['name', 'description', 'version'].map{ |h| h.to_s.upcase.bold }
65
65
  table = TTY::Table.new(headers, info.map(&:values))
66
66
  puts "Installed plugins:"
67
67
  puts table.render(indent: 2, padding: [0, 0])
@@ -28,6 +28,12 @@ module Chronicle
28
28
  end
29
29
  end
30
30
 
31
+ def exists?(type, identifier)
32
+ base = config_pathname_for_type(type)
33
+ path = base.join("#{identifier}.yml")
34
+ path.exist?
35
+ end
36
+
31
37
  # Returns all jobs available in ~/.config/chronicle/etl/jobs/*.yml
32
38
  def available_jobs
33
39
  Dir.glob(File.join(config_pathname_for_type("jobs"), "*.yml")).map do |filename|
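Reviewer note: the new `exists?` is a `Pathname` join plus an existence check. A stand-alone sketch; `config_exists?` is a hypothetical name, and `base` plays the role of `config_pathname_for_type(type)`, whose implementation is not shown in this diff.

```ruby
require 'pathname'

# Hypothetical stand-in for Config.exists?: look for "<identifier>.yml"
# inside the given base directory.
def config_exists?(base, identifier)
  Pathname.new(base).join("#{identifier}.yml").exist?
end
```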
@@ -6,6 +6,9 @@ module Chronicle
6
6
 
7
7
  class ConfigError < Error; end
8
8
 
9
+ class RunnerError < Error; end
10
+ class RunInterruptedError < RunnerError; end
11
+
9
12
  class RunnerTypeError < Error; end
10
13
 
11
14
  class JobDefinitionError < Error
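Reviewer note: because `RunInterruptedError` subclasses `RunnerError`, a single `rescue RunnerError` (as in `run_job`) catches both, and callers that care can rescue the subclass first. A sketch with stand-alone copies of the classes; the `ErrorSketch` module and `classify` helper are hypothetical, and `Error` is assumed to derive from `StandardError`.

```ruby
module ErrorSketch
  class Error < StandardError; end
  class RunnerError < Error; end
  class RunInterruptedError < RunnerError; end
end

# Hypothetical helper: run the block and report which rescue clause matched.
# The more specific class must be listed first.
def classify
  yield
  :ok
rescue ErrorSketch::RunInterruptedError
  :interrupted
rescue ErrorSketch::RunnerError
  :failed
end
```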
@@ -3,11 +3,13 @@ require 'csv'
3
3
  module Chronicle
4
4
  module ETL
5
5
  class CSVLoader < Chronicle::ETL::Loader
6
+ include Chronicle::ETL::Loaders::Helpers::StdoutHelper
7
+
6
8
  register_connector do |r|
7
9
  r.description = 'CSV'
8
10
  end
9
11
 
10
- setting :output, default: $stdout
12
+ setting :output
11
13
  setting :headers, default: true
12
14
  setting :header_row, default: true
13
15
 
@@ -30,16 +32,7 @@ module Chronicle
30
32
  csv_options[:headers] = headers
31
33
  end
32
34
 
33
- if @config.output.is_a?(IO)
34
- # This might seem like a duplication of the default value ($stdout)
35
- # but it's because rspec overwrites $stdout (in helper #capture) to
36
- # capture output.
37
- io = $stdout.dup
38
- else
39
- io = File.open(@config.output, "w+")
40
- end
41
-
42
- output = CSV.generate(**csv_options) do |csv|
35
+ csv_output = CSV.generate(**csv_options) do |csv|
43
36
  records.each do |record|
44
37
  csv << record
45
38
  .transform_keys(&:to_sym)
@@ -48,8 +41,12 @@ module Chronicle
48
41
  end
49
42
  end
50
43
 
51
- io.write(output)
52
- io.close
44
+ # TODO: just write to io directly
45
+ if output_to_stdout?
46
+ write_to_stdout(csv_output)
47
+ else
48
+ File.write(@config.output, csv_output)
49
+ end
53
50
  end
54
51
  end
55
52
  end
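Reviewer note: after this change the CSV loader builds the whole document in memory with `CSV.generate` and only then decides where to write it. A small sketch of that generate-then-write shape; the records are invented, and `write_headers: true` is an assumption about `csv_options`, which this hunk only shows in part.

```ruby
require 'csv'

# Hypothetical records; in the loader these come from the transformer.
records = [
  { 'id' => 1, 'title' => 'first' },
  { 'id' => 2, 'title' => 'second' }
]

headers = records.first.keys

# Build the full CSV string first, as the loader does.
csv_output = CSV.generate(headers: headers, write_headers: true) do |csv|
  records.each do |record|
    csv << record.values_at(*headers)
  end
end
```

The string in `csv_output` can then go to `File.write` or to stdout, which is the branch the hunk adds.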
@@ -0,0 +1,36 @@
1
+ require 'tempfile'
2
+
3
+ module Chronicle
4
+ module ETL
5
+ module Loaders
6
+ module Helpers
7
+ module StdoutHelper
8
+ # TODO: let users use "stdout" as an option for the `output` setting
9
+ # Assume we're using stdout if no output is specified
10
+ def output_to_stdout?
11
+ !@config.output
12
+ end
13
+
14
+ def create_stdout_temp_file
15
+ file = Tempfile.new('chronicle-stdout')
16
+ file.unlink
17
+ file
18
+ end
19
+
20
+ def write_to_stdout_from_temp_file(file)
21
+ file.rewind
22
+ write_to_stdout(file.read)
23
+ end
24
+
25
+ def write_to_stdout(output)
26
+ # We .dup because rspec overwrites $stdout (in helper #capture) to
27
+ # capture output.
28
+ stdout = $stdout.dup
29
+ stdout.write(output)
30
+ stdout.flush
31
+ end
32
+ end
33
+ end
34
+ end
35
+ end
36
+ end
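Reviewer note: the helper buffers output in an unlinked `Tempfile` and copies it to stdout at the end. A stand-alone sketch of that pattern; `buffer_then_emit` and the `io:` parameter are hypothetical, added so the target can be swapped for a `StringIO` in tests.

```ruby
require 'tempfile'

# Buffer chunks in an anonymous temp file, then hand the whole payload to an IO.
def buffer_then_emit(chunks, io: $stdout)
  file = Tempfile.new('sketch-stdout')
  file.unlink # drop the directory entry; the open handle stays usable on POSIX
  chunks.each { |chunk| file.write(chunk) }
  file.rewind
  io.write(file.read)
  io.flush
ensure
  file&.close
end
```

Unlinking right after creation means no stray file is left behind if the process dies mid-run.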
@@ -1,19 +1,35 @@
1
+ require 'tempfile'
2
+
1
3
  module Chronicle
2
4
  module ETL
3
5
  class JSONLoader < Chronicle::ETL::Loader
6
+ include Chronicle::ETL::Loaders::Helpers::StdoutHelper
7
+
4
8
  register_connector do |r|
5
9
  r.description = 'json'
6
10
  end
7
11
 
8
12
  setting :serializer
9
- setting :output, default: $stdout
13
+ setting :output
14
+
15
+ # If true, one JSON record per line. If false, output a single json
16
+ # object with an array of records
17
+ setting :line_separated, default: true, type: :boolean
18
+
19
+ def initialize(*args)
20
+ super
21
+ @first_line = true
22
+ end
10
23
 
11
24
  def start
12
- if @config.output == $stdout
13
- @output = @config.output
14
- else
15
- @output = File.open(@config.output, "w")
16
- end
25
+ @output_file =
26
+ if output_to_stdout?
27
+ create_stdout_temp_file
28
+ else
29
+ File.open(@config.output, "w+")
30
+ end
31
+
32
+ @output_file.puts("[\n") unless @config.line_separated
17
33
  end
18
34
 
19
35
  def load(record)
@@ -27,15 +43,34 @@ module Chronicle
27
43
 
28
44
  force_utf8(value)
29
45
  end
30
- @output.puts encoded.to_json
46
+
47
+ line = encoded.to_json
48
+ # For line-separated output, we just put json + newline
49
+ if @config.line_separated
50
+ line = "#{line}\n"
51
+ # Otherwise, we add a comma and newline and then add record to the
52
+ # array we created in #start (unless it's the first line).
53
+ else
54
+ line = ",\n#{line}" unless @first_line
55
+ end
56
+
57
+ @output_file.write(line)
58
+
59
+ @first_line = false
31
60
  end
32
61
 
33
62
  def finish
34
- @output.close
63
+ # Close the array unless we're doing line-separated JSON
64
+ @output_file.puts("\n]") unless @config.line_separated
65
+
66
+ write_to_stdout_from_temp_file(@output_file) if output_to_stdout?
67
+
68
+ @output_file.close
35
69
  end
36
70
 
37
71
  private
38
72
 
73
+ # TODO: implement this
39
74
  def serializer
40
75
  @config.serializer || Chronicle::ETL::RawSerializer
41
76
  end
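Reviewer note: the new `line_separated` setting switches between one JSON document per line and a single JSON array. A sketch that reproduces both shapes with the same comma-before-record trick as `load`; `dump_records` is a hypothetical helper and writes to a `StringIO` rather than to the loader's output file.

```ruby
require 'json'
require 'stringio'

# Hypothetical helper reproducing the two output shapes of the loader.
def dump_records(records, line_separated:)
  io = StringIO.new
  io.puts("[\n") unless line_separated
  first_line = true

  records.each do |record|
    line = record.to_json
    if line_separated
      line = "#{line}\n"       # one JSON document per line
    elsif !first_line
      line = ",\n#{line}"      # separate array elements with a comma
    end
    io.write(line)
    first_line = false
  end

  io.puts("\n]") unless line_separated
  io.string
end
```

Writing the comma before every record except the first avoids a trailing comma, so the array form stays valid JSON without knowing the record count in advance.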
@@ -1,4 +1,5 @@
1
1
  require_relative 'helpers/encoding_helper'
2
+ require_relative 'helpers/stdout_helper'
2
3
 
3
4
  module Chronicle
4
5
  module ETL
@@ -9,7 +9,7 @@ module Chronicle
9
9
  # @todo Experiment with just mixing in ActiveModel instead of this
10
10
  # this reimplementation
11
11
  class Base
12
- ATTRIBUTES = [:provider, :provider_id, :lat, :lng, :metadata].freeze
12
+ ATTRIBUTES = [:provider, :provider_id, :provider_namespace, :lat, :lng, :metadata].freeze
13
13
  ASSOCIATIONS = [].freeze
14
14
 
15
15
  attr_accessor(:id, :dedupe_on, *ATTRIBUTES)
@@ -10,7 +10,7 @@ module Chronicle
10
10
  # TODO: This desperately needs a validation system
11
11
  ASSOCIATIONS = [
12
12
  :involvements, # inverse of activity's `involved`
13
-
13
+ :analogous,
14
14
  :attachments,
15
15
  :abouts,
16
16
  :aboutables, # inverse of above
@@ -1,5 +1,6 @@
1
1
  require 'colorize'
2
2
  require 'chronic_duration'
3
+ require "tty-spinner"
3
4
 
4
5
  class Chronicle::ETL::Runner
5
6
  def initialize(job)
@@ -8,30 +9,55 @@ class Chronicle::ETL::Runner
8
9
  end
9
10
 
10
11
  def run!
12
+ begin_job
11
13
  validate_job
12
14
  instantiate_connectors
13
15
  prepare_job
14
16
  prepare_ui
15
17
  run_extraction
18
+ rescue Chronicle::ETL::ExtractionError => e
19
+ @job_logger&.error
20
+ raise(Chronicle::ETL::RunnerError, "Extraction failed. #{e.message}")
21
+ rescue Interrupt
22
+ @job_logger&.error
23
+ raise(Chronicle::ETL::RunInterruptedError, "Job interrupted.")
24
+ rescue StandardError => e
25
+ # Just throwing this in here until we have better exception handling in
26
+ # loaders, etc
27
+ @job_logger&.error
28
+ raise(Chronicle::ETL::RunnerError, "Error running job. #{e.message}")
29
+ ensure
16
30
  finish_job
17
31
  end
18
32
 
19
33
  private
20
34
 
35
+ def begin_job
36
+ Chronicle::ETL::Logger.info(tty_log_job_initialize)
37
+ @initialization_spinner = TTY::Spinner.new(":spinner :title", format: :dots_2)
38
+ end
39
+
21
40
  def validate_job
41
+ @initialization_spinner.update(title: "Validating job")
22
42
  @job.job_definition.validate!
23
43
  end
24
44
 
25
45
  def instantiate_connectors
46
+ @initialization_spinner.update(title: "Initializing connectors")
26
47
  @extractor = @job.instantiate_extractor
27
48
  @loader = @job.instantiate_loader
28
49
  end
29
50
 
30
51
  def prepare_job
31
- Chronicle::ETL::Logger.info(tty_log_job_start)
52
+ @initialization_spinner.update(title: "Preparing job")
32
53
  @job_logger.start
33
54
  @loader.start
55
+
56
+ @initialization_spinner.update(title: "Preparing extraction")
57
+ @initialization_spinner.auto_spin
34
58
  @extractor.prepare
59
+ @initialization_spinner.success("(#{'successful'.green})")
60
+ Chronicle::ETL::Logger.info("\n")
35
61
  end
36
62
 
37
63
  def prepare_ui
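Reviewer note: `run!` now translates low-level failures into `RunnerError` subclasses and moves `finish_job` into `ensure`, so cleanup runs on success, failure, and Ctrl-C alike. A sketch of that control flow; the `RunnerSketch` module and its `log` array are hypothetical stand-ins for the runner and job logger.

```ruby
module RunnerSketch
  class RunnerError < StandardError; end
  class RunInterruptedError < RunnerError; end

  # Run the block, wrap failures, and always record that cleanup ran
  # (the counterpart of finish_job in the ensure clause).
  def self.run(log)
    yield
    log << :ran
  rescue Interrupt
    raise RunInterruptedError, 'Job interrupted.'
  rescue StandardError => e
    raise RunnerError, "Error running job. #{e.message}"
  ensure
    log << :finished
  end
end
```

`Interrupt` is not a `StandardError`, which is why it needs its own `rescue` clause here, as it does in the hunk.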
@@ -40,34 +66,34 @@ class Chronicle::ETL::Runner
40
66
  Chronicle::ETL::Logger.attach_to_progress_bar(@progress_bar)
41
67
  end
42
68
 
43
- # TODO: refactor this further
44
69
  def run_extraction
45
70
  @extractor.extract do |extraction|
46
- unless extraction.is_a?(Chronicle::ETL::Extraction)
47
- raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
48
- end
49
-
50
- transformer = @job.instantiate_transformer(extraction)
51
- record = transformer.transform
52
-
53
- Chronicle::ETL::Logger.info(tty_log_transformation(transformer))
54
- @job_logger.log_transformation(transformer)
55
-
56
- @loader.load(record) unless @job.dry_run?
57
- rescue Chronicle::ETL::TransformationError => e
58
- Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
59
- ensure
71
+ process_extraction(extraction)
60
72
  @progress_bar.increment
61
73
  end
62
74
 
63
75
  @progress_bar.finish
76
+
77
+ # This is typically a slow method (writing to stdout, writing a big file, etc)
78
+ # TODO: consider adding a spinner?
64
79
  @loader.finish
65
80
  @job_logger.finish
66
- rescue Interrupt
67
- Chronicle::ETL::Logger.error("\n#{'Job interrupted'.red}")
68
- @job_logger.error
69
- rescue StandardError => e
70
- raise e
81
+ end
82
+
83
+ def process_extraction(extraction)
84
+ # For each extraction from our extractor, we create a new transformer
85
+ transformer = @job.instantiate_transformer(extraction)
86
+
87
+ # And then transform that record, logging it if we're in debug log level
88
+ record = transformer.transform
89
+ Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
90
+ @job_logger.log_transformation(transformer)
91
+
92
+ # Then send the results to the loader
93
+ @loader.load(record) unless @job.dry_run?
94
+ rescue Chronicle::ETL::TransformationError => e
95
+ # TODO: have an option to cancel job if we encounter an error
96
+ Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
71
97
  end
72
98
 
73
99
  def finish_job
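Reviewer note: `process_extraction` rescues `TransformationError` per record, so one bad record is logged and the job carries on with the rest. A sketch of that per-item isolation; `process_all`, the local `TransformationError` class, and the result hash are all hypothetical.

```ruby
class TransformationError < StandardError; end

# Hypothetical pipeline: transform each item via the block, keep successes,
# and collect failures instead of aborting the whole run.
def process_all(items)
  loaded = []
  failures = []

  items.each do |item|
    begin
      loaded << yield(item)
    rescue TransformationError => e
      failures << [item, e.message]
    end
  end

  { loaded: loaded, failures: failures }
end
```

Only `TransformationError` is swallowed; any other exception still propagates and ends the run.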
@@ -77,7 +103,7 @@ class Chronicle::ETL::Runner
77
103
  Chronicle::ETL::Logger.info(tty_log_completion)
78
104
  end
79
105
 
80
- def tty_log_job_start
106
+ def tty_log_job_initialize
81
107
  output = "Beginning job "
82
108
  output += "'#{@job.name}'".bold if @job.name
83
109
  output
@@ -95,8 +121,9 @@ class Chronicle::ETL::Runner
95
121
 
96
122
  def tty_log_completion
97
123
  status = @job_logger.success ? 'Success' : 'Failed'
98
- output = "\nCompleted job "
99
- output += "'#{@job.name}'".bold if @job.name
124
+ job_completion = @job_logger.success ? 'Completed' : 'Partially completed'
125
+ output = "\n#{job_completion} job"
126
+ output += " '#{@job.name}'".bold if @job.name
100
127
  output += " in #{ChronicDuration.output(@job_logger.duration)}" if @job_logger.duration
101
128
  output += "\n Status:\t".light_black + status
102
129
  output += "\n Completed:\t".light_black + "#{@job_logger.job_log.num_records_processed}"
@@ -10,6 +10,10 @@ module Chronicle
10
10
  # options::
11
11
  # Options for configuring this Transformer
12
12
  def initialize(extraction, options = {})
13
+ unless extraction.is_a?(Chronicle::ETL::Extraction)
14
+ raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
15
+ end
16
+
13
17
  @extraction = extraction
14
18
  apply_options(options)
15
19
  end
@@ -1,5 +1,5 @@
1
1
  module Chronicle
2
2
  module ETL
3
- VERSION = "0.5.0"
3
+ VERSION = "0.5.3"
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: chronicle-etl
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.5.0
4
+ version: 0.5.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Louis
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2022-03-24 00:00:00.000000000 Z
11
+ date: 2022-04-04 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activesupport
@@ -396,6 +396,7 @@ files:
396
396
  - lib/chronicle/etl/job_logger.rb
397
397
  - lib/chronicle/etl/loaders/csv_loader.rb
398
398
  - lib/chronicle/etl/loaders/helpers/encoding_helper.rb
399
+ - lib/chronicle/etl/loaders/helpers/stdout_helper.rb
399
400
  - lib/chronicle/etl/loaders/json_loader.rb
400
401
  - lib/chronicle/etl/loaders/loader.rb
401
402
  - lib/chronicle/etl/loaders/rest_loader.rb