chronicle-etl 0.4.1 → 0.4.4
- checksums.yaml +4 -4
- data/.rubocop.yml +3 -0
- data/README.md +16 -5
- data/chronicle-etl.gemspec +3 -0
- data/lib/chronicle/etl/cli/cli_base.rb +31 -0
- data/lib/chronicle/etl/cli/connectors.rb +4 -11
- data/lib/chronicle/etl/cli/jobs.rb +45 -21
- data/lib/chronicle/etl/cli/main.rb +32 -1
- data/lib/chronicle/etl/cli/plugins.rb +62 -0
- data/lib/chronicle/etl/cli/subcommand_base.rb +1 -1
- data/lib/chronicle/etl/cli.rb +3 -0
- data/lib/chronicle/etl/config.rb +7 -4
- data/lib/chronicle/etl/configurable.rb +7 -2
- data/lib/chronicle/etl/exceptions.rb +25 -9
- data/lib/chronicle/etl/extractors/extractor.rb +1 -1
- data/lib/chronicle/etl/job.rb +7 -1
- data/lib/chronicle/etl/job_definition.rb +31 -5
- data/lib/chronicle/etl/loaders/csv_loader.rb +35 -8
- data/lib/chronicle/etl/loaders/helpers/encoding_helper.rb +18 -0
- data/lib/chronicle/etl/loaders/json_loader.rb +1 -1
- data/lib/chronicle/etl/loaders/loader.rb +23 -0
- data/lib/chronicle/etl/loaders/table_loader.rb +4 -20
- data/lib/chronicle/etl/logger.rb +5 -2
- data/lib/chronicle/etl/registry/connector_registration.rb +5 -0
- data/lib/chronicle/etl/registry/plugin_registry.rb +75 -0
- data/lib/chronicle/etl/registry/registry.rb +27 -14
- data/lib/chronicle/etl/runner.rb +38 -21
- data/lib/chronicle/etl/transformers/image_file_transformer.rb +2 -2
- data/lib/chronicle/etl/version.rb +1 -1
- metadata +52 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2f035ef95ebae675973ce505c71345c0c2da640b20a3e88050f4c88c76caf656
+  data.tar.gz: '0486e4ce5bfdb85ad6ccb5a792ac7aa5a897afecf839c759bb78a2f33136d34e'
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f9a1ba3cb4a9abd3bc8a499012b3456b1a2b4cf1f55bed1213f0b1baa6ea96d0ad6e54a470425fa5aa4961061630095218a31f64ef4a39bea15c547219f9a7a8
+  data.tar.gz: d82ff59fd2875d55b079b7814b6a028f98f80f17d3ae2bb3291e5ae6cfb7e1b06f571e16fc73c83dea41d7682f24eb9b7ee3fa6ae7cc709ede57e12011e6a0be
data/.rubocop.yml
CHANGED
data/README.md
CHANGED
@@ -1,12 +1,14 @@
|
|
1
1
|
## A CLI toolkit for extracting and working with your digital history
|
2
2
|
|
3
|
-
|
3
|
+
![chronicle-etl-banner](https://user-images.githubusercontent.com/6291/157330518-0f934c9a-9ec4-43d9-9cc2-12f156d09b37.png)
|
4
|
+
|
5
|
+
[![Gem Version](https://badge.fury.io/rb/chronicle-etl.svg)](https://badge.fury.io/rb/chronicle-etl) [![Ruby](https://github.com/chronicle-app/chronicle-etl/actions/workflows/ruby.yml/badge.svg)](https://github.com/chronicle-app/chronicle-etl/actions/workflows/ruby.yml) [![Docs](https://img.shields.io/badge/docs-rubydoc.info-blue)](https://www.rubydoc.info/gems/chronicle-etl/)
|
4
6
|
|
5
7
|
Are you trying to archive your digital history or incorporate it into your own projects? You’ve probably discovered how frustrating it is to get machine-readable access to your own data. While [building a memex](https://hyfen.net/memex/), I learned first-hand what great efforts must be made before you can begin using the data in interesting ways.
|
6
8
|
|
7
9
|
If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this project is for you! (*If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues).*)
|
8
10
|
|
9
|
-
|
11
|
+
**`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a destination (e.g. a CSV file, JSON, external API).
|
10
12
|
|
11
13
|
## What does `chronicle-etl` give you?
|
12
14
|
* **CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
|
@@ -86,7 +88,16 @@ Plugins provide access to data from third-party platforms, services, or formats.
|
|
86
88
|
|
87
89
|
```bash
|
88
90
|
# Install a plugin
|
89
|
-
$ chronicle-etl
|
91
|
+
$ chronicle-etl plugins:install NAME
|
92
|
+
|
93
|
+
# Install the imessage plugin
|
94
|
+
$ chronicle-etl plugins:install imessage
|
95
|
+
|
96
|
+
# List installed plugins
|
97
|
+
$ chronicle-etl plugins:list
|
98
|
+
|
99
|
+
# Uninstall a plugin
|
100
|
+
$ chronicle-etl plugins:uninstall NAME
|
90
101
|
```
|
91
102
|
|
92
103
|
A few dozen importers exist [in my Memex project](https://hyfen.net/memex/) and they’re being ported over to the Chronicle system. This table shows what’s available now and what’s coming. Rows are sorted in very rough order of priority.
|
@@ -99,8 +110,8 @@ If you want to work together on a connector, please [get in touch](#get-in-touch
|
|
99
110
|
| [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (zsh support pending) |
|
100
111
|
| [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (imap support pending) |
|
101
112
|
| [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
|
113
|
+
| [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
|
102
114
|
| github | Github user and repo activity | In progress |
|
103
|
-
| safari | Browser history from local sqlite db | Needs porting |
|
104
115
|
| chrome | Browser history from local sqlite db | Needs porting |
|
105
116
|
| whatsapp | Messaging history (via individual chat exports) or reverse-engineered local desktop install | Unstarted |
|
106
117
|
| anki | Studying and card creation history | Needs porting |
|
@@ -186,4 +197,4 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/chroni
|
|
186
197
|
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
|
187
198
|
|
188
199
|
## Code of Conduct
|
189
|
-
Everyone interacting in the Chronicle::ETL project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/chronicle-app/chronicle-etl/blob/
|
200
|
+
Everyone interacting in the Chronicle::ETL project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/chronicle-app/chronicle-etl/blob/main/CODE_OF_CONDUCT.md).
|
data/chronicle-etl.gemspec
CHANGED
@@ -47,8 +47,11 @@ Gem::Specification.new do |spec|
   spec.add_dependency "sequel", "~> 5.35"
   spec.add_dependency "sqlite3", "~> 1.4"
   spec.add_dependency "thor", "~> 1.2"
+  spec.add_dependency "thor-hollaback", "~> 0.2"
   spec.add_dependency "tty-progressbar", "~> 0.17"
+  spec.add_dependency "tty-spinner"
   spec.add_dependency "tty-table", "~> 0.11"
+  spec.add_dependency "tty-prompt", "~> 0.23"
 
   spec.add_development_dependency "bundler", "~> 2.1"
   spec.add_development_dependency "pry-byebug", "~> 3.9"
data/lib/chronicle/etl/cli/cli_base.rb
ADDED
@@ -0,0 +1,31 @@
|
|
1
|
+
module Chronicle
|
2
|
+
module ETL
|
3
|
+
module CLI
|
4
|
+
# Base class for CLI commands
|
5
|
+
class CLIBase < ::Thor
|
6
|
+
no_commands do
|
7
|
+
# Shorthand for cli_exit(status: :failure)
|
8
|
+
def cli_fail(message: nil, exception: nil)
|
9
|
+
cli_exit(status: :failure, message: message, exception: exception)
|
10
|
+
end
|
11
|
+
|
12
|
+
# Exit from CLI
|
13
|
+
#
|
14
|
+
# @params status Can be either :success or :failure
|
15
|
+
# @params message to print
|
16
|
+
# @params exception stacktrace if log_level is set to debug
|
17
|
+
def cli_exit(status: :success, message: nil, exception: nil)
|
18
|
+
exit_code = status == :success ? 0 : 1
|
19
|
+
log_level = status == :success ? :info : :fatal
|
20
|
+
|
21
|
+
message = message.red if status != :success
|
22
|
+
|
23
|
+
Chronicle::ETL::Logger.debug(exception.full_message) if exception
|
24
|
+
Chronicle::ETL::Logger.send(log_level, message) if message
|
25
|
+
exit(exit_code)
|
26
|
+
end
|
27
|
+
end
|
28
|
+
end
|
29
|
+
end
|
30
|
+
end
|
31
|
+
end
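The exit handling added in `CLIBase#cli_exit` comes down to a small mapping from a status symbol to an exit code and a log level. Below is a minimal sketch of that mapping only. The helper name `exit_plan` is hypothetical, and it returns the values rather than calling `exit` so it can be run on its own without Thor or the Chronicle logger.

```ruby
# Hypothetical helper (not part of chronicle-etl): mirrors the two ternaries
# in CLIBase#cli_exit, but returns the result instead of exiting the process.
def exit_plan(status: :success)
  success = status == :success
  {
    exit_code: success ? 0 : 1,           # same mapping as `exit_code` in the diff
    log_level: success ? :info : :fatal   # same mapping as `log_level` in the diff
  }
end

p exit_plan
p exit_plan(status: :failure)
```

Anything other than `:success` is treated as a failure, which matches the `status == :success` checks in the diff.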
|
data/lib/chronicle/etl/cli/connectors.rb
CHANGED
@@ -8,11 +8,6 @@ module Chronicle
|
|
8
8
|
default_task 'list'
|
9
9
|
namespace :connectors
|
10
10
|
|
11
|
-
desc "install NAME", "Installs connector NAME"
|
12
|
-
def install(name)
|
13
|
-
Chronicle::ETL::Registry.install_connector(name)
|
14
|
-
end
|
15
|
-
|
16
11
|
desc "list", "Lists available connectors"
|
17
12
|
# Display all available connectors that chronicle-etl has access to
|
18
13
|
def list
|
@@ -44,21 +39,19 @@ module Chronicle
|
|
44
39
|
desc "show PHASE IDENTIFIER", "Show information about a connector"
|
45
40
|
def show(phase, identifier)
|
46
41
|
unless ['extractor', 'transformer', 'loader'].include?(phase)
|
47
|
-
|
48
|
-
return
|
42
|
+
cli_fail(message: "Phase argument must be one of: [extractor, transformer, loader]")
|
49
43
|
end
|
50
44
|
|
51
45
|
begin
|
52
46
|
connector = Chronicle::ETL::Registry.find_by_phase_and_identifier(phase.to_sym, identifier)
|
53
|
-
rescue Chronicle::ETL::ConnectorNotAvailableError
|
54
|
-
|
55
|
-
return
|
47
|
+
rescue Chronicle::ETL::ConnectorNotAvailableError, Chronicle::ETL::PluginError => e
|
48
|
+
cli_fail(message: "Could not find #{phase} #{identifier}", exception: e)
|
56
49
|
end
|
57
50
|
|
58
51
|
puts connector.klass.to_s.bold
|
59
52
|
puts " #{connector.descriptive_phrase}"
|
60
53
|
puts
|
61
|
-
puts "
|
54
|
+
puts "Settings:"
|
62
55
|
|
63
56
|
headers = ['name', 'default', 'required'].map{ |h| h.to_s.upcase.bold }
|
64
57
|
|
data/lib/chronicle/etl/cli/jobs.rb
CHANGED
@@ -1,4 +1,5 @@
|
|
1
1
|
require 'pp'
|
2
|
+
require 'tty-prompt'
|
2
3
|
|
3
4
|
module Chronicle
|
4
5
|
module ETL
|
@@ -6,7 +7,7 @@ module Chronicle
|
|
6
7
|
# CLI commands for working with ETL jobs
|
7
8
|
class Jobs < SubcommandBase
|
8
9
|
default_task "start"
|
9
|
-
namespace :jobs
|
10
|
+
namespace :jobs
|
10
11
|
|
11
12
|
class_option :name, aliases: '-j', desc: 'Job configuration name'
|
12
13
|
|
@@ -25,16 +26,11 @@ module Chronicle
|
|
25
26
|
|
26
27
|
class_option :output, aliases: '-o', desc: 'Output filename', type: 'string'
|
27
28
|
class_option :fields, desc: 'Output only these fields', type: 'array', banner: 'field1 field2 ...'
|
28
|
-
|
29
|
-
class_option :log_level, desc: 'Log level (debug, info, warn, error, fatal)', default: 'info'
|
30
|
-
class_option :verbose, aliases: '-v', desc: 'Set log level to verbose', type: :boolean
|
31
|
-
class_option :silent, desc: 'Silence all output', type: :boolean
|
29
|
+
class_option :header_row, desc: 'Output the header row of tabular output', type: 'boolean'
|
32
30
|
|
33
31
|
# Thor doesn't like `run` as a command name
|
34
32
|
map run: :start
|
35
33
|
desc "run", "Start a job"
|
36
|
-
option :log_level, desc: 'Log level (debug, info, warn, error, fatal)', default: 'info'
|
37
|
-
option :verbose, aliases: '-v', desc: 'Set log level to verbose', type: :boolean
|
38
34
|
option :dry_run, desc: 'Only run the extraction and transform steps, not the loading', type: :boolean
|
39
35
|
long_desc <<-LONG_DESC
|
40
36
|
This will run an ETL job. Each job needs three parts:
|
@@ -49,25 +45,41 @@ module Chronicle
|
|
49
45
|
LONG_DESC
|
50
46
|
# Run an ETL job
|
51
47
|
def start
|
52
|
-
setup_log_level
|
53
48
|
job_definition = build_job_definition(options)
|
54
|
-
|
55
|
-
|
56
|
-
|
49
|
+
|
50
|
+
if job_definition.plugins_missing?
|
51
|
+
missing_plugins = job_definition.errors[:plugins]
|
52
|
+
.select { |error| error.is_a?(Chronicle::ETL::PluginLoadError) }
|
53
|
+
.map(&:name)
|
54
|
+
.uniq
|
55
|
+
install_missing_plugins(missing_plugins)
|
56
|
+
end
|
57
|
+
|
58
|
+
run_job(job_definition)
|
59
|
+
rescue Chronicle::ETL::JobDefinitionError => e
|
60
|
+
cli_fail(message: "Error running job.\n#{job_definition.errors}", exception: e)
|
57
61
|
end
|
58
62
|
|
59
63
|
desc "create", "Create a job"
|
60
64
|
# Create an ETL job
|
61
65
|
def create
|
62
66
|
job_definition = build_job_definition(options)
|
67
|
+
job_definition.validate!
|
68
|
+
|
63
69
|
path = File.join('chronicle', 'etl', 'jobs', options[:name])
|
64
70
|
Chronicle::ETL::Config.write(path, job_definition.definition)
|
71
|
+
rescue Chronicle::ETL::JobDefinitionError => e
|
72
|
+
cli_fail(message: "Job definition error", exception: e)
|
65
73
|
end
|
66
74
|
|
67
75
|
desc "show", "Show details about a job"
|
68
76
|
# Show an ETL job
|
69
77
|
def show
|
70
|
-
|
78
|
+
job_definition = build_job_definition(options)
|
79
|
+
job_definition.validate!
|
80
|
+
puts Chronicle::ETL::Job.new(job_definition)
|
81
|
+
rescue Chronicle::ETL::JobDefinitionError => e
|
82
|
+
cli_fail(message: "Job definition error", exception: e)
|
71
83
|
end
|
72
84
|
|
73
85
|
desc "list", "List all available jobs"
|
@@ -87,21 +99,32 @@ LONG_DESC
|
|
87
99
|
|
88
100
|
headers = ['name', 'extractor', 'transformer', 'loader'].map { |h| h.upcase.bold }
|
89
101
|
|
102
|
+
puts "Available jobs:"
|
90
103
|
table = TTY::Table.new(headers, job_details)
|
91
104
|
puts table.render(indent: 0, padding: [0, 2])
|
105
|
+
rescue Chronicle::ETL::ConfigError => e
|
106
|
+
cli_fail(message: "Config error. #{e.message}", exception: e)
|
92
107
|
end
|
93
108
|
|
94
109
|
private
|
95
110
|
|
96
|
-
def
|
97
|
-
|
98
|
-
|
99
|
-
|
100
|
-
|
101
|
-
|
102
|
-
|
103
|
-
|
104
|
-
|
111
|
+
def run_job(job_definition)
|
112
|
+
job = Chronicle::ETL::Job.new(job_definition)
|
113
|
+
runner = Chronicle::ETL::Runner.new(job)
|
114
|
+
runner.run!
|
115
|
+
end
|
116
|
+
|
117
|
+
# TODO: probably could merge this with something in cli/plugin
|
118
|
+
def install_missing_plugins(missing_plugins)
|
119
|
+
prompt = TTY::Prompt.new
|
120
|
+
message = "Plugin#{'s' if missing_plugins.count > 1} specified by job not installed.\n"
|
121
|
+
message += "Do you want to install "
|
122
|
+
message += missing_plugins.map { |name| "chronicle-#{name}".bold}.join(", ")
|
123
|
+
message += " and start the job?"
|
124
|
+
will_install = prompt.yes?(message)
|
125
|
+
cli_fail(message: "Must install #{missing_plugins.join(", ")} plugin to run job") unless will_install
|
126
|
+
|
127
|
+
Chronicle::ETL::CLI::Plugins.new.install(*missing_plugins)
|
105
128
|
end
|
106
129
|
|
107
130
|
# Create job definition by reading config file and then overwriting with flag options
|
@@ -129,6 +152,7 @@ LONG_DESC
|
|
129
152
|
|
130
153
|
loader_options = options[:'loader-opts'].merge({
|
131
154
|
output: options[:output],
|
155
|
+
header_row: options[:header_row],
|
132
156
|
fields: options[:fields]
|
133
157
|
}.compact)
|
134
158
|
|
data/lib/chronicle/etl/cli/main.rb
CHANGED
@@ -4,7 +4,15 @@ module Chronicle
|
|
4
4
|
module ETL
|
5
5
|
module CLI
|
6
6
|
# Main entrypoint for CLI app
|
7
|
-
class Main < ::
|
7
|
+
class Main < Chronicle::ETL::CLI::CLIBase
|
8
|
+
class_before :set_log_level
|
9
|
+
class_before :set_color_output
|
10
|
+
|
11
|
+
class_option :log_level, desc: 'Log level (debug, info, warn, error, fatal, silent)', default: 'info'
|
12
|
+
class_option :verbose, aliases: '-v', desc: 'Set log level to verbose', type: :boolean
|
13
|
+
class_option :silent, desc: 'Silence all output', type: :boolean
|
14
|
+
class_option :'no-color', desc: 'Disable colour output', type: :boolean
|
15
|
+
|
8
16
|
default_task "jobs"
|
9
17
|
|
10
18
|
desc 'connectors:COMMAND', 'Connectors available for ETL jobs', hide: true
|
@@ -13,6 +21,9 @@ module Chronicle
|
|
13
21
|
desc 'jobs:COMMAND', 'Configure and run jobs', hide: true
|
14
22
|
subcommand 'jobs', Jobs
|
15
23
|
|
24
|
+
desc 'plugins:COMMAND', 'Configure plugins', hide: true
|
25
|
+
subcommand 'plugins', Plugins
|
26
|
+
|
16
27
|
# Entrypoint for the CLI
|
17
28
|
def self.start(given_args = ARGV, config = {})
|
18
29
|
# take a subcommand:command and splits them so Thor knows how to hand off to the subcommand class
|
@@ -79,6 +90,26 @@ module Chronicle
|
|
79
90
|
shell.say
|
80
91
|
end
|
81
92
|
end
|
93
|
+
|
94
|
+
no_commands do
|
95
|
+
def testb
|
96
|
+
puts "hi"
|
97
|
+
end
|
98
|
+
def set_color_output
|
99
|
+
String.disable_colorization true if options[:'no-color'] || ENV['NO_COLOR']
|
100
|
+
end
|
101
|
+
|
102
|
+
def set_log_level
|
103
|
+
if options[:silent]
|
104
|
+
Chronicle::ETL::Logger.log_level = Chronicle::ETL::Logger::SILENT
|
105
|
+
elsif options[:verbose]
|
106
|
+
Chronicle::ETL::Logger.log_level = Chronicle::ETL::Logger::DEBUG
|
107
|
+
elsif options[:log_level]
|
108
|
+
level = Chronicle::ETL::Logger.const_get(options[:log_level].upcase)
|
109
|
+
Chronicle::ETL::Logger.log_level = level
|
110
|
+
end
|
111
|
+
end
|
112
|
+
end
|
82
113
|
end
|
83
114
|
end
|
84
115
|
end
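The new `set_log_level` callback applies a fixed precedence: `--silent` wins over `--verbose`, which wins over `--log_level`. Here is a minimal sketch of that precedence, assuming a plain options hash. `resolve_log_level` and `LEVEL_NAMES` are hypothetical names, and the sketch returns a symbol instead of looking up a `Chronicle::ETL::Logger` constant.

```ruby
# Level names taken from the --log_level option description in the diff.
LEVEL_NAMES = %i[debug info warn error fatal silent].freeze

# Hypothetical helper: same branch order as Main#set_log_level.
def resolve_log_level(options)
  return :silent if options[:silent]
  return :debug if options[:verbose]

  level = (options[:log_level] || 'info').to_s.downcase.to_sym
  # Assumption: reject unknown names explicitly. The code in the diff relies
  # on const_get, which would raise NameError for an unknown level.
  raise ArgumentError, "Unknown log level: #{level}" unless LEVEL_NAMES.include?(level)

  level
end

p resolve_log_level(silent: true, verbose: true)
p resolve_log_level(verbose: true, log_level: 'error')
p resolve_log_level(log_level: 'warn')
```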
|
data/lib/chronicle/etl/cli/plugins.rb
ADDED
@@ -0,0 +1,62 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require "tty-prompt"
|
4
|
+
require "tty-spinner"
|
5
|
+
|
6
|
+
module Chronicle
|
7
|
+
module ETL
|
8
|
+
module CLI
|
9
|
+
# CLI commands for working with ETL plugins
|
10
|
+
class Plugins < SubcommandBase
|
11
|
+
default_task 'list'
|
12
|
+
namespace :plugins
|
13
|
+
|
14
|
+
desc "install", "Install a plugin"
|
15
|
+
def install(*plugins)
|
16
|
+
cli_fail(message: "Please specify a plugin to install") unless plugins.any?
|
17
|
+
|
18
|
+
spinner = TTY::Spinner.new("[:spinner] Installing #{plugins.join(", ")}...", format: :dots_2)
|
19
|
+
spinner.auto_spin
|
20
|
+
plugins.each do |plugin|
|
21
|
+
spinner.update(title: "Installing #{plugin}")
|
22
|
+
Chronicle::ETL::Registry::PluginRegistry.install(plugin)
|
23
|
+
rescue Chronicle::ETL::PluginError => e
|
24
|
+
spinner.error("Error".red)
|
25
|
+
cli_fail(message: "Plugin '#{plugin}' could not be installed", exception: e)
|
26
|
+
end
|
27
|
+
spinner.success("(#{'successful'.green})")
|
28
|
+
end
|
29
|
+
|
30
|
+
desc "uninstall", "Uninstall a plugin"
|
31
|
+
def uninstall(name)
|
32
|
+
spinner = TTY::Spinner.new("[:spinner] Uninstalling plugin #{name}...", format: :dots_2)
|
33
|
+
spinner.auto_spin
|
34
|
+
Chronicle::ETL::Registry::PluginRegistry.uninstall(name)
|
35
|
+
spinner.success("(#{'successful'.green})")
|
36
|
+
rescue Chronicle::ETL::PluginError => e
|
37
|
+
spinner.error("Error".red)
|
38
|
+
cli_fail(message: "Plugin '#{name}' could not be uninstalled (was it installed?)", exception: e)
|
39
|
+
end
|
40
|
+
|
41
|
+
desc "list", "Lists available plugins"
|
42
|
+
# Display all available plugins that chronicle-etl has access to
|
43
|
+
def list
|
44
|
+
plugins = Chronicle::ETL::Registry::PluginRegistry.all_installed_latest
|
45
|
+
|
46
|
+
info = plugins.map do |plugin|
|
47
|
+
{
|
48
|
+
name: plugin.name.sub("chronicle-", ""),
|
49
|
+
description: plugin.description,
|
50
|
+
version: plugin.version
|
51
|
+
}
|
52
|
+
end
|
53
|
+
|
54
|
+
headers = ['name', 'description', 'latest version'].map{ |h| h.to_s.upcase.bold }
|
55
|
+
table = TTY::Table.new(headers, info.map(&:values))
|
56
|
+
puts "Installed plugins:"
|
57
|
+
puts table.render(indent: 2, padding: [0, 0])
|
58
|
+
end
|
59
|
+
end
|
60
|
+
end
|
61
|
+
end
|
62
|
+
end
|
data/lib/chronicle/etl/cli/subcommand_base.rb
CHANGED
@@ -2,7 +2,7 @@ module Chronicle
|
|
2
2
|
module ETL
|
3
3
|
module CLI
|
4
4
|
# Base class for CLI subcommands. Overrides Thor methods so we can use command:subcommand syntax
|
5
|
-
class SubcommandBase < ::
|
5
|
+
class SubcommandBase < Chronicle::ETL::CLI::CLIBase
|
6
6
|
# Print usage instructions for a subcommand
|
7
7
|
def self.help(shell, subcommand = false)
|
8
8
|
list = printable_commands(true, subcommand)
|
data/lib/chronicle/etl/cli.rb
CHANGED
@@ -1,7 +1,10 @@
|
|
1
1
|
require 'thor'
|
2
|
+
require 'thor/hollaback'
|
2
3
|
require 'chronicle/etl'
|
3
4
|
|
5
|
+
require 'chronicle/etl/cli/cli_base'
|
4
6
|
require 'chronicle/etl/cli/subcommand_base'
|
5
7
|
require 'chronicle/etl/cli/connectors'
|
6
8
|
require 'chronicle/etl/cli/jobs'
|
9
|
+
require 'chronicle/etl/cli/plugins'
|
7
10
|
require 'chronicle/etl/cli/main'
|
data/lib/chronicle/etl/config.rb
CHANGED
@@ -24,16 +24,14 @@ module Chronicle
|
|
24
24
|
|
25
25
|
# Returns all jobs available in ~/.config/chronicle/etl/jobs/*.yml
|
26
26
|
def available_jobs
|
27
|
-
|
28
|
-
Dir.glob(File.join(job_directory, "*.yml")).map do |filename|
|
27
|
+
Dir.glob(File.join(config_directory("jobs"), "*.yml")).map do |filename|
|
29
28
|
File.basename(filename, ".*")
|
30
29
|
end
|
31
30
|
end
|
32
31
|
|
33
32
|
# Returns all available credentials available in ~/.config/chronicle/etl/credentials/*.yml
|
34
33
|
def available_credentials
|
35
|
-
|
36
|
-
Dir.glob(File.join(job_directory, "*.yml")).map do |filename|
|
34
|
+
Dir.glob(File.join(config_directory("credentials"), "*.yml")).map do |filename|
|
37
35
|
File.basename(filename, ".*")
|
38
36
|
end
|
39
37
|
end
|
@@ -48,6 +46,11 @@ module Chronicle
|
|
48
46
|
def load_credentials(name)
|
49
47
|
config = self.load("chronicle/etl/credentials/#{name}.yml")
|
50
48
|
end
|
49
|
+
|
50
|
+
def config_directory(type)
|
51
|
+
path = "chronicle/etl/#{type}"
|
52
|
+
Runcom::Config.new(path).current || raise(Chronicle::ETL::ConfigError, "Could not access config directory (#{path})")
|
53
|
+
end
|
51
54
|
end
|
52
55
|
end
|
53
56
|
end
|
data/lib/chronicle/etl/configurable.rb
CHANGED
@@ -57,7 +57,7 @@ module Chronicle
|
|
57
57
|
|
58
58
|
options.each do |name, value|
|
59
59
|
setting = self.class.all_settings[name]
|
60
|
-
raise(Chronicle::ETL::
|
60
|
+
raise(Chronicle::ETL::ConnectorConfigurationError, "Unrecognized setting: #{name}") unless setting
|
61
61
|
|
62
62
|
@config[name] = coerced_value(setting, value)
|
63
63
|
end
|
@@ -78,7 +78,7 @@ module Chronicle
|
|
78
78
|
|
79
79
|
def validate_config
|
80
80
|
missing = (self.class.all_required_settings.keys - @config.compacted_h.keys)
|
81
|
-
raise Chronicle::ETL::
|
81
|
+
raise Chronicle::ETL::ConnectorConfigurationError, "Missing options: #{missing}" if missing.count.positive?
|
82
82
|
end
|
83
83
|
|
84
84
|
def coerced_value(setting, value)
|
@@ -89,6 +89,11 @@ module Chronicle
|
|
89
89
|
value.to_s
|
90
90
|
end
|
91
91
|
|
92
|
+
# TODO: think about whether to split up float, integer
|
93
|
+
def coerce_numeric(value)
|
94
|
+
value.to_f
|
95
|
+
end
|
96
|
+
|
92
97
|
def coerce_boolean(value)
|
93
98
|
if value.is_a?(String)
|
94
99
|
value.downcase == "true"
|
data/lib/chronicle/etl/exceptions.rb
CHANGED
@@ -2,10 +2,33 @@ module Chronicle
|
|
2
2
|
module ETL
|
3
3
|
class Error < StandardError; end
|
4
4
|
|
5
|
-
class
|
5
|
+
class ConfigError < Error; end
|
6
6
|
|
7
7
|
class RunnerTypeError < Error; end
|
8
8
|
|
9
|
+
class JobDefinitionError < Error
|
10
|
+
attr_reader :job_definition
|
11
|
+
|
12
|
+
def initialize(job_definition)
|
13
|
+
@job_definition = job_definition
|
14
|
+
super
|
15
|
+
end
|
16
|
+
end
|
17
|
+
|
18
|
+
class PluginError < Error
|
19
|
+
attr_reader :name
|
20
|
+
|
21
|
+
def initialize(name)
|
22
|
+
@name = name
|
23
|
+
end
|
24
|
+
end
|
25
|
+
|
26
|
+
class PluginConflictError < PluginError; end
|
27
|
+
class PluginNotAvailableError < PluginError; end
|
28
|
+
class PluginLoadError < PluginError; end
|
29
|
+
|
30
|
+
class ConnectorConfigurationError < Error; end
|
31
|
+
|
9
32
|
class ConnectorNotAvailableError < Error
|
10
33
|
def initialize(message, provider: nil, name: nil)
|
11
34
|
super(message)
|
@@ -22,14 +45,7 @@ module Chronicle
|
|
22
45
|
|
23
46
|
class SerializationError < Error; end
|
24
47
|
|
25
|
-
class TransformationError < Error
|
26
|
-
attr_reader :transformation
|
27
|
-
|
28
|
-
def initialize(message=nil, transformation:)
|
29
|
-
super(message)
|
30
|
-
@transformation = transformation
|
31
|
-
end
|
32
|
-
end
|
48
|
+
class TransformationError < Error; end
|
33
49
|
|
34
50
|
class UntransformableRecordError < TransformationError; end
|
35
51
|
end
|
data/lib/chronicle/etl/job.rb
CHANGED
@@ -1,6 +1,11 @@
|
|
1
1
|
require 'forwardable'
|
2
|
+
|
2
3
|
module Chronicle
|
3
4
|
module ETL
|
5
|
+
# A runner job
|
6
|
+
#
|
7
|
+
# TODO: this can probably be merged with JobDefinition. Not clear
|
8
|
+
# where the boundaries are
|
4
9
|
class Job
|
5
10
|
extend Forwardable
|
6
11
|
|
@@ -12,7 +17,8 @@ module Chronicle
|
|
12
17
|
:transformer_klass,
|
13
18
|
:transformer_options,
|
14
19
|
:loader_klass,
|
15
|
-
:loader_options
|
20
|
+
:loader_options,
|
21
|
+
:job_definition
|
16
22
|
|
17
23
|
# TODO: build a proper id system
|
18
24
|
alias id name
|
data/lib/chronicle/etl/job_definition.rb
CHANGED
@@ -19,17 +19,47 @@ module Chronicle
|
|
19
19
|
}
|
20
20
|
}.freeze
|
21
21
|
|
22
|
+
attr_reader :errors
|
22
23
|
attr_accessor :definition
|
23
24
|
|
24
25
|
def initialize()
|
25
26
|
@definition = SKELETON_DEFINITION
|
26
27
|
end
|
27
28
|
|
29
|
+
def valid?
|
30
|
+
validate
|
31
|
+
@errors.empty?
|
32
|
+
end
|
33
|
+
|
34
|
+
def validate
|
35
|
+
@errors = {}
|
36
|
+
|
37
|
+
Chronicle::ETL::Registry::PHASES.each do |phase|
|
38
|
+
__send__("#{phase}_klass".to_sym)
|
39
|
+
rescue Chronicle::ETL::PluginError => e
|
40
|
+
@errors[:plugins] ||= []
|
41
|
+
@errors[:plugins] << e
|
42
|
+
end
|
43
|
+
end
|
44
|
+
|
45
|
+
def plugins_missing?
|
46
|
+
validate
|
47
|
+
|
48
|
+
@errors[:plugins] || []
|
49
|
+
.filter { |e| e.instance_of?(Chronicle::ETL::PluginLoadError) }
|
50
|
+
.any?
|
51
|
+
end
|
52
|
+
|
53
|
+
def validate!
|
54
|
+
raise(Chronicle::ETL::JobDefinitionError.new(self), "Job definition is invalid") unless valid?
|
55
|
+
|
56
|
+
true
|
57
|
+
end
|
58
|
+
|
28
59
|
# Add config hash to this definition
|
29
60
|
def add_config(config = {})
|
30
61
|
@definition = @definition.deep_merge(config)
|
31
62
|
load_credentials
|
32
|
-
validate
|
33
63
|
end
|
34
64
|
|
35
65
|
# Is this job continuing from a previous run?
|
@@ -80,10 +110,6 @@ module Chronicle
|
|
80
110
|
end
|
81
111
|
end
|
82
112
|
end
|
83
|
-
|
84
|
-
def validate
|
85
|
-
return true # TODO
|
86
|
-
end
|
87
113
|
end
|
88
114
|
end
|
89
115
|
end
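The validation added to `JobDefinition` resolves the connector class for each phase and collects any plugin errors instead of failing on the first one. The sketch below shows that collect-then-inspect pattern in isolation. `collect_plugin_errors`, the `resolver` lambda and the local `PluginLoadError` class are stand-ins for illustration, and the phase list is an assumption based on the phases named in `connectors.rb`.

```ruby
# Stand-in for Chronicle::ETL::PluginLoadError (carries the plugin name).
class PluginLoadError < StandardError
  attr_reader :name

  def initialize(name)
    @name = name
    super("Plugin '#{name}' couldn't be loaded")
  end
end

# Assumed phase list; Registry::PHASES itself is not shown in this diff.
PHASES = %i[extractor transformer loader].freeze

# Hypothetical helper: `resolver` is any callable that returns a connector
# class for a phase or raises PluginLoadError.
def collect_plugin_errors(resolver)
  errors = {}
  PHASES.each do |phase|
    resolver.call(phase)
  rescue PluginLoadError => e
    (errors[:plugins] ||= []) << e
  end
  errors
end

# Pretend the extractor and transformer both come from a missing plugin.
resolver = ->(phase) { raise PluginLoadError, 'imessage' unless phase == :loader }
errors = collect_plugin_errors(resolver)
puts errors.fetch(:plugins, []).map(&:name).uniq.inspect
```

Collecting every error first is what lets `jobs.rb` build a deduplicated list of missing plugins and offer to install them all in a single prompt.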
|
data/lib/chronicle/etl/loaders/csv_loader.rb
CHANGED
@@ -7,22 +7,49 @@ module Chronicle
|
|
7
7
|
r.description = 'CSV'
|
8
8
|
end
|
9
9
|
|
10
|
-
|
11
|
-
|
12
|
-
|
10
|
+
setting :output, default: $stdout
|
11
|
+
setting :headers, default: true
|
12
|
+
setting :header_row, default: true
|
13
|
+
|
14
|
+
def records
|
15
|
+
@records ||= []
|
13
16
|
end
|
14
17
|
|
15
18
|
def load(record)
|
16
|
-
|
19
|
+
records << record.to_h_flattened
|
17
20
|
end
|
18
21
|
|
19
22
|
def finish
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
23
|
+
return unless records.any?
|
24
|
+
|
25
|
+
headers = build_headers(records)
|
26
|
+
|
27
|
+
csv_options = {}
|
28
|
+
if @config.headers
|
29
|
+
csv_options[:write_headers] = @config.header_row
|
30
|
+
csv_options[:headers] = headers
|
31
|
+
end
|
32
|
+
|
33
|
+
if @config.output.is_a?(IO)
|
34
|
+
# This might seem like a duplication of the default value ($stdout)
|
35
|
+
# but it's because rspec overwrites $stdout (in helper #capture) to
|
36
|
+
# capture output.
|
37
|
+
io = $stdout.dup
|
38
|
+
else
|
39
|
+
io = File.open(@config.output, "w+")
|
40
|
+
end
|
41
|
+
|
42
|
+
output = CSV.generate(**csv_options) do |csv|
|
43
|
+
records.each do |record|
|
44
|
+
csv << record
|
45
|
+
.transform_keys(&:to_sym)
|
46
|
+
.values_at(*headers)
|
47
|
+
.map { |value| force_utf8(value) }
|
24
48
|
end
|
25
49
|
end
|
50
|
+
|
51
|
+
io.write(output)
|
52
|
+
io.close
|
26
53
|
end
|
27
54
|
end
|
28
55
|
end
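The rewritten CSV loader buffers records and hands `CSV.generate` a `headers` list plus a `write_headers` flag, which is how the new `header_row` setting takes effect. Here is a minimal sketch of that interaction using only Ruby's `csv` library. The `to_csv` helper and the sample rows are hypothetical.

```ruby
require 'csv'

# Hypothetical helper: rows are arrays already ordered to match `headers`.
def to_csv(headers, rows, header_row: true)
  CSV.generate(headers: headers, write_headers: header_row) do |csv|
    rows.each { |row| csv << row }
  end
end

headers = %w[name age]
rows = [['Ada', 36], ['Grace', 45]]

puts to_csv(headers, rows)
puts to_csv(headers, rows, header_row: false)
```

With `header_row: false` the column order still follows `headers`, and only the first line is omitted.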
|
data/lib/chronicle/etl/loaders/helpers/encoding_helper.rb
ADDED
@@ -0,0 +1,18 @@
|
|
1
|
+
require 'pathname'
|
2
|
+
|
3
|
+
module Chronicle
|
4
|
+
module ETL
|
5
|
+
module Loaders
|
6
|
+
module Helpers
|
7
|
+
module EncodingHelper
|
8
|
+
# Mostly useful for handling loading with binary data from a raw extraction
|
9
|
+
def force_utf8(value)
|
10
|
+
return value unless value.is_a?(String)
|
11
|
+
|
12
|
+
value.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
|
13
|
+
end
|
14
|
+
end
|
15
|
+
end
|
16
|
+
end
|
17
|
+
end
|
18
|
+
end
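To see what `force_utf8` does in practice, the method body can be run on its own. The sketch below copies the two lines from the helper into a top-level method. The sample string is made up for illustration.

```ruby
# Same body as EncodingHelper#force_utf8, defined at top level for the demo.
def force_utf8(value)
  return value unless value.is_a?(String)

  value.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
end

dirty = "caf\xFF latte" # \xFF is never valid in UTF-8
clean = force_utf8(dirty)

puts dirty.valid_encoding? # the raw string is not valid UTF-8
puts clean.valid_encoding? # the cleaned string is
puts clean
```

Because `replace:` is an empty string, offending bytes are dropped rather than replaced with a marker character, so the output can be shorter than the input. Non-string values pass through untouched.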
|
data/lib/chronicle/etl/loaders/loader.rb
CHANGED
@@ -1,11 +1,17 @@
|
|
1
|
+
require_relative 'helpers/encoding_helper'
|
2
|
+
|
1
3
|
module Chronicle
|
2
4
|
module ETL
|
3
5
|
# Abstract class representing a Loader for an ETL job
|
4
6
|
class Loader
|
5
7
|
extend Chronicle::ETL::Registry::SelfRegistering
|
6
8
|
include Chronicle::ETL::Configurable
|
9
|
+
include Chronicle::ETL::Loaders::Helpers::EncodingHelper
|
7
10
|
|
8
11
|
setting :output
|
12
|
+
setting :fields
|
13
|
+
setting :fields_limit, default: nil
|
14
|
+
setting :fields_exclude
|
9
15
|
|
10
16
|
# Construct a new instance of this loader. Options are passed in from a Runner
|
11
17
|
# == Parameters:
|
@@ -25,6 +31,23 @@ module Chronicle
|
|
25
31
|
|
26
32
|
# Called once there are no more records to process
|
27
33
|
def finish; end
|
34
|
+
|
35
|
+
private
|
36
|
+
|
37
|
+
def build_headers(records)
|
38
|
+
headers =
|
39
|
+
if @config.fields && @config.fields.any?
|
40
|
+
Set[*@config.fields]
|
41
|
+
else
|
42
|
+
# use all the keys of the flattened record hash
|
43
|
+
Set[*records.map(&:keys).flatten.map(&:to_s).uniq]
|
44
|
+
end
|
45
|
+
|
46
|
+
headers = headers.delete_if { |header| header.end_with?(*@config.fields_exclude) }
|
47
|
+
headers = headers.first(@config.fields_limit) if @config.fields_limit
|
48
|
+
|
49
|
+
headers.to_a.map(&:to_sym)
|
50
|
+
end
|
28
51
|
end
|
29
52
|
end
|
30
53
|
end
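`build_headers`, now shared by all loaders, picks the output columns in three steps: start from the explicit `fields` (or from every key seen in the records), drop anything matching `fields_exclude`, then cap the list at `fields_limit`. The standalone sketch below follows the same steps. The keyword arguments stand in for the `@config` settings, the sample records are invented, and the `map(&:to_s)` on `fields` is an added safeguard not present in the diff.

```ruby
require 'set'

# Standalone version of Loader#build_headers; keyword arguments replace
# @config.fields, @config.fields_exclude and @config.fields_limit.
def build_headers(records, fields: nil, fields_exclude: nil, fields_limit: nil)
  headers =
    if fields && fields.any?
      Set[*fields.map(&:to_s)]
    else
      # use all the keys of the (already flattened) record hashes
      Set[*records.flat_map(&:keys).map(&:to_s)]
    end

  # `*nil` expands to no arguments, so a nil exclude list removes nothing
  headers = headers.delete_if { |header| header.end_with?(*fields_exclude) }
  headers = headers.first(fields_limit) if fields_limit

  headers.to_a.map(&:to_sym)
end

records = [
  { 'title' => 'First', 'type' => 'note', 'id' => 1 },
  { 'title' => 'Second', 'body' => 'text' }
]

p build_headers(records, fields_exclude: ['type'])
p build_headers(records, fields_exclude: ['type'], fields_limit: 2)
p build_headers(records, fields: %w[body title])
```

Note that exclusion is a suffix match (`end_with?`), so excluding `type` would also drop a flattened key such as `actor.type`.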
|
data/lib/chronicle/etl/loaders/table_loader.rb
CHANGED
@@ -9,11 +9,10 @@ module Chronicle
|
|
9
9
|
r.description = 'an ASCII table'
|
10
10
|
end
|
11
11
|
|
12
|
-
setting :fields_limit, default: nil
|
13
|
-
setting :fields_exclude, default: ['lids', 'type']
|
14
|
-
setting :fields, default: []
|
15
12
|
setting :truncate_values_at, default: 40
|
16
13
|
setting :table_renderer, default: :basic
|
14
|
+
setting :fields_exclude, default: ['lids', 'type']
|
15
|
+
setting :header_row, default: true
|
17
16
|
|
18
17
|
def load(record)
|
19
18
|
records << record.to_h_flattened
|
@@ -25,7 +24,7 @@ module Chronicle
|
|
25
24
|
headers = build_headers(records)
|
26
25
|
rows = build_rows(records, headers)
|
27
26
|
|
28
|
-
@table = TTY::Table.new(header: headers, rows: rows)
|
27
|
+
@table = TTY::Table.new(header: (headers if @config.header_row), rows: rows)
|
29
28
|
puts @table.render(
|
30
29
|
@config.table_renderer.to_sym,
|
31
30
|
padding: [0, 2, 0, 0]
|
@@ -38,25 +37,10 @@ module Chronicle
|
|
38
37
|
|
39
38
|
private
|
40
39
|
|
41
|
-
def build_headers(records)
|
42
|
-
headers =
|
43
|
-
if @config.fields.any?
|
44
|
-
Set[*@config.fields]
|
45
|
-
else
|
46
|
-
# use all the keys of the flattened record hash
|
47
|
-
Set[*records.map(&:keys).flatten.map(&:to_s).uniq]
|
48
|
-
end
|
49
|
-
|
50
|
-
headers = headers.delete_if { |header| header.end_with?(*@config.fields_exclude) } if @config.fields_exclude.any?
|
51
|
-
headers = headers.first(@config.fields_limit) if @config.fields_limit
|
52
|
-
|
53
|
-
headers.to_a.map(&:to_sym)
|
54
|
-
end
|
55
|
-
|
56
40
|
def build_rows(records, headers)
|
57
41
|
records.map do |record|
|
58
42
|
values = record.transform_keys(&:to_sym).values_at(*headers).map{|value| value.to_s }
|
59
|
-
|
43
|
+
values = values.map { |value| force_utf8(value) }
|
60
44
|
if @config.truncate_values_at
|
61
45
|
values = values.map{ |value| value.truncate(@config.truncate_values_at) }
|
62
46
|
end
|
data/lib/chronicle/etl/logger.rb
CHANGED
@@ -13,7 +13,6 @@ module Chronicle
|
|
13
13
|
attr_accessor :log_level
|
14
14
|
|
15
15
|
@log_level = INFO
|
16
|
-
@destination = $stderr
|
17
16
|
|
18
17
|
def output message, level
|
19
18
|
return unless level >= @log_level
|
@@ -21,10 +20,14 @@ module Chronicle
|
|
21
20
|
if @progress_bar
|
22
21
|
@progress_bar.log(message)
|
23
22
|
else
|
24
|
-
|
23
|
+
$stderr.puts(message)
|
25
24
|
end
|
26
25
|
end
|
27
26
|
|
27
|
+
def fatal(message)
|
28
|
+
output(message, FATAL)
|
29
|
+
end
|
30
|
+
|
28
31
|
def error(message)
|
29
32
|
output(message, ERROR)
|
30
33
|
end
|
data/lib/chronicle/etl/registry/plugin_registry.rb
ADDED
@@ -0,0 +1,75 @@
|
|
1
|
+
require 'rubygems'
|
2
|
+
require 'rubygems/command'
|
3
|
+
require 'rubygems/commands/install_command'
|
4
|
+
require 'rubygems/uninstaller'
|
5
|
+
|
6
|
+
module Chronicle
|
7
|
+
module ETL
|
8
|
+
module Registry
|
9
|
+
# Responsible for managing plugins available to chronicle-etl
|
10
|
+
#
|
11
|
+
# @todo Better validation for whether a gem is actually a plugin
|
12
|
+
# @todo Add ways to load a plugin that don't require a gem on rubygems.org
|
13
|
+
module PluginRegistry
|
14
|
+
# Does this plugin exist?
|
15
|
+
def self.exists?(name)
|
16
|
+
# TODO: implement this. Could query rubygems.org or have a
|
17
|
+
# hardcoded approved list
|
18
|
+
true
|
19
|
+
end
|
20
|
+
|
21
|
+
# All versions of all plugins currently installed
|
22
|
+
def self.all_installed
|
23
|
+
# TODO: add check for chronicle-etl dependency
|
24
|
+
Gem::Specification.filter { |s| s.name.match(/^chronicle-/) && s.name != "chronicle-etl" }
|
25
|
+
end
|
26
|
+
|
27
|
+
# Latest version of each installed plugin
|
28
|
+
def self.all_installed_latest
|
29
|
+
all_installed.group_by(&:name)
|
30
|
+
.transform_values { |versions| versions.sort_by(&:version).reverse.first }
|
31
|
+
.values
|
32
|
+
end
|
33
|
+
|
34
|
+
# Activate a plugin with given name by `require`ing it
|
35
|
+
def self.activate(name)
|
36
|
+
# By default, activates the latest available version of a gem
|
37
|
+
# so don't have to run Kernel#gem separately
|
38
|
+
require "chronicle/#{name}"
|
39
|
+
rescue Gem::ConflictError => e
|
40
|
+
# TODO: figure out if there's more we can do here
|
41
|
+
raise Chronicle::ETL::PluginConflictError.new(name), "Plugin '#{name}' couldn't be loaded. #{e.message}"
|
42
|
+
rescue LoadError => e
|
43
|
+
raise Chronicle::ETL::PluginLoadError.new(name), "Plugin '#{name}' couldn't be loaded" if exists?(name)
|
44
|
+
|
45
|
+
raise Chronicle::ETL::PluginNotAvailableError.new(name), "Plugin #{name} doesn't exist"
|
46
|
+
end
|
47
|
+
|
48
|
+
# Install a plugin to local gems
|
49
|
+
def self.install(name)
|
50
|
+
gem_name = "chronicle-#{name}"
|
51
|
+
raise(Chronicle::ETL::PluginNotAvailableError.new(gem_name), "Plugin #{name} doesn't exist") unless exists?(gem_name)
|
52
|
+
|
53
|
+
Gem::DefaultUserInteraction.ui = Gem::SilentUI.new
|
54
|
+
Gem.install(gem_name)
|
55
|
+
|
56
|
+
activate(name)
|
57
|
+
rescue Gem::UnsatisfiableDependencyError
|
58
|
+
# TODO: we need to catch a lot more than this here
|
59
|
+
raise Chronicle::ETL::PluginNotAvailableError.new(name), "Plugin #{name} could not be installed."
|
60
|
+
end
|
61
|
+
|
62
|
+
# Uninstall a plugin
|
63
|
+
def self.uninstall(name)
|
64
|
+
gem_name = "chronicle-#{name}"
|
65
|
+
Gem::DefaultUserInteraction.ui = Gem::SilentUI.new
|
66
|
+
uninstaller = Gem::Uninstaller.new(gem_name)
|
67
|
+
uninstaller.uninstall
|
68
|
+
rescue Gem::InstallError
|
69
|
+
# TODO: strengthen this exception handling
|
70
|
+
raise(Chronicle::ETL::PluginError.new(name), "Plugin #{name} wasn't uninstalled")
|
71
|
+
end
|
72
|
+
end
|
73
|
+
end
|
74
|
+
end
|
75
|
+
end
|
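The `all_installed_latest` method above groups installed specs by name and keeps the newest version of each. A small sketch of that grouping, under stated assumptions: `FakeSpec` is a hypothetical stand-in for `Gem::Specification`, the gem names and versions are invented, and `max_by` is swapped in for the `sort_by ... reverse.first` chain shown in the diff.

```ruby
require 'rubygems'

# Hypothetical stand-in for Gem::Specification, carrying only the two
# fields the grouping logic reads.
FakeSpec = Struct.new(:name, :version)

# Keep the highest version of each gem name. Gem::Version compares
# segment by segment, so 0.10.0 sorts above 0.2.0.
def latest_per_name(specs)
  specs.group_by(&:name)
       .transform_values { |versions| versions.max_by(&:version) }
       .values
end

# Invented names and versions, for illustration only.
specs = [
  FakeSpec.new('chronicle-foo', Gem::Version.new('0.2.0')),
  FakeSpec.new('chronicle-foo', Gem::Version.new('0.10.0')),
  FakeSpec.new('chronicle-bar', Gem::Version.new('0.1.1'))
]

latest_per_name(specs).each { |spec| puts "#{spec.name} #{spec.version}" }
```

Comparing `Gem::Version` objects rather than strings matters here: as plain strings, `'0.2.0'` would sort above `'0.10.0'`.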
data/lib/chronicle/etl/registry/registry.rb
CHANGED
@@ -20,28 +20,40 @@ module Chronicle
|
|
20
20
|
end
|
21
21
|
end
|
22
22
|
|
23
|
-
def
|
24
|
-
|
25
|
-
Gem.install(gem_name)
|
23
|
+
def register connector
|
24
|
+
connectors << connector
|
26
25
|
end
|
27
26
|
|
28
|
-
def
|
27
|
+
def connectors
|
29
28
|
@connectors ||= []
|
30
|
-
@connectors << connector
|
31
29
|
end
|
32
30
|
|
33
31
|
def find_by_phase_and_identifier(phase, identifier)
|
34
|
-
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
|
32
|
+
# Simple case: built in connector
|
33
|
+
connector = connectors.find { |c| c.phase == phase && c.identifier == identifier }
|
34
|
+
return connector if connector
|
35
|
+
|
36
|
+
# if not available in built-in connectors, try to activate a
|
37
|
+
# relevant plugin and try again
|
38
|
+
if identifier.include?(":")
|
39
|
+
plugin, name = identifier.split(":")
|
40
|
+
else
|
41
|
+
# This case handles the case where the identifier is a
|
42
|
+
# shorthand (ie `imessage`) because there's only one default
|
43
|
+
# connector.
|
44
|
+
plugin = identifier
|
39
45
|
end
|
40
|
-
connector || raise(ConnectorNotAvailableError.new("Connector '#{identifier}' not found"))
|
41
|
-
end
|
42
46
|
|
43
|
-
|
44
|
-
|
47
|
+
PluginRegistry.activate(plugin)
|
48
|
+
|
49
|
+
candidates = connectors.select { |c| c.phase == phase && c.plugin == plugin }
|
50
|
+
# if no name given, just use first connector with right phase/plugin
|
51
|
+
# TODO: set up a property for connectors to specify that they're the
|
52
|
+
# default connector for the plugin
|
53
|
+
candidates = candidates.select { |c| c.identifier == name } if name
|
54
|
+
connector = candidates.first
|
55
|
+
|
56
|
+
connector || raise(ConnectorNotAvailableError, "Connector '#{identifier}' not found")
|
45
57
|
end
|
46
58
|
end
|
47
59
|
end
|
@@ -50,3 +62,4 @@ end
|
|
50
62
|
|
51
63
|
require_relative 'self_registering'
|
52
64
|
require_relative 'connector_registration'
|
65
|
+
require_relative 'plugin_registry'
|
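The new `find_by_phase_and_identifier` above falls back to plugin connectors when no built-in connector matches, accepting either `plugin:name` or a bare plugin name. The lookup can be sketched as below, assuming a hypothetical `Connector` struct and helper names, and leaving out the plugin activation step. Unlike the diff, the sketch splits with a limit of 2 so that any later colons stay in the connector name.

```ruby
# Hypothetical stand-in for a registered connector.
Connector = Struct.new(:phase, :plugin, :identifier)

# Split "plugin:name" into its parts. A bare identifier is treated as a
# plugin name with no connector name. The limit of 2 is a choice made for
# this sketch; the diff calls split(":") without a limit.
def parse_identifier(identifier)
  if identifier.include?(':')
    identifier.split(':', 2)
  else
    [identifier, nil]
  end
end

# Narrow by phase and plugin, then by connector name when one was given.
# With no name, the first matching connector is used.
def find_in_plugin(connectors, phase, identifier)
  plugin, name = parse_identifier(identifier)
  candidates = connectors.select { |c| c.phase == phase && c.plugin == plugin }
  candidates = candidates.select { |c| c.identifier == name } if name
  candidates.first
end

# Invented connectors, for illustration only.
connectors = [
  Connector.new(:extractor, 'foo', 'inbox'),
  Connector.new(:extractor, 'foo', 'archive'),
  Connector.new(:loader, 'foo', 'archive')
]

puts find_in_plugin(connectors, :extractor, 'foo:archive').identifier
puts find_in_plugin(connectors, :extractor, 'foo').identifier
```

The shorthand form depends on registration order, which is why the diff carries a TODO about letting a plugin declare its default connector.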
data/lib/chronicle/etl/runner.rb
CHANGED
@@ -8,19 +8,41 @@ class Chronicle::ETL::Runner
|
|
8
8
|
end
|
9
9
|
|
10
10
|
def run!
|
11
|
-
|
12
|
-
|
11
|
+
validate_job
|
12
|
+
instantiate_connectors
|
13
|
+
prepare_job
|
14
|
+
prepare_ui
|
15
|
+
run_extraction
|
16
|
+
finish_job
|
17
|
+
end
|
18
|
+
|
19
|
+
private
|
20
|
+
|
21
|
+
def validate_job
|
22
|
+
@job.job_definition.validate!
|
23
|
+
end
|
24
|
+
|
25
|
+
def instantiate_connectors
|
26
|
+
@extractor = @job.instantiate_extractor
|
27
|
+
@loader = @job.instantiate_loader
|
28
|
+
end
|
13
29
|
|
30
|
+
def prepare_job
|
31
|
+
Chronicle::ETL::Logger.info(tty_log_job_start)
|
14
32
|
@job_logger.start
|
15
|
-
loader.start
|
33
|
+
@loader.start
|
34
|
+
@extractor.prepare
|
35
|
+
end
|
16
36
|
|
17
|
-
|
18
|
-
total = extractor.results_count
|
37
|
+
def prepare_ui
|
38
|
+
total = @extractor.results_count
|
19
39
|
@progress_bar = Chronicle::ETL::Utils::ProgressBar.new(title: 'Running job', total: total)
|
20
40
|
Chronicle::ETL::Logger.attach_to_progress_bar(@progress_bar)
|
41
|
+
end
|
21
42
|
|
22
|
-
|
23
|
-
|
43
|
+
# TODO: refactor this further
|
44
|
+
def run_extraction
|
45
|
+
@extractor.extract do |extraction|
|
24
46
|
unless extraction.is_a?(Chronicle::ETL::Extraction)
|
25
47
|
raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
|
26
48
|
end
|
@@ -28,52 +50,47 @@ class Chronicle::ETL::Runner
|
|
28
50
|
transformer = @job.instantiate_transformer(extraction)
|
29
51
|
record = transformer.transform
|
30
52
|
|
31
|
-
# TODO: rethink this
|
32
|
-
# unless record.is_a?(Chronicle::ETL::Models)
|
33
|
-
# raise Chronicle::ETL::RunnerTypeError, "Transformed data should be a type of Chronicle::ETL::Models"
|
34
|
-
# end
|
35
|
-
|
36
53
|
Chronicle::ETL::Logger.info(tty_log_transformation(transformer))
|
37
54
|
@job_logger.log_transformation(transformer)
|
38
55
|
|
39
|
-
loader.load(record) unless @job.dry_run?
|
56
|
+
@loader.load(record) unless @job.dry_run?
|
40
57
|
rescue Chronicle::ETL::TransformationError => e
|
41
|
-
Chronicle::ETL::Logger.error(tty_log_transformation_failure(e))
|
58
|
+
Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
|
42
59
|
ensure
|
43
60
|
@progress_bar.increment
|
44
61
|
end
|
45
62
|
|
46
63
|
@progress_bar.finish
|
47
|
-
loader.finish
|
64
|
+
@loader.finish
|
48
65
|
@job_logger.finish
|
49
66
|
rescue Interrupt
|
50
67
|
Chronicle::ETL::Logger.error("\n#{'Job interrupted'.red}")
|
51
68
|
@job_logger.error
|
52
69
|
rescue StandardError => e
|
53
70
|
raise e
|
54
|
-
|
71
|
+
end
|
72
|
+
|
73
|
+
def finish_job
|
55
74
|
@job_logger.save
|
56
75
|
@progress_bar&.finish
|
57
76
|
Chronicle::ETL::Logger.detach_from_progress_bar
|
58
77
|
Chronicle::ETL::Logger.info(tty_log_completion)
|
59
78
|
end
|
60
79
|
|
61
|
-
private
|
62
|
-
|
63
80
|
def tty_log_job_start
|
64
81
|
output = "Beginning job "
|
65
82
|
output += "'#{@job.name}'".bold if @job.name
|
66
83
|
output
|
67
84
|
end
|
68
85
|
|
69
|
-
def tty_log_transformation
|
86
|
+
def tty_log_transformation(transformer)
|
70
87
|
output = " ✓".green
|
71
88
|
output += " #{transformer}"
|
72
89
|
end
|
73
90
|
|
74
|
-
def tty_log_transformation_failure
|
91
|
+
def tty_log_transformation_failure(exception, transformer)
|
75
92
|
output = " ✖".red
|
76
|
-
output += " Failed to build #{
|
93
|
+
output += " Failed to build #{transformer}. #{exception.message}"
|
77
94
|
end
|
78
95
|
|
79
96
|
def tty_log_completion
|
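In `run_extraction` above, a failed transformation is logged and skipped, while the `ensure` clause still advances the progress bar for every record. A minimal sketch of that per-record pattern, assuming a hypothetical `TransformationError` class and `process_all` helper, with a plain counter standing in for the progress bar:

```ruby
# Hypothetical stand-in for Chronicle::ETL::TransformationError.
class TransformationError < StandardError; end

# Process each item. A failing item is reported and skipped, but the
# counter (standing in for the progress bar) advances either way.
def process_all(items)
  loaded = []
  processed = 0

  items.each do |item|
    begin
      raise TransformationError, "cannot transform #{item.inspect}" if item.nil?

      loaded << item.to_s.upcase
    rescue TransformationError => e
      warn "failed: #{e.message}"
    ensure
      processed += 1
    end
  end

  [loaded, processed]
end

loaded, processed = process_all(['a', nil, 'b'])
puts "loaded=#{loaded.inspect} processed=#{processed}"
```

Only `TransformationError` is rescued per record; any other exception leaves the loop, which corresponds to the outer `rescue StandardError` in the diff.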
data/lib/chronicle/etl/transformers/image_file_transformer.rb
CHANGED
@@ -43,7 +43,7 @@ module Chronicle
|
|
43
43
|
def id
|
44
44
|
@id ||= begin
|
45
45
|
id = build_with_strategy(field: :id, strategy: @config.id_strategy)
|
46
|
-
raise
|
46
|
+
raise(UntransformableRecordError, "Could not build id") unless id
|
47
47
|
|
48
48
|
id
|
49
49
|
end
|
@@ -52,7 +52,7 @@ module Chronicle
|
|
52
52
|
def timestamp
|
53
53
|
@timestamp ||= begin
|
54
54
|
ts = build_with_strategy(field: :timestamp, strategy: @config.timestamp_strategy)
|
55
|
-
raise
|
55
|
+
raise(UntransformableRecordError, "Could not build timestamp") unless ts
|
56
56
|
|
57
57
|
ts
|
58
58
|
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: chronicle-etl
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.4.
|
4
|
+
version: 0.4.4
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Andrew Louis
|
8
|
-
autorequire:
|
8
|
+
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2022-03-
|
11
|
+
date: 2022-03-16 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: activesupport
|
@@ -150,6 +150,20 @@ dependencies:
|
|
150
150
|
- - "~>"
|
151
151
|
- !ruby/object:Gem::Version
|
152
152
|
version: '1.2'
|
153
|
+
- !ruby/object:Gem::Dependency
|
154
|
+
name: thor-hollaback
|
155
|
+
requirement: !ruby/object:Gem::Requirement
|
156
|
+
requirements:
|
157
|
+
- - "~>"
|
158
|
+
- !ruby/object:Gem::Version
|
159
|
+
version: '0.2'
|
160
|
+
type: :runtime
|
161
|
+
prerelease: false
|
162
|
+
version_requirements: !ruby/object:Gem::Requirement
|
163
|
+
requirements:
|
164
|
+
- - "~>"
|
165
|
+
- !ruby/object:Gem::Version
|
166
|
+
version: '0.2'
|
153
167
|
- !ruby/object:Gem::Dependency
|
154
168
|
name: tty-progressbar
|
155
169
|
requirement: !ruby/object:Gem::Requirement
|
@@ -164,6 +178,20 @@ dependencies:
|
|
164
178
|
- - "~>"
|
165
179
|
- !ruby/object:Gem::Version
|
166
180
|
version: '0.17'
|
181
|
+
- !ruby/object:Gem::Dependency
|
182
|
+
name: tty-spinner
|
183
|
+
requirement: !ruby/object:Gem::Requirement
|
184
|
+
requirements:
|
185
|
+
- - ">="
|
186
|
+
- !ruby/object:Gem::Version
|
187
|
+
version: '0'
|
188
|
+
type: :runtime
|
189
|
+
prerelease: false
|
190
|
+
version_requirements: !ruby/object:Gem::Requirement
|
191
|
+
requirements:
|
192
|
+
- - ">="
|
193
|
+
- !ruby/object:Gem::Version
|
194
|
+
version: '0'
|
167
195
|
- !ruby/object:Gem::Dependency
|
168
196
|
name: tty-table
|
169
197
|
requirement: !ruby/object:Gem::Requirement
|
@@ -178,6 +206,20 @@ dependencies:
|
|
178
206
|
- - "~>"
|
179
207
|
- !ruby/object:Gem::Version
|
180
208
|
version: '0.11'
|
209
|
+
- !ruby/object:Gem::Dependency
|
210
|
+
name: tty-prompt
|
211
|
+
requirement: !ruby/object:Gem::Requirement
|
212
|
+
requirements:
|
213
|
+
- - "~>"
|
214
|
+
- !ruby/object:Gem::Version
|
215
|
+
version: '0.23'
|
216
|
+
type: :runtime
|
217
|
+
prerelease: false
|
218
|
+
version_requirements: !ruby/object:Gem::Requirement
|
219
|
+
requirements:
|
220
|
+
- - "~>"
|
221
|
+
- !ruby/object:Gem::Version
|
222
|
+
version: '0.23'
|
181
223
|
- !ruby/object:Gem::Dependency
|
182
224
|
name: bundler
|
183
225
|
requirement: !ruby/object:Gem::Requirement
|
@@ -317,9 +359,11 @@ files:
|
|
317
359
|
- exe/chronicle-etl
|
318
360
|
- lib/chronicle/etl.rb
|
319
361
|
- lib/chronicle/etl/cli.rb
|
362
|
+
- lib/chronicle/etl/cli/cli_base.rb
|
320
363
|
- lib/chronicle/etl/cli/connectors.rb
|
321
364
|
- lib/chronicle/etl/cli/jobs.rb
|
322
365
|
- lib/chronicle/etl/cli/main.rb
|
366
|
+
- lib/chronicle/etl/cli/plugins.rb
|
323
367
|
- lib/chronicle/etl/cli/subcommand_base.rb
|
324
368
|
- lib/chronicle/etl/config.rb
|
325
369
|
- lib/chronicle/etl/configurable.rb
|
@@ -336,6 +380,7 @@ files:
|
|
336
380
|
- lib/chronicle/etl/job_log.rb
|
337
381
|
- lib/chronicle/etl/job_logger.rb
|
338
382
|
- lib/chronicle/etl/loaders/csv_loader.rb
|
383
|
+
- lib/chronicle/etl/loaders/helpers/encoding_helper.rb
|
339
384
|
- lib/chronicle/etl/loaders/json_loader.rb
|
340
385
|
- lib/chronicle/etl/loaders/loader.rb
|
341
386
|
- lib/chronicle/etl/loaders/rest_loader.rb
|
@@ -347,6 +392,7 @@ files:
|
|
347
392
|
- lib/chronicle/etl/models/entity.rb
|
348
393
|
- lib/chronicle/etl/models/raw.rb
|
349
394
|
- lib/chronicle/etl/registry/connector_registration.rb
|
395
|
+
- lib/chronicle/etl/registry/plugin_registry.rb
|
350
396
|
- lib/chronicle/etl/registry/registry.rb
|
351
397
|
- lib/chronicle/etl/registry/self_registering.rb
|
352
398
|
- lib/chronicle/etl/runner.rb
|
@@ -369,7 +415,7 @@ metadata:
|
|
369
415
|
homepage_uri: https://github.com/chronicle-app
|
370
416
|
source_code_uri: https://github.com/chronicle-app/chronicle-etl
|
371
417
|
changelog_uri: https://github.com/chronicle-app/chronicle-etl/releases
|
372
|
-
post_install_message:
|
418
|
+
post_install_message:
|
373
419
|
rdoc_options: []
|
374
420
|
require_paths:
|
375
421
|
- lib
|
@@ -384,8 +430,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
384
430
|
- !ruby/object:Gem::Version
|
385
431
|
version: '0'
|
386
432
|
requirements: []
|
387
|
-
rubygems_version: 3.
|
388
|
-
signing_key:
|
433
|
+
rubygems_version: 3.3.3
|
434
|
+
signing_key:
|
389
435
|
specification_version: 4
|
390
436
|
summary: ETL tool for personal data
|
391
437
|
test_files: []
|