chronicle-etl 0.4.1 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 8a267de435b41b579e36128b7392729ef499eb37f05fabaead7811f089938ddb
-  data.tar.gz: d4af2f62f3f5de926bdfbb0e3d6dbe2c952ec286c07317af4dca8d98f665d6da
+  metadata.gz: f041e90fc6019ecbc2424e8e75e7f21fa5ba715c2cdbde73d61c298e9ca82e07
+  data.tar.gz: 2feda7669f8fefc7ad80fe7918530a86ad2ced431f75e70ad42653c871b67d90
 SHA512:
-  metadata.gz: c78080cce008340f0b2795be46da2b5eb6562b2bffd97728150960343870f2bea4699e4efa07905710dd0e2eba7aaa1e803d8c0f727196f5d9d655b28a04f02e
-  data.tar.gz: cae3a3ffb6527f5c0b3ff89c75dc98d9cd66157ee6230c9db797f4683f90e2146daadf291108e55d3090d0120d3c9e25135cb21c4e9078bcaf4d1edf2172c930
+  metadata.gz: 4a40c72dcb6514037c6e53214dc0af3bfba20c272b959c3e83496658b4b8dc3f841d7399aa215a5fcc5c5cd494278223666b8b881f645ed2c61b667351ccde94
+  data.tar.gz: 5b5450b76a8c03d7cb8405888b2e059390c1585a66524dd7403b458c828a1936422e03a9654ec6d270b634bf6bfe17d6dafda54e05aca8a8732207366d03ffd2
data/.rubocop.yml CHANGED
@@ -27,6 +27,9 @@ Style/OpenStructUse:
 Style/Copyright:
   Enabled: false
 
+Style/MissingElse:
+  Enabled: false
+
 Style/SymbolArray:
   EnforcedStyle: brackets
 
data/README.md CHANGED
@@ -1,12 +1,14 @@
1
1
  ## A CLI toolkit for extracting and working with your digital history
2
2
 
3
+ ![chronicle-etl-banner](https://user-images.githubusercontent.com/6291/157330518-0f934c9a-9ec4-43d9-9cc2-12f156d09b37.png)
4
+
3
5
  [![Gem Version](https://badge.fury.io/rb/chronicle-etl.svg)](https://badge.fury.io/rb/chronicle-etl) [![Ruby](https://github.com/chronicle-app/chronicle-etl/actions/workflows/ruby.yml/badge.svg)](https://github.com/chronicle-app/chronicle-etl/actions/workflows/ruby.yml)
4
6
 
5
7
  Are you trying to archive your digital history or incorporate it into your own projects? You’ve probably discovered how frustrating it is to get machine-readable access to your own data. While [building a memex](https://hyfen.net/memex/), I learned first-hand what great efforts must be made before you can begin using the data in interesting ways.
6
8
 
7
9
  If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this project is for you! (*If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues).*)
8
10
 
9
- `chronicle-etl` is a CLI tool that gives you the ability to easily access your personal data. It uses the ETL pattern to **extract** it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), **transform** it (into a given schema), and **load** it to a source (e.g. a CSV file, JSON, external API).
11
+ **`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a destination (e.g. a CSV file, JSON, external API).
10
12
 
11
13
  ## What does `chronicle-etl` give you?
12
14
  * **CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
@@ -86,7 +88,16 @@ Plugins provide access to data from third-party platforms, services, or formats.
86
88
 
87
89
  ```bash
88
90
  # Install a plugin
89
- $ chronicle-etl connectors:install NAME
91
+ $ chronicle-etl plugins:install NAME
92
+
93
+ # Install the imessage plugin
94
+ $ chronicle-etl plugins:install imessage
95
+
96
+ # List installed plugins
97
+ $ chronicle-etl plugins:list
98
+
99
+ # Uninstall a plugin
100
+ $ chronicle-etl plugins:uninstall NAME
90
101
  ```
91
102
 
92
103
  A few dozen importers exist [in my Memex project](https://hyfen.net/memex/) and they’re being ported over to the Chronicle system. This table shows what’s available now and what’s coming. Rows are sorted in very rough order of priority.
@@ -99,8 +110,8 @@ If you want to work together on a connector, please [get in touch](#get-in-touch
99
110
  | [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (zsh support pending) |
100
111
  | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (imap support pending) |
101
112
  | [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
113
+ | [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
102
114
  | github | Github user and repo activity | In progress |
103
- | safari | Browser history from local sqlite db | Needs porting |
104
115
  | chrome | Browser history from local sqlite db | Needs porting |
105
116
  | whatsapp | Messaging history (via individual chat exports) or reverse-engineered local desktop install | Unstarted |
106
117
  | anki | Studying and card creation history | Needs porting |
@@ -186,4 +197,4 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/chroni
186
197
  The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
187
198
 
188
199
  ## Code of Conduct
189
- Everyone interacting in the Chronicle::ETL project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/chronicle-app/chronicle-etl/blob/master/CODE_OF_CONDUCT.md).
200
+ Everyone interacting in the Chronicle::ETL project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/chronicle-app/chronicle-etl/blob/main/CODE_OF_CONDUCT.md).
@@ -47,8 +47,11 @@ Gem::Specification.new do |spec|
47
47
  spec.add_dependency "sequel", "~> 5.35"
48
48
  spec.add_dependency "sqlite3", "~> 1.4"
49
49
  spec.add_dependency "thor", "~> 1.2"
50
+ spec.add_dependency "thor-hollaback", "~> 0.2"
50
51
  spec.add_dependency "tty-progressbar", "~> 0.17"
52
+ spec.add_dependency "tty-spinner"
51
53
  spec.add_dependency "tty-table", "~> 0.11"
54
+ spec.add_dependency "tty-prompt", "~> 0.23"
52
55
 
53
56
  spec.add_development_dependency "bundler", "~> 2.1"
54
57
  spec.add_development_dependency "pry-byebug", "~> 3.9"
@@ -8,11 +8,6 @@ module Chronicle
8
8
  default_task 'list'
9
9
  namespace :connectors
10
10
 
11
- desc "install NAME", "Installs connector NAME"
12
- def install(name)
13
- Chronicle::ETL::Registry.install_connector(name)
14
- end
15
-
16
11
  desc "list", "Lists available connectors"
17
12
  # Display all available connectors that chronicle-etl has access to
18
13
  def list
@@ -44,21 +39,21 @@ module Chronicle
44
39
  desc "show PHASE IDENTIFIER", "Show information about a connector"
45
40
  def show(phase, identifier)
46
41
  unless ['extractor', 'transformer', 'loader'].include?(phase)
47
- puts "phase argument must be one of: [extractor, transformer, loader]"
48
- return
42
+ Chronicle::ETL::Logger.fatal("Phase argument must be one of: [extractor, transformer, loader]")
43
+ exit 1
49
44
  end
50
45
 
51
46
  begin
52
47
  connector = Chronicle::ETL::Registry.find_by_phase_and_identifier(phase.to_sym, identifier)
53
- rescue Chronicle::ETL::ConnectorNotAvailableError
54
- puts "Could not find #{phase} #{identifier}"
55
- return
48
+ rescue Chronicle::ETL::ConnectorNotAvailableError, Chronicle::ETL::PluginError
49
+ Chronicle::ETL::Logger.fatal("Could not find #{phase} #{identifier}")
50
+ exit 1
56
51
  end
57
52
 
58
53
  puts connector.klass.to_s.bold
59
54
  puts " #{connector.descriptive_phrase}"
60
55
  puts
61
- puts "OPTIONS"
56
+ puts "Settings:"
62
57
 
63
58
  headers = ['name', 'default', 'required'].map{ |h| h.to_s.upcase.bold }
64
59
 
@@ -1,4 +1,5 @@
1
1
  require 'pp'
2
+ require 'tty-prompt'
2
3
 
3
4
  module Chronicle
4
5
  module ETL
@@ -6,7 +7,7 @@ module Chronicle
6
7
  # CLI commands for working with ETL jobs
7
8
  class Jobs < SubcommandBase
8
9
  default_task "start"
9
- namespace :jobs
10
+ namespace :jobs
10
11
 
11
12
  class_option :name, aliases: '-j', desc: 'Job configuration name'
12
13
 
@@ -25,16 +26,11 @@ module Chronicle
25
26
 
26
27
  class_option :output, aliases: '-o', desc: 'Output filename', type: 'string'
27
28
  class_option :fields, desc: 'Output only these fields', type: 'array', banner: 'field1 field2 ...'
28
-
29
- class_option :log_level, desc: 'Log level (debug, info, warn, error, fatal)', default: 'info'
30
- class_option :verbose, aliases: '-v', desc: 'Set log level to verbose', type: :boolean
31
- class_option :silent, desc: 'Silence all output', type: :boolean
29
+ class_option :header_row, desc: 'Output the header row of tabular output', type: 'boolean'
32
30
 
33
31
  # Thor doesn't like `run` as a command name
34
32
  map run: :start
35
33
  desc "run", "Start a job"
36
- option :log_level, desc: 'Log level (debug, info, warn, error, fatal)', default: 'info'
37
- option :verbose, aliases: '-v', desc: 'Set log level to verbose', type: :boolean
38
34
  option :dry_run, desc: 'Only run the extraction and transform steps, not the loading', type: :boolean
39
35
  long_desc <<-LONG_DESC
40
36
  This will run an ETL job. Each job needs three parts:
@@ -49,25 +45,40 @@ module Chronicle
49
45
  LONG_DESC
50
46
  # Run an ETL job
51
47
  def start
52
- setup_log_level
53
- job_definition = build_job_definition(options)
54
- job = Chronicle::ETL::Job.new(job_definition)
55
- runner = Chronicle::ETL::Runner.new(job)
56
- runner.run!
48
+ run_job(options)
49
+ rescue Chronicle::ETL::JobDefinitionError => e
50
+ missing_plugins = e.job_definition.errors
51
+ .select { |error| error.is_a?(Chronicle::ETL::PluginLoadError) }
52
+ .map(&:name)
53
+ .uniq
54
+
55
+ install_missing_plugins(missing_plugins)
56
+ run_job(options)
57
57
  end
58
58
 
59
59
  desc "create", "Create a job"
60
60
  # Create an ETL job
61
61
  def create
62
62
  job_definition = build_job_definition(options)
63
+ job_definition.validate!
64
+
63
65
  path = File.join('chronicle', 'etl', 'jobs', options[:name])
64
66
  Chronicle::ETL::Config.write(path, job_definition.definition)
67
+ rescue Chronicle::ETL::JobDefinitionError => e
68
+ Chronicle::ETL::Logger.debug(e.full_message)
69
+ Chronicle::ETL::Logger.fatal("Job definition error".red)
65
70
  end
66
71
 
67
72
  desc "show", "Show details about a job"
68
73
  # Show an ETL job
69
74
  def show
70
- puts Chronicle::ETL::Job.new(build_job_definition(options))
75
+ job_definition = build_job_definition(options)
76
+ job_definition.validate!
77
+ puts Chronicle::ETL::Job.new(job_definition)
78
+ rescue Chronicle::ETL::JobDefinitionError => e
79
+ Chronicle::ETL::Logger.debug(e.full_message)
80
+ Chronicle::ETL::Logger.fatal("Job definition error".red)
81
+ exit 1
71
82
  end
72
83
 
73
84
  desc "list", "List all available jobs"
@@ -87,21 +98,43 @@ LONG_DESC
87
98
 
88
99
  headers = ['name', 'extractor', 'transformer', 'loader'].map { |h| h.upcase.bold }
89
100
 
101
+ puts "Available jobs:"
90
102
  table = TTY::Table.new(headers, job_details)
91
103
  puts table.render(indent: 0, padding: [0, 2])
104
+ rescue Chronicle::ETL::ConfigError => e
105
+ Chronicle::ETL::Logger.debug(e.full_message)
106
+ Chronicle::ETL::Logger.fatal("Error reading config. #{e.message}".red)
107
+ exit 1
92
108
  end
93
109
 
94
110
  private
95
111
 
96
- def setup_log_level
97
- if options[:silent]
98
- Chronicle::ETL::Logger.log_level = Chronicle::ETL::Logger::SILENT
99
- elsif options[:verbose]
100
- Chronicle::ETL::Logger.log_level = Chronicle::ETL::Logger::DEBUG
101
- elsif options[:log_level]
102
- level = Chronicle::ETL::Logger.const_get(options[:log_level].upcase)
103
- Chronicle::ETL::Logger.log_level = level
112
+ def run_job(options)
113
+ job_definition = build_job_definition(options)
114
+ job = Chronicle::ETL::Job.new(job_definition)
115
+ runner = Chronicle::ETL::Runner.new(job)
116
+ runner.run!
117
+ end
118
+
119
+ # TODO: probably could merge this with something in cli/plugin
120
+ def install_missing_plugins(missing_plugins)
121
+ prompt = TTY::Prompt.new
122
+ message = "Plugin#{'s' if missing_plugins.count > 1} specified by job not installed.\n"
123
+ message += "Do you want to install "
124
+ message += missing_plugins.map { |name| "chronicle-#{name}".bold}.join(", ")
125
+ message += " and start the job?"
126
+ install = prompt.yes?(message)
127
+ return unless install
128
+
129
+ spinner = TTY::Spinner.new("[:spinner] Installing plugins...", format: :dots_2)
130
+ spinner.auto_spin
131
+ missing_plugins.each do |plugin|
132
+ Chronicle::ETL::Registry::PluginRegistry.install(plugin)
104
133
  end
134
+ spinner.success("(#{'successful'.green})")
135
+ rescue Chronicle::ETL::PluginNotAvailableError => e
136
+ spinner.error("Error".red)
137
+ Chronicle::ETL::Logger.fatal("Plugin '#{e.name}' could not be installed".red)
105
138
  end
106
139
 
107
140
  # Create job definition by reading config file and then overwriting with flag options
@@ -129,6 +162,7 @@ LONG_DESC
129
162
 
130
163
  loader_options = options[:'loader-opts'].merge({
131
164
  output: options[:output],
165
+ header_row: options[:header_row],
132
166
  fields: options[:fields]
133
167
  }.compact)
134
168
 
@@ -5,6 +5,14 @@ module Chronicle
5
5
  module CLI
6
6
  # Main entrypoint for CLI app
7
7
  class Main < ::Thor
8
+ class_before :set_log_level
9
+ class_before :set_color_output
10
+
11
+ class_option :log_level, desc: 'Log level (debug, info, warn, error, fatal, silent)', default: 'info'
12
+ class_option :verbose, aliases: '-v', desc: 'Set log level to verbose', type: :boolean
13
+ class_option :silent, desc: 'Silence all output', type: :boolean
14
+ class_option :'no-color', desc: 'Disable colour output', type: :boolean
15
+
8
16
  default_task "jobs"
9
17
 
10
18
  desc 'connectors:COMMAND', 'Connectors available for ETL jobs', hide: true
@@ -13,6 +21,9 @@ module Chronicle
13
21
  desc 'jobs:COMMAND', 'Configure and run jobs', hide: true
14
22
  subcommand 'jobs', Jobs
15
23
 
24
+ desc 'plugins:COMMAND', 'Configure plugins', hide: true
25
+ subcommand 'plugins', Plugins
26
+
16
27
  # Entrypoint for the CLI
17
28
  def self.start(given_args = ARGV, config = {})
18
29
  # take a subcommand:command and splits them so Thor knows how to hand off to the subcommand class
@@ -79,6 +90,23 @@ module Chronicle
79
90
  shell.say
80
91
  end
81
92
  end
93
+
94
+ no_commands do
95
+ def set_color_output
96
+ String.disable_colorization true if options[:'no-color'] || ENV['NO_COLOR']
97
+ end
98
+
99
+ def set_log_level
100
+ if options[:silent]
101
+ Chronicle::ETL::Logger.log_level = Chronicle::ETL::Logger::SILENT
102
+ elsif options[:verbose]
103
+ Chronicle::ETL::Logger.log_level = Chronicle::ETL::Logger::DEBUG
104
+ elsif options[:log_level]
105
+ level = Chronicle::ETL::Logger.const_get(options[:log_level].upcase)
106
+ Chronicle::ETL::Logger.log_level = level
107
+ end
108
+ end
109
+ end
82
110
  end
83
111
  end
84
112
  end
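The `set_log_level` hook moved into `Main` resolves three flags in a fixed order: `--silent` wins, then `--verbose`, then `--log_level`, which is looked up as a constant by name. A minimal sketch of that resolution follows. `SketchLogger` and `resolve_log_level` are hypothetical names, and the numeric level values are assumptions made for illustration, since the diff does not show the real constants.

```ruby
# Hypothetical stand-in for Chronicle::ETL::Logger. The numeric values are
# assumptions for this sketch and are not taken from the gem.
module SketchLogger
  DEBUG = 0
  INFO = 1
  WARN = 2
  ERROR = 3
  FATAL = 4
  SILENT = 5
end

# Resolve the effective level: :silent wins, then :verbose, then :log_level.
def resolve_log_level(options)
  if options[:silent]
    SketchLogger::SILENT
  elsif options[:verbose]
    SketchLogger::DEBUG
  elsif options[:log_level]
    # Look the constant up by name, e.g. 'warn' resolves to SketchLogger::WARN.
    # A name with no matching constant raises NameError.
    SketchLogger.const_get(options[:log_level].upcase)
  end
end

puts resolve_log_level(log_level: 'warn')
puts resolve_log_level(log_level: 'info', verbose: true)
```

One consequence of the `const_get` lookup is that a mistyped level name surfaces as a `NameError` rather than a friendly message, so a caller may want to rescue it.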
@@ -0,0 +1,62 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "tty-prompt"
4
+ require "tty-spinner"
5
+
6
+
7
+ module Chronicle
8
+ module ETL
9
+ module CLI
10
+ # CLI commands for working with ETL plugins
11
+ class Plugins < SubcommandBase
12
+ default_task 'list'
13
+ namespace :plugins
14
+
15
+ desc "install", "Install a plugin"
16
+ def install(name)
17
+ spinner = TTY::Spinner.new("[:spinner] Installing plugin #{name}...", format: :dots_2)
18
+ spinner.auto_spin
19
+ Chronicle::ETL::Registry::PluginRegistry.install(name)
20
+ spinner.success("(#{'successful'.green})")
21
+ rescue Chronicle::ETL::PluginError => e
22
+ spinner.error("Error".red)
23
+ Chronicle::ETL::Logger.debug(e.full_message)
24
+ Chronicle::ETL::Logger.fatal("Plugin '#{name}' could not be installed".red)
25
+ exit 1
26
+ end
27
+
28
+ desc "uninstall", "Uninstall a plugin"
29
+ def uninstall(name)
30
+ spinner = TTY::Spinner.new("[:spinner] Uninstalling plugin #{name}...", format: :dots_2)
31
+ spinner.auto_spin
32
+ Chronicle::ETL::Registry::PluginRegistry.uninstall(name)
33
+ spinner.success("(#{'successful'.green})")
34
+ rescue Chronicle::ETL::PluginError => e
35
+ spinner.error("Error".red)
36
+ Chronicle::ETL::Logger.debug(e.full_message)
37
+ Chronicle::ETL::Logger.fatal("Plugin '#{name}' could not be uninstalled (was it installed?)".red)
38
+ exit 1
39
+ end
40
+
41
+ desc "list", "Lists available plugins"
42
+ # Display all available plugins that chronicle-etl has access to
43
+ def list
44
+ plugins = Chronicle::ETL::Registry::PluginRegistry.all_installed_latest
45
+
46
+ info = plugins.map do |plugin|
47
+ {
48
+ name: plugin.name.sub("chronicle-", ""),
49
+ description: plugin.description,
50
+ version: plugin.version
51
+ }
52
+ end
53
+
54
+ headers = ['name', 'description', 'latest version'].map{ |h| h.to_s.upcase.bold }
55
+ table = TTY::Table.new(headers, info.map(&:values))
56
+ puts "Installed plugins:"
57
+ puts table.render(indent: 2, padding: [0, 0])
58
+ end
59
+ end
60
+ end
61
+ end
62
+ end
@@ -1,7 +1,9 @@
 require 'thor'
+require 'thor/hollaback'
 require 'chronicle/etl'
 
 require 'chronicle/etl/cli/subcommand_base'
 require 'chronicle/etl/cli/connectors'
 require 'chronicle/etl/cli/jobs'
+require 'chronicle/etl/cli/plugins'
 require 'chronicle/etl/cli/main'
@@ -24,16 +24,14 @@ module Chronicle
24
24
 
25
25
  # Returns all jobs available in ~/.config/chronicle/etl/jobs/*.yml
26
26
  def available_jobs
27
- job_directory = Runcom::Config.new('chronicle/etl/jobs').current
28
- Dir.glob(File.join(job_directory, "*.yml")).map do |filename|
27
+ Dir.glob(File.join(config_directory("jobs"), "*.yml")).map do |filename|
29
28
  File.basename(filename, ".*")
30
29
  end
31
30
  end
32
31
 
33
32
  # Returns all available credentials available in ~/.config/chronicle/etl/credentials/*.yml
34
33
  def available_credentials
35
- job_directory = Runcom::Config.new('chronicle/etl/credentials').current
36
- Dir.glob(File.join(job_directory, "*.yml")).map do |filename|
34
+ Dir.glob(File.join(config_directory("credentials"), "*.yml")).map do |filename|
37
35
  File.basename(filename, ".*")
38
36
  end
39
37
  end
@@ -48,6 +46,11 @@ module Chronicle
48
46
  def load_credentials(name)
49
47
  config = self.load("chronicle/etl/credentials/#{name}.yml")
50
48
  end
49
+
50
+ def config_directory(type)
51
+ path = "chronicle/etl/#{type}"
52
+ Runcom::Config.new(path).current || raise(Chronicle::ETL::ConfigError, "Could not access config directory (#{path})")
53
+ end
51
54
  end
52
55
  end
53
56
  end
@@ -57,7 +57,7 @@ module Chronicle
57
57
 
58
58
  options.each do |name, value|
59
59
  setting = self.class.all_settings[name]
60
- raise(Chronicle::ETL::ConfigurationError, "Unrecognized setting: #{name}") unless setting
60
+ raise(Chronicle::ETL::ConnectorConfigurationError, "Unrecognized setting: #{name}") unless setting
61
61
 
62
62
  @config[name] = coerced_value(setting, value)
63
63
  end
@@ -78,7 +78,7 @@ module Chronicle
78
78
 
79
79
  def validate_config
80
80
  missing = (self.class.all_required_settings.keys - @config.compacted_h.keys)
81
- raise Chronicle::ETL::ConfigurationError, "Missing options: #{missing}" if missing.count.positive?
81
+ raise Chronicle::ETL::ConnectorConfigurationError, "Missing options: #{missing}" if missing.count.positive?
82
82
  end
83
83
 
84
84
  def coerced_value(setting, value)
@@ -89,6 +89,11 @@ module Chronicle
89
89
  value.to_s
90
90
  end
91
91
 
92
+ # TODO: think about whether to split up float, integer
93
+ def coerce_numeric(value)
94
+ value.to_f
95
+ end
96
+
92
97
  def coerce_boolean(value)
93
98
  if value.is_a?(String)
94
99
  value.downcase == "true"
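Because `coerce_numeric` uses `to_f`, a setting such as `limit: 10` is stored as `10.0`, and malformed input silently becomes `0.0`. The sketch below contrasts that lenient behaviour with the strict `Kernel#Float` conversion. `strict_numeric` is a hypothetical helper written for this comparison and is not part of the gem.

```ruby
# String#to_f is lenient: it parses a leading number and ignores the rest,
# and falls back to 0.0 when there is no leading number at all.
puts '10'.to_f
puts '10 items'.to_f
puts 'ten'.to_f

# Hypothetical strict alternative: Kernel#Float raises on malformed input,
# so the caller can distinguish "zero" from "not a number".
def strict_numeric(value)
  Float(value)
rescue ArgumentError, TypeError
  nil
end

p strict_numeric('10')
p strict_numeric('ten')
```

Which behaviour is preferable depends on whether a bad `limit` should abort the job or be ignored; the TODO in the diff leaves the float/integer split open as well.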
@@ -2,10 +2,32 @@ module Chronicle
   module ETL
     class Error < StandardError; end
 
-    class ConfigurationError < Error; end
+    class ConfigError < Error; end
 
     class RunnerTypeError < Error; end
 
+    class JobDefinitionError < Error
+      attr_reader :job_definition
+
+      def initialize(job_definition)
+        @job_definition = job_definition
+        super
+      end
+    end
+
+    class PluginError < Error
+      attr_reader :name
+
+      def initialize(name)
+        @name = name
+      end
+    end
+
+    class PluginNotAvailableError < PluginError; end
+    class PluginLoadError < PluginError; end
+
+    class ConnectorConfigurationError < Error; end
+
     class ConnectorNotAvailableError < Error
       def initialize(message, provider: nil, name: nil)
         super(message)
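The new error classes take a payload (a plugin name or a job definition) in `initialize`, and call sites elsewhere in this diff raise them as `raise Klass.new(payload), "message"`. This works because `raise` with an exception instance and a string calls `Exception#exception` on the instance, which returns a copy carrying the new message and the same instance variables. A minimal sketch with hypothetical class names:

```ruby
# Hypothetical error hierarchy mirroring the payload-carrying pattern.
class SketchError < StandardError; end

class SketchPluginError < SketchError
  attr_reader :name

  def initialize(name)
    @name = name
    super() # no message yet; `raise` supplies one below
  end
end

begin
  # `raise instance, message` copies the instance and sets the message on the copy.
  raise SketchPluginError.new('imessage'), "Plugin imessage couldn't be loaded"
rescue SketchPluginError => e
  puts e.name
  puts e.message
end
```

The rescued object keeps both pieces of information, which is what lets a CLI layer print a message while also acting on the plugin name.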
@@ -9,7 +9,7 @@ module Chronicle
9
9
 
10
10
  setting :since, type: :time
11
11
  setting :until, type: :time
12
- setting :limit
12
+ setting :limit, type: :numeric
13
13
  setting :load_after_id
14
14
  setting :input
15
15
 
@@ -1,6 +1,11 @@
1
1
  require 'forwardable'
2
+
2
3
  module Chronicle
3
4
  module ETL
5
+ # A runner job
6
+ #
7
+ # TODO: this can probably be merged with JobDefinition. Not clear
8
+ # where the boundaries are
4
9
  class Job
5
10
  extend Forwardable
6
11
 
@@ -12,7 +17,8 @@ module Chronicle
12
17
  :transformer_klass,
13
18
  :transformer_options,
14
19
  :loader_klass,
15
- :loader_options
20
+ :loader_options,
21
+ :job_definition
16
22
 
17
23
  # TODO: build a proper id system
18
24
  alias id name
@@ -19,12 +19,31 @@ module Chronicle
19
19
  }
20
20
  }.freeze
21
21
 
22
+ attr_reader :errors
22
23
  attr_accessor :definition
23
24
 
24
25
  def initialize()
25
26
  @definition = SKELETON_DEFINITION
26
27
  end
27
28
 
29
+ def validate
30
+ @errors = []
31
+
32
+ Chronicle::ETL::Registry::PHASES.each do |phase|
33
+ __send__("#{phase}_klass".to_sym)
34
+ rescue Chronicle::ETL::PluginError => e
35
+ @errors << e
36
+ end
37
+
38
+ @errors.empty?
39
+ end
40
+
41
+ def validate!
42
+ raise(Chronicle::ETL::JobDefinitionError.new(self), "Job definition is invalid") unless validate
43
+
44
+ true
45
+ end
46
+
28
47
  # Add config hash to this definition
29
48
  def add_config(config = {})
30
49
  @definition = @definition.deep_merge(config)
@@ -80,10 +99,6 @@ module Chronicle
80
99
  end
81
100
  end
82
101
  end
83
-
84
- def validate
85
- return true # TODO
86
- end
87
102
  end
88
103
  end
89
104
  end
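The `validate` / `validate!` pair replaces the old stub that always returned `true`: `validate` collects every plugin error across the phases instead of stopping at the first, and `validate!` turns a failed validation into an exception. A self-contained sketch of that shape follows. `SketchDefinition`, `SketchPhaseError`, and the per-phase callables are hypothetical stand-ins for the real connector lookups.

```ruby
class SketchPhaseError < StandardError; end

# Hypothetical definition object: each phase maps to a callable that either
# resolves a connector or raises SketchPhaseError.
class SketchDefinition
  PHASES = %i[extractor transformer loader].freeze

  attr_reader :errors

  def initialize(resolvers)
    @resolvers = resolvers
  end

  # Try every phase and remember each failure, so the caller sees all
  # problems at once rather than only the first.
  def validate
    @errors = []
    PHASES.each do |phase|
      @resolvers.fetch(phase).call
    rescue SketchPhaseError => e
      @errors << e
    end
    @errors.empty?
  end

  # Bang variant: raise when validation fails, return true otherwise.
  def validate!
    raise ArgumentError, 'Definition is invalid' unless validate

    true
  end
end

ok = -> { :resolved }
missing = -> { raise SketchPhaseError, 'plugin not installed' }

definition = SketchDefinition.new(extractor: missing, transformer: ok, loader: missing)
puts definition.validate
puts definition.errors.size
```

Collecting all errors first is what allows the `start` command in this release to offer installing every missing plugin in a single prompt.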
@@ -7,22 +7,49 @@ module Chronicle
7
7
  r.description = 'CSV'
8
8
  end
9
9
 
10
- def initialize(options={})
11
- super(options)
12
- @rows = []
10
+ setting :output, default: $stdout
11
+ setting :headers, default: true
12
+ setting :header_row, default: true
13
+
14
+ def records
15
+ @records ||= []
13
16
  end
14
17
 
15
18
  def load(record)
16
- @rows << record.to_h_flattened.values
19
+ records << record.to_h_flattened
17
20
  end
18
21
 
19
22
  def finish
20
- z = $stdout
21
- CSV(z) do |csv|
22
- @rows.each do |row|
23
- csv << row
23
+ return unless records.any?
24
+
25
+ headers = build_headers(records)
26
+
27
+ csv_options = {}
28
+ if @config.headers
29
+ csv_options[:write_headers] = @config.header_row
30
+ csv_options[:headers] = headers
31
+ end
32
+
33
+ if @config.output.is_a?(IO)
34
+ # This might seem like a duplication of the default value ($stdout)
35
+ # but it's because rspec overwrites $stdout (in helper #capture) to
36
+ # capture output.
37
+ io = $stdout.dup
38
+ else
39
+ io = File.open(@config.output, "w+")
40
+ end
41
+
42
+ output = CSV.generate(**csv_options) do |csv|
43
+ records.each do |record|
44
+ csv << record
45
+ .transform_keys(&:to_sym)
46
+ .values_at(*headers)
47
+ .map { |value| force_utf8(value) }
24
48
  end
25
49
  end
50
+
51
+ io.write(output)
52
+ io.close
26
53
  end
27
54
  end
28
55
  end
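The rewritten CSV loader buffers flattened records, derives a header list, and lets the `headers:` and `write_headers:` options of Ruby's CSV library decide whether a header row is emitted. A minimal sketch of that rendering step, assuming a hypothetical `render_csv` helper and made-up records:

```ruby
require 'csv'

# Render records (hashes with string or symbol keys) as CSV text.
# `headers` fixes the column order; `header_row` toggles the first line.
def render_csv(records, headers, header_row: true)
  CSV.generate(headers: headers, write_headers: header_row) do |csv|
    records.each do |record|
      # Missing keys come back as nil from values_at and become empty cells.
      csv << record.transform_keys(&:to_sym).values_at(*headers)
    end
  end
end

records = [{ 'id' => 1, 'title' => 'first' }, { 'id' => 2 }]
puts render_csv(records, %i[id title])
puts render_csv(records, %i[id title], header_row: false)
```

Using `values_at` with the header list keeps columns aligned even when records have different keys, which the previous implementation (`record.to_h_flattened.values`) did not guarantee.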
@@ -0,0 +1,18 @@
+require 'pathname'
+
+module Chronicle
+  module ETL
+    module Loaders
+      module Helpers
+        module EncodingHelper
+          # Mostly useful for handling loading with binary data from a raw extraction
+          def force_utf8(value)
+            return value unless value.is_a?(String)
+
+            value.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
+          end
+        end
+      end
+    end
+  end
+end
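A short sketch of what this helper does to binary-tagged input. `force_utf8_sketch` is a standalone copy for illustration and the sample strings are made up. Two behaviours are worth noting. First, when the input is tagged as binary (ASCII-8BIT), every byte above 0x7F is dropped, including bytes that would have formed valid UTF-8 characters. Second, per the `String#encode` documentation, converting a string to the encoding it is already tagged with is a no-op, so for strings already tagged UTF-8 `String#scrub` is likely the more suitable tool.

```ruby
# Standalone copy of the helper's logic, for illustration only.
def force_utf8_sketch(value)
  return value unless value.is_a?(String)

  value.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
end

# Binary-tagged input: high bytes have no UTF-8 mapping and are removed.
raw = "abc\xFF".b
p force_utf8_sketch(raw)
p force_utf8_sketch(raw).encoding

# The two bytes of "é" are also removed when the string is tagged as binary.
p force_utf8_sketch("caf\xC3\xA9".b)

# Non-strings pass through untouched.
p force_utf8_sketch(42)

# For a string already tagged UTF-8 but containing invalid bytes,
# String#scrub removes them.
p "abc\xFF".scrub('')
```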
@@ -25,7 +25,7 @@ module Chronicle
25
25
  encoded = serialized.transform_values do |value|
26
26
  next value unless value.is_a?(String)
27
27
 
28
- value.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
28
+ force_utf8(value)
29
29
  end
30
30
  @output.puts encoded.to_json
31
31
  end
@@ -1,11 +1,17 @@
1
+ require_relative 'helpers/encoding_helper'
2
+
1
3
  module Chronicle
2
4
  module ETL
3
5
  # Abstract class representing a Loader for an ETL job
4
6
  class Loader
5
7
  extend Chronicle::ETL::Registry::SelfRegistering
6
8
  include Chronicle::ETL::Configurable
9
+ include Chronicle::ETL::Loaders::Helpers::EncodingHelper
7
10
 
8
11
  setting :output
12
+ setting :fields
13
+ setting :fields_limit, default: nil
14
+ setting :fields_exclude
9
15
 
10
16
  # Construct a new instance of this loader. Options are passed in from a Runner
11
17
  # == Parameters:
@@ -25,6 +31,23 @@ module Chronicle
25
31
 
26
32
  # Called once there are no more records to process
27
33
  def finish; end
34
+
35
+ private
36
+
37
+ def build_headers(records)
38
+ headers =
39
+ if @config.fields && @config.fields.any?
40
+ Set[*@config.fields]
41
+ else
42
+ # use all the keys of the flattened record hash
43
+ Set[*records.map(&:keys).flatten.map(&:to_s).uniq]
44
+ end
45
+
46
+ headers = headers.delete_if { |header| header.end_with?(*@config.fields_exclude) }
47
+ headers = headers.first(@config.fields_limit) if @config.fields_limit
48
+
49
+ headers.to_a.map(&:to_sym)
50
+ end
28
51
  end
29
52
  end
30
53
  end
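`build_headers` now lives on the base `Loader`, so the CSV and table loaders share one rule for choosing columns: use the explicit `fields` if given, otherwise the union of keys across all records, then drop excluded suffixes and apply the limit. A standalone sketch of that rule follows. The keyword arguments stand in for the `@config` settings and the records are made up.

```ruby
require 'set'

# Choose output columns for a list of flattened record hashes.
def build_headers_sketch(records, fields: nil, fields_exclude: nil, fields_limit: nil)
  headers =
    if fields && fields.any?
      Set[*fields]
    else
      # Union of all keys, in first-seen order.
      Set[*records.map(&:keys).flatten.map(&:to_s).uniq]
    end

  # end_with? with no arguments returns false, so a nil exclude list is a no-op.
  headers = headers.delete_if { |header| header.end_with?(*fields_exclude) }
  headers = headers.first(fields_limit) if fields_limit

  headers.to_a.map(&:to_sym)
end

records = [
  { 'id' => 1, 'title' => 'a', 'type' => 'note' },
  { 'id' => 2, 'lids' => 'x', 'body' => 'b' }
]

p build_headers_sketch(records, fields_exclude: %w[lids type])
p build_headers_sketch(records, fields_exclude: %w[lids type], fields_limit: 2)
p build_headers_sketch(records, fields: %w[title id])
```

Note that exclusion matches on suffix, so excluding `type` would also drop a flattened key such as `actor.type`.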
@@ -9,11 +9,10 @@ module Chronicle
9
9
  r.description = 'an ASCII table'
10
10
  end
11
11
 
12
- setting :fields_limit, default: nil
13
- setting :fields_exclude, default: ['lids', 'type']
14
- setting :fields, default: []
15
12
  setting :truncate_values_at, default: 40
16
13
  setting :table_renderer, default: :basic
14
+ setting :fields_exclude, default: ['lids', 'type']
15
+ setting :header_row, default: true
17
16
 
18
17
  def load(record)
19
18
  records << record.to_h_flattened
@@ -25,7 +24,7 @@ module Chronicle
25
24
  headers = build_headers(records)
26
25
  rows = build_rows(records, headers)
27
26
 
28
- @table = TTY::Table.new(header: headers, rows: rows)
27
+ @table = TTY::Table.new(header: (headers if @config.header_row), rows: rows)
29
28
  puts @table.render(
30
29
  @config.table_renderer.to_sym,
31
30
  padding: [0, 2, 0, 0]
@@ -38,25 +37,10 @@ module Chronicle
38
37
 
39
38
  private
40
39
 
41
- def build_headers(records)
42
- headers =
43
- if @config.fields.any?
44
- Set[*@config.fields]
45
- else
46
- # use all the keys of the flattened record hash
47
- Set[*records.map(&:keys).flatten.map(&:to_s).uniq]
48
- end
49
-
50
- headers = headers.delete_if { |header| header.end_with?(*@config.fields_exclude) } if @config.fields_exclude.any?
51
- headers = headers.first(@config.fields_limit) if @config.fields_limit
52
-
53
- headers.to_a.map(&:to_sym)
54
- end
55
-
56
40
  def build_rows(records, headers)
57
41
  records.map do |record|
58
42
  values = record.transform_keys(&:to_sym).values_at(*headers).map{|value| value.to_s }
59
-
43
+ values = values.map { |value| force_utf8(value) }
60
44
  if @config.truncate_values_at
61
45
  values = values.map{ |value| value.truncate(@config.truncate_values_at) }
62
46
  end
@@ -13,7 +13,6 @@ module Chronicle
13
13
  attr_accessor :log_level
14
14
 
15
15
  @log_level = INFO
16
- @destination = $stderr
17
16
 
18
17
  def output message, level
19
18
  return unless level >= @log_level
@@ -21,10 +20,14 @@ module Chronicle
21
20
  if @progress_bar
22
21
  @progress_bar.log(message)
23
22
  else
24
- @destination.puts(message)
23
+ $stderr.puts(message)
25
24
  end
26
25
  end
27
26
 
27
+ def fatal(message)
28
+ output(message, FATAL)
29
+ end
30
+
28
31
  def error(message)
29
32
  output(message, ERROR)
30
33
  end
@@ -44,6 +44,11 @@ module Chronicle
44
44
  @provider || (built_in? ? 'chronicle' : '')
45
45
  end
46
46
 
47
+ # TODO: allow overriding here. Maybe through self-registration process
48
+ def plugin
49
+ @provider
50
+ end
51
+
47
52
  def descriptive_phrase
48
53
  prefix = case phase
49
54
  when :extractor
@@ -0,0 +1,70 @@
1
+ require 'rubygems'
2
+ require 'rubygems/command'
3
+ require 'rubygems/commands/install_command'
4
+ require 'rubygems/uninstaller'
5
+
6
+ module Chronicle
7
+ module ETL
8
+ module Registry
9
+ # Responsible for managing plugins available to chronicle-etl
10
+ #
11
+ # @todo Better validation for whether a gem is actually a plugin
12
+ # @todo Add ways to load a plugin that don't require a gem on rubygems.org
13
+ module PluginRegistry
14
+ # Does this plugin exist?
15
+ def self.exists?(name)
16
+ # TODO: implement this. Could query rubygems.org or have a
17
+ # hardcoded approved list
18
+ true
19
+ end
20
+
21
+ # All versions of all plugins currently installed
22
+ def self.all_installed
23
+ # TODO: add check for chronicle-etl dependency
24
+ Gem::Specification.filter { |s| s.name.match(/^chronicle-/) && s.name != "chronicle-etl" }
25
+ end
26
+
27
+ # Latest version of each installed plugin
28
+ def self.all_installed_latest
29
+ all_installed.group_by(&:name)
30
+ .transform_values { |versions| versions.sort_by(&:version).reverse.first }
31
+ .values
32
+ end
33
+
34
+ # Activate a plugin with given name by `require`ing it
35
+ def self.activate(name)
36
+ # By default, activates the latest available version of a gem
37
+ # so don't have to run Kernel#gem separately
38
+ require "chronicle/#{name}"
39
+ rescue LoadError
40
+ raise Chronicle::ETL::PluginLoadError.new(name), "Plugin #{name} couldn't be loaded" if exists?(name)
41
+
42
+ raise Chronicle::ETL::PluginNotAvailableError.new(name), "Plugin #{name} doesn't exist"
43
+ end
44
+
45
+ # Install a plugin to local gems
46
+ def self.install(name)
47
+ gem_name = "chronicle-#{name}"
48
+ raise(Chronicle::ETL::PluginNotAvailableError.new(gem_name), "Plugin #{name} doesn't exist") unless exists?(gem_name)
49
+
50
+ Gem::DefaultUserInteraction.ui = Gem::SilentUI.new
51
+ Gem.install(gem_name)
52
+ rescue Gem::UnsatisfiableDependencyError
53
+ # TODO: we need to catch a lot more than this here
54
+ raise Chronicle::ETL::PluginNotAvailableError.new(name), "Plugin #{name} doesn't exist"
55
+ end
56
+
57
+ # Uninstall a plugin
58
+ def self.uninstall(name)
59
+ gem_name = "chronicle-#{name}"
60
+ Gem::DefaultUserInteraction.ui = Gem::SilentUI.new
61
+ uninstaller = Gem::Uninstaller.new(gem_name)
62
+ uninstaller.uninstall
63
+ rescue Gem::InstallError
64
+ # TODO: strengthen this exception handling
65
+ raise(Chronicle::ETL::PluginError.new(name), "Plugin #{name} wasn't uninstalled")
66
+ end
67
+ end
68
+ end
69
+ end
70
+ end
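`all_installed_latest` groups installed gem specifications by name and keeps the highest version of each. The sketch below shows the same grouping on made-up data. `SketchSpec` is a hypothetical stand-in for `Gem::Specification`, and the version numbers are invented. It uses `max_by` in place of `sort_by(...).reverse.first`, which selects the same element without sorting the whole list.

```ruby
require 'rubygems'

# Hypothetical stand-in for Gem::Specification with only the fields used here.
SketchSpec = Struct.new(:name, :version)

# Keep the newest spec for each gem name.
def latest_per_name(specs)
  specs
    .group_by(&:name)
    .transform_values { |versions| versions.max_by(&:version) }
    .values
end

specs = [
  SketchSpec.new('chronicle-shell', Gem::Version.new('0.2.0')),
  SketchSpec.new('chronicle-shell', Gem::Version.new('0.10.0')),
  SketchSpec.new('chronicle-email', Gem::Version.new('0.1.3'))
]

latest_per_name(specs).each { |spec| puts "#{spec.name} #{spec.version}" }
```

Comparing `Gem::Version` objects rather than strings matters here: as plain strings, `'0.2.0'` would sort above `'0.10.0'`.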
@@ -20,28 +20,40 @@ module Chronicle
           end
         end

-        def install_connector name
-          gem_name = "chronicle-#{name}"
-          Gem.install(gem_name)
+        def register connector
+          connectors << connector
         end

-        def register connector
+        def connectors
           @connectors ||= []
-          @connectors << connector
         end

         def find_by_phase_and_identifier(phase, identifier)
-          connector = find_within_loaded_connectors(phase, identifier)
-          unless connector
-            # Only load external connectors (slow) if not found in built-in connectors
-            load_all!
-            connector = find_within_loaded_connectors(phase, identifier)
+          # Simple case: built in connector
+          connector = connectors.find { |c| c.phase == phase && c.identifier == identifier }
+          return connector if connector
+
+          # if not available in built-in connectors, try to activate a
+          # relevant plugin and try again
+          if identifier.include?(":")
+            plugin, name = identifier.split(":")
+          else
+            # This case handles the case where the identifier is a
+            # shorthand (ie `imessage`) because there's only one default
+            # connector.
+            plugin = identifier
           end
-          connector || raise(ConnectorNotAvailableError.new("Connector '#{identifier}' not found"))
-        end

-        def find_within_loaded_connectors(phase, identifier)
-          @connectors.find { |c| c.phase == phase && c.identifier == identifier }
+          PluginRegistry.activate(plugin)
+
+          candidates = connectors.select { |c| c.phase == phase && c.plugin == plugin }
+          # if no name given, just use first connector with right phase/plugin
+          # TODO: set up a property for connectors to specify that they're the
+          # default connector for the plugin
+          candidates = candidates.select { |c| c.identifier == name } if name
+          connector = candidates.first
+
+          connector || raise(ConnectorNotAvailableError, "Connector '#{identifier}' not found")
         end
       end
     end
@@ -50,3 +62,4 @@ end

 require_relative 'self_registering'
 require_relative 'connector_registration'
+require_relative 'plugin_registry'
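The lookup order introduced in `find_by_phase_and_identifier` (exact built-in match first, then `plugin:name` or a bare plugin shorthand) can be sketched in isolation. This is a simplified illustration, not the gem's code: the `Connector` struct and `find_connector` are hypothetical, the plugin activation step is omitted, and `ArgumentError` stands in for `ConnectorNotAvailableError`.

```ruby
# Hypothetical stand-in for a registered connector class.
Connector = Struct.new(:phase, :plugin, :identifier)

def find_connector(connectors, phase, identifier)
  # 1. Exact match on phase and identifier (the built-in case).
  exact = connectors.find { |c| c.phase == phase && c.identifier == identifier }
  return exact if exact

  # 2. "plugin:name" selects a named connector; a bare "plugin" leaves
  #    name as nil and falls back to the plugin's first connector.
  plugin, name = identifier.split(":", 2)
  candidates = connectors.select { |c| c.phase == phase && c.plugin == plugin }
  candidates = candidates.select { |c| c.identifier == name } if name

  candidates.first || raise(ArgumentError, "Connector '#{identifier}' not found")
end

connectors = [
  Connector.new(:extractor, nil, "csv"),
  Connector.new(:extractor, "email", "imap"),
  Connector.new(:extractor, "email", "mbox")
]

find_connector(connectors, :extractor, "csv")        # built-in connector
find_connector(connectors, :extractor, "email:mbox") # named plugin connector
find_connector(connectors, :extractor, "email")      # shorthand: first email extractor
```

The shorthand case depends on registration order, which is what the TODO about a "default connector" property in the diff is meant to address.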
@@ -8,19 +8,41 @@ class Chronicle::ETL::Runner
   end

   def run!
-    extractor = @job.instantiate_extractor
-    loader = @job.instantiate_loader
+    validate_job
+    instantiate_connectors
+    prepare_job
+    prepare_ui
+    run_extraction
+    finish_job
+  end
+
+  private
+
+  def validate_job
+    @job.job_definition.validate!
+  end
+
+  def instantiate_connectors
+    @extractor = @job.instantiate_extractor
+    @loader = @job.instantiate_loader
+  end

+  def prepare_job
+    Chronicle::ETL::Logger.info(tty_log_job_start)
     @job_logger.start
-    loader.start
+    @loader.start
+    @extractor.prepare
+  end

-    extractor.prepare
-    total = extractor.results_count
+  def prepare_ui
+    total = @extractor.results_count
     @progress_bar = Chronicle::ETL::Utils::ProgressBar.new(title: 'Running job', total: total)
     Chronicle::ETL::Logger.attach_to_progress_bar(@progress_bar)
+  end

-    Chronicle::ETL::Logger.info(tty_log_job_start)
-    extractor.extract do |extraction|
+  # TODO: refactor this further
+  def run_extraction
+    @extractor.extract do |extraction|
       unless extraction.is_a?(Chronicle::ETL::Extraction)
         raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
       end
@@ -28,15 +50,10 @@ class Chronicle::ETL::Runner
       transformer = @job.instantiate_transformer(extraction)
       record = transformer.transform

-      # TODO: rethink this
-      # unless record.is_a?(Chronicle::ETL::Models)
-      # raise Chronicle::ETL::RunnerTypeError, "Transformed data should be a type of Chronicle::ETL::Models"
-      # end
-
       Chronicle::ETL::Logger.info(tty_log_transformation(transformer))
       @job_logger.log_transformation(transformer)

-      loader.load(record) unless @job.dry_run?
+      @loader.load(record) unless @job.dry_run?
     rescue Chronicle::ETL::TransformationError => e
       Chronicle::ETL::Logger.error(tty_log_transformation_failure(e))
     ensure
@@ -44,22 +61,22 @@ class Chronicle::ETL::Runner
     end

     @progress_bar.finish
-    loader.finish
+    @loader.finish
     @job_logger.finish
   rescue Interrupt
     Chronicle::ETL::Logger.error("\n#{'Job interrupted'.red}")
     @job_logger.error
   rescue StandardError => e
     raise e
-  ensure
+  end
+
+  def finish_job
     @job_logger.save
     @progress_bar&.finish
     Chronicle::ETL::Logger.detach_from_progress_bar
     Chronicle::ETL::Logger.info(tty_log_completion)
   end

-  private
-
   def tty_log_job_start
     output = "Beginning job "
     output += "'#{@job.name}'".bold if @job.name
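The runner hunks above split `run!` into a fixed sequence of private phases. A toy sketch of that shape, assuming nothing about the real `Job`, extractor, or loader objects: `PhasedRunner` and its in-memory `log` are hypothetical and only record the order in which phases execute.

```ruby
# Hypothetical miniature of a phase-based runner; it only logs phase order.
class PhasedRunner
  attr_reader :log

  def initialize(records)
    @records = records
    @log = []
  end

  # Phases run strictly in sequence, as in the refactored run! above.
  def run!
    validate_job
    prepare_job
    run_extraction
    finish_job
  end

  private

  def validate_job
    raise ArgumentError, "records must respond to each" unless @records.respond_to?(:each)

    @log << :validated
  end

  def prepare_job
    @log << :prepared
  end

  def run_extraction
    @records.each { |record| @log << [:loaded, record] }
  end

  def finish_job
    @log << :finished
  end
end

runner = PhasedRunner.new(%w[a b])
runner.run!
runner.log # phases in order, ending with :finished
```

Because the phases are called one after another with no `ensure`, an exception raised in an earlier phase means `finish_job` is never reached; the same holds for the sequence in the diff, where the former `ensure` block became a separate method.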
@@ -1,5 +1,5 @@
 module Chronicle
   module ETL
-    VERSION = "0.4.1"
+    VERSION = "0.4.2"
   end
 end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: chronicle-etl
 version: !ruby/object:Gem::Version
-  version: 0.4.1
+  version: 0.4.2
 platform: ruby
 authors:
 - Andrew Louis
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2022-03-05 00:00:00.000000000 Z
+date: 2022-03-12 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport
@@ -150,6 +150,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '1.2'
+- !ruby/object:Gem::Dependency
+  name: thor-hollaback
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.2'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.2'
 - !ruby/object:Gem::Dependency
   name: tty-progressbar
   requirement: !ruby/object:Gem::Requirement
@@ -164,6 +178,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '0.17'
+- !ruby/object:Gem::Dependency
+  name: tty-spinner
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: tty-table
   requirement: !ruby/object:Gem::Requirement
@@ -178,6 +206,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '0.11'
+- !ruby/object:Gem::Dependency
+  name: tty-prompt
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.23'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.23'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -320,6 +362,7 @@ files:
 - lib/chronicle/etl/cli/connectors.rb
 - lib/chronicle/etl/cli/jobs.rb
 - lib/chronicle/etl/cli/main.rb
+- lib/chronicle/etl/cli/plugins.rb
 - lib/chronicle/etl/cli/subcommand_base.rb
 - lib/chronicle/etl/config.rb
 - lib/chronicle/etl/configurable.rb
@@ -336,6 +379,7 @@ files:
 - lib/chronicle/etl/job_log.rb
 - lib/chronicle/etl/job_logger.rb
 - lib/chronicle/etl/loaders/csv_loader.rb
+- lib/chronicle/etl/loaders/helpers/encoding_helper.rb
 - lib/chronicle/etl/loaders/json_loader.rb
 - lib/chronicle/etl/loaders/loader.rb
 - lib/chronicle/etl/loaders/rest_loader.rb
@@ -347,6 +391,7 @@ files:
 - lib/chronicle/etl/models/entity.rb
 - lib/chronicle/etl/models/raw.rb
 - lib/chronicle/etl/registry/connector_registration.rb
+- lib/chronicle/etl/registry/plugin_registry.rb
 - lib/chronicle/etl/registry/registry.rb
 - lib/chronicle/etl/registry/self_registering.rb
 - lib/chronicle/etl/runner.rb
@@ -384,7 +429,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.1.6
+rubygems_version: 3.3.9
 signing_key:
 specification_version: 4
 summary: ETL tool for personal data