chronicle-etl 0.4.1 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 8a267de435b41b579e36128b7392729ef499eb37f05fabaead7811f089938ddb
4
- data.tar.gz: d4af2f62f3f5de926bdfbb0e3d6dbe2c952ec286c07317af4dca8d98f665d6da
3
+ metadata.gz: f041e90fc6019ecbc2424e8e75e7f21fa5ba715c2cdbde73d61c298e9ca82e07
4
+ data.tar.gz: 2feda7669f8fefc7ad80fe7918530a86ad2ced431f75e70ad42653c871b67d90
5
5
  SHA512:
6
- metadata.gz: c78080cce008340f0b2795be46da2b5eb6562b2bffd97728150960343870f2bea4699e4efa07905710dd0e2eba7aaa1e803d8c0f727196f5d9d655b28a04f02e
7
- data.tar.gz: cae3a3ffb6527f5c0b3ff89c75dc98d9cd66157ee6230c9db797f4683f90e2146daadf291108e55d3090d0120d3c9e25135cb21c4e9078bcaf4d1edf2172c930
6
+ metadata.gz: 4a40c72dcb6514037c6e53214dc0af3bfba20c272b959c3e83496658b4b8dc3f841d7399aa215a5fcc5c5cd494278223666b8b881f645ed2c61b667351ccde94
7
+ data.tar.gz: 5b5450b76a8c03d7cb8405888b2e059390c1585a66524dd7403b458c828a1936422e03a9654ec6d270b634bf6bfe17d6dafda54e05aca8a8732207366d03ffd2
data/.rubocop.yml CHANGED
@@ -27,6 +27,9 @@ Style/OpenStructUse:
27
27
  Style/Copyright:
28
28
  Enabled: false
29
29
 
30
+ Style/MissingElse:
31
+ Enabled: false
32
+
30
33
  Style/SymbolArray:
31
34
  EnforcedStyle: brackets
32
35
 
data/README.md CHANGED
@@ -1,12 +1,14 @@
1
1
  ## A CLI toolkit for extracting and working with your digital history
2
2
 
3
+ ![chronicle-etl-banner](https://user-images.githubusercontent.com/6291/157330518-0f934c9a-9ec4-43d9-9cc2-12f156d09b37.png)
4
+
3
5
  [![Gem Version](https://badge.fury.io/rb/chronicle-etl.svg)](https://badge.fury.io/rb/chronicle-etl) [![Ruby](https://github.com/chronicle-app/chronicle-etl/actions/workflows/ruby.yml/badge.svg)](https://github.com/chronicle-app/chronicle-etl/actions/workflows/ruby.yml)
4
6
 
5
7
  Are you trying to archive your digital history or incorporate it into your own projects? You’ve probably discovered how frustrating it is to get machine-readable access to your own data. While [building a memex](https://hyfen.net/memex/), I learned first-hand what great efforts must be made before you can begin using the data in interesting ways.
6
8
 
7
9
  If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this project is for you! (*If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues).*)
8
10
 
9
- `chronicle-etl` is a CLI tool that gives you the ability to easily access your personal data. It uses the ETL pattern to **extract** it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), **transform** it (into a given schema), and **load** it to a source (e.g. a CSV file, JSON, external API).
11
+ **`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a destination (e.g. a CSV file, JSON, external API).
10
12
 
11
13
  ## What does `chronicle-etl` give you?
12
14
  * **CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
@@ -86,7 +88,16 @@ Plugins provide access to data from third-party platforms, services, or formats.
86
88
 
87
89
  ```bash
88
90
  # Install a plugin
89
- $ chronicle-etl connectors:install NAME
91
+ $ chronicle-etl plugins:install NAME
92
+
93
+ # Install the imessage plugin
94
+ $ chronicle-etl plugins:install imessage
95
+
96
+ # List installed plugins
97
+ $ chronicle-etl plugins:list
98
+
99
+ # Uninstall a plugin
100
+ $ chronicle-etl plugins:uninstall NAME
90
101
  ```
91
102
 
92
103
  A few dozen importers exist [in my Memex project](https://hyfen.net/memex/) and they’re being ported over to the Chronicle system. This table shows what’s available now and what’s coming. Rows are sorted in very rough order of priority.
@@ -99,8 +110,8 @@ If you want to work together on a connector, please [get in touch](#get-in-touch
99
110
  | [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (zsh support pending) |
100
111
  | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (imap support pending) |
101
112
  | [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
113
+ | [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
102
114
  | github | Github user and repo activity | In progress |
103
- | safari | Browser history from local sqlite db | Needs porting |
104
115
  | chrome | Browser history from local sqlite db | Needs porting |
105
116
  | whatsapp | Messaging history (via individual chat exports) or reverse-engineered local desktop install | Unstarted |
106
117
  | anki | Studying and card creation history | Needs porting |
@@ -186,4 +197,4 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/chroni
186
197
  The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
187
198
 
188
199
  ## Code of Conduct
189
- Everyone interacting in the Chronicle::ETL project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/chronicle-app/chronicle-etl/blob/master/CODE_OF_CONDUCT.md).
200
+ Everyone interacting in the Chronicle::ETL project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/chronicle-app/chronicle-etl/blob/main/CODE_OF_CONDUCT.md).
@@ -47,8 +47,11 @@ Gem::Specification.new do |spec|
47
47
  spec.add_dependency "sequel", "~> 5.35"
48
48
  spec.add_dependency "sqlite3", "~> 1.4"
49
49
  spec.add_dependency "thor", "~> 1.2"
50
+ spec.add_dependency "thor-hollaback", "~> 0.2"
50
51
  spec.add_dependency "tty-progressbar", "~> 0.17"
52
+ spec.add_dependency "tty-spinner"
51
53
  spec.add_dependency "tty-table", "~> 0.11"
54
+ spec.add_dependency "tty-prompt", "~> 0.23"
52
55
 
53
56
  spec.add_development_dependency "bundler", "~> 2.1"
54
57
  spec.add_development_dependency "pry-byebug", "~> 3.9"
@@ -8,11 +8,6 @@ module Chronicle
8
8
  default_task 'list'
9
9
  namespace :connectors
10
10
 
11
- desc "install NAME", "Installs connector NAME"
12
- def install(name)
13
- Chronicle::ETL::Registry.install_connector(name)
14
- end
15
-
16
11
  desc "list", "Lists available connectors"
17
12
  # Display all available connectors that chronicle-etl has access to
18
13
  def list
@@ -44,21 +39,21 @@ module Chronicle
44
39
  desc "show PHASE IDENTIFIER", "Show information about a connector"
45
40
  def show(phase, identifier)
46
41
  unless ['extractor', 'transformer', 'loader'].include?(phase)
47
- puts "phase argument must be one of: [extractor, transformer, loader]"
48
- return
42
+ Chronicle::ETL::Logger.fatal("Phase argument must be one of: [extractor, transformer, loader]")
43
+ exit 1
49
44
  end
50
45
 
51
46
  begin
52
47
  connector = Chronicle::ETL::Registry.find_by_phase_and_identifier(phase.to_sym, identifier)
53
- rescue Chronicle::ETL::ConnectorNotAvailableError
54
- puts "Could not find #{phase} #{identifier}"
55
- return
48
+ rescue Chronicle::ETL::ConnectorNotAvailableError, Chronicle::ETL::PluginError
49
+ Chronicle::ETL::Logger.fatal("Could not find #{phase} #{identifier}")
50
+ exit 1
56
51
  end
57
52
 
58
53
  puts connector.klass.to_s.bold
59
54
  puts " #{connector.descriptive_phrase}"
60
55
  puts
61
- puts "OPTIONS"
56
+ puts "Settings:"
62
57
 
63
58
  headers = ['name', 'default', 'required'].map{ |h| h.to_s.upcase.bold }
64
59
 
@@ -1,4 +1,5 @@
1
1
  require 'pp'
2
+ require 'tty-prompt'
2
3
 
3
4
  module Chronicle
4
5
  module ETL
@@ -6,7 +7,7 @@ module Chronicle
6
7
  # CLI commands for working with ETL jobs
7
8
  class Jobs < SubcommandBase
8
9
  default_task "start"
9
- namespace :jobs
10
+ namespace :jobs
10
11
 
11
12
  class_option :name, aliases: '-j', desc: 'Job configuration name'
12
13
 
@@ -25,16 +26,11 @@ module Chronicle
25
26
 
26
27
  class_option :output, aliases: '-o', desc: 'Output filename', type: 'string'
27
28
  class_option :fields, desc: 'Output only these fields', type: 'array', banner: 'field1 field2 ...'
28
-
29
- class_option :log_level, desc: 'Log level (debug, info, warn, error, fatal)', default: 'info'
30
- class_option :verbose, aliases: '-v', desc: 'Set log level to verbose', type: :boolean
31
- class_option :silent, desc: 'Silence all output', type: :boolean
29
+ class_option :header_row, desc: 'Output the header row of tabular output', type: 'boolean'
32
30
 
33
31
  # Thor doesn't like `run` as a command name
34
32
  map run: :start
35
33
  desc "run", "Start a job"
36
- option :log_level, desc: 'Log level (debug, info, warn, error, fatal)', default: 'info'
37
- option :verbose, aliases: '-v', desc: 'Set log level to verbose', type: :boolean
38
34
  option :dry_run, desc: 'Only run the extraction and transform steps, not the loading', type: :boolean
39
35
  long_desc <<-LONG_DESC
40
36
  This will run an ETL job. Each job needs three parts:
@@ -49,25 +45,40 @@ module Chronicle
49
45
  LONG_DESC
50
46
  # Run an ETL job
51
47
  def start
52
- setup_log_level
53
- job_definition = build_job_definition(options)
54
- job = Chronicle::ETL::Job.new(job_definition)
55
- runner = Chronicle::ETL::Runner.new(job)
56
- runner.run!
48
+ run_job(options)
49
+ rescue Chronicle::ETL::JobDefinitionError => e
50
+ missing_plugins = e.job_definition.errors
51
+ .select { |error| error.is_a?(Chronicle::ETL::PluginLoadError) }
52
+ .map(&:name)
53
+ .uniq
54
+
55
+ install_missing_plugins(missing_plugins)
56
+ run_job(options)
57
57
  end
58
58
 
59
59
  desc "create", "Create a job"
60
60
  # Create an ETL job
61
61
  def create
62
62
  job_definition = build_job_definition(options)
63
+ job_definition.validate!
64
+
63
65
  path = File.join('chronicle', 'etl', 'jobs', options[:name])
64
66
  Chronicle::ETL::Config.write(path, job_definition.definition)
67
+ rescue Chronicle::ETL::JobDefinitionError => e
68
+ Chronicle::ETL::Logger.debug(e.full_message)
69
+ Chronicle::ETL::Logger.fatal("Job definition error".red)
65
70
  end
66
71
 
67
72
  desc "show", "Show details about a job"
68
73
  # Show an ETL job
69
74
  def show
70
- puts Chronicle::ETL::Job.new(build_job_definition(options))
75
+ job_definition = build_job_definition(options)
76
+ job_definition.validate!
77
+ puts Chronicle::ETL::Job.new(job_definition)
78
+ rescue Chronicle::ETL::JobDefinitionError => e
79
+ Chronicle::ETL::Logger.debug(e.full_message)
80
+ Chronicle::ETL::Logger.fatal("Job definition error".red)
81
+ exit 1
71
82
  end
72
83
 
73
84
  desc "list", "List all available jobs"
@@ -87,21 +98,43 @@ LONG_DESC
87
98
 
88
99
  headers = ['name', 'extractor', 'transformer', 'loader'].map { |h| h.upcase.bold }
89
100
 
101
+ puts "Available jobs:"
90
102
  table = TTY::Table.new(headers, job_details)
91
103
  puts table.render(indent: 0, padding: [0, 2])
104
+ rescue Chronicle::ETL::ConfigError => e
105
+ Chronicle::ETL::Logger.debug(e.full_message)
106
+ Chronicle::ETL::Logger.fatal("Error reading config. #{e.message}".red)
107
+ exit 1
92
108
  end
93
109
 
94
110
  private
95
111
 
96
- def setup_log_level
97
- if options[:silent]
98
- Chronicle::ETL::Logger.log_level = Chronicle::ETL::Logger::SILENT
99
- elsif options[:verbose]
100
- Chronicle::ETL::Logger.log_level = Chronicle::ETL::Logger::DEBUG
101
- elsif options[:log_level]
102
- level = Chronicle::ETL::Logger.const_get(options[:log_level].upcase)
103
- Chronicle::ETL::Logger.log_level = level
112
+ def run_job(options)
113
+ job_definition = build_job_definition(options)
114
+ job = Chronicle::ETL::Job.new(job_definition)
115
+ runner = Chronicle::ETL::Runner.new(job)
116
+ runner.run!
117
+ end
118
+
119
+ # TODO: probably could merge this with something in cli/plugin
120
+ def install_missing_plugins(missing_plugins)
121
+ prompt = TTY::Prompt.new
122
+ message = "Plugin#{'s' if missing_plugins.count > 1} specified by job not installed.\n"
123
+ message += "Do you want to install "
124
+ message += missing_plugins.map { |name| "chronicle-#{name}".bold}.join(", ")
125
+ message += " and start the job?"
126
+ install = prompt.yes?(message)
127
+ return unless install
128
+
129
+ spinner = TTY::Spinner.new("[:spinner] Installing plugins...", format: :dots_2)
130
+ spinner.auto_spin
131
+ missing_plugins.each do |plugin|
132
+ Chronicle::ETL::Registry::PluginRegistry.install(plugin)
104
133
  end
134
+ spinner.success("(#{'successful'.green})")
135
+ rescue Chronicle::ETL::PluginNotAvailableError => e
136
+ spinner.error("Error".red)
137
+ Chronicle::ETL::Logger.fatal("Plugin '#{e.name}' could not be installed".red)
105
138
  end
106
139
 
107
140
  # Create job definition by reading config file and then overwriting with flag options
@@ -129,6 +162,7 @@ LONG_DESC
129
162
 
130
163
  loader_options = options[:'loader-opts'].merge({
131
164
  output: options[:output],
165
+ header_row: options[:header_row],
132
166
  fields: options[:fields]
133
167
  }.compact)
134
168
 
@@ -5,6 +5,14 @@ module Chronicle
5
5
  module CLI
6
6
  # Main entrypoint for CLI app
7
7
  class Main < ::Thor
8
+ class_before :set_log_level
9
+ class_before :set_color_output
10
+
11
+ class_option :log_level, desc: 'Log level (debug, info, warn, error, fatal, silent)', default: 'info'
12
+ class_option :verbose, aliases: '-v', desc: 'Set log level to verbose', type: :boolean
13
+ class_option :silent, desc: 'Silence all output', type: :boolean
14
+ class_option :'no-color', desc: 'Disable colour output', type: :boolean
15
+
8
16
  default_task "jobs"
9
17
 
10
18
  desc 'connectors:COMMAND', 'Connectors available for ETL jobs', hide: true
@@ -13,6 +21,9 @@ module Chronicle
13
21
  desc 'jobs:COMMAND', 'Configure and run jobs', hide: true
14
22
  subcommand 'jobs', Jobs
15
23
 
24
+ desc 'plugins:COMMAND', 'Configure plugins', hide: true
25
+ subcommand 'plugins', Plugins
26
+
16
27
  # Entrypoint for the CLI
17
28
  def self.start(given_args = ARGV, config = {})
18
29
  # take a subcommand:command and splits them so Thor knows how to hand off to the subcommand class
@@ -79,6 +90,23 @@ module Chronicle
79
90
  shell.say
80
91
  end
81
92
  end
93
+
94
+ no_commands do
95
+ def set_color_output
96
+ String.disable_colorization true if options[:'no-color'] || ENV['NO_COLOR']
97
+ end
98
+
99
+ def set_log_level
100
+ if options[:silent]
101
+ Chronicle::ETL::Logger.log_level = Chronicle::ETL::Logger::SILENT
102
+ elsif options[:verbose]
103
+ Chronicle::ETL::Logger.log_level = Chronicle::ETL::Logger::DEBUG
104
+ elsif options[:log_level]
105
+ level = Chronicle::ETL::Logger.const_get(options[:log_level].upcase)
106
+ Chronicle::ETL::Logger.log_level = level
107
+ end
108
+ end
109
+ end
82
110
  end
83
111
  end
84
112
  end
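The `set_log_level` hook above resolves three flags in a fixed order: `--silent` wins over `--verbose`, which wins over `--log-level`. A minimal sketch of that precedence follows. `SketchLogger` and its numeric values are hypothetical stand-ins, not the gem's actual `Chronicle::ETL::Logger` constants.

```ruby
# Hypothetical stand-in for the logger's level constants (values are assumed).
module SketchLogger
  DEBUG = 0
  INFO = 1
  WARN = 2
  ERROR = 3
  FATAL = 4
  SILENT = 5
end

# Resolve the effective level from a Thor-style options hash.
# Order mirrors the hunk above: silent, then verbose, then the named level.
def resolve_log_level(options)
  return SketchLogger::SILENT if options[:silent]
  return SketchLogger::DEBUG if options[:verbose]

  # const_get raises NameError for an unknown level name.
  SketchLogger.const_get(options.fetch(:log_level, 'info').upcase)
end

resolve_log_level({ verbose: true, log_level: 'error' }) # DEBUG, since verbose wins
```

Because the named level is looked up with `const_get`, an unrecognized `--log-level` value would surface as a `NameError` unless it is validated first.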
@@ -0,0 +1,62 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "tty-prompt"
4
+ require "tty-spinner"
5
+
6
+
7
+ module Chronicle
8
+ module ETL
9
+ module CLI
10
+ # CLI commands for working with ETL plugins
11
+ class Plugins < SubcommandBase
12
+ default_task 'list'
13
+ namespace :plugins
14
+
15
+ desc "install", "Install a plugin"
16
+ def install(name)
17
+ spinner = TTY::Spinner.new("[:spinner] Installing plugin #{name}...", format: :dots_2)
18
+ spinner.auto_spin
19
+ Chronicle::ETL::Registry::PluginRegistry.install(name)
20
+ spinner.success("(#{'successful'.green})")
21
+ rescue Chronicle::ETL::PluginError => e
22
+ spinner.error("Error".red)
23
+ Chronicle::ETL::Logger.debug(e.full_message)
24
+ Chronicle::ETL::Logger.fatal("Plugin '#{name}' could not be installed".red)
25
+ exit 1
26
+ end
27
+
28
+ desc "uninstall", "Unintall a plugin"
29
+ def uninstall(name)
30
+ spinner = TTY::Spinner.new("[:spinner] Uninstalling plugin #{name}...", format: :dots_2)
31
+ spinner.auto_spin
32
+ Chronicle::ETL::Registry::PluginRegistry.uninstall(name)
33
+ spinner.success("(#{'successful'.green})")
34
+ rescue Chronicle::ETL::PluginError => e
35
+ spinner.error("Error".red)
36
+ Chronicle::ETL::Logger.debug(e.full_message)
37
+ Chronicle::ETL::Logger.fatal("Plugin '#{name}' could not be uninstalled (was it installed?)".red)
38
+ exit 1
39
+ end
40
+
41
+ desc "list", "Lists available plugins"
42
+ # Display all available plugins that chronicle-etl has access to
43
+ def list
44
+ plugins = Chronicle::ETL::Registry::PluginRegistry.all_installed_latest
45
+
46
+ info = plugins.map do |plugin|
47
+ {
48
+ name: plugin.name.sub("chronicle-", ""),
49
+ description: plugin.description,
50
+ version: plugin.version
51
+ }
52
+ end
53
+
54
+ headers = ['name', 'description', 'latest version'].map{ |h| h.to_s.upcase.bold }
55
+ table = TTY::Table.new(headers, info.map(&:values))
56
+ puts "Installed plugins:"
57
+ puts table.render(indent: 2, padding: [0, 0])
58
+ end
59
+ end
60
+ end
61
+ end
62
+ end
@@ -1,7 +1,9 @@
1
1
  require 'thor'
2
+ require 'thor/hollaback'
2
3
  require 'chronicle/etl'
3
4
 
4
5
  require 'chronicle/etl/cli/subcommand_base'
5
6
  require 'chronicle/etl/cli/connectors'
6
7
  require 'chronicle/etl/cli/jobs'
8
+ require 'chronicle/etl/cli/plugins'
7
9
  require 'chronicle/etl/cli/main'
@@ -24,16 +24,14 @@ module Chronicle
24
24
 
25
25
  # Returns all jobs available in ~/.config/chronicle/etl/jobs/*.yml
26
26
  def available_jobs
27
- job_directory = Runcom::Config.new('chronicle/etl/jobs').current
28
- Dir.glob(File.join(job_directory, "*.yml")).map do |filename|
27
+ Dir.glob(File.join(config_directory("jobs"), "*.yml")).map do |filename|
29
28
  File.basename(filename, ".*")
30
29
  end
31
30
  end
32
31
 
33
32
  # Returns all available credentials available in ~/.config/chronicle/etl/credentials/*.yml
34
33
  def available_credentials
35
- job_directory = Runcom::Config.new('chronicle/etl/credentials').current
36
- Dir.glob(File.join(job_directory, "*.yml")).map do |filename|
34
+ Dir.glob(File.join(config_directory("credentials"), "*.yml")).map do |filename|
37
35
  File.basename(filename, ".*")
38
36
  end
39
37
  end
@@ -48,6 +46,11 @@ module Chronicle
48
46
  def load_credentials(name)
49
47
  config = self.load("chronicle/etl/credentials/#{name}.yml")
50
48
  end
49
+
50
+ def config_directory(type)
51
+ path = "chronicle/etl/#{type}"
52
+ Runcom::Config.new(path).current || raise(Chronicle::ETL::ConfigError, "Could not access config directory (#{path})")
53
+ end
51
54
  end
52
55
  end
53
56
  end
@@ -57,7 +57,7 @@ module Chronicle
57
57
 
58
58
  options.each do |name, value|
59
59
  setting = self.class.all_settings[name]
60
- raise(Chronicle::ETL::ConfigurationError, "Unrecognized setting: #{name}") unless setting
60
+ raise(Chronicle::ETL::ConnectorConfigurationError, "Unrecognized setting: #{name}") unless setting
61
61
 
62
62
  @config[name] = coerced_value(setting, value)
63
63
  end
@@ -78,7 +78,7 @@ module Chronicle
78
78
 
79
79
  def validate_config
80
80
  missing = (self.class.all_required_settings.keys - @config.compacted_h.keys)
81
- raise Chronicle::ETL::ConfigurationError, "Missing options: #{missing}" if missing.count.positive?
81
+ raise Chronicle::ETL::ConnectorConfigurationError, "Missing options: #{missing}" if missing.count.positive?
82
82
  end
83
83
 
84
84
  def coerced_value(setting, value)
@@ -89,6 +89,11 @@ module Chronicle
89
89
  value.to_s
90
90
  end
91
91
 
92
+ # TODO: think about whether to split up float, integer
93
+ def coerce_numeric(value)
94
+ value.to_f
95
+ end
96
+
92
97
  def coerce_boolean(value)
93
98
  if value.is_a?(String)
94
99
  value.downcase == "true"
@@ -2,10 +2,32 @@ module Chronicle
2
2
  module ETL
3
3
  class Error < StandardError; end
4
4
 
5
- class ConfigurationError < Error; end
5
+ class ConfigError < Error; end
6
6
 
7
7
  class RunnerTypeError < Error; end
8
8
 
9
+ class JobDefinitionError < Error
10
+ attr_reader :job_definition
11
+
12
+ def initialize(job_definition)
13
+ @job_definition = job_definition
14
+ super
15
+ end
16
+ end
17
+
18
+ class PluginError < Error
19
+ attr_reader :name
20
+
21
+ def initialize(name)
22
+ @name = name
23
+ end
24
+ end
25
+
26
+ class PluginNotAvailableError < PluginError; end
27
+ class PluginLoadError < PluginError; end
28
+
29
+ class ConnectorConfigurationError < Error; end
30
+
9
31
  class ConnectorNotAvailableError < Error
10
32
  def initialize(message, provider: nil, name: nil)
11
33
  super(message)
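The new error classes are raised elsewhere in this diff as `raise SomeError.new(name), "message"`. The sketch below shows why the `name` reader is still available in the `rescue` clause: when `raise` is given an exception instance plus a message, Ruby calls `exception(message)` on the instance, which returns a copy carrying the new message and the same instance variables. `SketchPluginError` is a hypothetical class that only mirrors the shape of `PluginError`.

```ruby
# Hypothetical error class mirroring the shape of PluginError above.
class SketchPluginError < StandardError
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

begin
  # raise(instance, message) calls instance.exception(message) under the hood.
  raise SketchPluginError.new('imessage'), "Plugin imessage couldn't be loaded"
rescue SketchPluginError => e
  captured = e
end

captured.name    # the plugin name passed to the constructor
captured.message # the message passed to raise
```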
@@ -9,7 +9,7 @@ module Chronicle
9
9
 
10
10
  setting :since, type: :time
11
11
  setting :until, type: :time
12
- setting :limit
12
+ setting :limit, type: :numeric
13
13
  setting :load_after_id
14
14
  setting :input
15
15
 
@@ -1,6 +1,11 @@
1
1
  require 'forwardable'
2
+
2
3
  module Chronicle
3
4
  module ETL
5
+ # A runner job
6
+ #
7
+ # TODO: this can probably be merged with JobDefinition. Not clear
8
+ # where the boundaries are
4
9
  class Job
5
10
  extend Forwardable
6
11
 
@@ -12,7 +17,8 @@ module Chronicle
12
17
  :transformer_klass,
13
18
  :transformer_options,
14
19
  :loader_klass,
15
- :loader_options
20
+ :loader_options,
21
+ :job_definition
16
22
 
17
23
  # TODO: build a proper id system
18
24
  alias id name
@@ -19,12 +19,31 @@ module Chronicle
19
19
  }
20
20
  }.freeze
21
21
 
22
+ attr_reader :errors
22
23
  attr_accessor :definition
23
24
 
24
25
  def initialize()
25
26
  @definition = SKELETON_DEFINITION
26
27
  end
27
28
 
29
+ def validate
30
+ @errors = []
31
+
32
+ Chronicle::ETL::Registry::PHASES.each do |phase|
33
+ __send__("#{phase}_klass".to_sym)
34
+ rescue Chronicle::ETL::PluginError => e
35
+ @errors << e
36
+ end
37
+
38
+ @errors.empty?
39
+ end
40
+
41
+ def validate!
42
+ raise(Chronicle::ETL::JobDefinitionError.new(self), "Job definition is invalid") unless validate
43
+
44
+ true
45
+ end
46
+
28
47
  # Add config hash to this definition
29
48
  def add_config(config = {})
30
49
  @definition = @definition.deep_merge(config)
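The `validate` / `validate!` pair added above collects one error per phase instead of stopping at the first failure, which is what lets the CLI list every missing plugin at once. A self-contained sketch of that pattern follows. `SketchDefinition`, `SketchPluginError`, `SketchDefinitionError`, and the resolver lambdas are hypothetical and only stand in for the gem's classes.

```ruby
class SketchPluginError < StandardError
  attr_reader :name

  def initialize(name)
    @name = name
    super("Plugin #{name} unavailable")
  end
end

class SketchDefinitionError < StandardError
  attr_reader :definition

  def initialize(definition)
    @definition = definition
    super('Definition is invalid')
  end
end

class SketchDefinition
  PHASES = %i[extractor transformer loader].freeze
  attr_reader :errors

  # resolvers: hypothetical map of phase => callable that may raise
  def initialize(resolvers)
    @resolvers = resolvers
  end

  # Try every phase and remember each failure; true only if none failed.
  def validate
    @errors = []
    PHASES.each do |phase|
      @resolvers.fetch(phase).call
    rescue SketchPluginError => e
      @errors << e
    end
    @errors.empty?
  end

  # Raising variant: the error carries the definition so callers can inspect #errors.
  def validate!
    raise SketchDefinitionError.new(self) unless validate

    true
  end
end
```

A caller can then rescue the definition error and read `e.definition.errors.map(&:name).uniq` to find the plugins to install, as the `start` command in this diff does.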
@@ -80,10 +99,6 @@ module Chronicle
80
99
  end
81
100
  end
82
101
  end
83
-
84
- def validate
85
- return true # TODO
86
- end
87
102
  end
88
103
  end
89
104
  end
@@ -7,22 +7,49 @@ module Chronicle
7
7
  r.description = 'CSV'
8
8
  end
9
9
 
10
- def initialize(options={})
11
- super(options)
12
- @rows = []
10
+ setting :output, default: $stdout
11
+ setting :headers, default: true
12
+ setting :header_row, default: true
13
+
14
+ def records
15
+ @records ||= []
13
16
  end
14
17
 
15
18
  def load(record)
16
- @rows << record.to_h_flattened.values
19
+ records << record.to_h_flattened
17
20
  end
18
21
 
19
22
  def finish
20
- z = $stdout
21
- CSV(z) do |csv|
22
- @rows.each do |row|
23
- csv << row
23
+ return unless records.any?
24
+
25
+ headers = build_headers(records)
26
+
27
+ csv_options = {}
28
+ if @config.headers
29
+ csv_options[:write_headers] = @config.header_row
30
+ csv_options[:headers] = headers
31
+ end
32
+
33
+ if @config.output.is_a?(IO)
34
+ # This might seem like a duplication of the default value ($stdout)
35
+ # but it's because rspec overwrites $stdout (in helper #capture) to
36
+ # capture output.
37
+ io = $stdout.dup
38
+ else
39
+ io = File.open(@config.output, "w+")
40
+ end
41
+
42
+ output = CSV.generate(**csv_options) do |csv|
43
+ records.each do |record|
44
+ csv << record
45
+ .transform_keys(&:to_sym)
46
+ .values_at(*headers)
47
+ .map { |value| force_utf8(value) }
24
48
  end
25
49
  end
50
+
51
+ io.write(output)
52
+ io.close
26
53
  end
27
54
  end
28
55
  end
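The rewritten CSV loader passes `headers:` and `write_headers:` to `CSV.generate`, so the `header_row` setting only decides whether the header line is emitted, while the column order always follows the header list. A minimal sketch of that behaviour, assuming string keys and a hypothetical `render_csv` helper (the real loader also symbolizes keys and cleans encodings):

```ruby
require 'csv'

# Hypothetical helper: render hash records as CSV in a fixed column order.
def render_csv(records, headers, header_row: true)
  CSV.generate(headers: headers, write_headers: header_row) do |csv|
    records.each do |record|
      # values_at keeps column order stable and yields nil for missing keys
      csv << record.values_at(*headers)
    end
  end
end

records = [
  { 'id' => 1, 'title' => 'a' },
  { 'title' => 'b', 'id' => 2 }
]

with_header = render_csv(records, %w[id title])
without_header = render_csv(records, %w[id title], header_row: false)
```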
@@ -0,0 +1,18 @@
1
+ require 'pathname'
2
+
3
+ module Chronicle
4
+ module ETL
5
+ module Loaders
6
+ module Helpers
7
+ module EncodingHelper
8
+ # Mostly useful for handling loading with binary data from a raw extraction
9
+ def force_utf8(value)
10
+ return value unless value.is_a?(String)
11
+
12
+ value.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
13
+ end
14
+ end
15
+ end
16
+ end
17
+ end
18
+ end
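To illustrate what `force_utf8` does with the binary data its comment mentions, here is a small standalone sketch. The method body is copied from the helper above; the sample string is made up. Note that for a binary-tagged string, every byte at or above 0x80 has no defined conversion and is dropped, so multi-byte characters are removed rather than decoded.

```ruby
# Same body as the helper above, reproduced so the sketch runs on its own.
def force_utf8(value)
  return value unless value.is_a?(String)

  value.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
end

raw = "caf\xC3\xA9 ok".b # hypothetical binary (ASCII-8BIT) input
clean = force_utf8(raw)

clean.encoding        # Encoding::UTF_8
clean.valid_encoding? # true
force_utf8(42)        # non-strings pass through unchanged
```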
@@ -25,7 +25,7 @@ module Chronicle
25
25
  encoded = serialized.transform_values do |value|
26
26
  next value unless value.is_a?(String)
27
27
 
28
- value.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
28
+ force_utf8(value)
29
29
  end
30
30
  @output.puts encoded.to_json
31
31
  end
@@ -1,11 +1,17 @@
1
+ require_relative 'helpers/encoding_helper'
2
+
1
3
  module Chronicle
2
4
  module ETL
3
5
  # Abstract class representing a Loader for an ETL job
4
6
  class Loader
5
7
  extend Chronicle::ETL::Registry::SelfRegistering
6
8
  include Chronicle::ETL::Configurable
9
+ include Chronicle::ETL::Loaders::Helpers::EncodingHelper
7
10
 
8
11
  setting :output
12
+ setting :fields
13
+ setting :fields_limit, default: nil
14
+ setting :fields_exclude
9
15
 
10
16
  # Construct a new instance of this loader. Options are passed in from a Runner
11
17
  # == Parameters:
@@ -25,6 +31,23 @@ module Chronicle
25
31
 
26
32
  # Called once there are no more records to process
27
33
  def finish; end
34
+
35
+ private
36
+
37
+ def build_headers(records)
38
+ headers =
39
+ if @config.fields && @config.fields.any?
40
+ Set[*@config.fields]
41
+ else
42
+ # use all the keys of the flattened record hash
43
+ Set[*records.map(&:keys).flatten.map(&:to_s).uniq]
44
+ end
45
+
46
+ headers = headers.delete_if { |header| header.end_with?(*@config.fields_exclude) }
47
+ headers = headers.first(@config.fields_limit) if @config.fields_limit
48
+
49
+ headers.to_a.map(&:to_sym)
50
+ end
28
51
  end
29
52
  end
30
53
  end
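`build_headers`, now shared by all loaders, picks columns in three steps: use the explicit `fields` if given (otherwise the union of all record keys), drop any header ending in an excluded suffix, then apply the optional limit. The sketch below restates that logic as a standalone function with keyword arguments in place of `@config`; the sample records are invented.

```ruby
require 'set'

# Standalone restatement of the header selection; keyword arguments replace @config.
def build_headers(records, fields: nil, fields_exclude: [], fields_limit: nil)
  headers =
    if fields && fields.any?
      Set[*fields.map(&:to_s)]
    else
      # union of all keys across records, in first-seen order
      Set[*records.flat_map(&:keys).map(&:to_s)]
    end

  # end_with? with no arguments returns false, so an empty exclude list drops nothing
  headers = headers.reject { |header| header.end_with?(*fields_exclude) }
  headers = headers.first(fields_limit) if fields_limit

  headers.to_a.map(&:to_sym)
end

records = [
  { 'id' => 1, 'title' => 'a', 'actor.lids' => 'x' },
  { 'id' => 2, 'type' => 'note' }
]

build_headers(records, fields_exclude: %w[lids type])
```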
@@ -9,11 +9,10 @@ module Chronicle
9
9
  r.description = 'an ASCII table'
10
10
  end
11
11
 
12
- setting :fields_limit, default: nil
13
- setting :fields_exclude, default: ['lids', 'type']
14
- setting :fields, default: []
15
12
  setting :truncate_values_at, default: 40
16
13
  setting :table_renderer, default: :basic
14
+ setting :fields_exclude, default: ['lids', 'type']
15
+ setting :header_row, default: true
17
16
 
18
17
  def load(record)
19
18
  records << record.to_h_flattened
@@ -25,7 +24,7 @@ module Chronicle
25
24
  headers = build_headers(records)
26
25
  rows = build_rows(records, headers)
27
26
 
28
- @table = TTY::Table.new(header: headers, rows: rows)
27
+ @table = TTY::Table.new(header: (headers if @config.header_row), rows: rows)
29
28
  puts @table.render(
30
29
  @config.table_renderer.to_sym,
31
30
  padding: [0, 2, 0, 0]
@@ -38,25 +37,10 @@ module Chronicle
38
37
 
39
38
  private
40
39
 
41
- def build_headers(records)
42
- headers =
43
- if @config.fields.any?
44
- Set[*@config.fields]
45
- else
46
- # use all the keys of the flattened record hash
47
- Set[*records.map(&:keys).flatten.map(&:to_s).uniq]
48
- end
49
-
50
- headers = headers.delete_if { |header| header.end_with?(*@config.fields_exclude) } if @config.fields_exclude.any?
51
- headers = headers.first(@config.fields_limit) if @config.fields_limit
52
-
53
- headers.to_a.map(&:to_sym)
54
- end
55
-
56
40
  def build_rows(records, headers)
57
41
  records.map do |record|
58
42
  values = record.transform_keys(&:to_sym).values_at(*headers).map{|value| value.to_s }
59
-
43
+ values = values.map { |value| force_utf8(value) }
60
44
  if @config.truncate_values_at
61
45
  values = values.map{ |value| value.truncate(@config.truncate_values_at) }
62
46
  end
@@ -13,7 +13,6 @@ module Chronicle
13
13
  attr_accessor :log_level
14
14
 
15
15
  @log_level = INFO
16
- @destination = $stderr
17
16
 
18
17
  def output message, level
19
18
  return unless level >= @log_level
@@ -21,10 +20,14 @@ module Chronicle
21
20
  if @progress_bar
22
21
  @progress_bar.log(message)
23
22
  else
24
- @destination.puts(message)
23
+ $stderr.puts(message)
25
24
  end
26
25
  end
27
26
 
27
+ def fatal(message)
28
+ output(message, FATAL)
29
+ end
30
+
28
31
  def error(message)
29
32
  output(message, ERROR)
30
33
  end
@@ -44,6 +44,11 @@ module Chronicle
44
44
  @provider || (built_in? ? 'chronicle' : '')
45
45
  end
46
46
 
47
+ # TODO: allow overriding here. Maybe through self-registration process
48
+ def plugin
49
+ @provider
50
+ end
51
+
47
52
  def descriptive_phrase
48
53
  prefix = case phase
49
54
  when :extractor
@@ -0,0 +1,70 @@
1
+ require 'rubygems'
2
+ require 'rubygems/command'
3
+ require 'rubygems/commands/install_command'
4
+ require 'rubygems/uninstaller'
5
+
6
+ module Chronicle
7
+ module ETL
8
+ module Registry
9
+ # Responsible for managing plugins available to chronicle-etl
10
+ #
11
+ # @todo Better validation for whether a gem is actually a plugin
12
+ # @todo Add ways to load a plugin that don't require a gem on rubygems.org
13
+ module PluginRegistry
14
+ # Does this plugin exist?
15
+ def self.exists?(name)
16
+ # TODO: implement this. Could query rubygems.org or have a
17
+ # hardcoded approved list
18
+ true
19
+ end
20
+
21
+ # All versions of all plugins currently installed
22
+ def self.all_installed
23
+ # TODO: add check for chronicle-etl dependency
24
+ Gem::Specification.filter { |s| s.name.match(/^chronicle-/) && s.name != "chronicle-etl" }
25
+ end
26
+
27
+ # Latest version of each installed plugin
28
+ def self.all_installed_latest
29
+ all_installed.group_by(&:name)
30
+ .transform_values { |versions| versions.sort_by(&:version).reverse.first }
31
+ .values
32
+ end
33
+
34
+ # Activate a plugin with given name by `require`ing it
35
+ def self.activate(name)
36
+ # By default, activates the latest available version of a gem
37
+ # so don't have to run Kernel#gem separately
38
+ require "chronicle/#{name}"
39
+ rescue LoadError
40
+ raise Chronicle::ETL::PluginLoadError.new(name), "Plugin #{name} couldn't be loaded" if exists?(name)
41
+
42
+ raise Chronicle::ETL::PluginNotAvailableError.new(name), "Plugin #{name} doesn't exist"
43
+ end
44
+
45
+ # Install a plugin to local gems
46
+ def self.install(name)
47
+ gem_name = "chronicle-#{name}"
48
+ raise(Chronicle::ETL::PluginNotAvailableError.new(gem_name), "Plugin #{name} doesn't exist") unless exists?(gem_name)
49
+
50
+ Gem::DefaultUserInteraction.ui = Gem::SilentUI.new
51
+ Gem.install(gem_name)
52
+ rescue Gem::UnsatisfiableDependencyError
53
+ # TODO: we need to catch a lot more than this here
54
+ raise Chronicle::ETL::PluginNotAvailableError.new(name), "Plugin #{name} doesn't exist"
55
+ end
56
+
57
+ # Uninstall a plugin
58
+ def self.uninstall(name)
59
+ gem_name = "chronicle-#{name}"
60
+ Gem::DefaultUserInteraction.ui = Gem::SilentUI.new
61
+ uninstaller = Gem::Uninstaller.new(gem_name)
62
+ uninstaller.uninstall
63
+ rescue Gem::InstallError
64
+ # TODO: strengthen this exception handling
65
+ raise(Chronicle::ETL::PluginError.new(name), "Plugin #{name} wasn't uninstalled")
66
+ end
67
+ end
68
+ end
69
+ end
70
+ end
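`all_installed_latest` groups installed gem specifications by name and keeps the highest version of each. The sketch below shows the same selection on plain structs and why versions are compared as `Gem::Version` objects rather than strings. `SketchSpec` and the version numbers are hypothetical.

```ruby
require 'rubygems'

# Hypothetical stand-in for Gem::Specification with just the fields used here.
SketchSpec = Struct.new(:name, :version)

# Keep only the highest version of each gem name.
def latest_per_plugin(specs)
  specs.group_by(&:name).values.map { |versions| versions.max_by(&:version) }
end

specs = [
  SketchSpec.new('chronicle-imessage', Gem::Version.new('0.9.1')),
  SketchSpec.new('chronicle-imessage', Gem::Version.new('0.10.0')),
  SketchSpec.new('chronicle-shell', Gem::Version.new('0.2.0'))
]

latest = latest_per_plugin(specs)
```

Compared as strings, `'0.10.0'` sorts before `'0.9.1'`; `Gem::Version` compares segment by segment, so `0.10.0` is correctly treated as newer.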
@@ -20,28 +20,40 @@ module Chronicle
20
20
  end
21
21
  end
22
22
 
23
- def install_connector name
24
- gem_name = "chronicle-#{name}"
25
- Gem.install(gem_name)
23
+ def register connector
24
+ connectors << connector
26
25
  end
27
26
 
28
- def register connector
27
+ def connectors
29
28
  @connectors ||= []
30
- @connectors << connector
31
29
  end
32
30
 
33
31
  def find_by_phase_and_identifier(phase, identifier)
34
- connector = find_within_loaded_connectors(phase, identifier)
35
- unless connector
36
- # Only load external connectors (slow) if not found in built-in connectors
37
- load_all!
38
- connector = find_within_loaded_connectors(phase, identifier)
32
+ # Simple case: built in connector
33
+ connector = connectors.find { |c| c.phase == phase && c.identifier == identifier }
34
+ return connector if connector
35
+
36
+ # if not available in built-in connectors, try to activate a
37
+ # relevant plugin and try again
38
+ if identifier.include?(":")
39
+ plugin, name = identifier.split(":")
40
+ else
41
+ # Handle an identifier that is a shorthand (e.g. `imessage`):
42
+ # the plugin name alone is enough because the plugin has only
43
+ # one default connector.
44
+ plugin = identifier
39
45
  end
40
- connector || raise(ConnectorNotAvailableError.new("Connector '#{identifier}' not found"))
41
- end
42
46
 
43
- def find_within_loaded_connectors(phase, identifier)
44
- @connectors.find { |c| c.phase == phase && c.identifier == identifier }
47
+ PluginRegistry.activate(plugin)
48
+
49
+ candidates = connectors.select { |c| c.phase == phase && c.plugin == plugin }
50
+ # if no name given, just use first connector with right phase/plugin
51
+ # TODO: set up a property for connectors to specify that they're the
52
+ # default connector for the plugin
53
+ candidates = candidates.select { |c| c.identifier == name } if name
54
+ connector = candidates.first
55
+
56
+ connector || raise(ConnectorNotAvailableError, "Connector '#{identifier}' not found")
45
57
  end
46
58
  end
47
59
  end
@@ -50,3 +62,4 @@ end
50
62
 
51
63
  require_relative 'self_registering'
52
64
  require_relative 'connector_registration'
65
+ require_relative 'plugin_registry'
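The new `find_by_phase_and_identifier` resolves an identifier in two steps: an exact match against registered connectors, then a plugin-scoped lookup for `plugin:name` or shorthand identifiers. The sketch below reproduces only that lookup logic. The `Connector` struct, the sample data, and the `'core'` plugin label are hypothetical, and the `PluginRegistry.activate` call is left out so the example runs on its own.

```ruby
# Hypothetical stand-in for a registered connector.
Connector = Struct.new(:phase, :plugin, :identifier)

class ConnectorNotAvailableError < StandardError; end

# Resolve "csv" (exact match), "imessage" (plugin shorthand), or
# "shell:history" (plugin plus connector name).
def find_connector(connectors, phase, identifier)
  # Simple case: the identifier matches a registered connector directly
  exact = connectors.find { |c| c.phase == phase && c.identifier == identifier }
  return exact if exact

  # "plugin:name" yields both parts; a bare identifier leaves name as nil
  plugin, name = identifier.split(':', 2)

  # (The real method activates the plugin here before searching again.)
  candidates = connectors.select { |c| c.phase == phase && c.plugin == plugin }
  # With no name given, fall back to the first connector for that plugin
  candidates = candidates.select { |c| c.identifier == name } if name

  candidates.first || raise(ConnectorNotAvailableError, "Connector '#{identifier}' not found")
end

# Sample registrations, invented for illustration only
CONNECTORS = [
  Connector.new(:extractor, 'core', 'csv'),
  Connector.new(:extractor, 'imessage', 'messages'),
  Connector.new(:extractor, 'shell', 'history')
].freeze
```

For example, `find_connector(CONNECTORS, :extractor, 'imessage')` returns the `messages` connector because it is the first extractor registered for that plugin, while an unknown identifier raises `ConnectorNotAvailableError`.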
data/lib/chronicle/etl/runner.rb CHANGED
@@ -8,19 +8,41 @@ class Chronicle::ETL::Runner
8
8
  end
9
9
 
10
10
  def run!
11
- extractor = @job.instantiate_extractor
12
- loader = @job.instantiate_loader
11
+ validate_job
12
+ instantiate_connectors
13
+ prepare_job
14
+ prepare_ui
15
+ run_extraction
16
+ finish_job
17
+ end
18
+
19
+ private
20
+
21
+ def validate_job
22
+ @job.job_definition.validate!
23
+ end
24
+
25
+ def instantiate_connectors
26
+ @extractor = @job.instantiate_extractor
27
+ @loader = @job.instantiate_loader
28
+ end
13
29
 
30
+ def prepare_job
31
+ Chronicle::ETL::Logger.info(tty_log_job_start)
14
32
  @job_logger.start
15
- loader.start
33
+ @loader.start
34
+ @extractor.prepare
35
+ end
16
36
 
17
- extractor.prepare
18
- total = extractor.results_count
37
+ def prepare_ui
38
+ total = @extractor.results_count
19
39
  @progress_bar = Chronicle::ETL::Utils::ProgressBar.new(title: 'Running job', total: total)
20
40
  Chronicle::ETL::Logger.attach_to_progress_bar(@progress_bar)
41
+ end
21
42
 
22
- Chronicle::ETL::Logger.info(tty_log_job_start)
23
- extractor.extract do |extraction|
43
+ # TODO: refactor this further
44
+ def run_extraction
45
+ @extractor.extract do |extraction|
24
46
  unless extraction.is_a?(Chronicle::ETL::Extraction)
25
47
  raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
26
48
  end
@@ -28,15 +50,10 @@ class Chronicle::ETL::Runner
28
50
  transformer = @job.instantiate_transformer(extraction)
29
51
  record = transformer.transform
30
52
 
31
- # TODO: rethink this
32
- # unless record.is_a?(Chronicle::ETL::Models)
33
- # raise Chronicle::ETL::RunnerTypeError, "Transformed data should be a type of Chronicle::ETL::Models"
34
- # end
35
-
36
53
  Chronicle::ETL::Logger.info(tty_log_transformation(transformer))
37
54
  @job_logger.log_transformation(transformer)
38
55
 
39
- loader.load(record) unless @job.dry_run?
56
+ @loader.load(record) unless @job.dry_run?
40
57
  rescue Chronicle::ETL::TransformationError => e
41
58
  Chronicle::ETL::Logger.error(tty_log_transformation_failure(e))
42
59
  ensure
@@ -44,22 +61,22 @@ class Chronicle::ETL::Runner
44
61
  end
45
62
 
46
63
  @progress_bar.finish
47
- loader.finish
64
+ @loader.finish
48
65
  @job_logger.finish
49
66
  rescue Interrupt
50
67
  Chronicle::ETL::Logger.error("\n#{'Job interrupted'.red}")
51
68
  @job_logger.error
52
69
  rescue StandardError => e
53
70
  raise e
54
- ensure
71
+ end
72
+
73
+ def finish_job
55
74
  @job_logger.save
56
75
  @progress_bar&.finish
57
76
  Chronicle::ETL::Logger.detach_from_progress_bar
58
77
  Chronicle::ETL::Logger.info(tty_log_completion)
59
78
  end
60
79
 
61
- private
62
-
63
80
  def tty_log_job_start
64
81
  output = "Beginning job "
65
82
  output += "'#{@job.name}'".bold if @job.name
@@ -1,5 +1,5 @@
1
1
  module Chronicle
2
2
  module ETL
3
- VERSION = "0.4.1"
3
+ VERSION = "0.4.2"
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: chronicle-etl
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.1
4
+ version: 0.4.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Louis
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2022-03-05 00:00:00.000000000 Z
11
+ date: 2022-03-12 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activesupport
@@ -150,6 +150,20 @@ dependencies:
150
150
  - - "~>"
151
151
  - !ruby/object:Gem::Version
152
152
  version: '1.2'
153
+ - !ruby/object:Gem::Dependency
154
+ name: thor-hollaback
155
+ requirement: !ruby/object:Gem::Requirement
156
+ requirements:
157
+ - - "~>"
158
+ - !ruby/object:Gem::Version
159
+ version: '0.2'
160
+ type: :runtime
161
+ prerelease: false
162
+ version_requirements: !ruby/object:Gem::Requirement
163
+ requirements:
164
+ - - "~>"
165
+ - !ruby/object:Gem::Version
166
+ version: '0.2'
153
167
  - !ruby/object:Gem::Dependency
154
168
  name: tty-progressbar
155
169
  requirement: !ruby/object:Gem::Requirement
@@ -164,6 +178,20 @@ dependencies:
164
178
  - - "~>"
165
179
  - !ruby/object:Gem::Version
166
180
  version: '0.17'
181
+ - !ruby/object:Gem::Dependency
182
+ name: tty-spinner
183
+ requirement: !ruby/object:Gem::Requirement
184
+ requirements:
185
+ - - ">="
186
+ - !ruby/object:Gem::Version
187
+ version: '0'
188
+ type: :runtime
189
+ prerelease: false
190
+ version_requirements: !ruby/object:Gem::Requirement
191
+ requirements:
192
+ - - ">="
193
+ - !ruby/object:Gem::Version
194
+ version: '0'
167
195
  - !ruby/object:Gem::Dependency
168
196
  name: tty-table
169
197
  requirement: !ruby/object:Gem::Requirement
@@ -178,6 +206,20 @@ dependencies:
178
206
  - - "~>"
179
207
  - !ruby/object:Gem::Version
180
208
  version: '0.11'
209
+ - !ruby/object:Gem::Dependency
210
+ name: tty-prompt
211
+ requirement: !ruby/object:Gem::Requirement
212
+ requirements:
213
+ - - "~>"
214
+ - !ruby/object:Gem::Version
215
+ version: '0.23'
216
+ type: :runtime
217
+ prerelease: false
218
+ version_requirements: !ruby/object:Gem::Requirement
219
+ requirements:
220
+ - - "~>"
221
+ - !ruby/object:Gem::Version
222
+ version: '0.23'
181
223
  - !ruby/object:Gem::Dependency
182
224
  name: bundler
183
225
  requirement: !ruby/object:Gem::Requirement
@@ -320,6 +362,7 @@ files:
320
362
  - lib/chronicle/etl/cli/connectors.rb
321
363
  - lib/chronicle/etl/cli/jobs.rb
322
364
  - lib/chronicle/etl/cli/main.rb
365
+ - lib/chronicle/etl/cli/plugins.rb
323
366
  - lib/chronicle/etl/cli/subcommand_base.rb
324
367
  - lib/chronicle/etl/config.rb
325
368
  - lib/chronicle/etl/configurable.rb
@@ -336,6 +379,7 @@ files:
336
379
  - lib/chronicle/etl/job_log.rb
337
380
  - lib/chronicle/etl/job_logger.rb
338
381
  - lib/chronicle/etl/loaders/csv_loader.rb
382
+ - lib/chronicle/etl/loaders/helpers/encoding_helper.rb
339
383
  - lib/chronicle/etl/loaders/json_loader.rb
340
384
  - lib/chronicle/etl/loaders/loader.rb
341
385
  - lib/chronicle/etl/loaders/rest_loader.rb
@@ -347,6 +391,7 @@ files:
347
391
  - lib/chronicle/etl/models/entity.rb
348
392
  - lib/chronicle/etl/models/raw.rb
349
393
  - lib/chronicle/etl/registry/connector_registration.rb
394
+ - lib/chronicle/etl/registry/plugin_registry.rb
350
395
  - lib/chronicle/etl/registry/registry.rb
351
396
  - lib/chronicle/etl/registry/self_registering.rb
352
397
  - lib/chronicle/etl/runner.rb
@@ -384,7 +429,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
384
429
  - !ruby/object:Gem::Version
385
430
  version: '0'
386
431
  requirements: []
387
- rubygems_version: 3.1.6
432
+ rubygems_version: 3.3.9
388
433
  signing_key:
389
434
  specification_version: 4
390
435
  summary: ETL tool for personal data
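
The new runtime dependencies in the metadata above use two constraint styles: `~>` (thor-hollaback, tty-prompt) and `>= 0` (tty-spinner). As a rough illustration of what each admits, the sketch below checks a few version numbers with RubyGems' own `Gem::Requirement`. The helper name `allowed?` and the sample versions are made up for the example.

```ruby
require 'rubygems'

# Hypothetical helper: does `version` satisfy `constraint`?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# '~> 0.2' (thor-hollaback) accepts 0.2 and later, but stops before 1.0
allowed?('~> 0.2', '0.2.0') # true
allowed?('~> 0.2', '0.9.1') # true
allowed?('~> 0.2', '1.0.0') # false

# '>= 0' (tty-spinner) accepts any release
allowed?('>= 0', '0.9.7')   # true
```

A two-segment `~>` constraint therefore pins only the major version, whereas `>= 0` places no upper bound on the dependency.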