chronicle-etl 0.5.2 → 0.5.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b8faa084cfe4a9f080ee5494c69b268b78bfa8f3502354e740264e6941f13daf
- data.tar.gz: 1bf4f2751c71cadedc78a2fe3ed5b09bf86cd601a909e2fa2db0a0de8cc2c21d
+ metadata.gz: d0f305f15f4eda7a5851dfff2155da2c12ee010d4619346a13551f298d5b7991
+ data.tar.gz: d44f82b2bd06521740ad2b0e58cad0db840884fc5616858ef857d78fccb2b5dd
  SHA512:
- metadata.gz: ff10779b663a3321b779fb03e07249856174d96fb96e405ae906a47441c288d6a245c852525801ba250cce1125cf05c523ef4ec75fdfb4335cef9003091437ed
- data.tar.gz: 509f6f92e95341d212c54b6b000bc54e8ba03898497191a3e5d3b14db7bff3ed625d0fee403888fbb6103c1edc14de66b215d9aa84ddb68cefcf51c0e6c74138
+ metadata.gz: 99214409831e2799dffe2e3b096e9406222cf571d4e49fe71d3e3c645ad635e73c3aa42cb6af6569431064ce2750dc38ba051122a826b9feb6b21724ebd31db8
+ data.tar.gz: 77a30ecb069906b0e992adbcb1b7470642bcda65b8767dd8ba0145d63e834cb83e2a0a7bc5212edcf8c5028c637b2edbd8476b559f8e28765267516308b160eb
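The checksums listed above can be checked locally against the files packed inside the `.gem` archive. The following is a minimal sketch using Ruby's standard `digest` library; `sha256_matches?` is a hypothetical helper written for illustration, not part of chronicle-etl or RubyGems.

```ruby
require "digest"

# Hypothetical helper (not part of chronicle-etl): returns true when the
# SHA256 hex digest of `data` equals `expected_hex`, ignoring case.
def sha256_matches?(data, expected_hex)
  Digest::SHA256.hexdigest(data) == expected_hex.downcase
end

# Example usage, assuming data.tar.gz has been unpacked from the .gem
# archive into the current directory:
#   expected = "d44f82b2bd06521740ad2b0e58cad0db840884fc5616858ef857d78fccb2b5dd"
#   sha256_matches?(File.binread("data.tar.gz"), expected)
```

Reading the file in binary mode matters here, since any newline or encoding translation would change the digest.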
data/README.md CHANGED
@@ -8,19 +8,26 @@ Are you trying to archive your digital history or incorporate it into your own p
 
  If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this project is for you! (*If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues).*)
 
- **`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a source (e.g. a CSV file, JSON, external API).
+ **`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a destination (e.g. a CSV file, JSON, external API).
 
  ## What does `chronicle-etl` give you?
  * **CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
  * **Plugins for many third-party providers**. A plugin system allows you to access data from third-party providers and hook it into the shared CLI infrastructure.
  * **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.
 
+ ## Chronicle-ETL in action
+
+ ![demo](https://user-images.githubusercontent.com/6291/161410839-b5ce931a-2353-4585-b530-929f46e3f960.svg)
+
+ ### Longer screencast
+
+ [![asciicast](https://asciinema.org/a/483455.svg)](https://asciinema.org/a/483455)
+
  ## Installation
 
  Using homebrew:
  ```sh
  $ brew install chronicle-app/etl/chronicle-etl
-
  ```
  Using rubygems:
  ```sh
@@ -42,7 +49,7 @@ $ chronicle-etl help
  $ chronicle-etl --extractor NAME --transformer NAME --loader NAME
 
  # Read test.csv and display it to stdout as a table
- $ chronicle-etl --extractor csv --input ./data.csv --loader table
+ $ chronicle-etl --extractor csv --input data.csv --loader table
 
  # Retrieve shell commands run in the last 5 hours
  $ chronicle-etl -e shell --since 5h
@@ -154,12 +161,12 @@ If you don't see a plugin for a third-party provider or data source that you're
 
  | Name | Description | Availability |
  |-----------------------------------------------------------------|---------------------------------------------------------------------------------------------|----------------------------------|
+ | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available |
+ | [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |
  | [imessage](https://github.com/chronicle-app/chronicle-imessage) | iMessage messages and attachments | Available |
- | [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
- | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
  | [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
  | [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
- | [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |
+ | [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
 
  #### Coming soon
 
@@ -6,6 +6,7 @@ module Chronicle
  no_commands do
  # Shorthand for cli_exit(status: :failure)
  def cli_fail(message: nil, exception: nil)
+ message += "\nRe-run the command with --verbose to see details." if Chronicle::ETL::Logger.log_level > Chronicle::ETL::Logger::DEBUG
  cli_exit(status: :failure, message: message, exception: exception)
  end
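One detail worth noting in the `cli_fail` change above: `message` defaults to `nil`, and `nil += "..."` raises `NoMethodError`, so the hint only works when a caller supplies a message (every call visible in this diff does). Below is a minimal nil-safe sketch of the same idea. `with_verbose_hint` and `VERBOSE_HINT` are hypothetical names introduced for illustration, and the `show_hint:` flag stands in for the log-level comparison.

```ruby
# Hypothetical constant and helper, not part of chronicle-etl.
VERBOSE_HINT = "Re-run the command with --verbose to see details."

# Appends the hint on its own line when `show_hint` is true.
# A nil message is tolerated: compact drops it before joining.
def with_verbose_hint(message, show_hint:)
  return message unless show_hint

  [message, VERBOSE_HINT].compact.join("\n")
end
```

With a message present this produces the same text as the `+=` form; with `nil` it returns just the hint instead of raising.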
 
@@ -43,6 +43,15 @@ module Chronicle
  LONG_DESC
  # Run an ETL job
  def start(name = nil)
+ # If someone runs `$ chronicle-etl` with no arguments, show help menu.
+ # TODO: decide if we should check that there's nothing in stdin pipe
+ # in case user wants to actually run this sort of job stdin->null->stdout
+ if name.nil? && options[:extractor].nil?
+ m = Chronicle::ETL::CLI::Main.new
+ m.help
+ cli_exit
+ end
+
  job_definition = build_job_definition(name, options)
 
  if job_definition.plugins_missing?
@@ -82,11 +91,10 @@ LONG_DESC
 
  if write_config
  Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
- cli_exit(message: "Job saved. Run it with `$chronicle-etl jobs:run #{name}`")
+ cli_exit(message: "Job saved. Run it with `$ chronicle-etl jobs:run #{name}`")
  else
  cli_fail(message: "\nJob not saved")
  end
-
  rescue Chronicle::ETL::JobDefinitionError => e
  cli_fail(message: "Job definition error", exception: e)
  end
@@ -136,6 +144,8 @@ LONG_DESC
  job = Chronicle::ETL::Job.new(job_definition)
  runner = Chronicle::ETL::Runner.new(job)
  runner.run!
+ rescue RunnerError => e
+ cli_fail(message: "#{e.message}", exception: e)
  end
 
  # TODO: probably could merge this with something in cli/plugin
@@ -54,24 +54,40 @@ module Chronicle
  klass, task = ::Thor::Util.find_class_and_task_by_namespace("#{meth}:#{meth}")
  klass.start(['-h', task].compact, shell: shell)
  else
- shell.say "ABOUT".bold
- shell.say " #{'chronicle-etl'.italic} is a utility tool for #{'extracting'.underline}, #{'transforming'.underline}, and #{'loading'.underline} personal data."
+ shell.say "ABOUT:".bold
+ shell.say " #{'chronicle-etl'.italic} is a toolkit for extracting and working with your digital"
+ shell.say " history. 📜"
  shell.say
- shell.say "USAGE".bold
- shell.say " $ chronicle-etl COMMAND"
+ shell.say " A job #{'extracts'.underline} personal data from a source, #{'transforms'.underline} it (Chronicle"
+ shell.say " Schema or preserves raw data), and then #{'loads'.underline} it to a destination. Use"
+ shell.say " built-in extractors (json, csv, stdin) and loaders (csv, json, table,"
+ shell.say " rest) or use plugins to connect to third-party services."
  shell.say
- shell.say "EXAMPLES".bold
- shell.say " Show available connectors:".italic.light_black
- shell.say " $ chronicle-etl connectors:list"
+ shell.say " Plugins: https://github.com/chronicle-app/chronicle-etl#currently-available"
  shell.say
- shell.say " Run a simple job:".italic.light_black
- shell.say " $ chronicle-etl jobs:run --extractor stdin --transformer null --loader stdout"
+ shell.say "USAGE:".bold
+ shell.say " # Basic job usage:".italic.light_black
+ shell.say " $ chronicle-etl --extractor NAME --transformer NAME --loader NAME"
  shell.say
- shell.say " Show full job options:".italic.light_black
+ shell.say " # Read test.csv and display it to stdout as a table:".italic.light_black
+ shell.say " $ chronicle-etl --extractor csv --input data.csv --loader table"
+ shell.say
+ shell.say " # Show available plugins:".italic.light_black
+ shell.say " $ chronicle-etl plugins:list"
+ shell.say
+ shell.say " # Save an access token as a secret and use it in a job:".italic.light_black
+ shell.say " $ chronicle-etl secrets:set pinboard access_token username:foo123"
+ shell.say " $ chronicle-etl secrets:list"
+ shell.say " $ chronicle-etl -e pinboard --since 1mo"
+ shell.say
+ shell.say " # Show full job options:".italic.light_black
  shell.say " $ chronicle-etl jobs help run"
+ shell.say
+ shell.say "FULL DOCUMENTATION:".bold
+ shell.say " https://github.com/chronicle-app/chronicle-etl".blue
+ shell.say
 
  list = []
-
  ::Thor::Util.thor_classes_in(Chronicle::ETL::CLI).each do |thor_class|
  list += thor_class.printable_tasks(false)
  end
@@ -79,25 +95,18 @@ module Chronicle
  list.unshift ["help", "# This help menu"]
 
  shell.say
- shell.say 'ALL COMMANDS'.bold
+ shell.say 'ALL COMMANDS:'.bold
  shell.print_table(list, indent: 2, truncate: true)
  shell.say
- shell.say "VERSION".bold
+ shell.say "VERSION:".bold
  shell.say " #{Chronicle::ETL::VERSION}"
  shell.say
  shell.say " Display current version:".italic.light_black
  shell.say " $ chronicle-etl --version"
- shell.say
- shell.say "FULL DOCUMENTATION".bold
- shell.say " https://github.com/chronicle-app/chronicle-etl".blue
- shell.say
  end
  end
 
  no_commands do
- def testb
- puts "hi"
- end
  def set_color_output
  String.disable_colorization true if options[:'no-color'] || ENV['NO_COLOR']
  end
@@ -61,7 +61,7 @@ module Chronicle
  }
  end
 
- headers = ['name', 'description', 'latest version'].map{ |h| h.to_s.upcase.bold }
+ headers = ['name', 'description', 'version'].map{ |h| h.to_s.upcase.bold }
  table = TTY::Table.new(headers, info.map(&:values))
  puts "Installed plugins:"
  puts table.render(indent: 2, padding: [0, 0])
@@ -6,6 +6,9 @@ module Chronicle
 
  class ConfigError < Error; end
 
+ class RunnerError < Error; end
+ class RunInterruptedError < RunnerError; end
+
  class RunnerTypeError < Error; end
 
  class JobDefinitionError < Error
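Because `RunInterruptedError` subclasses `RunnerError`, a single `rescue RunnerError` in the CLI also covers interrupted runs. A small sketch of that behaviour follows; the `ErrorSketch` module, its stand-in `Error` base class, and the `handled_by` helper are assumptions made for illustration, not code from the gem.

```ruby
# Stand-in hierarchy mirroring the class names in the hunk above.
# The module and the helper method are hypothetical.
module ErrorSketch
  class Error < StandardError; end
  class ConfigError < Error; end
  class RunnerError < Error; end
  class RunInterruptedError < RunnerError; end

  # Raises the given error and reports which rescue clause caught it.
  # Clauses are tried top to bottom, so the more specific one goes first.
  def self.handled_by(error)
    raise error
  rescue RunnerError
    :runner_handler
  rescue Error
    :generic_handler
  end
end
```

Here `ErrorSketch.handled_by(ErrorSketch::RunInterruptedError.new)` lands in the `RunnerError` clause, while a `ConfigError` falls through to the generic one.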
@@ -9,7 +9,7 @@ module Chronicle
  # @todo Experiment with just mixing in ActiveModel instead of this
  # this reimplementation
  class Base
- ATTRIBUTES = [:provider, :provider_id, :lat, :lng, :metadata].freeze
+ ATTRIBUTES = [:provider, :provider_id, :provider_namespace, :lat, :lng, :metadata].freeze
  ASSOCIATIONS = [].freeze
 
  attr_accessor(:id, :dedupe_on, *ATTRIBUTES)
@@ -10,7 +10,7 @@ module Chronicle
  # TODO: This desperately needs a validation system
  ASSOCIATIONS = [
  :involvements, # inverse of activity's `involved`
-
+ :analogous,
  :attachments,
  :abouts,
  :aboutables, # inverse of above
@@ -1,5 +1,6 @@
  require 'colorize'
  require 'chronic_duration'
+ require "tty-spinner"
 
  class Chronicle::ETL::Runner
  def initialize(job)
@@ -8,30 +9,55 @@ class Chronicle::ETL::Runner
  end
 
  def run!
+ begin_job
  validate_job
  instantiate_connectors
  prepare_job
  prepare_ui
  run_extraction
+ rescue Chronicle::ETL::ExtractionError => e
+ @job_logger&.error
+ raise(Chronicle::ETL::RunnerError, "Extraction failed. #{e.message}")
+ rescue Interrupt
+ @job_logger&.error
+ raise(Chronicle::ETL::RunInterruptedError, "Job interrupted.")
+ rescue StandardError => e
+ # Just throwing this in here until we have better exception handling in
+ # loaders, etc
+ @job_logger&.error
+ raise(Chronicle::ETL::RunnerError, "Error running job. #{e.message}")
+ ensure
  finish_job
  end
 
  private
 
+ def begin_job
+ Chronicle::ETL::Logger.info(tty_log_job_initialize)
+ @initialization_spinner = TTY::Spinner.new(":spinner :title", format: :dots_2)
+ end
+
  def validate_job
+ @initialization_spinner.update(title: "Validating job")
  @job.job_definition.validate!
  end
 
  def instantiate_connectors
+ @initialization_spinner.update(title: "Initializing connectors")
  @extractor = @job.instantiate_extractor
  @loader = @job.instantiate_loader
  end
 
  def prepare_job
- Chronicle::ETL::Logger.info(tty_log_job_start)
+ @initialization_spinner.update(title: "Preparing job")
  @job_logger.start
  @loader.start
+
+ @initialization_spinner.update(title: "Preparing extraction")
+ @initialization_spinner.auto_spin
  @extractor.prepare
+ @initialization_spinner.success("(#{'successful'.green})")
+ Chronicle::ETL::Logger.info("\n")
  end
 
  def prepare_ui
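The reworked `run!` in this hunk translates low-level failures into runner-level errors and uses `ensure` so that `finish_job` runs either way. `Interrupt` needs its own clause because it does not descend from `StandardError`. The sketch below shows that shape in isolation, under stated assumptions: `RunnerSketch`, the `steps` callable, and the `log` array are hypothetical stand-ins for the real job steps and job logger.

```ruby
# Simplified, hypothetical runner; only the error-handling shape
# follows the diff above.
module RunnerSketch
  class RunnerError < StandardError; end
  class RunInterruptedError < RunnerError; end

  class Runner
    attr_reader :log

    # `steps` is any callable standing in for validate/prepare/extract.
    def initialize(steps)
      @steps = steps
      @log = []
    end

    def run!
      @log << :begin
      @steps.call
    rescue Interrupt
      # Interrupt is not a StandardError, so it needs an explicit clause.
      raise RunInterruptedError, "Job interrupted."
    rescue StandardError => e
      raise RunnerError, "Error running job. #{e.message}"
    ensure
      # Runs on success, on failure, and on interruption.
      @log << :finish
    end
  end
end
```

A caller can then rescue `RunnerSketch::RunnerError` once and still see `:finish` recorded in `log`, which is the property the CLI relies on when it reports the failure.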
@@ -40,34 +66,34 @@ class Chronicle::ETL::Runner
  Chronicle::ETL::Logger.attach_to_progress_bar(@progress_bar)
  end
 
- # TODO: refactor this further
  def run_extraction
  @extractor.extract do |extraction|
- unless extraction.is_a?(Chronicle::ETL::Extraction)
- raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
- end
-
- transformer = @job.instantiate_transformer(extraction)
- record = transformer.transform
-
- Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
- @job_logger.log_transformation(transformer)
-
- @loader.load(record) unless @job.dry_run?
- rescue Chronicle::ETL::TransformationError => e
- Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
- ensure
+ process_extraction(extraction)
  @progress_bar.increment
  end
 
  @progress_bar.finish
+
+ # This is typically a slow method (writing to stdout, writing a big file, etc)
+ # TODO: consider adding a spinner?
  @loader.finish
  @job_logger.finish
- rescue Interrupt
- Chronicle::ETL::Logger.error("\n#{'Job interrupted'.red}")
- @job_logger.error
- rescue StandardError => e
- raise e
+ end
+
+ def process_extraction(extraction)
+ # For each extraction from our extractor, we create a new transformer
+ transformer = @job.instantiate_transformer(extraction)
+
+ # And then transform that record, logging it if we're in debug log level
+ record = transformer.transform
+ Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
+ @job_logger.log_transformation(transformer)
+
+ # Then send the results to the loader
+ @loader.load(record) unless @job.dry_run?
+ rescue Chronicle::ETL::TransformationError => e
+ # TODO: have an option to cancel job if we encounter an error
+ Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
  end
 
  def finish_job
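Moving the per-record work into `process_extraction` keeps the `rescue` for `TransformationError` scoped to a single record, so one bad record is logged and the extraction loop carries on. A minimal sketch of that pattern is below; the `ExtractionSketch` module, its integer-doubling `transform`, and the `loaded`/`errors` arrays are all hypothetical and stand in for the real transformer, loader, and logger.

```ruby
# Hypothetical illustration of per-record error isolation.
module ExtractionSketch
  class TransformationError < StandardError; end

  # Stand-in transform: doubles integers and rejects anything else.
  def self.transform(record)
    unless record.is_a?(Integer)
      raise TransformationError, "cannot transform #{record.inspect}"
    end

    record * 2
  end

  # Same shape as process_extraction: a failed record is noted and the
  # caller's loop moves on to the next one.
  def self.process(record, loaded, errors)
    loaded << transform(record)
  rescue TransformationError => e
    errors << e.message
  end

  # Processes every record and returns [loaded, errors].
  def self.run(records)
    loaded = []
    errors = []
    records.each { |record| process(record, loaded, errors) }
    [loaded, errors]
  end
end
```

Any exception other than `TransformationError` still propagates out of `process`, which matches the diff: those are left for the broader handling in `run!`.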
@@ -77,7 +103,7 @@ class Chronicle::ETL::Runner
  Chronicle::ETL::Logger.info(tty_log_completion)
  end
 
- def tty_log_job_start
+ def tty_log_job_initialize
  output = "Beginning job "
  output += "'#{@job.name}'".bold if @job.name
  output
@@ -95,8 +121,9 @@ class Chronicle::ETL::Runner
 
  def tty_log_completion
  status = @job_logger.success ? 'Success' : 'Failed'
- output = "\nCompleted job "
- output += "'#{@job.name}'".bold if @job.name
+ job_completion = @job_logger.success ? 'Completed' : 'Partially completed'
+ output = "\n#{job_completion} job"
+ output += " '#{@job.name}'".bold if @job.name
  output += " in #{ChronicDuration.output(@job_logger.duration)}" if @job_logger.duration
  output += "\n Status:\t".light_black + status
  output += "\n Completed:\t".light_black + "#{@job_logger.job_log.num_records_processed}"
@@ -10,6 +10,10 @@ module Chronicle
  # options::
  # Options for configuring this Transformer
  def initialize(extraction, options = {})
+ unless extraction.is_a?(Chronicle::ETL::Extraction)
+ raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
+ end
+
  @extraction = extraction
  apply_options(options)
  end
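With the type check moved from the runner into the transformer's initializer, an invalid extraction is rejected at construction time, before any transformation is attempted. The sketch below shows the effect with stand-in classes; `TransformerSketch` and its empty `Extraction` class are hypothetical and only mirror the names used in the hunk.

```ruby
# Hypothetical stand-ins that mirror the guard added in the hunk above.
module TransformerSketch
  class Extraction; end
  class RunnerTypeError < StandardError; end

  class Transformer
    attr_reader :extraction

    # Rejects anything that is not an Extraction before storing it.
    def initialize(extraction)
      unless extraction.is_a?(Extraction)
        raise RunnerTypeError, "Extracted should be a #{Extraction.name}"
      end

      @extraction = extraction
    end
  end
end
```

For example, `TransformerSketch::Transformer.new("raw row")` raises `RunnerTypeError`, while passing `TransformerSketch::Extraction.new` succeeds.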
@@ -1,5 +1,5 @@
  module Chronicle
  module ETL
- VERSION = "0.5.2"
+ VERSION = "0.5.3"
  end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: chronicle-etl
  version: !ruby/object:Gem::Version
- version: 0.5.2
+ version: 0.5.3
  platform: ruby
  authors:
  - Andrew Louis
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2022-03-30 00:00:00.000000000 Z
+ date: 2022-04-04 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activesupport