chronicle-etl 0.5.2 → 0.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: b8faa084cfe4a9f080ee5494c69b268b78bfa8f3502354e740264e6941f13daf
4
- data.tar.gz: 1bf4f2751c71cadedc78a2fe3ed5b09bf86cd601a909e2fa2db0a0de8cc2c21d
3
+ metadata.gz: d0f305f15f4eda7a5851dfff2155da2c12ee010d4619346a13551f298d5b7991
4
+ data.tar.gz: d44f82b2bd06521740ad2b0e58cad0db840884fc5616858ef857d78fccb2b5dd
5
5
  SHA512:
6
- metadata.gz: ff10779b663a3321b779fb03e07249856174d96fb96e405ae906a47441c288d6a245c852525801ba250cce1125cf05c523ef4ec75fdfb4335cef9003091437ed
7
- data.tar.gz: 509f6f92e95341d212c54b6b000bc54e8ba03898497191a3e5d3b14db7bff3ed625d0fee403888fbb6103c1edc14de66b215d9aa84ddb68cefcf51c0e6c74138
6
+ metadata.gz: 99214409831e2799dffe2e3b096e9406222cf571d4e49fe71d3e3c645ad635e73c3aa42cb6af6569431064ce2750dc38ba051122a826b9feb6b21724ebd31db8
7
+ data.tar.gz: 77a30ecb069906b0e992adbcb1b7470642bcda65b8767dd8ba0145d63e834cb83e2a0a7bc5212edcf8c5028c637b2edbd8476b559f8e28765267516308b160eb
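`checksums.yaml` records a SHA256 and a SHA512 digest for each of the two archives inside the `.gem` file (`metadata.gz` and `data.tar.gz`). Below is a minimal sketch of checking one of these digests with Ruby's standard `digest` library. It assumes you have already unpacked the `.gem`, which is a plain tar archive (for example with `tar -xf chronicle-etl-0.5.3.gem`). The helper name `checksum_matches?` is hypothetical.

```ruby
require "digest"

# Hypothetical helper: compare a file's digest against the hex string
# recorded in checksums.yaml. Supports the two algorithms that file lists.
def checksum_matches?(path, expected_hex, algorithm: :sha256)
  digest_class =
    case algorithm
    when :sha256 then Digest::SHA256
    when :sha512 then Digest::SHA512
    else raise ArgumentError, "unsupported algorithm: #{algorithm}"
    end
  # hexdigest is lowercase, so normalise the expected value before comparing
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Usage, with the 0.5.3 SHA256 value for data.tar.gz from checksums.yaml:
# checksum_matches?("data.tar.gz",
#   "d44f82b2bd06521740ad2b0e58cad0db840884fc5616858ef857d78fccb2b5dd")
```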
data/README.md CHANGED
@@ -8,19 +8,26 @@ Are you trying to archive your digital history or incorporate it into your own p
8
8
 
9
9
  If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this project is for you! (*If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues).*)
10
10
 
11
- **`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a source (e.g. a CSV file, JSON, external API).
11
+ **`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a destination (e.g. a CSV file, JSON, external API).
12
12
 
13
13
  ## What does `chronicle-etl` give you?
14
14
  * **CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
15
15
  * **Plugins for many third-party providers**. A plugin system allows you to access data from third-party providers and hook it into the shared CLI infrastructure.
16
16
  * **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.
17
17
 
18
+ ## Chronicle-ETL in action
19
+
20
+ ![demo](https://user-images.githubusercontent.com/6291/161410839-b5ce931a-2353-4585-b530-929f46e3f960.svg)
21
+
22
+ ### Longer screencast
23
+
24
+ [![asciicast](https://asciinema.org/a/483455.svg)](https://asciinema.org/a/483455)
25
+
18
26
  ## Installation
19
27
 
20
28
  Using homebrew:
21
29
  ```sh
22
30
  $ brew install chronicle-app/etl/chronicle-etl
23
-
24
31
  ```
25
32
  Using rubygems:
26
33
  ```sh
@@ -42,7 +49,7 @@ $ chronicle-etl help
42
49
  $ chronicle-etl --extractor NAME --transformer NAME --loader NAME
43
50
 
44
51
  # Read data.csv and display it to stdout as a table
45
- $ chronicle-etl --extractor csv --input ./data.csv --loader table
52
+ $ chronicle-etl --extractor csv --input data.csv --loader table
46
53
 
47
54
  # Retrieve shell commands run in the last 5 hours
48
55
  $ chronicle-etl -e shell --since 5h
@@ -154,12 +161,12 @@ If you don't see a plugin for a third-party provider or data source that you're
154
161
 
155
162
  | Name | Description | Availability |
156
163
  |-----------------------------------------------------------------|---------------------------------------------------------------------------------------------|----------------------------------|
164
+ | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available |
165
+ | [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |
157
166
  | [imessage](https://github.com/chronicle-app/chronicle-imessage) | iMessage messages and attachments | Available |
158
- | [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
159
- | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
160
167
  | [pinboard](https://github.com/chronicle-app/chronicle-pinboard) | Bookmarks and tags | Available |
161
168
  | [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
162
- | [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |
169
+ | [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
163
170
 
164
171
  #### Coming soon
165
172
 
@@ -6,6 +6,7 @@ module Chronicle
6
6
  no_commands do
7
7
  # Shorthand for cli_exit(status: :failure)
8
8
  def cli_fail(message: nil, exception: nil)
9
+ message = "#{message}\nRe-run the command with --verbose to see details." if Chronicle::ETL::Logger.log_level > Chronicle::ETL::Logger::DEBUG
9
10
  cli_exit(status: :failure, message: message, exception: exception)
10
11
  end
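The new line in `cli_fail` appends a `--verbose` hint. Because `message` defaults to `nil`, appending with `+=` raises `NoMethodError` if `cli_fail` is ever called without a message while the log level is above DEBUG. Below is a minimal nil-safe sketch of the same idea. The `LogLevel` constants and the `with_verbose_hint` name are hypothetical stand-ins, not the real values in `Chronicle::ETL::Logger`.

```ruby
# Hypothetical log levels, ordered so that a larger number means quieter output.
module LogLevel
  DEBUG = 0
  INFO = 1
end

VERBOSE_HINT = "Re-run the command with --verbose to see details."

def with_verbose_hint(message, log_level)
  # Debug output already shows the details, so no hint is needed
  return message if log_level <= LogLevel::DEBUG

  # to_s maps a nil message to "", and reject drops it so no blank line is left
  [message.to_s, VERBOSE_HINT].reject(&:empty?).join("\n")
end
```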
11
12
 
@@ -43,6 +43,15 @@ module Chronicle
43
43
  LONG_DESC
44
44
  # Run an ETL job
45
45
  def start(name = nil)
46
+ # If someone runs `$ chronicle-etl` with no arguments, show help menu.
47
+ # TODO: decide if we should check that there's nothing in stdin pipe
48
+ # in case user wants to actually run this sort of job stdin->null->stdout
49
+ if name.nil? && options[:extractor].nil?
50
+ m = Chronicle::ETL::CLI::Main.new
51
+ m.help
52
+ cli_exit
53
+ end
54
+
46
55
  job_definition = build_job_definition(name, options)
47
56
 
48
57
  if job_definition.plugins_missing?
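The TODO in `start` asks whether a bare `$ chronicle-etl` should still run a stdin->null->stdout job when data is piped in. One way to express that check is sketched below. It assumes `$stdin.tty?` is an acceptable proxy for "nothing is piped in". The `show_help?` predicate is hypothetical. It does not describe the project's current behaviour, which checks only `name` and `options[:extractor]`.

```ruby
# Hypothetical predicate: show the help menu only when no job name, no
# extractor, and no piped input were given. `stdin_tty` defaults to
# $stdin.tty? but can be passed in so the logic is testable without a terminal.
def show_help?(name, options, stdin_tty: $stdin.tty?)
  name.nil? && options[:extractor].nil? && stdin_tty
end
```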
@@ -82,11 +91,10 @@ LONG_DESC
82
91
 
83
92
  if write_config
84
93
  Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
85
- cli_exit(message: "Job saved. Run it with `$chronicle-etl jobs:run #{name}`")
94
+ cli_exit(message: "Job saved. Run it with `$ chronicle-etl jobs:run #{name}`")
86
95
  else
87
96
  cli_fail(message: "\nJob not saved")
88
97
  end
89
-
90
98
  rescue Chronicle::ETL::JobDefinitionError => e
91
99
  cli_fail(message: "Job definition error", exception: e)
92
100
  end
@@ -136,6 +144,8 @@ LONG_DESC
136
144
  job = Chronicle::ETL::Job.new(job_definition)
137
145
  runner = Chronicle::ETL::Runner.new(job)
138
146
  runner.run!
147
+ rescue RunnerError => e
148
+ cli_fail(message: "#{e.message}", exception: e)
139
149
  end
140
150
 
141
151
  # TODO: probably could merge this with something in cli/plugin
@@ -54,24 +54,40 @@ module Chronicle
54
54
  klass, task = ::Thor::Util.find_class_and_task_by_namespace("#{meth}:#{meth}")
55
55
  klass.start(['-h', task].compact, shell: shell)
56
56
  else
57
- shell.say "ABOUT".bold
58
- shell.say " #{'chronicle-etl'.italic} is a utility tool for #{'extracting'.underline}, #{'transforming'.underline}, and #{'loading'.underline} personal data."
57
+ shell.say "ABOUT:".bold
58
+ shell.say " #{'chronicle-etl'.italic} is a toolkit for extracting and working with your digital"
59
+ shell.say " history. 📜"
59
60
  shell.say
60
- shell.say "USAGE".bold
61
- shell.say " $ chronicle-etl COMMAND"
61
+ shell.say " A job #{'extracts'.underline} personal data from a source, #{'transforms'.underline} it (Chronicle"
62
+ shell.say " Schema or preserves raw data), and then #{'loads'.underline} it to a destination. Use"
63
+ shell.say " built-in extractors (json, csv, stdin) and loaders (csv, json, table,"
64
+ shell.say " rest) or use plugins to connect to third-party services."
62
65
  shell.say
63
- shell.say "EXAMPLES".bold
64
- shell.say " Show available connectors:".italic.light_black
65
- shell.say " $ chronicle-etl connectors:list"
66
+ shell.say " Plugins: https://github.com/chronicle-app/chronicle-etl#currently-available"
66
67
  shell.say
67
- shell.say " Run a simple job:".italic.light_black
68
- shell.say " $ chronicle-etl jobs:run --extractor stdin --transformer null --loader stdout"
68
+ shell.say "USAGE:".bold
69
+ shell.say " # Basic job usage:".italic.light_black
70
+ shell.say " $ chronicle-etl --extractor NAME --transformer NAME --loader NAME"
69
71
  shell.say
70
- shell.say " Show full job options:".italic.light_black
72
+ shell.say " # Read data.csv and display it to stdout as a table:".italic.light_black
73
+ shell.say " $ chronicle-etl --extractor csv --input data.csv --loader table"
74
+ shell.say
75
+ shell.say " # Show available plugins:".italic.light_black
76
+ shell.say " $ chronicle-etl plugins:list"
77
+ shell.say
78
+ shell.say " # Save an access token as a secret and use it in a job:".italic.light_black
79
+ shell.say " $ chronicle-etl secrets:set pinboard access_token username:foo123"
80
+ shell.say " $ chronicle-etl secrets:list"
81
+ shell.say " $ chronicle-etl -e pinboard --since 1mo"
82
+ shell.say
83
+ shell.say " # Show full job options:".italic.light_black
71
84
  shell.say " $ chronicle-etl jobs help run"
85
+ shell.say
86
+ shell.say "FULL DOCUMENTATION:".bold
87
+ shell.say " https://github.com/chronicle-app/chronicle-etl".blue
88
+ shell.say
72
89
 
73
90
  list = []
74
-
75
91
  ::Thor::Util.thor_classes_in(Chronicle::ETL::CLI).each do |thor_class|
76
92
  list += thor_class.printable_tasks(false)
77
93
  end
@@ -79,25 +95,18 @@ module Chronicle
79
95
  list.unshift ["help", "# This help menu"]
80
96
 
81
97
  shell.say
82
- shell.say 'ALL COMMANDS'.bold
98
+ shell.say 'ALL COMMANDS:'.bold
83
99
  shell.print_table(list, indent: 2, truncate: true)
84
100
  shell.say
85
- shell.say "VERSION".bold
101
+ shell.say "VERSION:".bold
86
102
  shell.say " #{Chronicle::ETL::VERSION}"
87
103
  shell.say
88
104
  shell.say " Display current version:".italic.light_black
89
105
  shell.say " $ chronicle-etl --version"
90
- shell.say
91
- shell.say "FULL DOCUMENTATION".bold
92
- shell.say " https://github.com/chronicle-app/chronicle-etl".blue
93
- shell.say
94
106
  end
95
107
  end
96
108
 
97
109
  no_commands do
98
- def testb
99
- puts "hi"
100
- end
101
110
  def set_color_output
102
111
  String.disable_colorization true if options[:'no-color'] || ENV['NO_COLOR']
103
112
  end
@@ -61,7 +61,7 @@ module Chronicle
61
61
  }
62
62
  end
63
63
 
64
- headers = ['name', 'description', 'latest version'].map{ |h| h.to_s.upcase.bold }
64
+ headers = ['name', 'description', 'version'].map{ |h| h.to_s.upcase.bold }
65
65
  table = TTY::Table.new(headers, info.map(&:values))
66
66
  puts "Installed plugins:"
67
67
  puts table.render(indent: 2, padding: [0, 0])
@@ -6,6 +6,9 @@ module Chronicle
6
6
 
7
7
  class ConfigError < Error; end
8
8
 
9
+ class RunnerError < Error; end
10
+ class RunInterruptedError < RunnerError; end
11
+
9
12
  class RunnerTypeError < Error; end
10
13
 
11
14
  class JobDefinitionError < Error
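`RunInterruptedError` subclasses `RunnerError`, so the single `rescue RunnerError` added to the CLI run path also catches interrupted jobs. Code that cares specifically about interruptions can still rescue the narrower class first. The self-contained sketch below mirrors only the class names in the diff. The `Sketch` module and the `run_job` helper are hypothetical.

```ruby
module Sketch
  class Error < StandardError; end
  class RunnerError < Error; end
  class RunInterruptedError < RunnerError; end

  # Rescue the most specific class first; anything else in the
  # RunnerError family falls through to the broader clause.
  def self.run_job
    yield
    :ok
  rescue RunInterruptedError => e
    [:interrupted, e.message]
  rescue RunnerError => e
    [:failed, e.message]
  end
end

Sketch.run_job { raise Sketch::RunInterruptedError, "Job interrupted." }
# => [:interrupted, "Job interrupted."]
```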
@@ -9,7 +9,7 @@ module Chronicle
9
9
  # @todo Experiment with just mixing in ActiveModel instead of this
10
10
  # this reimplementation
11
11
  class Base
12
- ATTRIBUTES = [:provider, :provider_id, :lat, :lng, :metadata].freeze
12
+ ATTRIBUTES = [:provider, :provider_id, :provider_namespace, :lat, :lng, :metadata].freeze
13
13
  ASSOCIATIONS = [].freeze
14
14
 
15
15
  attr_accessor(:id, :dedupe_on, *ATTRIBUTES)
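Adding `:provider_namespace` to `ATTRIBUTES` is enough to give every model a reader and a writer for it, because the class passes the whole list to `attr_accessor`. Below is a stripped-down sketch of that pattern. The class name `ModelSketch` and its `to_h` helper are hypothetical and do not appear in the diff.

```ruby
class ModelSketch
  ATTRIBUTES = [:provider, :provider_id, :provider_namespace, :lat, :lng, :metadata].freeze

  # One call defines a getter and a setter for every listed attribute
  attr_accessor(:id, :dedupe_on, *ATTRIBUTES)

  # Hypothetical helper: collect the non-nil attributes into a hash
  def to_h
    ATTRIBUTES.each_with_object({}) do |attr, h|
      value = public_send(attr)
      h[attr] = value unless value.nil?
    end
  end
end

record = ModelSketch.new
record.provider = "github"
record.provider_namespace = "commits"
record.to_h # returns { provider: "github", provider_namespace: "commits" }
```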
@@ -10,7 +10,7 @@ module Chronicle
10
10
  # TODO: This desperately needs a validation system
11
11
  ASSOCIATIONS = [
12
12
  :involvements, # inverse of activity's `involved`
13
-
13
+ :analogous,
14
14
  :attachments,
15
15
  :abouts,
16
16
  :aboutables, # inverse of above
@@ -1,5 +1,6 @@
1
1
  require 'colorize'
2
2
  require 'chronic_duration'
3
+ require "tty-spinner"
3
4
 
4
5
  class Chronicle::ETL::Runner
5
6
  def initialize(job)
@@ -8,30 +9,55 @@ class Chronicle::ETL::Runner
8
9
  end
9
10
 
10
11
  def run!
12
+ begin_job
11
13
  validate_job
12
14
  instantiate_connectors
13
15
  prepare_job
14
16
  prepare_ui
15
17
  run_extraction
18
+ rescue Chronicle::ETL::ExtractionError => e
19
+ @job_logger&.error
20
+ raise(Chronicle::ETL::RunnerError, "Extraction failed. #{e.message}")
21
+ rescue Interrupt
22
+ @job_logger&.error
23
+ raise(Chronicle::ETL::RunInterruptedError, "Job interrupted.")
24
+ rescue StandardError => e
25
+ # Just throwing this in here until we have better exception handling in
26
+ # loaders, etc
27
+ @job_logger&.error
28
+ raise(Chronicle::ETL::RunnerError, "Error running job. #{e.message}")
29
+ ensure
16
30
  finish_job
17
31
  end
18
32
 
19
33
  private
20
34
 
35
+ def begin_job
36
+ Chronicle::ETL::Logger.info(tty_log_job_initialize)
37
+ @initialization_spinner = TTY::Spinner.new(":spinner :title", format: :dots_2)
38
+ end
39
+
21
40
  def validate_job
41
+ @initialization_spinner.update(title: "Validating job")
22
42
  @job.job_definition.validate!
23
43
  end
24
44
 
25
45
  def instantiate_connectors
46
+ @initialization_spinner.update(title: "Initializing connectors")
26
47
  @extractor = @job.instantiate_extractor
27
48
  @loader = @job.instantiate_loader
28
49
  end
29
50
 
30
51
  def prepare_job
31
- Chronicle::ETL::Logger.info(tty_log_job_start)
52
+ @initialization_spinner.update(title: "Preparing job")
32
53
  @job_logger.start
33
54
  @loader.start
55
+
56
+ @initialization_spinner.update(title: "Preparing extraction")
57
+ @initialization_spinner.auto_spin
34
58
  @extractor.prepare
59
+ @initialization_spinner.success("(#{'successful'.green})")
60
+ Chronicle::ETL::Logger.info("\n")
35
61
  end
36
62
 
37
63
  def prepare_ui
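The rewritten `run!` converts whatever goes wrong into a `RunnerError`, or into `RunInterruptedError` for Ctrl-C. It relies on `ensure` so that `finish_job` runs on every path, including failures. Below is a condensed, dependency-free sketch of that control flow. `MiniRunner`, its `steps` array, and the `finished` flag are hypothetical stand-ins for the real runner's methods and job logger.

```ruby
module RunnerSketch
  class RunnerError < StandardError; end
  class RunInterruptedError < RunnerError; end
  class ExtractionError < StandardError; end

  class MiniRunner
    attr_reader :finished

    def initialize(steps)
      @steps = steps # callables standing in for validate/prepare/extract
      @finished = false
    end

    def run!
      @steps.each(&:call)
    rescue ExtractionError => e
      raise RunnerError, "Extraction failed. #{e.message}"
    rescue Interrupt
      # Interrupt is not a StandardError, so it needs its own clause
      raise RunInterruptedError, "Job interrupted."
    rescue StandardError => e
      raise RunnerError, "Error running job. #{e.message}"
    ensure
      # Runs whether the steps succeeded or a translated error is propagating
      @finished = true
    end
  end
end
```

An exception raised inside one `rescue` clause is not caught by the later clauses of the same method. The re-raised `RunnerError` therefore reaches the caller intact, after `ensure` has run.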
@@ -40,34 +66,34 @@ class Chronicle::ETL::Runner
40
66
  Chronicle::ETL::Logger.attach_to_progress_bar(@progress_bar)
41
67
  end
42
68
 
43
- # TODO: refactor this further
44
69
  def run_extraction
45
70
  @extractor.extract do |extraction|
46
- unless extraction.is_a?(Chronicle::ETL::Extraction)
47
- raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
48
- end
49
-
50
- transformer = @job.instantiate_transformer(extraction)
51
- record = transformer.transform
52
-
53
- Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
54
- @job_logger.log_transformation(transformer)
55
-
56
- @loader.load(record) unless @job.dry_run?
57
- rescue Chronicle::ETL::TransformationError => e
58
- Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
59
- ensure
71
+ process_extraction(extraction)
60
72
  @progress_bar.increment
61
73
  end
62
74
 
63
75
  @progress_bar.finish
76
+
77
+ # This is typically a slow method (writing to stdout, writing a big file, etc)
78
+ # TODO: consider adding a spinner?
64
79
  @loader.finish
65
80
  @job_logger.finish
66
- rescue Interrupt
67
- Chronicle::ETL::Logger.error("\n#{'Job interrupted'.red}")
68
- @job_logger.error
69
- rescue StandardError => e
70
- raise e
81
+ end
82
+
83
+ def process_extraction(extraction)
84
+ # For each extraction from our extractor, we create a new transformer
85
+ transformer = @job.instantiate_transformer(extraction)
86
+
87
+ # And then transform that record, logging it if we're in debug log level
88
+ record = transformer.transform
89
+ Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
90
+ @job_logger.log_transformation(transformer)
91
+
92
+ # Then send the results to the loader
93
+ @loader.load(record) unless @job.dry_run?
94
+ rescue Chronicle::ETL::TransformationError => e
95
+ # TODO: have an option to cancel job if we encounter an error
96
+ Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
71
97
  end
72
98
 
73
99
  def finish_job
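`process_extraction` isolates failures per record. A `TransformationError` is logged and the loop moves on to the next extraction instead of aborting the job. Below is a minimal sketch of that loop. The `transform` lambda, the `loaded` and `errors` arrays, and the `process_all` name are hypothetical stand-ins for the transformer, the loader, the logger, and `run_extraction`.

```ruby
module LoopSketch
  class TransformationError < StandardError; end

  # Transform each extraction and hand it to the loader; a failed
  # transformation is recorded and skipped instead of aborting the job.
  def self.process_all(extractions, transform:, loaded:, errors:)
    extractions.each do |extraction|
      begin
        loaded << transform.call(extraction)
      rescue TransformationError => e
        errors << "#{extraction.inspect}: #{e.message}"
      end
    end
  end
end

loaded = []
errors = []
double = ->(x) { raise LoopSketch::TransformationError, "not a number" unless x.is_a?(Integer); x * 2 }
LoopSketch.process_all([1, "two", 3], transform: double, loaded: loaded, errors: errors)
# loaded is [2, 6]; errors is ["\"two\": not a number"]
```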
@@ -77,7 +103,7 @@ class Chronicle::ETL::Runner
77
103
  Chronicle::ETL::Logger.info(tty_log_completion)
78
104
  end
79
105
 
80
- def tty_log_job_start
106
+ def tty_log_job_initialize
81
107
  output = "Beginning job "
82
108
  output += "'#{@job.name}'".bold if @job.name
83
109
  output
@@ -95,8 +121,9 @@ class Chronicle::ETL::Runner
95
121
 
96
122
  def tty_log_completion
97
123
  status = @job_logger.success ? 'Success' : 'Failed'
98
- output = "\nCompleted job "
99
- output += "'#{@job.name}'".bold if @job.name
124
+ job_completion = @job_logger.success ? 'Completed' : 'Partially completed'
125
+ output = "\n#{job_completion} job"
126
+ output += " '#{@job.name}'".bold if @job.name
100
127
  output += " in #{ChronicDuration.output(@job_logger.duration)}" if @job_logger.duration
101
128
  output += "\n Status:\t".light_black + status
102
129
  output += "\n Completed:\t".light_black + "#{@job_logger.job_log.num_records_processed}"
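The completion message now distinguishes a fully successful job from one that finished with errors. Below is a plain-string sketch of that branching, without the `colorize` and `ChronicDuration` formatting. The `completion_summary` method and its keyword arguments are hypothetical.

```ruby
# Build the first lines of the completion summary from the job outcome.
def completion_summary(success:, name: nil, num_records: 0)
  status = success ? "Success" : "Failed"
  job_completion = success ? "Completed" : "Partially completed"
  output = "\n#{job_completion} job"
  output += " '#{name}'" if name
  output += "\n Status:\t#{status}"
  output += "\n Completed:\t#{num_records}"
  output
end

completion_summary(success: false, name: "shell", num_records: 3)
# => "\nPartially completed job 'shell'\n Status:\tFailed\n Completed:\t3"
```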
@@ -10,6 +10,10 @@ module Chronicle
10
10
  # options::
11
11
  # Options for configuring this Transformer
12
12
  def initialize(extraction, options = {})
13
+ unless extraction.is_a?(Chronicle::ETL::Extraction)
14
+ raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
15
+ end
16
+
13
17
  @extraction = extraction
14
18
  apply_options(options)
15
19
  end
@@ -1,5 +1,5 @@
1
1
  module Chronicle
2
2
  module ETL
3
- VERSION = "0.5.2"
3
+ VERSION = "0.5.3"
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: chronicle-etl
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.5.2
4
+ version: 0.5.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Louis
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2022-03-30 00:00:00.000000000 Z
11
+ date: 2022-04-04 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activesupport