chronicle-etl 0.5.2 → 0.5.3
- checksums.yaml +4 -4
- data/README.md +13 -6
- data/lib/chronicle/etl/cli/cli_base.rb +1 -0
- data/lib/chronicle/etl/cli/jobs.rb +12 -2
- data/lib/chronicle/etl/cli/main.rb +29 -20
- data/lib/chronicle/etl/cli/plugins.rb +1 -1
- data/lib/chronicle/etl/exceptions.rb +3 -0
- data/lib/chronicle/etl/models/base.rb +1 -1
- data/lib/chronicle/etl/models/entity.rb +1 -1
- data/lib/chronicle/etl/runner.rb +51 -24
- data/lib/chronicle/etl/transformers/transformer.rb +4 -0
- data/lib/chronicle/etl/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d0f305f15f4eda7a5851dfff2155da2c12ee010d4619346a13551f298d5b7991
+  data.tar.gz: d44f82b2bd06521740ad2b0e58cad0db840884fc5616858ef857d78fccb2b5dd
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 99214409831e2799dffe2e3b096e9406222cf571d4e49fe71d3e3c645ad635e73c3aa42cb6af6569431064ce2750dc38ba051122a826b9feb6b21724ebd31db8
+  data.tar.gz: 77a30ecb069906b0e992adbcb1b7470642bcda65b8767dd8ba0145d63e834cb83e2a0a7bc5212edcf8c5028c637b2edbd8476b559f8e28765267516308b160eb
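The SHA256 and SHA512 entries above are digests of the `metadata.gz` and `data.tar.gz` members of the packaged gem. A minimal sketch of checking one such entry with Ruby's standard `digest` library, assuming the member has already been extracted to disk. `sha256_matches?` is a hypothetical helper for illustration, not part of RubyGems or chronicle-etl.

```ruby
require "digest"

# Hypothetical helper: compares a file's SHA256 hex digest to an expected
# value, such as the `metadata.gz` entry listed in checksums.yaml.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end
```

The same shape works for the SHA512 entries by swapping in `Digest::SHA512`.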
data/README.md
CHANGED
@@ -8,19 +8,26 @@ Are you trying to archive your digital history or incorporate it into your own p

 If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this project is for you! (*If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues).*)

-**`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a
+**`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a destination (e.g. a CSV file, JSON, external API).

 ## What does `chronicle-etl` give you?
 * **CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
 * **Plugins for many third-party providers**. A plugin system allows you to access data from third-party providers and hook it into the shared CLI infrastructure.
 * **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.

+## Chronicle-ETL in action
+
+![demo](https://user-images.githubusercontent.com/6291/161410839-b5ce931a-2353-4585-b530-929f46e3f960.svg)
+
+### Longer screencast
+
+[![asciicast](https://asciinema.org/a/483455.svg)](https://asciinema.org/a/483455)
+
 ## Installation

 Using homebrew:
 ```sh
 $ brew install chronicle-app/etl/chronicle-etl
-
 ```
 Using rubygems:
 ```sh
@@ -42,7 +49,7 @@ $ chronicle-etl help
 $ chronicle-etl --extractor NAME --transformer NAME --loader NAME

 # Read test.csv and display it to stdout as a table
-$ chronicle-etl --extractor csv --input
+$ chronicle-etl --extractor csv --input data.csv --loader table

 # Retrieve shell commands run in the last 5 hours
 $ chronicle-etl -e shell --since 5h
@@ -154,12 +161,12 @@ If you don't see a plugin for a third-party provider or data source that you're

 | Name | Description | Availability |
 |-----------------------------------------------------------------|---------------------------------------------------------------------------------------------|----------------------------------|
+| [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available |
+| [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |
 | [imessage](https://github.com/chronicle-app/chronicle-imessage) | iMessage messages and attachments | Available |
-| [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
-| [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
 | [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
 | [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
-| [
+| [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |

 #### Coming soon

data/lib/chronicle/etl/cli/cli_base.rb
CHANGED
@@ -6,6 +6,7 @@ module Chronicle
 no_commands do
   # Shorthand for cli_exit(status: :failure)
   def cli_fail(message: nil, exception: nil)
+    message += "\nRe-run the command with --verbose to see details." if Chronicle::ETL::Logger.log_level > Chronicle::ETL::Logger::DEBUG
     cli_exit(status: :failure, message: message, exception: exception)
   end

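Because `cli_fail` declares `message: nil`, the added `message += ...` line would raise `NoMethodError` if a caller ever omitted the message while the log level is above debug. A minimal sketch of a nil-tolerant way to build the same text. `append_verbose_hint` and its `verbose:` flag are hypothetical stand-ins for the logger-level check, not chronicle-etl API.

```ruby
# Hypothetical helper (not part of chronicle-etl): builds the failure message
# without assuming the caller supplied one.
VERBOSE_HINT = "Re-run the command with --verbose to see details."

def append_verbose_hint(message, verbose:)
  return message if verbose # details are already being shown

  # compact drops a nil message so the hint can stand on its own
  [message, VERBOSE_HINT].compact.join("\n")
end
```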
data/lib/chronicle/etl/cli/jobs.rb
CHANGED
@@ -43,6 +43,15 @@ module Chronicle
 LONG_DESC
 # Run an ETL job
 def start(name = nil)
+  # If someone runs `$ chronicle-etl` with no arguments, show help menu.
+  # TODO: decide if we should check that there's nothing in stdin pipe
+  # in case user wants to actually run this sort of job stdin->null->stdout
+  if name.nil? && options[:extractor].nil?
+    m = Chronicle::ETL::CLI::Main.new
+    m.help
+    cli_exit
+  end
+
   job_definition = build_job_definition(name, options)

   if job_definition.plugins_missing?
@@ -82,11 +91,10 @@ LONG_DESC

   if write_config
     Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
-    cli_exit(message: "Job saved. Run it with `$chronicle-etl jobs:run #{name}`")
+    cli_exit(message: "Job saved. Run it with `$ chronicle-etl jobs:run #{name}`")
   else
     cli_fail(message: "\nJob not saved")
   end
-
 rescue Chronicle::ETL::JobDefinitionError => e
   cli_fail(message: "Job definition error", exception: e)
 end
@@ -136,6 +144,8 @@ LONG_DESC
   job = Chronicle::ETL::Job.new(job_definition)
   runner = Chronicle::ETL::Runner.new(job)
   runner.run!
+rescue RunnerError => e
+  cli_fail(message: "#{e.message}", exception: e)
 end

 # TODO: probably could merge this with something in cli/plugin
data/lib/chronicle/etl/cli/main.rb
CHANGED
@@ -54,24 +54,40 @@ module Chronicle
     klass, task = ::Thor::Util.find_class_and_task_by_namespace("#{meth}:#{meth}")
     klass.start(['-h', task].compact, shell: shell)
   else
-    shell.say "ABOUT".bold
-    shell.say " #{'chronicle-etl'.italic} is a
+    shell.say "ABOUT:".bold
+    shell.say " #{'chronicle-etl'.italic} is a toolkit for extracting and working with your digital"
+    shell.say " history. 📜"
     shell.say
-    shell.say "
-    shell.say "
+    shell.say " A job #{'extracts'.underline} personal data from a source, #{'transforms'.underline} it (Chronicle"
+    shell.say " Schema or preserves raw data), and then #{'loads'.underline} it to a destination. Use"
+    shell.say " built-in extractors (json, csv, stdin) and loaders (csv, json, table,"
+    shell.say " rest) or use plugins to connect to third-party services."
     shell.say
-    shell.say "
-    shell.say " Show available connectors:".italic.light_black
-    shell.say " $ chronicle-etl connectors:list"
+    shell.say " Plugins: https://github.com/chronicle-app/chronicle-etl#currently-available"
     shell.say
-    shell.say "
-    shell.say "
+    shell.say "USAGE:".bold
+    shell.say " # Basic job usage:".italic.light_black
+    shell.say " $ chronicle-etl --extractor NAME --transformer NAME --loader NAME"
     shell.say
-    shell.say "
+    shell.say " # Read test.csv and display it to stdout as a table:".italic.light_black
+    shell.say " $ chronicle-etl --extractor csv --input data.csv --loader table"
+    shell.say
+    shell.say " # Show available plugins:".italic.light_black
+    shell.say " $ chronicle-etl plugins:list"
+    shell.say
+    shell.say " # Save an access token as a secret and use it in a job:".italic.light_black
+    shell.say " $ chronicle-etl secrets:set pinboard access_token username:foo123"
+    shell.say " $ chronicle-etl secrets:list"
+    shell.say " $ chronicle-etl -e pinboard --since 1mo"
+    shell.say
+    shell.say " # Show full job options:".italic.light_black
     shell.say " $ chronicle-etl jobs help run"
+    shell.say
+    shell.say "FULL DOCUMENTATION:".bold
+    shell.say " https://github.com/chronicle-app/chronicle-etl".blue
+    shell.say

     list = []
-
     ::Thor::Util.thor_classes_in(Chronicle::ETL::CLI).each do |thor_class|
       list += thor_class.printable_tasks(false)
     end
@@ -79,25 +95,18 @@ module Chronicle
     list.unshift ["help", "# This help menu"]

     shell.say
-    shell.say 'ALL COMMANDS'.bold
+    shell.say 'ALL COMMANDS:'.bold
     shell.print_table(list, indent: 2, truncate: true)
     shell.say
-    shell.say "VERSION".bold
+    shell.say "VERSION:".bold
     shell.say " #{Chronicle::ETL::VERSION}"
     shell.say
     shell.say " Display current version:".italic.light_black
     shell.say " $ chronicle-etl --version"
-    shell.say
-    shell.say "FULL DOCUMENTATION".bold
-    shell.say " https://github.com/chronicle-app/chronicle-etl".blue
-    shell.say
   end
 end

 no_commands do
-  def testb
-    puts "hi"
-  end
   def set_color_output
     String.disable_colorization true if options[:'no-color'] || ENV['NO_COLOR']
   end
data/lib/chronicle/etl/cli/plugins.rb
CHANGED
@@ -61,7 +61,7 @@ module Chronicle
   }
 end

-headers = ['name', 'description', '
+headers = ['name', 'description', 'version'].map{ |h| h.to_s.upcase.bold }
 table = TTY::Table.new(headers, info.map(&:values))
 puts "Installed plugins:"
 puts table.render(indent: 2, padding: [0, 0])
data/lib/chronicle/etl/models/base.rb
CHANGED
@@ -9,7 +9,7 @@ module Chronicle
 # @todo Experiment with just mixing in ActiveModel instead of this
 # this reimplementation
 class Base
-  ATTRIBUTES = [:provider, :provider_id, :lat, :lng, :metadata].freeze
+  ATTRIBUTES = [:provider, :provider_id, :provider_namespace, :lat, :lng, :metadata].freeze
   ASSOCIATIONS = [].freeze

   attr_accessor(:id, :dedupe_on, *ATTRIBUTES)
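Since `attr_accessor` receives `*ATTRIBUTES`, adding `:provider_namespace` to the constant is enough to give every model a reader and writer for it. A small self-contained sketch of that mechanism, where `SketchBase` is a hypothetical stand-in for the real `Base` class:

```ruby
# Stand-in for Chronicle::ETL::Models::Base, reduced to the two lines that matter.
class SketchBase
  ATTRIBUTES = [:provider, :provider_id, :provider_namespace, :lat, :lng, :metadata].freeze

  # The splat expands the constant, so each symbol gets a getter and a setter.
  attr_accessor(:id, :dedupe_on, *ATTRIBUTES)
end

record = SketchBase.new
record.provider_namespace = "example-namespace" # illustrative value only
```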
data/lib/chronicle/etl/runner.rb
CHANGED
@@ -1,5 +1,6 @@
 require 'colorize'
 require 'chronic_duration'
+require "tty-spinner"

 class Chronicle::ETL::Runner
   def initialize(job)
@@ -8,30 +9,55 @@ class Chronicle::ETL::Runner
   end

   def run!
+    begin_job
     validate_job
     instantiate_connectors
     prepare_job
     prepare_ui
     run_extraction
+  rescue Chronicle::ETL::ExtractionError => e
+    @job_logger&.error
+    raise(Chronicle::ETL::RunnerError, "Extraction failed. #{e.message}")
+  rescue Interrupt
+    @job_logger&.error
+    raise(Chronicle::ETL::RunInterruptedError, "Job interrupted.")
+  rescue StandardError => e
+    # Just throwing this in here until we have better exception handling in
+    # loaders, etc
+    @job_logger&.error
+    raise(Chronicle::ETL::RunnerError, "Error running job. #{e.message}")
+  ensure
     finish_job
   end

   private

+  def begin_job
+    Chronicle::ETL::Logger.info(tty_log_job_initialize)
+    @initialization_spinner = TTY::Spinner.new(":spinner :title", format: :dots_2)
+  end
+
   def validate_job
+    @initialization_spinner.update(title: "Validating job")
     @job.job_definition.validate!
   end

   def instantiate_connectors
+    @initialization_spinner.update(title: "Initializing connectors")
     @extractor = @job.instantiate_extractor
     @loader = @job.instantiate_loader
   end

   def prepare_job
-
+    @initialization_spinner.update(title: "Preparing job")
     @job_logger.start
     @loader.start
+
+    @initialization_spinner.update(title: "Preparing extraction")
+    @initialization_spinner.auto_spin
     @extractor.prepare
+    @initialization_spinner.success("(#{'successful'.green})")
+    Chronicle::ETL::Logger.info("\n")
   end

   def prepare_ui
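The new `run!` wraps lower-level failures in a runner-level error and moves `finish_job` into an `ensure` clause. `Interrupt` needs its own `rescue` because it descends from `SignalException`, not `StandardError`. A minimal sketch of the wrap-and-ensure shape, using hypothetical stand-in classes (`SketchRunner` and the top-level error classes are not the gem's own):

```ruby
# Stand-in error classes; the real ones live under Chronicle::ETL.
class ExtractionError < StandardError; end
class RunnerError < StandardError; end

class SketchRunner
  attr_reader :events

  def initialize(&work)
    @work = work
    @events = []
  end

  def run!
    @events << :begin
    @work.call
  rescue ExtractionError => e
    # Re-raised errors are not caught by the sibling rescue below.
    raise RunnerError, "Extraction failed. #{e.message}"
  rescue StandardError => e
    raise RunnerError, "Error running job. #{e.message}"
  ensure
    # Runs on success and on every failure path above.
    @events << :finish
  end
end
```

A caller such as the CLI then only has to rescue the one wrapper class, which is what the `rescue RunnerError` added in `jobs.rb` relies on.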
@@ -40,34 +66,34 @@ class Chronicle::ETL::Runner
     Chronicle::ETL::Logger.attach_to_progress_bar(@progress_bar)
   end

-  # TODO: refactor this further
   def run_extraction
     @extractor.extract do |extraction|
-
-        raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
-      end
-
-      transformer = @job.instantiate_transformer(extraction)
-      record = transformer.transform
-
-      Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
-      @job_logger.log_transformation(transformer)
-
-      @loader.load(record) unless @job.dry_run?
-    rescue Chronicle::ETL::TransformationError => e
-      Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
-    ensure
+      process_extraction(extraction)
       @progress_bar.increment
     end

     @progress_bar.finish
+
+    # This is typically a slow method (writing to stdout, writing a big file, etc)
+    # TODO: consider adding a spinner?
     @loader.finish
     @job_logger.finish
-
-
-
-
-
+  end
+
+  def process_extraction(extraction)
+    # For each extraction from our extractor, we create a new transformer
+    transformer = @job.instantiate_transformer(extraction)
+
+    # And then transform that record, logging it if we're in debug log level
+    record = transformer.transform
+    Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
+    @job_logger.log_transformation(transformer)
+
+    # Then send the results to the loader
+    @loader.load(record) unless @job.dry_run?
+  rescue Chronicle::ETL::TransformationError => e
+    # TODO: have an option to cancel job if we encounter an error
+    Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
   end

   def finish_job
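Extracting `process_extraction` scopes the `TransformationError` rescue to a single record, so one bad record is logged and the loop moves on. A self-contained sketch of that per-record pattern. `SketchPipeline` and its string-upcasing "transform" are hypothetical illustrations, not the gem's behaviour.

```ruby
class TransformationError < StandardError; end

class SketchPipeline
  attr_reader :loaded, :failures, :processed

  def initialize
    @loaded = []
    @failures = []
    @processed = 0
  end

  def run(records)
    records.each do |record|
      process(record)
      @processed += 1 # counted whether or not the record transformed cleanly
    end
  end

  private

  def process(record)
    @loaded << transform(record)
  rescue TransformationError => e
    # Log and move on; the rescue covers only this one record.
    @failures << e.message
  end

  def transform(record)
    raise TransformationError, "cannot transform #{record.inspect}" unless record.is_a?(String)

    record.upcase
  end
end
```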
@@ -77,7 +103,7 @@ class Chronicle::ETL::Runner
     Chronicle::ETL::Logger.info(tty_log_completion)
   end

-  def
+  def tty_log_job_initialize
     output = "Beginning job "
     output += "'#{@job.name}'".bold if @job.name
     output
@@ -95,8 +121,9 @@ class Chronicle::ETL::Runner

   def tty_log_completion
     status = @job_logger.success ? 'Success' : 'Failed'
-
-    output
+    job_completion = @job_logger.success ? 'Completed' : 'Partially completed'
+    output = "\n#{job_completion} job"
+    output += " '#{@job.name}'".bold if @job.name
     output += " in #{ChronicDuration.output(@job_logger.duration)}" if @job_logger.duration
     output += "\n Status:\t".light_black + status
     output += "\n Completed:\t".light_black + "#{@job_logger.job_log.num_records_processed}"
data/lib/chronicle/etl/transformers/transformer.rb
CHANGED
@@ -10,6 +10,10 @@ module Chronicle
 # options::
 #   Options for configuring this Transformer
 def initialize(extraction, options = {})
+  unless extraction.is_a?(Chronicle::ETL::Extraction)
+    raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
+  end
+
   @extraction = extraction
   apply_options(options)
 end
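The type check that used to sit in the runner loop now lives in the transformer's constructor, so a transformer cannot be built around anything other than an extraction. A minimal sketch of the guard, with hypothetical stand-ins (`Extraction` as a `Struct`, a top-level `RunnerTypeError`, and `SketchTransformer`) in place of the gem's classes:

```ruby
# Stand-ins for Chronicle::ETL::Extraction and Chronicle::ETL::RunnerTypeError.
Extraction = Struct.new(:data)
class RunnerTypeError < StandardError; end

class SketchTransformer
  attr_reader :extraction, :options

  def initialize(extraction, options = {})
    # Fail fast at construction time, before any transform work starts.
    unless extraction.is_a?(Extraction)
      raise RunnerTypeError, "Extracted should be an Extraction"
    end

    @extraction = extraction
    @options = options
  end
end
```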
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: chronicle-etl
 version: !ruby/object:Gem::Version
-  version: 0.5.
+  version: 0.5.3
 platform: ruby
 authors:
 - Andrew Louis
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2022-
+date: 2022-04-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport