chronicle-etl 0.5.2 → 0.5.3
- checksums.yaml +4 -4
- data/README.md +13 -6
- data/lib/chronicle/etl/cli/cli_base.rb +1 -0
- data/lib/chronicle/etl/cli/jobs.rb +12 -2
- data/lib/chronicle/etl/cli/main.rb +29 -20
- data/lib/chronicle/etl/cli/plugins.rb +1 -1
- data/lib/chronicle/etl/exceptions.rb +3 -0
- data/lib/chronicle/etl/models/base.rb +1 -1
- data/lib/chronicle/etl/models/entity.rb +1 -1
- data/lib/chronicle/etl/runner.rb +51 -24
- data/lib/chronicle/etl/transformers/transformer.rb +4 -0
- data/lib/chronicle/etl/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: d0f305f15f4eda7a5851dfff2155da2c12ee010d4619346a13551f298d5b7991
|
4
|
+
data.tar.gz: d44f82b2bd06521740ad2b0e58cad0db840884fc5616858ef857d78fccb2b5dd
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 99214409831e2799dffe2e3b096e9406222cf571d4e49fe71d3e3c645ad635e73c3aa42cb6af6569431064ce2750dc38ba051122a826b9feb6b21724ebd31db8
|
7
|
+
data.tar.gz: 77a30ecb069906b0e992adbcb1b7470642bcda65b8767dd8ba0145d63e834cb83e2a0a7bc5212edcf8c5028c637b2edbd8476b559f8e28765267516308b160eb
|
data/README.md
CHANGED
@@ -8,19 +8,26 @@ Are you trying to archive your digital history or incorporate it into your own p
|
|
8
8
|
|
9
9
|
If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this project is for you! (*If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues).*)
|
10
10
|
|
11
|
-
**`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a
|
11
|
+
**`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a destination (e.g. a CSV file, JSON, external API).
|
12
12
|
|
13
13
|
## What does `chronicle-etl` give you?
|
14
14
|
* **CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
|
15
15
|
* **Plugins for many third-party providers**. A plugin system allows you to access data from third-party providers and hook it into the shared CLI infrastructure.
|
16
16
|
* **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.
|
17
17
|
|
18
|
+
## Chronicle-ETL in action
|
19
|
+
|
20
|
+

|
21
|
+
|
22
|
+
### Longer screencast
|
23
|
+
|
24
|
+
[](https://asciinema.org/a/483455)
|
25
|
+
|
18
26
|
## Installation
|
19
27
|
|
20
28
|
Using homebrew:
|
21
29
|
```sh
|
22
30
|
$ brew install chronicle-app/etl/chronicle-etl
|
23
|
-
|
24
31
|
```
|
25
32
|
Using rubygems:
|
26
33
|
```sh
|
@@ -42,7 +49,7 @@ $ chronicle-etl help
|
|
42
49
|
$ chronicle-etl --extractor NAME --transformer NAME --loader NAME
|
43
50
|
|
44
51
|
# Read test.csv and display it to stdout as a table
|
45
|
-
$ chronicle-etl --extractor csv --input
|
52
|
+
$ chronicle-etl --extractor csv --input data.csv --loader table
|
46
53
|
|
47
54
|
# Retrieve shell commands run in the last 5 hours
|
48
55
|
$ chronicle-etl -e shell --since 5h
|
@@ -154,12 +161,12 @@ If you don't see a plugin for a third-party provider or data source that you're
|
|
154
161
|
|
155
162
|
| Name | Description | Availability |
|
156
163
|
|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------|----------------------------------|
|
164
|
+
| [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available |
|
165
|
+
| [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |
|
157
166
|
| [imessage](https://github.com/chronicle-app/chronicle-imessage) | iMessage messages and attachments | Available |
|
158
|
-
| [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
|
159
|
-
| [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
|
160
167
|
| [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
|
161
168
|
| [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
|
162
|
-
| [
|
169
|
+
| [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
|
163
170
|
|
164
171
|
#### Coming soon
|
165
172
|
|
data/lib/chronicle/etl/cli/cli_base.rb
CHANGED
@@ -6,6 +6,7 @@ module Chronicle
|
|
6
6
|
no_commands do
|
7
7
|
# Shorthand for cli_exit(status: :failure)
|
8
8
|
def cli_fail(message: nil, exception: nil)
|
9
|
+
message += "\nRe-run the command with --verbose to see details." if Chronicle::ETL::Logger.log_level > Chronicle::ETL::Logger::DEBUG
|
9
10
|
cli_exit(status: :failure, message: message, exception: exception)
|
10
11
|
end
|
11
12
|
|
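Review note on the hunk above: `cli_fail` declares `message: nil` as its default, and the added line calls `+=` on it, which raises `NoMethodError` when no message is passed. Below is a minimal nil-safe sketch, not the gem's code. `failure_message` and `VERBOSE_HINT` are hypothetical names, and the stdlib `Logger` level constants stand in for `Chronicle::ETL::Logger`, on the assumption that its levels are ordered the same way (DEBUG lowest), as the comparison in the diff implies.

```ruby
require 'logger'

# Hypothetical constant holding the hint text added in 0.5.3.
VERBOSE_HINT = "\nRe-run the command with --verbose to see details."

# Hypothetical helper: builds the failure text without mutating or
# assuming a non-nil message. String interpolation turns nil into "".
def failure_message(message, log_level)
  # Only suggest --verbose when we are not already logging at DEBUG.
  return message unless log_level > Logger::DEBUG

  "#{message}#{VERBOSE_HINT}"
end
```

With this shape, a call such as `failure_message(nil, Logger::INFO)` returns just the hint instead of raising.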
data/lib/chronicle/etl/cli/jobs.rb
CHANGED
@@ -43,6 +43,15 @@ module Chronicle
|
|
43
43
|
LONG_DESC
|
44
44
|
# Run an ETL job
|
45
45
|
def start(name = nil)
|
46
|
+
# If someone runs `$ chronicle-etl` with no arguments, show help menu.
|
47
|
+
# TODO: decide if we should check that there's nothing in stdin pipe
|
48
|
+
# in case user wants to actually run this sort of job stdin->null->stdout
|
49
|
+
if name.nil? && options[:extractor].nil?
|
50
|
+
m = Chronicle::ETL::CLI::Main.new
|
51
|
+
m.help
|
52
|
+
cli_exit
|
53
|
+
end
|
54
|
+
|
46
55
|
job_definition = build_job_definition(name, options)
|
47
56
|
|
48
57
|
if job_definition.plugins_missing?
|
@@ -82,11 +91,10 @@ LONG_DESC
|
|
82
91
|
|
83
92
|
if write_config
|
84
93
|
Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
|
85
|
-
cli_exit(message: "Job saved. Run it with `$chronicle-etl jobs:run #{name}`")
|
94
|
+
cli_exit(message: "Job saved. Run it with `$ chronicle-etl jobs:run #{name}`")
|
86
95
|
else
|
87
96
|
cli_fail(message: "\nJob not saved")
|
88
97
|
end
|
89
|
-
|
90
98
|
rescue Chronicle::ETL::JobDefinitionError => e
|
91
99
|
cli_fail(message: "Job definition error", exception: e)
|
92
100
|
end
|
@@ -136,6 +144,8 @@ LONG_DESC
|
|
136
144
|
job = Chronicle::ETL::Job.new(job_definition)
|
137
145
|
runner = Chronicle::ETL::Runner.new(job)
|
138
146
|
runner.run!
|
147
|
+
rescue RunnerError => e
|
148
|
+
cli_fail(message: "#{e.message}", exception: e)
|
139
149
|
end
|
140
150
|
|
141
151
|
# TODO: probably could merge this with something in cli/plugin
|
data/lib/chronicle/etl/cli/main.rb
CHANGED
@@ -54,24 +54,40 @@ module Chronicle
|
|
54
54
|
klass, task = ::Thor::Util.find_class_and_task_by_namespace("#{meth}:#{meth}")
|
55
55
|
klass.start(['-h', task].compact, shell: shell)
|
56
56
|
else
|
57
|
-
shell.say "ABOUT".bold
|
58
|
-
shell.say " #{'chronicle-etl'.italic} is a
|
57
|
+
shell.say "ABOUT:".bold
|
58
|
+
shell.say " #{'chronicle-etl'.italic} is a toolkit for extracting and working with your digital"
|
59
|
+
shell.say " history. 📜"
|
59
60
|
shell.say
|
60
|
-
shell.say "
|
61
|
-
shell.say "
|
61
|
+
shell.say " A job #{'extracts'.underline} personal data from a source, #{'transforms'.underline} it (Chronicle"
|
62
|
+
shell.say " Schema or preserves raw data), and then #{'loads'.underline} it to a destination. Use"
|
63
|
+
shell.say " built-in extractors (json, csv, stdin) and loaders (csv, json, table,"
|
64
|
+
shell.say " rest) or use plugins to connect to third-party services."
|
62
65
|
shell.say
|
63
|
-
shell.say "
|
64
|
-
shell.say " Show available connectors:".italic.light_black
|
65
|
-
shell.say " $ chronicle-etl connectors:list"
|
66
|
+
shell.say " Plugins: https://github.com/chronicle-app/chronicle-etl#currently-available"
|
66
67
|
shell.say
|
67
|
-
shell.say "
|
68
|
-
shell.say "
|
68
|
+
shell.say "USAGE:".bold
|
69
|
+
shell.say " # Basic job usage:".italic.light_black
|
70
|
+
shell.say " $ chronicle-etl --extractor NAME --transformer NAME --loader NAME"
|
69
71
|
shell.say
|
70
|
-
shell.say "
|
72
|
+
shell.say " # Read test.csv and display it to stdout as a table:".italic.light_black
|
73
|
+
shell.say " $ chronicle-etl --extractor csv --input data.csv --loader table"
|
74
|
+
shell.say
|
75
|
+
shell.say " # Show available plugins:".italic.light_black
|
76
|
+
shell.say " $ chronicle-etl plugins:list"
|
77
|
+
shell.say
|
78
|
+
shell.say " # Save an access token as a secret and use it in a job:".italic.light_black
|
79
|
+
shell.say " $ chronicle-etl secrets:set pinboard access_token username:foo123"
|
80
|
+
shell.say " $ chronicle-etl secrets:list"
|
81
|
+
shell.say " $ chronicle-etl -e pinboard --since 1mo"
|
82
|
+
shell.say
|
83
|
+
shell.say " # Show full job options:".italic.light_black
|
71
84
|
shell.say " $ chronicle-etl jobs help run"
|
85
|
+
shell.say
|
86
|
+
shell.say "FULL DOCUMENTATION:".bold
|
87
|
+
shell.say " https://github.com/chronicle-app/chronicle-etl".blue
|
88
|
+
shell.say
|
72
89
|
|
73
90
|
list = []
|
74
|
-
|
75
91
|
::Thor::Util.thor_classes_in(Chronicle::ETL::CLI).each do |thor_class|
|
76
92
|
list += thor_class.printable_tasks(false)
|
77
93
|
end
|
@@ -79,25 +95,18 @@ module Chronicle
|
|
79
95
|
list.unshift ["help", "# This help menu"]
|
80
96
|
|
81
97
|
shell.say
|
82
|
-
shell.say 'ALL COMMANDS'.bold
|
98
|
+
shell.say 'ALL COMMANDS:'.bold
|
83
99
|
shell.print_table(list, indent: 2, truncate: true)
|
84
100
|
shell.say
|
85
|
-
shell.say "VERSION".bold
|
101
|
+
shell.say "VERSION:".bold
|
86
102
|
shell.say " #{Chronicle::ETL::VERSION}"
|
87
103
|
shell.say
|
88
104
|
shell.say " Display current version:".italic.light_black
|
89
105
|
shell.say " $ chronicle-etl --version"
|
90
|
-
shell.say
|
91
|
-
shell.say "FULL DOCUMENTATION".bold
|
92
|
-
shell.say " https://github.com/chronicle-app/chronicle-etl".blue
|
93
|
-
shell.say
|
94
106
|
end
|
95
107
|
end
|
96
108
|
|
97
109
|
no_commands do
|
98
|
-
def testb
|
99
|
-
puts "hi"
|
100
|
-
end
|
101
110
|
def set_color_output
|
102
111
|
String.disable_colorization true if options[:'no-color'] || ENV['NO_COLOR']
|
103
112
|
end
|
data/lib/chronicle/etl/cli/plugins.rb
CHANGED
@@ -61,7 +61,7 @@ module Chronicle
|
|
61
61
|
}
|
62
62
|
end
|
63
63
|
|
64
|
-
headers = ['name', 'description', '
|
64
|
+
headers = ['name', 'description', 'version'].map{ |h| h.to_s.upcase.bold }
|
65
65
|
table = TTY::Table.new(headers, info.map(&:values))
|
66
66
|
puts "Installed plugins:"
|
67
67
|
puts table.render(indent: 2, padding: [0, 0])
|
data/lib/chronicle/etl/models/base.rb
CHANGED
@@ -9,7 +9,7 @@ module Chronicle
|
|
9
9
|
# @todo Experiment with just mixing in ActiveModel instead of this
|
10
10
|
# this reimplementation
|
11
11
|
class Base
|
12
|
-
ATTRIBUTES = [:provider, :provider_id, :lat, :lng, :metadata].freeze
|
12
|
+
ATTRIBUTES = [:provider, :provider_id, :provider_namespace, :lat, :lng, :metadata].freeze
|
13
13
|
ASSOCIATIONS = [].freeze
|
14
14
|
|
15
15
|
attr_accessor(:id, :dedupe_on, *ATTRIBUTES)
|
data/lib/chronicle/etl/runner.rb
CHANGED
@@ -1,5 +1,6 @@
|
|
1
1
|
require 'colorize'
|
2
2
|
require 'chronic_duration'
|
3
|
+
require "tty-spinner"
|
3
4
|
|
4
5
|
class Chronicle::ETL::Runner
|
5
6
|
def initialize(job)
|
@@ -8,30 +9,55 @@ class Chronicle::ETL::Runner
|
|
8
9
|
end
|
9
10
|
|
10
11
|
def run!
|
12
|
+
begin_job
|
11
13
|
validate_job
|
12
14
|
instantiate_connectors
|
13
15
|
prepare_job
|
14
16
|
prepare_ui
|
15
17
|
run_extraction
|
18
|
+
rescue Chronicle::ETL::ExtractionError => e
|
19
|
+
@job_logger&.error
|
20
|
+
raise(Chronicle::ETL::RunnerError, "Extraction failed. #{e.message}")
|
21
|
+
rescue Interrupt
|
22
|
+
@job_logger&.error
|
23
|
+
raise(Chronicle::ETL::RunInterruptedError, "Job interrupted.")
|
24
|
+
rescue StandardError => e
|
25
|
+
# Just throwing this in here until we have better exception handling in
|
26
|
+
# loaders, etc
|
27
|
+
@job_logger&.error
|
28
|
+
raise(Chronicle::ETL::RunnerError, "Error running job. #{e.message}")
|
29
|
+
ensure
|
16
30
|
finish_job
|
17
31
|
end
|
18
32
|
|
19
33
|
private
|
20
34
|
|
35
|
+
def begin_job
|
36
|
+
Chronicle::ETL::Logger.info(tty_log_job_initialize)
|
37
|
+
@initialization_spinner = TTY::Spinner.new(":spinner :title", format: :dots_2)
|
38
|
+
end
|
39
|
+
|
21
40
|
def validate_job
|
41
|
+
@initialization_spinner.update(title: "Validating job")
|
22
42
|
@job.job_definition.validate!
|
23
43
|
end
|
24
44
|
|
25
45
|
def instantiate_connectors
|
46
|
+
@initialization_spinner.update(title: "Initializing connectors")
|
26
47
|
@extractor = @job.instantiate_extractor
|
27
48
|
@loader = @job.instantiate_loader
|
28
49
|
end
|
29
50
|
|
30
51
|
def prepare_job
|
31
|
-
|
52
|
+
@initialization_spinner.update(title: "Preparing job")
|
32
53
|
@job_logger.start
|
33
54
|
@loader.start
|
55
|
+
|
56
|
+
@initialization_spinner.update(title: "Preparing extraction")
|
57
|
+
@initialization_spinner.auto_spin
|
34
58
|
@extractor.prepare
|
59
|
+
@initialization_spinner.success("(#{'successful'.green})")
|
60
|
+
Chronicle::ETL::Logger.info("\n")
|
35
61
|
end
|
36
62
|
|
37
63
|
def prepare_ui
|
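Review note on the `run!` hunk above: the new version wraps lower-level failures in a single runner-level error and moves `finish_job` into `ensure`, so cleanup runs on every path. `Interrupt` gets its own clause because it is not a `StandardError`. The following is a simplified sketch of that wrap-and-ensure shape, not the gem's implementation. `SketchRunner` and the error classes defined here are hypothetical stand-ins for `Chronicle::ETL::Runner`, `Chronicle::ETL::RunnerError`, and `Chronicle::ETL::ExtractionError`.

```ruby
# Hypothetical stand-ins for the gem's exception classes.
class RunnerError < StandardError; end
class ExtractionError < StandardError; end

# Hypothetical runner that records lifecycle events so the control
# flow can be inspected.
class SketchRunner
  attr_reader :events

  def initialize(&work)
    @work = work
    @events = []
  end

  def run!
    @events << :begin
    @work.call
  rescue ExtractionError => e
    # Known failure type: re-raise as a runner-level error with context.
    raise RunnerError, "Extraction failed. #{e.message}"
  rescue StandardError => e
    # Catch-all for anything else raised while the job runs.
    raise RunnerError, "Error running job. #{e.message}"
  ensure
    # Runs whether the job succeeded, failed, or was re-raised above.
    @events << :finish
  end
end
```

An error raised inside a `rescue` clause is not caught by the sibling `rescue StandardError`, so the wrapped `RunnerError` propagates to the caller, which is what lets `jobs.rb` handle it with one `rescue RunnerError`.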
@@ -40,34 +66,34 @@ class Chronicle::ETL::Runner
|
|
40
66
|
Chronicle::ETL::Logger.attach_to_progress_bar(@progress_bar)
|
41
67
|
end
|
42
68
|
|
43
|
-
# TODO: refactor this further
|
44
69
|
def run_extraction
|
45
70
|
@extractor.extract do |extraction|
|
46
|
-
|
47
|
-
raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
|
48
|
-
end
|
49
|
-
|
50
|
-
transformer = @job.instantiate_transformer(extraction)
|
51
|
-
record = transformer.transform
|
52
|
-
|
53
|
-
Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
|
54
|
-
@job_logger.log_transformation(transformer)
|
55
|
-
|
56
|
-
@loader.load(record) unless @job.dry_run?
|
57
|
-
rescue Chronicle::ETL::TransformationError => e
|
58
|
-
Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
|
59
|
-
ensure
|
71
|
+
process_extraction(extraction)
|
60
72
|
@progress_bar.increment
|
61
73
|
end
|
62
74
|
|
63
75
|
@progress_bar.finish
|
76
|
+
|
77
|
+
# This is typically a slow method (writing to stdout, writing a big file, etc)
|
78
|
+
# TODO: consider adding a spinner?
|
64
79
|
@loader.finish
|
65
80
|
@job_logger.finish
|
66
|
-
|
67
|
-
|
68
|
-
|
69
|
-
|
70
|
-
|
81
|
+
end
|
82
|
+
|
83
|
+
def process_extraction(extraction)
|
84
|
+
# For each extraction from our extractor, we create a new transformer
|
85
|
+
transformer = @job.instantiate_transformer(extraction)
|
86
|
+
|
87
|
+
# And then transform that record, logging it if we're in debug log level
|
88
|
+
record = transformer.transform
|
89
|
+
Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
|
90
|
+
@job_logger.log_transformation(transformer)
|
91
|
+
|
92
|
+
# Then send the results to the loader
|
93
|
+
@loader.load(record) unless @job.dry_run?
|
94
|
+
rescue Chronicle::ETL::TransformationError => e
|
95
|
+
# TODO: have an option to cancel job if we encounter an error
|
96
|
+
Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
|
71
97
|
end
|
72
98
|
|
73
99
|
def finish_job
|
@@ -77,7 +103,7 @@ class Chronicle::ETL::Runner
|
|
77
103
|
Chronicle::ETL::Logger.info(tty_log_completion)
|
78
104
|
end
|
79
105
|
|
80
|
-
def
|
106
|
+
def tty_log_job_initialize
|
81
107
|
output = "Beginning job "
|
82
108
|
output += "'#{@job.name}'".bold if @job.name
|
83
109
|
output
|
@@ -95,8 +121,9 @@ class Chronicle::ETL::Runner
|
|
95
121
|
|
96
122
|
def tty_log_completion
|
97
123
|
status = @job_logger.success ? 'Success' : 'Failed'
|
98
|
-
|
99
|
-
output
|
124
|
+
job_completion = @job_logger.success ? 'Completed' : 'Partially completed'
|
125
|
+
output = "\n#{job_completion} job"
|
126
|
+
output += " '#{@job.name}'".bold if @job.name
|
100
127
|
output += " in #{ChronicDuration.output(@job_logger.duration)}" if @job_logger.duration
|
101
128
|
output += "\n Status:\t".light_black + status
|
102
129
|
output += "\n Completed:\t".light_black + "#{@job_logger.job_log.num_records_processed}"
|
data/lib/chronicle/etl/transformers/transformer.rb
CHANGED
@@ -10,6 +10,10 @@ module Chronicle
|
|
10
10
|
# options::
|
11
11
|
# Options for configuring this Transformer
|
12
12
|
def initialize(extraction, options = {})
|
13
|
+
unless extraction.is_a?(Chronicle::ETL::Extraction)
|
14
|
+
raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
|
15
|
+
end
|
16
|
+
|
13
17
|
@extraction = extraction
|
14
18
|
apply_options(options)
|
15
19
|
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: chronicle-etl
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.5.
|
4
|
+
version: 0.5.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Andrew Louis
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2022-
|
11
|
+
date: 2022-04-04 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: activesupport
|