chronicle-etl 0.5.0 → 0.5.3
- checksums.yaml +4 -4
- data/README.md +45 -10
- data/lib/chronicle/etl/cli/cli_base.rb +1 -0
- data/lib/chronicle/etl/cli/jobs.rb +41 -13
- data/lib/chronicle/etl/cli/main.rb +29 -20
- data/lib/chronicle/etl/cli/plugins.rb +1 -1
- data/lib/chronicle/etl/config.rb +6 -0
- data/lib/chronicle/etl/exceptions.rb +3 -0
- data/lib/chronicle/etl/loaders/csv_loader.rb +10 -13
- data/lib/chronicle/etl/loaders/helpers/stdout_helper.rb +36 -0
- data/lib/chronicle/etl/loaders/json_loader.rb +43 -8
- data/lib/chronicle/etl/loaders/loader.rb +1 -0
- data/lib/chronicle/etl/models/base.rb +1 -1
- data/lib/chronicle/etl/models/entity.rb +1 -1
- data/lib/chronicle/etl/runner.rb +51 -24
- data/lib/chronicle/etl/transformers/transformer.rb +4 -0
- data/lib/chronicle/etl/version.rb +1 -1
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: d0f305f15f4eda7a5851dfff2155da2c12ee010d4619346a13551f298d5b7991
|
4
|
+
data.tar.gz: d44f82b2bd06521740ad2b0e58cad0db840884fc5616858ef857d78fccb2b5dd
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 99214409831e2799dffe2e3b096e9406222cf571d4e49fe71d3e3c645ad635e73c3aa42cb6af6569431064ce2750dc38ba051122a826b9feb6b21724ebd31db8
|
7
|
+
data.tar.gz: 77a30ecb069906b0e992adbcb1b7470642bcda65b8767dd8ba0145d63e834cb83e2a0a7bc5212edcf8c5028c637b2edbd8476b559f8e28765267516308b160eb
|
data/README.md
CHANGED
@@ -8,20 +8,36 @@ Are you trying to archive your digital history or incorporate it into your own p
|
|
8
8
|
|
9
9
|
If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this project is for you! (*If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues).*)
|
10
10
|
|
11
|
-
**`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a
|
11
|
+
**`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a destination (e.g. a CSV file, JSON, external API).
|
12
12
|
|
13
13
|
## What does `chronicle-etl` give you?
|
14
14
|
* **CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
|
15
15
|
* **Plugins for many third-party providers**. A plugin system allows you to access data from third-party providers and hook it into the shared CLI infrastructure.
|
16
16
|
* **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.
|
17
17
|
|
18
|
+
## Chronicle-ETL in action
|
19
|
+
|
20
|
+
![demo](https://user-images.githubusercontent.com/6291/161410839-b5ce931a-2353-4585-b530-929f46e3f960.svg)
|
21
|
+
|
22
|
+
### Longer screencast
|
23
|
+
|
24
|
+
[![asciicast](https://asciinema.org/a/483455.svg)](https://asciinema.org/a/483455)
|
25
|
+
|
18
26
|
## Installation
|
27
|
+
|
28
|
+
Using homebrew:
|
19
29
|
```sh
|
20
|
-
|
21
|
-
|
30
|
+
$ brew install chronicle-app/etl/chronicle-etl
|
31
|
+
```
|
32
|
+
Using rubygems:
|
33
|
+
```sh
|
34
|
+
$ gem install chronicle-etl
|
22
35
|
```
|
23
36
|
|
24
|
-
|
37
|
+
Confirm it installed successfully:
|
38
|
+
```sh
|
39
|
+
$ chronicle-etl --version
|
40
|
+
```
|
25
41
|
|
26
42
|
## Basic usage and running jobs
|
27
43
|
|
@@ -33,7 +49,7 @@ $ chronicle-etl help
|
|
33
49
|
$ chronicle-etl --extractor NAME --transformer NAME --loader NAME
|
34
50
|
|
35
51
|
# Read test.csv and display it to stdout as a table
|
36
|
-
$ chronicle-etl --extractor csv --input
|
52
|
+
$ chronicle-etl --extractor csv --input data.csv --loader table
|
37
53
|
|
38
54
|
# Retrieve shell commands run in the last 5 hours
|
39
55
|
$ chronicle-etl -e shell --since 5h
|
@@ -50,7 +66,6 @@ $ chronicle-etl -e pinboard --since 1mo # Used automatically based on plugin nam
|
|
50
66
|
### Common options
|
51
67
|
```sh
|
52
68
|
Options:
|
53
|
-
-j, [--name=NAME] # Job configuration name
|
54
69
|
-e, [--extractor=NAME] # Extractor class. Default: stdin
|
55
70
|
[--extractor-opts=key:value] # Extractor options
|
56
71
|
-t, [--transformer=NAME] # Transformer class. Default: null
|
@@ -71,6 +86,26 @@ Options:
|
|
71
86
|
[--silent], [--no-silent] # Silence all output
|
72
87
|
```
|
73
88
|
|
89
|
+
### Saving jobs
|
90
|
+
|
91
|
+
You can save details about a job to a local config file (saved by default in `~/.config/chronicle/etl/jobs/job_name.yml`) to save yourself the trouble of setting the CLI flags for each run.
|
92
|
+
|
93
|
+
```sh
|
94
|
+
# Save a job named 'sample' to ~/.config/chronicle/etl/jobs/sample.yml
|
95
|
+
$ chronicle-etl jobs:save sample --extractor pinboard --since 10d
|
96
|
+
|
97
|
+
# Show details about the job
|
98
|
+
$ chronicle-etl jobs:show sample
|
99
|
+
|
100
|
+
# Run the job
|
101
|
+
$ chronicle-etl jobs:run sample
|
102
|
+
# Or more simply:
|
103
|
+
$ chronicle-etl sample
|
104
|
+
|
105
|
+
# Show all saved jobs
|
106
|
+
$ chronicle-etl jobs:list
|
107
|
+
```
|
108
|
+
|
74
109
|
## Connectors
|
75
110
|
Connectors are available to read, process, and load data from different formats or external services.
|
76
111
|
|
@@ -97,7 +132,7 @@ $ chronicle-etl connectors:list
|
|
97
132
|
- [`rest`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/rest_loader.rb) - Serialize records with [JSONAPI](https://jsonapi.org/) and send to a REST API
|
98
133
|
|
99
134
|
## Chronicle Plugins
|
100
|
-
Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through
|
135
|
+
Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through the CLI (which installs the Gems under the hood).
|
101
136
|
|
102
137
|
### Plugin usage
|
103
138
|
|
@@ -126,11 +161,12 @@ If you don't see a plugin for a third-party provider or data source that you're
|
|
126
161
|
|
127
162
|
| Name | Description | Availability |
|
128
163
|
|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------|----------------------------------|
|
164
|
+
| [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available |
|
165
|
+
| [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |
|
129
166
|
| [imessage](https://github.com/chronicle-app/chronicle-imessage) | iMessage messages and attachments | Available |
|
130
|
-
| [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
|
131
|
-
| [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
|
132
167
|
| [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
|
133
168
|
| [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
|
169
|
+
| [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
|
134
170
|
|
135
171
|
#### Coming soon
|
136
172
|
|
@@ -206,7 +242,6 @@ $ chronicle-etl secrets:unset pinboard access_token
|
|
206
242
|
|
207
243
|
## Roadmap
|
208
244
|
|
209
|
-
- Add **homebrew formula** for easier installation. #13
|
210
245
|
- Keep tackling **new plugins**. See: [Chronicle Plugin Tracker](https://github.com/orgs/chronicle-app/projects/1)
|
211
246
|
- Add support for **incremental extractions** #37
|
212
247
|
- **Improve stdin extractor and shell command transformer** (#5) so that users can easily integrate their own scripts/tools into jobs
|
data/lib/chronicle/etl/cli/cli_base.rb
CHANGED
@@ -6,6 +6,7 @@ module Chronicle
|
|
6
6
|
no_commands do
|
7
7
|
# Shorthand for cli_exit(status: :failure)
|
8
8
|
def cli_fail(message: nil, exception: nil)
|
9
|
+
message += "\nRe-run the command with --verbose to see details." if Chronicle::ETL::Logger.log_level > Chronicle::ETL::Logger::DEBUG
|
9
10
|
cli_exit(status: :failure, message: message, exception: exception)
|
10
11
|
end
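Because `message` defaults to `nil` in `cli_fail`, the `+=` on the added line would raise `NoMethodError` whenever the method is called without a message while the log level is above debug. A nil-tolerant variant might look like the sketch below. The helper name `with_verbose_hint` and its `verbose:` flag are hypothetical stand-ins for the `Logger.log_level` comparison and are not part of the gem.

```ruby
# Hypothetical helper (not in chronicle-etl): append the --verbose hint
# without assuming that a message was given.
VERBOSE_HINT = "Re-run the command with --verbose to see details."

def with_verbose_hint(message, verbose:)
  # In verbose mode the details are already shown, so leave the message alone.
  return message if verbose

  # Array#compact drops a nil message, so we never call + on nil.
  [message, VERBOSE_HINT].compact.join("\n")
end

puts with_verbose_hint("Error running job.", verbose: false)
```

Joining an array instead of using `+=` also avoids mutating the caller's string.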
|
11
12
|
|
data/lib/chronicle/etl/cli/jobs.rb
CHANGED
@@ -9,8 +9,6 @@ module Chronicle
|
|
9
9
|
default_task "start"
|
10
10
|
namespace :jobs
|
11
11
|
|
12
|
-
class_option :name, aliases: '-j', desc: 'Job configuration name'
|
13
|
-
|
14
12
|
class_option :extractor, aliases: '-e', desc: "Extractor class. Default: stdin", banner: 'NAME'
|
15
13
|
class_option :'extractor-opts', desc: 'Extractor options', type: :hash, default: {}
|
16
14
|
class_option :transformer, aliases: '-t', desc: 'Transformer class. Default: null', banner: 'NAME'
|
@@ -44,8 +42,17 @@ module Chronicle
|
|
44
42
|
If you do not want to use the command line flags, you can also configure a job with a .yml config file. You can either specify the path to this file or use the filename and place the file in ~/.config/chronicle/etl/jobs/NAME.yml and call it with `--job NAME`
|
45
43
|
LONG_DESC
|
46
44
|
# Run an ETL job
|
47
|
-
def start
|
48
|
-
|
45
|
+
def start(name = nil)
|
46
|
+
# If someone runs `$ chronicle-etl` with no arguments, show help menu.
|
47
|
+
# TODO: decide if we should check that there's nothing in stdin pipe
|
48
|
+
# in case user wants to actually run this sort of job stdin->null->stdout
|
49
|
+
if name.nil? && options[:extractor].nil?
|
50
|
+
m = Chronicle::ETL::CLI::Main.new
|
51
|
+
m.help
|
52
|
+
cli_exit
|
53
|
+
end
|
54
|
+
|
55
|
+
job_definition = build_job_definition(name, options)
|
49
56
|
|
50
57
|
if job_definition.plugins_missing?
|
51
58
|
missing_plugins = job_definition.errors[:plugins]
|
@@ -59,26 +66,43 @@ LONG_DESC
|
|
59
66
|
rescue Chronicle::ETL::JobDefinitionError => e
|
60
67
|
message = ""
|
61
68
|
job_definition.errors.each_pair do |category, errors|
|
62
|
-
message << "Problem with #{category}:\n - #{errors.map(&:to_s).join("\n -")}"
|
69
|
+
message << "Problem with #{category}:\n - #{errors.map(&:to_s).join("\n - ")}"
|
63
70
|
end
|
64
71
|
cli_fail(message: "Error running job.\n#{message}", exception: e)
|
65
72
|
end
|
66
73
|
|
67
|
-
|
74
|
+
option :'skip-confirmation', aliases: '-y', type: :boolean
|
75
|
+
desc "save", "Save a job"
|
68
76
|
# Create an ETL job
|
69
|
-
def
|
70
|
-
|
77
|
+
def save(name)
|
78
|
+
write_config = true
|
79
|
+
job_definition = build_job_definition(name, options)
|
71
80
|
job_definition.validate!
|
72
81
|
|
73
|
-
Chronicle::ETL::Config.
|
82
|
+
if Chronicle::ETL::Config.exists?("jobs", name) && !options[:'skip-confirmation']
|
83
|
+
prompt = TTY::Prompt.new
|
84
|
+
write_config = false
|
85
|
+
message = "Job '#{name}' exists already. Overwrite it?"
|
86
|
+
begin
|
87
|
+
write_config = prompt.yes?(message)
|
88
|
+
rescue TTY::Reader::InputInterrupt
|
89
|
+
end
|
90
|
+
end
|
91
|
+
|
92
|
+
if write_config
|
93
|
+
Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
|
94
|
+
cli_exit(message: "Job saved. Run it with `$ chronicle-etl jobs:run #{name}`")
|
95
|
+
else
|
96
|
+
cli_fail(message: "\nJob not saved")
|
97
|
+
end
|
74
98
|
rescue Chronicle::ETL::JobDefinitionError => e
|
75
99
|
cli_fail(message: "Job definition error", exception: e)
|
76
100
|
end
|
77
101
|
|
78
102
|
desc "show", "Show details about a job"
|
79
103
|
# Show an ETL job
|
80
|
-
def show
|
81
|
-
job_definition = build_job_definition(options)
|
104
|
+
def show(name = nil)
|
105
|
+
job_definition = build_job_definition(name, options)
|
82
106
|
job_definition.validate!
|
83
107
|
puts Chronicle::ETL::Job.new(job_definition)
|
84
108
|
rescue Chronicle::ETL::JobDefinitionError => e
|
@@ -112,12 +136,16 @@ LONG_DESC
|
|
112
136
|
private
|
113
137
|
|
114
138
|
def run_job(job_definition)
|
139
|
+
# FIXME: have to validate here so next method can work. This is clumsy
|
140
|
+
job_definition.validate!
|
115
141
|
# FIXME: clumsy to make CLI responsible for setting secrets here. Think about a better way to do this
|
116
142
|
job_definition.apply_default_secrets
|
117
143
|
|
118
144
|
job = Chronicle::ETL::Job.new(job_definition)
|
119
145
|
runner = Chronicle::ETL::Runner.new(job)
|
120
146
|
runner.run!
|
147
|
+
rescue RunnerError => e
|
148
|
+
cli_fail(message: "#{e.message}", exception: e)
|
121
149
|
end
|
122
150
|
|
123
151
|
# TODO: probably could merge this with something in cli/plugin
|
@@ -134,9 +162,9 @@ LONG_DESC
|
|
134
162
|
end
|
135
163
|
|
136
164
|
# Create job definition by reading config file and then overwriting with flag options
|
137
|
-
def build_job_definition(options)
|
165
|
+
def build_job_definition(name, options)
|
138
166
|
definition = Chronicle::ETL::JobDefinition.new
|
139
|
-
definition.add_config(load_job_config(
|
167
|
+
definition.add_config(load_job_config(name))
|
140
168
|
definition.add_config(process_flag_options(options).transform_keys(&:to_sym))
|
141
169
|
definition
|
142
170
|
end
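`build_job_definition` layers two sources: the saved job file first, then the command-line flags, which override it. The sketch below illustrates that precedence using only the standard library. The helpers `load_saved_job` and `merged_definition` are hypothetical, the merge is shallow, and the real `JobDefinition#add_config` may combine nested options differently.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: read DIR/NAME.yml if it exists, otherwise return {}.
def load_saved_job(dir, name)
  return {} if name.nil?

  path = File.join(dir, "#{name}.yml")
  return {} unless File.exist?(path)

  # safe_load returns nil for an empty file, hence the fallback.
  YAML.safe_load(File.read(path), symbolize_names: true) || {}
end

# Flags given on the command line win over values stored in the file.
def merged_definition(dir, name, flag_options)
  load_saved_job(dir, name).merge(flag_options.transform_keys(&:to_sym))
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "sample.yml"), { "extractor" => "pinboard", "since" => "10d" }.to_yaml)
  p merged_definition(dir, "sample", { "since" => "5h" })
end
```

Passing `nil` as the name skips the file entirely, which corresponds to a job described by flags alone.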
|
data/lib/chronicle/etl/cli/main.rb
CHANGED
@@ -54,24 +54,40 @@ module Chronicle
|
|
54
54
|
klass, task = ::Thor::Util.find_class_and_task_by_namespace("#{meth}:#{meth}")
|
55
55
|
klass.start(['-h', task].compact, shell: shell)
|
56
56
|
else
|
57
|
-
shell.say "ABOUT".bold
|
58
|
-
shell.say " #{'chronicle-etl'.italic} is a
|
57
|
+
shell.say "ABOUT:".bold
|
58
|
+
shell.say " #{'chronicle-etl'.italic} is a toolkit for extracting and working with your digital"
|
59
|
+
shell.say " history. 📜"
|
59
60
|
shell.say
|
60
|
-
shell.say "
|
61
|
-
shell.say "
|
61
|
+
shell.say " A job #{'extracts'.underline} personal data from a source, #{'transforms'.underline} it (Chronicle"
|
62
|
+
shell.say " Schema or preserves raw data), and then #{'loads'.underline} it to a destination. Use"
|
63
|
+
shell.say " built-in extractors (json, csv, stdin) and loaders (csv, json, table,"
|
64
|
+
shell.say " rest) or use plugins to connect to third-party services."
|
62
65
|
shell.say
|
63
|
-
shell.say "
|
64
|
-
shell.say " Show available connectors:".italic.light_black
|
65
|
-
shell.say " $ chronicle-etl connectors:list"
|
66
|
+
shell.say " Plugins: https://github.com/chronicle-app/chronicle-etl#currently-available"
|
66
67
|
shell.say
|
67
|
-
shell.say "
|
68
|
-
shell.say "
|
68
|
+
shell.say "USAGE:".bold
|
69
|
+
shell.say " # Basic job usage:".italic.light_black
|
70
|
+
shell.say " $ chronicle-etl --extractor NAME --transformer NAME --loader NAME"
|
69
71
|
shell.say
|
70
|
-
shell.say "
|
72
|
+
shell.say " # Read test.csv and display it to stdout as a table:".italic.light_black
|
73
|
+
shell.say " $ chronicle-etl --extractor csv --input data.csv --loader table"
|
74
|
+
shell.say
|
75
|
+
shell.say " # Show available plugins:".italic.light_black
|
76
|
+
shell.say " $ chronicle-etl plugins:list"
|
77
|
+
shell.say
|
78
|
+
shell.say " # Save an access token as a secret and use it in a job:".italic.light_black
|
79
|
+
shell.say " $ chronicle-etl secrets:set pinboard access_token username:foo123"
|
80
|
+
shell.say " $ chronicle-etl secrets:list"
|
81
|
+
shell.say " $ chronicle-etl -e pinboard --since 1mo"
|
82
|
+
shell.say
|
83
|
+
shell.say " # Show full job options:".italic.light_black
|
71
84
|
shell.say " $ chronicle-etl jobs help run"
|
85
|
+
shell.say
|
86
|
+
shell.say "FULL DOCUMENTATION:".bold
|
87
|
+
shell.say " https://github.com/chronicle-app/chronicle-etl".blue
|
88
|
+
shell.say
|
72
89
|
|
73
90
|
list = []
|
74
|
-
|
75
91
|
::Thor::Util.thor_classes_in(Chronicle::ETL::CLI).each do |thor_class|
|
76
92
|
list += thor_class.printable_tasks(false)
|
77
93
|
end
|
@@ -79,25 +95,18 @@ module Chronicle
|
|
79
95
|
list.unshift ["help", "# This help menu"]
|
80
96
|
|
81
97
|
shell.say
|
82
|
-
shell.say 'ALL COMMANDS'.bold
|
98
|
+
shell.say 'ALL COMMANDS:'.bold
|
83
99
|
shell.print_table(list, indent: 2, truncate: true)
|
84
100
|
shell.say
|
85
|
-
shell.say "VERSION".bold
|
101
|
+
shell.say "VERSION:".bold
|
86
102
|
shell.say " #{Chronicle::ETL::VERSION}"
|
87
103
|
shell.say
|
88
104
|
shell.say " Display current version:".italic.light_black
|
89
105
|
shell.say " $ chronicle-etl --version"
|
90
|
-
shell.say
|
91
|
-
shell.say "FULL DOCUMENTATION".bold
|
92
|
-
shell.say " https://github.com/chronicle-app/chronicle-etl".blue
|
93
|
-
shell.say
|
94
106
|
end
|
95
107
|
end
|
96
108
|
|
97
109
|
no_commands do
|
98
|
-
def testb
|
99
|
-
puts "hi"
|
100
|
-
end
|
101
110
|
def set_color_output
|
102
111
|
String.disable_colorization true if options[:'no-color'] || ENV['NO_COLOR']
|
103
112
|
end
|
data/lib/chronicle/etl/cli/plugins.rb
CHANGED
@@ -61,7 +61,7 @@ module Chronicle
|
|
61
61
|
}
|
62
62
|
end
|
63
63
|
|
64
|
-
headers = ['name', 'description', '
|
64
|
+
headers = ['name', 'description', 'version'].map{ |h| h.to_s.upcase.bold }
|
65
65
|
table = TTY::Table.new(headers, info.map(&:values))
|
66
66
|
puts "Installed plugins:"
|
67
67
|
puts table.render(indent: 2, padding: [0, 0])
|
data/lib/chronicle/etl/config.rb
CHANGED
@@ -28,6 +28,12 @@ module Chronicle
|
|
28
28
|
end
|
29
29
|
end
|
30
30
|
|
31
|
+
def exists?(type, identifier)
|
32
|
+
base = config_pathname_for_type(type)
|
33
|
+
path = base.join("#{identifier}.yml")
|
34
|
+
return path.exist?
|
35
|
+
end
|
36
|
+
|
31
37
|
# Returns all jobs available in ~/.config/chronicle/etl/jobs/*.yml
|
32
38
|
def available_jobs
|
33
39
|
Dir.glob(File.join(config_pathname_for_type("jobs"), "*.yml")).map do |filename|
|
data/lib/chronicle/etl/loaders/csv_loader.rb
CHANGED
@@ -3,11 +3,13 @@ require 'csv'
|
|
3
3
|
module Chronicle
|
4
4
|
module ETL
|
5
5
|
class CSVLoader < Chronicle::ETL::Loader
|
6
|
+
include Chronicle::ETL::Loaders::Helpers::StdoutHelper
|
7
|
+
|
6
8
|
register_connector do |r|
|
7
9
|
r.description = 'CSV'
|
8
10
|
end
|
9
11
|
|
10
|
-
setting :output
|
12
|
+
setting :output
|
11
13
|
setting :headers, default: true
|
12
14
|
setting :header_row, default: true
|
13
15
|
|
@@ -30,16 +32,7 @@ module Chronicle
|
|
30
32
|
csv_options[:headers] = headers
|
31
33
|
end
|
32
34
|
|
33
|
-
|
34
|
-
# This might seem like a duplication of the default value ($stdout)
|
35
|
-
# but it's because rspec overwrites $stdout (in helper #capture) to
|
36
|
-
# capture output.
|
37
|
-
io = $stdout.dup
|
38
|
-
else
|
39
|
-
io = File.open(@config.output, "w+")
|
40
|
-
end
|
41
|
-
|
42
|
-
output = CSV.generate(**csv_options) do |csv|
|
35
|
+
csv_output = CSV.generate(**csv_options) do |csv|
|
43
36
|
records.each do |record|
|
44
37
|
csv << record
|
45
38
|
.transform_keys(&:to_sym)
|
@@ -48,8 +41,12 @@ module Chronicle
|
|
48
41
|
end
|
49
42
|
end
|
50
43
|
|
51
|
-
io
|
52
|
-
|
44
|
+
# TODO: just write to io directly
|
45
|
+
if output_to_stdout?
|
46
|
+
write_to_stdout(csv_output)
|
47
|
+
else
|
48
|
+
File.write(@config.output, csv_output)
|
49
|
+
end
|
53
50
|
end
|
54
51
|
end
|
55
52
|
end
|
data/lib/chronicle/etl/loaders/helpers/stdout_helper.rb
ADDED
@@ -0,0 +1,36 @@
|
|
1
|
+
require 'tempfile'
|
2
|
+
|
3
|
+
module Chronicle
|
4
|
+
module ETL
|
5
|
+
module Loaders
|
6
|
+
module Helpers
|
7
|
+
module StdoutHelper
|
8
|
+
# TODO: let users use "stdout" as an option for the `output` setting
|
9
|
+
# Assume we're using stdout if no output is specified
|
10
|
+
def output_to_stdout?
|
11
|
+
!@config.output
|
12
|
+
end
|
13
|
+
|
14
|
+
def create_stdout_temp_file
|
15
|
+
file = Tempfile.new('chronicle-stdout')
|
16
|
+
file.unlink
|
17
|
+
file
|
18
|
+
end
|
19
|
+
|
20
|
+
def write_to_stdout_from_temp_file(file)
|
21
|
+
file.rewind
|
22
|
+
write_to_stdout(file.read)
|
23
|
+
end
|
24
|
+
|
25
|
+
def write_to_stdout(output)
|
26
|
+
# We .dup because rspec overwrites $stdout (in helper #capture) to
|
27
|
+
# capture output.
|
28
|
+
stdout = $stdout.dup
|
29
|
+
stdout.write(output)
|
30
|
+
stdout.flush
|
31
|
+
end
|
32
|
+
end
|
33
|
+
end
|
34
|
+
end
|
35
|
+
end
|
36
|
+
end
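The helper above buffers output in a temporary file that is unlinked straight away, then copies the buffer to stdout in one pass. A minimal sketch of that pattern, assuming a POSIX system where an unlinked file stays readable through its open handle, is shown below. The method `buffer_then_emit` and its `io:` parameter are hypothetical and exist only to make the example testable.

```ruby
require 'tempfile'
require 'stringio'

# Hypothetical illustration of the buffer-then-flush pattern.
def buffer_then_emit(lines, io: $stdout)
  file = Tempfile.new('chronicle-stdout')
  file.unlink                           # drop the directory entry; the handle stays usable
  lines.each { |line| file.puts(line) } # accumulate output in the temp file
  file.rewind                           # go back to the start before reading
  io.write(file.read)
  io.flush
ensure
  file&.close
end

buffer_then_emit(["first record", "second record"])
```

Unlinking early means the operating system reclaims the file even if the process dies before `close` is reached.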
|
data/lib/chronicle/etl/loaders/json_loader.rb
CHANGED
@@ -1,19 +1,35 @@
|
|
1
|
+
require 'tempfile'
|
2
|
+
|
1
3
|
module Chronicle
|
2
4
|
module ETL
|
3
5
|
class JSONLoader < Chronicle::ETL::Loader
|
6
|
+
include Chronicle::ETL::Loaders::Helpers::StdoutHelper
|
7
|
+
|
4
8
|
register_connector do |r|
|
5
9
|
r.description = 'json'
|
6
10
|
end
|
7
11
|
|
8
12
|
setting :serializer
|
9
|
-
setting :output
|
13
|
+
setting :output
|
14
|
+
|
15
|
+
# If true, one JSON record per line. If false, output a single json
|
16
|
+
# object with an array of records
|
17
|
+
setting :line_separated, default: true, type: :boolean
|
18
|
+
|
19
|
+
def initialize(*args)
|
20
|
+
super
|
21
|
+
@first_line = true
|
22
|
+
end
|
10
23
|
|
11
24
|
def start
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
25
|
+
@output_file =
|
26
|
+
if output_to_stdout?
|
27
|
+
create_stdout_temp_file
|
28
|
+
else
|
29
|
+
File.open(@config.output, "w+")
|
30
|
+
end
|
31
|
+
|
32
|
+
@output_file.puts("[\n") unless @config.line_separated
|
17
33
|
end
|
18
34
|
|
19
35
|
def load(record)
|
@@ -27,15 +43,34 @@ module Chronicle
|
|
27
43
|
|
28
44
|
force_utf8(value)
|
29
45
|
end
|
30
|
-
|
46
|
+
|
47
|
+
line = encoded.to_json
|
48
|
+
# For line-separated output, we just put json + newline
|
49
|
+
if @config.line_separated
|
50
|
+
line = "#{line}\n"
|
51
|
+
# Otherwise, we add a comma and newline and then add record to the
|
52
|
+
# array we created in #start (unless it's the first line).
|
53
|
+
else
|
54
|
+
line = ",\n#{line}" unless @first_line
|
55
|
+
end
|
56
|
+
|
57
|
+
@output_file.write(line)
|
58
|
+
|
59
|
+
@first_line = false
|
31
60
|
end
|
32
61
|
|
33
62
|
def finish
|
34
|
-
|
63
|
+
# Close the array unless we're doing line-separated JSON
|
64
|
+
@output_file.puts("\n]") unless @config.line_separated
|
65
|
+
|
66
|
+
write_to_stdout_from_temp_file(@output_file) if output_to_stdout?
|
67
|
+
|
68
|
+
@output_file.close
|
35
69
|
end
|
36
70
|
|
37
71
|
private
|
38
72
|
|
73
|
+
# TODO: implement this
|
39
74
|
def serializer
|
40
75
|
@config.serializer || Chronicle::ETL::RawSerializer
|
41
76
|
end
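The `line_separated` setting selects between two layouts: one JSON document per line, or a single JSON array whose records are joined with a comma and newline. The sketch below reproduces both layouts outside the loader. The function `write_records` is hypothetical and writes to any IO object rather than to the loader's `@output_file`.

```ruby
require 'json'
require 'stringio'

# Hypothetical function mirroring the two layouts of `line_separated`.
def write_records(records, io, line_separated: true)
  io.puts("[") unless line_separated      # open the array
  first = true
  records.each do |record|
    line = record.to_json
    if line_separated
      line = "#{line}\n"                  # one document per line
    elsif !first
      line = ",\n#{line}"                 # separator goes before every record but the first
    end
    io.write(line)
    first = false
  end
  io.puts("\n]") unless line_separated    # close the array
end

write_records([{ id: 1 }, { id: 2 }], $stdout, line_separated: false)
```

Writing the separator before each record rather than after it avoids a trailing comma, which would make the array invalid JSON.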
|
data/lib/chronicle/etl/models/base.rb
CHANGED
@@ -9,7 +9,7 @@ module Chronicle
|
|
9
9
|
# @todo Experiment with just mixing in ActiveModel instead of this
|
10
10
|
# this reimplementation
|
11
11
|
class Base
|
12
|
-
ATTRIBUTES = [:provider, :provider_id, :lat, :lng, :metadata].freeze
|
12
|
+
ATTRIBUTES = [:provider, :provider_id, :provider_namespace, :lat, :lng, :metadata].freeze
|
13
13
|
ASSOCIATIONS = [].freeze
|
14
14
|
|
15
15
|
attr_accessor(:id, :dedupe_on, *ATTRIBUTES)
|
data/lib/chronicle/etl/runner.rb
CHANGED
@@ -1,5 +1,6 @@
|
|
1
1
|
require 'colorize'
|
2
2
|
require 'chronic_duration'
|
3
|
+
require "tty-spinner"
|
3
4
|
|
4
5
|
class Chronicle::ETL::Runner
|
5
6
|
def initialize(job)
|
@@ -8,30 +9,55 @@ class Chronicle::ETL::Runner
|
|
8
9
|
end
|
9
10
|
|
10
11
|
def run!
|
12
|
+
begin_job
|
11
13
|
validate_job
|
12
14
|
instantiate_connectors
|
13
15
|
prepare_job
|
14
16
|
prepare_ui
|
15
17
|
run_extraction
|
18
|
+
rescue Chronicle::ETL::ExtractionError => e
|
19
|
+
@job_logger&.error
|
20
|
+
raise(Chronicle::ETL::RunnerError, "Extraction failed. #{e.message}")
|
21
|
+
rescue Interrupt
|
22
|
+
@job_logger&.error
|
23
|
+
raise(Chronicle::ETL::RunInterruptedError, "Job interrupted.")
|
24
|
+
rescue StandardError => e
|
25
|
+
# Just throwing this in here until we have better exception handling in
|
26
|
+
# loaders, etc
|
27
|
+
@job_logger&.error
|
28
|
+
raise(Chronicle::ETL::RunnerError, "Error running job. #{e.message}")
|
29
|
+
ensure
|
16
30
|
finish_job
|
17
31
|
end
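`run!` now translates lower-level failures into runner-level errors and relies on `ensure` so that `finish_job` always runs. The sketch below shows the same control flow in isolation. `MiniRunner` and the two error classes are local, hypothetical stand-ins and not the gem's own `Chronicle::ETL` classes.

```ruby
# Local stand-ins for the gem's exception classes (hypothetical).
class ExtractionError < StandardError; end
class RunnerError < StandardError; end

class MiniRunner
  attr_reader :finished

  def initialize(&work)
    @work = work
    @finished = false
  end

  def run!
    @work.call
  rescue ExtractionError => e
    # Specific failures get a specific prefix.
    raise RunnerError, "Extraction failed. #{e.message}"
  rescue StandardError => e
    # Catch-all for anything else raised while the job runs.
    raise RunnerError, "Error running job. #{e.message}"
  ensure
    # Runs on success and on failure, like finish_job in the diff.
    @finished = true
  end
end

runner = MiniRunner.new { raise ExtractionError, "source unavailable" }
begin
  runner.run!
rescue RunnerError => e
  puts e.message
end
```

`Interrupt` is not a `StandardError`, which is why the diff rescues it in its own clause instead of relying on the catch-all.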
|
18
32
|
|
19
33
|
private
|
20
34
|
|
35
|
+
def begin_job
|
36
|
+
Chronicle::ETL::Logger.info(tty_log_job_initialize)
|
37
|
+
@initialization_spinner = TTY::Spinner.new(":spinner :title", format: :dots_2)
|
38
|
+
end
|
39
|
+
|
21
40
|
def validate_job
|
41
|
+
@initialization_spinner.update(title: "Validating job")
|
22
42
|
@job.job_definition.validate!
|
23
43
|
end
|
24
44
|
|
25
45
|
def instantiate_connectors
|
46
|
+
@initialization_spinner.update(title: "Initializing connectors")
|
26
47
|
@extractor = @job.instantiate_extractor
|
27
48
|
@loader = @job.instantiate_loader
|
28
49
|
end
|
29
50
|
|
30
51
|
def prepare_job
|
31
|
-
|
52
|
+
@initialization_spinner.update(title: "Preparing job")
|
32
53
|
@job_logger.start
|
33
54
|
@loader.start
|
55
|
+
|
56
|
+
@initialization_spinner.update(title: "Preparing extraction")
|
57
|
+
@initialization_spinner.auto_spin
|
34
58
|
@extractor.prepare
|
59
|
+
@initialization_spinner.success("(#{'successful'.green})")
|
60
|
+
Chronicle::ETL::Logger.info("\n")
|
35
61
|
end
|
36
62
|
|
37
63
|
def prepare_ui
|
@@ -40,34 +66,34 @@ class Chronicle::ETL::Runner
|
|
40
66
|
Chronicle::ETL::Logger.attach_to_progress_bar(@progress_bar)
|
41
67
|
end
|
42
68
|
|
43
|
-
# TODO: refactor this further
|
44
69
|
def run_extraction
|
45
70
|
@extractor.extract do |extraction|
|
46
|
-
|
47
|
-
raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
|
48
|
-
end
|
49
|
-
|
50
|
-
transformer = @job.instantiate_transformer(extraction)
|
51
|
-
record = transformer.transform
|
52
|
-
|
53
|
-
Chronicle::ETL::Logger.info(tty_log_transformation(transformer))
|
54
|
-
@job_logger.log_transformation(transformer)
|
55
|
-
|
56
|
-
@loader.load(record) unless @job.dry_run?
|
57
|
-
rescue Chronicle::ETL::TransformationError => e
|
58
|
-
Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
|
59
|
-
ensure
|
71
|
+
process_extraction(extraction)
|
60
72
|
@progress_bar.increment
|
61
73
|
end
|
62
74
|
|
63
75
|
@progress_bar.finish
|
76
|
+
|
77
|
+
# This is typically a slow method (writing to stdout, writing a big file, etc)
|
78
|
+
# TODO: consider adding a spinner?
|
64
79
|
@loader.finish
|
65
80
|
@job_logger.finish
|
66
|
-
|
67
|
-
|
68
|
-
|
69
|
-
|
70
|
-
|
81
|
+
end
|
82
|
+
|
83
|
+
def process_extraction(extraction)
|
84
|
+
# For each extraction from our extractor, we create a new transformer
|
85
|
+
transformer = @job.instantiate_transformer(extraction)
|
86
|
+
|
87
|
+
# And then transform that record, logging it if we're in debug log level
|
88
|
+
record = transformer.transform
|
89
|
+
Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
|
90
|
+
@job_logger.log_transformation(transformer)
|
91
|
+
|
92
|
+
# Then send the results to the loader
|
93
|
+
@loader.load(record) unless @job.dry_run?
|
94
|
+
rescue Chronicle::ETL::TransformationError => e
|
95
|
+
# TODO: have an option to cancel job if we encounter an error
|
96
|
+
Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
|
71
97
|
end
|
72
98
|
|
73
99
|
def finish_job
|
@@ -77,7 +103,7 @@ class Chronicle::ETL::Runner
|
|
77
103
|
Chronicle::ETL::Logger.info(tty_log_completion)
|
78
104
|
end
|
79
105
|
|
80
|
-
def
|
106
|
+
def tty_log_job_initialize
|
81
107
|
output = "Beginning job "
|
82
108
|
output += "'#{@job.name}'".bold if @job.name
|
83
109
|
output
|
@@ -95,8 +121,9 @@ class Chronicle::ETL::Runner
|
|
95
121
|
|
96
122
|
def tty_log_completion
|
97
123
|
status = @job_logger.success ? 'Success' : 'Failed'
|
98
|
-
|
99
|
-
output
|
124
|
+
job_completion = @job_logger.success ? 'Completed' : 'Partially completed'
|
125
|
+
output = "\n#{job_completion} job"
|
126
|
+
output += " '#{@job.name}'".bold if @job.name
|
100
127
|
output += " in #{ChronicDuration.output(@job_logger.duration)}" if @job_logger.duration
|
101
128
|
output += "\n Status:\t".light_black + status
|
102
129
|
output += "\n Completed:\t".light_black + "#{@job_logger.job_log.num_records_processed}"
|
data/lib/chronicle/etl/transformers/transformer.rb
CHANGED
@@ -10,6 +10,10 @@ module Chronicle
|
|
10
10
|
# options::
|
11
11
|
# Options for configuring this Transformer
|
12
12
|
def initialize(extraction, options = {})
|
13
|
+
unless extraction.is_a?(Chronicle::ETL::Extraction)
|
14
|
+
raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
|
15
|
+
end
|
16
|
+
|
13
17
|
@extraction = extraction
|
14
18
|
apply_options(options)
|
15
19
|
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: chronicle-etl
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.5.
|
4
|
+
version: 0.5.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Andrew Louis
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2022-
|
11
|
+
date: 2022-04-04 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: activesupport
|
@@ -396,6 +396,7 @@ files:
|
|
396
396
|
- lib/chronicle/etl/job_logger.rb
|
397
397
|
- lib/chronicle/etl/loaders/csv_loader.rb
|
398
398
|
- lib/chronicle/etl/loaders/helpers/encoding_helper.rb
|
399
|
+
- lib/chronicle/etl/loaders/helpers/stdout_helper.rb
|
399
400
|
- lib/chronicle/etl/loaders/json_loader.rb
|
400
401
|
- lib/chronicle/etl/loaders/loader.rb
|
401
402
|
- lib/chronicle/etl/loaders/rest_loader.rb
|