chronicle-etl 0.5.0 → 0.5.3
- checksums.yaml +4 -4
- data/README.md +45 -10
- data/lib/chronicle/etl/cli/cli_base.rb +1 -0
- data/lib/chronicle/etl/cli/jobs.rb +41 -13
- data/lib/chronicle/etl/cli/main.rb +29 -20
- data/lib/chronicle/etl/cli/plugins.rb +1 -1
- data/lib/chronicle/etl/config.rb +6 -0
- data/lib/chronicle/etl/exceptions.rb +3 -0
- data/lib/chronicle/etl/loaders/csv_loader.rb +10 -13
- data/lib/chronicle/etl/loaders/helpers/stdout_helper.rb +36 -0
- data/lib/chronicle/etl/loaders/json_loader.rb +43 -8
- data/lib/chronicle/etl/loaders/loader.rb +1 -0
- data/lib/chronicle/etl/models/base.rb +1 -1
- data/lib/chronicle/etl/models/entity.rb +1 -1
- data/lib/chronicle/etl/runner.rb +51 -24
- data/lib/chronicle/etl/transformers/transformer.rb +4 -0
- data/lib/chronicle/etl/version.rb +1 -1
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: d0f305f15f4eda7a5851dfff2155da2c12ee010d4619346a13551f298d5b7991
|
4
|
+
data.tar.gz: d44f82b2bd06521740ad2b0e58cad0db840884fc5616858ef857d78fccb2b5dd
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 99214409831e2799dffe2e3b096e9406222cf571d4e49fe71d3e3c645ad635e73c3aa42cb6af6569431064ce2750dc38ba051122a826b9feb6b21724ebd31db8
|
7
|
+
data.tar.gz: 77a30ecb069906b0e992adbcb1b7470642bcda65b8767dd8ba0145d63e834cb83e2a0a7bc5212edcf8c5028c637b2edbd8476b559f8e28765267516308b160eb
|
data/README.md
CHANGED
@@ -8,20 +8,36 @@ Are you trying to archive your digital history or incorporate it into your own p
|
|
8
8
|
|
9
9
|
If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this project is for you! (*If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues).*)
|
10
10
|
|
11
|
-
**`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a
|
11
|
+
**`chronicle-etl` is a CLI tool that gives you a unified interface for accessing your personal data.** It uses the ETL pattern to *extract* it from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), *transform* it (into a given schema), and *load* it to a destination (e.g. a CSV file, JSON, external API).
|
12
12
|
|
13
13
|
## What does `chronicle-etl` give you?
|
14
14
|
* **CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
|
15
15
|
* **Plugins for many third-party providers**. A plugin system allows you to access data from third-party providers and hook it into the shared CLI infrastructure.
|
16
16
|
* **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.
|
17
17
|
|
18
|
+
## Chronicle-ETL in action
|
19
|
+
|
20
|
+

|
21
|
+
|
22
|
+
### Longer screencast
|
23
|
+
|
24
|
+
[](https://asciinema.org/a/483455)
|
25
|
+
|
18
26
|
## Installation
|
27
|
+
|
28
|
+
Using homebrew:
|
19
29
|
```sh
|
20
|
-
|
21
|
-
|
30
|
+
$ brew install chronicle-app/etl/chronicle-etl
|
31
|
+
```
|
32
|
+
Using rubygems:
|
33
|
+
```sh
|
34
|
+
$ gem install chronicle-etl
|
22
35
|
```
|
23
36
|
|
24
|
-
|
37
|
+
Confirm it installed successfully:
|
38
|
+
```sh
|
39
|
+
$ chronicle-etl --version
|
40
|
+
```
|
25
41
|
|
26
42
|
## Basic usage and running jobs
|
27
43
|
|
@@ -33,7 +49,7 @@ $ chronicle-etl help
|
|
33
49
|
$ chronicle-etl --extractor NAME --transformer NAME --loader NAME
|
34
50
|
|
35
51
|
# Read test.csv and display it to stdout as a table
|
36
|
-
$ chronicle-etl --extractor csv --input
|
52
|
+
$ chronicle-etl --extractor csv --input data.csv --loader table
|
37
53
|
|
38
54
|
# Retrieve shell commands run in the last 5 hours
|
39
55
|
$ chronicle-etl -e shell --since 5h
|
@@ -50,7 +66,6 @@ $ chronicle-etl -e pinboard --since 1mo # Used automatically based on plugin nam
|
|
50
66
|
### Common options
|
51
67
|
```sh
|
52
68
|
Options:
|
53
|
-
-j, [--name=NAME] # Job configuration name
|
54
69
|
-e, [--extractor=NAME] # Extractor class. Default: stdin
|
55
70
|
[--extractor-opts=key:value] # Extractor options
|
56
71
|
-t, [--transformer=NAME] # Transformer class. Default: null
|
@@ -71,6 +86,26 @@ Options:
|
|
71
86
|
[--silent], [--no-silent] # Silence all output
|
72
87
|
```
|
73
88
|
|
89
|
+
### Saving jobs
|
90
|
+
|
91
|
+
You can save details about a job to a local config file (saved by default in `~/.config/chronicle/etl/jobs/job_name.yml`) to save yourself the trouble of setting the CLI flags for each run.
|
92
|
+
|
93
|
+
```sh
|
94
|
+
# Save a job named 'sample' to ~/.config/chronicle/etl/jobs/sample.yml
|
95
|
+
$ chronicle-etl jobs:save sample --extractor pinboard --since 10d
|
96
|
+
|
97
|
+
# Show details about the job
|
98
|
+
$ chronicle-etl jobs:show sample
|
99
|
+
|
100
|
+
# Run the job
|
101
|
+
$ chronicle-etl jobs:run sample
|
102
|
+
# Or more simply:
|
103
|
+
$ chronicle-etl sample
|
104
|
+
|
105
|
+
# Show all saved jobs
|
106
|
+
$ chronicle-etl jobs:list
|
107
|
+
```
|
108
|
+
|
74
109
|
## Connectors
|
75
110
|
Connectors are available to read, process, and load data from different formats or external services.
|
76
111
|
|
@@ -97,7 +132,7 @@ $ chronicle-etl connectors:list
|
|
97
132
|
- [`rest`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/rest_loader.rb) - Serialize records with [JSONAPI](https://jsonapi.org/) and send to a REST API
|
98
133
|
|
99
134
|
## Chronicle Plugins
|
100
|
-
Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through
|
135
|
+
Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through the CLI (which installs the Gems under the hood).
|
101
136
|
|
102
137
|
### Plugin usage
|
103
138
|
|
@@ -126,11 +161,12 @@ If you don't see a plugin for a third-party provider or data source that you're
|
|
126
161
|
|
127
162
|
| Name | Description | Availability |
|
128
163
|
|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------|----------------------------------|
|
164
|
+
| [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available |
|
165
|
+
| [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |
|
129
166
|
| [imessage](https://github.com/chronicle-app/chronicle-imessage) | iMessage messages and attachments | Available |
|
130
|
-
| [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
|
131
|
-
| [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
|
132
167
|
| [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
|
133
168
|
| [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
|
169
|
+
| [shell](https://github.com/chronicle-app/chronicle-shell) | Shell command history | Available (still needs zsh support) |
|
134
170
|
|
135
171
|
#### Coming soon
|
136
172
|
|
@@ -206,7 +242,6 @@ $ chronicle-etl secrets:unset pinboard access_token
|
|
206
242
|
|
207
243
|
## Roadmap
|
208
244
|
|
209
|
-
- Add **homebrew formula** for easier installation. #13
|
210
245
|
- Keep tackling **new plugins**. See: [Chronicle Plugin Tracker](https://github.com/orgs/chronicle-app/projects/1)
|
211
246
|
- Add support for **incremental extractions** #37
|
212
247
|
- **Improve stdin extractor and shell command transformer** (#5) so that users can easily integrate their own scripts/tools into jobs
|
data/lib/chronicle/etl/cli/cli_base.rb
CHANGED
@@ -6,6 +6,7 @@ module Chronicle
|
|
6
6
|
no_commands do
|
7
7
|
# Shorthand for cli_exit(status: :failure)
|
8
8
|
def cli_fail(message: nil, exception: nil)
|
9
|
+
message += "\nRe-run the command with --verbose to see details." if Chronicle::ETL::Logger.log_level > Chronicle::ETL::Logger::DEBUG
|
9
10
|
cli_exit(status: :failure, message: message, exception: exception)
|
10
11
|
end
|
11
12
|
|
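The `cli_fail` change above appends a "--verbose" hint with `message +=`, while the method signature declares `message: nil` as the default. In Ruby, `nil` has no `+` method, so a caller that omits `message` would hit a `NoMethodError` whenever the hint applies. Below is a minimal sketch of a nil-safe way to build the same text. The `failure_message` helper and the `VERBOSE_HINT` constant are hypothetical names, and the stdlib `Logger` severities stand in for the gem's own log levels.

```ruby
require 'logger'

# Hypothetical constant; the wording is copied from the diff.
VERBOSE_HINT = "Re-run the command with --verbose to see details."

# Hypothetical helper: builds the failure text without assuming that
# `message` is a String. The stdlib Logger severities (DEBUG = 0, INFO = 1, ...)
# are used here in place of the gem's logger.
def failure_message(message, log_level)
  parts = [message]
  parts << VERBOSE_HINT if log_level > Logger::DEBUG
  parts.compact.join("\n")
end

puts failure_message("Error running job.", Logger::INFO)
puts failure_message(nil, Logger::INFO)
```

Collecting the parts in an array and calling `compact` keeps the behaviour for String messages identical to the `+=` version while tolerating a missing message.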
data/lib/chronicle/etl/cli/jobs.rb
CHANGED
@@ -9,8 +9,6 @@ module Chronicle
|
|
9
9
|
default_task "start"
|
10
10
|
namespace :jobs
|
11
11
|
|
12
|
-
class_option :name, aliases: '-j', desc: 'Job configuration name'
|
13
|
-
|
14
12
|
class_option :extractor, aliases: '-e', desc: "Extractor class. Default: stdin", banner: 'NAME'
|
15
13
|
class_option :'extractor-opts', desc: 'Extractor options', type: :hash, default: {}
|
16
14
|
class_option :transformer, aliases: '-t', desc: 'Transformer class. Default: null', banner: 'NAME'
|
@@ -44,8 +42,17 @@ module Chronicle
|
|
44
42
|
If you do not want to use the command line flags, you can also configure a job with a .yml config file. You can either specify the path to this file or use the filename and place the file in ~/.config/chronicle/etl/jobs/NAME.yml and call it with `--job NAME`
|
45
43
|
LONG_DESC
|
46
44
|
# Run an ETL job
|
47
|
-
def start
|
48
|
-
|
45
|
+
def start(name = nil)
|
46
|
+
# If someone runs `$ chronicle-etl` with no arguments, show help menu.
|
47
|
+
# TODO: decide if we should check that there's nothing in stdin pipe
|
48
|
+
# in case user wants to actually run this sort of job stdin->null->stdout
|
49
|
+
if name.nil? && options[:extractor].nil?
|
50
|
+
m = Chronicle::ETL::CLI::Main.new
|
51
|
+
m.help
|
52
|
+
cli_exit
|
53
|
+
end
|
54
|
+
|
55
|
+
job_definition = build_job_definition(name, options)
|
49
56
|
|
50
57
|
if job_definition.plugins_missing?
|
51
58
|
missing_plugins = job_definition.errors[:plugins]
|
@@ -59,26 +66,43 @@ LONG_DESC
|
|
59
66
|
rescue Chronicle::ETL::JobDefinitionError => e
|
60
67
|
message = ""
|
61
68
|
job_definition.errors.each_pair do |category, errors|
|
62
|
-
message << "Problem with #{category}:\n - #{errors.map(&:to_s).join("\n -")}"
|
69
|
+
message << "Problem with #{category}:\n - #{errors.map(&:to_s).join("\n - ")}"
|
63
70
|
end
|
64
71
|
cli_fail(message: "Error running job.\n#{message}", exception: e)
|
65
72
|
end
|
66
73
|
|
67
|
-
|
74
|
+
option :'skip-confirmation', aliases: '-y', type: :boolean
|
75
|
+
desc "save", "Save a job"
|
68
76
|
# Create an ETL job
|
69
|
-
def
|
70
|
-
|
77
|
+
def save(name)
|
78
|
+
write_config = true
|
79
|
+
job_definition = build_job_definition(name, options)
|
71
80
|
job_definition.validate!
|
72
81
|
|
73
|
-
Chronicle::ETL::Config.
|
82
|
+
if Chronicle::ETL::Config.exists?("jobs", name) && !options[:'skip-confirmation']
|
83
|
+
prompt = TTY::Prompt.new
|
84
|
+
write_config = false
|
85
|
+
message = "Job '#{name}' exists already. Overwrite it?"
|
86
|
+
begin
|
87
|
+
write_config = prompt.yes?(message)
|
88
|
+
rescue TTY::Reader::InputInterrupt
|
89
|
+
end
|
90
|
+
end
|
91
|
+
|
92
|
+
if write_config
|
93
|
+
Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
|
94
|
+
cli_exit(message: "Job saved. Run it with `$ chronicle-etl jobs:run #{name}`")
|
95
|
+
else
|
96
|
+
cli_fail(message: "\nJob not saved")
|
97
|
+
end
|
74
98
|
rescue Chronicle::ETL::JobDefinitionError => e
|
75
99
|
cli_fail(message: "Job definition error", exception: e)
|
76
100
|
end
|
77
101
|
|
78
102
|
desc "show", "Show details about a job"
|
79
103
|
# Show an ETL job
|
80
|
-
def show
|
81
|
-
job_definition = build_job_definition(options)
|
104
|
+
def show(name = nil)
|
105
|
+
job_definition = build_job_definition(name, options)
|
82
106
|
job_definition.validate!
|
83
107
|
puts Chronicle::ETL::Job.new(job_definition)
|
84
108
|
rescue Chronicle::ETL::JobDefinitionError => e
|
@@ -112,12 +136,16 @@ LONG_DESC
|
|
112
136
|
private
|
113
137
|
|
114
138
|
def run_job(job_definition)
|
139
|
+
# FIXME: have to validate here so next method can work. This is clumsy
|
140
|
+
job_definition.validate!
|
115
141
|
# FIXME: clumsy to make CLI responsible for setting secrets here. Think about a better way to do this
|
116
142
|
job_definition.apply_default_secrets
|
117
143
|
|
118
144
|
job = Chronicle::ETL::Job.new(job_definition)
|
119
145
|
runner = Chronicle::ETL::Runner.new(job)
|
120
146
|
runner.run!
|
147
|
+
rescue RunnerError => e
|
148
|
+
cli_fail(message: "#{e.message}", exception: e)
|
121
149
|
end
|
122
150
|
|
123
151
|
# TODO: probably could merge this with something in cli/plugin
|
@@ -134,9 +162,9 @@ LONG_DESC
|
|
134
162
|
end
|
135
163
|
|
136
164
|
# Create job definition by reading config file and then overwriting with flag options
|
137
|
-
def build_job_definition(options)
|
165
|
+
def build_job_definition(name, options)
|
138
166
|
definition = Chronicle::ETL::JobDefinition.new
|
139
|
-
definition.add_config(load_job_config(
|
167
|
+
definition.add_config(load_job_config(name))
|
140
168
|
definition.add_config(process_flag_options(options).transform_keys(&:to_sym))
|
141
169
|
definition
|
142
170
|
end
|
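The comment on `build_job_definition` describes a two-step merge: read the saved config, then overwrite it with flag options. The sketch below illustrates only that precedence rule. `merge_job_config` is a hypothetical helper, the flat keys (`extractor`, `since`, `loader`) are illustrative rather than the gem's real job schema, and the actual `JobDefinition#add_config` may merge nested sections differently.

```ruby
require 'yaml'

# Hypothetical helper: later sources win, so CLI flags override values
# that came from the saved YAML file.
def merge_job_config(saved_yaml, flag_options)
  # A missing or empty file contributes nothing.
  saved = saved_yaml ? (YAML.safe_load(saved_yaml) || {}) : {}
  saved.transform_keys(&:to_sym).merge(flag_options.transform_keys(&:to_sym))
end

saved_yaml = <<~YAML
  extractor: "pinboard"
  since: "10d"
YAML

# Flags given on the command line for this run (illustrative values).
flags = { "since" => "5h", "loader" => "table" }

p merge_job_config(saved_yaml, flags)
```

With this ordering a saved job can be re-run with a one-off override, for example a different `--since` window, without editing the file.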
data/lib/chronicle/etl/cli/main.rb
CHANGED
@@ -54,24 +54,40 @@ module Chronicle
|
|
54
54
|
klass, task = ::Thor::Util.find_class_and_task_by_namespace("#{meth}:#{meth}")
|
55
55
|
klass.start(['-h', task].compact, shell: shell)
|
56
56
|
else
|
57
|
-
shell.say "ABOUT".bold
|
58
|
-
shell.say " #{'chronicle-etl'.italic} is a
|
57
|
+
shell.say "ABOUT:".bold
|
58
|
+
shell.say " #{'chronicle-etl'.italic} is a toolkit for extracting and working with your digital"
|
59
|
+
shell.say " history. 📜"
|
59
60
|
shell.say
|
60
|
-
shell.say "
|
61
|
-
shell.say "
|
61
|
+
shell.say " A job #{'extracts'.underline} personal data from a source, #{'transforms'.underline} it (Chronicle"
|
62
|
+
shell.say " Schema or preserves raw data), and then #{'loads'.underline} it to a destination. Use"
|
63
|
+
shell.say " built-in extractors (json, csv, stdin) and loaders (csv, json, table,"
|
64
|
+
shell.say " rest) or use plugins to connect to third-party services."
|
62
65
|
shell.say
|
63
|
-
shell.say "
|
64
|
-
shell.say " Show available connectors:".italic.light_black
|
65
|
-
shell.say " $ chronicle-etl connectors:list"
|
66
|
+
shell.say " Plugins: https://github.com/chronicle-app/chronicle-etl#currently-available"
|
66
67
|
shell.say
|
67
|
-
shell.say "
|
68
|
-
shell.say "
|
68
|
+
shell.say "USAGE:".bold
|
69
|
+
shell.say " # Basic job usage:".italic.light_black
|
70
|
+
shell.say " $ chronicle-etl --extractor NAME --transformer NAME --loader NAME"
|
69
71
|
shell.say
|
70
|
-
shell.say "
|
72
|
+
shell.say " # Read test.csv and display it to stdout as a table:".italic.light_black
|
73
|
+
shell.say " $ chronicle-etl --extractor csv --input data.csv --loader table"
|
74
|
+
shell.say
|
75
|
+
shell.say " # Show available plugins:".italic.light_black
|
76
|
+
shell.say " $ chronicle-etl plugins:list"
|
77
|
+
shell.say
|
78
|
+
shell.say " # Save an access token as a secret and use it in a job:".italic.light_black
|
79
|
+
shell.say " $ chronicle-etl secrets:set pinboard access_token username:foo123"
|
80
|
+
shell.say " $ chronicle-etl secrets:list"
|
81
|
+
shell.say " $ chronicle-etl -e pinboard --since 1mo"
|
82
|
+
shell.say
|
83
|
+
shell.say " # Show full job options:".italic.light_black
|
71
84
|
shell.say " $ chronicle-etl jobs help run"
|
85
|
+
shell.say
|
86
|
+
shell.say "FULL DOCUMENTATION:".bold
|
87
|
+
shell.say " https://github.com/chronicle-app/chronicle-etl".blue
|
88
|
+
shell.say
|
72
89
|
|
73
90
|
list = []
|
74
|
-
|
75
91
|
::Thor::Util.thor_classes_in(Chronicle::ETL::CLI).each do |thor_class|
|
76
92
|
list += thor_class.printable_tasks(false)
|
77
93
|
end
|
@@ -79,25 +95,18 @@ module Chronicle
|
|
79
95
|
list.unshift ["help", "# This help menu"]
|
80
96
|
|
81
97
|
shell.say
|
82
|
-
shell.say 'ALL COMMANDS'.bold
|
98
|
+
shell.say 'ALL COMMANDS:'.bold
|
83
99
|
shell.print_table(list, indent: 2, truncate: true)
|
84
100
|
shell.say
|
85
|
-
shell.say "VERSION".bold
|
101
|
+
shell.say "VERSION:".bold
|
86
102
|
shell.say " #{Chronicle::ETL::VERSION}"
|
87
103
|
shell.say
|
88
104
|
shell.say " Display current version:".italic.light_black
|
89
105
|
shell.say " $ chronicle-etl --version"
|
90
|
-
shell.say
|
91
|
-
shell.say "FULL DOCUMENTATION".bold
|
92
|
-
shell.say " https://github.com/chronicle-app/chronicle-etl".blue
|
93
|
-
shell.say
|
94
106
|
end
|
95
107
|
end
|
96
108
|
|
97
109
|
no_commands do
|
98
|
-
def testb
|
99
|
-
puts "hi"
|
100
|
-
end
|
101
110
|
def set_color_output
|
102
111
|
String.disable_colorization true if options[:'no-color'] || ENV['NO_COLOR']
|
103
112
|
end
|
data/lib/chronicle/etl/cli/plugins.rb
CHANGED
@@ -61,7 +61,7 @@ module Chronicle
|
|
61
61
|
}
|
62
62
|
end
|
63
63
|
|
64
|
-
headers = ['name', 'description', '
|
64
|
+
headers = ['name', 'description', 'version'].map{ |h| h.to_s.upcase.bold }
|
65
65
|
table = TTY::Table.new(headers, info.map(&:values))
|
66
66
|
puts "Installed plugins:"
|
67
67
|
puts table.render(indent: 2, padding: [0, 0])
|
data/lib/chronicle/etl/config.rb
CHANGED
@@ -28,6 +28,12 @@ module Chronicle
|
|
28
28
|
end
|
29
29
|
end
|
30
30
|
|
31
|
+
def exists?(type, identifier)
|
32
|
+
base = config_pathname_for_type(type)
|
33
|
+
path = base.join("#{identifier}.yml")
|
34
|
+
return path.exist?
|
35
|
+
end
|
36
|
+
|
31
37
|
# Returns all jobs available in ~/.config/chronicle/etl/jobs/*.yml
|
32
38
|
def available_jobs
|
33
39
|
Dir.glob(File.join(config_pathname_for_type("jobs"), "*.yml")).map do |filename|
|
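The new `exists?` method resolves `<type>/<identifier>.yml` under the config directory and checks whether that file is present. A minimal, self-contained sketch of the same lookup follows. `config_exists?` is a hypothetical stand-in that takes the root directory as an argument, since `config_pathname_for_type` is not shown in this diff.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical stand-in for the gem's lookup: configs are assumed to live at
# <root>/<type>/<identifier>.yml.
def config_exists?(root, type, identifier)
  Pathname.new(root).join(type, "#{identifier}.yml").exist?
end

Dir.mktmpdir do |root|
  jobs_dir = Pathname.new(root).join("jobs")
  jobs_dir.mkpath
  jobs_dir.join("sample.yml").write("extractor: pinboard\n")

  puts config_exists?(root, "jobs", "sample")   # a saved job
  puts config_exists?(root, "jobs", "missing")  # no such file
end
```

This is the check that `jobs:save` relies on before prompting about an existing job of the same name.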
data/lib/chronicle/etl/loaders/csv_loader.rb
CHANGED
@@ -3,11 +3,13 @@ require 'csv'
|
|
3
3
|
module Chronicle
|
4
4
|
module ETL
|
5
5
|
class CSVLoader < Chronicle::ETL::Loader
|
6
|
+
include Chronicle::ETL::Loaders::Helpers::StdoutHelper
|
7
|
+
|
6
8
|
register_connector do |r|
|
7
9
|
r.description = 'CSV'
|
8
10
|
end
|
9
11
|
|
10
|
-
setting :output
|
12
|
+
setting :output
|
11
13
|
setting :headers, default: true
|
12
14
|
setting :header_row, default: true
|
13
15
|
|
@@ -30,16 +32,7 @@ module Chronicle
|
|
30
32
|
csv_options[:headers] = headers
|
31
33
|
end
|
32
34
|
|
33
|
-
|
34
|
-
# This might seem like a duplication of the default value ($stdout)
|
35
|
-
# but it's because rspec overwrites $stdout (in helper #capture) to
|
36
|
-
# capture output.
|
37
|
-
io = $stdout.dup
|
38
|
-
else
|
39
|
-
io = File.open(@config.output, "w+")
|
40
|
-
end
|
41
|
-
|
42
|
-
output = CSV.generate(**csv_options) do |csv|
|
35
|
+
csv_output = CSV.generate(**csv_options) do |csv|
|
43
36
|
records.each do |record|
|
44
37
|
csv << record
|
45
38
|
.transform_keys(&:to_sym)
|
@@ -48,8 +41,12 @@ module Chronicle
|
|
48
41
|
end
|
49
42
|
end
|
50
43
|
|
51
|
-
io
|
52
|
-
|
44
|
+
# TODO: just write to io directly
|
45
|
+
if output_to_stdout?
|
46
|
+
write_to_stdout(csv_output)
|
47
|
+
else
|
48
|
+
File.write(@config.output, csv_output)
|
49
|
+
end
|
53
50
|
end
|
54
51
|
end
|
55
52
|
end
|
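After this change the CSV loader builds the whole document in memory with `CSV.generate` and only then decides where to send it: stdout when no `output` is configured, otherwise a file. The sketch below shows that flow with the standard library only. `write_csv` and its `io:` parameter are hypothetical, and taking the headers from the first record's keys is an assumption made for the example.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper mirroring the flow in the diff: generate the CSV as a
# String, then write it either to an IO (stdout by default) or to a file path.
def write_csv(records, output: nil, io: $stdout)
  return "" if records.empty?

  # Assumption for this sketch: every record has the same keys as the first.
  keys = records.first.keys
  csv_output = CSV.generate(headers: keys.map(&:to_s), write_headers: true) do |csv|
    records.each { |record| csv << record.values_at(*keys) }
  end

  if output.nil?
    io.write(csv_output)
  else
    File.write(output, csv_output)
  end
  csv_output
end

write_csv([{ name: "pinboard", count: 3 }, { name: "shell", count: 12 }])
```

As the `TODO` in the diff notes, writing rows to the IO directly would avoid holding the full output in memory; this sketch keeps the buffered approach to match the released code.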
data/lib/chronicle/etl/loaders/helpers/stdout_helper.rb
ADDED
@@ -0,0 +1,36 @@
|
|
1
|
+
require 'tempfile'
|
2
|
+
|
3
|
+
module Chronicle
|
4
|
+
module ETL
|
5
|
+
module Loaders
|
6
|
+
module Helpers
|
7
|
+
module StdoutHelper
|
8
|
+
# TODO: let users use "stdout" as an option for the `output` setting
|
9
|
+
# Assume we're using stdout if no output is specified
|
10
|
+
def output_to_stdout?
|
11
|
+
!@config.output
|
12
|
+
end
|
13
|
+
|
14
|
+
def create_stdout_temp_file
|
15
|
+
file = Tempfile.new('chronicle-stdout')
|
16
|
+
file.unlink
|
17
|
+
file
|
18
|
+
end
|
19
|
+
|
20
|
+
def write_to_stdout_from_temp_file(file)
|
21
|
+
file.rewind
|
22
|
+
write_to_stdout(file.read)
|
23
|
+
end
|
24
|
+
|
25
|
+
def write_to_stdout(output)
|
26
|
+
# We .dup because rspec overwrites $stdout (in helper #capture) to
|
27
|
+
# capture output.
|
28
|
+
stdout = $stdout.dup
|
29
|
+
stdout.write(output)
|
30
|
+
stdout.flush
|
31
|
+
end
|
32
|
+
end
|
33
|
+
end
|
34
|
+
end
|
35
|
+
end
|
36
|
+
end
|
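`StdoutHelper` buffers output in a temp file that is unlinked immediately after creation, then rewinds and replays it to stdout at the end. The sketch below isolates that pattern. `buffer_and_replay` and its `io:` parameter are hypothetical. Unlinking a file while keeping its handle open is POSIX behaviour, so this is assumed to run on a Unix-like system.

```ruby
require 'tempfile'
require 'stringio'

# Hypothetical helper: collect chunks in an anonymous temp file and replay
# them to `io` in one write.
def buffer_and_replay(chunks, io: $stdout)
  file = Tempfile.new('chronicle-stdout')
  # Remove the directory entry right away; the open handle remains usable
  # on POSIX systems, and no file is left behind if the process dies.
  file.unlink

  chunks.each { |chunk| file.write(chunk) }

  # Move back to the start before reading what was buffered.
  file.rewind
  io.write(file.read)
ensure
  file&.close
end

buffer_and_replay(["first record\n", "second record\n"])
```

Buffering this way keeps record output from interleaving with progress output on the terminal, at the cost of delaying all output until the job finishes.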
data/lib/chronicle/etl/loaders/json_loader.rb
CHANGED
@@ -1,19 +1,35 @@
|
|
1
|
+
require 'tempfile'
|
2
|
+
|
1
3
|
module Chronicle
|
2
4
|
module ETL
|
3
5
|
class JSONLoader < Chronicle::ETL::Loader
|
6
|
+
include Chronicle::ETL::Loaders::Helpers::StdoutHelper
|
7
|
+
|
4
8
|
register_connector do |r|
|
5
9
|
r.description = 'json'
|
6
10
|
end
|
7
11
|
|
8
12
|
setting :serializer
|
9
|
-
setting :output
|
13
|
+
setting :output
|
14
|
+
|
15
|
+
# If true, one JSON record per line. If false, output a single json
|
16
|
+
# object with an array of records
|
17
|
+
setting :line_separated, default: true, type: :boolean
|
18
|
+
|
19
|
+
def initialize(*args)
|
20
|
+
super
|
21
|
+
@first_line = true
|
22
|
+
end
|
10
23
|
|
11
24
|
def start
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
25
|
+
@output_file =
|
26
|
+
if output_to_stdout?
|
27
|
+
create_stdout_temp_file
|
28
|
+
else
|
29
|
+
File.open(@config.output, "w+")
|
30
|
+
end
|
31
|
+
|
32
|
+
@output_file.puts("[\n") unless @config.line_separated
|
17
33
|
end
|
18
34
|
|
19
35
|
def load(record)
|
@@ -27,15 +43,34 @@ module Chronicle
|
|
27
43
|
|
28
44
|
force_utf8(value)
|
29
45
|
end
|
30
|
-
|
46
|
+
|
47
|
+
line = encoded.to_json
|
48
|
+
# For line-separated output, we just put json + newline
|
49
|
+
if @config.line_separated
|
50
|
+
line = "#{line}\n"
|
51
|
+
# Otherwise, we add a comma and newline and then add record to the
|
52
|
+
# array we created in #start (unless it's the first line).
|
53
|
+
else
|
54
|
+
line = ",\n#{line}" unless @first_line
|
55
|
+
end
|
56
|
+
|
57
|
+
@output_file.write(line)
|
58
|
+
|
59
|
+
@first_line = false
|
31
60
|
end
|
32
61
|
|
33
62
|
def finish
|
34
|
-
|
63
|
+
# Close the array unless we're doing line-separated JSON
|
64
|
+
@output_file.puts("\n]") unless @config.line_separated
|
65
|
+
|
66
|
+
write_to_stdout_from_temp_file(@output_file) if output_to_stdout?
|
67
|
+
|
68
|
+
@output_file.close
|
35
69
|
end
|
36
70
|
|
37
71
|
private
|
38
72
|
|
73
|
+
# TODO: implement this
|
39
74
|
def serializer
|
40
75
|
@config.serializer || Chronicle::ETL::RawSerializer
|
41
76
|
end
|
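The JSON loader now supports two layouts, selected by `line_separated`: one JSON document per line, or a single array opened in `#start`, comma-joined in `#load`, and closed in `#finish`. The following is a simplified sketch of that start/load/finish protocol. `SketchJSONWriter` is a hypothetical class that writes to any IO and skips the serializer and UTF-8 handling of the real loader.

```ruby
require 'json'
require 'stringio'

# Hypothetical, simplified writer mirroring the loader's three phases.
class SketchJSONWriter
  def initialize(io, line_separated: true)
    @io = io
    @line_separated = line_separated
    @first_line = true
  end

  # Open the array when emitting a single JSON document.
  def start
    @io.write("[\n") unless @line_separated
  end

  def load(record)
    line = record.to_json
    if @line_separated
      # One JSON document per line.
      line = "#{line}\n"
    elsif !@first_line
      # Inside an array, every record after the first is preceded by a comma.
      line = ",\n#{line}"
    end
    @io.write(line)
    @first_line = false
  end

  # Close the array when emitting a single JSON document.
  def finish
    @io.write("\n]\n") unless @line_separated
  end
end

io = StringIO.new
writer = SketchJSONWriter.new(io, line_separated: false)
writer.start
writer.load({ "title" => "first" })
writer.load({ "title" => "second" })
writer.finish
puts io.string
```

Putting the comma before each later record, instead of after each record, means the writer never needs to know which record is the last one.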
data/lib/chronicle/etl/models/base.rb
CHANGED
@@ -9,7 +9,7 @@ module Chronicle
|
|
9
9
|
# @todo Experiment with just mixing in ActiveModel instead of this
|
10
10
|
# this reimplementation
|
11
11
|
class Base
|
12
|
-
ATTRIBUTES = [:provider, :provider_id, :lat, :lng, :metadata].freeze
|
12
|
+
ATTRIBUTES = [:provider, :provider_id, :provider_namespace, :lat, :lng, :metadata].freeze
|
13
13
|
ASSOCIATIONS = [].freeze
|
14
14
|
|
15
15
|
attr_accessor(:id, :dedupe_on, *ATTRIBUTES)
|
data/lib/chronicle/etl/runner.rb
CHANGED
@@ -1,5 +1,6 @@
|
|
1
1
|
require 'colorize'
|
2
2
|
require 'chronic_duration'
|
3
|
+
require "tty-spinner"
|
3
4
|
|
4
5
|
class Chronicle::ETL::Runner
|
5
6
|
def initialize(job)
|
@@ -8,30 +9,55 @@ class Chronicle::ETL::Runner
|
|
8
9
|
end
|
9
10
|
|
10
11
|
def run!
|
12
|
+
begin_job
|
11
13
|
validate_job
|
12
14
|
instantiate_connectors
|
13
15
|
prepare_job
|
14
16
|
prepare_ui
|
15
17
|
run_extraction
|
18
|
+
rescue Chronicle::ETL::ExtractionError => e
|
19
|
+
@job_logger&.error
|
20
|
+
raise(Chronicle::ETL::RunnerError, "Extraction failed. #{e.message}")
|
21
|
+
rescue Interrupt
|
22
|
+
@job_logger&.error
|
23
|
+
raise(Chronicle::ETL::RunInterruptedError, "Job interrupted.")
|
24
|
+
rescue StandardError => e
|
25
|
+
# Just throwing this in here until we have better exception handling in
|
26
|
+
# loaders, etc
|
27
|
+
@job_logger&.error
|
28
|
+
raise(Chronicle::ETL::RunnerError, "Error running job. #{e.message}")
|
29
|
+
ensure
|
16
30
|
finish_job
|
17
31
|
end
|
18
32
|
|
19
33
|
private
|
20
34
|
|
35
|
+
def begin_job
|
36
|
+
Chronicle::ETL::Logger.info(tty_log_job_initialize)
|
37
|
+
@initialization_spinner = TTY::Spinner.new(":spinner :title", format: :dots_2)
|
38
|
+
end
|
39
|
+
|
21
40
|
def validate_job
|
41
|
+
@initialization_spinner.update(title: "Validating job")
|
22
42
|
@job.job_definition.validate!
|
23
43
|
end
|
24
44
|
|
25
45
|
def instantiate_connectors
|
46
|
+
@initialization_spinner.update(title: "Initializing connectors")
|
26
47
|
@extractor = @job.instantiate_extractor
|
27
48
|
@loader = @job.instantiate_loader
|
28
49
|
end
|
29
50
|
|
30
51
|
def prepare_job
|
31
|
-
|
52
|
+
@initialization_spinner.update(title: "Preparing job")
|
32
53
|
@job_logger.start
|
33
54
|
@loader.start
|
55
|
+
|
56
|
+
@initialization_spinner.update(title: "Preparing extraction")
|
57
|
+
@initialization_spinner.auto_spin
|
34
58
|
@extractor.prepare
|
59
|
+
@initialization_spinner.success("(#{'successful'.green})")
|
60
|
+
Chronicle::ETL::Logger.info("\n")
|
35
61
|
end
|
36
62
|
|
37
63
|
def prepare_ui
|
@@ -40,34 +66,34 @@ class Chronicle::ETL::Runner
|
|
40
66
|
Chronicle::ETL::Logger.attach_to_progress_bar(@progress_bar)
|
41
67
|
end
|
42
68
|
|
43
|
-
# TODO: refactor this further
|
44
69
|
def run_extraction
|
45
70
|
@extractor.extract do |extraction|
|
46
|
-
|
47
|
-
raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
|
48
|
-
end
|
49
|
-
|
50
|
-
transformer = @job.instantiate_transformer(extraction)
|
51
|
-
record = transformer.transform
|
52
|
-
|
53
|
-
Chronicle::ETL::Logger.info(tty_log_transformation(transformer))
|
54
|
-
@job_logger.log_transformation(transformer)
|
55
|
-
|
56
|
-
@loader.load(record) unless @job.dry_run?
|
57
|
-
rescue Chronicle::ETL::TransformationError => e
|
58
|
-
Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
|
59
|
-
ensure
|
71
|
+
process_extraction(extraction)
|
60
72
|
@progress_bar.increment
|
61
73
|
end
|
62
74
|
|
63
75
|
@progress_bar.finish
|
76
|
+
|
77
|
+
# This is typically a slow method (writing to stdout, writing a big file, etc)
|
78
|
+
# TODO: consider adding a spinner?
|
64
79
|
@loader.finish
|
65
80
|
@job_logger.finish
|
66
|
-
|
67
|
-
|
68
|
-
|
69
|
-
|
70
|
-
|
81
|
+
end
|
82
|
+
|
83
|
+
def process_extraction(extraction)
|
84
|
+
# For each extraction from our extractor, we create a new transformer
|
85
|
+
transformer = @job.instantiate_transformer(extraction)
|
86
|
+
|
87
|
+
# And then transform that record, logging it if we're in debug log level
|
88
|
+
record = transformer.transform
|
89
|
+
Chronicle::ETL::Logger.debug(tty_log_transformation(transformer))
|
90
|
+
@job_logger.log_transformation(transformer)
|
91
|
+
|
92
|
+
# Then send the results to the loader
|
93
|
+
@loader.load(record) unless @job.dry_run?
|
94
|
+
rescue Chronicle::ETL::TransformationError => e
|
95
|
+
# TODO: have an option to cancel job if we encounter an error
|
96
|
+
Chronicle::ETL::Logger.error(tty_log_transformation_failure(e, transformer))
|
71
97
|
end
|
72
98
|
|
73
99
|
def finish_job
|
@@ -77,7 +103,7 @@ class Chronicle::ETL::Runner
|
|
77
103
|
Chronicle::ETL::Logger.info(tty_log_completion)
|
78
104
|
end
|
79
105
|
|
80
|
-
def
|
106
|
+
def tty_log_job_initialize
|
81
107
|
output = "Beginning job "
|
82
108
|
output += "'#{@job.name}'".bold if @job.name
|
83
109
|
output
|
@@ -95,8 +121,9 @@ class Chronicle::ETL::Runner
|
|
95
121
|
|
96
122
|
def tty_log_completion
|
97
123
|
status = @job_logger.success ? 'Success' : 'Failed'
|
98
|
-
|
99
|
-
output
|
124
|
+
job_completion = @job_logger.success ? 'Completed' : 'Partially completed'
|
125
|
+
output = "\n#{job_completion} job"
|
126
|
+
output += " '#{@job.name}'".bold if @job.name
|
100
127
|
output += " in #{ChronicDuration.output(@job_logger.duration)}" if @job_logger.duration
|
101
128
|
output += "\n Status:\t".light_black + status
|
102
129
|
output += "\n Completed:\t".light_black + "#{@job_logger.job_log.num_records_processed}"
|
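`Runner#run!` now translates low-level failures into runner-level errors and always calls `finish_job` from an `ensure` clause. Here is a minimal sketch of that rescue/ensure layout. The error classes are local stand-ins (the `exceptions.rb` hunk is not shown in this diff, so their real hierarchy is an assumption), and `run_job` with its `log` array is a hypothetical harness.

```ruby
# Local stand-ins for the gem's error classes; the hierarchy is assumed.
class ExtractionError < StandardError; end
class RunnerError < StandardError; end
class RunInterruptedError < RunnerError; end

# Hypothetical harness: runs the block, records what happened in `log`,
# and re-raises failures as runner-level errors.
def run_job(log)
  yield
  log << :finished_ok
rescue ExtractionError => e
  log << :error
  raise RunnerError, "Extraction failed. #{e.message}"
rescue Interrupt
  # Interrupt is not a StandardError, so it needs its own clause.
  log << :error
  raise RunInterruptedError, "Job interrupted."
rescue StandardError => e
  log << :error
  raise RunnerError, "Error running job. #{e.message}"
ensure
  # Runs on success and on every failure path, like finish_job in the diff.
  log << :cleanup
end

log = []
begin
  run_job(log) { raise ExtractionError, "source unavailable" }
rescue RunnerError => e
  puts e.message
end
p log
```

The ordering of the clauses matters: the specific `ExtractionError` clause has to come before the generic `StandardError` one, otherwise the generic message would always win.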
data/lib/chronicle/etl/transformers/transformer.rb
CHANGED
@@ -10,6 +10,10 @@ module Chronicle
|
|
10
10
|
# options::
|
11
11
|
# Options for configuring this Transformer
|
12
12
|
def initialize(extraction, options = {})
|
13
|
+
unless extraction.is_a?(Chronicle::ETL::Extraction)
|
14
|
+
raise Chronicle::ETL::RunnerTypeError, "Extracted should be a Chronicle::ETL::Extraction"
|
15
|
+
end
|
16
|
+
|
13
17
|
@extraction = extraction
|
14
18
|
apply_options(options)
|
15
19
|
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: chronicle-etl
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.5.
|
4
|
+
version: 0.5.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Andrew Louis
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2022-
|
11
|
+
date: 2022-04-04 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: activesupport
|
@@ -396,6 +396,7 @@ files:
|
|
396
396
|
- lib/chronicle/etl/job_logger.rb
|
397
397
|
- lib/chronicle/etl/loaders/csv_loader.rb
|
398
398
|
- lib/chronicle/etl/loaders/helpers/encoding_helper.rb
|
399
|
+
- lib/chronicle/etl/loaders/helpers/stdout_helper.rb
|
399
400
|
- lib/chronicle/etl/loaders/json_loader.rb
|
400
401
|
- lib/chronicle/etl/loaders/loader.rb
|
401
402
|
- lib/chronicle/etl/loaders/rest_loader.rb
|