chronicle-etl 0.5.1 → 0.5.2
- checksums.yaml +4 -4
- data/README.md +34 -6
- data/lib/chronicle/etl/cli/jobs.rb +28 -12
- data/lib/chronicle/etl/config.rb +6 -0
- data/lib/chronicle/etl/loaders/csv_loader.rb +10 -13
- data/lib/chronicle/etl/loaders/helpers/stdout_helper.rb +36 -0
- data/lib/chronicle/etl/loaders/json_loader.rb +43 -11
- data/lib/chronicle/etl/loaders/loader.rb +1 -0
- data/lib/chronicle/etl/version.rb +1 -1
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b8faa084cfe4a9f080ee5494c69b268b78bfa8f3502354e740264e6941f13daf
+  data.tar.gz: 1bf4f2751c71cadedc78a2fe3ed5b09bf86cd601a909e2fa2db0a0de8cc2c21d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ff10779b663a3321b779fb03e07249856174d96fb96e405ae906a47441c288d6a245c852525801ba250cce1125cf05c523ef4ec75fdfb4335cef9003091437ed
+  data.tar.gz: 509f6f92e95341d212c54b6b000bc54e8ba03898497191a3e5d3b14db7bff3ed625d0fee403888fbb6103c1edc14de66b215d9aa84ddb68cefcf51c0e6c74138
data/README.md
CHANGED
@@ -16,12 +16,21 @@ If you don’t want to spend all your time writing scrapers, reverse-engineering
 * **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.
 
 ## Installation
+
+Using homebrew:
+```sh
+$ brew install chronicle-app/etl/chronicle-etl
+
+```
+Using rubygems:
 ```sh
-
-gem install chronicle-etl
+$ gem install chronicle-etl
 ```
 
-
+Confirm it installed successfully:
+```sh
+$ chronicle-etl --version
+```
 
 ## Basic usage and running jobs
 
@@ -50,7 +59,6 @@ $ chronicle-etl -e pinboard --since 1mo # Used automatically based on plugin nam
 ### Common options
 ```sh
 Options:
-  -j, [--name=NAME] # Job configuration name
   -e, [--extractor=NAME] # Extractor class. Default: stdin
       [--extractor-opts=key:value] # Extractor options
   -t, [--transformer=NAME] # Transformer class. Default: null
@@ -71,6 +79,26 @@ Options:
       [--silent], [--no-silent] # Silence all output
 ```
 
+### Saving jobs
+
+You can save details about a job to a local config file (saved by default in `~/.config/chronicle/etl/jobs/job_name.yml`) to save yourself the trouble of setting the CLI flags for each run.
+
+```sh
+# Save a job named 'sample' to ~/.config/chronicle/etl/jobs/sample.yml
+$ chronicle-etl jobs:save sample --extractor pinboard --since 10d
+
+# Show details about the job
+$ chronicle-etl jobs:show sample
+
+# Run the job
+$ chronicle-etl jobs:run sample
+# Or more simply:
+$ chronicle-etl sample
+
+# Show all saved jobs
+$ chronicle-etl jobs:list
+```
+
 ## Connectors
 Connectors are available to read, process, and load data from different formats or external services.
 
@@ -97,7 +125,7 @@ $ chronicle-etl connectors:list
 - [`rest`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/rest_loader.rb) - Serialize records with [JSONAPI](https://jsonapi.org/) and send to a REST API
 
 ## Chronicle Plugins
-Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through
+Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through the CLI (which installs the Gems under the hood).
 
 ### Plugin usage
 
@@ -131,6 +159,7 @@ If you don't see a plugin for a third-party provider or data source that you're
 | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
 | [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
 | [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
+| [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |
 
 #### Coming soon
 
@@ -206,7 +235,6 @@ $ chronicle-etl secrets:unset pinboard access_token
 
 ## Roadmap
 
-- Add **homebrew formula** for easier installation. #13
 - Keep tackling **new plugins**. See: [Chronicle Plugin Tracker](https://github.com/orgs/chronicle-app/projects/1)
 - Add support for **incremental extractions** #37
 - **Improve stdin extractor and shell command transformer** (#5) so that users can easily integrate their own scripts/tools into jobs
data/lib/chronicle/etl/cli/jobs.rb
CHANGED
@@ -9,8 +9,6 @@ module Chronicle
         default_task "start"
         namespace :jobs
 
-        class_option :name, aliases: '-j', desc: 'Job configuration name'
-
         class_option :extractor, aliases: '-e', desc: "Extractor class. Default: stdin", banner: 'NAME'
         class_option :'extractor-opts', desc: 'Extractor options', type: :hash, default: {}
         class_option :transformer, aliases: '-t', desc: 'Transformer class. Default: null', banner: 'NAME'
@@ -44,8 +42,8 @@ module Chronicle
           If you do not want to use the command line flags, you can also configure a job with a .yml config file. You can either specify the path to this file or use the filename and place the file in ~/.config/chronicle/etl/jobs/NAME.yml and call it with `--job NAME`
 LONG_DESC
         # Run an ETL job
-        def start
-          job_definition = build_job_definition(options)
+        def start(name = nil)
+          job_definition = build_job_definition(name, options)
 
           if job_definition.plugins_missing?
             missing_plugins = job_definition.errors[:plugins]
@@ -64,21 +62,39 @@ LONG_DESC
           cli_fail(message: "Error running job.\n#{message}", exception: e)
         end
 
-
+        option :'skip-confirmation', aliases: '-y', type: :boolean
+        desc "save", "Save a job"
         # Create an ETL job
-        def
-
+        def save(name)
+          write_config = true
+          job_definition = build_job_definition(name, options)
           job_definition.validate!
 
-          Chronicle::ETL::Config.
+          if Chronicle::ETL::Config.exists?("jobs", name) && !options[:'skip-confirmation']
+            prompt = TTY::Prompt.new
+            write_config = false
+            message = "Job '#{name}' exists already. Overwrite it?"
+            begin
+              write_config = prompt.yes?(message)
+            rescue TTY::Reader::InputInterrupt
+            end
+          end
+
+          if write_config
+            Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
+            cli_exit(message: "Job saved. Run it with `$chronicle-etl jobs:run #{name}`")
+          else
+            cli_fail(message: "\nJob not saved")
+          end
+
         rescue Chronicle::ETL::JobDefinitionError => e
           cli_fail(message: "Job definition error", exception: e)
         end
 
         desc "show", "Show details about a job"
         # Show an ETL job
-        def show
-          job_definition = build_job_definition(options)
+        def show(name = nil)
+          job_definition = build_job_definition(name, options)
           job_definition.validate!
           puts Chronicle::ETL::Job.new(job_definition)
         rescue Chronicle::ETL::JobDefinitionError => e
@@ -136,9 +152,9 @@ LONG_DESC
         end
 
         # Create job definition by reading config file and then overwriting with flag options
-        def build_job_definition(options)
+        def build_job_definition(name, options)
           definition = Chronicle::ETL::JobDefinition.new
-          definition.add_config(load_job_config(
+          definition.add_config(load_job_config(name))
           definition.add_config(process_flag_options(options).transform_keys(&:to_sym))
           definition
         end
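The overwrite check that `save` gained in this release reduces to a small decision: write when the job is new, when `--skip-confirmation` is passed, or when the user answers yes. A minimal sketch of that decision as a standalone method follows. `write_config?` and its keyword arguments are hypothetical names, not part of the gem, a lambda stands in for `TTY::Prompt#yes?`, and `Interrupt` stands in for `TTY::Reader::InputInterrupt`, so the snippet runs without extra gems.

```ruby
# Hypothetical restatement of the overwrite check in `save` (not gem code).
# `confirm` is any callable returning true/false; it stands in for
# TTY::Prompt#yes? so that no prompt library is needed here.
def write_config?(exists:, skip_confirmation:, confirm:)
  # New jobs, or runs with --skip-confirmation, are written without asking.
  return true if !exists || skip_confirmation

  # Otherwise the user's answer decides; an interrupted prompt counts as "no".
  begin
    confirm.call ? true : false
  rescue Interrupt
    false
  end
end

puts write_config?(exists: false, skip_confirmation: false, confirm: -> { false })
puts write_config?(exists: true, skip_confirmation: true, confirm: -> { false })
puts write_config?(exists: true, skip_confirmation: false, confirm: -> { false })
```

Passing the prompt in as a callable is only a convenience for the sketch; the gem calls `TTY::Prompt` directly.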
data/lib/chronicle/etl/config.rb
CHANGED
@@ -28,6 +28,12 @@ module Chronicle
         end
       end
 
+      def exists?(type, identifier)
+        base = config_pathname_for_type(type)
+        path = base.join("#{identifier}.yml")
+        return path.exist?
+      end
+
       # Returns all jobs available in ~/.config/chronicle/etl/jobs/*.yml
       def available_jobs
         Dir.glob(File.join(config_pathname_for_type("jobs"), "*.yml")).map do |filename|
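The new `exists?` check is a plain path test: join `IDENTIFIER.yml` onto the directory for the config type and ask whether that file is there. A minimal standalone sketch, assuming the base directory is passed in directly (the gem resolves it through `config_pathname_for_type`, which this diff does not show); `config_exists?` is a hypothetical name:

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical stand-in for Config.exists?. The base directory is an explicit
# argument here instead of being looked up from the config type.
def config_exists?(base_dir, identifier)
  Pathname.new(base_dir).join("#{identifier}.yml").exist?
end

Dir.mktmpdir do |dir|
  # The file contents do not matter for the check, only the file's presence.
  File.write(File.join(dir, "sample.yml"), "")

  puts config_exists?(dir, "sample")
  puts config_exists?(dir, "missing")
end
```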
data/lib/chronicle/etl/loaders/csv_loader.rb
CHANGED
@@ -3,11 +3,13 @@ require 'csv'
 module Chronicle
   module ETL
     class CSVLoader < Chronicle::ETL::Loader
+      include Chronicle::ETL::Loaders::Helpers::StdoutHelper
+
       register_connector do |r|
         r.description = 'CSV'
       end
 
-      setting :output
+      setting :output
       setting :headers, default: true
       setting :header_row, default: true
 
@@ -30,16 +32,7 @@ module Chronicle
           csv_options[:headers] = headers
         end
 
-
-          # This might seem like a duplication of the default value ($stdout)
-          # but it's because rspec overwrites $stdout (in helper #capture) to
-          # capture output.
-          io = $stdout.dup
-        else
-          io = File.open(@config.output, "w+")
-        end
-
-        output = CSV.generate(**csv_options) do |csv|
+        csv_output = CSV.generate(**csv_options) do |csv|
           records.each do |record|
             csv << record
               .transform_keys(&:to_sym)
@@ -48,8 +41,12 @@ module Chronicle
           end
         end
 
-        io
-
+        # TODO: just write to io directly
+        if output_to_stdout?
+          write_to_stdout(csv_output)
+        else
+          File.write(@config.output, csv_output)
+        end
       end
     end
   end
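After this change the CSV loader builds the whole document as a string with `CSV.generate` and only then decides where it goes. A minimal sketch of that two-step shape, using made-up sample records; `emit` is a hypothetical helper and the sample data is not from the gem:

```ruby
require 'csv'

# Sample data for illustration only.
records = [
  { "id" => 1, "title" => "first" },
  { "id" => 2, "title" => "second" }
]
headers = %w[id title]

# Step 1: build the full CSV document in memory.
csv_output = CSV.generate(headers: headers, write_headers: true) do |csv|
  records.each { |record| csv << record.values_at(*headers) }
end

# Step 2: write it to stdout when no path is given, else to the file.
# Hypothetical helper mirroring the if/else at the end of the loader.
def emit(csv_output, output_path = nil)
  if output_path.nil?
    $stdout.write(csv_output)
  else
    File.write(output_path, csv_output)
  end
end

emit(csv_output)
```

As the `TODO` in the diff notes, buffering the whole string is a stopgap; streaming rows to the IO would avoid holding the full document in memory.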
data/lib/chronicle/etl/loaders/helpers/stdout_helper.rb
ADDED
@@ -0,0 +1,36 @@
+require 'tempfile'
+
+module Chronicle
+  module ETL
+    module Loaders
+      module Helpers
+        module StdoutHelper
+          # TODO: let users use "stdout" as an option for the `output` setting
+          # Assume we're using stdout if no output is specified
+          def output_to_stdout?
+            !@config.output
+          end
+
+          def create_stdout_temp_file
+            file = Tempfile.new('chronicle-stdout')
+            file.unlink
+            file
+          end
+
+          def write_to_stdout_from_temp_file(file)
+            file.rewind
+            write_to_stdout(file.read)
+          end
+
+          def write_to_stdout(output)
+            # We .dup because rspec overwrites $stdout (in helper #capture) to
+            # capture output.
+            stdout = $stdout.dup
+            stdout.write(output)
+            stdout.flush
+          end
+        end
+      end
+    end
+  end
+end
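The helper buffers output in an unlinked temp file and replays it to stdout at the end. A minimal sketch of that create, unlink, write, rewind, read cycle using only the standard library; the sample strings are illustrative:

```ruby
require 'tempfile'

# Create the temp file, then unlink it right away. On POSIX systems the open
# handle stays usable and the file disappears once the handle is closed.
file = Tempfile.new('chronicle-stdout')
file.unlink

# Buffer some output in the file.
file.write("one\n")
file.write("two\n")

# Rewind to the start before reading everything back.
file.rewind
buffered = file.read

# Replay the buffer on a duplicate of $stdout, as the helper does.
stdout = $stdout.dup
stdout.write(buffered)
stdout.flush

file.close
```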
data/lib/chronicle/etl/loaders/json_loader.rb
CHANGED
@@ -1,22 +1,35 @@
+require 'tempfile'
+
 module Chronicle
   module ETL
     class JSONLoader < Chronicle::ETL::Loader
+      include Chronicle::ETL::Loaders::Helpers::StdoutHelper
+
       register_connector do |r|
         r.description = 'json'
       end
 
       setting :serializer
-      setting :output
+      setting :output
+
+      # If true, one JSON record per line. If false, output a single json
+      # object with an array of records
+      setting :line_separated, default: true, type: :boolean
+
+      def initialize(*args)
+        super
+        @first_line = true
+      end
 
       def start
-
-
-
-
-
-
-
-
+        @output_file =
+          if output_to_stdout?
+            create_stdout_temp_file
+          else
+            File.open(@config.output, "w+")
+          end
+
+        @output_file.puts("[\n") unless @config.line_separated
       end
 
       def load(record)
@@ -30,15 +43,34 @@ module Chronicle
 
           force_utf8(value)
         end
-
+
+        line = encoded.to_json
+        # For line-separated output, we just put json + newline
+        if @config.line_separated
+          line = "#{line}\n"
+        # Otherwise, we add a comma and newline and then add record to the
+        # array we created in #start (unless it's the first line).
+        else
+          line = ",\n#{line}" unless @first_line
+        end
+
+        @output_file.write(line)
+
+        @first_line = false
       end
 
       def finish
-
+        # Close the array unless we're doing line-separated JSON
+        @output_file.puts("\n]") unless @config.line_separated
+
+        write_to_stdout_from_temp_file(@output_file) if output_to_stdout?
+
+        @output_file.close
       end
 
       private
 
+      # TODO: implement this
       def serializer
         @config.serializer || Chronicle::ETL::RawSerializer
       end
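The loader now streams records in one of two shapes: one JSON document per line, or a single JSON array assembled piece by piece with a first-record flag so commas go between records. A minimal sketch of both modes against any IO; `write_records` is a hypothetical function that folds `start`, `load`, and `finish` into one loop for illustration:

```ruby
require 'json'
require 'stringio'

# Hypothetical condensation of JSONLoader#start/#load/#finish (not gem code).
def write_records(io, records, line_separated:)
  # Array mode opens the array up front.
  io.puts("[") unless line_separated

  first_line = true
  records.each do |record|
    line = record.to_json
    if line_separated
      # Line-separated mode: one JSON document per line.
      line = "#{line}\n"
    elsif !first_line
      # Array mode: every record after the first is preceded by a comma.
      line = ",\n#{line}"
    end
    io.write(line)
    first_line = false
  end

  # Array mode closes the array at the end.
  io.puts("\n]") unless line_separated
  io
end

sample = [{ "id" => 1 }, { "id" => 2 }]
puts write_records(StringIO.new, sample, line_separated: true).string
puts write_records(StringIO.new, sample, line_separated: false).string
```

Writing the comma before each later record, instead of after each record, means the writer never needs to know which record is last.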
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: chronicle-etl
 version: !ruby/object:Gem::Version
-  version: 0.5.
+  version: 0.5.2
 platform: ruby
 authors:
 - Andrew Louis
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2022-03-
+date: 2022-03-30 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport
@@ -396,6 +396,7 @@ files:
 - lib/chronicle/etl/job_logger.rb
 - lib/chronicle/etl/loaders/csv_loader.rb
 - lib/chronicle/etl/loaders/helpers/encoding_helper.rb
+- lib/chronicle/etl/loaders/helpers/stdout_helper.rb
 - lib/chronicle/etl/loaders/json_loader.rb
 - lib/chronicle/etl/loaders/loader.rb
 - lib/chronicle/etl/loaders/rest_loader.rb