chronicle-etl 0.5.1 → 0.5.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +34 -6
- data/lib/chronicle/etl/cli/jobs.rb +28 -12
- data/lib/chronicle/etl/config.rb +6 -0
- data/lib/chronicle/etl/loaders/csv_loader.rb +10 -13
- data/lib/chronicle/etl/loaders/helpers/stdout_helper.rb +36 -0
- data/lib/chronicle/etl/loaders/json_loader.rb +43 -11
- data/lib/chronicle/etl/loaders/loader.rb +1 -0
- data/lib/chronicle/etl/version.rb +1 -1
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: b8faa084cfe4a9f080ee5494c69b268b78bfa8f3502354e740264e6941f13daf
+data.tar.gz: 1bf4f2751c71cadedc78a2fe3ed5b09bf86cd601a909e2fa2db0a0de8cc2c21d
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: ff10779b663a3321b779fb03e07249856174d96fb96e405ae906a47441c288d6a245c852525801ba250cce1125cf05c523ef4ec75fdfb4335cef9003091437ed
+data.tar.gz: 509f6f92e95341d212c54b6b000bc54e8ba03898497191a3e5d3b14db7bff3ed625d0fee403888fbb6103c1edc14de66b215d9aa84ddb68cefcf51c0e6c74138
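`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. The sketch below shows one way to recompute the SHA256 values with Ruby's standard library so they can be compared against the entries above. `archive_digests` is a hypothetical helper, not part of chronicle-etl, and it reads each archive fully into memory.

```ruby
require 'digest'
require 'rubygems/package'

# Hypothetical helper: recompute the SHA256 digest of each archive inside a
# .gem file so it can be compared with the values listed in checksums.yaml.
def archive_digests(gem_io, names = %w[metadata.gz data.tar.gz])
  digests = {}
  # A .gem file is a plain tar archive; walk its entries.
  Gem::Package::TarReader.new(gem_io).each do |entry|
    next unless names.include?(entry.full_name)

    # .to_s guards against older RubyGems returning nil for empty entries.
    digests[entry.full_name] = Digest::SHA256.hexdigest(entry.read.to_s)
  end
  digests
end

# Example (path is illustrative):
#   File.open("chronicle-etl-0.5.2.gem", "rb") { |f| archive_digests(f) }
```

Swapping `Digest::SHA256` for `Digest::SHA512` gives the values in the `SHA512:` section.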
data/README.md
CHANGED
@@ -16,12 +16,21 @@ If you don’t want to spend all your time writing scrapers, reverse-engineering
 * **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are stored in a common schema. Don’t want to use the schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.
 
 ## Installation
+
+Using homebrew:
+```sh
+$ brew install chronicle-app/etl/chronicle-etl
+
+```
+Using rubygems:
 ```sh
-
-gem install chronicle-etl
+$ gem install chronicle-etl
 ```
 
-
+Confirm it installed successfully:
+```sh
+$ chronicle-etl --version
+```
 
 ## Basic usage and running jobs
 
@@ -50,7 +59,6 @@ $ chronicle-etl -e pinboard --since 1mo # Used automatically based on plugin nam
 ### Common options
 ```sh
 Options:
--j, [--name=NAME] # Job configuration name
 -e, [--extractor=NAME] # Extractor class. Default: stdin
 [--extractor-opts=key:value] # Extractor options
 -t, [--transformer=NAME] # Transformer class. Default: null
@@ -71,6 +79,26 @@ Options:
 [--silent], [--no-silent] # Silence all output
 ```
 
+### Saving jobs
+
+You can save details about a job to a local config file (saved by default in `~/.config/chronicle/etl/jobs/job_name.yml`) to save yourself the trouble of setting the CLI flags for each run.
+
+```sh
+# Save a job named 'sample' to ~/.config/chronicle/etl/jobs/sample.yml
+$ chronicle-etl jobs:save sample --extractor pinboard --since 10d
+
+# Show details about the job
+$ chronicle-etl jobs:show sample
+
+# Run the job
+$ chronicle-etl jobs:run sample
+# Or more simply:
+$ chronicle-etl sample
+
+# Show all saved jobs
+$ chronicle-etl jobs:list
+```
+
 ## Connectors
 Connectors are available to read, process, and load data from different formats or external services.
 
@@ -97,7 +125,7 @@ $ chronicle-etl connectors:list
 - [`rest`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/rest_loader.rb) - Serialize records with [JSONAPI](https://jsonapi.org/) and send to a REST API
 
 ## Chronicle Plugins
-Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through
+Plugins provide access to data from third-party platforms, services, or formats. Plugins are packaged as separate rubygems and can be installed through the CLI (which installs the Gems under the hood).
 
 ### Plugin usage
 
@@ -131,6 +159,7 @@ If you don't see a plugin for a third-party provider or data source that you're
 | [email](https://github.com/chronicle-app/chronicle-email) | Emails and attachments from IMAP or .mbox files | Available (still needs IMAP support) |
 | [pinboard](https://github.com/chronicle-app/chronicle-email) | Bookmarks and tags | Available |
 | [safari](https://github.com/chronicle-app/chronicle-safari) | Browser history from local sqlite db | Available |
+| [github](https://github.com/chronicle-app/chronicle-github) | Github activity stream | Available |
 
 #### Coming soon
 
@@ -206,7 +235,6 @@ $ chronicle-etl secrets:unset pinboard access_token
 
 ## Roadmap
 
-- Add **homebrew formula** for easier installation. #13
 - Keep tackling **new plugins**. See: [Chronicle Plugin Tracker](https://github.com/orgs/chronicle-app/projects/1)
 - Add support for **incremental extractions** #37
 - **Improve stdin extractor and shell command transformer** (#5) so that users can easily integrate their own scripts/tools into jobs
data/lib/chronicle/etl/cli/jobs.rb
CHANGED
@@ -9,8 +9,6 @@ module Chronicle
 default_task "start"
 namespace :jobs
 
-class_option :name, aliases: '-j', desc: 'Job configuration name'
-
 class_option :extractor, aliases: '-e', desc: "Extractor class. Default: stdin", banner: 'NAME'
 class_option :'extractor-opts', desc: 'Extractor options', type: :hash, default: {}
 class_option :transformer, aliases: '-t', desc: 'Transformer class. Default: null', banner: 'NAME'
@@ -44,8 +42,8 @@ module Chronicle
 If you do not want to use the command line flags, you can also configure a job with a .yml config file. You can either specify the path to this file or use the filename and place the file in ~/.config/chronicle/etl/jobs/NAME.yml and call it with `--job NAME`
 LONG_DESC
 # Run an ETL job
-def start
-job_definition = build_job_definition(options)
+def start(name = nil)
+job_definition = build_job_definition(name, options)
 
 if job_definition.plugins_missing?
 missing_plugins = job_definition.errors[:plugins]
@@ -64,21 +62,39 @@ LONG_DESC
 cli_fail(message: "Error running job.\n#{message}", exception: e)
 end
 
-
+option :'skip-confirmation', aliases: '-y', type: :boolean
+desc "save", "Save a job"
 # Create an ETL job
-def
-
+def save(name)
+write_config = true
+job_definition = build_job_definition(name, options)
 job_definition.validate!
 
-Chronicle::ETL::Config.
+if Chronicle::ETL::Config.exists?("jobs", name) && !options[:'skip-confirmation']
+prompt = TTY::Prompt.new
+write_config = false
+message = "Job '#{name}' exists already. Ovewrite it?"
+begin
+write_config = prompt.yes?(message)
+rescue TTY::Reader::InputInterrupt
+end
+end
+
+if write_config
+Chronicle::ETL::Config.write("jobs", name, job_definition.definition)
+cli_exit(message: "Job saved. Run it with `$chronicle-etl jobs:run #{name}`")
+else
+cli_fail(message: "\nJob not saved")
+end
+
 rescue Chronicle::ETL::JobDefinitionError => e
 cli_fail(message: "Job definition error", exception: e)
 end
 
 desc "show", "Show details about a job"
 # Show an ETL job
-def show
-job_definition = build_job_definition(options)
+def show(name = nil)
+job_definition = build_job_definition(name, options)
 job_definition.validate!
 puts Chronicle::ETL::Job.new(job_definition)
 rescue Chronicle::ETL::JobDefinitionError => e
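The new `jobs:save` command writes the config only when no job with that name exists yet, when `--skip-confirmation` (`-y`) is passed, or when the user answers yes to the overwrite prompt; interrupting the prompt leaves `write_config` false. The sketch below pulls that decision out into a plain Ruby method so it can be exercised without a terminal. `write_config?`, its keyword arguments, and `PromptInterrupted` (a stand-in for `TTY::Reader::InputInterrupt`) are hypothetical names, not part of chronicle-etl.

```ruby
# Hypothetical stand-in for TTY::Reader::InputInterrupt.
class PromptInterrupted < StandardError; end

# Decide whether `jobs:save` should write the job config.
# `confirm` is any callable returning true/false (e.g. a wrapper around a prompt).
def write_config?(exists:, skip_confirmation:, confirm:)
  return true unless exists          # new job: just write it
  return true if skip_confirmation   # -y: overwrite without asking

  confirm.call                       # otherwise ask the user
rescue PromptInterrupted
  # Ctrl-C at the prompt counts as "don't overwrite".
  false
end
```

Injecting `confirm` keeps the branching testable; in the CLI it would wrap `prompt.yes?(message)`.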
@@ -136,9 +152,9 @@ LONG_DESC
 end
 
 # Create job definition by reading config file and then overwriting with flag options
-def build_job_definition(options)
+def build_job_definition(name, options)
 definition = Chronicle::ETL::JobDefinition.new
-definition.add_config(load_job_config(
+definition.add_config(load_job_config(name))
 definition.add_config(process_flag_options(options).transform_keys(&:to_sym))
 definition
 end
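`build_job_definition` now receives the job name as a positional argument (the `--name`/`-j` class option is removed), loads that saved config first, and then adds the CLI flag options on top. The diff does not show `JobDefinition#add_config`, so the sketch below assumes it behaves like a shallow `Hash#merge` in which later values win; `load_saved_job`, `build_definition`, and the flat `extractor`/`since` keys are hypothetical.

```ruby
require 'yaml'

# Hypothetical stand-in for load_job_config(name): no saved job yields {},
# otherwise the YAML contents with symbolized keys.
def load_saved_job(yaml_text)
  return {} if yaml_text.nil?

  YAML.safe_load(yaml_text).transform_keys(&:to_sym)
end

# Saved config first, then CLI flags. Assumes a shallow merge where
# flag values override saved values for the same key.
def build_definition(saved_yaml, flag_options)
  definition = {}
  definition.merge!(load_saved_job(saved_yaml))
  definition.merge!(flag_options.transform_keys(&:to_sym))
  definition
end
```

Under that assumption, `chronicle-etl jobs:run sample --since 1mo` would keep the saved extractor but replace the saved `since` value.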
data/lib/chronicle/etl/config.rb
CHANGED
@@ -28,6 +28,12 @@ module Chronicle
 end
 end
 
+def exists?(type, identifier)
+base = config_pathname_for_type(type)
+path = base.join("#{identifier}.yml")
+return path.exist?
+end
+
 # Returns all jobs available in ~/.config/chronicle/etl/jobs/*.yml
 def available_jobs
 Dir.glob(File.join(config_pathname_for_type("jobs"), "*.yml")).map do |filename|
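`Config.exists?` resolves `<config dir>/<type>/<identifier>.yml` and reports whether that file is present, which is what `jobs:save` checks before prompting to overwrite. A self-contained sketch of the same lookup, plus a listing in the spirit of `available_jobs`, using an explicit root directory instead of `~/.config/chronicle/etl`. `job_exists?` and `job_names` are hypothetical names, and stripping the `.yml` extension in the listing is an assumption, since the hunk stops at `.map do |filename|`.

```ruby
require 'pathname'

# Hypothetical: mirror Config.exists?(type, identifier) under an explicit root.
def job_exists?(config_root, type, identifier)
  Pathname.new(config_root).join(type, "#{identifier}.yml").exist?
end

# Hypothetical: list saved items of a type by file name, without the extension.
def job_names(config_root, type)
  Dir.glob(File.join(config_root, type, "*.yml"))
     .map { |path| File.basename(path, ".yml") }
     .sort
end

# Example: job_exists?(File.expand_path("~/.config/chronicle/etl"), "jobs", "sample")
```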
data/lib/chronicle/etl/loaders/csv_loader.rb
CHANGED
@@ -3,11 +3,13 @@ require 'csv'
 module Chronicle
 module ETL
 class CSVLoader < Chronicle::ETL::Loader
+include Chronicle::ETL::Loaders::Helpers::StdoutHelper
+
 register_connector do |r|
 r.description = 'CSV'
 end
 
-setting :output
+setting :output
 setting :headers, default: true
 setting :header_row, default: true
 
@@ -30,16 +32,7 @@ module Chronicle
 csv_options[:headers] = headers
 end
 
-
-# This might seem like a duplication of the default value ($stdout)
-# but it's because rspec overwrites $stdout (in helper #capture) to
-# capture output.
-io = $stdout.dup
-else
-io = File.open(@config.output, "w+")
-end
-
-output = CSV.generate(**csv_options) do |csv|
+csv_output = CSV.generate(**csv_options) do |csv|
 records.each do |record|
 csv << record
 .transform_keys(&:to_sym)
@@ -48,8 +41,12 @@ module Chronicle
 end
 end
 
-io
-
+# TODO: just write to io directly
+if output_to_stdout?
+write_to_stdout(csv_output)
+else
+File.write(@config.output, csv_output)
+end
 end
 end
 end
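After this change the CSV loader builds the whole document in memory with `CSV.generate` and then either passes it to `write_to_stdout` or writes it with `File.write` when an `output` path is set (the TODO notes it could write to the IO directly instead). A standalone sketch of that flow follows. `write_csv` and its keyword arguments are hypothetical, and building each row with `values_at(*headers)` is an assumption, since the hunk elides how a record becomes a row after `.transform_keys(&:to_sym)`.

```ruby
require 'csv'

# Hypothetical sketch: render hashes to CSV in memory, then send the text
# either to an IO (stdout by default) or to a file path.
def write_csv(records, headers:, output: nil, io: $stdout)
  csv_output = CSV.generate(headers: headers.map(&:to_s), write_headers: true) do |csv|
    records.each do |record|
      # Normalize keys to symbols, then order the values by the header list.
      csv << record.transform_keys(&:to_sym).values_at(*headers)
    end
  end

  if output.nil?
    io.write(csv_output)
    io.flush
  else
    File.write(output, csv_output)
  end
end
```

Generating the full string first keeps the two destinations symmetric, at the cost of holding every row in memory.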
data/lib/chronicle/etl/loaders/helpers/stdout_helper.rb
@@ -0,0 +1,36 @@
+require 'tempfile'
+
+module Chronicle
+module ETL
+module Loaders
+module Helpers
+module StdoutHelper
+# TODO: let users use "stdout" as an option for the `output` setting
+# Assume we're using stdout if no output is specified
+def output_to_stdout?
+!@config.output
+end
+
+def create_stdout_temp_file
+file = Tempfile.new('chronicle-stdout')
+file.unlink
+file
+end
+
+def write_to_stdout_from_temp_file(file)
+file.rewind
+write_to_stdout(file.read)
+end
+
+def write_to_stdout(output)
+# We .dup because rspec overwrites $stdout (in helper #capture) to
+# capture output.
+stdout = $stdout.dup
+stdout.write(output)
+stdout.flush
+end
+end
+end
+end
+end
+end
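`StdoutHelper` lets a loader spool its output into an anonymous temp file (`Tempfile.new` immediately followed by `unlink`, so no file is left on disk) and copy the whole result to stdout at the end. A self-contained sketch of that spool-then-flush pattern; `spool_to` is a hypothetical name, and it takes the destination IO as a parameter (defaulting to `$stdout`) instead of calling `$stdout.dup`, which makes it easy to test.

```ruby
require 'tempfile'

# Hypothetical sketch of the spool-then-flush pattern used by StdoutHelper.
def spool_to(io = $stdout)
  file = Tempfile.new('chronicle-stdout')
  # Remove the directory entry right away; on POSIX systems the open
  # handle stays readable and writable, and nothing remains on disk.
  file.unlink
  yield file # the caller writes its chunks into the temp file

  # Rewind and copy everything to the destination in one write.
  file.rewind
  io.write(file.read)
  io.flush
ensure
  file&.close
end
```

For example, `spool_to { |f| f.puts("a"); f.puts("b") }` prints both lines only after the block finishes.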
data/lib/chronicle/etl/loaders/json_loader.rb
CHANGED
@@ -1,22 +1,35 @@
+require 'tempfile'
+
 module Chronicle
 module ETL
 class JSONLoader < Chronicle::ETL::Loader
+include Chronicle::ETL::Loaders::Helpers::StdoutHelper
+
 register_connector do |r|
 r.description = 'json'
 end
 
 setting :serializer
-setting :output
+setting :output
+
+# If true, one JSON record per line. If false, output a single json
+# object with an array of records
+setting :line_separated, default: true, type: :boolean
+
+def initialize(*args)
+super
+@first_line = true
+end
 
 def start
-
-
-
-
-
-
-
-
+@output_file =
+if output_to_stdout?
+create_stdout_temp_file
+else
+File.open(@config.output, "w+")
+end
+
+@output_file.puts("[\n") unless @config.line_separated
 end
 
 def load(record)
@@ -30,15 +43,34 @@ module Chronicle
 
 force_utf8(value)
 end
-
+
+line = encoded.to_json
+# For line-separated output, we just put json + newline
+if @config.line_separated
+line = "#{line}\n"
+# Otherwise, we add a comma and newline and then add record to the
+# array we created in #start (unless it's the first line).
+else
+line = ",\n#{line}" unless @first_line
+end
+
+@output_file.write(line)
+
+@first_line = false
 end
 
 def finish
-
+# Close the array unless we're doing line-separated JSON
+@output_file.puts("\n]") unless @config.line_separated
+
+write_to_stdout_from_temp_file(@output_file) if output_to_stdout?
+
+@output_file.close
 end
 
 private
 
+# TODO: implement this
 def serializer
 @config.serializer || Chronicle::ETL::RawSerializer
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: chronicle-etl
 version: !ruby/object:Gem::Version
-version: 0.5.
+version: 0.5.2
 platform: ruby
 authors:
 - Andrew Louis
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2022-03-
+date: 2022-03-30 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: activesupport
@@ -396,6 +396,7 @@ files:
 - lib/chronicle/etl/job_logger.rb
 - lib/chronicle/etl/loaders/csv_loader.rb
 - lib/chronicle/etl/loaders/helpers/encoding_helper.rb
+- lib/chronicle/etl/loaders/helpers/stdout_helper.rb
 - lib/chronicle/etl/loaders/json_loader.rb
 - lib/chronicle/etl/loaders/loader.rb
 - lib/chronicle/etl/loaders/rest_loader.rb