hackboxen 0.1.0 → 0.1.1
- data/CHANGELOG.textile +10 -4
- data/README.textile +29 -38
- data/VERSION +1 -1
- data/hackboxen.gemspec +2 -2
- data/lib/hackboxen/tasks/icss.rb +1 -1
- data/lib/hackboxen/tasks/init.rb +9 -0
- data/lib/hackboxen/utils.rb +15 -2
- data/lib/hackboxen/utils/paths.rb +3 -15
- metadata +4 -4
data/CHANGELOG.textile
CHANGED
@@ -16,7 +16,7 @@ h3. Deprecations/Changes
 
 * Much old code was refactored or removed.
 
-h3. New Functionality
+h3. New Functionality (version 0.1.0)
 
 * Default Hackboxen paths can be accessed by using @path_to(:fixd_dir)@. See HackBoxen::Paths for using/adding others.
 
@@ -30,15 +30,21 @@ h3. New Functionality
 
 * A binary executable has been added, @hb-scaffold@ that can be run from anywhere and is designed to replace the rake task.
 
-h3.
+h3. Version 0.1.1
 
-* Make Hackboxen a gem. This will require the separation of the actual hackbox code from the hackboxen library (
+* Make Hackboxen a gem. This will require the separation of the actual hackbox code from the hackboxen library (implemented) and the creation of a coderoot to connect the hackbox library with the other code (implemented).
 
 * When Hackboxen is a gem, its version can be added to the requires hash in a hackbox @config.yaml@ so we can keep track of potentially breaking changes to legacy code.
 
+* The separation (completely) of a @config.yaml@ from an @icss.yaml@. (implemented)
+
+* A hackbox __name__ is now separate from a dataset __protocol__. Namespaces are still shared for organizational purposes, but a protocol directory is no longer created for the output.
+
+h3. Still Needing
+
 * Full spec coverage for the hackboxen library.
 
 * Implementation of @'hb:mini'@ and @ConfigValidator@. Some of this code is written, but it needs to be fleshed out and decided upon.
 
-
+
 
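The 0.1.1 changelog suggests adding the Hackboxen version to the `requires` hash of a hackbox `config.yaml` to track breaking changes. A minimal sketch of such a check follows. It assumes the hash maps gem names to RubyGems requirement strings. The `check_requires` helper and the YAML layout are hypothetical; only `Gem::Requirement` and `Gem::Version` are real RubyGems APIs.

```ruby
require 'yaml'
require 'rubygems'

# Hypothetical config.yaml fragment: the `requires` layout is an assumption.
CONFIG_YAML = <<~YAML
  requires:
    hackboxen: "~> 0.1.1"
YAML

# Hypothetical helper: returns the names of required gems that are either
# not installed or whose installed version does not satisfy the requirement.
def check_requires(config, installed_versions)
  (config['requires'] || {}).reject do |gem_name, requirement|
    installed = installed_versions[gem_name]
    installed && Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(installed))
  end.keys
end

config = YAML.safe_load(CONFIG_YAML)
p check_requires(config, 'hackboxen' => '0.1.1') # => []
p check_requires(config, 'hackboxen' => '0.1.0') # => ["hackboxen"]
```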
data/README.textile
CHANGED
@@ -14,9 +14,9 @@ h2. Hackbox Engine
 A hackbox engine contains:
 
 * @Rakefile@: **(required)** Used to read and combine all the sources of config metadata and execute @main@.
-* @Gemfile@: **(optional)** A list of gems necessary for
+* @Gemfile@: **(optional)** A list of gems necessary for this hackbox to run. Processed automatically by "Bundler":https://github.com/carlhuda/bundler.
 * @config/@: **(required)** A subdirectory containing:
-** @config.yaml@ **(required)** A
+** @config.yaml@ **(required)** A hackbox specific default configuration YAML file.
 ** @protocol.icss.yaml@ **(optional)** An "Icss":http://github.com/infochimps/icss schema file describing the output data and publishing targets.
 * @engine/@: **(required)** A subdirectory containing:
 ** @main@: **(required)** An executable data processing file. This may be written in any language.
@@ -46,28 +46,27 @@ The hackbox output directory is where all of the data that a hackbox acquires, r
 └── language
 └── corpora
 └── word_freq
-└── bnc
 ├── fixd
 │ ├── code
 │ │ └── bnc_endpoint.rb
-│
-│
-│ └── env
-│ └── working_environment.json
+│ └── data
+│ └── bnc_fixd_data.tsv
 ├── log
 │ └── bnc_run_0.log
 ├── rawd
 │ └── bnc_data_in_process
 ├── ripd
 │ └── bnc_download.zip
-└──
+└── env
+└── working_environment.json
 </code></pre>
 
+* @fixd/@: **(required)** See the output interface described below.
 * @log/@: **(optional)** All logging from a hackbox run goes here.
-* @tmp/@: **(optional)** If needed, any truly ephemeral output of the workflow should go here.
-* @ripd/@: **(required)** This will contain virginal downloaded source data adhering to the directory structure from which it was pulled.
 * @rawd/@: **(optional)** This will contain all intermediate data processing outputs.
-* @
+* @ripd/@: **(required)** This will contain virginal downloaded source data adhering to the directory structure from which it was pulled.
+* @env/@: **(required)** This directory contains a file describing the environment in which the hackbox was run.
+** @working_environment.json@: **(required)** All runtime config metadata used to generate the schema and output data.
 
 Engine and output directories are generally created dynamically and are not meant to be archival.
 
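In the 0.1.1 tree, the per-protocol `bnc` directory is gone, so the output directories sit directly under the namespace path. A sketch of that layout follows. It assumes the output root is the data root joined with the dot-separated namespace segments, as the example tree suggests. `output_layout` is a hypothetical helper, not Hackboxen's `default_paths`.

```ruby
# Hypothetical helper: computes the 0.1.1-style output layout, where the
# namespace (dots become directories) sits directly under the data root and
# no extra protocol directory is created.
def output_layout(dataroot, namespace)
  base = File.join(dataroot, *namespace.split('.'))
  {
    fixd: File.join(base, 'fixd'),
    code: File.join(base, 'fixd', 'code'),
    data: File.join(base, 'fixd', 'data'),
    log:  File.join(base, 'log'),
    rawd: File.join(base, 'rawd'),
    ripd: File.join(base, 'ripd'),
    env:  File.join(base, 'env')
  }
end

layout = output_layout('/data/hb', 'language.corpora.word_freq')
puts layout[:fixd] # /data/hb/language/corpora/word_freq/fixd
puts layout[:env]  # /data/hb/language/corpora/word_freq/env
```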
@@ -75,11 +74,9 @@ h3. Output Interface (fixd/)
 
 @fixd/@ is the final output directory and contains the following:
 
-* @
-** @working_environment.json@: **(required)** All runtime config metadata used to generate the schema and output data.
+* @protocol.icss.json@: **(required)** An "Icss":http://github.com/infochimps/icss schema file describing its respective dataset.
 * @code/@: **(optional)** A directory containing the code assets described in the icss.
 * @data/@: **(required)** A directory containing a single dataset or subdirectories named for each dataset. Each contains:
-** @protocol.icss.json@: **(required)** An "Icss":http://github.com/infochimps/icss schema file describing its respective dataset.
 ** **(required)** One or more data files that collectively adhere to the schema of this dataset.
 
 h2. Hackbox Configuration
@@ -97,27 +94,16 @@ h1. Getting Started
 
 Here are the general guidelines for creating your own hackbox.
 
-h3. Hackboxen
-
-Clone the Hackboxen repo:
+h3. Install Hackboxen
 
-<pre><code>
+<pre><code>sudo gem install hackboxen
 </code></pre>
 
-
+This will create a @.hackbox@ directory with a @hackbox.yaml@ file that contains default values for @coderoot@, @dataroot@, @s3_filesystem@, @os@, and @machine@. The @hb-install@ command has optional arguments @--dataroot=@, @--coderoot=@.
 
-<pre><code>
+<pre><code>hb-install # optionally: hb-install --dataroot=/data/hb --coderoot=/code/hb
 </code></pre>
 
-Install Hackboxen dependencies:
-
-<pre><code>cd hackboxen
-sudo bundle install
-rake install # optionally: rake install -- --dataroot=/data/hb --coderoot=/code/hb
-</code></pre>
-
-This will install the following gems: "configliere":http://github.com/mrflip/configliere, "icss":http://github.com/infochimps/icss, "swineherd":http://github.com/ganglion/swineherd, and "rake":http://github.com/jimweirich/rake. This will also create a @.hackbox@ directory with a @hackbox.yaml@ file that contains default values for @coderoot@, @dataroot@, @s3_filesystem@, @os@, and @machine@. The @rake install@ command has optional arguments @--dataroot=@, @--coderoot=@.
-
 A default @hackbox.yaml@ file:
 
 <pre><code>---
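`hb-install` is described as writing `coderoot`, `dataroot`, `s3_filesystem`, `os` and `machine` to `hackbox.yaml`, and 0.1.1 adds `coderoot` to the keys checked by `verify_dependencies`. The following sketch loads such a file and fails early when a root is missing. `load_hackbox_settings` is hypothetical, and treating only `coderoot` and `dataroot` as mandatory is an assumption.

```ruby
require 'yaml'

# Assumption: only the two root directories are treated as mandatory here.
REQUIRED_KEYS = %w[coderoot dataroot].freeze

# Hypothetical helper: parses hackbox.yaml text and raises if a required
# root directory is missing, in the spirit of verify_dependencies.
def load_hackbox_settings(yaml_text)
  settings = YAML.safe_load(yaml_text) || {}
  REQUIRED_KEYS.each do |key|
    raise "Your hackbox config appears to be missing a [#{key}]" unless settings[key]
  end
  settings
end

settings = load_hackbox_settings(<<~YAML)
  coderoot: /code/hb
  dataroot: /data/hb
  os: darwin
YAML
puts settings['dataroot'] # /data/hb
```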
@@ -132,11 +118,15 @@ requires:
 os: darwin
 </code></pre>
 
+h3. Hackboxen Dependencies
+
+The Hackboxen library depends on the following gems: "configliere":http://github.com/mrflip/configliere, "icss":http://github.com/infochimps/icss, "swineherd":http://github.com/ganglion/swineherd, and "rake":http://github.com/jimweirich/rake.
+
 h3. Creating a Hackbox
 
-Hackboxen comes with scaffold task that creates a template hackbox for you. Required arguments are @--namespace=@ and @--protocol=@. Optional arguments are @--targets
+Hackboxen comes with scaffold task that creates a template hackbox for you. Required arguments are @--namespace=@ and @--protocol=@. Optional arguments are @--targets=@.
 
-<pre><code>hb-scaffold --namespace=foo.bar --protocol --targets=catalog,mysql
+<pre><code>hb-scaffold --namespace=foo.bar.baz --protocol=qux --targets=catalog,mysql
 </code></pre>
 
 This will create the following directories and files:
@@ -145,13 +135,14 @@ This will create the following directories and files:
 └── foo
 └── bar
 └── baz
-
-
-
-
-
-
-
+└── qux
+├── config
+│ ├── config.yaml
+│ └── qux.icss.yaml
+├── engine
+│ ├── main
+│ └── qux_endpoint.rb
+└── Rakefile
 </code></pre>
 
 h3. Running a hackbox
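The scaffold tree in this hunk is fully determined by `--namespace` and `--protocol`. The following sketch computes the same relative file list. It assumes the tree is rooted at the code root. `scaffold_files` is a hypothetical helper and does not reflect how `hb-scaffold` is actually implemented.

```ruby
# Hypothetical helper: lists the files the README shows hb-scaffold creating
# for a namespace and protocol. It only computes relative paths; it writes
# nothing to disk.
def scaffold_files(namespace, protocol)
  root = File.join(*namespace.split('.'), protocol)
  [
    File.join(root, 'config', 'config.yaml'),
    File.join(root, 'config', "#{protocol}.icss.yaml"),
    File.join(root, 'engine', 'main'),
    File.join(root, 'engine', "#{protocol}_endpoint.rb"),
    File.join(root, 'Rakefile')
  ]
end

puts scaffold_files('foo.bar.baz', 'qux')
# foo/bar/baz/qux/config/config.yaml
# foo/bar/baz/qux/config/qux.icss.yaml
# foo/bar/baz/qux/engine/main
# foo/bar/baz/qux/engine/qux_endpoint.rb
# foo/bar/baz/qux/Rakefile
```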
data/VERSION
CHANGED
@@ -1 +1 @@
-0.1.
+0.1.1
data/hackboxen.gemspec
CHANGED
@@ -5,11 +5,11 @@
 
 Gem::Specification.new do |s|
 s.name = %q{hackboxen}
-s.version = "0.1.
+s.version = "0.1.1"
 
 s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
 s.authors = ["kornypoet", "Ganglion", "bollacker"]
-s.date = %q{2011-06
+s.date = %q{2011-07-06}
 s.description = %q{A simple framework to assist in standardizing the data-munging input/output process.}
 s.email = %q{travis@infochimps.com}
 s.executables = ["hb-install", "hb-scaffold", "hb-runner"]
data/lib/hackboxen/tasks/icss.rb
CHANGED
@@ -6,7 +6,7 @@ module HackBoxen
 icss_yaml = File.join(path_to(:hb_config), "#{WorkingConfig[:protocol]}.icss.yaml")
 if File.exists? icss_yaml
 icss = YAML.load(File.read icss_yaml)
-icss_json = File.join(path_to(:
+icss_json = File.join(path_to(:fixd_dir), "#{WorkingConfig[:protocol]}.icss.json")
 HackBoxen.current_fs.open(File.join(icss_json), 'w') { |f| f.puts icss.to_json }
 end
 end
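The 0.1.1 `icss` task converts `<protocol>.icss.yaml` from the hackbox config directory into `<protocol>.icss.json` in `path_to(:fixd_dir)`. This matches the README's new output interface. The sketch below makes several substitutions:

* Plain `File` I/O stands in for `HackBoxen.current_fs` (Swineherd).
* `YAML.safe_load` replaces `YAML.load`.
* `File.exist?` replaces `File.exists?`, which was removed in Ruby 3.2.

`write_icss_json` is a hypothetical name.

```ruby
require 'yaml'
require 'json'
require 'tmpdir'

# Hypothetical local-filesystem version of the icss task: if
# <config_dir>/<protocol>.icss.yaml exists, write its JSON form to
# <fixd_dir>/<protocol>.icss.json and return that path; otherwise return nil.
def write_icss_json(config_dir, fixd_dir, protocol)
  icss_yaml = File.join(config_dir, "#{protocol}.icss.yaml")
  return nil unless File.exist?(icss_yaml)

  icss = YAML.safe_load(File.read(icss_yaml))
  icss_json = File.join(fixd_dir, "#{protocol}.icss.json")
  File.open(icss_json, 'w') { |f| f.puts icss.to_json }
  icss_json
end

Dir.mktmpdir do |root|
  config_dir = File.join(root, 'config')
  fixd_dir   = File.join(root, 'fixd')
  Dir.mkdir(config_dir)
  Dir.mkdir(fixd_dir)
  File.write(File.join(config_dir, 'qux.icss.yaml'), "namespace: foo.bar.baz\nprotocol: qux\n")

  path = write_icss_json(config_dir, fixd_dir, 'qux')
  puts File.read(path) # {"namespace":"foo.bar.baz","protocol":"qux"}
end
```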
data/lib/hackboxen/tasks/init.rb
CHANGED
@@ -13,6 +13,15 @@ module HackBoxen
 
 desc "Create the required output directories using the filesystem specified by the WorkingConfig"
 task :create_required_paths => [:validate_working_config] do
+output_dirs = [
+path_to(:hb_dataroot),
+path_to(:env_dir),
+path_to(:fixd_dir),
+path_to(:log_dir),
+path_to(:ripd_dir),
+path_to(:rawd_dir),
+path_to(:data_dir)
+]
 output_dirs.each { |dir| HackBoxen.current_fs.mkpath(dir) unless HackBoxen.current_fs.exists? dir }
 end
 
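`create_required_paths` now builds its directory list inline and creates each directory only if it is missing. The following is a local-filesystem sketch of the same idempotent step. It assumes `FileUtils.mkdir_p` as a stand-in for `HackBoxen.current_fs.mkpath`. The directory names follow the README's output tree, because the resolved values of `path_to(:env_dir)` and the other keys are not shown in the diff.

```ruby
require 'fileutils'
require 'tmpdir'

# Sketch of create_required_paths on the local filesystem: create every
# output directory (and missing parents) unless it already exists.
def create_required_paths(output_dirs)
  output_dirs.each { |dir| FileUtils.mkdir_p(dir) unless File.directory?(dir) }
end

Dir.mktmpdir do |dataroot|
  base = File.join(dataroot, 'language', 'corpora', 'word_freq')
  dirs = %w[env fixd log ripd rawd].map { |d| File.join(base, d) }
  dirs << File.join(base, 'fixd', 'data')

  create_required_paths(dirs)
  create_required_paths(dirs) # a second run changes nothing
  puts dirs.all? { |d| File.directory?(d) } # true
end
```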
data/lib/hackboxen/utils.rb
CHANGED
@@ -32,12 +32,25 @@ module HackBoxen
 Swineherd::FileSystem.get fs
 end
 
+def self.name
+hackbox_root? ? Dir.pwd.gsub(/#{WorkingConfig[:coderoot]}\/#{current}\//, '') : 'debug'
+end
+
 def self.current
-hackbox_root? ?
+hackbox_root? ? WorkingConfig[:namespace].gsub('.', '/') : Dir.pwd
+end
+
+def self.coderoot
+WorkingConfig[:coderoot]
 end
 
+def self.dataroot
+WorkingConfig[:dataroot]
+end
+
+
 def self.verify_dependencies
-%w[ dataroot namespace
+%w[ coderoot dataroot namespace ].each do |req|
 raise "Your hackbox config appears to be missing a [#{req}]" unless WorkingConfig[req.to_sym]
 end
 end
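The new `HackBoxen.name` strips `<coderoot>/<namespace path>/` from the working directory. A hackbox at the example path `/code/hb/foo/bar/baz/qux` with namespace `foo.bar.baz` would therefore get the name `qux`. Below is a pure-function sketch of `current` and `name`, with the working directory, code root and namespace passed in rather than read from `Dir.pwd` and `WorkingConfig`. The helper names are hypothetical. `String#delete_prefix` (Ruby 2.5+) replaces the regex, so a `.` in the code root is matched literally. Falling back to `'debug'` when the prefix does not match is an assumption standing in for `hackbox_root?`.

```ruby
# Hypothetical stand-in for HackBoxen.current: namespace dots become slashes.
def hackbox_current(namespace)
  namespace.tr('.', '/')
end

# Hypothetical stand-in for HackBoxen.name: strip "<coderoot>/<current>/"
# from the working directory, or return 'debug' when it does not apply.
def hackbox_name(pwd, coderoot, namespace)
  prefix = File.join(coderoot, hackbox_current(namespace)) + '/'
  pwd.start_with?(prefix) ? pwd.delete_prefix(prefix) : 'debug'
end

coderoot  = '/code/hb'
namespace = 'foo.bar.baz'
pwd       = '/code/hb/foo/bar/baz/qux'

puts hackbox_current(namespace)             # foo/bar/baz
puts hackbox_name(pwd, coderoot, namespace) # qux
# Mirrors :hb_current => [:code_root, HackBoxen.current, HackBoxen.name]
puts File.join(coderoot, hackbox_current(namespace), hackbox_name(pwd, coderoot, namespace))
# /code/hb/foo/bar/baz/qux
```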
data/lib/hackboxen/utils/paths.rb
CHANGED
@@ -13,26 +13,14 @@ module HackBoxen
 path.absolute? ? File.expand_path(path) : path.to_s
 end
 
-def output_dirs
-[
-path_to(:hb_dataroot),
-path_to(:env_dir),
-path_to(:fixd_dir),
-path_to(:log_dir),
-path_to(:ripd_dir),
-path_to(:rawd_dir),
-path_to(:data_dir)
-]
-end
-
 def default_paths
 {
 # root directories
 :home => ENV['HOME'],
-:data_root =>
-:code_root =>
+:data_root => HackBoxen.dataroot,
+:code_root => HackBoxen.coderoot,
 # local hackbox directories
-:hb_current => [:code_root, HackBoxen.current],
+:hb_current => [:code_root, HackBoxen.current, HackBoxen.name],
 :hb_engine => [:hb_current, 'engine'],
 :hb_config => [:hb_current, 'config'],
 # output directories
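`default_paths` maps each key either to a plain value or to an array mixing other keys (symbols) and literal segments (strings). For example, `:hb_engine` expands through `:hb_current` down to `:code_root`. The body of `path_to` is not part of this diff. The sketch below is an assumed recursive resolver over such a table, and `resolve_path` and the sample values are hypothetical.

```ruby
# Hypothetical resolver for a default_paths-style table: Symbol elements
# name other entries, String elements are literal path segments.
# Unknown keys and cyclic definitions raise ArgumentError.
def resolve_path(paths, key, seen = [])
  raise ArgumentError, "unknown path key #{key.inspect}" unless paths.key?(key)
  raise ArgumentError, "cyclic path definition at #{key.inspect}" if seen.include?(key)

  spec = paths[key]
  return spec.to_s unless spec.is_a?(Array)

  parts = spec.map do |part|
    part.is_a?(Symbol) ? resolve_path(paths, part, seen + [key]) : part.to_s
  end
  File.join(*parts)
end

paths = {
  code_root:  '/code/hb',
  hb_current: [:code_root, 'foo/bar/baz', 'qux'],
  hb_engine:  [:hb_current, 'engine'],
  hb_config:  [:hb_current, 'config']
}

puts resolve_path(paths, :hb_engine) # /code/hb/foo/bar/baz/qux/engine
```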
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: hackboxen
 version: !ruby/object:Gem::Version
-hash:
+hash: 25
 prerelease: false
 segments:
 - 0
 - 1
--
-version: 0.1.
+- 1
+version: 0.1.1
 platform: ruby
 authors:
 - kornypoet
@@ -17,7 +17,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2011-06
+date: 2011-07-06 00:00:00 -05:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency