hackboxen 0.1.0 → 0.1.1
- data/CHANGELOG.textile +10 -4
- data/README.textile +29 -38
- data/VERSION +1 -1
- data/hackboxen.gemspec +2 -2
- data/lib/hackboxen/tasks/icss.rb +1 -1
- data/lib/hackboxen/tasks/init.rb +9 -0
- data/lib/hackboxen/utils.rb +15 -2
- data/lib/hackboxen/utils/paths.rb +3 -15
- metadata +4 -4
data/CHANGELOG.textile
CHANGED
@@ -16,7 +16,7 @@ h3. Deprecations/Changes
 
 * Much old code was refactored or removed.
 
-h3. New Functionality
+h3. New Functionality (version 0.1.0)
 
 * Default Hackboxen paths can be accessed by using @path_to(:fixd_dir)@. See HackBoxen::Paths for using/adding others.
 
@@ -30,15 +30,21 @@ h3. New Functionality
 
 * A binary executable has been added, @hb-scaffold@ that can be run from anywhere and is designed to replace the rake task.
 
-h3.
+h3. Version 0.1.1
 
-* Make Hackboxen a gem. This will require the separation of the actual hackbox code from the hackboxen library (
+* Make Hackboxen a gem. This will require the separation of the actual hackbox code from the hackboxen library (implemented) and the creation of a coderoot to connect the hackbox library with the other code (implemented).
 
 * When Hackboxen is a gem, its version can be added to the requires hash in a hackbox @config.yaml@ so we can keep track of potentially breaking changes to legacy code.
 
+* The separation (completely) of a @config.yaml@ from an @icss.yaml@. (implemented)
+
+* A hackbox __name__ is now separate from a dataset __protocol__. Namespaces are still shared for organizational purposes, but a protocol directory is no longer created for the output.
+
+h3. Still Needing
+
 * Full spec coverage for the hackboxen library.
 
 * Implementation of @'hb:mini'@ and @ConfigValidator@. Some of this code is written, but it needs to be fleshed out and decided upon.
 
-
+
 
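The changelog suggests recording the Hackboxen version in the `requires` hash of a hackbox `config.yaml`. A minimal sketch of how such an entry could be checked with RubyGems' own `Gem::Requirement` and `Gem::Version`, assuming (hypothetically) that `requires` maps gem names to requirement strings and that installed versions arrive as a plain hash. `unmet_requirements` is an illustrative name, not part of Hackboxen.

```ruby
require 'rubygems'

# Hypothetical shape of a hackbox's `requires` hash:
# gem name => version requirement string (an assumption, not a documented format).
def unmet_requirements(requires, installed)
  requires.each_with_object({}) do |(name, requirement), unmet|
    version = installed[name]
    req = Gem::Requirement.new(requirement)
    # A gem is unmet if it is not installed or its version falls outside the requirement.
    unmet[name] = requirement if version.nil? || !req.satisfied_by?(Gem::Version.new(version))
  end
end

requires  = { 'hackboxen' => '~> 0.1.0' }
installed = { 'hackboxen' => '0.1.1' }
p unmet_requirements(requires, installed) # => {}
```

A pessimistic requirement such as `~> 0.1.0` accepts 0.1.1 but would flag a future 0.2.x, which is the kind of potentially breaking change the changelog wants to track.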
data/README.textile
CHANGED
@@ -14,9 +14,9 @@ h2. Hackbox Engine
 A hackbox engine contains:
 
 * @Rakefile@: **(required)** Used to read and combine all the sources of config metadata and execute @main@.
-* @Gemfile@: **(optional)** A list of gems necessary for
+* @Gemfile@: **(optional)** A list of gems necessary for this hackbox to run. Processed automatically by "Bundler":https://github.com/carlhuda/bundler.
 * @config/@: **(required)** A subdirectory containing:
-** @config.yaml@ **(required)** A
+** @config.yaml@ **(required)** A hackbox specific default configuration YAML file.
 ** @protocol.icss.yaml@ **(optional)** An "Icss":http://github.com/infochimps/icss schema file describing the output data and publishing targets.
 * @engine/@: **(required)** A subdirectory containing:
 ** @main@: **(required)** An executable data processing file. This may be written in any language.
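The README says the `Rakefile` reads and combines all sources of config metadata. A minimal sketch of one way to do that, assuming (not confirmed by the diff) that the machine-wide `hackbox.yaml` is loaded first and the hackbox's own `config/config.yaml` overrides it key by key, with nested hashes merged recursively. `deep_merge` and `working_config` are hypothetical helpers, and the sample values are illustrative.

```ruby
require 'yaml'

# Recursively merge nested hashes; values from `override` win on conflicts.
def deep_merge(base, override)
  base.merge(override) do |_key, old_val, new_val|
    old_val.is_a?(Hash) && new_val.is_a?(Hash) ? deep_merge(old_val, new_val) : new_val
  end
end

# Assumed precedence: machine-wide hackbox.yaml < hackbox-specific config.yaml.
# YAML.load returns false for an empty document, hence the `|| {}` fallbacks.
def working_config(machine_yaml, hackbox_yaml)
  deep_merge(YAML.load(machine_yaml) || {}, YAML.load(hackbox_yaml) || {})
end

machine = "dataroot: /data/hb\ncoderoot: /code/hb\nrequires:\n  icss: '>= 0'\n"
box     = "namespace: language.corpora.word_freq\nrequires:\n  hackboxen: '~> 0.1.0'\n"
config  = working_config(machine, box)
puts config['namespace']       # prints language.corpora.word_freq
p config['requires'].keys.sort # prints ["hackboxen", "icss"]
```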
@@ -46,28 +46,27 @@ The hackbox output directory is where all of the data that a hackbox acquires, r
 └── language
 └── corpora
 └── word_freq
-└── bnc
 ├── fixd
 │ ├── code
 │ │ └── bnc_endpoint.rb
-│
-│
-│ └── env
-│ └── working_environment.json
+│ └── data
+│ └── bnc_fixd_data.tsv
 ├── log
 │ └── bnc_run_0.log
 ├── rawd
 │ └── bnc_data_in_process
 ├── ripd
 │ └── bnc_download.zip
-└──
+└── env
+└── working_environment.json
 </code></pre>
 
+* @fixd/@: **(required)** See the output interface described below.
 * @log/@: **(optional)** All logging from a hackbox run goes here.
-* @tmp/@: **(optional)** If needed, any truly ephemeral output of the workflow should go here.
-* @ripd/@: **(required)** This will contain virginal downloaded source data adhering to the directory structure from which it was pulled.
 * @rawd/@: **(optional)** This will contain all intermediate data processing outputs.
-* @
+* @ripd/@: **(required)** This will contain virginal downloaded source data adhering to the directory structure from which it was pulled.
+* @env/@: **(required)** This directory contains a file describing the environment in which the hackbox was run.
+** @working_environment.json@: **(required)** All runtime config metadata used to generate the schema and output data.
 
 Engine and output directories are generally created dynamically and are not meant to be archival.
 
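The revised tree drops the per-protocol output directory, so the output subdirectories hang directly off the namespace path. A sketch of computing that layout, assuming the output root is the configured `dataroot` (the top of the tree lies outside this hunk) and that each dot in the namespace becomes one directory level. `output_layout` is an illustrative name.

```ruby
# Output subdirectories from the 0.1.1 README layout.
OUTPUT_SUBDIRS = %w[fixd log rawd ripd env].freeze

# Assumption: output lives under dataroot, one directory per namespace segment,
# with no extra protocol directory.
def output_layout(dataroot, namespace)
  base = File.join(dataroot, *namespace.split('.'))
  layout = OUTPUT_SUBDIRS.to_h { |dir| [dir.to_sym, File.join(base, dir)] }
  # env/ holds the runtime config snapshot.
  layout[:working_environment] = File.join(layout[:env], 'working_environment.json')
  layout
end

layout = output_layout('/data/hb', 'language.corpora.word_freq')
puts layout[:fixd] # prints /data/hb/language/corpora/word_freq/fixd
puts layout[:working_environment]
# prints /data/hb/language/corpora/word_freq/env/working_environment.json
```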
@@ -75,11 +74,9 @@ h3. Output Interface (fixd/)
 
 @fixd/@ is the final output directory and contains the following:
 
-* @
-** @working_environment.json@: **(required)** All runtime config metadata used to generate the schema and output data.
+* @protocol.icss.json@: **(required)** An "Icss":http://github.com/infochimps/icss schema file describing its respective dataset.
 * @code/@: **(optional)** A directory containing the code assets described in the icss.
 * @data/@: **(required)** A directory containing a single dataset or subdirectories named for each dataset. Each contains:
-** @protocol.icss.json@: **(required)** An "Icss":http://github.com/infochimps/icss schema file describing its respective dataset.
 ** **(required)** One or more data files that collectively adhere to the schema of this dataset.
 
 h2. Hackbox Configuration
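After this change, `protocol.icss.json` sits at the top of `fixd/` rather than inside each dataset. A hypothetical checker for that interface, taking the protocol name as the file prefix (the `icss.rb` task in this release writes `<protocol>.icss.json` into the fixd directory). It only reports problems and is not part of Hackboxen.

```ruby
require 'tmpdir'

# Hypothetical validator for the fixd/ output interface: a required
# <protocol>.icss.json, an optional code/ directory, and a required data/
# directory holding at least one data file (possibly in per-dataset subdirectories).
def fixd_problems(fixd_dir, protocol)
  problems = []
  icss = File.join(fixd_dir, "#{protocol}.icss.json")
  problems << "missing #{File.basename(icss)}" unless File.file?(icss)
  data_dir = File.join(fixd_dir, 'data')
  if !File.directory?(data_dir)
    problems << 'missing data/'
  elsif Dir.glob(File.join(data_dir, '**', '*')).none? { |f| File.file?(f) }
    problems << 'data/ contains no data files'
  end
  problems
end

Dir.mktmpdir do |fixd|
  p fixd_problems(fixd, 'bnc') # => ["missing bnc.icss.json", "missing data/"]
end
```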
@@ -97,27 +94,16 @@ h1. Getting Started
 
 Here are the general guidelines for creating your own hackbox.
 
-h3. Hackboxen
-
-Clone the Hackboxen repo:
+h3. Install Hackboxen
 
-<pre><code>
+<pre><code>sudo gem install hackboxen
 </code></pre>
 
-
+This will create a @.hackbox@ directory with a @hackbox.yaml@ file that contains default values for @coderoot@, @dataroot@, @s3_filesystem@, @os@, and @machine@. The @hb-install@ command has optional arguments @--dataroot=@, @--coderoot=@.
 
-<pre><code>
+<pre><code>hb-install # optionally: hb-install --dataroot=/data/hb --coderoot=/code/hb
 </code></pre>
 
-Install Hackboxen dependencies:
-
-<pre><code>cd hackboxen
-sudo bundle install
-rake install # optionally: rake install -- --dataroot=/data/hb --coderoot=/code/hb
-</code></pre>
-
-This will install the following gems: "configliere":http://github.com/mrflip/configliere, "icss":http://github.com/infochimps/icss, "swineherd":http://github.com/ganglion/swineherd, and "rake":http://github.com/jimweirich/rake. This will also create a @.hackbox@ directory with a @hackbox.yaml@ file that contains default values for @coderoot@, @dataroot@, @s3_filesystem@, @os@, and @machine@. The @rake install@ command has optional arguments @--dataroot=@, @--coderoot=@.
-
 A default @hackbox.yaml@ file:
 
 <pre><code>---
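`hb-install` writes default `coderoot`, `dataroot`, `s3_filesystem`, `os`, and `machine` values, with optional `--dataroot=` and `--coderoot=` overrides. A sketch of that override step only, assuming simple `--key=value` arguments. The defaults passed in below are placeholders, not the values `hb-install` actually writes, and `install_settings` is hypothetical.

```ruby
require 'yaml'

# Apply --dataroot= / --coderoot= overrides on top of a defaults hash.
# Hypothetical sketch; not hb-install's actual option handling.
def install_settings(argv, defaults)
  overrides = argv.each_with_object({}) do |arg, opts|
    # Split only on the first '=' so values may themselves contain '='.
    key, value = arg.sub(/\A--/, '').split('=', 2)
    opts[key] = value if %w[dataroot coderoot].include?(key) && value
  end
  defaults.merge(overrides)
end

# Placeholder defaults for illustration only.
defaults = { 'dataroot' => '/tmp/hb/data', 'coderoot' => '/tmp/hb/code', 'os' => 'darwin' }
settings = install_settings(['--dataroot=/data/hb', '--coderoot=/code/hb'], defaults)
puts settings.to_yaml
```

Unrecognized options are ignored rather than rejected, which keeps the sketch small; a real installer might prefer to fail loudly.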
@@ -132,11 +118,15 @@ requires:
 os: darwin
 </code></pre>
 
+h3. Hackboxen Dependencies
+
+The Hackboxen library depends on the following gems: "configliere":http://github.com/mrflip/configliere, "icss":http://github.com/infochimps/icss, "swineherd":http://github.com/ganglion/swineherd, and "rake":http://github.com/jimweirich/rake.
+
 h3. Creating a Hackbox
 
-Hackboxen comes with scaffold task that creates a template hackbox for you. Required arguments are @--namespace=@ and @--protocol=@. Optional arguments are @--targets
+Hackboxen comes with scaffold task that creates a template hackbox for you. Required arguments are @--namespace=@ and @--protocol=@. Optional arguments are @--targets=@.
 
-<pre><code>hb-scaffold --namespace=foo.bar --protocol --targets=catalog,mysql
+<pre><code>hb-scaffold --namespace=foo.bar.baz --protocol=qux --targets=catalog,mysql
 </code></pre>
 
 This will create the following directories and files:
@@ -145,13 +135,14 @@ This will create the following directories and files:
 └── foo
 └── bar
 └── baz
-
-
-
-
-
-
-
+└── qux
+├── config
+│ ├── config.yaml
+│ └── qux.icss.yaml
+├── engine
+│ ├── main
+│ └── qux_endpoint.rb
+└── Rakefile
 </code></pre>
 
 h3. Running a hackbox
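The scaffold now nests the hackbox under its namespace and protocol. A sketch listing the paths from the scaffold tree for given `--namespace=` and `--protocol=` values. `scaffold_paths` is a hypothetical helper that reproduces only the layout, not the file contents or the `--targets=` handling.

```ruby
# Relative paths of a scaffolded hackbox, mirroring the README tree.
# Hypothetical helper: layout only, no file contents.
def scaffold_paths(namespace:, protocol:)
  if namespace.to_s.empty? || protocol.to_s.empty?
    raise ArgumentError, 'namespace and protocol are required'
  end
  # "foo.bar.baz" + "qux" -> foo/bar/baz/qux
  root = File.join(*namespace.split('.'), protocol)
  [
    File.join(root, 'config', 'config.yaml'),
    File.join(root, 'config', "#{protocol}.icss.yaml"),
    File.join(root, 'engine', 'main'),
    File.join(root, 'engine', "#{protocol}_endpoint.rb"),
    File.join(root, 'Rakefile')
  ]
end

# Prints one path per line, starting with foo/bar/baz/qux/config/config.yaml
puts scaffold_paths(namespace: 'foo.bar.baz', protocol: 'qux')
```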
data/VERSION
CHANGED
@@ -1 +1 @@
-0.1.
+0.1.1
data/hackboxen.gemspec
CHANGED
@@ -5,11 +5,11 @@
 
 Gem::Specification.new do |s|
 s.name = %q{hackboxen}
-s.version = "0.1.
+s.version = "0.1.1"
 
 s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
 s.authors = ["kornypoet", "Ganglion", "bollacker"]
-s.date = %q{2011-06
+s.date = %q{2011-07-06}
 s.description = %q{A simple framework to assist in standardizing the data-munging input/output process.}
 s.email = %q{travis@infochimps.com}
 s.executables = ["hb-install", "hb-scaffold", "hb-runner"]
data/lib/hackboxen/tasks/icss.rb
CHANGED
@@ -6,7 +6,7 @@ module HackBoxen
 icss_yaml = File.join(path_to(:hb_config), "#{WorkingConfig[:protocol]}.icss.yaml")
 if File.exists? icss_yaml
 icss = YAML.load(File.read icss_yaml)
-icss_json = File.join(path_to(:
+icss_json = File.join(path_to(:fixd_dir), "#{WorkingConfig[:protocol]}.icss.json")
 HackBoxen.current_fs.open(File.join(icss_json), 'w') { |f| f.puts icss.to_json }
 end
 end
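The `icss.rb` change points the JSON output at `path_to(:fixd_dir)`. A local-filesystem sketch of the same YAML-to-JSON step, using plain `File` I/O in place of `HackBoxen.current_fs` (an assumption for illustration; the real task writes through Swineherd). It uses `File.exist?` because the `File.exists?` alias seen in the diff was removed in Ruby 3.2. The sample YAML content is illustrative.

```ruby
require 'yaml'
require 'json'
require 'tmpdir'

# Read config/<protocol>.icss.yaml and write fixd/<protocol>.icss.json.
# Local File I/O stands in for HackBoxen.current_fs here.
def write_icss_json(config_dir, fixd_dir, protocol)
  icss_yaml = File.join(config_dir, "#{protocol}.icss.yaml")
  return nil unless File.exist?(icss_yaml) # nothing to convert
  icss = YAML.load(File.read(icss_yaml))
  icss_json = File.join(fixd_dir, "#{protocol}.icss.json")
  File.open(icss_json, 'w') { |f| f.puts icss.to_json }
  icss_json
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'qux.icss.yaml'), "namespace: foo.bar.baz\nprotocol: qux\n")
  path = write_icss_json(dir, dir, 'qux')
  puts File.read(path) # prints {"namespace":"foo.bar.baz","protocol":"qux"}
end
```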
data/lib/hackboxen/tasks/init.rb
CHANGED
@@ -13,6 +13,15 @@ module HackBoxen
 
 desc "Create the required output directories using the filesystem specified by the WorkingConfig"
 task :create_required_paths => [:validate_working_config] do
+output_dirs = [
+path_to(:hb_dataroot),
+path_to(:env_dir),
+path_to(:fixd_dir),
+path_to(:log_dir),
+path_to(:ripd_dir),
+path_to(:rawd_dir),
+path_to(:data_dir)
+]
 output_dirs.each { |dir| HackBoxen.current_fs.mkpath(dir) unless HackBoxen.current_fs.exists? dir }
 end
 
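`create_required_paths` now builds its directory list inline and creates only what is missing. A local stand-in, assuming a plain filesystem via `FileUtils.mkdir_p` instead of the Swineherd filesystem returned by `HackBoxen.current_fs`. The function name mirrors the task, but the code is illustrative.

```ruby
require 'fileutils'
require 'tmpdir'

# Create each output directory only if it is missing; returns the
# directories that did not exist before the call. mkdir_p is idempotent,
# so nested or repeated entries are harmless.
def create_required_paths(output_dirs)
  missing = output_dirs.reject { |dir| File.directory?(dir) }
  missing.each { |dir| FileUtils.mkdir_p(dir) }
  missing
end

Dir.mktmpdir do |root|
  dirs = %w[fixd log rawd ripd env].map { |d| File.join(root, 'foo', d) }
  p create_required_paths(dirs).size # => 5
  p create_required_paths(dirs).size # => 0 (second run is a no-op)
end
```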
data/lib/hackboxen/utils.rb
CHANGED
@@ -32,12 +32,25 @@ module HackBoxen
 Swineherd::FileSystem.get fs
 end
 
+def self.name
+hackbox_root? ? Dir.pwd.gsub(/#{WorkingConfig[:coderoot]}\/#{current}\//, '') : 'debug'
+end
+
 def self.current
-hackbox_root? ?
+hackbox_root? ? WorkingConfig[:namespace].gsub('.', '/') : Dir.pwd
+end
+
+def self.coderoot
+WorkingConfig[:coderoot]
 end
 
+def self.dataroot
+WorkingConfig[:dataroot]
+end
+
+
 def self.verify_dependencies
-%w[ dataroot namespace
+%w[ coderoot dataroot namespace ].each do |req|
 raise "Your hackbox config appears to be missing a [#{req}]" unless WorkingConfig[req.to_sym]
 end
 end
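`HackBoxen.current` now maps the dotted namespace to a path, and the new `HackBoxen.name` strips `coderoot/current/` from the working directory. A pure-function sketch of those two derivations plus the `verify_dependencies` key check, taking the config hash and working directory as arguments and omitting the `hackbox_root?` branch and its `'debug'` fallback. The helper names are hypothetical. One deliberate difference: the prefix is passed through `Regexp.escape` and anchored, so a dot in a path cannot match an arbitrary character.

```ruby
# namespace "foo.bar.baz" -> relative path "foo/bar/baz" (mirrors HackBoxen.current)
def hackbox_current(config)
  config[:namespace].gsub('.', '/')
end

# Strip "<coderoot>/<current>/" from the working directory (mirrors HackBoxen.name).
# Regexp.escape keeps dots in paths literal, unlike the unescaped interpolation in the diff.
def hackbox_name(config, pwd)
  prefix = File.join(config[:coderoot], hackbox_current(config)) + '/'
  pwd.sub(/\A#{Regexp.escape(prefix)}/, '')
end

# Keys that verify_dependencies requires as of 0.1.1.
def missing_keys(config)
  %i[coderoot dataroot namespace].reject { |key| config[key] }
end

config = { coderoot: '/code/hb', dataroot: '/data/hb', namespace: 'foo.bar.baz' }
puts hackbox_current(config)                          # prints foo/bar/baz
puts hackbox_name(config, '/code/hb/foo/bar/baz/qux') # prints qux
p missing_keys(config)                                # prints []
```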
data/lib/hackboxen/utils/paths.rb
CHANGED
@@ -13,26 +13,14 @@ module HackBoxen
 path.absolute? ? File.expand_path(path) : path.to_s
 end
 
-def output_dirs
-[
-path_to(:hb_dataroot),
-path_to(:env_dir),
-path_to(:fixd_dir),
-path_to(:log_dir),
-path_to(:ripd_dir),
-path_to(:rawd_dir),
-path_to(:data_dir)
-]
-end
-
 def default_paths
 {
 # root directories
 :home => ENV['HOME'],
-:data_root =>
-:code_root =>
+:data_root => HackBoxen.dataroot,
+:code_root => HackBoxen.coderoot,
 # local hackbox directories
-:hb_current => [:code_root, HackBoxen.current],
+:hb_current => [:code_root, HackBoxen.current, HackBoxen.name],
 :hb_engine => [:hb_current, 'engine'],
 :hb_config => [:hb_current, 'config'],
 # output directories
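`default_paths` describes paths as arrays whose `Symbol` elements refer to other entries, e.g. `:hb_config => [:hb_current, 'config']`. Only the tail of `path_to` appears in this hunk, so the resolver below is a hypothetical sketch of how such references could be expanded, with cycle detection added. The sample paths reuse the scaffold example and are illustrative.

```ruby
# Hypothetical stand-in for path_to: join an entry's parts, resolving any
# Symbol part recursively against the same table.
def resolve_path(paths, key, seen = [])
  raise ArgumentError, "cyclic path definition at #{key}" if seen.include?(key)
  value = paths.fetch(key) { raise KeyError, "unknown path #{key.inspect}" }
  parts = Array(value).map do |part|
    part.is_a?(Symbol) ? resolve_path(paths, part, seen + [key]) : part.to_s
  end
  File.join(*parts)
end

paths = {
  code_root:  '/code/hb',                         # stands in for HackBoxen.coderoot
  hb_current: [:code_root, 'foo/bar/baz', 'qux'], # current + name
  hb_engine:  [:hb_current, 'engine'],
  hb_config:  [:hb_current, 'config']
}
puts resolve_path(paths, :hb_config) # prints /code/hb/foo/bar/baz/qux/config
```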
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: hackboxen
 version: !ruby/object:Gem::Version
-hash:
+hash: 25
 prerelease: false
 segments:
 - 0
 - 1
--
-version: 0.1.
+- 1
+version: 0.1.1
 platform: ruby
 authors:
 - kornypoet
@@ -17,7 +17,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2011-06
+date: 2011-07-06 00:00:00 -05:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency