hackboxen 0.1.0 → 0.1.1

@@ -16,7 +16,7 @@ h3. Deprecations/Changes
 
  * Much old code was refactored or removed.
 
- h3. New Functionality
+ h3. New Functionality (version 0.1.0)
 
  * Default Hackboxen paths can be accessed by using @path_to(:fixd_dir)@. See HackBoxen::Paths for using/adding others.
 
@@ -30,15 +30,21 @@ h3. New Functionality
 
  * A binary executable has been added, @hb-scaffold@ that can be run from anywhere and is designed to replace the rake task.
 
- h3. Still Needing
+ h3. Version 0.1.1
 
- * Make Hackboxen a gem. This will require the separation of the actual hackbox code from the hackboxen library (not done yet) and the creation of a coderoot to connect the hackbox library with the other code (implemented).
+ * Make Hackboxen a gem. This will require the separation of the actual hackbox code from the hackboxen library (implemented) and the creation of a coderoot to connect the hackbox library with the other code (implemented).
 
  * When Hackboxen is a gem, its version can be added to the requires hash in a hackbox @config.yaml@ so we can keep track of potentially breaking changes to legacy code.
 
+ * The separation (completely) of a @config.yaml@ from an @icss.yaml@. (implemented)
+
+ * A hackbox __name__ is now separate from a dataset __protocol__. Namespaces are still shared for organizational purposes, but a protocol directory is no longer created for the output.
+
+ h3. Still Needing
+
  * Full spec coverage for the hackboxen library.
 
  * Implementation of @'hb:mini'@ and @ConfigValidator@. Some of this code is written, but it needs to be fleshed out and decided upon.
 
- * The separation (completely) of a @config.yaml@ from an @icss.yaml@. Would not affect hackbox running much.
+
 
@@ -14,9 +14,9 @@ h2. Hackbox Engine
  A hackbox engine contains:
 
  * @Rakefile@: **(required)** Used to read and combine all the sources of config metadata and execute @main@.
- * @Gemfile@: **(optional)** A list of gems necessary for thsi hackbox to run. Processed automatically by "Bundler":https://github.com/carlhuda/bundler.
+ * @Gemfile@: **(optional)** A list of gems necessary for this hackbox to run. Processed automatically by "Bundler":https://github.com/carlhuda/bundler.
  * @config/@: **(required)** A subdirectory containing:
- ** @config.yaml@ **(required)** A dataset specific default configuration YAML file.
+ ** @config.yaml@ **(required)** A hackbox specific default configuration YAML file.
  ** @protocol.icss.yaml@ **(optional)** An "Icss":http://github.com/infochimps/icss schema file describing the output data and publishing targets.
  * @engine/@: **(required)** A subdirectory containing:
  ** @main@: **(required)** An executable data processing file. This may be written in any language.
@@ -46,28 +46,27 @@ The hackbox output directory is where all of the data that a hackbox acquires, r
  └── language
  └── corpora
  └── word_freq
- └── bnc
  ├── fixd
  │ ├── code
  │ │ └── bnc_endpoint.rb
- ├── data
- └── bnc_fixd_data.tsv
- │ └── env
- │ └── working_environment.json
+ └── data
+ └── bnc_fixd_data.tsv
  ├── log
  │ └── bnc_run_0.log
  ├── rawd
  │ └── bnc_data_in_process
  ├── ripd
  │ └── bnc_download.zip
- └── tmp
+ └── env
+ └── working_environment.json
  </code></pre>
 
+ * @fixd/@: **(required)** See the output interface described below.
  * @log/@: **(optional)** All logging from a hackbox run goes here.
- * @tmp/@: **(optional)** If needed, any truly ephemeral output of the workflow should go here.
- * @ripd/@: **(required)** This will contain virginal downloaded source data adhering to the directory structure from which it was pulled.
  * @rawd/@: **(optional)** This will contain all intermediate data processing outputs.
- * @fixd/@: **(required)** See the output interface described below.
+ * @ripd/@: **(required)** This will contain virginal downloaded source data adhering to the directory structure from which it was pulled.
+ * @env/@: **(required)** This directory contains a file describing the environment in which the hackbox was run.
+ ** @working_environment.json@: **(required)** All runtime config metadata used to generate the schema and output data.
 
  Engine and output directories are generally created dynamically and are not meant to be archival.
 
@@ -75,11 +74,9 @@ h3. Output Interface (fixd/)
 
  @fixd/@ is the final output directory and contains the following:
 
- * @env/@: **(required)** This directory contains a file describing the environment in which the hackbox was run.
- ** @working_environment.json@: **(required)** All runtime config metadata used to generate the schema and output data.
+ * @protocol.icss.json@: **(required)** An "Icss":http://github.com/infochimps/icss schema file describing its respective dataset.
  * @code/@: **(optional)** A directory containing the code assets described in the icss.
  * @data/@: **(required)** A directory containing a single dataset or subdirectories named for each dataset. Each contains:
- ** @protocol.icss.json@: **(required)** An "Icss":http://github.com/infochimps/icss schema file describing its respective dataset.
  ** **(required)** One or more data files that collectively adhere to the schema of this dataset.
 
  h2. Hackbox Configuration
@@ -97,27 +94,16 @@ h1. Getting Started
 
  Here are the general guidelines for creating your own hackbox.
 
- h3. Hackboxen Dependencies
-
- Clone the Hackboxen repo:
+ h3. Install Hackboxen
 
- <pre><code>git clone git@github.com:infochimps/hackboxen.git
+ <pre><code>sudo gem install hackboxen
  </code></pre>
 
- Add Hackboxen to your $RUBYLIB:
+ This will create a @.hackbox@ directory with a @hackbox.yaml@ file that contains default values for @coderoot@, @dataroot@, @s3_filesystem@, @os@, and @machine@. The @hb-install@ command has optional arguments @--dataroot=@, @--coderoot=@.
 
- <pre><code>export RUBYLIB=$RUBYLIB:/path/to/hackboxen/lib
+ <pre><code>hb-install # optionally: hb-install --dataroot=/data/hb --coderoot=/code/hb
  </code></pre>
 
- Install Hackboxen dependencies:
-
- <pre><code>cd hackboxen
- sudo bundle install
- rake install # optionally: rake install -- --dataroot=/data/hb --coderoot=/code/hb
- </code></pre>
-
- This will install the following gems: "configliere":http://github.com/mrflip/configliere, "icss":http://github.com/infochimps/icss, "swineherd":http://github.com/ganglion/swineherd, and "rake":http://github.com/jimweirich/rake. This will also create a @.hackbox@ directory with a @hackbox.yaml@ file that contains default values for @coderoot@, @dataroot@, @s3_filesystem@, @os@, and @machine@. The @rake install@ command has optional arguments @--dataroot=@, @--coderoot=@.
-
  A default @hackbox.yaml@ file:
 
  <pre><code>---
@@ -132,11 +118,15 @@ requires:
  os: darwin
  </code></pre>
 
+ h3. Hackboxen Dependencies
+
+ The Hackboxen library depends on the following gems: "configliere":http://github.com/mrflip/configliere, "icss":http://github.com/infochimps/icss, "swineherd":http://github.com/ganglion/swineherd, and "rake":http://github.com/jimweirich/rake.
+
  h3. Creating a Hackbox
 
- Hackboxen comes with scaffold task that creates a template hackbox for you. Required arguments are @--namespace=@ and @--protocol=@. Optional arguments are @--targets=@, @--s3access=@, and @--s3secret=@.
+ Hackboxen comes with scaffold task that creates a template hackbox for you. Required arguments are @--namespace=@ and @--protocol=@. Optional arguments are @--targets=@.
 
- <pre><code>hb-scaffold --namespace=foo.bar --protocol --targets=catalog,mysql
+ <pre><code>hb-scaffold --namespace=foo.bar.baz --protocol=qux --targets=catalog,mysql
  </code></pre>
 
  This will create the following directories and files:
@@ -145,13 +135,14 @@ This will create the following directories and files:
  └── foo
  └── bar
  └── baz
- ├── config
- ├── config.yaml
- └── baz.icss.yaml
- ├── engine
- ├── main
- └── baz_endpoint.rb
- └── Rakefile
+ └── qux
+ ├── config
+ ├── config.yaml
+ │ └── qux.icss.yaml
+ ├── engine
+ ├── main
+ └── qux_endpoint.rb
+ └── Rakefile
  </code></pre>
 
  h3. Running a hackbox
data/VERSION CHANGED
@@ -1 +1 @@
- 0.1.0
+ 0.1.1
@@ -5,11 +5,11 @@
 
  Gem::Specification.new do |s|
    s.name = %q{hackboxen}
-   s.version = "0.1.0"
+   s.version = "0.1.1"
 
    s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
    s.authors = ["kornypoet", "Ganglion", "bollacker"]
-   s.date = %q{2011-06-30}
+   s.date = %q{2011-07-06}
    s.description = %q{A simple framework to assist in standardizing the data-munging input/output process.}
    s.email = %q{travis@infochimps.com}
    s.executables = ["hb-install", "hb-scaffold", "hb-runner"]
@@ -6,7 +6,7 @@ module HackBoxen
    icss_yaml = File.join(path_to(:hb_config), "#{WorkingConfig[:protocol]}.icss.yaml")
    if File.exists? icss_yaml
      icss = YAML.load(File.read icss_yaml)
-     icss_json = File.join(path_to(:data_dir), "#{WorkingConfig[:protocol]}.icss.json")
+     icss_json = File.join(path_to(:fixd_dir), "#{WorkingConfig[:protocol]}.icss.json")
      HackBoxen.current_fs.open(File.join(icss_json), 'w') { |f| f.puts icss.to_json }
    end
  end
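
The hunk above moves the generated @protocol.icss.json@ from the data directory up to @fixd/@. A minimal sketch of the same YAML-to-JSON step, using only the Ruby standard library, might look like the following. The helper name @publish_icss@ and the plain local-filesystem calls are assumptions for illustration; the gem itself writes through @HackBoxen.current_fs@ and resolves directories with @path_to@.

```ruby
require 'yaml'
require 'json'
require 'tmpdir'

# Hypothetical helper (not part of the gem): read <protocol>.icss.yaml from a
# config directory and write <protocol>.icss.json into a fixd directory.
# Returns the path of the JSON file, or nil when no YAML schema exists.
def publish_icss(config_dir, fixd_dir, protocol)
  icss_yaml = File.join(config_dir, "#{protocol}.icss.yaml")
  return nil unless File.exist?(icss_yaml)

  icss      = YAML.load(File.read(icss_yaml))
  icss_json = File.join(fixd_dir, "#{protocol}.icss.json")
  File.open(icss_json, 'w') { |f| f.puts icss.to_json }
  icss_json
end

# Usage against a throwaway directory; the schema content is made up.
Dir.mktmpdir do |root|
  File.write(File.join(root, 'qux.icss.yaml'), "namespace: foo.bar.baz\nprotocol: qux\n")
  puts File.read(publish_icss(root, root, 'qux'))
end
```

Returning @nil@ when the YAML file is absent matches the guard in the task, which skips the conversion silently because the schema file is optional.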
@@ -13,6 +13,15 @@ module HackBoxen
 
  desc "Create the required output directories using the filesystem specified by the WorkingConfig"
  task :create_required_paths => [:validate_working_config] do
+   output_dirs = [
+     path_to(:hb_dataroot),
+     path_to(:env_dir),
+     path_to(:fixd_dir),
+     path_to(:log_dir),
+     path_to(:ripd_dir),
+     path_to(:rawd_dir),
+     path_to(:data_dir)
+   ]
    output_dirs.each { |dir| HackBoxen.current_fs.mkpath(dir) unless HackBoxen.current_fs.exists? dir }
  end
 
@@ -32,12 +32,25 @@ module HackBoxen
      Swineherd::FileSystem.get fs
    end
 
+   def self.name
+     hackbox_root? ? Dir.pwd.gsub(/#{WorkingConfig[:coderoot]}\/#{current}\//, '') : 'debug'
+   end
+
    def self.current
-     hackbox_root? ? File.join(WorkingConfig[:namespace].gsub('.', '/'), WorkingConfig[:protocol]) : 'debug'
+     hackbox_root? ? WorkingConfig[:namespace].gsub('.', '/') : Dir.pwd
+   end
+
+   def self.coderoot
+     WorkingConfig[:coderoot]
    end
 
+   def self.dataroot
+     WorkingConfig[:dataroot]
+   end
+
+
    def self.verify_dependencies
-     %w[ dataroot namespace protocol ].each do |req|
+     %w[ coderoot dataroot namespace ].each do |req|
        raise "Your hackbox config appears to be missing a [#{req}]" unless WorkingConfig[req.to_sym]
      end
    end
@@ -13,26 +13,14 @@ module HackBoxen
      path.absolute? ? File.expand_path(path) : path.to_s
    end
 
-   def output_dirs
-     [
-       path_to(:hb_dataroot),
-       path_to(:env_dir),
-       path_to(:fixd_dir),
-       path_to(:log_dir),
-       path_to(:ripd_dir),
-       path_to(:rawd_dir),
-       path_to(:data_dir)
-     ]
-   end
-
    def default_paths
      {
        # root directories
        :home => ENV['HOME'],
-       :data_root => WorkingConfig[:dataroot],
-       :code_root => WorkingConfig[:coderoot],
+       :data_root => HackBoxen.dataroot,
+       :code_root => HackBoxen.coderoot,
        # local hackbox directories
-       :hb_current => [:code_root, HackBoxen.current],
+       :hb_current => [:code_root, HackBoxen.current, HackBoxen.name],
        :hb_engine => [:hb_current, 'engine'],
        :hb_config => [:hb_current, 'config'],
        # output directories
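
The path table above mixes literal strings with symbols that refer to other entries, so @:hb_engine@ is built from @:hb_current@, which is in turn built from @:code_root@. The diff does not show how @path_to@ expands these entries; the following is only one plausible way to do it, with a made-up table and a hypothetical @resolve_path@ helper.

```ruby
# Hypothetical table in the same shape as default_paths; values are made up.
PATHS = {
  :code_root  => '/code/hb',
  :hb_current => [:code_root, 'foo/bar/baz', 'qux'],
  :hb_engine  => [:hb_current, 'engine'],
  :hb_config  => [:hb_current, 'config']
}

# Resolve a key to a path: symbols are looked up recursively in the table,
# strings are used as literal path segments.
def resolve_path(key, table = PATHS)
  parts = Array(table.fetch(key)).map do |part|
    part.is_a?(Symbol) ? resolve_path(part, table) : part
  end
  File.join(*parts)
end

puts resolve_path(:hb_engine)
puts resolve_path(:hb_config)
```

Under this reading, adding @HackBoxen.name@ to @:hb_current@ is enough to move every dependent entry (@:hb_engine@, @:hb_config@) one directory deeper without touching them individually.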
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: hackboxen
  version: !ruby/object:Gem::Version
-   hash: 27
+   hash: 25
    prerelease: false
    segments:
    - 0
    - 1
-   - 0
-   version: 0.1.0
+   - 1
+   version: 0.1.1
  platform: ruby
  authors:
  - kornypoet
@@ -17,7 +17,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2011-06-30 00:00:00 -05:00
+ date: 2011-07-06 00:00:00 -05:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency