trinamo 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: a9f5ffc075b447acba09cc09f5c5422858c887cf
4
- data.tar.gz: e5d3de4c641e481365f9ef0bf3d80b20dd2356b2
3
+ metadata.gz: aaf548285a2c17c8ef8913b70e42058c5dd7d1d2
4
+ data.tar.gz: 223310522802f3482caa2b46556c068a4f1cd9a4
5
5
  SHA512:
6
- metadata.gz: e91bcd9fd3b6883fbd039ae1c611e07d33be735cdb3ca56169b579230aad5a54084823eea2624aae21f9090368a1f3f2def0e7563d0b611016bfff9a7d1e0621
7
- data.tar.gz: 2b4cad56a5dbda92a7945765b8adb897ea6bf7d77556225dc9fa59a64e6c8529f705f047dbb3e36d05c5d9c33403c7288c68d9703e79faca81a408140ac4d91c
6
+ metadata.gz: 6ef96ddbe9510141e62835a003e9cbfa03ea518e59fd348ac257bd48b578ef1a0f7e77b6947d50f4cb77e0cd79b40b51e00fc6690fcf2d351cf2bc99ff6412af
7
+ data.tar.gz: 58e45dcfe9dd86704894282028a5a6f5666f758f0d76052cd52e3e628acfa73c7d90fd55909e51b7941727452a7391b70d78dcf6c2b9b0be7145926d12a02f9c
@@ -1,6 +1,6 @@
1
1
  The MIT License (MIT)
2
2
 
3
- Copyright (c) 2016 cignoir
3
+ Copyright (c) 2016 Shulla Cignoir
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
6
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,8 +1,12 @@
1
1
  # Trinamo
2
2
 
3
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/trinamo`. To experiment with that code, run `bin/console` for an interactive prompt.
3
+ Trinamo generates DDL for Hive from YAML
4
+ to mount tables of DynamoDB, S3 and local HDFS.
4
5
 
5
- TODO: Delete this and the text above, and describe your gem
6
+ ```
7
+ Notice:
8
+ This is experimental software. Do not use it in production.
9
+ ```
6
10
 
7
11
  ## Installation
8
12
 
@@ -22,17 +26,117 @@ Or install it yourself as:
22
26
 
23
27
  ## Usage
24
28
 
25
- TODO: Write usage instructions here
29
+ ### Create a DDL template
30
+
31
+ * RUN:
32
+ ```ruby
33
+ Trinamo::Converter.generate_template('ddl.yml')
34
+ ```
35
+
36
+ * OUTPUT:
37
+ ```yaml
38
+ dynamo_read_percent: 0.75
39
+ tables:
40
+   - name: comments
41
+     s3_location: s3://path/to/s3/table/location
42
+     s3_partition:
43
+       - name: date
44
+         type: string
45
+     hash_key:
46
+       - name: user_id
47
+         type: bigint
48
+     range_key:
49
+       - name: comment_id
50
+         type: bigint
51
+     attributes:
52
+       - name: title
53
+         type: string
54
+       - name: content
55
+         type: string
56
+       - name: rate
57
+         type: double
58
+   - name: authors
59
+     hash_key:
60
+       - name: author_id
61
+         type: bigint
62
+     attributes:
63
+       - name: name
64
+         type: string
65
+ ```
66
+
67
+ ### Create a mapper for DynamoDB
68
+
69
+ * RUN:
70
+ ```ruby
71
+ Trinamo::Converter.load('ddl.yml', :dynamodb).convert
72
+ ```
73
+
74
+ * OUTPUT:
75
+ ```hql
76
+ SET dynamodb.throughput.read.percent = 0.75;
77
+ SET hive.exec.compress.output=true;
78
+ SET io.seqfile.compression.type=BLOCK;
79
+ SET mapred.output.compression.codec = com.hadoop.compression.lzo.LzoCodec;
80
+
81
+ -- comments_ddb
82
+ CREATE EXTERNAL TABLE comments_ddb (
83
+ user_id BIGINT,comment_id BIGINT,title STRING,content STRING,rate DOUBLE
84
+ )
85
+ STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
86
+ TBLPROPERTIES (
87
+ 'dynamodb.table.name' = 'comments',
88
+ 'dynamodb.column.mapping' = 'user_id:user_id,comment_id:comment_id,title:title,content:content,rate:rate'
89
+ );
90
+
91
+ -- authors_ddb
92
+ CREATE EXTERNAL TABLE authors_ddb (
93
+ author_id BIGINT,name STRING
94
+ )
95
+ STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
96
+ TBLPROPERTIES (
97
+ 'dynamodb.table.name' = 'authors',
98
+ 'dynamodb.column.mapping' = 'author_id:author_id,name:name'
99
+ );
100
+ ```
101
+
102
+ ### Create a mapper for S3
103
+ * RUN:
104
+ ```ruby
105
+ Trinamo::Converter.load('ddl.yml', :s3).convert
106
+ ```
26
107
 
27
- ## Development
108
+ * OUTPUT:
109
+ ```hql
110
+ -- comments_s3
111
+ CREATE EXTERNAL TABLE comments_s3 (
112
+ user_id BIGINT,comment_id BIGINT,title STRING,content STRING,rate DOUBLE
113
+ ) PARTITIONED BY (date STRING)
114
+ ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
115
+ LOCATION 's3://path/to/s3/table/location';
116
+ ```
28
117
 
29
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
118
+ ### Create a mapper for local HDFS
119
+ * RUN:
120
+ ```ruby
121
+ Trinamo::Converter.load('ddl.yml', :hdfs).convert
122
+ ```
30
123
 
31
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
124
+ * OUTPUT:
125
+ ```hql
126
+ -- comments_hdfs
127
+ CREATE TABLE comments_hdfs (
128
+ user_id BIGINT,comment_id BIGINT,title STRING,content STRING,rate DOUBLE
129
+ );
130
+
131
+ -- authors_hdfs
132
+ CREATE TABLE authors_hdfs (
133
+ author_id BIGINT,name STRING
134
+ );
135
+ ```
32
136
 
33
137
  ## Contributing
34
138
 
35
- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/trinamo. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
139
+ Bug reports and pull requests are welcome on GitHub at https://github.com/cignoir/trinamo.
36
140
 
37
141
 
38
142
  ## License
@@ -0,0 +1,54 @@
1
+ require 'unindent'
2
+ require_relative './converter/dynamodb_converter'
3
+ require_relative './converter/hdfs_converter'
4
+ require_relative './converter/s3_converter'
5
+
6
+ module Trinamo
7
+ class Converter
8
+ class << self
9
+ def load(ddl_yaml_path, format)
10
+ case format
11
+ when :hdfs then HdfsConverter.new(ddl_yaml_path)
12
+ when :s3 then S3Converter.new(ddl_yaml_path)
13
+ when :dynamodb then DynamodbConverter.new(ddl_yaml_path)
14
+ else raise "[ERROR] Unknown format: #{format}"
15
+ end
16
+ end
17
+
18
+ def generate_template(out_file_path = nil)
19
+ template = <<-TEMPLATE.unindent
20
+ dynamo_read_percent: 0.75
21
+ tables:
22
+   - name: comments
23
+     s3_location: s3://path/to/s3/table/location
24
+     s3_partition:
25
+       - name: date
26
+         type: string
27
+     hash_key:
28
+       - name: user_id
29
+         type: bigint
30
+     range_key:
31
+       - name: comment_id
32
+         type: bigint
33
+     attributes:
34
+       - name: title
35
+         type: string
36
+       - name: content
37
+         type: string
38
+       - name: rate
39
+         type: double
40
+   - name: authors
41
+     hash_key:
42
+       - name: author_id
43
+         type: bigint
44
+     attributes:
45
+       - name: name
46
+         type: string
47
+ TEMPLATE
48
+
49
+ File.binwrite(out_file_path, template) if out_file_path
50
+ template
51
+ end
52
+ end
53
+ end
54
+ end
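Note on the format dispatch in `Converter.load`: the `case` above maps a format symbol to a converter class and raises on anything else. A minimal sketch of the same idea using a lookup table follows. The helper name `converter_class_name` is hypothetical and not part of the gem, and it returns class names as strings so that it runs without the gem installed.

```ruby
# Hypothetical helper, not part of trinamo: resolve a format symbol to the
# name of the converter class that handles it.
def converter_class_name(format)
  names = {
    hdfs:     'HdfsConverter',
    s3:       'S3Converter',
    dynamodb: 'DynamodbConverter'
  }
  # Hash#fetch runs the block only when the key is missing, so an unknown
  # format fails loudly instead of returning nil.
  names.fetch(format) do
    raise ArgumentError, "Unknown format: #{format.inspect} (expected one of #{names.keys.inspect})"
  end
end

puts converter_class_name(:s3)
```

With a table like this, supporting another format would mean adding one entry rather than another `when` branch.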
@@ -0,0 +1,13 @@
1
+ require 'yaml'
2
+ require 'active_support/core_ext/hash'
3
+
4
+ module Trinamo
5
+ class BaseConverter
6
+ attr_accessor :ddl_yaml_path, :ddl
7
+
8
+ def initialize(ddl_yaml_path)
9
+ @ddl_yaml_path = ddl_yaml_path
10
+ @ddl = YAML.load_file(ddl_yaml_path).deep_symbolize_keys
11
+ end
12
+ end
13
+ end
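Note on key symbolization: `BaseConverter` relies on ActiveSupport's `deep_symbolize_keys` so the converters can read `@ddl[:tables]`, `h[:hash_key]`, and so on. As a rough illustration only, the same shape can be produced with the standard library, assuming a Psych version that accepts the `symbolize_names:` keyword. `parse_ddl` is a hypothetical name, not a method of the gem.

```ruby
require 'yaml'

# Hypothetical helper: parse DDL YAML text into nested hashes with symbol keys.
def parse_ddl(yaml_text)
  YAML.safe_load(yaml_text, symbolize_names: true)
end

sample = <<~DDL
  dynamo_read_percent: 0.75
  tables:
    - name: authors
      hash_key:
        - name: author_id
          type: bigint
      attributes:
        - name: name
          type: string
DDL

ddl = parse_ddl(sample)
puts ddl[:tables].first[:hash_key].first[:name]
```

Every mapping key becomes a symbol at every nesting level, which is what the `h[:name]` and `attr[:type]` lookups in the converters expect.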
@@ -0,0 +1,33 @@
1
+ require_relative './base_converter'
2
+
3
+ module Trinamo
4
+ class DynamodbConverter < BaseConverter
5
+ def convert
6
+ read_percent = @ddl[:dynamo_read_percent] || 0.5
7
+
8
+ ddl_header = <<-DDL.unindent
9
+ SET dynamodb.throughput.read.percent = #{read_percent};
10
+ SET hive.exec.compress.output=true;
11
+ SET io.seqfile.compression.type=BLOCK;
12
+ SET mapred.output.compression.codec = com.hadoop.compression.lzo.LzoCodec;
13
+ DDL
14
+
15
+ ddl_body = @ddl[:tables].map do |h|
16
+ fields = ([h[:hash_key]] + [h[:range_key]] + [h[:attributes]]).flatten.compact
17
+ <<-DDL.unindent
18
+ -- #{h[:name]}_ddb
19
+ CREATE EXTERNAL TABLE #{h[:name]}_ddb (
20
+ #{fields.map { |attr| "#{attr[:name]} #{attr[:type].upcase}" }.join(',')}
21
+ )
22
+ STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
23
+ TBLPROPERTIES (
24
+ 'dynamodb.table.name' = '#{h[:name]}',
25
+ 'dynamodb.column.mapping' = '#{fields.map { |attr| "#{attr[:name]}:#{attr[:name]}" }.join(',') }'
26
+ );
27
+ DDL
28
+ end
29
+
30
+ ([ddl_header] + ddl_body).join("\n")
31
+ end
32
+ end
33
+ end
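Note on how the column list and the `dynamodb.column.mapping` value are built: all three converters concatenate `hash_key`, `range_key` and `attributes`, drop the missing ones, and map each field to `name TYPE`. The sketch below isolates that step with plain hashes. The helper names are hypothetical and the sample table is made up for illustration.

```ruby
# Hypothetical helpers mirroring the field handling in the converters above.
def hive_fields(table)
  # values_at returns nil for absent keys (e.g. no range_key); compact drops them.
  table.values_at(:hash_key, :range_key, :attributes).flatten.compact
end

def column_list(table)
  hive_fields(table).map { |f| "#{f[:name]} #{f[:type].upcase}" }.join(',')
end

def column_mapping(table)
  # Each Hive column is mapped to the DynamoDB attribute of the same name.
  hive_fields(table).map { |f| "#{f[:name]}:#{f[:name]}" }.join(',')
end

table = {
  name: 'comments',
  hash_key:   [{ name: 'user_id', type: 'bigint' }],
  range_key:  [{ name: 'comment_id', type: 'bigint' }],
  attributes: [{ name: 'title', type: 'string' }]
}

puts column_list(table)    # => user_id BIGINT,comment_id BIGINT,title STRING
puts column_mapping(table) # => user_id:user_id,comment_id:comment_id,title:title
```

A table without a `range_key`, such as `authors` in the template, goes through the same path because the `nil` entry is removed by `compact`.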
@@ -0,0 +1,19 @@
1
+ require_relative './base_converter'
2
+
3
+ module Trinamo
4
+ class HdfsConverter < BaseConverter
5
+ def convert
6
+ ddl_body = @ddl[:tables].map do |h|
7
+ fields = ([h[:hash_key]] + [h[:range_key]] + [h[:attributes]]).flatten.compact
8
+ <<-DDL.unindent
9
+ -- #{h[:name]}_hdfs
10
+ CREATE TABLE #{h[:name]}_hdfs (
11
+ #{fields.map { |attr| "#{attr[:name]} #{attr[:type].upcase}" }.join(',')}
12
+ );
13
+ DDL
14
+ end
15
+
16
+ ddl_body.join("\n")
17
+ end
18
+ end
19
+ end
@@ -0,0 +1,27 @@
1
+ require_relative './base_converter'
2
+
3
+ module Trinamo
4
+ class S3Converter < BaseConverter
5
+ def convert
6
+ ddl_body = @ddl[:tables].map do |h|
7
+ if h[:s3_location]
8
+ fields = ([h[:hash_key]] + [h[:range_key]] + [h[:attributes]]).flatten.compact
9
+ partitioned_by = h[:s3_partition] ? "PARTITIONED BY (#{h[:s3_partition].map { |attr| "#{attr[:name]} #{attr[:type].upcase}" }.join(',')})" : ''
10
+ <<-DDL.unindent
11
+ -- #{h[:name]}_s3
12
+ CREATE EXTERNAL TABLE #{h[:name]}_s3 (
13
+ #{fields.map { |attr| "#{attr[:name]} #{attr[:type].upcase}" }.join(',')}
14
+ ) #{partitioned_by}
15
+ ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'
16
+ LOCATION '#{h[:s3_location]}';
17
+ DDL
18
+ else
19
+ warn "[ERROR] No s3_location is defined for #{h[:name]}; skipping #{h[:name]}_s3"
20
+ nil
21
+ end
22
+ end
23
+
24
+ ddl_body.compact.join("\n")
25
+ end
26
+ end
27
+ end
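Note on the optional `PARTITIONED BY` clause: the S3 converter emits it only when `s3_partition` is present, and skips tables that have no `s3_location`. A minimal standalone sketch of that logic is shown below. `s3_table_ddl` is a hypothetical name and the bucket path is a placeholder. Unlike the heredoc above, this version assembles the statement line by line, so no trailing space is left after `)` when there is no partition.

```ruby
# Hypothetical helper: build the S3 DDL for one table hash, or nil if the
# table has no s3_location.
def s3_table_ddl(table)
  return nil unless table[:s3_location]

  fields    = table.values_at(:hash_key, :range_key, :attributes).flatten.compact
  columns   = fields.map { |f| "#{f[:name]} #{f[:type].upcase}" }.join(',')
  # Array(nil) is [], so a missing s3_partition yields an empty string.
  partition = Array(table[:s3_partition]).map { |p| "#{p[:name]} #{p[:type].upcase}" }.join(',')

  lines = ["CREATE EXTERNAL TABLE #{table[:name]}_s3 (", "  #{columns}"]
  lines << (partition.empty? ? ')' : ") PARTITIONED BY (#{partition})")
  lines << "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'"
  lines << "LOCATION '#{table[:s3_location]}';"
  lines.join("\n")
end

puts s3_table_ddl({
  name: 'comments',
  s3_location: 's3://example-bucket/comments',
  s3_partition: [{ name: 'date', type: 'string' }],
  hash_key: [{ name: 'user_id', type: 'bigint' }]
})
```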
@@ -1,3 +1,3 @@
1
1
  module Trinamo
2
- VERSION = "0.1.0"
2
+ VERSION = "0.2.0"
3
3
  end
@@ -9,7 +9,7 @@ Gem::Specification.new do |spec|
9
9
  spec.authors = ["cignoir"]
10
10
  spec.email = ["cignoir@gmail.com"]
11
11
 
12
- spec.summary = %q{utilities for aws-dynamodb}
12
+ spec.summary = %q{DDL Generator for Hive from YAML}
13
13
  spec.description = %q{}
14
14
  spec.homepage = "https://github.com/cignoir/trinamo"
15
15
  spec.license = "MIT"
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: trinamo
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - cignoir
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2016-06-08 00:00:00.000000000 Z
11
+ date: 2016-06-09 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activesupport
@@ -88,14 +88,8 @@ extensions: []
88
88
  extra_rdoc_files: []
89
89
  files:
90
90
  - ".gitignore"
91
- - ".idea/encodings.xml"
92
- - ".idea/misc.xml"
93
- - ".idea/modules.xml"
94
- - ".idea/trinamo.iml"
95
- - ".idea/workspace.xml"
96
91
  - ".rspec"
97
92
  - ".travis.yml"
98
- - CODE_OF_CONDUCT.md
99
93
  - Gemfile
100
94
  - LICENSE.txt
101
95
  - README.md
@@ -103,6 +97,11 @@ files:
103
97
  - bin/console
104
98
  - bin/setup
105
99
  - lib/trinamo.rb
100
+ - lib/trinamo/converter.rb
101
+ - lib/trinamo/converter/base_converter.rb
102
+ - lib/trinamo/converter/dynamodb_converter.rb
103
+ - lib/trinamo/converter/hdfs_converter.rb
104
+ - lib/trinamo/converter/s3_converter.rb
106
105
  - lib/trinamo/hive_type.rb
107
106
  - lib/trinamo/version.rb
108
107
  - trinamo.gemspec
@@ -129,5 +128,6 @@ rubyforge_project:
129
128
  rubygems_version: 2.4.5.1
130
129
  signing_key:
131
130
  specification_version: 4
132
- summary: utilities for aws-dynamodb
131
+ summary: DDL Generator for Hive from YAML
133
132
  test_files: []
133
+ has_rdoc:
@@ -1,6 +0,0 @@
1
- <?xml version="1.0" encoding="UTF-8"?>
2
- <project version="4">
3
- <component name="Encoding">
4
- <file url="PROJECT" charset="UTF-8" />
5
- </component>
6
- </project>
@@ -1,14 +0,0 @@
1
- <?xml version="1.0" encoding="UTF-8"?>
2
- <project version="4">
3
- <component name="ProjectLevelVcsManager" settingsEditedManually="false">
4
- <OptionsSetting value="true" id="Add" />
5
- <OptionsSetting value="true" id="Remove" />
6
- <OptionsSetting value="true" id="Checkout" />
7
- <OptionsSetting value="true" id="Update" />
8
- <OptionsSetting value="true" id="Status" />
9
- <OptionsSetting value="true" id="Edit" />
10
- <ConfirmationsSetting value="0" id="Add" />
11
- <ConfirmationsSetting value="0" id="Remove" />
12
- </component>
13
- <component name="ProjectRootManager" version="2" project-jdk-name="ruby-2.1.5-p273" project-jdk-type="RUBY_SDK" />
14
- </project>
@@ -1,8 +0,0 @@
1
- <?xml version="1.0" encoding="UTF-8"?>
2
- <project version="4">
3
- <component name="ProjectModuleManager">
4
- <modules>
5
- <module fileurl="file://$PROJECT_DIR$/.idea/trinamo.iml" filepath="$PROJECT_DIR$/.idea/trinamo.iml" />
6
- </modules>
7
- </component>
8
- </project>
@@ -1,52 +0,0 @@
1
- <?xml version="1.0" encoding="UTF-8"?>
2
- <module type="RUBY_MODULE" version="4">
3
- <component name="FacetManager">
4
- <facet type="gem" name="New Gem">
5
- <configuration>
6
- <option name="GEM_APP_ROOT_PATH" value="$MODULE_DIR$" />
7
- <option name="GEM_APP_TEST_PATH" value="$MODULE_DIR$/test" />
8
- <option name="GEM_APP_LIB_PATH" value="$MODULE_DIR$/lib" />
9
- </configuration>
10
- </facet>
11
- </component>
12
- <component name="ModuleRunConfigurationManager">
13
- <configuration default="false" name="test.rb" type="RubyRunConfigurationType" factoryName="Ruby">
14
- <module name="trinamo" />
15
- <RUBY_RUN_CONFIG NAME="RUBY_ARGS" VALUE="-e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift)" />
16
- <RUBY_RUN_CONFIG NAME="WORK DIR" VALUE="" />
17
- <RUBY_RUN_CONFIG NAME="SHOULD_USE_SDK" VALUE="false" />
18
- <RUBY_RUN_CONFIG NAME="ALTERN_SDK_NAME" VALUE="" />
19
- <RUBY_RUN_CONFIG NAME="myPassParentEnvs" VALUE="true" />
20
- <envs />
21
- <EXTENSION ID="BundlerRunConfigurationExtension" bundleExecEnabled="true" />
22
- <EXTENSION ID="JRubyRunConfigurationExtension" NailgunExecEnabled="false" />
23
- <EXTENSION ID="RubyCoverageRunConfigurationExtension" enabled="false" sample_coverage="true" track_test_folders="true" runner="rcov" />
24
- <RUBY_RUN_CONFIG NAME="SCRIPT_PATH" VALUE="$MODULE_DIR$/test.rb" />
25
- <RUBY_RUN_CONFIG NAME="SCRIPT_ARGS" VALUE="" />
26
- <method />
27
- </configuration>
28
- </component>
29
- <component name="NewModuleRootManager">
30
- <content url="file://$MODULE_DIR$">
31
- <sourceFolder url="file://$MODULE_DIR$/test" isTestSource="true" />
32
- <excludeFolder url="file://$MODULE_DIR$/.bundle" />
33
- <excludeFolder url="file://$MODULE_DIR$/vendor/bundle" />
34
- </content>
35
- <orderEntry type="jdk" jdkName="ruby-2.2.4-p230" jdkType="RUBY_SDK" />
36
- <orderEntry type="sourceFolder" forTests="false" />
37
- <orderEntry type="library" scope="PROVIDED" name="activesupport (v4.2.6, ruby-2.2.4-p230) [gem]" level="application" />
38
- <orderEntry type="library" scope="PROVIDED" name="diff-lcs (v1.2.5, ruby-2.2.4-p230) [gem]" level="application" />
39
- <orderEntry type="library" scope="PROVIDED" name="i18n (v0.7.0, ruby-2.2.4-p230) [gem]" level="application" />
40
- <orderEntry type="library" scope="PROVIDED" name="json (v1.8.3, ruby-2.2.4-p230) [gem]" level="application" />
41
- <orderEntry type="library" scope="PROVIDED" name="minitest (v5.9.0, ruby-2.2.4-p230) [gem]" level="application" />
42
- <orderEntry type="library" scope="PROVIDED" name="rake (v10.5.0, ruby-2.2.4-p230) [gem]" level="application" />
43
- <orderEntry type="library" scope="PROVIDED" name="rspec (v3.4.0, ruby-2.2.4-p230) [gem]" level="application" />
44
- <orderEntry type="library" scope="PROVIDED" name="rspec-core (v3.4.4, ruby-2.2.4-p230) [gem]" level="application" />
45
- <orderEntry type="library" scope="PROVIDED" name="rspec-expectations (v3.4.0, ruby-2.2.4-p230) [gem]" level="application" />
46
- <orderEntry type="library" scope="PROVIDED" name="rspec-mocks (v3.4.1, ruby-2.2.4-p230) [gem]" level="application" />
47
- <orderEntry type="library" scope="PROVIDED" name="rspec-support (v3.4.1, ruby-2.2.4-p230) [gem]" level="application" />
48
- <orderEntry type="library" scope="PROVIDED" name="thread_safe (v0.3.5, ruby-2.2.4-p230) [gem]" level="application" />
49
- <orderEntry type="library" scope="PROVIDED" name="tzinfo (v1.2.2, ruby-2.2.4-p230) [gem]" level="application" />
50
- <orderEntry type="library" scope="PROVIDED" name="unindent (v1.0, ruby-2.2.4-p230) [gem]" level="application" />
51
- </component>
52
- </module>
@@ -1,49 +0,0 @@
1
- # Contributor Code of Conduct
2
-
3
- As contributors and maintainers of this project, and in the interest of
4
- fostering an open and welcoming community, we pledge to respect all people who
5
- contribute through reporting issues, posting feature requests, updating
6
- documentation, submitting pull requests or patches, and other activities.
7
-
8
- We are committed to making participation in this project a harassment-free
9
- experience for everyone, regardless of level of experience, gender, gender
10
- identity and expression, sexual orientation, disability, personal appearance,
11
- body size, race, ethnicity, age, religion, or nationality.
12
-
13
- Examples of unacceptable behavior by participants include:
14
-
15
- * The use of sexualized language or imagery
16
- * Personal attacks
17
- * Trolling or insulting/derogatory comments
18
- * Public or private harassment
19
- * Publishing other's private information, such as physical or electronic
20
- addresses, without explicit permission
21
- * Other unethical or unprofessional conduct
22
-
23
- Project maintainers have the right and responsibility to remove, edit, or
24
- reject comments, commits, code, wiki edits, issues, and other contributions
25
- that are not aligned to this Code of Conduct, or to ban temporarily or
26
- permanently any contributor for other behaviors that they deem inappropriate,
27
- threatening, offensive, or harmful.
28
-
29
- By adopting this Code of Conduct, project maintainers commit themselves to
30
- fairly and consistently applying these principles to every aspect of managing
31
- this project. Project maintainers who do not follow or enforce the Code of
32
- Conduct may be permanently removed from the project team.
33
-
34
- This code of conduct applies both within project spaces and in public spaces
35
- when an individual is representing the project or its community.
36
-
37
- Instances of abusive, harassing, or otherwise unacceptable behavior may be
38
- reported by contacting a project maintainer at cignoir@gmail.com. All
39
- complaints will be reviewed and investigated and will result in a response that
40
- is deemed necessary and appropriate to the circumstances. Maintainers are
41
- obligated to maintain confidentiality with regard to the reporter of an
42
- incident.
43
-
44
- This Code of Conduct is adapted from the [Contributor Covenant][homepage],
45
- version 1.3.0, available at
46
- [http://contributor-covenant.org/version/1/3/0/][version]
47
-
48
- [homepage]: http://contributor-covenant.org
49
- [version]: http://contributor-covenant.org/version/1/3/0/