snowly 0.1.2 → 0.1.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  !binary "U0hBMQ==":
3
- metadata.gz: dc11176a4d316185bd37a2543f52066ae53ac795
4
- data.tar.gz: dffd40fb0d396f736e251a78801dfe16ba1ab72d
3
+ metadata.gz: d173fa98485fdac15319ceb8c82e13e217a71e01
4
+ data.tar.gz: 29b7b391be367f4cdc8ade831fc115aef27eed59
5
5
  SHA512:
6
- metadata.gz: aa321e2d03edd48254b5434231cea90301b00a6e7b700e7fa8be7f681be82aba97f29192244cdb8a78b649850bf4ef8fc82f5967c702a9bc2b2e09156fceb02c
7
- data.tar.gz: 660c13104bf3e0ec3a9eea2e10e5616336c52d05208de44bada6c5531ffd0f6de942f8172cb51d5b2b81e73d427b7c7af4f340a75492e653db2da7577541ac93
6
+ metadata.gz: baaa31a93be34a90f0ee857b8fe54c83d0f07e4bfbf07c205ed050d461a411dd99e35d455498ae620ad7077225731b9a01b1275ecf4a86fb955fa04b48524736
7
+ data.tar.gz: 7c76ad98a44bbe6da61bc4b3d9dd5eab1ecf38b36a557bdac76980990c24d6dd818e63f57b05652a40ecbc977b07e8da96541651142b1f135dc8dff317ee0477
data/README.md CHANGED
@@ -1,28 +1,181 @@
1
- # Snowly
1
+ # Snowly - Snowplow Request Validator
2
2
 
3
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/snowly`. To experiment with that code, run `bin/console` for an interactive prompt.
3
+ Debug your Snowplow implementation locally, without resorting to Snowplow's ETL tasks. It's like Facebook's URL Linter, but for Snowplow.
4
4
 
5
- TODO: Delete this and the text above, and describe your gem
5
+ Snowly is a minimal [Collector](https://github.com/snowplow/snowplow/wiki/Setting-up-a-collector) implementation intended to run in your development environment. It comes with a comprehensive validation engine that points out any schema requirement violations. You can easily validate your event requests before having to emit them to a Cloudfront, Clojure or Scala collector.
6
+
7
+ ### Motivation
8
+
9
+ Snowplow has an excellent toolset, but the first implementation stages can be hard. To run Snowplow properly you have to set up a lot of external dependencies like AWS permissions, Cloudfront distributions and EMR jobs. If you're tweaking the Snowplow model to fit your needs or using trackers that don't enforce every requirement, you'll find yourself waiting for the ETL jobs to run in order to validate every implementation change.
10
+
11
+ ### Who will get the most from Snowly
12
+
13
+ - Teams that need to extend the snowplow model with custom contexts or unstructured events.
14
+ - Applications that are constantly evolving their schemas and rules.
15
+ - Developers trying out Snowplow before committing to it.
16
+
17
+ ### Features
18
+
19
+ With Snowly you can use [Json Schemas](http://spacetelescope.github.io/understanding-json-schema/) to define more expressive event requirements. Aside from assuring that you're fully compatible with the snowplow protocol, you can go even further and extend it with a set of more specific rules.
20
+
21
+ Use cases:
22
+
23
+ - Validate custom contexts or unstructured event types and required fields.
24
+ - Restrict values for any field, like using a custom dictionary for the structured event action field.
25
+ - Define requirements based on the content of another field: If __event action__ is 'viewed_product', __event property__ is required.
6
26
 
7
27
  ## Installation
8
28
 
9
- Add this line to your application's Gemfile:
29
+ ```bash
30
+ gem install snowly
31
+ ```
32
+ That will copy a `snowly` executable to your system.
33
+
34
+ ### Development Iglu Resolver Path
35
+
36
+ If you still don't know anything about [Snowplow's Iglu Resolvers](https://github.com/snowplow/iglu), don't worry. It's pretty straightforward.
37
+ Snowly must be able to find your custom context and unstructured event schemas, so you have to set up a local path to store them. You can also choose to use an [external resolver](https://github.com/snowplow/iglu/wiki/Static-repo-setup) pointing to a URL.
38
+
39
+ For a local setup, store your schemas under any path accessible by your user (e.g. ~/schemas). The only catch is that you must comply with Snowplow's naming conventions for your JSON schemas. Snowplow references: [[1]](https://github.com/snowplow/snowplow/wiki/snowplow-tracker-protocol#custom-contexts), [[2]](https://github.com/snowplow/snowplow/wiki/snowplow-tracker-protocol#310-custom-unstructured-event-tracking)
40
+
41
+ You must export an environment variable to make Snowly aware of that path. Add it to your .bash_profile or equivalent.
42
+ ```bash
43
+ # A local path is the recommended approach, as it's easier to evolve your schemas
44
+ # without the hassle of setting up an actual resolver.
45
+ export DEVELOPMENT_IGLU_RESOLVER_PATH=~/schemas
46
+
47
+ # or host on a Static Website on Amazon Web Services, for instance.
48
+ export DEVELOPMENT_IGLU_RESOLVER_PATH=http://my_resolver_bucket.s3-website-us-east-1.amazonaws.com
49
+ ```
50
+
51
+ Example:
52
+ ```bash
53
+ # create a user context
54
+ mkdir -p ~/schemas/com.my_company/hero_user/jsonschema
55
+ touch ~/schemas/com.my_company/hero_user/jsonschema/1-0-0
10
56
 
57
+ # create a viewed product unstructured event
58
+ mkdir -p ~/schemas/com.my_company/viewed_product/jsonschema
59
+ touch ~/schemas/com.my_company/viewed_product/jsonschema/1-0-0
60
+ ```
61
+
62
+ `1-0-0` is the actual json schema file. You will find examples just ahead.
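
The directory layout above mirrors the Iglu address of each schema. As a rough sketch of that mapping, the snippet below turns an `iglu:` address into a file path under a resolver directory. `local_schema_path` is a hypothetical helper written for illustration, not part of Snowly's API, and `/home/me/schemas` stands in for `~/schemas`.

```ruby
# Hypothetical helper (not Snowly's API): maps an Iglu address to a file
# under a local resolver directory by dropping the "iglu:" prefix.
def local_schema_path(iglu_address, resolver_path)
  relative = iglu_address.sub(/\Aiglu:/, '')
  File.join(resolver_path, relative)
end

# "/home/me/schemas" is a placeholder for your own resolver path.
path = local_schema_path('iglu:com.my_company/hero_user/jsonschema/1-0-0', '/home/me/schemas')
puts path
```

If the printed path does not point at an existing file, the vendor, name, format or version segment of the address probably does not match the folder structure.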
63
+
64
+ ## Usage
65
+
66
+ Just use `snowly` to start and `snowly -K` to stop. Where allowed, a browser window will open showing the collector's address.
67
+
68
+ Other options:
69
+
70
+ -K, --kill kill the running process and exit
71
+ -S, --status display the current running PID and URL then quit
72
+ -s, --server SERVER serve using SERVER (thin/mongrel/webrick)
73
+ -o, --host HOST listen on HOST (default: 0.0.0.0)
74
+ -p, --port PORT use PORT (default: 5678)
75
+ -x, --no-proxy ignore env proxy settings (e.g. http_proxy)
76
+ -e, --env ENVIRONMENT use ENVIRONMENT for defaults (default: development)
77
+ -F, --foreground don't daemonize, run in the foreground
78
+ -L, --no-launch don't launch the browser
79
+ -d, --debug raise the log level to :debug (default: :info)
80
+ --app-dir APP_DIR set the app dir where files are stored (default: ~/.vegas/collector/)
81
+ -P, --pid-file PID_FILE set the path to the pid file (default: app_dir/collector.pid)
82
+ --log-file LOG_FILE set the path to the log file (default: app_dir/collector.log)
83
+ --url-file URL_FILE set the path to the URL file (default: app_dir/collector.url)
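
Once the collector is running you can point any tracker at it, or build a request by hand. The sketch below only assembles a page view URL and assumes the collector is reachable on `localhost` at the default port 5678. The `/i` path, the parameter values and the `debug=true` flag are taken from the sample link on the collector's landing page; treat them as an illustration rather than a complete tracker payload.

```ruby
require 'uri'

# Query string parameters for a simple page view, following the
# Snowplow tracker protocol names used on the collector's landing page.
params = {
  'e'     => 'pv',
  'page'  => 'Root README',
  'url'   => 'http://github.com/snowplow/snowplow',
  'aid'   => 'snowplow',
  'p'     => 'web',
  'tv'    => 'no-js-0.1.0',
  'debug' => 'true' # ask for the parsed request instead of the pixel
}

# Assumes the default port and a collector running on this machine.
uri = URI::HTTP.build(host: 'localhost', port: 5678, path: '/i',
                      query: URI.encode_www_form(params))
puts uri

# To actually send it (requires a running collector):
#   require 'net/http'
#   puts Net::HTTP.get_response(uri).body
```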
84
+
85
+ ## JSON Schemas
86
+
87
+ JSON Schema is a powerful tool for validating the structure of JSON data. I recommend reading this excellent [Guide](http://spacetelescope.github.io/understanding-json-schema/) from Michael Droettboom to understand all of its capabilities, but you can start with the examples below.
88
+
89
+ Example:
90
+
91
+ A user context. Well... Not just any user can get there.
92
+
93
+ __Note that this is not valid json because of the comments.__
11
94
  ```ruby
12
- gem 'snowly'
95
+ # ~/schemas/com.my_company/hero_user/jsonschema/1-0-0
96
+ {
97
+ # Your schema will also be checked against the Snowplow Self-Desc Schema requirements.
98
+ "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
99
+ "id": "com.my_company/hero_user/jsonschema/1-0-0", # Give your schema an id for better validation output
100
+ "description": "My first Hero Context",
101
+ "self": {
102
+ "vendor": "com.my_company",
103
+ "name": "hero_user",
104
+ "format": "jsonschema",
105
+ "version": "1-0-0"
106
+ },
107
+
108
+ "type": "object",
109
+ "properties": {
110
+ "name": {
111
+ "type": "string",
112
+ "maxLength": 100 # The hero's name can't be larger than 100 chars
113
+ },
114
+ "special_powers": {
115
+ "type": "array",
116
+ "minItems": 2, # This is not just any hero. He must have at least two special powers.
117
+ "uniqueItems": true
118
+ },
119
+ "age": {
120
+ "type": "integer", # Strings are not allowed.
121
+ "minimum": 15, # The Powerpuff Girls aren't allowed
122
+ "maximum": 100 # Wolverine is out
123
+ },
124
+ "cape_color": {
125
+ "type": "string",
126
+ "enum": ["red", "blue", "black"] # Xmen Vision is not welcome
127
+ },
128
+ "is_avenger": {
129
+ "type": "boolean"
130
+ },
131
+ "rating": {
132
+ "type": "number" # Allows for float values
133
+ },
134
+ "address": { # cascading objects with their own validation rules.
135
+ "type": "object",
136
+ "properties": {
137
+ "street_name": {
138
+ "type": "string"
139
+ },
140
+ "number": {
141
+ "type": "integer"
142
+ }
143
+ }
144
+ }
145
+ },
146
+ "required": ["name", "age"], # Name and Age must always be present
147
+ "custom_dependencies": {
148
+ "cape_color": { "name": "superman" } # If the hero's #name is 'superman', #cape_color has to be present.
149
+ },
150
+ "additionalProperties": false # No other unspecified attributes are allowed.
151
+ }
13
152
  ```
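
`custom_dependencies` is not a standard JSON Schema keyword, so it may help to see the rule spelled out. The following is only a sketch of the semantics described in the comment above (a field becomes required when other fields hold specific values). `missing_custom_dependencies` is a hypothetical function, not Snowly's extension code, and treating several conditions as "all must match" is an assumption.

```ruby
# Hypothetical illustration of the rule, not Snowly's implementation.
# rules: { required_field => { other_field => expected_value, ... } }
# Returns the required fields that are missing from data.
def missing_custom_dependencies(rules, data)
  rules.select do |required_field, conditions|
    # Assumption: every listed condition has to match for the rule to apply.
    applies = conditions.all? { |field, value| data[field] == value }
    applies && !data.key?(required_field)
  end.keys
end

rules = { 'cape_color' => { 'name' => 'superman' } }

p missing_custom_dependencies(rules, 'name' => 'superman', 'age' => 30)
p missing_custom_dependencies(rules, 'name' => 'superman', 'age' => 30, 'cape_color' => 'red')
p missing_custom_dependencies(rules, 'name' => 'batman', 'age' => 40)
```

Only the first call reports `cape_color` as missing; the other two satisfy the rule.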
14
153
 
15
- And then execute:
154
+ ### Extending Snowplow's Protocol
16
155
 
17
- $ bundle
156
+ Although Snowplow's protocol isn't originally defined in a JSON schema, it doesn't hurt to do so and take advantage of all its perks. It's also here for the sake of consistency, right?
18
157
 
19
- Or install it yourself as:
158
+ By expressing the protocol in a JSON schema you can extend it to fit your particular needs and enforce domain rules that otherwise wouldn't be available. [Take a look](https://github.com/angelim/snowly/blob/master/lib/schemas/snowplow_protocol.json) at the default schema, derived from the rules specified on the [canonical model](https://github.com/snowplow/snowplow/wiki/canonical-event-model).
20
159
 
21
- $ gem install snowly
160
+ Whenever possible, Snowly will output column names mapped from query string parameters. When two parameters can map to the same content (e.g. regular and base64 versions), a common intuitive name is used (e.g. contexts and unstruct_event).
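
To make the idea of a shared name concrete, here is a small sketch. It assumes the Snowplow tracker protocol parameter pairs `co`/`cx` (contexts) and `ue_pr`/`ue_px` (unstructured event), where the `x` variants carry URL-safe base64. `normalize_params` is a hypothetical function; whether Snowly decodes the values in exactly this way is not shown in this document.

```ruby
require 'base64'

# Hypothetical illustration: collapse plain and base64 parameter variants
# into one common name, decoding the base64 ones.
def normalize_params(params)
  aliases = {
    'co'    => ['contexts', false],
    'cx'    => ['contexts', true],
    'ue_pr' => ['unstruct_event', false],
    'ue_px' => ['unstruct_event', true]
  }
  params.each_with_object({}) do |(key, value), out|
    name, encoded = aliases.fetch(key, [key, false])
    out[name] = encoded ? Base64.urlsafe_decode64(value) : value
  end
end

payload = '{"schema":"iglu:com.my_company/hero_user/jsonschema/1-0-0","data":{"name":"superman"}}'
plain   = normalize_params('co' => payload)
encoded = normalize_params('cx' => Base64.urlsafe_encode64(payload))
puts plain['contexts'] == encoded['contexts']
```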
22
161
 
23
- ## Usage
162
+ You can override the protocol schema by placing it anywhere inside your Local Resolver Path. As of now, the whole file has to be replaced:
163
+
164
+ __It's important to name the file as `snowplow_protocol.json`.__
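
The file name matters because the override is found by name anywhere under the resolver path. The sketch below reproduces that lookup in a temporary directory so you can see which file would be picked up. `find_protocol_override` is a hypothetical stand-alone function modelled on the validator change in this release, not a call into Snowly itself.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-alone version of the lookup: returns the first file
# named snowplow_protocol.json found anywhere below resolver_path, or nil.
def find_protocol_override(resolver_path, file_name = 'snowplow_protocol.json')
  return nil if resolver_path.nil?
  Dir[File.join(resolver_path, '**', '*')].find { |f| File.basename(f) == file_name }
end

found = nil
Dir.mktmpdir do |dir|
  # Simulate a resolver folder with the override nested one level down.
  nested = File.join(dir, 'overrides')
  FileUtils.mkdir_p(nested)
  File.write(File.join(nested, 'snowplow_protocol.json'), '{}')
  found = find_protocol_override(dir)
end
puts found
```

When no such file exists, or no resolver path is configured, the lookup returns `nil`, which corresponds to falling back to the schema bundled with the gem.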
24
165
 
25
- TODO: Write usage instructions here
166
+ An example of useful extensions:
167
+ ```ruby
168
+ ...
169
+ "se_action": {
170
+ "type": "string",
171
+ "enum": ["product_view", "add_to_cart", "product_zoom"] # Only these values are allowed for a structured event action.
172
+ }
173
+
174
+ "custom_dependencies": {
175
+ "true_tstamp": {"platform": "mob"} # You must submit the true timestamp when the platform is set to "mob".
176
+ }
177
+ ...
178
+ ```
26
179
 
27
180
  ## Development
28
181
 
@@ -32,5 +185,5 @@ To install this gem onto your local machine, run `bundle exec rake install`. To
32
185
 
33
186
  ## Contributing
34
187
 
35
- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/snowly.
188
+ Bug reports and pull requests are welcome on GitHub at https://github.com/angelim/snowly.
36
189
 
data/bin/snowly CHANGED
@@ -4,5 +4,5 @@ require "bundler/setup"
4
4
  require 'snowly'
5
5
  require 'snowly/app/collector'
6
6
  require 'vegas'
7
-
7
+ Snowly.debug_mode = true if ARGV.index{ |arg| arg == '-d' or arg == '--debug' }
8
8
  Vegas::Runner.new(Snowly::App::Collector, 'collector')
data/lib/snowly.rb CHANGED
@@ -9,10 +9,11 @@ require 'snowly/validator'
9
9
  require 'snowly/schema_cache'
10
10
 
11
11
  module Snowly
12
- mattr_accessor :local_iglu_resolver_path, :debug_mode
12
+ mattr_accessor :development_iglu_resolver_path, :debug_mode, :logger
13
13
 
14
- @@local_iglu_resolver_path = ENV['LOCAL_IGLU_RESOLVER_PATH']
15
- @@debug_mode = ENV['SNOWLY_DEBUG_MODE'] || false
14
+ @@development_iglu_resolver_path = ENV['DEVELOPMENT_IGLU_RESOLVER_PATH']
15
+ @@debug_mode = false
16
+ @@logger = Logger.new(STDOUT)
16
17
 
17
18
  def self.config
18
19
  yield self
@@ -13,7 +13,7 @@ module Snowly
13
13
 
14
14
  get '/' do
15
15
  @url = request.url.gsub(/(http|https)\:\/\//,'')[0..-2]
16
- @resolved_schemas = if resolver = Snowly.local_iglu_resolver_path
16
+ @resolved_schemas = if resolver = Snowly.development_iglu_resolver_path
17
17
  Dir[File.join(resolver,"/**/*")].select{ |e| File.file? e }
18
18
  else
19
19
  nil
@@ -26,15 +26,19 @@ module Snowly
26
26
  validator = Snowly::Validator.new request.query_string
27
27
  if validator.validate
28
28
  status 200
29
+ content = { content: validator.request.as_hash }.to_json
30
+ Snowly.logger.info content
29
31
  if params[:debug] || Snowly.debug_mode
30
- body({ content: validator.request.as_hash }.to_json)
32
+ body(content)
31
33
  else
32
34
  content_type 'image/gif'
33
35
  Snowly::App::Collector::GIF
34
36
  end
35
37
  else
36
38
  status 500
37
- body ({ errors: validator.errors, content: validator.request.as_hash }.to_json)
39
+ content = { errors: validator.errors, content: validator.request.as_hash }.to_json
40
+ Snowly.logger.error content
41
+ body (content)
38
42
  end
39
43
  end
40
44
  end
@@ -27,33 +27,38 @@
27
27
  <a class="btn btn-lg btn-success" href="/i?&e=pv&page=Root%20README&url=http%3A%2F%2Fgithub.com%2Fsnowplow%2Fsnowplow&aid=snowplow&p=web&tv=no-js-0.1.0&ua=firefox&&eid=u2i3&debug=true" role="button">See it working!</a>
28
28
  <a class="btn btn-lg btn-warning" href="/i?&e=pv&page=Root%20README&url=http%3A%2F%2Fgithub.com%2Fsnowplow%2Fsnowplow&aid=snowplow&p=i&tv=no-js-0.1.0&debug=true" role="button">Event with errors!</a>
29
29
  </p>
30
- <% unless Snowly.local_iglu_resolver_path %>
30
+ <% unless Snowly.development_iglu_resolver_path %>
31
31
  <div class="alert alert-danger" role="alert">The Local Iglu Resolver Path is missing.</div>
32
32
  <% end %>
33
- <p>Use <code>snowly -K</code> to stop the collector.</p>
33
+ <p>
34
+ Use <code>snowly -K</code> to stop the collector.
35
+ </p>
34
36
  </div>
35
37
 
36
38
  <div class="row marketing">
37
39
  <div class="col-lg-12">
38
40
  <div class="panel panel-default">
39
- <div class="panel-heading">Current Configuration</div>
41
+ <div class="panel-heading">
42
+ Current Configuration
43
+ <div class="pull-right"><span class="label label-primary">version <%= Snowly::VERSION %> </span></div>
44
+ </div>
40
45
  <table class="table">
41
46
  <thead>
42
47
  <tr>
43
- <th>Environment Variable</th>
48
+ <th>Configuration</th>
44
49
  <th>Value</th>
45
50
  <th>Description</th>
46
51
  </tr>
47
52
  </thead>
48
53
  <tbody>
49
54
  <tr>
50
- <td>SNOWLY_DEBUG_MODE</td>
55
+ <td>Debug Mode</td>
51
56
  <td><%= Snowly.debug_mode %></td>
52
57
  <td>Renders parsed request instead of a pixel. Defaults to false</td>
53
58
  </tr>
54
59
  <tr>
55
- <td>LOCAL_IGLU_RESOLVER_PATH</td>
56
- <td><%= Snowly.local_iglu_resolver_path %></td>
60
+ <td>DEVELOPMENT_IGLU_RESOLVER_PATH</td>
61
+ <td><%= Snowly.development_iglu_resolver_path %></td>
57
62
  <td>Local path for contexts and unstructured event schemas.</td>
58
63
  </tr>
59
64
  </tbody>
@@ -86,13 +91,13 @@
86
91
  Just like the Resolver you may have already configured for the official ETL tools, Snowly needs a
87
92
  local path to find your custom schemas. You can store them under any path(eg: ~/schemas)
88
93
  Inside that folder you must create a resolver compatible structure:
89
- <code>~/schemas/com.yoursite/schema/my_context/1-0-0</code><br>
90
- <code>~/schemas/com.yoursite/schema/my_event/1-0-0</code><br>
94
+ <code>~/schemas/com.yoursite/my_context/jsonschema/1-0-0</code><br>
95
+ <code>~/schemas/com.yoursite/my_event/jsonschema/1-0-0</code><br>
91
96
  1-0-0 is the file holding the schema.
92
97
  </p>
93
98
  <p>
94
99
  When you emit events, use the schema path from the <code>Resolver path</code><br/>
95
- <code>{ schema: 'iglu:com.yoursite/schema/my_context/1-0-0', data: !some_schema_data! }</code>
100
+ <code>{ schema: 'iglu:com.yoursite/my_context/jsonschema/1-0-0', data: !some_schema_data! }</code>
96
101
  </p>
97
102
  <p>
98
103
  Be sure to give your schemas an <a href="http://spacetelescope.github.io/understanding-json-schema/structuring.html#the-id-property">id</a>, so Snowly can output more helpful validation error messages.
@@ -33,6 +33,10 @@ module Snowly
33
33
  location['iglu:com.snowplowanalytics.snowplow']
34
34
  end
35
35
 
36
+ def external?(location)
37
+ location.match(/^(http|https):\/\//)
38
+ end
39
+
36
40
  # Translate an iglu address to an actual local or remote location
37
41
  # @param location [String]
38
42
  # @param resolver [String] local or remote path to look for the schema
@@ -41,16 +45,20 @@ module Snowly
41
45
  location.sub(/^iglu\:/, resolver)
42
46
  end
43
47
 
48
+ def resolved_path(location)
49
+ if from_snowplow?(location)
50
+ resolve(location, SNOWPLOW_IGLU_RESOLVER)
51
+ else
52
+ resolve(location, Snowly.development_iglu_resolver_path)
53
+ end
54
+ end
55
+
44
56
  # Caches the schema content under its original location name
45
57
  # @param location [String]
46
58
  # @return [String] schema content
47
59
  def save_in_cache(location)
48
- content = if from_snowplow?(location)
49
- uri = URI(resolve(location, SNOWPLOW_IGLU_RESOLVER))
50
- Net::HTTP.get(uri)
51
- else
52
- File.read(resolve(location, Snowly.local_iglu_resolver_path))
53
- end
60
+ full_path = resolved_path(location)
61
+ content = external?(full_path) ? Net::HTTP.get(URI(full_path)) : File.read(full_path)
54
62
  @@schema_cache[location] = content
55
63
  end
56
64
 
@@ -4,17 +4,51 @@ require 'snowly/extensions/custom_dependencies'
4
4
 
5
5
  module Snowly
6
6
  class Validator
7
- attr_reader :request, :errors
7
+ PROTOCOL_FILE_NAME = 'snowplow_protocol.json'
8
+
9
+ attr_reader :request, :errors, :protocol_schema
8
10
 
9
11
  def initialize(query_string)
10
12
  @request = Request.new query_string
11
13
  @errors = []
14
+ @protocol_schema = load_protocol_schema
15
+ end
16
+
17
+ # If request is valid
18
+ # @return [true, false] if valid
19
+ def valid?
20
+ @errors == []
21
+ end
22
+
23
+ # Entry point for validation.
24
+ def validate
25
+ validate_root
26
+ validate_associated
27
+ valid?
28
+ end
29
+
30
+ private
31
+
32
+ def find_protocol_schema
33
+ if resolver && alternative_protocol_schema
34
+ alternative_protocol_schema
35
+ else
36
+ File.expand_path("../../schemas/#{PROTOCOL_FILE_NAME}", __FILE__)
37
+ end
38
+ end
39
+
40
+ def resolver
41
+ Snowly.development_iglu_resolver_path
42
+ end
43
+
44
+ def alternative_protocol_schema
45
+ Dir[File.join(resolver,"/**/*")].select{ |f| File.basename(f) == PROTOCOL_FILE_NAME }[0]
12
46
  end
13
47
 
14
48
  # Loads the protocol schema created to describe snowplow events table attributes
15
49
  # @return [Hash] parsed schema
16
- def protocol_schema
17
- @protocol_schema ||= JSON.parse File.read(File.expand_path("../../schemas/snowplow_protocol.json",__FILE__))
50
+ def load_protocol_schema
51
+ JSON.parse File.read(find_protocol_schema)
18
52
  end
19
53
 
20
54
  # @return [Hash] all contexts content and schema definitions
@@ -79,18 +113,5 @@ module Snowly
79
113
  this_error = JSON::Validator.fully_validate protocol_schema, request.as_hash
80
114
  @errors += this_error if this_error.count > 0
81
115
  end
82
-
83
- # If request is valid
84
- # @return [true, false] if valid
85
- def valid?
86
- @errors == []
87
- end
88
-
89
- # Entry point for validation.
90
- def validate
91
- validate_root
92
- validate_associated
93
- valid?
94
- end
95
116
  end
96
117
  end
@@ -1,3 +1,3 @@
1
1
  module Snowly
2
- VERSION = "0.1.2"
2
+ VERSION = "0.1.4"
3
3
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: snowly
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.2
4
+ version: 0.1.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Alexandre Angelim