snowly 0.1.2 → 0.1.4
- checksums.yaml +4 -4
- data/README.md +165 -12
- data/bin/snowly +1 -1
- data/lib/snowly.rb +4 -3
- data/lib/snowly/app/collector.rb +7 -3
- data/lib/snowly/app/views/index.erb +15 -10
- data/lib/snowly/schema_cache.rb +14 -6
- data/lib/snowly/validator.rb +37 -16
- data/lib/snowly/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 !binary "U0hBMQ==":
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d173fa98485fdac15319ceb8c82e13e217a71e01
+  data.tar.gz: 29b7b391be367f4cdc8ade831fc115aef27eed59
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: baaa31a93be34a90f0ee857b8fe54c83d0f07e4bfbf07c205ed050d461a411dd99e35d455498ae620ad7077225731b9a01b1275ecf4a86fb955fa04b48524736
+  data.tar.gz: 7c76ad98a44bbe6da61bc4b3d9dd5eab1ecf38b36a557bdac76980990c24d6dd818e63f57b05652a40ecbc977b07e8da96541651142b1f135dc8dff317ee0477
data/README.md
CHANGED
@@ -1,28 +1,181 @@
-# Snowly
+# Snowly - Snowplow Request Validator
 
-
+Debug your Snowplow implementation locally, without resorting to Snowplow's ETL tasks. It's like Facebook's URL Linter, but for Snowplow.
 
-
+Snowly is a minimal [Collector](https://github.com/snowplow/snowplow/wiki/Setting-up-a-collector) implementation intended to run in your development environment. It comes with a comprehensive validation engine that points out any schema requirement violations. You can easily validate your event requests before having to emit them to a Cloudfront, Clojure or Scala collector.
+
+### Motivation
+
+Snowplow has an excellent toolset, but the first implementation stages can be hard. To run Snowplow properly you have to set up a lot of external dependencies like AWS permissions, Cloudfront distributions and EMR jobs. If you're tweaking the Snowplow model to fit your needs or using trackers that don't enforce every requirement, you'll find yourself waiting for the ETL jobs to run in order to validate every implementation change.
+
+### Who will get the most from Snowly
+
+- Teams that need to extend the Snowplow model with custom contexts or unstructured events.
+- Applications that are constantly evolving their schemas and rules.
+- Developers trying out Snowplow before committing to it.
+
+### Features
+
+With Snowly you can use [JSON Schemas](http://spacetelescope.github.io/understanding-json-schema/) to define more expressive event requirements. Aside from assuring that you're fully compatible with the Snowplow protocol, you can go even further and extend it with a set of more specific rules.
+
+Use cases:
+
+- Validate custom contexts or unstructured event types and required fields.
+- Restrict values for any field, like using a custom dictionary for the structured event action field.
+- Define requirements based on the content of another field: If __event action__ is 'viewed_product', __event property__ is required.
 
 ## Installation
 
-
+```bash
+gem install snowly
+```
+That will copy a `snowly` executable to your system.
+
+### Development Iglu Resolver Path
+
+If you still don't know anything about [Snowplow's Iglu Resolvers](https://github.com/snowplow/iglu), don't worry. It's pretty straightforward.
+Snowly must be able to find your custom context and unstructured event schemas, so you have to set up a local path to store them. You can also choose to use an [external resolver](https://github.com/snowplow/iglu/wiki/Static-repo-setup) pointing to a URL.
+
+For a local setup, store your schemas under any path accessible by your user (e.g. ~/schemas). The only catch is that you must comply with Snowplow's naming conventions for your JSON schemas. Snowplow references: [[1]](https://github.com/snowplow/snowplow/wiki/snowplow-tracker-protocol#custom-contexts), [[2]](https://github.com/snowplow/snowplow/wiki/snowplow-tracker-protocol#310-custom-unstructured-event-tracking)
+
+You must export an environment variable to make Snowly aware of that path. Add it to your .bash_profile or equivalent.
+```bash
+# A local path is the recommended approach, as it's easier to evolve your schemas
+# without the hassle of setting up an actual resolver.
+export DEVELOPMENT_IGLU_RESOLVER_PATH=~/schemas
+
+# or host on a Static Website on Amazon Web Services, for instance.
+export DEVELOPMENT_IGLU_RESOLVER_PATH=http://my_resolver_bucket.s3-website-us-east-1.amazonaws.com
+```
+
+Example:
+```bash
+# create a user context
+mkdir -p ~/schemas/com.my_company/hero_user/jsonschema
+touch ~/schemas/com.my_company/hero_user/jsonschema/1-0-0
 
+# create a viewed product unstructured event
+mkdir -p ~/schemas/com.my_company/viewed_product/jsonschema
+touch ~/schemas/com.my_company/viewed_product/jsonschema/1-0-0
+```
+
+`1-0-0` is the actual JSON schema file. You will find examples just ahead.
+
+## Usage
+
+Just use `snowly` to start and `snowly -K` to stop. Where allowed, a browser window will open showing the collector's address.
+
+Other options:
+
+    -K, --kill               kill the running process and exit
+    -S, --status             display the current running PID and URL then quit
+    -s, --server SERVER      serve using SERVER (thin/mongrel/webrick)
+    -o, --host HOST          listen on HOST (default: 0.0.0.0)
+    -p, --port PORT          use PORT (default: 5678)
+    -x, --no-proxy           ignore env proxy settings (e.g. http_proxy)
+    -e, --env ENVIRONMENT    use ENVIRONMENT for defaults (default: development)
+    -F, --foreground         don't daemonize, run in the foreground
+    -L, --no-launch          don't launch the browser
+    -d, --debug              raise the log level to :debug (default: :info)
+        --app-dir APP_DIR    set the app dir where files are stored (default: ~/.vegas/collector)/)
+    -P, --pid-file PID_FILE  set the path to the pid file (default: app_dir/collector.pid)
+        --log-file LOG_FILE  set the path to the log file (default: app_dir/collector.log)
+        --url-file URL_FILE  set the path to the URL file (default: app_dir/collector.url)
+
+## JSON Schemas
+
+JSON Schema is a powerful tool for validating the structure of JSON data. I recommend reading this excellent [Guide](http://spacetelescope.github.io/understanding-json-schema/) from Michael Droettboom to understand all of its capabilities, but you can start with the examples below.
+
+Example:
+
+A user context. Well... Not just any user can get there.
+
+__Note that this is not valid JSON because of the comments.__
 ```ruby
-
+# ~/schemas/com.my_company/hero_user/jsonschema/1-0-0
+{
+  # Your schema will also be checked against the Snowplow Self-Desc Schema requirements.
+  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
+  "id": "com.my_company/hero_user/jsonschema/1-0-0", # Give your schema an id for better validation output
+  "description": "My first Hero Context",
+  "self": {
+    "vendor": "com.my_company",
+    "name": "hero_user",
+    "format": "jsonschema",
+    "version": "1-0-0"
+  },
+
+  "type": "object",
+  "properties": {
+    "name": {
+      "type": "string",
+      "maxLength": 100 # The hero's name can't be longer than 100 chars
+    },
+    "special_powers": {
+      "type": "array",
+      "minItems": 2, # This is not just any hero. He must have at least two special powers.
+      "uniqueItems": true
+    },
+    "age": {
+      "type": "integer", # Strings are not allowed.
+      "minimum": 15, # The Powerpuff Girls aren't allowed
+      "maximum": 100 # Wolverine is out
+    },
+    "cape_color": {
+      "type": "string",
+      "enum": ["red", "blue", "black"] # Xmen Vision is not welcome
+    },
+    "is_avenger": {
+      "type": "boolean"
+    },
+    "rating": {
+      "type": "number" # Allows for float values
+    },
+    "address": { # cascading objects with their own validation rules.
+      "type": "object",
+      "properties": {
+        "street_name": {
+          "type": "string"
+        },
+        "number": {
+          "type": "integer"
+        }
+      }
+    }
+  },
+  "required": ["name", "age"], # Name and Age must always be present
+  "custom_dependencies": {
+    "cape_color": { "name": "superman" } # If the hero's #name is 'superman', #cape_color has to be present.
+  },
+  "additionalProperties": false # No other unspecified attributes are allowed.
+}
 ```
 
-
+### Extending Snowplow's Protocol
 
-
+Although Snowplow's protocol isn't originally defined in a JSON schema, it doesn't hurt to do so and take advantage of all its perks. It's also here for the sake of consistency, right?
 
-
+By expressing the protocol in a JSON schema you can extend it to fit your particular needs and enforce domain rules that otherwise wouldn't be available. [Take a look](https://github.com/angelim/snowly/blob/master/lib/schemas/snowplow_protocol.json) at the default schema, derived from the rules specified on the [canonical model](https://github.com/snowplow/snowplow/wiki/canonical-event-model).
 
-
+Whenever possible, Snowly will output column names mapped from query string parameters. When two parameters can map to the same content (e.g. regular and base64 versions), a common intuitive name is used (e.g. contexts and unstruct_event).
 
-
+You can override the protocol schema by placing it anywhere inside your Local Resolver Path. As of now, the whole file has to be replaced:
+
+__It's important to name the file `snowplow_protocol.json`.__
 
-
+Examples of useful extensions:
+```ruby
+...
+"se_action": {
+  "type": "string",
+  "enum": ["product_view", "add_to_cart", "product_zoom"] # Only these values are allowed for a structured event action.
+},
+
+"custom_dependencies": {
+  "true_tstamp": {"platform": "mob"} # You must submit the true timestamp when the platform is set to "mob".
+}
+...
+```
 
 ## Development
 
@@ -32,5 +185,5 @@ To install this gem onto your local machine, run `bundle exec rake install`. To
 
 ## Contributing
 
-Bug reports and pull requests are welcome on GitHub at https://github.com/
+Bug reports and pull requests are welcome on GitHub at https://github.com/angelim/snowly.
 
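The `custom_dependencies` rule introduced in the README above can be read as "a field becomes required when other fields hold given values". Here is a minimal Ruby sketch of that reading. `missing_custom_dependencies` is a hypothetical helper written for illustration; Snowly's own implementation lives in `snowly/extensions/custom_dependencies` and is not part of this diff. The sketch also assumes that, when a rule lists several conditions, all of them must match.

```ruby
# Hypothetical helper (not Snowly's code): returns the fields that a
# custom_dependencies rule set requires but that are absent from the data.
# rules has the shape { "required_field" => { "other_field" => "trigger value" } }
def missing_custom_dependencies(data, rules)
  rules.each_with_object([]) do |(required_field, conditions), missing|
    # Assumption: every listed condition must match for the rule to apply.
    triggered = conditions.all? { |field, value| data[field] == value }
    missing << required_field if triggered && !data.key?(required_field)
  end
end

rules = { "cape_color" => { "name" => "superman" } }

# The condition matches and cape_color is absent, so it is reported.
p missing_custom_dependencies({ "name" => "superman", "age" => 30 }, rules)

# The condition does not match, so nothing is reported.
p missing_custom_dependencies({ "name" => "batman", "age" => 35 }, rules)
```

Returning the list of missing fields, rather than a boolean, keeps the result usable for error messages.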
data/bin/snowly
CHANGED
data/lib/snowly.rb
CHANGED
@@ -9,10 +9,11 @@ require 'snowly/validator'
 require 'snowly/schema_cache'
 
 module Snowly
-  mattr_accessor :
+  mattr_accessor :development_iglu_resolver_path, :debug_mode, :logger
 
-  @@
-  @@debug_mode =
+  @@development_iglu_resolver_path = ENV['DEVELOPMENT_IGLU_RESOLVER_PATH']
+  @@debug_mode = false
+  @@logger = Logger.new(STDOUT)
 
   def self.config
     yield self
data/lib/snowly/app/collector.rb
CHANGED
@@ -13,7 +13,7 @@ module Snowly
 
       get '/' do
         @url = request.url.gsub(/(http|https)\:\/\//,'')[0..-2]
-        @resolved_schemas = if resolver = Snowly.
+        @resolved_schemas = if resolver = Snowly.development_iglu_resolver_path
           Dir[File.join(resolver,"/**/*")].select{ |e| File.file? e }
         else
           nil
@@ -26,15 +26,19 @@ module Snowly
         validator = Snowly::Validator.new request.query_string
         if validator.validate
           status 200
+          content = { content: validator.request.as_hash }.to_json
+          Snowly.logger.info content
           if params[:debug] || Snowly.debug_mode
-            body(
+            body(content)
           else
             content_type 'image/gif'
             Snowly::App::Collector::GIF
           end
         else
           status 500
-
+          content = { errors: validator.errors, content: validator.request.as_hash }.to_json
+          Snowly.logger.error content
+          body (content)
         end
       end
     end
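The collector validates whatever arrives in the query string, so a quick way to exercise it is to build a tracker-style URL by hand. The sketch below assembles one from the parameters used by the "See it working!" link in `index.erb`. `collector_url` and `COLLECTOR_BASE` are hypothetical names, and `http://localhost:5678` is an assumption based on the default port listed in the README.

```ruby
require 'uri'

# Assumed local collector address (5678 is the README's default port).
COLLECTOR_BASE = 'http://localhost:5678'

# Builds a GET URL for the collector's /i endpoint from a hash of
# tracker-protocol parameters. With debug enabled, debug=true is appended,
# which asks for the parsed request instead of the tracking pixel.
def collector_url(params, debug: true)
  query = URI.encode_www_form(debug ? params.merge('debug' => 'true') : params)
  "#{COLLECTOR_BASE}/i?#{query}"
end

page_view = {
  'e'    => 'pv',
  'page' => 'Root README',
  'url'  => 'http://github.com/snowplow/snowplow',
  'aid'  => 'snowplow',
  'p'    => 'web',
  'tv'   => 'no-js-0.1.0',
  'ua'   => 'firefox',
  'eid'  => 'u2i3'
}

puts collector_url(page_view)
```

Per the hunk above, a valid request answers with status 200 and an invalid one with status 500 plus an `errors` array in the JSON body.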
data/lib/snowly/app/views/index.erb
CHANGED
@@ -27,33 +27,38 @@
         <a class="btn btn-lg btn-success" href="/i?&e=pv&page=Root%20README&url=http%3A%2F%2Fgithub.com%2Fsnowplow%2Fsnowplow&aid=snowplow&p=web&tv=no-js-0.1.0&ua=firefox&&eid=u2i3&debug=true" role="button">See it working!</a>
         <a class="btn btn-lg btn-warning" href="/i?&e=pv&page=Root%20README&url=http%3A%2F%2Fgithub.com%2Fsnowplow%2Fsnowplow&aid=snowplow&p=i&tv=no-js-0.1.0&debug=true" role="button">Event with errors!</a>
       </p>
-      <% unless Snowly.
+      <% unless Snowly.development_iglu_resolver_path %>
         <div class="alert alert-danger" role="alert">The Local Iglu Resolver Path is missing.</div>
       <% end %>
-      <p>
+      <p>
+        Use <code>snowly -K</code> to stop the collector.
+      </p>
     </div>
 
     <div class="row marketing">
       <div class="col-lg-12">
         <div class="panel panel-default">
-          <div class="panel-heading">
+          <div class="panel-heading">
+            Current Configuration
+            <div class="pull-right"><span class="label label-primary">version <%= Snowly::VERSION %> </span></div>
+          </div>
           <table class="table">
             <thead>
               <tr>
-                <th>
+                <th>Configuration</th>
                 <th>Value</th>
                 <th>Description</th>
               </tr>
             </thead>
             <tbody>
               <tr>
-                <td>
+                <td>Debug Mode</td>
                 <td><%= Snowly.debug_mode %></td>
                 <td>Renders parsed request instead of a pixel. Defaults to false</td>
               </tr>
               <tr>
-                <td>
-                <td><%= Snowly.
+                <td>DEVELOPMENT_IGLU_RESOLVER_PATH</td>
+                <td><%= Snowly.development_iglu_resolver_path %></td>
                 <td>Local path for contexts and unstructured event schemas.</td>
               </tr>
             </tbody>
@@ -86,13 +91,13 @@
             Just like the Resolver you may have already configured for the official ETL tools, Snowly needs a
             local path to find your custom schemas. You can store them under any path(eg: ~/schemas)
             Inside that folder you must create a resolver compatible structure:
-            <code>~/schemas/com.yoursite/
-            <code>~/schemas/com.yoursite/
+            <code>~/schemas/com.yoursite/my_context/jsonschema/1-0-0</code><br>
+            <code>~/schemas/com.yoursite/my_event/jsonschema/1-0-0</code><br>
             1-0-0 is the file holding the schema.
           </p>
           <p>
             When you emit events, use the schema path from the <code>Resolver path</code><br/>
-            <code>{ schema: 'iglu:com.yoursite/
+            <code>{ schema: 'iglu:com.yoursite/my_context/jsonschema/1-0-0', data: !some_schema_data! }</code>
           </p>
           <p>
             Be sure to give your schemas an <a href="http://spacetelescope.github.io/understanding-json-schema/structuring.html#the-id-property">id</a>, so Snowly can output more helpful validation error messages.
|
data/lib/snowly/schema_cache.rb
CHANGED
@@ -33,6 +33,10 @@ module Snowly
       location['iglu:com.snowplowanalytics.snowplow']
     end
 
+    def external?(location)
+      location.match(/^(http|https):\/\//)
+    end
+
     # Translate an iglu address to an actual local or remote location
     # @param location [String]
     # @param resolver [String] local or remote path to look for the schema
@@ -41,16 +45,20 @@ module Snowly
       location.sub(/^iglu\:/, resolver)
     end
 
+    def resolved_path(location)
+      if from_snowplow?(location)
+        resolve(location, SNOWPLOW_IGLU_RESOLVER)
+      else
+        resolve(location, Snowly.development_iglu_resolver_path)
+      end
+    end
+
     # Caches the schema content under its original location name
     # @param location [String]
     # @return [String] schema content
     def save_in_cache(location)
-
-
-      Net::HTTP.get(uri)
-      else
-      File.read(resolve(location, Snowly.local_iglu_resolver_path))
-      end
+      full_path = resolved_path(location)
+      content = external?(full_path) ? Net::HTTP.get(URI(full_path)) : File.read(full_path)
       @@schema_cache[location] = content
     end
 
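The change above separates two questions: where an `iglu:` address points, and whether that place is a URL or a file. The following standalone sketch walks through both steps. The function names mirror the diff but this is not the gem's code. In particular, the trailing-slash handling in `resolve` is an addition made for the sketch, since the surrounding lines of the real method are not visible here, and `/home/me/schemas` is a placeholder path.

```ruby
# Replaces the "iglu:" scheme with the resolver base (a directory or a URL).
# Assumption: the base may come without a trailing slash, so one is added.
def resolve(location, resolver)
  base = resolver.end_with?('/') ? resolver : "#{resolver}/"
  location.sub(/\Aiglu:/, base)
end

# True when the resolved path must be fetched over HTTP(S) rather than
# read from disk.
def external?(path)
  path.match?(%r{\Ahttps?://})
end

local  = resolve('iglu:com.my_company/hero_user/jsonschema/1-0-0', '/home/me/schemas')
remote = resolve('iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0',
                 'http://iglucentral.com/schemas')

puts local
puts remote
puts external?(local)
puts external?(remote)
```

Deciding between `Net::HTTP.get` and `File.read` on the resolved path, as the new `save_in_cache` does, is what lets `DEVELOPMENT_IGLU_RESOLVER_PATH` hold either a directory or a URL.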
data/lib/snowly/validator.rb
CHANGED
@@ -4,17 +4,51 @@ require 'snowly/extensions/custom_dependencies'
 
 module Snowly
   class Validator
-
+    PROTOCOL_FILE_NAME = 'snowplow_protocol.json'
+
+    attr_reader :request, :errors, :protocol_schema
 
     def initialize(query_string)
       @request = Request.new query_string
       @errors = []
+      @protocol_schema = load_protocol_schema
+    end
+
+    # If request is valid
+    # @return [true, false] if valid
+    def valid?
+      @errors == []
+    end
+
+    # Entry point for validation.
+    def validate
+      validate_root
+      validate_associated
+      valid?
+    end
+
+    private
+
+    def find_protocol_schema
+      if resolver && alternative_protocol_schema
+        alternative_protocol_schema
+      else
+        File.expand_path("../../schemas/#{PROTOCOL_FILE_NAME}", __FILE__)
+      end
+    end
+
+    def resolver
+      Snowly.development_iglu_resolver_path
+    end
+
+    def alternative_protocol_schema
+      Dir[File.join(resolver,"/**/*")].select{ |f| File.basename(f) == PROTOCOL_FILE_NAME }[0]
     end
 
     # Loads the protocol schema created to describe snowplow events table attributes
     # @return [Hash] parsed schema
-    def
-
+    def load_protocol_schema
+      JSON.parse File.read(find_protocol_schema)
     end
 
     # @return [Hash] all contexts content and schema definitions
@@ -79,18 +113,5 @@ module Snowly
       this_error = JSON::Validator.fully_validate protocol_schema, request.as_hash
       @errors += this_error if this_error.count > 0
     end
-
-    # If request is valid
-    # @return [true, false] if valid
-    def valid?
-      @errors == []
-    end
-
-    # Entry point for validation.
-    def validate
-      validate_root
-      validate_associated
-      valid?
-    end
   end
 end
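The new `find_protocol_schema` prefers a `snowplow_protocol.json` found anywhere under the resolver path and falls back to the schema bundled with the gem. Below is a self-contained sketch of that lookup order. The function takes both paths as arguments so it can run outside the gem, which is a simplification for illustration, and `/gem/default.json` is a placeholder for the bundled file.

```ruby
require 'tmpdir'
require 'fileutils'

PROTOCOL_FILE_NAME = 'snowplow_protocol.json'

# Returns the first file named snowplow_protocol.json found anywhere under
# resolver_path, or default_path when there is none or no resolver is set.
def find_protocol_schema(resolver_path, default_path)
  return default_path unless resolver_path

  override = Dir[File.join(resolver_path, '**', '*')].find do |f|
    File.file?(f) && File.basename(f) == PROTOCOL_FILE_NAME
  end
  override || default_path
end

Dir.mktmpdir do |dir|
  # Without an override, the bundled default is used.
  puts find_protocol_schema(dir, '/gem/default.json')

  # Dropping the file anywhere inside the resolver path takes precedence.
  nested = File.join(dir, 'overrides')
  FileUtils.mkdir_p(nested)
  File.write(File.join(nested, PROTOCOL_FILE_NAME), '{}')
  puts find_protocol_schema(dir, '/gem/default.json')
end
```

Because the match is on the file name alone, the override can sit next to the custom context schemas without following the Iglu directory layout.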
data/lib/snowly/version.rb
CHANGED