beso 0.1.0 → 0.2.0
- data/README.md +106 -2
- data/lib/beso/version.rb +1 -1
- data/lib/tasks/beso.rake +10 -1
- metadata +2 -2
data/README.md
CHANGED
@@ -1,6 +1,6 @@
 # Beso
 
-
+Sync your historical events to KISSmetrics via CSV.
 
 ## Installation
 
@@ -16,9 +16,113 @@ Or install it yourself as:
 
     $ gem install beso
 
+Next, create an initializer for **beso**. There, you can set up your S3 bucket information and define your
+serialization jobs:
+
+``` rb
+# config/initializers/beso.rb
+Beso.configure do |config|
+
+  # First, set up your S3 credentials:
+
+  config.access_key = '[your AWS access key]'
+  config.secret_key = '[your AWS secret key]'
+  config.bucket_name = 'beso' # recommended, but you can really call this anything
+
+  # Then, define some jobs:
+
+  config.job :message_delivered, :table => :messages do
+    identity { |message| message.user.id }
+    timestamp :created_at
+    prop( :message_id ) { |message| message.id }
+  end
+
+  config.job :signed_up, :table => :users do
+    identity { |user| user.id }
+    timestamp :created_at
+    prop( :age ){ |user| user.age }
+  end
+end
+```
+
 ## Usage
 
-
+### Defining Jobs
+
+KISSmetrics events have three properties that *must* be defined:
+
+- Identity
+- Timestamp
+- Event
+
+The **Identity** field is some sort of identifier for your user. Even if your job
+is working on another table, you should probably have a way to tie the event back
+to the user who caused it. Here, you can provide one of three things:
+
+- A proc that should receive the record and return the identity value
+- A symbol that will get passed to `record.send`
+- A literal (You'll probably want to do one of the other two options)
+
+The **Timestamp** field is slightly different in that it should always be part of
+the table you are querying, not the user. This symbol will get sent to each record,
+but will also be used in determining the query for the job.
+
+The **Event** name is inferred by the name of your job. It will be provided and
+formatted for you.
+
+On top of this, you can specify up to **ten** custom properties. Like `identity`,
+you can pass either a proc, a symbol, or a literal:
+
+``` rb
+config.job :signed_up, :table => :users do
+  identity :id
+  timestamp :created_at
+  prop( :age ){ |user| user.age }
+  prop( :new_user, true )
+end
+```
+
+### Using the rake task
+
+By requiring `beso`, you get the `beso:run` rake task. This task will do the following:
+
+- Connect to your S3 bucket
+- Pull down 'beso.yml' if it exists
+
+> `beso.yml` contains the timestamp of the last record queried for each job.
+> If it doesn't exist, it will be created after the first run.
+
+- Iterate over the jobs defined in the initializer you set up
+- Create a CSV representation of all records newer than the timestamp found in `beso.yml`
+- Upload each CSV to your S3 bucket with the event name and timestamp
+- Update `beso.yml` with the latest timestamp for each job
+
+The rake task is designed to be used via cron. For the moment, KISSmetrics will only process
+one CSV file per hour, so it makes sense that this task should be run at an interval of hours
+equal to the number of jobs you have defined. For example, if you have defined 4 jobs, this
+task should run once every 4 hours.
+
+The rake task also accepts two options that you can set via environment variables.
+
+`BESO_PREFIX` will change the prefix of the CSV filenames that get uploaded to S3. The default
+is 'beso', so it is recommended you use that when telling KISSmetrics what your filename
+pattern is. You can then adjust the prefix if you would like to upload CSV's that you don't
+want KISSmetrics to recognize.
+
+`BESO_ORIGIN` will change the behavior of the task when there is no previous timestamp
+defined for a job in `beso.yml`.
+
+> By default, the task will use the last timestamp in your table (which effectively
+> means the first run of this task will do nothing). This is because KISSmetrics
+> charges you for every event you log through their system, so you probably don't
+> want to upload 8 months worth of events straight away.
+
+This option will accept two values to alter the behavior:
+
+- `now` will set the first run timestamp to now, which will obviously not create any events.
+- `first` will set the first run timestamp to the first timestamp in each table. Use this with
+`BESO_PREFIX` if you want to dump an entire table's worth of events to S3 without having
+KISSmetrics process them.
 
 ## Contributing
 
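The README text added in this version says that `identity` and `prop` each accept a proc, a symbol, or a literal. The sketch below illustrates that rule in isolation. It is an assumption-laden illustration, not code from the gem: `resolve_value` and the `User` struct are hypothetical names, and beso's real implementation is not part of this diff.

```ruby
# Hypothetical record type standing in for an ActiveRecord model.
User = Struct.new(:id, :age)

# Hypothetical helper (not beso's API): resolve a job attribute from
# a proc, a symbol, or a literal, as the README describes.
def resolve_value(record, source)
  case source
  when Proc   then source.call(record) # a proc receives the record
  when Symbol then record.send(source) # a symbol is sent to the record
  else             source              # anything else is used as-is
  end
end

user = User.new(7, 31)

by_proc    = resolve_value(user, lambda { |u| u.age })
by_symbol  = resolve_value(user, :id)
by_literal = resolve_value(user, true)
```

Under these assumptions, `prop( :new_user, true )` from the README corresponds to the literal branch, and `identity :id` to the symbol branch.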
data/lib/beso/version.rb
CHANGED
data/lib/tasks/beso.rake
CHANGED
@@ -19,7 +19,16 @@ namespace :beso do
 
 Beso.jobs.each do |job|
 
-  config[ job.event ] ||=
+  config[ job.event ] ||= begin
+    case ENV[ 'BESO_ORIGIN' ]
+    when 'first'
+      job.first_timestamp
+    when 'now'
+      Time.now
+    else
+      job.last_timestamp
+    end
+  end
 
   puts "==> Processing job: #{job.event.inspect} since #{config[ job.event ]}"
 
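The change to `beso.rake` picks a starting timestamp only when `beso.yml` has no entry for a job, because `||=` leaves an existing value untouched. A minimal standalone sketch of that selection follows. `FakeJob`, `origin_timestamp`, and the sample dates are hypothetical and exist only for illustration; the method names `first_timestamp` and `last_timestamp` are taken from the diff.

```ruby
# Hypothetical stand-in for a beso job; only the two timestamp readers
# that appear in the rake task are modelled.
FakeJob = Struct.new(:first_timestamp, :last_timestamp)

# Same branching as the case statement in the rake task, with the
# current time passed in so the result is deterministic.
def origin_timestamp(job, origin, now = Time.now)
  case origin
  when 'first' then job.first_timestamp # replay the whole table
  when 'now'   then now                 # start from the present
  else              job.last_timestamp  # default: skip existing rows
  end
end

job       = FakeJob.new(Time.utc(2011, 8, 1), Time.utc(2012, 4, 13))
fixed_now = Time.utc(2012, 4, 14)

# A stored timestamp wins; the origin is consulted only when it is missing.
config = { 'signed_up' => Time.utc(2012, 1, 1) }
config['signed_up']         ||= origin_timestamp(job, 'first', fixed_now)
config['message_delivered'] ||= origin_timestamp(job, 'first', fixed_now)
```

In the real task the origin comes from `ENV[ 'BESO_ORIGIN' ]`; any value other than `first` or `now`, including an unset variable, falls through to `last_timestamp`.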
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: beso
 version: !ruby/object:Gem::Version
-version: 0.
+version: 0.2.0
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-04-
+date: 2012-04-13 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: rails