web_analytics_discovery 2.0
- checksums.yaml +7 -0
- data/.gitignore +8 -0
- data/.rspec +2 -0
- data/.travis.yml +9 -0
- data/Gemfile +2 -0
- data/LICENSE +661 -0
- data/README.md +133 -0
- data/Rakefile +7 -0
- data/bin/web_analytics_discover +77 -0
- data/lib/web_analytics_discovery.rb +23 -0
- data/lib/web_analytics_discovery/grabber/alexa.rb +33 -0
- data/lib/web_analytics_discovery/grabber/googleanalytics.rb +29 -0
- data/lib/web_analytics_discovery/grabber/liveinternet.rb +61 -0
- data/lib/web_analytics_discovery/grabber/mailru.rb +89 -0
- data/lib/web_analytics_discovery/grabber/openstat.rb +44 -0
- data/lib/web_analytics_discovery/grabber/quantcast.rb +84 -0
- data/lib/web_analytics_discovery/grabber/rambler.rb +100 -0
- data/lib/web_analytics_discovery/grabber/tns.rb +117 -0
- data/lib/web_analytics_discovery/grabber/yandexmetrika.rb +54 -0
- data/lib/web_analytics_discovery/grabberutils.rb +54 -0
- data/lib/web_analytics_discovery/version.rb +3 -0
- data/spec/alexa_spec.rb +13 -0
- data/spec/liveinternet_spec.rb +15 -0
- data/spec/mailru_spec.rb +36 -0
- data/spec/openstat_spec.rb +24 -0
- data/spec/quantcast_spec.rb +59 -0
- data/spec/rambler_spec.rb +63 -0
- data/spec/spec_helper.rb +25 -0
- data/spec/tns_spec.rb +21 -0
- data/web_analytics_discovery.gemspec +50 -0
- metadata +158 -0
data/README.md
ADDED
@@ -0,0 +1,133 @@
+# web_analytics_discovery
+<!--[![Gem Version](https://badge.fury.io/rb/web_analytics_discovery.png)](http://badge.fury.io/rb/web_analytics_discovery)-->
+[![Build Status](https://travis-ci.org/GreyCat/web_analytics_discovery.svg?branch=master)](https://travis-ci.org/GreyCat/web_analytics_discovery)
+[![Dependency Status](https://gemnasium.com/GreyCat/web_analytics_discovery.svg)](https://gemnasium.com/GreyCat/web_analytics_discovery)
+[![Code Climate](https://codeclimate.com/github/GreyCat/web_analytics_discovery/badges/gpa.svg)](https://codeclimate.com/github/GreyCat/web_analytics_discovery)
+<!--[![Coverage Status](https://coveralls.io/repos/GreyCat/web_analytics_discovery/badge.png?branch=master)](https://coveralls.io/r/GreyCat/web_analytics_discovery)-->
+<!--[![Security Status](http://rails-brakeman.com/GreyCat/web_analytics_discovery.png)](http://rails-brakeman.com/GreyCat/web_analytics_discovery)-->
+
+This gem provides a set of tools for discovery and export of data from
+popular web analytics tools.
+
+The supported web analytics systems are:
+
+* Alexa
+* Google Analytics
+* LiveInternet
+* Mail.ru
+* Openstat
+* Quantcast
+* Rambler Top100
+* Yandex Metrika
+
+## The problem
+
+Given a particular site URL (e.g. `http://example.com/`), we'd like to
+know audience statistics for that site (e.g. how many unique
+people visit it per day, per week, and per month, how many page views
+they generate, etc.).
+
+## The solution
+
+Many sites use web analytics tools to measure audience stats. Quite
+often, these statistics are even publicly available, although one needs to know:
+
+* which particular web analytics system a given site uses
+* what this site's ID is in that web analytics system
+
+Answering these questions usually requires a tedious manual process:
+
+* Look up the site's HTML code
+* Locate JavaScript code / tags / calls to the web analytics system
+* Identify this system
+* Identify the site's ID in the code / calls
+* Go to the web analytics system's site or API and get the desired statistics
+
+This gem tries to automate these tasks, looking up all the info and
+retrieving information from web analytics systems. Exported data can
+be accessed in simple tabular form or programmatically, as a hash,
+using the API.
+
+## Installation
+
+### From RubyGems repository
+
+* Make sure you have Ruby and RubyGems
+* Just run `gem install web_analytics_discovery`
+
+### Manually from source
+
+* Clone this repository / download snapshot
+* `gem build web_analytics_discovery.gemspec`
+* `gem install --local ./web_analytics_discovery-*.gem` (usually as
+  root, if you need system-wide installation)
+
+## Basic usage
+
+For basic usage, a simple executable `web_analytics_discover` is
+provided and installed during gem installation. It can be run with one
+or several URLs as command-line arguments and it will produce a simple
+summary table for each of the URLs.
+
+Example:
+
+    $ web_analytics_discover http://kp.ru/
+                    |            id|    v/day|    s/day|   pv/day|    v/mon|    s/mon|   pv/mon
+    alexa           |         kp.ru|      N/A|      N/A|  1477599|  6825125|      N/A| 44974428
+    googleanalytics | UA-23870775-1|      N/A|      N/A|      N/A|      N/A|      N/A|      N/A
+    liveinternet    |              |   597956|   745757|  1787863| 10585641| 21308436| 49775501
+    mailru          |        294001|   756600|      N/A|  2230674| 15086634|      N/A| 73738178
+    openstat        |       2026010|   983579|  1195306|  2823114| 14757845| 28953554| 69970669
+    quantcast       |      wd:ru.kp|      N/A|      N/A|      N/A|    36300|      N/A|      N/A
+    rambler         |         17841|  1048235|  1287761|  3015270| 15550162| 31307958| 75869606
+    yandexmetrika   |       1051362|   259987|   310983|   727833|      N/A|      N/A| 22153416
+
+## API usage
+
+Web analytics discovery can also be used through a simple API. Every web
+analytics service is supported by a separate class named after that
+service in the `WebAnalyticsDiscovery` module:
+
+* `Alexa`
+* `GoogleAnalytics`
+* `LiveInternet`
+* `MailRu`
+* `Openstat`
+* `Quantcast`
+* `Rambler`
+* `YandexMetrika`
+
+One can use it like this:
+
+    require 'web_analytics_discovery'
+    d = WebAnalyticsDiscovery::MailRu.new
+    result = d.run('http://kp.ru/')
+
+`result` will look like this:
+
+    {:id=>294001,
+     :visitors_day=>756600,
+     :pv_day=>2230674,
+     :visitors_week=>3365344,
+     :pv_week=>13102096,
+     :visitors_mon=>15086634,
+     :pv_mon=>73738178}
+
+Some values might be missing if it's not possible to retrieve them
+from a given service.
+
+## Licensing and usage
+
+Copyright (C) 2013-2014 Mikhail Yakshin <greycat@altlinux.org>
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Affero General Public License as
+published by the Free Software Foundation, either version 3 of the
+License, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+Affero General Public License for more details.
+
+Please consult the LICENSE file for more details and the full license text.
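The README notes that some values may be missing from a result hash. As a minimal sketch of consuming such a hash defensively, the snippet below works on a plain hash shaped like the README's `MailRu` example, so it needs neither the gem nor network access. The helper name `pages_per_visitor` is hypothetical and not part of the gem.

```ruby
# Hypothetical helper, not part of the gem: average page views per visitor
# for a period (:day, :week or :mon), or nil when either value is missing.
def pages_per_visitor(result, period)
  visitors = result[:"visitors_#{period}"]
  pv = result[:"pv_#{period}"]
  return nil if visitors.nil? || pv.nil? || visitors.zero?
  (pv.to_f / visitors).round(2)
end

# Values copied from the README example; the weekly keys are left out
# on purpose to show the "missing value" case.
sample = {
  :id => 294001,
  :visitors_day => 756600,
  :pv_day => 2230674,
  :visitors_mon => 15086634,
  :pv_mon => 73738178
}

puts pages_per_visitor(sample, :day).inspect
puts pages_per_visitor(sample, :week).inspect # prints nil: no weekly data
```

Returning `nil` instead of raising mirrors how the command-line tool prints `N/A` for values it could not retrieve.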
data/bin/web_analytics_discover
ADDED
@@ -0,0 +1,77 @@
+#!/usr/bin/env ruby
+
+require 'fileutils'
+require 'uri'
+require 'optparse'
+
+require 'web_analytics_discovery'
+include WebAnalyticsDiscovery
+
+class AnalyticsGrabber
+  def initialize
+    @services = {}
+    SERVICES.each_pair { |name, klass|
+      begin
+        @services[name] = klass.new
+      rescue Exception => ex
+        warn "Unable to start analytics service #{name}"
+        $stderr.puts ex.message
+        $stderr.puts ex.backtrace.join("\n")
+      end
+    }
+  end
+
+  def run(url)
+    r = {}
+    @services.each_pair { |name, service|
+      begin
+        r[name] = service.run(url)
+      rescue Exception => ex
+        warn "Exception querying analytics service #{name}"
+        $stderr.puts ex.message
+        $stderr.puts ex.backtrace.join("\n")
+      end
+    }
+    r
+  end
+
+  def pp(url)
+    r = run(url)
+    print_line ['', 'id', 'v/day', 's/day', 'pv/day', 'v/mon', 's/mon', 'pv/mon']
+    r.keys.sort.each { |service|
+      res = r[service]
+      next unless res
+      print_line [
+        service,
+        res[:id],
+        res[:visitors_day],
+        res[:visits_day],
+        res[:pv_day],
+        res[:visitors_mon],
+        res[:visits_mon],
+        res[:pv_mon],
+      ]
+    }
+  end
+
+  def print_line(a)
+    printf '%-20s', a.shift
+    printf '|%24s', a.shift
+    a.each { |x|
+      printf '|%11s', x || 'N/A'
+    }
+    puts
+  end
+end
+
+options = {}
+OptionParser.new { |opts|
+  opts.banner = "Usage: #{__FILE__} [options] <urls>"
+
+#  opts.on('-v', '--[no-]verbose', 'Verbose logging') { |v| options[:verbose] = v }
+
+  opts.on_tail('-h', '--help', 'Show this message') { puts opts; exit }
+}.parse!
+
+ag = AnalyticsGrabber.new
+ARGV.each { |a| ag.pp(a) }
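The `print_line` method above consumes its argument with `shift` and prints directly. As a rough sketch of the same row layout, the variant below returns a string and leaves its input untouched, which makes it easier to test. The name `format_row` is hypothetical; the column widths are copied from `print_line`.

```ruby
# Sketch of the row layout used by print_line, returning a String
# instead of printing. format_row is a hypothetical name.
def format_row(cells)
  name, id, *values = cells
  row = format('%-20s', name)          # service name, left-aligned
  row << format('|%24s', id)           # site ID, right-aligned (nil prints as blank)
  values.each { |x| row << format('|%11s', x || 'N/A') }
  row
end

puts format_row(['', 'id', 'v/day', 'pv/day'])
puts format_row(['mailru', 294001, 756600, nil])
```

A missing metric (`nil`) is rendered as `N/A`, while a missing ID is rendered as an empty cell, matching the `liveinternet` row in the README example.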
data/lib/web_analytics_discovery.rb
ADDED
@@ -0,0 +1,23 @@
+require 'web_analytics_discovery/version'
+
+require 'web_analytics_discovery/grabber/alexa'
+require 'web_analytics_discovery/grabber/googleanalytics'
+require 'web_analytics_discovery/grabber/liveinternet'
+require 'web_analytics_discovery/grabber/mailru'
+require 'web_analytics_discovery/grabber/openstat'
+require 'web_analytics_discovery/grabber/quantcast'
+require 'web_analytics_discovery/grabber/rambler'
+require 'web_analytics_discovery/grabber/tns'
+require 'web_analytics_discovery/grabber/yandexmetrika'
+
+module WebAnalyticsDiscovery
+  # Special trickery to get a map of {:service_name => ClassThatImplementsServiceExtraction} magic
+  SERVICES = Hash[constants.map { |x|
+    possible_class = const_get(x)
+    if possible_class.class == Class
+      [x.to_s.downcase.to_sym, possible_class]
+    else
+      nil
+    end
+  }.delete_if { |v| v.nil? }]
+end
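The `SERVICES` constant above is built by reflecting over the module's own constants. The toy module below is a self-contained sketch of the same technique; `Demo`, `Helper`, `Alpha` and `Beta` are made-up names used only for illustration.

```ruby
module Demo
  VERSION = '0.0'      # a String, not a class: filtered out
  module Helper; end   # a Module, not a Class: filtered out too
  class Alpha; end
  class Beta; end

  # Same idea as SERVICES: map every constant that is a class
  # to a lowercase symbol key.
  SERVICES = Hash[constants.map { |x|
    possible_class = const_get(x)
    [x.to_s.downcase.to_sym, possible_class] if possible_class.class == Class
  }.compact]
end

p Demo::SERVICES.keys.sort
```

Because the map is computed once, when the constant is assigned, only classes already defined at that point are picked up. That appears to be why the file requires every grabber before opening the module.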
data/lib/web_analytics_discovery/grabber/alexa.rb
ADDED
@@ -0,0 +1,33 @@
+#!/usr/bin/env ruby
+
+require 'uri'
+require 'web_analytics_discovery/grabberutils'
+
+module WebAnalyticsDiscovery
+  class Alexa
+    include GrabberUtils
+
+    def run(url)
+      uri = URI.parse(url)
+      host = uri.host
+      r = {}
+      doc = download("http://www.alexa.com/siteinfo/#{host}#trafficstats")
+
+      # Try to extract certified metrics
+      r[:visitors_day], r[:pv_day], r[:visitors_mon], r[:pv_mon] = grab_certified_metrics(doc)
+
+      # Grab ID for clarity's sake
+      if doc =~ /<img src="http:\/\/traffic\.alexa\.com\/graph\?.*&u=([^"]+)">/
+        r[:id] = $1
+      end
+      return r
+    end
+
+    def grab_certified_metrics(doc)
+      r = []
+      doc.gsub(/<strong class="metrics-data">([0-9,]+)<\/strong>/) { r << $1 }
+      r.map! { |x| x.gsub(/,/, '').to_i }
+      return r
+    end
+  end
+end
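`grab_certified_metrics` above collects every number wrapped in a `metrics-data` element and strips the thousands separators. A minimal standalone sketch of that extraction, written with `String#scan` rather than `gsub` with a side-effecting block, could look like this. The HTML fragment is made up for illustration; real Alexa markup may differ.

```ruby
# Same extraction idea as grab_certified_metrics, written with scan.
# certified_metrics is a hypothetical name.
def certified_metrics(doc)
  doc.scan(/<strong class="metrics-data">([0-9,]+)<\/strong>/)
     .flatten
     .map { |x| x.delete(',').to_i }
end

# Made-up fragment for illustration only.
html = '<strong class="metrics-data">1,477,599</strong> ... ' \
       '<strong class="metrics-data">44,974,428</strong>'

p certified_metrics(html)
```

In `run`, the returned array is destructured into four variables, so when fewer than four numbers are found the remaining metrics are simply `nil`.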
data/lib/web_analytics_discovery/grabber/googleanalytics.rb
ADDED
@@ -0,0 +1,29 @@
+require 'web_analytics_discovery/grabberutils'
+
+module WebAnalyticsDiscovery
+  class GoogleAnalytics
+    include GrabberUtils
+
+    def run(url)
+      @page = download(url)
+      run_id(find_id)
+    end
+
+    def find_id
+      case @page
+      when /_gat\._getTracker\(["']([^"']+)["']\)/
+        $1
+      when /_gaq\.push\(\[['"]_setAccount['"], ['"]([^"']+)['"]\]\)/
+        $1
+      else
+        nil
+      end
+    end
+
+    def run_id(id)
+      return nil unless id
+      r = {:id => id}
+      return r
+    end
+  end
+end
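The `find_id` method above is the automated form of the "locate the tracking code and read the ID" step from the README. The sketch below runs the same two patterns against inline sample strings, so it can be tried without downloading anything. `GA_PATTERNS` and `find_ga_id` are hypothetical names, and the sample pages are made up.

```ruby
# The two patterns are copied from find_id above.
GA_PATTERNS = [
  /_gat\._getTracker\(["']([^"']+)["']\)/,
  /_gaq\.push\(\[['"]_setAccount['"], ['"]([^"']+)['"]\]\)/
]

# Returns the first tracking ID found, or nil when no pattern matches.
def find_ga_id(page)
  GA_PATTERNS.each do |re|
    id = page[re, 1]
    return id if id
  end
  nil
end

page = %q{<script>_gaq.push(['_setAccount', 'UA-23870775-1']);</script>}
p find_ga_id(page)
p find_ga_id('<html></html>') # prints nil: no tracker found
```

Note that the second pattern expects exactly one space after the comma, so a page that writes `'_setAccount','UA-...'` without the space would not be recognized by it.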
data/lib/web_analytics_discovery/grabber/liveinternet.rb
ADDED
@@ -0,0 +1,61 @@
+# -*- coding: utf-8 -*-
+
+require 'uri'
+require 'web_analytics_discovery/grabberutils'
+
+module WebAnalyticsDiscovery
+  class LiveInternet
+    include GrabberUtils
+
+    def run(url)
+      @url = url
+      @page = download(url)
+      run_id(find_id)
+    end
+
+    def find_id
+      case @page
+      when /new Image\(\)\.src = "\/\/counter\.yadro\.ru\/hit;([^?"]+)\?/
+        $1
+      else
+        # Use hostname as a last resort measure
+        URI.parse(@url).host
+      end
+    end
+
+    def run_id(host)
+      r = {:id => host}
+
+      doc = download("http://www.liveinternet.ru/stat/#{host}/index.csv")
+      r[:pv_day], r[:visits_day], r[:visitors_day] = grab_psv(doc, 4)
+
+      # Bail out early if no LiveInternet data available
+      return r unless r[:pv_day]
+
+      doc = download("http://www.liveinternet.ru/stat/#{host}/index.csv?period=week;total=yes")
+      r[:pv_week], r[:visits_week], r[:visitors_week] = grab_psv(doc, 2)
+
+      doc = download("http://www.liveinternet.ru/stat/#{host}/index.csv?period=month;total=yes")
+      r[:pv_mon], r[:visits_mon], r[:visitors_mon] = grab_psv(doc, 2)
+
+      return r
+    end
+
+    private
+    def grab_psv(doc, col)
+      r = [nil, nil, nil]
+      doc.split(/\n/).each { |l|
+        c = l.split(/;/)
+        case c[0]
+        when '"Просмотры"' # page views
+          r[0] = c[col].to_i
+        when '"Сессии"' # sessions
+          r[1] = c[col].to_i
+        when '"Посетители"' # visitors
+          r[2] = c[col].to_i
+        end
+      }
+      return r
+    end
+  end
+end
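`grab_psv` above picks one column from the three rows labelled page views ("Просмотры"), sessions ("Сессии") and visitors ("Посетители"). The standalone sketch below shows the lookup on an inline sample. The CSV content is made up, and its column layout is an assumption rather than the real LiveInternet export format.

```ruby
# Standalone copy of the row lookup: returns [page views, sessions, visitors]
# taken from column `col`, with nil for any row that is absent.
def grab_psv(doc, col)
  r = [nil, nil, nil]
  doc.split(/\n/).each do |l|
    c = l.split(/;/)
    case c[0]
    when '"Просмотры"'  then r[0] = c[col].to_i # page views
    when '"Сессии"'     then r[1] = c[col].to_i # sessions
    when '"Посетители"' then r[2] = c[col].to_i # visitors
    end
  end
  r
end

# Made-up CSV for illustration only.
csv = [
  '"Просмотры";10;20;30;1787863',
  '"Сессии";1;2;3;745757',
  '"Посетители";4;5;6;597956'
].join("\n")

p grab_psv(csv, 4)
p grab_psv('', 4) # no rows at all: every entry stays nil
```

The all-`nil` result for an empty document is what lets `run_id` bail out early with `return r unless r[:pv_day]`.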
data/lib/web_analytics_discovery/grabber/mailru.rb
ADDED
@@ -0,0 +1,89 @@
+# -*- coding: utf-8 -*-
+
+require 'web_analytics_discovery/grabberutils'
+
+module WebAnalyticsDiscovery
+  class MailRu
+    include GrabberUtils
+
+    def run(url)
+      @page = download(url)
+      run_id(find_id)
+    end
+
+    def find_id
+      case @page
+      when /<a [^>]*href="http:\/\/top\.mail\.ru\/jump\?from=(\d+)".*>\s*<img src="http:\/\/.*.top.mail.ru\/counter/m,
+           /<img src=['"]?http:\/\/top\.list\.ru\/counter\?id=(\d+)/,
+           /<img src=['"]?http:\/\/.*top\.mail\.ru\/counter\?js=na;id=(\d+)/,
+           /_tmr.push\(\{id:\s*['"](\d+)['"]/
+        $1.to_i
+      else
+        nil
+      end
+    end
+
+    def run_id(id)
+      return nil unless id
+      r = {:id => id}
+
+      #doc = download("http://top.mail.ru/visits?id=#{id}")
+
+      # Analyze daily report
+      doc = download("http://top.mail.ru/visits.csv?id=#{id}&period=0&date=&back=30&", 'windows-1251').split(/\n/)
+      return run_id_html_rating(r, id) if doc.empty?
+      doc = doc[4..-1]
+
+      sum_v = 0
+      sum_pv = 0
+      doc.each { |l|
+        #"Дата";"Посетители";"Новые посетители";"Ядро";"Хосты";"Просмотры";"Глубина" (date; visitors; new visitors; core; hosts; page views; depth)
+        date, v, new_v, core_v, hosts, pv, depth = l.split(/;/)
+        sum_v += v.to_i
+        sum_pv += pv.to_i
+      }
+
+      r[:visitors_day] = sum_v / doc.size
+      r[:pv_day] = sum_pv / doc.size
+
+      # Analyze weekly report
+      doc = download("http://top.mail.ru/visits.csv?id=#{id}&period=1&date=&back=98&", 'windows-1251').split(/\n/)
+      return r if doc.empty?
+      date, v, new_v, core_v, hosts, pv, depth = doc[4].split(/;/)
+      r[:visitors_week] = v.to_i
+      r[:pv_week] = pv.to_i
+
+      # Analyze monthly report
+      doc = download("http://top.mail.ru/visits.csv?id=#{id}&period=2&date=&back=395&", 'windows-1251').split(/\n/)
+      return r if doc.empty?
+      date, v, new_v, core_v, hosts, pv, depth = doc[4].split(/;/)
+      r[:visitors_mon] = v.to_i
+      r[:pv_mon] = pv.to_i
+
+      return r
+    end
+
+    # Parse semi-closed rating when normal full CSV export is not available
+    def run_id_html_rating(r, id)
+      doc = download("http://top.mail.ru/rating?id=#{id}", 'windows-1251')
+
+      today = []
+      doc.gsub(/<td class="l_col">Сегодня<\/td>.*?<td class="r_col"><b>([0-9,]+)<\/b>/m) { today << $1.gsub(/,/, '').to_i }
+
+      week = []
+      doc.gsub(/<td class="l_col">Неделя<\/td>.*?<td class="r_col"><b>([0-9,]+)<\/b>/m) { week << $1.gsub(/,/, '').to_i }
+
+      month = []
+      doc.gsub(/<td class="l_col">Месяц<\/td>.*?<td class="r_col"><b>([0-9,]+)<\/b>/m) { month << $1.gsub(/,/, '').to_i }
+
+      # Non-normal number of matches? That's weird, bail out
+      return r unless today.length == 3 and week.length == 3 and month.length == 3
+
+      r[:visitors_day], r[:pv_day], r[:ip_day] = today
+      r[:visitors_week], r[:pv_week], r[:ip_week] = week
+      r[:visitors_mon], r[:pv_mon], r[:ip_mon] = month
+
+      return r
+    end
+  end
+end
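The daily figures in `run_id` above are averages over the data rows of the CSV export, obtained after dropping the first four lines with `doc[4..-1]`. If a response had exactly four lines, that slice would be empty and the integer division by `doc.size` would raise `ZeroDivisionError`. With fewer than four lines the slice is `nil`. The sketch below isolates the averaging step and guards the empty case. `daily_averages` is a hypothetical helper, and the sample rows are made up.

```ruby
# Hypothetical helper: average visitors and page views over daily CSV rows.
# Column order follows the header quoted in the source:
# date; visitors; new visitors; core; hosts; page views; depth.
def daily_averages(rows)
  return nil if rows.nil? || rows.empty?
  sum_v = 0
  sum_pv = 0
  rows.each do |l|
    _date, v, _new_v, _core_v, _hosts, pv, _depth = l.split(/;/)
    sum_v += v.to_i
    sum_pv += pv.to_i
  end
  { :visitors_day => sum_v / rows.size, :pv_day => sum_pv / rows.size }
end

# Made-up rows for illustration only.
rows = [
  '01.01.2014;100;10;5;90;300;3.0',
  '02.01.2014;200;20;5;180;500;2.5'
]

p daily_averages(rows)
p daily_averages([]) # prints nil instead of dividing by zero
```

As in the original, the averages use integer division, so fractional parts are truncated.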