web_analytics_discovery 2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.gitignore +8 -0
- data/.rspec +2 -0
- data/.travis.yml +9 -0
- data/Gemfile +2 -0
- data/LICENSE +661 -0
- data/README.md +133 -0
- data/Rakefile +7 -0
- data/bin/web_analytics_discover +77 -0
- data/lib/web_analytics_discovery.rb +23 -0
- data/lib/web_analytics_discovery/grabber/alexa.rb +33 -0
- data/lib/web_analytics_discovery/grabber/googleanalytics.rb +29 -0
- data/lib/web_analytics_discovery/grabber/liveinternet.rb +61 -0
- data/lib/web_analytics_discovery/grabber/mailru.rb +89 -0
- data/lib/web_analytics_discovery/grabber/openstat.rb +44 -0
- data/lib/web_analytics_discovery/grabber/quantcast.rb +84 -0
- data/lib/web_analytics_discovery/grabber/rambler.rb +100 -0
- data/lib/web_analytics_discovery/grabber/tns.rb +117 -0
- data/lib/web_analytics_discovery/grabber/yandexmetrika.rb +54 -0
- data/lib/web_analytics_discovery/grabberutils.rb +54 -0
- data/lib/web_analytics_discovery/version.rb +3 -0
- data/spec/alexa_spec.rb +13 -0
- data/spec/liveinternet_spec.rb +15 -0
- data/spec/mailru_spec.rb +36 -0
- data/spec/openstat_spec.rb +24 -0
- data/spec/quantcast_spec.rb +59 -0
- data/spec/rambler_spec.rb +63 -0
- data/spec/spec_helper.rb +25 -0
- data/spec/tns_spec.rb +21 -0
- data/web_analytics_discovery.gemspec +50 -0
- metadata +158 -0
data/README.md
ADDED
@@ -0,0 +1,133 @@
# web_analytics_discovery
<!--[](http://badge.fury.io/rb/web_analytics_discovery)-->
[](https://travis-ci.org/GreyCat/web_analytics_discovery)
[](https://gemnasium.com/GreyCat/web_analytics_discovery)
[](https://codeclimate.com/github/GreyCat/web_analytics_discovery)
<!--[](https://coveralls.io/r/GreyCat/web_analytics_discovery)-->
<!--[](http://rails-brakeman.com/GreyCat/web_analytics_discovery)-->

This gem provides a set of tools for discovery and export of data from
popular web analytics tools.

The supported web analytics systems are:

* Alexa
* Google Analytics
* LiveInternet
* Mail.ru
* Openstat
* Quantcast
* Rambler Top100
* Yandex Metrika

## The problem

Given a particular site URL (e.g. `http://example.com/`), we'd like to
know audience statistics for that site (e.g. how many unique
people visit it per day, per week, per month, how many page views
they generate, etc.).

## The solution

Many sites use web analytics tools to measure audience stats. Quite
often, these statistics are even publicly available, although one needs to know:

* which particular web analytics system a given site uses
* what this site's ID is in that web analytics system

Answering these questions usually requires a tedious manual process:

* Look up the site's HTML code
* Locate JavaScript code / tags / calls to the web analytics system
* Identify this system
* Identify the site's ID in the code / calls
* Go to the web analytics system's site or API and get the desired statistics

This gem tries to automate these tasks, looking up all the info and
retrieving information from web analytics systems. Exported data can
be accessed in simple tabular form or programmatically, as a hash,
using the API.

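The manual steps above can be sketched in a few lines of Ruby. This is only an illustration, not the gem's actual code path: `PATTERNS` and `detect_counters` are hypothetical names, the two patterns are adapted from the grabbers shipped in this release, and the HTML snippet is made up for the example.

```ruby
# Hypothetical helper (not part of the gem): map each analytics system
# to a pattern that captures the site's ID from the page source.
PATTERNS = {
  googleanalytics: /_gaq\.push\(\[['"]_setAccount['"], ['"]([^"']+)['"]\]\)/,
  mailru: /_tmr\.push\(\{id:\s*['"](\d+)['"]/,
}

# Returns a hash of {system => id} for every counter found in the HTML.
def detect_counters(html)
  found = {}
  PATTERNS.each_pair do |system, pattern|
    m = pattern.match(html)
    found[system] = m[1] if m
  end
  found
end

# Made-up page fragment containing a classic Google Analytics snippet.
html = <<~HTML
  <script>
    _gaq.push(['_setAccount', 'UA-23870775-1']);
  </script>
HTML

p detect_counters(html)
```

Once the system and ID are known, the remaining step (fetching the statistics) is specific to each service.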
## Installation

### From RubyGems repository

* Make sure you have Ruby and RubyGems
* Just run `gem install web_analytics_discovery`

### Manually from source

* Clone this repository / download snapshot
* `gem build web_analytics_discovery.gemspec`
* `gem install --local ./web_analytics_discovery-*.gem` (usually as
  root, if you need system-wide installation)

## Basic usage

For basic usage, a simple executable `web_analytics_discover` is
provided and installed during gem installation. It can be run with one
or several URLs as command-line arguments and it will produce a simple
summary table for each of the URLs.

Example:

    $ web_analytics_discover http://kp.ru/
                        |                      id|      v/day|      s/day|     pv/day|      v/mon|      s/mon|     pv/mon
    alexa               |                   kp.ru|        N/A|        N/A|    1477599|    6825125|        N/A|   44974428
    googleanalytics     |           UA-23870775-1|        N/A|        N/A|        N/A|        N/A|        N/A|        N/A
    liveinternet        |                        |     597956|     745757|    1787863|   10585641|   21308436|   49775501
    mailru              |                  294001|     756600|        N/A|    2230674|   15086634|        N/A|   73738178
    openstat            |                 2026010|     983579|    1195306|    2823114|   14757845|   28953554|   69970669
    quantcast           |                wd:ru.kp|        N/A|        N/A|        N/A|      36300|        N/A|        N/A
    rambler             |                   17841|    1048235|    1287761|    3015270|   15550162|   31307958|   75869606
    yandexmetrika       |                 1051362|     259987|     310983|     727833|        N/A|        N/A|   22153416

## API usage

The same functionality is available through a simple API. Every web
analytics service is supported by a separate class named after that
service in the `WebAnalyticsDiscovery` module:

* `Alexa`
* `GoogleAnalytics`
* `LiveInternet`
* `MailRu`
* `Openstat`
* `Quantcast`
* `Rambler`
* `YandexMetrika`

It can be used like this:

    require 'web_analytics_discovery'
    d = WebAnalyticsDiscovery::MailRu.new
    result = d.run('http://kp.ru/')

`result` will look like this:

    {:id=>294001,
     :visitors_day=>756600,
     :pv_day=>2230674,
     :visitors_week=>3365344,
     :pv_week=>13102096,
     :visitors_mon=>15086634,
     :pv_mon=>73738178}

Some values might be missing if it's not possible to retrieve them
from a given service.

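Because values can be missing, callers may want a small accessor that tolerates both an absent key and a `nil` result (several grabbers in this release return `nil` when no counter ID is found on the page). The sketch below is an assumption about how one might do that; `metric` is a hypothetical helper, and the sample hash is a truncated copy of the result shown above.

```ruby
# Hypothetical helper (not part of the gem): read one metric from a
# grabber result, falling back to 'N/A' when it is unavailable.
def metric(result, key)
  return 'N/A' if result.nil?  # the grabber found no counter at all
  result[key] || 'N/A'         # key absent, or present with a nil value
end

# Truncated sample result; :visits_day is deliberately absent.
result = { :id => 294001, :visitors_day => 756600, :pv_day => 2230674 }

puts metric(result, :pv_day)
puts metric(result, :visits_day)
puts metric(nil, :id)
```

Using `result[key] || 'N/A'` rather than `Hash#fetch` with a default also covers keys that are present but set to `nil`.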
## Licensing and usage

Copyright (C) 2013-2014 Mikhail Yakshin <greycat@altlinux.org>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Affero General Public License for more details.

Please consult the LICENSE file for more details and full license text.
data/bin/web_analytics_discover
ADDED
@@ -0,0 +1,77 @@
#!/usr/bin/env ruby

require 'fileutils'
require 'uri'
require 'optparse'

require 'web_analytics_discovery'
include WebAnalyticsDiscovery

class AnalyticsGrabber
  def initialize
    @services = {}
    SERVICES.each_pair { |name, klass|
      begin
        @services[name] = klass.new
      rescue Exception => ex
        warn "Unable to start analytics service #{name}"
        $stderr.puts ex.message
        $stderr.puts ex.backtrace.join("\n")
      end
    }
  end

  def run(url)
    r = {}
    @services.each_pair { |name, service|
      begin
        r[name] = service.run(url)
      rescue Exception => ex
        warn "Exception querying analytics service #{name}"
        $stderr.puts ex.message
        $stderr.puts ex.backtrace.join("\n")
      end
    }
    r
  end

  def pp(url)
    r = run(url)
    print_line ['', 'id', 'v/day', 's/day', 'pv/day', 'v/mon', 's/mon', 'pv/mon']
    r.keys.sort.each { |service|
      res = r[service]
      next unless res
      print_line [
        service,
        res[:id],
        res[:visitors_day],
        res[:visits_day],
        res[:pv_day],
        res[:visitors_mon],
        res[:visits_mon],
        res[:pv_mon],
      ]
    }
  end

  def print_line(a)
    printf '%-20s', a.shift
    printf '|%24s', a.shift
    a.each { |x|
      printf '|%11s', x || 'N/A'
    }
    puts
  end
end

options = {}
OptionParser.new { |opts|
  opts.banner = "Usage: #{__FILE__} [options] <urls>"

  # opts.on('-v', '--[no-]verbose', 'Verbose logging') { |v| options[:verbose] = v }

  opts.on_tail('-h', '--help', 'Show this message') { puts opts; exit }
}.parse!

ag = AnalyticsGrabber.new
ARGV.each { |a| ag.pp(a) }
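The column layout produced by `print_line` can be checked without running any grabber. The sketch below is a side-effect-free variant written for illustration only: `format_line` is a hypothetical name, and it returns the row as a string instead of printing it, using the same three format strings as the script.

```ruby
# Hypothetical pure variant of print_line: builds one table row.
# First cell: service name, left-aligned in 20 columns.
# Second cell: counter ID, right-aligned in 24 columns.
# Remaining cells: metrics, right-aligned in 11 columns, nil shown as N/A.
def format_line(cells)
  name, id, *metrics = cells
  line = format('%-20s', name)
  line << format('|%24s', id)
  metrics.each { |x| line << format('|%11s', x || 'N/A') }
  line
end

puts format_line(['', 'id', 'v/day', 's/day'])
puts format_line(['mailru', 294001, 756600, nil])
```

Unlike the original, this version does not mutate its argument with `shift`, so the same array can be reused by the caller.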
data/lib/web_analytics_discovery.rb
ADDED
@@ -0,0 +1,23 @@
require 'web_analytics_discovery/version'

require 'web_analytics_discovery/grabber/alexa'
require 'web_analytics_discovery/grabber/googleanalytics'
require 'web_analytics_discovery/grabber/liveinternet'
require 'web_analytics_discovery/grabber/mailru'
require 'web_analytics_discovery/grabber/openstat'
require 'web_analytics_discovery/grabber/quantcast'
require 'web_analytics_discovery/grabber/rambler'
require 'web_analytics_discovery/grabber/tns'
require 'web_analytics_discovery/grabber/yandexmetrika'

module WebAnalyticsDiscovery
  # Special trickery to get a map of {:service_name => ClassThatImplementsServiceExtraction} magic
  SERVICES = Hash[constants.map { |x|
    possible_class = const_get(x)
    if possible_class.class == Class
      [x.to_s.downcase.to_sym, possible_class]
    else
      nil
    end
  }.delete_if { |v| v.nil? }]
end
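The `SERVICES` construction relies on `Module#constants` and `const_get` to enumerate whatever classes have been defined in the module so far. A self-contained toy version is sketched below; `DemoDiscovery` and its contents are hypothetical stand-ins, not part of the gem.

```ruby
# Toy module standing in for WebAnalyticsDiscovery (hypothetical names).
module DemoDiscovery
  VERSION = '0.0.0'      # a String constant: filtered out below
  module Utils; end      # a Module, not a Class: also filtered out
  class Alexa; end
  class MailRu; end

  # Same idea as in the gem: keep only constants whose value is a Class
  # and key them by the lower-cased constant name.
  SERVICES = Hash[constants.map { |x|
    possible_class = const_get(x)
    possible_class.class == Class ? [x.to_s.downcase.to_sym, possible_class] : nil
  }.compact]
end

p DemoDiscovery::SERVICES.keys.sort
```

Two consequences follow from this approach: the map only sees classes defined before the `SERVICES` line is evaluated, which is why all the `require` calls come first, and mixin modules are skipped because their class is `Module` rather than `Class`.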
data/lib/web_analytics_discovery/grabber/alexa.rb
ADDED
@@ -0,0 +1,33 @@
#!/usr/bin/env ruby

require 'uri'
require 'web_analytics_discovery/grabberutils'

module WebAnalyticsDiscovery
  class Alexa
    include GrabberUtils

    def run(url)
      uri = URI.parse(url)
      host = uri.host
      r = {}
      doc = download("http://www.alexa.com/siteinfo/#{host}#trafficstats")

      # Try to extract certified metrics
      r[:visitors_day], r[:pv_day], r[:visitors_mon], r[:pv_mon] = grab_certified_metrics(doc)

      # Grab ID for clarity's sake
      if doc =~ /<img src="http:\/\/traffic\.alexa\.com\/graph\?.*&u=([^"]+)">/
        r[:id] = $1
      end
      return r
    end

    def grab_certified_metrics(doc)
      r = []
      doc.gsub(/<strong class="metrics-data">([0-9,]+)<\/strong>/) { r << $1 }
      r.map! { |x| x.gsub(/,/, '').to_i }
      return r
    end
  end
end
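`grab_certified_metrics` collects every number wrapped in a `metrics-data` element and strips the thousands separators. The sketch below does the same thing with `String#scan` instead of `gsub` with a side-effecting block. It is an alternative formulation, not the released code; `extract_metrics` is a hypothetical name and the HTML fragment is invented for the example.

```ruby
# Hypothetical alternative to grab_certified_metrics using String#scan.
# scan returns one single-element array per match because the pattern
# has exactly one capture group.
def extract_metrics(doc)
  doc.scan(/<strong class="metrics-data">([0-9,]+)<\/strong>/)
     .map { |(number)| number.delete(',').to_i }
end

# Made-up fragment with two metric cells.
sample = '<td><strong class="metrics-data">1,477,599</strong></td>' \
         '<td><strong class="metrics-data">44,974,428</strong></td>'

p extract_metrics(sample)
```

As in the original, the caller depends on the order of matches in the page, so any change in the markup would silently shift values between metrics.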
data/lib/web_analytics_discovery/grabber/googleanalytics.rb
ADDED
@@ -0,0 +1,29 @@
require 'web_analytics_discovery/grabberutils'

module WebAnalyticsDiscovery
  class GoogleAnalytics
    include GrabberUtils

    def run(url)
      @page = download(url)
      run_id(find_id)
    end

    def find_id
      case @page
      when /_gat\._getTracker\(["']([^"']+)["']\)/
        $1
      when /_gaq\.push\(\[['"]_setAccount['"], ['"]([^"']+)['"]\]\)/
        $1
      else
        nil
      end
    end

    def run_id(id)
      return nil unless id
      r = {:id => id}
      return r
    end
  end
end
data/lib/web_analytics_discovery/grabber/liveinternet.rb
ADDED
@@ -0,0 +1,61 @@
# -*- coding: utf-8 -*-

require 'uri'
require 'web_analytics_discovery/grabberutils'

module WebAnalyticsDiscovery
  class LiveInternet
    include GrabberUtils

    def run(url)
      @url = url
      @page = download(url)
      run_id(find_id)
    end

    def find_id
      case @page
      when /new Image\(\)\.src = "\/\/counter\.yadro\.ru\/hit;([^?"]+)\?/
        $1
      else
        # Use hostname as a last resort measure
        URI.parse(@url).host
      end
    end

    def run_id(host)
      r = {:id => host}

      doc = download("http://www.liveinternet.ru/stat/#{host}/index.csv")
      r[:pv_day], r[:visits_day], r[:visitors_day] = grab_psv(doc, 4)

      # Bail out early if no LiveInternet data available
      return r unless r[:pv_day]

      doc = download("http://www.liveinternet.ru/stat/#{host}/index.csv?period=week;total=yes")
      r[:pv_week], r[:visits_week], r[:visitors_week] = grab_psv(doc, 2)

      doc = download("http://www.liveinternet.ru/stat/#{host}/index.csv?period=month;total=yes")
      r[:pv_mon], r[:visits_mon], r[:visitors_mon] = grab_psv(doc, 2)

      return r
    end

    private
    def grab_psv(doc, col)
      r = [nil, nil, nil]
      doc.split(/\n/).each { |l|
        c = l.split(/;/)
        case c[0]
        when '"Просмотры"'
          r[0] = c[col].to_i
        when '"Сессии"'
          r[1] = c[col].to_i
        when '"Посетители"'
          r[2] = c[col].to_i
        end
      }
      return r
    end
  end
end
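`grab_psv` picks one column out of three labelled rows of a semicolon-separated export: page views ("Просмотры"), sessions ("Сессии") and visitors ("Посетители"). The sketch below shows the same lookup driven by a label table. `LABELS` and `parse_psv` are hypothetical names, and the sample data is invented, so the real export may have a different column layout.

```ruby
# Row labels as they appear in the first column, quotes included.
LABELS = {
  '"Просмотры"'  => :pv,        # page views
  '"Сессии"'     => :visits,    # sessions
  '"Посетители"' => :visitors,  # visitors
}

# Hypothetical variant of grab_psv: returns a hash instead of a
# positional array; rows with unknown labels are ignored.
def parse_psv(doc, col)
  result = {}
  doc.each_line do |line|
    cells = line.chomp.split(';')
    key = LABELS[cells[0]]
    result[key] = cells[col].to_i if key
  end
  result
end

# Invented sample: a label followed by four numeric columns.
sample = <<~CSV
  "Просмотры";100;200;300;400
  "Сессии";10;20;30;40
  "Посетители";1;2;3;4
CSV

p parse_psv(sample, 4)
p parse_psv(sample, 2)
```

Returning a hash makes a missing row visible as a missing key, whereas the positional array in the original leaves a `nil` in the corresponding slot.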
data/lib/web_analytics_discovery/grabber/mailru.rb
ADDED
@@ -0,0 +1,89 @@
# -*- coding: utf-8 -*-

require 'web_analytics_discovery/grabberutils'

module WebAnalyticsDiscovery
  class MailRu
    include GrabberUtils

    def run(url)
      @page = download(url)
      run_id(find_id)
    end

    def find_id
      case @page
      when /<a [^>]*href="http:\/\/top\.mail\.ru\/jump\?from=(\d+)".*>\s*<img src="http:\/\/.*.top.mail.ru\/counter/m,
        /<img src=['"]?http:\/\/top\.list\.ru\/counter\?id=(\d+)/,
        /<img src=['"]?http:\/\/.*top\.mail\.ru\/counter\?js=na;id=(\d+)/,
        /_tmr.push\(\{id:\s*['"](\d+)['"]/
        $1.to_i
      else
        nil
      end
    end

    def run_id(id)
      return nil unless id
      r = {:id => id}

      #doc = download("http://top.mail.ru/visits?id=#{id}")

      # Analyze daily report
      doc = download("http://top.mail.ru/visits.csv?id=#{id}&period=0&date=&back=30&", 'windows-1251').split(/\n/)
      return run_id_html_rating(r, id) if doc.empty?
      doc = doc[4..-1]

      sum_v = 0
      sum_pv = 0
      doc.each { |l|
        #"Дата";"Посетители";"Новые посетители";"Ядро";"Хосты";"Просмотры";"Глубина"
        date, v, new_v, core_v, hosts, pv, depth = l.split(/;/)
        sum_v += v.to_i
        sum_pv += pv.to_i
      }

      r[:visitors_day] = sum_v / doc.size
      r[:pv_day] = sum_pv / doc.size

      # Analyze weekly report
      doc = download("http://top.mail.ru/visits.csv?id=#{id}&period=1&date=&back=98&", 'windows-1251').split(/\n/)
      return r if doc.empty?
      date, v, new_v, core_v, hosts, pv, depth = doc[4].split(/;/)
      r[:visitors_week] = v.to_i
      r[:pv_week] = pv.to_i

      # Analyze monthly report
      doc = download("http://top.mail.ru/visits.csv?id=#{id}&period=2&date=&back=395&", 'windows-1251').split(/\n/)
      return r if doc.empty?
      date, v, new_v, core_v, hosts, pv, depth = doc[4].split(/;/)
      r[:visitors_mon] = v.to_i
      r[:pv_mon] = pv.to_i

      return r
    end

    # Parse semi-closed rating when normal full CSV export is not available
    def run_id_html_rating(r, id)
      doc = download("http://top.mail.ru/rating?id=#{id}", 'windows-1251')

      today = []
      doc.gsub(/<td class="l_col">Сегодня<\/td>.*?<td class="r_col"><b>([0-9,]+)<\/b>/m) { today << $1.gsub(/,/, '').to_i }

      week = []
      doc.gsub(/<td class="l_col">Неделя<\/td>.*?<td class="r_col"><b>([0-9,]+)<\/b>/m) { week << $1.gsub(/,/, '').to_i }

      month = []
      doc.gsub(/<td class="l_col">Месяц<\/td>.*?<td class="r_col"><b>([0-9,]+)<\/b>/m) { month << $1.gsub(/,/, '').to_i }

      # Non-normal number of matches? That's weird, bail out
      return r unless today.length == 3 and week.length == 3 and month.length == 3

      r[:visitors_day], r[:pv_day], r[:ip_day] = today
      r[:visitors_week], r[:pv_week], r[:ip_week] = week
      r[:visitors_mon], r[:pv_mon], r[:ip_mon] = month

      return r
    end
  end
end
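The daily figures in `run_id` are averages over the rows of the daily CSV export: visitors come from the second column and page views from the sixth. The sketch below isolates that arithmetic so it can be tried offline. `daily_averages` is a hypothetical name, the rows are invented, and the early return for an empty row set is an addition of this sketch rather than behaviour of the released code.

```ruby
# Hypothetical helper: average visitors and page views over daily rows.
# Column order assumed from the header comment in the source:
# date; visitors; new visitors; core; hosts; page views; depth
def daily_averages(lines)
  rows = lines.map { |l| l.split(';') }
  return nil if rows.empty?  # added guard: avoids dividing by zero

  sum_v  = rows.sum { |c| c[1].to_i }
  sum_pv = rows.sum { |c| c[5].to_i }

  # Integer division, as in the original code.
  { visitors_day: sum_v / rows.size, pv_day: sum_pv / rows.size }
end

# Invented data rows in the assumed column order.
rows = [
  '2014-01-01;100;10;5;90;300;3.0',
  '2014-01-02;200;20;10;180;500;2.5',
]

p daily_averages(rows)
```

Without a guard of this kind, a response that contains nothing beyond the four skipped leading lines would leave an empty row set and make the division fail.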