web_analytics_discovery 2.0

@@ -0,0 +1,133 @@
+ # web_analytics_discovery
+ <!--[![Gem Version](https://badge.fury.io/rb/web_analytics_discovery.png)](http://badge.fury.io/rb/web_analytics_discovery)-->
+ [![Build Status](https://travis-ci.org/GreyCat/web_analytics_discovery.svg?branch=master)](https://travis-ci.org/GreyCat/web_analytics_discovery)
+ [![Dependency Status](https://gemnasium.com/GreyCat/web_analytics_discovery.svg)](https://gemnasium.com/GreyCat/web_analytics_discovery)
+ [![Code Climate](https://codeclimate.com/github/GreyCat/web_analytics_discovery/badges/gpa.svg)](https://codeclimate.com/github/GreyCat/web_analytics_discovery)
+ <!--[![Coverage Status](https://coveralls.io/repos/GreyCat/web_analytics_discovery/badge.png?branch=master)](https://coveralls.io/r/GreyCat/web_analytics_discovery)-->
+ <!--[![Security Status](http://rails-brakeman.com/GreyCat/web_analytics_discovery.png)](http://rails-brakeman.com/GreyCat/web_analytics_discovery)-->
+
+ This gem provides a set of tools for discovery and export of data from
+ popular web analytics tools.
+
+ The supported web analytics systems are:
+
+ * Alexa
+ * Google Analytics
+ * LiveInternet
+ * Mail.ru
+ * Openstat
+ * Quantcast
+ * Rambler Top100
+ * Yandex Metrika
+
+ ## The problem
+
+ Given a particular site URL (e.g. `http://example.com/`), we'd like to
+ know audience statistics for that site (e.g. how many unique people
+ visit it per day, per week, per month, how many page views they
+ generate, etc).
+
+ ## The solution
+
+ Many sites use web analytics tools to measure audience stats. Quite
+ often, these statistics are even publicly available, although one needs to know:
+
+ * which particular web analytics system a given site uses
+ * what this site's ID is in that web analytics system
+
+ Answering these questions usually requires a tedious manual process:
+
+ * Look up the site's HTML code
+ * Locate JavaScript code / tags / calls to a web analytics system
+ * Identify this system
+ * Identify the site's ID in the code / calls
+ * Go to the web analytics system's site or API and get the desired statistics
+
+ This gem tries to automate these tasks, looking up all the info and
+ retrieving information from web analytics systems. Exported data can
+ be accessed in simple tabular form or programmatically, as a hash,
+ using the API.
+
+ ## Installation
+
+ ### From RubyGems repository
+
+ * Make sure you have Ruby and RubyGems
+ * Just run `gem install web_analytics_discovery`
+
+ ### Manually from source
+
+ * Clone this repository / download snapshot
+ * `gem build web_analytics_discovery.gemspec`
+ * `gem install --local ./web_analytics_discovery-*.gem` (usually as
+   root, if you need system-wide installation)
+
+ ## Basic usage
+
+ For basic usage, a simple executable `web_analytics_discover` is
+ provided and installed during gem installation. It can be run with one
+ or several URLs as command-line arguments and it will produce a simple
+ summary table for each of the URLs.
+
+ Example:
+
+     $ web_analytics_discover http://kp.ru/
+                         |                      id|      v/day|      s/day|     pv/day|      v/mon|      s/mon|     pv/mon
+     alexa               |                   kp.ru|        N/A|        N/A|    1477599|    6825125|        N/A|   44974428
+     googleanalytics     |           UA-23870775-1|        N/A|        N/A|        N/A|        N/A|        N/A|        N/A
+     liveinternet        |                        |     597956|     745757|    1787863|   10585641|   21308436|   49775501
+     mailru              |                  294001|     756600|        N/A|    2230674|   15086634|        N/A|   73738178
+     openstat            |                 2026010|     983579|    1195306|    2823114|   14757845|   28953554|   69970669
+     quantcast           |                wd:ru.kp|        N/A|        N/A|        N/A|      36300|        N/A|        N/A
+     rambler             |                   17841|    1048235|    1287761|    3015270|   15550162|   31307958|   75869606
+     yandexmetrika       |                 1051362|     259987|     310983|     727833|        N/A|        N/A|   22153416
+
+ ## API usage
+
+ Web analytics discovery can also be used through a simple API. Every web
+ analytics service is supported by a separate class named after that
+ service in the `WebAnalyticsDiscovery` module:
+
+ * `Alexa`
+ * `GoogleAnalytics`
+ * `LiveInternet`
+ * `MailRu`
+ * `Openstat`
+ * `Quantcast`
+ * `Rambler`
+ * `YandexMetrika`
+
+ It can be used like this:
+
+     require 'web_analytics_discovery'
+     d = WebAnalyticsDiscovery::MailRu.new
+     result = d.run('http://kp.ru/')
+
+ `result` will look like this:
+
+     {:id=>294001,
+      :visitors_day=>756600,
+      :pv_day=>2230674,
+      :visitors_week=>3365344,
+      :pv_week=>13102096,
+      :visitors_mon=>15086634,
+      :pv_mon=>73738178}
+
+ Some values might be missing if it's not possible to retrieve them
+ from a given service.
+
+ ## Licensing and usage
+
+ Copyright (C) 2013-2014 Mikhail Yakshin <greycat@altlinux.org>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU Affero General Public License as
+ published by the Free Software Foundation, either version 3 of the
+ License, or (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Affero General Public License for more details.
+
+ Please consult the LICENSE file for more details and full license text.
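
Note on the README's API usage section: since some values might be missing from a result, code that compares services usually has to skip absent keys. The following is a minimal sketch under that assumption, not part of the gem. `SAMPLE_RESULTS` and `metric_by_service` are hypothetical names, and the numbers are copied from the README's example output.

```ruby
# Hypothetical sample: per-service result hashes shaped like the README example.
SAMPLE_RESULTS = {
  mailru: { id: 294001, visitors_day: 756600, pv_day: 2230674,
            visitors_mon: 15086634, pv_mon: 73738178 },
  googleanalytics: { id: 'UA-23870775-1' },          # only the ID is known
  quantcast: { id: 'wd:ru.kp', visitors_mon: 36300 }
}.freeze

# Collect one metric across services, skipping services that returned
# nothing at all (nil) or that do not report this metric.
def metric_by_service(results, metric)
  results.each_with_object({}) do |(service, data), out|
    value = data && data[metric]
    out[service] = value unless value.nil?
  end
end

p metric_by_service(SAMPLE_RESULTS, :visitors_mon)
```

Only services that actually report the requested metric appear in the returned hash, so callers do not need their own `nil` checks.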
@@ -0,0 +1,7 @@
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+
+ RSpec::Core::RakeTask.new
+
+ task :default => :spec
+ task :test => :spec
@@ -0,0 +1,77 @@
+ #!/usr/bin/env ruby
+
+ require 'fileutils'
+ require 'uri'
+ require 'optparse'
+
+ require 'web_analytics_discovery'
+ include WebAnalyticsDiscovery
+
+ class AnalyticsGrabber
+   def initialize
+     @services = {}
+     SERVICES.each_pair { |name, klass|
+       begin
+         @services[name] = klass.new
+       rescue Exception => ex
+         warn "Unable to start analytics service #{name}"
+         $stderr.puts ex.message
+         $stderr.puts ex.backtrace.join("\n")
+       end
+     }
+   end
+
+   def run(url)
+     r = {}
+     @services.each_pair { |name, service|
+       begin
+         r[name] = service.run(url)
+       rescue Exception => ex
+         warn "Exception querying analytics service #{name}"
+         $stderr.puts ex.message
+         $stderr.puts ex.backtrace.join("\n")
+       end
+     }
+     r
+   end
+
+   def pp(url)
+     r = run(url)
+     print_line ['', 'id', 'v/day', 's/day', 'pv/day', 'v/mon', 's/mon', 'pv/mon']
+     r.keys.sort.each { |service|
+       res = r[service]
+       next unless res
+       print_line [
+         service,
+         res[:id],
+         res[:visitors_day],
+         res[:visits_day],
+         res[:pv_day],
+         res[:visitors_mon],
+         res[:visits_mon],
+         res[:pv_mon],
+       ]
+     }
+   end
+
+   def print_line(a)
+     printf '%-20s', a.shift
+     printf '|%24s', a.shift
+     a.each { |x|
+       printf '|%11s', x || 'N/A'
+     }
+     puts
+   end
+ end
+
+ options = {}
+ OptionParser.new { |opts|
+   opts.banner = "Usage: #{__FILE__} [options] <urls>"
+
+   # opts.on('-v', '--[no-]verbose', 'Verbose logging') { |v| options[:verbose] = v }
+
+   opts.on_tail('-h', '--help', 'Show this message') { puts opts; exit }
+ }.parse!
+
+ ag = AnalyticsGrabber.new
+ ARGV.each { |a| ag.pp(a) }
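
Note on `print_line` above: the row layout is a 20-character left-justified service name, a 24-character right-justified ID, then 11-character right-justified cells, with `N/A` substituted for missing values. A minimal sketch of the same layout as a pure function that returns a string, which may be easier to test than direct `printf` calls. `format_row` is a hypothetical helper, not part of the executable.

```ruby
# Build one table row as a string instead of printing it piece by piece.
# nil metric cells are rendered as "N/A", mirroring print_line above.
def format_row(name, id, values)
  row = format('%-20s|%24s', name, id)
  values.each { |v| row << format('|%11s', v || 'N/A') }
  row
end

puts format_row('', 'id', %w[v/day s/day pv/day v/mon s/mon pv/mon])
puts format_row('mailru', 294001, [756600, nil, 2230674, 15086634, nil, 73738178])
```

Each row comes out 117 characters wide (20 + 25 + 6 × 12), provided no cell is longer than its column.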
@@ -0,0 +1,23 @@
+ require 'web_analytics_discovery/version'
+
+ require 'web_analytics_discovery/grabber/alexa'
+ require 'web_analytics_discovery/grabber/googleanalytics'
+ require 'web_analytics_discovery/grabber/liveinternet'
+ require 'web_analytics_discovery/grabber/mailru'
+ require 'web_analytics_discovery/grabber/openstat'
+ require 'web_analytics_discovery/grabber/quantcast'
+ require 'web_analytics_discovery/grabber/rambler'
+ require 'web_analytics_discovery/grabber/tns'
+ require 'web_analytics_discovery/grabber/yandexmetrika'
+
+ module WebAnalyticsDiscovery
+   # Special trickery to get a map of {:service_name => ClassThatImplementsServiceExtraction} magic
+   SERVICES = Hash[constants.map { |x|
+     possible_class = const_get(x)
+     if possible_class.class == Class
+       [x.to_s.downcase.to_sym, possible_class]
+     else
+       nil
+     end
+   }.delete_if { |v| v.nil? }]
+ end
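
Note on the `SERVICES` map above: it is built by walking the module's constants and keeping only those that are classes, so helper modules and plain values such as a version string are skipped. A minimal sketch of the same idea using `each_with_object` and `is_a?(Class)`. `DemoDiscovery` and its members are hypothetical stand-ins so that the snippet runs without the gem.

```ruby
module DemoDiscovery
  VERSION = '0.0.0'        # a non-class constant, must be skipped
  module Helpers; end      # a module (not a class), must be skipped too
  class Alexa; end
  class MailRu; end

  # Map :downcased_name => Class for every class defined so far.
  # SERVICES itself is not yet defined while this expression runs,
  # so it never appears in its own list of constants.
  SERVICES = constants.each_with_object({}) do |name, map|
    value = const_get(name)
    map[name.to_s.downcase.to_sym] = value if value.is_a?(Class)
  end
end

p DemoDiscovery::SERVICES.keys.sort
```

As in the original, the map only contains classes that were already loaded when the constant is assigned, which is why the `require` lines come first.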
@@ -0,0 +1,33 @@
+ #!/usr/bin/env ruby
+
+ require 'uri'
+ require 'web_analytics_discovery/grabberutils'
+
+ module WebAnalyticsDiscovery
+   class Alexa
+     include GrabberUtils
+
+     def run(url)
+       uri = URI.parse(url)
+       host = uri.host
+       r = {}
+       doc = download("http://www.alexa.com/siteinfo/#{host}#trafficstats")
+
+       # Try to extract certified metrics
+       r[:visitors_day], r[:pv_day], r[:visitors_mon], r[:pv_mon] = grab_certified_metrics(doc)
+
+       # Grab ID for clarity's sake
+       if doc =~ /<img src="http:\/\/traffic\.alexa\.com\/graph\?.*&u=([^"]+)">/
+         r[:id] = $1
+       end
+       return r
+     end
+
+     def grab_certified_metrics(doc)
+       r = []
+       doc.gsub(/<strong class="metrics-data">([0-9,]+)<\/strong>/) { r << $1 }
+       r.map! { |x| x.gsub(/,/, '').to_i }
+       return r
+     end
+   end
+ end
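
Note on `grab_certified_metrics` above: it uses `gsub` with a block purely to collect matches and discards the substituted string. A minimal sketch of the same extraction with `String#scan`, which returns the captures directly. `extract_metrics` is a hypothetical name and `SAMPLE_HTML` is made-up markup that merely matches the pattern used above.

```ruby
# Made-up fragment shaped like the markup the grabber's regex expects.
SAMPLE_HTML = <<~HTML
  <strong class="metrics-data">1,477,599</strong>
  <strong class="metrics-data">44,974,428</strong>
HTML

# Return every "metrics-data" number as an Integer, commas removed.
def extract_metrics(html)
  html.scan(/<strong class="metrics-data">([0-9,]+)<\/strong>/)
      .flatten
      .map { |s| s.delete(',').to_i }
end

p extract_metrics(SAMPLE_HTML)
```

When fewer than four numbers are found, the multiple assignment in `run` leaves the remaining keys set to `nil`, which the executable then prints as `N/A`.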
@@ -0,0 +1,29 @@
+ require 'web_analytics_discovery/grabberutils'
+
+ module WebAnalyticsDiscovery
+   class GoogleAnalytics
+     include GrabberUtils
+
+     def run(url)
+       @page = download(url)
+       run_id(find_id)
+     end
+
+     def find_id
+       case @page
+       when /_gat\._getTracker\(["']([^"']+)["']\)/
+         $1
+       when /_gaq\.push\(\[['"]_setAccount['"], ['"]([^"']+)['"]\]\)/
+         $1
+       else
+         nil
+       end
+     end
+
+     def run_id(id)
+       return nil unless id
+       r = {:id => id}
+       return r
+     end
+   end
+ end
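
Note on `find_id` above: the tracker ID is recovered by trying each known embed pattern in turn against the page source. A minimal sketch of that lookup as a standalone function over an HTML string, using the two regular expressions from the hunk. `GA_PATTERNS` and `find_ga_id` are hypothetical names and no network access is involved.

```ruby
# The two embed styles recognised by the grabber above.
GA_PATTERNS = [
  /_gat\._getTracker\(["']([^"']+)["']\)/,
  /_gaq\.push\(\[['"]_setAccount['"], ['"]([^"']+)['"]\]\)/
].freeze

# Return the first captured tracker ID, or nil when no pattern matches.
def find_ga_id(html)
  GA_PATTERNS.each do |pattern|
    match = pattern.match(html)
    return match[1] if match
  end
  nil
end

p find_ga_id(%q{_gaq.push(['_setAccount', 'UA-23870775-1']);})
p find_ga_id('<html><body>no tracker here</body></html>')
```

The second pattern expects exactly one space after the comma, so embeds written with different spacing would not be detected by either version.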
@@ -0,0 +1,61 @@
+ # -*- coding: utf-8 -*-
+
+ require 'uri'
+ require 'web_analytics_discovery/grabberutils'
+
+ module WebAnalyticsDiscovery
+   class LiveInternet
+     include GrabberUtils
+
+     def run(url)
+       @url = url
+       @page = download(url)
+       run_id(find_id)
+     end
+
+     def find_id
+       case @page
+       when /new Image\(\)\.src = "\/\/counter\.yadro\.ru\/hit;([^?"]+)\?/
+         $1
+       else
+         # Use hostname as a last resort measure
+         URI.parse(@url).host
+       end
+     end
+
+     def run_id(host)
+       r = {:id => host}
+
+       doc = download("http://www.liveinternet.ru/stat/#{host}/index.csv")
+       r[:pv_day], r[:visits_day], r[:visitors_day] = grab_psv(doc, 4)
+
+       # Bail out early if no LiveInternet data available
+       return r unless r[:pv_day]
+
+       doc = download("http://www.liveinternet.ru/stat/#{host}/index.csv?period=week;total=yes")
+       r[:pv_week], r[:visits_week], r[:visitors_week] = grab_psv(doc, 2)
+
+       doc = download("http://www.liveinternet.ru/stat/#{host}/index.csv?period=month;total=yes")
+       r[:pv_mon], r[:visits_mon], r[:visitors_mon] = grab_psv(doc, 2)
+
+       return r
+     end
+
+     private
+     def grab_psv(doc, col)
+       r = [nil, nil, nil]
+       doc.split(/\n/).each { |l|
+         c = l.split(/;/)
+         case c[0]
+         when '"Просмотры"'
+           r[0] = c[col].to_i
+         when '"Сессии"'
+           r[1] = c[col].to_i
+         when '"Посетители"'
+           r[2] = c[col].to_i
+         end
+       }
+       return r
+     end
+   end
+ end
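
Note on `grab_psv` above: the export is semicolon-separated, one metric per row, with the Russian row label in the first cell and the wanted figure in column `col`. A minimal sketch of that row-label lookup returning a hash instead of a positional array. `PSV_LABELS`, `parse_psv` and the sample layout are assumptions for illustration; only the three labels come from the hunk.

```ruby
# Row labels used by the grabber above, mapped to short metric names.
PSV_LABELS = {
  '"Просмотры"'  => :pv,       # page views
  '"Сессии"'     => :visits,   # sessions
  '"Посетители"' => :visitors  # visitors
}.freeze

# Made-up export: the real file's other columns are not shown in the diff.
SAMPLE_PSV = <<~CSV
  "Просмотры";10;20;30;1787863
  "Сессии";1;2;3;745757
  "Посетители";4;5;6;597956
CSV

# Pick column `col` from every recognised row; unknown rows are ignored.
def parse_psv(doc, col)
  doc.each_line.each_with_object({}) do |line, out|
    cells = line.chomp.split(';')
    key = PSV_LABELS[cells[0]]
    out[key] = cells[col].to_i if key
  end
end

p parse_psv(SAMPLE_PSV, 4)
```

A hash makes a missing row visible as an absent key, whereas the positional array relies on `nil` placeholders staying in the right slot.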
@@ -0,0 +1,89 @@
+ # -*- coding: utf-8 -*-
+
+ require 'web_analytics_discovery/grabberutils'
+
+ module WebAnalyticsDiscovery
+   class MailRu
+     include GrabberUtils
+
+     def run(url)
+       @page = download(url)
+       run_id(find_id)
+     end
+
+     def find_id
+       case @page
+       when /<a [^>]*href="http:\/\/top\.mail\.ru\/jump\?from=(\d+)".*>\s*<img src="http:\/\/.*.top.mail.ru\/counter/m,
+            /<img src=['"]?http:\/\/top\.list\.ru\/counter\?id=(\d+)/,
+            /<img src=['"]?http:\/\/.*top\.mail\.ru\/counter\?js=na;id=(\d+)/,
+            /_tmr.push\(\{id:\s*['"](\d+)['"]/
+         $1.to_i
+       else
+         nil
+       end
+     end
+
+     def run_id(id)
+       return nil unless id
+       r = {:id => id}
+
+       #doc = download("http://top.mail.ru/visits?id=#{id}")
+
+       # Analyze daily report
+       doc = download("http://top.mail.ru/visits.csv?id=#{id}&period=0&date=&back=30&", 'windows-1251').split(/\n/)
+       return run_id_html_rating(r, id) if doc.empty?
+       doc = doc[4..-1]
+
+       sum_v = 0
+       sum_pv = 0
+       doc.each { |l|
+         #"Дата";"Посетители";"Новые посетители";"Ядро";"Хосты";"Просмотры";"Глубина"
+         date, v, new_v, core_v, hosts, pv, depth = l.split(/;/)
+         sum_v += v.to_i
+         sum_pv += pv.to_i
+       }
+
+       r[:visitors_day] = sum_v / doc.size
+       r[:pv_day] = sum_pv / doc.size
+
+       # Analyze weekly report
+       doc = download("http://top.mail.ru/visits.csv?id=#{id}&period=1&date=&back=98&", 'windows-1251').split(/\n/)
+       return r if doc.empty?
+       date, v, new_v, core_v, hosts, pv, depth = doc[4].split(/;/)
+       r[:visitors_week] = v.to_i
+       r[:pv_week] = pv.to_i
+
+       # Analyze monthly report
+       doc = download("http://top.mail.ru/visits.csv?id=#{id}&period=2&date=&back=395&", 'windows-1251').split(/\n/)
+       return r if doc.empty?
+       date, v, new_v, core_v, hosts, pv, depth = doc[4].split(/;/)
+       r[:visitors_mon] = v.to_i
+       r[:pv_mon] = pv.to_i
+
+       return r
+     end
+
+     # Parse semi-closed rating when normal full CSV export is not available
+     def run_id_html_rating(r, id)
+       doc = download("http://top.mail.ru/rating?id=#{id}", 'windows-1251')
+
+       today = []
+       doc.gsub(/<td class="l_col">Сегодня<\/td>.*?<td class="r_col"><b>([0-9,]+)<\/b>/m) { today << $1.gsub(/,/, '').to_i }
+
+       week = []
+       doc.gsub(/<td class="l_col">Неделя<\/td>.*?<td class="r_col"><b>([0-9,]+)<\/b>/m) { week << $1.gsub(/,/, '').to_i }
+
+       month = []
+       doc.gsub(/<td class="l_col">Месяц<\/td>.*?<td class="r_col"><b>([0-9,]+)<\/b>/m) { month << $1.gsub(/,/, '').to_i }
+
+       # Non-normal number of matches? That's weird, bail out
+       return r unless today.length == 3 and week.length == 3 and month.length == 3
+
+       r[:visitors_day], r[:pv_day], r[:ip_day] = today
+       r[:visitors_week], r[:pv_week], r[:ip_week] = week
+       r[:visitors_mon], r[:pv_mon], r[:ip_mon] = month
+
+       return r
+     end
+   end
+ end
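
Note on the daily report in `run_id` above: the per-day figures are integer averages over the data rows that remain after the first four header lines are dropped. If the export contained only those header lines, `doc.size` would be zero and the division would raise `ZeroDivisionError`. A minimal sketch of the averaging step with that case guarded. `daily_averages` is a hypothetical helper and the sample rows, including their date format, are made up.

```ruby
# Average visitors and page views over semicolon-separated data rows.
# Column order follows the header quoted in the hunk:
# date; visitors; new visitors; core; hosts; page views; depth
def daily_averages(rows)
  return {} if rows.empty? # nothing to average, avoid dividing by zero

  sum_v = 0
  sum_pv = 0
  rows.each do |line|
    _date, v, _new_v, _core_v, _hosts, pv, _depth = line.split(';')
    sum_v += v.to_i
    sum_pv += pv.to_i
  end

  # Integer division, as in the original code.
  { visitors_day: sum_v / rows.size, pv_day: sum_pv / rows.size }
end

rows = [
  '2014-01-01;100;10;50;90;300;3.0',
  '2014-01-02;200;20;60;180;500;2.5'
]
p daily_averages(rows)
```

Returning an empty hash for an empty report is one possible choice; the original instead falls back to the HTML rating page when the whole download is empty.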