movieDB 1.0.0 → 1.0.1
- checksums.yaml +4 -4
- data/README.md +47 -32
- data/images/sampbar.png +0 -0
- data/lib/movieDB.rb +1 -2
- data/lib/movieDB/data_analysis/statistics.rb +15 -11
- data/lib/movieDB/data_store.rb +2 -0
- data/lib/movieDB/relation/query_methods.rb +2 -1
- data/lib/movieDB/version.rb +1 -1
- data/movieDB.gemspec +1 -0
- data/spec/movieDB/data_analysis/statistics_spec.rb +2 -2
- metadata +17 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 15142fe8cad1d00faba9deef3d9b9968a0abca47
+data.tar.gz: 5f0faa23b644c40f564eb658902ba22d1b4cffdc
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: b754c6573573f84c74dbe11ff5190f5c6cf218f432370e61c2698d2d32ebd0155cd0f6a58893c241fb033793ac61822ec0fd732c8dfb223daf1e58b1118f3104
+data.tar.gz: 5cc88caeaef20a8a0357a817319354df47ab802f3a805e3479608c9e74e341e5a8e35bc43956bf97aee2ceda7eb0ec03ba98907452562800b970e9690eb3fa78
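The hunk above only swaps the recorded digests. As a rough illustration of what those values are, here is a minimal sketch that computes SHA1 and SHA512 hex digests with Ruby's standard `digest` library. The `archive_digests` helper name and the sample bytes are assumptions for illustration; a real check would read `metadata.gz` and `data.tar.gz` from the unpacked `.gem` archive.

```ruby
require 'digest'

# Hypothetical helper: returns the two digests that checksums.yaml records
# for one archive member, given that member's raw bytes.
def archive_digests(bytes)
  {
    'SHA1'   => Digest::SHA1.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

sample = archive_digests('')   # stand-in for File.binread('data.tar.gz')
puts sample['SHA1'].length     # 40 hex characters, like the SHA1 entries
puts sample['SHA512'].length   # 128 hex characters, like the SHA512 entries
```

Comparing the computed strings with the `+` lines of the hunk would confirm that a downloaded archive matches the published 1.0.1 release.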
data/README.md
CHANGED
@@ -1,6 +1,6 @@
 # MovieDB

-MovieDB is a multi-threaded ruby wrapper for performing advance statistical computation and high-level data analysis on Movie
+MovieDB is a multi-threaded ruby wrapper for performing advance statistical computation and high-level data analysis on Movie Data from IMDb.
 The objective and usage of this tool is to allow producers, directors, writers to make logical business decisions that will generate profitable ROI.

 ## Badges
@@ -15,8 +15,8 @@
 ## Technology
 * SciRuby is used for all statistical and scientific computations.
 * Redis is used to store all data.
-* IMDb and TMDb is the source for all film
-* BoxOfficeMojo is where we will be scraping future film
+* IMDb and TMDb is the source for all film.
+* BoxOfficeMojo is where we will be scraping future film.
 * Celluloid is used to build the fault-tolerant concurrent programs. Note, if you are using MRI or YARV,
 multi-threading won't work since these types of interpreters have Global Interpreter Lock (GIL).
 Fortunately, you can use JRuby or Rubinius, since they don’t have a GIL and support real parallel threading.
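The Technology notes above contrast interpreters with and without a GIL. As a rough, library-free illustration (plain `Thread` and `Queue` from Ruby itself, not Celluloid), the sketch below fans a list of ids out to a fixed number of worker threads. `process_in_threads` and the block passed to it are hypothetical stand-ins for real fetch work. On MRI the threads interleave rather than run in parallel for CPU-bound work, while on JRuby or Rubinius they can run on separate cores.

```ruby
require 'thread'

# Hypothetical helper: runs the given block for every item using a small,
# fixed number of worker threads and returns the results in input order.
def process_in_threads(items, workers: 2)
  queue = Queue.new
  items.each_with_index { |item, i| queue << [item, i] }
  results = Array.new(items.size)

  threads = Array.new(workers) do
    Thread.new do
      loop do
        pair = begin
          queue.pop(true)        # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          nil
        end
        break if pair.nil?

        item, i = pair
        results[i] = yield(item) # each slot is written by exactly one thread
      end
    end
  end
  threads.each(&:join)
  results
end

ids = %w[0369610 3079380 0478970]
puts process_in_threads(ids) { |id| "tt#{id}" }.inspect
```

Writing each result into its own array slot keeps the output order stable without a mutex, which is the main design choice in this sketch.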
@@ -26,20 +26,6 @@ ruby-2.2.2 or higher

 jruby-9.0.0.0

-## Category
-movieDB is broken down into 3 components namely:
-
-* Statistics
-* Visualizations (Work in progress)
-* DataMining (Work in progress)
-
-# Statistics
-
-Simple statistical analysis on numeric data.
-The corresponding computation is performed on
-both numeric and string vectors within the
-collected data.
-
 ## Installation

 Redis Installation
@@ -84,7 +70,7 @@ m = MovieDB::Movie.pool(size: 2)
 ```
 ## Step Process

-Fetching and analysing movie
+Fetching and analysing movie data using movieDB is a simple 2 step process.

 First, fetch the data from IMDb.

@@ -94,13 +80,33 @@ That's it! It is that simple.

 ## Part 1 - Fetch Data from IMDb

-There are
+There are 3 ways to find IMDb ids.
+
+* Search IMDb id via API
+
+* Search IMDb id via Website

-*
+* Generate random IMDb ids.

-
+### Search IMDb id via API

-
+You can read the [documentation](http://rubydoc.info/github/ariejan/imdb/master/frames) for IMDb API to see all that you can do with this gem.
+
+``` ruby
+i = Imdb::Search.new("Star Trek")
+
+i.movies.size #=> 97
+```
+This will return 97 objects related to 'Star Trek'
+
+To collect all the IMDb ids
+
+``` ruby
+ids = i.movies.collect(&:id).uniq
+
+#=> ["0796366", "0060028", "0079945" ...]
+```
+### Search IMDb id via Website

 To find IMDb id for specific movies, you must go to:

@@ -116,8 +122,9 @@ http://www.imdb.com/title/tt0369610/
 ```
 0369610 is the IMDb id.

-###
-
+### Generate random IMDb ids (multi-thread setup)
+
+You can fetch IMDb ids random. This approach will probably run you into some problems, see Disclaimer.

 ``` ruby
 r = Random.new
@@ -133,7 +140,7 @@ Note: IMDB has a rate limit of 40 requests every 10 seconds and are limited by I
 If you exceed the limit, you will receive a 429 HTTP status with a 'Retry-After' header.
 As soon your cool down period expires, you are free to continue making requests.

-Also, movieDB will throw a NameError if the randomly generated IMDb id
+Also, movieDB will throw a NameError if the randomly generated IMDb id is invalid.

 ### Get Movie Data

@@ -227,7 +234,7 @@ plot_summa 373 298 311

 When performing statistics on an object, movieDB by default processes all fields.

-
+However, you now have the option of filtering what fields you want processed using the following filters:

 * only
 * except
@@ -287,10 +294,18 @@ m.all_ids
 Gets the remaining time to live of a movie.

 ``` ruby
-m.ttl("0369610)
+m.ttl("0369610")
 # => 120
 ```

+* DELETE key
+deletes a single movie object stored in redis.
+
+``` ruby
+m.del("0369610")
+# => # => ["3079380"...]
+```
+
 * DELETE_ALL key
 deletes all movie objects stored in redis.

@@ -298,11 +313,6 @@ deletes all movie objects stored in redis.
 m.delete_all
 # => []
 ```
-# Visualizations
-(Work in progress)
-
-# Data mining
-(Work in progress)

 ## Contact me

@@ -312,5 +322,10 @@ You can also contact me at albertmck@gmail.com

 ## Disclaimer
 This software is provided “as is” and without any express or implied warranties, including, without limitation, the implied warranties of merchantibility and fitness for a particular purpose.
+Neither I, nor any developer who contributed to this project, accept any kind of liability for your use of this library.
+
+IMDB does not permit use of its data by third parties without their consent.
+
+Using this library for anything other than limited personal use may result in an IP ban to the IMDB website.

 ###### Copyright (c) 2013 - 2015 Albert McKeever, released under MIT license
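The README text in this diff says that exceeding the rate limit yields a 429 HTTP status with a 'Retry-After' header. Below is a minimal sketch of honoring that header; it assumes nothing about movieDB's internals. `with_retry_after`, the `Response` struct, and the injectable `sleeper` are hypothetical names used only for this illustration, and the block stands in for whatever actually performs the request.

```ruby
# Hypothetical response shape: an HTTP status code plus a header hash.
Response = Struct.new(:status, :headers)

# Calls the block until it returns a non-429 response or max_attempts is
# reached, waiting the number of seconds given in 'Retry-After' between tries.
# `sleeper` is injectable so the wait can be stubbed out in tests.
def with_retry_after(max_attempts: 3, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  loop do
    attempts += 1
    response = yield
    return response unless response.status == 429
    raise "still rate limited after #{attempts} attempts" if attempts >= max_attempts

    # Fall back to a 1 second wait when the header is missing (an assumption).
    wait = response.headers.fetch('Retry-After', '1').to_i
    sleeper.call(wait)
  end
end

# Usage with canned responses instead of real HTTP calls.
canned = [Response.new(429, { 'Retry-After' => '2' }), Response.new(200, {})]
waits = []
final = with_retry_after(sleeper: ->(s) { waits << s }) { canned.shift }
puts final.status   # 200
puts waits.inspect  # [2]
```

Capping the number of attempts keeps a persistently throttled client from looping forever, which matters given the IP-ban warning in the Disclaimer.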
data/images/sampbar.png
ADDED
Binary file
|
data/lib/movieDB.rb
CHANGED
data/lib/movieDB/data_analysis/statistics.rb
CHANGED
@@ -1,4 +1,5 @@
 require 'daru'
+require 'json'

 module MovieDB
 module DataAnalysis
@@ -11,9 +12,12 @@ module MovieDB

 stats = [:mean, :std, :sum, :count, :max, :min, :min, :product, :standardize, :describe, :covariance, :correlation, :worksheet]

+$collect_vals = {}
+
 stats.each do |method_name|
 define_method method_name do |**args|
-
+$collect_vals[:method] = method_name.to_s
+$collect_vals[:vals] = dataframes_stats(method_name, args)
 end
 end

@@ -22,16 +26,16 @@ module MovieDB
 def dataframes_stats(method, filters = {})
 raise ArgumentError, 'Please provide 2 or more IMDd ids.' if $movie_data.length <= 1

-
-
+$data_key = {}
+$index = []

 if filters.empty?
 $movie_data.each_with_index do |movie, _|
 value_count = []

 movie.each_pair do |k, v|
-
-
+$data_key[(movie['title'].sub(" ", "_").downcase)] = value_count << (MovieDB::DataAnalysis::Statistics.numeric_vals.any? { |word| word == k } ? v.to_i : v.split(' ').count)
+$index << k.to_sym
 end
 end
 else
@@ -45,8 +49,8 @@ module MovieDB
 mr = movie.reject { |k, _| k != filter.to_s }

 mr.each_pair do |k, v|
-
-
+$data_key[(movie['title'].sub(" ", "_").downcase)] = value_count << (MovieDB::DataAnalysis::Statistics.numeric_vals.any? { |word| word == k } ? v.to_i : v.join(' ').split(' ').count)
+$index << k.to_sym
 end
 end
 end
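The context line `mr = movie.reject { |k, _| k != filter.to_s }` in the hunk above keeps a single field per filter. The sketch below shows the same idea for a list of fields with `only` and `except` options. `filter_fields` and the sample movie hash are hypothetical and illustrate only the hash operations, not movieDB's actual filter handling.

```ruby
# Hypothetical helper: narrows a movie hash (string keys) to the requested
# fields. `only` keeps the listed fields; `except` drops them.
def filter_fields(movie, only: nil, except: nil)
  result = movie
  result = result.select { |k, _| only.map(&:to_s).include?(k) } if only
  result = result.reject { |k, _| except.map(&:to_s).include?(k) } if except
  result
end

movie = { 'title' => 'Spy', 'year' => '2015', 'cast' => 'a b c' }
puts filter_fields(movie, only: [:title, :year]).inspect
puts filter_fields(movie, except: [:cast]).inspect
```

Both calls leave the `title` and `year` entries; symbols are converted with `to_s` because the movie hashes in the diff are keyed by strings.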
@@ -58,8 +62,8 @@ module MovieDB
 value_count = []

 mr.each_pair do |k, v|
-
-
+$data_key[(movie['title'].sub(" ", "_").downcase)] = value_count << (MovieDB::DataAnalysis::Statistics.numeric_vals.any? { |word| word == k } ? v.to_i : v.join(' ').split(' ').count)
+$index << k.to_sym
 end
 end
 end
@@ -68,9 +72,9 @@ module MovieDB
 end
 end

-index =
+index = $index.uniq

-movie_numeric_vector = Hash[
+movie_numeric_vector = Hash[$data_key.map { |k, v| [k.to_s.gsub('-', '_').to_sym, v] }]
 compute_stats(method, movie_numeric_vector, index )
 end

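The added lines in these hunks derive a per-movie key from the title and turn every field into a number: numeric fields go through `to_i`, other fields are reduced to a word count. The sketch below replays both steps on literal values. `NUMERIC_FIELDS`, `field_value`, and `data_key` are hypothetical stand-ins for `MovieDB::DataAnalysis::Statistics.numeric_vals` and the inline expressions in the diff, and the sample titles and fields are made up for illustration.

```ruby
NUMERIC_FIELDS = %w[year budget].freeze # assumed list, for illustration only

# Mirrors the ternary in the diff: numeric fields become integers, everything
# else becomes a count of whitespace-separated words. Array(...) lets the same
# helper accept a plain string or an array of strings.
def field_value(key, value)
  NUMERIC_FIELDS.include?(key) ? value.to_i : Array(value).join(' ').split(' ').count
end

# Key derivation as written in the diff: sub touches the first space only.
def data_key(title)
  title.sub(' ', '_').downcase
end

puts data_key('Jurassic World')                          # jurassic_world
puts data_key('Star Trek Beyond')                        # star_trek beyond
puts data_key('Ant-Man').gsub('-', '_').to_sym.inspect   # :ant_man
puts field_value('year', '2015')                         # 2015
puts field_value('plot', 'a park full of dinosaurs')     # 5
```

The second `puts` shows the point worth noticing: `String#sub` replaces only the first space, so a title with three or more words keeps a space in its key, whereas `gsub` would replace all of them.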
data/lib/movieDB/data_store.rb
CHANGED
data/lib/movieDB/relation/query_methods.rb
CHANGED
@@ -48,7 +48,7 @@ module MovieDB
 # m.fetch("0369610", "3079380", "0478970")
 #
 # m.hgetall("0369610")
-[:all, :hkeys, :hvals, :flushall, :ttl].each do |method_name|
+[:all, :hkeys, :hvals, :flushall, :ttl, :del].each do |method_name|
 define_method method_name do |arg|
 MovieDB::DataStore.get_data(method_name, arg)
 end
@@ -56,6 +56,7 @@ module MovieDB

 alias hgetall all

+# No argument is required.
 [:scan, :flushall].each do |method_name|
 define_method method_name do
 mn = MovieDB::DataStore.get_data(method_name)
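Both hunks above rely on `define_method` to generate one thin wrapper per Redis-style command, so adding `:del` to the list is enough to expose `m.del`. A self-contained sketch of that pattern follows. `FakeStore` and `Client` are hypothetical classes standing in for `MovieDB::DataStore.get_data` and for whatever object receives the generated methods.

```ruby
# Hypothetical backend: echoes each command instead of talking to Redis.
class FakeStore
  def self.get_data(command, arg = nil)
    [command, arg]
  end
end

class Client
  # One generated method per command that takes a single argument.
  [:hkeys, :hvals, :ttl, :del].each do |method_name|
    define_method method_name do |arg|
      FakeStore.get_data(method_name, arg)
    end
  end

  # Commands that take no argument.
  [:scan, :flushall].each do |method_name|
    define_method method_name do
      FakeStore.get_data(method_name)
    end
  end
end

client = Client.new
puts client.del('0369610').inspect  # [:del, "0369610"]
puts client.scan.inspect            # [:scan, nil]
```

In the diff, `:flushall` appears in both lists. If both loops run in the same class body, the later zero-argument definition replaces the earlier one-argument definition.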
data/lib/movieDB/version.rb
CHANGED
data/movieDB.gemspec
CHANGED
data/spec/movieDB/data_analysis/statistics_spec.rb
CHANGED
@@ -9,8 +9,8 @@ describe MovieDB do

 context '#mean' do
 it 'should calculate mean of all values' do
-
-
+expect(m.mean.round(2)).to eq(Daru::Vector.new([ 2891127.4, 36972648.98, 1445963.96],
+index: ['ant-man', :jurassic_world, :spy]))
 end
 end

metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: movieDB
 version: !ruby/object:Gem::Version
-version: 1.0.
+version: 1.0.1
 platform: ruby
 authors:
 - Albert McKeever
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-08-
+date: 2015-08-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: bundler
@@ -122,6 +122,20 @@ dependencies:
 - - ">="
 - !ruby/object:Gem::Version
 version: '0'
+- !ruby/object:Gem::Dependency
+name: nyaplot
+requirement: !ruby/object:Gem::Requirement
+requirements:
+- - ">="
+- !ruby/object:Gem::Version
+version: '0'
+type: :runtime
+prerelease: false
+version_requirements: !ruby/object:Gem::Requirement
+requirements:
+- - ">="
+- !ruby/object:Gem::Version
+version: '0'
 description: Perform Data Analysis on IMDB Movies
 email:
 - kotn_ep1@hotmail.com
@@ -139,6 +153,7 @@ files:
 - LICENSE.txt
 - README.md
 - Rakefile
+- images/sampbar.png
 - lib/movieDB.rb
 - lib/movieDB/.DS_Store
 - lib/movieDB/base.rb