movieDB 1.0.0 → 1.0.1
- checksums.yaml +4 -4
- data/README.md +47 -32
- data/images/sampbar.png +0 -0
- data/lib/movieDB.rb +1 -2
- data/lib/movieDB/data_analysis/statistics.rb +15 -11
- data/lib/movieDB/data_store.rb +2 -0
- data/lib/movieDB/relation/query_methods.rb +2 -1
- data/lib/movieDB/version.rb +1 -1
- data/movieDB.gemspec +1 -0
- data/spec/movieDB/data_analysis/statistics_spec.rb +2 -2
- metadata +17 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 15142fe8cad1d00faba9deef3d9b9968a0abca47
+  data.tar.gz: 5f0faa23b644c40f564eb658902ba22d1b4cffdc
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b754c6573573f84c74dbe11ff5190f5c6cf218f432370e61c2698d2d32ebd0155cd0f6a58893c241fb033793ac61822ec0fd732c8dfb223daf1e58b1118f3104
+  data.tar.gz: 5cc88caeaef20a8a0357a817319354df47ab802f3a805e3479608c9e74e341e5a8e35bc43956bf97aee2ceda7eb0ec03ba98907452562800b970e9690eb3fa78
data/README.md
CHANGED
@@ -1,6 +1,6 @@
 # MovieDB

-MovieDB is a multi-threaded ruby wrapper for performing advance statistical computation and high-level data analysis on Movie
+MovieDB is a multi-threaded ruby wrapper for performing advance statistical computation and high-level data analysis on Movie Data from IMDb.
 The objective and usage of this tool is to allow producers, directors, writers to make logical business decisions that will generate profitable ROI.

 ## Badges
@@ -15,8 +15,8 @@
 ## Technology
 * SciRuby is used for all statistical and scientific computations.
 * Redis is used to store all data.
-* IMDb and TMDb is the source for all film
-* BoxOfficeMojo is where we will be scraping future film
+* IMDb and TMDb is the source for all film.
+* BoxOfficeMojo is where we will be scraping future film.
 * Celluloid is used to build the fault-tolerant concurrent programs. Note, if you are using MRI or YARV,
 multi-threading won't work since these types of interpreters have Global Interpreter Lock (GIL).
 Fortunately, you can use JRuby or Rubinius, since they don’t have a GIL and support real parallel threading.
@@ -26,20 +26,6 @@ ruby-2.2.2 or higher

 jruby-9.0.0.0

-## Category
-movieDB is broken down into 3 components namely:
-
-* Statistics
-* Visualizations (Work in progress)
-* DataMining (Work in progress)
-
-# Statistics
-
-Simple statistical analysis on numeric data.
-The corresponding computation is performed on
-both numeric and string vectors within the
-collected data.
-
 ## Installation

 Redis Installation
@@ -84,7 +70,7 @@ m = MovieDB::Movie.pool(size: 2)
 ```
 ## Step Process

-Fetching and analysing movie
+Fetching and analysing movie data using movieDB is a simple 2 step process.

 First, fetch the data from IMDb.

@@ -94,13 +80,33 @@ That's it! It is that simple.

 ## Part 1 - Fetch Data from IMDb

-There are
+There are 3 ways to find IMDb ids.
+
+* Search IMDb id via API
+
+* Search IMDb id via Website

-*
+* Generate random IMDb ids.

-
+### Search IMDb id via API

-
+You can read the [documentation](http://rubydoc.info/github/ariejan/imdb/master/frames) for IMDb API to see all that you can do with this gem.
+
+``` ruby
+i = Imdb::Search.new("Star Trek")
+
+i.movies.size #=> 97
+```
+This will return 97 objects related to 'Star Trek'
+
+To collect all the IMDb ids
+
+``` ruby
+ids = i.movies.collect(&:id).uniq
+
+#=> ["0796366", "0060028", "0079945" ...]
+```
+### Search IMDb id via Website

 To find IMDb id for specific movies, you must go to:

@@ -116,8 +122,9 @@ http://www.imdb.com/title/tt0369610/
 ```
 0369610 is the IMDb id.

-###
-
+### Generate random IMDb ids (multi-thread setup)
+
+You can fetch IMDb ids random. This approach will probably run you into some problems, see Disclaimer.

 ``` ruby
 r = Random.new
@@ -133,7 +140,7 @@ Note: IMDB has a rate limit of 40 requests every 10 seconds and are limited by I
 If you exceed the limit, you will receive a 429 HTTP status with a 'Retry-After' header.
 As soon your cool down period expires, you are free to continue making requests.

-Also, movieDB will throw a NameError if the randomly generated IMDb id
+Also, movieDB will throw a NameError if the randomly generated IMDb id is invalid.

 ### Get Movie Data

@@ -227,7 +234,7 @@ plot_summa 373 298 311

 When performing statistics on an object, movieDB by default processes all fields.

-
+However, you now have the option of filtering what fields you want processed using the following filters:

 * only
 * except
@@ -287,10 +294,18 @@ m.all_ids
 Gets the remaining time to live of a movie.

 ``` ruby
-m.ttl("0369610)
+m.ttl("0369610")
 # => 120
 ```

+* DELETE key
+deletes a single movie object stored in redis.
+
+``` ruby
+m.del("0369610")
+# => # => ["3079380"...]
+```
+
 * DELETE_ALL key
 deletes all movie objects stored in redis.

@@ -298,11 +313,6 @@ deletes all movie objects stored in redis.
 m.delete_all
 # => []
 ```
-# Visualizations
-(Work in progress)
-
-# Data mining
-(Work in progress)

 ## Contact me

@@ -312,5 +322,10 @@ You can also contact me at albertmck@gmail.com

 ## Disclaimer
 This software is provided “as is” and without any express or implied warranties, including, without limitation, the implied warranties of merchantibility and fitness for a particular purpose.
+Neither I, nor any developer who contributed to this project, accept any kind of liability for your use of this library.
+
+IMDB does not permit use of its data by third parties without their consent.
+
+Using this library for anything other than limited personal use may result in an IP ban to the IMDB website.

 ###### Copyright (c) 2013 - 2015 Albert McKeever, released under MIT license
|
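Reviewer note on the README diff above: the rate-limit text (40 requests every 10 seconds, a 429 status with a 'Retry-After' header) implies that callers fetching many ids should wait and retry. The sketch below is one way to do that and is not part of movieDB. `with_retry_after`, `Response`, and the `sleeper` parameter are hypothetical names, and the block is assumed to return an object exposing `status` and `headers`. It also assumes 'Retry-After' carries a number of seconds rather than an HTTP date.

```ruby
# Hypothetical response shape; movieDB's real HTTP layer is not shown in this diff.
Response = Struct.new(:status, :headers, :body)

# Calls the block until it returns a non-429 response or the attempts run out.
# Between attempts it waits for the number of seconds given by 'Retry-After'.
def with_retry_after(max_attempts: 3, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  loop do
    attempts += 1
    response = yield
    return response unless response.status == 429
    raise "rate limited after #{attempts} attempts" if attempts >= max_attempts

    # Assumption: fall back to the 10 second window from the README
    # when the header is missing.
    delay = Integer(response.headers.fetch('Retry-After', '10'))
    sleeper.call(delay)
  end
end

# Usage with a stubbed sequence of responses, so no network access is needed.
responses = [
  Response.new(429, { 'Retry-After' => '2' }, ''),
  Response.new(200, {}, 'ok')
]
waits = []
result = with_retry_after(sleeper: ->(s) { waits << s }) { responses.shift }
puts result.body   # prints "ok"
puts waits.inspect # prints [2]
```

Passing the sleeper in as a lambda keeps the example testable without actually sleeping; real code could rely on the default.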
data/images/sampbar.png
ADDED
Binary file
data/lib/movieDB.rb
CHANGED
data/lib/movieDB/data_analysis/statistics.rb
CHANGED
@@ -1,4 +1,5 @@
 require 'daru'
+require 'json'

 module MovieDB
   module DataAnalysis
@@ -11,9 +12,12 @@ module MovieDB

       stats = [:mean, :std, :sum, :count, :max, :min, :min, :product, :standardize, :describe, :covariance, :correlation, :worksheet]

+      $collect_vals = {}
+
       stats.each do |method_name|
         define_method method_name do |**args|
-
+          $collect_vals[:method] = method_name.to_s
+          $collect_vals[:vals] = dataframes_stats(method_name, args)
         end
       end

@@ -22,16 +26,16 @@ module MovieDB
       def dataframes_stats(method, filters = {})
         raise ArgumentError, 'Please provide 2 or more IMDd ids.' if $movie_data.length <= 1

-
-
+        $data_key = {}
+        $index = []

         if filters.empty?
           $movie_data.each_with_index do |movie, _|
             value_count = []

             movie.each_pair do |k, v|
-
-
+              $data_key[(movie['title'].sub(" ", "_").downcase)] = value_count << (MovieDB::DataAnalysis::Statistics.numeric_vals.any? { |word| word == k } ? v.to_i : v.split(' ').count)
+              $index << k.to_sym
             end
           end
         else
@@ -45,8 +49,8 @@ module MovieDB
               mr = movie.reject { |k, _| k != filter.to_s }

               mr.each_pair do |k, v|
-
-
+                $data_key[(movie['title'].sub(" ", "_").downcase)] = value_count << (MovieDB::DataAnalysis::Statistics.numeric_vals.any? { |word| word == k } ? v.to_i : v.join(' ').split(' ').count)
+                $index << k.to_sym
               end
             end
           end
@@ -58,8 +62,8 @@ module MovieDB
               value_count = []

               mr.each_pair do |k, v|
-
-
+                $data_key[(movie['title'].sub(" ", "_").downcase)] = value_count << (MovieDB::DataAnalysis::Statistics.numeric_vals.any? { |word| word == k } ? v.to_i : v.join(' ').split(' ').count)
+                $index << k.to_sym
               end
             end
           end
@@ -68,9 +72,9 @@ module MovieDB
           end
         end

-        index =
+        index = $index.uniq

-        movie_numeric_vector = Hash[
+        movie_numeric_vector = Hash[$data_key.map { |k, v| [k.to_s.gsub('-', '_').to_sym, v] }]
         compute_stats(method, movie_numeric_vector, index )
       end

|
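Reviewer note on the statistics hunks above: the added lines derive each vector name from the movie title with `sub(" ", "_").downcase`, later apply `gsub('-', '_')` and `to_sym`, and reduce each field either to an integer or to a word count. The sketch below replays that chain in isolation. `vector_key`, `field_value`, and `NUMERIC_KEYS` are hypothetical names introduced for illustration; `NUMERIC_KEYS` stands in for `MovieDB::DataAnalysis::Statistics.numeric_vals`, whose contents are not shown in this diff.

```ruby
# Hypothetical helper mirroring the chain in the added lines:
# title.sub(" ", "_").downcase, then gsub('-', '_').to_sym.
def vector_key(title)
  title.sub(' ', '_').downcase.gsub('-', '_').to_sym
end

# Assumed stand-in for MovieDB::DataAnalysis::Statistics.numeric_vals.
NUMERIC_KEYS = %w[budget revenue].freeze

# Numeric fields are converted with to_i; any other field becomes a word count.
# Array(value) lets the same code accept a single string or a list of strings.
def field_value(key, value)
  NUMERIC_KEYS.include?(key) ? value.to_i : Array(value).join(' ').split(' ').count
end

p vector_key('Jurassic World')    # => :jurassic_world
p vector_key('Ant-Man')           # => :ant_man
p vector_key('Star Trek Beyond')  # => :"star_trek beyond"
p field_value('budget', '150000') # => 150000
p field_value('plot', 'A park full of dinosaurs') # => 5
```

`String#sub` replaces only the first match, so a title with three or more words keeps its later spaces in the key, as the third call shows. Switching to `gsub` would replace every space, if that is the intent.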
data/lib/movieDB/data_store.rb
CHANGED
data/lib/movieDB/relation/query_methods.rb
CHANGED
@@ -48,7 +48,7 @@ module MovieDB
       #   m.fetch("0369610", "3079380", "0478970")
       #
       #   m.hgetall("0369610")
-      [:all, :hkeys, :hvals, :flushall, :ttl].each do |method_name|
+      [:all, :hkeys, :hvals, :flushall, :ttl, :del].each do |method_name|
         define_method method_name do |arg|
           MovieDB::DataStore.get_data(method_name, arg)
         end
@@ -56,6 +56,7 @@ module MovieDB

       alias hgetall all

+      # No argument is required.
       [:scan, :flushall].each do |method_name|
         define_method method_name do
           mn = MovieDB::DataStore.get_data(method_name)
|
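Reviewer note on the hunks above: adding `:del` to the symbol list is enough to expose a new `m.del(id)` method, because each name is turned into a method with `define_method` and forwarded to `MovieDB::DataStore.get_data`. The sketch below shows the same pattern in a self-contained form. `FakeStore` and `MovieQuery` are hypothetical stand-ins, the store is an in-memory hash rather than Redis, and the return values are assumptions that may differ from what movieDB returns.

```ruby
# Hypothetical in-memory stand-in for MovieDB::DataStore; the real one talks to Redis.
module FakeStore
  DATA = { '0369610' => { 'title' => 'Jurassic World' } }

  def self.get_data(method_name, key)
    case method_name
    when :all then DATA.fetch(key, {})
    # Redis DEL reports how many keys were removed; movieDB's wrapper may differ.
    when :del then DATA.delete(key) ? 1 : 0
    else raise ArgumentError, "unsupported command: #{method_name}"
    end
  end
end

class MovieQuery
  # One method per command name, each forwarding its single argument to the store.
  [:all, :del].each do |method_name|
    define_method method_name do |arg|
      FakeStore.get_data(method_name, arg)
    end
  end

  alias hgetall all
end

m = MovieQuery.new
puts m.hgetall('0369610')['title'] # prints "Jurassic World"
puts m.del('0369610')              # prints 1
puts m.del('0369610')              # prints 0
```

The `alias` has to come after the loop, since the aliased method must already exist when `alias` is evaluated.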
data/lib/movieDB/version.rb
CHANGED
data/movieDB.gemspec
CHANGED
data/spec/movieDB/data_analysis/statistics_spec.rb
CHANGED
@@ -9,8 +9,8 @@ describe MovieDB do

   context '#mean' do
     it 'should calculate mean of all values' do
-
-
+      expect(m.mean.round(2)).to eq(Daru::Vector.new([ 2891127.4, 36972648.98, 1445963.96],
+        index: ['ant-man', :jurassic_world, :spy]))
     end
   end

metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: movieDB
 version: !ruby/object:Gem::Version
-  version: 1.0.
+  version: 1.0.1
 platform: ruby
 authors:
 - Albert McKeever
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-08-
+date: 2015-08-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -122,6 +122,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: nyaplot
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description: Perform Data Analysis on IMDB Movies
 email:
 - kotn_ep1@hotmail.com
@@ -139,6 +153,7 @@ files:
 - LICENSE.txt
 - README.md
 - Rakefile
+- images/sampbar.png
 - lib/movieDB.rb
 - lib/movieDB/.DS_Store
 - lib/movieDB/base.rb