movieDB 0.3.4 → 1.0.0
- checksums.yaml +4 -4
- data/.DS_Store +0 -0
- data/.coveralls.yml +2 -0
- data/.gitignore +3 -1
- data/.rspec +2 -0
- data/.travis.yml +5 -0
- data/Gemfile +8 -3
- data/README.md +250 -103
- data/Rakefile +3 -0
- data/lib/movieDB.rb +22 -141
- data/lib/movieDB/base.rb +3 -10
- data/lib/movieDB/data_analysis/statistics.rb +85 -0
- data/lib/movieDB/data_store.rb +83 -0
- data/lib/movieDB/relation/query_methods.rb +139 -0
- data/lib/movieDB/secret.rb +2 -6
- data/lib/movieDB/support/reporting.rb +19 -0
- data/lib/movieDB/version.rb +1 -1
- data/movieDB.gemspec +6 -6
- data/spec/movieDB/data_analysis/statistics_spec.rb +105 -0
- data/spec/movieDB/data_store_spec.rb +31 -0
- data/spec/movieDB/relation/query_methods_spec.rb +71 -0
- data/spec/movieDB/support/reporting_spec.rb +12 -0
- data/spec/movieDB_spec.rb +24 -0
- data/spec/spec_helper.rb +29 -0
- metadata +33 -23
- data/lib/movieDB/data_analysis.rb +0 -263
- data/lib/movieDB/data_export.rb +0 -96
- data/lib/movieDB/data_process.rb +0 -26
- data/lib/movieDB/movie_error.rb +0 -20
- data/lib/movieDB/status_checker.rb +0 -48
- data/test/unit/test_movie_db.rb +0 -97
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 18eb3efc3ad17c7cb5b6134a951402964af0b5da
+  data.tar.gz: 7eeb628cab2588293f92aa458eeabb903cbfb6ec
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 09c390931150038ca9b18d52ffdd084a805b344c4272876c9ff6880ee2c4ae3da5623be445cc9b3ad6e7f192e60594fb62a682ebad4168f68d08f8e07b3629b5
+  data.tar.gz: 4b559dcc3e841b450187b87f1060039033109bdce2cd4791ae5c9628bea929db875acff15748e807f9e6444b2eeddcd03ff37b0b2b62d4b324bffdf30468d068
data/.DS_Store
ADDED
Binary file
data/.coveralls.yml
ADDED
data/.gitignore
CHANGED
data/.rspec
ADDED
data/.travis.yml
CHANGED
data/Gemfile
CHANGED
@@ -2,7 +2,12 @@ source 'https://rubygems.org'
 
 gemspec
 
-
-
-
+gem 'coveralls', require: false
+gem 'rspec'
+
 
+# Code coverage analysis tool for Ruby.
+group :test do
+  gem 'simplecov', :require => false
+  gem 'pullreview-coverage', require: false
+end
data/README.md
CHANGED
@@ -1,164 +1,308 @@
-
+# MovieDB
 
-MovieDB is a ruby wrapper for
-The objective and usage of this tool is to
-
-The fetched data is stored in memory using Redis and has an expiration time of 1800 seconds for all cached objects.
+MovieDB is a multi-threaded Ruby wrapper for performing advanced statistical computation and high-level data analysis on movie or TV data from IMDb.
+The objective of this tool is to allow producers, directors, and writers to make logical business decisions that will generate profitable ROI.
 
+## Badges
+- [![PullReview stats](https://www.pullreview.com/github/keeperofthenecklace/movieDB/badges/master.svg?)](https://www.pullreview.com/github/keeperofthenecklace/movieDB/reviews/master)
+- [![Dependency Status](https://gemnasium.com/keeperofthenecklace/movieDB.svg)](https://gemnasium.com/keeperofthenecklace/movieDB)
 - [![Join the chat at https://gitter.im/keeperofthenecklace/movieDB](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/keeperofthenecklace/movieDB?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
-- [![Coverage Status](https://coveralls.io/repos/keeperofthenecklace/movieDB/badge.svg)](https://coveralls.io/
+- [![Coverage Status](https://coveralls.io/repos/keeperofthenecklace/movieDB/badge.svg?branch=master&service=github)](https://coveralls.io/github/keeperofthenecklace/movieDB?branch=master)
 - [![Code Climate](https://codeclimate.com/github/keeperofthenecklace/movieDB.png)](https://codeclimate.com/github/keeperofthenecklace/movieDB)
 - [![Gem Version](https://badge.fury.io/rb/movieDB.png)](http://badge.fury.io/rb/movieDB)
 - [![Build Status](https://secure.travis-ci.org/keeperofthenecklace/movieDB.png?branch=master)](http://travis-ci.org/keeperofthenecklace/movieDB)
 
-##
+## Technology
+* SciRuby is used for all statistical and scientific computations.
+* Redis is used to store all data.
+* IMDb and TMDb are the sources for all film / TV data.
+* BoxOfficeMojo is where we will be scraping future film / TV data.
+* Celluloid is used to build fault-tolerant concurrent programs. Note, if you are using MRI or YARV,
+threads will not run in parallel, since these interpreters have a Global Interpreter Lock (GIL).
+Fortunately, you can use JRuby or Rubinius, since they don’t have a GIL and support real parallel threading.
+
+## Requirements
+ruby-2.2.2 or higher
+
+jruby-9.0.0.0
+
+## Category
+movieDB is broken down into 3 components, namely:
+
+* Statistics
+* Visualizations (Work in progress)
+* DataMining (Work in progress)
 
-
-
-
-
+# Statistics
+
+Simple statistical analysis on numeric data.
+The corresponding computation is performed on
+both numeric and string vectors within the
+collected data.
 
 ## Installation
 
-
+Redis Installation
 
-
+This tutorial doesn't cover Redis installation.
+You will find that information at: http://redis.io/topics/quickstart
 
-
+movieDB is available through [Rubygems](http://rubygems.org/gems/movieDB) and can be installed via Gemfile.
 
-
+``` ruby
+gem 'movieDB'
+```
 
 And then execute:
 
-
+``` bash
+$ bundle install
+```
 
 Or install it yourself as:
 
-
+``` bash
+gem install movieDB
+```
+
+## Console - loading the libraries
 
-
+``` bash
+$ irb
+```
 
-
+## Require the gem
 
-
+```ruby
+require 'movieDB'
+```
 
-##
+## Initialize MovieDB (multi-thread setup)
 
-
+``` ruby
+m = MovieDB::Movie.pool(size: 2)
+```
+## Step Process
 
-
+Fetching and analysing movie / TV data using movieDB is a simple 2-step process.
 
-
+First, fetch the data from IMDb.
 
-
+Next, run your choice of statistic.
 
-
+That's it! It is that simple.
 
-
+## Part 1 - Fetch Data from IMDb
 
-
+There are 2 ways to find IMDb ids.
 
-
+* Finding specific IMDb ids
 
-
+* Finding random IMDb ids.
 
-
+### Fetching specific IMDb ids
 
-
+To find the IMDb id for a specific movie, go to:
 
-
+```bash
+http://www.imdb.com
+```
+Search for your movie of choice. Once you do, IMDb redirects you to the movie's page.
 
-
+The URL for the redirect page includes the IMDb id.
 
-
+``` ruby
+http://www.imdb.com/title/tt0369610/
+```
+0369610 is the IMDb id.
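The URL-to-id step described in the README text above can be sketched in plain Ruby. This is only an illustration: `imdb_id_from_url` is a hypothetical helper, not part of movieDB, and it assumes title URLs follow the `/title/tt<digits>/` pattern shown in the example.

```ruby
# Hypothetical helper (not part of movieDB): extract the numeric IMDb id
# from a title URL such as http://www.imdb.com/title/tt0369610/.
def imdb_id_from_url(url)
  match = url.match(%r{/title/tt(\d+)})
  # Return the digits after "tt", or nil when the URL has no title id.
  match && match[1]
end

puts imdb_id_from_url('http://www.imdb.com/title/tt0369610/') # prints 0369610
```

The id is returned as a string so that leading zeros are preserved.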
 
-
+### Fetching random IMDb ids (multi-thread setup)
+You can fetch IMDb ids at random.
 
-
+``` ruby
+r = Random.new
 
-
+39.times do |i|
+  m.async.fetch(sprintf '%07d', r.rand(300000))
+  sleep(4)
+end
 
-
+sleep(10)
+```
+Note: IMDb has a rate limit of 40 requests every 10 seconds; requests are limited by IP address, not API key.
+If you exceed the limit, you will receive a 429 HTTP status with a 'Retry-After' header.
+As soon as your cool-down period expires, you are free to continue making requests.
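As a rough check on the numbers in the README note above, the sketch below works out the shortest pause between requests that stays inside a limit of 40 requests per 10 seconds, and shows the zero-padding used for random ids. The constants and the helpers `min_interval` and `random_imdb_id` are assumptions made for illustration; they are not movieDB APIs.

```ruby
# Assumed limit, taken from the README note: 40 requests per 10 seconds.
MAX_REQUESTS = 40
WINDOW_SECONDS = 10.0

# Smallest pause (in seconds) between requests that stays within the limit.
def min_interval(max_requests = MAX_REQUESTS, window = WINDOW_SECONDS)
  window / max_requests
end

# Hypothetical helper: zero-pad a random number to the 7-digit id format
# used in the README example.
def random_imdb_id(rng = Random.new)
  format('%07d', rng.rand(300_000))
end

puts min_interval          # prints 0.25
puts random_imdb_id.length # prints 7
```

The README loop sleeps 4 seconds per request, which is well above this 0.25-second floor.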
 
-
+Also, movieDB will throw a NameError if the randomly generated IMDb id is invalid.
 
-
+### Get Movie Data
 
-
+``` ruby
+m.async.fetch("0369610", "3079380", "0478970")
+```
+Calling m.async instructs Celluloid to call the given method asynchronously.
+This means that rather than the caller waiting for a response from querying both IMDb and TMDb, the caller sends a
+message to the concurrent object asking for the given method to be invoked, and then the caller proceeds without waiting for a response.
+The concurrent object receiving the message will then process the method call in the background.
 
-
+Asynchronous calls will never raise an exception in the caller, even if an exception occurs when the receiver is processing it.
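The fire-and-forget behaviour described above can be approximated with plain threads. The sketch below is an analogy only: it does not use Celluloid, and `BackgroundFetcher` with its methods is hypothetical. It shows a call that returns immediately while the work, including any exception, stays on the background thread.

```ruby
# Hypothetical analogy for an asynchronous call, using only the Ruby
# standard library. This is not Celluloid and not movieDB's implementation.
class BackgroundFetcher
  attr_reader :results, :errors

  def initialize
    @results = Queue.new # ids processed successfully
    @errors  = Queue.new # exceptions captured in the background
    @threads = []
  end

  # Returns nil right away; the work happens on another thread.
  def async_fetch(id)
    @threads << Thread.new do
      begin
        raise ArgumentError, "invalid id: #{id}" unless id =~ /\A\d{7}\z/
        @results << id
      rescue StandardError => e
        # The caller never sees this exception; it is only recorded.
        @errors << e
      end
    end
    nil
  end

  # Block until all background work has finished.
  def wait
    @threads.each(&:join)
  end
end

fetcher = BackgroundFetcher.new
fetcher.async_fetch('0369610')
fetcher.async_fetch('not-an-id') # fails in the background, not here
fetcher.wait
puts fetcher.results.size # prints 1
puts fetcher.errors.size  # prints 1
```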
 
-
+### Redis - caching objects
 
-
-> Lack_Of_Fit_Sum_Of_Squares
-> Least_Squares_Support_Vector_Machine
-> Mean_Squared_Error
-> Moving_Least_Sqares
-> Non_Linear_Iterative_Partial_Least_Squares
-> Non_Linear_Least_Squares
-> Ordinary_Least_Squares
-> Partial_Least_Squares_Regression
-> Partition_Of_Sums_Of_Squares
-> Proofs_Involving_Ordinary_Least_Squares
-> Residual_Sum_Of_Squares
-> Total_Least_Squares
-> Total_Sum_Of_Squares
+By default, any movie fetched from IMDb is stored in redis and has an expiration time of 1800 seconds (30 minutes).
 
-
+But you can change this expiration time.
 
-
-
-
-
-> Multivariate_Kernel_Density_Estimation
-> Variable_Kernel_Density_Estimation
+```ruby
+m.async.fetch("0369610", "3079380", expire: 86400)
+```
+Here, I set the expiration time to 86400 seconds which is equivalent to 24 hours.
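To make the expiration behaviour concrete, here is a minimal in-memory sketch of a cache whose entries expire. It is not Redis and not how movieDB stores data; `ExpiringCache`, its methods, and the injectable clock are all hypothetical and exist only to illustrate the default of 1800 seconds and the `expire:` override mentioned above.

```ruby
# Hypothetical in-memory stand-in for an expiring cache (not Redis).
class ExpiringCache
  DEFAULT_TTL = 1800 # seconds, the default quoted in the README

  # The clock is injectable so the example does not need to sleep.
  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @store = {}
  end

  def write(key, value, expire: DEFAULT_TTL)
    @store[key] = [value, @clock.call + expire]
  end

  # Returns the value, or nil when the key is missing or has expired.
  def read(key)
    value, expires_at = @store[key]
    return nil if expires_at.nil? || @clock.call >= expires_at
    value
  end

  # Remaining lifetime in whole seconds, or nil when missing or expired.
  def ttl(key)
    _, expires_at = @store[key]
    return nil if expires_at.nil?
    remaining = (expires_at - @clock.call).ceil
    remaining.positive? ? remaining : nil
  end
end

now = 0.0
cache = ExpiringCache.new(clock: -> { now })
cache.write('0369610', { title: 'Ant-Man' }, expire: 86_400)
now += 3600                # one hour later
puts cache.ttl('0369610')  # prints 82800
```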
 
-
+## Part 2 - Run the statistic
 
-
-> Table_Diagonalization
-> Configural_Frequency_Analysis
-> Median_Polish
-> Stem_And_Leaf_Display
+Below, we've collected 3 specific IMDb ids to analyze.
 
-
-
-
-> Dimension_Reduction
-> Applied_DataMining
+* Ant Man - 0369610
+* Jurassic World - 3079380
+* Spy - 0478970
 
-
-
+Finding the Mean value.
+```ruby
+m.mean
+```
+Below is the result generated.
 
-
-
-
-
-
-
+```ruby
+                             mean
+ant-man         576.8444444444444
+jurassic_world  512.5111111111111
+spy            369.73333333333335
+```
+Below are more statistic methods you can invoke on your objects.
 
-
-> Statistical_Outliers
-> Regression_And_Curve_Fitting_Software
-> Regression_Diagnostics
-> Regression_Variable_Selection
-> Regression_With_Time_Series_Structure
-> Robust_Regression
-> Choice_Modeling
+Feel free to try them out.
 
-
-
+* std
+* sum
+* count
+* max
+* min
+* product
+* standardize
+* describe
+* covariance
+* correlation
 
-
-> Variance_Based_Sensitivity_Analysis
-> Elementary_Effects_Method
-> Experimental_Uncertainty_Analysis
-> Fourier_Amplitude_Sensitivity_Testing
-> Hyperparameter
+### Layout and Template
 
-
-
+movieDB allows you to view all your data fields in a worksheet-style layout.
+
+``` ruby
+m.worksheet
+```
+A total of 45 fields are printed out, but we've truncated the result for ease of reading.
+
+
+``` ruby
+              ant-man jurassic_w        spy
+production        177        128         40
+belongs_to          0        151          0
+plot_synop       9083          0       9629
+company            14         18         21
+title               7         14          3
+filming_lo        267       1037        530
+cast_chara       4094       5894       1001
+trailer_ur          0         46         45
+cast_membe       2833       3452        939
+votes               5          6          5
+adult               5          5          5
+also_known        928       1601       1195
+director           15         19         13
+plot_summa        373        298        311
+countries           7         16          7
+...               ...        ...        ...
+```
+
+## Filters
+
+When performing statistics on an object, movieDB by default processes all fields.
+
+Alternatively, you can choose which fields are processed with the following 2 filters.
+
+* only
+* except
+
+'only' analyzes the fields you provide.
+
+'except' is the inverse of 'only'. It analyzes all the fields you did not provide.
+
+``` ruby
+m.standardize only: [:budget, :revenue, :length, :vote_average]
+
+```
+Processes only budget, revenue, length and vote_average values.
+``` ruby
+              ant-man jurassic_w        spy
+budget     1.49999999 -0.3616594 1.49999999
+revenue    -0.5000006 1.49304559 -0.5000013
+length     -0.4999988 -0.5656929 -0.4999976
+vote_avera -0.5000005 -0.5656931 -0.5000010
+```
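The `only` / `except` idea can be sketched with plain hashes. The code below is not movieDB's implementation: `filter_fields` and `standardize` are hypothetical helpers, the input numbers are made up, and the choice of z-scores with a sample standard deviation (dividing by n - 1) is an assumption. The sample output above looks consistent with z-scores computed per movie over the selected fields, but that is an inference from the printed numbers.

```ruby
# Hypothetical helpers (not movieDB's code): filter fields, then compute
# z-scores over one movie's remaining values.
def filter_fields(fields, only: nil, except: nil)
  return fields.select { |name, _| only.include?(name) } if only
  return fields.reject { |name, _| except.include?(name) } if except
  fields
end

def standardize(fields)
  values = fields.values.map(&:to_f)
  mean = values.sum / values.size
  # Assumption: sample standard deviation, dividing by n - 1.
  variance = values.sum { |v| (v - mean)**2 } / (values.size - 1)
  std = Math.sqrt(variance)
  fields.transform_values { |v| (v - mean) / std }
end

# Made-up values for illustration only.
movie = { budget: 8.0, revenue: 2.0, length: 2.0, votes: 100.0 }

p standardize(filter_fields(movie, only: %i[budget revenue length]))
p standardize(filter_fields(movie, except: %i[votes]))
```

Both calls select the same three fields here, so they print the same z-scores; `except` simply names what to leave out.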
+# Commands
+
+movieDB comes with commands to help you query or manipulate stored objects in redis.
+
+* HGETALL key
+Get all the fields and values in a hash of the movie
+
+``` ruby
+m.hgetall(["0369610"])
+# => {"production_companies"=>"[{\"name\"=>\"Universal Studios\", \"id\"=>13},...}
+```
+* HKEYS key
+Get all the fields in a hash of the movie
+
+``` ruby
+m.hkeys
+# => ["production_companies", "belongs_to_collection", "plot_synopsis", "company", "title",...]
+```
+
+* HVALS key
+Get all the values in a hash of the movie
+
+``` ruby
+m.hvals
+# => ["[{\"name\"=>\"Universal Studios\", \"id\"=>13}, {\"name\"=>\"Amblin Entertainment\",...]
+```
+
+* ALL_IDS key
+Get the ids of all movies
+
+``` ruby
+m.all_ids
+# => ["0369610", "3079380"...]
+```
+
+* TTL key
+Gets the remaining time to live of a movie.
+
+``` ruby
+m.ttl("0369610")
+# => 120
+```
+
+* DELETE_ALL key
+Deletes all movie objects stored in redis.
+
+``` ruby
+m.delete_all
+# => []
+```
+# Visualizations
+(Work in progress)
+
+# Data mining
+(Work in progress)
 
 ## Contact me
 
@@ -166,4 +310,7 @@ If you'd like to collaborate, please feel free to fork source code on github.
 
 You can also contact me at albertmck@gmail.com
 
-
+## Disclaimer
+This software is provided “as is” and without any express or implied warranties, including, without limitation, the implied warranties of merchantability and fitness for a particular purpose.
+
+###### Copyright (c) 2013 - 2015 Albert McKeever, released under MIT license