movieDB 0.3.4 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: abd7277f09e27f87c05a46760b7604142a13c4f4
- data.tar.gz: 7b7982a640eff36476dcc23dfeb737e87f3ee911
+ metadata.gz: 18eb3efc3ad17c7cb5b6134a951402964af0b5da
+ data.tar.gz: 7eeb628cab2588293f92aa458eeabb903cbfb6ec
  SHA512:
- metadata.gz: 950a7d07348be94e30fb40e74048de44c6548f1794bf0c65ac140a6f2845733a76533c0d49e7ea5dec9f5b8397e76082f8b93580959c08ae0cf3d2e8589cca5c
- data.tar.gz: e8293834822371dec1e2579eeff6e4f6aa9baf035c46ca8a0324d7458ca029c6c875afec8577cbb329448022207bc0c2d20ed0fec243b2c1eb11dee1b3f81863
+ metadata.gz: 09c390931150038ca9b18d52ffdd084a805b344c4272876c9ff6880ee2c4ae3da5623be445cc9b3ad6e7f192e60594fb62a682ebad4168f68d08f8e07b3629b5
+ data.tar.gz: 4b559dcc3e841b450187b87f1060039033109bdce2cd4791ae5c9628bea929db875acff15748e807f9e6444b2eeddcd03ff37b0b2b62d4b324bffdf30468d068
Binary file
@@ -0,0 +1,2 @@
+ service_name: travis-ci
+ repo_token: Uytw7gmmEhTTwfj9STT7rokiH7D0su0GD
data/.gitignore CHANGED
@@ -16,6 +16,8 @@ test/tmp
  test/version_tmp
  tmp
  reports
- .coveralls.yml
  readme_first.txt
  test/.DS_Store
+ coverage
+ *.log
+ .idea/
data/.rspec ADDED
@@ -0,0 +1,2 @@
+ --format documentation
+ --color
@@ -3,6 +3,11 @@ rvm:
  - 2.2.2
  env:
  - TEST_SUITE = units
+ global:
+ - "CI=true"
+ - "TRAVIS=true"
+ after_success:
+ - coveralls --verbose
  before_script:
  - rm -rf reports/*.8
  bundler_args: --without development production --quiet
data/Gemfile CHANGED
@@ -2,7 +2,12 @@ source 'https://rubygems.org'
 
  gemspec
 
- group :documentation do
- gem 'coveralls', :require => false
- end
+ gem 'coveralls', require: false
+ gem 'rspec'
+
 
+ # Code coverage analysis tool for Ruby.
+ group :test do
+ gem 'simplecov', :require => false
+ gem 'pullreview-coverage', require: false
+ end
data/README.md CHANGED
@@ -1,164 +1,308 @@
1
- ## MovieDB
1
+ # MovieDB
2
2
 
3
- MovieDB is a ruby wrapper for fetching raw Movie or TV Data from IMDb and performing a variety of statistical analysis and computation.
4
- The objective and usage of this tool is to help media producers make high level structured decisions based on realistic analysis of actual data.
5
-
6
- The fetched data is stored in memory using Redis and has an expiration time of 1800 seconds for all cached objects.
3
+ MovieDB is a multi-threaded Ruby wrapper for performing advanced statistical computation and high-level data analysis on movie or TV data from IMDb.
4
+ The objective of this tool is to help producers, directors, and writers make logical business decisions that generate a profitable ROI.
7
5
 
6
+ ## Badges
7
+ - [![PullReview stats](https://www.pullreview.com/github/keeperofthenecklace/movieDB/badges/master.svg?)](https://www.pullreview.com/github/keeperofthenecklace/movieDB/reviews/master)
8
+ - [![Dependency Status](https://gemnasium.com/keeperofthenecklace/movieDB.svg)](https://gemnasium.com/keeperofthenecklace/movieDB)
8
9
  - [![Join the chat at https://gitter.im/keeperofthenecklace/movieDB](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/keeperofthenecklace/movieDB?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
9
- - [![Coverage Status](https://coveralls.io/repos/keeperofthenecklace/movieDB/badge.svg)](https://coveralls.io/r/keeperofthenecklace/movieDB)
10
+ - [![Coverage Status](https://coveralls.io/repos/keeperofthenecklace/movieDB/badge.svg?branch=master&service=github)](https://coveralls.io/github/keeperofthenecklace/movieDB?branch=master)
10
11
  - [![Code Climate](https://codeclimate.com/github/keeperofthenecklace/movieDB.png)](https://codeclimate.com/github/keeperofthenecklace/movieDB)
11
12
  - [![Gem Version](https://badge.fury.io/rb/movieDB.png)](http://badge.fury.io/rb/movieDB)
12
13
  - [![Build Status](https://secure.travis-ci.org/keeperofthenecklace/movieDB.png?branch=master)](http://travis-ci.org/keeperofthenecklace/movieDB)
13
14
 
14
- ## Basic functions and Data Analysis:
15
+ ## Technology
16
+ * SciRuby is used for all statistical and scientific computations.
17
+ * Redis is used to store all data.
18
+ * IMDb and TMDb are the sources for all film / TV data.
19
+ * BoxOfficeMojo is where we will be scraping future film / TV data.
20
+ * Celluloid is used to build the fault-tolerant concurrent programs. Note, if you are using MRI or YARV,
21
+ threads will not run Ruby code in parallel, since these interpreters have a Global Interpreter Lock (GIL).
22
+ Fortunately, you can use JRuby or Rubinius, since they don’t have a GIL and support real parallel threading.
23
+
24
+ ## Requirements
25
+ ruby-2.2.2 or higher
26
+
27
+ jruby-9.0.0.0
28
+
29
+ ## Category
30
+ movieDB is broken down into 3 components namely:
31
+
32
+ * Statistics
33
+ * Visualizations (Work in progress)
34
+ * DataMining (Work in progress)
15
35
 
16
- * Data Analysis
17
- * Exploratory Data Analysis
18
- * Confirmatory Data Analysis
19
- * More to come...
36
+ # Statistics
37
+
38
+ Simple statistical analysis on numeric data.
39
+ The corresponding computation is performed on
40
+ both numeric and string vectors within the
41
+ collected data.
20
42
 
21
43
  ## Installation
22
44
 
23
- Please make sure you have redis installed.
45
+ Redis Installation
24
46
 
25
- This tutorial doesn't cover redis installation.
47
+ This tutorial doesn't cover redis installation.
48
+ You will find that information at: http://redis.io/topics/quickstart
26
49
 
27
- Add this line to your application's Gemfile:
50
+ movieDB is available through [Rubygems](http://rubygems.org/gems/movieDB) and can be installed via Gemfile.
28
51
 
29
- gem 'movieDB'
52
+ ``` ruby
53
+ gem 'movieDB'
54
+ ```
30
55
 
31
56
  And then execute:
32
57
 
33
- $> bundle install
58
+ ``` bash
59
+ $ bundle install
60
+ ```
34
61
 
35
62
  Or install it yourself as:
36
63
 
37
- $> gem install movieDB
64
+ ``` bash
65
+ gem install movieDB
66
+ ```
67
+
68
+ ## Console - loading the libraries
38
69
 
39
- ## Require - loading the libraries
70
+ ``` bash
71
+ $ irb
72
+ ```
40
73
 
41
- $> irb
74
+ ## Require the gem
42
75
 
43
- $> require 'movieDB'
76
+ ```ruby
77
+ require 'movieDB'
78
+ ```
44
79
 
45
- ## Usage - Fetch Raw Movie Data From IMDb
80
+ ## Initialize MovieDB (multi-thread setup)
46
81
 
47
- $> imdb_ids = ["0369610", "3079380"]
82
+ ``` ruby
83
+ m = MovieDB::Movie.pool(size: 2)
84
+ ```
85
+ ## Step Process
48
86
 
49
- $> MovieDB::Movie.find_imdb_id(imdb_ids)
87
+ Fetching and analysing movie / TV data using movieDB is a simple 2 step process.
50
88
 
51
- /* YOU CAN ADD AS MANY IMDB IDs AS YOU LIKE. BUT DO NOT EXCEED THE MAXIMUM REQUEST RATE. */
89
+ First, fetch the data from IMDb.
52
90
 
53
- ### IMDb Data
91
+ Next, run your choice of statistic.
54
92
 
55
- When IMDb data is fetched, two things happen.
93
+ That's it! It is that simple.
56
94
 
57
- First, a reports folder is created in the movieDB gem folder.
95
+ ## Part 1 - Fetch Data from IMDb
58
96
 
59
- Next, the fetched data is written to an xls format and stored in the reports directory.
97
+ There are 2 ways to find IMDb ids.
60
98
 
61
- From your terminal you can locate movieDB gem directory like this:
99
+ * Finding specific IMDb ids
62
100
 
63
- $ gem content movieDB
101
+ * Finding random IMDb ids.
64
102
 
65
- If you use our above IMDb id, you should find the following xls file.
103
+ ### Fetching specific IMDb ids
66
104
 
67
- Feel free to open it.
105
+ To find the IMDb id for a specific movie, go to:
68
106
 
69
- $ open ../reports/imdb_JurassicWorld_Spy_.xls
107
+ ```bash
108
+ http://www.imdb.com
109
+ ```
110
+ Search for your movie of choice. Once you do, IMDb redirects you to the movie's page.
70
111
 
71
- ## Usage - Analyzing Data From IMDb.
112
+ The URL of the movie's page includes the IMDb id.
72
113
 
73
- $ irb
114
+ ``` ruby
115
+ http://www.imdb.com/title/tt0369610/
116
+ ```
117
+ 0369610 is the IMDb id.
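
As a minimal sketch of the step above, the numeric id can be pulled out of a title URL with a regular expression. The helper name `imdb_id_from_url` is hypothetical and is not part of movieDB; it only assumes the `/title/tt<digits>/` URL shape shown above.

```ruby
# Hypothetical helper (not part of movieDB): extract the numeric IMDb id
# from a title URL such as http://www.imdb.com/title/tt0369610/
def imdb_id_from_url(url)
  match = url.match(%r{/title/tt(\d+)})
  # Returns nil when the URL does not contain a title id.
  match && match[1]
end

puts imdb_id_from_url('http://www.imdb.com/title/tt0369610/')
```

The returned string keeps its leading zero, which matters because the ids are passed around as strings rather than integers.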
74
118
 
75
- > require 'MovieDB/data_analysis'
119
+ ### Fetching random IMDb ids (multi-thread setup)
120
+ You can also fetch IMDb ids at random.
76
121
 
77
- > require 'MovieDB/data_process'
122
+ ``` ruby
123
+ r = Random.new
78
124
 
79
- > MovieDB::DataProcess.send(:basic_statistic, 'imdb_JurassicWorld_Spy_.xls')
125
+ 39.times do |i|
126
+ m.async.fetch(sprintf '%07d', r.rand(300000))
127
+ sleep(4)
128
+ end
80
129
 
81
- A statistical computation is performed and the results is written to movieDB gem reports folder.
130
+ sleep(10)
131
+ ```
132
+ Note: IMDb has a rate limit of 40 requests every 10 seconds, applied per IP address rather than per API key.
133
+ If you exceed the limit, you will receive a 429 HTTP status with a 'Retry-After' header.
134
+ As soon as your cool-down period expires, you are free to continue making requests.
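
One way to stay under a limit like the one described above is a small sliding-window throttle on the caller's side. The sketch below is plain Ruby and is not part of movieDB; the class name `RequestThrottle`, the helper `random_imdb_id`, and the default values are assumptions chosen to mirror the numbers in the note.

```ruby
# Hypothetical sliding-window throttle (not part of movieDB).
# It tracks request timestamps and reports how long to wait so that at most
# `limit` requests fall inside any `window` seconds.
class RequestThrottle
  def initialize(limit: 40, window: 10.0)
    @limit = limit
    @window = window
    @stamps = []
  end

  # Seconds to wait before the next request (0.0 if one may go out now).
  def wait_time(now)
    @stamps.reject! { |t| now - t >= @window }
    return 0.0 if @stamps.size < @limit
    @window - (now - @stamps.first)
  end

  # Record that a request was sent at time `now`.
  def record(now)
    @stamps << now
  end
end

# Zero-padded 7-digit id, generated the same way as in the loop above.
def random_imdb_id(rng = Random.new)
  format('%07d', rng.rand(300_000))
end
```

In a fetch loop, the caller would sleep for `wait_time(Time.now.to_f)` seconds and then call `record` right before each request, instead of using a fixed `sleep(4)`.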
82
135
 
83
- Feel free to open it.
136
+ Also, movieDB will throw a NameError if the randomly generated IMDb id is invalid.
84
137
 
85
- $ open ../reports/Statistic_imdb_JurassicWorld_Spy.xls
138
+ ### Get Movie Data
86
139
 
87
- ## What's Next
140
+ ``` ruby
141
+ m.async.fetch("0369610", "3079380", "0478970")
142
+ ```
143
+ Calling m.async instructs Celluloid to invoke the given method asynchronously.
144
+ This means that rather than waiting for the IMDb and TMDb queries to respond, the caller sends a
145
+ message to the concurrent object that you'd like the given method invoked, and then the caller proceeds without waiting for a response.
146
+ The concurrent object receiving the message will then process the method call in the background.
88
147
 
89
- ##### More statistical computations coming soon:
148
+ Asynchronous calls will never raise an exception in the caller, even if an exception occurs when the receiver is processing it.
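
The fire-and-forget behaviour described above can be illustrated without Celluloid. The sketch below uses only a core `Thread` and `Queue`; the class `TinyAsyncWorker` is hypothetical and much simpler than a Celluloid actor, but it shows the same two properties: the caller does not wait, and an error raised by the job never reaches the caller.

```ruby
# Plain-Ruby illustration of fire-and-forget dispatch (this is NOT Celluloid).
# A worker thread drains a queue of jobs; the caller enqueues and moves on.
class TinyAsyncWorker
  attr_reader :results

  def initialize
    @queue = Queue.new
    @results = Queue.new
    @thread = Thread.new do
      while (job = @queue.pop) != :stop
        begin
          @results << job.call
        rescue StandardError => e
          # The error stays with the worker; the caller never sees a raise.
          @results << e
        end
      end
    end
  end

  # Returns immediately; the block runs later on the worker thread.
  def async(&job)
    @queue << job
    nil
  end

  # Waits for all queued jobs to finish, then stops the worker.
  def shutdown
    @queue << :stop
    @thread.join
  end
end

worker = TinyAsyncWorker.new
worker.async { 21 * 2 }
worker.async { raise ArgumentError, 'bad id' }
worker.shutdown
```

After `shutdown`, `worker.results` holds the return value of the first job and the exception object from the second, in submission order.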
90
149
 
91
- `:GaussNewtonAlgorithm`
150
+ ### Redis - caching objects
92
151
 
93
- > Iteratively_Reweighted_Least_Squares
94
- > Lack_Of_Fit_Sum_Of_Squares
95
- > Least_Squares_Support_Vector_Machine
96
- > Mean_Squared_Error
97
- > Moving_Least_Sqares
98
- > Non_Linear_Iterative_Partial_Least_Squares
99
- > Non_Linear_Least_Squares
100
- > Ordinary_Least_Squares
101
- > Partial_Least_Squares_Regression
102
- > Partition_Of_Sums_Of_Squares
103
- > Proofs_Involving_Ordinary_Least_Squares
104
- > Residual_Sum_Of_Squares
105
- > Total_Least_Squares
106
- > Total_Sum_Of_Squares
152
+ By default, any movie fetched from IMDb is stored in redis and has an expiration time of 1800 seconds (30 minutes).
107
153
 
108
- `:EstimationOfDensity`
154
+ But you can change this expiration time.
109
155
 
110
- > Cluster_Weighted_Modeling
111
- > Density_Estimation
112
- > Discretization_Of_Continuous_Features
113
- > Mean_Integrated_Squared_Error
114
- > Multivariate_Kernel_Density_Estimation
115
- > Variable_Kernel_Density_Estimation
156
+ ```ruby
157
+ m.async.fetch("0369610", "3079380", expire: 86400)
158
+ ```
159
+ Here, the expiration time is set to 86400 seconds, which is equivalent to 24 hours.
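
To make the expiry semantics concrete, here is a small in-memory stand-in for a cache with a per-key time to live. movieDB itself stores data in Redis; the class `ExpiringStore` and its methods are hypothetical, and the clock is passed in explicitly (`now:`) so the behaviour is easy to follow.

```ruby
# Hypothetical in-memory stand-in for a cache with per-key expiry.
# It only illustrates the semantics; it is not how movieDB talks to Redis.
class ExpiringStore
  DEFAULT_TTL = 1800 # seconds, matching the default described above

  def initialize
    @data = {}
  end

  def write(key, value, now:, expire: DEFAULT_TTL)
    @data[key] = { value: value, expires_at: now + expire }
  end

  # Returns the value, or nil once the entry has expired.
  def read(key, now:)
    entry = @data[key]
    return nil if entry.nil?
    if now >= entry[:expires_at]
      @data.delete(key)
      return nil
    end
    entry[:value]
  end

  # Remaining time to live in seconds, or nil if the key is absent or expired.
  def ttl(key, now:)
    entry = @data[key]
    return nil if entry.nil? || now >= entry[:expires_at]
    entry[:expires_at] - now
  end
end
```

With the default, an entry written at second 0 is readable at second 1799 and gone at second 1800; passing `expire: 86400` keeps it for 24 hours.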
116
160
 
117
- `:ExploratoryDataAnalysis`
161
+ ## Part 2 - Run the statistic
118
162
 
119
- > Data_Reduction
120
- > Table_Diagonalization
121
- > Configural_Frequency_Analysis
122
- > Median_Polish
123
- > Stem_And_Leaf_Display
163
+ Below, we've collected 3 specific IMDb ids to analyze.
124
164
 
125
- > Data_Mining
126
- > Applied_DataMining
127
- > Cluster_Analysis
128
- > Dimension_Reduction
129
- > Applied_DataMining
165
+ * Ant-Man - 0478970
166
+ * Jurassic World - 0369610
167
+ * Spy - 3079380
130
168
 
131
- > RegressionAnalysis
132
- > Choice_Modelling
169
+ Finding the Mean value.
170
+ ```ruby
171
+ m.mean
172
+ ```
173
+ Below is the result generated.
133
174
 
134
- > Generalized_Linear_Model
135
- > Binomial_Regression
136
- > Generalized_Additive_Model
137
- > Linear_Probability_Model
138
- > Poisson_Regression
139
- > Zero_Inflated_Model
175
+ ```ruby
176
+ mean
177
+ ant-man 576.8444444444444
178
+ jurassic_world 512.5111111111111
179
+ spy 369.73333333333335
180
+ ```
181
+ Below are more statistical methods you can invoke on your objects.
140
182
 
141
- > Nonparametric_Regression
142
- > Statistical_Outliers
143
- > Regression_And_Curve_Fitting_Software
144
- > Regression_Diagnostics
145
- > Regression_Variable_Selection
146
- > Regression_With_Time_Series_Structure
147
- > Robust_Regression
148
- > Choice_Modeling
183
+ Feel free to try them out.
149
184
 
150
- > Resampling
151
- > Bootstrapping_Population
185
+ * std
186
+ * sum
187
+ * count
188
+ * max
189
+ * min
190
+ * product
191
+ * standardize
192
+ * describe
193
+ * covariance
194
+ * correlation
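
For readers unfamiliar with the listed statistics, the sketch below shows what `mean`, `std`, and `standardize` compute on a plain array. It is not movieDB's code; in particular, using the sample standard deviation (dividing by n - 1) is an assumption.

```ruby
# Plain-Ruby sketch of three of the listed statistics (not movieDB's code).
def mean(values)
  values.reduce(:+).to_f / values.size
end

# Sample standard deviation (divides by n - 1); this choice is an assumption.
def std(values)
  m = mean(values)
  squared = values.map { |v| (v - m)**2 }
  Math.sqrt(squared.reduce(:+) / (values.size - 1))
end

# z-scores: subtract the mean, then divide by the standard deviation.
def standardize(values)
  m = mean(values)
  s = std(values)
  values.map { |v| (v - m) / s }
end

p standardize([1, 2, 3])
```

Standardized values have mean 0 and standard deviation 1, which makes fields measured in different units comparable.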
152
195
 
153
- > Sensitivity_Analysis
154
- > Variance_Based_Sensitivity_Analysis
155
- > Elementary_Effects_Method
156
- > Experimental_Uncertainty_Analysis
157
- > Fourier_Amplitude_Sensitivity_Testing
158
- > Hyperparameter
196
+ ### Layout and Template
159
197
 
160
- > Time_Series_Analysis
161
- > Frequency_Deviation
198
+ movieDB allows you to view all your data fields in a worksheet style layout.
199
+
200
+ ``` ruby
201
+ m.worksheet
202
+ ```
203
+ A total of 45 fields are printed out, but we've truncated the result for ease of reading.
204
+
205
+
206
+ ``` ruby
207
+ ant-man jurassic_w spy
208
+ production 177 128 40
209
+ belongs_to 0 151 0
210
+ plot_synop 9083 0 9629
211
+ company 14 18 21
212
+ title 7 14 3
213
+ filming_lo 267 1037 530
214
+ cast_chara 4094 5894 1001
215
+ trailer_ur 0 46 45
216
+ cast_membe 2833 3452 939
217
+ votes 5 6 5
218
+ adult 5 5 5
219
+ also_known 928 1601 1195
220
+ director 15 19 13
221
+ plot_summa 373 298 311
222
+ countries 7 16 7
223
+ ... ... ... ...
224
+ ```
225
+
226
+ ## Filters
227
+
228
+ When performing statistics on an object, movieDB by default processes all fields.
229
+
230
+ Alternatively, you can choose which fields are processed with the following 2 filters.
231
+
232
+ * only
233
+ * except
234
+
235
+ 'only' analyzes the fields you provide.
236
+
237
+ 'except' is the inverse of 'only'. It analyzes all the fields you did not provide.
238
+
239
+ ``` ruby
240
+ m.standardize only: [:budget, :revenue, :length, :vote_average]
241
+
242
+ ```
243
+ Processes only budget, revenue, length and vote_average values.
244
+ ``` ruby
245
+ ant-man jurassic_w spy
246
+ budget 1.49999999 -0.3616594 1.49999999
247
+ revenue -0.5000006 1.49304559 -0.5000013
248
+ length -0.4999988 -0.5656929 -0.4999976
249
+ vote_avera -0.5000005 -0.5656931 -0.5000010
250
+ ```
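
The `only` / `except` semantics can be sketched on a plain hash of fields. The function `filter_fields` is hypothetical and is not movieDB's implementation, and the values in `movie` are made-up placeholders rather than real figures.

```ruby
# Hypothetical field filter (not movieDB's implementation).
# `only` keeps the listed fields; `except` drops them.
def filter_fields(record, only: nil, except: nil)
  raise ArgumentError, 'pass only or except, not both' if only && except
  return record.select { |field, _| only.include?(field) } if only
  return record.reject { |field, _| except.include?(field) } if except
  record
end

# Placeholder values, for illustration only.
movie = { budget: 1, revenue: 2, length: 3, title: 'placeholder' }

p filter_fields(movie, only: [:budget, :revenue])
p filter_fields(movie, except: [:title])
```

Filtering before computing a statistic keeps free-text fields such as titles out of numeric results.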
251
+ # Commands
252
+
253
+ movieDB comes with commands to help you query or manipulate stored objects in redis.
254
+
255
+ * HGETALL key
256
+ Get all the fields and values in a hash of the movie
257
+
258
+ ``` ruby
259
+ m.hgetall(["0369610"])
260
+ # => {"production_companies"=>"[{\"name\"=>\"Universal Studios\", \"id\"=>13},...}
261
+ ```
262
+ * HKEYS key
263
+ Get all the fields in a hash of the movie
264
+
265
+ ``` ruby
266
+ m.hkeys
267
+ # => ["production_companies", "belongs_to_collection", "plot_synopsis", "company", "title",...]
268
+ ```
269
+
270
+ * HVALS key
271
+ Get all the values in a hash of the movie
272
+
273
+ ``` ruby
274
+ m.hvals
275
+ # => ["[{\"name\"=>\"Universal Studios\", \"id\"=>13}, {\"name\"=>\"Amblin Entertainment\",...]
276
+ ```
277
+
278
+ * ALL_IDS key
279
+ Get the ids of all stored movies
280
+
281
+ ``` ruby
282
+ m.all_ids
283
+ # => ["0369610", "3079380"...]
284
+ ```
285
+
286
+ * TTL key
287
+ Gets the remaining time to live of a movie.
288
+
289
+ ``` ruby
290
+ m.ttl("0369610")
291
+ # => 120
292
+ ```
293
+
294
+ * DELETE_ALL key
295
+ Deletes all movie objects stored in redis.
296
+
297
+ ``` ruby
298
+ m.delete_all
299
+ # => []
300
+ ```
301
+ # Visualizations
302
+ (Work in progress)
303
+
304
+ # Data mining
305
+ (Work in progress)
162
306
 
163
307
  ## Contact me
164
308
 
@@ -166,4 +310,7 @@ If you'd like to collaborate, please feel free to fork source code on github.
166
310
 
167
311
  You can also contact me at albertmck@gmail.com
168
312
 
169
- ###### Copyright (c) 2013 Albert McKeever, released under MIT license
313
+ ## Disclaimer
314
+ This software is provided “as is” and without any express or implied warranties, including, without limitation, the implied warranties of merchantability and fitness for a particular purpose.
315
+
316
+ ###### Copyright (c) 2013 - 2015 Albert McKeever, released under MIT license