tfidf 0.0.0

Files changed (4)
  1. data/examples/demo_tf.rb +142 -0
  2. data/lib/tfidf.rb +186 -0
  3. data/test/test_TFIDF.rb +200 -0
  4. metadata +60 -0
@@ -0,0 +1,142 @@
1
+ # -*- coding: utf-8 -*-
2
+ require 'tfidf'
3
+ require 'pp'
4
+ require 'ruby-debug'
5
+ Debugger.settings[:autoeval] = true
6
+ Debugger.start
7
+ Debugger.post_mortem
8
+
9
+ corpus = ["The quick brown fox jumps over the lazy dog",
10
+ %Q_Fox is a common name for many species of omnivorous mammals belonging to the Canidae family. Foxes are small to medium-sized canids (slightly smaller than the medium-sized domestic dog), characterized by possessing a long narrow snout, and a bushy tail (or brush).
11
+ Members of about 37 species are referred to as foxes, of which only 12 species actually belong to the Vulpes genus of "true foxes". By far the most common and widespread species of fox is the red fox (Vulpes vulpes), although various species are found on almost every continent. The presence of fox-like carnivores all over the globe, together with their widespread reputation for cunning, has contributed to their appearance in popular culture and folklore in many societies around the world (see also Foxes in culture)._,
12
+ %Q_On March 13, 2008, a YouTube user named RANDYPETERS1, a 9-year-old boy from Chicago, submitted a handdrawn animated video about Octocat, a red cat head with eight long legs looking for his parents. The videos featured crude MS Paint animation and a loud, highpitched, child-like voice narrating. On September 7, the fifth, final episode was released, but featured an unexpected twist - about 20 seconds into it, the crude sketchy animation switched to intricately crafted 3D with an orchestral soundtrack; the whole Octocat story (and as such, the Randy Peters persona) was revealed to be by David O'Reilly [8]. In an interview he joked "I wanted to try experimenting with the Youtube audience and Microsoft Paint. The story for Octocat came to me by reading the bible word-for-word backwards".[9]_ ,
13
+ "Master's thesis
14
+ Students are required to complete a master's thesis, which is a research assignment with a workload corresponding to 30 credits. The thesis is written on a topic related to the student's major and agreed upon between the student and a professor who specialises in the topic of the thesis. The supervisor of the thesis must be a professor in the University, whereas the instructor(s) must have at least a master’s degree.
15
+ Topic application
16
+
17
+ The master's thesis process begins by contacting a professor in the student's field of interest, i.e. major, and agreeing on the topic of the thesis. For well founded reasons, the thesis may also be written on a topic related to the student's minor (if a minor is included in the degree).
18
+
19
+ Once a topic, a supervisor, an instructor and a timetable for the thesis have been determined, an official topic application must be submitted to the Student Services Office. Topic applications are accepted once a month. The Degree Programme Committee confirms the topic and appoints the supervisor and the instructor for the thesis.
20
+
21
+ A topic for the thesis may be applied for when the Bachelor's degree and at least 45 credits of the Master's degree have been completed. Once confirmed, the topic is valid for one year.
22
+ AaltoELEC's master's thesis instructions
23
+
24
+ An introduction to what a master's thesis is and what it's requirements are: What is a Master's Thesis?
25
+ A guide book written by the School's former professor gives general instructions about thesis work and its academic requirements: How to write a diploma thesis
26
+ School of Electrical Engineering thesis template (gzipped tarball), provides a LaTeX-template for your thesis writing process and gives detailed instructions on the form and style of the Master's thesis. Make sure to change the language into \"English\" when using this template. Example Master's Thesis title page and abstract produced with thesis template.
27
+
28
+ Other useful links:
29
+
30
+ Electronic master's thesis database - read completed theses from previous years
31
+ I\x2415nformation resources
32
+ Making a bibliography (Otaniemi main library's instructions)
33
+ How to avoid plagiarism 1 & 2
34
+ Aalto University Code of Academic Integrity and Handling Violations Thereof
35
+ " ,
36
+ %q_The tf*idf weight (term frequency–inverse document frequency) is a numerical statistic which reflects how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others.
37
+
38
+ Variations of the tf*idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. tf*idf can be successfully used for stop-words filtering in various subject fields including text summarization and classification.[1]
39
+
40
+ One of the simplest ranking functions is computed by summing the tf*idf for each query term; many more sophisticated ranking functions are variants of this simple model._ ,
41
+ %q_By Melissa Healy, Los Angeles Times
42
+
43
+ March 27, 2012, 5:45 p.m.
44
+ When roasted at 475 degrees, coffee beans are sometimes described as rich and full-bodied. But for the full-bodied person who is not so rich, unroasted coffee beans — green as the day they were picked — may hold the key to cheap and effective weight loss, new research suggests.
45
+
46
+ In a study presented Tuesday at the American Chemical Society's spring national meeting in San Diego, 16 overweight young adults took, by turns, a low dose of green coffee bean extract, a high dose of the supplement, and a placebo. Though the study was small, the results were striking: Subjects lost an average of 17.5 pounds in 22 weeks and reduced their overall body weight by 10.5%.
47
+
48
+ If green coffee extract were a medication seeking approval from the Food and Drug Administration, these results would make it a viable candidate — more than 35% of subjects lost more than 5% of their body weight, and weight loss appeared to be greater while subjects were taking the pills than when they were on the placebo.
49
+
50
+ But as a dietary supplement, green coffee extract does not require the FDA's blessing. In fact, it is already available as a naturopathic medicine and antioxidant.
51
+
52
+ Joe Vinson, the University of Scranton chemist who conducted the pilot study, said the findings should pave the way for more rigorous research on coffee bean extract's effects. A larger trial involving 60 people is being planned.
53
+
54
+ Vinson, whose research focuses on plant polyphenols and their effects on human health, said it appears that green coffee bean extract may work by reducing the absorption of fat and glucose in the gut; it may also reduce insulin levels, which would improve metabolic function. There were no signs of ill effects on any subjects, Vinson reported Tuesday.
55
+
56
+ The study used a "cross-over" design, which allowed each subject to serve as his or her own comparison group. For six weeks, volunteers swallowed capsules three times a day, ingesting either 700 or 1,050 milligrams of green coffee extract a day or taking a placebo. After a two-week break, they moved, round-robin style, to another arm of the trial.
57
+
58
+ Subjects did not change their calorie intake over the course of the trial. But the more extract they consumed, the more weight and fat they lost. Altogether, they reduced their body fat by 16%, on average.
59
+
60
+ Of the 16 volunteers, six wound up with a body mass index in the healthful range.
61
+
62
+ One downside is that the extract is "extremely bitter." It would be difficult to take without a lot of water, Vinson reported.
63
+
64
+ At roughly $20 per month, however, green coffee extract is much less expensive than any of the weight-loss medications available over the counter or by prescription.
65
+
66
+ The trial was conducted in India and paid for by Applied Food Sciences Inc. of Austin, Tex., a manufacturer of green coffee bean extract.
67
+
68
+ The pilot study drew strong cautions from several scientists who weren't involved in the research.
69
+
70
+ "This is certainly a provocative study," said Dr. Gerald Weissmann, a physician and biochemist at New York University. But he said nutrition experts would want assurances that green coffee beans do not cause "malabsorption" within the human gut — a condition that would lead to weight loss as well as malnutrition, heart arrhythmias and other problems because vitamins and minerals are not passing through the intestine.
71
+
72
+ Dr. Arthur Grollman, a pharmacologist at the State University of New York at Stony Brook, said coffee beans contain about 250 different chemicals — some with positive and others with negative effects on human health. Though Vinson identified polyphenols and chlorogenic acid as the agents that appear to promote weight loss, Grollman said that claim needed further study. In the meantime, he said, consuming an extract that contains both good and bad chemicals in dense concentration seems an unwise thing to do._,
73
+ %q_By Andrea Mustain
74
+ OurAmazingPlanet
75
+ updated 3/27/2012 6:34:52 PM ET
76
+
77
+ Print
78
+ Font:
79
+
80
+ James Cameron's record-setting dive to Earth's deepest spot has sparked a wave of excitement among many in the science community, who are not only heralding the new technology produced by the Hollywood veteran but lauding the renewed focus the project has put on the deep ocean.
81
+
82
+ "It's wonderful, absolutely wonderful," said Robert J. Stern, a geoscientist at the University of Texas at Dallas. He is one of several deep-sea researchers who said they'd been closely following Cameron's bid to return human observers to the Challenger Deep, a trough within the Mariana Trench more than 35,000 feet (10,700 meters) below the ocean surface.
83
+
84
+ Cameron's roundtrip earlier this week to the deepest place on Earth lasted just under seven hours. The only previous time humans visited this spot was in 1960.
85
+
86
+ News of Cameron's successful solo dive "gave me goose bumps," said Cindy Lee Van Dover, director of the marine laboratory at Duke University's Nicholas School of the Environment.
87
+
88
+ "I think it's a really good thing," said Bruce Robison, a senior scientist at the Monterey Bay Aquarium Research Institute in California. [ Infographic: James Cameron's Mariana Trench Dive ]
89
+
90
+ Opening the deep
91
+ All three scientists have spent many hours aboard some of the few deep-diving research submersibles on Earth and said they hoped Cameron's technology eventually will prove a boon to researchers wishing to collect samples and perhaps even conduct experiments in the deepest reaches of the sea, a place that until now has been off-limits to humans.
92
+
93
+ "This establishes that the technology exists to allow that to happen, so we shouldn't constrain our thinking about approaching work in that extreme habitat," Robison told OurAmazingPlanet. "This is a technological breakthrough and a huge accomplishment on Cameron's part, and I'm very pleased that he's done it; but let's hope it opens the door for more."
94
+
95
+ Cameron's team hasn't confirmed what (if any) samples the filmmaker and explorer retrieved from the Challenger Deep during his three-hour seafloor sojourn. The sub is equipped with a sampling arm, among other research tools. However, Cameron did describe a bleak view through the windows of his lime-green submersible.
96
+
97
+ More science news from msnbc.com
98
+ Image: Brain regions
99
+ Yamada et al. / Nature Comm.
100
+ Scientists take a look inside a jury's brains
101
+
102
+ Science editor Alan Boyle's blog: Jurors show a characteristic pattern of brain activity when they decide to be lenient on a criminal, and the strength of that pattern can vary from juror to juror, researchers say.
103
+ 'Invisibility cloak' can serve as heat shield
104
+ Dolphin society adopts freewheeling lifestyle
105
+ Ancient stone monolith likely marked seasons
106
+
107
+ "It looked like the moon," Cameron told National Geographic reporters upon his return to the surface world. [ See photos from Cameron's historic dive ]
108
+
109
+ For scientists, such an assessment was hardly discouraging.
110
+
111
+ "I was rooting for him to land and find strange-looking animals," Van Dover said, but she added she wasn't surprised. She said the seafloor is vast, conditions are harsh, and life is likely sparsely spread.
112
+
113
+ "Three hours is just a drop in the bucket, and with more hours I think he's going to discover cool things," Van Dover said. "How many years have we been studying the ocean? And it took until 1977 to discover hydrothermal vents." Van Dover specializes in researching the strange creatures that congregate around the seafloor vents, which spew super-heated water laced with trace chemicals that sustain the animals.
114
+
115
+ Robison, a veteran deep-sea ecologist, said that whatever does live in the trench, which is nearly a mile deeper than Mount Everest is tall, will be of great interest to scientists.
116
+
117
+ "Anything that has adapted to thrive in that habitat is going to have some really remarkable adaptations," he said. "But most of the animals in the ocean don't live on the bottom, so there's an enormous potential for discovery up off the bottom as well."
118
+
119
+ Much of the deepest ocean is unreachable via state-owned submersibles, which at this point can dive no more than 21,000 feet (6,500 m). Only Japan's Shinkai 6500 has reached such depths. The United States is refurbishing Alvin, its deepest-diving craft, to be able to reach 21,000 feet within th_ ]
120
+
121
+ tfidf = TFIDF.new corpus
122
+
123
+ puts "Documents in the corpus:"
124
+ tfidf.docs.each {|k,v| puts "Document ID: #{k} => document: #{v}"}
125
+
126
+ puts "Terms in the corpus"
127
+ tfidf.terms.each {|k,v| puts "Term ID: #{k} => term: #{v}"}
128
+
129
+ puts "Document-Term Matrix, sparse List of lists(LIL)"
130
+ tfidf.sparse_matrix_doc_idx.each {|e| puts e}
131
+
132
+ puts "Term-Document Matrix, sparse LIL"
133
+ tfidf.sparse_matrix_term_idx.each {|e| puts e}
134
+
135
+ puts "Term frequencies of all terms, per document"
136
+ puts tfidf.tf
137
+
138
+ puts "Inverse Document Frequency of word: octocat"
139
+ puts tfidf.idf("octocat")
140
+
141
+ puts "TF-IDF of word: octocat in document 7e38fa195cee92d2e7d834095d6938a89b5fdd58"
142
+ puts tfidf.tfidf("octocat","7e38fa195cee92d2e7d834095d6938a89b5fdd58")
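The demo above needs the tfidf gem and fast_stemmer to run. As a rough, dependency-free sketch of the same computation, the snippet below counts terms and scores them with tf * idf. It is not the gem's code: `term_counts` and `tfidf_score` are hypothetical helper names, stemming is left out, and the three short documents are made up for illustration. The tokenizing rules (split on non-word characters, drop tokens of three characters or fewer) and the idf formula log2(N / (df + 1)) are taken from lib/tfidf.rb.

```ruby
# Illustrative sketch only; term_counts and tfidf_score are hypothetical
# helpers, not part of the tfidf gem. Stemming is omitted.

# Count terms roughly the way TFIDF.tf_single does: split on non-word
# characters, downcase, and skip tokens of three characters or fewer.
def term_counts(text)
  text.split(/\W/).each_with_object(Hash.new(0)) do |word, counts|
    term = word.downcase
    next if term.length <= 3 || term =~ /[^[:alnum:]]/
    counts[term] += 1
  end
end

# tf * idf, with idf = log2(N / (df + 1)) as in TFIDF.idf.
def tfidf_score(term, doc_counts, all_counts)
  doc_freq = all_counts.count { |counts| counts.key?(term) }
  doc_counts[term] * Math.log2(all_counts.size.to_f / (doc_freq + 1))
end

docs = [
  "The quick brown fox jumps over the lazy dog",
  "The lazy dog sleeps while the quick cat jumps",
  "Coffee beans roasted at high heat taste bitter"
]
all_counts = docs.map { |doc| term_counts(doc) }

puts tfidf_score("brown", all_counts[0], all_counts) # only in document 0: positive
puts tfidf_score("quick", all_counts[0], all_counts) # in two of three documents: 0.0
puts tfidf_score("fox", all_counts[0], all_counts)   # too short to be counted: 0.0
```

Note that "fox" never reaches the term table because of the length filter, so short query words always score zero under these rules.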
@@ -0,0 +1,186 @@
1
+ require 'set'
2
+ require 'fast_stemmer'
3
+ require 'digest'
4
+
5
+
6
+ class TFIDF
7
+
8
+ #Regex pattern of delimiters for splitting text
9
+ @@split_pattern = /[\W]/
10
+
11
+ #Hash function used for generating id for documents as well as terms
12
+ def hash_func(obj)
13
+ return Digest::SHA1.hexdigest obj
14
+ end
15
+
16
+ #=Arguments
17
+ # corpus: an array of strings, one string per document
18
+ #=Returns
19
+ # self
20
+ #Example:
21
+ #tfidf = TFIDF.new(["This is a document...",
22
+ #"Far, far away...",
23
+ #"The quick brown fox jumps over the lazy dog"])
24
+ #tfidf.tf("quick","2fd4e1c67a2d28fced849ee1bb76e7391b93eb12") #=> 1
25
+ #See examples/demo_tf.rb for more
26
+ def initialize(corpus)
27
+ @cardinality = 0
28
+ @docs = {}
29
+ @terms = {}
30
+ @sparse_matrix_term_idx = {}
31
+ @sparse_matrix_doc_idx = {}
32
+ @idf = {}
33
+
34
+ #not in use
35
+ #TODO:
36
+ @dense_matrix = nil
37
+
38
+ if corpus.is_a? String
39
+ @cardinality = 1
40
+ corpus = [corpus]
41
+ else
42
+ @cardinality = corpus.length
43
+ end
44
+ memo = corpus.reduce({:terms => {}, :docs => {}, :sparse_matrix_doc_idx => {}, :sparse_matrix_term_idx => {}}) do |memo, doc|
45
+ doc_id = hash_func doc
46
+ memo[:docs][doc_id] = doc
47
+ tf_single_doc = TFIDF.tf_single(doc)
48
+ memo[:sparse_matrix_doc_idx][doc_id] = tf_single_doc
49
+ tf_single_doc.each do |keyvalue|
50
+ term, freq = keyvalue
51
+ term_id = hash_func term
52
+ lambda {|x|
53
+ if !x.has_key?(term_id)
54
+ x[term_id] = term
55
+ end}.call memo[:terms]
56
+ lambda {|x|
57
+ if x[term] != nil
58
+ x[term][doc_id] = freq
59
+ else
60
+ x[term] = {doc_id => freq}
61
+ end
62
+ }.call memo[:sparse_matrix_term_idx]
63
+ end
64
+ memo
65
+ end
66
+ @docs = memo[:docs]
67
+ @terms = memo[:terms]
68
+ @sparse_matrix_term_idx = memo[:sparse_matrix_term_idx]
69
+ @sparse_matrix_doc_idx = memo[:sparse_matrix_doc_idx]
70
+ @sparse_matrix_term_idx.each {|k, v|
71
+ @idf[k] = TFIDF.idf(v.size, @cardinality)}
72
+ end
73
+
74
+ #Build a TF vector out of a single document(String)
75
+ #=Argument:
76
+ # String valued document
77
+ #=Returns:
78
+ # A hash as in {"term" => frequency, ...}
79
+ def self.tf_single(str)
80
+ if str == nil
81
+ return nil
82
+ else
83
+ dict = str.split(@@split_pattern).reduce({}) {|dict, key|
84
+ key = key.downcase.stem
85
+ unless TFIDF.should_be_ignored_in_TF?(key)
86
+ if dict[key] != nil
87
+ dict[key] += 1
88
+ else
89
+ dict[key] = 1
90
+ end
91
+ end
92
+ dict}
93
+ if block_given?
94
+ yield dict.keys
95
+ end
96
+ dict
97
+ end
98
+ end
99
+
100
+ #Cardinality, or number of documents in corpus
101
+ def cardinality
102
+ return @cardinality
103
+ end
104
+
105
+ #Documents, in a hash, as in: {"doc_id" => "this is a document...", ...}
106
+ def docs
107
+ return @docs
108
+ end
109
+
110
+ #Terms, stored in a similar way as documents
111
+ def terms
112
+ return @terms
113
+ end
114
+
115
+ #Aka DTM, in sparse List of lists(LIL)
116
+ def sparse_matrix_doc_idx
117
+ return @sparse_matrix_doc_idx
118
+ end
119
+
120
+ #Aka TDM, in sparse List of lists(LIL)
121
+ def sparse_matrix_term_idx
122
+ return @sparse_matrix_term_idx
123
+ end
124
+
125
+ #=Arguments
126
+ # t: Term
127
+ # d: Document ID
128
+ #=Returns
129
+ # tf(t,d)
130
+ #
131
+ #Alternatively:
132
+ #=Arguments
133
+ # d: Document ID
134
+ # t: nil(or unspecified)
135
+ #=Returns
136
+ # Hash which contains non-zero tf of all terms
137
+ #
138
+ #Yet another alternative:
139
+ #=Arguments
140
+ # d: nil
141
+ # t: nil
142
+ #=Returns
143
+ # Everything
144
+ def tf(term=nil, doc=nil)
145
+ if doc == nil
146
+ return @sparse_matrix_doc_idx
147
+ elsif term == nil
148
+ return @sparse_matrix_doc_idx[doc]
149
+ else
150
+ return lambda {|x| (x == nil)?0:x}.call(@sparse_matrix_doc_idx[doc][term])
151
+ end
152
+ end
153
+
154
+ #=Arguments
155
+ # (Optional)term: Term
156
+ #=Returns
157
+ # IDF of Term
158
+ def idf(term = nil)
159
+ if term == nil
160
+ return @idf
161
+ else
162
+ return lambda {|x| (x==nil)?0:x}.call(@idf[term])
163
+ end
164
+ end
165
+
166
+ #tf*idf(t,d)
167
+ def tfidf(term, doc)
168
+ return tf(term,doc) * idf(term)
169
+ end
170
+
171
+ #Simply the formula for idf: log2(cardinality / (x + 1)), where x is the number of documents containing the term
172
+ def self.idf(x,cardinality)
173
+ return Math.log2(cardinality.to_f/(x+1).to_f)
174
+ end
175
+
176
+ #If a string is too short or contains non-alphanumeric characters, dump it
177
+ def self.should_be_ignored_in_TF?(str)
178
+ if str.length <= 3
179
+ true
180
+ elsif (/[^[[:alnum:]]]/ =~ str) != nil
181
+ true
182
+ else false
183
+ end
184
+ end
185
+
186
+ end
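The `+ 1` in the denominator of `TFIDF.idf` has a side effect that is easy to miss: a term present in every document gets a negative weight rather than zero. The standalone sketch below reproduces only that formula to show the three cases. `smoothed_idf` is a hypothetical name chosen for this illustration and is not part of the gem.

```ruby
# Standalone sketch of the formula in TFIDF.idf: log2(N / (df + 1)).
# smoothed_idf is a hypothetical helper name, not part of the tfidf gem.
def smoothed_idf(doc_freq, corpus_size)
  Math.log2(corpus_size.to_f / (doc_freq + 1))
end

puts smoothed_idf(0, 2) # term in no document: 1.0
puts smoothed_idf(1, 2) # term in one of two documents: 0.0
puts smoothed_idf(2, 2) # term in both documents: negative
```

The first two values match the expectations in test_idf_function; the third case is not covered by the tests in this diff.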
@@ -0,0 +1,200 @@
1
+ # -*- coding: utf-8 -*-
2
+ require 'test/unit'
3
+ require 'tfidf'
4
+ =begin
5
+ require 'ruby-debug'
6
+ Debugger.start(:post_mortem => true)
7
+ Debugger.settings[:autoeval] = true
8
+ =end
9
+
10
+ class TFIDFTest < Test::Unit::TestCase
11
+ def setup
12
+ @tfidf = TFIDF.new @@corpus
13
+ end
14
+
15
+
16
+ def test_arbitrary_text
17
+ #Just an arbitrary test on a single text, the number isn't definitive
18
+ tfidf = TFIDF.new @@text1
19
+ assert tfidf.terms.size > 50
20
+ end
21
+
22
+
23
+ def test_arbitrary_array_of_texts
24
+ #Yet another arbitrary test, the number isn't definitive
25
+ assert @tfidf.terms.size > 300
26
+ end
27
+
28
+
29
+ def test_should_ignore_short_words
30
+ assert TFIDF.should_be_ignored_in_TF? "any"
31
+ end
32
+
33
+ def test_should_ignore_weirdos
34
+ assert TFIDF.should_be_ignored_in_TF? "\x0081\x0081\x0081"
35
+ end
36
+
37
+ def test_idf_function
38
+ assert TFIDF.idf(1,2) == 0
39
+ assert TFIDF.idf(0,2) == 1
40
+ end
41
+
42
+ def teardown
43
+ #Nothing to tear down
44
+ end
45
+
46
+
47
+ @@text = "Master's thesis
48
+ Students are required to complete a master's thesis, which is a research assignment with a workload corresponding to 30 credits. The thesis is written on a topic related to the student's major and agreed upon between the student and a professor who specialises in the topic of the thesis. The supervisor of the thesis must be a professor in the University, whereas the instructor(s) must have at least a master’s degree.
49
+ Topic application
50
+
51
+ The master's thesis process begins by contacting a professor in the student's field of interest, i.e. major, and agreeing on the topic of the thesis. For well founded reasons, the thesis may also be written on a topic related to the student's minor (if a minor is included in the degree).
52
+
53
+ Once a topic, a supervisor, an instructor and a timetable for the thesis have been determined, an official topic application must be submitted to the Student Services Office. Topic applications are accepted once a month. The Degree Programme Committee confirms the topic and appoints the supervisor and the instructor for the thesis.
54
+
55
+ A topic for the thesis may be applied for when the Bachelor's degree and at least 45 credits of the Master's degree have been completed. Once confirmed, the topic is valid for one year.
56
+ AaltoELEC's master's thesis instructions
57
+
58
+ An introduction to what a master's thesis is and what it's requirements are: What is a Master's Thesis?
59
+ A guide book written by the School's former professor gives general instructions about thesis work and its academic requirements: How to write a diploma thesis
60
+ School of Electrical Engineering thesis template (gzipped tarball), provides a LaTeX-template for your thesis writing process and gives detailed instructions on the form and style of the Master's thesis. Make sure to change the language into \"English\" when using this template. Example Master's Thesis title page and abstract produced with thesis template.
61
+
62
+ Other useful links:
63
+
64
+ Electronic master's thesis database - read completed theses from previous years
65
+ I\x2415nformation resources
66
+ Making a bibliography (Otaniemi main library's instructions)
67
+ How to avoid plagiarism 1 & 2
68
+ Aalto University Code of Academic Integrity and Handling Violations Thereof
69
+ "
70
+ @@text1 = %q_The tf*idf weight (term frequency–inverse document frequency) is a numerical statistic which reflects how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others.
71
+
72
+ Variations of the tf*idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. tf*idf can be successfully used for stop-words filtering in various subject fields including text summarization and classification.[1]
73
+
74
+ One of the simplest ranking functions is computed by summing the tf*idf for each query term; many more sophisticated ranking functions are variants of this simple model._
75
+
76
+ @@text2 = %q_David O'Reilly (1985, Kilkenny, Ireland) is an Irish film director and artist based in Los Angeles, California, USA. He is known for creating animated short films with a stripped down aesthetic.
77
+ Contents
78
+ [hide]
79
+
80
+ 1 Work
81
+ 2 Octocat Adventure
82
+ 3 Short films
83
+ 4 Music videos
84
+ 5 References
85
+ 6 External links
86
+
87
+ [edit] Work
88
+
89
+ Aside from a 1-minute film entitled Ident from which he draws his logo, the earliest work available on his website is WOFL2106[1]. This short draws equally on original designs and popular internet memes, such as Brian Peppers, to create a disturbing landscape of serenity juxtaposed with chaos. This film sets the tone for his entire ouvre, though the direct inclusion of outside memes disappears in his later work.
90
+
91
+ His short film, Please Say Something, was awarded the Golden Bear at the 2009 Berlin International Film Festival, Best Narrative Short at the 2009 Ottawa International Animation Festival[2] and several other awards.[3]
92
+
93
+ He created several animation sequences and props for the 2007 film Son of Rambow.[4] As well as animation for the "guide" sequences in Hitchhiker's Guide to the Galaxy, with Shynola.[5]
94
+
95
+ He created the first video for Irish rock band U2's single "I'll Go Crazy If I Don't Go Crazy Tonight."[6] The video was released on U2.com on July 21, 2009.
96
+
97
+ His latest short film, The External World, premiered at the 67th Venice Film Festival and the 2011 Sundance Film Festival, and has since won over twenty awards on its festival circuit.[7]
98
+ [edit] Octocat Adventure
99
+
100
+ On March 13, 2008, a YouTube user named RANDYPETERS1, a 9-year-old boy from Chicago, submitted a handdrawn animated video about Octocat, a red cat head with eight long legs looking for his parents. The videos featured crude MS Paint animation and a loud, highpitched, child-like voice narrating. On September 7, the fifth, final episode was released, but featured an unexpected twist - about 20 seconds into it, the crude sketchy animation switched to intricately crafted 3D with an orchestral soundtrack; the whole Octocat story (and as such, the Randy Peters persona) was revealed to be by David O'Reilly [8]. In an interview he joked \"I wanted to try experimenting with the Youtube audience and Microsoft Paint. The story for Octocat came to me by reading the bible word-for-word backwards\".[9]
101
+ [edit] Short films
102
+
103
+ The External World (2010, 15 min)
104
+ Please Say Something (2009, 10 min)
105
+ Octocat Adventure (2008, 6 min)[10][11]
106
+ Serial Entoptics (2008, 10 min)
107
+ RGBXYZ (2007, 12 min)[12]
108
+ Wofl2106 (2006, 4 min)
109
+
110
+ [edit] Music videos
111
+
112
+ Szamar Madar (Venetian Snares, 2005, 4 min)
113
+ I'll Go Crazy If I Don't Go Crazy Tonight (U2, 2009, 4 min)[6]_
114
+
115
+ @@text3 = %q_By Melissa Healy, Los Angeles Times
116
+
117
+ March 27, 2012, 5:45 p.m.
118
+ When roasted at 475 degrees, coffee beans are sometimes described as rich and full-bodied. But for the full-bodied person who is not so rich, unroasted coffee beans — green as the day they were picked — may hold the key to cheap and effective weight loss, new research suggests.
119
+
120
+ In a study presented Tuesday at the American Chemical Society's spring national meeting in San Diego, 16 overweight young adults took, by turns, a low dose of green coffee bean extract, a high dose of the supplement, and a placebo. Though the study was small, the results were striking: Subjects lost an average of 17.5 pounds in 22 weeks and reduced their overall body weight by 10.5%.
121
+
122
+ If green coffee extract were a medication seeking approval from the Food and Drug Administration, these results would make it a viable candidate — more than 35% of subjects lost more than 5% of their body weight, and weight loss appeared to be greater while subjects were taking the pills than when they were on the placebo.
123
+
124
+ But as a dietary supplement, green coffee extract does not require the FDA's blessing. In fact, it is already available as a naturopathic medicine and antioxidant.
125
+
126
+ Joe Vinson, the University of Scranton chemist who conducted the pilot study, said the findings should pave the way for more rigorous research on coffee bean extract's effects. A larger trial involving 60 people is being planned.
127
+
128
+ Vinson, whose research focuses on plant polyphenols and their effects on human health, said it appears that green coffee bean extract may work by reducing the absorption of fat and glucose in the gut; it may also reduce insulin levels, which would improve metabolic function. There were no signs of ill effects on any subjects, Vinson reported Tuesday.
129
+
130
+ The study used a "cross-over" design, which allowed each subject to serve as his or her own comparison group. For six weeks, volunteers swallowed capsules three times a day, ingesting either 700 or 1,050 milligrams of green coffee extract a day or taking a placebo. After a two-week break, they moved, round-robin style, to another arm of the trial.
131
+
132
+ Subjects did not change their calorie intake over the course of the trial. But the more extract they consumed, the more weight and fat they lost. Altogether, they reduced their body fat by 16%, on average.
133
+
134
+ Of the 16 volunteers, six wound up with a body mass index in the healthful range.
135
+
136
+ One downside is that the extract is "extremely bitter." It would be difficult to take without a lot of water, Vinson reported.
137
+
138
+ At roughly $20 per month, however, green coffee extract is much less expensive than any of the weight-loss medications available over the counter or by prescription.
139
+
140
+ The trial was conducted in India and paid for by Applied Food Sciences Inc. of Austin, Tex., a manufacturer of green coffee bean extract.
141
+
142
+ The pilot study drew strong cautions from several scientists who weren't involved in the research.
143
+
144
+ "This is certainly a provocative study," said Dr. Gerald Weissmann, a physician and biochemist at New York University. But he said nutrition experts would want assurances that green coffee beans do not cause "malabsorption" within the human gut — a condition that would lead to weight loss as well as malnutrition, heart arrhythmias and other problems because vitamins and minerals are not passing through the intestine.
145
+
146
+ Dr. Arthur Grollman, a pharmacologist at the State University of New York at Stony Brook, said coffee beans contain about 250 different chemicals — some with positive and others with negative effects on human health. Though Vinson identified polyphenols and chlorogenic acid as the agents that appear to promote weight loss, Grollman said that claim needed further study. In the meantime, he said, consuming an extract that contains both good and bad chemicals in dense concentration seems an unwise thing to do._
147
+
148
+ @@text4 = %q_By Andrea Mustain
149
+ OurAmazingPlanet
150
+ updated 3/27/2012 6:34:52 PM ET
151
+
152
+ Print
153
+ Font:
154
+
155
+ James Cameron's record-setting dive to Earth's deepest spot has sparked a wave of excitement among many in the science community, who are not only heralding the new technology produced by the Hollywood veteran but lauding the renewed focus the project has put on the deep ocean.
156
+
157
+ "It's wonderful, absolutely wonderful," said Robert J. Stern, a geoscientist at the University of Texas at Dallas. He is one of several deep-sea researchers who said they'd been closely following Cameron's bid to return human observers to the Challenger Deep, a trough within the Mariana Trench more than 35,000 feet (10,700 meters) below the ocean surface.
158
+
159
+ Cameron's roundtrip earlier this week to the deepest place on Earth lasted just under seven hours. The only previous time humans visited this spot was in 1960.
160
+
161
+ News of Cameron's successful solo dive "gave me goose bumps," said Cindy Lee Van Dover, director of the marine laboratory at Duke University's Nicholas School of the Environment.
+
+ "I think it's a really good thing," said Bruce Robison, a senior scientist at the Monterey Bay Aquarium Research Institute in California. [ Infographic: James Cameron's Mariana Trench Dive ]
+
+ Opening the deep
+ All three scientists have spent many hours aboard some of the few deep-diving research submersibles on Earth and said they hoped Cameron's technology eventually will prove a boon to researchers wishing to collect samples and perhaps even conduct experiments in the deepest reaches of the sea, a place that until now has been off-limits to humans.
+
+ "This establishes that the technology exists to allow that to happen, so we shouldn't constrain our thinking about approaching work in that extreme habitat," Robison told OurAmazingPlanet. "This is a technological breakthrough and a huge accomplishment on Cameron's part, and I'm very pleased that he's done it; but let's hope it opens the door for more."
+
+ Cameron's team hasn't confirmed what (if any) samples the filmmaker and explorer retrieved from the Challenger Deep during his three-hour seafloor sojourn. The sub is equipped with a sampling arm, among other research tools. However, Cameron did describe a bleak view through the windows of his lime-green submersible.
+
+ More science news from msnbc.com
+ Image: Brain regions
+ Yamada et al. / Nature Comm.
+ Scientists take a look inside a jury's brains
+
+ Science editor Alan Boyle's blog: Jurors show a characteristic pattern of brain activity when they decide to be lenient on a criminal, and the strength of that pattern can vary from juror to juror, researchers say.
+ 'Invisibility cloak' can serve as heat shield
+ Dolphin society adopts freewheeling lifestyle
+ Ancient stone monolith likely marked seasons
+
+ "It looked like the moon," Cameron told National Geographic reporters upon his return to the surface world. [ See photos from Cameron's historic dive ]
+
+ For scientists, such an assessment was hardly discouraging.
+
+ "I was rooting for him to land and find strange-looking animals," Van Dover said, but she added she wasn't surprised. She said the seafloor is vast, conditions are harsh, and life is likely sparsely spread.
+
+ "Three hours is just a drop in the bucket, and with more hours I think he's going to discover cool things," Van Dover said. "How many years have we been studying the ocean? And it took until 1977 to discover hydrothermal vents." Van Dover specializes in researching the strange creatures that congregate around the seafloor vents, which spew super-heated water laced with trace chemicals that sustain the animals.
+
+ Robison, a veteran deep-sea ecologist, said that whatever does live in the trench, which is nearly a mile deeper than Mount Everest is tall, will be of great interest to scientists.
+
+ "Anything that has adapted to thrive in that habitat is going to have some really remarkable adaptations," he said. "But most of the animals in the ocean don't live on the bottom, so there's an enormous potential for discovery up off the bottom as well."
+
+ Much of the deepest ocean is unreachable via state-owned submersibles, which at this point can dive no more than 21,000 feet (6,500 m). Only Japan's Shinkai 6500 has reached such depths. The United States is refurbishing Alvin, its deepest-diving craft, to be able to reach 21,000 feet within th_
+
+
+ @@corpus = [@@text,@@text1,@@text2,@@text3,@@text4]
+
+
+ end
metadata ADDED
@@ -0,0 +1,60 @@
+ --- !ruby/object:Gem::Specification
+ name: tfidf
+ version: !ruby/object:Gem::Version
+   version: 0.0.0
+ prerelease:
+ platform: ruby
+ authors:
+ - Yu Shen
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2012-03-17 00:00:00.000000000Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: fast-stemmer
+   requirement: &16818040 !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: *16818040
+ description: Calculate TF-IDF out of a text, resulting in a hash with term as key,
+   frequency as value. Sorry for taking the convenient name for myself! See examples/demo_tf.rb
+   for usage
+ email: yushen83@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - lib/tfidf.rb
+ - examples/demo_tf.rb
+ - test/test_TFIDF.rb
+ homepage: https://github.com/yushen
+ licenses: []
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 1.8.10
+ signing_key:
+ specification_version: 3
+ summary: A W.I.P implementation of TF-IDF
+ test_files: []
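
The gemspec description above says the library turns text into a hash keyed by term. As a rough illustration of that idea only, here is a minimal TF-IDF sketch in plain Ruby. It is not the code in lib/tfidf.rb: the method names (`tokenize`, `term_frequencies`, `inverse_document_frequencies`, `tfidf`) are hypothetical, it skips the stemming that the `fast-stemmer` dependency suggests the gem performs, and it assumes the common weighting tf = count / document length and idf = log(N / document frequency).

```ruby
# Hypothetical sketch of TF-IDF; none of these method names come from the gem.

# Split text into lowercase word tokens (no stemming, unlike the gem,
# which declares a dependency on fast-stemmer).
def tokenize(text)
  text.downcase.scan(/[a-z0-9']+/)
end

# Term frequency: occurrences of each term divided by the document's token count.
def term_frequencies(tokens)
  counts = Hash.new(0)
  tokens.each { |t| counts[t] += 1 }
  total = tokens.size.to_f
  counts.each_with_object({}) { |(term, n), tf| tf[term] = n / total }
end

# Inverse document frequency: log(N / number of documents containing the term).
def inverse_document_frequencies(tokenized_docs)
  doc_counts = Hash.new(0)
  tokenized_docs.each { |tokens| tokens.uniq.each { |t| doc_counts[t] += 1 } }
  n = tokenized_docs.size.to_f
  doc_counts.each_with_object({}) { |(term, df), idf| idf[term] = Math.log(n / df) }
end

# Return one { term => tf-idf weight } hash per document in the corpus.
def tfidf(corpus)
  tokenized = corpus.map { |doc| tokenize(doc) }
  idf = inverse_document_frequencies(tokenized)
  tokenized.map do |tokens|
    term_frequencies(tokens).each_with_object({}) do |(term, tf), scores|
      scores[term] = tf * idf[term]
    end
  end
end

corpus = ["the quick brown fox", "the lazy dog", "the fox and the dog"]
scores = tfidf(corpus)
```

With this weighting, a term that occurs in every document (here "the") gets a weight of zero, while a term unique to one document (here "quick") gets the highest idf.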