tfidf 0.0.0
- data/examples/demo_tf.rb +142 -0
- data/lib/tfidf.rb +186 -0
- data/test/test_TFIDF.rb +200 -0
- metadata +60 -0
data/examples/demo_tf.rb
ADDED
@@ -0,0 +1,142 @@
+# -*- coding: utf-8 -*-
+require 'tfidf'
+require 'pp'
+require 'ruby-debug'
+Debugger.settings[:autoeval] = true
+Debugger.start
+Debugger.post_mortem
+
+corpus = ["The quick brown fox jumps over the lazy dog",
+          %Q_Fox is a common name for many species of omnivorous mammals belonging to the Canidae family. Foxes are small to medium-sized canids (slightly smaller than the medium-sized domestic dog), characterized by possessing a long narrow snout, and a bushy tail (or brush).
+Members of about 37 species are referred to as foxes, of which only 12 species actually belong to the Vulpes genus of "true foxes". By far the most common and widespread species of fox is the red fox (Vulpes vulpes), although various species are found on almost every continent. The presence of fox-like carnivores all over the globe, together with their widespread reputation for cunning, has contributed to their appearance in popular culture and folklore in many societies around the world (see also Foxes in culture)._,
+          %Q_On March 13, 2008, a YouTube user named RANDYPETERS1, a 9-year-old boy from Chicago, submitted a handdrawn animated video about Octocat, a red cat head with eight long legs looking for his parents. The videos featured crude MS Paint animation and a loud, highpitched, child-like voice narrating. On September 7, the fifth, final episode was released, but featured an unexpected twist - about 20 seconds into it, the crude sketchy animation switched to intricately crafted 3D with an orchestral soundtrack; the whole Octocat story (and as such, the Randy Peters persona) was revealed to be by David O'Reilly [8]. In an interview he joked "I wanted to try experimenting with the Youtube audience and Microsoft Paint. The story for Octocat came to me by reading the bible word-for-word backwards".[9]_ ,
+          "Master's thesis
+Students are required to complete a master's thesis, which is a research assignment with a workload corresponding to 30 credits. The thesis is written on a topic related to the student's major and agreed upon between the student and a professor who specialises in the topic of the thesis. The supervisor of the thesis must be a professor in the University, whereas the instructor(s) must have at least a master’s degree.
+Topic application
+
+The master's thesis process begins by contacting a professor in the student's field of interest, i.e. major, and agreeing on the topic of the thesis. For well founded reasons, the thesis may also be written on a topic related to the student's minor (if a minor is included in the degree).
+
+Once a topic, a supervisor, an instructor and a timetable for the thesis have been determined, an official topic application must be submitted to the Student Services Office. Topic applications are accepted once a month. The Degree Programme Committee confirms the topic and appoints the supervisor and the instructor for the thesis.
+
+A topic for the thesis may be applied for when the Bachelor's degree and at least 45 credits of the Master's degree have been completed. Once confirmed, the topic is valid for one year.
+AaltoELEC's master's thesis instructions
+
+An introduction to what a master's thesis is and what it's requirements are: What is a Master's Thesis?
+A guide book written by the School's former professor gives general instructions about thesis work and its academic requirements: How to write a diploma thesis
+School of Electrical Engineering thesis template (gzipped tarball), provides a LaTeX-template for your thesis writing process and gives detailed instructions on the form and style of the Master's thesis. Make sure to change the language into \"English\" when using this template. Example Master's Thesis title page and abstract produced with thesis template.
+
+Other useful links:
+
+Electronic master's thesis database - read completed theses from previous years
+I\x2415nformation resources
+Making a bibliography (Otaniemi main library's instructions)
+How to avoid plagiarism 1 & 2
+Aalto University Code of Academic Integrity and Handling Violations Thereof
+" ,
+          %q_The tf*idf weight (term frequency–inverse document frequency) is a numerical statistic which reflects how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others.
+
+Variations of the tf*idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. tf*idf can be successfully used for stop-words filtering in various subject fields including text summarization and classification.[1]
+
+One of the simplest ranking functions is computed by summing the tf*idf for each query term; many more sophisticated ranking functions are variants of this simple model._ ,
+          %q_By Melissa Healy, Los Angeles Times
+
+March 27, 2012, 5:45 p.m.
+When roasted at 475 degrees, coffee beans are sometimes described as rich and full-bodied. But for the full-bodied person who is not so rich, unroasted coffee beans — green as the day they were picked — may hold the key to cheap and effective weight loss, new research suggests.
+
+In a study presented Tuesday at the American Chemical Society's spring national meeting in San Diego, 16 overweight young adults took, by turns, a low dose of green coffee bean extract, a high dose of the supplement, and a placebo. Though the study was small, the results were striking: Subjects lost an average of 17.5 pounds in 22 weeks and reduced their overall body weight by 10.5%.
+
+If green coffee extract were a medication seeking approval from the Food and Drug Administration, these results would make it a viable candidate — more than 35% of subjects lost more than 5% of their body weight, and weight loss appeared to be greater while subjects were taking the pills than when they were on the placebo.
+
+But as a dietary supplement, green coffee extract does not require the FDA's blessing. In fact, it is already available as a naturopathic medicine and antioxidant.
+
+Joe Vinson, the University of Scranton chemist who conducted the pilot study, said the findings should pave the way for more rigorous research on coffee bean extract's effects. A larger trial involving 60 people is being planned.
+
+Vinson, whose research focuses on plant polyphenols and their effects on human health, said it appears that green coffee bean extract may work by reducing the absorption of fat and glucose in the gut; it may also reduce insulin levels, which would improve metabolic function. There were no signs of ill effects on any subjects, Vinson reported Tuesday.
+
+The study used a "cross-over" design, which allowed each subject to serve as his or her own comparison group. For six weeks, volunteers swallowed capsules three times a day, ingesting either 700 or 1,050 milligrams of green coffee extract a day or taking a placebo. After a two-week break, they moved, round-robin style, to another arm of the trial.
+
+Subjects did not change their calorie intake over the course of the trial. But the more extract they consumed, the more weight and fat they lost. Altogether, they reduced their body fat by 16%, on average.
+
+Of the 16 volunteers, six wound up with a body mass index in the healthful range.
+
+One downside is that the extract is "extremely bitter." It would be difficult to take without a lot of water, Vinson reported.
+
+At roughly $20 per month, however, green coffee extract is much less expensive than any of the weight-loss medications available over the counter or by prescription.
+
+The trial was conducted in India and paid for by Applied Food Sciences Inc. of Austin, Tex., a manufacturer of green coffee bean extract.
+
+The pilot study drew strong cautions from several scientists who weren't involved in the research.
+
+"This is certainly a provocative study," said Dr. Gerald Weissmann, a physician and biochemist at New York University. But he said nutrition experts would want assurances that green coffee beans do not cause "malabsorption" within the human gut — a condition that would lead to weight loss as well as malnutrition, heart arrhythmias and other problems because vitamins and minerals are not passing through the intestine.
+
+Dr. Arthur Grollman, a pharmacologist at the State University of New York at Stony Brook, said coffee beans contain about 250 different chemicals — some with positive and others with negative effects on human health. Though Vinson identified polyphenols and chlorogenic acid as the agents that appear to promote weight loss, Grollman said that claim needed further study. In the meantime, he said, consuming an extract that contains both good and bad chemicals in dense concentration seems an unwise thing to do._,
+          %q_By Andrea Mustain
+OurAmazingPlanet
+updated 3/27/2012 6:34:52 PM ET
+
+Print
+Font:
+
+James Cameron's record-setting dive to Earth's deepest spot has sparked a wave of excitement among many in the science community, who are not only heralding the new technology produced by the Hollywood veteran but lauding the renewed focus the project has put on the deep ocean.
+
+"It's wonderful, absolutely wonderful," said Robert J. Stern, a geoscientist at the University of Texas at Dallas. He is one of several deep-sea researchers who said they'd been closely following Cameron's bid to return human observers to the Challenger Deep, a trough within the Mariana Trench more than 35,000 feet (10,700 meters) below the ocean surface.
+
+Cameron's roundtrip earlier this week to the deepest place on Earth lasted just under seven hours. The only previous time humans visited this spot was in 1960.
+
+News of Cameron's successful solo dive "gave me goose bumps," said Cindy Lee Van Dover, director of the marine laboratory at Duke University's Nicholas School of the Environment.
+
+"I think it's a really good thing," said Bruce Robison, a senior scientist at the Monterey Bay Aquarium Research Institute in California. [ Infographic: James Cameron's Mariana Trench Dive ]
+
+Opening the deep
+All three scientists have spent many hours aboard some of the few deep-diving research submersibles on Earth and said they hoped Cameron's technology eventually will prove a boon to researchers wishing to collect samples and perhaps even conduct experiments in the deepest reaches of the sea, a place that until now has been off-limits to humans.
+
+"This establishes that the technology exists to allow that to happen, so we shouldn't constrain our thinking about approaching work in that extreme habitat," Robison told OurAmazingPlanet. "This is a technological breakthrough and a huge accomplishment on Cameron's part, and I'm very pleased that he's done it; but let's hope it opens the door for more."
+
+Cameron's team hasn't confirmed what (if any) samples the filmmaker and explorer retrieved from the Challenger Deep during his three-hour seafloor sojourn. The sub is equipped with a sampling arm, among other research tools. However, Cameron did describe a bleak view through the windows of his lime-green submersible.
+
+More science news from msnbc.com
+Image: Brain regions
+Yamada et al. / Nature Comm.
+Scientists take a look inside a jury's brains
+
+Science editor Alan Boyle's blog: Jurors show a characteristic pattern of brain activity when they decide to be lenient on a criminal, and the strength of that pattern can vary from juror to juror, researchers say.
+'Invisibility cloak' can serve as heat shield
+Dolphin society adopts freewheeling lifestyle
+Ancient stone monolith likely marked seasons
+
+"It looked like the moon," Cameron told National Geographic reporters upon his return to the surface world. [ See photos from Cameron's historic dive ]
+
+For scientists, such an assessment was hardly discouraging.
+
+"I was rooting for him to land and find strange-looking animals," Van Dover said, but she added she wasn't surprised. She said the seafloor is vast, conditions are harsh, and life is likely sparsely spread.
+
+"Three hours is just a drop in the bucket, and with more hours I think he's going to discover cool things," Van Dover said. "How many years have we been studying the ocean? And it took until 1977 to discover hydrothermal vents." Van Dover specializes in researching the strange creatures that congregate around the seafloor vents, which spew super-heated water laced with trace chemicals that sustain the animals.
+
+Robison, a veteran deep-sea ecologist, said that whatever does live in the trench, which is nearly a mile deeper than Mount Everest is tall, will be of great interest to scientists.
+
+"Anything that has adapted to thrive in that habitat is going to have some really remarkable adaptations," he said. "But most of the animals in the ocean don't live on the bottom, so there's an enormous potential for discovery up off the bottom as well."
+
+Much of the deepest ocean is unreachable via state-owned submersibles, which at this point can dive no more than 21,000 feet (6,500 m). Only Japan's Shinkai 6500 has reached such depths. The United States is refurbishing Alvin, its deepest-diving craft, to be able to reach 21,000 feet within th_ ]
+
+tfidf = TFIDF.new corpus
+
+puts "Documents in the corpus:"
+tfidf.docs.each {|k,v| puts "Document ID: #{k} => document: #{v}"}
+
+puts "Terms in the corpus"
+tfidf.terms.each {|k,v| puts "Term ID: #{k} => term: #{v}"}
+
+puts "Document-Term Matrix, sparse List of lists(LIL)"
+tfidf.sparse_matrix_doc_idx.each {|e| puts e}
+
+puts "Term-Document Matrix, sparse LIL"
+tfidf.sparse_matrix_term_idx.each {|e| puts e}
+
+puts "Term frequencies of all terms, per document"
+puts tfidf.tf
+
+puts "Inverse Document Frequency of word: octocat"
+puts tfidf.idf("octocat")
+
+puts "TF-IDF of word: octocat in document 7e38fa195cee92d2e7d834095d6938a89b5fdd58"
+puts tfidf.tfidf("octocat","7e38fa195cee92d2e7d834095d6938a89b5fdd58")
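The demo passes a hard-coded 40-character document ID to `tfidf.tfidf`. In this gem a document ID is the SHA1 hex digest of the document text (see `hash_func` in data/lib/tfidf.rb), so the ID can be derived instead of pasted. A minimal sketch, assuming only Ruby's standard `digest` library; `doc_id_for` is a hypothetical helper name, not part of the gem:

```ruby
require 'digest'

# Hypothetical helper (not part of the gem): mirrors what TFIDF#hash_func does,
# i.e. the document ID is the SHA1 hex digest of the raw document string.
def doc_id_for(text)
  Digest::SHA1.hexdigest(text)
end

doc = "The quick brown fox jumps over the lazy dog"
puts doc_id_for(doc)
```

Because the ID depends on every byte of the text, any edit to a corpus string (even whitespace) produces a different ID, and a hard-coded ID stops matching.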
data/lib/tfidf.rb
ADDED
@@ -0,0 +1,186 @@
+require 'set'
+require 'fast_stemmer'
+require 'digest'
+
+
+class TFIDF
+
+  #Regex pattern of delimiters for splitting text
+  @@split_pattern = /[\W]/
+
+  #Hash function used for generating id for documents as well as terms
+  def hash_func(obj)
+    return Digest::SHA1.hexdigest obj
+  end
+
+  #=Arguments
+  # corpus: an array of strings, one string per document
+  #=Returns
+  # self
+  #Example (terms of 3 characters or fewer, such as "fox", are ignored):
+  #tfidf = TFIDF.new(["This is a document...",
+  #"Far, far away...",
+  #"The quick brown fox jumps over the lazy dog"])
+  #tfidf.tf("quick","2fd4e1c67a2d28fced849ee1bb76e7391b93eb12") #=> 1
+  #See examples/demo_tf.rb for more
+  def initialize(corpus)
+    @cardinality = 0
+    @docs = {}
+    @terms = {}
+    @sparse_matrix_term_idx = {}
+    @sparse_matrix_doc_idx = {}
+    @idf = {}
+
+    #not in use
+    #TODO:
+    @dense_matrix = nil
+
+    if corpus.is_a? String
+      @cardinality = 1
+      corpus = [corpus]
+    else
+      @cardinality = corpus.length
+    end
+    memo = corpus.reduce({:terms => {}, :docs => {}, :sparse_matrix_doc_idx => {}, :sparse_matrix_term_idx => {}}) do |memo, doc|
+      doc_id = hash_func doc
+      memo[:docs][doc_id] = doc
+      tf_single_doc = TFIDF.tf_single(doc)
+      memo[:sparse_matrix_doc_idx][doc_id] = tf_single_doc
+      tf_single_doc.each do |keyvalue|
+        term, freq = keyvalue
+        term_id = hash_func term
+        lambda {|x|
+          if !x.has_key?(term_id)
+            x[term_id] = term
+          end}.call memo[:terms]
+        lambda {|x|
+          if x[term] != nil
+            x[term][doc_id] = freq
+          else
+            x[term] = {doc_id => freq}
+          end
+        }.call memo[:sparse_matrix_term_idx]
+      end
+      memo
+    end
+    @docs = memo[:docs]
+    @terms = memo[:terms]
+    @sparse_matrix_term_idx = memo[:sparse_matrix_term_idx]
+    @sparse_matrix_doc_idx = memo[:sparse_matrix_doc_idx]
+    @sparse_matrix_term_idx.each {|k, v|
+      @idf[k] = TFIDF.idf(v.size, @cardinality)}
+  end
+
+  #Build a TF vector out of a single document(String)
+  #=Argument:
+  # String valued document
+  #=Returns:
+  # A hash as in {"term" => frequency, ...}
+  def self.tf_single(str)
+    if str == nil
+      return nil
+    else
+      dict = str.split(pattern=@@split_pattern).reduce({}) {|dict, key|
+        key = key.stem.downcase
+        unless TFIDF.should_be_ignored_in_TF?(key)
+          if dict[key] != nil
+            dict[key] += 1
+          else
+            dict[key] = 1
+          end
+        end
+        dict}
+      if block_given?
+        yield dict.keys
+      end
+      dict
+    end
+  end
+
+  #Cardinality, or number of documents in corpus
+  def cardinality
+    return @cardinality
+  end
+
+  #Documents, in a hash, as in: {"doc_id" => "this is a document...", ...}
+  def docs
+    return @docs
+  end
+
+  #Terms, stored in a similar way as documents
+  def terms
+    return @terms
+  end
+
+  #Aka DTM, in sparse List of lists(LIL)
+  def sparse_matrix_doc_idx
+    return @sparse_matrix_doc_idx
+  end
+
+  #Aka TDM, in sparse List of lists(LIL)
+  def sparse_matrix_term_idx
+    return @sparse_matrix_term_idx
+  end
+
+  #=Arguments
+  # t: Term
+  # d: Document ID
+  #=Returns
+  # tf(t,d)
+  #
+  #Alternatively:
+  #=Arguments
+  # d: Document ID
+  # t: nil(or unspecified)
+  #=Returns
+  # Hash which contains non-zero tf of all terms
+  #
+  #Yet another alternative:
+  #=Arguments
+  # d: nil
+  # t: nil
+  #=Returns
+  # Everything
+  def tf(term=nil, doc=nil)
+    if doc == nil #no document ID given: return the whole DTM
+      return @sparse_matrix_doc_idx
+    elsif term == nil
+      return @sparse_matrix_doc_idx[doc]
+    else
+      return lambda {|x| (x == nil)?0:x}.call(@sparse_matrix_doc_idx[doc][term])
+    end
+  end
+
+  #=Arguments
+  # (Optional)term: Term
+  #=Returns
+  # IDF of Term
+  def idf(term = nil)
+    if term == nil
+      return @idf
+    else
+      return lambda {|x| (x==nil)?0:x}.call(@idf[term])
+    end
+  end
+
+  #tf*idf(t,d)
+  def tfidf(term, doc)
+    return tf(term,doc) * idf(term)
+  end
+
+  #Simply the formula for idf: log2(cardinality / (x + 1)), x = number of documents containing the term
+  def self.idf(x,cardinality)
+    return Math.log2(cardinality.to_f/(x+1).to_f)
+  end
+
+  #If a string is 3 characters or shorter, or contains non-alphanumeric characters, dump it
+  def self.should_be_ignored_in_TF?(str)
+    if str.length <= 3
+      true
+    elsif (/[^[[:alnum:]]]/ =~ str) != nil
+      true
+    else false
+    end
+  end
+
+end
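The weighting in `TFIDF.idf` and `TFIDF#tfidf` can be checked by hand on a toy corpus. The sketch below is standalone and only reuses the formula from the diff, log2(N / (df + 1)); the pre-tokenized corpus and the local `idf` function are illustrative assumptions, and it skips the gem's stemming and short-word filtering.

```ruby
# Same smoothing as TFIDF.idf in the diff: log2(N / (df + 1)).
def idf(doc_freq, corpus_size)
  Math.log2(corpus_size.to_f / (doc_freq + 1))
end

# Hypothetical toy corpus, already tokenized (no stemming, no filtering).
docs = [
  %w[coffee bean extract coffee],
  %w[deep ocean dive],
  %w[thesis topic professor],
  %w[ranking query document]
]

term = "coffee"
tf = docs[0].count(term)                   # occurrences in the first document
df = docs.count { |d| d.include?(term) }   # number of documents containing the term
weight = tf * idf(df, docs.size)           # 2 * log2(4 / 2)

puts weight
```

One consequence of the `+ 1` in the denominator: a term that appears in every document gets a negative weight (for example `idf(4, 4)` is log2(4/5)), and a term missing from only one document gets exactly 0.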
data/test/test_TFIDF.rb
ADDED
@@ -0,0 +1,200 @@
+# -*- coding: utf-8 -*-
+require 'test/unit'
+require 'tfidf'
+=begin
+require 'ruby-debug'
+Debugger.start(:post_mortem => true)
+Debugger.settings[:autoeval] = true
+=end
+
+class TFIDFTest < Test::Unit::TestCase
+  def setup
+    @tfidf = TFIDF.new @@corpus
+  end
+
+
+  def test_arbitrary_text
+    #Just an arbitrary test on a single text, the number isn't definitive
+    tfidf = TFIDF.new @@text1
+    assert tfidf.terms.size > 50
+  end
+
+
+  def test_arbitrary_array_of_texts
+    #YA arbitrary test, the number isn't definitive
+    assert @tfidf.terms.size > 300
+  end
+
+
+  def test_should_ignore_short_words
+    assert TFIDF.should_be_ignored_in_TF? "any"
+  end
+
+  def test_should_ignore_weirdos
+    assert TFIDF.should_be_ignored_in_TF? "\x0081\x0081\x0081"
+  end
+
+  def test_idf_function
+    assert TFIDF.idf(1,2) == 0
+    assert TFIDF.idf(0,2) == 1
+  end
+
+  def teardown
+    #I don't do nothing
+  end
+
+
|
47
|
+
@@text = "Master's thesis
|
48
|
+
Students are required to complete a master's thesis, which is a research assignment with a workload corresponding to 30 credits. The thesis is written on a topic related to the student's major and agreed upon between the student and a professor who specialises in the topic of the thesis. The supervisor of the thesis must be a professor in the University, whereas the instructor(s) must have at least a master’s degree.
|
49
|
+
Topic application
|
50
|
+
|
51
|
+
The master's thesis process begins by contacting a professor in the student's field of interest, i.e. major, and agreeing on the topic of the thesis. For well founded reasons, the thesis may also be written on a topic related to the student's minor (if a minor is included in the degree).
|
52
|
+
|
53
|
+
Once a topic, a supervisor, an instructor and a timetable for the thesis have been determined, an official topic application must be submitted to the Student Services Office. Topic applications are accepted once a month. The Degree Programme Committee confirms the topic and appoints the supervisor and the instructor for the thesis.
|
54
|
+
|
55
|
+
A topic for the thesis may be applied for when the Bachelor's degree and at least 45 credits of the Master's degree have been completed. Once confirmed, the topic is valid for one year.
|
56
|
+
AaltoELEC's master's thesis instructions
|
57
|
+
|
58
|
+
An introduction to what a master's thesis is and what it's requirements are: What is a Master's Thesis?
|
59
|
+
A guide book written by the School's former professor gives general instructions about thesis work and its academic requirements: How to write a diploma thesis
|
60
|
+
School of Electrical Engineering thesis template (gzipped tarball), provides a LaTeX-template for your thesis writing process and gives detailed instructions on the form and style of the Master's thesis. Make sure to change the language into \"English\" when using this template. Example Master's Thesis title page and abstract produced with thesis template.
|
61
|
+
|
62
|
+
Other useful links:
|
63
|
+
|
64
|
+
Electronic master's thesis database - read completed theses from previous years
|
65
|
+
I\x2415nformation resources
|
66
|
+
Making a bibliography (Otaniemi main library's instructions)
|
67
|
+
How to avoid plagiarism 1 & 2
|
68
|
+
Aalto University Code of Academic Integrity and Handling Violations Thereof
|
69
|
+
"
|
70
|
+
@@text1 = %q_The tf*idf weight (term frequency–inverse document frequency) is a numerical statistic which reflects how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others.
|
71
|
+
|
72
|
+
Variations of the tf*idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. tf*idf can be successfully used for stop-words filtering in various subject fields including text summarization and classification.[1]
|
73
|
+
|
74
|
+
One of the simplest ranking functions is computed by summing the tf*idf for each query term; many more sophisticated ranking functions are variants of this simple model._
|
75
|
+
|
76
|
+
@@text2 = %q_David O'Reilly (1985, Kilkenny, Ireland) is an Irish film director and artist based in Los Angeles, California, USA. He is known for creating animated short films with a stripped down aesthetic.
|
77
|
+
Contents
|
78
|
+
[hide]
|
79
|
+
|
80
|
+
1 Work
|
81
|
+
2 Octocat Adventure
|
82
|
+
3 Short films
|
83
|
+
4 Music videos
|
84
|
+
5 References
|
85
|
+
6 External links
|
86
|
+
|
87
|
+
[edit] Work
|
88
|
+
|
89
|
+
Aside from a 1-minute film entitled Ident from which he draws his logo, the earliest work available on his website is WOFL2106[1]. This short draws equally on original designs and popular internet memes, such as Brian Peppers, to create a disturbing landscape of serenity juxtaposed with chaos. This film sets the tone for his entire ouvre, though the direct inclusion of outside memes disappears in his later work.
|
90
|
+
|
91
|
+
His short film, Please Say Something, was awarded the Golden Bear at the 2009 Berlin International Film Festival, Best Narrative Short at the 2009 Ottawa International Animation Festival[2] and several other awards.[3]
|
92
|
+
|
93
|
+
He created several animation sequences and props for the 2007 film Son of Rambow.[4] As well as animation for the "guide" sequences in Hitchhiker's Guide to the Galaxy, with Shynola.[5]
|
94
|
+
|
95
|
+
He created the first video for Irish rock band U2's single "I'll Go Crazy If I Don't Go Crazy Tonight."[6] The video was released on U2.com on July 21, 2009.
|
96
|
+
|
97
|
+
His latest short film, The External World, premiered at the 67th Venice Film Festival and the 2011 Sundance Film Festival, and has since won over twenty awards on its festival circuit.[7]
|
98
|
+
[edit] Octocat Adventure
|
99
|
+
|
100
|
+
On March 13, 2008, a YouTube user named RANDYPETERS1, a 9-year-old boy from Chicago, submitted a handdrawn animated video about Octocat, a red cat head with eight long legs looking for his parents. The videos featured crude MS Paint animation and a loud, highpitched, child-like voice narrating. On September 7, the fifth, final episode was released, but featured an unexpected twist - about 20 seconds into it, the crude sketchy animation switched to intricately crafted 3D with an orchestral soundtrack; the whole Octocat story (and as such, the Randy Peters persona) was revealed to be by David O'Reilly [8]. In an interview he joked \"I wanted to try experimenting with the Youtube audience and Microsoft Paint. The story for Octocat came to me by reading the bible word-for-word backwards\".[9]
|
101
|
+
[edit] Short films
|
102
|
+
|
103
|
+
The External World (2010, 15 min)
|
104
|
+
Please Say Something (2009, 10 min)
|
105
|
+
Octocat Adventure (2008, 6 min)[10][11]
|
106
|
+
Serial Entoptics (2008, 10 min)
|
107
|
+
RGBXYZ (2007, 12 min)[12]
|
108
|
+
Wofl2106 (2006, 4 min)
|
109
|
+
|
110
|
+
[edit] Music videos
|
111
|
+
|
112
|
+
Szamar Madar (Venetian Snares, 2005, 4 min)
|
113
|
+
I'll Go Crazy If I Don't Go Crazy Tonight (U2, 2009, 4 min)[6]_
|
114
|
+
|
115
|
+
@@text3 = %q_By Melissa Healy, Los Angeles Times
|
116
|
+
|
117
|
+
March 27, 2012, 5:45 p.m.
|
118
|
+
When roasted at 475 degrees, coffee beans are sometimes described as rich and full-bodied. But for the full-bodied person who is not so rich, unroasted coffee beans — green as the day they were picked — may hold the key to cheap and effective weight loss, new research suggests.
|
119
|
+
|
120
|
+
In a study presented Tuesday at the American Chemical Society's spring national meeting in San Diego, 16 overweight young adults took, by turns, a low dose of green coffee bean extract, a high dose of the supplement, and a placebo. Though the study was small, the results were striking: Subjects lost an average of 17.5 pounds in 22 weeks and reduced their overall body weight by 10.5%.
|
121
|
+
|
122
|
+
If green coffee extract were a medication seeking approval from the Food and Drug Administration, these results would make it a viable candidate — more than 35% of subjects lost more than 5% of their body weight, and weight loss appeared to be greater while subjects were taking the pills than when they were on the placebo.
|
123
|
+
|
124
|
+
But as a dietary supplement, green coffee extract does not require the FDA's blessing. In fact, it is already available as a naturopathic medicine and antioxidant.
|
125
|
+
|
126
|
+
Joe Vinson, the University of Scranton chemist who conducted the pilot study, said the findings should pave the way for more rigorous research on coffee bean extract's effects. A larger trial involving 60 people is being planned.
|
127
|
+
|
128
|
+
Vinson, whose research focuses on plant polyphenols and their effects on human health, said it appears that green coffee bean extract may work by reducing the absorption of fat and glucose in the gut; it may also reduce insulin levels, which would improve metabolic function. There were no signs of ill effects on any subjects, Vinson reported Tuesday.
|
129
|
+
|
130
|
+
The study used a "cross-over" design, which allowed each subject to serve as his or her own comparison group. For six weeks, volunteers swallowed capsules three times a day, ingesting either 700 or 1,050 milligrams of green coffee extract a day or taking a placebo. After a two-week break, they moved, round-robin style, to another arm of the trial.
|
131
|
+
|
132
|
+
Subjects did not change their calorie intake over the course of the trial. But the more extract they consumed, the more weight and fat they lost. Altogether, they reduced their body fat by 16%, on average.
|
133
|
+
|
134
|
+
Of the 16 volunteers, six wound up with a body mass index in the healthful range.
|
135
|
+
|
136
|
+
One downside is that the extract is "extremely bitter." It would be difficult to take without a lot of water, Vinson reported.
|
137
|
+
|
138
|
+
At roughly $20 per month, however, green coffee extract is much less expensive than any of the weight-loss medications available over the counter or by prescription.
|
139
|
+
|
140
|
+
The trial was conducted in India and paid for by Applied Food Sciences Inc. of Austin, Tex., a manufacturer of green coffee bean extract.
|
141
|
+
|
142
|
+
The pilot study drew strong cautions from several scientists who weren't involved in the research.
|
143
|
+
|
144
|
+
"This is certainly a provocative study," said Dr. Gerald Weissmann, a physician and biochemist at New York University. But he said nutrition experts would want assurances that green coffee beans do not cause "malabsorption" within the human gut — a condition that would lead to weight loss as well as malnutrition, heart arrhythmias and other problems because vitamins and minerals are not passing through the intestine.
Dr. Arthur Grollman, a pharmacologist at the State University of New York at Stony Brook, said coffee beans contain about 250 different chemicals — some with positive and others with negative effects on human health. Though Vinson identified polyphenols and chlorogenic acid as the agents that appear to promote weight loss, Grollman said that claim needed further study. In the meantime, he said, consuming an extract that contains both good and bad chemicals in dense concentration seems an unwise thing to do._
@@text4 = %q_By Andrea Mustain
OurAmazingPlanet
updated 3/27/2012 6:34:52 PM ET
Print
Font:
James Cameron's record-setting dive to Earth's deepest spot has sparked a wave of excitement among many in the science community, who are not only heralding the new technology produced by the Hollywood veteran but lauding the renewed focus the project has put on the deep ocean.
"It's wonderful, absolutely wonderful," said Robert J. Stern, a geoscientist at the University of Texas at Dallas. He is one of several deep-sea researchers who said they'd been closely following Cameron's bid to return human observers to the Challenger Deep, a trough within the Mariana Trench more than 35,000 feet (10,700 meters) below the ocean surface.
Cameron's roundtrip earlier this week to the deepest place on Earth lasted just under seven hours. The only previous time humans visited this spot was in 1960.
News of Cameron's successful solo dive "gave me goose bumps," said Cindy Lee Van Dover, director of the marine laboratory at Duke University's Nicholas School of the Environment.
"I think it's a really good thing," said Bruce Robison, a senior scientist at the Monterey Bay Aquarium Research Institute in California. [ Infographic: James Cameron's Mariana Trench Dive ]
Opening the deep
All three scientists have spent many hours aboard some of the few deep-diving research submersibles on Earth and said they hoped Cameron's technology eventually will prove a boon to researchers wishing to collect samples and perhaps even conduct experiments in the deepest reaches of the sea, a place that until now has been off-limits to humans.
"This establishes that the technology exists to allow that to happen, so we shouldn't constrain our thinking about approaching work in that extreme habitat," Robison told OurAmazingPlanet. "This is a technological breakthrough and a huge accomplishment on Cameron's part, and I'm very pleased that he's done it; but let's hope it opens the door for more."
Cameron's team hasn't confirmed what (if any) samples the filmmaker and explorer retrieved from the Challenger Deep during his three-hour seafloor sojourn. The sub is equipped with a sampling arm, among other research tools. However, Cameron did describe a bleak view through the windows of his lime-green submersible.
More science news from msnbc.com
Image: Brain regions
Yamada et al. / Nature Comm.
Scientists take a look inside a jury's brains
Science editor Alan Boyle's blog: Jurors show a characteristic pattern of brain activity when they decide to be lenient on a criminal, and the strength of that pattern can vary from juror to juror, researchers say.
'Invisibility cloak' can serve as heat shield
Dolphin society adopts freewheeling lifestyle
Ancient stone monolith likely marked seasons
"It looked like the moon," Cameron told National Geographic reporters upon his return to the surface world. [ See photos from Cameron's historic dive ]
For scientists, such an assessment was hardly discouraging.
"I was rooting for him to land and find strange-looking animals," Van Dover said, but she added she wasn't surprised. She said the seafloor is vast, conditions are harsh, and life is likely sparsely spread.
"Three hours is just a drop in the bucket, and with more hours I think he's going to discover cool things," Van Dover said. "How many years have we been studying the ocean? And it took until 1977 to discover hydrothermal vents." Van Dover specializes in researching the strange creatures that congregate around the seafloor vents, which spew super-heated water laced with trace chemicals that sustain the animals.
Robison, a veteran deep-sea ecologist, said that whatever does live in the trench, which is nearly a mile deeper than Mount Everest is tall, will be of great interest to scientists.
"Anything that has adapted to thrive in that habitat is going to have some really remarkable adaptations," he said. "But most of the animals in the ocean don't live on the bottom, so there's an enormous potential for discovery up off the bottom as well."
Much of the deepest ocean is unreachable via state-owned submersibles, which at this point can dive no more than 21,000 feet (6,500 m). Only Japan's Shinkai 6500 has reached such depths. The United States is refurbishing Alvin, its deepest-diving craft, to be able to reach 21,000 feet within th_
@@corpus = [@@text,@@text1,@@text2,@@text3,@@text4]
end
metadata
ADDED
@@ -0,0 +1,60 @@
--- !ruby/object:Gem::Specification
name: tfidf
version: !ruby/object:Gem::Version
  version: 0.0.0
  prerelease:
platform: ruby
authors:
- Yu Shen
autorequire:
bindir: bin
cert_chain: []
date: 2012-03-17 00:00:00.000000000Z
dependencies:
- !ruby/object:Gem::Dependency
  name: fast-stemmer
  requirement: &16818040 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
  type: :runtime
  prerelease: false
  version_requirements: *16818040
description: Calculate TF-IDF out of a text, resulting in a hash with term as key,
  frequency as value. Sorry for taking the convenient name for myself! See examples/demo_tf.rb
  for usage
email: yushen83@gmail.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- lib/tfidf.rb
- examples/demo_tf.rb
- test/test_TFIDF.rb
homepage: https://github.com/yushen
licenses: []
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project:
rubygems_version: 1.8.10
signing_key:
specification_version: 3
summary: A W.I.P implementation of TF-IDF
test_files: []
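The gemspec description says the gem calculates TF-IDF from text and returns a hash keyed by term. As a rough illustration of that idea only, the sketch below computes TF-IDF weights for a small corpus in plain Ruby. It is not the code in lib/tfidf.rb: the method names (`tokenize`, `term_frequencies`, `inverse_document_frequencies`, `tf_idf`) are hypothetical, the weighting (relative term frequency times the natural log of N over document frequency) is one common variant assumed here, and stemming via the fast-stemmer dependency is left out.

```ruby
# Hypothetical helper names for illustration; this is NOT the tfidf gem's API.

# Split text into lowercase word tokens (no stemming, no stop-word removal).
def tokenize(text)
  text.downcase.scan(/[a-z0-9']+/)
end

# Term frequency: occurrences of each term divided by the document's token count.
def term_frequencies(tokens)
  counts = Hash.new(0)
  tokens.each { |t| counts[t] += 1 }
  total = tokens.size.to_f
  counts.each_with_object({}) { |(term, n), tf| tf[term] = n / total }
end

# Inverse document frequency: log(N / number of documents containing the term).
def inverse_document_frequencies(tokenized_docs)
  doc_counts = Hash.new(0)
  tokenized_docs.each { |tokens| tokens.uniq.each { |t| doc_counts[t] += 1 } }
  n = tokenized_docs.size.to_f
  doc_counts.each_with_object({}) { |(term, df), idf| idf[term] = Math.log(n / df) }
end

# Returns one hash per document, mapping term => TF-IDF weight.
def tf_idf(corpus)
  tokenized = corpus.map { |doc| tokenize(doc) }
  idf = inverse_document_frequencies(tokenized)
  tokenized.map do |tokens|
    term_frequencies(tokens).each_with_object({}) do |(term, tf), scores|
      scores[term] = tf * idf[term]
    end
  end
end

corpus = ["the quick brown fox", "the lazy dog", "the fox and the dog"]
scores = tf_idf(corpus)
```

With this weighting, a term that appears in every document (such as "the" in the toy corpus) gets a weight of zero, while a term unique to one document scores highest within that document.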