RubyGems - cass - Versions diffs - 0.0.2 → 0.0.3 - Mend

cass 0.0.2 → 0.0.3

Files changed (12)

data/CHANGELOG CHANGED

@@ -1,3 +1,7 @@
+v0.0.3 11/10/2010 -- Minor bug fixes.
+	- Can now initialize Document by passing a single Contrast for target extraction; previously, CASS would crash because it was expecting an array of Contrasts or words.
+	- Fixed issue with uninitialized VERBOSE constant. Previous version crashed if the constant wasn't defined in the environment; now checks if constant is defined first.
 v0.0.2. 08/04/2010 -- Fixed bugs and added additional output information.
 	- No longer includes narray as a dependency upon installation due to presence of multiple narray gems. User must now manually install the correct version before running CASS.
 	- Fixed minor syntax issues that prevented CASS from running on Ruby 1.8.6 (mostly to do with the splat operator).

data/Manifest CHANGED

@@ -3,7 +3,6 @@ LICENSE
 Manifest
 README.rdoc
 Rakefile
-cass.gemspec
 lib/cass.rb
 lib/cass/analysis.rb
 lib/cass/context.rb

data/README.rdoc CHANGED

@@ -4,23 +4,31 @@ CASS (Contrast Analysis of Semantic Similarity) is a set of tools for conducting
 == Version
-The current version of the tools is 0.0.1 (6/15/2010).
+The current version of the tools is 0.0.2 (08/05/2010).
 == License
-Copyright 2010 Tal Yarkoni and Nick Holtzman. Licensed under the GPL license. See the included LICENSE file for details.
+CASS is licensed under the GPL license. See the included LICENSE file for details.
 == Installation
-The CASS tools are packaged as a library for the Ruby programming language. You must have Ruby interpreter installed on your system, as well as the NMatrix library. To install, follow these steps:
+The CASS tools are packaged as a library for the Ruby programming language. You must have a Ruby interpreter installed on your system, as well as the NArray gem. To install, follow these steps:
-(1) <b>Install Ruby</b>--preferably 1.9 or greater. Installers for most platforms are available here[http://www.ruby-lang.org/en/downloads/].
+(1) <b>Install Ruby</b>. Installers for most platforms are available here[http://www.ruby-lang.org/en/downloads/]. For Windows, the most recent installer can be found here[http://rubyinstaller.org/]. CASS should work with any recent version of Ruby (1.8.6+), but we recommend using 1.9+.
-(2) <b>Install the NArray[http://narray.rubyforge.org] library</b>. On most platforms, you should be able to just type:
+(2) <b>Install the NArray[http://narray.rubyforge.org] library</b>, which supports numerical operations CASS requires. On most platforms, you should be able to just type the following from the command prompt:
  	gem install narray
-On Windows, it's a bit more involved; follow the instructions here[http://narray.rubyforge.org].
+On Windows, it's slightly more involved, as you'll need to install the right version of the gem and explicitly indicate the architecture. If you're running Ruby 1.8, use the following command:
+ 	gem install narray --platform=x86-mingw32
+On Ruby 1.9+, use the following:
+ 	gem install narray-rub19 --platform=x86-mingw32
+Additional instructions for installing NArray are here[http://narray.rubyforge.org] should you need them, though they appear to be out of date.
 (3) <b>Install the CASS gem</b> from the command prompt, like so:
@@ -76,7 +84,7 @@ Next, we read the file containing the transcripts:
 	text = File.new("cake.txt").read
-And then we can create a corresponding Document. We initialize the Document object by passing a descriptive name, the contrasts we want to run, and the full text we want to analyze:
+And then we can create a corresponding Document. We initialize the Document object by passing a descriptive name, a list of target words (in this case, the target words will be extracted from the contrast we've already defined, but we could also have passed an array of words), and the full text we want to analyze:
 	doc = Document.new("cake_vs_spinach", contrast, text)
@@ -93,6 +101,12 @@ And that prints something like this to our screen:
 Nothing too fancy, just basic descriptive information. The summary method has some additional arguments we could use to get more detailed information (e.g., word_count, list_context, etc.), but we'll skip those for now.
+Having created the Document and specified the target words, we can now generate its cooccurrence matrix:
+	doc.cooccurrence
+This step creates a correlation matrix in the Document object that represents the similarities between all possible target pairs. The cooccurrence matrix forms the basis for our subsequent analysis.
 Now if we want to compute the interaction term for our contrast (i.e., the difference of differences, reflecting the equation (cake.good - cake.bad) - (spinach.good - spinach.bad)), all we have to do is:
 	contrast.apply(doc)
@@ -107,7 +121,7 @@ Well, sort of. By itself, the number 0.23 doesn't mean very much. We don't know
 	Analysis.bootstrap_test(doc, contrasts, "speech_results.txt", 1000)
-Here we call the bootstrap_test method, feeding it the document we want to analyze, the Contrasts we want to apply, the filename root we want to use, and the number of iterations we want to run (generally, as many as is computationally viable). The results will be saved to a plaintext file with the specified name, and we can peruse that file at our leisure. If we open it up, the first few lines look like this:
+Here we call the bootstrap_test method, feeding it the document we want to analyze, the Contrasts we want to apply, the filename root we want to use, and the number of iterations we want to run (generally, as many as is computationally viable). The results will be saved to a plaintext file with the specified name, and we can peruse that file at our leisure. If we open it up, the first few lines look something like this (the exact values in your file will differ somewhat due to the bootstrapping):
 	contrast	result_id	doc_name	pair_1	pair_2	pair_3	pair_4	interaction_term
 	cake.spinach.good.bad	observed	cake.txt	0.5117	0.4039	0.3256	0.4511	0.2333
@@ -127,4 +141,8 @@ The columns tell us, respectively, what file the results came from, the bootstra
 	cake.txt	cake.spinach.good.bad	1000	0.2333	0.0
 	cake.txt	mean	1000	0.2333	0.0
-As you can see, the last column (p-value) reads 0.0, which is to say, none of the 1,000 iterations we ran had a value greater than 0. So we can reject the null hypothesis of zero effect at p < .001 in this case. Put differently, it's exceedingly unlikely that we would get this result (people having a positive bias towards cake relative to spinach) just by chance. Of course, that's a contrived example that won't surprise anyone. But the point is that you can use the CASS tools in a similar way to ask other much more interesting questions about the relation between different terms in semantic space. So that's the end of this overview; to learn more about the other functionality in CASS, you can surf around this RDoc, or just experiment with the software. Eventually, there will be a more comprehensive manual; in the meantime, if you have questions about usage, email[mailto:nick.holtzman@gmail.com] Nick Holtzman, and if you have technical questions about the Ruby code, email[mailto:tyarkoni@gmail.com] Tal Yarkoni.
+As you can see, the last column (p-value) reads 0.0, which is to say, none of the 1,000 iterations we ran had a value greater than 0. So we can reject the null hypothesis of zero effect at p < .001 in this case. Put differently, it's exceedingly unlikely that we would get this result (people having a positive bias towards cake relative to spinach) just by chance. Of course, that's a contrived example that won't surprise anyone. But the point is that you can use the CASS tools in a similar way to ask other much more interesting questions about the relation between different terms in semantic space. So that's the end of this overview; to learn more about the other functionality in CASS, you can surf around this RDoc, or just experiment with the software. Eventually, there will be a more comprehensive manual.
+== Bug reports / installation problems
+If you have questions about usage, email[mailto:nick.holtzman@gmail.com] Nick Holtzman. For bug reports or technical questions about the Ruby code, email[mailto:tyarkoni@gmail.com] Tal Yarkoni.
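
Note on the README example: the interaction term in the sample output can be checked by hand against the "difference of differences" equation the README gives. The sketch below is only an arithmetic check, not CASS code. It assumes that the columns pair_1 through pair_4 correspond to cake.good, cake.bad, spinach.good and spinach.bad in that order; the README does not state the column order, but the numbers are consistent with it. The method name interaction_term is made up for this illustration.

```ruby
# Pair similarities copied from the "observed" row of the README's sample output.
# Assumption: pair_1..pair_4 map to cake.good, cake.bad, spinach.good, spinach.bad.
PAIR_VALUES = {
  "cake.good"    => 0.5117,
  "cake.bad"     => 0.4039,
  "spinach.good" => 0.3256,
  "spinach.bad"  => 0.4511
}.freeze

# Difference of differences, following the equation in the README:
# (cake.good - cake.bad) - (spinach.good - spinach.bad)
def interaction_term(pairs)
  (pairs["cake.good"] - pairs["cake.bad"]) -
    (pairs["spinach.good"] - pairs["spinach.bad"])
end

puts interaction_term(PAIR_VALUES).round(4)
```

Worked through: 0.5117 - 0.4039 = 0.1078 and 0.3256 - 0.4511 = -0.1255, so the term is 0.1078 + 0.1255 = 0.2333, which matches the interaction_term column of the observed row.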

data/Rakefile CHANGED

@@ -2,7 +2,7 @@ require 'rubygems'
 require 'rake'
 require 'echoe'
-Echoe.new("cass", "0.0.2") { |p|
+Echoe.new("cass", "0.0.3") { |p|
   p.author = "Tal Yarkoni"
   p.email = "tyarkoni@gmail.com"
   p.summary = "A set of tools for conducting Contrast Analyses of Semantic Similarity (CASS)."

data/cass.gemspec CHANGED

@@ -2,15 +2,15 @@
 Gem::Specification.new do |s|
   s.name = %q{cass}
-  s.version = "0.0.2"
+  s.version = "0.0.3"
   s.required_rubygems_version = Gem::Requirement.new(">= 1.2") if s.respond_to? :required_rubygems_version=
   s.authors = ["Tal Yarkoni"]
-  s.date = %q{2010-08-04}
+  s.date = %q{2010-11-10}
   s.description = %q{A set of tools for conducting Contrast Analyses of Semantic Similarity (CASS).}
   s.email = %q{tyarkoni@gmail.com}
   s.extra_rdoc_files = ["CHANGELOG", "LICENSE", "README.rdoc", "lib/cass.rb", "lib/cass/analysis.rb", "lib/cass/context.rb", "lib/cass/contrast.rb", "lib/cass/document.rb", "lib/cass/extensions.rb", "lib/cass/parser.rb", "lib/cass/stats.rb"]
-  s.files = ["CHANGELOG", "LICENSE", "Manifest", "README.rdoc", "Rakefile", "cass.gemspec", "lib/cass.rb", "lib/cass/analysis.rb", "lib/cass/context.rb", "lib/cass/contrast.rb", "lib/cass/document.rb", "lib/cass/extensions.rb", "lib/cass/parser.rb", "lib/cass/stats.rb"]
+  s.files = ["CHANGELOG", "LICENSE", "Manifest", "README.rdoc", "Rakefile", "lib/cass.rb", "lib/cass/analysis.rb", "lib/cass/context.rb", "lib/cass/contrast.rb", "lib/cass/document.rb", "lib/cass/extensions.rb", "lib/cass/parser.rb", "lib/cass/stats.rb", "cass.gemspec"]
   s.homepage = %q{http://casstools.org}
   s.rdoc_options = ["--line-numbers", "--inline-source", "--title", "Cass", "--main", "README.rdoc"]
   s.require_paths = ["lib"]

data/lib/cass.rb CHANGED

@@ -9,6 +9,6 @@ require 'cass/parser'
 module Cass
-  VERSION = '0.0.2'
+  VERSION = '0.0.3'
 end

data/lib/cass/analysis.rb CHANGED

@@ -25,7 +25,7 @@ module Cass
         opts[c.downcase] = Module.const_get(c) if consts.include?(c)
       }
-      if VERBOSE
+      if (defined?(VERBOSE) and VERBOSE)
         puts "\nRunning CASS with the following options:"
         opts.each { |k,v| puts "\t#{k}: #{v}" }
       end
@@ -33,11 +33,11 @@ module Cass
       contrasts = parse_contrasts(CONTRAST_FILE)
       # Create contrasts
-      puts "\nFound #{contrasts.size} contrasts." if VERBOSE
+      puts "\nFound #{contrasts.size} contrasts." if (defined?(VERBOSE) and VERBOSE)
       # Set targets
       targets = contrasts.inject([]) { |t, c| t += c.words.flatten }.uniq
-      puts "\nFound #{targets.size} target words." if VERBOSE
+      puts "\nFound #{targets.size} target words." if (defined?(VERBOSE) and VERBOSE)
       # Read in files and create documents
       docs = []
@@ -61,15 +61,15 @@ module Cass
         docs.each { |d|
           base = File.basename(d.name, '.txt')
           puts "\nRunning one-sample analysis on document '#{d.name}'."
-          puts "Generating #{n_perm} bootstraps..." if VERBOSE and STATS
+          puts "Generating #{n_perm} bootstraps..." if (defined?(VERBOSE) and VERBOSE) and STATS
           bootstrap_test(d, contrasts, "#{OUTPUT_ROOT}_#{base}_results.txt", n_perm, opts)
           p_values("#{OUTPUT_ROOT}_#{base}_results.txt", 'boot', true) if STATS
         }
       when 2
         abort("Error: in order to run a permutation test, you need to pass exactly two files as input.") if FILES.size != 2 or docs.size != 2
-        puts "Running two-sample comparison between '#{File.basename(FILES[0])}' and '#{File.basename(FILES[1])}'." if VERBOSE
-        puts "Generating #{n_perm} permutations..." if VERBOSE and STATS
+        puts "Running two-sample comparison between '#{File.basename(FILES[0])}' and '#{File.basename(FILES[1])}'." if (defined?(VERBOSE) and VERBOSE)
+        puts "Generating #{n_perm} permutations..." if (defined?(VERBOSE) and VERBOSE) and STATS
         permutation_test(docs[0], docs[1], contrasts, "#{OUTPUT_ROOT}_results.txt", n_perm, opts)
         p_values("#{OUTPUT_ROOT}_results.txt", 'perm', true)
@@ -148,6 +148,8 @@ module Cass
   		outf.sync = true
   		doc.cooccurrence(opts['normalize_weights'])
+  		contrasts = [contrasts] if contrasts.class == Contrast
       contrasts.each { |c|
   			observed = c.apply(doc)
   			outf.puts "#{c.words.join(".")}\tobserved\t#{observed}"
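
Note on the bootstrap_test change: the added line wraps a lone Contrast in an array so that the following `contrasts.each` works for both a single contrast and a list. A minimal sketch of the same idea, using a stand-in class rather than the real Cass::Contrast, might look like this. DemoContrast and as_contrast_list are hypothetical names, and the sketch tests for Array rather than comparing the class to Contrast as the diff does.

```ruby
# Stand-in for Cass::Contrast; only the `words` accessor matters here.
class DemoContrast
  attr_accessor :words

  def initialize(words)
    @words = words
  end
end

# Hypothetical helper: accept one contrast or an array of them,
# and always hand back an array that is safe to iterate over.
def as_contrast_list(contrasts)
  contrasts.is_a?(Array) ? contrasts : [contrasts]
end

single = DemoContrast.new(%w[cake spinach good bad])
as_contrast_list(single).each { |c| puts c.words.join(".") }
```

Checking for Array instead of for the contrast class also covers subclasses, at the cost of wrapping anything else that is passed in by mistake.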

data/lib/cass/context.rb CHANGED

@@ -8,11 +8,13 @@ module Cass
     def initialize(doc, opts)
       min_prop = opts['min_prop'] || 0
       max_prop = opts['max_prop'] || 1
-      puts "Creating new context..." if VERBOSE
-      puts "Using all words with token frequency in range of #{min_prop} and #{max_prop}."
+      if (defined?(VERBOSE) and VERBOSE)
+        puts "Creating new context..."
+        puts "Using all words with token frequency in range of #{min_prop} and #{max_prop}."
+      end
       words = doc.lines.join(' ').split(/\s+/)
       nwords = words.size
-      puts "Found #{nwords} words."
+      puts "Found #{nwords} words." if (defined?(VERBOSE) and VERBOSE)
       if min_prop > 0 or max_prop < 1
         word_hash = Hash.new(0)
         words.each {|w| word_hash[w] += 1 }
@@ -28,12 +30,12 @@ module Cass
         rescue
           abort("Error: could not open stopword file #{opts['stop_file']}!")
         end
-        puts "Removing #{stopwords.size} stopwords from context." if VERBOSE
+        puts "Removing #{stopwords.size} stopwords from context." if (defined?(VERBOSE) and VERBOSE)
         words -= stopwords
       end
       @words = opts.key?('context_size') ? words.sort_by{rand}[0, opts['context_size']] : words
       index_words
-      puts "Using #{@words.size} words as context." if VERBOSE
+      puts "Using #{@words.size} words as context." if (defined?(VERBOSE) and VERBOSE)
     end
     # Index the context. Necessary when words are updated manually.
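
Note on the VERBOSE guard: the same `defined?(VERBOSE) and VERBOSE` check is repeated at every call site in this release. One way to read what it does is to fold it into a single predicate, sketched below. cass_verbose? is a hypothetical helper and does not exist in the gem.

```ruby
# Hypothetical helper, not part of CASS: true only when a VERBOSE constant
# exists and is truthy. defined? returns nil for a missing constant, so the
# right-hand side is never evaluated and no NameError is raised.
def cass_verbose?
  (defined?(VERBOSE) && VERBOSE) ? true : false
end

# Prints nothing unless the caller has defined VERBOSE as a truthy value.
puts "Creating new context..." if cass_verbose?
```

With a bare `if VERBOSE`, as in 0.0.2, referencing the constant before it is defined raises a NameError, which is the crash the CHANGELOG entry describes.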

data/lib/cass/contrast.rb CHANGED

@@ -8,6 +8,7 @@ module Cass
     attr_accessor :words
     def initialize(words)
+      words = words.split(/,*\s+/) if words.class == String
       @words = words
     end
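
Note on the new String handling in Contrast#initialize: the pattern `/,*\s+/` splits on whitespace, optionally preceded by commas. The short sketch below applies the same regular expression to a few inputs to show what it accepts. split_words is a hypothetical wrapper used only for this illustration.

```ruby
# Same pattern as the line added to Contrast#initialize: zero or more commas
# followed by at least one whitespace character.
def split_words(words)
  words.split(/,*\s+/)
end

puts split_words("cake spinach good bad").inspect
puts split_words("cake, spinach, good, bad").inspect
# No whitespace follows the comma here, so the pattern never matches
# and the string comes back as a single element.
puts split_words("cake,spinach").inspect
```

Because the pattern requires whitespace, a comma-only list such as "cake,spinach" is not split, so callers passing a String presumably need to separate words with spaces.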

data/lib/cass/document.rb CHANGED

@@ -26,8 +26,8 @@ module Cass
       # Error checking...
       if name.nil?
         abort("Error: document has no name!")
-      elsif targets.nil? or targets.class != Array or targets.empty?
-        abort("Error: invalid target specification; targets must be an array of words or Contrasts.")
+      elsif targets.nil?
+         abort("Error: you must specify the targets to use!")
       elsif text.nil?
         abort("Error: no text provided!")
       end
@@ -36,9 +36,10 @@ module Cass
       @name, @text, @tindex = name, text, {}
       # Get list of words from contrasts if necessary
+      targets = [targets] if targets.class == Contrast
       @targets =
       if targets[0].class == Contrast
-        targets = contrasts.inject([]) { |t, c| t += c.words.flatten }.uniq
+        targets.inject([]) { |t, c| t += c.words.flatten }.uniq
       else
         targets
       end
@@ -57,14 +58,14 @@ module Cass
       if opts['skip_preproc']
         @lines = (text.class == Array) ? @text : text.split(/[\r\n]+/)
       else
-        puts "Converting to lowercase..." if VERBOSE
+        puts "Converting to lowercase..." if defined?(VERBOSE) and VERBOSE
         @text.downcase! unless opts['keep_case']
         @text.gsub!(/[^a-z \n]+/, '') unless opts['keep_special']
         if opts.key?('recode')
-          puts "Recoding words..." if VERBOSE
+          puts "Recoding words..." if defined?(VERBOSE) and VERBOSE
           opts['recode'].each { |k,v| @text.gsub!(/(^|\s+)(#{k})($|\s+)/, "\\1#{v}\\3") }
         end
-        puts "Parsing text..." if VERBOSE
+        puts "Parsing text..." if defined?(VERBOSE) and VERBOSE
         @lines = opts['parse_text'] ? Parser.parse(@text, opts) : @text.split(/[\r\n]+/)
         @lines = @lines[0, opts['max_lines']] if opts['max_lines'] and opts['max_lines'] > 0
         trim!
@@ -74,12 +75,12 @@ module Cass
     # Trim internal list of lines, keeping only those that contain
     # at least one target word.
     def trim!
-      puts "Deleting target-less lines..." if VERBOSE
+      puts "Deleting target-less lines..." if defined?(VERBOSE) and VERBOSE
       ts = @targets.join("|")
       #@lines.delete_if { |s| (s.split(/\s+/) & @targets).empty? }  # another way to do it
       nl = @lines.size
       @lines = @lines.grep(/(^|\s+)(#{ts})($|\s+)/)
-      puts "Keeping #{@lines.size} / #{nl} lines." if VERBOSE
+      puts "Keeping #{@lines.size} / #{nl} lines." if defined?(VERBOSE) and VERBOSE
       self
     end
@@ -131,7 +132,7 @@ module Cass
     #   permute!
     #   docs = [self]
     #   n.times { |i|
-    #     puts "Generating bootstrap ##{i+1}..." if VERBOSE
+    #     puts "Generating bootstrap ##{i+1}..." if defined?(VERBOSE) and VERBOSE
     #     d = self.clone
     #     d.name = "#{@name}_boot_#{(i+1)}"
     #     d.resample!
@@ -145,7 +146,7 @@ module Cass
     # Drop all words that aren't in target list or context. Store as an array of arrays,
     # with first element = array of targets and second = array of context words.
     def compact
-  		puts "Compacting all lines..." if VERBOSE
+  		puts "Compacting all lines..." if defined?(VERBOSE) and VERBOSE
   		@clines = []
   		@lines.each { |l|
   			w = l.split(/\s+/).uniq
@@ -158,7 +159,7 @@ module Cass
     # Computes co-occurrence matrix between target words and the context.
     # Stores a target-by-context integer matrix internally.
     def cooccurrence(normalize_weights=false)
-      # puts "Generating co-occurrence matrix..." if VERBOSE
+      # puts "Generating co-occurrence matrix..." if defined?(VERBOSE) and VERBOSE
       coocc = NMatrix.float(@targets.size, @context.size)
   		compact if @clines.nil?
       lc = 0  # line counter
@@ -214,6 +215,7 @@ module Cass
     def summary(filename=nil, list_context=false, word_count=false)
       buffer = []
+      compact if @clines.nil?
       # Basic info that always gets shown
       buffer << "Summary for document '#{@name}':"

data/lib/cass/parser.rb CHANGED

@@ -32,7 +32,7 @@ module Cass
         rx = opts.key?('parser_regex') ? opts['parser_regex'] : "[\r\n\.]+"
         text.split(/#{rx}/)
       else
-        puts "Using the Stanford Parser to parse the text. Note that this could take a long time for large files!" if VERBOSE
+        puts "Using the Stanford Parser to parse the text. Note that this could take a long time for large files!" if (defined?(VERBOSE) and VERBOSE)
         parser = StanfordParser::DocumentPreprocessor.new
         parser.getSentencesFromString(text)
       end

metadata CHANGED

@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: cass
 version: !ruby/object:Gem::Version
-  hash: 27
+  hash: 25
   prerelease: false
   segments:
   - 0
   - 0
-  - 2
-  version: 0.0.2
+  - 3
+  version: 0.0.3
 platform: ruby
 authors:
 - Tal Yarkoni
@@ -15,7 +15,7 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2010-08-04 00:00:00 -06:00
+date: 2010-11-10 00:00:00 -07:00
 default_executable:
 dependencies: []
@@ -43,7 +43,6 @@ files:
 - Manifest
 - README.rdoc
 - Rakefile
-- cass.gemspec
 - lib/cass.rb
 - lib/cass/analysis.rb
 - lib/cass/context.rb
@@ -52,6 +51,7 @@ files:
 - lib/cass/extensions.rb
 - lib/cass/parser.rb
 - lib/cass/stats.rb
+- cass.gemspec
 has_rdoc: true
 homepage: http://casstools.org
 licenses: []