sm-transcript 0.0.7 → 1.0.0

data/README.txt CHANGED
@@ -1,4 +1,4 @@
- $Id: README.txt 200 2010-10-29 18:23:48Z pwilkins $
+ $Id: README.txt 204 2010-11-30 02:20:04Z pwilkins $
 
  sm-transcript reads results of SLS processing and produces transcripts for
  the SpokenMedia browser. For each file in the source folder whose extension
@@ -88,9 +88,10 @@ Using the App:
  -h, --help Show this message
 
  There is a serious gotch'a in specifying the srctype parameter: it must
- match the case of the file extension that you're processing. I know,
- I know; pretty lame. I will update the gem with a fix shortly. My
- apologies until then.
+ match the case of the file extension that you're processing. This means
+ that if the srt files that you are processing have the extension .SRT, then
+ you must specify the srctype as "SRT". Pretty lame, I know. I will update
+ the gem with a fix shortly. My apologies until then.
 
  Troubleshooting:
  sm-transcript requires additional gems to operate. The RubyGem
@@ -163,10 +164,13 @@ Release Notes:
  version 0.0.4 - fixes bug when processing .WRD files with CRLF line
  endings.
  version 0.0.5 - removed due to posting error
- version 0.0.6 - added srctype of ttml and desttype of json, fixed bug
- where beginning time of word was actually for previous word.
+ version 0.0.6 - added srctype of ttml and desttype of json, fixed bug where
+ beginning time of word was actually for previous word.
  version 0.0.7 - added srt as srctype
-
+ version 0.0.8 - fixed bug that dropped last phrase from transcripts
+ version 1.0.0 - declared this version 1.0.0 to conform more closely with
+ gem numbering conventions. All tests run successfully.
+
  To Do:
  - specify individual files for processing rather than folders
  - fix bug in srt processing: can't read Creole srt content.
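Note on the srctype caveat in the README hunk above: the gem compares the srctype value with the file extension case-sensitively. The sketch below shows one way such a comparison could ignore case. The helper name matches_srctype? is hypothetical and is not part of sm-transcript; it only illustrates the idea behind the promised fix.

```ruby
# Hypothetical helper, not part of sm-transcript: compares a file's
# extension with the requested srctype while ignoring letter case.
def matches_srctype?(filename, srctype)
  ext = File.extname(filename).sub(/\A\./, '') # ".SRT" -> "SRT"
  ext.casecmp(srctype).zero?
end

puts matches_srctype?('5.60-01.SRT', 'srt')
puts matches_srctype?('18.06-03.srt', 'SRT')
puts matches_srctype?('18.03-2004-L01.align2.wrd', 'srt')
```

The file names are taken from the test/results entries in the gem metadata; the first two calls match and the third does not.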
data/Rakefile CHANGED
@@ -1,4 +1,4 @@
- # $Id: Rakefile 198 2010-10-26 17:08:33Z pwilkins $
+ # $Id: Rakefile 204 2010-11-30 02:20:04Z pwilkins $
 
  require 'rake/gempackagetask'
  require 'rake'
@@ -8,7 +8,7 @@ spec = Gem::Specification.new do |s|
  s.summary = "Convert word lists to transcripts"
  s.description= File.read(File.join(File.dirname(__FILE__), 'README.txt'))
  s.requirements = [ 'TBD' ]
- s.version = "0.0.7"
+ s.version = "1.0.0"
  s.author = "Peter Wilkins"
  s.email = "pwilkins@mit.edu"
  s.homepage = "http://spokenmedia.mit.edu"
@@ -1,11 +1,13 @@
- # $Id: metadata.rb 183 2010-03-15 19:07:50Z pwilkins $
+ # $Id: metadata.rb 204 2010-11-30 02:20:04Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
 
  require "rexml/document"
  require 'extensions/kernel'
  require 'json/ext'
- require_relative 'word'
+ require File.join(File.dirname(__FILE__), '/word.rb')
+ # require_relative 'word'
+
 
  module SmTranscript
  class Metadata
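Note on the require change above, which recurs in most reader files: require_relative is replaced with a path built from __FILE__. The diff does not state the reason; presumably it avoids relying on require_relative under Ruby 1.8, where the README says it comes from the extensions gem. A minimal sketch of how the path expression behaves (no file is actually loaded here):

```ruby
# Build the path to a sibling file from the directory of the current file.
here = File.dirname(__FILE__)

# File.join collapses the duplicate separator, so the leading slash in
# '/word.rb' is harmless.
joined = File.join(here, '/word.rb')

# Equivalent form without the leading slash.
plain = File.join(here, 'word.rb')

puts joined
puts joined == plain
```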
@@ -1,10 +1,12 @@
- # $Id: metadata_reader.rb 182 2010-03-12 22:07:34Z pwilkins $
+ # $Id: metadata_reader.rb 204 2010-11-30 02:20:04Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
 
  require 'rubygems'
  require 'extensions/kernel'
- require_relative 'word'
+ require File.join(File.dirname(__FILE__), '/word.rb')
+ # require_relative 'word'
+
 
  module SmTranscript
  class MetadataReader
@@ -1,4 +1,4 @@
- # $Id: options.rb 183 2010-03-15 19:07:50Z pwilkins $
+ # $Id: options.rb 204 2010-11-30 02:20:04Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
 
@@ -1,4 +1,4 @@
- # $Id: runner.rb 202 2010-10-30 02:47:21Z pwilkins $
+ # $Id: runner.rb 204 2010-11-30 02:20:04Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
 
@@ -56,17 +56,17 @@ module SmTranscript
  raise "txt invalid srctype for html desttype" if @options.srctype ==
  SmTranscript::Options::TXT_SRC_TYPE
  destfile = "#{destfile}t1.html"
- p "destfile is #{destfile}"
+ # p "destfile is #{destfile}"
  trans.write_html("#{@options.destdir}/#{destfile}")
  when SmTranscript::Options::DATAJS_DEST_TYPE
  raise "txt is only valid srctype for datajs desttype" unless @options.srctype ==
  SmTranscript::Options::TXT_SRC_TYPE
  destfile = "#{destfile}data.js"
- # p "destfile is #{destfile}"
+ # p "destfile is #{destfile}"
  meta.write_datajs("#{@options.destdir}/#{destfile}")
  else
  destfile = "#{destfile}t1.ttml"
- # p "destfile is #{destfile}"
+ # p "destfile is #{destfile}"
  trans.write_ttml("#{@options.destdir}/#{destfile}")
  end
  end # Dir.glob()
@@ -4,7 +4,9 @@
 
  require 'rubygems'
  require 'extensions/kernel'
- require_relative 'word'
+ require File.join(File.dirname(__FILE__), '/word.rb')
+ # require_relative 'word'
+
 
  module SmTranscript
  class SbvReader
@@ -1,10 +1,12 @@
- # $Id: seg_reader.rb 182 2010-03-12 22:07:34Z pwilkins $
+ # $Id: seg_reader.rb 204 2010-11-30 02:20:04Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
 
  require "rexml/document"
  require 'extensions/kernel'
- require_relative 'word'
+ require File.join(File.dirname(__FILE__), '/word.rb')
+ # require_relative 'word'
+
 
  module SmTranscript
  class SegReader
@@ -34,11 +36,13 @@ module SmTranscript
  # line is expected to contain two integers separated by a space,
  # followed by a space and one or more words. The words may contain
  # characters, or an apostrophe
+ arr = []
  @root.elements.each("/document/lecture/segment") do |s|
  s.text.scan(/^\d* \d* [\w']*$/) do |t|
  arr = t.split
  @words << SmTranscript::Word.new(arr[0], arr[1], arr[1].to_i - arr[0].to_i, arr[2])
  end
+ @words << SmTranscript::Word.new(arr[0], arr[1], arr[1].to_i - arr[0].to_i, arr[2])
  end
  end
  end
@@ -1,10 +1,14 @@
- # $Id: srt_reader.rb 203 2010-10-30 09:45:38Z pwilkins $
+ # $Id: srt_reader.rb 204 2010-11-30 02:20:04Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
+ $KCODE = "U"
 
  require 'rubygems'
+ require 'time'
  require 'extensions/kernel'
- require_relative 'word'
+ require File.join(File.dirname(__FILE__), '/word.rb')
+ # require_relative 'word'
+
 
  module SmTranscript
  class SrtReader
@@ -59,7 +63,7 @@ module SmTranscript
  # p "line: #{cntr}"
  # p "number: #{$1}"
  # p "phrase: #{phrase}"
- p "start time: #{start_time}"
+ # p "start time: #{start_time}"
  @words << SmTranscript::Word.new(get_millisecs(start_time), 0, '', phrase) unless (start_time.length == 0) | (phrase.length == 0)
  phrase = ''
  start_time = ''
@@ -73,14 +77,16 @@ module SmTranscript
 
  start_time = $1
  # p "start: #{$1}"
-
- when /^([A-Za-z0-9'\xD2\xE8\xF2,.:\?\(\)\^]+ ?[\w',.:-\?\(\)\^\xD2\xE8\xF2 ]*)/u
+ # these are the codes for Creole chars \xD2\xE8\xF2
+ # when /^([A-Za-z0-9',.:\?\(\)\^]+ ?[\w',.:-\?\(\)\^ ]*)/mu
+ when /^([\w0-9',.:\?\(\)\^]+ ?[\w',.:-\?\(\)\^ ]*)/mu
  phrase.length == 0 ? phrase = $1 : phrase += " #{$1}"
  # p "phrase:[#{phrase.length}] #{$1} <#{phrase}>"
  end
  end
- # p "last line: #{cntr}"
- # p "@words length: #{@words.length}"
+ # p "last line: #{cntr}"
+ # p "@words length: #{@words.length}"
+ @words << SmTranscript::Word.new(get_millisecs(start_time), 0, '', phrase) unless (start_time.length == 0) | (phrase.length == 0)
  end
 
  public
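Note on the added @words push after the loop: this appears to be the 0.0.8 fix for the dropped last phrase. A phrase is only emitted when the next cue number is read, so the final cue needs an explicit flush once the input ends. The parser below is a simplified, hypothetical stand-in for SrtReader (the method name collect_phrases and the end times in the sample data are invented) that shows the same pattern.

```ruby
# Simplified, hypothetical SRT-style parser. It is not the gem's SrtReader;
# it only illustrates why a final flush after the loop is needed.
def collect_phrases(lines)
  phrases = []
  start_time = ''
  phrase = ''
  lines.each do |line|
    case line.strip
    when /\A\d+\z/
      # A cue number starts a new cue: emit the one collected so far.
      phrases << [start_time, phrase] unless start_time.empty? || phrase.empty?
      start_time = ''
      phrase = ''
    when /\A(\d\d:\d\d:\d\d[,.]\d{1,3}) -->/
      start_time = Regexp.last_match(1)
    when /\S/
      phrase = phrase.empty? ? line.strip : "#{phrase} #{line.strip}"
    end
  end
  # Without this final flush the last cue is never emitted, because no
  # further cue number follows it.
  phrases << [start_time, phrase] unless start_time.empty? || phrase.empty?
  phrases
end

sample = [
  '1', '00:46:33,730 --> 00:46:35,208', 'Okay, thanks.', '',
  '2', '00:46:35,208 --> 00:46:37,000', 'See you on Wednesday.'
]
collect_phrases(sample).each { |time, text| puts "#{time} #{text}" }
```

With the final flush removed, only the first phrase of the sample would be returned.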
@@ -92,7 +98,7 @@ public
  t = time_val.to_s
  # if t.match(/\d\d:\d\d:\d\d,\d\d\d/).nil?
  # if t.match(/\d+ms/).nil?
- before = t
+ # before = t
  if (t =~ /(\d\d:\d\d:\d\d([,\.]\d{1,3})?)/).nil?
  t = "00:#{t}"
  # p "#{before} -> #{t}"
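Note on get_millisecs: only part of the method is visible in this hunk, so the sketch below is an assumption about the overall conversion, not the gem's code. It mirrors the visible step (prefixing "00:" when the hours field is missing) and then converts hh:mm:ss,mmm to milliseconds. The name to_millisecs is hypothetical, and well-formed input is assumed.

```ruby
# Hypothetical stand-in for the reader's get_millisecs; the real method
# body is only partly visible in the hunk above.
def to_millisecs(time_val)
  t = time_val.to_s
  # Mirror the hunk: prefix an hours field when the value is only mm:ss.
  t = "00:#{t}" unless t =~ /\d\d:\d\d:\d\d([,.]\d{1,3})?/
  h, m, s, frac = t.match(/(\d\d):(\d\d):(\d\d)(?:[,.](\d{1,3}))?/).captures
  # Pad the fraction to three digits so ",5" means 500 ms.
  ms = frac ? frac.ljust(3, '0').to_i : 0
  ((h.to_i * 60 + m.to_i) * 60 + s.to_i) * 1000 + ms
end

puts to_millisecs('00:46:33,730')
puts to_millisecs('46:35,208')
```

Under this arithmetic, 00:46:33,730 gives 2793730 and 46:35,208 gives 2795208, the two start times asserted in test_srtreader.rb.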
@@ -1,4 +1,4 @@
- # $Id: transcript.rb 182 2010-03-12 22:07:34Z pwilkins $
+ # $Id: transcript.rb 204 2010-11-30 02:20:04Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
 
@@ -6,7 +6,11 @@ require "rexml/document"
  require 'extensions/kernel'
  require 'builder'
  require 'sqlite3'
- require_relative 'word'
+ require File.join(File.dirname(__FILE__), '/word.rb')
+ # require_relative 'word'
+
+
+ $KCODE="U"
 
  module SmTranscript
  class Transcript
@@ -39,6 +43,11 @@ module SmTranscript
  # p "overwriting existing destination file"
  # end
  File.open(dest_file, "w") do |f|
+ # write Title into <head> for the benefit of Google Search Appliance
+ f.puts '<head>'
+
+ f.puts '</head>'
+ f.puts '<body>'
  span_element = ""
  prev_start_time = 0
  start_time = 0
@@ -47,7 +56,10 @@ module SmTranscript
  STDERR.puts dest_file
  end
  STDERR.puts dest_file
+ cntr = 0
  @words.each do |w|
+ cntr += 1
+ # p "word cntr: #{cntr}"
  # get the start time and reduce its granularity so that multiple
  # words fall within a <span> element.
  start_time = w.start_time.to_i/1000
@@ -65,6 +77,7 @@ module SmTranscript
  # In the block above, the last word isn't written if
  # the start_time and prev_start_time are the same.
  f.puts span_element << "</span> " unless start_time != prev_start_time
+ f.puts '</body>'
  f.close
  end
 
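Note on the span grouping in write_html: the comments in the hunks above say the start time is reduced to whole seconds so that several words share one <span>. The sketch below shows only that grouping step. The Word struct, the method name group_by_second and the sample words are hypothetical, and the actual span markup written by the gem is not reproduced.

```ruby
# Hypothetical word record; the gem's SmTranscript::Word has more fields.
Word = Struct.new(:start_time, :word)

# Group consecutive words whose start times fall in the same whole second.
def group_by_second(words)
  groups = []
  words.each do |w|
    second = w.start_time.to_i / 1000 # integer division drops the millis
    if groups.empty? || groups.last[0] != second
      groups << [second, [w.word]]
    else
      groups.last[1] << w.word
    end
  end
  groups
end

words = [Word.new('11406', 'so'), Word.new('11500', 'today'), Word.new('12020', 'we')]
group_by_second(words).each do |second, texts|
  puts "#{second}: #{texts.join(' ')}"
end
```

The first two sample words start within second 11 and land in one group; the third starts a new group at second 12.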
@@ -5,7 +5,9 @@
  require 'rubygems'
  require 'nokogiri'
  require 'extensions/kernel'
- require_relative 'word'
+ require 'time'
+ require File.join(File.dirname(__FILE__), '/word.rb')
+ # require_relative 'word'
 
  module SmTranscript
  class TtmlReader
@@ -1,10 +1,12 @@
- # $Id: wrd_reader.rb 182 2010-03-12 22:07:34Z pwilkins $
+ # $Id: wrd_reader.rb 204 2010-11-30 02:20:04Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
 
  require 'rubygems'
  require 'extensions/kernel'
- require_relative 'word'
+ require File.join(File.dirname(__FILE__), '/word.rb')
+ # require_relative 'word'
+
 
  module SmTranscript
  class WrdReader
Binary file
@@ -1,4 +1,4 @@
- # $Id: test_runner.rb 202 2010-10-30 02:47:21Z pwilkins $
+ # $Id: test_runner.rb 204 2010-11-30 02:20:04Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
 
@@ -95,12 +95,12 @@ class TestRunner < Test::Unit::TestCase
  assert_equal 'html', opts.desttype
  runner.run
 
- assert(File.exists?("#{opts.destdir}/#{fname04}-t1.ttml"),
- "File not found: #{opts.destdir}/#{fname04}-t1.ttml")
- assert(File.exists?("#{opts.destdir}/#{fname05}-t1.ttml"),
- "File not found: #{opts.destdir}/#{fname05}-t1.ttml")
- assert(File.exists?("#{opts.destdir}/#{fname06}-t1.ttml"),
- "File not found: #{opts.destdir}/#{fname06}-t1.ttml")
+ assert(File.exists?("#{opts.destdir}/#{fname04}-t1.html"),
+ "File not found: #{opts.destdir}/#{fname04}-t1.html")
+ assert(File.exists?("#{opts.destdir}/#{fname05}-t1.html"),
+ "File not found: #{opts.destdir}/#{fname05}-t1.html")
+ assert(File.exists?("#{opts.destdir}/#{fname06}-t1.html"),
+ "File not found: #{opts.destdir}/#{fname06}-t1.html")
  end
 
  # I don't know how to test for the "invalid option" error that this test causes.
@@ -1,4 +1,4 @@
- # $Id: test_srtreader.rb 192 2010-03-27 01:24:26Z pwilkins $
+ # $Id: test_segreader.rb 192 2010-03-27 01:24:26Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
 
@@ -6,20 +6,20 @@ require 'rubygems'
  require 'extensions/kernel'
  require 'test/unit'
  require 'shoulda'
- require_relative '../lib/sm_transcript/srt_reader'
+ require_relative '../lib/sm_transcript/seg_reader'
 
- class TestSrtReader < Test::Unit::TestCase
+ class TestSegReader < Test::Unit::TestCase
 
  context "app can find the seg file" do
  should "verify that instance is not nil" do
- segfile = SmTranscript::SrtReader.from_file("results/IIHS_Diane_Davis_Nov2009.seg")
+ segfile = SmTranscript::SegReader.from_file("results/IIHS_Diane_Davis_Nov2009.seg")
  assert_not_nil(segfile)
  end
  end
 
  context "read a metadata item from seg file" do
  should "return seg file name" do
- segfile = SmTranscript::SrtReader.from_file("results/IIHS_Diane_Davis_Nov2009.seg")
+ segfile = SmTranscript::SegReader.from_file("results/IIHS_Diane_Davis_Nov2009.seg")
 
  assert_equal "IIHS_Diane_Davis_Nov2009.seg",
  segfile.metadata["orig_seg_path"].to_s
@@ -28,7 +28,7 @@ class TestSrtReader < Test::Unit::TestCase
 
  context "read a time-coded word from seg file" do
  should "return first time-coded word in transcript" do
- segfile = SmTranscript::SrtReader.from_file("results/IIHS_Diane_Davis_Nov2009.seg")
+ segfile = SmTranscript::SegReader.from_file("results/IIHS_Diane_Davis_Nov2009.seg")
 
  assert_equal "11406", segfile.words[0].start_time
  assert_equal "11500", segfile.words[0].end_time
@@ -1,4 +1,4 @@
- # $Id: test_srtreader.rb 203 2010-10-30 09:45:38Z pwilkins $
+ # $Id: test_srtreader.rb 204 2010-11-30 02:20:04Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
 
@@ -125,15 +125,35 @@ class TestSrtReader < Test::Unit::TestCase
  assert_equal 0, srtfile04.words[701].end_time
  assert_equal "So E must be tells us what E is,", srtfile04.words[701].word
 
- p srtfile04.words[722].start_time
- p srtfile04.words[0].end_time
- p srtfile04.words[722].word
+ # p srtfile04.words[721].start_time
+ # p srtfile04.words[0].end_time
+ # p srtfile04.words[721].word
+ assert_equal 2793730, srtfile04.words[721].start_time
+ assert_equal 0, srtfile04.words[721].end_time
+ assert_equal "Okay, thanks.", srtfile04.words[721].word
+
+ # p srtfile04.words[722].start_time
+ # p srtfile04.words[0].end_time
+ # p srtfile04.words[722].word
  assert_equal 2795208, srtfile04.words[722].start_time
+ # cntr = 0
+ # srtfile04.words.each do |w|
+ # cntr += 1
+ # p "words cnt #{cntr}"
+ # end
  assert_equal 0, srtfile04.words[722].end_time
  assert_equal "See you on Wednesday.", srtfile04.words[722].word
 
- p srtfile04.words[723].start_time
-
+ srtfile05 = SmTranscript::SrtReader.from_file("results/20101018_OCW-18.01-f07-lec02_300k-Haitian Creole.srt")
+
+ # p srtfile05.words[0].start_time
+ # p srtfile05.words[0].end_time
+ # p srtfile05.words[0].word
+ # assert_equal 7121, srtfile05.words[0].start_time
+ # assert_equal 0, srtfile05.words[0].end_time
+ # assert_equal "I've been multiplying matrices already, but certainly time for", srtfile05.words[0].word
+
+
 
  end
  end
@@ -1,4 +1,4 @@
- # $Id: test_transcript.rb 196 2010-06-11 18:51:18Z pwilkins $
+ # $Id: test_transcript.rb 204 2010-11-30 02:20:04Z pwilkins $
  # Copyright (c) 2010 Massachusetts Institute of Technology
  # see LICENSE.txt for license text
 
@@ -6,7 +6,8 @@ require 'rubygems'
  require 'extensions/kernel'
  require 'test/unit'
  require 'shoulda'
- require_relative '../lib/sm_transcript/transcript'
+ require File.join(File.dirname(__FILE__), '/../lib/sm_transcript/transcript.rb')
+ #require_relative '../lib/sm_transcript/transcript'
  require_relative '../lib/sm_transcript/seg_reader'
  require_relative '../lib/sm_transcript/wrd_reader'
  require_relative '../lib/sm_transcript/srt_reader'
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: sm-transcript
  version: !ruby/object:Gem::Version
- hash: 17
+ hash: 23
  prerelease: false
  segments:
+ - 1
  - 0
  - 0
- - 7
- version: 0.0.7
+ version: 1.0.0
  platform: ruby
  authors:
  - Peter Wilkins
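Note on the version fields above: the segments list in the gem metadata is the version string split into its numeric parts. A small sketch using RubyGems' own Gem::Version class shows how the two versions in this diff compare; it is independent of the gem's code.

```ruby
require 'rubygems'

old_version = Gem::Version.new('0.0.7')
new_version = Gem::Version.new('1.0.0')

# segments is the list serialized under "segments:" in the metadata.
puts new_version.segments.inspect
puts new_version > old_version
puts new_version.prerelease?
```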
@@ -15,11 +15,11 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2010-11-08 00:00:00 -05:00
+ date: 2010-11-30 00:00:00 -05:00
  default_executable:
  dependencies: []
 
- description: "$Id: README.txt 200 2010-10-29 18:23:48Z pwilkins $\n\n\
+ description: "$Id: README.txt 204 2010-11-30 02:20:04Z pwilkins $\n\n\
  sm-transcript reads results of SLS processing and produces transcripts for\n\
  the SpokenMedia browser. For each file in the source folder whose extension \n\
  matches the source type, a file of destination type is created in the \n\
@@ -30,12 +30,12 @@ description: "$Id: README.txt 200 2010-10-29 18:23:48Z pwilkins $\n\n\
  If you are a Windows user, make the usual adjustments.\n\n\
  Requirements:\n sm-transcript is written in Ruby and packaged as a RubyGem. Since Ruby is\n not a compiled language, you will need to have Ruby installed on your \n machine to run sm-transcript. You can determine if Ruby is installed by \n typing \"ruby -v\" at a terminal prompt. It should return the version of \n Ruby that is installed. If Ruby is not installed on your machine, navigate \n to http://www.ruby-lang.org/ and follow the installation instructions. \n sm-transcript was developed using Ruby 1.8. Other Ruby versions have not\n been tested as of this release. \n \n\
  Installation:\n You can get sm-transcript as either a RubyGem or as source from svn. \n \n The preferred way to install this package is as a Rubygem. You can \n download and install the gem with this command: \n \n felix$ sudo gem install [--verbose] sm-transcript\n \n This command downloads the most recent version of the gem from rubygems.org\n and makes it active. Previous versions of the gem remain installed, but \n are deactivated.\n \n You must use \"sudo\" to properly install the gem. If you execute \"gem \n install\" (omitting the \"sudo\") the gem is installed in your home gem \n repository and it isn't in your path without additional configuration.\n \n Note: You need sudo privileges to run the command as written. If you \n can't sudo, then you can install it locally and will need some additional\n configuration. Contact me (or your local Ruby wizard) for assistance. \n \n The executable is now in your path.\n \n You can cleanly uninstall the gem with this command:\n \n felix$ sudo gem uninstall sm-transcript \n \n If you have access to our svn repository, you are welcome to check out the \n code. Be warned that the trunk tip is not necessarily stable. It changes \n frequently as enhancements (and bug fixes) are added. (note that the\n 'smb_transcript' in the command line below is not a typo.)\n\n svn co svn+ssh://svn.mit.edu/oeit-tsa/SMB/smb_transcript/trunk sm_transcript\n \n build the gem by running this command from the directory you installed the \n source. This is what it looks like on my machine:\n \n felix$ rake gem\n \n The gem will be built and put in ./pkg You can now use the gem \n installation instructions above.\n \n\n\
- Using the App:\n Run with no command line parameters, the app reads *.wrd files out of \n ./results and writes *.t1.html files to ./transcripts. These directories\n are relative to where sm_transcript is called.\n \n Note: destination files are overwritten without a warning prompt. If you \n want to preserve an existing output file, rename it before running the app\n again.\n \n For example, run the app by navigating to the bin folder and enter \n\n projects/sm_transcript/bin felix$ sm_transcript\n \n This command run from this folder will read *.wrd files from bin/results\n and write *-t1.html to bin/transcripts.\n \n Usage: sm_transcript [options] \n --srcdir PATH Read files from this folder (Default: ./results)\n --destdir PATH Write files to this folder (Default: ./transcripts)\n --srctype wrd | seg | txt | ttml | srt Kind of file to process (Default: wrd)\n --desttype html | ttml | datajs | json Kind of file to output (Default: html)\n -h, --help Show this message \n\n There is a serious gotch'a in specifying the srctype parameter: it must \n match the case of the file extension that you're processing. I know, \n I know; pretty lame. I will update the gem with a fix shortly. My \n apologies until then.\n\n\
+ Using the App:\n Run with no command line parameters, the app reads *.wrd files out of \n ./results and writes *.t1.html files to ./transcripts. These directories\n are relative to where sm_transcript is called.\n \n Note: destination files are overwritten without a warning prompt. If you \n want to preserve an existing output file, rename it before running the app\n again.\n \n For example, run the app by navigating to the bin folder and enter \n\n projects/sm_transcript/bin felix$ sm_transcript\n \n This command run from this folder will read *.wrd files from bin/results\n and write *-t1.html to bin/transcripts.\n \n Usage: sm_transcript [options] \n --srcdir PATH Read files from this folder (Default: ./results)\n --destdir PATH Write files to this folder (Default: ./transcripts)\n --srctype wrd | seg | txt | ttml | srt Kind of file to process (Default: wrd)\n --desttype html | ttml | datajs | json Kind of file to output (Default: html)\n -h, --help Show this message \n\n There is a serious gotch'a in specifying the srctype parameter: it must \n match the case of the file extension that you're processing. This means \n that if the srt files that you are processing have the extension .SRT, then \n you must specify the srctype as \"SRT\". Pretty lame, I know. I will update \n the gem with a fix shortly. My apologies until then.\n\n\
  Troubleshooting:\n sm-transcript requires additional gems to operate. The RubyGem \n installation should install dependencies automatically, but when it \n doesn't, you get an error that includes \n \n ... no such file to load -- builder (LoadError)\n \n in the first few lines when you run sm-transcript, the problem is a \n missing dependent gem. (the error above indicates that the Builder \n gem is missing.) Try installing the missing gem. For the error above,\n the command looks like this on my computer:\n \n felix$ sudo gem install builder\n \n See \"Required Gems\" below for more information.\n \n \n A warning message such as:\n \n \"WARNING: Nokogiri was built against LibXML version 2.7.6, \n but has dynamically loaded 2.7.7\"\"\n \n may be safely ignored.\n \n If you continue to have trouble, feel free to contact me.\n \n \n\
  Upgrading:\n You can easily upgrade by simply executing the same command you used to \n install the gem. Running install again will add the newer version and make\n it active. By default the most recent version is used, but older versions\n are still available, simply inactive.\n \n If are using svn, you should already know what to do.\n \n \n\
  Required Gems:\n builder - create structured data, such as XML\n extensions - added for the 'require_relative' command. (To get this\n command in Ruby 1.8 you need to install this gem, for Ruby 1.9\n the command is already part of the core.)\n htmlentities - html parsing\n json - create JSON structured data\n nokogiri - xml parsing library\n optparse - option parsing of command line\n ostruct - open data structures\n ppcommand - pp is a pretty printer. It is used only for debugging\n rake - make for Ruby\n rubygems - support for gems (shouldn't be needed for Ruby 1.9)\n shoulda - enhancement for Test::Unit\n \n This command installs gems on OSX and Linux:\n felix$ sudo gem install <gem name>\n \n I recommend running the following command to update to latest version of\n rubygems before loading new gems.\n felix$ sudo gem update --system\n \n\
  Unit Tests:\n You may run all unit tests by navigating to the test folder and running \n rake with no parameters (the default rake task runs all tests). On my\n computer, it looks like this:\n\n projects/sm_transcript/test felix$ rake \n\n\n\
- Release Notes:\n Initial Version - runs under Ruby 1.8.x. \n version 0.0.4 - fixes bug when processing .WRD files with CRLF line\n endings.\n version 0.0.5 - removed due to posting error\n version 0.0.6 - added srctype of ttml and desttype of json, fixed bug\n where beginning time of word was actually for previous word.\n version 0.0.7 - added srt as srctype \n\n\
+ Release Notes:\n Initial Version - runs under Ruby 1.8.x. \n version 0.0.4 - fixes bug when processing .WRD files with CRLF line\n endings.\n version 0.0.5 - removed due to posting error\n version 0.0.6 - added srctype of ttml and desttype of json, fixed bug where\n beginning time of word was actually for previous word.\n version 0.0.7 - added srt as srctype \n version 0.0.8 - fixed bug that dropped last phrase from transcripts \n version 1.0.0 - declared this version 1.0.0 to conform more closely with \n gem numbering conventions. All tests run successfully. \n \n\
  To Do:\n - specify individual files for processing rather than folders\n - fix bug in srt processing: can't read Creole srt content.\n - allow user to modify the \"t1\" file extension for addition languages of \n the same transcript.\n - update code to run under Ruby 1.9\n\n\n "
  email: pwilkins@mit.edu
  executables:
@@ -75,7 +75,7 @@ files:
  - test/test_wrdreader.rb
  - test/results/18.03-2004-L01.align2.wrd
  - test/results/18.06-03.srt
- - test/results/20101018 OCW-18.01-f07-lec02_300k - Haitian Creole.srt
+ - test/results/20101018_OCW-18.01-f07-lec02_300k-Haitian Creole.srt
  - test/results/3.091-04.srt
  - test/results/5.60-01.SRT
  - test/results/7.012-01.srt