sm-transcript 0.0.4 → 0.0.6

Files changed (48)
  1. data/README.txt +138 -118
  2. data/Rakefile +21 -10
  3. data/bin/sm-transcript +0 -0
  4. data/lib/sm_transcript/metadata.rb +25 -0
  5. data/lib/sm_transcript/options.rb +9 -3
  6. data/lib/sm_transcript/runner.rb +6 -0
  7. data/lib/sm_transcript/seg_reader.rb +1 -1
  8. data/lib/sm_transcript/transcript.rb +86 -39
  9. data/lib/sm_transcript/ttml_reader.rb +116 -0
  10. data/lib/sm_transcript/word.rb +6 -4
  11. data/lib/sm_transcript/wrd_reader.rb +5 -4
  12. data/test/results/18.03-2004-L01.align2.wrd +6441 -0
  13. data/test/results/8.01-1999-L01.wrd +5182 -0
  14. data/test/results/801-1stLecture.ttml.xml +757 -0
  15. data/test/results/801-lect01-4730.xml +757 -0
  16. data/test/results/801-lect02-4731.xml +886 -0
  17. data/test/results/801-lect03-4732.xml +818 -0
  18. data/test/results/801-lect04-4733.xml +831 -0
  19. data/test/results/801-lect05-4734.xml +879 -0
  20. data/test/results/801-lect06-4735.xml +822 -0
  21. data/test/results/801-lect07-4736.xml +893 -0
  22. data/test/results/801-lect08-4737.xml +809 -0
  23. data/test/results/801-lect09-4738.xml +807 -0
  24. data/test/results/Audio-Open-The_New_Deal_for_Education.xml +4301 -0
  25. data/test/test_metadatareader.rb +8 -3
  26. data/test/test_options.rb +8 -1
  27. data/test/test_runner.rb +34 -1
  28. data/test/test_transcript.rb +109 -12
  29. data/test/test_ttmlreader.rb +104 -0
  30. data/test/test_wrdreader.rb +24 -9
  31. metadata +47 -148
  32. data/lib/sm_transcript/optparseExample.rb +0 -113
  33. data/lib/sm_transcript/process_csv_files_to_html.rb +0 -58
  34. data/lib/sm_transcript/process_seg_files.rb +0 -21
  35. data/lib/sm_transcript/process_seg_files_to_csv.rb +0 -24
  36. data/lib/sm_transcript/process_seg_files_to_html.rb +0 -31
  37. data/lib/sm_transcript/require_relative.rb +0 -14
  38. data/test/transcripts/GardnerRileyInterview.t1.html +0 -247
  39. data/test/transcripts/IIHS_Diane_Davis_Nov2009-t1.html +0 -148
  40. data/test/transcripts/NERCOMP-SpokenMedia4.t1.html +0 -2178
  41. data/test/transcripts/data.js +0 -24
  42. data/test/transcripts/vijay_kumar-1.-t1.html +0 -557
  43. data/test/transcripts/vijay_kumar-1.t1.html +0 -558
  44. data/test/transcripts/vijay_kumar-t1.html +0 -558
  45. data/test/transcripts/vijay_kumar-t1.ttml +0 -570
  46. data/test/transcripts/vijay_kumar.data.js +0 -2
  47. data/test/transcripts/vijay_kumar.t1.html +0 -557
  48. data/test/transcripts/wirehair-beetle.data.js +0 -24
data/README.txt CHANGED
@@ -1,140 +1,160 @@
1
- $Id: README.txt 194 2010-03-28 00:09:23Z pwilkins $
1
+ $Id: README.txt 196 2010-06-11 18:51:18Z pwilkins $
2
2
 
3
3
  sm-transcript reads results of SLS processing and produces transcripts for
4
4
  the SpokenMedia browser. For each file in the source folder whose extension
5
5
  matches the source type, a file of destination type is created in the
6
- destination folder. All of these parameters have default values.
6
+ destination folder. All of these parameters have default values.
7
+
8
+ Note: Examples of the commands you enter in the terminal are for *nix. The
9
+ command prompt in the examples is:
10
+
11
+ felix$ <command line>
12
+
13
+ If you are a Windows user, make the usual adjustments.
7
14
 
8
15
  Requirements:
9
- sm-transcript is written in Ruby and packaged as a RubyGem. Since Ruby is
10
- not a compiled language, you will need to have Ruby installed on your
11
- machine to run sm-transcript. You can determine if Ruby is installed by
12
- typing "ruby -v" at a terminal prompt. It should return the version of
13
- Ruby that is installed. If Ruby is not installed on your machine, contact
14
- me (or your local Ruby wizard) for assistance.
15
-
16
+ sm-transcript is written in Ruby and packaged as a RubyGem. Since Ruby is
17
+ not a compiled language, you will need to have Ruby installed on your
18
+ machine to run sm-transcript. You can determine if Ruby is installed by
19
+ typing "ruby -v" at a terminal prompt. It should return the version of
20
+ Ruby that is installed. If Ruby is not installed on your machine, contact
21
+ me (or your local Ruby wizard) for assistance.
22
+
16
23
  Installation:
17
- You can get sm-transcript as either a RubyGem or as source from svn.
18
-
19
- The preferred way to install this package is as a Rubygem. You can
20
- download and install the gem with this command:
21
-
22
- sudo gem install [--verbose] sm-transcript
23
-
24
- This command downloads the most recent version of the gem from rubygems.org
25
- and makes it active. Previous versions of the gem remain installed, but
26
- are deactivated.
27
-
28
- You must use "sudo" to properly install the gem. If you execute "gem
29
- install" (omitting the "sudo") the gem is installed in your home gem
30
- repository and it isn't in your path without additional configuration.
31
-
32
- Note: You need sudo privileges to run the command as written. If you
33
- can't sudo, then you can install it locally and will need some additional
34
- configuration. Contact me (or your local Ruby wizard) for assistance.
35
-
36
- The executable is now in your path.
37
-
38
- You can cleanly uninstall the gem with this command:
39
-
40
- sudo gem uninstall sm-transcript
41
-
42
- If you have access to our svn repository, you are welcome to check out the
43
- code. Be warned that the trunk tip is not necessarily stable. It changes
44
- frequently as enhancements (and bug fixes) are added. (note that the
45
- 'smb_transcript' in the command line below is not a typo. )
46
-
47
- svn co svn+ssh://svn.mit.edu/oeit-tsa/SMB/smb_transcript/trunk sm_transcript
48
-
49
- build the gem by running this command from the directory you installed the
50
- source.
51
-
52
- rake gem
53
-
54
- The gem will be built and put in ./pkg You can now use the gem
55
- installation instructions above.
56
-
24
+ You can get sm-transcript as either a RubyGem or as source from svn.
25
+
26
+ The preferred way to install this package is as a Rubygem. You can
27
+ download and install the gem with this command:
28
+
29
+ felix$ sudo gem install [--verbose] sm-transcript
30
+
31
+ This command downloads the most recent version of the gem from rubygems.org
32
+ and makes it active. Previous versions of the gem remain installed, but
33
+ are deactivated.
34
+
35
+ You must use "sudo" to properly install the gem. If you execute "gem
36
+ install" (omitting the "sudo") the gem is installed in your home gem
37
+ repository and it isn't in your path without additional configuration.
38
+
39
+ Note: You need sudo privileges to run the command as written. If you
40
+ can't sudo, then you can install it locally and will need some additional
41
+ configuration. Contact me (or your local Ruby wizard) for assistance.
42
+
43
+ The executable is now in your path.
44
+
45
+ You can cleanly uninstall the gem with this command:
46
+
47
+ felix$ sudo gem uninstall sm-transcript
48
+
49
+ If you have access to our svn repository, you are welcome to check out the
50
+ code. Be warned that the trunk tip is not necessarily stable. It changes
51
+ frequently as enhancements (and bug fixes) are added. (note that the
52
+ 'smb_transcript' in the command line below is not a typo.)
53
+
54
+ svn co svn+ssh://svn.mit.edu/oeit-tsa/SMB/smb_transcript/trunk sm_transcript
55
+
56
+ build the gem by running this command from the directory you installed the
57
+ source. This is what it looks like on my machine:
58
+
59
+ felix$ rake gem
60
+
61
+ The gem will be built and put in ./pkg You can now use the gem
62
+ installation instructions above.
63
+
57
64
 
58
65
  Using the App:
59
- Run with no command line parameters, the app reads *.wrd files out of
60
- ./results and writes *t1.html files to ./transcripts. These directories
61
- are relative to where sm_transcript is called.
62
-
63
- Note: destination files are overwritten without a warning prompt. If you
64
- want to preserve an existing output file, rename it before running the app
65
- again.
66
-
67
- For example, run the app by navigating to the bin folder and running
68
-
69
- projects/sm_transcript/bin felix$ sm_transcript
70
-
71
- This command run from this folder will read *.wrd files from bin/results
72
- and write *-t1.html to bin/transcripts.
73
-
74
- Usage: sm_transcript [options]
75
- --srcdir PATH Read files from this folder (Default: ./results)
76
- --destdir PATH Write files to this folder (Default: ./transcripts)
77
- --srctype wrd | seg Kind of file to process (Default: wrd)
78
- --desttype html | ttml | datajs Kind of file to output (Default: html)
79
- -h, --help Show this message
66
+ Run with no command line parameters, the app reads *.wrd files out of
67
+ ./results and writes *t1.html files to ./transcripts. These directories
68
+ are relative to where sm_transcript is called.
69
+
70
+ Note: destination files are overwritten without a warning prompt. If you
71
+ want to preserve an existing output file, rename it before running the app
72
+ again.
73
+
74
+ For example, run the app by navigating to the bin folder and enter
75
+
76
+ projects/sm_transcript/bin felix$ sm_transcript
77
+
78
+ This command run from this folder will read *.wrd files from bin/results
79
+ and write *-t1.html to bin/transcripts.
80
+
81
+ Usage: sm_transcript [options]
82
+ --srcdir PATH Read files from this folder (Default: ./results)
83
+ --destdir PATH Write files to this folder (Default: ./transcripts)
84
+ --srctype wrd | seg | txt | ttml Kind of file to process (Default: wrd)
85
+ --desttype html | ttml | datajs | json Kind of file to output (Default: html)
86
+ -h, --help Show this message
80
87
 
81
88
 
82
89
  Troubleshooting:
83
- sm-transcript requires additional gems to operate. The RubyGem
84
- installation should install dependencies automatically, but when it
85
- doesn't, you get an error that includes
86
-
87
- ... no such file to load -- builder (LoadError)
88
-
89
- in the first few lines when you run sm-transcript, the problem is a
90
- missing dependent gem. (the error above indicates that the Builder
91
- gem is missing.) Try installing the missing gem. For the error above,
92
- command looks like this:
93
-
94
- sudo gem install builder
95
-
96
- See "Required Gems" below for more information.
97
-
98
-
90
+ sm-transcript requires additional gems to operate. The RubyGem
91
+ installation should install dependencies automatically, but when it
92
+ doesn't, you get an error that includes
93
+
94
+ ... no such file to load -- builder (LoadError)
95
+
96
+ in the first few lines when you run sm-transcript, the problem is a
97
+ missing dependent gem. (the error above indicates that the Builder
98
+ gem is missing.) Try installing the missing gem. For the error above,
99
+ the command looks like this on my computer:
100
+
101
+ felix$ sudo gem install builder
102
+
103
+ See "Required Gems" below for more information.
104
+
105
+
106
+ A warning message such as:
107
+
108
+ "WARNING: Nokogiri was built against LibXML version 2.7.6,
109
+ but has dynamically loaded 2.7.7"
110
+
111
+ may be safely ignored.
112
+
113
+
99
114
  Upgrading:
100
- You can easily upgrade by simply executing the same command you used to
101
- install the gem. Running install again will add the newer version and make
102
- it active. By default the most recent version is used, but older versions
103
- are still available, simply inactive.
104
-
105
- If are using svn, you should already know what to do.
106
-
107
-
115
+ You can easily upgrade by simply executing the same command you used to
116
+ install the gem. Running install again will add the newer version and make
117
+ it active. By default the most recent version is used, but older versions
118
+ are still available, simply inactive.
119
+
120
+ If you are using svn, you should already know what to do.
121
+
122
+
108
123
  Required Gems:
109
- builder - create structured data, such as XML
110
- extensions - added for the 'require_relative' command. (To get this
111
- command in Ruby 1.8 you need to install this gem, for Ruby 1.9
112
- the command is already part of the core.)
113
- htmlentities - html parsing
114
- json - create JSON structured data
115
- optparse - option parsing of command line
116
- ostruct - open data structures
117
- ppcommand - pp is a pretty printer. It is used only for debugging
118
- rake - make for Ruby
119
- rubygems - support for gems (shouldn't be needed for Ruby 1.9)
120
- shoulda - enhancement for Test::Unit
121
-
122
- This command installs gems on OSX and Linux:
123
- felix$ sudo gem install <gem name>
124
-
124
+ builder - create structured data, such as XML
125
+ extensions - added for the 'require_relative' command. (To get this
126
+ command in Ruby 1.8 you need to install this gem, for Ruby 1.9
127
+ the command is already part of the core.)
128
+ htmlentities - html parsing
129
+ json - create JSON structured data
130
+ optparse - option parsing of command line
131
+ ostruct - open data structures
132
+ ppcommand - pp is a pretty printer. It is used only for debugging
133
+ rake - make for Ruby
134
+ rubygems - support for gems (shouldn't be needed for Ruby 1.9)
135
+ shoulda - enhancement for Test::Unit
136
+
137
+ This command installs gems on OSX and Linux:
138
+ felix$ sudo gem install <gem name>
139
+
125
140
  Unit Tests:
126
- You may run all unit tests by navigating to the test folder and running
127
- rake with no parameters (the default rake task runs all tests):
141
+ You may run all unit tests by navigating to the test folder and running
142
+ rake with no parameters (the default rake task runs all tests). On my
143
+ computer, it looks like this:
128
144
 
129
- projects/sm_transcript/test felix$ rake
145
+ projects/sm_transcript/test felix$ rake
130
146
 
131
147
 
132
148
  Release Notes:
133
- Initial Version - runs under Ruby 1.8.
149
+ Initial Version - runs under Ruby 1.8.x.
150
+ version 0.0.4 - fixes bug when processing .WRD files with CRLF line
151
+ endings.
152
+ version 0.0.5 - added srctype of ttml and desttype of json, fixed bug
153
+ where beginning time of word was actually for previous word.
134
154
 
135
155
  To Do:
136
- update code to run under Ruby 1.9
156
+ specify individual files for processing rather than folders
157
+ update code to run under Ruby 1.9
158
+
137
159
 
138
- Make this a rubygem, making it available from an OEIT server, rather than
139
- from a public gem repository like RubyForge.
140
-
160
+
data/Rakefile CHANGED
@@ -1,31 +1,42 @@
1
- # $Id: Rakefile 195 2010-04-15 17:29:55Z pwilkins $
1
+ # $Id: Rakefile 196 2010-06-11 18:51:18Z pwilkins $
2
2
 
3
3
  require 'rake/gempackagetask'
4
4
  require 'rake'
5
5
 
6
- spec = Gem::Specification.new do |s|
6
+ spec = Gem::Specification.new do |s|
7
7
  s.name = "sm-transcript"
8
8
  s.summary = "Convert word lists to transcripts"
9
9
  s.description= File.read(File.join(File.dirname(__FILE__), 'README.txt'))
10
10
  s.requirements = [ 'TBD' ]
11
- s.version = "0.0.4"
11
+ s.version = "0.0.6"
12
12
  s.author = "Peter Wilkins"
13
13
  s.email = "pwilkins@mit.edu"
14
14
  s.homepage = "http://spokenmedia.mit.edu"
15
15
  s.platform = Gem::Platform::RUBY
16
16
  s.required_ruby_version = '>=1.8'
17
17
  s.files = Dir['lib/**/**'] +
18
- Dir['bin/sm-transcript'] +
19
- Dir['bin/results/PLACEHOLDER.txt'] +
20
- Dir['bin/transcripts/PLACEHOLDER.txt'] +
21
- Dir['test/**/**'] +
18
+ Dir['bin/sm-transcript'] +
19
+ Dir['bin/results/PLACEHOLDER.txt'] +
20
+ Dir['bin/transcripts/PLACEHOLDER.txt'] +
21
+ Dir['test/*'] +
22
+ Dir['test/results/*'] +
23
+ Dir['test/transcripts/PLACEHOLDER.txt'] +
22
24
  Dir['README.txt'] +
23
25
  Dir['LICENSE.txt'] +
24
- Dir['Rakefile']
25
- s.files.reject! { |fn| fn.include? "process_" }
26
+ Dir['Rakefile']
27
+ s.files.reject! { |fn| fn.include? "process_" }
28
+ s.files.reject! { |fn| fn.include? 'lect1' }
29
+ s.files.reject! { |fn| fn.include? 'lect2' }
30
+ s.files.reject! { |fn| fn.include? 'lect3' }
31
+ s.files.reject! { |fn| fn.include? 'file-chksum.rb' }
32
+ s.files.reject! { |fn| fn.include? 'html_tokenizer-example.rb' }
33
+ s.files.reject! { |fn| fn.include? 'optparseExample.rb' }
34
+ s.files.reject! { |fn| fn.include? 'xml_to_sqlite.rb' }
35
+ s.files.reject! { |fn| fn.include? 'require_relative.rb' }
36
+ s.files.reject! { |fn| fn.include? '801-lect1.*' }
26
37
  s.executables = [ 'sm-transcript' ]
27
38
  s.test_files = Dir["test/test*.rb"]
28
39
  s.has_rdoc = false
29
40
  end
30
-
41
+
31
42
  Rake::GemPackageTask.new(spec).define
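The Rakefile above filters s.files with a chain of reject! calls built on String#include?, which tests for a literal substring. An entry written like a glob, such as '801-lect1.*', therefore only matches a path that contains those exact characters. The following is a minimal sketch of one way to express the same filter in a single pass. EXCLUDED_FRAGMENTS, EXCLUDED_PATTERN and packaged_files are hypothetical names, and the Regexp is only an assumption about what the glob-like entry was meant to do.

```ruby
# Hypothetical helper, not part of the gem: filter a file list the way the
# Rakefile's chained reject! calls do, using one list of literal fragments.
EXCLUDED_FRAGMENTS = %w[
  process_ lect1 lect2 lect3 file-chksum.rb html_tokenizer-example.rb
  optparseExample.rb xml_to_sqlite.rb require_relative.rb
].freeze

# String#include? matches literal text only, so a glob-like string such as
# '801-lect1.*' needs a Regexp if wildcard behaviour is intended (assumption).
EXCLUDED_PATTERN = /801-lect1\..*/

# Return the paths that survive both the literal and the pattern filters.
def packaged_files(files)
  files.reject do |fn|
    EXCLUDED_FRAGMENTS.any? { |frag| fn.include?(frag) } || fn =~ EXCLUDED_PATTERN
  end
end

sample = [
  'lib/sm_transcript/word.rb',
  'lib/sm_transcript/process_seg_files.rb',
  'test/results/801-lect1.wrd',
  'test/results/801-lect01-4730.xml'
]
p packaged_files(sample)
```

Note that 'lect1' does not occur in names such as 801-lect01-4730.xml, so the test fixtures added in this release would pass through a filter like this one.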
data/bin/sm-transcript CHANGED
File without changes
data/lib/sm_transcript/metadata.rb CHANGED
@@ -9,6 +9,31 @@ require_relative 'word'
9
9
 
10
10
  module SmTranscript
11
11
  class Metadata
12
+
13
+ # "dc-abstract"
14
+ # "dc-contributor"
15
+ # "dc-creator"
16
+ # "dc-description"
17
+ # "dc-isPartOf"
18
+ # "dc-language"
19
+ # "dc-license"
20
+ # "dc-subject"
21
+ # "dc-title"
22
+ # "dc-audience"
23
+ # "dc-available"
24
+ # "dc-created"
25
+ # "dc-extent"
26
+ # "dc-identifier"
27
+ # "dc-isReplacedBy"
28
+ # "dc-issued"
29
+ # "dc-modified"
30
+ # "dc-publisher"
31
+ # "dc-replaces"
32
+ # "dc-rightsHolder"
33
+ # "dc-spatial"
34
+ # "dc-temporal"
35
+ # "dc-type"
36
+ # "dc-valid"
12
37
 
13
38
  def initialize(metadata)
14
39
  @metadata = metadata
data/lib/sm_transcript/options.rb CHANGED
@@ -11,6 +11,7 @@ module SmTranscript
11
11
  SEG_SRC_TYPE = 'seg'
12
12
  WRD_SRC_TYPE = 'wrd'
13
13
  TXT_SRC_TYPE = 'txt'
14
+ TTML_SRC_TYPE = 'xml'
14
15
  TTML_DEST_TYPE = 'ttml'
15
16
  HTML_DEST_TYPE = 'html'
16
17
  DATAJS_DEST_TYPE = 'datajs'
@@ -58,12 +59,12 @@ module SmTranscript
58
59
  @options.destdir = @destdir = ddir
59
60
  end
60
61
 
61
- opts.on("--srctype seg | wrd | txt",
62
- "Kind of file to process (Default: seg)") do |stype|
62
+ opts.on("--srctype seg | wrd | txt | xml",
63
+ "Kind of file to process (Default: wrd)") do |stype|
63
64
  @options.srctype = @srctype = stype
64
65
  end
65
66
 
66
- opts.on("--desttype html | ttml | datajs",
67
+ opts.on("--desttype html | ttml | datajs | json",
67
68
  "Kind of format to output (Default: html)") do |dtype|
68
69
  @options.desttype = @desttype = dtype
69
70
  end
@@ -73,6 +74,11 @@ module SmTranscript
73
74
  return
74
75
  end
75
76
 
77
+ opts.on("-v", "--version", "Show version") do
78
+ puts "\nsm-transcript gem version: 0.0.5rc"
79
+ return
80
+ end
81
+
76
82
  begin
77
83
  argv = ["-h"] if argv.empty?
78
84
  opts.parse!(argv)
data/lib/sm_transcript/runner.rb CHANGED
@@ -7,6 +7,7 @@ require 'extensions/kernel'
7
7
  require_relative 'options'
8
8
  require_relative 'seg_reader'
9
9
  require_relative 'wrd_reader'
10
+ require_relative 'ttml_reader'
10
11
  require_relative 'transcript'
11
12
  require_relative 'metadata'
12
13
  require_relative 'metadata_reader'
@@ -23,6 +24,9 @@ module SmTranscript
23
24
  def run
24
25
  # collect files to process
25
26
  begin
27
+ # p "working directory is #{File.new(__FILE__).path}"
28
+ # p "reading from #{@options.srcdir}"
29
+ # p "writing to #{@options.destdir}"
26
30
  raise "source directory doesn't exist" unless FileTest.exists?(@options.srcdir)
27
31
  raise "destination directory doesn't exist" unless FileTest.exists?(@options.destdir)
28
32
 
@@ -32,6 +36,8 @@ module SmTranscript
32
36
  case @options.srctype
33
37
  when SmTranscript::Options::SEG_SRC_TYPE
34
38
  words = SmTranscript::SegReader.from_file(x).words
39
+ when SmTranscript::Options::TTML_SRC_TYPE
40
+ words = SmTranscript::TtmlReader.from_file(x).words
35
41
  when SmTranscript::Options::TXT_SRC_TYPE
36
42
  md = SmTranscript::MetadataReader.from_file(x).metadata
37
43
  else SmTranscript::Options::WRD_SRC_TYPE
data/lib/sm_transcript/seg_reader.rb CHANGED
@@ -34,7 +34,7 @@ module SmTranscript
34
34
  @root.elements.each("/document/lecture/segment") do |s|
35
35
  s.text.scan(/^\d* \d* [\w']*$/) do |t|
36
36
  arr = t.split
37
- @words << SmTranscript::Word.new(arr[0], arr[1], arr[2])
37
+ @words << SmTranscript::Word.new(arr[0], arr[1], arr[1].to_i - arr[0].to_i, arr[2])
38
38
  end
39
39
  end
40
40
  end
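The one-line change above passes a duration (end minus start) as a new third argument to Word.new. A minimal sketch of that parsing step in isolation might look like the following. SegWord and parse_segment are hypothetical names that stand in for the gem's own classes, and the regular expression uses + rather than the * quantifiers in the diff, which is an assumption made here to skip empty fields.

```ruby
# Hypothetical stand-in for SmTranscript::Word with the four-argument form
# used in the diff: start time, end time, duration, word.
SegWord = Struct.new(:start_time, :end_time, :duration, :word)

# Scan segment text for lines of the form "<start> <end> <word>" and build
# one SegWord per line, deriving the duration from the two times.
def parse_segment(text)
  words = []
  text.scan(/^\d+ \d+ [\w']+$/) do |line|
    start_t, end_t, word = line.split
    words << SegWord.new(start_t, end_t, end_t.to_i - start_t.to_i, word)
  end
  words
end

parse_segment("100 250 hello\n250 400 world\n").each do |w|
  puts "#{w.word}: #{w.duration}"
end
```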
data/lib/sm_transcript/transcript.rb CHANGED
@@ -5,12 +5,14 @@
5
5
  require "rexml/document"
6
6
  require 'extensions/kernel'
7
7
  require 'builder'
8
+ require 'sqlite3'
8
9
  require_relative 'word'
9
10
 
10
11
  module SmTranscript
11
12
  class Transcript
12
13
 
13
14
  @words = Array.new()
15
+ attr_reader :words
14
16
 
15
17
  def initialize(word_arr)
16
18
  @metadata = {}
@@ -27,7 +29,7 @@ module SmTranscript
27
29
  prev_start_time = 0
28
30
  start_time = 0
29
31
  @words.each do |w|
30
- # get the start time and reduce its granularity so that multiple
32
+ # get the start time and reduce its granularity so that multiple
31
33
  # words fall within a <span> element.
32
34
  start_time = w.start_time.to_i/1000
33
35
  if start_time.to_i == prev_start_time.to_i # append word
@@ -35,16 +37,16 @@ module SmTranscript
35
37
  else # create a new span_element
36
38
  # since prev_start_time is zero on first line, this avoids
37
39
  # writing a closing </span> with no opening <span>
40
+ span_element = cleanup_phrase(span_element)
38
41
  f.puts span_element << "</span> " unless prev_start_time == 0
39
-
40
- span_element = "<span id='T#{start_time}'>#{w.word}"
41
- prev_start_time = start_time
42
+ span_element = "<span id='T#{start_time}'>#{w.word}"
43
+ prev_start_time = start_time
42
44
  end
43
45
  end
44
- # In the block above, the last word isn't written if
45
- # the start_time and prev_start_time are the same.
46
- f.puts span_element << "</span> " unless start_time != prev_start_time
47
-
46
+ # In the block above, the last word isn't written if
47
+ # the start_time and prev_start_time are the same.
48
+ f.puts span_element << "</span> " unless start_time != prev_start_time
49
+ f.close
48
50
  end
49
51
  end # write_html()
50
52
 
@@ -57,13 +59,13 @@ module SmTranscript
57
59
  buf = ""
58
60
  bldr = Builder::XmlMarkup.new( :target => buf, :indent => 2 )
59
61
  bldr.instruct!
60
- bldr.tt("xmlns" => "http://www.w3.org/2006/04/ttaf1",
62
+ bldr.tt("xmlns" => "http://www.w3.org/2006/04/ttaf1",
61
63
  "xmlns:tts" => "http://www.w3.org/ns/ttml#styling",
62
64
  "xmlns:ttm" => "http://www.w3.org/ns/ttml#metadata",
63
- "xml:lang" => "en" ) {
65
+ "xml:lang" => "en" ) {
64
66
  bldr.head { |b|
65
- b.ttm :title, 'Document Metadata Example'
66
- b.ttm :desc, 'This document employs document metadata.'
67
+ b.ttm :title, 'The title of this transcript'
68
+ b.ttm :desc, 'The description of this transcript'
67
69
  }
68
70
  bldr.body {
69
71
  bldr.div {
@@ -72,31 +74,37 @@ module SmTranscript
72
74
  start_ms = end_ms = 0
73
75
  start_secs = 0
74
76
  @words.each do |w|
75
- # get the start time and reduce its granularity so that multiple
76
- # words fall within a span element.
77
+ # get the start time and reduce its granularity so that
78
+ # multiple words form a phrase.
77
79
  start_secs = w.start_time.to_i/1000
78
80
  if start_secs == prev_start_secs # append word
79
- end_ms = w.end_time.to_i
81
+ end_ms = w.end_time.to_i
80
82
  span_element << " #{w.word}"
81
83
  else # create a new span_element
82
- bldr.p( span_element,
83
- "xml:id" => "T#{start_secs.to_s}", "begin" => "#{start_ms.to_s}ms", "end" => "#{end_ms.to_s}ms" )
84
+ start_secs = w.start_time.to_i/1000
85
+ bldr.p( span_element,
86
+ "xml:id" => "T#{start_secs.to_s}",
87
+ "begin" => "#{start_ms.to_s}ms",
88
+ "dur" => "#{(end_ms - start_ms).to_s}ms",
89
+ "end" => "#{end_ms.to_s}ms" )
84
90
 
85
91
  start_ms = w.start_time.to_i
86
92
  end_ms = w.end_time.to_i
87
- span_element = " #{w.word}"
88
- prev_start_secs = start_secs
93
+ span_element = " #{w.word}"
94
+ prev_start_secs = start_secs
89
95
  end
90
- end
91
- # In the block above, the last word isn't written if
92
- # the start_time and prev_start_time are the same.
93
- bldr.p( span_element,
94
- "xml:id" => "T#{start_secs.to_s}",
95
- "begin" => "#{start_ms.to_s}ms",
96
- "end" => "#{end_ms.to_s}ms" ) unless start_secs != prev_start_secs
96
+ end # @words.each
97
+
98
+ # In the block above, the last word isn't written if
99
+ # the start_time and prev_start_time are the same.
100
+ bldr.p( span_element,
101
+ "xml:id" => "T#{start_secs.to_s}",
102
+ "begin" => "#{start_ms.to_s}ms",
103
+ "dur" => "#{(end_ms - start_ms).to_s}ms",
104
+ "end" => "#{end_ms.to_s}ms" ) unless start_secs != prev_start_secs
97
105
  }
98
106
  }
99
- }
107
+ }
100
108
  # p buf
101
109
  File.open(dest_file, "w") do |f|
102
110
  f.puts buf
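The hunk above groups words into phrases by dropping the start time from milliseconds to whole seconds, and now emits a dur attribute alongside begin and end. The grouping and the arithmetic can be sketched separately from the XML builder as shown below. PhraseWord and words_to_phrases are hypothetical names, and the sketch assumes the words arrive sorted by start time with all times in milliseconds.

```ruby
# Hypothetical stand-in for SmTranscript::Word; the field names mirror the
# accessors used in the diff (start_time, end_time, word).
PhraseWord = Struct.new(:start_time, :end_time, :word)

# Group words whose start times fall within the same second into one phrase
# and report begin, dur and end in milliseconds for each phrase.
def words_to_phrases(words)
  grouped = words.group_by { |w| w.start_time.to_i / 1000 }
  grouped.sort_by { |secs, _group| secs }.map do |secs, group|
    begin_ms = group.first.start_time.to_i
    end_ms   = group.last.end_time.to_i
    { :id    => "T#{secs}",
      :begin => "#{begin_ms}ms",
      :dur   => "#{end_ms - begin_ms}ms",
      :end   => "#{end_ms}ms",
      :text  => group.map { |w| w.word }.join(' ') }
  end
end

words = [
  PhraseWord.new(1200, 1500, 'hello'),
  PhraseWord.new(1600, 1900, 'world'),
  PhraseWord.new(2100, 2400, 'again')
]
words_to_phrases(words).each { |phrase| p phrase }
```

Building the phrase list first keeps the timing arithmetic in one place, so each phrase's id, text and times always come from the same group of words.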
@@ -104,27 +112,66 @@ module SmTranscript
104
112
  end
105
113
  end
106
114
 
107
- # Times are expressed in milliseconds, far more granularity than is
108
- # useful for most user-facing apps, especially since the player reports
115
+
116
+ # The JSON format is defined at http://url/of/document. It is the format
117
+ # of the static timed-text document that is passed to the player.
118
+ def write_json(dest_file)
119
+
120
+ end # write_json()
121
+
122
+
123
+ # Store transcript in a Sqlite database (though the essence of this
124
+ # method should work for all relational dbs). Unlike some of the other
125
+ # write_xxx() methods, this one requires a @metadata array.
126
+ # param db_id - for SQLite, this is a filename.
127
+ # video_id - is a unique identifier for the video
128
+
129
+ def write_sqlite(db_id)
130
+ db_id = "sm-transcript"
131
+ db = SQLite3::Database.open(db_id + '.sqlite3')
132
+
133
+ fields = XPath.match(doc.root, inner_node_name + '[1]/*').map{|node| node.name}
134
+ field_def = fields.map {|x| "%s TEXT" % x}.join(', ')
135
+
136
+ end # write_sqlite()
137
+
138
+
139
+ private
140
+
141
+ # Times are expressed in milliseconds, far more granularity than is
142
+ # useful for most user-facing apps, especially since the player reports
109
143
  # elapsed time only ten times a second.
110
- # By reducing the time by orders of magnitude provides these benefits:
144
+ # By reducing the time by orders of magnitude provides these benefits:
111
145
  # 1) Multiple words fall within a <span> element.
112
146
  # 2) Better mapping between start times and player time tracking
113
147
  def words_to_phrase(start_time)
114
148
  start_time.to_i/1000
115
149
  end # words_to_phrase
116
-
117
- def get_time_expression(milliseconds)
118
- milliseconds
119
- end
120
-
121
- # There are some word combinations that occur with such regularity that
150
+
151
+ # def get_time_expression(milliseconds)
152
+ # milliseconds
153
+ # end
154
+
155
+ # There are some word combinations that occur with such regularity that
122
156
  # they call out to be fixed. For example, "m I t" is unambiguously MIT.
123
- # These edits can only be done when the phrase has been assembled.
157
+ # These edits can only be done when the phrase has been assembled since
158
+ # each letter is treated as an individual word.
124
159
  def cleanup_phrase(phrase)
125
- phrase
160
+ phrase.gsub(/m I t/, 'MIT')
161
+ phrase.gsub(/o e I t/, 'OEIT')
162
+ end
163
+
164
+ # remove HTML tags from text. requires classes from ActionPack
165
+ def strip_tags(html)
166
+ return html if html.empty? || !html.include?('<')
167
+ output = ""
168
+ tokenizer = HTML::Tokenizer.new(html)
169
+ while token = tokenizer.next
170
+ node = HTML::Node.parse(nil, 0, 0, token, false)
171
+ output += token unless (node.kind_of? HTML::Tag) or (token =~ /^<!/)
172
+ end
173
+ return output
126
174
  end
127
-
128
175
 
129
176
  end # class
130
177
  end
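One detail in cleanup_phrase above is worth noting: String#gsub returns a new string and leaves its receiver untouched, so with two separate gsub calls only the result of the last one is returned and the 'MIT' substitution is discarded. A minimal sketch of a chained version follows. PHRASE_FIXES is a hypothetical name, and the \b word boundaries are an assumption added here so that text such as "from I think" is left alone.

```ruby
# Hypothetical table of replacements; the two entries come from the diff.
PHRASE_FIXES = [
  [/\bm I t\b/, 'MIT'],
  [/\bo e I t\b/, 'OEIT']
].freeze

# Apply every fix in turn, feeding each gsub result into the next one so
# that no substitution is lost.
def cleanup_phrase(phrase)
  PHRASE_FIXES.inject(phrase) { |text, (pattern, fix)| text.gsub(pattern, fix) }
end

puts cleanup_phrase('welcome to m I t and o e I t')
```

Keeping the fixes in a table means further spelled-out acronyms can be added without touching the method body.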