sm-transcript 0.0.4 → 0.0.6

Files changed (48)
  1. data/README.txt +138 -118
  2. data/Rakefile +21 -10
  3. data/bin/sm-transcript +0 -0
  4. data/lib/sm_transcript/metadata.rb +25 -0
  5. data/lib/sm_transcript/options.rb +9 -3
  6. data/lib/sm_transcript/runner.rb +6 -0
  7. data/lib/sm_transcript/seg_reader.rb +1 -1
  8. data/lib/sm_transcript/transcript.rb +86 -39
  9. data/lib/sm_transcript/ttml_reader.rb +116 -0
  10. data/lib/sm_transcript/word.rb +6 -4
  11. data/lib/sm_transcript/wrd_reader.rb +5 -4
  12. data/test/results/18.03-2004-L01.align2.wrd +6441 -0
  13. data/test/results/8.01-1999-L01.wrd +5182 -0
  14. data/test/results/801-1stLecture.ttml.xml +757 -0
  15. data/test/results/801-lect01-4730.xml +757 -0
  16. data/test/results/801-lect02-4731.xml +886 -0
  17. data/test/results/801-lect03-4732.xml +818 -0
  18. data/test/results/801-lect04-4733.xml +831 -0
  19. data/test/results/801-lect05-4734.xml +879 -0
  20. data/test/results/801-lect06-4735.xml +822 -0
  21. data/test/results/801-lect07-4736.xml +893 -0
  22. data/test/results/801-lect08-4737.xml +809 -0
  23. data/test/results/801-lect09-4738.xml +807 -0
  24. data/test/results/Audio-Open-The_New_Deal_for_Education.xml +4301 -0
  25. data/test/test_metadatareader.rb +8 -3
  26. data/test/test_options.rb +8 -1
  27. data/test/test_runner.rb +34 -1
  28. data/test/test_transcript.rb +109 -12
  29. data/test/test_ttmlreader.rb +104 -0
  30. data/test/test_wrdreader.rb +24 -9
  31. metadata +47 -148
  32. data/lib/sm_transcript/optparseExample.rb +0 -113
  33. data/lib/sm_transcript/process_csv_files_to_html.rb +0 -58
  34. data/lib/sm_transcript/process_seg_files.rb +0 -21
  35. data/lib/sm_transcript/process_seg_files_to_csv.rb +0 -24
  36. data/lib/sm_transcript/process_seg_files_to_html.rb +0 -31
  37. data/lib/sm_transcript/require_relative.rb +0 -14
  38. data/test/transcripts/GardnerRileyInterview.t1.html +0 -247
  39. data/test/transcripts/IIHS_Diane_Davis_Nov2009-t1.html +0 -148
  40. data/test/transcripts/NERCOMP-SpokenMedia4.t1.html +0 -2178
  41. data/test/transcripts/data.js +0 -24
  42. data/test/transcripts/vijay_kumar-1.-t1.html +0 -557
  43. data/test/transcripts/vijay_kumar-1.t1.html +0 -558
  44. data/test/transcripts/vijay_kumar-t1.html +0 -558
  45. data/test/transcripts/vijay_kumar-t1.ttml +0 -570
  46. data/test/transcripts/vijay_kumar.data.js +0 -2
  47. data/test/transcripts/vijay_kumar.t1.html +0 -557
  48. data/test/transcripts/wirehair-beetle.data.js +0 -24
data/README.txt CHANGED
@@ -1,140 +1,160 @@
1
- $Id: README.txt 194 2010-03-28 00:09:23Z pwilkins $
1
+ $Id: README.txt 196 2010-06-11 18:51:18Z pwilkins $
2
2
 
3
3
  sm-transcript reads results of SLS processing and produces transcripts for
4
4
  the SpokenMedia browser. For each file in the source folder whose extension
5
5
  matches the source type, a file of destination type is created in the
6
- destination folder. All of these parameters have default values.
6
+ destination folder. All of these parameters have default values.
7
+
8
+ Note: Examples of the commands you enter in the terminal are for *nix. The
9
+ command prompt in the examples is:
10
+
11
+ felix$ <command line>
12
+
13
+ If you are a Windows user, make the usual adjustments.
7
14
 
8
15
  Requirements:
9
- sm-transcript is written in Ruby and packaged as a RubyGem. Since Ruby is
10
- not a compiled language, you will need to have Ruby installed on your
11
- machine to run sm-transcript. You can determine if Ruby is installed by
12
- typing "ruby -v" at a terminal prompt. It should return the version of
13
- Ruby that is installed. If Ruby is not installed on your machine, contact
14
- me (or your local Ruby wizard) for assistance.
15
-
16
+ sm-transcript is written in Ruby and packaged as a RubyGem. Since Ruby is
17
+ not a compiled language, you will need to have Ruby installed on your
18
+ machine to run sm-transcript. You can determine if Ruby is installed by
19
+ typing "ruby -v" at a terminal prompt. It should return the version of
20
+ Ruby that is installed. If Ruby is not installed on your machine, contact
21
+ me (or your local Ruby wizard) for assistance.
22
+
16
23
  Installation:
17
- You can get sm-transcript as either a RubyGem or as source from svn.
18
-
19
- The preferred way to install this package is as a Rubygem. You can
20
- download and install the gem with this command:
21
-
22
- sudo gem install [--verbose] sm-transcript
23
-
24
- This command downloads the most recent version of the gem from rubygems.org
25
- and makes it active. Previous versions of the gem remain installed, but
26
- are deactivated.
27
-
28
- You must use "sudo" to properly install the gem. If you execute "gem
29
- install" (omitting the "sudo") the gem is installed in your home gem
30
- repository and it isn't in your path without additional configuration.
31
-
32
- Note: You need sudo privileges to run the command as written. If you
33
- can't sudo, then you can install it locally and will need some additional
34
- configuration. Contact me (or your local Ruby wizard) for assistance.
35
-
36
- The executable is now in your path.
37
-
38
- You can cleanly uninstall the gem with this command:
39
-
40
- sudo gem uninstall sm-transcript
41
-
42
- If you have access to our svn repository, you are welcome to check out the
43
- code. Be warned that the trunk tip is not necessarily stable. It changes
44
- frequently as enhancements (and bug fixes) are added. (note that the
45
- 'smb_transcript' in the command line below is not a typo. )
46
-
47
- svn co svn+ssh://svn.mit.edu/oeit-tsa/SMB/smb_transcript/trunk sm_transcript
48
-
49
- build the gem by running this command from the directory you installed the
50
- source.
51
-
52
- rake gem
53
-
54
- The gem will be built and put in ./pkg You can now use the gem
55
- installation instructions above.
56
-
24
+ You can get sm-transcript as either a RubyGem or as source from svn.
25
+
26
+ The preferred way to install this package is as a Rubygem. You can
27
+ download and install the gem with this command:
28
+
29
+ felix$ sudo gem install [--verbose] sm-transcript
30
+
31
+ This command downloads the most recent version of the gem from rubygems.org
32
+ and makes it active. Previous versions of the gem remain installed, but
33
+ are deactivated.
34
+
35
+ You must use "sudo" to properly install the gem. If you execute "gem
36
+ install" (omitting the "sudo") the gem is installed in your home gem
37
+ repository and it isn't in your path without additional configuration.
38
+
39
+ Note: You need sudo privileges to run the command as written. If you
40
+ can't sudo, then you can install it locally and will need some additional
41
+ configuration. Contact me (or your local Ruby wizard) for assistance.
42
+
43
+ The executable is now in your path.
44
+
45
+ You can cleanly uninstall the gem with this command:
46
+
47
+ felix$ sudo gem uninstall sm-transcript
48
+
49
+ If you have access to our svn repository, you are welcome to check out the
50
+ code. Be warned that the trunk tip is not necessarily stable. It changes
51
+ frequently as enhancements (and bug fixes) are added. (note that the
52
+ 'smb_transcript' in the command line below is not a typo.)
53
+
54
+ svn co svn+ssh://svn.mit.edu/oeit-tsa/SMB/smb_transcript/trunk sm_transcript
55
+
56
+ build the gem by running this command from the directory you installed the
57
+ source. This is what it looks like on my machine:
58
+
59
+ felix$ rake gem
60
+
61
+ The gem will be built and put in ./pkg. You can now use the gem
62
+ installation instructions above.
63
+
57
64
 
58
65
  Using the App:
59
- Run with no command line parameters, the app reads *.wrd files out of
60
- ./results and writes *t1.html files to ./transcripts. These directories
61
- are relative to where sm_transcript is called.
62
-
63
- Note: destination files are overwritten without a warning prompt. If you
64
- want to preserve an existing output file, rename it before running the app
65
- again.
66
-
67
- For example, run the app by navigating to the bin folder and running
68
-
69
- projects/sm_transcript/bin felix$ sm_transcript
70
-
71
- This command run from this folder will read *.wrd files from bin/results
72
- and write *-t1.html to bin/transcripts.
73
-
74
- Usage: sm_transcript [options]
75
- --srcdir PATH Read files from this folder (Default: ./results)
76
- --destdir PATH Write files to this folder (Default: ./transcripts)
77
- --srctype wrd | seg Kind of file to process (Default: wrd)
78
- --desttype html | ttml | datajs Kind of file to output (Default: html)
79
- -h, --help Show this message
66
+ Run with no command line parameters, the app reads *.wrd files out of
67
+ ./results and writes *t1.html files to ./transcripts. These directories
68
+ are relative to where sm_transcript is called.
69
+
70
+ Note: destination files are overwritten without a warning prompt. If you
71
+ want to preserve an existing output file, rename it before running the app
72
+ again.
73
+
74
+ For example, run the app by navigating to the bin folder and entering
75
+
76
+ projects/sm_transcript/bin felix$ sm_transcript
77
+
78
+ This command run from this folder will read *.wrd files from bin/results
79
+ and write *-t1.html to bin/transcripts.
80
+
81
+ Usage: sm_transcript [options]
82
+ --srcdir PATH Read files from this folder (Default: ./results)
83
+ --destdir PATH Write files to this folder (Default: ./transcripts)
84
+ --srctype wrd | seg | txt | ttml Kind of file to process (Default: wrd)
85
+ --desttype html | ttml | datajs | json Kind of file to output (Default: html)
86
+ -h, --help Show this message
80
87
 
81
88
 
82
89
  Troubleshooting:
83
- sm-transcript requires additional gems to operate. The RubyGem
84
- installation should install dependencies automatically, but when it
85
- doesn't, you get an error that includes
86
-
87
- ... no such file to load -- builder (LoadError)
88
-
89
- in the first few lines when you run sm-transcript, the problem is a
90
- missing dependent gem. (the error above indicates that the Builder
91
- gem is missing.) Try installing the missing gem. For the error above,
92
- command looks like this:
93
-
94
- sudo gem install builder
95
-
96
- See "Required Gems" below for more information.
97
-
98
-
90
+ sm-transcript requires additional gems to operate. The RubyGem
91
+ installation should install dependencies automatically, but when it
92
+ doesn't, you get an error that includes
93
+
94
+ ... no such file to load -- builder (LoadError)
95
+
96
+ in the first few lines when you run sm-transcript, the problem is a
97
+ missing dependent gem. (the error above indicates that the Builder
98
+ gem is missing.) Try installing the missing gem. For the error above,
99
+ the command looks like this on my computer:
100
+
101
+ felix$ sudo gem install builder
102
+
103
+ See "Required Gems" below for more information.
104
+
105
+
106
+ A warning message such as:
107
+
108
+ "WARNING: Nokogiri was built against LibXML version 2.7.6,
109
+ but has dynamically loaded 2.7.7"
110
+
111
+ may be safely ignored.
112
+
113
+
99
114
  Upgrading:
100
- You can easily upgrade by simply executing the same command you used to
101
- install the gem. Running install again will add the newer version and make
102
- it active. By default the most recent version is used, but older versions
103
- are still available, simply inactive.
104
-
105
- If are using svn, you should already know what to do.
106
-
107
-
115
+ You can easily upgrade by simply executing the same command you used to
116
+ install the gem. Running install again will add the newer version and make
117
+ it active. By default the most recent version is used, but older versions
118
+ are still available, simply inactive.
119
+
120
+ If you are using svn, you should already know what to do.
121
+
122
+
108
123
  Required Gems:
109
- builder - create structured data, such as XML
110
- extensions - added for the 'require_relative' command. (To get this
111
- command in Ruby 1.8 you need to install this gem, for Ruby 1.9
112
- the command is already part of the core.)
113
- htmlentities - html parsing
114
- json - create JSON structured data
115
- optparse - option parsing of command line
116
- ostruct - open data structures
117
- ppcommand - pp is a pretty printer. It is used only for debugging
118
- rake - make for Ruby
119
- rubygems - support for gems (shouldn't be needed for Ruby 1.9)
120
- shoulda - enhancement for Test::Unit
121
-
122
- This command installs gems on OSX and Linux:
123
- felix$ sudo gem install <gem name>
124
-
124
+ builder - create structured data, such as XML
125
+ extensions - added for the 'require_relative' command. (To get this
126
+ command in Ruby 1.8 you need to install this gem, for Ruby 1.9
127
+ the command is already part of the core.)
128
+ htmlentities - html parsing
129
+ json - create JSON structured data
130
+ optparse - option parsing of command line
131
+ ostruct - open data structures
132
+ ppcommand - pp is a pretty printer. It is used only for debugging
133
+ rake - make for Ruby
134
+ rubygems - support for gems (shouldn't be needed for Ruby 1.9)
135
+ shoulda - enhancement for Test::Unit
136
+
137
+ This command installs gems on OSX and Linux:
138
+ felix$ sudo gem install <gem name>
139
+
125
140
  Unit Tests:
126
- You may run all unit tests by navigating to the test folder and running
127
- rake with no parameters (the default rake task runs all tests):
141
+ You may run all unit tests by navigating to the test folder and running
142
+ rake with no parameters (the default rake task runs all tests). On my
143
+ computer, it looks like this:
128
144
 
129
- projects/sm_transcript/test felix$ rake
145
+ projects/sm_transcript/test felix$ rake
130
146
 
131
147
 
132
148
  Release Notes:
133
- Initial Version - runs under Ruby 1.8.
149
+ Initial Version - runs under Ruby 1.8.x.
150
+ version 0.0.4 - fixes bug when processing .WRD files with CRLF line
151
+ endings.
152
+ version 0.0.5 - added srctype of ttml and desttype of json, fixed bug
153
+ where beginning time of word was actually for previous word.
134
154
 
135
155
  To Do:
136
- update code to run under Ruby 1.9
156
+ specify individual files for processing rather than folders
157
+ update code to run under Ruby 1.9
158
+
137
159
 
138
- Make this a rubygem, making it available from an OEIT server, rather than
139
- from a public gem repository like RubyForge.
140
-
160
+
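The 0.0.4 release note in the README mentions a bug with .WRD files that use CRLF line endings. A minimal sketch of line parsing that tolerates both endings, assuming each line has the form "start end word" (as the regex in the seg_reader.rb hunk suggests). `WRD_LINE` and `parse_wrd_line` are hypothetical names, not part of the gem:

```ruby
# Assumed line layout: "<start> <end> <word>", times as integers.
WRD_LINE = /\A(\d+) (\d+) ([\w']+)\z/

# Hypothetical helper: returns a hash for a valid line, nil otherwise.
def parse_wrd_line(line)
  # String#chomp removes a trailing "\n", "\r" or "\r\n", so a CRLF
  # ending no longer prevents the anchored pattern from matching.
  m = WRD_LINE.match(line.chomp)
  return nil unless m
  { :start => m[1].to_i, :end => m[2].to_i, :word => m[3] }
end
```

Without the `chomp`, a trailing "\r" would stop `\z` from matching and the line would be silently skipped.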
data/Rakefile CHANGED
@@ -1,31 +1,42 @@
1
- # $Id: Rakefile 195 2010-04-15 17:29:55Z pwilkins $
1
+ # $Id: Rakefile 196 2010-06-11 18:51:18Z pwilkins $
2
2
 
3
3
  require 'rake/gempackagetask'
4
4
  require 'rake'
5
5
 
6
- spec = Gem::Specification.new do |s|
6
+ spec = Gem::Specification.new do |s|
7
7
  s.name = "sm-transcript"
8
8
  s.summary = "Convert word lists to transcripts"
9
9
  s.description= File.read(File.join(File.dirname(__FILE__), 'README.txt'))
10
10
  s.requirements = [ 'TBD' ]
11
- s.version = "0.0.4"
11
+ s.version = "0.0.6"
12
12
  s.author = "Peter Wilkins"
13
13
  s.email = "pwilkins@mit.edu"
14
14
  s.homepage = "http://spokenmedia.mit.edu"
15
15
  s.platform = Gem::Platform::RUBY
16
16
  s.required_ruby_version = '>=1.8'
17
17
  s.files = Dir['lib/**/**'] +
18
- Dir['bin/sm-transcript'] +
19
- Dir['bin/results/PLACEHOLDER.txt'] +
20
- Dir['bin/transcripts/PLACEHOLDER.txt'] +
21
- Dir['test/**/**'] +
18
+ Dir['bin/sm-transcript'] +
19
+ Dir['bin/results/PLACEHOLDER.txt'] +
20
+ Dir['bin/transcripts/PLACEHOLDER.txt'] +
21
+ Dir['test/*'] +
22
+ Dir['test/results/*'] +
23
+ Dir['test/transcripts/PLACEHOLDER.txt'] +
22
24
  Dir['README.txt'] +
23
25
  Dir['LICENSE.txt'] +
24
- Dir['Rakefile']
25
- s.files.reject! { |fn| fn.include? "process_" }
26
+ Dir['Rakefile']
27
+ s.files.reject! { |fn| fn.include? "process_" }
28
+ s.files.reject! { |fn| fn.include? 'lect1' }
29
+ s.files.reject! { |fn| fn.include? 'lect2' }
30
+ s.files.reject! { |fn| fn.include? 'lect3' }
31
+ s.files.reject! { |fn| fn.include? 'file-chksum.rb' }
32
+ s.files.reject! { |fn| fn.include? 'html_tokenizer-example.rb' }
33
+ s.files.reject! { |fn| fn.include? 'optparseExample.rb' }
34
+ s.files.reject! { |fn| fn.include? 'xml_to_sqlite.rb' }
35
+ s.files.reject! { |fn| fn.include? 'require_relative.rb' }
36
+ s.files.reject! { |fn| fn.include? '801-lect1.*' }
26
37
  s.executables = [ 'sm-transcript' ]
27
38
  s.test_files = Dir["test/test*.rb"]
28
39
  s.has_rdoc = false
29
40
  end
30
-
41
+
31
42
  Rake::GemPackageTask.new(spec).define
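The Rakefile now calls `s.files.reject!` once per excluded name. Because `String#include?` is a literal substring test, the final pattern `'801-lect1.*'` only matches a path that contains that exact text, asterisk included. A minimal sketch of the same filtering with one list, assuming substring matching is what is wanted; `EXCLUDED` and `package_files` are hypothetical names:

```ruby
# Substrings copied from the reject! calls in the Rakefile hunk.
EXCLUDED = %w[process_ lect1 lect2 lect3 file-chksum.rb
              html_tokenizer-example.rb optparseExample.rb
              xml_to_sqlite.rb require_relative.rb]

# Hypothetical helper: drop every path containing an excluded substring.
def package_files(files, excluded = EXCLUDED)
  files.reject { |fn| excluded.any? { |pat| fn.include?(pat) } }
end
```

Keeping the names in one array makes it easier to see that `'lect1'` already covers any path the `'801-lect1.*'` entry was presumably meant to catch.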
data/bin/sm-transcript CHANGED
File without changes
data/lib/sm_transcript/metadata.rb CHANGED
@@ -9,6 +9,31 @@ require_relative 'word'
9
9
 
10
10
  module SmTranscript
11
11
  class Metadata
12
+
13
+ # "dc-abstract"
14
+ # "dc-contributor"
15
+ # "dc-creator"
16
+ # "dc-description"
17
+ # "dc-isPartOf"
18
+ # "dc-language"
19
+ # "dc-license"
20
+ # "dc-subject"
21
+ # "dc-title"
22
+ # "dc-audience"
23
+ # "dc-available"
24
+ # "dc-created"
25
+ # "dc-extent"
26
+ # "dc-identifier"
27
+ # "dc-isReplacedBy"
28
+ # "dc-issued"
29
+ # "dc-modified"
30
+ # "dc-publisher"
31
+ # "dc-replaces"
32
+ # "dc-rightsHolder"
33
+ # "dc-spatial"
34
+ # "dc-temporal"
35
+ # "dc-type"
36
+ # "dc-valid"
12
37
 
13
38
  def initialize(metadata)
14
39
  @metadata = metadata
data/lib/sm_transcript/options.rb CHANGED
@@ -11,6 +11,7 @@ module SmTranscript
11
11
  SEG_SRC_TYPE = 'seg'
12
12
  WRD_SRC_TYPE = 'wrd'
13
13
  TXT_SRC_TYPE = 'txt'
14
+ TTML_SRC_TYPE = 'xml'
14
15
  TTML_DEST_TYPE = 'ttml'
15
16
  HTML_DEST_TYPE = 'html'
16
17
  DATAJS_DEST_TYPE = 'datajs'
@@ -58,12 +59,12 @@ module SmTranscript
58
59
  @options.destdir = @destdir = ddir
59
60
  end
60
61
 
61
- opts.on("--srctype seg | wrd | txt",
62
- "Kind of file to process (Default: seg)") do |stype|
62
+ opts.on("--srctype seg | wrd | txt | xml",
63
+ "Kind of file to process (Default: wrd)") do |stype|
63
64
  @options.srctype = @srctype = stype
64
65
  end
65
66
 
66
- opts.on("--desttype html | ttml | datajs",
67
+ opts.on("--desttype html | ttml | datajs | json",
67
68
  "Kind of format to output (Default: html)") do |dtype|
68
69
  @options.desttype = @desttype = dtype
69
70
  end
@@ -73,6 +74,11 @@ module SmTranscript
73
74
  return
74
75
  end
75
76
 
77
+ opts.on("-v", "--version", "Show version") do
78
+ puts "\nsm-transcript gem version: 0.0.5rc"
79
+ return
80
+ end
81
+
76
82
  begin
77
83
  argv = ["-h"] if argv.empty?
78
84
  opts.parse!(argv)
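The options.rb hunks list the accepted values for --srctype and --desttype only in the help text. A minimal sketch, assuming it is acceptable to reject unknown values at parse time: OptionParser accepts an array of allowed values as an argument to `on` and raises `OptionParser::InvalidArgument` for anything else. The value lists and defaults are taken from the hunks above; `parse_types` is a hypothetical helper:

```ruby
require 'optparse'

SRC_TYPES  = %w[seg wrd txt xml]
DEST_TYPES = %w[html ttml datajs json]

# Hypothetical helper: returns the chosen types, with the documented defaults.
def parse_types(argv)
  result = { :srctype => 'wrd', :desttype => 'html' }
  parser = OptionParser.new do |opts|
    opts.on('--srctype TYPE', SRC_TYPES,
            'Kind of file to process (Default: wrd)') { |t| result[:srctype] = t }
    opts.on('--desttype TYPE', DEST_TYPES,
            'Kind of file to output (Default: html)') { |t| result[:desttype] = t }
  end
  parser.parse!(argv.dup)   # dup so the caller's array is left untouched
  result
end
```

For example, `parse_types(%w[--srctype xml --desttype json])` selects the two types added in this release.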
data/lib/sm_transcript/runner.rb CHANGED
@@ -7,6 +7,7 @@ require 'extensions/kernel'
7
7
  require_relative 'options'
8
8
  require_relative 'seg_reader'
9
9
  require_relative 'wrd_reader'
10
+ require_relative 'ttml_reader'
10
11
  require_relative 'transcript'
11
12
  require_relative 'metadata'
12
13
  require_relative 'metadata_reader'
@@ -23,6 +24,9 @@ module SmTranscript
23
24
  def run
24
25
  # collect files to process
25
26
  begin
27
+ # p "working directory is #{File.new(__FILE__).path}"
28
+ # p "reading from #{@options.srcdir}"
29
+ # p "writing to #{@options.destdir}"
26
30
  raise "source directory doesn't exist" unless FileTest.exists?(@options.srcdir)
27
31
  raise "destination directory doesn't exist" unless FileTest.exists?(@options.destdir)
28
32
 
@@ -32,6 +36,8 @@ module SmTranscript
32
36
  case @options.srctype
33
37
  when SmTranscript::Options::SEG_SRC_TYPE
34
38
  words = SmTranscript::SegReader.from_file(x).words
39
+ when SmTranscript::Options::TTML_SRC_TYPE
40
+ words = SmTranscript::TtmlReader.from_file(x).words
35
41
  when SmTranscript::Options::TXT_SRC_TYPE
36
42
  md = SmTranscript::MetadataReader.from_file(x).metadata
37
43
  else SmTranscript::Options::WRD_SRC_TYPE
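The runner checks that the source and destination folders exist before reading any files. A minimal sketch of the same guard, assuming `File.directory?` is an acceptable substitute for the existence test (it also rejects a path that exists but is a plain file). `check_dirs!` is a hypothetical helper, and raising `ArgumentError` is a choice made for this sketch:

```ruby
# Hypothetical helper: fail early when either folder is missing.
def check_dirs!(srcdir, destdir)
  raise ArgumentError, "source directory doesn't exist" unless File.directory?(srcdir)
  raise ArgumentError, "destination directory doesn't exist" unless File.directory?(destdir)
  true
end
```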
data/lib/sm_transcript/seg_reader.rb CHANGED
@@ -34,7 +34,7 @@ module SmTranscript
34
34
  @root.elements.each("/document/lecture/segment") do |s|
35
35
  s.text.scan(/^\d* \d* [\w']*$/) do |t|
36
36
  arr = t.split
37
- @words << SmTranscript::Word.new(arr[0], arr[1], arr[2])
37
+ @words << SmTranscript::Word.new(arr[0], arr[1], arr[1].to_i - arr[0].to_i, arr[2])
38
38
  end
39
39
  end
40
40
  end
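The seg_reader.rb change passes a duration (end minus start) as a new third argument to `Word.new`. A minimal sketch of that computation on plain text, assuming "start end word" lines. `SketchWord` is a stand-in Struct, not the gem's `Word` class, and the pattern uses `\d+` instead of the original `\d*` so that empty numbers are not accepted:

```ruby
# Stand-in for SmTranscript::Word; field order follows the new call.
SketchWord = Struct.new(:start_time, :end_time, :duration, :word)

# Hypothetical helper: one SketchWord per "start end word" line.
def words_from_segment(text)
  text.scan(/^\d+ \d+ [\w']+$/).map do |line|
    start_s, end_s, word = line.split
    # Times arrive as strings, so convert before subtracting.
    SketchWord.new(start_s, end_s, end_s.to_i - start_s.to_i, word)
  end
end
```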
data/lib/sm_transcript/transcript.rb CHANGED
@@ -5,12 +5,14 @@
5
5
  require "rexml/document"
6
6
  require 'extensions/kernel'
7
7
  require 'builder'
8
+ require 'sqlite3'
8
9
  require_relative 'word'
9
10
 
10
11
  module SmTranscript
11
12
  class Transcript
12
13
 
13
14
  @words = Array.new()
15
+ attr_reader :words
14
16
 
15
17
  def initialize(word_arr)
16
18
  @metadata = {}
@@ -27,7 +29,7 @@ module SmTranscript
27
29
  prev_start_time = 0
28
30
  start_time = 0
29
31
  @words.each do |w|
30
- # get the start time and reduce its granularity so that multiple
32
+ # get the start time and reduce its granularity so that multiple
31
33
  # words fall within a <span> element.
32
34
  start_time = w.start_time.to_i/1000
33
35
  if start_time.to_i == prev_start_time.to_i # append word
@@ -35,16 +37,16 @@ module SmTranscript
35
37
  else # create a new span_element
36
38
  # since prev_start_time is zero on first line, this avoids
37
39
  # writing a closing </span> with no opening <span>
40
+ span_element = cleanup_phrase(span_element)
38
41
  f.puts span_element << "</span> " unless prev_start_time == 0
39
-
40
- span_element = "<span id='T#{start_time}'>#{w.word}"
41
- prev_start_time = start_time
42
+ span_element = "<span id='T#{start_time}'>#{w.word}"
43
+ prev_start_time = start_time
42
44
  end
43
45
  end
44
- # In the block above, the last word isn't written if
45
- # the start_time and prev_start_time are the same.
46
- f.puts span_element << "</span> " unless start_time != prev_start_time
47
-
46
+ # In the block above, the last word isn't written if
47
+ # the start_time and prev_start_time are the same.
48
+ f.puts span_element << "</span> " unless start_time != prev_start_time
49
+ f.close
48
50
  end
49
51
  end # write_html()
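write_html groups words whose start times fall in the same second into one <span>. A minimal sketch of that grouping, assuming words arrive as [start_ms, word] pairs. `spans_by_second` is a hypothetical helper; collecting the groups first avoids the special cases for the first and last phrase that the loop above handles with `unless`:

```ruby
# Hypothetical helper: returns one "<span>" string per second of audio.
def spans_by_second(words)
  groups = []                      # each entry: [second, [word, ...]]
  words.each do |start_ms, word|
    sec = start_ms.to_i / 1000     # reduce granularity to whole seconds
    if groups.last && groups.last[0] == sec
      groups.last[1] << word       # same second: extend the phrase
    else
      groups << [sec, [word]]      # new second: start a new phrase
    end
  end
  groups.map { |sec, ws| "<span id='T#{sec}'>#{ws.join(' ')}</span> " }
end
```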
50
52
 
@@ -57,13 +59,13 @@ module SmTranscript
57
59
  buf = ""
58
60
  bldr = Builder::XmlMarkup.new( :target => buf, :indent => 2 )
59
61
  bldr.instruct!
60
- bldr.tt("xmlns" => "http://www.w3.org/2006/04/ttaf1",
62
+ bldr.tt("xmlns" => "http://www.w3.org/2006/04/ttaf1",
61
63
  "xmlns:tts" => "http://www.w3.org/ns/ttml#styling",
62
64
  "xmlns:ttm" => "http://www.w3.org/ns/ttml#metadata",
63
- "xml:lang" => "en" ) {
65
+ "xml:lang" => "en" ) {
64
66
  bldr.head { |b|
65
- b.ttm :title, 'Document Metadata Example'
66
- b.ttm :desc, 'This document employs document metadata.'
67
+ b.ttm :title, 'The title of this transcript'
68
+ b.ttm :desc, 'The description of this transcript'
67
69
  }
68
70
  bldr.body {
69
71
  bldr.div {
@@ -72,31 +74,37 @@ module SmTranscript
72
74
  start_ms = end_ms = 0
73
75
  start_secs = 0
74
76
  @words.each do |w|
75
- # get the start time and reduce its granularity so that multiple
76
- # words fall within a span element.
77
+ # get the start time and reduce its granularity so that
78
+ # multiple words form a phrase.
77
79
  start_secs = w.start_time.to_i/1000
78
80
  if start_secs == prev_start_secs # append word
79
- end_ms = w.end_time.to_i
81
+ end_ms = w.end_time.to_i
80
82
  span_element << " #{w.word}"
81
83
  else # create a new span_element
82
- bldr.p( span_element,
83
- "xml:id" => "T#{start_secs.to_s}", "begin" => "#{start_ms.to_s}ms", "end" => "#{end_ms.to_s}ms" )
84
+ start_secs = w.start_time.to_i/1000
85
+ bldr.p( span_element,
86
+ "xml:id" => "T#{start_secs.to_s}",
87
+ "begin" => "#{start_ms.to_s}ms",
88
+ "dur" => "#{(end_ms - start_ms).to_s}ms",
89
+ "end" => "#{end_ms.to_s}ms" )
84
90
 
85
91
  start_ms = w.start_time.to_i
86
92
  end_ms = w.end_time.to_i
87
- span_element = " #{w.word}"
88
- prev_start_secs = start_secs
93
+ span_element = " #{w.word}"
94
+ prev_start_secs = start_secs
89
95
  end
90
- end
91
- # In the block above, the last word isn't written if
92
- # the start_time and prev_start_time are the same.
93
- bldr.p( span_element,
94
- "xml:id" => "T#{start_secs.to_s}",
95
- "begin" => "#{start_ms.to_s}ms",
96
- "end" => "#{end_ms.to_s}ms" ) unless start_secs != prev_start_secs
96
+ end # @words.each
97
+
98
+ # In the block above, the last word isn't written if
99
+ # the start_time and prev_start_time are the same.
100
+ bldr.p( span_element,
101
+ "xml:id" => "T#{start_secs.to_s}",
102
+ "begin" => "#{start_ms.to_s}ms",
103
+ "dur" => "#{(end_ms - start_ms).to_s}ms",
104
+ "end" => "#{end_ms.to_s}ms" ) unless start_secs != prev_start_secs
97
105
  }
98
106
  }
99
- }
107
+ }
100
108
  # p buf
101
109
  File.open(dest_file, "w") do |f|
102
110
  f.puts buf
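The TTML hunk adds a `dur` attribute next to `begin` and `end`. A minimal sketch of how the values relate, assuming integer millisecond times. `phrase_attrs` is a hypothetical helper, not part of the gem. It derives `xml:id` from the phrase's own start time, which is an assumption about the intended behavior rather than what the diff does:

```ruby
# Hypothetical helper: timing attributes for one phrase, in milliseconds.
def phrase_attrs(start_ms, end_ms)
  {
    'xml:id' => "T#{start_ms / 1000}",      # id from this phrase's start
    'begin'  => "#{start_ms}ms",
    'dur'    => "#{end_ms - start_ms}ms",   # dur is end minus begin
    'end'    => "#{end_ms}ms"
  }
end
```

Computing all four values from the same pair of times keeps the id, `begin`, `dur` and `end` of a paragraph describing the same phrase.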
@@ -104,27 +112,66 @@ module SmTranscript
104
112
  end
105
113
  end
106
114
 
107
- # Times are expressed in milliseconds, far more granularity than is
108
- # useful for most user-facing apps, especially since the player reports
115
+
116
+ # The JSON format is defined at http://url/of/document. It is the format
117
+ # of the static timed-text document that is passed to the player.
118
+ def write_json(dest_file)
119
+
120
+ end # write_json()
121
+
122
+
123
+ # Store transcript in a Sqlite database (though the essence of this
124
+ # method should work for all relational dbs). Unlike some of the other
125
+ # write_xxx() methods, this one requires a @metadata array.
126
+ # param db_id - for SQLite, this is a filename.
127
+ # video_id - is a unique identifier for the video
128
+
129
+ def write_sqlite(db_id)
130
+ db_id = "sm-transcript"
131
+ db = SQLite3::Database.open(db_id + '.sqlite3')
132
+
133
+ fields = XPath.match(doc.root, inner_node_name + '[1]/*').map{|node| node.name}
134
+ field_def = fields.map {|x| "%s TEXT" % x}.join(', ')
135
+
136
+ end # write_sqlite()
137
+
138
+
139
+ private
140
+
141
+ # Times are expressed in milliseconds, far more granularity than is
142
+ # useful for most user-facing apps, especially since the player reports
109
143
  # elapsed time only ten times a second.
110
- # By reducing the time by orders of magnitude provides these benefits:
144
+ # Reducing the time by orders of magnitude provides these benefits:
111
145
  # 1) Multiple words fall within a <span> element.
112
146
  # 2) Better mapping between start times and player time tracking
113
147
  def words_to_phrase(start_time)
114
148
  start_time.to_i/1000
115
149
  end # words_to_phrase
116
-
117
- def get_time_expression(milliseconds)
118
- milliseconds
119
- end
120
-
121
- # There are some word combinations that occur with such regularity that
150
+
151
+ # def get_time_expression(milliseconds)
152
+ # milliseconds
153
+ # end
154
+
155
+ # There are some word combinations that occur with such regularity that
122
156
  # they call out to be fixed. For example, "m I t" is unambiguously MIT.
123
- # These edits can only be done when the phrase has been assembled.
157
+ # These edits can only be done when the phrase has been assembled since
158
+ # each letter is treated as an individual word.
124
159
  def cleanup_phrase(phrase)
125
- phrase
160
+ phrase = phrase.gsub(/m I t/, 'MIT')
161
+ phrase.gsub(/o e I t/, 'OEIT')
162
+ end
163
+
164
+ # remove HTML tags from text. requires classes from ActionPack
165
+ def strip_tags(html)
166
+ return html if html.empty? || !html.include?('<')
167
+ output = ""
168
+ tokenizer = HTML::Tokenizer.new(html)
169
+ while token = tokenizer.next
170
+ node = HTML::Node.parse(nil, 0, 0, token, false)
171
+ output += token unless (node.kind_of? HTML::Tag) or (token =~ /^<!/)
172
+ end
173
+ return output
126
174
  end
127
-
128
175
 
129
176
  end # class
130
177
  end
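cleanup_phrase rewrites letter sequences such as "m I t" once a phrase has been assembled. `String#gsub` returns a new string and leaves the receiver unchanged, so each substitution has to be applied to the result of the previous one. A minimal sketch with a table of fixes, assuming word boundaries are wanted so that text like "from I think" is left alone; `PHRASE_FIXES` is a hypothetical name:

```ruby
# Pattern/replacement pairs; the two entries come from the hunk above,
# the \b word boundaries are an addition made for this sketch.
PHRASE_FIXES = [
  [/\bm I t\b/,   'MIT'],
  [/\bo e I t\b/, 'OEIT']
]

# Apply every fix in turn, feeding each result into the next gsub.
def cleanup_phrase(phrase)
  PHRASE_FIXES.inject(phrase) { |text, (pattern, fix)| text.gsub(pattern, fix) }
end
```

New combinations can then be added to the table without touching the method body.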