sanzang 1.0.3 → 1.0.4

data/MANUAL.rdoc CHANGED
@@ -293,14 +293,9 @@ messages will still be displayed in the console's native IBM-437 encoding.
293
293
 
294
294
  $ sanzang t -E UTF-16LE -i in.txt -o out.txt TABLE.txt
295
295
 
296
- If the "-E" option is not specified, then \Sanzang will use the default
297
- encoding inherited from the environment. For example, a GNU/Linux user running
298
- \Sanzang in a UTF-8 terminal will by default have all text data read and
299
- written to in the UTF-8 encoding. The one *exception* to this is for
300
- environments using the IBM-437 encoding (typically an old Windows command
301
- shell). In this case, \Sanzang will take pity on you and automatically switch
302
- to UTF-8 by default, as if you had specified the option "-E" with value
303
- "UTF-8".
296
+ If the "-E" option is not specified, then \Sanzang will use the default data
297
+ encoding for that environment. The data encoding can be seen by running
298
+ \sanzang with the "--version" or "--platform" options.
304
299
 
305
300
  == Responsible Use
306
301
 
data/README.rdoc CHANGED
@@ -34,19 +34,12 @@ automatically download and install \Sanzang onto your computer.
34
34
  # gem install sanzang
35
35
 
36
36
  After this, you should be able to run the _sanzang_ command. Run the following
37
- command to verify your installation and print platform information.
37
+ command to verify your installation and print version information.
38
38
 
39
- # sanzang -P
39
+ # sanzang -V
40
40
 
41
- This command should show a summary of your platform for running \Sanzang.
41
+ This command should show a summary of your \Sanzang version and environment.
42
42
 
43
- Ruby platform: x86_64-linux
44
- Ruby version: 2.0.0
45
- External encoding: UTF-8
46
- Internal encoding: none
47
- Fork implemented: true
48
- Parallel version: 0.6.4
49
- Processors found: 4
50
- Sanzang version: 1.0.0
43
+ sanzang 1.0.4 [ruby_1.9.3] [x86_64-linux] [UTF-8]
51
44
 
52
45
  You now have \Sanzang installed on your computer.
data/lib/sanzang.rb CHANGED
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby -w
2
2
  # -*- encoding: UTF-8 -*-
3
3
  #--
4
- # Copyright (C) 2012 Lapis Lazuli Texts
4
+ # Copyright (C) 2012-2013 Lapis Lazuli Texts
5
5
  #
6
6
  # This program is free software: you can redistribute it and/or modify it under
7
7
  # the terms of the GNU General Public License as published by the Free Software
@@ -23,10 +23,11 @@
23
23
  module Sanzang
24
24
  end
25
25
 
26
+ require_relative File.join("sanzang", "batch_translator")
27
+ require_relative File.join("sanzang", "platform")
26
28
  require_relative File.join("sanzang", "text_formatter")
27
29
  require_relative File.join("sanzang", "translation_table")
28
30
  require_relative File.join("sanzang", "translator")
29
- require_relative File.join("sanzang", "batch_translator")
30
31
  require_relative File.join("sanzang", "version")
31
32
 
32
33
  # The Sanzang::Command module contains Unix style commands utilizing the
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
  # -*- encoding: UTF-8 -*-
3
3
  #--
4
- # Copyright (C) 2012 Lapis Lazuli Texts
4
+ # Copyright (C) 2012-2013 Lapis Lazuli Texts
5
5
  #
6
6
  # This program is free software: you can redistribute it and/or modify it under
7
7
  # the terms of the GNU General Public License as published by the Free Software
@@ -18,6 +18,7 @@
18
18
 
19
19
  require "parallel"
20
20
 
21
+ require_relative "platform"
21
22
  require_relative "translator"
22
23
 
23
24
  module Sanzang
@@ -28,18 +29,6 @@ module Sanzang
28
29
  #
29
30
  class BatchTranslator < Translator
30
31
 
31
- # Evaluates to true if this Ruby can execute the fork(2) system call.
32
- #
33
- def forking?
34
- Process.respond_to?(:fork)
35
- end
36
-
37
- # The number of logical processors detected on the current system.
38
- #
39
- def processor_count
40
- Parallel.processor_count
41
- end
42
-
43
32
  # Translate a batch of files. The main parameter is an array, each element
44
33
  # of which should be a two-dimensional array with the first element being
45
34
  # the input file path, and the second element being the output file path.
@@ -47,8 +36,10 @@ module Sanzang
47
36
  # return value is an array containing all the output file paths.
48
37
  #
49
38
  def translate_batch(fpath_pairs, verbose = true, jobs = nil)
50
- if not forking?
39
+ if not Sanzang::Platform.unix_processes?
51
40
  jobs = 0
41
+ elsif not jobs
42
+ jobs = Sanzang::Platform.processor_count
52
43
  end
53
44
  Parallel.map(fpath_pairs, :in_processes => jobs) do |f1,f2|
54
45
  translate_io(f1, f2)
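
Reviewer note: the job-count selection in the hunk above can be read as a small pure function. The sketch below is only an illustration; `choose_jobs` and its parameters are hypothetical names that do not appear in the gem, and the capability flag and processor count are passed in rather than read from `Sanzang::Platform`.

```ruby
# Hypothetical helper (not part of sanzang): mirrors the branch order
# used by translate_batch in this diff.
def choose_jobs(requested, can_fork, processors)
  # Without Unix-style processes the diff forces jobs to 0.
  return 0 unless can_fork
  # An explicit job count wins; otherwise fall back to the detected count,
  # which may itself be nil if detection failed.
  requested || processors
end

# One way to compute the capability flag, equivalent to the loop
# in Sanzang::Platform.unix_processes? shown elsewhere in this diff.
can_fork = [:fork, :wait, :kill].all? { |m| Process.respond_to?(m) }

puts choose_jobs(nil, can_fork, 4)
```

Note that a `nil` processor count passes straight through, so the caller still has to cope with a `nil` job count in that case.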
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
  # -*- encoding: UTF-8 -*-
3
3
  #--
4
- # Copyright (C) 2012 Lapis Lazuli Texts
4
+ # Copyright (C) 2012-2013 Lapis Lazuli Texts
5
5
  #
6
6
  # This program is free software: you can redistribute it and/or modify it under
7
7
  # the terms of the GNU General Public License as published by the Free Software
@@ -18,6 +18,7 @@
18
18
 
19
19
  require "optparse"
20
20
 
21
+ require_relative File.join("..", "platform")
21
22
  require_relative File.join("..", "translation_table")
22
23
  require_relative File.join("..", "batch_translator")
23
24
  require_relative File.join("..", "version")
@@ -35,7 +36,7 @@ module Sanzang::Command
35
36
  #
36
37
  def initialize
37
38
  @name = "sanzang batch"
38
- @encoding = nil
39
+ @encoding = Sanzang::Platform.data_encoding
39
40
  @outdir = nil
40
41
  @jobs = nil
41
42
  @verbose = false
@@ -56,8 +57,6 @@ module Sanzang::Command
56
57
  return 1
57
58
  end
58
59
 
59
- set_data_encoding
60
-
61
60
  translator = nil
62
61
  File.open(args[0], "rb", encoding: @encoding) do |table_file|
63
62
  table = Sanzang::TranslationTable.new(table_file.read)
@@ -79,20 +78,11 @@ module Sanzang::Command
79
78
  return 1
80
79
  end
81
80
 
82
- private
81
+ # Name of the command
82
+ #
83
+ attr_reader :name
83
84
 
84
- # Set the encoding for text data if it is not already set
85
- #
86
- def set_data_encoding
87
- if @encoding == nil
88
- if Encoding.default_external.to_s =~ /ASCII|IBM/
89
- $stderr.puts "Encoding: UTF-8"
90
- @encoding = Encoding::UTF_8
91
- else
92
- @encoding = Encoding.default_external
93
- end
94
- end
95
- end
85
+ private
96
86
 
97
87
  # Return an OptionParser object for this command
98
88
  #
@@ -116,10 +106,7 @@ module Sanzang::Command
116
106
  @encoding = Encoding.find(v)
117
107
  end
118
108
  op.on("-L", "--list-encodings", "list possible encodings") do |v|
119
- encodings = Encoding.list.sort do |x,y|
120
- x.to_s.upcase <=> y.to_s.upcase
121
- end
122
- puts encodings
109
+ Sanzang::Platform.valid_encodings.each {|e| puts e.to_s }
123
110
  exit 0
124
111
  end
125
112
  op.on("-j", "--jobs=N", "allow N concurrent processes") do |v|
@@ -131,9 +118,5 @@ module Sanzang::Command
131
118
  end
132
119
  end
133
120
 
134
- # Name of the command
135
- #
136
- attr_reader :name
137
-
138
121
  end
139
122
  end
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
  # -*- encoding: UTF-8 -*-
3
3
  #--
4
- # Copyright (C) 2012 Lapis Lazuli Texts
4
+ # Copyright (C) 2012-2013 Lapis Lazuli Texts
5
5
  #
6
6
  # This program is free software: you can redistribute it and/or modify it under
7
7
  # the terms of the GNU General Public License as published by the Free Software
@@ -18,6 +18,7 @@
18
18
 
19
19
  require "optparse"
20
20
 
21
+ require_relative File.join("..", "platform")
21
22
  require_relative File.join("..", "text_formatter")
22
23
  require_relative File.join("..", "version")
23
24
 
@@ -36,27 +37,12 @@ module Sanzang::Command
36
37
  #
37
38
  def initialize
38
39
  @name = "sanzang reflow"
39
- @encoding = nil
40
+ @encoding = Sanzang::Platform.data_encoding
40
41
  @infile = nil
41
42
  @outfile = nil
42
43
  @verbose = false
43
44
  end
44
45
 
45
- # Get a list of all acceptable text encodings.
46
- #
47
- def valid_encodings
48
- all_enc = Encoding.list.collect {|e| e.to_s }.sort do |x,y|
49
- x.upcase <=> y.upcase
50
- end
51
- all_enc.find_all do |e|
52
- begin
53
- Encoding::Converter.search_convpath(e, Encoding::UTF_8)
54
- rescue Encoding::ConverterNotFoundError
55
- e == "UTF-8" ? true : false
56
- end
57
- end
58
- end
59
-
60
46
  # Run the reflow command with the given arguments. The parameter _args_
61
47
  # would typically be an array of command options and parameters. Calling
62
48
  # this with the "-h" or "--help" option will print full usage information
@@ -71,8 +57,6 @@ module Sanzang::Command
71
57
  return 1
72
58
  end
73
59
 
74
- set_data_encoding
75
-
76
60
  begin
77
61
  fin = @infile ? File.open(@infile, "r") : $stdin
78
62
  fin.binmode.set_encoding(@encoding)
@@ -101,20 +85,11 @@ module Sanzang::Command
101
85
  return 1
102
86
  end
103
87
 
104
- private
105
-
106
- # Initialize the encoding for text data if it is not already set
88
+ # The name of the command
107
89
  #
108
- def set_data_encoding
109
- if @encoding == nil
110
- if Encoding.default_external.to_s =~ /ASCII|IBM/
111
- $stderr.puts "Encoding: UTF-8"
112
- @encoding = Encoding::UTF_8
113
- else
114
- @encoding = Encoding.default_external
115
- end
116
- end
117
- end
90
+ attr_reader :name
91
+
92
+ private
118
93
 
119
94
  # An OptionParser for the command
120
95
  #
@@ -122,10 +97,12 @@ module Sanzang::Command
122
97
  OptionParser.new do |op|
123
98
  op.banner = "Usage: #{@name} [options]\n"
124
99
 
125
- op.banner << "\nReformat text file contents into lines based on "
126
- op.banner << "spacing, punctuation, etc.\n"
127
- op.banner << "\nExamples:\n"
128
- op.banner << " #{@name} -i in/mytext.txt -o out/mytext.txt\n"
100
+ op.banner << "\nReformat text into lines based on spacing, "
101
+ op.banner << "punctuation, etc. This should work\nfor the CJK "
102
+ op.banner << "languages (Chinese, Japanese, and Korean). By default, "
103
+ op.banner << "text is read\nfrom STDIN and written to STDOUT."
104
+ op.banner << "\n"
105
+
129
106
  op.banner << "\nOptions:\n"
130
107
 
131
108
  op.on("-h", "--help", "show this help message and exit") do |v|
@@ -136,7 +113,7 @@ module Sanzang::Command
136
113
  @encoding = Encoding.find(v)
137
114
  end
138
115
  op.on("-L", "--list-encodings", "list possible encodings") do |v|
139
- puts valid_encodings
116
+ Sanzang::Platform.valid_encodings.each {|e| puts e.to_s }
140
117
  exit 0
141
118
  end
142
119
  op.on("-i", "--infile=FILE", "read input text from FILE") do |v|
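
Reviewer note: the "-L" option now delegates to `Sanzang::Platform.valid_encodings`. A minimal standalone sketch of the same filtering idea follows; `convertible_to_utf8` is a hypothetical name, and the code uses only Ruby's built-in `Encoding` API rather than anything from the gem.

```ruby
# Hypothetical helper (not part of sanzang): list the encodings that this
# Ruby can transcode to UTF-8, sorted case-insensitively by name.
def convertible_to_utf8
  Encoding.list.select do |enc|
    begin
      # Raises ConverterNotFoundError when no conversion path exists.
      Encoding::Converter.search_convpath(enc, Encoding::UTF_8)
      true
    rescue Encoding::ConverterNotFoundError
      # UTF-8 has no converter to itself, but it is trivially acceptable.
      enc == Encoding::UTF_8
    end
  end.sort_by { |enc| enc.to_s.upcase }
end

names = convertible_to_utf8.map(&:to_s)
puts names.first(5)
```

The exact list depends on the Ruby build, so the sketch prints a sample instead of assuming a fixed result.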
@@ -151,9 +128,5 @@ module Sanzang::Command
151
128
  end
152
129
  end
153
130
 
154
- # The name of the command
155
- #
156
- attr_reader :name
157
-
158
131
  end
159
132
  end
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
  # -*- encoding: UTF-8 -*-
3
3
  #--
4
- # Copyright (C) 2012 Lapis Lazuli Texts
4
+ # Copyright (C) 2012-2013 Lapis Lazuli Texts
5
5
  #
6
6
  # This program is free software: you can redistribute it and/or modify it under
7
7
  # the terms of the GNU General Public License as published by the Free Software
@@ -23,6 +23,7 @@ require_relative "reflow"
23
23
  require_relative "translate"
24
24
  require_relative "batch"
25
25
 
26
+ require_relative File.join("..", "platform")
26
27
  require_relative File.join("..", "version")
27
28
 
28
29
  module Sanzang::Command
@@ -77,22 +78,30 @@ module Sanzang::Command
77
78
  # A string giving a listing of platform information
78
79
  #
79
80
  def platform_info
80
- info = "Ruby platform: #{RUBY_PLATFORM}\n"
81
- info << "Ruby version: #{RUBY_VERSION}\n"
82
- info << "External encoding: #{Encoding.default_external}\n"
83
- info << "Internal encoding: #{Encoding.default_internal or 'none'}\n"
84
- info << "Fork implemented: #{Process.respond_to?(:fork)}\n"
85
- info << "Parallel version: #{Parallel::VERSION}\n"
86
- info << "Processors found: #{Parallel.processor_count}\n"
87
- info << "Sanzang version: #{Sanzang::VERSION}\n"
81
+ info = "host_arch = #{Sanzang::Platform.machine_arch}\n"
82
+ info << "host_os = #{Sanzang::Platform.os_name}\n"
83
+ info << "host_processors = #{Sanzang::Platform.processor_count}\n"
84
+ info << "ruby_encoding_ext = #{Encoding.default_external}\n"
85
+ info << "ruby_encoding_int = #{Encoding.default_internal or 'none'}\n"
86
+ info << "ruby_multiproc = #{Sanzang::Platform.unix_processes?}\n"
87
+ info << "ruby_platform = #{RUBY_PLATFORM}\n"
88
+ info << "ruby_version = #{RUBY_VERSION}\n"
89
+ info << "sanzang_encoding = #{Sanzang::Platform.data_encoding}\n"
90
+ info << "sanzang_parallel = #{Parallel::VERSION}\n"
91
+ info << "sanzang_version = #{Sanzang::VERSION}\n"
88
92
  end
89
93
 
90
94
  # This is a string giving a brief one-line summary of version information
91
95
  #
92
96
  def version_info
93
- "sanzang #{Sanzang::VERSION} [ruby_#{RUBY_VERSION}] [#{RUBY_PLATFORM}]"
97
+ "sanzang #{Sanzang::VERSION} [ruby_#{RUBY_VERSION}] [#{RUBY_PLATFORM}]" \
98
+ + " [#{Sanzang::Platform.data_encoding}]"
94
99
  end
95
100
 
101
+ # Name of the command
102
+ #
103
+ attr_reader :name
104
+
96
105
  private
97
106
 
98
107
  # An OptionParser object for parsing command options and parameters
@@ -100,16 +109,17 @@ module Sanzang::Command
100
109
  def option_parser
101
110
  OptionParser.new do |op|
102
111
  op.banner = "Usage: #{@name} [options]\n"
103
- op.banner << "Usage: #{@name} <command> [options] [args]\n\n"
112
+ op.banner << "Usage: #{@name} <command> [options] [args]\n"
104
113
 
105
- op.banner << "Use \"--help\" with commands for usage information.\n"
114
+ op.banner << "\nUse \"-h\" or \"--help\" with sanzang commands for "
115
+ op.banner << "usage information.\n"
106
116
 
107
117
  op.banner << "\nSanzang commands:\n"
108
- op.banner << " batch translate many files in parallel\n"
118
+ op.banner << " batch translate many files in parallel\n"
109
119
  op.banner << " reflow format CJK text for translation\n"
110
120
  op.banner << " translate standard single text translation\n"
111
- op.banner << "\nOptions:\n"
112
121
 
122
+ op.banner << "\nOptions:\n"
113
123
  op.on("-h", "--help", "show this help message and exit") do |v|
114
124
  puts op
115
125
  exit 0
@@ -125,9 +135,5 @@ module Sanzang::Command
125
135
  end
126
136
  end
127
137
 
128
- # Name of the command
129
- #
130
- attr_reader :name
131
-
132
138
  end
133
139
  end
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
  # -*- encoding: UTF-8 -*-
3
3
  #--
4
- # Copyright (C) 2012 Lapis Lazuli Texts
4
+ # Copyright (C) 2012-2013 Lapis Lazuli Texts
5
5
  #
6
6
  # This program is free software: you can redistribute it and/or modify it under
7
7
  # the terms of the GNU General Public License as published by the Free Software
@@ -18,6 +18,7 @@
18
18
 
19
19
  require "optparse"
20
20
 
21
+ require_relative File.join("..", "platform")
21
22
  require_relative File.join("..", "translation_table")
22
23
  require_relative File.join("..", "translator")
23
24
  require_relative File.join("..", "version")
@@ -34,27 +35,12 @@ module Sanzang::Command
34
35
  #
35
36
  def initialize
36
37
  @name = "sanzang translate"
37
- @encoding = nil
38
+ @encoding = Sanzang::Platform.data_encoding
38
39
  @infile = nil
39
40
  @outfile = nil
40
41
  @verbose = false
41
42
  end
42
43
 
43
- # Get a list of all acceptable text encodings.
44
- #
45
- def valid_encodings
46
- all_enc = Encoding.list.collect {|e| e.to_s }.sort do |x,y|
47
- x.upcase <=> y.upcase
48
- end
49
- all_enc.find_all do |e|
50
- begin
51
- Encoding::Converter.search_convpath(e, Encoding::UTF_8)
52
- rescue Encoding::ConverterNotFoundError
53
- e == "UTF-8" ? true : false
54
- end
55
- end
56
- end
57
-
58
44
  # Run the translate command with the given arguments. The parameter _args_
59
45
  # would typically be an array of command options and parameters. Calling
60
46
  # this with the "-h" or "--help" option will print full usage information
@@ -69,8 +55,6 @@ module Sanzang::Command
69
55
  return 1
70
56
  end
71
57
 
72
- set_data_encoding
73
-
74
58
  translator = nil
75
59
  File.open(args[0], "rb", encoding: @encoding) do |table_file|
76
60
  table = Sanzang::TranslationTable.new(table_file.read)
@@ -105,20 +89,11 @@ module Sanzang::Command
105
89
  return 1
106
90
  end
107
91
 
108
- private
92
+ # Name of the command
93
+ #
94
+ attr_reader :name
109
95
 
110
- # Initialize the encoding for text data if it is not already set
111
- #
112
- def set_data_encoding
113
- if @encoding == nil
114
- if Encoding.default_external.to_s =~ /ASCII|IBM/
115
- $stderr.puts "Encoding: UTF-8"
116
- @encoding = Encoding::UTF_8
117
- else
118
- @encoding = Encoding.default_external
119
- end
120
- end
121
- end
96
+ private
122
97
 
123
98
  # An OptionParser for the command
124
99
  #
@@ -128,10 +103,9 @@ module Sanzang::Command
128
103
 
129
104
  op.banner << "\nTranslate text using simple table rules. Input text "
130
105
  op.banner << "is read from STDIN by\ndefault, and the output is "
131
- op.banner << "written to STDOUT by default.\n"
106
+ op.banner << "written to STDOUT by default. The translation table "
107
+ op.banner << "\nfile is specified as a parameter.\n"
132
108
 
133
- op.banner << "\nExample:\n"
134
- op.banner << " #{@name} -i text.txt -o text.sz.txt table.txt\n"
135
109
  op.banner << "\nOptions:\n"
136
110
 
137
111
  op.on("-h", "--help", "show this help message and exit") do |v|
@@ -142,7 +116,7 @@ module Sanzang::Command
142
116
  @encoding = Encoding.find(v)
143
117
  end
144
118
  op.on("-L", "--list-encodings", "list possible encodings") do |v|
145
- puts valid_encodings
119
+ Sanzang::Platform.valid_encodings.each {|e| puts e.to_s }
146
120
  exit 0
147
121
  end
148
122
  op.on("-i", "--infile=FILE", "read input text from FILE") do |v|
@@ -157,9 +131,5 @@ module Sanzang::Command
157
131
  end
158
132
  end
159
133
 
160
- # Name of the command
161
- #
162
- attr_reader :name
163
-
164
134
  end
165
135
  end
@@ -0,0 +1,128 @@
1
+ #!/usr/bin/env ruby
2
+ # -*- encoding: UTF-8 -*-
3
+ #--
4
+ # Copyright (C) 2012-2013 Lapis Lazuli Texts
5
+ #
6
+ # This program is free software: you can redistribute it and/or modify it under
7
+ # the terms of the GNU General Public License as published by the Free Software
8
+ # Foundation, either version 3 of the License, or (at your option) any later
9
+ # version.
10
+ #
11
+ # This program is distributed in the hope that it will be useful, but WITHOUT
12
+ # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
13
+ # FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
14
+ # details.
15
+ #
16
+ # You should have received a copy of the GNU General Public License along with
17
+ # this program. If not, see <http://www.gnu.org/licenses/>.
18
+
19
+ require 'rbconfig'
20
+
21
+ # The Sanzang::Platform module includes information about the underlying system
22
+ # that is needed by the \Sanzang system. This includes information about the
23
+ # machine architecture and OS, the number of processors available, encodings
24
+ # that are supported, and encodings that are optimal.
25
+ #
26
+ module Sanzang::Platform
27
+ class << self
28
+
29
+ # CPU architecture of the underlying machine
30
+ #
31
+ def machine_arch
32
+ RbConfig::CONFIG["target_cpu"]
33
+ end
34
+
35
+ # Operating system, which may be different from RUBY_PLATFORM
36
+ #
37
+ def os_name
38
+ RbConfig::CONFIG["target_os"]
39
+ end
40
+
41
+ # Does this Ruby VM support Unix-style process handling?
42
+ #
43
+ def unix_processes?
44
+ [:fork, :wait, :kill].each do |f|
45
+ if not Process.respond_to?(f)
46
+ return false
47
+ end
48
+ end
49
+ true
50
+ end
51
+
52
+ # Find the number of logical processors seen by the system. This may be
53
+ # different from the number of physical processors or CPU cores. If the
54
+ # number of processors cannot be detected, nil is returned. For Windows,
55
+ # this is detected through an OLE lookup, and for Unix systems, a heuristic
56
+ # approach is taken. Supported Unix types include:
57
+ #
58
+ # * AIX: pmcycles (AIX 5+), lsdev
59
+ # * BSD: /sbin/sysctl
60
+ # * Cygwin: /proc/cpuinfo
61
+ # * Darwin: hwprefs, /usr/sbin/sysctl
62
+ # * HP-UX: ioscan
63
+ # * IRIX: sysconf
64
+ # * Linux: /proc/cpuinfo
65
+ # * Minix 3+: /proc/cpuinfo
66
+ # * Solaris: psrinfo
67
+ # * Tru64 UNIX: psrinfo
68
+ # * UnixWare: psrinfo
69
+ #
70
+ def processor_count
71
+ if os_name =~ /mingw|mswin/
72
+ require 'win32ole'
73
+ result = WIN32OLE.connect("winmgmts://").ExecQuery(
74
+ "select NumberOfLogicalProcessors from Win32_Processor")
75
+ result.to_enum.first.NumberOfLogicalProcessors
76
+ elsif File.readable?("/proc/cpuinfo")
77
+ IO.read("/proc/cpuinfo").scan(/^processor/).size
78
+ elsif File.executable?("/usr/bin/hwprefs")
79
+ IO.popen(%w[/usr/bin/hwprefs thread_count]).read.to_i
80
+ elsif File.executable?("/usr/sbin/psrinfo")
81
+ IO.popen("/usr/sbin/psrinfo").read.scan(/^.*on-*line/).size
82
+ elsif File.executable?("/usr/sbin/ioscan")
83
+ IO.popen(%w[/usr/sbin/ioscan -kC processor]) do |out|
84
+ out.read.scan(/^.*processor/).size
85
+ end
86
+ elsif File.executable?("/usr/sbin/pmcycles")
87
+ IO.popen(%w[/usr/sbin/pmcycles -m]).read.count("\n")
88
+ elsif File.executable?("/usr/sbin/lsdev")
89
+ IO.popen(%w[/usr/sbin/lsdev -Cc processor -S 1]).read.count("\n")
90
+ elsif File.executable?("/usr/sbin/sysconf") and os_name =~ /IRIX/i
91
+ IO.popen(%w[/usr/sbin/sysconf NPROC_ONLN]).read.to_i
92
+ elsif File.executable?("/usr/sbin/sysctl")
93
+ IO.popen(%w[/usr/sbin/sysctl -n hw.ncpu]).read.to_i
94
+ elsif File.executable?("/sbin/sysctl")
95
+ IO.popen(%w[/sbin/sysctl -n hw.ncpu]).read.to_i
96
+ else
97
+ nil
98
+ end
99
+ end
100
+
101
+ # Text encodings that can be converted to UTF-8. MRI still lacks some
102
+ # converter implementations for obscure encodings.
103
+ #
104
+ def valid_encodings
105
+ Encoding.list.find_all do |e|
106
+ begin
107
+ Encoding::Converter.search_convpath(e, Encoding::UTF_8)
108
+ rescue Encoding::ConverterNotFoundError
109
+ e == Encoding::UTF_8 ? true : false
110
+ end
111
+ end.sort_by! {|e| e.to_s.upcase }
112
+ end
113
+
114
+ # Default text data encoding on this platform. This is usually the default
115
+ # external encoding of the Ruby interpreter; however, if the encoding is
116
+ # an ASCII variant or an old IBM DOS encoding, then it should default to
117
+ # UTF-8 since these are effectively obsolete, or they are subsets of UTF-8.
118
+ #
119
+ def data_encoding
120
+ if Encoding.default_external.to_s =~ /ASCII|IBM/
121
+ Encoding::UTF_8
122
+ else
123
+ Encoding.default_external
124
+ end
125
+ end
126
+
127
+ end
128
+ end
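
Reviewer note: the fallback rule in `data_encoding` is easier to check when the encoding is passed in instead of being read from the interpreter. The sketch below assumes exactly the pattern match shown in the new file; `pick_data_encoding` is a hypothetical name used only for illustration.

```ruby
# Hypothetical helper (not part of sanzang): same rule as
# Sanzang::Platform.data_encoding, with the external encoding as a parameter.
def pick_data_encoding(external = Encoding.default_external)
  if external.to_s =~ /ASCII|IBM/
    # ASCII variants and old IBM code pages fall back to UTF-8.
    Encoding::UTF_8
  else
    external
  end
end

puts pick_data_encoding(Encoding::IBM437)
puts pick_data_encoding(Encoding::US_ASCII)
puts pick_data_encoding(Encoding::UTF_16LE)
```

Under this rule an explicit "-E" value still overrides the default, because the commands only use `data_encoding` as the initial value of `@encoding`.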
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
  # -*- encoding: UTF-8 -*-
3
3
  #--
4
- # Copyright (C) 2012 Lapis Lazuli Texts
4
+ # Copyright (C) 2012-2013 Lapis Lazuli Texts
5
5
  #
6
6
  # This program is free software: you can redistribute it and/or modify it under
7
7
  # the terms of the GNU General Public License as published by the Free Software
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
  # -*- encoding: UTF-8 -*-
3
3
  #--
4
- # Copyright (C) 2012 Lapis Lazuli Texts
4
+ # Copyright (C) 2012-2013 Lapis Lazuli Texts
5
5
  #
6
6
  # This program is free software: you can redistribute it and/or modify it under
7
7
  # the terms of the GNU General Public License as published by the Free Software
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
  # -*- encoding: UTF-8 -*-
3
3
  #--
4
- # Copyright (C) 2012 Lapis Lazuli Texts
4
+ # Copyright (C) 2012-2013 Lapis Lazuli Texts
5
5
  #
6
6
  # This program is free software: you can redistribute it and/or modify it under
7
7
  # the terms of the GNU General Public License as published by the Free Software
@@ -69,7 +69,7 @@ module Sanzang
69
69
  # Translator#translate is collated and numbered for reference purposes.
70
70
  # This is the normal text listing output of the Sanzang Translator.
71
71
  #
72
- def gen_listing(source_text)
72
+ def gen_listing(source_text, pos = 1)
73
73
  source_encoding = source_text.encoding
74
74
  source_text.encode!(Encoding::UTF_8)
75
75
 
@@ -79,7 +79,7 @@ module Sanzang
79
79
  listing = ""
80
80
  texts[0].length.times do |line_i|
81
81
  @table.width.times do |col_i|
82
- listing << "[#{line_i + 1}.#{col_i + 1}] #{texts[col_i][line_i]}" \
82
+ listing << "[#{pos + line_i}.#{col_i + 1}] #{texts[col_i][line_i]}" \
83
83
  << newline
84
84
  end
85
85
  listing << newline
@@ -90,7 +90,8 @@ module Sanzang
90
90
  # Read a text from _input_ and write its translation listing to _output_.
91
91
  # If a parameter is a string, it is interpreted as the path to a file, and
92
92
  # the relevant file is opened and used. Otherwise, the parameter is treated
93
- # as an open IO object.
93
+ # as an open IO object. I/O is buffered for better performance and to avoid
94
+ # reading entire texts into memory.
94
95
  #
95
96
  def translate_io(input, output)
96
97
  if input.kind_of?(String)
@@ -103,7 +104,18 @@ module Sanzang
103
104
  else
104
105
  io_out = output
105
106
  end
106
- io_out.write(gen_listing(io_in.read))
107
+
108
+ buf_size = 96
109
+ buffer = ""
110
+ io_in.each do |line|
111
+ buffer << line
112
+ if io_in.lineno % buf_size == 0
113
+ io_out.write(gen_listing(buffer, io_in.lineno - buf_size + 1))
114
+ buffer = ""
115
+ end
116
+ end
117
+ io_out.write(
118
+ gen_listing(buffer, io_in.lineno - buffer.rstrip.count("\n")))
107
119
  ensure
108
120
  io_in.close if input.kind_of?(String) and not io_in.closed?
109
121
  io_out.close if output.kind_of?(String) and not io_out.closed?
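
Reviewer note: the buffered loop above relies on the new `pos` argument of `gen_listing` to keep line numbers continuous across chunks. The sketch below shows the same idea with a different mechanism, `each_slice`, instead of the `lineno` bookkeeping in the diff. `number_lines` is a hypothetical stand-in for `gen_listing`, and `listing_in_chunks` is likewise an illustrative name.

```ruby
require "stringio"

# Hypothetical stand-in for Translator#gen_listing: prefix each line with
# its position, counting from pos.
def number_lines(text, pos = 1)
  text.each_line.with_index.map { |line, i| "[#{pos + i}] #{line}" }.join
end

# Read io_in in chunks of buf_size lines and write a numbered listing to
# io_out, so the whole text is never held in memory at once.
def listing_in_chunks(io_in, io_out, buf_size = 96)
  io_in.each_line.each_slice(buf_size).with_index do |lines, chunk_i|
    # The first line of chunk N is line N * buf_size + 1 of the input.
    io_out.write(number_lines(lines.join, chunk_i * buf_size + 1))
  end
end

input = StringIO.new("a\nb\nc\nd\ne\n")
output = StringIO.new
listing_in_chunks(input, output, 2)
puts output.string
```

With `each_slice` the final partial chunk needs no special case, and an input whose length is an exact multiple of the buffer size produces no trailing empty chunk.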
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
  # -*- encoding: UTF-8 -*-
3
3
  #--
4
- # Copyright (C) 2012 Lapis Lazuli Texts
4
+ # Copyright (C) 2012-2013 Lapis Lazuli Texts
5
5
  #
6
6
  # This program is free software: you can redistribute it and/or modify it under
7
7
  # the terms of the GNU General Public License as published by the Free Software
@@ -20,6 +20,6 @@ module Sanzang
20
20
 
21
21
  # Current version number of Sanzang
22
22
  #
23
- VERSION = "1.0.3"
23
+ VERSION = "1.0.4"
24
24
 
25
25
  end
@@ -60,26 +60,24 @@ class TestSanzang < Test::Unit::TestCase
60
60
  assert_equal(stage_2(), text)
61
61
  end
62
62
 
63
- def test_translate_string
64
- table = Sanzang::TranslationTable.new(table_string())
65
- text = Sanzang::Translator.new(table).gen_listing(stage_2())
66
- assert_equal(stage_3(), text)
67
- end
68
-
69
63
  def test_translate_file
70
64
  table_path = File.join(File.dirname(__FILE__), "utf-8", "table.txt")
71
65
  s2_path = File.join(File.dirname(__FILE__), "utf-8", "stage_2.txt")
72
66
  s3_path = File.join(File.dirname(__FILE__), "utf-8", "stage_3.txt")
73
67
  tab = Sanzang::TranslationTable.new(IO.read(table_path, encoding: "UTF-8"))
74
68
  translator = Sanzang::Translator.new(tab)
75
- translator.translate_io(s2_path, s3_path)
69
+ translator.translate_io(s2_path, s3_path)
76
70
  end
77
71
 
78
- def test_translator_parallel
72
+ def test_translate_string
79
73
  table = Sanzang::TranslationTable.new(table_string())
80
- bt = Sanzang::BatchTranslator.new(table)
81
- bt.forking?
82
- assert(bt.processor_count > 0, "Processor count less than zero")
74
+ text = Sanzang::Translator.new(table).gen_listing(stage_2())
75
+ assert_equal(stage_3(), text)
76
+ end
77
+
78
+ def test_translator_parallel
79
+ procs = Sanzang::Platform.processor_count
80
+ assert(procs > 0, "Processor count less than zero")
83
81
  end
84
82
 
85
83
  def test_translate_batch
metadata CHANGED
@@ -1,27 +1,30 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sanzang
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.3
4
+ version: 1.0.4
5
+ prerelease:
5
6
  platform: ruby
6
7
  authors:
7
8
  - Lapis Lazuli Texts
8
9
  autorequire:
9
10
  bindir: bin
10
11
  cert_chain: []
11
- date: 2013-06-30 00:00:00.000000000 Z
12
+ date: 2013-07-25 00:00:00.000000000 Z
12
13
  dependencies:
13
14
  - !ruby/object:Gem::Dependency
14
15
  name: parallel
15
16
  requirement: !ruby/object:Gem::Requirement
17
+ none: false
16
18
  requirements:
17
- - - '>='
19
+ - - ! '>='
18
20
  - !ruby/object:Gem::Version
19
21
  version: 0.5.19
20
22
  type: :runtime
21
23
  prerelease: false
22
24
  version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
23
26
  requirements:
24
- - - '>='
27
+ - - ! '>='
25
28
  - !ruby/object:Gem::Version
26
29
  version: 0.5.19
27
30
  description: Sanzang is a program built for machine translation of natural languages.
@@ -58,6 +61,7 @@ files:
58
61
  - lib/sanzang/translation_table.rb
59
62
  - lib/sanzang/batch_translator.rb
60
63
  - lib/sanzang/version.rb
64
+ - lib/sanzang/platform.rb
61
65
  - lib/sanzang/command/reflow.rb
62
66
  - lib/sanzang/command/sanzang_cmd.rb
63
67
  - lib/sanzang/command/translate.rb
@@ -70,26 +74,27 @@ files:
70
74
  homepage: http://www.lapislazulitexts.com/sanzang/
71
75
  licenses:
72
76
  - GPL-3
73
- metadata: {}
74
77
  post_install_message:
75
78
  rdoc_options: []
76
79
  require_paths:
77
80
  - lib
78
81
  required_ruby_version: !ruby/object:Gem::Requirement
82
+ none: false
79
83
  requirements:
80
- - - '>='
84
+ - - ! '>='
81
85
  - !ruby/object:Gem::Version
82
86
  version: 1.9.0
83
87
  required_rubygems_version: !ruby/object:Gem::Requirement
88
+ none: false
84
89
  requirements:
85
- - - '>='
90
+ - - ! '>='
86
91
  - !ruby/object:Gem::Version
87
92
  version: '0'
88
93
  requirements: []
89
94
  rubyforge_project:
90
- rubygems_version: 2.0.3
95
+ rubygems_version: 1.8.23
91
96
  signing_key:
92
- specification_version: 4
97
+ specification_version: 3
93
98
  summary: Simple rule-based machine translation system.
94
99
  test_files:
95
100
  - test/tc_reflow_encodings.rb
checksums.yaml DELETED
@@ -1,7 +0,0 @@
1
- ---
2
- SHA1:
3
- metadata.gz: 8b8f836d96d322d790415d013b67a6313007b29c
4
- data.tar.gz: c929928a0b63f3e16fe7d4b5dd9c14936b67f6c0
5
- SHA512:
6
- metadata.gz: 69eea67e41a7e29330ab5733be22e3f6299d59dc498c7348bcd7f0f6bbff6b75923bc231c6405d43943430a224e6a9dbe52a53d9ae9ef24d6853863944be5306
7
- data.tar.gz: ead0983545667b9d315647f0d862b5b2b2aef960ba961dd9f95a224107084e015b71e240c5c1ed87116ce3963bcc25038c953c936fff4e78fbcbb727712f3367