sanzang 1.1.0 → 1.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 681ecd0867d5627edb42f6382642cc0840cf1c9f
- data.tar.gz: 0f21e00fe6951814816d10ab13e6084537565298
+ metadata.gz: f5ca5ec3584fad6ed5ef48cce5338db925441921
+ data.tar.gz: 978ca3153152f25b8f0a9d61ae328296eb18b291
  SHA512:
- metadata.gz: ea7c5c63dd221190f739dd087044aa4c4f9b9ce2f64062c5b3cb12cf89d045c69415c2411fb6aa1391cdde45af3c5be970d0510ee12c121d3d5e0907fe19fb59
- data.tar.gz: dcbbf6fa87912f41054b6f6af0cd9a9a7fd7b03e042067dfe8788c4366dfd4a4a2ef41a858ac8818c80e4ddd00faa90a8cd60918668f49dd664c90be31cb320e
+ metadata.gz: d6d8cf630f82b81227946a005664e0afcbec73fd8209dffd554c8135e911311b8ec08878288d31f489d73b0c42cec3e26bbbdea6e697485ba3f105c51b332cf7
+ data.tar.gz: 8c1d3c7b1d1667f6ca778d8dc80dbf5171b54bd7b76432d75eca93d714db724904b279383954b8da0760b1a9a30a3ccb903a320fac157e167f7a83e5e484e392
@@ -38,3 +38,10 @@ Converters for several encodings have not yet been implemented by MRI. Most of
  these are obscure and not widely used. Perhaps the most notable is EUC-TW,
  which is an old Unix encoding for traditional Chinese. Text encodings that
  cannot be converted to and from UTF-8 are not currently supported.
+
+ == Reserved characters
+
+ \Sanzang internally uses an ASCII control character as a temporary marker. The
+ following character will be removed from any translated text:
+
+ * 0x1F -- "US" -- "Unit separator"
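Because 0x1F is reserved, any occurrence of it in an input text silently disappears from the output, so a caller may want to detect it before translating. The following is a minimal Ruby sketch using only core `String` methods; the helper names `reserved_offsets` and `strip_reserved` are hypothetical and are not part of Sanzang's API.

```ruby
# Hypothetical helpers (not Sanzang API) for the reserved unit separator.
UNIT_SEPARATOR = "\x1F"

# Character offsets at which the reserved character occurs in the text.
def reserved_offsets(text)
  offsets = []
  text.each_char.with_index { |ch, i| offsets << i if ch == UNIT_SEPARATOR }
  offsets
end

# Drop every reserved character, as the documented behaviour implies.
def strip_reserved(text)
  text.delete(UNIT_SEPARATOR)
end

sample = "ab\x1Fcd\x1F"
warn "input contains 0x1F at #{reserved_offsets(sample).inspect}" unless reserved_offsets(sample).empty?
puts strip_reserved(sample)
```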
@@ -0,0 +1,80 @@
+ = News
+
+ == Release History
+
+ === v1.1.1
+ * Updated horizontal space handling to be more robust.
+ * Horizontal spaces will not be added at the end of any lines.
+ * Fixed transcoding logic for when translate_io parameters are file paths.
+ * Documentation and build file revisions.
+
+ === v1.1.0
+ * New feature -- automatic spacing between translated terms.
+ * Updated Sanzang::Platform for greater compatibility between Ruby versions.
+
+ === v1.0.9
+ * Limiting \Sanzang on JRuby to UTF-8 -- JRuby encoding support is limited.
+ * Minor encoding handling fixes.
+
+ === v1.0.8
+ * Added support for a SANZANG_ENCODING environment variable.
+ * Documentation updates.
+ * Minor output formatting fixes.
+ * Fixed TypeError exception raised when listing encodings in Ruby 2.1.
+
+ === v1.0.7
+ * Fixes for I/O exception handling logic and file descriptor handling.
+ * Fixed processor counting logic on MS Windows platforms.
+
+ === v1.0.6
+ * Fixed a bug in file descriptor handling when the FD is nil.
+ * Rakefile updates for greater portability and accepting non-GNU tar.
+
+ === v1.0.5
+ * Faster translation table loading.
+ * Added support for JRuby including multithreaded batches.
+ * Rakefile will not attempt to build tar archives by default.
+ * Gemfile updates and revisions for better requirements specification.
+
+ === v1.0.4
+ * Introduced buffered I/O for better performance and memory usage.
+ * Added the Sanzang::Platform module for more accessing system information.
+
+ === v1.0.3
+ * Test case file updates following encoding handling changes.
+ * Rakefile updates for greater portability (aiming at BSD compatibility).
+ * Using UTF-8 as the default encoding for ASCII and IBM CP terminals.
+
+ === v1.0.2
+ * Encoding list should only display those that can convert to UTF-8.
+ * Encoding fixes when transcoding. Using UTF-8 internally for translation.
+ * Added a "verbose mode" for debugging.
+
+ === v1.0.1
+ * Reflow command will only list encodings that can be converted to UTF-8.
+ * Rewrote Sanzang::Translator#translate_io for simpler file handling.
+ * Pipe handling updated to break quietly rather than report an error.
+ * Added additional checks for translation table formatting.
+
+ === v1.0.0
+ * Many documentation updates and additions.
+ * Consolidated multiple executables into a single "sanzang" command suite.
+ * Previous versions of \Sanzang should be uninstalled before installing this.
+
+ === v0.0.4
+ * Error message formatting.
+ * Enabled case-insensitive sorting for encodings list.
+
+ === v0.0.3
+ * Source code formatting.
+ * Fixed usage message error for the translate command.
+
+ === v0.0.2
+ * Updated Parallel requirement to v0.5.19, following a SIGINT bug.
+ * Rakefile additions and revisions for robustness.
+ * Added empty file to batch testing directory.
+ * Added README.md file.
+ * Fixed file permissions.
+
+ === v0.0.1
+ * Initial commit to version control, and the first release.
@@ -40,6 +40,6 @@ command to verify your installation and print version information.
 
  This command should show a summary of your \Sanzang version and environment.
 
- sanzang 1.0.8 (UTF-8) ruby-2.1.0p0 x86_64-linux
+ sanzang 1.1.1 (UTF-8) ruby-2.1.0p0 x86_64-linux
 
  You now have \Sanzang installed on your computer.
@@ -1,6 +1,6 @@
  # coding: UTF-8
  #--
- # Copyright (C) 2012-2013 Lapis Lazuli Texts
+ # Copyright (C) 2012-2014 Lapis Lazuli Texts
  #
  # This program is free software: you can redistribute it and/or modify it under
  # the terms of the GNU General Public License as published by the Free Software
@@ -1,6 +1,6 @@
  # coding: UTF-8
  #--
- # Copyright (C) 2012-2013 Lapis Lazuli Texts
+ # Copyright (C) 2012-2014 Lapis Lazuli Texts
  #
  # This program is free software: you can redistribute it and/or modify it under
  # the terms of the GNU General Public License as published by the Free Software
@@ -23,12 +23,19 @@ module Sanzang
  #
  class TranslationTable
 
+ # The records for the translation table, as an array
+ #
+ attr_reader :records
+
+ # Original encoding when the table was read
+ #
+ attr_reader :source_encoding
+
  # A table is created from a formatted string of translation rules. The
  # string is in the format of delimited text. The text format can be
  # summarized as follows:
  #
  # - Each line of text is a record for a translation rule.
- # - Each record may begin with "~|" and end with "|~".
  # - Fields in the record are separated by the "|" character.
  # - The first field contains the term in the source language.
  # - Subsequent fields are equivalent terms in destination languages.
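The record format described in this comment can be illustrated with a small Ruby parser. This is a sketch, not Sanzang's `TranslationTable` code: the name `parse_rules` is hypothetical, it strips the optional `~|` and `|~` delimiters the same way the `initialize` method in this diff still does, and the check that every record has the same number of fields is an assumption added for illustration.

```ruby
# Hypothetical parser for the delimited rule format; not Sanzang's own code.
def parse_rules(text)
  contents = text.strip                              # drop outer blank lines
  contents = contents.gsub("~|", "").gsub("|~", "")  # optional record delimiters
  # One record per line; chomp removes a trailing "\r" from CRLF files.
  records = contents.split("\n").map { |line| line.chomp.split("|") }

  # Assumed sanity check: every record should have the same number of fields.
  widths = records.map(&:length).uniq
  raise ArgumentError, "inconsistent field counts: #{widths.inspect}" if widths.length > 1

  records
end

rules = "~|大唐|dà táng|great tang|~\r\n" \
        "~|三藏|sānzàng|tripiṭaka|~\r\n"
table = parse_rules(rules)
puts table.length     # number of records
puts table[0].length  # source term plus destination columns
```

Field 0 holds the source term and fields 1 and up hold destination terms, which is why the translator code in this diff loops over columns `1.upto(@table.width - 1)`.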
@@ -41,7 +48,9 @@ module Sanzang
  #
  def initialize(rules)
  contents = rules.kind_of?(String) ? rules : rules.read
+ @source_encoding = contents.encoding
  contents.encode!(Encoding::UTF_8)
+
  contents.strip! # Rm outside empty lines
  contents.gsub!("~|", "") # Rm left delimiter
  contents.gsub!("|~", "") # Rm right delimiter
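The new `@source_encoding = contents.encoding` line must run before `encode!`, because `encode!` retags the string as UTF-8 and the original encoding would otherwise be lost. The sketch below shows the same ordering with a GBK string. GBK is only an example encoding, and `load_rules` is a hypothetical name. The sketch uses the non-destructive `encode` instead of `encode!`. In the code shown, when `rules` is a String, `encode!`, `strip!` and `gsub!` act on the caller's own string object, so this is a deliberate difference in the sketch.

```ruby
# Hypothetical loader showing why the encoding is captured before transcoding.
def load_rules(contents)
  source_encoding = contents.encoding      # read this first
  utf8 = contents.encode(Encoding::UTF_8)  # UTF-8 copy; caller's string untouched
  [source_encoding, utf8]
end

gbk_rules = "大唐|great tang".encode(Encoding::GBK)
enc, utf8 = load_rules(gbk_rules)
puts enc            # encoding the rules arrived in
puts utf8.encoding  # encoding used internally
```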
@@ -72,7 +81,7 @@ module Sanzang
  @records[index]
  end
 
- # The text encoding used for all translation table data
+ # The text encoding used internally for all translation table data
  #
  def encoding
  Encoding::UTF_8
@@ -96,9 +105,5 @@ module Sanzang
  @records[0].length
  end
 
- # The records for the translation table, as an array
- #
- attr_reader :records
-
  end
  end
@@ -1,6 +1,6 @@
  # coding: UTF-8
  #--
- # Copyright (C) 2012-2013 Lapis Lazuli Texts
+ # Copyright (C) 2012-2014 Lapis Lazuli Texts
  #
  # This program is free software: you can redistribute it and/or modify it under
  # the terms of the GNU General Public License as published by the Free Software
@@ -56,10 +56,12 @@ module Sanzang
  vocab_terms = text_vocab(source_text)
  1.upto(@table.width - 1) do |column_i|
  translation = String.new(source_text)
+ translation.delete!("\x1F")
  vocab_terms.each do |term|
- translation.gsub!(term[0], "\0#{term[column_i]}\0")
+ translation.gsub!(term[0], "\x1F#{term[column_i]}\x1F")
  end
- translation.gsub!(/\0+/, " ")
+ translation.gsub!(/\x1F(?=[\r\n])/, "")
+ translation.gsub!(/\x1F+/, " ")
  text_collection << translation
  end
  text_collection
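This change swaps the NUL marker for 0x1F and adds two steps. Stray 0x1F characters are deleted from the source first, and a marker immediately before a carriage return or line feed is dropped instead of being turned into a space. The sketch below restates that logic for one destination column as a standalone Ruby method. The name `translate_column` is an assumption for illustration. So is the vocabulary format, which uses arrays of `[source, dest1, dest2, ...]` to mirror the table records. In Sanzang the vocabulary comes from `text_vocab`, and that method is not shown in this diff.

```ruby
MARKER = "\x1F" # ASCII unit separator, reserved as a temporary marker

# Hypothetical standalone version of the per-column loop shown above.
# vocab: array of records [source_term, dest_term_1, dest_term_2, ...]
def translate_column(source_text, vocab, column_i)
  translation = String.new(source_text)
  translation.delete!(MARKER)                 # reserved char never survives
  vocab.each do |term|
    # Wrap each substituted term in markers so neighbours can be spaced later.
    translation.gsub!(term[0], "#{MARKER}#{term[column_i]}#{MARKER}")
  end
  translation.gsub!(/\x1F(?=[\r\n])/, "")     # no space right before CR/LF
  translation.gsub!(/\x1F+/, " ")             # any run of markers -> one space
  translation
end

vocab = [["大唐", "dà táng", "great tang"], ["三藏", "sānzàng", "tripiṭaka"]]
print translate_column("大唐三藏\r\n", vocab, 2)
```

For `大唐三藏\r\n` and column 2, the substitutions produce `\x1Fgreat tang\x1F\x1Ftripiṭaka\x1F\r\n`. The marker before `\r` is then removed, and each remaining run collapses to one space, which gives ` great tang tripiṭaka\r\n`. Under the 1.1.0 rule (`/\0+/` to a space, with no line-end step), that last marker would have become a trailing space. The updated `stage_3` expectations in the test hunk no longer contain that space. The lookahead as written only checks for `\r` or `\n`. A marker at the very end of a string with no newline therefore still becomes a trailing space. A pattern such as `/\x1F(?=[\r\n]|\z)/` would also cover that case, if wanted.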
@@ -95,17 +97,17 @@ module Sanzang
  #
  def translate_io(input, output)
  if input.kind_of?(String)
- io_in = File.open(input, "rb", encoding: @table.encoding)
+ io_in = File.open(input, "rb", encoding: @table.source_encoding)
  else
  io_in = input
  end
  if output.kind_of?(String)
- io_out = File.open(output, "wb", encoding: @table.encoding)
+ io_out = File.open(output, "wb", encoding: @table.source_encoding)
  else
  io_out = output
  end
 
- buf_size = 96
+ buf_size = 100
  buffer = ""
  io_in.each do |line|
  buffer << line
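In 1.1.0, `translate_io` opened file-path arguments with `@table.encoding`, which is always UTF-8. In 1.1.1 it uses `@table.source_encoding`, so input and output files are read and written in the encoding the translation table arrived in. The sketch below shows what opening a path with `"rb"` plus an `encoding:` option does in Ruby: each line read is tagged with that encoding and can then be transcoded to UTF-8 for processing. The helper name `read_lines_as_utf8` and the choice of GBK are illustrative assumptions. The rest of `translate_io`, including how it buffers and transcodes lines, is not shown in this diff.

```ruby
require "tempfile"

# Hypothetical helper: read a file in a given external encoding (as
# translate_io now does for path arguments) and transcode each line to UTF-8.
def read_lines_as_utf8(path, source_encoding)
  lines = []
  File.open(path, "rb", encoding: source_encoding) do |io|
    io.each_line do |line|
      # Each line is tagged with source_encoding; convert it for processing.
      lines << line.encode(Encoding::UTF_8)
    end
  end
  lines
end

# Demo: write GBK bytes to a temporary file, then read them back.
Tempfile.create(["sanzang-demo", ".txt"]) do |f|
  f.binmode
  f.write("大唐\r\n三藏\r\n".encode(Encoding::GBK))
  f.flush
  read_lines_as_utf8(f.path, Encoding::GBK).each { |line| print line }
end
```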
@@ -1,6 +1,6 @@
  # coding: UTF-8
  #--
- # Copyright (C) 2012-2013 Lapis Lazuli Texts
+ # Copyright (C) 2012-2014 Lapis Lazuli Texts
  #
  # This program is free software: you can redistribute it and/or modify it under
  # the terms of the GNU General Public License as published by the Free Software
@@ -19,6 +19,6 @@ module Sanzang
 
  # Current version number of Sanzang
  #
- VERSION = "1.1.0"
+ VERSION = "1.1.1"
 
  end
@@ -27,13 +27,13 @@ class TestSanzang < Test::Unit::TestCase
 
  def stage_3
  "[1.1]     大唐三藏法師玄奘奉\r\n" \
- << "[1.2]      dà táng sānzàng fǎshī xuánzàng fèng \r\n" \
+ << "[1.2]      dà táng sānzàng fǎshī xuánzàng fèng\r\n" \
  << "[1.3]      great tang tripiṭaka dharma-master xuanzang " \
- << "reverently \r\n" \
+ << "reverently\r\n" \
  << "\r\n" \
  << "[2.1]  詔譯\r\n" \
- << "[2.2]   zhào yì \r\n" \
- << "[2.3]   imperial-order translate/interpret \r\n" \
+ << "[2.2]   zhào yì\r\n" \
+ << "[2.3]   imperial-order translate/interpret\r\n" \
  << "\r\n"
  end
 
@@ -1,6 +1,6 @@
  [1.1]     大唐三藏法師玄奘奉
- [1.2]      dà táng sānzàng fǎshī xuánzàng fèng
- [1.3]      great tang tripiṭaka dharma-master xuanzang reverently
+ [1.2]      dà táng sānzàng fǎshī xuánzàng fèng
+ [1.3]      great tang tripiṭaka dharma-master xuanzang reverently
 
  [2.1]  詔譯
  [2.2]   zhào yì
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: sanzang
  version: !ruby/object:Gem::Version
- version: 1.1.0
+ version: 1.1.1
  platform: ruby
  authors:
  - Lapis Lazuli Texts
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-01-26 00:00:00.000000000 Z
+ date: 2014-01-28 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: parallel
@@ -24,12 +24,11 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '0.8'
- description: Sanzang is a program built for machine translation of natural languages.
- This application is particularly suitable as a translation aid for CJK languages
- including ancient texts. The translation method is rule-based, and translation rules
- are stored in flat files as delimited text. This program can also utilize multiprocessing
- to naturally scale to multiple processors and processor cores. Sanzang is available
- under the GNU GPL, version 3.
+ description: Sanzang is a compact and simple cross-platform machine translation system.
+ It was designed especially for translating from the CJK languages (Chinese, Japanese,
+ and Korean), and it is suitable even for translating from ancient texts. Sanzang
+ is implemented as a Unix style command suite program, with each subcommand carrying
+ out a major function of the system.
  email: lapislazulitexts@gmail.com
  executables:
  - sanzang
@@ -38,11 +37,13 @@ extra_rdoc_files:
  - HACKING.rdoc
  - LICENSE.rdoc
  - MANUAL.rdoc
+ - NEWS.rdoc
  - README.rdoc
  files:
  - HACKING.rdoc
  - LICENSE.rdoc
  - MANUAL.rdoc
+ - NEWS.rdoc
  - README.rdoc
  - bin/sanzang
  - lib/sanzang.rb
@@ -91,7 +92,7 @@ rubyforge_project:
  rubygems_version: 2.2.0
  signing_key:
  specification_version: 4
- summary: Machine translation for CJK languages
+ summary: Machine translation from CJK languages
  test_files:
  - test/tc_reflow_encodings.rb
  - test/tc_simple_translation.rb