cppjieba_rb 0.3.3 → 0.4.0
- checksums.yaml +5 -5
- data/.travis.yml +3 -0
- data/README.md +1 -1
- data/Rakefile +2 -2
- data/cppjieba_rb.gemspec +4 -4
- data/lib/cppjieba_rb/version.rb +1 -1
- metadata +17 -135
- data/ext/cppjieba/.gitignore +0 -17
- data/ext/cppjieba/.travis.yml +0 -21
- data/ext/cppjieba/CMakeLists.txt +0 -28
- data/ext/cppjieba/ChangeLog.md +0 -236
- data/ext/cppjieba/README.md +0 -292
- data/ext/cppjieba/README_EN.md +0 -113
- data/ext/cppjieba/appveyor.yml +0 -32
- data/ext/cppjieba/deps/CMakeLists.txt +0 -1
- data/ext/cppjieba/deps/gtest/CMakeLists.txt +0 -5
- data/ext/cppjieba/deps/gtest/include/gtest/gtest-death-test.h +0 -283
- data/ext/cppjieba/deps/gtest/include/gtest/gtest-message.h +0 -230
- data/ext/cppjieba/deps/gtest/include/gtest/gtest-param-test.h +0 -1421
- data/ext/cppjieba/deps/gtest/include/gtest/gtest-param-test.h.pump +0 -487
- data/ext/cppjieba/deps/gtest/include/gtest/gtest-printers.h +0 -796
- data/ext/cppjieba/deps/gtest/include/gtest/gtest-spi.h +0 -232
- data/ext/cppjieba/deps/gtest/include/gtest/gtest-test-part.h +0 -176
- data/ext/cppjieba/deps/gtest/include/gtest/gtest-typed-test.h +0 -259
- data/ext/cppjieba/deps/gtest/include/gtest/gtest.h +0 -2155
- data/ext/cppjieba/deps/gtest/include/gtest/gtest_pred_impl.h +0 -358
- data/ext/cppjieba/deps/gtest/include/gtest/gtest_prod.h +0 -58
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-death-test-internal.h +0 -308
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-filepath.h +0 -210
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-internal.h +0 -1226
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-linked_ptr.h +0 -233
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-param-util-generated.h +0 -4822
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-param-util-generated.h.pump +0 -301
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-param-util.h +0 -619
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-port.h +0 -1788
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-string.h +0 -350
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-tuple.h +0 -968
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-tuple.h.pump +0 -336
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-type-util.h +0 -3330
- data/ext/cppjieba/deps/gtest/include/gtest/internal/gtest-type-util.h.pump +0 -296
- data/ext/cppjieba/deps/gtest/src/.deps/.dirstamp +0 -0
- data/ext/cppjieba/deps/gtest/src/.deps/gtest-all.Plo +0 -681
- data/ext/cppjieba/deps/gtest/src/.deps/gtest_main.Plo +0 -509
- data/ext/cppjieba/deps/gtest/src/.dirstamp +0 -0
- data/ext/cppjieba/deps/gtest/src/gtest-all.cc +0 -48
- data/ext/cppjieba/deps/gtest/src/gtest-death-test.cc +0 -1234
- data/ext/cppjieba/deps/gtest/src/gtest-filepath.cc +0 -380
- data/ext/cppjieba/deps/gtest/src/gtest-internal-inl.h +0 -1038
- data/ext/cppjieba/deps/gtest/src/gtest-port.cc +0 -746
- data/ext/cppjieba/deps/gtest/src/gtest-printers.cc +0 -356
- data/ext/cppjieba/deps/gtest/src/gtest-test-part.cc +0 -110
- data/ext/cppjieba/deps/gtest/src/gtest-typed-test.cc +0 -110
- data/ext/cppjieba/deps/gtest/src/gtest.cc +0 -4898
- data/ext/cppjieba/deps/gtest/src/gtest_main.cc +0 -39
- data/ext/cppjieba/deps/limonp/ArgvContext.hpp +0 -70
- data/ext/cppjieba/deps/limonp/BlockingQueue.hpp +0 -49
- data/ext/cppjieba/deps/limonp/BoundedBlockingQueue.hpp +0 -67
- data/ext/cppjieba/deps/limonp/BoundedQueue.hpp +0 -65
- data/ext/cppjieba/deps/limonp/Closure.hpp +0 -206
- data/ext/cppjieba/deps/limonp/Colors.hpp +0 -31
- data/ext/cppjieba/deps/limonp/Condition.hpp +0 -38
- data/ext/cppjieba/deps/limonp/Config.hpp +0 -103
- data/ext/cppjieba/deps/limonp/FileLock.hpp +0 -74
- data/ext/cppjieba/deps/limonp/ForcePublic.hpp +0 -7
- data/ext/cppjieba/deps/limonp/LocalVector.hpp +0 -139
- data/ext/cppjieba/deps/limonp/Logging.hpp +0 -76
- data/ext/cppjieba/deps/limonp/Md5.hpp +0 -411
- data/ext/cppjieba/deps/limonp/MutexLock.hpp +0 -51
- data/ext/cppjieba/deps/limonp/NonCopyable.hpp +0 -21
- data/ext/cppjieba/deps/limonp/StdExtension.hpp +0 -159
- data/ext/cppjieba/deps/limonp/StringUtil.hpp +0 -365
- data/ext/cppjieba/deps/limonp/Thread.hpp +0 -44
- data/ext/cppjieba/deps/limonp/ThreadPool.hpp +0 -86
- data/ext/cppjieba/dict/README.md +0 -31
- data/ext/cppjieba/dict/hmm_model.utf8 +0 -34
- data/ext/cppjieba/dict/idf.utf8 +0 -258826
- data/ext/cppjieba/dict/jieba.dict.utf8 +0 -348982
- data/ext/cppjieba/dict/pos_dict/char_state_tab.utf8 +0 -6653
- data/ext/cppjieba/dict/pos_dict/prob_emit.utf8 +0 -166
- data/ext/cppjieba/dict/pos_dict/prob_start.utf8 +0 -259
- data/ext/cppjieba/dict/pos_dict/prob_trans.utf8 +0 -5222
- data/ext/cppjieba/dict/stop_words.utf8 +0 -1534
- data/ext/cppjieba/dict/user.dict.utf8 +0 -4
- data/ext/cppjieba/include/cppjieba/DictTrie.hpp +0 -277
- data/ext/cppjieba/include/cppjieba/FullSegment.hpp +0 -93
- data/ext/cppjieba/include/cppjieba/HMMModel.hpp +0 -129
- data/ext/cppjieba/include/cppjieba/HMMSegment.hpp +0 -190
- data/ext/cppjieba/include/cppjieba/Jieba.hpp +0 -130
- data/ext/cppjieba/include/cppjieba/KeywordExtractor.hpp +0 -153
- data/ext/cppjieba/include/cppjieba/MPSegment.hpp +0 -137
- data/ext/cppjieba/include/cppjieba/MixSegment.hpp +0 -109
- data/ext/cppjieba/include/cppjieba/PosTagger.hpp +0 -77
- data/ext/cppjieba/include/cppjieba/PreFilter.hpp +0 -54
- data/ext/cppjieba/include/cppjieba/QuerySegment.hpp +0 -90
- data/ext/cppjieba/include/cppjieba/SegmentBase.hpp +0 -46
- data/ext/cppjieba/include/cppjieba/SegmentTagged.hpp +0 -23
- data/ext/cppjieba/include/cppjieba/TextRankExtractor.hpp +0 -190
- data/ext/cppjieba/include/cppjieba/Trie.hpp +0 -174
- data/ext/cppjieba/include/cppjieba/Unicode.hpp +0 -227
- data/ext/cppjieba/test/CMakeLists.txt +0 -5
- data/ext/cppjieba/test/demo.cpp +0 -80
- data/ext/cppjieba/test/load_test.cpp +0 -54
- data/ext/cppjieba/test/testdata/curl.res +0 -1
- data/ext/cppjieba/test/testdata/extra_dict/jieba.dict.small.utf8 +0 -109750
- data/ext/cppjieba/test/testdata/gbk_dict/hmm_model.gbk +0 -34
- data/ext/cppjieba/test/testdata/gbk_dict/jieba.dict.gbk +0 -348982
- data/ext/cppjieba/test/testdata/jieba.dict.0.1.utf8 +0 -93
- data/ext/cppjieba/test/testdata/jieba.dict.0.utf8 +0 -93
- data/ext/cppjieba/test/testdata/jieba.dict.1.utf8 +0 -67
- data/ext/cppjieba/test/testdata/jieba.dict.2.utf8 +0 -64
- data/ext/cppjieba/test/testdata/load_test.urls +0 -2
- data/ext/cppjieba/test/testdata/review.100 +0 -100
- data/ext/cppjieba/test/testdata/review.100.res +0 -200
- data/ext/cppjieba/test/testdata/server.conf +0 -19
- data/ext/cppjieba/test/testdata/testlines.gbk +0 -9
- data/ext/cppjieba/test/testdata/testlines.utf8 +0 -8
- data/ext/cppjieba/test/testdata/userdict.2.utf8 +0 -1
- data/ext/cppjieba/test/testdata/userdict.english +0 -2
- data/ext/cppjieba/test/testdata/userdict.utf8 +0 -8
- data/ext/cppjieba/test/testdata/weicheng.utf8 +0 -247
- data/ext/cppjieba/test/unittest/CMakeLists.txt +0 -24
- data/ext/cppjieba/test/unittest/gtest_main.cpp +0 -39
- data/ext/cppjieba/test/unittest/jieba_test.cpp +0 -133
- data/ext/cppjieba/test/unittest/keyword_extractor_test.cpp +0 -79
- data/ext/cppjieba/test/unittest/pos_tagger_test.cpp +0 -41
- data/ext/cppjieba/test/unittest/pre_filter_test.cpp +0 -43
- data/ext/cppjieba/test/unittest/segments_test.cpp +0 -256
- data/ext/cppjieba/test/unittest/textrank_test.cpp +0 -86
- data/ext/cppjieba/test/unittest/trie_test.cpp +0 -177
- data/ext/cppjieba/test/unittest/unicode_test.cpp +0 -43
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-
-  metadata.gz:
-  data.tar.gz:
+SHA256:
+  metadata.gz: 3f8b1b4fce6119123112a8df6fa2cfca0857cb215a0985344c2d1b4ee88387cd
+  data.tar.gz: 3a899dba1a80ab484158e1920d94b7929a272f8fd8d82c6ea40cc32c152bbf0e
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 388472322a715c6d562493e5b435c940b5fc3d93b8f4fd9bd5e96d170a456946495dca36b22d49c76ce11e25be56f5ef51a4e44cab0d1b9045c03bddfdcea082
+  data.tar.gz: c830ab67838d564435b3acc019cbe02e12ba69f5b93591df458375844c3dc017554ca0d355313b686ab3ef786a90234c6985fecaaab8a49eb0826db45a9aaf2d
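checksums.yaml records SHA256 and SHA512 digests for the two archives packed inside the gem, metadata.gz and data.tar.gz. As a rough sketch of how such digests could be checked by hand, the helper below compares raw entry bytes against the SHA256 section. The helper name `sha256_mismatches` and its arguments are hypothetical and not part of RubyGems or cppjieba_rb; it assumes the YAML text and the entry bytes have already been read out of the .gem archive.

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not a RubyGems API): returns the names of entries
# whose SHA256 digest differs from the value recorded in checksums.yaml.
#   checksums_yaml - the YAML text of checksums.yaml
#   entries        - Hash of entry name => raw bytes, e.g. "metadata.gz" => "..."
def sha256_mismatches(checksums_yaml, entries)
  expected = YAML.safe_load(checksums_yaml).fetch("SHA256")
  entries.reject do |name, bytes|
    Digest::SHA256.hexdigest(bytes) == expected[name].to_s
  end.keys
end
```

An empty result means every supplied entry matches its recorded digest; the same approach would work for the SHA512 section with `Digest::SHA512`.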
data/.travis.yml
CHANGED
data/README.md
CHANGED
@@ -2,7 +2,7 @@
 
 [![Gem Version](https://badge.fury.io/rb/cppjieba_rb.svg)](http://badge.fury.io/rb/cppjieba_rb)
 
-[![Build Status](https://travis-ci.
+[![Build Status](https://travis-ci.com/erickguan/cppjieba_rb.svg?branch=master)](https://travis-ci.com/erickguan/cppjieba_rb)
 
 [![Patreon](https://img.shields.io/badge/back_on-patreon-red.svg)](https://www.patreon.com/fantasticfears)
 
data/Rakefile
CHANGED
@@ -2,8 +2,8 @@ require "bundler/gem_tasks"
 require 'rake/testtask'
 require 'rake/extensiontask'
 
-gem = Gem::Specification.load(File.dirname(__FILE__) + '/
-Rake::ExtensionTask.new(
+gem = Gem::Specification.load(File.dirname(__FILE__) + '/cppjieba_rb.gemspec')
+Rake::ExtensionTask.new("cppjieba_rb", gem) do |ext|
   ext.lib_dir = File.join('lib', 'cppjieba_rb')
 end
 
data/cppjieba_rb.gemspec
CHANGED
@@ -43,8 +43,8 @@ Gem::Specification.new do |spec|
   spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
   spec.require_paths = ['lib']
 
-  spec.add_development_dependency 'bundler', '~>
-  spec.add_development_dependency 'rake', '~>
-  spec.add_development_dependency 'rake-compiler', '~> 1'
-  spec.add_development_dependency 'minitest', '~> 5.
+  spec.add_development_dependency 'bundler', '~> 2.2', '>= 2.2.10'
+  spec.add_development_dependency 'rake', '~> 13'
+  spec.add_development_dependency 'rake-compiler', '~> 1.1'
+  spec.add_development_dependency 'minitest', '~> 5.14'
 end
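The new bundler constraint combines a pessimistic operator with a lower bound: `'~> 2.2'` allows any 2.x release from 2.2 up to, but not including, 3.0, and `'>= 2.2.10'` additionally excludes 2.2.0 through 2.2.9. A small sketch using RubyGems' own `Gem::Requirement` and `Gem::Version` classes shows which versions pass; the constant `BUNDLER_REQUIREMENT` and the helper `satisfied?` are illustrative names only, not part of the gemspec.

```ruby
require "rubygems"

# The bundler constraint from the 0.4.0 gemspec (constant name is illustrative).
BUNDLER_REQUIREMENT = Gem::Requirement.new("~> 2.2", ">= 2.2.10")

# Illustrative helper: true when the version string meets every constraint.
def satisfied?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts satisfied?(BUNDLER_REQUIREMENT, "2.2.9")   # false: below the >= 2.2.10 floor
puts satisfied?(BUNDLER_REQUIREMENT, "2.2.10")  # true
puts satisfied?(BUNDLER_REQUIREMENT, "2.4.0")   # true: still within ~> 2.2
puts satisfied?(BUNDLER_REQUIREMENT, "3.0.0")   # false: ~> 2.2 stops before 3.0
```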
data/lib/cppjieba_rb/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: cppjieba_rb
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.4.0
 platform: ruby
 authors:
 - Erick Guan
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-06-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -16,56 +16,62 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '2.2'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 2.2.10
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '2.2'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 2.2.10
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '13'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '13'
 - !ruby/object:Gem::Dependency
   name: rake-compiler
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1'
+        version: '1.1'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1'
+        version: '1.1'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.
+        version: '5.14'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.
+        version: '5.14'
 description: cppjieba binding for ruby. Mainly used by Discourse.
 email:
 - fantasticfears@gmail.com
@@ -82,129 +88,6 @@ files:
 - README.md
 - Rakefile
 - cppjieba_rb.gemspec
-- ext/cppjieba/.gitignore
-- ext/cppjieba/.travis.yml
-- ext/cppjieba/CMakeLists.txt
-- ext/cppjieba/ChangeLog.md
-- ext/cppjieba/README.md
-- ext/cppjieba/README_EN.md
-- ext/cppjieba/appveyor.yml
-- ext/cppjieba/deps/CMakeLists.txt
-- ext/cppjieba/deps/gtest/CMakeLists.txt
-- ext/cppjieba/deps/gtest/include/gtest/gtest-death-test.h
-- ext/cppjieba/deps/gtest/include/gtest/gtest-message.h
-- ext/cppjieba/deps/gtest/include/gtest/gtest-param-test.h
-- ext/cppjieba/deps/gtest/include/gtest/gtest-param-test.h.pump
-- ext/cppjieba/deps/gtest/include/gtest/gtest-printers.h
-- ext/cppjieba/deps/gtest/include/gtest/gtest-spi.h
-- ext/cppjieba/deps/gtest/include/gtest/gtest-test-part.h
-- ext/cppjieba/deps/gtest/include/gtest/gtest-typed-test.h
-- ext/cppjieba/deps/gtest/include/gtest/gtest.h
-- ext/cppjieba/deps/gtest/include/gtest/gtest_pred_impl.h
-- ext/cppjieba/deps/gtest/include/gtest/gtest_prod.h
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-death-test-internal.h
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-filepath.h
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-internal.h
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-linked_ptr.h
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-param-util-generated.h
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-param-util-generated.h.pump
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-param-util.h
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-port.h
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-string.h
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-tuple.h
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-tuple.h.pump
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-type-util.h
-- ext/cppjieba/deps/gtest/include/gtest/internal/gtest-type-util.h.pump
-- ext/cppjieba/deps/gtest/src/.deps/.dirstamp
-- ext/cppjieba/deps/gtest/src/.deps/gtest-all.Plo
-- ext/cppjieba/deps/gtest/src/.deps/gtest_main.Plo
-- ext/cppjieba/deps/gtest/src/.dirstamp
-- ext/cppjieba/deps/gtest/src/gtest-all.cc
-- ext/cppjieba/deps/gtest/src/gtest-death-test.cc
-- ext/cppjieba/deps/gtest/src/gtest-filepath.cc
-- ext/cppjieba/deps/gtest/src/gtest-internal-inl.h
-- ext/cppjieba/deps/gtest/src/gtest-port.cc
-- ext/cppjieba/deps/gtest/src/gtest-printers.cc
-- ext/cppjieba/deps/gtest/src/gtest-test-part.cc
-- ext/cppjieba/deps/gtest/src/gtest-typed-test.cc
-- ext/cppjieba/deps/gtest/src/gtest.cc
-- ext/cppjieba/deps/gtest/src/gtest_main.cc
-- ext/cppjieba/deps/limonp/ArgvContext.hpp
-- ext/cppjieba/deps/limonp/BlockingQueue.hpp
-- ext/cppjieba/deps/limonp/BoundedBlockingQueue.hpp
-- ext/cppjieba/deps/limonp/BoundedQueue.hpp
-- ext/cppjieba/deps/limonp/Closure.hpp
-- ext/cppjieba/deps/limonp/Colors.hpp
-- ext/cppjieba/deps/limonp/Condition.hpp
-- ext/cppjieba/deps/limonp/Config.hpp
-- ext/cppjieba/deps/limonp/FileLock.hpp
-- ext/cppjieba/deps/limonp/ForcePublic.hpp
-- ext/cppjieba/deps/limonp/LocalVector.hpp
-- ext/cppjieba/deps/limonp/Logging.hpp
-- ext/cppjieba/deps/limonp/Md5.hpp
-- ext/cppjieba/deps/limonp/MutexLock.hpp
-- ext/cppjieba/deps/limonp/NonCopyable.hpp
-- ext/cppjieba/deps/limonp/StdExtension.hpp
-- ext/cppjieba/deps/limonp/StringUtil.hpp
-- ext/cppjieba/deps/limonp/Thread.hpp
-- ext/cppjieba/deps/limonp/ThreadPool.hpp
-- ext/cppjieba/dict/README.md
-- ext/cppjieba/dict/hmm_model.utf8
-- ext/cppjieba/dict/idf.utf8
-- ext/cppjieba/dict/jieba.dict.utf8
-- ext/cppjieba/dict/pos_dict/char_state_tab.utf8
-- ext/cppjieba/dict/pos_dict/prob_emit.utf8
-- ext/cppjieba/dict/pos_dict/prob_start.utf8
-- ext/cppjieba/dict/pos_dict/prob_trans.utf8
-- ext/cppjieba/dict/stop_words.utf8
-- ext/cppjieba/dict/user.dict.utf8
-- ext/cppjieba/include/cppjieba/DictTrie.hpp
-- ext/cppjieba/include/cppjieba/FullSegment.hpp
-- ext/cppjieba/include/cppjieba/HMMModel.hpp
-- ext/cppjieba/include/cppjieba/HMMSegment.hpp
-- ext/cppjieba/include/cppjieba/Jieba.hpp
-- ext/cppjieba/include/cppjieba/KeywordExtractor.hpp
-- ext/cppjieba/include/cppjieba/MPSegment.hpp
-- ext/cppjieba/include/cppjieba/MixSegment.hpp
-- ext/cppjieba/include/cppjieba/PosTagger.hpp
-- ext/cppjieba/include/cppjieba/PreFilter.hpp
-- ext/cppjieba/include/cppjieba/QuerySegment.hpp
-- ext/cppjieba/include/cppjieba/SegmentBase.hpp
-- ext/cppjieba/include/cppjieba/SegmentTagged.hpp
-- ext/cppjieba/include/cppjieba/TextRankExtractor.hpp
-- ext/cppjieba/include/cppjieba/Trie.hpp
-- ext/cppjieba/include/cppjieba/Unicode.hpp
-- ext/cppjieba/test/CMakeLists.txt
-- ext/cppjieba/test/demo.cpp
-- ext/cppjieba/test/load_test.cpp
-- ext/cppjieba/test/testdata/curl.res
-- ext/cppjieba/test/testdata/extra_dict/jieba.dict.small.utf8
-- ext/cppjieba/test/testdata/gbk_dict/hmm_model.gbk
-- ext/cppjieba/test/testdata/gbk_dict/jieba.dict.gbk
-- ext/cppjieba/test/testdata/jieba.dict.0.1.utf8
-- ext/cppjieba/test/testdata/jieba.dict.0.utf8
-- ext/cppjieba/test/testdata/jieba.dict.1.utf8
-- ext/cppjieba/test/testdata/jieba.dict.2.utf8
-- ext/cppjieba/test/testdata/load_test.urls
-- ext/cppjieba/test/testdata/review.100
-- ext/cppjieba/test/testdata/review.100.res
-- ext/cppjieba/test/testdata/server.conf
-- ext/cppjieba/test/testdata/testlines.gbk
-- ext/cppjieba/test/testdata/testlines.utf8
-- ext/cppjieba/test/testdata/userdict.2.utf8
-- ext/cppjieba/test/testdata/userdict.english
-- ext/cppjieba/test/testdata/userdict.utf8
-- ext/cppjieba/test/testdata/weicheng.utf8
-- ext/cppjieba/test/unittest/CMakeLists.txt
-- ext/cppjieba/test/unittest/gtest_main.cpp
-- ext/cppjieba/test/unittest/jieba_test.cpp
-- ext/cppjieba/test/unittest/keyword_extractor_test.cpp
-- ext/cppjieba/test/unittest/pos_tagger_test.cpp
-- ext/cppjieba/test/unittest/pre_filter_test.cpp
-- ext/cppjieba/test/unittest/segments_test.cpp
-- ext/cppjieba/test/unittest/textrank_test.cpp
-- ext/cppjieba/test/unittest/trie_test.cpp
-- ext/cppjieba/test/unittest/unicode_test.cpp
 - ext/cppjieba_rb/cppjieba_rb.c
 - ext/cppjieba_rb/extconf.rb
 - ext/cppjieba_rb/internal.cc
@@ -234,8 +117,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-
-rubygems_version: 2.6.14
+rubygems_version: 3.2.15
 signing_key:
 specification_version: 4
 summary: cppjieba binding for ruby
data/ext/cppjieba/.gitignore
DELETED
data/ext/cppjieba/.travis.yml
DELETED
@@ -1,21 +0,0 @@
-language: cpp
-before_install:
-  - if [ $TRAVIS_OS_NAME == linux ]; then sudo apt-get install cmake; fi
-script:
-  - mkdir build
-  - cd build
-  - cmake ..
-  - make
-  - make test
-os:
-  - linux
-  - osx
-compiler:
-  - clang
-  - gcc
-notifications:
-  recipients:
-    - i@yanyiwu.com
-  email:
-    on_success: change
-    on_failure: always
data/ext/cppjieba/CMakeLists.txt
DELETED
@@ -1,28 +0,0 @@
-PROJECT(CPPJIEBA)
-
-CMAKE_MINIMUM_REQUIRED (VERSION 2.6)
-
-INCLUDE_DIRECTORIES(${PROJECT_SOURCE_DIR}/deps
-  ${PROJECT_SOURCE_DIR}/include)
-
-if (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
-    set (CMAKE_INSTALL_PREFIX "/usr/local/cppjieba" CACHE PATH "default install path" FORCE )
-endif()
-
-ADD_DEFINITIONS(-O3 -Wall -g)
-IF(APPLE) # mac os
-    ADD_DEFINITIONS(-std=c++0x)
-endif()
-
-# cmake .. -DENC=GBK
-# IF (DEFINED ENC)
-# ADD_DEFINITIONS(-DCPPJIEBA_${ENC})
-# ENDIF()
-
-ADD_SUBDIRECTORY(deps)
-ADD_SUBDIRECTORY(test)
-
-ENABLE_TESTING()
-ADD_TEST(NAME ./test/test.run COMMAND ./test/test.run)
-ADD_TEST(NAME ./load_test COMMAND ./load_test)
-ADD_TEST(NAME ./demo COMMAND ./demo)
data/ext/cppjieba/ChangeLog.md
DELETED
@@ -1,236 +0,0 @@
-# CppJieba ChangeLog
-
-## v5.0.0
-
-+ Notice(**api changed**) : Jieba class 3 arguments -> 5 arguments, and use KeywordExtractor in Jieba
-
-## v4.8.1
-
-+ add TextRankExtractor by [@questionfish] in [pull request 65](https://github.com/yanyiwu/cppjieba/pull/65)
-+ add Jieba::ResetSeparators api for some special situation, for example in [issue67](https://github.com/yanyiwu/cppjieba/issues/67)
-+ fix [issue70](https://github.com/yanyiwu/cppjieba/issues/70)
-+ support (word, freq, tag) format in user_dict, see details in [pr74](https://github.com/yanyiwu/cppjieba/pull/74)
-
-## v4.8.0
-
-+ rewrite QuerySegment, make `Jieba::CutForSearch` behaves the same as [jieba] `cut_for_search` api
-+ remove Jieba::SetQuerySegmentThreshold
-
-## v4.7.0
-
-api changes:
-
-+ override Cut functions, add location information into Word results;
-+ remove LevelSegment;
-+ remove Jieba::Locate;
-
-upgrade:
-
-+ limonp -> v0.6.1
-
-## v4.6.0
-
-+ Change Jieba::Locate(deprecated) to be static function.
-+ Change the return value of KeywordExtractor::Extract from bool to void.
-+ Add KeywordExtractor::Word and add more overrided KeywordExtractor::Extract
-
-## v4.5.3
-
-+ Upgrade limonp to v0.6.0
-
-## v4.5.2
-
-+ Upgrade limonp to v0.5.6 to fix hidden trouble.
-
-## v4.5.1
-
-+ Upgrade limonp to v0.5.5 to solve macro name conficts in some special case.
-
-## v4.5.0
-
-+ Remove from Trie the earlier, poorly conceived uint16-specific optimization that used an array in place of a map;
-the main problem with that design was its premise that every unicode character must be a uint16, which ruled out fuller support for multilingual unicode characters.
-+ Change the Rune type from 16bit to 32bit to support more Unicode characters, including some rare Chinese characters.
-
-## v4.4.1
-
-+ Checked for memory leaks with valgrind and located a bug in HMM model initialization that leaked memory; the leak is not fatal, however,
-because it only occurs while the dictionaries are loaded, and dictionary loading normally runs only once, so it causes no serious problem.
-+ Thanks to [qinwf] for helping me find this bug; much appreciated.
-
-## v4.4.0
-
-+ Adding code is easy and deleting it is hard; after long deliberation, I decided to split the Server source code out of this project.
-+ Return [cppjieba] to the innocent simplicity of its early days: delete the non-essential server code so that the whole project travels light and focuses on the core segmentation code.
-+ By the way, if you really need the earlier server-related code, look for it in the new project repository [cppjieba-server](https://github.com/yanyiwu/cppjieba-server).
-
-## v4.3.3
-
-+ Yet Another Incompatibility Problem Repair: Upgrade [limonp] to version v0.5.3, fix incompatibility problem in Windows
-
-## v4.3.2
-
-+ Upgrade [limonp] to version v0.5.2, fix incompatibility problem in Windows
-
-## v4.3.1
-
-+ Overload the KeywordExtractor constructor so that a Jieba instance can be passed in to construct the dictionaries and models.
-
-## v4.3.0
-
-Source directory layout changes:
-
-1. src/ -> include/cppjieba/
-2. src/limonp/ -> deps/limonp/
-3. server/husky -> deps/husky/
-4. test/unittest/gtest -> deps/gtest
-
-Dependency upgrades:
-
-1. [limonp] to version v0.5.1
-2. [husky] to version v0.2.0
-
-## v4.2.1
-
-1. Upgrade [limonp] to version v0.4.1, [husky] to version v0.2.0
-
-## v4.2.0
-
-1. Fix the problem described in [issue50] with the multi-dictionary separator on Windows: change it from ':' to '|' or ';'.
-
-## v4.1.2
-
-1. Add the Jieba::Locate function, which computes the position of each word in a segmentation result; useful in some scenarios, such as highlighting search results.
-
-## v4.1.1
-
-1. Add Jieba::Tag, a part-of-speech tagging function, to class Jieba
-
-## v4.1.0
-
-1. Add a check when QuerySegment cuts words: when a long word satisfies IsAllAscii (for example, an English word), no fine-grained segmentation is performed.
-2. Add SetMaxWordLen and GetMaxWordLen to QuerySegment to set the word-length threshold that triggers secondary segmentation.
-3. Add SetQuerySegmentThreshold to Jieba to set the word-length threshold of the CutForSearch function.
-
-## v4.0.0
-
-1. Support loading multiple userdicts; multiple dictionary paths are separated by a colon (:), as a tribute to the PATH environment variable, haha.
-2. userdict entries carry no weight; previously a new userword was given the maximum word-frequency weight by default. This is now configurable, and the median is used by default.
-3. [Compatibility warning] Change some code style, such as lowercasing the namespace from CppJieba to cppjieba.
-4. [Compatibility warning] Deprecate Application.hpp in favor of Jieba.hpp; the interface has also changed substantially, with a more uniform function style that is more consistent with the python version of Jieba.
-
-## v3.2.1
-
-1. Fix a bug caused by a wrongly written include guard in Jieba.hpp.
-
-## v3.2.0
-
-1. Use a Trie optimization that is rather tricky from an engineering standpoint. Drop the earlier `Aho-Corasick-Automation` implementation; readability is better and performance is higher.
-2. Add a hierarchical segmenter: LevelSegment.
-3. Add fine-grained segmentation to MPSegment.
-4. Add class Jieba, which provides a more readable interface.
-5. Abandon the unified ISegment interface, because a unified interface limited the flexibility of the segmentation methods and blocked some new features.
-6. Add an optional hmm parameter, which enables new-word discovery by default, so that both MixSegment and QuerySegment can switch new-word discovery on or off.
-
-## v3.1.0
-
-1. Add an API for adding dictionary entries dynamically: insertUserWord
-2. Add a default argument to the cut function; the Mix segmentation algorithm is used by default. See README.md for details on the segmentation algorithms
-
-## v3.0.1
-
-1. Improve compatibility; fix compile errors in certain environments.
-
-## v3.0.0
-
-1. Make QuerySegment support a custom dictionary (optional parameter).
-2. Make KeywordExtractor support a custom dictionary (optional parameter).
-3. Change the Code Style to follow the google code style.
-4. Add more detailed error logs; use LogFatal appropriately during initialization.
-5. Add the Application class, which integrates all CppJieba features; from now on users only need this class.
-6. Change the cjserver service so that different segmentation algorithms can be selected through an http parameter.
-7. Change the make install directory; everything is installed into the single directory /usr/local/cppjieba.
-
-## v2.4.4
-
-1. Change two finer-grained special filtering rules so that consecutive digits (including floating-point numbers) and consecutive letters are each split out separately (rather than mixed together).
-2. Change the DAG data structure used by the dynamic programming step of the maximum probability method (and the Trie's DAG query function with it), improving segmentation speed by 8%.
-3. Use the `Aho-Corasick-Automation` algorithm to speed up Trie lookup, among other optimizations, improving performance.
-4. Add two special rules for part-of-speech tagging.
-
-## v2.4.3
-
-1. Update the [husky] service code; the new [husky] is a simple thread-pool-based server framework. Also fix possible data loss when the body of an HTTP POST request is too long.
-2. Change the parameter structure of PosTagger, removing parameters that are currently unused, and add a parameter for using a custom dictionary, that is, support for **custom part-of-speech tags**.
-3. Better support for `mac osx` (please forgive the author, who was too broke to buy a `mac` until now).
-4. Support `Docker`; see the `Dockerfile` for details.
-
-## v2.4.2
-
-1. On top of using `vector` where appropriate, use `limonp/LocalVector.hpp` as the type of `Unicode`, among other optimizations, improving performance by about `30%`.
-2. Make `cjserver` support a user-defined dictionary, configured through `user_dict_path` in `conf/server.conf`.
-3. Fix incomplete results from `MPSegment` when the sentence contains special characters.
-4. Change `FullSegment` to reduce memory usage.
-5. Fix compilation failure with `-std=c++0x` or `-std=c++11`.
-
-## v2.4.1
-
-1. Improve the segmentation of some special characters and letter strings.
-2. Speed up keyword extraction.
-3. Provide an interface for user-defined dictionaries.
-4. Separate the server-related code into its own `server/` directory.
-5. Fix single-character words in the user-defined dictionary being ignored by MixSegment's new-word discovery. In other words, the user dictionary now has the highest priority, followed by the built-in dictionary, followed by words found by new-word discovery.
-
-## v2.4.0
-
-1. Support older versions of `g++` and `cmake`; tested with `g++ 4.1.2` and `cmake 2.6`.
-2. Change some test case files to reduce compile time when testing.
-3. Fix problems related to `make install`.
-4. Add a POST request interface to the HTTP service.
-5. Split `Trie.hpp` into `DictTrie.hpp` and `Trie.hpp`, abstracting out the trie data structure; also fix latent bugs in the Trie class and improve its unit tests.
-6. Rewrite cjserver start and stop; see README.md for the new start and stop methods.
-
-## v2.3.4
-
-1. Fix a design problem by removing the `TrieManager` class, to avoid some potential hazards.
-2. Add the `stop_words.utf8` dictionary, and change the initialization function of `KeywordExtractor` to use it.
-3. Improve the code structure of the `Trie`-related parts.
-
-## v2.3.3
-
-1. Fix results differing across machines because of the use of unordered_map.
-2. Change some data structures from unordered_map to map, improving segmentation speed by about 1/6. (Although unordered_map has fast lookup, it is less efficient for range iteration.)
-
-## v2.3.2
-
-1. Fix unit test problems; some cases gave different results on x84 and x64.
-2. Merge in a simple version of part-of-speech tagging.
-
-## v2.3.1
-
-1. Fix the service startup problem during installation (installing the segmentation service is only an add-on feature on linux, though, and does not affect the core code.)
-
-## v2.3.0
-
-1. Add `KeywordExtractor.hpp` for keyword extraction.
-2. Use `gtest` for unit tests.
-
-## v2.2.0
-
-1. Performance optimization: segmentation is about 6 times faster.
-2. Other changes that I cannot recall at the moment.
-
-## v2.1.1 (everything before v2.1.1 is recorded together under v2.1.1)
-
-1. Implement the __maximum probability segmentation algorithm__ and the __HMM segmentation algorithm__, and combine them into `MixSegment`, which gives the best results.
-2. Refactor heavily; most of the functional code is now written as hpp files.
-3. Use `cmake` to manage the project.
-4. Use [limonp] as the utility library for common functions such as logging and string operations.
-5. Use [husky] as the server framework for a simple segmentation service.
-
-[limonp]:http://github.com/yanyiwu/limonp.git
-[husky]:http://github.com/yanyiwu/husky.git
-[issue50]:https://github.com/yanyiwu/cppjieba/issues/50
-[qinwf]:https://github.com/yanyiwu/cppjieba/pull/53#issuecomment-176264929
-[jieba]:https://github.com/fxsjy/jieba
-[@questionfish]:https://github.com/questionfish