twitter-korean-text-ruby 0.9.1 → 0.9.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 66a39cfe7f6759aa6bc9412a49e069817201d08b
- data.tar.gz: d2d0ffd199077dd5f7d0463764e2d1b2c6bb1f27
+ metadata.gz: 21817e92682779015dab9f559002da59bacdc547
+ data.tar.gz: ea723db4b56070c6083ebad21072536e57e081d4
  SHA512:
- metadata.gz: 5522edf916fb34042b99333b141146d450634e7d3503e0716204e4dcf2a74a2361fd3ad77dce961142b077f998e6dff627a0fb0603ad7e446b7c0fce1c64ac8e
- data.tar.gz: d812c9d805a0957631a7845cde29ec9bbead25a610ef42d19834cd518e9b12648a210d5a071f837fd5669b93dcbb2aff284b200148dd1b425f04514397df8946
+ metadata.gz: 0b16f1356c028266460cf92e8943a1787d3f28d38a1c1697e5a0685994ee2a65377493015254a8054c37ed6d41529838bb84372f6af1c8919fd0a65e1bd8e788
+ data.tar.gz: e41d60fea88751d76c96aab12a230c6e4aebf53fd2f2fa253c6fa9d89d8ed84e6d7b2db3353a9723f164f66afebfa408b9404f9b7df6c34b3b66461323bb32cb
data/README.md CHANGED
@@ -1,12 +1,15 @@
  ## twitter-korean-text-ruby
  [![Build Status](https://travis-ci.org/keepcosmos/twitter-korean-text-ruby.svg?branch=master)](https://travis-ci.org/keepcosmos/twitter-korean-text-ruby)
  [![Code Climate](https://codeclimate.com/repos/56d562f8e4ecf4707f00309b/badges/7673319c6a92ab7ace9f/gpa.svg)](https://codeclimate.com/repos/56d562f8e4ecf4707f00309b/feed)
+ [![Gem Version](https://badge.fury.io/rb/twitter-korean-text-ruby.svg)](https://badge.fury.io/rb/twitter-korean-text-ruby)
 
  Ruby interface to [twitter-korean-text](https://github.com/twitter/twitter-korean-text) by Twitter
 
  A Ruby wrapper around [twitter-korean-text](https://github.com/twitter/twitter-korean-text) (Scala), the Korean morphological analyzer provided by Twitter.
 
- ### install
+ Built on [twitter-korean-text 4.4](https://github.com/twitter/twitter-korean-text/releases/tag/korean-text-4.4).
+
+ ### Install
  ```{ruby}
  $ gem install twitter-korean-text-ruby
  ```
@@ -17,39 +20,43 @@ gem 'twitter-korean-text-ruby'
  ```
 
  ### Useage
+ #### Basic
  ```ruby
  require 'twitter-korean-text-ruby'
 
  processor = TwitterKorean::Processor.new
  # OR with JVM arguments
- processor = TwitterKorean::Processor.new('-Xms126M', '-Xms512M', ...)
+ processor = TwitterKorean::Processor.new('-Xms126M', '-Xmx512M', ...)
 
  # Normalize
  processor.normalize("형태소 분석을 합니닼ㅋㅋㅋㅋㅋㅋ")
  # => "형태소 분석을 합니다ㅋㅋㅋㅋㅋㅋ"
 
  # Tokenize
- tokens = proccessor.tokenize("한국어를 처리하는 예시입니다 ㅋㅋ")
- puts tokens
+ proccessor.tokenize("한국어를 처리하는 예시입니다 ㅋㅋ")
  # => ["한국어", "를", " ", "처리", "하는", " ", "예시", "입니", "다", " ", "ㅋㅋ"]
 
- # metadata of token
- metadata = tokens.first.metadata
- matadata #=> "noun, 0, 3"
- metadata.pos #=> :noun
- metadata.offset #=> 0
- metadata.length #=> 3
-
  # Stemming
- tokens = proccessor.stem("한국어를 처리하는 예시입니다 ㅋㅋ")
- puts tokens
+ proccessor.stem("한국어를 처리하는 예시입니다 ㅋㅋ")
  # => ["한국어", "를", " ", "처리", "하다", " ", "예시", "이다", " ", "ㅋㅋ"]
 
  # extract phrases
- tokens = proccessor.stem("한국어를 처리하는 예시입니다 ㅋㅋ")
- puts tokens
+ proccessor.stem("한국어를 처리하는 예시입니다 ㅋㅋ")
  # => ["한국어", "처리", "처리하는 예시", "예시"]
+ ```
+ #### Token Information
+ The token class (`TwitterKorean::KoreanToken`) inherits from String. Use the `metadata` attribute to read a token's meta information.
+
+ ```{ruby}
+ tokens = proccessor.tokenize("한국어를 처리하는 예시입니다 ㅋㅋ")
+ token = tokens.first
 
+ token #=> 한국어
+ metadata = token.metadata
+ matadata #=> "noun, 0, 3"
+ metadata.pos #=> :noun
+ metadata.offset #=> 0
+ metadata.length #=> 3
  ```
 
  ### Test
  ### Test
@@ -58,11 +65,12 @@ rake test
  ```
 
  ### Issue
- If the JAVA path cannot be found,
+ If the JAVA_HOME path cannot be found,
  ```{bash}
  export JAVA_HOME=$(java_home_path)
+ ```
 
  ### Contribute
  This project wraps the Scala code of the [twitter-korean-text](https://github.com/twitter/twitter-korean-text) project in Ruby.
  Issues and pull requests (with test code included) within that scope are always welcome.
- ```
+
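Review note: the Token Information section added to the README says the token class inherits from String and exposes a `metadata` attribute with `pos`, `offset` and `length`. The sketch below shows one way such a design can work. `SketchToken` and `TokenMetadata` are hypothetical names used only for illustration; they are not the gem's classes, and the gem's actual implementation may differ.

```ruby
# Hypothetical stand-in for the metadata object described in the README.
TokenMetadata = Struct.new(:pos, :offset, :length) do
  # Render as "pos, offset, length", mirroring the README's printed form.
  def to_s
    "#{pos}, #{offset}, #{length}"
  end
end

# Hypothetical String subclass that carries metadata alongside the text.
class SketchToken < String
  attr_reader :metadata

  def initialize(text, pos, offset)
    super(text)
    @metadata = TokenMetadata.new(pos, offset, text.length)
  end
end

token = SketchToken.new("한국어", :noun, 0)
puts token                  # usable anywhere a plain String is expected
puts token.metadata         # string form of the metadata
puts token.metadata.pos
```

Subclassing String keeps results compatible with code that only wants the token text, while callers that need positions can opt in through `metadata`.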
@@ -32,7 +32,7 @@ module TwitterKorean
  def extract_phrases(text, options = {})
  return unless text
  filter_spam = options[:filter_spam] || false
- including_hashtags = options[:including_hashtags] || true
+ including_hashtags = options[:including_hashtags] || true
  converto_to_korean_tokens do
  jvm_processor.extractPhrases(jvm_processor.tokenize(text), filter_spam, including_hashtags)
  end
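Review note: in the `extract_phrases` hunk above, `options[:including_hashtags] || true` evaluates to `true` even when the caller passes `false`, because `false || true` is `true` in Ruby. The sketch below contrasts that expression with `Hash#fetch`, which only falls back to the default when the key is absent. Both helper methods are hypothetical and exist only for this comparison; this is a possible alternative, not a change present in 0.9.2.

```ruby
# Hypothetical helper reproducing the expression used in the diff.
def including_hashtags_with_or(options = {})
  options[:including_hashtags] || true
end

# Hypothetical helper using Hash#fetch so an explicit false is preserved.
def including_hashtags_with_fetch(options = {})
  options.fetch(:including_hashtags, true)
end

explicit_false = { including_hashtags: false }

puts including_hashtags_with_or(explicit_false)
puts including_hashtags_with_fetch(explicit_false)
puts including_hashtags_with_fetch({})
```

The `|| false` default on `filter_spam` does not have this problem, since a missing key and an explicit `false` both end up as a falsy value.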
@@ -49,7 +49,7 @@ module TwitterKorean
  end
 
  def scala_list_to_array(result)
- result.scan(/(?<=List\(|\,\s)(.*?\([a-zA-Z]+\:\s[0-9]+,\s[0-9]\))/).to_a
+ result.scan(/(?<=List\(|\,\s)(.*?\(\w+\:\s[0-9]+,\s[0-9]+\))/).to_a
  end
  end
  end
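Review note: the new pattern in `scala_list_to_array` differs from the old one in two places. The tag part changes from `[a-zA-Z]+` to `\w+` (which also admits digits and underscores), and the final number changes from a single `[0-9]` to `[0-9]+`, so it can have more than one digit. The sketch below runs both patterns, copied from the diff, against a sample string. The sample is an assumption about what the JVM result string looks like, inferred from the pattern itself, and is not captured output.

```ruby
# Patterns copied verbatim from the 0.9.1 and 0.9.2 sides of the diff.
OLD_PATTERN = /(?<=List\(|\,\s)(.*?\([a-zA-Z]+\:\s[0-9]+,\s[0-9]\))/
NEW_PATTERN = /(?<=List\(|\,\s)(.*?\(\w+\:\s[0-9]+,\s[0-9]+\))/

# Assumed input shape; the second token has a two-digit final number.
sample = "List(한국어(Noun: 0, 3), 가나다라마바사아자차카타(Noun: 3, 12))"

# String#scan with one capture group returns an array of one-element arrays.
old_tokens = sample.scan(OLD_PATTERN).flatten
new_tokens = sample.scan(NEW_PATTERN).flatten

puts "old pattern matched #{old_tokens.size} token(s)"
puts "new pattern matched #{new_tokens.size} token(s)"
```

With this sample, the old pattern cannot match the second entry because it expects `)` right after a single digit, so any token whose final number is 10 or greater would be dropped.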
@@ -1,3 +1,3 @@
  module TwitterKorean
- VERSION = '0.9.1'
+ VERSION = '0.9.2'
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: twitter-korean-text-ruby
  version: !ruby/object:Gem::Version
- version: 0.9.1
+ version: 0.9.2
  platform: ruby
  authors:
  - Jaehyun Shin
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-03-01 00:00:00.000000000 Z
+ date: 2016-03-04 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rjb