bunpa 0.2.0 → 0.3.0
- checksums.yaml +4 -4
- data/README.md +30 -4
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: 7ea90d306e49f6d71e91451258f5a038b0d2e2c7
+ data.tar.gz: e13a574fff50ab602883e5a80259277de8d82119
  SHA512:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: 4d3840324ee4a6b878ada12573540c5b21afe0ffdc4b799941627b32a6365ee0f01a517ed51405cd3946b9272861f1050e5d94c892efa1c1eceb84a2d5027d43
+ data.tar.gz: 342169b076fe3d09b415c977b5a5033e172d3ccac0ed01a1fc5c58825b62d85b963025c22d9dcb6afa42a5ac90abcf38dfe024d3c669b4b88625d2c9cbdcedb9
data/README.md
CHANGED
@@ -4,7 +4,7 @@ Bunpa
  Bunpa is an extremely simple wrapper around the MeCab Japanese grammar parser. It was designed with two key features in mind:

  1. Simplicity - only returns the text and major part of speech for each component
- 2. Completeness -
+ 2. Completeness - ensures that whitespace and any unknown characters are preserved

  ## Background

@@ -19,7 +19,7 @@ Any components that cannot be identified by either MeCab or Bunpa are marked as

  ## Installation

- From within your
+ From within your application's base directory:

  1. Edit your Gemfile and add:

@@ -31,7 +31,7 @@ From within your Rails application's base directory:

  ## Usage

- Bunpa operates as a very simple parser. It returns the components it identifies
+ Bunpa operates as a very simple parser. It returns the components it identifies as an Array of Bunpa::Text::Component objects, in the same order as they appear in the document. Each Component object has two accessors - 'text' and 'kind', which return the text value and part of speech of the component respectively.

  Basic usage is as follows:

@@ -42,12 +42,38 @@ require 'bunpa'
  parser = Bunpa::JapaneseTextParser.new

  # Get an enumerable of Bunpa::Text::Components
- components = parser.parse("
+ components = parser.parse("A: こんにちは! お元気ですか。\nB: はい、元気です!")

  components.each do |component|
  puts "#{component.text}\t(#{component.kind}"
  end
+ ```
+
+ This would output:

+ ```
+ A (名詞)
+ : (名詞)
+ (スペース)
+ こんにちは (感動詞)
+ ! (記号)
+ (スペース)
+ お (接頭詞)
+ 元気 (名詞)
+ です (助動詞)
+ か (助詞)
+ 。 (記号)
+
+ (改行)
+ B (名詞)
+ : (名詞)
+ (スペース)
+ は (助詞)
+ い (動詞)
+ 、 (記号)
+ 元気 (名詞)
+ です (助動詞)
+ ! (記号)
  ```

  For a slightly more detailed example, see the `usage_example.rb` script in the `bin` directory.
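The updated Usage text in data/README.md describes the return value of `parse` as an Array of components, each with `text` and `kind` accessors. A minimal sketch of working with such an array is given here; it relies on nothing beyond those two accessors. `Component` is a hypothetical stand-in Struct so the snippet runs without the gem or MeCab, and the sample values are copied by hand from the README's sample output rather than produced by the parser. That joining the `text` values reproduces the input is an inference from the stated Completeness goal, not something the README says outright.

```ruby
# Hypothetical stand-in for Bunpa::Text::Component; the real class ships with the gem.
Component = Struct.new(:text, :kind)

# Hand-copied from the start of the README's sample output (not generated by MeCab).
components = [
  Component.new("A", "名詞"),
  Component.new(":", "名詞"),
  Component.new(" ", "スペース"),
  Component.new("こんにちは", "感動詞"),
  Component.new("!", "記号"),
]

# Whitespace is kept as its own component, so joining the text values
# should rebuild the original string (assumption based on the Completeness goal).
rebuilt = components.map(&:text).join

# Pick out components by part of speech using the kind accessor.
nouns = components.select { |c| c.kind == "名詞" }.map(&:text)

puts rebuilt
puts nouns.inspect
```

With the real gem, the hand-built array would be replaced by the result of `Bunpa::JapaneseTextParser.new.parse(...)`, as in the README's own example.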
metadata
CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: bunpa
  version: !ruby/object:Gem::Version
- version: 0.
+ version: 0.3.0
  platform: ruby
  authors:
  - Daniel Carter
@@ -71,7 +71,7 @@ rubyforge_project:
  rubygems_version: 2.3.0
  signing_key:
  specification_version: 4
- summary: bunpa v0.
+ summary: bunpa v0.3.0
  test_files:
  - spec/spec_helper.rb
  - spec/parser_spec.rb