twitter-text-kow 1.3.1.0
- checksums.yaml +7 -0
- data/.gemtest +0 -0
- data/.gitignore +40 -0
- data/.gitmodules +3 -0
- data/.rspec +2 -0
- data/CHANGELOG.md +44 -0
- data/Gemfile +4 -0
- data/LICENSE +188 -0
- data/README.md +193 -0
- data/Rakefile +52 -0
- data/config/README.md +142 -0
- data/config/v1.json +8 -0
- data/config/v2.json +29 -0
- data/config/v3.json +30 -0
- data/lib/assets/tld_lib.yml +1577 -0
- data/lib/twitter-text/autolink.rb +455 -0
- data/lib/twitter-text/configuration.rb +68 -0
- data/lib/twitter-text/deprecation.rb +21 -0
- data/lib/twitter-text/emoji_regex.rb +27 -0
- data/lib/twitter-text/extractor.rb +388 -0
- data/lib/twitter-text/hash_helper.rb +27 -0
- data/lib/twitter-text/hit_highlighter.rb +92 -0
- data/lib/twitter-text/regex.rb +381 -0
- data/lib/twitter-text/rewriter.rb +69 -0
- data/lib/twitter-text/unicode.rb +31 -0
- data/lib/twitter-text/validation.rb +251 -0
- data/lib/twitter-text/weighted_range.rb +24 -0
- data/lib/twitter-text.rb +29 -0
- data/script/destroy +14 -0
- data/script/generate +14 -0
- data/spec/autolinking_spec.rb +858 -0
- data/spec/configuration_spec.rb +136 -0
- data/spec/extractor_spec.rb +392 -0
- data/spec/hithighlighter_spec.rb +96 -0
- data/spec/regex_spec.rb +76 -0
- data/spec/rewriter_spec.rb +553 -0
- data/spec/spec_helper.rb +139 -0
- data/spec/test_urls.rb +90 -0
- data/spec/twitter_text_spec.rb +25 -0
- data/spec/unicode_spec.rb +35 -0
- data/spec/validation_spec.rb +87 -0
- data/test/conformance_test.rb +242 -0
- data/twitter-text.gemspec +35 -0
- metadata +228 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: 43c07d2316e4bce54cd0915b52cb5a86bb81366c2ec24aca5c3e618362c6dced
  data.tar.gz: 5a163feb86e22059dd6f315422ce680c4c26c3afc22d50dace85e2d6683c837f
SHA512:
  metadata.gz: 707e1c2110e1fee3d76ab3a9f39bb799c755e6f33ce21093a4fef4ec7430ec8d25fb3de82e84fdbed4bec741d19a8cc6521161530cab2f14bdc2f719bf1e9b4c
  data.tar.gz: 95fd66b4ee400291b77fce30b678a82a2745049ee30b5b8df88bab5f11e2fdfdfe104160e23c3f781ac47a277c0853ffdc5519047f588fe195a084e4c381822f
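RubyGems records these digests for the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). As a minimal sketch, one of them can be checked with Ruby's standard `digest` library. The helper name `checksum_matches?` is hypothetical, and the gem file name in the usage comment is assumed from the package name and version above.

```ruby
require 'digest'

# Hypothetical helper: compare raw bytes against an expected hex digest taken
# from the SHA256 or SHA512 section of checksums.yaml.
def checksum_matches?(bytes, expected_hex, algorithm: :sha256)
  actual =
    case algorithm
    when :sha256 then Digest::SHA256.hexdigest(bytes)
    when :sha512 then Digest::SHA512.hexdigest(bytes)
    else raise ArgumentError, "unsupported algorithm: #{algorithm}"
    end
  actual == expected_hex.downcase
end

# Usage, assuming data.tar.gz was first unpacked from the .gem archive
# (for example with `tar -xf twitter-text-kow-1.3.1.0.gem`; file name assumed):
#   bytes = File.binread('data.tar.gz')
#   checksum_matches?(bytes, '5a163feb86e22059dd6f315422ce680c4c26c3afc22d50dace85e2d6683c837f')
```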
data/.gemtest
ADDED
File without changes
data/.gitignore
ADDED
@@ -0,0 +1,40 @@
*.gem
*.rbc
*.sw[a-p]
*.tmproj
*.tmproject
*.un~
*~
.DS_Store
.Spotlight-V100
.Trashes
._*
.bundle
.config
.directory
.elc
.emacs.desktop
.emacs.desktop.lock
.redcar
.yardoc
Desktop.ini
Gemfile.lock
Icon?
InstalledFiles
Session.vim
Thumbs.db
\#*\#
_yardoc
auto-save-list
coverage
doc
lib/bundler/man
pkg
pkg/*
rdoc
spec/reports
test/tmp
test/version_tmp
tmp
tmtags
tramp
data/.gitmodules
ADDED
data/.rspec
ADDED
data/CHANGELOG.md
ADDED
@@ -0,0 +1,44 @@
# Changelog
All notable changes to this project will be documented in this file.

## [Unreleased]

## [3.1.0]
### Changed
- Bump nokogiri version (#302)
- Fix auto-link emoji parsing (#304)
- Updates known gTLDs to recognize recent additions by IANA (#308)
- Fix warning about has_rdoc usage (#309)

## [3.0.0]
### Added
- New v3.json config file with emojiParsingEnabled config option. When
  true, twitter-text will parse and discount emoji supported by the
  twemoji library (see https://github.com/twitter/twemoji). The length
  of these emoji will be the default weight (200 or two characters) even
  if they contain multiple code points combined by zero-width
  joiners. This means that emoji with skin tone and gender modifiers no
  longer count as more characters than those without such modifiers.

### Changed
- Updates known gTLDs to recognize recent additions by IANA (#261)

## [2.1] - 2017-12-20
### Added
- This CHANGELOG.md file

### Changed
- Top-level namespace changed from `Twitter` to `Twitter::TwitterText`. This
  resolves a namespace collision with the popular
  [twitter gem](https://github.com/sferik/twitter). This is considered
  a breaking change, so the version has been bumped to 2.1. This fixes
  issue [#221](https://github.com/twitter/twitter-text/issues/221),
  "NoMethodError Exception: undefined method `[]' for nil:NilClass when
  using gem in rails app"

## [2.0.2] - 2017-12-18
### Changed
- Resolved issue
  [#211](https://github.com/twitter/twitter-text/issues/211), "gem
  breaks, asset file is a dangling symlink"
- Config files and tld_lib.yml files are now copied into the right place
- Rakefile now includes `prebuild` and `clean` tasks
data/Gemfile
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,188 @@
Copyright 2011 Twitter, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this work except in compliance with the License.
You may obtain a copy of the License below, or at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
data/README.md
ADDED
@@ -0,0 +1,193 @@
# twitter-text

![](https://img.shields.io/gem/v/twitter-text.svg)

This is the Ruby implementation of the twitter-text parsing
library. The library has methods to parse Tweets, calculate their length
and validity, and extract @mentions, #hashtags, URLs, and more.

## Setup

Installation uses Bundler.

```
% gem install bundler
% bundle install
```

## Conformance tests

To run the conformance test suite from the command line via rake:

```
% rake test:conformance:run
```

You can also run the rspec tests in the `spec` directory:

```
% rspec spec
```

## Length validation

twitter-text 2.0 introduces configuration files that define how Tweets
are parsed for length. This allows for backwards compatibility and
flexibility going forward. Old-style traditional 140-character parsing
is defined by the v1.json configuration file, whereas v2.json is
updated for "weighted" Tweets where ranges of Unicode code points can
have independent weights aside from the default weight. The sum of the
weights of all code points must not exceed the maximum weighted length.
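A minimal sketch of the weighted-length rule described above follows. It is not the library's implementation: `SAMPLE_CONFIG`, `code_point_weight`, `weighted_length`, and `within_limit?` are hypothetical names, the numbers are illustrative v2-style values (check them against `config/v2.json`), and URL and emoji handling are left out.

```ruby
# Illustrative v2-style configuration (values assumed; compare with config/v2.json).
SAMPLE_CONFIG = {
  max_weighted_tweet_length: 280,
  scale: 100,
  default_weight: 200,
  ranges: [
    { start: 0,    end: 4351, weight: 100 },
    { start: 8192, end: 8205, weight: 100 },
    { start: 8208, end: 8223, weight: 100 },
    { start: 8242, end: 8247, weight: 100 }
  ]
}.freeze

# Weight of one code point: the first range containing it wins, else the default.
def code_point_weight(code_point, config)
  range = config[:ranges].find { |r| code_point.between?(r[:start], r[:end]) }
  range ? range[:weight] : config[:default_weight]
end

# Sum the weights of every code point, then divide by the scale.
def weighted_length(text, config = SAMPLE_CONFIG)
  total = text.each_codepoint.sum { |cp| code_point_weight(cp, config) }
  total / config[:scale]
end

# Within the limit when the scaled length does not exceed the maximum.
def within_limit?(text, config = SAMPLE_CONFIG)
  weighted_length(text, config) <= config[:max_weighted_tweet_length]
end
```

With these values, ASCII letters weigh one unit each after scaling, while characters outside the listed ranges, such as Japanese kana, weigh two; a Tweet of 140 such characters therefore reaches the 280 limit.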

Some old methods from twitter-text 1.0 have been marked deprecated,
such as the `tweet_length()` method. The new API is based on the
following method, `parse_tweet()`:

```ruby
def parse_tweet(text, options = {})
  # ...
end
```

This method takes a string as input and returns a
`Twitter::TwitterText::Validation::ParseResults` object containing
information about the string. It includes:

* `:weighted_length`: the overall length of the Tweet, with code points
  weighted per the ranges defined in the configuration file.

* `:permillage`: the proportion (per thousand) of the weighted
  length compared to the maximum weighted length. A value > 1000
  indicates input text that is longer than the allowable maximum.

* `:valid`: whether the input text's length corresponds to a valid
  result.

* `:display_range_start`, `:display_range_end`: two Unicode code point
  indices identifying the inclusive start and exclusive end of the
  displayable content of the Tweet. For more information, see
  the description of `display_text_range` here:
  [Tweet updates](https://developer.twitter.com/en/docs/tweets/tweet-updates)

* `:valid_range_start`, `:valid_range_end`: two Unicode code point
  indices identifying the inclusive start and exclusive end of the valid
  content of the Tweet. For more information on the extended Tweet
  payload, see [Tweet updates](https://developer.twitter.com/en/docs/tweets/tweet-updates)
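The `:permillage` and `:valid` fields follow from the weighted length. A minimal sketch, assuming `weighted_length` has already been divided by the configuration's `scale`. `summarize_length` is a hypothetical helper that returns a plain Hash, not the real `ParseResults` class, and it ignores any other checks the library may apply when deciding validity.

```ruby
# Hypothetical helper: derive permillage and validity from a scaled length.
def summarize_length(weighted_length, max_weighted_length = 280)
  # Integer math: 1000 means exactly at the limit, above 1000 means too long.
  permillage = weighted_length * 1000 / max_weighted_length
  {
    weighted_length: weighted_length,
    permillage: permillage,
    valid: weighted_length <= max_weighted_length
  }
end
```

For example, a weighted length of 140 against a maximum of 280 gives a permillage of 500, while 300 gives 1071 and is not valid.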

## Extraction Examples

### Extraction

```ruby
require 'twitter-text'

class MyClass
  include Twitter::TwitterText::Extractor

  def mentioned_usernames
    extract_mentioned_screen_names("Mentioning @twitter and @jack")
    # => ["twitter", "jack"]
  end
end
```

### Extraction with a block argument

```ruby
require 'twitter-text'

class MyClass
  include Twitter::TwitterText::Extractor

  def reply_username
    extract_reply_screen_name("@twitter are you hiring?") do |username|
      # username = "twitter"
    end
  end
end
```

## Auto-linking Examples

### Auto-link

```ruby
require 'twitter-text'

class MyClass
  include Twitter::TwitterText::Autolink

  def linked_html
    auto_link("link @user, please #request")
  end
end
```

For Ruby on Rails, add this to `app/helpers/application_helper.rb`:

```ruby
module ApplicationHelper
  include Twitter::TwitterText::Autolink
end
```

Now the `auto_link` method is available in every view, so in `index.html.erb`:

```erb
<%= auto_link("link @user, please #request") %>
```

### Usernames

Username extraction and linking matches all syntactically valid Twitter
usernames but does not verify that the username belongs to an existing
Twitter account.

### Lists

Auto-link and extract list names when they are written in `@user/list-name`
format.
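Because matching is purely syntactic, the behaviour described under Usernames and Lists can be approximated with a regular expression. The pattern below is a deliberately narrow assumption (1 to 20 ASCII letters, digits, or underscores for the name; a letter followed by letters, digits, underscores, or hyphens for the list slug) and is not the library's pattern from `regex.rb`. Like the block example above, the hypothetical `extract_mentions_sketch` returns its results and also yields each one when a block is given.

```ruby
# Simplified, hypothetical pattern: "@name" or "@name/list-slug",
# not preceded by a word character or another "@".
MENTION_SKETCH = /(?<![A-Za-z0-9_@])@([A-Za-z0-9_]{1,20})(\/[A-Za-z][A-Za-z0-9_-]*)?/

# Returns [screen_name, list_slug_or_nil] pairs; also yields each pair to a block.
def extract_mentions_sketch(text)
  results = text.scan(MENTION_SKETCH).map do |name, list|
    [name, list && list[1..-1]] # drop the leading "/" from the list slug
  end
  results.each { |name, list| yield name, list } if block_given?
  results
end
```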

### Hashtags

Auto-link and extract hashtags. A hashtag can contain most letters and
numbers, but it cannot consist solely of numbers and cannot contain punctuation.
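A simplified sketch of the hashtag rule: `#` followed by Unicode letters or digits, with all-digit tags discarded. Allowing the underscore and rejecting a `#` that directly follows a letter or digit are assumptions made for illustration; `HASHTAG_SKETCH` and `extract_hashtags_sketch` are hypothetical and much narrower than the library's real pattern.

```ruby
# Hypothetical pattern: "#" plus letters, digits, or underscore (any script),
# not directly preceded by a letter, digit, or underscore.
HASHTAG_SKETCH = /(?<![\p{L}\p{N}_])#([\p{L}\p{N}_]+)/

def extract_hashtags_sketch(text)
  # Punctuation ends a tag; tags made only of digits are rejected.
  text.scan(HASHTAG_SKETCH).flatten.reject { |tag| tag.match?(/\A\p{N}+\z/) }
end
```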

### URLs

Asian languages like Chinese, Japanese, or Korean may not use a delimiter such
as a space to separate normal text from URLs, making it difficult to identify
where the URL ends and the text starts.

For this reason, twitter-text currently does not support extracting or
auto-linking URLs immediately followed by non-Latin characters.

Example: "http://twitter.com/は素晴らしい". The normal text is "は素晴らしい" and is not
part of the URL even though it isn't space separated.
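The boundary in the example can be illustrated with a hypothetical, much-simplified pattern that lets a URL contain only printable ASCII after the scheme, so it stops at the first non-ASCII character. That is stricter than "non-Latin" (it would also stop at `é`), and unlike a full extractor it does not check domains against the TLD list in `tld_lib.yml` or trim trailing punctuation.

```ruby
# Hypothetical pattern: scheme, then one or more printable ASCII characters
# (0x21..0x7E), so the match ends at whitespace or any non-ASCII character.
URL_SKETCH = %r{https?://[!-~]+}

def extract_urls_sketch(text)
  text.scan(URL_SKETCH)
end
```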

### International

Special care has been taken to ensure that auto-linking and extraction work
in Tweets of all languages. This means that languages without spaces between
words should work equally well.

### Hit Highlighting

Use the hit highlighter to add emphasis around the "hits" returned from the
Search API. It is built to work on text that has already been auto-linked.
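A minimal sketch of the idea on plain text follows, assuming hits are given as `[start, end]` character offsets with an exclusive end. `highlight_sketch` is hypothetical: unlike the library's hit highlighter, it does not handle text that already contains auto-linked HTML, and it assumes hits do not overlap.

```ruby
# Hypothetical sketch: wrap each [start, end) range of plain text in a tag.
def highlight_sketch(text, hits, tag: 'em')
  result = text.dup
  # Insert from the last hit backwards so earlier offsets stay valid.
  hits.sort_by(&:first).reverse_each do |start, stop|
    result.insert(stop, "</#{tag}>")
    result.insert(start, "<#{tag}>")
  end
  result
end
```

For example, `highlight_sketch("just a test", [[7, 11]])` returns `"just a <em>test</em>"`.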

## Issues

Have a bug? Please create an issue here on GitHub!

<https://github.com/twitter/twitter-text/issues>

## Authors

### V2.0

* David LaMacchia (<https://github.com/dlamacchia>)
* Yoshimasa Niwa (<https://github.com/niw>)
* Sudheer Guntupalli (<https://github.com/sudhee>)
* Kaushik Lakshmikanth (<https://github.com/kaushlakers>)
* Jose Antonio Marquez Russo (<https://github.com/joseeight>)
* Lee Adams (<https://github.com/leeaustinadams>)

### Previous authors

* Matt Sanford (<http://github.com/mzsanford>)
* Raffi Krikorian (<http://github.com/r>)
* Ben Cherry (<http://github.com/bcherry>)
* Patrick Ewing (<http://github.com/hoverbird>)
* Jeff Smick (<http://github.com/sprsquish>)
* Kenneth Kufluk (<https://github.com/kennethkufluk>)
* Keita Fujii (<https://github.com/keitaf>)
* Jean-Philippe Bougie (<http://github.com/jpbougie>)
* Erik Michaels-Ober (<https://github.com/sferik>)

## License

Copyright 2012-2020 Twitter, Inc. and other contributors

Licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)
data/Rakefile
ADDED
@@ -0,0 +1,52 @@
require 'bundler'
include Rake::DSL
Bundler::GemHelper.install_tasks

task :build => ['prebuild']
task :spec => ['prebuild']
task :default => ['prebuild', 'spec', 'test:conformance']
task :test => :spec

directory "config"
directory "lib/assets"

desc "Prebuild task setup"
task :prebuild => ["config", "lib/assets"] do
  FileUtils.cp_r '../config/.', 'config', :verbose => true
  FileUtils.cp_r '../conformance/tld_lib.yml', 'lib/assets', :verbose => true
end

require 'rubygems'
require 'rspec/core/rake_task'
RSpec::Core::RakeTask.new(:spec)

namespace :test do
  namespace :conformance do
    desc "Run conformance test suite"
    task :run => ['prebuild'] do
      ruby "test/conformance_test.rb"
    end
  end

  desc "Run conformance test suite"
  task :conformance => ['conformance:run'] do
  end
end

require 'rdoc/task'
namespace :doc do
  RDoc::Task.new do |rd|
    rd.main = "README.rdoc"
    rd.rdoc_dir = 'doc'
    rd.rdoc_files.include("README.rdoc", "lib/**/*.rb")
  end
end

desc "Run cruise control build"
task :cruise => [:spec, 'test:conformance'] do
end

desc "Clean build"
task :clean do
  rm_rf ["config", "pkg", "lib/assets", "Gemfile.lock"]
end
data/config/README.md
ADDED
@@ -0,0 +1,142 @@
# twitter-text Configuration

twitter-text 2.0 introduces a new configuration format as well as APIs
for interpreting this configuration. The configuration is a JSON
string (or file), and parsing APIs are provided in each of
twitter-text’s four reference languages.

## Format

The configuration format is a JSON string. The JSON can have the following properties:

* `version` (required, integer, min value 0)
* `maxWeightedTweetLength` (required, integer, min value 0)
* `scale` (required, integer, min value 1)
* `defaultWeight` (required, integer, min value 0)
* `emojiParsingEnabled` (optional, boolean)
* `transformedURLLength` (integer, min value 0)
* `ranges` (array of range items)

A `range item` has the following properties:

* `start` (required, integer, min value 0)
* `end` (required, integer, min value 0)
* `weight` (required, integer, min value 0)
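A minimal sketch of checking a configuration string against the format above, using Ruby's standard `json` library. The helper names are hypothetical, and because the list above does not mark `transformedURLLength` and `ranges` as required, the sketch treats them as optional.

```ruby
require 'json'

# Minimum values for the required integer properties listed above.
REQUIRED_INTEGERS = {
  'version' => 0, 'maxWeightedTweetLength' => 0, 'scale' => 1, 'defaultWeight' => 0
}.freeze

def int_at_least?(value, min)
  value.is_a?(Integer) && value >= min
end

# Hypothetical validator: returns a list of problems (empty when the JSON fits).
def config_errors(json)
  config = JSON.parse(json)
  return ['configuration must be a JSON object'] unless config.is_a?(Hash)

  errors = []
  REQUIRED_INTEGERS.each do |key, min|
    errors << "#{key} must be an integer >= #{min}" unless int_at_least?(config[key], min)
  end
  if config.key?('emojiParsingEnabled') && ![true, false].include?(config['emojiParsingEnabled'])
    errors << 'emojiParsingEnabled must be a boolean'
  end
  if config.key?('transformedURLLength') && !int_at_least?(config['transformedURLLength'], 0)
    errors << 'transformedURLLength must be an integer >= 0'
  end
  ranges = config.fetch('ranges', [])
  return errors << 'ranges must be an array' unless ranges.is_a?(Array)

  ranges.each_with_index do |range, i|
    next errors << "ranges[#{i}] must be an object" unless range.is_a?(Hash)

    %w[start end weight].each do |key|
      errors << "ranges[#{i}].#{key} must be an integer >= 0" unless int_at_least?(range[key], 0)
    end
  end
  errors
end
```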

## Parameters

### version

The version of the configuration string. This is an integer that will
monotonically increase in future releases. The legacy version of the
string is version 1; weighted code point ranges and 280-character
“long” tweets are supported in version 2.

### maxWeightedTweetLength

The maximum weighted length of a Tweet. Legacy v1 Tweets had a
maximum weighted length of 140, and all characters were weighted the
same. In the new configuration format, this is represented as a
`maxWeightedTweetLength` of 140 and a `defaultWeight` of 1 for all code
points.

### scale

The Tweet length is the weighted length divided by `scale`.

### defaultWeight

The default weight applied to all code points. Range items can override
it for specific code points.

### emojiParsingEnabled

When set to true, the weighted Tweet length counts each emoji as a
single code point (with a default weight of 200), including longer
grapheme clusters combined by zero-width joiners. When set to false,
Tweet length is calculated by weighting individual Unicode code points.
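The difference between the two modes shows up with emoji built from several code points. The sketch below is an approximation under stated assumptions: it uses illustrative weights (100 for code points in two low ranges that include the zero-width joiner U+200D, 200 otherwise, scale 100), and it swaps in Ruby's `String#grapheme_clusters` (Ruby 2.5 or later) plus a crude "starts at or above U+1F000" test for the twemoji-based emoji matching the library uses. All names are hypothetical.

```ruby
# Assumed weights: code points in these ranges weigh 100, all others 200;
# the scale is 100. These stand in for a real configuration file.
LIGHT_RANGES = [0..4351, 8192..8205].freeze

def cp_weight(cp)
  LIGHT_RANGES.any? { |r| r.cover?(cp) } ? 100 : 200
end

# emojiParsingEnabled = false: weigh every code point individually.
def length_by_code_points(text)
  text.each_codepoint.sum { |cp| cp_weight(cp) } / 100
end

# emojiParsingEnabled = true (approximation): a grapheme cluster whose first
# code point is at or above U+1F000 counts as one emoji, weighed once at 200.
def length_by_graphemes(text)
  total = text.grapheme_clusters.sum do |cluster|
    cluster.ord >= 0x1F000 ? 200 : cluster.each_codepoint.sum { |cp| cp_weight(cp) }
  end
  total / 100
end
```

Under these assumptions a three-person family emoji (five code points joined by two zero-width joiners) counts as 8 with per-code-point weighting but 2 with emoji parsing enabled.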

### transformedURLLength

The length counted against the total weight of the Tweet for each URL. In
previous versions of twitter-text, this was called the “shortened URL
length.” Differentiating between the http and https shortened length
for URLs has been deprecated (https is used for all t.co URLs). The
default value is 23.
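To isolate the URL rule, the sketch below assumes every other code point weighs 1 with a scale of 1 (as in v1) and uses a deliberately simplified, hypothetical URL pattern. Each match counts as `transformed_url_length` no matter how long the URL actually is.

```ruby
# Hypothetical, simplified URL pattern: scheme plus printable ASCII characters.
URL_PATTERN = %r{https?://[!-~]+}

# Every non-URL character counts 1; every URL counts transformed_url_length.
def length_with_urls(text, transformed_url_length: 23)
  url_count = text.scan(URL_PATTERN).size
  text.gsub(URL_PATTERN, '').length + url_count * transformed_url_length
end
```

For example, `"see https://example.com/a/very/long/path/that/keeps/going"` counts as 4 for `"see "` plus 23 for the URL, or 27 in total.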

### ranges

An array of range items that describe ranges of Unicode code points
and the weight to apply to each code point. Each range is defined by
its start, end, and weight. Surrogate pairs have a length that is
equivalent to the length of the first code unit in the surrogate
pair. Note that certain graphemes are the result of joining code
points together, such as by a zero-width joiner; unlike a surrogate
pair, the length of such a grapheme will be the sum of the weighted
length of all included code points.

## API

Each of the four reference language implementations provides a way to
read the JSON configuration.

### Java

```java
public static TwitterTextConfiguration configurationFromJson(@Nonnull String json, boolean isResource)
```

* `json`: the configuration string or file name in the config directory (see `isResource`)
* `isResource`: if true, `json` refers to a file name for the configuration.

### JavaScript

Configurations are accessed via `twttr.txt.configs` (example:
`twttr.txt.configs.version2`). This config is passed as an argument
to `parseTweet`:

```js
twttr.txt.parseTweet(inputText, twttr.txt.configs.version2)
```

### Objective-C

The Objective-C implementation provides two methods for reading the
input, either from a string or a file resource.

```objective-c
+ (instancetype)configurationFromJSONResource:(NSString *)jsonResource;
+ (instancetype)configurationFromJSONString:(NSString *)jsonString;
```

The default configuration can also be set:

```objective-c
+ (void)setDefaultParserConfiguration:(TwitterTextConfiguration *)configuration
```

The resource string refers to the two included configuration files
(which are referenced in the Xcode project).

### Ruby

Ruby provides the `Twitter::TwitterText::Configuration` class and means to read
from a file or string.

```ruby
def self.parse_string(string, options = {})
def self.parse_file(filename)
```

You can use `configuration_from_file()` or initialize a configuration
using `Twitter::TwitterText::Configuration.new(config)`, where `config` is the
output of one of the two methods above.