scrub_rb 0.2.0 → 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +10 -8
- data/lib/scrub_rb/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 644acb6ff368a1469fabcce3a3a41b8c3389fbb1
+data.tar.gz: 4b80ec9d61708227bb65a43bb799d66b48b2e14f
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 9ad061344fd9587e8ed1eb39404e828a05f357a769c9838043870cb7034efaa763f2a827322043e5fa943e17ff6318df42f89050c4de7fc4aeba7d876a823c7c
+data.tar.gz: 69b9a20817599ed7647ad6007f7c8b123f247cd32484719b88d7f14ab3e29879fc5d7b1d6d1225c209477fa5eb6e8df2f19444debd75578089b02b8acc1c8685
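The SHA1 and SHA512 entries in checksums.yaml are hex digests, presumably of the `metadata.gz` and `data.tar.gz` members packed inside the `.gem` archive. As a rough sketch of how such values could be recomputed for comparison, the snippet below uses Ruby's standard `Digest` classes. The helper name `archive_digests` is hypothetical and is not part of RubyGems or scrub_rb.

```ruby
require "digest/sha1"
require "digest/sha2"

# Hypothetical helper (an assumption for illustration, not a RubyGems API):
# returns the hex digests of a byte string, keyed like the sections of
# checksums.yaml.
def archive_digests(bytes)
  {
    "SHA1"   => Digest::SHA1.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes),
  }
end

# Possible usage, assuming metadata.gz has been extracted from the .gem file:
#   archive_digests(File.binread("metadata.gz"))
```

A SHA1 hex digest is 40 characters long and a SHA512 hex digest is 128, which matches the lengths of the values in the hunk above.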
data/README.md
CHANGED
@@ -2,7 +2,7 @@
 
 Pure-ruby polyfill of MRI 2.1 String#scrub, for ruby 1.9 and 2.0 any interpreter
 
-[](https://travis-ci.org/jrochkind/scrub_rb)
+[](https://travis-ci.org/jrochkind/scrub_rb) [](http://badge.fury.io/rb/scrub_rb)
 
 ## Installation
 
@@ -21,8 +21,8 @@ Or install it yourself as:
 
 ## What it is
 
-Ruby 2.1 introduces String#scrub, a method to replace
-
+Ruby 2.1 introduces String#scrub, a method to replace bytes in a string that are invalid for it's specified encoding.
+See docs in [MRI ruby source](https://github.com/ruby/ruby/blob/1e8a05c1dfee94db9b6b825097e1d192ad32930a/string.c#L7772)
 
 If you need String#scrub in MRI ruby 2.0, you can use the [string-scrub gem](https://github.com/hsbt/string-scrub), which provides a backport of the C code from MRI ruby 2.1 into MRI 2.0.
 
@@ -50,7 +50,7 @@ This pure ruby implementation is about an order of magnitude slower than stdlib
 
 ## Discrepency with MRI 2.1 String#scrub
 
-If there
+If there is a sequence of multiple contiguous invalid bytes in a string, should the entire block be replaced with only one replacement, or should each invalid byte be replaced with a replacement?
 
 I have not been able to understand the logic MRI 2.1 uses to divide contiguous invalid bytes into
 certain sub-sequences for replacement, as represented in the [test suite](https://github.com/ruby/ruby/blob/3ac0ec4ecdea849143ed64e8935e6675b341e44b/test/ruby/test_m17n.rb#L1505). The test suite may be suggesting that the examples are from unicode documentation, but I wasn't able to find such documentation to see if it shed any light on the matter.
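The question the README raises in this hunk (one replacement per run of invalid bytes, or one per invalid byte) can be made concrete with a small sketch. The two helpers below, `scrub_per_byte` and `scrub_per_run`, are hypothetical names written for illustration only. They are not taken from scrub_rb or MRI, and neither is claimed to reproduce the sub-sequence logic MRI 2.1 actually uses. The sketch assumes a UTF-8 input, where `String#each_char` yields each invalid byte as its own one-byte string.

```ruby
# Policy A (hypothetical): every invalid byte gets its own replacement.
def scrub_per_byte(str, replacement = "\uFFFD")
  str.each_char.map { |ch| ch.valid_encoding? ? ch : replacement }.join
end

# Policy B (hypothetical): a whole run of contiguous invalid bytes
# collapses into a single replacement.
def scrub_per_run(str, replacement = "\uFFFD")
  out = "".dup
  in_bad_run = false
  str.each_char do |ch|
    if ch.valid_encoding?
      out << ch
      in_bad_run = false
    else
      # Emit the replacement only for the first byte of the run.
      out << replacement unless in_bad_run
      in_bad_run = true
    end
  end
  out
end

input = "a\xFF\xFEb"   # 0xFF and 0xFE can never appear in valid UTF-8
scrub_per_byte(input)  # two replacement characters between "a" and "b"
scrub_per_run(input)   # one replacement character between "a" and "b"
```

The MRI behaviour described in the README seems to fall somewhere between these two policies, which is the discrepancy the author could not pin down.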
@@ -63,17 +63,19 @@ For most uses, this discrepency is probably not of consequence.
 
 If anyone can explain whats going on here, I'm very curious! I can't read C very well to try and figure it out from source.
 
-##
+## JRuby may raise
 
 Due to an apparent JRuby bug, some invalid strings cause an internal
-exception from JRuby when trying to scrub_rb.
+exception from JRuby when trying to scrub_rb. This bug should [be fixed in jruby 1.7.11](https://github.com/jruby/jruby/issues/1361#issuecomment-35776377)
+
+In Jruby versions prior to that, The entire original MRI test suite
 does passes against scrub_rb in JRuby -- but [one test original to us, involving
 input tagged 'ascii' encoding](./test/scrub_test.rb#L67), fails raising an ArrayIndexOutOfBoundsException
 from inside of JRuby. I have filed an [issue with JRuby](https://github.com/jruby/jruby/issues/1361).
 
-I believe this problem
+**I believe this problem is likely to be rare** -- so far, the only reproduction case involves an input string tagged 'ascii' encoding, which probably isn't a common use case. But it's unfortunate
 that `scrub_rb` isn't reliable on jruby. I haven't been able to figure out any workaround in ruby to the jruby bug -- you could theoretically provide a Java alternate implementation usable in jruby, but I'm not sure what Java tools are available and how hard it would be to match the scrub api.
 
 ## Contributions
 
-Pull requests or suggestions welcome, especially on performance, on JRuby issue, and on discrepencies with official String#scrub.
+Pull requests or suggestions welcome, especially on performance, on JRuby issue, and on discrepencies with official String#scrub.
data/lib/scrub_rb/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: scrub_rb
 version: !ruby/object:Gem::Version
-version: 0.
+version: 1.0.0
 platform: ruby
 authors:
 - Jonathan Rochkind
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2014-07-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: bundler