scrub_rb 0.2.0 → 1.0.0
- checksums.yaml +4 -4
- data/README.md +10 -8
- data/lib/scrub_rb/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 644acb6ff368a1469fabcce3a3a41b8c3389fbb1
|
4
|
+
data.tar.gz: 4b80ec9d61708227bb65a43bb799d66b48b2e14f
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 9ad061344fd9587e8ed1eb39404e828a05f357a769c9838043870cb7034efaa763f2a827322043e5fa943e17ff6318df42f89050c4de7fc4aeba7d876a823c7c
|
7
|
+
data.tar.gz: 69b9a20817599ed7647ad6007f7c8b123f247cd32484719b88d7f14ab3e29879fc5d7b1d6d1225c209477fa5eb6e8df2f19444debd75578089b02b8acc1c8685
|
data/README.md
CHANGED
@@ -2,7 +2,7 @@
|
|
2
2
|
|
3
3
|
Pure-ruby polyfill of MRI 2.1 String#scrub, for ruby 1.9 and 2.0 on any interpreter
|
4
4
|
|
5
|
-
[![Build Status](https://travis-ci.org/jrochkind/scrub_rb.png?branch=master)](https://travis-ci.org/jrochkind/scrub_rb)
|
5
|
+
[![Build Status](https://travis-ci.org/jrochkind/scrub_rb.png?branch=master)](https://travis-ci.org/jrochkind/scrub_rb) [![Gem Version](https://badge.fury.io/rb/scrub_rb.png)](http://badge.fury.io/rb/scrub_rb)
|
6
6
|
|
7
7
|
## Installation
|
8
8
|
|
@@ -21,8 +21,8 @@ Or install it yourself as:
|
|
21
21
|
|
22
22
|
## What it is
|
23
23
|
|
24
|
-
Ruby 2.1 introduces String#scrub, a method to replace
|
25
|
-
|
24
|
+
Ruby 2.1 introduces String#scrub, a method to replace bytes in a string that are invalid for its specified encoding.
|
25
|
+
See docs in [MRI ruby source](https://github.com/ruby/ruby/blob/1e8a05c1dfee94db9b6b825097e1d192ad32930a/string.c#L7772)
|
26
26
|
|
27
27
|
If you need String#scrub in MRI ruby 2.0, you can use the [string-scrub gem](https://github.com/hsbt/string-scrub), which provides a backport of the C code from MRI ruby 2.1 into MRI 2.0.
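As a brief illustration of the `String#scrub` behavior this README hunk describes, here is a minimal sketch using the built-in method on Ruby 2.1 or later (not scrub_rb itself). The sample string and variable names are made up for the example.

```ruby
# Assumes Ruby 2.1+, where String#scrub is built in.
# The input below is an illustrative string, not taken from the gem's tests.
bad = "abc\xFFdef"            # \xFF is never a valid byte in UTF-8

is_valid    = bad.valid_encoding?   # false: the string contains an invalid byte
default_fix = bad.scrub             # default replacement for UTF-8 is U+FFFD
custom_fix  = bad.scrub("?")        # caller-supplied replacement string

# The block form receives the invalid bytes and returns their replacement.
block_fix = bad.scrub { |bytes| "<#{bytes.unpack('H*').first}>" }
```

The block form is handy when the invalid bytes should stay visible (here as hex) rather than be silently dropped or masked.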
|
28
28
|
|
@@ -50,7 +50,7 @@ This pure ruby implementation is about an order of magnitude slower than stdlib
|
|
50
50
|
|
51
51
|
## Discrepancy with MRI 2.1 String#scrub
|
52
52
|
|
53
|
-
If there
|
53
|
+
If there is a sequence of multiple contiguous invalid bytes in a string, should the entire block be replaced with only one replacement, or should each invalid byte be replaced with a replacement?
|
54
54
|
|
55
55
|
I have not been able to understand the logic MRI 2.1 uses to divide contiguous invalid bytes into
|
56
56
|
certain sub-sequences for replacement, as represented in the [test suite](https://github.com/ruby/ruby/blob/3ac0ec4ecdea849143ed64e8935e6675b341e44b/test/ruby/test_m17n.rb#L1505). The test suite may be suggesting that the examples are from unicode documentation, but I wasn't able to find such documentation to see if it shed any light on the matter.
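The two options raised in the question above (one replacement per invalid byte, or one per contiguous run of invalid bytes) can be sketched in plain Ruby. Both helper names are hypothetical, and neither is claimed to match what MRI 2.1 or scrub_rb actually does. The sketch also assumes UTF-8 input, where `each_char` yields each invalid byte as its own one-byte chunk.

```ruby
# Hypothetical helper: every invalid byte gets its own replacement.
def scrub_each_byte(str, replacement = "?")
  str.each_char.map { |ch| ch.valid_encoding? ? ch : replacement }.join
end

# Hypothetical helper: a contiguous run of invalid bytes gets one replacement.
def scrub_collapsed(str, replacement = "?")
  out = "".dup
  in_bad_run = false
  str.each_char do |ch|
    if ch.valid_encoding?
      out << ch
      in_bad_run = false
    else
      # Emit the replacement only at the start of a run of invalid bytes.
      out << replacement unless in_bad_run
      in_bad_run = true
    end
  end
  out
end

sample = "a\xFF\xFEb"                 # two adjacent invalid bytes
per_byte  = scrub_each_byte(sample)   # "a??b"
collapsed = scrub_collapsed(sample)   # "a?b"
```

MRI's behavior, as the surrounding text notes, appears to fall somewhere between these two extremes for some inputs, which is the discrepancy being discussed.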
|
@@ -63,17 +63,19 @@ For most uses, this discrepancy is probably not of consequence.
|
|
63
63
|
|
64
64
|
If anyone can explain what's going on here, I'm very curious! I can't read C very well to try and figure it out from source.
|
65
65
|
|
66
|
-
##
|
66
|
+
## JRuby may raise
|
67
67
|
|
68
68
|
Due to an apparent JRuby bug, some invalid strings cause an internal
|
69
|
-
exception from JRuby when trying to scrub_rb.
|
69
|
+
exception from JRuby when trying to scrub_rb. This bug should [be fixed in jruby 1.7.11](https://github.com/jruby/jruby/issues/1361#issuecomment-35776377)
|
70
|
+
|
71
|
+
In JRuby versions prior to that, the entire original MRI test suite
|
70
72
|
passes against scrub_rb in JRuby -- but [one test original to us, involving
|
71
73
|
input tagged 'ascii' encoding](./test/scrub_test.rb#L67), fails raising an ArrayIndexOutOfBoundsException
|
72
74
|
from inside of JRuby. I have filed an [issue with JRuby](https://github.com/jruby/jruby/issues/1361).
|
73
75
|
|
74
|
-
I believe this problem
|
76
|
+
**I believe this problem is likely to be rare** -- so far, the only reproduction case involves an input string tagged 'ascii' encoding, which probably isn't a common use case. But it's unfortunate
|
75
77
|
that `scrub_rb` isn't reliable on jruby. I haven't been able to figure out any workaround in ruby to the jruby bug -- you could theoretically provide a Java alternate implementation usable in jruby, but I'm not sure what Java tools are available and how hard it would be to match the scrub api.
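For callers who want to know whether they might be running on an affected interpreter, a version check along these lines is one option. This is only a sketch: the helper name is hypothetical, and it assumes the fix did land in JRuby 1.7.11 as the linked issue comment suggests. It does not work around the bug; it only detects the condition.

```ruby
require "rubygems" # for Gem::Version; already loaded by default on modern Rubies

# Hypothetical helper: true when running on a JRuby older than 1.7.11,
# the release the linked issue comment expects to contain the fix.
# The parameters default to the running interpreter and exist so the
# check can be exercised with other values.
def jruby_scrub_bug_possible?(engine = RUBY_ENGINE,
                              version = (defined?(JRUBY_VERSION) ? JRUBY_VERSION : nil))
  return false unless engine == "jruby" && version
  Gem::Version.new(version) < Gem::Version.new("1.7.11")
end

warn "scrub_rb may raise on some inputs under this JRuby" if jruby_scrub_bug_possible?
```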
|
76
78
|
|
77
79
|
## Contributions
|
78
80
|
|
79
|
-
Pull requests or suggestions welcome, especially on performance, on JRuby issue, and on discrepancies with official String#scrub.
|
81
|
+
Pull requests or suggestions welcome, especially on performance, on JRuby issue, and on discrepancies with official String#scrub.
|
data/lib/scrub_rb/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: scrub_rb
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 1.0.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Jonathan Rochkind
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date:
|
11
|
+
date: 2014-07-16 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|