encoded_string 0.1.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 6edf97129fcbe7bade9085842666d3b3c16eb38c
4
+ data.tar.gz: 2deb441cc137986c87c4fa0602afeb375a8b482f
5
+ SHA512:
6
+ metadata.gz: bb64ed5b32b177a34fdfee13677e54a58a49972c67ad5129fe562c192f30e6a74c18efdf89df78bf6a9a356bc9fed0642e797c8b592a472cc26550937ce4a0a0
7
+ data.tar.gz: bc7503ec2ca60c355d7abe085606d6cade4439a0554638d48069bcc36d5b2f4973082c90817b7020d43143644da30c55c03f6a00c12dcf3b691f23182fdf9cbc
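The digests listed above can be checked against a downloaded copy of the gem. What follows is a minimal sketch, assuming Ruby's standard `digest` library; the helper name `checksum_matches?` and the sample input are illustrative and not part of the gem.

```ruby
require "digest"

# Hypothetical helper: compare the hex digest of some bytes with an expected value.
def checksum_matches?(bytes, expected_hex, algorithm = Digest::SHA512)
  algorithm.hexdigest(bytes) == expected_hex
end

# Well-known SHA1 test vector for the string "abc", used here in place of real gem data.
abc_sha1 = "a9993e364706816aba3e25717850c26c9cd0d89d"
sha1_ok  = checksum_matches?("abc", abc_sha1, Digest::SHA1)
sha1_bad = checksum_matches?("abd", abc_sha1, Digest::SHA1)
```

For the real archive, the bytes would come from reading `metadata.gz` or `data.tar.gz` in binary mode (for example with `File.binread`), and the expected value from the matching entry above.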
@@ -0,0 +1,9 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
@@ -0,0 +1,4 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.2.2
4
+ before_install: gem install bundler -v 1.10.6
@@ -0,0 +1,13 @@
1
+ # Contributor Code of Conduct
2
+
3
+ As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
4
+
5
+ We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
6
+
7
+ Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
8
+
9
+ Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team.
10
+
11
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.
12
+
13
+ This Code of Conduct is adapted from the [Contributor Covenant](http://contributor-covenant.org), version 1.0.0, available at [http://contributor-covenant.org/version/1/0/0/](http://contributor-covenant.org/version/1/0/0/)
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in encoded_string.gemspec
4
+ gemspec
@@ -0,0 +1,23 @@
1
+ The MIT License (MIT)
2
+ ====================
3
+
4
+ * Copyright © 2013 David Chelimsky, Myron Marston, Jon Rowe, Sam Phippen, Xavier Shay, Bradley Schaefer
5
+
6
+ Permission is hereby granted, free of charge, to any person obtaining
7
+ a copy of this software and associated documentation files (the
8
+ "Software"), to deal in the Software without restriction, including
9
+ without limitation the rights to use, copy, modify, merge, publish,
10
+ distribute, sublicense, and/or sell copies of the Software, and to
11
+ permit persons to whom the Software is furnished to do so, subject to
12
+ the following conditions:
13
+
14
+ The above copyright notice and this permission notice shall be
15
+ included in all copies or substantial portions of the Software.
16
+
17
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
18
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
19
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
20
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
21
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
22
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
23
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,99 @@
1
+ # EncodedString
2
+
3
+ EncodedString is a wrapper for a string and a given encoding that handles operations on
4
+ strings that have different encodings, contain invalid bytes, have no known conversion method,
5
+ or are otherwise incompatible, all without raising exceptions.
6
+
7
+ ## Installation
8
+
9
+ Add this line to your application's Gemfile:
10
+
11
+ ```ruby
12
+ gem 'encoded_string'
13
+ ```
14
+
15
+ And then execute:
16
+
17
+ $ bundle
18
+
19
+ Or install it yourself as:
20
+
21
+ $ gem install encoded_string
22
+
23
+ ## Usage
24
+
25
+ ```ruby
26
+ EncodedString.pick_encoding(str1, str2)
27
+
28
+ def build_encoded_string(string, target_encoding = string.encoding)
29
+ EncodedString.new(string, target_encoding)
30
+ end
31
+ str = "abc".encode('ASCII-8BIT')
32
+ str = EncodedString.new(str, str.encoding)
33
+ expect(str.source_encoding.to_s).to eq('ASCII-8BIT')
34
+ str.split("\n")
35
+ str << "123"
36
+ str.to_s
37
+ ```
38
+
39
+ ## About
40
+
41
+ Encoding Exceptions:
42
+
43
+ ```plain
44
+ Raised by Encoding and String methods:
45
+ Encoding::UndefinedConversionError:
46
+ when a transcoding operation fails
47
+ if the String contains characters invalid for the target encoding
48
+ e.g. "\x80".encode('UTF-8','ASCII-8BIT')
49
+ vs "\x80".encode('UTF-8','ASCII-8BIT', undef: :replace, replace: '<undef>')
50
+ # => '<undef>'
51
+ Encoding::CompatibilityError
52
+ when Encoding.compatible?(str1, str2) is nil
53
+ e.g. utf_16le_emoji_string.split("\n")
54
+ e.g. valid_unicode_string.encode(utf8_encoding) << ascii_string
55
+ Encoding::InvalidByteSequenceError:
56
+ when the string being transcoded contains a byte invalid for
57
+ either the source or target encoding
58
+ e.g. "\x80".encode('UTF-8','US-ASCII')
59
+ vs "\x80".encode('UTF-8','US-ASCII', invalid: :replace, replace: '<byte>')
60
+ # => '<byte>'
61
+ ArgumentError
62
+ when operating on a string with invalid bytes
63
+ e.g. "\x80".split("\n")
64
+ TypeError
65
+ when a symbol is passed as an encoding
66
+ Encoding.find(:"UTF-8")
67
+ when calling force_encoding on an object
68
+ that doesn't respond to #to_str
69
+
70
+ Raised by transcoding methods:
71
+ Encoding::ConverterNotFoundError:
72
+ when a named encoding does not correspond with a known converter
73
+ e.g. 'abc'.force_encoding('UTF-8').encode('foo')
74
+ or a converter path cannot be found
75
+ e.g. "\x80".force_encoding('ASCII-8BIT').encode('Emacs-Mule')
76
+
77
+ Raised by byte <-> char conversions
78
+ RangeError: out of char range
79
+ e.g. the UTF-16LE emoji: 128169.chr
80
+ ```
81
+
82
+ See [lib/encoded_string.rb](lib/encoded_string.rb) and
83
+ [spec/encoded_string_spec.rb](spec/encoded_string_spec.rb) for more information.
84
+
85
+ ## Development
86
+
87
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
88
+
89
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
90
+
91
+ ## Contributing
92
+
93
+ Bug reports and pull requests are welcome on GitHub at https://github.com/bf4/encoded_string. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
94
+
95
+
96
+ ## License
97
+
98
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
99
+
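The exception list in the README's About section can be reproduced with plain Ruby. The following is a minimal sketch that uses only core `String` and `Encoding` methods; the `raised_by` helper is hypothetical and exists only to capture the class of the raised error.

```ruby
# Hypothetical helper: run the block and return the class of any error it raises.
def raised_by
  yield
  nil
rescue StandardError => e
  e.class
end

binary = "\x80".b                               # one byte tagged ASCII-8BIT
ascii  = "\x80".dup.force_encoding("US-ASCII")  # same byte, invalid in US-ASCII
utf16  = "\u00e9".encode("UTF-16LE")            # not ASCII-compatible

undefined_conversion = raised_by { binary.encode("UTF-8") }
invalid_sequence     = raised_by { ascii.encode("UTF-8") }
no_converter         = raised_by { "abc".encode("foo") }
incompatible         = raised_by { "caf\u00e9" + utf16 }

# The replace options turn the first two failures into substitutions.
undef_replaced   = binary.encode("UTF-8", undef: :replace, replace: "?")
invalid_replaced = ascii.encode("UTF-8", invalid: :replace, replace: "?")
```

Each variable holds the exception class named in the README for that case, and the last two calls show the `undef:` and `invalid:` options that the library relies on to avoid raising.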
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "encoded_string"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
@@ -0,0 +1,7 @@
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+
5
+ bundle install
6
+
7
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,23 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'encoded_string/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "encoded_string"
8
+ spec.version = EncodedString::VERSION
9
+ spec.authors = ["Benjamin Fleischer"]
10
+ spec.email = ["github@benjaminfleischer.com"]
11
+
12
+ spec.summary = %q{Handle string operations without worrying about raising encoding exceptions.}
13
+ spec.description = %q{Extracted from rspec-support. See https://github.com/rspec/rspec-support/issues/249.}
14
+ spec.homepage = "https://github.com/bf4/encoded_string"
15
+
16
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
17
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
18
+ spec.require_paths = ["lib"]
19
+
20
+ spec.add_development_dependency "bundler", "~> 1.10"
21
+ spec.add_development_dependency "rake", "~> 10.0"
22
+ spec.add_development_dependency "rspec"
23
+ end
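The `spec.files` and `spec.executables` expressions in the gemspec above filter a NUL-separated path listing. A minimal sketch of that filtering follows, assuming a made-up listing in place of real `git ls-files -z` output; the paths, including `exe/demo`, are hypothetical.

```ruby
# Stand-in for `git ls-files -z`: paths joined and terminated by NUL bytes.
paths   = ["README.md", "lib/encoded_string.rb", "spec/encoded_string_spec.rb", "bin/console", "exe/demo"]
listing = paths.join("\x0") + "\x0"

# Same expressions as the gemspec: drop test directories, then pick exe/ entries.
files       = listing.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
executables = files.grep(%r{^exe/}) { |f| File.basename(f) }
```

Because the executables pattern only matches `exe/`, scripts under `bin/` are packaged as ordinary files, which is consistent with the empty `executables: []` entry in the gem metadata.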
@@ -0,0 +1,150 @@
1
+ class EncodedString
2
+ # Reduce allocations by storing constants.
3
+ UTF_8 = "UTF-8"
4
+ US_ASCII = "US-ASCII"
5
+ #
6
+ # In MRI 2.1 'invalid: :replace' changed to also replace an invalid byte sequence
7
+ # see https://github.com/ruby/ruby/blob/v2_1_0/NEWS#L176
8
+ # https://www.ruby-forum.com/topic/6861247
9
+ # https://twitter.com/nalsh/status/553413844685438976
10
+ #
11
+ # For example, given:
12
+ # "\x80".force_encoding("Emacs-Mule").encode(:invalid => :replace).bytes.to_a
13
+ #
14
+ # On MRI 2.1 or above: 63 # '?'
15
+ # else : 128 # "\x80"
16
+ #
17
+ # Ruby's default replacement string is:
18
+ # U+FFFD ("\xEF\xBF\xBD"), for Unicode encoding forms, else
19
+ # ? ("\x3F")
20
+ REPLACE = "?"
21
+ ENCODE_UNCONVERTABLE_BYTES = {
22
+ :invalid => :replace,
23
+ :undef => :replace,
24
+ :replace => REPLACE
25
+ }
26
+ ENCODE_NO_CONVERTER = {
27
+ :invalid => :replace,
28
+ :replace => REPLACE
29
+ }
30
+
31
+ def initialize(string, encoding=nil)
32
+ @encoding = encoding
33
+ @source_encoding = detect_source_encoding(string)
34
+ @string = matching_encoding(string)
35
+ end
36
+ attr_reader :source_encoding
37
+
38
+ delegated_methods = String.instance_methods.map(&:to_s) & %w[eql? lines == encoding empty?]
39
+ delegated_methods.each do |name|
40
+ define_method(name) { |*args, &block| @string.__send__(name, *args, &block) }
41
+ end
42
+
43
+ def <<(string)
44
+ @string << matching_encoding(string)
45
+ end
46
+
47
+ def split(regex_or_string)
48
+ @string.split(matching_encoding(regex_or_string))
49
+ end
50
+
51
+ def to_s
52
+ @string
53
+ end
54
+ alias :to_str :to_s
55
+
56
+ if String.method_defined?(:encoding)
57
+
58
+ private
59
+
60
+ # Encoding Exceptions:
61
+ #
62
+ # Raised by Encoding and String methods:
63
+ # Encoding::UndefinedConversionError:
64
+ # when a transcoding operation fails
65
+ # if the String contains characters invalid for the target encoding
66
+ # e.g. "\x80".encode('UTF-8','ASCII-8BIT')
67
+ # vs "\x80".encode('UTF-8','ASCII-8BIT', undef: :replace, replace: '<undef>')
68
+ # # => '<undef>'
69
+ # Encoding::CompatibilityError
70
+ # when Encoding.compatible?(str1, str2) is nil
71
+ # e.g. utf_16le_emoji_string.split("\n")
72
+ # e.g. valid_unicode_string.encode(utf8_encoding) << ascii_string
73
+ # Encoding::InvalidByteSequenceError:
74
+ # when the string being transcoded contains a byte invalid for
75
+ # either the source or target encoding
76
+ # e.g. "\x80".encode('UTF-8','US-ASCII')
77
+ # vs "\x80".encode('UTF-8','US-ASCII', invalid: :replace, replace: '<byte>')
78
+ # # => '<byte>'
79
+ # ArgumentError
80
+ # when operating on a string with invalid bytes
81
+ # e.g. "\x80".split("\n")
82
+ # TypeError
83
+ # when a symbol is passed as an encoding
84
+ # Encoding.find(:"UTF-8")
85
+ # when calling force_encoding on an object
86
+ # that doesn't respond to #to_str
87
+ #
88
+ # Raised by transcoding methods:
89
+ # Encoding::ConverterNotFoundError:
90
+ # when a named encoding does not correspond with a known converter
91
+ # e.g. 'abc'.force_encoding('UTF-8').encode('foo')
92
+ # or a converter path cannot be found
93
+ # e.g. "\x80".force_encoding('ASCII-8BIT').encode('Emacs-Mule')
94
+ #
95
+ # Raised by byte <-> char conversions
96
+ # RangeError: out of char range
97
+ # e.g. the UTF-16LE emoji: 128169.chr
98
+ def matching_encoding(string)
99
+ string = remove_invalid_bytes(string)
100
+ string.encode(@encoding)
101
+ rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
102
+ string.encode(@encoding, ENCODE_UNCONVERTABLE_BYTES)
103
+ rescue Encoding::ConverterNotFoundError
104
+ string.dup.force_encoding(@encoding).encode(ENCODE_NO_CONVERTER)
105
+ end
106
+
107
+ # Prevents raising ArgumentError
108
+ if String.method_defined?(:scrub)
109
+ # https://github.com/ruby/ruby/blob/eeb05e8c11/doc/NEWS-2.1.0#L120-L123
110
+ # https://github.com/ruby/ruby/blob/v2_1_0/string.c#L8242
111
+ # https://github.com/hsbt/string-scrub
112
+ # https://github.com/rubinius/rubinius/blob/v2.5.2/kernel/common/string.rb#L1913-L1972
113
+ def remove_invalid_bytes(string)
114
+ string.scrub(REPLACE)
115
+ end
116
+ else
117
+ # http://stackoverflow.com/a/8711118/879854
118
+ # Loop over chars in a string replacing chars
119
+ # with invalid encoding, which is a pretty good proxy
120
+ # for the invalid byte sequence that causes an ArgumentError
121
+ def remove_invalid_bytes(string)
122
+ string.chars.map do |char|
123
+ char.valid_encoding? ? char : REPLACE
124
+ end.join
125
+ end
126
+ end
127
+
128
+ def detect_source_encoding(string)
129
+ string.encoding
130
+ end
131
+
132
+ def self.pick_encoding(source_a, source_b)
133
+ Encoding.compatible?(source_a, source_b) || Encoding.default_external
134
+ end
135
+ else
136
+
137
+ def self.pick_encoding(_source_a, _source_b)
138
+ end
139
+
140
+ private
141
+
142
+ def matching_encoding(string)
143
+ string
144
+ end
145
+
146
+ def detect_source_encoding(_string)
147
+ US_ASCII
148
+ end
149
+ end
150
+ end
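The fallback chain in `matching_encoding` above (scrub, try a normal transcode, then retry with replacement options) can be sketched as a standalone method. This is an illustration under assumptions, not the gem's API: `safe_transcode` and `REPLACEMENT` are hypothetical names, `String#scrub` is assumed to be available (Ruby 2.1 or later), and the options are passed as keywords rather than as a positional hash.

```ruby
REPLACEMENT = "?" # hypothetical constant; plays the role of REPLACE in the class above

# Hypothetical helper: transcode `string` to `target` without raising encoding errors.
def safe_transcode(string, target)
  clean = string.scrub(REPLACEMENT) # 1. replace bytes invalid in the source encoding
  clean.encode(target)              # 2. attempt a normal transcode
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  # 3. retry, substituting anything the target encoding cannot represent
  clean.encode(target, invalid: :replace, undef: :replace, replace: REPLACEMENT)
rescue Encoding::ConverterNotFoundError
  # 4. no converter path: relabel the bytes and replace whatever is invalid
  clean.dup.force_encoding(target).encode(invalid: :replace, replace: REPLACEMENT)
end

scrubbed  = safe_transcode("abc\x80", "UTF-8")      # invalid UTF-8 byte is scrubbed
undefined = safe_transcode("\x80".b, "UTF-8")       # binary byte has no UTF-8 mapping
narrowed  = safe_transcode("caf\u00e9", "US-ASCII") # accented char is not in US-ASCII
```

In all three calls the unrepresentable byte or character ends up as `?` instead of an exception, which is the behavior the class aims for in `<<` and `split`.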
@@ -0,0 +1,46 @@
1
+ require 'rspec/matchers'
2
+ # Special matcher for comparing encoded strings so that
3
+ # we don't run any expectation failures through the Differ,
4
+ # which also relies on EncodedString. Instead, confirm the
5
+ # strings have the same bytes.
6
+ RSpec::Matchers.define :be_identical_string do |expected|
7
+
8
+ if String.method_defined?(:encoding)
9
+ match do
10
+ expected_encoding? &&
11
+ actual.bytes.to_a == expected.bytes.to_a
12
+ end
13
+
14
+ failure_message do
15
+ "expected\n#{actual.inspect} (#{actual.encoding.name}) to be identical to\n"\
16
+ "#{expected.inspect} (#{expected.encoding.name})\n"\
17
+ "The exact bytes are printed below for more detail:\n"\
18
+ "#{actual.bytes.to_a}\n"\
19
+ "#{expected.bytes.to_a}\n"\
20
+ end
21
+
22
+ # Depends on chaining :with_same_encoding for it to
23
+ # check for string encoding.
24
+ def expected_encoding?
25
+ if defined?(@expect_same_encoding) && @expect_same_encoding
26
+ actual.encoding == expected.encoding
27
+ else
28
+ true
29
+ end
30
+ end
31
+ else
32
+ match do
33
+ actual.split(//) == expected.split(//)
34
+ end
35
+
36
+ failure_message do
37
+ "expected\n#{actual.inspect} to be identical to\n#{expected.inspect}\n"
38
+ end
39
+ end
40
+
41
+ chain :with_same_encoding do
42
+ @expect_same_encoding ||= true
43
+ end
44
+ end
45
+ RSpec::Matchers.alias_matcher :a_string_identical_to, :be_identical_string
46
+ RSpec::Matchers.alias_matcher :be_diffed_as, :be_identical_string
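The comparison performed by the matcher above can be sketched without RSpec. In this minimal illustration, `identical_string?` is a hypothetical helper that mirrors the `match` block, with the `same_encoding` keyword standing in for the `with_same_encoding` chain.

```ruby
# Hypothetical helper: true when both strings hold the same bytes, and, if
# requested, also carry the same encoding label.
def identical_string?(actual, expected, same_encoding: false)
  return false if same_encoding && actual.encoding != expected.encoding
  actual.bytes == expected.bytes
end

utf8   = "caf\u00e9"
binary = utf8.b # same bytes, labelled ASCII-8BIT

plain_equal      = (utf8 == binary)
same_bytes       = identical_string?(utf8, binary)
same_bytes_strict = identical_string?(utf8, binary, same_encoding: true)
```

Here `plain_equal` is false because `String#==` treats non-ASCII strings with different encodings as unequal, while `same_bytes` is true; the strict variant is false again because the encoding labels differ.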
@@ -0,0 +1,3 @@
1
+ class EncodedString
2
+ VERSION = "0.1.0"
3
+ end
metadata ADDED
@@ -0,0 +1,99 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: encoded_string
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Benjamin Fleischer
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2015-10-26 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.10'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.10'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ description: Extracted from rspec-support. See https://github.com/rspec/rspec-support/issues/249.
56
+ email:
57
+ - github@benjaminfleischer.com
58
+ executables: []
59
+ extensions: []
60
+ extra_rdoc_files: []
61
+ files:
62
+ - ".gitignore"
63
+ - ".rspec"
64
+ - ".travis.yml"
65
+ - CODE_OF_CONDUCT.md
66
+ - Gemfile
67
+ - LICENSE.md
68
+ - README.md
69
+ - Rakefile
70
+ - bin/console
71
+ - bin/setup
72
+ - encoded_string.gemspec
73
+ - lib/encoded_string.rb
74
+ - lib/encoded_string/spec/string_matcher.rb
75
+ - lib/encoded_string/version.rb
76
+ homepage: https://github.com/bf4/encoded_string
77
+ licenses: []
78
+ metadata: {}
79
+ post_install_message:
80
+ rdoc_options: []
81
+ require_paths:
82
+ - lib
83
+ required_ruby_version: !ruby/object:Gem::Requirement
84
+ requirements:
85
+ - - ">="
86
+ - !ruby/object:Gem::Version
87
+ version: '0'
88
+ required_rubygems_version: !ruby/object:Gem::Requirement
89
+ requirements:
90
+ - - ">="
91
+ - !ruby/object:Gem::Version
92
+ version: '0'
93
+ requirements: []
94
+ rubyforge_project:
95
+ rubygems_version: 2.4.6
96
+ signing_key:
97
+ specification_version: 4
98
+ summary: Handle string operations without worrying about raising encoding exceptions.
99
+ test_files: []