RubyGems - github-linguist - Versions diffs - 2.11.0 → 2.11.1 - Mend

github-linguist 2.11.0 → 2.11.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 5f3d2afbe328769fe6d43290ac15e279a1fc839e
-  data.tar.gz: 6150cd186aa13e933e26c01ea48c3b2416ecf965
+  metadata.gz: 20cf20617d8a0934a17836c944818f33e9d1efa2
+  data.tar.gz: c9852f9e4df5fca5cbf768e1109ef343f685ec69
 SHA512:
-  metadata.gz: c0e276415c90ee3dcd43d6eed9f7357e4cd49838429c82200754a3fb7657ba298de3ba553c11ae481a8aa25735140f6181c1c3c78bc91e94d1d48e1aaf4b1db9
-  data.tar.gz: 4af5b1dd14a9b5dc42d49accac0241193f92491d1c99b17b2e918689c0e2d6aa3324cfaeebee80b4e1dc22aef51a9cd03b549449698aa1f34822c968c0a152c2
+  metadata.gz: 52bfdbda263546ec1075a93c3f5132726ff35e47611e295ff6b29169989cbfef9bf387a94578f74ef4ef1349d518b0b837f107e4008b1bb11a854fecc8488073
+  data.tar.gz: fe48a2abc882001aba7c700d8ee44438bbf004f5a7d85394a90aa9330b4b7bbb67b2f4c4f7cd27680dbfd0d124ce5647970b70c520e010a9fbb22dc2e389cd20

data/lib/linguist/blob_helper.rb CHANGED

@@ -241,7 +241,25 @@ module Linguist
     def lines
       @lines ||=
         if viewable? && data
-          data.split(/\r\n|\r|\n/, -1)
+          # `data` is usually encoded as ASCII-8BIT even when the content has
+          # been detected as a different encoding. However, we are not allowed
+          # to change the encoding of `data` because we've made the implicit
+          # guarantee that each entry in `lines` is encoded the same way as
+          # `data`.
+          #
+          # Instead, we re-encode each possible newline sequence as the
+          # detected encoding, then force them back to the encoding of `data`
+          # (usually a binary encoding like ASCII-8BIT). This means that the
+          # byte sequence will match how newlines are likely encoded in the
+          # file, but we don't have to change the encoding of `data` as far as
+          # Ruby is concerned. This allows us to correctly parse out each line
+          # without changing the encoding of `data`, and
+          # also--importantly--without having to duplicate many (potentially
+          # large) strings.
+          encoded_newlines = ["\r\n", "\r", "\n"].
+            map { |nl| nl.encode(encoding).force_encoding(data.encoding) }
+          data.split(Regexp.union(encoded_newlines), -1)
         else
           []
         end
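
The comment in this hunk describes matching newlines at the byte level without re-tagging `data`. A minimal sketch of that idea follows, outside of Linguist itself. The helper name `split_encoded_lines` and the sample UTF-16LE input are hypothetical and chosen only for illustration; the body of the helper mirrors the two added lines of the diff.

```ruby
# Hypothetical helper (not part of Linguist) mirroring the added lines above.
# `data` is a string tagged as binary; `detected_encoding` is the encoding
# its bytes are believed to be in.
def split_encoded_lines(data, detected_encoding)
  # Encode each newline sequence as it would appear in the detected encoding,
  # then relabel those bytes with the encoding of `data` so the pattern and
  # the subject string are compatible.
  encoded_newlines = ["\r\n", "\r", "\n"].map do |nl|
    nl.encode(detected_encoding).force_encoding(data.encoding)
  end
  data.split(Regexp.union(encoded_newlines), -1)
end

# Assumed sample input: UTF-16LE bytes carried in an ASCII-8BIT string.
sample = "one\ntwo\r\nthree".encode("UTF-16LE").force_encoding(Encoding::BINARY)

# The old single-byte pattern cuts two-byte code units in half.
naive = sample.split(/\r\n|\r|\n/, -1)

# The encoded-newline pattern keeps each line intact and still binary-tagged.
lines = split_encoded_lines(sample, "UTF-16LE")
decoded = lines.map { |l| l.dup.force_encoding("UTF-16LE").encode("UTF-8") }
```

Each element of `lines` keeps the encoding of `sample`, which is the guarantee the comment in the diff refers to; decoding to UTF-8 is done here only to make the result readable.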

data/lib/linguist/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Linguist
-  VERSION = "2.11.0"
+  VERSION = "2.11.1"
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: github-linguist
 version: !ruby/object:Gem::Version
-  version: 2.11.0
+  version: 2.11.1
 platform: ruby
 authors:
 - GitHub
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-05-16 00:00:00.000000000 Z
+date: 2014-05-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: charlock_holmes