extlz4 0.2.4.3 → 0.2.5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 9b6fe3483483101e9c211a0f3f11d93531da246ea3cc76122dc68e5de055d869
4
- data.tar.gz: 1b707db6ec7779f516ce3c9b60ad95165d8127bbd07a651d4da5717e7716bdea
3
+ metadata.gz: fc2c22224769c26e8e6c7fe4eded483f7963203ddc87da1b834c4597d3e6e593
4
+ data.tar.gz: 615d87e1be068ecc93a1c262127d1ad28787a5d25ede531fb15cb1dda1c4a533
5
5
  SHA512:
6
- metadata.gz: 26e050bdd6430587345be66cfd75b501cb9b39bf11f586288773d8c429ff2698d242d270aacad051455c356e6078c6720d9c4f7582f2d4e8a28d5ea58d3456cd
7
- data.tar.gz: f43f49a90cd5d144155fd0b46b70fa066eba298a923f4fc2e1317be85be4c50463cbee7d5683d3c6b6738fd317fcd60f18348effebd700b85e70e518d02dc652
6
+ metadata.gz: db7354b91b1776a973eb323680509c30f0c9704a8ee61c00f88fa20bf020128a4bb5cd53e9393d8115fa54e10b137bbd43a570a830d75bbf9d7c5d4ef6e1148d
7
+ data.tar.gz: fc2e568642e6ffd47d8b941dff6edf54fca9a9cf19017e31808c0382c3e481d5714316db915a1cae3e15a2d04833e4bc8fdb5094e38ae122f069826f2d73a92d
@@ -1,3 +1,8 @@
1
+ # extlz4-0.2.4.4 (Sunday, January 14, Heisei 30 [2018])
2
+
3
+ * Updated the lz4 library to [1.8.1](https://github.com/lz4/lz4/releases/tag/v1.8.1)
4
+
5
+
1
6
  # extlz4-0.2.4.3 (Saturday, January 13, Heisei 30 [2018])
2
7
 
3
8
  * Fixed an issue where the buffer area was not allocated in LZ4::Decoder.read
data/README.md CHANGED
@@ -18,14 +18,14 @@ $ dmesg | ruby -r extlz4 -e 'LZ4.encode_file($stdin.binmode, $stdout.binmode)' |
18
18
  * author: dearblue (mailto:dearblue@users.noreply.github.com)
19
19
  * report issue to: <https://github.com/dearblue/ruby-extlz4/issues>
20
20
  * how to install: `gem install extlz4`
21
- * version: 0.2.4.3
21
+ * version: 0.2.5
22
22
  * product quality: technical preview
23
23
  * licensing: BSD-2-Clause License
24
24
  * dependency gems: none
25
25
  * dependency external c libraries: none
26
26
  * bundled external c libraries:
27
- * lz4-1.8 <https://github.com/lz4/lz4/tree/v1.8.0>
28
- under [BSD 2-Clause license](https://github.com/lz4/lz4/tree/v1.8.0/LICENSE)
27
+ * lz4-1.8.1 <https://github.com/lz4/lz4/tree/v1.8.1>
28
+ under [BSD 2-Clause license](https://github.com/lz4/lz4/tree/v1.8.1/LICENSE)
29
29
  by [Yann Collet](https://github.com/Cyan4973)
30
30
 
31
31
 
@@ -8,6 +8,7 @@ make install # this command may require root access
8
8
 
9
9
  LZ4's `Makefile` supports standard [Makefile conventions],
10
10
  including [staged installs], [redirection], or [command redefinition].
11
+ It is compatible with parallel builds (`-j#`).
11
12
 
12
13
  [Makefile conventions]: https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html
13
14
  [staged installs]: https://www.gnu.org/prep/standards/html_node/DESTDIR.html
@@ -1,3 +1,16 @@
1
+ v1.8.1
2
+ perf : faster and stronger ultra modes (levels 10+)
3
+ perf : slightly faster compression and decompression speed
4
+ perf : fix bad degenerative case, reported by @c-morgenstern
5
+ fix : decompression failed when using a combination of extDict + low memory address (#397), reported and fixed by Julian Scheid (@jscheid)
6
+ cli : support for dictionary compression (`-D`), by Felix Handte @felixhandte
7
+ cli : fix : `lz4 -d --rm` preserves timestamp (#441)
8
+ cli : fix : do not modify /dev/null permission as root, by @aliceatlas
9
+ api : `_destSize()` variant supported for all compression levels
10
+ build : `make` and `make test` compatible with `-jX`, reported by @mwgamera
11
+ build : can control LZ4LIB_VISIBILITY macro, by @mikir
12
+ install: fix man page directory (#387), reported by Stuart Cardall (@itoffshore)
13
+
1
14
  v1.8.0
2
15
  cli : fix : do not modify /dev/null permissions, reported by @Maokaman1
3
16
  cli : added GNU separator -- specifying that all following arguments are files
@@ -82,6 +82,7 @@ make install # this command may require root access
82
82
 
83
83
  LZ4's `Makefile` supports standard [Makefile conventions],
84
84
  including [staged installs], [redirection], or [command redefinition].
85
+ It is compatible with parallel builds (`-j#`).
85
86
 
86
87
  [Makefile conventions]: https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html
87
88
  [staged installs]: https://www.gnu.org/prep/standards/html_node/DESTDIR.html
@@ -19,7 +19,6 @@ test:
19
19
  - make cmake && make clean
20
20
  - make -C tests test-lz4
21
21
  - make -C tests test-lz4c
22
- - make -C tests test-fasttest
23
22
  - make -C tests test-frametest
24
23
  - make -C tests test-fullbench
25
24
  - make -C tests test-fuzzer && make clean
@@ -1,44 +1,43 @@
1
1
  LZ4 - Library Files
2
2
  ================================
3
3
 
4
- The directory contains many files, but depending on project's objectives,
4
+ The `/lib` directory contains many files, but depending on project's objectives,
5
5
  not all of them are necessary.
6
6
 
7
7
  #### Minimal LZ4 build
8
8
 
9
9
  The minimum required is **`lz4.c`** and **`lz4.h`**,
10
- which will provide the fast compression and decompression algorithm.
10
+ which provides the fast compression and decompression algorithm.
11
+ They generate and decode data using [LZ4 block format].
11
12
 
12
13
 
13
- #### The High Compression variant of LZ4
14
+ #### High Compression variant
14
15
 
15
- For more compression at the cost of compression speed,
16
- the High Compression variant **lz4hc** is available.
17
- It's necessary to add **`lz4hc.c`** and **`lz4hc.h`**.
18
- The variant still depends on regular `lz4` source files.
19
- In particular, the decompression is still provided by `lz4.c`.
16
+ For more compression ratio at the cost of compression speed,
17
+ the High Compression variant called **lz4hc** is available.
18
+ Add files **`lz4hc.c`**, **`lz4hc.h`** and **`lz4opt.h`**.
19
+ The variant still depends on regular `lib/lz4.*` source files.
20
20
 
21
21
 
22
- #### Compatibility issues
22
+ #### Frame variant, for interoperability
23
23
 
24
- In order to produce files or streams compatible with `lz4` command line utility,
24
+ In order to produce compressed data compatible with `lz4` command line utility,
25
25
  it's necessary to encode lz4-compressed blocks using the [official interoperable frame format].
26
26
  This format is generated and decoded automatically by the **lz4frame** library.
27
- In order to work properly, lz4frame needs lz4 and lz4hc, and also **xxhash**,
28
- which provides error detection.
29
- (_Advanced stuff_ : It's possible to hide xxhash symbols into a local namespace.
30
- This is what `liblz4` does, to avoid symbol duplication
31
- in case a user program would link to several libraries containing xxhash symbols.)
27
+ Its public API is described in `lib/lz4frame.h`.
28
+ In order to work properly, lz4frame needs all other modules present in `/lib`,
29
+ including, lz4 and lz4hc, and also **xxhash**.
30
+ So it's necessary to include all `*.c` and `*.h` files present in `/lib`.
32
31
 
33
32
 
34
- #### Advanced API
33
+ #### Advanced / Experimental API
35
34
 
36
- A more complex `lz4frame_static.h` is also provided.
37
- It contains definitions which are not guaranteed to remain stable within future versions.
38
- It must be used with static linking ***only***.
35
+ A complex API defined in `lz4frame_static.h` contains definitions
36
+ which are not guaranteed to remain stable in future versions.
37
+ As a consequence, it must be used with static linking ***only***.
39
38
 
40
39
 
41
- #### Using MinGW+MSYS to create DLL
40
+ #### Windows : using MinGW+MSYS to create DLL
42
41
 
43
42
  DLL can be created using MinGW+MSYS with the `make liblz4` command.
44
43
  This command creates `dll\liblz4.dll` and the import library `dll\liblz4.lib`.
@@ -51,23 +50,24 @@ file it should be linked with `dll\liblz4.dll`. For example:
51
50
  ```
52
51
  gcc $(CFLAGS) -Iinclude/ test-dll.c -o test-dll dll\liblz4.dll
53
52
  ```
54
- The compiled executable will require LZ4 DLL which is available at `dll\liblz4.dll`.
53
+ The compiled executable will require LZ4 DLL which is available at `dll\liblz4.dll`.
55
54
 
56
55
 
57
- #### Miscellaneous
56
+ #### Miscellaneous
58
57
 
59
58
  Other files present in the directory are not source code. There are :
60
59
 
61
- - LICENSE : contains the BSD license text
62
- - Makefile : script to compile or install lz4 library (static or dynamic)
63
- - liblz4.pc.in : for pkg-config (make install)
64
- - README.md : this file
60
+ - `LICENSE` : contains the BSD license text
61
+ - `Makefile` : `make` script to compile and install lz4 library (static and dynamic)
62
+ - `liblz4.pc.in` : for `pkg-config` (used in `make install`)
63
+ - `README.md` : this file
65
64
 
66
65
  [official interoperable frame format]: ../doc/lz4_Frame_format.md
66
+ [LZ4 block format]: ../doc/lz4_Block_format.md
67
67
 
68
68
 
69
- #### License
69
+ #### License
70
70
 
71
71
  All source material within __lib__ directory are BSD 2-Clause licensed.
72
72
  See [LICENSE](LICENSE) for details.
73
- The license is also repeated at the top of each source file.
73
+ The license is also reminded at the top of each source file.
@@ -85,6 +85,7 @@
85
85
  #endif
86
86
 
87
87
 
88
+
88
89
  /*-************************************
89
90
  * Dependency
90
91
  **************************************/
@@ -101,21 +102,43 @@
101
102
  # pragma warning(disable : 4293) /* disable: C4293: too large shift (32-bits) */
102
103
  #endif /* _MSC_VER */
103
104
 
104
- #ifndef FORCE_INLINE
105
+ #ifndef LZ4_FORCE_INLINE
105
106
  # ifdef _MSC_VER /* Visual Studio */
106
- # define FORCE_INLINE static __forceinline
107
+ # define LZ4_FORCE_INLINE static __forceinline
107
108
  # else
108
109
  # if defined (__cplusplus) || defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L /* C99 */
109
110
  # ifdef __GNUC__
110
- # define FORCE_INLINE static inline __attribute__((always_inline))
111
+ # define LZ4_FORCE_INLINE static inline __attribute__((always_inline))
111
112
  # else
112
- # define FORCE_INLINE static inline
113
+ # define LZ4_FORCE_INLINE static inline
113
114
  # endif
114
115
  # else
115
- # define FORCE_INLINE static
116
+ # define LZ4_FORCE_INLINE static
116
117
  # endif /* __STDC_VERSION__ */
117
118
  # endif /* _MSC_VER */
118
- #endif /* FORCE_INLINE */
119
+ #endif /* LZ4_FORCE_INLINE */
120
+
121
+ /* LZ4_FORCE_O2_GCC_PPC64LE and LZ4_FORCE_O2_INLINE_GCC_PPC64LE
122
+ * Gcc on ppc64le generates an unrolled SIMDized loop for LZ4_wildCopy,
123
+ * together with a simple 8-byte copy loop as a fall-back path.
124
+ * However, this optimization hurts the decompression speed by >30%,
125
+ * because the execution does not go to the optimized loop
126
+ * for typical compressible data, and all of the preamble checks
127
+ * before going to the fall-back path become useless overhead.
128
+ * This optimization happens only with the -O3 flag, and -O2 generates
129
+ * a simple 8-byte copy loop.
130
+ * With gcc on ppc64le, all of the LZ4_decompress_* and LZ4_wildCopy
131
+ * functions are annotated with __attribute__((optimize("O2"))),
132
+ * and also LZ4_wildCopy is forcibly inlined, so that the O2 attribute
133
+ * of LZ4_wildCopy does not affect the compression speed.
134
+ */
135
+ #if defined(__PPC64__) && defined(__LITTLE_ENDIAN__) && defined(__GNUC__)
136
+ # define LZ4_FORCE_O2_GCC_PPC64LE __attribute__((optimize("O2")))
137
+ # define LZ4_FORCE_O2_INLINE_GCC_PPC64LE __attribute__((optimize("O2"))) LZ4_FORCE_INLINE
138
+ #else
139
+ # define LZ4_FORCE_O2_GCC_PPC64LE
140
+ # define LZ4_FORCE_O2_INLINE_GCC_PPC64LE static
141
+ #endif
119
142
 
120
143
  #if (defined(__GNUC__) && (__GNUC__ >= 3)) || (defined(__INTEL_COMPILER) && (__INTEL_COMPILER >= 800)) || defined(__clang__)
121
144
  # define expect(expr,value) (__builtin_expect ((expr),(value)) )
@@ -253,7 +276,8 @@ static void LZ4_copy8(void* dst, const void* src)
253
276
  }
254
277
 
255
278
  /* customized variant of memcpy, which can overwrite up to 8 bytes beyond dstEnd */
256
- static void LZ4_wildCopy(void* dstPtr, const void* srcPtr, void* dstEnd)
279
+ LZ4_FORCE_O2_INLINE_GCC_PPC64LE
280
+ void LZ4_wildCopy(void* dstPtr, const void* srcPtr, void* dstEnd)
257
281
  {
258
282
  BYTE* d = (BYTE*)dstPtr;
259
283
  const BYTE* s = (const BYTE*)srcPtr;
@@ -289,15 +313,24 @@ static const int LZ4_minLength = (MFLIMIT+1);
289
313
  /*-************************************
290
314
  * Error detection
291
315
  **************************************/
292
- #define LZ4_STATIC_ASSERT(c) { enum { LZ4_static_assert = 1/(int)(!!(c)) }; } /* use only *after* variable declarations */
316
+ #if defined(LZ4_DEBUG) && (LZ4_DEBUG>=1)
317
+ # include <assert.h>
318
+ #else
319
+ # ifndef assert
320
+ # define assert(condition) ((void)0)
321
+ # endif
322
+ #endif
323
+
324
+ #define LZ4_STATIC_ASSERT(c) { enum { LZ4_static_assert = 1/(int)(!!(c)) }; } /* use only *after* variable declarations */
293
325
 
294
326
  #if defined(LZ4_DEBUG) && (LZ4_DEBUG>=2)
295
327
  # include <stdio.h>
296
- # define DEBUGLOG(l, ...) { \
297
- if (l<=LZ4_DEBUG) { \
298
- fprintf(stderr, __FILE__ ": "); \
299
- fprintf(stderr, __VA_ARGS__); \
300
- fprintf(stderr, " \n"); \
328
+ static int g_debuglog_enable = 1;
329
+ # define DEBUGLOG(l, ...) { \
330
+ if ((g_debuglog_enable) && (l<=LZ4_DEBUG)) { \
331
+ fprintf(stderr, __FILE__ ": "); \
332
+ fprintf(stderr, __VA_ARGS__); \
333
+ fprintf(stderr, " \n"); \
301
334
  } }
302
335
  #else
303
336
  # define DEBUGLOG(l, ...) {} /* disabled */
@@ -307,7 +340,7 @@ static const int LZ4_minLength = (MFLIMIT+1);
307
340
  /*-************************************
308
341
  * Common functions
309
342
  **************************************/
310
- static unsigned LZ4_NbCommonBytes (register reg_t val)
343
+ static unsigned LZ4_NbCommonBytes (reg_t val)
311
344
  {
312
345
  if (LZ4_isLittleEndian()) {
313
346
  if (sizeof(val)==8) {
@@ -318,7 +351,14 @@ static unsigned LZ4_NbCommonBytes (register reg_t val)
318
351
  # elif (defined(__clang__) || (defined(__GNUC__) && (__GNUC__>=3))) && !defined(LZ4_FORCE_SW_BITCOUNT)
319
352
  return (__builtin_ctzll((U64)val) >> 3);
320
353
  # else
321
- static const int DeBruijnBytePos[64] = { 0, 0, 0, 0, 0, 1, 1, 2, 0, 3, 1, 3, 1, 4, 2, 7, 0, 2, 3, 6, 1, 5, 3, 5, 1, 3, 4, 4, 2, 5, 6, 7, 7, 0, 1, 2, 3, 3, 4, 6, 2, 6, 5, 5, 3, 4, 5, 6, 7, 1, 2, 4, 6, 4, 4, 5, 7, 2, 6, 5, 7, 6, 7, 7 };
354
+ static const int DeBruijnBytePos[64] = { 0, 0, 0, 0, 0, 1, 1, 2,
355
+ 0, 3, 1, 3, 1, 4, 2, 7,
356
+ 0, 2, 3, 6, 1, 5, 3, 5,
357
+ 1, 3, 4, 4, 2, 5, 6, 7,
358
+ 7, 0, 1, 2, 3, 3, 4, 6,
359
+ 2, 6, 5, 5, 3, 4, 5, 6,
360
+ 7, 1, 2, 4, 6, 4, 4, 5,
361
+ 7, 2, 6, 5, 7, 6, 7, 7 };
322
362
  return DeBruijnBytePos[((U64)((val & -(long long)val) * 0x0218A392CDABBD3FULL)) >> 58];
323
363
  # endif
324
364
  } else /* 32 bits */ {
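The software fallback in this hunk turns the lowest set bit of `val` into a byte index with one multiplication and a 64-entry lookup. The sketch below isolates that lookup for a nonzero 64-bit value and pairs it with a byte-by-byte reference. The table and multiplier are copied from the hunk; the function names `sketch_trailing_zero_bytes` and `sketch_trailing_zero_bytes_naive` are hypothetical and are not part of lz4.

```c
#include <stdint.h>

/* Hypothetical helper (not an lz4 function): index of the lowest nonzero
 * byte of val, using the De Bruijn table from the hunk. val must be nonzero. */
unsigned sketch_trailing_zero_bytes(uint64_t val)
{
    static const int DeBruijnBytePos[64] = { 0, 0, 0, 0, 0, 1, 1, 2,
                                             0, 3, 1, 3, 1, 4, 2, 7,
                                             0, 2, 3, 6, 1, 5, 3, 5,
                                             1, 3, 4, 4, 2, 5, 6, 7,
                                             7, 0, 1, 2, 3, 3, 4, 6,
                                             2, 6, 5, 5, 3, 4, 5, 6,
                                             7, 1, 2, 4, 6, 4, 4, 5,
                                             7, 2, 6, 5, 7, 6, 7, 7 };
    uint64_t const lowest = val & (~val + 1);  /* keep only the lowest set bit */
    /* The multiplication shifts the constant left by the bit position;
     * the top 6 bits of the product then select the table entry. */
    return (unsigned)DeBruijnBytePos[(lowest * 0x0218A392CDABBD3FULL) >> 58];
}

/* Reference version: shift out one zero byte at a time. val must be nonzero. */
unsigned sketch_trailing_zero_bytes_naive(uint64_t val)
{
    unsigned n = 0;
    while ((val & 0xFF) == 0) { val >>= 8; n++; }
    return n;
}
```

On a little-endian load, the XOR of two words has its first differing byte at this index, which is why `LZ4_count` can use the result directly as a byte count.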
@@ -329,12 +369,15 @@ static unsigned LZ4_NbCommonBytes (register reg_t val)
329
369
  # elif (defined(__clang__) || (defined(__GNUC__) && (__GNUC__>=3))) && !defined(LZ4_FORCE_SW_BITCOUNT)
330
370
  return (__builtin_ctz((U32)val) >> 3);
331
371
  # else
332
- static const int DeBruijnBytePos[32] = { 0, 0, 3, 0, 3, 1, 3, 0, 3, 2, 2, 1, 3, 2, 0, 1, 3, 3, 1, 2, 2, 2, 2, 0, 3, 1, 2, 0, 1, 0, 1, 1 };
372
+ static const int DeBruijnBytePos[32] = { 0, 0, 3, 0, 3, 1, 3, 0,
373
+ 3, 2, 2, 1, 3, 2, 0, 1,
374
+ 3, 3, 1, 2, 2, 2, 2, 0,
375
+ 3, 1, 2, 0, 1, 0, 1, 1 };
333
376
  return DeBruijnBytePos[((U32)((val & -(S32)val) * 0x077CB531U)) >> 27];
334
377
  # endif
335
378
  }
336
379
  } else /* Big Endian CPU */ {
337
- if (sizeof(val)==8) {
380
+ if (sizeof(val)==8) { /* 64-bits */
338
381
  # if defined(_MSC_VER) && defined(_WIN64) && !defined(LZ4_FORCE_SW_BITCOUNT)
339
382
  unsigned long r = 0;
340
383
  _BitScanReverse64( &r, val );
@@ -342,8 +385,11 @@ static unsigned LZ4_NbCommonBytes (register reg_t val)
342
385
  # elif (defined(__clang__) || (defined(__GNUC__) && (__GNUC__>=3))) && !defined(LZ4_FORCE_SW_BITCOUNT)
343
386
  return (__builtin_clzll((U64)val) >> 3);
344
387
  # else
388
+ static const U32 by32 = sizeof(val)*4; /* 32 on 64 bits (goal), 16 on 32 bits.
389
+ Just to avoid some static analyzer complaining about shift by 32 on 32-bits target.
390
+ Note that this code path is never triggered in 32-bits mode. */
345
391
  unsigned r;
346
- if (!(val>>32)) { r=4; } else { r=0; val>>=32; }
392
+ if (!(val>>by32)) { r=4; } else { r=0; val>>=by32; }
347
393
  if (!(val>>16)) { r+=2; val>>=8; } else { val>>=24; }
348
394
  r += (!val);
349
395
  return r;
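On a big-endian CPU the first byte in memory is the most significant one, so the fallback counts leading zero bytes instead of trailing ones. The sketch below reproduces the branch sequence from this hunk on a fixed-width `uint64_t`, which means the literal shift by 32 is always valid and the `by32` workaround is not needed here. The name `sketch_leading_zero_bytes` is hypothetical, and the input is assumed to be nonzero.

```c
#include <stdint.h>

/* Hypothetical helper (not an lz4 function): number of leading zero bytes
 * in a nonzero 64-bit value, following the branch sequence of the hunk. */
unsigned sketch_leading_zero_bytes(uint64_t val)
{
    unsigned r;
    /* Upper half empty: skip 4 bytes. Otherwise continue with the upper half. */
    if (!(val >> 32)) { r = 4; } else { r = 0; val >>= 32; }
    /* Upper 16 bits of the remaining 32 empty: skip 2 more bytes and keep
     * bits 15..8. Otherwise keep only the top byte. */
    if (!(val >> 16)) { r += 2; val >>= 8; } else { val >>= 24; }
    /* One more zero byte if the byte that was kept is itself zero. */
    r += (!val);
    return r;
}
```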
@@ -366,11 +412,20 @@ static unsigned LZ4_NbCommonBytes (register reg_t val)
366
412
  }
367
413
 
368
414
  #define STEPSIZE sizeof(reg_t)
369
- static unsigned LZ4_count(const BYTE* pIn, const BYTE* pMatch, const BYTE* pInLimit)
415
+ LZ4_FORCE_INLINE
416
+ unsigned LZ4_count(const BYTE* pIn, const BYTE* pMatch, const BYTE* pInLimit)
370
417
  {
371
418
  const BYTE* const pStart = pIn;
372
419
 
373
- while (likely(pIn<pInLimit-(STEPSIZE-1))) {
420
+ if (likely(pIn < pInLimit-(STEPSIZE-1))) {
421
+ reg_t const diff = LZ4_read_ARCH(pMatch) ^ LZ4_read_ARCH(pIn);
422
+ if (!diff) {
423
+ pIn+=STEPSIZE; pMatch+=STEPSIZE;
424
+ } else {
425
+ return LZ4_NbCommonBytes(diff);
426
+ } }
427
+
428
+ while (likely(pIn < pInLimit-(STEPSIZE-1))) {
374
429
  reg_t const diff = LZ4_read_ARCH(pMatch) ^ LZ4_read_ARCH(pIn);
375
430
  if (!diff) { pIn+=STEPSIZE; pMatch+=STEPSIZE; continue; }
376
431
  pIn += LZ4_NbCommonBytes(diff);
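The counting strategy in `LZ4_count` compares one machine word at a time and locates the first mismatch from the XOR of the two words. Here is a minimal, portable sketch of that idea, assuming 8-byte steps. It assembles each word in little-endian order by hand, so it does not depend on host endianness, at the cost of the speed the real code gets from `LZ4_read_ARCH`. All `sketch_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 8 bytes so that p[0] becomes the least significant byte. */
uint64_t sketch_read_le64(const unsigned char* p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: number of low-order zero bytes in a nonzero XOR result. */
unsigned sketch_common_bytes(uint64_t diff)
{
    unsigned n = 0;
    while ((diff & 0xFF) == 0) { diff >>= 8; n++; }
    return n;
}

/* Returns how many bytes at `in` match the bytes at `match`, never reading
 * `in` at or beyond `in_limit`. */
size_t sketch_count(const unsigned char* in, const unsigned char* match,
                    const unsigned char* in_limit)
{
    const unsigned char* const start = in;

    /* Word-sized steps while at least 8 input bytes remain. */
    while ((size_t)(in_limit - in) >= 8) {
        uint64_t const diff = sketch_read_le64(match) ^ sketch_read_le64(in);
        if (diff == 0) { in += 8; match += 8; continue; }
        return (size_t)(in - start) + sketch_common_bytes(diff);
    }

    /* Tail: compare the last few bytes one by one. */
    while (in < in_limit && *in == *match) { in++; match++; }
    return (size_t)(in - start);
}
```

The hunk itself goes one step further by peeling the first iteration out of the loop, so that a mismatch in the first word returns immediately.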
@@ -436,7 +491,7 @@ static U32 LZ4_hash5(U64 sequence, tableType_t const tableType)
436
491
  return (U32)(((sequence >> 24) * prime8bytes) >> (64 - hashLog));
437
492
  }
438
493
 
439
- FORCE_INLINE U32 LZ4_hashPosition(const void* const p, tableType_t const tableType)
494
+ LZ4_FORCE_INLINE U32 LZ4_hashPosition(const void* const p, tableType_t const tableType)
440
495
  {
441
496
  if ((sizeof(reg_t)==8) && (tableType != byU16)) return LZ4_hash5(LZ4_read_ARCH(p), tableType);
442
497
  return LZ4_hash4(LZ4_read32(p), tableType);
@@ -452,7 +507,7 @@ static void LZ4_putPositionOnHash(const BYTE* p, U32 h, void* tableBase, tableTy
452
507
  }
453
508
  }
454
509
 
455
- FORCE_INLINE void LZ4_putPosition(const BYTE* p, void* tableBase, tableType_t tableType, const BYTE* srcBase)
510
+ LZ4_FORCE_INLINE void LZ4_putPosition(const BYTE* p, void* tableBase, tableType_t tableType, const BYTE* srcBase)
456
511
  {
457
512
  U32 const h = LZ4_hashPosition(p, tableType);
458
513
  LZ4_putPositionOnHash(p, h, tableBase, tableType, srcBase);
@@ -465,7 +520,7 @@ static const BYTE* LZ4_getPositionOnHash(U32 h, void* tableBase, tableType_t tab
465
520
  { const U16* const hashTable = (U16*) tableBase; return hashTable[h] + srcBase; } /* default, to ensure a return */
466
521
  }
467
522
 
468
- FORCE_INLINE const BYTE* LZ4_getPosition(const BYTE* p, void* tableBase, tableType_t tableType, const BYTE* srcBase)
523
+ LZ4_FORCE_INLINE const BYTE* LZ4_getPosition(const BYTE* p, void* tableBase, tableType_t tableType, const BYTE* srcBase)
469
524
  {
470
525
  U32 const h = LZ4_hashPosition(p, tableType);
471
526
  return LZ4_getPositionOnHash(h, tableBase, tableType, srcBase);
@@ -474,7 +529,7 @@ FORCE_INLINE const BYTE* LZ4_getPosition(const BYTE* p, void* tableBase, tableTy
474
529
 
475
530
  /** LZ4_compress_generic() :
476
531
  inlined, to ensure branches are decided at compilation time */
477
- FORCE_INLINE int LZ4_compress_generic(
532
+ LZ4_FORCE_INLINE int LZ4_compress_generic(
478
533
  LZ4_stream_t_internal* const cctx,
479
534
  const char* const source,
480
535
  char* const dest,
@@ -616,7 +671,11 @@ _next_match:
616
671
  *token += ML_MASK;
617
672
  matchCode -= ML_MASK;
618
673
  LZ4_write32(op, 0xFFFFFFFF);
619
- while (matchCode >= 4*255) op+=4, LZ4_write32(op, 0xFFFFFFFF), matchCode -= 4*255;
674
+ while (matchCode >= 4*255) {
675
+ op+=4;
676
+ LZ4_write32(op, 0xFFFFFFFF);
677
+ matchCode -= 4*255;
678
+ }
620
679
  op += matchCode / 255;
621
680
  *op++ = (BYTE)(matchCode % 255);
622
681
  } else
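The loop rewritten in this hunk emits the part of a match length that does not fit in the token's 4-bit field. In the LZ4 block format, that remainder is stored as a run of bytes equal to 255, followed by one final byte below 255. The sketch below shows the rule one byte at a time, whereas the real code writes four `0xFF` bytes per `LZ4_write32` call. The function names are hypothetical, and the caller is assumed to provide a large enough buffer.

```c
#include <stddef.h>

/* Hypothetical helper: write the extra length bytes for `remaining`, the
 * value left after the token field has been saturated. Returns the number
 * of bytes written. */
size_t sketch_write_extra_length(unsigned char* out, unsigned remaining)
{
    size_t n = 0;
    while (remaining >= 255) { out[n++] = 255; remaining -= 255; }
    out[n++] = (unsigned char)remaining;   /* terminating byte, always < 255 */
    return n;
}

/* Hypothetical helper: read the extra length bytes back. Stores the number
 * of bytes consumed in *consumed and returns the decoded remainder. */
unsigned sketch_read_extra_length(const unsigned char* in, size_t* consumed)
{
    unsigned total = 0;
    size_t n = 0;
    unsigned char b;
    do { b = in[n++]; total += b; } while (b == 255);
    *consumed = n;
    return total;
}
```

A remainder that is an exact multiple of 255 still ends with a zero byte, which is what lets the decoder know the run is over.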
@@ -940,6 +999,7 @@ LZ4_stream_t* LZ4_createStream(void)
940
999
 
941
1000
  void LZ4_resetStream (LZ4_stream_t* LZ4_stream)
942
1001
  {
1002
+ DEBUGLOG(4, "LZ4_resetStream");
943
1003
  MEM_INIT(LZ4_stream, 0, sizeof(LZ4_stream_t));
944
1004
  }
945
1005
 
@@ -1100,47 +1160,46 @@ int LZ4_saveDict (LZ4_stream_t* LZ4_dict, char* safeBuffer, int dictSize)
1100
1160
  * Decompression functions
1101
1161
  *******************************/
1102
1162
  /*! LZ4_decompress_generic() :
1103
- * This generic decompression function cover all use cases.
1104
- * It shall be instantiated several times, using different sets of directives
1105
- * Note that it is important this generic function is really inlined,
1163
+ * This generic decompression function covers all use cases.
1164
+ * It shall be instantiated several times, using different sets of directives.
1165
+ * Note that it is important for performance that this function really get inlined,
1106
1166
  * in order to remove useless branches during compilation optimization.
1107
1167
  */
1108
- FORCE_INLINE int LZ4_decompress_generic(
1109
- const char* const source,
1110
- char* const dest,
1111
- int inputSize,
1112
- int outputSize, /* If endOnInput==endOnInputSize, this value is the max size of Output Buffer. */
1168
+ LZ4_FORCE_O2_GCC_PPC64LE
1169
+ LZ4_FORCE_INLINE int LZ4_decompress_generic(
1170
+ const char* const src,
1171
+ char* const dst,
1172
+ int srcSize,
1173
+ int outputSize, /* If endOnInput==endOnInputSize, this value is `dstCapacity` */
1113
1174
 
1114
1175
  int endOnInput, /* endOnOutputSize, endOnInputSize */
1115
1176
  int partialDecoding, /* full, partial */
1116
1177
  int targetOutputSize, /* only used if partialDecoding==partial */
1117
1178
  int dict, /* noDict, withPrefix64k, usingExtDict */
1118
- const BYTE* const lowPrefix, /* == dest when no prefix */
1179
+ const BYTE* const lowPrefix, /* always <= dst, == dst when no prefix */
1119
1180
  const BYTE* const dictStart, /* only if dict==usingExtDict */
1120
1181
  const size_t dictSize /* note : = 0 if noDict */
1121
1182
  )
1122
1183
  {
1123
- /* Local Variables */
1124
- const BYTE* ip = (const BYTE*) source;
1125
- const BYTE* const iend = ip + inputSize;
1184
+ const BYTE* ip = (const BYTE*) src;
1185
+ const BYTE* const iend = ip + srcSize;
1126
1186
 
1127
- BYTE* op = (BYTE*) dest;
1187
+ BYTE* op = (BYTE*) dst;
1128
1188
  BYTE* const oend = op + outputSize;
1129
1189
  BYTE* cpy;
1130
1190
  BYTE* oexit = op + targetOutputSize;
1131
- const BYTE* const lowLimit = lowPrefix - dictSize;
1132
1191
 
1133
1192
  const BYTE* const dictEnd = (const BYTE*)dictStart + dictSize;
1134
- const unsigned dec32table[] = {0, 1, 2, 1, 4, 4, 4, 4};
1135
- const int dec64table[] = {0, 0, 0, -1, 0, 1, 2, 3};
1193
+ const unsigned inc32table[8] = {0, 1, 2, 1, 0, 4, 4, 4};
1194
+ const int dec64table[8] = {0, 0, 0, -1, -4, 1, 2, 3};
1136
1195
 
1137
1196
  const int safeDecode = (endOnInput==endOnInputSize);
1138
1197
  const int checkOffset = ((safeDecode) && (dictSize < (int)(64 KB)));
1139
1198
 
1140
1199
 
1141
1200
  /* Special cases */
1142
- if ((partialDecoding) && (oexit > oend-MFLIMIT)) oexit = oend-MFLIMIT; /* targetOutputSize too high => decode everything */
1143
- if ((endOnInput) && (unlikely(outputSize==0))) return ((inputSize==1) && (*ip==0)) ? 0 : -1; /* Empty output buffer */
1201
+ if ((partialDecoding) && (oexit > oend-MFLIMIT)) oexit = oend-MFLIMIT; /* targetOutputSize too high => just decode everything */
1202
+ if ((endOnInput) && (unlikely(outputSize==0))) return ((srcSize==1) && (*ip==0)) ? 0 : -1; /* Empty output buffer */
1144
1203
  if ((!endOnInput) && (unlikely(outputSize==0))) return (*ip==0?1:-1);
1145
1204
 
1146
1205
  /* Main Loop : decode sequences */
@@ -1149,8 +1208,27 @@ FORCE_INLINE int LZ4_decompress_generic(
1149
1208
  const BYTE* match;
1150
1209
  size_t offset;
1151
1210
 
1152
- /* get literal length */
1153
1211
  unsigned const token = *ip++;
1212
+
1213
+ /* shortcut for common case :
1214
+ * in most circumstances, we expect to decode small matches (<= 18 bytes) separated by few literals (<= 14 bytes).
1215
+ * this shortcut was tested on x86 and x64, where it improves decoding speed.
1216
+ * it has not yet been benchmarked on ARM, Power, mips, etc. */
1217
+ if (((ip + 14 /*maxLL*/ + 2 /*offset*/ <= iend)
1218
+ & (op + 14 /*maxLL*/ + 18 /*maxML*/ <= oend))
1219
+ & ((token < (15<<ML_BITS)) & ((token & ML_MASK) != 15)) ) {
1220
+ size_t const ll = token >> ML_BITS;
1221
+ size_t const off = LZ4_readLE16(ip+ll);
1222
+ const BYTE* const matchPtr = op + ll - off; /* pointer underflow risk ? */
1223
+ if ((off >= 18) /* do not deal with overlapping matches */ & (matchPtr >= lowPrefix)) {
1224
+ size_t const ml = (token & ML_MASK) + MINMATCH;
1225
+ memcpy(op, ip, 16); op += ll; ip += ll + 2 /*offset*/;
1226
+ memcpy(op, matchPtr, 18); op += ml;
1227
+ continue;
1228
+ }
1229
+ }
1230
+
1231
+ /* decode literal length */
1154
1232
  if ((length=(token>>ML_BITS)) == RUN_MASK) {
1155
1233
  unsigned s;
1156
1234
  do {
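The shortcut added in this hunk applies only when both 4-bit fields of the token are below their saturation value of 15, which is where the limits of 14 literals and 18 match bytes in the comment come from. The sketch below models just that token test, assuming the usual LZ4 constants `ML_BITS == 4` and `MINMATCH == 4`. The type and function names are hypothetical, and the buffer-margin, offset, and `lowPrefix` checks of the real code are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical view of one LZ4 sequence token. */
typedef struct {
    size_t literal_length;  /* high 4 bits of the token */
    size_t match_length;    /* low 4 bits of the token plus the 4-byte minimum match */
    int    is_short;        /* 1 when neither field is saturated at 15 */
} sketch_token_t;

sketch_token_t sketch_parse_token(unsigned char token)
{
    sketch_token_t t;
    t.literal_length = (size_t)(token >> 4);
    t.match_length   = (size_t)(token & 15) + 4;
    /* A saturated field means extra length bytes follow, so the
     * fixed-size fast path cannot be used. */
    t.is_short       = (t.literal_length != 15) && ((token & 15) != 15);
    return t;
}
```

When `is_short` is set, both lengths are known from the token alone, so the decoder can issue fixed-size copies of 16 and 18 bytes and advance by the real lengths afterwards.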
@@ -1184,7 +1262,7 @@ FORCE_INLINE int LZ4_decompress_generic(
1184
1262
  /* get offset */
1185
1263
  offset = LZ4_readLE16(ip); ip+=2;
1186
1264
  match = op - offset;
1187
- if ((checkOffset) && (unlikely(match < lowLimit))) goto _output_error; /* Error : offset outside buffers */
1265
+ if ((checkOffset) && (unlikely(match + dictSize < lowPrefix))) goto _output_error; /* Error : offset outside buffers */
1188
1266
  LZ4_write32(op, (U32)offset); /* costs ~1%; silence an msan warning when offset==0 */
1189
1267
 
1190
1268
  /* get matchlength */
@@ -1228,14 +1306,13 @@ FORCE_INLINE int LZ4_decompress_generic(
1228
1306
  /* copy match within block */
1229
1307
  cpy = op + length;
1230
1308
  if (unlikely(offset<8)) {
1231
- const int dec64 = dec64table[offset];
1232
1309
  op[0] = match[0];
1233
1310
  op[1] = match[1];
1234
1311
  op[2] = match[2];
1235
1312
  op[3] = match[3];
1236
- match += dec32table[offset];
1313
+ match += inc32table[offset];
1237
1314
  memcpy(op+4, match, 4);
1238
- match -= dec64;
1315
+ match -= dec64table[offset];
1239
1316
  } else { LZ4_copy8(op, match); match+=8; }
1240
1317
  op += 8;
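When `offset` is below 8, source and destination overlap within a single 8-byte copy, so the decoder writes the first four bytes one by one and then uses the two tables to realign `match`. After the first 8 bytes, the distance between `op` and `match` is a multiple of `offset` and at least 8. The sketch below applies the table values from this diff to a plain buffer and then continues in 8-byte chunks. The function name is hypothetical, `offset` is assumed to be between 1 and 7, and the caller is assumed to leave 7 bytes of slack after the destination, as the wild copy in lz4 also requires.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `length` bytes from (op - offset) to op, for
 * 1 <= offset <= 7. May write up to 7 bytes past op + length. */
void sketch_copy_short_offset(unsigned char* op, size_t offset, size_t length)
{
    static const unsigned inc32table[8] = {0, 1, 2, 1, 0, 4, 4, 4};
    static const int      dec64table[8] = {0, 0, 0, -1, -4, 1, 2, 3};
    unsigned char* const end = op + length;
    const unsigned char* match = op - offset;

    /* Byte-wise copies may read bytes written a moment ago, which is what
     * repeats the short pattern. */
    op[0] = match[0];
    op[1] = match[1];
    op[2] = match[2];
    op[3] = match[3];
    match += inc32table[offset];   /* source for bytes 4..7, same phase of the pattern */
    memcpy(op + 4, match, 4);
    match -= dec64table[offset];   /* realign so that op + 8 - match >= 8 */
    op += 8;

    /* From here on, source and destination no longer overlap within 8 bytes. */
    while (op < end) {
        memcpy(op, match, 8);
        op += 8;
        match += 8;
    }
}
```

A quick check is to compare the result against the naive loop `dst[i] = dst[i - offset]` for every offset from 1 to 7.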
1241
1318
 
@@ -1252,31 +1329,34 @@ FORCE_INLINE int LZ4_decompress_generic(
1252
1329
  LZ4_copy8(op, match);
1253
1330
  if (length>16) LZ4_wildCopy(op+8, match+8, cpy);
1254
1331
  }
1255
- op=cpy; /* correction */
1332
+ op = cpy; /* correction */
1256
1333
  }
1257
1334
 
1258
1335
  /* end of decoding */
1259
1336
  if (endOnInput)
1260
- return (int) (((char*)op)-dest); /* Nb of output bytes decoded */
1337
+ return (int) (((char*)op)-dst); /* Nb of output bytes decoded */
1261
1338
  else
1262
- return (int) (((const char*)ip)-source); /* Nb of input bytes read */
1339
+ return (int) (((const char*)ip)-src); /* Nb of input bytes read */
1263
1340
 
1264
1341
  /* Overflow error detected */
1265
1342
  _output_error:
1266
- return (int) (-(((const char*)ip)-source))-1;
1343
+ return (int) (-(((const char*)ip)-src))-1;
1267
1344
  }
1268
1345
 
1269
1346
 
1347
+ LZ4_FORCE_O2_GCC_PPC64LE
1270
1348
  int LZ4_decompress_safe(const char* source, char* dest, int compressedSize, int maxDecompressedSize)
1271
1349
  {
1272
1350
  return LZ4_decompress_generic(source, dest, compressedSize, maxDecompressedSize, endOnInputSize, full, 0, noDict, (BYTE*)dest, NULL, 0);
1273
1351
  }
1274
1352
 
1353
+ LZ4_FORCE_O2_GCC_PPC64LE
1275
1354
  int LZ4_decompress_safe_partial(const char* source, char* dest, int compressedSize, int targetOutputSize, int maxDecompressedSize)
1276
1355
  {
1277
1356
  return LZ4_decompress_generic(source, dest, compressedSize, maxDecompressedSize, endOnInputSize, partial, targetOutputSize, noDict, (BYTE*)dest, NULL, 0);
1278
1357
  }
1279
1358
 
1359
+ LZ4_FORCE_O2_GCC_PPC64LE
1280
1360
  int LZ4_decompress_fast(const char* source, char* dest, int originalSize)
1281
1361
  {
1282
1362
  return LZ4_decompress_generic(source, dest, 0, originalSize, endOnOutputSize, full, 0, withPrefix64k, (BYTE*)(dest - 64 KB), NULL, 64 KB);
@@ -1322,6 +1402,7 @@ int LZ4_setStreamDecode (LZ4_streamDecode_t* LZ4_streamDecode, const char* dicti
1322
1402
  If it's not possible, save the relevant part of decoded data into a safe buffer,
1323
1403
  and indicate where it stands using LZ4_setStreamDecode()
1324
1404
  */
1405
+ LZ4_FORCE_O2_GCC_PPC64LE
1325
1406
  int LZ4_decompress_safe_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* source, char* dest, int compressedSize, int maxOutputSize)
1326
1407
  {
1327
1408
  LZ4_streamDecode_t_internal* lz4sd = &LZ4_streamDecode->internal_donotuse;
@@ -1348,6 +1429,7 @@ int LZ4_decompress_safe_continue (LZ4_streamDecode_t* LZ4_streamDecode, const ch
1348
1429
  return result;
1349
1430
  }
1350
1431
 
1432
+ LZ4_FORCE_O2_GCC_PPC64LE
1351
1433
  int LZ4_decompress_fast_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* source, char* dest, int originalSize)
1352
1434
  {
1353
1435
  LZ4_streamDecode_t_internal* lz4sd = &LZ4_streamDecode->internal_donotuse;
@@ -1382,7 +1464,8 @@ Advanced decoding functions :
1382
1464
  the dictionary must be explicitly provided within parameters
1383
1465
  */
1384
1466
 
1385
- FORCE_INLINE int LZ4_decompress_usingDict_generic(const char* source, char* dest, int compressedSize, int maxOutputSize, int safe, const char* dictStart, int dictSize)
1467
+ LZ4_FORCE_O2_GCC_PPC64LE
1468
+ LZ4_FORCE_INLINE int LZ4_decompress_usingDict_generic(const char* source, char* dest, int compressedSize, int maxOutputSize, int safe, const char* dictStart, int dictSize)
1386
1469
  {
1387
1470
  if (dictSize==0)
1388
1471
  return LZ4_decompress_generic(source, dest, compressedSize, maxOutputSize, safe, full, 0, noDict, (BYTE*)dest, NULL, 0);
@@ -1394,17 +1477,20 @@ FORCE_INLINE int LZ4_decompress_usingDict_generic(const char* source, char* dest
1394
1477
  return LZ4_decompress_generic(source, dest, compressedSize, maxOutputSize, safe, full, 0, usingExtDict, (BYTE*)dest, (const BYTE*)dictStart, dictSize);
1395
1478
  }
1396
1479
 
1480
+ LZ4_FORCE_O2_GCC_PPC64LE
1397
1481
  int LZ4_decompress_safe_usingDict(const char* source, char* dest, int compressedSize, int maxOutputSize, const char* dictStart, int dictSize)
1398
1482
  {
1399
1483
  return LZ4_decompress_usingDict_generic(source, dest, compressedSize, maxOutputSize, 1, dictStart, dictSize);
1400
1484
  }
1401
1485
 
1486
+ LZ4_FORCE_O2_GCC_PPC64LE
1402
1487
  int LZ4_decompress_fast_usingDict(const char* source, char* dest, int originalSize, const char* dictStart, int dictSize)
1403
1488
  {
1404
1489
  return LZ4_decompress_usingDict_generic(source, dest, 0, originalSize, 0, dictStart, dictSize);
1405
1490
  }
1406
1491
 
1407
1492
  /* debug function */
1493
+ LZ4_FORCE_O2_GCC_PPC64LE
1408
1494
  int LZ4_decompress_safe_forceExtDict(const char* source, char* dest, int compressedSize, int maxOutputSize, const char* dictStart, int dictSize)
1409
1495
  {
1410
1496
  return LZ4_decompress_generic(source, dest, compressedSize, maxOutputSize, endOnInputSize, full, 0, usingExtDict, (BYTE*)dest, (const BYTE*)dictStart, dictSize);