extlz4 0.2.4.3 → 0.2.5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 9b6fe3483483101e9c211a0f3f11d93531da246ea3cc76122dc68e5de055d869
-   data.tar.gz: 1b707db6ec7779f516ce3c9b60ad95165d8127bbd07a651d4da5717e7716bdea
+   metadata.gz: fc2c22224769c26e8e6c7fe4eded483f7963203ddc87da1b834c4597d3e6e593
+   data.tar.gz: 615d87e1be068ecc93a1c262127d1ad28787a5d25ede531fb15cb1dda1c4a533
  SHA512:
-   metadata.gz: 26e050bdd6430587345be66cfd75b501cb9b39bf11f586288773d8c429ff2698d242d270aacad051455c356e6078c6720d9c4f7582f2d4e8a28d5ea58d3456cd
-   data.tar.gz: f43f49a90cd5d144155fd0b46b70fa066eba298a923f4fc2e1317be85be4c50463cbee7d5683d3c6b6738fd317fcd60f18348effebd700b85e70e518d02dc652
+   metadata.gz: db7354b91b1776a973eb323680509c30f0c9704a8ee61c00f88fa20bf020128a4bb5cd53e9393d8115fa54e10b137bbd43a570a830d75bbf9d7c5d4ef6e1148d
+   data.tar.gz: fc2e568642e6ffd47d8b941dff6edf54fca9a9cf19017e31808c0382c3e481d5714316db915a1cae3e15a2d04833e4bc8fdb5094e38ae122f069826f2d73a92d
@@ -1,3 +1,8 @@
+ # extlz4-0.2.4.4 (Sunday, January 14, 2018)
+
+ * Updated the lz4 library to [1.8.1](https://github.com/lz4/lz4/releases/tag/v1.8.1)
+
+
  # extlz4-0.2.4.3 (Saturday, January 13, 2018)

  * Fixed an issue in LZ4::Decoder.read where the buffer area was not allocated
data/README.md CHANGED
@@ -18,14 +18,14 @@ $ dmesg | ruby -r extlz4 -e 'LZ4.encode_file($stdin.binmode, $stdout.binmode)' |
  * author: dearblue (mailto:dearblue@users.noreply.github.com)
  * report issue to: <https://github.com/dearblue/ruby-extlz4/issues>
  * how to install: `gem install extlz4`
- * version: 0.2.4.3
+ * version: 0.2.5
  * product quality: technical preview
  * licensing: BSD-2-Clause License
  * dependency gems: none
  * dependency external c libraries: none
  * bundled external c libraries:
-   * lz4-1.8 <https://github.com/lz4/lz4/tree/v1.8.0>
-     under [BSD 2-Clause license](https://github.com/lz4/lz4/tree/v1.8.0/LICENSE)
+   * lz4-1.8.1 <https://github.com/lz4/lz4/tree/v1.8.1>
+     under [BSD 2-Clause license](https://github.com/lz4/lz4/tree/v1.8.1/LICENSE)
      by [Yann Collet](https://github.com/Cyan4973)


@@ -8,6 +8,7 @@ make install # this command may require root access

  LZ4's `Makefile` supports standard [Makefile conventions],
  including [staged installs], [redirection], or [command redefinition].
+ It is compatible with parallel builds (`-j#`).

  [Makefile conventions]: https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html
  [staged installs]: https://www.gnu.org/prep/standards/html_node/DESTDIR.html
@@ -1,3 +1,16 @@
+ v1.8.1
+ perf : faster and stronger ultra modes (levels 10+)
+ perf : slightly faster compression and decompression speed
+ perf : fix bad degenerative case, reported by @c-morgenstern
+ fix : decompression failed when using a combination of extDict + low memory address (#397), reported and fixed by Julian Scheid (@jscheid)
+ cli : support for dictionary compression (`-D`), by Felix Handte @felixhandte
+ cli : fix : `lz4 -d --rm` preserves timestamp (#441)
+ cli : fix : do not modify /dev/null permission as root, by @aliceatlas
+ api : `_destSize()` variant supported for all compression levels
+ build : `make` and `make test` compatible with `-jX`, reported by @mwgamera
+ build : can control LZ4LIB_VISIBILITY macro, by @mikir
+ install: fix man page directory (#387), reported by Stuart Cardall (@itoffshore)
+
  v1.8.0
  cli : fix : do not modify /dev/null permissions, reported by @Maokaman1
  cli : added GNU separator -- specifying that all following arguments are files
@@ -82,6 +82,7 @@ make install # this command may require root access

  LZ4's `Makefile` supports standard [Makefile conventions],
  including [staged installs], [redirection], or [command redefinition].
+ It is compatible with parallel builds (`-j#`).

  [Makefile conventions]: https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html
  [staged installs]: https://www.gnu.org/prep/standards/html_node/DESTDIR.html
@@ -19,7 +19,6 @@ test:
  - make cmake && make clean
  - make -C tests test-lz4
  - make -C tests test-lz4c
- - make -C tests test-fasttest
  - make -C tests test-frametest
  - make -C tests test-fullbench
  - make -C tests test-fuzzer && make clean
@@ -1,44 +1,43 @@
  LZ4 - Library Files
  ================================

- The directory contains many files, but depending on project's objectives,
+ The `/lib` directory contains many files, but depending on project's objectives,
  not all of them are necessary.

  #### Minimal LZ4 build

  The minimum required is **`lz4.c`** and **`lz4.h`**,
- which will provide the fast compression and decompression algorithm.
+ which provides the fast compression and decompression algorithm.
+ They generate and decode data using [LZ4 block format].


- #### The High Compression variant of LZ4
+ #### High Compression variant

- For more compression at the cost of compression speed,
- the High Compression variant **lz4hc** is available.
- It's necessary to add **`lz4hc.c`** and **`lz4hc.h`**.
- The variant still depends on regular `lz4` source files.
- In particular, the decompression is still provided by `lz4.c`.
+ For more compression ratio at the cost of compression speed,
+ the High Compression variant called **lz4hc** is available.
+ Add files **`lz4hc.c`**, **`lz4hc.h`** and **`lz4opt.h`**.
+ The variant still depends on regular `lib/lz4.*` source files.


- #### Compatibility issues
+ #### Frame variant, for interoperability

- In order to produce files or streams compatible with `lz4` command line utility,
+ In order to produce compressed data compatible with `lz4` command line utility,
  it's necessary to encode lz4-compressed blocks using the [official interoperable frame format].
  This format is generated and decoded automatically by the **lz4frame** library.
- In order to work properly, lz4frame needs lz4 and lz4hc, and also **xxhash**,
- which provides error detection.
- (_Advanced stuff_ : It's possible to hide xxhash symbols into a local namespace.
- This is what `liblz4` does, to avoid symbol duplication
- in case a user program would link to several libraries containing xxhash symbols.)
+ Its public API is described in `lib/lz4frame.h`.
+ In order to work properly, lz4frame needs all other modules present in `/lib`,
+ including, lz4 and lz4hc, and also **xxhash**.
+ So it's necessary to include all `*.c` and `*.h` files present in `/lib`.


- #### Advanced API
+ #### Advanced / Experimental API

- A more complex `lz4frame_static.h` is also provided.
- It contains definitions which are not guaranteed to remain stable within future versions.
- It must be used with static linking ***only***.
+ A complex API defined in `lz4frame_static.h` contains definitions
+ which are not guaranteed to remain stable in future versions.
+ As a consequence, it must be used with static linking ***only***.


- #### Using MinGW+MSYS to create DLL
+ #### Windows : using MinGW+MSYS to create DLL

  DLL can be created using MinGW+MSYS with the `make liblz4` command.
  This command creates `dll\liblz4.dll` and the import library `dll\liblz4.lib`.
@@ -51,23 +50,24 @@ file it should be linked with `dll\liblz4.dll`. For example:
  ```
  gcc $(CFLAGS) -Iinclude/ test-dll.c -o test-dll dll\liblz4.dll
  ```
- The compiled executable will require LZ4 DLL which is available at `dll\liblz4.dll`.
+ The compiled executable will require LZ4 DLL which is available at `dll\liblz4.dll`.


- #### Miscellaneous
+ #### Miscellaneous

  Other files present in the directory are not source code. There are :

- - LICENSE : contains the BSD license text
- - Makefile : script to compile or install lz4 library (static or dynamic)
- - liblz4.pc.in : for pkg-config (make install)
- - README.md : this file
+ - `LICENSE` : contains the BSD license text
+ - `Makefile` : `make` script to compile and install lz4 library (static and dynamic)
+ - `liblz4.pc.in` : for `pkg-config` (used in `make install`)
+ - `README.md` : this file

  [official interoperable frame format]: ../doc/lz4_Frame_format.md
+ [LZ4 block format]: ../doc/lz4_Block_format.md


- #### License
+ #### License

  All source material within __lib__ directory are BSD 2-Clause licensed.
  See [LICENSE](LICENSE) for details.
- The license is also repeated at the top of each source file.
+ The license is also reminded at the top of each source file.
@@ -85,6 +85,7 @@
  #endif


+
  /*-************************************
  * Dependency
  **************************************/
@@ -101,21 +102,43 @@
  #  pragma warning(disable : 4293) /* disable: C4293: too large shift (32-bits) */
  #endif /* _MSC_VER */

- #ifndef FORCE_INLINE
+ #ifndef LZ4_FORCE_INLINE
  #  ifdef _MSC_VER /* Visual Studio */
- #    define FORCE_INLINE static __forceinline
+ #    define LZ4_FORCE_INLINE static __forceinline
  #  else
  #    if defined (__cplusplus) || defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L /* C99 */
  #      ifdef __GNUC__
- #        define FORCE_INLINE static inline __attribute__((always_inline))
+ #        define LZ4_FORCE_INLINE static inline __attribute__((always_inline))
  #      else
- #        define FORCE_INLINE static inline
+ #        define LZ4_FORCE_INLINE static inline
  #      endif
  #    else
- #      define FORCE_INLINE static
+ #      define LZ4_FORCE_INLINE static
  #    endif /* __STDC_VERSION__ */
  #  endif /* _MSC_VER */
- #endif /* FORCE_INLINE */
+ #endif /* LZ4_FORCE_INLINE */
+
+ /* LZ4_FORCE_O2_GCC_PPC64LE and LZ4_FORCE_O2_INLINE_GCC_PPC64LE
+  * Gcc on ppc64le generates an unrolled SIMDized loop for LZ4_wildCopy,
+  * together with a simple 8-byte copy loop as a fall-back path.
+  * However, this optimization hurts the decompression speed by >30%,
+  * because the execution does not go to the optimized loop
+  * for typical compressible data, and all of the preamble checks
+  * before going to the fall-back path become useless overhead.
+  * This optimization happens only with the -O3 flag, and -O2 generates
+  * a simple 8-byte copy loop.
+  * With gcc on ppc64le, all of the LZ4_decompress_* and LZ4_wildCopy
+  * functions are annotated with __attribute__((optimize("O2"))),
+  * and also LZ4_wildCopy is forcibly inlined, so that the O2 attribute
+  * of LZ4_wildCopy does not affect the compression speed.
+  */
+ #if defined(__PPC64__) && defined(__LITTLE_ENDIAN__) && defined(__GNUC__)
+ #  define LZ4_FORCE_O2_GCC_PPC64LE __attribute__((optimize("O2")))
+ #  define LZ4_FORCE_O2_INLINE_GCC_PPC64LE __attribute__((optimize("O2"))) LZ4_FORCE_INLINE
+ #else
+ #  define LZ4_FORCE_O2_GCC_PPC64LE
+ #  define LZ4_FORCE_O2_INLINE_GCC_PPC64LE static
+ #endif

  #if (defined(__GNUC__) && (__GNUC__ >= 3)) || (defined(__INTEL_COMPILER) && (__INTEL_COMPILER >= 800)) || defined(__clang__)
  #  define expect(expr,value) (__builtin_expect ((expr),(value)) )
@@ -253,7 +276,8 @@ static void LZ4_copy8(void* dst, const void* src)
  }

  /* customized variant of memcpy, which can overwrite up to 8 bytes beyond dstEnd */
- static void LZ4_wildCopy(void* dstPtr, const void* srcPtr, void* dstEnd)
+ LZ4_FORCE_O2_INLINE_GCC_PPC64LE
+ void LZ4_wildCopy(void* dstPtr, const void* srcPtr, void* dstEnd)
  {
      BYTE* d = (BYTE*)dstPtr;
      const BYTE* s = (const BYTE*)srcPtr;
@@ -289,15 +313,24 @@ static const int LZ4_minLength = (MFLIMIT+1);
  /*-************************************
  * Error detection
  **************************************/
- #define LZ4_STATIC_ASSERT(c) { enum { LZ4_static_assert = 1/(int)(!!(c)) }; } /* use only *after* variable declarations */
+ #if defined(LZ4_DEBUG) && (LZ4_DEBUG>=1)
+ #  include <assert.h>
+ #else
+ #  ifndef assert
+ #    define assert(condition) ((void)0)
+ #  endif
+ #endif
+
+ #define LZ4_STATIC_ASSERT(c) { enum { LZ4_static_assert = 1/(int)(!!(c)) }; } /* use only *after* variable declarations */

  #if defined(LZ4_DEBUG) && (LZ4_DEBUG>=2)
  #  include <stdio.h>
- #  define DEBUGLOG(l, ...) { \
-     if (l<=LZ4_DEBUG) { \
-         fprintf(stderr, __FILE__ ": "); \
-         fprintf(stderr, __VA_ARGS__); \
-         fprintf(stderr, " \n"); \
+ static int g_debuglog_enable = 1;
+ #  define DEBUGLOG(l, ...) { \
+     if ((g_debuglog_enable) && (l<=LZ4_DEBUG)) { \
+         fprintf(stderr, __FILE__ ": "); \
+         fprintf(stderr, __VA_ARGS__); \
+         fprintf(stderr, " \n"); \
      } }
  #else
  #  define DEBUGLOG(l, ...) {} /* disabled */
@@ -307,7 +340,7 @@ static const int LZ4_minLength = (MFLIMIT+1);
  /*-************************************
  * Common functions
  **************************************/
- static unsigned LZ4_NbCommonBytes (register reg_t val)
+ static unsigned LZ4_NbCommonBytes (reg_t val)
  {
      if (LZ4_isLittleEndian()) {
          if (sizeof(val)==8) {
@@ -318,7 +351,14 @@ static unsigned LZ4_NbCommonBytes (register reg_t val)
  #       elif (defined(__clang__) || (defined(__GNUC__) && (__GNUC__>=3))) && !defined(LZ4_FORCE_SW_BITCOUNT)
              return (__builtin_ctzll((U64)val) >> 3);
  #       else
-             static const int DeBruijnBytePos[64] = { 0, 0, 0, 0, 0, 1, 1, 2, 0, 3, 1, 3, 1, 4, 2, 7, 0, 2, 3, 6, 1, 5, 3, 5, 1, 3, 4, 4, 2, 5, 6, 7, 7, 0, 1, 2, 3, 3, 4, 6, 2, 6, 5, 5, 3, 4, 5, 6, 7, 1, 2, 4, 6, 4, 4, 5, 7, 2, 6, 5, 7, 6, 7, 7 };
+             static const int DeBruijnBytePos[64] = { 0, 0, 0, 0, 0, 1, 1, 2,
+                                                      0, 3, 1, 3, 1, 4, 2, 7,
+                                                      0, 2, 3, 6, 1, 5, 3, 5,
+                                                      1, 3, 4, 4, 2, 5, 6, 7,
+                                                      7, 0, 1, 2, 3, 3, 4, 6,
+                                                      2, 6, 5, 5, 3, 4, 5, 6,
+                                                      7, 1, 2, 4, 6, 4, 4, 5,
+                                                      7, 2, 6, 5, 7, 6, 7, 7 };
              return DeBruijnBytePos[((U64)((val & -(long long)val) * 0x0218A392CDABBD3FULL)) >> 58];
  #       endif
          } else /* 32 bits */ {
@@ -329,12 +369,15 @@ static unsigned LZ4_NbCommonBytes (register reg_t val)
  #       elif (defined(__clang__) || (defined(__GNUC__) && (__GNUC__>=3))) && !defined(LZ4_FORCE_SW_BITCOUNT)
              return (__builtin_ctz((U32)val) >> 3);
  #       else
-             static const int DeBruijnBytePos[32] = { 0, 0, 3, 0, 3, 1, 3, 0, 3, 2, 2, 1, 3, 2, 0, 1, 3, 3, 1, 2, 2, 2, 2, 0, 3, 1, 2, 0, 1, 0, 1, 1 };
+             static const int DeBruijnBytePos[32] = { 0, 0, 3, 0, 3, 1, 3, 0,
+                                                      3, 2, 2, 1, 3, 2, 0, 1,
+                                                      3, 3, 1, 2, 2, 2, 2, 0,
+                                                      3, 1, 2, 0, 1, 0, 1, 1 };
              return DeBruijnBytePos[((U32)((val & -(S32)val) * 0x077CB531U)) >> 27];
  #       endif
          }
      } else /* Big Endian CPU */ {
-         if (sizeof(val)==8) {
+         if (sizeof(val)==8) { /* 64-bits */
  #       if defined(_MSC_VER) && defined(_WIN64) && !defined(LZ4_FORCE_SW_BITCOUNT)
              unsigned long r = 0;
              _BitScanReverse64( &r, val );
@@ -342,8 +385,11 @@ static unsigned LZ4_NbCommonBytes (register reg_t val)
  #       elif (defined(__clang__) || (defined(__GNUC__) && (__GNUC__>=3))) && !defined(LZ4_FORCE_SW_BITCOUNT)
              return (__builtin_clzll((U64)val) >> 3);
  #       else
+             static const U32 by32 = sizeof(val)*4; /* 32 on 64 bits (goal), 16 on 32 bits.
+                 Just to avoid some static analyzer complaining about shift by 32 on 32-bits target.
+                 Note that this code path is never triggered in 32-bits mode. */
              unsigned r;
-             if (!(val>>32)) { r=4; } else { r=0; val>>=32; }
+             if (!(val>>by32)) { r=4; } else { r=0; val>>=by32; }
              if (!(val>>16)) { r+=2; val>>=8; } else { val>>=24; }
              r += (!val);
              return r;
@@ -366,11 +412,20 @@ static unsigned LZ4_NbCommonBytes (register reg_t val)
  }

  #define STEPSIZE sizeof(reg_t)
- static unsigned LZ4_count(const BYTE* pIn, const BYTE* pMatch, const BYTE* pInLimit)
+ LZ4_FORCE_INLINE
+ unsigned LZ4_count(const BYTE* pIn, const BYTE* pMatch, const BYTE* pInLimit)
  {
      const BYTE* const pStart = pIn;

-     while (likely(pIn<pInLimit-(STEPSIZE-1))) {
+     if (likely(pIn < pInLimit-(STEPSIZE-1))) {
+         reg_t const diff = LZ4_read_ARCH(pMatch) ^ LZ4_read_ARCH(pIn);
+         if (!diff) {
+             pIn+=STEPSIZE; pMatch+=STEPSIZE;
+         } else {
+             return LZ4_NbCommonBytes(diff);
+     }   }
+
+     while (likely(pIn < pInLimit-(STEPSIZE-1))) {
          reg_t const diff = LZ4_read_ARCH(pMatch) ^ LZ4_read_ARCH(pIn);
          if (!diff) { pIn+=STEPSIZE; pMatch+=STEPSIZE; continue; }
          pIn += LZ4_NbCommonBytes(diff);
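
The `LZ4_count` hunk above compares one machine word at a time and converts the first differing bit into a byte count. A minimal standalone sketch of that idea follows; it is not the library's code. `count_common` is a hypothetical helper, and it assumes a little-endian target and a GCC or Clang compiler that provides `__builtin_ctzll`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: counts how many leading bytes
 * of `in` and `match` are equal, never reading `in` at or past `in_limit`.
 * Assumes a little-endian CPU and a compiler providing __builtin_ctzll. */
static size_t count_common(const unsigned char* in,
                           const unsigned char* match,
                           const unsigned char* in_limit)
{
    const unsigned char* const start = in;

    /* Compare 8 bytes at a time while a full word is available. */
    while ((size_t)(in_limit - in) >= sizeof(uint64_t)) {
        uint64_t a, b, diff;
        memcpy(&a, in, sizeof a);      /* memcpy avoids unaligned loads */
        memcpy(&b, match, sizeof b);
        diff = a ^ b;
        if (diff != 0) {
            /* On little-endian, the lowest set bit belongs to the first
             * differing byte; bit index / 8 gives the byte index. */
            return (size_t)(in - start) + (size_t)(__builtin_ctzll(diff) >> 3);
        }
        in += sizeof(uint64_t);
        match += sizeof(uint64_t);
    }

    /* Tail: fewer than 8 bytes remain, compare one byte at a time. */
    while (in < in_limit && *in == *match) { in++; match++; }
    return (size_t)(in - start);
}
```

For instance, comparing `"abcdefghijklmnopqrstuvwxyz"` with a copy whose twelfth byte differs would report 11 common bytes: one full equal word, then three equal bytes in the second word.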
@@ -436,7 +491,7 @@ static U32 LZ4_hash5(U64 sequence, tableType_t const tableType)
      return (U32)(((sequence >> 24) * prime8bytes) >> (64 - hashLog));
  }

- FORCE_INLINE U32 LZ4_hashPosition(const void* const p, tableType_t const tableType)
+ LZ4_FORCE_INLINE U32 LZ4_hashPosition(const void* const p, tableType_t const tableType)
  {
      if ((sizeof(reg_t)==8) && (tableType != byU16)) return LZ4_hash5(LZ4_read_ARCH(p), tableType);
      return LZ4_hash4(LZ4_read32(p), tableType);
@@ -452,7 +507,7 @@ static void LZ4_putPositionOnHash(const BYTE* p, U32 h, void* tableBase, tableTy
      }
  }

- FORCE_INLINE void LZ4_putPosition(const BYTE* p, void* tableBase, tableType_t tableType, const BYTE* srcBase)
+ LZ4_FORCE_INLINE void LZ4_putPosition(const BYTE* p, void* tableBase, tableType_t tableType, const BYTE* srcBase)
  {
      U32 const h = LZ4_hashPosition(p, tableType);
      LZ4_putPositionOnHash(p, h, tableBase, tableType, srcBase);
@@ -465,7 +520,7 @@ static const BYTE* LZ4_getPositionOnHash(U32 h, void* tableBase, tableType_t tab
      { const U16* const hashTable = (U16*) tableBase; return hashTable[h] + srcBase; } /* default, to ensure a return */
  }

- FORCE_INLINE const BYTE* LZ4_getPosition(const BYTE* p, void* tableBase, tableType_t tableType, const BYTE* srcBase)
+ LZ4_FORCE_INLINE const BYTE* LZ4_getPosition(const BYTE* p, void* tableBase, tableType_t tableType, const BYTE* srcBase)
  {
      U32 const h = LZ4_hashPosition(p, tableType);
      return LZ4_getPositionOnHash(h, tableBase, tableType, srcBase);
@@ -474,7 +529,7 @@ FORCE_INLINE const BYTE* LZ4_getPosition(const BYTE* p, void* tableBase, tableTy

  /** LZ4_compress_generic() :
      inlined, to ensure branches are decided at compilation time */
- FORCE_INLINE int LZ4_compress_generic(
+ LZ4_FORCE_INLINE int LZ4_compress_generic(
      LZ4_stream_t_internal* const cctx,
      const char* const source,
      char* const dest,
@@ -616,7 +671,11 @@ _next_match:
              *token += ML_MASK;
              matchCode -= ML_MASK;
              LZ4_write32(op, 0xFFFFFFFF);
-             while (matchCode >= 4*255) op+=4, LZ4_write32(op, 0xFFFFFFFF), matchCode -= 4*255;
+             while (matchCode >= 4*255) {
+                 op+=4;
+                 LZ4_write32(op, 0xFFFFFFFF);
+                 matchCode -= 4*255;
+             }
              op += matchCode / 255;
              *op++ = (BYTE)(matchCode % 255);
          } else
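
The loop in the hunk above emits the part of a match length that did not fit in the token: a run of `255` bytes followed by one final byte below 255. As a simplified sketch of that encoding, written one byte at a time rather than four (the helper names `put_extra_length` and `get_extra_length` are hypothetical and not part of lz4):

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* Writes the extra length bytes for a value that overflowed the 4-bit token
 * field: a run of 255s, then one final byte below 255.
 * Returns the number of bytes written. */
static size_t put_extra_length(unsigned char* op, size_t extra)
{
    size_t n = 0;
    while (extra >= 255) { op[n++] = 255; extra -= 255; }
    op[n++] = (unsigned char)extra;
    return n;
}

/* Reads the bytes back: sums them until a byte below 255 ends the run.
 * Stores the decoded value in *extra, returns the number of bytes read. */
static size_t get_extra_length(const unsigned char* ip, size_t* extra)
{
    size_t n = 0, sum = 0;
    unsigned char b;
    do { b = ip[n++]; sum += b; } while (b == 255);
    *extra = sum;
    return n;
}
```

An extra length of 1000 would be written as `255, 255, 255, 235`, since 3 × 255 + 235 = 1000. The version in the diff pre-fills the output with `0xFFFFFFFF` words so that long runs cost one 32-bit store per four bytes.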
@@ -940,6 +999,7 @@ LZ4_stream_t* LZ4_createStream(void)

  void LZ4_resetStream (LZ4_stream_t* LZ4_stream)
  {
+     DEBUGLOG(4, "LZ4_resetStream");
      MEM_INIT(LZ4_stream, 0, sizeof(LZ4_stream_t));
  }

@@ -1100,47 +1160,46 @@ int LZ4_saveDict (LZ4_stream_t* LZ4_dict, char* safeBuffer, int dictSize)
  * Decompression functions
  *******************************/
  /*! LZ4_decompress_generic() :
- * This generic decompression function cover all use cases.
- * It shall be instantiated several times, using different sets of directives
- * Note that it is important this generic function is really inlined,
+ * This generic decompression function covers all use cases.
+ * It shall be instantiated several times, using different sets of directives.
+ * Note that it is important for performance that this function really get inlined,
  * in order to remove useless branches during compilation optimization.
  */
- FORCE_INLINE int LZ4_decompress_generic(
-     const char* const source,
-     char* const dest,
-     int inputSize,
-     int outputSize, /* If endOnInput==endOnInputSize, this value is the max size of Output Buffer. */
+ LZ4_FORCE_O2_GCC_PPC64LE
+ LZ4_FORCE_INLINE int LZ4_decompress_generic(
+     const char* const src,
+     char* const dst,
+     int srcSize,
+     int outputSize, /* If endOnInput==endOnInputSize, this value is `dstCapacity` */

      int endOnInput, /* endOnOutputSize, endOnInputSize */
      int partialDecoding, /* full, partial */
      int targetOutputSize, /* only used if partialDecoding==partial */
      int dict, /* noDict, withPrefix64k, usingExtDict */
-     const BYTE* const lowPrefix, /* == dest when no prefix */
+     const BYTE* const lowPrefix, /* always <= dst, == dst when no prefix */
      const BYTE* const dictStart, /* only if dict==usingExtDict */
      const size_t dictSize /* note : = 0 if noDict */
      )
  {
-     /* Local Variables */
-     const BYTE* ip = (const BYTE*) source;
-     const BYTE* const iend = ip + inputSize;
+     const BYTE* ip = (const BYTE*) src;
+     const BYTE* const iend = ip + srcSize;

-     BYTE* op = (BYTE*) dest;
+     BYTE* op = (BYTE*) dst;
      BYTE* const oend = op + outputSize;
      BYTE* cpy;
      BYTE* oexit = op + targetOutputSize;
-     const BYTE* const lowLimit = lowPrefix - dictSize;

      const BYTE* const dictEnd = (const BYTE*)dictStart + dictSize;
-     const unsigned dec32table[] = {0, 1, 2, 1, 4, 4, 4, 4};
-     const int dec64table[] = {0, 0, 0, -1, 0, 1, 2, 3};
+     const unsigned inc32table[8] = {0, 1, 2, 1, 0, 4, 4, 4};
+     const int dec64table[8] = {0, 0, 0, -1, -4, 1, 2, 3};

      const int safeDecode = (endOnInput==endOnInputSize);
      const int checkOffset = ((safeDecode) && (dictSize < (int)(64 KB)));


      /* Special cases */
-     if ((partialDecoding) && (oexit > oend-MFLIMIT)) oexit = oend-MFLIMIT; /* targetOutputSize too high => decode everything */
-     if ((endOnInput) && (unlikely(outputSize==0))) return ((inputSize==1) && (*ip==0)) ? 0 : -1; /* Empty output buffer */
+     if ((partialDecoding) && (oexit > oend-MFLIMIT)) oexit = oend-MFLIMIT; /* targetOutputSize too high => just decode everything */
+     if ((endOnInput) && (unlikely(outputSize==0))) return ((srcSize==1) && (*ip==0)) ? 0 : -1; /* Empty output buffer */
      if ((!endOnInput) && (unlikely(outputSize==0))) return (*ip==0?1:-1);

      /* Main Loop : decode sequences */
@@ -1149,8 +1208,27 @@ FORCE_INLINE int LZ4_decompress_generic(
          const BYTE* match;
          size_t offset;

-         /* get literal length */
          unsigned const token = *ip++;
+
+         /* shortcut for common case :
+          * in most circumstances, we expect to decode small matches (<= 18 bytes) separated by few literals (<= 14 bytes).
+          * this shortcut was tested on x86 and x64, where it improves decoding speed.
+          * it has not yet been benchmarked on ARM, Power, mips, etc. */
+         if (((ip + 14 /*maxLL*/ + 2 /*offset*/ <= iend)
+            & (op + 14 /*maxLL*/ + 18 /*maxML*/ <= oend))
+            & ((token < (15<<ML_BITS)) & ((token & ML_MASK) != 15)) ) {
+             size_t const ll = token >> ML_BITS;
+             size_t const off = LZ4_readLE16(ip+ll);
+             const BYTE* const matchPtr = op + ll - off; /* pointer underflow risk ? */
+             if ((off >= 18) /* do not deal with overlapping matches */ & (matchPtr >= lowPrefix)) {
+                 size_t const ml = (token & ML_MASK) + MINMATCH;
+                 memcpy(op, ip, 16); op += ll; ip += ll + 2 /*offset*/;
+                 memcpy(op, matchPtr, 18); op += ml;
+                 continue;
+             }
+         }
+
+         /* decode literal length */
          if ((length=(token>>ML_BITS)) == RUN_MASK) {
              unsigned s;
              do {
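
The shortcut added above applies only when both 4-bit fields of the token are below 15, so neither length continues into extra bytes. A small sketch of that token split follows. The `sk_` names are hypothetical, and the constants assume `ML_BITS` is 4 and `MINMATCH` is 4, which matches the "<= 14 literals, <= 18 match bytes" bounds stated in the comment.

```c
#include <stddef.h>

/* Assumed values; the diff uses ML_BITS, ML_MASK and MINMATCH without
 * showing their definitions. */
enum { SK_ML_BITS = 4, SK_ML_MASK = 15, SK_MINMATCH = 4 };

typedef struct {
    size_t literal_len;  /* high nibble of the token */
    size_t match_len;    /* low nibble plus the minimum match length */
    int    short_form;   /* 1 when neither field needs extra length bytes */
} sk_token;

/* Hypothetical helper: splits a token byte into its two length fields. */
static sk_token sk_parse_token(unsigned char token)
{
    sk_token t;
    t.literal_len = (size_t)(token >> SK_ML_BITS);
    t.match_len   = (size_t)(token & SK_ML_MASK) + SK_MINMATCH;
    t.short_form  = (t.literal_len != 15) && ((token & SK_ML_MASK) != 15);
    return t;
}
```

When `short_form` is 0, the two fields hold only the base values, and the remaining length has to be read from the following bytes, which is what the slower path after the shortcut does.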
@@ -1184,7 +1262,7 @@ FORCE_INLINE int LZ4_decompress_generic(
          /* get offset */
          offset = LZ4_readLE16(ip); ip+=2;
          match = op - offset;
-         if ((checkOffset) && (unlikely(match < lowLimit))) goto _output_error; /* Error : offset outside buffers */
+         if ((checkOffset) && (unlikely(match + dictSize < lowPrefix))) goto _output_error; /* Error : offset outside buffers */
          LZ4_write32(op, (U32)offset); /* costs ~1%; silence an msan warning when offset==0 */

          /* get matchlength */
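
One plausible reading of the rewritten offset check, given the v1.8.1 changelog entry about extDict combined with a low memory address, is that `lowPrefix - dictSize` could wrap around when the output buffer sits at a low address. The sketch below simulates both comparisons with plain unsigned integers in place of pointers; the `sk_` functions are hypothetical and only illustrate the arithmetic, not the library's actual pointer behaviour.

```c
#include <stdint.h>

/* Hypothetical helpers: addresses are simulated as unsigned integers. */

/* Old form: the subtraction wraps around if dict_size > low_prefix. */
static int sk_old_check_rejects(uintptr_t match, uintptr_t low_prefix,
                                uintptr_t dict_size)
{
    return match < low_prefix - dict_size;
}

/* New form: the addition stays in range for realistic addresses. */
static int sk_new_check_rejects(uintptr_t match, uintptr_t low_prefix,
                                uintptr_t dict_size)
{
    return match + dict_size < low_prefix;
}
```

With a buffer at `0x1000`, a 64 KB dictionary and a match address of `0x0800`, the old form would reject a match that lies within dictionary range, while the new form accepts it.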
@@ -1228,14 +1306,13 @@ FORCE_INLINE int LZ4_decompress_generic(
          /* copy match within block */
          cpy = op + length;
          if (unlikely(offset<8)) {
-             const int dec64 = dec64table[offset];
              op[0] = match[0];
              op[1] = match[1];
              op[2] = match[2];
              op[3] = match[3];
-             match += dec32table[offset];
+             match += inc32table[offset];
              memcpy(op+4, match, 4);
-             match -= dec64;
+             match -= dec64table[offset];
          } else { LZ4_copy8(op, match); match+=8; }
          op += 8;
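
To see what the two tables in the hunk above achieve for offsets below 8, here is a standalone sketch that reuses the table values from the added lines. `sk_copy_small_offset` is a hypothetical helper, not an lz4 function: it expands a repeating pattern and, like the original, may write up to 7 bytes past the requested length, so the caller needs slack in the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Table values copied from the added lines of the diff. */
static const unsigned inc32table[8] = {0, 1, 2, 1, 0, 4, 4, 4};
static const int      dec64table[8] = {0, 0, 0, -1, -4, 1, 2, 3};

/* Hypothetical helper: copies `length` bytes from op - offset to op, for
 * offset in 1..7, where source and destination overlap.
 * Writes in 8-byte chunks, so it can overrun `length` by up to 7 bytes. */
static void sk_copy_small_offset(unsigned char* op, size_t offset, size_t length)
{
    const unsigned char* match = op - offset;
    unsigned char* const end = op + length;

    /* First 8 bytes: four byte-wise copies (safe for any overlap), then a
     * 4-byte copy from a source position adjusted by the first table. */
    op[0] = match[0]; op[1] = match[1]; op[2] = match[2]; op[3] = match[3];
    match += inc32table[offset];
    memcpy(op + 4, match, 4);
    match -= dec64table[offset];
    op += 8;

    /* op - match is now at least 8 and a multiple of offset,
     * so non-overlapping 8-byte copies continue the pattern. */
    while (op < end) { memcpy(op, match, 8); op += 8; match += 8; }
}
```

After the first 8 bytes, the distance between destination and source becomes 9, 8, 9, 8, 10, 12 and 14 for offsets 1 through 7, each a multiple of the offset, which is why the wide copies still reproduce the byte-by-byte result.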
 
@@ -1252,31 +1329,34 @@ FORCE_INLINE int LZ4_decompress_generic(
                  LZ4_copy8(op, match);
                  if (length>16) LZ4_wildCopy(op+8, match+8, cpy);
              }
-         op=cpy; /* correction */
+         op = cpy; /* correction */
      }

      /* end of decoding */
      if (endOnInput)
-         return (int) (((char*)op)-dest); /* Nb of output bytes decoded */
+         return (int) (((char*)op)-dst); /* Nb of output bytes decoded */
      else
-         return (int) (((const char*)ip)-source); /* Nb of input bytes read */
+         return (int) (((const char*)ip)-src); /* Nb of input bytes read */

      /* Overflow error detected */
  _output_error:
-     return (int) (-(((const char*)ip)-source))-1;
+     return (int) (-(((const char*)ip)-src))-1;
  }


+ LZ4_FORCE_O2_GCC_PPC64LE
  int LZ4_decompress_safe(const char* source, char* dest, int compressedSize, int maxDecompressedSize)
  {
      return LZ4_decompress_generic(source, dest, compressedSize, maxDecompressedSize, endOnInputSize, full, 0, noDict, (BYTE*)dest, NULL, 0);
  }

+ LZ4_FORCE_O2_GCC_PPC64LE
  int LZ4_decompress_safe_partial(const char* source, char* dest, int compressedSize, int targetOutputSize, int maxDecompressedSize)
  {
      return LZ4_decompress_generic(source, dest, compressedSize, maxDecompressedSize, endOnInputSize, partial, targetOutputSize, noDict, (BYTE*)dest, NULL, 0);
  }

+ LZ4_FORCE_O2_GCC_PPC64LE
  int LZ4_decompress_fast(const char* source, char* dest, int originalSize)
  {
      return LZ4_decompress_generic(source, dest, 0, originalSize, endOnOutputSize, full, 0, withPrefix64k, (BYTE*)(dest - 64 KB), NULL, 64 KB);
@@ -1322,6 +1402,7 @@ int LZ4_setStreamDecode (LZ4_streamDecode_t* LZ4_streamDecode, const char* dicti
      If it's not possible, save the relevant part of decoded data into a safe buffer,
      and indicate where it stands using LZ4_setStreamDecode()
  */
+ LZ4_FORCE_O2_GCC_PPC64LE
  int LZ4_decompress_safe_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* source, char* dest, int compressedSize, int maxOutputSize)
  {
      LZ4_streamDecode_t_internal* lz4sd = &LZ4_streamDecode->internal_donotuse;
@@ -1348,6 +1429,7 @@ int LZ4_decompress_safe_continue (LZ4_streamDecode_t* LZ4_streamDecode, const ch
      return result;
  }

+ LZ4_FORCE_O2_GCC_PPC64LE
  int LZ4_decompress_fast_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* source, char* dest, int originalSize)
  {
      LZ4_streamDecode_t_internal* lz4sd = &LZ4_streamDecode->internal_donotuse;
@@ -1382,7 +1464,8 @@ Advanced decoding functions :
      the dictionary must be explicitly provided within parameters
  */

- FORCE_INLINE int LZ4_decompress_usingDict_generic(const char* source, char* dest, int compressedSize, int maxOutputSize, int safe, const char* dictStart, int dictSize)
+ LZ4_FORCE_O2_GCC_PPC64LE
+ LZ4_FORCE_INLINE int LZ4_decompress_usingDict_generic(const char* source, char* dest, int compressedSize, int maxOutputSize, int safe, const char* dictStart, int dictSize)
  {
      if (dictSize==0)
          return LZ4_decompress_generic(source, dest, compressedSize, maxOutputSize, safe, full, 0, noDict, (BYTE*)dest, NULL, 0);
@@ -1394,17 +1477,20 @@ FORCE_INLINE int LZ4_decompress_usingDict_generic(const char* source, char* dest
      return LZ4_decompress_generic(source, dest, compressedSize, maxOutputSize, safe, full, 0, usingExtDict, (BYTE*)dest, (const BYTE*)dictStart, dictSize);
  }

+ LZ4_FORCE_O2_GCC_PPC64LE
  int LZ4_decompress_safe_usingDict(const char* source, char* dest, int compressedSize, int maxOutputSize, const char* dictStart, int dictSize)
  {
      return LZ4_decompress_usingDict_generic(source, dest, compressedSize, maxOutputSize, 1, dictStart, dictSize);
  }

+ LZ4_FORCE_O2_GCC_PPC64LE
  int LZ4_decompress_fast_usingDict(const char* source, char* dest, int originalSize, const char* dictStart, int dictSize)
  {
      return LZ4_decompress_usingDict_generic(source, dest, 0, originalSize, 0, dictStart, dictSize);
  }

  /* debug function */
+ LZ4_FORCE_O2_GCC_PPC64LE
  int LZ4_decompress_safe_forceExtDict(const char* source, char* dest, int compressedSize, int maxOutputSize, const char* dictStart, int dictSize)
  {
      return LZ4_decompress_generic(source, dest, compressedSize, maxOutputSize, endOnInputSize, full, 0, usingExtDict, (BYTE*)dest, (const BYTE*)dictStart, dictSize);