zstd-ruby 1.4.5.0 → 1.5.1.1

Files changed (101)
  1. checksums.yaml +4 -4
  2. data/.github/dependabot.yml +8 -0
  3. data/.github/workflows/ruby.yml +35 -0
  4. data/README.md +2 -2
  5. data/ext/zstdruby/extconf.rb +2 -1
  6. data/ext/zstdruby/libzstd/BUCK +5 -7
  7. data/ext/zstdruby/libzstd/Makefile +225 -222
  8. data/ext/zstdruby/libzstd/README.md +43 -5
  9. data/ext/zstdruby/libzstd/common/bitstream.h +46 -22
  10. data/ext/zstdruby/libzstd/common/compiler.h +182 -22
  11. data/ext/zstdruby/libzstd/common/cpu.h +1 -3
  12. data/ext/zstdruby/libzstd/common/debug.c +1 -1
  13. data/ext/zstdruby/libzstd/common/debug.h +12 -19
  14. data/ext/zstdruby/libzstd/common/entropy_common.c +196 -44
  15. data/ext/zstdruby/libzstd/common/error_private.c +2 -1
  16. data/ext/zstdruby/libzstd/common/error_private.h +82 -3
  17. data/ext/zstdruby/libzstd/common/fse.h +41 -12
  18. data/ext/zstdruby/libzstd/common/fse_decompress.c +139 -22
  19. data/ext/zstdruby/libzstd/common/huf.h +47 -23
  20. data/ext/zstdruby/libzstd/common/mem.h +87 -98
  21. data/ext/zstdruby/libzstd/common/pool.c +23 -17
  22. data/ext/zstdruby/libzstd/common/pool.h +2 -2
  23. data/ext/zstdruby/libzstd/common/portability_macros.h +131 -0
  24. data/ext/zstdruby/libzstd/common/threading.c +6 -5
  25. data/ext/zstdruby/libzstd/common/xxhash.c +6 -846
  26. data/ext/zstdruby/libzstd/common/xxhash.h +5568 -167
  27. data/ext/zstdruby/libzstd/common/zstd_common.c +10 -10
  28. data/ext/zstdruby/libzstd/common/zstd_deps.h +111 -0
  29. data/ext/zstdruby/libzstd/common/zstd_internal.h +189 -142
  30. data/ext/zstdruby/libzstd/common/zstd_trace.h +163 -0
  31. data/ext/zstdruby/libzstd/compress/clevels.h +134 -0
  32. data/ext/zstdruby/libzstd/compress/fse_compress.c +89 -46
  33. data/ext/zstdruby/libzstd/compress/hist.c +27 -29
  34. data/ext/zstdruby/libzstd/compress/hist.h +2 -2
  35. data/ext/zstdruby/libzstd/compress/huf_compress.c +770 -198
  36. data/ext/zstdruby/libzstd/compress/zstd_compress.c +2894 -863
  37. data/ext/zstdruby/libzstd/compress/zstd_compress_internal.h +390 -90
  38. data/ext/zstdruby/libzstd/compress/zstd_compress_literals.c +12 -11
  39. data/ext/zstdruby/libzstd/compress/zstd_compress_literals.h +4 -2
  40. data/ext/zstdruby/libzstd/compress/zstd_compress_sequences.c +31 -8
  41. data/ext/zstdruby/libzstd/compress/zstd_compress_sequences.h +1 -1
  42. data/ext/zstdruby/libzstd/compress/zstd_compress_superblock.c +25 -297
  43. data/ext/zstdruby/libzstd/compress/zstd_compress_superblock.h +1 -1
  44. data/ext/zstdruby/libzstd/compress/zstd_cwksp.h +206 -69
  45. data/ext/zstdruby/libzstd/compress/zstd_double_fast.c +307 -132
  46. data/ext/zstdruby/libzstd/compress/zstd_double_fast.h +1 -1
  47. data/ext/zstdruby/libzstd/compress/zstd_fast.c +322 -143
  48. data/ext/zstdruby/libzstd/compress/zstd_fast.h +1 -1
  49. data/ext/zstdruby/libzstd/compress/zstd_lazy.c +1136 -174
  50. data/ext/zstdruby/libzstd/compress/zstd_lazy.h +59 -1
  51. data/ext/zstdruby/libzstd/compress/zstd_ldm.c +316 -213
  52. data/ext/zstdruby/libzstd/compress/zstd_ldm.h +9 -2
  53. data/ext/zstdruby/libzstd/compress/zstd_ldm_geartab.h +106 -0
  54. data/ext/zstdruby/libzstd/compress/zstd_opt.c +373 -150
  55. data/ext/zstdruby/libzstd/compress/zstd_opt.h +1 -1
  56. data/ext/zstdruby/libzstd/compress/zstdmt_compress.c +152 -444
  57. data/ext/zstdruby/libzstd/compress/zstdmt_compress.h +31 -113
  58. data/ext/zstdruby/libzstd/decompress/huf_decompress.c +1044 -403
  59. data/ext/zstdruby/libzstd/decompress/huf_decompress_amd64.S +571 -0
  60. data/ext/zstdruby/libzstd/decompress/zstd_ddict.c +9 -9
  61. data/ext/zstdruby/libzstd/decompress/zstd_ddict.h +2 -2
  62. data/ext/zstdruby/libzstd/decompress/zstd_decompress.c +450 -105
  63. data/ext/zstdruby/libzstd/decompress/zstd_decompress_block.c +913 -273
  64. data/ext/zstdruby/libzstd/decompress/zstd_decompress_block.h +14 -5
  65. data/ext/zstdruby/libzstd/decompress/zstd_decompress_internal.h +59 -12
  66. data/ext/zstdruby/libzstd/deprecated/zbuff.h +1 -1
  67. data/ext/zstdruby/libzstd/deprecated/zbuff_common.c +1 -1
  68. data/ext/zstdruby/libzstd/deprecated/zbuff_compress.c +24 -4
  69. data/ext/zstdruby/libzstd/deprecated/zbuff_decompress.c +1 -1
  70. data/ext/zstdruby/libzstd/dictBuilder/cover.c +55 -38
  71. data/ext/zstdruby/libzstd/dictBuilder/cover.h +7 -6
  72. data/ext/zstdruby/libzstd/dictBuilder/divsufsort.c +1 -1
  73. data/ext/zstdruby/libzstd/dictBuilder/fastcover.c +43 -34
  74. data/ext/zstdruby/libzstd/dictBuilder/zdict.c +128 -58
  75. data/ext/zstdruby/libzstd/dll/example/Makefile +1 -1
  76. data/ext/zstdruby/libzstd/dll/example/README.md +16 -22
  77. data/ext/zstdruby/libzstd/legacy/zstd_legacy.h +1 -1
  78. data/ext/zstdruby/libzstd/legacy/zstd_v01.c +8 -8
  79. data/ext/zstdruby/libzstd/legacy/zstd_v01.h +1 -1
  80. data/ext/zstdruby/libzstd/legacy/zstd_v02.c +9 -9
  81. data/ext/zstdruby/libzstd/legacy/zstd_v02.h +1 -1
  82. data/ext/zstdruby/libzstd/legacy/zstd_v03.c +9 -9
  83. data/ext/zstdruby/libzstd/legacy/zstd_v03.h +1 -1
  84. data/ext/zstdruby/libzstd/legacy/zstd_v04.c +10 -10
  85. data/ext/zstdruby/libzstd/legacy/zstd_v04.h +1 -1
  86. data/ext/zstdruby/libzstd/legacy/zstd_v05.c +13 -13
  87. data/ext/zstdruby/libzstd/legacy/zstd_v05.h +1 -1
  88. data/ext/zstdruby/libzstd/legacy/zstd_v06.c +13 -13
  89. data/ext/zstdruby/libzstd/legacy/zstd_v06.h +1 -1
  90. data/ext/zstdruby/libzstd/legacy/zstd_v07.c +13 -13
  91. data/ext/zstdruby/libzstd/legacy/zstd_v07.h +1 -1
  92. data/ext/zstdruby/libzstd/libzstd.mk +185 -0
  93. data/ext/zstdruby/libzstd/libzstd.pc.in +4 -3
  94. data/ext/zstdruby/libzstd/modulemap/module.modulemap +4 -0
  95. data/ext/zstdruby/libzstd/{dictBuilder/zdict.h → zdict.h} +154 -7
  96. data/ext/zstdruby/libzstd/zstd.h +699 -214
  97. data/ext/zstdruby/libzstd/{common/zstd_errors.h → zstd_errors.h} +2 -1
  98. data/ext/zstdruby/zstdruby.c +2 -2
  99. data/lib/zstd-ruby/version.rb +1 -1
  100. metadata +15 -6
  101. data/.travis.yml +0 -14
@@ -1,5 +1,5 @@
  /*
- * Copyright (c) 2018-2020, Facebook, Inc.
+ * Copyright (c) Facebook, Inc.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -16,24 +16,33 @@
  #include <string.h> /* memset */
  #include <time.h> /* clock */
  
+ #ifndef ZDICT_STATIC_LINKING_ONLY
+ # define ZDICT_STATIC_LINKING_ONLY
+ #endif
+
  #include "../common/mem.h" /* read */
  #include "../common/pool.h"
  #include "../common/threading.h"
- #include "cover.h"
  #include "../common/zstd_internal.h" /* includes zstd.h */
- #ifndef ZDICT_STATIC_LINKING_ONLY
- #define ZDICT_STATIC_LINKING_ONLY
- #endif
- #include "zdict.h"
+ #include "../compress/zstd_compress_internal.h" /* ZSTD_hash*() */
+ #include "../zdict.h"
+ #include "cover.h"
  
  
  /*-*************************************
  * Constants
  ***************************************/
+ /**
+ * There are 32bit indexes used to ref samples, so limit samples size to 4GB
+ * on 64bit builds.
+ * For 32bit builds we choose 1 GB.
+ * Most 32bit platforms have 2GB user-mode addressable space and we allocate a large
+ * contiguous buffer, so 1GB is already a high limit.
+ */
  #define FASTCOVER_MAX_SAMPLES_SIZE (sizeof(size_t) == 8 ? ((unsigned)-1) : ((unsigned)1 GB))
  #define FASTCOVER_MAX_F 31
  #define FASTCOVER_MAX_ACCEL 10
- #define DEFAULT_SPLITPOINT 0.75
+ #define FASTCOVER_DEFAULT_SPLITPOINT 0.75
  #define DEFAULT_F 20
  #define DEFAULT_ACCEL 1
  
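The new comment on `FASTCOVER_MAX_SAMPLES_SIZE` explains why the limit depends on pointer width. A minimal sketch of that check follows, assuming `GB` expands to a multiplication by 2^30 (its definition is not shown in this diff) and that `unsigned` is 32 bits wide. The `sketch_` names are hypothetical and not part of zstd.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 1 GB means 2^30 bytes, as the comment in the hunk implies. */
#define SKETCH_GB (1u << 30)

/* Mirrors the shape of FASTCOVER_MAX_SAMPLES_SIZE, but takes the pointer
 * width as a parameter so both branches can be exercised on one machine. */
static unsigned sketch_max_samples_size(size_t pointerBytes)
{
    assert(pointerBytes == 4 || pointerBytes == 8);
    return pointerBytes == 8 ? (unsigned)-1 : (unsigned)SKETCH_GB;
}

/* Returns 1 if a sample set of totalSize bytes stays within the limit. */
static int sketch_samples_fit(unsigned long long totalSize, size_t pointerBytes)
{
    return totalSize <= sketch_max_samples_size(pointerBytes);
}
```

With a 32-bit `unsigned`, `(unsigned)-1` is 4 GB minus one byte, which is the largest offset a 32-bit sample index can reference.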
@@ -41,50 +50,50 @@
  /*-*************************************
  * Console display
  ***************************************/
- static int g_displayLevel = 2;
+ #ifndef LOCALDISPLAYLEVEL
+ static int g_displayLevel = 0;
+ #endif
+ #undef DISPLAY
  #define DISPLAY(...) \
  { \
  fprintf(stderr, __VA_ARGS__); \
  fflush(stderr); \
  }
+ #undef LOCALDISPLAYLEVEL
  #define LOCALDISPLAYLEVEL(displayLevel, l, ...) \
  if (displayLevel >= l) { \
  DISPLAY(__VA_ARGS__); \
  } /* 0 : no display; 1: errors; 2: default; 3: details; 4: debug */
+ #undef DISPLAYLEVEL
  #define DISPLAYLEVEL(l, ...) LOCALDISPLAYLEVEL(g_displayLevel, l, __VA_ARGS__)
  
+ #ifndef LOCALDISPLAYUPDATE
+ static const clock_t g_refreshRate = CLOCKS_PER_SEC * 15 / 100;
+ static clock_t g_time = 0;
+ #endif
+ #undef LOCALDISPLAYUPDATE
  #define LOCALDISPLAYUPDATE(displayLevel, l, ...) \
  if (displayLevel >= l) { \
- if ((clock() - g_time > refreshRate) || (displayLevel >= 4)) { \
+ if ((clock() - g_time > g_refreshRate) || (displayLevel >= 4)) { \
  g_time = clock(); \
  DISPLAY(__VA_ARGS__); \
  } \
  }
+ #undef DISPLAYUPDATE
  #define DISPLAYUPDATE(l, ...) LOCALDISPLAYUPDATE(g_displayLevel, l, __VA_ARGS__)
- static const clock_t refreshRate = CLOCKS_PER_SEC * 15 / 100;
- static clock_t g_time = 0;
  
  
  /*-*************************************
  * Hash Functions
  ***************************************/
- static const U64 prime6bytes = 227718039650203ULL;
- static size_t ZSTD_hash6(U64 u, U32 h) { return (size_t)(((u << (64-48)) * prime6bytes) >> (64-h)) ; }
- static size_t ZSTD_hash6Ptr(const void* p, U32 h) { return ZSTD_hash6(MEM_readLE64(p), h); }
-
- static const U64 prime8bytes = 0xCF1BBCDCB7A56463ULL;
- static size_t ZSTD_hash8(U64 u, U32 h) { return (size_t)(((u) * prime8bytes) >> (64-h)) ; }
- static size_t ZSTD_hash8Ptr(const void* p, U32 h) { return ZSTD_hash8(MEM_readLE64(p), h); }
-
-
  /**
- * Hash the d-byte value pointed to by p and mod 2^f
+ * Hash the d-byte value pointed to by p and mod 2^f into the frequency vector
  */
- static size_t FASTCOVER_hashPtrToIndex(const void* p, U32 h, unsigned d) {
+ static size_t FASTCOVER_hashPtrToIndex(const void* p, U32 f, unsigned d) {
  if (d == 6) {
- return ZSTD_hash6Ptr(p, h) & ((1 << h) - 1);
+ return ZSTD_hash6Ptr(p, f);
  }
- return ZSTD_hash8Ptr(p, h) & ((1 << h) - 1);
+ return ZSTD_hash8Ptr(p, f);
  }
  
  
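The hunk above drops the `& ((1 << h) - 1)` mask from `FASTCOVER_hashPtrToIndex`. Judging from the removed local definitions, the hash keeps only the top bits of a 64-bit product, so the result already fits in `f` bits and the mask changes nothing. The sketch below illustrates this using the `prime8bytes` constant from the removed lines. It assumes the shared `ZSTD_hash8Ptr` in `zstd_compress_internal.h` has the same shape, which this diff does not show, and the `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Constant copied from the removed prime8bytes line in the hunk above. */
static const uint64_t kSketchPrime8 = 0xCF1BBCDCB7A56463ULL;

/* Stand-in for the removed ZSTD_hash8(): keeps the top f bits of the product.
 * f is limited to 1..31 here, matching FASTCOVER_MAX_F. */
static uint64_t sketch_hash8(uint64_t u, unsigned f)
{
    assert(f >= 1 && f <= 31);
    return (u * kSketchPrime8) >> (64 - f);
}

/* Returns 1 if masking the hash with (2^f - 1) leaves it unchanged. */
static int sketch_mask_is_redundant(uint64_t u, unsigned f)
{
    uint64_t const h = sketch_hash8(u, f);
    uint64_t const mask = ((uint64_t)1 << f) - 1;
    return (h & mask) == h;
}
```

Because a 64-bit value shifted right by `64 - f` has at most `f` significant bits, `sketch_mask_is_redundant` returns 1 for every input in the allowed range of `f`.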
@@ -461,20 +470,20 @@ typedef struct FASTCOVER_tryParameters_data_s {
  * This function is thread safe if zstd is compiled with multithreaded support.
  * It takes its parameters as an *OWNING* opaque pointer to support threading.
  */
- static void FASTCOVER_tryParameters(void *opaque)
+ static void FASTCOVER_tryParameters(void* opaque)
  {
  /* Save parameters as local variables */
- FASTCOVER_tryParameters_data_t *const data = (FASTCOVER_tryParameters_data_t *)opaque;
+ FASTCOVER_tryParameters_data_t *const data = (FASTCOVER_tryParameters_data_t*)opaque;
  const FASTCOVER_ctx_t *const ctx = data->ctx;
  const ZDICT_cover_params_t parameters = data->parameters;
  size_t dictBufferCapacity = data->dictBufferCapacity;
  size_t totalCompressedSize = ERROR(GENERIC);
  /* Initialize array to keep track of frequency of dmer within activeSegment */
- U16* segmentFreqs = (U16 *)calloc(((U64)1 << ctx->f), sizeof(U16));
+ U16* segmentFreqs = (U16*)calloc(((U64)1 << ctx->f), sizeof(U16));
  /* Allocate space for hash table, dict, and freqs */
- BYTE *const dict = (BYTE * const)malloc(dictBufferCapacity);
+ BYTE *const dict = (BYTE*)malloc(dictBufferCapacity);
  COVER_dictSelection_t selection = COVER_dictSelectionError(ERROR(GENERIC));
- U32 *freqs = (U32*) malloc(((U64)1 << ctx->f) * sizeof(U32));
+ U32* freqs = (U32*) malloc(((U64)1 << ctx->f) * sizeof(U32));
  if (!segmentFreqs || !dict || !freqs) {
  DISPLAYLEVEL(1, "Failed to allocate buffers: out of memory\n");
  goto _cleanup;
@@ -486,7 +495,7 @@ static void FASTCOVER_tryParameters(void *opaque)
  parameters, segmentFreqs);
  
  const unsigned nbFinalizeSamples = (unsigned)(ctx->nbTrainSamples * ctx->accelParams.finalize / 100);
- selection = COVER_selectDict(dict + tail, dictBufferCapacity - tail,
+ selection = COVER_selectDict(dict + tail, dictBufferCapacity, dictBufferCapacity - tail,
  ctx->samples, ctx->samplesSizes, nbFinalizeSamples, ctx->nbTrainSamples, ctx->nbSamples, parameters, ctx->offsets,
  totalCompressedSize);
  
@@ -547,7 +556,7 @@ ZDICT_trainFromBuffer_fastCover(void* dictBuffer, size_t dictBufferCapacity,
  ZDICT_cover_params_t coverParams;
  FASTCOVER_accel_t accelParams;
  /* Initialize global data */
- g_displayLevel = parameters.zParams.notificationLevel;
+ g_displayLevel = (int)parameters.zParams.notificationLevel;
  /* Assign splitPoint and f if not provided */
  parameters.splitPoint = 1.0;
  parameters.f = parameters.f == 0 ? DEFAULT_F : parameters.f;
@@ -617,7 +626,7 @@ ZDICT_optimizeTrainFromBuffer_fastCover(
  /* constants */
  const unsigned nbThreads = parameters->nbThreads;
  const double splitPoint =
- parameters->splitPoint <= 0.0 ? DEFAULT_SPLITPOINT : parameters->splitPoint;
+ parameters->splitPoint <= 0.0 ? FASTCOVER_DEFAULT_SPLITPOINT : parameters->splitPoint;
  const unsigned kMinD = parameters->d == 0 ? 6 : parameters->d;
  const unsigned kMaxD = parameters->d == 0 ? 8 : parameters->d;
  const unsigned kMinK = parameters->k == 0 ? 50 : parameters->k;
@@ -630,7 +639,7 @@ ZDICT_optimizeTrainFromBuffer_fastCover(
  const unsigned accel = parameters->accel == 0 ? DEFAULT_ACCEL : parameters->accel;
  const unsigned shrinkDict = 0;
  /* Local variables */
- const int displayLevel = parameters->zParams.notificationLevel;
+ const int displayLevel = (int)parameters->zParams.notificationLevel;
  unsigned iteration = 1;
  unsigned d;
  unsigned k;
@@ -714,7 +723,7 @@ ZDICT_optimizeTrainFromBuffer_fastCover(
  data->parameters.splitPoint = splitPoint;
  data->parameters.steps = kSteps;
  data->parameters.shrinkDict = shrinkDict;
- data->parameters.zParams.notificationLevel = g_displayLevel;
+ data->parameters.zParams.notificationLevel = (unsigned)g_displayLevel;
  /* Check the parameters */
  if (!FASTCOVER_checkParameters(data->parameters, dictBufferCapacity,
  data->ctx->f, accel)) {
@@ -1,5 +1,5 @@
  /*
- * Copyright (c) 2016-2020, Yann Collet, Facebook, Inc.
+ * Copyright (c) Yann Collet, Facebook, Inc.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -23,9 +23,13 @@
  /* Unix Large Files support (>4GB) */
  #define _FILE_OFFSET_BITS 64
  #if (defined(__sun__) && (!defined(__LP64__))) /* Sun Solaris 32-bits requires specific definitions */
+ # ifndef _LARGEFILE_SOURCE
  # define _LARGEFILE_SOURCE
+ # endif
  #elif ! defined(__LP64__) /* No point defining Large file for 64 bit */
+ # ifndef _LARGEFILE64_SOURCE
  # define _LARGEFILE64_SOURCE
+ # endif
  #endif
  
  
@@ -37,18 +41,19 @@
  #include <stdio.h> /* fprintf, fopen, ftello64 */
  #include <time.h> /* clock */
  
+ #ifndef ZDICT_STATIC_LINKING_ONLY
+ # define ZDICT_STATIC_LINKING_ONLY
+ #endif
+ #define HUF_STATIC_LINKING_ONLY
+
  #include "../common/mem.h" /* read */
  #include "../common/fse.h" /* FSE_normalizeCount, FSE_writeNCount */
- #define HUF_STATIC_LINKING_ONLY
  #include "../common/huf.h" /* HUF_buildCTable, HUF_writeCTable */
  #include "../common/zstd_internal.h" /* includes zstd.h */
  #include "../common/xxhash.h" /* XXH64 */
- #include "divsufsort.h"
- #ifndef ZDICT_STATIC_LINKING_ONLY
- # define ZDICT_STATIC_LINKING_ONLY
- #endif
- #include "zdict.h"
  #include "../compress/zstd_compress_internal.h" /* ZSTD_loadCEntropy() */
+ #include "../zdict.h"
+ #include "divsufsort.h"
  
  
  /*-*************************************
@@ -62,14 +67,15 @@
  
  #define NOISELENGTH 32
  
- static const int g_compressionLevel_default = 3;
  static const U32 g_selectivity_default = 9;
  
  
  /*-*************************************
  * Console display
  ***************************************/
+ #undef DISPLAY
  #define DISPLAY(...) { fprintf(stderr, __VA_ARGS__); fflush( stderr ); }
+ #undef DISPLAYLEVEL
  #define DISPLAYLEVEL(l, ...) if (notificationLevel>=l) { DISPLAY(__VA_ARGS__); } /* 0 : no display; 1: errors; 2: default; 3: details; 4: debug */
  
  static clock_t ZDICT_clockSpan(clock_t nPrevious) { return clock() - nPrevious; }
@@ -105,20 +111,17 @@ size_t ZDICT_getDictHeaderSize(const void* dictBuffer, size_t dictSize)
  size_t headerSize;
  if (dictSize <= 8 || MEM_readLE32(dictBuffer) != ZSTD_MAGIC_DICTIONARY) return ERROR(dictionary_corrupted);
  
- { unsigned offcodeMaxValue = MaxOff;
- ZSTD_compressedBlockState_t* bs = (ZSTD_compressedBlockState_t*)malloc(sizeof(ZSTD_compressedBlockState_t));
+ { ZSTD_compressedBlockState_t* bs = (ZSTD_compressedBlockState_t*)malloc(sizeof(ZSTD_compressedBlockState_t));
  U32* wksp = (U32*)malloc(HUF_WORKSPACE_SIZE);
- short* offcodeNCount = (short*)malloc((MaxOff+1)*sizeof(short));
- if (!bs || !wksp || !offcodeNCount) {
+ if (!bs || !wksp) {
  headerSize = ERROR(memory_allocation);
  } else {
  ZSTD_reset_compressedBlockState(bs);
- headerSize = ZSTD_loadCEntropy(bs, wksp, offcodeNCount, &offcodeMaxValue, dictBuffer, dictSize);
+ headerSize = ZSTD_loadCEntropy(bs, wksp, dictBuffer, dictSize);
  }
  
  free(bs);
  free(wksp);
- free(offcodeNCount);
  }
  
  return headerSize;
@@ -132,22 +135,32 @@ static unsigned ZDICT_NbCommonBytes (size_t val)
  if (MEM_isLittleEndian()) {
  if (MEM_64bits()) {
  # if defined(_MSC_VER) && defined(_WIN64)
- unsigned long r = 0;
- _BitScanForward64( &r, (U64)val );
- return (unsigned)(r>>3);
+ if (val != 0) {
+ unsigned long r;
+ _BitScanForward64(&r, (U64)val);
+ return (unsigned)(r >> 3);
+ } else {
+ /* Should not reach this code path */
+ __assume(0);
+ }
  # elif defined(__GNUC__) && (__GNUC__ >= 3)
- return (__builtin_ctzll((U64)val) >> 3);
+ return (unsigned)(__builtin_ctzll((U64)val) >> 3);
  # else
  static const int DeBruijnBytePos[64] = { 0, 0, 0, 0, 0, 1, 1, 2, 0, 3, 1, 3, 1, 4, 2, 7, 0, 2, 3, 6, 1, 5, 3, 5, 1, 3, 4, 4, 2, 5, 6, 7, 7, 0, 1, 2, 3, 3, 4, 6, 2, 6, 5, 5, 3, 4, 5, 6, 7, 1, 2, 4, 6, 4, 4, 5, 7, 2, 6, 5, 7, 6, 7, 7 };
  return DeBruijnBytePos[((U64)((val & -(long long)val) * 0x0218A392CDABBD3FULL)) >> 58];
  # endif
  } else { /* 32 bits */
  # if defined(_MSC_VER)
- unsigned long r=0;
- _BitScanForward( &r, (U32)val );
- return (unsigned)(r>>3);
+ if (val != 0) {
+ unsigned long r;
+ _BitScanForward(&r, (U32)val);
+ return (unsigned)(r >> 3);
+ } else {
+ /* Should not reach this code path */
+ __assume(0);
+ }
  # elif defined(__GNUC__) && (__GNUC__ >= 3)
- return (__builtin_ctz((U32)val) >> 3);
+ return (unsigned)(__builtin_ctz((U32)val) >> 3);
  # else
  static const int DeBruijnBytePos[32] = { 0, 0, 3, 0, 3, 1, 3, 0, 3, 2, 2, 1, 3, 2, 0, 1, 3, 3, 1, 2, 2, 2, 2, 0, 3, 1, 2, 0, 1, 0, 1, 1 };
  return DeBruijnBytePos[((U32)((val & -(S32)val) * 0x077CB531U)) >> 27];
@@ -156,11 +169,16 @@ static unsigned ZDICT_NbCommonBytes (size_t val)
  } else { /* Big Endian CPU */
  if (MEM_64bits()) {
  # if defined(_MSC_VER) && defined(_WIN64)
- unsigned long r = 0;
- _BitScanReverse64( &r, val );
- return (unsigned)(r>>3);
+ if (val != 0) {
+ unsigned long r;
+ _BitScanReverse64(&r, val);
+ return (unsigned)(r >> 3);
+ } else {
+ /* Should not reach this code path */
+ __assume(0);
+ }
  # elif defined(__GNUC__) && (__GNUC__ >= 3)
- return (__builtin_clzll(val) >> 3);
+ return (unsigned)(__builtin_clzll(val) >> 3);
  # else
  unsigned r;
  const unsigned n32 = sizeof(size_t)*4; /* calculate this way due to compiler complaining in 32-bits mode */
@@ -171,11 +189,16 @@ static unsigned ZDICT_NbCommonBytes (size_t val)
  # endif
  } else { /* 32 bits */
  # if defined(_MSC_VER)
- unsigned long r = 0;
- _BitScanReverse( &r, (unsigned long)val );
- return (unsigned)(r>>3);
+ if (val != 0) {
+ unsigned long r;
+ _BitScanReverse(&r, (unsigned long)val);
+ return (unsigned)(r >> 3);
+ } else {
+ /* Should not reach this code path */
+ __assume(0);
+ }
  # elif defined(__GNUC__) && (__GNUC__ >= 3)
- return (__builtin_clz((U32)val) >> 3);
+ return (unsigned)(__builtin_clz((U32)val) >> 3);
  # else
  unsigned r;
  if (!(val>>16)) { r=2; val>>=8; } else { r=0; val>>=24; }
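The `ZDICT_NbCommonBytes` hunks above all follow one pattern: find the index of a set bit, then shift right by 3 to turn a bit index into a byte index. The new `val != 0` guards reflect that a bit scan has no meaningful result for zero. A portable sketch of the little-endian case follows, assuming the caller passes a nonzero difference word (the callers are not shown in this diff). The `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count of trailing zero bits. val must be nonzero, which is the
 * same precondition the MSVC branches in the hunks now make explicit. */
static unsigned sketch_ctz64(uint64_t val)
{
    unsigned n = 0;
    assert(val != 0);
    while ((val & 1) == 0) {
        val >>= 1;
        n++;
    }
    return n;
}

/* Number of low-order bytes of val that are entirely zero.
 * A bit index divided by 8 (>> 3) gives the byte index. */
static unsigned sketch_nb_zero_low_bytes(uint64_t val)
{
    return sketch_ctz64(val) >> 3;
}
```

For example, a word whose lowest set bit is bit 16 has two all-zero low bytes, so the function returns 2.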
@@ -232,7 +255,7 @@ static dictItem ZDICT_analyzePos(
  U32 savings[LLIMIT] = {0};
  const BYTE* b = (const BYTE*)buffer;
  size_t maxLength = LLIMIT;
- size_t pos = suffix[start];
+ size_t pos = (size_t)suffix[start];
  U32 end = start;
  dictItem solution;
  
@@ -366,7 +389,7 @@ static dictItem ZDICT_analyzePos(
  savings[i] = savings[i-1] + (lengthList[i] * (i-3));
  
  DISPLAYLEVEL(4, "Selected dict at position %u, of length %u : saves %u (ratio: %.2f) \n",
- (unsigned)pos, (unsigned)maxLength, (unsigned)savings[maxLength], (double)savings[maxLength] / maxLength);
+ (unsigned)pos, (unsigned)maxLength, (unsigned)savings[maxLength], (double)savings[maxLength] / (double)maxLength);
  
  solution.pos = (U32)pos;
  solution.length = (U32)maxLength;
@@ -376,7 +399,7 @@ static dictItem ZDICT_analyzePos(
  { U32 id;
  for (id=start; id<end; id++) {
  U32 p, pEnd, length;
- U32 const testedPos = suffix[id];
+ U32 const testedPos = (U32)suffix[id];
  if (testedPos == pos)
  length = solution.length;
  else {
@@ -439,7 +462,7 @@ static U32 ZDICT_tryMerge(dictItem* table, dictItem elt, U32 eltNbToSkip, const
  
  if ((table[u].pos + table[u].length >= elt.pos) && (table[u].pos < elt.pos)) { /* overlap, existing < new */
  /* append */
- int const addedLength = (int)eltEnd - (table[u].pos + table[u].length);
+ int const addedLength = (int)eltEnd - (int)(table[u].pos + table[u].length);
  table[u].savings += elt.length / 8; /* rough approx bonus */
  if (addedLength > 0) { /* otherwise, elt fully included into existing */
  table[u].length += addedLength;
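The `addedLength` change in this hunk casts both operands to `int` before subtracting, so the difference is computed in signed arithmetic and a negative result (the new element lies fully inside the existing one) needs no unsigned wrap-around. A minimal sketch of the added form follows, assuming all values fit in `int`. The function name and parameters are hypothetical stand-ins for the `table[u]` fields.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Mirrors the added line: both sides are converted to int, then subtracted.
 * Precondition (assumed here): eltEnd and pos + length both fit in int. */
static int sketch_added_length(uint32_t eltEnd, uint32_t pos, uint32_t length)
{
    assert(eltEnd <= (uint32_t)INT_MAX);
    assert(pos + length <= (uint32_t)INT_MAX);
    return (int)eltEnd - (int)(pos + length);
}
```

A positive result means the merged entry grows by that many bytes. A zero or negative result means the new element adds nothing, which matches the `addedLength > 0` test in the hunk.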
@@ -532,6 +555,7 @@ static size_t ZDICT_trainBuffer_legacy(dictItem* dictList, U32 dictListSize,
  clock_t displayClock = 0;
  clock_t const refreshRate = CLOCKS_PER_SEC * 3 / 10;
  
+ # undef DISPLAYUPDATE
  # define DISPLAYUPDATE(l, ...) if (notificationLevel>=l) { \
  if (ZDICT_clockSpan(displayClock) > refreshRate) \
  { displayClock = clock(); DISPLAY(__VA_ARGS__); \
@@ -706,7 +730,7 @@ static void ZDICT_flatLit(unsigned* countLit)
  
  #define OFFCODE_MAX 30 /* only applicable to first block */
  static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
- unsigned compressionLevel,
+ int compressionLevel,
  const void* srcBuffer, const size_t* fileSizes, unsigned nbFiles,
  const void* dictBuffer, size_t dictBufferSize,
  unsigned notificationLevel)
@@ -741,7 +765,7 @@ static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
  memset(repOffset, 0, sizeof(repOffset));
  repOffset[1] = repOffset[4] = repOffset[8] = 1;
  memset(bestRepOffset, 0, sizeof(bestRepOffset));
- if (compressionLevel==0) compressionLevel = g_compressionLevel_default;
+ if (compressionLevel==0) compressionLevel = ZSTD_CLEVEL_DEFAULT;
  params = ZSTD_getParams(compressionLevel, averageSampleSize, dictBufferSize);
  
  esr.dict = ZSTD_createCDict_advanced(dictBuffer, dictBufferSize, ZSTD_dlm_byRef, ZSTD_dct_rawContent, params.cParams, ZSTD_defaultCMem);
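Two related changes meet in `ZDICT_analyzeEntropy`: `compressionLevel` becomes a signed `int`, and a level of 0 now resolves to `ZSTD_CLEVEL_DEFAULT` instead of the file-local constant. The sketch below shows the "0 means default" convention. It assumes `ZSTD_CLEVEL_DEFAULT` has the same value, 3, as the removed `g_compressionLevel_default`; the macro and function names are hypothetical.

```c
#include <assert.h>

/* Assumption: same value as the removed g_compressionLevel_default (3). */
#define SKETCH_CLEVEL_DEFAULT 3

/* Resolves a requested level: 0 selects the default, anything else passes
 * through unchanged. A signed type lets negative levels survive intact. */
static int sketch_resolve_level(int requested)
{
    return requested == 0 ? SKETCH_CLEVEL_DEFAULT : requested;
}
```

With the previous `unsigned` parameter, a negative level would have been converted to a very large positive number before reaching `ZSTD_getParams`.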
@@ -762,6 +786,13 @@ static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
  pos += fileSizes[u];
  }
  
+ if (notificationLevel >= 4) {
+ /* writeStats */
+ DISPLAYLEVEL(4, "Offset Code Frequencies : \n");
+ for (u=0; u<=offcodeMax; u++) {
+ DISPLAYLEVEL(4, "%2u :%7u \n", u, offcodeCount[u]);
+ } }
+
  /* analyze, build stats, starting with literals */
  { size_t maxNbBits = HUF_buildCTable (hufTable, countLit, 255, huffLog);
  if (HUF_isError(maxNbBits)) {
@@ -786,7 +817,7 @@ static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
  /* note : the result of this phase should be used to better appreciate the impact on statistics */
  
  total=0; for (u=0; u<=offcodeMax; u++) total+=offcodeCount[u];
- errorCode = FSE_normalizeCount(offcodeNCount, Offlog, offcodeCount, total, offcodeMax);
+ errorCode = FSE_normalizeCount(offcodeNCount, Offlog, offcodeCount, total, offcodeMax, /* useLowProbCount */ 1);
  if (FSE_isError(errorCode)) {
  eSize = errorCode;
  DISPLAYLEVEL(1, "FSE_normalizeCount error with offcodeCount \n");
@@ -795,7 +826,7 @@ static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
  Offlog = (U32)errorCode;
  
  total=0; for (u=0; u<=MaxML; u++) total+=matchLengthCount[u];
- errorCode = FSE_normalizeCount(matchLengthNCount, mlLog, matchLengthCount, total, MaxML);
+ errorCode = FSE_normalizeCount(matchLengthNCount, mlLog, matchLengthCount, total, MaxML, /* useLowProbCount */ 1);
  if (FSE_isError(errorCode)) {
  eSize = errorCode;
  DISPLAYLEVEL(1, "FSE_normalizeCount error with matchLengthCount \n");
@@ -804,7 +835,7 @@ static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
  mlLog = (U32)errorCode;
  
  total=0; for (u=0; u<=MaxLL; u++) total+=litLengthCount[u];
- errorCode = FSE_normalizeCount(litLengthNCount, llLog, litLengthCount, total, MaxLL);
+ errorCode = FSE_normalizeCount(litLengthNCount, llLog, litLengthCount, total, MaxLL, /* useLowProbCount */ 1);
  if (FSE_isError(errorCode)) {
  eSize = errorCode;
  DISPLAYLEVEL(1, "FSE_normalizeCount error with litLengthCount \n");
@@ -868,7 +899,7 @@ static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
  MEM_writeLE32(dstPtr+8, bestRepOffset[2].offset);
  #else
  /* at this stage, we don't use the result of "most common first offset",
- as the impact of statistics is not properly evaluated */
+ * as the impact of statistics is not properly evaluated */
  MEM_writeLE32(dstPtr+0, repStartValue[0]);
  MEM_writeLE32(dstPtr+4, repStartValue[1]);
  MEM_writeLE32(dstPtr+8, repStartValue[2]);
@@ -884,6 +915,17 @@ _cleanup:
  }
  
  
+ /**
+ * @returns the maximum repcode value
+ */
+ static U32 ZDICT_maxRep(U32 const reps[ZSTD_REP_NUM])
+ {
+ U32 maxRep = reps[0];
+ int r;
+ for (r = 1; r < ZSTD_REP_NUM; ++r)
+ maxRep = MAX(maxRep, reps[r]);
+ return maxRep;
+ }
  
  size_t ZDICT_finalizeDictionary(void* dictBuffer, size_t dictBufferCapacity,
  const void* customDictContent, size_t dictContentSize,
@@ -893,13 +935,15 @@ size_t ZDICT_finalizeDictionary(void* dictBuffer, size_t dictBufferCapacity,
  size_t hSize;
  #define HBUFFSIZE 256 /* should prove large enough for all entropy headers */
  BYTE header[HBUFFSIZE];
- int const compressionLevel = (params.compressionLevel == 0) ? g_compressionLevel_default : params.compressionLevel;
+ int const compressionLevel = (params.compressionLevel == 0) ? ZSTD_CLEVEL_DEFAULT : params.compressionLevel;
  U32 const notificationLevel = params.notificationLevel;
+ /* The final dictionary content must be at least as large as the largest repcode */
+ size_t const minContentSize = (size_t)ZDICT_maxRep(repStartValue);
+ size_t paddingSize;
  
  /* check conditions */
  DEBUGLOG(4, "ZDICT_finalizeDictionary");
  if (dictBufferCapacity < dictContentSize) return ERROR(dstSize_tooSmall);
- if (dictContentSize < ZDICT_CONTENTSIZE_MIN) return ERROR(srcSize_wrong);
  if (dictBufferCapacity < ZDICT_DICTSIZE_MIN) return ERROR(dstSize_tooSmall);
  
  /* dictionary header */
@@ -923,12 +967,43 @@ size_t ZDICT_finalizeDictionary(void* dictBuffer, size_t dictBufferCapacity,
  hSize += eSize;
  }
  
- /* copy elements in final buffer ; note : src and dst buffer can overlap */
- if (hSize + dictContentSize > dictBufferCapacity) dictContentSize = dictBufferCapacity - hSize;
- { size_t const dictSize = hSize + dictContentSize;
- char* dictEnd = (char*)dictBuffer + dictSize;
- memmove(dictEnd - dictContentSize, customDictContent, dictContentSize);
- memcpy(dictBuffer, header, hSize);
+ /* Shrink the content size if it doesn't fit in the buffer */
+ if (hSize + dictContentSize > dictBufferCapacity) {
+ dictContentSize = dictBufferCapacity - hSize;
+ }
+
+ /* Pad the dictionary content with zeros if it is too small */
+ if (dictContentSize < minContentSize) {
+ RETURN_ERROR_IF(hSize + minContentSize > dictBufferCapacity, dstSize_tooSmall,
+ "dictBufferCapacity too small to fit max repcode");
+ paddingSize = minContentSize - dictContentSize;
+ } else {
+ paddingSize = 0;
+ }
+
+ {
+ size_t const dictSize = hSize + paddingSize + dictContentSize;
+
+ /* The dictionary consists of the header, optional padding, and the content.
+ * The padding comes before the content because the "best" position in the
+ * dictionary is the last byte.
+ */
+ BYTE* const outDictHeader = (BYTE*)dictBuffer;
+ BYTE* const outDictPadding = outDictHeader + hSize;
+ BYTE* const outDictContent = outDictPadding + paddingSize;
+
+ assert(dictSize <= dictBufferCapacity);
+ assert(outDictContent + dictContentSize == (BYTE*)dictBuffer + dictSize);
+
+ /* First copy the customDictContent into its final location.
+ * `customDictContent` and `dictBuffer` may overlap, so we must
+ * do this before any other writes into the output buffer.
+ * Then copy the header & padding into the output buffer.
+ */
+ memmove(outDictContent, customDictContent, dictContentSize);
+ memcpy(outDictHeader, header, hSize);
+ memset(outDictPadding, 0, paddingSize);
+
  return dictSize;
  }
  }
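The rewritten tail of `ZDICT_finalizeDictionary` lays the output out as header, zero padding, then content, and moves the content first because it may overlap the output buffer. The sketch below reproduces that ordering in a standalone helper. It is a simplified illustration under stated assumptions, not the library function: `sketch_layout_dict` and its parameters are hypothetical, and it returns 0 on failure instead of a zstd error code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes header | zero padding | content into dst and returns the total
 * size, or 0 if dst cannot hold the header plus minContentSize bytes.
 * content may overlap dst; header must not. */
static size_t sketch_layout_dict(unsigned char* dst, size_t dstCapacity,
                                 const unsigned char* header, size_t hSize,
                                 const unsigned char* content, size_t contentSize,
                                 size_t minContentSize)
{
    size_t paddingSize = 0;
    if (hSize > dstCapacity) return 0;

    /* Shrink the content if it does not fit after the header. */
    if (contentSize > dstCapacity - hSize) contentSize = dstCapacity - hSize;

    /* Pad short content up to minContentSize, if there is room. */
    if (contentSize < minContentSize) {
        if (minContentSize > dstCapacity - hSize) return 0;
        paddingSize = minContentSize - contentSize;
    }

    {   unsigned char* const outHeader  = dst;
        unsigned char* const outPadding = outHeader + hSize;
        unsigned char* const outContent = outPadding + paddingSize;

        /* Move the content first: it may alias dst, so it has to reach
         * its final position before the header and padding overwrite it. */
        memmove(outContent, content, contentSize);
        memcpy(outHeader, header, hSize);
        memset(outPadding, 0, paddingSize);
        return hSize + paddingSize + contentSize;
    }
}
```

Placing the padding before the content keeps the real content at the end of the dictionary, which the comment in the hunk describes as the "best" position.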
@@ -939,7 +1014,7 @@ static size_t ZDICT_addEntropyTablesFromBuffer_advanced(
  const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples,
  ZDICT_params_t params)
  {
- int const compressionLevel = (params.compressionLevel == 0) ? g_compressionLevel_default : params.compressionLevel;
+ int const compressionLevel = (params.compressionLevel == 0) ? ZSTD_CLEVEL_DEFAULT : params.compressionLevel;
  U32 const notificationLevel = params.notificationLevel;
  size_t hSize = 8;
  
@@ -968,16 +1043,11 @@ static size_t ZDICT_addEntropyTablesFromBuffer_advanced(
968
1043
  return MIN(dictBufferCapacity, hSize+dictContentSize);
969
1044
  }
970
1045
 
971
- /* Hidden declaration for dbio.c */
972
- size_t ZDICT_trainFromBuffer_unsafe_legacy(
973
- void* dictBuffer, size_t maxDictSize,
974
- const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples,
975
- ZDICT_legacy_params_t params);
976
1046
  /*! ZDICT_trainFromBuffer_unsafe_legacy() :
977
- * Warning : `samplesBuffer` must be followed by noisy guard band.
1047
+ * Warning : `samplesBuffer` must be followed by noisy guard band !!!
978
1048
  * @return : size of dictionary, or an error code which can be tested with ZDICT_isError()
979
1049
  */
980
- size_t ZDICT_trainFromBuffer_unsafe_legacy(
1050
+ static size_t ZDICT_trainFromBuffer_unsafe_legacy(
981
1051
  void* dictBuffer, size_t maxDictSize,
982
1052
  const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples,
983
1053
  ZDICT_legacy_params_t params)
@@ -1114,8 +1184,8 @@ size_t ZDICT_trainFromBuffer(void* dictBuffer, size_t dictBufferCapacity,
1114
1184
  memset(&params, 0, sizeof(params));
1115
1185
  params.d = 8;
1116
1186
  params.steps = 4;
1117
- /* Default to level 6 since no compression level information is available */
1118
- params.zParams.compressionLevel = 3;
1187
+ /* Use default level since no compression level information is available */
1188
+ params.zParams.compressionLevel = ZSTD_CLEVEL_DEFAULT;
1119
1189
  #if defined(DEBUGLEVEL) && (DEBUGLEVEL>=1)
1120
1190
  params.zParams.notificationLevel = DEBUGLEVEL;
1121
1191
  #endif
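Both compression-level hunks above replace a hard-coded default with `ZSTD_CLEVEL_DEFAULT`; the removed comment mentioned level 6 while the removed code actually assigned 3. The fallback rule can be sketched as follows. The helper name `resolve_compression_level` and the macro `SKETCH_CLEVEL_DEFAULT` are hypothetical; the value 3 is assumed to match the `ZSTD_CLEVEL_DEFAULT` definition in `zstd.h`.

```c
/* Assumption: mirrors ZSTD_CLEVEL_DEFAULT, which zstd.h defines as 3. */
#define SKETCH_CLEVEL_DEFAULT 3

/* Hypothetical helper: a requested level of 0 means "no level given",
 * so the library default is substituted; any other value passes through. */
static int resolve_compression_level(int requested)
{
    return (requested == 0) ? SKETCH_CLEVEL_DEFAULT : requested;
}
```

Tying the fallback to a single named constant keeps the dictionary builder consistent with the rest of the library if the default ever changes.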
@@ -1,5 +1,5 @@
1
1
  # ################################################################
2
- # Copyright (c) 2016-2020, Yann Collet, Facebook, Inc.
2
+ # Copyright (c) Yann Collet, Facebook, Inc.
3
3
  # All rights reserved.
4
4
  #
5
5
  # This source code is licensed under both the BSD-style license (found in the
@@ -1,23 +1,21 @@
1
- ZSTD Windows binary package
2
- ====================================
1
+ # ZSTD Windows binary package
3
2
 
4
- #### The package contents
3
+ ## The package contents
5
4
 
6
- - `zstd.exe` : Command Line Utility, supporting gzip-like arguments
7
- - `dll\libzstd.dll` : The ZSTD dynamic library (DLL)
8
- - `dll\libzstd.lib` : The import library of the ZSTD dynamic library (DLL) for Visual C++
9
- - `example\` : The example of usage of the ZSTD library
10
- - `include\` : Header files required by the ZSTD library
5
+ - `zstd.exe` : Command Line Utility, supporting gzip-like arguments
6
+ - `dll\libzstd.dll` : The ZSTD dynamic library (DLL)
7
+ - `dll\libzstd.lib` : The import library of the ZSTD dynamic library (DLL) for Visual C++
8
+ - `example\` : The example of usage of the ZSTD library
9
+ - `include\` : Header files required by the ZSTD library
11
10
  - `static\libzstd_static.lib` : The static ZSTD library (LIB)
12
11
 
13
-
14
- #### Usage of Command Line Interface
12
+ ## Usage of Command Line Interface
15
13
 
16
14
  Command Line Interface (CLI) supports gzip-like arguments.
17
15
  By default CLI takes an input file and compresses it to an output file:
18
- ```
16
+
19
17
  Usage: zstd [arg] [input] [output]
20
- ```
18
+
21
19
  The full list of commands for CLI can be obtained with `-h` or `-H`. The ratio can
22
20
  be improved with commands from `-3` to `-16` but higher levels also have slower
23
21
  compression. CLI includes in-memory compression benchmark module with compression
@@ -25,36 +23,32 @@ levels starting from `-b` and ending with `-e` with iteration time of `-i` secon
25
23
  CLI supports aggregation of parameters i.e. `-b1`, `-e18`, and `-i1` can be joined
26
24
  into `-b1e18i1`.
27
25
 
28
-
29
- #### The example of usage of static and dynamic ZSTD libraries with gcc/MinGW
26
+ ## The example of usage of static and dynamic ZSTD libraries with gcc/MinGW
30
27
 
31
28
  Use `cd example` and `make` to build `fullbench-dll` and `fullbench-lib`.
32
29
  `fullbench-dll` uses a dynamic ZSTD library from the `dll` directory.
33
30
  `fullbench-lib` uses a static ZSTD library from the `lib` directory.
34
31
 
35
-
36
- #### Using ZSTD DLL with gcc/MinGW
32
+ ## Using ZSTD DLL with gcc/MinGW
37
33
 
38
34
  The header files from `include\` and the dynamic library `dll\libzstd.dll`
39
35
  are required to compile a project using gcc/MinGW.
40
36
  The dynamic library has to be added to linking options.
41
37
  It means that if a project that uses ZSTD consists of a single `test-dll.c`
42
38
  file it should be linked with `dll\libzstd.dll`. For example:
43
- ```
39
+
44
40
  gcc $(CFLAGS) -Iinclude\ test-dll.c -o test-dll dll\libzstd.dll
45
- ```
46
- The compiled executable will require ZSTD DLL which is available at `dll\libzstd.dll`.
47
41
 
42
+ The compiled executable will require ZSTD DLL which is available at `dll\libzstd.dll`.
48
43
 
49
- #### The example of usage of static and dynamic ZSTD libraries with Visual C++
44
+ ## The example of usage of static and dynamic ZSTD libraries with Visual C++
50
45
 
51
46
  Open `example\fullbench-dll.sln` to compile `fullbench-dll` that uses a
52
47
  dynamic ZSTD library from the `dll` directory. The solution works with Visual C++
53
48
  2010 or newer. When the solution is opened with a Visual C++ version newer than 2010,
54
49
  the solution will be upgraded to the current version.
55
50
 
56
-
57
- #### Using ZSTD DLL with Visual C++
51
+ ## Using ZSTD DLL with Visual C++
58
52
 
59
53
  The header files from `include\` and the import library `dll\libzstd.lib`
60
54
  are required to compile a project using Visual C++.