extzstd 0.3 → 0.3.1
- checksums.yaml +4 -4
- data/HISTORY.ja.md +8 -0
- data/README.md +1 -1
- data/contrib/zstd/CHANGELOG +94 -0
- data/contrib/zstd/CONTRIBUTING.md +351 -1
- data/contrib/zstd/Makefile +32 -10
- data/contrib/zstd/README.md +33 -10
- data/contrib/zstd/TESTING.md +2 -2
- data/contrib/zstd/appveyor.yml +42 -4
- data/contrib/zstd/lib/Makefile +128 -60
- data/contrib/zstd/lib/README.md +47 -16
- data/contrib/zstd/lib/common/bitstream.h +38 -39
- data/contrib/zstd/lib/common/compiler.h +40 -5
- data/contrib/zstd/lib/common/cpu.h +1 -1
- data/contrib/zstd/lib/common/debug.c +11 -31
- data/contrib/zstd/lib/common/debug.h +11 -31
- data/contrib/zstd/lib/common/entropy_common.c +13 -33
- data/contrib/zstd/lib/common/error_private.c +2 -1
- data/contrib/zstd/lib/common/error_private.h +6 -2
- data/contrib/zstd/lib/common/fse.h +12 -32
- data/contrib/zstd/lib/common/fse_decompress.c +12 -35
- data/contrib/zstd/lib/common/huf.h +15 -33
- data/contrib/zstd/lib/common/mem.h +75 -2
- data/contrib/zstd/lib/common/pool.c +8 -4
- data/contrib/zstd/lib/common/pool.h +2 -2
- data/contrib/zstd/lib/common/threading.c +50 -4
- data/contrib/zstd/lib/common/threading.h +36 -4
- data/contrib/zstd/lib/common/xxhash.c +23 -35
- data/contrib/zstd/lib/common/xxhash.h +11 -31
- data/contrib/zstd/lib/common/zstd_common.c +1 -1
- data/contrib/zstd/lib/common/zstd_errors.h +2 -1
- data/contrib/zstd/lib/common/zstd_internal.h +154 -26
- data/contrib/zstd/lib/compress/fse_compress.c +17 -40
- data/contrib/zstd/lib/compress/hist.c +15 -35
- data/contrib/zstd/lib/compress/hist.h +12 -32
- data/contrib/zstd/lib/compress/huf_compress.c +92 -92
- data/contrib/zstd/lib/compress/zstd_compress.c +1191 -1330
- data/contrib/zstd/lib/compress/zstd_compress_internal.h +317 -55
- data/contrib/zstd/lib/compress/zstd_compress_literals.c +158 -0
- data/contrib/zstd/lib/compress/zstd_compress_literals.h +29 -0
- data/contrib/zstd/lib/compress/zstd_compress_sequences.c +419 -0
- data/contrib/zstd/lib/compress/zstd_compress_sequences.h +54 -0
- data/contrib/zstd/lib/compress/zstd_compress_superblock.c +845 -0
- data/contrib/zstd/lib/compress/zstd_compress_superblock.h +32 -0
- data/contrib/zstd/lib/compress/zstd_cwksp.h +525 -0
- data/contrib/zstd/lib/compress/zstd_double_fast.c +65 -43
- data/contrib/zstd/lib/compress/zstd_double_fast.h +2 -2
- data/contrib/zstd/lib/compress/zstd_fast.c +92 -66
- data/contrib/zstd/lib/compress/zstd_fast.h +2 -2
- data/contrib/zstd/lib/compress/zstd_lazy.c +74 -42
- data/contrib/zstd/lib/compress/zstd_lazy.h +1 -1
- data/contrib/zstd/lib/compress/zstd_ldm.c +32 -10
- data/contrib/zstd/lib/compress/zstd_ldm.h +7 -2
- data/contrib/zstd/lib/compress/zstd_opt.c +81 -114
- data/contrib/zstd/lib/compress/zstd_opt.h +1 -1
- data/contrib/zstd/lib/compress/zstdmt_compress.c +95 -51
- data/contrib/zstd/lib/compress/zstdmt_compress.h +3 -2
- data/contrib/zstd/lib/decompress/huf_decompress.c +76 -60
- data/contrib/zstd/lib/decompress/zstd_ddict.c +12 -8
- data/contrib/zstd/lib/decompress/zstd_ddict.h +2 -2
- data/contrib/zstd/lib/decompress/zstd_decompress.c +292 -172
- data/contrib/zstd/lib/decompress/zstd_decompress_block.c +459 -338
- data/contrib/zstd/lib/decompress/zstd_decompress_block.h +3 -3
- data/contrib/zstd/lib/decompress/zstd_decompress_internal.h +18 -4
- data/contrib/zstd/lib/deprecated/zbuff.h +9 -8
- data/contrib/zstd/lib/deprecated/zbuff_common.c +2 -2
- data/contrib/zstd/lib/deprecated/zbuff_compress.c +1 -1
- data/contrib/zstd/lib/deprecated/zbuff_decompress.c +1 -1
- data/contrib/zstd/lib/dictBuilder/cover.c +164 -54
- data/contrib/zstd/lib/dictBuilder/cover.h +52 -7
- data/contrib/zstd/lib/dictBuilder/fastcover.c +60 -43
- data/contrib/zstd/lib/dictBuilder/zdict.c +43 -19
- data/contrib/zstd/lib/dictBuilder/zdict.h +56 -28
- data/contrib/zstd/lib/legacy/zstd_legacy.h +8 -4
- data/contrib/zstd/lib/legacy/zstd_v01.c +110 -110
- data/contrib/zstd/lib/legacy/zstd_v01.h +1 -1
- data/contrib/zstd/lib/legacy/zstd_v02.c +23 -13
- data/contrib/zstd/lib/legacy/zstd_v02.h +1 -1
- data/contrib/zstd/lib/legacy/zstd_v03.c +23 -13
- data/contrib/zstd/lib/legacy/zstd_v03.h +1 -1
- data/contrib/zstd/lib/legacy/zstd_v04.c +30 -17
- data/contrib/zstd/lib/legacy/zstd_v04.h +1 -1
- data/contrib/zstd/lib/legacy/zstd_v05.c +113 -102
- data/contrib/zstd/lib/legacy/zstd_v05.h +2 -2
- data/contrib/zstd/lib/legacy/zstd_v06.c +20 -18
- data/contrib/zstd/lib/legacy/zstd_v06.h +1 -1
- data/contrib/zstd/lib/legacy/zstd_v07.c +25 -19
- data/contrib/zstd/lib/legacy/zstd_v07.h +1 -1
- data/contrib/zstd/lib/libzstd.pc.in +3 -2
- data/contrib/zstd/lib/zstd.h +265 -88
- data/ext/extzstd.h +1 -1
- data/ext/libzstd_conf.h +8 -0
- data/ext/zstd_common.c +1 -3
- data/ext/zstd_compress.c +3 -3
- data/ext/zstd_decompress.c +1 -5
- data/ext/zstd_dictbuilder.c +2 -3
- data/ext/zstd_dictbuilder_fastcover.c +1 -3
- data/ext/zstd_legacy_v01.c +2 -0
- data/ext/zstd_legacy_v02.c +2 -0
- data/ext/zstd_legacy_v03.c +2 -0
- data/ext/zstd_legacy_v04.c +2 -0
- data/ext/zstd_legacy_v05.c +2 -0
- data/ext/zstd_legacy_v06.c +2 -0
- data/ext/zstd_legacy_v07.c +2 -0
- data/lib/extzstd.rb +18 -10
- data/lib/extzstd/version.rb +1 -1
- metadata +15 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 3f61d75e29eb86642f72138584ffc02ff36e86318dc48c07dbb176fc26f72b9c
|
4
|
+
data.tar.gz: 539f64336ce50da00e14e3920545a3f27842febe3ce8ec47387b408110fc1149
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 52279e275616d6d179464deda78ff289abc07b42ebf7f111c79b89d6bc0ba9092290494aaefb7b9631fb3ed37d7fcd60e3f337d691a8990b36155541442fad65
|
7
|
+
data.tar.gz: 0d46cccb64f150ecf3b29459bc55b0aecfa1b43dcaa68c46f60c1a863d1827edb71acbfb4491ee777acde3bf68bdd8faf841c83cd00863172a2943689b36e482
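A minimal Python sketch of how digests like the ones above could be checked by hand, assuming `data.tar.gz` has already been extracted from the `.gem` archive. The helper names `file_digest_hex` and `digest_matches` are illustrative only and are not part of extzstd or RubyGems:

```python
import hashlib


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 16):
    """Return the hex digest of the file at `path`, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with an expected hex string from checksums.yaml."""
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()


# Hypothetical usage, with the SHA256 value copied from the diff above:
# digest_matches(
#     "data.tar.gz",
#     "539f64336ce50da00e14e3920545a3f27842febe3ce8ec47387b408110fc1149",
# )
```

Passing `algorithm="sha512"` checks the longer digests in the same way.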
|
data/HISTORY.ja.md
CHANGED
@@ -1,5 +1,13 @@
|
|
1
1
|
# extzstd change history
|
2
2
|
|
3
|
+
## extzstd-0.3.1 (Saturday, October 3, 2020 [Reiwa 2])
|
4
|
+
|
5
|
+
* Update to zstd-1.4.5
|
6
|
+
* Fix the keyword-argument usage that ruby-2.7 warns about
|
7
|
+
* Use `require` to load the ".so" file
|
8
|
+
See: [extlz4#2](https://github.com/dearblue/ruby-extlz4/issues/2)
|
9
|
+
|
10
|
+
|
3
11
|
## extzstd-0.3 (April 2019 [Heisei 31])
|
4
12
|
|
5
13
|
* Update to zstd-1.4.0
|
data/README.md
CHANGED
data/contrib/zstd/CHANGELOG
CHANGED
@@ -1,3 +1,97 @@
|
|
1
|
+
v1.4.5
|
2
|
+
fix : Compression ratio regression on huge files (> 3 GB) using high levels (--ultra) and multithreading, by @terrelln
|
3
|
+
perf: Improved decompression speed: x64 : +10% (clang) / +5% (gcc); ARM : from +15% to +50%, depending on SoC, by @terrelln
|
4
|
+
perf: Automatically downsizes ZSTD_DCtx when too large for too long (#2069, by @bimbashrestha)
|
5
|
+
perf: Improved fast compression speed on aarch64 (#2040, ~+3%, by @caoyzh)
|
6
|
+
perf: Small level 1 compression speed gains (depending on compiler)
|
7
|
+
cli : New --patch-from command, create and apply patches from files, by @bimbashrestha
|
8
|
+
cli : New --filelist= : Provide a list of files to operate upon from a file
|
9
|
+
cli : -b -d command can now benchmark decompression on multiple files
|
10
|
+
cli : New --no-content-size command
|
11
|
+
cli : New --show-default-cparams information command
|
12
|
+
api : ZDICT_finalizeDictionary() is promoted to stable (#2111)
|
13
|
+
api : new experimental parameter ZSTD_d_stableOutBuffer (#2094)
|
14
|
+
build: Generate a single-file libzstd library (#2065, by @cwoffenden)
|
15
|
+
build: Relative includes no longer require -I compiler flags for zstd lib subdirs (#2103, by @felixhandte)
|
16
|
+
build: zstd now compiles cleanly under -pedantic (#2099)
|
17
|
+
build: zstd now compiles with make-4.3
|
18
|
+
build: Support mingw cross-compilation from Linux, by @Ericson2314
|
19
|
+
build: Meson multi-thread build fix on windows
|
20
|
+
build: Some misc icc fixes backed by new ci test on travis
|
21
|
+
misc: bitflip analyzer tool, by @felixhandte
|
22
|
+
misc: Extend largeNbDicts benchmark to compression
|
23
|
+
misc: Edit-distance match finder in contrib/
|
24
|
+
doc : Improved beginner CONTRIBUTING.md docs
|
25
|
+
doc : New issue templates for zstd
|
26
|
+
|
27
|
+
v1.4.4
|
28
|
+
perf: Improved decompression speed, by > 10%, by @terrelln
|
29
|
+
perf: Better compression speed when re-using a context, by @felixhandte
|
30
|
+
perf: Fix compression ratio when compressing large files with small dictionary, by @senhuang42
|
31
|
+
perf: zstd reference encoder can generate RLE blocks, by @bimbashrestha
|
32
|
+
perf: minor generic speed optimization, by @davidbolvansky
|
33
|
+
api: new ability to extract sequences from the parser for analysis, by @bimbashrestha
|
34
|
+
api: fixed decoding of magic-less frames, by @terrelln
|
35
|
+
api: fixed ZSTD_initCStream_advanced() performance with fast modes, reported by @QrczakMK
|
36
|
+
cli: Named pipes support, by @bimbashrestha
|
37
|
+
cli: short tar's extension support, by @stokito
|
38
|
+
cli: command --output-dir-flat= , generates target files into requested directory, by @senhuang42
|
39
|
+
cli: commands --stream-size=# and --size-hint=#, by @nmagerko
|
40
|
+
cli: command --exclude-compressed, by @shashank0791
|
41
|
+
cli: faster `-t` test mode
|
42
|
+
cli: improved some error messages, by @vangyzen
|
43
|
+
cli: fix command `-D dictionary` on Windows, reported by @artyompetrov
|
44
|
+
cli: fix rare deadlock condition within dictionary builder, by @terrelln
|
45
|
+
build: single-file decoder with emscripten compilation script, by @cwoffenden
|
46
|
+
build: fixed zlibWrapper compilation on Visual Studio, reported by @bluenlive
|
47
|
+
build: fixed deprecation warning for certain gcc version, reported by @jasonma163
|
48
|
+
build: fix compilation on old gcc versions, by @cemeyer
|
49
|
+
build: improved installation directories for cmake script, by Dmitri Shubin
|
50
|
+
pack: modified pkgconfig, for better integration into openwrt, requested by @neheb
|
51
|
+
misc: Improved documentation : ZSTD_CLEVEL, DYNAMIC_BMI2, ZSTD_CDict, function deprecation, zstd format
|
52
|
+
misc: fixed educational decoder : accept larger literals section, and removed UNALIGNED() macro
|
53
|
+
|
54
|
+
v1.4.3
|
55
|
+
bug: Fix Dictionary Compression Ratio Regression by @cyan4973 (#1709)
|
56
|
+
bug: Fix Buffer Overflow in legacy v0.3 decompression by @felixhandte (#1722)
|
57
|
+
build: Add support for IAR C/C++ Compiler for Arm by @joseph0918 (#1705)
|
58
|
+
|
59
|
+
v1.4.2
|
60
|
+
bug: Fix bug in zstd-0.5 decoder by @terrelln (#1696)
|
61
|
+
bug: Fix seekable decompression in-memory API by @iburinoc (#1695)
|
62
|
+
misc: Validate blocks are smaller than size limit by @vivekmg (#1685)
|
63
|
+
misc: Restructure source files by @ephiepark (#1679)
|
64
|
+
|
65
|
+
v1.4.1
|
66
|
+
bug: Fix data corruption in niche use cases by @terrelln (#1659)
|
67
|
+
bug: Fuzz legacy modes, fix uncovered bugs by @terrelln (#1593, #1594, #1595)
|
68
|
+
bug: Fix out of bounds read by @terrelln (#1590)
|
69
|
+
perf: Improve decode speed by ~7% @mgrice (#1668)
|
70
|
+
perf: Slightly improved compression ratio of level 3 and 4 (ZSTD_dfast) by @cyan4973 (#1681)
|
71
|
+
perf: Slightly faster compression speed when re-using a context by @cyan4973 (#1658)
|
72
|
+
perf: Improve compression ratio for small windowLog by @cyan4973 (#1624)
|
73
|
+
perf: Faster compression speed in high compression mode for repetitive data by @terrelln (#1635)
|
74
|
+
api: Add parameter to generate smaller dictionaries by @tyler-tran (#1656)
|
75
|
+
cli: Recognize symlinks when built in C99 mode by @felixhandte (#1640)
|
76
|
+
cli: Expose cpu load indicator for each file on -vv mode by @ephiepark (#1631)
|
77
|
+
cli: Restrict read permissions on destination files by @chungy (#1644)
|
78
|
+
cli: zstdgrep: handle -f flag by @felixhandte (#1618)
|
79
|
+
cli: zstdcat: follow symlinks by @vejnar (#1604)
|
80
|
+
doc: Remove extra size limit on compressed blocks by @felixhandte (#1689)
|
81
|
+
doc: Fix typo by @yk-tanigawa (#1633)
|
82
|
+
doc: Improve documentation on streaming buffer sizes by @cyan4973 (#1629)
|
83
|
+
build: CMake: support building with LZ4 @leeyoung624 (#1626)
|
84
|
+
build: CMake: install zstdless and zstdgrep by @leeyoung624 (#1647)
|
85
|
+
build: CMake: respect existing uninstall target by @j301scott (#1619)
|
86
|
+
build: Make: skip multithread tests when built without support by @michaelforney (#1620)
|
87
|
+
build: Make: Fix examples/ test target by @sjnam (#1603)
|
88
|
+
build: Meson: rename options out of deprecated namespace by @lzutao (#1665)
|
89
|
+
build: Meson: fix build by @lzutao (#1602)
|
90
|
+
build: Visual Studio: don't export symbols in static lib by @scharan (#1650)
|
91
|
+
build: Visual Studio: fix linking by @absotively (#1639)
|
92
|
+
build: Fix MinGW-W64 build by @myzhang1029 (#1600)
|
93
|
+
misc: Expand decodecorpus coverage by @ephiepark (#1664)
|
94
|
+
|
1
95
|
v1.4.0
|
2
96
|
perf: Improve level 1 compression speed in most scenarios by 6% by @gbtucker and @terrelln
|
3
97
|
api: Move the advanced API, including all functions in the staging section, to the stable section
|
data/contrib/zstd/CONTRIBUTING.md
CHANGED
@@ -26,6 +26,356 @@ to do this once to work on any of Facebook's open source projects.
|
|
26
26
|
|
27
27
|
Complete your CLA here: <https://code.facebook.com/cla>
|
28
28
|
|
29
|
+
## Workflow
|
30
|
+
Zstd uses a branch-based workflow for making changes to the codebase. Typically, zstd
|
31
|
+
will use a new branch per sizable topic. For smaller changes, it is okay to lump multiple
|
32
|
+
related changes into a branch.
|
33
|
+
|
34
|
+
Our contribution process works in three main stages:
|
35
|
+
1. Local development
|
36
|
+
* Update:
|
37
|
+
* Clone your fork of zstd if you have not already
|
38
|
+
```
|
39
|
+
git clone https://github.com/<username>/zstd
|
40
|
+
cd zstd
|
41
|
+
```
|
42
|
+
* Update your local dev branch
|
43
|
+
```
|
44
|
+
git pull https://github.com/facebook/zstd dev
|
45
|
+
git push origin dev
|
46
|
+
```
|
47
|
+
* Topic and development:
|
48
|
+
* Make a new branch on your fork about the topic you're developing for
|
49
|
+
```
|
50
|
+
# branch names should be concise but sufficiently informative
|
51
|
+
git checkout -b <branch-name>
|
52
|
+
git push origin <branch-name>
|
53
|
+
```
|
54
|
+
* Make commits and push
|
55
|
+
```
|
56
|
+
# make some changes
|
57
|
+
git add -u && git commit -m <message>
|
58
|
+
git push origin <branch-name>
|
59
|
+
```
|
60
|
+
* Note: run local tests to ensure that your changes didn't break existing functionality
|
61
|
+
* Quick check
|
62
|
+
```
|
63
|
+
make shortest
|
64
|
+
```
|
65
|
+
* Longer check
|
66
|
+
```
|
67
|
+
make test
|
68
|
+
```
|
69
|
+
2. Code Review and CI tests
|
70
|
+
* Ensure CI tests pass:
|
71
|
+
* Before sharing anything with the community, make sure that all CI tests pass on your local fork.
|
72
|
+
See our section on setting up your CI environment for more information on how to do this.
|
73
|
+
* Ensure that static analysis passes on your development machine. See the Static Analysis section
|
74
|
+
below to see how to do this.
|
75
|
+
* Create a pull request:
|
76
|
+
* When you are ready to share your changes with the community, create a pull request from your branch
|
77
|
+
to facebook:dev. You can do this very easily by clicking 'Create Pull Request' on your fork's home
|
78
|
+
page.
|
79
|
+
* From there, select the branch where you made changes as your source branch and facebook:dev
|
80
|
+
as the destination.
|
81
|
+
* Examine the diff presented between the two branches to make sure there is nothing unexpected.
|
82
|
+
* Write a good pull request description:
|
83
|
+
* While there is no strict template that our contributors follow, we would like them to
|
84
|
+
sufficiently summarize and motivate the changes they are proposing. We recommend all pull requests,
|
85
|
+
at least indirectly, address the following points.
|
86
|
+
* Is this pull request important and why?
|
87
|
+
* Is it addressing an issue? If so, what issue? (provide links for convenience please)
|
88
|
+
* Is this a new feature? If so, why is it useful and/or necessary?
|
89
|
+
* Are there background references and documents that reviewers should be aware of to properly assess this change?
|
90
|
+
* Note: make sure to point out any design and architectural decisions that you made and the rationale behind them.
|
91
|
+
* Note: if you have been working with a specific user and would like them to review your work, make sure you mention them using (@<username>)
|
92
|
+
* Submit the pull request and iterate with feedback.
|
93
|
+
3. Merge and Release
|
94
|
+
* Getting approval:
|
95
|
+
* You will have to iterate on your changes with feedback from other collaborators to reach a point
|
96
|
+
where your pull request can be safely merged.
|
97
|
+
* To avoid too many comments on style and convention, make sure that you have a
|
98
|
+
look at our style section below before creating a pull request.
|
99
|
+
* Eventually, someone from the zstd team will approve your pull request and not long after merge it into
|
100
|
+
the dev branch.
|
101
|
+
* Housekeeping:
|
102
|
+
* Most PRs are linked with one or more Github issues. If this is the case for your PR, make sure
|
103
|
+
the corresponding issue is mentioned. If your change 'fixes' or completely addresses the
|
104
|
+
issue at hand, then please indicate this by requesting that an issue be closed by commenting.
|
105
|
+
* Just because your changes have been merged does not mean the topic or larger issue is complete. Remember
|
106
|
+
that the change must make it to an official zstd release for it to be meaningful. We recommend
|
107
|
+
that contributors track the activity on their pull request and corresponding issue(s) page(s) until
|
108
|
+
their change makes it to the next release of zstd. Users will often discover bugs in your code or
|
109
|
+
suggest ways to refine and improve your initial changes even after the pull request is merged.
|
110
|
+
|
111
|
+
## Static Analysis
|
112
|
+
Static analysis is a process for examining the correctness or validity of a program without actually
|
113
|
+
executing it. It usually helps us find many simple bugs. Zstd uses clang's `scan-build` tool for
|
114
|
+
static analysis. You can install it by following the instructions for your OS on https://clang-analyzer.llvm.org/scan-build.
|
115
|
+
|
116
|
+
Once installed, you can ensure that our static analysis tests pass on your local development machine
|
117
|
+
by running:
|
118
|
+
```
|
119
|
+
make staticAnalyze
|
120
|
+
```
|
121
|
+
|
122
|
+
In general, you can use `scan-build` to statically analyze any build script. For example, to statically analyze
|
123
|
+
just `contrib/largeNbDicts` and nothing else, you can run:
|
124
|
+
|
125
|
+
```
|
126
|
+
scan-build make -C contrib/largeNbDicts largeNbDicts
|
127
|
+
```
|
128
|
+
|
129
|
+
## Performance
|
130
|
+
Performance is extremely important for zstd and we only merge pull requests whose performance
|
131
|
+
landscape and corresponding trade-offs have been adequately analyzed, reproduced, and presented.
|
132
|
+
This high bar for performance means that every PR which has the potential to
|
133
|
+
impact performance takes a very long time for us to properly review. That being said, we
|
134
|
+
always welcome contributions to improve performance (or worsen performance for the trade-off of
|
135
|
+
something else). Please keep the following in mind before submitting a performance related PR:
|
136
|
+
|
137
|
+
1. Zstd isn't as old as gzip but it has been around for some time now and its evolution is
|
138
|
+
very well documented via past Github issues and pull requests. It may be the case that your
|
139
|
+
particular performance optimization has already been considered in the past. Please take some
|
140
|
+
time to search through old issues and pull requests using keywords specific to your
|
141
|
+
would-be PR. Of course, just because a topic has already been discussed (and perhaps rejected
|
142
|
+
on some grounds) in the past, doesn't mean it isn't worth bringing up again. But even in that case,
|
143
|
+
it will be helpful for you to have context from that topic's history before contributing.
|
144
|
+
2. The distinction between noise and actual performance gains can unfortunately be very subtle
|
145
|
+
especially when microbenchmarking extremely small wins or losses. The only remedy to getting
|
146
|
+
something subtle merged is extensive benchmarking. You will be doing us a great favor if you
|
147
|
+
take the time to run extensive, long-duration, and potentially cross-(os, platform, process, etc)
|
148
|
+
benchmarks on your end before submitting a PR. Of course, you will not be able to benchmark
|
149
|
+
your changes on every single processor and os out there (and neither will we) but do the best
|
150
|
+
you can :) We've added some things to think about when benchmarking below in the Benchmarking
|
151
|
+
Performance section which might be helpful for you.
|
152
|
+
3. Optimizing performance for a certain OS, processor vendor, compiler, or network system is a perfectly
|
153
|
+
legitimate thing to do as long as it does not harm the overall performance health of Zstd.
|
154
|
+
This is a hard balance to strike but please keep in mind other aspects of Zstd when
|
155
|
+
submitting changes that are clang-specific, windows-specific, etc.
|
156
|
+
|
157
|
+
## Benchmarking Performance
|
158
|
+
Performance microbenchmarking is a tricky subject but also essential for Zstd. We value empirical
|
159
|
+
testing over theoretical speculation. This guide is not perfect but for most scenarios, it
|
160
|
+
is a good place to start.
|
161
|
+
|
162
|
+
### Stability
|
163
|
+
Unfortunately, the most important aspect in being able to benchmark reliably is to have a stable
|
164
|
+
benchmarking machine. A virtual machine, a machine with shared resources, or your laptop
|
165
|
+
will typically not be stable enough to obtain reliable benchmark results. If you can get your
|
166
|
+
hands on a desktop, this is usually a better scenario.
|
167
|
+
|
168
|
+
Of course, benchmarking can be done on non-hyper-stable machines as well. You will just have to
|
169
|
+
do a little more work to ensure that you are in fact measuring the changes you've made and not
|
170
|
+
noise. Here are some things you can do to make your benchmarks more stable:
|
171
|
+
|
172
|
+
1. The simplest thing you can do to drastically improve the stability of your benchmark is
|
173
|
+
to run it multiple times and then aggregate the results of those runs. As a general rule of
|
174
|
+
thumb, the smaller the change you are trying to measure, the more samples of benchmark runs
|
175
|
+
you will have to aggregate over to get reliable results. Here are some additional things to keep in
|
176
|
+
mind when running multiple trials:
|
177
|
+
* How you aggregate your samples is important. You might be tempted to use the mean of your
|
178
|
+
results. While this is certainly going to be a more stable number than a raw single sample
|
179
|
+
benchmark number, you might have more luck by taking the median. The mean is not robust to
|
180
|
+
outliers whereas the median is. Better still, you could simply take the fastest speed your
|
181
|
+
benchmark achieved on each run since that is likely the fastest your process will be
|
182
|
+
capable of running your code. In our experience, this (aggregating by just taking the sample
|
183
|
+
with the fastest running time) has been the most stable approach.
|
184
|
+
* The more samples you have, the more stable your benchmarks should be. You can verify
|
185
|
+
your improved stability by looking at the size of your confidence intervals as you
|
186
|
+
increase your sample count. These should get smaller and smaller. Eventually hopefully
|
187
|
+
smaller than the performance win you are expecting.
|
188
|
+
* Most processors will take some time to get `hot` when running anything. The observations
|
189
|
+
you collect during that time period will be very different from the true performance number. Having
|
190
|
+
a very large number of samples will help alleviate this problem slightly but you can also
|
191
|
+
address it directly by simply not including the first `n` iterations of your benchmark in
|
192
|
+
your aggregations. You can determine `n` by simply looking at the results from each iteration
|
193
|
+
and then hand picking a good threshold after which the variance in results seems to stabilize.
|
194
|
+
2. You cannot really get reliable benchmarks if your host machine is simultaneously running
|
195
|
+
another cpu/memory-intensive application in the background. If you are running benchmarks on your
|
196
|
+
personal laptop for instance, you should close all applications (including your code editor and
|
197
|
+
browser) before running your benchmarks. You might also have invisible background applications
|
198
|
+
running. You can see what these are by looking at either Activity Monitor on Mac or Task Manager
|
199
|
+
on Windows. You will get more stable benchmark results if you end those processes as well.
|
200
|
+
* If you have multiple cores, you can even run your benchmark on a reserved core to prevent
|
201
|
+
pollution from other OS and user processes. There are a number of ways to do this depending
|
202
|
+
on your OS:
|
203
|
+
* On linux boxes, you can use https://github.com/lpechacek/cpuset.
|
204
|
+
* On Windows, you can "Set Processor Affinity" using https://www.thewindowsclub.com/processor-affinity-windows
|
205
|
+
* On Mac, you can try to use their dedicated affinity API https://developer.apple.com/library/archive/releasenotes/Performance/RN-AffinityAPI/#//apple_ref/doc/uid/TP40006635-CH1-DontLinkElementID_2
|
206
|
+
3. To benchmark, you will likely end up writing a separate c/c++ program that will link libzstd.
|
207
|
+
Dynamically linking your library will introduce some added variation (not a large amount but
|
208
|
+
definitely some). Statically linking libzstd will be more stable. Static libraries should
|
209
|
+
be enabled by default when building zstd.
|
210
|
+
4. Use a profiler with a good high resolution timer. See the section below on profiling for
|
211
|
+
details on this.
|
212
|
+
5. Disable frequency scaling, turbo boost and address space randomization (this will vary by OS)
|
213
|
+
6. Try to avoid storage. On some systems you can use tmpfs. Putting the program, inputs and outputs on
|
214
|
+
tmpfs avoids touching a real storage system, which can have a pretty big variability.
|
215
|
+
|
216
|
+
Also check out LLVM's guide on benchmarking here: https://llvm.org/docs/Benchmarking.html
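The aggregation advice in this section (discard warm-up iterations, prefer the median or the fastest run over the mean) can be sketched roughly as follows. This is an illustrative Python helper, not part of zstd; the function name `aggregate_runs` and the sample speeds are made up for the example:

```python
import statistics


def aggregate_runs(speeds_mb_s, warmup=0):
    """Summarize benchmark speeds (MB/s, higher is faster).

    The first `warmup` samples are dropped before aggregating, which
    mirrors the advice to ignore iterations taken while the CPU warms up.
    """
    kept = list(speeds_mb_s)[warmup:]
    if not kept:
        raise ValueError("no samples left after discarding warm-up runs")
    return {
        "mean": statistics.mean(kept),      # sensitive to outliers
        "median": statistics.median(kept),  # robust to outliers
        "best": max(kept),                  # fastest observed run
    }


# Hypothetical samples: one slow warm-up run and one noisy outlier (120.4).
samples = [250.1, 298.7, 302.6, 301.9, 303.0, 120.4, 302.2]
summary = aggregate_runs(samples, warmup=1)
```

With these made-up numbers the single outlier drags the mean well below the median, while the median and the best run stay close to the typical speed.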
|
217
|
+
|
218
|
+
### Zstd benchmark
|
219
|
+
The fastest signal you can get regarding your performance changes is via the built-in zstd cli
|
220
|
+
bench option. You can run Zstd as you typically would for your scenario using some set of options
|
221
|
+
and then additionally also specify the `-b#` option. Doing this will run our benchmarking pipeline
|
222
|
+
for the options you have just provided. If you want to look at the internals of how this
|
223
|
+
benchmarking script works, you can check out programs/benchzstd.c
|
224
|
+
|
225
|
+
For example: say you have made a change that you believe improves the speed of zstd level 1. The
|
226
|
+
very first thing you should use to assess whether you actually achieved any sort of improvement
|
227
|
+
is `zstd -b`. You might try to do something like this. Note: you can use the `-i` option to
|
228
|
+
specify a running time for your benchmark in seconds (default is 3 seconds).
|
229
|
+
Usually, the longer the running time, the more stable your results will be.
|
230
|
+
|
231
|
+
```
|
232
|
+
$ git checkout <commit-before-your-change>
|
233
|
+
$ make && cp zstd zstd-old
|
234
|
+
$ git checkout <commit-after-your-change>
|
235
|
+
$ make && cp zstd zstd-new
|
236
|
+
$ zstd-old -i5 -b1 <your-test-data>
|
237
|
+
1<your-test-data> : 8990 -> 3992 (2.252), 302.6 MB/s , 626.4 MB/s
|
238
|
+
$ zstd-new -i5 -b1 <your-test-data>
|
239
|
+
1<your-test-data> : 8990 -> 3992 (2.252), 302.8 MB/s , 628.4 MB/s
|
240
|
+
```
|
241
|
+
|
242
|
+
Unless your performance win is large enough to be visible despite the intrinsic noise
|
243
|
+
on your computer, benchzstd alone will likely not be enough to validate the impact of your
|
244
|
+
changes. For example, the results of the example above indicate that effectively nothing
|
245
|
+
changed but there could be a small <3% improvement that the noise on the host machine
|
246
|
+
obscured. So unless you see a large performance win (10-15% consistently) using just
|
247
|
+
this method of evaluation will not be sufficient.
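To make the point about noise concrete, here is a small Python sketch that computes the relative change between the two example runs above. The helper name `relative_change_percent` is hypothetical; the speeds are the ones printed in the sample output:

```python
def relative_change_percent(old_mb_s, new_mb_s):
    """Relative speed change from old to new, in percent (positive = faster)."""
    return (new_mb_s - old_mb_s) / old_mb_s * 100.0


# Speeds taken from the zstd-old / zstd-new example output above.
compress_change = relative_change_percent(302.6, 302.8)
decompress_change = relative_change_percent(626.4, 628.4)

print(f"compression:   {compress_change:+.2f}%")
print(f"decompression: {decompress_change:+.2f}%")
```

Both changes come out well under 1%, far below the 10-15% range the text describes as clearly visible, so these two runs alone cannot separate a real improvement from noise.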
|
248
|
+
|
249
|
+
### Profiling
|
250
|
+
There are a number of great profilers out there. We're going to briefly mention how you can
|
251
|
+
profile your code using `instruments` on mac, `perf` on linux and `visual studio profiler`
|
252
|
+
on windows.
|
253
|
+
|
254
|
+
Say you have an idea for a change that you think will provide some good performance gains
|
255
|
+
for level 1 compression on Zstd. Typically this means, you have identified a section of
|
256
|
+
code that you think can be made to run faster.
|
257
|
+
|
258
|
+
The first thing you will want to do is make sure that the piece of code is actually taking up
|
259
|
+
a notable amount of time to run. It is usually not worth optimizing something which accounts for less than
|
260
|
+
0.0001% of the total running time. Luckily, there are tools to help with this.
|
261
|
+
Profilers will let you see how much time your code spends inside a particular function.
|
262
|
+
If your target code snippet is only part of a function, it might be worth trying to
|
263
|
+
isolate that snippet by moving it to its own function (this is usually not necessary but
|
264
|
+
might be).
|
265
|
+
|
266
|
+
Most profilers (including the profilers discussed below) will generate a call graph of
|
267
|
+
functions for you. Your goal will be to find your function of interest in this call graph
|
268
|
+
and then inspect the time spent inside of it. You might also want to look at the
|
269
|
+
annotated assembly which most profilers will provide you with.
|
270
|
+
|
271
|
+
#### Instruments
|
272
|
+
We will once again consider the scenario where you think you've identified a piece of code
|
273
|
+
whose performance can be improved upon. Follow these steps to profile your code using
|
274
|
+
Instruments.
|
275
|
+
|
276
|
+
1. Open Instruments
|
277
|
+
2. Select `Time Profiler` from the list of standard templates
|
278
|
+
3. Close all other applications except for your instruments window and your terminal
|
279
|
+
4. Run your benchmarking script from your terminal window
|
280
|
+
* You will want a benchmark that runs for at least a few seconds (5 seconds will
|
281
|
+
usually be long enough). This way the profiler will have something to work with
|
282
|
+
and you will have ample time to attach your profiler to this process:)
|
283
|
+
* I will just use benchzstd as my benchmarking script for this example:
|
284
|
+
```
|
285
|
+
$ zstd -b1 -i5 <my-data> # this will run for 5 seconds
|
286
|
+
```
|
287
|
+
5. Once you run your benchmarking script, switch back over to instruments and attach your
|
288
|
+
process to the time profiler. You can do this by:
|
289
|
+
* Clicking on the `All Processes` drop down in the top left of the toolbar.
|
290
|
+
* Selecting your process from the dropdown. In my case, it is just going to be labeled
|
291
|
+
`zstd`
|
292
|
+
* Hitting the bright red record circle button on the top left of the toolbar
|
293
|
+
6. Your profiler will now start collecting metrics from your benchmarking script. Once
|
294
|
+
you think you have collected enough samples (usually this is the case after 3 seconds of
|
295
|
+
recording), stop your profiler.
|
296
|
+
7. Make sure that in the toolbar of the bottom window, `profile` is selected.
|
297
|
+
8. You should be able to see your call graph.
|
298
|
+
* If you don't see the call graph, or see only an incomplete one, make sure you have compiled
|
299
|
+
zstd and your benchmarking script using debug flags. On macOS and Linux, this just means
|
300
|
+
you will have to supply the `-g` flag along with your build script. You might also
|
301
|
+
have to provide the `-fno-omit-frame-pointer` flag
|
302
|
+
9. Dig down the graph to find your function call and then inspect it by double clicking
|
303
|
+
the list item. You will be able to see the annotated source code and the assembly side by
|
304
|
+
side.
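
As a rough sketch of the debug-flag advice in step 8, the commands below show one way to pass `-g` and `-fno-omit-frame-pointer` to a build. Several things here are assumptions rather than part of the original guide: that your zstd build picks up a `CFLAGS` override, that `-O3` is the optimization level you want to profile, and the file name `my_bench.c`, which is a hypothetical benchmark source. The helper only prints the commands so you can review them before running anything.

```shell
# Flags named in step 8 above.
DEBUG_FLAGS="-g -fno-omit-frame-pointer"

# Print the build commands instead of running them (dry run).
# Assumptions: the zstd Makefile honors CFLAGS, and my_bench.c is a
# hypothetical benchmark source that links against libzstd.
build_cmds() {
    printf 'make CFLAGS="-O3 %s"\n' "$DEBUG_FLAGS"
    printf 'cc -O3 %s -o my_bench my_bench.c -lzstd\n' "$DEBUG_FLAGS"
}

build_cmds
```

Keeping optimization enabled alongside the debug flags means the profile still reflects optimized code, while the symbols and frame pointers give the profiler what it needs to build the call graph.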
|
305
|
+
|
306
|
+
#### Perf
|
307
|
+
|
308
|
+
This wiki has a pretty detailed tutorial on getting started working with perf so we'll
|
309
|
+
leave you to check that out if you're getting started:
|
310
|
+
|
311
|
+
https://perf.wiki.kernel.org/index.php/Tutorial
|
312
|
+
|
313
|
+
Some general notes on perf:
|
314
|
+
* Use `perf stat -r # <bench-program>` to quickly get some relevant timing and
|
315
|
+
counter statistics. Perf uses a high resolution timer and this is likely one
|
316
|
+
of the first things your team will run when assessing your PR.
|
317
|
+
* Perf has a long list of hardware counters that can be viewed with `perf list`.
|
318
|
+
When measuring optimizations, something worth trying is to make sure the hardware
|
319
|
+
counters you expect to be impacted by your change are in fact affected. For example,
|
320
|
+
if you expect the L1 cache misses to decrease with your change, you can look at the
|
321
|
+
counter `L1-dcache-load-misses`
|
322
|
+
* Perf hardware counters may not work on a virtual machine.
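
To make the notes above concrete, here is a small sketch that combines the repeat count and a single hardware counter in one `perf stat` invocation. The repeat count of 5 and the data file name `my-data` are placeholder assumptions, and the helper only prints the command so you can check it before running it on your machine.

```shell
RUNS=5                         # placeholder repeat count for `perf stat -r`
EVENT=L1-dcache-load-misses    # counter mentioned in the notes above

# Print a perf stat command line for the given benchmark command (dry run).
perf_stat_cmd() {
    printf 'perf stat -r %s -e %s %s\n' "$RUNS" "$EVENT" "$*"
}

# my-data is a hypothetical input file.
perf_stat_cmd ./zstd -b1 -i5 my-data
```

Running the printed command before and after your change, and comparing the reported counter values, is one way to check that the counter you expected to move actually did.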
|
323
|
+
|
324
|
+
#### Visual Studio
|
325
|
+
|
326
|
+
TODO
|
327
|
+
|
328
|
+
|
329
|
+
## Setting up continuous integration (CI) on your fork
|
330
|
+
Zstd uses a number of different continuous integration (CI) tools to ensure that new changes
|
331
|
+
are well tested before they make it to an official release. Specifically, we use the platforms
|
332
|
+
travis-ci, circle-ci, and appveyor.
|
333
|
+
|
334
|
+
Changes cannot be merged into the main dev branch unless they pass all of our CI tests.
|
335
|
+
The easiest way to run these CI tests on your own before submitting a PR to our dev branch
|
336
|
+
is to configure your personal fork of zstd with each of the CI platforms. Below, you'll find
|
337
|
+
instructions for doing this.
|
338
|
+
|
339
|
+
### travis-ci
|
340
|
+
Follow these steps to link travis-ci with your github fork of zstd
|
341
|
+
|
342
|
+
1. Make sure you are logged into your github account
|
343
|
+
2. Go to https://travis-ci.org/
|
344
|
+
3. Click 'Sign in with Github' on the top right
|
345
|
+
4. Click 'Authorize travis-ci'
|
346
|
+
5. Click 'Activate all repositories using Github Apps'
|
347
|
+
6. Select 'Only select repositories' and select your fork of zstd from the drop down
|
348
|
+
7. Click 'Approve and Install'
|
349
|
+
8. Click 'Sign in with Github' again. This time, it will be for travis-pro (which will let you view your tests on the web dashboard)
|
350
|
+
9. Click 'Authorize travis-pro'
|
351
|
+
10. You should have travis set up on your fork now.
|
352
|
+
|
353
|
+
### circle-ci
|
354
|
+
TODO
|
355
|
+
|
356
|
+
### appveyor
|
357
|
+
Follow these steps to link appveyor with your github fork of zstd
|
358
|
+
|
359
|
+
1. Make sure you are logged into your github account
|
360
|
+
2. Go to https://www.appveyor.com/
|
361
|
+
3. Click 'Sign in' on the top right
|
362
|
+
4. Select 'Github' on the left panel
|
363
|
+
5. Click 'Authorize appveyor'
|
364
|
+
6. You might be asked to select which repositories you want to give appveyor permission to. Select your fork of zstd if you're prompted
|
365
|
+
7. You should have appveyor set up on your fork now.
|
366
|
+
|
367
|
+
### General notes on CI
|
368
|
+
CI tests run every time a pull request (PR) is created or updated. The exact tests
|
369
|
+
that get run will depend on the destination branch you specify. Some tests take
|
370
|
+
longer to run than others. Currently, our CI is set up to run a short
|
371
|
+
series of tests when creating a PR to the dev branch and a longer series of tests
|
372
|
+
when creating a PR to the master branch. You can look in the configuration files
|
373
|
+
of the respective CI platform for more information on what gets run when.
|
374
|
+
|
375
|
+
Most people will just want to create a PR with the destination set to their local dev
|
376
|
+
branch of zstd. You can then find the status of the tests on the PR's page. You can also
|
377
|
+
re-run tests and cancel running tests from the PR page or from the respective CI's dashboard.
|
378
|
+
|
29
379
|
## Issues
|
30
380
|
We use GitHub issues to track public bugs. Please ensure your description is
|
31
381
|
clear and has sufficient instructions to be able to reproduce the issue.
|
@@ -34,7 +384,7 @@ Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe
|
|
34
384
|
disclosure of security bugs. In those cases, please go through the process
|
35
385
|
outlined on that page and do not file a public issue.
|
36
386
|
|
37
|
-
## Coding Style
|
387
|
+
## Coding Style
|
38
388
|
* 4 spaces for indentation rather than tabs
|
39
389
|
|
40
390
|
## License
|