tar-xz 6.0.0 → 6.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md ADDED
@@ -0,0 +1,136 @@
1
+ # tar-xz
2
+
3
+ ## [Unreleased]
4
+
5
+ ## [6.1.0] - 2026-04-29
6
+
7
+ ### ⚠️ BREAKING CHANGES
8
+ - redesign for v6 — universal stream-first API (#108) (tar-xz) ([b2c8a8c](https://github.com/oorabona/node-liblzma/commit/b2c8a8c))
9
+
10
+ ### Added
11
+ - true streaming for Node extract()/list() — O(largest entry) (#113) (tar-xz) ([06a9937](https://github.com/oorabona/node-liblzma/commit/06a9937))
12
+ - wire memlimit through N-API decoder (#112) (native) ([0d09200](https://github.com/oorabona/node-liblzma/commit/0d09200))
13
+ - wire memlimit option through unxzAsync/unxz (#111) (wasm) ([6e2bc09](https://github.com/oorabona/node-liblzma/commit/6e2bc09))
14
+ - adopt Changesets for monorepo versioning + changelog generation (ci) ([adfbc99](https://github.com/oorabona/node-liblzma/commit/adfbc99))
15
+ - redesign for v6 — universal stream-first API (#108) (tar-xz) ⚠️ BREAKING ([b2c8a8c](https://github.com/oorabona/node-liblzma/commit/b2c8a8c))
16
+
17
+ ### Fixed
18
+ - close Win32 symlink-swap TOCTOU with JS-pure 'wx'+retry fail-closed (#114) (tar-xz) ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
19
+ - re-add @changesets/cli (was clobbered by pnpm add of changelog-github) (deps) ([6d76280](https://github.com/oorabona/node-liblzma/commit/6d76280))
20
+ - use 'changeset' so the bin resolves with --ignore-scripts (ci) ([78b91f7](https://github.com/oorabona/node-liblzma/commit/78b91f7))
21
+ - toAsyncIterable mis-dispatched Uint8Array via Symbol.iterator ([b2c8a8c](https://github.com/oorabona/node-liblzma/commit/b2c8a8c))
22
+ - use always() in publish job to bypass skipped build (workspace target) (ci) ([2e08977](https://github.com/oorabona/node-liblzma/commit/2e08977))
23
+ - pin pnpm/action-setup to v5 in refresh-lockfile (v6 corrupts lockfile) (ci) ([f39d603](https://github.com/oorabona/node-liblzma/commit/f39d603))
24
+ - regenerate pnpm-lock.yaml (was broken with duplicate YAML document) (deps) ([e0c66ab](https://github.com/oorabona/node-liblzma/commit/e0c66ab))
25
+ - use squash merge in Dependabot auto-merge (linear history required) (ci) ([f3aee60](https://github.com/oorabona/node-liblzma/commit/f3aee60))
26
+ - point tar-xz demo Vite alias to browser entry ([8aea7ac](https://github.com/oorabona/node-liblzma/commit/8aea7ac))
27
+ - point demo Vite alias to browser entry (fixes docs build) ([e86dba5](https://github.com/oorabona/node-liblzma/commit/e86dba5))
28
+
29
+ ### Changed
30
+ - finalize WIN32-TOCTOU-2026-04-29 — promote spec, mark TODO done ([1ee9db4](https://github.com/oorabona/node-liblzma/commit/1ee9db4))
31
+ - node-tar is pure JS and explicitly does NOT protect Windows ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
32
+ - 0 errors. Type-check: 0 errors. ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
33
+ - 155 pass / 0 fail / 3 pre-existing skips. ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
34
+ - 0 errors. Type-check: 0 errors. ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
35
+ - 155 pass / 0 fail / 3 pre-existing skips (identical to pre-fix). ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
36
+ - 0 errors. Type-check: 0 errors. ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
37
+ - round 1 = 6 findings (3 M + 2 L + 1 misclassified), round 2 ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
38
+ - 155 pass / 0 fail / 3 pre-existing skips. ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
39
+ - 0 errors. Type-check: 0 errors. ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
40
+ - round 1 = 6 findings, round 2 = 3, round 3 = 1, round 4 ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
41
+ - 155 pass / 0 fail / 3 pre-existing skips. ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
42
+ - 0 errors. Type-check: 0 errors. ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
43
+ - round 1=6, round 2=3, round 3=1, round 4=3 (2 real Ms in ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
44
+ - 155 pass / 0 fail / 3 pre-existing skips. ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
45
+ - 0 errors. Type-check: 0 errors. ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
46
+ - round 1=6, round 2=3, round 3=1, round 4=3 (2 real Ms ([b24040d](https://github.com/oorabona/node-liblzma/commit/b24040d))
47
+ - refresh lockfile for latest transitive dependencies (deps) ([06e9590](https://github.com/oorabona/node-liblzma/commit/06e9590))
48
+ - finally swallows cleanup errors on consumer-break, ([06a9937](https://github.com/oorabona/node-liblzma/commit/06a9937))
49
+ - 150+3-skip pass; memory 3+1-skip pass. tsc + lint + build green. ([06a9937](https://github.com/oorabona/node-liblzma/commit/06a9937))
50
+ - refresh lockfile for latest transitive dependencies (deps) ([f8f21d0](https://github.com/oorabona/node-liblzma/commit/f8f21d0))
51
+ - - release-it (existing release.yml + .release-it.json) is retained for ([adfbc99](https://github.com/oorabona/node-liblzma/commit/adfbc99))
52
+ - capture tar-xz v6 redesign in CHANGELOGs + TODO.md ([9abd0a2](https://github.com/oorabona/node-liblzma/commit/9abd0a2))
53
+ - test fails on revert, passes on fix. ([b2c8a8c](https://github.com/oorabona/node-liblzma/commit/b2c8a8c))
54
+ - release v5.0.1 (tar-xz) ([0c631f5](https://github.com/oorabona/node-liblzma/commit/0c631f5))
55
+ - sync workspace package versions to npm registry (3.2.0 -> 5.0.0) ([900a055](https://github.com/oorabona/node-liblzma/commit/900a055))
56
+ - refresh lockfile for latest transitive dependencies (deps) ([8345c25](https://github.com/oorabona/node-liblzma/commit/8345c25))
57
+ - propagate anti-flake cleanup pattern to 3 high-risk integration tests ([f752664](https://github.com/oorabona/node-liblzma/commit/f752664))
58
+ - add afterEach cleanup + timer tracking in error_recovery test (anti-flake) ([2d7f285](https://github.com/oorabona/node-liblzma/commit/2d7f285))
59
+ - refresh lockfile for latest transitive dependencies (deps) ([bc7e804](https://github.com/oorabona/node-liblzma/commit/bc7e804))
60
+ - refresh lockfile for latest transitive dependencies (deps) ([dedd2c1](https://github.com/oorabona/node-liblzma/commit/dedd2c1))
61
+ - bump @vitest/ui (#106) (deps-dev) ([276f0b4](https://github.com/oorabona/node-liblzma/commit/276f0b4))
62
+ - refresh lockfile for latest transitive dependencies (deps) ([8b7b5b9](https://github.com/oorabona/node-liblzma/commit/8b7b5b9))
63
+ - ignore pnpm/action-setup v6+ in Dependabot (corrupts lockfile) (ci) ([fd2cf8c](https://github.com/oorabona/node-liblzma/commit/fd2cf8c))
64
+ - refresh lockfile for latest transitive dependencies (deps) ([a01694e](https://github.com/oorabona/node-liblzma/commit/a01694e))
65
+ - refresh lockfile for latest transitive dependencies (deps) ([e2eca27](https://github.com/oorabona/node-liblzma/commit/e2eca27))
66
+ - refresh lockfile for latest transitive dependencies (deps) ([b1386e9](https://github.com/oorabona/node-liblzma/commit/b1386e9))
67
+ - refresh lockfile for latest transitive dependencies (deps) ([1ba850e](https://github.com/oorabona/node-liblzma/commit/1ba850e))
68
+ - refresh lockfile for latest transitive dependencies (deps) ([e66f8fb](https://github.com/oorabona/node-liblzma/commit/e66f8fb))
69
+ - refresh lockfile for latest transitive dependencies (deps) ([fd906d6](https://github.com/oorabona/node-liblzma/commit/fd906d6))
70
+ - refresh lockfile for latest transitive dependencies (deps) ([e085fa4](https://github.com/oorabona/node-liblzma/commit/e085fa4))
71
+ - bump @vitest/ui in the dev-dependencies group (#95) (deps-dev) ([01e828c](https://github.com/oorabona/node-liblzma/commit/01e828c))
72
+ - refresh lockfile for latest transitive dependencies (deps) ([cfe60ca](https://github.com/oorabona/node-liblzma/commit/cfe60ca))
73
+ - refresh lockfile for latest transitive dependencies (deps) ([1d0dd42](https://github.com/oorabona/node-liblzma/commit/1d0dd42))
74
+ - refresh lockfile for latest transitive dependencies (deps) ([775ed0f](https://github.com/oorabona/node-liblzma/commit/775ed0f))
75
+ - refresh lockfile for latest transitive dependencies (deps) ([9a66903](https://github.com/oorabona/node-liblzma/commit/9a66903))
76
+ - refresh lockfile for latest transitive dependencies (deps) ([3e2bd44](https://github.com/oorabona/node-liblzma/commit/3e2bd44))
77
+ - refresh lockfile for latest transitive dependencies (deps) ([d3bea99](https://github.com/oorabona/node-liblzma/commit/d3bea99))
78
+
79
+ ### Removed
80
+ - - extractToMemory() — replaced by extract() + entry.bytes() ([b2c8a8c](https://github.com/oorabona/node-liblzma/commit/b2c8a8c))
81
+
82
+ ## 6.0.0
83
+
84
+ ### Major Changes
85
+
86
+ Complete API redesign. Universal stream-first design — same signatures in Node and Browser, built around `AsyncIterable<Uint8Array>`.
87
+
88
+ #### New API
89
+
90
+ - **Universal `create()`, `extract()`, `list()`** — identical signatures across Node and Browser.
91
+ - **`tar-xz/file` subpath export** (Node only) — opt-in path-based helpers `createFile()`, `extractFile()`, `listFile()`. Keeps the core SRP-clean (no fs deps in the core).
92
+ - **`AsyncIterable<TarEntryWithData>`** from `extract()` — entries yielded lazily; each carries a streaming `data` AsyncIterable plus `bytes()` and `text()` collector helpers.
93
+ - **`TarInput` union type** — accepts `AsyncIterable<Uint8Array>`, `Iterable<Uint8Array>`, `Uint8Array`, `ArrayBuffer`, `ReadableStream<Uint8Array>` (Web), or `NodeJS.ReadableStream`.
94
+
95
+ #### Security hardening
96
+
97
+ Comprehensive symlink/path TOCTOU hardening (18 vectors audited and closed in a single consolidated commit, after 7 rounds of Copilot review):
98
+
99
+ - Leaf symlink check (`target` itself, not just ancestors).
100
+ - Ancestor symlink walk extended to FILE/DIRECTORY/SYMLINK/HARDLINK.
101
+ - ENOENT correctly continues the ancestor walk instead of stopping.
102
+ - Hardlink `linkSource` validated for symlink-leaf and symlink-ancestor.
103
+ - `strip` option applied to both `name` and `linkname`.
104
+ - Empty / NUL-bearing names and linknames rejected.
105
+ - Dot-segment placeholder names (`.`, `./`, `..`) rejected.
106
+ - Setuid/setgid/sticky bits stripped from extracted modes by default (mirrors GNU tar `--no-same-permissions`).
107
+ - File extraction uses `fs.open(O_NOFOLLOW)` + fd-based `chmod`/`utimes` on POSIX — eliminates by-path TOCTOU window for permissions/timestamps.
108
+ - `pipeline()` instead of `pipe()` so source errors propagate properly.
109
+ - Threat-model documentation: concurrent attacker process is explicitly out of scope (POSIX `openat2(RESOLVE_BENEATH)` not exposed by Node).
110
+
111
+ #### Removed
112
+
113
+ - `extractToMemory()` — replaced by `extract()` + `entry.bytes()`.
114
+ - `createTarXz()` / `extractTarXz()` / `listTarXz()` (browser-prefixed names) — replaced by unified `create()` / `extract()` / `list()`.
115
+ - `BrowserCreateOptions` / `BrowserExtractOptions` — unified into single `CreateOptions` / `ExtractOptions`.
116
+ - `ExtractedFile` — replaced by `TarEntryWithData`.
117
+
118
+ #### Changed
119
+
120
+ - Source files for `create()` use the new `TarSourceFile` shape: `{ name, source, mode?, mtime?, linkname? }`. `source` accepts `AsyncIterable<Uint8Array> | Uint8Array | ArrayBuffer | string` (string is a Node-only fs path).
121
+ - `TarPack` / `TarUnpack` Transform classes are now internal; not exported from the package root.
122
+ - Default compression preset is uniform: `6` (Node and Browser).
123
+
124
+ #### Migration v5 → v6
125
+
126
+ See [README.md § Migration v5 → v6](./README.md#migration-v5--v6) for full code examples.
127
+
128
+ ## 5.0.1
129
+
130
+ ### Patch Changes
131
+
132
+ - Workspace package versions synchronized to npm registry (3.2.0 → 5.0.0). Internal infrastructure updates (CI workflows, lockfile maintenance, anti-flake test cleanup). No API changes.
133
+
134
+ [Unreleased]: https://github.com/oorabona/node-liblzma/compare/v6.1.0...HEAD
135
+ [v6.1.0]: https://github.com/oorabona/node-liblzma/releases/tag/v6.1.0
136
+ [6.1.0]: https://github.com/oorabona/node-liblzma/releases/tag/v6.1.0
package/README.md CHANGED
@@ -12,7 +12,7 @@ function names in both environments.
12
12
  ## Features
13
13
 
14
14
  - **Unified API** — `create`, `extract`, `list` work identically in Node.js and browsers
15
- - **Stream-shaped API** — all functions return `AsyncIterable<…>`; stream-shaped inputs accepted. Note: current Node `extract()`/`list()` implementations buffer internallytrue streaming is a planned optimization
15
+ - **Stream-shaped API** — all functions return `AsyncIterable<…>`; stream-shaped inputs accepted. Node `extract()`/`list()` now stream chunks as XZ decompresses them memory stays O(largest single entry). v6.0.0 introduced the stream-first API contract; v6.1.0 delivers the planned optimization that fulfills it.
16
16
  - **Flexible input** — `extract()` and `list()` accept `AsyncIterable`, `Uint8Array`,
17
17
  `ArrayBuffer`, Web `ReadableStream`, or Node `ReadableStream`
18
18
  - **Flexible source** — `create()` accepts fs paths (Node), `Buffer`/`Uint8Array`, or
@@ -178,6 +178,35 @@ for (const entry of entries) {
178
178
  Do not import `tar-xz/file` in browser bundles — it imports `node:fs` and will
179
179
  fail at runtime. Use `create`/`extract`/`list` directly in browser code.
180
180
 
181
+ ## Security model
182
+
183
+ `extractFile` enforces layered path-safety checks before writing any bytes to
184
+ disk: traversal detection (`..` and absolute paths), leaf-symlink rejection
185
+ (`O_NOFOLLOW` on POSIX), ancestor-symlink TOCTOU guard, hardlink validation,
186
+ NUL/empty name rejection, and setuid/setgid/sticky-bit stripping.
187
+
188
+ The TOCTOU mitigation differs by platform:
189
+
190
+ **POSIX (Linux, macOS):** FILE entries are written through a file descriptor
191
+ opened with `O_NOFOLLOW`. The fd is held open for the entire streaming write,
192
+ so the window between the safety check and the last write is effectively zero.
193
+
194
+ **Windows:** `O_NOFOLLOW` is not available. Extraction falls back to by-path
195
+ stream operations (`createWriteStream`). With streaming delivery (v6.1.0), the
196
+ window between the initial safety check and the last written byte scales with
197
+ the entry's size — a co-tenant process that can modify `cwd` could race a
198
+ symlink swap during this window.
199
+
200
+ **Windows recommendation:** always extract to a directory owned exclusively by
201
+ the calling process. Do not extract user-supplied archives into shared,
202
+ world-writable, or `TEMP`-like directories on Windows.
203
+
204
+ This gap is now closed: the Windows path uses `open(target, 'wx')` (atomic
205
+ exclusive create) with an unlink+retry pattern for legitimate overwrites. If a symlink
206
+ is injected between the unlink and the retry-open, extraction fails with a security error.
207
+ All writes and metadata operations are fd-based. See [SECURITY.md](./SECURITY.md#windows-symlink-swap-toctou)
208
+ for the full reparse-tag coverage table and user mitigations.
209
+
181
210
  ## Streaming Patterns
182
211
 
183
212
  ### Hash while creating
package/SECURITY.md ADDED
@@ -0,0 +1,103 @@
1
+ # tar-xz Security Notes
2
+
3
+ ## Windows symlink-swap TOCTOU
4
+
5
+ ### Background
6
+
7
+ On POSIX (Linux, macOS), `extractFile` opens each regular-file entry with
8
+ `O_NOFOLLOW` (`O_CREAT | O_WRONLY | O_TRUNC | O_NOFOLLOW`). This ensures the
9
+ kernel refuses to open a symlink at the leaf path, closing the symlink-swap
10
+ attack window from the moment the file handle is opened.
11
+
12
+ On Windows, `O_NOFOLLOW` is not available (libuv/Win32 does not expose it).
13
+ Prior to this hardening, the Windows path used `createWriteStream(target)` — a
14
+ by-path operation that could be redirected by a symlink injected between the
15
+ upstream safety check and the final write.
16
+
17
+ ### What changed in this hardening
18
+
19
+ The Windows extraction path now uses `open(target, 'wx', mode)` — the `'wx'`
20
+ flag maps to `O_CREAT | O_EXCL` in libuv, an atomic "create or fail" syscall.
21
+ Writes, `chmod`, and `utimes` are all performed through the resulting
22
+ `FileHandle`, immune to any by-path swap after the open.
23
+
24
+ The full sequence:
25
+
26
+ 1. **First `open('wx')`** — succeeds if the target does not exist (no prior
27
+ file or symlink at that path). All subsequent I/O via the file descriptor.
28
+ 2. **`EEXIST` on first open** — a regular file exists (legitimate re-extract
29
+ case): call `unlink(target)` then retry `open('wx')`.
30
+ 3. **`EEXIST` on retry** — a symlink (or other reparse point) was injected
31
+ between our `unlink` and our retry-open. Extraction is rejected with a
32
+ security error citing the entry name. No bytes are written.
33
+
34
+ ### Closed attack windows
35
+
36
+ | Window | From → To | Status |
37
+ |--------|-----------|--------|
38
+ | W1 | `lstat` check → `open()` | Closed — atomic `'wx'` EEXIST-path detects injection |
39
+ | W2 | `open()` → last byte written | Closed — fd-based `handle.write()` follows the inode, not the path |
40
+ | W3 | last byte → `chmod` | Closed — `handle.chmod()` (fd-based) |
41
+ | W4 | `chmod` → `utimes` | Closed — `handle.utimes()` (fd-based) |
42
+
43
+ ### Residual race
44
+
45
+ The `open()` syscall itself is atomic at the OS level (sub-microsecond). A
46
+ symlink injected **during** the `open()` syscall cannot win — the kernel either
47
+ creates the new inode or returns `EEXIST` atomically. There is no window inside
48
+ `open()`.
49
+
50
+ ### Reparse-point coverage table
51
+
52
+ Windows supports several reparse-point types beyond `IO_REPARSE_TAG_SYMLINK`.
53
+ The table below documents what each type means for the `'wx'` fail-closed contract:
54
+
55
+ | Reparse tag | Detected by `lstat().isSymbolicLink()` | Behavior under `'wx'` | Risk level |
56
+ |-------------|----------------------------------------|----------------------|------------|
57
+ | `IO_REPARSE_TAG_SYMLINK` | **Yes** — rejected by upstream `ensureSafeTarget` before `'wx'` is attempted | n/a (caught upstream) | None |
58
+ | `IO_REPARSE_TAG_MOUNT_POINT` (NTFS junction) | **No** — `lstat` returns `isSymbolicLink() === false` for junctions | `'wx'` returns `EEXIST` → `unlink` removes the junction → retry `'wx'` succeeds or attacker re-injects → security error | **Leaf protected** by `'wx'` EEXIST fail-closed |
59
+ | OneDrive / cloud-files placeholders (`IO_REPARSE_TAG_CLOUD_FILES`, etc.) | **No** — cloud stubs look like regular files to `lstat` | `'wx'` returns `EEXIST` → handled via normal overwrite path (unlink + retry) | Low — cloud stubs are regular-file-like; write lands on the stub, which hydrates on access |
60
+ | `IO_REPARSE_TAG_AF_UNIX` | **No** | `'wx'` returns `EEXIST` → unlink + retry path | Low — same as junction path |
61
+
62
+ **Summary:** The only reparse type that can be present at the *leaf* target
63
+ path and bypasses `isSymbolicLink()` is `IO_REPARSE_TAG_MOUNT_POINT`
64
+ (NTFS junctions). The `'wx'` fail-closed pattern protects the leaf in all
65
+ cases — if an attacker injects a junction between `unlink` and the retry-open,
66
+ `'wx'` returns `EEXIST` and we throw the security error.
67
+
68
+ **Ancestor junctions (residual risk):** The upstream `hasSymlinkAncestor`
69
+ walk uses `lstat().isSymbolicLink()` and does not detect junctions in ancestor
70
+ directories. This is a pre-existing limitation shared by `node-tar` and other
71
+ pure-JS tar libraries. It is not introduced by this change. See "User
72
+ mitigations" below.
73
+
74
+ ### Notes on hardlinks, case-insensitive NTFS, and ADS
75
+
76
+ **Hardlinks:** A hardlink injected during the unlink+retry race creates a
77
+ directory entry at the target path. Our retry `'wx'` returns `EEXIST` →
78
+ security error (fail-closed). Pre-existing hardlinks are regular files that
79
+ share an inode; `unlink` decrements the link count and our `'wx'` creates a
80
+ new inode. Semantically correct.
81
+
82
+ **Case-insensitive NTFS:** All path operations resolve to the same directory
83
+ entry regardless of case. This is not a bypass vector — `path.resolve` produces
84
+ a canonical path that is used consistently.
85
+
86
+ **NTFS Alternate Data Streams (ADS):** Out of scope. ADS does not affect the
87
+ default data stream. Tar entry names containing `:` are rejected upstream by
88
+ path-traversal validation before reaching this code.
89
+
90
+ ### User mitigations
91
+
92
+ For environments where even the residual ancestor-junction risk is unacceptable:
93
+
94
+ 1. **Restricted-ACL temp directory** — extract under a directory with an ACL
95
+ that prevents other users/processes from creating files or junctions inside
96
+ it. On Windows, use `CreateDirectory` + `SetSecurityInfo` to set a DACL
97
+ that grants write access only to the calling process's SID.
98
+ 2. **Prefer WSL for untrusted archives** — Windows Subsystem for Linux uses
99
+ the Linux VFS with full `O_NOFOLLOW` support. `extractFile` on WSL takes the
100
+ POSIX branch and is fully protected.
101
+ 3. **Verify archive origin** — do not extract archives from untrusted sources
102
+ into directories writable by other processes (world-writable temp dirs,
103
+ shared application data folders, etc.).
@@ -2,7 +2,7 @@
2
2
  * Node.js TAR extraction with XZ decompression — v6 AsyncIterable API
3
3
  */
4
4
  import type { TarInputNode } from '../internal/to-async-iterable.js';
5
- import { type ExtractOptions, type TarEntryWithData } from '../types.js';
5
+ import type { ExtractOptions, TarEntryWithData } from '../types.js';
6
6
  /**
7
7
  * Extract a tar.xz archive.
8
8
  *
@@ -20,7 +20,7 @@ import { type ExtractOptions, type TarEntryWithData } from '../types.js';
20
20
  * }
21
21
  * ```
22
22
  */
23
- export declare function extract(input: TarInputNode, options?: ExtractOptions): AsyncIterable<TarEntryWithData>;
23
+ export declare function extract(input: TarInputNode, options?: ExtractOptions): AsyncGenerator<TarEntryWithData>;
24
24
  /**
25
25
  * Extract archive to memory (no disk writes)
26
26
  */
@@ -1 +1 @@
1
- {"version":3,"file":"extract.d.ts","sourceRoot":"","sources":["../../src/node/extract.ts"],"names":[],"mappings":"AAAA;;GAEG;AAKH,OAAO,KAAK,EAAE,YAAY,EAAE,MAAM,kCAAkC,CAAC;AACrE,OAAO,EACL,KAAK,cAAc,EAEnB,KAAK,gBAAgB,EAEtB,MAAM,aAAa,CAAC;AAgIrB;;;;;;;;;;;;;;;;GAgBG;AACH,wBAAuB,OAAO,CAC5B,KAAK,EAAE,YAAY,EACnB,OAAO,GAAE,cAAmB,GAC3B,aAAa,CAAC,gBAAgB,CAAC,CAwBjC;AAED;;GAEG"}
1
+ {"version":3,"file":"extract.d.ts","sourceRoot":"","sources":["../../src/node/extract.ts"],"names":[],"mappings":"AAAA;;GAEG;AAGH,OAAO,KAAK,EAAE,YAAY,EAAE,MAAM,kCAAkC,CAAC;AACrE,OAAO,KAAK,EAAE,cAAc,EAAY,gBAAgB,EAAE,MAAM,aAAa,CAAC;AAkG9E;;;;;;;;;;;;;;;;GAgBG;AAEH,wBAAuB,OAAO,CAC5B,KAAK,EAAE,YAAY,EACnB,OAAO,GAAE,cAAmB,GAC3B,cAAc,CAAC,gBAAgB,CAAC,CA8HlC;AAED;;GAEG"}
@@ -1,124 +1,85 @@
1
1
  /**
2
2
  * Node.js TAR extraction with XZ decompression — v6 AsyncIterable API
3
3
  */
4
- import { Writable } from 'node:stream';
5
- import { calculatePadding } from '../tar/index.js';
6
4
  import { stripPath } from '../tar/utils.js';
7
- import { TarEntryType, } from '../types.js';
8
- import { parseNextHeader } from './tar-parser.js';
9
- import { collectAllChunks, decompressXz, runWritable } from './xz-helpers.js';
10
- /** Wrap a TarEntry + content Buffer into a TarEntryWithData. */
11
- function makeTarEntryWithData(entry, content) {
12
- const u8 = new Uint8Array(content.buffer, content.byteOffset, content.byteLength);
13
- async function collectBytes() {
14
- return u8;
15
- }
16
- async function collectText(encoding) {
17
- const bytes = await collectBytes();
18
- const enc = (encoding ?? 'utf-8');
19
- return Buffer.from(bytes).toString(enc);
20
- }
21
- return {
22
- ...entry,
23
- data: (async function* () {
24
- if (u8.length > 0)
25
- yield u8;
26
- })(),
27
- bytes: collectBytes,
28
- text: collectText,
29
- };
30
- }
5
+ import { parseTar } from './tar-parser.js';
6
+ import { streamXz } from './xz-helpers.js';
31
7
  /**
32
- * Transform stream that unpacks TAR format
8
+ * Concatenate an array of Uint8Array chunks into a single Uint8Array.
9
+ * @internal
33
10
  */
34
- class TarUnpack extends Writable {
35
- state = {
36
- buffer: Buffer.alloc(0),
37
- paxAttrs: null,
38
- emptyBlockCount: 0,
11
+ /**
12
+ * Wrap a TarEntry + a pull-callback for its content into a TarEntryWithData.
13
+ *
14
+ * The `dataPull` callback returns an `AsyncGenerator<Uint8Array>` that is
15
+ * backed by the outer `parseTar` generator. It is single-use — consuming it
16
+ * twice yields nothing on the second pass (JS AsyncGenerator default).
17
+ *
18
+ * `bytes()` throws if `entry.data` was already iterated — call `bytes()` before
19
+ * iterating `entry.data` if you need the full content (D-3 / F-2 contract).
20
+ * `text()` uses `Buffer.toString()` and accepts any `BufferEncoding` (including
21
+ * `'base64'`, `'hex'`, `'latin1'`) — same contract as the pre-v6.1.0 behavior.
22
+ * Holding a reference to the returned entry also holds the cached bytes; release
23
+ * the entry to allow GC.
24
+ */
25
+ function makeTarEntryWithData(entry, dataPull) {
26
+ let cachedBytes = null;
27
+ let dataIterStarted = false;
28
+ const dataGen = dataPull(); // single-use generator
29
+ // Wrap dataGen so that direct iteration of `entry.data` is detected.
30
+ // When the consumer calls `for await (const chunk of entry.data)`, the
31
+ // wrapper sets `dataIterStarted = true` so that a subsequent `bytes()` call
32
+ // can throw instead of silently returning incomplete bytes.
33
+ // Wrap dataGen behind a plain AsyncIterable so that:
34
+ // 1. `for await` iteration sets `dataIterStarted = true` (F-2 guard)
35
+ // 2. The type is `AsyncIterable<Uint8Array>` — matching TarEntryWithData.data
36
+ // and avoiding the `[Symbol.asyncDispose]` requirement that TS/lib.esnext
37
+ // adds to the full `AsyncGenerator` interface (Explicit Resource Management).
38
+ const dataWrapper = {
39
+ [Symbol.asyncIterator]() {
40
+ dataIterStarted = true;
41
+ return dataGen;
42
+ },
43
+ };
44
+ return {
45
+ ...entry,
46
+ data: dataWrapper,
47
+ async bytes() {
48
+ if (cachedBytes !== null)
49
+ return cachedBytes;
50
+ if (dataIterStarted) {
51
+ throw new Error('entry.data already iterated; bytes() cannot recover full content — call bytes() before iterating entry.data');
52
+ }
53
+ dataIterStarted = true;
54
+ // Alloc-once optimisation: entry.size is known from the TAR header, so we
55
+ // pre-allocate a single buffer and set() each arriving chunk at its running
56
+ // offset. This halves peak memory vs the chunks-array-then-concat pattern
57
+ // (no intermediate array + final copy simultaneously resident).
58
+ if (entry.size === 0) {
59
+ cachedBytes = new Uint8Array(0);
60
+ return cachedBytes;
61
+ }
62
+ const buf = new Uint8Array(entry.size);
63
+ let offset = 0;
64
+ for await (const c of dataGen) {
65
+ if (offset + c.byteLength > entry.size) {
66
+ // Malformed archive: chunk would write past the declared entry size.
67
+ // Truncate at entry.size to avoid out-of-bounds writes and throw so
68
+ // callers know the data is corrupt. Code matches the TAR_PARSER_INVARIANT
69
+ // convention used in parseTar (corrupt archive detected at parse level).
70
+ throw Object.assign(new Error(`tar: entry "${entry.name}" declared size ${entry.size} but received more bytes (offset ${offset} + chunk ${c.byteLength} = ${offset + c.byteLength})`), { code: 'TAR_PARSER_INVARIANT' });
71
+ }
72
+ buf.set(c, offset);
73
+ offset += c.byteLength;
74
+ }
75
+ cachedBytes = buf;
76
+ return cachedBytes;
77
+ },
78
+ async text(encoding) {
79
+ const raw = await this.bytes();
80
+ return Buffer.from(raw.buffer, raw.byteOffset, raw.byteLength).toString((encoding ?? 'utf8'));
81
+ },
39
82
  };
40
- currentEntry = null;
41
- bytesRemaining = 0;
42
- paddingRemaining = 0;
43
- contentChunks = [];
44
- entries = [];
45
- _write(chunk, _encoding, callback) {
46
- this.state.buffer = Buffer.concat([this.state.buffer, chunk]);
47
- try {
48
- this.processBuffer();
49
- callback();
50
- }
51
- catch (error) {
52
- callback(error);
53
- }
54
- }
55
- /** Skip padding bytes that follow a file's content blocks. */
56
- skipPadding() {
57
- if (this.paddingRemaining <= 0) {
58
- return false;
59
- }
60
- const skip = Math.min(this.paddingRemaining, this.state.buffer.length);
61
- this.state.buffer = this.state.buffer.subarray(skip);
62
- this.paddingRemaining -= skip;
63
- return true;
64
- }
65
- /** Read file content bytes into `contentChunks`; finalize entry when done. */
66
- readContent() {
67
- if (this.bytesRemaining <= 0 || !this.currentEntry) {
68
- return false;
69
- }
70
- const readSize = Math.min(this.bytesRemaining, this.state.buffer.length);
71
- this.contentChunks.push(this.state.buffer.subarray(0, readSize));
72
- this.state.buffer = this.state.buffer.subarray(readSize);
73
- this.bytesRemaining -= readSize;
74
- if (this.bytesRemaining === 0) {
75
- const content = Buffer.concat(this.contentChunks);
76
- this.entries.push({ ...this.currentEntry, content });
77
- this.paddingRemaining = calculatePadding(this.currentEntry.size);
78
- this.currentEntry = null;
79
- this.contentChunks = [];
80
- }
81
- return true;
82
- }
83
- /** Push a no-content entry (directory, symlink, hardlink, empty file). */
84
- pushEmptyEntry(entry) {
85
- this.entries.push({ ...entry, content: Buffer.alloc(0) });
86
- }
87
- /** Dispatch a parsed entry: push immediately or prepare for content read. */
88
- handleEntry(entry) {
89
- if (entry.type === TarEntryType.DIRECTORY ||
90
- entry.type === TarEntryType.SYMLINK ||
91
- entry.type === TarEntryType.HARDLINK ||
92
- entry.size === 0) {
93
- this.pushEmptyEntry(entry);
94
- return;
95
- }
96
- this.currentEntry = entry;
97
- this.bytesRemaining = entry.size;
98
- this.contentChunks = [];
99
- }
100
- processBuffer() {
101
- while (this.state.buffer.length > 0) {
102
- if (this.skipPadding())
103
- continue;
104
- if (this.readContent())
105
- continue;
106
- const result = parseNextHeader(this.state);
107
- if (result.action === 'need-more-data' || result.action === 'end-of-archive')
108
- break;
109
- if (result.action === 'pax-consumed')
110
- continue;
111
- this.handleEntry(result.entry);
112
- }
113
- }
114
- _final(callback) {
115
- if (this.bytesRemaining > 0) {
116
- callback(new Error(`Unexpected end of archive, ${this.bytesRemaining} bytes remaining`));
117
- }
118
- else {
119
- callback();
120
- }
121
- }
122
83
  }
123
84
  /**
124
85
  * Extract a tar.xz archive.
@@ -137,23 +98,130 @@ class TarUnpack extends Writable {
137
98
  * }
138
99
  * ```
139
100
  */
101
+ // biome-ignore lint/complexity/noExcessiveCognitiveComplexity: streaming generator with strip/filter/drain logic — complexity is intrinsic
140
102
  export async function* extract(input, options = {}) {
141
103
  const { strip = 0, filter } = options;
142
- const chunks = await collectAllChunks(input);
143
- const tarData = await decompressXz(chunks);
144
- const tarUnpack = new TarUnpack();
145
- await runWritable(tarUnpack, tarData);
146
- for (const entry of tarUnpack.entries) {
147
- const strippedName = stripPath(entry.name, strip);
148
- if (!strippedName) {
149
- continue;
104
+ const xzStream = streamXz(input);
105
+ const parser = parseTar(xzStream, 'extract');
106
+ // Lookahead: an event pulled from parseTar that hasn't been processed yet.
107
+ // This allows the data-generator to consume chunks and then "return" the
108
+ // terminating event (entry/end) for the outer loop to process.
109
+ let lookahead = null;
110
+ /** Pull next event from parser, respecting any pending lookahead. */
111
+ async function nextEvent() {
112
+ if (lookahead !== null) {
113
+ const ev = lookahead;
114
+ lookahead = null;
115
+ return { value: ev, done: false };
116
+ }
117
+ return parser.next();
118
+ }
119
+ /** Drain all remaining 'chunk' events for the current entry from parseTar.
120
+ * The terminating 'entry' or 'end' event is stored in `lookahead`. */
121
+ async function drainChunks() {
122
+ while (true) {
123
+ const result = await parser.next();
124
+ if (result.done)
125
+ return;
126
+ if (result.value.kind !== 'chunk') {
127
+ lookahead = result.value;
128
+ return;
129
+ }
130
+ }
131
+ }
132
+ try {
133
+ while (true) {
134
+ const result = await nextEvent();
135
+ if (result.done)
136
+ break;
137
+ const ev = result.value;
138
+ if (ev.kind === 'end')
139
+ break;
140
+ if (ev.kind === 'chunk') {
141
+ // Stray chunk at outer-loop level is a parser invariant violation (D-5).
142
+ const err = new Error('parser invariant: chunk emitted before entry');
143
+ err.code = 'TAR_PARSER_INVARIANT';
144
+ throw err;
145
+ }
146
+ // ev.kind === 'entry'
147
+ const rawEntry = ev.entry;
148
+ const strippedName = stripPath(rawEntry.name, strip);
149
+ if (!strippedName) {
150
+ await drainChunks();
151
+ continue;
152
+ }
153
+ const strippedEntry = { ...rawEntry, name: strippedName };
154
+ if (filter && !filter(strippedEntry)) {
155
+ await drainChunks();
156
+ continue;
157
+ }
158
+ // Build a data generator that pulls 'chunk' events from the parseTar stream.
159
+ // When chunks are exhausted it stores the next 'entry'/'end' in `lookahead`.
160
+ // The outer generator is suspended at `yield entryWithData` while the consumer
161
+ // iterates this — natural backpressure.
162
+ let dataGenInFlight = false;
163
+ function makeDataGen() {
164
+ if (dataGenInFlight) {
165
+ throw new Error('concurrent entry.data iteration is not supported');
166
+ }
167
+ dataGenInFlight = true;
168
+ return (async function* () {
169
+ try {
170
+ while (true) {
171
+ const r = await parser.next();
172
+ if (r.done)
173
+ return;
174
+ if (r.value.kind === 'chunk') {
175
+ yield r.value.data;
176
+ }
177
+ else {
178
+ // 'entry' or 'end' — store for outer loop.
179
+ lookahead = r.value;
180
+ return;
181
+ }
182
+ }
183
+ }
184
+ finally {
185
+ dataGenInFlight = false;
186
+ }
187
+ })();
188
+ }
189
+ const entryWithData = makeTarEntryWithData(strippedEntry, makeDataGen);
190
+ yield entryWithData;
191
+ // After the consumer advances past this entry, drain any remaining chunks
192
+ // that the consumer did not read (S-08 auto-drain, Case A per §12.4).
193
+ // If the data generator was fully consumed, lookahead is already set.
194
+ // If not, drain now.
195
+ if (lookahead === null) {
196
+ // Consumer did not fully iterate entry.data — drain remaining chunks.
197
+ try {
198
+ await drainChunks();
199
+ }
200
+ catch (err) {
201
+ // Decode/IO error during skipped data — swallow per D-2.
202
+ // TAR_PARSER_INVARIANT always re-throws per D-5.
203
+ if (err.code === 'TAR_PARSER_INVARIANT') {
204
+ throw err;
205
+ }
206
+ // Swallow other errors from skipped data per D-2.
207
+ }
208
+ }
209
+ }
210
+ }
211
+ finally {
212
+ // Case B (consumer break): close parser, no drain needed.
213
+ // Swallow cleanup errors per D-2. TAR_PARSER_INVARIANT handling: per D-5 these
214
+ // should always re-throw, but noUnsafeFinally prohibits throw in finally.
215
+ // In practice a TAR_PARSER_INVARIANT during parser.return() is a bug in our own
216
+ // code, not in the caller's data — the iterator is already being abandoned, so
217
+ // the invariant error will surface on the NEXT use attempt, which is unreachable
218
+ // here. We swallow it the same as other cleanup errors.
219
+ try {
220
+ await parser.return(undefined);
150
221
  }
151
- const strippedEntry = { ...entry, name: strippedName };
152
- if (filter && !filter(strippedEntry)) {
153
- continue;
222
+ catch {
223
+ // Swallow all cleanup errors per D-2.
154
224
  }
155
- const entryContent = entry.content;
156
- yield makeTarEntryWithData(strippedEntry, entryContent);
157
225
  }
158
226
  }
159
227
  /**