gn-native 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/Cargo.toml +23 -0
- package/README.md +135 -0
- package/gn-l0-multicorpus.snapshot +1 -0
- package/gn-native.linux-x64-gnu.node +0 -0
- package/index.d.ts +0 -0
- package/package.json +33 -0
- package/src/lib.rs +652 -0
- package/src/vtc_patch.rs +43 -0
package/Cargo.toml
ADDED
@@ -0,0 +1,23 @@
+[package]
+name = "gn-native"
+version = "0.1.0"
+edition = "2021"
+
+[lib]
+crate-type = ["cdylib"]
+
+[dependencies]
+napi = { version = "2", features = ["napi6", "tokio_rt", "async"] }
+napi-derive = "2"
+tokio = { version = "1", features = ["full"] }
+glasik-core = { path = ".." }
+libdeflater = "1.25"
+rayon = "1.10"
+serde_json = "1.0"
+sha2 = "0.11.0"
+hex = "0.4.3"
+flate2 = "1.1.9"
+
+[profile.release]
+lto = true
+codegen-units = 1
package/README.md
ADDED
@@ -0,0 +1,135 @@
+# gn-native
+
+Native Rust addon for Node.js — domain-adaptive lossless compression for LLM conversation streams.
+
+## What is GN?
+
+GN (Glasik Notation) is a split-stream tokenized compression codec built for LLM traffic.
+It learns vocabulary from conversation patterns and separates token IDs from literal bytes
+before compressing each stream independently with raw deflate.
+
+The result: better ratio than brotli-6 on LLM data, with 2-4x better tail latency.
+
+## Verified Benchmarks
+
+Standard protocol: warm 500 chunks, test 300, 4 corpora x 3 seeds = 12 measurements.
+
+### GN split-stream b=8 (production)
+
+| Corpus | Ratio | vs gzip | vs brotli-6 | p50 | p99 |
+|------------|-------------|---------|-------------|----------|----------|
+| ShareGPT | 2.49-2.52x | +15% | +2% | 0.043ms | 0.061ms |
+| WildChat | 2.48-2.51x | +15% | +2% | 0.042ms | 0.073ms |
+| LMSYS | 2.50-2.56x | +14% | +2% | 0.044ms | 0.079ms |
+| Ubuntu-IRC | 2.06-2.09x | +49% | +28% | 0.008ms | 0.013ms |
+
+Baselines (same data, per-batch fair comparison):
+
+| Algorithm | Ratio | p50 | p99 |
+|-----------|--------|----------|----------|
+| gzip-6 | 2.181x | 0.024ms | 0.220ms |
+| brotli-6 | 2.472x | 0.044ms | 0.226ms |
+
+GN split b=8 p99 never exceeds 0.123ms. Brotli-6 p99 reaches 0.226ms.
+
+### Production metrics (OpenClaw live, April 2026)
+
+- Messages processed: 3,570
+- Average ratio: 2.404x
+- Maximum ratio: 10.878x
+- Total bytes saved: 2,440.9 KB
+
+Real LLM agent traffic. Matches lab predictions.
+
+### Cold-start
+
+With L0 snapshot pre-loaded, GN beats brotli from chunk 0 (2.4903x vs 2.4271x at n=0).
+
+## Architecture
+
+### Split-stream insight
+
+Mixed tokenized streams pollute deflate with structural noise (ESCAPE bytes every 2 bytes).
+GN separates token IDs and literal bytes into independent streams, each compressed with raw deflate.
+Token stream: pure symbols with skewed distribution. Deflate loves it.
+Literal stream: clean text with no structural noise.
+
+### Tiered vocabulary (L0-L3)
+
+- **L0**: Universal (pre-trained, 20k entries, static)
+- **L1**: Domain (per shard type, learned online)
+- **L2**: Session (sliding window per session)
+- **L3**: Chunk (ephemeral N-grams, serialized into frame)
+
+### VTC v3 (Virtual Time Crystal identity)
+
+Every compressed shard has a deterministic crystal identity:
+VTC-v3-SHA256(shard_type || session_id || canonical_pairs || literal_hash || sequence_fingerprint)
+
+- Same content + same session = same VTC always
+- Different content, session, or shard type = different VTC guaranteed
+- Collision-resistant by construction, not just by hash probability
+- Includes literal residue (negative space) and emission order fingerprint
+
+### Frame format
+[1B shard_type][2B pairs_deflated_len LE][2B l3_ser_len LE][l3_ser][deflated_pairs][deflated_literals]
+
+Self-contained. Given the vocabulary snapshot, fully decodable without external state.
+
+## API
+
+```javascript
+const gn = require('gn-native');
+
+// Split-stream batch compression (production, b=8)
+const results = await gn.gnCompressSplitBatch(chunks); // Buffer[] -> Buffer (concatenated)
+
+// Single chunk
+const compressed = await gn.gnCompressSplit(data);
+const decompressed = await gn.gnDecompress(compressed);
+
+// Fractal sharding with VTC identity
+const vtc = await gn.gnCompressFractalWithVtc(data, 'user_intent', sessionId);
+// Returns: "VTC-v3-<64 hex chars>"
+
+// Fractal compress/decompress
+const frame = await gn.gnCompressFractal(data, 'user_intent', sessionId);
+const original = await gn.gnDecompressFractal(frame, 'user_intent', sessionId);
+
+// Vocabulary
+await gn.gnSaveSnapshot(path);
+await gn.gnLoadSnapshot(path);
+const stats = await gn.gnWindowStats();
+
+// Health check
+const ok = await gn.gnTest(); // returns "binding_ok"
+```
+
+### Shard types
+
+`user_intent` | `assistant_response` | `system_message` | `code_block` | `tool_call` | `tool_result` | `generic`
+
+## Installation
+
+```bash
+npm install gn-native
+```
+
+Requires pre-built `.node` addon for `linux-x64-gnu`.
+To build from source:
+
+```bash
+git clone https://github.com/atomsrkuul/glasik-core
+cd glasik-core/gn-node
+npm install
+npm run build
+```
+
+Requires Rust toolchain and napi-rs CLI.
+
+## Paper
+
+GN: Domain-Adaptive Lossless Compression for LLM Conversation Streams
+Robert Rider, Independent Researcher
+Pending arXiv cs.IR submission.
+GitHub: https://github.com/atomsrkuul/glasik-core (MIT)
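
Reviewer note on the README's "Frame format" section: the byte layout it lists can be illustrated with a small pack/parse pair. This is a minimal sketch written from the layout string alone, not code taken from the package. The function names `packFrame` and `parseFrame` are hypothetical, the numeric shard-type code is a placeholder, and the sketch assumes that `deflated_literals` runs to the end of the frame (the layout gives it no length field) and that both streams use raw deflate as the README describes.

```javascript
const zlib = require('node:zlib');

// Header: 1B shard_type + 2B pairs_deflated_len (LE) + 2B l3_ser_len (LE).
const HEADER_LEN = 5;

// Build a frame from uncompressed parts (hypothetical helper, not the package API).
function packFrame(shardType, l3Ser, pairs, literals) {
  const deflatedPairs = zlib.deflateRawSync(pairs);
  const deflatedLiterals = zlib.deflateRawSync(literals);
  if (deflatedPairs.length > 0xffff || l3Ser.length > 0xffff) {
    throw new RangeError('length does not fit in a 2-byte field');
  }
  const header = Buffer.alloc(HEADER_LEN);
  header.writeUInt8(shardType, 0);
  header.writeUInt16LE(deflatedPairs.length, 1);
  header.writeUInt16LE(l3Ser.length, 3);
  return Buffer.concat([header, l3Ser, deflatedPairs, deflatedLiterals]);
}

// Split a frame back into its parts and inflate the two streams.
function parseFrame(frame) {
  if (frame.length < HEADER_LEN) throw new RangeError('frame too short');
  const shardType = frame.readUInt8(0);
  const pairsLen = frame.readUInt16LE(1);
  const l3Len = frame.readUInt16LE(3);
  const pairsStart = HEADER_LEN + l3Len;
  const literalsStart = pairsStart + pairsLen;
  if (literalsStart > frame.length) throw new RangeError('truncated frame');
  return {
    shardType,
    l3Ser: frame.subarray(HEADER_LEN, pairsStart),
    pairs: zlib.inflateRawSync(frame.subarray(pairsStart, literalsStart)),
    // Assumption: the literal stream occupies the rest of the frame.
    literals: zlib.inflateRawSync(frame.subarray(literalsStart)),
  };
}

// Round trip with placeholder data.
const demoFrame = packFrame(0, Buffer.from('l3'), Buffer.from([1, 0, 7, 0]), Buffer.from('hello world'));
const demoParsed = parseFrame(demoFrame);
console.log(demoParsed.shardType, demoParsed.literals.toString());
```

The two 2-byte length fields cap `l3_ser` and the deflated pair stream at 65,535 bytes each, which is why the sketch rejects larger inputs instead of silently truncating them.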
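
Reviewer note on the README's "VTC v3" section: the identity formula `VTC-v3-SHA256(shard_type || session_id || canonical_pairs || literal_hash || sequence_fingerprint)` can be sketched with Node's built-in `crypto` module. This is an illustration under stated assumptions, not the package's implementation. The function name `vtcV3` is hypothetical, the README does not say how `canonical_pairs`, `literal_hash`, or `sequence_fingerprint` are derived (they are treated here as opaque byte inputs), and the 4-byte length prefix on each field is this sketch's own choice to keep field boundaries unambiguous.

```javascript
const crypto = require('node:crypto');

// Hash the five fields in the order given by the README formula.
// Assumption: each field is length-prefixed (4 bytes, little-endian) before hashing.
function vtcV3({ shardType, sessionId, canonicalPairs, literalHash, sequenceFingerprint }) {
  const hash = crypto.createHash('sha256');
  for (const field of [shardType, sessionId, canonicalPairs, literalHash, sequenceFingerprint]) {
    const bytes = Buffer.isBuffer(field) ? field : Buffer.from(String(field), 'utf8');
    const len = Buffer.alloc(4);
    len.writeUInt32LE(bytes.length, 0);
    hash.update(len);
    hash.update(bytes);
  }
  return 'VTC-v3-' + hash.digest('hex');
}

// Placeholder inputs; real values would come from the compressor.
const demoInputs = {
  shardType: 'user_intent',
  sessionId: 'session-123',
  canonicalPairs: Buffer.from([1, 0, 7, 0]),
  literalHash: crypto.createHash('sha256').update('literal bytes').digest(),
  sequenceFingerprint: Buffer.from([0, 1]),
};
console.log(vtcV3(demoInputs)); // "VTC-v3-" followed by 64 hex characters
```

Without a delimiter or length prefix, plain concatenation would let `("ab", "c")` and `("a", "bc")` hash to the same value, which is the reason for the prefix in this sketch.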