gn-native 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/Cargo.toml ADDED
@@ -0,0 +1,23 @@
1
+ [package]
2
+ name = "gn-native"
3
+ version = "0.1.0"
4
+ edition = "2021"
5
+
6
+ [lib]
7
+ crate-type = ["cdylib"]
8
+
9
+ [dependencies]
10
+ napi = { version = "2", features = ["napi6", "tokio_rt", "async"] }
11
+ napi-derive = "2"
12
+ tokio = { version = "1", features = ["full"] }
13
+ glasik-core = { path = ".." }
14
+ libdeflater = "1.25"
15
+ rayon = "1.10"
16
+ serde_json = "1.0"
17
+ sha2 = "0.11.0"
18
+ hex = "0.4.3"
19
+ flate2 = "1.1.9"
20
+
21
+ [profile.release]
22
+ lto = true
23
+ codegen-units = 1
package/README.md ADDED
@@ -0,0 +1,135 @@
1
+ # gn-native
2
+
3
+ Native Rust addon for Node.js — domain-adaptive lossless compression for LLM conversation streams.
4
+
5
+ ## What is GN?
6
+
7
+ GN (Glasik Notation) is a split-stream tokenized compression codec built for LLM traffic.
8
+ It learns vocabulary from conversation patterns and separates token IDs from literal bytes
9
+ before compressing each stream independently with raw deflate.
10
+
11
+ The result: better ratio than brotli-6 on LLM data, with 2-4x better tail latency.
12
+
13
+ ## Verified Benchmarks
14
+
15
+ Standard protocol: warm 500 chunks, test 300, 4 corpora x 3 seeds = 12 measurements.
16
+
17
+ ### GN split-stream b=8 (production)
18
+
19
+ | Corpus | Ratio | vs gzip | vs brotli-6 | p50 | p99 |
20
+ |------------|-------------|---------|-------------|----------|----------|
21
+ | ShareGPT | 2.49-2.52x | +15% | +2% | 0.043ms | 0.061ms |
22
+ | WildChat | 2.48-2.51x | +15% | +2% | 0.042ms | 0.073ms |
23
+ | LMSYS | 2.50-2.56x | +14% | +2% | 0.044ms | 0.079ms |
24
+ | Ubuntu-IRC | 2.06-2.09x | +49% | +28% | 0.008ms | 0.013ms |
25
+
26
+ Baselines (same data, per-batch fair comparison):
27
+
28
+ | Algorithm | Ratio | p50 | p99 |
29
+ |-----------|--------|----------|----------|
30
+ | gzip-6 | 2.181x | 0.024ms | 0.220ms |
31
+ | brotli-6 | 2.472x | 0.044ms | 0.226ms |
32
+
33
+ GN split b=8 p99 never exceeds 0.123ms. Brotli-6 p99 reaches 0.226ms.
34
+
35
+ ### Production metrics (OpenClaw live, April 2026)
36
+
37
+ - Messages processed: 3,570
38
+ - Average ratio: 2.404x
39
+ - Maximum ratio: 10.878x
40
+ - Total bytes saved: 2,440.9 KB
41
+
42
+ Real LLM agent traffic. Matches lab predictions.
43
+
44
+ ### Cold-start
45
+
46
+ With L0 snapshot pre-loaded, GN beats brotli from chunk 0 (2.4903x vs 2.4271x at n=0).
47
+
48
+ ## Architecture
49
+
50
+ ### Split-stream insight
51
+
52
+ Mixed tokenized streams pollute deflate with structural noise (ESCAPE bytes every 2 bytes).
53
+ GN separates token IDs and literal bytes into independent streams, each compressed with raw deflate.
54
+ Token stream: pure symbols with skewed distribution. Deflate loves it.
55
+ Literal stream: clean text with no structural noise.
56
+
57
+ ### Tiered vocabulary (L0-L3)
58
+
59
+ - **L0**: Universal (pre-trained, 20k entries, static)
60
+ - **L1**: Domain (per shard type, learned online)
61
+ - **L2**: Session (sliding window per session)
62
+ - **L3**: Chunk (ephemeral N-grams, serialized into frame)
63
+
64
+ ### VTC v3 (Virtual Time Crystal identity)
65
+
66
+ Every compressed shard has a deterministic crystal identity:
67
+ VTC-v3-SHA256(shard_type || session_id || canonical_pairs || literal_hash || sequence_fingerprint)
68
+
69
+ - Same content + same session = same VTC always
70
+ - Different content, session, or shard type = different VTC guaranteed
71
+ - Collision-resistant by construction, not just by hash probability
72
+ - Includes literal residue (negative space) and emission order fingerprint
73
+
74
+ ### Frame format
75
+ [1B shard_type][2B pairs_deflated_len LE][2B l3_ser_len LE][l3_ser][deflated_pairs][deflated_literals]
76
+
77
+ Self-contained. Given the vocabulary snapshot, fully decodable without external state.
78
+
79
+ ## API
80
+
81
+ ```javascript
82
+ const gn = require('gn-native');
83
+
84
+ // Split-stream batch compression (production, b=8)
85
+ const results = await gn.gnCompressSplitBatch(chunks); // Buffer[] -> Buffer (concatenated)
86
+
87
+ // Single chunk
88
+ const compressed = await gn.gnCompressSplit(data);
89
+ const decompressed = await gn.gnDecompress(compressed);
90
+
91
+ // Fractal sharding with VTC identity
92
+ const vtc = await gn.gnCompressFractalWithVtc(data, 'user_intent', sessionId);
93
+ // Returns: "VTC-v3-<64 hex chars>"
94
+
95
+ // Fractal compress/decompress
96
+ const frame = await gn.gnCompressFractal(data, 'user_intent', sessionId);
97
+ const original = await gn.gnDecompressFractal(frame, 'user_intent', sessionId);
98
+
99
+ // Vocabulary
100
+ await gn.gnSaveSnapshot(path);
101
+ await gn.gnLoadSnapshot(path);
102
+ const stats = await gn.gnWindowStats();
103
+
104
+ // Health check
105
+ const ok = await gn.gnTest(); // returns "binding_ok"
106
+ ```
107
+
108
+ ### Shard types
109
+
110
+ `user_intent` | `assistant_response` | `system_message` | `code_block` | `tool_call` | `tool_result` | `generic`
111
+
112
+ ## Installation
113
+
114
+ ```bash
115
+ npm install gn-native
116
+ ```
117
+
118
+ Requires pre-built `.node` addon for `linux-x64-gnu`.
119
+ To build from source:
120
+
121
+ ```bash
122
+ git clone https://github.com/atomsrkuul/glasik-core
123
+ cd glasik-core/gn-node
124
+ npm install
125
+ npm run build
126
+ ```
127
+
128
+ Requires Rust toolchain and napi-rs CLI.
129
+
130
+ ## Paper
131
+
132
+ GN: Domain-Adaptive Lossless Compression for LLM Conversation Streams
133
+ Robert Rider, Independent Researcher
134
+ Pending arXiv cs.IR submission.
135
+ GitHub: https://github.com/atomsrkuul/glasik-core (MIT)