npm - slimjson - Versions diffs - 1.1.0 → 1.1.2 - Mend

slimjson 1.1.0 → 1.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/README.md +34 -25
package/README_EN.md +34 -25
package/compress.js +44 -74
package/coverage/clover.xml +275 -0
package/coverage/coverage-final.json +2 -0
package/coverage/lcov-report/base.css +224 -0
package/coverage/lcov-report/block-navigation.js +87 -0
package/coverage/lcov-report/compress.js.html +1630 -0
package/coverage/lcov-report/favicon.png +0 -0
package/coverage/lcov-report/index.html +116 -0
package/coverage/lcov-report/prettify.css +1 -0
package/coverage/lcov-report/prettify.js +2 -0
package/coverage/lcov-report/sort-arrow-sprite.png +0 -0
package/coverage/lcov-report/sorter.js +210 -0
package/coverage/lcov.info +636 -0
package/data/data.json +96365 -0
package/data/data.json.slim +1 -0
package/esm.mjs +1 -0
package/package.json +1 -1
package/test.js +719 -214
package/.claude/settings.local.json +0 -11

package/README.md CHANGED

@@ -440,61 +440,70 @@ console.log(`压缩率: ${ratio}%`);
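 ## LLM Data Retrieval Accuracy
-LLM comprehension accuracy was tested across formats using 209 data retrieval questions.
+LLM comprehension accuracy was tested across formats using 209 data retrieval questions on 2 models.
 #### Efficiency Ranking (Accuracy per 1K Tokens)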
 ```
-slimjson       ████████████████████   44.4 acc%/1K tok  │  94.7% acc  │  2,134 tokens
-TOON           ███████████████░░░░░   34.0 acc%/1K tok  │  92.8% acc  │  2,734 tokens
+slimjson       ████████████████████   44.3 acc%/1K tok  │  94.5% acc  │  2,133 tokens
+TOON           ███████████████░░░░░   33.8 acc%/1K tok  │  92.3% acc  │  2,734 tokens
 JSON compact   ██████████████░░░░░░   31.0 acc%/1K tok  │  95.2% acc  │  3,072 tokens
-YAML           ███████████░░░░░░░░░   25.4 acc%/1K tok  │  94.3% acc  │  3,716 tokens
-JSON           ██████████░░░░░░░░░░   21.1 acc%/1K tok  │  95.7% acc  │  4,538 tokens
-XML            ████████░░░░░░░░░░░░   18.5 acc%/1K tok  │  95.7% acc  │  5,162 tokens
+YAML           ███████████░░░░░░░░░   24.9 acc%/1K tok  │  92.3% acc  │  3,716 tokens
+JSON           █████████░░░░░░░░░░░   20.3 acc%/1K tok  │  92.3% acc  │  4,538 tokens
+XML            ████████░░░░░░░░░░░░   18.1 acc%/1K tok  │  93.3% acc  │  5,162 tokens
 ```
 *Efficiency score = (accuracy % ÷ tokens) × 1,000; higher is better.*
-> slimjson accuracy: **94.7%** (vs JSON's 95.7%), while saving **53.0%** of tokens.
+> slimjson accuracy: **94.5%** (vs JSON's 92.3%), while saving **53.0%** of tokens.
 #### Per-Model Accuracy
 ```
 deepseek-v4-flash
-  JSON           ███████████████████░    95.7% (200/209)
   XML            ███████████████████░    95.7% (200/209)
+  JSON           ███████████████████░    95.7% (200/209)
   JSON compact   ███████████████████░    95.2% (199/209)
-→ slimjson       ███████████████████░    94.7% (198/209)
   YAML           ███████████████████░    94.3% (197/209)
+→ slimjson       ███████████████████░    93.3% (195/209)
   TOON           ███████████████████░    92.8% (194/209)
   CSV            ██████████████████░░    91.7% (100/109)
+mimo-v2.5-pro
+→ slimjson       ███████████████████░    95.7% (200/209)
+  JSON compact   ███████████████████░    95.2% (199/209)
+  TOON           ██████████████████░░    91.9% (192/209)
+  XML            ██████████████████░░    90.9% (190/209)
+  YAML           ██████████████████░░    90.4% (189/209)
+  JSON           ██████████████████░░    89.0% (186/209)
+  CSV            ██████████████████░░    88.1% (96/109)
 ```
 #### Accuracy by Question Type
-| Question Type | JSON | XML | JSON compact | slimjson | YAML | TOON | CSV |
-|------|------|-----|-------------|----------|------|------|-----|
-| Field Retrieval | 98.5% | 97.1% | 98.5% | 95.6% | 97.1% | 91.2% | 96.9% |
-| Aggregation | 98.4% | 96.8% | 95.2% | 95.2% | 93.7% | 95.2% | 86.2% |
-| Filtering | 97.9% | 97.9% | 100.0% | 100.0% | 100.0% | 100.0% | 96.3% |
-| Structure Awareness | 88.0% | 92.0% | 84.0% | 92.0% | 88.0% | 88.0% | 87.5% |
-| Structural Validation | 40.0% | 60.0% | 60.0% | 40.0% | 40.0% | 40.0% | 80.0% |
+| Question Type | JSON compact | slimjson | XML | JSON | TOON | YAML | CSV |
+|------|-------------|----------|-----|------|------|------|-----|
+| Field Retrieval | 99.3% | 98.5% | 98.5% | 99.3% | 95.6% | 98.5% | 98.4% |
+| Aggregation | 94.4% | 96.0% | 88.9% | 89.7% | 92.9% | 90.5% | 84.5% |
+| Filtering | 97.9% | 96.9% | 94.8% | 91.7% | 93.8% | 92.7% | 88.9% |
+| Structure Awareness | 88.0% | 88.0% | 90.0% | 90.0% | 90.0% | 88.0% | 87.5% |
+| Structural Validation | 60.0% | 30.0% | 80.0% | 50.0% | 40.0% | 50.0% | 80.0% |
 #### Datasets Tested
-| Dataset | Rows | Structure | CSV Support |
-|--------|------|----------|----------|
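-| Uniform employee records | 100 | uniform | ✓ |
-| E-commerce orders (nested) | 50 | nested | ✗ |
-| Time-series analytics data | 60 | uniform | ✓ |
-| Top 100 GitHub repositories | 100 | uniform | ✓ |
-| Semi-uniform event logs | 75 | semi-uniform | ✗ |
-| Deeply nested configuration | 11 | deep | ✗ |
+| Dataset | Rows | Structure | CSV Support | Tabular % |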
+|--------|------|----------|----------|-----------|
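+| Uniform employee records | 100 | uniform | ✓ | 100% |
+| E-commerce orders (nested) | 50 | nested | ✗ | 33% |
+| Time-series analytics data | 60 | uniform | ✓ | 100% |
+| Top 100 GitHub repositories | 100 | uniform | ✓ | 100% |
+| Semi-uniform event logs | 75 | semi-uniform | ✗ | 50% |
+| Deeply nested configuration | 11 | deep | ✗ | 0% |
 ## Development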
 ```bash
-# Run tests (192 cases, 100% coverage)
+# Run tests (209 cases, 100% coverage)
 npm test
 # Run compression ratio benchmarks (with trim comparison)
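The efficiency score in the ranking is plain arithmetic on the table's own columns: accuracy in percent divided by the token count, scaled to 1,000 tokens. A minimal JavaScript sketch that recomputes the slimjson and JSON scores and the 53.0% token saving from the listed figures; `efficiencyScore` and `tokenSavingsPct` are illustrative names, not part of slimjson.

```javascript
// Efficiency score as defined in the README: (accuracy % ÷ tokens) × 1,000.
function efficiencyScore(accuracyPct, tokens) {
  return (accuracyPct / tokens) * 1000;
}

// Token savings of one format relative to a baseline format, in percent.
function tokenSavingsPct(tokens, baselineTokens) {
  return (1 - tokens / baselineTokens) * 100;
}

// Figures copied from the efficiency ranking.
const slimjson = { accuracyPct: 94.5, tokens: 2133 };
const json = { accuracyPct: 92.3, tokens: 4538 };

console.log(efficiencyScore(slimjson.accuracyPct, slimjson.tokens).toFixed(1)); // "44.3"
console.log(efficiencyScore(json.accuracyPct, json.tokens).toFixed(1));         // "20.3"
console.log(tokenSavingsPct(slimjson.tokens, json.tokens).toFixed(1));          // "53.0"
```

Because the listed accuracies are rounded to one decimal, recomputing from them can miss the table by 0.1: YAML gives 24.8 here against the listed 24.9, which suggests the benchmark computed scores from unrounded accuracy.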

package/README_EN.md CHANGED

@@ -429,61 +429,70 @@ Flat tabular datasets where CSV is applicable.
 ## LLM Data Retrieval Accuracy
-Accuracy tested with 209 data retrieval questions across different input formats.
+Accuracy tested with 209 data retrieval questions across 2 LLMs on different input formats.
 #### Efficiency Ranking (Accuracy per 1K Tokens)
 ```
-slimjson       ████████████████████   44.4 acc%/1K tok  │  94.7% acc  │  2,134 tokens
-TOON           ███████████████░░░░░   34.0 acc%/1K tok  │  92.8% acc  │  2,734 tokens
+slimjson       ████████████████████   44.3 acc%/1K tok  │  94.5% acc  │  2,133 tokens
+TOON           ███████████████░░░░░   33.8 acc%/1K tok  │  92.3% acc  │  2,734 tokens
 JSON compact   ██████████████░░░░░░   31.0 acc%/1K tok  │  95.2% acc  │  3,072 tokens
-YAML           ███████████░░░░░░░░░   25.4 acc%/1K tok  │  94.3% acc  │  3,716 tokens
-JSON           ██████████░░░░░░░░░░   21.1 acc%/1K tok  │  95.7% acc  │  4,538 tokens
-XML            ████████░░░░░░░░░░░░   18.5 acc%/1K tok  │  95.7% acc  │  5,162 tokens
+YAML           ███████████░░░░░░░░░   24.9 acc%/1K tok  │  92.3% acc  │  3,716 tokens
+JSON           █████████░░░░░░░░░░░   20.3 acc%/1K tok  │  92.3% acc  │  4,538 tokens
+XML            ████████░░░░░░░░░░░░   18.1 acc%/1K tok  │  93.3% acc  │  5,162 tokens
 ```
 *Efficiency score = (Accuracy % ÷ Tokens) × 1,000. Higher is better.*
-> slimjson achieves **94.7%** accuracy (vs JSON's 95.7%) while using **53.0% fewer tokens**.
+> slimjson achieves **94.5%** accuracy (vs JSON's 92.3%) while using **53.0% fewer tokens**.
 #### Per-Model Accuracy
 ```
 deepseek-v4-flash
-  JSON           ███████████████████░    95.7% (200/209)
   XML            ███████████████████░    95.7% (200/209)
+  JSON           ███████████████████░    95.7% (200/209)
   JSON compact   ███████████████████░    95.2% (199/209)
-→ slimjson       ███████████████████░    94.7% (198/209)
   YAML           ███████████████████░    94.3% (197/209)
+→ slimjson       ███████████████████░    93.3% (195/209)
   TOON           ███████████████████░    92.8% (194/209)
   CSV            ██████████████████░░    91.7% (100/109)
+mimo-v2.5-pro
+→ slimjson       ███████████████████░    95.7% (200/209)
+  JSON compact   ███████████████████░    95.2% (199/209)
+  TOON           ██████████████████░░    91.9% (192/209)
+  XML            ██████████████████░░    90.9% (190/209)
+  YAML           ██████████████████░░    90.4% (189/209)
+  JSON           ██████████████████░░    89.0% (186/209)
+  CSV            ██████████████████░░    88.1% (96/109)
 ```
 #### Accuracy by Question Type
-| Question Type | JSON | XML | JSON compact | slimjson | YAML | TOON | CSV |
-|---------------|------|-----|-------------|----------|------|------|-----|
-| Field Retrieval | 98.5% | 97.1% | 98.5% | 95.6% | 97.1% | 91.2% | 96.9% |
-| Aggregation | 98.4% | 96.8% | 95.2% | 95.2% | 93.7% | 95.2% | 86.2% |
-| Filtering | 97.9% | 97.9% | 100.0% | 100.0% | 100.0% | 100.0% | 96.3% |
-| Structure Awareness | 88.0% | 92.0% | 84.0% | 92.0% | 88.0% | 88.0% | 87.5% |
-| Structural Validation | 40.0% | 60.0% | 60.0% | 40.0% | 40.0% | 40.0% | 80.0% |
+| Question Type | JSON compact | slimjson | XML | JSON | TOON | YAML | CSV |
+|---------------|-------------|----------|-----|------|------|------|-----|
+| Field Retrieval | 99.3% | 98.5% | 98.5% | 99.3% | 95.6% | 98.5% | 98.4% |
+| Aggregation | 94.4% | 96.0% | 88.9% | 89.7% | 92.9% | 90.5% | 84.5% |
+| Filtering | 97.9% | 96.9% | 94.8% | 91.7% | 93.8% | 92.7% | 88.9% |
+| Structure Awareness | 88.0% | 88.0% | 90.0% | 90.0% | 90.0% | 88.0% | 87.5% |
+| Structural Validation | 60.0% | 30.0% | 80.0% | 50.0% | 40.0% | 50.0% | 80.0% |
 #### Datasets Tested
-| Dataset | Rows | Structure | CSV Support |
-|---------|------|-----------|-------------|
-| Uniform employee records | 100 | uniform | ✓ |
-| E-commerce orders (nested) | 50 | nested | ✗ |
-| Time-series analytics data | 60 | uniform | ✓ |
-| Top 100 GitHub repositories | 100 | uniform | ✓ |
-| Semi-uniform event logs | 75 | semi-uniform | ✗ |
-| Deeply nested configuration | 11 | deep | ✗ |
+| Dataset | Rows | Structure | CSV Support | Tabular % |
+|---------|------|-----------|-------------|-----------|
+| Uniform employee records | 100 | uniform | ✓ | 100% |
+| E-commerce orders (nested) | 50 | nested | ✗ | 33% |
+| Time-series analytics data | 60 | uniform | ✓ | 100% |
+| Top 100 GitHub repositories | 100 | uniform | ✓ | 100% |
+| Semi-uniform event logs | 75 | semi-uniform | ✗ | 50% |
+| Deeply nested configuration | 11 | deep | ✗ | 0% |
 ## Development
 ```bash
-# Run tests (192 cases, 100% coverage)
+# Run tests (209 cases, 100% coverage)
 npm test
 # Run compression ratio benchmarks (with trim comparison)
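The README does not say how the two models' results are combined, but the headline accuracies in the efficiency ranking match pooling the per-model counts: correct answers summed over both models, divided by 2 × 209 questions. A sketch under that assumption, using the counts from the per-model block (CSV is left out because it was scored on only 109 questions):

```javascript
// Per-model correct-answer counts from the "Per-Model Accuracy" block.
const QUESTIONS_PER_MODEL = 209;
const correct = {
  slimjson:       { "deepseek-v4-flash": 195, "mimo-v2.5-pro": 200 },
  "JSON compact": { "deepseek-v4-flash": 199, "mimo-v2.5-pro": 199 },
  XML:            { "deepseek-v4-flash": 200, "mimo-v2.5-pro": 190 },
  JSON:           { "deepseek-v4-flash": 200, "mimo-v2.5-pro": 186 },
  YAML:           { "deepseek-v4-flash": 197, "mimo-v2.5-pro": 189 },
  TOON:           { "deepseek-v4-flash": 194, "mimo-v2.5-pro": 192 },
};

// Pool the counts: total correct over total questions asked, as a percentage.
function pooledAccuracy(perModel) {
  const counts = Object.values(perModel);
  const total = counts.reduce((sum, n) => sum + n, 0);
  return (100 * total) / (QUESTIONS_PER_MODEL * counts.length);
}

for (const [format, perModel] of Object.entries(correct)) {
  console.log(`${format.padEnd(13)} ${pooledAccuracy(perModel).toFixed(1)}%`);
}
// slimjson 94.5%, JSON compact 95.2%, XML 93.3%, JSON 92.3%, YAML 92.3%, TOON 92.3%
```

Since both models answered the same 209 questions, pooling the counts and averaging the two per-model percentages give identical results.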

package/compress.js CHANGED

@@ -13,7 +13,6 @@
  */
 function mergeSchemas(s1, s2) {
     if (!Array.isArray(s1) || !Array.isArray(s2)) return s1;
     const first1 = s1[0];
     const first2 = s2[0];
@@ -40,12 +39,7 @@ function mergeSchemas(s1, s2) {
     }
     // Both are arrays (not object schemas) → recursively merge the first element
-    if (Array.isArray(first1) && Array.isArray(first2)) {
-        return [mergeSchemas(first1, first2)];
-    }
-    // Other cases (primitive arrays or type mismatch) → take the first
-    return s1;
+    return [mergeSchemas(first1, first2)];
 }
 /**
@@ -76,12 +70,10 @@ function inferSchema(value) {
             return [inferObjectSchema(objects)];
         }
         // Primitive array - not compressed, left to the parent
-        return undefined;
-    }
-    if (typeof value === 'object' && value !== null) {
-        return inferObjectSchema([value]);
+        return;
     }
-    return undefined;
+    // value is a single object
+    return inferObjectSchema([value]);
 }
 /**
@@ -106,7 +98,7 @@ function inferObjectSchema(objects) {
     }
     return keyOrder.map(key => {
-        const values = keyValues.get(key) || [];
+        const values = keyValues.get(key);
         if (values.length === 0) return key;
         const sample = values[0];
@@ -147,15 +139,12 @@ function inferObjectSchema(objects) {
                 for (const v of values) {
                     if (Array.isArray(v)) {
                         const s = inferSchema(v);
-                        if (s) {
-                            // inferSchema returns [innerSchema]; take innerSchema for merging
-                            const inner = Array.isArray(s) && s.length === 1 ? s[0] : s;
-                            merged = merged ? mergeSchemas(merged, inner) : inner;
-                        }
+                        const inner = s[0];
+                        merged = merged ? mergeSchemas(merged, inner) : inner;
                     }
                 }
                 // Wrap in another [] to mark an "array of arrays"
-                return { [key]: [merged || inferSchema(sample[0])] };
+                return { [key]: [merged] };
             }
             // Primitive array (e.g. ["张三","李四"]) → not compressed, use the key name directly
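The schema shapes behind `inferSchema` and `inferObjectSchema` can be read off the hunks above: a field is either a bare key name or `{ key: subSchema }`, and an array of objects is wrapped as `[fieldList]`. The following is a hypothetical, simplified re-creation, not the library's code; the `keyOrder`/`keyValues` names mirror the diff, while the key-collection loop and the `isPlainObject` helper are assumptions. Unlike the real code, it leaves arrays of arrays uncompressed.

```javascript
// Hypothetical, simplified schema inference in the spirit of slimjson's
// inferSchema/inferObjectSchema; it is not the library's code.
// Schema shapes assumed from the diff:
//   "key"            plain field, value stored as-is
//   { key: [...] }   nested object with its own field list
//   { key: [[...]] } array of objects sharing one field list
const isPlainObject = v => v !== null && typeof v === 'object' && !Array.isArray(v);

function inferSchema(value) {
  if (Array.isArray(value)) {
    const objects = value.filter(isPlainObject);
    // Arrays of primitives (or of arrays) are left uncompressed in this sketch.
    return objects.length > 0 ? [inferObjectSchema(objects)] : undefined;
  }
  return inferObjectSchema([value]); // a single object
}

function inferObjectSchema(objects) {
  // Union of keys across all objects in first-seen order, with their non-null values.
  const keyOrder = [];
  const keyValues = new Map();
  for (const obj of objects) {
    for (const [key, val] of Object.entries(obj)) {
      if (!keyValues.has(key)) {
        keyOrder.push(key);
        keyValues.set(key, []);
      }
      if (val != null) keyValues.get(key).push(val);
    }
  }
  return keyOrder.map(key => {
    const values = keyValues.get(key);
    if (values.length === 0) return key; // only null/undefined seen
    if (Array.isArray(values[0])) {
      // Pool the elements of every array under this key and infer one item schema.
      const inner = inferSchema(values.filter(Array.isArray).flat());
      return inner ? { [key]: inner } : key;
    }
    if (isPlainObject(values[0])) {
      return { [key]: inferObjectSchema(values.filter(isPlainObject)) };
    }
    return key; // primitive field
  });
}

const users = [
  { id: 1, name: 'Ann', tags: ['a'], address: { city: 'Oslo' } },
  { id: 2, name: 'Bo', orders: [{ sku: 'x', qty: 2 }] },
];
console.log(JSON.stringify(inferSchema(users)));
// [["id","name","tags",{"address":["city"]},{"orders":[["sku","qty"]]}]]
```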
@@ -180,31 +169,21 @@ function compressWithSchema(value, schema) {
         return value.map(item => compressWithSchema(item, inner));
     }
-    // schema contains undefined → primitive array, not compressed
-    if (Array.isArray(schema) && schema.some(s => s === undefined || s === null)) {
-        return value;
-    }
     // schema is an array (object schema) → value is an object
-    if (Array.isArray(schema)) {
-        if (Array.isArray(value)) return value.length === 0 ? [] : null;
-        if (!value || typeof value !== 'object') return value;
-        return schema.map(fieldDef => {
-            let key, valueSchema;
-            if (typeof fieldDef === 'string') {
-                key = fieldDef;
-                valueSchema = undefined;
-            } else {
-                key = Object.keys(fieldDef)[0];
-                valueSchema = fieldDef[key];
-            }
-            const val = value[key];
-            if (val == null) return null;
-            return compressWithSchema(val, valueSchema);
-        });
-    }
-    return value;
+    if (!value || typeof value !== 'object') return value;
+    return schema.map(fieldDef => {
+        let key, valueSchema;
+        if (typeof fieldDef === 'string') {
+            key = fieldDef;
+            valueSchema = undefined;
+        } else {
+            key = Object.keys(fieldDef)[0];
+            valueSchema = fieldDef[key];
+        }
+        const val = value[key];
+        if (val == null) return null;
+        return compressWithSchema(val, valueSchema);
+    });
 }
 /**
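`compressWithSchema` turns each object into a positional array ordered by its schema, so key names are stored once in the schema instead of once per record. A minimal sketch of that step, modeled on the visible branches: `[innerSchema]` maps over an array, primitives pass through, and missing or `null` fields become `null`. The handling of an `undefined` schema and the array-shape guard are assumptions, not taken from the diff.

```javascript
// Simplified sketch of schema-driven compression, modeled on the branches of
// compressWithSchema visible in the diff; edge-case handling is assumed.
function compressWithSchema(value, schema) {
  if (schema === undefined) return value; // plain field: keep the value as-is (assumed)
  // [innerSchema] where innerSchema is a field list: an array of objects
  if (schema.length === 1 && Array.isArray(schema[0])) {
    if (!Array.isArray(value)) return value; // shape-mismatch guard (assumed)
    return value.map(item => compressWithSchema(item, schema[0]));
  }
  // Primitives (e.g. inside a mixed array) and null pass through untouched
  if (value === null || typeof value !== 'object') return value;
  // Object + field list -> positional array in schema order
  return schema.map(fieldDef => {
    const key = typeof fieldDef === 'string' ? fieldDef : Object.keys(fieldDef)[0];
    const valueSchema = typeof fieldDef === 'string' ? undefined : fieldDef[key];
    const val = value[key];
    return val == null ? null : compressWithSchema(val, valueSchema);
  });
}

const schema = [['id', 'name', { address: ['city', 'zip'] }, { orders: [['sku', 'qty']] }]];
const rows = [
  { id: 1, name: 'Ann', address: { city: 'Oslo', zip: '0150' }, orders: [{ sku: 'x', qty: 2 }] },
  { id: 2, name: 'Bo' },
];
console.log(JSON.stringify(compressWithSchema(rows, schema)));
// [[1,"Ann",["Oslo","0150"],[["x",2]]],[2,"Bo",null,null]]
```

Key names appear only in the schema and each record keeps just its values, which is presumably where most of the token savings on uniform datasets come from.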
@@ -257,41 +236,32 @@ function decompressWithSchema(data, schema) {
     // schema is [innerSchema] → restore to an array
     if (Array.isArray(schema) && schema.length === 1 && Array.isArray(schema[0])) {
-        if (!Array.isArray(data)) return data;
         const inner = schema[0];
         return data.map(item => decompressWithSchema(item, inner));
     }
-    // schema is an array
-    if (Array.isArray(schema)) {
-        // schema contains undefined elements → primitive array, not compressed
-        if (schema.some(s => s === undefined || s === null)) return data;
-        // Primitive value (a primitive element in a mixed array) → return as-is
-        if (typeof data !== 'object' || data === null) return data;
-        // Object schema → restore to an object
-        const obj = {};
-        for (let i = 0; i < schema.length; i++) {
-            const fieldDef = schema[i];
-            let key, valueSchema;
-            if (typeof fieldDef === 'string') {
-                key = fieldDef;
-                valueSchema = undefined;
-            } else if (typeof fieldDef === 'object' && fieldDef !== null) {
-                key = Object.keys(fieldDef)[0];
-                valueSchema = fieldDef[key];
-            } else {
-                continue;
-            }
-            const val = data[i];
-            if (val === undefined) { obj[key] = null; continue; }
-            obj[key] = decompressWithSchema(val, valueSchema);
+    // Primitive value (a primitive element in a mixed array) → return as-is
+    if (typeof data !== 'object') return data;
+    // Object schema → restore to an object
+    const obj = {};
+    for (let i = 0; i < schema.length; i++) {
+        const fieldDef = schema[i];
+        let key, valueSchema;
+        if (typeof fieldDef === 'string') {
+            key = fieldDef;
+            valueSchema = undefined;
+        } else if (typeof fieldDef === 'object' && fieldDef !== null) {
+            key = Object.keys(fieldDef)[0];
+            valueSchema = fieldDef[key];
+        } else {
+            continue;
         }
-        return obj;
+        const val = data[i];
+        if (val === undefined) { obj[key] = null; continue; }
+        obj[key] = decompressWithSchema(val, valueSchema);
     }
-    return data;
+    return obj;
 }
 /**
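`decompressWithSchema` is the inverse: it walks the schema's field list and rebuilds each object by position. A minimal sketch following the real code's visible branches, with the same assumed handling of an `undefined` schema as the compression sketch. It checks for `null` explicitly because `typeof null === 'object'`; the 1.1.2 line `if (typeof data !== 'object') return data;` lets `null` through, so the library presumably handles `null` earlier in the function, outside the lines shown.

```javascript
// Simplified sketch of the inverse step, modeled on decompressWithSchema in
// the diff: a positional array becomes an object again via the schema.
function decompressWithSchema(data, schema) {
  if (schema === undefined) return data; // plain field: stored as-is (assumed)
  // [innerSchema] -> the data is an array of packed objects
  if (schema.length === 1 && Array.isArray(schema[0])) {
    if (!Array.isArray(data)) return data; // e.g. null for a missing list
    return data.map(item => decompressWithSchema(item, schema[0]));
  }
  // Primitives and null are returned directly (typeof null === 'object')
  if (data === null || typeof data !== 'object') return data;
  const obj = {};
  schema.forEach((fieldDef, i) => {
    const key = typeof fieldDef === 'string' ? fieldDef : Object.keys(fieldDef)[0];
    const valueSchema = typeof fieldDef === 'string' ? undefined : fieldDef[key];
    const val = data[i];
    obj[key] = val === undefined ? null : decompressWithSchema(val, valueSchema);
  });
  return obj;
}

const schema = [['id', 'name', { address: ['city', 'zip'] }, { orders: [['sku', 'qty']] }]];
const packed = [[1, 'Ann', ['Oslo', '0150'], [['x', 2]]], [2, 'Bo', null, null]];
console.log(JSON.stringify(decompressWithSchema(packed, schema)));
// [{"id":1,"name":"Ann","address":{"city":"Oslo","zip":"0150"},"orders":[{"sku":"x","qty":2}]},{"id":2,"name":"Bo","address":null,"orders":null}]
```

Fields absent from an original record come back as `null` (here `address` and `orders` on the second row), so a compress/decompress round trip normalizes missing keys to `null`.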
@@ -494,8 +464,7 @@ function parse(text) {
       skipWs();
       if (text[pos] !== ':') error('Expected :');
       pos++;
-      const val = parseValue();
-      obj[key] = val;
+      obj[key] = parseValue();
       skipWs();
       if (text[pos] === '}') { pos++; return obj; }
       if (text[pos] === ',') { pos++; continue; }
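The `parse` hunk shows a hand-written recursive-descent parser: a position cursor, `skipWs()`, and a `parseValue()` that each container parser calls back into. A minimal sketch of that technique for a JSON subset (no string escapes), written in the same style; it is not slimjson's `parse`, whose accepted syntax this diff does not show.

```javascript
// Minimal recursive-descent parser for a JSON subset, in the same
// pos/skipWs/parseValue style as the hunk above (sketch only).
function parse(text) {
  let pos = 0;
  const error = msg => { throw new SyntaxError(`${msg} at position ${pos}`); };
  const skipWs = () => { while (pos < text.length && /\s/.test(text[pos])) pos++; };

  function parseString() {
    pos++; // skip opening quote
    const start = pos;
    while (pos < text.length && text[pos] !== '"') pos++; // no escape handling
    if (pos >= text.length) error('Unterminated string');
    return text.slice(start, pos++); // pos++ skips the closing quote
  }

  function parseObject() {
    pos++; // skip '{'
    const obj = {};
    skipWs();
    if (text[pos] === '}') { pos++; return obj; }
    while (true) {
      skipWs();
      if (text[pos] !== '"') error('Expected string key');
      const key = parseString();
      skipWs();
      if (text[pos] !== ':') error('Expected :');
      pos++;
      obj[key] = parseValue();
      skipWs();
      if (text[pos] === '}') { pos++; return obj; }
      if (text[pos] === ',') { pos++; continue; }
      error('Expected , or }');
    }
  }

  function parseArray() {
    pos++; // skip '['
    const arr = [];
    skipWs();
    if (text[pos] === ']') { pos++; return arr; }
    while (true) {
      arr.push(parseValue());
      skipWs();
      if (text[pos] === ']') { pos++; return arr; }
      if (text[pos] === ',') { pos++; continue; }
      error('Expected , or ]');
    }
  }

  function parseValue() {
    skipWs();
    const ch = text[pos];
    if (ch === '{') return parseObject();
    if (ch === '[') return parseArray();
    if (ch === '"') return parseString();
    for (const [word, val] of [['true', true], ['false', false], ['null', null]]) {
      if (text.startsWith(word, pos)) { pos += word.length; return val; }
    }
    const m = /^-?\d+(\.\d+)?([eE][+-]?\d+)?/.exec(text.slice(pos));
    if (!m) error('Unexpected character');
    pos += m[0].length;
    return Number(m[0]);
  }

  const result = parseValue();
  skipWs();
  if (pos !== text.length) error('Unexpected trailing input');
  return result;
}

console.log(parse('{"a": [1, 2.5, true], "b": {"c": null}}'));
```

Each container parser loops on "value, then `,` or closing bracket", which is the same control flow as the object branch in the hunk.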
@@ -542,3 +511,4 @@ function parse(text) {
 }
 module.exports = { compress, decompress, stringify, parse };
+module.exports.default = module.exports;
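The added last line makes the exports object its own `default` property. That helps callers that look up `.default`, such as TypeScript output for `import slim from 'slimjson'` compiled without `esModuleInterop`, which reads `require('slimjson').default`. A sketch of the effect using a stand-in object rather than the real module:

```javascript
// Stand-in for the module's exports object (the real one is module.exports in compress.js).
const exportsObj = {
  compress() {}, decompress() {}, stringify() {}, parse() {},
};
// Equivalent of the added line: module.exports.default = module.exports;
exportsObj.default = exportsObj;

// A named access and a default-based access reach the same function.
console.log(exportsObj.default.compress === exportsObj.compress); // true
// The object is now self-referential, so the chain can repeat indefinitely.
console.log(exportsObj.default.default === exportsObj); // true
```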