logpare 0.0.1 → 0.0.2
- package/LICENSE +21 -0
- package/README.md +285 -16
- package/dist/chunk-PVEO4LBX.js +574 -0
- package/dist/chunk-PVEO4LBX.js.map +1 -0
- package/dist/cli.cjs +724 -0
- package/dist/cli.cjs.map +1 -0
- package/dist/cli.d.cts +1 -0
- package/dist/cli.d.ts +1 -0
- package/dist/cli.js +171 -0
- package/dist/cli.js.map +1 -0
- package/dist/index.cjs +606 -0
- package/dist/index.cjs.map +1 -0
- package/dist/index.d.cts +257 -0
- package/dist/index.d.ts +257 -0
- package/dist/index.js +19 -0
- package/dist/index.js.map +1 -0
- package/package.json +43 -9
- package/index.js +0 -1
package/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2025 logpare
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
package/README.md
CHANGED
|
@@ -1,32 +1,301 @@
|
|
|
1
1
|
# logpare
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
[](https://www.npmjs.com/package/logpare)
|
|
4
|
+
[](https://github.com/logpare/logpare/actions)
|
|
5
|
+
[](https://opensource.org/licenses/MIT)
|
|
4
6
|
|
|
5
|
-
|
|
7
|
+
Semantic log compression for LLM context windows. Reduces repetitive log output by 60-90% while preserving diagnostic information.
|
|
8
|
+
|
|
9
|
+
## The Problem
|
|
10
|
+
|
|
11
|
+
AI assistants processing logs waste tokens on repetitive patterns. A 10,000-line log dump might contain 50 unique message templates repeated thousands of times — but the LLM sees (and bills for) every repetition.
|
|
12
|
+
|
|
13
|
+
## The Solution
|
|
14
|
+
|
|
15
|
+
LogPare uses the [Drain algorithm](https://github.com/logpai/Drain3) to identify log templates, then outputs a compressed format showing each template once with occurrence counts.
|
|
16
|
+
|
|
17
|
+
```
|
|
18
|
+
Input (10,847 lines):
|
|
19
|
+
INFO Connection from 192.168.1.1 established
|
|
20
|
+
INFO Connection from 192.168.1.2 established
|
|
21
|
+
INFO Connection from 10.0.0.55 established
|
|
22
|
+
... (10,844 more similar lines)
|
|
23
|
+
|
|
24
|
+
Output (23 templates):
|
|
25
|
+
=== Log Compression Summary ===
|
|
26
|
+
Input: 10,847 lines → 23 templates (99.8% reduction)
|
|
27
|
+
|
|
28
|
+
Top templates by frequency:
|
|
29
|
+
1. [4,521x] INFO Connection from <*> established
|
|
30
|
+
2. [3,892x] DEBUG Request <*> processed in <*>
|
|
31
|
+
3. [1,203x] WARN Retry attempt <*> for <*>
|
|
32
|
+
...
|
|
33
|
+
```
|
|
34
|
+
|
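The reduction percentages in the sample output above are consistent with a simple ratio of templates to input lines. The following is a minimal sketch of that arithmetic, assuming the figure is computed as `1 - templates / lines`; `reductionPercent` is a hypothetical helper for illustration and is not part of logpare's documented API.

```typescript
// Hypothetical helper (not a logpare export): share of input lines that
// no longer need to be shown once each template is listed only once.
function reductionPercent(inputLines: number, templates: number): string {
  if (inputLines <= 0) {
    return "0.0"; // nothing to compress
  }
  const ratio = 1 - templates / inputLines;
  return (ratio * 100).toFixed(1);
}

console.log(reductionPercent(10847, 23)); // "99.8"
console.log(reductionPercent(4, 2)); // "50.0"
```

Both values match the figures quoted in the README examples (10,847 lines to 23 templates, and 4 lines to 2 templates).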
|
35
|
+
## Installation
|
|
36
|
+
|
|
37
|
+
```bash
|
|
38
|
+
npm install logpare
|
|
39
|
+
# or
|
|
40
|
+
pnpm add logpare
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
## CLI Usage
|
|
44
|
+
|
|
45
|
+
LogPare includes a command-line interface for quick log compression:
|
|
6
46
|
|
|
7
|
-
For the MCP server implementation:
|
|
8
47
|
```bash
|
|
9
|
-
|
|
48
|
+
# Compress a log file
|
|
49
|
+
logpare server.log
|
|
50
|
+
|
|
51
|
+
# Pipe from stdin
|
|
52
|
+
cat /var/log/syslog | logpare
|
|
53
|
+
|
|
54
|
+
# JSON output
|
|
55
|
+
logpare --format json app.log
|
|
56
|
+
|
|
57
|
+
# Custom algorithm parameters
|
|
58
|
+
logpare --depth 5 --threshold 0.5 access.log
|
|
59
|
+
|
|
60
|
+
# Write to file
|
|
61
|
+
logpare --output templates.txt error.log
|
|
62
|
+
|
|
63
|
+
# Multiple files
|
|
64
|
+
logpare access.log error.log server.log
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
### CLI Options
|
|
68
|
+
|
|
69
|
+
| Option | Short | Description | Default |
|
|
70
|
+
|--------|-------|-------------|---------|
|
|
71
|
+
| `--format` | `-f` | Output format: `summary`, `detailed`, `json` | `summary` |
|
|
72
|
+
| `--output` | `-o` | Write output to file | stdout |
|
|
73
|
+
| `--depth` | `-d` | Parse tree depth | `4` |
|
|
74
|
+
| `--threshold` | `-t` | Similarity threshold (0.0-1.0) | `0.4` |
|
|
75
|
+
| `--max-children` | `-c` | Max children per node | `100` |
|
|
76
|
+
| `--max-clusters` | `-m` | Max total clusters | `1000` |
|
|
77
|
+
| `--max-templates` | `-n` | Max templates in output | `50` |
|
|
78
|
+
| `--help` | `-h` | Show help | |
|
|
79
|
+
| `--version` | `-v` | Show version | |
|
|
80
|
+
|
|
81
|
+
## Programmatic Usage
|
|
82
|
+
|
|
83
|
+
### Simple API
|
|
84
|
+
|
|
85
|
+
```typescript
|
|
86
|
+
import { compress } from 'logpare';
|
|
87
|
+
|
|
88
|
+
const logs = [
|
|
89
|
+
'INFO Connection from 192.168.1.1 established',
|
|
90
|
+
'INFO Connection from 192.168.1.2 established',
|
|
91
|
+
'ERROR Connection timeout after 30s',
|
|
92
|
+
'INFO Connection from 10.0.0.1 established',
|
|
93
|
+
];
|
|
94
|
+
|
|
95
|
+
const result = compress(logs);
|
|
96
|
+
console.log(result.formatted);
|
|
97
|
+
// === Log Compression Summary ===
|
|
98
|
+
// Input: 4 lines → 2 templates (50.0% reduction)
|
|
99
|
+
// ...
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
### Text Input
|
|
103
|
+
|
|
104
|
+
```typescript
|
|
105
|
+
import { compressText } from 'logpare';
|
|
106
|
+
|
|
107
|
+
const logFile = fs.readFileSync('app.log', 'utf-8');
|
|
108
|
+
const result = compressText(logFile, { format: 'json' });
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
### Advanced API
|
|
112
|
+
|
|
113
|
+
```typescript
|
|
114
|
+
import { createDrain, defineStrategy } from 'logpare';
|
|
115
|
+
|
|
116
|
+
// Custom preprocessing strategy
|
|
117
|
+
const customStrategy = defineStrategy({
|
|
118
|
+
patterns: {
|
|
119
|
+
requestId: /req-[a-z0-9]+/gi,
|
|
120
|
+
},
|
|
121
|
+
getSimThreshold: (depth) => depth < 2 ? 0.5 : 0.4,
|
|
122
|
+
});
|
|
123
|
+
|
|
124
|
+
const drain = createDrain({
|
|
125
|
+
depth: 4,
|
|
126
|
+
maxClusters: 500,
|
|
127
|
+
preprocessing: customStrategy,
|
|
128
|
+
});
|
|
129
|
+
|
|
130
|
+
drain.addLogLines(logs);
|
|
131
|
+
const result = drain.getResult('detailed');
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
## Output Formats
|
|
135
|
+
|
|
136
|
+
### Summary (default)
|
|
137
|
+
Compact overview with top templates and rare events.
|
|
138
|
+
|
|
139
|
+
### Detailed
|
|
140
|
+
Full template list with sample variable values.
|
|
141
|
+
|
|
142
|
+
### JSON
|
|
143
|
+
Machine-readable format for programmatic use.
|
|
144
|
+
|
|
145
|
+
```typescript
|
|
146
|
+
compress(logs, { format: 'json' });
|
|
10
147
|
```
|
|
11
148
|
|
|
12
|
-
##
|
|
149
|
+
## API Reference
|
|
13
150
|
|
|
14
|
-
|
|
151
|
+
### `compress(lines, options?)`
|
|
15
152
|
|
|
16
|
-
|
|
17
|
-
- Compress build logs before pasting into Claude/Cursor
|
|
18
|
-
- Reduce verbose server logs for AI debugging
|
|
19
|
-
- Extract signal from noisy test output
|
|
153
|
+
Compress an array of log lines.
|
|
20
154
|
|
|
21
|
-
|
|
155
|
+
- `lines`: `string[]` - Log lines to compress
|
|
156
|
+
- `options.format`: `'summary' | 'detailed' | 'json'` - Output format (default: `'summary'`)
|
|
157
|
+
- `options.maxTemplates`: `number` - Max templates in output (default: `50`)
|
|
158
|
+
- `options.drain`: `DrainOptions` - Algorithm configuration
|
|
22
159
|
|
|
23
|
-
|
|
160
|
+
Returns `CompressionResult` with `templates`, `stats`, and `formatted` output.
|
|
161
|
+
|
|
162
|
+
### `compressText(text, options?)`
|
|
163
|
+
|
|
164
|
+
Compress a multi-line string (splits on newlines).
|
|
165
|
+
|
|
166
|
+
### `createDrain(options?)`
|
|
167
|
+
|
|
168
|
+
Create a Drain instance for incremental processing.
|
|
169
|
+
|
|
170
|
+
- `options.depth`: `number` - Parse tree depth (default: `4`)
|
|
171
|
+
- `options.simThreshold`: `number` - Similarity threshold 0-1 (default: `0.4`)
|
|
172
|
+
- `options.maxChildren`: `number` - Max children per node (default: `100`)
|
|
173
|
+
- `options.maxClusters`: `number` - Max total templates (default: `1000`)
|
|
174
|
+
- `options.preprocessing`: `ParsingStrategy` - Custom preprocessing
|
|
175
|
+
|
|
176
|
+
### `defineStrategy(overrides)`
|
|
177
|
+
|
|
178
|
+
Create a custom preprocessing strategy.
|
|
179
|
+
|
|
180
|
+
```typescript
|
|
181
|
+
const strategy = defineStrategy({
|
|
182
|
+
patterns: { customId: /id-\d+/g },
|
|
183
|
+
tokenize: (line) => line.split(','),
|
|
184
|
+
getSimThreshold: (depth) => 0.5,
|
|
185
|
+
});
|
|
186
|
+
```
|
|
187
|
+
|
|
188
|
+
## Built-in Patterns
|
|
189
|
+
|
|
190
|
+
LogPare automatically masks common variable types:
|
|
191
|
+
|
|
192
|
+
- IPv4/IPv6 addresses
|
|
193
|
+
- UUIDs
|
|
194
|
+
- Timestamps (ISO, Unix)
|
|
195
|
+
- File paths and URLs
|
|
196
|
+
- Hex IDs
|
|
197
|
+
- Block IDs (HDFS)
|
|
198
|
+
- Numbers with units (e.g., `250ms`, `1024KB`)
|
|
199
|
+
|
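To make the masking step concrete, here is a rough sketch of how variable tokens could be replaced with the `<*>` wildcard before template matching. The regular expressions and the names `sketchPatterns` and `maskLine` are assumptions made for illustration; logpare's actual built-in expressions are not shown in this diff and may differ.

```typescript
// Illustrative patterns only; the real built-in expressions may differ.
const sketchPatterns: Record<string, RegExp> = {
  ipv4: /\b\d{1,3}(?:\.\d{1,3}){3}\b/g,
  uuid: /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi,
  numberWithUnit: /\b\d+(?:\.\d+)?(?:ms|s|KB|MB|GB)\b/g,
};

// Replace every match of every pattern with the wildcard token.
function maskLine(line: string, patterns: Record<string, RegExp>): string {
  let masked = line;
  const names = Object.keys(patterns);
  for (let i = 0; i < names.length; i++) {
    const pattern = patterns[names[i] as string];
    if (pattern) {
      masked = masked.replace(pattern, "<*>");
    }
  }
  return masked;
}

console.log(maskLine("INFO Connection from 192.168.1.1 established", sketchPatterns));
// "INFO Connection from <*> established"
```

Once lines are masked this way, lines that differ only in their variable parts become identical strings, which is what allows them to be grouped into one template.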
|
200
|
+
## Performance
|
|
201
|
+
|
|
202
|
+
- **Speed**: >10,000 lines/second
|
|
203
|
+
- **Memory**: O(templates), not O(lines)
|
|
204
|
+
- **V8 Optimized**: Uses `Map` for tree nodes, monomorphic constructors
|
|
205
|
+
|
|
206
|
+
## Parameter Tuning Guide
|
|
207
|
+
|
|
208
|
+
### When to Adjust Parameters
|
|
209
|
+
|
|
210
|
+
| Symptom | Cause | Solution |
|
|
211
|
+
|---------|-------|----------|
|
|
212
|
+
| Too many templates | Threshold too high | Lower `simThreshold` (e.g., 0.3) |
|
|
213
|
+
| Templates too generic | Threshold too low | Raise `simThreshold` (e.g., 0.5) |
|
|
214
|
+
| Similar logs not grouped | Depth too shallow | Increase `depth` (e.g., 5-6) |
|
|
215
|
+
| Too much memory usage | Too many clusters | Lower `maxClusters` |
|
|
216
|
+
|
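The effect of `simThreshold` in the table above is easier to see with a worked example. The sketch below uses a simplified, Drain-style token similarity (the fraction of positions where two equal-length token sequences agree) and assumes a line joins a template when that score is at least the threshold. `tokenSimilarity` is a hypothetical function, not logpare's implementation, and the real comparison may treat wildcards differently.

```typescript
// Simplified similarity: fraction of positions with identical tokens.
// Sequences of different length are treated as not similar at all.
function tokenSimilarity(template: string[], tokens: string[]): number {
  if (template.length === 0 || template.length !== tokens.length) {
    return 0;
  }
  let matches = 0;
  for (let i = 0; i < template.length; i++) {
    if (template[i] === tokens[i]) {
      matches++;
    }
  }
  return matches / template.length;
}

const a = "User alice logged in".split(" ");
const b = "User bob logged out".split(" ");
const score = tokenSimilarity(a, b); // 2 of 4 tokens match

console.log(score); // 0.5
console.log(score >= 0.4); // true: grouped under the default threshold
console.log(score >= 0.6); // false: a higher threshold splits them
```

Under these assumptions, raising the threshold produces more, narrower templates, and lowering it produces fewer, broader ones, which matches the first two rows of the table.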
|
217
|
+
### Recommended Settings by Log Type
|
|
218
|
+
|
|
219
|
+
**Structured logs (JSON, CSV):**
|
|
220
|
+
```typescript
|
|
221
|
+
{ depth: 3, simThreshold: 0.5 }
|
|
222
|
+
```
|
|
223
|
+
|
|
224
|
+
**Noisy application logs:**
|
|
225
|
+
```typescript
|
|
226
|
+
{ depth: 5, simThreshold: 0.3 }
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
**System logs (syslog, journald):**
|
|
230
|
+
```typescript
|
|
231
|
+
{ depth: 4, simThreshold: 0.4 } // defaults work well
|
|
232
|
+
```
|
|
233
|
+
|
|
234
|
+
**High-volume logs (>1M lines):**
|
|
235
|
+
```typescript
|
|
236
|
+
{ maxClusters: 500, maxChildren: 50 }
|
|
237
|
+
```
|
|
238
|
+
|
|
239
|
+
## Troubleshooting
|
|
240
|
+
|
|
241
|
+
### "Too many templates"
|
|
242
|
+
|
|
243
|
+
If you're getting more templates than expected:
|
|
244
|
+
|
|
245
|
+
1. **Lower the similarity threshold**: Templates that should group together may not meet the default 0.4 threshold
|
|
246
|
+
```typescript
|
|
247
|
+
compress(logs, { drain: { simThreshold: 0.3 } })
|
|
248
|
+
```
|
|
249
|
+
|
|
250
|
+
2. **Check for unmasked variables**: Custom IDs or tokens may need masking
|
|
251
|
+
```typescript
|
|
252
|
+
const strategy = defineStrategy({
|
|
253
|
+
patterns: { customId: /your-pattern/g }
|
|
254
|
+
});
|
|
255
|
+
```
|
|
256
|
+
|
|
257
|
+
### "Templates are too generic"
|
|
258
|
+
|
|
259
|
+
If templates are over-grouping different log types:
|
|
260
|
+
|
|
261
|
+
1. **Raise the similarity threshold**:
|
|
262
|
+
```typescript
|
|
263
|
+
compress(logs, { drain: { simThreshold: 0.5 } })
|
|
264
|
+
```
|
|
265
|
+
|
|
266
|
+
2. **Increase tree depth**:
|
|
267
|
+
```typescript
|
|
268
|
+
compress(logs, { drain: { depth: 5 } })
|
|
269
|
+
```
|
|
270
|
+
|
|
271
|
+
### "Memory usage too high"
|
|
272
|
+
|
|
273
|
+
For very large log files:
|
|
274
|
+
|
|
275
|
+
1. **Limit clusters**: Set `maxClusters` to cap memory usage
|
|
276
|
+
```typescript
|
|
277
|
+
compress(logs, { drain: { maxClusters: 500 } })
|
|
278
|
+
```
|
|
279
|
+
|
|
280
|
+
2. **Process in batches**: Use `createDrain()` and process chunks
|
|
281
|
+
|
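One possible way to process a large log in chunks is sketched below. It relies only on the `addLogLines` method shown in the Advanced API example; the `DrainLike` interface and the `BatchFeeder` class are hypothetical names introduced here, and the batch size of 10,000 is an arbitrary assumption.

```typescript
// Minimal shape assumed from the README's Advanced API example;
// the object returned by createDrain() may expose more than this.
interface DrainLike {
  addLogLines(lines: string[]): void;
}

// Hypothetical helper: buffers lines and hands them to the drain in
// fixed-size batches, so the full log never has to sit in one array.
class BatchFeeder {
  private readonly drain: DrainLike;
  private readonly batchSize: number;
  private batch: string[] = [];

  constructor(drain: DrainLike, batchSize: number = 10000) {
    if (batchSize < 1) {
      throw new RangeError("batchSize must be at least 1");
    }
    this.drain = drain;
    this.batchSize = batchSize;
  }

  // Queue one line; forwards the buffer once it reaches batchSize.
  push(line: string): void {
    this.batch.push(line);
    if (this.batch.length >= this.batchSize) {
      this.flush();
    }
  }

  // Forward whatever is buffered; call once after the last line.
  flush(): void {
    if (this.batch.length === 0) {
      return;
    }
    this.drain.addLogLines(this.batch);
    this.batch = [];
  }
}
```

In use, the feeder would wrap the result of `createDrain({ maxClusters: 500 })`, with `push` called for each line read from a stream and `flush` called at the end, before `drain.getResult(...)`.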
|
282
|
+
### "Some patterns not being masked"
|
|
283
|
+
|
|
284
|
+
Add custom patterns for domain-specific tokens:
|
|
285
|
+
|
|
286
|
+
```typescript
|
|
287
|
+
const strategy = defineStrategy({
|
|
288
|
+
patterns: {
|
|
289
|
+
sessionId: /sess-[a-f0-9]+/gi,
|
|
290
|
+
orderId: /ORD-\d{10}/g,
|
|
291
|
+
}
|
|
292
|
+
});
|
|
293
|
+
```
|
|
24
294
|
|
|
25
|
-
##
|
|
295
|
+
## Coming from Python Drain3?
|
|
26
296
|
|
|
27
|
-
|
|
28
|
-
- Repository: https://github.com/logpare/logpare-mcp
|
|
297
|
+
See [MIGRATION.md](./MIGRATION.md) for a detailed comparison and migration guide.
|
|
29
298
|
|
|
30
299
|
## License
|
|
31
300
|
|
|
32
|
-
MIT
|
|
301
|
+
MIT
|