logpare 0.0.1 → 0.0.3

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 logpare
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md CHANGED
@@ -1,32 +1,325 @@
1
1
  # logpare
2
2
 
3
- **Semantic log reduction for LLM debugging and agent workflows**
3
+ [![npm version](https://img.shields.io/npm/v/logpare.svg)](https://www.npmjs.com/package/logpare)
4
+ [![CI](https://github.com/logpare/logpare/workflows/CI/badge.svg)](https://github.com/logpare/logpare/actions)
5
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
4
6
 
5
- ## Install
7
+ Semantic log compression for LLM context windows. Reduces repetitive log output by 60-90% while preserving diagnostic information.
8
+
9
+ ## The Problem
10
+
11
+ AI assistants processing logs waste tokens on repetitive patterns. A 10,000-line log dump might contain 50 unique message templates repeated thousands of times — but the LLM sees (and bills for) every repetition.
12
+
13
+ ## The Solution
14
+
15
+ LogPare uses the [Drain algorithm](https://github.com/logpai/Drain3) to identify log templates, then outputs a compressed format showing each template once with occurrence counts.
16
+
17
+ ```
18
+ Input (10,847 lines):
19
+ INFO Connection from 192.168.1.1 established
20
+ INFO Connection from 192.168.1.2 established
21
+ INFO Connection from 10.0.0.55 established
22
+ ... (10,844 more similar lines)
23
+
24
+ Output (23 templates):
25
+ === Log Compression Summary ===
26
+ Input: 10,847 lines → 23 templates (99.8% reduction)
27
+
28
+ Top templates by frequency:
29
+ 1. [4,521x] INFO Connection from <*> established
30
+ 2. [3,892x] DEBUG Request <*> processed in <*>
31
+ 3. [1,203x] WARN Retry attempt <*> for <*>
32
+ ...
33
+ ```
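The grouping step described above can be sketched without the parse tree. This is a simplified illustration of Drain-style template mining, not logpare's implementation: the names `mineTemplates`, `similarity`, and `Template` are hypothetical, and the real algorithm routes lines through a fixed-depth tree before comparing them.

```typescript
// Simplified, tree-free illustration of Drain-style template mining.
// NOT logpare's implementation; all names here are hypothetical.
interface Template {
  tokens: string[];
  count: number;
}

const WILDCARD = '<*>';

// Fraction of positions where the line's token equals the template's token.
function similarity(template: string[], tokens: string[]): number {
  let same = 0;
  for (let i = 0; i < template.length; i++) {
    if (template[i] === tokens[i]) same++;
  }
  return same / template.length;
}

function mineTemplates(lines: string[], threshold = 0.4): Template[] {
  const templates: Template[] = [];
  for (const line of lines) {
    const tokens = line.trim().split(/\s+/);
    // Only templates with the same token count are candidates.
    let best: Template | undefined;
    let bestScore = -1;
    for (const t of templates) {
      if (t.tokens.length !== tokens.length) continue;
      const score = similarity(t.tokens, tokens);
      if (score > bestScore) {
        best = t;
        bestScore = score;
      }
    }
    if (best && bestScore >= threshold) {
      // Merge: positions that differ become wildcards.
      best.tokens = best.tokens.map((tok, i) => (tok === tokens[i] ? tok : WILDCARD));
      best.count++;
    } else {
      templates.push({ tokens, count: 1 });
    }
  }
  // Most frequent templates first.
  return templates.sort((a, b) => b.count - a.count);
}

const mined = mineTemplates([
  'INFO Connection from 192.168.1.1 established',
  'INFO Connection from 192.168.1.2 established',
  'ERROR Connection timeout after 30s',
  'INFO Connection from 10.0.0.55 established',
]);
for (const t of mined) {
  console.log(`[${t.count}x] ${t.tokens.join(' ')}`);
}
// [3x] INFO Connection from <*> established
// [1x] ERROR Connection timeout after 30s
```

Memory in this sketch grows with the number of templates rather than the number of lines, because each line is discarded once it has been merged or stored as a new template.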
34
+
35
+ ## Installation
36
+
37
+ ### As a CLI tool (recommended for command-line usage)
38
+
39
+ Install globally to use `logpare` directly from anywhere:
6
40
 
7
- For the MCP server implementation:
8
41
  ```bash
9
- npm install @logpare/mcp
42
+ npm install -g logpare
43
+
44
+ # Now works directly
45
+ logpare server.log
10
46
  ```
11
47
 
12
- ## What is LogPare?
48
+ ### As a library
13
49
 
14
- LogPare compresses console logs 60-90% through pattern deduplication and template extraction, making them usable in LLM context windows.
50
+ Install locally in your project for programmatic usage:
15
51
 
16
- **Use cases:**
17
- - Compress build logs before pasting into Claude/Cursor
18
- - Reduce verbose server logs for AI debugging
19
- - Extract signal from noisy test output
52
+ ```bash
53
+ npm install logpare
54
+ # or
55
+ pnpm add logpare
56
+ ```
57
+
58
+ > **Note:** Local installs require `npx` to run the CLI: `npx logpare server.log`
59
+
60
+ ## CLI Usage
61
+
62
+ LogPare includes a command-line interface for quick log compression:
63
+
64
+ ```bash
65
+ # Compress a log file
66
+ logpare server.log
67
+
68
+ # Pipe from stdin
69
+ cat /var/log/syslog | logpare
70
+
71
+ # JSON output
72
+ logpare --format json app.log
20
73
 
21
- ## Packages
74
+ # Custom algorithm parameters
75
+ logpare --depth 5 --threshold 0.5 access.log
76
+
77
+ # Write to file
78
+ logpare --output templates.txt error.log
79
+
80
+ # Multiple files
81
+ logpare access.log error.log server.log
82
+ ```
22
83
 
23
- - **[@logpare/mcp](https://npmjs.com/package/@logpare/mcp)** - MCP server for Claude, Cursor, and AI assistants
84
+ > **Using a local install?** Prefix commands with `npx`:
85
+ > ```bash
86
+ > npx logpare server.log
87
+ > cat /var/log/syslog | npx logpare
88
+ > ```
89
+
90
+ ### CLI Options
91
+
92
+ | Option | Short | Description | Default |
93
+ |--------|-------|-------------|---------|
94
+ | `--format` | `-f` | Output format: `summary`, `detailed`, `json` | `summary` |
95
+ | `--output` | `-o` | Write output to file | stdout |
96
+ | `--depth` | `-d` | Parse tree depth | `4` |
97
+ | `--threshold` | `-t` | Similarity threshold (0.0-1.0) | `0.4` |
98
+ | `--max-children` | `-c` | Max children per node | `100` |
99
+ | `--max-clusters` | `-m` | Max total clusters | `1000` |
100
+ | `--max-templates` | `-n` | Max templates in output | `50` |
101
+ | `--help` | `-h` | Show help | |
102
+ | `--version` | `-v` | Show version | |
103
+
104
+ ## Programmatic Usage
105
+
106
+ ### Simple API
107
+
108
+ ```typescript
109
+ import { compress } from 'logpare';
110
+
111
+ const logs = [
112
+ 'INFO Connection from 192.168.1.1 established',
113
+ 'INFO Connection from 192.168.1.2 established',
114
+ 'ERROR Connection timeout after 30s',
115
+ 'INFO Connection from 10.0.0.1 established',
116
+ ];
117
+
118
+ const result = compress(logs);
119
+ console.log(result.formatted);
120
+ // === Log Compression Summary ===
121
+ // Input: 4 lines → 2 templates (50.0% reduction)
122
+ // ...
123
+ ```
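Both summary headers shown in this README are consistent with a reduction of `(1 - templates / lines) * 100`. The helper below reproduces that arithmetic as a sanity check; `summaryLine` is a hypothetical name, and whether logpare computes the figure exactly this way is an assumption.

```typescript
// Hypothetical helper that reproduces the header arithmetic; not part of logpare's API.
function summaryLine(inputLines: number, templates: number): string {
  // Guard against division by zero for empty input.
  const reduction = inputLines === 0 ? 0 : (1 - templates / inputLines) * 100;
  const fmt = new Intl.NumberFormat('en-US');
  return `Input: ${fmt.format(inputLines)} lines → ${fmt.format(templates)} templates (${reduction.toFixed(1)}% reduction)`;
}

console.log(summaryLine(4, 2));
// Input: 4 lines → 2 templates (50.0% reduction)
console.log(summaryLine(10847, 23));
// Input: 10,847 lines → 23 templates (99.8% reduction)
```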
124
+
125
+ ### Text Input
126
+
127
+ ```typescript
128
+ import { compressText } from 'logpare';
+ import * as fs from 'node:fs';
129
+
130
+ const logFile = fs.readFileSync('app.log', 'utf-8');
131
+ const result = compressText(logFile, { format: 'json' });
132
+ ```
133
+
134
+ ### Advanced API
135
+
136
+ ```typescript
137
+ import { createDrain, defineStrategy } from 'logpare';
138
+
139
+ // Custom preprocessing strategy
140
+ const customStrategy = defineStrategy({
141
+ patterns: {
142
+ requestId: /req-[a-z0-9]+/gi,
143
+ },
144
+ getSimThreshold: (depth) => depth < 2 ? 0.5 : 0.4,
145
+ });
146
+
147
+ const drain = createDrain({
148
+ depth: 4,
149
+ maxClusters: 500,
150
+ preprocessing: customStrategy,
151
+ });
152
+
153
+ drain.addLogLines(logs);
154
+ const result = drain.getResult('detailed');
155
+ ```
156
+
157
+ ## Output Formats
158
+
159
+ ### Summary (default)
160
+ Compact overview with top templates and rare events.
161
+
162
+ ### Detailed
163
+ Full template list with sample variable values.
164
+
165
+ ### JSON
166
+ Machine-readable format for programmatic use.
167
+
168
+ ```typescript
169
+ compress(logs, { format: 'json' });
170
+ ```
171
+
172
+ ## API Reference
173
+
174
+ ### `compress(lines, options?)`
175
+
176
+ Compress an array of log lines.
177
+
178
+ - `lines`: `string[]` - Log lines to compress
179
+ - `options.format`: `'summary' | 'detailed' | 'json'` - Output format (default: `'summary'`)
180
+ - `options.maxTemplates`: `number` - Max templates in output (default: `50`)
181
+ - `options.drain`: `DrainOptions` - Algorithm configuration
182
+
183
+ Returns `CompressionResult` with `templates`, `stats`, and `formatted` output.
184
+
185
+ ### `compressText(text, options?)`
186
+
187
+ Compress a multi-line string (splits on newlines).
188
+
189
+ ### `createDrain(options?)`
190
+
191
+ Create a Drain instance for incremental processing.
192
+
193
+ - `options.depth`: `number` - Parse tree depth (default: `4`)
194
+ - `options.simThreshold`: `number` - Similarity threshold 0-1 (default: `0.4`)
195
+ - `options.maxChildren`: `number` - Max children per node (default: `100`)
196
+ - `options.maxClusters`: `number` - Max total templates (default: `1000`)
197
+ - `options.preprocessing`: `ParsingStrategy` - Custom preprocessing
198
+
199
+ ### `defineStrategy(overrides)`
200
+
201
+ Create a custom preprocessing strategy.
202
+
203
+ ```typescript
204
+ const strategy = defineStrategy({
205
+ patterns: { customId: /id-\d+/g },
206
+ tokenize: (line) => line.split(','),
207
+ getSimThreshold: (depth) => 0.5,
208
+ });
209
+ ```
210
+
211
+ ## Built-in Patterns
212
+
213
+ LogPare automatically masks common variable types:
214
+
215
+ - IPv4/IPv6 addresses
216
+ - Port numbers (e.g., `:443`, `:8080`)
217
+ - UUIDs
218
+ - Timestamps (ISO, Unix)
219
+ - File paths and URLs
220
+ - Hex IDs
221
+ - Block IDs (HDFS)
222
+ - Numbers with units (e.g., `250ms`, `1024KB`)
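To make the idea of masking concrete, here is a minimal sketch covering three of the categories listed above. The regular expressions are illustrative assumptions, not logpare's built-in patterns, and `maskLine` is a hypothetical name.

```typescript
// Illustrative regexes only; logpare's built-in patterns may differ.
// Ordered from most specific to most general so that a UUID is replaced
// as a whole before any digit-based pattern can touch part of it.
const MASKS: Array<[string, RegExp]> = [
  ['uuid', /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi],
  ['ipv4', /\b\d{1,3}(?:\.\d{1,3}){3}\b/g],
  ['numberWithUnit', /\b\d+(?:\.\d+)?(?:ms|s|KB|MB|GB)\b/g],
];

// Replace every match of every pattern with the wildcard token.
function maskLine(line: string): string {
  let masked = line;
  for (const [, pattern] of MASKS) {
    masked = masked.replace(pattern, '<*>');
  }
  return masked;
}

console.log(maskLine('INFO Connection from 192.168.1.1 established'));
// INFO Connection from <*> established
console.log(maskLine('job 123e4567-e89b-12d3-a456-426614174000 took 250ms'));
// job <*> took <*>
```

Masking before template extraction means lines that differ only in such values already look identical, so they group together regardless of the similarity threshold.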
223
+
224
+ ## Performance
225
+
226
+ - **Speed**: >10,000 lines/second
227
+ - **Memory**: O(templates), not O(lines)
228
+ - **V8 Optimized**: Uses `Map` for tree nodes, monomorphic constructors
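As an illustration of the `Map`-based node idea, the sketch below shows one way a parse-tree node could bound its fan-out. The class and function names are hypothetical, the README does not show logpare's internals, and the overflow rule is a simplified version of what the Drain paper describes: once a node is nearly full, unseen tokens share a single wildcard child.

```typescript
// Hypothetical node shape for illustration; logpare's internal classes are not shown in the README.
class TreeNode {
  // Every instance has the same fields in the same order.
  readonly children = new Map<string, TreeNode>();
  readonly token: string;
  constructor(token: string) {
    this.token = token;
  }
}

const WILDCARD_TOKEN = '<*>';

// Return the child for `token`, creating it while the node has room.
// One slot is reserved for a shared wildcard child, so the number of
// children never exceeds `maxChildren`.
function descend(node: TreeNode, token: string, maxChildren: number): TreeNode {
  const existing = node.children.get(token);
  if (existing) return existing;
  const hasWildcard = node.children.has(WILDCARD_TOKEN);
  const slotsLeft = maxChildren - node.children.size;
  const reserved = hasWildcard ? 0 : 1;
  const key = slotsLeft > reserved ? token : WILDCARD_TOKEN;
  let child = node.children.get(key);
  if (!child) {
    child = new TreeNode(key);
    node.children.set(key, child);
  }
  return child;
}

const root = new TreeNode('root');
for (const tok of ['a', 'b', 'c', 'd']) {
  descend(root, tok, 3);
}
console.log(Array.from(root.children.keys()));
// [ 'a', 'b', '<*>' ]
```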
229
+
230
+ ## Parameter Tuning Guide
231
+
232
+ ### When to Adjust Parameters
233
+
234
+ | Symptom | Cause | Solution |
235
+ |---------|-------|----------|
236
+ | Too many templates | Threshold too high | Lower `simThreshold` (e.g., 0.3) |
237
+ | Templates too generic | Threshold too low | Raise `simThreshold` (e.g., 0.5) |
238
+ | Similar logs not grouped | Depth too shallow | Increase `depth` (e.g., 5-6) |
239
+ | Too much memory usage | Too many clusters | Lower `maxClusters` |
240
+
241
+ ### Recommended Settings by Log Type
242
+
243
+ **Structured logs (JSON, CSV):**
244
+ ```typescript
245
+ { depth: 3, simThreshold: 0.5 }
246
+ ```
247
+
248
+ **Noisy application logs:**
249
+ ```typescript
250
+ { depth: 5, simThreshold: 0.3 }
251
+ ```
252
+
253
+ **System logs (syslog, journald):**
254
+ ```typescript
255
+ { depth: 4, simThreshold: 0.4 } // defaults work well
256
+ ```
257
+
258
+ **High-volume logs (>1M lines):**
259
+ ```typescript
260
+ { maxClusters: 500, maxChildren: 50 }
261
+ ```
262
+
263
+ ## Troubleshooting
264
+
265
+ ### "Too many templates"
266
+
267
+ If you're getting more templates than expected:
268
+
269
+ 1. **Lower the similarity threshold**: Templates that should group together may not meet the default 0.4 threshold
270
+ ```typescript
271
+ compress(logs, { drain: { simThreshold: 0.3 } })
272
+ ```
273
+
274
+ 2. **Check for unmasked variables**: Custom IDs or tokens may need masking
275
+ ```typescript
276
+ const strategy = defineStrategy({
277
+ patterns: { customId: /your-pattern/g }
278
+ });
279
+ ```
280
+
281
+ ### "Templates are too generic"
282
+
283
+ If templates are over-grouping different log types:
284
+
285
+ 1. **Raise the similarity threshold**:
286
+ ```typescript
287
+ compress(logs, { drain: { simThreshold: 0.5 } })
288
+ ```
289
+
290
+ 2. **Increase tree depth**:
291
+ ```typescript
292
+ compress(logs, { drain: { depth: 5 } })
293
+ ```
294
+
295
+ ### "Memory usage too high"
296
+
297
+ For very large log files:
298
+
299
+ 1. **Limit clusters**: Set `maxClusters` to cap memory usage
300
+ ```typescript
301
+ compress(logs, { drain: { maxClusters: 500 } })
302
+ ```
303
+
304
+ 2. **Process in batches**: Use `createDrain()` and process chunks
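A batching loop for the second option might look like the sketch below. It is written against a minimal structural type rather than the library itself: `addLogLines` mirrors the method used in the Advanced API example, while `LineSink` and `feedInBatches` are hypothetical names. The line source is any iterable, so only one batch needs to be held in memory at a time.

```typescript
// Minimal structural type for anything that accepts batches of lines.
// `addLogLines` mirrors the method in the Advanced API example; `LineSink` is a hypothetical name.
interface LineSink {
  addLogLines(lines: string[]): void;
}

// Feed lines to the sink in batches of at most `batchSize`; returns the number of batches sent.
function feedInBatches(sink: LineSink, lines: Iterable<string>, batchSize = 10000): number {
  if (batchSize <= 0) throw new RangeError('batchSize must be positive');
  let batch: string[] = [];
  let batches = 0;
  for (const line of lines) {
    batch.push(line);
    if (batch.length >= batchSize) {
      sink.addLogLines(batch);
      batch = [];
      batches++;
    }
  }
  // Flush the final partial batch, if any.
  if (batch.length > 0) {
    sink.addLogLines(batch);
    batches++;
  }
  return batches;
}

// Stand-in sink that only records batch sizes, used here in place of a real Drain instance.
const seen: number[] = [];
const recordingSink: LineSink = { addLogLines: (lines) => { seen.push(lines.length); } };
const demoLines = Array.from({ length: 25 }, (_, i) => `line ${i}`);
console.log(feedInBatches(recordingSink, demoLines, 10), seen);
// 3 [ 10, 10, 5 ]
```

For a file too large to load at once, the same loop should work with `for await` over a Node.js `readline` interface, since that interface is async-iterable over lines.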
305
+
306
+ ### "Some patterns not being masked"
307
+
308
+ Add custom patterns for domain-specific tokens:
309
+
310
+ ```typescript
311
+ const strategy = defineStrategy({
312
+ patterns: {
313
+ sessionId: /sess-[a-f0-9]+/gi,
314
+ orderId: /ORD-\d{10}/g,
315
+ }
316
+ });
317
+ ```
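Before registering custom patterns, it can help to check what they actually match on a few sample lines. The sketch below uses plain `RegExp` calls and does not depend on logpare. The two patterns are copied from the snippet above; the sample lines and the `findMatches` name are made up for illustration.

```typescript
// Patterns copied from the snippet above; sample lines are invented for illustration.
const candidatePatterns: Record<string, RegExp> = {
  sessionId: /sess-[a-f0-9]+/gi,
  orderId: /ORD-\d{10}/g,
};

// For each named pattern, list the substrings it matches in `line`.
function findMatches(line: string, patterns: Record<string, RegExp>): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const name of Object.keys(patterns)) {
    // With the `g` flag, String.prototype.match returns all matches or null.
    result[name] = line.match(patterns[name]) || [];
  }
  return result;
}

console.log(findMatches('user sess-9f3a checkout ORD-0012345678 ok', candidatePatterns));
// { sessionId: [ 'sess-9f3a' ], orderId: [ 'ORD-0012345678' ] }
console.log(findMatches('short id ORD-123 is left alone', candidatePatterns));
// { sessionId: [], orderId: [] }
```

Note that `/ORD-\d{10}/g` also matches the first ten digits of a longer number; adding `\b` at the end of the pattern avoids that if such IDs can occur.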
24
318
 
25
- ## Links
319
+ ## Coming from Python Drain3?
26
320
 
27
- - GitHub: https://github.com/logpare
28
- - Repository: https://github.com/logpare/logpare-mcp
321
+ See [MIGRATION.md](./MIGRATION.md) for a detailed comparison and migration guide.
29
322
 
30
323
  ## License
31
324
 
32
- MIT © Jeff Green
325
+ MIT