client-llm-preprocessor 0.1.0
- package/LICENSE +21 -0
- package/README.md +162 -0
- package/dist/index.d.ts +1319 -0
- package/dist/index.js +1040 -0
- package/package.json +67 -0

package/LICENSE

MIT License

Copyright (c) 2025 Client-Side LLM Preprocessor Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

package/README.md

# Client-Side LLM Preprocessor 🛡️

[CI](https://github.com/USERNAME/local_processing_llm/actions)
[npm](https://www.npmjs.com/package/client-llm-preprocessor)
[License: MIT](LICENSE)
[Contributing](CONTRIBUTING.md)

**Client-Side LLM Preprocessor** is a privacy-first JavaScript SDK that runs text preprocessing entirely in the user's browser. It combines fast rule-based cleaning with optional LLM-based structured extraction and semantic cleaning.

---

## 🌟 Key Features

- 🕵️ **Privacy-First**: All data stay on the user's local machine. No API keys, no server-side processing.
- 💰 **Cost Efficient**: Clean and extract data locally to cut token usage before sending text to paid APIs.
- ⚡ **Hybrid Processing**: High-speed rules for noise removal, LLM for semantic intelligence.
- 🏗️ **Structured Extraction**: Extract structured data (JSON) directly from messy text.
- 🧩 **Flexible Chunking**: Split text by length, sentence, or word.
- 🛡️ **Hardened & Tested**: 60+ tests covering extreme inputs, garbage text, and lifecycle chaos.
- 🔌 **Easy Integration**: Built-in WebGPU detection and standardized error handling.
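
The feature list mentions splitting text by length, sentence, or word, but this README does not show the chunking API. As a rough, self-contained illustration of those three strategies, here is a hypothetical `chunkText` helper; its name and its `by` and `maxLength` options are assumptions, not the package's interface.

```javascript
// Hypothetical sketch, not the package's chunking API: splits `text` into
// chunks of at most `maxLength` characters. `by` chooses what to keep intact:
// 'length' (raw characters), 'sentence', or 'word'.
function chunkText(text, { by = 'sentence', maxLength = 1000 } = {}) {
  if (!(maxLength > 0)) throw new RangeError('maxLength must be positive');

  if (by === 'length') {
    const chunks = [];
    for (let i = 0; i < text.length; i += maxLength) {
      chunks.push(text.slice(i, i + maxLength));
    }
    return chunks;
  }
  if (by !== 'sentence' && by !== 'word') {
    throw new RangeError(`Unknown chunking mode: ${by}`);
  }

  // Sentences end at ., ! or ? followed by whitespace; words are whitespace-separated.
  const units = (by === 'sentence' ? text.split(/(?<=[.!?])\s+/) : text.split(/\s+/))
    .filter(Boolean);

  const chunks = [];
  let current = '';
  for (const unit of units) {
    const candidate = current ? `${current} ${unit}` : unit;
    if (candidate.length <= maxLength) {
      current = candidate; // the unit still fits in the current chunk
      continue;
    }
    if (current) chunks.push(current);
    if (unit.length > maxLength) {
      // A single unit longer than the limit is hard-split by length.
      const pieces = chunkText(unit, { by: 'length', maxLength });
      current = pieces.pop();
      chunks.push(...pieces);
    } else {
      current = unit;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Packing whole sentences or words up to a character limit keeps chunk boundaries readable for a model, while the fallback split by length guarantees that no chunk exceeds the limit.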

---

### ⚠️ Experimental Project

**This is a proof-of-concept / experiment.**

While the API is stable enough for testing, performance and reliability are still evolving. Please do not rely on it for critical production workloads yet.

**Future Ideas (Roadmap):**

- 🙈 **PII Scrubbing**: Automatically detect and remove personal details (names, phone numbers, emails) client-side, before data ever leaves the device.
- ⚡ **Optimized WebGPU**: Better support for lower-end devices.
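
The roadmap does not say how PII scrubbing would work. One simple client-side starting point is pattern-based redaction of emails and phone numbers, sketched below; `redactPII` and the `[EMAIL]`/`[PHONE]` placeholders are hypothetical. Names generally cannot be caught with regular expressions and would likely need a model-based approach.

```javascript
// Hypothetical sketch, not part of the package: pattern-based redaction of
// emails and phone numbers. Names cannot be matched reliably this way.
const PII_PATTERNS = [
  // Simplified email matcher (not a full RFC 5322 implementation).
  { label: '[EMAIL]', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // North American style phone numbers: 123-456-7890, (123) 456-7890, +1 123 456 7890.
  { label: '[PHONE]', regex: /(?<!\d)(?:\+\d{1,3}[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)/g },
];

function redactPII(text) {
  // Emails run first so digits inside an address are not mistaken for a phone number.
  return PII_PATTERNS.reduce((result, { label, regex }) => result.replace(regex, label), text);
}
```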

---

## 📑 Table of Contents

- [Quick Start](#🚀-quick-start)
- [Installation](#📦-installation)
- [Project Structure](#📂-project-structure)
- [Performance](#📊-performance)
- [Browser Requirements](#🌐-browser-requirements)
- [Useful Documents](#📖-useful-documents)
- [Contributing](CONTRIBUTING.md)
- [License](#⚖️-license)

---

## 🚀 Quick Start

### 1. Verify Environment

Always check for WebGPU support before attempting to load LLM models:

```javascript
import { Preprocessor } from 'client-llm-preprocessor';

const preprocessor = new Preprocessor();
const isSupported = await preprocessor.checkWebGPU();

if (!isSupported) {
  console.warn("WebGPU not supported. Falling back to rule-based cleaning only.");
}
```
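
The internals of `checkWebGPU()` are not documented here. As a rough sketch of what such a check involves: the standard WebGPU API exposes `navigator.gpu` (only in secure contexts), and `navigator.gpu.requestAdapter()` resolves to `null` when no suitable GPU adapter is available. The helper below (`hasWebGPU` is a hypothetical name) takes the navigator object as a parameter so it can be exercised outside a browser.

```javascript
// Hypothetical helper, not the package's implementation.
// Returns true only if the WebGPU API exists AND an adapter can actually be obtained.
async function hasWebGPU(nav = globalThis.navigator) {
  // API not exposed: e.g. an older browser, an insecure context, or Node.
  if (!nav || !nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null; // null means no suitable GPU or driver
  } catch {
    return false; // treat unexpected errors as "unsupported"
  }
}
```

Checking for an adapter, not just for `navigator.gpu`, matters because a browser can expose the API on a machine whose GPU or driver is blocklisted.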

### 2. Fast Rule-Based Cleaning (No Model Needed)

Clean text instantly, without downloading a model:

```javascript
const text = "<html><body>Contact: hello@example.com - Visit https://site.com</body></html>";
const cleaned = preprocessor.clean(text, {
  removeHtml: true,
  removeUrls: true,
  removeExtraWhitespace: true
});
// Tags and the URL are stripped; the rest of the text, including the email address, is kept
```
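
The rule-based pass needs no model because it is ordinary string processing. The package's own rules are not shown in this README; the sketch below is a hypothetical, simplified version of the three options used above (`removeHtml`, `removeUrls`, `removeExtraWhitespace`), implemented with regular expressions. In a browser, real-world HTML is better handled by a parser such as `DOMParser`; regexes are used here only to keep the example self-contained.

```javascript
// Hypothetical sketch of rule-based cleaning; not the package's implementation.
function cleanText(text, { removeHtml = false, removeUrls = false, removeExtraWhitespace = false } = {}) {
  let result = text;
  if (removeHtml) {
    // Replace each tag with a space so words in adjacent elements stay separated.
    result = result.replace(/<[^>]*>/g, ' ');
  }
  if (removeUrls) {
    // Strip http(s) URLs up to the next whitespace. This runs after HTML removal
    // so closing tags such as "</body>" are not swallowed into the URL.
    result = result.replace(/https?:\/\/\S+/g, '');
  }
  if (removeExtraWhitespace) {
    // Collapse runs of whitespace and trim both ends.
    result = result.replace(/\s+/g, ' ').trim();
  }
  return result;
}
```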

### 3. Smart LLM Extraction (Model Required)

Load a model that runs locally in the browser, then extract structured data:

```javascript
// Requires WebGPU (see step 1)
await preprocessor.loadModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');

const resume = "John Doe, Email: john@doe.com, Phone: 123-456-7890...";
const data = await preprocessor.extract(resume, {
  format: 'json',
  fields: ['name', 'email', 'phone']
});
```
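
How `extract()` builds its prompt and parses the reply is not documented here. A common pattern for JSON extraction with small local models is to list the requested fields in the prompt, then parse the reply defensively, since models often wrap JSON in Markdown code fences, add prose around it, or return extra keys. The functions below (`buildExtractionPrompt`, `parseExtraction`) are hypothetical and make no model call; the prompt would be passed to whatever chat API the engine exposes.

```javascript
// Hypothetical helpers; not the package's API. No model call happens here.
function buildExtractionPrompt(text, fields) {
  return [
    'Extract the following fields from the text and reply with JSON only.',
    `Fields: ${fields.join(', ')}`,
    'Use null for any field that is not present.',
    '',
    'Text:',
    text,
  ].join('\n');
}

function parseExtraction(reply, fields) {
  // Models often wrap JSON in Markdown code fences or add prose,
  // so keep only the outermost {...} span of the reply.
  const start = reply.indexOf('{');
  const end = reply.lastIndexOf('}');
  if (start === -1 || end < start) {
    throw new Error('No JSON object found in model reply');
  }
  const parsed = JSON.parse(reply.slice(start, end + 1));
  // Keep only the requested fields, defaulting missing ones to null.
  return Object.fromEntries(fields.map((f) => [f, parsed[f] ?? null]));
}
```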

---

## 📦 Installation

```bash
npm install client-llm-preprocessor
```

---

## 📂 Project Structure

The project follows a modular and well-documented structure:

```text
local_processing_llm/
├── .github/          # GitHub-specific workflows and templates
├── docs/             # In-depth technical guides & architecture
├── examples/         # Ready-to-run demo pages
├── src/              # Source code
│   ├── preprocess/   # Core logic (clean, chunk, extract)
│   ├── utils/        # Helpers (logger, validation, errors)
│   ├── engine.js     # WebLLM wrapper
│   └── index.js      # Package entry point
├── tests/            # 60+ automated tests
│   ├── unit/         # Pure logic tests
│   ├── integration/  # Workflow & lifecycle tests
│   └── helpers/      # Test utilities & mocks
├── dist/             # Compiled production build (ESM + Types)
├── package.json      # Metadata & dependencies
└── README.md         # You are here
```
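
`engine.js` is described only as a WebLLM wrapper. The sketch below shows one plausible shape for such a wrapper, assuming WebLLM's `CreateMLCEngine(modelId, { initProgressCallback })` factory from `@mlc-ai/web-llm`, its OpenAI-style `engine.chat.completions.create(...)` method, and `engine.unload()`. The class name `EngineWrapper` and its methods are hypothetical. The engine factory is injectable so the lifecycle logic (load once, reject calls before loading, unload) can be tested without a GPU.

```javascript
// Hypothetical wrapper; the package's engine.js may differ.
// By default it lazily imports WebLLM only when a model is first loaded.
const defaultFactory = async (modelId, options) => {
  const { CreateMLCEngine } = await import('@mlc-ai/web-llm');
  return CreateMLCEngine(modelId, options);
};

class EngineWrapper {
  constructor(createEngine = defaultFactory) {
    this.createEngine = createEngine;
    this.engine = null;
    this.modelId = null;
  }

  async load(modelId, onProgress = () => {}) {
    if (this.engine && this.modelId === modelId) return; // already loaded
    await this.unload(); // free the previous model before loading another
    this.engine = await this.createEngine(modelId, { initProgressCallback: onProgress });
    this.modelId = modelId;
  }

  async prompt(text) {
    if (!this.engine) throw new Error('No model loaded; call load() first');
    const reply = await this.engine.chat.completions.create({
      messages: [{ role: 'user', content: text }],
    });
    return reply.choices[0].message.content;
  }

  async unload() {
    if (this.engine) await this.engine.unload();
    this.engine = null;
    this.modelId = null;
  }
}
```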

---

## 📊 Performance

| Input Size | Rule-Based | LLM-Based |
| :--- | :--- | :--- |
| **10 KB** | < 1 ms | 1-3 seconds |
| **1 MB** | 12 ms | (Requires chunking) |
| **10 MB** | 180 ms | (Sequential processing) |

> [!TIP]
> For a full breakdown of memory usage and speed benchmarks, see [BENCHMARKS.md](docs/BENCHMARKS.md).
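
For the LLM path, the table marks larger inputs as needing chunking and sequential processing. A minimal sketch of the sequential part, assuming some async per-chunk function (the hypothetical `processChunk` below stands in for an LLM call); `processSequentially` is likewise an illustrative name, not the package's API.

```javascript
// Hypothetical sketch: run an async per-chunk step one chunk at a time, in order.
// Awaiting inside the loop (rather than Promise.all) keeps only one request in flight.
async function processSequentially(chunks, processChunk, onProgress = () => {}) {
  const results = [];
  for (let i = 0; i < chunks.length; i++) {
    results.push(await processChunk(chunks[i], i));
    onProgress(i + 1, chunks.length); // e.g. update a progress bar
  }
  return results;
}
```

Keeping a single request in flight is a reasonable default when one in-browser model instance serves every chunk; results come back in input order, so they can be merged directly.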

---

## 🌐 Browser Requirements

- **Rule-Based Processing**: Any modern browser (Chrome, Firefox, Safari, Edge).
- **LLM Features**: Requires **WebGPU** support.
  - ✅ **Chrome 113+** (Windows, macOS, ChromeOS)
  - ✅ **Edge 113+**
  - ⚠️ **Safari** (Experimental/Partial)
  - ❌ **Firefox** (In progress by Mozilla)

---

## 📖 Useful Documents

- **[Architecture Overview](docs/ARCHITECTURE.md)**: How the engine works.
- **[API Documentation](docs/API.md)**: Full method signatures and options.
- **[Contributing Guide](CONTRIBUTING.md)**: How to help improve the project.
- **[Security Policy](SECURITY.md)**: Reporting vulnerabilities.
- **[Troubleshooting](docs/TESTING_GUIDE.md)**: Solutions for common issues.

---

## ⚖️ License

Distributed under the **MIT License**. See `LICENSE` for more information.