emlet 0.0.0 → 0.1.0
- package/LICENSE +47 -0
- package/README.md +220 -0
- package/emlet.js +35 -0
- package/package.json +53 -7
- package/types.d.ts +11 -0
package/LICENSE
ADDED
@@ -0,0 +1,47 @@
Emlet License v1.0
Based on Apache License 2.0
Copyright (c) 2025 Basedwon

Licensed under the Apache License, Version 2.0 (the "License") with the following additional terms:

You may not use this file except in compliance with the License and the conditions below.
You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

---

## Additional Terms Specific to Emlet

1. **Attribution Must Be Preserved**

   - You must retain the original name “Emlet” and the author handle “basedwon” in any use, distribution, derivative, or reference to this software.
   - You may not represent Emlet or its output as originating from any other individual, group, or entity.

2. **Repackaging or Rebranding is Prohibited**

   - You may not rename, white-label, fork, or repackage Emlet under a different name without prior written permission.
   - All forks or derivatives must clearly acknowledge the original source and include a link to the official repository.

3. **No Commercial Resale Without Permission**

   - Emlet may be used freely in personal, research, open-source, or internal commercial contexts.
   - You may not charge for access to Emlet, resell it, or include it in paid products or services without explicit authorization.

4. **Source Obfuscation is Intentional**

   - Emlet is distributed in encrypted and obfuscated form by design. Attempts to reverse-engineer, decrypt, or extract its internal logic are strictly forbidden.
   - You are licensed to use the public API as documented, but internal mechanisms are not licensed for inspection or modification.

5. **Sovereign Software Clause**

   - Emlet is a sovereign work. It is offered for use, not ownership. Your license to use it is a trust—not a transfer of authorship.
   - Misuse—including misattribution, rebranding, or hostile forking—may result in revocation of usage rights.

---

END OF TERMS AND CONDITIONS

For licensing exceptions or commercial terms, contact: basedwon@tuta.com
package/README.md
ADDED
@@ -0,0 +1,220 @@
# Emlet

> **A compact embedding engine built for the web.**

[](https://www.npmjs.com/package/emlet)
[](https://gitlab.com/basedwon/emlet/-/pipelines)
[](https://gitlab.com/basedwon/emlet/-/blob/master/LICENSE)
[](https://www.npmjs.com/package/emlet)

[](https://gitlab.com/basedwon/emlet)
[](https://github.com/basedwon/emlet)
[](https://twitter.com/basdwon)
[](https://discordapp.com/users/basedwon)

Emlet is a fast, fully self-contained semantic embedding model designed to run anywhere JavaScript runs—browser, Node, edge, offline. No dependencies, no GPU, no network calls. Just load it and embed.

The entire engine fits in ~1 MB and produces deterministic vector embeddings suitable for similarity search, clustering, retrieval, tagging, or downstream ML workflows.

## Features

* In-browser semantic embeddings
* Deterministic output (same input → same vector)
* Offline-first, zero runtime dependencies
* Unicode-aware: text, symbols, emoji, ZWJ sequences
* Out-of-vocabulary synthesis (no missing tokens)
* Configurable vector dimensionality
* Optional “glimpse” tail of the full 1536D space

## Installation

```bash
npm install emlet
```

Or load directly via CDN:

```html
<script src="https://unpkg.com/emlet"></script>
```

This exposes both `emlet` (a preloaded instance) and `Emlet` (the class) globally.

## Importing

```js
// CommonJS
const emlet = require('emlet')
const { emlet, Emlet } = require('emlet')

// ESM
import emlet from 'emlet'
import { emlet, Emlet } from 'emlet'
```

Both styles are supported from the same file.

## Basic Usage

```js
const vec = emlet.embed('Hello, world!')
console.log(vec)
// → [0.08, -0.01, ...] (96-dimensional vector by default)
```

The default export is a ready-to-use model instance.

## Custom Models

You can create your own instance with a different output size:

```js
const model = new Emlet(128) // 128D output
const model2 = new Emlet(64, true) // 64D head + 32D tail = 96D
```

### Constructor

```js
new Emlet(dim = 96, useTail = false)
```

* `dim`
  Number of dimensions to emit from the primary embedding space.

* `useTail`
  When `true`, appends a 32-dimensional “glimpse” of the full 1536D semantic space to every vector.

This allows output sizes from 1 up to 1536 dimensions, or 1568 when the tail is enabled.
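
As a rough sketch of the sizing rule above, the helper below predicts how long a vector should be for a given `dim` and `useTail`. `expectedLength`, `MAX_DIM`, and `TAIL_DIM` are hypothetical names, not part of Emlet's API, and the range check only mirrors the 1 to 1536 limit stated above; how Emlet itself reacts to out-of-range values is not documented here.

```js
// Hypothetical helper (not part of Emlet): predicts the vector length
// from the constructor arguments described above.
const MAX_DIM = 1536 // size of the full embedding space, per the text above
const TAIL_DIM = 32  // size of the optional "glimpse" tail

function expectedLength(dim = 96, useTail = false) {
  // Assumption: dim is expected to be a whole number in the documented range
  if (!Number.isInteger(dim) || dim < 1 || dim > MAX_DIM) {
    throw new RangeError(`dim must be an integer from 1 to ${MAX_DIM}`)
  }
  return dim + (useTail ? TAIL_DIM : 0)
}

console.log(expectedLength())           // 96
console.log(expectedLength(64, true))   // 96
console.log(expectedLength(1536, true)) // 1568
```

A check like this can be handy when sizing a vector store column before any embeddings have been produced.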

## Out-of-Vocabulary Synthesis

Tokens not present in the internal vocabulary are synthesized deterministically:

```js
emlet.embed('quantaflux')
```

There are no unknown tokens and no fallbacks to zero vectors.
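
To make "synthesized deterministically" concrete, here is a generic illustration of one way a vector can be derived from a string so that the same token always maps to the same non-zero vector. This is not Emlet's algorithm, which is not documented here; `fnv1a`, `mulberry32`, and `syntheticVector` are stand-ins chosen for the sketch, and the resulting vectors carry no semantic meaning.

```js
// Generic illustration only: NOT Emlet's actual method.

// FNV-1a: a simple 32-bit string hash, used here to derive a seed.
function fnv1a(str) {
  let h = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i)
    h = Math.imul(h, 0x01000193)
  }
  return h >>> 0
}

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed
  return function () {
    a = (a + 0x6d2b79f5) | 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Same token -> same seed -> same sequence of values in [-1, 1).
function syntheticVector(token, dim = 96) {
  const rand = mulberry32(fnv1a(token))
  return Array.from({ length: dim }, () => rand() * 2 - 1)
}

const first = syntheticVector('quantaflux')
const second = syntheticVector('quantaflux')
console.log(first.every((v, i) => v === second[i])) // true
```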

## Unicode and Emoji Support

Emlet natively handles Unicode symbols, emoji, modifiers, and ZWJ sequences:

```js
emlet.embed('🦄')
emlet.embed('👩🏽‍🚀')
```

These are embedded consistently and can be compared using standard vector similarity.
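
For readers unfamiliar with ZWJ sequences, the sketch below shows why they need special care: one visible emoji can span several code points joined by U+200D. It uses the standard `Intl.Segmenter` API and says nothing about how Emlet tokenizes internally, which is not documented here.

```js
// Woman + medium skin tone + zero-width joiner + rocket = "woman astronaut"
const astronaut = '\u{1F469}\u{1F3FD}\u200D\u{1F680}'

// Intl.Segmenter is part of the ECMAScript Internationalization API
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' })
const graphemes = Array.from(segmenter.segment(astronaut), s => s.segment)

console.log(astronaut.length)      // 7 UTF-16 code units
console.log([...astronaut].length) // 4 code points
console.log(graphemes.length)      // 1 user-perceived character
```

Splitting such a string by code unit or code point would break the emoji apart, so grapheme-aware segmentation is the safer choice when pre-processing text yourself.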

## Punctuation Handling

Punctuation is normally stripped during tokenization.
If the input is a **single character**, it is embedded as-is:

```js
emlet.embed('.')
emlet.embed('[')
```

This allows punctuation-level modeling when needed without polluting normal text embeddings.

## Common Patterns

### Text Chunking

```js
function chunkText(text, maxLen = 80) {
  // Split on whitespace; filter(Boolean) drops empty strings from leading/trailing spaces
  const words = text.split(/\s+/).filter(Boolean)
  const chunks = []
  let chunk = ''

  for (const word of words) {
    const candidate = chunk ? chunk + ' ' + word : word
    if (chunk && candidate.length > maxLen) {
      // Adding this word would exceed maxLen: close the current chunk, start a new one
      chunks.push(chunk)
      chunk = word
    } else {
      chunk = candidate
    }
  }

  if (chunk) chunks.push(chunk)
  return chunks
}
```
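
Once a long text is chunked, one common heuristic (not something Emlet prescribes) is to embed each chunk and average the vectors into a single document vector. The sketch below shows only the averaging step; `meanVector` is a hypothetical helper, and the input vectors are hand-written stand-ins for embeddings.

```js
// Hypothetical helper: element-wise mean of equal-length vectors.
function meanVector(vectors) {
  if (vectors.length === 0) throw new Error('meanVector needs at least one vector')
  const dim = vectors[0].length
  const sum = new Array(dim).fill(0)
  for (const vec of vectors) {
    if (vec.length !== dim) throw new Error('all vectors must have the same length')
    for (let i = 0; i < dim; i++) sum[i] += vec[i]
  }
  return sum.map(v => v / vectors.length)
}

// With Emlet this could look like:
//   meanVector(chunkText(doc).map(c => emlet.embed(c)))
console.log(meanVector([[1, 2], [3, 4]])) // [ 2, 3 ]
```

Averaging weights every chunk equally; whether that suits a given retrieval task is something to evaluate on your own data.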

### Cosine Similarity

```js
function cosineSim(a, b) {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0)
  const normA = Math.sqrt(a.reduce((s, v) => s + v * v, 0))
  const normB = Math.sqrt(b.reduce((s, v) => s + v * v, 0))
  return dot / (normA * normB + 1e-8)
}
```
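
When the same vectors are compared many times, it can be cheaper to normalize each one to unit length once, after which cosine similarity reduces to a plain dot product. A minimal sketch, with `normalize` and `dot` as hypothetical helper names and hand-written vectors standing in for embeddings:

```js
// Scale a vector to unit length; a zero vector is returned unchanged (as a copy).
function normalize(vec) {
  const norm = Math.sqrt(vec.reduce((s, v) => s + v * v, 0))
  return norm === 0 ? vec.slice() : vec.map(v => v / norm)
}

// Dot product of two equal-length vectors.
function dot(a, b) {
  return a.reduce((s, v, i) => s + v * b[i], 0)
}

const a = normalize([3, 4]) // roughly [0.6, 0.8]
const b = normalize([4, 3]) // roughly [0.8, 0.6]

console.log(dot(a, b)) // approximately 0.96
console.log(dot(a, a)) // approximately 1
```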

### Top-K Similarity Search

```js
function topKSimilar(input, options, k = 5) {
  const base = emlet.embed(input)
  return options
    .map(text => ({
      text,
      score: cosineSim(base, emlet.embed(text))
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
}
```
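
The function above re-embeds every option on each call. If the option list is stable, one possible variation is to embed it once and reuse the vectors across queries. In this sketch `buildIndex`, `searchIndex`, and `toyEmbed` are hypothetical names; the embedder is passed in as a parameter (for example `text => emlet.embed(text)`), and `toyEmbed` is a letter-counting stand-in so the example runs without Emlet.

```js
function cosineSim(a, b) {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0)
  const normA = Math.sqrt(a.reduce((s, v) => s + v * v, 0))
  const normB = Math.sqrt(b.reduce((s, v) => s + v * v, 0))
  return dot / (normA * normB + 1e-8)
}

// Embed every option once up front.
function buildIndex(texts, embed) {
  return texts.map(text => ({ text, vec: embed(text) }))
}

// Score the query against the stored vectors and keep the k best matches.
function searchIndex(index, query, embed, k = 5) {
  const base = embed(query)
  return index
    .map(({ text, vec }) => ({ text, score: cosineSim(base, vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
}

// Stand-in embedder for demonstration only: counts of 'a', 'b', 'c'.
// It is NOT a semantic model.
const toyEmbed = text => ['a', 'b', 'c'].map(ch => text.split(ch).length - 1)

const index = buildIndex(['aaa', 'abc', 'ccc'], toyEmbed)
console.log(searchIndex(index, 'aab', toyEmbed, 2).map(r => r.text))
// [ 'aaa', 'abc' ]
```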

## API Surface

Emlet intentionally exposes a minimal API:

* `embed(text: string): number[]`
* `new Emlet(dim?: number, useTail?: boolean)`

Everything else—chunking, similarity, indexing, clustering—is left to userland.

## Testing

Emlet includes a test suite built with [testr](https://npmjs.com/package/@basd/testr).

To run the tests, first clone the repository:

```sh
git clone https://github.com/basedwon/emlet.git
```

Install the (dev) dependencies, then run `npm test`:

```bash
npm install
npm test
```

## Donations

If you find this project useful and want to help support further development, please send us some coin. We greatly appreciate any and all contributions. Thank you!

**Bitcoin (BTC):**
```
1JUb1yNFH6wjGekRUW6Dfgyg4J4h6wKKdF
```

**Monero (XMR):**
```
46uV2fMZT3EWkBrGUgszJCcbqFqEvqrB4bZBJwsbx7yA8e2WBakXzJSUK8aqT4GoqERzbg4oKT2SiPeCgjzVH6VpSQ5y7KQ
```

## License

**Emlet License v1.0 (based on Apache 2.0)**
Use is permitted with attribution. Redistribution, rebranding, resale, and reverse engineering are prohibited without written permission.

See [`LICENSE`](./LICENSE) for full terms.
Contact: `basedwon@tuta.com` for commercial or licensing inquiries.