emlet 0.0.0 → 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,47 @@
1
+ Emlet License v1.0
2
+ Based on Apache License 2.0
3
+ Copyright (c) 2025 Basedwon
4
+
5
+ Licensed under the Apache License, Version 2.0 (the "License") with the following additional terms:
6
+
7
+ You may not use this file except in compliance with the License and the conditions below.
8
+ You may obtain a copy of the License at:
9
+
10
+ http://www.apache.org/licenses/LICENSE-2.0
11
+
12
+ Unless required by applicable law or agreed to in writing, software distributed under the License is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13
+
14
+ ---
15
+
16
+ ## Additional Terms Specific to Emlet
17
+
18
+ 1. **Attribution Must Be Preserved**
19
+
20
+ - You must retain the original name “Emlet” and the author handle “basedwon” in any use, distribution, derivative, or reference to this software.
21
+ - You may not represent Emlet or its output as originating from any other individual, group, or entity.
22
+
23
+ 2. **Repackaging or Rebranding is Prohibited**
24
+
25
+ - You may not rename, white-label, fork, or repackage Emlet under a different name without prior written permission.
26
+ - All forks or derivatives must clearly acknowledge the original source and include a link to the official repository.
27
+
28
+ 3. **No Commercial Resale Without Permission**
29
+
30
+ - Emlet may be used freely in personal, research, open-source, or internal commercial contexts.
31
+ - You may not charge for access to Emlet, resell it, or include it in paid products or services without explicit authorization.
32
+
33
+ 4. **Source Obfuscation is Intentional**
34
+
35
+ - Emlet is distributed in encrypted and obfuscated form by design. Attempts to reverse-engineer, decrypt, or extract its internal logic are strictly forbidden.
36
+ - You are licensed to use the public API as documented, but internal mechanisms are not licensed for inspection or modification.
37
+
38
+ 5. **Sovereign Software Clause**
39
+
40
+ - Emlet is a sovereign work. It is offered for use, not ownership. Your license to use it is a trust—not a transfer of authorship.
41
+ - Misuse—including misattribution, rebranding, or hostile forking—may result in revocation of usage rights.
42
+
43
+ ---
44
+
45
+ END OF TERMS AND CONDITIONS
46
+
47
+ For licensing exceptions or commercial terms, contact: basedwon@tuta.com
package/README.md ADDED
@@ -0,0 +1,220 @@
1
+ # Emlet
2
+
3
+ > **A compact embedding engine built for the web.**
4
+
5
+ [![npm](https://img.shields.io/npm/v/emlet?style=flat&logo=npm)](https://www.npmjs.com/package/emlet)
6
+ [![pipeline](https://gitlab.com/basedwon/emlet/badges/master/pipeline.svg)](https://gitlab.com/basedwon/emlet/-/pipelines)
7
+ [![license](https://img.shields.io/npm/l/emlet)](https://gitlab.com/basedwon/emlet/-/blob/master/LICENSE)
8
+ [![downloads](https://img.shields.io/npm/dw/emlet)](https://www.npmjs.com/package/emlet)
9
+
10
+ [![Gitlab](https://img.shields.io/badge/Gitlab%20-%20?logo=gitlab&color=%23383a40)](https://gitlab.com/basedwon/emlet)
11
+ [![Github](https://img.shields.io/badge/Github%20-%20?logo=github&color=%23383a40)](https://github.com/basedwon/emlet)
12
+ [![Twitter](https://img.shields.io/badge/@basdwon%20-%20?logo=twitter&color=%23383a40)](https://twitter.com/basdwon)
13
+ [![Discord](https://img.shields.io/badge/Basedwon%20-%20?logo=discord&color=%23383a40)](https://discordapp.com/users/basedwon)
14
+
15
+ Emlet is a fast, fully self-contained semantic embedding model designed to run anywhere JavaScript runs—browser, Node, edge, offline. No dependencies, no GPU, no network calls. Just load it and embed.
16
+
17
+ The entire engine fits in ~1 MB and produces deterministic vector embeddings suitable for similarity search, clustering, retrieval, tagging, or downstream ML workflows.
18
+
19
+ ## Features
20
+
21
+ * In-browser semantic embeddings
22
+ * Deterministic output (same input → same vector)
23
+ * Offline-first, zero runtime dependencies
24
+ * Unicode-aware: text, symbols, emoji, ZWJ sequences
25
+ * Out-of-vocabulary synthesis (no missing tokens)
26
+ * Configurable vector dimensionality
27
+ * Optional “glimpse” tail of the full 1536D space
28
+
29
+ ## Installation
30
+
31
+ ```bash
+ npm install emlet
+ ```
34
+
35
+ Or load directly via CDN:
36
+
37
+ ```html
+ <script src="https://unpkg.com/emlet"></script>
+ ```
40
+
41
+ This exposes both `emlet` (a preloaded instance) and `Emlet` (the class) globally.
42
+
43
+ ## Importing
44
+
45
+ ```js
+ // CommonJS (use one form; declaring `emlet` twice is a SyntaxError)
+ const emlet = require('emlet')
+ // or: const { emlet, Emlet } = require('emlet')
+
+ // ESM (use one form)
+ import emlet from 'emlet'
+ // or: import { emlet, Emlet } from 'emlet'
+ ```
54
+
55
+ Both module styles are supported by the same distributed file.
56
+
57
+ ## Basic Usage
58
+
59
+ ```js
+ const vec = emlet.embed('Hello, world!')
+ console.log(vec)
+ // → [0.08, -0.01, ...] (96-dimensional vector by default)
+ ```
64
+
65
+ The default export is a ready-to-use model instance.
66
+
67
+ ## Custom Models
68
+
69
+ You can create your own instance with a different output size:
70
+
71
+ ```js
+ const model = new Emlet(128) // 128D output
+ const model2 = new Emlet(64, true) // 64D head + 32D tail = 96D
+ ```
75
+
76
+ ### Constructor
77
+
78
+ ```js
+ new Emlet(dim = 96, useTail = false)
+ ```
81
+
82
+ * `dim`
83
+ Number of dimensions to emit from the primary embedding space.
84
+
85
+ * `useTail`
86
+ When `true`, appends a 32-dimensional “glimpse” of the full 1536D semantic space to every vector.
87
+
88
+ This allows output sizes from 1 up to 1536 dimensions, or 1568 when the tail is enabled.
89
+
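The size rule described above can be sketched as a small helper. `outputLength`, `TAIL_DIMS`, and `MAX_DIM` are hypothetical names, not part of Emlet's API; they only encode the numbers stated in this README. Whether Emlet itself rejects out-of-range values is not documented here, so the range check is an assumption.

```javascript
// Hypothetical helper (not part of Emlet): expected vector length
// for a given pair of constructor arguments.
const TAIL_DIMS = 32  // tail size stated in the README
const MAX_DIM = 1536  // size of the full space stated in the README

function outputLength(dim = 96, useTail = false) {
  // Assumed validation; the library's own behavior is not documented.
  if (!Number.isInteger(dim) || dim < 1 || dim > MAX_DIM) {
    throw new RangeError(`dim must be an integer between 1 and ${MAX_DIM}`)
  }
  return dim + (useTail ? TAIL_DIMS : 0)
}

console.log(outputLength())           // 96
console.log(outputLength(64, true))   // 96
console.log(outputLength(1536, true)) // 1568
```

Note that `new Emlet(64, true)` and the default `new Emlet()` both yield 96 values, but the last 32 positions mean different things, so vectors from the two configurations should not be compared with each other.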
90
+ ## Out-of-Vocabulary Synthesis
91
+
92
+ Tokens not present in the internal vocabulary are synthesized deterministically:
93
+
94
+ ```js
+ emlet.embed('quantaflux')
+ ```
97
+
98
+ There are no unknown tokens and no fallbacks to zero vectors.
99
+
100
+ ## Unicode and Emoji Support
101
+
102
+ Emlet natively handles Unicode symbols, emoji, modifiers, and ZWJ sequences:
103
+
104
+ ```js
+ emlet.embed('🦄')
+ emlet.embed('👩🏽‍🚀')
+ ```
108
+
109
+ These are embedded consistently and can be compared using standard vector similarity.
110
+
111
+ ## Punctuation Handling
112
+
113
+ Punctuation is normally stripped during tokenization.
114
+ If the input is a **single character**, it is embedded as-is:
115
+
116
+ ```js
+ emlet.embed('.')
+ emlet.embed('[')
+ ```
120
+
121
+ This allows punctuation-level modeling when needed without polluting normal text embeddings.
122
+
123
+ ## Common Patterns
124
+
125
+ ### Text Chunking
126
+
127
+ ```js
+ function chunkText(text, maxLen = 80) {
+   // Split on whitespace; drop empty strings from leading/trailing spaces
+   const words = text.split(/\s+/).filter(Boolean)
+   const chunks = []
+   let chunk = ''
+
+   for (const word of words) {
+     const candidate = chunk ? chunk + ' ' + word : word
+     if (candidate.length > maxLen && chunk) {
+       // Adding this word would overflow: close the current chunk
+       chunks.push(chunk)
+       chunk = word
+     } else {
+       // Also taken when a single word exceeds maxLen, so no empty chunk is pushed
+       chunk = candidate
+     }
+   }
+
+   if (chunk) chunks.push(chunk)
+   return chunks
+ }
+ ```
146
+
147
+ ### Cosine Similarity
148
+
149
+ ```js
+ function cosineSim(a, b) {
+   const dot = a.reduce((s, v, i) => s + v * b[i], 0)
+   const normA = Math.sqrt(a.reduce((s, v) => s + v * v, 0))
+   const normB = Math.sqrt(b.reduce((s, v) => s + v * v, 0))
+   return dot / (normA * normB + 1e-8)
+ }
+ ```
157
+
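A quick sanity check of the similarity function on hand-made vectors, as a sketch rather than anything taken from the package. The function is repeated so the snippet runs on its own; the length guard is an addition not present in the README version, included because mismatched lengths would otherwise silently produce `NaN`.

```javascript
function cosineSim(a, b) {
  // Added guard (assumption, not in the README): vectors must have equal length
  if (a.length !== b.length) throw new Error('vector lengths differ')
  const dot = a.reduce((s, v, i) => s + v * b[i], 0)
  const normA = Math.sqrt(a.reduce((s, v) => s + v * v, 0))
  const normB = Math.sqrt(b.reduce((s, v) => s + v * v, 0))
  // The small epsilon avoids division by zero for all-zero vectors
  return dot / (normA * normB + 1e-8)
}

console.log(cosineSim([1, 0, 0], [2, 0, 0]).toFixed(3))  // "1.000"  same direction
console.log(cosineSim([1, 0, 0], [0, 1, 0]).toFixed(3))  // "0.000"  orthogonal
console.log(cosineSim([1, 0, 0], [-1, 0, 0]).toFixed(3)) // "-1.000" opposite
console.log(cosineSim([0, 0], [0, 0]))                   // 0, thanks to the epsilon
```

Because of the epsilon, identical vectors score very slightly below 1, which is worth remembering when comparing scores against an exact threshold.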
158
+ ### Top-K Similarity Search
159
+
160
+ ```js
+ function topKSimilar(input, options, k = 5) {
+   const base = emlet.embed(input)
+   return options
+     .map(text => ({
+       text,
+       score: cosineSim(base, emlet.embed(text))
+     }))
+     .sort((a, b) => b.score - a.score)
+     .slice(0, k)
+ }
+ ```
172
+
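`topKSimilar` embeds every option again on each call. Since the README states that output is deterministic, the option vectors could be computed once and reused across queries. The sketch below illustrates that idea; `buildIndex`, `search`, and `fakeEmbed` are hypothetical names. `fakeEmbed` is a letter-count stand-in for `emlet.embed` so the snippet runs without the package; it is not a semantic embedding.

```javascript
// Hypothetical stand-in for emlet.embed: counts letters a-z.
// Replace with emlet.embed in real use.
function fakeEmbed(text) {
  const vec = new Array(26).fill(0)
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97
    if (i >= 0 && i < 26) vec[i] += 1
  }
  return vec
}

function cosineSim(a, b) {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0)
  const normA = Math.sqrt(a.reduce((s, v) => s + v * v, 0))
  const normB = Math.sqrt(b.reduce((s, v) => s + v * v, 0))
  return dot / (normA * normB + 1e-8)
}

// Embed each option once and keep the vector next to its text.
function buildIndex(options, embed) {
  return options.map(text => ({ text, vec: embed(text) }))
}

// Embed only the query, then rank the stored vectors against it.
function search(index, query, embed, k = 5) {
  const base = embed(query)
  return index
    .map(({ text, vec }) => ({ text, score: cosineSim(base, vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
}

const index = buildIndex(['apple pie', 'banana bread', 'apple tart'], fakeEmbed)
const results = search(index, 'apple', fakeEmbed, 2)
console.log(results.map(r => r.text)) // [ 'apple pie', 'apple tart' ]
```

With the real model, the call would be `buildIndex(options, text => emlet.embed(text))`, and the index must be rebuilt if the model's `dim` or `useTail` settings change.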
173
+ ## API Surface
174
+
175
+ Emlet intentionally exposes a minimal API:
176
+
177
+ * `embed(text: string): number[]`
178
+ * `new Emlet(dim?: number, useTail?: boolean)`
179
+
180
+ Everything else—chunking, similarity, indexing, clustering—is left to userland.
181
+
182
+ ## Testing
183
+
184
+ Emlet includes a test suite built with [testr](https://npmjs.com/package/@basd/testr).
185
+
186
+ To run the tests, first clone the repository:
187
+
188
+ ```sh
+ git clone https://github.com/basedwon/emlet.git
+ ```
191
+
192
+ Install the (dev) dependencies, then run `npm test`:
193
+
194
+ ```bash
+ npm install
+ npm test
+ ```
198
+
199
+ ## Donations
200
+
201
+ If you find this project useful and want to help support further development, please send us some coin. We greatly appreciate any and all contributions. Thank you!
202
+
203
+ **Bitcoin (BTC):**
204
+ ```
205
+ 1JUb1yNFH6wjGekRUW6Dfgyg4J4h6wKKdF
206
+ ```
207
+
208
+ **Monero (XMR):**
209
+ ```
210
+ 46uV2fMZT3EWkBrGUgszJCcbqFqEvqrB4bZBJwsbx7yA8e2WBakXzJSUK8aqT4GoqERzbg4oKT2SiPeCgjzVH6VpSQ5y7KQ
211
+ ```
212
+
213
+ ## License
214
+
215
+ **Emlet License v1.0 (based on Apache 2.0)**
216
+ Use is permitted with attribution. Redistribution, rebranding, resale, and reverse engineering are prohibited without written permission.
217
+
218
+ See [`LICENSE`](./LICENSE) for full terms.
219
+ Contact: `basedwon@tuta.com` for commercial or licensing inquiries.
220
+