langtell 0.0.1

Files changed (3)
  1. package/LICENSE +21 -0
  2. package/README.md +91 -0
  3. package/package.json +20 -0
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Oleksandr Zhuravlov
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,91 @@
+ # langtell
+
+ > Tell me the language.
+
+ `langtell` infers the language of short strings — titles, snippets, headlines —
+ by **fusing evidence from many signals** into a single weighted verdict with a
+ confidence score and an auditable trail. It reads the *tells*: the script and
+ distinctive letters of the text, the `<html lang>` / `og:locale` / meta tags of
+ the page it came from, the HTTP `Content-Language` header, and — optionally —
+ heavier statistical engines like [franc](https://github.com/wooorm/franc) or the
+ on-device Chrome AI language detector.
+
+ It is **not** another trigram detector competing with franc/cld3/tinyld. Those
+ answer *"what language is this body of text?"* from the characters alone.
+ `langtell` answers *"what language is this **title**, given the page, transport,
+ and source it arrived in?"* — and shows its work.
+
+ > **Status:** design preview. The API below is the committed design; the
+ > implementation is in progress. This `0.0.x` release reserves the name and
+ > documents the design — it has no runtime yet.
+
+ ## Why
+
+ - **Short strings beat statistical detectors.** A two-word title gives franc too
+ little to chew on. `langtell` leans on script ranges, distinctive letters, and
+ out-of-band metadata that a pure text detector never sees.
+ - **Auditable, not magic.** Every verdict carries the list of signals that
+ produced it (`evidence[]`), each with its kind, language, confidence, and raw
+ value — so you can debug *why* a title was classified the way it was.
+ - **Pay only for what you use.** The zero-dependency core (script + HTML + header
+ signals) is fully synchronous. Heavy engines (franc's trigram tables, the
+ browser detector) live behind their own subpaths and only enter your bundle —
+ and only run — when you opt in.
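
Reviewer note: the bullets above describe script ranges, distinctive letters, and an evidence record carrying kind, language, confidence, and raw value. Since this release ships no runtime, the following is only a minimal sketch of what such a producer might look like for the Ukrainian/Russian case. The `Evidence` field names, the function name `sketchEvidenceFromText`, and the confidence value are assumptions made for illustration, not the package's API.

```ts
// Hypothetical evidence record; the fields mirror the README's description
// (kind, language, confidence, raw value) but the names are assumptions.
interface Evidence {
  kind: string;
  language: string;
  confidence: number;
  raw: string;
}

// Script range: the basic Cyrillic Unicode block.
const CYRILLIC = /[\u0400-\u04FF]/;
// Letters found in the Ukrainian alphabet but not the Russian one, and vice versa.
const UK_ONLY = /[їєґі]/i;
const RU_ONLY = /[ыэъё]/i;

// Emits one evidence item when the text is Cyrillic and contains letters
// from exactly one of the two alphabets; otherwise emits nothing.
function sketchEvidenceFromText(text: string): Evidence[] {
  if (!CYRILLIC.test(text)) return [];
  const uk = text.match(UK_ONLY);
  const ru = text.match(RU_ONLY);
  if (uk && !ru) {
    return [{ kind: "title-script", language: "uk", confidence: 0.9, raw: uk[0] }];
  }
  if (ru && !uk) {
    return [{ kind: "title-script", language: "ru", confidence: 0.9, raw: ru[0] }];
  }
  return []; // ambiguous Cyrillic: leave the decision to other signals
}
```

Returning an empty list for ambiguous text keeps the sketch honest: a title with no distinctive letter contributes no script verdict, which is where page and header metadata would have to carry the decision.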
+
+ ## Quick start
+
+ ```ts
+ import { compile } from "langtell";
+
+ // compile() does the per-roster setup once; call the returned fn many times.
+ const detect = compile({ candidates: [UK, RU, EN] });
+
+ const result = detect({
+   text: "Їжак Сонік",
+   html,            // optional: <html lang>, og:locale, meta content-language
+   responseHeaders, // optional: HTTP Content-Language
+ });
+ // → { language: "uk", confidence: 0.9x, evidence: [{ kind: "title-script", ... }, ...] }
+ ```
+
+ Add a heavy engine — it stays behind its own import door, and the return type
+ becomes `Promise` automatically because the engine is async:
+
+ ```ts
+ import { compile } from "langtell";
+ import { francEngine } from "langtell/franc";
+
+ const detect = compile({ candidates: [UK, RU, EN], engines: [francEngine] });
+ const result = await detect({ text, html, responseHeaders });
+ ```
+
+ ## API at a glance
+
+ | Export | Role |
+ | --- | --- |
+ | `compile(config)` | Build a configured `detect` function (does the precompute once). |
+ | `detect(input)` | The compiled detector. Sync or `Promise`, by config — see below. |
+ | `evidenceFromText(text)` | Producer: script + distinctive-letter signals. Zero-dep, sync. |
+ | `evidenceFromHtml(html)` | Producer: `<html lang>`, meta content-language, `og:locale`. Zero-dep, sync. |
+ | `evidenceFromHeaders(h)` | Producer: HTTP `Content-Language`. Zero-dep, sync. |
+ | `fuse(evidence, opts?)` | Weighted blend + "context never overrides clear script" guard. |
+ | `langtell/franc` | Opt-in franc engine (pulls trigram tables). |
+ | `langtell/chrome-ai` | Opt-in on-device Chrome AI engine (browser). |
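
Reviewer note: the table describes `fuse` as a weighted blend with a "context never overrides clear script" guard, but the diff contains no implementation. One plausible reading is sketched below. The weights, the `CLEAR_SCRIPT` threshold, the kind names other than `title-script`, and the function name `sketchFuse` are all assumptions; the real design may differ.

```ts
interface Evidence {
  kind: string;
  language: string;
  confidence: number; // expected in the range 0..1
}

interface Classification {
  language: string | null;
  confidence: number;
  evidence: Evidence[];
}

// Hypothetical per-kind weights; only "title-script" is a kind named in the README.
const WEIGHTS: Record<string, number> = {
  "title-script": 1.0,
  "html-lang": 0.6,
  "content-language": 0.4,
};
const DEFAULT_WEIGHT = 0.2;
// Assumed threshold above which a script signal counts as "clear".
const CLEAR_SCRIPT = 0.85;

function sketchFuse(evidence: Evidence[]): Classification {
  // Guard: a clear script signal wins outright, regardless of page or header context.
  const clear = evidence
    .filter((e) => e.kind === "title-script" && e.confidence >= CLEAR_SCRIPT)
    .sort((a, b) => b.confidence - a.confidence)[0];
  if (clear) {
    return { language: clear.language, confidence: clear.confidence, evidence };
  }

  // Otherwise: sum weight * confidence per language and normalise by the total.
  const scores: Record<string, number> = {};
  let total = 0;
  for (const e of evidence) {
    const weighted = (WEIGHTS[e.kind] ?? DEFAULT_WEIGHT) * e.confidence;
    scores[e.language] = (scores[e.language] ?? 0) + weighted;
    total += weighted;
  }

  let language: string | null = null;
  let best = 0;
  for (const lang of Object.keys(scores)) {
    const score = scores[lang] ?? 0;
    if (score > best) {
      language = lang;
      best = score;
    }
  }
  return { language, confidence: total > 0 ? best / total : 0, evidence };
}
```

Passing the input list back as `evidence` is what makes the verdict auditable in this sketch: the caller can see every signal that took part, including the ones the guard overruled.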
+
+ `detect` returns a plain `Classification` when every registered source is
+ synchronous, and `Promise<Classification>` the moment an async engine is in the
+ mix — the type reflects the config, so you never guess whether to `await`. See
+ [DESIGN.md](./DESIGN.md) for the full architecture.
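
Reviewer note: a return type that switches between `Classification` and `Promise<Classification>` based on the config can be expressed with a conditional type. The sketch below shows one way to do it. The names `sketchCompile`, `DetectResult`, `SyncEngine`, and `AsyncEngine`, the stub engines, and the "highest confidence wins" selection are hypothetical; DESIGN.md is not part of this diff, so the actual mechanism is unknown.

```ts
interface Classification {
  language: string | null;
  confidence: number;
}
type Input = { text: string };
type SyncEngine = (input: Input) => Classification;
type AsyncEngine = (input: Input) => Promise<Classification>;
type Engine = SyncEngine | AsyncEngine;

// If any engine in the list returns a Promise, the detector's result is a Promise too.
type DetectResult<E extends readonly Engine[]> =
  [Extract<ReturnType<E[number]>, Promise<unknown>>] extends [never]
    ? Classification
    : Promise<Classification>;

function sketchCompile<E extends readonly Engine[]>(
  engines: E,
): (input: Input) => DetectResult<E> {
  // Placeholder selection rule (an assumption): keep the most confident result.
  const pick = (results: Classification[]): Classification =>
    results.reduce((best, r) => (r.confidence > best.confidence ? r : best), {
      language: null,
      confidence: 0,
    });

  return (input) => {
    const results = engines.map((engine) => engine(input));
    const out = results.some((r) => r instanceof Promise)
      ? Promise.all(results).then(pick)
      : pick(results as Classification[]);
    // The runtime branch above matches the type-level branch; the cast links the two.
    return out as unknown as DetectResult<E>;
  };
}

// Stub engines standing in for real signal sources.
const syncStub: SyncEngine = () => ({ language: "uk", confidence: 0.9 });
const asyncStub: AsyncEngine = () => Promise.resolve({ language: "ru", confidence: 0.4 });

const syncResult = sketchCompile([syncStub])({ text: "Їжак Сонік" }); // typed Classification
const asyncResult = sketchCompile([syncStub, asyncStub])({ text: "Їжак Сонік" }); // typed Promise<Classification>
```

With this shape, forgetting `await` on `asyncResult` and then reading `.language` is a compile-time error, while `syncResult.language` type-checks directly.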
+
+ ## Prior art
+
+ - [`franc`](https://github.com/wooorm/franc) — trigram detection over 400+
+ languages. `langtell` can use it as one engine, but works on short strings
+ where franc has too little signal, and fuses it with page/transport metadata.
+ - `cld3`, `tinyld`, `languagedetect` — statistical text-only detectors.
+ `langtell` differs by combining script logic with out-of-band evidence and
+ emitting an auditable trail.
+
+ ## License
+
+ [MIT](./LICENSE)
package/package.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "name": "langtell",
+   "version": "0.0.1",
+   "description": "Tell me the language — evidence-fusion language detection for short strings, with an auditable confidence trail.",
+   "type": "module",
+   "license": "MIT",
+   "sideEffects": false,
+   "keywords": [
+     "language",
+     "language-detection",
+     "language-identification",
+     "i18n",
+     "locale",
+     "bcp47",
+     "script-detection"
+   ],
+   "files": [
+     "dist"
+   ]
+ }