lemma-is 0.7.0 → 0.8.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +13 -0
- package/package.json +1 -1
package/README.md (changed)
@@ -28,6 +28,19 @@ hestur, hest, hesti, hests, hestar, hesta, hestum, hestanna...
 
 If a user searches "hestur" but your document contains "hestinum", they won't find it—unless you normalize both to the lemma at index time.
 
+## Background
+
+Icelandic is underserved in the search ecosystem:
+
+- **PostgreSQL** has no Icelandic stemmer ([Snowball](https://snowballstem.org/) doesn't support it)
+- **Elasticsearch** has no Icelandic analyzer in its [36 built-in languages](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html)
+- **Algolia** lists Icelandic but only provides basic plurals—no morphological analysis
+- **Existing Icelandic NLP tools** ([Greynir](https://github.com/mideind/GreynirPackage), [Nefnir](https://github.com/jonfd/nefnir)) are Python-only
+
+For comparison, Finnish has [Voikko](https://voikko.puimula.org/) with PostgreSQL and Elasticsearch plugins. Icelandic has had nothing equivalent—until now.
+
+lemma-is is the first npm package providing Icelandic lemmatization for search. It embeds the [BÍN](https://bin.arnastofnun.is/) morphological database and runs anywhere JavaScript runs.
+
 ## Why lemma-is?
 
 GreynirEngine remains the gold standard for **sentence parsing** and grammatical analysis in Icelandic. But full parsing is not forgiving: if a sentence doesn't parse, you don't get disambiguated lemmas. That makes it a poor fit for messy, real‑world search indexing where recall matters.
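The README excerpt makes two technical points. First, queries and documents should both be reduced to lemmas at index time. Second, for search recall it is safer to keep every candidate lemma of a word than to depend on a full parse succeeding. The sketch below illustrates both ideas with a tiny in-memory inverted index. It is **not** the lemma-is API. `CANDIDATE_LEMMAS` is a hypothetical, hand-written table standing in for BÍN, and `tokenize`, `lemmatize`, and `LemmaIndex` are illustrative names only.

```typescript
// Hypothetical toy stand-in for a BÍN-style lookup (a handful of entries only):
// each inflected form maps to every lemma it could belong to.
const CANDIDATE_LEMMAS: ReadonlyMap<string, readonly string[]> = new Map([
  ["hestur", ["hestur"]],
  ["hest", ["hestur"]],
  ["hesti", ["hestur"]],
  ["hests", ["hestur"]],
  ["hestinn", ["hestur"]],
  ["hestinum", ["hestur"]],
  ["hestar", ["hestur"]],
  ["hesta", ["hestur"]],
  ["hestum", ["hestur"]],
  // "á" is ambiguous: preposition/noun "á", or a present-tense form of "eiga".
  ["á", ["á", "eiga"]],
]);

// Lowercase the text and keep runs of Unicode letters (covers þ, ð, æ, ö, á, ...).
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/\p{L}+/gu) ?? [];
}

// Return all candidate lemmas; unknown words fall back to their surface form,
// so nothing is dropped when the table has no entry.
function lemmatize(token: string): readonly string[] {
  return CANDIDATE_LEMMAS.get(token) ?? [token];
}

// Inverted index keyed by lemma instead of by surface form.
class LemmaIndex {
  private readonly postings = new Map<string, Set<number>>();

  add(docId: number, text: string): void {
    for (const token of tokenize(text)) {
      // No disambiguation and no parse: every candidate lemma is indexed,
      // trading some precision for recall.
      for (const lemma of lemmatize(token)) {
        let ids = this.postings.get(lemma);
        if (!ids) {
          ids = new Set<number>();
          this.postings.set(lemma, ids);
        }
        ids.add(docId);
      }
    }
  }

  // Normalize the query the same way, then return (sorted) the documents that
  // match every query token through at least one of its candidate lemmas.
  search(query: string): number[] {
    const tokens = tokenize(query);
    if (tokens.length === 0) return [];
    const perToken = tokens.map((token) => {
      const matches = new Set<number>();
      for (const lemma of lemmatize(token)) {
        for (const id of this.postings.get(lemma) ?? []) matches.add(id);
      }
      return matches;
    });
    const intersection = perToken.reduce(
      (acc, s) => new Set([...acc].filter((id) => s.has(id))),
    );
    return [...intersection].sort((a, b) => a - b);
  }
}

// Usage: "hestinum" and "hesta" are both found by a query for "hestur".
const index = new LemmaIndex();
index.add(1, "Ég gaf hestinum hey.");
index.add(2, "Hún á tvo hesta.");
console.log(index.search("hestur")); // → [1, 2]
console.log(index.search("eiga"));   // → [2]
```

Indexing every candidate lemma means an ambiguous form such as "á" also matches queries for "eiga". That precision cost is usually acceptable when recall matters, and a sentence that would fail a full parse still gets indexed.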