an-array-of-catalan-words 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE.md +7 -0
- package/README.md +72 -0
- package/index.d.ts +2 -0
- package/index.js +1 -0
- package/index.json +891426 -0
- package/package.json +29 -0
package/LICENSE.md
ADDED
@@ -0,0 +1,7 @@
This package is distributed under the same multi-license as the Softcatalà Hunspell dictionaries it is derived from.

You may choose to use this software under the terms of any one of the following licenses:

- GNU General Public License version 2.0 or later (GPL-2.0-or-later)
- GNU Lesser General Public License version 2.1 or later (LGPL-2.1-or-later)
- Mozilla Public License version 1.1 (MPL-1.1)

package/README.md
ADDED
@@ -0,0 +1,72 @@
# an-array-of-catalan-words

[npm](https://www.npmjs.com/package/an-array-of-catalan-words)

List of ~891,000 Catalan words.

Derived from the [Softcatalà](https://github.com/softcatala/catalan-dict-tools) Hunspell dictionaries,
processed and filtered to keep only lowercase words made up entirely of Catalan letters
(`[a-zçàèéíïòóúü]`).

Inspired by the architecture of [`an-array-of-english-words`](https://github.com/words/an-array-of-english-words)
by [Titus Wormer](https://github.com/wooorm).

## Install

```sh
npm install an-array-of-catalan-words
```

## Use

```js
const words = require('an-array-of-catalan-words')

console.log(words.length) // ~891000
console.log(words.slice(0, 5))
// [ 'a', 'aaronítica', 'aaronítico', 'ab', 'abaceria' ]

console.log(words.filter(w => w.startsWith('xoc')))
// [ 'xoc', 'xocolata', 'xocolater', ... ]
```
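
Because the export is a plain array, `words.includes(w)` scans up to ~891,000 entries on every call. For repeated membership checks, building a `Set` once and calling `has` is a common alternative. A minimal sketch, assuming a tiny hypothetical sample array stands in for the package export so the snippet runs on its own; the helper name `isWord` and the NFC/lowercase normalization are assumptions, not part of the package:

```js
// Hypothetical stand-in for require('an-array-of-catalan-words'),
// kept tiny so the example is self-contained.
const words = ['a', 'ab', 'abaceria', 'xoc', 'xocolata', 'xocolater']

// Build the lookup table once; Set#has is then average O(1) per query,
// instead of the O(n) scan performed by Array#includes.
const lookup = new Set(words)

// Hypothetical helper: true if `candidate` is in the list.
// Assumes the list stores lowercase, precomposed (NFC) spellings,
// as the README's lowercase character class suggests.
function isWord(candidate) {
  return lookup.has(candidate.normalize('NFC').toLowerCase())
}

console.log(isWord('Xocolata')) // true
console.log(isWord('xocolate')) // false
```

With the real package, `words` would come from `require('an-array-of-catalan-words')`; building the `Set` costs one pass over the array, so it pays off only when many lookups follow.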

## API

The default export is a `string[]` of Catalan words.

### TypeScript

Types are included:

```ts
import words = require('an-array-of-catalan-words')

const filtered: string[] = words.filter(w => w.length === 5)
```

## Dataset

- **Source**: [catalan-dict-tools](https://github.com/softcatala/catalan-dict-tools) (Softcatalà)
- **License**: GPL-2.0-or-later OR LGPL-2.1-or-later OR MPL-1.1
- **Words**: ~891,000 unique, lowercase Catalan words
- **Filter**: Only words that match `/^[a-zçàèéíïòóúü]+$/` in full
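
The filter accepts only the basic Latin lowercase letters plus ç and the Catalan accented vowels, so some standard spellings do not pass: words containing the middle dot of the *ela geminada* (`col·legi`), hyphens, apostrophes, or uppercase letters are rejected. A minimal sketch of the check; the helper name `matchesFilter` is hypothetical, and applying NFC normalization first is an assumption (the character class uses precomposed letters, so a decomposed `é` would otherwise fail):

```js
// Character class from the Dataset section: a-z, ç, and the accented
// vowels used in Catalan. `·`, `'`, `-` and uppercase letters are not in it.
const CATALAN_WORD = /^[a-zçàèéíïòóúü]+$/

// Hypothetical helper: true if `word` would pass the filter.
// NFC normalization (an assumption) turns "e" + U+0301 into the single
// precomposed "é" that the character class expects.
function matchesFilter(word) {
  return CATALAN_WORD.test(word.normalize('NFC'))
}

for (const w of ['xocolata', 'aigües', 'col·legi', 'sud-americà', 'Barcelona', 'cafe\u0301']) {
  console.log(w, matchesFilter(w))
}
// xocolata true
// aigües true
// col·legi false
// sud-americà false
// Barcelona false
// café true
```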

## Build

To regenerate `index.json` from source (`setup.js` and `build.js` are not part of the published package):

```sh
node setup.js # Download catalan.dic and catalan.aff
# Then expand the dictionary (requires hunspell-reader):
npx hunspell-reader words -o data/raw_words.txt -s -u -l data/catalan.dic
node build.js # Clean, filter and generate index.json
```
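
The contents of `build.js` are not shown in this package, so the following is only a hypothetical sketch of a clean-and-filter pass over the expanded word list, assuming `hunspell-reader` writes one word per line. The function name `buildIndex`, the NFC normalization, the lowercasing, and the default code-unit sort order are all assumptions, not the package's actual build:

```js
const CATALAN_WORD = /^[a-zçàèéíïòóúü]+$/

// Hypothetical build step: raw text (one entry per line) in, JSON array string out.
function buildIndex(rawText) {
  const unique = new Set()
  for (const line of rawText.split(/\r?\n/)) {
    // Normalize, trim and lowercase each entry before filtering (assumption).
    const word = line.normalize('NFC').trim().toLowerCase()
    if (CATALAN_WORD.test(word)) unique.add(word)
  }
  // The default sort compares UTF-16 code units, so accented letters such as
  // 'à' (U+00E0) sort after 'z' (U+007A); the real file may use another order.
  return JSON.stringify([...unique].sort())
}

const raw = ['xocolata', 'Xoc', 'col·legi', '', 'xoc', 'abaceria'].join('\n')
console.log(buildIndex(raw))
// ["abaceria","xoc","xocolata"]
```

In an actual script the returned string would then be written to disk, for example with Node's `fs.writeFileSync('index.json', json)`.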

## Credits

- **Linguistic data**: [Softcatalà](https://github.com/softcatala/catalan-dict-tools)
- **Architectural pattern**: [Titus Wormer (@wooorm)](https://github.com/wooorm) ([`an-array-of-english-words`](https://github.com/words/an-array-of-english-words))

## License

(GPL-2.0-or-later OR LGPL-2.1-or-later OR MPL-1.1) © Pablo G. Guízar

package/index.d.ts
ADDED
package/index.js
ADDED
@@ -0,0 +1 @@
module.exports = require('./index.json');