surname-transliterator 0.4.2 → 0.4.3
- checksums.yaml +4 -4
- data/README.md +57 -12
- data/lib/surname/transliterator/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: d56d6ccf14e6dac178484bca5702eec757b2f0275561e80f375470f177fbc95b
|
|
4
|
+
data.tar.gz: ea5a27415f2b10044daf825d6e96c5cebe06cf6f890c0c2ec99ecd01acffba2b
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: c6a4f1aa675aaf7b3fad50ebe0137680bccd9e4a8e4b14bf1563b9e092afa1ab273700ceacf3ad6cff11c31cf01b448e997452180ba7c5b7cc2469f156d3d9d2
|
|
7
|
+
data.tar.gz: 68e439b62b070ae3d84c80d421e8bd994cc9e0df0814184a0581307a04e9f732853e33af73dd21484c96d662cfa98a8fff32a40a7dc585b9a22558d224225592
|
data/README.md
CHANGED
|
@@ -3,13 +3,13 @@
|
|
|
3
3
|
A Ruby gem for cross-language surname transliteration and transformation, based on genealogical rules. Supports transliteration (removing diacritics/Cyrillic) and polonization/de-polonization endings between languages like Polish-Lithuanian, Polish-Russian, Czech, etc. Extensible for more pairs. Useful for reducing false positives in genealogical matching.
|
|
4
4
|
|
|
5
5
|
Features:
|
|
6
|
-
- Transliterate
|
|
7
|
-
-
|
|
6
|
+
- Transliterate surnames (remove diacritics/Cyrillic, handle Polish digraphs like sz/cz/rz, e.g. cz → č).
|
|
7
|
+
- Transform endings between languages (polonization/de-polonization based on genealogical rules).
|
|
8
|
+
- Generate W/V interchange variants for better genealogical matching.
|
|
9
|
+
- Support for Polish ↔ Lithuanian, Polish ↔ Russian transformations (asymmetric).
|
|
8
10
|
|
|
9
11
|
## Installation
|
|
10
12
|
|
|
11
|
-
TODO: Replace `UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG` with your gem name right after releasing it to RubyGems.org. Please do not do it earlier due to security reasons. Alternatively, replace this section with instructions to install your gem from git if you don't plan to release to RubyGems.org.
|
|
12
|
-
|
|
13
13
|
Install the gem and add to the application's Gemfile by executing:
|
|
14
14
|
|
|
15
15
|
```bash
|
|
@@ -29,23 +29,23 @@ require 'surname/transliterator'
|
|
|
29
29
|
|
|
30
30
|
# Convenience methods
|
|
31
31
|
polish_to_lith = Surname::Transliterator.polish_to_lithuanian("Łukasiewicz")
|
|
32
|
-
# => ["
|
|
32
|
+
# => ["Lukasiewič"] (transliterated with Polish digraphs)
|
|
33
33
|
|
|
34
34
|
polish_to_lith2 = Surname::Transliterator.polish_to_lithuanian("Antonowicz")
|
|
35
|
-
# => ["
|
|
35
|
+
# => ["Antonavičius", "Antonowič"] (transliterated + transformed)
|
|
36
36
|
|
|
37
37
|
lith_to_polish = Surname::Transliterator.lithuanian_to_polish("Jankauskas")
|
|
38
|
-
# => ["
|
|
38
|
+
# => ["Jankauski", "Jankauskas"] (transformed + transliterated)
|
|
39
39
|
|
|
40
40
|
polish_to_russian = Surname::Transliterator.polish_to_russian("Kowalski")
|
|
41
41
|
# => ["Kowalski", "Kowalskii"]
|
|
42
42
|
|
|
43
43
|
russian_to_polish = Surname::Transliterator.russian_to_polish("Иванов")
|
|
44
|
-
# => ["Ivanov"
|
|
44
|
+
# => ["Ivanov"]
|
|
45
45
|
|
|
46
|
-
# General cross-language normalization
|
|
47
|
-
variants = Surname::Transliterator.normalize_surname("
|
|
48
|
-
# => ["
|
|
46
|
+
# General cross-language normalization (includes W/V interchange for genealogical matching)
|
|
47
|
+
variants = Surname::Transliterator.normalize_surname("Wiszniewski", 'polish', 'lithuanian')
|
|
48
|
+
# => ["Wišnievskis", "Wišnievskas", "Wišniewski", "Višnievskis", "Višnievskas", "Višniewski"] (transformed + W/V)
|
|
49
49
|
|
|
50
50
|
# Just transliterate (remove diacritics/Cyrillic)
|
|
51
51
|
clean_polish = Surname::Transliterator.transliterate("Świętochowski", 'polish')
|
|
@@ -55,6 +55,43 @@ clean_russian = Surname::Transliterator.transliterate("Иванов", 'russian')
|
|
|
55
55
|
# => "Ivanov"
|
|
56
56
|
```
|
|
57
57
|
|
|
58
|
+
## Important Notes
|
|
59
|
+
|
|
60
|
+
- **Asymmetric Transformations**: Transformations between languages are not symmetric because of historical genealogical adaptations. For example, Polish -owicz may become Lithuanian -avičius, but reversing it doesn't always restore -owicz exactly. Use `polish_to_lithuanian` and `lithuanian_to_polish` as separate methods with their own mappings.
|
|
61
|
+
|
|
62
|
+
## Supported Languages and Pairs
|
|
63
|
+
|
|
64
|
+
The gem supports transliteration and transformation for the following languages:
|
|
65
|
+
|
|
66
|
+
- **Polish**: Full transliteration (diacritics + digraphs like sz/cz/rz, e.g. cz → č).
|
|
67
|
+
- **Lithuanian**: Full transliteration.
|
|
68
|
+
- **Russian**: Full transliteration.
|
|
69
|
+
- **Czech**: Basic transliteration.
|
|
70
|
+
|
|
71
|
+
### Supported Language Pairs for Transformations
|
|
72
|
+
|
|
73
|
+
| From ↓ / To → | Polish | Lithuanian | Russian |
|
|
74
|
+
|---------------|--------|------------|---------|
|
|
75
|
+
| **Polish** | - | ✅ (polish_to_lithuanian) | ✅ (polish_to_russian) |
|
|
76
|
+
| **Lithuanian**| ✅ (lithuanian_to_polish) | - | - |
|
|
77
|
+
| **Russian** | ✅ (russian_to_polish) | - | - |
|
|
78
|
+
|
|
79
|
+
Note: Transformations are asymmetric (see below). Add more pairs by editing `POLONIZATION_MAPPINGS`.
|
|
80
|
+
|
|
81
|
+
## Transformation Matrix Examples
|
|
82
|
+
|
|
83
|
+
Below is a matrix showing example transformations between languages (not symmetric):
|
|
84
|
+
|
|
85
|
+
| Input surname      | Polish → Lithuanian | Lithuanian → Polish |
|
|
86
|
+
|--------------------|---------------------|---------------------|
|
|
87
|
+
| Antonowicz | Antonavičius, Antonowič | - |
|
|
88
|
+
| Jankauskas | - | Jankauski, Jankauskas |
|
|
89
|
+
| Kowalski | Kovalskis | - |
|
|
90
|
+
| Wiśniewski | Višnievskis, Višnievskas | - |
|
|
91
|
+
| Dombrovskis | - | Dombrowski |
|
|
92
|
+
|
|
93
|
+
This illustrates why separate methods are needed for each direction.
|
|
94
|
+
|
|
58
95
|
## Adding New Languages
|
|
59
96
|
|
|
60
97
|
Edit `DIACRITIC_MAPPINGS` and `POLONIZATION_MAPPINGS` in the code to add support for more languages/pairs.
|
|
@@ -65,6 +102,14 @@ After checking out the repo, run `bin/setup` to install dependencies. You can al
|
|
|
65
102
|
|
|
66
103
|
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
|
|
67
104
|
|
|
105
|
+
## TODO
|
|
106
|
+
|
|
107
|
+
- Add polonization mappings for Czech surnames.
|
|
108
|
+
- Extend support for more language pairs (e.g., Lithuanian ↔ Russian).
|
|
109
|
+
- Improve W/V interchange logic for other languages.
|
|
110
|
+
- Add more genealogical sources for mapping validation.
|
|
111
|
+
- Consider adding fuzzy matching or Soundex for better approximate matches.
|
|
112
|
+
|
|
68
113
|
## Contributing
|
|
69
114
|
|
|
70
|
-
Bug reports and pull requests are welcome on GitHub at https://github.com/
|
|
115
|
+
Bug reports and pull requests are welcome on GitHub at https://github.com/justi-blue/surname-transliterator.
|
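The updated README mentions W/V interchange variants but the diff does not show how they are produced. The sketch below is one way such variants could be generated, assuming that only the leading letter is swapped, which is what the `normalize_surname` example output suggests. `WV_SWAP` and `wv_variants` are hypothetical names used for illustration and are not part of the gem's API.

```ruby
# Hypothetical sketch; WV_SWAP and wv_variants are illustrative names,
# not methods provided by surname-transliterator.
WV_SWAP = { "W" => "V", "V" => "W", "w" => "v", "v" => "w" }.freeze

# Returns the surname followed by its leading-letter W/V counterpart.
# Surnames that do not start with W or V come back unchanged.
def wv_variants(surname)
  swapped = surname.sub(/\A[WVwv]/) { |first| WV_SWAP[first] }
  [surname, swapped].uniq
end

p wv_variants("Wišniewski") # => ["Wišniewski", "Višniewski"]
p wv_variants("Kowalski")   # => ["Kowalski"]
```

Keeping the original spelling first and calling `uniq` means callers always get at least one variant and never see duplicates for names without a leading W or V.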