gptrans 1.0.0

package/.gitattributes ADDED
@@ -0,0 +1,2 @@
1
+ # Auto detect text files and perform LF normalization
2
+ * text=auto
package/README.md ADDED
@@ -0,0 +1,115 @@
1
+ # 🚆 GPTrans
2
+
3
+ The smarter way to translate: AI-powered, cache-optimized, globally ready.
4
+
5
+ It intelligently batches and caches translation requests, ensuring blazing-fast results and reducing API calls.
6
+
7
+ Whether you're building a multilingual website, a mobile app, or a localization tool, GPTrans delivers top-tier performance with minimal setup.
8
+
9
+ ## ✨ Features
10
+
11
+ - **AI-Powered Translations:** Harness advanced models like OpenAI's GPT and Anthropic's Sonnet for high-quality translations
12
+ - **Smart Batching & Debouncing:** Automatically groups translation requests to optimize API usage
13
+ - **Caching with DeepBase:** Quickly retrieves cached translations to boost performance
14
+ - **Parameter Substitution:** Dynamically replace placeholders in your translations
15
+ - **Flexible Configuration:** Customize source and target locales, model keys, and batching settings to fit your needs
16
+
17
+ ## 📦 Installation
18
+
19
+ ```bash
20
+ npm install gptrans
21
+ ```
22
+
23
+ ## 🚀 Quick Start
24
+
25
+ Here's a simple example to get you started:
26
+
27
+ ```javascript
28
+ import GPTrans from 'gptrans';
29
+
30
+ const gptrans = new GPTrans({
31
+ target: 'es-AR',
32
+ from: 'en-US',
33
+ model: 'claude-3-5-sonnet-20241022'
34
+ });
35
+
36
+ // Translate text with parameter substitution
37
+ console.log(gptrans.t('Hello, {name}!', { name: 'John' }));
38
+
39
+ // Set context for gender-aware translations
40
+ console.log(gptrans.setContext('Message is for a woman').t('You are very good'));
41
+
42
+ // Other translation examples
43
+ console.log(gptrans.t('Withdraw'));
44
+ console.log(gptrans.t('Top-up'));
45
+ console.log(gptrans.t('Transfer'));
46
+ console.log(gptrans.t('Deposit'));
47
+ console.log(gptrans.t('Balance'));
48
+ console.log(gptrans.t('Transaction'));
49
+ console.log(gptrans.t('Account'));
50
+ console.log(gptrans.t('Card'));
51
+ ```
52
+
53
+ ## ⚙️ Configuration Options
54
+
55
+ When creating a new instance of GPTrans, you can customize:
56
+
57
+ | Option | Description | Default |
58
+ |--------|-------------|---------|
59
+ | `target` | Target language locale | `'es-AR'` |
60
+ | `from` | Source language locale | `'en-US'` |
61
+ | `model` | Translation model key | `'gpt-4o-mini'` |
62
+ | `batchThreshold` | Maximum number of characters to accumulate before triggering batch processing | `1000` |
63
+ | `debounceTimeout` | Time in milliseconds to wait before processing translations | `500` |
64
+
65
+ ## 🔍 How It Works
66
+
67
+ 1. **First-Time Translation Behavior:** On the first request, GPTrans returns the original text while the translation is processed in the background, so your application stays responsive instead of waiting on API calls.
68
+ 2. **Translation Caching:** Once processed, translations are stored in `db/gptrans_<iso>.json`. Subsequent requests for the same text will be served instantly from the cache.
69
+ 3. **Smart Batch Processing:** Translations are processed in batches, providing better context for more accurate results.
70
+ 4. **Dynamic Model Integration:** Easily plug in multiple AI translation providers with the ModelMix library.
71
+ 5. **Customizable Prompts:** Load and modify translation prompts (see the `prompt/translate.md` file) to fine-tune the translation output.
72
+ 6. **Manual Corrections:** A JSON file stores key-translation pairs, allowing you to override specific translations and make manual corrections when needed. Simply edit the `db/gptrans_<iso>.json` file:
73
+
74
+ ```json
75
+ {
77
+ "balanc_pephba": "Saldo",
78
+ "transa_m1wmv2": "Transacción",
79
+ "accoun_rtvnkg": "Cuenta",
80
+ "card_yis1pt": "Tarjeta",
81
+ "hello_name_1vhllz3": "¡Hola, {name}!",
82
+ ...
83
+ }
84
+ ```
85
+
86
+ ## 🌐 Environment Setup
87
+
88
+ GPTrans uses dotenv for environment configuration. Create a `.env` file in your project root and add your API keys:
89
+
90
+ ```env
91
+ OPENAI_API_KEY=your_openai_api_key
92
+ ANTHROPIC_API_KEY=your_anthropic_api_key
93
+ ```
94
+
95
+ ## 🎉 Why Choose GPTrans?
96
+
97
+ GPTrans stands out by combining advanced AI capabilities with efficient batching and caching. This means:
98
+
99
+ - **Speed:** Reduced API calls and instant retrieval of cached translations
100
+ - **Quality:** Leverage state-of-the-art models for precise and context-aware translations
101
+ - **Flexibility:** Tailor the tool to your specific localization needs with minimal effort
102
+ - **Low Maintenance:** Translations are cached on disk after the first request, and the JSON cache can be corrected by hand when needed
103
+
104
+ If you're looking to streamline your translation workflow and bring your applications to a global audience effortlessly, GPTrans is the perfect choice!
105
+
106
+ ## Contributing
107
+
108
+ Contributions are welcome! Please open an issue or submit a pull request on GitHub to contribute improvements or fixes.
109
+
110
+ ## License
111
+
112
+ GPTrans is released under the MIT License.
113
+
114
+ Happy translating! 🌍✨
115
+
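The batching behavior described under "How It Works" can be sketched roughly as follows. This is a minimal illustration, not the package's code: `PendingBatch`, `add`, `flush`, and `onFlush` are hypothetical names, and the sketch counts each key only once, which is a simplification. It shows the two triggers the README mentions: a character threshold that flushes immediately and a debounce timer that flushes after a quiet period.

```javascript
// Hypothetical sketch: class and method names are illustrative, not the package's API.
class PendingBatch {
  constructor({ batchThreshold = 1000, debounceTimeout = 500, onFlush }) {
    this.batchThreshold = batchThreshold;   // characters that force an immediate flush
    this.debounceTimeout = debounceTimeout; // quiet period in ms before a timed flush
    this.onFlush = onFlush;                 // receives an array of [key, text] pairs
    this.pending = new Map();
    this.charCount = 0;
    this.timer = null;
  }

  add(key, text) {
    if (!this.pending.has(key)) {
      this.pending.set(key, text);
      this.charCount += text.length;
    }
    // Every new request restarts the debounce window.
    clearTimeout(this.timer);
    if (this.charCount >= this.batchThreshold) {
      this.flush();
    } else {
      this.timer = setTimeout(() => this.flush(), this.debounceTimeout);
    }
  }

  flush() {
    clearTimeout(this.timer);
    this.timer = null;
    if (this.pending.size === 0) return;
    const batch = Array.from(this.pending.entries());
    // Reset state before handing the batch off, so new requests start a fresh batch.
    this.pending.clear();
    this.charCount = 0;
    this.onFlush(batch);
  }
}

const flushed = [];
const queue = new PendingBatch({ batchThreshold: 10, onFlush: (batch) => flushed.push(batch) });
queue.add('card', 'Card');      // 4 characters pending: waits for the timer
queue.add('balanc', 'Balance'); // 11 characters pending: flushes immediately
```

Grouping several short strings into one request is also what gives the model shared context, as the README notes.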
@@ -0,0 +1,13 @@
1
+ {
2
+ "eres_muy_bueno_26czme": "Sos muy bueno",
3
+ "eres_muy_bueno_k3ml5b": "Sos muy buena",
4
+ "hello_name_1987p1n": "¡Hola, {name}!",
5
+ "topup_uzdh5y": "Recargar",
6
+ "transf_176pc1a": "Transferir",
7
+ "deposi_wg2ec5": "Depositar",
8
+ "balanc_1rv8if7": "Saldo",
9
+ "transa_1wtqm5d": "Transacción",
10
+ "accoun_x1y0v8": "Cuenta",
11
+ "card_yis1ox": "Tarjeta",
12
+ "tienes_fuego_1i2o3ok": "¿Tenés fuego?"
13
+ }
@@ -0,0 +1,3 @@
1
+ {
2
+ "tenes_fuego_1fs98im": "¿Tienes fuego?"
3
+ }
@@ -0,0 +1,10 @@
1
+ {
2
+ "hello_name_1987p1n": "Hello, {name}!",
3
+ "topup_uzdh5y": "Top-up",
4
+ "transf_176pc1a": "Transfer",
5
+ "deposi_wg2ec5": "Deposit",
6
+ "balanc_1rv8if7": "Balance",
7
+ "transa_1wtqm5d": "Transaction",
8
+ "accoun_x1y0v8": "Account",
9
+ "card_yis1ox": "Card"
10
+ }
@@ -0,0 +1,3 @@
1
+ {
2
+ "tenes_fuego_1fs98im": "¿Tenés fuego?"
3
+ }
@@ -0,0 +1,5 @@
1
+ {
2
+ "eres_muy_bueno_26czme": "Eres muy bueno",
3
+ "eres_muy_bueno_k3ml5b": "Eres muy bueno",
4
+ "tienes_fuego_1i2o3ok": "Tienes fuego?"
5
+ }
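The cache files above map generated keys to translated templates. As a rough sketch of how such a file could be consumed, the snippet below looks up a key, falls back to the source text on a miss, and fills `{placeholders}`. The `cache` object and the `render` helper are hypothetical stand-ins for illustration (as is the key `withdr_example`); the package itself reads these values through DeepBase.

```javascript
// Hypothetical in-memory stand-in for a target-locale cache file.
const cache = {
  hello_name_1987p1n: '¡Hola, {name}!',
  card_yis1ox: 'Tarjeta',
};

// Look up a key, fall back to the source text, then fill {placeholders}.
function render(key, sourceText, params = {}) {
  const template = cache[key] ?? sourceText;
  return Object.entries(params).reduce(
    // split/join replaces every occurrence and treats the value literally
    (result, [name, value]) => result.split(`{${name}}`).join(String(value)),
    template
  );
}

const greeting = render('hello_name_1987p1n', 'Hello, {name}!', { name: 'Anya' });
const uncached = render('withdr_example', 'Withdraw'); // made-up key with no cache entry
```

Because placeholders survive translation unchanged, a manual correction only needs to keep the `{name}` token intact.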
package/demo/case_1.js ADDED
@@ -0,0 +1,36 @@
1
+ import GPTrans from '../index.js';
2
+
3
+ const gptrans = new GPTrans({
4
+ model: 'claude-3-5-sonnet-20241022',
5
+ });
6
+
7
+ console.log(gptrans.t('Hello, {name}!', { name: 'Anya' }));
8
+
9
+ console.log(gptrans.t('Top-up'));
10
+ console.log(gptrans.t('Transfer'));
11
+ console.log(gptrans.t('Deposit'));
12
+ console.log(gptrans.t('Balance'));
13
+ console.log(gptrans.t('Transaction'));
14
+ console.log(gptrans.t('Account'));
15
+ console.log(gptrans.t('Card'));
16
+
17
+ // Case 2: Translate from Spain Spanish (es-ES) to Argentine Spanish (es-AR)
18
+ const es2ar = new GPTrans({
19
+ from: 'es-ES',
20
+ target: 'es-AR',
21
+ model: 'claude-3-5-sonnet-20241022',
22
+ });
23
+
24
+ console.log(es2ar.t('Eres muy bueno'));
25
+ console.log(es2ar.setContext('El mensaje es para una mujer').t('Eres muy bueno'));
26
+ console.log(es2ar.setContext().t('Tienes fuego?'));
27
+
28
+ // Case 3: Translate from Argentine Spanish (es-AR) to Spain Spanish (es-ES)
29
+ const ar2es = new GPTrans({
30
+ from: 'es-AR',
31
+ target: 'es-ES',
32
+ model: 'claude-3-5-sonnet-20241022',
33
+ });
34
+
35
+ console.log(ar2es.t('¿Tenés fuego?'));
36
+
package/index.js ADDED
@@ -0,0 +1,152 @@
1
+ import DeepBase from 'deepbase';
2
+ import stringHash from 'string-hash';
3
+ import { ModelMix, MixOpenAI, MixAnthropic } from 'modelmix';
4
+ import dotenv from 'dotenv';
5
+
6
+ import { isoAssoc } from './isoAssoc.js';
7
+ dotenv.config();
8
+
9
+ class Gptrans {
10
+ static #mmixInstance = null;
11
+
12
+ static get mmix() {
13
+ if (!this.#mmixInstance) {
14
+ const mmix = new ModelMix();
15
+
16
+ mmix.attach(new MixOpenAI());
17
+ mmix.attach(new MixAnthropic());
18
+
19
+ this.#mmixInstance = mmix;
20
+ }
21
+ return this.#mmixInstance;
22
+ }
23
+
24
+ constructor({ from = 'en-US', target = 'es-AR', model = 'gpt-4o-mini', batchThreshold = 1000, debounceTimeout = 500, promptFile = './prompt/translate.md', context = '' } = {}) {
25
+ this.target = target;
26
+ this.from = from;
27
+ this.dbTarget = new DeepBase({ name: 'gptrans_' + this.target });
28
+ this.dbFrom = new DeepBase({ name: 'gptrans_from_' + this.from });
29
+ this.batchThreshold = batchThreshold; // Pending character count that triggers a batch
30
+ this.debounceTimeout = debounceTimeout;
31
+ this.pendingTranslations = new Map(); // [key, text]
32
+ this.pendingCharCount = 0; // Characters currently waiting to be translated
33
+ this.debounceTimer = null;
34
+ this.modelKey = model;
35
+ this.promptFile = promptFile;
36
+ this.context = context;
37
+ this.modelConfig = {
38
+ config: {
39
+ max_history: 1,
40
+ debug: false,
41
+ bottleneck: {
42
+ maxConcurrent: 5,
43
+ }
44
+ },
45
+ options: { max_tokens: batchThreshold }
46
+ };
47
+ }
48
+
49
+ setContext(context = '') {
50
+ if (this.context !== context && this.pendingTranslations.size > 0) {
51
+ clearTimeout(this.debounceTimer);
52
+ this._processBatch();
53
+ }
54
+ this.context = context;
55
+ return this;
56
+ }
57
+
58
+ t(text, params = {}) {
59
+ const key = this._textToKey(text);
60
+ const translation = this.get(key, text) || text;
61
+
62
+ return Object.entries(params).reduce(
63
+ (result, [name, value]) => result.replaceAll(`{${name}}`, () => String(value)),
64
+ translation
65
+ );
66
+ }
67
+
68
+ get(key, text) {
69
+ const translation = this.dbTarget.get(key);
70
+ if (!translation) {
71
+ this.pendingTranslations.set(key, text);
72
+ this.pendingCharCount += text.length; // Update character count
73
+
74
+ if (!this.dbFrom.get(key)) {
75
+ this.dbFrom.set(key, text);
76
+ }
77
+
78
+ // Clear existing timer
79
+ if (this.debounceTimer) {
80
+ clearTimeout(this.debounceTimer);
81
+ }
82
+
83
+ // Set new timer
84
+ this.debounceTimer = setTimeout(() => {
85
+ if (this.pendingTranslations.size > 0) {
86
+ this._processBatch();
87
+ }
88
+ }, this.debounceTimeout);
89
+
90
+ // Process if we hit the character count threshold
91
+ if (this.pendingCharCount >= this.batchThreshold) {
92
+ clearTimeout(this.debounceTimer);
93
+ this._processBatch();
94
+ }
95
+ }
96
+ return translation;
97
+ }
98
+
99
+ async _processBatch() {
100
+ const batch = Array.from(this.pendingTranslations.entries());
101
+
102
+ // Clear pending translations and character count before awaiting translation
103
+ this.pendingTranslations.clear();
104
+ this.modelConfig.options.max_tokens = this.pendingCharCount + 1000;
105
+ this.pendingCharCount = 0;
106
+
107
+ const textsToTranslate = batch.map(([_, text]) => text).join('\n---\n');
108
+ const translations = await this._translate(textsToTranslate);
109
+
110
+ const translatedTexts = translations.split('\n---\n');
111
+
112
+ batch.forEach(([key], index) => {
113
+ if (translatedTexts[index] !== undefined) this.dbTarget.set(key, translatedTexts[index].trim()); // skip entries the model did not return
114
+ });
115
+ }
116
+
117
+ async _translate(text) {
118
+ const model = Gptrans.mmix.create(this.modelKey, this.modelConfig);
119
+
120
+ model.setSystem("You are an expert translator specialized in literary translation between FROM_LANG and TARGET_DENONYM TARGET_LANG.");
121
+
122
+ model.addTextFromFile(this.promptFile);
123
+
124
+ model.replace({ INPUT: text, CONTEXT: this.context });
125
+ model.replace(isoAssoc(this.target, 'TARGET_'));
126
+ model.replace(isoAssoc(this.from, 'FROM_'));
127
+
128
+ const response = await model.message();
129
+
130
+ const codeBlockRegex = /```(?:\w*\n)?([\s\S]*?)```/;
131
+ const match = response.match(codeBlockRegex);
132
+ const translatedText = match ? match[1].trim() : response;
133
+
134
+ return translatedText;
135
+ }
136
+
137
+ _textToKey(text, tokens = 5, maxlen = 6) {
138
+ const words = text
139
+ .toLowerCase()
140
+ .replace(/[áàâäéèêëíìîïóòôöúùûüñ]/g, c => c.normalize('NFD')[0]) // map each accented letter to its base letter
141
+ .replace(/[^a-z0-9\s]+/g, "")
142
+ .split(" ")
143
+ .slice(0, tokens);
144
+
145
+ let key = words.map((x) => x.slice(0, maxlen)).join("_");
146
+ key += key ? '_' : '';
147
+ key += stringHash(text + this.context).toString(36);
148
+ return key;
149
+ }
150
+ }
151
+
152
+ export default Gptrans;
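The key scheme in `_textToKey` (a readable prefix built from the first words, plus a hash of the text and context) can be illustrated with a standalone sketch. Two parts are assumptions made for this example: `hash32` is a stand-in djb2-style hash, so its output may not match what the `string-hash` dependency produces, and accents are stripped with Unicode NFD normalization rather than the character class used in the source.

```javascript
// Stand-in 32-bit hash (djb2-style, XOR variant). An assumption for this sketch:
// the package calls the string-hash dependency, whose output may differ.
function hash32(str) {
  let h = 5381;
  for (let i = str.length - 1; i >= 0; i--) {
    h = (h * 33) ^ str.charCodeAt(i);
  }
  return h >>> 0; // force an unsigned 32-bit result
}

function textToKey(text, context = '', tokens = 5, maxlen = 6) {
  const words = text
    .toLowerCase()
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '') // drop combining accents: "tenés" -> "tenes"
    .replace(/[^a-z0-9\s]+/g, '')    // drop punctuation and other symbols
    .split(' ')
    .slice(0, tokens);

  let key = words.map((w) => w.slice(0, maxlen)).join('_');
  key += key ? '_' : '';
  // Hashing text + context gives the same text a separate entry per context.
  return key + hash32(text + context).toString(36);
}

const plain = textToKey('Eres muy bueno');
const forWoman = textToKey('Eres muy bueno', 'El mensaje es para una mujer');
```

The readable prefix keeps the JSON cache easy to scan by hand, while the hash suffix separates entries that share a prefix, which is why the sample cache holds two `eres_muy_bueno_…` keys with different translations.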
package/isoAssoc.js ADDED
@@ -0,0 +1,178 @@
1
+ const countryName = {
2
+ 'ar': 'Argentina',
3
+ 'us': 'United States',
4
+ 'es': 'Spain',
5
+ 'pt': 'Portugal',
6
+ 'br': 'Brazil',
7
+ 'gb': 'United Kingdom',
8
+ 'au': 'Australia',
9
+ 'ca': 'Canada',
10
+ 'cn': 'China',
11
+ 'tw': 'Taiwan',
12
+ 'hk': 'Hong Kong',
13
+ 'sg': 'Singapore',
14
+ 'mx': 'Mexico',
15
+ 'in': 'India',
16
+ 'sa': 'Saudi Arabia',
17
+ 'bd': 'Bangladesh',
18
+ 'ru': 'Russia',
19
+ 'jp': 'Japan',
20
+ 'fr': 'France',
21
+ 'de': 'Germany',
22
+ 'at': 'Austria',
23
+ 'ch': 'Switzerland',
24
+ 'kr': 'South Korea',
25
+ 'it': 'Italy',
26
+ 'tr': 'Turkey',
27
+ 'vn': 'Vietnam',
28
+ 'pl': 'Poland',
29
+ 'nl': 'Netherlands',
30
+ 'be': 'Belgium',
31
+ 'id': 'Indonesia',
32
+ 'th': 'Thailand',
33
+ 'ph': 'Philippines',
34
+ 'ir': 'Iran',
35
+ 'ua': 'Ukraine',
36
+ 'il': 'Israel',
37
+ 'se': 'Sweden',
38
+ 'no': 'Norway',
39
+ 'fi': 'Finland',
40
+ 'cz': 'Czech Republic',
41
+ 'hu': 'Hungary',
42
+ 'ro': 'Romania',
43
+ 'bg': 'Bulgaria',
44
+ 'co': 'Colombia',
45
+ 'cl': 'Chile',
46
+ 'pe': 'Peru',
47
+ 've': 'Venezuela',
48
+ 'ec': 'Ecuador',
49
+ 'uy': 'Uruguay',
50
+ 'py': 'Paraguay',
51
+ 'bo': 'Bolivia',
52
+ 'cr': 'Costa Rica',
53
+ 'nz': 'New Zealand',
54
+ 'gr': 'Greece',
55
+ 'dk': 'Denmark'
56
+ };
57
+
58
+ const countryDenonym = {
59
+ 'ar': 'Argentinian',
60
+ 'es': 'Spanish',
61
+ 'pt': 'Portuguese',
62
+ 'br': 'Brazilian',
63
+ 'us': 'American',
64
+ 'gb': 'British',
65
+ 'au': 'Australian',
66
+ 'ca': 'Canadian',
67
+ 'cn': 'Chinese',
68
+ 'tw': 'Taiwanese',
69
+ 'hk': 'Hong Kongese',
70
+ 'sg': 'Singaporean',
71
+ 'mx': 'Mexican',
72
+ 'in': 'Indian',
73
+ 'sa': 'Saudi Arabian',
74
+ 'bd': 'Bangladeshi',
75
+ 'ru': 'Russian',
76
+ 'jp': 'Japanese',
77
+ 'fr': 'French',
78
+ 'de': 'German',
79
+ 'at': 'Austrian',
80
+ 'ch': 'Swiss',
81
+ 'kr': 'Korean',
82
+ 'it': 'Italian',
83
+ 'tr': 'Turkish',
84
+ 'vn': 'Vietnamese',
85
+ 'pl': 'Polish',
86
+ 'nl': 'Dutch',
87
+ 'be': 'Belgian',
88
+ 'id': 'Indonesian',
89
+ 'th': 'Thai',
90
+ 'ph': 'Filipino',
91
+ 'ir': 'Iranian',
92
+ 'ua': 'Ukrainian',
93
+ 'il': 'Israeli',
94
+ 'se': 'Swedish',
95
+ 'no': 'Norwegian',
96
+ 'fi': 'Finnish',
97
+ 'cz': 'Czech',
98
+ 'hu': 'Hungarian',
99
+ 'ro': 'Romanian',
100
+ 'bg': 'Bulgarian',
101
+ 'co': 'Colombian',
102
+ 'cl': 'Chilean',
103
+ 'pe': 'Peruvian',
104
+ 've': 'Venezuelan',
105
+ 'ec': 'Ecuadorian',
106
+ 'uy': 'Uruguayan',
107
+ 'py': 'Paraguayan',
108
+ 'bo': 'Bolivian',
109
+ 'cr': 'Costa Rican',
110
+ 'nz': 'New Zealander',
111
+ 'gr': 'Greek',
112
+ 'dk': 'Danish'
113
+ };
114
+
115
+ const langName = {
116
+ 'es': 'Spanish',
117
+ 'pt': 'Portuguese',
118
+ 'en': 'English',
119
+ 'zh': 'Chinese',
120
+ 'hi': 'Hindi',
121
+ 'ar': 'Arabic',
122
+ 'bn': 'Bengali',
123
+ 'ru': 'Russian',
124
+ 'ja': 'Japanese',
125
+ 'fr': 'French',
126
+ 'de': 'German',
127
+ 'ko': 'Korean',
128
+ 'it': 'Italian',
129
+ 'tr': 'Turkish',
130
+ 'vi': 'Vietnamese',
131
+ 'pl': 'Polish',
132
+ 'nl': 'Dutch',
133
+ 'id': 'Indonesian',
134
+ 'th': 'Thai',
135
+ 'tl': 'Tagalog',
136
+ 'fa': 'Persian',
137
+ 'uk': 'Ukrainian',
138
+ 'he': 'Hebrew',
139
+ 'sv': 'Swedish',
140
+ 'no': 'Norwegian',
141
+ 'fi': 'Finnish',
142
+ 'cs': 'Czech',
143
+ 'hu': 'Hungarian',
144
+ 'ro': 'Romanian',
145
+ 'bg': 'Bulgarian',
146
+ 'ca': 'Catalan',
147
+ 'gl': 'Galician',
148
+ 'eu': 'Basque',
149
+ 'el': 'Greek',
150
+ 'da': 'Danish',
151
+ 'ur': 'Urdu',
152
+ 'ms': 'Malay'
153
+ };
154
+
155
+ export function isoAssoc(iso, prefix = '') {
156
+ if (!iso) {
157
+ throw new Error('ISO code is required');
158
+ }
159
+
160
+ const parts = iso.toLowerCase().split('-');
161
+ const lang = parts[0];
162
+ const country = parts.length > 1 ? parts[1] : null;
163
+
164
+ if (!langName[lang]) {
165
+ throw new Error(`Invalid language code: ${lang}`);
166
+ }
167
+
168
+ if (country && !countryName[country]) {
169
+ throw new Error(`Invalid country code: ${country}`);
170
+ }
171
+
172
+ return {
173
+ [prefix + 'ISO']: iso,
174
+ [prefix + 'LANG']: langName[lang],
175
+ [prefix + 'COUNTRY']: country ? countryName[country] : langName[lang],
176
+ [prefix + 'DENONYM']: country ? countryDenonym[country] : 'Universal',
177
+ };
178
+ }
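To make the shape of the `isoAssoc` result concrete, here is a trimmed-down sketch with only a few table entries, plus a hypothetical `fillTemplate` helper that substitutes the returned tokens into a prompt line. The helper is an assumption standing in for whatever the prompt library does when `model.replace(...)` is called; only the returned object layout is taken from the source.

```javascript
// Trimmed-down tables: only the entries this example needs.
const langName = { es: 'Spanish', en: 'English' };
const countryName = { ar: 'Argentina', us: 'United States' };
const countryDenonym = { ar: 'Argentinian', us: 'American' };

function isoAssocSketch(iso, prefix = '') {
  const [lang, country = null] = iso.toLowerCase().split('-');
  if (!langName[lang]) throw new Error(`Invalid language code: ${lang}`);
  if (country && !countryName[country]) throw new Error(`Invalid country code: ${country}`);
  return {
    [prefix + 'ISO']: iso,
    [prefix + 'LANG']: langName[lang],
    [prefix + 'COUNTRY']: country ? countryName[country] : langName[lang],
    [prefix + 'DENONYM']: country ? countryDenonym[country] : 'Universal',
  };
}

// Hypothetical helper: plain token replacement, an assumed stand-in for the
// prompt library's own substitution.
function fillTemplate(template, values) {
  return Object.entries(values).reduce(
    (text, [token, value]) => text.split(token).join(value),
    template
  );
}

const target = isoAssocSketch('es-AR', 'TARGET_');
const line = fillTemplate('Translate to TARGET_ISO (TARGET_DENONYM TARGET_LANG).', target);
```

A locale without a country part, such as `es`, falls back to the language name for the country field and to `Universal` for the denonym field.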
package/package.json ADDED
@@ -0,0 +1,26 @@
1
+ {
2
+ "name": "gptrans",
3
+ "type": "module",
4
+ "version": "1.0.0",
5
+ "description": "🚆 GPTrans - The smarter AI-powered way to translate.",
6
+ "repository": {
7
+ "type": "git",
8
+ "url": "git+https://github.com/clasen/GPTrans.git"
9
+ },
10
+ "main": "index.js",
11
+ "scripts": {
12
+ "test": "echo \"Error: no test specified\" && exit 1"
13
+ },
14
+ "author": "Martin Clasen",
15
+ "license": "MIT",
16
+ "bugs": {
17
+ "url": "https://github.com/clasen/GPTrans/issues"
18
+ },
19
+ "homepage": "https://github.com/clasen/GPTrans#readme",
20
+ "dependencies": {
21
+ "deepbase": "^1.3.2",
22
+ "dotenv": "^16.4.7",
23
+ "modelmix": "^2.8.0",
24
+ "string-hash": "^1.1.3"
25
+ }
26
+ }
@@ -0,0 +1,19 @@
1
+ # Goal
2
+ Translation from FROM_ISO to TARGET_ISO (TARGET_DENONYM TARGET_LANG) with cultural adaptations.
3
+
4
+ ## Text to translate
5
+ INPUT
6
+
7
+ # Return Format
8
+ - Provide the final translation within a code block using ```.
9
+ - Do not include alternative translations, only provide the best translation.
10
+
11
+ # Warnings
12
+ - **Context:** I will provide you with a text in FROM_DENONYM FROM_LANG. The goal is to translate it to TARGET_ISO (TARGET_DENONYM TARGET_LANG) while maintaining the essence, style, intention, and tone of the original.
13
+ - **Cultural references:** Adapt or explain references that are not familiar in TARGET_DENONYM culture, whenever necessary.
14
+ - **Wordplay and humor:** When it's impossible to directly translate wordplay, find a resource that recreates the playful effect.
15
+ - **Idioms:** Do not introduce new idioms or expressions that are not present in the original text.
16
+ - **Variables:** Do not translate content between curly braces {variable}. These are system variables and must remain exactly the same.
17
+
18
+ # Context
19
+ CONTEXT
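Since the prompt asks the model to answer inside a code block, and the batch joins texts with a `---` line, the reply has to be unwrapped and split again. The sketch below follows the same regular expression as `_translate` and the same separator as `_processBatch`; the function name `parseBatchResponse` and the count check are additions of this example, not part of the package.

```javascript
const FENCE = '`'.repeat(3); // built at runtime so this listing contains no literal fence
const codeBlock = new RegExp(FENCE + '(?:\\w*\\n)?([\\s\\S]*?)' + FENCE);

// Hypothetical helper: unwrap the fenced reply and split it back into entries.
function parseBatchResponse(response, expectedCount) {
  const match = response.match(codeBlock);
  // Fall back to the raw reply when the model did not use a code block.
  const body = match ? match[1].trim() : response.trim();
  const parts = body.split('\n---\n').map((part) => part.trim());
  if (parts.length !== expectedCount) {
    throw new Error(`Expected ${expectedCount} translations, got ${parts.length}`);
  }
  return parts;
}

const reply = `Here you go:\n${FENCE}\nSaldo\n---\nCuenta\n${FENCE}`;
const parsed = parseBatchResponse(reply, 2);
```

Checking the count before writing to the cache is one way to avoid storing translations under the wrong keys when the model merges or drops an entry.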