docs-i18n 0.8.1 → 0.8.3

Files changed (88)
  1. package/admin/dist/server/server.js +32 -32
  2. package/package.json +1 -1
  3. package/template/app/routes/$lang.$project.$version.docs.$.tsx +2 -1
  4. package/template/app/routes/$lang.$project.$version.docs.framework.$framework.$.tsx +2 -0
  5. package/template/app/routes/$lang.$project.$version.docs.tsx +2 -1
  6. package/template/app/routes/$lang.$project.docs.$.tsx +2 -1
  7. package/template/app/routes/$lang.$project.docs.tsx +2 -1
  8. package/template/app/routes/$lang.docs.$.tsx +2 -1
  9. package/template/app/routes/$lang.docs.framework.$framework.$.tsx +2 -0
  10. package/template/app/routes/$lang.docs.tsx +2 -1
  11. package/template/app/utils/content-loader.ts +13 -2
  12. package/template/app/utils/docs.server.ts +17 -15
  13. package/template/content/blog/en/announcing-query-v5.md +110 -0
  14. package/template/content/blog/en/hello-world.md +26 -0
  15. package/template/content/blog/en/i18n-best-practices.md +57 -0
  16. package/template/content/blog/en/react-query-vs-swr.md +100 -0
  17. package/template/content/blog/en/state-management-2024.md +143 -0
  18. package/template/content/blog/en/tanstack-router-1.0.md +121 -0
  19. package/template/content/blog/ja/announcing-query-v5.md +110 -0
  20. package/template/content/blog/ja/hello-world.md +26 -0
  21. package/template/content/blog/zh-hans/announcing-query-v5.md +93 -0
  22. package/template/content/blog/zh-hans/hello-world.md +26 -0
  23. package/template/content/docs-i18n/docs.config.json +25 -0
  24. package/template/content/docs-i18n/en/architecture.md +335 -0
  25. package/template/content/docs-i18n/en/cli.md +13 -1
  26. package/template/content/docs-i18n/en/configuration.md +350 -0
  27. package/template/content/docs-i18n/en/deployment.md +222 -0
  28. package/template/content/docs-i18n/en/getting-started.md +189 -0
  29. package/template/content/docs.config.json +25 -0
  30. package/template/content/en/admin.md +151 -0
  31. package/template/content/en/architecture.md +222 -0
  32. package/template/content/en/cli.md +269 -0
  33. package/template/content/en/configuration.md +331 -0
  34. package/template/content/en/deployment.md +209 -0
  35. package/template/content/en/getting-started.md +168 -0
  36. package/template/content/form/docs.config.json +18 -0
  37. package/template/content/form/en/guides/validation.md +175 -0
  38. package/template/content/form/en/installation.md +63 -0
  39. package/template/content/form/en/overview.md +71 -0
  40. package/template/content/form/en/quick-start.md +121 -0
  41. package/template/content/form/ja/installation.md +63 -0
  42. package/template/content/form/ja/overview.md +71 -0
  43. package/template/content/form/zh-hans/installation.md +63 -0
  44. package/template/content/form/zh-hans/overview.md +71 -0
  45. package/template/content/query/docs.config.json +32 -0
  46. package/template/content/query/en/guides/mutations.md +126 -0
  47. package/template/content/query/en/guides/pagination.md +98 -0
  48. package/template/content/query/en/guides/queries.md +120 -0
  49. package/template/content/query/en/installation.md +78 -0
  50. package/template/content/query/en/overview.md +72 -0
  51. package/template/content/query/en/quick-start.md +108 -0
  52. package/template/content/query/ja/installation.md +78 -0
  53. package/template/content/query/ja/overview.md +72 -0
  54. package/template/content/query/zh-hans/guides/mutations.md +126 -0
  55. package/template/content/query/zh-hans/guides/pagination.md +98 -0
  56. package/template/content/query/zh-hans/guides/queries.md +120 -0
  57. package/template/content/query/zh-hans/installation.md +95 -0
  58. package/template/content/query/zh-hans/overview.md +72 -0
  59. package/template/content/query/zh-hans/quick-start.md +108 -0
  60. package/template/content/router/docs.config.json +18 -0
  61. package/template/content/router/en/guides/routing-concepts.md +131 -0
  62. package/template/content/router/en/installation.md +57 -0
  63. package/template/content/router/en/overview.md +74 -0
  64. package/template/content/router/en/quick-start.md +88 -0
  65. package/template/content/router/ja/installation.md +57 -0
  66. package/template/content/router/ja/overview.md +78 -0
  67. package/template/content/router/zh-hans/guides/routing-concepts.md +131 -0
  68. package/template/content/router/zh-hans/installation.md +57 -0
  69. package/template/content/router/zh-hans/overview.md +81 -0
  70. package/template/content/router/zh-hans/quick-start.md +88 -0
  71. package/template/content/table/docs.config.json +18 -0
  72. package/template/content/table/en/guides/column-definitions.md +135 -0
  73. package/template/content/table/en/installation.md +56 -0
  74. package/template/content/table/en/overview.md +79 -0
  75. package/template/content/table/en/quick-start.md +112 -0
  76. package/template/content/table/ja/installation.md +56 -0
  77. package/template/content/table/ja/overview.md +79 -0
  78. package/template/content/table/zh-hans/installation.md +56 -0
  79. package/template/content/table/zh-hans/overview.md +79 -0
  80. package/template/content/virtual/docs.config.json +18 -0
  81. package/template/content/virtual/en/guides/dynamic-sizing.md +129 -0
  82. package/template/content/virtual/en/installation.md +57 -0
  83. package/template/content/virtual/en/overview.md +74 -0
  84. package/template/content/virtual/en/quick-start.md +114 -0
  85. package/template/content/virtual/ja/installation.md +57 -0
  86. package/template/content/virtual/ja/overview.md +74 -0
  87. package/template/content/virtual/zh-hans/installation.md +57 -0
  88. package/template/content/virtual/zh-hans/overview.md +74 -0
package/template/content/docs-i18n/en/architecture.md
@@ -0,0 +1,335 @@
+ ---
+ title: Architecture
+ description: How docs-i18n works internally -- the translation pipeline, AST parsing, caching, and chunking strategies.
+ ---
+
+ # Architecture
+
+ This document explains how docs-i18n works internally. Understanding the pipeline helps you tune configuration, debug issues, and contribute to the project.
+
+ ## System Overview
+
+ ```
+ ┌─────────────────────────────────────────────────────────────────┐
+ │                       docs-i18n ecosystem                       │
+ │                                                                 │
+ │  ┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐  │
+ │  │    CLI    │   │   Admin   │   │ Template  │   │  Runtime  │  │
+ │  │           │   │ Dashboard │   │  (Site)   │   │   (D1)    │  │
+ │  │ translate │   │ (pre-     │   │ TanStack  │   │    CF     │  │
+ │  │ rescan    │   │ built)    │   │ Start     │   │ Workers   │  │
+ │  │ assemble  │   │           │   │           │   │           │  │
+ │  │ status    │   │ Web UI    │   │ SSR docs  │   │ Edge      │  │
+ │  └─────┬─────┘   └─────┬─────┘   └─────┬─────┘   └─────┬─────┘  │
+ │        │               │               │               │        │
+ │        └───────┬───────┴───────┬───────┘               │        │
+ │                │               │                       │        │
+ │         ┌──────▼──────┐ ┌──────▼──────┐         ┌──────▼──────┐ │
+ │         │   SQLite    │ │   Content   │         │     D1      │ │
+ │         │ .cache/     │ │ content/    │         │ (cloud)     │ │
+ │         │ translations│ │ {project}/  │         │             │ │
+ │         │ .db         │ │ {version}/  │         │ translations│ │
+ │         └─────────────┘ └─────────────┘         └─────────────┘ │
+ │                                                                 │
+ └─────────────────────────────────────────────────────────────────┘
+ ```
+
+ ## Monorepo Structure
+
+ ```
+ docs-i18n/
+   packages/
+     core/                 ← Published as "docs-i18n" on npm
+     │ src/
+     │   cli.ts            CLI entry point
+     │   core/             Parser, cache, assembler, translator
+     │   commands/         translate, rescan, assemble, status, upload
+     │ dist/               Built output
+
+     admin/                ← Pre-built admin dashboard
+     │ app/                TanStack Start React app
+     │ server/             Server functions (status, jobs, models)
+     │ dist/               Pre-built (zero-install runtime)
+     │ serve.mjs           Node.js HTTP adapter
+
+     template/             ← Docs site template (built per-project)
+       app/                TanStack Start React app
+       content/            Demo content + docs-i18n's own docs
+ ```
+
+ ## Translation Pipeline
+
+ ```
+ Source .md/.mdx files (English)
+                    │
+                    ▼
+ ┌──────────────────────────────────────────────────┐
+ │ 1. Normalize                                     │
+ │   Ensure JSX tags (<AppOnly> etc.) are separated │
+ │   by blank lines for correct AST parsing         │
+ └──────────────────┬───────────────────────────────┘
+                    │
+                    ▼
+ ┌──────────────────────────────────────────────────┐
+ │ 2. Parse (remark AST)                            │
+ │   .md/.mdx → flat list of typed nodes            │
+ │   heading | paragraph | list | code | html | ... │
+ │   Each node: { type, rawText, needsTranslation } │
+ └──────────────────┬───────────────────────────────┘
+                    │
+                    ▼
+ ┌──────────────────────────────────────────────────┐
+ │ 3. Hash (MD5)                                    │
+ │   "## Installation" → "a1b2c3d4..."              │
+ │   Same content = same key = deduplicated         │
+ └──────────────────┬───────────────────────────────┘
+                    │
+                    ▼
+ ┌──────────────────────────────────────────────────┐
+ │ 4. Smart Chunking                                │
+ │   Group nodes into chunks fitting LLM context    │
+ │   Input budget + output budget + language mult.  │
+ │   CJK: 2.5x │ Cyrillic: 2.0x │ Chinese: 1.5x     │
+ └──────────────────┬───────────────────────────────┘
+                    │
+                    ▼
+ ┌──────────────────────────────────────────────────┐
+ │ 5. LLM Translation                               │
+ │   Send: { nodes: [{ key, type, text }] }         │
+ │   Recv: { "a1b2c3": "翻译结果" }                 │
+ │   JSON repair + key recovery + retry + rotation  │
+ └──────────────────┬───────────────────────────────┘
+                    │
+                    ▼
+ ┌──────────────────────────────────────────────────┐
+ │ 6. Cache (SQLite)                                │
+ │   INSERT INTO translations (lang, key, value)    │
+ │   WAL mode │ concurrent-safe │ instant writes    │
+ └──────────────────┬───────────────────────────────┘
+                    │
+                    ▼
+ ┌──────────────────────────────────────────────────┐
+ │ 7. Assemble                                      │
+ │   English source + cached translations           │
+ │   → translated .md/.mdx files                    │
+ │   Missing translations → fallback to English     │
+ └──────────────────────────────────────────────────┘
+ ```
+
+ ## Content Structure Convention
+
+ ```
+ your-project/
+   content/                      ← English source (no /en/ subdir)
+     {project}/{version}/        ← e.g. query/v5/ or nextjs/latest/
+       overview.md
+       guides/routing.md
+       ...
+
+   .cache/
+     translations.db             ← SQLite cache (all languages)
+     content/{version}/{lang}/   ← assembled translations
+       overview.md
+       guides/routing.md
+
+   docs-i18n.config.ts           ← translation config
+   site.config.ts                ← site template config
+ ```
+
+ The site template resolves content as:
+ ```
+ Request: /zh-hans/query/v5/docs/overview
+                   │     │       │
+                   │     │       └─ slug
+                   │     └─ version
+                   └─ project
+
+ English:  content/query/v5/overview.md            ← direct read
+ zh-hans:  .cache/ → translations.db → assemble()  ← from cache
+ Fallback: English source (isFallback: true)
+ ```
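A minimal TypeScript sketch of the resolution rule above, assuming the `/{lang}/{project}/{version}/docs/{slug}` URL shape and `.md` source files. `resolveDocRequest` and `DocSource` are hypothetical names, not the template's loader API, and the versionless route variants in the file list are not handled.

```typescript
// Hypothetical sketch: map a docs URL to the content it is rendered from.
// Assumes .md files and the /{lang}/{project}/{version}/docs/{slug} shape only.

type DocSource =
  | { kind: "english"; path: string }
  | { kind: "translated"; lang: string; englishPath: string };

function resolveDocRequest(pathname: string): DocSource | null {
  const [lang, project, version, docs, ...slugParts] = pathname.split("/").filter(Boolean);
  if (!lang || !project || !version || docs !== "docs" || slugParts.length === 0) {
    return null;
  }
  // English source lives directly under content/ (no /en/ subdirectory).
  const englishPath = `content/${project}/${version}/${slugParts.join("/")}.md`;
  if (lang === "en") {
    return { kind: "english", path: englishPath };
  }
  // Other languages: assemble englishPath with cached translations, and serve
  // the English text with isFallback: true when a translation is missing.
  return { kind: "translated", lang, englishPath };
}

// resolveDocRequest("/zh-hans/query/v5/docs/overview")
//   -> { kind: "translated", lang: "zh-hans", englishPath: "content/query/v5/overview.md" }
```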
+
+ ## Step 1: Normalization
+
+ The `normalize()` function (`src/core/normalize.ts`) preprocesses MDX content so that JSX tags like `<AppOnly>`, `<PagesOnly>`, `<details>`, and `<div>` are separated from surrounding content by blank lines. As a result, remark parses them as independent HTML nodes rather than merging them with adjacent text.
+
+ For example, this input:
+
+ ```
+ <AppOnly>
+ Some text here
+ </AppOnly>
+ ```
+
+ Becomes:
+
+ ```
+ <AppOnly>
+
+ Some text here
+
+ </AppOnly>
+ ```
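A line-based approximation of this step, written as a hypothetical `normalizeJsxSpacing()` helper rather than the package's `normalize()`. It assumes that any line consisting solely of a tag should be isolated, and it skips fenced code so examples inside code blocks are not rewritten.

```typescript
// Hypothetical approximation of normalize(): put blank lines around any line
// that consists solely of a tag (<AppOnly>, </AppOnly>, <details open>, ...),
// so remark parses the tag as its own HTML node. Fenced code is left untouched.
const TAG_ONLY_LINE = /^\s*<\/?[A-Za-z][\w.-]*(\s[^<>]*)?>\s*$/;
const FENCE = /^\s*(```|~~~)/;

function normalizeJsxSpacing(source: string): string {
  const lines = source.split("\n");
  const out: string[] = [];
  let inFence = false;

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (FENCE.test(line)) inFence = !inFence;
    const isTag = !inFence && TAG_ONLY_LINE.test(line);

    // Blank line before the tag unless the previous output line is blank (or none).
    const prev = out[out.length - 1] ?? "";
    if (isTag && prev.trim() !== "") out.push("");

    out.push(line);

    // Blank line after the tag unless the next line is already blank (or absent).
    const next = lines[i + 1];
    if (isTag && next !== undefined && next.trim() !== "") out.push("");
  }
  return out.join("\n");
}
```

Applied to the input above, this produces the "Becomes" output, and running it on already-normalized text leaves it unchanged.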
+
+ ## Step 2: AST Parsing
+
+ The `parseMdx()` function (`src/core/parser.ts`) uses remark to parse markdown into a flat list of `ParsedNode` objects. Each node has:
+
+ - `type` -- The AST node type: `paragraph`, `heading`, `list`, `blockquote`, `code`, `html`, `thematicBreak`, or `frontmatter`.
+ - `rawText` -- The raw text content from the normalized source.
+ - `needsTranslation` -- Whether this node contains human-readable text.
+ - `md5` -- MD5 hash of the raw text (only for translatable nodes).
+ - `startOffset` / `endOffset` -- Character offsets in the normalized content.
+
+ **Translatable node types:** `paragraph`, `heading`, `list`, `blockquote`, and `html` nodes that contain non-tag text (e.g., `<summary>Examples</summary>`).
+
+ **Non-translatable:** `code` blocks, `thematicBreak`, and pure HTML/JSX tags (self-closing tags like `<Check size={18} />`, opening/closing tags like `<AppOnly></AppOnly>`).
+
+ **Frontmatter handling:** If the content starts with `---`, the parser detects YAML frontmatter and emits it as a single `frontmatter` node spanning from the opening `---` to the closing `---`. The frontmatter module (`src/core/frontmatter.ts`) then extracts only the configured translatable fields (e.g., `title`, `description`) using the `yaml` library, sends them to the LLM as plain text, and reconstructs the YAML with translated values while preserving all other fields and formatting.
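The node shape and the translatability rules above can be sketched as follows. The `htmlHasText` heuristic (strip every tag, then check for leftover text), `makeNode`, and `frontmatterSpan` are hypothetical helpers for illustration; the real parser classifies nodes from the remark AST rather than from raw strings.

```typescript
// Illustrative node shape, mirroring the fields listed above.
type NodeType =
  | "paragraph" | "heading" | "list" | "blockquote"
  | "code" | "html" | "thematicBreak" | "frontmatter";

interface ParsedNode {
  type: NodeType;
  rawText: string;
  needsTranslation: boolean;
  md5?: string; // only set for translatable nodes (hashing not shown here)
  startOffset: number;
  endOffset: number;
}

const TEXT_TYPES = new Set<NodeType>(["paragraph", "heading", "list", "blockquote"]);

// Heuristic (assumption): an html node needs translation only if some text
// remains once every tag is removed, e.g. <summary>Examples</summary>.
function htmlHasText(rawText: string): boolean {
  return rawText.replace(/<[^>]*>/g, "").trim().length > 0;
}

function needsTranslation(type: NodeType, rawText: string): boolean {
  if (TEXT_TYPES.has(type)) return true;
  if (type === "html") return htmlHasText(rawText);
  return false; // code, thematicBreak; frontmatter fields are handled separately
}

function makeNode(type: NodeType, rawText: string, startOffset: number, endOffset: number): ParsedNode {
  return { type, rawText, startOffset, endOffset, needsTranslation: needsTranslation(type, rawText) };
}

// [start, end) offsets of a leading "---" ... "---" frontmatter block, or null.
function frontmatterSpan(content: string): [number, number] | null {
  if (!content.startsWith("---\n")) return null;
  const close = content.indexOf("\n---", 3);
  if (close === -1) return null;
  let end = close + "\n---".length;
  if (content[end] === "\n") end += 1; // include the newline after the closing ---
  return [0, end];
}
```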
+
+ ## Step 3: MD5 Hashing
+
+ Each translatable node's raw text is hashed with MD5 to produce a stable key. This key is used for:
+
+ - **Deduplication** -- identical content appearing in multiple files (or even multiple projects) shares a single translation.
+ - **Incremental updates** -- when source content changes, only the nodes with new MD5 hashes need translation. Unchanged nodes reuse their cached translations.
+ - **Heading differentiation** -- heading nodes include their level markers (`##`, `###`) in the hash, so "## Installation" and "### Installation" produce different keys.
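A sketch of the keying scheme using Node's built-in `node:crypto`. `nodeKey` and `keysToTranslate` are hypothetical helpers, and the exact bytes the package hashes (for example, whether a trailing newline is included) are not specified in the text above.

```typescript
import { createHash } from "node:crypto";

// Stable cache key: hex MD5 of the node's raw text (heading markers included).
function nodeKey(rawText: string): string {
  return createHash("md5").update(rawText, "utf8").digest("hex");
}

// Hypothetical helper: collapse identical texts (from any file or project) to a
// single key, then keep only keys that have no cached translation yet.
function keysToTranslate(rawTexts: string[], cachedKeys: Set<string>): string[] {
  const unique = new Set(rawTexts.map(nodeKey));
  return Array.from(unique).filter((key) => !cachedKeys.has(key));
}

// "## Installation" and "### Installation" hash differently, so the two
// heading levels never share a translation.
```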
+
+ ## Step 4: Smart Chunking
+
+ The translate command groups untranslated nodes into chunks that fit within the LLM's context window. The chunking algorithm (`src/commands/translate.ts`) accounts for:
+
+ - **Input budget** -- system prompt tokens + source text tokens. Estimated at `text.length / 4 + 80` tokens per node (accounting for JSON structure overhead).
+ - **Output budget** -- translated text tokens, scaled by a per-language multiplier because languages differ in how many tokens the same content needs:
+
+ | Language | Multiplier | Reason |
+ | --- | --- | --- |
+ | Japanese, Korean, Hindi, Thai | 2.5x | CJK/Indic tokenization |
+ | Russian, Arabic, Ukrainian, Hebrew | 2.0x | Cyrillic, Arabic, and Hebrew scripts |
+ | Chinese (Simplified/Traditional) | 1.5x | CJK but more concise |
+ | German, French, Portuguese | 1.3x | Slightly longer than English |
+ | Spanish, Vietnamese | 1.2x | Close to English length |
+ | Other languages | 2.0x | Safe default |
+
+ - **System prompt overhead** -- approximately 700 tokens for the translation prompt.
+ - **JSON structure overhead** -- approximately 80 tokens per key for JSON schema properties.
+ - **Safety margin** -- only 85% of the input budget and 75% of the output budget are filled, leaving headroom for estimation errors.
+
+ When a chunk reaches its budget, a new chunk is started. This prevents context window overflow and output truncation.
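A greedy chunker following the budgets described above. The language codes in `MULTIPLIER`, the output estimate (`length / 4` scaled by the multiplier), and the `Budget` shape are assumptions; the package's own logic in `src/commands/translate.ts` may differ in detail.

```typescript
// Output-length multipliers from the table above (language codes assumed).
const MULTIPLIER: Record<string, number> = {
  ja: 2.5, ko: 2.5, hi: 2.5, th: 2.5,
  ru: 2.0, ar: 2.0, uk: 2.0, he: 2.0,
  "zh-hans": 1.5, "zh-hant": 1.5,
  de: 1.3, fr: 1.3, pt: 1.3,
  es: 1.2, vi: 1.2,
};
const DEFAULT_MULTIPLIER = 2.0; // "Other languages": safe default
const SYSTEM_PROMPT_TOKENS = 700;
const PER_NODE_OVERHEAD = 80; // JSON structure per key

interface Budget {
  maxInputTokens: number;
  maxOutputTokens: number;
}

// Greedily pack node texts into chunks; start a new chunk when either the
// input or the output estimate would exceed its safety-scaled budget.
function chunkNodes(texts: string[], lang: string, budget: Budget): string[][] {
  const multiplier = MULTIPLIER[lang] ?? DEFAULT_MULTIPLIER;
  const inputCap = budget.maxInputTokens * 0.85; // 85% of the input budget
  const outputCap = budget.maxOutputTokens * 0.75; // 75% of the output budget

  const chunks: string[][] = [];
  let current: string[] = [];
  let inputUsed = SYSTEM_PROMPT_TOKENS;
  let outputUsed = 0;

  for (const text of texts) {
    const nodeInput = text.length / 4 + PER_NODE_OVERHEAD;
    const nodeOutput = (text.length / 4) * multiplier; // assumption
    const overflows = inputUsed + nodeInput > inputCap || outputUsed + nodeOutput > outputCap;
    if (overflows && current.length > 0) {
      chunks.push(current);
      current = [];
      inputUsed = SYSTEM_PROMPT_TOKENS;
      outputUsed = 0;
    }
    // An oversized node still goes out alone rather than being dropped.
    current.push(text);
    inputUsed += nodeInput;
    outputUsed += nodeOutput;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Under these estimates, with a 2,000-token input and 1,000-token output budget, five 400-character nodes split into chunks of 3 and 2 for `ja` (the output budget fills first), while the same nodes fit in a single chunk for `de`.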
+
+ ## Step 5: LLM Translation
+
+ The translator (`src/core/translator.ts`) uses structured JSON mode for translations:
+
+ **Input format:** A JSON object with a `nodes` array. Each node has a `key` (MD5), `type` (heading/paragraph/list/etc.), and `text` (content to translate).
+
+ ```json
+ {
+   "nodes": [
+     { "key": "a1b2c3...", "type": "heading", "text": "## Installation" },
+     { "key": "d4e5f6...", "type": "paragraph", "text": "Run the following command:" }
+   ]
+ }
+ ```
+
+ **Output format:** A flat JSON object mapping each key to its translation.
+
+ ```json
+ {
+   "a1b2c3...": "## \u5b89\u88c5",
+   "d4e5f6...": "\u8fd0\u884c\u4ee5\u4e0b\u547d\u4ee4\uff1a"
+ }
+ ```
+
+ **Robustness features:**
+
+ - **JSON repair** -- Handles common LLM JSON errors: unescaped newlines in strings, trailing commas, missing closing braces.
+ - **Thinking block stripping** -- Removes `<think>...</think>` blocks from reasoning models.
+ - **Unwrapping** -- If the model wraps output in `{"nodes": {...}}` or `{"translations": {...}}`, it is automatically unwrapped.
+ - **Key recovery** -- If the model corrupts an MD5 key (e.g., truncates it), the translator attempts fuzzy matching to recover the translation (up to 3 character differences).
+ - **Garbage detection** -- If more than 50% of translation values are identical, the model output is rejected.
+ - **Retry with backoff** -- Retries on 429 (rate limit), 503, 405, timeout, and connection errors. Uses exponential backoff starting at 2 seconds.
+ - **Model rotation** -- Supports a list of models to rotate through. Dead models (400/404 errors) are skipped. Rate-limited models (429) are deprioritized.
+ - **Truncation detection** -- If `finish_reason` is `'length'`, the output was truncated and the request is retried.
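Several of these safeguards are small enough to sketch directly: `<think>` stripping, trailing-comma repair, unwrapping, fuzzy key recovery by edit distance, the duplicate-value check, and the backoff schedule. All function names are hypothetical, only the trailing-comma repair from the JSON-repair list is shown, and "identical values" is interpreted here as one value accounting for more than half of all values.

```typescript
// Strip <think> blocks, drop trailing commas, parse, and unwrap a single
// {"nodes": {...}} or {"translations": {...}} wrapper. Only string values are kept.
function parseTranslationJson(raw: string): Record<string, string> {
  const text = raw
    .replace(/<think>[\s\S]*?<\/think>/g, "")
    .replace(/,\s*([}\]])/g, "$1") // naive: would also touch ", }" inside strings
    .trim();
  let obj: unknown = JSON.parse(text);
  if (obj !== null && typeof obj === "object" && !Array.isArray(obj)) {
    const keys = Object.keys(obj);
    const wrapper = keys.length === 1 ? keys[0] : undefined;
    if (wrapper === "nodes" || wrapper === "translations") {
      const inner = (obj as Record<string, unknown>)[wrapper];
      if (inner !== null && typeof inner === "object" && !Array.isArray(inner)) obj = inner;
    }
  }
  const result: Record<string, string> = {};
  if (obj !== null && typeof obj === "object") {
    for (const [k, v] of Object.entries(obj as Record<string, unknown>)) {
      if (typeof v === "string") result[k] = v;
    }
  }
  return result;
}

// Levenshtein distance with a single rolling row.
function editDistance(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for (i - 1, j - 1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + (a[i - 1] === b[j - 1] ? 0 : 1));
      diag = above;
    }
  }
  return row[b.length];
}

// Exact key matches first, then map each leftover key to the closest
// still-unfilled expected key within maxDiff edits; others are dropped.
function recoverKeys(output: Record<string, string>, expected: string[], maxDiff = 3): Record<string, string> {
  const result: Record<string, string> = {};
  const unmatched: [string, string][] = [];
  for (const [key, value] of Object.entries(output)) {
    if (expected.includes(key)) result[key] = value;
    else unmatched.push([key, value]);
  }
  for (const [key, value] of unmatched) {
    let best: string | undefined;
    let bestDist = maxDiff + 1;
    for (const candidate of expected) {
      if (candidate in result) continue;
      const d = editDistance(key, candidate);
      if (d < bestDist) {
        best = candidate;
        bestDist = d;
      }
    }
    if (best !== undefined) result[best] = value;
  }
  return result;
}

// Reject output when one value accounts for more than half of all values.
function looksLikeGarbage(translations: Record<string, string>): boolean {
  const values = Object.values(translations);
  if (values.length < 2) return false;
  const counts = new Map<string, number>();
  let max = 0;
  for (const v of values) {
    const c = (counts.get(v) ?? 0) + 1;
    counts.set(v, c);
    max = Math.max(max, c);
  }
  return max / values.length > 0.5;
}

// Retry policy from the list above: 2s, 4s, 8s, ...
const RETRYABLE_STATUS = new Set([429, 503, 405]);
const retryDelayMs = (attempt: number): number => 2000 * 2 ** attempt;
```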
+
+ **Frontmatter translation:**
+
+ Frontmatter nodes are handled specially. Instead of sending the entire YAML block, the translator extracts individual translatable fields (e.g., `title`, `description`) and sends them as plain `paragraph` type nodes with virtual keys like `fm:<md5>:title`. After translation, the fields are reassembled into the original YAML structure using `reconstructFrontmatter()`.
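A sketch of the virtual-key round trip on already-parsed frontmatter fields. It assumes `<md5>` is the hash of the frontmatter block's raw text and works on a plain object instead of YAML text, so it does not show the format-preserving part of `reconstructFrontmatter()`. `frontmatterRequests` and `applyFrontmatterTranslations` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

interface FrontmatterRequest {
  key: string; // virtual key "fm:<md5>:<field>"
  type: "paragraph";
  text: string;
}

// One request per translatable string field; non-string values are skipped.
function frontmatterRequests(
  rawBlock: string,
  fields: Record<string, unknown>,
  translatable: string[],
): FrontmatterRequest[] {
  const md5 = createHash("md5").update(rawBlock, "utf8").digest("hex");
  const requests: FrontmatterRequest[] = [];
  for (const field of translatable) {
    const value = fields[field];
    if (typeof value === "string") {
      requests.push({ key: `fm:${md5}:${field}`, type: "paragraph", text: value });
    }
  }
  return requests;
}

const VIRTUAL_KEY = /^fm:[0-9a-f]{32}:(.+)$/;

// Write translated values back; every other field is copied unchanged.
function applyFrontmatterTranslations(
  fields: Record<string, unknown>,
  translations: Record<string, string>,
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...fields };
  for (const [key, value] of Object.entries(translations)) {
    const field = VIRTUAL_KEY.exec(key)?.[1];
    if (field !== undefined && field in out) out[field] = value;
  }
  return out;
}
```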
+
+ ## Step 6: SQLite Cache
+
+ The `TranslationCache` class (`src/core/cache.ts`) manages all persistent state in a single SQLite database at `<cacheDir>/translations.db`.
+
+ **Schema:**
+
+ ```sql
+ -- EN source texts (deduplicated by MD5)
+ CREATE TABLE sources (
+   key TEXT PRIMARY KEY NOT NULL, -- MD5 hash
+   text TEXT NOT NULL, -- original English text
+   type TEXT NOT NULL DEFAULT 'paragraph'
+ );
+
+ -- Which files use each source node
+ CREATE TABLE source_files (
+   key TEXT NOT NULL, -- MD5 hash
+   file TEXT NOT NULL, -- relative file path
+   line INTEGER NOT NULL, -- line number
+   version TEXT NOT NULL DEFAULT 'latest',
+   PRIMARY KEY (version, key, file, line)
+ );
+
+ -- Translated texts per language
+ CREATE TABLE translations (
+   lang TEXT NOT NULL, -- language code
+   key TEXT NOT NULL, -- MD5 hash
+   value TEXT NOT NULL, -- translated text
+   created_at INTEGER NOT NULL DEFAULT (unixepoch()),
+   updated_at INTEGER NOT NULL DEFAULT (unixepoch()),
+   PRIMARY KEY (lang, key)
+ );
+ ```
+
+ **Performance configuration:**
+
+ - **WAL mode** -- Write-Ahead Logging lets readers and the single writer proceed without blocking each other, so parallel translation and assembly can read the cache while new translations are being written.
+ - **busy_timeout = 5000** -- Wait up to 5 seconds for a lock before failing.
+ - **synchronous = NORMAL** -- Balanced between safety and performance.
+ - **WITHOUT ROWID** -- Tables use the primary key directly, avoiding an extra rowid column.
+ - **Immediate writes** -- Every `set()` call is committed immediately; there is no separate save step.
+
+ **Key operations:**
+
+ - `get(lang, md5)` -- Look up a cached translation.
+ - `set(lang, md5, translation)` -- Store or update a translation (upsert).
+ - `untranslatedKeys(lang, version)` -- Find all source keys that have no translation for a given language.
+ - `fileCoverage(version, lang)` -- Get per-file translation coverage (used by the admin dashboard).
+ - `prune(lang, usedMd5s)` -- Remove translations whose keys are no longer referenced.
+ - `exportJsonl(lang, outputPath)` / `importJsonl(lang, inputPath)` -- Export/import translations in JSONL format for backup or migration.
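To make the semantics of these operations concrete without a database driver, here is a hypothetical in-memory stand-in keyed the same way as the schema. The SQL in the comments is one plausible query per operation against the tables shown above, not necessarily what `TranslationCache` runs.

```typescript
// Hypothetical in-memory stand-in for the sources / source_files /
// translations tables. Comments show one plausible SQL equivalent.
interface SourceRef {
  key: string;
  file: string;
  line: number;
  version: string;
}

class MemoryTranslationCache {
  private readonly sources = new Map<string, { text: string; type: string }>();
  private readonly refs: SourceRef[] = [];
  private readonly translations = new Map<string, string>(); // "lang\tkey" -> value

  private static id(lang: string, key: string): string {
    return `${lang}\t${key}`;
  }

  // INSERT OR IGNORE INTO sources ...; INSERT OR IGNORE INTO source_files ...
  addSource(key: string, text: string, type: string, file: string, line: number, version = "latest"): void {
    if (!this.sources.has(key)) this.sources.set(key, { text, type });
    const exists = this.refs.some(
      (r) => r.key === key && r.file === file && r.line === line && r.version === version,
    );
    if (!exists) this.refs.push({ key, file, line, version });
  }

  // SELECT value FROM translations WHERE lang = ? AND key = ?
  get(lang: string, md5: string): string | undefined {
    return this.translations.get(MemoryTranslationCache.id(lang, md5));
  }

  // INSERT INTO translations (lang, key, value) VALUES (?, ?, ?)
  //   ON CONFLICT(lang, key) DO UPDATE SET value = excluded.value
  set(lang: string, md5: string, translation: string): void {
    this.translations.set(MemoryTranslationCache.id(lang, md5), translation);
  }

  // SELECT DISTINCT sf.key FROM source_files sf
  //   LEFT JOIN translations t ON t.lang = ? AND t.key = sf.key
  //   WHERE sf.version = ? AND t.key IS NULL
  untranslatedKeys(lang: string, version: string): string[] {
    const keys = new Set<string>();
    for (const ref of this.refs) {
      if (ref.version === version && this.get(lang, ref.key) === undefined) keys.add(ref.key);
    }
    return Array.from(keys);
  }

  // DELETE FROM translations WHERE lang = ? AND key NOT IN (...used keys)
  prune(lang: string, usedMd5s: Set<string>): number {
    const doomed: string[] = [];
    this.translations.forEach((_value, id) => {
      const [idLang, key] = id.split("\t");
      if (idLang === lang && key !== undefined && !usedMd5s.has(key)) doomed.push(id);
    });
    doomed.forEach((id) => this.translations.delete(id));
    return doomed.length;
  }
}
```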
+
+ **SQLite compatibility:**
+
+ The `openDatabase()` function (`src/core/sqlite.ts`) automatically detects the runtime environment. Under Bun, it uses `bun:sqlite`. Under Node.js, it uses `better-sqlite3`. Both expose the same interface.
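A sketch of the runtime check and of the pragma setup from the performance list. Detecting Bun through its global `Bun` object is an assumption about how `openDatabase()` decides; `applyPragmas` relies only on the `exec(sql)` method that both `bun:sqlite` and `better-sqlite3` database objects provide. Both function names are hypothetical.

```typescript
// Hypothetical sketch of the runtime check: Bun defines a global `Bun` object.
type SqliteDriver = "bun:sqlite" | "better-sqlite3";

function pickSqliteDriver(): SqliteDriver {
  return "Bun" in globalThis ? "bun:sqlite" : "better-sqlite3";
}

// The performance settings listed above, applied one statement at a time.
const PRAGMAS = [
  "PRAGMA journal_mode = WAL",
  "PRAGMA busy_timeout = 5000",
  "PRAGMA synchronous = NORMAL",
];

function applyPragmas(db: { exec(sql: string): unknown }): void {
  for (const pragma of PRAGMAS) db.exec(pragma);
}
```

The driver name would then be passed to a dynamic `import()`, and the resulting database handed to `applyPragmas` once after opening.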
+
+ ## Step 7: Assembly
+
+ The `assemble()` function (`src/core/assembler.ts`) produces a translated file from English source content and cached translations:
+
+ 1. Normalizes the source content.
+ 2. Parses it into AST nodes.
+ 3. For each node:
+    - **Non-translatable nodes** (code blocks, HTML tags) are kept as-is.
+    - **Translatable nodes with a cached translation** are replaced with the cached value.
+    - **Translatable nodes without a cache hit** are either wrapped in `<!-- NEEDS_TRANSLATION -->` markers (for the legacy whole-file translation mode) or fall back to the original English text (for assembled output files).
+ 4. Preserves all whitespace and newlines between nodes.
+
+ The `AssembleResult` includes statistics: `cachedCount`, `uncachedCount`, `totalTranslatable`, and whether all nodes were cached (`allCached`).
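The offset-splicing part of assembly can be sketched on nodes that have already been parsed. `assembleFromNodes` is a hypothetical helper that models only the fallback-to-English mode (not the `<!-- NEEDS_TRANSLATION -->` markers) and takes a lookup function in place of `TranslationCache`.

```typescript
// Hypothetical node shape: offsets index into the normalized source.
interface SourceNode {
  rawText: string;
  needsTranslation: boolean;
  md5?: string;
  startOffset: number;
  endOffset: number;
}

interface AssembleSketchResult {
  content: string;
  cachedCount: number;
  uncachedCount: number;
  totalTranslatable: number;
  allCached: boolean;
}

// Splice cached translations into the source, copying the text between nodes
// verbatim so blank lines and spacing survive. Misses fall back to English.
function assembleFromNodes(
  source: string,
  nodes: SourceNode[],
  lookup: (md5: string) => string | undefined,
): AssembleSketchResult {
  let content = "";
  let cursor = 0;
  let cachedCount = 0;
  let uncachedCount = 0;

  for (const node of nodes) {
    content += source.slice(cursor, node.startOffset); // whitespace between nodes
    let text = node.rawText; // non-translatable nodes are kept as-is
    if (node.needsTranslation && node.md5 !== undefined) {
      const cached = lookup(node.md5);
      if (cached !== undefined) {
        text = cached;
        cachedCount++;
      } else {
        uncachedCount++; // keep the English text
      }
    }
    content += text;
    cursor = node.endOffset;
  }
  content += source.slice(cursor); // trailing whitespace

  const totalTranslatable = cachedCount + uncachedCount;
  return { content, cachedCount, uncachedCount, totalTranslatable, allCached: uncachedCount === 0 };
}
```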
+
+ ## Validation
+
+ The `validate()` function (`src/core/validator.ts`) compares LLM output against the translation cache to detect and correct modifications to already-cached translations. It uses two alignment strategies:
+
+ - **Fast path** -- When the number of translatable nodes in the source and output match, nodes are aligned by index.
+ - **Anchor-based alignment** -- When node counts differ (the LLM merged or split paragraphs), cached translations serve as anchor points. The validator finds exact matches between output text and cached translations, then aligns nodes between anchors by type matching.
+
+ Cached translations always override LLM modifications, ensuring translation consistency across runs.
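A simplified sketch of the two alignment strategies. `alignAndRestore` is a hypothetical name; anchors are found greedily in document order and nodes between anchors are paired in order by type, which is one reasonable reading of the description rather than the validator's exact algorithm.

```typescript
interface OutNode {
  type: string;
  text: string;
}
interface SrcNode extends OutNode {
  cached?: string; // cached translation for this source node, if any
}

// Returns the LLM output texts with cached translations restored wherever a
// source node can be matched to an output node.
function alignAndRestore(source: SrcNode[], output: OutNode[]): string[] {
  const fixed = output.map((n) => n.text);

  // Fast path: same node count, align by index.
  if (source.length === output.length) {
    source.forEach((s, i) => {
      if (s.cached !== undefined) fixed[i] = s.cached;
    });
    return fixed;
  }

  // Anchors: output nodes whose text equals a cached translation, found in order.
  const anchors: [number, number][] = [];
  let searchFrom = 0;
  source.forEach((s, si) => {
    if (s.cached === undefined) return;
    for (let oi = searchFrom; oi < output.length; oi++) {
      if (output[oi].text === s.cached) {
        anchors.push([si, oi]);
        searchFrom = oi + 1;
        return;
      }
    }
  });

  // Between consecutive anchors, pair the remaining nodes in order by type.
  const bounds: [number, number][] = [[-1, -1], ...anchors, [source.length, output.length]];
  for (let b = 0; b + 1 < bounds.length; b++) {
    const [s0, o0] = bounds[b];
    const [s1, o1] = bounds[b + 1];
    let oi = o0 + 1;
    for (let si = s0 + 1; si < s1 && oi < o1; si++) {
      let j = oi;
      while (j < o1 && output[j].type !== source[si].type) j++;
      if (j === o1) continue; // no output node of this type left in the gap
      const cached = source[si].cached;
      if (cached !== undefined) fixed[j] = cached;
      oi = j + 1;
    }
  }
  return fixed;
}
```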
package/template/content/docs-i18n/en/cli.md
@@ -271,6 +271,15 @@ For non-English languages, the site automatically loads translations from the `.
 
  ### Full translation pipeline
 
+ ```
+ rescan ──→ status ──→ translate ──→ assemble ──→ site dev
+   │          │            │            │            │
+   ▼          ▼            ▼            ▼            ▼
+ Index      Progress   Send to       EN + cache   Serve
+ source     bars       LLM, cache    → output     multilingual
+ files                 results       files        docs site
+ ```
+
  ```bash
  # 1. Scan source files
  docs-i18n rescan
@@ -281,8 +290,11 @@ docs-i18n status
  # 3. Translate
  docs-i18n translate --lang zh-hans
 
- # 4. Assemble output
+ # 4. Assemble output (optional -- site does this at runtime)
  docs-i18n assemble --lang zh-hans
+
+ # 5. Start docs site
+ docs-i18n site
  ```
 
  ### Translate a single file for review