@remnic/export-weclone 1.0.1

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Joshua Warren
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,171 @@
1
+ # @remnic/export-weclone
2
+
3
+ Export [Remnic](https://github.com/joshuaswarren/remnic) memories as
4
+ [WeClone](https://github.com/xming521/weclone)-compatible fine-tuning
5
+ datasets. Produces Alpaca-format JSON consumable by
6
+ [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory), which WeClone
7
+ drives under the hood.
8
+
9
+ This package solves the noisy-chat-log problem: WeClone normally trains on
10
+ raw Telegram / WeChat exports, which include spam, one-word replies, and
11
+ PII. Remnic has already distilled your conversations into structured
12
+ facts, preferences, entities, and topics — a much higher
13
+ signal-to-noise source for a personal digital avatar.
14
+
15
+ ## Install
16
+
17
+ ```bash
18
+ pnpm add @remnic/export-weclone
19
+ # or: npm i @remnic/export-weclone
20
+ ```
21
+
22
+ `@remnic/export-weclone` depends on `@remnic/core` and is intended to be
23
+ used alongside an existing Remnic memory store.
24
+
25
+ ## Quick start
26
+
27
+ The primary entry point is the `remnic` CLI (see
28
+ [`@remnic/cli`](../remnic-cli)). As a side effect, importing this package
29
+ registers the `weclone` adapter with the core training-export registry:
30
+
31
+ ```bash
32
+ remnic training:export --format weclone --output ./weclone-dataset.json
33
+ ```
34
+
35
+ Common options:
36
+
37
+ ```bash
38
+ # Restrict to high-confidence memories created in 2026:
39
+ remnic training:export \
40
+ --format weclone \
41
+ --output ./weclone.json \
42
+ --since 2026-01-01 \
43
+ --until 2027-01-01 \
44
+ --min-confidence 0.7
45
+
46
+ # Restrict to specific categories:
47
+ remnic training:export \
48
+ --format weclone \
49
+ --output ./weclone.json \
50
+ --categories preference,fact,skill
51
+
52
+ # Generate conversational Q/A pairs instead of raw facts:
53
+ remnic training:export \
54
+ --format weclone \
55
+ --output ./weclone.json \
56
+ --synthesize
57
+
58
+ # Preview only (no file written):
59
+ remnic training:export --format weclone --output /tmp/preview.json --dry-run
60
+ ```
61
+
62
+ ## Output format
63
+
64
+ WeClone / LLaMA Factory expect [Alpaca
65
+ JSON](https://github.com/tatsu-lab/stanford_alpaca#data-release):
66
+
67
+ ```json
68
+ [
69
+ {
70
+ "instruction": "What kind of coffee do you like?",
71
+ "input": "",
72
+ "output": "dark roast, ethiopian yirgacheffe. something about that fruity wine-like flavor..."
73
+ }
74
+ ]
75
+ ```
76
+
77
+ The adapter emits only the three Alpaca fields. Remnic metadata
78
+ (`category`, `confidence`, `sourceIds`) is stripped from the output file
79
+ but is preserved on the in-memory records so callers building their own
80
+ pipelines can inspect it before serialization.
81
+
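The metadata stripping described above can be pictured with a minimal sketch. The `SketchRecord` shape and the `toAlpacaRows` helper are illustrative assumptions; the real `TrainingExportRecord` type is defined in `@remnic/core` and may differ.

```typescript
// Hypothetical record shape for illustration; the real TrainingExportRecord
// type lives in @remnic/core and may carry additional fields.
interface SketchRecord {
  instruction: string;
  input: string;
  output: string;
  category?: string;
  confidence?: number;
  sourceIds?: string[];
}

interface AlpacaRow {
  instruction: string;
  input: string;
  output: string;
}

// Keep only the three Alpaca fields, dropping Remnic metadata.
function toAlpacaRows(records: SketchRecord[]): AlpacaRow[] {
  return records.map((r) => ({
    instruction: r.instruction,
    input: r.input,
    output: r.output,
  }));
}

const sketchRecords: SketchRecord[] = [
  {
    instruction: "What kind of coffee do you like?",
    input: "",
    output: "dark roast, ethiopian yirgacheffe",
    category: "preference",
    confidence: 0.9,
    sourceIds: ["mem-1"],
  },
];

// Pretty-printed with two-space indentation, matching the adapter's JSON.stringify call.
const alpacaJson: string = JSON.stringify(toAlpacaRows(sketchRecords), null, 2);
```

Because the projection builds new objects, `sketchRecords` still carries `category`, `confidence`, and `sourceIds` after serialization, which is what lets a custom pipeline inspect them first.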
82
+ ## Programmatic API
83
+
84
+ ```ts
85
+ import {
86
+ ensureWecloneExportAdapterRegistered,
87
+ wecloneExportAdapter,
88
+ synthesizeTrainingPairs,
89
+ extractStyleMarkers,
90
+ sweepPii,
91
+ } from "@remnic/export-weclone";
92
+ import {
93
+ convertMemoriesToRecords,
94
+ getTrainingExportAdapter,
95
+ } from "@remnic/core";
96
+
97
+ // Side-effect import is usually enough, but explicit registration is safe:
98
+ ensureWecloneExportAdapterRegistered();
99
+
100
+ const records = await convertMemoriesToRecords({
101
+ memoryDir: "/path/to/memory",
102
+ minConfidence: 0.7,
103
+ });
104
+
105
+ const pairs = synthesizeTrainingPairs(records, { maxPairsPerRecord: 2 });
106
+ const { cleanRecords, redactedCount } = sweepPii(pairs);
107
+
108
+ const adapter = getTrainingExportAdapter("weclone");
109
+ const json = adapter!.formatRecords(cleanRecords);
110
+ ```
111
+
112
+ ### `synthesizeTrainingPairs(records, opts)`
113
+
114
+ Turns flat memory records into natural conversational Q/A pairs using
115
+ category-driven templates (preferences, opinions, expertise, personal).
116
+ Pure templates — no LLM calls. Optionally applies style markers (e.g.
117
+ lowercase normalization) extracted from the user's own transcripts.
118
+
119
+ ### `extractStyleMarkers(samples)`
120
+
121
+ Analyses text samples with regex-and-count heuristics and returns a
122
+ `StyleMarkers` profile (`avgSentenceLength`, `usesEmoji`, `formality`,
123
+ `usesLowercase`, `commonPhrases`). Used by `synthesizeTrainingPairs` to
124
+ match the output tone to the user's own writing style.
125
+
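As a rough illustration of the regex-and-count heuristics, the sketch below mirrors two of the checks found in `dist/index.js` (average sentence length and lowercase sentence starts). The function names ending in `Sketch` are made up for this example and are not part of the package API.

```typescript
// Split text into non-empty, trimmed sentences on runs of ., ! and ?.
function splitSentences(text: string): string[] {
  return text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Mean words per sentence, rounded to one decimal place.
function avgSentenceLengthSketch(text: string): number {
  const sentences = splitSentences(text);
  if (sentences.length === 0) return 0;
  let totalWords = 0;
  for (const s of sentences) {
    totalWords += s.split(/\s+/).filter((w) => w.length > 0).length;
  }
  return Math.round((totalWords / sentences.length) * 10) / 10;
}

// True when more than half of the sentences start with a lowercase letter.
function usesLowercaseSketch(text: string): boolean {
  const sentences = splitSentences(text);
  if (sentences.length === 0) return false;
  const lowerStarts = sentences.filter((s) => {
    const first = s.charAt(0);
    // Digits and symbols have no case, so they do not count as lowercase.
    return first === first.toLowerCase() && first !== first.toUpperCase();
  }).length;
  return lowerStarts / sentences.length > 0.5;
}

const casualSample = "yeah that works. gonna try it later today! Thanks";
// Sentences have 3, 5 and 1 words, so the average is 3.
const casualAvg = avgSentenceLengthSketch(casualSample);
// Two of the three sentences start with a lowercase letter.
const casualLower = usesLowercaseSketch(casualSample);
```

The strict "more than half" threshold means a sample with an even split between lowercase and capitalized sentence starts is not treated as lowercase style.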
126
+ ### `sweepPii(records)`
127
+
128
+ Belt-and-suspenders PII redaction for email, SSN, credit-card, IP, and
129
+ phone patterns. Runs after Remnic's own privacy controls so that even if
130
+ something slips through the upstream filter, text matching these patterns
131
+ is replaced with `[REDACTED]` before it reaches the final dataset. Returns `{ cleanRecords, redactedCount, redactionDetails }`.
132
+
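To make the return shape concrete, here is a cut-down sketch of the sweep using two of the five patterns, copied from `dist/index.js`. `sweepPiiSketch` and the `Pii*` types are hypothetical stand-ins, not the package's exports.

```typescript
interface PiiRecord {
  instruction: string;
  input: string;
  output: string;
}

interface PiiDetail {
  index: number;
  field: string;
  pattern: string;
}

// Two of the package's five patterns, copied from dist/index.js.
const SKETCH_PATTERNS: { label: string; regex: RegExp }[] = [
  { label: "email", regex: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g },
  { label: "ssn", regex: /\b\d{3}-\d{2}-\d{4}\b/g },
];

const SKETCH_FIELDS: (keyof PiiRecord)[] = ["instruction", "input", "output"];

function sweepPiiSketch(records: PiiRecord[]) {
  const redactionDetails: PiiDetail[] = [];
  let redactedCount = 0;
  const cleanRecords = records.map((record, index) => {
    const cleaned: PiiRecord = { ...record }; // copy, so the originals stay untouched
    let touched = false;
    for (const field of SKETCH_FIELDS) {
      let value = record[field];
      for (const p of SKETCH_PATTERNS) {
        p.regex.lastIndex = 0; // global regexes keep state between test() calls
        if (p.regex.test(value)) {
          p.regex.lastIndex = 0;
          value = value.replace(p.regex, "[REDACTED]");
          touched = true;
          // One detail entry per record, field and pattern, not per match.
          redactionDetails.push({ index, field, pattern: p.label });
        }
      }
      cleaned[field] = value;
    }
    if (touched) redactedCount++; // counts records, not individual matches
    return cleaned;
  });
  return { cleanRecords, redactedCount, redactionDetails };
}

const piiInput: PiiRecord[] = [
  { instruction: "How can I reach you?", input: "", output: "mail a@b.io or c@d.org" },
  { instruction: "Favorite tea?", input: "", output: "oolong" },
];
const piiResult = sweepPiiSketch(piiInput);
// piiResult.cleanRecords[0].output is "mail [REDACTED] or [REDACTED]"
// piiResult.redactedCount is 1, with a single detail entry for the email pattern
```

Note that the patterns are purely syntactic: for instance, the package's credit-card pattern requires dash or space separators between the four digit groups, so an unbroken 16-digit run would not be caught by it.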
133
+ ## How synthesis works
134
+
135
+ Remnic memories are facts, not conversations. The synthesizer maps each
136
+ memory category to a template group and generates a corresponding
137
+ question, using the first parenthesised tag group in the instruction as the topic (falling back to "this" when there is none):
138
+
139
+ ```
140
+ Category: preference
141
+ Memory: "Dark roast coffee, Ethiopian Yirgacheffe specifically"
142
+ Tags: food, coffee
143
+
144
+ Generated pair:
145
+ instruction: "What kind of food, coffee do you like?"
146
+ output: "Dark roast coffee, Ethiopian Yirgacheffe specifically"
147
+ ```
148
+
149
+ Question templates live in `src/synthesizer.ts`. Adding a new category
150
+ mapping is a one-line change.
151
+
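The mapping from category to question can be sketched as follows. The template tables are abridged from `dist/index.js`; `topicOf` and `questionsFor` are hypothetical names for this example. One deliberate difference: the sketch passes a replacer function to `String.prototype.replace`, whereas the published code passes the topic string directly, so a topic containing `$` sequences would be interpreted as a replacement pattern there.

```typescript
// Template groups and category mapping, abridged from dist/index.js.
const SKETCH_TEMPLATES: { [group: string]: string[] } = {
  preferences: ["What kind of {topic} do you like?", "What's your preference for {topic}?"],
  expertise: ["Tell me about {topic}.", "What do you know about {topic}?"],
};
const SKETCH_FALLBACK: string[] = ["Tell me about {topic}.", "What can you share about {topic}?"];
const SKETCH_CATEGORY_GROUP: { [category: string]: string } = {
  preference: "preferences",
  fact: "expertise",
};

// Use the first parenthesised tag group as the topic, else "this".
function topicOf(instruction: string): string {
  const m = instruction.match(/\(([^()]+)\)/);
  return m ? m[1].trim().toLowerCase() : "this";
}

function questionsFor(
  instruction: string,
  category: string,
  recordIndex: number,
  maxPairs: number,
): string[] {
  const group = SKETCH_CATEGORY_GROUP[category.toLowerCase()];
  // Unmapped categories fall through to the generic templates.
  const templates = (group !== undefined && SKETCH_TEMPLATES[group]) || SKETCH_FALLBACK;
  const topic = topicOf(instruction);
  const count = Math.min(maxPairs, templates.length);
  const questions: string[] = [];
  for (let j = 0; j < count; j++) {
    // Rotating by record index varies the phrasing while staying deterministic.
    const template = templates[(recordIndex + j) % templates.length];
    // A replacer function inserts the topic literally, even if it contains "$".
    questions.push(template.replace("{topic}", () => topic));
  }
  return questions;
}

const coffeeQuestions = questionsFor("Recall a user preference (food, coffee)", "preference", 0, 1);
// coffeeQuestions[0] is "What kind of food, coffee do you like?"
const untaggedQuestions = questionsFor("Recall a user preference", "mood", 1, 1);
// Unknown category and no tags: untaggedQuestions[0] is "What can you share about this?"
```

Since `maxPairs` is capped at the number of templates in the group, asking for more pairs than templates yields each template once rather than repeating any.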
152
+ ## Privacy posture
153
+
154
+ - Output JSON contains only `instruction`, `input`, `output`.
155
+ - Remnic metadata (`sourceIds`, etc.) is **not** written to the dataset
156
+ file — even the record IDs stay in the memory store.
157
+ - `sweepPii` runs by default in the CLI. Disable only with
158
+ `--no-privacy-sweep` and only when you have a compensating control.
159
+ - Symlinks and hard-linked `.md` files under `memoryDir` are refused by
160
+ the core converter to block data-exfiltration vectors out of the memory
161
+ store (see `packages/remnic-core/src/training-export/converter.ts`).
162
+
163
+ ## Related
164
+
165
+ - Tracking issue: [remnic#459](https://github.com/joshuaswarren/remnic/issues/459)
166
+ - Upstream: [WeClone](https://github.com/xming521/weclone)
167
+ - Format: [Alpaca JSON via LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory)
168
+
169
+ ## License
170
+
171
+ MIT. See the root [LICENSE](../../LICENSE) file.
@@ -0,0 +1,105 @@
1
+ import { TrainingExportAdapter, TrainingExportRecord } from '@remnic/core';
2
+
3
+ /**
4
+ * WeClone Alpaca-format training export adapter.
5
+ *
6
+ * Converts TrainingExportRecord[] into the JSON format that
7
+ * WeClone / LLaMA Factory expects for fine-tuning:
8
+ *
9
+ * [{ "instruction": "...", "input": "", "output": "..." }, ...]
10
+ *
11
+ * Only the three Alpaca fields are emitted; Remnic-specific
12
+ * metadata (category, confidence, sourceIds) is stripped.
13
+ */
14
+
15
+ declare const wecloneExportAdapter: TrainingExportAdapter;
16
+
17
+ /**
18
+ * Communication style marker extraction.
19
+ *
20
+ * Analyzes text samples using simple heuristics to produce
21
+ * a StyleMarkers profile. No LLM calls — pure regex and
22
+ * counting.
23
+ */
24
+ interface StyleMarkers {
25
+ avgSentenceLength: number;
26
+ usesEmoji: boolean;
27
+ formality: "formal" | "casual" | "mixed";
28
+ usesLowercase: boolean;
29
+ commonPhrases: string[];
30
+ }
31
+ /**
32
+ * Analyse text samples and extract communication style markers.
33
+ */
34
+ declare function extractStyleMarkers(samples: string[]): StyleMarkers;
35
+
36
+ /**
37
+ * Training-pair synthesizer.
38
+ *
39
+ * Converts Remnic's flat TrainingExportRecord[] — where
40
+ * `instruction` is a natural-language description and
41
+ * `category` identifies the memory type — into natural
42
+ * conversational question-answer pairs suitable for
43
+ * WeClone / LLaMA Factory fine-tuning.
44
+ *
45
+ * Uses template-based question generation (no LLM calls).
46
+ */
47
+
48
+ interface SynthesizerOptions {
49
+ styleMarkers?: StyleMarkers;
50
+ maxPairsPerRecord?: number;
51
+ }
52
+ /**
53
+ * Synthesize natural conversational training pairs from
54
+ * category-tagged memory records.
55
+ */
56
+ declare function synthesizeTrainingPairs(records: TrainingExportRecord[], options?: SynthesizerOptions): TrainingExportRecord[];
57
+
58
+ /**
59
+ * PII privacy sweep for training export records.
60
+ *
61
+ * Belt-and-suspenders check that runs after Remnic's own
62
+ * privacy controls. Scans instruction, input, and output
63
+ * fields for common PII patterns and replaces matches with
64
+ * [REDACTED].
65
+ */
66
+
67
+ interface PrivacySweepResult {
68
+ cleanRecords: TrainingExportRecord[];
69
+ redactedCount: number;
70
+ redactionDetails: {
71
+ index: number;
72
+ field: string;
73
+ pattern: string;
74
+ }[];
75
+ }
76
+ /**
77
+ * Scan and redact PII from training export records.
78
+ *
79
+ * Returns a new array of cleaned records, leaving the originals
80
+ * unmodified. The `redactedCount` is the number of records that
81
+ * had at least one redaction. `redactionDetails` has one entry per
82
+ * record, field, and pattern that matched, however many matches were replaced.
83
+ */
84
+ declare function sweepPii(records: TrainingExportRecord[]): PrivacySweepResult;
85
+
86
+ /**
87
+ * @remnic/export-weclone
88
+ *
89
+ * WeClone-specific training-data export adapter that converts
90
+ * Remnic memories into Alpaca-format fine-tuning datasets
91
+ * compatible with WeClone / LLaMA Factory.
92
+ */
93
+
94
+ /**
95
+ * Idempotently register the WeClone adapter with the core training-export
96
+ * registry. Callable multiple times without throwing (CLAUDE.md #13:
97
+ * secondary calls must not crash host processes that pre-register the
98
+ * adapter for test fixtures).
99
+ *
100
+ * Returns true when the adapter was newly registered, false when an adapter
101
+ * with the same name already exists.
102
+ */
103
+ declare function ensureWecloneExportAdapterRegistered(): boolean;
104
+
105
+ export { type PrivacySweepResult, type StyleMarkers, type SynthesizerOptions, ensureWecloneExportAdapterRegistered, extractStyleMarkers, sweepPii, synthesizeTrainingPairs, wecloneExportAdapter };
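The idempotent-registration contract documented above (true when newly registered, false when already present, never throwing on repeat calls) can be sketched against a toy registry. Everything here is hypothetical: `sketchRegistry`, `getSketchAdapter`, and `ensureSketchAdapterRegistered` stand in for the `@remnic/core` registry, whose internals are not shown in this diff.

```typescript
interface SketchAdapter {
  name: string;
  fileExtension: string;
  formatRecords(records: { instruction: string; input: string; output: string }[]): string;
}

// Hypothetical in-memory registry standing in for the one in @remnic/core.
const sketchRegistry: { [adapterName: string]: SketchAdapter } = {};

function getSketchAdapter(adapterName: string): SketchAdapter | undefined {
  return Object.prototype.hasOwnProperty.call(sketchRegistry, adapterName)
    ? sketchRegistry[adapterName]
    : undefined;
}

const sketchWecloneAdapter: SketchAdapter = {
  name: "weclone",
  fileExtension: ".json",
  formatRecords: (records) => JSON.stringify(records, null, 2),
};

// Register only if absent; report whether this call did the registering.
function ensureSketchAdapterRegistered(): boolean {
  if (getSketchAdapter(sketchWecloneAdapter.name) !== undefined) return false;
  sketchRegistry[sketchWecloneAdapter.name] = sketchWecloneAdapter;
  return true;
}

const firstCall = ensureSketchAdapterRegistered();  // true: newly registered
const secondCall = ensureSketchAdapterRegistered(); // false: already present, no throw
```

Checking for an existing entry before registering is what makes it safe for both a side-effect import and an explicit call to run in the same process.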
package/dist/index.js ADDED
@@ -0,0 +1,337 @@
1
+ // openclaw-engram: Local-first memory plugin
2
+
3
+ // src/index.ts
4
+ import {
5
+ getTrainingExportAdapter,
6
+ registerTrainingExportAdapter
7
+ } from "@remnic/core";
8
+
9
+ // src/adapter.ts
10
+ var wecloneExportAdapter = {
11
+ name: "weclone",
12
+ fileExtension: ".json",
13
+ formatRecords(records) {
14
+ const alpacaRecords = records.map((r) => ({
15
+ instruction: r.instruction,
16
+ input: r.input,
17
+ output: r.output
18
+ }));
19
+ return JSON.stringify(alpacaRecords, null, 2);
20
+ }
21
+ };
22
+
23
+ // src/synthesizer.ts
24
+ var DEFAULT_MAX_PAIRS = 1;
25
+ var QUESTION_TEMPLATES = {
26
+ preferences: [
27
+ "What kind of {topic} do you like?",
28
+ "What's your preference for {topic}?",
29
+ "What are your favorite {topic}?"
30
+ ],
31
+ opinions: [
32
+ "What do you think about {topic}?",
33
+ "How do you feel about {topic}?",
34
+ "What's your opinion on {topic}?"
35
+ ],
36
+ expertise: [
37
+ "Tell me about {topic}.",
38
+ "What do you know about {topic}?",
39
+ "Can you explain {topic}?"
40
+ ],
41
+ personal: [
42
+ "Can you tell me about your {topic}?",
43
+ "Tell me about your {topic}.",
44
+ "What can you share about your {topic}?"
45
+ ]
46
+ };
47
+ var DEFAULT_TEMPLATES = [
48
+ "Tell me about {topic}.",
49
+ "What can you share about {topic}?"
50
+ ];
51
+ var CATEGORY_TO_TEMPLATE = {
52
+ preference: "preferences",
53
+ fact: "expertise",
54
+ entity: "expertise",
55
+ skill: "expertise",
56
+ correction: "opinions",
57
+ decision: "opinions",
58
+ principle: "opinions",
59
+ rule: "opinions",
60
+ personal: "personal",
61
+ relationship: "personal",
62
+ commitment: "personal",
63
+ moment: "personal"
64
+ };
65
+ function synthesizeTrainingPairs(records, options) {
66
+ const maxPairs = options?.maxPairsPerRecord ?? DEFAULT_MAX_PAIRS;
67
+ const style = options?.styleMarkers;
68
+ const result = [];
69
+ for (let i = 0; i < records.length; i++) {
70
+ const record = records[i];
71
+ const templateKey = resolveTemplateKey(record.category);
72
+ const topic = extractTopic(record.instruction);
73
+ const templates = QUESTION_TEMPLATES[templateKey] ?? DEFAULT_TEMPLATES;
74
+ const pairCount = Math.min(maxPairs, templates.length);
75
+ for (let j = 0; j < pairCount; j++) {
76
+ const templateIndex = (i + j) % templates.length;
77
+ const question = templates[templateIndex].replace("{topic}", topic);
78
+ let output = record.output;
79
+ if (style?.usesLowercase) {
80
+ output = output.toLowerCase();
81
+ }
82
+ result.push({
83
+ instruction: question,
84
+ input: "",
85
+ output,
86
+ category: record.category,
87
+ confidence: record.confidence,
88
+ sourceIds: record.sourceIds
89
+ });
90
+ }
91
+ }
92
+ return result;
93
+ }
94
+ function resolveTemplateKey(category) {
95
+ if (!category) return "";
96
+ return CATEGORY_TO_TEMPLATE[category.toLowerCase()] ?? "";
97
+ }
98
+ function extractTopic(instruction) {
99
+ const tagMatch = instruction.match(/\(([^()]+)\)/);
100
+ if (tagMatch) {
101
+ return tagMatch[1].trim().toLowerCase();
102
+ }
103
+ return "this";
104
+ }
105
+
106
+ // src/style-extractor.ts
107
+ var EMOJI_RE = /[\u{1F600}-\u{1F64F}\u{1F300}-\u{1F5FF}\u{1F680}-\u{1F6FF}\u{1F1E0}-\u{1F1FF}\u{2600}-\u{27BF}\u{2702}-\u{27B0}\u{FE00}-\u{FE0F}\u{1FA00}-\u{1FA6F}\u{1FA70}-\u{1FAFF}\u{2328}\u{23CF}\u{23E9}-\u{23F3}\u{23F8}-\u{23FA}\u{200D}\u{20E3}\u{FE0F}\u{E0020}-\u{E007F}\u{2B50}\u{2B55}\u{2934}\u{2935}\u{25AA}-\u{25FE}\u{2600}-\u{26FF}\u{2700}-\u{27BF}\u{231A}\u{231B}\u{23E9}-\u{23F3}\u{23F8}-\u{23FA}\u{25FB}-\u{25FE}\u{2614}\u{2615}\u{2648}-\u{2653}\u{267F}\u{2693}\u{26A1}\u{26AA}\u{26AB}\u{26BD}\u{26BE}\u{26C4}\u{26C5}\u{26CE}\u{26D4}\u{26EA}\u{26F2}\u{26F3}\u{26F5}\u{26FA}\u{26FD}\u{2702}\u{2705}\u{2708}-\u{270D}\u{270F}\u{2712}\u{2714}\u{2716}\u{271D}\u{2721}\u{2728}\u{2733}\u{2734}\u{2744}\u{2747}\u{274C}\u{274E}\u{2753}-\u{2755}\u{2757}\u{2763}\u{2764}\u{2795}-\u{2797}\u{27A1}\u{27B0}\u{27BF}\u{2934}\u{2935}]/u;
108
+ var FORMAL_MARKERS = [
109
+ "furthermore",
110
+ "however",
111
+ "therefore",
112
+ "moreover",
113
+ "consequently",
114
+ "nevertheless",
115
+ "in addition",
116
+ "accordingly",
117
+ "subsequently",
118
+ "regarding",
119
+ "pertaining",
120
+ "shall",
121
+ "hereby",
122
+ "whereas",
123
+ "notwithstanding",
124
+ "henceforth",
125
+ "aforementioned",
126
+ "please consider",
127
+ "would like to",
128
+ "i would",
129
+ "appreciation",
130
+ "recommendations",
131
+ "thoroughly",
132
+ "documentation"
133
+ ];
134
+ var CASUAL_MARKERS = [
135
+ "gonna",
136
+ "wanna",
137
+ "kinda",
138
+ "sorta",
139
+ "gotta",
140
+ "dunno",
141
+ "lemme",
142
+ "yeah",
143
+ "yep",
144
+ "nah",
145
+ "lol",
146
+ "omg",
147
+ "tbh",
148
+ "imo",
149
+ "btw",
150
+ "nope",
151
+ "cuz",
152
+ "tho",
153
+ "ain't",
154
+ "y'all",
155
+ "awesome",
156
+ "cool",
157
+ "dude",
158
+ "bro",
159
+ "bruh"
160
+ ];
161
+ var MIN_PHRASE_FREQUENCY = 2;
162
+ var MAX_COMMON_PHRASES = 10;
163
+ function extractStyleMarkers(samples) {
164
+ if (samples.length === 0) {
165
+ return {
166
+ avgSentenceLength: 0,
167
+ usesEmoji: false,
168
+ formality: "mixed",
169
+ usesLowercase: false,
170
+ commonPhrases: []
171
+ };
172
+ }
173
+ const joined = samples.join(" ");
174
+ return {
175
+ avgSentenceLength: calcAvgSentenceLength(joined),
176
+ usesEmoji: detectEmoji(joined),
177
+ formality: detectFormality(joined),
178
+ usesLowercase: detectLowercase(joined),
179
+ commonPhrases: findCommonPhrases(samples)
180
+ };
181
+ }
182
+ function calcAvgSentenceLength(text) {
183
+ const sentences = text.split(/[.!?]+/).map((s) => s.trim()).filter((s) => s.length > 0);
184
+ if (sentences.length === 0) return 0;
185
+ const totalWords = sentences.reduce((sum, s) => {
186
+ const words = s.split(/\s+/).filter((w) => w.length > 0);
187
+ return sum + words.length;
188
+ }, 0);
189
+ return Math.round(totalWords / sentences.length * 10) / 10;
190
+ }
191
+ function detectEmoji(text) {
192
+ return EMOJI_RE.test(text);
193
+ }
194
+ function detectFormality(text) {
195
+ const lower = text.toLowerCase();
196
+ let formalScore = 0;
197
+ for (const marker of FORMAL_MARKERS) {
198
+ if (new RegExp(`\\b${marker}\\b`, "i").test(lower)) formalScore++;
199
+ }
200
+ let casualScore = 0;
201
+ for (const marker of CASUAL_MARKERS) {
202
+ if (new RegExp(`\\b${marker}\\b`, "i").test(lower)) casualScore++;
203
+ }
204
+ const THRESHOLD = 2;
205
+ if (formalScore >= THRESHOLD && formalScore > casualScore) return "formal";
206
+ if (casualScore >= THRESHOLD && casualScore > formalScore) return "casual";
207
+ return "mixed";
208
+ }
209
+ function detectLowercase(text) {
210
+ const sentences = text.split(/[.!?]+/).map((s) => s.trim()).filter((s) => s.length > 0);
211
+ if (sentences.length === 0) return false;
212
+ const lowercaseStarts = sentences.filter((s) => {
213
+ const firstChar = s.charAt(0);
214
+ return firstChar === firstChar.toLowerCase() && firstChar !== firstChar.toUpperCase();
215
+ }).length;
216
+ return lowercaseStarts / sentences.length > 0.5;
217
+ }
218
+ function isAlnum(ch) {
219
+ const c = ch.charCodeAt(0);
220
+ return c >= 48 && c <= 57 || // 0-9
221
+ c >= 65 && c <= 90 || // A-Z
222
+ c >= 97 && c <= 122;
223
+ }
224
+ function trimNonAlnum(word) {
225
+ let start = 0;
226
+ let end = word.length;
227
+ while (start < end && !isAlnum(word.charAt(start))) start++;
228
+ while (end > start && !isAlnum(word.charAt(end - 1))) end--;
229
+ return start === 0 && end === word.length ? word : word.slice(start, end);
230
+ }
231
+ function findCommonPhrases(samples) {
232
+ const phraseCount = /* @__PURE__ */ new Map();
233
+ for (const sample of samples) {
234
+ const words = sample.split(/\s+/).map((w) => trimNonAlnum(w)).filter((w) => w.length > 0);
235
+ const seenInSample = /* @__PURE__ */ new Set();
236
+ for (let ngramSize = 2; ngramSize <= 3; ngramSize++) {
237
+ for (let i = 0; i <= words.length - ngramSize; i++) {
238
+ const phrase = words.slice(i, i + ngramSize).join(" ").toLowerCase();
239
+ if (!seenInSample.has(phrase)) {
240
+ seenInSample.add(phrase);
241
+ phraseCount.set(phrase, (phraseCount.get(phrase) ?? 0) + 1);
242
+ }
243
+ }
244
+ }
245
+ }
246
+ return [...phraseCount.entries()].filter(([, count]) => count >= MIN_PHRASE_FREQUENCY).sort((a, b) => {
247
+ if (b[1] !== a[1]) return b[1] - a[1];
248
+ return a[0].localeCompare(b[0]);
249
+ }).slice(0, MAX_COMMON_PHRASES).map(([phrase]) => phrase);
250
+ }
251
+
252
+ // src/privacy.ts
253
+ var PII_PATTERNS = [
254
+ {
255
+ // Email: user@domain.tld
256
+ name: "email",
257
+ regex: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g
258
+ },
259
+ {
260
+ // SSN: 123-45-6789 (exactly 3-2-4 digit groups)
261
+ name: "ssn",
262
+ regex: /\b\d{3}-\d{2}-\d{4}\b/g
263
+ },
264
+ {
265
+ // Credit card: 4 groups of 4 digits separated by dashes or spaces
266
+ name: "credit_card",
267
+ regex: /\b\d{4}[-\s]\d{4}[-\s]\d{4}[-\s]\d{4}\b/g
268
+ },
269
+ {
270
+ // IP address: four octets 0-255
271
+ name: "ip_address",
272
+ regex: /\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b/g
273
+ },
274
+ {
275
+ // Phone: optional +1- prefix, then 3-3-4 with dashes, dots, or spaces
276
+ // Also matches (555) 123-4567 format
277
+ name: "phone",
278
+ regex: /(?:\+\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b/g
279
+ }
280
+ ];
281
+ var SCANNED_FIELDS = [
282
+ "instruction",
283
+ "input",
284
+ "output"
285
+ ];
286
+ function sweepPii(records) {
287
+ const redactionDetails = [];
288
+ const recordHasRedaction = /* @__PURE__ */ new Set();
289
+ const cleanRecords = records.map((record, idx) => {
290
+ const cleaned = { ...record };
291
+ for (const field of SCANNED_FIELDS) {
292
+ let value = record[field];
293
+ if (!value) continue;
294
+ for (const pattern of PII_PATTERNS) {
295
+ pattern.regex.lastIndex = 0;
296
+ if (pattern.regex.test(value)) {
297
+ pattern.regex.lastIndex = 0;
298
+ value = value.replace(pattern.regex, "[REDACTED]");
299
+ recordHasRedaction.add(idx);
300
+ redactionDetails.push({
301
+ index: idx,
302
+ field,
303
+ pattern: pattern.name
304
+ });
305
+ }
306
+ }
307
+ cleaned[field] = value;
308
+ }
309
+ return cleaned;
310
+ });
311
+ return {
312
+ cleanRecords,
313
+ redactedCount: recordHasRedaction.size,
314
+ redactionDetails
315
+ };
316
+ }
317
+
318
+ // src/index.ts
319
+ function ensureWecloneExportAdapterRegistered() {
320
+ if (getTrainingExportAdapter(wecloneExportAdapter.name) !== void 0) {
321
+ return false;
322
+ }
323
+ registerTrainingExportAdapter(wecloneExportAdapter);
324
+ return true;
325
+ }
326
+ try {
327
+ ensureWecloneExportAdapterRegistered();
328
+ } catch {
329
+ }
330
+ export {
331
+ ensureWecloneExportAdapterRegistered,
332
+ extractStyleMarkers,
333
+ sweepPii,
334
+ synthesizeTrainingPairs,
335
+ wecloneExportAdapter
336
+ };
337
+ //# sourceMappingURL=index.js.map
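The `sweepPii` loop in this file resets `pattern.regex.lastIndex` before each `test()` call. A short sketch of why that matters for shared regexes with the `g` flag, using the SSN pattern from `PII_PATTERNS`; the variable names are illustrative only.

```typescript
// A global regex remembers where its last match ended.
const ssnPattern = /\b\d{3}-\d{2}-\d{4}\b/g;
const sampleText = "ssn 123-45-6789";

const firstTest = ssnPattern.test(sampleText);  // true
const staleIndex = ssnPattern.lastIndex;        // 15, the end of the match
const secondTest = ssnPattern.test(sampleText); // false: the search resumed at index 15

const thirdTest = ssnPattern.test(sampleText);  // true: the failed search reset lastIndex to 0
ssnPattern.lastIndex = 0;                       // explicit reset, as sweepPii does
const fourthTest = ssnPattern.test(sampleText); // true, regardless of earlier calls
```

Without the reset, whether a field is flagged would depend on what the same pattern object matched in the previous field or record.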
package/dist/index.js.map ADDED
@@ -0,0 +1 @@
1
+ {"version":3,"sources":["../src/index.ts","../src/adapter.ts","../src/synthesizer.ts","../src/style-extractor.ts","../src/privacy.ts"],"sourcesContent":["/**\n * @remnic/export-weclone\n *\n * WeClone-specific training-data export adapter that converts\n * Remnic memories into Alpaca-format fine-tuning datasets\n * compatible with WeClone / LLaMA Factory.\n */\n\nimport {\n getTrainingExportAdapter,\n registerTrainingExportAdapter,\n} from \"@remnic/core\";\n\nimport { wecloneExportAdapter } from \"./adapter.js\";\n\nexport { wecloneExportAdapter } from \"./adapter.js\";\nexport { synthesizeTrainingPairs, type SynthesizerOptions } from \"./synthesizer.js\";\nexport { extractStyleMarkers, type StyleMarkers } from \"./style-extractor.js\";\nexport { sweepPii, type PrivacySweepResult } from \"./privacy.js\";\n\n/**\n * Idempotently register the WeClone adapter with the core training-export\n * registry. Callable multiple times without throwing (CLAUDE.md #13:\n * secondary calls must not crash host processes that pre-register the\n * adapter for test fixtures).\n *\n * Returns true when the adapter was newly registered, false when an adapter\n * with the same name already exists.\n */\nexport function ensureWecloneExportAdapterRegistered(): boolean {\n if (getTrainingExportAdapter(wecloneExportAdapter.name) !== undefined) {\n return false;\n }\n registerTrainingExportAdapter(wecloneExportAdapter);\n return true;\n}\n\n// Side-effect registration: importing this module registers the adapter.\n// Callers that need to manage registration manually (e.g. 
tests that call\n// `clearTrainingExportAdapters()`) can re-invoke\n// `ensureWecloneExportAdapterRegistered()` after clearing.\n//\n// The try/catch keeps import-time errors from breaking unrelated callers —\n// the adapter surfaces `formatRecords` purely, so a failure here would be\n// surprising, but defensive coding keeps CLI startup resilient.\ntry {\n ensureWecloneExportAdapterRegistered();\n} catch {\n // Swallow — explicit callers can re-invoke ensureWecloneExportAdapterRegistered().\n}\n","/**\n * WeClone Alpaca-format training export adapter.\n *\n * Converts TrainingExportRecord[] into the JSON format that\n * WeClone / LLaMA Factory expects for fine-tuning:\n *\n * [{ \"instruction\": \"...\", \"input\": \"\", \"output\": \"...\" }, ...]\n *\n * Only the three Alpaca fields are emitted; Remnic-specific\n * metadata (category, confidence, sourceIds) is stripped.\n */\n\nimport type { TrainingExportAdapter, TrainingExportRecord } from \"@remnic/core\";\n\nexport const wecloneExportAdapter: TrainingExportAdapter = {\n name: \"weclone\",\n fileExtension: \".json\",\n\n formatRecords(records: TrainingExportRecord[]): string {\n const alpacaRecords = records.map((r) => ({\n instruction: r.instruction,\n input: r.input,\n output: r.output,\n }));\n return JSON.stringify(alpacaRecords, null, 2);\n },\n};\n","/**\n * Training-pair synthesizer.\n *\n * Converts Remnic's flat TrainingExportRecord[] — where\n * `instruction` is a natural-language description and\n * `category` identifies the memory type — into natural\n * conversational question-answer pairs suitable for\n * WeClone / LLaMA Factory fine-tuning.\n *\n * Uses template-based question generation (no LLM calls).\n */\n\nimport type { TrainingExportRecord } from \"@remnic/core\";\nimport type { StyleMarkers } from \"./style-extractor.js\";\n\nexport interface SynthesizerOptions {\n styleMarkers?: StyleMarkers;\n maxPairsPerRecord?: number;\n}\n\n/** Default limit for pairs generated per input record. 
*/\nconst DEFAULT_MAX_PAIRS = 1;\n\n/**\n * Question templates keyed by template group.\n * Each array provides variety; the synthesizer picks\n * based on record index for deterministic output.\n */\nconst QUESTION_TEMPLATES: Record<string, string[]> = {\n preferences: [\n \"What kind of {topic} do you like?\",\n \"What's your preference for {topic}?\",\n \"What are your favorite {topic}?\",\n ],\n opinions: [\n \"What do you think about {topic}?\",\n \"How do you feel about {topic}?\",\n \"What's your opinion on {topic}?\",\n ],\n expertise: [\n \"Tell me about {topic}.\",\n \"What do you know about {topic}?\",\n \"Can you explain {topic}?\",\n ],\n personal: [\n \"Can you tell me about your {topic}?\",\n \"Tell me about your {topic}.\",\n \"What can you share about your {topic}?\",\n ],\n};\n\nconst DEFAULT_TEMPLATES = [\n \"Tell me about {topic}.\",\n \"What can you share about {topic}?\",\n];\n\n/**\n * Maps record.category values (from core converter) to\n * QUESTION_TEMPLATES keys. Categories not listed here\n * fall through to DEFAULT_TEMPLATES.\n */\nconst CATEGORY_TO_TEMPLATE: Record<string, string> = {\n preference: \"preferences\",\n fact: \"expertise\",\n entity: \"expertise\",\n skill: \"expertise\",\n correction: \"opinions\",\n decision: \"opinions\",\n principle: \"opinions\",\n rule: \"opinions\",\n personal: \"personal\",\n relationship: \"personal\",\n commitment: \"personal\",\n moment: \"personal\",\n};\n\n/**\n * Synthesize natural conversational training pairs from\n * category-tagged memory records.\n */\nexport function synthesizeTrainingPairs(\n records: TrainingExportRecord[],\n options?: SynthesizerOptions,\n): TrainingExportRecord[] {\n const maxPairs = options?.maxPairsPerRecord ?? 
DEFAULT_MAX_PAIRS;\n const style = options?.styleMarkers;\n const result: TrainingExportRecord[] = [];\n\n for (let i = 0; i < records.length; i++) {\n const record = records[i];\n const templateKey = resolveTemplateKey(record.category);\n const topic = extractTopic(record.instruction);\n const templates = QUESTION_TEMPLATES[templateKey] ?? DEFAULT_TEMPLATES;\n\n const pairCount = Math.min(maxPairs, templates.length);\n\n for (let j = 0; j < pairCount; j++) {\n const templateIndex = (i + j) % templates.length;\n const question = templates[templateIndex].replace(\"{topic}\", topic);\n let output = record.output;\n\n if (style?.usesLowercase) {\n output = output.toLowerCase();\n }\n\n result.push({\n instruction: question,\n input: \"\",\n output,\n category: record.category,\n confidence: record.confidence,\n sourceIds: record.sourceIds,\n });\n }\n }\n\n return result;\n}\n\n// ── Internals ────────────────────────────────────────────\n\n/**\n * Resolve a record's category field to a QUESTION_TEMPLATES key.\n * Falls back to empty string (which triggers DEFAULT_TEMPLATES).\n */\nfunction resolveTemplateKey(category: string | undefined): string {\n if (!category) return \"\";\n return CATEGORY_TO_TEMPLATE[category.toLowerCase()] ?? \"\";\n}\n\n/**\n * Extract a human-readable topic from the instruction string.\n *\n * The core converter produces instructions like:\n * \"Recall a factual memory (food, cooking)\"\n * \"Recall a user preference\"\n *\n * When parenthesized tags are present, use them as the topic.\n * Otherwise fall back to \"this\".\n */\nfunction extractTopic(instruction: string): string {\n const tagMatch = instruction.match(/\\(([^()]+)\\)/);\n if (tagMatch) {\n return tagMatch[1].trim().toLowerCase();\n }\n return \"this\";\n}\n","/**\n * Communication style marker extraction.\n *\n * Analyzes text samples using simple heuristics to produce\n * a StyleMarkers profile. 
No LLM calls — pure regex and\n * counting.\n */\n\nexport interface StyleMarkers {\n avgSentenceLength: number;\n usesEmoji: boolean;\n formality: \"formal\" | \"casual\" | \"mixed\";\n usesLowercase: boolean;\n commonPhrases: string[];\n}\n\n/**\n * Regex matching most common emoji code-point ranges.\n * Covers Emoticons, Dingbats, Transport/Map symbols,\n * Misc symbols, and supplemental blocks.\n */\nconst EMOJI_RE =\n /[\\u{1F600}-\\u{1F64F}\\u{1F300}-\\u{1F5FF}\\u{1F680}-\\u{1F6FF}\\u{1F1E0}-\\u{1F1FF}\\u{2600}-\\u{27BF}\\u{2702}-\\u{27B0}\\u{FE00}-\\u{FE0F}\\u{1FA00}-\\u{1FA6F}\\u{1FA70}-\\u{1FAFF}\\u{2328}\\u{23CF}\\u{23E9}-\\u{23F3}\\u{23F8}-\\u{23FA}\\u{200D}\\u{20E3}\\u{FE0F}\\u{E0020}-\\u{E007F}\\u{2B50}\\u{2B55}\\u{2934}\\u{2935}\\u{25AA}-\\u{25FE}\\u{2600}-\\u{26FF}\\u{2700}-\\u{27BF}\\u{231A}\\u{231B}\\u{23E9}-\\u{23F3}\\u{23F8}-\\u{23FA}\\u{25FB}-\\u{25FE}\\u{2614}\\u{2615}\\u{2648}-\\u{2653}\\u{267F}\\u{2693}\\u{26A1}\\u{26AA}\\u{26AB}\\u{26BD}\\u{26BE}\\u{26C4}\\u{26C5}\\u{26CE}\\u{26D4}\\u{26EA}\\u{26F2}\\u{26F3}\\u{26F5}\\u{26FA}\\u{26FD}\\u{2702}\\u{2705}\\u{2708}-\\u{270D}\\u{270F}\\u{2712}\\u{2714}\\u{2716}\\u{271D}\\u{2721}\\u{2728}\\u{2733}\\u{2734}\\u{2744}\\u{2747}\\u{274C}\\u{274E}\\u{2753}-\\u{2755}\\u{2757}\\u{2763}\\u{2764}\\u{2795}-\\u{2797}\\u{27A1}\\u{27B0}\\u{27BF}\\u{2934}\\u{2935}]/u;\n\n/** Words/phrases that signal formal register. */\nconst FORMAL_MARKERS = [\n \"furthermore\",\n \"however\",\n \"therefore\",\n \"moreover\",\n \"consequently\",\n \"nevertheless\",\n \"in addition\",\n \"accordingly\",\n \"subsequently\",\n \"regarding\",\n \"pertaining\",\n \"shall\",\n \"hereby\",\n \"whereas\",\n \"notwithstanding\",\n \"henceforth\",\n \"aforementioned\",\n \"please consider\",\n \"would like to\",\n \"i would\",\n \"appreciation\",\n \"recommendations\",\n \"thoroughly\",\n \"documentation\",\n];\n\n/** Words/phrases that signal casual register. 
*/\nconst CASUAL_MARKERS = [\n \"gonna\",\n \"wanna\",\n \"kinda\",\n \"sorta\",\n \"gotta\",\n \"dunno\",\n \"lemme\",\n \"yeah\",\n \"yep\",\n \"nah\",\n \"lol\",\n \"omg\",\n \"tbh\",\n \"imo\",\n \"btw\",\n \"nope\",\n \"cuz\",\n \"tho\",\n \"ain't\",\n \"y'all\",\n \"awesome\",\n \"cool\",\n \"dude\",\n \"bro\",\n \"bruh\",\n];\n\n/** Minimum occurrences for a phrase to count as \"common\". */\nconst MIN_PHRASE_FREQUENCY = 2;\n\n/** Maximum number of common phrases to return. */\nconst MAX_COMMON_PHRASES = 10;\n\n/**\n * Analyse text samples and extract communication style markers.\n */\nexport function extractStyleMarkers(samples: string[]): StyleMarkers {\n if (samples.length === 0) {\n return {\n avgSentenceLength: 0,\n usesEmoji: false,\n formality: \"mixed\",\n usesLowercase: false,\n commonPhrases: [],\n };\n }\n\n const joined = samples.join(\" \");\n\n return {\n avgSentenceLength: calcAvgSentenceLength(joined),\n usesEmoji: detectEmoji(joined),\n formality: detectFormality(joined),\n usesLowercase: detectLowercase(joined),\n commonPhrases: findCommonPhrases(samples),\n };\n}\n\n// ── Internals ────────────────────────────────────────────\n\nfunction calcAvgSentenceLength(text: string): number {\n // Split on sentence-ending punctuation, filter empties\n const sentences = text\n .split(/[.!?]+/)\n .map((s) => s.trim())\n .filter((s) => s.length > 0);\n\n if (sentences.length === 0) return 0;\n\n const totalWords = sentences.reduce((sum, s) => {\n const words = s.split(/\\s+/).filter((w) => w.length > 0);\n return sum + words.length;\n }, 0);\n\n return Math.round((totalWords / sentences.length) * 10) / 10;\n}\n\nfunction detectEmoji(text: string): boolean {\n return EMOJI_RE.test(text);\n}\n\nfunction detectFormality(text: string): \"formal\" | \"casual\" | \"mixed\" {\n const lower = text.toLowerCase();\n\n let formalScore = 0;\n for (const marker of FORMAL_MARKERS) {\n // Word-boundary matching prevents false positives\n // (e.g., \"tho\" matching 
inside \"those\" or \"method\")\n if (new RegExp(`\\\\b${marker}\\\\b`, \"i\").test(lower)) formalScore++;\n }\n\n let casualScore = 0;\n for (const marker of CASUAL_MARKERS) {\n if (new RegExp(`\\\\b${marker}\\\\b`, \"i\").test(lower)) casualScore++;\n }\n\n // Threshold: need at least 2 markers to declare a style\n const THRESHOLD = 2;\n\n if (formalScore >= THRESHOLD && formalScore > casualScore) return \"formal\";\n if (casualScore >= THRESHOLD && casualScore > formalScore) return \"casual\";\n return \"mixed\";\n}\n\nfunction detectLowercase(text: string): boolean {\n // Split into sentences and check what fraction start with lowercase\n const sentences = text\n .split(/[.!?]+/)\n .map((s) => s.trim())\n .filter((s) => s.length > 0);\n\n if (sentences.length === 0) return false;\n\n const lowercaseStarts = sentences.filter((s) => {\n const firstChar = s.charAt(0);\n return firstChar === firstChar.toLowerCase() && firstChar !== firstChar.toUpperCase();\n }).length;\n\n // Majority (>50%) of sentences start lowercase\n return lowercaseStarts / sentences.length > 0.5;\n}\n\n/**\n * Check whether a character is alphanumeric (ASCII a-z, A-Z, 0-9) using\n * code-point comparison. Pure function — no regex, no backtracking.\n */\nfunction isAlnum(ch: string): boolean {\n const c = ch.charCodeAt(0);\n return (\n (c >= 48 && c <= 57) || // 0-9\n (c >= 65 && c <= 90) || // A-Z\n (c >= 97 && c <= 122) // a-z\n );\n}\n\n/**\n * Strip leading and trailing non-alphanumeric characters from `word` using\n * a single linear scan on each side. This replaces the previous\n * `/^[^a-zA-Z0-9]+/` / `/[^a-zA-Z0-9]+$/` regexes, which CodeQL flagged as\n * polynomial ReDoS on uncontrolled input (e.g. long `///...///` runs).\n */\nfunction trimNonAlnum(word: string): string {\n let start = 0;\n let end = word.length;\n while (start < end && !isAlnum(word.charAt(start))) start++;\n while (end > start && !isAlnum(word.charAt(end - 1))) end--;\n return start === 0 && end === word.length ? 
word : word.slice(start, end);\n}\n\nfunction findCommonPhrases(samples: string[]): string[] {\n const phraseCount = new Map<string, number>();\n\n for (const sample of samples) {\n // Tokenize: split on whitespace, strip edge punctuation with a linear\n // scan (no regex) to eliminate the polynomial backtracking that the\n // previous `replace(/^[^a-zA-Z0-9]+/, \"\")` chain exposed.\n const words = sample\n .split(/\\s+/)\n .map((w) => trimNonAlnum(w))\n .filter((w) => w.length > 0);\n\n // Build 2-gram and 3-gram phrases\n const seenInSample = new Set<string>();\n for (let ngramSize = 2; ngramSize <= 3; ngramSize++) {\n for (let i = 0; i <= words.length - ngramSize; i++) {\n const phrase = words.slice(i, i + ngramSize).join(\" \").toLowerCase();\n // Only count once per sample to avoid inflating from repetition within one sample\n if (!seenInSample.has(phrase)) {\n seenInSample.add(phrase);\n phraseCount.set(phrase, (phraseCount.get(phrase) ?? 0) + 1);\n }\n }\n }\n }\n\n // Filter by minimum frequency and sort by count descending, then alphabetical for stability\n return [...phraseCount.entries()]\n .filter(([, count]) => count >= MIN_PHRASE_FREQUENCY)\n .sort((a, b) => {\n if (b[1] !== a[1]) return b[1] - a[1];\n return a[0].localeCompare(b[0]);\n })\n .slice(0, MAX_COMMON_PHRASES)\n .map(([phrase]) => phrase);\n}\n","/**\n * PII privacy sweep for training export records.\n *\n * Belt-and-suspenders check that runs after Remnic's own\n * privacy controls. 
Scans instruction, input, and output\n * fields for common PII patterns and replaces matches with\n * [REDACTED].\n */\n\nimport type { TrainingExportRecord } from \"@remnic/core\";\n\nexport interface PrivacySweepResult {\n cleanRecords: TrainingExportRecord[];\n redactedCount: number;\n redactionDetails: { index: number; field: string; pattern: string }[];\n}\n\ninterface PiiPattern {\n name: string;\n regex: RegExp;\n}\n\n/**\n * Ordered list of PII patterns.\n *\n * Order matters: more specific patterns (SSN, credit card)\n * come before broader ones (phone) to avoid partial matches.\n */\nconst PII_PATTERNS: PiiPattern[] = [\n {\n // Email: user@domain.tld\n name: \"email\",\n regex: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}/g,\n },\n {\n // SSN: 123-45-6789 (exactly 3-2-4 digit groups)\n name: \"ssn\",\n regex: /\\b\\d{3}-\\d{2}-\\d{4}\\b/g,\n },\n {\n // Credit card: 4 groups of 4 digits separated by dashes or spaces\n name: \"credit_card\",\n regex: /\\b\\d{4}[-\\s]\\d{4}[-\\s]\\d{4}[-\\s]\\d{4}\\b/g,\n },\n {\n // IP address: four octets 0-255\n name: \"ip_address\",\n regex: /\\b(?:(?:25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(?:25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\b/g,\n },\n {\n // Phone: optional +1- prefix, then 3-3-4 with dashes, dots, or spaces\n // Also matches (555) 123-4567 format\n name: \"phone\",\n regex: /(?:\\+\\d{1,3}[-.\\s]?)?\\(?\\d{3}\\)?[-.\\s]\\d{3}[-.\\s]\\d{4}\\b/g,\n },\n];\n\nconst SCANNED_FIELDS: (keyof Pick<TrainingExportRecord, \"instruction\" | \"input\" | \"output\">)[] = [\n \"instruction\",\n \"input\",\n \"output\",\n];\n\n/**\n * Scan and redact PII from training export records.\n *\n * Returns a new array of cleaned records, leaving the originals\n * unmodified. The `redactedCount` is the number of records that\n * had at least one redaction. 
`redactionDetails` lists every\n * individual match with its record index, field, and pattern name.\n */\nexport function sweepPii(records: TrainingExportRecord[]): PrivacySweepResult {\n const redactionDetails: PrivacySweepResult[\"redactionDetails\"] = [];\n const recordHasRedaction = new Set<number>();\n\n const cleanRecords = records.map((record, idx) => {\n const cleaned: TrainingExportRecord = { ...record };\n\n for (const field of SCANNED_FIELDS) {\n let value = record[field];\n if (!value) continue;\n\n for (const pattern of PII_PATTERNS) {\n // Reset lastIndex for global regex reuse\n pattern.regex.lastIndex = 0;\n if (pattern.regex.test(value)) {\n pattern.regex.lastIndex = 0;\n value = value.replace(pattern.regex, \"[REDACTED]\");\n recordHasRedaction.add(idx);\n redactionDetails.push({\n index: idx,\n field,\n pattern: pattern.name,\n });\n }\n }\n\n cleaned[field] = value;\n }\n\n return cleaned;\n });\n\n return {\n cleanRecords,\n redactedCount: recordHasRedaction.size,\n redactionDetails,\n 
};\n}\n"],"mappings":";;;AAQA;AAAA,EACE;AAAA,EACA;AAAA,OACK;;;ACGA,IAAM,uBAA8C;AAAA,EACzD,MAAM;AAAA,EACN,eAAe;AAAA,EAEf,cAAc,SAAyC;AACrD,UAAM,gBAAgB,QAAQ,IAAI,CAAC,OAAO;AAAA,MACxC,aAAa,EAAE;AAAA,MACf,OAAO,EAAE;AAAA,MACT,QAAQ,EAAE;AAAA,IACZ,EAAE;AACF,WAAO,KAAK,UAAU,eAAe,MAAM,CAAC;AAAA,EAC9C;AACF;;;ACLA,IAAM,oBAAoB;AAO1B,IAAM,qBAA+C;AAAA,EACnD,aAAa;AAAA,IACX;AAAA,IACA;AAAA,IACA;AAAA,EACF;AAAA,EACA,UAAU;AAAA,IACR;AAAA,IACA;AAAA,IACA;AAAA,EACF;AAAA,EACA,WAAW;AAAA,IACT;AAAA,IACA;AAAA,IACA;AAAA,EACF;AAAA,EACA,UAAU;AAAA,IACR;AAAA,IACA;AAAA,IACA;AAAA,EACF;AACF;AAEA,IAAM,oBAAoB;AAAA,EACxB;AAAA,EACA;AACF;AAOA,IAAM,uBAA+C;AAAA,EACnD,YAAY;AAAA,EACZ,MAAM;AAAA,EACN,QAAQ;AAAA,EACR,OAAO;AAAA,EACP,YAAY;AAAA,EACZ,UAAU;AAAA,EACV,WAAW;AAAA,EACX,MAAM;AAAA,EACN,UAAU;AAAA,EACV,cAAc;AAAA,EACd,YAAY;AAAA,EACZ,QAAQ;AACV;AAMO,SAAS,wBACd,SACA,SACwB;AACxB,QAAM,WAAW,SAAS,qBAAqB;AAC/C,QAAM,QAAQ,SAAS;AACvB,QAAM,SAAiC,CAAC;AAExC,WAAS,IAAI,GAAG,IAAI,QAAQ,QAAQ,KAAK;AACvC,UAAM,SAAS,QAAQ,CAAC;AACxB,UAAM,cAAc,mBAAmB,OAAO,QAAQ;AACtD,UAAM,QAAQ,aAAa,OAAO,WAAW;AAC7C,UAAM,YAAY,mBAAmB,WAAW,KAAK;AAErD,UAAM,YAAY,KAAK,IAAI,UAAU,UAAU,MAAM;AAErD,aAAS,IAAI,GAAG,IAAI,WAAW,KAAK;AAClC,YAAM,iBAAiB,IAAI,KAAK,UAAU;AAC1C,YAAM,WAAW,UAAU,aAAa,EAAE,QAAQ,WAAW,KAAK;AAClE,UAAI,SAAS,OAAO;AAEpB,UAAI,OAAO,eAAe;AACxB,iBAAS,OAAO,YAAY;AAAA,MAC9B;AAEA,aAAO,KAAK;AAAA,QACV,aAAa;AAAA,QACb,OAAO;AAAA,QACP;AAAA,QACA,UAAU,OAAO;AAAA,QACjB,YAAY,OAAO;AAAA,QACnB,WAAW,OAAO;AAAA,MACpB,CAAC;AAAA,IACH;AAAA,EACF;AAEA,SAAO;AACT;AAQA,SAAS,mBAAmB,UAAsC;AAChE,MAAI,CAAC,SAAU,QAAO;AACtB,SAAO,qBAAqB,SAAS,YAAY,CAAC,KAAK;AACzD;AAYA,SAAS,aAAa,aAA6B;AACjD,QAAM,WAAW,YAAY,MAAM,cAAc;AACjD,MAAI,UAAU;AACZ,WAAO,SAAS,CAAC,EAAE,KAAK,EAAE,YAAY;AAAA,EACxC;AACA,SAAO;AACT;;;AC7HA,IAAM,WACJ;AAGF,IAAM,iBAAiB;AAAA,EACrB;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AACF;AAGA,IAAM,iBAAiB;AAAA,EACr
B;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AACF;AAGA,IAAM,uBAAuB;AAG7B,IAAM,qBAAqB;AAKpB,SAAS,oBAAoB,SAAiC;AACnE,MAAI,QAAQ,WAAW,GAAG;AACxB,WAAO;AAAA,MACL,mBAAmB;AAAA,MACnB,WAAW;AAAA,MACX,WAAW;AAAA,MACX,eAAe;AAAA,MACf,eAAe,CAAC;AAAA,IAClB;AAAA,EACF;AAEA,QAAM,SAAS,QAAQ,KAAK,GAAG;AAE/B,SAAO;AAAA,IACL,mBAAmB,sBAAsB,MAAM;AAAA,IAC/C,WAAW,YAAY,MAAM;AAAA,IAC7B,WAAW,gBAAgB,MAAM;AAAA,IACjC,eAAe,gBAAgB,MAAM;AAAA,IACrC,eAAe,kBAAkB,OAAO;AAAA,EAC1C;AACF;AAIA,SAAS,sBAAsB,MAAsB;AAEnD,QAAM,YAAY,KACf,MAAM,QAAQ,EACd,IAAI,CAAC,MAAM,EAAE,KAAK,CAAC,EACnB,OAAO,CAAC,MAAM,EAAE,SAAS,CAAC;AAE7B,MAAI,UAAU,WAAW,EAAG,QAAO;AAEnC,QAAM,aAAa,UAAU,OAAO,CAAC,KAAK,MAAM;AAC9C,UAAM,QAAQ,EAAE,MAAM,KAAK,EAAE,OAAO,CAAC,MAAM,EAAE,SAAS,CAAC;AACvD,WAAO,MAAM,MAAM;AAAA,EACrB,GAAG,CAAC;AAEJ,SAAO,KAAK,MAAO,aAAa,UAAU,SAAU,EAAE,IAAI;AAC5D;AAEA,SAAS,YAAY,MAAuB;AAC1C,SAAO,SAAS,KAAK,IAAI;AAC3B;AAEA,SAAS,gBAAgB,MAA6C;AACpE,QAAM,QAAQ,KAAK,YAAY;AAE/B,MAAI,cAAc;AAClB,aAAW,UAAU,gBAAgB;AAGnC,QAAI,IAAI,OAAO,MAAM,MAAM,OAAO,GAAG,EAAE,KAAK,KAAK,EAAG;AAAA,EACtD;AAEA,MAAI,cAAc;AAClB,aAAW,UAAU,gBAAgB;AACnC,QAAI,IAAI,OAAO,MAAM,MAAM,OAAO,GAAG,EAAE,KAAK,KAAK,EAAG;AAAA,EACtD;AAGA,QAAM,YAAY;AAElB,MAAI,eAAe,aAAa,cAAc,YAAa,QAAO;AAClE,MAAI,eAAe,aAAa,cAAc,YAAa,QAAO;AAClE,SAAO;AACT;AAEA,SAAS,gBAAgB,MAAuB;AAE9C,QAAM,YAAY,KACf,MAAM,QAAQ,EACd,IAAI,CAAC,MAAM,EAAE,KAAK,CAAC,EACnB,OAAO,CAAC,MAAM,EAAE,SAAS,CAAC;AAE7B,MAAI,UAAU,WAAW,EAAG,QAAO;AAEnC,QAAM,kBAAkB,UAAU,OAAO,CAAC,MAAM;AAC9C,UAAM,YAAY,EAAE,OAAO,CAAC;AAC5B,WAAO,cAAc,UAAU,YAAY,KAAK,cAAc,UAAU,YAAY;AAAA,EACtF,CAAC,EAAE;AAGH,SAAO,kBAAkB,UAAU,SAAS;AAC9C;AAMA,SAAS,QAAQ,IAAqB;AACpC,QAAM,IAAI,GAAG,WAAW,CAAC;AACzB,SACG,KAAK,MAAM,KAAK;AAAA,EAChB,KAAK,MAAM,KAAK;AAAA,EAChB,KAAK,MAAM,KAAK;AAErB;AAQA,SAAS,aAAa,MAAsB;AAC1C,MAAI,QAAQ;AACZ,MAAI,MAAM,KAAK;AACf,SAAO,QAAQ,OAAO,CAAC,QAAQ,KAAK,OAAO,KAAK,CAAC,EAAG;AA
CpD,SAAO,MAAM,SAAS,CAAC,QAAQ,KAAK,OAAO,MAAM,CAAC,CAAC,EAAG;AACtD,SAAO,UAAU,KAAK,QAAQ,KAAK,SAAS,OAAO,KAAK,MAAM,OAAO,GAAG;AAC1E;AAEA,SAAS,kBAAkB,SAA6B;AACtD,QAAM,cAAc,oBAAI,IAAoB;AAE5C,aAAW,UAAU,SAAS;AAI5B,UAAM,QAAQ,OACX,MAAM,KAAK,EACX,IAAI,CAAC,MAAM,aAAa,CAAC,CAAC,EAC1B,OAAO,CAAC,MAAM,EAAE,SAAS,CAAC;AAG7B,UAAM,eAAe,oBAAI,IAAY;AACrC,aAAS,YAAY,GAAG,aAAa,GAAG,aAAa;AACnD,eAAS,IAAI,GAAG,KAAK,MAAM,SAAS,WAAW,KAAK;AAClD,cAAM,SAAS,MAAM,MAAM,GAAG,IAAI,SAAS,EAAE,KAAK,GAAG,EAAE,YAAY;AAEnE,YAAI,CAAC,aAAa,IAAI,MAAM,GAAG;AAC7B,uBAAa,IAAI,MAAM;AACvB,sBAAY,IAAI,SAAS,YAAY,IAAI,MAAM,KAAK,KAAK,CAAC;AAAA,QAC5D;AAAA,MACF;AAAA,IACF;AAAA,EACF;AAGA,SAAO,CAAC,GAAG,YAAY,QAAQ,CAAC,EAC7B,OAAO,CAAC,CAAC,EAAE,KAAK,MAAM,SAAS,oBAAoB,EACnD,KAAK,CAAC,GAAG,MAAM;AACd,QAAI,EAAE,CAAC,MAAM,EAAE,CAAC,EAAG,QAAO,EAAE,CAAC,IAAI,EAAE,CAAC;AACpC,WAAO,EAAE,CAAC,EAAE,cAAc,EAAE,CAAC,CAAC;AAAA,EAChC,CAAC,EACA,MAAM,GAAG,kBAAkB,EAC3B,IAAI,CAAC,CAAC,MAAM,MAAM,MAAM;AAC7B;;;AClNA,IAAM,eAA6B;AAAA,EACjC;AAAA;AAAA,IAEE,MAAM;AAAA,IACN,OAAO;AAAA,EACT;AAAA,EACA;AAAA;AAAA,IAEE,MAAM;AAAA,IACN,OAAO;AAAA,EACT;AAAA,EACA;AAAA;AAAA,IAEE,MAAM;AAAA,IACN,OAAO;AAAA,EACT;AAAA,EACA;AAAA;AAAA,IAEE,MAAM;AAAA,IACN,OAAO;AAAA,EACT;AAAA,EACA;AAAA;AAAA;AAAA,IAGE,MAAM;AAAA,IACN,OAAO;AAAA,EACT;AACF;AAEA,IAAM,iBAA2F;AAAA,EAC/F;AAAA,EACA;AAAA,EACA;AACF;AAUO,SAAS,SAAS,SAAqD;AAC5E,QAAM,mBAA2D,CAAC;AAClE,QAAM,qBAAqB,oBAAI,IAAY;AAE3C,QAAM,eAAe,QAAQ,IAAI,CAAC,QAAQ,QAAQ;AAChD,UAAM,UAAgC,EAAE,GAAG,OAAO;AAElD,eAAW,SAAS,gBAAgB;AAClC,UAAI,QAAQ,OAAO,KAAK;AACxB,UAAI,CAAC,MAAO;AAEZ,iBAAW,WAAW,cAAc;AAElC,gBAAQ,MAAM,YAAY;AAC1B,YAAI,QAAQ,MAAM,KAAK,KAAK,GAAG;AAC7B,kBAAQ,MAAM,YAAY;AAC1B,kBAAQ,MAAM,QAAQ,QAAQ,OAAO,YAAY;AACjD,6BAAmB,IAAI,GAAG;AAC1B,2BAAiB,KAAK;AAAA,YACpB,OAAO;AAAA,YACP;AAAA,YACA,SAAS,QAAQ;AAAA,UACnB,CAAC;AAAA,QACH;AAAA,MACF;AAEA,cAAQ,KAAK,IAAI;AAAA,IACnB;AAEA,WAAO;AAAA,EACT,CAAC;AAED,SAAO;AAAA,IACL;AAAA,IACA,eAAe,mBAAmB;AAAA,IAClC;AAAA,EACF;AACF;;;AJ/EO,SAAS,uCAAgD;AAC9D,MAAI,yBAAyB,qBAAqB,IAAI,MAAM,QAAW;AACrE,WAAO;AAAA,EACT;AACA,gCAA8B,oBAAoB;AAClD,SAAO
;AACT;AAUA,IAAI;AACF,uCAAqC;AACvC,QAAQ;AAER;","names":[]}
package/package.json ADDED
@@ -0,0 +1,48 @@
+ {
+   "name": "@remnic/export-weclone",
+   "version": "1.0.1",
+   "description": "Export Remnic memories as WeClone-compatible Alpaca-format fine-tuning datasets",
+   "type": "module",
+   "main": "dist/index.js",
+   "types": "dist/index.d.ts",
+   "exports": {
+     ".": {
+       "types": "./dist/index.d.ts",
+       "import": "./dist/index.js"
+     }
+   },
+   "files": [
+     "dist",
+     "README.md"
+   ],
+   "publishConfig": {
+     "access": "public",
+     "provenance": true
+   },
+   "dependencies": {
+     "@remnic/core": "^1.0.3"
+   },
+   "devDependencies": {
+     "tsup": "^8.0.0",
+     "typescript": "^5.7.0",
+     "tsx": "^4.0.0"
+   },
+   "license": "MIT",
+   "repository": {
+     "type": "git",
+     "url": "https://github.com/joshuaswarren/remnic.git",
+     "directory": "packages/export-weclone"
+   },
+   "keywords": [
+     "remnic",
+     "memory",
+     "weclone",
+     "fine-tuning",
+     "export",
+     "alpaca"
+   ],
+   "scripts": {
+     "build": "tsup src/index.ts --format esm --dts",
+     "test": "tsx --test 'src/**/*.test.ts'"
+   }
+ }