diffdoc 0.5.0 → 0.6.1

package/README.md CHANGED
@@ -165,6 +165,8 @@ DIFFDOC_SUMMARIZE_CONCURRENCY
165
165
  DIFFDOC_INCLUDE_GLOBS
166
166
  DIFFDOC_EXCLUDE_GLOBS
167
167
  DIFFDOC_IGNORE_FILE
168
+ DIFFDOC_SUMMARY_PROMPT
169
+ DIFFDOC_SUMMARY_PROMPT_FILE
168
170
  LOCAL_LLM_ENDPOINT
169
171
  LOCAL_CHAT_MODEL
170
172
  LOCAL_EMBED_ENDPOINT
@@ -231,16 +233,26 @@ npx diffdoc summarize --path . --mode all
231
233
  npx diffdoc summarize --path . --mode delta
232
234
  npx diffdoc summarize --path . --mode delta --json
233
235
  npx diffdoc summarize --path . --mode all --summarize-concurrency 4
236
+ npx diffdoc summarize --path . --mode all --refresh
234
237
  ```
235
238
 
236
239
  Summarization runs with bounded concurrency. The default is `2`; use `1` for strict rate limits, `2-4` for most providers, and higher values only when your local model server or API quota can handle the request volume.
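As a rough illustration of what "bounded concurrency" means here, the sketch below runs a worker over a list with at most `limit` calls in flight. `mapWithConcurrency` is a hypothetical helper written for this note, not DiffDoc's actual scheduler; the source only states that a limit exists and defaults to `2`.

```javascript
// Hypothetical helper (not part of DiffDoc): apply `worker` to every item,
// keeping at most `limit` calls in flight at any time.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each lane claims the next unclaimed index and processes it until none remain.
  async function runLane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results; // same order as `items`
}

// Usage: pretend each "summary" is an async call to a model server.
mapWithConcurrency(["a.ts", "b.ts", "c.ts"], 2, async (file) => `summary of ${file}`)
  .then((summaries) => console.log(summaries));
```

With `limit` set to `1` the lanes collapse into strictly sequential calls, which matches the advice above for providers with tight rate limits.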
237
240
 
238
- Store raw code snapshots in summary assets when you want retrieved results to include source text:
241
+ Use `--summary-prompt` or `--summary-prompt-file` to add domain-specific guidance without replacing DiffDoc's default structured prompt:
242
+
243
+ ```bash
244
+ npx diffdoc summarize --summary-prompt "Emphasize billing behavior, permissions, data retention, and operational risk."
245
+ npx diffdoc summarize --summary-prompt-file ./diffdoc-summary-prompt.md
246
+ ```
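The two options are mutually exclusive: the `config.js` changes later in this diff throw when both are set and record whether the prompt came from a file or inline text. A minimal sketch of that rule follows; `resolveSummaryPrompt` and the injected `readFile` callback are hypothetical names chosen so the example runs without touching the disk.

```javascript
// Sketch of the "one or the other" rule for custom prompt guidance.
// `resolveSummaryPrompt` and `readFile` are illustrative names, not DiffDoc's API.
function resolveSummaryPrompt(options, readFile) {
  // Blank or whitespace-only values count as "not set".
  const clean = (value) => (value && value.trim() ? value : undefined);
  const inline = clean(options.summaryPrompt);
  const file = clean(options.summaryPromptFile);

  if (inline && file) {
    throw new Error("Configure either summaryPrompt or summaryPromptFile, not both.");
  }
  if (file) {
    return { text: readFile(file), source: file };
  }
  if (inline) {
    return { text: inline, source: "inline" };
  }
  return { text: undefined, source: undefined };
}

// Usage with a fake file reader standing in for the file system.
const fakeFiles = { "./diffdoc-summary-prompt.md": "Emphasize billing behavior." };
console.log(
  resolveSummaryPrompt({ summaryPromptFile: "./diffdoc-summary-prompt.md" }, (p) => fakeFiles[p])
);
```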
247
+
248
+ Raw code snapshots are optional. DiffDoc normally stores file path and content hash metadata so tools can look up source files from the repository when needed. Store raw code snapshots only when you need exported, offline, or point-in-time audit artifacts to include source text:
239
249
 
240
250
  ```bash
241
251
  npx diffdoc summarize --path . --mode all --include-code-snapshot
242
252
  ```
243
253
 
254
+ Snapshots increase artifact size and duplicate source code, which can include sensitive or proprietary content.
255
+
244
256
  Check manifest and index freshness:
245
257
 
246
258
  ```bash
@@ -248,6 +260,8 @@ npx diffdoc status
248
260
  npx diffdoc status --json
249
261
  ```
250
262
 
263
+ `status` also recommends the next command to run. It prioritizes refreshing missing or stale summaries before rebuilding the vector index.
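A script that consumes `status --json` could act on the recommendation directly. The sketch below assumes the report fields visible in the `status.js` changes later in this diff (`summaryFreshness`, `nextCommand`, `nextCommandReason`); `describeNextStep` itself is a hypothetical helper, and the sample report is made up for illustration.

```javascript
// Hypothetical consumer of a parsed `status --json` report.
// Field names follow buildStatusReport in this diff; the helper is illustrative.
function describeNextStep(report) {
  const { status, missing, stale } = report.summaryFreshness;
  const summaryLine = `summaries: ${status} (missing: ${missing}, stale: ${stale})`;
  // `nextCommand` is null when summaries and index are both fresh.
  const actionLine = report.nextCommand === null
    ? `nothing to run: ${report.nextCommandReason}`
    : `run "${report.nextCommand}": ${report.nextCommandReason}`;
  return { ok: report.nextCommand === null, lines: [summaryLine, actionLine] };
}

// Usage with a made-up report: stale summaries take priority over the index.
const sampleReport = {
  summaryFreshness: { status: "stale", missing: 0, stale: 3 },
  indexFreshness: { status: "stale" },
  nextCommand: "diffdoc summarize --mode all --refresh",
  nextCommandReason: "summary artifacts are missing or stale"
};
console.log(describeNextStep(sampleReport).lines.join("\n"));
```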
264
+
251
265
  Embed summaries into the local Vectra index:
252
266
 
253
267
  ```bash
@@ -305,13 +319,47 @@ Each summary asset is portable JSON:
305
319
 
306
320
  ```json
307
321
  {
308
- "schemaVersion": 1,
322
+ "schemaVersion": 2,
309
323
  "content_hash": "md5-string",
310
- "summary": "Plain-English explanation text here.",
324
+ "metadata": {
325
+ "file_path": "src/example.ts",
326
+ "file_name": "example.ts",
327
+ "extension": ".ts",
328
+ "line_count": 42,
329
+ "byte_size": 1200,
330
+ "content_hash": "md5-string",
331
+ "generated_at": "2026-05-27T00:00:00.000Z",
332
+ "generator": {
333
+ "provider": "local",
334
+ "model": "qwen2.5-coder:7b",
335
+ "base_url": "http://localhost:11434/v1"
336
+ },
337
+ "prompt_version": 1,
338
+ "summary_format": "structured-functional-v1"
339
+ },
340
+ "summary": "## Metadata\n- File path: src/example.ts\n...",
311
341
  "raw_code_snapshot": "Optional code text when --include-code-snapshot is enabled"
312
342
  }
313
343
  ```
314
344
 
345
+ The JSON `metadata` contains deterministic source and generation facts. The markdown `summary` begins with `## Metadata`, which is embedded with the rest of the summary so file paths, hashes, inferred language/type, symbols, functions, classes, and dependencies are searchable. Language/type and symbol/dependency details are inferred by the model from the file path, extension, and code content rather than maintained through a static parser.
346
+
347
+ Structured summaries use these sections in order:
348
+
349
+ ```md
350
+ ## Metadata
351
+ ## Purpose
352
+ ## User-Visible Behavior
353
+ ## Business Rules
354
+ ## Data Inputs And Outputs
355
+ ## Side Effects
356
+ ## Error And Edge Cases
357
+ ## Dependencies
358
+ ## Operational Notes
359
+ ```
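Because the section text comes from a model, a consumer may want to confirm that a given summary actually follows this layout. The check below is a sketch under that assumption; nothing in this diff shows DiffDoc performing such a check, and `checkSectionOrder` is a hypothetical helper.

```javascript
// Section headings in the order listed in the README.
const EXPECTED_SECTIONS = [
  "## Metadata",
  "## Purpose",
  "## User-Visible Behavior",
  "## Business Rules",
  "## Data Inputs And Outputs",
  "## Side Effects",
  "## Error And Edge Cases",
  "## Dependencies",
  "## Operational Notes"
];

// Hypothetical check: report missing sections and whether the known ones
// appear in the documented order (duplicates count as out of order).
function checkSectionOrder(summary) {
  const headings = summary
    .split(/\r?\n/)
    .filter((line) => line.startsWith("## "))
    .map((line) => line.trim());
  const missing = EXPECTED_SECTIONS.filter((section) => !headings.includes(section));
  const present = headings.filter((heading) => EXPECTED_SECTIONS.includes(heading));
  const expectedOrder = EXPECTED_SECTIONS.filter((section) => headings.includes(section));
  const inOrder =
    present.length === expectedOrder.length &&
    present.every((heading, i) => heading === expectedOrder[i]);
  return { missing, inOrder };
}

// Usage: a truncated summary is flagged with the sections it lacks.
console.log(checkSectionOrder("## Metadata\n- File path: src/example.ts\n## Purpose\nParses input."));
```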
360
+
361
+ Summary assets are regenerated when the source hash changes, summary schema changes, prompt version changes, summary format changes, custom prompt hash changes, provider/model changes, or `--refresh` is passed. Regenerate existing schema `1` artifacts with `npx diffdoc summarize --mode all --refresh`. The `embed` command remains tolerant of older summary assets as long as they contain a content hash and summary text; use `status` or `summarize` to identify and refresh stale metadata.
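The regeneration triggers listed above can be read as a series of field comparisons. The sketch below names each mismatch instead of returning a single boolean; it is loosely modeled on the `isSummaryAssetFresh` check that appears later in this diff, but `stalenessReasons` and the `expected` object shape are illustrative assumptions, and the `--refresh` flag (which forces regeneration regardless) is left out.

```javascript
// Hypothetical helper: list why a summary asset would be regenerated.
// An empty array means the asset matches what the current run would produce.
function stalenessReasons(asset, expected) {
  const reasons = [];
  const metadata = asset.metadata || {};
  const generator = metadata.generator || {};

  if (asset.schemaVersion !== expected.schemaVersion) reasons.push("summary schema changed");
  if (asset.content_hash !== expected.hash) reasons.push("source hash changed");
  if (metadata.prompt_version !== expected.promptVersion) reasons.push("prompt version changed");
  if (metadata.summary_format !== expected.summaryFormat) reasons.push("summary format changed");
  if (metadata.custom_prompt_hash !== expected.customPromptHash) reasons.push("custom prompt changed");
  if (generator.provider !== expected.provider || generator.model !== expected.model) {
    reasons.push("provider or model changed");
  }
  return reasons;
}

// Usage: a schema 1 asset has no metadata, so several checks fail at once.
const expectedNow = {
  schemaVersion: 2,
  hash: "md5-string",
  promptVersion: 1,
  summaryFormat: "structured-functional-v1",
  customPromptHash: undefined,
  provider: "local",
  model: "qwen2.5-coder:7b"
};
const schemaOneAsset = { schemaVersion: 1, content_hash: "md5-string", summary: "Plain text." };
console.log(stalenessReasons(schemaOneAsset, expectedNow));
```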
362
+
315
363
  Commit `.diffdoc/manifest.json` and `.diffdoc/summaries/*.json` if you want summaries shared across machines or CI runs. Keep `.diffdoc/vectra/` local unless you have a specific reason to commit the generated vector index.
316
364
 
317
365
  The manifest and summary assets are the stable handoff point for consumers. The local Vectra index produced by `diffdoc embed` is optional and can be replaced by any embedding model and storage backend that fits your environment.
@@ -22,36 +22,25 @@ function getSummaryPath(summaryDir, hash) {
22
22
  return node_path_1.default.resolve(summaryDir, `${hash}.json`);
23
23
  }
24
24
  async function readManifest(manifestPath) {
25
- const parsed = JSON.parse(await promises_1.default.readFile(manifestPath, "utf8"));
26
- if (parsed.schemaVersion !== artifacts_1.MANIFEST_SCHEMA_VERSION) {
27
- throw new Error(`Unsupported manifest schema in ${manifestPath}. Expected schemaVersion ${artifacts_1.MANIFEST_SCHEMA_VERSION}.`);
25
+ const raw = JSON.parse(await promises_1.default.readFile(manifestPath, "utf8"));
26
+ const result = artifacts_1.RepoManifestSchema.safeParse(raw);
27
+ if (!result.success) {
28
+ const issues = result.error.issues.map((i) => ` - ${i.path.join(".")}: ${i.message}`).join("\n");
29
+ throw new Error(`Invalid manifest in ${manifestPath}:\n${issues}`);
28
30
  }
29
- return {
30
- schemaVersion: artifacts_1.MANIFEST_SCHEMA_VERSION,
31
- lastSyncedCommit: typeof parsed.lastSyncedCommit === "string" ? parsed.lastSyncedCommit : "",
32
- files: parsed.files && typeof parsed.files === "object" ? parsed.files : {}
33
- };
31
+ return result.data;
34
32
  }
35
33
  async function readSummaryAsset(summaryPath) {
36
- const parsed = JSON.parse(await promises_1.default.readFile(summaryPath, "utf8"));
37
- if (parsed.schemaVersion !== artifacts_1.SUMMARY_ASSET_SCHEMA_VERSION) {
38
- throw new Error(`Unsupported summary schema in ${summaryPath}. Expected schemaVersion ${artifacts_1.SUMMARY_ASSET_SCHEMA_VERSION}.`);
34
+ const raw = JSON.parse(await promises_1.default.readFile(summaryPath, "utf8"));
35
+ const result = artifacts_1.SummaryAssetSchema.safeParse(raw);
36
+ if (!result.success) {
37
+ const issues = result.error.issues.map((i) => ` - ${i.path.join(".")}: ${i.message}`).join("\n");
38
+ throw new Error(`Invalid summary asset in ${summaryPath}:\n${issues}`);
39
39
  }
40
- if (typeof parsed.content_hash !== "string") {
41
- throw new Error(`Invalid summary hash in ${summaryPath}.`);
42
- }
43
- if (typeof parsed.summary !== "string") {
44
- throw new Error(`Invalid summary text in ${summaryPath}.`);
45
- }
46
- return {
47
- schemaVersion: artifacts_1.SUMMARY_ASSET_SCHEMA_VERSION,
48
- content_hash: parsed.content_hash,
49
- summary: parsed.summary,
50
- raw_code_snapshot: typeof parsed.raw_code_snapshot === "string" ? parsed.raw_code_snapshot : undefined
51
- };
40
+ return result.data;
52
41
  }
53
- function buildDocument(filePath, summaryText) {
54
- return `File: ${filePath}\nSummary: ${summaryText}`;
42
+ function buildDocument(summaryAsset) {
43
+ return summaryAsset.summary;
55
44
  }
56
45
  async function runEmbed(options, config) {
57
46
  const manifestPath = (0, paths_1.resolveDiffdocArtifactPath)(options.manifest, config.baseDir);
@@ -96,7 +85,7 @@ async function runEmbed(options, config) {
96
85
  hash,
97
86
  summaryText: summaryAsset.summary,
98
87
  rawCodeSnapshot: summaryAsset.raw_code_snapshot,
99
- document: buildDocument(filePath, summaryAsset.summary)
88
+ document: buildDocument(summaryAsset)
100
89
  });
101
90
  }
102
91
  const activePathSet = new Set(entries.map(([filePath]) => filePath));
@@ -8,7 +8,10 @@ const promises_1 = __importDefault(require("node:fs/promises"));
8
8
  const node_path_1 = __importDefault(require("node:path"));
9
9
  const promises_2 = require("node:readline/promises");
10
10
  const node_process_1 = require("node:process");
11
+ const SCHEMA_BASE_URL = "https://raw.githubusercontent.com/sullyTheDev/diffdoc";
12
+ const PKG_VERSION = require("../../package.json").version;
11
13
  const DEFAULT_CONFIG = {
14
+ $schema: `${SCHEMA_BASE_URL}/v${PKG_VERSION}/schemas/diffdocrc.schema.json`,
12
15
  baseDir: "./.diffdoc",
13
16
  aiProvider: "local",
14
17
  localLlmEndpoint: "http://localhost:11434/v1",
@@ -10,6 +10,7 @@ const vectra_1 = require("vectra");
10
10
  const embed_1 = require("./embed");
11
11
  const artifacts_1 = require("../types/artifacts");
12
12
  const paths_1 = require("../utils/paths");
13
+ const llm_1 = require("../utils/llm");
13
14
  function getSummaryDir(manifestPath) {
14
15
  return node_path_1.default.resolve(node_path_1.default.dirname(manifestPath), "summaries");
15
16
  }
@@ -25,18 +26,12 @@ async function readManifest(manifestPath) {
25
26
  }
26
27
  throw error;
27
28
  }
28
- if (!parsed || typeof parsed !== "object" || Array.isArray(parsed)) {
29
- throw new Error(`Invalid manifest JSON in ${manifestPath}. Expected an object.`);
29
+ const result = artifacts_1.RepoManifestSchema.safeParse(parsed);
30
+ if (!result.success) {
31
+ const issues = result.error.issues.map((i) => ` - ${i.path.join(".")}: ${i.message}`).join("\n");
32
+ throw new Error(`Invalid manifest in ${manifestPath}:\n${issues}`);
30
33
  }
31
- const manifest = parsed;
32
- if (manifest.schemaVersion !== artifacts_1.MANIFEST_SCHEMA_VERSION) {
33
- throw new Error(`Unsupported manifest schema in ${manifestPath}. Expected schemaVersion ${artifacts_1.MANIFEST_SCHEMA_VERSION}.`);
34
- }
35
- return {
36
- schemaVersion: artifacts_1.MANIFEST_SCHEMA_VERSION,
37
- lastSyncedCommit: typeof manifest.lastSyncedCommit === "string" ? manifest.lastSyncedCommit : "",
38
- files: manifest.files && typeof manifest.files === "object" ? manifest.files : {}
39
- };
34
+ return result.data;
40
35
  }
41
36
  async function getSummaryStats(manifestPath, manifest) {
42
37
  const summaryDir = getSummaryDir(manifestPath);
@@ -64,10 +59,36 @@ async function getSummaryStats(manifestPath, manifest) {
64
59
  missingFromManifestCount += 1;
65
60
  }
66
61
  }
62
+ let staleCount = 0;
63
+ for (const hash of manifestHashes) {
64
+ if (!summaryHashes.has(hash)) {
65
+ continue;
66
+ }
67
+ try {
68
+ const raw = JSON.parse(await promises_1.default.readFile(node_path_1.default.resolve(summaryDir, `${hash}.json`), "utf8"));
69
+ const result = artifacts_1.SummaryAssetSchema.safeParse(raw);
70
+ if (!result.success) {
71
+ staleCount += 1;
72
+ continue;
73
+ }
74
+ const asset = result.data;
75
+ if (asset.content_hash !== hash ||
76
+ !asset.metadata ||
77
+ asset.metadata.content_hash !== hash ||
78
+ asset.metadata.prompt_version !== llm_1.SUMMARY_PROMPT_VERSION ||
79
+ asset.metadata.summary_format !== llm_1.SUMMARY_FORMAT) {
80
+ staleCount += 1;
81
+ }
82
+ }
83
+ catch {
84
+ staleCount += 1;
85
+ }
86
+ }
67
87
  return {
68
88
  summaryFileCount: summaryHashes.size,
69
89
  orphanCount,
70
- missingFromManifestCount
90
+ missingFromManifestCount,
91
+ staleCount
71
92
  };
72
93
  }
73
94
  async function getIndexFreshness(manifest, config) {
@@ -120,22 +141,58 @@ async function getIndexFreshness(manifest, config) {
120
141
  };
121
142
  }
122
143
  function formatSummaryFreshness(stats) {
123
- if (stats.missingFromManifestCount === 0) {
144
+ if (stats.missingFromManifestCount === 0 && stats.staleCount === 0) {
124
145
  return "fresh";
125
146
  }
126
- return `stale (missing: ${stats.missingFromManifestCount})`;
147
+ return `stale (missing: ${stats.missingFromManifestCount}, stale: ${stats.staleCount})`;
148
+ }
149
+ function buildSummarizeCommand(manifestOption) {
150
+ const command = "diffdoc summarize --mode all --refresh";
151
+ return manifestOption === "manifest.json" ? command : `${command} --out ${manifestOption}`;
152
+ }
153
+ function buildEmbedCommand(manifestOption) {
154
+ const command = "diffdoc embed";
155
+ return manifestOption === "manifest.json" ? command : `${command} --manifest ${manifestOption}`;
156
+ }
157
+ function getNextCommand(manifestOption, summaryStats, indexFreshness) {
158
+ if (summaryStats.missingFromManifestCount > 0 || summaryStats.staleCount > 0) {
159
+ return {
160
+ command: buildSummarizeCommand(manifestOption),
161
+ reason: "summary artifacts are missing or stale"
162
+ };
163
+ }
164
+ if (indexFreshness.status === "missing") {
165
+ return {
166
+ command: buildEmbedCommand(manifestOption),
167
+ reason: "vector index is missing"
168
+ };
169
+ }
170
+ if (indexFreshness.status === "stale") {
171
+ return {
172
+ command: buildEmbedCommand(manifestOption),
173
+ reason: "vector index is stale"
174
+ };
175
+ }
176
+ return {
177
+ command: null,
178
+ reason: "summaries and index are fresh"
179
+ };
127
180
  }
128
- function buildStatusReport(manifest, summaryStats, indexFreshness) {
181
+ function buildStatusReport(manifest, summaryStats, indexFreshness, manifestOption) {
182
+ const nextCommand = getNextCommand(manifestOption, summaryStats, indexFreshness);
129
183
  return {
130
184
  manifestSchema: manifest.schemaVersion,
131
185
  trackedFileCount: Object.keys(manifest.files).length,
132
186
  summaryFileCount: summaryStats.summaryFileCount,
133
187
  orphanCount: summaryStats.orphanCount,
134
188
  summaryFreshness: {
135
- status: summaryStats.missingFromManifestCount === 0 ? "fresh" : "stale",
136
- missing: summaryStats.missingFromManifestCount
189
+ status: summaryStats.missingFromManifestCount === 0 && summaryStats.staleCount === 0 ? "fresh" : "stale",
190
+ missing: summaryStats.missingFromManifestCount,
191
+ stale: summaryStats.staleCount
137
192
  },
138
- indexFreshness
193
+ indexFreshness,
194
+ nextCommand: nextCommand.command,
195
+ nextCommandReason: nextCommand.reason
139
196
  };
140
197
  }
141
198
  function formatIndexFreshness(freshness) {
@@ -152,7 +209,7 @@ async function runStatus(options, config) {
152
209
  const manifest = await readManifest(manifestPath);
153
210
  const summaryStats = await getSummaryStats(manifestPath, manifest);
154
211
  const indexFreshness = await getIndexFreshness(manifest, config);
155
- const report = buildStatusReport(manifest, summaryStats, indexFreshness);
212
+ const report = buildStatusReport(manifest, summaryStats, indexFreshness, options.manifest);
156
213
  if (options.json) {
157
214
  console.log(JSON.stringify(report, null, 2));
158
215
  return;
@@ -161,6 +218,10 @@ async function runStatus(options, config) {
161
218
  console.log(`tracked files: ${report.trackedFileCount}`);
162
219
  console.log(`summary files: ${report.summaryFileCount}`);
163
220
  console.log(`orphans: ${report.orphanCount}`);
221
+ console.log(`stale summaries: ${report.summaryFreshness.stale}`);
164
222
  console.log(`summary freshness: ${formatSummaryFreshness(summaryStats)}`);
165
223
  console.log(`index freshness: ${formatIndexFreshness(indexFreshness)}`);
224
+ console.log("");
225
+ console.log(`next command: ${report.nextCommand || "none"}`);
226
+ console.log(`reason: ${report.nextCommandReason}`);
166
227
  }
@@ -12,6 +12,10 @@ const git_1 = require("../utils/git");
12
12
  const hashing_1 = require("../utils/hashing");
13
13
  const llm_1 = require("../utils/llm");
14
14
  const paths_1 = require("../utils/paths");
15
+ const SCHEMA_BASE_URL = "https://raw.githubusercontent.com/sullyTheDev/diffdoc";
16
+ const PKG_VERSION = require("../../package.json").version;
17
+ const MANIFEST_SCHEMA_URL = `${SCHEMA_BASE_URL}/v${PKG_VERSION}/schemas/manifest.schema.json`;
18
+ const SUMMARY_ASSET_SCHEMA_URL = `${SCHEMA_BASE_URL}/v${PKG_VERSION}/schemas/summary-asset.schema.json`;
15
19
  function normalizeRelativePath(filePath) {
16
20
  return filePath.split(node_path_1.default.sep).join("/");
17
21
  }
@@ -71,15 +75,6 @@ function shouldIncludeFile(filePath, includeGlobs, excludeGlobs, ignoreMatcher)
71
75
  function isIgnoredDirectory(dirPath, ignoreMatcher) {
72
76
  return ignoreMatcher.ignores(dirPath) || ignoreMatcher.ignores(`${dirPath}/`);
73
77
  }
74
- async function fileExists(filePath) {
75
- try {
76
- await promises_1.default.access(filePath);
77
- return true;
78
- }
79
- catch {
80
- return false;
81
- }
82
- }
83
78
  async function atomicWriteUtf8(targetPath, content) {
84
79
  await promises_1.default.mkdir(node_path_1.default.dirname(targetPath), { recursive: true });
85
80
  const tempPath = `${targetPath}.${process.pid}.${Date.now()}.tmp`;
@@ -99,22 +94,87 @@ async function writeManifest(manifestPath, manifest) {
99
94
  async function writeSummaryAsset(summaryPath, summary) {
100
95
  await atomicWriteUtf8(summaryPath, `${JSON.stringify(summary, null, 2)}\n`);
101
96
  }
97
+ function getPromptHash(config) {
98
+ return config.summarize.resolvedSummaryPrompt
99
+ ? (0, hashing_1.hashTextContent)(config.summarize.resolvedSummaryPrompt)
100
+ : undefined;
101
+ }
102
+ function buildSummaryMetadata(params) {
103
+ return {
104
+ file_path: params.filePath,
105
+ file_name: node_path_1.default.basename(params.filePath),
106
+ extension: node_path_1.default.extname(params.filePath),
107
+ line_count: params.rawCodeSnapshot.length === 0 ? 0 : params.rawCodeSnapshot.split(/\r\n|\r|\n/).length,
108
+ byte_size: Buffer.byteLength(params.rawCodeSnapshot, "utf8"),
109
+ content_hash: params.hash,
110
+ generated_at: params.generatedAt,
111
+ generator: {
112
+ provider: params.config.provider,
113
+ model: params.config.chat.model
114
+ },
115
+ prompt_version: llm_1.SUMMARY_PROMPT_VERSION,
116
+ summary_format: llm_1.SUMMARY_FORMAT,
117
+ custom_prompt_hash: params.customPromptHash,
118
+ custom_prompt_source: params.customPromptSource
119
+ };
120
+ }
121
+ function isRecord(value) {
122
+ return Boolean(value) && typeof value === "object" && !Array.isArray(value);
123
+ }
124
+ function hasExpectedCustomPromptHash(metadata, customPromptHash) {
125
+ const actual = typeof metadata.custom_prompt_hash === "string" ? metadata.custom_prompt_hash : undefined;
126
+ return actual === customPromptHash;
127
+ }
128
+ async function isSummaryAssetFresh(summaryPath, expected) {
129
+ let parsed;
130
+ try {
131
+ parsed = JSON.parse(await promises_1.default.readFile(summaryPath, "utf8"));
132
+ }
133
+ catch {
134
+ return false;
135
+ }
136
+ if (!isRecord(parsed)) {
137
+ return false;
138
+ }
139
+ if (parsed.schemaVersion !== artifacts_1.SUMMARY_ASSET_SCHEMA_VERSION || parsed.content_hash !== expected.hash) {
140
+ return false;
141
+ }
142
+ if (expected.includeCodeSnapshot !== (typeof parsed.raw_code_snapshot === "string")) {
143
+ return false;
144
+ }
145
+ if (!isRecord(parsed.metadata)) {
146
+ return false;
147
+ }
148
+ const metadata = parsed.metadata;
149
+ if (metadata.content_hash !== expected.hash) {
150
+ return false;
151
+ }
152
+ if (metadata.prompt_version !== expected.promptVersion || metadata.summary_format !== expected.summaryFormat) {
153
+ return false;
154
+ }
155
+ if (!hasExpectedCustomPromptHash(metadata, expected.customPromptHash)) {
156
+ return false;
157
+ }
158
+ if (!isRecord(metadata.generator)) {
159
+ return false;
160
+ }
161
+ return metadata.generator.provider === expected.provider && metadata.generator.model === expected.model;
162
+ }
102
163
  async function readManifest(manifestPath) {
103
164
  try {
104
- const parsed = JSON.parse(await promises_1.default.readFile(manifestPath, "utf8"));
105
- if (parsed.schemaVersion !== artifacts_1.MANIFEST_SCHEMA_VERSION) {
106
- throw new Error(`Unsupported manifest schema in ${manifestPath}. Expected schemaVersion ${artifacts_1.MANIFEST_SCHEMA_VERSION}.`);
165
+ const raw = JSON.parse(await promises_1.default.readFile(manifestPath, "utf8"));
166
+ const result = artifacts_1.RepoManifestSchema.safeParse(raw);
167
+ if (!result.success) {
168
+ const issues = result.error.issues.map((i) => ` - ${i.path.join(".")}: ${i.message}`).join("\n");
169
+ throw new Error(`Invalid manifest in ${manifestPath}:\n${issues}`);
107
170
  }
108
- return {
109
- schemaVersion: artifacts_1.MANIFEST_SCHEMA_VERSION,
110
- lastSyncedCommit: typeof parsed.lastSyncedCommit === "string" ? parsed.lastSyncedCommit : "",
111
- files: parsed.files && typeof parsed.files === "object" ? parsed.files : {}
112
- };
171
+ return result.data;
113
172
  }
114
173
  catch (error) {
115
174
  const nodeError = error;
116
175
  if (nodeError.code === "ENOENT") {
117
176
  return {
177
+ $schema: MANIFEST_SCHEMA_URL,
118
178
  schemaVersion: artifacts_1.MANIFEST_SCHEMA_VERSION,
119
179
  lastSyncedCommit: "",
120
180
  files: {}
@@ -210,14 +270,13 @@ async function removeManifestPath(filePath, manifest, manifestPath, summaryDir,
210
270
  await deleteSummaryIfUnreferenced(summaryDir, previousHash, refs);
211
271
  return true;
212
272
  }
213
- async function ensureSummaryAsset(summaryDir, hash, summaryText, rawCodeSnapshot, includeCodeSnapshot) {
273
+ async function ensureSummaryAsset(summaryDir, hash, metadata, summaryText, rawCodeSnapshot, includeCodeSnapshot) {
214
274
  const summaryPath = getSummaryPath(summaryDir, hash);
215
- if (await fileExists(summaryPath)) {
216
- return;
217
- }
218
275
  const summary = {
276
+ $schema: SUMMARY_ASSET_SCHEMA_URL,
219
277
  schemaVersion: artifacts_1.SUMMARY_ASSET_SCHEMA_VERSION,
220
278
  content_hash: hash,
279
+ metadata,
221
280
  summary: summaryText,
222
281
  raw_code_snapshot: includeCodeSnapshot ? rawCodeSnapshot : undefined
223
282
  };
@@ -285,7 +344,18 @@ async function runSummarize(options, config) {
285
344
  : config.summarize.excludeGlobs.map(normalizeGlobPattern));
286
345
  const ignoreFile = options.ignoreFile || config.summarize.ignoreFile;
287
346
  const ignoreMatcher = await readIgnoreMatcher(repoPath, ignoreFile);
288
- const totals = { scanned: 0, skipped: 0, updated: 0, failed: 0, pruned: 0 };
347
+ const customPromptHash = getPromptHash(config);
348
+ const customPromptSource = customPromptHash ? config.summarize.summaryPromptSource : undefined;
349
+ const summaryFreshnessExpected = (hash) => ({
350
+ hash,
351
+ promptVersion: llm_1.SUMMARY_PROMPT_VERSION,
352
+ summaryFormat: llm_1.SUMMARY_FORMAT,
353
+ customPromptHash,
354
+ provider: config.provider,
355
+ model: config.chat.model,
356
+ includeCodeSnapshot: options.includeCodeSnapshot
357
+ });
358
+ const totals = { scanned: 0, skipped: 0, updated: 0, refreshed: 0, failed: 0, pruned: 0 };
289
359
  const failures = [];
290
360
  const isJson = options.json;
291
361
  const concurrency = config.summarize.concurrency;
@@ -293,20 +363,31 @@ async function runSummarize(options, config) {
293
363
  const summaryAssetTasks = new Map();
294
364
  async function ensureSummaryAssetForFile(filePath, hash, rawCodeSnapshot) {
295
365
  const summaryPath = getSummaryPath(summaryDir, hash);
296
- if (await fileExists(summaryPath)) {
297
- return;
366
+ if (!options.refresh && await isSummaryAssetFresh(summaryPath, summaryFreshnessExpected(hash))) {
367
+ return false;
298
368
  }
299
369
  let task = summaryAssetTasks.get(hash);
300
370
  if (!task) {
301
371
  task = (async () => {
302
- const summaryText = await (0, llm_1.generateFunctionalSummary)(filePath, rawCodeSnapshot, config.chat);
303
- await ensureSummaryAsset(summaryDir, hash, summaryText, rawCodeSnapshot, options.includeCodeSnapshot);
372
+ const generatedAt = new Date().toISOString();
373
+ const metadata = buildSummaryMetadata({
374
+ filePath,
375
+ hash,
376
+ rawCodeSnapshot,
377
+ config,
378
+ generatedAt,
379
+ customPromptHash,
380
+ customPromptSource
381
+ });
382
+ const summaryText = await (0, llm_1.generateFunctionalSummary)(filePath, rawCodeSnapshot, metadata, config.chat, config.summarize.resolvedSummaryPrompt);
383
+ await ensureSummaryAsset(summaryDir, hash, metadata, summaryText, rawCodeSnapshot, options.includeCodeSnapshot);
384
+ return true;
304
385
  })().finally(() => {
305
386
  summaryAssetTasks.delete(hash);
306
387
  });
307
388
  summaryAssetTasks.set(hash, task);
308
389
  }
309
- await task;
390
+ return task;
310
391
  }
311
392
  if (!isJson) {
312
393
  console.log(`Starting summarize run`);
@@ -403,11 +484,17 @@ async function runSummarize(options, config) {
403
484
  const rawCodeSnapshot = await promises_1.default.readFile(absolutePath, "utf8");
404
485
  const hash = (0, hashing_1.hashFileContent)(rawCodeSnapshot);
405
486
  if (previousHash === hash) {
487
+ const regenerated = await ensureSummaryAssetForFile(filePath, hash, rawCodeSnapshot);
406
488
  await withManifestLock(async () => {
407
- totals.skipped += 1;
489
+ if (regenerated) {
490
+ totals.refreshed += 1;
491
+ }
492
+ else {
493
+ totals.skipped += 1;
494
+ }
408
495
  completedModified += 1;
409
496
  if (!isJson) {
410
- console.log(`[${completedModified}/${deltas.modifiedOrAdded.length}] unchanged ${filePath}`);
497
+ console.log(`[${completedModified}/${deltas.modifiedOrAdded.length}] ${regenerated ? "refreshed" : "unchanged"} ${filePath}`);
411
498
  }
412
499
  });
413
500
  return;
@@ -481,6 +568,7 @@ async function runSummarize(options, config) {
481
568
  console.log(`Summarize complete`);
482
569
  console.log(`Scanned: ${totals.scanned}`);
483
570
  console.log(`Updated: ${totals.updated}`);
571
+ console.log(`Refreshed: ${totals.refreshed}`);
484
572
  console.log(`Skipped: ${totals.skipped}`);
485
573
  console.log(`Pruned: ${totals.pruned}`);
486
574
  console.log(`Failed: ${totals.failed}`);
@@ -0,0 +1,113 @@
1
+ "use strict";
2
+ var __importDefault = (this && this.__importDefault) || function (mod) {
3
+ return (mod && mod.__esModule) ? mod : { "default": mod };
4
+ };
5
+ Object.defineProperty(exports, "__esModule", { value: true });
6
+ exports.runValidate = runValidate;
7
+ const promises_1 = __importDefault(require("node:fs/promises"));
8
+ const node_path_1 = __importDefault(require("node:path"));
9
+ const artifacts_1 = require("../types/artifacts");
10
+ const paths_1 = require("../utils/paths");
11
+ function getSummaryDir(manifestPath) {
12
+ return node_path_1.default.resolve(node_path_1.default.dirname(manifestPath), "summaries");
13
+ }
14
+ async function runValidate(options, config) {
15
+ const manifestPath = (0, paths_1.resolveDiffdocArtifactPath)(options.manifest, config.baseDir);
16
+ const issues = [];
17
+ let manifestValid = false;
18
+ let summaryAssetsChecked = 0;
19
+ let summaryAssetsValid = 0;
20
+ // Validate manifest
21
+ let manifestData;
22
+ try {
23
+ manifestData = JSON.parse(await promises_1.default.readFile(manifestPath, "utf8"));
24
+ }
25
+ catch (error) {
26
+ const nodeError = error;
27
+ if (nodeError.code === "ENOENT") {
28
+ issues.push({ file: manifestPath, path: "", message: "Manifest file not found." });
29
+ }
30
+ else {
31
+ issues.push({ file: manifestPath, path: "", message: `Failed to parse JSON: ${error.message}` });
32
+ }
33
+ manifestData = undefined;
34
+ }
35
+ if (manifestData !== undefined) {
36
+ const result = artifacts_1.RepoManifestSchema.safeParse(manifestData);
37
+ if (result.success) {
38
+ manifestValid = true;
39
+ // Validate each referenced summary asset
40
+ const summaryDir = getSummaryDir(manifestPath);
41
+ const hashes = new Set(Object.values(result.data.files));
42
+ for (const hash of hashes) {
43
+ summaryAssetsChecked += 1;
44
+ const summaryPath = node_path_1.default.resolve(summaryDir, `${hash}.json`);
45
+ let summaryRaw;
46
+ try {
47
+ summaryRaw = JSON.parse(await promises_1.default.readFile(summaryPath, "utf8"));
48
+ }
49
+ catch (error) {
50
+ const nodeError = error;
51
+ if (nodeError.code === "ENOENT") {
52
+ issues.push({ file: summaryPath, path: "", message: "Summary asset file not found." });
53
+ }
54
+ else {
55
+ issues.push({ file: summaryPath, path: "", message: `Failed to parse JSON: ${error.message}` });
56
+ }
57
+ continue;
58
+ }
59
+ const assetResult = artifacts_1.SummaryAssetSchema.safeParse(summaryRaw);
60
+ if (assetResult.success) {
61
+ // Cross-check content_hash matches filename
62
+ if (assetResult.data.content_hash !== hash) {
63
+ issues.push({ file: summaryPath, path: "content_hash", message: `Expected "${hash}" but got "${assetResult.data.content_hash}".` });
64
+ }
65
+ else {
66
+ summaryAssetsValid += 1;
67
+ }
68
+ }
69
+ else {
70
+ for (const issue of assetResult.error.issues) {
71
+ issues.push({ file: summaryPath, path: issue.path.join("."), message: issue.message });
72
+ }
73
+ }
74
+ }
75
+ }
76
+ else {
77
+ for (const issue of result.error.issues) {
78
+ issues.push({ file: manifestPath, path: issue.path.join("."), message: issue.message });
79
+ }
80
+ }
81
+ }
82
+ const report = {
83
+ valid: manifestValid && issues.length === 0,
84
+ manifestPath,
85
+ manifestValid,
86
+ summaryAssetsChecked,
87
+ summaryAssetsValid,
88
+ issues
89
+ };
90
+ if (options.json) {
91
+ console.log(JSON.stringify(report, null, 2));
92
+ }
93
+ else {
94
+ console.log(`Manifest: ${manifestPath}`);
95
+ console.log(`Manifest valid: ${manifestValid ? "yes" : "NO"}`);
96
+ console.log(`Summary assets checked: ${summaryAssetsChecked}`);
97
+ console.log(`Summary assets valid: ${summaryAssetsValid}`);
98
+ console.log("---");
99
+ if (issues.length === 0) {
100
+ console.log("All artifacts pass schema validation.");
101
+ }
102
+ else {
103
+ console.log(`Issues (${issues.length}):`);
104
+ for (const issue of issues) {
105
+ const location = issue.path ? `${issue.file} -> ${issue.path}` : issue.file;
106
+ console.log(` - ${location}: ${issue.message}`);
107
+ }
108
+ }
109
+ }
110
+ if (!report.valid) {
111
+ process.exitCode = 1;
112
+ }
113
+ }
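When `options.json` is set, `runValidate` prints a report with `valid`, `manifestValid`, the two summary counters, and a flat `issues` array of `{ file, path, message }` entries. A consumer might regroup those issues per file before displaying them. The helper below is a hypothetical sketch of that step, and the sample report is invented for illustration.

```javascript
// Hypothetical post-processing of the validate report: group issues by file.
function groupIssuesByFile(report) {
  const grouped = new Map();
  for (const issue of report.issues) {
    // Issues without a schema path (missing file, bad JSON) show the message only.
    const label = issue.path ? `${issue.path}: ${issue.message}` : issue.message;
    if (!grouped.has(issue.file)) {
      grouped.set(issue.file, []);
    }
    grouped.get(issue.file).push(label);
  }
  return grouped;
}

// Usage with a made-up report containing two problems in one asset.
const sampleValidateReport = {
  valid: false,
  manifestPath: ".diffdoc/manifest.json",
  manifestValid: true,
  summaryAssetsChecked: 2,
  summaryAssetsValid: 1,
  issues: [
    { file: ".diffdoc/summaries/abc.json", path: "metadata.file_path", message: "Required" },
    { file: ".diffdoc/summaries/abc.json", path: "", message: "Summary asset file not found." }
  ]
};
for (const [file, labels] of groupIssuesByFile(sampleValidateReport)) {
  console.log(`${file}\n  - ${labels.join("\n  - ")}`);
}
```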
package/dist/config.js CHANGED
@@ -6,6 +6,7 @@ Object.defineProperty(exports, "__esModule", { value: true });
6
6
  exports.buildRuntimeConfig = buildRuntimeConfig;
7
7
  const node_fs_1 = __importDefault(require("node:fs"));
8
8
  const node_path_1 = __importDefault(require("node:path"));
9
+ const schemas_1 = require("./schemas");
9
10
  function readOption(value, envName, fallback = "") {
10
11
  return value || process.env[envName] || fallback;
11
12
  }
@@ -36,6 +37,14 @@ function readPositiveIntegerOption(value, envName, fallback) {
36
37
  }
37
38
  return parsed;
38
39
  }
40
+ function readPromptOption(value, envName) {
41
+ const option = value ?? process.env[envName];
42
+ return option && option.trim() ? option : undefined;
43
+ }
44
+ function resolvePromptFile(promptFile) {
45
+ const resolvedPath = node_path_1.default.resolve(process.cwd(), promptFile);
46
+ return node_fs_1.default.readFileSync(resolvedPath, "utf8");
47
+ }
39
48
  function loadRcFile(configPath) {
40
49
  const resolvedPath = node_path_1.default.resolve(process.cwd(), configPath || ".diffdocrc");
41
50
  if (!node_fs_1.default.existsSync(resolvedPath)) {
@@ -48,7 +57,12 @@ function loadRcFile(configPath) {
      if (!parsed || typeof parsed !== "object" || Array.isArray(parsed)) {
          throw new Error(`Config file must contain a JSON object: ${resolvedPath}`);
      }
-     return parsed;
+     const result = schemas_1.DiffdocConfigSchema.safeParse(parsed);
+     if (!result.success) {
+         const issues = result.error.issues.map((i) => ` - ${i.path.join(".")}: ${i.message}`).join("\n");
+         throw new Error(`Invalid config file ${resolvedPath}:\n${issues}`);
+     }
+     return result.data;
  }
  function mergeConfigOptions(options) {
      const rcOptions = loadRcFile(options.config);
@@ -73,6 +87,13 @@ function buildRuntimeConfig(options, needs = { chat: true, embeddings: true }) {
      const excludeGlobs = readListOption(mergedOptions.excludeGlobs, "DIFFDOC_EXCLUDE_GLOBS");
      const ignoreFile = readOption(mergedOptions.ignoreFile, "DIFFDOC_IGNORE_FILE", ".diffdocignore");
      const summarizeConcurrency = readPositiveIntegerOption(mergedOptions.summarizeConcurrency, "DIFFDOC_SUMMARIZE_CONCURRENCY", 2);
+     const summaryPrompt = readPromptOption(mergedOptions.summaryPrompt, "DIFFDOC_SUMMARY_PROMPT");
+     const summaryPromptFile = readPromptOption(mergedOptions.summaryPromptFile, "DIFFDOC_SUMMARY_PROMPT_FILE");
+     if (summaryPrompt && summaryPromptFile) {
+         throw new Error("Configure either summaryPrompt or summaryPromptFile, not both.");
+     }
+     const resolvedSummaryPrompt = summaryPromptFile ? resolvePromptFile(summaryPromptFile) : summaryPrompt;
+     const summaryPromptSource = summaryPromptFile ? summaryPromptFile : summaryPrompt ? "inline" : undefined;
      const chatBaseURL = provider === "cloud"
          ? readOption(mergedOptions.cloudLlmEndpoint, "CLOUD_LLM_ENDPOINT", "https://api.openai.com/v1")
          : readOption(mergedOptions.localLlmEndpoint, "LOCAL_LLM_ENDPOINT");
@@ -118,7 +139,11 @@ function buildRuntimeConfig(options, needs = { chat: true, embeddings: true }) {
              includeGlobs,
              excludeGlobs,
              ignoreFile,
-             concurrency: summarizeConcurrency
+             concurrency: summarizeConcurrency,
+             summaryPrompt,
+             summaryPromptFile,
+             resolvedSummaryPrompt,
+             summaryPromptSource
          }
      };
  }
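The config changes above resolve the custom summary prompt from either an inline value or a file, and reject configurations that set both. The sketch below reproduces that precedence in isolation. `resolveSummaryPrompt`, the injected `env` object, and `fakeReadFile` are hypothetical stand-ins so the logic can run without touching `process.env` or the filesystem; the real code reads both directly.

```javascript
// Same check as readPromptOption above, with the environment value passed in.
function readPromptOption(value, envValue) {
    const option = value ?? envValue;
    return option && option.trim() ? option : undefined;
}

// Hypothetical wrapper around the exclusivity check and source labelling.
function resolveSummaryPrompt(options, env, readFile) {
    const summaryPrompt = readPromptOption(options.summaryPrompt, env.DIFFDOC_SUMMARY_PROMPT);
    const summaryPromptFile = readPromptOption(options.summaryPromptFile, env.DIFFDOC_SUMMARY_PROMPT_FILE);
    if (summaryPrompt && summaryPromptFile) {
        throw new Error("Configure either summaryPrompt or summaryPromptFile, not both.");
    }
    return {
        resolvedSummaryPrompt: summaryPromptFile ? readFile(summaryPromptFile) : summaryPrompt,
        summaryPromptSource: summaryPromptFile ? summaryPromptFile : summaryPrompt ? "inline" : undefined
    };
}

// Stand-in for fs.readFileSync so the example has no filesystem dependency.
const fakeReadFile = (path) => `contents of ${path}`;
console.log(resolveSummaryPrompt({ summaryPrompt: "Emphasize billing." }, {}, fakeReadFile));
console.log(resolveSummaryPrompt({}, { DIFFDOC_SUMMARY_PROMPT_FILE: "./prompt.md" }, fakeReadFile));
```

Because the option is read with `??`, an explicitly empty or whitespace-only option value does not fall back to the environment variable; it is simply treated as unset.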
package/dist/index.js CHANGED
@@ -8,6 +8,7 @@ const init_1 = require("./commands/init");
  const query_1 = require("./commands/query");
  const status_1 = require("./commands/status");
  const summarize_1 = require("./commands/summarize");
+ const validate_1 = require("./commands/validate");
  const program = new commander_1.Command();
  function collectOption(value, previous) {
      previous.push(value);
@@ -42,7 +43,7 @@ function addCloudEndpointAndKeyOptions(command) {
  program
      .name("diffdoc")
      .description("Translate repository code shifts into plain-English business context")
-     .version("0.5.0");
+     .version("0.6.0");
  program
      .command("init")
      .description("Initialize DiffDoc configuration for this repository")
@@ -72,6 +73,9 @@ addChatOptions(addBaseOptions(program
      .option("--exclude-glob <pattern>", "exclude glob pattern (repeatable)", collectOption, [])
      .option("--ignore-file <path>", "path to ignore pattern file relative to --path")
      .option("--summarize-concurrency <count>", "number of files to summarize concurrently")
+     .option("--summary-prompt <text>", "additional instructions for summary generation")
+     .option("--summary-prompt-file <path>", "path to additional summary prompt instructions")
+     .option("--refresh", "regenerate summaries even when source and summary metadata are fresh", false)
      .action(async (options) => {
      try {
          const config = (0, config_1.buildRuntimeConfig)(options, { chat: true });
@@ -83,7 +87,8 @@ addChatOptions(addBaseOptions(program
              json: options.json,
              includeGlobs: options.includeGlob,
              excludeGlobs: options.excludeGlob,
-             ignoreFile: options.ignoreFile
+             ignoreFile: options.ignoreFile,
+             refresh: options.refresh
          }, config);
      }
      catch (error) {
@@ -153,6 +158,21 @@ addBaseOptions(program
          process.exit(1);
      }
  });
+ addBaseOptions(program
+     .command("validate"))
+     .description("Validate manifest and summary assets against JSON schemas")
+     .option("--manifest <path>", "manifest input path under --base-dir", "manifest.json")
+     .option("--json", "print validation report as JSON for CI", false)
+     .action(async (options) => {
+     try {
+         const config = (0, config_1.buildRuntimeConfig)(options, { embeddings: false, chat: false });
+         await (0, validate_1.runValidate)({ manifest: options.manifest, json: options.json }, config);
+     }
+     catch (error) {
+         console.error(error instanceof Error ? error.message : error);
+         process.exit(1);
+     }
+ });
  program.parseAsync(process.argv).catch((error) => {
      console.error(error instanceof Error ? error.message : error);
      process.exit(1);
@@ -0,0 +1,71 @@
+ "use strict";
+ Object.defineProperty(exports, "__esModule", { value: true });
+ exports.SummaryAssetSchema = exports.SummaryMetadataSchema = exports.RepoManifestSchema = exports.DiffdocConfigSchema = exports.SUMMARY_ASSET_SCHEMA_VERSION = exports.MANIFEST_SCHEMA_VERSION = void 0;
+ const zod_1 = require("zod");
+ // ---------------------------------------------------------------------------
+ // Schema version constants
+ // ---------------------------------------------------------------------------
+ exports.MANIFEST_SCHEMA_VERSION = 2;
+ exports.SUMMARY_ASSET_SCHEMA_VERSION = 2;
+ // ---------------------------------------------------------------------------
+ // Configuration schema (.diffdocrc)
+ // ---------------------------------------------------------------------------
+ exports.DiffdocConfigSchema = zod_1.z.object({
+     baseDir: zod_1.z.string().optional(),
+     aiProvider: zod_1.z.enum(["local", "cloud"]).optional(),
+     localLlmEndpoint: zod_1.z.string().optional(),
+     localEmbedEndpoint: zod_1.z.string().optional(),
+     localChatModel: zod_1.z.string().optional(),
+     localEmbedModel: zod_1.z.string().optional(),
+     cloudLlmEndpoint: zod_1.z.string().optional(),
+     cloudChatModel: zod_1.z.string().optional(),
+     cloudEmbedModel: zod_1.z.string().optional(),
+     embedBatchSize: zod_1.z.union([zod_1.z.number().int().positive(), zod_1.z.string()]).optional(),
+     summarizeConcurrency: zod_1.z.union([zod_1.z.number().int().positive(), zod_1.z.string()]).optional(),
+     openaiApiKey: zod_1.z.string().optional(),
+     includeGlobs: zod_1.z.union([zod_1.z.array(zod_1.z.string()), zod_1.z.string()]).optional(),
+     excludeGlobs: zod_1.z.union([zod_1.z.array(zod_1.z.string()), zod_1.z.string()]).optional(),
+     ignoreFile: zod_1.z.string().optional(),
+     summaryPrompt: zod_1.z.string().optional(),
+     summaryPromptFile: zod_1.z.string().optional()
+ }).strict();
+ // ---------------------------------------------------------------------------
+ // Repository manifest schema
+ // ---------------------------------------------------------------------------
+ exports.RepoManifestSchema = zod_1.z.object({
+     $schema: zod_1.z.string().optional(),
+     schemaVersion: zod_1.z.literal(exports.MANIFEST_SCHEMA_VERSION),
+     lastSyncedCommit: zod_1.z.string(),
+     files: zod_1.z.record(zod_1.z.string(), zod_1.z.string())
+ });
+ // ---------------------------------------------------------------------------
+ // Summary metadata schema (nested within summary assets)
+ // ---------------------------------------------------------------------------
+ exports.SummaryMetadataSchema = zod_1.z.object({
+     file_path: zod_1.z.string(),
+     file_name: zod_1.z.string(),
+     extension: zod_1.z.string(),
+     line_count: zod_1.z.number().int().nonnegative(),
+     byte_size: zod_1.z.number().int().nonnegative(),
+     content_hash: zod_1.z.string(),
+     generated_at: zod_1.z.string(),
+     generator: zod_1.z.object({
+         provider: zod_1.z.string(),
+         model: zod_1.z.string()
+     }),
+     prompt_version: zod_1.z.number().int(),
+     summary_format: zod_1.z.string(),
+     custom_prompt_hash: zod_1.z.string().optional(),
+     custom_prompt_source: zod_1.z.string().optional()
+ });
+ // ---------------------------------------------------------------------------
+ // Summary asset schema (individual hash-named JSON files)
+ // ---------------------------------------------------------------------------
+ exports.SummaryAssetSchema = zod_1.z.object({
+     $schema: zod_1.z.string().optional(),
+     schemaVersion: zod_1.z.literal(exports.SUMMARY_ASSET_SCHEMA_VERSION),
+     content_hash: zod_1.z.string(),
+     metadata: exports.SummaryMetadataSchema.optional(),
+     summary: zod_1.z.string(),
+     raw_code_snapshot: zod_1.z.string().optional()
+ });
@@ -0,0 +1,32 @@
+ "use strict";
+ var __importDefault = (this && this.__importDefault) || function (mod) {
+     return (mod && mod.__esModule) ? mod : { "default": mod };
+ };
+ Object.defineProperty(exports, "__esModule", { value: true });
+ const node_fs_1 = __importDefault(require("node:fs"));
+ const node_path_1 = __importDefault(require("node:path"));
+ const zod_to_json_schema_1 = require("zod-to-json-schema");
+ const schemas_1 = require("../schemas");
+ const SCHEMA_BASE_URL = "https://raw.githubusercontent.com/sullyTheDev/diffdoc";
+ const VERSION = JSON.parse(node_fs_1.default.readFileSync(node_path_1.default.resolve(__dirname, "../../package.json"), "utf8")).version;
+ const schemas = [
+     { name: "diffdocrc.schema.json", zodSchema: schemas_1.DiffdocConfigSchema },
+     { name: "manifest.schema.json", zodSchema: schemas_1.RepoManifestSchema },
+     { name: "summary-asset.schema.json", zodSchema: schemas_1.SummaryAssetSchema }
+ ];
+ const outDir = node_path_1.default.resolve(__dirname, "../../schemas");
+ node_fs_1.default.mkdirSync(outDir, { recursive: true });
+ for (const entry of schemas) {
+     const jsonSchema = (0, zod_to_json_schema_1.zodToJsonSchema)(entry.zodSchema, {
+         name: entry.name.replace(".schema.json", ""),
+         $refStrategy: "none"
+     });
+     const schemaWithId = {
+         ...jsonSchema,
+         $id: `${SCHEMA_BASE_URL}/v${VERSION}/schemas/${entry.name}`
+     };
+     const outPath = node_path_1.default.resolve(outDir, entry.name);
+     node_fs_1.default.writeFileSync(outPath, `${JSON.stringify(schemaWithId, null, 2)}\n`);
+     console.log(`Generated: ${outPath}`);
+ }
+ console.log("Schema generation complete.");
@@ -1,5 +1,12 @@
  "use strict";
+ // Re-export all schema definitions and types from the canonical source.
+ // This file exists for backwards-compatible import paths.
  Object.defineProperty(exports, "__esModule", { value: true });
- exports.SUMMARY_ASSET_SCHEMA_VERSION = exports.MANIFEST_SCHEMA_VERSION = void 0;
- exports.MANIFEST_SCHEMA_VERSION = 2;
- exports.SUMMARY_ASSET_SCHEMA_VERSION = 1;
+ exports.DiffdocConfigSchema = exports.SummaryAssetSchema = exports.SummaryMetadataSchema = exports.RepoManifestSchema = exports.SUMMARY_ASSET_SCHEMA_VERSION = exports.MANIFEST_SCHEMA_VERSION = void 0;
+ var schemas_1 = require("../schemas");
+ Object.defineProperty(exports, "MANIFEST_SCHEMA_VERSION", { enumerable: true, get: function () { return schemas_1.MANIFEST_SCHEMA_VERSION; } });
+ Object.defineProperty(exports, "SUMMARY_ASSET_SCHEMA_VERSION", { enumerable: true, get: function () { return schemas_1.SUMMARY_ASSET_SCHEMA_VERSION; } });
+ Object.defineProperty(exports, "RepoManifestSchema", { enumerable: true, get: function () { return schemas_1.RepoManifestSchema; } });
+ Object.defineProperty(exports, "SummaryMetadataSchema", { enumerable: true, get: function () { return schemas_1.SummaryMetadataSchema; } });
+ Object.defineProperty(exports, "SummaryAssetSchema", { enumerable: true, get: function () { return schemas_1.SummaryAssetSchema; } });
+ Object.defineProperty(exports, "DiffdocConfigSchema", { enumerable: true, get: function () { return schemas_1.DiffdocConfigSchema; } });
@@ -1,7 +1,11 @@
  "use strict";
  Object.defineProperty(exports, "__esModule", { value: true });
  exports.hashFileContent = hashFileContent;
+ exports.hashTextContent = hashTextContent;
  const node_crypto_1 = require("node:crypto");
  function hashFileContent(fileContent) {
      return (0, node_crypto_1.createHash)("md5").update(fileContent, "utf8").digest("hex");
  }
+ function hashTextContent(textContent) {
+     return (0, node_crypto_1.createHash)("sha256").update(textContent, "utf8").digest("hex");
+ }
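The hunk above adds a SHA-256 text hash alongside the existing MD5 file-content hash. The sketch below runs both with Node's built-in `node:crypto` module to show that the digests differ in length. The sample prompt string is hypothetical, and the diff does not show where `hashTextContent` is called, so treating it as a prompt hash (for example for `custom_prompt_hash`) is an assumption.

```javascript
const { createHash } = require("node:crypto");

// Same digests as the two helpers in the hunk above: MD5 for file content, SHA-256 for text.
function hashFileContent(fileContent) {
    return createHash("md5").update(fileContent, "utf8").digest("hex");
}
function hashTextContent(textContent) {
    return createHash("sha256").update(textContent, "utf8").digest("hex");
}

// Hypothetical input; any string works.
const prompt = "Emphasize billing behavior.";
console.log(hashFileContent(prompt).length); // 32 hex characters (128-bit digest)
console.log(hashTextContent(prompt).length); // 64 hex characters (256-bit digest)
```

Since the two helpers produce different digest lengths, values from one cannot be mistaken for values from the other when reading stored metadata.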
package/dist/utils/llm.js CHANGED
@@ -3,17 +3,108 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
      return (mod && mod.__esModule) ? mod : { "default": mod };
  };
  Object.defineProperty(exports, "__esModule", { value: true });
+ exports.SUMMARY_FORMAT = exports.SUMMARY_PROMPT_VERSION = void 0;
  exports.generateFunctionalSummary = generateFunctionalSummary;
  exports.generateAnswer = generateAnswer;
  exports.generateEmbeddings = generateEmbeddings;
  const openai_1 = __importDefault(require("openai"));
+ exports.SUMMARY_PROMPT_VERSION = 1;
+ exports.SUMMARY_FORMAT = "structured-functional-v1";
+ const SUMMARY_SYSTEM_PROMPT = `Generate a structured DiffDoc functional summary for the provided source file.
+
+ Required headings, exactly once and in this order:
+ ## Metadata
+ ## Purpose
+ ## User-Visible Behavior
+ ## Business Rules
+ ## Data Inputs And Outputs
+ ## Side Effects
+ ## Error And Edge Cases
+ ## Dependencies
+ ## Operational Notes
+
+ Section guidance:
+
+ ## Metadata
+ Include file-level context useful for search and retrieval. This section is mandatory and must contain every bullet below exactly once, in this order:
+ - File path: {copy the provided file path exactly}
+ - File name: {copy the provided file name exactly}
+ - Extension: {copy the provided extension exactly}
+ - Inferred language/type: {infer from file path, file name, extension, and code content}
+ - Content hash: {copy the provided content hash exactly}
+ - Line count: {copy the provided line count exactly}
+ - Byte size: {copy the provided byte size exactly}
+ - Summary format: {copy the provided summary format exactly}
+ - Notable symbols/classes/functions: {infer from code, or write "None identified."}
+ - External dependencies: {infer from imports, packages, runtime services, external APIs, or write "None identified."}
+ - Internal dependencies: {infer from project imports, local modules, local artifacts, or write "None identified."}
+ - Public API/exports: {infer exported functions, classes, types, routes, commands, tools, or write "None identified."}
+
+ ## Purpose
+ Explain why this file exists and the main responsibility it serves.
+ Examples: handles login requests, builds a vector index, loads runtime configuration.
+
+ ## User-Visible Behavior
+ Describe behavior users, operators, developers, or API consumers would observe.
+ Examples: CLI output, API responses, UI behavior, created/updated/deleted files, validation errors.
+
+ ## Business Rules
+ Describe implemented rules, constraints, decisions, and policy-like behavior.
+ Examples: required fields, valid modes, filtering precedence, defaults, validation rules, skip conditions.
+
+ ## Data Inputs And Outputs
+ Describe what data enters and leaves this file's behavior.
+ Examples: input files, config values, environment variables, function arguments, API payloads, generated artifacts, return values.
+
+ ## Side Effects
+ Describe changes caused outside local computation.
+ Examples: writes files, deletes files, calls external services, updates indexes, logs output, mutates shared state, sends network requests.
+
+ ## Error And Edge Cases
+ Describe failure handling and unusual conditions.
+ Examples: missing files, invalid config, unsupported schemas, empty results, network/model failures, deleted or unchanged files.
+
+ ## Dependencies
+ Describe important internal and external dependencies.
+ Examples: imported packages, runtime services, local artifacts, external APIs, models/providers, framework components, project files.
+
+ ## Operational Notes
+ Describe details useful for running, maintaining, scaling, or debugging.
+ Examples: concurrency, performance, idempotency, caching/reuse, schema implications, regeneration requirements, security/privacy considerations.
+
+ Rules:
+ - Use every heading exactly once.
+ - Use headings in the required order.
+ - Start with ## Metadata.
+ - Include provided deterministic metadata values exactly.
+ - Do not rename, omit, reorder, or merge Metadata bullets.
+ - Infer the language/type from the provided file path, file name, extension, and code content. Prefer code content when extension is ambiguous. If uncertain, provide the best likely language/type and briefly note uncertainty.
+ - Let the code identify symbols, classes, functions, and dependencies. Include important identifiers when useful for search.
+ - If a section has no applicable content, write "None identified."
+ - Do not invent behavior, requirements, dependencies, or intent not supported by the code.
+ - Summarize implemented behavior only.
+ - Prefer specific behavior over generic descriptions.
+ - Use plain English.
+ - Provide zero conversational preamble.
+ - Do not include Markdown sections outside the required headings.`;
  function createClient(config) {
      return {
          client: new openai_1.default({ apiKey: config.apiKey, baseURL: config.baseURL }),
          model: config.model
      };
  }
- async function generateFunctionalSummary(fileName, codeContent, config) {
+ function formatMetadataForPrompt(metadata) {
+     return [
+         `- File path: ${metadata.file_path}`,
+         `- File name: ${metadata.file_name}`,
+         `- Extension: ${metadata.extension || "None"}`,
+         `- Content hash: ${metadata.content_hash}`,
+         `- Line count: ${metadata.line_count}`,
+         `- Byte size: ${metadata.byte_size}`,
+         `- Summary format: ${metadata.summary_format}`
+     ].join("\n");
+ }
+ async function generateFunctionalSummary(fileName, codeContent, metadata, config, customPrompt) {
      const { client, model } = createClient(config);
      const response = await client.chat.completions.create({
          model,
@@ -21,11 +112,11 @@ async function generateFunctionalSummary(fileName, codeContent, config) {
          messages: [
              {
                  role: "system",
-                 content: "Explain what this code does for non-technical stakeholders. Focus on business behavior, user impact, rules, data movement, and visible outcomes. Use plain English, avoid jargon, and provide zero conversational preamble."
+                 content: SUMMARY_SYSTEM_PROMPT
              },
              {
                  role: "user",
-                 content: `File: ${fileName}\n\nCode:\n${codeContent}`
+                 content: `File: ${fileName}\n\nProvided metadata:\n${formatMetadataForPrompt(metadata)}\n\nConsumer instructions:\n${customPrompt && customPrompt.trim() ? customPrompt.trim() : "None."}\n\nCode:\n${codeContent}`
              }
          ]
      });
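The new user message combines the file name, the deterministic metadata, any custom instructions, and the code. The sketch below assembles that string without calling a model, to show what the model would receive when no custom prompt is set. `buildUserMessage` is a hypothetical helper (the hunk builds the string inline), and the sample metadata values are placeholders.

```javascript
// Mirrors formatMetadataForPrompt from the hunk above.
function formatMetadataForPrompt(metadata) {
    return [
        `- File path: ${metadata.file_path}`,
        `- File name: ${metadata.file_name}`,
        `- Extension: ${metadata.extension || "None"}`,
        `- Content hash: ${metadata.content_hash}`,
        `- Line count: ${metadata.line_count}`,
        `- Byte size: ${metadata.byte_size}`,
        `- Summary format: ${metadata.summary_format}`
    ].join("\n");
}

// Hypothetical helper: assembles the same user message the hunk builds inline.
function buildUserMessage(fileName, codeContent, metadata, customPrompt) {
    const instructions = customPrompt && customPrompt.trim() ? customPrompt.trim() : "None.";
    return `File: ${fileName}\n\nProvided metadata:\n${formatMetadataForPrompt(metadata)}\n\nConsumer instructions:\n${instructions}\n\nCode:\n${codeContent}`;
}

// Placeholder metadata for illustration only; the hash is not computed from the code.
const sampleMetadata = {
    file_path: "src/billing/invoice.ts",
    file_name: "invoice.ts",
    extension: ".ts",
    content_hash: "d41d8cd98f00b204e9800998ecf8427e",
    line_count: 1,
    byte_size: 23,
    summary_format: "structured-functional-v1"
};
// A whitespace-only custom prompt is treated the same as no prompt.
const message = buildUserMessage("invoice.ts", "export const TAX = 0.2;", sampleMetadata, "  ");
console.log(message);
```

Custom instructions travel in the user message while the structured prompt stays in the system message, which matches the README's description of adding guidance without replacing the default prompt.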
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "diffdoc",
-   "version": "0.5.0",
+   "version": "0.6.1",
    "description": "Translate repository code shifts into plain-English business context",
    "license": "MIT",
    "author": "Christopher Sullivan",
@@ -20,6 +20,7 @@
    },
    "files": [
      "dist",
+     "schemas",
      "README.md",
      "LICENSE",
      ".diffdocrc.example"
@@ -28,7 +29,8 @@
      "node": ">=22"
    },
    "scripts": {
-     "build": "tsc",
+     "build": "tsc && node dist/scripts/generate-schemas.js",
+     "generate:schemas": "node dist/scripts/generate-schemas.js",
      "clean": "node -e \"require('fs').rmSync('dist', { recursive: true, force: true })\"",
      "start": "tsc && node ./dist/index.js",
      "prepare": "npm run build"
@@ -44,6 +46,7 @@
    },
    "devDependencies": {
      "@types/node": "^20.19.41",
-     "typescript": "^5.3.3"
+     "typescript": "^5.3.3",
+     "zod-to-json-schema": "^3.25.2"
    }
  }
@@ -0,0 +1,104 @@
+ {
+   "$ref": "#/definitions/diffdocrc",
+   "definitions": {
+     "diffdocrc": {
+       "type": "object",
+       "properties": {
+         "baseDir": {
+           "type": "string"
+         },
+         "aiProvider": {
+           "type": "string",
+           "enum": [
+             "local",
+             "cloud"
+           ]
+         },
+         "localLlmEndpoint": {
+           "type": "string"
+         },
+         "localEmbedEndpoint": {
+           "type": "string"
+         },
+         "localChatModel": {
+           "type": "string"
+         },
+         "localEmbedModel": {
+           "type": "string"
+         },
+         "cloudLlmEndpoint": {
+           "type": "string"
+         },
+         "cloudChatModel": {
+           "type": "string"
+         },
+         "cloudEmbedModel": {
+           "type": "string"
+         },
+         "embedBatchSize": {
+           "anyOf": [
+             {
+               "type": "integer",
+               "exclusiveMinimum": 0
+             },
+             {
+               "type": "string"
+             }
+           ]
+         },
+         "summarizeConcurrency": {
+           "anyOf": [
+             {
+               "type": "integer",
+               "exclusiveMinimum": 0
+             },
+             {
+               "type": "string"
+             }
+           ]
+         },
+         "openaiApiKey": {
+           "type": "string"
+         },
+         "includeGlobs": {
+           "anyOf": [
+             {
+               "type": "array",
+               "items": {
+                 "type": "string"
+               }
+             },
+             {
+               "type": "string"
+             }
+           ]
+         },
+         "excludeGlobs": {
+           "anyOf": [
+             {
+               "type": "array",
+               "items": {
+                 "type": "string"
+               }
+             },
+             {
+               "type": "string"
+             }
+           ]
+         },
+         "ignoreFile": {
+           "type": "string"
+         },
+         "summaryPrompt": {
+           "type": "string"
+         },
+         "summaryPromptFile": {
+           "type": "string"
+         }
+       },
+       "additionalProperties": false
+     }
+   },
+   "$schema": "http://json-schema.org/draft-07/schema#",
+   "$id": "https://raw.githubusercontent.com/sullyTheDev/diffdoc/v0.6.1/schemas/diffdocrc.schema.json"
+ }
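The generated config schema sets `additionalProperties` to `false`, which corresponds to `.strict()` on the Zod schema, so a misspelled key in `.diffdocrc` is rejected rather than silently ignored. The dependency-free sketch below illustrates only that one rule. `findUnknownKeys` and the sample config are hypothetical, and the sketch does not check value types; it is not a substitute for the real Zod validation.

```javascript
// Property names copied from the generated diffdocrc schema above.
const ALLOWED_KEYS = new Set([
    "baseDir", "aiProvider",
    "localLlmEndpoint", "localEmbedEndpoint", "localChatModel", "localEmbedModel",
    "cloudLlmEndpoint", "cloudChatModel", "cloudEmbedModel",
    "embedBatchSize", "summarizeConcurrency", "openaiApiKey",
    "includeGlobs", "excludeGlobs", "ignoreFile",
    "summaryPrompt", "summaryPromptFile"
]);

// Hypothetical helper: lists keys a strict schema would reject as unrecognized.
function findUnknownKeys(config) {
    return Object.keys(config).filter((key) => !ALLOWED_KEYS.has(key));
}

// "summaryPromt" is a deliberate typo of "summaryPrompt".
const unknown = findUnknownKeys({ aiProvider: "local", summaryPromt: "Emphasize billing." });
console.log(unknown);
```

Existing `.diffdocrc` files that carried extra keys under 0.5.0 would presumably start failing at load time with this change, since `loadRcFile` now throws on schema errors.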
@@ -0,0 +1,34 @@
+ {
+   "$ref": "#/definitions/manifest",
+   "definitions": {
+     "manifest": {
+       "type": "object",
+       "properties": {
+         "$schema": {
+           "type": "string"
+         },
+         "schemaVersion": {
+           "type": "number",
+           "const": 2
+         },
+         "lastSyncedCommit": {
+           "type": "string"
+         },
+         "files": {
+           "type": "object",
+           "additionalProperties": {
+             "type": "string"
+           }
+         }
+       },
+       "required": [
+         "schemaVersion",
+         "lastSyncedCommit",
+         "files"
+       ],
+       "additionalProperties": false
+     }
+   },
+   "$schema": "http://json-schema.org/draft-07/schema#",
+   "$id": "https://raw.githubusercontent.com/sullyTheDev/diffdoc/v0.6.1/schemas/manifest.schema.json"
+ }
@@ -0,0 +1,103 @@
+ {
+   "$ref": "#/definitions/summary-asset",
+   "definitions": {
+     "summary-asset": {
+       "type": "object",
+       "properties": {
+         "$schema": {
+           "type": "string"
+         },
+         "schemaVersion": {
+           "type": "number",
+           "const": 2
+         },
+         "content_hash": {
+           "type": "string"
+         },
+         "metadata": {
+           "type": "object",
+           "properties": {
+             "file_path": {
+               "type": "string"
+             },
+             "file_name": {
+               "type": "string"
+             },
+             "extension": {
+               "type": "string"
+             },
+             "line_count": {
+               "type": "integer",
+               "minimum": 0
+             },
+             "byte_size": {
+               "type": "integer",
+               "minimum": 0
+             },
+             "content_hash": {
+               "type": "string"
+             },
+             "generated_at": {
+               "type": "string"
+             },
+             "generator": {
+               "type": "object",
+               "properties": {
+                 "provider": {
+                   "type": "string"
+                 },
+                 "model": {
+                   "type": "string"
+                 }
+               },
+               "required": [
+                 "provider",
+                 "model"
+               ],
+               "additionalProperties": false
+             },
+             "prompt_version": {
+               "type": "integer"
+             },
+             "summary_format": {
+               "type": "string"
+             },
+             "custom_prompt_hash": {
+               "type": "string"
+             },
+             "custom_prompt_source": {
+               "type": "string"
+             }
+           },
+           "required": [
+             "file_path",
+             "file_name",
+             "extension",
+             "line_count",
+             "byte_size",
+             "content_hash",
+             "generated_at",
+             "generator",
+             "prompt_version",
+             "summary_format"
+           ],
+           "additionalProperties": false
+         },
+         "summary": {
+           "type": "string"
+         },
+         "raw_code_snapshot": {
+           "type": "string"
+         }
+       },
+       "required": [
+         "schemaVersion",
+         "content_hash",
+         "summary"
+       ],
+       "additionalProperties": false
+     }
+   },
+   "$schema": "http://json-schema.org/draft-07/schema#",
+   "$id": "https://raw.githubusercontent.com/sullyTheDev/diffdoc/v0.6.1/schemas/summary-asset.schema.json"
+ }
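In the summary-asset schema above, only `schemaVersion`, `content_hash`, and `summary` are required; `metadata` and `raw_code_snapshot` are optional. The sketch below shows a minimal asset object with just those keys and a presence check against the `required` list. `missingRequiredKeys` and the sample values are hypothetical, and the check covers key presence only, not types or the `schemaVersion` constant.

```javascript
// Required top-level keys copied from the generated summary-asset schema above.
const REQUIRED_KEYS = ["schemaVersion", "content_hash", "summary"];

// Hypothetical helper: reports which required keys an asset object lacks.
function missingRequiredKeys(asset) {
    return REQUIRED_KEYS.filter((key) => !(key in asset));
}

// Smallest asset shape the schema accepts; values are placeholders.
const minimalAsset = {
    schemaVersion: 2,
    content_hash: "d41d8cd98f00b204e9800998ecf8427e",
    summary: "## Purpose\nNone identified."
};

console.log(missingRequiredKeys(minimalAsset));
console.log(missingRequiredKeys({ schemaVersion: 2 }));
```

For real assets, the `validate` command added in this release is the intended check, since it applies the full schemas.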