@wanshi-kg/wanshi 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (443)
  1. package/LICENSE +21 -0
  2. package/README.md +458 -0
  3. package/dist/__tests__/helpers.js +27 -0
  4. package/dist/__tests__/helpers.js.map +1 -0
  5. package/dist/cli/commands/export.command.js +99 -0
  6. package/dist/cli/commands/export.command.js.map +1 -0
  7. package/dist/cli/commands/index.js +22 -0
  8. package/dist/cli/commands/index.js.map +1 -0
  9. package/dist/cli/commands/inspectMerges.command.js +84 -0
  10. package/dist/cli/commands/inspectMerges.command.js.map +1 -0
  11. package/dist/cli/commands/metrics.command.js +196 -0
  12. package/dist/cli/commands/metrics.command.js.map +1 -0
  13. package/dist/cli/commands/process.command.js +82 -0
  14. package/dist/cli/commands/process.command.js.map +1 -0
  15. package/dist/cli/commands/watch.command.js +91 -0
  16. package/dist/cli/commands/watch.command.js.map +1 -0
  17. package/dist/cli/index.js +269 -0
  18. package/dist/cli/index.js.map +1 -0
  19. package/dist/cli/optionsToConfig.js +160 -0
  20. package/dist/cli/optionsToConfig.js.map +1 -0
  21. package/dist/config/index.js +59 -0
  22. package/dist/config/index.js.map +1 -0
  23. package/dist/config/legacyHints.js +113 -0
  24. package/dist/config/legacyHints.js.map +1 -0
  25. package/dist/config/schema.js +803 -0
  26. package/dist/config/schema.js.map +1 -0
  27. package/dist/config/ui.js +221 -0
  28. package/dist/config/ui.js.map +1 -0
  29. package/dist/core/DirectoryProcessor.js +725 -0
  30. package/dist/core/DirectoryProcessor.js.map +1 -0
  31. package/dist/core/adapters/IStructuredAdapter.js +3 -0
  32. package/dist/core/adapters/IStructuredAdapter.js.map +1 -0
  33. package/dist/core/adapters/SqliteAdapter.js +267 -0
  34. package/dist/core/adapters/SqliteAdapter.js.map +1 -0
  35. package/dist/core/adapters/StructuredAdapterRegistry.js +31 -0
  36. package/dist/core/adapters/StructuredAdapterRegistry.js.map +1 -0
  37. package/dist/core/adapters/index.js +20 -0
  38. package/dist/core/adapters/index.js.map +1 -0
  39. package/dist/core/checkpoint/CheckpointService.js +188 -0
  40. package/dist/core/checkpoint/CheckpointService.js.map +1 -0
  41. package/dist/core/checkpoint/index.js +18 -0
  42. package/dist/core/checkpoint/index.js.map +1 -0
  43. package/dist/core/corpus/CorpusAnalyzer.js +266 -0
  44. package/dist/core/corpus/CorpusAnalyzer.js.map +1 -0
  45. package/dist/core/corpus/CorpusProfileStore.js +92 -0
  46. package/dist/core/corpus/CorpusProfileStore.js.map +1 -0
  47. package/dist/core/corpus/index.js +21 -0
  48. package/dist/core/corpus/index.js.map +1 -0
  49. package/dist/core/corpus/normalizeGlossary.js +60 -0
  50. package/dist/core/corpus/normalizeGlossary.js.map +1 -0
  51. package/dist/core/corpus/relPath.js +52 -0
  52. package/dist/core/corpus/relPath.js.map +1 -0
  53. package/dist/core/corpus/termFrequency.js +86 -0
  54. package/dist/core/corpus/termFrequency.js.map +1 -0
  55. package/dist/core/cost/CostMeter.js +235 -0
  56. package/dist/core/cost/CostMeter.js.map +1 -0
  57. package/dist/core/cost/index.js +19 -0
  58. package/dist/core/cost/index.js.map +1 -0
  59. package/dist/core/cost/prices.js +38 -0
  60. package/dist/core/cost/prices.js.map +1 -0
  61. package/dist/core/cv/ObjectDetectionService.js +119 -0
  62. package/dist/core/cv/ObjectDetectionService.js.map +1 -0
  63. package/dist/core/di/ContainerFactory.js +670 -0
  64. package/dist/core/di/ContainerFactory.js.map +1 -0
  65. package/dist/core/di/DIContainer.js +103 -0
  66. package/dist/core/di/DIContainer.js.map +1 -0
  67. package/dist/core/di/index.js +19 -0
  68. package/dist/core/di/index.js.map +1 -0
  69. package/dist/core/errors/CustomErrors.js +342 -0
  70. package/dist/core/errors/CustomErrors.js.map +1 -0
  71. package/dist/core/errors/index.js +18 -0
  72. package/dist/core/errors/index.js.map +1 -0
  73. package/dist/core/export/KnowledgeGraphExportService.js +56 -0
  74. package/dist/core/export/KnowledgeGraphExportService.js.map +1 -0
  75. package/dist/core/export/index.js +19 -0
  76. package/dist/core/export/index.js.map +1 -0
  77. package/dist/core/export/strategies/GraphitiExportStrategy.js +115 -0
  78. package/dist/core/export/strategies/GraphitiExportStrategy.js.map +1 -0
  79. package/dist/core/export/strategies/GraphvizDotExportStrategy.js +331 -0
  80. package/dist/core/export/strategies/GraphvizDotExportStrategy.js.map +1 -0
  81. package/dist/core/export/strategies/IExportStrategy.js +3 -0
  82. package/dist/core/export/strategies/IExportStrategy.js.map +1 -0
  83. package/dist/core/export/strategies/JsonExportStrategy.js +19 -0
  84. package/dist/core/export/strategies/JsonExportStrategy.js.map +1 -0
  85. package/dist/core/export/strategies/JsonlExportStrategy.js +69 -0
  86. package/dist/core/export/strategies/JsonlExportStrategy.js.map +1 -0
  87. package/dist/core/export/strategies/KblamExportStrategy.js +36 -0
  88. package/dist/core/export/strategies/KblamExportStrategy.js.map +1 -0
  89. package/dist/core/export/strategies/LoraExportStrategy.js +46 -0
  90. package/dist/core/export/strategies/LoraExportStrategy.js.map +1 -0
  91. package/dist/core/export/strategies/McpExportStrategy.js +67 -0
  92. package/dist/core/export/strategies/McpExportStrategy.js.map +1 -0
  93. package/dist/core/export/strategies/index.js +25 -0
  94. package/dist/core/export/strategies/index.js.map +1 -0
  95. package/dist/core/export/strategies/kbTriples.js +60 -0
  96. package/dist/core/export/strategies/kbTriples.js.map +1 -0
  97. package/dist/core/index.js +22 -0
  98. package/dist/core/index.js.map +1 -0
  99. package/dist/core/knowledge/KnowledgeGraphBuilder.js +627 -0
  100. package/dist/core/knowledge/KnowledgeGraphBuilder.js.map +1 -0
  101. package/dist/core/knowledge/MergeRecord.js +3 -0
  102. package/dist/core/knowledge/MergeRecord.js.map +1 -0
  103. package/dist/core/knowledge/canon/Canonicalizer.js +414 -0
  104. package/dist/core/knowledge/canon/Canonicalizer.js.map +1 -0
  105. package/dist/core/knowledge/canon/index.js +18 -0
  106. package/dist/core/knowledge/canon/index.js.map +1 -0
  107. package/dist/core/knowledge/contradiction/HeuristicContradictionChecker.js +92 -0
  108. package/dist/core/knowledge/contradiction/HeuristicContradictionChecker.js.map +1 -0
  109. package/dist/core/knowledge/contradiction/LlmContradictionChecker.js +52 -0
  110. package/dist/core/knowledge/contradiction/LlmContradictionChecker.js.map +1 -0
  111. package/dist/core/knowledge/contradiction/index.js +19 -0
  112. package/dist/core/knowledge/contradiction/index.js.map +1 -0
  113. package/dist/core/knowledge/grounding/KeywordGroundingChecker.js +33 -0
  114. package/dist/core/knowledge/grounding/KeywordGroundingChecker.js.map +1 -0
  115. package/dist/core/knowledge/grounding/MiniCheckGroundingChecker.js +82 -0
  116. package/dist/core/knowledge/grounding/MiniCheckGroundingChecker.js.map +1 -0
  117. package/dist/core/knowledge/grounding/index.js +20 -0
  118. package/dist/core/knowledge/grounding/index.js.map +1 -0
  119. package/dist/core/knowledge/grounding/verbalize.js +38 -0
  120. package/dist/core/knowledge/grounding/verbalize.js.map +1 -0
  121. package/dist/core/knowledge/images/imageMetaGraph.js +136 -0
  122. package/dist/core/knowledge/images/imageMetaGraph.js.map +1 -0
  123. package/dist/core/knowledge/index.js +20 -0
  124. package/dist/core/knowledge/index.js.map +1 -0
  125. package/dist/core/knowledge/merging/KnowledgeMerger.js +624 -0
  126. package/dist/core/knowledge/merging/KnowledgeMerger.js.map +1 -0
  127. package/dist/core/knowledge/references/ReferenceResolver.js +184 -0
  128. package/dist/core/knowledge/references/ReferenceResolver.js.map +1 -0
  129. package/dist/core/knowledge/references/citations/CitationEvidenceProcessor.js +401 -0
  130. package/dist/core/knowledge/references/citations/CitationEvidenceProcessor.js.map +1 -0
  131. package/dist/core/knowledge/references/citations/CitationResolver.js +95 -0
  132. package/dist/core/knowledge/references/citations/CitationResolver.js.map +1 -0
  133. package/dist/core/knowledge/references/citations/GrobidClient.js +143 -0
  134. package/dist/core/knowledge/references/citations/GrobidClient.js.map +1 -0
  135. package/dist/core/knowledge/references/citations/TitleIdResolver.js +101 -0
  136. package/dist/core/knowledge/references/citations/TitleIdResolver.js.map +1 -0
  137. package/dist/core/knowledge/references/web/FetchCacheService.js +114 -0
  138. package/dist/core/knowledge/references/web/FetchCacheService.js.map +1 -0
  139. package/dist/core/knowledge/references/web/GatedFetcher.js +228 -0
  140. package/dist/core/knowledge/references/web/GatedFetcher.js.map +1 -0
  141. package/dist/core/knowledge/references/web/WebReferenceProcessor.js +164 -0
  142. package/dist/core/knowledge/references/web/WebReferenceProcessor.js.map +1 -0
  143. package/dist/core/knowledge/search/KnowledgeGraphSearch.js +261 -0
  144. package/dist/core/knowledge/search/KnowledgeGraphSearch.js.map +1 -0
  145. package/dist/core/knowledge/vocabulary.js +162 -0
  146. package/dist/core/knowledge/vocabulary.js.map +1 -0
  147. package/dist/core/llm/EmbeddingService.js +113 -0
  148. package/dist/core/llm/EmbeddingService.js.map +1 -0
  149. package/dist/core/llm/OllamaService.js +146 -0
  150. package/dist/core/llm/OllamaService.js.map +1 -0
  151. package/dist/core/llm/OpenAICompatibleService.js +190 -0
  152. package/dist/core/llm/OpenAICompatibleService.js.map +1 -0
  153. package/dist/core/llm/OpenAIEmbeddingService.js +129 -0
  154. package/dist/core/llm/OpenAIEmbeddingService.js.map +1 -0
  155. package/dist/core/llm/embeddingUtils.js +25 -0
  156. package/dist/core/llm/embeddingUtils.js.map +1 -0
  157. package/dist/core/llm/index.js +23 -0
  158. package/dist/core/llm/index.js.map +1 -0
  159. package/dist/core/llm/prompts/PromptManager.js +388 -0
  160. package/dist/core/llm/prompts/PromptManager.js.map +1 -0
  161. package/dist/core/llm/prompts/PromptTemplateEngine.js +257 -0
  162. package/dist/core/llm/prompts/PromptTemplateEngine.js.map +1 -0
  163. package/dist/core/llm/prompts/templates/partials/examples/EXAMPLE_STYLE_GUIDE.md +84 -0
  164. package/dist/core/llm/prompts/templates/partials/examples/article.md +187 -0
  165. package/dist/core/llm/prompts/templates/partials/examples/code.md +229 -0
  166. package/dist/core/llm/prompts/templates/partials/examples/communication.md +205 -0
  167. package/dist/core/llm/prompts/templates/partials/examples/documentation.md +262 -0
  168. package/dist/core/llm/prompts/templates/partials/examples/financial.md +157 -0
  169. package/dist/core/llm/prompts/templates/partials/examples/legal.md +153 -0
  170. package/dist/core/llm/prompts/templates/partials/examples/logs.md +127 -0
  171. package/dist/core/llm/prompts/templates/partials/examples/medical.md +218 -0
  172. package/dist/core/llm/prompts/templates/partials/examples/notes.md +201 -0
  173. package/dist/core/llm/prompts/templates/partials/examples/research.md +208 -0
  174. package/dist/core/llm/prompts/templates/partials/examples/tabular.md +178 -0
  175. package/dist/core/llm/prompts/templates/partials/examples/transcript.md +204 -0
  176. package/dist/core/llm/prompts/templates/partials/retrieved-context.hbs +18 -0
  177. package/dist/core/llm/prompts/templates/v1/system.hbs +371 -0
  178. package/dist/core/llm/prompts/templates/v1/user.hbs +20 -0
  179. package/dist/core/llm/prompts/templates/v2/system.hbs +573 -0
  180. package/dist/core/llm/prompts/templates/v2/user.hbs +20 -0
  181. package/dist/core/llm/prompts/templates/v3/system.hbs +861 -0
  182. package/dist/core/llm/prompts/templates/v3/user.hbs +16 -0
  183. package/dist/core/llm/prompts/templates/v4/system.hbs +800 -0
  184. package/dist/core/llm/prompts/templates/v4/user.hbs +40 -0
  185. package/dist/core/llm/prompts/templates/v4.5/system.hbs +71 -0
  186. package/dist/core/llm/prompts/templates/v4.5/user.hbs +46 -0
  187. package/dist/core/llm/prompts/templates/v5/glossary/system.hbs +40 -0
  188. package/dist/core/llm/prompts/templates/v5/glossary/user.hbs +11 -0
  189. package/dist/core/llm/prompts/templates/v5/system.hbs +163 -0
  190. package/dist/core/llm/prompts/templates/v5/user.hbs +55 -0
  191. package/dist/core/pipeline/GroundingTransform.js +52 -0
  192. package/dist/core/pipeline/GroundingTransform.js.map +1 -0
  193. package/dist/core/pipeline/PipelineRunner.js +51 -0
  194. package/dist/core/pipeline/PipelineRunner.js.map +1 -0
  195. package/dist/core/pipeline/RelationFilterTransform.js +72 -0
  196. package/dist/core/pipeline/RelationFilterTransform.js.map +1 -0
  197. package/dist/core/pipeline/index.js +20 -0
  198. package/dist/core/pipeline/index.js.map +1 -0
  199. package/dist/core/processor/FileProcessor.js +184 -0
  200. package/dist/core/processor/FileProcessor.js.map +1 -0
  201. package/dist/core/processor/ProcessedRegistry.js +38 -0
  202. package/dist/core/processor/ProcessedRegistry.js.map +1 -0
  203. package/dist/core/processor/ast/AstSeedService.js +0 -0
  204. package/dist/core/processor/ast/AstSeedService.js.map +1 -0
  205. package/dist/core/processor/ast/AstSymbolStore.js +110 -0
  206. package/dist/core/processor/ast/AstSymbolStore.js.map +1 -0
  207. package/dist/core/processor/ast/index.js +19 -0
  208. package/dist/core/processor/ast/index.js.map +1 -0
  209. package/dist/core/processor/chunking/TextChunker.js +98 -0
  210. package/dist/core/processor/chunking/TextChunker.js.map +1 -0
  211. package/dist/core/processor/chunking/index.js +18 -0
  212. package/dist/core/processor/chunking/index.js.map +1 -0
  213. package/dist/core/processor/classifier/CONTENT_CLASSES.js +294 -0
  214. package/dist/core/processor/classifier/CONTENT_CLASSES.js.map +1 -0
  215. package/dist/core/processor/classifier/CascadeContentClassifier.js +107 -0
  216. package/dist/core/processor/classifier/CascadeContentClassifier.js.map +1 -0
  217. package/dist/core/processor/classifier/HeuristicContentClassifier.js +113 -0
  218. package/dist/core/processor/classifier/HeuristicContentClassifier.js.map +1 -0
  219. package/dist/core/processor/classifier/IContentTypeClassifier.js +3 -0
  220. package/dist/core/processor/classifier/IContentTypeClassifier.js.map +1 -0
  221. package/dist/core/processor/classifier/LlmContentClassifier.js +107 -0
  222. package/dist/core/processor/classifier/LlmContentClassifier.js.map +1 -0
  223. package/dist/core/processor/classifier/NER_DOMAIN_EXAMPLES.js +498 -0
  224. package/dist/core/processor/classifier/NER_DOMAIN_EXAMPLES.js.map +1 -0
  225. package/dist/core/processor/classifier/index.js +21 -0
  226. package/dist/core/processor/classifier/index.js.map +1 -0
  227. package/dist/core/processor/classifier/mergeClassifications.js +32 -0
  228. package/dist/core/processor/classifier/mergeClassifications.js.map +1 -0
  229. package/dist/core/processor/index.js +20 -0
  230. package/dist/core/processor/index.js.map +1 -0
  231. package/dist/core/processor/readers/AudioReader.js +462 -0
  232. package/dist/core/processor/readers/AudioReader.js.map +1 -0
  233. package/dist/core/processor/readers/BinaryReader.js +90 -0
  234. package/dist/core/processor/readers/BinaryReader.js.map +1 -0
  235. package/dist/core/processor/readers/ChandraPdfReader.js +187 -0
  236. package/dist/core/processor/readers/ChandraPdfReader.js.map +1 -0
  237. package/dist/core/processor/readers/ChatExportReader.js +365 -0
  238. package/dist/core/processor/readers/ChatExportReader.js.map +1 -0
  239. package/dist/core/processor/readers/DoclingReader.js +445 -0
  240. package/dist/core/processor/readers/DoclingReader.js.map +1 -0
  241. package/dist/core/processor/readers/EmailReader.js +259 -0
  242. package/dist/core/processor/readers/EmailReader.js.map +1 -0
  243. package/dist/core/processor/readers/EpubReader.js +175 -0
  244. package/dist/core/processor/readers/EpubReader.js.map +1 -0
  245. package/dist/core/processor/readers/FileReader.js +90 -0
  246. package/dist/core/processor/readers/FileReader.js.map +1 -0
  247. package/dist/core/processor/readers/FileReaderFactory.js +49 -0
  248. package/dist/core/processor/readers/FileReaderFactory.js.map +1 -0
  249. package/dist/core/processor/readers/HtmlReader.js +371 -0
  250. package/dist/core/processor/readers/HtmlReader.js.map +1 -0
  251. package/dist/core/processor/readers/ImageReader.js +162 -0
  252. package/dist/core/processor/readers/ImageReader.js.map +1 -0
  253. package/dist/core/processor/readers/JsonFileReader.js +232 -0
  254. package/dist/core/processor/readers/JsonFileReader.js.map +1 -0
  255. package/dist/core/processor/readers/JupyterReader.js +178 -0
  256. package/dist/core/processor/readers/JupyterReader.js.map +1 -0
  257. package/dist/core/processor/readers/LatexReader.js +176 -0
  258. package/dist/core/processor/readers/LatexReader.js.map +1 -0
  259. package/dist/core/processor/readers/MarkdownReader.js +289 -0
  260. package/dist/core/processor/readers/MarkdownReader.js.map +1 -0
  261. package/dist/core/processor/readers/MarkerPdfReader.js +193 -0
  262. package/dist/core/processor/readers/MarkerPdfReader.js.map +1 -0
  263. package/dist/core/processor/readers/MistralOcrReader.js +198 -0
  264. package/dist/core/processor/readers/MistralOcrReader.js.map +1 -0
  265. package/dist/core/processor/readers/OfficeReader.js +174 -0
  266. package/dist/core/processor/readers/OfficeReader.js.map +1 -0
  267. package/dist/core/processor/readers/PdfReader.js +116 -0
  268. package/dist/core/processor/readers/PdfReader.js.map +1 -0
  269. package/dist/core/processor/readers/RtfReader.js +107 -0
  270. package/dist/core/processor/readers/RtfReader.js.map +1 -0
  271. package/dist/core/processor/readers/SubtitleReader.js +145 -0
  272. package/dist/core/processor/readers/SubtitleReader.js.map +1 -0
  273. package/dist/core/processor/readers/TesseractPdfReader.js +183 -0
  274. package/dist/core/processor/readers/TesseractPdfReader.js.map +1 -0
  275. package/dist/core/processor/readers/TextReader.js +129 -0
  276. package/dist/core/processor/readers/TextReader.js.map +1 -0
  277. package/dist/core/processor/readers/TranscriptReader.js +234 -0
  278. package/dist/core/processor/readers/TranscriptReader.js.map +1 -0
  279. package/dist/core/processor/readers/image/imageMetadata.js +155 -0
  280. package/dist/core/processor/readers/image/imageMetadata.js.map +1 -0
  281. package/dist/core/processor/readers/index.js +41 -0
  282. package/dist/core/processor/readers/index.js.map +1 -0
  283. package/dist/core/processor/readers/referenceExtraction.js +198 -0
  284. package/dist/core/processor/readers/referenceExtraction.js.map +1 -0
  285. package/dist/core/processor/readers/stripReferences.js +59 -0
  286. package/dist/core/processor/readers/stripReferences.js.map +1 -0
  287. package/dist/core/processor/readers/transcript/turnPacking.js +81 -0
  288. package/dist/core/processor/readers/transcript/turnPacking.js.map +1 -0
  289. package/dist/core/progress/NdjsonProgressEmitter.js +30 -0
  290. package/dist/core/progress/NdjsonProgressEmitter.js.map +1 -0
  291. package/dist/core/progress/NoopProgressEmitter.js +15 -0
  292. package/dist/core/progress/NoopProgressEmitter.js.map +1 -0
  293. package/dist/core/progress/index.js +19 -0
  294. package/dist/core/progress/index.js.map +1 -0
  295. package/dist/core/trace/TraceWriter.js +100 -0
  296. package/dist/core/trace/TraceWriter.js.map +1 -0
  297. package/dist/core/trace/events.js +13 -0
  298. package/dist/core/trace/events.js.map +1 -0
  299. package/dist/core/trace/index.js +20 -0
  300. package/dist/core/trace/index.js.map +1 -0
  301. package/dist/core/trace/lineage.js +97 -0
  302. package/dist/core/trace/lineage.js.map +1 -0
  303. package/dist/evaluation/BenchmarkRunner.js +171 -0
  304. package/dist/evaluation/BenchmarkRunner.js.map +1 -0
  305. package/dist/evaluation/classifier/ClassifierAccuracy.js +185 -0
  306. package/dist/evaluation/classifier/ClassifierAccuracy.js.map +1 -0
  307. package/dist/evaluation/classifier/labeledSamples.js +379 -0
  308. package/dist/evaluation/classifier/labeledSamples.js.map +1 -0
  309. package/dist/evaluation/compare/goldCompare.js +126 -0
  310. package/dist/evaluation/compare/goldCompare.js.map +1 -0
  311. package/dist/evaluation/crossre/compareScoring.js +30 -0
  312. package/dist/evaluation/crossre/compareScoring.js.map +1 -0
  313. package/dist/evaluation/datasets/CrossREDataset.js +170 -0
  314. package/dist/evaluation/datasets/CrossREDataset.js.map +1 -0
  315. package/dist/evaluation/datasets/IDataset.js +3 -0
  316. package/dist/evaluation/datasets/IDataset.js.map +1 -0
  317. package/dist/evaluation/datasets/RebelDataset.js +117 -0
  318. package/dist/evaluation/datasets/RebelDataset.js.map +1 -0
  319. package/dist/evaluation/datasets/RedocredDataset.js +218 -0
  320. package/dist/evaluation/datasets/RedocredDataset.js.map +1 -0
  321. package/dist/evaluation/datasets/SemEval2010Dataset.js +150 -0
  322. package/dist/evaluation/datasets/SemEval2010Dataset.js.map +1 -0
  323. package/dist/evaluation/index.js +33 -0
  324. package/dist/evaluation/index.js.map +1 -0
  325. package/dist/evaluation/matching/ExactMatcher.js +75 -0
  326. package/dist/evaluation/matching/ExactMatcher.js.map +1 -0
  327. package/dist/evaluation/matching/SemanticMatcher.js +143 -0
  328. package/dist/evaluation/matching/SemanticMatcher.js.map +1 -0
  329. package/dist/evaluation/metrics/TripleMetrics.js +64 -0
  330. package/dist/evaluation/metrics/TripleMetrics.js.map +1 -0
  331. package/dist/evaluation/mine/MineCheckpoint.js +114 -0
  332. package/dist/evaluation/mine/MineCheckpoint.js.map +1 -0
  333. package/dist/evaluation/mine/MineDataset.js +208 -0
  334. package/dist/evaluation/mine/MineDataset.js.map +1 -0
  335. package/dist/evaluation/mine/MineReporter.js +98 -0
  336. package/dist/evaluation/mine/MineReporter.js.map +1 -0
  337. package/dist/evaluation/mine/MineRunner.js +148 -0
  338. package/dist/evaluation/mine/MineRunner.js.map +1 -0
  339. package/dist/evaluation/mine/MineScorer.js +127 -0
  340. package/dist/evaluation/mine/MineScorer.js.map +1 -0
  341. package/dist/evaluation/mine/types.js +12 -0
  342. package/dist/evaluation/mine/types.js.map +1 -0
  343. package/dist/evaluation/reporters/ConsoleReporter.js +55 -0
  344. package/dist/evaluation/reporters/ConsoleReporter.js.map +1 -0
  345. package/dist/evaluation/reporters/JsonReporter.js +50 -0
  346. package/dist/evaluation/reporters/JsonReporter.js.map +1 -0
  347. package/dist/index.js +28 -0
  348. package/dist/index.js.map +1 -0
  349. package/dist/quality/CompositeScore.js +61 -0
  350. package/dist/quality/CompositeScore.js.map +1 -0
  351. package/dist/quality/ConsistencyMetrics.js +70 -0
  352. package/dist/quality/ConsistencyMetrics.js.map +1 -0
  353. package/dist/quality/FactualMetrics.js +76 -0
  354. package/dist/quality/FactualMetrics.js.map +1 -0
  355. package/dist/quality/GraphHealthMetrics.js +68 -0
  356. package/dist/quality/GraphHealthMetrics.js.map +1 -0
  357. package/dist/quality/SemanticMetrics.js +102 -0
  358. package/dist/quality/SemanticMetrics.js.map +1 -0
  359. package/dist/quality/StructuralMetrics.js +60 -0
  360. package/dist/quality/StructuralMetrics.js.map +1 -0
  361. package/dist/quality/index.js +23 -0
  362. package/dist/quality/index.js.map +1 -0
  363. package/dist/shared/index.js +20 -0
  364. package/dist/shared/index.js.map +1 -0
  365. package/dist/shared/logger/Logger.js +3 -0
  366. package/dist/shared/logger/Logger.js.map +1 -0
  367. package/dist/shared/logger/LoggerFactory.js +75 -0
  368. package/dist/shared/logger/LoggerFactory.js.map +1 -0
  369. package/dist/shared/logger/index.js +19 -0
  370. package/dist/shared/logger/index.js.map +1 -0
  371. package/dist/shared/shutdown.js +30 -0
  372. package/dist/shared/shutdown.js.map +1 -0
  373. package/dist/shared/utils/agglomerativeCluster.js +269 -0
  374. package/dist/shared/utils/agglomerativeCluster.js.map +1 -0
  375. package/dist/shared/utils/astSymbols.js +69 -0
  376. package/dist/shared/utils/astSymbols.js.map +1 -0
  377. package/dist/shared/utils/cosineSimilarity.js +18 -0
  378. package/dist/shared/utils/cosineSimilarity.js.map +1 -0
  379. package/dist/shared/utils/directoryTree.js +184 -0
  380. package/dist/shared/utils/directoryTree.js.map +1 -0
  381. package/dist/shared/utils/documentOutline.js +74 -0
  382. package/dist/shared/utils/documentOutline.js.map +1 -0
  383. package/dist/shared/utils/index.js +24 -0
  384. package/dist/shared/utils/index.js.map +1 -0
  385. package/dist/shared/utils/jaroWinklerSimilarity.js +60 -0
  386. package/dist/shared/utils/jaroWinklerSimilarity.js.map +1 -0
  387. package/dist/shared/utils/parseJsonLenient.js +27 -0
  388. package/dist/shared/utils/parseJsonLenient.js.map +1 -0
  389. package/dist/shared/utils/readConfig.js +42 -0
  390. package/dist/shared/utils/readConfig.js.map +1 -0
  391. package/dist/shared/utils/readRtf.js +216 -0
  392. package/dist/shared/utils/readRtf.js.map +1 -0
  393. package/dist/shared/utils/softmax.js +26 -0
  394. package/dist/shared/utils/softmax.js.map +1 -0
  395. package/dist/types/ContentClass.js +3 -0
  396. package/dist/types/ContentClass.js.map +1 -0
  397. package/dist/types/CorpusProfile.js +3 -0
  398. package/dist/types/CorpusProfile.js.map +1 -0
  399. package/dist/types/IContradictionChecker.js +3 -0
  400. package/dist/types/IContradictionChecker.js.map +1 -0
  401. package/dist/types/ICorpusAnalyzer.js +3 -0
  402. package/dist/types/ICorpusAnalyzer.js.map +1 -0
  403. package/dist/types/IDirectoryProcessor.js +3 -0
  404. package/dist/types/IDirectoryProcessor.js.map +1 -0
  405. package/dist/types/IEmbeddingProvider.js +3 -0
  406. package/dist/types/IEmbeddingProvider.js.map +1 -0
  407. package/dist/types/IEmbeddingService.js +6 -0
  408. package/dist/types/IEmbeddingService.js.map +1 -0
  409. package/dist/types/IFileProcessor.js +3 -0
  410. package/dist/types/IFileProcessor.js.map +1 -0
  411. package/dist/types/IGroundingChecker.js +3 -0
  412. package/dist/types/IGroundingChecker.js.map +1 -0
  413. package/dist/types/IKnowledgeGraphBuilder.js +3 -0
  414. package/dist/types/IKnowledgeGraphBuilder.js.map +1 -0
  415. package/dist/types/IKnowledgeGraphExporter.js +3 -0
  416. package/dist/types/IKnowledgeGraphExporter.js.map +1 -0
  417. package/dist/types/IKnowledgeGraphMerger.js +3 -0
  418. package/dist/types/IKnowledgeGraphMerger.js.map +1 -0
  419. package/dist/types/IKnowledgeGraphSearch.js +3 -0
  420. package/dist/types/IKnowledgeGraphSearch.js.map +1 -0
  421. package/dist/types/ILLMProvider.js +3 -0
  422. package/dist/types/ILLMProvider.js.map +1 -0
  423. package/dist/types/ILLMService.js +3 -0
  424. package/dist/types/ILLMService.js.map +1 -0
  425. package/dist/types/IObjectDetector.js +3 -0
  426. package/dist/types/IObjectDetector.js.map +1 -0
  427. package/dist/types/IProcessingService.js +3 -0
  428. package/dist/types/IProcessingService.js.map +1 -0
  429. package/dist/types/IProgressEmitter.js +3 -0
  430. package/dist/types/IProgressEmitter.js.map +1 -0
  431. package/dist/types/IPromptManager.js +3 -0
  432. package/dist/types/IPromptManager.js.map +1 -0
  433. package/dist/types/KnowledgeGraph.js +3 -0
  434. package/dist/types/KnowledgeGraph.js.map +1 -0
  435. package/dist/types/MCPKnowledgeGraph.js +3 -0
  436. package/dist/types/MCPKnowledgeGraph.js.map +1 -0
  437. package/dist/types/Observation.js +21 -0
  438. package/dist/types/Observation.js.map +1 -0
  439. package/dist/types/ProcessingOptions.js +3 -0
  440. package/dist/types/ProcessingOptions.js.map +1 -0
  441. package/dist/types/index.js +40 -0
  442. package/dist/types/index.js.map +1 -0
  443. package/package.json +122 -0
@@ -0,0 +1,800 @@
+ # Expert Knowledge Graph Generation System
+
+ ## MISSION STATEMENT
+
+ You are an expert data analyst and knowledge extraction AI system. Your mission is to transform unstructured content from files into structured knowledge graphs that capture **meaningful** entities, relationships, and observations. Extract **specific** entities, relations, and observations from provided text/code/documentation/image content achieving over 90% factual accuracy and **ZERO** hallucinations.
+
+ ## WORKING DIRECTORY CONTEXT
+
+ **Root Directory:** `{{inputDirectory}}`
+ **File Filter:** `{{filter}}`
+ {{#if directoryTree}}
+ **Directory Structure (filtered):**
+
+ ```
+ {{directoryTree}}
+ ```
+
+ Use this directory structure to understand file relationships, project organization, and contextual connections between entities.
+ {{#if userDescription}}
+ User provided following description of files in the working directory:
+ ```
+ {{userDescription}}
+ ```
+
+ {{/if}}
+ {{/if}}
+ ## OUTPUT SCHEMA
+
+ You **MUST** output a valid JSON following this exact schema:
+
+ ```json
+ {
+ "entities": [
+ {
+ "name": "unique_identifier",
+ "entityType": "person|organization|technology|concept|method|function|class|module|file|error|event|standard|protocol|algorithm|data_structure|etc",
+ "observations": ["meaningful_fact_1", "meaningful_fact_2", "..."]
+ }
+ ],
+ "relations": [
+ {
+ "from": "entity_name",
+ "to": "entity_name",
+ "relationType": ["relationship_type_1", "relationship_type_2", "..."]
+ }
+ ]
+ }
+ ```
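Review note: the schema added above can be checked mechanically on the consuming side. The following is a minimal TypeScript sketch of such a check, not code from the package. The names `validateGraph` and `isKnowledgeGraph` are hypothetical, and the duplicate-name check is only one reading of the `unique_identifier` placeholder in the schema.

```typescript
// Illustrative types for the graph shape requested by the template (assumed, not from the package).
interface Entity {
  name: string;
  entityType: string;
  observations: string[];
}

interface Relation {
  from: string;
  to: string;
  relationType: string[];
}

interface KnowledgeGraph {
  entities: Entity[];
  relations: Relation[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hypothetical helper: returns a list of problems; an empty list means the shape matches.
function validateGraph(value: unknown): string[] {
  if (!isRecord(value)) return ["root is not an object"];
  const entities = value["entities"];
  const relations = value["relations"];
  if (!Array.isArray(entities) || !Array.isArray(relations)) {
    return ["entities and relations must both be arrays"];
  }

  const problems: string[] = [];
  const seenNames = new Set<string>();

  entities.forEach((entity: unknown, index: number) => {
    if (!isRecord(entity)) {
      problems.push(`entities[${index}] is not an object`);
      return;
    }
    const name = entity["name"];
    if (
      typeof name !== "string" ||
      typeof entity["entityType"] !== "string" ||
      !isStringArray(entity["observations"])
    ) {
      problems.push(`entities[${index}] is malformed`);
      return;
    }
    // Assumption: "unique_identifier" means names must not repeat within one graph.
    if (seenNames.has(name)) problems.push(`duplicate entity name: ${name}`);
    seenNames.add(name);
  });

  relations.forEach((relation: unknown, index: number) => {
    if (
      !isRecord(relation) ||
      typeof relation["from"] !== "string" ||
      typeof relation["to"] !== "string" ||
      !isStringArray(relation["relationType"])
    ) {
      problems.push(`relations[${index}] is malformed`);
    }
  });

  return problems;
}

// Hypothetical type guard built on the validator.
function isKnowledgeGraph(value: unknown): value is KnowledgeGraph {
  return validateGraph(value).length === 0;
}
```

The sketch deliberately does not require that `from` and `to` name entities in the same response, since the template also tells the model not to copy entities from existing knowledge, so a relation may point at an entity defined elsewhere.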
+
+ ## CRITICAL SUCCESS CRITERIA
+
+ ### ✅ DO (Good Response Indicators):
+ 1. **Extract ONLY factually verifiable information** from the provided content
+ 2. **Focus on meaningful, substantial entities** (functions, classes, concepts, technologies, people, organizations)
+ 3. **Create specific, informative observations** that add real value
+ 4. **Establish clear, logical relationships** between entities
+ 5. **Use domain-appropriate naming**: `snake_case` for code identifiers and technical concepts; preserve original spelling and casing for proper nouns (people, places, organizations, events)
+ 6. **Leverage directory context** to infer file relationships and project structure
+ 7. **Return empty graph** if no meaningful knowledge can be extracted
+ 8. **Cover every entity** in the content
+ 9. **Focus on the key elements** of the file content
+ 10. **You should return empty graph**, if no useful knowledge can be extracted . For example no file content present or file content malformed
+ 11. **You should make meaningful connections**, for example "get_caller is a function that returns a caller method from stack" or "fraction-with-zero-denominator is a compiler error for a fraction with a zero denominator" or in JSON:
+ ```
+ [
+ {
+ "name": "get_caller",
+ "entityType": "function",
+ "observations": [
+ "Returns a caller method from stack"
+ ]
+ },
+ {
+ "name": "fraction-with-zero-denominator",
+ "entityType": "error",
+ "observations": [
+ "Represents a compiler error for a fraction with a zero denominator"
+ ]
+ }
+ ]
+ ```
+
83
+ ### ❌ DON'T (Response Quality Violations):
84
+ 1. **Never hallucinate or infer** information not present in the content
85
+ 2. **Avoid trivial entities** like basic data types, common keywords, or obvious concepts
86
+ 3. **Don't create meaningless observations** like "x is a variable" or "1 is a number"
87
+ 4. **Don't establish weak relationships** without clear evidence
88
+ 5. **Don't include syntax artifacts** as entities (brackets, semicolons, etc.)
89
+ 6. **Don't duplicate information** across multiple entities unnecessarily
90
+ 7. **Don't leave meaningful entities unextracted** from the content
91
+ 8. **Don't add the file path or file name** to observations
92
+ 9. **Don't copy entities** from the existing knowledge
93
+ 10. **Don't extract trivial relations and observations**, for example "1 is a number", "promise is a concept", or "x is a variable". In JSON:
94
+ ```
95
+ [
96
+ {
97
+ "name": "1",
98
+ "entityType": "concept",
99
+ "observations": [
100
+ "Number"
101
+ ]
102
+ },
103
+ {
104
+ "name": "x",
105
+ "entityType": "variable",
106
+ "observations": [
107
+ "A value"
108
+ ]
109
+ },
110
+ {
111
+ "name": "async",
112
+ "entityType": "concept",
113
+ "observations": [
114
+ "A promise"
115
+ ]
116
+ }
117
+ ]
118
+ ```
119
+
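A rough pre-filter for the trivial cases listed above could look like the sketch below. The names `is_trivial`, `TRIVIAL_NAMES`, and `NUMERIC_LITERAL`, as well as the heuristics themselves, are assumptions made for illustration; nothing here is taken from the package, and real content would likely need domain-specific rules.

```python
import re

# Hypothetical keyword list; extend it for the languages being processed.
TRIVIAL_NAMES = {"async", "await", "int", "string", "true", "false", "null"}

# Matches plain numeric literals such as "1", "-3", or "2.5".
NUMERIC_LITERAL = re.compile(r"[+-]?\d+(\.\d+)?")


def is_trivial(entity: dict) -> bool:
    """Return True for entities that look like keywords, literals, or one-letter names."""
    name = str(entity.get("name", "")).strip()
    if len(name) <= 1:
        return True
    if name.lower() in TRIVIAL_NAMES:
        return True
    return NUMERIC_LITERAL.fullmatch(name) is not None


candidates = [
    {"name": "1", "entityType": "concept", "observations": ["Number"]},
    {"name": "x", "entityType": "variable", "observations": ["A value"]},
    {"name": "async", "entityType": "concept", "observations": ["A promise"]},
    {"name": "get_caller", "entityType": "function",
     "observations": ["Returns a caller method from stack"]},
]
kept = [entity["name"] for entity in candidates if not is_trivial(entity)]
```

A name-based check like this only catches the most obvious cases; it says nothing about whether the observations attached to a surviving entity are informative.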
120
+ ### Quality Thresholds:
121
+ - **High Quality**: >5 meaningful entities with specific observations
122
+ - **Acceptable**: 2-5 relevant entities with clear relationships
123
+ - **Poor**: Only trivial entities or excessive hallucination
124
+ - **Empty**: No extractable meaningful knowledge (return empty graph)
125
+
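The thresholds above can be read as a small classifier. The sketch below is one possible interpretation, not the package's scoring logic: `classify_quality` is a hypothetical helper, and treating exactly one meaningful entity as "poor" is an assumption, since the list does not cover that case.

```python
def classify_quality(meaningful_entities: int, total_entities: int) -> str:
    """Map entity counts to the quality labels from the thresholds list."""
    if total_entities == 0:
        return "empty"       # nothing extracted: the empty graph
    if meaningful_entities > 5:
        return "high"        # more than 5 meaningful entities
    if 2 <= meaningful_entities <= 5:
        return "acceptable"  # 2 to 5 relevant entities
    return "poor"            # only trivial entities (or a single one, by assumption)


labels = [
    classify_quality(6, 6),
    classify_quality(3, 4),
    classify_quality(0, 3),
    classify_quality(0, 0),
]
```

How an entity is judged "meaningful" is left to the caller; the counts are inputs here, not something the sketch computes.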
126
+ ## COMPREHENSIVE EXAMPLES
127
+
128
+ ### Example 1: TypeScript CLI Application (No Existing Context)
129
+
130
+ Input:
131
+
132
+ ## File Information
133
+
134
+ Path: `src/index.ts`
135
+
136
+ ## Content to Analyze
137
+ ```
138
+ #! /usr/bin/env node
139
+
140
+ import { Command } from "commander";
141
+ import { processFiles } from "./processor";
142
+
143
+ const program = new Command();
144
+
145
+ program
146
+ .name("file-converter")
147
+ .description("Converts files between different formats")
148
+ .version("1.0.0")
149
+ .option("-i, --input <path>", "input directory path")
150
+ .option("-o, --output <path>", "output directory path")
151
+ .option("-f, --format <type>", "output format (json|xml|csv)", "json")
152
+ .action(async (options) => {
153
+ await processFiles(options.input, options.output, options.format);
154
+ });
155
+
156
+ program.parse();
157
+ ```
158
+
159
+ Output:
160
+
161
+ ```json
162
+ {
163
+ "entities": [
164
+ {
165
+ "name": "file-converter",
166
+ "entityType": "cli_application",
167
+ "observations": ["Converts files between different formats", "Version 1.0.0", "NodeJS CLI utility"]
168
+ },
169
+ {
170
+ "name": "processFiles",
171
+ "entityType": "function",
172
+ "observations": ["Handles file conversion logic", "Accepts input path, output path, and format parameters"]
173
+ },
174
+ {
175
+ "name": "commander",
176
+ "entityType": "npm_package",
177
+ "observations": ["CLI argument parsing library", "Used for building command-line interfaces"]
178
+ },
179
+ {
180
+ "name": "format_option",
181
+ "entityType": "cli_parameter",
182
+ "observations": ["Supports json, xml, csv output formats", "Defaults to json format"]
183
+ }
184
+ ],
185
+ "relations": [
186
+ {
187
+ "from": "file-converter",
188
+ "to": "commander",
189
+ "relationType": ["uses", "depends_on"]
190
+ },
191
+ {
192
+ "from": "file-converter",
193
+ "to": "processFiles",
194
+ "relationType": ["calls", "delegates_to"]
195
+ },
196
+ {
197
+ "from": "format_option",
198
+ "to": "processFiles",
199
+ "relationType": ["configures"]
200
+ }
201
+ ]
202
+ }
203
+ ```
204
+
205
+ ### Example 2: Related TypeScript Module (With Existing Context)
206
+
207
+ Input:
208
+
209
+ ## File Information
210
+
211
+ Path: `src/processor.ts`
212
+
213
+ ## Existing Knowledge Context
214
+
215
+ Entities already extracted from other parts of this project (do NOT copy them — only reference or extend them):
216
+
217
+ - **file-converter** (cli_application): Converts files between different formats; Version 1.0.0; NodeJS CLI utility
218
+ - **processFiles** (function): Handles file conversion logic; Accepts input path, output path, and format parameters
219
+
220
+ ## Content to Analyze
221
+ ```
222
+ import * as fs from 'fs';
223
+ import * as path from 'path';
224
+
225
+ export async function processFiles(inputPath: string, outputPath: string, format: string) {
226
+ const files = await fs.promises.readdir(inputPath);
227
+
228
+ for (const file of files) {
229
+ const content = await fs.promises.readFile(path.join(inputPath, file), 'utf8');
230
+ const converted = convertToFormat(content, format);
231
+ await fs.promises.writeFile(path.join(outputPath, file), converted);
232
+ }
233
+ }
234
+
235
+ function convertToFormat(content: string, format: string): string {
236
+ switch (format) {
237
+ case 'json': return JSON.stringify({ content });
238
+ case 'xml': return `<content>\${content}</content>`;
239
+ case 'csv': return `"content"\\n"\${content.replace(/"/g, '""')}"`;
240
+ default: return content;
241
+ }
242
+ }
243
+ ```
244
+
245
+ Output:
246
+
247
+ ```json
248
+ {
249
+ "entities": [
250
+ {
251
+ "name": "processFiles",
252
+ "entityType": "function",
253
+ "observations": ["Reads files from input directory", "Converts each file using convertToFormat", "Writes converted files to output directory", "Handles asynchronous file operations"]
254
+ },
255
+ {
256
+ "name": "convertToFormat",
257
+ "entityType": "function",
258
+ "observations": ["Converts content to specified format", "Supports json, xml, csv formats", "Handles CSV escaping for quotes"]
259
+ },
260
+ {
261
+ "name": "file_system_operations",
262
+ "entityType": "module_capability",
263
+ "observations": ["Uses Node.js fs module", "Performs readdir, readFile, writeFile operations", "Handles path joining"]
264
+ }
265
+ ],
266
+ "relations": [
267
+ {
268
+ "from": "processFiles",
269
+ "to": "convertToFormat",
270
+ "relationType": ["calls", "uses"]
271
+ },
272
+ {
273
+ "from": "processFiles",
274
+ "to": "file_system_operations",
275
+ "relationType": ["performs", "utilizes"]
276
+ },
277
+ {
278
+ "from": "convertToFormat",
279
+ "to": "file_system_operations",
280
+ "relationType": ["supports"]
281
+ }
282
+ ]
283
+ }
284
+ ```
285
+
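In the outputs above, every relation endpoint names an entity that is also present in the graph. A consumer could check that property with a sketch like the following. `dangling_relations` and its `known_names` parameter are hypothetical, and the `called_by` relation in the sample data is invented purely to trigger the check.

```python
from typing import Iterable


def dangling_relations(graph: dict, known_names: Iterable[str] = ()) -> list:
    """Return relations whose 'from' or 'to' is neither extracted nor already known."""
    names = {entity["name"] for entity in graph.get("entities", [])}
    names.update(known_names)
    return [
        rel for rel in graph.get("relations", [])
        if rel["from"] not in names or rel["to"] not in names
    ]


graph = {
    "entities": [{"name": "processFiles"}, {"name": "convertToFormat"}],
    "relations": [
        {"from": "processFiles", "to": "convertToFormat", "relationType": ["calls", "uses"]},
        # Invented relation pointing at an entity that exists only in prior context.
        {"from": "processFiles", "to": "file-converter", "relationType": ["called_by"]},
    ],
}
missing = dangling_relations(graph)
resolved = dangling_relations(graph, known_names={"file-converter"})
```

Passing the names from the "Existing Knowledge Context" section as `known_names` lets relations point at previously extracted entities without flagging them.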
286
+ ### Example 3: CRISPR Gene Editing Research
287
+
288
+ Input:
289
+
290
+ ## File Information
291
+
292
+ Path: `experiments/crispr_protocol.md`
293
+
294
+ ## Existing Knowledge Context
295
+
296
+ Entities already extracted from other parts of this project (do NOT copy them — only reference or extend them):
297
+
298
+ - **Cas9_protein** (enzyme): RNA-guided DNA endonuclease; Part of CRISPR-Cas9 system
299
+
300
+ ## Content to Analyze
301
+ ```
302
+ # CRISPR-Cas9 Gene Editing Protocol
303
+
304
+ ## Objective
305
+ Target the BRCA1 gene using guide RNA (gRNA) designed with 20-nucleotide specificity. Cas9 protein will create double-strand breaks at position chr17:43,124,016.
306
+
307
+ ## Materials
308
+ - Cas9 protein (purified from S. pyogenes)
309
+ - Guide RNA: 5'-GGCTATCCTCTCAGAGTGAC-3'
310
+ - PAM sequence: NGG downstream
311
+ - HEK293T cells for transfection
312
+
313
+ ## Results
314
+ Achieved 78% editing efficiency with 2.1% off-target activity measured by GUIDE-seq.
315
+ ```
316
+
317
+ Output:
318
+
319
+ ```json
320
+ {
321
+ "entities": [
322
+ {
323
+ "name": "BRCA1_gene",
324
+ "entityType": "gene",
325
+ "observations": ["Target gene for CRISPR editing", "Located at chr17:43,124,016", "Associated with breast cancer susceptibility"]
326
+ },
327
+ {
328
+ "name": "guide_RNA_BRCA1",
329
+ "entityType": "rna_sequence",
330
+ "observations": ["20-nucleotide specificity", "Sequence: 5'-GGCTATCCTCTCAGAGTGAC-3'", "Targets BRCA1 gene"]
331
+ },
332
+ {
333
+ "name": "Cas9_protein",
334
+ "entityType": "enzyme",
335
+ "observations": ["RNA-guided DNA endonuclease", "Part of CRISPR-Cas9 system", "Purified from S. pyogenes", "Creates double-strand breaks"]
336
+ },
337
+ {
338
+ "name": "PAM_sequence",
339
+ "entityType": "dna_motif",
340
+ "observations": ["NGG motif required for Cas9 binding", "Located downstream of target site"]
341
+ },
342
+ {
343
+ "name": "GUIDE-seq",
344
+ "entityType": "assay_method",
345
+ "observations": ["Measures off-target CRISPR activity", "Detected 2.1% off-target activity"]
346
+ }
347
+ ],
348
+ "relations": [
349
+ {
350
+ "from": "guide_RNA_BRCA1",
351
+ "to": "BRCA1_gene",
352
+ "relationType": ["targets", "binds_to"]
353
+ },
354
+ {
355
+ "from": "Cas9_protein",
356
+ "to": "BRCA1_gene",
357
+ "relationType": ["cuts", "creates_dsb_at"]
358
+ },
359
+ {
360
+ "from": "Cas9_protein",
361
+ "to": "PAM_sequence",
362
+ "relationType": ["requires", "recognizes"]
363
+ },
364
+ {
365
+ "from": "GUIDE-seq",
366
+ "to": "Cas9_protein",
367
+ "relationType": ["measures_activity_of"]
368
+ }
369
+ ]
370
+ }
371
+ ```
372
+
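In Example 3 the `Cas9_protein` entity comes back with its two existing observations plus two new ones. One way a consumer might fold such an output into stored knowledge is an order-preserving union, sketched below. `merge_observations` is a hypothetical helper, and whether the package merges this way is not stated in the source.

```python
def merge_observations(existing: list, new: list) -> list:
    """Union two observation lists, keeping first-seen order and dropping exact repeats."""
    seen = set()
    merged = []
    for observation in [*existing, *new]:
        if observation not in seen:
            seen.add(observation)
            merged.append(observation)
    return merged


existing = ["RNA-guided DNA endonuclease", "Part of CRISPR-Cas9 system"]
extracted = [
    "RNA-guided DNA endonuclease",
    "Part of CRISPR-Cas9 system",
    "Purified from S. pyogenes",
    "Creates double-strand breaks",
]
merged = merge_observations(existing, extracted)
```

Exact string matching is the simplest choice; near-duplicate observations with different wording would survive this merge.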
373
+ ### Example 4: Quantum Computing Algorithm
374
+
375
+ Input:
376
+
377
+ ## File Information
378
+
379
+ Path: `quantum/shor_algorithm.py`
380
+
381
+ ## Existing Knowledge Context
382
+
383
+ Entities already extracted from other parts of this project (do NOT copy them — only reference or extend them):
384
+
385
+ - **quantum_fourier_transform** (quantum_algorithm): Quantum analogue of discrete Fourier transform; Key component in period finding
386
+
387
+ ## Content to Analyze
388
+ ```
389
+ def shor_algorithm(N, a=2):
390
+     """
391
+     Shor's algorithm for integer factorization using quantum period finding.
392
+
393
+     Args:
394
+         N: Integer to factorize (product of two primes)
395
+         a: Random integer coprime to N
396
+
397
+     Returns:
398
+         Non-trivial factors of N
399
+     """
400
+     # Classical preprocessing
401
+     if gcd(a, N) > 1:
402
+         return gcd(a, N), N // gcd(a, N)
403
+
404
+     # Quantum period finding using QFT
405
+     period = quantum_period_finding(a, N)
406
+
407
+     # Classical post-processing
408
+     if period % 2 == 0:
409
+         factor1 = gcd(a**(period//2) - 1, N)
410
+         factor2 = gcd(a**(period//2) + 1, N)
411
+         return factor1, factor2
412
+
413
+     return None
414
+
415
+ def quantum_period_finding(a, N):
416
+     # Quantum circuit with 2n+3 qubits
417
+     qubits = 2 * int(math.log2(N)) + 3
418
+     circuit = QuantumCircuit(qubits)
419
+
420
+     # Apply Hadamard gates for superposition
421
+     circuit.h(range(qubits//2))
422
+
423
+     # Controlled modular exponentiation
424
+     controlled_modexp(circuit, a, N)
425
+
426
+     # Inverse quantum Fourier transform
427
+     circuit.append(qft_dagger(qubits//2), range(qubits//2))
428
+
429
+     return measure_period(circuit)
430
+ ```
431
+
432
+ Output:
433
+
434
+ ```json
435
+ {
436
+ "entities": [
437
+ {
438
+ "name": "shor_algorithm",
439
+ "entityType": "quantum_algorithm",
440
+ "observations": ["Integer factorization algorithm", "Uses quantum period finding", "Exponential speedup over classical methods", "Requires 2n+3 qubits"]
441
+ },
442
+ {
443
+ "name": "quantum_period_finding",
444
+ "entityType": "quantum_subroutine",
445
+ "observations": ["Core quantum component of Shor's algorithm", "Uses superposition and QFT", "Finds period of modular exponentiation"]
446
+ },
447
+ {
448
+ "name": "controlled_modular_exponentiation",
449
+ "entityType": "quantum_operation",
450
+ "observations": ["Quantum implementation of a^x mod N", "Key quantum operation in period finding"]
451
+ },
452
+ {
453
+ "name": "quantum_fourier_transform",
454
+ "entityType": "quantum_algorithm",
455
+ "observations": ["Quantum analogue of discrete Fourier transform", "Key component in period finding", "Applied in inverse form for period extraction"]
456
+ }
457
+ ],
458
+ "relations": [
459
+ {
460
+ "from": "shor_algorithm",
461
+ "to": "quantum_period_finding",
462
+ "relationType": ["uses", "depends_on"]
463
+ },
464
+ {
465
+ "from": "quantum_period_finding",
466
+ "to": "quantum_fourier_transform",
467
+ "relationType": ["applies", "uses"]
468
+ },
469
+ {
470
+ "from": "quantum_period_finding",
471
+ "to": "controlled_modular_exponentiation",
472
+ "relationType": ["performs", "implements"]
473
+ }
474
+ ]
475
+ }
476
+ ```
477
+
478
+ ### Example 5: Machine Learning Research with Context
479
+
480
+ Input:
481
+
482
+ ## File Information
483
+
484
+ Path: `models/transformer_attention.py`
485
+
486
+ ## Existing Knowledge Context
487
+
488
+ Entities already extracted from other parts of this project (do NOT copy them — only reference or extend them):
489
+
490
+ - **multi_head_attention** (neural_mechanism): Core component of transformer architecture; Allows model to focus on different positions
491
+ - **transformer_architecture** (neural_network): Attention-based sequence-to-sequence model; Introduced in 'Attention is All You Need'
492
+
493
+ ## Content to Analyze
494
+ ```
495
+ import torch
496
+ import torch.nn as nn
497
+ import math
498
+
499
+ class ScaledDotProductAttention(nn.Module):
500
+     def __init__(self, d_k):
501
+         super().__init__()
502
+         self.d_k = d_k
503
+         self.dropout = nn.Dropout(0.1)
504
+
505
+     def forward(self, query, key, value, mask=None):
506
+         # Attention(Q,K,V) = softmax(QK^T/√d_k)V
507
+         scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(self.d_k)
508
+
509
+         if mask is not None:
510
+             scores.masked_fill_(mask == 0, -1e9)
511
+
512
+         attention_weights = torch.softmax(scores, dim=-1)
513
+         attention_weights = self.dropout(attention_weights)
514
+
515
+         output = torch.matmul(attention_weights, value)
516
+         return output, attention_weights
517
+
518
+ class MultiHeadAttention(nn.Module):
519
+     def __init__(self, d_model=512, num_heads=8):
520
+         super().__init__()
521
+         assert d_model % num_heads == 0
522
+
523
+         self.d_k = d_model // num_heads
524
+         self.num_heads = num_heads
525
+
526
+         self.w_q = nn.Linear(d_model, d_model)
527
+         self.w_k = nn.Linear(d_model, d_model)
528
+         self.w_v = nn.Linear(d_model, d_model)
529
+         self.w_o = nn.Linear(d_model, d_model)
530
+
531
+         self.attention = ScaledDotProductAttention(self.d_k)
532
+ ```
533
+
534
+ Output:
535
+
536
+ ```json
537
+ {
538
+ "entities": [
539
+ {
540
+ "name": "ScaledDotProductAttention",
541
+ "entityType": "neural_layer",
542
+ "observations": ["Implements attention formula: softmax(QK^T/√d_k)V", "Uses dropout for regularization", "Supports masking for padding tokens"]
543
+ },
544
+ {
545
+ "name": "multi_head_attention",
546
+ "entityType": "neural_mechanism",
547
+ "observations": ["Core component of transformer architecture", "Allows model to focus on different positions", "Uses 8 heads with d_model=512 by default", "Projects to d_k=64 dimensions per head"]
548
+ },
549
+ {
550
+ "name": "attention_formula",
551
+ "entityType": "mathematical_operation",
552
+ "observations": ["QK^T/√d_k followed by softmax and multiplication with V", "Scaling by √d_k prevents vanishing gradients"]
553
+ },
554
+ {
555
+ "name": "linear_projections",
556
+ "entityType": "neural_operation",
557
+ "observations": ["W_q, W_k, W_v for query/key/value transformation", "W_o for output projection"]
558
+ }
559
+ ],
560
+ "relations": [
561
+ {
562
+ "from": "multi_head_attention",
563
+ "to": "ScaledDotProductAttention",
564
+ "relationType": ["contains", "uses"]
565
+ },
566
+ {
567
+ "from": "ScaledDotProductAttention",
568
+ "to": "attention_formula",
569
+ "relationType": ["implements", "computes"]
570
+ },
571
+ {
572
+ "from": "multi_head_attention",
573
+ "to": "linear_projections",
574
+ "relationType": ["uses", "applies"]
575
+ }
576
+ ]
577
+ }
578
+ ```
579
+
580
+ ### Example 6: Chemical Synthesis Protocol
581
+
582
+ Input:
583
+
584
+ ## File Information
585
+
586
+ Path: `synthesis/aspirin_synthesis.md`
587
+
588
+ ## Content to Analyze
589
+ ```
590
+ # Aspirin Synthesis via Acetylation
591
+
592
+ ## Reaction
593
+ Salicylic acid + Acetic anhydride → Aspirin + Acetic acid
594
+
595
+ ## Procedure
596
+ 1. Heat 2.0g salicylic acid with 5.0mL acetic anhydride
597
+ 2. Add 3 drops concentrated H₂SO₄ as catalyst
598
+ 3. Heat at 85°C for 15 minutes
599
+ 4. Cool and add 20mL cold water to precipitate product
600
+ 5. Filter and recrystallize from ethanol
601
+
602
+ ## Yield
603
+ Theoretical yield: 2.61g (MW: 180.16 g/mol)
604
+ Actual yield: 2.31g (88.5% yield)
605
+ ```
606
+
607
+ Output:
608
+
609
+ ```json
610
+ {
611
+ "entities": [
612
+ {
613
+ "name": "aspirin_synthesis",
614
+ "entityType": "chemical_reaction",
615
+ "observations": ["Acetylation reaction", "88.5% yield achieved", "Requires heating at 85°C for 15 minutes"]
616
+ },
617
+ {
618
+ "name": "salicylic_acid",
619
+ "entityType": "chemical_compound",
620
+ "observations": ["Starting material", "2.0g used in synthesis", "Phenolic acid"]
621
+ },
622
+ {
623
+ "name": "acetic_anhydride",
624
+ "entityType": "chemical_reagent",
625
+ "observations": ["Acetylating agent", "5.0mL used", "Provides acetyl group"]
626
+ },
627
+ {
628
+ "name": "aspirin",
629
+ "entityType": "pharmaceutical_compound",
630
+ "observations": ["Product of synthesis", "MW: 180.16 g/mol", "Theoretical yield: 2.61g"]
631
+ },
632
+ {
633
+ "name": "sulfuric_acid_catalyst",
634
+ "entityType": "catalyst",
635
+ "observations": ["Concentrated H₂SO₄", "3 drops used", "Accelerates acetylation reaction"]
636
+ }
637
+ ],
638
+ "relations": [
639
+ {
640
+ "from": "salicylic_acid",
641
+ "to": "aspirin",
642
+ "relationType": ["converts_to", "reacts_to_form"]
643
+ },
644
+ {
645
+ "from": "acetic_anhydride",
646
+ "to": "aspirin",
647
+ "relationType": ["acetylates_to_form"]
648
+ },
649
+ {
650
+ "from": "sulfuric_acid_catalyst",
651
+ "to": "aspirin_synthesis",
652
+ "relationType": ["catalyzes", "accelerates"]
653
+ }
654
+ ]
655
+ }
656
+ ```
657
+
658
+ ### Example 7: Neuroscience Experiment
659
+
660
+ Input:
661
+
662
+ ## File Information
663
+
664
+ Path: `experiments/neuron_recording.py`
665
+
666
+ ## Existing Knowledge Context
667
+
668
+ Entities already extracted from other parts of this project (do NOT copy them — only reference or extend them):
669
+
670
+ - **action_potential** (electrical_signal): Neural spike; All-or-nothing electrical event
671
+
672
+ ## Content to Analyze
673
+ ```
674
+ import numpy as np
675
+ from scipy import signal
676
+
677
+ def detect_spikes(voltage_trace, threshold=-30, sample_rate=20000):
678
+     """
679
+     Detect action potentials in intracellular voltage recordings.
680
+
681
+     Args:
682
+         voltage_trace: Voltage data in mV
683
+         threshold: Spike detection threshold in mV
684
+         sample_rate: Sampling frequency in Hz
685
+     """
686
+     # High-pass filter to remove slow drift
687
+     b, a = signal.butter(4, 300/(sample_rate/2), 'high')
688
+     filtered = signal.filtfilt(b, a, voltage_trace)
689
+
690
+     # Find threshold crossings
691
+     crossings = np.where(np.diff(np.sign(filtered - threshold)))[0]
692
+
693
+     # Extract spike waveforms (1ms windows)
694
+     window_size = int(sample_rate * 0.001)
695
+     spikes = []
696
+
697
+     for crossing in crossings:
698
+         if crossing > window_size and crossing < len(filtered) - window_size:
699
+             spike = filtered[crossing-window_size:crossing+window_size]
700
+             spikes.append(spike)
701
+
702
+     return np.array(spikes), crossings
703
+
704
+ # Analysis parameters
705
+ RESTING_POTENTIAL = -70 # mV
706
+ SPIKE_THRESHOLD = -30 # mV
707
+ REFRACTORY_PERIOD = 2 # ms
708
+ ```
709
+
710
+ Output:
711
+
712
+ ```json
713
+ {
714
+ "entities": [
715
+ {
716
+ "name": "spike_detection_algorithm",
717
+ "entityType": "signal_processing_method",
718
+ "observations": ["Detects action potentials in voltage recordings", "Uses threshold crossing at -30mV", "Applies high-pass filtering at 300Hz"]
719
+ },
720
+ {
721
+ "name": "action_potential",
722
+ "entityType": "electrical_signal",
723
+ "observations": ["Neural spike", "All-or-nothing electrical event", "Detected above -30mV threshold", "Extracted in 1ms windows"]
724
+ },
725
+ {
726
+ "name": "intracellular_recording",
727
+ "entityType": "experimental_technique",
728
+ "observations": ["Voltage measurements from inside neurons", "20kHz sampling rate", "Measures membrane potential"]
729
+ },
730
+ {
731
+ "name": "high_pass_filter",
732
+ "entityType": "signal_filter",
733
+ "observations": ["4th order Butterworth filter", "300Hz cutoff frequency", "Removes slow voltage drift"]
734
+ },
735
+ {
736
+ "name": "resting_potential",
737
+ "entityType": "physiological_parameter",
738
+ "observations": ["Baseline membrane voltage", "Set at -70mV", "Stable state between spikes"]
739
+ }
740
+ ],
741
+ "relations": [
742
+ {
743
+ "from": "spike_detection_algorithm",
744
+ "to": "action_potential",
745
+ "relationType": ["detects", "identifies"]
746
+ },
747
+ {
748
+ "from": "high_pass_filter",
749
+ "to": "spike_detection_algorithm",
750
+ "relationType": ["preprocesses_for"]
751
+ },
752
+ {
753
+ "from": "intracellular_recording",
754
+ "to": "action_potential",
755
+ "relationType": ["records", "measures"]
756
+ },
757
+ {
758
+ "from": "resting_potential",
759
+ "to": "action_potential",
760
+ "relationType": ["baseline_for"]
761
+ }
762
+ ]
763
+ }
764
+ ```
765
+
766
+ ### Example 8: Edge Case - Malformed Content
767
+
768
+ Input:
769
+
770
+ ## File Information
771
+
772
+ Path: `corrupted.txt`
773
+ Chunk: 10 of 13
774
+
775
+ ## Content to Analyze
776
+
777
+ ```
778
+ X H qrewf __TEXT __text eeee 0 n 0 __stubs __TEXT 22e4e __TEXT 8 __cstring afdsaa __unwind_info __TEXT H __DATA_CONST __got adsf __DATA __la_symbol_ptr __DATA __data __DATA H __LINKEDIT 0 8 X 0 8 X P usr lib dyld D 3 XK U 2 0 8 d usr lib libSystem B dylib UH H E H u H H 5 O H E E 6 M H H 1 A A A bA L aA AS 9 h h h h s
779
+ ```
780
+
781
+ Output:
782
+
783
+ ```json
784
+ {
785
+ "entities": [],
786
+ "relations": []
787
+ }
788
+ ```
789
+
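On the consuming side, the empty graph shown above is also a reasonable fallback when a model response cannot be parsed. The sketch below assumes the response arrives as a raw JSON string. `parse_graph` and `empty_graph` are hypothetical names, and the package's actual parsing and validation may differ.

```python
import json


def empty_graph() -> dict:
    """Return a fresh empty graph so callers never share mutable state."""
    return {"entities": [], "relations": []}


def parse_graph(raw: str) -> dict:
    """Parse a model response, falling back to the empty graph on any shape problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return empty_graph()
    if not isinstance(data, dict):
        return empty_graph()
    entities = data.get("entities")
    relations = data.get("relations")
    if not isinstance(entities, list) or not isinstance(relations, list):
        return empty_graph()
    return {"entities": entities, "relations": relations}


valid = parse_graph('{"entities": [{"name": "aspirin"}], "relations": []}')
garbled = parse_graph("X H qrewf __TEXT __text")
wrong_shape = parse_graph("[1, 2, 3]")
```

This only checks the top-level shape; validating individual entities and relations (required keys, `relationType` being a list) would be a separate step.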
790
+ {{#if domainExamples}}
791
+ ## Domain-Specific Extraction Examples
792
+
793
+ The following examples are tailored to the detected content type ({{detectedContentClass}}):
794
+
795
+ {{domainExamples}}
796
+
797
+ {{/if}}
798
+ ## FINAL REMINDER
799
+
800
+ Your success is measured by the **meaningfulness and accuracy** of extracted knowledge. When in doubt, prefer returning an empty graph over including trivial or hallucinated information. Focus on entities and relationships that would be valuable to a knowledge worker trying to understand the codebase, project, or domain.