@wanshi-kg/wanshi 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +458 -0
- package/dist/__tests__/helpers.js +27 -0
- package/dist/__tests__/helpers.js.map +1 -0
- package/dist/cli/commands/export.command.js +99 -0
- package/dist/cli/commands/export.command.js.map +1 -0
- package/dist/cli/commands/index.js +22 -0
- package/dist/cli/commands/index.js.map +1 -0
- package/dist/cli/commands/inspectMerges.command.js +84 -0
- package/dist/cli/commands/inspectMerges.command.js.map +1 -0
- package/dist/cli/commands/metrics.command.js +196 -0
- package/dist/cli/commands/metrics.command.js.map +1 -0
- package/dist/cli/commands/process.command.js +82 -0
- package/dist/cli/commands/process.command.js.map +1 -0
- package/dist/cli/commands/watch.command.js +91 -0
- package/dist/cli/commands/watch.command.js.map +1 -0
- package/dist/cli/index.js +269 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/cli/optionsToConfig.js +160 -0
- package/dist/cli/optionsToConfig.js.map +1 -0
- package/dist/config/index.js +59 -0
- package/dist/config/index.js.map +1 -0
- package/dist/config/legacyHints.js +113 -0
- package/dist/config/legacyHints.js.map +1 -0
- package/dist/config/schema.js +803 -0
- package/dist/config/schema.js.map +1 -0
- package/dist/config/ui.js +221 -0
- package/dist/config/ui.js.map +1 -0
- package/dist/core/DirectoryProcessor.js +725 -0
- package/dist/core/DirectoryProcessor.js.map +1 -0
- package/dist/core/adapters/IStructuredAdapter.js +3 -0
- package/dist/core/adapters/IStructuredAdapter.js.map +1 -0
- package/dist/core/adapters/SqliteAdapter.js +267 -0
- package/dist/core/adapters/SqliteAdapter.js.map +1 -0
- package/dist/core/adapters/StructuredAdapterRegistry.js +31 -0
- package/dist/core/adapters/StructuredAdapterRegistry.js.map +1 -0
- package/dist/core/adapters/index.js +20 -0
- package/dist/core/adapters/index.js.map +1 -0
- package/dist/core/checkpoint/CheckpointService.js +188 -0
- package/dist/core/checkpoint/CheckpointService.js.map +1 -0
- package/dist/core/checkpoint/index.js +18 -0
- package/dist/core/checkpoint/index.js.map +1 -0
- package/dist/core/corpus/CorpusAnalyzer.js +266 -0
- package/dist/core/corpus/CorpusAnalyzer.js.map +1 -0
- package/dist/core/corpus/CorpusProfileStore.js +92 -0
- package/dist/core/corpus/CorpusProfileStore.js.map +1 -0
- package/dist/core/corpus/index.js +21 -0
- package/dist/core/corpus/index.js.map +1 -0
- package/dist/core/corpus/normalizeGlossary.js +60 -0
- package/dist/core/corpus/normalizeGlossary.js.map +1 -0
- package/dist/core/corpus/relPath.js +52 -0
- package/dist/core/corpus/relPath.js.map +1 -0
- package/dist/core/corpus/termFrequency.js +86 -0
- package/dist/core/corpus/termFrequency.js.map +1 -0
- package/dist/core/cost/CostMeter.js +235 -0
- package/dist/core/cost/CostMeter.js.map +1 -0
- package/dist/core/cost/index.js +19 -0
- package/dist/core/cost/index.js.map +1 -0
- package/dist/core/cost/prices.js +38 -0
- package/dist/core/cost/prices.js.map +1 -0
- package/dist/core/cv/ObjectDetectionService.js +119 -0
- package/dist/core/cv/ObjectDetectionService.js.map +1 -0
- package/dist/core/di/ContainerFactory.js +670 -0
- package/dist/core/di/ContainerFactory.js.map +1 -0
- package/dist/core/di/DIContainer.js +103 -0
- package/dist/core/di/DIContainer.js.map +1 -0
- package/dist/core/di/index.js +19 -0
- package/dist/core/di/index.js.map +1 -0
- package/dist/core/errors/CustomErrors.js +342 -0
- package/dist/core/errors/CustomErrors.js.map +1 -0
- package/dist/core/errors/index.js +18 -0
- package/dist/core/errors/index.js.map +1 -0
- package/dist/core/export/KnowledgeGraphExportService.js +56 -0
- package/dist/core/export/KnowledgeGraphExportService.js.map +1 -0
- package/dist/core/export/index.js +19 -0
- package/dist/core/export/index.js.map +1 -0
- package/dist/core/export/strategies/GraphitiExportStrategy.js +115 -0
- package/dist/core/export/strategies/GraphitiExportStrategy.js.map +1 -0
- package/dist/core/export/strategies/GraphvizDotExportStrategy.js +331 -0
- package/dist/core/export/strategies/GraphvizDotExportStrategy.js.map +1 -0
- package/dist/core/export/strategies/IExportStrategy.js +3 -0
- package/dist/core/export/strategies/IExportStrategy.js.map +1 -0
- package/dist/core/export/strategies/JsonExportStrategy.js +19 -0
- package/dist/core/export/strategies/JsonExportStrategy.js.map +1 -0
- package/dist/core/export/strategies/JsonlExportStrategy.js +69 -0
- package/dist/core/export/strategies/JsonlExportStrategy.js.map +1 -0
- package/dist/core/export/strategies/KblamExportStrategy.js +36 -0
- package/dist/core/export/strategies/KblamExportStrategy.js.map +1 -0
- package/dist/core/export/strategies/LoraExportStrategy.js +46 -0
- package/dist/core/export/strategies/LoraExportStrategy.js.map +1 -0
- package/dist/core/export/strategies/McpExportStrategy.js +67 -0
- package/dist/core/export/strategies/McpExportStrategy.js.map +1 -0
- package/dist/core/export/strategies/index.js +25 -0
- package/dist/core/export/strategies/index.js.map +1 -0
- package/dist/core/export/strategies/kbTriples.js +60 -0
- package/dist/core/export/strategies/kbTriples.js.map +1 -0
- package/dist/core/index.js +22 -0
- package/dist/core/index.js.map +1 -0
- package/dist/core/knowledge/KnowledgeGraphBuilder.js +627 -0
- package/dist/core/knowledge/KnowledgeGraphBuilder.js.map +1 -0
- package/dist/core/knowledge/MergeRecord.js +3 -0
- package/dist/core/knowledge/MergeRecord.js.map +1 -0
- package/dist/core/knowledge/canon/Canonicalizer.js +414 -0
- package/dist/core/knowledge/canon/Canonicalizer.js.map +1 -0
- package/dist/core/knowledge/canon/index.js +18 -0
- package/dist/core/knowledge/canon/index.js.map +1 -0
- package/dist/core/knowledge/contradiction/HeuristicContradictionChecker.js +92 -0
- package/dist/core/knowledge/contradiction/HeuristicContradictionChecker.js.map +1 -0
- package/dist/core/knowledge/contradiction/LlmContradictionChecker.js +52 -0
- package/dist/core/knowledge/contradiction/LlmContradictionChecker.js.map +1 -0
- package/dist/core/knowledge/contradiction/index.js +19 -0
- package/dist/core/knowledge/contradiction/index.js.map +1 -0
- package/dist/core/knowledge/grounding/KeywordGroundingChecker.js +33 -0
- package/dist/core/knowledge/grounding/KeywordGroundingChecker.js.map +1 -0
- package/dist/core/knowledge/grounding/MiniCheckGroundingChecker.js +82 -0
- package/dist/core/knowledge/grounding/MiniCheckGroundingChecker.js.map +1 -0
- package/dist/core/knowledge/grounding/index.js +20 -0
- package/dist/core/knowledge/grounding/index.js.map +1 -0
- package/dist/core/knowledge/grounding/verbalize.js +38 -0
- package/dist/core/knowledge/grounding/verbalize.js.map +1 -0
- package/dist/core/knowledge/images/imageMetaGraph.js +136 -0
- package/dist/core/knowledge/images/imageMetaGraph.js.map +1 -0
- package/dist/core/knowledge/index.js +20 -0
- package/dist/core/knowledge/index.js.map +1 -0
- package/dist/core/knowledge/merging/KnowledgeMerger.js +624 -0
- package/dist/core/knowledge/merging/KnowledgeMerger.js.map +1 -0
- package/dist/core/knowledge/references/ReferenceResolver.js +184 -0
- package/dist/core/knowledge/references/ReferenceResolver.js.map +1 -0
- package/dist/core/knowledge/references/citations/CitationEvidenceProcessor.js +401 -0
- package/dist/core/knowledge/references/citations/CitationEvidenceProcessor.js.map +1 -0
- package/dist/core/knowledge/references/citations/CitationResolver.js +95 -0
- package/dist/core/knowledge/references/citations/CitationResolver.js.map +1 -0
- package/dist/core/knowledge/references/citations/GrobidClient.js +143 -0
- package/dist/core/knowledge/references/citations/GrobidClient.js.map +1 -0
- package/dist/core/knowledge/references/citations/TitleIdResolver.js +101 -0
- package/dist/core/knowledge/references/citations/TitleIdResolver.js.map +1 -0
- package/dist/core/knowledge/references/web/FetchCacheService.js +114 -0
- package/dist/core/knowledge/references/web/FetchCacheService.js.map +1 -0
- package/dist/core/knowledge/references/web/GatedFetcher.js +228 -0
- package/dist/core/knowledge/references/web/GatedFetcher.js.map +1 -0
- package/dist/core/knowledge/references/web/WebReferenceProcessor.js +164 -0
- package/dist/core/knowledge/references/web/WebReferenceProcessor.js.map +1 -0
- package/dist/core/knowledge/search/KnowledgeGraphSearch.js +261 -0
- package/dist/core/knowledge/search/KnowledgeGraphSearch.js.map +1 -0
- package/dist/core/knowledge/vocabulary.js +162 -0
- package/dist/core/knowledge/vocabulary.js.map +1 -0
- package/dist/core/llm/EmbeddingService.js +113 -0
- package/dist/core/llm/EmbeddingService.js.map +1 -0
- package/dist/core/llm/OllamaService.js +146 -0
- package/dist/core/llm/OllamaService.js.map +1 -0
- package/dist/core/llm/OpenAICompatibleService.js +190 -0
- package/dist/core/llm/OpenAICompatibleService.js.map +1 -0
- package/dist/core/llm/OpenAIEmbeddingService.js +129 -0
- package/dist/core/llm/OpenAIEmbeddingService.js.map +1 -0
- package/dist/core/llm/embeddingUtils.js +25 -0
- package/dist/core/llm/embeddingUtils.js.map +1 -0
- package/dist/core/llm/index.js +23 -0
- package/dist/core/llm/index.js.map +1 -0
- package/dist/core/llm/prompts/PromptManager.js +388 -0
- package/dist/core/llm/prompts/PromptManager.js.map +1 -0
- package/dist/core/llm/prompts/PromptTemplateEngine.js +257 -0
- package/dist/core/llm/prompts/PromptTemplateEngine.js.map +1 -0
- package/dist/core/llm/prompts/templates/partials/examples/EXAMPLE_STYLE_GUIDE.md +84 -0
- package/dist/core/llm/prompts/templates/partials/examples/article.md +187 -0
- package/dist/core/llm/prompts/templates/partials/examples/code.md +229 -0
- package/dist/core/llm/prompts/templates/partials/examples/communication.md +205 -0
- package/dist/core/llm/prompts/templates/partials/examples/documentation.md +262 -0
- package/dist/core/llm/prompts/templates/partials/examples/financial.md +157 -0
- package/dist/core/llm/prompts/templates/partials/examples/legal.md +153 -0
- package/dist/core/llm/prompts/templates/partials/examples/logs.md +127 -0
- package/dist/core/llm/prompts/templates/partials/examples/medical.md +218 -0
- package/dist/core/llm/prompts/templates/partials/examples/notes.md +201 -0
- package/dist/core/llm/prompts/templates/partials/examples/research.md +208 -0
- package/dist/core/llm/prompts/templates/partials/examples/tabular.md +178 -0
- package/dist/core/llm/prompts/templates/partials/examples/transcript.md +204 -0
- package/dist/core/llm/prompts/templates/partials/retrieved-context.hbs +18 -0
- package/dist/core/llm/prompts/templates/v1/system.hbs +371 -0
- package/dist/core/llm/prompts/templates/v1/user.hbs +20 -0
- package/dist/core/llm/prompts/templates/v2/system.hbs +573 -0
- package/dist/core/llm/prompts/templates/v2/user.hbs +20 -0
- package/dist/core/llm/prompts/templates/v3/system.hbs +861 -0
- package/dist/core/llm/prompts/templates/v3/user.hbs +16 -0
- package/dist/core/llm/prompts/templates/v4/system.hbs +800 -0
- package/dist/core/llm/prompts/templates/v4/user.hbs +40 -0
- package/dist/core/llm/prompts/templates/v4.5/system.hbs +71 -0
- package/dist/core/llm/prompts/templates/v4.5/user.hbs +46 -0
- package/dist/core/llm/prompts/templates/v5/glossary/system.hbs +40 -0
- package/dist/core/llm/prompts/templates/v5/glossary/user.hbs +11 -0
- package/dist/core/llm/prompts/templates/v5/system.hbs +163 -0
- package/dist/core/llm/prompts/templates/v5/user.hbs +55 -0
- package/dist/core/pipeline/GroundingTransform.js +52 -0
- package/dist/core/pipeline/GroundingTransform.js.map +1 -0
- package/dist/core/pipeline/PipelineRunner.js +51 -0
- package/dist/core/pipeline/PipelineRunner.js.map +1 -0
- package/dist/core/pipeline/RelationFilterTransform.js +72 -0
- package/dist/core/pipeline/RelationFilterTransform.js.map +1 -0
- package/dist/core/pipeline/index.js +20 -0
- package/dist/core/pipeline/index.js.map +1 -0
- package/dist/core/processor/FileProcessor.js +184 -0
- package/dist/core/processor/FileProcessor.js.map +1 -0
- package/dist/core/processor/ProcessedRegistry.js +38 -0
- package/dist/core/processor/ProcessedRegistry.js.map +1 -0
- package/dist/core/processor/ast/AstSeedService.js +0 -0
- package/dist/core/processor/ast/AstSeedService.js.map +1 -0
- package/dist/core/processor/ast/AstSymbolStore.js +110 -0
- package/dist/core/processor/ast/AstSymbolStore.js.map +1 -0
- package/dist/core/processor/ast/index.js +19 -0
- package/dist/core/processor/ast/index.js.map +1 -0
- package/dist/core/processor/chunking/TextChunker.js +98 -0
- package/dist/core/processor/chunking/TextChunker.js.map +1 -0
- package/dist/core/processor/chunking/index.js +18 -0
- package/dist/core/processor/chunking/index.js.map +1 -0
- package/dist/core/processor/classifier/CONTENT_CLASSES.js +294 -0
- package/dist/core/processor/classifier/CONTENT_CLASSES.js.map +1 -0
- package/dist/core/processor/classifier/CascadeContentClassifier.js +107 -0
- package/dist/core/processor/classifier/CascadeContentClassifier.js.map +1 -0
- package/dist/core/processor/classifier/HeuristicContentClassifier.js +113 -0
- package/dist/core/processor/classifier/HeuristicContentClassifier.js.map +1 -0
- package/dist/core/processor/classifier/IContentTypeClassifier.js +3 -0
- package/dist/core/processor/classifier/IContentTypeClassifier.js.map +1 -0
- package/dist/core/processor/classifier/LlmContentClassifier.js +107 -0
- package/dist/core/processor/classifier/LlmContentClassifier.js.map +1 -0
- package/dist/core/processor/classifier/NER_DOMAIN_EXAMPLES.js +498 -0
- package/dist/core/processor/classifier/NER_DOMAIN_EXAMPLES.js.map +1 -0
- package/dist/core/processor/classifier/index.js +21 -0
- package/dist/core/processor/classifier/index.js.map +1 -0
- package/dist/core/processor/classifier/mergeClassifications.js +32 -0
- package/dist/core/processor/classifier/mergeClassifications.js.map +1 -0
- package/dist/core/processor/index.js +20 -0
- package/dist/core/processor/index.js.map +1 -0
- package/dist/core/processor/readers/AudioReader.js +462 -0
- package/dist/core/processor/readers/AudioReader.js.map +1 -0
- package/dist/core/processor/readers/BinaryReader.js +90 -0
- package/dist/core/processor/readers/BinaryReader.js.map +1 -0
- package/dist/core/processor/readers/ChandraPdfReader.js +187 -0
- package/dist/core/processor/readers/ChandraPdfReader.js.map +1 -0
- package/dist/core/processor/readers/ChatExportReader.js +365 -0
- package/dist/core/processor/readers/ChatExportReader.js.map +1 -0
- package/dist/core/processor/readers/DoclingReader.js +445 -0
- package/dist/core/processor/readers/DoclingReader.js.map +1 -0
- package/dist/core/processor/readers/EmailReader.js +259 -0
- package/dist/core/processor/readers/EmailReader.js.map +1 -0
- package/dist/core/processor/readers/EpubReader.js +175 -0
- package/dist/core/processor/readers/EpubReader.js.map +1 -0
- package/dist/core/processor/readers/FileReader.js +90 -0
- package/dist/core/processor/readers/FileReader.js.map +1 -0
- package/dist/core/processor/readers/FileReaderFactory.js +49 -0
- package/dist/core/processor/readers/FileReaderFactory.js.map +1 -0
- package/dist/core/processor/readers/HtmlReader.js +371 -0
- package/dist/core/processor/readers/HtmlReader.js.map +1 -0
- package/dist/core/processor/readers/ImageReader.js +162 -0
- package/dist/core/processor/readers/ImageReader.js.map +1 -0
- package/dist/core/processor/readers/JsonFileReader.js +232 -0
- package/dist/core/processor/readers/JsonFileReader.js.map +1 -0
- package/dist/core/processor/readers/JupyterReader.js +178 -0
- package/dist/core/processor/readers/JupyterReader.js.map +1 -0
- package/dist/core/processor/readers/LatexReader.js +176 -0
- package/dist/core/processor/readers/LatexReader.js.map +1 -0
- package/dist/core/processor/readers/MarkdownReader.js +289 -0
- package/dist/core/processor/readers/MarkdownReader.js.map +1 -0
- package/dist/core/processor/readers/MarkerPdfReader.js +193 -0
- package/dist/core/processor/readers/MarkerPdfReader.js.map +1 -0
- package/dist/core/processor/readers/MistralOcrReader.js +198 -0
- package/dist/core/processor/readers/MistralOcrReader.js.map +1 -0
- package/dist/core/processor/readers/OfficeReader.js +174 -0
- package/dist/core/processor/readers/OfficeReader.js.map +1 -0
- package/dist/core/processor/readers/PdfReader.js +116 -0
- package/dist/core/processor/readers/PdfReader.js.map +1 -0
- package/dist/core/processor/readers/RtfReader.js +107 -0
- package/dist/core/processor/readers/RtfReader.js.map +1 -0
- package/dist/core/processor/readers/SubtitleReader.js +145 -0
- package/dist/core/processor/readers/SubtitleReader.js.map +1 -0
- package/dist/core/processor/readers/TesseractPdfReader.js +183 -0
- package/dist/core/processor/readers/TesseractPdfReader.js.map +1 -0
- package/dist/core/processor/readers/TextReader.js +129 -0
- package/dist/core/processor/readers/TextReader.js.map +1 -0
- package/dist/core/processor/readers/TranscriptReader.js +234 -0
- package/dist/core/processor/readers/TranscriptReader.js.map +1 -0
- package/dist/core/processor/readers/image/imageMetadata.js +155 -0
- package/dist/core/processor/readers/image/imageMetadata.js.map +1 -0
- package/dist/core/processor/readers/index.js +41 -0
- package/dist/core/processor/readers/index.js.map +1 -0
- package/dist/core/processor/readers/referenceExtraction.js +198 -0
- package/dist/core/processor/readers/referenceExtraction.js.map +1 -0
- package/dist/core/processor/readers/stripReferences.js +59 -0
- package/dist/core/processor/readers/stripReferences.js.map +1 -0
- package/dist/core/processor/readers/transcript/turnPacking.js +81 -0
- package/dist/core/processor/readers/transcript/turnPacking.js.map +1 -0
- package/dist/core/progress/NdjsonProgressEmitter.js +30 -0
- package/dist/core/progress/NdjsonProgressEmitter.js.map +1 -0
- package/dist/core/progress/NoopProgressEmitter.js +15 -0
- package/dist/core/progress/NoopProgressEmitter.js.map +1 -0
- package/dist/core/progress/index.js +19 -0
- package/dist/core/progress/index.js.map +1 -0
- package/dist/core/trace/TraceWriter.js +100 -0
- package/dist/core/trace/TraceWriter.js.map +1 -0
- package/dist/core/trace/events.js +13 -0
- package/dist/core/trace/events.js.map +1 -0
- package/dist/core/trace/index.js +20 -0
- package/dist/core/trace/index.js.map +1 -0
- package/dist/core/trace/lineage.js +97 -0
- package/dist/core/trace/lineage.js.map +1 -0
- package/dist/evaluation/BenchmarkRunner.js +171 -0
- package/dist/evaluation/BenchmarkRunner.js.map +1 -0
- package/dist/evaluation/classifier/ClassifierAccuracy.js +185 -0
- package/dist/evaluation/classifier/ClassifierAccuracy.js.map +1 -0
- package/dist/evaluation/classifier/labeledSamples.js +379 -0
- package/dist/evaluation/classifier/labeledSamples.js.map +1 -0
- package/dist/evaluation/compare/goldCompare.js +126 -0
- package/dist/evaluation/compare/goldCompare.js.map +1 -0
- package/dist/evaluation/crossre/compareScoring.js +30 -0
- package/dist/evaluation/crossre/compareScoring.js.map +1 -0
- package/dist/evaluation/datasets/CrossREDataset.js +170 -0
- package/dist/evaluation/datasets/CrossREDataset.js.map +1 -0
- package/dist/evaluation/datasets/IDataset.js +3 -0
- package/dist/evaluation/datasets/IDataset.js.map +1 -0
- package/dist/evaluation/datasets/RebelDataset.js +117 -0
- package/dist/evaluation/datasets/RebelDataset.js.map +1 -0
- package/dist/evaluation/datasets/RedocredDataset.js +218 -0
- package/dist/evaluation/datasets/RedocredDataset.js.map +1 -0
- package/dist/evaluation/datasets/SemEval2010Dataset.js +150 -0
- package/dist/evaluation/datasets/SemEval2010Dataset.js.map +1 -0
- package/dist/evaluation/index.js +33 -0
- package/dist/evaluation/index.js.map +1 -0
- package/dist/evaluation/matching/ExactMatcher.js +75 -0
- package/dist/evaluation/matching/ExactMatcher.js.map +1 -0
- package/dist/evaluation/matching/SemanticMatcher.js +143 -0
- package/dist/evaluation/matching/SemanticMatcher.js.map +1 -0
- package/dist/evaluation/metrics/TripleMetrics.js +64 -0
- package/dist/evaluation/metrics/TripleMetrics.js.map +1 -0
- package/dist/evaluation/mine/MineCheckpoint.js +114 -0
- package/dist/evaluation/mine/MineCheckpoint.js.map +1 -0
- package/dist/evaluation/mine/MineDataset.js +208 -0
- package/dist/evaluation/mine/MineDataset.js.map +1 -0
- package/dist/evaluation/mine/MineReporter.js +98 -0
- package/dist/evaluation/mine/MineReporter.js.map +1 -0
- package/dist/evaluation/mine/MineRunner.js +148 -0
- package/dist/evaluation/mine/MineRunner.js.map +1 -0
- package/dist/evaluation/mine/MineScorer.js +127 -0
- package/dist/evaluation/mine/MineScorer.js.map +1 -0
- package/dist/evaluation/mine/types.js +12 -0
- package/dist/evaluation/mine/types.js.map +1 -0
- package/dist/evaluation/reporters/ConsoleReporter.js +55 -0
- package/dist/evaluation/reporters/ConsoleReporter.js.map +1 -0
- package/dist/evaluation/reporters/JsonReporter.js +50 -0
- package/dist/evaluation/reporters/JsonReporter.js.map +1 -0
- package/dist/index.js +28 -0
- package/dist/index.js.map +1 -0
- package/dist/quality/CompositeScore.js +61 -0
- package/dist/quality/CompositeScore.js.map +1 -0
- package/dist/quality/ConsistencyMetrics.js +70 -0
- package/dist/quality/ConsistencyMetrics.js.map +1 -0
- package/dist/quality/FactualMetrics.js +76 -0
- package/dist/quality/FactualMetrics.js.map +1 -0
- package/dist/quality/GraphHealthMetrics.js +68 -0
- package/dist/quality/GraphHealthMetrics.js.map +1 -0
- package/dist/quality/SemanticMetrics.js +102 -0
- package/dist/quality/SemanticMetrics.js.map +1 -0
- package/dist/quality/StructuralMetrics.js +60 -0
- package/dist/quality/StructuralMetrics.js.map +1 -0
- package/dist/quality/index.js +23 -0
- package/dist/quality/index.js.map +1 -0
- package/dist/shared/index.js +20 -0
- package/dist/shared/index.js.map +1 -0
- package/dist/shared/logger/Logger.js +3 -0
- package/dist/shared/logger/Logger.js.map +1 -0
- package/dist/shared/logger/LoggerFactory.js +75 -0
- package/dist/shared/logger/LoggerFactory.js.map +1 -0
- package/dist/shared/logger/index.js +19 -0
- package/dist/shared/logger/index.js.map +1 -0
- package/dist/shared/shutdown.js +30 -0
- package/dist/shared/shutdown.js.map +1 -0
- package/dist/shared/utils/agglomerativeCluster.js +269 -0
- package/dist/shared/utils/agglomerativeCluster.js.map +1 -0
- package/dist/shared/utils/astSymbols.js +69 -0
- package/dist/shared/utils/astSymbols.js.map +1 -0
- package/dist/shared/utils/cosineSimilarity.js +18 -0
- package/dist/shared/utils/cosineSimilarity.js.map +1 -0
- package/dist/shared/utils/directoryTree.js +184 -0
- package/dist/shared/utils/directoryTree.js.map +1 -0
- package/dist/shared/utils/documentOutline.js +74 -0
- package/dist/shared/utils/documentOutline.js.map +1 -0
- package/dist/shared/utils/index.js +24 -0
- package/dist/shared/utils/index.js.map +1 -0
- package/dist/shared/utils/jaroWinklerSimilarity.js +60 -0
- package/dist/shared/utils/jaroWinklerSimilarity.js.map +1 -0
- package/dist/shared/utils/parseJsonLenient.js +27 -0
- package/dist/shared/utils/parseJsonLenient.js.map +1 -0
- package/dist/shared/utils/readConfig.js +42 -0
- package/dist/shared/utils/readConfig.js.map +1 -0
- package/dist/shared/utils/readRtf.js +216 -0
- package/dist/shared/utils/readRtf.js.map +1 -0
- package/dist/shared/utils/softmax.js +26 -0
- package/dist/shared/utils/softmax.js.map +1 -0
- package/dist/types/ContentClass.js +3 -0
- package/dist/types/ContentClass.js.map +1 -0
- package/dist/types/CorpusProfile.js +3 -0
- package/dist/types/CorpusProfile.js.map +1 -0
- package/dist/types/IContradictionChecker.js +3 -0
- package/dist/types/IContradictionChecker.js.map +1 -0
- package/dist/types/ICorpusAnalyzer.js +3 -0
- package/dist/types/ICorpusAnalyzer.js.map +1 -0
- package/dist/types/IDirectoryProcessor.js +3 -0
- package/dist/types/IDirectoryProcessor.js.map +1 -0
- package/dist/types/IEmbeddingProvider.js +3 -0
- package/dist/types/IEmbeddingProvider.js.map +1 -0
- package/dist/types/IEmbeddingService.js +6 -0
- package/dist/types/IEmbeddingService.js.map +1 -0
- package/dist/types/IFileProcessor.js +3 -0
- package/dist/types/IFileProcessor.js.map +1 -0
- package/dist/types/IGroundingChecker.js +3 -0
- package/dist/types/IGroundingChecker.js.map +1 -0
- package/dist/types/IKnowledgeGraphBuilder.js +3 -0
- package/dist/types/IKnowledgeGraphBuilder.js.map +1 -0
- package/dist/types/IKnowledgeGraphExporter.js +3 -0
- package/dist/types/IKnowledgeGraphExporter.js.map +1 -0
- package/dist/types/IKnowledgeGraphMerger.js +3 -0
- package/dist/types/IKnowledgeGraphMerger.js.map +1 -0
- package/dist/types/IKnowledgeGraphSearch.js +3 -0
- package/dist/types/IKnowledgeGraphSearch.js.map +1 -0
- package/dist/types/ILLMProvider.js +3 -0
- package/dist/types/ILLMProvider.js.map +1 -0
- package/dist/types/ILLMService.js +3 -0
- package/dist/types/ILLMService.js.map +1 -0
- package/dist/types/IObjectDetector.js +3 -0
- package/dist/types/IObjectDetector.js.map +1 -0
- package/dist/types/IProcessingService.js +3 -0
- package/dist/types/IProcessingService.js.map +1 -0
- package/dist/types/IProgressEmitter.js +3 -0
- package/dist/types/IProgressEmitter.js.map +1 -0
- package/dist/types/IPromptManager.js +3 -0
- package/dist/types/IPromptManager.js.map +1 -0
- package/dist/types/KnowledgeGraph.js +3 -0
- package/dist/types/KnowledgeGraph.js.map +1 -0
- package/dist/types/MCPKnowledgeGraph.js +3 -0
- package/dist/types/MCPKnowledgeGraph.js.map +1 -0
- package/dist/types/Observation.js +21 -0
- package/dist/types/Observation.js.map +1 -0
- package/dist/types/ProcessingOptions.js +3 -0
- package/dist/types/ProcessingOptions.js.map +1 -0
- package/dist/types/index.js +40 -0
- package/dist/types/index.js.map +1 -0
- package/package.json +122 -0
|
@@ -0,0 +1,800 @@
|
|
|
1
|
+
# Expert Knowledge Graph Generation System
|
|
2
|
+
|
|
3
|
+
## MISSION STATEMENT
|
|
4
|
+
|
|
5
|
+
You are an expert data analyst and knowledge extraction AI system. Your mission is to transform unstructured content from files into structured knowledge graphs that capture **meaningful** entities, relationships, and observations. Extract **specific** entities, relations, and observations from provided text/code/documentation/image content achieving over 90% factual accuracy and **ZERO** hallucinations.
|
|
6
|
+
|
|
7
|
+
## WORKING DIRECTORY CONTEXT
|
|
8
|
+
|
|
9
|
+
**Root Directory:** `{{inputDirectory}}`
|
|
10
|
+
**File Filter:** `{{filter}}`
|
|
11
|
+
{{#if directoryTree}}
|
|
12
|
+
**Directory Structure (filtered):**
|
|
13
|
+
|
|
14
|
+
```
|
|
15
|
+
{{directoryTree}}
|
|
16
|
+
```
|
|
17
|
+
|
|
18
|
+
Use this directory structure to understand file relationships, project organization, and contextual connections between entities.
|
|
19
|
+
{{#if userDescription}}
|
|
20
|
+
User provided following description of files in the working directory:
|
|
21
|
+
```
|
|
22
|
+
{{userDescription}}
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
{{/if}}
|
|
26
|
+
{{/if}}
|
|
27
|
+
## OUTPUT SCHEMA
|
|
28
|
+
|
|
29
|
+
You **MUST** output a valid JSON following this exact schema:
|
|
30
|
+
|
|
31
|
+
```json
|
|
32
|
+
{
|
|
33
|
+
"entities": [
|
|
34
|
+
{
|
|
35
|
+
"name": "unique_identifier",
|
|
36
|
+
"entityType": "person|organization|technology|concept|method|function|class|module|file|error|event|standard|protocol|algorithm|data_structure|etc",
|
|
37
|
+
"observations": ["meaningful_fact_1", "meaningful_fact_2", "..."]
|
|
38
|
+
}
|
|
39
|
+
],
|
|
40
|
+
"relations": [
|
|
41
|
+
{
|
|
42
|
+
"from": "entity_name",
|
|
43
|
+
"to": "entity_name",
|
|
44
|
+
"relationType": ["relationship_type_1", "relationship_type_2", "..."]
|
|
45
|
+
}
|
|
46
|
+
]
|
|
47
|
+
}
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
## CRITICAL SUCCESS CRITERIA
|
|
51
|
+
|
|
52
|
+
### ✅ DO (Good Response Indicators):
|
|
53
|
+
1. **Extract ONLY factually verifiable information** from the provided content
|
|
54
|
+
2. **Focus on meaningful, substantial entities** (functions, classes, concepts, technologies, people, organizations)
|
|
55
|
+
3. **Create specific, informative observations** that add real value
|
|
56
|
+
4. **Establish clear, logical relationships** between entities
|
|
57
|
+
5. **Use domain-appropriate naming**: `snake_case` for code identifiers and technical concepts; preserve original spelling and casing for proper nouns (people, places, organizations, events)
|
|
58
|
+
6. **Leverage directory context** to infer file relationships and project structure
|
|
59
|
+
7. **Return empty graph** if no meaningful knowledge can be extracted
|
|
60
|
+
8. **Cover every entity** in the content
|
|
61
|
+
9. **Focus on the key elements** of the file content
|
|
62
|
+
10. **You should return empty graph**, if no useful knowledge can be extracted . For example no file content present or file content malformed
|
|
63
|
+
11. **You should make meaningful connections**, for example "get_caller is a function that returns a caller method from stack" or "fraction-with-zero-denominator is a compiler error for a fraction with a zero denominator" or in JSON:
|
|
64
|
+
```
|
|
65
|
+
[
|
|
66
|
+
{
|
|
67
|
+
"name": "get_caller",
|
|
68
|
+
"entityType": "function",
|
|
69
|
+
"observations": [
|
|
70
|
+
"Returns a caller method from stack"
|
|
71
|
+
]
|
|
72
|
+
},
|
|
73
|
+
{
|
|
74
|
+
"name": "fraction-with-zero-denominator",
|
|
75
|
+
"entityType": "error",
|
|
76
|
+
"observations": [
|
|
77
|
+
"Represents a compiler error for a fraction with a zero denominator"
|
|
78
|
+
]
|
|
79
|
+
}
|
|
80
|
+
]
|
|
81
|
+
```
|
|
82
|
+
|
|
83
|
+
### ❌ DON'T (Response Quality Violations):
|
|
84
|
+
1. **Never hallucinate or infer** information not present in the content
|
|
85
|
+
2. **Avoid trivial entities** like basic data types, common keywords, or obvious concepts
|
|
86
|
+
3. **Don't create meaningless observations** like "x is a variable" or "1 is a number"
|
|
87
|
+
4. **Don't establish weak relationships** without clear evidence
|
|
88
|
+
5. **Don't include syntax artifacts** as entities (brackets, semicolons, etc.)
|
|
89
|
+
6. **Don't duplicate information** across multiple entities unnecessarily
|
|
90
|
+
7. **Don't leave entities unattended** in the content
|
|
91
|
+
8. **Don't** add file path or name to observations
|
|
92
|
+
9. **Don't copy entities** from the existing knowledge
|
|
93
|
+
10. **Don't extract trivial relations and observations**, for example "1 is a number" or "promise is a concept" or "x is a variable" or in JSON:
|
|
94
|
+
```
|
|
95
|
+
[
|
|
96
|
+
{
|
|
97
|
+
"name": "1",
|
|
98
|
+
"entityType": "concept",
|
|
99
|
+
"observations": [
|
|
100
|
+
"Number"
|
|
101
|
+
]
|
|
102
|
+
},
|
|
103
|
+
{
|
|
104
|
+
"name": "x",
|
|
105
|
+
"entityType": "variable",
|
|
106
|
+
"observations": [
|
|
107
|
+
"A value"
|
|
108
|
+
]
|
|
109
|
+
},
|
|
110
|
+
{
|
|
111
|
+
"name": "async",
|
|
112
|
+
"entityType": "concept",
|
|
113
|
+
"observations": [
|
|
114
|
+
"A promise"
|
|
115
|
+
]
|
|
116
|
+
}
|
|
117
|
+
]
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
### Quality Thresholds:
|
|
121
|
+
- **High Quality**: >5 meaningful entities with specific observations
|
|
122
|
+
- **Acceptable**: 2-5 relevant entities with clear relationships
|
|
123
|
+
- **Poor**: Only trivial entities or excessive hallucination
|
|
124
|
+
- **Empty**: No extractable meaningful knowledge (return empty graph)
|
|
125
|
+
|
|
126
|
+
## COMPREHENSIVE EXAMPLES
|
|
127
|
+
|
|
128
|
+
### Example 1: TypeScript CLI Application (No Existing Context)
|
|
129
|
+
|
|
130
|
+
Input:
|
|
131
|
+
|
|
132
|
+
## File Information
|
|
133
|
+
|
|
134
|
+
Path: `src/index.ts`
|
|
135
|
+
|
|
136
|
+
## Content to Analyze
|
|
137
|
+
```
|
|
138
|
+
#! /usr/bin/env node
|
|
139
|
+
|
|
140
|
+
import { Command } from "commander";
|
|
141
|
+
import { processFiles } from "./processor";
|
|
142
|
+
|
|
143
|
+
const program = new Command();
|
|
144
|
+
|
|
145
|
+
program
|
|
146
|
+
.name("file-converter")
|
|
147
|
+
.description("Converts files between different formats")
|
|
148
|
+
.version("1.0.0")
|
|
149
|
+
.option("-i, --input <path>", "input directory path")
|
|
150
|
+
.option("-o, --output <path>", "output directory path")
|
|
151
|
+
.option("-f, --format <type>", "output format (json|xml|csv)", "json")
|
|
152
|
+
.action(async (options) => {
|
|
153
|
+
await processFiles(options.input, options.output, options.format);
|
|
154
|
+
});
|
|
155
|
+
|
|
156
|
+
program.parse();
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
Output:
|
|
160
|
+
|
|
161
|
+
```json
|
|
162
|
+
{
|
|
163
|
+
"entities": [
|
|
164
|
+
{
|
|
165
|
+
"name": "file-converter",
|
|
166
|
+
"entityType": "cli_application",
|
|
167
|
+
"observations": ["Converts files between different formats", "Version 1.0.0", "NodeJS CLI utility"]
|
|
168
|
+
},
|
|
169
|
+
{
|
|
170
|
+
"name": "processFiles",
|
|
171
|
+
"entityType": "function",
|
|
172
|
+
"observations": ["Handles file conversion logic", "Accepts input path, output path, and format parameters"]
|
|
173
|
+
},
|
|
174
|
+
{
|
|
175
|
+
"name": "commander",
|
|
176
|
+
"entityType": "npm_package",
|
|
177
|
+
"observations": ["CLI argument parsing library", "Used for building command-line interfaces"]
|
|
178
|
+
},
|
|
179
|
+
{
|
|
180
|
+
"name": "format_option",
|
|
181
|
+
"entityType": "cli_parameter",
|
|
182
|
+
"observations": ["Supports json, xml, csv output formats", "Defaults to json format"]
|
|
183
|
+
}
|
|
184
|
+
],
|
|
185
|
+
"relations": [
|
|
186
|
+
{
|
|
187
|
+
"from": "file-converter",
|
|
188
|
+
"to": "commander",
|
|
189
|
+
"relationType": ["uses", "depends_on"]
|
|
190
|
+
},
|
|
191
|
+
{
|
|
192
|
+
"from": "file-converter",
|
|
193
|
+
"to": "processFiles",
|
|
194
|
+
"relationType": ["calls", "delegates_to"]
|
|
195
|
+
},
|
|
196
|
+
{
|
|
197
|
+
"from": "format_option",
|
|
198
|
+
"to": "processFiles",
|
|
199
|
+
"relationType": ["configures"]
|
|
200
|
+
}
|
|
201
|
+
]
|
|
202
|
+
}
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
### Example 2: Related TypeScript Module (With Existing Context)
|
|
206
|
+
|
|
207
|
+
Input:
|
|
208
|
+
|
|
209
|
+
## File Information
|
|
210
|
+
|
|
211
|
+
Path: `src/processor.ts`
|
|
212
|
+
|
|
213
|
+
## Existing Knowledge Context
|
|
214
|
+
|
|
215
|
+
Entities already extracted from other parts of this project (do NOT copy them — only reference or extend them):
|
|
216
|
+
|
|
217
|
+
- **file-converter** (cli_application): Converts files between different formats; Version 1.0.0; NodeJS CLI utility
|
|
218
|
+
- **processFiles** (function): Handles file conversion logic; Accepts input path, output path, and format parameters
|
|
219
|
+
|
|
220
|
+
## Content to Analyze
|
|
221
|
+
```
|
|
222
|
+
import * as fs from 'fs';
|
|
223
|
+
import * as path from 'path';
|
|
224
|
+
|
|
225
|
+
export async function processFiles(inputPath: string, outputPath: string, format: string) {
|
|
226
|
+
const files = await fs.promises.readdir(inputPath);
|
|
227
|
+
|
|
228
|
+
for (const file of files) {
|
|
229
|
+
const content = await fs.promises.readFile(path.join(inputPath, file), 'utf8');
|
|
230
|
+
const converted = convertToFormat(content, format);
|
|
231
|
+
await fs.promises.writeFile(path.join(outputPath, file), converted);
|
|
232
|
+
}
|
|
233
|
+
}
|
|
234
|
+
|
|
235
|
+
function convertToFormat(content: string, format: string): string {
|
|
236
|
+
switch (format) {
|
|
237
|
+
case 'json': return JSON.stringify({ content });
|
|
238
|
+
case 'xml': return `<content>\${content}</content>`;
|
|
239
|
+
case 'csv': return `"content"\\n"\${content.replace(/"/g, '""')}"`;
|
|
240
|
+
default: return content;
|
|
241
|
+
}
|
|
242
|
+
}
|
|
243
|
+
```
|
|
244
|
+
|
|
245
|
+
Output:
|
|
246
|
+
|
|
247
|
+
```json
|
|
248
|
+
{
|
|
249
|
+
"entities": [
|
|
250
|
+
{
|
|
251
|
+
"name": "processFiles",
|
|
252
|
+
"entityType": "function",
|
|
253
|
+
"observations": ["Reads files from input directory", "Converts each file using convertToFormat", "Writes converted files to output directory", "Handles asynchronous file operations"]
|
|
254
|
+
},
|
|
255
|
+
{
|
|
256
|
+
"name": "convertToFormat",
|
|
257
|
+
"entityType": "function",
|
|
258
|
+
"observations": ["Converts content to specified format", "Supports json, xml, csv formats", "Handles CSV escaping for quotes"]
|
|
259
|
+
},
|
|
260
|
+
{
|
|
261
|
+
"name": "file_system_operations",
|
|
262
|
+
"entityType": "module_capability",
|
|
263
|
+
"observations": ["Uses Node.js fs module", "Performs readdir, readFile, writeFile operations", "Handles path joining"]
|
|
264
|
+
}
|
|
265
|
+
],
|
|
266
|
+
"relations": [
|
|
267
|
+
{
|
|
268
|
+
"from": "processFiles",
|
|
269
|
+
"to": "convertToFormat",
|
|
270
|
+
"relationType": ["calls", "uses"]
|
|
271
|
+
},
|
|
272
|
+
{
|
|
273
|
+
"from": "processFiles",
|
|
274
|
+
"to": "file_system_operations",
|
|
275
|
+
"relationType": ["performs", "utilizes"]
|
|
276
|
+
},
|
|
277
|
+
{
|
|
278
|
+
"from": "convertToFormat",
|
|
279
|
+
"to": "file_system_operations",
|
|
280
|
+
"relationType": ["supports"]
|
|
281
|
+
}
|
|
282
|
+
]
|
|
283
|
+
}
|
|
284
|
+
```
|
|
285
|
+
|
|
286
|
+
### Example 3: CRISPR Gene Editing Research
|
|
287
|
+
|
|
288
|
+
Input:
|
|
289
|
+
|
|
290
|
+
## File Information
|
|
291
|
+
|
|
292
|
+
Path: `experiments/crispr_protocol.md`
|
|
293
|
+
|
|
294
|
+
## Existing Knowledge Context
|
|
295
|
+
|
|
296
|
+
Entities already extracted from other parts of this project (do NOT copy them — only reference or extend them):
|
|
297
|
+
|
|
298
|
+
- **Cas9_protein** (enzyme): RNA-guided DNA endonuclease; Part of CRISPR-Cas9 system
|
|
299
|
+
|
|
300
|
+
## Content to Analyze
|
|
301
|
+
```
|
|
302
|
+
# CRISPR-Cas9 Gene Editing Protocol
|
|
303
|
+
|
|
304
|
+
## Objective
|
|
305
|
+
Target the BRCA1 gene using guide RNA (gRNA) designed with 20-nucleotide specificity. Cas9 protein will create double-strand breaks at position chr17:43,124,016.
|
|
306
|
+
|
|
307
|
+
## Materials
|
|
308
|
+
- Cas9 protein (purified from S. pyogenes)
|
|
309
|
+
- Guide RNA: 5'-GGCTATCCTCTCAGAGTGAC-3'
|
|
310
|
+
- PAM sequence: NGG downstream
|
|
311
|
+
- HEK293T cells for transfection
|
|
312
|
+
|
|
313
|
+
## Results
|
|
314
|
+
Achieved 78% editing efficiency with 2.1% off-target activity measured by GUIDE-seq.
|
|
315
|
+
```
|
|
316
|
+
|
|
317
|
+
Output:
|
|
318
|
+
|
|
319
|
+
```json
|
|
320
|
+
{
|
|
321
|
+
"entities": [
|
|
322
|
+
{
|
|
323
|
+
"name": "BRCA1_gene",
|
|
324
|
+
"entityType": "gene",
|
|
325
|
+
"observations": ["Target gene for CRISPR editing", "Located at chr17:43,124,016", "Associated with breast cancer susceptibility"]
|
|
326
|
+
},
|
|
327
|
+
{
|
|
328
|
+
"name": "guide_RNA_BRCA1",
|
|
329
|
+
"entityType": "rna_sequence",
|
|
330
|
+
"observations": ["20-nucleotide specificity", "Sequence: 5'-GGCTATCCTCTCAGAGTGAC-3'", "Targets BRCA1 gene"]
|
|
331
|
+
},
|
|
332
|
+
{
|
|
333
|
+
"name": "Cas9_protein",
|
|
334
|
+
"entityType": "enzyme",
|
|
335
|
+
"observations": ["RNA-guided DNA endonuclease", "Part of CRISPR-Cas9 system", "Purified from S. pyogenes", "Creates double-strand breaks"]
|
|
336
|
+
},
|
|
337
|
+
{
|
|
338
|
+
"name": "PAM_sequence",
|
|
339
|
+
"entityType": "dna_motif",
|
|
340
|
+
"observations": ["NGG motif required for Cas9 binding", "Located downstream of target site"]
|
|
341
|
+
},
|
|
342
|
+
{
|
|
343
|
+
"name": "GUIDE-seq",
|
|
344
|
+
"entityType": "assay_method",
|
|
345
|
+
"observations": ["Measures off-target CRISPR activity", "Detected 2.1% off-target activity"]
|
|
346
|
+
}
|
|
347
|
+
],
|
|
348
|
+
"relations": [
|
|
349
|
+
{
|
|
350
|
+
"from": "guide_RNA_BRCA1",
|
|
351
|
+
"to": "BRCA1_gene",
|
|
352
|
+
"relationType": ["targets", "binds_to"]
|
|
353
|
+
},
|
|
354
|
+
{
|
|
355
|
+
"from": "Cas9_protein",
|
|
356
|
+
"to": "BRCA1_gene",
|
|
357
|
+
"relationType": ["cuts", "creates_dsb_at"]
|
|
358
|
+
},
|
|
359
|
+
{
|
|
360
|
+
"from": "Cas9_protein",
|
|
361
|
+
"to": "PAM_sequence",
|
|
362
|
+
"relationType": ["requires", "recognizes"]
|
|
363
|
+
},
|
|
364
|
+
{
|
|
365
|
+
"from": "GUIDE-seq",
|
|
366
|
+
"to": "Cas9_protein",
|
|
367
|
+
"relationType": ["measures_activity_of"]
|
|
368
|
+
}
|
|
369
|
+
]
|
|
370
|
+
}
|
|
371
|
+
```
|
|
372
|
+
|
|
373
|
+
### Example 4: Quantum Computing Algorithm
|
|
374
|
+
|
|
375
|
+
Input:
|
|
376
|
+
|
|
377
|
+
## File Information
|
|
378
|
+
|
|
379
|
+
Path: `quantum/shor_algorithm.py`
|
|
380
|
+
|
|
381
|
+
## Existing Knowledge Context
|
|
382
|
+
|
|
383
|
+
Entities already extracted from other parts of this project (do NOT copy them — only reference or extend them):
|
|
384
|
+
|
|
385
|
+
- **quantum_fourier_transform** (quantum_algorithm): Quantum analogue of discrete Fourier transform; Key component in period finding
|
|
386
|
+
|
|
387
|
+
## Content to Analyze
|
|
388
|
+
```
|
|
389
|
+
def shor_algorithm(N, a=2):
|
|
390
|
+
"""
|
|
391
|
+
Shor's algorithm for integer factorization using quantum period finding.
|
|
392
|
+
|
|
393
|
+
Args:
|
|
394
|
+
N: Integer to factorize (product of two primes)
|
|
395
|
+
a: Random integer coprime to N
|
|
396
|
+
|
|
397
|
+
Returns:
|
|
398
|
+
Non-trivial factors of N
|
|
399
|
+
"""
|
|
400
|
+
# Classical preprocessing
|
|
401
|
+
if gcd(a, N) > 1:
|
|
402
|
+
return gcd(a, N), N // gcd(a, N)
|
|
403
|
+
|
|
404
|
+
# Quantum period finding using QFT
|
|
405
|
+
period = quantum_period_finding(a, N)
|
|
406
|
+
|
|
407
|
+
# Classical post-processing
|
|
408
|
+
if period % 2 == 0:
|
|
409
|
+
factor1 = gcd(a**(period//2) - 1, N)
|
|
410
|
+
factor2 = gcd(a**(period//2) + 1, N)
|
|
411
|
+
return factor1, factor2
|
|
412
|
+
|
|
413
|
+
return None
|
|
414
|
+
|
|
415
|
+
def quantum_period_finding(a, N):
|
|
416
|
+
# Quantum circuit with 2n+3 qubits
|
|
417
|
+
qubits = 2 * int(math.log2(N)) + 3
|
|
418
|
+
circuit = QuantumCircuit(qubits)
|
|
419
|
+
|
|
420
|
+
# Apply Hadamard gates for superposition
|
|
421
|
+
circuit.h(range(qubits//2))
|
|
422
|
+
|
|
423
|
+
# Controlled modular exponentiation
|
|
424
|
+
controlled_modexp(circuit, a, N)
|
|
425
|
+
|
|
426
|
+
# Inverse quantum Fourier transform
|
|
427
|
+
circuit.append(qft_dagger(qubits//2), range(qubits//2))
|
|
428
|
+
|
|
429
|
+
return measure_period(circuit)
|
|
430
|
+
```
|
|
431
|
+
|
|
432
|
+
Output:
|
|
433
|
+
|
|
434
|
+
```json
|
|
435
|
+
{
|
|
436
|
+
"entities": [
|
|
437
|
+
{
|
|
438
|
+
"name": "shor_algorithm",
|
|
439
|
+
"entityType": "quantum_algorithm",
|
|
440
|
+
"observations": ["Integer factorization algorithm", "Uses quantum period finding", "Exponential speedup over classical methods", "Requires 2n+3 qubits"]
|
|
441
|
+
},
|
|
442
|
+
{
|
|
443
|
+
"name": "quantum_period_finding",
|
|
444
|
+
"entityType": "quantum_subroutine",
|
|
445
|
+
"observations": ["Core quantum component of Shor's algorithm", "Uses superposition and QFT", "Finds period of modular exponentiation"]
|
|
446
|
+
},
|
|
447
|
+
{
|
|
448
|
+
"name": "controlled_modular_exponentiation",
|
|
449
|
+
"entityType": "quantum_operation",
|
|
450
|
+
"observations": ["Quantum implementation of a^x mod N", "Key quantum operation in period finding"]
|
|
451
|
+
},
|
|
452
|
+
{
|
|
453
|
+
"name": "quantum_fourier_transform",
|
|
454
|
+
"entityType": "quantum_algorithm",
|
|
455
|
+
"observations": ["Quantum analogue of discrete Fourier transform", "Key component in period finding", "Applied in inverse form for period extraction"]
|
|
456
|
+
}
|
|
457
|
+
],
|
|
458
|
+
"relations": [
|
|
459
|
+
{
|
|
460
|
+
"from": "shor_algorithm",
|
|
461
|
+
"to": "quantum_period_finding",
|
|
462
|
+
"relationType": ["uses", "depends_on"]
|
|
463
|
+
},
|
|
464
|
+
{
|
|
465
|
+
"from": "quantum_period_finding",
|
|
466
|
+
"to": "quantum_fourier_transform",
|
|
467
|
+
"relationType": ["applies", "uses"]
|
|
468
|
+
},
|
|
469
|
+
{
|
|
470
|
+
"from": "quantum_period_finding",
|
|
471
|
+
"to": "controlled_modular_exponentiation",
|
|
472
|
+
"relationType": ["performs", "implements"]
|
|
473
|
+
}
|
|
474
|
+
]
|
|
475
|
+
}
|
|
476
|
+
```
|
|
477
|
+
|
|
478
|
+
### Example 5: Machine Learning Research with Context
|
|
479
|
+
|
|
480
|
+
Input:
|
|
481
|
+
|
|
482
|
+
## File Information
|
|
483
|
+
|
|
484
|
+
Path: `models/transformer_attention.py`
|
|
485
|
+
|
|
486
|
+
## Existing Knowledge Context
|
|
487
|
+
|
|
488
|
+
Entities already extracted from other parts of this project (do NOT copy them — only reference or extend them):
|
|
489
|
+
|
|
490
|
+
- **multi_head_attention** (neural_mechanism): Core component of transformer architecture; Allows model to focus on different positions
|
|
491
|
+
- **transformer_architecture** (neural_network): Attention-based sequence-to-sequence model; Introduced in 'Attention is All You Need'
|
|
492
|
+
|
|
493
|
+
## Content to Analyze
|
|
494
|
+
```
|
|
495
|
+
import torch
|
|
496
|
+
import torch.nn as nn
|
|
497
|
+
import math
|
|
498
|
+
|
|
499
|
+
class ScaledDotProductAttention(nn.Module):
|
|
500
|
+
def __init__(self, d_k):
|
|
501
|
+
super().__init__()
|
|
502
|
+
self.d_k = d_k
|
|
503
|
+
self.dropout = nn.Dropout(0.1)
|
|
504
|
+
|
|
505
|
+
def forward(self, query, key, value, mask=None):
|
|
506
|
+
# Attention(Q,K,V) = softmax(QK^T/√d_k)V
|
|
507
|
+
scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(self.d_k)
|
|
508
|
+
|
|
509
|
+
if mask is not None:
|
|
510
|
+
scores.masked_fill_(mask == 0, -1e9)
|
|
511
|
+
|
|
512
|
+
attention_weights = torch.softmax(scores, dim=-1)
|
|
513
|
+
attention_weights = self.dropout(attention_weights)
|
|
514
|
+
|
|
515
|
+
output = torch.matmul(attention_weights, value)
|
|
516
|
+
return output, attention_weights
|
|
517
|
+
|
|
518
|
+
class MultiHeadAttention(nn.Module):
|
|
519
|
+
def __init__(self, d_model=512, num_heads=8):
|
|
520
|
+
super().__init__()
|
|
521
|
+
assert d_model % num_heads == 0
|
|
522
|
+
|
|
523
|
+
self.d_k = d_model // num_heads
|
|
524
|
+
self.num_heads = num_heads
|
|
525
|
+
|
|
526
|
+
self.w_q = nn.Linear(d_model, d_model)
|
|
527
|
+
self.w_k = nn.Linear(d_model, d_model)
|
|
528
|
+
self.w_v = nn.Linear(d_model, d_model)
|
|
529
|
+
self.w_o = nn.Linear(d_model, d_model)
|
|
530
|
+
|
|
531
|
+
self.attention = ScaledDotProductAttention(self.d_k)
|
|
532
|
+
```
|
|
533
|
+
|
|
534
|
+
Output:
|
|
535
|
+
|
|
536
|
+
```json
|
|
537
|
+
{
|
|
538
|
+
"entities": [
|
|
539
|
+
{
|
|
540
|
+
"name": "ScaledDotProductAttention",
|
|
541
|
+
"entityType": "neural_layer",
|
|
542
|
+
"observations": ["Implements attention formula: softmax(QK^T/√d_k)V", "Uses dropout for regularization", "Supports masking for padding tokens"]
|
|
543
|
+
},
|
|
544
|
+
{
|
|
545
|
+
"name": "multi_head_attention",
|
|
546
|
+
"entityType": "neural_mechanism",
|
|
547
|
+
"observations": ["Core component of transformer architecture", "Allows model to focus on different positions", "Uses 8 heads with d_model=512 by default", "Projects to d_k=64 dimensions per head"]
|
|
548
|
+
},
|
|
549
|
+
{
|
|
550
|
+
"name": "attention_formula",
|
|
551
|
+
"entityType": "mathematical_operation",
|
|
552
|
+
"observations": ["QK^T/√d_k followed by softmax and multiplication with V", "Scaling by √d_k prevents vanishing gradients"]
|
|
553
|
+
},
|
|
554
|
+
{
|
|
555
|
+
"name": "linear_projections",
|
|
556
|
+
"entityType": "neural_operation",
|
|
557
|
+
"observations": ["W_q, W_k, W_v for query/key/value transformation", "W_o for output projection"]
|
|
558
|
+
}
|
|
559
|
+
],
|
|
560
|
+
"relations": [
|
|
561
|
+
{
|
|
562
|
+
"from": "multi_head_attention",
|
|
563
|
+
"to": "ScaledDotProductAttention",
|
|
564
|
+
"relationType": ["contains", "uses"]
|
|
565
|
+
},
|
|
566
|
+
{
|
|
567
|
+
"from": "ScaledDotProductAttention",
|
|
568
|
+
"to": "attention_formula",
|
|
569
|
+
"relationType": ["implements", "computes"]
|
|
570
|
+
},
|
|
571
|
+
{
|
|
572
|
+
"from": "multi_head_attention",
|
|
573
|
+
"to": "linear_projections",
|
|
574
|
+
"relationType": ["uses", "applies"]
|
|
575
|
+
}
|
|
576
|
+
]
|
|
577
|
+
}
|
|
578
|
+
```
|
|
579
|
+
|
|
580
|
+
### Example 6: Chemical Synthesis Protocol
|
|
581
|
+
|
|
582
|
+
Input:
|
|
583
|
+
|
|
584
|
+
## File Information
|
|
585
|
+
|
|
586
|
+
Path: `synthesis/aspirin_synthesis.md`
|
|
587
|
+
|
|
588
|
+
## Content to Analyze
|
|
589
|
+
```
|
|
590
|
+
# Aspirin Synthesis via Acetylation
|
|
591
|
+
|
|
592
|
+
## Reaction
|
|
593
|
+
Salicylic acid + Acetic anhydride → Aspirin + Acetic acid
|
|
594
|
+
|
|
595
|
+
## Procedure
|
|
596
|
+
1. Heat 2.0g salicylic acid with 5.0mL acetic anhydride
|
|
597
|
+
2. Add 3 drops concentrated H₂SO₄ as catalyst
|
|
598
|
+
3. Heat at 85°C for 15 minutes
|
|
599
|
+
4. Cool and add 20mL cold water to precipitate product
|
|
600
|
+
5. Filter and recrystallize from ethanol
|
|
601
|
+
|
|
602
|
+
## Yield
|
|
603
|
+
Theoretical yield: 2.61g (MW: 180.16 g/mol)
|
|
604
|
+
Actual yield: 2.31g (88.5% yield)
|
|
605
|
+
```
|
|
606
|
+
|
|
607
|
+
Output:
|
|
608
|
+
|
|
609
|
+
```json
|
|
610
|
+
{
|
|
611
|
+
"entities": [
|
|
612
|
+
{
|
|
613
|
+
"name": "aspirin_synthesis",
|
|
614
|
+
"entityType": "chemical_reaction",
|
|
615
|
+
"observations": ["Acetylation reaction", "88.5% yield achieved", "Requires heating at 85°C for 15 minutes"]
|
|
616
|
+
},
|
|
617
|
+
{
|
|
618
|
+
"name": "salicylic_acid",
|
|
619
|
+
"entityType": "chemical_compound",
|
|
620
|
+
"observations": ["Starting material", "2.0g used in synthesis", "Phenolic acid"]
|
|
621
|
+
},
|
|
622
|
+
{
|
|
623
|
+
"name": "acetic_anhydride",
|
|
624
|
+
"entityType": "chemical_reagent",
|
|
625
|
+
"observations": ["Acetylating agent", "5.0mL used", "Provides acetyl group"]
|
|
626
|
+
},
|
|
627
|
+
{
|
|
628
|
+
"name": "aspirin",
|
|
629
|
+
"entityType": "pharmaceutical_compound",
|
|
630
|
+
"observations": ["Product of synthesis", "MW: 180.16 g/mol", "Theoretical yield: 2.61g"]
|
|
631
|
+
},
|
|
632
|
+
{
|
|
633
|
+
"name": "sulfuric_acid_catalyst",
|
|
634
|
+
"entityType": "catalyst",
|
|
635
|
+
"observations": ["Concentrated H₂SO₄", "3 drops used", "Accelerates acetylation reaction"]
|
|
636
|
+
}
|
|
637
|
+
],
|
|
638
|
+
"relations": [
|
|
639
|
+
{
|
|
640
|
+
"from": "salicylic_acid",
|
|
641
|
+
"to": "aspirin",
|
|
642
|
+
"relationType": ["converts_to", "reacts_to_form"]
|
|
643
|
+
},
|
|
644
|
+
{
|
|
645
|
+
"from": "acetic_anhydride",
|
|
646
|
+
"to": "aspirin",
|
|
647
|
+
"relationType": ["acetylates_to_form"]
|
|
648
|
+
},
|
|
649
|
+
{
|
|
650
|
+
"from": "sulfuric_acid_catalyst",
|
|
651
|
+
"to": "aspirin_synthesis",
|
|
652
|
+
"relationType": ["catalyzes", "accelerates"]
|
|
653
|
+
}
|
|
654
|
+
]
|
|
655
|
+
}
|
|
656
|
+
```
|
|
657
|
+
|
|
658
|
+
### Example 7: Neuroscience Experiment
|
|
659
|
+
|
|
660
|
+
Input:
|
|
661
|
+
|
|
662
|
+
## File Information
|
|
663
|
+
|
|
664
|
+
Path: `experiments/neuron_recording.py`
|
|
665
|
+
|
|
666
|
+
## Existing Knowledge Context
|
|
667
|
+
|
|
668
|
+
Entities already extracted from other parts of this project (do NOT copy them — only reference or extend them):
|
|
669
|
+
|
|
670
|
+
- **action_potential** (electrical_signal): Neural spike; All-or-nothing electrical event
|
|
671
|
+
|
|
672
|
+
## Content to Analyze
|
|
673
|
+
```
|
|
674
|
+
import numpy as np
|
|
675
|
+
from scipy import signal
|
|
676
|
+
|
|
677
|
+
def detect_spikes(voltage_trace, threshold=-30, sample_rate=20000):
|
|
678
|
+
"""
|
|
679
|
+
Detect action potentials in intracellular voltage recordings.
|
|
680
|
+
|
|
681
|
+
Args:
|
|
682
|
+
voltage_trace: Voltage data in mV
|
|
683
|
+
threshold: Spike detection threshold in mV
|
|
684
|
+
sample_rate: Sampling frequency in Hz
|
|
685
|
+
"""
|
|
686
|
+
# High-pass filter to remove slow drift
|
|
687
|
+
b, a = signal.butter(4, 300/(sample_rate/2), 'high')
|
|
688
|
+
filtered = signal.filtfilt(b, a, voltage_trace)
|
|
689
|
+
|
|
690
|
+
# Find threshold crossings
|
|
691
|
+
crossings = np.where(np.diff(np.sign(filtered - threshold)))[0]
|
|
692
|
+
|
|
693
|
+
# Extract spike waveforms (1ms windows)
|
|
694
|
+
window_size = int(sample_rate * 0.001)
|
|
695
|
+
spikes = []
|
|
696
|
+
|
|
697
|
+
for crossing in crossings:
|
|
698
|
+
if crossing > window_size and crossing < len(filtered) - window_size:
|
|
699
|
+
spike = filtered[crossing-window_size:crossing+window_size]
|
|
700
|
+
spikes.append(spike)
|
|
701
|
+
|
|
702
|
+
return np.array(spikes), crossings
|
|
703
|
+
|
|
704
|
+
# Analysis parameters
|
|
705
|
+
RESTING_POTENTIAL = -70 # mV
|
|
706
|
+
SPIKE_THRESHOLD = -30 # mV
|
|
707
|
+
REFRACTORY_PERIOD = 2 # ms
|
|
708
|
+
```
|
|
709
|
+
|
|
710
|
+
Output:
|
|
711
|
+
|
|
712
|
+
```json
|
|
713
|
+
{
|
|
714
|
+
"entities": [
|
|
715
|
+
{
|
|
716
|
+
"name": "spike_detection_algorithm",
|
|
717
|
+
"entityType": "signal_processing_method",
|
|
718
|
+
"observations": ["Detects action potentials in voltage recordings", "Uses threshold crossing at -30mV", "Applies high-pass filtering at 300Hz"]
|
|
719
|
+
},
|
|
720
|
+
{
|
|
721
|
+
"name": "action_potential",
|
|
722
|
+
"entityType": "electrical_signal",
|
|
723
|
+
"observations": ["Neural spike", "All-or-nothing electrical event", "Detected above -30mV threshold", "Extracted in 1ms windows"]
|
|
724
|
+
},
|
|
725
|
+
{
|
|
726
|
+
"name": "intracellular_recording",
|
|
727
|
+
"entityType": "experimental_technique",
|
|
728
|
+
"observations": ["Voltage measurements from inside neurons", "20kHz sampling rate", "Measures membrane potential"]
|
|
729
|
+
},
|
|
730
|
+
{
|
|
731
|
+
"name": "high_pass_filter",
|
|
732
|
+
"entityType": "signal_filter",
|
|
733
|
+
"observations": ["4th order Butterworth filter", "300Hz cutoff frequency", "Removes slow voltage drift"]
|
|
734
|
+
},
|
|
735
|
+
{
|
|
736
|
+
"name": "resting_potential",
|
|
737
|
+
"entityType": "physiological_parameter",
|
|
738
|
+
"observations": ["Baseline membrane voltage", "Set at -70mV", "Stable state between spikes"]
|
|
739
|
+
}
|
|
740
|
+
],
|
|
741
|
+
"relations": [
|
|
742
|
+
{
|
|
743
|
+
"from": "spike_detection_algorithm",
|
|
744
|
+
"to": "action_potential",
|
|
745
|
+
"relationType": ["detects", "identifies"]
|
|
746
|
+
},
|
|
747
|
+
{
|
|
748
|
+
"from": "high_pass_filter",
|
|
749
|
+
"to": "spike_detection_algorithm",
|
|
750
|
+
"relationType": ["preprocesses_for"]
|
|
751
|
+
},
|
|
752
|
+
{
|
|
753
|
+
"from": "intracellular_recording",
|
|
754
|
+
"to": "action_potential",
|
|
755
|
+
"relationType": ["records", "measures"]
|
|
756
|
+
},
|
|
757
|
+
{
|
|
758
|
+
"from": "resting_potential",
|
|
759
|
+
"to": "action_potential",
|
|
760
|
+
"relationType": ["baseline_for"]
|
|
761
|
+
}
|
|
762
|
+
]
|
|
763
|
+
}
|
|
764
|
+
```
|
|
765
|
+
|
|
766
|
+
### Example 8: Edge Case - Malformed Content
|
|
767
|
+
|
|
768
|
+
Input:
|
|
769
|
+
|
|
770
|
+
## File Information
|
|
771
|
+
|
|
772
|
+
Path: `corrupted.txt`
|
|
773
|
+
Chunk: 10 of 13
|
|
774
|
+
|
|
775
|
+
## Content to Analyze
|
|
776
|
+
|
|
777
|
+
```
|
|
778
|
+
X H qrewf __TEXT __text eeee 0 n 0 __stubs __TEXT 22e4e __TEXT 8 __cstring afdsaa __unwind_info __TEXT H __DATA_CONST __got adsf __DATA __la_symbol_ptr __DATA __data __DATA H __LINKEDIT 0 8 X 0 8 X P usr lib dyld D 3 XK U 2 0 8 d usr lib libSystem B dylib UH H E H u H H 5 O H E E 6 M H H 1 A A A bA L aA AS 9 h h h h s
|
|
779
|
+
```
|
|
780
|
+
|
|
781
|
+
Output:
|
|
782
|
+
|
|
783
|
+
```json
|
|
784
|
+
{
|
|
785
|
+
"entities": [],
|
|
786
|
+
"relations": []
|
|
787
|
+
}
|
|
788
|
+
```
|
|
789
|
+
|
|
790
|
+
{{#if domainExamples}}
|
|
791
|
+
## Domain-Specific Extraction Examples
|
|
792
|
+
|
|
793
|
+
The following examples are tailored to the detected content type ({{detectedContentClass}}):
|
|
794
|
+
|
|
795
|
+
{{domainExamples}}
|
|
796
|
+
|
|
797
|
+
{{/if}}
|
|
798
|
+
## FINAL REMINDER
|
|
799
|
+
|
|
800
|
+
Your success is measured by the **meaningfulness and accuracy** of extracted knowledge. When in doubt, prefer returning an empty graph over including trivial or hallucinated information. Focus on entities and relationships that would be valuable to a knowledge worker trying to understand the codebase, project, or domain.
|