@wentorai/research-plugins 1.2.2 → 1.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +16 -8
- package/openclaw.plugin.json +10 -3
- package/package.json +2 -5
- package/skills/analysis/dataviz/SKILL.md +25 -0
- package/skills/analysis/dataviz/chart-image-generator/SKILL.md +1 -1
- package/skills/analysis/econometrics/SKILL.md +23 -0
- package/skills/analysis/econometrics/robustness-checks/SKILL.md +1 -1
- package/skills/analysis/statistics/SKILL.md +21 -0
- package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +1 -1
- package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +1 -1
- package/skills/analysis/statistics/{senior-data-scientist-guide → modeling-strategy-guide}/SKILL.md +5 -5
- package/skills/analysis/wrangling/SKILL.md +21 -0
- package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +1 -1
- package/skills/analysis/wrangling/data-cog-guide/SKILL.md +1 -1
- package/skills/domains/ai-ml/SKILL.md +37 -0
- package/skills/domains/biomedical/SKILL.md +28 -0
- package/skills/domains/biomedical/genomas-guide/SKILL.md +1 -1
- package/skills/domains/biomedical/med-researcher-guide/SKILL.md +1 -1
- package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +1 -1
- package/skills/domains/business/SKILL.md +17 -0
- package/skills/domains/business/architecture-design-guide/SKILL.md +1 -1
- package/skills/domains/chemistry/SKILL.md +19 -0
- package/skills/domains/chemistry/computational-chemistry-guide/SKILL.md +1 -1
- package/skills/domains/cs/SKILL.md +21 -0
- package/skills/domains/ecology/SKILL.md +16 -0
- package/skills/domains/economics/SKILL.md +20 -0
- package/skills/domains/economics/post-labor-economics/SKILL.md +1 -1
- package/skills/domains/economics/pricing-psychology-guide/SKILL.md +1 -1
- package/skills/domains/education/SKILL.md +19 -0
- package/skills/domains/education/academic-study-methods/SKILL.md +1 -1
- package/skills/domains/education/edumcp-guide/SKILL.md +1 -1
- package/skills/domains/finance/SKILL.md +19 -0
- package/skills/domains/finance/akshare-finance-data/SKILL.md +1 -1
- package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +1 -1
- package/skills/domains/finance/stata-accounting-research/SKILL.md +1 -1
- package/skills/domains/geoscience/SKILL.md +17 -0
- package/skills/domains/humanities/SKILL.md +16 -0
- package/skills/domains/humanities/history-research-guide/SKILL.md +1 -1
- package/skills/domains/humanities/political-history-guide/SKILL.md +1 -1
- package/skills/domains/law/SKILL.md +19 -0
- package/skills/domains/math/SKILL.md +17 -0
- package/skills/domains/pharma/SKILL.md +17 -0
- package/skills/domains/physics/SKILL.md +16 -0
- package/skills/domains/social-science/SKILL.md +17 -0
- package/skills/domains/social-science/sociology-research-methods/SKILL.md +1 -1
- package/skills/literature/discovery/SKILL.md +20 -0
- package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +1 -1
- package/skills/literature/discovery/semantic-paper-radar/SKILL.md +1 -1
- package/skills/literature/fulltext/SKILL.md +26 -0
- package/skills/literature/metadata/SKILL.md +35 -0
- package/skills/literature/metadata/doi-content-negotiation/SKILL.md +4 -0
- package/skills/literature/metadata/doi-resolution-guide/SKILL.md +4 -0
- package/skills/literature/metadata/orcid-api/SKILL.md +4 -0
- package/skills/literature/metadata/orcid-integration-guide/SKILL.md +4 -0
- package/skills/literature/search/SKILL.md +43 -0
- package/skills/literature/search/paper-search-mcp-guide/SKILL.md +1 -1
- package/skills/research/automation/SKILL.md +21 -0
- package/skills/research/deep-research/SKILL.md +24 -0
- package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +1 -1
- package/skills/research/deep-research/in-depth-research-guide/SKILL.md +1 -1
- package/skills/research/funding/SKILL.md +20 -0
- package/skills/research/methodology/SKILL.md +24 -0
- package/skills/research/paper-review/SKILL.md +19 -0
- package/skills/research/paper-review/paper-critique-framework/SKILL.md +1 -1
- package/skills/tools/code-exec/SKILL.md +18 -0
- package/skills/tools/diagram/SKILL.md +20 -0
- package/skills/tools/document/SKILL.md +21 -0
- package/skills/tools/knowledge-graph/SKILL.md +21 -0
- package/skills/tools/ocr-translate/SKILL.md +18 -0
- package/skills/tools/ocr-translate/handwriting-recognition-guide/SKILL.md +2 -0
- package/skills/tools/ocr-translate/latex-ocr-guide/SKILL.md +2 -0
- package/skills/tools/scraping/SKILL.md +17 -0
- package/skills/writing/citation/SKILL.md +33 -0
- package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +2 -0
- package/skills/writing/composition/SKILL.md +22 -0
- package/skills/writing/composition/research-paper-writer/SKILL.md +1 -1
- package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +1 -1
- package/skills/writing/latex/SKILL.md +22 -0
- package/skills/writing/latex/academic-writing-latex/SKILL.md +1 -1
- package/skills/writing/latex/latex-drawing-guide/SKILL.md +1 -1
- package/skills/writing/polish/SKILL.md +20 -0
- package/skills/writing/polish/chinese-text-humanizer/SKILL.md +1 -1
- package/skills/writing/templates/SKILL.md +22 -0
- package/skills/writing/templates/beamer-presentation-guide/SKILL.md +1 -1
- package/skills/writing/templates/scientific-article-pdf/SKILL.md +1 -1
- package/skills/analysis/dataviz/citation-map-guide/SKILL.md +0 -184
- package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +0 -171
- package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +0 -192
- package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +0 -267
- package/skills/analysis/econometrics/stata-regression/SKILL.md +0 -117
- package/skills/analysis/statistics/general-statistics-guide/SKILL.md +0 -226
- package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +0 -106
- package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +0 -192
- package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +0 -193
- package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +0 -100
- package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +0 -197
- package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +0 -159
- package/skills/domains/humanities/digital-humanities-methods/SKILL.md +0 -232
- package/skills/domains/law/legal-research-methods/SKILL.md +0 -190
- package/skills/domains/social-science/sociology-research-guide/SKILL.md +0 -238
- package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +0 -233
- package/skills/literature/discovery/paper-tracking-guide/SKILL.md +0 -211
- package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +0 -168
- package/skills/literature/search/arxiv-osiris/SKILL.md +0 -199
- package/skills/literature/search/deepgit-search-guide/SKILL.md +0 -147
- package/skills/literature/search/multi-database-literature-search/SKILL.md +0 -198
- package/skills/literature/search/papers-chat-guide/SKILL.md +0 -194
- package/skills/literature/search/pasa-paper-search-guide/SKILL.md +0 -138
- package/skills/literature/search/scientify-literature-survey/SKILL.md +0 -203
- package/skills/research/automation/ai-scientist-guide/SKILL.md +0 -228
- package/skills/research/automation/coexist-ai-guide/SKILL.md +0 -149
- package/skills/research/automation/foam-agent-guide/SKILL.md +0 -203
- package/skills/research/automation/research-paper-orchestrator/SKILL.md +0 -254
- package/skills/research/deep-research/academic-deep-research/SKILL.md +0 -190
- package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +0 -200
- package/skills/research/deep-research/corvus-research-guide/SKILL.md +0 -132
- package/skills/research/deep-research/deep-research-pro/SKILL.md +0 -213
- package/skills/research/deep-research/deep-research-work/SKILL.md +0 -204
- package/skills/research/deep-research/research-cog/SKILL.md +0 -153
- package/skills/research/methodology/academic-mentor-guide/SKILL.md +0 -169
- package/skills/research/methodology/deep-innovator-guide/SKILL.md +0 -242
- package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +0 -169
- package/skills/research/paper-review/paper-compare-guide/SKILL.md +0 -238
- package/skills/research/paper-review/paper-digest-guide/SKILL.md +0 -240
- package/skills/research/paper-review/paper-research-assistant/SKILL.md +0 -231
- package/skills/research/paper-review/research-quality-filter/SKILL.md +0 -261
- package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +0 -110
- package/skills/tools/diagram/clawphd-guide/SKILL.md +0 -149
- package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +0 -201
- package/skills/tools/document/md2pdf-xelatex/SKILL.md +0 -212
- package/skills/tools/document/openpaper-guide/SKILL.md +0 -232
- package/skills/tools/document/weknora-guide/SKILL.md +0 -216
- package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +0 -135
- package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +0 -156
- package/skills/tools/ocr-translate/formula-recognition-guide/SKILL.md +0 -367
- package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +0 -198
- package/skills/tools/scraping/api-data-collection-guide/SKILL.md +0 -301
- package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +0 -182
- package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +0 -200
- package/skills/writing/composition/paper-debugger-guide/SKILL.md +0 -143
- package/skills/writing/composition/paperforge-guide/SKILL.md +0 -205
@@ -1,232 +0,0 @@
----
-name: openpaper-guide
-description: "Open-source tool for organizing and annotating research papers"
-metadata:
-  openclaw:
-    emoji: "📄"
-    category: "tools"
-    subcategory: "document"
-    keywords: ["paper management", "PDF annotation", "research organizer", "paper reader", "document viewer", "open source"]
-    source: "https://github.com/nicehash/openpaper"
----
-
-# OpenPaper Guide
-
-## Overview
-
-OpenPaper is an open-source research paper management and annotation tool. It provides PDF viewing with inline annotations, paper organization with tags and collections, metadata extraction, full-text search across your library, and export capabilities. Designed as a lightweight, privacy-focused alternative to commercial reference managers, running entirely locally.
-
-## Installation
-
-```bash
-# Install via pip
-pip install openpaper
-
-# Or from source
-git clone https://github.com/nicehash/openpaper.git
-cd openpaper && pip install -e .
-
-# Launch
-openpaper
-```
-
-## Library Management
-
-```python
-from openpaper import Library
-
-library = Library("./my_research_library")
-
-# Add papers
-paper = library.add("path/to/paper.pdf")
-print(f"Added: {paper.title}")
-print(f"Authors: {paper.authors}")
-print(f"Year: {paper.year}")
-
-# Bulk import
-added = library.import_directory(
-    "downloads/papers/",
-    recursive=True,
-    extract_metadata=True, # Auto-extract from PDF
-    deduplicate=True, # Skip duplicates by DOI/title
-)
-print(f"Imported {len(added)} papers, {added.duplicates} skipped")
-```
-
-## Organization
-
-```python
-# Tags
-paper.add_tag("transformer")
-paper.add_tag("attention")
-paper.add_tag("priority:high")
-
-# Collections
-library.create_collection("thesis-chapter-2")
-library.add_to_collection("thesis-chapter-2", paper)
-
-# Smart collections (auto-updating filters)
-library.create_smart_collection(
-    name="Recent NLP",
-    filters={
-        "tags": ["nlp"],
-        "year": {"gte": 2023},
-        "read_status": "unread",
-    },
-)
-
-# List and browse
-for p in library.search(tags=["transformer"], year=2024):
-    print(f"{p.title} ({p.year}) - {p.read_status}")
-```
-
-## Annotations
-
-```python
-# Add annotations to papers
-paper.annotate(
-    page=3,
-    type="highlight",
-    text="The attention mechanism allows the model to focus...",
-    color="yellow",
-    note="Key definition of attention",
-)
-
-paper.annotate(
-    page=5,
-    type="comment",
-    position=(100, 250), # x, y coordinates
-    note="This contradicts the claim in Smith et al. 2023",
-)
-
-# Export annotations
-annotations = paper.get_annotations()
-for ann in annotations:
-    print(f"[p.{ann.page}] {ann.type}: {ann.text[:60]}...")
-    if ann.note:
-        print(f" Note: {ann.note}")
-
-# Export to markdown
-paper.export_annotations("annotations.md")
-```
-
-## Search
-
-```python
-# Full-text search across library
-results = library.search_fulltext("attention mechanism")
-for r in results:
-    print(f"{r.title} (relevance: {r.score:.2f})")
-    for match in r.matches[:3]:
-        print(f" p.{match.page}: ...{match.context}...")
-
-# Metadata search
-results = library.search(
-    query="transformer", # Title/abstract search
-    authors="Vaswani",
-    year_range=(2020, 2025),
-    tags=["nlp"],
-)
-
-# Semantic search (if embeddings enabled)
-results = library.semantic_search(
-    "methods for reducing quadratic complexity of attention",
-    top_k=10,
-)
-```
-
-## Metadata Extraction
-
-```python
-# Auto-extract metadata from PDFs
-metadata = library.extract_metadata("paper.pdf")
-print(f"Title: {metadata.title}")
-print(f"Authors: {metadata.authors}")
-print(f"Abstract: {metadata.abstract[:200]}...")
-print(f"DOI: {metadata.doi}")
-print(f"Year: {metadata.year}")
-print(f"References: {len(metadata.references)}")
-
-# Enrich with external databases
-enriched = library.enrich_metadata(
-    paper,
-    sources=["crossref", "semantic_scholar"],
-)
-print(f"Citations: {enriched.citation_count}")
-print(f"Venue: {enriched.venue}")
-```
-
-## Export
-
-```python
-# Export bibliography
-library.export_bibtex("references.bib", collection="thesis-chapter-2")
-
-# Export reading list
-library.export_reading_list("reading_list.md", format="markdown")
-
-# Export annotations from all papers
-library.export_all_annotations("all_annotations.md")
-
-# Sync with reference manager
-library.export_ris("export.ris") # RIS format
-library.export_csv("export.csv") # CSV with metadata
-```
-
-## Configuration
-
-```json
-{
-  "library_path": "./research_library",
-  "pdf_viewer": "builtin",
-  "metadata": {
-    "auto_extract": true,
-    "enrich_sources": ["crossref"],
-    "language": "en"
-  },
-  "search": {
-    "fulltext_index": true,
-    "semantic_search": false,
-    "embedding_model": "all-MiniLM-L6-v2"
-  },
-  "storage": {
-    "copy_pdfs": true,
-    "organize_by": "year",
-    "max_library_size_gb": 10
-  }
-}
-```
-
-## CLI Usage
-
-```bash
-# Add paper
-openpaper add paper.pdf --tags "nlp,transformer"
-
-# Search
-openpaper search "attention mechanism" --limit 10
-
-# List library
-openpaper list --sort year --tags "priority:high"
-
-# Export
-openpaper export bibtex --collection thesis --output refs.bib
-
-# Stats
-openpaper stats
-# Papers: 342, Tagged: 289, Annotated: 156, Collections: 12
-```
-
-## Use Cases
-
-1. **Paper library**: Organize and search your PDF collection
-2. **Reading workflow**: Track read status, annotate, take notes
-3. **Reference management**: Export BibTeX for LaTeX papers
-4. **Literature review**: Tag and categorize papers by topic
-5. **Team sharing**: Export reading lists and annotations
-
-## References
-
-- [OpenPaper GitHub](https://github.com/nicehash/openpaper)
-- [Zotero](https://www.zotero.org/) — Popular open-source alternative
-- [Semantic Scholar API](https://api.semanticscholar.org/) — Metadata enrichment
@@ -1,216 +0,0 @@
----
-name: weknora-guide
-description: "Tencent document understanding engine with RAG capabilities"
-metadata:
-  openclaw:
-    emoji: "📑"
-    category: "tools"
-    subcategory: "document"
-    keywords: ["WeKnora", "document understanding", "RAG", "text mining", "Tencent", "knowledge extraction"]
-    source: "https://github.com/Tencent/WeKnora"
----
-
-# WeKnora Guide
-
-## Overview
-
-WeKnora is Tencent's open-source document understanding and retrieval-augmented generation engine. It processes complex documents (PDF, DOCX, HTML) into structured knowledge, supporting layout analysis, table extraction, formula recognition, and multi-modal content parsing. Integrates with RAG pipelines for question answering over document collections. Suited for academic paper processing, report analysis, and enterprise document intelligence.
-
-## Installation
-
-```bash
-# Install WeKnora
-pip install weknora
-
-# With GPU support
-pip install weknora[gpu]
-
-# With all optional dependencies
-pip install weknora[all]
-```
-
-## Document Parsing
-
-```python
-from weknora import DocumentParser
-
-parser = DocumentParser()
-
-# Parse a PDF document
-doc = parser.parse("research_paper.pdf")
-
-print(f"Pages: {doc.num_pages}")
-print(f"Sections: {len(doc.sections)}")
-print(f"Tables: {len(doc.tables)}")
-print(f"Figures: {len(doc.figures)}")
-print(f"Equations: {len(doc.equations)}")
-
-# Access structured content
-for section in doc.sections:
-    print(f"\n## {section.title}")
-    print(f" {section.text[:200]}...")
-    if section.tables:
-        print(f" Tables: {len(section.tables)}")
-```
-
-## Layout Analysis
-
-```python
-from weknora import LayoutAnalyzer
-
-analyzer = LayoutAnalyzer(model="layoutlmv3")
-
-# Detect document layout elements
-layout = analyzer.analyze("paper.pdf")
-
-for page in layout.pages:
-    print(f"\nPage {page.number}:")
-    for element in page.elements:
-        print(f" [{element.type}] ({element.bbox}) "
-              f"{element.text[:50]}...")
-        # Element types: title, text, table, figure,
-        # equation, header, footer, caption, list
-```
-
-## Table Extraction
-
-```python
-from weknora import TableExtractor
-
-extractor = TableExtractor()
-
-# Extract tables from document
-tables = extractor.extract("paper.pdf")
-
-for i, table in enumerate(tables):
-    print(f"\nTable {i+1}: {table.caption}")
-    df = table.to_dataframe()
-    print(df.head())
-
-    # Export
-    df.to_csv(f"table_{i+1}.csv")
-
-# Extract specific table by page
-table = extractor.extract_from_page("paper.pdf", page=5, index=0)
-```
-
-## Formula Recognition
-
-```python
-from weknora import FormulaRecognizer
-
-recognizer = FormulaRecognizer()
-
-# Extract formulas from document
-formulas = recognizer.extract("paper.pdf")
-
-for formula in formulas:
-    print(f"Page {formula.page}: {formula.latex}")
-    # Output: "\\mathcal{L} = -\\sum_{i} y_i \\log(\\hat{y}_i)"
-    print(f" Type: {formula.type}") # inline or display
-```
-
-## RAG Pipeline
-
-```python
-from weknora import RAGPipeline
-
-# Build RAG over document collection
-rag = RAGPipeline(
-    embedding_model="bge-large-zh-v1.5",
-    chunk_size=512,
-    chunk_overlap=64,
-)
-
-# Index documents
-rag.add_documents([
-    "papers/transformer.pdf",
-    "papers/bert.pdf",
-    "papers/gpt3.pdf",
-])
-
-# Query
-result = rag.query(
-    "What is the computational complexity of self-attention?"
-)
-print(result.answer)
-for source in result.sources:
-    print(f" [{source.document}] p.{source.page}: "
-          f"{source.text[:80]}...")
-```
-
-## Multi-Modal Processing
-
-```python
-from weknora import MultiModalParser
-
-parser = MultiModalParser()
-
-# Process document with figures and tables
-doc = parser.parse("paper.pdf", extract_all=True)
-
-# Access figure descriptions
-for fig in doc.figures:
-    print(f"Figure {fig.number}: {fig.caption}")
-    fig.save_image(f"figures/fig_{fig.number}.png")
-
-# Cross-reference tables and text
-for ref in doc.cross_references:
-    print(f"'{ref.text}' → {ref.target_type} {ref.target_id}")
-```
-
-## Batch Processing
-
-```python
-from weknora import BatchProcessor
-
-processor = BatchProcessor(
-    workers=4,
-    output_dir="./parsed_docs",
-)
-
-# Process directory of documents
-results = processor.process_directory(
-    "papers/",
-    formats=["pdf", "docx"],
-    output_format="json", # or "markdown"
-)
-
-print(f"Processed: {results.success}/{results.total}")
-print(f"Failed: {results.failures}")
-```
-
-## Configuration
-
-```python
-from weknora import Config
-
-config = Config(
-    parser={
-        "layout_model": "layoutlmv3",
-        "ocr_engine": "paddleocr",
-        "formula_engine": "latex_ocr",
-        "language": "en", # or "zh", "multi"
-    },
-    rag={
-        "embedding_model": "bge-large-zh-v1.5",
-        "reranker": "bge-reranker-large",
-        "chunk_strategy": "semantic",
-        "vector_store": "faiss",
-    },
-)
-```
-
-## Use Cases
-
-1. **Paper parsing**: Extract structured content from academic PDFs
-2. **Table digitization**: Convert paper tables to spreadsheets
-3. **Document QA**: RAG-based question answering over papers
-4. **Knowledge extraction**: Build knowledge bases from documents
-5. **Report analysis**: Process and compare technical reports
-
-## References
-
-- [WeKnora GitHub](https://github.com/Tencent/WeKnora)
-- [LayoutLMv3](https://arxiv.org/abs/2204.08387)
-- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
@@ -1,135 +0,0 @@
----
-name: mimir-memory-guide
-description: "Semantic vector search memory bank for AI agents"
-metadata:
-  openclaw:
-    emoji: "📝"
-    category: "tools"
-    subcategory: "knowledge-graph"
-    keywords: ["agent memory", "vector search", "semantic memory", "MCP", "knowledge persistence", "embedding"]
-    source: "https://github.com/orneryd/Mimir"
----
-
-# Mimir Agent Memory Guide
-
-## Overview
-
-Mimir is a semantic memory bank for AI agents that provides persistent, searchable memory via vector embeddings. Agents can store observations, learnings, and context that persists across sessions, and retrieve relevant memories using semantic search. Available as an MCP server for seamless integration with Claude Code, OpenClaw, and other agent frameworks.
-
-## Installation
-
-```bash
-# MCP server
-npm install -g @mimir/mcp-server
-
-# Python library
-pip install mimir-memory
-```
-
-## MCP Configuration
-
-```json
-{
-  "mcpServers": {
-    "mimir": {
-      "command": "npx",
-      "args": ["@mimir/mcp-server"],
-      "env": {
-        "MEMORY_PATH": "~/.mimir/memories",
-        "EMBEDDING_MODEL": "all-MiniLM-L6-v2"
-      }
-    }
-  }
-}
-```
-
-## Core Operations
-
-```python
-from mimir import MemoryBank
-
-bank = MemoryBank(
-    path="./agent_memory",
-    embedding_model="all-MiniLM-L6-v2",
-)
-
-# Store a memory
-bank.store(
-    content="The project uses FastAPI with PostgreSQL. "
-            "Database migrations are managed with Alembic.",
-    metadata={
-        "type": "project_context",
-        "project": "wentor",
-        "timestamp": "2025-03-10",
-    },
-)
-
-# Semantic search
-results = bank.search(
-    query="How does the database work?",
-    top_k=5,
-)
-for r in results:
-    print(f"[{r.score:.3f}] {r.content[:100]}...")
-
-# Delete memory
-bank.delete(memory_id=results[0].id)
-```
-
-## MCP Tools
-
-```markdown
-### Available MCP Tools
-- `store_memory(content, metadata)` — Store new memory
-- `search_memory(query, limit)` — Semantic search
-- `list_memories(filter)` — List by metadata filter
-- `delete_memory(id)` — Remove specific memory
-- `clear_memories(filter)` — Clear matching memories
-
-### Usage in Agent Chat
-"Remember that the API uses JWT tokens with 30-day expiry"
-→ Stores as persistent memory
-
-"What do you know about the authentication system?"
-→ Searches memories, returns relevant stored context
-```
-
-## Memory Types
-
-```python
-# Structured memory categories
-bank.store(
-    content="User prefers dark theme and minimal logging",
-    metadata={"type": "preference"},
-)
-
-bank.store(
-    content="Fixed bug: ECharts CSS vars don't work on Canvas",
-    metadata={"type": "lesson_learned", "topic": "echarts"},
-)
-
-bank.store(
-    content="Deploy script is at deploy/deploy.sh, "
-            "requires .credentials file",
-    metadata={"type": "project_knowledge"},
-)
-
-# Search by type
-prefs = bank.search(
-    query="user preferences",
-    filter={"type": "preference"},
-)
-```
-
-## Use Cases
-
-1. **Agent memory**: Persistent context across sessions
-2. **Project knowledge**: Store and retrieve project facts
-3. **Learning accumulation**: Build expertise over time
-4. **Preference tracking**: Remember user preferences
-5. **Research notes**: Searchable research observations
-
-## References
-
-- [Mimir GitHub](https://github.com/orneryd/Mimir)
-- [MCP Specification](https://modelcontextprotocol.io/)