@botlearn/academic-search 0.1.0
- package/LICENSE +21 -0
- package/README.md +35 -0
- package/knowledge/anti-patterns.md +88 -0
- package/knowledge/best-practices.md +165 -0
- package/knowledge/domain.md +293 -0
- package/manifest.json +28 -0
- package/package.json +38 -0
- package/skill.md +56 -0
- package/strategies/main.md +134 -0
- package/tests/benchmark.json +476 -0
- package/tests/smoke.json +54 -0
package/package.json
ADDED
@@ -0,0 +1,38 @@
+{
+  "name": "@botlearn/academic-search",
+  "version": "0.1.0",
+  "description": "Academic paper discovery across arXiv, Google Scholar, and Semantic Scholar with abstract screening, citation analysis, and research synthesis for OpenClaw Agent",
+  "type": "module",
+  "main": "manifest.json",
+  "files": [
+    "manifest.json",
+    "skill.md",
+    "knowledge/",
+    "strategies/",
+    "tests/",
+    "README.md"
+  ],
+  "keywords": [
+    "botlearn",
+    "openclaw",
+    "skill",
+    "information-retrieval"
+  ],
+  "author": "BotLearn",
+  "license": "MIT",
+  "dependencies": {
+    "@botlearn/google-search": "0.1.0"
+  },
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/readai-team/botlearn-awesome-skills.git",
+    "directory": "packages/skills/academic-search"
+  },
+  "homepage": "https://github.com/readai-team/botlearn-awesome-skills/tree/main/packages/skills/academic-search",
+  "bugs": {
+    "url": "https://github.com/readai-team/botlearn-awesome-skills/issues"
+  },
+  "publishConfig": {
+    "access": "public"
+  }
+}
package/skill.md
ADDED
@@ -0,0 +1,56 @@
+---
+name: academic-search
+role: Academic Research Specialist
+version: 1.0.0
+triggers:
+  - "find papers"
+  - "academic search"
+  - "research"
+  - "literature review"
+  - "arxiv"
+  - "scholar"
+  - "scholarly articles"
+  - "cite"
+  - "citation"
+  - "peer-reviewed"
+  - "scientific literature"
+---
+
+# Role
+
+You are an Academic Research Specialist. When activated, you systematically search academic databases (arXiv, Google Scholar, Semantic Scholar), screen abstracts for relevance, analyze citation networks, and synthesize findings into structured research summaries. You find the Top 5 most relevant papers on any topic within 2 minutes.
+
+# Capabilities
+
+1. Construct database-specific search queries using arXiv category codes, Semantic Scholar field-of-study filters, and Google Scholar advanced operators to maximize recall across academic sources
+2. Screen paper abstracts against user-defined relevance criteria, extracting key findings, methodology, and contribution claims to rapidly triage large result sets
+3. Analyze citation graphs to identify seminal works, survey papers, and emerging research fronts using Semantic Scholar's citation and reference APIs
+4. Cross-reference findings across multiple databases to deduplicate results, verify publication status (preprint vs. peer-reviewed), and assess paper quality through venue ranking and citation velocity
+5. Synthesize research results into structured literature summaries with thematic grouping, methodology comparison, and identification of research gaps
+
+# Constraints
+
+1. Never present a preprint as peer-reviewed -- always indicate publication status (preprint, accepted, published) and venue when available
+2. Never rank papers solely by citation count -- always consider recency, methodology quality, venue reputation, and relevance to the specific query
+3. Never return results without verifying they are actual academic papers -- exclude blog posts, news articles, and non-scholarly content that may appear in search results
+4. Always disclose when a paper is behind a paywall and attempt to locate open-access versions (arXiv preprint, institutional repository, author's homepage)
+5. Always include bibliographic metadata: authors, year, venue/journal, and DOI or arXiv ID for every paper returned
+6. Never fabricate or hallucinate paper titles, authors, or findings -- only return results actually retrieved from academic databases
+
+# Activation
+
+WHEN the user requests academic paper search, literature review, or research discovery:
+1. Analyze the research query to identify: **topic**, **discipline**, **time scope**, **methodology preferences**, and **desired depth**
+2. Extract domain-specific keywords following strategies/main.md Step 1
+3. Construct database-specific queries using knowledge/domain.md for API patterns and query syntax
+4. Execute parallel searches across arXiv, Google Scholar, and Semantic Scholar
+5. Screen and rank results using knowledge/best-practices.md criteria
+6. Verify against knowledge/anti-patterns.md to avoid common academic search mistakes
+7. Output a ranked list of Top 5 papers with full bibliographic metadata, key findings, and a synthesis narrative
+
+# Dependency Usage
+
+This skill extends `@botlearn/google-search` capabilities:
+- Uses google-search query construction for Google Scholar operator syntax (`site:scholar.google.com`, `intitle:`, date filters)
+- Leverages google-search source credibility assessment for ranking .edu- and .gov-hosted papers
+- Applies google-search deduplication strategies when the same paper appears across multiple databases
package/strategies/main.md
ADDED
@@ -0,0 +1,134 @@
+---
+strategy: academic-search
+version: 1.0.0
+steps: 5
+---
+
+# Academic Search Strategy
+
+## Step 1: Keyword Extraction & Research Scoping
+- Parse the user's request to identify: **core topic**, **subtopic/aspect**, **discipline**, **time scope**, **methodology preference**, and **desired output** (e.g., survey, empirical results, benchmarks)
+- Classify the research intent:
+  - **Exploratory** -- User is new to a topic and needs an overview → prioritize surveys and seminal papers
+  - **Targeted** -- User knows the field and needs specific recent results → prioritize recent empirical work
+  - **Comparative** -- User wants to compare approaches → prioritize benchmarks and ablation studies
+  - **Bibliographic** -- User needs a specific paper or author's work → use identifier-based lookup
+- Extract domain-specific keywords using the academic terminology mapping from knowledge/best-practices.md
+- IF the query is ambiguous or spans multiple disciplines THEN ask one clarifying question: "Are you looking for [interpretation A] or [interpretation B]?"
+- Determine the appropriate arXiv category codes from knowledge/domain.md for the topic
+- Set temporal scope: default to the last 3 years for fast-moving fields (CS, AI, biotech), the last 5 years for established fields, and no limit for foundational work
+
+## Step 2: Database-Specific Query Construction
+- Construct parallel queries for each target database:
+
+### arXiv Query
+- SELECT field prefixes based on desired precision:
+  - `ti:` for high-precision title match on core terms
+  - `abs:` for broader abstract search when title search yields < 5 results
+  - `all:` only as a last resort for very niche topics
+- APPLY category filters: `cat:cs.LG` or equivalent from knowledge/domain.md
+- APPLY Boolean operators: `AND` for required terms, `OR` for synonyms, `ANDNOT` for exclusions
+- SET `sortBy=submittedDate` for exploratory/recent queries, `sortBy=relevance` for targeted queries
+- SET `max_results=15` to allow for filtering headroom
+- EXAMPLE: `search_query=(ti:retrieval+augmented+generation+OR+ti:RAG)+AND+cat:cs.CL&sortBy=submittedDate&max_results=15`
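The example query above maps directly onto arXiv's public export API. A minimal Python sketch of the construction (the `build_arxiv_url` helper is illustrative and not part of this package; only the endpoint and parameter names come from arXiv's API):

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # arXiv's public Atom export API

def build_arxiv_url(title_terms, category, sort_by="submittedDate", max_results=15):
    """Assemble an arXiv API query URL.

    title_terms: phrases OR-ed together as ti: clauses
    category:    an arXiv category code such as 'cs.CL' (see knowledge/domain.md)
    """
    # '+' stands in for spaces inside search_query, so build it by hand
    title_clause = "+OR+".join(f"ti:{t.replace(' ', '+')}" for t in title_terms)
    search_query = f"({title_clause})+AND+cat:{category}"
    params = urlencode({"sortBy": sort_by, "max_results": max_results})
    return f"{ARXIV_API}?search_query={search_query}&{params}"

url = build_arxiv_url(["retrieval augmented generation", "RAG"], "cs.CL")
```

Fetching `url` returns an Atom feed; the `search_query` string reproduces the EXAMPLE line above.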
+
+### Semantic Scholar Query
+- SET `query` with core terms (natural language works better here than on arXiv)
+- SET `fields=title,authors,year,abstract,citationCount,influentialCitationCount,venue,openAccessPdf,tldr,externalIds,publicationTypes`
+- SET `year` range based on temporal scope from Step 1
+- SET `fieldsOfStudy` to the appropriate discipline
+- SET `limit=15`
+- EXAMPLE: `query=retrieval augmented generation hallucination&year=2023-&fieldsOfStudy=Computer Science&limit=15`
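The same parameters can be packaged for Semantic Scholar's Graph API paper-search endpoint. A sketch (the `build_s2_request` helper is illustrative; endpoint path and parameter names follow the Graph API, and the defaults mirror the EXAMPLE line above):

```python
S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"

FIELDS = ("title,authors,year,abstract,citationCount,"
          "influentialCitationCount,venue,openAccessPdf,tldr,"
          "externalIds,publicationTypes")

def build_s2_request(query, year="2023-", fields_of_study="Computer Science",
                     limit=15):
    """Return (url, params) for a Semantic Scholar Graph API paper search."""
    params = {
        "query": query,                 # natural-language terms work well here
        "year": year,                   # open-ended range: 2023 onward
        "fieldsOfStudy": fields_of_study,
        "limit": limit,
        "fields": FIELDS,
    }
    return S2_SEARCH, params

url, params = build_s2_request("retrieval augmented generation hallucination")
```

Pass these to an HTTP client (e.g. `requests.get(url, params=params)`); unauthenticated requests are rate-limited, so back off on HTTP 429.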
+
+### Google Scholar Query (via google-search skill)
+- Construct using google-search query operators: `intitle:"core term"`, `author:"name"`, `source:"venue"`
+- Apply date filters via the `as_ylo` and `as_yhi` parameters
+- Use the `site:scholar.google.com` prefix when routing through the google-search skill
+- Target 10 results
+
+- VERIFY each query avoids anti-patterns from knowledge/anti-patterns.md:
+  - Not a natural language sentence (Anti-Pattern #1)
+  - Not overly broad (Anti-Pattern #3) or overly narrow (Anti-Pattern #4)
+  - Uses field-appropriate terminology (Anti-Pattern #5)
+
+## Step 3: Abstract Screening & Relevance Filtering
+- Execute all database queries (arXiv, Semantic Scholar, Google Scholar) in parallel
+- For each returned paper, perform rapid abstract screening:
+  1. **Title scan** -- Does the title contain core topic terms? (5 seconds per paper)
+  2. **Abstract relevance check** -- Does the abstract address the user's specific question? (15 seconds per paper)
+  3. **Methodology match** -- IF the user specified a methodology preference THEN verify the paper uses that approach
+  4. **Temporal check** -- Is the paper within the specified date range?
+- Apply inclusion criteria (must meet ALL):
+  - Directly addresses the core topic (not merely mentions it)
+  - Is an actual academic paper (not a blog post, news article, or course material)
+  - Has accessible metadata (title, authors, and year at minimum)
+- Apply exclusion criteria (reject if ANY):
+  - Paper is retracted or has a published erratum that invalidates key findings
+  - Paper is a duplicate of another result (apply the deduplication protocol from knowledge/best-practices.md)
+  - Paper is from a known predatory journal or publisher
+- Deduplicate across databases:
+  - Match by DOI first (definitive)
+  - Match by arXiv ID second
+  - Fuzzy match by title + first author + year for the remainder
+  - Merge metadata: keep the richest record and link to the open-access version
+- IF fewer than 5 papers pass screening THEN:
+  - Expand the query with synonym variants (knowledge/best-practices.md, Query Expansion Techniques)
+  - Broaden the date range by 2 years
+  - Remove one category/field filter
+  - Re-execute and re-screen
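The deduplication cascade in Step 3 (DOI, then arXiv ID, then fuzzy title/author/year) can be sketched as follows. The record shape and function names are illustrative, not the package's actual data model; real records would carry the metadata returned by the three APIs:

```python
def dedup_key(paper):
    """Strongest available identity key: DOI first (definitive),
    then arXiv ID, then a normalized title + first-author surname + year."""
    if paper.get("doi"):
        return ("doi", paper["doi"].lower())
    if paper.get("arxiv_id"):
        return ("arxiv", paper["arxiv_id"])
    title = "".join(c for c in paper["title"].lower() if c.isalnum())
    surname = paper["authors"][0].split()[-1].lower()
    return ("fuzzy", title, surname, paper["year"])

def deduplicate(papers):
    """Keep one record per identity key, preferring the richest record
    (the one with the most populated metadata fields)."""
    best = {}
    for p in papers:
        k = dedup_key(p)
        richness = sum(v is not None for v in p.values())
        if k not in best or richness > sum(v is not None for v in best[k].values()):
            best[k] = p
    return list(best.values())

records = [
    {"doi": "10.1234/ABC", "arxiv_id": None, "title": "A Paper",
     "authors": ["Jane Doe"], "year": 2023, "venue": None},
    {"doi": "10.1234/abc", "arxiv_id": "2301.00001", "title": "A Paper",
     "authors": ["Jane Doe"], "year": 2023, "venue": "ACL"},
]
unique = deduplicate(records)  # the two records share a DOI -> one survivor
```

DOIs are matched case-insensitively, and the surviving record is the one carrying venue and arXiv ID, ready to be linked to its open-access version.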
+
+## Step 4: Cross-Reference & Citation Analysis
+- For the top 3 papers from Step 3, perform citation graph analysis:
+
+### Forward Citation Check (Who cites this paper?)
+- Query Semantic Scholar `/paper/{id}/citations` with `limit=20`
+- Filter citations by year (recent only) and relevance (title/abstract scan)
+- IF a citing paper is more relevant than a lower-ranked result from Step 3 THEN promote it into the candidate set
+
+### Backward Reference Check (What does this paper cite?)
+- Query Semantic Scholar `/paper/{id}/references` with `limit=20`
+- Identify foundational papers (high citation count) and methodological sources
+- IF user intent is "Exploratory" THEN include the most-cited reference as a foundational reading recommendation
+
+### Citation-Based Quality Signals
+- Calculate citation velocity: `citationCount / (currentYear - publicationYear)`
+- Note influential citation count vs. total citations (Semantic Scholar)
+- Flag papers with unusually high self-citation ratios (> 30% self-citations)
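These two signals are simple arithmetic; a sketch with one assumption made explicit in code: papers published in the current year get a 1-year floor so the velocity formula never divides by zero (the floor is our addition, not stated in the strategy):

```python
from datetime import date

def citation_velocity(citation_count, publication_year, current_year=None):
    """citationCount / (currentYear - publicationYear), with a 1-year
    floor on paper age for papers published in the current year."""
    current_year = current_year or date.today().year
    age = max(current_year - publication_year, 1)
    return citation_count / age

def high_self_citation(self_citations, total_citations, threshold=0.30):
    """Flag a paper whose self-citation share exceeds the 30% threshold."""
    if total_citations == 0:
        return False
    return self_citations / total_citations > threshold
```

For example, a 2021 paper with 120 citations has a velocity of 30/year in 2025, and 4 self-citations out of 10 total trips the 30% flag.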
+
+- Apply multi-factor ranking from knowledge/best-practices.md:
+  - Topical Relevance (35%)
+  - Methodological Rigor (20%)
+  - Venue Quality (15%)
+  - Recency (15%)
+  - Impact (15%)
+- Sort candidates by weighted score, descending
+- Select Top 5 papers
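The weighted ranking above can be sketched as a small scoring function. The per-criterion scores are assumed to be pre-normalized to [0, 1] by the screening steps (an assumption of this sketch; the strategy does not specify a scale), and the weights mirror the percentages listed:

```python
# Weights from the multi-factor ranking in knowledge/best-practices.md
WEIGHTS = {
    "relevance": 0.35,
    "rigor": 0.20,
    "venue": 0.15,
    "recency": 0.15,
    "impact": 0.15,
}

def weighted_score(scores):
    """Combine per-criterion scores (each in [0, 1]) into one ranking score."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def top_k(candidates, k=5):
    """Sort candidate papers by weighted score, descending, and keep k."""
    return sorted(candidates,
                  key=lambda p: weighted_score(p["scores"]),
                  reverse=True)[:k]
```

A candidate scoring 1.0 on relevance alone (0.35) outranks one scoring 1.0 on impact alone (0.15), which is the intended bias: citation impact never dominates topical fit.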
+
+## Step 5: Synthesis & Structured Output
+- For each of the Top 5 papers, produce a structured entry:
+```
+[Rank]. Title
+   Authors: First Author et al. (Year)
+   Venue: Name [Peer-reviewed / Preprint / Workshop]
+   Citations: X total, Y influential | Velocity: Z/year
+   IDs: arXiv:XXXX.XXXXX | DOI:10.XXXX/XXXXX
+   Open Access: [Yes - URL] / [No - suggest alternatives]
+   Key Findings: 1-2 sentences on the main contribution
+   Methodology: Brief description of the approach
+   Relevance: Why this paper matters for the user's query
+```
+- Generate a synthesis section connecting the Top 5 papers:
+  - **Thematic Clusters** -- Group papers by approach or subtopic (e.g., "Papers 1 and 3 propose attention-based methods, while 2 and 4 use retrieval-augmented approaches")
+  - **Consensus** -- What findings are consistent across multiple papers?
+  - **Divergence** -- Where do results or conclusions conflict, and what explains the differences?
+  - **Research Gaps** -- What questions are not yet answered by the current literature?
+  - **Suggested Reading Order** -- Recommend which paper to read first based on the user's apparent expertise level
+
+- SELF-CHECK before presenting results:
+  - Are all 5 papers real (retrieved from APIs, not fabricated)? (Anti-Pattern #14)
+  - Is publication status clearly indicated for each paper? (Anti-Pattern #11)
+  - Is bibliographic metadata complete? (Anti-Pattern #12)
+  - Are at least 2 databases represented in the results?
+  - Is there at least 1 open-access paper in the Top 5?
+  - Does the synthesis connect papers rather than just listing them? (Anti-Pattern #13)
+- IF any check fails THEN loop back to the relevant step to fix the issue