PyPI - ref-management - Versions diffs - 1.0.3__tar.gz - Mend

ref-management 1.0.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

ref_management-1.0.3/LICENSE +21 -0
ref_management-1.0.3/PKG-INFO +13 -0
ref_management-1.0.3/README.md +161 -0
ref_management-1.0.3/pyproject.toml +35 -0
ref_management-1.0.3/ref_management/__init__.py +3 -0
ref_management-1.0.3/ref_management/add_dois.py +126 -0
ref_management-1.0.3/ref_management/apply_citations.py +471 -0
ref_management-1.0.3/ref_management/auto_format.py +106 -0
ref_management-1.0.3/ref_management/generate_report.py +170 -0
ref_management-1.0.3/ref_management/scan_raw_refs.py +361 -0
ref_management-1.0.3/ref_management/verify_bib.py +300 -0
ref_management-1.0.3/ref_management.egg-info/PKG-INFO +13 -0
ref_management-1.0.3/ref_management.egg-info/SOURCES.txt +17 -0
ref_management-1.0.3/ref_management.egg-info/dependency_links.txt +1 -0
ref_management-1.0.3/ref_management.egg-info/entry_points.txt +7 -0
ref_management-1.0.3/ref_management.egg-info/requires.txt +6 -0
ref_management-1.0.3/ref_management.egg-info/top_level.txt +1 -0
ref_management-1.0.3/setup.cfg +4 -0
ref_management-1.0.3/setup.py +3 -0

ref_management-1.0.3/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Akira Imamoto
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

ref_management-1.0.3/PKG-INFO ADDED

@@ -0,0 +1,13 @@
+Metadata-Version: 2.4
+Name: ref-management
+Version: 1.0.3
+Summary: Manuscript Reference Toolkit (ARM) — extract, verify, and format references in research manuscripts
+Author-email: Akira Imamoto <aimamoto@uchicago.edu>
+License-File: LICENSE
+Requires-Dist: bibtexparser>=1.4
+Requires-Dist: python-docx>=1.0
+Requires-Dist: biopython>=1.80
+Requires-Dist: rapidfuzz>=3.0
+Requires-Dist: requests>=2.28
+Requires-Dist: citeproc-py>=0.6
+Dynamic: license-file

ref_management-1.0.3/README.md ADDED

@@ -0,0 +1,161 @@
+# Manuscript Reference Toolkit ARM (Another Reference Manager v1-Revision 3)
+![Python Version](https://img.shields.io/badge/python-3.x-blue) ![License](https://img.shields.io/badge/license-MIT-green)
+A comprehensive Python toolkit designed to extract, verify, correct, and format references in research manuscripts.
+This **Revision 3** toolkit bridges the gap between rough drafts (which often contain raw references, incomplete metadata, or AI hallucinations) and a **finalized, submission-ready Word document**. It features a new **Universal CSL Formatting Engine**, an **Author-Year Bridge** (allowing you to draft with `(Author, Year)` and automatically convert to numeric formats if needed), and intelligent text-replacement algorithms that preserve your document's native fonts and formatting.
+## Key Features (r3 Updates)
+*   **Universal CSL Engine:** Powered by `citeproc-py`, simply provide any Citation Style Language (`.csl`) file (e.g., from the Zotero repository) to format your manuscript exactly to specific journal requirements (e.g., Nature, Cell, APA).
+*   **The Author-Year Bridge:** Draft naturally with `(Smith, 2024)` in text. The pipeline will fuzzy-match the authors to your bibliography and dynamically convert them to whatever your CSL demands (e.g., converting to `1–3` superscripts).
+*   **MDPI & Online Journal Preprocessor:** Automatically extracts article numbers from DOIs (e.g., isolating `903` from `genes15070903`) so that online-only journals, which use article numbers instead of page ranges, print with the correct locator.
+*   **Smart Bibliography Placement & Pagination:** Automatically detects trailing sections (like "Figure Legends" or "Tables") and inserts the formatted References between the main text and those sections, with clean page breaks.
+*   **Advanced Number Collapsing:** Collapses runs of consecutive citation numbers into en-dash ranges (e.g., converting `1, 2, 3` into `1–3`) inside the toolkit itself, rather than relying on the CSL engine's own collapsing behavior.
+*   **Dual-Database Verification:** Falls back to **Crossref** if a DOI is not found in **PubMed** (useful for statistics journals or older journals, which PubMed may not index).
+*   **Smart Shields:** Protects $CV^2$, $R^2$, `Tyr530`, and $1 \times 10^5$ from being misread as citations.
+## Why ARM? (Advantages over Traditional Reference Managers)
+While conventional reference managers (e.g., Zotero, Mendeley, EndNote) are highly effective for personal library curation, they frequently introduce friction during multi-author manuscript preparation. ARM is specifically designed to resolve these collaborative bottlenecks:
+*   **Decentralized Collaborative Drafting:** Traditional tools require all co-authors to synchronize a centralized library database or install proprietary Word plugins. ARM completely eliminates personal library dependency. Co-authors can draft references organically in plain text (e.g., typing `(Author, Year)` or pasting raw, unformatted references at the bottom of the document), and the pipeline will dynamically resolve and format them.
+*   **Post-Hoc Resolution of Messy Drafts:** Instead of forcing authors to use a strict GUI to "insert" citations while writing, ARM acts as a post-processing compiler. It takes rough drafts (often containing incomplete metadata, inconsistent formatting, or AI-hallucinated citations) and standardizes them against records retrieved from the PubMed and Crossref APIs.
+*   **Intelligent Text & Math Protection:** Standard Word plugins often override native typography or mangle inline mathematics (mistaking superscript numbers for citations). ARM utilizes NLP-driven "Smart Shields" to actively protect critical scientific nomenclature and statistical notations (e.g., $R^2$, $CV^2$, $1 \times 10^5$).
+*   **Native Algorithmic Formatting:** Unlike traditional plugins that rely heavily on hidden Word XML field codes (which can corrupt documents when shared across different operating systems), ARM executes clean text-replacement algorithms that preserve your document's native fonts, margins, and layout.
+## Configuration (Important)
+To query PubMed efficiently without hitting rate limits, you should configure your NCBI credentials.
+**Option A: Environment Variables (Recommended)**
+*   **Mac/Linux:**
+    ```bash
+    export NCBI_EMAIL="your_email@example.com"
+    export NCBI_API_KEY="your_api_key"
+    ```
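+*   **Windows (CMD; in PowerShell use `$env:NCBI_EMAIL = "your_email@example.com"` and `$env:NCBI_API_KEY = "your_api_key"` instead):**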
+    ```cmd
+    set NCBI_EMAIL=your_email@example.com
+    set NCBI_API_KEY=your_api_key
+    ```
+**Option B: Hardcoding**
+You can edit the `Entrez.email` and `Entrez.api_key` lines directly at the top of the `ref_management/verify_bib.py` module.
+---
+## 📦 Installation
+Install directly from PyPI with a single command:
+```bash
+pip install ref-management
+```
+This automatically installs all required dependencies (`bibtexparser`, `python-docx`, `biopython`, `rapidfuzz`, `requests`, `citeproc-py`) and creates the following CLI commands:
+| Command | Description |
+| :--- | :--- |
+| `arm-format` | End-to-end pipeline wrapper |
+| `arm-scan` | Scan & extract references from a `.docx` |
+| `arm-verify` | Enrich a `.bib` file via PubMed / Crossref |
+| `arm-apply` | Apply CSL formatting to the Word document |
+| `arm-add-dois` | Append missing DOIs to an intermediate draft |
+| `arm-report` | Generate a plain-text reference list from a `.bib` |
+---
+## 🚀 Workflow 1: Fully Automated Pipeline (Recommended)
+Use this wrapper command to execute the entire extraction, verification, and formatting process automatically.
+```bash
+arm-format "MyDraft.docx" --csl "nature"
+```
+> **💡 Pro-Tip (Default Directory):** You can create a folder at `~/citation_styles/` and store all your downloaded `.csl` files from Zotero there. The pipeline will automatically search this folder, meaning you can simply type `--csl cell` instead of providing the full file path.
+**What it does:**
+1. Loads your desired journal style via the provided `.csl` file.
+2. Extracts raw references from your document.
+3. Downloads the missing metadata (Volume, Issue, Pages) from PubMed/Crossref.
+4. Rewrites your in-text citations natively.
+5. Injects a perfectly formatted bibliography, applying proper page breaks to ensure your Tables and Figure Legends are pushed cleanly to the next page.
+*   **Output:** `MyDraft_final_nature.docx`, plus diagnostic CSV/BibTeX files.
+---
+## Workflow 2: Partial / Step-by-Step Pipeline
+If you want to manually inspect or edit the references between steps, you can run the modules individually.
+### Step 1: Scan & Extract
+Reads the raw reference list at the bottom of your draft and maps them to PMIDs/DOIs.
+```bash
+arm-scan "MyDraft.docx"
+```
+### Step 2: Verify & Enrich
+Takes the extracted `.bib` file, hits PubMed/Crossref, and fills in all missing Journal names, Volumes, and Authors.
+```bash
+arm-verify "MyDraft_extracted.bib"
+```
+### Step 3: Apply to Manuscript via CSL Engine
+Takes your verified references and applies them to the document using your target CSL style.
+```bash
+arm-apply "MyDraft_extracted_verified.bib" "MyDraft.docx" --csl "nature.csl"
+```
+---
+## Troubleshooting
+### "Dependent" CSL Style Error
+If the script aborts with an error stating that your `.csl` file is a **dependent style**, it means the file you downloaded from the Zotero Style Repository is just a lightweight link to a "parent" publisher style (e.g., *The EMBO Journal* uses *EMBO Press*).
+**Solution:**
+1. Read the terminal error message—the script will automatically scan the XML and tell you the exact name and URL of the parent style you need.
+2. Download that parent `.csl` file and place it in your `~/citation_styles/` folder.
+3. Rerun the script using the parent style.
+*Example:* `arm-format "MyDraft.docx" --csl embo-press`
+---
+## Extra Tools
+### 1. Inject DOIs into an Intermediate Draft
+If you want to quickly append clickable DOIs to the raw references of an intermediate draft (for co-authors to easily click/read papers) *without* fully reformatting the document or changing in-text citations:
+```bash
+arm-add-dois "MyDraft_extracted_verified.bib" "MyDraft.docx"
+```
+*   **Output:** `MyDraft_with_DOIs.docx` (Your original draft, with `https://doi.org/...` seamlessly appended to references that were missing it).
+### 2. Generate a Text Report
+If you just want a clean text file of your references (without modifying a Word document), you can use the reporter command on any verified `.bib` file:
+```bash
+arm-report "MyDraft_extracted_verified.bib"
+```
+*   **Output:** `MyDraft_extracted_verified_list.txt`
+---
+## Module Overview
+| Module / Command | Purpose |
+| :--- | :--- |
+| **`arm-format`** | **The Wrapper:** Runs all steps automatically using the CSL Engine. |
+| **`arm-scan`** | **The Auditor:** Scans `.docx` for raw refs, outputs CSV report and a raw `.bib` mapping. |
+| **`arm-verify`** | **The Enrichment Engine:** Queries PubMed/Crossref to enrich missing metadata. |
+| **`arm-apply`** | **The CSL Formatter:** Updates inline citations, protects math/fonts, and smartly paginates the Bibliography. |
+| **`arm-add-dois`** | **The Linker:** Appends DOIs to raw reference lists for intermediate co-author drafts. |
+| **`arm-report`** | **The Reporter:** Converts `.bib` files into clean `.txt` lists. |
+---
+## Disclaimer
+*While this toolkit uses fuzzy logic, NLP shields, and official APIs to verify and map data, always perform a final visual review of the generated manuscript before submitting to a journal.*
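
Reviewer note on "Advanced Number Collapsing": the README describes turning `1, 2, 3` into `1–3`. The following is a minimal sketch of that idea, not the package's actual implementation. The function name `collapse_citation_numbers` and the `min_run` parameter are hypothetical, and the choice to leave two-number runs uncollapsed is an assumption.

```python
def collapse_citation_numbers(numbers, sep=", ", dash="\u2013", min_run=3):
    """Collapse citation numbers into en-dash ranges.

    Hypothetical helper for illustration only. Runs shorter than
    `min_run` are written out number by number (an assumed convention).
    """
    ordered = sorted(set(numbers))  # de-duplicate and sort first
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend j while the next number continues the consecutive run.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if j - i + 1 >= min_run:
            parts.append(f"{ordered[i]}{dash}{ordered[j]}")
        else:
            parts.extend(str(n) for n in ordered[i:j + 1])
        i = j + 1
    return sep.join(parts)
```

For instance, `collapse_citation_numbers([1, 2, 3, 5, 7, 8])` keeps `7, 8` as two separate numbers because that run is shorter than `min_run`.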

ref_management-1.0.3/pyproject.toml ADDED

@@ -0,0 +1,35 @@
+[build-system]
+requires = ["setuptools>=61.0"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "ref-management"
+version = "1.0.3"
+description = "Manuscript Reference Toolkit (ARM) — extract, verify, and format references in research manuscripts"
+authors = [
+    { name = "Akira Imamoto", email = "aimamoto@uchicago.edu" }
+]
+dependencies = [
+    "bibtexparser>=1.4",
+    "python-docx>=1.0",
+    "biopython>=1.80",
+    "rapidfuzz>=3.0",
+    "requests>=2.28",
+    "citeproc-py>=0.6",
+]
+[project.urls]
+Homepage = "https://github.com/aimamoto/ref_management"
+Repository = "https://github.com/aimamoto/ref_management"
+[project.scripts]
+arm-format   = "ref_management.auto_format:main"
+arm-scan     = "ref_management.scan_raw_refs:main"
+arm-verify   = "ref_management.verify_bib:main"
+arm-apply    = "ref_management.apply_citations:main"
+arm-add-dois = "ref_management.add_dois:main"
+arm-report   = "ref_management.generate_report:main"
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["ref_management*"]
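
Reviewer note on `[project.scripts]`: each entry maps a command name to a `module:function` reference, such as `ref_management.auto_format:main`. The sketch below shows roughly how such a reference can be resolved to a callable. It is a simplified illustration, not the installer's actual code. The helper name `resolve_entry_point` is hypothetical, and a standard-library target is used so the example runs without the package installed.

```python
from importlib import import_module
import json


def resolve_entry_point(spec: str):
    """Resolve a 'package.module:attribute' string to the object it names.

    Hypothetical helper for illustration; real installers generate small
    wrapper scripts that do the equivalent import and call.
    """
    module_name, _, attr_path = spec.partition(":")
    obj = import_module(module_name)
    # Walk dotted attribute paths such as "Class.method" (empty if no ':').
    for name in filter(None, attr_path.split(".")):
        obj = getattr(obj, name)
    return obj


# Stand-in target from the standard library.
dumps = resolve_entry_point("json:dumps")
```

With the package installed, the same call with `"ref_management.auto_format:main"` would be expected to return the function behind `arm-format`.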

ref_management-1.0.3/ref_management/__init__.py ADDED

@@ -0,0 +1,3 @@
+"""ref_management – Manuscript Reference Toolkit (ARM)."""
+__version__ = "1.0.3"

ref_management-1.0.3/ref_management/add_dois.py ADDED

@@ -0,0 +1,126 @@
+import sys
+import re
+import argparse
+from pathlib import Path
+# --- MONKEY PATCH FOR PYPARSING/BIBTEXPARSER COMPATIBILITY ---
+import pyparsing
+if not hasattr(pyparsing, 'DelimitedList'):
+    # Alias whichever older spelling this pyparsing release provides
+    if hasattr(pyparsing, 'delimited_list'):
+        pyparsing.DelimitedList = pyparsing.delimited_list
+    elif hasattr(pyparsing, 'delimitedList'):
+        pyparsing.DelimitedList = pyparsing.delimitedList
+import bibtexparser
+from docx import Document
+from rapidfuzz import fuzz
+REF_HEADER_PATTERN = re.compile(r'^\s*(?:[0-9]+\.?\s*)?(?:REFERENCES|BIBLIOGRAPHY|LITERATURE CITED|WORKS CITED)\s*$', re.IGNORECASE)
+POST_REF_PATTERN = re.compile(r'^\s*(?:Tables?|Figures?|Figure Legends?|Supplementary.*?|Appendices|Data Availability|Acknowledgements?|Author Contributions?|Funding|Conflict(?:s)? of Interest|Competing Interests?|(?:Table|Figure|Fig\.?)\s*\d+.*)$', re.IGNORECASE)
+def clean_for_match(text: str) -> str:
+    """Removes punctuation and normalizes spacing for accurate fuzzy matching."""
+    if not text: return ""
+    text = text.replace('{', '').replace('}', '')
+    return re.sub(r'[^\w\s]', '', text.lower()).strip()
+def process_document(bib_path: Path, docx_path: Path, output_path: Path):
+    print(f"\nReading verified BibTeX: {bib_path.name}...")
+    try:
+        with open(bib_path, 'r', encoding='utf-8') as f:
+            bib_db = bibtexparser.load(f)
+    except Exception as e:
+        print(f"❌ ERROR reading BibTeX: {e}")
+        sys.exit(1)
+    # Build an index of cleaned titles to DOIs
+    doi_map = {}
+    for entry in bib_db.entries:
+        doi = entry.get('doi', '').strip()
+        title = entry.get('title', '').strip()
+        if doi and title:
+            # Strip a leading resolver URL or "doi:" label (any case, http or https, optional dx.)
+            clean_doi = re.sub(r'(?i)^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)', '', doi).strip()
+            doi_map[clean_for_match(title)] = clean_doi
+    print(f"Loaded {len(doi_map)} DOIs from BibTeX.")
+    print(f"Scanning document: {docx_path.name}...")
+    doc = Document(str(docx_path))
+    # 1. Find the boundaries of the References section
+    ref_start_idx = -1
+    for i, p in enumerate(doc.paragraphs):
+        if REF_HEADER_PATTERN.match(p.text):
+            ref_start_idx = i
+            break
+    if ref_start_idx == -1:
+        print("❌ ERROR: Could not locate 'References' header in the document.")
+        sys.exit(1)
+    ref_end_idx = len(doc.paragraphs)
+    for i in range(ref_start_idx + 1, len(doc.paragraphs)):
+        text = doc.paragraphs[i].text.strip()
+        if text and POST_REF_PATTERN.match(text):
+            ref_end_idx = i
+            break
+    # 2. Iterate through the references and append DOIs
+    added_count = 0
+    already_had_count = 0
+    for i in range(ref_start_idx + 1, ref_end_idx):
+        para = doc.paragraphs[i]
+        text = para.text.strip()
+        # Skip empty lines or very short fragments
+        if len(text) < 20: continue
+        # Check if a DOI is already present in this paragraph
+        if re.search(r'(?i)\bhttps?://(?:dx\.)?doi\.org/', text) or re.search(r'(?i)\bdoi:', text):
+            already_had_count += 1
+            continue
+        # Fuzzy match the paragraph text against our BibTeX titles
+        best_match_doi = None
+        best_score = 85  # A candidate must score above this (0-100 scale) to count as a match
+        para_clean = clean_for_match(text)
+        for bib_title, doi in doi_map.items():
+            # partial_ratio is perfect here because the title is just a substring of the full reference paragraph
+            score = fuzz.partial_ratio(bib_title, para_clean)
+            if score > best_score:
+                best_score = score
+                best_match_doi = doi
+        if best_match_doi:
+            # Append the DOI URL to the end of the paragraph
+            if not text.endswith('.'):
+                para.add_run('.')
+            # The URL is added as a plain-text run; no hyperlink field is created
+            para.add_run(f" https://doi.org/{best_match_doi}")
+            added_count += 1
+    # 3. Save the patched draft
+    doc.save(str(output_path))
+    print(f"\nSuccess! Saved to {output_path.name}")
+    print(f" -> Found {already_had_count} references that already had DOIs.")
+    print(f" -> Dynamically matched and injected {added_count} missing DOIs.")
+def main():
+    parser = argparse.ArgumentParser(description="Appends DOIs to the References section of an intermediate draft.")
+    parser.add_argument("bib", type=Path, help="The verified .bib file containing the DOIs")
+    parser.add_argument("doc", type=Path, help="The intermediate .docx file")
+    args = parser.parse_args()
+    if not args.bib.exists():
+        print(f"❌ ERROR: BibTeX file '{args.bib}' not found.")
+        sys.exit(1)
+    if not args.doc.exists():
+        print(f"❌ ERROR: Document '{args.doc}' not found.")
+        sys.exit(1)
+    output = args.doc.with_name(f"{args.doc.stem}_with_DOIs.docx")
+    process_document(args.bib, args.doc, output)
+if __name__ == "__main__":
+    main()
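
Reviewer note on DOI cleaning: `process_document` strips a resolver prefix from each BibTeX `doi` field before building `https://doi.org/...` links. The sketch below shows one way to normalize the common prefix variants in a single step. It is an illustration under stated assumptions, not the module's code. The name `normalize_doi` is hypothetical, the set of prefixes handled is an assumption about what appears in real `.bib` files, and `10.1000/xyz` is a placeholder DOI.

```python
import re

# Assumed prefix variants: http/https resolver URLs (with or without "dx.")
# and a "doi:" label in any letter case.
_DOI_PREFIX = re.compile(r'^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)', re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI with any leading resolver URL or label removed."""
    return _DOI_PREFIX.sub('', raw.strip()).strip()


for candidate in (
    "https://doi.org/10.1000/xyz",
    "http://dx.doi.org/10.1000/xyz",
    "DOI: 10.1000/xyz",
    "10.1000/xyz",
):
    print(normalize_doi(candidate))  # each line prints 10.1000/xyz
```

Anchoring the pattern at the start of the string avoids altering a DOI whose suffix happens to contain the text `doi:`.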