endnote-utils 0.1.3__tar.gz → 0.2.0__tar.gz

@@ -0,0 +1,223 @@
+ Metadata-Version: 2.4
+ Name: endnote-utils
+ Version: 0.2.0
+ Summary: Convert EndNote XML to CSV/JSON/XLSX with streaming parse and TXT report.
+ Author-email: Minh Quach <minhquach8@gmail.com>
+ License: MIT
+ Keywords: endnote,xml,csv,bibliography,research
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: openpyxl>=3.1.0
+
+ # EndNote Utils
+
+ Convert **EndNote XML files** into clean CSV/JSON/XLSX with automatic TXT reports.
+ Supports both a **Python API** and a **command-line interface (CLI)**.
+
+ ---
+
+ ## Features
+
+ - ✅ Parse one XML file (`--xml`) or an entire folder of `*.xml` (`--folder`)
+ - ✅ Streams `<record>` elements using `iterparse` (low memory usage)
+ - ✅ Extracts fields:
+   `database, ref_type, title, journal, authors, year, volume, number, abstract, doi, urls, keywords, publisher, isbn, language, extracted_date`
+ - ✅ Adds a `database` column from the XML filename stem (`IEEE.xml → IEEE`)
+ - ✅ Normalizes DOI (`10.xxxx` → `https://doi.org/...`)
+ - ✅ Supports **multiple output formats**: CSV, JSON, XLSX
+ - ✅ Generates a **TXT report** by default (`<out>_report.txt`; disable with `--no-report`) with:
+   - per-file counts (exported/skipped)
+   - totals, files processed
+   - run timestamp & duration
+   - **duplicate table** per database (Origin / Retractions / Duplicates / Remaining)
+   - optional duplicate key list (top-N)
+   - optional summary stats (year, ref_type, journal, top authors)
+ - ✅ Auto-creates output folders if missing
+ - ✅ Deduplication:
+   - `--dedupe doi` (unique by DOI)
+   - `--dedupe title-year` (unique by normalized title + year)
+   - `--dedupe-keep first|last` (keep first or last occurrence within each file)
+ - ✅ Summary stats (`--stats`) with optional JSON export (`--stats-json`)
+ - ✅ CLI options for CSV formatting, filters, verbosity
+ - ✅ Importable Python API for scripting & integration
+
+ ---
+
+ ## Installation
+
+ ### From PyPI
+
+ ```bash
+ pip install endnote-utils
+ ```
+
+ Requires **Python 3.8+**.
+
+ ---
+
+ ## Usage
+
+ ### Command Line
+
+ #### Single file
+
+ ```bash
+ endnote-utils --xml data/IEEE.xml --out output/ieee.csv
+ ```
+
+ #### Folder with multiple files
+
+ ```bash
+ endnote-utils --folder data/xmls --out output/all_records.csv
+ ```
+
+ #### Custom report path
+
+ ```bash
+ endnote-utils \
+   --xml data/Scopus.xml \
+   --out output/scopus.csv \
+   --report reports/scopus_run.txt \
+   --stats \
+   --verbose
+ ```
+
+ If `--report` is not provided, it defaults to `<out>_report.txt`.
+ Use `--no-report` to disable report generation.
+
+ ---
+
+ ### CLI Options
+
+ | Option | Description | Default |
+ | --------------- | --------------------------------------------------- | ------------------ |
+ | `--xml` | Path to a single EndNote XML file | – |
+ | `--folder` | Path to a folder containing multiple `*.xml` files | – |
+ | `--csv` | (Legacy) Output CSV path | – |
+ | `--out` | Generic output path (`.csv`, `.json`, `.xlsx`) | – |
+ | `--format` | Explicit format (`csv`, `json`, `xlsx`) | inferred |
+ | `--report` | Output TXT report path | `<out>_report.txt` |
+ | `--no-report` | Disable TXT report completely | – |
+ | `--delimiter` | CSV delimiter | `,` |
+ | `--quoting` | CSV quoting: `minimal`, `all`, `nonnumeric`, `none` | `minimal` |
+ | `--no-header` | Suppress CSV header row | – |
+ | `--encoding` | Output text encoding | `utf-8` |
+ | `--ref-type` | Only include records with this `ref_type` name | – |
+ | `--year` | Only include records with this year | – |
+ | `--max-records` | Stop after N records per file (for testing) | – |
+ | `--dedupe` | Deduplicate mode: `none`, `doi`, `title-year` | `none` |
+ | `--dedupe-keep` | Deduplication strategy: `first`, `last` | `first` |
+ | `--stats` | Include summary stats in TXT report | – |
+ | `--stats-json` | Path to JSON file to save stats & duplicate info | – |
+ | `--verbose` | Verbose logging with debug details | – |
+
+ ---
+
+ ### Example Report (snippet)
+
+ ```
+ ========================================
+ EndNote Export Report
+ ========================================
+ Run started : 2025-09-11 14:30:22
+ Files : 4
+ Duration : 0.47 seconds
+
+ Per-file results
+ ----------------------------------------
+ GGScholar.xml : 13 exported, 0 skipped
+ IEEE.xml : 2147 exported, 0 skipped
+ PubMed.xml : 504 exported, 0 skipped
+ Scopus.xml : 847 exported, 0 skipped
+ TOTAL exported: 3511
+
+ Duplicates table (by database)
+ ----------------------------------------
+ Database Origin Retractions Duplicates Remaining
+ ------------------------------------------------------------
+ GGScholar 179 0 27 152
+ IEEE 1900 0 589 1311
+ PubMed 320 0 225 95
+ Scopus 1999 1 511 1489
+ TOTAL 4410 1 1352 3047
+
+ Duplicate keys (top)
+ ----------------------------------------
+ Mode : doi
+ Keep : first
+ Removed: 1352
+ Details (top):
+ 10.1109/SPMB55497.2022.10014965 : 3 duplicate(s)
+ 10.1109/TSSA63730.2024.10864368 : 2 duplicate(s)
+
+ Summary stats
+ ----------------------------------------
+ By year:
+ 2022 : 569
+ 2023 : 684
+ 2024 : 1148
+ 2025 : 1108
+
+ By ref_type (top):
+ Journal Article: 2037
+ Conference Proceedings: 1470
+ Book Section: 4
+
+ By journal (top 20):
+ IEEE Access: 175
+ IEEE Journal of Biomedical and Health Informatics: 67
+ ...
+
+ Top authors (top 10):
+ Y. Wang: 50
+ X. Wang: 35
+ ...
+ ```
+
+ ---
+
+ ## Python API
+
+ ```python
+ from pathlib import Path
+ from endnote_utils import export, export_folder
+
+ # Single file
+ total, out_file, report_file = export(
+     Path("data/IEEE.xml"),
+     Path("output/ieee.csv"),
+     dedupe="doi", stats=True
+ )
+
+ # Folder
+ total, out_file, report_file = export_folder(
+     Path("data/xmls"),
+     Path("output/all.csv"),
+     ref_type="Conference Proceedings",
+     year="2024",
+     dedupe="title-year",
+     dedupe_keep="last",
+     stats=True,
+     stats_json=Path("output/stats.json"),
+ )
+ ```
+
+ ---
+
+ ## Development Notes
+
+ * Pure Python; the core uses only the standard library (`argparse`, `csv`, `xml.etree.ElementTree`, `logging`, `pathlib`, `json`).
+ * One third-party dependency: `openpyxl>=3.1.0` (for Excel `.xlsx` export), installed automatically with the package.
+ * Streaming XML parsing avoids high memory usage.
+ * Deduplication strategies configurable (`doi` / `title-year`).
+ * Report includes per-database table and optional JSON snapshot.
+ * Follows [PEP 621](https://peps.python.org/pep-0621/) packaging (`pyproject.toml`).
+
+ ---
+
+ ## License
+
+ MIT License © 2025 Minh Quach
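Reviewer note: the README says `<record>` elements are streamed with `iterparse`, but `core.py` is not part of this diff. The following is a minimal sketch of that technique, not the package's implementation. The helper names (`iter_records`, `_text`) and the simplified record layout (`titles/title`, `dates/year`) are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator


def _text(record: ET.Element, path: str) -> str:
    """Return the concatenated text under `path`, or "" if the node is missing."""
    node = record.find(path)
    return "".join(node.itertext()).strip() if node is not None else ""


def iter_records(source) -> Iterator[Dict[str, str]]:
    """Yield one dict per <record> without building the full tree first."""
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "record":
            continue
        yield {
            "title": _text(elem, "titles/title"),  # hypothetical layout
            "year": _text(elem, "dates/year"),     # hypothetical layout
        }
        # Drop the record's children so memory does not grow with file size.
        elem.clear()


SAMPLE = b"""<xml><records>
<record><titles><title>First paper</title></titles><dates><year>2023</year></dates></record>
<record><titles><title>Second paper</title></titles><dates><year>2024</year></dates></record>
</records></xml>"""

rows = list(iter_records(io.BytesIO(SAMPLE)))
print(rows)
```

Because the generator yields rows one at a time, a writer can consume them as they arrive instead of collecting the whole export in memory.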
@@ -0,0 +1,209 @@
+ # EndNote Utils
+
+ Convert **EndNote XML files** into clean CSV/JSON/XLSX with automatic TXT reports.
+ Supports both a **Python API** and a **command-line interface (CLI)**.
+
+ ---
+
+ ## Features
+
+ - ✅ Parse one XML file (`--xml`) or an entire folder of `*.xml` (`--folder`)
+ - ✅ Streams `<record>` elements using `iterparse` (low memory usage)
+ - ✅ Extracts fields:
+   `database, ref_type, title, journal, authors, year, volume, number, abstract, doi, urls, keywords, publisher, isbn, language, extracted_date`
+ - ✅ Adds a `database` column from the XML filename stem (`IEEE.xml → IEEE`)
+ - ✅ Normalizes DOI (`10.xxxx` → `https://doi.org/...`)
+ - ✅ Supports **multiple output formats**: CSV, JSON, XLSX
+ - ✅ Generates a **TXT report** by default (`<out>_report.txt`; disable with `--no-report`) with:
+   - per-file counts (exported/skipped)
+   - totals, files processed
+   - run timestamp & duration
+   - **duplicate table** per database (Origin / Retractions / Duplicates / Remaining)
+   - optional duplicate key list (top-N)
+   - optional summary stats (year, ref_type, journal, top authors)
+ - ✅ Auto-creates output folders if missing
+ - ✅ Deduplication:
+   - `--dedupe doi` (unique by DOI)
+   - `--dedupe title-year` (unique by normalized title + year)
+   - `--dedupe-keep first|last` (keep first or last occurrence within each file)
+ - ✅ Summary stats (`--stats`) with optional JSON export (`--stats-json`)
+ - ✅ CLI options for CSV formatting, filters, verbosity
+ - ✅ Importable Python API for scripting & integration
+
+ ---
+
+ ## Installation
+
+ ### From PyPI
+
+ ```bash
+ pip install endnote-utils
+ ```
+
+ Requires **Python 3.8+**.
+
+ ---
+
+ ## Usage
+
+ ### Command Line
+
+ #### Single file
+
+ ```bash
+ endnote-utils --xml data/IEEE.xml --out output/ieee.csv
+ ```
+
+ #### Folder with multiple files
+
+ ```bash
+ endnote-utils --folder data/xmls --out output/all_records.csv
+ ```
+
+ #### Custom report path
+
+ ```bash
+ endnote-utils \
+   --xml data/Scopus.xml \
+   --out output/scopus.csv \
+   --report reports/scopus_run.txt \
+   --stats \
+   --verbose
+ ```
+
+ If `--report` is not provided, it defaults to `<out>_report.txt`.
+ Use `--no-report` to disable report generation.
+
+ ---
+
+ ### CLI Options
+
+ | Option | Description | Default |
+ | --------------- | --------------------------------------------------- | ------------------ |
+ | `--xml` | Path to a single EndNote XML file | – |
+ | `--folder` | Path to a folder containing multiple `*.xml` files | – |
+ | `--csv` | (Legacy) Output CSV path | – |
+ | `--out` | Generic output path (`.csv`, `.json`, `.xlsx`) | – |
+ | `--format` | Explicit format (`csv`, `json`, `xlsx`) | inferred |
+ | `--report` | Output TXT report path | `<out>_report.txt` |
+ | `--no-report` | Disable TXT report completely | – |
+ | `--delimiter` | CSV delimiter | `,` |
+ | `--quoting` | CSV quoting: `minimal`, `all`, `nonnumeric`, `none` | `minimal` |
+ | `--no-header` | Suppress CSV header row | – |
+ | `--encoding` | Output text encoding | `utf-8` |
+ | `--ref-type` | Only include records with this `ref_type` name | – |
+ | `--year` | Only include records with this year | – |
+ | `--max-records` | Stop after N records per file (for testing) | – |
+ | `--dedupe` | Deduplicate mode: `none`, `doi`, `title-year` | `none` |
+ | `--dedupe-keep` | Deduplication strategy: `first`, `last` | `first` |
+ | `--stats` | Include summary stats in TXT report | – |
+ | `--stats-json` | Path to JSON file to save stats & duplicate info | – |
+ | `--verbose` | Verbose logging with debug details | – |
+
+ ---
+
+ ### Example Report (snippet)
+
+ ```
+ ========================================
+ EndNote Export Report
+ ========================================
+ Run started : 2025-09-11 14:30:22
+ Files : 4
+ Duration : 0.47 seconds
+
+ Per-file results
+ ----------------------------------------
+ GGScholar.xml : 13 exported, 0 skipped
+ IEEE.xml : 2147 exported, 0 skipped
+ PubMed.xml : 504 exported, 0 skipped
+ Scopus.xml : 847 exported, 0 skipped
+ TOTAL exported: 3511
+
+ Duplicates table (by database)
+ ----------------------------------------
+ Database Origin Retractions Duplicates Remaining
+ ------------------------------------------------------------
+ GGScholar 179 0 27 152
+ IEEE 1900 0 589 1311
+ PubMed 320 0 225 95
+ Scopus 1999 1 511 1489
+ TOTAL 4410 1 1352 3047
+
+ Duplicate keys (top)
+ ----------------------------------------
+ Mode : doi
+ Keep : first
+ Removed: 1352
+ Details (top):
+ 10.1109/SPMB55497.2022.10014965 : 3 duplicate(s)
+ 10.1109/TSSA63730.2024.10864368 : 2 duplicate(s)
+
+ Summary stats
+ ----------------------------------------
+ By year:
+ 2022 : 569
+ 2023 : 684
+ 2024 : 1148
+ 2025 : 1108
+
+ By ref_type (top):
+ Journal Article: 2037
+ Conference Proceedings: 1470
+ Book Section: 4
+
+ By journal (top 20):
+ IEEE Access: 175
+ IEEE Journal of Biomedical and Health Informatics: 67
+ ...
+
+ Top authors (top 10):
+ Y. Wang: 50
+ X. Wang: 35
+ ...
+ ```
+
+ ---
+
+ ## Python API
+
+ ```python
+ from pathlib import Path
+ from endnote_utils import export, export_folder
+
+ # Single file
+ total, out_file, report_file = export(
+     Path("data/IEEE.xml"),
+     Path("output/ieee.csv"),
+     dedupe="doi", stats=True
+ )
+
+ # Folder
+ total, out_file, report_file = export_folder(
+     Path("data/xmls"),
+     Path("output/all.csv"),
+     ref_type="Conference Proceedings",
+     year="2024",
+     dedupe="title-year",
+     dedupe_keep="last",
+     stats=True,
+     stats_json=Path("output/stats.json"),
+ )
+ ```
+
+ ---
+
+ ## Development Notes
+
+ * Pure Python; the core uses only the standard library (`argparse`, `csv`, `xml.etree.ElementTree`, `logging`, `pathlib`, `json`).
+ * One third-party dependency: `openpyxl>=3.1.0` (for Excel `.xlsx` export), installed automatically with the package.
+ * Streaming XML parsing avoids high memory usage.
+ * Deduplication strategies configurable (`doi` / `title-year`).
+ * Report includes per-database table and optional JSON snapshot.
+ * Follows [PEP 621](https://peps.python.org/pep-0621/) packaging (`pyproject.toml`).
+
+ ---
+
+ ## License
+
+ MIT License © 2025 Minh Quach
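Reviewer note: the README describes DOI normalization and the `doi` / `title-year` dedupe modes with `--dedupe-keep first|last`, but the code behind them is not in this diff. The sketch below shows one plausible reading. The function names, the exact normalization rules (stripping `doi:` and resolver prefixes, returning an empty string for non-DOI input), and the choice to keep records that have no key are all assumptions, not the package's confirmed behaviour.

```python
import re
from typing import Dict, Iterable, List, Optional

Row = Dict[str, str]


def normalize_doi(raw: str) -> str:
    """Turn '10.xxxx/...' (optionally prefixed) into 'https://doi.org/10.xxxx/...'."""
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", raw.strip(), flags=re.IGNORECASE)
    # Assumption: anything that does not look like a DOI becomes an empty string.
    return "https://doi.org/" + doi if doi.startswith("10.") else ""


def dedupe_key(row: Row, mode: str) -> Optional[str]:
    """Build the comparison key for a row, or None if the row has no usable key."""
    if mode == "doi":
        return normalize_doi(row.get("doi", "")).lower() or None
    if mode == "title-year":
        # Assumed normalization: lowercase, collapse punctuation and whitespace.
        title = re.sub(r"[^a-z0-9]+", " ", row.get("title", "").lower()).strip()
        return f"{title}|{row.get('year', '')}" if title else None
    return None


def dedupe(rows: Iterable[Row], mode: str, keep: str = "first") -> List[Row]:
    """Keep the first or last row per key; rows without a key are always kept."""
    ordered = list(rows)
    if keep == "last":
        ordered.reverse()  # walking backwards makes "last" behave like "first"
    seen = set()
    out: List[Row] = []
    for row in ordered:
        key = dedupe_key(row, mode)
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        out.append(row)
    if keep == "last":
        out.reverse()  # restore the original ordering
    return out


rows = [
    {"doi": "10.1000/abc", "title": "A Study", "year": "2024", "database": "IEEE"},
    {"doi": "https://doi.org/10.1000/ABC", "title": "A study!", "year": "2024", "database": "Scopus"},
    {"doi": "", "title": "Other", "year": "2023", "database": "PubMed"},
]
print([r["database"] for r in dedupe(rows, "doi")])
print([r["database"] for r in dedupe(rows, "doi", keep="last")])
```

The README states that `--dedupe-keep` applies within each file; this sketch works on a single list and leaves the per-file scoping to the caller.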
@@ -4,8 +4,8 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "endnote-utils"
- version = "0.1.3"
- description = "Convert EndNote XML to CSV with streaming parse and TXT report."
+ version = "0.2.0"
+ description = "Convert EndNote XML to CSV/JSON/XLSX with streaming parse and TXT report."
  readme = { file = "README.md", content-type = "text/markdown" }
  requires-python = ">=3.8"
  license = {text = "MIT"}
@@ -17,7 +17,9 @@ classifiers = [
      "Operating System :: OS Independent",
  ]
 
- dependencies = [] # stdlib only
+ dependencies = [
+     "openpyxl>=3.1.0"
+ ]
 
  [project.scripts]
  endnote-utils = "endnote_utils.cli:main"
@@ -0,0 +1,186 @@
+ from __future__ import annotations
+
+ import argparse
+ import logging
+ import sys
+ from pathlib import Path
+ from typing import List, Optional, Tuple
+
+ from .core import (
+     DEFAULT_FIELDNAMES,
+     export_files_with_report,  # generic writer: csv/json/xlsx
+ )
+
+ SUPPORTED_FORMATS = ("csv", "json", "xlsx")
+ EXT_TO_FORMAT = {".csv": "csv", ".json": "json", ".xlsx": "xlsx"}
+
+
+ def build_parser() -> argparse.ArgumentParser:
+     p = argparse.ArgumentParser(
+         description="Export EndNote XML (file or folder) to CSV/JSON/XLSX with a TXT report."
+     )
+
+     # Input source (mutually exclusive)
+     g = p.add_mutually_exclusive_group(required=True)
+     g.add_argument("--xml", help="Path to a single EndNote XML file.")
+     g.add_argument("--folder", help="Path to a folder containing *.xml files.")
+
+     # Output selection (CSV legacy flag + new generic flags)
+     p.add_argument(
+         "--csv",
+         required=False,
+         help="(Legacy) Output CSV path. Prefer --out for csv/json/xlsx.",
+     )
+     p.add_argument(
+         "--out",
+         required=False,
+         help="Generic output path; format inferred from file extension if --format not provided. "
+              "Supported extensions: .csv, .json, .xlsx",
+     )
+     p.add_argument(
+         "--format",
+         choices=SUPPORTED_FORMATS,
+         help="Output format. If omitted, inferred from --out extension or --csv.",
+     )
+
+     # Report controls
+     p.add_argument("--report", required=False, help="Path to TXT report (default: <output>_report.txt).")
+     p.add_argument(
+         "--no-report",
+         action="store_true",
+         help="Disable writing the TXT report (by default, a report is always generated).",
+     )
+
+     # CSV-specific formatting options (ignored for JSON/XLSX except delimiter/quoting/header)
+     p.add_argument("--delimiter", default=",", help="CSV delimiter (default: ',').")
+     p.add_argument(
+         "--quoting",
+         default="minimal",
+         choices=["minimal", "all", "nonnumeric", "none"],
+         help="CSV quoting mode (default: minimal).",
+     )
+     p.add_argument("--no-header", action="store_true", help="Do not write CSV header row.")
+     p.add_argument("--encoding", default="utf-8", help="Output text encoding (default: utf-8).")
+
+     # Filters / limits
+     p.add_argument("--ref-type", default=None, help="Filter by ref_type name.")
+     p.add_argument("--year", default=None, help="Filter by year.")
+     p.add_argument("--max-records", type=int, default=None, help="Max records per file (testing).")
+
+     # Deduplication & Stats
+     p.add_argument("--dedupe", choices=["none", "doi", "title-year"], default="none",
+                    help="Deduplicate records by key. Default: none.")
+     p.add_argument("--dedupe-keep", choices=["first", "last"], default="first",
+                    help="When duplicates found, keep the first or last occurrence. Default: first.")
+     p.add_argument("--stats", action="store_true",
+                    help="Compute summary stats and include them in the TXT report.")
+     p.add_argument("--stats-json",
+                    help="Optional JSON file path to write detailed stats (when --stats is used).")
+     p.add_argument("--top-authors", type=int, default=10,
+                    help="How many top authors to list in the report/stats JSON. Default: 10.")
+
+     # Verbosity
+     p.add_argument("--verbose", action="store_true", help="Verbose logging.")
+
+     return p
+
+
+ def _resolve_inputs(args: argparse.Namespace) -> List[Path]:
+     if args.xml:
+         xml_path = Path(args.xml)
+         if not xml_path.is_file():
+             raise FileNotFoundError(xml_path)
+         return [xml_path]
+
+     folder = Path(args.folder)
+     if not folder.is_dir():
+         raise FileNotFoundError(folder)
+     inputs = sorted(p for p in folder.glob("*.xml") if p.is_file())
+     if not inputs:
+         raise FileNotFoundError(f"No *.xml files found in folder: {folder}")
+     return inputs
+
+
+ def _resolve_output_and_format(args: argparse.Namespace) -> tuple[Path, str, Optional[Path]]:
+     """
+     Decide final out_path, out_format, and report_path using:
+     - Prefer --out/--format if provided
+     - Fallback to --csv (legacy) which implies CSV
+     - If --no-report, return report_path=None
+     """
+     target_path: Optional[Path] = None
+     out_format: Optional[str] = None
+
+     if args.out:
+         target_path = Path(args.out)
+         out_format = args.format
+         if not out_format:
+             # infer from extension
+             out_format = EXT_TO_FORMAT.get(target_path.suffix.lower())
+             if not out_format:
+                 raise SystemExit(
+                     "Cannot infer output format from extension. "
+                     "Use --format {csv,json,xlsx} or set a supported extension."
+                 )
+     elif args.csv:
+         target_path = Path(args.csv)
+         out_format = args.format or "csv"
+         if out_format != "csv":
+             # user asked for non-csv but used --csv path
+             raise SystemExit("When using --csv, --format must be 'csv'. Use --out for json/xlsx.")
+     else:
+         raise SystemExit("You must provide either --out (preferred) or --csv (legacy).")
+
+     # Report path defaults next to chosen output file (unless disabled)
+     if args.no_report:
+         report_path: Optional[Path] = None
+     else:
+         report_path = Path(args.report) if args.report else target_path.with_name(target_path.stem + "_report.txt")
+
+     return target_path, out_format, report_path
+
+
+ def main() -> None:
+     args = build_parser().parse_args()
+     logging.basicConfig(
+         level=logging.DEBUG if args.verbose else logging.INFO,
+         format="%(levelname)s: %(message)s",
+         stream=sys.stderr,
+     )
+
+     try:
+         inputs = _resolve_inputs(args)
+         out_path, out_format, report_path = _resolve_output_and_format(args)
+
+         total, final_out, final_report = export_files_with_report(
+             inputs=inputs,
+             out_path=out_path,
+             out_format=out_format,
+             fieldnames=DEFAULT_FIELDNAMES,
+             delimiter=args.delimiter,
+             quoting=args.quoting,
+             include_header=not args.no_header,
+             encoding=args.encoding,
+             ref_type=args.ref_type,
+             year=args.year,
+             max_records_per_file=args.max_records,
+             dedupe=args.dedupe,
+             dedupe_keep=args.dedupe_keep,
+             stats=args.stats,
+             stats_json=Path(args.stats_json) if args.stats_json else None,
+             top_authors=args.top_authors,
+             report_path=report_path,  # may be None → core should skip writing report
+         )
+
+         logging.info("Exported %d record(s) → %s", total, final_out)
+         if report_path is None:
+             logging.info("Report disabled by --no-report.")
+         else:
+             logging.info("Report → %s", final_report)
+
+     except FileNotFoundError as e:
+         logging.error("File/folder not found: %s", e)
+         sys.exit(1)
+     except Exception as e:
+         logging.error("Unexpected error: %s", e)
+         sys.exit(2)
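Reviewer note: the format inference and default report path in `_resolve_output_and_format` can be tried out in isolation. The sketch below restates that logic as a standalone function so it runs without the package installed. `resolve_output` is a hypothetical name, it only covers the `--out` branch, and it raises `ValueError` where the CLI raises `SystemExit`.

```python
from pathlib import Path
from typing import Optional, Tuple

EXT_TO_FORMAT = {".csv": "csv", ".json": "json", ".xlsx": "xlsx"}


def resolve_output(
    out: str,
    fmt: Optional[str] = None,
    report: Optional[str] = None,
    no_report: bool = False,
) -> Tuple[Path, str, Optional[Path]]:
    """Return (output path, format, report path or None)."""
    target = Path(out)
    # An explicit format wins; otherwise infer from the case-insensitive suffix.
    out_format = fmt or EXT_TO_FORMAT.get(target.suffix.lower())
    if out_format is None:
        raise ValueError(f"Cannot infer output format from {target.name!r}")

    if no_report:
        report_path: Optional[Path] = None
    elif report:
        report_path = Path(report)
    else:
        # Default: '<stem>_report.txt' next to the output file.
        report_path = target.with_name(target.stem + "_report.txt")
    return target, out_format, report_path


print(resolve_output("output/ieee.csv"))
print(resolve_output("output/all.XLSX", no_report=True))
```

One consequence of deriving the default from the stem: `output/data.csv` and `output/data.json` both map to `output/data_report.txt`, so two runs that differ only in format overwrite each other's report unless `--report` is given.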