pubmatrixpython 0.2.0__tar.gz

@@ -0,0 +1,42 @@
+ name: Publish to PyPI
+
+ on:
+   push:
+     tags:
+       - "v*"
+
+ jobs:
+   build:
+     name: Build distribution
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v6
+         with:
+           persist-credentials: false
+       - name: Install uv
+         uses: astral-sh/setup-uv@v3
+       - name: Build sdist and wheel
+         run: uv build
+       - name: Store distribution packages
+         uses: actions/upload-artifact@v5
+         with:
+           name: python-package-distributions
+           path: dist/
+
+   publish-to-pypi:
+     name: Publish to PyPI
+     needs: build
+     runs-on: ubuntu-latest
+     environment:
+       name: pypi
+       url: https://pypi.org/p/pubmatrixpython
+     permissions:
+       id-token: write
+     steps:
+       - name: Download distributions
+         uses: actions/download-artifact@v6
+         with:
+           name: python-package-distributions
+           path: dist/
+       - name: Publish to PyPI
+         uses: pypa/gh-action-pypi-publish@release/v1
@@ -0,0 +1,26 @@
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv
+
+ # Jupyter
+ .ipynb_checkpoints/
+
+ # Environment
+ .env
+
+ # uv lock file
+ uv.lock
+
+ # dev / local-only notebooks
+ dev/
+
+ # Status bank files
+ CLAUDE*.md
+ .claude/
@@ -0,0 +1 @@
+ 3.13
@@ -0,0 +1,28 @@
+ # Changelog
+
+ ## [0.2.0] — 2026-06-03
+
+ - `n_workers` parameter for concurrent NCBI queries (respects rate limits automatically)
+ - `cache_dir` parameter to cache query results to disk and skip redundant requests
+ - `timeout` parameter exposed on `pubmatrix()` (was hardcoded at 30 s)
+ - `plot_pubmatrix_heatmap()` now returns `(fig, ax)` tuple instead of `ax` alone
+ - Added `show` parameter to `plot_pubmatrix_heatmap()`; `plt.show()` no longer called automatically
+ - Replaced `print()` with `logging` throughout — output can now be suppressed or redirected
+ - Narrowed exception handling in `_fetch_count()` to `requests.RequestException`
+ - Fixed float-unsafe clustering check (`np.allclose` instead of `==`)
+ - `n_tries` and `n_workers` now validated with clear error messages
+ - `odfpy` moved to optional extra: `pip install pubmatrixpython[ods]`
+ - Requires Python ≥ 3.10 (relaxed from 3.13)
+ - Full test suite: 60 tests covering core queries, XML parsing, caching, retry logic, and heatmap rendering
+
+ ## [0.1.0] — 2026-03-28
+
+ Initial release. Python port of [PubMatrixR](https://github.com/ToledoEM/PubMatrixR-v2).
+
+ - `pubmatrix()` — pairwise PubMed/PMC co-occurrence queries with progress bar
+ - `pubmatrix_from_file()` — load term lists from a plain-text file
+ - `plot_pubmatrix_heatmap()` — heatmap with optional clustering, custom colours, PNG export
+ - `pubmatrix_heatmap()` — quick wrapper with defaults
+ - Date range filtering via `daterange`
+ - CSV and ODS export with PubMed hyperlinks
+ - NCBI API key support for higher rate limits
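
The 0.2.0 entry about replacing `print()` with `logging` means verbosity can be controlled through the standard library. The following is a minimal sketch of how a caller might silence progress messages. The logger name `pubmatrix` and the helper `quiet` are assumptions made for illustration; the changelog does not state which logger name the package uses.

```python
import logging

# Assumption: the package logs under a logger named "pubmatrix".
# The changelog does not confirm the name; adjust it to the real one.
LOGGER_NAME = "pubmatrix"


def quiet(level: int = logging.WARNING) -> logging.Logger:
    """Raise the logger threshold so lower-severity messages are dropped."""
    logger = logging.getLogger(LOGGER_NAME)
    logger.setLevel(level)
    return logger


logger = quiet()
logger.info("this progress message is suppressed")
logger.error("errors still get through")
```

Setting the level on the named logger only affects that namespace, so other libraries keep their own verbosity.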
@@ -0,0 +1,2 @@
+ YEAR: 2026
+ COPYRIGHT HOLDER: Enrique Toledo
@@ -0,0 +1,21 @@
+ # MIT License
+
+ Copyright (c) 2026 Enrique Toledo
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,300 @@
+ Metadata-Version: 2.4
+ Name: pubmatrixpython
+ Version: 0.2.0
+ Summary: Python port of PubMatrixR — systematic literature co-occurrence analysis via NCBI PubMed
+ Project-URL: Homepage, https://toledoem.github.io/pubmatrixp/
+ Project-URL: Repository, https://github.com/ToledoEM/PubMatrixPython
+ Project-URL: Changelog, https://github.com/ToledoEM/PubMatrixPython/blob/main/CHANGELOG.md
+ Author-email: Enrique Toledo <enriquetoledo@gmail.com>
+ License-Expression: MIT
+ License-File: LICENSE
+ License-File: LICENSE.md
+ Keywords: bioinformatics,co-occurrence,literature-mining,ncbi,pubmed
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+ Classifier: Topic :: Scientific/Engineering :: Information Analysis
+ Requires-Python: >=3.10
+ Requires-Dist: matplotlib<4,>=3.10
+ Requires-Dist: pandas<4,>=2.0
+ Requires-Dist: requests<3,>=2.33
+ Requires-Dist: scipy<2,>=1.10
+ Requires-Dist: seaborn<1,>=0.13
+ Requires-Dist: tqdm<5,>=4.60
+ Provides-Extra: ods
+ Requires-Dist: odfpy>=1.4.1; extra == 'ods'
+ Description-Content-Type: text/markdown
+
+ # PubMatrixPython v0.2
+
+ <img src="https://toledoem.github.io/img/LogoPubmatrixP.png" align="right" width="150"/>
+
+ ![Python](https://img.shields.io/badge/python-3.10%2B-blue)
+ ![Tests](https://img.shields.io/badge/tests-60%20passed-brightgreen)
+ ![License](https://img.shields.io/badge/license-MIT-green)
+
+ Python port of the [PubMatrixR](https://github.com/ToledoEM/PubMatrixR-v2) R package.
+
+ For every pair of search terms `(A, B)`, it counts how many PubMed or PMC publications mention both. Good for mapping relationships between genes, diseases, and pathways across the literature.
+
+ Based on: Becker et al. (2003) *PubMatrix: a tool for multiplex literature mining*. BMC Bioinformatics 4:61. https://doi.org/10.1186/1471-2105-4-61
+
+ ---
+
+ ## Key features
+
+ - **Pairwise literature search** — automatically searches every combination of terms from two lists
+ - **PubMed or PMC** — query MEDLINE abstracts or PMC full text via NCBI E-utilities
+ - **Heatmap visualisation** — overlap-percentage heatmaps with optional hierarchical clustering
+ - **Export to CSV or ODS** — results include clickable hyperlinks to the matching PubMed search
+ - **Date filtering** — restrict searches to a publication year range
+ - **Flexible input** — pass term lists directly, or load them from a text file
+ - **Concurrency** — `n_workers` for parallel queries, respecting NCBI rate limits
+ - **Disk caching** — `cache_dir` persists query results between runs
+ - **Progress tracking** — built-in progress bar for long searches
+
+ ## Use cases
+
+ - **Gene–disease association studies** — explore literature connections between genes and diseases
+ - **Pathway analysis** — investigate co-occurrence of genes within or across biological pathways
+ - **Drug–target research** — analyse relationships between compounds and potential targets
+ - **Systematic literature reviews** — quantify research coverage across multiple topics
+ - **Knowledge gap identification** — find under-researched combinations of terms
+ - **Bibliometric analysis** — measure research activity in a domain over time
+
+ ---
+
+ ## Setup
+
+ Requires [uv](https://docs.astral.sh/uv/). Install it with:
+
+ ```bash
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+ ```
+
+ Clone and install dependencies:
+
+ ```bash
+ git clone <repo-url>
+ cd PubMatrixPython
+ uv sync --all-groups
+ ```
+
+ ---
+
+ ## Running the notebooks
+
+ All `uv` commands must be run from the **project root** (`PubMatrixPython/`), where `pyproject.toml` lives.
+
+ ```bash
+ cd /path/to/PubMatrixPython
+ uv run jupyter lab
+ ```
+
+ Then open any notebook from the `notebooks/` folder in the browser.
+
+ | Notebook | What it covers |
+ |----------|---------------|
+ | `01_pubmatrix.ipynb` | Basic queries, date filtering, PMC database, file input, CSV export, heatmap visualisation |
+ | `02_example_wnt.ipynb` | Full worked example: WNT genes × obesity genes |
+
+ ---
+
+ ## Quick start (script or REPL)
+
+ ### Interactive REPL
+
+ ```bash
+ uv run python
+ ```
+
+ ```python
+ from pubmatrix import pubmatrix, plot_pubmatrix_heatmap
+
+ A = ["WNT1", "WNT2", "CTNNB1"]
+ B = ["obesity", "diabetes", "cancer"]
+
+ result = pubmatrix(A=A, B=B)
+ print(result)
+
+ # show=True calls plt.show(); since 0.2.0 the figure is not displayed automatically
+ plot_pubmatrix_heatmap(result, title="WNT × Disease", show=True)
+ ```
+
+ ### Running a script
+
+ Create a file `my_analysis.py`:
+
+ ```python
+ from pubmatrix import pubmatrix, plot_pubmatrix_heatmap
+
+ A = ["WNT1", "WNT2", "WNT3A", "WNT5A", "CTNNB1"]
+ B = ["obesity", "diabetes", "cancer", "inflammation"]
+
+ result = pubmatrix(
+     A=A,
+     B=B,
+     database="pubmed",
+     daterange=[2010, 2024],  # optional date filter
+     outfile="results",
+     export_format="csv",  # saves results_result.csv with PubMed hyperlinks
+ )
+
+ print(result)
+
+ plot_pubmatrix_heatmap(
+     result,
+     title="WNT Genes × Disease",
+     filename="heatmap.png",  # saves to file instead of displaying
+ )
+ ```
+
+ Run it with:
+
+ ```bash
+ uv run python my_analysis.py
+ ```
+
+ ### Loading terms from a file
+
+ Create `terms.txt`:
+
+ ```
+ WNT1
+ WNT2
+ CTNNB1
+ #
+ obesity
+ diabetes
+ cancer
+ ```
+
+ ```python
+ from pubmatrix import pubmatrix_from_file
+
+ result = pubmatrix_from_file("terms.txt")
+ print(result)
+ ```
+
+ Save the snippet as a script (for example `my_analysis.py`) and run it with:
+
+ ```bash
+ uv run python my_analysis.py
+ ```
+
+ ---
+
+ ## API reference
+
+ ### `pubmatrix(A, B, ...)`
+
+ Query PubMed or PMC and return a `pandas.DataFrame` (rows = B, cols = A).
+
+ ```python
+ pubmatrix(
+     A,  # list of str — column terms
+     B,  # list of str — row terms
+     api_key=None,  # NCBI API key (10 req/s vs 3 req/s default)
+     database="pubmed",  # "pubmed" or "pmc"
+     daterange=None,  # e.g. [2015, 2024]
+     outfile=None,  # base filename for export
+     export_format=None,  # None | "csv" | "ods"
+     n_tries=2,  # retries on network failure
+     n_workers=1,  # parallel workers for concurrent queries
+     timeout=30,  # HTTP request timeout in seconds
+     cache_dir=None,  # directory to cache query results on disk
+ )
+ ```
+
+ ### `pubmatrix_from_file(filepath, ...)`
+
+ Load terms from a plain-text file and run `pubmatrix()`.
+
+ File format:
+ ```
+ WNT1
+ WNT2
+ #
+ obesity
+ diabetes
+ ```
+
+ ```python
+ result = pubmatrix_from_file("terms.txt", database="pubmed")
+ ```
+
+ ### `plot_pubmatrix_heatmap(matrix, ...)`
+
+ Heatmap of overlap percentages with optional hierarchical clustering. Returns `(fig, ax)`.
+
+ ```python
+ fig, ax = plot_pubmatrix_heatmap(
+     matrix,  # DataFrame from pubmatrix()
+     title="PubMatrix Co-occurrence Heatmap",
+     cluster_rows=True,
+     cluster_cols=True,
+     show_numbers=True,
+     color_palette=None,  # list of hex colours
+     filename=None,  # save to PNG if set
+     width=10, height=8,
+     scale_font=True,
+     show=False,  # call plt.show() after plotting
+ )
+ ```
+
+ ### `pubmatrix_heatmap(matrix, title=...)`
+
+ Quick wrapper around `plot_pubmatrix_heatmap()` with all defaults. Returns `(fig, ax)`.
+
+ ---
+
+ ## Output files
+
+ When `outfile` and `export_format` are set, results are written to
+ `{outfile}_result.{extension}` (`.csv` or `.ods`). Each cell contains the
+ publication count and a hyperlink to the matching PubMed search. Row names
+ come from `B`, column names from `A`.
+
+ ODS export requires the optional `odfpy` dependency:
+
+ ```bash
+ pip install pubmatrixpython[ods]
+ ```
+
+ ---
+
+ ## NCBI API key
+
+ Without a key: 3 requests/second. With a key: 10 requests/second.
+ Get one at https://account.ncbi.nlm.nih.gov/
+
+ ```python
+ result = pubmatrix(A=A, B=B, api_key="YOUR_KEY_HERE")
+ ```
+
+ ---
+
+ ## More documentation
+
+ - [Performance notes](docs/performance.md) — rate limits, caching, concurrency
+ - [Troubleshooting](docs/troubleshooting.md) — empty results, rate limiting, slow searches
+ - [Full reference notebook](https://toledoem.github.io/pubmatrixp/) — every parameter and feature, with output
+
+ ---
+
+ ## License & citation
+
+ This project is licensed under the MIT License — see [`LICENSE.md`](LICENSE.md).
+
+ If you use PubMatrixPython in your research, please cite:
+
+ > Becker KG, Hosack DA, Dennis G Jr, Lempicki RA, Bright TJ, Cheadle C, Engel J.
+ > *PubMatrix: a tool for multiplex literature mining.*
+ > BMC Bioinformatics. 2003 Dec 10;4:61. https://doi.org/10.1186/1471-2105-4-61
+
+ **Developers:**
+ - Tyler Laird (Author, original PubMatrixR)
+ - Enrique Toledo (Author, maintainer)
@@ -0,0 +1,267 @@
+ # PubMatrixPython v0.2
+
+ <img src="https://toledoem.github.io/img/LogoPubmatrixP.png" align="right" width="150"/>
+
+ ![Python](https://img.shields.io/badge/python-3.10%2B-blue)
+ ![Tests](https://img.shields.io/badge/tests-60%20passed-brightgreen)
+ ![License](https://img.shields.io/badge/license-MIT-green)
+
+ Python port of the [PubMatrixR](https://github.com/ToledoEM/PubMatrixR-v2) R package.
+
+ For every pair of search terms `(A, B)`, it counts how many PubMed or PMC publications mention both. Good for mapping relationships between genes, diseases, and pathways across the literature.
+
+ Based on: Becker et al. (2003) *PubMatrix: a tool for multiplex literature mining*. BMC Bioinformatics 4:61. https://doi.org/10.1186/1471-2105-4-61
+
+ ---
+
+ ## Key features
+
+ - **Pairwise literature search** — automatically searches every combination of terms from two lists
+ - **PubMed or PMC** — query MEDLINE abstracts or PMC full text via NCBI E-utilities
+ - **Heatmap visualisation** — overlap-percentage heatmaps with optional hierarchical clustering
+ - **Export to CSV or ODS** — results include clickable hyperlinks to the matching PubMed search
+ - **Date filtering** — restrict searches to a publication year range
+ - **Flexible input** — pass term lists directly, or load them from a text file
+ - **Concurrency** — `n_workers` for parallel queries, respecting NCBI rate limits
+ - **Disk caching** — `cache_dir` persists query results between runs
+ - **Progress tracking** — built-in progress bar for long searches
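
To make the pairwise search in the feature list concrete, here is a minimal sketch of how every combination of two term lists could be turned into NCBI ESearch count requests. The helper names `pair_queries` and `esearch_url` and the `(a) AND (b)` query shape are assumptions for illustration; the README does not show how the package builds its queries. The ESearch endpoint and its `db`, `term`, and `rettype=count` parameters are part of the public E-utilities API.

```python
from itertools import product
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pair_queries(A, B):
    """Yield (a, b, term) for every combination: len(A) * len(B) in total."""
    for b, a in product(B, A):
        # Hypothetical query shape: both terms must appear in the record.
        yield a, b, f"({a}) AND ({b})"


def esearch_url(term, database="pubmed"):
    """Build an ESearch URL that asks only for the hit count."""
    params = {"db": database, "term": term, "rettype": "count"}
    return f"{ESEARCH}?{urlencode(params)}"


for a, b, term in pair_queries(["WNT1", "WNT2"], ["obesity", "diabetes"]):
    print(a, b, esearch_url(term))
```

Two lists of sizes m and n produce m × n requests, which is why the rate limit and caching features matter for larger term sets.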
+
+ ## Use cases
+
+ - **Gene–disease association studies** — explore literature connections between genes and diseases
+ - **Pathway analysis** — investigate co-occurrence of genes within or across biological pathways
+ - **Drug–target research** — analyse relationships between compounds and potential targets
+ - **Systematic literature reviews** — quantify research coverage across multiple topics
+ - **Knowledge gap identification** — find under-researched combinations of terms
+ - **Bibliometric analysis** — measure research activity in a domain over time
+
+ ---
+
+ ## Setup
+
+ Requires [uv](https://docs.astral.sh/uv/). Install it with:
+
+ ```bash
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+ ```
+
+ Clone and install dependencies:
+
+ ```bash
+ git clone <repo-url>
+ cd PubMatrixPython
+ uv sync --all-groups
+ ```
+
+ ---
+
+ ## Running the notebooks
+
+ All `uv` commands must be run from the **project root** (`PubMatrixPython/`), where `pyproject.toml` lives.
+
+ ```bash
+ cd /path/to/PubMatrixPython
+ uv run jupyter lab
+ ```
+
+ Then open any notebook from the `notebooks/` folder in the browser.
+
+ | Notebook | What it covers |
+ |----------|---------------|
+ | `01_pubmatrix.ipynb` | Basic queries, date filtering, PMC database, file input, CSV export, heatmap visualisation |
+ | `02_example_wnt.ipynb` | Full worked example: WNT genes × obesity genes |
+
+ ---
+
+ ## Quick start (script or REPL)
+
+ ### Interactive REPL
+
+ ```bash
+ uv run python
+ ```
+
+ ```python
+ from pubmatrix import pubmatrix, plot_pubmatrix_heatmap
+
+ A = ["WNT1", "WNT2", "CTNNB1"]
+ B = ["obesity", "diabetes", "cancer"]
+
+ result = pubmatrix(A=A, B=B)
+ print(result)
+
+ # show=True calls plt.show(); since 0.2.0 the figure is not displayed automatically
+ plot_pubmatrix_heatmap(result, title="WNT × Disease", show=True)
+ ```
+
+ ### Running a script
+
+ Create a file `my_analysis.py`:
+
+ ```python
+ from pubmatrix import pubmatrix, plot_pubmatrix_heatmap
+
+ A = ["WNT1", "WNT2", "WNT3A", "WNT5A", "CTNNB1"]
+ B = ["obesity", "diabetes", "cancer", "inflammation"]
+
+ result = pubmatrix(
+     A=A,
+     B=B,
+     database="pubmed",
+     daterange=[2010, 2024],  # optional date filter
+     outfile="results",
+     export_format="csv",  # saves results_result.csv with PubMed hyperlinks
+ )
+
+ print(result)
+
+ plot_pubmatrix_heatmap(
+     result,
+     title="WNT Genes × Disease",
+     filename="heatmap.png",  # saves to file instead of displaying
+ )
+ ```
+
+ Run it with:
+
+ ```bash
+ uv run python my_analysis.py
+ ```
+
+ ### Loading terms from a file
+
+ Create `terms.txt`:
+
+ ```
+ WNT1
+ WNT2
+ CTNNB1
+ #
+ obesity
+ diabetes
+ cancer
+ ```
+
+ ```python
+ from pubmatrix import pubmatrix_from_file
+
+ result = pubmatrix_from_file("terms.txt")
+ print(result)
+ ```
+
+ Save the snippet as a script (for example `my_analysis.py`) and run it with:
+
+ ```bash
+ uv run python my_analysis.py
+ ```
+
+ ---
+
+ ## API reference
+
+ ### `pubmatrix(A, B, ...)`
+
+ Query PubMed or PMC and return a `pandas.DataFrame` (rows = B, cols = A).
+
+ ```python
+ pubmatrix(
+     A,  # list of str — column terms
+     B,  # list of str — row terms
+     api_key=None,  # NCBI API key (10 req/s vs 3 req/s default)
+     database="pubmed",  # "pubmed" or "pmc"
+     daterange=None,  # e.g. [2015, 2024]
+     outfile=None,  # base filename for export
+     export_format=None,  # None | "csv" | "ods"
+     n_tries=2,  # retries on network failure
+     n_workers=1,  # parallel workers for concurrent queries
+     timeout=30,  # HTTP request timeout in seconds
+     cache_dir=None,  # directory to cache query results on disk
+ )
+ ```
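
The `cache_dir` parameter is described only as caching query results on disk. One way such a cache could work is sketched below: each (database, term) pair is hashed to a file name, and the network is consulted only on a miss. The function names, the JSON layout, and the `fetch` callback are all hypothetical; the package's actual cache format is not documented in this README.

```python
import hashlib
import json
from pathlib import Path


def cache_path(cache_dir, database, term):
    """Map a query to a stable file name inside cache_dir (hypothetical layout)."""
    key = hashlib.sha256(f"{database}\n{term}".encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{key}.json"


def cached_count(cache_dir, database, term, fetch):
    """Return the stored count if present, otherwise call fetch() and store it."""
    path = cache_path(cache_dir, database, term)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["count"]
    count = fetch(database, term)  # fetch is a caller-supplied function
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"database": database, "term": term, "count": count}
    path.write_text(json.dumps(payload), encoding="utf-8")
    return count
```

Keying on both the database and the term keeps PubMed and PMC counts for the same query from overwriting each other.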
+
+ ### `pubmatrix_from_file(filepath, ...)`
+
+ Load terms from a plain-text file and run `pubmatrix()`.
+
+ File format:
+ ```
+ WNT1
+ WNT2
+ #
+ obesity
+ diabetes
+ ```
+
+ ```python
+ result = pubmatrix_from_file("terms.txt", database="pubmed")
+ ```
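
The file format above can be read with a few lines of plain Python. This is a minimal sketch, not the package's parser: `split_terms` is a hypothetical helper, and it assumes that the terms before the `#` line form the first list and the terms after it form the second, which the README implies but does not state.

```python
def split_terms(text, separator="#"):
    """Split file contents into two term lists around a separator line."""
    groups = [[]]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line == separator:
            groups.append([])  # start the next list
        else:
            groups[-1].append(line)
    if len(groups) != 2 or not all(groups):
        raise ValueError("expected two non-empty term lists separated by a '#' line")
    return groups[0], groups[1]


first, second = split_terms("WNT1\nWNT2\n#\nobesity\ndiabetes\n")
print(first, second)
```

Raising on a missing or repeated separator surfaces a malformed file early instead of sending an unintended query set to NCBI.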
+
+ ### `plot_pubmatrix_heatmap(matrix, ...)`
+
+ Heatmap of overlap percentages with optional hierarchical clustering. Returns `(fig, ax)`.
+
+ ```python
+ fig, ax = plot_pubmatrix_heatmap(
+     matrix,  # DataFrame from pubmatrix()
+     title="PubMatrix Co-occurrence Heatmap",
+     cluster_rows=True,
+     cluster_cols=True,
+     show_numbers=True,
+     color_palette=None,  # list of hex colours
+     filename=None,  # save to PNG if set
+     width=10, height=8,
+     scale_font=True,
+     show=False,  # call plt.show() after plotting
+ )
+ ```
+
+ ### `pubmatrix_heatmap(matrix, title=...)`
+
+ Quick wrapper around `plot_pubmatrix_heatmap()` with all defaults. Returns `(fig, ax)`.
+
+ ---
+
+ ## Output files
+
+ When `outfile` and `export_format` are set, results are written to
+ `{outfile}_result.{extension}` (`.csv` or `.ods`). Each cell contains the
+ publication count and a hyperlink to the matching PubMed search. Row names
+ come from `B`, column names from `A`.
+
+ ODS export requires the optional `odfpy` dependency:
+
+ ```bash
+ pip install pubmatrixpython[ods]
+ ```
+
+ ---
+
+ ## NCBI API key
+
+ Without a key: 3 requests/second. With a key: 10 requests/second.
+ Get one at https://account.ncbi.nlm.nih.gov/
+
+ ```python
+ result = pubmatrix(A=A, B=B, api_key="YOUR_KEY_HERE")
+ ```
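
The two rate limits translate into a minimum spacing between requests: one third of a second without a key, one tenth with one. The sketch below shows one simple way to enforce that spacing. `min_interval` and `Throttle` are hypothetical names, and this is not necessarily how the package paces its own requests.

```python
import time


def min_interval(api_key=None):
    """Seconds to leave between requests: 10 req/s with a key, 3 req/s without."""
    return 1.0 / (10 if api_key else 3)


class Throttle:
    """Single-threaded pacing helper; clock and sleep are injectable for testing."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next = 0.0  # earliest time the next request may start

    def wait(self):
        now = self._clock()
        if now < self._next:
            self._sleep(self._next - now)
            now = self._next
        self._next = now + self._interval


throttle = Throttle(min_interval(api_key=None))
for _ in range(3):
    throttle.wait()  # a request would be sent here
```

With several workers the `wait` method would also need a lock so that threads do not read and update the schedule at the same time.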
+
+ ---
+
+ ## More documentation
+
+ - [Performance notes](docs/performance.md) — rate limits, caching, concurrency
+ - [Troubleshooting](docs/troubleshooting.md) — empty results, rate limiting, slow searches
+ - [Full reference notebook](https://toledoem.github.io/pubmatrixp/) — every parameter and feature, with output
+
+ ---
+
+ ## License & citation
+
+ This project is licensed under the MIT License — see [`LICENSE.md`](LICENSE.md).
+
+ If you use PubMatrixPython in your research, please cite:
+
+ > Becker KG, Hosack DA, Dennis G Jr, Lempicki RA, Bright TJ, Cheadle C, Engel J.
+ > *PubMatrix: a tool for multiplex literature mining.*
+ > BMC Bioinformatics. 2003 Dec 10;4:61. https://doi.org/10.1186/1471-2105-4-61
+
+ **Developers:**
+ - Tyler Laird (Author, original PubMatrixR)
+ - Enrique Toledo (Author, maintainer)