PyPI - pmc-toolkit - Versions diffs - 0.2.0__tar.gz → 0.4.0__tar.gz - Mend

pmc-toolkit 0.2.0 (tar.gz) → 0.4.0 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42)

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/.gitignore RENAMED

@@ -8,3 +8,6 @@ wheels/
 # Virtual environments
 .venv
+.DS_Store

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pmc-toolkit
-Version: 0.2.0
+Version: 0.4.0
 Summary: Python toolkit and CLI for exploring, downloading, and parsing PMC article data.
 Project-URL: Homepage, https://github.com/JakaKokosar/pmc-toolkit
 Project-URL: Repository, https://github.com/JakaKokosar/pmc-toolkit
@@ -42,6 +42,7 @@ The project currently supports:
 - listing available versions for a PMCID
 - validating PMC identifiers before making requests
 - retrieving metadata for a PMC identifier, defaulting to the latest version for a base PMCID
+- converting PMID, DOI, PMCID, or MID values to PMC identifiers when PMC has a matching record
 - listing every object for a resolved article version, using the local cache when available
 - downloading files for an article version into a local cache (optional `--ext`
   filters apply only to `fetch`, not to `files`; `--ext` accepts either a
@@ -94,6 +95,13 @@ Fetch metadata for a specific version:
 uv run pmc-toolkit metadata PMC11370360.1
 ```
+Convert a PMID or DOI to a PMCID when PMC has a matching record:
+```bash
+uv run pmc-toolkit idconv 23193287
+uv run pmc-toolkit idconv 10.1093/nar/gks1195 --idtype doi
+```
 List every object key for an article version (including media and supplements).
 For unversioned IDs, the CLI resolves the latest version from S3 first; once the
 version is known, the cached object-key manifest is reused when present. There
@@ -134,22 +142,17 @@ uv run pmc-toolkit fetch PMC11370360.1 --cache-dir ./data
 PMC_TOOLKIT_CACHE=./data uv run pmc-toolkit fetch PMC11370360.1
 ```
-Convert a cached XML file into extracted JSON. Run `fetch --ext xml` first if
-the XML is not already in the cache. The first conversion parses XML once,
+Parse a cached XML file into extracted JSON. Run `fetch --ext xml` first if
+the XML is not already in the cache. The first parse reads XML once,
 writes `<cache-root>/<PMCid.N>/.pmc-extracted-article.json`, and prints the
-extracted JSON; later conversions for the same article version read that JSON
+extracted JSON; later parses for the same article version read that JSON
 cache unless `--force` is passed.
 ```bash
 uv run pmc-toolkit fetch PMC11370360.1 --ext xml
-uv run pmc-toolkit convert-xml PMC11370360.1
+uv run pmc-toolkit parse PMC11370360.1
 ```
-List the extracted JSON top-level keys:
-```bash
-uv run pmc-toolkit convert-xml --list-keys PMC11370360.1
-```
 `article_info.publication_date` currently uses the first publication date found
 in the XML. If downstream consumers need to distinguish date types such as
@@ -176,7 +179,7 @@ Each resolved article version has a directory `<cache_root>/<PMCid.N>/` containi
 - **`<PMCid.N>.json`** — cached metadata (from S3 `metadata/<PMCid.N>.json`), written after a successful read.
 - **`.pmc-object-keys.json`** — JSON array of S3 object keys under that article’s prefix, written after `list_objects_v2` (or read on cache hit). If this file is missing or not a list of strings, listing or fetch may refetch from S3 or raise `ValueError` for an invalid manifest.
-- **`.pmc-extracted-article.json`** — full extracted JSON produced from the cached XML by `pmc-toolkit convert-xml`; reused by later conversions for the same article version.
+- **`.pmc-extracted-article.json`** — full extracted JSON produced from the cached XML by `pmc-toolkit parse`; reused by later parses for the same article version.
 **Cache root selection:** `pmc-toolkit metadata` and `pmc-toolkit files` (and the matching `storage_api` functions) always use the default OS user cache from [`platformdirs`](https://github.com/tox-dev/platformdirs). Only `pmc-toolkit fetch` and `fetch_files(..., cache_dir=...)` accept `--cache-dir` or the `PMC_TOOLKIT_CACHE` environment variable.

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/README.md RENAMED

@@ -10,6 +10,7 @@ The project currently supports:
 - listing available versions for a PMCID
 - validating PMC identifiers before making requests
 - retrieving metadata for a PMC identifier, defaulting to the latest version for a base PMCID
+- converting PMID, DOI, PMCID, or MID values to PMC identifiers when PMC has a matching record
 - listing every object for a resolved article version, using the local cache when available
 - downloading files for an article version into a local cache (optional `--ext`
   filters apply only to `fetch`, not to `files`; `--ext` accepts either a
@@ -62,6 +63,13 @@ Fetch metadata for a specific version:
 uv run pmc-toolkit metadata PMC11370360.1
 ```
+Convert a PMID or DOI to a PMCID when PMC has a matching record:
+```bash
+uv run pmc-toolkit idconv 23193287
+uv run pmc-toolkit idconv 10.1093/nar/gks1195 --idtype doi
+```
 List every object key for an article version (including media and supplements).
 For unversioned IDs, the CLI resolves the latest version from S3 first; once the
 version is known, the cached object-key manifest is reused when present. There
@@ -102,22 +110,17 @@ uv run pmc-toolkit fetch PMC11370360.1 --cache-dir ./data
 PMC_TOOLKIT_CACHE=./data uv run pmc-toolkit fetch PMC11370360.1
 ```
-Convert a cached XML file into extracted JSON. Run `fetch --ext xml` first if
-the XML is not already in the cache. The first conversion parses XML once,
+Parse a cached XML file into extracted JSON. Run `fetch --ext xml` first if
+the XML is not already in the cache. The first parse reads XML once,
 writes `<cache-root>/<PMCid.N>/.pmc-extracted-article.json`, and prints the
-extracted JSON; later conversions for the same article version read that JSON
+extracted JSON; later parses for the same article version read that JSON
 cache unless `--force` is passed.
 ```bash
 uv run pmc-toolkit fetch PMC11370360.1 --ext xml
-uv run pmc-toolkit convert-xml PMC11370360.1
+uv run pmc-toolkit parse PMC11370360.1
 ```
-List the extracted JSON top-level keys:
-```bash
-uv run pmc-toolkit convert-xml --list-keys PMC11370360.1
-```
 `article_info.publication_date` currently uses the first publication date found
 in the XML. If downstream consumers need to distinguish date types such as
@@ -144,7 +147,7 @@ Each resolved article version has a directory `<cache_root>/<PMCid.N>/` containi
 - **`<PMCid.N>.json`** — cached metadata (from S3 `metadata/<PMCid.N>.json`), written after a successful read.
 - **`.pmc-object-keys.json`** — JSON array of S3 object keys under that article’s prefix, written after `list_objects_v2` (or read on cache hit). If this file is missing or not a list of strings, listing or fetch may refetch from S3 or raise `ValueError` for an invalid manifest.
-- **`.pmc-extracted-article.json`** — full extracted JSON produced from the cached XML by `pmc-toolkit convert-xml`; reused by later conversions for the same article version.
+- **`.pmc-extracted-article.json`** — full extracted JSON produced from the cached XML by `pmc-toolkit parse`; reused by later parses for the same article version.
 **Cache root selection:** `pmc-toolkit metadata` and `pmc-toolkit files` (and the matching `storage_api` functions) always use the default OS user cache from [`platformdirs`](https://github.com/tox-dev/platformdirs). Only `pmc-toolkit fetch` and `fetch_files(..., cache_dir=...)` accept `--cache-dir` or the `PMC_TOOLKIT_CACHE` environment variable.
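
The cache layout described in the README hunk above can be sketched in Python. This is a minimal illustration that only joins paths; `article_cache_paths` is a hypothetical helper, not part of pmc-toolkit, and the cache root is passed in explicitly because the default `platformdirs` location is not shown in this diff.

```python
from pathlib import Path


def article_cache_paths(cache_root: str, versioned_pmcid: str) -> dict[str, Path]:
    """Return the three per-version cache files named in the README.

    Hypothetical helper: it mirrors the documented layout
    <cache_root>/<PMCid.N>/ and does not call pmc-toolkit itself.
    """
    article_dir = Path(cache_root) / versioned_pmcid
    return {
        "metadata": article_dir / f"{versioned_pmcid}.json",
        "object_keys": article_dir / ".pmc-object-keys.json",
        "extracted_article": article_dir / ".pmc-extracted-article.json",
    }


# Report which cache files already exist for one article version.
for name, path in article_cache_paths("./data", "PMC11370360.1").items():
    print(name, path.as_posix(), "exists" if path.exists() else "missing")
```

A check like this may help decide whether `fetch --ext xml` or `parse` still needs to run for a given cache root.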

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/RELEASING.md RENAMED

@@ -32,6 +32,10 @@ deployment if the environment requires it. Smoke test:
 uv run --with "pmc-toolkit==${version}" --no-project -- pmc-toolkit --help
 ```
+From **v0.2.0**, the PyPI wheel exposes only the `pmc-toolkit` console script
+(the previous `pmc` script was removed so the binary matches the distribution
+name).
 Optionally draft a GitHub Release from the tag for user-facing notes.
 ## Troubleshooting

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "pmc-toolkit"
-version = "0.2.0"
+version = "0.4.0"
 description = "Python toolkit and CLI for exploring, downloading, and parsing PMC article data."
 readme = "README.md"
 requires-python = ">=3.11"

pmc_toolkit-0.4.0/skills/pmc-toolkit/SKILL.md ADDED

@@ -0,0 +1,71 @@
+---
+name: pmc-toolkit
+description: Work with PubMed Central Open Access articles by PMCID using PMC Toolkit. Use for version resolution, metadata and file inventory, downloads, parsed article evidence extraction, authors and contributor analysis, figures, tables, references, supplements, declarations, knowledge extraction, and report-style summaries. Can convert PMID/DOI to PMCID only to continue PMC full-text workflows; not for keyword literature search or non-PMC article analysis.
+---
+# PMC Toolkit
+Use this skill to retrieve, download, parse, and cite PMC Open Access article data with PMC Toolkit. Select commands by the data needed to complete the task, not by surface wording in the request.
+## Operating Rules
+- Run published-tool commands as `uvx pmc-toolkit ...`.
+- Do not add installation guidance. If `uv` or `uvx` is unavailable, report that PMC Toolkit needs it and stop.
+- Live lookups, listings, and downloads require network access to the PMC Open Access S3 dataset unless the needed data is already cached.
+- Resolve bundled helper paths relative to this skill directory, for example `<SKILL_DIR>/scripts/content-outline.jq`.
+- Do not load, dump, grep, or search raw XML or PDF files directly for article-content tasks. Fetch XML only as parser input, then use `parse` output and bundled JSON helpers for evidence extraction.
+- When piping PMC Toolkit JSON through `jq`, use `jq -c` unless pretty-printed JSON is explicitly needed for human inspection. Prefer compact JSON to avoid bloating context.
+- For simple extraction requests where the command output is already the user-facing answer, do not repeat large text in the final response. Return only a brief label or status plus the exact command output when needed; for long abstracts, tables, or lists, prefer telling the user the command printed the requested value instead of restating it.
+- Do not invent missing declarations, author notes, figures, tables, or references. Report the missing parsed field or empty list.
+## Task Router
+Choose the smallest route that answers the request. Prefer a direct CLI route when one command output is enough; use a workflow route when the task needs multi-step retrieval, synthesis, or evidence reporting. If using a direct CLI route, make the shell command do the final formatting so the assistant response can stay minimal.
+### Direct CLI Routes
+- PMCID availability and version resolution: read [references/cli-versions.md](references/cli-versions.md) for `versions` command details.
+- PMID/DOI to PMCID conversion for continuing PMC workflows: read [references/cli-idconv.md](references/cli-idconv.md) for `idconv` command details.
+- DOI, title, journal, license, OA flags, retraction flags, and S3 URL fields: read [references/cli-metadata.md](references/cli-metadata.md) for `metadata` command details.
+- File inventory or downloads for XML, PDF, text, figures, media, or supplements: read [references/cli-files.md](references/cli-files.md) for `files` and `fetch` command details.
+- Parsed article JSON, body sections, supporting info, parsed authors, figures, tables, references, or helper `jq` usage: run `fetch --ext xml` as needed, then `parse`. Read [references/cli-parse.md](references/cli-parse.md) for `parse` output shape and helper-script usage.
+- If `parse` is needed but the right parsed field is not obvious, read [references/data-locator.md](references/data-locator.md) before retrieving detailed evidence.
+- When a task asks about a referenced/cited article and the parsed reference has `identifiers.pmid` or `identifiers.doi` but no `identifiers.pmcid`, use `idconv` before stopping.
+### Workflow Routes
+- Article-content questions, passage finding, section analysis, support for claims, declarations, supplements, or evidence-grounded answers: read [references/workflow-evidence-extraction.md](references/workflow-evidence-extraction.md).
+- Author, affiliation, ORCID, equal-contribution, corresponding-author, contributor, or author-note tasks: read [references/workflow-author-contributor-analysis.md](references/workflow-author-contributor-analysis.md).
+- Knowledge extraction, claim extraction, evidence matrices, mechanism summaries, or structured fact extraction: read [references/workflow-knowledge-extraction.md](references/workflow-knowledge-extraction.md).
+- Figure interpretation, graphics lookup, panel questions, or visual inspection: read [references/workflow-figure-image-analysis.md](references/workflow-figure-image-analysis.md). Fetch image files only when visual inspection is required.
+- Report-style summaries, author reports, evidence reports, or deliverables combining several data types: read [references/workflow-reporting.md](references/workflow-reporting.md), then load only the source-specific workflow references required by the report.
+## Bundled Resources
+Open references only after choosing a route above, when command-specific details, output shapes, or workflow details are needed.
+- [references/data-locator.md](references/data-locator.md) - task-to-parsed-JSON-field routing.
+- [references/workflow-evidence-extraction.md](references/workflow-evidence-extraction.md) - detailed evidence retrieval loop and answer contract.
+- [references/workflow-author-contributor-analysis.md](references/workflow-author-contributor-analysis.md) - author, affiliation, correspondence, and contributor-note workflow.
+- [references/workflow-knowledge-extraction.md](references/workflow-knowledge-extraction.md) - generic structured extraction workflow.
+- [references/workflow-figure-image-analysis.md](references/workflow-figure-image-analysis.md) - figure caption, linked text, and visual-inspection workflow.
+- [references/workflow-reporting.md](references/workflow-reporting.md) - report assembly pattern for mixed data tasks.
+- [references/cli-versions.md](references/cli-versions.md) - `versions` examples and version selection.
+- [references/cli-idconv.md](references/cli-idconv.md) - `idconv` examples and missing-PMC handling.
+- [references/cli-metadata.md](references/cli-metadata.md) - `metadata` examples and field overview.
+- [references/cli-files.md](references/cli-files.md) - `files` and `fetch` output shapes.
+- [references/cli-parse.md](references/cli-parse.md) - `parse` output shape and helper-script usage.
+- [references/cli-parse-figures.md](references/cli-parse-figures.md) - figure lookup shape and citation context.
+- [references/cli-parse-tables.md](references/cli-parse-tables.md) - table lookup shape and citation context.
+- [references/cli-parse-references.md](references/cli-parse-references.md) - reference lookup shape and citation context.
+- `<SKILL_DIR>/scripts/content-outline.jq` - paper outline first step for evidence extraction.
+- `<SKILL_DIR>/scripts/query-id.jq` - lookup sections, paragraphs, figures, tables, and references by `source_id`.
+- `<SKILL_DIR>/scripts/reverse-lookup-xref.jq` - find paragraphs that cite a figure, table, or reference.
+## Gotchas
+- `files` has no extension filter. Use `fetch --ext` for filtered downloads.
+- Parsed reference records often omit PMCID even when they include PMID or DOI. Use `idconv` to test whether PMC has a matching article before saying PMC full text is unavailable.
+- `parse` needs cached XML; run `fetch <PMCID.N> --ext xml` first when XML is absent.
+- `fetch` and `parse` use the default PMC Toolkit cache unless `--cache-dir` or `PMC_TOOLKIT_CACHE` is provided. Use custom cache paths only when there is a concrete reason.
+- Cache paths are per article version. Keep the same cache root across `fetch` and `parse` if a custom cache is used.

pmc_toolkit-0.4.0/skills/pmc-toolkit/agents/openai.yaml ADDED

@@ -0,0 +1,4 @@
+interface:
+  display_name: "PMC Toolkit"
+  short_description: "PMC OA evidence extraction"
+  default_prompt: "Use $pmc-toolkit to retrieve PMC paper metadata, files, parsed evidence, authors, figures, tables, or report-ready findings for this PMCID. Convert PMID/DOI to PMCID only when needed to continue PMC full-text work."

pmc_toolkit-0.4.0/skills/pmc-toolkit/references/cli-files.md ADDED

@@ -0,0 +1,59 @@
+# CLI: `files` And `fetch`
+Use `files <PMCID.N>` to list every S3 object key under the article version prefix. Use `fetch <PMCID.N>` to download all or selected object extensions into the local cache.
+## `files`
+`files` has no extension filter.
+```bash
+uvx pmc-toolkit files PMCxxxx.N
+```
+Example output:
+```json
+{
+  "versioned_pmcid": "PMCxxxx.N",
+  "keys": [
+    "PMCxxxx.N/PMCxxxx.N.xml",
+    "PMCxxxx.N/PMCxxxx.N.pdf",
+    "PMCxxxx.N/media-1.jpg"
+  ]
+}
+```
+Use `files` for inventory, not for local paths.
+## `fetch`
+Use `fetch` when a file must exist locally for parsing, inspection, or user delivery.
+Example for downloading all files listed in the `files` output above:
+```bash
+uvx pmc-toolkit fetch PMCxxxx.N --ext xml,pdf,jpg
+```
+Example output:
+```json
+{
+  "versioned_pmcid": "PMCxxxx.N",
+  "cache_dir": "/cache/root/PMCxxxx.N",
+  "files": [
+    {
+      "key": "PMCxxxx.N/PMCxxxx.N.xml",
+      "local_path": "/cache/root/PMCxxxx.N/PMCxxxx.N.xml",
+      "action": "downloaded"
+    }, ...
+  ]
+}
+```
+Use `local_path` if you need access to the downloaded files.
+## Cache Notes
+- `metadata` and `files` use the default OS user cache for metadata/manifests.
+- `fetch` and `parse` can use `--cache-dir` or `PMC_TOOLKIT_CACHE`; keep the same cache root across both commands.
+- Cache paths are per article version under `<cache_root>/<PMCID.N>/`.
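
The `fetch` output shape shown in the hunk above can be consumed with a few lines of Python. This is a sketch under the assumption that every entry in `files` carries `key` and `local_path` as in the example; `local_paths_by_extension` is a hypothetical helper, not a pmc-toolkit function.

```python
import json
from pathlib import PurePosixPath


def local_paths_by_extension(fetch_output: dict) -> dict[str, list[str]]:
    """Group `local_path` values by the lowercase extension of each S3 key."""
    grouped: dict[str, list[str]] = {}
    for entry in fetch_output.get("files", []):
        # S3 keys use forward slashes, so PurePosixPath is safe on any OS.
        ext = PurePosixPath(entry["key"]).suffix.lstrip(".").lower()
        grouped.setdefault(ext, []).append(entry["local_path"])
    return grouped


# Sample data copied from the documented output shape (placeholders kept as-is).
sample = json.loads("""
{
  "versioned_pmcid": "PMCxxxx.N",
  "cache_dir": "/cache/root/PMCxxxx.N",
  "files": [
    {
      "key": "PMCxxxx.N/PMCxxxx.N.xml",
      "local_path": "/cache/root/PMCxxxx.N/PMCxxxx.N.xml",
      "action": "downloaded"
    }
  ]
}
""")
print(local_paths_by_extension(sample))
```

In practice the input would come from the JSON printed by `pmc-toolkit fetch`, for example read from standard input with `json.load(sys.stdin)`.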

pmc_toolkit-0.4.0/skills/pmc-toolkit/references/cli-idconv.md ADDED

@@ -0,0 +1,25 @@
+# CLI: `idconv`
+Use `idconv <ID...>` to convert PMID, DOI, PMCID, or MID values to PMC identifiers through the PMC ID Converter API. Use this only as a bridge back into PMC full-text workflows, for example when a parsed reference has `identifiers.pmid` or `identifiers.doi` but no `identifiers.pmcid`.
+```bash
+uvx pmc-toolkit idconv 23193287
+uvx pmc-toolkit idconv 10.1093/nar/gks1195 --idtype doi
+```
+Example output shape:
+```json
+[
+  {
+    "requested-id": "23193287",
+    "pmid": 23193287,
+    "pmcid": "PMC3531190",
+    "doi": "10.1093/nar/gks1195"
+  }
+]
+```
+When a record has `status: "error"` or no `pmcid`, stop the PMC full-text workflow for that referenced article and report that no matching PMC record was found. Do not summarize from the title alone.
+After a record returns `pmcid`, run `versions <PMCID>` and continue with `metadata`, `fetch`, and `parse`.
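
The stop-or-continue rule in the hunk above (a record with `status: "error"` or no `pmcid` ends the full-text workflow) can be sketched as a small Python filter. It assumes only the record fields shown in the example output; `split_idconv_records` is a hypothetical helper name.

```python
def split_idconv_records(records: list[dict]) -> tuple[dict[str, str], list[str]]:
    """Separate idconv records into resolved PMCIDs and unresolved requests.

    A record counts as unresolved when it has status "error" or no `pmcid`,
    which mirrors the rule stated in cli-idconv.md.
    """
    resolved: dict[str, str] = {}
    unresolved: list[str] = []
    for record in records:
        requested = record.get("requested-id")
        pmcid = record.get("pmcid")
        if record.get("status") == "error" or not pmcid:
            unresolved.append(requested)
        else:
            resolved[requested] = pmcid
    return resolved, unresolved


records = [
    {"requested-id": "23193287", "pmid": 23193287,
     "pmcid": "PMC3531190", "doi": "10.1093/nar/gks1195"},
    # Illustrative failure record; the exact error fields are an assumption.
    {"requested-id": "10.0000/missing", "status": "error"},
]
resolved, unresolved = split_idconv_records(records)
print("continue with:", resolved)
print("report as not in PMC:", unresolved)
```

Each resolved PMCID would then go to `versions <PMCID>`, as the reference file describes.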

pmc_toolkit-0.4.0/skills/pmc-toolkit/references/cli-metadata.md ADDED

@@ -0,0 +1,34 @@
+# CLI: `metadata`
+Use `metadata <PMCID.N>` to fetch bibliographic fields, Open Access flags, and S3 URL fields (for example `xml_url`, `pdf_url`, `media_urls`, `text_url`), plus `pmid` and `doi`.
+Example:
+```bash
+uvx pmc-toolkit metadata PMCxxxx.N
+```
+Example output:
+```json
+{
+  "pmcid": "PMCxxxx",
+  "version": N,
+  "pmid": 12345678,
+  "doi": "10.1234/example.doi",
+  "mid": null,
+  "title": "Example article title",
+  "citation": "Journal Name",
+  "is_pmc_openaccess": true/false,
+  "is_manuscript": true/false,
+  "is_historical_ocr": true/false,
+  "is_retracted": true/false,
+  "license_code": "license code",
+  "xml_url": "s3://pmc-oa-opendata/PMCxxxx.N/PMCxxxx.N.xml?md5=<hex>",
+  "pdf_url": "s3://pmc-oa-opendata/PMCxxxx.N/PMCxxxx.N.pdf?md5=<hex>",
+  "media_urls": [
+    "s3://pmc-oa-opendata/PMCxxxx.N/media-1.jpg?md5=<hex>",
+    ...
+  ],
+  "text_url": "s3://pmc-oa-opendata/PMCxxxx.N/PMCxxxx.N.txt?md5=<hex>"
+}
+```
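
The URL fields in the metadata example above follow the pattern `s3://<bucket>/<key>?md5=<hex>`. A minimal Python sketch for splitting such a value is shown below; it relies only on the standard library, `split_s3_url` is a hypothetical helper, and treating `md5` as a checksum of the object is an assumption based on the parameter name.

```python
from urllib.parse import parse_qs, urlsplit


def split_s3_url(url: str) -> dict:
    """Split an s3:// URL with an optional ?md5= query into its parts."""
    parts = urlsplit(url)
    md5_values = parse_qs(parts.query).get("md5", [])
    return {
        "bucket": parts.netloc,
        "key": parts.path.lstrip("/"),
        # None when the URL carries no md5 query parameter.
        "md5": md5_values[0] if md5_values else None,
    }


print(split_s3_url("s3://pmc-oa-opendata/PMC11370360.1/PMC11370360.1.xml?md5=0123abcd"))
```

The `key` part has the same form as the object keys listed by `files`, so the two outputs can be compared directly.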

pmc_toolkit-0.4.0/skills/pmc-toolkit/references/cli-parse.md ADDED

@@ -0,0 +1,116 @@
+# CLI: `parse`
+Use `parse <PMCID.N>` after cached full-text XML exists. Run `fetch <PMCID.N> --ext xml` first when `<cache>/<PMCID.N>/<PMCID.N>.xml` is missing. The first run parses XML once, writes `<cache-root>/<PMCID.N>/.pmc-extracted-article.json`, and prints the extracted JSON; later runs reuse that cache unless `--force` is passed.
+Add `--cache-dir` or `PMC_TOOLKIT_CACHE` when the XML was fetched outside the default OS user cache. Keep the same cache root across `fetch` and `parse`.
+```bash
+uvx pmc-toolkit fetch PMCxxxx.N --ext xml
+uvx pmc-toolkit parse PMCxxxx.N
+```
+The `parse` command prints the extracted article JSON (`result.data`), not the `fetch` wrapper with `versioned_pmcid`, `cache_dir`, and downloaded `files`.
+## Extracted JSON top-level keys
+- **article_info** - `journal`, `article_ids`, `title`, `publication_date`, `article_type`, `license`, `keywords`, `authors[]`, `abstract`, `funding_grants[]`
+- **content** - `paragraphs[]` and `sections[]`; items include `source_id`, `section_id`, `title`, `text`, `reference_ids`, `figure_ids`, and `table_ids`
+- **references** - `references[]` with `source_id`, `label`, `text`, `publication_type`, `identifiers`, `article_title`, `source`, `year`, `volume`, `issue`, and `pages`
+- **figures** - `figures[]` with `source_id`, `label`, `caption`, and `graphics`
+- **tables** - `tables[]` with `source_id`, `label`, `caption`, `rows`, and `footnotes`
+- **supporting_info** - `acknowledgements`, `competing_interests`, `data_availability`, `supplementary_media`, `author_notes`, `related_articles`, and `custom_metadata`
+## Narrow retrieval with jq
+**Start here.** Full `parse` output is large. Pipe it through jq and load only the slice you need.
+### Content outline (default first step)
+`scripts/content-outline.jq` returns a nested section tree: article title plus `section_id` and `title` for each section.
+```bash
+uvx pmc-toolkit parse PMCxxxx.N | jq -c -f <SKILL_DIR>/scripts/content-outline.jq
+```
+Example output:
+```json
+{
+  "title": "article title",
+  "sections": [
+    {
+      "section_id": "S1",
+      "title": "section title"
+    },
+    {
+      "section_id": "S2",
+      "title": "section title",
+      "sections": [
+        {
+          "section_id": "S3",
+          "title": "sub-section title"
+        }
+      ]
+    }
+  ]
+}
+```
+Use this to pick relevant sections (based on their titles) before loading detailed information.
+The `section_id` values in the outline are XML source IDs (`S1`, `S2`, ...). Use them with `<SKILL_DIR>/scripts/query-id.jq` to fetch detailed section data.
+### Drill down by ID
+`scripts/query-id.jq` returns the first object whose `source_id` matches. After the content outline, pass a chosen ID:
+**Section** - paragraph text and xref links for that section:
+```bash
+uvx pmc-toolkit parse PMCxxxx.N | jq -c --arg id "S3" -f <SKILL_DIR>/scripts/query-id.jq
+```
+Example output:
+```json
+{
+  "source_id": "S3",
+  "section_id": "2.1",
+  "title": "sub-section title",
+  "paragraphs": [
+    {
+      "source_id": "P9",
+      "text": "paragraph text",
+      "reference_ids": ["R1", "R18"],
+      "figure_ids": ["F1", "F5"],
+      "table_ids": ["T1"]
+    }
+  ],
+  "sections": []
+}
+```
+Some sections are containers only. In the outline, `S2` (Results) has child sections but no paragraphs of its own - the text lives in `S3`, `S4`, etc.
+Query those leaf `S*` IDs (sections with no nested `sections` in the outline), not the parent, to load only the subsection you need.
+**Figure, table, or reference** - same script, different ID prefix:
+```bash
+uvx pmc-toolkit parse PMCxxxx.N | jq -c --arg id "F1" -f <SKILL_DIR>/scripts/query-id.jq
+uvx pmc-toolkit parse PMCxxxx.N | jq -c --arg id "R1" -f <SKILL_DIR>/scripts/query-id.jq
+uvx pmc-toolkit parse PMCxxxx.N | jq -c --arg id "T1" -f <SKILL_DIR>/scripts/query-id.jq
+```
+Use paragraph `reference_ids`, `figure_ids`, and `table_ids` to fetch linked entries with `scripts/query-id.jq`. Output shapes:
+- [cli-parse-references.md](cli-parse-references.md) - `R*` lookup
+- [cli-parse-figures.md](cli-parse-figures.md) - `F*` lookup
+- [cli-parse-tables.md](cli-parse-tables.md) - `T*` lookup
+### Reverse lookup by xref
+`query-id.jq` resolves an ID to its object. `reverse-lookup-xref.jq` finds every paragraph that cites a given reference, figure, or table. Pass `--arg xref` as `references`, `figures`, or `tables`:
+```bash
+uvx pmc-toolkit parse PMCxxxx.N | jq -c --arg xref references --arg id "R1" -f <SKILL_DIR>/scripts/reverse-lookup-xref.jq
+uvx pmc-toolkit parse PMCxxxx.N | jq -c --arg xref figures --arg id "F1" -f <SKILL_DIR>/scripts/reverse-lookup-xref.jq
+uvx pmc-toolkit parse PMCxxxx.N | jq -c --arg xref tables --arg id "T1" -f <SKILL_DIR>/scripts/reverse-lookup-xref.jq
+```
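
The bundled `reverse-lookup-xref.jq` script is not included in this diff, so its implementation is unknown. The Python sketch below illustrates the same idea against the documented `parse` shape only: it assumes `content` holds `paragraphs[]` and nested `sections[]`, and that each paragraph lists `reference_ids`, `figure_ids`, and `table_ids`. The function names are hypothetical.

```python
from typing import Iterator

# Maps the documented --arg xref values to the paragraph field they refer to.
XREF_FIELDS = {
    "references": "reference_ids",
    "figures": "figure_ids",
    "tables": "table_ids",
}


def iter_paragraphs(container: dict) -> Iterator[dict]:
    """Yield paragraphs of a container, then of its nested sections, depth-first."""
    for paragraph in container.get("paragraphs") or []:
        yield paragraph
    for section in container.get("sections") or []:
        yield from iter_paragraphs(section)


def paragraphs_citing(article: dict, xref: str, target_id: str) -> list[dict]:
    """Return every paragraph whose xref list contains target_id."""
    field = XREF_FIELDS[xref]
    content = article.get("content") or {}
    return [p for p in iter_paragraphs(content) if target_id in (p.get(field) or [])]


article = {
    "content": {
        "paragraphs": [],
        "sections": [
            {"source_id": "S2", "paragraphs": [], "sections": [
                {"source_id": "S3", "sections": [], "paragraphs": [
                    {"source_id": "P9", "text": "paragraph text",
                     "reference_ids": ["R1", "R18"],
                     "figure_ids": ["F1", "F5"], "table_ids": ["T1"]},
                ]},
            ]},
        ],
    }
}
print([p["source_id"] for p in paragraphs_citing(article, "figures", "F1")])
```

The recursion also covers container-only sections such as `S2`, whose text lives in their child sections.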

pmc_toolkit-0.4.0/skills/pmc-toolkit/references/cli-versions.md ADDED

@@ -0,0 +1,35 @@
+# CLI: `versions`
+Use `versions <PMCID>` to list every published versioned PMCID string (`PMCxxxx.1`, `PMCxxxx.2`, ...) for a **base** PMCID only. `versions` rejects versioned IDs.
+```bash
+uvx pmc-toolkit versions PMCxxxx
+```
+Example output shape:
+```json
+{
+  "pmcid": "PMCxxxx",
+  "versions": [
+    "PMCxxxx.1",
+    "PMCxxxx.2"
+  ]
+}
+```
+If `.versions` is empty, stop for that PMCID and report that no PMC Open Access version was found. Do not continue to `metadata`, `files`, `fetch`, or `parse` for that PMCID.
+## Pick the latest `<PMCID.N>`
+```bash
+uvx pmc-toolkit versions PMCxxxx | jq -c -r '.versions[-1]'
+```
+## Pick a non-latest version
+Select an element of `.versions` by index (for example `.versions[0]` for the first published version).
+## Next steps
+After you have `<PMCID.N>`, continue with `metadata`, `files`, `fetch`, and `parse` as described in the main skill.

pmc_toolkit-0.4.0/skills/pmc-toolkit/references/data-locator.md ADDED

@@ -0,0 +1,42 @@
+# Parsed Data Locator
+Use this file after choosing `parse` when the right parsed JSON field is not obvious. It maps task intent to the lowest-cost parsed field or helper command. For command selection before parsing, use the router in `SKILL.md` and the CLI references.
+## Parsed JSON Routing
+Use `uvx pmc-toolkit parse <PMCID.N> | jq -c '<filter>'` for compact retrieval.
+| Task | First parsed source | Notes |
+| --- | --- | --- |
+| Article identity from XML | `.article_info` | Use when XML-derived identity is needed. For DOI, title, journal, license, OA flags, and S3 URLs alone, prefer `metadata`. |
+| Abstract | `.article_info.abstract` | Use before body sections for high-level study orientation. |
+| Body section discovery | `<SKILL_DIR>/scripts/content-outline.jq` | Always inspect outline before loading body text. |
+| Body passages | `query-id.jq` on selected `content.sections[].source_id` | Prefer leaf sections when parent sections only group subsections. |
+| Standalone body paragraphs | `.content.paragraphs[]?` | Some XML has top-level body paragraphs outside sections. |
+| Authors and affiliations | `.article_info.authors[]` | Authors include resolved affiliation text when available. |
+| ORCID | `.article_info.authors[].orcid` | Report absent ORCIDs as absent, not unknown. |
+| Equal contribution, author notes, correspondence | `.supporting_info.author_notes` | Use with `.article_info.authors[]`; do not infer equal contribution from author order alone. |
+| Funding | `.article_info.funding_grants[]`, then `.supporting_info` | Some articles encode funding in article metadata, some in acknowledgements. |
+| Acknowledgements | `.supporting_info.acknowledgements[]` | Cite paragraph `source_id` when available. |
+| Competing interests | `.supporting_info.competing_interests[]` | Preserve exact statement and report absence if empty. |
+| Data availability | `.supporting_info.data_availability[]` | Preserve accessions, repository names, URLs, and restrictions. |
+| Supplementary media | `.supporting_info.supplementary_media[]` | Use `files` only when local download or object-key inventory is needed. |
+| Related articles | `.supporting_info.related_articles[]` | Useful for preprint to published article links. |
+| Custom PMC/JATS metadata | `.supporting_info.custom_metadata` | Use for PMC properties that are not normal article fields. |
+| Figures | `.figures[]` | Start with label, caption, graphics. Use linked paragraphs before fetching images unless visual inspection is requested. |
+| Tables | `.tables[]` | Contains XML-extracted rows and footnotes only. Do not assume PDF-only tables are available. |
+| References | `.references[]` | Use identifiers and labels. Reverse lookup paragraphs that cite a reference for context. |
+| Figure, table, or reference citation context | `reverse-lookup-xref.jq` | Pass `--arg xref figures`, `tables`, or `references`. |
+## Retrieval Shortcuts
+- Author summary:
+  `uvx pmc-toolkit parse <PMCID.N> | jq -c '{title: .article_info.title, authors: .article_info.authors, author_notes: .supporting_info.author_notes}'`
+- Declarations:
+  `uvx pmc-toolkit parse <PMCID.N> | jq -c '.supporting_info | {acknowledgements, competing_interests, data_availability, author_notes}'`
+- Figure inventory:
+  `uvx pmc-toolkit parse <PMCID.N> | jq -c '.figures[] | {source_id, label, caption, graphics}'`
+- Table inventory:
+  `uvx pmc-toolkit parse <PMCID.N> | jq -c '.tables[] | {source_id, label, caption, rows, footnotes}'`
+- Reference inventory:
+  `uvx pmc-toolkit parse <PMCID.N> | jq -c '.references[] | {source_id, label, article_title, source, year, identifiers}'`
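
The routing table above asks for absent declarations to be reported rather than invented. A minimal Python sketch of that check over the parsed JSON follows; it assumes the four `supporting_info` fields used in the "Declarations" shortcut, and `declaration_status` is a hypothetical helper.

```python
DECLARATION_FIELDS = (
    "acknowledgements",
    "competing_interests",
    "data_availability",
    "author_notes",
)


def declaration_status(article: dict) -> dict[str, str]:
    """Label each declaration field as present or absent.

    A missing key, null, or empty list all count as absent.
    """
    info = article.get("supporting_info") or {}
    return {
        field: "present" if info.get(field) else "absent"
        for field in DECLARATION_FIELDS
    }


article = {
    "supporting_info": {
        "acknowledgements": [{"source_id": "P40", "text": "acknowledgement text"}],
        "competing_interests": [],
    }
}
print(declaration_status(article))
```

The result can be reported alongside the exact statements for the fields marked present.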

pmc_toolkit-0.4.0/skills/pmc-toolkit/references/workflow-author-contributor-analysis.md ADDED

@@ -0,0 +1,48 @@
+# Workflow: Author And Contributor Analysis
+Use this workflow for author lists, affiliations, ORCIDs, corresponding authors, equal-contribution notes, contributor notes, author-focused reports, and author-related declarations.
+## Source Priority
+1. Use `metadata` for PMCID version, title, DOI, citation, OA flags, and retraction status.
+2. Use `parse` for author names, resolved affiliations, ORCIDs, and author notes.
+3. Use `supporting_info.author_notes` for equal contribution, correspondence, and other notes.
+4. Use `supporting_info.acknowledgements`, `competing_interests`, and `data_availability` only when the author task asks for declarations or report context.
+## Retrieval
+Resolve the version and fetch XML:
+```bash
+uvx pmc-toolkit metadata <PMCID.N>
+uvx pmc-toolkit fetch <PMCID.N> --ext xml
+uvx pmc-toolkit parse <PMCID.N> | jq -c '{title: .article_info.title, authors: .article_info.authors, author_notes: .supporting_info.author_notes}'
+```
+If declarations or report context are requested:
+```bash
+uvx pmc-toolkit parse <PMCID.N> | jq -c '.supporting_info | {acknowledgements, competing_interests, data_availability, author_notes}'
+```
+## Interpretation Rules
+- Preserve author order from `.article_info.authors[]`.
+- Treat missing `orcid` fields as absent ORCIDs. Do not infer ORCIDs.
+- Treat missing affiliation text as absent affiliation data. Do not invent institutional names.
+- Identify equal contribution only from `author_notes`, not from author order or symbols unless the note explains the symbol.
+- Identify corresponding authors only from correspondence entries or explicit author notes.
+- If author notes mention symbols but the parsed author list does not connect symbols to names, report the limitation instead of forcing a mapping.
+## Output Patterns
+For a compact author answer, include:
+- Selected `<PMCID.N>`.
+- Article title and DOI when available.
+- Ordered author list.
+- Affiliation and ORCID fields when requested or relevant.
+- Author-note evidence with `supporting_info.author_notes` and item `source_id` when available.
+- Clear gaps for missing ORCIDs, affiliations, equal-contribution notes, or correspondence.
+For an author report, use [workflow-reporting.md](workflow-reporting.md) and include an evidence table for author-note claims.
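
Reviewer sketch, not part of the package: the interpretation rules above (preserve author order, treat missing `orcid` or affiliation data as absent rather than inferring it) could be applied to the parsed JSON roughly as follows. The `name` and `affiliations` keys and the helper name `summarize_authors` are assumptions made for illustration; only `.article_info.authors[]` and `orcid` are named in the workflow text.

```python
from typing import Any, Dict, List


def summarize_authors(parsed: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List authors in document order and record absent fields as gaps."""
    authors = (parsed.get("article_info") or {}).get("authors") or []
    summary = []
    for position, author in enumerate(authors, start=1):
        gaps = []
        if not author.get("orcid"):
            gaps.append("orcid absent")  # report the gap; never infer an ORCID
        if not author.get("affiliations"):  # "affiliations" is an assumed key
            gaps.append("affiliation absent")
        summary.append(
            {
                "position": position,  # keeps the order of .article_info.authors[]
                "name": author.get("name"),  # "name" is an assumed key
                "orcid": author.get("orcid"),
                "gaps": gaps,
            }
        )
    return summary
```

The input would be the JSON printed by `uvx pmc-toolkit parse <PMCID.N>`, loaded with `json.loads`.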

pmc_toolkit-0.4.0/skills/pmc-toolkit/references/workflow-evidence-extraction.md ADDED

@@ -0,0 +1,47 @@
+# Workflow: Evidence Extraction
+Use this workflow for article-content questions, evidence-grounded answers, section analysis, declarations, supplements, and claims that must be tied to PMC article evidence.
+## Retrieval Loop
+1. Resolve the PMCID to a pinned `<PMCID.N>`.
+2. Fetch XML if needed:
+   `uvx pmc-toolkit fetch <PMCID.N> --ext xml`
+3. Inspect the outline first:
+   `uvx pmc-toolkit parse <PMCID.N> | jq -c -f <SKILL_DIR>/scripts/content-outline.jq`
+4. Use [data-locator.md](data-locator.md) to decide whether the answer lives in `article_info`, `content`, `supporting_info`, `figures`, `tables`, or `references`.
+5. State the next retrieval plan before loading detailed evidence when the task needs multiple evidence targets.
+6. Retrieve narrow evidence:
+   `uvx pmc-toolkit parse <PMCID.N> | jq -c --arg id "<SOURCE_ID>" -f <SKILL_DIR>/scripts/query-id.jq`
+7. For linked support, retrieve cited objects or reverse lookup citation context:
+   `uvx pmc-toolkit parse <PMCID.N> | jq -c --arg xref references --arg id "<REFERENCE_ID>" -f <SKILL_DIR>/scripts/reverse-lookup-xref.jq`
+8. If the user asks about the full text, abstract, authors, figures, tables, or evidence inside a referenced/cited article, inspect that reference's `identifiers`. If it has `pmcid`, continue with that PMCID. If it has `pmid` or `doi` but no `pmcid`, read [cli-idconv.md](cli-idconv.md) and run `idconv` before stopping.
+9. Decide after each retrieval whether the evidence is sufficient. If not, choose the next source and repeat.
+10. Stop when the answer is sufficiently supported or when the parsed JSON lacks the needed field. Report gaps explicitly.
+## Evidence Selection
+- Use article title, abstract, and outline to orient.
+- Prefer sections whose titles match the task. Query leaf sections rather than broad parent sections when possible.
+- For claims about results, methods, or discussion, use body paragraphs, not only the abstract.
+- For declarations, use `supporting_info` first.
+- For figure, table, or reference claims, inspect the object and linked paragraph context.
+- For author/contributor claims, use the author workflow.
+## Answer Requirements
+Include in the final answer:
+- Base PMCID and selected `<PMCID.N>`.
+- Each claim with a human-readable locator.
+- Stable `source_id` when useful for traceability.
+- Short evidence summary or short quote from retrieved parsed JSON.
+- Any gap, conflict, or mismatch.
+Use these locators:
+- Body text: `section_id` and section title, plus paragraph `source_id` when needed.
+- Figures: figure `label`.
+- Tables: table `label`.
+- References: reference `label`.
+- Supporting info: supporting-info category plus item `source_id` when available.
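
Reviewer sketch, not part of the package: step 8 of the retrieval loop above describes a three-way decision on a reference's `identifiers`. A minimal version of that decision is shown below, assuming `identifiers` is a mapping with optional `pmcid`, `pmid`, and `doi` keys. The function name `next_step_for_reference` is hypothetical; the `idconv` command and its `--idtype` option are taken from `cli.py` in this diff.

```python
from typing import Any, Dict, List, Optional, Tuple


def next_step_for_reference(
    reference: Dict[str, Any],
) -> Tuple[str, Optional[List[str]]]:
    """Decide how to follow a cited article, per step 8 of the retrieval loop."""
    identifiers = reference.get("identifiers") or {}  # assumed to be a mapping
    pmcid = identifiers.get("pmcid")
    if pmcid:
        # A PMCID is already known, so continue with the normal workflow.
        return "continue", ["uvx", "pmc-toolkit", "metadata", pmcid]
    for idtype in ("pmid", "doi"):
        value = identifiers.get(idtype)
        if value:
            # No PMCID yet: ask the ID converter before giving up.
            return "idconv", ["uvx", "pmc-toolkit", "idconv", value, "--idtype", idtype]
    # Nothing convertible: report the gap instead of guessing.
    return "stop", None
```

The returned argument list can be passed to `subprocess.run` as it stands, with no shell quoting step.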

pmc_toolkit-0.4.0/skills/pmc-toolkit/references/workflow-figure-image-analysis.md ADDED

@@ -0,0 +1,33 @@
+# Workflow: Figure And Image Analysis
+Use this workflow for figure captions, figure-linked claims, panel interpretation, graphics files, and visual inspection requests.
+## Caption And Text Evidence
+1. Resolve to `<PMCID.N>`.
+2. Fetch and parse XML.
+3. List figures:
+   `uvx pmc-toolkit parse <PMCID.N> | jq -c '.figures[] | {source_id, label, caption, graphics}'`
+4. Retrieve the selected figure by ID:
+   `uvx pmc-toolkit parse <PMCID.N> | jq -c --arg id "<FIGURE_ID>" -f <SKILL_DIR>/scripts/query-id.jq`
+5. Retrieve paragraphs that cite the figure:
+   `uvx pmc-toolkit parse <PMCID.N> | jq -c --arg xref figures --arg id "<FIGURE_ID>" -f <SKILL_DIR>/scripts/reverse-lookup-xref.jq`
+Use caption plus linked paragraphs for text-grounded figure answers.
+## Visual Inspection
+Fetch image files only when the user asks about the visual itself, a panel, an image feature, or when caption/text evidence is insufficient.
+1. Run `files <PMCID.N>` to inspect available image/media object keys.
+2. Match figure `graphics[]` values to object-key suffixes when possible.
+3. Fetch only likely image extensions:
+   `uvx pmc-toolkit fetch <PMCID.N> --ext jpg,png,tif,tiff,gif`
+4. Use the returned `local_path` for visual inspection with the available image-viewing tool.
+## Output Rules
+- Cite figure `label` and selected `<PMCID.N>`.
+- Include caption evidence and linked paragraph evidence when used.
+- Distinguish what is visible in the image from what the caption or article text states.
+- If the graphics file cannot be matched or fetched, answer from caption/text evidence and report the visual gap.
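
Reviewer sketch, not part of the package: step 2 of the visual-inspection list above asks for figure `graphics[]` values to be matched against object-key suffixes. One plausible way to do that is to compare file names with and without the extension. The object-key layout used in the example (`PMC…/name.ext`) and the helper name `match_graphics` are assumptions; the real keys returned by `files` may differ.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def match_graphics(graphics: List[str], object_keys: List[str]) -> Dict[str, List[str]]:
    """Map each graphics value to the object keys whose file name matches it."""
    matches: Dict[str, List[str]] = {}
    for graphic in graphics:
        wanted = PurePosixPath(graphic).name
        matches[graphic] = [
            key
            for key in object_keys
            # Match the full file name, or the file name without its extension.
            if PurePosixPath(key).name == wanted or PurePosixPath(key).stem == wanted
        ]
    return matches
```

An empty list for a graphics value is the "visual gap" case from the output rules: answer from caption and text evidence and say the file could not be matched.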

pmc_toolkit-0.4.0/skills/pmc-toolkit/references/workflow-knowledge-extraction.md ADDED

@@ -0,0 +1,38 @@
+# Workflow: Knowledge Extraction
+Use this workflow for broad but structured tasks such as extracting key findings, mechanisms, datasets, claims, measurements, limitations, interventions, outcomes, or article-specific knowledge graphs. Keep the workflow generic; specialize the output schema to the user's task.
+## Process
+1. Resolve to `<PMCID.N>`.
+2. Fetch XML and inspect the outline.
+3. Read the abstract only for orientation:
+   `uvx pmc-toolkit parse <PMCID.N> | jq -c '{title: .article_info.title, abstract: .article_info.abstract, keywords: .article_info.keywords}'`
+4. Choose target sections from the outline. For most research extraction tasks, inspect methods, results, discussion, limitations, and any named domain sections.
+5. Retrieve selected sections by `source_id` with `query-id.jq`.
+6. Extract candidate knowledge records from retrieved evidence only.
+7. If the task involves figures, tables, or references, retrieve those objects and their linked paragraph context.
+8. Stop when the selected evidence covers the requested schema or when additional sections are unlikely to change the answer. Report uninspected sections when they are relevant but not loaded.
+## Record Schema
+Use or adapt this schema unless the user provides another:
+- `item`: concise concept, claim, finding, method, variable, dataset, limitation, or outcome.
+- `category`: user-relevant class such as method, result, mechanism, limitation, dataset, or evidence.
+- `evidence_locator`: section title and `section_id`; figure/table/reference label if applicable.
+- `source_id`: paragraph, section, figure, table, or reference ID.
+- `evidence`: short quote or compact summary from retrieved parsed JSON.
+- `confidence`: high, medium, or low based on specificity and directness of evidence.
+- `gap`: missing context, ambiguity, or unsupported inference.
+## Rules
+- Separate article claims from your own synthesis.
+- Do not use uninspected sections as evidence.
+- Prefer direct result/method paragraphs over abstract-only evidence.
+- Keep extraction records small enough to verify. If the task is large, produce a first-pass matrix and state what remains to inspect.
+- Use [workflow-reporting.md](workflow-reporting.md) when the user asks for a polished report rather than raw records.
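
Reviewer sketch, not part of the package: the record schema listed above maps naturally onto a small data class. The class name `KnowledgeRecord` and the validation of `confidence` are illustrative choices; the field names and the three confidence levels come from the workflow text.

```python
from dataclasses import asdict, dataclass
from typing import Optional

CONFIDENCE_LEVELS = ("high", "medium", "low")


@dataclass
class KnowledgeRecord:
    item: str              # concise concept, claim, finding, method, ...
    category: str          # e.g. method, result, mechanism, limitation
    evidence_locator: str  # section title and section_id, or a figure/table label
    source_id: str         # paragraph, section, figure, table, or reference ID
    evidence: str          # short quote or summary from retrieved parsed JSON
    confidence: str        # high, medium, or low
    gap: Optional[str] = None  # missing context, ambiguity, unsupported inference

    def __post_init__(self) -> None:
        # Reject values outside the three levels named in the schema.
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"confidence must be one of {CONFIDENCE_LEVELS}")
```

`asdict(record)` gives a plain dictionary that can be serialized with `json.dumps` for a first-pass extraction matrix.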

pmc_toolkit-0.4.0/skills/pmc-toolkit/references/workflow-reporting.md ADDED

@@ -0,0 +1,32 @@
+# Workflow: Reporting
+Use this workflow when the user asks for a report, memo, evidence brief, author report, structured summary, or deliverable that combines multiple PMC Toolkit data sources.
+## Process
+1. Infer the report scope from the request. Ask a question only when the deliverable cannot be scoped safely.
+2. Resolve to `<PMCID.N>` and collect metadata for the report header.
+3. Use the router in `SKILL.md` to choose command and workflow sources. Load [data-locator.md](data-locator.md) only when parsed JSON fields are not obvious.
+4. Load only the source-specific workflow references required by the report.
+5. Retrieve evidence in small slices. Keep a scratch list of every claim with its locator.
+6. Assemble the report from retrieved evidence only.
+7. Include a limitations or gaps section when data is absent, ambiguous, or not inspected.
+## Suggested Sections
+Use these sections when they fit the task:
+- Article: PMCID, selected version, title, DOI, journal/citation, date, license/OA status.
+- Scope: what the report covers.
+- Findings: grouped by the user's task.
+- Evidence Table: claim, locator, source ID, short evidence, gap.
+- Files Or Artifacts: available XML/PDF/media/supplements when relevant.
+- Gaps: absent parsed fields, unavailable XML/files, or uninspected sections.
+## Rules
+- Do not make report sections that hide evidence gaps.
+- Do not over-fetch. A report can combine metadata, author notes, and a few body sections without loading the entire parsed article.
+- Use concise quotations only when they add auditability.
+- For author reports, use [workflow-author-contributor-analysis.md](workflow-author-contributor-analysis.md).
+- For broad extraction reports, use [workflow-knowledge-extraction.md](workflow-knowledge-extraction.md).

pmc_toolkit-0.4.0/skills/pmc-toolkit/scripts/content-outline.jq ADDED

@@ -0,0 +1,14 @@
+def drop_empty($o):
+  $o | with_entries(select(.value | if type == "array" then length > 0 else . != null end));
+def section:
+  drop_empty({
+    section_id: .source_id,
+    title: .title,
+    sections: [.sections[]? | section]
+  });
+{
+  title: .article_info.title,
+  sections: [.content.sections[]? | section]
+}
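
Reviewer sketch, not part of the package: for readers less familiar with jq, the outline filter above appears to behave like the Python below. Each section is reduced to `section_id`, `title`, and nested `sections`, and `drop_empty` removes keys whose value is `null` or an empty array. The top-level object is not pruned. The function names mirror the jq definitions; `content_outline` is a name introduced here.

```python
from typing import Any, Dict


def drop_empty(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Drop keys whose value is None or an empty list, as the jq helper does."""
    return {
        key: value
        for key, value in obj.items()
        if (len(value) > 0 if isinstance(value, list) else value is not None)
    }


def section(node: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one section to its ID, title, and recursively reduced children."""
    return drop_empty(
        {
            "section_id": node.get("source_id"),
            "title": node.get("title"),
            "sections": [section(child) for child in node.get("sections") or []],
        }
    )


def content_outline(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Build the outline; the top level is not pruned, matching the jq script."""
    content = parsed.get("content") or {}
    return {
        "title": (parsed.get("article_info") or {}).get("title"),
        "sections": [section(node) for node in content.get("sections") or []],
    }
```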

pmc_toolkit-0.4.0/skills/pmc-toolkit/scripts/query-id.jq ADDED

@@ -0,0 +1,3 @@
+..
+| objects
+| select(.source_id? == $id)
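
Reviewer sketch, not part of the package: the three-line filter above performs a recursive descent (`..`), keeps only objects, and selects those whose `source_id` equals `$id`. A Python equivalent, collecting results into a list where jq emits a stream, might look like this. The names `walk` and `query_id` are introduced here.

```python
from typing import Any, Dict, Iterator, List


def walk(value: Any) -> Iterator[Any]:
    """Yield a value and then every nested value, parents before children."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from walk(child)
    elif isinstance(value, list):
        for child in value:
            yield from walk(child)


def query_id(parsed: Any, source_id: str) -> List[Dict[str, Any]]:
    """Return every object anywhere in the tree whose source_id matches."""
    return [
        node
        for node in walk(parsed)
        if isinstance(node, dict) and node.get("source_id") == source_id
    ]
```

Because the search ignores where an object lives, the same lookup works for sections, paragraphs, figures, tables, and references, which is why the workflows can pass any `source_id` to one script.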

pmc_toolkit-0.4.0/skills/pmc-toolkit/scripts/reverse-lookup-xref.jq ADDED

@@ -0,0 +1,13 @@
+{
+  references: "reference_ids",
+  figures: "figure_ids",
+  tables: "table_ids"
+}[$xref] as $field
+| if $field == null then
+    error("xref must be references, figures, or tables")
+  else
+    [ .. | objects
+      | select(.[$field]? | index($id))
+      | {source_id, text, ($field): .[$field] }
+    ]
+  end
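
Reviewer sketch, not part of the package: the filter above maps the `$xref` argument to a field name, rejects unknown values, and collects every object whose ID list contains `$id`. The Python below is an approximation that only treats list-valued fields as matches; it does not reproduce jq's `index` behavior on strings. The names `XREF_FIELDS` and `reverse_lookup` are introduced here.

```python
from typing import Any, Dict, List

XREF_FIELDS = {
    "references": "reference_ids",
    "figures": "figure_ids",
    "tables": "table_ids",
}


def reverse_lookup(parsed: Any, xref: str, target_id: str) -> List[Dict[str, Any]]:
    """Find objects (typically paragraphs) that cite the given object ID."""
    field = XREF_FIELDS.get(xref)
    if field is None:
        raise ValueError("xref must be references, figures, or tables")

    hits: List[Dict[str, Any]] = []

    def visit(value: Any) -> None:
        if isinstance(value, dict):
            ids = value.get(field)
            if isinstance(ids, list) and target_id in ids:
                # Same projection as the jq script: source_id, text, and the ID list.
                hits.append(
                    {"source_id": value.get("source_id"), "text": value.get("text"), field: ids}
                )
            for child in value.values():
                visit(child)
        elif isinstance(value, list):
            for child in value:
                visit(child)

    visit(parsed)
    return hits
```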

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/src/pmc_toolkit/cli.py RENAMED

@@ -71,6 +71,44 @@ def metadata(
     _emit_json(result.model_dump(mode="json"))
+@app.command("idconv")
+def idconv(
+    identifiers: list[str] = typer.Argument(
+        ...,
+        help=(
+            "PMID, DOI, PMCID, or MID values to convert to PMC identifiers. "
+            "Comma-separated values are accepted."
+        ),
+    ),
+    idtype: str | None = typer.Option(
+        None,
+        "--idtype",
+        help="Optional input identifier type: pmid, doi, pmcid, or mid.",
+    ),
+    email: str | None = typer.Option(
+        None,
+        "--email",
+        envvar="NCBI_EMAIL",
+        help="Optional contact email sent to the NCBI ID Converter API.",
+    ),
+) -> None:
+    """
+    Convert PMID/DOI identifiers to PMC identifiers when available in PMC.
+    """
+    def build_result():
+        from pmc_toolkit.idconv_api import convert_to_pmcids
+        return convert_to_pmcids(
+            identifiers,
+            idtype=idtype,
+            email=email,
+        )
+    result = _run_command(build_result)
+    _emit_json(result["records"])
 @app.command("files")
 def files(
     requested_pmcid: str = typer.Argument(
@@ -140,8 +178,8 @@ def fetch(
     _emit_json(result.model_dump(mode="json"))
-@app.command("convert-xml")
-def convert_xml(
+@app.command("parse")
+def parse(
     requested_pmcid: str = typer.Argument(
         ...,
         help="PMC accession ID or version ID, e.g. PMC11370360 or PMC11370360.1",
@@ -158,22 +196,10 @@ def convert_xml(
         "-f",
         help="Recreate the extracted JSON cache from the cached XML.",
     ),
-    list_keys: bool = typer.Option(
-        False,
-        "--list-keys",
-        help="Print available extracted JSON keys and descriptions, then exit.",
-    ),
 ) -> None:
     """
-    Convert cached PMC full-text XML into cached extracted JSON.
+    Parse cached PMC full-text XML into cached extracted JSON.
     """
-    if list_keys:
-        from pmc_toolkit.xml_parse_utils import EXTRACT_OUTPUT_KEY_DESCRIPTIONS
-        typer.echo("Available extracted JSON keys:")
-        for key, description in EXTRACT_OUTPUT_KEY_DESCRIPTIONS.items():
-            typer.echo(f"- {key}: {description}")
-        return
     def build_result():
         from pmc_toolkit.xml_parse_api import ensure_extracted_article

pmc_toolkit-0.4.0/src/pmc_toolkit/idconv_api.py ADDED

@@ -0,0 +1,46 @@
+"""NCBI PMC ID Converter client."""
+from collections.abc import Sequence
+import json
+from typing import Any
+from urllib.error import HTTPError, URLError
+from urllib.parse import urlencode
+from urllib.request import Request, urlopen
+ID_CONVERTER_URL = "https://pmc.ncbi.nlm.nih.gov/tools/idconv/api/v1/articles/"
+def convert_to_pmcids(
+    identifiers: Sequence[str],
+    *,
+    idtype: str | None = None,
+    email: str | None = None,
+    timeout: float = 30.0,
+) -> dict[str, Any]:
+    params = {
+        "tool": "pmc_toolkit",
+        "format": "json",
+        "ids": ",".join(identifiers),
+    }
+    if idtype:
+        params["idtype"] = idtype
+    if email:
+        params["email"] = email
+    url = f"{ID_CONVERTER_URL}?{urlencode(params)}"
+    request = Request(url, headers={"User-Agent": "pmc-toolkit"})
+    try:
+        with urlopen(request, timeout=timeout) as response:
+            payload = json.loads(response.read().decode("utf-8"))
+    except HTTPError as exc:
+        raise RuntimeError(
+            f"ID converter request failed with HTTP {exc.code}."
+        ) from exc
+    except URLError as exc:
+        raise RuntimeError(f"ID converter request failed: {exc.reason}.") from exc
+    except json.JSONDecodeError as exc:
+        raise RuntimeError("ID converter returned invalid JSON.") from exc
+    if not isinstance(payload, dict):
+        raise RuntimeError("ID converter returned an unexpected response.")
+    return payload
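
Reviewer sketch, not part of the package: to see what request `convert_to_pmcids` would send without touching the network, the URL construction can be isolated as below. The helper name `build_idconv_url` is hypothetical; the parameter names and base URL are copied from the module above.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

ID_CONVERTER_URL = "https://pmc.ncbi.nlm.nih.gov/tools/idconv/api/v1/articles/"


def build_idconv_url(
    identifiers: Sequence[str],
    idtype: Optional[str] = None,
    email: Optional[str] = None,
) -> str:
    """Build the same query string as convert_to_pmcids, without sending it."""
    params = {
        "tool": "pmc_toolkit",
        "format": "json",
        "ids": ",".join(identifiers),  # multiple IDs travel as one comma-separated value
    }
    if idtype:
        params["idtype"] = idtype
    if email:
        params["email"] = email
    # urlencode percent-encodes "/" and "," so DOIs are safe in the query string.
    return f"{ID_CONVERTER_URL}?{urlencode(params)}"


print(build_idconv_url(["10.1093/nar/gks1195"], idtype="doi"))
```

Note that the CLI emits only `result["records"]`, so any other top-level fields in the converter's response are not shown to the user.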

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/src/pmc_toolkit/xml_parse_api.py RENAMED

@@ -105,11 +105,7 @@ def _ensure_extracted_article_cache(
     from pmc_toolkit.xml_parse_utils import extract_article_data, load_xml
     root = load_xml(paths.xml_path)
-    parsed = _group_extracted_article(
-        extract_article_data(root),
-        versioned_pmcid=paths.versioned_pmcid,
-        xml_path=paths.xml_path,
-    )
+    parsed = _group_extracted_article(extract_article_data(root))
     storage_cache.write_cached_extracted_article(
         paths.cache_root,
         paths.versioned_pmcid,
@@ -120,15 +116,8 @@ def _ensure_extracted_article_cache(
 def _group_extracted_article(
     raw_data: dict[str, Any],
-    *,
-    versioned_pmcid: str,
-    xml_path: Path,
 ) -> dict[str, Any]:
     return {
-        "_meta": {
-            "versioned_pmcid": versioned_pmcid,
-            "xml_path": str(xml_path),
-        },
         "article_info": _article_info(raw_data),
         "content": raw_data["content"],
         "references": raw_data["references"],

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/src/pmc_toolkit/xml_parse_utils.py RENAMED

@@ -13,26 +13,6 @@ XMLParser = etree.XMLParser(
     remove_blank_text=True,
 )
 REFERENCE_SEPARATOR_PATTERN = re.compile(r"^[\s,;]+$")
-EXTRACT_OUTPUT_KEY_DESCRIPTIONS = {
-    "article_info": (
-        "article_info.journal, article_ids, title, publication_date, article_type, "
-        "license, keywords, authors[], abstract, and funding_grants[]"
-    ),
-    "content": (
-        "content.paragraphs[] and content.sections[]; objects include source_id, "
-        "section_id, title, text, reference_ids, figure_ids, and table_ids"
-    ),
-    "references": (
-        "references[] items with source_id, label, text, publication_type, "
-        "identifiers, article_title, source, year, volume, issue, and pages"
-    ),
-    "figures": "figures[] items with source_id, label, caption, and graphics",
-    "tables": "tables[] items with source_id, label, caption, rows, and footnotes",
-    "supporting_info": (
-        "acknowledgements, competing_interests, data_availability, "
-        "supplementary_media, author_notes, related_articles, and custom_metadata"
-    ),
-}
 def load_xml(path: Path) -> Any:

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/tests/test_xml_parse_api.py RENAMED

@@ -73,7 +73,7 @@ def test_ensure_extracted_article_reads_xml_and_writes_extracted_cache(
     assert result.versioned_pmcid == "PMC11370360.1"
     assert result.xml_path == str(article_dir / "PMC11370360.1.xml")
-    assert result.data["_meta"]["versioned_pmcid"] == "PMC11370360.1"
+    assert "_meta" not in result.data
     assert result.data["article_info"]["journal"]["name"] == "bioRxiv"
     assert result.data["article_info"]["journal"]["issn"] == "2692-8205"
     assert result.data["article_info"]["article_ids"] == {

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/uv.lock RENAMED

@@ -251,7 +251,7 @@ wheels = [
 [[package]]
 name = "pmc-toolkit"
-version = "0.2.0"
+version = "0.4.0"
 source = { editable = "." }
 dependencies = [
     { name = "boto3" },

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/.github/workflows/ci.yml RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/.github/workflows/release.yml RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/.python-version RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/AGENTS.md RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/LICENSE RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/src/pmc_toolkit/__init__.py RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/src/pmc_toolkit/cache.py RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/src/pmc_toolkit/models.py RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/src/pmc_toolkit/storage_api.py RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/src/pmc_toolkit/storage_utils.py RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/src/pmc_toolkit/validators.py RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/tests/test_cli.py RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/tests/test_storage.py RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/tests/test_validators.py RENAMED

File without changes

{pmc_toolkit-0.2.0 → pmc_toolkit-0.4.0}/tests/test_xml_parse_utils.py RENAMED

File without changes
