galaxy-tool-refactor-rules 0.2.0__tar.gz

@@ -0,0 +1,24 @@
1
+ # Machine-local scratch, never committed: the cloned corpus (.local/corpus,
2
+ # seeded from corpus_sources.json) and external source clones for inspection
3
+ # (e.g. .local/galaxy-src = a clone of galaxyproject/galaxy used to verify
4
+ # Galaxy-internal behaviour locally).
5
+ # No trailing slash: `.local/` matches only a directory, so an accidental
6
+ # symlink at this path was committable (it happened once; removed in this branch).
7
+ .local
8
+ .venv/
9
+ __pycache__/
10
+ *.pyc
11
+ *.pyo
12
+ .pytest_cache/
13
+ .mypy_cache/
14
+ .ruff_cache/
15
+ dist/
16
+ *.egg-info/
17
+
18
+ # Local-only draft of the GCC poster abstract.
19
+ gcc2026-abstract.txt
20
+
21
+ # Claude Code session scratch (the tracked .claude/ settings + skills stay; this
22
+ # runtime lock for scheduled wakeups is machine-local).
23
+ .claude/scheduled_tasks.lock
24
+ .claude/worktrees/
@@ -0,0 +1,67 @@
1
+ # CLAUDE.md
2
+
3
+ Guidance for Claude Code working in this repository.
4
+
5
+ ## Project
6
+
7
+ `galaxy-tool-refactor-rules` is the **rule-metadata** tier (tier 0.5) of the
8
+ Galaxy tool refactoring framework. It is a tiny, dependency-free package that the
9
+ codemod (tier 2), fmt (tier 3), check (tier 3.5), and registry (tier 3.6) tiers
10
+ depend on directly; the app tier (4) reaches it transitively through the facade:
11
+
12
+ | Tier | Layer | Package |
13
+ |---|---|---|
14
+ | 0.5 | **rule metadata** | `galaxy-tool-refactor-rules` *(this repo)* |
15
+ | 1 | parsing & validation | `galaxy-tool-source` |
16
+ | 2 | structure | `galaxy-tool-codemod` |
17
+ | 3 | formatting | `galaxy-tool-fmt` |
18
+ | 3.5 | advisory checks | `galaxy-tool-lint` |
19
+ | 3.6 | rule registry / rulesets | `galaxy-tool-refactor-registry` |
20
+ | 4 | app / CLI | `galaxy-tool-refactor-cli` |
21
+
22
+ It owns the shared rule **metadata + diagnostics** vocabulary — three things:
23
+
24
+ - `meta.py` — the `RuleMeta` frozen dataclass (the GTR rule descriptor).
25
+ - `violation.py` — the `Violation` frozen dataclass (the per-occurrence detect
26
+ result: `code`, `int` `sourceline`, `str` `xpath`, `message`). Pure primitives,
27
+ no lxml — the read-only counterpart to tier-2 `Change` / tier-3 `Edit`.
28
+ - `reference.py` — `render_rule_reference_table`, a pure markdown renderer for
29
+ a rule glossary.
30
+
31
+ **Invariant: stay dependency-free.** Do not import lxml, tier 1/2/3, or any
32
+ runtime dependency here. The whole point of this package is to be a shared
33
+ primitive that does not drag the tiers into each other — adding a dependency
34
+ would defeat that and risk re-coupling fmt to codemod (see
35
+ `galaxy-tool-fmt/docs/decisions.md` §D10). The behavioral rule/codemod base
36
+ classes belong in their own tiers, not here.
37
+
38
+ ## Coding standards
39
+
40
+ Hand-written code follows **dignified-python**, vendored at the workspace root
41
+ `.claude/skills/dignified-python/`:
42
+
43
+ - Absolute imports, no re-exports, no `__all__`.
44
+ - Keyword-only arguments after the first.
45
+ - No import-time side effects.
46
+ - Type hints throughout; mypy strict.
47
+
48
+ `optimized-python` (`.claude/skills/optimized-python/`) is a secondary
49
+ reference; **dignified-python governs on conflict**.
50
+
51
+ ## Commands
52
+
53
+ Run these from the **workspace root** (`galaxy-tool-refactor/`):
54
+
55
+ - `uv sync` — install dependencies
56
+ - `uv run --package galaxy-tool-refactor-rules pytest galaxy-tool-refactor-rules/tests/` — run tests
57
+ - `uv run ruff check galaxy-tool-refactor-rules/src galaxy-tool-refactor-rules/tests` — lint
58
+ - `uv run mypy --config-file galaxy-tool-refactor-rules/pyproject.toml galaxy-tool-refactor-rules/src` — type-check (strict)
59
+
60
+ ## Useful workspace references
61
+
62
+ - `galaxy-tool-fmt/src/galaxy_tool_fmt/rules.py` — the tier-3 `Rule` ABC
63
+ that carries `meta: ClassVar[RuleMeta]`.
64
+ - `galaxy-tool-codemod/src/galaxy_tool_codemod/catalog.py` — the tier-2
65
+ catalog of GTR-coded codemods.
66
+ - `scripts/corpus_check.py` — the stat-page generator that renders the
67
+ cross-tier rule reference table via `render_rule_reference_table`.
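Illustration (not part of the package contents): a minimal Python sketch of how the coding-standard bullets in CLAUDE.md might look in practice. `Finding` and `format_finding` are hypothetical names invented for this example, and this is only one reading of the bullets; dignified-python itself may phrase or enforce the conventions differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record, used only to give the function something to format."""

    code: str
    message: str


def format_finding(finding: Finding, *, prefix: str = "") -> str:
    """The first argument is positional; everything after ``*`` is keyword-only."""
    return f"{prefix}{finding.code} {finding.message}"


def main() -> None:
    finding = Finding(code="GTR001", message="canonical indentation")
    print(format_finding(finding, prefix="! "))


if __name__ == "__main__":
    # Work happens only when run as a script, never as an import side effect.
    main()
```

Calling `format_finding(finding, "! ")` raises `TypeError`, which is the point of the keyword-only marker: call sites stay readable as parameters are added.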
@@ -0,0 +1,55 @@
1
+ Metadata-Version: 2.4
2
+ Name: galaxy-tool-refactor-rules
3
+ Version: 0.2.0
4
+ Summary: Shared rule-metadata vocabulary for the galaxy-tool-refactor tiers (tier 0.5).
5
+ Author: Richard Burhans
6
+ License-Expression: MIT
7
+ Requires-Python: >=3.10
8
+ Description-Content-Type: text/markdown
9
+
10
+ # galaxy-tool-refactor-rules
11
+
12
+ Shared **rule-metadata vocabulary** for the Galaxy tool refactoring framework —
13
+ a small, dependency-free "tier 0.5" package consumed by the higher tiers:
14
+
15
+ | Tier | Layer | Package |
16
+ |---|---|---|
17
+ | 0.5 | **rule metadata** | `galaxy-tool-refactor-rules` *(this package)* |
18
+ | 1 | parsing & validation | `galaxy-tool-source` |
19
+ | 2 | structure | `galaxy-tool-codemod` |
20
+ | 3 | formatting | `galaxy-tool-fmt` |
21
+ | 3.5 | advisory checks | `galaxy-tool-lint` |
22
+ | 3.6 | rule registry / rulesets | `galaxy-tool-refactor-registry` |
23
+ | 4 | app / CLI | `galaxy-tool-refactor-cli` |
24
+
25
+ It provides:
26
+
27
+ - **`galaxy_tool_refactor_rules.meta.RuleMeta`** — a frozen dataclass describing
28
+ one GTR rule (`code`, `summary`, `since`, `until`, `cite`, `order`,
29
+ `detect_only`, `applies_to`). A tier-3 formatter `Rule`, a tier-2
30
+ `CodemodCommand`, and a tier-3.5 `CheckRule` each carry a
31
+ `meta: ClassVar[RuleMeta]`, so the tiers share one registry vocabulary.
32
+ - **`galaxy_tool_refactor_rules.violation.Violation`** — a frozen dataclass for a
33
+ detect-phase finding (`code`, `sourceline`, `xpath`, `message`); the pure,
34
+ lxml-free, read-only counterpart to the mutating tier-2 `Change` / tier-3 `Edit`.
35
+ - **`galaxy_tool_refactor_rules.reference.render_rule_reference_table`** — a pure
36
+ helper that renders `(RuleMeta, tier)` pairs as a GitHub-flavored markdown
37
+ glossary table.
38
+
39
+ ## Why a separate package
40
+
41
+ The descriptor is the only thing the two tiers genuinely share — their
42
+ *behavioral* bases differ (fmt yields lxml edits; codemod walks a cursor), so
43
+ those stay in their own packages. Keeping `RuleMeta` here, with **zero runtime
44
+ dependencies**, lets both fmt and codemod depend on it without depending on each
45
+ other — preserving the tier independence documented in
46
+ `galaxy-tool-fmt/docs/decisions.md` §D10. The extraction was anticipated in
47
+ that package's §D1 ("a shared rule-engine package will be extracted only when a
48
+ second consumer materialises"); the codemod tier is that consumer.
49
+
50
+ ## Install / test
51
+
52
+ ```bash
53
+ uv sync # from the workspace root
54
+ uv run --package galaxy-tool-refactor-rules pytest galaxy-tool-refactor-rules/tests/
55
+ ```
@@ -0,0 +1,46 @@
1
+ # galaxy-tool-refactor-rules
2
+
3
+ Shared **rule-metadata vocabulary** for the Galaxy tool refactoring framework —
4
+ a small, dependency-free "tier 0.5" package consumed by the higher tiers:
5
+
6
+ | Tier | Layer | Package |
7
+ |---|---|---|
8
+ | 0.5 | **rule metadata** | `galaxy-tool-refactor-rules` *(this package)* |
9
+ | 1 | parsing & validation | `galaxy-tool-source` |
10
+ | 2 | structure | `galaxy-tool-codemod` |
11
+ | 3 | formatting | `galaxy-tool-fmt` |
12
+ | 3.5 | advisory checks | `galaxy-tool-lint` |
13
+ | 3.6 | rule registry / rulesets | `galaxy-tool-refactor-registry` |
14
+ | 4 | app / CLI | `galaxy-tool-refactor-cli` |
15
+
16
+ It provides:
17
+
18
+ - **`galaxy_tool_refactor_rules.meta.RuleMeta`** — a frozen dataclass describing
19
+ one GTR rule (`code`, `summary`, `since`, `until`, `cite`, `order`,
20
+ `detect_only`, `applies_to`). A tier-3 formatter `Rule`, a tier-2
21
+ `CodemodCommand`, and a tier-3.5 `CheckRule` each carry a
22
+ `meta: ClassVar[RuleMeta]`, so the tiers share one registry vocabulary.
23
+ - **`galaxy_tool_refactor_rules.violation.Violation`** — a frozen dataclass for a
24
+ detect-phase finding (`code`, `sourceline`, `xpath`, `message`); the pure,
25
+ lxml-free, read-only counterpart to the mutating tier-2 `Change` / tier-3 `Edit`.
26
+ - **`galaxy_tool_refactor_rules.reference.render_rule_reference_table`** — a pure
27
+ helper that renders `(RuleMeta, tier)` pairs as a GitHub-flavored markdown
28
+ glossary table.
29
+
30
+ ## Why a separate package
31
+
32
+ The descriptor is the only thing the two tiers genuinely share — their
33
+ *behavioral* bases differ (fmt yields lxml edits; codemod walks a cursor), so
34
+ those stay in their own packages. Keeping `RuleMeta` here, with **zero runtime
35
+ dependencies**, lets both fmt and codemod depend on it without depending on each
36
+ other — preserving the tier independence documented in
37
+ `galaxy-tool-fmt/docs/decisions.md` §D10. The extraction was anticipated in
38
+ that package's §D1 ("a shared rule-engine package will be extracted only when a
39
+ second consumer materialises"); the codemod tier is that consumer.
40
+
41
+ ## Install / test
42
+
43
+ ```bash
44
+ uv sync # from the workspace root
45
+ uv run --package galaxy-tool-refactor-rules pytest galaxy-tool-refactor-rules/tests/
46
+ ```
@@ -0,0 +1,152 @@
1
+ # Decisions — galaxy-tool-refactor-rules
2
+
3
+ Each entry records a decision once it lands: a date, the decision, and the
4
+ rationale. Mirrors the conventions of the sibling packages' `docs/decisions.md`.
5
+
6
+ ## D1 (2026-05-29) — Extract the shared `RuleMeta` into its own tier-0.5 package
7
+
8
+ ### Decision
9
+
10
+ The `RuleMeta` descriptor (previously private to `galaxy-tool-fmt`) and a
11
+ pure markdown rule-glossary renderer now live in a new, dependency-free package
12
+ `galaxy-tool-refactor-rules`. Both tier 3 (`galaxy-tool-fmt`) and tier 2
13
+ (`galaxy-tool-codemod`) depend on it; neither depends on the other.
14
+
15
+ Only the *metadata* moved. The behavioral bases stayed in their tiers:
16
+ `galaxy_tool_fmt.rules.Rule` (yields lxml `Edit`s) and
17
+ `galaxy_tool_codemod.codemod.CodemodCommand` (cursor-walk visitor) have
18
+ different execution contracts and are not unified.
19
+
20
+ ### Rationale
21
+
22
+ - **A documented trigger, now met.** `galaxy-tool-fmt/docs/decisions.md` §D1
23
+ §Layout said a shared rule package would be extracted "only when a second
24
+ consumer materialises." Giving the codemod tier the same metadata vocabulary
25
+ (so the GTR rule registry spans both tiers) is that second consumer.
26
+ - **Tier independence is preserved, not weakened.** The standing constraint
27
+ (fmt's library must not depend on codemod — §D10 there) is about not dragging
28
+ the *structural framework* into cosmetic-only installs. A metadata-only
29
+ package with **zero runtime dependencies** is a shared primitive like tier 1,
30
+ not the structural framework; both tiers can depend on it safely.
31
+ - **Dependency-free by design.** Keeping lxml / edits / cursor out of this
32
+ package is what makes it a safe shared dependency. The package ships a
33
+ `py.typed` marker and is type-checked under mypy strict.
34
+
35
+ ### Scope
36
+
37
+ `RuleMeta` fields at the time of this extraction were the fmt original (`code`,
38
+ `summary`, `since`, `until`, `cite`, `order`); `detect_only` was added later in
39
+ §D2. The cross-tier GTR registry at the time spanned GTR001–GTR012 (3 fmt rules,
40
+ 9 codemods); it later grew GTR013 (codemod §17) and the advisory codes
41
+ (tier 3.5, `galaxy-tool-lint`). Codes are globally unique across the tiers
42
+ (asserted by a test in `galaxy-tool-fmt`'s corpus-check suite, which can
43
+ import both tiers).
44
+
45
+ ## D2 (2026-05-30) — Add the shared `Violation` diagnostic type
46
+
47
+ ### Decision
48
+
49
+ A `Violation` frozen dataclass joins `RuleMeta` in this tier-0.5 package
50
+ (`violation.py`). It is the per-occurrence result of a rule's **detect (lint)**
51
+ phase: `code` (matching `RuleMeta.code`), `sourceline` (1-based `int`, `0` when
52
+ the element has no source position), `xpath` (`str`), and `message`. It is the
53
+ read-only counterpart to the mutating tier-2 `Change` and tier-3 `Edit`.
54
+
55
+ This lands as part of the detect/fix rule-split effort (see
56
+ `galaxy-tool-codemod/docs/decisions.md` §19; the effort, PR1–5, merged in
57
+ #15). Tier-2 `Change` projects onto a `Violation` via `Change.to_violation()`.
58
+
59
+ ### Rationale
60
+
61
+ - **One shared diagnostic vocabulary.** Both the codemod (tier 2) and formatter
62
+ (tier 3) detect phases, the future report-only `check` CLI (tier 4), and the
63
+ planned advisory check library all surface findings; a single `Violation`
64
+ type lets them do so without depending on one another — the same role
65
+ `RuleMeta` plays for the rule registry.
66
+ - **Dependency-free is preserved.** The location is a plain `int` line plus a
67
+ `str` xpath, never an lxml handle, so this package stays free of lxml / tier
68
+ 1/2/3 imports (the invariant from D1 and `galaxy-tool-fmt` §D10). Putting
69
+ `Violation` in tier 2 instead would have forced fmt to import codemod once fmt
70
+ surfaces its edits as violations — exactly the re-coupling the tier split
71
+ exists to prevent.
72
+
73
+ ### Scope
74
+
75
+ `Violation` is pure data with no methods beyond the dataclass. A capability flag
76
+ on `RuleMeta` (e.g. `detect_only`) is deferred until detect-only rules arrive
77
+ (roadmap PR4), so the flag has a first non-default user when added.
78
+
79
+ ## D3 (2026-05-30) — `RuleMeta.applies_to` (document-kind applicability)
80
+
81
+ ### Decision
82
+
83
+ `RuleMeta` gains `applies_to: frozenset[str]` (default `frozenset({"tool"})`) —
84
+ the document kinds a rule operates on, a subset of `{"tool", "macro"}`. A generic
85
+ XML rule (canonical indentation GTR001, empty-element shorthand GTR004) declares
86
+ `{"tool", "macro"}`; a tool-structural rule (blank line between `<tool>` sections
87
+ GTR003, attribute / element order, profile upgrades) keeps the default `{"tool"}`;
88
+ a future macro-library rule declares `{"macro"}`. Consumers run a rule against a
89
+ document only when the document's kind is in this set (fmt's `rules_for_kind`,
90
+ and later the registry/codemod tiers).
91
+
92
+ ### Rationale
93
+
94
+ - **One field drives fmt *and* codemods.** Phase 2 of the macro-aware effort
95
+ needs to run rules on macro-library files (`<macros>` root); rather than a
96
+ bespoke "is this rule macro-safe?" check per tier, applicability becomes rule
97
+ metadata, read uniformly wherever a document of a given kind is formatted or
98
+ codemodded.
99
+ - **Default `{"tool"}` is conservative and churn-free.** Every existing rule is
100
+ tool-structural by history, so the default leaves them unchanged; only the two
101
+ generic whitespace rules are explicitly widened. A rule runs on a macro file
102
+ only when it opts in.
103
+ - **A set, not a single `scope` enum.** "Applies to any XML" is just "both
104
+ kinds", so a `frozenset` avoids a special `"any"` value and extends cleanly if
105
+ another document kind ever appears. See `galaxy-tool-fmt/docs/decisions.md`
106
+ §D16 for the fmt-side consumption (`format_macro_document` / `rules_for_kind`).
107
+
108
+
109
+ ## D4 (2026-06-09) — `RuleMeta.rulesets` + the `Ruleset` catalog
110
+
111
+ ### Decision
112
+
113
+ Add `rulesets: frozenset[str] = frozenset()` to `RuleMeta` and a dependency-free
114
+ `rulesets.py` (a `Ruleset(name, description)` catalog + `DEFAULT_RULESET`). This is the
115
+ maintainer-facing "mark which rules belong to which set" mechanism: a rule declares its
116
+ set membership on its own meta, and the registry tier (3.6) derives `name → codes` by
117
+ grouping rules by these names (registry D15). The catalog holds the names + descriptions
118
+ because that is a property of the *set*, not of any member; names are plain strings so a
119
+ rule in any tier can declare membership without a heavier import. `order` is now used by
120
+ **both** the fmt and codemod families (each ordered independently) — its docstring was
121
+ corrected accordingly, since the canonical pipeline's order moved off the old hardcoded
122
+ `CANONICAL_CODEMODS` tuple onto `meta.order`. Staying dependency-free is preserved (only
123
+ `dataclasses`). The default empty `rulesets` means "never independently selectable" (e.g.
124
+ an upgrade-only codemod driven by `UpgradeToLatest`).
125
+
126
+ ### Reproduction
127
+
128
+ ```sh
129
+ uv run --package galaxy-tool-refactor-rules pytest \
130
+ galaxy-tool-refactor-rules/tests/test_rulesets.py galaxy-tool-refactor-rules/tests/test_meta.py
131
+ ```
132
+
133
+ ## D5 (2026-06-09) — `RuleMeta.planemo_linters` (the planemo-name alias)
134
+
135
+ ### Decision
136
+
137
+ Add `planemo_linters: frozenset[str] = frozenset()` to `RuleMeta`: the planemo
138
+ (`galaxy.tool_util.lint`) linter class names a rule covers (e.g. GTR028 →
139
+ `{"HelpMissing", "HelpEmpty"}`). One GTR rule may cover several planemo linters —
140
+ planemo splits some single practices across linter classes, and our rule is the
141
+ natural unit (a survey found 21 of 25 such bundles are one practice). Formalizing
142
+ the mapping (previously only in docstrings + the hand-maintained parity table) on
143
+ the rule makes it the single source of truth: the registry (D16) derives a
144
+ `planemo name → GTR code` index for name-based selection and **generates** the
145
+ parity table from it. Empty for our own rules with no planemo equivalent (the
146
+ cosmetic fmt rules, the XSD-restoring repairs). Dependency-free (just strings).
147
+
148
+ ### Reproduction
149
+
150
+ ```sh
151
+ uv run --package galaxy-tool-refactor-rules pytest galaxy-tool-refactor-rules/tests/test_meta.py
152
+ ```
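Illustration (not part of the package contents): D5 says the registry derives a `planemo name → GTR code` index from `RuleMeta.planemo_linters`. The sketch below shows one way such an index could be built. The trimmed `RuleMeta` stand-in, the `planemo_index` helper, the summaries, and the rule that a planemo name may be claimed by only one GTR rule are all assumptions made for the example; the registry's actual implementation is not shown in this diff.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleMeta:
    """Trimmed stand-in: only the fields this sketch reads."""

    code: str
    summary: str
    planemo_linters: frozenset[str] = frozenset()


def planemo_index(metas: Iterable[RuleMeta]) -> dict[str, str]:
    """Map each planemo linter class name to the GTR code that covers it."""
    index: dict[str, str] = {}
    for meta in metas:
        for name in sorted(meta.planemo_linters):
            # Assumption: a planemo name is claimed by at most one GTR rule.
            if name in index:
                raise ValueError(f"{name} claimed by {index[name]} and {meta.code}")
            index[name] = meta.code
    return index


METAS = [
    # The GTR028 mapping comes from D5; the summaries are placeholders.
    RuleMeta(
        code="GTR028",
        summary="placeholder summary",
        planemo_linters=frozenset({"HelpMissing", "HelpEmpty"}),
    ),
    # A cosmetic fmt rule has no planemo equivalent, so the set stays empty.
    RuleMeta(code="GTR001", summary="canonical indentation"),
]

index = planemo_index(METAS)
print(index["HelpMissing"])  # GTR028
```

Because several planemo names can point at the same GTR code, the index is many-to-one; inverting it gives the rows of a parity table.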
@@ -0,0 +1,37 @@
1
+ [project]
2
+ name = "galaxy-tool-refactor-rules"
3
+ version = "0.2.0"
4
+ description = "Shared rule-metadata vocabulary for the galaxy-tool-refactor tiers (tier 0.5)."
5
+ readme = "README.md"
6
+ requires-python = ">=3.10"
7
+ license = "MIT"
8
+ authors = [{ name = "Richard Burhans" }]
9
+ # Intentionally dependency-free: this package is a pure metadata descriptor plus
10
+ # a markdown render helper. It must stay free of lxml / tier-2 / tier-3 imports
11
+ # so both fmt and codemod can depend on it without coupling the tiers together.
12
+ dependencies = []
13
+
14
+ [build-system]
15
+ requires = ["hatchling"]
16
+ build-backend = "hatchling.build"
17
+
18
+ [dependency-groups]
19
+ dev = [
20
+ "pytest>=8",
21
+ "ruff>=0.5",
22
+ "mypy>=1.10",
23
+ ]
24
+
25
+ [tool.ruff]
26
+ src = ["src"]
27
+ target-version = "py310"
28
+
29
+ [tool.ruff.lint]
30
+ select = ["E", "F", "I", "B", "UP", "SIM", "PTH"]
31
+
32
+ [tool.mypy]
33
+ files = ["src"]
34
+ strict = true
35
+
36
+ [tool.pytest.ini_options]
37
+ testpaths = ["tests"]
@@ -0,0 +1,11 @@
1
+ """Shared rule-metadata vocabulary for the galaxy-tool-refactor tiers.
2
+
3
+ Tier 0.5 of the refactoring framework: a dependency-free home for the
4
+ ``RuleMeta`` descriptor shared by the formatter (tier 3) and the codemod
5
+ framework (tier 2), plus a pure markdown render helper for rule glossaries.
6
+
7
+ Following the project's dignified-python conventions there are no re-exports;
8
+ callers import ``RuleMeta`` from ``galaxy_tool_refactor_rules.meta`` and
9
+ ``render_rule_reference_table`` from ``galaxy_tool_refactor_rules.reference``
10
+ directly.
11
+ """
@@ -0,0 +1,86 @@
1
+ """The ``RuleMeta`` descriptor shared across the refactor tiers.
2
+
3
+ Both a tier-3 formatter rule (``galaxy_tool_fmt.rules.Rule``) and a tier-2
4
+ codemod (``galaxy_tool_codemod.codemod.CodemodCommand``) carry a
5
+ ``meta: ClassVar[RuleMeta]`` so the two tiers expose one uniform vocabulary for
6
+ the GTR rule registry. The descriptor is pure data — it deliberately knows
7
+ nothing about lxml, edits, or the cursor walk, which keeps this package
8
+ dependency-free and the tiers independent.
9
+
10
+ Versioning convention: stability for consumers comes from pinning the owning
11
+ tier's package version in their lockfile, not from this metadata. ``since`` /
12
+ ``until`` are documentary only — ``until`` stays ``None`` while a rule is active
13
+ and is stamped (for the changelog) in the same commit that retires the rule.
14
+ """
15
+
16
+ from __future__ import annotations
17
+
18
+ from dataclasses import dataclass
19
+
20
+
21
+ @dataclass(frozen=True)
22
+ class RuleMeta:
23
+     """Metadata descriptor for a refactor rule (a fmt rule or a codemod).
24
+
25
+     Attributes:
26
+         code: Short unique rule identifier (e.g. ``"GTR001"``).
27
+         summary: One-line human-readable description.
28
+         since: Version in which this rule was introduced.
29
+         until: Version in which this rule was removed, or ``None`` if active.
30
+         cite: Optional reference URL or citation.
31
+         order: Application order; lower values run first. Each family is ordered
32
+             independently by this value — the formatter tier sequences its
33
+             cosmetic rules, and the codemod tier sequences its canonical codemods
34
+             (the registry's apply phase sorts each family by ``order``). An
35
+             upgrade-only or report-only rule leaves it at the default.
36
+         detect_only: Whether the rule only *reports* (a lint with no automatic
37
+             fix), as opposed to the fixable fmt rules and codemods. The advisory
38
+             check tier (``galaxy-tool-lint``) sets this ``True``; a
39
+             report-only consumer like the ``check`` CLI uses it to treat such
40
+             findings as informational rather than as a failing gate.
41
+         applies_to: The document kinds the rule operates on — a subset of
42
+             ``{"tool", "macro"}``. A generic XML rule (canonical indentation,
43
+             empty-element shorthand) applies to both; a tool-structural rule
44
+             (``<tool>`` child order, a blank line between ``<tool>`` sections,
45
+             attribute order, profile upgrades) applies only to ``"tool"``; a
46
+             macro-library rule applies only to ``"macro"``. The default
47
+             ``{"tool"}`` is the conservative choice — a rule runs on a macro
48
+             file only when it explicitly opts in. Consumers run a rule against a
49
+             document only when the document's kind is in this set.
50
+         parent: The code of the **partition parent** this rule is a sub-rule of,
51
+             or ``None`` for a standalone rule. A best-practice that splits into a
52
+             provably-fixable part and an advisory residual is modelled as a parent
53
+             practice code (e.g. ``"GTR020"``) with two sub-rules whose own ``code``
54
+             is dotted: ``"GTR020.1"`` (fixable) and ``"GTR020.2"`` (advisory). The
55
+             parent is a registry-level grouping (selectable, expands to its
56
+             children), not itself a rule; this field is what the registry derives
57
+             the groups from. See registry ``docs/decisions.md`` D10.
58
+         rulesets: The names of the rule-sets this rule belongs to (the catalog
59
+             lives in ``rulesets.py``). This is the maintainer-facing "mark which
60
+             rules belong to which set" mechanism: the registry groups rules by
61
+             these names into selectable sets, and the CLI ``--ruleset`` flag
62
+             selects the **union** of the named sets. The default empty set means
63
+             the rule is never independently selectable — e.g. an upgrade-only
64
+             codemod driven internally by ``UpgradeToLatest``. Every name used
65
+             here must appear in the ``rulesets.py`` catalog (guarded by a test).
66
+         planemo_linters: The names of the planemo (``galaxy.tool_util.lint``)
67
+             linter classes this rule covers, e.g. ``{"HelpMissing", "HelpEmpty"}``.
68
+             One GTR rule may cover several planemo linters (planemo splits some
69
+             single practices across linter classes). The registry derives a
70
+             ``planemo name → GTR code`` index from this so a planemo user can
71
+             select/find a rule by its planemo name (``--select HelpMissing``) and
72
+             so the parity table can be generated. Empty for our own rules with no
73
+             planemo equivalent (the cosmetic fmt rules, the XSD-restoring repairs).
74
+     """
75
+
76
+     code: str
77
+     summary: str
78
+     since: str
79
+     until: str | None = None
80
+     cite: str | None = None
81
+     order: int = 100
82
+     detect_only: bool = False
83
+     applies_to: frozenset[str] = frozenset({"tool"})
84
+     parent: str | None = None
85
+     rulesets: frozenset[str] = frozenset()
86
+     planemo_linters: frozenset[str] = frozenset()
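Illustration (not part of the package contents): the `applies_to` and `order` docstrings describe how a consumer decides which rules run on a document and in what sequence. The sketch below shows that filtering under stated assumptions. `RuleMeta` here is a trimmed stand-in for the class above, `select_for_kind` is a hypothetical helper (the diff names fmt's `rules_for_kind` but does not show its signature), and the summaries, `since` versions, and `order` values are placeholders.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleMeta:
    """Trimmed stand-in: only the fields this sketch reads."""

    code: str
    summary: str
    since: str
    order: int = 100
    applies_to: frozenset[str] = frozenset({"tool"})


def select_for_kind(metas: Iterable[RuleMeta], *, kind: str) -> list[RuleMeta]:
    """Keep the rules whose applies_to contains kind, lowest order first."""
    applicable = [meta for meta in metas if kind in meta.applies_to]
    return sorted(applicable, key=lambda meta: (meta.order, meta.code))


BOTH_KINDS = frozenset({"tool", "macro"})
METAS = [
    # GTR003 keeps the default {"tool"}; GTR001 and GTR004 opt in to macros.
    RuleMeta(code="GTR003", summary="blank line between sections", since="0.1.0", order=20),
    RuleMeta(
        code="GTR001",
        summary="canonical indentation",
        since="0.1.0",
        order=10,
        applies_to=BOTH_KINDS,
    ),
    RuleMeta(
        code="GTR004",
        summary="empty-element shorthand",
        since="0.1.0",
        order=30,
        applies_to=BOTH_KINDS,
    ),
]

tool_codes = [meta.code for meta in select_for_kind(METAS, kind="tool")]
macro_codes = [meta.code for meta in select_for_kind(METAS, kind="macro")]
print(tool_codes)   # ['GTR001', 'GTR003', 'GTR004']
print(macro_codes)  # ['GTR001', 'GTR004']
```

The conservative default shows up directly: a rule that never mentions `applies_to` is skipped for macro files without any per-tier special case.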
@@ -0,0 +1,48 @@
1
+ """Render a cross-tier GTR rule glossary as GitHub-flavored markdown.
2
+
3
+ A pure, dependency-free helper: it turns ``(RuleMeta, tier)`` pairs into the
4
+ rows of a ``| Rule | Tier | What it does |`` table, sorted by code. It emits the
5
+ table only — the caller owns any surrounding heading and intro prose, which is
6
+ context-specific (the fmt stat page frames it differently than a standalone rule
7
+ registry would).
8
+ """
9
+
10
+ from __future__ import annotations
11
+
12
+ import re
13
+ from collections.abc import Iterable
14
+
15
+ from galaxy_tool_refactor_rules.meta import RuleMeta
16
+
17
+ # Element names in a summary (``<param>``, ``<foo/>``) would be parsed as HTML
18
+ # tags inside a markdown table cell and rendered as nothing; wrap them in
19
+ # backticks so they survive as literal text.
20
+ _ANGLE_TOKEN = re.compile(r"(<[^>]+>)")
21
+
22
+
23
+ def _backtick_xml_tokens(text: str) -> str:
24
+     """Backtick-wrap angle-bracket tokens so GitHub renders them literally."""
25
+     return _ANGLE_TOKEN.sub(r"`\1`", text)
26
+
27
+
28
+ def render_rule_reference_table(entries: Iterable[tuple[RuleMeta, str]]) -> list[str]:
29
+     """Render ``(meta, tier)`` pairs as a markdown reference table.
30
+
31
+     Args:
32
+         entries: ``(RuleMeta, tier_label)`` pairs, where ``tier_label`` names the
33
+             owning tier (e.g. ``"fmt"`` or ``"codemod"``).
34
+
35
+     Returns:
36
+         Markdown lines: a header row, the separator, then one row per rule
37
+         ordered by ``meta.code``.
38
+     """
39
+     rows = sorted(entries, key=lambda entry: entry[0].code)
40
+     lines = [
41
+         "| Rule | Tier | What it does |",
42
+         "|---|---|---|",
43
+     ]
44
+     lines.extend(
45
+         f"| {meta.code} | {tier} | {_backtick_xml_tokens(meta.summary)} |"
46
+         for meta, tier in rows
47
+     )
48
+     return lines
@@ -0,0 +1,84 @@
1
+ """The named rule-sets — the maintainer-facing vocabulary for grouping rules.
2
+
3
+ A *ruleset* is a named, described bucket of rules. **Membership is declared per
4
+ rule** (``RuleMeta.rulesets``): a maintainer marks a rule's set(s) right on the
5
+ rule. This module is the authoritative catalog of the ruleset *names and
6
+ descriptions* — the one property that belongs to the set itself, not to any
7
+ member. The registry tier (3.6) derives ``name -> {codes}`` by grouping rules by
8
+ their declared membership, and the CLI ``--ruleset`` flag selects the **union** of
9
+ the named sets.
10
+
11
+ Dependency-free, like the rest of tier 0.5 — ruleset names are plain strings, so a
12
+ rule in any tier can declare membership (``RuleMeta(..., rulesets=frozenset({...}))``)
13
+ without importing anything heavier. Adding a ruleset is a developer task (a new
14
+ ``Ruleset`` here + tagging the member rules); there are no user-defined rulesets.
15
+
16
+ The catalog also defines the subset relationships informally via the seeded
17
+ membership: ``cosmetic`` ⊆ ``default`` == ``iuc`` ⊆ ``strict`` today. ``default``
18
+ is the set applied when the user names no ruleset; it reproduces the historical
19
+ default ``format`` behaviour.
20
+ """
21
+
22
+ from __future__ import annotations
23
+
24
+ from dataclasses import dataclass
25
+
26
+
27
+ @dataclass(frozen=True)
28
+ class Ruleset:
29
+     """A named rule-set: a selectable bucket of rules.
30
+
31
+     Attributes:
32
+         name: The selection key (e.g. ``"default"``) — exactly as it appears in
33
+             ``RuleMeta.rulesets`` and on the CLI ``--ruleset`` flag.
34
+         description: A one-line human-readable summary.
35
+     """
36
+
37
+     name: str
38
+     description: str
39
+
40
+
41
+ DEFAULT_RULESET = "default"
42
+ """The ruleset selected when the user names none (the no-argument ``format`` set)."""
43
+
44
+
45
+ _CATALOG: tuple[Ruleset, ...] = (
46
+     Ruleset(
47
+         name="cosmetic",
48
+         description="Cosmetic whitespace only (indent, blank lines, shorthand).",
49
+     ),
50
+     Ruleset(
51
+         name="default",
52
+         description="The opinionated canonical formatter: structural repair + "
53
+         "attribute/element order + CDATA/quoting/help-RST fixes + cosmetic "
54
+         "formatting (the default).",
55
+     ),
56
+     Ruleset(
57
+         name="iuc",
58
+         description="Mirrors 'default' today; reserved for IUC-specific "
59
+         "divergence.",
60
+     ),
61
+     Ruleset(
62
+         name="strict",
63
+         description="Everything in 'default' plus the advisory best-practice "
64
+         "checks (report-only).",
65
+     ),
66
+ )
67
+
68
+
69
+ def rulesets_catalog() -> tuple[Ruleset, ...]:
70
+     """Return every defined ruleset, in display order."""
71
+     return _CATALOG
72
+
73
+
74
+ def ruleset_names() -> tuple[str, ...]:
75
+     """Return the defined ruleset names, in display order."""
76
+     return tuple(ruleset.name for ruleset in _CATALOG)
77
+
78
+
79
+ def ruleset_description(name: str, /) -> str | None:
80
+     """Return the one-line description for *name*, or ``None`` if undefined."""
81
+     for ruleset in _CATALOG:
82
+         if ruleset.name == name:
83
+             return ruleset.description
84
+     return None
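Illustration (not part of the package contents): the module docstring says the registry tier derives `name -> {codes}` by grouping rules on their declared membership, and that `--ruleset` selects the union of the named sets. The sketch below shows one way that grouping and union could work. `group_by_ruleset` and `select_codes` are hypothetical helpers, the `RuleMeta` stand-in is trimmed to two fields, and every membership assignment (including the made-up code `GTR900`) is invented for the example, chosen only to respect `cosmetic` ⊆ `default` == `iuc` ⊆ `strict`.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleMeta:
    """Trimmed stand-in: only the fields this sketch reads."""

    code: str
    rulesets: frozenset[str] = frozenset()


# The four names seeded in the catalog above.
CATALOG_NAMES = ("cosmetic", "default", "iuc", "strict")


def group_by_ruleset(metas: Iterable[RuleMeta]) -> dict[str, frozenset[str]]:
    """Derive name -> codes from each rule's declared membership."""
    groups: dict[str, set[str]] = {name: set() for name in CATALOG_NAMES}
    for meta in metas:
        for name in meta.rulesets:
            # Every name a rule uses must exist in the catalog.
            if name not in groups:
                raise ValueError(f"{meta.code} names unknown ruleset {name!r}")
            groups[name].add(meta.code)
    return {name: frozenset(codes) for name, codes in groups.items()}


def select_codes(groups: dict[str, frozenset[str]], *, names: Iterable[str]) -> frozenset[str]:
    """Return the union of the named sets."""
    selected: set[str] = set()
    for name in names:
        selected |= groups[name]
    return frozenset(selected)


METAS = [
    RuleMeta(code="GTR001", rulesets=frozenset({"cosmetic", "default", "iuc", "strict"})),
    RuleMeta(code="GTR005", rulesets=frozenset({"default", "iuc", "strict"})),
    RuleMeta(code="GTR028", rulesets=frozenset({"strict"})),
    RuleMeta(code="GTR900"),  # empty membership: never independently selectable
]

groups = group_by_ruleset(METAS)
picked = select_codes(groups, names=["cosmetic", "strict"])
print(sorted(picked))  # ['GTR001', 'GTR005', 'GTR028']
```

Keeping membership on the rule and only names plus descriptions in the catalog means adding a rule to a set touches one line on that rule, with no central list to update.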
@@ -0,0 +1,39 @@
1
+ """The ``Violation`` diagnostic descriptor shared across the refactor tiers.
2
+
3
+ A ``Violation`` is the per-occurrence report a detect (lint) phase produces:
4
+ which rule fired (``code``), where (``sourceline`` + ``xpath``), and a
5
+ human-readable ``message``. It is the read-only counterpart to the mutating
6
+ ``Edit`` (tier 3) / ``Change`` (tier 2) types — those carry the fix; this carries
7
+ the finding. Both the formatter (tier 3) and codemod (tier 2) detect phases, the
8
+ ``check`` CLI (tier 4), and the advisory check library surface diagnostics as
9
+ ``Violation``s, so the type lives here next to ``RuleMeta`` where every tier can
10
+ reach it without depending on one another.
11
+
12
+ Like ``RuleMeta`` this is pure data — the location is a plain ``int`` line plus a
13
+ ``str`` xpath, never an lxml handle — which keeps this package dependency-free
14
+ (no lxml, no tier 1/2/3 imports). See ``docs/decisions.md`` §D2 for the
15
+ shared-vocabulary rationale.
16
+ """
17
+
18
+ from __future__ import annotations
19
+
20
+ from dataclasses import dataclass
21
+
22
+
23
+ @dataclass(frozen=True)
24
+ class Violation:
25
+     """A single detected rule occurrence in a tool XML document.
26
+
27
+     Attributes:
28
+         code: The rule's identifier (e.g. ``"GTR002"``); matches ``RuleMeta.code``.
29
+         sourceline: 1-based line of the offending element, or ``0`` when the
30
+             element was synthesised and has no source position.
31
+         xpath: Absolute xpath to the offending element (e.g.
32
+             ``"/tool/inputs/param[1]"``).
33
+         message: One-line human-readable description of the finding.
34
+     """
35
+
36
+     code: str
37
+     sourceline: int
38
+     xpath: str
39
+     message: str
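Illustration (not part of the package contents): because `Violation` is a frozen dataclass of plain primitives, findings compare by value, hash, and sort without any lxml involvement. The sketch below shows how a report-only consumer might de-duplicate and print them. `format_violation` and its output layout are hypothetical, and the messages are placeholders; the diff does not show how the `check` CLI actually formats findings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """Same four fields as the class in the hunk above."""

    code: str
    sourceline: int
    xpath: str
    message: str


def format_violation(violation: Violation, *, path: str) -> str:
    """Render one finding as a single report line (hypothetical layout)."""
    # sourceline 0 means the element has no source position, so omit the line.
    location = f"{path}:{violation.sourceline}" if violation.sourceline > 0 else path
    return f"{location}: {violation.code} {violation.message} [{violation.xpath}]"


findings = [
    Violation(code="GTR002", sourceline=12, xpath="/tool/inputs/param[1]", message="example finding"),
    Violation(code="GTR002", sourceline=12, xpath="/tool/inputs/param[1]", message="example finding"),
    Violation(code="GTR003", sourceline=0, xpath="/tool/help", message="synthesised element"),
]

# Frozen dataclasses compare by value and are hashable, so duplicates collapse.
unique = sorted(set(findings), key=lambda v: (v.sourceline, v.code))
report = [format_violation(v, path="tool.xml") for v in unique]
print("\n".join(report))
# tool.xml: GTR003 synthesised element [/tool/help]
# tool.xml:12: GTR002 example finding [/tool/inputs/param[1]]
```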
@@ -0,0 +1,67 @@
1
+ """Guard: tier 0.5 (``galaxy-tool-refactor-rules``) must stay dependency-free.
2
+
3
+ The whole point of this tier is to be a shared primitive that the codemod and fmt
4
+ tiers can each carry without depending on each other (ARCHITECTURE.md §2; rules
5
+ ``docs/decisions.md`` §D1). A single third-party or higher-tier import here would
6
+ re-couple the tiers it exists to keep apart. The invariant was prose-only — the
7
+ architecture audit (2026-06-03, ``docs/architecture_audit.md``) flagged it as
8
+ unguarded — so this test enforces it: every import in the package's ``src`` must
9
+ resolve to the standard library or the package itself. A future commit that adds,
10
+ say, an ``lxml`` import fails here loudly.
11
+ """
12
+
13
+ from __future__ import annotations
14
+
15
+ import ast
16
+ import sys
17
+ from pathlib import Path
18
+
19
+ import galaxy_tool_refactor_rules
20
+
21
+ _PACKAGE = "galaxy_tool_refactor_rules"
22
+ _PACKAGE_DIR = Path(galaxy_tool_refactor_rules.__file__).parent
23
+ # Top-level module names an import in this tier may reference: the standard
24
+ # library, ``__future__``, and the package itself. Nothing else.
25
+ _ALLOWED_ROOTS = set(sys.stdlib_module_names) | {_PACKAGE, "__future__"}
26
+
27
+
28
+ def _imported_roots(source: str, /) -> set[str]:
29
+ """The top-level module names *source* imports absolutely.
30
+
31
+ Relative imports (``from . import x``) are internal and carry no external
32
+ coupling, so they are ignored.
33
+ """
34
+ roots: set[str] = set()
35
+ for node in ast.walk(ast.parse(source)):
36
+ if isinstance(node, ast.Import):
37
+ roots.update(alias.name.split(".")[0] for alias in node.names)
38
+ elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
39
+ roots.add(node.module.split(".")[0])
40
+ return roots
41
+
42
+
43
+ def test_rules_tier_imports_only_stdlib_and_self() -> None:
44
+ offenders: dict[str, set[str]] = {}
45
+ for path in sorted(_PACKAGE_DIR.rglob("*.py")):
46
+ external = _imported_roots(path.read_text(encoding="utf-8")) - _ALLOWED_ROOTS
47
+ if external:
48
+ offenders[str(path.relative_to(_PACKAGE_DIR))] = external
49
+ assert not offenders, (
50
+ "tier 0.5 must stay dependency-free (ARCHITECTURE.md §2), but these files "
51
+ f"import non-stdlib, non-self modules: {offenders}"
52
+ )
53
+
54
+
55
+ def test_guard_detects_a_planted_violation() -> None:
56
+ """The scan would actually catch a forbidden import (not a vacuous pass)."""
57
+ assert _imported_roots("import lxml.etree") == {"lxml"}
58
+ assert _imported_roots("from galaxy_tool_source.binding import load_tool") == {
59
+ "galaxy_tool_source"
60
+ }
61
+ # stdlib + self + a relative import are all allowed (subtract to empty).
62
+ allowed = _imported_roots(
63
+ "import sys\nfrom dataclasses import dataclass\n"
64
+ "from galaxy_tool_refactor_rules.meta import RuleMeta\n"
65
+ "from . import violation\n"
66
+ )
67
+ assert allowed - _ALLOWED_ROOTS == set()
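What the AST scan does and does not see can be shown with a short, self-contained sketch. The `imported_roots` function mirrors the guard's helper, while `SAMPLE` is an illustrative source string invented for this example. `ast.walk` visits every node, so an import nested inside a function or a `try` block is still reported.

```python
from __future__ import annotations

import ast


def imported_roots(source: str) -> set[str]:
    """Collect top-level names of absolute imports anywhere in *source*."""
    roots: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            roots.add(node.module.split(".")[0])
    return roots


# Illustrative input: one top-level import, one nested import, one relative import.
SAMPLE = '''
import os.path

def load():
    try:
        import lxml.etree  # nested in a function and a try block: still found
    except ImportError:
        from . import fallback  # relative (level 1): ignored
'''

found = imported_roots(SAMPLE)
```

Two limits follow from this being a static scan: a dynamic import such as `importlib.import_module("lxml")` is a plain function call and is not reported, and `sys.stdlib_module_names`, which the guard uses for its allow-list, exists only on Python 3.10 and later.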
@@ -0,0 +1,62 @@
+ """Tests for the shared ``RuleMeta`` descriptor."""
+
+ from __future__ import annotations
+
+ import dataclasses
+
+ import pytest
+
+ from galaxy_tool_refactor_rules.meta import RuleMeta
+
+
+ def test_rule_meta_is_frozen() -> None:
+     meta = RuleMeta(code="GTR001", summary="Do a thing.", since="0.1.0")
+     with pytest.raises(dataclasses.FrozenInstanceError):
+         meta.code = "GTR999"  # type: ignore[misc]
+
+
+ def test_rule_meta_defaults() -> None:
+     meta = RuleMeta(code="GTR001", summary="Do a thing.", since="0.1.0")
+     assert meta.until is None
+     assert meta.cite is None
+     assert meta.order == 100
+     assert meta.detect_only is False
+     assert meta.applies_to == frozenset({"tool"})
+     assert meta.parent is None
+     assert meta.rulesets == frozenset()
+     assert meta.planemo_linters == frozenset()
+
+
+ def test_rule_meta_partition_child_carries_parent() -> None:
+     meta = RuleMeta(
+         code="GTR020.1", summary="Fix the provable part.", since="0.1.0",
+         parent="GTR020",
+     )
+     assert meta.code == "GTR020.1"
+     assert meta.parent == "GTR020"
+
+
+ def test_rule_meta_applies_to_can_widen_to_macro() -> None:
+     meta = RuleMeta(
+         code="GTR001",
+         summary="Generic XML rule.",
+         since="0.1.0",
+         applies_to=frozenset({"tool", "macro"}),
+     )
+     assert meta.applies_to == frozenset({"tool", "macro"})
+
+
+ def test_rule_meta_carries_supplied_values() -> None:
+     meta = RuleMeta(
+         code="GTR021",
+         summary="Do a thing.",
+         since="0.1.0",
+         until="0.4.0",
+         cite="https://example.invalid/spec",
+         order=10,
+         detect_only=True,
+     )
+     assert meta.until == "0.4.0"
+     assert meta.cite == "https://example.invalid/spec"
+     assert meta.order == 10
+     assert meta.detect_only is True
@@ -0,0 +1,44 @@
+ """Tests for the markdown rule-reference table renderer."""
+
+ from __future__ import annotations
+
+ from galaxy_tool_refactor_rules.meta import RuleMeta
+ from galaxy_tool_refactor_rules.reference import render_rule_reference_table
+
+
+ def test_header_and_separator() -> None:
+     lines = render_rule_reference_table([])
+     assert lines == ["| Rule | Tier | What it does |", "|---|---|---|"]
+
+
+ def test_rows_are_sorted_by_code() -> None:
+     entries = [
+         (RuleMeta(code="GTR003", summary="Three.", since="0.1.0"), "fmt"),
+         (RuleMeta(code="GTR001", summary="One.", since="0.1.0"), "fmt"),
+         (RuleMeta(code="GTR002", summary="Two.", since="0.0.1"), "codemod"),
+     ]
+     rows = render_rule_reference_table(entries)[2:]
+     codes = [line.split("|")[1].strip() for line in rows]
+     assert codes == ["GTR001", "GTR002", "GTR003"]
+
+
+ def test_tier_label_is_emitted() -> None:
+     entries = [(RuleMeta(code="GTR002", summary="Two.", since="0.0.1"), "codemod")]
+     assert render_rule_reference_table(entries)[2] == "| GTR002 | codemod | Two. |"
+
+
+ def test_angle_bracket_tokens_are_backtick_wrapped() -> None:
+     entries = [
+         (
+             RuleMeta(
+                 code="GTR004",
+                 summary="Collapse empty leaves to <foo/> form for <tool> children.",
+                 since="0.1.0",
+             ),
+             "fmt",
+         ),
+     ]
+     row = render_rule_reference_table(entries)[2]
+     assert "`<foo/>`" in row
+     assert "`<tool>`" in row
+     assert "<foo/> " not in row.replace("`<foo/>`", "")
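The behaviour these tests pin down (fixed header, rows sorted by code, angle-bracket tokens wrapped in backticks) can be met by a renderer along the following lines. This is only a sketch under stated assumptions: the package's actual `render_rule_reference_table` is not shown in this diff, so `render_table_sketch`, `RuleMetaStub`, and the `_ANGLE_TOKEN` pattern are all hypothetical stand-ins.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RuleMetaStub:
    """Stand-in carrying only the two fields the table uses (an assumption)."""

    code: str
    summary: str


# Assumed token shape: '<', then anything without whitespace or brackets, then '>'.
_ANGLE_TOKEN = re.compile(r"<[^<>\s]+>")


def render_table_sketch(entries: Iterable[tuple[RuleMetaStub, str]]) -> list[str]:
    """Render (meta, tier) pairs as markdown table lines, sorted by rule code."""
    lines = ["| Rule | Tier | What it does |", "|---|---|---|"]
    for meta, tier in sorted(entries, key=lambda entry: entry[0].code):
        # Wrap <tag>-like tokens so markdown does not swallow them as raw HTML.
        summary = _ANGLE_TOKEN.sub(lambda match: f"`{match.group(0)}`", meta.summary)
        lines.append(f"| {meta.code} | {tier} | {summary} |")
    return lines


rows = render_table_sketch(
    [
        (RuleMetaStub("GTR003", "Three."), "fmt"),
        (RuleMetaStub("GTR001", "Collapse to <foo/> form for <tool> children."), "fmt"),
    ]
)
```

Sorting on the plain code string is enough for the zero-padded codes used in the tests; whether the real renderer orders dotted child codes such as `GTR020.1` differently is not something this diff establishes.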
@@ -0,0 +1,49 @@
+ """Tests for the named-ruleset catalog."""
+
+ from __future__ import annotations
+
+ import dataclasses
+
+ import pytest
+
+ from galaxy_tool_refactor_rules.rulesets import (
+     DEFAULT_RULESET,
+     Ruleset,
+     ruleset_description,
+     ruleset_names,
+     rulesets_catalog,
+ )
+
+
+ def test_ruleset_is_frozen() -> None:
+     ruleset = Ruleset(name="default", description="default")
+     with pytest.raises(dataclasses.FrozenInstanceError):
+         ruleset.name = "other"  # type: ignore[misc]
+
+
+ def test_catalog_has_the_seeded_rulesets() -> None:
+     assert ruleset_names() == ("cosmetic", "default", "iuc", "strict")
+
+
+ def test_default_ruleset_is_in_the_catalog() -> None:
+     assert DEFAULT_RULESET == "default"
+     assert DEFAULT_RULESET in ruleset_names()
+
+
+ def test_descriptions_resolve_for_every_name() -> None:
+     for name in ruleset_names():
+         description = ruleset_description(name)
+         assert description is not None
+         assert description != ""
+         # A description equal to the name is a placeholder, not a description
+         # (the user-facing `rulesets` command / MCP `list_rulesets` shows it).
+         assert description != name
+
+
+ def test_unknown_ruleset_description_is_none() -> None:
+     assert ruleset_description("nope") is None
+
+
+ def test_catalog_names_are_unique() -> None:
+     names = [ruleset.name for ruleset in rulesets_catalog()]
+     assert len(names) == len(set(names))
@@ -0,0 +1,36 @@
+ """Tests for the shared ``Violation`` diagnostic descriptor."""
+
+ from __future__ import annotations
+
+ import dataclasses
+
+ import pytest
+
+ from galaxy_tool_refactor_rules.violation import Violation
+
+
+ def test_violation_is_frozen() -> None:
+     violation = Violation(
+         code="GTR001", sourceline=12, xpath="/tool", message="needs a fix"
+     )
+     with pytest.raises(dataclasses.FrozenInstanceError):
+         violation.code = "GTR999"  # type: ignore[misc]
+
+
+ def test_violation_carries_supplied_values() -> None:
+     violation = Violation(
+         code="GTR002",
+         sourceline=7,
+         xpath="/tool/inputs/param[1]",
+         message="<param> attributes are not in IUC order",
+     )
+     assert violation.code == "GTR002"
+     assert violation.sourceline == 7
+     assert violation.xpath == "/tool/inputs/param[1]"
+     assert violation.message == "<param> attributes are not in IUC order"
+
+
+ def test_violation_equality_is_by_value() -> None:
+     one = Violation(code="GTR001", sourceline=1, xpath="/tool", message="m")
+     two = Violation(code="GTR001", sourceline=1, xpath="/tool", message="m")
+     assert one == two