yaml-reference 2.6.0__tar.gz → 2.6.2__tar.gz

Files changed (24)
  1. yaml_reference-2.6.2/.github/copilot-instructions.md +141 -0
  2. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/PKG-INFO +17 -2
  3. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/README.md +16 -1
  4. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/tests/unit/test_reference.py +67 -5
  5. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/yaml_reference/__init__.py +44 -4
  6. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/.github/workflows/pytests-pr.yml +0 -0
  7. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/.github/workflows/release.yml +0 -0
  8. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/.github/workflows/spectests-pr.yml +0 -0
  9. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/.gitignore +0 -0
  10. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/.pre-commit-config.yaml +0 -0
  11. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/.python-version +0 -0
  12. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/.vscode/settings.json +0 -0
  13. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/.zed/settings.json +0 -0
  14. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/GitVersion.yml +0 -0
  15. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/LICENSE +0 -0
  16. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/Makefile +0 -0
  17. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/pyproject.toml +0 -0
  18. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/scripts/spec-test.sh +0 -0
  19. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/scripts/update-readme-badge.sh +0 -0
  20. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/tests/unit/conftest.py +0 -0
  21. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/tests/unit/test_flatten.py +0 -0
  22. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/tests/unit/test_merge.py +0 -0
  23. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/uv.lock +0 -0
  24. {yaml_reference-2.6.0 → yaml_reference-2.6.2}/yaml_reference/cli.py +0 -0
@@ -0,0 +1,141 @@
+# Copilot Instructions for yaml-reference
+
+## Project Overview
+
+**yaml-reference** is a Python library that extends `ruamel.yaml` with cross-file YAML composition using custom tags (`!reference`, `!reference-all`, `!flatten`, `!merge`). It's built to be a reference implementation of the [yaml-reference-specs](https://github.com/dsillman2000/yaml-reference-specs) specification.
+
+## Build, Test, and Lint
+
+**Tool chain:** `uv` (Python package manager) + `pytest` (testing) + `ruff` (linting/formatting)
+
+### Install dependencies
+```bash
+uv sync
+```
+
+### Run all tests
+```bash
+uv run pytest tests/ -v
+```
+
+### Run a single test file
+```bash
+uv run pytest tests/unit/test_reference.py -v
+```
+
+### Run tests matching a pattern
+```bash
+uv run pytest tests/unit/test_reference.py::test_reference_load -v
+```
+
+### Run spec compliance tests (tests against yaml-reference-specs)
+```bash
+make spec-test
+```
+
+### Code formatting
+```bash
+uv run ruff format
+# or
+make format
+```
+
+### Linting and auto-fix
+```bash
+uv run ruff check --fix
+# or
+make lint
+```
+
+### Run full quality check (format + lint + test)
+```bash
+make check
+```
+
+### Build package
+```bash
+uv build
+```
+
+## Architecture
+
+The library is structured in two key parts:
+
+### Core Module (`yaml_reference/__init__.py`)
+- **Reference & ReferenceAll classes**: Represent the `!reference` and `!reference-all` YAML tags as Python objects
+- **parse_yaml_with_references()**: Parses YAML, returning Reference/ReferenceAll objects without resolving them (one layer only)
+- **load_yaml_with_references()**: Fully recursively resolves all references, returning a complete Python dict
+- **Flatten & Merge classes**: Represent `!flatten` and `!merge` tag logic
+- **YAML loader setup**: Registers custom constructors with `ruamel.yaml.YAML` for each tag
+
+### CLI Module (`yaml_reference/cli.py`)
+- Simple entry point that calls the core loading functions
+- Outputs JSON to stdout (compatible with spec tests)
+- Takes optional `--allow` flag for path restrictions
+
+### Test Structure (`tests/unit/`)
+- `test_reference.py`: Tests for `!reference` and `!reference-all` tag resolution
+- `test_flatten.py`: Tests for `!flatten` tag behavior
+- `test_merge.py`: Tests for `!merge` tag behavior
+- `conftest.py`: Pytest fixtures and test utilities
+
+## Key Conventions & Design Patterns
+
+### Security-First Path Handling
+1. **Relative paths only**: All references must use relative paths (e.g., `path: "config/db.yaml"`). Absolute paths raise `ValueError`.
+2. **Path restriction by default**: References can only access files in the same directory or subdirectories (no `..` to escape). Use `allow_paths` parameter to explicitly allow other directory trees.
+3. **Security invariant**: Disallowed files are **never opened or read into memory**. Path filtering happens before file I/O.
+4. **Silent omission (for `!reference-all`)**: When a glob pattern matches files outside allowed paths, those files are silently dropped from results and the function returns `rc=0` (not an error).
+
+### YAML Tag Implementation Pattern
+Each custom tag follows this pattern:
+1. Define a class with `yaml_tag` attribute
+2. Implement `@classmethod from_yaml(cls, constructor, node)` to parse from YAML
+3. Register constructor with the YAML loader in `__init__.py`
+4. The class instance persists through `parse_yaml_with_references()`, allowing layer-by-layer resolution
+
+### Reference Resolution Order
+1. **Circular reference detection** occurs during recursive resolution by tracking a "resolution stack"
+2. **Anchors** (optional parameter): If specified, extract only the anchored section from the referenced file
+3. **Recursive expansion**: `load_yaml_with_references()` recursively expands all tags, applying `!flatten` and `!merge` logic as it encounters them
+
+### Error Handling
+- **ValueError** for spec violations: absolute paths, circular references, invalid anchors
+- **FileNotFoundError** for missing referenced files
+- **Glob errors**: Return empty list `[]` if glob matches no files (silent omission)
+
+### Spec Compliance Testing
+The project tests against `yaml-reference-specs`, a Go-based reference implementation. The spec tests verify:
+- Correct expansion of all four tags
+- Proper error detection (bad paths, missing files, circular refs)
+- Path restriction enforcement
+- Edge cases like empty globs and nested composition
+
+Run with: `make spec-test` or `scripts/spec-test.sh`
+
+## Pre-commit Hooks
+
+The repository enforces conventional commits and code quality via pre-commit:
+- **ruff-check** and **ruff-format**: Ensures consistent style
+- **conventional-pre-commit**: Enforces Conventional Commits format (e.g., `feat:`, `fix:`, `docs:`)
+- Standard hooks: trailing whitespace, EOF fixers, YAML validation
+
+Install hooks with: `pre-commit install`
+
+## Common Workflows
+
+### Adding a new tag type
+1. Create a class in `yaml_reference/__init__.py` with `yaml_tag` attribute and `from_yaml()` classmethod
+2. Register the constructor after the class definition
+3. Add resolution logic (handle in recursive expansion)
+4. Write tests in `tests/unit/test_*.py` following existing patterns
+5. Update README.md with usage example
+
+### Debugging a reference resolution issue
+1. Use `parse_yaml_with_references()` to see raw Reference objects before resolution
+2. Add print statements or use a debugger to trace the `_resolve_references()` recursive calls
+3. Check the resolution stack to verify circular reference detection is working
+4. Run a specific test with `-v` flag to see detailed assertion output
+
+### Updating error messages
+Ensure error messages follow this pattern: include the problematic value, the path of the file where the error occurred, and the specific constraint violated. This helps spec tests verify proper error handling.
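The added instructions describe cycle detection via a "resolution stack" and an error-message pattern (problematic value, file where the error occurred, constraint violated). The sketch below illustrates both ideas together. It is not the library's code: `FILES`, `resolve`, and the `("ref", name)` tuples are hypothetical stand-ins for real YAML files and `!reference` tags, chosen so the example runs without touching the filesystem.

```python
from typing import Any

# Hypothetical in-memory "file system": each file maps keys to plain values
# or to ("ref", other_file) tuples that stand in for !reference tags.
FILES: dict[str, dict[str, Any]] = {
    "root.yml": {"db": ("ref", "db.yml"), "name": "app"},
    "db.yml": {"host": "localhost"},
    "a.yml": {"next": ("ref", "b.yml")},
    "b.yml": {"next": ("ref", "a.yml")},
}


def resolve(name: str, stack: tuple[str, ...] = ()) -> dict[str, Any]:
    """Recursively expand references, raising ValueError on a cycle."""
    if name in stack:
        chain = " -> ".join(stack + (name,))
        # Message names the offending value, the referencing file, and the
        # violated constraint, following the pattern described above.
        raise ValueError(
            f'circular reference: "{name}" referenced from "{stack[-1]}" '
            f"is already being resolved ({chain})"
        )
    stack = stack + (name,)  # push this file for the duration of the call
    resolved: dict[str, Any] = {}
    for key, value in FILES[name].items():
        if isinstance(value, tuple) and value[0] == "ref":
            resolved[key] = resolve(value[1], stack)
        else:
            resolved[key] = value
    return resolved


print(resolve("root.yml"))
```

Passing the stack as an immutable tuple means sibling references never see each other's entries, so a file referenced twice from different branches is not mistaken for a cycle.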
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: yaml-reference
-Version: 2.6.0
+Version: 2.6.2
 Summary: Extension package built on top of `ruamel.yaml` to support cross-file references in YAML files using tags `!reference` and `!reference-all`.
 Project-URL: Repository, https://github.com/dsillman2000/yaml-reference.git
 Author-email: David Sillman <dsillman2000@gmail.com>
@@ -40,7 +40,7 @@ uv add yaml-reference
 ```
 
 ## Spec
-![Spec Status](https://img.shields.io/badge/spec%20v0.2.6--3-passing-brightgreen?link=https%3A%2F%2Fgithub.com%2Fdsillman2000%2Fyaml-reference-specs%2Ftree%2Fv0.2.6-3)
+![Spec Status](https://img.shields.io/badge/spec%20v0.2.6--4-passing-brightgreen?link=https%3A%2F%2Fgithub.com%2Fdsillman2000%2Fyaml-reference-specs%2Ftree%2Fv0.2.6-4)
 
 This Python library implements the YAML specification for cross-file references and YAML composition in YAML files using tags `!reference`, `!reference-all`, `!flatten`, and `!merge` as defined in the [yaml-reference-specs project](https://github.com/dsillman2000/yaml-reference-specs).
 
@@ -286,6 +286,21 @@ yaml-reference compile input.yml --allow /allowed/path1 --allow /allowed/path2
 
 Whether or not `allow_paths` is specified, the default behavior is to allow references to files in the same directory as the source YAML file (or subdirectories). "Back-navigating" out of a the root directory is not allowed (".." local references in a root YAML file). This provides a secure baseline to prevent unsafe access which is not explicitly allowed.
 
+### Glob matching behavior for `!reference-all`
+
+`!reference-all` applies **silent-omission semantics** when individual glob matches fall outside the allowed path set. Disallowed paths are filtered out *before* any file is opened (security invariant: disallowed file contents are never loaded into memory). The result is the subset of glob matches that are both reachable and allowed:
+
+| Scenario | Behaviour | Exit |
+|---|---|---|
+| Glob matches zero files | Returns `[]` | `rc=0` |
+| Some matched files are outside `allow_paths` | Disallowed files are silently dropped; remaining files are returned | `rc=0` |
+| All matched files are outside `allow_paths` | Returns `[]` | `rc=0` |
+| Glob pattern is absolute (starts with `/`) | Hard error – `ValueError` raised | `rc=1` |
+| A matched file is the calling file (self-reference) | Hard error – circularity `ValueError` raised | `rc=1` |
+| A matched file transitively references the caller | Hard error – circularity `ValueError` raised | `rc=1` |
+
+This design keeps `!reference-all` resilient against partially-populated directory trees while still enforcing absolute-path and circularity invariants as hard failures.
+
 ### Absolute path restrictions
 
 References using absolute paths (e.g., `/tmp/file.yml`) are explicitly rejected with a `ValueError`. All reference paths must be relative to the source file's directory. If you absolutely must reference an absolute path, relative paths to symlinks can be used. Note that their target directories must be explicitly allowed to avoid permission errors (see the above section about "Path restriction and `allow_paths`").
@@ -14,7 +14,7 @@ uv add yaml-reference
 ```
 
 ## Spec
-![Spec Status](https://img.shields.io/badge/spec%20v0.2.6--3-passing-brightgreen?link=https%3A%2F%2Fgithub.com%2Fdsillman2000%2Fyaml-reference-specs%2Ftree%2Fv0.2.6-3)
+![Spec Status](https://img.shields.io/badge/spec%20v0.2.6--4-passing-brightgreen?link=https%3A%2F%2Fgithub.com%2Fdsillman2000%2Fyaml-reference-specs%2Ftree%2Fv0.2.6-4)
 
 This Python library implements the YAML specification for cross-file references and YAML composition in YAML files using tags `!reference`, `!reference-all`, `!flatten`, and `!merge` as defined in the [yaml-reference-specs project](https://github.com/dsillman2000/yaml-reference-specs).
 
@@ -260,6 +260,21 @@ yaml-reference compile input.yml --allow /allowed/path1 --allow /allowed/path2
 
 Whether or not `allow_paths` is specified, the default behavior is to allow references to files in the same directory as the source YAML file (or subdirectories). "Back-navigating" out of a the root directory is not allowed (".." local references in a root YAML file). This provides a secure baseline to prevent unsafe access which is not explicitly allowed.
 
+### Glob matching behavior for `!reference-all`
+
+`!reference-all` applies **silent-omission semantics** when individual glob matches fall outside the allowed path set. Disallowed paths are filtered out *before* any file is opened (security invariant: disallowed file contents are never loaded into memory). The result is the subset of glob matches that are both reachable and allowed:
+
+| Scenario | Behaviour | Exit |
+|---|---|---|
+| Glob matches zero files | Returns `[]` | `rc=0` |
+| Some matched files are outside `allow_paths` | Disallowed files are silently dropped; remaining files are returned | `rc=0` |
+| All matched files are outside `allow_paths` | Returns `[]` | `rc=0` |
+| Glob pattern is absolute (starts with `/`) | Hard error – `ValueError` raised | `rc=1` |
+| A matched file is the calling file (self-reference) | Hard error – circularity `ValueError` raised | `rc=1` |
+| A matched file transitively references the caller | Hard error – circularity `ValueError` raised | `rc=1` |
+
+This design keeps `!reference-all` resilient against partially-populated directory trees while still enforcing absolute-path and circularity invariants as hard failures.
+
 ### Absolute path restrictions
 
 References using absolute paths (e.g., `/tmp/file.yml`) are explicitly rejected with a `ValueError`. All reference paths must be relative to the source file's directory. If you absolutely must reference an absolute path, relative paths to symlinks can be used. Note that their target directories must be explicitly allowed to avoid permission errors (see the above section about "Path restriction and `allow_paths`").
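The first four rows of the outcome table in the hunk above can be exercised with a small stand-alone sketch. It uses only `pathlib` and `tempfile` and does not call into yaml-reference. The function name `allowed_glob_matches` and the directory layout are hypothetical. The assumption that the source file's own directory is always part of the effective allow list is taken from the test comments in this diff, not from the library's public documentation.

```python
import tempfile
from pathlib import Path


def allowed_glob_matches(source: Path, pattern: str, allow_paths: list[Path]) -> list[Path]:
    """Return glob matches (relative to `source`) that lie under an allowed directory."""
    if pattern.startswith("/"):
        # Absolute patterns are a hard error, never a silent omission.
        raise ValueError(f"absolute glob pattern not allowed: {pattern!r}")
    # Assumption: the source file's directory is always allowed by default.
    effective = [source.parent.resolve()] + [p.resolve() for p in allow_paths]
    matches = [p.resolve() for p in source.parent.glob(pattern)]
    # Filter on paths only; nothing is opened or read at this stage.
    kept = [
        m for m in matches
        if m.is_file() and any(m.is_relative_to(a) for a in effective)
    ]
    return sorted(kept, key=str)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    for rel in ("sub/test.yml", "allowed/file.yml", "blocked/file.yml"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("kind: demo\n")
    source = root / "sub" / "test.yml"

    # Row 1: the glob matches nothing.
    empty = allowed_glob_matches(source, "./nonexistent/*.yml", [])
    # Row 2: only some matches are allowed.
    partial = allowed_glob_matches(source, "../*/file.yml", [root / "allowed"])
    # Row 3: every match is disallowed.
    none_allowed = allowed_glob_matches(source, "../*/file.yml", [root / "other"])
    partial_names = [p.parent.name for p in partial]

print(empty, partial_names, none_allowed)
```

The circularity rows of the table are not modelled here, since they depend on the contents of the matched files rather than on their paths.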
@@ -155,11 +155,73 @@ def test_allow_paths_load_yaml_with_references(stage_files):
     )
     assert data["all"][0]["outside"] == "outside_value"
 
-    # Test with allow_paths that doesn't include the referenced file (should fail, !reference-all)
-    with pytest.raises(PermissionError):
-        load_yaml_with_references(
-            stg / "inner/with_all.yml", allow_paths=[stg / "some"]
-        )
+    # Test with allow_paths that doesn't include the referenced file (!reference-all):
+    # disallowed files are silently omitted, so the result is an empty list.
+    data = load_yaml_with_references(
+        stg / "inner/with_all.yml", allow_paths=[stg / "some"]
+    )
+    assert data["all"] == []
+
+
+def test_reference_all_empty_glob(stage_files):
+    """Test that !reference-all returns [] without error when glob matches no files."""
+    files = {
+        "test.yml": "data: !reference-all { glob: ./nonexistent/*.yml }",
+    }
+    stg = stage_files(files)
+    data = load_yaml_with_references(stg / "test.yml")
+    assert data["data"] == []
+
+
+def test_reference_all_silently_omits_disallowed_paths(stage_files):
+    """Test that !reference-all silently omits files outside the allowed paths.
+
+    Relative-path violations (glob matches beyond the allowed directory tree) must be
+    dropped *before* reading the file (security invariant) and must NOT trigger an
+    error; the result is simply the subset of paths that are allowed.
+    """
+    files = {
+        # test.yml lives in sub/; globs into ../chapters/ which is outside sub/
+        "sub/test.yml": "data: !reference-all { glob: ../chapters/*.yml }",
+        "chapters/file1.yml": "val: 1",
+        "chapters/file2.yml": "val: 2",
+    }
+    stg = stage_files(files)
+
+    # allow_paths=[stg/"other"] => effective = [stg/"other", stg/"sub"].
+    # Neither covers stg/"chapters", so all matches are silently omitted.
+    data = load_yaml_with_references(stg / "sub/test.yml", allow_paths=[stg / "other"])
+    assert data["data"] == []
+
+    # When we explicitly allow chapters/, both files are included.
+    data = load_yaml_with_references(
+        stg / "sub/test.yml", allow_paths=[stg / "chapters"]
+    )
+    assert len(data["data"]) == 2
+    assert {"val": 1} in data["data"]
+    assert {"val": 2} in data["data"]
+
+
+def test_reference_all_partial_disallow(stage_files):
+    """Test that !reference-all includes only the allowed subset of glob matches.
+
+    When a glob expands to paths spanning multiple directories and only some of those
+    directories are in allow_paths, the disallowed paths are silently omitted while the
+    allowed paths are still included in the result.
+    """
+    files = {
+        "sub/test.yml": "data: !reference-all { glob: ../*/file.yml }",
+        "allowed/file.yml": "kind: allowed",
+        "blocked/file.yml": "kind: blocked",
+    }
+    stg = stage_files(files)
+
+    # Only stg/"allowed" is in allow_paths (plus the default stg/"sub").
+    # stg/"blocked"/file.yml is not covered, so it is silently omitted.
+    data = load_yaml_with_references(
+        stg / "sub/test.yml", allow_paths=[stg / "allowed"]
+    )
+    assert data["data"] == [{"kind": "allowed"}]
 
 
 @pytest.mark.parametrize(
@@ -400,6 +400,32 @@ def _recursively_attribute_location_to_references(data: Any, base_path: Path):
     return data
 
 
+def _is_path_allowed(path: Path, allow_paths: Sequence[Path]) -> bool:
+    """Check whether a resolved path is accessible given the allow_paths configuration.
+
+    Unlike `_check_file_path`, this never raises; it returns `False` for paths that
+    do not exist, are not regular files, or fall outside every entry in *allow_paths*.
+    An empty *allow_paths* sequence means "no directory restrictions" (all existing files
+    are considered allowed).
+
+    Args:
+        path: Resolved, absolute path to check.
+        allow_paths: List of allowed directory paths.
+
+    Returns:
+        True if the path is an accessible file within an allowed directory (or no
+        restrictions are in place). False otherwise.
+    """
+    if not path.exists() or not path.is_file():
+        return False
+    if not allow_paths:
+        return True
+    for allow_path in allow_paths:
+        if path.is_relative_to(allow_path):
+            return True
+    return False
+
+
 def _check_and_track_path(path: Path, visited_paths: set[Path]) -> None:
     """
     Check for circular reference and add path to visited set.
@@ -481,12 +507,26 @@ def _recursively_resolve_references(
     elif isinstance(data, ReferenceAll):
         glob_results = Path(data.location).parent.glob(data.glob)
         abs_paths = [path.resolve() for path in glob_results]
+
+        # Empty glob match -> silent omission, return empty list.
         if not abs_paths:
-            raise FileNotFoundError(
-                f'No files found matching glob pattern "{data.glob}" in directory "{Path(data.location).parent}"'
-            )
-        abs_paths = sorted(abs_paths, key=lambda x: str(x))
+            return []
+
+        # Precompute allowed paths sequence once to avoid repeated list()
+        # construction in the comprehension below.
+        allowed_paths_seq = list(allow_paths)
 
+        # Security invariant: filter out disallowed / nonexistent paths *before*
+        # opening any file. Relative-path violations are silently omitted here;
+        # absolute-path violations are caught earlier in ReferenceAll.__init__.
+        abs_paths = [p for p in abs_paths if _is_path_allowed(p, allowed_paths_seq)]
+
+        # All matched paths were disallowed → silent omission, return empty list.
+        if not abs_paths:
+            return []
+
+        # Sort only the allowed paths to avoid sorting entries that will be dropped.
+        abs_paths = sorted(abs_paths, key=lambda x: str(x))
         resolved_items = []
         for path in abs_paths:
             # Check for circular reference and track path
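A note on why this hunk resolves glob results before filtering: `_is_path_allowed` relies on `Path.is_relative_to`, which compares path components lexically and does not collapse `..` segments. The sketch below demonstrates the difference with made-up paths. It uses `PurePosixPath` and `posixpath.normpath` as a filesystem-free stand-in for `Path.resolve()`, which additionally follows symlinks.

```python
import posixpath
from pathlib import PurePosixPath

allowed = PurePosixPath("/project/sub")

# An unresolved glob result that climbs out of the allowed directory.
raw = PurePosixPath("/project/sub/../chapters/file1.yml")

# Lexical comparison: the leading components match, so the check passes
# even though the file really lives outside /project/sub.
lexical_ok = raw.is_relative_to(allowed)

# Normalizing first exposes the real location, and the check now fails
# as intended.
normalized = PurePosixPath(posixpath.normpath(str(raw)))
normalized_ok = normalized.is_relative_to(allowed)

print(lexical_ok, normalized, normalized_ok)
# prints: True /project/chapters/file1.yml False
```

The same reasoning suggests that entries in `allow_paths` should also be resolved before comparison. Whether the library does so is not visible in this diff.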
File without changes