bundesrecht 0.1.0__tar.gz

Files changed (16)

bundesrecht-0.1.0/LICENSE +21 -0
bundesrecht-0.1.0/PKG-INFO +536 -0
bundesrecht-0.1.0/README.md +515 -0
bundesrecht-0.1.0/bundesrecht/__init__.py +141 -0
bundesrecht-0.1.0/bundesrecht/_corpus.py +347 -0
bundesrecht-0.1.0/bundesrecht/lookup.py +1194 -0
bundesrecht-0.1.0/bundesrecht/normaliser.py +495 -0
bundesrecht-0.1.0/bundesrecht/py.typed +0 -0
bundesrecht-0.1.0/bundesrecht/references.py +528 -0
bundesrecht-0.1.0/bundesrecht.egg-info/PKG-INFO +536 -0
bundesrecht-0.1.0/bundesrecht.egg-info/SOURCES.txt +14 -0
bundesrecht-0.1.0/bundesrecht.egg-info/dependency_links.txt +1 -0
bundesrecht-0.1.0/bundesrecht.egg-info/requires.txt +3 -0
bundesrecht-0.1.0/bundesrecht.egg-info/top_level.txt +1 -0
bundesrecht-0.1.0/pyproject.toml +37 -0
bundesrecht-0.1.0/setup.cfg +4 -0

bundesrecht-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Harshil
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

bundesrecht-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,536 @@
+Metadata-Version: 2.4
+Name: bundesrecht
+Version: 0.1.0
+Summary: Structured parsing, normalisation, and resolution of German federal law references
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/harshildarji/bundesrecht
+Keywords: german law,bundesrecht,legal nlp,normreferenz,gesetze
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Science/Research
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Text Processing :: Linguistic
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Provides-Extra: dev
+Requires-Dist: pytest>=7; extra == "dev"
+Dynamic: license-file
+# bundesrecht
+Python package for parsing, normalising, and resolving German federal law references.
+Zero dependencies. Pure Python 3.10+.
+## Contents
+<!-- no toc -->
+- [Simplified architecture](#simplified-architecture)
+- [Installation](#installation)
+- [Parsing references](#parsing-references)
+- [Data model](#data-model)
+- [Normalising references](#normalising-references)
+- [What the normaliser handles](#what-the-normaliser-handles)
+- [Resolving references](#resolving-references)
+- [Corpus cache](#corpus-cache)
+- [QueryResult](#queryresult)
+- [LawData](#lawdata)
+- [Resolved depth reference](#resolved-depth-reference)
+- [Complete example](#complete-example)
+## Simplified architecture
+The library is built in three layers. The **parser** is the foundation: it identifies the structure of any German citation string. The **normaliser** builds on the parser to expand multi-target citations (ranges, lists, `i.V.m.` chains) and produce canonical strings. The **resolver** builds on both to look up the actual statutory text in the corpus.
+All three layers are exposed as public APIs. Use `parse_reference()` when you only need structured extraction. Use `normalise()` when you need canonical strings without corpus lookup. Use `query()` when you need the actual statutory text.
+<p align="center">
+  <img src="https://raw.githubusercontent.com/harshildarji/bundesrecht/main/examples/architecture.png" alt="Simplified architecture of the bundesrecht library" width="350">
+</p>
+## Installation
+```bash
+pip install bundesrecht
+```
+## Parsing references
+`parse_reference()` parses a raw citation string into a structured `LawReference` object
+without resolving it against any law data.
+```python
+from bundesrecht import parse_reference
+ref = parse_reference('§ 2 Abs. 1 Nr. 1 UrhG')
+ref.law                   # → 'UrhG'
+ref.paragraphs            # → [ParagraphRef(...)]
+str(ref)                  # → '§ 2 Abs. 1 Nr. 1 UrhG'
+para = ref.paragraphs[0]
+para.paragraph            # → '2'
+para.sub_refs             # → [SubReference(Abs, '1'), SubReference(Nr, '1')]
+str(para.sub_refs[0])     # → 'Abs. 1'
+str(para.sub_refs[1])     # → 'Nr. 1'
+```
+## Data model
+Three dataclasses represent a parsed reference at increasing levels of specificity.
+These objects are returned by `parse_reference()` and are also exposed through `QueryResult.reference`.
+### LawReference
+```python
+@dataclass
+class LawReference:
+    paragraphs: list[ParagraphRef]   # one or more paragraphs
+    law: str | None                  # e.g. 'BGB', 'UrhG'
+    raw: str                         # original input string
+```
+### ParagraphRef
+```python
+@dataclass
+class ParagraphRef:
+    paragraph: str                   # '312', '312a', '1'
+    sub_refs: list[SubReference]     # Abs, Satz, Nr, Buchst, etc.
+    range_end: str | None            # set for '§ 312 bis 314'
+    is_ff: bool                      # § 312 ff.
+    is_f: bool                       # § 312 f.
+    ivm_refs: list[SubReference]     # sub-refs after 'iVm' within a paragraph
+```
+### SubReference
+```python
+@dataclass
+class SubReference:
+    level: str              # 'Abs', 'Satz', 'Nr', 'Buchst', 'Alt', 'Halbsatz'
+    number: str             # '1', '2', 'a', '1a'
+    range_end: str | None   # set for 'Abs. 2 bis 4'
+```
+String representations:
+| level    | example output |
+| -------- | -------------- |
+| Abs      | `Abs. 2`       |
+| Satz     | `Satz 1`       |
+| Nr       | `Nr. 3`        |
+| Buchst   | `Buchst. a`    |
+| Alt      | `Alt. 1`       |
+| Halbsatz | `Halbsatz 2`   |
+## Normalising references
+`normalise()` works directly, without loading any law data.
+```python
+from bundesrecht import normalise
+normalise('§ 312 i.V.m. § 355 BGB')
+# → ['§ 312 BGB', '§ 355 BGB']
+normalise('§§ 12-15 BGB')
+# → ['§ 12 BGB', '§ 13 BGB', '§ 14 BGB', '§ 15 BGB']
+normalise('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
+# → ['§ 2 Abs. 1 Nr. 1 UrhG', '§ 2 Abs. 1 Nr. 7 UrhG', '§ 2 Abs. 2 UrhG']
+normalise('§§ 137 S. 2, 398, 903 BGB')
+# → ['§ 137 Satz 2 BGB', '§ 398 BGB', '§ 903 BGB']
+normalise('§§ 46 Abs. 2 ArbGG, 91 Abs. 1 ZPO')
+# → ['§ 46 Abs. 2 ArbGG', '§ 91 Abs. 1 ZPO']
+# iVm variants - all recognised
+normalise('§ 1 iVm § 2 BGB')
+normalise('§ 1 i.V.m. § 2 BGB')
+normalise('§ 1 i. V. m. § 2 BGB')
+# → ['§ 1 BGB', '§ 2 BGB']  in all cases
+# S. expands to Satz
+normalise('§ 1 S. 2 BGB')
+# → ['§ 1 Satz 2 BGB']
+# f. always expands to exactly 2 paragraphs
+normalise('§ 312 f. BGB')
+# → ['§ 312 BGB', '§ 313 BGB']
+# ff. is preserved by default - pass ff_expansion to expand
+normalise('§ 312 ff. BGB')
+# → ['§ 312 ff. BGB']
+normalise('§ 312 ff. BGB', ff_expansion=3)
+# → ['§ 312 BGB', '§ 313 BGB', '§ 314 BGB']
+normalise('§ 312 ff. BGB', ff_expansion=5)
+# → ['§ 312 BGB', '§ 313 BGB', '§ 314 BGB', '§ 315 BGB', '§ 316 BGB']
+```
+## What the normaliser handles
+| Input form                        | Output                                     |
+| --------------------------------- | ------------------------------------------ |
+| `§ 312 i.V.m. § 355 BGB`          | `['§ 312 BGB', '§ 355 BGB']`               |
+| `§ 312 iVm § 355 BGB`             | `['§ 312 BGB', '§ 355 BGB']`               |
+| `§§ 12-15 BGB`                    | `['§ 12 BGB', ..., '§ 15 BGB']`            |
+| `§§ 12 bis 15 BGB`                | same                                       |
+| `§§ 137 S. 2, 398 BGB`            | `['§ 137 Satz 2 BGB', '§ 398 BGB']`        |
+| `§§ 46 Abs. 2 ArbGG, 91 ZPO`      | `['§ 46 Abs. 2 ArbGG', '§ 91 ZPO']`        |
+| `§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2` | three separate canonical refs              |
+| `§ 1 S. 2 BGB`                    | `['§ 1 Satz 2 BGB']`                       |
+| `§ 312 f. BGB`                    | `['§ 312 BGB', '§ 313 BGB']`               |
+| `§ 312 ff. BGB`                   | `['§ 312 ff. BGB']` (preserved by default) |
+| `§ 312 ff. BGB` (ff_expansion=3)  | `['§ 312 BGB', '§ 313 BGB', '§ 314 BGB']`  |
+| `§312 BGB` (no space)             | `['§ 312 BGB']`                            |
+Ranges with letter suffixes (`§§ 12a-12c`) are left unchanged because
+intermediate values are not predictable.
+## Resolving references
+`Bundesrecht` is the dataset-backed entry point for resolving references.
+Load once, query as many times as you like.
+```python
+from bundesrecht import Bundesrecht
+lib = Bundesrecht()
+```
+By default, `Bundesrecht()` uses the corpus version pinned to the installed
+package. It loads the compatible cached corpus if present, or downloads the
+matching public `gesetze.jsonl` from Hugging Face on first use.
+For offline or reproducible work with an explicit corpus file:
+```python
+lib = Bundesrecht(local_path='data/gesetze.jsonl')
+```
+### lib.query(raw)
+Normalises a raw citation string and resolves each canonical reference.
+Returns `list[QueryResult]`.
+```python
+# Simple paragraph
+results = lib.query('§ 242 BGB')
+# Paragraph + Absatz
+results = lib.query('§ 433 Abs. 1 BGB')
+# Paragraph + Absatz + Nummer
+results = lib.query('§ 2 Abs. 1 Nr. 1 UrhG')
+# Multi-target: expands into 3 separate results
+results = lib.query('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
+# → QueryResult for § 2 Abs. 1 Nr. 1 UrhG
+# → QueryResult for § 2 Abs. 1 Nr. 7 UrhG
+# → QueryResult for § 2 Abs. 2 UrhG
+# i.V.m.: expands into 2 separate results
+results = lib.query('§ 312 i.V.m. § 355 BGB')
+# → QueryResult for § 312 BGB
+# → QueryResult for § 355 BGB
+# §§ range: expands into one result per paragraph
+results = lib.query('§§ 12-15 BGB')
+# → § 12, § 13, § 14, § 15
+# §§ with separate laws per chunk
+results = lib.query('§§ 46 Abs. 2 ArbGG, 91 Abs. 1 ZPO')
+# → § 46 Abs. 2 ArbGG
+# → § 91 Abs. 1 ZPO
+# Satz reference
+results = lib.query('§ 1 Satz 2 BGB')
+# Buchstabe reference
+results = lib.query('§ 2 Abs. 1 Nr. 1 Buchst. a UrhG')
+```
+### lib.query_canonical(canonical)
+Skips normalisation and resolves a pre-cleaned reference directly.
+Use this when you have already normalised the string yourself.
+```python
+results = lib.query_canonical('§ 2 Abs. 1 Nr. 1 UrhG')
+```
+### lib.normalise(raw)
+Normalises a citation string without resolving it.
+Returns `list[str]` of canonical strings.
+```python
+lib.normalise('§ 312 i.V.m. § 355 BGB')
+# → ['§ 312 BGB', '§ 355 BGB']
+lib.normalise('§§ 12-15 BGB')
+# → ['§ 12 BGB', '§ 13 BGB', '§ 14 BGB', '§ 15 BGB']
+lib.normalise('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
+# → ['§ 2 Abs. 1 Nr. 1 UrhG', '§ 2 Abs. 1 Nr. 7 UrhG', '§ 2 Abs. 2 UrhG']
+```
+### lib.get_law(abbreviation)
+Returns a `LawData` object for a law by its abbreviation. Case-insensitive.
+Returns `None` if not found.
+```python
+bgb = lib.get_law('BGB')
+bgb = lib.get_law('bgb')   # same result
+```
+### lib.available_laws
+Sorted list of all law abbreviations currently loaded.
+```python
+lib.available_laws[:5]
+# → ['1-DM-GOLDMÜNZG', '1. BESVNG', '1. BIMSCHV', '1. BMELDDÜV', '1. DV LUFTBO']
+```
+### lib.law_count
+Number of distinct laws loaded.
+```python
+lib.law_count   # → 6873
+```
+## Corpus cache
+The PyPI package ships code only. It does not bundle the full corpus and does
+not download data during installation.
+On the first `Bundesrecht()` call, the package checks a commit-keyed cache:
+```text
+~/.cache/bundesrecht/<pinned-data-commit>/gesetze.jsonl
+```
+If the compatible file is missing, it downloads the exact Hugging Face dataset
+commit pinned by this package version and validates the JSONL structure before
+loading it. Later calls reuse the cached file.
+To choose a different cache root, set:
+```bash
+export BUNDESRECHT_CACHE_DIR=/path/to/cache
+```
+To avoid network access entirely, pass a local file:
+```python
+lib = Bundesrecht(local_path='data/gesetze.jsonl')
+```
+Local files are validated before loading. If a local file does not match the
+expected corpus shape, use `Bundesrecht()` to load the package-managed corpus.
+## QueryResult
+Returned by `query()` and `query_canonical()`. One object per resolved reference.
+```python
+r = lib.query('§ 433 Abs. 1 BGB')[0]
+```
+### r.full_text()
+Returns the text at the resolved depth - Satz text if a Satz was resolved,
+Nummer text if a Nummer was resolved, Absatz text if an Absatz was resolved,
+or the full section content if only the paragraph was found.
+```python
+r.full_text()
+# → 'Durch den Kaufvertrag wird der Verkäufer einer Sache verpflichtet...'
+```
+### r.titel()
+Returns the section heading (Überschrift), if one exists.
+```python
+r.titel()
+# → 'Vertragstypische Pflichten beim Kaufvertrag'
+```
+### r.resolved_depth
+String indicating how deeply the reference was resolved.
+One of: `'section'`, `'absatz'`, `'satz'`, `'nummer'`, `'buchstabe'`, `'unterbuchstabe'`.
+```python
+r.resolved_depth   # → 'absatz'  (Absatz found, but no Nummer requested)
+```
+### r.resolution_note
+Human-readable explanation when the requested depth was not fully resolved.
+Empty string when resolution was complete.
+```python
+r.resolution_note
+# → ''  (fully resolved)
+# → 'Buchstabe c not found in Nr. 1'  (partial resolution)
+```
+### r.reference
+The parsed `LawReference` object for this result.
+```python
+r.reference.law           # → 'BGB'
+r.reference.paragraphs    # → [ParagraphRef(paragraph='433', ...)]
+str(r.reference)          # → '§ 433 Abs. 1 BGB'
+```
+### r.law_data
+The `LawData` object for the parent statute.
+```python
+r.law_data.jurabk                            # → 'BGB'
+r.law_data.gesetze_id                        # → 'BGB::BJNR001950896'
+r.law_data.metadaten.get('langtitel')         # → 'Bürgerliches Gesetzbuch'
+r.law_data.metadaten.get('ausfertigung_datum') # → '1896-08-18'
+len(r.law_data.sections)                     # → 2541
+```
+### r.section
+Raw dict of the resolved section, or `None` if not found.
+```python
+r.section.get('titel')    # same as r.titel()
+r.section.get('content')  # list of content blocks
+```
+### r.resolved_para
+The specific `ParagraphRef` that was matched (after multi-target expansion).
+```python
+str(r.resolved_para)   # → '433 Abs. 1'
+```
+## LawData
+Returned by `lib.get_law()` and available as `result.law_data`.
+```python
+bgb = lib.get_law('BGB')
+```
+### Attributes
+```python
+bgb.jurabk        # → 'BGB'          abbreviation
+bgb.gesetze_id    # → 'BGB::BJNR001950896'  internal corpus ID
+bgb.metadaten     # → dict            full metadata
+bgb.sections      # → dict            all sections keyed by paragraph string
+bgb.fussnoten     # → list            footnotes at law level
+bgb.quelle        # → dict            source metadata
+```
+### Useful metadaten keys
+```python
+bgb.metadaten.get('langtitel')                        # → 'Bürgerliches Gesetzbuch'
+bgb.metadaten.get('kurztitel')                        # short title if present
+bgb.metadaten.get('ausfertigung_datum')               # → '1896-08-18'
+bgb.metadaten.get('fundstelle', {}).get('periodikum') # → 'RGBl'
+bgb.metadaten.get('fundstelle', {}).get('zitstelle')  # → '1896, 195'
+```
+### bgb.get_section(paragraph)
+Look up a section by paragraph number string.
+```python
+sec = bgb.get_section('433')
+sec['titel']    # → 'Vertragstypische Pflichten beim Kaufvertrag'
+sec['content']  # → list of Absatz dicts
+```
+### bgb.get_absatz(paragraph, absatz)
+Look up a specific Absatz within a section.
+```python
+abs1 = bgb.get_absatz('433', 1)
+abs1 = bgb.get_absatz('433', '1')   # string also works
+```
+## Resolved depth reference
+| `resolved_depth`   | Meaning                                                |
+| ------------------ | ------------------------------------------------------ |
+| `'section'`        | Only the paragraph was found (no sub-ref match)        |
+| `'absatz'`         | Absatz resolved, Nummer was not requested/found        |
+| `'satz'`           | Satz resolved (a Satz was requested and found)         |
+| `'nummer'`         | Nummer resolved, Buchstabe not requested/found         |
+| `'buchstabe'`      | Buchstabe resolved, Unterbuchstabe not requested/found |
+| `'unterbuchstabe'` | Fully resolved to Unterbuchstabe level (`aa)`, `bb)`)  |
+## Complete example
+```python
+from bundesrecht import Bundesrecht, normalise, parse_reference
+# Load
+lib = Bundesrecht()
+print(lib)   # → Bundesrecht(6873 laws loaded)
+# Parse only
+ref = parse_reference('§ 433 Abs. 1 Satz 1 BGB')
+ref.law                          # → 'BGB'
+ref.paragraphs[0].paragraph      # → '433'
+ref.paragraphs[0].sub_refs       # → [SubReference(Abs,1), SubReference(Satz,1)]
+# Normalise only
+normalise('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
+# → ['§ 2 Abs. 1 Nr. 1 UrhG', '§ 2 Abs. 1 Nr. 7 UrhG', '§ 2 Abs. 2 UrhG']
+# Resolve
+results = lib.query('§ 433 Abs. 1 BGB')
+r = results[0]
+r.titel()          # → 'Vertragstypische Pflichten beim Kaufvertrag'
+r.full_text()      # → actual statutory text of Abs. 1
+r.resolved_depth   # → 'absatz'  (Absatz found, but no Nummer requested)
+str(r.reference)   # → '§ 433 Abs. 1 BGB'
+# Inspect a law directly
+bgb = lib.get_law('BGB')
+bgb.metadaten.get('langtitel')           # → 'Bürgerliches Gesetzbuch'
+bgb.metadaten.get('ausfertigung_datum')  # → '1896-08-18'
+len(bgb.sections)                        # → 2541
+# List all laws
+lib.available_laws[:5]    # → ['1-DM-GOLDMÜNZG', '1. BESVNG', ...]
+lib.law_count             # → 6873
+```
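
Note on the range behaviour documented in the README hunk above: `§§ 12-15 BGB` and `§§ 12 bis 15 BGB` expand to one canonical reference per paragraph, while letter-suffixed ranges such as `§§ 12a-12c` are left unchanged. The sketch below illustrates that rule in isolation. It is not taken from `bundesrecht/normaliser.py`; the helper name `expand_range` and its regular expression are assumptions made for illustration, and the package's actual implementation may differ.

```python
import re

# Hypothetical helper (not part of bundesrecht): matches "§§ A-B LAW" or "§§ A bis B LAW".
_RANGE = re.compile(r"^§§\s*(\d+[a-z]?)\s*(?:-|–|bis)\s*(\d+[a-z]?)\s+(\S.*)$")


def expand_range(raw: str) -> list[str]:
    """Expand a purely numeric paragraph range into one reference per paragraph."""
    match = _RANGE.match(raw.strip())
    if match is None:
        return [raw]  # not a range: return the input unchanged
    start, end, law = match.groups()
    # Letter-suffixed bounds (e.g. 12a-12c) are left unchanged, because the
    # intermediate paragraphs cannot be derived from the bounds alone.
    if not (start.isdigit() and end.isdigit()):
        return [raw]
    first, last = int(start), int(end)
    if last < first:
        return [raw]  # malformed range: leave as-is
    return [f"§ {n} {law}" for n in range(first, last + 1)]


print(expand_range("§§ 12-15 BGB"))
print(expand_range("§§ 12a-12c BGB"))
```

The first call prints four separate references (§ 12 through § 15 BGB); the second returns the input as a single-element list, mirroring the "left unchanged" rule stated in the README.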