bundesrecht 0.1.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Harshil
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,536 @@
1
+ Metadata-Version: 2.4
2
+ Name: bundesrecht
3
+ Version: 0.1.0
4
+ Summary: Structured parsing, normalisation, and resolution of German federal law references
5
+ License-Expression: MIT
6
+ Project-URL: Homepage, https://github.com/harshildarji/bundesrecht
7
+ Keywords: german law,bundesrecht,legal nlp,normreferenz,gesetze
8
+ Classifier: Development Status :: 3 - Alpha
9
+ Classifier: Intended Audience :: Science/Research
10
+ Classifier: Programming Language :: Python :: 3
11
+ Classifier: Programming Language :: Python :: 3.10
12
+ Classifier: Programming Language :: Python :: 3.11
13
+ Classifier: Programming Language :: Python :: 3.12
14
+ Classifier: Topic :: Text Processing :: Linguistic
15
+ Requires-Python: >=3.10
16
+ Description-Content-Type: text/markdown
17
+ License-File: LICENSE
18
+ Provides-Extra: dev
19
+ Requires-Dist: pytest>=7; extra == "dev"
20
+ Dynamic: license-file
21
+
22
+ # bundesrecht
23
+
24
+ Python package for parsing, normalising, and resolving German federal law references.
25
+
26
+ Zero dependencies. Pure Python 3.10+.
27
+
28
+ ## Contents
29
+ <!-- no toc -->
30
+ - [Simplified architecture](#simplified-architecture)
31
+ - [Installation](#installation)
32
+ - [Parsing references](#parsing-references)
33
+ - [Data model](#data-model)
34
+ - [Normalising references](#normalising-references)
35
+ - [What the normaliser handles](#what-the-normaliser-handles)
36
+ - [Resolving references](#resolving-references)
37
+ - [Corpus cache](#corpus-cache)
38
+ - [QueryResult](#queryresult)
39
+ - [LawData](#lawdata)
40
+ - [Resolved depth reference](#resolved-depth-reference)
41
+ - [Complete example](#complete-example)
42
+
43
+
44
+ ## Simplified architecture
45
+
46
+ The library is built in three layers. The **parser** is the foundation: it identifies the structure of a German citation string. The **normaliser** builds on the parser to expand compound citations into canonical strings. The **resolver** builds on both to look up the statutory text in the corpus.
47
+
48
+ All three layers are exposed as public APIs. Use `parse_reference()` when you only need structured extraction. Use `normalise()` when you need canonical strings without corpus lookup. Use `query()` when you need the actual statutory text.
49
+
50
+ <p align="center">
51
+ <img src="https://raw.githubusercontent.com/harshildarji/bundesrecht/main/examples/architecture.png" alt="Simplified architecture of the bundesrecht library" width="350">
52
+ </p>
53
+
54
+
55
+ ## Installation
56
+
57
+ ```bash
58
+ pip install bundesrecht
59
+ ```
60
+
61
+
62
+ ## Parsing references
63
+
64
+ `parse_reference()` parses a raw citation string into a structured `LawReference` object
65
+ without resolving it against any law data.
66
+
67
+ ```python
68
+ from bundesrecht import parse_reference
69
+
70
+ ref = parse_reference('§ 2 Abs. 1 Nr. 1 UrhG')
71
+
72
+ ref.law # → 'UrhG'
73
+ ref.paragraphs # → [ParagraphRef(...)]
74
+ str(ref) # → '§ 2 Abs. 1 Nr. 1 UrhG'
75
+
76
+ para = ref.paragraphs[0]
77
+ para.paragraph # → '2'
78
+ para.sub_refs # → [SubReference(Abs, '1'), SubReference(Nr, '1')]
79
+ str(para.sub_refs[0]) # → 'Abs. 1'
80
+ str(para.sub_refs[1]) # → 'Nr. 1'
81
+ ```
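To make the parsing step concrete, here is a minimal sketch of how a single-paragraph citation could be split into paragraph, sub-references, and law abbreviation with regular expressions. The function name `parse_sketch` and both patterns are assumptions for illustration; they are not the library's implementation and cover only a few sub-reference levels.

```python
import re

# Hypothetical patterns: one paragraph, optional sub-references, and the law
# abbreviation as the final whitespace-separated token.
CITATION = re.compile(r"^§\s*(\d+\w*)\s*(.*?)\s*(\S+)$")
SUB = re.compile(r"(Abs\.|Satz|Nr\.|Buchst\.)\s*(\w+)")


def parse_sketch(raw: str):
    """Return (paragraph, [(level, number), ...], law) for a simple citation."""
    match = CITATION.match(raw.strip())
    if match is None:
        raise ValueError(f"not a simple citation: {raw!r}")
    paragraph, middle, law = match.groups()
    # Drop the trailing dot of abbreviated levels so 'Abs.' becomes 'Abs'.
    subs = [(level.rstrip("."), number) for level, number in SUB.findall(middle)]
    return paragraph, subs, law


print(parse_sketch("§ 2 Abs. 1 Nr. 1 UrhG"))
# ('2', [('Abs', '1'), ('Nr', '1')], 'UrhG')
```

A real parser also has to handle `§§` lists, ranges, and `iVm` chains, which this sketch deliberately ignores.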
82
+
83
+
84
+ ## Data model
85
+
86
+ Three dataclasses represent a parsed reference at increasing levels of specificity.
87
+ These objects are returned by `parse_reference()` and are also exposed through `QueryResult.reference`.
88
+
89
+ ### LawReference
90
+
91
+ ```python
92
+ @dataclass
93
+ class LawReference:
94
+ paragraphs: list[ParagraphRef] # one or more paragraphs
95
+ law: str | None # e.g. 'BGB', 'UrhG'
96
+ raw: str # original input string
97
+ ```
98
+
99
+ ### ParagraphRef
100
+
101
+ ```python
102
+ @dataclass
103
+ class ParagraphRef:
104
+ paragraph: str # '312', '312a', '1'
105
+ sub_refs: list[SubReference] # Abs, Satz, Nr, Buchst, etc.
106
+ range_end: str | None # set for '§ 312 bis 314'
107
+ is_ff: bool # § 312 ff.
108
+ is_f: bool # § 312 f.
109
+ ivm_refs: list[SubReference] # sub-refs after 'iVm' within a paragraph
110
+ ```
111
+
112
+ ### SubReference
113
+
114
+ ```python
115
+ @dataclass
116
+ class SubReference:
117
+ level: str # 'Abs', 'Satz', 'Nr', 'Buchst', 'Alt', 'Halbsatz'
118
+ number: str # '1', '2', 'a', '1a'
119
+ range_end: str # set for 'Abs. 2 bis 4'
120
+ ```
121
+
122
+ String representations:
123
+
124
+ | level | example output |
125
+ | -------- | -------------- |
126
+ | Abs | `Abs. 2` |
127
+ | Satz | `Satz 1` |
128
+ | Nr | `Nr. 3` |
129
+ | Buchst | `Buchst. a` |
130
+ | Alt | `Alt. 1` |
131
+ | Halbsatz | `Halbsatz 2` |
132
+
133
+
134
+ ## Normalising references
135
+
136
+ `normalise()` is available directly, without loading any law data.
137
+
138
+ ```python
139
+ from bundesrecht import normalise
140
+
141
+ normalise('§ 312 i.V.m. § 355 BGB')
142
+ # → ['§ 312 BGB', '§ 355 BGB']
143
+
144
+ normalise('§§ 12-15 BGB')
145
+ # → ['§ 12 BGB', '§ 13 BGB', '§ 14 BGB', '§ 15 BGB']
146
+
147
+ normalise('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
148
+ # → ['§ 2 Abs. 1 Nr. 1 UrhG', '§ 2 Abs. 1 Nr. 7 UrhG', '§ 2 Abs. 2 UrhG']
149
+
150
+ normalise('§§ 137 S. 2, 398, 903 BGB')
151
+ # → ['§ 137 Satz 2 BGB', '§ 398 BGB', '§ 903 BGB']
152
+
153
+ normalise('§§ 46 Abs. 2 ArbGG, 91 Abs. 1 ZPO')
154
+ # → ['§ 46 Abs. 2 ArbGG', '§ 91 Abs. 1 ZPO']
155
+
156
+ # iVm variants - all recognised
157
+ normalise('§ 1 iVm § 2 BGB')
158
+ normalise('§ 1 i.V.m. § 2 BGB')
159
+ normalise('§ 1 i. V. m. § 2 BGB')
160
+ # → ['§ 1 BGB', '§ 2 BGB'] in all cases
161
+
162
+ # S. expands to Satz
163
+ normalise('§ 1 S. 2 BGB')
164
+ # → ['§ 1 Satz 2 BGB']
165
+
166
+ # f. always expands to exactly 2 paragraphs
167
+ normalise('§ 312 f. BGB')
168
+ # → ['§ 312 BGB', '§ 313 BGB']
169
+
170
+ # ff. is preserved by default - pass ff_expansion to expand
171
+ normalise('§ 312 ff. BGB')
172
+ # → ['§ 312 ff. BGB']
173
+
174
+ normalise('§ 312 ff. BGB', ff_expansion=3)
175
+ # → ['§ 312 BGB', '§ 313 BGB', '§ 314 BGB']
176
+
177
+ normalise('§ 312 ff. BGB', ff_expansion=5)
178
+ # → ['§ 312 BGB', '§ 313 BGB', '§ 314 BGB', '§ 315 BGB', '§ 316 BGB']
179
+ ```
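The `iVm` handling shown above can be approximated with one regular expression that tolerates optional dots and spaces. This is a sketch under stated assumptions: the pattern and the helper `split_ivm` are hypothetical, and the law abbreviation is taken to be the last token and shared by every part.

```python
import re

# Matches 'iVm', 'i.V.m.' and 'i. V. m.' when surrounded by whitespace.
IVM = re.compile(r"\s+i\.?\s*V\.?\s*m\.?\s+")


def split_ivm(raw: str) -> list:
    """Split an iVm chain into one canonical string per paragraph."""
    # Assumption: the final whitespace-separated token is the law abbreviation.
    head, law = raw.strip().rsplit(" ", 1)
    return [f"{part.strip()} {law}" for part in IVM.split(head)]


print(split_ivm("§ 312 i.V.m. § 355 BGB"))
# ['§ 312 BGB', '§ 355 BGB']
```

Citations whose parts name different laws would need per-part law detection, which is outside this sketch.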
180
+
181
+
182
+ ## What the normaliser handles
183
+
184
+ | Input form | Output |
185
+ | --------------------------------- | ------------------------------------------ |
186
+ | `§ 312 i.V.m. § 355 BGB` | `['§ 312 BGB', '§ 355 BGB']` |
187
+ | `§ 312 iVm § 355 BGB` | `['§ 312 BGB', '§ 355 BGB']` |
188
+ | `§§ 12-15 BGB` | `['§ 12 BGB', ..., '§ 15 BGB']` |
189
+ | `§§ 12 bis 15 BGB` | same |
190
+ | `§§ 137 S. 2, 398 BGB` | `['§ 137 Satz 2 BGB', '§ 398 BGB']` |
191
+ | `§§ 46 Abs. 2 ArbGG, 91 ZPO` | `['§ 46 Abs. 2 ArbGG', '§ 91 ZPO']` |
192
+ | `§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2` | three separate canonical refs |
193
+ | `§ 1 S. 2 BGB` | `['§ 1 Satz 2 BGB']` |
194
+ | `§ 312 f. BGB` | `['§ 312 BGB', '§ 313 BGB']` |
195
+ | `§ 312 ff. BGB` | `['§ 312 ff. BGB']` (preserved by default) |
196
+ | `§ 312 ff. BGB` (ff_expansion=3) | `['§ 312 BGB', '§ 313 BGB', '§ 314 BGB']` |
197
+ | `§312 BGB` (no space) | `['§ 312 BGB']` |
198
+
199
+ Ranges with letter suffixes (`§§ 12a-12c`) are left unchanged because
200
+ intermediate values are not predictable.
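The range and `f.`/`ff.` rules above can be sketched in a few lines. The helpers `expand_range` and `expand_following` are hypothetical names, and the pattern only covers a single `§§ a-b LAW` or `§§ a bis b LAW` range; the behaviour mirrors the table, not the library's code.

```python
import re
from typing import Optional

RANGE = re.compile(r"^§§\s*(\w+)\s*(?:-|bis)\s*(\w+)\s+(\S+)$")


def expand_range(raw: str) -> list:
    """Expand a purely numeric range; leave anything else unchanged."""
    match = RANGE.match(raw.strip())
    if match is None:
        return [raw]
    start, end, law = match.groups()
    if not (start.isdigit() and end.isdigit()):
        return [raw]  # letter suffixes: intermediate values are unknown
    return [f"§ {n} {law}" for n in range(int(start), int(end) + 1)]


def expand_following(paragraph: int, law: str, ff: bool,
                     ff_expansion: Optional[int] = None) -> list:
    """'f.' yields two paragraphs; 'ff.' is kept unless a count is given."""
    if ff and ff_expansion is None:
        return [f"§ {paragraph} ff. {law}"]
    count = ff_expansion if ff else 2
    return [f"§ {n} {law}" for n in range(paragraph, paragraph + count)]


print(expand_range("§§ 12-15 BGB"))
# ['§ 12 BGB', '§ 13 BGB', '§ 14 BGB', '§ 15 BGB']
print(expand_range("§§ 12a-12c BGB"))
# ['§§ 12a-12c BGB']
print(expand_following(312, "BGB", ff=True, ff_expansion=3))
# ['§ 312 BGB', '§ 313 BGB', '§ 314 BGB']
```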
201
+
202
+
203
+ ## Resolving references
204
+
205
+ `Bundesrecht` is the dataset-backed entry point for resolving references.
206
+ Load once, query as many times as you like.
207
+
208
+ ```python
209
+ from bundesrecht import Bundesrecht
210
+
211
+ lib = Bundesrecht()
212
+ ```
213
+
214
+ By default, `Bundesrecht()` uses the corpus version pinned to the installed
215
+ package. It loads the compatible cached corpus if present, or downloads the
216
+ matching public `gesetze.jsonl` from Hugging Face on first use.
217
+
218
+ For offline or reproducible work with an explicit corpus file:
219
+
220
+ ```python
221
+ lib = Bundesrecht(local_path='data/gesetze.jsonl')
222
+ ```
223
+
224
+ ### lib.query(raw)
225
+
226
+ Normalises a raw citation string and resolves each canonical reference.
227
+ Returns `list[QueryResult]`.
228
+
229
+ ```python
230
+ # Simple paragraph
231
+ results = lib.query('§ 242 BGB')
232
+
233
+ # Paragraph + Absatz
234
+ results = lib.query('§ 433 Abs. 1 BGB')
235
+
236
+ # Paragraph + Absatz + Nummer
237
+ results = lib.query('§ 2 Abs. 1 Nr. 1 UrhG')
238
+
239
+ # Multi-target: expands into 3 separate results
240
+ results = lib.query('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
241
+ # → QueryResult for § 2 Abs. 1 Nr. 1 UrhG
242
+ # → QueryResult for § 2 Abs. 1 Nr. 7 UrhG
243
+ # → QueryResult for § 2 Abs. 2 UrhG
244
+
245
+ # i.V.m.: expands into 2 separate results
246
+ results = lib.query('§ 312 i.V.m. § 355 BGB')
247
+ # → QueryResult for § 312 BGB
248
+ # → QueryResult for § 355 BGB
249
+
250
+ # §§ range: expands into one result per paragraph
251
+ results = lib.query('§§ 12-15 BGB')
252
+ # → § 12, § 13, § 14, § 15
253
+
254
+ # §§ with separate laws per chunk
255
+ results = lib.query('§§ 46 Abs. 2 ArbGG, 91 Abs. 1 ZPO')
256
+ # → § 46 Abs. 2 ArbGG
257
+ # → § 91 Abs. 1 ZPO
258
+
259
+ # Satz reference
260
+ results = lib.query('§ 1 Satz 2 BGB')
261
+
262
+ # Buchstabe reference
263
+ results = lib.query('§ 2 Abs. 1 Nr. 1 Buchst. a UrhG')
264
+ ```
265
+
266
+ ### lib.query_canonical(canonical)
267
+
268
+ Skips normalisation and resolves a pre-cleaned reference directly.
269
+ Use this when you have already normalised the string yourself.
270
+
271
+ ```python
272
+ results = lib.query_canonical('§ 2 Abs. 1 Nr. 1 UrhG')
273
+ ```
274
+
275
+ ### lib.normalise(raw)
276
+
277
+ Normalises a citation string without resolving it.
278
+ Returns `list[str]` of canonical strings.
279
+
280
+ ```python
281
+ lib.normalise('§ 312 i.V.m. § 355 BGB')
282
+ # → ['§ 312 BGB', '§ 355 BGB']
283
+
284
+ lib.normalise('§§ 12-15 BGB')
285
+ # → ['§ 12 BGB', '§ 13 BGB', '§ 14 BGB', '§ 15 BGB']
286
+
287
+ lib.normalise('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
288
+ # → ['§ 2 Abs. 1 Nr. 1 UrhG', '§ 2 Abs. 1 Nr. 7 UrhG', '§ 2 Abs. 2 UrhG']
289
+ ```
290
+
291
+ ### lib.get_law(abbreviation)
292
+
293
+ Returns a `LawData` object for a law by its abbreviation. Case-insensitive.
294
+ Returns `None` if not found.
295
+
296
+ ```python
297
+ bgb = lib.get_law('BGB')
298
+ bgb = lib.get_law('bgb') # same result
299
+ ```
300
+
301
+ ### lib.available_laws
302
+
303
+ Sorted list of all law abbreviations currently loaded.
304
+
305
+ ```python
306
+ lib.available_laws[:5]
307
+ # → ['1-DM-GOLDMÜNZG', '1. BESVNG', '1. BIMSCHV', '1. BMELDDÜV', '1. DV LUFTBO']
308
+ ```
309
+
310
+ ### lib.law_count
311
+
312
+ Number of distinct laws loaded.
313
+
314
+ ```python
315
+ lib.law_count # → 6873
316
+ ```
317
+
318
+
319
+ ## Corpus cache
320
+
321
+ The PyPI package ships code only. It does not bundle the full corpus and does
322
+ not download data during installation.
323
+
324
+ On first `Bundesrecht()` use, the package checks a commit-keyed cache:
325
+
326
+ ```text
327
+ ~/.cache/bundesrecht/<pinned-data-commit>/gesetze.jsonl
328
+ ```
329
+
330
+ If the compatible file is missing, it downloads the exact Hugging Face dataset
331
+ commit pinned by this package version and validates the JSONL structure before
332
+ loading it. Later calls reuse the cached file.
333
+
334
+ To choose a different cache root, set:
335
+
336
+ ```bash
337
+ export BUNDESRECHT_CACHE_DIR=/path/to/cache
338
+ ```
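As an illustration of the commit-keyed layout, the following sketch computes where a cached corpus file would live. The helper `corpus_cache_path` and its `data_commit` argument are hypothetical, and it assumes that a custom `BUNDESRECHT_CACHE_DIR` keeps the same `<commit>/gesetze.jsonl` layout beneath it.

```python
import os
from pathlib import Path


def corpus_cache_path(data_commit: str) -> Path:
    """Return the expected cache location for a given dataset commit."""
    root = os.environ.get("BUNDESRECHT_CACHE_DIR")
    # Fall back to the default root named above when no override is set.
    base = Path(root) if root else Path.home() / ".cache" / "bundesrecht"
    return base / data_commit / "gesetze.jsonl"


path = corpus_cache_path("example-commit")  # placeholder, not a real commit
print(path.exists())  # False until a corpus has been downloaded to that path
```

Keying the directory by commit means a package upgrade that pins a new dataset commit does not silently reuse an older, incompatible file.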
339
+
340
+ To avoid network access entirely, pass a local file:
341
+
342
+ ```python
343
+ lib = Bundesrecht(local_path='data/gesetze.jsonl')
344
+ ```
345
+
346
+ Local files are validated before loading. If a local file does not match the
347
+ expected corpus shape, use `Bundesrecht()` to load the package-managed corpus.
348
+
349
+
350
+ ## QueryResult
351
+
352
+ Returned by `query()` and `query_canonical()`. One object per resolved reference.
353
+
354
+ ```python
355
+ r = lib.query('§ 433 Abs. 1 BGB')[0]
356
+ ```
357
+
358
+ ### r.full_text()
359
+
360
+ Returns the text at the resolved depth: Satz text if a Satz was resolved,
361
+ Nummer text if a Nummer was resolved, Absatz text if an Absatz was resolved,
362
+ or the full section content if only the paragraph was found.
363
+
364
+ ```python
365
+ r.full_text()
366
+ # → 'Durch den Kaufvertrag wird der Verkäufer einer Sache verpflichtet...'
367
+ ```
368
+
369
+ ### r.titel()
370
+
371
+ Returns the section heading (Überschrift), if one exists.
372
+
373
+ ```python
374
+ r.titel()
375
+ # → 'Vertragstypische Pflichten beim Kaufvertrag'
376
+ ```
377
+
378
+ ### r.resolved_depth
379
+
380
+ String indicating how deeply the reference was resolved.
381
+ One of: `'section'`, `'absatz'`, `'satz'`, `'nummer'`, `'buchstabe'`, `'unterbuchstabe'`.
382
+
383
+ ```python
384
+ r.resolved_depth # → 'absatz' (Absatz found, but no Nummer requested)
385
+ ```
386
+
387
+ ### r.resolution_note
388
+
389
+ Human-readable explanation when the requested depth was not fully resolved.
390
+ Empty string when resolution was complete.
391
+
392
+ ```python
393
+ r.resolution_note
394
+ # → '' (fully resolved)
395
+ # → 'Buchstabe c not found in Nr. 1' (partial resolution)
396
+ ```
397
+
398
+ ### r.reference
399
+
400
+ The parsed `LawReference` object for this result.
401
+
402
+ ```python
403
+ r.reference.law # → 'BGB'
404
+ r.reference.paragraphs # → [ParagraphRef(paragraph='433', ...)]
405
+ str(r.reference) # → '§ 433 Abs. 1 BGB'
406
+ ```
407
+
408
+ ### r.law_data
409
+
410
+ The `LawData` object for the parent statute.
411
+
412
+ ```python
413
+ r.law_data.jurabk # → 'BGB'
414
+ r.law_data.gesetze_id # → 'BGB::BJNR001950896'
415
+ r.law_data.metadaten.get('langtitel') # → 'Bürgerliches Gesetzbuch'
416
+ r.law_data.metadaten.get('ausfertigung_datum') # → '1896-08-18'
417
+ len(r.law_data.sections) # → 2541
418
+ ```
419
+
420
+ ### r.section
421
+
422
+ Raw dict of the resolved section, or `None` if not found.
423
+
424
+ ```python
425
+ r.section.get('titel') # same as r.titel()
426
+ r.section.get('content') # list of content blocks
427
+ ```
428
+
429
+ ### r.resolved_para
430
+
431
+ The specific `ParagraphRef` that was matched (after multi-target expansion).
432
+
433
+ ```python
434
+ str(r.resolved_para) # → '433 Abs. 1'
435
+ ```
436
+
437
+
438
+ ## LawData
439
+
440
+ Returned by `lib.get_law()` and available as `result.law_data`.
441
+
442
+ ```python
443
+ bgb = lib.get_law('BGB')
444
+ ```
445
+
446
+ ### Attributes
447
+
448
+ ```python
449
+ bgb.jurabk # → 'BGB' abbreviation
450
+ bgb.gesetze_id # → 'BGB::BJNR001950896' internal corpus ID
451
+ bgb.metadaten # → dict full metadata
452
+ bgb.sections # → dict all sections keyed by paragraph string
453
+ bgb.fussnoten # → list footnotes at law level
454
+ bgb.quelle # → dict source metadata
455
+ ```
456
+
457
+ ### Useful metadaten keys
458
+
459
+ ```python
460
+ bgb.metadaten.get('langtitel') # → 'Bürgerliches Gesetzbuch'
461
+ bgb.metadaten.get('kurztitel') # short title if present
462
+ bgb.metadaten.get('ausfertigung_datum') # → '1896-08-18'
463
+ bgb.metadaten.get('fundstelle', {}).get('periodikum') # → 'RGBl'
464
+ bgb.metadaten.get('fundstelle', {}).get('zitstelle') # → '1896, 195'
465
+
466
+ ```
467
+
468
+ ### bgb.get_section(paragraph)
469
+
470
+ Looks up a section by paragraph number string.
471
+
472
+ ```python
473
+ sec = bgb.get_section('433')
474
+ sec['titel'] # → 'Vertragstypische Pflichten beim Kaufvertrag'
475
+ sec['content'] # → list of Absatz dicts
476
+ ```
477
+
478
+ ### bgb.get_absatz(paragraph, absatz)
479
+
480
+ Looks up a specific Absatz within a section.
481
+
482
+ ```python
483
+ abs1 = bgb.get_absatz('433', 1)
484
+ abs1 = bgb.get_absatz('433', '1') # string also works
485
+ ```
486
+
487
+
488
+ ## Resolved depth reference
489
+
490
+ | `resolved_depth` | Meaning |
491
+ | ------------------ | ------------------------------------------------------ |
492
+ | `'section'` | Only the paragraph was found (no sub-ref match) |
493
+ | `'absatz'` | Absatz resolved, Nummer was not requested/found |
494
+ | `'nummer'` | Nummer resolved, Buchstabe not requested/found |
495
+ | `'buchstabe'` | Buchstabe resolved, Unterbuchstabe not requested/found |
496
+ | `'unterbuchstabe'` | Fully resolved to Unterbuchstabe level (`aa)`, `bb)`) |
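The relationship between the requested depth, `resolved_depth`, and `resolution_note` can be sketched as a walk down a nested structure that stops at the deepest level it can match. Everything here is hypothetical: the toy `SECTION` layout, its key names, and the `resolve` helper are invented for illustration and do not mirror the corpus schema or the library's resolver.

```python
# Toy section: Absatz -> Nummer -> Buchstabe, with text at every level.
SECTION = {
    "text": "section text",
    "absatz": {
        "1": {
            "text": "Abs. 1 text",
            "nummer": {
                "1": {
                    "text": "Nr. 1 text",
                    "buchstabe": {"a": {"text": "Buchst. a text"}},
                },
            },
        },
    },
}

LEVELS = ["absatz", "nummer", "buchstabe"]


def resolve(section: dict, wanted: dict) -> tuple:
    """Return (text, resolved_depth, resolution_note)."""
    node, depth = section, "section"
    for level in LEVELS:
        key = wanted.get(level)
        if key is None:
            break  # nothing deeper was requested
        child = node.get(level, {}).get(key)
        if child is None:
            # Partial resolution: report the deepest level that matched.
            return node["text"], depth, f"{level} {key} not found"
        node, depth = child, level
    return node["text"], depth, ""


print(resolve(SECTION, {"absatz": "1"}))
# ('Abs. 1 text', 'absatz', '')
print(resolve(SECTION, {"absatz": "1", "nummer": "1", "buchstabe": "c"}))
# ('Nr. 1 text', 'nummer', 'buchstabe c not found')
```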
497
+
498
+
499
+ ## Complete example
500
+
501
+ ```python
502
+ from bundesrecht import Bundesrecht, normalise, parse_reference
503
+
504
+ # Load
505
+ lib = Bundesrecht()
506
+ print(lib) # → Bundesrecht(6873 laws loaded)
507
+
508
+ # Parse only
509
+ ref = parse_reference('§ 433 Abs. 1 Satz 1 BGB')
510
+ ref.law # → 'BGB'
511
+ ref.paragraphs[0].paragraph # → '433'
512
+ ref.paragraphs[0].sub_refs # → [SubReference(Abs,1), SubReference(Satz,1)]
513
+
514
+ # Normalise only
515
+ normalise('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
516
+ # → ['§ 2 Abs. 1 Nr. 1 UrhG', '§ 2 Abs. 1 Nr. 7 UrhG', '§ 2 Abs. 2 UrhG']
517
+
518
+ # Resolve
519
+ results = lib.query('§ 433 Abs. 1 BGB')
520
+ r = results[0]
521
+
522
+ r.titel() # → 'Vertragstypische Pflichten beim Kaufvertrag'
523
+ r.full_text() # → actual statutory text of Abs. 1
524
+ r.resolved_depth # → 'absatz' (Absatz found, but no Nummer requested)
525
+ str(r.reference) # → '§ 433 Abs. 1 BGB'
526
+
527
+ # Inspect a law directly
528
+ bgb = lib.get_law('BGB')
529
+ bgb.metadaten.get('langtitel') # → 'Bürgerliches Gesetzbuch'
530
+ bgb.metadaten.get('ausfertigung_datum') # → '1896-08-18'
531
+ len(bgb.sections) # → 2541
532
+
533
+ # List all laws
534
+ lib.available_laws[:5] # → ['1-DM-GOLDMÜNZG', '1. BESVNG', ...]
535
+ lib.law_count # → 6873
536
+ ```