quiz_gen-0.1.5-py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
File without changes
@@ -0,0 +1,395 @@
1
+ Metadata-Version: 2.4
2
+ Name: quiz-gen
3
+ Version: 0.1.5
4
+ Summary: AI-powered quiz generator for regulatory, certification, and educational documentation
5
+ Author-email: Yauheniya Varabyova <yauheniya.ai@gmail.com>
6
+ License-Expression: MIT
7
+ Project-URL: Documentation, https://quiz-gen.readthedocs.io
8
+ Project-URL: Repository, https://github.com/yauheniya-ai/quiz-gen
9
+ Project-URL: Bug Tracker, https://github.com/yauheniya-ai/quiz-gen/issues
10
+ Keywords: quiz,regulation,certification,education,eur-lex,cfr
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Education
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Programming Language :: Python :: 3.13
19
+ Requires-Python: >=3.10
20
+ Description-Content-Type: text/markdown
21
+ License-File: LICENSE
22
+ Requires-Dist: beautifulsoup4>=4.12.0
23
+ Requires-Dist: lxml>=5.0.0
24
+ Requires-Dist: requests>=2.31.0
25
+ Provides-Extra: dev
26
+ Requires-Dist: pytest>=7.4.0; extra == "dev"
27
+ Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
28
+ Requires-Dist: black>=23.0.0; extra == "dev"
29
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
30
+ Requires-Dist: mypy>=1.5.0; extra == "dev"
31
+ Requires-Dist: pre-commit>=3.4.0; extra == "dev"
32
+ Requires-Dist: twine>=4.0.0; extra == "dev"
33
+ Requires-Dist: mkdocs; extra == "dev"
34
+ Requires-Dist: mkdocs-material; extra == "dev"
35
+ Dynamic: license-file
36
+
37
+ # quiz-gen
38
+
39
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
40
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
41
+ [![PyPI version](https://img.shields.io/pypi/v/quiz-gen?color=blue&label=PyPI)](https://pypi.org/project/quiz-gen/)
42
+ [![GitHub last commit](https://img.shields.io/github/last-commit/yauheniya-ai/quiz-gen)](https://github.com/yauheniya-ai/quiz-gen/commits/main)
43
+ [![Downloads](https://pepy.tech/badge/quiz-gen)](https://pepy.tech/project/quiz-gen)
44
+
45
+
46
+ AI-powered quiz generator for regulatory, certification, and educational documentation. Extract structured content from complex legal and technical documents to create comprehensive learning materials.
47
+
48
+ ## Features
49
+
50
+ - **EUR-Lex Document Parser**: Parse and structure European Union legal documents with full table of contents extraction
51
+ - **Hierarchical Document Analysis**: Automatically identify document structure including chapters, sections, articles, and recitals
52
+ - **Intelligent Chunking**: Extract meaningful content chunks at appropriate granularity levels (articles and recitals)
53
+ - **Table of Contents Generation**: Build complete document navigation structure with 3-level hierarchy
54
+ - **Regulatory Document Support**: Specialized parsing for aviation regulations, directives, and other technical documentation
55
+
56
+ ## Installation
57
+
58
+ ```bash
59
+ pip install quiz-gen
60
+ ```
61
+
62
+ ## Quick Start
63
+
64
+ ### Parsing EUR-Lex Documents
65
+
66
+ ```python
67
+ from quiz_gen import EURLexParser
68
+
69
+ # Parse a regulation document
70
+ url = "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=OJ:L_202401689"
71
+ parser = EURLexParser(url=url)
72
+ chunks, toc = parser.parse()
73
+
74
+ # Access structured content
75
+ print(f"Extracted {len(chunks)} content chunks")
76
+ print(f"Document has {len(toc['sections'])} major sections")
77
+
78
+ # Save results
79
+ parser.save_chunks('output_chunks.json')
80
+ parser.save_toc('output_toc.json')
81
+ ```
82
+
83
+ ### Document Structure
84
+
85
+ The parser extracts documents into a multi-level hierarchy:
86
+
87
+ **Level 1**: Major Sections
88
+ - Preamble
89
+ - Enacting Terms
90
+
91
+ **Level 2/3**: Structural Divisions
92
+ - Chapters
93
+ - Sections
94
+
95
+ **Level 1/2/3/4**: Content Elements
96
+ - Title
97
+ - Citation
98
+ - Recitals
99
+ - Articles
100
+ - Concluding formulas
101
+ - Annex
102
+ - Appendix
103
+
104
+ ### Working with Chunks
105
+
106
+ ```python
107
+ # Iterate through extracted chunks
108
+ for chunk in chunks:
109
+     print(f"{chunk.title}")
110
+     print(f"Type: {chunk.section_type.value}")
111
+     print(f"Number: {chunk.number}")
112
+     print(f"Content: {chunk.content[:200]}...")
113
+     print(f"Hierarchy: {' > '.join(chunk.hierarchy_path)}")
114
+     print()
115
+ ```
116
+
117
+ ### Displaying Table of Contents
118
+
119
+ ```python
120
+ # Print formatted TOC
121
+ parser.print_toc()
122
+
123
+ # Output:
124
+ # PREAMBLE
125
+ #   Citation
126
+ #   Recital 1
127
+ #   Recital 2
128
+ #   ...
129
+ #
130
+ # ENACTING TERMS
131
+ #   CHAPTER I - PRINCIPLES
132
+ #     Article 1 - Subject matter and objectives
133
+ #     Article 2 - Scope
134
+ ```
135
+
136
+ ## Use Cases
137
+
138
+ ### Compliance and Legal
139
+
140
+ - Analyze regulatory requirements systematically
141
+ - Track changes across document versions
142
+ - Build searchable knowledge bases from legal texts
143
+
144
+ ### Documentation Processing
145
+
146
+ - Convert unstructured documents into structured data
147
+ - Build citation networks and cross-references
148
+ - Support automated document analysis workflows
149
+
150
+ ### Education and Training
151
+
152
+ - Generate study materials from regulatory documents
153
+ - Create structured learning paths for certification programs
154
+ - Extract key concepts for examination preparation
155
+
156
+ ## Supported Document Types
157
+
158
+ Currently supports:
159
+
160
+ - **EUR-Lex HTML Documents**: European Union regulations, directives, decisions
161
+ - **Legislative Acts**: Structured legal documents with formal hierarchies
162
+
163
+ ### Document Format Requirements
164
+
165
+ - Documents must use EUR-Lex HTML format
166
+ - Must contain `eli-subdivision` elements for proper structure identification
167
+ - Supports multi-level hierarchies with chapters, sections, and articles
168
+
169
+ ## Advanced Usage
170
+
171
+ ### Custom Parsing Workflows
172
+
173
+ ```python
174
+ from quiz_gen import EURLexParser
175
+
176
+ parser = EURLexParser(url=document_url)
177
+
178
+ # Parse specific sections
179
+ parser._parse_preamble() # Extract citations and recitals
180
+ parser._parse_enacting_terms() # Extract chapters and articles
181
+ parser._parse_annexes() # Extract annexes
182
+
183
+ # Access intermediate results
184
+ toc = parser.toc # Full table of contents
185
+ chunks = parser.chunks # Content chunks only
186
+ ```
187
+
188
+ ### Filtering Chunks by Type
189
+
190
+ ```python
191
+ from quiz_gen import SectionType
192
+
193
+ # Get only recitals
194
+ recitals = [c for c in chunks if c.section_type == SectionType.RECITAL]
195
+
196
+ # Get only articles
197
+ articles = [c for c in chunks if c.section_type == SectionType.ARTICLE]
198
+
199
+ # Filter by chapter
200
+ chapter_1_articles = [
201
+     c for c in articles
202
+     if 'CHAPTER I' in ' > '.join(c.hierarchy_path)
203
+ ]
204
+ ```
205
+
206
+ ### Accessing Metadata
207
+
208
+ ```python
209
+ for chunk in chunks:
210
+     # Access structured metadata
211
+     print(chunk.metadata)  # {'id': 'art_1', 'subtitle': '...'}
212
+
213
+     # Navigate hierarchy
214
+     print(chunk.hierarchy_path)  # ['CHAPTER I - PRINCIPLES', 'Article 1']
215
+
216
+     # Identify parent sections
217
+     print(chunk.parent_section)
218
+ ```
219
+
220
+ ## Project Structure
221
+
222
+ ```
223
+ quiz-gen/
224
+ ├── src/
225
+ │   └── quiz_gen/
225
+ │       ├── parsers/
226
+ │       │   └── html/
227
+ │       │       └── eu_lex_parser.py
228
+ │       ├── models/
229
+ │       │   ├── chunk.py
230
+ │       │   ├── document.py
231
+ │       │   └── quiz.py
232
+ │       └── utils/
233
+ ├── examples/
234
+ │   └── eu_lex_toc_chunks.py
235
+ ├── tests/
236
+ ├── data/
237
+ │   ├── processed/
238
+ │   └── raw/
240
+ └── docs/
241
+ ```
242
+
243
+ ## Development
244
+
245
+ ### Setting up Development Environment
246
+
247
+ ```bash
248
+ # Clone the repository
249
+ git clone https://github.com/yauheniya-ai/quiz-gen.git
250
+ cd quiz-gen
251
+
252
+ # Install with development dependencies
253
+ pip install -e ".[dev]"
254
+
255
+ # Run tests
256
+ pytest
257
+
258
+ # Run linting
259
+ ruff check .
260
+ black .
261
+ ```
262
+
263
+
264
+ ### Contributing
265
+
266
+ Contributions are welcome! Please ensure:
267
+
268
+ 1. Code follows PEP 8 style guidelines
269
+ 2. All tests pass
270
+ 3. New features include appropriate tests
271
+ 4. Documentation is updated
272
+
273
+ ## API Reference
274
+
275
+ ### EURLexParser
276
+
277
+ Main parser class for EUR-Lex documents.
278
+
279
+ **Methods**:
280
+ - `parse()` -> `tuple[List[RegulationChunk], Dict]`: Parse document and return chunks and TOC
281
+ - `fetch()` -> `str`: Fetch HTML content from URL
282
+ - `save_chunks(filepath: str)`: Save chunks to JSON file
283
+ - `save_toc(filepath: str)`: Save table of contents to JSON file
284
+ - `print_toc()`: Display formatted table of contents
285
+
286
+ ### RegulationChunk
287
+
288
+ Represents a parsed content chunk (article or recital).
289
+
290
+ **Attributes**:
291
+ - `section_type`: Type of section (ARTICLE, RECITAL, etc.)
292
+ - `number`: Section number (e.g., "1", "42")
293
+ - `title`: Full title including subtitle
294
+ - `content`: Text content
295
+ - `hierarchy_path`: List of parent sections
296
+ - `metadata`: Additional structured data
297
+
298
+ ### SectionType
299
+
300
+ Enumeration of document section types.
301
+
302
+ **Values**:
303
+ - `PREAMBLE`: Preamble section
304
+ - `ENACTING_TERMS`: Main regulatory content
305
+ - `CITATION`: Citation in preamble
306
+ - `RECITAL`: Recital in preamble
307
+ - `CHAPTER`: Chapter division
308
+ - `SECTION`: Section within chapter
309
+ - `ARTICLE`: Article (main content unit)
310
+ - `ANNEX`: Annex section
311
+
312
+ ## Roadmap
313
+
314
+ Future enhancements planned:
315
+
316
+ - AI-powered quiz generation from extracted content
317
+ - Support for additional document formats (PDF, DOCX, PPTX)
318
+ - Multi-language support
319
+ - Question validation and quality metrics
320
+ - Integration with learning management systems
321
+ - Version comparison and diff analysis
322
+
323
+ ## License
324
+
325
+ This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
326
+
327
+ ## Citation
328
+
329
+ If you use this software in academic work, please cite:
330
+
331
+ ```
332
+ Varabyova, Y. (2026). Quiz Gen AI: AI-powered quiz generator for regulatory documentation.
333
+ GitHub repository: https://github.com/yauheniya-ai/quiz-gen
334
+ ```
335
+
336
+ ## Support
337
+
338
+ - Documentation: https://quiz-gen.readthedocs.io
339
+ - Issue Tracker: https://github.com/yauheniya-ai/quiz-gen/issues
340
+
341
+ ## Acknowledgments
342
+
343
+ Built with:
344
+ - BeautifulSoup4 for HTML parsing
345
+ - lxml for XML processing
346
+ - EUR-Lex for providing structured legal documents
347
+
348
+ ## Changelog
349
+
350
+ ### Version 0.1.0 (2026-01-17)
351
+
352
+ Initial release:
353
+ - EUR-Lex document parser
354
+ - Hierarchical document structure extraction
355
+ - Table of contents generation
356
+ - JSON export for chunks and TOC
357
+
358
+ ### Version 0.1.1 (2026-01-18)
359
+
360
+ Parser enhancements:
361
+ - Added regulation title extraction and chunking
362
+ - Support for flexible 3-4 level hierarchy with sections within chapters
363
+ - Complete annexes extraction including table-based content
364
+ - Combined citations into a single chunk matching EUR-Lex structure
365
+ - Added concluding formulas parsing
366
+
367
+ ### Version 0.1.2 (2026-01-18)
368
+
369
+ Text formatting and tooling:
370
+ - Implemented smart text cleaning for proper list formatting (removes extra newlines after list markers)
371
+ - Fixed numbered paragraph spacing
372
+ - Added professional command-line interface (CLI)
373
+ - Created comprehensive documentation with MkDocs and Material theme
374
+
375
+ ### Version 0.1.3 (2026-01-19)
376
+
377
+ Parser robustness improvements:
378
+ - Fixed parsing of articles directly under enacting terms (without chapter hierarchy)
379
+ - Enhanced article content extraction to handle table-based list items (e.g., (a), (b), (c) in table cells)
380
+ - Added proper appendix detection and parsing (distinguishes appendices from annexes)
381
+ - Improved title extraction for multi-paragraph appendix titles
382
+
383
+ ### Version 0.1.4 (2026-01-19)
384
+
385
+ Annex parsing improvements:
386
+ - Added intelligent detection and parsing of parts within annexes (PART 1, PART 2, etc.)
387
+ - Improved part titles to include annex identifier (e.g., "ANNEX 1 - PART 1" instead of "ANNEX - PART 1")
388
+ - Removed arbitrary content truncation in annexes and appendices - all content now preserved in full
389
+ - Enhanced content collection for parts with proper boundary detection between sections
390
+
391
+ ### Version 0.1.5 (2026-01-19)
392
+
393
+ Bug fixes:
394
+ - Fixed annex TOC title to display with identifier (e.g., "ANNEX 1" instead of "ANNEX")
395
+ - Fixed empty content in annex parts by switching from sibling navigation to descendants iteration
@@ -0,0 +1,37 @@
1
+ quiz_gen/__init__.py,sha256=BLiiFuMIAzlX_G0mTpoGpx1Y6V19kurOM7dhqZopAJg,540
2
+ quiz_gen/__version__.py,sha256=03oJbrV_EJ7RkHGfnaj4OK9XO1l5fKnBwqKcHiKtgKk,410
3
+ quiz_gen/cli.py,sha256=rpqIDQMqnqppeOQkDNtXq63egC-eUgIEsdcGDpWyz1E,6264
4
+ quiz_gen/config.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
5
+ quiz_gen/agents/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
6
+ quiz_gen/agents/answer_generator.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
7
+ quiz_gen/agents/base_agent.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
8
+ quiz_gen/agents/orchestrator.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
9
+ quiz_gen/agents/question_generator.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
10
+ quiz_gen/agents/reviewer.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
11
+ quiz_gen/agents/validator.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
12
+ quiz_gen/models/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
13
+ quiz_gen/models/chunk.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
14
+ quiz_gen/models/document.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
15
+ quiz_gen/models/question.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
16
+ quiz_gen/models/quiz.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
17
+ quiz_gen/parsers/__init__.py,sha256=g2KpJf5yen7mZJy-GWQXybGK2k7mZuYEpyfqKg1rZEw,230
18
+ quiz_gen/parsers/base.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
19
+ quiz_gen/parsers/pdf_parser.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
20
+ quiz_gen/parsers/utils.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
21
+ quiz_gen/parsers/html/eu_lex_parser.py,sha256=-3bJ8RVjqeD9qKysMKG6ICPcLYI81_TtYuWdkn-Yobo,36109
22
+ quiz_gen/storage/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
23
+ quiz_gen/storage/base.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
24
+ quiz_gen/storage/database.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
25
+ quiz_gen/storage/json_storage.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
26
+ quiz_gen/utils/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
27
+ quiz_gen/utils/helpers.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
28
+ quiz_gen/utils/logging.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
29
+ quiz_gen/validation/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
30
+ quiz_gen/validation/human_feedback.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
31
+ quiz_gen/validation/quality_checker.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
32
+ quiz_gen-0.1.5.dist-info/licenses/LICENSE,sha256=bJXbgXAmDWf3_2rn3BVh6-4wZtB5xbycNfqTHyGu_tE,1076
33
+ quiz_gen-0.1.5.dist-info/METADATA,sha256=NSWmeH_HCGkIA6gqqevP7BCObHA8jxprD0ZK2Y7saoA,11475
34
+ quiz_gen-0.1.5.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
35
+ quiz_gen-0.1.5.dist-info/entry_points.txt,sha256=uRVta04GJsOC3b3AZUONSq924-Vqwp2UjsEGYsJKQe4,47
36
+ quiz_gen-0.1.5.dist-info/top_level.txt,sha256=Nrt267uGX_L3FsFX2NSW0Uh9XQ9LthH7Mss9n9VrL2g,9
37
+ quiz_gen-0.1.5.dist-info/RECORD,,
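Note on the RECORD hunk above: each row has the form `path,sha256=<digest>,<size>`. In the wheel format the digest is the SHA-256 of the file, encoded as URL-safe base64 with the trailing `=` padding removed. Most rows share the digest `47DEQpj8…` with size `0`, which is the digest of empty input, so those modules (`agents/`, `models/`, `storage/`, `validation/`, and others) ship as empty files in 0.1.5. The sketch below reproduces that digest and parses one row; the helper names `record_digest` and `parse_record_line` are hypothetical and not part of quiz-gen.

```python
import base64
import csv
import hashlib
import io
from typing import Optional, Tuple


def record_digest(data: bytes) -> str:
    """Return a RECORD-style digest: URL-safe base64 of SHA-256, '=' padding stripped."""
    raw = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def parse_record_line(line: str) -> Tuple[str, str, Optional[int]]:
    """Split one RECORD row into (path, digest, size); digest and size may be empty."""
    path, hash_field, size_field = next(csv.reader(io.StringIO(line)))
    digest = hash_field.partition("=")[2]  # drop the "sha256=" prefix, if any
    size = int(size_field) if size_field else None
    return path, digest, size


# One row copied from the RECORD hunk (a zero-byte file).
path, digest, size = parse_record_line(
    "quiz_gen/config.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0"
)

# True when the recorded digest equals the digest of empty content.
is_empty_file = digest == record_digest(b"")
print(path, size, is_empty_file)
```

The same `record_digest` helper could be applied to the bytes of any file unpacked from the wheel to check it against its RECORD row.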
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: setuptools (80.9.0)
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
5
+
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ quiz-gen = quiz_gen.cli:main
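Note on the entry point above: a `console_scripts` row of the form `name = module:attr` asks the installer to create a `quiz-gen` command that imports `quiz_gen.cli` and calls its `main` object. As a minimal sketch, the standard library's `importlib.metadata.EntryPoint` can split that value without the package being installed; nothing here reflects what `main` actually does, since `cli.py` is not shown in this diff.

```python
from importlib.metadata import EntryPoint

# Rebuild the entry point exactly as declared in entry_points.txt.
ep = EntryPoint(name="quiz-gen", value="quiz_gen.cli:main", group="console_scripts")

# The part before ':' is the module to import, the part after is the attribute to call.
target_module = ep.module
target_attr = ep.attr
print(target_module, target_attr)

# With quiz-gen installed, ep.load() would import quiz_gen.cli and return `main`.
```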
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Yauheniya Varabyova
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1 @@
1
+ quiz_gen