txtdown 0.2.0__tar.gz
- txtdown-0.2.0/CHANGELOG.md +33 -0
- txtdown-0.2.0/LICENSE +21 -0
- txtdown-0.2.0/MANIFEST.in +13 -0
- txtdown-0.2.0/PKG-INFO +214 -0
- txtdown-0.2.0/README.md +185 -0
- txtdown-0.2.0/examples/augustine-civ-dei-1.2.txtd +29 -0
- txtdown-0.2.0/examples/cicero-de-amicitia.txtd +465 -0
- txtdown-0.2.0/examples/sulpicia.txtd +66 -0
- txtdown-0.2.0/pyproject.toml +55 -0
- txtdown-0.2.0/setup.cfg +4 -0
- txtdown-0.2.0/src/txtdown/__init__.py +29 -0
- txtdown-0.2.0/src/txtdown/models.py +181 -0
- txtdown-0.2.0/src/txtdown/parser.py +379 -0
- txtdown-0.2.0/src/txtdown/writer.py +106 -0
- txtdown-0.2.0/src/txtdown.egg-info/PKG-INFO +214 -0
- txtdown-0.2.0/src/txtdown.egg-info/SOURCES.txt +21 -0
- txtdown-0.2.0/src/txtdown.egg-info/dependency_links.txt +1 -0
- txtdown-0.2.0/src/txtdown.egg-info/requires.txt +6 -0
- txtdown-0.2.0/src/txtdown.egg-info/top_level.txt +1 -0
- txtdown-0.2.0/tests/fixtures/dulcitius-scene1.txtd +35 -0
- txtdown-0.2.0/tests/fixtures/sulpicia.txtd +65 -0
- txtdown-0.2.0/tests/test_parser.py +879 -0
- txtdown-0.2.0/tests/test_writer.py +286 -0

txtdown-0.2.0/CHANGELOG.md
ADDED

# Changelog

All notable changes to this project are documented here. The format is based on
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to
[Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.0] - 2026-06-20

First public release.

### Added

- **Cross-source quotation markup.** Lines beginning with `>` are parsed as verbatim
  quotations of other literary sources; the marker is stripped and the line is flagged
  with `Line.is_quote`. Consecutive `>` lines form a multi-line quotation. Round-trips
  through `write()`. Demonstrated by the Cicero (quoting Ennius/Terence) and Augustine
  (quoting Virgil) examples.
- **Speaker markup for dramatic texts.** `@Speaker:` at the start of a line extracts the
  speaker into `Line.speaker` and keeps `Line.text` as clean speech (single-word names).
- **Explicit line numbering.** Leading `N.` prefixes override auto-increment, and trailing
  editorial labels (e.g. `983a`) are captured in `Line.label` and usable in citations.
- **Strict validation.** `parse()` now requires a YAML front matter block with a `work`
  field and raises `ValueError` when either is missing. Pass `parse(..., strict=False)` to
  parse a fragment (single line or section) without metadata.
- `examples/cicero-de-amicitia.txtd` — full *Laelius de Amicitia* with cross-source quotes.

## [0.1.0] - Initial format

- YAML front matter metadata (`author`, `work`, `source`, `scope`, plus arbitrary extras).
- Sections separated by `---`, with auto-numbering, explicit IDs, and optional titles.
- Auto-numbered lines with 1-indexed, citation-based access (`doc.get("2.3")`).
- Round-trip-safe `parse()` / `write()`.

[0.2.0]: https://github.com/diyclassics/txtdown/releases/tag/v0.2.0

txtdown-0.2.0/LICENSE
ADDED

MIT License

Copyright (c) 2018-2026 Patrick J. Burns

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

txtdown-0.2.0/MANIFEST.in
ADDED

# Make the sdist a coherent, self-testing source release:
# include the test suite with its fixtures, the example corpus, and the changelog
# so downstream packagers can run the tests against the packaged source.
graft tests
graft examples
include CHANGELOG.md
include LICENSE
include README.md

# Never ship caches or compiled artifacts.
global-exclude __pycache__
global-exclude *.py[cod]
global-exclude .DS_Store

txtdown-0.2.0/PKG-INFO
ADDED

Metadata-Version: 2.4
Name: txtdown
Version: 0.2.0
Summary: Minimal markup for Latin text collections
Author: Patrick J. Burns
License: MIT
Project-URL: Homepage, https://github.com/diyclassics/txtdown
Project-URL: Repository, https://github.com/diyclassics/txtdown
Project-URL: Changelog, https://github.com/diyclassics/txtdown/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/diyclassics/txtdown/issues
Keywords: latin,markup,text,philology,digital-humanities,nlp
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing :: Markup
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Dynamic: license-file

txtdown-0.2.0/README.md
ADDED

# txtdown

Minimal, human-readable markup for Latin text collections, with an inferable hierarchical structure for scholarly citation.

## Installation

```bash
pip install git+https://github.com/diyclassics/txtdown.git
```

## Quick Start

```python
from txtdown import parse, write

# Parse a .txtd file
doc = parse("sulpicia.txtd")

# Access metadata
print(doc.metadata.author)  # "Sulpicia"
print(doc.metadata.work)    # "Epistulae"

# Access by citation
line = doc.get("2.3")       # Section 2, line 3
section = doc.get("1")      # Entire section 1

# Iterate sections and lines
for section in doc.sections:
    for line in section.lines:
        print(f"{section.id}.{line.number}: {line.text}")

# Write back to file (round-trip safe)
write(doc, "output.txtd")
```

## Format Specification

A `.txtd` file consists of a YAML front matter block followed by sections separated by horizontal rules (`---`). The front matter block is required and must include a `work` field; `parse()` raises `ValueError` otherwise. To parse a fragment without metadata (e.g. a single line or section), pass `strict=False`.

### Basic Structure

```
---
author: Sulpicia
work: Epistulae
source: https://thelatinlibrary.com/sulpicia.html
---

--- 1

Tandem venit amor, qualem texisse pudori
quam nudasse alicui sit mihi fama magis.
exorata meis illum Cytherea Camenis
attulit in nostrum deposuitque sinum.
etc.

--- 2

Invisus natalis adest, qui rure molesto
et sine Cerintho tristis agendus erit.
etc.
```
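
The front-matter rule above can be sketched in plain Python. This is not txtdown's implementation: `split_front_matter` is a hypothetical helper, and it reads the header as flat `key: value` pairs rather than full YAML (the package itself depends on PyYAML), so nested or multi-line values are out of scope.

```python
from __future__ import annotations


def split_front_matter(text: str, strict: bool = True) -> tuple[dict[str, str], str]:
    """Split a .txtd string into (metadata, body).

    Simplification: the header is read as flat `key: value` pairs, not full YAML.
    """
    lines = text.splitlines()
    meta: dict[str, str] = {}
    body = text
    if lines and lines[0].strip() == "---":
        # The header ends at the next line that is exactly "---".
        closing = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
        if closing is None:
            raise ValueError("unterminated front matter block")
        for raw in lines[1:closing]:
            key, sep, value = raw.partition(":")
            if sep:  # split on the first colon only, so URLs survive intact
                meta[key.strip()] = value.strip()
        body = "\n".join(lines[closing + 1:])
    # Strict mode mirrors the documented rule: front matter with `work` is required.
    if strict and "work" not in meta:
        raise ValueError("a front matter block with a `work` field is required")
    return meta, body
```

With `strict=False`, a fragment that has no header comes back unchanged as `({}, text)`.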

### Sections

- Sections are separated by `---` (three or more hyphens)
- Sections auto-number (1, 2, 3...) unless given explicit IDs; explicit IDs are best practice
- Explicit section ID: `--- prooemium` or `--- 1a`
- Section with title: `--- prooemium: Introduction`
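
A sketch of how a separator line might be decoded. The regular expression is an assumed grammar inferred from the three forms listed above, and `parse_separator` is hypothetical; in particular, using the section's position as its auto-number when no ID is given is an assumption about how auto-numbering interacts with explicit IDs.

```python
from __future__ import annotations

import re

# Assumed grammar: three or more hyphens, an optional ID, then an optional ": title".
SEPARATOR = re.compile(r"^-{3,}(?:\s+(?P<id>[^:\s]+))?(?:\s*:\s*(?P<title>.+))?$")


def parse_separator(line: str, position: int) -> tuple[str, str | None] | None:
    """Return (section_id, title) for a separator line, or None for a text line.

    `position` is the 1-based index of the section in the file; it becomes
    the ID when the separator gives none (an assumption).
    """
    match = SEPARATOR.match(line.strip())
    if match is None:
        return None
    return match.group("id") or str(position), match.group("title")
```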

### Lines (for verse)

- Lines auto-number within each section (1, 2, 3...)
- Blank lines don't count toward line numbering
- Access via citation: `doc.get("2.3")` returns section 2, line 3
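
The numbering and citation rules above can be illustrated with two hypothetical helpers, `number_lines` and `cite`. They assume that a citation splits at its first `.` into a section ID and a line number, so section IDs containing a dot are not handled; the real `doc.get()` may resolve citations differently.

```python
from __future__ import annotations


def number_lines(section_text: str) -> list[tuple[int, str]]:
    """Number a section's non-blank lines from 1; blank lines are skipped.

    Lines are kept verbatim, so verse indentation is preserved.
    """
    numbered: list[tuple[int, str]] = []
    for raw in section_text.splitlines():
        if raw.strip():
            numbered.append((len(numbered) + 1, raw))
    return numbered


def cite(sections: dict[str, list[tuple[int, str]]], ref: str):
    """Resolve "2.3" to line 3 of section "2", or "2" to the whole section."""
    section_id, _, line_no = ref.partition(".")
    lines = sections[section_id]
    if not line_no:
        return lines
    for number, text in lines:  # match on the line number, not the list index
        if number == int(line_no):
            return number, text
    raise KeyError(ref)
```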

**Line indentation** (`mode: verse`): Leading whitespace indicates poetic structure (e.g., pentameter lines in elegiac couplets):

```
Tandem venit amor, qualem texisse pudori
    quam nudasse alicui sit mihi fama magis.
```

The parser preserves indentation. For NLP, `TxtdownReader` strips leading whitespace when joining lines for sentence segmentation.
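
The whitespace handling described above amounts to stripping each line before joining. `join_for_segmentation` is a hypothetical stand-in, not the actual `TxtdownReader` code, and joining with a single space is an assumption.

```python
def join_for_segmentation(lines: list[str]) -> str:
    """Join verse lines into one string for a sentence segmenter.

    Surrounding whitespace (including the indentation that marks pentameters)
    is dropped so it cannot leak into the joined text; blank lines are skipped.
    """
    return " ".join(line.strip() for line in lines if line.strip())
```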

### Speaker Markup (dramatic texts)

For dramatic texts, use `@Speaker:` at the start of a line to mark speaker attribution:

```
@Diocletianus: Quid sibi vult ista, quae vos agitat, fatuitas?
@Agapes: quod signum fatuitatis nobis inesse deprehendis?
@Diocletianus: Evidens magnumque.
```

The parser extracts the speaker name into `line.speaker` and keeps `line.text` as pure speech text, which suits NLP pipelines that need clean text without markup.

```python
from txtdown import parse

doc = parse("dulcitius.txtd")
for line in doc.sections[0].lines:
    print(f"{line.speaker}: {line.text}")
# Diocletianus: Quid sibi vult ista...
```

Non-speaker lines (stage directions, prose) have `line.speaker = None`. Speaker markup round-trips through `write()`.
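
A minimal sketch of speaker extraction and its inverse, assuming a speaker tag is `@`, a single word, and `:` at the very start of the line (matching the single-word restriction noted in the changelog). `split_speaker` and `join_speaker` are hypothetical; because the sketch writes exactly one space after the colon, it round-trips exactly only when the source does too.

```python
from __future__ import annotations

import re

# Assumed tag shape: "@", a single-word name, ":" at the start of the line.
SPEAKER = re.compile(r"^@(?P<speaker>\w+):\s*(?P<text>.*)$")


def split_speaker(line: str) -> tuple[str | None, str]:
    """Return (speaker, speech); speaker is None for untagged lines."""
    match = SPEAKER.match(line)
    if match is None:
        return None, line  # stage direction or prose: no speaker
    return match.group("speaker"), match.group("text")


def join_speaker(speaker: str | None, text: str) -> str:
    """Inverse of split_speaker, writing one space after the colon."""
    return text if speaker is None else f"@{speaker}: {text}"
```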

### Cross-source Quotation

Use `>` at the start of a line to mark text quoted verbatim from *another* literary
source, for example an author embedding a poet's verse in their own prose. This
repurposes the familiar blockquote convention for the citational habits of classical texts:

```
Quamquam Ennius recte:

> Amicus certus in re incerta cernitur,

tamen haec duo levitatis et infirmitatis plerosque convincunt.
```

The parser strips the `>` marker and flags the line with `line.is_quote = True`, keeping
`line.text` as clean quoted text. Consecutive `>` lines form a multi-line quotation:

```
> Negat quis, nego; ait, aio; postremo imperavi egomet mihi
> Omnia adsentari,
```

```python
from txtdown import parse

doc = parse("cicero-de-amicitia.txtd")
quotes = [line.text for s in doc.sections for line in s.lines if line.is_quote]
# ['Amicus certus in re incerta cernitur,', ...]
```

Non-quote lines have `line.is_quote = False`. Quotation markup round-trips through `write()`.
See `examples/cicero-de-amicitia.txtd` (Cicero quoting Ennius and Terence) and
`examples/augustine-civ-dei-1.2.txtd` (Augustine quoting Virgil).
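
The quotation rules can be sketched as two passes over a section's non-blank lines: flag and strip `>` markers, then collect consecutive flagged lines into multi-line quotations. `mark_quotes` and `group_quotes` are hypothetical helpers, and stripping all whitespace that follows the marker is an assumption.

```python
from __future__ import annotations


def mark_quotes(lines: list[str]) -> list[tuple[bool, str]]:
    """Strip a leading ">" marker and flag the line as a quotation."""
    flagged: list[tuple[bool, str]] = []
    for line in lines:
        if line.startswith(">"):
            flagged.append((True, line[1:].lstrip()))
        else:
            flagged.append((False, line))
    return flagged


def group_quotes(flagged: list[tuple[bool, str]]) -> list[list[str]]:
    """Collect runs of consecutive quoted lines into multi-line quotations."""
    blocks: list[list[str]] = []
    previous = False
    for is_quote, text in flagged:
        if is_quote:
            if previous:
                blocks[-1].append(text)  # continue the current quotation
            else:
                blocks.append([text])  # start a new quotation
        previous = is_quote
    return blocks
```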

### Metadata

| Field | Description |
|-------|-------------|
| `work` | Work title (**required**) |
| `author` | Author name |
| `source` | Source URL or reference |
| `scope` | Portion of work in file (e.g., `1-6` for books 1-6) |

Additional fields are preserved in `metadata.extras`.
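
How known fields and extras might be separated, as a sketch: `Meta` and `build_meta` are hypothetical stand-ins for the package's `Metadata` class, with every field optional here (the `work` requirement is enforced at parse time in strict mode).

```python
from __future__ import annotations

from dataclasses import dataclass, field

KNOWN_FIELDS = ("author", "work", "source", "scope")


@dataclass
class Meta:
    """Hypothetical stand-in for txtdown's Metadata; every field is optional here."""

    author: str | None = None
    work: str | None = None
    source: str | None = None
    scope: str | None = None
    extras: dict[str, object] = field(default_factory=dict)


def build_meta(front_matter: dict[str, object]) -> Meta:
    """Route the four documented fields to attributes and everything else to extras."""
    known = {key: front_matter[key] for key in KNOWN_FIELDS if key in front_matter}
    extras = {key: value for key, value in front_matter.items() if key not in KNOWN_FIELDS}
    return Meta(**known, extras=extras)
```

Applied to the Augustine example's header, for instance, `language`, `mode`, `genre`, and `comment` would all land in `extras`.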

## API Reference

### Functions

- `parse(path_or_content: str, *, strict: bool = True) -> Document` — Parse a `.txtd` file or string. Strict by default: raises `ValueError` if the front matter block or `work` field is missing; pass `strict=False` for fragments.
- `write(doc: Document, path: str | None) -> str` — Write to the file at `path` if one is given; always returns the serialized string.

### Classes

- `Document` — Container with `metadata: Metadata` and `sections: list[Section]`
- `Section` — Container with `id: str`, `lines: list[Line]`, optional `title` and `metadata`
- `Line` — Container with `text: str`, `number: int`, optional `speaker: str | None` and `label: str | None`, and `is_quote: bool` (cross-source quotation)
- `Metadata` — Container with `author`, `work`, `source`, `scope`, and `extras` dict

## Development

```bash
# Clone and install dev dependencies
git clone https://github.com/diyclassics/txtdown.git
cd txtdown
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=txtdown --cov-report=term-missing
```

## Project History

The idea for txtdown originated in January 2018, inspired by the need for a document format for Latin text collections that balanced the simplicity of plaintext with the more involved markup of XML-based formats like TEI. The goal was to create a format that is both human-readable and computer-tractable, supporting hierarchical structures, fundamental annotations, and embedded metadata. Txtdown has since been influenced by ongoing work on annotation projects such as the [Representing Women Authorship in the Latin Treebanks (RWALT)](https://diyclassics.github.io/rwalt-site/) project.

## License

MIT

txtdown-0.2.0/examples/augustine-civ-dei-1.2.txtd
ADDED

---
author: Augustine of Hippo
work: De Civitate Dei
source: https://www.thelatinlibrary.com/augustine/civ1.shtml
scope: 1.2
language: la
mode: prose
genre: theology
comment: This example demonstrates the use of `>` for indicating in-text quotation.
---

--- 2

Tot bella gesta conscripta sunt uel ante conditam Romam uel ab eius exortu et imperio: legant et proferant sic aut ab alienigenis aliquam captam esse ciuitatem, ut hostes, qui ceperant, parcerent eis, quos ad deorum suorum templa confugisse compererant, aut aliquem ducem barbarorum praecepisse, ut inrupto oppido nullus feriretur, qui in illo uel illo templo fuisset inuentus. Nonne uidit Aeneas Priamum per aras

> Sanguine foedantem quos ipse sacrauerat ignes?

Nonne Diomedes et Vlixes

> caesis summae custodibus arcis.
> Corripuere sacram effigiem manibusque cruentis
> Virgineas ausi diuae contingere uittas?

Nec tamen quod sequitur uerum est:

> Ex illo fluere ac retro sublapsa referri
> Spes Danaum.

Postea quippe uicerunt, postea Troiam ferro ignibusque delerunt, postea confugientem ad aras Priamum obtruncauerunt. Nec ideo Troia periit, quia Mineruam perdidit. Quid enim prius ipsa Minerua perdiderat, ut periret? an forte custodes suos? Hoc sane uerum est; illis quippe interemptis potuit auferri. Neque enim homines a simulacro, sed simulacrum ab hominibus seruabatur. Quomodo ergo colebatur, ut patriam custodiret et ciues, quae suos non ualuit custodire custodes?