txtdown 0.2.0__tar.gz

txtdown-0.2.0/CHANGELOG.md ADDED
@@ -0,0 +1,33 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project are documented here. The format is based on
4
+ [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to
5
+ [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
6
+
7
+ ## [0.2.0] - 2026-06-20
8
+
9
+ First public release.
10
+
11
+ ### Added
12
+ - **Cross-source quotation markup.** Lines beginning with `>` are parsed as verbatim
13
+ quotations of other literary sources; the marker is stripped and the line is flagged
14
+ with `Line.is_quote`. Consecutive `>` lines form a multi-line quotation. Round-trips
15
+ through `write()`. Demonstrated by the Cicero (quoting Ennius/Terence) and Augustine
16
+ (quoting Virgil) examples.
17
+ - **Speaker markup for dramatic texts.** `@Speaker:` at the start of a line extracts the
18
+ speaker into `Line.speaker` and keeps `Line.text` as clean speech. Only single-word speaker names are recognized.
19
+ - **Explicit line numbering.** Leading `N.` prefixes override auto-increment, and trailing
20
+ editorial labels (e.g. `983a`) are captured in `Line.label` and usable in citations.
21
+ - **Strict validation.** `parse()` now requires a YAML front matter block with a `work`
22
+ field and raises `ValueError` when either is missing. Pass `parse(..., strict=False)` to
23
+ parse a fragment (single line or section) without metadata.
24
+ - `examples/cicero-de-amicitia.txtd` — full *Laelius de Amicitia* with cross-source quotes.
25
+
26
+ ## [0.1.0] - Initial format
27
+
28
+ - YAML front matter metadata (`author`, `work`, `source`, `scope`, plus arbitrary extras).
29
+ - Sections separated by `---`, with auto-numbering, explicit IDs, and optional titles.
30
+ - Auto-numbered lines with 1-indexed, citation-based access (`doc.get("2.3")`).
31
+ - Round-trip-safe `parse()` / `write()`.
32
+
33
+ [0.2.0]: https://github.com/diyclassics/txtdown/releases/tag/v0.2.0
txtdown-0.2.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2018-2026 Patrick J. Burns
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
txtdown-0.2.0/MANIFEST.in ADDED
@@ -0,0 +1,13 @@
1
+ # Make the sdist a coherent, self-testing source release:
2
+ # include the test suite with its fixtures, the example corpus, and the changelog
3
+ # so downstream packagers can run the tests against the packaged source.
4
+ graft tests
5
+ graft examples
6
+ include CHANGELOG.md
7
+ include LICENSE
8
+ include README.md
9
+
10
+ # Never ship caches or compiled artifacts.
11
+ global-exclude __pycache__
12
+ global-exclude *.py[cod]
13
+ global-exclude .DS_Store
txtdown-0.2.0/PKG-INFO ADDED
@@ -0,0 +1,214 @@
1
+ Metadata-Version: 2.4
2
+ Name: txtdown
3
+ Version: 0.2.0
4
+ Summary: Minimal markup for Latin text collections
5
+ Author: Patrick J. Burns
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/diyclassics/txtdown
8
+ Project-URL: Repository, https://github.com/diyclassics/txtdown
9
+ Project-URL: Changelog, https://github.com/diyclassics/txtdown/blob/main/CHANGELOG.md
10
+ Project-URL: Issues, https://github.com/diyclassics/txtdown/issues
11
+ Keywords: latin,markup,text,philology,digital-humanities,nlp
12
+ Classifier: Development Status :: 3 - Alpha
13
+ Classifier: Intended Audience :: Science/Research
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Topic :: Text Processing :: Markup
20
+ Requires-Python: >=3.10
21
+ Description-Content-Type: text/markdown
22
+ License-File: LICENSE
23
+ Requires-Dist: pyyaml>=6.0
24
+ Provides-Extra: dev
25
+ Requires-Dist: pytest>=7.0; extra == "dev"
26
+ Requires-Dist: pytest-cov>=4.0; extra == "dev"
27
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
28
+ Dynamic: license-file
29
+
30
+ # txtdown
31
+
32
+ Minimal, human-readable markup for Latin text collections, with an inferable hierarchical structure that supports scholarly citation.
33
+
34
+ ## Installation
35
+
36
+ ```bash
37
+ pip install git+https://github.com/diyclassics/txtdown.git
38
+ ```
39
+
40
+ ## Quick Start
41
+
42
+ ```python
43
+ from txtdown import parse, write
44
+
45
+ # Parse a .txtd file
46
+ doc = parse("sulpicia.txtd")
47
+
48
+ # Access metadata
49
+ print(doc.metadata.author) # "Sulpicia"
50
+ print(doc.metadata.work) # "Epistulae"
51
+
52
+ # Access by citation
53
+ line = doc.get("2.3") # Section 2, line 3
54
+ section = doc.get("1") # Entire section 1
55
+
56
+ # Iterate sections and lines
57
+ for section in doc.sections:
58
+     for line in section.lines:
59
+         print(f"{section.id}.{line.number}: {line.text}")
60
+
61
+ # Write back to file (round-trip safe)
62
+ write(doc, "output.txtd")
63
+ ```
64
+
65
+ ## Format Specification
66
+
67
+ A `.txtd` file consists of a YAML front matter block followed by sections separated by horizontal rules (`---`). The front matter block is required and must include a `work` field; `parse()` raises `ValueError` otherwise. To parse a fragment without metadata (e.g. a single line or section), pass `strict=False`.
68
+
69
+ ### Basic Structure
70
+
71
+ ```
72
+ ---
73
+ author: Sulpicia
74
+ work: Epistulae
75
+ source: https://thelatinlibrary.com/sulpicia.html
76
+ ---
77
+
78
+ --- 1
79
+
80
+ Tandem venit amor, qualem texisse pudori
81
+ quam nudasse alicui sit mihi fama magis.
82
+ exorata meis illum Cytherea Camenis
83
+ attulit in nostrum deposuitque sinum.
84
+ etc.
85
+
86
+ --- 2
87
+
88
+ Invisus natalis adest, qui rure molesto
89
+ et sine Cerintho tristis agendus erit.
90
+ etc.
91
+ ```
92
+
93
+ ### Sections
94
+
95
+ - Sections are separated by `---` (three or more hyphens)
96
+ - Sections auto-number (1, 2, 3...) unless given explicit IDs; explicit IDs are the recommended practice
97
+ - Explicit section ID: `--- prooemium` or `--- 1a`
98
+ - Section with title: `--- prooemium: Introduction`
99
+
100
+ ### Lines (for verse)
101
+
102
+ - Lines auto-number within each section (1, 2, 3...)
103
+ - Blank lines don't count toward line numbering
104
+ - Access via citation: `doc.get("2.3")` returns section 2, line 3
105
+
106
+ **Line indentation** (`mode: verse`): Leading whitespace indicates poetic structure (e.g., pentameter lines in elegiac couplets):
107
+
108
+ ```
109
+ Tandem venit amor, qualem texisse pudori
110
+     quam nudasse alicui sit mihi fama magis.
111
+ ```
112
+
113
+ The parser preserves indentation. For NLP use, `TxtdownReader` strips leading whitespace when joining lines for sentence segmentation.
114
+
115
+ ### Speaker Markup (dramatic texts)
116
+
117
+ For dramatic texts, use `@Speaker:` at the start of a line to mark speaker attribution:
118
+
119
+ ```
120
+ @Diocletianus: Quid sibi vult ista, quae vos agitat, fatuitas?
121
+ @Agapes: quod signum fatuitatis nobis inesse deprehendis?
122
+ @Diocletianus: Evidens magnumque.
123
+ ```
124
+
125
+ The parser extracts the speaker name into `line.speaker` and keeps `line.text` as pure speech text — ideal for NLP pipelines that need clean text without markup.
126
+
127
+ ```python
128
+ doc = parse("dulcitius.txtd")
129
+ for line in doc.sections[0].lines:
130
+     print(f"{line.speaker}: {line.text}")
131
+     # Diocletianus: Quid sibi vult ista...
132
+ ```
133
+
134
+ Non-speaker lines (stage directions, prose) have `line.speaker = None`. Speaker markup round-trips through `write()`.
135
+
136
+ ### Cross-source Quotation
137
+
138
+ Use `>` at the start of a line to mark text quoted verbatim from *another* literary
139
+ source — an author embedding a poet's verse in their own prose, for example. This
140
+ repurposes the familiar blockquote convention for the citational habits of classical texts:
141
+
142
+ ```
143
+ Quamquam Ennius recte:
144
+
145
+ > Amicus certus in re incerta cernitur,
146
+
147
+ tamen haec duo levitatis et infirmitatis plerosque convincunt.
148
+ ```
149
+
150
+ The parser strips the `>` marker and flags the line with `line.is_quote = True`, keeping
151
+ `line.text` as clean quoted text. Consecutive `>` lines form a multi-line quotation:
152
+
153
+ ```
154
+ > Negat quis, nego; ait, aio; postremo imperavi egomet mihi
155
+ > Omnia adsentari,
156
+ ```
157
+
158
+ ```python
159
+ doc = parse("cicero-de-amicitia.txtd")
160
+ quotes = [line.text for s in doc.sections for line in s.lines if line.is_quote]
161
+ # ['Amicus certus in re incerta cernitur,', ...]
162
+ ```
163
+
164
+ Non-quote lines have `line.is_quote = False`. Quotation markup round-trips through `write()`.
165
+ See `examples/cicero-de-amicitia.txtd` (Cicero quoting Ennius and Terence) and
166
+ `examples/augustine-civ-dei-1.2.txtd` (Augustine quoting Virgil).
167
+
168
+ ### Metadata
169
+
170
+ | Field | Description |
171
+ |-------|-------------|
172
+ | `work` | Work title (**required**) |
173
+ | `author` | Author name |
174
+ | `source` | Source URL or reference |
175
+ | `scope` | Portion of work in file (e.g., `1-6` for books 1-6) |
176
+
177
+ Additional fields are preserved in `metadata.extras`.
178
+
179
+ ## API Reference
180
+
181
+ ### Functions
182
+
183
+ - `parse(path_or_content: str, *, strict: bool = True) -> Document` — Parse a `.txtd` file or string. Strict by default: raises `ValueError` if the front matter block or `work` field is missing; pass `strict=False` for fragments.
184
+ - `write(doc: Document, path: str | None) -> str` — Write to file if path given; always returns serialized string
185
+
186
+ ### Classes
187
+
188
+ - `Document` — Container with `metadata: Metadata` and `sections: list[Section]`
189
+ - `Section` — Container with `id: str`, `lines: list[Line]`, optional `title` and `metadata`
190
+ - `Line` — Container with `text: str`, `number: int`, optional `speaker: str | None` and `label: str | None`, and `is_quote: bool` (cross-source quotation)
191
+ - `Metadata` — Container with `author`, `work`, `source`, `scope`, and `extras` dict
192
+
193
+ ## Development
194
+
195
+ ```bash
196
+ # Clone and install dev dependencies
197
+ git clone https://github.com/diyclassics/txtdown.git
198
+ cd txtdown
199
+ pip install -e ".[dev]"
200
+
201
+ # Run tests
202
+ pytest tests/ -v
203
+
204
+ # Run with coverage
205
+ pytest tests/ --cov=txtdown --cov-report=term-missing
206
+ ```
207
+
208
+ ## Project History
209
+
210
+ The idea for txtdown originated in January 2018, inspired by the need for a document format for Latin text collections that balanced the simplicity of plaintext with the more involved markup of XML-based formats like TEI. The goal was to create a format that is both human-readable and computer-tractable, supporting hierarchical structures, fundamental annotations, and embedded metadata. Txtdown has since been influenced by ongoing work on annotation projects such as the [Representing Women Authorship in the Latin Treebanks (RWALT)](https://diyclassics.github.io/rwalt-site/) project.
211
+
212
+ ## License
213
+
214
+ MIT
txtdown-0.2.0/README.md ADDED
@@ -0,0 +1,185 @@
1
+ # txtdown
2
+
3
+ Minimal, human-readable markup for Latin text collections, with an inferable hierarchical structure that supports scholarly citation.
4
+
5
+ ## Installation
6
+
7
+ ```bash
8
+ pip install git+https://github.com/diyclassics/txtdown.git
9
+ ```
10
+
11
+ ## Quick Start
12
+
13
+ ```python
14
+ from txtdown import parse, write
15
+
16
+ # Parse a .txtd file
17
+ doc = parse("sulpicia.txtd")
18
+
19
+ # Access metadata
20
+ print(doc.metadata.author) # "Sulpicia"
21
+ print(doc.metadata.work) # "Epistulae"
22
+
23
+ # Access by citation
24
+ line = doc.get("2.3") # Section 2, line 3
25
+ section = doc.get("1") # Entire section 1
26
+
27
+ # Iterate sections and lines
28
+ for section in doc.sections:
29
+     for line in section.lines:
30
+         print(f"{section.id}.{line.number}: {line.text}")
31
+
32
+ # Write back to file (round-trip safe)
33
+ write(doc, "output.txtd")
34
+ ```
35
+
36
+ ## Format Specification
37
+
38
+ A `.txtd` file consists of a YAML front matter block followed by sections separated by horizontal rules (`---`). The front matter block is required and must include a `work` field; `parse()` raises `ValueError` otherwise. To parse a fragment without metadata (e.g. a single line or section), pass `strict=False`.
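The strict check described above can be sketched in a few lines. This is a minimal illustration, not txtdown's implementation: `split_front_matter` and `check_document` are hypothetical names, and the front matter is read as plain `key: value` lines instead of being parsed as YAML.

```python
def split_front_matter(content: str) -> tuple[dict, str]:
    """Split a document into (metadata, body); metadata is {} if no block exists."""
    lines = content.splitlines()
    # A front matter block must open with a bare `---` on the first line.
    if not lines or lines[0].strip() != "---":
        return {}, content
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, content  # the opening rule was never closed
    metadata = {}
    for raw in lines[1:end]:
        # Simplification (assumption): plain "key: value" pairs, not full YAML.
        key, sep, value = raw.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata, "\n".join(lines[end + 1:])


def check_document(content: str, *, strict: bool = True) -> tuple[dict, str]:
    """Raise ValueError in strict mode when the block or its `work` field is missing."""
    metadata, body = split_front_matter(content)
    if strict and "work" not in metadata:
        raise ValueError("a front matter block with a 'work' field is required")
    return metadata, body
```

With `strict=False`, a bare fragment such as a single verse line passes through with empty metadata, which matches the fragment use case mentioned above.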
39
+
40
+ ### Basic Structure
41
+
42
+ ```
43
+ ---
44
+ author: Sulpicia
45
+ work: Epistulae
46
+ source: https://thelatinlibrary.com/sulpicia.html
47
+ ---
48
+
49
+ --- 1
50
+
51
+ Tandem venit amor, qualem texisse pudori
52
+ quam nudasse alicui sit mihi fama magis.
53
+ exorata meis illum Cytherea Camenis
54
+ attulit in nostrum deposuitque sinum.
55
+ etc.
56
+
57
+ --- 2
58
+
59
+ Invisus natalis adest, qui rure molesto
60
+ et sine Cerintho tristis agendus erit.
61
+ etc.
62
+ ```
63
+
64
+ ### Sections
65
+
66
+ - Sections are separated by `---` (three or more hyphens)
67
+ - Sections auto-number (1, 2, 3...) unless given explicit IDs; explicit IDs are the recommended practice
68
+ - Explicit section ID: `--- prooemium` or `--- 1a`
69
+ - Section with title: `--- prooemium: Introduction`
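The section rules listed above can be recognized with one regular expression. The sketch below is an assumption about how such a reader might work, not the library's parser: `SECTION_RULE` and `read_section_headers` are hypothetical names, the front matter is assumed to have been removed already, and the auto-number is taken to be the section's position in the file (the source does not say how auto-numbering interacts with explicit IDs).

```python
import re
from typing import Optional

# Three or more hyphens, then an optional ID and an optional ": Title".
SECTION_RULE = re.compile(
    r"^-{3,}(?:\s+(?P<id>[^\s:]+)(?::\s*(?P<title>.+))?)?\s*$"
)


def read_section_headers(rules: list[str]) -> list[tuple[str, Optional[str]]]:
    """Return (section_id, title) for every line that is a section rule."""
    headers = []
    position = 0
    for rule in rules:
        match = SECTION_RULE.match(rule)
        if match is None:
            continue  # ordinary text line, not a section rule
        position += 1
        # Fall back to the positional number when no explicit ID is given.
        section_id = match.group("id") or str(position)
        title = match.group("title")
        headers.append((section_id, title.strip() if title else None))
    return headers
```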
70
+
71
+ ### Lines (for verse)
72
+
73
+ - Lines auto-number within each section (1, 2, 3...)
74
+ - Blank lines don't count toward line numbering
75
+ - Access via citation: `doc.get("2.3")` returns section 2, line 3
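The numbering and citation rules above amount to a small lookup. The following sketch assumes auto-numbered lines only and section IDs that contain no dot; `number_lines` and `get` are hypothetical stand-ins and not the `doc.get()` method itself.

```python
def number_lines(raw: str) -> dict[int, str]:
    """Number the non-blank lines of one section, starting at 1."""
    kept = [line for line in raw.splitlines() if line.strip()]
    return {number: text for number, text in enumerate(kept, start=1)}


def get(sections: dict[str, dict[int, str]], citation: str):
    """Resolve "2.3" to one line, or "2" to the whole section."""
    section_id, _, line_number = citation.partition(".")
    lines = sections[section_id]
    return lines[int(line_number)] if line_number else lines


sections = {
    "1": number_lines(
        "Tandem venit amor, qualem texisse pudori\n"
        "\n"
        "    quam nudasse alicui sit mihi fama magis.\n"
    ),
    "2": number_lines(
        "Invisus natalis adest, qui rure molesto\n"
        "et sine Cerintho tristis agendus erit.\n"
    ),
}
print(get(sections, "2.2"))
```

The blank line in section 1 is skipped, so the indented pentameter is still line 2, and its leading whitespace is kept.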
76
+
77
+ **Line indentation** (`mode: verse`): Leading whitespace indicates poetic structure (e.g., pentameter lines in elegiac couplets):
78
+
79
+ ```
80
+ Tandem venit amor, qualem texisse pudori
81
+     quam nudasse alicui sit mihi fama magis.
82
+ ```
83
+
84
+ The parser preserves indentation. For NLP use, `TxtdownReader` strips leading whitespace when joining lines for sentence segmentation.
85
+
86
+ ### Speaker Markup (dramatic texts)
87
+
88
+ For dramatic texts, use `@Speaker:` at the start of a line to mark speaker attribution:
89
+
90
+ ```
91
+ @Diocletianus: Quid sibi vult ista, quae vos agitat, fatuitas?
92
+ @Agapes: quod signum fatuitatis nobis inesse deprehendis?
93
+ @Diocletianus: Evidens magnumque.
94
+ ```
95
+
96
+ The parser extracts the speaker name into `line.speaker` and keeps `line.text` as pure speech text — ideal for NLP pipelines that need clean text without markup.
97
+
98
+ ```python
99
+ doc = parse("dulcitius.txtd")
100
+ for line in doc.sections[0].lines:
101
+     print(f"{line.speaker}: {line.text}")
102
+     # Diocletianus: Quid sibi vult ista...
103
+ ```
104
+
105
+ Non-speaker lines (stage directions, prose) have `line.speaker = None`. Speaker markup round-trips through `write()`.
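As a rough illustration of the speaker rule, the split and its inverse can be written with one pattern. `SPEAKER`, `split_speaker`, and `join_speaker` are hypothetical names, and the pattern is an assumption; it restricts names to a single word, as the changelog notes for this release.

```python
import re
from typing import Optional

# "@Name:" at the start of a line; \w+ limits the name to a single word.
SPEAKER = re.compile(r"^@(?P<speaker>\w+):\s*(?P<text>.*)$")


def split_speaker(raw: str) -> tuple[Optional[str], str]:
    """Return (speaker, text); speaker is None for lines without markup."""
    match = SPEAKER.match(raw)
    if match is None:
        return None, raw
    return match.group("speaker"), match.group("text")


def join_speaker(speaker: Optional[str], text: str) -> str:
    """Rebuild the marked-up line so the pair survives a round trip."""
    return f"@{speaker}: {text}" if speaker is not None else text
```

Keeping the speaker apart from the text means downstream tools see only the speech, while `join_speaker` restores the original line when writing back.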
106
+
107
+ ### Cross-source Quotation
108
+
109
+ Use `>` at the start of a line to mark text quoted verbatim from *another* literary
110
+ source — an author embedding a poet's verse in their own prose, for example. This
111
+ repurposes the familiar blockquote convention for the citational habits of classical texts:
112
+
113
+ ```
114
+ Quamquam Ennius recte:
115
+
116
+ > Amicus certus in re incerta cernitur,
117
+
118
+ tamen haec duo levitatis et infirmitatis plerosque convincunt.
119
+ ```
120
+
121
+ The parser strips the `>` marker and flags the line with `line.is_quote = True`, keeping
122
+ `line.text` as clean quoted text. Consecutive `>` lines form a multi-line quotation:
123
+
124
+ ```
125
+ > Negat quis, nego; ait, aio; postremo imperavi egomet mihi
126
+ > Omnia adsentari,
127
+ ```
128
+
129
+ ```python
130
+ doc = parse("cicero-de-amicitia.txtd")
131
+ quotes = [line.text for s in doc.sections for line in s.lines if line.is_quote]
132
+ # ['Amicus certus in re incerta cernitur,', ...]
133
+ ```
134
+
135
+ Non-quote lines have `line.is_quote = False`. Quotation markup round-trips through `write()`.
136
+ See `examples/cicero-de-amicitia.txtd` (Cicero quoting Ennius and Terence) and
137
+ `examples/augustine-civ-dei-1.2.txtd` (Augustine quoting Virgil).
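To show how consecutive `>` lines might be collected into multi-line quotations, here is a small sketch using `itertools.groupby`. The helper names `read_line`, `write_line`, and `quotations` are hypothetical, and the sketch assumes that a blank line ends a quotation; the source only states that consecutive `>` lines belong together.

```python
from itertools import groupby


def read_line(raw: str) -> tuple[str, bool]:
    """Strip a leading `>` marker and report whether the line is a quotation."""
    if raw.startswith(">"):
        return raw[1:].lstrip(), True
    return raw, False


def write_line(text: str, is_quote: bool) -> str:
    """Inverse of read_line for lines written with a single space after `>`."""
    return f"> {text}" if is_quote else text


def quotations(raw_lines: list[str]) -> list[list[str]]:
    """Group runs of adjacent quoted lines into multi-line quotations."""
    parsed = [read_line(raw) for raw in raw_lines]
    return [
        [text for text, _ in run]
        for is_quote, run in groupby(parsed, key=lambda item: item[1])
        if is_quote
    ]
```

Applied to the Augustine passage, this would yield one list per Virgilian quotation, with the surrounding prose left out.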
138
+
139
+ ### Metadata
140
+
141
+ | Field | Description |
142
+ |-------|-------------|
143
+ | `work` | Work title (**required**) |
144
+ | `author` | Author name |
145
+ | `source` | Source URL or reference |
146
+ | `scope` | Portion of work in file (e.g., `1-6` for books 1-6) |
147
+
148
+ Additional fields are preserved in `metadata.extras`.
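The division between the four named fields and the extras can be pictured as a dictionary split. This is only a sketch under the assumption that the front matter has already been loaded into a dict; `KNOWN_FIELDS` and `split_metadata` are hypothetical names.

```python
KNOWN_FIELDS = ("author", "work", "source", "scope")


def split_metadata(front_matter: dict) -> tuple[dict, dict]:
    """Separate the documented fields from any additional ones."""
    known = {key: front_matter.get(key) for key in KNOWN_FIELDS}
    extras = {
        key: value
        for key, value in front_matter.items()
        if key not in KNOWN_FIELDS
    }
    return known, extras


known, extras = split_metadata(
    {
        "author": "Augustine of Hippo",
        "work": "De Civitate Dei",
        "scope": "1.2",
        "language": "la",
        "mode": "prose",
    }
)
print(sorted(extras))
```

Fields such as `language` and `mode` end up in the extras, while a missing documented field such as `source` is simply `None`.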
149
+
150
+ ## API Reference
151
+
152
+ ### Functions
153
+
154
+ - `parse(path_or_content: str, *, strict: bool = True) -> Document` — Parse a `.txtd` file or string. Strict by default: raises `ValueError` if the front matter block or `work` field is missing; pass `strict=False` for fragments.
155
+ - `write(doc: Document, path: str | None) -> str` — Write to file if path given; always returns serialized string
156
+
157
+ ### Classes
158
+
159
+ - `Document` — Container with `metadata: Metadata` and `sections: list[Section]`
160
+ - `Section` — Container with `id: str`, `lines: list[Line]`, optional `title` and `metadata`
161
+ - `Line` — Container with `text: str`, `number: int`, optional `speaker: str | None` and `label: str | None`, and `is_quote: bool` (cross-source quotation)
162
+ - `Metadata` — Container with `author`, `work`, `source`, `scope`, and `extras` dict
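The containers listed above map naturally onto dataclasses. The sketch below mirrors the documented field names but is an assumption about their shape: the defaults, the type of `Section.metadata`, and the `citations` helper are not taken from the library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Line:
    text: str
    number: int
    speaker: Optional[str] = None
    label: Optional[str] = None
    is_quote: bool = False


@dataclass
class Section:
    id: str
    lines: list[Line] = field(default_factory=list)
    title: Optional[str] = None
    metadata: dict = field(default_factory=dict)  # assumed to be a plain dict


@dataclass
class Metadata:
    work: Optional[str] = None
    author: Optional[str] = None
    source: Optional[str] = None
    scope: Optional[str] = None
    extras: dict = field(default_factory=dict)


@dataclass
class Document:
    metadata: Metadata
    sections: list[Section] = field(default_factory=list)


def citations(doc: Document) -> list[str]:
    """Hypothetical helper: list every line as a "section.line" citation."""
    return [f"{s.id}.{line.number}" for s in doc.sections for line in s.lines]
```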
163
+
164
+ ## Development
165
+
166
+ ```bash
167
+ # Clone and install dev dependencies
168
+ git clone https://github.com/diyclassics/txtdown.git
169
+ cd txtdown
170
+ pip install -e ".[dev]"
171
+
172
+ # Run tests
173
+ pytest tests/ -v
174
+
175
+ # Run with coverage
176
+ pytest tests/ --cov=txtdown --cov-report=term-missing
177
+ ```
178
+
179
+ ## Project History
180
+
181
+ The idea for txtdown originated in January 2018, inspired by the need for a document format for Latin text collections that balanced the simplicity of plaintext with the more involved markup of XML-based formats like TEI. The goal was to create a format that is both human-readable and computer-tractable, supporting hierarchical structures, fundamental annotations, and embedded metadata. Txtdown has since been influenced by ongoing work on annotation projects such as the [Representing Women Authorship in the Latin Treebanks (RWALT)](https://diyclassics.github.io/rwalt-site/) project.
182
+
183
+ ## License
184
+
185
+ MIT
txtdown-0.2.0/examples/augustine-civ-dei-1.2.txtd ADDED
@@ -0,0 +1,29 @@
1
+ ---
2
+ author: Augustine of Hippo
3
+ work: De Civitate Dei
4
+ source: https://www.thelatinlibrary.com/augustine/civ1.shtml
5
+ scope: 1.2
6
+ language: la
7
+ mode: prose
8
+ genre: theology
9
+ comment: This example demonstrates the use of `>` for indicating in-text quotation.
10
+ ---
11
+
12
+ --- 2
13
+
14
+ Tot bella gesta conscripta sunt uel ante conditam Romam uel ab eius exortu et imperio: legant et proferant sic aut ab alienigenis aliquam captam esse ciuitatem, ut hostes, qui ceperant, parcerent eis, quos ad deorum suorum templa confugisse compererant, aut aliquem ducem barbarorum praecepisse, ut inrupto oppido nullus feriretur, qui in illo uel illo templo fuisset inuentus. Nonne uidit Aeneas Priamum per aras
15
+
16
+ > Sanguine foedantem quos ipse sacrauerat ignes?
17
+
18
+ Nonne Diomedes et Vlixes
19
+
20
+ > caesis summae custodibus arcis.
21
+ > Corripuere sacram effigiem manibusque cruentis
22
+ > Virgineas ausi diuae contingere uittas?
23
+
24
+ Nec tamen quod sequitur uerum est:
25
+
26
+ > Ex illo fluere ac retro sublapsa referri
27
+ > Spes Danaum.
28
+
29
+ Postea quippe uicerunt, postea Troiam ferro ignibusque delerunt, postea confugientem ad aras Priamum obtruncauerunt. Nec ideo Troia periit, quia Mineruam perdidit. Quid enim prius ipsa Minerua perdiderat, ut periret? an forte custodes suos? Hoc sane uerum est; illis quippe interemptis potuit auferri. Neque enim homines a simulacro, sed simulacrum ab hominibus seruabatur. Quomodo ergo colebatur, ut patriam custodiret et ciues, quae suos non ualuit custodire custodes?