md-reports 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,427 @@
1
+ Metadata-Version: 2.3
2
+ Name: md-reports
3
+ Version: 0.1.0
4
+ Summary: Render Markdown reports to multiple formats (DOCX, with pluggable renderers)
5
+ Author: pythoro
6
+ Author-email: pythoro <pythoro@mindquip.com>
7
+ Requires-Dist: jinja2>=3.1
8
+ Requires-Dist: markdown-it-py>=3.0
9
+ Requires-Dist: mdit-py-plugins>=0.4
10
+ Requires-Dist: python-docx>=1.1
11
+ Requires-Dist: pytest>=7 ; extra == 'dev'
12
+ Requires-Dist: ruff>=0.5 ; extra == 'dev'
13
+ Requires-Dist: matplotlib>=3.7 ; extra == 'examples'
14
+ Requires-Python: >=3.10
15
+ Provides-Extra: dev
16
+ Provides-Extra: examples
17
+ Description-Content-Type: text/markdown
18
+
19
+ # md-reports
20
+
21
+ Convert Markdown to DOCX using a configurable Word template. Designed for
22
+ embedding in Python scripts (no CLI). Extensible to other output formats.
23
+
24
+ ## Install
25
+
26
+ ```bash
27
+ uv add md-reports
28
+ ```
29
+
30
+ ## Quick start
31
+
32
+ ```python
33
+ from md_reports import convert_markdown_text, convert_markdown_file
34
+
35
+ # from a string
36
+ convert_markdown_text(
37
+ "# Title\n\nHello **world**.",
38
+ "out.docx",
39
+ )
40
+
41
+ # from a file (relative image paths resolve against the markdown file)
42
+ convert_markdown_file("doc.md", "doc.docx")
43
+
44
+ # inject script values via Jinja2 substitution
45
+ convert_markdown_text(
46
+ "# Q{{ q }} report\n\nRevenue grew by **{{ pct }}%**.",
47
+ "report.docx",
48
+ context={"q": 2, "pct": 14.5},
49
+ )
50
+ ```
51
+
52
+ Reusable converter (avoids reloading the template each call; supports
53
+ a `default_context` shared across all conversions):
54
+
55
+ ```python
56
+ from md_reports import (
57
+ ConversionOptions, DocxRenderer, MarkdownConverter,
58
+ )
59
+
60
+ conv = MarkdownConverter(
61
+ renderer=DocxRenderer(
62
+ template_path="house_style.docx",
63
+ options=ConversionOptions(strict_mode=True),
64
+ ),
65
+ default_context={"site": "Acme"},
66
+ )
67
+ conv.convert_file("a.md", "a.docx", context={"doc": "Q1"})
68
+ conv.convert_file("b.md", "b.docx", context={"doc": "Q2"})
69
+ ```
70
+
71
+ The `renderer` argument selects the output format. `DocxRenderer` is
72
+ the only built-in renderer today; the abstraction is in place for
73
+ additional renderers (e.g. HTML) to be added without changes to
74
+ `parse`, the model, options, or `MarkdownConverter`.
75
+
76
+ ## What's supported
77
+
78
+ Block elements:
79
+
80
+ - Headings `#` to `######` (mapped to `Heading 1` … `Heading 6` with
81
+ cascading fallback)
82
+ - Paragraphs, block quotes, fenced code blocks
83
+ - Bullet and ordered lists (with nesting)
84
+ - Standard markdown tables (header + body rows, alignment markers)
85
+ - CSV embedding via fenced blocks — both file-backed and inline
86
+ literal data
87
+ - Embedded images (`![alt](path)`) as figures with auto-numbered
88
+ captions
89
+
90
+ Inline elements: bold, italic, inline code, markdown links, and minimal
91
+ inline `<a href="...">…</a>` HTML.
92
+
93
+ ### Figures
94
+
95
+ Block-level images become a figure with an auto-numbered caption
96
+ sourced from the alt text:
97
+
98
+ ```markdown
99
+ ![Quarterly revenue chart](charts/revenue.png)
100
+ ```
101
+
102
+ renders the image followed by a Caption-styled paragraph
103
+ `Figure 1: Quarterly revenue chart`. The number is a Word `SEQ` field,
104
+ so it stays correct after copy/paste or reordering (Word updates fields
105
+ on print or `F9`).
106
+
107
+ ### Table captions
108
+
109
+ Markdown has no native table caption syntax. `md-reports` consumes the
110
+ paragraph immediately preceding a table when it begins with `Table:`:
111
+
112
+ ```markdown
113
+ Table: Quarterly revenue by region.
114
+
115
+ | Region | Q1 | Q2 |
116
+ |--------|----|----|
117
+ | EMEA | 1 | 2 |
118
+ ```
119
+
120
+ The caption is emitted above the table as
121
+ `Table 1: Quarterly revenue by region.` styled with Caption. Figure and
122
+ table counters are independent. The prefix is configurable via
123
+ `ConversionOptions.table_caption_prefix`.
124
+
125
+ ### Cross-references
126
+
127
+ Attach a label to a figure or table by appending `{#label}` to its alt
128
+ text or caption, then refer to it from anywhere in the document with a
129
+ markdown link whose target is `#label`:
130
+
131
+ ```markdown
132
+ ![Quarterly revenue {#fig-revenue}](charts/revenue.png)
133
+
134
+ Table: Sales by region {#tab-sales}
135
+
136
+ | Region | Total |
137
+ |--------|------:|
138
+ | EMEA | 100 |
139
+
140
+ See [Figure 1](#fig-revenue) and [](#tab-sales) for details.
141
+ ```
142
+
143
+ Each labelled caption is wrapped in a Word bookmark; each `#label` link
144
+ becomes a Word `REF` field pointing at that bookmark. The link text
145
+ becomes the cached display value; an empty link text auto-fills as
146
+ `"<Prefix> <Number>"` (e.g. `Table 1`). Forward references work — the
147
+ parser resolves all labels before rendering. After F9 (or print), Word
148
+ recomputes both the SEQ counters and the REF fields so reordering or
149
+ inserting figures keeps numbering and cross-references in sync.
150
+
151
+ Unknown `#label` targets degrade to plain text with a warning (or raise
152
+ in `strict_mode`).
153
+
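The two-pass label resolution described above can be sketched in plain Python. This is an illustrative approximation, not the package's parser: the regular expressions and the names `collect_labels` and `resolve_links` are assumptions, and the sketch emits plain display text where the real renderer would write Word `REF` fields.

```python
import re

# Hypothetical patterns for this sketch only.
FIGURE_RE = re.compile(r"!\[(?P<caption>[^\]]*)\]\([^)]*\)")
TABLE_RE = re.compile(r"^Table:(?P<caption>.*)$", re.MULTILINE)
LABEL_RE = re.compile(r"\{#([\w-]+)\}")
LINK_RE = re.compile(r"(?<!!)\[(?P<text>[^\]]*)\]\(#(?P<label>[\w-]+)\)")


def collect_labels(markdown: str) -> dict[str, str]:
    """Pass 1: map each label to its display text, e.g. 'Table 1'."""
    labels = {}
    for prefix, pattern in (("Figure", FIGURE_RE), ("Table", TABLE_RE)):
        # Figures and tables are numbered by separate counters.
        for number, match in enumerate(pattern.finditer(markdown), start=1):
            label = LABEL_RE.search(match.group("caption"))
            if label:
                labels[label.group(1)] = f"{prefix} {number}"
    return labels


def resolve_links(markdown: str) -> str:
    """Pass 2: replace [text](#label) links with their display text."""
    labels = collect_labels(markdown)

    def replace(match: re.Match) -> str:
        label = match.group("label")
        if label not in labels:
            # Unknown target: degrade to the plain link text.
            return match.group("text")
        # Empty link text auto-fills as "<Prefix> <Number>".
        return match.group("text") or labels[label]

    return LINK_RE.sub(replace, markdown)
```

Because every label is collected before any link is rewritten, a link that appears earlier in the document than its figure or table still resolves.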
154
+ #### Preview-friendly label form
155
+
156
+ The bare `{#label}` marker shows up as literal text when the markdown
157
+ is viewed in a plain markdown previewer (GitHub, VS Code, etc.). For
158
+ table and CSV captions you can wrap the marker in an HTML comment so
159
+ the marker is invisible in previews while still being picked up by
160
+ md-reports:
161
+
162
+ ```markdown
163
+ Table: Sales by region <!-- {#tab-sales} -->
164
+
165
+ | Region | Total |
166
+ |--------|------:|
167
+ | EMEA | 100 |
168
+ ```
169
+
170
+ Both forms are supported and behave identically; pick whichever you
171
+ prefer. The comment must come at the end of the caption. Image alt
172
+ text is already invisible in previews, so the bare form is fine there
173
+ and no comment variant is needed.
174
+
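A minimal sketch of how a caption paragraph might be split into its text and label, accepting both the bare and the comment-wrapped form. The regular expression, the `marker` parameter, and the function name `split_caption` are assumptions made for illustration; the package's actual parsing may differ.

```python
import re

# Matches a trailing {#label}, optionally wrapped in an HTML comment.
LABEL_RE = re.compile(
    r"\s*(?:<!--\s*\{#(?P<hidden>[\w-]+)\}\s*-->|\{#(?P<bare>[\w-]+)\})\s*$"
)


def split_caption(paragraph: str, marker: str = "Table:"):
    """Return (caption_text, label), or None if this is not a caption."""
    if not paragraph.startswith(marker):
        return None
    text = paragraph[len(marker):].strip()
    match = LABEL_RE.search(text)
    if match is None:
        return text, None  # caption without a label
    label = match.group("hidden") or match.group("bare")
    return text[: match.start()].rstrip(), label
```

Anchoring the pattern at the end of the string mirrors the rule that the marker must come at the end of the caption.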
175
+ ### CSV embedding
176
+
177
+ Two fenced-block variants render CSV data as a DOCX table.
178
+
179
+ **From a file** — the body is a single path resolved against
180
+ `project_root` (or the markdown file's directory):
181
+
182
+ ````markdown
183
+ Table: Quarterly revenue.
184
+
185
+ ```csv-file
186
+ data/quarterly.csv
187
+ ```
188
+ ````
189
+
190
+ **Inline** — the body is the CSV literal itself:
191
+
192
+ ````markdown
193
+ ```csv
194
+ region,q1,q2
195
+ EMEA,1,2
196
+ APAC,3,4
197
+ ```
198
+ ````
199
+
200
+ Either form accepts a `no-header` flag on the info string to suppress
201
+ header-row treatment (no row gets bolded; all rows are body):
202
+
203
+ ````markdown
204
+ ```csv-file no-header
205
+ data/raw.csv
206
+ ```
207
+ ````
208
+
209
+ CSV-derived tables share the same `Table N` counter as native markdown
210
+ tables, accept the same preceding-`Table:` caption, and use the
211
+ `Table Grid` style. The delimiter is auto-detected via `csv.Sniffer`
212
+ (falls back to comma); encoding is UTF-8.
213
+
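As a rough illustration of delimiter auto-detection with a comma fallback, the standard-library sketch below uses `csv.Sniffer`. The function name `read_csv_rows` and the candidate delimiter set are assumptions for this example, not the package's internals.

```python
import csv
import io


def read_csv_rows(text: str) -> list[list[str]]:
    """Parse CSV text, guessing the delimiter and defaulting to a comma."""
    try:
        # Assumed candidate set; the real implementation may not restrict it.
        delimiter = csv.Sniffer().sniff(text, delimiters=",;\t|").delimiter
    except csv.Error:
        # The sniffer could not decide (e.g. single-column data).
        delimiter = ","
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


rows = read_csv_rows("region;q1;q2\nEMEA;1;2\nAPAC;3;4\n")
```

Catching `csv.Error` matters because `Sniffer.sniff` raises it when no delimiter can be determined.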
214
+ #### Embedding a pandas DataFrame
215
+
216
+ Pass a DataFrame in the context and pipe it through the built-in
217
+ `to_csv` Jinja2 filter inside a `csv` fence:
218
+
219
+ ````markdown
220
+ Table: Quarterly figures.
221
+
222
+ ```csv
223
+ {{ df | to_csv }}
224
+ ```
225
+ ````
226
+
227
+ ```python
228
+ import pandas as pd
229
+ from md_reports import convert_markdown_text
230
+ df = pd.DataFrame(
231
+ {"region": ["EMEA", "APAC"], "q1": [1, 3], "q2": [2, 4]}
232
+ )
233
+ convert_markdown_text(markdown_text, "out.docx", context={"df": df})
234
+ ```
235
+
236
+ The filter calls `value.to_csv(index=False)` and strips the trailing
237
+ newline. Captions, the shared `Table N` counter, and the `no-header`
238
+ flag all work the same as for any `csv` fence.
239
+
240
+ The filter is duck-typed on `.to_csv()` — pandas is **not** a
241
+ dependency of `md-reports`. Any object with a compatible `.to_csv()`
242
+ method works (your script provides it). Pass any kwargs supported by
243
+ the underlying method, e.g.:
244
+
245
+ ````markdown
246
+ ```csv
247
+ {{ df | to_csv(sep=';', na_rep='—', index=True) }}
248
+ ```
249
+ ````
250
+
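The duck-typed behaviour described above can be approximated as follows. This is a sketch under assumptions: it treats `index=False` as an overridable default (which is what the `index=True` example suggests), and `FakeFrame` is a hypothetical stand-in used only to show that pandas is not required.

```python
def to_csv(value, **kwargs):
    """Render any object exposing .to_csv() as CSV text for a csv fence."""
    # Assumption: index=False is a default that callers can override.
    kwargs.setdefault("index", False)
    return value.to_csv(**kwargs).rstrip("\n")


class FakeFrame:
    """Hypothetical DataFrame stand-in; only .to_csv() is needed."""

    def __init__(self, header, rows):
        self.header = header
        self.rows = rows

    def to_csv(self, index=False, sep=","):
        lines = [sep.join(([""] if index else []) + self.header)]
        for number, row in enumerate(self.rows):
            cells = [str(cell) for cell in row]
            if index:
                cells.insert(0, str(number))
            lines.append(sep.join(cells))
        return "\n".join(lines) + "\n"


frame = FakeFrame(["region", "q1"], [["EMEA", 1], ["APAC", 3]])
text = to_csv(frame, sep=";")
```

In Jinja2, a function like this is normally registered on the environment through its `filters` mapping. Whether the package does so in exactly this way is not stated in the README.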
251
+ ## Jinja2 context
252
+
253
+ Pass a `context` dict to inject script-side values into the markdown
254
+ before parsing. Substitution runs once on the raw markdown text, so
255
+ values flow into every textual position — body, headings, table cells,
256
+ image paths, CSV file paths, inline CSV data, captions:
257
+
258
+ ```python
259
+ convert_markdown_text(
260
+ "# {{ title | upper }}\n\nGrowth: **{{ pct }}%**",
261
+ "out.docx",
262
+ context={"title": "q1 results", "pct": 14.5},
263
+ )
264
+ ```
265
+
266
+ The full Jinja2 syntax is available — variables, filters, conditionals,
267
+ loops:
268
+
269
+ ```markdown
270
+ # {{ report_title }}
271
+
272
+ {% for finding in findings %}
273
+ - {{ finding }}
274
+ {% endfor %}
275
+
276
+ {% if show_appendix %}
277
+ ## Appendix
278
+
279
+ See [details]({{ appendix_url }}).
280
+ {% endif %}
281
+ ```
282
+
283
+ Supported value types include `str`, `int`, `float`, `bool`, `None`,
284
+ `list`/`tuple` of those, and `dict` (for attribute access via
285
+ `{{ user.name }}`).
286
+
287
+ **Missing-variable behavior**:
288
+
289
+ - Default mode: a simple `{{ name }}` whose key is missing renders as
290
+ the literal `{{ name }}` and emits a warning — visible breadcrumb,
291
+ no silent data loss. More complex Jinja2 errors (syntax errors,
292
+ iteration over an undefined sequence, etc.) cause the markdown to
293
+ be left unchanged with a warning.
294
+ - `strict_mode=True`: any undefined variable or template error raises
295
+ `ValidationError`.
296
+
297
+ `MarkdownConverter` accepts a `default_context` at construction
298
+ time and per-call `context=` overrides that merge over it (call-site
299
+ keys win).
300
+
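The merge rule amounts to a dictionary update in which per-call values overwrite the defaults. A minimal sketch, assuming a shallow merge; the helper name `merge_context` is hypothetical:

```python
def merge_context(default_context=None, context=None):
    """Combine converter-level defaults with per-call values."""
    merged = dict(default_context or {})
    merged.update(context or {})  # call-site keys win
    return merged


merged = merge_context({"site": "Acme", "doc": "draft"}, {"doc": "Q1"})
```

Copying the defaults first leaves the converter's `default_context` unmodified between calls.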
301
+ ## Options
302
+
303
+ ```python
304
+ from md_reports import ConversionOptions
305
+
306
+ ConversionOptions(
307
+ strict_mode=False, # raise instead of warn on issues
308
+ figure_caption_prefix="Figure",
309
+ table_caption_prefix="Table",
310
+ project_root=None, # root for resolving relative paths
311
+ # to images and CSV files
312
+ )
313
+ ```
314
+
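One plausible reading of `project_root` is sketched below: relative paths resolve against `project_root` when it is set, and otherwise against the markdown file's directory. The precedence order, the fallback to the current working directory, and the helper name `resolve_asset` are all assumptions for illustration.

```python
from pathlib import Path


def resolve_asset(path, markdown_file=None, project_root=None):
    """Resolve an image or CSV path referenced from a markdown document."""
    candidate = Path(path)
    if candidate.is_absolute():
        return candidate
    if project_root is not None:
        # Assumed precedence: an explicit project_root wins.
        return Path(project_root) / candidate
    if markdown_file is not None:
        return Path(markdown_file).parent / candidate
    # Assumed fallback for text input with no file and no project_root.
    return Path.cwd() / candidate
```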
315
+ ## Templates
316
+
317
+ If `template_path` is omitted on `DocxRenderer` (or you don't pass a
318
+ renderer at all), a packaged default DOCX template is used. To
319
+ inspect or copy the default:
320
+
321
+ ```python
322
+ from md_reports import get_default_template_path
323
+
324
+ print(get_default_template_path())
325
+ ```
326
+
327
+ The template should provide these styles (fallbacks apply when
328
+ missing):
329
+
330
+ - `Normal`, `Heading 1` … `Heading 6`
331
+ - `List Bullet`, `List Number` (and their `2`/`3` variants for nesting)
332
+ - `Quote`, `Caption`
333
+ - `Table Grid`
334
+ - `Code` (optional; falls back to monospace runs in `Normal`)
335
+
336
+ ### Front matter
337
+
338
+ Whatever already lives in the template (cover page, headers/footers,
339
+ title block, table of contents) is preserved — markdown content is
340
+ appended after the existing body.
341
+
342
+ ## Document properties
343
+
344
+ Set DOCX core properties (the fields shown under *File > Info* in
345
+ Word) via `properties=`:
346
+
347
+ ```python
348
+ convert_markdown_text(
349
+ md,
350
+ "report.docx",
351
+ properties={
352
+ "title": "Q4 Report",
353
+ "author": "Jane Doe",
354
+ "subject": "Quarterly review",
355
+ "tags": "revenue, headcount",
356
+ "comments": "Reference: REP-2026-Q4",
357
+ "category": "Finance",
358
+ },
359
+ )
360
+ ```
361
+
362
+ To display these in the rendered document, edit the template and
363
+ insert the matching field via `Insert > Quick Parts > Field…` →
364
+ `Title` / `Author` / `Subject` / `Keywords` / `Comments` / `Category`.
365
+ You can place these in the body, header, or footer. Word recomputes
366
+ fields on F9 / print, same as `SEQ` and `REF`.
367
+
368
+ `MarkdownConverter` also accepts `default_properties=`, merged with
369
+ per-call `properties=` (call-site keys win).
370
+
371
+ Accepted keys (case-insensitive) and the core property they target:
372
+
373
+ | Key (and aliases) | Core property |
374
+ |---|---|
375
+ | `title` | `title` |
376
+ | `author`, `creator` | `author` |
377
+ | `subject` | `subject` |
378
+ | `keywords`, `tags` | `keywords` |
379
+ | `comments`, `description` | `comments` |
380
+ | `category`, `categories` | `category` |
381
+ | `content_status` | `content_status` |
382
+ | `identifier` | `identifier` |
383
+ | `language` | `language` |
384
+ | `version` | `version` |
385
+ | `last_modified_by` | `last_modified_by` |
386
+
387
+ Unknown keys warn (or raise under `strict_mode`). The datetime
388
+ properties (`created`, `modified`, `last_printed`) and the `revision`
389
+ counter are deliberately not exposed.
390
+
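The alias table above can be read as a case-insensitive lookup. The sketch below shows that lookup in isolation; `normalize_properties` is a hypothetical helper, and `ValueError` stands in for the package's own `ValidationError`.

```python
# Alias -> core property, transcribed from the table above.
ALIASES = {
    "title": "title",
    "author": "author", "creator": "author",
    "subject": "subject",
    "keywords": "keywords", "tags": "keywords",
    "comments": "comments", "description": "comments",
    "category": "category", "categories": "category",
    "content_status": "content_status",
    "identifier": "identifier",
    "language": "language",
    "version": "version",
    "last_modified_by": "last_modified_by",
}


def normalize_properties(properties, strict=False):
    """Return (resolved, unknown_keys) for a user-supplied properties dict."""
    resolved, unknown = {}, []
    for key, value in properties.items():
        target = ALIASES.get(key.lower())  # keys are case-insensitive
        if target is None:
            unknown.append(key)
        else:
            resolved[target] = value
    if unknown and strict:
        # Stand-in for the package's ValidationError.
        raise ValueError(f"unknown property keys: {unknown}")
    return resolved, unknown
```

Returning the unknown keys separately lets the caller decide whether to warn or raise.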
391
+ `Company` lives in DOCX extended properties (`docProps/app.xml`),
392
+ not core properties, and is not currently writable via this API.
393
+ Arbitrary user-defined properties (`docProps/custom.xml`) are likewise
394
+ not yet supported — use `subject`/`keywords`/`comments` as a host for
395
+ reference strings or project codes.
396
+
397
+ ## Limitations (v1)
398
+
399
+ - No CLI.
400
+ - Remote (`http(s)://`) image fetching is not supported — use local
401
+ files.
402
+ - Cell merges (`rowspan`/`colspan`) and nested tables are not
403
+ supported.
404
+ - CSV embedding has no per-fence delimiter/encoding overrides yet
405
+ (UTF-8 + `csv.Sniffer` only).
406
+ - `SEQ` fields are written with a pre-computed display value, so
407
+ numbers look right on first open; Word recomputes them when fields
408
+ update (typically on print or pressing `F9`).
409
+ - No footnotes, math, definition lists, or task lists.
410
+
411
+ ## Errors
412
+
413
+ All exceptions inherit from `MdAstDocxError`. Specific types:
414
+
415
+ - `TemplateError` — template missing/unreadable
416
+ - `ParseError` — markdown could not be parsed
417
+ - `RenderError` — DOCX rendering failed
418
+ - `ValidationError` — bad input arguments
419
+
420
+ ## Development
421
+
422
+ ```bash
423
+ uv sync --extra dev
424
+ uv run pytest
425
+ uv run ruff check src tests
426
+ uv run ruff format src tests
427
+ ```