@brahim.ariani/md2pdf-cli 1.0.0 → 1.2.0

package/README.md CHANGED
@@ -7,13 +7,13 @@ GitHub-flavored Markdown, tables, code blocks, blockquotes, images **and LaTeX m
  ## Install

  ```bash
- npm install -g md2pdf-cli
+ npm install -g @brahim.ariani/md2pdf-cli
  ```

  Or use it directly in a project:

  ```bash
- npm install md2pdf-cli
+ npm install @brahim.ariani/md2pdf-cli
  ```

  ## CLI
@@ -30,9 +30,20 @@ If `output.pdf` is omitted, the output filename is derived from the input (e.g.
  |------------------------|------------------------------------------------------------|
  | `--title <text>` | Document title (defaults to the input filename) |
  | `--css <file>` | Path to a custom CSS file (replaces the default styles) |
+ | `--theme <name>` | Built-in theme: `default`, `academic`, `latex` |
  | `--format <size>` | Page format: `A4`, `Letter`, `Legal`, ... Default: `A4` |
+ | `--toc` | Prepend an auto-generated table of contents |
+ | `--toc-depth <n>` | Deepest heading level included in the TOC. Default: `3` |
+ | `--toc-title <text>` | TOC heading text. Default: `Contents` |
+ | `--highlight` | Syntax-highlight fenced code blocks with Shiki |
+ | `--code-theme <name>` | Shiki theme for code blocks. Default: `github-light` |
+ | `--mermaid` | Render ` ```mermaid ` code blocks as diagrams |
+ | `--mermaid-theme <t>` | Mermaid theme: `base`, `default`, `neutral`, `dark`, `forest`. Default: `base` |
+ | `--cover` | Render a title page from YAML front matter |
+ | `--no-cover` | Never render a title page (overrides front matter) |
  | `--no-page-numbers` | Disable the page-number footer |
  | `--no-math` | Disable KaTeX equation rendering |
+ | `--no-sanitize` | Disable HTML sanitization (**unsafe**, see below) |
  | `--keep-html` | Keep the intermediate `.tmp.html` file for debugging |
  | `-h`, `--help` | Show usage |

@@ -43,6 +54,10 @@ md2pdf research.md
  md2pdf research.md out/research.pdf
  md2pdf report.md report.pdf --title "Quarterly Report" --css theme.css
  md2pdf notes.md notes.pdf --format Letter --no-page-numbers
+ md2pdf book.md book.pdf --toc --toc-depth 2 --toc-title "Table of Contents"
+ md2pdf code.md code.pdf --highlight --code-theme github-dark
+ md2pdf paper.md paper.pdf --cover --toc
+ md2pdf thesis.md thesis.pdf --theme academic --toc
  md2pdf paper.md paper.pdf # equations rendered by default
  md2pdf draft.md draft.pdf --no-math # treat $...$ as literal text
  ```
@@ -61,6 +76,146 @@ $$

  Equations are rendered server-side with KaTeX, so the PDF is self-contained and prints identically on any machine.

+ ## Table of contents
+
+ Pass `--toc` to prepend an auto-generated, clickable table of contents built
+ from the document headings. Each heading also receives a stable `id` slug, so
+ the TOC links resolve as in-document bookmarks.
+
+ ```bash
+ md2pdf report.md report.pdf --toc # depth 3 (default)
+ md2pdf report.md report.pdf --toc --toc-depth 2 # only h1 + h2
+ md2pdf report.md report.pdf --toc --toc-title "Sommaire"
+ ```
+
+ The TOC is placed on its own page (it ends with a page break). You can fully
+ restyle it via `--css` by targeting `nav.toc`, `nav.toc .toc-title`, etc.
+
+ ## Themes
+
+ Pick a built-in look with `--theme`:
+
+ | Theme | Description |
+ |------------|----------------------------------------------------------------------|
+ | `default` | Clean sans-serif, navy headings, zebra tables (the original look) |
+ | `academic` | Georgia serif, justified & indented paragraphs, centered title |
+ | `latex` | Classic LaTeX `article` look: Computer Modern serif, booktabs tables |
+
+ ```bash
+ md2pdf report.md report.pdf --theme academic
+ md2pdf thesis.md thesis.pdf --theme latex --toc
+ md2pdf design.md design.pdf --mermaid --highlight --cover
+ ```
+
+ Every theme keeps the same structural rules (math, TOC, cover page, tables,
+ page-break safety) and only swaps typography and colors. An unknown theme name
+ falls back to `default` with a warning. For full control, `--css` still
+ replaces all styles entirely.
+
+ ## Front matter & cover page
+
+ Markdown files may start with a YAML front-matter block. It is parsed, stripped
+ from the body, and used to enrich the document:
+
+ ```markdown
+ ---
+ title: Quarterly Report
+ subtitle: Q2 2026 Financial Overview
+ author:
+   - Brahim Ariani
+   - Finance Team
+ date: 2026-05-31
+ cover: true
+ ---
+
+ # Introduction
+ ...
+ ```
+
+ - `title` becomes the HTML document title (a `--title` flag still wins).
+ - With `--cover` (or `cover: true` in the front matter), a dedicated **title
+   page** is rendered from `title`, `subtitle`, `author`/`authors` and `date`,
+   followed by a page break. `--no-cover` disables it even if the front matter
+   requests one.
+
+ ```bash
+ md2pdf paper.md paper.pdf --cover
+ md2pdf paper.md paper.pdf --cover --toc # title page, then a TOC page
+ ```
+
+ Restyle the cover via `--css` by targeting `section.cover`, `.cover-title`,
+ `.cover-subtitle`, `.cover-author`, `.cover-date`.
+
+ ## Syntax highlighting
+
+ Pass `--highlight` to colorize fenced code blocks with
+ [Shiki](https://shiki.style/) (the same engine that powers VS Code). Colors are
+ inlined into the HTML, so the PDF stays self-contained and prints identically
+ everywhere — no client-side JavaScript or web fonts required.
+
+ ```bash
+ md2pdf code.md code.pdf --highlight
+ md2pdf code.md code.pdf --highlight --code-theme github-dark
+ ```
+
+ Only the languages actually used in the document are loaded, keeping conversion
+ fast. Use any Shiki theme name (e.g. `github-light`, `github-dark`, `nord`,
+ `dracula`, `min-light`). Unknown languages fall back to a plain, escaped code
+ block, and an unknown theme falls back to `github-light`.
+
+ ## Mermaid diagrams
+
+ With `--mermaid`, fenced code blocks tagged `mermaid` are rendered into vector
+ diagrams with [Mermaid](https://mermaid.js.org/) (flowcharts, sequence diagrams,
+ Gantt charts, etc.):
+
+ ````markdown
+ ```mermaid
+ flowchart LR
+   A[Start] --> B{OK?}
+   B -- Yes --> C[Ship]
+   B -- No --> A
+ ```
+ ````
+
+ ```bash
+ md2pdf design.md design.pdf --mermaid
+ md2pdf design.md design.pdf --mermaid --mermaid-theme neutral
+ ```
+
+ Diagrams are rendered inside the same headless Chromium used for printing, so
+ the resulting SVG is embedded directly in the PDF — no network access or extra
+ tooling required. Mermaid runs with `securityLevel: 'strict'`, and a diagram
+ with invalid syntax is skipped rather than aborting the whole conversion.
+ Without `--mermaid`, ` ```mermaid ` blocks are left as plain code.
+
+ ## Security / HTML sanitization
+
+ Markdown allows raw HTML, which means an untrusted `.md` file can embed
+ `<script>`, `<iframe>`, or event-handler attributes (`onerror`, `onclick`, ...).
+ Because the document is rendered through a real browser (Chromium) before being
+ printed, such payloads would otherwise execute.
+
+ To prevent this, the HTML produced from your Markdown is **sanitized by default**
+ with [DOMPurify](https://github.com/cure53/DOMPurify) before it ever reaches the
+ browser. Scripts, event handlers, and dangerous URIs (`javascript:`, ...) are
+ stripped, while legitimate content — headings, tables, code blocks, images,
+ links and KaTeX/MathML/SVG math — is preserved.
+
+ If you fully trust the input and need to keep raw HTML (custom `<script>`,
+ embeds, etc.), you can opt out:
+
+ ```bash
+ md2pdf trusted.md trusted.pdf --no-sanitize
+ ```
+
+ ```js
+ await convert({ input: 'trusted.md', output: 'trusted.pdf', sanitize: false });
+ ```
+
+ > Only disable sanitization for content you control. Never run `--no-sanitize`
+ > on files from untrusted sources.
+
  ## Programmatic API

  ```js
@@ -74,6 +229,11 @@ await convert({
    // css: '/* inline CSS string */',
    format: 'A4',
    pageNumbers: true,
+   sanitize: true,
+   toc: true,
+   tocDepth: 3,
+   highlight: true,
+   codeTheme: 'github-light',
  });
  ```

@@ -84,12 +244,22 @@ await convert({
  | `input` | `string` | — | Path to a Markdown file (required) |
  | `output` | `string` | — | Path to the output PDF (required) |
  | `title` | `string` | input basename | `<title>` of the generated HTML |
- | `css` | `string` | bundled default | Inline CSS string |
- | `cssFile` | `string` | | Path to a CSS file (overrides `css`) |
+ | `theme` | `string` | `'default'` | Built-in theme: `default`/`academic`/`latex` |
+ | `css` | `string` | bundled default | Inline CSS string (overrides `theme`) |
+ | `cssFile` | `string` | — | Path to a CSS file (overrides `css` and `theme`) |
  | `format` | `string` | `'A4'` | Puppeteer page format |
  | `margin` | `object` | 22mm / 18mm | `{ top, bottom, left, right }` |
  | `pageNumbers` | `boolean` | `true` | Render `n / total` in the footer |
  | `math` | `boolean` | `true` | Render `$...$` and `$$...$$` as KaTeX |
+ | `sanitize` | `boolean` | `true` | Sanitize generated HTML (strip scripts/handlers) |
+ | `toc` | `boolean` | `false` | Prepend an auto-generated table of contents |
+ | `tocDepth` | `number` | `3` | Deepest heading level included in the TOC |
+ | `tocTitle` | `string` | `'Contents'` | TOC heading text |
+ | `highlight` | `boolean` | `false` | Syntax-highlight code blocks with Shiki |
+ | `codeTheme` | `string` | `'github-light'` | Shiki theme name for code blocks |
+ | `mermaid` | `boolean` | `false` | Render `mermaid` code blocks as diagrams |
+ | `mermaidTheme` | `string` | `'base'` | Mermaid theme name (light by default) |
+ | `cover` | `boolean` | front matter | Render a title page (`true`/`false` overrides YAML) |
  | `headerTemplate` | `string` | empty | Puppeteer header HTML |
  | `footerTemplate` | `string` | page numbers | Puppeteer footer HTML |
  | `puppeteerOptions` | `object` | `{}` | Extra options passed to `puppeteer.launch` |
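
The stylesheet precedence stated in the options table (`cssFile` over `css` over `theme`) can be sketched as a small pure function. This is only an illustration under assumptions: `resolveCss`, the injected `readFile` callback and the `themes` map are hypothetical names, and the fallback to `default` for an unknown theme is assumed here from the README wording rather than taken from the package's `getThemeCss`.

```javascript
// Hypothetical stand-in for the bundled theme stylesheets.
const themes = {
  default: 'body { font-family: sans-serif; }',
  academic: 'body { font-family: Georgia, serif; }',
};

// Picks the stylesheet in the documented order: cssFile > css > theme.
// `readFile` is injected so the sketch does not touch the file system.
function resolveCss({ cssFile, css, theme = 'default' }, readFile) {
  if (cssFile) return readFile(cssFile);
  if (css) return css;
  // Assumption: an unknown theme name falls back to the default skin.
  const known = Object.prototype.hasOwnProperty.call(themes, theme);
  return known ? themes[theme] : themes.default;
}

// Fake reader used only for demonstration.
const fakeRead = (p) => `/* contents of ${p} */`;

console.log(resolveCss({ cssFile: 'a.css', css: 'x', theme: 'academic' }, fakeRead));
console.log(resolveCss({ css: 'x', theme: 'academic' }, fakeRead));
console.log(resolveCss({ theme: 'academic' }, fakeRead));
```

Passing the reader in as a parameter keeps the precedence logic testable on its own; the real `convert` reads the file directly.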
package/bin/md2pdf.js CHANGED
@@ -3,6 +3,7 @@

  const path = require('path');
  const { convert } = require('../lib/index');
+ const { listThemes } = require('../lib/styles');

  function printUsage() {
    console.log(`Usage:
@@ -11,9 +12,20 @@ function printUsage() {
  Options:
    --title <text> Document title (defaults to input filename)
    --css <file> Path to a custom CSS file (replaces the default styles)
+   --theme <name> Built-in theme: default, academic, latex
    --format <size> Page format (A4, Letter, ...). Default: A4
+   --toc Prepend an auto-generated table of contents
+   --toc-depth <n> Max heading level included in the TOC. Default: 3
+   --toc-title <text> TOC heading text. Default: "Contents"
+   --highlight Syntax-highlight fenced code blocks (Shiki)
+   --code-theme <name> Shiki theme for code blocks. Default: github-light
+   --mermaid Render mermaid fenced code blocks as diagrams
+   --mermaid-theme <t> Mermaid theme: base, default, neutral, dark, forest. Default: base
+   --cover Render a title page from YAML front matter
+   --no-cover Never render a title page (overrides front matter)
    --no-page-numbers Disable footer page numbers
    --no-math Disable KaTeX equation rendering ($...$ / $$...$$)
+   --no-sanitize Disable HTML sanitization (UNSAFE: allows raw HTML/scripts)
    --keep-html Keep the intermediate .tmp.html file
    -h, --help Show this help

@@ -31,10 +43,21 @@ function parseArgs(argv) {
      if (a === '-h' || a === '--help') { args.flags.help = true; continue; }
      if (a === '--no-page-numbers') { args.flags.pageNumbers = false; continue; }
      if (a === '--no-math') { args.flags.math = false; continue; }
+     if (a === '--no-sanitize' || a === '--unsafe') { args.flags.sanitize = false; continue; }
      if (a === '--keep-html') { args.flags.keepHtml = true; continue; }
+     if (a === '--toc') { args.flags.toc = true; continue; }
+     if (a === '--highlight') { args.flags.highlight = true; continue; }
+     if (a === '--mermaid') { args.flags.mermaid = true; continue; }
+     if (a === '--cover') { args.flags.cover = true; continue; }
+     if (a === '--no-cover') { args.flags.cover = false; continue; }
      if (a === '--title') { args.flags.title = argv[++i]; continue; }
      if (a === '--css') { args.flags.cssFile = argv[++i]; continue; }
+     if (a === '--theme') { args.flags.theme = argv[++i]; continue; }
      if (a === '--format') { args.flags.format = argv[++i]; continue; }
+     if (a === '--toc-depth') { args.flags.tocDepth = parseInt(argv[++i], 10); continue; }
+     if (a === '--toc-title') { args.flags.tocTitle = argv[++i]; continue; }
+     if (a === '--code-theme') { args.flags.codeTheme = argv[++i]; continue; }
+     if (a === '--mermaid-theme') { args.flags.mermaidTheme = argv[++i]; continue; }
      if (a.startsWith('--')) {
        console.error(`Unknown option: ${a}`);
        process.exit(2);
@@ -57,15 +80,33 @@ function parseArgs(argv) {
      args.positional[1] ||
      input.replace(/\.md$/i, '') + '.pdf';

+   if (args.flags.theme && !listThemes().includes(args.flags.theme)) {
+     console.warn(
+       `WARNING: unknown theme "${args.flags.theme}", falling back to "default". ` +
+       `Available: ${listThemes().join(', ')}`
+     );
+     args.flags.theme = 'default';
+   }
+
    try {
      const result = await convert({
        input,
        output,
        title: args.flags.title,
        cssFile: args.flags.cssFile,
+       theme: args.flags.theme || 'default',
        format: args.flags.format || 'A4',
        pageNumbers: args.flags.pageNumbers !== false,
        math: args.flags.math !== false,
+       sanitize: args.flags.sanitize !== false,
+       toc: !!args.flags.toc,
+       tocDepth: Number.isInteger(args.flags.tocDepth) ? args.flags.tocDepth : 3,
+       tocTitle: args.flags.tocTitle,
+       highlight: !!args.flags.highlight,
+       codeTheme: args.flags.codeTheme,
+       mermaid: !!args.flags.mermaid,
+       mermaidTheme: args.flags.mermaidTheme || 'base',
+       cover: args.flags.cover,
        keepHtml: !!args.flags.keepHtml,
      });
      if (result.brokenImages && result.brokenImages.length) {
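
The `--toc-depth` handling in this file combines `parseInt(..., 10)` at parse time with a `Number.isInteger` check when calling `convert`. A minimal sketch of that combined behaviour follows; `parseTocDepth` is a hypothetical helper name used only for illustration and does not exist in the package.

```javascript
// Parse as a base-10 integer; fall back to 3 when the value is missing
// or not numeric. Note that no range check is applied, so values such
// as 0 or 9 pass through unchanged.
function parseTocDepth(raw, fallback = 3) {
  const n = parseInt(raw, 10);
  return Number.isInteger(n) ? n : fallback;
}

console.log(parseTocDepth('2'));
console.log(parseTocDepth('deep'));
console.log(parseTocDepth(undefined));
```

Because `parseInt` truncates, an input such as `2.7` is accepted as depth 2 rather than rejected.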
package/lib/frontmatter.js ADDED
@@ -0,0 +1,61 @@
+ 'use strict';
+
+ const matter = require('gray-matter');
+
+ function escapeHtml(s) {
+   return String(s)
+     .replace(/&/g, '&amp;')
+     .replace(/</g, '&lt;')
+     .replace(/>/g, '&gt;')
+     .replace(/"/g, '&quot;');
+ }
+
+ // Splits YAML front matter from the Markdown body. Always returns a plain
+ // metadata object and the remaining content (front matter stripped).
+ function parseFrontMatter(rawMd) {
+   try {
+     const { data, content } = matter(rawMd);
+     return { data: data && typeof data === 'object' ? data : {}, content };
+   } catch (_) {
+     // Malformed YAML: fall back to treating the whole file as content.
+     return { data: {}, content: rawMd };
+   }
+ }
+
+ function formatDate(value) {
+   if (value instanceof Date && !isNaN(value)) {
+     return value.toISOString().slice(0, 10);
+   }
+   return String(value);
+ }
+
+ function normalizeAuthors(data) {
+   const raw = data.author != null ? data.author : data.authors;
+   if (raw == null) return [];
+   return (Array.isArray(raw) ? raw : [raw]).map((a) => String(a)).filter(Boolean);
+ }
+
+ // Builds a standalone title page from front matter metadata. Returns '' when
+ // there is nothing meaningful to show.
+ function buildCoverHtml(data = {}) {
+   const parts = [];
+   if (data.title) {
+     parts.push(`<h1 class="cover-title">${escapeHtml(data.title)}</h1>`);
+   }
+   if (data.subtitle) {
+     parts.push(`<p class="cover-subtitle">${escapeHtml(data.subtitle)}</p>`);
+   }
+   const authors = normalizeAuthors(data);
+   if (authors.length) {
+     parts.push(
+       `<p class="cover-author">${authors.map(escapeHtml).join(', ')}</p>`
+     );
+   }
+   if (data.date != null && data.date !== '') {
+     parts.push(`<p class="cover-date">${escapeHtml(formatDate(data.date))}</p>`);
+   }
+   if (!parts.length) return '';
+   return `<section class="cover">${parts.join('')}</section>`;
+ }
+
+ module.exports = { parseFrontMatter, buildCoverHtml };
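
To see what the author handling produces for different front-matter shapes, here is a reduced, dependency-free copy of two helpers from this file plus a hypothetical wrapper, `coverAuthorLine`, which is not part of the package and only isolates the author line for illustration.

```javascript
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Accepts `author` or `authors`, given as a scalar or as a list.
function normalizeAuthors(data) {
  const raw = data.author != null ? data.author : data.authors;
  if (raw == null) return [];
  return (Array.isArray(raw) ? raw : [raw]).map((a) => String(a)).filter(Boolean);
}

// Hypothetical wrapper: renders only the author paragraph of the cover.
function coverAuthorLine(data) {
  const authors = normalizeAuthors(data);
  if (!authors.length) return '';
  return `<p class="cover-author">${authors.map((a) => escapeHtml(a)).join(', ')}</p>`;
}

console.log(coverAuthorLine({ author: 'A & B' }));
console.log(coverAuthorLine({ authors: ['X', '<Y>'] }));
```

Each author is escaped individually before joining, so markup in a name cannot leak into the cover even before the later sanitization pass.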
package/lib/highlight.js ADDED
@@ -0,0 +1,38 @@
+ 'use strict';
+
+ const DEFAULT_THEME = 'github-light';
+
+ // Shiki is ESM-only; load it lazily via dynamic import so this CommonJS
+ // package keeps working and pays the cost only when highlighting is enabled.
+ let shikiPromise = null;
+ function loadShiki() {
+   if (!shikiPromise) shikiPromise = import('shiki');
+   return shikiPromise;
+ }
+
+ // Builds a highlighter preloaded with the requested theme and languages.
+ // `codeToHtml` is synchronous once the highlighter exists, so it can safely be
+ // called from marked's synchronous `code` renderer.
+ async function createCodeHighlighter({ theme = DEFAULT_THEME, langs = [] } = {}) {
+   const shiki = await loadShiki();
+
+   const safeTheme = theme in shiki.bundledThemes ? theme : DEFAULT_THEME;
+   const safeLangs = Array.from(new Set(langs)).filter(
+     (lang) => lang in shiki.bundledLanguages || lang in shiki.bundledLanguagesAlias
+   );
+
+   const highlighter = await shiki.createHighlighter({
+     themes: [safeTheme],
+     langs: safeLangs,
+   });
+   const loaded = new Set(highlighter.getLoadedLanguages());
+
+   return {
+     theme: safeTheme,
+     supports: (lang) => !!lang && loaded.has(lang),
+     toHtml: (code, lang) =>
+       highlighter.codeToHtml(code, { lang, theme: safeTheme }),
+   };
+ }
+
+ module.exports = { createCodeHighlighter, DEFAULT_THEME };
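
The `loadShiki` helper above caches the promise returned by the dynamic import, so the module is requested at most once. The same pattern can be shown in isolation; in this sketch `once` and `loadGrammar` are hypothetical names, and a resolved promise stands in for `import('shiki')` so the block runs without the dependency.

```javascript
// Wraps an expensive loader so it runs at most once. Later calls reuse the
// first result; for an async loader that result is the same promise object.
function once(loader) {
  let cached = null;
  return () => {
    if (cached === null) cached = { value: loader() };
    return cached.value;
  };
}

let calls = 0;
const loadGrammar = once(() => {
  calls += 1;
  // Stand-in for a dynamic import of a heavy ESM-only module.
  return Promise.resolve({ name: 'fake-grammar' });
});

const first = loadGrammar();
const second = loadGrammar();
console.log(first === second, calls);
```

Caching the promise rather than the resolved value means concurrent callers share one in-flight load instead of each starting their own.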
package/lib/index.js CHANGED
@@ -2,16 +2,74 @@

  const fs = require('fs');
  const path = require('path');
- const { marked } = require('marked');
+ const { Marked } = require('marked');
  const markedKatex = require('marked-katex-extension');
  const puppeteer = require('puppeteer');
- const { defaultCss, katexCssLink } = require('./styles');
+ const { defaultCss, katexCssLink, getThemeCss } = require('./styles');
+ const { sanitizeHtml } = require('./sanitize');
+ const { collectHeadings, buildTocHtml, slugify } = require('./toc');
+ const { createCodeHighlighter } = require('./highlight');
+ const { parseFrontMatter, buildCoverHtml } = require('./frontmatter');
+ const { getMermaidScript } = require('./mermaid');

- let katexExtensionRegistered = false;
- function ensureKatexExtension() {
-   if (katexExtensionRegistered) return;
-   marked.use(markedKatex({ throwOnError: false, output: 'html', nonStandard: true }));
-   katexExtensionRegistered = true;
+ // Recursively collect the fenced-code languages used anywhere in the document
+ // (including inside lists/blockquotes) so only those grammars are loaded.
+ function collectCodeLangs(tokens, out = new Set()) {
+   for (const token of tokens) {
+     if (token.type === 'code' && token.lang) {
+       out.add(token.lang.trim().split(/\s+/)[0]);
+     }
+     if (token.tokens) collectCodeLangs(token.tokens, out);
+     if (token.items) collectCodeLangs(token.items, out);
+     if (token.rows) for (const row of token.rows) collectCodeLangs(row, out);
+   }
+   return out;
+ }
+
+ // Render Markdown to HTML using an isolated Marked instance so per-call state
+ // (heading ids, KaTeX extension, highlighter) never leaks across invocations.
+ async function renderMarkdown({ md, math, toc, tocDepth, tocTitle, highlight, codeTheme, mermaid }) {
+   const m = new Marked();
+   m.setOptions({ gfm: true, breaks: false });
+   if (math) {
+     m.use(markedKatex({ throwOnError: false, output: 'html', nonStandard: true }));
+   }
+
+   const tokens = m.lexer(md);
+   const headings = collectHeadings(tokens);
+   const slugs = headings.map((h) => h.slug);
+
+   let highlighter = null;
+   if (highlight) {
+     const langs = Array.from(collectCodeLangs(tokens));
+     highlighter = await createCodeHighlighter({ theme: codeTheme, langs });
+   }
+
+   let idx = 0;
+   const renderer = {
+     heading(text, level) {
+       const slug = slugs[idx++] || slugify(text) || `section-${idx}`;
+       return `<h${level} id="${slug}">${text}</h${level}>\n`;
+     },
+   };
+   if (highlighter || mermaid) {
+     renderer.code = (text, infostring) => {
+       const lang = (infostring || '').trim().split(/\s+/)[0];
+       if (mermaid && lang === 'mermaid') {
+         return `<pre class="mermaid">${escapeHtml(text)}</pre>\n`;
+       }
+       if (highlighter && highlighter.supports(lang)) {
+         return highlighter.toHtml(text, lang);
+       }
+       const cls = lang ? ` class="language-${lang}"` : '';
+       return `<pre><code${cls}>${escapeHtml(text)}</code></pre>\n`;
+     };
+   }
+   m.use({ renderer });
+
+   const body = m.parse(md);
+   const tocHtml = toc ? buildTocHtml(headings, { title: tocTitle, depth: tocDepth }) : '';
+   return { body, tocHtml };
  }

  function buildHtml({ body, title, css, math }) {
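
The recursive walk in `collectCodeLangs` can be exercised on a hand-built token tree. In the sketch below the function is a trimmed copy of the one in this hunk (without the table-row branch), and the token objects are simplified assumptions that only mimic the fields the walk reads; real marked tokens carry more properties.

```javascript
// Trimmed copy of the walk: visits nested `tokens` and list `items`.
function collectCodeLangs(tokens, out = new Set()) {
  for (const token of tokens) {
    if (token.type === 'code' && token.lang) {
      // Keep only the first word of the info string, e.g. "js" from 'js title="a.js"'.
      out.add(token.lang.trim().split(/\s+/)[0]);
    }
    if (token.tokens) collectCodeLangs(token.tokens, out);
    if (token.items) collectCodeLangs(token.items, out);
  }
  return out;
}

// Simplified, hypothetical token tree.
const tokens = [
  { type: 'code', lang: 'js title="a.js"' },
  { type: 'list', items: [{ type: 'list_item', tokens: [{ type: 'code', lang: 'python' }] }] },
  { type: 'blockquote', tokens: [{ type: 'code', lang: 'js' }] },
  { type: 'code', lang: '' }, // untagged fence: ignored
];

const langs = Array.from(collectCodeLangs(tokens)).sort();
console.log(langs);
```

Using a `Set` deduplicates languages that appear in several blocks, so each grammar is requested from the highlighter once.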
@@ -97,6 +155,16 @@ async function convert(options) {
      puppeteerOptions = {},
      keepHtml = false,
      math = true,
+     sanitize = true,
+     toc = false,
+     tocDepth = 3,
+     tocTitle = 'Contents',
+     highlight = false,
+     codeTheme = 'github-light',
+     cover,
+     theme = 'default',
+     mermaid = false,
+     mermaidTheme = 'base',
    } = options;

    if (!input) throw new Error('`input` is required');
@@ -110,17 +178,36 @@ async function convert(options) {
    }

    const rawMd = fs.readFileSync(inputAbs, 'utf8');
-   marked.setOptions({ gfm: true, breaks: false });
-   if (math) ensureKatexExtension();
-   const md = math ? normalizeBlockMath(rawMd) : rawMd;
-   const body = marked.parse(md);
+   const { data: frontMatter, content } = parseFrontMatter(rawMd);
+   const md = math ? normalizeBlockMath(content) : content;
+   const { body: parsedBody, tocHtml } = await renderMarkdown({
+     md,
+     math,
+     toc,
+     tocDepth,
+     tocTitle,
+     highlight,
+     codeTheme,
+     mermaid,
+   });
+
+   const wantCover =
+     cover === true || (cover == null && frontMatter.cover === true);
+   const coverHtml = wantCover ? buildCoverHtml(frontMatter) : '';
+   const fullBody = [coverHtml, tocHtml, parsedBody].filter(Boolean).join('\n');
+   const body = sanitize ? sanitizeHtml(fullBody) : fullBody;

-   let resolvedCss = css || defaultCss;
+   let resolvedCss;
    if (cssFile) {
      resolvedCss = fs.readFileSync(path.resolve(cssFile), 'utf8');
+   } else if (css) {
+     resolvedCss = css;
+   } else {
+     resolvedCss = getThemeCss(theme);
    }

-   const docTitle = title || path.basename(inputAbs, path.extname(inputAbs));
+   const docTitle =
+     title || frontMatter.title || path.basename(inputAbs, path.extname(inputAbs));
    const html = buildHtml({ body, title: docTitle, css: resolvedCss, math });

    const tmpHtmlPath = outputAbs.replace(/\.pdf$/i, '') + '.tmp.html';
@@ -154,6 +241,21 @@ async function convert(options) {
        .map((img) => img.src)
      );

+   if (mermaid && (await page.$('.mermaid'))) {
+     await page.addScriptTag({ content: getMermaidScript() });
+     await page.evaluate(async (themeName) => {
+       window.mermaid.initialize({
+         startOnLoad: false,
+         securityLevel: 'strict',
+         theme: themeName,
+       });
+       await window.mermaid.run({
+         querySelector: '.mermaid',
+         suppressErrors: true,
+       });
+     }, mermaidTheme);
+   }
+
    const pdfOptions = {
      path: outputAbs,
      format,
@@ -181,4 +283,4 @@ async function convert(options) {
    }
  }

- module.exports = { convert, defaultCss };
+ module.exports = { convert, defaultCss, renderMarkdown };
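
The `cover` option in `convert` is effectively tri-state: `true` forces a title page, `false` suppresses it, and leaving it unset defers to the front matter. The expression used in the diff can be isolated as follows; the function name `wantCover` is borrowed from the local variable and is not an exported API.

```javascript
// true  -> always render a cover
// false -> never render a cover, even if the front matter asks for one
// null/undefined -> follow `cover: true` in the front matter
function wantCover(cover, frontMatter = {}) {
  return cover === true || (cover == null && frontMatter.cover === true);
}

console.log(wantCover(true, {}));
console.log(wantCover(false, { cover: true }));
console.log(wantCover(undefined, { cover: true }));
console.log(wantCover(undefined, { cover: 'yes' }));
```

The strict comparison means only a YAML boolean `true` enables the cover; a string such as `yes` in quotes does not.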
package/lib/mermaid.js ADDED
@@ -0,0 +1,16 @@
+ 'use strict';
+
+ const fs = require('fs');
+
+ // Mermaid's bundled UMD build is large (~3 MB); read it lazily and cache it so
+ // it is only loaded into memory when a document actually uses diagrams.
+ let cachedScript = null;
+
+ function getMermaidScript() {
+   if (cachedScript != null) return cachedScript;
+   const scriptPath = require.resolve('mermaid/dist/mermaid.min.js');
+   cachedScript = fs.readFileSync(scriptPath, 'utf8');
+   return cachedScript;
+ }
+
+ module.exports = { getMermaidScript };
package/lib/sanitize.js ADDED
@@ -0,0 +1,25 @@
+ 'use strict';
+
+ const { JSDOM } = require('jsdom');
+ const createDOMPurify = require('dompurify');
+
+ let purifier = null;
+
+ function getPurifier() {
+   if (purifier) return purifier;
+   const { window } = new JSDOM('');
+   purifier = createDOMPurify(window);
+   return purifier;
+ }
+
+ function sanitizeHtml(html) {
+   const DOMPurify = getPurifier();
+   return DOMPurify.sanitize(html, {
+     USE_PROFILES: { html: true, svg: true, svgFilters: true, mathMl: true },
+     ADD_ATTR: ['target'],
+     FORBID_TAGS: ['style'],
+     ALLOW_DATA_ATTR: false,
+   });
+ }
+
+ module.exports = { sanitizeHtml };
package/lib/styles.js CHANGED
@@ -13,22 +13,72 @@ function katexCssLink() {
    return `<link rel="stylesheet" href="${fileUrl}">`;
  }

- const defaultCss = `
+ // Structural rules required by the converter's features (KaTeX, TOC, cover
+ // page, images, page-break safety). These are theme-agnostic and always
+ // applied; per-theme skins below only handle typography and colors.
+ const baseCss = `
  @page { size: A4; margin: 22mm 18mm 22mm 18mm; }
  * { box-sizing: border-box; }
+ body { margin: 0; max-width: 100%; }
+ h1, h2, h3, h4 { page-break-after: avoid; break-after: avoid; }
+ img {
+   max-width: 100%;
+   height: auto;
+   display: block;
+   margin: 12px auto;
+   page-break-inside: avoid;
+   break-inside: avoid;
+ }
+ table {
+   width: 100%;
+   border-collapse: collapse;
+   margin: 12px 0 18px 0;
+   page-break-inside: avoid;
+ }
+ pre { overflow-x: auto; page-break-inside: avoid; }
+ .mermaid {
+   text-align: center;
+   margin: 14px 0;
+   background: #ffffff;
+   color: #000;
+   padding: 8px 0;
+   border: none;
+   page-break-inside: avoid;
+   break-inside: avoid;
+ }
+ .mermaid svg { max-width: 100%; height: auto; background: #ffffff; }
+ .katex { font-size: 1.05em; }
+ .katex-display { margin: 14px 0; overflow-x: auto; overflow-y: hidden; page-break-inside: avoid; }
+ .katex-display > .katex { display: inline-block; text-align: center; max-width: 100%; }
+ nav.toc { page-break-after: always; break-after: page; margin-bottom: 8px; }
+ nav.toc .toc-title { margin-top: 0; padding-bottom: 4px; }
+ nav.toc ul { list-style: none; margin: 4px 0; padding-left: 18px; }
+ nav.toc > ul { padding-left: 0; }
+ nav.toc li { margin-bottom: 3px; }
+ section.cover {
+   page-break-after: always;
+   break-after: page;
+   display: flex;
+   flex-direction: column;
+   align-items: center;
+   justify-content: center;
+   text-align: center;
+   min-height: 80vh;
+ }
+ section.cover .cover-title { font-size: 30pt; border: none; margin: 0 0 8px 0; padding: 0; }
+ section.cover .cover-subtitle { font-size: 15pt; margin: 0 0 28px 0; }
+ section.cover .cover-author { font-size: 13pt; margin: 0 0 6px 0; }
+ section.cover .cover-date { font-size: 11pt; margin: 0; }
+ `;
+
+ const defaultSkin = `
  body {
    font-family: "Segoe UI", "Helvetica Neue", Arial, sans-serif;
    font-size: 11pt;
    line-height: 1.55;
    color: #1a1a1a;
-   max-width: 100%;
-   margin: 0;
- }
- h1, h2, h3, h4 {
-   color: #102a43;
-   page-break-after: avoid;
-   break-after: avoid;
  }
+ h1, h2, h3, h4 { color: #102a43; }
  h1 { font-size: 22pt; border-bottom: 2px solid #102a43; padding-bottom: 6px; margin-top: 0; }
  h2 { font-size: 16pt; border-bottom: 1px solid #bcccdc; padding-bottom: 4px; margin-top: 28px; }
  h3 { font-size: 13pt; margin-top: 22px; }
@@ -45,59 +95,119 @@ code {
    font-size: 9.5pt;
    color: #b91c1c;
  }
- pre {
-   background: #0f172a;
-   color: #f1f5f9;
-   padding: 12px;
-   border-radius: 5px;
-   overflow-x: auto;
-   font-size: 9pt;
- }
+ pre { background: #0f172a; color: #f1f5f9; padding: 12px; border-radius: 5px; font-size: 9pt; }
  pre code { background: transparent; color: inherit; padding: 0; }
- table {
-   width: 100%;
-   border-collapse: collapse;
-   margin: 12px 0 18px 0;
-   page-break-inside: avoid;
-   font-size: 9.5pt;
- }
- th, td {
-   border: 1px solid #cbd5e1;
-   padding: 6px 9px;
-   text-align: left;
-   vertical-align: top;
- }
+ table { font-size: 9.5pt; }
+ th, td { border: 1px solid #cbd5e1; padding: 6px 9px; text-align: left; vertical-align: top; }
  th { background: #e2e8f0; color: #0b2447; font-weight: 600; }
  tr:nth-child(even) td { background: #f8fafc; }
  hr { border: none; border-top: 1px solid #cbd5e1; margin: 24px 0; }
- blockquote {
-   border-left: 4px solid #64748b;
-   padding: 4px 12px;
-   color: #475569;
-   background: #f8fafc;
-   margin: 10px 0;
+ blockquote { border-left: 4px solid #64748b; padding: 4px 12px; color: #475569; background: #f8fafc; margin: 10px 0; }
+ img { border: 1px solid #e2e8f0; border-radius: 4px; }
+ img + em, p > em { display: block; text-align: center; color: #475569; font-size: 9.5pt; margin-bottom: 14px; }
+ a { color: #1d4ed8; text-decoration: none; }
+ nav.toc .toc-title { border-bottom: 1px solid #bcccdc; }
+ nav.toc a { color: #102a43; }
+ section.cover .cover-subtitle { color: #475569; }
+ section.cover .cover-author { color: #102a43; }
+ section.cover .cover-date { color: #64748b; }
+ `;
+
+ // Emulates the classic LaTeX `article` look: Computer Modern serif, justified
+ // and indented paragraphs, booktabs-style rules, centered title.
+ const latexSkin = `
+ body {
+   font-family: "Latin Modern Roman", "CMU Serif", "Computer Modern", "Georgia", "Times New Roman", serif;
+   font-size: 11pt;
+   line-height: 1.5;
+   color: #000;
  }
- img {
-   max-width: 100%;
-   height: auto;
-   display: block;
-   margin: 12px auto;
-   border: 1px solid #e2e8f0;
-   border-radius: 4px;
-   page-break-inside: avoid;
-   break-inside: avoid;
+ h1, h2, h3, h4 { color: #000; font-weight: 700; }
+ h1 { font-size: 19pt; text-align: center; margin-top: 0; margin-bottom: 20px; }
+ h2 { font-size: 14pt; margin-top: 24px; }
+ h3 { font-size: 12pt; margin-top: 18px; }
+ h4 { font-size: 11pt; margin-top: 14px; font-style: italic; font-weight: 600; }
+ p { text-align: justify; margin: 0 0 2px 0; text-indent: 1.5em; }
+ p:first-of-type, h1 + p, h2 + p, h3 + p, h4 + p { text-indent: 0; }
+ ul, ol { margin: 6px 0 10px 0; padding-left: 24px; }
+ li { margin-bottom: 2px; }
+ code {
+   background: #f4f4f4;
+   padding: 1px 4px;
+   border-radius: 2px;
+   font-family: "Consolas", "Courier New", monospace;
+   font-size: 9.5pt;
+   color: #222;
  }
- img + em, p > em {
-   display: block;
-   text-align: center;
-   color: #475569;
+ pre { background: #f4f4f4; color: #1a1a1a; padding: 12px; border: 1px solid #ddd; border-radius: 3px; font-size: 9pt; }
+ pre code { background: transparent; color: inherit; padding: 0; }
+ table { font-size: 10pt; margin: 14px auto; border-top: 1.5px solid #000; border-bottom: 1.5px solid #000; }
+ th, td { border: none; padding: 5px 12px; text-align: left; vertical-align: top; }
+ th { border-bottom: 1px solid #000; font-weight: 700; }
+ hr { border: none; border-top: 1px solid #000; margin: 22px 0; }
+ blockquote { border-left: 2px solid #000; padding: 2px 14px; color: #1a1a1a; margin: 10px 24px; }
+ img + em, p > em { display: block; text-align: center; color: #333; font-size: 9.5pt; margin-bottom: 14px; }
+ a { color: #000; text-decoration: none; }
+ nav.toc .toc-title { border-bottom: 1px solid #000; text-align: center; }
+ nav.toc a { color: #000; }
+ section.cover .cover-subtitle { color: #333; }
+ section.cover .cover-date { color: #333; }
+ `;
+
+ const academicSkin = `
+ body {
+   font-family: "Georgia", "Times New Roman", "Cambria", serif;
+   font-size: 11.5pt;
+   line-height: 1.6;
+   color: #161616;
+ }
+ h1, h2, h3, h4 { color: #161616; font-weight: 700; font-family: "Georgia", "Times New Roman", serif; }
+ h1 { font-size: 20pt; text-align: center; margin-top: 0; margin-bottom: 18px; }
166
+ h2 { font-size: 15pt; margin-top: 26px; }
167
+ h3 { font-size: 12.5pt; margin-top: 20px; font-style: italic; }
168
+ h4 { font-size: 11.5pt; margin-top: 16px; }
169
+ p { text-align: justify; margin: 0 0 4px 0; text-indent: 1.6em; }
170
+ p:first-of-type, h1 + p, h2 + p, h3 + p, h4 + p { text-indent: 0; }
171
+ ul, ol { margin: 8px 0 12px 0; padding-left: 26px; }
172
+ li { margin-bottom: 3px; }
173
+ code {
174
+ background: #f2f2f0;
175
+ padding: 1px 4px;
176
+ border-radius: 2px;
177
+ font-family: "Consolas", "Courier New", monospace;
94
178
  font-size: 9.5pt;
95
- margin-bottom: 14px;
179
+ color: #333;
96
180
  }
97
- a { color: #1d4ed8; text-decoration: none; }
98
- .katex { font-size: 1.05em; }
99
- .katex-display { margin: 14px 0; overflow-x: auto; overflow-y: hidden; page-break-inside: avoid; }
100
- .katex-display > .katex { display: inline-block; text-align: center; max-width: 100%; }
181
+ pre { background: #f2f2f0; color: #1a1a1a; padding: 12px; border: 1px solid #ddd; border-radius: 3px; font-size: 9pt; }
182
+ pre code { background: transparent; color: inherit; padding: 0; }
183
+ table { font-size: 10pt; margin: 14px auto; }
184
+ th, td { border: 1px solid #999; padding: 5px 10px; text-align: left; vertical-align: top; }
185
+ th { background: #ececec; font-weight: 700; }
186
+ hr { border: none; border-top: 1px solid #999; margin: 22px 0; }
187
+ blockquote { border-left: 3px solid #888; padding: 2px 14px; color: #333; font-style: italic; margin: 10px 20px; }
188
+ img + em, p > em { display: block; text-align: center; color: #444; font-size: 9.5pt; margin-bottom: 14px; }
189
+ a { color: #1a1a1a; text-decoration: underline; }
190
+ nav.toc .toc-title { border-bottom: 1px solid #999; text-align: center; }
191
+ nav.toc a { color: #161616; }
192
+ section.cover .cover-subtitle { color: #444; font-style: italic; }
193
+ section.cover .cover-date { color: #555; }
101
194
  `;
102
195
 
103
- module.exports = { defaultCss, katexCssLink };
196
+ const themes = {
197
+ default: defaultSkin,
198
+ academic: academicSkin,
199
+ latex: latexSkin,
200
+ };
201
+
202
+ function listThemes() {
203
+ return Object.keys(themes);
204
+ }
205
+
206
+ function getThemeCss(name) {
207
+ const skin = themes[name] || themes.default;
208
+ return `${baseCss}\n${skin}`;
209
+ }
210
+
211
+ const defaultCss = getThemeCss('default');
212
+
213
+ module.exports = { defaultCss, katexCssLink, getThemeCss, listThemes };
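Review note on the theme lookup above: `getThemeCss` resolves a skin with `themes[name] || themes.default`, so an unknown `--theme` value silently falls back to the default skin. The sketch below reproduces that lookup with short placeholder strings standing in for the real `baseCss` and skin CSS (the placeholders are assumptions, not the package's content). It also shows a hypothetical stricter variant, `getThemeCssStrict`, which is not part of the package and only illustrates an own-key check.

```javascript
'use strict';

// Placeholder values: in the package these are long CSS strings.
const baseCss = '/* base */';
const themes = {
  default: '/* default skin */',
  academic: '/* academic skin */',
  latex: '/* latex skin */',
};

// Same lookup as in the diff: any falsy result falls back to the default skin.
function getThemeCss(name) {
  const skin = themes[name] || themes.default;
  return `${baseCss}\n${skin}`;
}

// Hypothetical variant (not in the package): only keys defined directly on
// `themes` are accepted, so inherited names such as "constructor" fall back too.
function getThemeCssStrict(name) {
  const skin = Object.hasOwn(themes, name) ? themes[name] : themes.default;
  return `${baseCss}\n${skin}`;
}

console.log(getThemeCss('latex'));
console.log(getThemeCss('no-such-theme') === getThemeCss('default'));
console.log(getThemeCssStrict('constructor') === getThemeCssStrict('default'));
```

Because `themes` is a plain object literal, a name like `constructor` resolves to an inherited, truthy value in the original lookup and is interpolated into the CSS instead of triggering the fallback. Whether that matters depends on how the CLI validates `--theme` before calling `getThemeCss`, which this part of the diff does not show.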
package/lib/toc.js ADDED
@@ -0,0 +1,75 @@
1
+ 'use strict';
2
+
3
+ function escapeHtml(s) {
4
+ return String(s)
5
+ .replace(/&/g, '&amp;')
6
+ .replace(/</g, '&lt;')
7
+ .replace(/>/g, '&gt;')
8
+ .replace(/"/g, '&quot;');
9
+ }
10
+
11
+ function slugify(text) {
12
+ return String(text)
13
+ .toLowerCase()
14
+ .trim()
15
+ .replace(/<[^>]*>/g, '')
16
+ .replace(/[^\w\s-]/g, '')
17
+ .replace(/\s+/g, '-')
18
+ .replace(/-+/g, '-')
19
+ .replace(/^-+|-+$/g, '');
20
+ }
21
+
22
+ function collectHeadings(tokens) {
23
+ const used = new Map();
24
+ const out = [];
25
+ for (const token of tokens) {
26
+ if (token.type !== 'heading') continue;
27
+ const text = token.text;
28
+ let slug = slugify(text) || 'section';
29
+ if (used.has(slug)) {
30
+ const next = used.get(slug) + 1;
31
+ used.set(slug, next);
32
+ slug = `${slug}-${next}`;
33
+ } else {
34
+ used.set(slug, 0);
35
+ }
36
+ out.push({ level: token.depth, text, slug });
37
+ }
38
+ return out;
39
+ }
40
+
41
+ function nest(items) {
42
+ const root = { level: -Infinity, children: [] };
43
+ const stack = [root];
44
+ for (const item of items) {
45
+ const node = { ...item, children: [] };
46
+ while (stack.length > 1 && stack[stack.length - 1].level >= item.level) {
47
+ stack.pop();
48
+ }
49
+ stack[stack.length - 1].children.push(node);
50
+ stack.push(node);
51
+ }
52
+ return root.children;
53
+ }
54
+
55
+ function renderNodes(nodes) {
56
+ if (!nodes.length) return '';
57
+ const items = nodes
58
+ .map(
59
+ (n) =>
60
+ `<li><a href="#${n.slug}">${escapeHtml(n.text)}</a>${renderNodes(n.children)}</li>`
61
+ )
62
+ .join('');
63
+ return `<ul>${items}</ul>`;
64
+ }
65
+
66
+ function buildTocHtml(headings, { title = 'Contents', depth = 3 } = {}) {
67
+ const items = headings.filter((h) => h.level <= depth);
68
+ if (!items.length) return '';
69
+ const heading = title
70
+ ? `<h2 class="toc-title">${escapeHtml(title)}</h2>`
71
+ : '';
72
+ return `<nav class="toc">${heading}${renderNodes(nest(items))}</nav>`;
73
+ }
74
+
75
+ module.exports = { slugify, collectHeadings, buildTocHtml };
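Review note on `lib/toc.js`: the sketch below shows how the slug de-duplication and the stack-based nesting behave on a small input. `slugify`, `collectHeadings`, and `nest` are copied from the diff so the example runs on its own. The `tokens` array is a hand-written assumption shaped like marked's lexer output (`type`, `depth`, `text`); it is not taken from the package's tests.

```javascript
'use strict';

// Copied from the diff above.
function slugify(text) {
  return String(text)
    .toLowerCase()
    .trim()
    .replace(/<[^>]*>/g, '')   // drop inline HTML tags
    .replace(/[^\w\s-]/g, '')  // drop punctuation
    .replace(/\s+/g, '-')
    .replace(/-+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Copied from the diff above: repeated slugs get -1, -2, ... suffixes.
function collectHeadings(tokens) {
  const used = new Map();
  const out = [];
  for (const token of tokens) {
    if (token.type !== 'heading') continue;
    let slug = slugify(token.text) || 'section';
    if (used.has(slug)) {
      const next = used.get(slug) + 1;
      used.set(slug, next);
      slug = `${slug}-${next}`;
    } else {
      used.set(slug, 0);
    }
    out.push({ level: token.depth, text: token.text, slug });
  }
  return out;
}

// Copied from the diff above: pop the stack until the top is a shallower
// heading, then attach the new node as its child.
function nest(items) {
  const root = { level: -Infinity, children: [] };
  const stack = [root];
  for (const item of items) {
    const node = { ...item, children: [] };
    while (stack.length > 1 && stack[stack.length - 1].level >= item.level) {
      stack.pop();
    }
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

// Hypothetical input for illustration only.
const tokens = [
  { type: 'heading', depth: 1, text: 'Intro' },
  { type: 'paragraph', text: 'Some text.' },
  { type: 'heading', depth: 2, text: 'Setup' },
  { type: 'heading', depth: 3, text: 'Setup' },
  { type: 'heading', depth: 2, text: 'C++ & <em>Rust</em>' },
];

const headings = collectHeadings(tokens);
const tree = nest(headings);

console.log(headings.map((h) => h.slug)); // [ 'intro', 'setup', 'setup-1', 'c-rust' ]
console.log(tree[0].children.map((n) => n.slug)); // [ 'setup', 'c-rust' ]
```

One edge case worth checking against the renderer: only the base slug is recorded in `used`, so a literal heading `Setup 1` that follows two `Setup` headings also produces `setup-1`. If heading `id`s are generated from these slugs, that pair would share an anchor.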
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@brahim.ariani/md2pdf-cli",
3
- "version": "1.0.0",
3
+ "version": "1.2.0",
4
4
  "description": "Convert Markdown files to beautifully styled PDFs using marked and puppeteer.",
5
5
  "keywords": [
6
6
  "markdown",
@@ -31,13 +31,24 @@
31
31
  "node": ">=18"
32
32
  },
33
33
  "scripts": {
34
- "test": "node test/smoke.js"
34
+ "test": "node test/sanitize.js && node test/toc.js && node test/highlight.js && node test/frontmatter.js && node test/themes.js && node test/mermaid.js && node test/smoke.js",
35
+ "test:sanitize": "node test/sanitize.js",
36
+ "test:toc": "node test/toc.js",
37
+ "test:highlight": "node test/highlight.js",
38
+ "test:frontmatter": "node test/frontmatter.js",
39
+ "test:themes": "node test/themes.js",
40
+ "test:mermaid": "node test/mermaid.js"
35
41
  },
36
42
  "dependencies": {
43
+ "dompurify": "^3.4.7",
44
+ "gray-matter": "^4.0.3",
45
+ "jsdom": "^29.1.1",
37
46
  "katex": "^0.16.9",
38
47
  "marked": "^12.0.0",
39
48
  "marked-katex-extension": "^5.0.0",
40
- "puppeteer": "^24.15.0"
49
+ "mermaid": "^11.15.0",
50
+ "puppeteer": "^24.15.0",
51
+ "shiki": "^4.1.0"
41
52
  },
42
53
  "repository": {
43
54
  "type": "git",