@vakra-dev/supermarkdown 0.0.2 → 0.0.3

package/README.md ADDED
@@ -0,0 +1,343 @@
1
+ # supermarkdown
2
+
3
+ [![npm version](https://img.shields.io/npm/v/@vakra-dev/supermarkdown.svg)](https://www.npmjs.com/package/@vakra-dev/supermarkdown)
4
+ [![crates.io](https://img.shields.io/crates/v/supermarkdown.svg)](https://crates.io/crates/supermarkdown)
5
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
6
+
7
+ High-performance HTML to Markdown converter with full GitHub Flavored Markdown support. Written in Rust, available for Node.js and as a native Rust crate.
8
+
9
+ ## Features
10
+
11
+ - **Fast** - Written in Rust with O(n) algorithms, significantly faster than JavaScript alternatives
12
+ - **Full GFM Support** - Tables with alignment, strikethrough, autolinks, fenced code blocks
13
+ - **Accurate** - Handles malformed HTML gracefully via html5ever
14
+ - **Configurable** - Multiple heading styles, link styles, custom selectors
15
+ - **Zero Dependencies** - Ships as a single native binary; conversion runs in native code rather than in JavaScript
16
+ - **Cross-Platform** - Pre-built binaries for Windows, macOS, and Linux (x64 & ARM64)
17
+ - **TypeScript Ready** - Full type definitions included
18
+ - **Async Support** - Non-blocking conversion for large documents
19
+
20
+ ## Installation
21
+
22
+ ```bash
23
+ npm install @vakra-dev/supermarkdown
24
+ ```
25
+
26
+ ## Quick Start
27
+
28
+ ```javascript
29
+ import { convert } from "@vakra-dev/supermarkdown";
30
+
31
+ const html = `
32
+ <h1>Hello World</h1>
33
+ <p>This is a <strong>test</strong> with a <a href="https://example.com">link</a>.</p>
34
+ `;
35
+
36
+ const markdown = convert(html);
37
+ console.log(markdown);
38
+ // # Hello World
39
+ //
40
+ // This is a **test** with a [link](https://example.com).
41
+ ```
42
+
43
+ ## Usage
44
+
45
+ ### Basic Conversion
46
+
47
+ ```javascript
48
+ import { convert } from "@vakra-dev/supermarkdown";
49
+
50
+ const markdown = convert("<h1>Title</h1><p>Paragraph</p>");
51
+ ```
52
+
53
+ ### With Options
54
+
55
+ ```javascript
56
+ import { convert } from "@vakra-dev/supermarkdown";
57
+
58
+ const markdown = convert(html, {
59
+   headingStyle: "setext", // 'atx' (default) or 'setext'
59
+   linkStyle: "referenced", // 'inline' (default) or 'referenced'
60
+   excludeSelectors: ["nav", ".sidebar", "#ads"],
61
+   includeSelectors: [".important"], // Override excludes for specific elements
63
+ });
64
+ ```
65
+
66
+ ### Async Conversion
67
+
68
+ For large documents, use `convertAsync` to avoid blocking the main thread:
69
+
70
+ ```javascript
71
+ import { convertAsync } from "@vakra-dev/supermarkdown";
72
+
73
+ const markdown = await convertAsync(largeHtml);
74
+
75
+ // Process multiple documents in parallel
76
+ const results = await Promise.all([
77
+   convertAsync(html1),
78
+   convertAsync(html2),
79
+   convertAsync(html3),
80
+ ]);
81
+ ```
82
+
83
+ ## API Reference
84
+
85
+ ### `convert(html, options?)`
86
+
87
+ Converts HTML to Markdown synchronously.
88
+
89
+ **Parameters:**
90
+
91
+ - `html` (string) - The HTML string to convert
92
+ - `options` (object, optional) - Conversion options
93
+
94
+ **Returns:** string - The converted Markdown
95
+
96
+ ### `convertAsync(html, options?)`
97
+
98
+ Converts HTML to Markdown asynchronously.
99
+
100
+ **Parameters:**
101
+
102
+ - `html` (string) - The HTML string to convert
103
+ - `options` (object, optional) - Conversion options
104
+
105
+ **Returns:** Promise<string> - The converted Markdown
106
+
107
+ ### Options
108
+
109
+ | Option | Type | Default | Description |
110
+ | ------------------ | ---------------------------- | ----------- | ------------------------------------------------ |
111
+ | `headingStyle` | `'atx'` \| `'setext'` | `'atx'` | ATX uses `#` prefix, Setext uses underlines |
112
+ | `linkStyle` | `'inline'` \| `'referenced'` | `'inline'` | Inline: `[text](url)`, Referenced: `[text][1]` |
113
+ | `codeFence` | `` '`' `` \| `'~'` | `` '`' `` | Character for fenced code blocks |
114
+ | `bulletMarker` | `'-'` \| `'*'` \| `'+'` | `'-'` | Character for unordered list items |
115
+ | `baseUrl` | `string` | `undefined` | Base URL for resolving relative links |
116
+ | `excludeSelectors` | `string[]` | `[]` | CSS selectors for elements to exclude |
117
+ | `includeSelectors` | `string[]` | `[]` | CSS selectors to force keep (overrides excludes) |
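The two style options above produce quite different output. As a rough illustration (not supermarkdown's implementation), the hypothetical helpers `renderHeading` and `renderLinks` below render a heading in either style and a list of links either inline or as numbered references. They assume that Setext is used only for `<h1>` and `<h2>` (CommonMark defines Setext underlines for those two levels only) with deeper headings falling back to ATX, that underlines match the text length with a minimum of three characters, and that a repeated URL reuses its reference number.

```javascript
// Hypothetical helpers; fallback and numbering rules are assumptions.

function renderHeading(level, text, style = "atx") {
  // Setext underlines exist only for level 1 (=) and level 2 (-),
  // so this sketch falls back to ATX for <h3>-<h6>.
  if (style === "setext" && (level === 1 || level === 2)) {
    const underline = (level === 1 ? "=" : "-").repeat(Math.max(text.length, 3));
    return `${text}\n${underline}`;
  }
  return `${"#".repeat(level)} ${text}`;
}

// links: array of { text, url } in document order.
// Returns the rendered links plus any reference definitions for the end.
function renderLinks(links, style = "inline") {
  if (style === "inline") {
    return { body: links.map(({ text, url }) => `[${text}](${url})`), references: [] };
  }
  const numbers = new Map(); // url -> reference number, first appearance wins
  const body = links.map(({ text, url }) => {
    if (!numbers.has(url)) numbers.set(url, numbers.size + 1);
    return `[${text}][${numbers.get(url)}]`;
  });
  const references = [...numbers].map(([url, n]) => `[${n}]: ${url}`);
  return { body, references };
}
```

Under these assumptions, `renderHeading(3, "Notes", "setext")` yields `### Notes`.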
118
+
119
+ ## Supported Elements
120
+
121
+ ### Block Elements
122
+
123
+ | HTML | Markdown |
124
+ | -------------------------- | ---------------------------------------------- |
125
+ | `<h1>` - `<h6>` | `#` headings or setext underlines |
126
+ | `<p>` | Paragraphs with blank lines |
127
+ | `<blockquote>` | `>` quoted blocks (supports nesting) |
128
+ | `<ul>`, `<ol>` | `-` or `1.` lists (supports `start` attribute) |
129
+ | `<pre><code>` | Fenced code blocks with language detection |
130
+ | `<table>` | GFM tables with alignment and captions |
131
+ | `<hr>` | `---` horizontal rules |
132
+ | `<dl>`, `<dt>`, `<dd>` | Definition lists |
133
+ | `<details>`, `<summary>` | Collapsible sections |
134
+ | `<figure>`, `<figcaption>` | Images with captions |
135
+
136
+ ### Inline Elements
137
+
138
+ | HTML | Markdown |
139
+ | -------------------------- | --------------------------------------- |
140
+ | `<a>` | `[text](url)`, `[text][ref]`, or `<url>` (autolink) |
141
+ | `<img>` | `![alt](src)` |
142
+ | `<strong>`, `<b>` | `**bold**` |
143
+ | `<em>`, `<i>` | `*italic*` |
144
+ | `<code>` | `` `code` `` (handles nested backticks) |
145
+ | `<del>`, `<s>`, `<strike>` | `~~strikethrough~~` |
146
+ | `<sub>` | `<sub>subscript</sub>` |
147
+ | `<sup>` | `<sup>superscript</sup>` |
148
+ | `<br>` | Line breaks |
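The `<code>` row notes that nested backticks are handled. CommonMark's code-span rules allow this by delimiting the span with a backtick run whose length does not occur inside the content, plus a padding space when the content starts or ends with a backtick. The sketch below assumes supermarkdown does something equivalent; `inlineCode` is a hypothetical name, and using the longest inner run plus one is simply a safe choice.

```javascript
// Hypothetical sketch: wrap text as a Markdown code span that survives
// backticks inside it, following CommonMark's code-span rules.
function inlineCode(text) {
  // Length of the longest run of consecutive backticks in the content.
  const runs = text.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  const fence = "`".repeat(longest + 1);
  // A leading or trailing backtick would merge with the fence, so pad with
  // a space; CommonMark strips one such space from each side when rendering.
  const pad = text.startsWith("`") || text.endsWith("`") ? " " : "";
  return `${fence}${pad}${text}${pad}${fence}`;
}
```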
149
+
150
+ ### HTML Passthrough
151
+
152
+ Elements without Markdown equivalents are preserved as HTML:
153
+
154
+ - `<kbd>` - Keyboard input
155
+ - `<mark>` - Highlighted text
156
+ - `<abbr>` - Abbreviations (preserves `title` attribute)
157
+ - `<samp>` - Sample output
158
+ - `<var>` - Variables
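For the passthrough elements listed above, the output is the original tag wrapped around the converted content. The sketch below assumes that only the `title` attribute of `<abbr>` is kept and every other attribute is dropped, which the list suggests but does not state outright; `renderPassthrough` and `PASSTHROUGH_TAGS` are hypothetical names.

```javascript
// Hypothetical sketch of HTML passthrough for tags with no Markdown syntax.
const PASSTHROUGH_TAGS = new Set(["kbd", "mark", "abbr", "samp", "var"]);

// Escape characters that would break a double-quoted attribute value.
function escapeAttr(value) {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// tag: lowercase tag name; inner: already-converted child content;
// attrs: plain object of the element's attributes.
function renderPassthrough(tag, inner, attrs = {}) {
  if (!PASSTHROUGH_TAGS.has(tag)) return inner; // not a passthrough tag
  // Assumption: only <abbr title="..."> keeps an attribute.
  const title = tag === "abbr" && attrs.title !== undefined
    ? ` title="${escapeAttr(attrs.title)}"`
    : "";
  return `<${tag}${title}>${inner}</${tag}>`;
}
```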
159
+
160
+ ## Advanced Features
161
+
162
+ ### Table Alignment
163
+
164
+ Column alignment is read from the `align` attribute or from a `text-align` declaration in the `style` attribute:
165
+
166
+ ```html
167
+ <table>
168
+   <tr>
168
+     <th align="left">Left</th>
169
+     <th align="center">Center</th>
170
+     <th align="right">Right</th>
171
+   </tr>
173
+ </table>
174
+ ```
175
+
176
+ Output:
177
+
178
+ ```markdown
179
+ | Left | Center | Right |
180
+ | :--- | :----: | ----: |
181
+ ```
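The alignment rule above can be sketched in two steps: read the alignment from `align` or from `text-align` in `style`, then turn it into a delimiter cell as wide as the column. Here the column width is taken as the header text length (minimum 3), an assumption chosen to reproduce the output shown; the `Center` header uses an inline style instead of `align` to exercise both paths. `cellAlignment` and `delimiterCell` are hypothetical helpers, and the style parsing is deliberately simple.

```javascript
// Hypothetical sketch of GFM table alignment extraction.

// attrs: { align?: string, style?: string } from a <th> or <td>.
function cellAlignment(attrs) {
  if (attrs.align) return attrs.align.trim().toLowerCase();
  // Look for "text-align: <value>" inside the inline style, if any.
  const match = /text-align\s*:\s*(left|center|right)/i.exec(attrs.style || "");
  return match ? match[1].toLowerCase() : null;
}

// Build one delimiter-row cell, e.g. ":---", ":----:", "----:".
function delimiterCell(alignment, width) {
  const w = Math.max(width, 3);
  switch (alignment) {
    case "left":   return ":" + "-".repeat(w - 1);
    case "center": return ":" + "-".repeat(w - 2) + ":";
    case "right":  return "-".repeat(w - 1) + ":";
    default:       return "-".repeat(w); // no or unknown alignment
  }
}

const headers = [
  { text: "Left", attrs: { align: "left" } },
  { text: "Center", attrs: { style: "text-align: center" } },
  { text: "Right", attrs: { align: "right" } },
];
const headerRow = "| " + headers.map((h) => h.text).join(" | ") + " |";
const delimiterRow =
  "| " + headers.map((h) => delimiterCell(cellAlignment(h.attrs), h.text.length)).join(" | ") + " |";
console.log(headerRow + "\n" + delimiterRow);
// → | Left | Center | Right |
//   | :--- | :----: | ----: |
```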
182
+
183
+ ### Ordered List Start
184
+
185
+ Respects the `start` attribute on ordered lists:
186
+
187
+ ```html
188
+ <ol start="5">
189
+   <li>Fifth item</li>
190
+   <li>Sixth item</li>
191
+ </ol>
192
+ ```
193
+
194
+ Output:
195
+
196
+ ```markdown
197
+ 5. Fifth item
198
+ 6. Sixth item
199
+ ```
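Honoring `start` amounts to offsetting each item's index. The sketch below assumes a missing or non-numeric `start` falls back to 1, as it does in HTML, and that negative values also fall back to 1 because CommonMark list markers cannot be negative; `renderOrderedList` is a hypothetical name.

```javascript
// Hypothetical sketch: render <ol start="..."> items with correct numbering.
function renderOrderedList(items, startAttr) {
  // HTML defaults to 1 when start is absent; CommonMark cannot express
  // negative numbers, so this sketch also falls back to 1 for those.
  const parsed = Number.parseInt(startAttr, 10);
  const start = Number.isNaN(parsed) || parsed < 0 ? 1 : parsed;
  return items.map((item, i) => `${start + i}. ${item}`).join("\n");
}
```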
200
+
201
+ ### Autolinks
202
+
203
+ When a link's text matches its URL or email, autolink syntax is used:
204
+
205
+ ```html
206
+ <a href="https://example.com">https://example.com</a>
207
+ <a href="mailto:test@example.com">test@example.com</a>
208
+ ```
209
+
210
+ Output:
211
+
212
+ ```markdown
213
+ <https://example.com>
214
+ <test@example.com>
215
+ ```
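The autolink decision compares the visible text with the link target. This sketch assumes an exact match after trimming the text, with `mailto:` stripped for email links, and requires an absolute URL with a scheme because CommonMark autolinks cannot be relative. `renderLink` is a hypothetical name; the library may normalize differently (for example, around trailing slashes).

```javascript
// Hypothetical sketch of choosing autolink syntax for an <a> element.
function renderLink(href, text) {
  const label = text.trim();
  // Email links: <a href="mailto:x@y">x@y</a> becomes <x@y>.
  if (href.startsWith("mailto:") && href.slice("mailto:".length) === label) {
    return `<${label}>`;
  }
  // Absolute URLs whose text is the URL itself become <url>.
  if (/^[a-z][a-z0-9+.-]*:/i.test(href) && href === label) {
    return `<${href}>`;
  }
  // Otherwise fall back to an ordinary inline link.
  return `[${label}](${href})`;
}
```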
216
+
217
+ ### Code Block Language Detection
218
+
219
+ Automatically detects language from class names:
220
+
221
+ - `language-*` (e.g., `language-rust`)
222
+ - `lang-*` (e.g., `lang-python`)
223
+ - `highlight-*` (e.g., `highlight-go`)
224
+ - `hljs-*` (highlight.js classes, excluding token classes like `hljs-keyword`)
225
+ - Bare language names (e.g., `javascript`, `python`) as fallback
226
+
227
+ ```html
228
+ <pre><code class="language-rust">fn main() {}</code></pre>
229
+ ```
230
+
231
+ Output:
232
+
233
+ ````markdown
234
+ ```rust
235
+ fn main() {}
236
+ ```
237
+ ````
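The detection rules listed above can be sketched as a priority search over the element's classes. The priority order (the order of the list), the set of highlight.js token classes, and the bare-name list below are all assumptions for illustration, not supermarkdown's actual tables; `detectLanguage` is a hypothetical name.

```javascript
// Hypothetical sketch of code-block language detection from class names.

// A few highlight.js token classes that name syntax roles, not languages.
// Deliberately incomplete; the real exclusion list is an assumption.
const HLJS_TOKEN_CLASSES = new Set([
  "keyword", "string", "comment", "number", "title", "built_in",
  "literal", "attr", "meta", "params", "function", "variable",
]);

// Example bare names accepted as a last resort (assumed list).
const KNOWN_LANGUAGES = new Set([
  "javascript", "typescript", "python", "rust", "go", "java", "bash", "html", "css",
]);

function detectLanguage(classAttr) {
  const classes = (classAttr || "").split(/\s+/).filter(Boolean);
  // Prefix rules in priority order; the suffix is taken as the language.
  for (const prefix of ["language-", "lang-", "highlight-", "hljs-"]) {
    for (const cls of classes) {
      if (!cls.startsWith(prefix)) continue;
      const lang = cls.slice(prefix.length).toLowerCase();
      if (!lang) continue;
      if (prefix === "hljs-" && HLJS_TOKEN_CLASSES.has(lang)) continue;
      return lang;
    }
  }
  // Fallback: a bare class that is itself a known language name.
  const bare = classes.find((cls) => KNOWN_LANGUAGES.has(cls.toLowerCase()));
  return bare ? bare.toLowerCase() : null;
}
```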
238
+
239
+ If the code itself contains backticks, a longer backtick fence is used so the content cannot close the block early.
240
+
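In CommonMark, a fenced block ends at the first line that starts with a run of the same fence character at least as long as the opening fence. A simple, conservative way to pick the fence is to make it one character longer than the longest run anywhere in the code, with a minimum of three; this is sometimes longer than strictly necessary but always safe. The sketch below assumes that approach and also covers the `codeFence: '~'` option; `fencedCodeBlock` is a hypothetical name, not the library's implementation.

```javascript
// Hypothetical sketch: emit a fenced code block whose fence cannot be
// closed early by fence characters inside the code.
function fencedCodeBlock(code, lang = "", fenceChar = "`") {
  // Longest run of the fence character anywhere in the code.
  const pattern = fenceChar === "`" ? /`+/g : /~+/g;
  const runs = code.match(pattern) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  const fence = fenceChar.repeat(Math.max(3, longest + 1));
  // Drop trailing newlines so the closing fence sits directly after the code.
  const body = code.replace(/\n+$/, "");
  return `${fence}${lang}\n${body}\n${fence}`;
}
```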
241
+ ### Line Number Handling
242
+
243
+ Line number gutters are automatically stripped from code blocks. Elements with these class patterns are skipped:
244
+
245
+ - `gutter`
246
+ - `line-number`
247
+ - `line-numbers`
248
+ - `lineno`
249
+ - `linenumber`
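A check like the one below is one way to implement the skip. It assumes each pattern is matched case-insensitively as a substring of any class on the element, so `code-line-numbers` or `linenos` would also match; the exact matching rule is not documented here, and `isLineNumberGutter` is a hypothetical name.

```javascript
// Hypothetical sketch: decide whether an element is a line-number gutter.
// The list mirrors the documented patterns ("line-numbers" is redundant
// under substring matching but kept for clarity).
const GUTTER_PATTERNS = ["gutter", "line-number", "line-numbers", "lineno", "linenumber"];

function isLineNumberGutter(classAttr) {
  const classes = (classAttr || "").toLowerCase().split(/\s+/).filter(Boolean);
  // Assumption: substring match against each class token.
  return classes.some((cls) => GUTTER_PATTERNS.some((p) => cls.includes(p)));
}
```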
250
+
251
+ ### URL Encoding
252
+
253
+ Spaces and parentheses in URLs are automatically percent-encoded:
254
+
255
+ ```javascript
256
+ // <a href="https://example.com/path (1)">link</a>
257
+ // → [link](https://example.com/path%20%281%29)
258
+ ```
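These characters need special treatment because a raw space ends a link destination in CommonMark and an unbalanced parenthesis can end it early. `encodeURI` alone is not enough, since it leaves parentheses unescaped. The sketch below encodes only those three characters and leaves everything else, including existing `%XX` escapes, untouched; `encodeLinkDestination` is a hypothetical name.

```javascript
// Hypothetical sketch: percent-encode the characters that break a
// Markdown inline link destination.
const REPLACEMENTS = { " ": "%20", "(": "%28", ")": "%29" };

function encodeLinkDestination(url) {
  return url.replace(/[ ()]/g, (ch) => REPLACEMENTS[ch]);
}

const href = "https://example.com/path (1)";
console.log(`[link](${encodeLinkDestination(href)})`);
// → [link](https://example.com/path%20%281%29)
```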
259
+
260
+ ### Selector-Based Filtering
261
+
262
+ Remove unwanted elements like navigation, ads, or sidebars:
263
+
264
+ ```javascript
265
+ const markdown = convert(html, {
266
+   excludeSelectors: [
267
+     "nav",
268
+     "header",
269
+     "footer",
270
+     ".sidebar",
271
+     ".advertisement",
272
+     "#cookie-banner",
273
+   ],
274
+   includeSelectors: [".main-content"],
275
+ });
276
+ ```
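The precedence between the two lists can be sketched for a single element. To stay self-contained, the hypothetical `matchesSimple` helper only understands bare tag, `.class`, and `#id` selectors, and elements are plain objects. In a real document an excluded ancestor such as `header` would presumably take its children with it, which this per-element sketch does not model; `shouldKeep` is likewise a hypothetical name.

```javascript
// Hypothetical sketch of include/exclude precedence for one element.
// el: { tag: "div", id: "main", classes: ["sidebar"] }

function matchesSimple(el, selector) {
  if (selector.startsWith("#")) return el.id === selector.slice(1);
  if (selector.startsWith(".")) return (el.classes || []).includes(selector.slice(1));
  return el.tag === selector.toLowerCase();
}

function shouldKeep(el, { excludeSelectors = [], includeSelectors = [] } = {}) {
  // includeSelectors win over excludeSelectors, per the options table.
  if (includeSelectors.some((s) => matchesSimple(el, s))) return true;
  return !excludeSelectors.some((s) => matchesSimple(el, s));
}

const options = {
  excludeSelectors: ["nav", "header", "footer", ".sidebar", ".advertisement", "#cookie-banner"],
  includeSelectors: [".main-content"],
};
```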
277
+
278
+ ## Limitations
279
+
280
+ Some HTML features cannot be fully represented in Markdown:
281
+
282
+ | Feature | Behavior |
283
+ | ----------------------- | ------------------------------------------ |
284
+ | Table colspan/rowspan | Content placed in first cell |
285
+ | Nested tables | Inner tables converted inline |
286
+ | Form elements | Skipped |
287
+ | iframe/video/audio | Skipped (no standard Markdown equivalent) |
288
+ | CSS styling | Ignored (except `text-align` for tables) |
289
+ | Empty elements | Removed from output |
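For `colspan`, "content placed in first cell" can be read as keeping the text in the leftmost column of the span and padding the remaining spanned columns with empty cells so every row keeps the same column count. The sketch below follows that reading, which is an assumption; `rowspan` is not modeled, and `expandColspan` is a hypothetical name.

```javascript
// Hypothetical sketch: flatten colspan into GFM cells.
// cells: array of { text: string, colspan?: number | string }
function expandColspan(cells) {
  const out = [];
  for (const { text, colspan = 1 } of cells) {
    const span = Math.max(1, Number.parseInt(colspan, 10) || 1);
    out.push(text); // content goes in the first column of the span
    for (let i = 1; i < span; i++) out.push(""); // remaining columns stay empty
  }
  return out;
}

console.log("| " + expandColspan([{ text: "Total", colspan: 2 }, { text: "42" }]).join(" | ") + " |");
// → | Total |  | 42 |
```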
290
+
291
+ ## Rust Usage
292
+
293
+ Add to your `Cargo.toml`:
294
+
295
+ ```toml
296
+ [dependencies]
297
+ supermarkdown = "0.0.2"
298
+ ```
299
+
300
+ ```rust
301
+ use supermarkdown::{convert, convert_with_options, Options, HeadingStyle};
302
+
303
+ // Basic conversion
304
+ let markdown = convert("<h1>Hello</h1>");
305
+
306
+ // With options
307
+ let options = Options::new()
308
+     .heading_style(HeadingStyle::Setext)
309
+     .exclude_selectors(vec!["nav".to_string()]);
310
+
311
+ let markdown = convert_with_options("<h1>Hello</h1>", &options);
312
+ ```
313
+
314
+ ## Performance
315
+
316
+ supermarkdown is designed for high performance:
317
+
318
+ - **Single-pass parsing** - O(n) HTML traversal
319
+ - **Pre-computed metadata** - List indices and CSS selector matches computed in one pass
320
+ - **Zero-copy where possible** - Minimal string allocations
321
+ - **Native code** - No JavaScript runtime overhead
322
+
323
+ ## Contributing
324
+
325
+ Contributions are welcome! Please feel free to submit a Pull Request.
326
+
327
+ ```bash
328
+ # Clone the repository
329
+ git clone https://github.com/vakra-dev/supermarkdown.git
330
+ cd supermarkdown
331
+
332
+ # Run tests
333
+ cargo test
334
+
335
+ # Build Node.js bindings
336
+ cd crates/supermarkdown-napi
337
+ npm install
338
+ npm run build
339
+ ```
340
+
341
+ ## License
342
+
343
+ MIT License - see [LICENSE](LICENSE) for details.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@vakra-dev/supermarkdown",
3
- "version": "0.0.2",
3
+ "version": "0.0.3",
4
4
  "description": "High-performance HTML to Markdown converter with full GFM support",
5
5
  "main": "index.js",
6
6
  "types": "index.d.ts",
@@ -26,7 +26,8 @@
26
26
  "files": [
27
27
  "index.js",
28
28
  "index.d.ts",
29
- "*.node"
29
+ "*.node",
30
+ "README.md"
30
31
  ],
31
32
  "napi": {
32
33
  "name": "supermarkdown",
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file