@vakra-dev/supermarkdown 0.0.2 → 0.0.3
- package/README.md +343 -0
- package/package.json +3 -2
- package/supermarkdown.darwin-arm64.node +0 -0
- package/supermarkdown.darwin-x64.node +0 -0
- package/supermarkdown.linux-arm64-gnu.node +0 -0
- package/supermarkdown.linux-arm64-musl.node +0 -0
- package/supermarkdown.linux-x64-gnu.node +0 -0
- package/supermarkdown.linux-x64-musl.node +0 -0
- package/supermarkdown.win32-arm64-msvc.node +0 -0
- package/supermarkdown.win32-x64-msvc.node +0 -0
package/README.md
ADDED
# supermarkdown

[npm](https://www.npmjs.com/package/@vakra-dev/supermarkdown)
[crates.io](https://crates.io/crates/supermarkdown)
[License: MIT](https://opensource.org/licenses/MIT)

High-performance HTML to Markdown converter with full GitHub Flavored Markdown support. Written in Rust, available for Node.js and as a native Rust crate.

## Features

- **Fast** - Written in Rust with O(n) algorithms, significantly faster than JavaScript alternatives
- **Full GFM Support** - Tables with alignment, strikethrough, autolinks, fenced code blocks
- **Accurate** - Handles malformed HTML gracefully via html5ever
- **Configurable** - Multiple heading styles, link styles, custom selectors
- **Zero Dependencies** - Single native binary, no JavaScript runtime overhead
- **Cross-Platform** - Pre-built binaries for Windows, macOS, and Linux (x64 & ARM64)
- **TypeScript Ready** - Full type definitions included
- **Async Support** - Non-blocking conversion for large documents

## Installation

```bash
npm install @vakra-dev/supermarkdown
```

## Quick Start

```javascript
import { convert } from "@vakra-dev/supermarkdown";

const html = `
<h1>Hello World</h1>
<p>This is a <strong>test</strong> with a <a href="https://example.com">link</a>.</p>
`;

const markdown = convert(html);
console.log(markdown);
// # Hello World
//
// This is a **test** with a [link](https://example.com).
```

## Usage

### Basic Conversion

```javascript
import { convert } from "@vakra-dev/supermarkdown";

const markdown = convert("<h1>Title</h1><p>Paragraph</p>");
```

### With Options

```javascript
import { convert } from "@vakra-dev/supermarkdown";

// Sample input (any HTML string works)
const html = `<nav>Menu</nav><h1>Title</h1><p class="important">Key point</p>`;

const markdown = convert(html, {
  headingStyle: "setext", // 'atx' (default) or 'setext'
  linkStyle: "referenced", // 'inline' (default) or 'referenced'
  excludeSelectors: ["nav", ".sidebar", "#ads"],
  includeSelectors: [".important"], // Override excludes for specific elements
});
```
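
To make the two `headingStyle` values concrete, the sketch below renders a single heading in either style. `renderHeading` is a hypothetical helper, not part of this package's API. Setext syntax only defines level-1 (`=`) and level-2 (`-`) underlines. Falling back to ATX for levels 3-6 and sizing the underline to the text are therefore assumptions about reasonable behavior, not documented output.

```javascript
// Hypothetical helper illustrating the two headingStyle values; not the library's API.
function renderHeading(level, text, style = "atx") {
  if (!Number.isInteger(level) || level < 1 || level > 6) {
    throw new RangeError(`heading level must be 1-6, got ${level}`);
  }
  // Setext syntax only defines level 1 (=) and level 2 (-) underlines,
  // so this sketch assumes deeper levels fall back to ATX.
  if (style === "setext" && level <= 2) {
    const char = level === 1 ? "=" : "-";
    // Underline matches the text length (minimum 3), a stylistic assumption.
    return `${text}\n${char.repeat(Math.max(text.length, 3))}`;
  }
  return `${"#".repeat(level)} ${text}`;
}

console.log(renderHeading(1, "Hello World", "setext"));
// Hello World
// ===========
console.log(renderHeading(3, "Details", "setext")); // ### Details
console.log(renderHeading(2, "Usage")); // ## Usage
```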

### Async Conversion

For large documents, use `convertAsync` to avoid blocking the main thread:

```javascript
import { convertAsync } from "@vakra-dev/supermarkdown";

// Top-level await requires an ES module (a .mjs file or "type": "module")
const largeHtml = "<p>Lorem ipsum dolor sit amet.</p>".repeat(10_000); // placeholder input
const markdown = await convertAsync(largeHtml);

// Process multiple documents in parallel
const [html1, html2, html3] = ["<h1>One</h1>", "<h1>Two</h1>", "<h1>Three</h1>"];
const results = await Promise.all([
  convertAsync(html1),
  convertAsync(html2),
  convertAsync(html3),
]);
```

## API Reference

### `convert(html, options?)`

Converts HTML to Markdown synchronously.

**Parameters:**

- `html` (string) - The HTML string to convert
- `options` (object, optional) - Conversion options

**Returns:** `string` - The converted Markdown

### `convertAsync(html, options?)`

Converts HTML to Markdown asynchronously.

**Parameters:**

- `html` (string) - The HTML string to convert
- `options` (object, optional) - Conversion options

**Returns:** `Promise<string>` - The converted Markdown

### Options

| Option | Type | Default | Description |
| ------------------ | ---------------------------- | ----------- | ------------------------------------------------ |
| `headingStyle` | `'atx'` \| `'setext'` | `'atx'` | ATX uses `#` prefix, Setext uses underlines |
| `linkStyle` | `'inline'` \| `'referenced'` | `'inline'` | Inline: `[text](url)`, Referenced: `[text][1]` |
| `codeFence` | `` '`' `` \| `'~'` | `` '`' `` | Character for fenced code blocks |
| `bulletMarker` | `'-'` \| `'*'` \| `'+'` | `'-'` | Character for unordered list items |
| `baseUrl` | `string` | `undefined` | Base URL for resolving relative links |
| `excludeSelectors` | `string[]` | `[]` | CSS selectors for elements to exclude |
| `includeSelectors` | `string[]` | `[]` | CSS selectors to force keep (overrides excludes) |
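
The `linkStyle: 'referenced'` row can be illustrated with a small sketch. It turns links into `[text][n]` references plus a list of `[n]: url` definitions, the link reference definition form that CommonMark specifies. `toReferencedLinks` is hypothetical. The 1-based numbering, reusing one number for a repeated URL, and collecting the definitions at the end are all assumptions, so the package's actual output may differ.

```javascript
// Hypothetical sketch of reference-style link output; not the library's API.
function toReferencedLinks(links) {
  const numbers = new Map(); // url -> reference number, in first-seen order
  const inline = links.map(({ text, url }) => {
    if (!numbers.has(url)) numbers.set(url, numbers.size + 1);
    return `[${text}][${numbers.get(url)}]`;
  });
  // Reference definitions in CommonMark form: [n]: url
  const definitions = [...numbers].map(([url, n]) => `[${n}]: ${url}`);
  return { inline, definitions };
}

const { inline, definitions } = toReferencedLinks([
  { text: "docs", url: "https://example.com/docs" },
  { text: "API", url: "https://example.com/api" },
  { text: "documentation", url: "https://example.com/docs" },
]);
console.log(inline.join(" ")); // [docs][1] [API][2] [documentation][1]
console.log(definitions.join("\n"));
// [1]: https://example.com/docs
// [2]: https://example.com/api
```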

## Supported Elements

### Block Elements

| HTML | Markdown |
| -------------------------- | ---------------------------------------------- |
| `<h1>` - `<h6>` | `#` headings or setext underlines |
| `<p>` | Paragraphs with blank lines |
| `<blockquote>` | `>` quoted blocks (supports nesting) |
| `<ul>`, `<ol>` | `-` or `1.` lists (supports `start` attribute) |
| `<pre><code>` | Fenced code blocks with language detection |
| `<table>` | GFM tables with alignment and captions |
| `<hr>` | `---` horizontal rules |
| `<dl>`, `<dt>`, `<dd>` | Definition lists |
| `<details>`, `<summary>` | Collapsible sections |
| `<figure>`, `<figcaption>` | Images with captions |

### Inline Elements

| HTML | Markdown |
| -------------------------- | --------------------------------------------------- |
| `<a>` | `[text](url)`, `[text][ref]`, or `<url>` (autolink) |
| `<img>` | `![alt](src)` |
| `<strong>`, `<b>` | `**bold**` |
| `<em>`, `<i>` | `*italic*` |
| `<code>` | `` `code` `` (handles nested backticks) |
| `<del>`, `<s>`, `<strike>` | `~~strikethrough~~` |
| `<sub>` | `<sub>subscript</sub>` |
| `<sup>` | `<sup>superscript</sup>` |
| `<br>` | Line breaks |

### HTML Passthrough

Elements without Markdown equivalents are preserved as HTML:

- `<kbd>` - Keyboard input
- `<mark>` - Highlighted text
- `<abbr>` - Abbreviations (preserves `title` attribute)
- `<samp>` - Sample output
- `<var>` - Variables

## Advanced Features

### Table Alignment

Extracts alignment from `align` attribute or `text-align` style:

```html
<table>
  <tr>
    <th align="left">Left</th>
    <th align="center">Center</th>
    <th align="right">Right</th>
  </tr>
</table>
```

Output:

```markdown
| Left | Center | Right |
| :--- | :----: | ----: |
```
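
The alignment handling above can be sketched in two steps. First, read the alignment from the `align` attribute or a `text-align` declaration; then turn it into a delimiter cell. `alignmentOf`, `delimiterCell` and `delimiterRow` are hypothetical helpers. Giving `align` precedence over `style` and sizing each delimiter cell to its column width are assumptions, chosen so the sketch reproduces the output shown. The third header uses a `text-align` style to exercise both paths.

```javascript
// Hypothetical sketch of table alignment handling; not the library's implementation.
// Assumed precedence: the align attribute first, then text-align in the style attribute.
function alignmentOf({ align, style = "" }) {
  if (align) return align.toLowerCase();
  const match = /text-align\s*:\s*(left|center|right)/i.exec(style);
  return match ? match[1].toLowerCase() : null;
}

// Build one delimiter cell; width is assumed to match the column's content width.
function delimiterCell(align, width) {
  const w = Math.max(width, 3);
  switch (align) {
    case "left":
      return ":" + "-".repeat(w - 1);
    case "center":
      return ":" + "-".repeat(w - 2) + ":";
    case "right":
      return "-".repeat(w - 1) + ":";
    default:
      return "-".repeat(w); // no alignment
  }
}

function delimiterRow(aligns, widths) {
  return `| ${aligns.map((align, i) => delimiterCell(align, widths[i])).join(" | ")} |`;
}

const headers = [{ align: "left" }, { align: "center" }, { style: "text-align: right" }];
const texts = ["Left", "Center", "Right"];
console.log(delimiterRow(headers.map(alignmentOf), texts.map((t) => t.length)));
// | :--- | :----: | ----: |
```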

### Ordered List Start

Respects the `start` attribute on ordered lists:

```html
<ol start="5">
  <li>Fifth item</li>
  <li>Sixth item</li>
</ol>
```

Output:

```markdown
5. Fifth item
6. Sixth item
```
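
Here is a minimal sketch of the `start` handling. It assumes the attribute arrives as a string and that a missing or non-numeric value means the list starts at 1. `parseStart` and `renderOrderedList` are hypothetical names, not the package's API.

```javascript
// Hypothetical sketch: read the <ol start> attribute and number items from it.
function parseStart(value) {
  const n = Number.parseInt(value ?? "", 10);
  return Number.isNaN(n) ? 1 : n; // missing or invalid start -> 1
}

function renderOrderedList(items, start = 1) {
  return items.map((item, i) => `${start + i}. ${item}`).join("\n");
}

console.log(renderOrderedList(["Fifth item", "Sixth item"], parseStart("5")));
// 5. Fifth item
// 6. Sixth item
```

Writing out every number is a readability choice. CommonMark takes only the first item's number as the list's start value and ignores the numbers on later items.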

### Autolinks

When a link's text matches its URL or email, autolink syntax is used:

```html
<a href="https://example.com">https://example.com</a>
<a href="mailto:test@example.com">test@example.com</a>
```

Output:

```markdown
<https://example.com>
<test@example.com>
```
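
The autolink rule can be sketched as a single decision function. `renderLink` is hypothetical, and its extra guards are assumptions drawn from CommonMark. A URI autolink needs a scheme of 2-32 characters and may not contain spaces or angle brackets. A relative URL such as `/about` therefore stays an inline link even when its text matches.

```javascript
// Hypothetical sketch of the autolink rule; not the library's implementation.
// CommonMark URI autolinks need a scheme of 2-32 characters and cannot contain
// spaces or angle brackets, so anything else falls back to an inline link.
const SCHEME = /^[a-z][a-z0-9+.-]{1,31}:/i;

function renderLink(text, href) {
  const safe = !/[\s<>]/.test(href);
  if (safe && text === href && SCHEME.test(href)) return `<${href}>`;
  if (safe && href === `mailto:${text}`) return `<${text}>`;
  return `[${text}](${href})`;
}

console.log(renderLink("https://example.com", "https://example.com")); // <https://example.com>
console.log(renderLink("test@example.com", "mailto:test@example.com")); // <test@example.com>
console.log(renderLink("link", "https://example.com")); // [link](https://example.com)
console.log(renderLink("/about", "/about")); // [/about](/about)
```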

### Code Block Language Detection

Automatically detects language from class names:

- `language-*` (e.g., `language-rust`)
- `lang-*` (e.g., `lang-python`)
- `highlight-*` (e.g., `highlight-go`)
- `hljs-*` (highlight.js classes, excluding token classes like `hljs-keyword`)
- Bare language names (e.g., `javascript`, `python`) as fallback

```html
<pre><code class="language-rust">fn main() {}</code></pre>
```

Output:

````markdown
```rust
fn main() {}
```
````
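
The class-name rules listed above can be sketched as a prefix scan with a bare-name fallback. `detectLanguage` is hypothetical. The order in which prefixes are tried, the partial list of highlight.js token classes, and the small set of recognized bare names are illustrative assumptions, not the package's actual tables.

```javascript
// Hypothetical sketch of class-based language detection. The prefix order,
// the partial list of highlight.js token classes and the known-language list
// are illustrative assumptions.
const PREFIXES = ["language-", "lang-", "highlight-", "hljs-"];
const HLJS_TOKENS = new Set(["keyword", "string", "comment", "number", "title", "built_in", "literal"]);
const KNOWN_LANGUAGES = new Set(["javascript", "typescript", "python", "rust", "go", "java", "bash"]);

function detectLanguage(className = "") {
  const classes = className.split(/\s+/).filter(Boolean);
  for (const cls of classes) {
    for (const prefix of PREFIXES) {
      if (!cls.startsWith(prefix)) continue;
      const lang = cls.slice(prefix.length);
      // Skip highlight.js token classes such as hljs-keyword
      if (prefix === "hljs-" && HLJS_TOKENS.has(lang)) continue;
      if (lang) return lang.toLowerCase();
    }
  }
  // Fallback: a bare language name such as "python"
  return classes.find((cls) => KNOWN_LANGUAGES.has(cls.toLowerCase()))?.toLowerCase() ?? null;
}

console.log(detectLanguage("language-rust")); // rust
console.log(detectLanguage("hljs hljs-keyword python")); // python
```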
Code blocks containing backticks automatically use more backticks as delimiters.
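
One way to choose such a delimiter is the usual longest-run rule, which this sketch assumes the converter follows. CommonMark closes a backtick fence at the first line made of at least as many backticks as the opening fence. A fence one backtick longer than the longest run inside the code, and never shorter than three, therefore cannot end early. `fenceFor` and `fencedBlock` are hypothetical names.

```javascript
// Hypothetical sketch: pick a backtick fence longer than any backtick run in the code.
function fenceFor(code) {
  const runs = code.match(/`+/g) ?? [];
  const longest = Math.max(0, ...runs.map((run) => run.length));
  return "`".repeat(Math.max(3, longest + 1));
}

function fencedBlock(code, lang = "") {
  const fence = fenceFor(code);
  return `${fence}${lang}\n${code}\n${fence}`;
}

const plain = "fn main() {}";
// Code that itself contains a three-backtick fence
const nested = ["`".repeat(3) + "js", "let x = 1;", "`".repeat(3)].join("\n");
console.log(fenceFor(plain).length); // 3
console.log(fenceFor(nested).length); // 4
```

The same idea carries over to inline code spans. CommonMark ends a code span at the next backtick string of exactly the opening length, so a delimiter longer than every run inside the code cannot close it early.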

### Line Number Handling

Line number gutters are automatically stripped from code blocks. Elements with these class patterns are skipped:

- `gutter`
- `line-number`
- `line-numbers`
- `lineno`
- `linenumber`
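
The skipping step can be sketched as a class check applied before code text is collected. `isLineNumberGutter` is hypothetical. Matching each class by substring, so that a class like `linenos` also counts, is an assumption: the list above does not say whether the patterns are matched exactly or as substrings. The `cells` array is a simplified stand-in for parsed HTML.

```javascript
// Hypothetical sketch: treat an element as a line-number gutter when any of its
// classes contains one of the patterns listed above (substring match is assumed).
const GUTTER_PATTERNS = ["gutter", "line-number", "line-numbers", "lineno", "linenumber"];

function isLineNumberGutter(className = "") {
  const classes = className.toLowerCase().split(/\s+/).filter(Boolean);
  return classes.some((cls) => GUTTER_PATTERNS.some((pattern) => cls.includes(pattern)));
}

// Simplified stand-ins for the cells of a two-column "numbers | code" layout
const cells = [
  { className: "linenos", text: "1\n2" },
  { className: "code", text: "let a = 1;\nlet b = 2;" },
];
const code = cells
  .filter((cell) => !isLineNumberGutter(cell.className))
  .map((cell) => cell.text)
  .join("\n");
console.log(code);
// let a = 1;
// let b = 2;
```

Exact matching of class tokens would be the stricter alternative. It would miss such variants but produce fewer false positives.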

### URL Encoding

Spaces and parentheses in URLs are automatically percent-encoded:

```javascript
// <a href="https://example.com/path (1)">link</a>
// → [link](https://example.com/path%20%281%29)
```
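
Here is a minimal sketch of this encoding. It assumes that only the three characters named above (space, `(` and `)`) are touched. `encodeLinkDestination` is a hypothetical helper, not an export of the package.

```javascript
// Hypothetical sketch: percent-encode spaces and parentheses so the URL can sit
// inside a Markdown link destination; all other characters are left untouched.
function encodeLinkDestination(url) {
  return url.replace(/[ ()]/g, (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase());
}

const href = "https://example.com/path (1)";
console.log(`[link](${encodeLinkDestination(href)})`);
// [link](https://example.com/path%20%281%29)
```

`encodeURI` is not a drop-in replacement here. It leaves `(` and `)` unescaped and would re-encode an existing `%` as `%25`.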

### Selector-Based Filtering

Remove unwanted elements like navigation, ads, or sidebars:

```javascript
import { convert } from "@vakra-dev/supermarkdown";

// Sample page with navigation chrome around the main content
const html = `<nav>Menu</nav><div class="main-content"><p>Article body</p></div><footer>Footer</footer>`;

const markdown = convert(html, {
  excludeSelectors: [
    "nav",
    "header",
    "footer",
    ".sidebar",
    ".advertisement",
    "#cookie-banner",
  ],
  includeSelectors: [".main-content"],
});
```
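
How the two lists interact can be sketched with a toy matcher that understands only tag, `.class` and `#id` selectors. `matches` and `shouldKeep` are hypothetical helpers. The sketch assumes that an element matching any include selector is kept even when it also matches an exclude selector, based on the Options table's "overrides excludes". The real converter accepts full CSS selectors and may also take ancestors into account.

```javascript
// Toy matcher: understands only "tag", ".class" and "#id" selectors.
function matches(el, selector) {
  if (selector.startsWith(".")) return el.classes.includes(selector.slice(1));
  if (selector.startsWith("#")) return el.id === selector.slice(1);
  return el.tag === selector;
}

// Assumed precedence: any include match keeps the element, even if an exclude also matches.
function shouldKeep(el, { excludeSelectors = [], includeSelectors = [] } = {}) {
  if (includeSelectors.some((selector) => matches(el, selector))) return true;
  return !excludeSelectors.some((selector) => matches(el, selector));
}

const options = {
  excludeSelectors: ["nav", "header", ".sidebar", "#cookie-banner"],
  includeSelectors: [".main-content"],
};
const elements = [
  { tag: "nav", id: "", classes: [] },
  { tag: "header", id: "", classes: ["main-content"] }, // excluded by tag, kept by class
  { tag: "div", id: "cookie-banner", classes: [] },
  { tag: "p", id: "", classes: [] },
];
console.log(elements.map((el) => shouldKeep(el, options)).join(", "));
// false, true, false, true
```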

## Limitations

Some HTML features cannot be fully represented in Markdown:

| Feature | Behavior |
| ----------------------- | ------------------------------------------ |
| Table colspan/rowspan | Content placed in first cell |
| Nested tables | Inner tables converted inline |
| Form elements | Skipped |
| iframe/video/audio | Skipped (no standard Markdown equivalent) |
| CSS styling | Ignored (except `text-align` for tables) |
| Empty elements | Removed from output |

## Rust Usage

Add to your `Cargo.toml`:

```toml
[dependencies]
supermarkdown = "0.0.2"
```

```rust
use supermarkdown::{convert, convert_with_options, Options, HeadingStyle};

fn main() {
    // Basic conversion
    let markdown = convert("<h1>Hello</h1>");
    println!("{markdown}");

    // With options
    let options = Options::new()
        .heading_style(HeadingStyle::Setext)
        .exclude_selectors(vec!["nav".to_string()]);

    let markdown = convert_with_options("<h1>Hello</h1>", &options);
    println!("{markdown}");
}
```

## Performance

supermarkdown is designed for high performance:

- **Single-pass parsing** - O(n) HTML traversal
- **Pre-computed metadata** - List indices and CSS selectors computed in one pass
- **Zero-copy where possible** - Minimal string allocations
- **Native code** - No JavaScript runtime overhead

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

```bash
# Clone the repository
git clone https://github.com/vakra-dev/supermarkdown.git
cd supermarkdown

# Run tests
cargo test

# Build Node.js bindings
cd crates/supermarkdown-napi
npm install
npm run build
```

## License

MIT License - see [LICENSE](LICENSE) for details.
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "@vakra-dev/supermarkdown",
-"version": "0.0.
+"version": "0.0.3",
 "description": "High-performance HTML to Markdown converter with full GFM support",
 "main": "index.js",
 "types": "index.d.ts",
@@ -26,7 +26,8 @@
 "files": [
 "index.js",
 "index.d.ts",
-"*.node"
+"*.node",
+"README.md"
 ],
 "napi": {
 "name": "supermarkdown",
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file