inkmark 0.1.0-aarch64-linux-musl
- checksums.yaml +7 -0
- data/CHANGELOG.md +3 -0
- data/LICENSE.txt +21 -0
- data/NOTICE +16 -0
- data/README.md +1166 -0
- data/lib/inkmark/3.3/inkmark.so +0 -0
- data/lib/inkmark/3.4/inkmark.so +0 -0
- data/lib/inkmark/4.0/inkmark.so +0 -0
- data/lib/inkmark/event.rb +342 -0
- data/lib/inkmark/native.rb +8 -0
- data/lib/inkmark/options.rb +698 -0
- data/lib/inkmark/toc.rb +40 -0
- data/lib/inkmark/version.rb +6 -0
- data/lib/inkmark.rb +711 -0
- data/sig/inkmark.rbs +219 -0
- metadata +178 -0
data/README.md
ADDED
|
@@ -0,0 +1,1166 @@
|
|
|
1
|
+
# Inkmark
|
|
2
|
+
|
|
3
|
+
A very fast, feature-packed, AI-first markdown gem for Ruby.
|
|
4
|
+
|
|
5
|
+
[Releases](https://github.com/yaroslav/inkmark/releases)
|
|
6
|
+
[Documentation](https://rubydoc.info/gems/inkmark)
|
|
7
|
+
|
|
8
|
+
<div align="center">
|
|
9
|
+
<img src="https://raw.githubusercontent.com/yaroslav/inkmark/refs/heads/main/assets/images/inky.png" width="400" height="400" alt="Inky">
|
|
10
|
+
</div>
|
|
11
|
+
|
|
12
|
+
- **Very fast**. Up to 1.3× faster than redcarpet _(not CommonMark-conformant)_, about 3×–9× faster than other Ruby Markdown gems with native extensions. Built with Rust, based on [pulldown-cmark](https://github.com/pulldown-cmark/pulldown-cmark), uses SIMD.
|
|
13
|
+
- **No surprises**. CommonMark + GitHub Flavored Markdown conformance.
|
|
14
|
+
- **"Batteries included" approach**. Build lots of useful features, make them easy to use and as fast as possible.
|
|
15
|
+
- **Easy to use**. As simple as a one-method API. Pass options inline as a hash, set them one by one, or set default options for the entire application.
|
|
16
|
+
- **Feature-packed**. Server-side syntax highlighting with themes, frontmatter support, table of contents in Markdown and HTML, plain text export, extraction of headings/links/images, statistics (character and word count, likely document language, block counts), lazy image loading attributes, emoji shortcodes, autolinks, heading IDs with Unicode-transliterated slugs, wikilinks, footnotes, tables, task lists, smart punctuation, hard wraps, "nofollow/noopener" on external links.
|
|
17
|
+
- **AI-first**. Two chunking primitives: heading-based with breadcrumbs and per-chunk character/word counts, and sliding-window with overlap for size-bounded chunks where headings are absent or uneven. Block-aware or word-aware truncation for context-window budgeting. Markdown-to-Markdown pipeline. Plain-text extraction for embedding models. Structured extraction of headings, images, links, code blocks—each carrying byte ranges back into the source.
|
|
18
|
+
- **Security conscious**. Raw HTML denied by default. Hostname and URL-scheme allowlists for both links and images. GFM tagfilter for dangerous tags. A Rust-backed gem.
|
|
19
|
+
- **Easy extension API**. Hook any element with a Ruby block—no subclassing, no intermediate AST, no HTML post-processing. Rewrite URLs, swap code blocks for your own renderer, drop subtrees, or just walk the document for analysis. Handlers fire inside the single-pass parser, so extension costs essentially nothing beyond the render itself—and far less than regexing over output HTML.
|
|
20
|
+
|
|
21
|
+
## Contents
|
|
22
|
+
|
|
23
|
+
- [Installation](#installation)
|
|
24
|
+
- [Quick start](#quick-start)
|
|
25
|
+
- [Presets](#presets)
|
|
26
|
+
- [Options](#options)
|
|
27
|
+
- [Raw HTML](#raw-html)
|
|
28
|
+
- [Host allowlists](#host-allowlists)
|
|
29
|
+
- [URL scheme filtering](#url-scheme-filtering)
|
|
30
|
+
- [Statistics and extraction](#statistics-and-extraction)
|
|
31
|
+
- [Chunks extraction (for RAG)](#chunks-extraction-for-rag)
|
|
32
|
+
- [Truncation](#truncation)
|
|
33
|
+
- [Plain-text extraction](#plain-text-extraction)
|
|
34
|
+
- [Markdown-to-Markdown pipeline](#markdown-to-markdown-pipeline)
|
|
35
|
+
- [Event handlers](#event-handlers)
|
|
36
|
+
- [Benchmarks](#benchmarks)
|
|
37
|
+
- [Contributing](#contributing)
|
|
38
|
+
- [Acknowledgements](#acknowledgements)
|
|
39
|
+
- [License](#license)
|
|
40
|
+
|
|
41
|
+
## Installation
|
|
42
|
+
|
|
43
|
+
bundle add inkmark
|
|
44
|
+
|
|
45
|
+
Or in your `Gemfile`:
|
|
46
|
+
|
|
47
|
+
```ruby
|
|
48
|
+
gem "inkmark"
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
Ruby 3.3+ is supported.
|
|
52
|
+
|
|
53
|
+
## Quick start
|
|
54
|
+
|
|
55
|
+
```ruby
|
|
56
|
+
require "inkmark"
|
|
57
|
+
|
|
58
|
+
# Class-method shortcut
|
|
59
|
+
Inkmark.to_html("**hello**")
|
|
60
|
+
# => "<p><strong>hello</strong></p>\n"
|
|
61
|
+
|
|
62
|
+
# Instance form
|
|
63
|
+
Inkmark.new("# Hello").to_html
|
|
64
|
+
|
|
65
|
+
# With options
|
|
66
|
+
Inkmark.to_html("hi <em>there</em>", options: { raw_html: true })
|
|
67
|
+
|
|
68
|
+
# Mutable options via accessor
|
|
69
|
+
g = Inkmark.new("# Table\n\n| a | b |\n|---|---|\n| 1 | 2 |")
|
|
70
|
+
g.options.tables = false
|
|
71
|
+
g.to_html # tables render as paragraphs now
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
## Presets
|
|
75
|
+
|
|
76
|
+
Inkmark ships presets as opinionated shortcuts for common
|
|
77
|
+
rendering profiles. Pass one via `preset:` in the options hash; every
|
|
78
|
+
other option in the hash overrides the preset's values (deep-merging
|
|
79
|
+
for nested element-policy hashes). You can, and are encouraged to, override preset options as you see fit.
|
|
80
|
+
|
|
81
|
+
- **`:recommended`**: a curated profile for modern web content. On
|
|
82
|
+
top of GFM, enables smart punctuation, auto heading IDs, lazy-loading
|
|
83
|
+
images with an `http`/`https` scheme allowlist, autolinks,
|
|
84
|
+
`rel="nofollow noopener"` on external links, a scheme allowlist for
|
|
85
|
+
link destinations, emoji shortcodes, syntax highlighting, hard wraps,
|
|
86
|
+
and frontmatter parsing.
|
|
87
|
+
|
|
88
|
+
**This is a good starting point for most apps**. Still, you are expected to
|
|
89
|
+
override individual options to match your specific needs (e.g. adding statistics and table of contents, tightening link/image allowlists to your own hostnames, turning off features you don't want).
|
|
90
|
+
|
|
91
|
+
- **`:trusted`**: `:recommended` plus raw HTML pass-through.
|
|
92
|
+
**Dangerous.** Intended only for content you fully trust: internal,
|
|
93
|
+
team-authored. With raw HTML on, Inkmark does no sanitization beyond
|
|
94
|
+
the narrow GFM tagfilter (turn it off at your own risk); the caller is
|
|
95
|
+
responsible for output safety. Do not apply this preset to anything a user can influence, directly or indirectly.
|
|
96
|
+
|
|
97
|
+
- **`:gfm`**: the bare default. CommonMark plus the core GFM extensions
|
|
98
|
+
(tables, strikethrough, tasklists, footnotes, tagfilter). Strict,
|
|
99
|
+
conservative, and matches the render profile of every other major
|
|
100
|
+
GFM engine. Everything else is off.
|
|
101
|
+
|
|
102
|
+
- **`:commonmark`**: the minimum. Strict CommonMark. No GFM extensions, no
|
|
103
|
+
typographics, nothing opinionated.
|
|
104
|
+
|
|
105
|
+
```ruby
|
|
106
|
+
# Recommended profile
|
|
107
|
+
Inkmark.to_html(md, options: { preset: :recommended })
|
|
108
|
+
|
|
109
|
+
# Recommended profile with stats and table of contents
|
|
110
|
+
Inkmark.to_html(md, options: { preset: :recommended, statistics: true, toc: true })
|
|
111
|
+
|
|
112
|
+
# Recommended profile, but disable smart punctuation
|
|
113
|
+
Inkmark.to_html(md, options: { preset: :recommended, smart_punctuation: false })
|
|
114
|
+
|
|
115
|
+
# Just GFM (the default)
|
|
116
|
+
Inkmark.to_html(md)
|
|
117
|
+
Inkmark.to_html(md, options: { preset: :gfm }) # equivalent
|
|
118
|
+
|
|
119
|
+
# Recommended profile with a tightened link-host allowlist
|
|
120
|
+
Inkmark.to_html(md, options: {
|
|
121
|
+
preset: :recommended,
|
|
122
|
+
links: { allowed_hosts: ["*.example.com"] }
|
|
123
|
+
})
|
|
124
|
+
|
|
125
|
+
# Trusted content (raw HTML passes through—use with care)
|
|
126
|
+
Inkmark.to_html(internal_doc, options: { preset: :trusted })
|
|
127
|
+
```
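
The override rule described above (top-level keys replace the preset's values, nested policy hashes merge key by key) can be illustrated in plain Ruby. This is only a sketch of the merge semantics: `RECOMMENDED_SKETCH` and `deep_merge` are hypothetical names, and the preset values shown are placeholders rather than the gem's actual `:recommended` contents.

```ruby
# Illustrative values only; the gem's real :recommended preset may differ.
RECOMMENDED_SKETCH = {
  smart_punctuation: true,
  links: { autolink: true, nofollow: true, allowed_hosts: nil }
}.freeze

# Recursively merge `overrides` into `base`: nested hashes merge key by key,
# any other value in `overrides` replaces the preset value.
def deep_merge(base, overrides)
  base.merge(overrides) do |_key, preset_value, override_value|
    if preset_value.is_a?(Hash) && override_value.is_a?(Hash)
      deep_merge(preset_value, override_value)
    else
      override_value
    end
  end
end

merged = deep_merge(
  RECOMMENDED_SKETCH,
  { smart_punctuation: false, links: { allowed_hosts: ["*.example.com"] } }
)

p merged[:smart_punctuation]  # overridden scalar
p merged[:links]              # preset sub-keys kept, allowed_hosts replaced
```

Under this model, passing `links: { allowed_hosts: [...] }` next to a preset leaves the preset's other link settings in place instead of resetting them.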
|
|
128
|
+
|
|
129
|
+
## Options
|
|
130
|
+
|
|
131
|
+
GFM extensions are on by default; raw HTML rendering is off by default.
|
|
132
|
+
Pass a hash to `Inkmark.to_html` / `Inkmark.new`, or mutate an `Inkmark::Options`
|
|
133
|
+
instance via its accessors.
|
|
134
|
+
|
|
135
|
+
| Key | Default | Description |
|
|
136
|
+
|---|---|---|
|
|
137
|
+
| `gfm` | `true` | GFM conformance mode + tables, strikethrough, tasklists, and footnotes. |
|
|
138
|
+
| `gfm_tag_filter` | `true` | GFM "Disallowed Raw HTML" extension. When `gfm` and `raw_html` are both true, protects you from several predefined tags (`title`, `textarea`, `style`, `xmp`, `iframe`, `noembed`, `noframes`, `script`, `plaintext`). No effect when `raw_html: false`. |
|
|
139
|
+
| `tables` | `true` | GFM pipe tables with optional column alignment markers (`:---`, `:---:`, `---:`). |
|
|
140
|
+
| `strikethrough` | `true` | `~~text~~` renders as `<del>text</del>`. |
|
|
141
|
+
| `tasklists` | `true` | `- [ ]` and `- [x]` render as disabled checkboxes. |
|
|
142
|
+
| `footnotes` | `true` | `text[^1]` + `[^1]: body` renders as superscript links and footnote block. |
|
|
143
|
+
| `raw_html` | `false` | Pass raw HTML through unescaped. Off by default for untrusted-input safety. **When enabled, the caller is fully responsible for sanitizing the output—see the [Raw HTML](#raw-html) section.** |
|
|
144
|
+
| `smart_punctuation` | `false` | Convert `"..."` → `“...”`, `...` → `…`, `--` → `–`, `---` → `—`. |
|
|
145
|
+
| `headings` | `{ attributes: false, ids: false }` | Heading-related policy. `:attributes` enables `# Heading {#id .klass}` Markdown inline attribute syntax; `:ids` auto-generates `id="slug"` on every heading from its text, with automatic Unicode transliteration of non-English headings (duplicates get a counter suffix; user-supplied ids from `:attributes` win). Deep-merges over defaults—pass only the sub-keys you care about. |
|
|
146
|
+
| `images` | `{ lazy: false, allowed_hosts: nil, allowed_schemes: nil }` | Image-related policy. `:lazy` adds `loading="lazy" decoding="async"` to every `<img>`. `:allowed_hosts` is a glob allowlist for `<img src>` hostnames (see examples; non-matching images drop to alt text). `:allowed_schemes` is a URL-scheme allowlist, typically `["http", "https"]` to block `data:` image URIs. Both allowlists default to `nil` (no filtering); `[]` denies every external URL. Deep-merges. |
|
|
147
|
+
| `links` | `{ autolink: false, nofollow: false, allowed_hosts: nil, allowed_schemes: nil }` | Link-related policy. `:autolink` auto-links bare URLs and emails with correct boundary detection. `:nofollow` adds `rel="nofollow noopener"` to external `<a>` tags. `:allowed_hosts` / `:allowed_schemes` are glob / scheme allowlists for `<a href>` (relative/anchor/mailto URLs are never filtered). Non-matching links unwrap to plain text. Deep-merges. |
|
|
148
|
+
| `emoji_shortcodes` | `false` | Replace gemoji-style `:shortcode:` sequences with their emoji character (`:rocket:` → 🚀). Unknown codes and codes inside code blocks are preserved. |
|
|
149
|
+
| `syntax_highlight` | `false` | Server-side syntax highlighting for fenced code blocks with a language tag. Uses the `syntect` Rust crate with CSS class output. Batteries included: pair with CSS from `Inkmark.highlight_css` for the theme stylesheet. |
|
|
150
|
+
| `hard_wrap` | `false` | Treat every single newline as a hard line break (`<br />`). By default a bare `\n` is a soft break rendered as a space. Enable for one-sentence-per-line content or when migrating from renderers that default to hard wraps. |
|
|
151
|
+
| `toc` | `false` | Collect a table of contents from headings. Accepts `true` / `false` for simple enable/disable, or a Hash like `toc: { depth: 3 }` to limit which heading levels appear in the rendered TOC (h1–h3 in that example; default is no limit). Enables `Inkmark#toc` which returns an `Inkmark::Toc` value object (`#to_markdown` / `#to_html` / `#to_s`). Implicitly enables `headings: { ids: true }`. Also populates a lightweight `Inkmark#statistics` with `heading_count`. Depth affects only the rendered TOC; `heading_count`, `extracts[:headings]`, and `chunks_by_heading` still see every heading. |
|
|
152
|
+
| `statistics` | `false` | Collect scalar document statistics during parsing: language detection, character/word counts, and `*_count` fields for headings, code blocks, images, links, and footnote definitions. See examples. For structured arrays of records, use `extract`. Implies `toc` and `headings: { ids: true }`. |
|
|
153
|
+
| `extract` | `nil` | Hash opting into structured extraction of specific element kinds. Keys: `:images`, `:links`, `:code_blocks`, `:headings`, `:footnote_definitions`—each `true`/`false`. When set, `Inkmark#extracts` returns a Hash keyed by the requested kinds, each with an Array of record Hashes including a `:byte_range`. `extract: { headings: true }` and `toc: true` trigger each other—one heading walk powers both surfaces. |
|
|
154
|
+
| `math` | `false` | Recognize `$inline$` and `$$display$$` math blocks. |
|
|
155
|
+
| `definition_list` | `false` | `term\n: definition` renders as `<dl>`. |
|
|
156
|
+
| `superscript` | `false` | `^text^` renders as `<sup>`. |
|
|
157
|
+
| `subscript` | `false` | `~text~` renders as `<sub>`. Conflicts with strikethrough—enable only one. |
|
|
158
|
+
| `wikilinks` | `false` | `[[Page]]` and `[[Page\|label]]` render as links. |
|
|
159
|
+
| `frontmatter` | `false` | Frontmatter (YAML metadata at the start of the document). Parsed and exposed via `Inkmark#frontmatter`; the block is stripped from rendered output. |
|
|
160
|
+
|
|
161
|
+
Options can be supplied either way:
|
|
162
|
+
|
|
163
|
+
```ruby
|
|
164
|
+
# As a hash at construction
|
|
165
|
+
Inkmark.to_html(md, options: { math: true, tables: false })
|
|
166
|
+
|
|
167
|
+
# Via mutable accessor
|
|
168
|
+
g = Inkmark.new(md)
|
|
169
|
+
g.options.math = true
|
|
170
|
+
g.options.tables = false
|
|
171
|
+
g.to_html
|
|
172
|
+
|
|
173
|
+
# Process-level defaults, to set in your application initializer
|
|
174
|
+
Inkmark.default_options.math = true
|
|
175
|
+
Inkmark.new(md).to_html # picks up the default
|
|
176
|
+
```
|
|
177
|
+
|
|
178
|
+
Unknown option keys raise `ArgumentError` immediately, including via the
|
|
179
|
+
hash form—typos fail loudly:
|
|
180
|
+
|
|
181
|
+
```ruby
|
|
182
|
+
Inkmark.new("x", options: { taples: true })
|
|
183
|
+
# => ArgumentError: unknown Inkmark option: :taples
|
|
184
|
+
```
|
|
185
|
+
|
|
186
|
+
## Raw HTML
|
|
187
|
+
|
|
188
|
+
Raw HTML is suppressed by default. This is safe-by-default for rendering untrusted markdown:
|
|
189
|
+
|
|
190
|
+
```ruby
|
|
191
|
+
Inkmark.to_html("<script>alert(1)</script>")
|
|
192
|
+
# => "<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>\n"
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
Enable pass-through with `raw_html: true`; _only do this for trusted
|
|
196
|
+
input_:
|
|
197
|
+
|
|
198
|
+
```ruby
|
|
199
|
+
Inkmark.to_html("<em>keep me</em>", options: { raw_html: true })
|
|
200
|
+
# => "<p><em>keep me</em></p>\n"
|
|
201
|
+
```
|
|
202
|
+
|
|
203
|
+
> **Your responsibility.** With `raw_html: true` you are fully
|
|
204
|
+
> responsible for every `<tag>` that reaches the HTML output. Inkmark does not
|
|
205
|
+
> sanitize raw HTML beyond the narrow GFM tagfilter described below—it will
|
|
206
|
+
> happily emit `<img onerror="…">`, `<a href="javascript:…">`, `<style>`
|
|
207
|
+
> contents, and any other attack surface the source contains. Always pipe the
|
|
208
|
+
> output through a dedicated sanitizer (like [Loofah][] or
|
|
209
|
+
> [rails-html-sanitizer][]) before rendering untrusted content in a page.
|
|
210
|
+
|
|
211
|
+
[Loofah]: https://github.com/flavorjones/loofah
|
|
212
|
+
[rails-html-sanitizer]: https://github.com/rails/rails-html-sanitizer
|
|
213
|
+
|
|
214
|
+
Even with `raw_html: true`, the **GFM tagfilter** stays on by
|
|
215
|
+
default and escapes nine unsafe tag names—`title`, `textarea`, `style`, `xmp`,
|
|
216
|
+
`iframe`, `noembed`, `noframes`, `script`, `plaintext`. This is required for GFM conformance. Opt out with `gfm_tag_filter: false` (or `gfm: false`) if you need raw pass-through of those tags—trusted input only. The tagfilter is a narrow spec-compliance pass, **not** a sanitizer—the responsibility note above still applies in full.
|
|
217
|
+
|
|
218
|
+
```ruby
|
|
219
|
+
Inkmark.to_html("<script>alert(1)</script>", options: { raw_html: true })
|
|
220
|
+
# => "<p>&lt;script>alert(1)&lt;/script></p>\n"
|
|
221
|
+
```
|
|
222
|
+
|
|
223
|
+
## Host allowlists
|
|
224
|
+
|
|
225
|
+
Restrict which hostnames can appear in links and images by passing glob
|
|
226
|
+
patterns. Disallowed links have their `<a>` tags stripped (the link text
|
|
227
|
+
stays); disallowed images drop to their alt text (or disappear when alt
|
|
228
|
+
is empty). Relative URLs, anchors, `mailto:`, and other non-web schemes
|
|
229
|
+
pass through unchanged—only `http://` / `https://` URLs are matched.
|
|
230
|
+
|
|
231
|
+
```ruby
|
|
232
|
+
Inkmark.to_html(md, options: {
|
|
233
|
+
links: { allowed_hosts: ["example.com", "*.example.com"] },
|
|
234
|
+
images: { allowed_hosts: ["{cdn,static,img}.example.com"] }
|
|
235
|
+
})
|
|
236
|
+
```
|
|
237
|
+
|
|
238
|
+
Patterns use glob syntax (same engine as `.gitignore`), **not regex**:
|
|
239
|
+
|
|
240
|
+
- `example.com`: exact host only
|
|
241
|
+
- `*.example.com`: any subdomain (matches `cdn.example.com`, `a.b.example.com`; does **not** match bare `example.com`)
|
|
242
|
+
- `{cdn,static}.example.com`: brace alternation for multiple explicit hosts
|
|
243
|
+
- `*.{example,trusted}.com`: combine wildcards and alternation
|
|
244
|
+
|
|
245
|
+
Hostnames are matched case-insensitively and ports are ignored. An empty
|
|
246
|
+
array `[]` blocks every external link or image while still allowing
|
|
247
|
+
relative URLs.
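
The matching rules above can be approximated with Ruby's standard library. This is a minimal sketch, not the gem's Rust matcher: `host_allowed?` is a hypothetical helper, `File.fnmatch` stands in for the glob engine, and rejecting unparsable URLs is an assumption made here.

```ruby
require "uri"

# Returns true when `url` may be kept under the given host allowlist.
def host_allowed?(url, patterns)
  return true if patterns.nil?                       # nil means no filtering
  uri = URI.parse(url)
  # Only http/https URLs are matched; relative URLs, anchors, mailto: pass.
  return true unless %w[http https].include?(uri.scheme&.downcase)
  host = uri.host.to_s.downcase                      # the port is not part of #host
  flags = File::FNM_EXTGLOB | File::FNM_CASEFOLD     # braces + case-insensitive
  patterns.any? { |pattern| File.fnmatch(pattern, host, flags) }
rescue URI::InvalidURIError
  false                                              # assumption: reject unparsable URLs
end

p host_allowed?("https://cdn.example.com/a.png", ["*.example.com"])  # subdomain
p host_allowed?("https://example.com/", ["*.example.com"])           # bare host
p host_allowed?("/docs/intro", [])                                   # relative URL
```

Listing both `example.com` and `*.example.com`, as in the example above, covers the bare domain and its subdomains.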
|
|
248
|
+
|
|
249
|
+
## URL scheme filtering
|
|
250
|
+
|
|
251
|
+
For rendering untrusted markdown, opt in to scheme allowlists to block
|
|
252
|
+
`javascript:`, `data:`, and other dangerous URL schemes in links and
|
|
253
|
+
images:
|
|
254
|
+
|
|
255
|
+
```ruby
|
|
256
|
+
Inkmark.to_html(md, options: {
|
|
257
|
+
links: { allowed_schemes: ["http", "https", "mailto"] },
|
|
258
|
+
images: { allowed_schemes: ["http", "https"] }
|
|
259
|
+
})
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
Disallowed links are unwrapped (text stays, `<a>` tags drop); disallowed
|
|
263
|
+
images drop to alt text. Relative paths, anchors, and protocol-relative
|
|
264
|
+
URLs pass through—no scheme to check.
|
|
265
|
+
|
|
266
|
+
```ruby
|
|
267
|
+
opts = { links: { allowed_schemes: ["http", "https"] } }
|
|
268
|
+
|
|
269
|
+
Inkmark.to_html("[click](javascript:alert(1))", options: opts)
|
|
270
|
+
# => "<p>click</p>\n"
|
|
271
|
+
|
|
272
|
+
Inkmark.to_html("![pic](data:text/html,<script>alert(1)</script>)",
|
|
273
|
+
options: { images: { allowed_schemes: ["http", "https"] } })
|
|
274
|
+
# => "<p>pic</p>\n" # dropped to alt text
|
|
275
|
+
```
|
|
276
|
+
|
|
277
|
+
**Scope:** scheme filtering applies to markdown-emitted links and images
|
|
278
|
+
(`[text](url)` / `![alt](url)`). Raw HTML `<a href>` / `<img src>` inside
|
|
279
|
+
`raw_html: true` content is *not* filtered—for that case use a
|
|
280
|
+
downstream HTML sanitizer like Loofah.
|
|
281
|
+
|
|
282
|
+
**Default:** filtering is off. Full CommonMark autolink conformance is
|
|
283
|
+
preserved (including uncommon schemes like `irc:` and `ftp:`). Add the
|
|
284
|
+
filter explicitly when rendering untrusted input.
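
The scheme check described in this section can be pictured as follows. This is a rough sketch in plain Ruby rather than the gem's implementation; `scheme_allowed?` and `SCHEME_PATTERN` are hypothetical names introduced for illustration.

```ruby
# Matches an RFC 3986 scheme prefix such as "https:" or "javascript:".
SCHEME_PATTERN = /\A([a-z][a-z0-9+.\-]*):/i

# Returns true when `url` may be kept under the given scheme allowlist.
def scheme_allowed?(url, allowed_schemes)
  return true if allowed_schemes.nil?        # filtering is off by default
  scheme = url.strip[SCHEME_PATTERN, 1]
  return true if scheme.nil?                 # relative, anchor, protocol-relative
  allowed_schemes.include?(scheme.downcase)
end

allowed = %w[http https mailto]
p scheme_allowed?("https://example.com/", allowed)
p scheme_allowed?("JavaScript:alert(1)", allowed)
p scheme_allowed?("//cdn.example.com/x.png", allowed)
```

URLs without a scheme are kept because there is nothing to compare against the allowlist, which matches the pass-through rule stated above.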
|
|
285
|
+
|
|
286
|
+
## Statistics and extraction
|
|
287
|
+
|
|
288
|
+
Inkmark collects document metadata as a side effect of the single render pass.
|
|
289
|
+
Two independent options control what's exposed:
|
|
290
|
+
|
|
291
|
+
- **`statistics: true`** populates `Inkmark#statistics` with scalar counts and
|
|
292
|
+
language detection—nothing you have to iterate.
|
|
293
|
+
- **`extract: { kind: true, ... }`** populates `Inkmark#extracts` with structured
|
|
294
|
+
arrays of records. Opt into only the kinds you need; unasked-for arrays are
|
|
295
|
+
never allocated.
|
|
296
|
+
|
|
297
|
+
```ruby
|
|
298
|
+
md = Inkmark.new(source, options: {
|
|
299
|
+
statistics: true,
|
|
300
|
+
extract: {
|
|
301
|
+
images: true,
|
|
302
|
+
links: true,
|
|
303
|
+
code_blocks: true,
|
|
304
|
+
headings: true,
|
|
305
|
+
footnote_definitions: true
|
|
306
|
+
}
|
|
307
|
+
})
|
|
308
|
+
md.to_html
|
|
309
|
+
|
|
310
|
+
md.statistics
|
|
311
|
+
# => {
|
|
312
|
+
# heading_count: 2,
|
|
313
|
+
# likely_language: "eng",
|
|
314
|
+
# language_confidence: 0.93,
|
|
315
|
+
# character_count: 142,
|
|
316
|
+
# word_count: 28,
|
|
317
|
+
# code_block_count: 1,
|
|
318
|
+
# image_count: 1,
|
|
319
|
+
# link_count: 2,
|
|
320
|
+
# footnote_definition_count: 1,
|
|
321
|
+
# }
|
|
322
|
+
|
|
323
|
+
md.extracts[:code_blocks]
|
|
324
|
+
# => [{ lang: "ruby", source: "puts \"hello\"\n", byte_range: 78...101 }]
|
|
325
|
+
|
|
326
|
+
md.extracts[:headings]
|
|
327
|
+
# => [
|
|
328
|
+
# { level: 1, text: "Hello World", id: "hello-world", byte_range: 0...14 },
|
|
329
|
+
# { level: 2, text: "Code Example", id: "code-example", byte_range: 68...83 }
|
|
330
|
+
# ]
|
|
331
|
+
```
|
|
332
|
+
|
|
333
|
+
### Extract record shapes
|
|
334
|
+
|
|
335
|
+
| Kind | Fields |
|
|
336
|
+
|---------------------------|------------------------------------------------|
|
|
337
|
+
| `:images` | `src`, `alt`, `title`, `byte_range` |
|
|
338
|
+
| `:links` | `href`, `text`, `title`, `byte_range` |
|
|
339
|
+
| `:code_blocks` | `lang`, `source`, `byte_range` |
|
|
340
|
+
| `:headings` | `level`, `text`, `id`, `byte_range` |
|
|
341
|
+
| `:footnote_definitions` | `label`, `text`, `byte_range` |
|
|
342
|
+
|
|
343
|
+
`byte_range` is an exclusive `Range` (`start...end`) pointing into the original
|
|
344
|
+
source string—slice with `source.byteslice(r.begin, r.size)` to recover the
|
|
345
|
+
raw Markdown. `source` on `:code_blocks` is pulldown-cmark's pre-filter code
|
|
346
|
+
content, so enabling `syntax_highlight: true` does not mutate it.
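
Because `byte_range` counts bytes rather than characters, the difference matters as soon as the source contains multibyte text. The sketch below uses a hand-written record in the documented shape (it does not call the gem) to show why `byteslice` is the right tool:

```ruby
source = "# Héllo\n\nBody text.\n"

# Hand-written record in the documented shape; the range is in bytes.
# "é" occupies two bytes in UTF-8, so the heading line is 9 bytes long.
heading = { level: 1, text: "Héllo", byte_range: 0...9 }

range = heading[:byte_range]
by_bytes = source.byteslice(range.begin, range.size)  # => "# Héllo\n"
by_chars = source[range]                              # => "# Héllo\n\n"

p by_bytes
p by_chars
```

Indexing the string with the range directly counts characters and overshoots by one for every multibyte character that precedes the end of the range.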
|
|
347
|
+
|
|
348
|
+
### Mutual trigger: `toc` ↔ `extract[:headings]`
|
|
349
|
+
|
|
350
|
+
One heading walk powers both the TOC renderer and the heading extract, so the
|
|
351
|
+
two options trigger each other. Enabling either gives you access to both
|
|
352
|
+
`Inkmark#toc` (with `#to_markdown` / `#to_html`) and `Inkmark#extracts[:headings]`.
|
|
353
|
+
|
|
354
|
+
```ruby
|
|
355
|
+
Inkmark.new(source, options: { toc: true }).extracts[:headings]
|
|
356
|
+
# => [{ level: 1, text: "Hello World", id: "hello-world", byte_range: 0...14 }, ...]
|
|
357
|
+
```
|
|
358
|
+
|
|
359
|
+
## Chunks extraction (for RAG)
|
|
360
|
+
|
|
361
|
+
`Inkmark.chunks_by_heading` splits a document by heading into an ordered
|
|
362
|
+
Array of section Hashes. Each section's `:content` is **filter-applied
|
|
363
|
+
Markdown**—emoji expanded, URLs autolinked, allowlists applied—serialized
|
|
364
|
+
back through pulldown-cmark. Designed as the first stage of a
|
|
365
|
+
chunk → embed → retrieve pipeline.
|
|
366
|
+
|
|
367
|
+
```ruby
|
|
368
|
+
sections = Inkmark.chunks_by_heading(readme)
|
|
369
|
+
sections.each do |s|
|
|
370
|
+
puts "#{'#' * s[:level]} #{s[:heading]} (#{s[:id]})"
|
|
371
|
+
puts s[:content]
|
|
372
|
+
end
|
|
373
|
+
```
|
|
374
|
+
|
|
375
|
+
Each entry:
|
|
376
|
+
|
|
377
|
+
```ruby
|
|
378
|
+
{
|
|
379
|
+
heading: "From source", # String, or nil for the preamble
|
|
380
|
+
level: 3, # 1-6, or 0 for the preamble
|
|
381
|
+
id: "from-source", # slug, or nil for the preamble
|
|
382
|
+
breadcrumb: ["Docs", "Installation"], # ancestor heading texts, root to parent
|
|
383
|
+
content: "Run `bundle install`...\n" # filter-applied Markdown
|
|
384
|
+
}
|
|
385
|
+
```
|
|
386
|
+
|
|
387
|
+
Sections are **hierarchical**: a `##` section's `:content` includes any
|
|
388
|
+
nested `###` subsections, which also appear as their own entries. Content
|
|
389
|
+
before the first heading (if any) becomes a preamble entry with
|
|
390
|
+
`heading: nil` and `level: 0`.
|
|
391
|
+
|
|
392
|
+
`:breadcrumb` carries the ancestor heading texts from root to immediate
|
|
393
|
+
parent. Root-level sections and the preamble have an empty array. Skipped
|
|
394
|
+
levels are omitted, so a `###` directly under a `#` has `breadcrumb:
|
|
395
|
+
["Top"]`, not `["Top", nil]`. RAG pipelines typically prepend the
|
|
396
|
+
breadcrumb to each chunk before embedding—it gives the vector model a
|
|
397
|
+
cheap signal about the chunk's place in the document (the `embed_and_store` example below shows the pattern).
|
|
398
|
+
|
|
399
|
+
Enable `statistics: true` to add `:character_count` and `:word_count` to
|
|
400
|
+
every section entry. Counts reflect the section's filter-applied text
|
|
401
|
+
content including any code-block bodies (code is content for embedding
|
|
402
|
+
purposes, not just prose). Numbers across sections won't sum to the
|
|
403
|
+
document total because sections overlap hierarchically—a parent section's
|
|
404
|
+
count includes its nested subsections.
|
|
405
|
+
|
|
406
|
+
```ruby
|
|
407
|
+
Inkmark.chunks_by_heading(doc, options: {statistics: true})
|
|
408
|
+
# => [
|
|
409
|
+
# { heading: "Installation", level: 2, id: "installation",
|
|
410
|
+
# breadcrumb: ["Intro"],
|
|
411
|
+
# character_count: 180, word_count: 32,
|
|
412
|
+
# content: "..." },
|
|
413
|
+
# ...
|
|
414
|
+
# ]
|
|
415
|
+
```
|
|
416
|
+
|
|
417
|
+
```ruby
|
|
418
|
+
Inkmark.chunks_by_heading(readme).each do |s|
|
|
419
|
+
next if s[:heading].nil? # skip preamble
|
|
420
|
+
context = (s[:breadcrumb] + [s[:heading]]).join(" > ")
|
|
421
|
+
embed_and_store("#{context}\n\n#{s[:content]}", metadata: {id: s[:id]})
|
|
422
|
+
end
|
|
423
|
+
```
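
The breadcrumb rule described earlier (ancestors from root to immediate parent, skipped levels omitted) can be sketched with a simple stack. This is an illustration of the rule in plain Ruby, not the gem's code; `breadcrumbs` is a hypothetical helper that takes `[level, text]` pairs in document order.

```ruby
# Compute breadcrumb arrays for headings given as [level, text] pairs.
def breadcrumbs(headings)
  stack = []  # currently open ancestors as [level, text]
  headings.map do |level, text|
    # Close every heading at the same or a deeper level.
    stack.pop while stack.any? && stack.last[0] >= level
    crumb = stack.map { |(_, ancestor_text)| ancestor_text }
    stack.push([level, text])
    { heading: text, level: level, breadcrumb: crumb }
  end
end

outline = [[1, "Top"], [3, "Deep"], [2, "Mid"], [3, "Leaf"]]
breadcrumbs(outline).each { |entry| p entry }
```

In this outline, "Deep" sits directly under "Top" with no `##` in between, so its breadcrumb holds only `"Top"` and no placeholder for the skipped level.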
|
|
424
|
+
|
|
425
|
+
### Picking specific sections
|
|
426
|
+
|
|
427
|
+
`chunks_by_heading` always returns the full array. Use plain `Enumerable`
|
|
428
|
+
to slice it however you need:
|
|
429
|
+
|
|
430
|
+
```ruby
|
|
431
|
+
sections = Inkmark.chunks_by_heading(readme)
|
|
432
|
+
|
|
433
|
+
# Find one by heading text
|
|
434
|
+
sections.find { |s| s[:heading] == "Installation" }
|
|
435
|
+
|
|
436
|
+
# Filter by regexp
|
|
437
|
+
sections.select { |s| s[:heading]&.match?(/install|usage/i) }
|
|
438
|
+
|
|
439
|
+
# All top-level headings only
|
|
440
|
+
sections.select { |s| s[:level] == 1 }
|
|
441
|
+
|
|
442
|
+
# Skip the preamble
|
|
443
|
+
sections.reject { |s| s[:heading].nil? }
|
|
444
|
+
```
|
|
445
|
+
|
|
446
|
+
No filter kwarg on the method—`.select` / `.find` / `.reject` already
|
|
447
|
+
cover every filtering shape, and you can compose conditions freely
|
|
448
|
+
(heading AND level, or heading NOT in a blocklist, etc.). The preamble
|
|
449
|
+
is a regular entry with `heading: nil` and falls out of Regexp/String
|
|
450
|
+
filters naturally (`nil == "Foo"` is false; `nil&.match?(x)` is nil).
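
As an illustration of composing conditions, the sketch below filters a hand-written array in the documented entry shape (`:content` omitted for brevity). The data and the `blocklist` name are made up for the example; in practice the array would come from `chunks_by_heading`.

```ruby
sections = [
  { heading: nil,            level: 0, id: nil,            breadcrumb: [] },
  { heading: "Docs",         level: 1, id: "docs",         breadcrumb: [] },
  { heading: "Installation", level: 2, id: "installation", breadcrumb: ["Docs"] },
  { heading: "Changelog",    level: 2, id: "changelog",    breadcrumb: ["Docs"] }
]

blocklist = ["Changelog", "License"]

# Level-2 sections whose heading is not blocklisted; the preamble drops out
# because its level is 0.
wanted = sections.select do |s|
  s[:level] == 2 && !blocklist.include?(s[:heading])
end

# Everything nested under "Docs", found via the breadcrumb.
under_docs = sections.select { |s| s[:breadcrumb].first == "Docs" }

p wanted.map { |s| s[:id] }
p under_docs.map { |s| s[:id] }
```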
|
|
451
|
+
|
|
452
|
+
### RAG pipeline caveat: HTML-emitting filters
|
|
453
|
+
|
|
454
|
+
**Disable `syntax_highlight`, `images: { lazy: true }`, and `links: { nofollow: true }`
|
|
455
|
+
when chunking for RAG.** These filters embed raw `<pre>…`, `<img loading=…>`,
|
|
456
|
+
and `<a rel=…>` HTML into the serialized Markdown; the HTML noise hurts
|
|
457
|
+
embedding quality for downstream semantic search.
|
|
458
|
+
|
|
459
|
+
```ruby
|
|
460
|
+
sections = Inkmark.chunks_by_heading(doc, options: {
|
|
461
|
+
emoji_shortcodes: true, # keep—improves semantic signal
|
|
462
|
+
links: {
|
|
463
|
+
autolink: true, # keep—proper anchor markdown
|
|
464
|
+
allowed_schemes: %w[http https mailto], # keep—safe URLs
|
|
465
|
+
nofollow: false # off—would embed <a rel=...> HTML
|
|
466
|
+
},
|
|
467
|
+
images: { lazy: false }, # off—would embed <img loading=...> HTML
|
|
468
|
+
syntax_highlight: false # off—would embed <pre><span...> HTML
|
|
469
|
+
})
|
|
470
|
+
```
|
|
471
|
+
|
|
472
|
+
### Scope
|
|
473
|
+
|
|
474
|
+
`chunks_by_heading` is a **structural chunking primitive**, not a
|
|
475
|
+
complete RAG chunker. It splits a document along heading boundaries.
|
|
476
|
+
For documents without headings—or when you need a strict size
|
|
477
|
+
budget regardless of document structure—reach for
|
|
478
|
+
[`chunks_by_size`](#sliding-window-chunking) below.
|
|
479
|
+
|
|
480
|
+
Inkmark does not ship token-based budgeting (there is no embedded
|
|
481
|
+
tokenizer). Use `character_count` / `word_count` or your own tokenizer
|
|
482
|
+
to approximate. Prepending document titles or parent-heading
|
|
483
|
+
breadcrumbs to each chunk is a few lines of Ruby on top of the array
|
|
484
|
+
this method returns.
|
|
485
|
+
|
|
486
|
+
### Sliding-window chunking
|
|
487
|
+
|
|
488
|
+
`Inkmark.chunks_by_size` splits a document into fixed-size chunks with
|
|
489
|
+
optional overlap, walking the filter-applied Markdown sequentially. Use
|
|
490
|
+
this when headings are absent or uneven, or when you need a strict
|
|
491
|
+
size budget for embedding input.
|
|
492
|
+
|
|
493
|
+
```ruby
|
|
494
|
+
# Char-budgeted windows with overlap
|
|
495
|
+
Inkmark.chunks_by_size(doc, chars: 500, overlap: 50)
|
|
496
|
+
|
|
497
|
+
# Word budget, word-boundary cuts
|
|
498
|
+
Inkmark.chunks_by_size(doc, words: 120, overlap: 15, at: :word)
|
|
499
|
+
|
|
500
|
+
# Dual budget: cut at whichever is reached first
|
|
501
|
+
Inkmark.chunks_by_size(doc, chars: 1000, words: 200)
|
|
502
|
+
```
|
|
503
|
+
|
|
504
|
+
Each window:
|
|
505
|
+
|
|
506
|
+
```ruby
|
|
507
|
+
{
|
|
508
|
+
index: 0, # 0-based sequence position
|
|
509
|
+
content: "..." # filter-applied Markdown slice
|
|
510
|
+
# character_count, word_count added when options: { statistics: true }
|
|
511
|
+
}
|
|
512
|
+
```
|
|
513
|
+
|
|
514
|
+
**Boundary modes.** `at: :block` (default) cuts only between top-level
|
|
515
|
+
Markdown blocks—output stays valid Markdown, and a single block that
|
|
516
|
+
exceeds the budget is emitted as its own window rather than silently
|
|
517
|
+
dropped. `at: :word` serializes the full filtered Markdown and cuts at
|
|
518
|
+
the last Unicode word boundary that fits—tighter fit but may split
|
|
519
|
+
open constructs.
|
|
520
|
+
|
|
521
|
+
**Overlap.** Measured in chars. Each new window begins with the
|
|
522
|
+
trailing `overlap` chars of the previous window, so adjacent chunks
|
|
523
|
+
share context—useful when an embedding model's attention benefits
|
|
524
|
+
from neighbor overlap. Must be less than `chars:` when both are set.
|
|
525
|
+
|
|
526
|
+
**Validation.** At least one of `chars` or `words` is required. Both must
|
|
527
|
+
be positive. `overlap` defaults to 0, must be non-negative, and must be
|
|
528
|
+
less than `chars` when `chars` is set. Invalid combinations raise
|
|
529
|
+
`ArgumentError` at the Ruby boundary—silent clamping would mask
|
|
530
|
+
bugs like swapped args.
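
To make the overlap and validation rules concrete, here is a deliberately naive sketch that windows a plain string by character count. It ignores Markdown blocks and word boundaries, so it is not how `chunks_by_size` cuts; `char_windows` is a hypothetical helper that only mirrors the rules stated above.

```ruby
# Split `text` into windows of at most `chars` characters, where each new
# window starts with the trailing `overlap` characters of the previous one.
def char_windows(text, chars:, overlap: 0)
  raise ArgumentError, "chars must be positive" unless chars.positive?
  raise ArgumentError, "overlap must be non-negative" if overlap.negative?
  raise ArgumentError, "overlap must be less than chars" unless overlap < chars

  windows = []
  start = 0
  step = chars - overlap
  while start < text.length
    windows << { index: windows.size, content: text[start, chars] }
    break if start + chars >= text.length  # this window reached the end
    start += step
  end
  windows
end

char_windows("abcdefghij", chars: 4, overlap: 1).each { |w| p w }
```

Requiring `overlap < chars` keeps the step positive; otherwise the window would never advance, which is one reason to raise rather than clamp.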
|
|
531
|
+
|
|
532
|
+
#### Heading vs size: which to use
|
|
533
|
+
|
|
534
|
+
`chunks_by_heading` for docs where headings encode meaningful
|
|
535
|
+
structure (articles, specs, READMEs). Each chunk carries heading,
|
|
536
|
+
level, id, and breadcrumb metadata—retrieval benefits from that
|
|
537
|
+
context.
|
|
538
|
+
|
|
539
|
+
`chunks_by_size` for unstructured or uneven-heading docs, or when a
|
|
540
|
+
hard size ceiling matters more than document structure. No structural
|
|
541
|
+
metadata; windows are just positioned slices.
|
|
542
|
+
|
|
543
|
+
You can compose them for a hybrid "heading-based, but size-capped"
|
|
544
|
+
pattern:
|
|
545
|
+
|
|
546
|
+
```ruby
|
|
547
|
+
Inkmark.chunks_by_heading(doc).flat_map do |c|
|
|
548
|
+
if c[:content].size > 2000
|
|
549
|
+
Inkmark.chunks_by_size(c[:content], chars: 500, overlap: 50)
|
|
550
|
+
else
|
|
551
|
+
[c]
|
|
552
|
+
end
|
|
553
|
+
end
|
|
554
|
+
```
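
In the hybrid pattern, size-based windows carry no heading metadata, so oversized sections lose their heading context once split. One possible way to keep it is to merge each section's fields into its sub-windows. The sketch below assumes a hypothetical `cap_sections` helper and uses a stand-in splitter so it runs without the gem.

```ruby
# Sketch: carry heading metadata into size-capped sub-windows.
# `cap_sections` and the splitter are hypothetical names.
def cap_sections(sections, limit:, splitter:)
  sections.flat_map do |section|
    next [section] if section[:content].size <= limit

    # Keep the section's fields, then overlay each window's own fields.
    splitter.call(section[:content]).map { |window| section.merge(window) }
  end
end

# Stand-in splitter for demonstration: fixed 5-character slices.
fake_splitter = lambda do |text|
  text.scan(/.{1,5}/m).each_with_index.map { |slice, i| { index: i, content: slice } }
end

sections = [
  { heading: "Intro", level: 1, content: "short" },
  { heading: "Usage", level: 2, content: "0123456789ab" }
]
capped = cap_sections(sections, limit: 8, splitter: fake_splitter)
# capped.last => { heading: "Usage", level: 2, content: "ab", index: 2 }
```

With the gem loaded, the splitter would presumably wrap the documented call, for example `->(text) { Inkmark.chunks_by_size(text, chars: 500, overlap: 50) }`.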
|
|
555
|
+
|
|
556
|
+
## Truncation
|
|
557
|
+
|
|
558
|
+
`Inkmark.truncate_markdown` caps a document at a character and/or word
|
|
559
|
+
budget, cutting at either a Markdown block boundary (valid structure) or
|
|
560
|
+
a Unicode word boundary (tighter fit, may split an open construct).
|
|
561
|
+
Designed for LLM context-window budgeting and RAG chunk normalization.
|
|
562
|
+
|
|
563
|
+
```ruby
|
|
564
|
+
# Block-boundary cut: last complete block that fits, output is valid Markdown
|
|
565
|
+
Inkmark.truncate_markdown(doc, chars: 4000, at: :block)
|
|
566
|
+
|
|
567
|
+
# Word-boundary cut: last word that fits, output may split open constructs
|
|
568
|
+
Inkmark.truncate_markdown(doc, chars: 4000, at: :word)
|
|
569
|
+
|
|
570
|
+
# Dual budget: cut at whichever limit is hit first
|
|
571
|
+
Inkmark.truncate_markdown(doc, chars: 4000, words: 500, at: :word)
|
|
572
|
+
|
|
573
|
+
# Suppress the marker
|
|
574
|
+
Inkmark.truncate_markdown(doc, chars: 4000, at: :block, marker: nil)
|
|
575
|
+
|
|
576
|
+
# Custom marker
|
|
577
|
+
Inkmark.truncate_markdown(doc, chars: 4000, at: :block, marker: "[…]")
|
|
578
|
+
```
|
|
579
|
+
|
|
580
|
+
Default marker is `"…"`. When appended, it counts toward the
|
|
581
|
+
budget—`chars: 4000` always yields output ≤ 4000 codepoints.
|
|
582
|
+
|
|
583
|
+
**Behavior:**
|
|
584
|
+
|
|
585
|
+
- **Source fits the budget**: returned unchanged (no marker).
|
|
586
|
+
- **First block alone exceeds the budget** (block mode): empty string.
|
|
587
|
+
  This honestly reports that no block fits; fall back to `at: :word` truncation if you
|
|
588
|
+
want a best-effort cut.
|
|
589
|
+
- **Marker too large for the budget**: raises `ArgumentError`.
|
|
590
|
+
- **Filter pipeline**: `emoji_shortcodes`, `links: { autolink: true }`, host/scheme
|
|
591
|
+
allowlists etc. run before truncation, so the measured output matches
|
|
592
|
+
what downstream tools consume.
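
The block-mode rules above can be modeled in a few lines of plain Ruby. This is a sketch, not the gem's code. It assumes that blocks are joined by a blank line, that the marker is appended directly to the last kept block, and that the marker is validated up front; none of these details are confirmed by this README. `truncate_blocks` is a hypothetical name.

```ruby
# Plain-Ruby model of the block-mode contract (assumptions noted above).
def truncate_blocks(blocks, chars:, marker: "…")
  marker = marker.to_s
  raise ArgumentError, "marker exceeds budget" if marker.size > chars

  full = blocks.join("\n\n")
  return full if full.size <= chars # fits: unchanged, no marker

  kept = []
  blocks.each do |block|
    candidate = (kept + [block]).join("\n\n")
    break if candidate.size + marker.size > chars # marker counts toward budget
    kept << block
  end
  return "" if kept.empty? # first block alone is too large

  kept.join("\n\n") + marker
end

blocks = ["# Title", "First paragraph.", "Second paragraph."]
cut = truncate_blocks(blocks, chars: 30)
# => "# Title\n\nFirst paragraph.…"
```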
|
|
593
|
+
|
|
594
|
+
### Per-section truncation
|
|
595
|
+
|
|
596
|
+
`chunks_by_heading` accepts a `truncate:` kwarg that applies the same
|
|
597
|
+
contract to every section's `:content` independently:
|
|
598
|
+
|
|
599
|
+
```ruby
|
|
600
|
+
Inkmark.chunks_by_heading(doc, truncate: {chars: 500, at: :block})
|
|
601
|
+
```
|
|
602
|
+
|
|
603
|
+
Each section's content is cut to the 500-char budget; metadata
|
|
604
|
+
(`:heading`, `:level`, `:id`, `:breadcrumb`) stays intact. When
|
|
605
|
+
`statistics: true` is also set, `:character_count` / `:word_count` are
|
|
606
|
+
recomputed against the truncated content.
|
|
607
|
+
|
|
608
|
+
```ruby
|
|
609
|
+
Inkmark.chunks_by_heading(doc,
|
|
610
|
+
options: {statistics: true},
|
|
611
|
+
truncate: {chars: 500, at: :block, marker: "…"}
|
|
612
|
+
)
|
|
613
|
+
# => each entry: { heading:, level:, id:, breadcrumb:,
|
|
614
|
+
# character_count:, word_count:, content: (≤ 500 chars) }
|
|
615
|
+
```
|
|
616
|
+
|
|
617
|
+
Because sections are hierarchical (a parent section's `:content`
|
|
618
|
+
includes nested subsections), applying the same budget to every entry
|
|
619
|
+
means each chunk stands alone as a self-contained, budget-capped unit.
|
|
620
|
+
|
|
621
|
+
## Plain-text extraction
|
|
622
|
+
|
|
623
|
+
`Inkmark#to_plain_text` strips all Markdown syntax and returns inline content as
|
|
624
|
+
plain text. Designed for embedding models, token counting, LLM input, and any
|
|
625
|
+
downstream consumer that treats Markdown formatting as noise.
|
|
626
|
+
|
|
627
|
+
```ruby
|
|
628
|
+
Inkmark.to_plain_text("**bold** and [a link](https://example.com)")
|
|
629
|
+
# => "bold and a link (https://example.com)\n"
|
|
630
|
+
|
|
631
|
+
g = Inkmark.new(source, options: { emoji_shortcodes: true, links: { autolink: true } })
|
|
632
|
+
g.to_plain_text
|
|
633
|
+
```
|
|
634
|
+
|
|
635
|
+
The same event-level filters (emoji replacement, autolink, host/scheme
|
|
636
|
+
allowlists, etc.) run before plain-text serialization, so preprocessing passes
|
|
637
|
+
apply consistently across `to_html`, `to_markdown`, and `to_plain_text`.
|
|
638
|
+
|
|
639
|
+
### Output grammar
|
|
640
|
+
|
|
641
|
+
| Element | Plain-text form |
|
|
642
|
+
|---|---|
|
|
643
|
+
| `**bold**`, `*italic*`, `~~strike~~` | inner text only |
|
|
644
|
+
| `` `code` `` | inner text (no backticks) |
|
|
645
|
+
| `[text](url)` | `text (url)` |
|
|
646
|
+
| `<https://x.com>` (autolink) | `https://x.com` (collapses when text == url) |
|
|
647
|
+
| `![alt](src)` | `alt (src)` |
|
|
648
|
+
| `# Heading` | plain text with blank line above/below |
|
|
649
|
+
| `> quote` | every line prefixed with `> ` (email-style; nests) |
|
|
650
|
+
| `- item` / `1. item` | `- ` / `1. ` bullets; 2-space indent per nesting |
|
|
651
|
+
| `- [x] task` | `- task` (checkbox dropped) |
|
|
652
|
+
| tables | header row `\t`-joined, blank line, body rows `\t`-joined |
|
|
653
|
+
| ``` ```code``` ``` | raw content, blank line above/below |
|
|
654
|
+
| `---` | `---` surrounded by blank lines |
|
|
655
|
+
| `[^foo]` | `[foo]` |
|
|
656
|
+
| `[^foo]: body` | appended at document end as `[foo]: body` |
|
|
657
|
+
| soft break | space |
|
|
658
|
+
| hard break | `\n` |
|
|
659
|
+
| raw HTML | stripped by default; passes through when `raw_html: true` |
|
|
660
|
+
|
|
661
|
+
Blank lines inside a blockquote emit a bare `>` marker (matching email quoting
|
|
662
|
+
conventions; no trailing whitespace).
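
The blockquote rule can be illustrated with a small helper. `quote_plain` is a hypothetical name used only to show the line-prefixing convention; it is not part of Inkmark and handles a single quoting level.

```ruby
# Email-style quoting: prefix each line with "> ", and emit a bare ">"
# for blank lines so no trailing whitespace is produced.
def quote_plain(text)
  text.lines(chomp: true).map { |line| line.empty? ? ">" : "> #{line}" }.join("\n")
end

quoted = quote_plain("first\n\nsecond")
# => "> first\n>\n> second"
```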
|
|
663
|
+
|
|
664
|
+
## Markdown-to-Markdown pipeline
|
|
665
|
+
|
|
666
|
+
`Inkmark#to_markdown` runs the same event-level filter pipeline as `to_html` and
|
|
667
|
+
serializes the result back to Markdown text. Use it as a preprocessing step in
|
|
668
|
+
pipelines that consume Markdown: LLM prompts, secondary renderers, content
|
|
669
|
+
storage, or any stage that needs clean Markdown rather than HTML.
|
|
670
|
+
|
|
671
|
+
```ruby
|
|
672
|
+
# Class-method shortcut
|
|
673
|
+
Inkmark.to_markdown("**bold** :rocket:", options: { emoji_shortcodes: true })
|
|
674
|
+
# => "**bold** 🚀"
|
|
675
|
+
|
|
676
|
+
# Instance form—the same options object drives both outputs
|
|
677
|
+
g = Inkmark.new(source, options: {
|
|
678
|
+
emoji_shortcodes: true,
|
|
679
|
+
links: { allowed_hosts: ["trusted.com", "*.trusted.com"] }
|
|
680
|
+
})
|
|
681
|
+
g.to_markdown # filtered Markdown for pipeline
|
|
682
|
+
g.to_html # rendered HTML for display
|
|
683
|
+
```
|
|
684
|
+
|
|
685
|
+
### Choosing filters for a Markdown pipeline
|
|
686
|
+
|
|
687
|
+
Inkmark's filters fall into two groups depending on what they emit:
|
|
688
|
+
|
|
689
|
+
**Markdown-native filters** transform the event stream without producing HTML.
|
|
690
|
+
Their output is standard Markdown and is safe to pass to any downstream
|
|
691
|
+
consumer:
|
|
692
|
+
|
|
693
|
+
| Filter | Effect in `to_markdown` |
|
|
694
|
+
|---|---|
|
|
695
|
+
| `emoji_shortcodes` | `:rocket:` → `🚀` in the output text |
|
|
696
|
+
| `links: { autolink: true }` | bare `https://x.com` → `[https://x.com](https://x.com)` |
|
|
697
|
+
| `links: { allowed_hosts:, allowed_schemes: }` | disallowed links unwrapped to plain text |
|
|
698
|
+
| `images: { allowed_hosts:, allowed_schemes: }` | disallowed images dropped to alt text |
|
|
699
|
+
| `smart_punctuation` | `"..."` → `"…"` etc. (text-only transformation) |
|
|
700
|
+
|
|
701
|
+
**HTML-emitting filters** synthesize raw `<...>` markup. When these are active
|
|
702
|
+
and you call `to_markdown`, that markup is embedded verbatim in the output. Raw
|
|
703
|
+
HTML blocks are valid CommonMark, but they may break or confuse downstream
|
|
704
|
+
consumers—especially LLMs and renderers that do not expect HTML inside
|
|
705
|
+
Markdown:
|
|
706
|
+
|
|
707
|
+
| Filter | What ends up in the Markdown |
|
|
708
|
+
|---|---|
|
|
709
|
+
| `syntax_highlight` | fenced code blocks become `<pre><code><span class=...>` HTML |
|
|
710
|
+
| `images: { lazy: true }` | images become `<img loading="lazy" decoding="async" ...>` HTML |
|
|
711
|
+
| `links: { nofollow: true }` | links become `<a rel="nofollow noopener" ...>` HTML |
|
|
712
|
+
|
|
713
|
+
**Recommendation:** disable HTML-emitting filters when calling `to_markdown`.
|
|
714
|
+
They are designed for final HTML output and produce hard-to-process markup in a
|
|
715
|
+
Markdown pipeline:
|
|
716
|
+
|
|
717
|
+
```ruby
|
|
718
|
+
Inkmark.to_markdown(source, options: {
|
|
719
|
+
# Markdown-native—safe to enable
|
|
720
|
+
emoji_shortcodes: true,
|
|
721
|
+
links: { allowed_schemes: %w[http https mailto], nofollow: false },
|
|
722
|
+
images: { lazy: false },
|
|
723
|
+
|
|
724
|
+
# HTML-emitting—turn off for clean Markdown output
|
|
725
|
+
syntax_highlight: false, # would embed <pre><span...> blocks
|
|
726
|
+
})
|
|
727
|
+
```
|
|
728
|
+
|
|
729
|
+
## Event handlers
|
|
730
|
+
|
|
731
|
+
Register handlers with `#on` to inspect or transform document elements as they
|
|
732
|
+
are parsed. Handlers fire **post-order**—children before parents—so when a
|
|
733
|
+
`:table` handler runs, its rows and cells are already available. `#on` returns `self`
|
|
734
|
+
for chaining.
|
|
735
|
+
|
|
736
|
+
```ruby
|
|
737
|
+
md = Inkmark.new(source)
|
|
738
|
+
|
|
739
|
+
md.on(:heading) { |h| ... }
|
|
740
|
+
.on(:image) { |img| ... }
|
|
741
|
+
.on(:link) { |l| ... }
|
|
742
|
+
```
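
Post-order dispatch can be pictured with a toy tree walker. The hash-based nodes and the `fire` helper are illustrative stand-ins, not Inkmark internals.

```ruby
# Toy model of post-order dispatch: children fire before their parent.
def fire(node, handlers, log)
  node[:children].each { |child| fire(child, handlers, log) }
  handler = handlers[node[:kind]]
  handler&.call(node, log)
end

tree = {
  kind: :table,
  children: [
    { kind: :table_row,
      children: [{ kind: :table_cell, children: [] },
                 { kind: :table_cell, children: [] }] }
  ]
}

log = []
record = ->(node, out) { out << node[:kind] }
fire(tree, { table: record, table_row: record, table_cell: record }, log)
# log => [:table_cell, :table_cell, :table_row, :table]
```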
|
|
743
|
+
|
|
744
|
+
Two entry points trigger handlers:
|
|
745
|
+
|
|
746
|
+
- **`#walk`**—fires handlers without producing HTML. Use it for analysis:
|
|
747
|
+
collecting specific elements, validating content, extracting structured data.
|
|
748
|
+
For built-in heading/link/image/word-count collection, see `statistics: true`.
|
|
749
|
+
- **`#to_html`**—fires handlers then renders. Mutations made inside a handler
|
|
750
|
+
change what ends up in the HTML.
|
|
751
|
+
|
|
752
|
+
### Collecting data with `#walk`
|
|
753
|
+
|
|
754
|
+
```ruby
|
|
755
|
+
# Check that every image has alt text
|
|
756
|
+
md = Inkmark.new(source)
|
|
757
|
+
missing_alt = []
|
|
758
|
+
md.on(:image) { |img| missing_alt << img.dest if img.text.empty? }
|
|
759
|
+
md.walk
|
|
760
|
+
raise "Images missing alt text: #{missing_alt.join(', ')}" if missing_alt.any?
|
|
761
|
+
```
|
|
762
|
+
|
|
763
|
+
```ruby
|
|
764
|
+
# Collect every fenced code block language used in the document
|
|
765
|
+
languages = Set.new
|
|
766
|
+
md.on(:code_block) { |c| languages << c.lang if c.lang && !c.lang.empty? }
|
|
767
|
+
md.walk
|
|
768
|
+
```
|
|
769
|
+
|
|
770
|
+
```ruby
|
|
771
|
+
# Validate that no link points to a deprecated domain
|
|
772
|
+
deprecated = /old-docs\.example\.com/
|
|
773
|
+
md.on(:link) { |l| warn "Deprecated link: #{l.dest}" if l.dest =~ deprecated }
|
|
774
|
+
md.walk
|
|
775
|
+
```
|
|
776
|
+
|
|
777
|
+
### Rewriting output with `#to_html`
|
|
778
|
+
|
|
779
|
+
#### Image CDN rewriting
|
|
780
|
+
|
|
781
|
+
Set `dest=` to redirect images to a CDN. The change is reflected in the
|
|
782
|
+
rendered `<img src>`:
|
|
783
|
+
|
|
784
|
+
```ruby
|
|
785
|
+
md = Inkmark.new(source)
|
|
786
|
+
md.on(:image) do |img|
|
|
787
|
+
img.dest = "https://cdn.example.net/#{File.basename(img.dest)}"
|
|
788
|
+
end
|
|
789
|
+
html = md.to_html
|
|
790
|
+
```
|
|
791
|
+
|
|
792
|
+
#### Rewriting link destinations
|
|
793
|
+
|
|
794
|
+
```ruby
|
|
795
|
+
md.on(:link) do |l|
|
|
796
|
+
if l.dest.start_with?("http")
|
|
797
|
+
l.html = %(<a href="#{l.dest}" target="_blank" rel="noopener">#{l.text}</a>)
|
|
798
|
+
end
|
|
799
|
+
end
|
|
800
|
+
```
|
|
801
|
+
|
|
802
|
+
#### Shifting heading levels
|
|
803
|
+
|
|
804
|
+
Bump every heading down one level so the document fits inside a layout that
|
|
805
|
+
reserves `<h1>` for the page title:
|
|
806
|
+
|
|
807
|
+
```ruby
|
|
808
|
+
md = Inkmark.new(source)
|
|
809
|
+
md.on(:heading) { |h| h.level = [h.level + 1, 6].min }
|
|
810
|
+
html = md.to_html
|
|
811
|
+
```
|
|
812
|
+
|
|
813
|
+
#### Custom code block rendering
|
|
814
|
+
|
|
815
|
+
Intercept fenced code blocks by language tag. Setting `html=` skips Inkmark's
|
|
816
|
+
default `<pre><code>` output—and the `syntax_highlight` filter, even if
|
|
817
|
+
enabled:
|
|
818
|
+
|
|
819
|
+
```ruby
|
|
820
|
+
md = Inkmark.new(source)
|
|
821
|
+
md.on(:code_block) do |c|
|
|
822
|
+
case c.lang
|
|
823
|
+
when "mermaid"
|
|
824
|
+
c.html = %(<div class="mermaid">#{c.text}</div>\n)
|
|
825
|
+
when "math"
|
|
826
|
+
c.html = %(<div class="math">\\[#{c.text}\\]</div>\n)
|
|
827
|
+
end
|
|
828
|
+
end
|
|
829
|
+
html = md.to_html
|
|
830
|
+
```
|
|
831
|
+
|
|
832
|
+
#### Custom directives in paragraphs
|
|
833
|
+
|
|
834
|
+
Match a special directive syntax and replace the paragraph with a component:
|
|
835
|
+
|
|
836
|
+
```ruby
|
|
837
|
+
# Markdown:
|
|
838
|
+
# @available_since rails=7.1 ruby=3.2
|
|
839
|
+
#
|
|
840
|
+
md.on(:paragraph) do |p|
|
|
841
|
+
next unless p.text =~ /\A@available_since\s+(.+)\z/
|
|
842
|
+
attrs = $1.scan(/(\w+)=(\S+)/).map { |k, v| %( #{k}="#{v}") }.join
|
|
843
|
+
p.html = %(<AvailableSince#{attrs} />\n)
|
|
844
|
+
end
|
|
845
|
+
```
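
Since `html=` takes a raw HTML string, values interpolated into it are presumably emitted as-is, so escaping attribute values is worth considering when the Markdown comes from untrusted authors. A minimal sketch, assuming a hypothetical `directive_attrs` helper and using the standard library's `ERB::Util.html_escape`:

```ruby
require "erb"

# Build escaped attribute text from "key=value" pairs.
def directive_attrs(spec)
  spec.scan(/(\w+)=(\S+)/).map do |key, value|
    %( #{key}="#{ERB::Util.html_escape(value)}")
  end.join
end

safe = directive_attrs("rails=7.1 ruby=3.2")
# => ' rails="7.1" ruby="3.2"'
hostile = directive_attrs('note="><script>')
# => ' note="&quot;&gt;&lt;script&gt;"'
```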
|
|
846
|
+
|
|
847
|
+
#### Replacing with Markdown
|
|
848
|
+
|
|
849
|
+
Use `markdown=` when the replacement is itself Markdown rather than raw HTML.
|
|
850
|
+
The replacement is parsed with the same options as the main document—emoji
|
|
851
|
+
expansion, heading IDs, raw HTML suppression—and is subject to the same
|
|
852
|
+
post-render filters (`syntax_highlight`, allowlists, `images: { lazy: true }`, `links: { nofollow: true }`).
|
|
853
|
+
Handlers do **not** fire on elements within the replacement.
|
|
854
|
+
`html=` takes priority when both are set on the same event.
|
|
855
|
+
|
|
856
|
+
```ruby
|
|
857
|
+
md = Inkmark.new(source)
|
|
858
|
+
md.on(:paragraph) do |p|
|
|
859
|
+
if p.text.start_with?("@note ")
|
|
860
|
+
body = p.text.sub(/\A@note /, "")
|
|
861
|
+
p.markdown = "> **Note:** #{body}"
|
|
862
|
+
end
|
|
863
|
+
end
|
|
864
|
+
html = md.to_html
|
|
865
|
+
```
|
|
866
|
+
|
|
867
|
+
#### Suppressing elements
|
|
868
|
+
|
|
869
|
+
Call `delete` on any event to omit it from the output. Children are suppressed
|
|
870
|
+
along with their parent:
|
|
871
|
+
|
|
872
|
+
```ruby
|
|
873
|
+
md.on(:image) { |img| img.delete } # all images
|
|
874
|
+
md.on(:heading) { |h| h.delete if h.text.start_with?("INTERNAL:") } # by content
|
|
875
|
+
```
|
|
876
|
+
|
|
877
|
+
#### Inline code annotation
|
|
878
|
+
|
|
879
|
+
`:code` fires for inline backtick spans. Use it to add links or decoration:
|
|
880
|
+
|
|
881
|
+
```ruby
|
|
882
|
+
md.on(:code) do |c|
|
|
883
|
+
if c.text =~ /\A[A-Z][A-Za-z]+#\w+\z/ # e.g. String#split
|
|
884
|
+
c.html = %(<a href="/api/#{c.text.tr('#', '/')}"><code>#{c.text}</code></a>)
|
|
885
|
+
end
|
|
886
|
+
end
|
|
887
|
+
```
|
|
888
|
+
|
|
889
|
+
### Children and tree context
|
|
890
|
+
|
|
891
|
+
Container elements expose their child events (lazy, cached):
|
|
892
|
+
|
|
893
|
+
```ruby
|
|
894
|
+
md.on(:table) do |t|
|
|
895
|
+
rows = t.children_of(:table_row)
|
|
896
|
+
rows.each_with_index do |row, i|
|
|
897
|
+
cells = row.children_of(:table_cell).map(&:text)
|
|
898
|
+
puts "Row #{i}: #{cells.join(' | ')}"
|
|
899
|
+
end
|
|
900
|
+
end
|
|
901
|
+
```
|
|
902
|
+
|
|
903
|
+
Use `parent_kind` and `ancestor_kinds` for context-sensitive decisions:
|
|
904
|
+
|
|
905
|
+
```ruby
|
|
906
|
+
# Skip decorative images that are already inside a link
|
|
907
|
+
md.on(:image) { |img| img.delete if img.ancestor_kinds.include?(:link) }
|
|
908
|
+
|
|
909
|
+
# Only process top-level paragraphs
|
|
910
|
+
md.on(:paragraph) { |p| puts p.text if p.parent_kind.nil? }
|
|
911
|
+
```
|
|
912
|
+
|
|
913
|
+
`depth` gives the nesting level (0 = top-level block):
|
|
914
|
+
|
|
915
|
+
```ruby
|
|
916
|
+
md.on(:blockquote) { |b| puts "blockquote at depth #{b.depth}" }
|
|
917
|
+
md.on(:paragraph) { |p| puts "paragraph at depth #{p.depth}" }
|
|
918
|
+
# A paragraph inside a blockquote has depth 1.
|
|
919
|
+
```
|
|
920
|
+
|
|
921
|
+
### Source byte ranges
|
|
922
|
+
|
|
923
|
+
`byte_range` is an exclusive Ruby Range (`start...end`) that lets you slice the
|
|
924
|
+
original source to recover the raw Markdown for any element:
|
|
925
|
+
|
|
926
|
+
```ruby
|
|
927
|
+
source = File.read("post.md")
|
|
928
|
+
md = Inkmark.new(source)
|
|
929
|
+
md.on(:heading) do |h|
|
|
930
|
+
  puts "#{h.byte_range}: #{source.byteslice(h.byte_range).inspect}"
|
|
931
|
+
end
|
|
932
|
+
md.walk
|
|
933
|
+
```
|
|
934
|
+
|
|
935
|
+
Populated for all container kinds and the leaf kinds `:code`, `:rule`,
|
|
936
|
+
`:inline_math`, `:display_math`. Returns `nil` for `:text`, `:soft_break`,
|
|
937
|
+
and `:hard_break`. Also `nil` for `:link` when `links: { autolink: true }` is enabled
|
|
938
|
+
(the autolink filter inserts new link events that would shift the offset queue).
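
Because the offsets are bytes, `String#byteslice` is the safer way to slice sources that contain non-ASCII text; `String#[]` counts characters instead. The standalone example below uses a hand-computed range rather than one returned by Inkmark.

```ruby
source = "# Café\n\nText\n"
range = 9...13 # byte offsets of "Text"; the "é" occupies two bytes in UTF-8

by_bytes = source.byteslice(range) # => "Text"
by_chars = source[range]           # => "ext\n" (character indexing drifts by one)
```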
|
|
939
|
+
|
|
940
|
+
### Event object reference
|
|
941
|
+
|
|
942
|
+
Every handler receives an `Inkmark::Event` with these fields and methods:
|
|
943
|
+
|
|
944
|
+
| Field / method | Type | Description |
|
|
945
|
+
|---|---|---|
|
|
946
|
+
| `kind` | `Symbol` | Element kind, e.g. `:heading`, `:image` |
|
|
947
|
+
| `text` | `String` | Plain text of all descendant text nodes |
|
|
948
|
+
| `depth` | `Integer` | Nesting depth; 0 = top-level block |
|
|
949
|
+
| `parent_kind` | `Symbol, nil` | Kind of the immediate parent, or `nil` at root |
|
|
950
|
+
| `ancestor_kinds` | `Array<Symbol>` | Ancestor kinds, nearest first |
|
|
951
|
+
| `byte_range` | `Range, nil` | Byte offsets in the original source string |
|
|
952
|
+
| `children` | `Array<Event>` | Direct child events (containers only) |
|
|
953
|
+
| `children_of(kind)` | `Array<Event>` | Children filtered by kind |
|
|
954
|
+
| `delete` |—| Suppress this element from output |
|
|
955
|
+
| `deleted?` | `Boolean` | True if `delete` was called |
|
|
956
|
+
| `html=` | `String, nil` | Replace output with a raw HTML string |
|
|
957
|
+
| `markdown=` | `String, nil` | Replace output by re-rendering a Markdown string |
|
|
958
|
+
| `dest=` | `String, nil` | Rewrite URL on `:link` / `:image` |
|
|
959
|
+
| `title=` | `String, nil` | Rewrite title attribute on `:link` / `:image` |
|
|
960
|
+
| `level=` | `Integer, nil` | Change heading level (1–6) on `:heading` |
|
|
961
|
+
| `id=` | `String, nil` | Change `id` attribute on `:heading` |
|
|
962
|
+
|
|
963
|
+
#### Per-kind field availability
|
|
964
|
+
|
|
965
|
+
**Container kinds** (the handler fires after all children are processed):
|
|
966
|
+
|
|
967
|
+
| Kind | Readable | Mutable |
|
|
968
|
+
|---|---|---|
|
|
969
|
+
| `:heading` | `text`, `level`, `id` | `level=`, `id=`, `html=`, `markdown=` |
|
|
970
|
+
| `:paragraph` | `text` | `html=`, `markdown=` |
|
|
971
|
+
| `:blockquote` | `text` | `html=`, `markdown=` |
|
|
972
|
+
| `:list` |—| `html=`, `markdown=` |
|
|
973
|
+
| `:ordered_list` |—| `html=`, `markdown=` |
|
|
974
|
+
| `:list_item` | `text` | `html=`, `markdown=` |
|
|
975
|
+
| `:code_block` | `text`, `lang` | `html=`, `markdown=` |
|
|
976
|
+
| `:table` |—| `html=`, `markdown=` |
|
|
977
|
+
| `:table_head` |—| `html=`, `markdown=` |
|
|
978
|
+
| `:table_row` | `text` | `html=`, `markdown=` |
|
|
979
|
+
| `:table_cell` | `text` | `html=`, `markdown=` |
|
|
980
|
+
| `:emphasis` | `text` | `html=`, `markdown=` |
|
|
981
|
+
| `:strong` | `text` | `html=`, `markdown=` |
|
|
982
|
+
| `:strikethrough` | `text` | `html=`, `markdown=` |
|
|
983
|
+
| `:link` | `text`, `dest`, `title` | `dest=`, `title=`, `html=`, `markdown=` |
|
|
984
|
+
| `:image` | `text` (alt), `dest`, `title` | `dest=`, `title=`, `html=`, `markdown=` |
|
|
985
|
+
| `:footnote_definition` | `text` | `html=`, `markdown=` |
|
|
986
|
+
|
|
987
|
+
**Leaf kinds** (no children; the handler fires on the event itself):
|
|
988
|
+
|
|
989
|
+
| Kind | Readable | Mutable |
|
|
990
|
+
|---|---|---|
|
|
991
|
+
| `:code` | `text` | `html=` |
|
|
992
|
+
| `:text` | `text` | `html=` |
|
|
993
|
+
| `:html` | `text` | `html=` |
|
|
994
|
+
| `:rule` |—| `html=` |
|
|
995
|
+
| `:soft_break` |—| `html=` |
|
|
996
|
+
| `:hard_break` |—| `html=` |
|
|
997
|
+
| `:footnote_reference` | `text` | `html=` |
|
|
998
|
+
|
|
999
|
+
All kinds expose `depth`, `parent_kind`, `ancestor_kinds`, `byte_range`,
|
|
1000
|
+
`children`, `children_of`, `delete`, `deleted?`.
|
|
1001
|
+
|
|
1002
|
+
On `:code_block`, `text` and `source` are identical; `source` is an alias for
|
|
1003
|
+
readability when treating the field as raw source code.
|
|
1004
|
+
|
|
1005
|
+
### Filter interaction
|
|
1006
|
+
|
|
1007
|
+
Enrichment filters run **before** handlers. Handlers always see:
|
|
1008
|
+
|
|
1009
|
+
- Emoji already resolved (`emoji_shortcodes: true`)—`h.text` contains `"🚀"`,
|
|
1010
|
+
not `":rocket:"`
|
|
1011
|
+
- Bare URLs already autolinked (`links: { autolink: true }`)—they appear as `:link` events
|
|
1012
|
+
- Heading `id` already set (`headings: { ids: true }`)—`h.id` is populated
|
|
1013
|
+
|
|
1014
|
+
Post-render filters (`syntax_highlight`, allowlists, `images: { lazy: true }`,
|
|
1015
|
+
`links: { nofollow: true }`) run **after** handlers:
|
|
1016
|
+
|
|
1017
|
+
- `:code_block` events are still `:code_block`, not opaque HTML, even when
|
|
1018
|
+
`syntax_highlight: true`—setting `html=` on a code block overrides the
|
|
1019
|
+
highlighter
|
|
1020
|
+
- Handler-set `dest=` values pass through host and scheme allowlists
|
|
1021
|
+
|
|
1022
|
+
## Benchmarks
|
|
1023
|
+
|
|
1024
|
+
Inkmark ships a benchmark harness comparing it against `kramdown`,
|
|
1025
|
+
`commonmarker`, `redcarpet`, `markly`, and `rdiscount` on a sweep of real
|
|
1026
|
+
markdown inputs.
|
|
1027
|
+
|
|
1028
|
+
Measuring apples to apples: every adapter is tuned for **feature parity** with
|
|
1029
|
+
Inkmark's defaults—CommonMark + core GFM (tables, strikethrough, tasklists,
|
|
1030
|
+
footnotes, tagfilter), no typographics, no autolink, no syntax highlighting,
|
|
1031
|
+
no heading-id slugging.
|
|
1032
|
+
|
|
1033
|
+
Run locally:
|
|
1034
|
+
|
|
1035
|
+
```bash
|
|
1036
|
+
bundle config set with benchmark
|
|
1037
|
+
bundle install
|
|
1038
|
+
bundle exec rake benchmark
|
|
1039
|
+
```
|
|
1040
|
+
|
|
1041
|
+
### Assets
|
|
1042
|
+
|
|
1043
|
+
| Asset | Size | What it exercises |
|
|
1044
|
+
|---|---:|---|
|
|
1045
|
+
| `commonmark-spec` | 201.3 KB | CommonMark spec—code-block-heavy, edge-case-heavy |
|
|
1046
|
+
| `commonmarker-readme` | 17.0 KB | Real-world commonmarker README—options tables, fenced code |
|
|
1047
|
+
| `redcarpet-readme` | 14.0 KB | Real-world redcarpet README—prose + code samples |
|
|
1048
|
+
| `redcarpet-benchmark` | 8.0 KB | Classic redcarpet bench corpus—heavy emphasis / inline parsing |
|
|
1049
|
+
| `large-4k` | 3.7 KB | dotenv README—mixed prose, code blocks, tables |
|
|
1050
|
+
| `medium-1k` | 1.0 KB | Faraday README header—images, badges, inline links |
|
|
1051
|
+
| `small-512b` | 0.5 KB | Short README section with headings and bullet lists |
|
|
1052
|
+
| `tiny-256b` | 0.3 KB | 3-line CommonMark snippet—parser setup/overhead-bound |
|
|
1053
|
+
|
|
1054
|
+
See `benchmarks/NOTICE` for attribution on the vendored test inputs.
|
|
1055
|
+
|
|
1056
|
+
### Results
|
|
1057
|
+
|
|
1058
|
+
Numbers below are from AWS EC2 `c7a.large` (AMD EPYC), Ruby 4.0.2 with YJIT on.
|
|
1059
|
+
Each engine uses its idiomatic "hot path"—Inkmark relies on its cached default
|
|
1060
|
+
options, Redcarpet reuses one pre-built `Markdown` object. Iterations per
|
|
1061
|
+
second, higher is better.
|
|
1062
|
+
|
|
1063
|
+
**`commonmark-spec` (201.3 KB)**
|
|
1064
|
+
```
|
|
1065
|
+
inkmark: 1,172 i/s
|
|
1066
|
+
redcarpet: 908 i/s - 1.29x slower
|
|
1067
|
+
markly: 453 i/s - 2.59x slower
|
|
1068
|
+
commonmarker: 345 i/s - 3.40x slower
|
|
1069
|
+
rdiscount: 212 i/s - 5.53x slower
|
|
1070
|
+
kramdown: 26 i/s - 45.08x slower
|
|
1071
|
+
```
|
|
1072
|
+
|
|
1073
|
+
**`commonmarker-readme` (17.0 KB)**
|
|
1074
|
+
```
|
|
1075
|
+
inkmark: 16,658 i/s
|
|
1076
|
+
redcarpet: 12,988 i/s - 1.28x slower
|
|
1077
|
+
commonmarker: 4,268 i/s - 3.90x slower
|
|
1078
|
+
markly: 3,974 i/s - 4.19x slower
|
|
1079
|
+
rdiscount: 2,676 i/s - 6.22x slower
|
|
1080
|
+
kramdown: 113 i/s - 147.42x slower
|
|
1081
|
+
```
|
|
1082
|
+
|
|
1083
|
+
**`redcarpet-readme` (14.0 KB)**
|
|
1084
|
+
```
|
|
1085
|
+
inkmark: 17,343 i/s
|
|
1086
|
+
redcarpet: 13,587 i/s - 1.28x slower
|
|
1087
|
+
markly: 5,455 i/s - 3.18x slower
|
|
1088
|
+
commonmarker: 4,890 i/s - 3.55x slower
|
|
1089
|
+
rdiscount: 3,336 i/s - 5.20x slower
|
|
1090
|
+
kramdown: 208 i/s - 83.38x slower
|
|
1091
|
+
```
|
|
1092
|
+
|
|
1093
|
+
**`redcarpet-benchmark` (8.0 KB)**
|
|
1094
|
+
```
|
|
1095
|
+
inkmark: 27,634 i/s
|
|
1096
|
+
redcarpet: 23,777 i/s - 1.16x slower
|
|
1097
|
+
markly: 9,346 i/s - 2.96x slower
|
|
1098
|
+
commonmarker: 7,805 i/s - 3.54x slower
|
|
1099
|
+
rdiscount: 6,201 i/s - 4.46x slower
|
|
1100
|
+
kramdown: 367 i/s - 75.30x slower
|
|
1101
|
+
```
|
|
1102
|
+
|
|
1103
|
+
**`large-4k` (3.7 KB)**
|
|
1104
|
+
```
|
|
1105
|
+
inkmark: 64,051 i/s
|
|
1106
|
+
redcarpet: 58,420 i/s - 1.10x slower
|
|
1107
|
+
markly: 22,500 i/s - 2.85x slower
|
|
1108
|
+
commonmarker: 18,053 i/s - 3.55x slower
|
|
1109
|
+
rdiscount: 13,839 i/s - 4.63x slower
|
|
1110
|
+
kramdown: 624 i/s - 102.64x slower
|
|
1111
|
+
```
|
|
1112
|
+
|
|
1113
|
+
**`medium-1k` (1.0 KB)**
|
|
1114
|
+
```
|
|
1115
|
+
redcarpet: 216,968 i/s
|
|
1116
|
+
inkmark: 213,478 i/s - 1.02x slower
|
|
1117
|
+
markly: 70,251 i/s - 3.09x slower
|
|
1118
|
+
commonmarker: 46,357 i/s - 4.68x slower
|
|
1119
|
+
rdiscount: 45,880 i/s - 4.73x slower
|
|
1120
|
+
kramdown: 2,813 i/s - 77.13x slower
|
|
1121
|
+
```
|
|
1122
|
+
|
|
1123
|
+
**`small-512b` (0.5 KB)**
|
|
1124
|
+
```
|
|
1125
|
+
inkmark: 388,266 i/s
|
|
1126
|
+
redcarpet: 368,401 i/s - 1.05x slower
|
|
1127
|
+
rdiscount: 74,032 i/s - 5.24x slower
|
|
1128
|
+
markly: 61,175 i/s - 6.35x slower
|
|
1129
|
+
commonmarker: 46,658 i/s - 8.32x slower
|
|
1130
|
+
kramdown: 3,952 i/s - 98.25x slower
|
|
1131
|
+
```
|
|
1132
|
+
|
|
1133
|
+
**`tiny-256b` (0.3 KB)**
|
|
1134
|
+
```
|
|
1135
|
+
redcarpet: 535,972 i/s
|
|
1136
|
+
inkmark: 511,019 i/s - 1.05x slower
|
|
1137
|
+
rdiscount: 99,001 i/s - 5.41x slower
|
|
1138
|
+
markly: 96,159 i/s - 5.57x slower
|
|
1139
|
+
commonmarker: 57,704 i/s - 9.29x slower
|
|
1140
|
+
kramdown: 4,117 i/s - 130.18x slower
|
|
1141
|
+
```
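
The "x slower" factors read as the fastest engine's i/s divided by each engine's i/s. As a quick check, the `commonmark-spec` figures can be recomputed from the rounded numbers listed above; a harness working from unrounded measurements could differ in the last digit.

```ruby
# i/s figures copied from the commonmark-spec results above
results = {
  "inkmark" => 1172, "redcarpet" => 908, "markly" => 453,
  "commonmarker" => 345, "rdiscount" => 212, "kramdown" => 26
}

fastest = results.values.max
slowdowns = results.transform_values { |ips| (fastest.to_f / ips).round(2) }
# slowdowns["redcarpet"] => 1.29
# slowdowns["kramdown"]  => 45.08
```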
|
|
1142
|
+
|
|
1143
|
+
## Contributing
|
|
1144
|
+
|
|
1145
|
+
Bug reports and pull requests are welcome on GitHub at
|
|
1146
|
+
https://github.com/yaroslav/inkmark.
|
|
1147
|
+
|
|
1148
|
+
## Acknowledgements
|
|
1149
|
+
|
|
1150
|
+
Inkmark is built with:
|
|
1151
|
+
|
|
1152
|
+
[pulldown-cmark](https://github.com/pulldown-cmark/pulldown-cmark) by Raph Levien, Marcus Klaas de Vries, Martín Pozo, Michael Howell, Roope Salmi and Martin Geisler;
|
|
1153
|
+
|
|
1154
|
+
[Magnus](https://github.com/matsadler/magnus) by Matthew Sadler;
|
|
1155
|
+
|
|
1156
|
+
[syntect](https://github.com/trishume/syntect) by Tristan Hume, Keith Hall, Google Inc and other contributors;
|
|
1157
|
+
|
|
1158
|
+
And other Rust crates—thanks to their authors.
|
|
1159
|
+
|
|
1160
|
+
Thanks to Julik Tarkhanov for short but useful brainstorming sessions.
|
|
1161
|
+
|
|
1162
|
+
## License
|
|
1163
|
+
|
|
1164
|
+
The gem is available as open source under the terms of the
|
|
1165
|
+
[MIT License](LICENSE.txt). Third-party content (benchmark assets, CommonMark
|
|
1166
|
+
spec) is attributed in `NOTICE` and `benchmarks/NOTICE`.
|