inkmark 0.1.0-x86_64-linux → 0.1.2-x86_64-linux
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -0
- data/README.md +26 -16
- data/lib/inkmark/3.3/inkmark.so +0 -0
- data/lib/inkmark/3.4/inkmark.so +0 -0
- data/lib/inkmark/4.0/inkmark.so +0 -0
- data/lib/inkmark/version.rb +1 -1
- data/lib/inkmark.rb +7 -1
- metadata +4 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bad7213c3988c5b14773d0f35819253a35ac7c5feed26a05b61567705a5f211d
+  data.tar.gz: a2b0dd76afc2d15b811ac318c26d40f96e571e1e6fdbfdda2bdee99c7d764d69
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b8b010daac54b71e9b66f35a08787f4fa42421d002de4edd01b9317ff97ac9f882a7d66b56deb3a2404828c3ba9e7565676e19c8d304f73dee6c77d1662e9631
+  data.tar.gz: 220d5715c270a05adf091c9eb0745024db9a08862f5d7f2284640d3db994b686e9381bd7fa68d01f7ca8da0dfeb29d55062ed99dd082842e3d419d8128f25fb3
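Reviewer note: the digests recorded above can be checked against the archive members they name. The following is a minimal sketch of that comparison using Ruby's standard `Digest` library. The helper name `checksum_mismatches` and the in-memory payloads are illustrative assumptions, not part of the gem or of RubyGems; a real check would read `metadata.gz` and `data.tar.gz` out of the unpacked `.gem` archive.

```ruby
require "digest"

# Hypothetical helper: returns the names whose SHA256 digest does not
# match the expected hex string. `expected` mirrors the shape of the
# SHA256 section in checksums.yaml (file name => hex digest).
def checksum_mismatches(files, expected)
  expected.reject do |name, hex|
    files.key?(name) && Digest::SHA256.hexdigest(files[name]) == hex
  end.keys
end

# Stand-in payloads (assumption); real input would be the raw bytes of
# the archive members.
files = {
  "metadata.gz" => "example metadata bytes",
  "data.tar.gz" => "example data bytes"
}
expected = files.transform_values { |bytes| Digest::SHA256.hexdigest(bytes) }

# Untouched payloads produce no mismatches.
puts checksum_mismatches(files, expected).inspect

# Changing one payload is reported by name.
tampered = files.merge("data.tar.gz" => "tampered bytes")
puts checksum_mismatches(tampered, expected).inspect
```

The same comparison works for the SHA512 section by swapping in `Digest::SHA512`.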
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,12 @@
+## [0.1.2] - 2026-06-21
+
+- Fix `Inkmark.truncate_markdown` raising `TypeError` when called without explicit `options:`.
+- Update dependencies on the Rust side.
+
+## [0.1.1] - 2026-04-22
+
+- Strip DWARF debug info from shipped Linux and Windows binaries via `strip = "debuginfo"`.
+
 ## [0.1.0] - 2026-04-22

 - Initial public release
data/README.md
CHANGED
@@ -1,6 +1,6 @@
 # Inkmark

-A very fast, feature-packed, AI-first
+A very fast, feature-packed, AI-first Markdown gem for Ruby.

 [](https://github.com/yaroslav/inkmark/releases)
 [](https://rubydoc.info/gems/inkmark)
@@ -12,12 +12,17 @@ A very fast, feature-packed, AI-first markdown gem for Ruby.
 - **Very fast**. Up to 1.3× faster than redcarpet _(not CommonMark-conformant)_, about 3×–9× faster than other Ruby Markdown gems with native extensions. Built with Rust, based on [pulldown-cmark](https://github.com/pulldown-cmark/pulldown-cmark), uses SIMD.
 - **No surprises**. CommonMark + GitHub Flavored Markdown conformance.
 - **"Batteries included" approach**. Build lots of useful features, make them easy to use and as fast as possible.
-- **Easy to use**. As simple as a one-method API. Pass options inline as a hash, set them one by one, or set default options for the entire application.
-- **Feature-packed**. Server-side syntax highlighting with themes, frontmatter support, table of contents in Markdown and HTML, plain text export, extraction of
+- **Easy to use**. As simple as a one-method API. Pass options inline as a hash, set them one by one, or set default options for the entire application.
+- **Feature-packed**. Server-side syntax highlighting with themes, frontmatter support, table of contents in Markdown and HTML, plain text export, extraction of headings/links/images, statistics (character and word count, likely document language, blocks count), lazy image loading attributes, emoji shortcodes, autolinks, heading IDs with Unicode-transliterated slugs, wikilinks, footnotes, tables, task lists, smart punctuation, hard wraps, "nofollow/noopener" on external links.
 - **AI-first**. Two chunking primitives: heading-based with breadcrumbs and per-chunk character/word counts, and sliding-window with overlap for size-bounded chunks where headings are absent or uneven. Block-aware or word-aware truncation for context-window budgeting. Markdown-to-Markdown pipeline. Plain-text extraction for embedding models. Structured extraction of headings, images, links, code blocks—each carrying byte ranges back into the source.
 - **Security conscious**. Raw HTML denied by default. Hostname and URL-scheme allowlists for both links and images. GFM tagfilter for dangerous tags. A Rust-backed gem.
 - **Easy extension API**. Hook any element with a Ruby block—no subclassing, no intermediate AST, no HTML post-processing. Rewrite URLs, swap code blocks for your own renderer, drop subtrees, or just walk the document for analysis. Handlers fire inside the single-pass parser, so extension costs essentially nothing beyond the render itself—and far less than regexing over output HTML.

+**See the introductory post for background and motivation**:
+
+**[Inkmark: a very fast, feature-packed, AI-first Markdown gem for Ruby
+](https://yaroslav.io/posts/inkmark-fast-ai-first-markdown)**
+
 ## Contents

 - [Installation](#installation)
@@ -50,6 +55,8 @@ gem "inkmark"

 Ruby 3.3+ is supported.

+The gem comes precompiled, a compiler toolchain is _not_ required for installation.
+
 ## Quick start

 ```ruby
@@ -91,7 +98,7 @@ for nested element-policy hashes). You can—and are recommended to!—override
 - **`:trusted`**: `:recommended` plus raw HTML pass-through.
 **Dangerous.** Intended only for content you fully trust: internal,
 team-authored. With raw HTML on, Inkmark does no sanitization beyond
-the narrow GFM tagfilter (turn it off
+the narrow GFM tagfilter (turn it off at your own risk); the caller is
 responsible for output safety. Do not apply this preset to anything a user can influence, directly or indirectly.

 - **`:gfm`**: the bare default. CommonMark plus the core GFM extensions
@@ -129,7 +136,7 @@ Inkmark.to_html(internal_doc, options: { preset: :trusted })
 ## Options

 GFM extensions are on by default; raw HTML rendering is off by default.
-Pass a hash to `Inkmark.to_html` / `Inkmark.new`, or mutate
+Pass a hash to `Inkmark.to_html` / `Inkmark.new`, or mutate an `Inkmark::Options`
 instance via its accessors.

 | Key | Default | Description |
@@ -158,7 +165,7 @@ instance via its accessors.
 | `wikilinks` | `false` | `[[Page]]` and `[[Page\|label]]` render as links. |
 | `frontmatter` | `false` | Frontmatter (YAML metadata at the start of the document). Parsed and exposed via `Inkmark#frontmatter`; the block is stripped from rendered output. |

-Options can be supplied
+Options can be supplied in several ways:

 ```ruby
 # As a hash at construction
@@ -185,7 +192,7 @@ Inkmark.new("x", options: { taples: true })

 ## Raw HTML

-Raw HTML is suppressed by default. This is safe-by-default for rendering untrusted
+Raw HTML is suppressed by default. This is safe-by-default for rendering untrusted Markdown:

 ```ruby
 Inkmark.to_html("<script>alert(1)</script>")
@@ -248,7 +255,7 @@ relative URLs.

 ## URL scheme filtering

-For rendering untrusted
+For rendering untrusted Markdown, opt in to scheme allowlists to block
 `javascript:`, `data:`, and other dangerous URL schemes in links and
 images:

@@ -274,7 +281,7 @@ Inkmark.to_html(">)",
 # => "<p>pic</p>\n" # dropped to alt text
 ```

-**Scope:** scheme filtering applies to
+**Scope:** scheme filtering applies to Markdown-emitted links and images
 (`[text](url)` / ``). Raw HTML `<a href>` / `<img src>` inside
 `raw_html: true` content is *not* filtered—for that case use a
 downstream HTML sanitizer like Loofah.
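Reviewer note: for readers unfamiliar with scheme allowlisting, the idea behind the README section touched above can be sketched with Ruby's standard `URI` library. This is not Inkmark's implementation (which lives on the Rust side). The method name `allowed_url?`, the decision to let scheme-less relative URLs through, and the decision to reject unparseable URLs are all assumptions made for illustration.

```ruby
require "uri"

# Same scheme list as the README's allowed_schemes example.
ALLOWED_SCHEMES = %w[http https mailto].freeze

# Hypothetical check: a URL passes when it has no scheme (relative URL,
# an assumption) or when its scheme is on the allowlist. URLs that the
# URI parser rejects are treated as not allowed.
def allowed_url?(url, allowed = ALLOWED_SCHEMES)
  scheme = URI.parse(url).scheme
  scheme.nil? || allowed.include?(scheme.downcase)
rescue URI::InvalidURIError
  false
end

[
  "https://example.com/a",
  "mailto:me@example.com",
  "/docs/page",
  "javascript:alert(1)"
].each do |url|
  puts "#{url}: #{allowed_url?(url) ? 'kept' : 'dropped'}"
end
```

As the Scope paragraph notes, a check of this kind only covers URLs the Markdown renderer emits itself; raw HTML attributes need a separate sanitizer.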
@@ -394,7 +401,7 @@ parent. Root-level sections and the preamble have an empty array. Skipped
 levels are omitted, so an `###` directly under an `#` has `breadcrumb:
 ["Top"]`, not `["Top", nil]`. RAG pipelines typically prepend the
 breadcrumb to each chunk before embedding—it gives the vector model a
-cheap signal about the chunk's place in the document
+cheap signal about the chunk's place in the document.

 Enable `statistics: true` to add `:character_count` and `:word_count` to
 every section entry. Counts reflect the section's filter-applied text
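Reviewer note: the breadcrumb-prepending step mentioned in the hunk above can be sketched in plain Ruby. The section hashes below are hand-written stand-ins. Only the `breadcrumb` key is named in the README text shown here; the `:heading` and `:content` keys, and the helper name `embedding_input`, are assumptions for illustration.

```ruby
# Hand-written sections in an assumed shape; real entries would come
# from the chunking call described in the README.
sections = [
  { breadcrumb: [], heading: "Guide", content: "Overview text." },
  { breadcrumb: ["Guide"], heading: "Install", content: "Run the installer." },
  { breadcrumb: ["Guide", "Install"], heading: "Linux", content: "Use the package." }
]

# Hypothetical helper: joins the breadcrumb and the section's own
# heading into one trail line and puts it in front of the chunk text.
def embedding_input(section)
  trail = (section[:breadcrumb] + [section[:heading]]).join(" > ")
  "#{trail}\n\n#{section[:content]}"
end

inputs = sections.map { |section| embedding_input(section) }
puts inputs.last
```

A root-level section has an empty breadcrumb, so its trail is just its own heading.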
@@ -460,7 +467,7 @@ embedding quality for downstream semantic search.
 sections = Inkmark.chunks_by_heading(doc, options: {
   emoji_shortcodes: true, # keep—improves semantic signal
   links: {
-    autolink: true, # keep—proper anchor
+    autolink: true, # keep—proper anchor Markdown
     allowed_schemes: %w[http https mailto], # keep—safe URLs
     nofollow: false # off—would embed <a rel=...> HTML
   },
@@ -907,7 +914,10 @@ Use `parent_kind` and `ancestor_kinds` for context-sensitive decisions:
 md.on(:image) { |img| img.delete if img.ancestor_kinds.include?(:link) }

 # Only process top-level paragraphs
-md.on(:paragraph)
+md.on(:paragraph) do |p|
+  next unless p.parent_kind.nil?
+  # ... only top-level paragraphs reach here
+end
 ```

 `depth` gives the nesting level (0 = top-level block):
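Reviewer note: the corrected README example relies on `next` to leave a block early. The sketch below shows that control flow in isolation, using a hypothetical `Event` struct that models only `kind`, `parent_kind`, and `text`; it does not call Inkmark and makes no claim about the real event class beyond what the README lines above show.

```ruby
# Hypothetical stand-in for the handler's event object.
Event = Struct.new(:kind, :parent_kind, :text)

events = [
  Event.new(:paragraph, nil, "Intro"),
  Event.new(:paragraph, :blockquote, "Quoted"),
  Event.new(:paragraph, :list_item, "Item"),
  Event.new(:paragraph, nil, "Outro")
]

top_level = []
handler = proc do |p|
  # `next` ends this invocation of the block, so nested paragraphs
  # never reach the line below.
  next unless p.parent_kind.nil?
  top_level << p.text
end

events.each(&handler)
puts top_level.inspect
```

Using `next` as a guard keeps the handler body flat instead of wrapping it in an `if`.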
@@ -927,7 +937,7 @@ original source to recover the raw Markdown for any element:
 source = File.read("post.md")
 md = Inkmark.new(source)
 md.on(:heading) do |h|
-  puts "#{h.byte_range}: #{source
+  puts "#{h.byte_range}: #{source.byteslice(h.byte_range).inspect}"
 end
 md.walk
 ```
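Reviewer note: the README line above uses `byteslice` rather than `source[...]`, which matters once the source contains multibyte characters. The sketch below illustrates the difference with a hand-computed byte range; in the README example the range would come from `h.byte_range`. The sample text and variable names are assumptions for illustration.

```ruby
# Byte offsets and character offsets diverge as soon as the text
# contains multibyte characters.
source = "Привет\n# Заголовок\n"

# Hand-computed byte range covering the heading line (assumption: a
# real range would be supplied by the parser).
start = "Привет\n".bytesize
finish = start + "# Заголовок".bytesize
byte_range = start...finish

by_bytes = source.byteslice(byte_range) # interprets the range as bytes
by_chars = source[byte_range]           # interprets the range as characters

puts by_bytes.inspect
puts by_chars.inspect
```

Only the `byteslice` result recovers the heading line; indexing by characters with the same numbers lands further into the string.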
@@ -939,7 +949,7 @@ and `:hard_break`. Also `nil` for `:link` when `links: { autolink: true }` is en

 ### Event object reference

-Every handler receives
+Every handler receives an `Inkmark::Event` with these fields and methods:

 | Field / method | Type | Description |
 |---|---|---|
@@ -972,7 +982,7 @@ Every handler receives a `Inkmark::Event` with these fields and methods:
 | `:list` |—| `html=`, `markdown=` |
 | `:ordered_list` |—| `html=`, `markdown=` |
 | `:list_item` | `text` | `html=`, `markdown=` |
-| `:code_block` | `text
+| `:code_block` | `text` (alias `source`), `lang` | `html=`, `markdown=` |
 | `:table` |—| `html=`, `markdown=` |
 | `:table_head` |—| `html=`, `markdown=` |
 | `:table_row` | `text` | `html=`, `markdown=` |
@@ -1023,7 +1033,7 @@ Post-render filters (`syntax_highlight`, allowlists, `images: { lazy: true }`,

 Inkmark ships a benchmark harness comparing it against `kramdown`,
 `commonmarker`, `redcarpet`, `markly`, and `rdiscount` on a sweep of real
-
+Markdown inputs.

 Measuring apples to apples: every adapter is tuned for **feature parity** with
 Inkmark's defaults—CommonMark + core GFM (tables, strikethrough, tasklists,
data/lib/inkmark/3.3/inkmark.so
CHANGED
|
Binary file
|
data/lib/inkmark/3.4/inkmark.so
CHANGED
|
Binary file
|
data/lib/inkmark/4.0/inkmark.so
CHANGED
|
Binary file
|
data/lib/inkmark/version.rb
CHANGED
data/lib/inkmark.rb
CHANGED
@@ -222,7 +222,13 @@ class Inkmark
   params = normalize_truncate_params(
     chars: chars, words: words, at: at, marker: marker
   )
-
+  # truncate's native binding requires an options Hash; unlike the
+  # to_html/to_plain_text bindings it has no nil fast path, so fall
+  # back to the default options hash when the resolver returns nil.
+  _native_truncate_markdown(
+    source, params,
+    resolve_frozen_options(options) || default_options.to_native_hash_frozen
+  )
 end

 # Render +source+ through the filter pipeline and serialize to plain
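Reviewer note: the fix above is the `resolved || default` fallback idiom. The sketch below reproduces the failure mode and the fallback in isolation. Every name in it (`strict_native_call`, `resolve_options`, `truncate_sketch`, `DEFAULT_OPTIONS`) is a hypothetical stand-in; none of them is Inkmark's actual code, and the real native binding is not called.

```ruby
# Stand-in for a native binding that insists on a Hash and has no nil
# fast path (assumption modeled on the comment in the hunk above).
def strict_native_call(source, options)
  raise TypeError, "options must be a Hash" unless options.is_a?(Hash)
  "#{source} (#{options.size} options)"
end

DEFAULT_OPTIONS = { tables: true }.freeze

# Hypothetical resolver: returns nil when the caller passed no options,
# which is the case the fallback has to cover.
def resolve_options(options)
  options&.dup&.freeze
end

# With the `||` fallback, a nil result from the resolver is replaced by
# the default options before reaching the strict call.
def truncate_sketch(source, options: nil)
  strict_native_call(source, resolve_options(options) || DEFAULT_OPTIONS)
end

puts truncate_sketch("text")
puts truncate_sketch("text", options: { tables: false, footnotes: true })
```

Without the fallback, the first call would pass `nil` straight through and raise `TypeError`, which matches the symptom described in the 0.1.2 changelog entry.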
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: inkmark
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.2
 platform: x86_64-linux
 authors:
 - Yaroslav Markin
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2026-
+date: 2026-06-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -122,7 +122,7 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-description: A very fast, feature-packed, AI-first
+description: A very fast, feature-packed, AI-first Markdown (CommonMark/GFM) gem for
   Ruby, based on pulldown-cmark (Rust).
 email:
 - yaroslav@markin.net
@@ -174,5 +174,5 @@ requirements: []
 rubygems_version: 3.5.23
 signing_key:
 specification_version: 4
-summary: Very fast, feature-packed, AI-first
+summary: Very fast, feature-packed, AI-first Markdown gem for Ruby.
 test_files: []