inkmark 0.1.0-x86_64-linux-musl → 0.1.1-x86_64-linux-musl
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +19 -16
- data/lib/inkmark/3.3/inkmark.so +0 -0
- data/lib/inkmark/3.4/inkmark.so +0 -0
- data/lib/inkmark/4.0/inkmark.so +0 -0
- data/lib/inkmark/version.rb +1 -1
- metadata +3 -3
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 5ac2199d5255de3998135d1253777ba4106ce66be866ec9c7bb62507da71c105
|
|
4
|
+
data.tar.gz: 42318d013db96d0df93acccfc420fd16b881163dde439341c1dac0dabfda1f92
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: fba875810b4a47b6b8ffd8478d549d0b0d394e039f854e66dfca5b4ea29afff6d529e499ad661efb89694a120b17066232f70f7ed93b59056ac4be9bffcea01c
|
|
7
|
+
data.tar.gz: dbb77b0035fd98a3e4105b953efc3c7193e8faaed9254312952cd32d0a6942738f8e5d7e20fb591afd326b38d8dbee6afa0d9d9994d9a7da94e29b8219bde3ca
|
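The SHA256 entries in `checksums.yaml` are hex digests of the package's `metadata.gz` and `data.tar.gz` members. As a rough sketch of how such a digest could be checked with Ruby's standard library: the helper name `sha256_matches?` and the throwaway demo file are assumptions made for illustration, not part of Inkmark or RubyGems.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when the file's SHA256 hex digest equals the
# expected value (compared case-insensitively).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Demo on a throwaway file rather than a real data.tar.gz.
# The expected value is the well-known SHA256 digest of the string "abc".
Tempfile.create("payload") do |f|
  f.write("abc")
  f.flush
  puts sha256_matches?(f.path, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
end
```

To check a real package, the same helper would be pointed at the extracted `data.tar.gz` and given the value listed under `SHA256:`.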
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Inkmark
|
|
2
2
|
|
|
3
|
-
A very fast, feature-packed, AI-first
|
|
3
|
+
A very fast, feature-packed, AI-first Markdown gem for Ruby.
|
|
4
4
|
|
|
5
5
|
[](https://github.com/yaroslav/inkmark/releases)
|
|
6
6
|
[](https://rubydoc.info/gems/inkmark)
|
|
@@ -12,8 +12,8 @@ A very fast, feature-packed, AI-first markdown gem for Ruby.
|
|
|
12
12
|
- **Very fast**. Up to 1.3× faster than redcarpet _(not CommonMark-conformant)_, about 3×–9× faster than other Ruby Markdown gems with native extensions. Built with Rust, based on [pulldown-cmark](https://github.com/pulldown-cmark/pulldown-cmark), uses SIMD.
|
|
13
13
|
- **No surprises**. CommonMark + GitHub Flavored Markdown conformance.
|
|
14
14
|
- **"Batteries included" approach**. Build lots of useful features, make them easy to use and as fast as possible.
|
|
15
|
-
- **Easy to use**. As simple as a one-method API. Pass options inline as a hash, set them one by one, or set default options for the entire application.
|
|
16
|
-
- **Feature-packed**. Server-side syntax highlighting with themes, frontmatter support, table of contents in Markdown and HTML, plain text export, extraction of
|
|
15
|
+
- **Easy to use**. As simple as a one-method API. Pass options inline as a hash, set them one by one, or set default options for the entire application.
|
|
16
|
+
- **Feature-packed**. Server-side syntax highlighting with themes, frontmatter support, table of contents in Markdown and HTML, plain text export, extraction of headings/links/images, statistics (character and word count, likely document language, blocks count), lazy image loading attributes, emoji shortcodes, autolinks, heading IDs with Unicode-transliterated slugs, wikilinks, footnotes, tables, task lists, smart punctuation, hard wraps, "nofollow/noopener" on external links.
|
|
17
17
|
- **AI-first**. Two chunking primitives: heading-based with breadcrumbs and per-chunk character/word counts, and sliding-window with overlap for size-bounded chunks where headings are absent or uneven. Block-aware or word-aware truncation for context-window budgeting. Markdown-to-Markdown pipeline. Plain-text extraction for embedding models. Structured extraction of headings, images, links, code blocks—each carrying byte ranges back into the source.
|
|
18
18
|
- **Security conscious**. Raw HTML denied by default. Hostname and URL-scheme allowlists for both links and images. GFM tagfilter for dangerous tags. A Rust-backed gem.
|
|
19
19
|
- **Easy extension API**. Hook any element with a Ruby block—no subclassing, no intermediate AST, no HTML post-processing. Rewrite URLs, swap code blocks for your own renderer, drop subtrees, or just walk the document for analysis. Handlers fire inside the single-pass parser, so extension costs essentially nothing beyond the render itself—and far less than regexing over output HTML.
|
|
@@ -91,7 +91,7 @@ for nested element-policy hashes). You can—and are recommended to!—override
|
|
|
91
91
|
- **`:trusted`**: `:recommended` plus raw HTML pass-through.
|
|
92
92
|
**Dangerous.** Intended only for content you fully trust: internal,
|
|
93
93
|
team-authored. With raw HTML on, Inkmark does no sanitization beyond
|
|
94
|
-
the narrow GFM tagfilter (turn it off
|
|
94
|
+
the narrow GFM tagfilter (turn it off at your own risk); the caller is
|
|
95
95
|
responsible for output safety. Do not apply this preset to anything a user can influence, directly or indirectly.
|
|
96
96
|
|
|
97
97
|
- **`:gfm`**: the bare default. CommonMark plus the core GFM extensions
|
|
@@ -129,7 +129,7 @@ Inkmark.to_html(internal_doc, options: { preset: :trusted })
|
|
|
129
129
|
## Options
|
|
130
130
|
|
|
131
131
|
GFM extensions are on by default; raw HTML rendering is off by default.
|
|
132
|
-
Pass a hash to `Inkmark.to_html` / `Inkmark.new`, or mutate
|
|
132
|
+
Pass a hash to `Inkmark.to_html` / `Inkmark.new`, or mutate an `Inkmark::Options`
|
|
133
133
|
instance via its accessors.
|
|
134
134
|
|
|
135
135
|
| Key | Default | Description |
|
|
@@ -158,7 +158,7 @@ instance via its accessors.
|
|
|
158
158
|
| `wikilinks` | `false` | `[[Page]]` and `[[Page\|label]]` render as links. |
|
|
159
159
|
| `frontmatter` | `false` | Frontmatter (YAML metadata at the start of the document). Parsed and exposed via `Inkmark#frontmatter`; the block is stripped from rendered output. |
|
|
160
160
|
|
|
161
|
-
Options can be supplied
|
|
161
|
+
Options can be supplied in several ways:
|
|
162
162
|
|
|
163
163
|
```ruby
|
|
164
164
|
# As a hash at construction
|
|
@@ -185,7 +185,7 @@ Inkmark.new("x", options: { taples: true })
|
|
|
185
185
|
|
|
186
186
|
## Raw HTML
|
|
187
187
|
|
|
188
|
-
Raw HTML is suppressed by default. This is safe-by-default for rendering untrusted
|
|
188
|
+
Raw HTML is suppressed by default. This is safe-by-default for rendering untrusted Markdown:
|
|
189
189
|
|
|
190
190
|
```ruby
|
|
191
191
|
Inkmark.to_html("<script>alert(1)</script>")
|
|
@@ -248,7 +248,7 @@ relative URLs.
|
|
|
248
248
|
|
|
249
249
|
## URL scheme filtering
|
|
250
250
|
|
|
251
|
-
For rendering untrusted
|
|
251
|
+
For rendering untrusted Markdown, opt in to scheme allowlists to block
|
|
252
252
|
`javascript:`, `data:`, and other dangerous URL schemes in links and
|
|
253
253
|
images:
|
|
254
254
|
|
|
@@ -274,7 +274,7 @@ Inkmark.to_html(">)",
|
|
|
274
274
|
# => "<p>pic</p>\n" # dropped to alt text
|
|
275
275
|
```
|
|
276
276
|
|
|
277
|
-
**Scope:** scheme filtering applies to
|
|
277
|
+
**Scope:** scheme filtering applies to Markdown-emitted links and images
|
|
278
278
|
(`[text](url)` / `![alt](url)`). Raw HTML `<a href>` / `<img src>` inside
|
|
279
279
|
`raw_html: true` content is *not* filtered—for that case use a
|
|
280
280
|
downstream HTML sanitizer like Loofah.
|
|
@@ -394,7 +394,7 @@ parent. Root-level sections and the preamble have an empty array. Skipped
|
|
|
394
394
|
levels are omitted, so an `###` directly under an `#` has `breadcrumb:
|
|
395
395
|
["Top"]`, not `["Top", nil]`. RAG pipelines typically prepend the
|
|
396
396
|
breadcrumb to each chunk before embedding—it gives the vector model a
|
|
397
|
-
cheap signal about the chunk's place in the document
|
|
397
|
+
cheap signal about the chunk's place in the document.
|
|
398
398
|
|
|
399
399
|
Enable `statistics: true` to add `:character_count` and `:word_count` to
|
|
400
400
|
every section entry. Counts reflect the section's filter-applied text
|
|
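The breadcrumb-prepending step described in this hunk could look roughly like the sketch below. It works on plain hashes rather than real Inkmark output: only the `:breadcrumb` key comes from the README text, while the `:heading` and `:text` keys and the `with_breadcrumb` helper are assumptions made for illustration.

```ruby
# Hypothetical section entries. Only :breadcrumb is named in the README;
# :heading and :text are assumed field names for this sketch.
sections = [
  { breadcrumb: [], heading: "Top", text: "Intro paragraph." },
  { breadcrumb: ["Top"], heading: "Install", text: "Run the installer." }
]

# Join the ancestor headings and the section's own heading into one context
# line, then prepend that line to the chunk text before embedding.
def with_breadcrumb(section, separator: " > ")
  path = (section[:breadcrumb] + [section[:heading]]).compact.join(separator)
  path.empty? ? section[:text] : "#{path}\n\n#{section[:text]}"
end

chunks = sections.map { |s| with_breadcrumb(s) }
puts chunks
```

Keeping the breadcrumb on its own line makes it easy to strip again if the embedding model turns out to work better without it.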
@@ -460,7 +460,7 @@ embedding quality for downstream semantic search.
|
|
|
460
460
|
sections = Inkmark.chunks_by_heading(doc, options: {
|
|
461
461
|
emoji_shortcodes: true, # keep—improves semantic signal
|
|
462
462
|
links: {
|
|
463
|
-
autolink: true, # keep—proper anchor
|
|
463
|
+
autolink: true, # keep—proper anchor Markdown
|
|
464
464
|
allowed_schemes: %w[http https mailto], # keep—safe URLs
|
|
465
465
|
nofollow: false # off—would embed <a rel=...> HTML
|
|
466
466
|
},
|
|
@@ -907,7 +907,10 @@ Use `parent_kind` and `ancestor_kinds` for context-sensitive decisions:
|
|
|
907
907
|
md.on(:image) { |img| img.delete if img.ancestor_kinds.include?(:link) }
|
|
908
908
|
|
|
909
909
|
# Only process top-level paragraphs
|
|
910
|
-
md.on(:paragraph)
|
|
910
|
+
md.on(:paragraph) do |p|
|
|
911
|
+
next unless p.parent_kind.nil?
|
|
912
|
+
# ... only top-level paragraphs reach here
|
|
913
|
+
end
|
|
911
914
|
```
|
|
912
915
|
|
|
913
916
|
`depth` gives the nesting level (0 = top-level block):
|
|
@@ -927,7 +930,7 @@ original source to recover the raw Markdown for any element:
|
|
|
927
930
|
source = File.read("post.md")
|
|
928
931
|
md = Inkmark.new(source)
|
|
929
932
|
md.on(:heading) do |h|
|
|
930
|
-
puts "#{h.byte_range}: #{source
|
|
933
|
+
puts "#{h.byte_range}: #{source.byteslice(h.byte_range).inspect}"
|
|
931
934
|
end
|
|
932
935
|
md.walk
|
|
933
936
|
```
|
|
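The corrected line uses `String#byteslice`, which matters because byte ranges and character indexes diverge as soon as the source contains multi-byte characters. A small self-contained sketch of the difference follows; the sample text and the hand-built `byte_range` are stand-ins for what `h.byte_range` would supply, not Inkmark output.

```ruby
# Sample document with multi-byte characters in the heading.
source = "# Café ☕\n\nBody text.\n"
heading = "# Café ☕"

# Stand-in for an element's byte range (assumed here to cover the heading).
byte_range = 0...heading.bytesize

# byteslice indexes by bytes, so it recovers exactly the heading.
by_bytes = source.byteslice(byte_range)

# String#[] indexes by characters, so the same range overshoots here.
by_chars = source[byte_range]

puts by_bytes.inspect
puts by_chars.inspect
```

With ASCII-only input both calls return the same string, which is why this class of bug tends to surface only on non-English content.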
@@ -939,7 +942,7 @@ and `:hard_break`. Also `nil` for `:link` when `links: { autolink: true }` is en
|
|
|
939
942
|
|
|
940
943
|
### Event object reference
|
|
941
944
|
|
|
942
|
-
Every handler receives
|
|
945
|
+
Every handler receives an `Inkmark::Event` with these fields and methods:
|
|
943
946
|
|
|
944
947
|
| Field / method | Type | Description |
|
|
945
948
|
|---|---|---|
|
|
@@ -972,7 +975,7 @@ Every handler receives a `Inkmark::Event` with these fields and methods:
|
|
|
972
975
|
| `:list` |—| `html=`, `markdown=` |
|
|
973
976
|
| `:ordered_list` |—| `html=`, `markdown=` |
|
|
974
977
|
| `:list_item` | `text` | `html=`, `markdown=` |
|
|
975
|
-
| `:code_block` | `text
|
|
978
|
+
| `:code_block` | `text` (alias `source`), `lang` | `html=`, `markdown=` |
|
|
976
979
|
| `:table` |—| `html=`, `markdown=` |
|
|
977
980
|
| `:table_head` |—| `html=`, `markdown=` |
|
|
978
981
|
| `:table_row` | `text` | `html=`, `markdown=` |
|
|
@@ -1023,7 +1026,7 @@ Post-render filters (`syntax_highlight`, allowlists, `images: { lazy: true }`,
|
|
|
1023
1026
|
|
|
1024
1027
|
Inkmark ships a benchmark harness comparing it against `kramdown`,
|
|
1025
1028
|
`commonmarker`, `redcarpet`, `markly`, and `rdiscount` on a sweep of real
|
|
1026
|
-
|
|
1029
|
+
Markdown inputs.
|
|
1027
1030
|
|
|
1028
1031
|
Measuring apples to apples: every adapter is tuned for **feature parity** with
|
|
1029
1032
|
Inkmark's defaults—CommonMark + core GFM (tables, strikethrough, tasklists,
|
data/lib/inkmark/3.3/inkmark.so
CHANGED
|
Binary file
|
data/lib/inkmark/3.4/inkmark.so
CHANGED
|
Binary file
|
data/lib/inkmark/4.0/inkmark.so
CHANGED
|
Binary file
|
data/lib/inkmark/version.rb
CHANGED
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: inkmark
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.1.
|
|
4
|
+
version: 0.1.1
|
|
5
5
|
platform: x86_64-linux-musl
|
|
6
6
|
authors:
|
|
7
7
|
- Yaroslav Markin
|
|
@@ -122,7 +122,7 @@ dependencies:
|
|
|
122
122
|
- - ">="
|
|
123
123
|
- !ruby/object:Gem::Version
|
|
124
124
|
version: '0'
|
|
125
|
-
description: A very fast, feature-packed, AI-first
|
|
125
|
+
description: A very fast, feature-packed, AI-first Markdown (CommonMark/GFM) gem for
|
|
126
126
|
Ruby, based on pulldown-cmark (Rust).
|
|
127
127
|
email:
|
|
128
128
|
- yaroslav@markin.net
|
|
@@ -174,5 +174,5 @@ requirements: []
|
|
|
174
174
|
rubygems_version: 3.5.23
|
|
175
175
|
signing_key:
|
|
176
176
|
specification_version: 4
|
|
177
|
-
summary: Very fast, feature-packed, AI-first
|
|
177
|
+
summary: Very fast, feature-packed, AI-first Markdown gem for Ruby.
|
|
178
178
|
test_files: []
|