epub-wasm 0.1.5 → 0.2.0
- package/README.md +96 -78
- package/epub_wasm_bg.wasm +0 -0
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -1,111 +1,129 @@
|
|
|
1
1
|
# epub-wasm
|
|
2
2
|
|
|
3
|
-
A WebAssembly
|
|
3
|
+
A Rust crate that compiles to WebAssembly and parses EPUB files into structured JSON. This crate provides the core EPUB parsing logic that powers the [`epub-wasm` npm package](pkg/README.md).
|
|
4
|
+
|
|
5
|
+
## Overview
|
|
6
|
+
|
|
7
|
+
This crate uses the [`rbook`](https://crates.io/crates/rbook) crate to parse EPUB files and extract their content into a clean JSON structure of chapters, headings, and paragraphs. The parsed data is then exposed through WebAssembly bindings for use in web applications.
|
|
4
8
|
|
|
5
9
|
## Features
|
|
6
10
|
|
|
7
|
-
- **
|
|
11
|
+
- **EPUB parsing**: Extracts text content from EPUB files
|
|
12
|
+
- **Structured output**: Organizes content into books, chapters, and blocks
|
|
13
|
+
- **WebAssembly compatible**: Designed for compilation to WASM
|
|
14
|
+
- **Fast**: Leverages Rust's performance for efficient parsing (parse times typically range from ~20 ms to ~60 ms)
|
|
15
|
+
|
|
16
|
+
## Building
|
|
8
17
|
|
|
9
|
-
|
|
10
|
-
- **Structured output**: Converts EPUB content into clean JSON with chapters, headings, and paragraphs
|
|
11
|
-
- **Zero dependencies**: Self-contained WebAssembly module
|
|
18
|
+
### Prerequisites
|
|
12
19
|
|
|
13
|
-
|
|
20
|
+
- [Rust](https://rustup.rs/) (latest stable)
|
|
21
|
+
- [wasm-pack](https://rustwasm.github.io/wasm-pack/)
|
|
22
|
+
|
|
23
|
+
### Build for WebAssembly
|
|
14
24
|
|
|
15
25
|
```bash
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
#
|
|
20
|
-
|
|
26
|
+
# Install wasm-pack if not installed
|
|
27
|
+
cargo install wasm-pack
|
|
28
|
+
|
|
29
|
+
# Build for bundler (recommended for Vite/SvelteKit)
|
|
30
|
+
wasm-pack build --release --target bundler
|
|
31
|
+
|
|
32
|
+
# Or build for web (serves WASM from URL)
|
|
33
|
+
# Use this target to test with index.html
|
|
34
|
+
wasm-pack build --release --target web
|
|
21
35
|
```
|
|
22
36
|
|
|
23
|
-
|
|
37
|
+
### Build for Native (Testing)
|
|
24
38
|
|
|
25
|
-
```
|
|
26
|
-
|
|
27
|
-
// Load the EPUB file
|
|
28
|
-
const wasm = await import("epub-wasm");
|
|
29
|
-
const response = await fetch("./book.epub");
|
|
30
|
-
const arrayBuffer = await response.arrayBuffer();
|
|
31
|
-
const uint8Array = new Uint8Array(arrayBuffer);
|
|
32
|
-
|
|
33
|
-
// Parse with WebAssembly
|
|
34
|
-
const jsonString = await wasm.parse_epub(uint8Array);
|
|
35
|
-
const bookData = JSON.parse(jsonString);
|
|
36
|
-
|
|
37
|
-
console.log("Book:", bookData);
|
|
38
|
-
// Access chapters: bookData.chapters
|
|
39
|
-
// Access first chapter blocks: bookData.chapters[0].blocks
|
|
40
|
-
}
|
|
39
|
+
```bash
|
|
40
|
+
cargo build --release
|
|
41
41
|
```
|
|
42
42
|
|
|
43
|
-
|
|
43
|
+
### Optimization Levels
|
|
44
44
|
|
|
45
|
-
|
|
45
|
+
You can adjust the `opt-level` in `Cargo.toml` to balance between performance and binary size:
|
|
46
46
|
|
|
47
|
-
|
|
47
|
+
| Value | Meaning |
|
|
48
|
+
| ----- | ---------------------------------------------- |
|
|
49
|
+
| `0` | No optimization (fast compile, slow binary) |
|
|
50
|
+
| `1` | Basic optimizations |
|
|
51
|
+
| `2` | Good balance (default for release) |
|
|
52
|
+
| `3` | Maximum performance |
|
|
53
|
+
| `"s"` | Optimize for **small binary size** |
|
|
54
|
+
| `"z"` | Optimize for **smallest possible binary size** |
|
|
48
55
|
|
|
49
|
-
|
|
56
|
+
For WebAssembly, you might want to use `"z"` for minimal size or `"s"` for a balance. The current setting is `opt-level = 3` for maximum performance.
|
|
57
|
+
|
|
58
|
+
## Usage
|
|
50
59
|
|
|
51
|
-
|
|
60
|
+
This crate is primarily designed to be compiled to WebAssembly. The main entry point is the `parse_epub` function:
|
|
52
61
|
|
|
53
|
-
|
|
62
|
+
```rust
|
|
63
|
+
use epub_wasm::parse_epub;
|
|
54
64
|
|
|
55
|
-
|
|
65
|
+
// Parse EPUB bytes into JSON string
|
|
66
|
+
let epub_data: &[u8] = // ... load EPUB file
|
|
67
|
+
let json_result = parse_epub(epub_data);
|
|
68
|
+
let book: serde_json::Value = serde_json::from_str(&json_result).unwrap();
|
|
69
|
+
```
|
|
56
70
|
|
|
57
|
-
|
|
71
|
+
## API
|
|
58
72
|
|
|
59
|
-
|
|
73
|
+
### `parse_epub(data: &[u8]) -> String`
|
|
60
74
|
|
|
61
|
-
|
|
62
|
-
type Book = {
|
|
63
|
-
id: string;
|
|
64
|
-
title: string;
|
|
65
|
-
chapters: Chapter[];
|
|
66
|
-
};
|
|
75
|
+
Parses an EPUB file from raw bytes and returns a JSON string representation.
|
|
67
76
|
|
|
68
|
-
|
|
69
|
-
title: string;
|
|
70
|
-
id: string;
|
|
71
|
-
blocks: Block[];
|
|
72
|
-
};
|
|
77
|
+
**Parameters:**
|
|
73
78
|
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
79
|
+
- `data`: Raw bytes of the EPUB file
|
|
80
|
+
|
|
81
|
+
**Returns:** JSON string with the following structure:
|
|
82
|
+
|
|
83
|
+
```json
|
|
84
|
+
{
|
|
85
|
+
"id": "550e8400-e29b-41d4-a716-446655440000",
|
|
86
|
+
"title": "Book Title",
|
|
87
|
+
"chapters": [
|
|
88
|
+
{
|
|
89
|
+
"title": "Chapter Title",
|
|
90
|
+
"id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
|
|
91
|
+
"blocks": [
|
|
92
|
+
{
|
|
93
|
+
"type": "heading",
|
|
94
|
+
"text": "Heading Text",
|
|
95
|
+
"position": "0"
|
|
96
|
+
},
|
|
97
|
+
{
|
|
98
|
+
"type": "paragraph",
|
|
99
|
+
"text": "Paragraph text...",
|
|
100
|
+
"position": "1"
|
|
101
|
+
}
|
|
102
|
+
]
|
|
103
|
+
}
|
|
104
|
+
]
|
|
105
|
+
}
|
|
78
106
|
```
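As a rough illustration of the structure above, the JSON shape can be mirrored with plain Rust types. This is only a sketch: the type names (`Book`, `Chapter`, `Block`, `BlockKind`) and the helpers `heading_texts` and `sample_book` are hypothetical and not taken from the crate, and the sample values are copied by hand from the JSON example rather than produced by `parse_epub`.

```rust
/// Hypothetical mirror of the JSON "type" field; not part of the crate's API.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockKind {
    Heading,
    Paragraph,
}

/// One content unit inside a chapter (assumed shape).
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Block {
    kind: BlockKind,
    text: String,
    position: String,
}

/// One chapter with its ordered blocks (assumed shape).
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Chapter {
    title: String,
    id: String,
    blocks: Vec<Block>,
}

/// Top-level document (assumed shape).
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Book {
    id: String,
    title: String,
    chapters: Vec<Chapter>,
}

/// Collect the text of every heading block, chapter by chapter.
fn heading_texts(book: &Book) -> Vec<&str> {
    book.chapters
        .iter()
        .flat_map(|chapter| chapter.blocks.iter())
        .filter(|block| block.kind == BlockKind::Heading)
        .map(|block| block.text.as_str())
        .collect()
}

/// Build the document from the JSON example by hand.
fn sample_book() -> Book {
    Book {
        id: "550e8400-e29b-41d4-a716-446655440000".to_string(),
        title: "Book Title".to_string(),
        chapters: vec![Chapter {
            title: "Chapter Title".to_string(),
            id: "6ba7b810-9dad-11d1-80b4-00c04fd430c8".to_string(),
            blocks: vec![
                Block {
                    kind: BlockKind::Heading,
                    text: "Heading Text".to_string(),
                    position: "0".to_string(),
                },
                Block {
                    kind: BlockKind::Paragraph,
                    text: "Paragraph text...".to_string(),
                    position: "1".to_string(),
                },
            ],
        }],
    }
}
```

Calling `heading_texts(&sample_book())` returns the single heading from the example. In a real project these types would more likely derive `serde::Deserialize` and be filled from the JSON string directly.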
|
|
79
107
|
|
|
80
|
-
|
|
108
|
+
Each block in the parsed output includes a `position` field, a lexicographically sortable string that defines its stable reading order within the document. For example, positions are assigned in this sequence:
|
|
109
|
+
"0", "1", "2", ..., "9", "A", "B", ..., "z", "10", "11", ...
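The README does not spell out the encoding, but the sequence above looks like unpadded base-62 counting (digits, then uppercase, then lowercase). Under that assumption a plain string comparison would place `"10"` before `"2"`, so a consumer that re-sorts blocks may want to compare by length first and only then byte-wise. The sketch below shows one way to do that; `compare_positions` and `sort_positions` are hypothetical helper names, not part of the crate.

```rust
use std::cmp::Ordering;

/// Order two position strings: shorter strings first, then byte-wise.
/// Assumes unpadded positions drawn from 0-9, A-Z, a-z (an assumption
/// based on the example sequence, not a documented guarantee).
fn compare_positions(a: &str, b: &str) -> Ordering {
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// Sort a slice of position strings into reading order in place.
fn sort_positions(positions: &mut [&str]) {
    positions.sort_by(|a, b| compare_positions(a, b));
}
```

If the crate pads or otherwise normalizes positions so that plain string comparison already works, this helper is unnecessary and ordinary `sort()` is enough.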
|
|
81
110
|
|
|
82
|
-
|
|
83
|
-
- **Chapter**: Each chapter contains a title, unique ID, and content broken into blocks
|
|
84
|
-
- **Block**: Individual content units that are either headings (`h1`-`h6`) or paragraphs
|
|
85
|
-
- Content is extracted from HTML/XHTML files within the EPUB, preserving the document structure
|
|
86
|
-
- Headings and paragraphs are identified and categorized automatically from the EPUB's markup
|
|
111
|
+
## Dependencies
|
|
87
112
|
|
|
88
|
-
|
|
113
|
+
- [`rbook`](https://crates.io/crates/rbook): EPUB parsing library
|
|
114
|
+
- [`serde`](https://crates.io/crates/serde): Serialization framework
|
|
115
|
+
- [`wasm-bindgen`](https://crates.io/crates/wasm-bindgen): WebAssembly bindings
|
|
116
|
+
- [`uuid`](https://crates.io/crates/uuid): ID generation
|
|
89
117
|
|
|
90
|
-
|
|
91
|
-
interface Book {
|
|
92
|
-
id: string;
|
|
93
|
-
title: string;
|
|
94
|
-
chapters: Chapter[];
|
|
95
|
-
}
|
|
118
|
+
## Development
|
|
96
119
|
|
|
97
|
-
|
|
98
|
-
title: string;
|
|
99
|
-
id: string;
|
|
100
|
-
blocks: Block[];
|
|
101
|
-
}
|
|
120
|
+
### Running the Web Example
|
|
102
121
|
|
|
103
|
-
|
|
104
|
-
type: "heading" | "paragraph";
|
|
105
|
-
text: string;
|
|
106
|
-
}
|
|
107
|
-
```
|
|
108
|
-
|
|
109
|
-
## License
|
|
122
|
+
After building with wasm-pack, you can test the generated package:
|
|
110
123
|
|
|
111
|
-
|
|
124
|
+
```bash
|
|
125
|
+
cd pkg
|
|
126
|
+
# Serve the directory locally (the example needs a web server to load the WASM file)
|
|
127
|
+
python3 -m http.server 8000
|
|
128
|
+
# Then open http://localhost:8000/index.html in a browser
|
|
129
|
+
```
|
package/epub_wasm_bg.wasm
CHANGED
|
Binary file
|