epub-wasm 0.1.5 → 0.2.0

package/README.md CHANGED
@@ -1,111 +1,129 @@
1
1
  # epub-wasm
2
2
 
3
- A WebAssembly module for parsing EPUB files into structured JSON format. This package provides a fast, browser-compatible way to extract and process EPUB content without server-side processing.
3
+ A Rust crate that compiles to WebAssembly for parsing EPUB files into structured JSON format. This crate provides the core EPUB parsing logic that powers the [`epub-wasm` npm package](pkg/README.md).
4
+
5
+ ## Overview
6
+
7
+ This crate uses the [`rbook`](https://crates.io/crates/rbook) crate to parse EPUB files and extract their content into a clean JSON structure with chapters, headings, and paragraphs. The parsed data is then exposed via WebAssembly bindings for use in web applications.
4
8
 
5
9
  ## Features
6
10
 
7
- - **Fast EPUB parsing**: Leverages Rust's performance for efficient parsing (parse times typically range from ~20 ms to ~60 ms)
11
+ - **EPUB parsing**: Extracts text content from EPUB files
12
+ - **Structured output**: Organizes content into books, chapters, and blocks
13
+ - **WebAssembly compatible**: Designed for compilation to WASM
14
+ - **Fast**: Leverages Rust's performance for efficient parsing (parse times typically range from ~20 ms to ~60 ms)
15
+
16
+ ## Building
8
17
 
9
- - **Browser-ready**: Works directly in web browsers and modern JavaScript environments
10
- - **Structured output**: Converts EPUB content into clean JSON with chapters, headings, and paragraphs
11
- - **Zero dependencies**: Self-contained WebAssembly module
18
+ ### Prerequisites
12
19
 
13
- ## Installation
20
+ - [Rust](https://rustup.rs/) (latest stable)
21
+ - [wasm-pack](https://rustwasm.github.io/wasm-pack/)
22
+
23
+ ### Build for WebAssembly
14
24
 
15
25
  ```bash
16
- npm install epub-wasm
17
- # or
18
- yarn add epub-wasm
19
- # or
20
- bun add epub-wasm
26
+ # Install wasm-pack if not installed
27
+ cargo install wasm-pack
28
+
29
+ # Build for bundler (recommended for Vite/SvelteKit)
30
+ wasm-pack build --release --target bundler
31
+
32
+ # Or build for web (serves WASM from URL)
33
+ # Use this target to test with index.html
34
+ wasm-pack build --release --target web
21
35
  ```
22
36
 
23
- ## Usage
37
+ ### Build for Native (Testing)
24
38
 
25
- ```javascript
26
- async function parseEpub() {
27
- // Load the EPUB file
28
- const wasm = await import("epub-wasm");
29
- const response = await fetch("./book.epub");
30
- const arrayBuffer = await response.arrayBuffer();
31
- const uint8Array = new Uint8Array(arrayBuffer);
32
-
33
- // Parse with WebAssembly
34
- const jsonString = await wasm.parse_epub(uint8Array);
35
- const bookData = JSON.parse(jsonString);
36
-
37
- console.log("Book:", bookData);
38
- // Access chapters: bookData.chapters
39
- // Access first chapter blocks: bookData.chapters[0].blocks
40
- }
39
+ ```bash
40
+ cargo build --release
41
41
  ```
42
42
 
43
- ## API Reference
43
+ ### Optimization Levels
44
44
 
45
- ### `parse_epub(data: Uint8Array) -> string`
45
+ You can adjust the `opt-level` in `Cargo.toml` to balance between performance and binary size:
46
46
 
47
- Parses an EPUB file from a byte array and returns a JSON string.
47
+ | Value | Meaning |
48
+ | ----- | ---------------------------------------------- |
49
+ | `0` | No optimization (fast compile, slow binary) |
50
+ | `1` | Basic optimizations |
51
+ | `2` | Good balance |
52
+ | `3` | Maximum performance (Cargo's default for the release profile) |
53
+ | `"s"` | Optimize for **small binary size** |
54
+ | `"z"` | Optimize for **smallest possible binary size** |
48
55
 
49
- **Parameters:**
56
+ For WebAssembly, you might want to use `"z"` for minimal size or `"s"` for a balance. The current setting is `opt-level = 3` for maximum performance.
57
+
58
+ ## Usage
50
59
 
51
- - `data`: `Uint8Array` - The raw bytes of the EPUB file
60
+ This crate is primarily designed to be compiled to WebAssembly. The main entry point is the `parse_epub` function:
52
61
 
53
- **Returns:** `string` - JSON representation of the parsed book
62
+ ```rust
63
+ use epub_wasm::parse_epub;
54
64
 
55
- **Throws:** Will return a fallback empty book JSON if parsing fails
65
+ // Parse EPUB bytes into JSON string
66
+ let epub_data: Vec<u8> = std::fs::read("book.epub").expect("failed to read EPUB file"); // illustrative path
67
+ let json_result = parse_epub(&epub_data);
68
+ let book: serde_json::Value = serde_json::from_str(&json_result).unwrap();
69
+ ```
56
70
 
57
- ### Extraction Process
71
+ ## API
58
72
 
59
- The EPUB parsing extracts content based on the following TypeScript type definitions:
73
+ ### `parse_epub(data: &[u8]) -> String`
60
74
 
61
- ```typescript
62
- type Book = {
63
- id: string;
64
- title: string;
65
- chapters: Chapter[];
66
- };
75
+ Parses an EPUB file from raw bytes and returns a JSON string representation.
67
76
 
68
- type Chapter = {
69
- title: string;
70
- id: string;
71
- blocks: Block[];
72
- };
77
+ **Parameters:**
73
78
 
74
- type Block = {
75
- text: string;
76
- type: "heading" | "paragraph";
77
- };
79
+ - `data`: Raw bytes of the EPUB file
80
+
81
+ **Returns:** JSON string with the following structure:
82
+
83
+ ```json
84
+ {
85
+ "id": "550e8400-e29b-41d4-a716-446655440000",
86
+ "title": "Book Title",
87
+ "chapters": [
88
+ {
89
+ "title": "Chapter Title",
90
+ "id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
91
+ "blocks": [
92
+ {
93
+ "type": "heading",
94
+ "text": "Heading Text",
95
+ "position": "0"
96
+ },
97
+ {
98
+ "type": "paragraph",
99
+ "text": "Paragraph text...",
100
+ "position": "1"
101
+ }
102
+ ]
103
+ }
104
+ ]
105
+ }
78
106
  ```
79
107
 
80
- **How extraction works:**
108
+ Each block in the parsed output includes a `position` field, a lexicographically sortable string that defines its stable reading order within the document. For example:
109
+ "0", "1", "2", ..., "9", "A", "B", ..., "z", "10", "11", ...
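
A note on ordering, offered as a sketch rather than the crate's documented behavior: the example sequence places "10" after "z", which plain string comparison would not do (it would put "10" before "2"). Assuming positions are base-62 counters over `0-9`, `A-Z`, `a-z` with no leading zeros (an inference from the sequence above, not something the README states), comparing by length first and then by byte order reproduces that sequence. The function names below are hypothetical.

```rust
use std::cmp::Ordering;

/// Compares two `position` strings in reading order.
/// Assumption: positions are base-62 counters (digits, then A-Z, then a-z)
/// without leading zeros, so a shorter string always precedes a longer one.
fn compare_positions(a: &str, b: &str) -> Ordering {
    // Length first; ties fall back to byte-wise comparison, which matches
    // ASCII order: '0'..'9' < 'A'..'Z' < 'a'..'z'.
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// Sorts a slice of position strings in place into reading order.
fn sort_positions(positions: &mut [String]) {
    positions.sort_by(|a, b| compare_positions(a, b));
}
```

Under these assumptions, sorting `["10", "z", "2", "A"]` with `sort_positions` yields `["2", "A", "z", "10"]`. If the crate instead pads positions to a fixed width, plain string comparison would already be enough.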
81
110
 
82
- - **Book**: Represents the entire EPUB with a unique ID, title, and array of chapters
83
- - **Chapter**: Each chapter contains a title, unique ID, and content broken into blocks
84
- - **Block**: Individual content units that are either headings (`h1`-`h6`) or paragraphs
85
- - Content is extracted from HTML/XHTML files within the EPUB, preserving the document structure
86
- - Headings and paragraphs are identified and categorized automatically from the EPUB's markup
111
+ ## Dependencies
87
112
 
88
- ### JSON Output Structure
113
+ - [`rbook`](https://crates.io/crates/rbook): EPUB parsing library
114
+ - [`serde`](https://crates.io/crates/serde): Serialization framework
115
+ - [`wasm-bindgen`](https://crates.io/crates/wasm-bindgen): WebAssembly bindings
116
+ - [`uuid`](https://crates.io/crates/uuid): Unique ID generation for books and chapters
89
117
 
90
- ```typescript
91
- interface Book {
92
- id: string;
93
- title: string;
94
- chapters: Chapter[];
95
- }
118
+ ## Development
96
119
 
97
- interface Chapter {
98
- title: string;
99
- id: string;
100
- blocks: Block[];
101
- }
120
+ ### Running the Web Example
102
121
 
103
- interface Block {
104
- type: "heading" | "paragraph";
105
- text: string;
106
- }
107
- ```
108
-
109
- ## License
122
+ After building with wasm-pack, you can test the generated package:
110
123
 
111
- MIT/Apache-2.0
124
+ ```bash
125
+ cd pkg
126
+ # Serve locally (requires a web server)
127
+ python3 -m http.server 8000
128
+ # Then open http://localhost:8000/index.html in a browser
129
+ ```
package/epub_wasm_bg.wasm CHANGED
Binary file
package/package.json CHANGED
@@ -5,7 +5,7 @@
5
5
  "Mikiyas <0xmik@proton.me>"
6
6
  ],
7
7
  "description": "EPUB utilities compiled to WebAssembly",
8
- "version": "0.1.5",
8
+ "version": "0.2.0",
9
9
  "license": "MIT/Apache-2.0",
10
10
  "repository": {
11
11
  "type": "git",