epub-wasm 0.1.1 → 0.1.3

Files changed (2)
  1. package/README.md +74 -104
  2. package/package.json +2 -2
package/README.md CHANGED
@@ -1,141 +1,111 @@
  # epub-wasm
 
- A Rust crate that compiles to WebAssembly for parsing EPUB files into structured JSON format. This crate provides the core EPUB parsing logic that powers the [`epub-wasm` npm package](pkg/README.md).
-
- ## Overview
-
- This crate leverages the [`rbook`](https://crates.io/crates/rbook) and [`scraper`](https://crates.io/crates/scraper) crates to parse EPUB files and extract their content into a clean JSON structure with chapters, headings, and paragraphs. The parsed data is then exposed via WebAssembly bindings for use in web applications.
+ A WebAssembly module for parsing EPUB files into structured JSON format. This package provides a fast, browser-compatible way to extract and process EPUB content without server-side processing.
 
  ## Features
 
- - **EPUB parsing**: Extracts text content from EPUB files
- - **Structured output**: Organizes content into books, chapters, and blocks
- - **WebAssembly compatible**: Designed for compilation to WASM
- - **Fast**: Leverages Rust's performance for efficient parsing
-
- ## Building
-
- ### Prerequisites
-
- - [Rust](https://rustup.rs/) (latest stable)
- - [wasm-pack](https://rustwasm.github.io/wasm-pack/)
-
- ### Build for WebAssembly
-
- ```bash
- # Install wasm-pack if not installed
- cargo install wasm-pack
-
- # Build for bundler (recommended for Vite/SvelteKit)
- wasm-pack build --release --target bundler --out-dir pkg
-
- # Or build for web (serves WASM from URL)
- wasm-pack build --release --target web --out-dir pkg
- ```
+ - **Fast EPUB parsing**: Leverages Rust's performance compiled to WebAssembly
+ - **Browser-ready**: Works directly in web browsers and modern JavaScript environments
+ - **Structured output**: Converts EPUB content into clean JSON with chapters, headings, and paragraphs
+ - **Zero dependencies**: Self-contained WebAssembly module
 
- ### Build for Native (Testing)
+ ## Installation
 
  ```bash
- cargo build --release
+ npm install epub-wasm
+ # or
+ yarn add epub-wasm
+ # or
+ bun add epub-wasm
  ```
 
  ## Usage
 
- This crate is primarily designed to be compiled to WebAssembly. The main entry point is the `parse_epub` function:
+ ```javascript
+ import * as wasm from "epub-wasm";
 
- ```rust
- use epub_wasm::parse_epub;
+ async function parseEpub() {
+   // Load the EPUB file
+   const response = await fetch("./book.epub");
+   const arrayBuffer = await response.arrayBuffer();
+   const uint8Array = new Uint8Array(arrayBuffer);
 
- // Parse EPUB bytes into JSON string
- let epub_data: &[u8] = // ... load EPUB file
- let json_result = parse_epub(epub_data);
- let book: serde_json::Value = serde_json::from_str(&json_result).unwrap();
+   // Parse with WebAssembly
+   const jsonString = await wasm.parse_epub(uint8Array);
+   const bookData = JSON.parse(jsonString);
+
+   console.log("Book:", bookData);
+   // Access chapters: bookData.chapters
+   // Access first chapter blocks: bookData.chapters[0].blocks
+ }
  ```
 
- ## API
+ ## API Reference
 
- ### `parse_epub(data: &[u8]) -> String`
+ ### `parse_epub(data: Uint8Array) -> string`
 
- Parses an EPUB file from raw bytes and returns a JSON string representation.
+ Parses an EPUB file from a byte array and returns a JSON string.
 
  **Parameters:**
 
- - `data`: Raw bytes of the EPUB file
-
- **Returns:** JSON string with the following structure:
-
- ```json
- {
-   "id": "book_001",
-   "title": "Book Title",
-   "chapters": [
-     {
-       "title": "Chapter Title",
-       "id": "chapter_001",
-       "blocks": [
-         {
-           "type": "heading",
-           "text": "Heading Text"
-         },
-         {
-           "type": "paragraph",
-           "text": "Paragraph text..."
-         }
-       ]
-     }
-   ]
- }
- ```
+ - `data`: `Uint8Array` - The raw bytes of the EPUB file
 
- ## Dependencies
+ **Returns:** `string` - JSON representation of the parsed book
 
- - [`rbook`](https://crates.io/crates/rbook): EPUB parsing library
- - [`scraper`](https://crates.io/crates/scraper): HTML parsing and CSS selector engine
- - [`serde`](https://crates.io/crates/serde): Serialization framework
- - [`wasm-bindgen`](https://crates.io/crates/wasm-bindgen): WebAssembly bindings
+ **Throws:** Will return a fallback empty book JSON if parsing fails
 
- ## Project Structure
+ ### Extraction Process
 
- ```
- epub-wasm/
- ├── src/
- │   └── lib.rs # Main library code with WASM bindings
- ├── pkg/ # Generated WebAssembly package (after build)
- │   ├── epub_wasm.js
- │   ├── epub_wasm_bg.wasm
- │   └── ...
- ├── Cargo.toml # Package configuration
- └── README.md # This file
- ```
+ The EPUB parsing extracts content based on the following TypeScript type definitions:
 
- ## Development
+ ```typescript
+ type Book = {
+   id: string;
+   title: string;
+   chapters: Chapter[];
+ };
 
- ### Testing
+ type Chapter = {
+   title: string;
+   id: string;
+   blocks: Block[];
+ };
 
- ```bash
- cargo test
+ type Block = {
+   text: string;
+   type: "heading" | "paragraph";
+ };
  ```
 
- ### Running the Web Example
+ **How extraction works:**
 
- After building with wasm-pack, you can test the generated package:
-
- ```bash
- cd pkg
- # Serve locally (requires a web server)
- python3 -m http.server 8000
- # Then open index.html in browser
- ```
+ - **Book**: Represents the entire EPUB with a unique ID, title, and array of chapters
+ - **Chapter**: Each chapter contains a title, unique ID, and content broken into blocks
+ - **Block**: Individual content units that are either headings (`h1`-`h6`) or paragraphs
+ - Content is extracted from HTML/XHTML files within the EPUB, preserving the document structure
+ - Headings and paragraphs are identified and categorized automatically from the EPUB's markup
 
- ## Contributing
+ ### JSON Output Structure
 
- Contributions are welcome! Please feel free to submit issues and pull requests.
+ ```typescript
+ interface Book {
+   id: string;
+   title: string;
+   chapters: Chapter[];
+ }
 
- ## License
+ interface Chapter {
+   title: string;
+   id: string;
+   blocks: Block[];
+ }
 
- Licensed under MIT/Apache-2.0.
+ interface Block {
+   type: "heading" | "paragraph";
+   text: string;
+ }
+ ```
 
- ## Related Projects
+ ## License
 
- - [`rs-epub`](https://github.com/0xmiki/rs-epub) - Parent Rust project with additional EPUB utilities
- - [`epub-wasm` npm package](pkg/README.md) - The WebAssembly package published to npm
+ MIT/Apache-2.0
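
The 0.1.3 README describes the output only through TypeScript types and notes that a fallback empty book is returned when parsing fails, so a consumer may want to check the parsed JSON at runtime. The following is a minimal sketch, assuming the string returned by `parse_epub` matches the `Book` shape shown in the README. The helpers `isRecord`, `isBlock`, `isChapter`, `isBook`, and `headingOutline` are hypothetical and not part of the package, and the exact shape of the fallback book is not documented, so it is not special-cased here.

```typescript
// Shapes copied from the README's type definitions.
type Block = { type: "heading" | "paragraph"; text: string };
type Chapter = { title: string; id: string; blocks: Block[] };
type Book = { id: string; title: string; chapters: Chapter[] };

// Hypothetical helper: true for any non-null object.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Hypothetical runtime guards mirroring the documented types.
function isBlock(value: unknown): value is Block {
  return (
    isRecord(value) &&
    (value.type === "heading" || value.type === "paragraph") &&
    typeof value.text === "string"
  );
}

function isChapter(value: unknown): value is Chapter {
  return (
    isRecord(value) &&
    typeof value.title === "string" &&
    typeof value.id === "string" &&
    Array.isArray(value.blocks) &&
    value.blocks.every(isBlock)
  );
}

function isBook(value: unknown): value is Book {
  return (
    isRecord(value) &&
    typeof value.id === "string" &&
    typeof value.title === "string" &&
    Array.isArray(value.chapters) &&
    value.chapters.every(isChapter)
  );
}

// Hypothetical helper: parses the JSON string and lists each heading
// together with the title of the chapter it belongs to.
function headingOutline(jsonString: string): string[] {
  const parsed: unknown = JSON.parse(jsonString);
  if (!isBook(parsed)) {
    throw new Error("JSON does not match the documented Book shape");
  }
  const outline: string[] = [];
  for (const chapter of parsed.chapters) {
    for (const block of chapter.blocks) {
      if (block.type === "heading") {
        outline.push(`${chapter.title}: ${block.text}`);
      }
    }
  }
  return outline;
}
```

In an application, the argument to `headingOutline` would be the string obtained from `wasm.parse_epub(uint8Array)` as in the README's usage example. Validating before use keeps a malformed or unexpected result from surfacing later as an undefined property access.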
package/package.json CHANGED
@@ -5,11 +5,11 @@
      "Mikiyas <0xmik@proton.me>"
    ],
    "description": "EPUB utilities compiled to WebAssembly",
-   "version": "0.1.1",
+   "version": "0.1.3",
    "license": "MIT/Apache-2.0",
    "repository": {
      "type": "git",
-     "url": "https://github.com/0xmiki/epub-wasm"
+     "url": "https://github.com/0xmiki/epub_wasm"
    },
    "files": [
      "epub_wasm.js",