portapack 0.3.2 → 0.3.3

@@ -52,6 +52,8 @@ export default defineConfig({
52
52
  ],
53
53
  },
54
54
  { text: 'Contributing', link: '/contributing' },
55
+ { text: 'Architecture', link: '/architecture'},
56
+ { text: 'Roadmap', link: '/roadmap' }
55
57
  ],
56
58
 
57
59
  sidebar: {
@@ -0,0 +1,186 @@
1
+ # PortaPack Architecture
2
+
3
+ ## Overview
4
+
5
+ PortaPack bundles entire websites (HTML, CSS, JavaScript, images, and fonts) into self-contained HTML files for offline access. This document outlines the architectural components that make up the system.
6
+
7
+ ```mermaid
8
+ graph TD
9
+ CLI[CLI Entry Point] --> Options[Options Parser]
10
+ Options --> Core
11
+ API[API Entry Point] --> Core
12
+
13
+ subgraph Core ["Core Pipeline"]
14
+ Parser[HTML Parser] --> Extractor[Asset Extractor]
15
+ Extractor --> Minifier[Asset Minifier]
16
+ Minifier --> Packer[HTML Packer]
17
+ end
18
+
19
+ subgraph Recursive ["Advanced Features"]
20
+ WebFetcher[Web Fetcher] --> MultipageBundler[Multipage Bundler]
21
+ end
22
+
23
+ WebFetcher --> Parser
24
+ Core --> Output[Bundled HTML]
25
+ MultipageBundler --> Output
26
+
27
+ subgraph Utilities ["Utilities"]
28
+ Logger[Logger]
29
+ MimeUtils[MIME Utilities]
30
+ BuildTimer[Build Timer]
31
+ Slugify[URL Slugifier]
32
+ end
33
+
34
+ Logger -.-> CLI
35
+ Logger -.-> Core
36
+ Logger -.-> Recursive
37
+ MimeUtils -.-> Extractor
38
+ MimeUtils -.-> Parser
39
+ BuildTimer -.-> CLI
40
+ BuildTimer -.-> API
41
+ Slugify -.-> MultipageBundler
42
+ ```
43
+
44
+ ## Entry Points
45
+
46
+ ### CLI Interface
47
+
48
+ The command-line interface provides a convenient way to use PortaPack through terminal commands:
49
+
50
+ | Component | Purpose |
51
+ |-----------|---------|
52
+ | `cli-entry.ts` | Executable entry point with shebang support |
53
+ | `cli.ts` | Main runner that processes args and manages execution |
54
+ | `options.ts` | Parses command-line arguments and normalizes options |
55
+
56
+ ### API Interface
57
+
58
+ The programmatic API enables developers to integrate PortaPack into their applications:
59
+
60
+ | Component | Purpose |
61
+ |-----------|---------|
62
+ | `index.ts` | Exports public functions like `pack()` with TypeScript types |
63
+ | `types.ts` | Defines shared interfaces and types for the entire system |
64
+
65
+ ## Core Pipeline
66
+
67
+ The bundling process follows a clear four-stage pipeline:
68
+
69
+ ### 1. HTML Parser (`parser.ts`)
70
+
71
+ The parser reads and analyzes the input HTML:
72
+ - Uses Cheerio for robust HTML parsing
73
+ - Identifies linked assets through element attributes (href, src, etc.)
74
+ - Creates an initial asset list with URLs and inferred types
75
+ - Handles both local file paths and remote URLs
76
+
77
+ ### 2. Asset Extractor (`extractor.ts`)
78
+
79
+ The extractor resolves and fetches all referenced assets:
80
+ - Resolves relative URLs against the base context
81
+ - Fetches content for all discovered assets
82
+ - Recursively extracts nested assets from CSS (@import, url())
83
+ - Handles protocol-relative URLs and different origins
84
+ - Provides detailed logging of asset discovery
85
+
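The resolution and CSS-scanning steps listed above can be sketched roughly as follows. This is a minimal illustration, not PortaPack's actual extractor: `resolveAssetUrl` and `extractCssRefs` are hypothetical names, and the regular expressions only cover the common `url(...)` and `@import "..."` forms.

```typescript
// Hypothetical helpers for illustration; not part of PortaPack's public API.

/** Resolve a relative or protocol-relative reference against a base URL. */
function resolveAssetUrl(ref: string, baseUrl: string): string | null {
  try {
    // The WHATWG URL constructor handles "../x", "/x" and "//host/x" forms.
    return new URL(ref, baseUrl).href;
  } catch {
    return null; // Unparsable reference or base: skip it instead of aborting.
  }
}

/** Collect references from url(...) and @import "..." inside a stylesheet. */
function extractCssRefs(css: string): string[] {
  const refs: string[] = [];
  const urlPattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/g;
  const importPattern = /@import\s+(['"])([^'"]+)\1/g;
  let match: RegExpExecArray | null;

  while ((match = urlPattern.exec(css)) !== null) {
    const ref = match[2].trim();
    // Already-inlined data URIs need no further fetching.
    if (!ref.startsWith('data:')) refs.push(ref);
  }
  while ((match = importPattern.exec(css)) !== null) {
    refs.push(match[2]);
  }
  return refs;
}
```

Each reference found in a stylesheet would then be resolved against that stylesheet's own URL, not the page URL, before being fetched and scanned in turn.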
86
+ ### 3. Asset Minifier (`minifier.ts`)
87
+
88
+ The minifier reduces the size of all content:
89
+ - Minifies HTML using html-minifier-terser
90
+ - Minifies CSS using clean-css
91
+ - Minifies JavaScript using terser
92
+ - Preserves original content if minification fails
93
+ - Configurable through command-line flags
94
+
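The "preserve original content if minification fails" behavior might look something like the wrapper below. It is a sketch under assumptions: `minifyOrKeep` and the toy `collapseWhitespace` function are hypothetical stand-ins, and no real minifier library is called here.

```typescript
// Hypothetical wrapper; the real minifier module may be structured differently.
type Minify = (input: string) => string | Promise<string>;

/** Run a minifier and fall back to the untouched input if it throws. */
async function minifyOrKeep(
  content: string,
  minify: Minify,
  onFailure?: (error: unknown) => void
): Promise<string> {
  try {
    return await minify(content);
  } catch (error) {
    if (onFailure) onFailure(error); // e.g. log a warning
    return content; // Keep the original so the bundle stays usable.
  }
}

// Toy stand-ins for real minifiers, used only to exercise the wrapper.
const collapseWhitespace: Minify = (css) => css.replace(/\s+/g, ' ').trim();
const alwaysFails: Minify = () => {
  throw new Error('parse error');
};
```

The trade-off is a slightly larger output in exchange for never dropping an asset because of a minifier bug or unusual syntax.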
95
+ ### 4. HTML Packer (`packer.ts`)
96
+
97
+ The packer combines everything into a single file:
98
+ - Inlines CSS into `<style>` tags
99
+ - Embeds JavaScript into `<script>` tags
100
+ - Converts binary assets to data URIs
101
+ - Handles srcset attributes properly
102
+ - Ensures proper HTML structure with base tag
103
+
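As a rough illustration of the data URI step, the sketch below encodes raw bytes as a base64 `data:` URI. The function name `toDataUri` is an assumption made for this example. It uses the global `btoa` so it runs in both browsers and recent Node.js versions; Node-only code would more typically call `Buffer.prototype.toString('base64')`.

```typescript
// Hypothetical helper for illustration; not PortaPack's actual packer code.

/** Encode binary content as a base64 data URI with the given MIME type. */
function toDataUri(bytes: Uint8Array, mimeType: string): string {
  // Build a binary string one byte at a time; an index loop avoids the
  // argument-length limits of String.fromCharCode(...bytes) on large assets.
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Example: the two bytes of the ASCII text "Hi".
const example = toDataUri(new Uint8Array([72, 105]), 'text/plain');
// example === 'data:text/plain;base64,SGk='
```

Base64 inflates content by roughly one third, which is one reason very large sites produce impractically large bundles.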
104
+ ## Advanced Features
105
+
106
+ ### Web Fetcher (`web-fetcher.ts`)
107
+
108
+ For remote content, the web fetcher provides crawling capabilities:
109
+ - Uses Puppeteer for fully-rendered page capture
110
+ - Crawls websites recursively to specified depth
111
+ - Respects same-origin policy by default
112
+ - Manages browser instances efficiently
113
+ - Provides detailed logging of the crawl process
114
+
115
+ ### Multipage Bundler (`bundler.ts`)
116
+
117
+ For bundling multiple pages into a single file:
118
+ - Combines multiple HTML documents into one
119
+ - Creates a client-side router for navigation
120
+ - Generates a navigation interface
121
+ - Uses slugs for routing between pages
122
+ - Handles page templates and content swapping
123
+
124
+ ## Utilities
125
+
126
+ ### Logger (`logger.ts`)
127
+ - Customizable log levels (debug, info, warn, error)
128
+ - Consistent logging format across the codebase
129
+ - Optional timestamps and colored output
130
+
131
+ ### MIME Utilities (`mime.ts`)
132
+ - Maps file extensions to correct MIME types
133
+ - Categorizes assets by type (CSS, JS, image, font)
134
+ - Provides fallbacks for unknown extensions
135
+
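A minimal sketch of extension-based MIME lookup with a fallback, assuming a small hand-written table. The names `MIME_BY_EXTENSION`, `FALLBACK_MIME`, and `guessMime` are hypothetical, and the real `mime.ts` may cover many more types.

```typescript
// Hypothetical lookup table; the actual utility may differ.
type AssetKind = 'css' | 'js' | 'image' | 'font' | 'other';

interface MimeInfo {
  mime: string;
  kind: AssetKind;
}

const MIME_BY_EXTENSION: Record<string, MimeInfo> = {
  '.css': { mime: 'text/css', kind: 'css' },
  '.js': { mime: 'text/javascript', kind: 'js' },
  '.png': { mime: 'image/png', kind: 'image' },
  '.jpg': { mime: 'image/jpeg', kind: 'image' },
  '.svg': { mime: 'image/svg+xml', kind: 'image' },
  '.woff2': { mime: 'font/woff2', kind: 'font' },
};

// Generic binary type used when the extension is unknown or missing.
const FALLBACK_MIME: MimeInfo = { mime: 'application/octet-stream', kind: 'other' };

/** Guess MIME type and asset category from a URL or file path. */
function guessMime(urlOrPath: string): MimeInfo {
  const withoutQuery = urlOrPath.split(/[?#]/)[0]; // drop ?query and #fragment
  const dot = withoutQuery.lastIndexOf('.');
  const ext = dot === -1 ? '' : withoutQuery.slice(dot).toLowerCase();
  return MIME_BY_EXTENSION[ext] ?? FALLBACK_MIME;
}
```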
136
+ ### Build Timer (`meta.ts`)
137
+ - Tracks build performance metrics
138
+ - Records asset counts and page counts
139
+ - Captures output size and build duration
140
+ - Collects errors and warnings for reporting
141
+
142
+ ### URL Slugifier (`slugify.ts`)
143
+ - Converts URLs to safe HTML IDs
144
+ - Handles special characters and normalization
145
+ - Prevents slug collisions in multipage bundles
146
+
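The slug and collision behavior described above could be approximated as below. This is a sketch under assumptions: `slugifyUrl` and `uniqueSlug` are illustrative names, and the exact normalization rules in `slugify.ts` are not shown in this diff. The counter-suffix approach mirrors the `collisionCounter` variable visible in the bundler source.

```typescript
// Hypothetical helpers; normalization details are assumptions.

/** Turn a URL or path into a lowercase, hyphenated ID-safe slug. */
function slugifyUrl(url: string): string {
  let path = url;
  try {
    path = decodeURIComponent(new URL(url).pathname);
  } catch {
    // Not an absolute URL (or malformed escapes): treat the input as a path.
  }
  const slug = path
    .toLowerCase()
    .replace(/\.html?$/, '') // drop a trailing .html / .htm
    .replace(/[^a-z0-9]+/g, '-') // collapse anything unsafe into hyphens
    .replace(/^-+|-+$/g, ''); // trim leading and trailing hyphens
  return slug || 'index'; // the site root has an empty path
}

/** Reserve a slug, appending -1, -2, ... when the base is already taken. */
function uniqueSlug(base: string, used: Set<string>): string {
  let slug = base;
  let counter = 1;
  while (used.has(slug)) {
    slug = `${base}-${counter++}`;
  }
  used.add(slug);
  return slug;
}
```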
147
+ ## Asynchronous Processing
148
+
149
+ PortaPack uses modern async patterns throughout:
150
+
151
+ - **Promise-based Pipeline**: Each stage returns promises that are awaited
152
+ - **Sequential Processing**: Assets are processed one at a time, in order, to avoid overwhelming resources
153
+ - **Error Boundaries**: Individual asset failures don't break the entire pipeline
154
+ - **Resource Management**: Browser instances and file handles are properly closed
155
+
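The sequential-processing and error-boundary patterns above can be sketched as a single loop. The names `processSequentially` and `ProcessResult` are hypothetical; the point is only that each item is awaited before the next starts and that a failure is recorded rather than rethrown.

```typescript
// Hypothetical sketch of the pattern; not the pipeline's actual code.
interface ProcessResult<T> {
  succeeded: T[];
  failed: { item: string; error: string }[];
}

/** Process items one at a time, collecting failures instead of aborting. */
async function processSequentially<T>(
  items: string[],
  handler: (item: string) => Promise<T>
): Promise<ProcessResult<T>> {
  const result: ProcessResult<T> = { succeeded: [], failed: [] };
  for (const item of items) {
    try {
      // Awaiting inside the loop keeps only one request in flight.
      result.succeeded.push(await handler(item));
    } catch (err) {
      // Error boundary: record the failure and continue with the next item.
      const message = err instanceof Error ? err.message : String(err);
      result.failed.push({ item, error: message });
    }
  }
  return result;
}
```

Compared with `Promise.all`, this trades throughput for predictable resource use, and one rejected fetch cannot reject the whole batch.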
156
+ ## Build System
157
+
158
+ PortaPack uses a dual build configuration:
159
+
160
+ | Build Target | Format | Purpose |
161
+ |--------------|--------|---------|
162
+ | CLI | CommonJS (.cjs) | Works with Node.js and npx |
163
+ | API | ESModule (.js) | Modern import/export support |
164
+
165
+ TypeScript declarations (.d.ts) are generated for API consumers, and source maps support debugging.
166
+
167
+ ## Current Limitations
168
+
169
+ ### Script Execution Issues
170
+
171
+ - Inlined scripts with `async`/`defer` attributes lose their intended loading behavior
172
+ - ES Modules with import/export statements may fail after bundling
173
+ - Script execution order can change, breaking dependencies
174
+
175
+ ### Content Limitations
176
+
177
+ - CORS policies may prevent access to some cross-origin resources
178
+ - Only initially rendered content from SPAs is captured by default
179
+ - Very large sites produce impractically large HTML files
180
+
181
+ ### Technical Constraints
182
+
183
+ - No streaming API or WebSocket support
184
+ - Service worker capabilities are not preserved
185
+ - Large sites can cause significant memory pressure
186
+ - Limited support for authenticated content
@@ -77,8 +77,7 @@ For more specific use cases, you can access individual components:
77
77
  import {
78
78
  generatePortableHTML,
79
79
  generateRecursivePortableHTML,
80
- bundleMultiPageHTML,
81
- fetchAndPackWebPage,
80
+ bundleMultiPageHTML
82
81
  } from 'portapack';
83
82
 
84
83
  // Bundle a single HTML file or URL
@@ -0,0 +1,233 @@
1
+ # PortaPack Roadmap
2
+
3
+ ## Version 2: Enhanced Bundling Capabilities
4
+
5
+ Version 2 focuses on addressing core limitations and expanding compatibility with modern web applications.
6
+
7
+ ### 🎯 Script Execution Enhancement
8
+
9
+ | Feature | Description | Priority |
10
+ |---------|-------------|----------|
11
+ | **Script Execution Manager** | Preserve loading order of async/defer scripts | High |
12
+ | **Dependency Analysis** | Detect and maintain script dependencies | High |
13
+ | **Script Initialization Sequencing** | Ensure scripts initialize in the correct order | Medium |
14
+
15
+ **Implementation:** Add a lightweight runtime script that:
16
+ - Maintains a queue of scripts to execute
17
+ - Respects original async/defer behavior
18
+ - Adds proper event listeners for load/error events
19
+ - Enforces correct execution order
20
+
21
+ ### 🔄 Module Support
22
+
23
+ | Feature | Description | Priority |
24
+ |---------|-------------|----------|
25
+ | **ES Module Transformation** | Convert ES modules to browser-compatible format | High |
26
+ | **Import Resolution** | Resolve and inline imported modules | High |
27
+ | **Export Management** | Create namespace for module exports | Medium |
28
+
29
+ **Implementation:**
30
+ - Parse import statements using an AST parser
31
+ - Resolve modules relative to source files
32
+ - Rewrite as namespaced functions
33
+ - Create a runtime module resolution system
34
+
35
+ ### 📦 Resource Optimization
36
+
37
+ | Feature | Description | Priority |
38
+ |---------|-------------|----------|
39
+ | **Bundle Chunking** | Split large bundles into multiple linked files | Medium |
40
+ | **Lazy Loading** | Load assets only when needed | Medium |
41
+ | **Selective Embedding** | Configure thresholds for embedding vs. linking | Low |
42
+
43
+ **Implementation:**
44
+ - Create a manifest system for chunked resources
45
+ - Add intersection observer for lazy loading
46
+ - Implement size-based decision logic for embedding
47
+
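Since Selective Embedding is a planned feature, the following is only one possible shape for the size-based decision logic. `EmbedPolicy`, `decideEmbedding`, and the 100 KB threshold are all assumptions made for illustration.

```typescript
// Hypothetical policy shape for a roadmap item; nothing here exists in PortaPack yet.
interface EmbedPolicy {
  maxInlineBytes: number; // assets at or below this size are embedded
  alwaysInlineTypes: string[]; // MIME types embedded regardless of size
}

/** Decide whether an asset is inlined into the bundle or left as a link. */
function decideEmbedding(
  sizeBytes: number,
  mimeType: string,
  policy: EmbedPolicy
): 'embed' | 'link' {
  if (policy.alwaysInlineTypes.indexOf(mimeType) !== -1) return 'embed';
  return sizeBytes <= policy.maxInlineBytes ? 'embed' : 'link';
}

// Example policy: inline anything up to 100 KB, and always inline CSS.
const examplePolicy: EmbedPolicy = {
  maxInlineBytes: 100 * 1024,
  alwaysInlineTypes: ['text/css'],
};
```

Linked assets would no longer be available offline, so a real implementation would likely pair this with the chunking manifest mentioned above.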
48
+ ### 🖥️ Enhanced SPA Support
49
+
50
+ | Feature | Description | Priority |
51
+ |---------|-------------|----------|
52
+ | **Rendered State Capture** | Wait for JavaScript rendering before capture | High |
53
+ | **Route Detection** | Automatically discover SPA routes | Medium |
54
+ | **State Interaction** | Simulate user interactions to capture states | Medium |
55
+
56
+ **Implementation:**
57
+ - Add configurable wait strategies
58
+ - Implement navigation state detection
59
+ - Create event simulation system
60
+
61
+ ### 🔒 Authentication Support
62
+
63
+ | Feature | Description | Priority |
64
+ |---------|-------------|----------|
65
+ | **Authentication Configuration** | Pass credentials to the crawler | High |
66
+ | **Login Sequence** | Define authentication steps | Medium |
67
+ | **Session Management** | Maintain authenticated state during crawling | Medium |
68
+
69
+ **Implementation:**
70
+ - Add cookie and header configuration options
71
+ - Create login sequence definition format
72
+ - Implement session persistence
73
+
74
+ ### 💼 Developer Experience
75
+
76
+ | Feature | Description | Priority |
77
+ |---------|-------------|----------|
78
+ | **Enhanced Diagnostics** | Improved logging and error reporting | Medium |
79
+ | **Preview Server** | Built-in server for bundle testing | Medium |
80
+ | **Bundle Analysis** | Visual report of bundle composition | Low |
81
+
82
+ **Implementation:**
83
+ - Expand logging with visualization options
84
+ - Create lightweight preview server
85
+ - Implement size and composition analyzer
86
+
87
+ ## Version 3: Universal Content Platform
88
+
89
+ Version 3 transforms PortaPack from a bundling tool into a comprehensive offline content platform.
90
+
91
+ ### 📱 Cross-Platform Applications
92
+
93
+ | Platform | Key Features |
94
+ |----------|-------------|
95
+ | **Desktop** (macOS, Windows, Linux) | Native app with system integration, background bundling |
96
+ | **Mobile** (iOS, Android) | Touch-optimized interface, efficient storage management |
97
+ | **Browser Extensions** | One-click saving, context menu integration |
98
+
99
+ **Implementation:**
100
+ - Use Electron for desktop applications
101
+ - React Native for mobile platforms
102
+ - Extension APIs for major browsers
103
+
104
+ ### ☁️ Synchronization System
105
+
106
+ | Feature | Description |
107
+ |---------|-------------|
108
+ | **Encrypted Sync** | End-to-end encrypted content synchronization |
109
+ | **Delta Updates** | Bandwidth-efficient incremental synchronization |
110
+ | **Reading State Sync** | Preserve reading position across devices |
111
+ | **Selective Sync** | Choose what content syncs to which devices |
112
+
113
+ **Implementation:**
114
+ - Create secure synchronization protocol
115
+ - Implement conflict resolution system
116
+ - Build metadata synchronization service
117
+
118
+ ### 🧠 Content Intelligence
119
+
120
+ | Feature | Description |
121
+ |---------|-------------|
122
+ | **Automatic Summarization** | AI-generated summaries of saved content |
123
+ | **Smart Tagging** | Automatic categorization and organization |
124
+ | **Content Relationships** | Identify connections between saved items |
125
+ | **Content Extraction** | Convert complex pages to readable format |
126
+
127
+ **Implementation:**
128
+ - Integrate NLP models for content understanding
129
+ - Develop concept extraction algorithms
130
+ - Create relationship graph between content
131
+ - Build advanced readability transformations
132
+
133
+ ### 🔍 Advanced Search & Organization
134
+
135
+ | Feature | Description |
136
+ |---------|-------------|
137
+ | **Full-Text Search** | Search across all content |
138
+ | **Semantic Search** | Find content by meaning, not just keywords |
139
+ | **Smart Collections** | Automatically organize related content |
140
+ | **Timeline Views** | Chronological content organization |
141
+
142
+ **Implementation:**
143
+ - Build full-text search engine with indexing
144
+ - Implement vector-based semantic search
145
+ - Create automatic collection generation
146
+ - Develop flexible visualization components
147
+
148
+ ### ✏️ Interactive Features
149
+
150
+ | Feature | Description |
151
+ |---------|-------------|
152
+ | **Annotation System** | Highlights, notes, and comments |
153
+ | **Content Transformations** | Dark mode, font adjustment, text-to-speech |
154
+ | **Social Sharing** | Controlled sharing with privacy options |
155
+ | **Export Capabilities** | Convert to PDF, EPUB, and other formats |
156
+
157
+ **Implementation:**
158
+ - Create cross-platform annotation framework
159
+ - Build content adaptation engine
160
+ - Implement secure sharing mechanism
161
+ - Develop export converters for multiple formats
162
+
163
+ ### 🔧 Technical Architecture Expansion
164
+
165
+ | Component | Purpose |
166
+ |-----------|---------|
167
+ | **Sync Service** | Handle cross-device synchronization |
168
+ | **Auth System** | Manage user accounts and security |
169
+ | **Content Processing** | Pipeline for intelligent content handling |
170
+ | **Analytics** | Privacy-focused usage tracking |
171
+
172
+ **Implementation:**
173
+ - Build scalable backend services
174
+ - Create secure authentication system
175
+ - Develop modular processing pipeline
176
+ - Implement privacy-preserving analytics
177
+
178
+ ### 🧩 Developer Platform
179
+
180
+ | Feature | Description |
181
+ |---------|-------------|
182
+ | **Plugin System** | Custom processors and content handlers |
183
+ | **API** | Third-party integration capabilities |
184
+ | **Webhooks** | Automation triggers and notifications |
185
+ | **Theme Engine** | Customization of the reading experience |
186
+
187
+ **Implementation:**
188
+ - Create plugin architecture with sandboxing
189
+ - Develop comprehensive API documentation
190
+ - Implement webhook system with security
191
+ - Build theme and template engine
192
+
193
+ ### 🤖 Machine Learning Capabilities
194
+
195
+ | Feature | Description |
196
+ |---------|-------------|
197
+ | **Topic Extraction** | Identify main topics in content |
198
+ | **Entity Recognition** | Detect people, places, organizations |
199
+ | **Recommendation Engine** | Suggest related content |
200
+ | **On-Device Processing** | Local ML for privacy and performance |
201
+
202
+ **Implementation:**
203
+ - Deploy NLP models for content analysis
204
+ - Create entity linking system
205
+ - Develop recommendation algorithms
206
+ - Optimize ML models for on-device usage
207
+
208
+ ## Development Timeline
209
+
210
+ ### Version 2 Milestones
211
+
212
+ 1. **Phase 1:** Script Execution Manager & Module Support
213
+ 2. **Phase 2:** Resource Optimization & SPA Support
214
+ 3. **Phase 3:** Authentication Support & Developer Experience
215
+ 4. **Phase 4:** Stabilization & Release
216
+
217
+ ### Version 3 Phased Approach
218
+
219
+ 1. **Foundation Phase:**
220
+ - Cross-platform application architecture
221
+ - Core synchronization system
222
+ - Basic content intelligence
223
+
224
+ 2. **Expansion Phase:**
225
+ - Advanced search and organization
226
+ - Interactive features
227
+ - Developer platform beta
228
+
229
+ 3. **Intelligence Phase:**
230
+ - Full machine learning capabilities
231
+ - Recommendation engine
232
+ - Advanced content relationships
233
+
package/examples/main.ts CHANGED
@@ -13,7 +13,6 @@ import {
13
13
  generatePortableHTML,
14
14
  bundleMultiPageHTML,
15
15
  generateRecursivePortableHTML,
16
- fetchAndPackWebPage,
17
16
  } from '../src/index'; // 🔧 use '../src/index' for dev, '../dist/index' for built
18
17
 
19
18
  const TEMP_DIR = path.join(os.tmpdir(), 'portapack-example');
@@ -67,17 +66,6 @@ async function timedBundle(name: string, task: () => Promise<{ html: string; met
67
66
  })
68
67
  );
69
68
 
70
- // 🔹 Fetch and display raw HTML from remote site (no metadata)
71
- console.log(chalk.cyan('\n⏳ Fetch and Pack Web Page (raw)'));
72
- try {
73
- const { html, metadata } = await fetchAndPackWebPage('https://getbootstrap.com');
74
- const filePath = await writeTempFile('fetched-page.html', html);
75
- console.log(chalk.green('✅ Saved fetched HTML:'), `file://${filePath}`);
76
- console.log(`📦 Size: ${(metadata.outputSize / 1024).toFixed(2)} KB`);
77
- } catch (err) {
78
- console.error(chalk.red('❌ Failed to fetch web page:'), err);
79
- }
80
-
81
69
  // 🔹 Multi-page manual bundle
82
70
  await timedBundle('Multi-Page Site Bundling', async () => {
83
71
  const pages = [
@@ -101,15 +89,5 @@ async function timedBundle(name: string, task: () => Promise<{ html: string; met
101
89
  generateRecursivePortableHTML('https://getbootstrap.com', 2)
102
90
  );
103
91
 
104
- // 🔹 Broken page test
105
- console.log(chalk.cyan('\n⏳ Broken Page Test'));
106
- try {
107
- const { html, metadata } = await fetchAndPackWebPage('https://example.com/404');
108
- const brokenOut = await writeTempFile('broken-page.html', html);
109
- console.log(chalk.yellow('⚠️ Page returned something, saved to:'), `file://${brokenOut}`);
110
- } catch {
111
- console.log(chalk.red('🚫 Could not fetch broken page as expected.'));
112
- }
113
-
114
92
  console.log(chalk.gray(`\n📁 Output directory: ${TEMP_DIR}\n`));
115
93
  })();
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "portapack",
3
- "version": "0.3.2",
3
+ "version": "0.3.3",
4
4
  "description": "📦 A tool to bundle and minify HTML and all its dependencies into a single portable file.",
5
5
  "main": "dist/index.js",
6
6
  "module": "dist/index.js",
@@ -215,8 +215,6 @@ export function bundleMultiPageHTML(pages: PageEntry[], logger?: Logger): string
215
215
  `Could not determine a valid base slug for "${page.url}", using generated fallback "${baseSlug}".`
216
216
  );
217
217
  }
218
- // --- END REVISED SLUG LOGIC ---
219
-
220
218
  // --- Collision Handling ---
221
219
  let slug = baseSlug;
222
220
  let collisionCounter = 1;
@@ -288,7 +288,6 @@ async function fetchAsset(
288
288
  logger?.debug(
289
289
  `Fetched remote asset ${resolvedUrl.href} (Status: ${response.status}, Type: ${response.headers['content-type'] || 'N/A'}, Size: ${response.data?.byteLength ?? 0} bytes)`
290
290
  );
291
- // console.log(`[DEBUG fetchAsset] HTTP fetch SUCCESS for: ${resolvedUrl.href}, Status: ${response.status}`); // Keep debug log commented unless needed
292
291
  // Return the fetched data as a Node.js Buffer
293
292
  return Buffer.from(response.data);
294
293
  }
@@ -300,7 +299,6 @@ async function fetchAsset(
300
299
  // IMPORTANT: This strips query params and fragments from the URL
301
300
  filePath = fileURLToPath(resolvedUrl);
302
301
  } catch (e: any) {
303
- // console.error(`[DEBUG fetchAsset] fileURLToPath FAILED for: ${resolvedUrl.href}`, e); // Keep debug log commented unless needed
304
302
  logger?.error(
305
303
  `Could not convert file URL to path: ${resolvedUrl.href}. Error: ${e.message}`
306
304
  );
@@ -308,12 +306,9 @@ async function fetchAsset(
308
306
  }
309
307
 
310
308
  const normalizedForLog = path.normalize(filePath);
311
- // console.log(`[DEBUG fetchAsset] Attempting readFile with path: "${normalizedForLog}" (Original from URL: "${filePath}")`); // Keep debug log commented unless needed
312
309
 
313
310
  // Read file content using fs/promises
314
311
  const data = await readFile(filePath); // This call uses the mock in tests
315
-
316
- // console.log(`[DEBUG fetchAsset] readFile call SUCCEEDED for path: "${normalizedForLog}". Data length: ${data?.byteLength}`); // Keep debug log commented unless needed
317
312
  logger?.debug(`Read local file ${filePath} (${data.byteLength} bytes)`);
318
313
  // Return the file content as a Buffer
319
314
  return data;
package/tsup.config.ts CHANGED
@@ -7,7 +7,6 @@
7
7
  * - CommonJS format (`cjs`) for CLI compatibility with Node/npx
8
8
  * - .cjs file extension to avoid ESM interpretation issues
9
9
  * - Shebang (`#!/usr/bin/env node`) for executability
10
- * - No type declarations
11
10
  *
12
11
  * 🔹 API Build:
13
12
  * - ESModule format (`esm`) for modern module usage