pagerts 1.4.1 → 1.4.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,243 +1,242 @@
1
- # PagerTS
2
-
3
- [![CI/CD Security Pipeline](https://github.com/akinevz2/pagerts/actions/workflows/ci.yml/badge.svg?branch=dev)](https://github.com/akinevz2/pagerts/actions/workflows/ci.yml)
4
- [![Security](https://img.shields.io/badge/security-maintained-green.svg)](./SECURITY.md)
5
- [![Node.js Version](https://img.shields.io/badge/node-%3E%3D20.0.0-brightgreen.svg)](https://nodejs.org)
6
- [![License](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
7
-
8
- PagerTS is a secure, modern command-line utility that transforms URLs into structured JSON objects, extracting all navigable items and resources from webpages.
9
-
10
- ## Features
11
-
12
- 🔒 **Security-First**: Built-in URL validation, rate limiting, and XSS protection
13
- 🚀 **Modern TypeScript**: Strict type checking and modern ES2022 syntax
14
- ⚡ **Fast**: Efficient parsing with LinkeDOM and concurrent request handling
15
- 🧪 **Well-Tested**: Comprehensive test coverage with Jest
16
- 📦 **Easy to Use**: Simple CLI interface with sensible defaults
17
-
18
- ## Installation
19
-
20
- ### Global Installation
21
-
22
- ```bash
23
- npm install -g pagerts
24
- pagerts <url>
25
- ```
26
-
27
- ### Using npx (No Installation Required)
28
-
29
- ```bash
30
- npx pagerts <url>
31
- ```
32
-
33
- ### From Source
34
-
35
- ```bash
36
- git clone https://github.com/akinevz0/pagerts.git
37
- cd pagerts
38
- npm install
39
- npm run build
40
- npm link
41
- ```
42
-
43
- ## Usage
44
-
45
- ### Basic Usage
46
-
47
- Extract resources from a remote URL:
48
-
49
- ```bash
50
- pagerts https://example.com
51
- ```
52
-
53
- Extract from multiple URLs:
54
-
55
- ```bash
56
- pagerts https://example.com https://example.org
57
- ```
58
-
59
- Extract from a local HTML file:
60
-
61
- ```bash
62
- pagerts file:///path/to/file.html
63
- ```
64
-
65
- ### Output Format
66
-
67
- The output is a JSON object containing:
68
-
69
- ```json
70
- {
71
- "title": "Page Title",
72
- "url": "https://example.com",
73
- "resources": [
74
- {
75
- "name": "Link Text",
76
- "url": "https://example.com/page"
77
- }
78
- ]
79
- }
80
- ```
81
-
82
- Fields:
83
-
84
- - `title`: The page's title extracted from the `<title>` tag
85
- - `url`: The URL of the page
86
- - `resources`: Array of resources found on the page (links, meta tags, embeds)
87
- - `name`: Readable text or description
88
- - `url`: Target URL of the resource
89
-
90
- ## Security
91
-
92
- PagerTS takes security seriously. See [SECURITY.md](./SECURITY.md) for:
93
-
94
- - Security features and protections
95
- - How to report vulnerabilities
96
- - Best practices for users
97
- - Security checklist for contributors
98
-
99
- ### Built-in Security Features
100
-
101
- ✅ URL validation (only allows `http://`, `https://`, `file://`)
102
- ✅ Input sanitization to prevent XSS attacks
103
- ✅ Rate limiting (50 requests/minute by default)
104
- ✅ Request timeouts to prevent hanging
105
- ✅ Maximum URL length enforcement
106
- ✅ Suspicious pattern detection
107
- ✅ Safe HTML parsing (no script execution)
108
-
109
- ## Development
110
-
111
- ### Prerequisites
112
-
113
- - Node.js >= 20.0.0
114
- - npm >= 9.0.0
115
-
116
- ### Setup
117
-
118
- ```bash
119
- # Clone the repository
120
- git clone https://github.com/akinevz0/pagerts.git
121
- cd pagerts
122
-
123
- # Install dependencies
124
- npm install
125
-
126
- # Run in development mode
127
- npm run dev <url>
128
- ```
129
-
130
- ### Available Scripts
131
-
132
- ```bash
133
- # Run tests
134
- npm test
135
-
136
- # Run tests in watch mode
137
- npm test:watch
138
-
139
- # Build the project
140
- npm run build
141
-
142
- # Lint code
143
- npm run lint
144
-
145
- # Fix linting issues
146
- npm run lint:fix
147
-
148
- # Type check
149
- npm run type-check
150
-
151
- # Format code
152
- npm run format
153
-
154
- # Check formatting
155
- npm run format:check
156
-
157
- # Security audit
158
- npm run security:audit
159
-
160
- # Complete security check (audit + lint)
161
- npm run security:check
162
- ```
163
-
164
- ### Project Structure
165
-
166
- ```
167
- pagerts/
168
- ├── src/
169
- │ ├── main.ts # CLI entry point
170
- │ ├── security.ts # Security utilities
171
- │ ├── resource.ts # Resource types
172
- │ ├── extractors/ # Content extractors
173
- │ │ ├── AbstractExtractor.ts
174
- │ │ ├── PageExtractor.ts
175
- │ │ ├── ResourceExtractor.ts
176
- │ │ └── TagExtractor.ts
177
- │ ├── page/ # Page fetching
178
- │ │ ├── Page.ts
179
- │ │ └── PageFetcher.ts
180
- │ ├── printers/ # Output formatters
181
- │ │ ├── AbstractResourcePrinter.ts
182
- │ │ ├── JSONStylePrinter.ts
183
- │ │ └── LogStylePrinter.ts
184
- │ └── __tests__/ # Test files
185
- ├── bin/ # Built files
186
- ├── .github/workflows/ # CI/CD pipelines
187
- ├── package.json
188
- ├── tsconfig.json
189
- ├── jest.config.js
190
- ├── eslint.config.js
191
- └── SECURITY.md
192
- ```
193
-
194
- ## Contributing
195
-
196
- Contributions are welcome! Please:
197
-
198
- 1. Fork the repository
199
- 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
200
- 3. Commit your changes (`git commit -m 'Add amazing feature'`)
201
- 4. Push to the branch (`git push origin feature/amazing-feature`)
202
- 5. Open a Pull Request
203
-
204
- ### Contribution Guidelines
205
-
206
- - Write tests for new features
207
- - Follow the existing code style (enforced by ESLint and Prettier)
208
- - Update documentation as needed
209
- - Ensure all tests pass (`npm test`)
210
- - Run security checks (`npm run security:check`)
211
- - Follow security best practices (see [SECURITY.md](./SECURITY.md))
212
-
213
- ## License
214
-
215
- This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.
216
-
217
- ## Author
218
-
219
- **Kirill <kine> Nevzorov**
220
-
221
- ## Support
222
-
223
- - ๐Ÿ› [Report bugs](https://github.com/akinevz0/pagerts/issues)
224
- - ๐Ÿ’ก [Request features](https://github.com/akinevz0/pagerts/issues)
225
- - ๐Ÿ”’ [Report security issues](./SECURITY.md)
226
-
227
- ## Changelog
228
-
229
- ### v0.3.0 (Latest)
230
-
231
- ✨ Added comprehensive security features
232
- ✨ Implemented URL validation and sanitization
233
- ✨ Added rate limiting
234
- ✨ Modernized codebase with TypeScript strict mode
235
- ✨ Added ESLint with security plugin
236
- ✨ Added comprehensive test suite
237
- ✨ Added CI/CD with GitHub Actions
238
- ✨ Improved error handling and retry logic
239
- 📚 Added security documentation
240
-
241
- ### v0.2.0
242
-
243
- - Initial public release
1
+ # PagerTS
2
+
3
+ [![CI/CD Security Pipeline](https://github.com/akinevz2/pagerts/actions/workflows/ci.yml/badge.svg?branch=main-stable)](https://github.com/akinevz2/pagerts/actions/workflows/ci.yml)
4
+ [![Security](https://img.shields.io/badge/security-maintained-green.svg)](./SECURITY.md)
5
+ [![Node.js Version](https://img.shields.io/badge/node-%3E%3D20.0.0-brightgreen.svg)](https://nodejs.org)
6
+ [![License](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
7
+
8
+ PagerTS is a secure, modern command-line utility that transforms URLs into structured JSON objects, extracting all navigable items and resources from webpages.
9
+
10
+ ## Features
11
+
12
+ - 🔒 **Security-First**: Built-in URL validation, rate limiting, and XSS protection
13
+ - 🚀 **Modern TypeScript**: Strict type checking and modern ES2022 syntax
14
+ - ⚡ **Fast**: Efficient parsing with LinkeDOM and concurrent request handling
15
+ - 🧪 **Well-Tested**: Comprehensive test coverage with Jest
16
+ - 📦 **Easy to Use**: Simple CLI interface with sensible defaults
17
+
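The "concurrent request handling" above comes down to issuing all fetches at once and racing each one against a timeout. A minimal sketch of that technique (using a stubbed fetcher instead of real network calls; this is illustrative, not the package's exact implementation):

```javascript
// Sketch: fetch several pages concurrently, racing each request against a
// timeout. `fakeFetch` is a stand-in for a real network call.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("Request timeout")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

const fakeFetch = (url) =>
  new Promise((resolve) => setTimeout(() => resolve(`<html>${url}</html>`), 10));

async function fetchAll(urls) {
  // allSettled: one slow or failing page does not sink the whole batch.
  const results = await Promise.allSettled(
    urls.map((url) => withTimeout(fakeFetch(url), 1000))
  );
  return results.map((result, i) => ({
    url: urls[i],
    ok: result.status === "fulfilled",
  }));
}

fetchAll(["https://example.com", "https://example.org"]).then((pages) =>
  console.log(pages.map((p) => p.ok)) // → [ true, true ]
);
```

Using `Promise.allSettled` rather than `Promise.all` means a single timed-out URL is reported as an error entry instead of aborting the whole run.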
18
+ ## Installation
19
+
20
+ ### Global Installation
21
+
22
+ ```bash
23
+ npm install -g pagerts
24
+ pagerts <url>
25
+ ```
26
+
27
+ ### Using npx (No Installation Required)
28
+
29
+ ```bash
30
+ npx pagerts <url>
31
+ ```
32
+
33
+ ### From Source
34
+
35
+ ```bash
36
+ git clone https://github.com/akinevz0/pagerts.git
37
+ cd pagerts
38
+ npm install
39
+ npm run build
40
+ npm link
41
+ ```
42
+
43
+ ## Usage
44
+
45
+ ### Basic Usage
46
+
47
+ Extract resources from a remote URL:
48
+
49
+ ```bash
50
+ pagerts https://example.com
51
+ ```
52
+
53
+ Extract from multiple URLs:
54
+
55
+ ```bash
56
+ pagerts https://example.com https://example.org
57
+ ```
58
+
59
+ Extract from a local HTML file with the `file` subcommand (the default `fetch` command accepts only `http://` and `https://` URLs):
60
+
61
+ ```bash
62
+ pagerts file /path/to/file.html
63
+ ```
64
+
65
+ ### Output Format
66
+
67
+ The output is a JSON object containing:
68
+
69
+ ```json
70
+ {
71
+ "title": "Page Title",
72
+ "url": "https://example.com",
73
+ "resources": [
74
+ {
75
+ "name": "Link Text",
76
+ "url": "https://example.com/page"
77
+ }
78
+ ]
79
+ }
80
+ ```
81
+
82
+ Fields:
83
+
84
+ - `title`: The page's title extracted from the `<title>` tag
85
+ - `url`: The URL of the page
86
+ - `resources`: Array of resources found on the page (links, meta tags, embeds)
87
+ - `name`: Readable text or description
88
+ - `url`: Target URL of the resource
89
+
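Because the printer emits a single JSON array of page objects, the output can be consumed programmatically. A minimal sketch, assuming the documented shape above (the `raw` string stands in for captured CLI output):

```javascript
// Sketch: post-process pagerts JSON output. `raw` is a stand-in for the
// CLI's stdout; the object shape follows the fields documented above.
const raw = JSON.stringify([
  {
    title: "Page Title",
    url: "https://example.com",
    resources: [
      { name: "Link Text", url: "https://example.com/page" },
      { name: "Stylesheet", url: "https://example.com/style.css" },
    ],
  },
]);

const pages = JSON.parse(raw);
// Flatten every page's resources into one list of target URLs.
const resourceUrls = pages.flatMap((page) => page.resources.map((r) => r.url));
console.log(resourceUrls);
// → [ 'https://example.com/page', 'https://example.com/style.css' ]
```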
90
+ ## Security
91
+
92
+ PagerTS takes security seriously. See [SECURITY.md](./SECURITY.md) for:
93
+
94
+ - Security features and protections
95
+ - How to report vulnerabilities
96
+ - Best practices for users
97
+ - Security checklist for contributors
98
+
99
+ ### Built-in Security Features
100
+
101
+ - ✅ URL validation (only allows `http://` and `https://`; local files go through the `file` subcommand)
102
+ - ✅ Input sanitization to prevent XSS attacks
103
+ - ✅ Rate limiting (50 requests/minute by default)
104
+ - ✅ Request timeouts to prevent hanging
105
+ - ✅ Maximum URL length enforcement
106
+ - ✅ Suspicious pattern detection
107
+ - ✅ Safe HTML parsing (no script execution)
108
+
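The rate limit quoted above (50 requests/minute) can be pictured as a sliding-window counter. The sketch below is illustrative only, not the package's actual implementation:

```javascript
// Illustrative sliding-window rate limiter matching the documented default
// of 50 requests per minute. NOT the package's actual implementation.
class RateLimiter {
  constructor(maxRequests = 50, windowMs = 60_000) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.timestamps = [];
  }
  // Returns true if the request is allowed, false if over the limit.
  tryAcquire(now = Date.now()) {
    // Drop timestamps that have fallen out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(now);
    return true;
  }
}

const limiter = new RateLimiter(50, 60_000);
let allowed = 0;
for (let i = 0; i < 60; i++) {
  // All 60 attempts land at the same instant, so the window never slides.
  if (limiter.tryAcquire(1_000)) allowed++;
}
console.log(allowed); // → 50
```

Once the window slides past the old timestamps, requests are admitted again; a fixed-window counter would behave similarly but allows bursts at window boundaries.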
109
+ ## Development
110
+
111
+ ### Prerequisites
112
+
113
+ - Node.js >= 20.0.0
114
+ - npm >= 9.0.0
115
+
116
+ ### Setup
117
+
118
+ ```bash
119
+ # Clone the repository
120
+ git clone https://github.com/akinevz0/pagerts.git
121
+ cd pagerts
122
+
123
+ # Install dependencies
124
+ npm install
125
+
126
+ # Run in development mode
127
+ npm run dev <url>
128
+ ```
129
+
130
+ ### Available Scripts
131
+
132
+ ```bash
133
+ # Run tests
134
+ npm test
135
+
136
+ # Run tests in watch mode
137
+ npm test:watch
138
+
139
+ # Build the project
140
+ npm run build
141
+
142
+ # Lint code
143
+ npm run lint
144
+
145
+ # Fix linting issues
146
+ npm run lint:fix
147
+
148
+ # Type check
149
+ npm run type-check
150
+
151
+ # Format code
152
+ npm run format
153
+
154
+ # Check formatting
155
+ npm run format:check
156
+
157
+ # Security audit
158
+ npm run security:audit
159
+
160
+ # Complete security check (audit + lint)
161
+ npm run security:check
162
+ ```
163
+
164
+ ### Project Structure
165
+
166
+ ```
167
+ pagerts/
168
+ ├── src/
169
+ │ ├── main.ts # CLI entry point
170
+ │ ├── security.ts # Security utilities
171
+ │ ├── resource.ts # Resource types
172
+ │ ├── extractors/ # Content extractors
173
+ │ │ ├── AbstractExtractor.ts
174
+ │ │ ├── PageExtractor.ts
175
+ │ │ ├── ResourceExtractor.ts
176
+ │ │ └── TagExtractor.ts
177
+ │ ├── page/ # Page fetching
178
+ │ │ ├── Page.ts
179
+ │ │ └── PageFetcher.ts
180
+ │ ├── printers/ # Output formatters
181
+ │ │ ├── AbstractResourcePrinter.ts
182
+ │ │ ├── JSONStylePrinter.ts
183
+ │ │ └── LogStylePrinter.ts
184
+ │ └── __tests__/ # Test files
185
+ ├── bin/ # Built files
186
+ ├── .github/workflows/ # CI/CD pipelines
187
+ ├── package.json
188
+ ├── tsconfig.json
189
+ ├── jest.config.js
190
+ ├── eslint.config.js
191
+ └── SECURITY.md
192
+ ```
193
+
194
+ ## Contributing
195
+
196
+ Contributions are welcome! Please:
197
+
198
+ 1. Fork the repository
199
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
200
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
201
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
202
+ 5. Open a Pull Request
203
+
204
+ ### Contribution Guidelines
205
+
206
+ - Write tests for new features
207
+ - Follow the existing code style (enforced by ESLint and Prettier)
208
+ - Update documentation as needed
209
+ - Ensure all tests pass (`npm test`)
210
+ - Run security checks (`npm run security:check`)
211
+ - Follow security best practices (see [SECURITY.md](./SECURITY.md))
212
+
213
+ ## License
214
+
215
+ This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.
216
+
217
+ ## Author
218
+
219
+ **Kirill <kine> Nevzorov**
220
+
221
+ ## Support
222
+
223
+ - ๐Ÿ› [Report bugs](https://github.com/akinevz0/pagerts/issues)
224
+ - ๐Ÿ’ก [Request features](https://github.com/akinevz0/pagerts/issues)
225
+ - ๐Ÿ”’ [Report security issues](./SECURITY.md)
226
+
227
+ ## Changelog
228
+
229
+ ### v0.3.0 → v1.4.1 summary
230
+
231
+ Key changes in this range:
232
+
233
+ - Security hardening and dependency-surface reduction (`863389a`).
234
+ - CI/security gate tightening and scan-noise cleanup (`da73bdb`, `46875e8`).
235
+ - Packaging/runtime interoperability fixes for CJS/ESM builds and publishes (`4054ab9`, `74d3f98`, `64b2a2f`, `e67acd6`).
236
+ - Regression fix for ignored script resources (`bc13b55`).
237
+ - Dependency tree refresh/stabilization (`1f8f86d`) and release bump to `v1.4.1` (`8846bec`).
238
+ - General code hardening and cleanup across extractors/fetching/printers, plus lockfile and build artifact maintenance in the same span.
239
+
240
+ ### v0.2.0
241
+
242
+ - Initial public release
package/bin/main.js CHANGED
@@ -1,12 +1,361 @@
1
1
  #!/usr/bin/env node
2
- import{Command as q,createArgument as _,Option as $}from"commander";var w={name:"pagerts",description:"A tool for viewing external relations in a webpage",version:"1.3.0",type:"module",main:"main.js",bin:{pagerts:"bin/main.js"},files:["bin"],engines:{node:">=18.0.0"},scripts:{test:"jest --coverage","test:watch":"jest --watch",build:"esbuild src/main.ts --bundle --packages=external --outdir=bin --minify --sourcemap --platform=node --format=esm",lint:"eslint src/**/*.ts","lint:fix":"eslint src/**/*.ts --fix","type-check":"tsc --noEmit",format:'prettier --write "src/**/*.ts"',"format:check":'prettier --check "src/**/*.ts"',"security:audit":"npm audit --audit-level=moderate","security:check":"npm run security:audit && npm run lint",start:"node ./bin/main.js",dev:"tsx src/main.ts",prepare:"npm run build"},keywords:["webpage","hierarchy","management","web-scraping","cli","url-extraction"],author:"Kirill <kine> Nevzorov",license:"MIT",bugs:{url:"https://github.com/akinevz2/pagerts/issues"},homepage:"https://github.com/akinevz2/pagerts",dependencies:{"@exodus/bytes":"^1.15.0",commander:"^12.1.0",linkedom:"^0.18.9"},devDependencies:{"@types/jest":"^29.5.14","@types/node":"^22.10.5","@typescript-eslint/eslint-plugin":"^8.20.0","@typescript-eslint/parser":"^8.20.0",esbuild:"^0.25.1",eslint:"^9.18.0","eslint-config-prettier":"^9.1.0","eslint-plugin-security":"^3.0.1",jest:"^29.7.0",prettier:"^3.4.2","ts-jest":"^29.2.5",tsx:"^4.19.2",typescript:"^5.7.2"}};var u=class{constructor(t){this.name=t}};var d=class extends u{constructor(){super("page-extractor")}async extract(t){let{window:{document:e},url:r}=t;return{title:e.title,url:r}}};var L=["id","innerText","textContent","class","ariaLabel","ariaDescription","alt"],k=["href","data-src","target","action","src","url"],P=(s,t)=>{let e=s.getAttribute(t);return e!=null&&e.trim()!==""?e:void 0};function b(s){for(let t of L){let e=P(s,t);if(e!==void 0)return{key:t,value:e}}}function v(s){for(let t of k){let e=P(s,t);if(e!==void 
0)return{key:t,value:e}}}var g=class extends u{constructor(e){super("page-extractor");this.tags=e}async extract(e){let{document:r}=e.window;return this.tags.flatMap(o=>Array.from(r.querySelectorAll(o)).flatMap(i=>{let a=v(i);return a?[{text:b(i)??{key:"src",value:a.value},link:a}]:[]}))}};import{readFile as O}from"fs/promises";import{parseHTML as D}from"linkedom";import{legacyHookDecode as S}from"@exodus/bytes/encoding.js";var f=class{timeout;maxRetries;constructor(t=1e4,e=2){this.timeout=t,this.maxRetries=e}buildDOMResult(t,e){let{document:r}=D(t);return{window:{document:r},url:e}}async fetchPage(t,e=0){try{let r;t.startsWith("file://")?r=O(t.substring(7),"utf-8").then(i=>this.buildDOMResult(i,t)):r=fetch(t).then(async i=>{let a=await i.arrayBuffer(),x=i.headers.get("content-type")??"",m=/charset=([^\s;]+)/i.exec(x),n=S(new Uint8Array(a),m?.[1]??"utf-8");return this.buildDOMResult(n,t)});let o=await(this.timeout>0?Promise.race([r,new Promise((i,a)=>setTimeout(()=>a(new Error("Request timeout")),this.timeout))]):r);return{url:t,content:o}}catch(r){let o=r instanceof Error?r.message:"Unknown error";return e<this.maxRetries&&this.isRetryableError(o)?(process.stderr.write(`Retrying ${t} (attempt ${e+1}/${this.maxRetries})...
3
- `),await this.delay(1e3*(e+1)),this.fetchPage(t,e+1)):{url:t,error:`Failed to fetch: ${o}`}}}isRetryableError(t){return[/timeout/i,/ECONNRESET/i,/ETIMEDOUT/i,/ENOTFOUND/i,/network/i].some(r=>r.test(t))}delay(t){return new Promise(e=>setTimeout(e,t))}async fetchAll(t){return(await Promise.all(t.map(r=>this.fetchPage(r)))).filter(r=>r.content!==void 0||r.error)}};var p=class{constructor(){}};var y=class extends p{print(...t){let e=JSON.stringify(t);process.stdout.write(e+`
4
- `)}};var E=["http:","https:","file:"];var K=[/javascript:/i,/data:/i,/vbscript:/i,/<script/i,/on\w+=/i];function N(s){if(!s||!s.trim())return{isValid:!1,error:"URL cannot be empty"};let t=s.trim();if(t.length>2048)return{isValid:!1,error:"URL exceeds maximum length of 2048 characters"};for(let i of K)if(i.test(t))return{isValid:!1,error:"URL contains suspicious patterns"};let e;try{e=new URL(t)}catch{return t.startsWith("file://")?{isValid:!0,sanitizedUrl:t}:{isValid:!1,error:"Invalid URL format"}}if(!E.includes(e.protocol))return{isValid:!1,error:`Protocol ${e.protocol} is not allowed. Allowed protocols: ${E.join(", ")}`};let r=e.hostname.toLowerCase();return(r==="localhost"||r==="127.0.0.1"||r==="::1"||r.startsWith("192.168.")||r.startsWith("10.")||/^172\.(1[6-9]|2\d|3[01])\./.test(r))&&e.protocol!=="file:"&&console.warn(`Warning: Accessing local network resource: ${t}`),{isValid:!0,sanitizedUrl:e.toString()}}function M(s){let t=[],e=[];for(let r of s){let o=N(r);o.isValid&&o.sanitizedUrl?t.push(o.sanitizedUrl):e.push({url:r,error:o.error||"Unknown validation error"})}return{validUrls:t,errors:e}}var{description:I,name:V,version:C}=w,z=new q,F=_("<url | file...>","remote https://URL or local file://resource.html to extract from");(async()=>await z.name(V).version(C,"-v, --version").description(I).addArgument(F).addOption(new $("--watch","keep running: SIGWINCH re-fetches after resize, Ctrl-D releases in-flight requests, Ctrl-C exits")).action(async(s,t)=>{try{let{validUrls:e,errors:r}=M(s);r.length>0&&(console.error(`
5
- \u274C URL Validation Errors:`),r.forEach(({url:n,error:c})=>{console.error(` - ${n}: ${c}`)})),e.length===0&&(console.error(`
6
- \u274C No valid URLs to process. Exiting.`),process.exit(1)),console.error(`
7
- \u2705 Processing ${e.length} valid URL(s)...`);let o=new y,i=new f(t.watch?0:1e4,2),a=new d,x=new g(["a","meta","link","embed","script"]),m=async()=>{let n=await i.fetchAll(e),c=[];for(let{content:l,url:T,error:h}of n){let R=h!==void 0||!l?[]:await x.extract(l),U=h!==void 0||!l?{url:T,error:h??"Unknown error",resources:R}:await a.extract(l);c.push({...U,resources:R})}await o.print(...c)};if(t.watch){process.stdin.resume(),process.on("SIGINT",()=>{process.exit(0)});let n=null;process.stdin.on("end",()=>{n=null});let c=null;process.on("SIGWINCH",()=>{c!==null&&clearTimeout(c),c=setTimeout(()=>{c=null,n=m().catch(l=>{console.error(`
8
- \u274C An error occurred:`,l instanceof Error?l.message:l)})},150)}),n=m(),await n}else await m()}catch(e){console.error(`
9
- \u274C An error occurred:`,e instanceof Error?e.message:e),process.exit(1)}}).parseAsync(process.argv))();
2
+
3
+ // src/main.ts
4
+ import { Command, createArgument, Option } from "commander";
5
+ import { createRequire } from "node:module";
6
+
7
+ // src/extractors/AbstractExtractor.ts
8
+ var AbstractExtractor = class {
9
+ constructor(name2) {
10
+ this.name = name2;
11
+ }
12
+ };
13
+
14
+ // src/extractors/PageExtractor.ts
15
+ var PageExtractor = class extends AbstractExtractor {
16
+ constructor() {
17
+ super("page-extractor");
18
+ }
19
+ async extract(value) {
20
+ const {
21
+ window: { document },
22
+ url
23
+ } = value;
24
+ return { title: document.title, url };
25
+ }
26
+ };
27
+
28
+ // src/resource.ts
29
+ var RESOURCE_DISPLAYABLE_KEYS = [
30
+ "id",
31
+ "innerText",
32
+ "textContent",
33
+ "class",
34
+ "ariaLabel",
35
+ "ariaDescription",
36
+ "alt"
37
+ ];
38
+ var RESOURCE_LINK_KEYS = ["href", "data-src", "target", "action", "src", "url"];
39
+ var readAttr = (element, key) => {
40
+ const v = element.getAttribute(key);
41
+ return v != null && v.trim() !== "" ? v : void 0;
42
+ };
43
+ function findResourceText(element) {
44
+ for (const key of RESOURCE_DISPLAYABLE_KEYS) {
45
+ const value = readAttr(element, key);
46
+ if (value !== void 0) return { key, value };
47
+ }
48
+ return void 0;
49
+ }
50
+ function findResourceLink(element) {
51
+ for (const key of RESOURCE_LINK_KEYS) {
52
+ const value = readAttr(element, key);
53
+ if (value !== void 0) return { key, value };
54
+ }
55
+ return void 0;
56
+ }
57
+
58
+ // src/extractors/ResourceExtractor.ts
59
+ var ResourceExtractor = class extends AbstractExtractor {
60
+ constructor(tags) {
61
+ super("page-extractor");
62
+ this.tags = tags;
63
+ }
64
+ async extract(value) {
65
+ const { document } = value.window;
66
+ return this.tags.flatMap(
67
+ (tag) => Array.from(document.querySelectorAll(tag)).flatMap((element) => {
68
+ const link = findResourceLink(element);
69
+ if (!link) return [];
70
+ const text = findResourceText(element) ?? { key: "src", value: link.value };
71
+ return [{ text, link }];
72
+ })
73
+ );
74
+ }
75
+ };
76
+
77
+ // src/page/PageFetcher.ts
78
+ import { parseHTML } from "linkedom";
79
+ var PageFetcher = class {
80
+ timeout;
81
+ maxRetries;
82
+ constructor(timeout = 1e4, maxRetries = 2) {
83
+ this.timeout = timeout;
84
+ this.maxRetries = maxRetries;
85
+ }
86
+ buildDOMResult(html, url) {
87
+ const { document } = parseHTML(html);
88
+ return { window: { document }, url };
89
+ }
90
+ decodeHtml(buffer, charset) {
91
+ try {
92
+ return new TextDecoder(charset).decode(new Uint8Array(buffer));
93
+ } catch {
94
+ return new TextDecoder("utf-8").decode(new Uint8Array(buffer));
95
+ }
96
+ }
97
+ async fetchPage(url, retryCount = 0) {
98
+ try {
99
+ const domPromise = fetch(url).then(async (response) => {
100
+ const buffer = await response.arrayBuffer();
101
+ const contentType = response.headers.get("content-type") ?? "";
102
+ const charsetMatch = /charset=([^\s;]+)/i.exec(contentType);
103
+ const html = this.decodeHtml(buffer, charsetMatch?.[1] ?? "utf-8");
104
+ return this.buildDOMResult(html, url);
105
+ });
106
+ const content = await (this.timeout > 0 ? Promise.race([
107
+ domPromise,
108
+ new Promise(
109
+ (_, reject) => setTimeout(() => reject(new Error("Request timeout")), this.timeout)
110
+ )
111
+ ]) : domPromise);
112
+ return { url, content };
113
+ } catch (error) {
114
+ const message = error instanceof Error ? error.message : "Unknown error";
115
+ if (retryCount < this.maxRetries && this.isRetryableError(message)) {
116
+ process.stderr.write(`Retrying ${url} (attempt ${retryCount + 1}/${this.maxRetries})...
117
+ `);
118
+ await this.delay(1e3 * (retryCount + 1));
119
+ return this.fetchPage(url, retryCount + 1);
120
+ }
121
+ return { url, error: `Failed to fetch: ${message}` };
122
+ }
123
+ }
124
+ isRetryableError(message) {
125
+ const retryablePatterns = [/timeout/i, /ECONNRESET/i, /ETIMEDOUT/i, /ENOTFOUND/i, /network/i];
126
+ return retryablePatterns.some((pattern) => pattern.test(message));
127
+ }
128
+ delay(ms) {
129
+ return new Promise((resolve) => setTimeout(resolve, ms));
130
+ }
131
+ async fetchAll(urls) {
132
+ const responses = await Promise.all(urls.map((url) => this.fetchPage(url)));
133
+ return responses.filter((response) => response.content !== void 0 || response.error);
134
+ }
135
+ };
136
+
137
+ // src/page/FileFetcher.ts
138
+ import { readFile } from "node:fs/promises";
139
+ import { parseHTML as parseHTML2 } from "linkedom";
140
+ var MAX_FILES_FAILSAFE = 254;
141
+ var FileFetcher = class {
142
+ buildDOMResult(html, filePath) {
143
+ const { document } = parseHTML2(html);
144
+ return { window: { document }, url: `file://${filePath}` };
145
+ }
146
+ async fetchFile(filePath) {
147
+ try {
148
+ const html = await readFile(filePath, "utf-8");
149
+ return { path: filePath, content: this.buildDOMResult(html, filePath) };
150
+ } catch (error) {
151
+ return {
152
+ path: filePath,
153
+ error: error instanceof Error ? error.message : "Unknown error"
154
+ };
155
+ }
156
+ }
157
+ async fetchAll(filePaths) {
158
+ return Promise.all(filePaths.map((p) => this.fetchFile(p)));
159
+ }
160
+ };
161
+
162
+ // src/printers/AbstractResourcePrinter.ts
163
+ var AbstractResourcePrinter = class {
164
+ constructor() {
165
+ }
166
+ };
167
+
168
+ // src/printers/JSONStylePrinter.ts
169
+ var JSONStylePrinter = class extends AbstractResourcePrinter {
170
+ print(...pages) {
171
+ const json = JSON.stringify(pages);
172
+ process.stdout.write(json + "\n");
173
+ }
174
+ };
175
+
176
+ // src/security.ts
177
+ var ALLOWED_PROTOCOLS = ["http:", "https:"];
178
+ var MAX_URL_LENGTH = 2048;
179
+ var SUSPICIOUS_PATTERNS = [
180
+ /javascript:/i,
181
+ /data:/i,
182
+ /vbscript:/i,
183
+ /<script/i,
184
+ /on\w+=/i
185
+ // Event handlers like onclick=
186
+ ];
187
+ function validateUrl(url) {
188
+ if (!url || !url.trim()) {
189
+ return {
190
+ isValid: false,
191
+ error: "URL cannot be empty"
192
+ };
193
+ }
194
+ const trimmedUrl = url.trim();
195
+ if (trimmedUrl.length > MAX_URL_LENGTH) {
196
+ return {
197
+ isValid: false,
198
+ error: `URL exceeds maximum length of ${MAX_URL_LENGTH} characters`
199
+ };
200
+ }
201
+ for (const pattern of SUSPICIOUS_PATTERNS) {
202
+ if (pattern.test(trimmedUrl)) {
203
+ return {
204
+ isValid: false,
205
+ error: "URL contains suspicious patterns"
206
+ };
207
+ }
208
+ }
209
+ let parsedUrl;
210
+ try {
211
+ parsedUrl = new URL(trimmedUrl);
212
+ } catch {
213
+ return {
214
+ isValid: false,
215
+ error: "Invalid URL format"
216
+ };
217
+ }
218
+ if (!ALLOWED_PROTOCOLS.includes(parsedUrl.protocol)) {
219
+ return {
220
+ isValid: false,
221
+ error: `Protocol ${parsedUrl.protocol} is not allowed. Allowed protocols: ${ALLOWED_PROTOCOLS.join(", ")}`
222
+ };
223
+ }
224
+ const hostname = parsedUrl.hostname.toLowerCase();
225
+ const isLocalhost = hostname === "localhost" || hostname === "127.0.0.1" || hostname === "::1" || hostname.startsWith("192.168.") || hostname.startsWith("10.") || /^172\.(1[6-9]|2\d|3[01])\./.test(hostname);
226
+ if (isLocalhost) {
227
+ console.warn(`Warning: Accessing local network resource: ${trimmedUrl}`);
228
+ }
229
+ return {
230
+ isValid: true,
231
+ sanitizedUrl: parsedUrl.toString()
232
+ };
233
+ }
234
+ function validateUrls(urls) {
235
+ const validUrls = [];
236
+ const errors = [];
237
+ for (const url of urls) {
238
+ const result = validateUrl(url);
239
+ if (result.isValid && result.sanitizedUrl) {
240
+ validUrls.push(result.sanitizedUrl);
241
+ } else {
242
+ errors.push({
243
+ url,
244
+ error: result.error || "Unknown validation error"
245
+ });
246
+ }
247
+ }
248
+ return { validUrls, errors };
249
+ }
250
+
251
+ // src/main.ts
252
+ var require2 = createRequire(import.meta.url);
253
+ var pkg = require2("../package.json");
254
+ var { description, name, version } = pkg;
255
+ var program = new Command();
256
+ var urlArg = createArgument("<url...>", "remote https://URL to extract from");
257
+ var fileArg = createArgument("<paths...>", "local file paths to extract from");
258
+ var pageExtractor = new PageExtractor();
259
+ var resourceExtractor = new ResourceExtractor(["a", "meta", "link", "embed", "script"]);
260
+ var printer = new JSONStylePrinter();
261
+ async function buildPageMetadata(responses) {
262
+ const pageMetadatas = [];
263
+ for (const { content, url: responseUrl, path, error } of responses) {
264
+ const resolvedUrl = responseUrl ?? path ?? "";
265
+ const resources = error !== void 0 || !content ? [] : await resourceExtractor.extract(content);
266
+ const descriptor = error !== void 0 || !content ? { url: resolvedUrl, error: error ?? "Unknown error", resources } : await pageExtractor.extract(content);
267
+ pageMetadatas.push({ ...descriptor, resources });
268
+ }
269
+ return pageMetadatas;
270
+ }
271
+ (async () => {
272
+ program.name(name).version(version, "-v, --version").description(description);
273
+ program.command("fetch", { isDefault: true }).description("fetch and extract resources from remote URL(s)").addArgument(urlArg).addOption(
274
+ new Option(
275
+ "--watch",
276
+ "keep running: SIGWINCH re-fetches after resize, Ctrl-D releases in-flight requests, Ctrl-C exits"
277
+ )
278
+ ).action(async (urls, options) => {
279
+    try {
+      const { validUrls, errors } = validateUrls(urls);
+      if (errors.length > 0) {
+        console.error("\n\u274C URL Validation Errors:");
+        errors.forEach(({ url: invalidUrl, error }) => {
+          console.error(` - ${invalidUrl}: ${error}`);
+        });
+      }
+      if (validUrls.length === 0) {
+        console.error("\n\u274C No valid URLs to process. Exiting.");
+        process.exit(1);
+      }
+      console.error(`
+\u2705 Processing ${validUrls.length} valid URL(s)...`);
+      const pageFetcher = new PageFetcher(options.watch ? 0 : 1e4, 2);
+      const execute = async () => {
+        const responses = await pageFetcher.fetchAll(validUrls);
+        const pageMetadatas = await buildPageMetadata(responses);
+        await printer.print(...pageMetadatas);
+      };
+      if (options.watch) {
+        process.stdin.resume();
+        process.on("SIGINT", () => process.exit(0));
+        let activeExecution = null;
+        process.stdin.on("end", () => {
+          activeExecution = null;
+        });
+        let winchTimer = null;
+        process.on("SIGWINCH", () => {
+          if (winchTimer !== null) clearTimeout(winchTimer);
+          winchTimer = setTimeout(() => {
+            winchTimer = null;
+            activeExecution = execute().catch((err) => {
+              console.error("\n\u274C An error occurred:", err instanceof Error ? err.message : err);
+            });
+          }, 150);
+        });
+        activeExecution = execute();
+        await activeExecution;
+      } else {
+        await execute();
+      }
+    } catch (error) {
+      console.error("\n\u274C An error occurred:", error instanceof Error ? error.message : error);
+      process.exit(1);
+    }
+  });
+  program.command("file").description("extract resources from local file(s) via direct filesystem access").addArgument(fileArg).addOption(
+    new Option("--no-failsafe", `bypass the ${MAX_FILES_FAILSAFE}-file limit safety check`)
+  ).action(async (paths, options) => {
+    try {
+      if (options.failsafe && paths.length > MAX_FILES_FAILSAFE) {
+        console.error(
+          `
+\u274C ${paths.length} files specified exceeds the safety limit of ${MAX_FILES_FAILSAFE}.`
+        );
+        console.error(` Pass --no-failsafe to bypass this check and process all files.`);
+        process.exit(1);
+      }
+      if (!options.failsafe && paths.length > MAX_FILES_FAILSAFE) {
+        console.error(
+          `
+\u26A0\uFE0F Failsafe bypassed: processing ${paths.length} files (limit is ${MAX_FILES_FAILSAFE}).`
+        );
+      }
+      console.error(`
+\u2705 Processing ${paths.length} file(s)...`);
+      const fileFetcher = new FileFetcher();
+      const responses = await fileFetcher.fetchAll(paths);
+      const pageMetadatas = await buildPageMetadata(
+        responses.map(({ path, content, error }) => ({ path, content, error }))
+      );
+      await printer.print(...pageMetadatas);
+    } catch (error) {
+      console.error("\n\u274C An error occurred:", error instanceof Error ? error.message : error);
+      process.exit(1);
+    }
+  });
+  await program.parseAsync(process.argv);
+})();
  /**
   * @license MIT
   * We are interested in visualising a page as a collection of tags.
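The `--watch` mode in the hunk above debounces `SIGWINCH`: rapid resize signals reset a 150 ms timer, so only the last one triggers a re-fetch. That pattern can be isolated as a small helper; a minimal sketch, where `debounced` and the `Task` type are illustrative names, not part of the package:

```typescript
// Sketch of the --watch debounce pattern from the bundle above: repeated
// triggers reset a short timer, so only the last trigger runs `task`.
type Task = () => Promise<void>;

function debounced(task: Task, delayMs = 150): () => void {
  let timer: ReturnType<typeof setTimeout> | null = null;
  return () => {
    // A newer trigger supersedes the pending one.
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      // Fire-and-forget with error logging, mirroring
      // `activeExecution = execute().catch(...)` in the bundle.
      void task().catch((err: unknown) => {
        console.error(err instanceof Error ? err.message : err);
      });
    }, delayMs);
  };
}
```

Wiring it up then matches the diff: `process.on('SIGWINCH', debounced(execute));`.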
package/bin/main.js.map CHANGED
@@ -1,7 +1,7 @@
  {
  "version": 3,
- "sources": ["../src/main.ts", "../package.json", "../src/extractors/AbstractExtractor.ts", "../src/extractors/PageExtractor.ts", "../src/resource.ts", "../src/extractors/ResourceExtractor.ts", "../src/page/PageFetcher.ts", "../src/printers/AbstractResourcePrinter.ts", "../src/printers/JSONStylePrinter.ts", "../src/security.ts"],
- "sourcesContent": ["#!/usr/bin/env node\nimport { Command, createArgument, Option } from 'commander';\n\nimport pkg from '../package.json' with { type: 'json' };\nimport { PageExtractor, ResourceExtractor } from './extractors/index.js';\nimport { PageFetcher, type PageMetadata } from './page/index.js';\nimport { JSONStylePrinter } from './printers/index.js';\nimport { validateUrls } from './security.js';\n\nconst { description, name, version } = pkg;\n\nconst program = new Command();\n\nconst url = createArgument(\n '<url | file...>',\n 'remote https://URL or local file://resource.html to extract from'\n);\n\n(async (): Promise<void> => {\n await program\n .name(name)\n .version(version, '-v, --version')\n .description(description)\n .addArgument(url)\n .addOption(new Option('--watch', 'keep running: SIGWINCH re-fetches after resize, Ctrl-D releases in-flight requests, Ctrl-C exits'))\n .action(async (urls: string[], options: { watch: boolean }) => {\n try {\n // Validate URLs first\n const { validUrls, errors } = validateUrls(urls);\n\n // Report validation errors\n if (errors.length > 0) {\n console.error('\\n\u274C URL Validation Errors:');\n errors.forEach(({ url: invalidUrl, error }) => {\n console.error(` - ${invalidUrl}: ${error}`);\n });\n }\n\n // Exit if no valid URLs\n if (validUrls.length === 0) {\n console.error('\\n\u274C No valid URLs to process. Exiting.');\n process.exit(1);\n }\n\n console.error(`\\n\u2705 Processing ${validUrls.length} valid URL(s)...`);\n\n const printer = new JSONStylePrinter();\n // watch mode is unbounded (timeout=0); default mode uses 10s timeout\n const pageFetcher = new PageFetcher(options.watch ? 
0 : 10000, 2);\n const pageExtractor = new PageExtractor();\n const resourceExtractor = new ResourceExtractor(['a', 'meta', 'link', 'embed', 'script']);\n\n const execute = async (): Promise<void> => {\n const pageResponses = await pageFetcher.fetchAll(validUrls);\n const pageMetadatas: PageMetadata[] = [];\n\n for (const { content, url: responseUrl, error } of pageResponses) {\n const resources =\n error !== undefined || !content ? [] : await resourceExtractor.extract(content);\n const descriptor =\n error !== undefined || !content\n ? { url: responseUrl, error: error ?? 'Unknown error', resources }\n : await pageExtractor.extract(content);\n pageMetadatas.push({ ...descriptor, resources });\n\n\n }\n\n await printer.print(...pageMetadatas);\n };\n\n if (options.watch) {\n process.stdin.resume();\n\n process.on('SIGINT', () => {\n process.exit(0);\n });\n\n let activeExecution: Promise<void> | null = null;\n\n process.stdin.on('end', () => {\n // Ctrl-D: detach in-flight requests and let them fly off\n activeExecution = null;\n });\n\n let winchTimer: ReturnType<typeof setTimeout> | null = null;\n process.on('SIGWINCH', () => {\n if (winchTimer !== null) clearTimeout(winchTimer);\n winchTimer = setTimeout(() => {\n winchTimer = null;\n activeExecution = execute().catch((err: unknown) => {\n console.error('\\n\u274C An error occurred:', err instanceof Error ? err.message : err);\n });\n }, 150);\n });\n\n activeExecution = execute();\n await activeExecution;\n } else {\n await execute();\n }\n } catch (error) {\n console.error('\\n\u274C An error occurred:', error instanceof Error ? 
error.message : error);\n process.exit(1);\n }\n })\n .parseAsync(process.argv);\n})();\n", "{\r\n \"name\": \"pagerts\",\r\n \"description\": \"A tool for viewing external relations in a webpage\",\r\n \"version\": \"1.3.0\",\r\n \"type\": \"module\",\r\n \"main\": \"main.js\",\r\n \"bin\": {\r\n \"pagerts\": \"bin/main.js\"\r\n },\r\n \"files\": [\r\n \"bin\"\r\n ],\r\n \"engines\": {\r\n \"node\": \">=18.0.0\"\r\n },\r\n \"scripts\": {\r\n \"test\": \"jest --coverage\",\r\n \"test:watch\": \"jest --watch\",\r\n \"build\": \"esbuild src/main.ts --bundle --packages=external --outdir=bin --minify --sourcemap --platform=node --format=esm\",\r\n \"lint\": \"eslint src/**/*.ts\",\r\n \"lint:fix\": \"eslint src/**/*.ts --fix\",\r\n \"type-check\": \"tsc --noEmit\",\r\n \"format\": \"prettier --write \\\"src/**/*.ts\\\"\",\r\n \"format:check\": \"prettier --check \\\"src/**/*.ts\\\"\",\r\n \"security:audit\": \"npm audit --audit-level=moderate\",\r\n \"security:check\": \"npm run security:audit && npm run lint\",\r\n \"start\": \"node ./bin/main.js\",\r\n \"dev\": \"tsx src/main.ts\",\r\n \"prepare\": \"npm run build\"\r\n },\r\n \"keywords\": [\r\n \"webpage\",\r\n \"hierarchy\",\r\n \"management\",\r\n \"web-scraping\",\r\n \"cli\",\r\n \"url-extraction\"\r\n ],\r\n \"author\": \"Kirill <kine> Nevzorov\",\r\n \"license\": \"MIT\",\r\n \"bugs\": {\r\n \"url\": \"https://github.com/akinevz2/pagerts/issues\"\r\n },\r\n \"homepage\": \"https://github.com/akinevz2/pagerts\",\r\n \"dependencies\": {\r\n \"@exodus/bytes\": \"^1.15.0\",\r\n \"commander\": \"^12.1.0\",\r\n \"linkedom\": \"^0.18.9\"\r\n },\r\n \"devDependencies\": {\r\n \"@types/jest\": \"^29.5.14\",\r\n \"@types/node\": \"^22.10.5\",\r\n \"@typescript-eslint/eslint-plugin\": \"^8.20.0\",\r\n \"@typescript-eslint/parser\": \"^8.20.0\",\r\n \"esbuild\": \"^0.25.1\",\r\n \"eslint\": \"^9.18.0\",\r\n \"eslint-config-prettier\": \"^9.1.0\",\r\n \"eslint-plugin-security\": \"^3.0.1\",\r\n \"jest\": \"^29.7.0\",\r\n 
\"prettier\": \"^3.4.2\",\r\n \"ts-jest\": \"^29.2.5\",\r\n \"tsx\": \"^4.19.2\",\r\n \"typescript\": \"^5.7.2\"\r\n }\r\n}", "export abstract class AbstractExtractor<V, R> {\n constructor(readonly name: string) {}\n abstract extract(value: V): Promise<R>;\n}\n", "import type { Page } from '../page/index.js';\nimport type { DOMResult } from '../page/index.js';\nimport { AbstractExtractor } from './AbstractExtractor.js';\n\nexport class PageExtractor extends AbstractExtractor<DOMResult, Page> {\n constructor() {\n super('page-extractor');\n }\n\n async extract(value: DOMResult): Promise<Page> {\n const { window: { document }, url } = value;\n return { title: document.title, url };\n }\n}\n", "/**\n * @license MIT\n * We are interested in visualising a page as a collection of tags.\n *\n * We wish to work with tags that can be compactly previewed on a webpage.\n * Here we must declare all of the element types that can be used to represent\n * a resource that can be hyperlinked off a webpage.\n */\ntype Tags = HTMLElementTagNameMap;\n\nexport const RESOURCE_DISPLAYABLE_KEYS = [\n 'id',\n 'innerText',\n 'textContent',\n 'class',\n 'ariaLabel',\n 'ariaDescription',\n 'alt',\n] as const;\n\nexport type DisplayableKey = (typeof RESOURCE_DISPLAYABLE_KEYS)[number];\n\nexport const RESOURCE_LINK_KEYS = ['href', 'data-src', 'target', 'action', 'src', 'url'] as const;\n\nexport type LinkKey = (typeof RESOURCE_LINK_KEYS)[number];\n\nexport type AttributeKey = DisplayableKey | LinkKey;\n\nexport type ResourceKey = { key: AttributeKey; value: string };\nexport type ResourceLink = { key: LinkKey; value: string };\n\nexport type ExternalResource = {\n text: ResourceKey;\n link: ResourceLink;\n};\n\nexport type Tag = keyof Tags;\n\nexport type Resource = HTMLElement & {\n [K in AttributeKey]?: string | null;\n};\n\nexport type ResourceByName<T extends keyof Tags> = Tags[T];\n\n// --- adapters ---\n\nconst readAttr = (element: Resource, key: AttributeKey): string | undefined => {\n 
const v = element.getAttribute(key);\n return v != null && v.trim() !== '' ? v : undefined;\n};\n\nexport function findResourceText(element: Resource): ResourceKey | undefined {\n for (const key of RESOURCE_DISPLAYABLE_KEYS) {\n const value = readAttr(element, key);\n if (value !== undefined) return { key, value };\n }\n return undefined;\n}\n\nexport function findResourceLink(element: Resource): ResourceLink | undefined {\n for (const key of RESOURCE_LINK_KEYS) {\n const value = readAttr(element, key);\n if (value !== undefined) return { key, value };\n }\n return undefined;\n}\n\nexport const isResourceKey = (key: string): key is AttributeKey =>\n (RESOURCE_DISPLAYABLE_KEYS as readonly string[]).includes(key) ||\n (RESOURCE_LINK_KEYS as readonly string[]).includes(key);\n", "import type { DOMResult } from '../page/index.js';\nimport {\n findResourceLink,\n findResourceText,\n type ExternalResource,\n type Resource,\n type Tag,\n} from '../resource.js';\nimport { AbstractExtractor } from './AbstractExtractor.js';\n\nexport class ResourceExtractor extends AbstractExtractor<DOMResult, ExternalResource[]> {\n constructor(private readonly tags: Tag[]) {\n super('page-extractor');\n }\n async extract(value: DOMResult): Promise<ExternalResource[]> {\n const { document } = value.window;\n return this.tags.flatMap((tag) =>\n Array.from(document.querySelectorAll<Resource>(tag)).flatMap((element) => {\n const link = findResourceLink(element);\n if (!link) return [];\n const text = findResourceText(element) ?? 
{ key: 'src' as const, value: link.value };\n return [{ text, link }];\n })\n );\n }\n}\n", "import { readFile } from 'fs/promises';\nimport { parseHTML } from 'linkedom';\nimport { legacyHookDecode } from '@exodus/bytes/encoding.js';\n\nexport interface DOMResult {\n window: { document: Document };\n url: string;\n}\n\ninterface PageResponse {\n url: string;\n content?: DOMResult;\n error?: string;\n}\n\nexport class PageFetcher {\n private readonly timeout: number;\n private readonly maxRetries: number;\n\n constructor(timeout = 10000, maxRetries = 2) {\n this.timeout = timeout;\n this.maxRetries = maxRetries;\n }\n\n private buildDOMResult(html: string, url: string): DOMResult {\n const { document } = parseHTML(html) as { document: Document };\n return { window: { document }, url };\n }\n\n private async fetchPage(url: string, retryCount = 0): Promise<PageResponse> {\n try {\n let domPromise: Promise<DOMResult>;\n\n if (url.startsWith('file://')) {\n domPromise = readFile(url.substring(7), 'utf-8').then((html) =>\n this.buildDOMResult(html, url)\n );\n } else {\n domPromise = fetch(url).then(async (response) => {\n const buffer = await response.arrayBuffer();\n const contentType = response.headers.get('content-type') ?? '';\n const charsetMatch = /charset=([^\\s;]+)/i.exec(contentType);\n const html = legacyHookDecode(new Uint8Array(buffer), charsetMatch?.[1] ?? 'utf-8');\n return this.buildDOMResult(html, url);\n });\n }\n\n const content = await (this.timeout > 0\n ? Promise.race([\n domPromise,\n new Promise<never>((_, reject) =>\n setTimeout(() => reject(new Error('Request timeout')), this.timeout)\n ),\n ])\n : domPromise);\n\n return { url, content };\n } catch (error) {\n const message = error instanceof Error ? 
error.message : 'Unknown error';\n\n // Retry logic for transient errors\n if (retryCount < this.maxRetries && this.isRetryableError(message)) {\n process.stderr.write(`Retrying ${url} (attempt ${retryCount + 1}/${this.maxRetries})...\\n`);\n await this.delay(1000 * (retryCount + 1)); // Exponential backoff\n return this.fetchPage(url, retryCount + 1);\n }\n\n return { url, error: `Failed to fetch: ${message}` };\n }\n }\n\n private isRetryableError(message: string): boolean {\n const retryablePatterns = [/timeout/i, /ECONNRESET/i, /ETIMEDOUT/i, /ENOTFOUND/i, /network/i];\n return retryablePatterns.some((pattern) => pattern.test(message));\n }\n\n private delay(ms: number): Promise<void> {\n return new Promise((resolve) => setTimeout(resolve, ms));\n }\n\n async fetchAll(urls: string[]): Promise<PageResponse[]> {\n const responses = await Promise.all(urls.map((url) => this.fetchPage(url)));\n return responses.filter((response) => response.content !== undefined || response.error);\n }\n}\n", "import type { PageMetadata } from '../page/index.js';\n\nexport abstract class AbstractResourcePrinter {\n constructor() {}\n abstract print(...pages: PageMetadata[]): void | Promise<void>;\n}\n", "import type { PageMetadata } from '../page/index.js';\nimport { AbstractResourcePrinter } from './AbstractResourcePrinter.js';\n\nexport class JSONStylePrinter extends AbstractResourcePrinter {\n print(...pages: PageMetadata[]): void | Promise<void> {\n const json = JSON.stringify(pages);\n process.stdout.write(json + '\\n');\n }\n}\n", "/**\n * Security utilities for URL validation and sanitization\n */\n\nconst ALLOWED_PROTOCOLS = ['http:', 'https:', 'file:'];\nconst MAX_URL_LENGTH = 2048;\nconst SUSPICIOUS_PATTERNS = [\n /javascript:/i,\n /data:/i,\n /vbscript:/i,\n /<script/i,\n /on\\w+=/i, // Event handlers like onclick=\n];\n\nexport interface ValidationResult {\n isValid: boolean;\n error?: string;\n sanitizedUrl?: string;\n}\n\n/**\n * Validates a URL for security concerns\n 
* @param url - The URL to validate\n * @returns ValidationResult object with validation status\n */\nexport function validateUrl(url: string): ValidationResult {\n // Check if URL is empty or whitespace\n if (!url || !url.trim()) {\n return {\n isValid: false,\n error: 'URL cannot be empty',\n };\n }\n\n const trimmedUrl = url.trim();\n\n // Check URL length to prevent DoS\n if (trimmedUrl.length > MAX_URL_LENGTH) {\n return {\n isValid: false,\n error: `URL exceeds maximum length of ${MAX_URL_LENGTH} characters`,\n };\n }\n\n // Check for suspicious patterns\n for (const pattern of SUSPICIOUS_PATTERNS) {\n if (pattern.test(trimmedUrl)) {\n return {\n isValid: false,\n error: 'URL contains suspicious patterns',\n };\n }\n }\n\n // Parse the URL\n let parsedUrl: URL;\n try {\n parsedUrl = new URL(trimmedUrl);\n } catch (error) {\n // If URL parsing fails, it might be a file path\n if (trimmedUrl.startsWith('file://')) {\n return {\n isValid: true,\n sanitizedUrl: trimmedUrl,\n };\n }\n return {\n isValid: false,\n error: 'Invalid URL format',\n };\n }\n\n // Check protocol\n if (!ALLOWED_PROTOCOLS.includes(parsedUrl.protocol)) {\n return {\n isValid: false,\n error: `Protocol ${parsedUrl.protocol} is not allowed. 
Allowed protocols: ${ALLOWED_PROTOCOLS.join(', ')}`,\n };\n }\n\n // Check for localhost/internal IPs in production (security consideration)\n const hostname = parsedUrl.hostname.toLowerCase();\n const isLocalhost =\n hostname === 'localhost' ||\n hostname === '127.0.0.1' ||\n hostname === '::1' ||\n hostname.startsWith('192.168.') ||\n hostname.startsWith('10.') ||\n /^172\\.(1[6-9]|2\\d|3[01])\\./.test(hostname);\n\n if (isLocalhost && parsedUrl.protocol !== 'file:') {\n // Allow but warn about localhost URLs\n console.warn(`Warning: Accessing local network resource: ${trimmedUrl}`);\n }\n\n return {\n isValid: true,\n sanitizedUrl: parsedUrl.toString(),\n };\n}\n\n/**\n * Validates an array of URLs\n * @param urls - Array of URLs to validate\n * @returns Object with valid URLs and errors\n */\nexport function validateUrls(urls: string[]): {\n validUrls: string[];\n errors: Array<{ url: string; error: string }>;\n} {\n const validUrls: string[] = [];\n const errors: Array<{ url: string; error: string }> = [];\n\n for (const url of urls) {\n const result = validateUrl(url);\n if (result.isValid && result.sanitizedUrl) {\n validUrls.push(result.sanitizedUrl);\n } else {\n errors.push({\n url,\n error: result.error || 'Unknown validation error',\n });\n }\n }\n\n return { validUrls, errors };\n}\n\n/**\n * Rate limiter to prevent abuse\n */\nexport class RateLimiter {\n private requests: number[] = [];\n private readonly maxRequests: number;\n private readonly windowMs: number;\n\n constructor(maxRequests = 10, windowMs = 60000) {\n this.maxRequests = maxRequests;\n this.windowMs = windowMs;\n }\n\n /**\n * Check if a request is allowed under rate limiting\n * @returns true if request is allowed, false otherwise\n */\n public isAllowed(): boolean {\n const now = Date.now();\n\n // Remove old requests outside the time window\n this.requests = this.requests.filter((time) => now - time < this.windowMs);\n\n if (this.requests.length >= this.maxRequests) {\n return 
false;\n }\n\n this.requests.push(now);\n return true;\n }\n\n /**\n * Get remaining requests in current window\n */\n public getRemainingRequests(): number {\n const now = Date.now();\n this.requests = this.requests.filter((time) => now - time < this.windowMs);\n return Math.max(0, this.maxRequests - this.requests.length);\n }\n}\n\n/**\n * Sanitizes HTML content to prevent XSS attacks\n * @param text - Text to sanitize\n * @returns Sanitized text\n */\nexport function sanitizeText(text: string): string {\n if (!text) return '';\n\n return text\n .replace(/</g, '&lt;')\n .replace(/>/g, '&gt;')\n .replace(/\"/g, '&quot;')\n .replace(/'/g, '&#x27;')\n .replace(/\\//g, '&#x2F;');\n}\n"],
- "mappings": ";AACA,OAAS,WAAAA,EAAS,kBAAAC,EAAgB,UAAAC,MAAc,YCDhD,IAAAC,EAAA,CACE,KAAQ,UACR,YAAe,qDACf,QAAW,QACX,KAAQ,SACR,KAAQ,UACR,IAAO,CACL,QAAW,aACb,EACA,MAAS,CACP,KACF,EACA,QAAW,CACT,KAAQ,UACV,EACA,QAAW,CACT,KAAQ,kBACR,aAAc,eACd,MAAS,kHACT,KAAQ,qBACR,WAAY,2BACZ,aAAc,eACd,OAAU,iCACV,eAAgB,iCAChB,iBAAkB,mCAClB,iBAAkB,yCAClB,MAAS,qBACT,IAAO,kBACP,QAAW,eACb,EACA,SAAY,CACV,UACA,YACA,aACA,eACA,MACA,gBACF,EACA,OAAU,yBACV,QAAW,MACX,KAAQ,CACN,IAAO,4CACT,EACA,SAAY,sCACZ,aAAgB,CACd,gBAAiB,UACjB,UAAa,UACb,SAAY,SACd,EACA,gBAAmB,CACjB,cAAe,WACf,cAAe,WACf,mCAAoC,UACpC,4BAA6B,UAC7B,QAAW,UACX,OAAU,UACV,yBAA0B,SAC1B,yBAA0B,SAC1B,KAAQ,UACR,SAAY,SACZ,UAAW,UACX,IAAO,UACP,WAAc,QAChB,CACF,EChEO,IAAeC,EAAf,KAAuC,CAC5C,YAAqBC,EAAc,CAAd,UAAAA,CAAe,CAEtC,ECCO,IAAMC,EAAN,cAA4BC,CAAmC,CACpE,aAAc,CACZ,MAAM,gBAAgB,CACxB,CAEA,MAAM,QAAQC,EAAiC,CAC7C,GAAM,CAAE,OAAQ,CAAE,SAAAC,CAAS,EAAG,IAAAC,CAAI,EAAIF,EACtC,MAAO,CAAE,MAAOC,EAAS,MAAO,IAAAC,CAAI,CACtC,CACF,ECHO,IAAMC,EAA4B,CACvC,KACA,YACA,cACA,QACA,YACA,kBACA,KACF,EAIaC,EAAqB,CAAC,OAAQ,WAAY,SAAU,SAAU,MAAO,KAAK,EAwBjFC,EAAW,CAACC,EAAmBC,IAA0C,CAC7E,IAAMC,EAAIF,EAAQ,aAAaC,CAAG,EAClC,OAAOC,GAAK,MAAQA,EAAE,KAAK,IAAM,GAAKA,EAAI,MAC5C,EAEO,SAASC,EAAiBH,EAA4C,CAC3E,QAAWC,KAAOJ,EAA2B,CAC3C,IAAMO,EAAQL,EAASC,EAASC,CAAG,EACnC,GAAIG,IAAU,OAAW,MAAO,CAAE,IAAAH,EAAK,MAAAG,CAAM,CAC/C,CAEF,CAEO,SAASC,EAAiBL,EAA6C,CAC5E,QAAWC,KAAOH,EAAoB,CACpC,IAAMM,EAAQL,EAASC,EAASC,CAAG,EACnC,GAAIG,IAAU,OAAW,MAAO,CAAE,IAAAH,EAAK,MAAAG,CAAM,CAC/C,CAEF,CCvDO,IAAME,EAAN,cAAgCC,CAAiD,CACtF,YAA6BC,EAAa,CACxC,MAAM,gBAAgB,EADK,UAAAA,CAE7B,CACA,MAAM,QAAQC,EAA+C,CAC3D,GAAM,CAAE,SAAAC,CAAS,EAAID,EAAM,OAC3B,OAAO,KAAK,KAAK,QAASE,GACxB,MAAM,KAAKD,EAAS,iBAA2BC,CAAG,CAAC,EAAE,QAASC,GAAY,CACxE,IAAMC,EAAOC,EAAiBF,CAAO,EACrC,OAAKC,EAEE,CAAC,CAAE,KADGE,EAAiBH,CAAO,GAAK,CAAE,IAAK,MAAgB,MAAOC,EAAK,KAAM,EACnE,KAAAA,CAAK,CAAC,EAFJ,CAAC,CAGrB,CAAC,CACH,CACF,CACF,ECzBA,OAAS,YAAAG,MAAgB,cACzB,OAAS,aAAAC,MAAiB,WAC1B,OAAS,oBAAAC,MAAwB,4BAa1B,IAAMC,EAAN,KAAkB,CACN,QACA,WAEjB,YAAYC,EAAU,IAAOC,EAAa,EAAG,CA
C3C,KAAK,QAAUD,EACf,KAAK,WAAaC,CACpB,CAEQ,eAAeC,EAAcC,EAAwB,CAC3D,GAAM,CAAE,SAAAC,CAAS,EAAIP,EAAUK,CAAI,EACnC,MAAO,CAAE,OAAQ,CAAE,SAAAE,CAAS,EAAG,IAAAD,CAAI,CACrC,CAEA,MAAc,UAAUA,EAAaE,EAAa,EAA0B,CAC1E,GAAI,CACF,IAAIC,EAEAH,EAAI,WAAW,SAAS,EAC1BG,EAAaV,EAASO,EAAI,UAAU,CAAC,EAAG,OAAO,EAAE,KAAMD,GACrD,KAAK,eAAeA,EAAMC,CAAG,CAC/B,EAEAG,EAAa,MAAMH,CAAG,EAAE,KAAK,MAAOI,GAAa,CAC/C,IAAMC,EAAS,MAAMD,EAAS,YAAY,EACpCE,EAAcF,EAAS,QAAQ,IAAI,cAAc,GAAK,GACtDG,EAAe,qBAAqB,KAAKD,CAAW,EACpDP,EAAOJ,EAAiB,IAAI,WAAWU,CAAM,EAAGE,IAAe,CAAC,GAAK,OAAO,EAClF,OAAO,KAAK,eAAeR,EAAMC,CAAG,CACtC,CAAC,EAGH,IAAMQ,EAAU,MAAO,KAAK,QAAU,EAClC,QAAQ,KAAK,CACXL,EACA,IAAI,QAAe,CAACM,EAAGC,IACrB,WAAW,IAAMA,EAAO,IAAI,MAAM,iBAAiB,CAAC,EAAG,KAAK,OAAO,CACrE,CACF,CAAC,EACDP,GAEJ,MAAO,CAAE,IAAAH,EAAK,QAAAQ,CAAQ,CACxB,OAASG,EAAO,CACd,IAAMC,EAAUD,aAAiB,MAAQA,EAAM,QAAU,gBAGzD,OAAIT,EAAa,KAAK,YAAc,KAAK,iBAAiBU,CAAO,GAC/D,QAAQ,OAAO,MAAM,YAAYZ,CAAG,aAAaE,EAAa,CAAC,IAAI,KAAK,UAAU;AAAA,CAAQ,EAC1F,MAAM,KAAK,MAAM,KAAQA,EAAa,EAAE,EACjC,KAAK,UAAUF,EAAKE,EAAa,CAAC,GAGpC,CAAE,IAAAF,EAAK,MAAO,oBAAoBY,CAAO,EAAG,CACrD,CACF,CAEQ,iBAAiBA,EAA0B,CAEjD,MAD0B,CAAC,WAAY,cAAe,aAAc,aAAc,UAAU,EACnE,KAAMC,GAAYA,EAAQ,KAAKD,CAAO,CAAC,CAClE,CAEQ,MAAME,EAA2B,CACvC,OAAO,IAAI,QAASC,GAAY,WAAWA,EAASD,CAAE,CAAC,CACzD,CAEA,MAAM,SAASE,EAAyC,CAEtD,OADkB,MAAM,QAAQ,IAAIA,EAAK,IAAKhB,GAAQ,KAAK,UAAUA,CAAG,CAAC,CAAC,GACzD,OAAQI,GAAaA,EAAS,UAAY,QAAaA,EAAS,KAAK,CACxF,CACF,EClFO,IAAea,EAAf,KAAuC,CAC5C,aAAc,CAAC,CAEjB,ECFO,IAAMC,EAAN,cAA+BC,CAAwB,CAC5D,SAASC,EAA6C,CACpD,IAAMC,EAAO,KAAK,UAAUD,CAAK,EACjC,QAAQ,OAAO,MAAMC,EAAO;AAAA,CAAI,CAClC,CACF,ECJA,IAAMC,EAAoB,CAAC,QAAS,SAAU,OAAO,EAErD,IAAMC,EAAsB,CAC1B,eACA,SACA,aACA,WACA,SACF,EAaO,SAASC,EAAYC,EAA+B,CAEzD,GAAI,CAACA,GAAO,CAACA,EAAI,KAAK,EACpB,MAAO,CACL,QAAS,GACT,MAAO,qBACT,EAGF,IAAMC,EAAaD,EAAI,KAAK,EAG5B,GAAIC,EAAW,OAAS,KACtB,MAAO,CACL,QAAS,GACT,MAAO,+CACT,EAIF,QAAWC,KAAWJ,EACpB,GAAII,EAAQ,KAAKD,CAAU,EACzB,MAAO,CACL,QAAS,GACT,MAAO,kCACT,EAKJ,IAAIE,EACJ,GAAI,CACFA,EAAY,IAAI,IAAIF,CAAU,CAChC,MAAgB,CAEd,OAAIA,EA
AW,WAAW,SAAS,EAC1B,CACL,QAAS,GACT,aAAcA,CAChB,EAEK,CACL,QAAS,GACT,MAAO,oBACT,CACF,CAGA,GAAI,CAACG,EAAkB,SAASD,EAAU,QAAQ,EAChD,MAAO,CACL,QAAS,GACT,MAAO,YAAYA,EAAU,QAAQ,uCAAuCC,EAAkB,KAAK,IAAI,CAAC,EAC1G,EAIF,IAAMC,EAAWF,EAAU,SAAS,YAAY,EAShD,OAPEE,IAAa,aACbA,IAAa,aACbA,IAAa,OACbA,EAAS,WAAW,UAAU,GAC9BA,EAAS,WAAW,KAAK,GACzB,6BAA6B,KAAKA,CAAQ,IAEzBF,EAAU,WAAa,SAExC,QAAQ,KAAK,8CAA8CF,CAAU,EAAE,EAGlE,CACL,QAAS,GACT,aAAcE,EAAU,SAAS,CACnC,CACF,CAOO,SAASG,EAAaC,EAG3B,CACA,IAAMC,EAAsB,CAAC,EACvBC,EAAgD,CAAC,EAEvD,QAAWT,KAAOO,EAAM,CACtB,IAAMG,EAASX,EAAYC,CAAG,EAC1BU,EAAO,SAAWA,EAAO,aAC3BF,EAAU,KAAKE,EAAO,YAAY,EAElCD,EAAO,KAAK,CACV,IAAAT,EACA,MAAOU,EAAO,OAAS,0BACzB,CAAC,CAEL,CAEA,MAAO,CAAE,UAAAF,EAAW,OAAAC,CAAO,CAC7B,CTrHA,GAAM,CAAE,YAAAE,EAAa,KAAAC,EAAM,QAAAC,CAAQ,EAAIC,EAEjCC,EAAU,IAAIC,EAEdC,EAAMC,EACV,kBACA,kEACF,GAEC,SACC,MAAMH,EACH,KAAKH,CAAI,EACT,QAAQC,EAAS,eAAe,EAChC,YAAYF,CAAW,EACvB,YAAYM,CAAG,EACf,UAAU,IAAIE,EAAO,UAAW,kGAAkG,CAAC,EACnI,OAAO,MAAOC,EAAgBC,IAAgC,CAC7D,GAAI,CAEF,GAAM,CAAE,UAAAC,EAAW,OAAAC,CAAO,EAAIC,EAAaJ,CAAI,EAG3CG,EAAO,OAAS,IAClB,QAAQ,MAAM;AAAA,8BAA4B,EAC1CA,EAAO,QAAQ,CAAC,CAAE,IAAKE,EAAY,MAAAC,CAAM,IAAM,CAC7C,QAAQ,MAAM,OAAOD,CAAU,KAAKC,CAAK,EAAE,CAC7C,CAAC,GAICJ,EAAU,SAAW,IACvB,QAAQ,MAAM;AAAA,0CAAwC,EACtD,QAAQ,KAAK,CAAC,GAGhB,QAAQ,MAAM;AAAA,oBAAkBA,EAAU,MAAM,kBAAkB,EAElE,IAAMK,EAAU,IAAIC,EAEdC,EAAc,IAAIC,EAAYT,EAAQ,MAAQ,EAAI,IAAO,CAAC,EAC1DU,EAAgB,IAAIC,EACpBC,EAAoB,IAAIC,EAAkB,CAAC,IAAK,OAAQ,OAAQ,QAAS,QAAQ,CAAC,EAElFC,EAAU,SAA2B,CACzC,IAAMC,EAAgB,MAAMP,EAAY,SAASP,CAAS,EACpDe,EAAgC,CAAC,EAEvC,OAAW,CAAE,QAAAC,EAAS,IAAKC,EAAa,MAAAb,CAAM,IAAKU,EAAe,CAChE,IAAMI,EACJd,IAAU,QAAa,CAACY,EAAU,CAAC,EAAI,MAAML,EAAkB,QAAQK,CAAO,EAC1EG,EACJf,IAAU,QAAa,CAACY,EACpB,CAAE,IAAKC,EAAa,MAAOb,GAAS,gBAAiB,UAAAc,CAAU,EAC/D,MAAMT,EAAc,QAAQO,CAAO,EACzCD,EAAc,KAAK,CAAE,GAAGI,EAAY,UAAAD,CAAU,CAAC,CAGjD,CAEA,MAAMb,EAAQ,MAAM,GAAGU,CAAa,CACtC,EAEA,GAAIhB,EAAQ,MAAO,CACjB,QAAQ,MAAM,OAAO,EAErB,QAAQ,GAAG,SAAU,IAAM,CACzB,QAAQ,KAAK,CAAC,CAChB,CAAC,EAED,IAAIqB,EAAwC,KAE5C,QAAQ,MAAM,GAAG,M
AAO,IAAM,CAE5BA,EAAkB,IACpB,CAAC,EAED,IAAIC,EAAmD,KACvD,QAAQ,GAAG,WAAY,IAAM,CACvBA,IAAe,MAAM,aAAaA,CAAU,EAChDA,EAAa,WAAW,IAAM,CAC5BA,EAAa,KACbD,EAAkBP,EAAQ,EAAE,MAAOS,GAAiB,CAClD,QAAQ,MAAM;AAAA,2BAA0BA,aAAe,MAAQA,EAAI,QAAUA,CAAG,CAClF,CAAC,CACH,EAAG,GAAG,CACR,CAAC,EAEDF,EAAkBP,EAAQ,EAC1B,MAAMO,CACR,MACE,MAAMP,EAAQ,CAElB,OAAST,EAAO,CACd,QAAQ,MAAM;AAAA,2BAA0BA,aAAiB,MAAQA,EAAM,QAAUA,CAAK,EACtF,QAAQ,KAAK,CAAC,CAChB,CACF,CAAC,EACA,WAAW,QAAQ,IAAI",
- "names": ["Command", "createArgument", "Option", "package_default", "AbstractExtractor", "name", "PageExtractor", "AbstractExtractor", "value", "document", "url", "RESOURCE_DISPLAYABLE_KEYS", "RESOURCE_LINK_KEYS", "readAttr", "element", "key", "v", "findResourceText", "value", "findResourceLink", "ResourceExtractor", "AbstractExtractor", "tags", "value", "document", "tag", "element", "link", "findResourceLink", "findResourceText", "readFile", "parseHTML", "legacyHookDecode", "PageFetcher", "timeout", "maxRetries", "html", "url", "document", "retryCount", "domPromise", "response", "buffer", "contentType", "charsetMatch", "content", "_", "reject", "error", "message", "pattern", "ms", "resolve", "urls", "AbstractResourcePrinter", "JSONStylePrinter", "AbstractResourcePrinter", "pages", "json", "ALLOWED_PROTOCOLS", "SUSPICIOUS_PATTERNS", "validateUrl", "url", "trimmedUrl", "pattern", "parsedUrl", "ALLOWED_PROTOCOLS", "hostname", "validateUrls", "urls", "validUrls", "errors", "result", "description", "name", "version", "package_default", "program", "Command", "url", "createArgument", "Option", "urls", "options", "validUrls", "errors", "validateUrls", "invalidUrl", "error", "printer", "JSONStylePrinter", "pageFetcher", "PageFetcher", "pageExtractor", "PageExtractor", "resourceExtractor", "ResourceExtractor", "execute", "pageResponses", "pageMetadatas", "content", "responseUrl", "resources", "descriptor", "activeExecution", "winchTimer", "err"]
+ "sources": ["../src/main.ts", "../src/extractors/AbstractExtractor.ts", "../src/extractors/PageExtractor.ts", "../src/resource.ts", "../src/extractors/ResourceExtractor.ts", "../src/page/PageFetcher.ts", "../src/page/FileFetcher.ts", "../src/printers/AbstractResourcePrinter.ts", "../src/printers/JSONStylePrinter.ts", "../src/security.ts"],
+ "sourcesContent": ["#!/usr/bin/env node\nimport { Command, createArgument, Option } from 'commander';\nimport { createRequire } from 'node:module';\n\nimport { PageExtractor, ResourceExtractor } from './extractors/index.js';\nimport { FileFetcher, MAX_FILES_FAILSAFE, PageFetcher, type PageMetadata } from './page/index.js';\nimport { JSONStylePrinter } from './printers/index.js';\nimport { validateUrls } from './security.js';\n\nconst require = createRequire(import.meta.url);\nconst pkg = require('../package.json') as {\n description: string;\n name: string;\n version: string;\n};\n\nconst { description, name, version } = pkg;\n\nconst program = new Command();\n\nconst urlArg = createArgument('<url...>', 'remote https://URL to extract from');\nconst fileArg = createArgument('<paths...>', 'local file paths to extract from');\n\n// Shared extractor instances.\nconst pageExtractor = new PageExtractor();\nconst resourceExtractor = new ResourceExtractor(['a', 'meta', 'link', 'embed', 'script']);\nconst printer = new JSONStylePrinter();\n\nasync function buildPageMetadata(\n responses: Array<{\n url?: string;\n path?: string;\n content?: import('./page/index.js').DOMResult;\n error?: string;\n }>\n): Promise<PageMetadata[]> {\n const pageMetadatas: PageMetadata[] = [];\n\n for (const { content, url: responseUrl, path, error } of responses) {\n const resolvedUrl = responseUrl ?? path ?? '';\n const resources =\n error !== undefined || !content ? [] : await resourceExtractor.extract(content);\n const descriptor =\n error !== undefined || !content\n ? { url: resolvedUrl, error: error ?? 
'Unknown error', resources }\n : await pageExtractor.extract(content);\n pageMetadatas.push({ ...descriptor, resources });\n }\n\n return pageMetadatas;\n}\n\n(async (): Promise<void> => {\n program.name(name).version(version, '-v, --version').description(description);\n\n // \u2500\u2500 fetch subcommand (default remote URL mode) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n program\n .command('fetch', { isDefault: true })\n .description('fetch and extract resources from remote URL(s)')\n .addArgument(urlArg)\n .addOption(\n new Option(\n '--watch',\n 'keep running: SIGWINCH re-fetches after resize, Ctrl-D releases in-flight requests, Ctrl-C exits'\n )\n )\n .action(async (urls: string[], options: { watch: boolean }) => {\n try {\n const { validUrls, errors } = validateUrls(urls);\n\n if (errors.length > 0) {\n console.error('\\n\u274C URL Validation Errors:');\n errors.forEach(({ url: invalidUrl, error }) => {\n console.error(` - ${invalidUrl}: ${error}`);\n });\n }\n\n if (validUrls.length === 0) {\n console.error('\\n\u274C No valid URLs to process. Exiting.');\n process.exit(1);\n }\n\n console.error(`\\n\u2705 Processing ${validUrls.length} valid URL(s)...`);\n\n const pageFetcher = new PageFetcher(options.watch ? 
0 : 10000, 2);\n\n const execute = async (): Promise<void> => {\n const responses = await pageFetcher.fetchAll(validUrls);\n const pageMetadatas = await buildPageMetadata(responses);\n await printer.print(...pageMetadatas);\n };\n\n if (options.watch) {\n process.stdin.resume();\n process.on('SIGINT', () => process.exit(0));\n\n let activeExecution: Promise<void> | null = null;\n process.stdin.on('end', () => {\n activeExecution = null;\n });\n\n let winchTimer: ReturnType<typeof setTimeout> | null = null;\n process.on('SIGWINCH', () => {\n if (winchTimer !== null) clearTimeout(winchTimer);\n winchTimer = setTimeout(() => {\n winchTimer = null;\n activeExecution = execute().catch((err: unknown) => {\n console.error('\\n\u274C An error occurred:', err instanceof Error ? err.message : err);\n });\n }, 150);\n });\n\n activeExecution = execute();\n await activeExecution;\n } else {\n await execute();\n }\n } catch (error) {\n console.error('\\n\u274C An error occurred:', error instanceof Error ? 
error.message : error);\n process.exit(1);\n }\n });\n\n // \u2500\u2500 file subcommand (local filesystem access) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n program\n .command('file')\n .description('extract resources from local file(s) via direct filesystem access')\n .addArgument(fileArg)\n .addOption(\n new Option('--no-failsafe', `bypass the ${MAX_FILES_FAILSAFE}-file limit safety check`)\n )\n .action(async (paths: string[], options: { failsafe: boolean }) => {\n try {\n if (options.failsafe && paths.length > MAX_FILES_FAILSAFE) {\n console.error(\n `\\n\u274C ${paths.length} files specified exceeds the safety limit of ${MAX_FILES_FAILSAFE}.`\n );\n console.error(` Pass --no-failsafe to bypass this check and process all files.`);\n process.exit(1);\n }\n\n if (!options.failsafe && paths.length > MAX_FILES_FAILSAFE) {\n console.error(\n `\\n\u26A0\uFE0F Failsafe bypassed: processing ${paths.length} files (limit is ${MAX_FILES_FAILSAFE}).`\n );\n }\n\n console.error(`\\n\u2705 Processing ${paths.length} file(s)...`);\n\n const fileFetcher = new FileFetcher();\n const responses = await fileFetcher.fetchAll(paths);\n const pageMetadatas = await buildPageMetadata(\n responses.map(({ path, content, error }) => ({ path, content, error }))\n );\n\n await printer.print(...pageMetadatas);\n } catch (error) {\n console.error('\\n\u274C An error occurred:', error instanceof Error ? 
error.message : error);\n process.exit(1);\n }\n });\n\n await program.parseAsync(process.argv);\n})();\n", "export abstract class AbstractExtractor<V, R> {\n constructor(readonly name: string) {}\n abstract extract(value: V): Promise<R>;\n}\n", "import type { Page } from '../page/index.js';\nimport type { DOMResult } from '../page/index.js';\nimport { AbstractExtractor } from './AbstractExtractor.js';\n\nexport class PageExtractor extends AbstractExtractor<DOMResult, Page> {\n constructor() {\n super('page-extractor');\n }\n\n async extract(value: DOMResult): Promise<Page> {\n const {\n window: { document },\n url,\n } = value;\n return { title: document.title, url };\n }\n}\n", "/**\n * @license MIT\n * We are interested in visualising a page as a collection of tags.\n *\n * We wish to work with tags that can be compactly previewed on a webpage.\n * Here we must declare all of the element types that can be used to represent\n * a resource that can be hyperlinked off a webpage.\n */\ntype Tags = HTMLElementTagNameMap;\n\nexport const RESOURCE_DISPLAYABLE_KEYS = [\n 'id',\n 'innerText',\n 'textContent',\n 'class',\n 'ariaLabel',\n 'ariaDescription',\n 'alt',\n] as const;\n\nexport type DisplayableKey = (typeof RESOURCE_DISPLAYABLE_KEYS)[number];\n\nexport const RESOURCE_LINK_KEYS = ['href', 'data-src', 'target', 'action', 'src', 'url'] as const;\n\nexport type LinkKey = (typeof RESOURCE_LINK_KEYS)[number];\n\nexport type AttributeKey = DisplayableKey | LinkKey;\n\nexport type ResourceKey = { key: AttributeKey; value: string };\nexport type ResourceLink = { key: LinkKey; value: string };\n\nexport type ExternalResource = {\n text: ResourceKey;\n link: ResourceLink;\n};\n\nexport type Tag = keyof Tags;\n\nexport type Resource = HTMLElement & {\n [K in AttributeKey]?: string | null;\n};\n\nexport type ResourceByName<T extends keyof Tags> = Tags[T];\n\n// --- adapters ---\n\nconst readAttr = (element: Resource, key: AttributeKey): string | undefined => {\n const v = 
element.getAttribute(key);\n return v != null && v.trim() !== '' ? v : undefined;\n};\n\nexport function findResourceText(element: Resource): ResourceKey | undefined {\n for (const key of RESOURCE_DISPLAYABLE_KEYS) {\n const value = readAttr(element, key);\n if (value !== undefined) return { key, value };\n }\n return undefined;\n}\n\nexport function findResourceLink(element: Resource): ResourceLink | undefined {\n for (const key of RESOURCE_LINK_KEYS) {\n const value = readAttr(element, key);\n if (value !== undefined) return { key, value };\n }\n return undefined;\n}\n\nexport const isResourceKey = (key: string): key is AttributeKey =>\n (RESOURCE_DISPLAYABLE_KEYS as readonly string[]).includes(key) ||\n (RESOURCE_LINK_KEYS as readonly string[]).includes(key);\n", "import type { DOMResult } from '../page/index.js';\nimport {\n findResourceLink,\n findResourceText,\n type ExternalResource,\n type Resource,\n type Tag,\n} from '../resource.js';\nimport { AbstractExtractor } from './AbstractExtractor.js';\n\nexport class ResourceExtractor extends AbstractExtractor<DOMResult, ExternalResource[]> {\n constructor(private readonly tags: Tag[]) {\n super('page-extractor');\n }\n async extract(value: DOMResult): Promise<ExternalResource[]> {\n const { document } = value.window;\n return this.tags.flatMap((tag) =>\n Array.from(document.querySelectorAll<Resource>(tag)).flatMap((element) => {\n const link = findResourceLink(element);\n if (!link) return [];\n const text = findResourceText(element) ?? 
{ key: 'src' as const, value: link.value };\n return [{ text, link }];\n })\n );\n }\n}\n", "import { parseHTML } from 'linkedom';\n\ntype ParseHTMLResult = {\n document: Document;\n};\n\nexport interface DOMResult {\n window: { document: Document };\n url: string;\n}\n\ninterface PageResponse {\n url: string;\n content?: DOMResult;\n error?: string;\n}\n\nexport class PageFetcher {\n private readonly timeout: number;\n private readonly maxRetries: number;\n\n constructor(timeout = 10000, maxRetries = 2) {\n this.timeout = timeout;\n this.maxRetries = maxRetries;\n }\n\n private buildDOMResult(html: string, url: string): DOMResult {\n const { document } = parseHTML(html) as ParseHTMLResult;\n return { window: { document }, url };\n }\n\n private decodeHtml(buffer: ArrayBuffer, charset: string): string {\n try {\n return new TextDecoder(charset).decode(new Uint8Array(buffer));\n } catch {\n return new TextDecoder('utf-8').decode(new Uint8Array(buffer));\n }\n }\n\n private async fetchPage(url: string, retryCount = 0): Promise<PageResponse> {\n try {\n const domPromise = fetch(url).then(async (response) => {\n const buffer = await response.arrayBuffer();\n const contentType = response.headers.get('content-type') ?? '';\n const charsetMatch = /charset=([^\\s;]+)/i.exec(contentType);\n const html = this.decodeHtml(buffer, charsetMatch?.[1] ?? 'utf-8');\n return this.buildDOMResult(html, url);\n });\n\n const content = await (this.timeout > 0\n ? Promise.race([\n domPromise,\n new Promise<never>((_, reject) =>\n setTimeout(() => reject(new Error('Request timeout')), this.timeout)\n ),\n ])\n : domPromise);\n\n return { url, content };\n } catch (error) {\n const message = error instanceof Error ? 
error.message : 'Unknown error';\n\n // Retry logic for transient errors\n if (retryCount < this.maxRetries && this.isRetryableError(message)) {\n process.stderr.write(`Retrying ${url} (attempt ${retryCount + 1}/${this.maxRetries})...\\n`);\n await this.delay(1000 * (retryCount + 1)); // Exponential backoff\n return this.fetchPage(url, retryCount + 1);\n }\n\n return { url, error: `Failed to fetch: ${message}` };\n }\n }\n\n private isRetryableError(message: string): boolean {\n const retryablePatterns = [/timeout/i, /ECONNRESET/i, /ETIMEDOUT/i, /ENOTFOUND/i, /network/i];\n return retryablePatterns.some((pattern) => pattern.test(message));\n }\n\n private delay(ms: number): Promise<void> {\n return new Promise((resolve) => setTimeout(resolve, ms));\n }\n\n async fetchAll(urls: string[]): Promise<PageResponse[]> {\n const responses = await Promise.all(urls.map((url) => this.fetchPage(url)));\n return responses.filter((response) => response.content !== undefined || response.error);\n }\n}\n", "import { readFile } from 'node:fs/promises';\nimport { parseHTML } from 'linkedom';\n\nimport type { DOMResult } from './PageFetcher.js';\n\nexport const MAX_FILES_FAILSAFE = 254;\n\ntype ParseHTMLResult = {\n document: Document;\n};\n\nexport interface FileResponse {\n path: string;\n content?: DOMResult;\n error?: string;\n}\n\nexport class FileFetcher {\n private buildDOMResult(html: string, filePath: string): DOMResult {\n const { document } = parseHTML(html) as ParseHTMLResult;\n return { window: { document }, url: `file://${filePath}` };\n }\n\n async fetchFile(filePath: string): Promise<FileResponse> {\n try {\n // filePath is supplied directly by the CLI user, not derived from network input.\n // eslint-disable-next-line security/detect-non-literal-fs-filename\n const html = await readFile(filePath, 'utf-8');\n return { path: filePath, content: this.buildDOMResult(html, filePath) };\n } catch (error) {\n return {\n path: filePath,\n error: error instanceof Error ? 
error.message : 'Unknown error',\n };\n }\n }\n\n async fetchAll(filePaths: string[]): Promise<FileResponse[]> {\n return Promise.all(filePaths.map((p) => this.fetchFile(p)));\n }\n}\n", "import type { PageMetadata } from '../page/index.js';\n\nexport abstract class AbstractResourcePrinter {\n constructor() {}\n abstract print(...pages: PageMetadata[]): void | Promise<void>;\n}\n", "import type { PageMetadata } from '../page/index.js';\nimport { AbstractResourcePrinter } from './AbstractResourcePrinter.js';\n\nexport class JSONStylePrinter extends AbstractResourcePrinter {\n print(...pages: PageMetadata[]): void | Promise<void> {\n const json = JSON.stringify(pages);\n process.stdout.write(json + '\\n');\n }\n}\n", "/**\n * Security utilities for URL validation and sanitization\n */\n\nconst ALLOWED_PROTOCOLS = ['http:', 'https:'];\nconst MAX_URL_LENGTH = 2048;\nconst SUSPICIOUS_PATTERNS = [\n /javascript:/i,\n /data:/i,\n /vbscript:/i,\n /<script/i,\n /on\\w+=/i, // Event handlers like onclick=\n];\n\nexport interface ValidationResult {\n isValid: boolean;\n error?: string;\n sanitizedUrl?: string;\n}\n\n/**\n * Validates a URL for security concerns\n * @param url - The URL to validate\n * @returns ValidationResult object with validation status\n */\nexport function validateUrl(url: string): ValidationResult {\n // Check if URL is empty or whitespace\n if (!url || !url.trim()) {\n return {\n isValid: false,\n error: 'URL cannot be empty',\n };\n }\n\n const trimmedUrl = url.trim();\n\n // Check URL length to prevent DoS\n if (trimmedUrl.length > MAX_URL_LENGTH) {\n return {\n isValid: false,\n error: `URL exceeds maximum length of ${MAX_URL_LENGTH} characters`,\n };\n }\n\n // Check for suspicious patterns\n for (const pattern of SUSPICIOUS_PATTERNS) {\n if (pattern.test(trimmedUrl)) {\n return {\n isValid: false,\n error: 'URL contains suspicious patterns',\n };\n }\n }\n\n // Parse the URL\n let parsedUrl: URL;\n try {\n parsedUrl = new URL(trimmedUrl);\n } 
catch {\n return {\n isValid: false,\n error: 'Invalid URL format',\n };\n }\n\n // Check protocol\n if (!ALLOWED_PROTOCOLS.includes(parsedUrl.protocol)) {\n return {\n isValid: false,\n error: `Protocol ${parsedUrl.protocol} is not allowed. Allowed protocols: ${ALLOWED_PROTOCOLS.join(', ')}`,\n };\n }\n\n // Check for localhost/internal IPs in production (security consideration)\n const hostname = parsedUrl.hostname.toLowerCase();\n const isLocalhost =\n hostname === 'localhost' ||\n hostname === '127.0.0.1' ||\n hostname === '::1' ||\n hostname.startsWith('192.168.') ||\n hostname.startsWith('10.') ||\n /^172\\.(1[6-9]|2\\d|3[01])\\./.test(hostname);\n\n if (isLocalhost) {\n // Allow but warn about localhost URLs\n console.warn(`Warning: Accessing local network resource: ${trimmedUrl}`);\n }\n\n return {\n isValid: true,\n sanitizedUrl: parsedUrl.toString(),\n };\n}\n\n/**\n * Validates an array of URLs\n * @param urls - Array of URLs to validate\n * @returns Object with valid URLs and errors\n */\nexport function validateUrls(urls: string[]): {\n validUrls: string[];\n errors: Array<{ url: string; error: string }>;\n} {\n const validUrls: string[] = [];\n const errors: Array<{ url: string; error: string }> = [];\n\n for (const url of urls) {\n const result = validateUrl(url);\n if (result.isValid && result.sanitizedUrl) {\n validUrls.push(result.sanitizedUrl);\n } else {\n errors.push({\n url,\n error: result.error || 'Unknown validation error',\n });\n }\n }\n\n return { validUrls, errors };\n}\n\n/**\n * Rate limiter to prevent abuse\n */\nexport class RateLimiter {\n private requests: number[] = [];\n private readonly maxRequests: number;\n private readonly windowMs: number;\n\n constructor(maxRequests = 10, windowMs = 60000) {\n this.maxRequests = maxRequests;\n this.windowMs = windowMs;\n }\n\n /**\n * Check if a request is allowed under rate limiting\n * @returns true if request is allowed, false otherwise\n */\n public isAllowed(): boolean {\n const now = 
Date.now();\n\n // Remove old requests outside the time window\n this.requests = this.requests.filter((time) => now - time < this.windowMs);\n\n if (this.requests.length >= this.maxRequests) {\n return false;\n }\n\n this.requests.push(now);\n return true;\n }\n\n /**\n * Get remaining requests in current window\n */\n public getRemainingRequests(): number {\n const now = Date.now();\n this.requests = this.requests.filter((time) => now - time < this.windowMs);\n return Math.max(0, this.maxRequests - this.requests.length);\n }\n}\n\n/**\n * Sanitizes HTML content to prevent XSS attacks\n * @param text - Text to sanitize\n * @returns Sanitized text\n */\nexport function sanitizeText(text: string): string {\n if (!text) return '';\n\n return text\n .replace(/</g, '&lt;')\n .replace(/>/g, '&gt;')\n .replace(/\"/g, '&quot;')\n .replace(/'/g, '&#x27;')\n .replace(/\\//g, '&#x2F;');\n}\n"],
+ "mappings": ";;;AACA,SAAS,SAAS,gBAAgB,cAAc;AAChD,SAAS,qBAAqB;;;ACFvB,IAAe,oBAAf,MAAuC;AAAA,EAC5C,YAAqBA,OAAc;AAAd,gBAAAA;AAAA,EAAe;AAEtC;;;ACCO,IAAM,gBAAN,cAA4B,kBAAmC;AAAA,EACpE,cAAc;AACZ,UAAM,gBAAgB;AAAA,EACxB;AAAA,EAEA,MAAM,QAAQ,OAAiC;AAC7C,UAAM;AAAA,MACJ,QAAQ,EAAE,SAAS;AAAA,MACnB;AAAA,IACF,IAAI;AACJ,WAAO,EAAE,OAAO,SAAS,OAAO,IAAI;AAAA,EACtC;AACF;;;ACNO,IAAM,4BAA4B;AAAA,EACvC;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AACF;AAIO,IAAM,qBAAqB,CAAC,QAAQ,YAAY,UAAU,UAAU,OAAO,KAAK;AAwBvF,IAAM,WAAW,CAAC,SAAmB,QAA0C;AAC7E,QAAM,IAAI,QAAQ,aAAa,GAAG;AAClC,SAAO,KAAK,QAAQ,EAAE,KAAK,MAAM,KAAK,IAAI;AAC5C;AAEO,SAAS,iBAAiB,SAA4C;AAC3E,aAAW,OAAO,2BAA2B;AAC3C,UAAM,QAAQ,SAAS,SAAS,GAAG;AACnC,QAAI,UAAU,OAAW,QAAO,EAAE,KAAK,MAAM;AAAA,EAC/C;AACA,SAAO;AACT;AAEO,SAAS,iBAAiB,SAA6C;AAC5E,aAAW,OAAO,oBAAoB;AACpC,UAAM,QAAQ,SAAS,SAAS,GAAG;AACnC,QAAI,UAAU,OAAW,QAAO,EAAE,KAAK,MAAM;AAAA,EAC/C;AACA,SAAO;AACT;;;ACvDO,IAAM,oBAAN,cAAgC,kBAAiD;AAAA,EACtF,YAA6B,MAAa;AACxC,UAAM,gBAAgB;AADK;AAAA,EAE7B;AAAA,EACA,MAAM,QAAQ,OAA+C;AAC3D,UAAM,EAAE,SAAS,IAAI,MAAM;AAC3B,WAAO,KAAK,KAAK;AAAA,MAAQ,CAAC,QACxB,MAAM,KAAK,SAAS,iBAA2B,GAAG,CAAC,EAAE,QAAQ,CAAC,YAAY;AACxE,cAAM,OAAO,iBAAiB,OAAO;AACrC,YAAI,CAAC,KAAM,QAAO,CAAC;AACnB,cAAM,OAAO,iBAAiB,OAAO,KAAK,EAAE,KAAK,OAAgB,OAAO,KAAK,MAAM;AACnF,eAAO,CAAC,EAAE,MAAM,KAAK,CAAC;AAAA,MACxB,CAAC;AAAA,IACH;AAAA,EACF;AACF;;;ACzBA,SAAS,iBAAiB;AAiBnB,IAAM,cAAN,MAAkB;AAAA,EACN;AAAA,EACA;AAAA,EAEjB,YAAY,UAAU,KAAO,aAAa,GAAG;AAC3C,SAAK,UAAU;AACf,SAAK,aAAa;AAAA,EACpB;AAAA,EAEQ,eAAe,MAAc,KAAwB;AAC3D,UAAM,EAAE,SAAS,IAAI,UAAU,IAAI;AACnC,WAAO,EAAE,QAAQ,EAAE,SAAS,GAAG,IAAI;AAAA,EACrC;AAAA,EAEQ,WAAW,QAAqB,SAAyB;AAC/D,QAAI;AACF,aAAO,IAAI,YAAY,OAAO,EAAE,OAAO,IAAI,WAAW,MAAM,CAAC;AAAA,IAC/D,QAAQ;AACN,aAAO,IAAI,YAAY,OAAO,EAAE,OAAO,IAAI,WAAW,MAAM,CAAC;AAAA,IAC/D;AAAA,EACF;AAAA,EAEA,MAAc,UAAU,KAAa,aAAa,GAA0B;AAC1E,QAAI;AACF,YAAM,aAAa,MAAM,GAAG,EAAE,KAAK,OAAO,aAAa;AACrD,cAAM,SAAS,MAAM,SAAS,YAAY;AAC1C,cAAM,cAAc,SAAS,QAAQ,IAAI,cAAc,KAAK;AAC5D,cAAM,eAAe,qBAAqB,KAAK,WAAW;AAC1D,cAAM,OAAO,KAA
K,WAAW,QAAQ,eAAe,CAAC,KAAK,OAAO;AACjE,eAAO,KAAK,eAAe,MAAM,GAAG;AAAA,MACtC,CAAC;AAED,YAAM,UAAU,OAAO,KAAK,UAAU,IAClC,QAAQ,KAAK;AAAA,QACX;AAAA,QACA,IAAI;AAAA,UAAe,CAAC,GAAG,WACrB,WAAW,MAAM,OAAO,IAAI,MAAM,iBAAiB,CAAC,GAAG,KAAK,OAAO;AAAA,QACrE;AAAA,MACF,CAAC,IACD;AAEJ,aAAO,EAAE,KAAK,QAAQ;AAAA,IACxB,SAAS,OAAO;AACd,YAAM,UAAU,iBAAiB,QAAQ,MAAM,UAAU;AAGzD,UAAI,aAAa,KAAK,cAAc,KAAK,iBAAiB,OAAO,GAAG;AAClE,gBAAQ,OAAO,MAAM,YAAY,GAAG,aAAa,aAAa,CAAC,IAAI,KAAK,UAAU;AAAA,CAAQ;AAC1F,cAAM,KAAK,MAAM,OAAQ,aAAa,EAAE;AACxC,eAAO,KAAK,UAAU,KAAK,aAAa,CAAC;AAAA,MAC3C;AAEA,aAAO,EAAE,KAAK,OAAO,oBAAoB,OAAO,GAAG;AAAA,IACrD;AAAA,EACF;AAAA,EAEQ,iBAAiB,SAA0B;AACjD,UAAM,oBAAoB,CAAC,YAAY,eAAe,cAAc,cAAc,UAAU;AAC5F,WAAO,kBAAkB,KAAK,CAAC,YAAY,QAAQ,KAAK,OAAO,CAAC;AAAA,EAClE;AAAA,EAEQ,MAAM,IAA2B;AACvC,WAAO,IAAI,QAAQ,CAAC,YAAY,WAAW,SAAS,EAAE,CAAC;AAAA,EACzD;AAAA,EAEA,MAAM,SAAS,MAAyC;AACtD,UAAM,YAAY,MAAM,QAAQ,IAAI,KAAK,IAAI,CAAC,QAAQ,KAAK,UAAU,GAAG,CAAC,CAAC;AAC1E,WAAO,UAAU,OAAO,CAAC,aAAa,SAAS,YAAY,UAAa,SAAS,KAAK;AAAA,EACxF;AACF;;;ACtFA,SAAS,gBAAgB;AACzB,SAAS,aAAAC,kBAAiB;AAInB,IAAM,qBAAqB;AAY3B,IAAM,cAAN,MAAkB;AAAA,EACf,eAAe,MAAc,UAA6B;AAChE,UAAM,EAAE,SAAS,IAAIA,WAAU,IAAI;AACnC,WAAO,EAAE,QAAQ,EAAE,SAAS,GAAG,KAAK,UAAU,QAAQ,GAAG;AAAA,EAC3D;AAAA,EAEA,MAAM,UAAU,UAAyC;AACvD,QAAI;AAGF,YAAM,OAAO,MAAM,SAAS,UAAU,OAAO;AAC7C,aAAO,EAAE,MAAM,UAAU,SAAS,KAAK,eAAe,MAAM,QAAQ,EAAE;AAAA,IACxE,SAAS,OAAO;AACd,aAAO;AAAA,QACL,MAAM;AAAA,QACN,OAAO,iBAAiB,QAAQ,MAAM,UAAU;AAAA,MAClD;AAAA,IACF;AAAA,EACF;AAAA,EAEA,MAAM,SAAS,WAA8C;AAC3D,WAAO,QAAQ,IAAI,UAAU,IAAI,CAAC,MAAM,KAAK,UAAU,CAAC,CAAC,CAAC;AAAA,EAC5D;AACF;;;ACtCO,IAAe,0BAAf,MAAuC;AAAA,EAC5C,cAAc;AAAA,EAAC;AAEjB;;;ACFO,IAAM,mBAAN,cAA+B,wBAAwB;AAAA,EAC5D,SAAS,OAA6C;AACpD,UAAM,OAAO,KAAK,UAAU,KAAK;AACjC,YAAQ,OAAO,MAAM,OAAO,IAAI;AAAA,EAClC;AACF;;;ACJA,IAAM,oBAAoB,CAAC,SAAS,QAAQ;AAC5C,IAAM,iBAAiB;AACvB,IAAM,sBAAsB;AAAA,EAC1B;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA;AACF;AAaO,SAAS,YAAY,KAA+B;AAEzD,MAAI,CAAC,OAAO,CAAC,IAAI,KAAK,GAAG;AACvB,WAAO;AAAA,MACL,SAAS;AAAA,MACT,OAAO;AAAA,IACT;AAAA
,EACF;AAEA,QAAM,aAAa,IAAI,KAAK;AAG5B,MAAI,WAAW,SAAS,gBAAgB;AACtC,WAAO;AAAA,MACL,SAAS;AAAA,MACT,OAAO,iCAAiC,cAAc;AAAA,IACxD;AAAA,EACF;AAGA,aAAW,WAAW,qBAAqB;AACzC,QAAI,QAAQ,KAAK,UAAU,GAAG;AAC5B,aAAO;AAAA,QACL,SAAS;AAAA,QACT,OAAO;AAAA,MACT;AAAA,IACF;AAAA,EACF;AAGA,MAAI;AACJ,MAAI;AACF,gBAAY,IAAI,IAAI,UAAU;AAAA,EAChC,QAAQ;AACN,WAAO;AAAA,MACL,SAAS;AAAA,MACT,OAAO;AAAA,IACT;AAAA,EACF;AAGA,MAAI,CAAC,kBAAkB,SAAS,UAAU,QAAQ,GAAG;AACnD,WAAO;AAAA,MACL,SAAS;AAAA,MACT,OAAO,YAAY,UAAU,QAAQ,uCAAuC,kBAAkB,KAAK,IAAI,CAAC;AAAA,IAC1G;AAAA,EACF;AAGA,QAAM,WAAW,UAAU,SAAS,YAAY;AAChD,QAAM,cACJ,aAAa,eACb,aAAa,eACb,aAAa,SACb,SAAS,WAAW,UAAU,KAC9B,SAAS,WAAW,KAAK,KACzB,6BAA6B,KAAK,QAAQ;AAE5C,MAAI,aAAa;AAEf,YAAQ,KAAK,8CAA8C,UAAU,EAAE;AAAA,EACzE;AAEA,SAAO;AAAA,IACL,SAAS;AAAA,IACT,cAAc,UAAU,SAAS;AAAA,EACnC;AACF;AAOO,SAAS,aAAa,MAG3B;AACA,QAAM,YAAsB,CAAC;AAC7B,QAAM,SAAgD,CAAC;AAEvD,aAAW,OAAO,MAAM;AACtB,UAAM,SAAS,YAAY,GAAG;AAC9B,QAAI,OAAO,WAAW,OAAO,cAAc;AACzC,gBAAU,KAAK,OAAO,YAAY;AAAA,IACpC,OAAO;AACL,aAAO,KAAK;AAAA,QACV;AAAA,QACA,OAAO,OAAO,SAAS;AAAA,MACzB,CAAC;AAAA,IACH;AAAA,EACF;AAEA,SAAO,EAAE,WAAW,OAAO;AAC7B;;;AT9GA,IAAMC,WAAU,cAAc,YAAY,GAAG;AAC7C,IAAM,MAAMA,SAAQ,iBAAiB;AAMrC,IAAM,EAAE,aAAa,MAAM,QAAQ,IAAI;AAEvC,IAAM,UAAU,IAAI,QAAQ;AAE5B,IAAM,SAAS,eAAe,YAAY,oCAAoC;AAC9E,IAAM,UAAU,eAAe,cAAc,kCAAkC;AAG/E,IAAM,gBAAgB,IAAI,cAAc;AACxC,IAAM,oBAAoB,IAAI,kBAAkB,CAAC,KAAK,QAAQ,QAAQ,SAAS,QAAQ,CAAC;AACxF,IAAM,UAAU,IAAI,iBAAiB;AAErC,eAAe,kBACb,WAMyB;AACzB,QAAM,gBAAgC,CAAC;AAEvC,aAAW,EAAE,SAAS,KAAK,aAAa,MAAM,MAAM,KAAK,WAAW;AAClE,UAAM,cAAc,eAAe,QAAQ;AAC3C,UAAM,YACJ,UAAU,UAAa,CAAC,UAAU,CAAC,IAAI,MAAM,kBAAkB,QAAQ,OAAO;AAChF,UAAM,aACJ,UAAU,UAAa,CAAC,UACpB,EAAE,KAAK,aAAa,OAAO,SAAS,iBAAiB,UAAU,IAC/D,MAAM,cAAc,QAAQ,OAAO;AACzC,kBAAc,KAAK,EAAE,GAAG,YAAY,UAAU,CAAC;AAAA,EACjD;AAEA,SAAO;AACT;AAAA,CAEC,YAA2B;AAC1B,UAAQ,KAAK,IAAI,EAAE,QAAQ,SAAS,eAAe,EAAE,YAAY,WAAW;AAG5E,UACG,QAAQ,SAAS,EAAE,WAAW,KAAK,CAAC,EACpC,YAAY,gDAAgD,EAC5D,YAAY,MAAM,EAClB;AAAA,IACC,IAAI;AAAA,MACF;AAAA,MACA;AAAA,IACF;AAAA,EACF,EACC,OAAO,OAAO,MAAgB,YAAgC;AAC7D
,QAAI;AACF,YAAM,EAAE,WAAW,OAAO,IAAI,aAAa,IAAI;AAE/C,UAAI,OAAO,SAAS,GAAG;AACrB,gBAAQ,MAAM,iCAA4B;AAC1C,eAAO,QAAQ,CAAC,EAAE,KAAK,YAAY,MAAM,MAAM;AAC7C,kBAAQ,MAAM,OAAO,UAAU,KAAK,KAAK,EAAE;AAAA,QAC7C,CAAC;AAAA,MACH;AAEA,UAAI,UAAU,WAAW,GAAG;AAC1B,gBAAQ,MAAM,6CAAwC;AACtD,gBAAQ,KAAK,CAAC;AAAA,MAChB;AAEA,cAAQ,MAAM;AAAA,oBAAkB,UAAU,MAAM,kBAAkB;AAElE,YAAM,cAAc,IAAI,YAAY,QAAQ,QAAQ,IAAI,KAAO,CAAC;AAEhE,YAAM,UAAU,YAA2B;AACzC,cAAM,YAAY,MAAM,YAAY,SAAS,SAAS;AACtD,cAAM,gBAAgB,MAAM,kBAAkB,SAAS;AACvD,cAAM,QAAQ,MAAM,GAAG,aAAa;AAAA,MACtC;AAEA,UAAI,QAAQ,OAAO;AACjB,gBAAQ,MAAM,OAAO;AACrB,gBAAQ,GAAG,UAAU,MAAM,QAAQ,KAAK,CAAC,CAAC;AAE1C,YAAI,kBAAwC;AAC5C,gBAAQ,MAAM,GAAG,OAAO,MAAM;AAC5B,4BAAkB;AAAA,QACpB,CAAC;AAED,YAAI,aAAmD;AACvD,gBAAQ,GAAG,YAAY,MAAM;AAC3B,cAAI,eAAe,KAAM,cAAa,UAAU;AAChD,uBAAa,WAAW,MAAM;AAC5B,yBAAa;AACb,8BAAkB,QAAQ,EAAE,MAAM,CAAC,QAAiB;AAClD,sBAAQ,MAAM,+BAA0B,eAAe,QAAQ,IAAI,UAAU,GAAG;AAAA,YAClF,CAAC;AAAA,UACH,GAAG,GAAG;AAAA,QACR,CAAC;AAED,0BAAkB,QAAQ;AAC1B,cAAM;AAAA,MACR,OAAO;AACL,cAAM,QAAQ;AAAA,MAChB;AAAA,IACF,SAAS,OAAO;AACd,cAAQ,MAAM,+BAA0B,iBAAiB,QAAQ,MAAM,UAAU,KAAK;AACtF,cAAQ,KAAK,CAAC;AAAA,IAChB;AAAA,EACF,CAAC;AAGH,UACG,QAAQ,MAAM,EACd,YAAY,mEAAmE,EAC/E,YAAY,OAAO,EACnB;AAAA,IACC,IAAI,OAAO,iBAAiB,cAAc,kBAAkB,0BAA0B;AAAA,EACxF,EACC,OAAO,OAAO,OAAiB,YAAmC;AACjE,QAAI;AACF,UAAI,QAAQ,YAAY,MAAM,SAAS,oBAAoB;AACzD,gBAAQ;AAAA,UACN;AAAA,SAAO,MAAM,MAAM,gDAAgD,kBAAkB;AAAA,QACvF;AACA,gBAAQ,MAAM,mEAAmE;AACjF,gBAAQ,KAAK,CAAC;AAAA,MAChB;AAEA,UAAI,CAAC,QAAQ,YAAY,MAAM,SAAS,oBAAoB;AAC1D,gBAAQ;AAAA,UACN;AAAA,8CAAuC,MAAM,MAAM,oBAAoB,kBAAkB;AAAA,QAC3F;AAAA,MACF;AAEA,cAAQ,MAAM;AAAA,oBAAkB,MAAM,MAAM,aAAa;AAEzD,YAAM,cAAc,IAAI,YAAY;AACpC,YAAM,YAAY,MAAM,YAAY,SAAS,KAAK;AAClD,YAAM,gBAAgB,MAAM;AAAA,QAC1B,UAAU,IAAI,CAAC,EAAE,MAAM,SAAS,MAAM,OAAO,EAAE,MAAM,SAAS,MAAM,EAAE;AAAA,MACxE;AAEA,YAAM,QAAQ,MAAM,GAAG,aAAa;AAAA,IACtC,SAAS,OAAO;AACd,cAAQ,MAAM,+BAA0B,iBAAiB,QAAQ,MAAM,UAAU,KAAK;AACtF,cAAQ,KAAK,CAAC;AAAA,IAChB;AAAA,EACF,CAAC;AAEH,QAAM,QAAQ,WAAW,QAAQ,IAAI;AACvC,GAAG;",
+ "names": ["name", "parseHTML", "require"]
  }
package/package.json CHANGED
@@ -1,65 +1,65 @@
- {
-   "name": "pagerts",
-   "description": "A tool for viewing external relations in a webpage",
-   "version": "1.4.1",
-   "type": "module",
-   "main": "main.js",
-   "bin": {
-     "pagerts": "bin/main.js"
-   },
-   "files": [
-     "bin"
-   ],
-   "engines": {
-     "node": ">=20.0.0"
-   },
-   "scripts": {
-     "test": "jest --coverage",
-     "test:watch": "jest --watch",
-     "build": "esbuild src/main.ts --bundle --packages=external --outdir=bin --minify --sourcemap --platform=node --format=esm",
-     "lint": "eslint src/**/*.ts",
-     "lint:fix": "eslint src/**/*.ts --fix",
-     "type-check": "tsc --noEmit",
-     "format": "prettier --write \"src/**/*.ts\"",
-     "format:check": "prettier --check \"src/**/*.ts\"",
-     "security:audit": "npm audit --audit-level=moderate",
-     "security:check": "npm run security:audit && npm run lint",
-     "start": "node ./bin/main.js",
-     "dev": "tsx src/main.ts",
-     "prepare": "npm run build"
-   },
-   "keywords": [
-     "webpage",
-     "hierarchy",
-     "management",
-     "web-scraping",
-     "cli",
-     "url-extraction"
-   ],
-   "author": "Kirill <kine> Nevzorov",
-   "license": "MIT",
-   "bugs": {
-     "url": "https://github.com/akinevz2/pagerts/issues"
-   },
-   "homepage": "https://github.com/akinevz2/pagerts",
-   "dependencies": {
-     "@exodus/bytes": "^1.15.0",
-     "commander": "^14.0.3",
-     "linkedom": "^0.18.9"
-   },
-   "devDependencies": {
-     "@types/jest": "^29.5.14",
-     "@types/node": "^22.10.5",
-     "@typescript-eslint/eslint-plugin": "^8.20.0",
-     "@typescript-eslint/parser": "^8.20.0",
-     "esbuild": "^0.25.1",
-     "eslint": "^9.18.0",
-     "eslint-config-prettier": "^9.1.0",
-     "eslint-plugin-security": "^3.0.1",
-     "jest": "^29.7.0",
-     "prettier": "^3.4.2",
-     "ts-jest": "^29.2.5",
-     "tsx": "^4.19.2",
-     "typescript": "^5.7.2"
-   }
- }
+ {
+   "name": "pagerts",
+   "description": "A tool for viewing external relations in a webpage",
+   "version": "1.4.3",
+   "type": "module",
+   "main": "main.js",
+   "bin": {
+     "pagerts": "bin/main.js"
+   },
+   "files": [
+     "bin"
+   ],
+   "engines": {
+     "node": ">=20.0.0"
+   },
+   "scripts": {
+     "test": "jest --coverage",
+     "test:watch": "jest --watch",
+     "build": "esbuild src/main.ts --bundle --packages=external --outdir=bin --sourcemap --platform=node --format=esm",
+     "lint": "eslint src/**/*.ts",
+     "lint:fix": "eslint src/**/*.ts --fix",
+     "type-check": "tsc --noEmit",
+     "format": "prettier --write \"**/*.{ts,js,mjs,cjs,json,yml,yaml,md}\"",
+     "format:check": "prettier --check \"**/*.{ts,js,mjs,cjs,json,yml,yaml,md}\"",
+     "security:audit": "npm audit --audit-level=moderate",
+     "security:check": "npm run security:audit && npm run lint",
+     "start": "node ./bin/main.js",
+     "dev": "tsx src/main.ts",
+     "prepare": "npm run build"
+   },
+   "keywords": [
+     "webpage",
+     "hierarchy",
+     "management",
+     "web-scraping",
+     "cli",
+     "url-extraction"
+   ],
+   "author": "Kirill <kine> Nevzorov",
+   "license": "MIT",
+   "bugs": {
+     "url": "https://github.com/akinevz2/pagerts/issues"
+   },
+   "homepage": "https://github.com/akinevz2/pagerts",
+   "dependencies": {
+     "@exodus/bytes": "^1.15.0",
+     "commander": "^14.0.3",
+     "linkedom": "^0.18.9"
+   },
+   "devDependencies": {
+     "@types/jest": "^29.5.14",
+     "@types/node": "^22.10.5",
+     "@typescript-eslint/eslint-plugin": "^8.20.0",
+     "@typescript-eslint/parser": "^8.20.0",
+     "esbuild": "^0.25.1",
+     "eslint": "^9.18.0",
+     "eslint-config-prettier": "^9.1.0",
+     "eslint-plugin-security": "^3.0.1",
+     "jest": "^29.7.0",
+     "prettier": "^3.4.2",
+     "ts-jest": "^29.2.5",
+     "tsx": "^4.19.2",
+     "typescript": "^5.7.2"
+   }
+ }