pagerts 0.2.0 → 0.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (54)
  1. package/.github/codeql/codeql-config.yml +7 -0
  2. package/.github/workflows/ci.yml +146 -0
  3. package/.github/workflows/dependency-update.yml +52 -0
  4. package/.prettierignore +5 -0
  5. package/.prettierrc.json +10 -0
  6. package/MAINTAINERS.md +30 -0
  7. package/POST-INSTALL.md +205 -0
  8. package/README.md +220 -16
  9. package/SECURITY.md +160 -0
  10. package/bin/main.js +24 -19
  11. package/bin/main.js.map +4 -4
  12. package/eslint.config.mjs +83 -0
  13. package/{jest.config.js → jest.config.cjs} +45 -30
  14. package/package.json +34 -13
  15. package/src/__tests__/PageFetcher.test.ts +48 -0
  16. package/src/__tests__/security.test.ts +153 -0
  17. package/src/extractors/AbstractExtractor.ts +4 -5
  18. package/src/extractors/PageExtractor.ts +21 -12
  19. package/src/extractors/ResourceExtractor.ts +31 -25
  20. package/src/extractors/TagExtractor.ts +13 -14
  21. package/src/extractors/index.ts +4 -0
  22. package/src/main.ts +71 -43
  23. package/src/page/Page.ts +24 -19
  24. package/src/page/PageFetcher.ts +81 -30
  25. package/src/page/index.ts +3 -0
  26. package/src/printers/AbstractResourcePrinter.ts +6 -6
  27. package/src/printers/JSONStylePrinter.ts +9 -12
  28. package/src/printers/LogStylePrinter.ts +30 -28
  29. package/src/printers/index.ts +3 -0
  30. package/src/resource.ts +88 -96
  31. package/src/security.ts +184 -0
  32. package/tsconfig.eslint.json +5 -0
  33. package/tsconfig.json +27 -11
  34. package/bin/package.json +0 -40
  35. package/bin/src/extractors/AbstractExtractor.js +0 -11
  36. package/bin/src/extractors/AbstractExtractor.js.map +0 -1
  37. package/bin/src/extractors/PageExtractor.js +0 -13
  38. package/bin/src/extractors/PageExtractor.js.map +0 -1
  39. package/bin/src/extractors/ResourceExtractor.js +0 -32
  40. package/bin/src/extractors/ResourceExtractor.js.map +0 -1
  41. package/bin/src/main.js +0 -36
  42. package/bin/src/main.js.map +0 -1
  43. package/bin/src/page/Page.js +0 -8
  44. package/bin/src/page/Page.js.map +0 -1
  45. package/bin/src/page/PageFetcher.js +0 -26
  46. package/bin/src/page/PageFetcher.js.map +0 -1
  47. package/bin/src/printers/AbstractResourcePrinter.js +0 -8
  48. package/bin/src/printers/AbstractResourcePrinter.js.map +0 -1
  49. package/bin/src/printers/JSONStylePrinter.js +0 -12
  50. package/bin/src/printers/JSONStylePrinter.js.map +0 -1
  51. package/bin/src/printers/LogStylePrinter.js +0 -27
  52. package/bin/src/printers/LogStylePrinter.js.map +0 -1
  53. package/bin/src/resource.js +0 -56
  54. package/bin/src/resource.js.map +0 -1
package/README.md CHANGED
@@ -1,39 +1,243 @@
1
- # README
1
+ # PagerTS
2
2
 
3
- PagerTS is a command line utility that provides the user with a portable tool for transforming an URL into a JSON Object.
3
+ [![CI/CD](https://github.com/akinevz0/pagerts/workflows/CI%2FCD%20Security%20Pipeline/badge.svg)](https://github.com/akinevz0/pagerts/actions)
4
+ [![Security](https://img.shields.io/badge/security-maintained-green.svg)](./SECURITY.md)
5
+ [![Node.js Version](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen.svg)](https://nodejs.org)
6
+ [![License](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
4
7
 
5
- The output of this command contains all of navigable items one can navigate to within a webpage.
8
+ PagerTS is a secure, modern command-line utility that transforms URLs into structured JSON objects, extracting all navigable items and resources from webpages.
9
+
10
+ ## Features
11
+
12
+ - 🔒 **Security-First**: Built-in URL validation, rate limiting, and XSS protection
13
+ - 🚀 **Modern TypeScript**: Strict type checking and modern ES2022 syntax
14
+ - ⚡ **Fast**: Efficient parsing with JSDOM and concurrent request handling
15
+ - 🧪 **Well-Tested**: Comprehensive test coverage with Jest
16
+ - 📦 **Easy to Use**: Simple CLI interface with sensible defaults
17
+
18
+ ## Installation
19
+
20
+ ### Global Installation
21
+
22
+ ```bash
23
+ npm install -g pagerts
24
+ pagerts <url>
25
+ ```
26
+
27
+ ### Using npx (No Installation Required)
28
+
29
+ ```bash
30
+ npx pagerts <url>
31
+ ```
32
+
33
+ ### From Source
34
+
35
+ ```bash
36
+ git clone https://github.com/akinevz0/pagerts.git
37
+ cd pagerts
38
+ npm install
39
+ npm run build
40
+ npm link
41
+ ```
6
42
 
7
43
  ## Usage
8
44
 
9
- To use `pagerts` invoke it in the command line as:
45
+ ### Basic Usage
46
+
47
+ Extract resources from a remote URL:
48
+
49
+ ```bash
50
+ pagerts https://example.com
51
+ ```
52
+
53
+ Extract from multiple URLs:
54
+
55
+ ```bash
56
+ pagerts https://example.com https://example.org
57
+ ```
58
+
59
+ Extract from a local HTML file:
10
60
 
11
61
  ```bash
12
- pagerts https://website/page.html
62
+ pagerts file:///path/to/file.html
63
+ ```
64
+
65
+ ### Output Format
66
+
67
+ The output is a JSON object containing:
68
+
69
+ ```json
70
+ {
71
+   "title": "Page Title",
72
+   "url": "https://example.com",
73
+   "resources": [
74
+     {
75
+       "name": "Link Text",
76
+       "url": "https://example.com/page"
77
+     }
78
+   ]
79
+ }
13
80
  ```
14
81
 
15
- There is also support for loading local system html resources.
82
+ Fields:
83
+
84
+ - `title`: The page's title extracted from the `<title>` tag
85
+ - `url`: The URL of the page
86
+ - `resources`: Array of resources found on the page (links, meta tags, embeds)
87
+   - `name`: Readable text or description
88
+   - `url`: Target URL of the resource
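
The field list above maps naturally onto TypeScript types. The following is a minimal sketch, not code taken from the package: the names `Resource`, `PageOutput`, and `isPageOutput` are hypothetical, and only the field names (`title`, `url`, `resources`, `name`) come from the README. It shows one way a consumer might check the shape of parsed output before using it.

```typescript
// Hypothetical type names; only the field names come from the README.
interface Resource {
  name: string;
  url: string;
}

interface PageOutput {
  title: string;
  url: string;
  resources: Resource[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isResource(value: unknown): value is Resource {
  return (
    isRecord(value) &&
    typeof value["name"] === "string" &&
    typeof value["url"] === "string"
  );
}

// Runtime check that an unknown value has the documented output shape.
function isPageOutput(value: unknown): value is PageOutput {
  if (!isRecord(value)) return false;
  const resources: unknown = value["resources"];
  return (
    typeof value["title"] === "string" &&
    typeof value["url"] === "string" &&
    Array.isArray(resources) &&
    resources.every(isResource)
  );
}

// The sample object from the README, parsed as untrusted input.
const sample: unknown = JSON.parse(
  '{"title":"Page Title","url":"https://example.com",' +
    '"resources":[{"name":"Link Text","url":"https://example.com/page"}]}'
);
```

A caller would typically guard with `if (isPageOutput(sample)) { ... }` so that the compiler narrows the type inside the branch.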
89
+
90
+ ## Security
91
+
92
+ PagerTS takes security seriously. See [SECURITY.md](./SECURITY.md) for:
93
+
94
+ - Security features and protections
95
+ - How to report vulnerabilities
96
+ - Best practices for users
97
+ - Security checklist for contributors
16
98
 
17
- ## Output
99
+ ### Built-in Security Features
18
100
 
19
- The output encodes a dependency list of resources. It is parsed using JSDOM library and returned to the user as an object annotated with fields `title`, `url`, `resources`.
101
+ - ✅ URL validation (only allows `http://`, `https://`, `file://`)
102
+ - ✅ Input sanitization to prevent XSS attacks
103
+ - ✅ Rate limiting (50 requests/minute by default)
104
+ - ✅ Request timeouts to prevent hanging
105
+ - ✅ Maximum URL length enforcement
106
+ - ✅ Suspicious pattern detection
107
+ - ✅ Safe HTML parsing (no script execution)
20
108
 
21
- The last field represents a list of tuples, mapping name of the resource to a url.
109
+ ## Development
22
110
 
23
- The name is extracted from the readable text on the page.
111
+ ### Prerequisites
24
112
 
25
- ## Installing
113
+ - Node.js >= 18.0.0
114
+ - npm >= 9.0.0
26
115
 
27
- The CLI can be installed using npm:
116
+ ### Setup
28
117
 
29
118
  ```bash
30
- npm i -g pagerts
31
- pagerts <url>
119
+ # Clone the repository
120
+ git clone https://github.com/akinevz0/pagerts.git
121
+ cd pagerts
122
+
123
+ # Install dependencies
124
+ npm install
125
+
126
+ # Run in development mode
127
+ npm run dev <url>
32
128
  ```
33
129
 
34
- Or run from npx:
130
+ ### Available Scripts
35
131
 
36
132
  ```bash
37
- npx pagerts <url>
133
+ # Run tests
134
+ npm test
135
+
136
+ # Run tests in watch mode
137
+ npm run test:watch
138
+
139
+ # Build the project
140
+ npm run build
141
+
142
+ # Lint code
143
+ npm run lint
144
+
145
+ # Fix linting issues
146
+ npm run lint:fix
147
+
148
+ # Type check
149
+ npm run type-check
150
+
151
+ # Format code
152
+ npm run format
153
+
154
+ # Check formatting
155
+ npm run format:check
156
+
157
+ # Security audit
158
+ npm run security:audit
159
+
160
+ # Complete security check (audit + lint)
161
+ npm run security:check
162
+ ```
163
+
164
+ ### Project Structure
165
+
38
166
  ```
167
+ pagerts/
168
+ ├── src/
169
+ │   ├── main.ts # CLI entry point
170
+ │   ├── security.ts # Security utilities
171
+ │   ├── resource.ts # Resource types
172
+ │   ├── extractors/ # Content extractors
173
+ │   │   ├── AbstractExtractor.ts
174
+ │   │   ├── PageExtractor.ts
175
+ │   │   ├── ResourceExtractor.ts
176
+ │   │   └── TagExtractor.ts
177
+ │   ├── page/ # Page fetching
178
+ │   │   ├── Page.ts
179
+ │   │   └── PageFetcher.ts
180
+ │   ├── printers/ # Output formatters
181
+ │   │   ├── AbstractResourcePrinter.ts
182
+ │   │   ├── JSONStylePrinter.ts
183
+ │   │   └── LogStylePrinter.ts
184
+ │   └── __tests__/ # Test files
185
+ ├── bin/ # Built files
186
+ ├── .github/workflows/ # CI/CD pipelines
187
+ ├── package.json
188
+ ├── tsconfig.json
189
+ ├── jest.config.js
190
+ ├── eslint.config.js
191
+ └── SECURITY.md
192
+ ```
193
+
194
+ ## Contributing
195
+
196
+ Contributions are welcome! Please:
197
+
198
+ 1. Fork the repository
199
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
200
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
201
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
202
+ 5. Open a Pull Request
203
+
204
+ ### Contribution Guidelines
205
+
206
+ - Write tests for new features
207
+ - Follow the existing code style (enforced by ESLint and Prettier)
208
+ - Update documentation as needed
209
+ - Ensure all tests pass (`npm test`)
210
+ - Run security checks (`npm run security:check`)
211
+ - Follow security best practices (see [SECURITY.md](./SECURITY.md))
212
+
213
+ ## License
214
+
215
+ This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.
216
+
217
+ ## Author
218
+
219
+ **Kirill kn253 Nevzorov**
220
+
221
+ ## Support
222
+
223
+ - 🐛 [Report bugs](https://github.com/akinevz0/pagerts/issues)
224
+ - 💡 [Request features](https://github.com/akinevz0/pagerts/issues)
225
+ - 🔒 [Report security issues](./SECURITY.md)
226
+
227
+ ## Changelog
228
+
229
+ ### v0.3.0 (Latest)
230
+
231
+ - ✨ Added comprehensive security features
232
+ - ✨ Implemented URL validation and sanitization
233
+ - ✨ Added rate limiting
234
+ - ✨ Modernized codebase with TypeScript strict mode
235
+ - ✨ Added ESLint with security plugin
236
+ - ✨ Added comprehensive test suite
237
+ - ✨ Added CI/CD with GitHub Actions
238
+ - ✨ Improved error handling and retry logic
239
+ - 📚 Added security documentation
240
+
241
+ ### v0.2.0
39
242
 
243
+ - Initial public release
package/SECURITY.md ADDED
@@ -0,0 +1,160 @@
1
+ # Security Policy
2
+
3
+ ## Supported Versions
4
+
5
+ We release patches for security vulnerabilities. Currently supported versions:
6
+
7
+ | Version | Supported |
8
+ | ------- | ------------------ |
9
+ | 0.3.x | :white_check_mark: |
10
+ | < 0.3.0 | :x: |
11
+
12
+ ## Security Features
13
+
14
+ PagerTS implements several security measures to protect users:
15
+
16
+ ### Input Validation
17
+
18
+ - **URL Validation**: All URLs are validated before processing
19
+ - **Protocol Restrictions**: Only `http://`, `https://`, and `file://` protocols are allowed
20
+ - **Length Limits**: URLs are limited to 2048 characters to prevent DoS attacks
21
+ - **Pattern Detection**: Suspicious patterns (javascript:, data:, etc.) are blocked
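
The input-validation rules listed above (protocol allow-list, 2048-character limit, blocking of `javascript:` and `data:` URLs) could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `src/security.ts`: the names `validateUrl`, `ValidationResult`, `ALLOWED_PROTOCOLS`, and `MAX_URL_LENGTH` are hypothetical. It relies only on the standard `URL` constructor, so an allow-list on `protocol` also rejects `javascript:` and `data:` inputs.

```typescript
// Hypothetical names; the limits mirror the values stated in SECURITY.md.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "file:"]);
const MAX_URL_LENGTH = 2048;

type ValidationResult =
  | { ok: true; url: URL }
  | { ok: false; reason: string };

function validateUrl(input: string): ValidationResult {
  // Reject oversized input before attempting to parse it.
  if (input.length > MAX_URL_LENGTH) {
    return { ok: false, reason: "URL exceeds maximum length" };
  }
  let parsed: URL;
  try {
    parsed = new URL(input);
  } catch {
    return { ok: false, reason: "URL could not be parsed" };
  }
  // Allow-list check: anything other than http, https, or file is refused.
  if (!ALLOWED_PROTOCOLS.has(parsed.protocol)) {
    return { ok: false, reason: `Protocol ${parsed.protocol} is not allowed` };
  }
  return { ok: true, url: parsed };
}
```

An allow-list is used here rather than a deny-list of suspicious patterns because it fails closed: a scheme nobody thought to block is still rejected.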
22
+
23
+ ### Rate Limiting
24
+
25
+ - Requests are rate-limited to prevent abuse (default: 50 requests per minute)
26
+ - Configurable rate limits per instance
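
One common way to implement a per-instance limit such as "50 requests per minute" is a sliding-window counter. The sketch below assumes that approach; the document does not say which algorithm PagerTS uses, and the class name `SlidingWindowRateLimiter` and its parameters are hypothetical. The clock is injectable so the behaviour can be exercised without waiting.

```typescript
// Hypothetical sliding-window limiter; defaults mirror the documented 50/minute.
class SlidingWindowRateLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(
    maxRequests: number = 50,
    windowMs: number = 60_000,
    now: () => number = () => Date.now()
  ) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the request if the window still has capacity.
  tryAcquire(): boolean {
    const current = this.now();
    const cutoff = current - this.windowMs;
    // Drop timestamps that have fallen out of the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxRequests) {
      return false;
    }
    this.timestamps.push(current);
    return true;
  }
}
```

A caller would create one limiter per fetcher instance and skip or delay a request whenever `tryAcquire()` returns `false`.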
27
+
28
+ ### Safe HTML Parsing
29
+
30
+ - JSDOM is configured to run in secure mode
31
+ - JavaScript execution from fetched pages is disabled
32
+ - Timeouts prevent hanging on slow resources
33
+ - Retry logic with exponential backoff for transient failures
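
Retry with exponential backoff, as mentioned in the last bullet above, can be sketched as below. The helper names `backoffDelays`, `sleep`, and `withRetry`, the 200 ms base delay, and the doubling factor are all assumptions chosen for illustration; the actual values used by `PageFetcher` are not stated in this document. The sketch covers only the retry part, not request timeouts.

```typescript
// Delay schedule: baseMs, baseMs * factor, baseMs * factor^2, ...
function backoffDelays(retries: number, baseMs = 200, factor = 2): number[] {
  return Array.from({ length: retries }, (_, attempt) => baseMs * factor ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` once, then retries up to `retries` more times with growing delays.
async function withRetry<T>(
  task: () => Promise<T>,
  retries = 3,
  baseMs = 200
): Promise<T> {
  const delays = backoffDelays(retries, baseMs);
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      const delay = delays[attempt];
      // No delay left means the retry budget is exhausted.
      if (delay === undefined) break;
      await sleep(delay);
    }
  }
  throw lastError;
}
```

In practice a fetcher would retry only failures it considers transient (for example network errors) rather than every rejection, and many implementations add random jitter to the delays.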
34
+
35
+ ### Data Sanitization
36
+
37
+ - HTML content is sanitized to prevent XSS attacks
38
+ - Special characters are properly escaped in output
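
For text that may later be embedded in HTML, escaping the five characters with special meaning is a standard approach. The sketch below is a generic illustration with hypothetical names (`HTML_ESCAPES`, `escapeHtml`); it is not claimed to be the sanitizer the package ships. For the JSON output itself, `JSON.stringify` already escapes quotes and control characters.

```typescript
// Characters with special meaning in HTML and their entity replacements.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replaces each special character in one pass, so "&" is never double-escaped.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}
```

Escaping belongs at the point where text is written into an HTML context; values kept as plain data, such as the `name` field in the JSON output, should stay unescaped until then.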
39
+
40
+ ## Reporting a Vulnerability
41
+
42
+ We take the security of PagerTS seriously. If you believe you have found a security vulnerability, please report it to us as described below.
43
+
44
+ **Please do not report security vulnerabilities through public GitHub issues.**
45
+
46
+ Instead, please report them via email to the maintainer or through GitHub's private vulnerability reporting feature.
47
+
48
+ Please include the following information:
49
+
50
+ - Type of issue (e.g., buffer overflow, SQL injection, cross-site scripting, etc.)
51
+ - Full paths of source file(s) related to the manifestation of the issue
52
+ - The location of the affected source code (tag/branch/commit or direct URL)
53
+ - Any special configuration required to reproduce the issue
54
+ - Step-by-step instructions to reproduce the issue
55
+ - Proof-of-concept or exploit code (if possible)
56
+ - Impact of the issue, including how an attacker might exploit it
57
+
58
+ ### What to Expect
59
+
60
+ - **Acknowledgment**: We will acknowledge your report within 48 hours
61
+ - **Communication**: We will keep you informed about the progress of fixing the issue
62
+ - **Credit**: We will give you credit for the discovery when we announce the fix (unless you prefer to remain anonymous)
63
+
64
+ ## Security Best Practices for Users
65
+
66
+ When using PagerTS, follow these security guidelines:
67
+
68
+ ### 1. Be Cautious with URLs
69
+
70
+ ```bash
71
+ # Good - trusted domain
72
+ pagerts https://example.com
73
+
74
+ # Bad - suspicious or untrusted URLs
75
+ pagerts javascript:alert(1) # Will be blocked
76
+ pagerts data:text/html,... # Will be blocked
77
+ ```
78
+
79
+ ### 2. Use Environment Variables for Sensitive Data
80
+
81
+ Never hardcode sensitive information. Use environment variables:
82
+
83
+ ```bash
84
+ # Create a .env file (never commit this!)
85
+ API_KEY=your_secret_key
86
+
87
+ # Use it in your scripts
88
+ pagerts $TARGET_URL
89
+ ```
90
+
91
+ ### 3. Validate Output
92
+
93
+ Always validate and sanitize output before using it in other systems:
94
+
95
+ ```bash
96
+ # Pipe through jq for safe JSON processing
97
+ pagerts https://example.com | jq '.'
98
+ ```
99
+
100
+ ### 4. Keep Dependencies Updated
101
+
102
+ Regularly update PagerTS and its dependencies:
103
+
104
+ ```bash
105
+ npm update -g pagerts
106
+ ```
107
+
108
+ ### 5. Network Security
109
+
110
+ - Use HTTPS URLs whenever possible
111
+ - Be cautious when fetching from local networks
112
+ - Consider using a VPN or proxy for sensitive operations
113
+
114
+ ### 6. File System Access
115
+
116
+ When using `file://` URLs:
117
+
118
+ - Ensure you have appropriate permissions
119
+ - Be cautious with symbolic links
120
+ - Validate file paths to prevent directory traversal
121
+
122
+ ## Security Checklist for Contributors
123
+
124
+ If you're contributing to PagerTS, ensure your code:
125
+
126
+ - [ ] Validates all user input
127
+ - [ ] Uses parameterized queries (if applicable)
128
+ - [ ] Properly escapes output
129
+ - [ ] Handles errors gracefully without exposing sensitive information
130
+ - [ ] Includes tests for security-critical functionality
131
+ - [ ] Doesn't introduce new dependencies without security review
132
+ - [ ] Follows the principle of least privilege
133
+ - [ ] Includes appropriate logging (without logging sensitive data)
134
+
135
+ ## Dependencies
136
+
137
+ PagerTS regularly audits its dependencies for security vulnerabilities. Run the security check:
138
+
139
+ ```bash
140
+ npm run security:check
141
+ ```
142
+
143
+ ## Automated Security Testing
144
+
145
+ PagerTS uses:
146
+
147
+ - **npm audit**: Checks for known vulnerabilities in dependencies
148
+ - **ESLint with security plugin**: Static analysis for security issues
149
+ - **GitHub Dependabot**: Automated dependency updates
150
+ - **GitHub Actions**: CI/CD with security scanning
151
+
152
+ ## Contact
153
+
154
+ For security concerns, contact: [GitHub Issues](https://github.com/akinevz0/pagerts/issues)
155
+
156
+ ## Acknowledgments
157
+
158
+ We thank the following researchers for responsibly disclosing vulnerabilities:
159
+
160
+ - (None yet - be the first!)