pagerts 0.1.9 → 0.4.1

package/README.md CHANGED
@@ -1,42 +1,243 @@
1
- # README
1
+ # PagerTS
2
2
 
3
- This project aims to develop an efficient application for downloading and presenting the user a list of resources that a webpage links to on the Internet.
3
+ [![CI/CD](https://github.com/akinevz0/pagerts/workflows/CI%2FCD%20Security%20Pipeline/badge.svg)](https://github.com/akinevz0/pagerts/actions)
4
+ [![Security](https://img.shields.io/badge/security-maintained-green.svg)](./SECURITY.md)
5
+ [![Node.js Version](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen.svg)](https://nodejs.org)
6
+ [![License](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
4
7
 
5
- PagerTS is a command line utility that provides the user with a portable tool for transforming an URL into a JSON Object. The output of this command represents the navigable items within a webpage.
8
+ PagerTS is a secure, modern command-line utility that transforms URLs into structured JSON objects, extracting all navigable items and resources from webpages.
6
9
 
7
- ## Usage
10
+ ## Features
11
+
12
+ - 🔒 **Security-First**: Built-in URL validation, rate limiting, and XSS protection
13
+ - 🚀 **Modern TypeScript**: Strict type checking and modern ES2022 syntax
14
+ - ⚡ **Fast**: Efficient parsing with JSDOM and concurrent request handling
15
+ - 🧪 **Well-Tested**: Comprehensive test coverage with Jest
16
+ - 📦 **Easy to Use**: Simple CLI interface with sensible defaults
17
+
18
+ ## Installation
19
+
20
+ ### Global Installation
21
+
22
+ ```bash
23
+ npm install -g pagerts
24
+ pagerts <url>
25
+ ```
26
+
27
+ ### Using npx (No Installation Required)
8
28
 
9
- To use PagerTS, run it in the command line as:
29
+ ```bash
30
+ npx pagerts <url>
31
+ ```
32
+
33
+ ### From Source
10
34
 
11
35
  ```bash
36
+ git clone https://github.com/akinevz0/pagerts.git
12
37
  cd pagerts
13
38
  npm install
14
- pagerts -h
39
+ npm run build
40
+ npm link
41
+ ```
42
+
43
+ ## Usage
44
+
45
+ ### Basic Usage
46
+
47
+ Extract resources from a remote URL:
48
+
49
+ ```bash
50
+ pagerts https://example.com
51
+ ```
52
+
53
+ Extract from multiple URLs:
54
+
55
+ ```bash
56
+ pagerts https://example.com https://example.org
15
57
  ```
16
58
 
17
- To run the application
59
+ Extract from a local HTML file:
18
60
 
19
61
  ```bash
20
- npm start -- https?://...
62
+ pagerts file:///path/to/file.html
63
+ ```
64
+
65
+ ### Output Format
66
+
67
+ The output is a JSON object containing:
68
+
69
+ ```json
70
+ {
70
+   "title": "Page Title",
71
+   "url": "https://example.com",
72
+   "resources": [
73
+     {
74
+       "name": "Link Text",
75
+       "url": "https://example.com/page"
76
+     }
77
+   ]
78
+ }
21
80
  ```
22
81
 
23
- ## Output
82
+ Fields:
24
83
 
25
- The output encodes a dependency list of resources. It is parsed using JSDOM library and returned to the user as an object annotated with fields `title`, `url`, `resources`.
84
+ - `title`: The page's title extracted from the `<title>` tag
85
+ - `url`: The URL of the page
86
+ - `resources`: Array of resources found on the page (links, meta tags, embeds)
87
+   - `name`: Readable text or description
88
+   - `url`: Target URL of the resource
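
Reviewer sketch (not part of the package): the documented output shape can be written as TypeScript types with a runtime check. Only the field names `title`, `url`, `resources`, and `name` come from the README; the names `Resource`, `PageResult`, `isResource`, and `isPageResult` are hypothetical and may not match the package's own types.

```typescript
// Hypothetical type names; only the field names are taken from the README.
interface Resource {
  name: string; // readable text or description
  url: string; // target URL of the resource
}

interface PageResult {
  title: string;
  url: string;
  resources: Resource[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isResource(value: unknown): value is Resource {
  if (!isRecord(value)) return false;
  const { name, url } = value;
  return typeof name === "string" && typeof url === "string";
}

// Runtime check that a parsed JSON value has the documented shape.
function isPageResult(value: unknown): value is PageResult {
  if (!isRecord(value)) return false;
  const { title, url, resources } = value;
  return (
    typeof title === "string" &&
    typeof url === "string" &&
    Array.isArray(resources) &&
    resources.every(isResource)
  );
}

const samplePage: unknown = JSON.parse(
  '{"title":"Page Title","url":"https://example.com",' +
    '"resources":[{"name":"Link Text","url":"https://example.com/page"}]}'
);
```

A consumer could call `isPageResult` on the parsed CLI output before reading `resources`, so that malformed input is rejected early.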
26
89
 
27
- The last field represents a list of tuples, mapping name of the resource to a url.
90
+ ## Security
28
91
 
29
- The name is extracted from the readable text on the page.
92
+ PagerTS takes security seriously. See [SECURITY.md](./SECURITY.md) for:
30
93
 
31
- ## Installing
94
+ - Security features and protections
95
+ - How to report vulnerabilities
96
+ - Best practices for users
97
+ - Security checklist for contributors
32
98
 
33
- The CLI can be installed using:
99
+ ### Built-in Security Features
100
+
101
+ - ✅ URL validation (only allows `http://`, `https://`, `file://`)
102
+ - ✅ Input sanitization to prevent XSS attacks
103
+ - ✅ Rate limiting (50 requests/minute by default)
104
+ - ✅ Request timeouts to prevent hanging
105
+ - ✅ Maximum URL length enforcement
106
+ - ✅ Suspicious pattern detection
107
+ - ✅ Safe HTML parsing (no script execution)
108
+
109
+ ## Development
110
+
111
+ ### Prerequisites
112
+
113
+ - Node.js >= 18.0.0
114
+ - npm >= 9.0.0
115
+
116
+ ### Setup
34
117
 
35
118
  ```bash
119
+ # Clone the repository
120
+ git clone https://github.com/akinevz0/pagerts.git
36
121
  cd pagerts
37
- git pull
38
- npm install -g ./
122
+
123
+ # Install dependencies
124
+ npm install
125
+
126
+ # Run in development mode
127
+ npm run dev <url>
39
128
  ```
40
129
 
41
- It will be made available as a system-wide application under the name of `pagerts`
130
+ ### Available Scripts
131
+
132
+ ```bash
133
+ # Run tests
134
+ npm test
135
+
136
+ # Run tests in watch mode
137
+ npm run test:watch
138
+
139
+ # Build the project
140
+ npm run build
141
+
142
+ # Lint code
143
+ npm run lint
144
+
145
+ # Fix linting issues
146
+ npm run lint:fix
147
+
148
+ # Type check
149
+ npm run type-check
150
+
151
+ # Format code
152
+ npm run format
153
+
154
+ # Check formatting
155
+ npm run format:check
156
+
157
+ # Security audit
158
+ npm run security:audit
159
+
160
+ # Complete security check (audit + lint)
161
+ npm run security:check
162
+ ```
163
+
164
+ ### Project Structure
165
+
166
+ ```
167
+ pagerts/
168
+ ├── src/
169
+ │   ├── main.ts                  # CLI entry point
170
+ │   ├── security.ts              # Security utilities
171
+ │   ├── resource.ts              # Resource types
172
+ │   ├── extractors/              # Content extractors
173
+ │   │   ├── AbstractExtractor.ts
174
+ │   │   ├── PageExtractor.ts
175
+ │   │   ├── ResourceExtractor.ts
176
+ │   │   └── TagExtractor.ts
177
+ │   ├── page/                    # Page fetching
178
+ │   │   ├── Page.ts
179
+ │   │   └── PageFetcher.ts
180
+ │   ├── printers/                # Output formatters
181
+ │   │   ├── AbstractResourcePrinter.ts
182
+ │   │   ├── JSONStylePrinter.ts
183
+ │   │   └── LogStylePrinter.ts
184
+ │   └── __tests__/               # Test files
185
+ ├── bin/                         # Built files
186
+ ├── .github/workflows/           # CI/CD pipelines
187
+ ├── package.json
188
+ ├── tsconfig.json
189
+ ├── jest.config.js
190
+ ├── eslint.config.js
191
+ └── SECURITY.md
192
+ ```
193
+
194
+ ## Contributing
195
+
196
+ Contributions are welcome! Please:
197
+
198
+ 1. Fork the repository
199
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
200
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
201
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
202
+ 5. Open a Pull Request
203
+
204
+ ### Contribution Guidelines
205
+
206
+ - Write tests for new features
207
+ - Follow the existing code style (enforced by ESLint and Prettier)
208
+ - Update documentation as needed
209
+ - Ensure all tests pass (`npm test`)
210
+ - Run security checks (`npm run security:check`)
211
+ - Follow security best practices (see [SECURITY.md](./SECURITY.md))
212
+
213
+ ## License
214
+
215
+ This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.
216
+
217
+ ## Author
218
+
219
+ **Kirill kn253 Nevzorov**
220
+
221
+ ## Support
222
+
223
+ - 🐛 [Report bugs](https://github.com/akinevz0/pagerts/issues)
224
+ - 💡 [Request features](https://github.com/akinevz0/pagerts/issues)
225
+ - 🔒 [Report security issues](./SECURITY.md)
226
+
227
+ ## Changelog
228
+
229
+ ### v0.3.0 (Latest)
230
+
231
+ - ✨ Added comprehensive security features
232
+ - ✨ Implemented URL validation and sanitization
233
+ - ✨ Added rate limiting
234
+ - ✨ Modernized codebase with TypeScript strict mode
235
+ - ✨ Added ESLint with security plugin
236
+ - ✨ Added comprehensive test suite
237
+ - ✨ Added CI/CD with GitHub Actions
238
+ - ✨ Improved error handling and retry logic
239
+ - 📚 Added security documentation
240
+
241
+ ### v0.2.0
42
242
 
243
+ - Initial public release
package/SECURITY.md ADDED
@@ -0,0 +1,160 @@
1
+ # Security Policy
2
+
3
+ ## Supported Versions
4
+
5
+ We release patches for security vulnerabilities. Currently supported versions:
6
+
7
+ | Version | Supported |
8
+ | ------- | ------------------ |
9
+ | 0.3.x | :white_check_mark: |
10
+ | < 0.3.0 | :x: |
11
+
12
+ ## Security Features
13
+
14
+ PagerTS implements several security measures to protect users:
15
+
16
+ ### Input Validation
17
+
18
+ - **URL Validation**: All URLs are validated before processing
19
+ - **Protocol Restrictions**: Only `http://`, `https://`, and `file://` protocols are allowed
20
+ - **Length Limits**: URLs are limited to 2048 characters to prevent DoS attacks
21
+ - **Pattern Detection**: Suspicious patterns (javascript:, data:, etc.) are blocked
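
Reviewer sketch (not part of the package): the input-validation rules listed above can be expressed as a scheme allowlist plus a length cap. The 2048-character limit and the three allowed protocols come from the policy text; the names `validateUrl`, `ALLOWED_SCHEMES`, and `MAX_URL_LENGTH` are hypothetical, and the actual checks in `src/security.ts` may differ. With an allowlist, `javascript:` and `data:` are rejected without a separate blocklist.

```typescript
// Values taken from the policy text above; all names are hypothetical.
const ALLOWED_SCHEMES: string[] = ["http", "https", "file"];
const MAX_URL_LENGTH = 2048;

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateUrl(input: string): ValidationResult {
  const trimmed = input.trim();
  if (trimmed.length === 0) {
    return { ok: false, reason: "empty URL" };
  }
  if (trimmed.length > MAX_URL_LENGTH) {
    return { ok: false, reason: "URL exceeds " + MAX_URL_LENGTH + " characters" };
  }
  // A scheme is a letter followed by letters, digits, "+", "-", or "." (RFC 3986).
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(trimmed);
  if (match === null) {
    return { ok: false, reason: "missing scheme" };
  }
  const scheme = (match[1] ?? "").toLowerCase();
  if (ALLOWED_SCHEMES.indexOf(scheme) === -1) {
    return { ok: false, reason: 'scheme "' + scheme + ':" is not allowed' };
  }
  return { ok: true };
}
```

This checks only the scheme and length; a fuller validator would also parse the URL and inspect the host and path.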
22
+
23
+ ### Rate Limiting
24
+
25
+ - Requests are rate-limited to prevent abuse (default: 50 requests per minute)
26
+ - Configurable rate limits per instance
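
Reviewer sketch (not part of the package): one common way to enforce "50 requests per minute" is a sliding-window limiter. The defaults mirror the numbers in the policy text; the class name `SlidingWindowRateLimiter`, its method `tryAcquire`, and the injectable clock are assumptions for illustration, and the package may use a different algorithm.

```typescript
// Hypothetical sliding-window limiter; defaults mirror the documented 50 requests/minute.
class SlidingWindowRateLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(
    maxRequests: number = 50,
    windowMs: number = 60000,
    now: () => number = () => Date.now()
  ) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now; // injectable clock so the limiter can be tested deterministically
  }

  // Returns true and records the request if it fits in the current window.
  tryAcquire(): boolean {
    const current = this.now();
    const cutoff = current - this.windowMs;
    // Keep only the requests that are still inside the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxRequests) {
      return false;
    }
    this.timestamps.push(current);
    return true;
  }
}
```

A caller would check `tryAcquire()` before each fetch and delay or reject the request when it returns `false`.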
27
+
28
+ ### Safe HTML Parsing
29
+
30
+ - JSDOM is configured to run in secure mode
31
+ - JavaScript execution from fetched pages is disabled
32
+ - Timeouts prevent hanging on slow resources
33
+ - Retry logic with exponential backoff for transient failures
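
Reviewer sketch (not part of the package): "exponential backoff" means the wait between retries doubles after each failed attempt, usually up to a cap. The function below only computes the delay schedule; the name `backoffDelays` and the 500 ms base and 8000 ms cap are illustrative assumptions, not values taken from PagerTS.

```typescript
// Hypothetical helper: delay (in ms) to wait before each retry attempt.
// The base and cap values are illustrative, not taken from the package.
function backoffDelays(
  maxRetries: number,
  baseMs: number = 500,
  capMs: number = 8000
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // Double the delay on every attempt, but never exceed the cap.
    delays.push(Math.min(capMs, baseMs * Math.pow(2, attempt)));
  }
  return delays;
}

// backoffDelays(4) yields [500, 1000, 2000, 4000]
```

Real retry loops often add random jitter to each delay so that many clients do not retry at the same instant.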
34
+
35
+ ### Data Sanitization
36
+
37
+ - HTML content is sanitized to prevent XSS attacks
38
+ - Special characters are properly escaped in output
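
Reviewer sketch (not part of the package): escaping special characters typically means replacing the five HTML-significant characters with entities before the text is embedded in markup. The name `escapeHtml` is hypothetical, and the package's own sanitization may work differently.

```typescript
// Hypothetical helper: escape the characters that are significant in HTML.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text: string): string {
  // Each matched character is looked up in the table above.
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// escapeHtml("<b>T&C</b>") yields "&lt;b&gt;T&amp;C&lt;/b&gt;"
```

Escaping protects HTML contexts only; output that ends up in a shell command or a URL needs escaping appropriate to that context.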
39
+
40
+ ## Reporting a Vulnerability
41
+
42
+ We take the security of PagerTS seriously. If you believe you have found a security vulnerability, please report it to us as described below.
43
+
44
+ **Please do not report security vulnerabilities through public GitHub issues.**
45
+
46
+ Instead, please report them via email to the maintainer or through GitHub's private vulnerability reporting feature.
47
+
48
+ Please include the following information:
49
+
50
+ - Type of issue (e.g., buffer overflow, SQL injection, cross-site scripting, etc.)
51
+ - Full paths of source file(s) related to the manifestation of the issue
52
+ - The location of the affected source code (tag/branch/commit or direct URL)
53
+ - Any special configuration required to reproduce the issue
54
+ - Step-by-step instructions to reproduce the issue
55
+ - Proof-of-concept or exploit code (if possible)
56
+ - Impact of the issue, including how an attacker might exploit it
57
+
58
+ ### What to Expect
59
+
60
+ - **Acknowledgment**: We will acknowledge your report within 48 hours
61
+ - **Communication**: We will keep you informed about the progress of fixing the issue
62
+ - **Credit**: We will give you credit for the discovery when we announce the fix (unless you prefer to remain anonymous)
63
+
64
+ ## Security Best Practices for Users
65
+
66
+ When using PagerTS, follow these security guidelines:
67
+
68
+ ### 1. Be Cautious with URLs
69
+
70
+ ```bash
71
+ # Good - trusted domain
72
+ pagerts https://example.com
73
+
74
+ # Bad - suspicious or untrusted URLs
75
+ pagerts javascript:alert(1) # Will be blocked
76
+ pagerts data:text/html,... # Will be blocked
77
+ ```
78
+
79
+ ### 2. Use Environment Variables for Sensitive Data
80
+
81
+ Never hardcode sensitive information. Use environment variables:
82
+
83
+ ```bash
84
+ # Create a .env file (never commit this!)
85
+ API_KEY=your_secret_key
86
+
87
+ # Use it in your scripts
88
+ pagerts "$TARGET_URL"
89
+ ```
90
+
91
+ ### 3. Validate Output
92
+
93
+ Always validate and sanitize output before using it in other systems:
94
+
95
+ ```bash
96
+ # Pipe through jq for safe JSON processing
97
+ pagerts https://example.com | jq '.'
98
+ ```
99
+
100
+ ### 4. Keep Dependencies Updated
101
+
102
+ Regularly update PagerTS and its dependencies:
103
+
104
+ ```bash
105
+ npm update -g pagerts
106
+ ```
107
+
108
+ ### 5. Network Security
109
+
110
+ - Use HTTPS URLs whenever possible
111
+ - Be cautious when fetching from local networks
112
+ - Consider using a VPN or proxy for sensitive operations
113
+
114
+ ### 6. File System Access
115
+
116
+ When using `file://` URLs:
117
+
118
+ - Ensure you have appropriate permissions
119
+ - Be cautious with symbolic links
120
+ - Validate file paths to prevent directory traversal
121
+
122
+ ## Security Checklist for Contributors
123
+
124
+ If you're contributing to PagerTS, ensure your code:
125
+
126
+ - [ ] Validates all user input
127
+ - [ ] Uses parameterized queries (if applicable)
128
+ - [ ] Properly escapes output
129
+ - [ ] Handles errors gracefully without exposing sensitive information
130
+ - [ ] Includes tests for security-critical functionality
131
+ - [ ] Doesn't introduce new dependencies without security review
132
+ - [ ] Follows the principle of least privilege
133
+ - [ ] Includes appropriate logging (without logging sensitive data)
134
+
135
+ ## Dependencies
136
+
137
+ PagerTS regularly audits its dependencies for security vulnerabilities. Run the security check:
138
+
139
+ ```bash
140
+ npm run security:check
141
+ ```
142
+
143
+ ## Automated Security Testing
144
+
145
+ PagerTS uses:
146
+
147
+ - **npm audit**: Checks for known vulnerabilities in dependencies
148
+ - **ESLint with security plugin**: Static analysis for security issues
149
+ - **GitHub Dependabot**: Automated dependency updates
150
+ - **GitHub Actions**: CI/CD with security scanning
151
+
152
+ ## Contact
153
+
154
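+ For general, non-sensitive security questions, contact: [GitHub Issues](https://github.com/akinevz0/pagerts/issues). Report vulnerabilities privately, as described under "Reporting a Vulnerability".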
155
+
156
+ ## Acknowledgments
157
+
158
+ We thank the following researchers for responsibly disclosing vulnerabilities:
159
+
160
+ - (None yet - be the first!)