pagerts 0.1.9 → 0.4.1
- package/.github/codeql/codeql-config.yml +7 -0
- package/.github/workflows/ci.yml +146 -0
- package/.github/workflows/dependency-update.yml +52 -0
- package/.prettierignore +5 -0
- package/.prettierrc.json +10 -0
- package/MAINTAINERS.md +30 -0
- package/POST-INSTALL.md +205 -0
- package/README.md +218 -17
- package/SECURITY.md +160 -0
- package/bin/main.js +24 -19
- package/bin/main.js.map +4 -4
- package/eslint.config.mjs +83 -0
- package/{jest.config.js → jest.config.cjs} +45 -30
- package/package.json +34 -13
- package/src/__tests__/PageFetcher.test.ts +48 -0
- package/src/__tests__/security.test.ts +153 -0
- package/src/extractors/AbstractExtractor.ts +4 -5
- package/src/extractors/PageExtractor.ts +21 -21
- package/src/extractors/ResourceExtractor.ts +31 -25
- package/src/extractors/TagExtractor.ts +13 -14
- package/src/extractors/index.ts +4 -0
- package/src/main.ts +71 -48
- package/src/page/Page.ts +24 -20
- package/src/page/PageFetcher.ts +81 -24
- package/src/page/index.ts +3 -0
- package/src/printers/AbstractResourcePrinter.ts +6 -6
- package/src/printers/JSONStylePrinter.ts +9 -12
- package/src/printers/LogStylePrinter.ts +30 -28
- package/src/printers/index.ts +3 -0
- package/src/resource.ts +88 -96
- package/src/security.ts +184 -0
- package/tsconfig.eslint.json +5 -0
- package/tsconfig.json +27 -11
package/README.md
CHANGED
@@ -1,42 +1,243 @@
-#
+# PagerTS
 
-
+[](https://github.com/akinevz0/pagerts/actions)
+[](./SECURITY.md)
+[](https://nodejs.org)
+[](./LICENSE)
 
-PagerTS is a command
+PagerTS is a secure, modern command-line utility that transforms URLs into structured JSON objects, extracting all navigable items and resources from webpages.
 
-##
+## Features
+
+- **Security-First**: Built-in URL validation, rate limiting, and XSS protection
+- **Modern TypeScript**: Strict type checking and modern ES2022 syntax
+- ⚡ **Fast**: Efficient parsing with JSDOM and concurrent request handling
+- 🧪 **Well-Tested**: Comprehensive test coverage with Jest
+- 📦 **Easy to Use**: Simple CLI interface with sensible defaults
+
+## Installation
+
+### Global Installation
+
+```bash
+npm install -g pagerts
+pagerts <url>
+```
+
+### Using npx (No Installation Required)
 
-
+```bash
+npx pagerts <url>
+```
+
+### From Source
 
 ```bash
+git clone https://github.com/akinevz0/pagerts.git
 cd pagerts
 npm install
-
+npm run build
+npm link
+```
+
+## Usage
+
+### Basic Usage
+
+Extract resources from a remote URL:
+
+```bash
+pagerts https://example.com
+```
+
+Extract from multiple URLs:
+
+```bash
+pagerts https://example.com https://example.org
 ```
 
-
+Extract from a local HTML file:
 
 ```bash
-
+pagerts file:///path/to/file.html
+```
+
+### Output Format
+
+The output is a JSON object containing:
+
+```json
+{
+  "title": "Page Title",
+  "url": "https://example.com",
+  "resources": [
+    {
+      "name": "Link Text",
+      "url": "https://example.com/page"
+    }
+  ]
+}
 ```
 
-
+Fields:
 
-
+- `title`: The page's title extracted from the `<title>` tag
+- `url`: The URL of the page
+- `resources`: Array of resources found on the page (links, meta tags, embeds)
+  - `name`: Readable text or description
+  - `url`: Target URL of the resource
 
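The output shape documented in this README hunk can be modelled in TypeScript for code that consumes it. This is a minimal sketch, not the package's own typing: the names `Resource`, `PageOutput`, and `linkedHosts` are assumptions made for illustration, and only the field names (`title`, `url`, `resources`, `name`) come from the README.

```typescript
// Hypothetical type names; only the field names are taken from the README.
interface Resource {
  name: string; // readable text or description
  url: string; // target URL of the resource
}

interface PageOutput {
  title: string;
  url: string;
  resources: Resource[];
}

// Collect the distinct hostnames a page links to, resolving relative
// resource URLs against the page URL.
function linkedHosts(page: PageOutput): string[] {
  const hosts = new Set<string>();
  for (const resource of page.resources) {
    try {
      hosts.add(new URL(resource.url, page.url).hostname);
    } catch {
      // Skip entries whose URL cannot be parsed.
    }
  }
  return Array.from(hosts).sort();
}

const sample: PageOutput = {
  title: "Page Title",
  url: "https://example.com",
  resources: [{ name: "Link Text", url: "https://example.com/page" }],
};

console.log(linkedHosts(sample)); // a single host: example.com
```

Resolving each resource URL against the page URL is a design choice of this sketch; the README does not say whether the tool emits relative or absolute URLs.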
-
+## Security
 
-
+PagerTS takes security seriously. See [SECURITY.md](./SECURITY.md) for:
 
-
+- Security features and protections
+- How to report vulnerabilities
+- Best practices for users
+- Security checklist for contributors
 
-
+### Built-in Security Features
+
+- ✅ URL validation (only allows `http://`, `https://`, `file://`)
+- ✅ Input sanitization to prevent XSS attacks
+- ✅ Rate limiting (50 requests/minute by default)
+- ✅ Request timeouts to prevent hanging
+- ✅ Maximum URL length enforcement
+- ✅ Suspicious pattern detection
+- ✅ Safe HTML parsing (no script execution)
+
+## Development
+
+### Prerequisites
+
+- Node.js >= 18.0.0
+- npm >= 9.0.0
+
+### Setup
 
 ```bash
+# Clone the repository
+git clone https://github.com/akinevz0/pagerts.git
 cd pagerts
-
-
+
+# Install dependencies
+npm install
+
+# Run in development mode
+npm run dev <url>
 ```
 
-
+### Available Scripts
+
+```bash
+# Run tests
+npm test
+
+# Run tests in watch mode
+npm test:watch
+
+# Build the project
+npm run build
+
+# Lint code
+npm run lint
+
+# Fix linting issues
+npm run lint:fix
+
+# Type check
+npm run type-check
+
+# Format code
+npm run format
+
+# Check formatting
+npm run format:check
+
+# Security audit
+npm run security:audit
+
+# Complete security check (audit + lint)
+npm run security:check
+```
+
+### Project Structure
+
+```
+pagerts/
+├── src/
+│   ├── main.ts              # CLI entry point
+│   ├── security.ts          # Security utilities
+│   ├── resource.ts          # Resource types
+│   ├── extractors/          # Content extractors
+│   │   ├── AbstractExtractor.ts
+│   │   ├── PageExtractor.ts
+│   │   ├── ResourceExtractor.ts
+│   │   └── TagExtractor.ts
+│   ├── page/                # Page fetching
+│   │   ├── Page.ts
+│   │   └── PageFetcher.ts
+│   ├── printers/            # Output formatters
+│   │   ├── AbstractResourcePrinter.ts
+│   │   ├── JSONStylePrinter.ts
+│   │   └── LogStylePrinter.ts
+│   └── __tests__/           # Test files
+├── bin/                     # Built files
+├── .github/workflows/       # CI/CD pipelines
+├── package.json
+├── tsconfig.json
+├── jest.config.js
+├── eslint.config.js
+└── SECURITY.md
+```
+
+## Contributing
+
+Contributions are welcome! Please:
+
+1. Fork the repository
+2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+3. Commit your changes (`git commit -m 'Add amazing feature'`)
+4. Push to the branch (`git push origin feature/amazing-feature`)
+5. Open a Pull Request
+
+### Contribution Guidelines
+
+- Write tests for new features
+- Follow the existing code style (enforced by ESLint and Prettier)
+- Update documentation as needed
+- Ensure all tests pass (`npm test`)
+- Run security checks (`npm run security:check`)
+- Follow security best practices (see [SECURITY.md](./SECURITY.md))
+
+## License
+
+This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.
+
+## Author
+
+**Kirill kn253 Nevzorov**
+
+## Support
+
+- [Report bugs](https://github.com/akinevz0/pagerts/issues)
+- 💡 [Request features](https://github.com/akinevz0/pagerts/issues)
+- [Report security issues](./SECURITY.md)
+
+## Changelog
+
+### v0.3.0 (Latest)
+
+- ✨ Added comprehensive security features
+- ✨ Implemented URL validation and sanitization
+- ✨ Added rate limiting
+- ✨ Modernized codebase with TypeScript strict mode
+- ✨ Added ESLint with security plugin
+- ✨ Added comprehensive test suite
+- ✨ Added CI/CD with GitHub Actions
+- ✨ Improved error handling and retry logic
+- Added security documentation
+
+### v0.2.0
 
+- Initial public release
package/SECURITY.md
ADDED
@@ -0,0 +1,160 @@
+# Security Policy
+
+## Supported Versions
+
+We release patches for security vulnerabilities. Currently supported versions:
+
+| Version | Supported          |
+| ------- | ------------------ |
+| 0.3.x   | :white_check_mark: |
+| < 0.3.0 | :x:                |
+
+## Security Features
+
+PagerTS implements several security measures to protect users:
+
+### Input Validation
+
+- **URL Validation**: All URLs are validated before processing
+- **Protocol Restrictions**: Only `http://`, `https://`, and `file://` protocols are allowed
+- **Length Limits**: URLs are limited to 2048 characters to prevent DoS attacks
+- **Pattern Detection**: Suspicious patterns (javascript:, data:, etc.) are blocked
+
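The input-validation rules listed in this hunk can be sketched as a single function. This is an illustration under stated assumptions, not the contents of `src/security.ts`: the names `validateUrl`, `ALLOWED_PROTOCOLS`, `MAX_URL_LENGTH`, and `ValidationResult` are hypothetical, and the sketch relies on a protocol allowlist to reject `javascript:` and `data:` rather than on pattern matching.

```typescript
// Hypothetical constants mirroring the documented rules.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "file:"]);
const MAX_URL_LENGTH = 2048;

type ValidationResult =
  | { ok: true; url: URL }
  | { ok: false; reason: string };

function validateUrl(input: string): ValidationResult {
  // Length check first, so oversized input is never parsed.
  if (input.length > MAX_URL_LENGTH) {
    return { ok: false, reason: "URL exceeds 2048 characters" };
  }
  let parsed: URL;
  try {
    parsed = new URL(input); // throws on strings that are not absolute URLs
  } catch {
    return { ok: false, reason: "not a parseable absolute URL" };
  }
  // An allowlist rejects javascript:, data:, and any other scheme.
  if (!ALLOWED_PROTOCOLS.has(parsed.protocol)) {
    return { ok: false, reason: "protocol " + parsed.protocol + " is not allowed" };
  }
  return { ok: true, url: parsed };
}

console.log(validateUrl("https://example.com").ok); // true
console.log(validateUrl("javascript:alert(1)").ok); // false
```

An allowlist is used here because it fails closed: a scheme nobody thought to block is still rejected.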
+### Rate Limiting
+
+- Requests are rate-limited to prevent abuse (default: 50 requests per minute)
+- Configurable rate limits per instance
+
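One common way to implement a limit such as "50 requests per minute" is a sliding-window counter. The sketch below is an assumption about how such a limiter could look, not the package's implementation; the class name `SlidingWindowLimiter` and its method `tryAcquire` are hypothetical. The clock is passed in as a parameter so the behaviour can be checked deterministically.

```typescript
// Hypothetical sliding-window limiter; defaults mirror the documented
// 50 requests per minute.
class SlidingWindowLimiter {
  private readonly timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number = 50, windowMs: number = 60000) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Records the request and returns true if it fits in the current window.
  tryAcquire(now: number = Date.now()): boolean {
    // Drop timestamps that have aged out of the window.
    for (;;) {
      const oldest = this.timestamps[0];
      if (oldest === undefined || now - oldest < this.windowMs) break;
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.maxRequests) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }
}

const limiter = new SlidingWindowLimiter(2, 1000);
console.log(limiter.tryAcquire(0)); // true
console.log(limiter.tryAcquire(10)); // true
console.log(limiter.tryAcquire(20)); // false, two requests already in the window
```

Each instance carries its own limits, which matches the "configurable per instance" wording in the list above.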
+### Safe HTML Parsing
+
+- JSDOM is configured to run in secure mode
+- JavaScript execution from fetched pages is disabled
+- Timeouts prevent hanging on slow resources
+- Retry logic with exponential backoff for transient failures
+
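The timeout and retry bullets describe a generic pattern that can be sketched independently of JSDOM. The helpers below (`backoffDelayMs`, `withTimeout`, `retryWithBackoff`) are hypothetical names, and the default attempt count, base delay, and timeout are assumptions; the diff summary does not reveal the values used in `PageFetcher.ts`.

```typescript
// Delay before the retry that follows a failed attempt: base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** attempt;
}

// Reject if the promise has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timed out after " + ms + " ms")), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Run `task` up to `attempts` times, waiting longer after each failure.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  attempts: number = 3,
  baseDelayMs: number = 100,
  timeoutMs: number = 5000,
): Promise<T> {
  let lastError: unknown = new Error("no attempts were made");
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await withTimeout(task(), timeoutMs);
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        const delay = backoffDelayMs(attempt, baseDelayMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

A caller would wrap a fetch in it, for example `retryWithBackoff(() => fetchPage(url))`, where `fetchPage` stands in for whatever function performs the request. Note that the timeout only stops waiting; it does not cancel the underlying request.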
+### Data Sanitization
+
+- HTML content is sanitized to prevent XSS attacks
+- Special characters are properly escaped in output
+
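Escaping special characters usually means replacing the five HTML-significant characters with entities before the text is embedded in markup. The function below is a generic sketch of that technique; `escapeHtml` and `HTML_ESCAPES` are hypothetical names and may not correspond to anything in the package.

```typescript
// The five characters that are significant in HTML text and attribute values.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text: string): string {
  // Each match is replaced in a single pass, so "&" is never double-escaped.
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

console.log(escapeHtml("<script>alert(1)</script>"));
// &lt;script&gt;alert(1)&lt;/script&gt;
```

Escaping like this is only sufficient for HTML text and quoted attribute contexts; output destined for JavaScript, CSS, or URLs needs context-specific encoding.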
+## Reporting a Vulnerability
+
+We take the security of PagerTS seriously. If you believe you have found a security vulnerability, please report it to us as described below.
+
+**Please do not report security vulnerabilities through public GitHub issues.**
+
+Instead, please report them via email to the maintainer or through GitHub's private vulnerability reporting feature.
+
+Please include the following information:
+
+- Type of issue (e.g., buffer overflow, SQL injection, cross-site scripting, etc.)
+- Full paths of source file(s) related to the manifestation of the issue
+- The location of the affected source code (tag/branch/commit or direct URL)
+- Any special configuration required to reproduce the issue
+- Step-by-step instructions to reproduce the issue
+- Proof-of-concept or exploit code (if possible)
+- Impact of the issue, including how an attacker might exploit it
+
+### What to Expect
+
+- **Acknowledgment**: We will acknowledge your report within 48 hours
+- **Communication**: We will keep you informed about the progress of fixing the issue
+- **Credit**: We will give you credit for the discovery when we announce the fix (unless you prefer to remain anonymous)
+
+## Security Best Practices for Users
+
+When using PagerTS, follow these security guidelines:
+
+### 1. Be Cautious with URLs
+
+```bash
+# Good - trusted domain
+pagerts https://example.com
+
+# Bad - suspicious or untrusted URLs
+pagerts javascript:alert(1) # Will be blocked
+pagerts data:text/html,... # Will be blocked
+```
+
+### 2. Use Environment Variables for Sensitive Data
+
+Never hardcode sensitive information. Use environment variables:
+
+```bash
+# Create a .env file (never commit this!)
+API_KEY=your_secret_key
+
+# Use it in your scripts
+pagerts $TARGET_URL
+```
+
+### 3. Validate Output
+
+Always validate and sanitize output before using it in other systems:
+
+```bash
+# Pipe through jq for safe JSON processing
+pagerts https://example.com | jq '.'
+```
+
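When the output is consumed by a program rather than by `jq`, validation can be done with a runtime type guard before the data is trusted. The sketch below assumes the JSON shape shown in the README (`title`, `url`, `resources` with `name` and `url`); the names `isPageOutput` and `parsePageOutput` are hypothetical.

```typescript
// Shape documented in the README; the type names are assumptions.
interface Resource {
  name: string;
  url: string;
}

interface PageOutput {
  title: string;
  url: string;
  resources: Resource[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isResource(value: unknown): value is Resource {
  return isRecord(value) && typeof value["name"] === "string" && typeof value["url"] === "string";
}

function isPageOutput(value: unknown): value is PageOutput {
  if (!isRecord(value)) return false;
  const resources = value["resources"];
  return (
    typeof value["title"] === "string" &&
    typeof value["url"] === "string" &&
    Array.isArray(resources) &&
    resources.every(isResource)
  );
}

// Parse a JSON string and refuse anything that does not match the shape.
function parsePageOutput(json: string): PageOutput {
  const data: unknown = JSON.parse(json);
  if (!isPageOutput(data)) {
    throw new Error("unexpected pagerts output shape");
  }
  return data;
}
```

The guard checks structure only. Values such as `url` should still be treated as untrusted input by whatever system receives them.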
+### 4. Keep Dependencies Updated
+
+Regularly update PagerTS and its dependencies:
+
+```bash
+npm update -g pagerts
+```
+
+### 5. Network Security
+
+- Use HTTPS URLs whenever possible
+- Be cautious when fetching from local networks
+- Consider using a VPN or proxy for sensitive operations
+
+### 6. File System Access
+
+When using `file://` URLs:
+
+- Ensure you have appropriate permissions
+- Be cautious with symbolic links
+- Validate file paths to prevent directory traversal
+
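A directory-traversal check typically resolves the candidate path and confirms it still lies under an allowed base directory. The sketch below uses Node's `path` module; the function name `isInsideDirectory` and the idea of a fixed base directory are assumptions for illustration, since the document does not say how (or whether) PagerTS restricts `file://` paths.

```typescript
import * as path from "node:path";

// Returns true when `candidate`, resolved against `baseDir`, stays inside `baseDir`.
function isInsideDirectory(baseDir: string, candidate: string): boolean {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, candidate);
  const relative = path.relative(base, target);
  // A path that escapes the base starts with ".." (or is absolute on another drive).
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  return !escapes;
}

console.log(isInsideDirectory("/srv/pages", "docs/index.html")); // true
console.log(isInsideDirectory("/srv/pages", "../secrets.txt")); // false
```

This check is purely lexical: `path.resolve` does not follow symbolic links, so a symlink inside the base directory can still point elsewhere. Resolving the real path first (for example with `fs.realpathSync`) would be needed to cover that case.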
+## Security Checklist for Contributors
+
+If you're contributing to PagerTS, ensure your code:
+
+- [ ] Validates all user input
+- [ ] Uses parameterized queries (if applicable)
+- [ ] Properly escapes output
+- [ ] Handles errors gracefully without exposing sensitive information
+- [ ] Includes tests for security-critical functionality
+- [ ] Doesn't introduce new dependencies without security review
+- [ ] Follows the principle of least privilege
+- [ ] Includes appropriate logging (without logging sensitive data)
+
+## Dependencies
+
+PagerTS regularly audits its dependencies for security vulnerabilities. Run the security check:
+
+```bash
+npm run security:check
+```
+
+## Automated Security Testing
+
+PagerTS uses:
+
+- **npm audit**: Checks for known vulnerabilities in dependencies
+- **ESLint with security plugin**: Static analysis for security issues
+- **GitHub Dependabot**: Automated dependency updates
+- **GitHub Actions**: CI/CD with security scanning
+
+## Contact
+
+For security concerns, contact: [GitHub Issues](https://github.com/akinevz0/pagerts/issues)
+
+## Acknowledgments
+
+We thank the following researchers for responsibly disclosing vulnerabilities:
+
+- (None yet - be the first!)