@akotliar/sitemap-qa 1.0.0-alpha.0
- package/LICENSE +21 -0
- package/README.md +235 -0
- package/dist/index.cjs +484 -0
- package/dist/index.cjs.map +1 -0
- package/dist/index.d.cts +1 -0
- package/dist/index.d.ts +1 -0
- package/dist/index.js +484 -0
- package/dist/index.js.map +1 -0
- package/package.json +65 -0
package/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2025 Alex Kotliar
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
package/README.md
ADDED
|
@@ -0,0 +1,235 @@
|
|
|
1
|
+
# Sitemap-QA
|
|
2
|
+
|
|
3
|
+
> **Automated sitemap analysis for QA teams** — Detect test/qa/dev/staging URLs, admin paths, sensitive parameters, and URLs that shouldn't be publicly indexed.
|
|
4
|
+
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
|
|
8
|
+
|
|
9
|
+
Sitemap-QA is a command-line tool that automatically discovers, parses, and analyzes website sitemaps to identify potential quality issues, security risks, and configuration problems. It is built for QA teams that need to validate deployments, catch environment leakage, and identify URLs that shouldn't be publicly indexed.
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## 🎯 Why Sitemap-QA?
|
|
14
|
+
|
|
15
|
+
Unlike SEO-focused sitemap validators, Sitemap-QA is designed specifically for **QA validation and risk detection**:
|
|
16
|
+
|
|
17
|
+
- ✅ **Detect environment leakage** — Find staging, dev, or test URLs that shouldn't be in production sitemaps
|
|
18
|
+
- ✅ **Identify exposed admin paths** — Catch `/admin`, `/dashboard`, and internal routes in public indexes
|
|
19
|
+
- ✅ **Flag sensitive parameters** — Detect API keys, tokens, or passwords in sitemap URLs
|
|
20
|
+
- ✅ **Validate domain consistency** — Find protocol mismatches and subdomain issues
|
|
21
|
+
- ✅ **Fast and automated** — Analyze thousands of URLs in seconds with detailed reports
|
|
22
|
+
|
|
23
|
+
Perfect for CI/CD pipelines, pre-release validation, and security audits.
|
|
24
|
+
|
|
25
|
+
---
|
|
26
|
+
|
|
27
|
+
## 🚀 Quick Start
|
|
28
|
+
|
|
29
|
+
### Installation
|
|
30
|
+
|
|
31
|
+
```bash
|
|
32
|
+
npm install -g @akotliar/sitemap-qa
|
|
33
|
+
```
|
|
34
|
+
|
|
35
|
+
### Basic Usage
|
|
36
|
+
|
|
37
|
+
```bash
|
|
38
|
+
# Analyze a website's sitemap
|
|
39
|
+
sitemap-qa analyze https://example.com
|
|
40
|
+
|
|
41
|
+
# Generate JSON output for CI/CD
|
|
42
|
+
sitemap-qa analyze https://example.com --output json > report.json
|
|
43
|
+
|
|
44
|
+
# Increase verbosity for debugging
|
|
45
|
+
sitemap-qa analyze https://example.com --verbose
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
## 📋 Features
|
|
51
|
+
|
|
52
|
+
### Automatic Sitemap Discovery
|
|
53
|
+
- Checks `robots.txt` for sitemap declarations
|
|
54
|
+
- Tests standard paths (`/sitemap.xml`, `/sitemap_index.xml`, etc.)
|
|
55
|
+
- Recursively follows sitemap indexes
|
|
56
|
+
- Handles multiple sitemaps and formats
|
|
57
|
+
- Detects and processes malformed sitemap indexes (sitemaps listed in `<url>` blocks instead of `<sitemap>` blocks)
|
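The first discovery step, reading sitemap declarations from `robots.txt`, can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the function name `sitemapsFromRobots` is hypothetical, and the only behavior it relies on is the standard `Sitemap: <url>` line in `robots.txt`, whose field name is matched case-insensitively.

```typescript
// Hypothetical helper (not from the package source): collect the URLs
// declared on `Sitemap:` lines of a robots.txt body.
function sitemapsFromRobots(robotsTxt: string): string[] {
  const urls: string[] = [];
  for (const line of robotsTxt.split(/\r?\n/)) {
    // Field name is case-insensitive; the value is the first run of non-space characters.
    const match = /^\s*sitemap\s*:\s*(\S+)/i.exec(line);
    if (match && match[1]) {
      urls.push(match[1]);
    }
  }
  return urls;
}

const exampleRobots = [
  "User-agent: *",
  "Disallow: /private/",
  "Sitemap: https://example.com/sitemap.xml",
  "sitemap: https://example.com/sitemap_index.xml",
].join("\n");

const discovered = sitemapsFromRobots(exampleRobots);
console.log(discovered);
```

If no declarations are found, a tool following the list above would presumably fall back to probing the standard paths such as `/sitemap.xml` and `/sitemap_index.xml`.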
|
58
|
+
|
|
59
|
+
### Risk Detection Patterns
|
|
60
|
+
|
|
61
|
+
| Risk Category | Severity | Examples | Can Be Excluded |
|
|
62
|
+
|--------------|----------|----------|-----------------|
|
|
63
|
+
| **Environment Leakage** | High | `staging.example.com`, `/dev/`, `/test/` | ✅ Via patterns |
|
|
64
|
+
| **Admin Paths** | High | `/admin`, `/dashboard`, `/config`, `/console` | ✅ Via patterns |
|
|
65
|
+
| **Internal Content** | Medium | `/internal` paths | ✅ Via patterns |
|
|
66
|
+
| **Sensitive Parameters** | High | `?token=`, `?apikey=`, `?password=` | ✅ Via patterns |
|
|
67
|
+
| **Test Content** | Medium | `/test-`, `sample-`, `demo-` | ✅ Via patterns |
|
|
68
|
+
| **Protocol Inconsistency** | Medium | HTTP URLs in HTTPS sitemaps | ❌ Always detected |
|
|
69
|
+
| **Domain Mismatch** | Medium | Different domains in sitemap | ❌ Always detected |
|
|
70
|
+
|
|
71
|
+
**Note:** Admin path patterns now properly match URLs with query parameters (e.g., `/admin?id=123`).
|
|
72
|
+
|
|
73
|
+
### Output Formats
|
|
74
|
+
|
|
75
|
+
#### HTML Report (Interactive)
|
|
76
|
+
The HTML report provides an interactive, visually appealing view with:
|
|
77
|
+
- Expandable/collapsible sections by severity
|
|
78
|
+
- Download buttons to export all URLs per category
|
|
79
|
+
- Clean, modern design with hover effects
|
|
80
|
+
- Portable single-file format
|
|
81
|
+
|
|
82
|
+
#### JSON Report (Machine-Readable)
|
|
83
|
+
```json
|
|
84
|
+
{
|
|
85
|
+
"analysis_metadata": {
|
|
86
|
+
"base_url": "https://example.com",
|
|
87
|
+
"tool_version": "1.0.0",
|
|
88
|
+
"analysis_type": "rule-based analysis",
|
|
89
|
+
"analysis_timestamp": "2025-12-11T00:00:00.000Z",
|
|
90
|
+
"execution_time_ms": 4523
|
|
91
|
+
},
|
|
92
|
+
"sitemaps_discovered": [
|
|
93
|
+
"https://example.com/sitemap.xml"
|
|
94
|
+
],
|
|
95
|
+
"suspicious_groups": [
|
|
96
|
+
{
|
|
97
|
+
"category": "environment_leakage",
|
|
98
|
+
"severity": "high",
|
|
99
|
+
"count": 3,
|
|
100
|
+
"rationale": "Production sitemap contains staging URLs",
|
|
101
|
+
"sample_urls": ["..."],
|
|
102
|
+
"recommended_action": "Verify sitemap generation excludes non-production environments"
|
|
103
|
+
}
|
|
104
|
+
],
|
|
105
|
+
"summary": {
|
|
106
|
+
"high_severity_count": 2,
|
|
107
|
+
"medium_severity_count": 1,
|
|
108
|
+
"low_severity_count": 0,
|
|
109
|
+
"total_risky_urls": 8,
|
|
110
|
+
"overall_status": "issues_found"
|
|
111
|
+
}
|
|
112
|
+
}
|
|
113
|
+
```
|
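For CI/CD use, the `summary` block of the JSON report is the natural thing to gate on. The sketch below assumes the report has the shape shown in the sample above; the helper name `shouldFailBuild` and the threshold policy are illustrative assumptions, not part of the tool.

```typescript
// Field names follow the sample JSON report; the gating policy is an assumption.
interface ReportSummary {
  high_severity_count: number;
  medium_severity_count: number;
  low_severity_count: number;
  total_risky_urls: number;
  overall_status: string;
}

interface SitemapQaReport {
  summary: ReportSummary;
}

type Severity = "high" | "medium" | "low";

// Hypothetical helper: true when the report contains findings at or above
// the chosen severity threshold.
function shouldFailBuild(report: SitemapQaReport, threshold: Severity = "high"): boolean {
  const { high_severity_count, medium_severity_count, low_severity_count } = report.summary;
  if (threshold === "high") return high_severity_count > 0;
  if (threshold === "medium") return high_severity_count + medium_severity_count > 0;
  return high_severity_count + medium_severity_count + low_severity_count > 0;
}

// Summary values copied from the sample report.
const sampleReport = JSON.parse(
  '{"summary":{"high_severity_count":2,"medium_severity_count":1,' +
    '"low_severity_count":0,"total_risky_urls":8,"overall_status":"issues_found"}}'
) as SitemapQaReport;

console.log(shouldFailBuild(sampleReport));
```

In a pipeline, a wrapper script could read `report.json`, call a function like this, and exit with a non-zero status when it returns `true`.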
|
114
|
+
|
|
115
|
+
---
|
|
116
|
+
|
|
117
|
+
## 🛠️ CLI Options
|
|
118
|
+
|
|
119
|
+
```
|
|
120
|
+
Usage: sitemap-qa analyze [options] <url>
|
|
121
|
+
|
|
122
|
+
Analyze a website's sitemap for quality issues
|
|
123
|
+
|
|
124
|
+
Arguments:
|
|
125
|
+
url Base URL of the website to analyze
|
|
126
|
+
|
|
127
|
+
Options:
|
|
128
|
+
--timeout <seconds> HTTP request timeout in seconds (default: 30)
|
|
129
|
+
--output <format> Output format: html or json (default: "html")
|
|
130
|
+
--output-dir <path> Output directory for reports (default: "./sitemap-qa/report")
|
|
131
|
+
--output-file <path> Custom output filename
|
|
132
|
+
--accepted-patterns <list> Comma-separated patterns to exclude from risk detection
|
|
133
|
+
--verbose Enable verbose logging
|
|
134
|
+
-h, --help Display help for command
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
### Examples
|
|
138
|
+
|
|
139
|
+
```bash
|
|
140
|
+
# Basic analysis with HTML report (default)
|
|
141
|
+
sitemap-qa analyze https://example.com
|
|
142
|
+
|
|
143
|
+
# JSON output for CI/CD integration
|
|
144
|
+
sitemap-qa analyze https://example.com --output json
|
|
145
|
+
|
|
146
|
+
# Custom output directory
|
|
147
|
+
sitemap-qa analyze https://example.com --output-dir ./reports
|
|
148
|
+
|
|
149
|
+
# Exclude specific URL patterns from detection
|
|
150
|
+
sitemap-qa analyze https://example.com --accepted-patterns "internal-*,test-*"
|
|
151
|
+
|
|
152
|
+
# Increase timeout for slow servers
|
|
153
|
+
sitemap-qa analyze https://example.com --timeout 60
|
|
154
|
+
|
|
155
|
+
# Verbose mode for debugging
|
|
156
|
+
sitemap-qa analyze https://example.com --verbose
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
---
|
|
160
|
+
|
|
161
|
+
## 🔧 Configuration
|
|
162
|
+
|
|
163
|
+
Create a `.sitemap-qa.config.json` file in your project root or `~/.sitemap-qa/config.json` for global settings:
|
|
164
|
+
|
|
165
|
+
```json
|
|
166
|
+
{
|
|
167
|
+
"timeout": 30,
|
|
168
|
+
"concurrency": 10,
|
|
169
|
+
"outputFormat": "html",
|
|
170
|
+
"outputDir": "./sitemap-qa/report",
|
|
171
|
+
"verbose": false,
|
|
172
|
+
"acceptedPatterns": [
|
|
173
|
+
"test-*",
|
|
174
|
+
"staging-*"
|
|
175
|
+
]
|
|
176
|
+
}
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
### Configuration Options
|
|
180
|
+
|
|
181
|
+
| Option | Type | Default | Description |
|
|
182
|
+
|--------|------|---------|-------------|
|
|
183
|
+
| `timeout` | number | `30` | HTTP request timeout in seconds (1-300) |
|
|
184
|
+
| `concurrency` | number | `10` | Number of concurrent HTTP requests |
|
|
185
|
+
| `outputFormat` | string | `"html"` | Output format: `"html"` or `"json"` |
|
|
186
|
+
| `outputDir` | string | `"./sitemap-qa/report"` | Directory for generated reports |
|
|
187
|
+
| `verbose` | boolean | `false` | Enable detailed logging |
|
|
188
|
+
| `acceptedPatterns` | string[] | `[]` | URL patterns to exclude from risk detection |
|
|
189
|
+
|
|
190
|
+
### Accepted Patterns
|
|
191
|
+
|
|
192
|
+
Exclude specific URLs from risk detection using wildcard patterns:
|
|
193
|
+
|
|
194
|
+
```json
|
|
195
|
+
{
|
|
196
|
+
"acceptedPatterns": [
|
|
197
|
+
"test-*", // Matches: test-player, test-player-stats
|
|
198
|
+
"https://example.com/admin/*", // Matches: any URL under /admin/
|
|
199
|
+
"/special-case" // Matches: exact path segment
|
|
200
|
+
]
|
|
201
|
+
}
|
|
202
|
+
```
|
|
203
|
+
|
|
204
|
+
**Pattern Syntax:**
|
|
205
|
+
- Use `*` as a wildcard (matches any characters within a path segment)
|
|
206
|
+
- Patterns are case-insensitive
|
|
207
|
+
- Special characters are automatically escaped
|
|
208
|
+
- Patterns match against the full URL
|
|
209
|
+
- Full URLs or path fragments both work
|
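The pattern rules above can be approximated with a small wildcard-to-regex conversion. This is a sketch under stated assumptions, not the package's matcher: the names `patternToRegExp` and `isAccepted` are hypothetical, `*` is taken to mean "any characters except `/`", and a pattern is assumed to match anywhere inside the full URL rather than only at its start.

```typescript
// Hypothetical helpers (not from the package source).
function patternToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters, leaving `*` alone so it can act as the wildcard.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  // Assumption: `*` stays within one path segment, so it never crosses a `/`.
  const body = escaped.replace(/\*/g, "[^/]*");
  // Case-insensitive and unanchored: the pattern may match any part of the URL.
  return new RegExp(body, "i");
}

function isAccepted(url: string, patterns: string[]): boolean {
  return patterns.some((p) => patternToRegExp(p).test(url));
}

const accepted = ["test-*", "https://example.com/admin/*", "/special-case"];

console.log(isAccepted("https://example.com/test-player-stats", accepted));
console.log(isAccepted("https://example.com/about", accepted));
```

Because the match is unanchored in this sketch, `https://example.com/admin/*` also accepts deeper URLs such as `https://example.com/admin/users/1`, which is consistent with the "any URL under /admin/" comment in the example config.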
|
210
|
+
|
|
211
|
+
**Priority:** CLI options > Project config (`.sitemap-qa.config.json`) > Global config (`~/.sitemap-qa/config.json`) > Defaults
|
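The priority order can be illustrated with a layered merge in which later sources override earlier ones. The option names and defaults come from the configuration table; the `resolveConfig` function and the assumption that each layer simply omits the keys it does not set are illustrative only.

```typescript
// Option names and defaults follow the configuration table.
interface SitemapQaConfig {
  timeout: number;
  concurrency: number;
  outputFormat: "html" | "json";
  outputDir: string;
  verbose: boolean;
  acceptedPatterns: string[];
}

const DEFAULTS: SitemapQaConfig = {
  timeout: 30,
  concurrency: 10,
  outputFormat: "html",
  outputDir: "./sitemap-qa/report",
  verbose: false,
  acceptedPatterns: [],
};

// Hypothetical helper: defaults < global config < project config < CLI options.
// Assumes each layer leaves out keys it does not set (no explicit `undefined` values).
function resolveConfig(
  globalConfig: Partial<SitemapQaConfig>,
  projectConfig: Partial<SitemapQaConfig>,
  cliOptions: Partial<SitemapQaConfig>
): SitemapQaConfig {
  return { ...DEFAULTS, ...globalConfig, ...projectConfig, ...cliOptions };
}

const resolved = resolveConfig(
  { timeout: 45 },                 // ~/.sitemap-qa/config.json
  { timeout: 60, verbose: true },  // .sitemap-qa.config.json
  { outputFormat: "json" }         // command-line flags
);

console.log(resolved);
```

Here the project-level `timeout` of 60 wins over the global value of 45, the CLI sets the output format, and `concurrency` falls back to its default.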
|
212
|
+
|
|
213
|
+
## 📝 License
|
|
214
|
+
|
|
215
|
+
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
|
|
216
|
+
|
|
217
|
+
---
|
|
218
|
+
|
|
219
|
+
## 🙏 Acknowledgments
|
|
220
|
+
|
|
221
|
+
Built with:
|
|
222
|
+
- [Commander.js](https://github.com/tj/commander.js) - CLI framework
|
|
223
|
+
- [Chalk](https://github.com/chalk/chalk) - Terminal styling
|
|
224
|
+
- [Vitest](https://vitest.dev/) - Testing framework
|
|
225
|
+
- [TypeScript](https://www.typescriptlang.org/) - Type safety
|
|
226
|
+
|
|
227
|
+
---
|
|
228
|
+
|
|
229
|
+
## 📧 Support
|
|
230
|
+
|
|
231
|
+
- **Issues**: [GitHub Issues](https://github.com/akotliar/sitemap-qa/issues)
|
|
232
|
+
|
|
233
|
+
---
|
|
234
|
+
|
|
235
|
+
**Made with ❤️ for QA teams everywhere**
|