@akotliar/sitemap-qa 1.0.0-alpha.3 → 1.0.0-alpha.4
- package/README.md +106 -100
- package/dist/index.js +758 -2253
- package/dist/index.js.map +1 -1
- package/package.json +9 -7
package/README.md
CHANGED
@@ -13,12 +13,12 @@ Sitemap-QA is a command-line tool that automatically discovers, parses, and anal
 
 ## 🎯 Why Sitemap-QA?
 
-Unlike SEO-focused sitemap validators, Sitemap-QA is designed specifically for **QA validation and risk detection
+Unlike SEO-focused sitemap validators, Sitemap-QA is designed specifically for **QA validation and risk detection** using a **Policy-as-Code** approach:
 
 - ✅ **Detect environment leakage** — Find staging, dev, or test URLs that shouldn't be in production sitemaps
 - ✅ **Identify exposed admin paths** — Catch `/admin`, `/dashboard`, and internal routes in public indexes
-- ✅ **Flag sensitive
-- ✅ **
+- ✅ **Flag sensitive files** — Detect database backups, environment files, and archives
+- ✅ **Fully Customizable** — Define your own risk categories and patterns using Literal, Glob, or Regex matching
 - ✅ **Fast and automated** — Analyze thousands of URLs in seconds with detailed reports
 
 Perfect for CI/CD pipelines, pre-release validation, and security audits.
@@ -42,9 +42,6 @@ sitemap-qa analyze https://example.com
 
 # Generate JSON output for CI/CD
 sitemap-qa analyze https://example.com --output json > report.json
-
-# Increase verbosity for debugging
-sitemap-qa analyze https://example.com --verbose
 ```
 
 ---
@@ -55,61 +52,66 @@ sitemap-qa analyze https://example.com --verbose
 - Checks `robots.txt` for sitemap declarations
 - Tests standard paths (`/sitemap.xml`, `/sitemap_index.xml`, etc.)
 - Recursively follows sitemap indexes
-- Handles multiple sitemaps and
-- Detects and processes malformed sitemap indexes (sitemaps listed in `<url>` blocks instead of `<sitemap>` blocks)
+- Handles multiple sitemaps in supported formats (XML, compressed XML `.xml.gz`, and dynamically generated/PHP-based sitemaps)
 
 ### Risk Detection Patterns
 
-
-
-|
-
-| **
-| **
-| **
-
-
+The tool comes with a set of default policies, but you can fully customize them in your `sitemap-qa.yaml`.
+
+| Risk Category | Description | Example Patterns |
+|--------------|-------------|------------------|
+| **Security & Admin** | Detects exposed administrative interfaces and sensitive configuration files. | `**/admin/**`, `**/.env*`, `/wp-admin` |
+| **Environment Leakage** | Finds staging or development URLs that shouldn't be in production sitemaps. | `**/staging.**`, `**/dev.**` |
+| **Sensitive Files** | Flags database backups, archives, and other sensitive file types. | `**/*.{sql,bak,zip,tar}`, `**/*.tar.gz` |
+
+### Customizing Risks
+
+You can add your own categories and patterns to the `sitemap-qa.yaml` file. Patterns support `literal`, `glob`, and `regex` matching.
+
+```yaml
+policies:
+  - category: "Internal API"
+    patterns:
+      - type: "glob"
+        value: "**/api/v1/internal/**"
+        reason: "Internal API version 1 should not be exposed."
+```
 
 
 ### Output Formats
 
 #### HTML Report (Interactive)
 The HTML report provides an interactive, visually appealing view with:
-- Expandable/collapsible sections by
-- Download buttons to export all URLs per
+- Expandable/collapsible sections by category
+- Download buttons to export all URLs per finding
 - Clean, modern design with hover effects
 - Portable single-file format
 
 #### JSON Report (Machine-Readable)
 ```json
 {
-"
-"
-"
-    "analysis_type": "rule-based analysis",
-    "analysis_timestamp": "2025-12-11T00:00:00.000Z",
-    "execution_time_ms": 4523
+  "metadata": {
+    "generatedAt": "2025-12-24T12:00:00.000Z",
+    "durationMs": 1240
   },
-"
-"
-
-
+  "summary": {
+    "totalUrls": 895,
+    "totalRisks": 2,
+    "urlsWithRisksCount": 1
+  },
+  "findings": [
     {
-"
-"
-
-
-
-
+      "loc": "https://example.com/admin/login",
+      "risks": [
+        {
+          "category": "Security & Admin",
+          "pattern": "**/admin/**",
+          "type": "glob",
+          "reason": "Administrative interfaces should not be publicly indexed."
+        }
+      ]
     }
-  ]
-  "summary": {
-    "high_severity_count": 2,
-    "medium_severity_count": 1,
-    "low_severity_count": 0,
-    "total_risky_urls": 8,
-    "overall_status": "issues_found"
-  }
+  ]
 }
 ```
 
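The alpha.4 JSON report shape introduced in the hunk above can be sketched as TypeScript types with a small consumer. This is an illustrative sketch only: the type names and the `risksByCategory` and `passesGate` helpers are hypothetical, inferred from the README example rather than taken from the package's published typings, and the sample values are adapted from that example.

```typescript
// Hypothetical types inferred from the README's JSON example.
type RuleType = "literal" | "glob" | "regex";

interface Risk {
  category: string;
  pattern: string;
  type: RuleType;
  reason: string;
}

interface Finding {
  loc: string; // URL as listed in the sitemap
  risks: Risk[];
}

interface Report {
  metadata: { generatedAt: string; durationMs: number };
  summary: { totalUrls: number; totalRisks: number; urlsWithRisksCount: number };
  findings: Finding[];
}

// Group risky URLs by category. A URL is listed once per matching risk,
// so it can appear more than once within a category.
function risksByCategory(report: Report): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const finding of report.findings) {
    for (const risk of finding.risks) {
      const urls = grouped.get(risk.category) ?? [];
      urls.push(finding.loc);
      grouped.set(risk.category, urls);
    }
  }
  return grouped;
}

// Hypothetical CI gate: pass only when the summary reports no risks.
function passesGate(report: Report): boolean {
  return report.summary.totalRisks === 0;
}

// Sample data adapted from the README example (counts adjusted to match
// the single finding shown here).
const sample: Report = {
  metadata: { generatedAt: "2025-12-24T12:00:00.000Z", durationMs: 1240 },
  summary: { totalUrls: 895, totalRisks: 1, urlsWithRisksCount: 1 },
  findings: [
    {
      loc: "https://example.com/admin/login",
      risks: [
        {
          category: "Security & Admin",
          pattern: "**/admin/**",
          type: "glob",
          reason: "Administrative interfaces should not be publicly indexed.",
        },
      ],
    },
  ],
};

for (const [category, urls] of risksByCategory(sample)) {
  console.log(`${category}: ${urls.join(", ")}`);
}
console.log(passesGate(sample) ? "gate passed" : "gate failed");
```

Note that the alpha.3 fields removed in this hunk (`analysis_type`, `high_severity_count`, `overall_status`, and so on) have no counterpart in these types, so a consumer written against the old snake_case report would need updating.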
@@ -126,90 +128,90 @@ Arguments:
 url Base URL of the website to analyze
 
 Options:
---
---output <format>
---
---output-file <path> Custom output filename
---accepted-patterns <list> Comma-separated patterns to exclude from risk detection
---verbose Enable verbose logging
+-c, --config <path> Path to sitemap-qa.yaml
+-o, --output <format> Output format: json, html, or all (default: "all")
+-d, --out-dir <path> Output directory for reports (default: ".")
 -h, --help Display help for command
 ```
 
 ### Examples
 
 ```bash
-# Basic analysis with HTML
+# Basic analysis with both HTML and JSON reports (default)
 sitemap-qa analyze https://example.com
 
-# JSON output
+# JSON output only
 sitemap-qa analyze https://example.com --output json
 
 # Custom output directory
-sitemap-qa analyze https://example.com --
-
-# Exclude specific URL patterns from detection
-sitemap-qa analyze https://example.com --accepted-patterns "internal-*,test-*"
-
-# Increase timeout for slow servers
-sitemap-qa analyze https://example.com --timeout 60
+sitemap-qa analyze https://example.com --out-dir ./reports
 
-#
-sitemap-qa analyze https://example.com --
+# Use a specific configuration file
+sitemap-qa analyze https://example.com --config ./custom-config.yaml
 ```
 
 ---
 
 ## 🔧 Configuration
 
-Create a
-
-```
-
-
-
-
-
-
-
-
-
-
-
+Create a `sitemap-qa.yaml` file in your project root to define your monitoring policies and tool settings:
+
+```yaml
+# Tool Settings
+# Default outDir is "."; this example uses a custom reports directory
+outDir: "./sitemap-qa/report" # custom output directory
+outputFormat: "all" # Options: json, html, all
+
+# Monitoring Policies
+policies:
+  - category: "Security & Admin"
+    patterns:
+      - type: "glob"
+        value: "**/admin/**"
+        reason: "Administrative interfaces should not be publicly indexed."
+      - type: "literal"
+        value: "/wp-admin"
+        reason: "WordPress admin paths are common attack vectors."
+      - type: "regex"
+        value: ".*\\.php$"
+        reason: "PHP file detected"
 ```
 
 ### Configuration Options
 
 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
-| `
-| `
-| `
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+| `outDir` | string | `"."` | Directory for generated reports (current working directory by default) |
+| `outputFormat` | string | `"all"` | Report types to generate: `json`, `html`, or `all` |
+| `policies` | array | `[]` | List of monitoring policies with patterns |
+
+> Note: The earlier `sitemap-qa.yaml` example sets `outDir: "./sitemap-qa/report"` as a recommended path. If you omit `outDir`, the default is `"."` (the current working directory).
+### Policy Patterns
+
+Define patterns to detect risks in your sitemaps:
+
+```yaml
+policies:
+  - category: "Custom Rules"
+    patterns:
+      - type: "literal"
+        value: "test"
+        reason: "Test URL found"
+      - type: "glob"
+        value: "**/internal/*"
+        reason: "Internal path exposed"
+      - type: "regex"
+        value: "api/v[0-9]/"
+        reason: "API versioning detected"
 ```
 
-**
--
--
--
--
-- Full URLs or path fragments both work
+**Rule Types:**
+- `literal`: Exact string match
+- `glob`: Wildcard patterns (e.g., `**/admin/**`)
+- `regex`: Regular expression matching (patterns are YAML strings and must use proper escaping)
+- When defining regex patterns in `sitemap-qa.yaml`, remember they are YAML strings, so you must escape backslashes (for example, `".*\\\\.php$"` in YAML corresponds to the regex `.*\.php$`).
 
-**Priority:** CLI options > Project config (
+**Priority:** CLI options > Project config (`sitemap-qa.yaml`) > Defaults
 
 ## 📝 License
 
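The `literal`, `glob`, and `regex` rule types added in the hunk above could be evaluated roughly as follows. This is a minimal sketch under stated assumptions, not the package's implementation: all type and function names are hypothetical, a `literal` rule is assumed to fire when the URL contains the string, and `globToRegExp` is a simplified stand-in that handles only `*` and `**` (the README lists Micromatch for glob matching, whose semantics are richer, for example brace expansion).

```typescript
// Hypothetical shapes mirroring the `sitemap-qa.yaml` examples.
type RuleType = "literal" | "glob" | "regex";

interface Pattern {
  type: RuleType;
  value: string;
  reason: string;
}

interface Policy {
  category: string;
  patterns: Pattern[];
}

interface Risk {
  category: string;
  pattern: string;
  type: RuleType;
  reason: string;
}

// Simplified glob subset: `**` spans any characters (including `/`) and
// `*` stays within one path segment. Braces, `?`, and character classes
// are treated as literal text here.
function globToRegExp(glob: string): RegExp {
  const source = glob
    .split("**")
    .map((part) =>
      part
        .split("*")
        .map((text) => text.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join("[^/]*"),
    )
    .join(".*");
  return new RegExp(`^${source}$`);
}

function matches(pattern: Pattern, url: string): boolean {
  switch (pattern.type) {
    case "literal":
      // Assumption: substring containment; the package may compare differently.
      return url.includes(pattern.value);
    case "glob":
      return globToRegExp(pattern.value).test(url);
    case "regex":
      return new RegExp(pattern.value).test(url);
  }
}

// Collect every rule that fires for one URL, shaped like the `risks`
// array in the JSON report.
function evaluate(url: string, policies: Policy[]): Risk[] {
  const risks: Risk[] = [];
  for (const policy of policies) {
    for (const p of policy.patterns) {
      if (matches(p, url)) {
        risks.push({ category: policy.category, pattern: p.value, type: p.type, reason: p.reason });
      }
    }
  }
  return risks;
}

// Policies copied from the README's configuration example.
const policies: Policy[] = [
  {
    category: "Security & Admin",
    patterns: [
      { type: "glob", value: "**/admin/**", reason: "Administrative interfaces should not be publicly indexed." },
      { type: "literal", value: "/wp-admin", reason: "WordPress admin paths are common attack vectors." },
      { type: "regex", value: ".*\\.php$", reason: "PHP file detected" },
    ],
  },
];

console.log(evaluate("https://example.com/admin/login", policies));
```

In this sketch the TypeScript string `".*\\.php$"` holds the regex `.*\.php$`, which mirrors how the double-quoted value in the configuration example is written.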
@@ -222,6 +224,10 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 Built with:
 - [Commander.js](https://github.com/tj/commander.js) - CLI framework
 - [Chalk](https://github.com/chalk/chalk) - Terminal styling
+- [Undici](https://github.com/nodejs/undici) - High-performance HTTP client
+- [Fast-XML-Parser](https://github.com/NaturalIntelligence/fast-xml-parser) - Fast XML parsing
+- [Zod](https://zod.dev/) - Schema validation
+- [Micromatch](https://github.com/micromatch/micromatch) - Glob pattern matching
 - [Vitest](https://vitest.dev/) - Testing framework
 - [TypeScript](https://www.typescriptlang.org/) - Type safety
 