@akotliar/sitemap-qa 1.0.0-alpha.3 → 1.0.0-alpha.5
- package/README.md +165 -104
- package/dist/index.js +841 -2248
- package/dist/index.js.map +1 -1
- package/package.json +9 -7
package/README.md
CHANGED
@@ -11,14 +11,39 @@ Sitemap-QA is a command-line tool that automatically discovers, parses, and anal
 
 ---
 
+## 📑 Table of Contents
+
+- [Why Sitemap-QA?](#-why-sitemap-qa)
+- [Quick Start](#-quick-start)
+  - [Installation](#installation)
+  - [Basic Usage](#basic-usage)
+- [Features](#-features)
+  - [Automatic Sitemap Discovery](#automatic-sitemap-discovery)
+  - [Risk Detection Patterns](#risk-detection-patterns)
+  - [Customizing Risks](#customizing-risks)
+  - [Output Formats](#output-formats)
+- [CLI Commands](#-cli-commands)
+  - [analyze](#analyze-command)
+  - [init](#init-command)
+- [Configuration](#-configuration)
+  - [Configuration Options](#configuration-options)
+
+- [License](#-license)
+- [Acknowledgments](#-acknowledgments)
+- [Support](#-support)
+
+---
+
 ## 🎯 Why Sitemap-QA?
 
-Unlike SEO-focused sitemap validators, Sitemap-QA is designed specifically for **QA validation and risk detection
+Unlike SEO-focused sitemap validators, Sitemap-QA is designed specifically for **QA validation and risk detection** using a **Policy-as-Code** approach:
 
 - ✅ **Detect environment leakage** — Find staging, dev, or test URLs that shouldn't be in production sitemaps
 - ✅ **Identify exposed admin paths** — Catch `/admin`, `/dashboard`, and internal routes in public indexes
-- ✅ **Flag sensitive
-- ✅ **
+- ✅ **Flag sensitive files** — Detect database backups, environment files, and archives
+- ✅ **Domain Consistency** — Automatically flag URLs that point to external or incorrect domains (handles `www.` normalization)
+- ✅ **Acceptable Patterns (Allowlist)** — Exclude known safe URLs from being flagged as risks
+- ✅ **Fully Customizable** — Define your own risk categories and patterns using Literal, Glob, or Regex matching
 - ✅ **Fast and automated** — Analyze thousands of URLs in seconds with detailed reports
 
 Perfect for CI/CD pipelines, pre-release validation, and security audits.
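The Domain Consistency bullet added in this hunk says the tool flags URLs on external or incorrect domains while normalizing `www.`. A minimal TypeScript sketch of that kind of comparison follows. It assumes the check is a plain hostname comparison after stripping one leading `www.` label; the function names are hypothetical and are not taken from the package source.

```typescript
// Hypothetical helpers illustrating "www."-insensitive host comparison.
// This is an assumption about the approach, not the package's actual code.

/** Return the hostname with a single leading "www." label removed. */
function normalizeHost(rawUrl: string): string {
  // The URL parser already lower-cases the hostname.
  const host = new URL(rawUrl).hostname;
  return host.startsWith("www.") ? host.slice(4) : host;
}

/** True when both URLs share a host once "www." is ignored. */
function isSameDomain(targetUrl: string, candidateUrl: string): boolean {
  return normalizeHost(targetUrl) === normalizeHost(candidateUrl);
}

/** Return the sitemap entries whose host differs from the target site. */
function findForeignUrls(targetUrl: string, urls: string[]): string[] {
  return urls.filter((url) => !isSameDomain(targetUrl, url));
}

const foreign = findForeignUrls("https://example.com", [
  "https://www.example.com/about",
  "https://example.com/contact",
  "https://other.com/landing",
]);
console.log(foreign); // only the other.com entry is on a different host
```

Under this assumption a subdomain such as `staging.example.com` counts as a different host. Whether the real tool treats subdomains that way is not stated in the README.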
@@ -37,14 +62,17 @@ npm install -g @akotliar/sitemap-qa@alpha
 ### Basic Usage
 
 ```bash
-#
+# Step 1: Initialize a configuration file (optional but recommended)
+sitemap-qa init
+
+# Step 2: Analyze a website's sitemap
 sitemap-qa analyze https://example.com
 
-# Generate JSON output for CI/CD
-sitemap-qa analyze https://example.com --output json
+# Generate JSON output only for CI/CD
+sitemap-qa analyze https://example.com --output json
 
-#
-sitemap-qa analyze https://example.com --
+# Use a custom configuration file
+sitemap-qa analyze https://example.com --config ./custom-config.yaml
 ```
 
 ---
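The `--output json` example in this hunk is described as being for CI/CD. One way a pipeline step might act on the report is sketched below in TypeScript. It assumes the report carries the `summary.totalRisks` and `findings[].risks[]` fields shown in the README's JSON report example. The `gate` function and its `maxRisks` threshold are hypothetical, and the report's file name is not stated in the README, so the sketch works on an already-parsed object.

```typescript
// Field names mirror the README's JSON report example; nothing else is assumed.
interface Risk {
  category: string;
  pattern: string;
  type: string;
  reason: string;
}

interface Finding {
  loc: string;
  risks: Risk[];
}

interface Report {
  summary: {
    totalUrls: number;
    totalRisks: number;
    urlsWithRisksCount: number;
    ignoredUrlsCount: number;
  };
  findings: Finding[];
}

/** Hypothetical CI gate: logs each risk and returns an exit code (0 = pass, 1 = fail). */
function gate(report: Report, maxRisks = 0): number {
  for (const finding of report.findings) {
    for (const risk of finding.risks) {
      console.error(`[${risk.category}] ${finding.loc}: ${risk.reason}`);
    }
  }
  return report.summary.totalRisks > maxRisks ? 1 : 0;
}

// Sample data shaped like the README example (counts adjusted to match one finding).
const sampleReport: Report = {
  summary: { totalUrls: 895, totalRisks: 1, urlsWithRisksCount: 1, ignoredUrlsCount: 5 },
  findings: [
    {
      loc: "https://example.com/admin/login",
      risks: [
        {
          category: "Security & Admin",
          pattern: "**/admin/**",
          type: "glob",
          reason: "Administrative interfaces should not be publicly indexed.",
        },
      ],
    },
  ],
};

const exitCode = gate(sampleReport);
```

In a real pipeline the object would come from parsing the generated report file, and the returned code would be passed to the process exit call of whatever runtime executes the step.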
@@ -55,67 +83,102 @@ sitemap-qa analyze https://example.com --verbose
 - Checks `robots.txt` for sitemap declarations
 - Tests standard paths (`/sitemap.xml`, `/sitemap_index.xml`, etc.)
 - Recursively follows sitemap indexes
-- Handles multiple sitemaps and
-- Detects and processes malformed sitemap indexes (sitemaps listed in `<url>` blocks instead of `<sitemap>` blocks)
+- Handles multiple sitemaps in supported formats (XML, compressed XML `.xml.gz`, and dynamically generated/PHP-based sitemaps)
 
 ### Risk Detection Patterns
 
-
-
-|
-
-| **
-| **
-| **
-| **
-
+The tool comes with a set of default policies, but you can fully customize them in your `sitemap-qa.yaml`.
+
+| Risk Category | Description | Example Patterns |
+|--------------|-------------|------------------|
+| **Security & Admin** | Detects exposed administrative interfaces and sensitive configuration files. | `**/admin/**`, `**/.env*`, `/wp-admin` |
+| **Environment Leakage** | Finds staging or development URLs that shouldn't be in production sitemaps. | `**/staging.**`, `**/dev.**` |
+| **Sensitive Files** | Flags database backups, archives, and other sensitive file types. | `**/*.{sql,bak,zip,tar}`, `**/*.tar.gz` |
+| **Domain Consistency** | Detects URLs that don't match the target domain (ignoring `www.` differences). | `example.com` vs `other.com` |
+
+### Customizing Risks
+
+You can add your own categories and patterns to the `sitemap-qa.yaml` file. Patterns support `literal`, `glob`, and `regex` matching. See the [Configuration](#-configuration) section for details.
 
 
 ### Output Formats
 
 #### HTML Report (Interactive)
 The HTML report provides an interactive, visually appealing view with:
-- Expandable/collapsible sections by
-- Download buttons to export all URLs per
+- Expandable/collapsible sections by category
+- Download buttons to export all URLs per finding
 - Clean, modern design with hover effects
 - Portable single-file format
 
 #### JSON Report (Machine-Readable)
 ```json
 {
-"
-"
-"
-
-
-"
+  "metadata": {
+    "generatedAt": "2025-12-24T12:00:00.000Z",
+    "durationMs": 1240
+  },
+  "summary": {
+    "totalUrls": 895,
+    "totalRisks": 2,
+    "urlsWithRisksCount": 1,
+    "ignoredUrlsCount": 5
   },
-"
-    "https://example.com/sitemap.xml"
-  ],
-  "suspicious_groups": [
+  "findings": [
     {
-"
-"
-
-
-
-
+      "loc": "https://example.com/admin/login",
+      "risks": [
+        {
+          "category": "Security & Admin",
+          "pattern": "**/admin/**",
+          "type": "glob",
+          "reason": "Administrative interfaces should not be publicly indexed."
+        }
+      ]
     }
-  ]
-  "summary": {
-    "high_severity_count": 2,
-    "medium_severity_count": 1,
-    "low_severity_count": 0,
-    "total_risky_urls": 8,
-    "overall_status": "issues_found"
-  }
+  ]
 }
 ```
 
 ---
 
-## 🛠️ CLI
+## 🛠️ CLI Commands
+
+Sitemap-QA provides two main commands: `init` and `analyze`.
+
+
+### init Command
+
+Initialize a default `sitemap-qa.yaml` configuration file in the current directory.
+
+```
+Usage: sitemap-qa init [options]
+
+Initialize a default sitemap-qa.yaml configuration file
+
+Options:
+  -h, --help  Display help for command
+```
+
+#### Example
+
+```bash
+# Create a default configuration file
+sitemap-qa init
+
+# This creates sitemap-qa.yaml with:
+# - Default risk policies (Security & Admin, Environment Leakage, Sensitive Files)
+# - Example acceptable patterns
+# - Default output settings
+```
+
+**Note:** The `init` command will fail if `sitemap-qa.yaml` already exists in the current directory to prevent accidental overwrites.
+
+---
+
+
+### analyze Command
+
+Analyze a website's sitemap for quality issues and security risks.
 
 ```
 Usage: sitemap-qa analyze <url> [options]
@@ -126,90 +189,84 @@ Arguments:
   url                    Base URL of the website to analyze
 
 Options:
-  --
-  --output <format>
-  --
-  --output-file <path>  Custom output filename
-  --accepted-patterns <list>  Comma-separated patterns to exclude from risk detection
-  --verbose  Enable verbose logging
+  -c, --config <path>    Path to sitemap-qa.yaml configuration file
+  -o, --output <format>  Output format: json, html, or all (default: "all")
+  -d, --out-dir <path>   Output directory for reports (default: ".")
   -h, --help             Display help for command
 ```
 
-
+#### Examples
 
 ```bash
-# Basic analysis with HTML
+# Basic analysis with both HTML and JSON reports (default)
 sitemap-qa analyze https://example.com
 
-# JSON output
+# JSON output only
 sitemap-qa analyze https://example.com --output json
 
-#
-sitemap-qa analyze https://example.com --output
+# HTML output only
+sitemap-qa analyze https://example.com --output html
 
-#
-sitemap-qa analyze https://example.com --
+# Custom output directory
+sitemap-qa analyze https://example.com --out-dir ./reports
 
-#
-sitemap-qa analyze https://example.com --
+# Use a specific configuration file
+sitemap-qa analyze https://example.com --config ./custom-config.yaml
 
-#
-sitemap-qa analyze https://example.com --
+# Combine options
+sitemap-qa analyze https://example.com --config ./custom-config.yaml --output json --out-dir ./reports
 ```
 
----
-
 ## 🔧 Configuration
 
-Create a
-
-```
-
-
-
-
-
-
-
-
-
-
-
+Create a `sitemap-qa.yaml` file in your project root to define your monitoring policies and tool settings:
+
+```yaml
+# Tool Settings
+# Default outDir is "."; this example uses a custom reports directory
+outDir: "./sitemap-qa/report" # custom output directory
+outputFormat: "all" # Options: json, html, all
+enforceDomainConsistency: true # Flag URLs from other domains
+
+# Monitoring Policies
+acceptable_patterns:
+  - type: "literal"
+    value: "/acceptable-path"
+    reason: "Example of an acceptable path that should not be flagged."
+  - type: "glob"
+    value: "**/public-docs/**"
+    reason: "Public documentation is always acceptable."
+
+policies:
+  - category: "Security & Admin"
+    patterns:
+      - type: "glob"
+        value: "**/admin/**"
+        reason: "Administrative interfaces should not be publicly indexed."
+      - type: "literal"
+        value: "/wp-admin"
+        reason: "WordPress admin paths are common attack vectors."
+      - type: "regex"
+        value: ".*\\.php$"
+        reason: "PHP file detected"
 ```
 
 ### Configuration Options
 
 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
-| `
-| `
-| `
-| `
-| `
-| `acceptedPatterns` | string[] | `[]` | URL patterns to exclude from risk detection |
+| `outDir` | string | `"."` | Directory for generated reports (current working directory by default) |
+| `outputFormat` | string | `"all"` | Report types to generate: `json`, `html`, or `all` |
+| `enforceDomainConsistency` | boolean | `true` | If true, flags URLs that don't match the root sitemap domain (ignoring `www.`) |
+| `acceptable_patterns` | array | `[]` | List of patterns to exclude from risk analysis |
+| `policies` | array | `[]` | List of monitoring policies with patterns |
 
-### Accepted Patterns
 
-
+**Priority:** CLI options > Project config (`sitemap-qa.yaml`) > Defaults
 
-```json
-{
-  "acceptedPatterns": [
-    "testing-*", // Matches: test-player, test-player-stats
-    "https://example.com/admin/*", // Matches: any URL under /admin/
-    "/special-case" // Matches: exact path segment
-  ]
-}
-```
 
-**Pattern Syntax:**
-- Use `*` as a wildcard (matches any characters within a path segment)
-- Patterns are case-insensitive
-- Special characters are automatically escaped
-- Patterns match against the full URL
-- Full URLs or path fragments both work
 
-
+---
 
 ## 📝 License
 
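The priority rule added in this hunk (CLI options > project config > defaults) can be pictured as a per-field fallback chain. The TypeScript sketch below uses the option names and defaults from the Configuration Options table. The `resolveSettings` function is hypothetical, and the mapping of `--output` to `outputFormat` and `--out-dir` to `outDir` is an assumption, since the README does not spell it out.

```typescript
type OutputFormat = "json" | "html" | "all";

interface Settings {
  outDir: string;
  outputFormat: OutputFormat;
  enforceDomainConsistency: boolean;
}

// Defaults as listed in the README's Configuration Options table.
const DEFAULTS: Settings = {
  outDir: ".",
  outputFormat: "all",
  enforceDomainConsistency: true,
};

// Assumed mapping: --out-dir -> outDir, --output -> outputFormat.
type CliOverrides = Partial<Pick<Settings, "outDir" | "outputFormat">>;

/** Hypothetical resolver: a CLI value wins, then the YAML value, then the default. */
function resolveSettings(fileConfig: Partial<Settings>, cli: CliOverrides): Settings {
  return {
    outDir: cli.outDir ?? fileConfig.outDir ?? DEFAULTS.outDir,
    outputFormat: cli.outputFormat ?? fileConfig.outputFormat ?? DEFAULTS.outputFormat,
    enforceDomainConsistency:
      fileConfig.enforceDomainConsistency ?? DEFAULTS.enforceDomainConsistency,
  };
}

// The YAML sets outDir and outputFormat; the CLI overrides only the format.
const resolved = resolveSettings(
  { outDir: "./sitemap-qa/report", outputFormat: "html" },
  { outputFormat: "json" },
);
console.log(resolved);
```

Using `??` rather than `||` keeps an explicit `enforceDomainConsistency: false` in the YAML from being replaced by the default `true`.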
@@ -222,6 +279,10 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 Built with:
 - [Commander.js](https://github.com/tj/commander.js) - CLI framework
 - [Chalk](https://github.com/chalk/chalk) - Terminal styling
+- [Undici](https://github.com/nodejs/undici) - High-performance HTTP client
+- [Fast-XML-Parser](https://github.com/NaturalIntelligence/fast-xml-parser) - Fast XML parsing
+- [Zod](https://zod.dev/) - Schema validation
+- [Micromatch](https://github.com/micromatch/micromatch) - Glob pattern matching
 - [Vitest](https://vitest.dev/) - Testing framework
 - [TypeScript](https://www.typescriptlang.org/) - Type safety
 
@@ -229,7 +290,7 @@ Built with:
 
 ## 📧 Support
 
-- **Issues**: [GitHub Issues](https://github.com/akotliar/sitemap-qa/issues)
+- **Issues**: [GitHub Issues](https://github.com/akotliar/sitemap-qa/issues)
 
 ---
 