@akotliar/sitemap-qa 1.0.0-alpha.4 → 1.0.0-alpha.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +98 -43
- package/dist/index.js +318 -357
- package/dist/index.js.map +1 -1
- package/dist/reporters/templates/partials/finding.hbs +20 -0
- package/dist/reporters/templates/partials/header.hbs +9 -0
- package/dist/reporters/templates/partials/summary.hbs +22 -0
- package/dist/reporters/templates/report.hbs +293 -0
- package/package.json +4 -1
package/README.md CHANGED
@@ -11,6 +11,29 @@ Sitemap-QA is a command-line tool that automatically discovers, parses, and anal
 
 ---
 
+## 📑 Table of Contents
+
+- [Why Sitemap-QA?](#-why-sitemap-qa)
+- [Quick Start](#-quick-start)
+  - [Installation](#installation)
+  - [Basic Usage](#basic-usage)
+- [Features](#-features)
+  - [Automatic Sitemap Discovery](#automatic-sitemap-discovery)
+  - [Risk Detection Patterns](#risk-detection-patterns)
+  - [Customizing Risks](#customizing-risks)
+  - [Output Formats](#output-formats)
+- [CLI Commands](#-cli-commands)
+  - [analyze](#analyze-command)
+  - [init](#init-command)
+- [Configuration](#-configuration)
+  - [Configuration Options](#configuration-options)
+
+- [License](#-license)
+- [Acknowledgments](#-acknowledgments)
+- [Support](#-support)
+
+---
+
 ## 🎯 Why Sitemap-QA?
 
 Unlike SEO-focused sitemap validators, Sitemap-QA is designed specifically for **QA validation and risk detection** using a **Policy-as-Code** approach:
@@ -18,6 +41,8 @@ Unlike SEO-focused sitemap validators, Sitemap-QA is designed specifically for *
 - ✅ **Detect environment leakage** — Find staging, dev, or test URLs that shouldn't be in production sitemaps
 - ✅ **Identify exposed admin paths** — Catch `/admin`, `/dashboard`, and internal routes in public indexes
 - ✅ **Flag sensitive files** — Detect database backups, environment files, and archives
+- ✅ **Domain Consistency** — Automatically flag URLs that point to external or incorrect domains (handles `www.` normalization)
+- ✅ **Acceptable Patterns (Allowlist)** — Exclude known safe URLs from being flagged as risks
 - ✅ **Fully Customizable** — Define your own risk categories and patterns using Literal, Glob, or Regex matching
 - ✅ **Fast and automated** — Analyze thousands of URLs in seconds with detailed reports
 
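The Domain Consistency bullet added in this hunk describes flagging URLs whose host differs from the target domain while treating `www.` as equivalent. The package's own implementation is not visible in this README diff, so the following is only a minimal sketch of that idea. The helper names `normalizeHost`, `isSameSite`, and `findForeignUrls` are hypothetical, and the sketch assumes that any other subdomain (for example `staging.example.com`) counts as a different host.

```typescript
// Hypothetical helper: returns the hostname without a leading "www.".
function normalizeHost(rawUrl: string): string {
  const host = new URL(rawUrl).hostname.toLowerCase();
  return host.startsWith("www.") ? host.slice(4) : host;
}

// True when both URLs share a host once "www." is ignored.
function isSameSite(targetUrl: string, candidateUrl: string): boolean {
  return normalizeHost(targetUrl) === normalizeHost(candidateUrl);
}

// Collects sitemap URLs whose host does not match the target site.
function findForeignUrls(targetUrl: string, sitemapUrls: string[]): string[] {
  return sitemapUrls.filter((url) => !isSameSite(targetUrl, url));
}
```

Calling `findForeignUrls("https://example.com", urls)` keeps only the entries that would be reported under this assumption; `https://www.example.com/...` entries pass because of the `www.` normalization.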
@@ -37,11 +62,17 @@ npm install -g @akotliar/sitemap-qa@alpha
 ### Basic Usage
 
 ```bash
-#
+# Step 1: Initialize a configuration file (optional but recommended)
+sitemap-qa init
+
+# Step 2: Analyze a website's sitemap
 sitemap-qa analyze https://example.com
 
-# Generate JSON output for CI/CD
-sitemap-qa analyze https://example.com --output json
+# Generate JSON output only for CI/CD
+sitemap-qa analyze https://example.com --output json
+
+# Use a custom configuration file
+sitemap-qa analyze https://example.com --config ./custom-config.yaml
 ```
 
 ---
@@ -63,19 +94,11 @@ The tool comes with a set of default policies, but you can fully customize them
 | **Security & Admin** | Detects exposed administrative interfaces and sensitive configuration files. | `**/admin/**`, `**/.env*`, `/wp-admin` |
 | **Environment Leakage** | Finds staging or development URLs that shouldn't be in production sitemaps. | `**/staging.**`, `**/dev.**` |
 | **Sensitive Files** | Flags database backups, archives, and other sensitive file types. | `**/*.{sql,bak,zip,tar}`, `**/*.tar.gz` |
+| **Domain Consistency** | Detects URLs that don't match the target domain (ignoring `www.` differences). | `example.com` vs `other.com` |
 
 ### Customizing Risks
 
-You can add your own categories and patterns to the `sitemap-qa.yaml` file. Patterns support `literal`, `glob`, and `regex` matching.
-
-```yaml
-policies:
-  - category: "Internal API"
-    patterns:
-      - type: "glob"
-        value: "**/api/v1/internal/**"
-        reason: "Internal API version 1 should not be exposed."
-```
+You can add your own categories and patterns to the `sitemap-qa.yaml` file. Patterns support `literal`, `glob`, and `regex` matching. See the [Configuration](#-configuration) section for details.
 
 
 ### Output Formats
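The README text in this hunk names three pattern types: `literal`, `glob`, and `regex`. As a rough illustration of how a URL could be tested against such a pattern, here is a sketch under stated assumptions. The `Pattern` shape mirrors the YAML keys shown in the diff (`type`, `value`, `reason`), but `globToRegExp` and `matches` are hypothetical helpers, the glob handling is a simplified hand-rolled conversion without brace expansion rather than whatever library the package uses, and `literal` is treated as substring containment, which may differ from the tool's behavior.

```typescript
type PatternType = "literal" | "glob" | "regex";

interface Pattern {
  type: PatternType;
  value: string;
  reason: string;
}

// Simplified glob conversion: "**" matches any characters, "*" matches any
// characters except "/". Brace expansion such as {sql,bak} is not handled.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  const source = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so "**" survives the next step
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${source}$`);
}

// Tests one URL against one pattern.
function matches(pattern: Pattern, url: string): boolean {
  if (pattern.type === "literal") {
    // Assumption: substring containment, not necessarily the tool's semantics.
    return url.includes(pattern.value);
  }
  if (pattern.type === "glob") {
    return globToRegExp(pattern.value).test(url);
  }
  return new RegExp(pattern.value).test(url);
}
```

With the removed README example, `matches({ type: "glob", value: "**/api/v1/internal/**", reason: "..." }, url)` is true for any URL containing the `/api/v1/internal/` path segment.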
@@ -97,7 +120,8 @@ The HTML report provides an interactive, visually appealing view with:
   "summary": {
     "totalUrls": 895,
     "totalRisks": 2,
-    "urlsWithRisksCount": 1
+    "urlsWithRisksCount": 1,
+    "ignoredUrlsCount": 5
   },
   "findings": [
     {
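The updated README example adds `ignoredUrlsCount` to the report's `summary` object. For CI use, a consumer might read that summary and decide whether to fail a build. The sketch below assumes only the four `summary` fields shown in the hunk; the helper names `summarize` and `shouldFailBuild` are hypothetical, and the policy of failing on any risk is an illustrative choice, not something the package prescribes.

```typescript
interface ReportSummary {
  totalUrls: number;
  totalRisks: number;
  urlsWithRisksCount: number;
  ignoredUrlsCount?: number; // optional here so older reports still parse
}

// Parses the JSON report text and returns its summary object.
function readSummary(reportJson: string): ReportSummary {
  const report = JSON.parse(reportJson) as { summary: ReportSummary };
  return report.summary;
}

// Builds a one-line description suitable for a CI log.
function summarize(reportJson: string): string {
  const s = readSummary(reportJson);
  const ignored = s.ignoredUrlsCount ?? 0;
  return `${s.totalRisks} risk(s) on ${s.urlsWithRisksCount} of ${s.totalUrls} URL(s), ${ignored} ignored`;
}

// Illustrative policy: fail the build when any risk was reported.
function shouldFailBuild(reportJson: string): boolean {
  return readSummary(reportJson).totalRisks > 0;
}
```

A CI step could read the generated JSON report from the output directory, pass its text to `shouldFailBuild`, and exit with a non-zero status when it returns `true`.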
@@ -117,7 +141,44 @@ The HTML report provides an interactive, visually appealing view with:
 
 ---
 
-## 🛠️ CLI
+## 🛠️ CLI Commands
+
+Sitemap-QA provides two main commands: `init` and `analyze`.
+
+
+### init Command
+
+Initialize a default `sitemap-qa.yaml` configuration file in the current directory.
+
+```
+Usage: sitemap-qa init [options]
+
+Initialize a default sitemap-qa.yaml configuration file
+
+Options:
+  -h, --help  Display help for command
+```
+
+#### Example
+
+```bash
+# Create a default configuration file
+sitemap-qa init
+
+# This creates sitemap-qa.yaml with:
+# - Default risk policies (Security & Admin, Environment Leakage, Sensitive Files)
+# - Example acceptable patterns
+# - Default output settings
+```
+
+**Note:** The `init` command will fail if `sitemap-qa.yaml` already exists in the current directory to prevent accidental overwrites.
+
+---
+
+
+### analyze Command
+
+Analyze a website's sitemap for quality issues and security risks.
 
 ```
 Usage: sitemap-qa analyze <url> [options]
@@ -128,13 +189,13 @@ Arguments:
   url  Base URL of the website to analyze
 
 Options:
-  -c, --config <path>    Path to sitemap-qa.yaml
+  -c, --config <path>    Path to sitemap-qa.yaml configuration file
   -o, --output <format>  Output format: json, html, or all (default: "all")
   -d, --out-dir <path>   Output directory for reports (default: ".")
   -h, --help             Display help for command
 ```
 
-
+#### Examples
 
 ```bash
 # Basic analysis with both HTML and JSON reports (default)
@@ -143,14 +204,18 @@ sitemap-qa analyze https://example.com
 # JSON output only
 sitemap-qa analyze https://example.com --output json
 
+# HTML output only
+sitemap-qa analyze https://example.com --output html
+
 # Custom output directory
 sitemap-qa analyze https://example.com --out-dir ./reports
 
 # Use a specific configuration file
 sitemap-qa analyze https://example.com --config ./custom-config.yaml
-```
 
-
+# Combine options
+sitemap-qa analyze https://example.com --config ./custom-config.yaml --output json --out-dir ./reports
+```
 
 ## 🔧 Configuration
 
@@ -161,8 +226,17 @@ Create a `sitemap-qa.yaml` file in your project root to define your monitoring p
 # Default outDir is "."; this example uses a custom reports directory
 outDir: "./sitemap-qa/report" # custom output directory
 outputFormat: "all" # Options: json, html, all
+enforceDomainConsistency: true # Flag URLs from other domains
 
 # Monitoring Policies
+acceptable_patterns:
+  - type: "literal"
+    value: "/acceptable-path"
+    reason: "Example of an acceptable path that should not be flagged."
+  - type: "glob"
+    value: "**/public-docs/**"
+    reason: "Public documentation is always acceptable."
+
 policies:
   - category: "Security & Admin"
     patterns:
@@ -183,35 +257,16 @@ policies:
 |--------|------|---------|-------------|
 | `outDir` | string | `"."` | Directory for generated reports (current working directory by default) |
 | `outputFormat` | string | `"all"` | Report types to generate: `json`, `html`, or `all` |
+| `enforceDomainConsistency` | boolean | `true` | If true, flags URLs that don't match the root sitemap domain (ignoring `www.`) |
+| `acceptable_patterns` | array | `[]` | List of patterns to exclude from risk analysis |
 | `policies` | array | `[]` | List of monitoring policies with patterns |
 
-> Note: The earlier `sitemap-qa.yaml` example sets `outDir: "./sitemap-qa/report"` as a recommended path. If you omit `outDir`, the default is `"."` (the current working directory).
-### Policy Patterns
 
-
+**Priority:** CLI options > Project config (`sitemap-qa.yaml`) > Defaults
 
-```yaml
-policies:
-  - category: "Custom Rules"
-    patterns:
-      - type: "literal"
-        value: "test"
-        reason: "Test URL found"
-      - type: "glob"
-        value: "**/internal/*"
-        reason: "Internal path exposed"
-      - type: "regex"
-        value: "api/v[0-9]/"
-        reason: "API versioning detected"
-```
 
-**Rule Types:**
-- `literal`: Exact string match
-- `glob`: Wildcard patterns (e.g., `**/admin/**`)
-- `regex`: Regular expression matching (patterns are YAML strings and must use proper escaping)
-  - When defining regex patterns in `sitemap-qa.yaml`, remember they are YAML strings, so you must escape backslashes (for example, `".*\\\\.php$"` in YAML corresponds to the regex `.*\.php$`).
 
-
+---
 
 ## 📝 License
 
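The line added in the hunk above states the precedence "CLI options > Project config (`sitemap-qa.yaml`) > Defaults". One common way to realize such a rule is a layered merge in which later layers win. The sketch below is an assumption about how that could look, not the package's code: `resolveOptions` and `defined` are hypothetical helpers, the default values are the ones listed in the Configuration Options table, and the README only documents CLI flags for the config path, output format, and output directory.

```typescript
interface Options {
  outDir: string;
  outputFormat: "json" | "html" | "all";
  enforceDomainConsistency: boolean;
}

// Defaults as listed in the README's Configuration Options table.
const DEFAULTS: Options = {
  outDir: ".",
  outputFormat: "all",
  enforceDomainConsistency: true,
};

// Drops undefined values so an unset CLI flag cannot overwrite a config value.
function defined<T extends object>(obj: T): Partial<T> {
  const out: Partial<T> = {};
  for (const key of Object.keys(obj) as (keyof T)[]) {
    if (obj[key] !== undefined) out[key] = obj[key];
  }
  return out;
}

// Later spreads win: defaults, then project config, then CLI options.
function resolveOptions(config: Partial<Options>, cli: Partial<Options>): Options {
  return { ...DEFAULTS, ...defined(config), ...defined(cli) };
}
```

For example, `resolveOptions({ outDir: "./sitemap-qa/report" }, { outputFormat: "json" })` keeps the configured directory, takes the format from the command line, and falls back to the default for `enforceDomainConsistency`.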
@@ -235,7 +290,7 @@ Built with:
 
 ## 📧 Support
 
-- **Issues**: [GitHub Issues](https://github.com/akotliar/sitemap-qa/issues)
+- **Issues**: [GitHub Issues](https://github.com/akotliar/sitemap-qa/issues)
 
 ---
 