reffy 3.1.0 → 4.0.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +102 -46
- package/index.js +1 -2
- package/package.json +25 -34
- package/reffy.js +210 -194
- package/src/browserlib/extract-dfns.mjs +348 -10
- package/src/browserlib/map-ids-to-headings.mjs +53 -6
- package/src/cli/check-specs.js +1 -1
- package/src/cli/crawl-and-study.js +212 -0
- package/src/cli/generate-idlnames.js +3 -2
- package/src/lib/fetch.js +6 -0
- package/src/lib/nock-server.js +2 -1
- package/src/{cli/crawl-specs.js → lib/specs-crawler.js} +16 -212
- package/src/lib/util.js +17 -2
package/README.md
CHANGED
@@ -1,49 +1,37 @@
 # Reffy
 
-Reffy is a **Web spec crawler
+Reffy is a **Web spec crawler** tool. It is notably used to update [Webref](https://github.com/w3c/webref#webref) every 6 hours.
 
-The code features a generic crawler that can fetch Web specifications and generate machine-readable extracts out of them
+The code features a generic crawler that can fetch Web specifications and generate machine-readable extracts out of them. Created extracts include lists of CSS properties, definitions, IDL, links and references contained in the specification.
+
+The code also currently includes a set of individual tools to study extracts and create human-readable reports (such as the [crawl report in Webref](https://w3c.github.io/webref/ed/)). Please note the ongoing plan to move this part out of Reffy into a dedicated companion analysis tool (see [issue #747](https://github.com/w3c/reffy/issues/747)).
 
 
 ## How to use
 
 ### Pre-requisites
 
-
-- If you want to generate HTML reports, you need to install [Pandoc](http://pandoc.org/).
+To install Reffy, you need [Node.js](https://nodejs.org/en/).
 
 ### Installation
 
-Reffy is available as an NPM package. To install, run:
-
-`npm install reffy`
-
-This should install Reffy's command-line interface tools to Node.js path.
-
-### Launch Reffy
-
-To crawl all specs, generate a crawl report and an anomaly report, follow these steps:
-
-1. To produce a report using Editor's Drafts, run `reffy run ed`.
-2. To produce a report using latest published versions in `/TR/`, run `reffy run tr`.
+Reffy is available as an NPM package. To install the package globally, run:
 
-
-
-
-3. **Markdown report generation**: Produces a human-readable report in Markdown format out of the report returned by the analysis step, or directly out of results of the crawling step. `generate-report reports/ed/study.json [perspec|dep]`. By default, the tool generates a report per anomaly, pass `perspec` to create a report per specification and `dep` to generate a dependencies report. You will probably want to redirect the output to a file, e.g. using `generate-report reports/ed/study.json > reports/ed/index.md`.
-4. **Conversion to HTML**: Takes the Markdown analysis per specification and prepares an HTML report with expandable sections. `pandoc reports/ed/index.md -f markdown -t html5 --section-divs -s --template report-template.html -o reports/ed/index.html` (where `report.md` is the Markdown report)
-5. **Diff with latest published version of the crawl report**: Compares a crawl analysis with the latest published crawl analysis and produce a human-readable diff in Markdown format. `generate-report reports/ed/study.json diff https://w3c.github.io/webref/ed/study.json`
+```bash
+npm install -g reffy
+```
 
-
+This will install Reffy as a command-line interface tool.
 
-
-* The crawler uses a local cache for HTTP exchanges. It will create and fill a `.cache` subfolder in particular.
+The list of specs crawled by default evolves regularly. To make sure that you run the latest version, use:
 
-
+```bash
+npm update -g reffy
+```
 
-###
+### Launch Reffy
 
-
+Reffy crawls requested specifications and runs a set of processing modules on the content fetched to create relevant extracts from each spec. Which specs get crawled and which processing modules get run depend on how the crawler gets called. By default, the crawler crawls all specs defined in [browser-specs](https://github.com/w3c/browser-specs/) and runs all core processing modules defined in the [`browserlib`](https://github.com/w3c/reffy/tree/main/src/browserlib) folder.
 
 Crawl results will either be returned to the console or saved in individual files in a report folder when the `--output` parameter is set.
 
@@ -60,31 +48,68 @@ The crawler can be fully parameterized to crawl a specific list of specs and run
 
 - To extract the raw IDL defined in Fetch, run:
 ```bash
-
+reffy --spec fetch --module idl
 ```
 - To retrieve the list of specs that the HTML spec references, run (noting that crawling the HTML spec takes some time due to it being a multipage spec):
 ```bash
-
+reffy --spec html --module refs
 ```
 - To extract the list of CSS properties defined in CSS Flexible Box Layout Module Level 1, run:
 ```bash
-
+reffy --spec css-flexbox-1 --module css
 ```
 - To extract the list of terms defined in WAI ARIA 1.2, run:
 ```bash
-
+reffy --spec wai-aria-1.2 --module dfns
 ```
 - To run a hypothetical `extract-editors.mjs` processing module and create individual spec extracts with the result of the processing under an `editors` folder for all specs in browser-specs, run:
 ```bash
-
+reffy --output reports/test --module editors:extract-editors.mjs
 ```
 
 You may add `--terse` (or `-t`) to the above commands to access the extracts directly.
 
-Run `
+Run `reffy -h` for a complete list of options and usage details.
+
+
+Some notes:
+
+* The crawler may take a few minutes, depending on the number of specs it needs to crawl.
+* The crawler uses a local cache for HTTP exchanges. It will create and fill a `.cache` subfolder in particular.
+* If you cloned the repo instead of installing Reffy globally, replace `reffy` with `node reffy.js` in the above examples to run Reffy.
+
+
+## Additional tools
+
+Additional CLI tools in the `src/cli` folder complete the main specs crawler.
+
+
+### WebIDL parser
+
+The **WebIDL parser** takes the relative path to an IDL extract and generates a JSON structure that describes WebIDL term definitions and references that the spec contains. The parser uses [WebIDL2](https://github.com/darobin/webidl2.js/) to parse the WebIDL content found in the spec. To run the WebIDL parser: `node src/cli/parse-webidl.js [idlfile]`
+
+To create the WebIDL extract in the first place, you will need to run the `idl` module in Reffy, as in:
+
+```bash
+reffy --spec fetch --module idl > fetch.idl
+```
+
 
+### WebIDL names generator
 
-
+The **WebIDL names generator** takes the results of a crawl as input and creates a report per referenceable IDL name that details the complete parsed IDL structure that defines the name across all specs. To run the generator: `node src/cli/generate-idlnames.js [crawl folder] [save folder]`
+
+
+### Crawl results merger
+
+The **crawl results merger** merges a new JSON crawl report into a reference one. This tool is typically useful to replace the crawl results of a given specification with the results of a new run of the crawler on that specification. To run the crawl results merger: `node src/cli/merge-crawl-results.js [new crawl report] [reference crawl report] [crawl report to create]`
+
+
+### Analysis tools
+
+**Note:** The plan is to move analysis tools out of Reffy's codebase into a dedicated companion analysis tool (see [issue #747](https://github.com/w3c/reffy/issues/747)).
+
+#### Study tool
 
 **Reffy's report study tool** takes the machine-readable report generated by the crawler, and creates a study report of *potential* anomalies found in the report. The study report can then easily be converted to a human-readable Markdown report. Reported potential anomalies are:
 
@@ -98,32 +123,63 @@ Run `crawl-specs -h` for a complete list of options and usage details.
 8. specs that link to another spec but do not include a reference to that other spec;
 9. specs that link to another spec inconsistently in the body of the document and in the list of references (e.g. because the body of the document references the Editor's draft while the reference is to the latest published version).
 
-
+For instance:
 
-
+```bash
+node src/cli/study-crawl.js reports/ed/crawl.json > reports/ed/study.json
+```
 
-
+#### Markdown report generator
+
+The **markdown report generator** produces a human-readable report in Markdown format out of the report returned by the study step, or directly out of the results of the crawling step. To run the generator:
+
+```bash
+node src/cli/generate-report.js reports/ed/study.json [perspec|dep]
+```
 
-
+By default, the tool generates a report per anomaly; pass `perspec` to create a report per specification and `dep` to generate a dependencies report. You will probably want to redirect the output to a file, e.g. using `node src/cli/generate-report.js reports/ed/study.json > reports/ed/index.md`.
 
-The
+The markdown report generator may also produce diff reports, e.g.:
 
-
+```bash
+node src/cli/generate-report.js reports/ed/study.json diff https://w3c.github.io/webref/ed/study.json
+```
 
-
+#### Spec checker
 
-The **spec checker** takes the URL of a spec, a reference crawl report and the name of the study report to create as inputs. It crawls and studies the given spec against the reference crawl report. Essentially, it applies the **crawler**, the **merger** and the **study** tool in order, to produces the anomalies report for the given spec. Note the URL can check multiple specs at once, provided the URLs are passed as a comma-separated value list without spaces. To run the spec checker: `check-specs [url] [reference crawl report] [study report to create]`
+The **spec checker** takes the URL of a spec, a reference crawl report and the name of the study report to create as inputs. It crawls and studies the given spec against the reference crawl report. Essentially, it applies the **crawler**, the **merger** and the **study** tool in order, to produce the anomalies report for the given spec. Note the spec checker can check multiple specs at once, provided the URLs are passed as a comma-separated value list without spaces. To run the spec checker: `node src/cli/check-specs.js [url] [reference crawl report] [study report to create]`
 
 For instance:
 
 ```bash
-
-check-specs https://www.w3.org/TR/webstorage/ reports/ed/crawl.json reports/study-webstorage.json
+node src/cli/check-specs.js https://www.w3.org/TR/webstorage/ reports/ed/crawl.json reports/study-webstorage.json
 ```
 
+#### Crawl and study all at once
+
+**Note:** You will need to install [Pandoc](http://pandoc.org/) for HTML report generation to succeed.
+
+To crawl all specs, generate a crawl report and an anomaly report, follow these steps:
+
+1. To produce a report using Editor's Drafts, run `npm run ed`.
+2. To produce a report using latest published versions in `/TR/`, run `npm run tr`.
+
+These commands run the `src/cli/crawl-and-study.js` script. Under the hood, this script runs the following tools in turn:
+1. **Crawler**: crawls all specs with [Reffy](#launch-reffy)
+2. **Analysis**: Runs the [study tool](#study-tool)
+3. **Markdown report generation**: Runs the [markdown report generator](#markdown-report-generator)
+4. **Conversion to HTML**: Runs `pandoc` to prepare an HTML report with expandable sections out of the markdown report per specification. Typically runs `pandoc reports/ed/index.md -f markdown -t html5 --section-divs -s --template report-template.html -o reports/ed/index.html` (where `reports/ed/index.md` is the Markdown report)
+5. **Diff with latest published version of the crawl report**: Compares a crawl analysis with the latest published crawl analysis and produces a human-readable diff in Markdown format with the [markdown report generator](#markdown-report-generator)
+
+
+### WebIDL terms explorer
+
+See the related **[WebIDLPedia](https://dontcallmedom.github.io/webidlpedia)** project and its [repo](https://github.com/dontcallmedom/webidlpedia).
+
+
 ## Technical notes
 
-Reffy should be able to parse most of the W3C/WHATWG specifications that define CSS and/or WebIDL terms (both published versions and Editor's Drafts)
+Reffy should be able to parse most of the W3C/WHATWG specifications that define CSS and/or WebIDL terms (both published versions and Editor's Drafts), and more generally speaking specs authored with one of [Bikeshed](https://tabatkins.github.io/bikeshed/) or [ReSpec](https://respec.org/docs/). Reffy can also parse certain IETF specs to some extent, and may work with other types of specs as well.
 
 ### List of specs to crawl
 
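The updated README says the spec checker can check several specs at once, provided the URLs are passed as a single comma-separated list without spaces. The sketch below shows one way a Node.js script might assemble the arguments for `node src/cli/check-specs.js` under that rule. The helper name `buildCheckSpecsArgs` and its validation choices are assumptions for illustration only; they are not part of Reffy.

```javascript
// Hypothetical helper (not part of Reffy): builds the argument list for
//   node src/cli/check-specs.js [url] [reference crawl report] [study report to create]
// as documented in the reffy 4.0.3 README.
function buildCheckSpecsArgs(urls, referenceReport, studyReport) {
  if (!Array.isArray(urls) || urls.length === 0) {
    throw new Error('At least one spec URL is required');
  }
  for (const url of urls) {
    // The README requires a comma-separated list *without spaces*, so an
    // individual URL must not contain whitespace or a comma itself.
    if (/[\s,]/.test(url)) {
      throw new Error(`Invalid spec URL: ${url}`);
    }
    // Throws a TypeError if the string is not a parseable absolute URL.
    new URL(url);
  }
  return ['src/cli/check-specs.js', urls.join(','), referenceReport, studyReport];
}
```

From a clone of the repository, the returned array could be handed to Node's `child_process.execFileSync('node', args, { stdio: 'inherit' })`. Passing an argument array instead of a shell string avoids quoting issues, and rejecting whitespace up front mirrors the README's "without spaces" requirement.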
package/index.js
CHANGED
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "reffy",
-"version": "
+"version": "4.0.3",
 "description": "W3C/WHATWG spec dependencies exploration companion. Features a short set of tools to study spec references as well as WebIDL term definitions and references found in W3C specifications.",
 "repository": {
 "type": "git",
@@ -29,51 +29,42 @@
 "node": ">=14"
 },
 "main": "index.js",
-"bin":
-"reffy": "./reffy.js",
-"check-specs": "./src/cli/check-specs.js",
-"crawl-specs": "./src/cli/crawl-specs.js",
-"generate-idlnames": "./src/cli/generate-idlnames.js",
-"generate-report": "./src/cli/generate-report.js",
-"merge-crawl-results": "./src/cli/merge-crawl-results.js",
-"parse-webidl": "./src/cli/parse-webidl.js",
-"study-crawl": "./src/cli/study-crawl.js"
-},
+"bin": "./reffy.js",
 "dependencies": {
 "abortcontroller-polyfill": "1.7.3",
-"browser-specs": "2.
-"commander": "8.
+"browser-specs": "2.14.1",
+"commander": "8.2.0",
 "fetch-filecache-for-crawling": "4.0.2",
 "node-pandoc": "0.3.0",
-"puppeteer": "10.
+"puppeteer": "10.4.0",
 "webidl2": "24.1.2"
 },
 "devDependencies": {
 "chai": "4.3.4",
-"mocha": "9.1.
+"mocha": "9.1.2",
 "nock": "13.1.3",
-"respec": "26.13.
+"respec": "26.13.4",
 "respec-hljs": "2.1.1",
-"rollup": "2.
+"rollup": "2.58.0"
 },
 "scripts": {
-"all": "node
-"diff": "node
-"diffnew": "node
-"tr": "node
-"tr-crawl": "node
-"tr-study": "node
-"tr-markdown": "node
-"tr-html": "node
-"tr-diff": "node
-"tr-diffnew": "node
-"ed": "node
-"ed-crawl": "node --max-old-space-size=8192
-"ed-study": "node
-"ed-markdown": "node
-"ed-html": "node
-"ed-diff": "node
-"ed-diffnew": "node
+"all": "node src/cli/crawl-and-study.js run ed all && node src/cli/crawl-and-study.js run tr all",
+"diff": "node src/cli/crawl-and-study.js run ed diff && node src/cli/crawl-and-study.js run tr diff",
+"diffnew": "node src/cli/crawl-and-study.js run ed diffnew && node src/cli/crawl-and-study.js run tr diffnew",
+"tr": "node src/cli/crawl-and-study.js run tr all",
+"tr-crawl": "node src/cli/crawl-and-study.js run tr crawl",
+"tr-study": "node src/cli/crawl-and-study.js run tr study",
+"tr-markdown": "node src/cli/crawl-and-study.js run tr markdown",
+"tr-html": "node src/cli/crawl-and-study.js run tr html",
+"tr-diff": "node src/cli/crawl-and-study.js run tr diff",
+"tr-diffnew": "node src/cli/crawl-and-study.js run tr diffnew",
+"ed": "node src/cli/crawl-and-study.js run ed all",
+"ed-crawl": "node --max-old-space-size=8192 src/cli/crawl-and-study.js run ed crawl",
+"ed-study": "node src/cli/crawl-and-study.js run ed study",
+"ed-markdown": "node src/cli/crawl-and-study.js run ed markdown",
+"ed-html": "node src/cli/crawl-and-study.js run ed html",
+"ed-diff": "node src/cli/crawl-and-study.js run ed diff",
+"ed-diffnew": "node src/cli/crawl-and-study.js run ed diffnew",
 "test": "mocha --recursive tests/"
 }
 }
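In 4.0.3, `bin` shrinks to `"./reffy.js"`, so the `check-specs`, `crawl-specs`, `generate-idlnames`, `generate-report`, `merge-crawl-results`, `parse-webidl` and `study-crawl` commands are no longer installed. The README switches to `node src/cli/<tool>.js` invocations, to `reffy -h` instead of `crawl-specs -h`, and to `npm run ed` / `npm run tr` instead of `reffy run ed` / `reffy run tr`. The sketch below is a hypothetical `translateLegacyCommand` helper (not part of Reffy) that rewrites a 3.x command line into its 4.x form, assuming each tool's arguments carried over unchanged; the diff does not confirm that for every tool.

```javascript
// Hypothetical migration table (not part of Reffy), derived from the
// package.json "bin" and README changes between reffy 3.1.0 and 4.0.3.
// In 4.0.3 only `reffy` is installed; the other tools are run from a clone
// of the repository with `node src/cli/<tool>.js`.
const LEGACY_TOOLS = {
  // Assumption: crawler options carried over to `reffy` (the README swaps
  // "Run `crawl-specs -h`" for "Run `reffy -h`").
  'crawl-specs': ['reffy'],
  'check-specs': ['node', 'src/cli/check-specs.js'],
  'generate-idlnames': ['node', 'src/cli/generate-idlnames.js'],
  'generate-report': ['node', 'src/cli/generate-report.js'],
  'merge-crawl-results': ['node', 'src/cli/merge-crawl-results.js'],
  'parse-webidl': ['node', 'src/cli/parse-webidl.js'],
  'study-crawl': ['node', 'src/cli/study-crawl.js'],
};

// Rewrites a 3.x command line (whitespace-separated, no quoting support)
// into its assumed 4.x equivalent.
function translateLegacyCommand(commandLine) {
  const [command, ...args] = commandLine.trim().split(/\s+/);
  // `reffy run ed` / `reffy run tr` became the `ed` / `tr` npm scripts.
  if (command === 'reffy' && args[0] === 'run' && ['ed', 'tr'].includes(args[1])) {
    return `npm run ${args[1]}`;
  }
  const replacement = LEGACY_TOOLS[command];
  if (!replacement) {
    throw new Error(`No known 4.x equivalent for: ${command}`);
  }
  return [...replacement, ...args].join(' ');
}
```

For example, `translateLegacyCommand('check-specs https://www.w3.org/TR/webstorage/ reports/ed/crawl.json reports/study-webstorage.json')` yields the `node src/cli/check-specs.js ...` form used in the 4.0.3 README. Splitting on whitespace keeps the sketch short but means quoted arguments containing spaces are not supported.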