markdown_link_checker_sc 0.0.118 → 0.0.119

package/README.md CHANGED
@@ -12,19 +12,21 @@ Current version only does internal link checking
12
12
  Usage: markdown_link_checker_sc [options]
13
13
 
14
14
  Options:
15
- -r, --root <path> Root directory of your source (i.e. root of github repo). Use -d as well to specify a folder if docs are not in the root, or to just run on
16
- particular subfolder. Defaults to current directory. (default: "D:\\github\\hamishwillee\\markdown_link_checker_sc")
17
- -d, --directory [directory] The directory to search for markdown and html files, relative to root - such as: `en` for an English subfolder. Default empty (same as -r
18
- directory) (default: "")
19
- -i, --imagedir [directory] The directory to search for all image files for global orphan checking, relative to root - such as: `assets` or `en`. Default empty if not
20
- explicitly set, and global orphan checking will not be done (default: "")
15
+ -r, --root <path> Root directory of your source (i.e. root of github repo). Use -d as well to specify a folder if docs are not in the root, or to just
16
+ run on particular subfolder. Defaults to current directory. (default: "D:\\github\\hamishwillee\\markdown_link_checker_sc")
17
+ -d, --directory [directory] The directory to search for markdown and html files, relative to root - such as: `en` for an English subfolder. Default empty (same
18
+ as -r directory) (default: "")
19
+ -i, --imagedir [directory] The directory to search for all image files for global orphan checking, relative to root - such as: `assets` or `en`. Default empty
20
+ if not explicitly set, and global orphan checking will not be done (default: "")
21
21
  -c, --headingAnchorSlugify [value] Slugify approach for turning markdown headings into heading anchors. Currently support vuepress only and always (default: "vuepress")
22
22
  -t, --tryMarkdownforHTML [value] Try a markdown file extension check if a link to HTML fails. (default: true)
23
- -l, --log <types...> Export logs for debugging. Types: allerrors, filterederrors, allresults etc.
24
- -f, --files <path> JSON file with array of files to report on (default is all files). Paths are relative relative to -d by default, but -r can be used to set a
25
- different root. (default: "")
23
+ -l, --log <types...> Types of console logs to display for debugging. Types: functions, todo etc.
24
+ -f, --files <path> JSON file with array of files to report on (default is all files). Paths are relative to -d by default, but -r can be used
25
+ to set a different root. (default: "")
26
26
  -s, --toc [value] full filename of TOC/Summary file in file system. If not specified, inferred from file with most links to other files
27
27
  -u, --site_url [value] Site base url in form dev.example.com (used to catch absolute urls to local files)
28
+ -o, --logtofile [value] Output logs to file (default: true)
29
+ -p, --interactive [value] Interactively add errors to the ignore list at _link_checker_sc/ignore_errors.json (default: false)
28
30
  -h, --help display help for command
29
31
  ```
30
32
 
@@ -37,7 +39,11 @@ Currently matches:
37
39
  - `![Image alt](url)`
38
40
  - `<a href="someurl#someanchor?someparams" title="sometitle">some text</a>`
39
41
  - `<img src="someurl" title="sometitle" />`
40
-
42
+ - `<img src="someurl" title="sometitle" />`
43
+ - `[reference link text][reference name]`, where the reference is defined as [reference name]: reference_url "reference title"
44
+ - Only supports reference name and text format - not "plain reference name" like `[reference name]`
45
+ - The reference must be all on one line, and may be preceded by up to three spaces. No text may follow the reference title.
46
+
41
47
 
42
48
  > **Note:** It uses simple regexp. If you have a link commented out, or inside a code block that may well be captured.
43
49
 
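The single-line reference rule described in the README hunk above can be sketched as a regular expression. This is a minimal illustration, not the package's own pattern: the name `parseReferenceDefinition` and the regex itself are assumptions, written to accept up to three leading spaces, a `[name]: url` pair, an optional double-quoted title, and nothing after the title.

```javascript
// Hypothetical helper: the function name and the regex are illustrative,
// not taken from the package.
// Matches a single-line reference definition:
//   up to three leading spaces, [name]: url, optional "title", nothing after it.
const referenceDefinition =
  /^ {0,3}\[(?<refName>[^\]]+)\]:\s*(?<refUrl>\S+)(?:\s+"(?<refTitle>[^"]*)")?\s*$/;

function parseReferenceDefinition(line) {
  const match = line.match(referenceDefinition);
  if (!match) return null; // Not a single-line reference definition.
  // refTitle is undefined when no title was given, so default it to "".
  const { refName, refUrl, refTitle = "" } = match.groups;
  return { refName, refUrl, refTitle };
}
```

Calling `parseReferenceDefinition` on each line of a page would give a lookup table of reference names, which a `[text][name]` link could then be resolved against. Definitions that span several lines are not handled, in line with the limitation the README states.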
@@ -46,25 +52,26 @@ There are heaps of link formats it does not match:
46
52
  - `<http://www.whatever.com>` - doesn't support autolinks
47
53
  - `www.fred.com` - Doesn't support auto-links external.
48
54
  - `[![image title](imageurl)](linkurl)`- Doesn't properly support a link around an image.
49
- - `linkreference: linkurl` - Doesn't support reference links (which would be linked like `[link text][linkreference]`
55
+ - Reference links where the reference is defined across lines.
50
56
 
51
57
 
52
58
  Essentially lots of the other things https://github.github.com/gfm/
53
59
 
54
-
55
60
  The regex that drives this is very simple.
56
61
 
57
-
58
62
  There are many other alternatives, such as: https://github.com/tcort/markdown-link-check
59
63
  You might also use a tokenizer or round trip to HTML using something like https://marked.js.org/using_advanced#inline in future as HTML is easier to extract links from.
60
64
 
61
65
  This does catch a LOT of cases though, and is pretty quick.
62
66
 
63
- ## TODO
67
+ ## Also does
64
68
 
65
- - Files passed in should be filtered to check if markdown and only use the markdown ones in the markdown areas.
66
- - Files passed in shoudl be filtered for image types.
67
- All image types passed in should be checked to make sure they are not orphans.
69
+ - Catches markdown files that are orphans - i.e. not linked by any file, or not linked by file which has the most links (normally the TOC file)
70
+ - Catches orphan images
71
+ - Allows you to specify that some errors are OK to ignore. These are stored in a file. See the `-p` (`--interactive`) option.
72
+
73
+
74
+ ## TODO
68
75
 
69
76
  Anchors that are not url escaped can trip it up.
70
77
  - You can URL escape them like this: [Airframe Reference](#underwater_robot_underwater_robot_hippocampus_uuv_%28unmanned_underwater_vehicle%29)
@@ -73,8 +80,6 @@ Anchors that are not url escaped can trip it up.
73
80
 
74
81
  Anchors defined in id in a or span are caught. Need to check those in video, div are also caught and used in internal link checking.
75
82
 
76
- A way to indicate that a particular error can be ignored - e.g. by page, type, maybe by line etc. Perhaps make this something that can be turned on and off.
77
-
78
83
  Get images in/around the source files that are not linked - i.e. orphan images.
79
84
 
80
85
 
@@ -82,8 +87,7 @@ Get images in/around the source files that are not linked - i.e. orphan images.
82
87
  # How does it work?
83
88
 
84
89
  The way this works:
85
- - Specify the directory and it will searc
86
- h below that for all markdown/html files.
90
+ - Specify the directory and it will search below that for all markdown/html files.
87
91
  - It loads each file, and:
88
92
  - parses for markdown and html style links for both page and image links.
89
93
  - parses headings and builds list of anchors in the page (as per vuepress) for those headings (poorly tested code)
@@ -0,0 +1,137 @@
1
+ [
2
+ {
3
+ "type": "InternalLinkToHTML",
4
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
5
+ "link": {
6
+ "url": "../flight_modes/altitude_fw.html",
7
+ "text": "Altitude"
8
+ },
9
+ "hideReason": "Gas hasdfhasfkl"
10
+ },
11
+ {
12
+ "type": "LinkedFileMissingAnchor",
13
+ "fileRelativeToRoot": "en\\config_heli\\README.md",
14
+ "link": {
15
+ "url": "../airframes/airframe_reference.md#copter_helicopter_generic_helicopter_%28tail_esc%29",
16
+ "text": "Generic Helicopter - with Tail ESC"
17
+ },
18
+ "hideReason": "n"
19
+ },
20
+ {
21
+ "type": "InternalLinkToHTML",
22
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
23
+ "link": {
24
+ "url": "../flight_modes/position_fw.html",
25
+ "text": "Position"
26
+ },
27
+ "hideReason": "n"
28
+ },
29
+ {
30
+ "type": "InternalLinkToHTML",
31
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
32
+ "link": {
33
+ "url": "../flight_modes/stabilized_fw.html",
34
+ "text": "Stabilized"
35
+ },
36
+ "hideReason": "nnnn"
37
+ },
38
+ {
39
+ "type": "InternalLinkToHTML",
40
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
41
+ "link": {
42
+ "url": "../flight_modes/acro_fw.html",
43
+ "text": "Acro"
44
+ },
45
+ "hideReason": "nnnnnn"
46
+ },
47
+ {
48
+ "type": "InternalLinkToHTML",
49
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
50
+ "link": {
51
+ "url": "../flight_modes/manual_fw.html",
52
+ "text": "Manual"
53
+ },
54
+ "hideReason": "nnnnnnn"
55
+ },
56
+ {
57
+ "type": "InternalLinkToHTML",
58
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
59
+ "link": {
60
+ "url": "../flight_modes/position_mc.html",
61
+ "text": "Position"
62
+ },
63
+ "hideReason": "y"
64
+ },
65
+ {
66
+ "type": "InternalLinkToHTML",
67
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
68
+ "link": {
69
+ "url": "../flight_modes/altitude_mc.html",
70
+ "text": "Altitude"
71
+ },
72
+ "hideReason": "y"
73
+ },
74
+ {
75
+ "type": "InternalLinkToHTML",
76
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
77
+ "link": {
78
+ "url": "../flight_modes/manual_stabilized_mc.html",
79
+ "text": "Manual/ Stabilized"
80
+ },
81
+ "hideReason": "y"
82
+ },
83
+ {
84
+ "type": "InternalLinkToHTML",
85
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
86
+ "link": {
87
+ "url": "../flight_modes/acro_mc.html",
88
+ "text": "Acro"
89
+ },
90
+ "hideReason": "y"
91
+ },
92
+ {
93
+ "type": "InternalLinkToHTML",
94
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
95
+ "link": {
96
+ "url": "../flight_modes/orbit.html",
97
+ "text": "Orbit"
98
+ },
99
+ "hideReason": "y"
100
+ },
101
+ {
102
+ "type": "InternalLinkToHTML",
103
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
104
+ "link": {
105
+ "url": "../flight_modes/takeoff.html",
106
+ "text": "Takeoff"
107
+ },
108
+ "hideReason": "y"
109
+ },
110
+ {
111
+ "type": "InternalLinkToHTML",
112
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
113
+ "link": {
114
+ "url": "../flight_modes/land.html",
115
+ "text": "Land"
116
+ },
117
+ "hideReason": "y"
118
+ },
119
+ {
120
+ "type": "InternalLinkToHTML",
121
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
122
+ "link": {
123
+ "url": "../flight_modes/hold.html",
124
+ "text": "Hold"
125
+ },
126
+ "hideReason": "y"
127
+ },
128
+ {
129
+ "type": "InternalLinkToHTML",
130
+ "fileRelativeToRoot": "en\\flight_modes\\README.md",
131
+ "link": {
132
+ "url": "../flight_modes/return.html",
133
+ "text": "Return"
134
+ },
135
+ "hideReason": "y"
136
+ }
137
+ ]
package/index.js CHANGED
@@ -64,10 +64,11 @@ program
64
64
  "-u, --site_url [value]",
65
65
  "Site base url in form dev.example.com (used to catch absolute urls to local files)"
66
66
  )
67
+ .option("-o, --logtofile [value]", "Output logs to file", true)
67
68
  .option(
68
- "-o, --logtofile [value]",
69
- "Output logs to file",
70
- true
69
+ "-p, --interactive [value]",
70
+ "Interactively add errors to the ignore list at _link_checker_sc/ignore_errors.json",
71
+ false
71
72
  )
72
73
 
73
74
  .parse(process.argv);
@@ -83,25 +84,30 @@ sharedData.allHTMLFiles = new Set([]);
83
84
  sharedData.allImageFiles = new Set([]);
84
85
  sharedData.allOtherFiles = new Set([]);
85
86
 
86
- const markdownDirectory = path.join(sharedData.options.root, sharedData.options.directory);
87
+ const markdownDirectory = path.join(
88
+ sharedData.options.root,
89
+ sharedData.options.directory
90
+ );
87
91
 
88
92
  // Function for loading JSON file that contains files to report on
89
93
  async function loadJSONFileToReportOn(filePath) {
90
94
  sharedData.options.log.includes("functions")
91
95
  ? console.log(`Function: loadJSONFileToReportOn(): filePath: ${filePath}`)
92
96
  : null;
93
- sharedData.options.log.includes("quick")
97
+ sharedData.options.log.includes("quick")
94
98
  ? console.log(`Function: loadJSONFileToReportOn(): filePath: ${filePath}`)
95
99
  : null;
96
100
  try {
97
101
  const fileContent = await fs.promises.readFile(filePath, "utf8");
98
102
  let filesArray = JSON.parse(fileContent);
99
103
  // Array relative to root, so update to have full path
100
- filesArray = filesArray.map((str) => path.join(sharedData.options.root, str));
101
-
104
+ filesArray = filesArray.map((str) =>
105
+ path.join(sharedData.options.root, str)
106
+ );
107
+
102
108
  sharedData.options.log.includes("quick")
103
- ? console.log(`quick:filesArray: ${filesArray}`)
104
- : null;
109
+ ? console.log(`quick:filesArray: ${filesArray}`)
110
+ : null;
105
111
 
106
112
  return filesArray;
107
113
  } catch (error) {
@@ -111,7 +117,6 @@ async function loadJSONFileToReportOn(filePath) {
111
117
  }
112
118
  }
113
119
 
114
-
115
120
  const replaceDelimiter = (str, underscore) =>
116
121
  underscore ? str.replace(/\s+/g, "_") : str.replace(/\s+/g, "-");
117
122
 
@@ -122,6 +127,8 @@ const processFile = async (file) => {
122
127
  try {
123
128
  const contents = await fs.promises.readFile(file, "utf8");
124
129
  const resultsForFile = processMarkdown(contents, file);
130
+ //console.log(resultsForFile);
131
+
125
132
  resultsForFile["page_file"] = file;
126
133
 
127
134
  // Call slugify slugifyVuepress() on each of the headings
@@ -160,36 +167,87 @@ const processDirectory = async (dir) => {
160
167
  if (result) {
161
168
  results.push(result);
162
169
  }
163
- }
164
-
165
- else if (isHTML(file)) {
170
+ } else if (isHTML(file)) {
166
171
  sharedData.allHTMLFiles.add(file);
167
172
  const result = await processFile(file);
168
173
  if (result) {
169
174
  results.push(result);
170
175
  }
171
- }
172
-
173
- else if (isImage(file)) {
176
+ } else if (isImage(file)) {
174
177
  sharedData.allImageFiles.add(file);
175
- }
176
- else {
178
+ } else {
177
179
  sharedData.allOtherFiles.add(file);
178
180
  }
179
181
  }
180
182
  return results;
181
183
  };
182
184
 
185
+ function filterIgnoreErrors(errors) {
186
+ // This method removes any errors that are in the ignore errors list
187
+ // This list is imported from the file _link_checker_sc/ignore_errors.json
188
+
189
+ // Errors are matched on type and file, and on link URL when both sides have a link.
190
+ sharedData.options.log.includes("functions")
191
+ ? console.log(`Function: filterIgnoreErrors()`)
192
+ : null;
193
+
194
+ try {
195
+ //sharedData.IgnoreErrors = require('./_link_checker_sc/ignore_errors.json');
196
+ const ignoreFromFile = fs.readFileSync(
197
+ "./_link_checker_sc/ignore_errors.json"
198
+ );
199
+ sharedData.IgnoreErrors = JSON.parse(ignoreFromFile);
200
+ //console.log(sharedData.IgnoreErrors);
201
+ } catch (error) {
202
+ //console.log("probs loading");
203
+ //console.log(error);
204
+ sharedData.IgnoreErrors = [];
205
+ }
206
+
207
+ const filteredErrors = errors.filter((error) => {
208
+
209
+ let returnValue = true; //All items are not filtered, by default.
210
+ sharedData.IgnoreErrors.forEach((ignorableError) => {
211
+ if (
212
+ error.type === ignorableError.type &&
213
+ error.fileRelativeToRoot === ignorableError.fileRelativeToRoot
214
+ ) {
215
+ // Same file and type, so probably filter out.
216
+ if (!(error.link && ignorableError.link)) {
217
+ returnValue = false; // At least one side has no link, so match on type and file alone
218
+ }
219
+
220
+ if (
221
+ error.link &&
222
+ ignorableError.link &&
223
+ error.link.url === ignorableError.link.url
224
+ ) {
225
+ returnValue = false; // They both have a link and it is the same link
226
+ }
227
+ }
228
+
229
+
230
+ });
231
+ //if (returnValue ==false) console.log(error);
232
+ return returnValue;
233
+ });
234
+
235
+
236
+ return filteredErrors;
237
+ }
238
+
183
239
  function filterErrors(errors) {
240
+ // This method filters all errors against settings in the command line
241
+ // Currently it is the pages to output, as listed in the options.files to output.
184
242
  sharedData.options.log.includes("functions")
185
243
  ? console.log(`Function: filterErrors()`)
186
244
  : null;
187
- // This method filters all errors against settings in the command line - such as pages to output.
245
+
188
246
  let filteredErrors = errors;
189
247
  // Filter results on specified file names (if any specified)
190
248
  //console.log(`Number pages to filter: ${sharedData.options.files.length}`);
191
249
  if (sharedData.options.files.length > 0) {
192
- //console.log(`USharedFileslength: ${sharedData.options.files.length}`);
250
+ //console.log(`USharedFileslength: ${sharedData.options.files.length}`);
193
251
  filteredErrors = errors.filter((error) => {
194
252
  //console.log(`UError: ${error}`);
195
253
  //console.log(JSON.stringify(error, null, 2));
@@ -206,46 +264,61 @@ function filterErrors(errors) {
206
264
 
207
265
  //main function, after options et have been set up.
208
266
  (async () => {
209
-
210
- sharedData.options.files ? (sharedData.options.files = await loadJSONFileToReportOn(sharedData.options.files)) : (sharedData.options.files = []);
267
+ sharedData.options.files
268
+ ? (sharedData.options.files = await loadJSONFileToReportOn(
269
+ sharedData.options.files
270
+ ))
271
+ : (sharedData.options.files = []);
211
272
 
212
273
  // process containing markdown, return results which includes links, headings, id anchors
213
274
  const results = await processDirectory(markdownDirectory);
214
275
 
215
- // Process just the relative links to find errors like missing files, anchors
216
- const errorsFromRelativeLinks = processRelativeLinks(results);
217
276
  if (!results.allErrors) {
218
277
  results.allErrors = [];
219
278
  }
279
+
280
+ // Add errors saved with page during page parsing.
281
+ // Convenient to include with page earlier, but move into main errors item in results here.
282
+ // (we could also just have a global errors and add to that, and share it round to wherever errors are done - might have been easier).
283
+ const pageErrors = results.reduce((accumulator, page) => {
284
+ if (page.errors) {
285
+ accumulator.push(...page.errors);
286
+ }
287
+ return accumulator;
288
+ }, []);
289
+
290
+ results["allErrors"].push(...pageErrors);
291
+
292
+ // Process just the relative links to find errors like missing files, anchors
293
+ const errorsFromRelativeLinks = processRelativeLinks(results);
294
+
220
295
  results["allErrors"].push(...errorsFromRelativeLinks);
221
296
 
222
297
  // Process just images linked in local file system - find errors like missing images.
223
- const errorsFromLocalImageLinks = await checkLocalImageLinks(
224
- results
225
- );
298
+ const errorsFromLocalImageLinks = await checkLocalImageLinks(results);
226
299
  //console.log(errorsFromLocalImageLinks)
227
300
  results["allErrors"].push(...errorsFromLocalImageLinks);
228
301
 
229
302
  // Process links to current site URL - should be relative links normally.
230
- const errorsFromUrlsToLocalSite = await processUrlsToLocalSource(
231
- results
232
- );
303
+ const errorsFromUrlsToLocalSite = await processUrlsToLocalSource(results);
233
304
  //console.log(errorsFromUrlsToLocalSite)
234
305
  results["allErrors"].push(...errorsFromUrlsToLocalSite);
235
306
 
236
307
  // Check for page orphans - markdown files not linked anywhere and not in summary.
237
308
  // Guesses the table of contents file if not specified in options.toc
238
- sharedData.options.toc ? null : (sharedData.options.toc = getPageWithMostLinks(results));
309
+ sharedData.options.toc
310
+ ? null
311
+ : (sharedData.options.toc = getPageWithMostLinks(results));
239
312
  checkPageOrphans(results); // Perhaps should follow pattern of returning errors - currently updates results
240
313
 
241
- const errorsGlobalImageOrphanCheck = await checkImageOrphansGlobal(
242
- results
243
- );
314
+ const errorsGlobalImageOrphanCheck = await checkImageOrphansGlobal(results);
244
315
  results["allErrors"].push(...errorsGlobalImageOrphanCheck);
245
316
 
246
317
  // Filter the errors based on the settings in options.
247
318
  // At time of writing just filters on specific set of pages.
248
- const filteredResults = filterErrors(results.allErrors);
319
+ let filteredResults = filterErrors(results.allErrors);
320
+ // Filter out the ones we have indicated we want to ignore.
321
+ filteredResults = filterIgnoreErrors(filteredResults);
249
322
 
250
323
  // Output the errors as console.logs
251
324
  outputErrors(filteredResults);
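The ignore-list matching added in `filterIgnoreErrors()` can be sketched as a standalone predicate. This is a minimal sketch under stated assumptions: `isIgnored` and `filterIgnored` are hypothetical names, and the sketch assumes the stricter rule that an entry without a link only matches an error without a link. The diff's condition `!(error.link && ignorableError.link)` also matches when just one side has a link, so the two are not equivalent.

```javascript
// Sketch of the ignore-list match. isIgnored and filterIgnored are
// hypothetical names, not functions from the package.
function isIgnored(error, ignoreList) {
  return ignoreList.some((ignorable) => {
    // Type and file must always agree.
    if (
      error.type !== ignorable.type ||
      error.fileRelativeToRoot !== ignorable.fileRelativeToRoot
    ) {
      return false;
    }
    // Assumption: an entry without a link only matches an error without a link.
    if (!error.link && !ignorable.link) return true;
    // Both have a link: require the same URL.
    return Boolean(
      error.link && ignorable.link && error.link.url === ignorable.link.url
    );
  });
}

// Keep only the errors that no ignore-list entry matches.
const filterIgnored = (errors, ignoreList) =>
  errors.filter((error) => !isIgnored(error, ignoreList));
```

Using `some()` stops at the first matching entry, whereas the `forEach` in the diff always walks the whole list. For a short ignore file the difference is unlikely to matter.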
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "markdown_link_checker_sc",
3
- "version": "0.0.118",
3
+ "version": "0.0.119",
4
4
  "description": "Markdown Link Checker",
5
5
  "main": "index.js",
6
6
  "scripts": {
@@ -22,6 +22,7 @@
22
22
  "author": "Hamish Willee",
23
23
  "license": "MIT",
24
24
  "dependencies": {
25
- "commander": "^10.0.0"
25
+ "commander": "^10.0.0",
26
+ "prompt-sync": "^4.2.0"
26
27
  }
27
28
  }
package/src/errors.js CHANGED
@@ -10,8 +10,15 @@ class LinkError {
10
10
  if (link) {
11
11
  this.link = link;
12
12
  this.file = this.link.page;
13
+ this.fileRelativeToRoot = this.link.fileRelativeToRoot;
13
14
  } else {
14
15
  this.file = file; // i.e. infer file from link, but if link not specified then can take passed value
16
+ this.fileRelativeToRoot = this.file.split(sharedData.options.root)[1];
17
+ this.fileRelativeToRoot =
18
+ this.fileRelativeToRoot.startsWith("/") ||
19
+ this.fileRelativeToRoot.startsWith("\\")
20
+ ? this.fileRelativeToRoot.substring(1)
21
+ : this.fileRelativeToRoot;
15
22
  }
16
23
  }
17
24
 
@@ -125,15 +132,32 @@ class OrphanedImageError extends LinkError {
125
132
  constructor({ file, link }) {
126
133
  super({ file: file, link: link, type: "OrphanedImage" });
127
134
  }
135
+ output() {
136
+ console.log(`- ${this.type}: Image not linked from docs: ${this.file}`);
137
+ }
138
+ }
139
+
140
+ class ReferenceForLinkNotFoundError extends LinkError {
141
+ constructor({ file, linkMatch, refMatch }) {
142
+ super({ file: file, type: "ReferenceForLinkNotFound" });
143
+ if (!linkMatch) {
144
+ throw new Error("ReferenceForLinkNotFoundError: linkMatch is required!");
145
+ } else {
146
+ this.linkMatch = linkMatch;
147
+ }
148
+ if (!refMatch) {
149
+ throw new Error("ReferenceForLinkNotFoundError: refMatch is required!");
150
+ } else {
151
+ this.refMatch = refMatch;
152
+ }
153
+ }
128
154
  output() {
129
155
  console.log(
130
- `- ${this.type}: Image not linked from docs: ${this.file}`
156
+ `- ${this.type}: Matching reference ${this.refMatch} not found for link ${this.linkMatch}`
131
157
  );
132
158
  }
133
159
  }
134
160
 
135
-
136
-
137
161
  export {
138
162
  LinkError,
139
163
  CurrentFileMissingAnchorError,
@@ -144,5 +168,6 @@ export {
144
168
  PageNotInTOCError,
145
169
  PageNotLinkedInternallyError,
146
170
  LocalImageNotFoundError,
147
- OrphanedImageError
171
+ OrphanedImageError,
172
+ ReferenceForLinkNotFoundError,
148
173
  };
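The `fileRelativeToRoot` calculation appears in both `errors.js` and `links.js` as a `split(sharedData.options.root)[1]` followed by stripping a leading separator. A shared helper along the following lines is one way to express the same idea. This is a sketch, not code from the package: `toRelativeToRoot` is a hypothetical name, and returning the path unchanged when it is not under the root is an assumption about the desired behaviour (with `split`, that case gives `undefined` and the following `startsWith` call would throw).

```javascript
// Hypothetical shared helper; toRelativeToRoot is not a name from the package.
function toRelativeToRoot(filePath, root) {
  // Assumption: if the file is not under root, return it unchanged
  // rather than throwing.
  if (!root || !filePath.startsWith(root)) return filePath;
  // Drop the root prefix, then any leading "/" or "\" separators.
  return filePath.slice(root.length).replace(/^[\\/]+/, "");
}
```

The helper keeps whatever separators the input uses, so a Windows path stays backslash-separated. This matches the `en\\flight_modes\\README.md` style seen in the sample ignore file.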
package/src/links.js CHANGED
@@ -8,11 +8,12 @@ class Link {
8
8
  anchor = "";
9
9
  params = "";
10
10
  type = "unHandledLinkType";
11
- goat = "This is a 2goat";
11
+ //goat = "This is a 2goat";
12
12
  isImage = false;
13
13
  isMarkdown = false;
14
14
  isHTML = false;
15
15
  isRelative = false;
16
+ isReferenceLink = false;
16
17
 
17
18
  //isImage = false;
18
19
  static linkTypes;
@@ -33,7 +34,7 @@ class Link {
33
34
  ]);
34
35
  }
35
36
 
36
- constructor({ page, url, type, text, title }) {
37
+ constructor({ page, url, type, text, title, refName, refMatch }) {
37
38
  logFunction("Link:constructor");
38
39
 
39
40
  if (page) {
@@ -42,6 +43,10 @@ class Link {
42
43
  throw new Error("Link: page argument is required.");
43
44
  }
44
45
 
46
+ // Create a relative file link for comparison
47
+ this.fileRelativeToRoot = this.page.split(sharedData.options.root)[1];
48
+ this.fileRelativeToRoot = (this.fileRelativeToRoot.startsWith('/') || this.fileRelativeToRoot.startsWith('\\')) ? this.fileRelativeToRoot.substring(1) : this.fileRelativeToRoot
49
+
45
50
  if (url) {
46
51
  this.url = url;
47
52
  this.splitURL(this.url);
@@ -49,6 +54,11 @@ class Link {
49
54
  throw new Error("Link: url argument is required.");
50
55
  }
51
56
 
57
+ text ? (this.text = text) : (this.text = "");
58
+ title ? (this.title = title) : (this.title = "");
59
+ refName ? (this.refName = refName) : (this.refName = "");
60
+ refMatch ? (this.refMatch = refMatch) : (this.refMatch = "");
61
+
52
62
  const linkTypeGuess = this.findType(); // Do to populate the isXxxx values
53
63
  if (type) {
54
64
  if (!Link.linkTypes.has(type)) {
@@ -62,8 +72,7 @@ class Link {
62
72
  //No type specified - use type inferred from extension etc.
63
73
  this.type = linkTypeGuess;
64
74
  }
65
- text ? (this.text = text) : (this.text = "");
66
- title ? (this.title = title) : (this.title = "");
75
+
67
76
  }
68
77
 
69
78
  // Take a URL and split to address, anchor, params
@@ -117,6 +126,8 @@ class Link {
117
126
  this.isMarkdown =
118
127
  this.address && isMarkdown(this.address) ? true : false; //only if address is true.
119
128
  this.isHTML = this.address && isHTML(this.address) ? true : false; //only if address is true.
129
+ this.isReferenceLink = this.refName ? true : false; //Only if we have a reference name
130
+
120
131
  const regexpTestProtocol = /^[a-z]+:/i;
121
132
 
122
133
  //console.log(`Linkcheck1: ${this.address} `);
@@ -3,6 +3,12 @@
3
3
  import { sharedData } from "./shared_data.js";
4
4
  import { logFunction } from "./helpers.js";
5
5
 
6
+ import promptSync from "prompt-sync";
7
+ const prompt = promptSync();
8
+
9
+ import fs from "fs";
10
+ import path from "path";
11
+
6
12
  //Function that generates console and/or log output from an array of error objects.
7
13
  // - `results` is an array of error objects.
8
14
  // These will have a `type` and a `page`. They may also have other values, depending on type of error - such as linkurl
@@ -27,6 +33,7 @@ function outputErrors(results) {
27
33
  }
28
34
  }
29
35
 
36
+ //let updateErrors = false;
30
37
  //console.log(sortedByPageErrors);
31
38
  for (const page in sortedByPageErrors) {
32
39
  let pageFromRoot;
@@ -40,9 +47,53 @@ function outputErrors(results) {
40
47
  for (const error of sortedByPageErrors[page]) {
41
48
  if (error.output) {
42
49
  error.output();
50
+
51
+ // Add items to the errors to be ignored, if enabled.
52
+ if (sharedData.options.interactive) {
53
+ const hideError = prompt("Stop reporting on this error? (Y/N) ", "N");
54
+ console.log(`HideError: ${hideError}`);
55
+ if (!sharedData.IgnoreErrors) {
56
+ sharedData.IgnoreErrors = [];
57
+ }
58
+ if (hideError === "X" || hideError === "x") {
59
+ // Exit without saving
60
+ process.exit();
61
+ }
62
+ if (hideError === "Y" || hideError === "y") {
63
+ const reduceLink = {
64
+ url: error.link.url,
65
+ text: error.link.text,
66
+ };
67
+ const reduceError = {
68
+ type: error.type,
69
+ fileRelativeToRoot: error.fileRelativeToRoot,
70
+ link: reduceLink,
71
+ };
72
+ reduceError.hideReason = prompt("Why? (enter for no reason) ", "");
73
+
74
+ sharedData.IgnoreErrors.push(reduceError);
75
+ //updateErrors = true;
76
+ }
77
+ }
43
78
  }
44
79
  }
45
80
  }
81
+
82
+ // Create the `_link_checker_sc` folder if it doesn't exist.
83
+ const dirPath = path.join(process.cwd(), "_link_checker_sc");
84
+ if (!fs.existsSync(dirPath) && sharedData.options.interactive) {
85
+ fs.mkdirSync(dirPath);
86
+ }
87
+
88
+ // Create the file that stores the ignored errors as JSON
89
+ // But only if an interactive update is in progress
90
+ if (sharedData.options.interactive) {
91
+ const filePath = path.join(dirPath, "ignore_errors.json");
92
+ fs.writeFileSync(
93
+ filePath,
94
+ JSON.stringify(sharedData.IgnoreErrors, null, 2)
95
+ );
96
+ }
46
97
  }
47
98
 
48
99
  export { outputErrors };
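The interactive branch above builds a reduced record from `error.link.url` and `error.link.text`. Some error types in this diff are constructed without a link (`ReferenceForLinkNotFoundError` passes only `file` and `type` to `super`), so reading `error.link.url` on them would throw. A guarded version could look like the sketch below. `toIgnoreEntry` is a hypothetical name, and omitting the `link` key for link-less errors is an assumption rather than the package's behaviour.

```javascript
// Hypothetical helper: builds the reduced record stored in the ignore list.
function toIgnoreEntry(error, hideReason = "") {
  const entry = {
    type: error.type,
    fileRelativeToRoot: error.fileRelativeToRoot,
  };
  // Assumption: errors without a link are stored without a link key.
  if (error.link) {
    entry.link = { url: error.link.url, text: error.link.text };
  }
  entry.hideReason = hideReason;
  return entry;
}
```

An entry stored without a `link` key would then be matched on type and file alone by the filter in `index.js`.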
@@ -1,6 +1,7 @@
1
1
  import { Link } from "./links.js";
2
2
  import { sharedData } from "./shared_data.js";
3
3
  import { logFunction } from "./helpers.js";
4
+ import { processReferenceLinks } from "./process_markdown_reflinks.js";
4
5
 
5
6
  // Returns slug for a string (markdown heading) using Vuepress algorithm.
6
7
  // Algorithm from chatgpt - needs testing.
@@ -15,7 +16,9 @@ const processMarkdown = (contents, page) => {
15
16
  const urlLocalLinks = [];
16
17
  const urlImageLinks = [];
17
18
  const relativeImageLinks = [];
19
+ //const referenceLinks = [];
18
20
  const unHandledLinkTypes = [];
21
+ const errors = [];
19
22
  let redirectTo; //Pages that contain <Redirect to="string"/> links
20
23
 
21
24
  //console.log("SHARED_DATA");
@@ -53,8 +56,21 @@ const processMarkdown = (contents, page) => {
53
56
  unHandledLinkTypes,
54
57
  page
55
58
  );
59
+
60
+ // This gets a reference links
56
61
  }
57
62
 
63
+
64
+ const referenceLinkInfo = processReferenceLinks(contents, page);
65
+ urlLinks.push(...referenceLinkInfo.urlLinks);
66
+ urlLocalLinks.push(...referenceLinkInfo.urlLocalLinks);
67
+ urlImageLinks.push(...referenceLinkInfo.urlImageLinks);
68
+ relativeLinks.push(...referenceLinkInfo.relativeLinks);
69
+ relativeImageLinks.push(...referenceLinkInfo.relativeImageLinks);
70
+ errors.push(...referenceLinkInfo.errors);
71
+
72
+ //errors: errors, //TODO need to also pass referenceLinkInfo.errors
73
+
58
74
  // Match html tags that have an id element
59
75
  // (another way an anchor can be created)
60
76
  const htmlTagsWithIdsMatches = contents.match(
@@ -86,9 +102,12 @@ const processMarkdown = (contents, page) => {
86
102
  relativeImageLinks,
87
103
  unHandledLinkTypes,
88
104
  redirectTo,
105
+ errors,
89
106
  };
90
107
  };
91
108
 
109
+
110
+
92
111
  // Processes line, taking arrays of different link types.
93
112
  // Update the incoming values and return
94
113
  // Note, assumption is all links are on one line, not split across lines.
@@ -103,7 +122,7 @@ const processLineMarkdownLinks = (
103
122
  unHandledLinkTypes,
104
123
  page
105
124
  ) => {
106
- logFunction(`Function: processMarkdown(): page: ${page}`);
125
+ logFunction(`Function: processLineMarkdownLinks(): page: ${page}`);
107
126
 
108
127
  //const regex = /(?<prefix>[!@]?)\[(?<text>[^\]]+)\]\((?<url>\S+?)(?:\s+"(?<title>[^"]+)")?\)/g;
109
128
  // Match to Markdown link OR image
@@ -199,7 +218,9 @@ const processLineMarkdownLinks = (
199
218
  }
200
219
  default: {
201
220
  unHandledLinkTypes.push(link);
202
- sharedData.options.log.includes("todo") ? console.log(`TODO: 3Unhandled link.type: ${link.type}`) : null;
221
+ sharedData.options.log.includes("todo")
222
+ ? console.log(`TODO: 3Unhandled link.type: ${link.type}`)
223
+ : null;
203
224
  break;
204
225
  }
205
226
  }
@@ -224,7 +245,8 @@ const processLineMarkdownLinks = (
224
245
  let linkId = "";
225
246
  if (attributes) {
226
247
  const titlematch = attributes.match(regexHTMLTitle);
227
- linkTitle = titlematch && titlematch.groups.title ? titlematch.groups.title : "";
248
+ linkTitle =
249
+ titlematch && titlematch.groups.title ? titlematch.groups.title : "";
228
250
  const hrefmatch = attributes.match(regexHTMLhref);
229
251
  linkUrl = hrefmatch && hrefmatch.groups.href ? hrefmatch.groups.href : "";
230
252
  const idMatch = attributes.match(regexHTMLid);
@@ -250,7 +272,9 @@ const processLineMarkdownLinks = (
250
272
  //const link = new Link(linkUrl, linkText, linkTitle);
251
273
  if (!linkUrl) {
252
274
  //We should only get here for empty links.
253
- console.log( `WWregexHTMLmatchAtag: page: ${page}, linkUrl: ${linkUrl}, linkText: ${linkText}, linkTitle: ${linkTitle}, linkType: ${linkType}` );
275
+ console.log(
276
+ `WWregexHTMLmatchAtag: page: ${page}, linkUrl: ${linkUrl}, linkText: ${linkText}, linkTitle: ${linkTitle}, linkType: ${linkType}`
277
+ );
254
278
  }
255
279
 
256
280
  const link = new Link({
@@ -301,7 +325,9 @@ const processLineMarkdownLinks = (
301
325
 
302
326
  default: {
303
327
  unHandledLinkTypes.push(link);
304
- sharedData.options.log.includes("todo") ? console.log(`TODO: 2Unhandled link.type: ${link.type}`) : null;
328
+ sharedData.options.log.includes("todo")
329
+ ? console.log(`TODO: 2Unhandled link.type: ${link.type}`)
330
+ : null;
305
331
  break;
306
332
  }
307
333
  }
@@ -316,7 +342,6 @@ const processLineMarkdownLinks = (
316
342
  const regex_htmlattr_src =
317
343
  /src\s*[=]\s*(?<quote>['"])(?<src>.*?)(?<!\\)\k<quote>/i;
318
344
 
319
-
320
345
  for (const match of line.matchAll(regexHTMLImgTotal)) {
321
346
  //console.log(`XXXXXregexHTMLImgTotals: ${match}`)
322
347
  const attributes = match.groups.attributes;
@@ -386,7 +411,9 @@ const processLineMarkdownLinks = (
386
411
 
387
412
  default: {
388
413
  unHandledLinkTypes.push(link);
389
- sharedData.options.log.includes("todo") ? console.log(`TODO: 1Unhandled link.type: ${link.type}`) : null;
414
+ sharedData.options.log.includes("todo")
415
+ ? console.log(`TODO: 1Unhandled link.type: ${link.type}`)
416
+ : null;
390
417
  break;
391
418
  }
392
419
  }
@@ -0,0 +1,173 @@
1
+ import { Link } from "./links.js";
2
+ import { logFunction } from "./helpers.js";
3
+ import {
4
+ ReferenceForLinkNotFoundError /* CurrentFileMissingAnchorError, LinkedFileMissingAnchorError, */,
5
+ } from "./errors.js";
6
+
7
+ //import { sharedData } from "./shared_data.js";
8
+
9
+ // Process all content in page, generating lists of links and some errors.
10
+ function processReferenceLinks(content, page) {
11
+ logFunction(`Function: processReferenceLinks(): page: ${page}`);
12
+
13
+ // Detect reference link
14
+ //const regex = /^\[(.+?)\]:\s+(.+?$)/;
15
+ // Link label format: https://github.github.com/gfm/#link-label
16
+ // Link reference definition: https://github.github.com/gfm/#link-reference-definition
17
+ // This will only catch the "all in one line format".
18
+ // Within that it catches reference, url and title.
19
+ //const regex = /^\s{0,3}\[(?<refName>.+?)\]:\s*?(?<refUrl>.+?$)/;
20
+ //const regex = /^\s{0,3}[\[(?<refName>.+?)\]:\s*?(?<refUrl>.+?)$/;
21
+ const references = {};
22
+ const possibleLinks = [];
23
+
24
+ const errors = [];
25
+ const urlLinks = [];
26
+ const urlLocalLinks = [];
27
+ const urlImageLinks = [];
28
+ const relativeLinks = [];
29
+ const relativeImageLinks = [];
30
+
31
+ const regexReference =
32
+ /^\s{0,3}\[(?<capRefName>.+?)\]:\s*?(?<capRefUrl>.+?)(?:[\"'](?<capRefTitle>[^\"\']+)[\"'])?(\s*$)/; //is goodish
33
+ //const regex = /^\s{0,3}\[(?<refName>.+?)\]:\s*?(?<refUrl>.+?)(?:[\"'](?<refTitle>[^\"\']+)[\"'])?(\s*(?<refTrailing>\S*))?$/
34
+ // TODO NEED to do something about trailing text as it breaks parser.
35
+ //Split content into lines
36
+ const lines = content.split(/\r?\n/);
37
+
38
+ for (let i = 0; i < lines.length; i++) {
39
+ const line = lines[i];
40
+
41
+ // Match on reflinks
42
+ const matchstring = line.match(regexReference);
43
+ if (matchstring) {
44
+ const { capRefName, capRefUrl, capRefTitle } = matchstring.groups;
45
+
46
+ // Normalize refname (lowercase, trimmed, whitespace runs collapsed to one space)
47
+ // First reference used by default.
48
+ const refName = capRefName.trim().toLowerCase().replace(/\s+/g, " ");
49
+ const refTitle = capRefTitle ? capRefTitle : "";
50
+ const refUrl = capRefUrl.trim();
51
+
52
+ if (refName.length !== capRefName.length) {
53
+ console.log(`TODO warn check spaces on ref: ${capRefName}`);
54
+ }
55
+
56
+ const refItem = {
57
+ ref: refName,
58
+ url: refUrl,
59
+ title: refTitle,
60
+ captured: matchstring[0],
61
+ };
62
+
63
+ if (refName in references) {
64
+ console.log(`TODO: Error duplicate reference to print `);
65
+ } else {
66
+ references[refName] = refItem;
67
+ }
68
+ }
69
+
70
+ //Match on possible reference links.
71
+ // const regexWithLinkText = /(?<prefix>[!@]?)\[(?<text>[^\]]*)\][(?<reference>.*?)]/g;
72
+ const regexWithLinkText =
73
+ /(?<prefix>[!@]?)\[(?<text>.*?)\]\[(?<reference>.*?)\]/g;
74
+
75
+ const matches = line.matchAll(regexWithLinkText);
76
+ //console.log(`Matches: ${matches}`);
77
+
78
+ for (const match of matches) {
79
+ const { prefix, text, reference } = match.groups;
80
+ //console.log( ` Prefix: ${prefix}, Text: ${text}, Reference: ${reference}, ` );
81
+ const refName = reference.trim().toLowerCase().replace(/\s+/g, " ");
82
+
83
+ //Create link (possible link from ref)
84
+ // Note, this is just an object, not an object of type Link
85
+ const link = {
86
+ page: page,
87
+ text: text,
88
+ prefix: prefix,
89
+ refName: refName,
90
+ refMatch: reference,
91
+ linkMatch: match[0],
92
+ };
93
+ possibleLinks.push(link);
94
+ //console.log(possibleLinks);
95
+ }
96
+ }
97
+ //console.log(references);
98
+ //console.log(possibleLinks);
99
+
100
+ // Iterate through the possible links, checking for references.
101
+ // Create links and errors
102
+ possibleLinks.forEach((value) => {
103
+ if (value.refName in references) {
104
+ //console.log("Ref exists for link:");
105
+ //console.log(references[value.refName]);
106
+
107
+ //Create link for ref links with matching ref
108
+ const link = new Link({
109
+ page: value.page,
110
+ url: references[value.refName].url,
111
+ text: value.text,
112
+ title: references[value.refName].title,
113
+ isReference: true,
114
+ refName: value.refName,
115
+ refMatch: value.linkMatch,
116
+ });
117
+
118
+ // TODO Save error here if there is a mismatch in prefix - i.e. prefix ! but URL is not an image.
119
+ // Perhaps roll that out elsewhere too.
120
+ // Now let's add to the correct type.
121
+
122
+ //Link works out its own type, so add to the appropriate array to return:
123
+ switch (link.type) {
124
+ case "urlLink":
125
+ urlLinks.push(link);
126
+ break;
127
+ case "urlLocalLink":
128
+ urlLocalLinks.push(link);
129
+ break;
130
+ case "urlImageLink":
131
+ urlImageLinks.push(link);
132
+ break;
133
+ case "relativeLink":
134
+ relativeLinks.push(link);
135
+ break;
136
+ case "relativeImageLink":
137
+ relativeImageLinks.push(link);
138
+ break;
139
+ default:
140
+ throw new Error(
141
+ `processReferenceLinks: '${link.type}' link type unknown in switch statement!`
142
+ );
143
+ break;
144
+ }
145
+ } else {
146
+ const error = new ReferenceForLinkNotFoundError({
147
+ file: value.page,
148
+ linkMatch: value.linkMatch,
149
+ refMatch: value.refMatch,
150
+ });
151
+ //TODO: It is valid to have text that has reference format.
152
+ // Don't push error until it can be disabled by default or disabled individually.
153
+ //errors.push(error);
154
+ }
155
+ });
156
+
157
+ //console.log(refLinks);
158
+ return {
159
+ errors: errors,
160
+ urlLinks: urlLinks,
161
+ urlLocalLinks: urlLocalLinks,
162
+ urlImageLinks: urlImageLinks,
163
+ relativeLinks: relativeLinks,
164
+ relativeImageLinks: relativeImageLinks,
165
+ };
166
+ }
167
+
168
+ export { processReferenceLinks };
169
+
170
+ /*
171
+
172
+
173
+ */
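The new file above works in two passes: it first collects reference definitions and candidate `[text][reference]` usages line by line, then resolves each usage against the collected definitions. The following is a minimal sketch of that flow, not the package's code. `resolveReferenceLinks` and `normalizeRefName` are hypothetical names, the real function builds `Link` objects and sorts them by type, and a `Map` is used here instead of a plain object so that names such as `constructor` are not found through the prototype chain. The two regexes are copied from the diff, with redundant escapes inside the character classes removed.

```javascript
// Illustrative sketch only: resolveReferenceLinks is a hypothetical helper.
const regexReference =
  /^\s{0,3}\[(?<capRefName>.+?)\]:\s*?(?<capRefUrl>.+?)(?:["'](?<capRefTitle>[^"']+)["'])?(\s*$)/;
const regexWithLinkText =
  /(?<prefix>[!@]?)\[(?<text>.*?)\]\[(?<reference>.*?)\]/g;

// Same normalization as the source: trim, lowercase, collapse whitespace runs.
const normalizeRefName = (name) =>
  name.trim().toLowerCase().replace(/\s+/g, " ");

function resolveReferenceLinks(content) {
  const references = new Map();
  const possibleLinks = [];

  for (const line of content.split(/\r?\n/)) {
    // Pass 1a: collect definitions; the first definition of a name wins.
    const def = line.match(regexReference);
    if (def) {
      const name = normalizeRefName(def.groups.capRefName);
      if (!references.has(name)) {
        references.set(name, {
          url: def.groups.capRefUrl.trim(),
          title: def.groups.capRefTitle ?? "",
        });
      }
    }
    // Pass 1b: collect usages of the form [text][reference].
    for (const m of line.matchAll(regexWithLinkText)) {
      possibleLinks.push({
        text: m.groups.text,
        refName: normalizeRefName(m.groups.reference),
      });
    }
  }

  // Pass 2: resolve each usage against the collected definitions.
  const resolved = [];
  const unresolved = [];
  for (const link of possibleLinks) {
    const target = references.get(link.refName);
    if (target) {
      resolved.push({ ...link, ...target });
    } else {
      unresolved.push(link);
    }
  }
  return { resolved, unresolved };
}

// Usage with made-up content (example.com URLs are placeholders).
const sample = [
  "See [the docs][ My  Ref ] and [missing][nope].",
  "",
  "[my ref]: http://example.com/docs 'Docs title'",
  "[my ref]: http://example.com/other",
].join("\n");
const result = resolveReferenceLinks(sample);
console.log(result.resolved, result.unresolved);
```

Because definitions are collected before any usage is resolved, a definition may appear after the link that uses it, as in the sample.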
@@ -0,0 +1,32 @@
1
+ # Test
2
+
3
+ Run like: `node .\index.js -d tests/links/linkreference/links/`
4
+
5
+ Confirm we pick up references
6
+
7
+ - [this is the link text before references][this is the reference 1] And why not
8
+ - [link text before references without reference][reference does not exist] so there
9
+
10
+ Image URL tests
11
+ - ![image link text to non-image URL][this is the reference 1] thingy
12
+ - ![image link text to image URL][this ref to image url]
13
+ - [image url but not image link][this ref to image url]
14
+
15
+
16
+ Image URL tests
17
+ - ![image link text to non-image URL][this is the reference 1] thingy
18
+ - ![image link text to image URL][this ref to image url]
19
+ - [image url but not image link][this ref to image url]
20
+ - ![image link text to relative URL][rel ref to image url]
21
+
22
+
23
+ [this is the reference 1]: http://this.com/is/a/url/refererence
24
+ [this is reference 2]: http://this.com/is/a/url/refererence 'is title in singlequote'
25
+
26
+ [this ref to image url]: http://this.com/is/a/url/animage.jpg 'is title in singlequote'
27
+
28
+
29
+ [rel ref to image url]: ../url/arelimage.jpg 'is title in singlequote'
30
+
31
+
32
+ This is some text [this is the link text after reference 2][ this is reference 2] And why not
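The "Image URL tests" in this fixture exercise the case noted as a TODO in `processReferenceLinks()`: a `!` prefix whose reference resolves to a non-image URL, and the reverse. One possible shape for such a check is sketched below. Both functions are hypothetical and do not exist in the package, and deciding "is an image" by file extension is an assumption made only for this example.

```javascript
// Hypothetical sketch: neither function exists in the package.
// Assumption: a URL is an image if its path ends in a common image extension.
const IMAGE_EXTENSIONS = /\.(png|jpe?g|gif|svg|webp)$/i;

function looksLikeImageUrl(url) {
  // Ignore any query string or fragment before testing the extension.
  const path = url.split(/[?#]/)[0];
  return IMAGE_EXTENSIONS.test(path);
}

// Returns "ok", "image-prefix-on-non-image" or "image-url-without-prefix".
function classifyPrefixMismatch(prefix, url) {
  const isImageUrl = looksLikeImageUrl(url);
  if (prefix === "!" && !isImageUrl) return "image-prefix-on-non-image";
  if (prefix !== "!" && isImageUrl) return "image-url-without-prefix";
  return "ok";
}

// The prefix/URL pairs below mirror the fixture lines above.
console.log(classifyPrefixMismatch("!", "http://this.com/is/a/url/refererence"));
console.log(classifyPrefixMismatch("!", "http://this.com/is/a/url/animage.jpg"));
console.log(classifyPrefixMismatch("", "http://this.com/is/a/url/animage.jpg"));
console.log(classifyPrefixMismatch("!", "../url/arelimage.jpg"));
```

Whether a non-image link to an image URL should be reported at all is a policy choice; the sketch only separates the two cases so they can be handled differently.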
@@ -0,0 +1,32 @@
1
+ # Test
2
+
3
+ Run like: `node .\index.js -d tests/links/linkreference/`
4
+
5
+ Confirm we pick up references
6
+
7
+ [reference 1]: http://this.com/is/a/url/refererence
8
+ [reference 2]: http://this.com/is/a/url/refererence withtextafternoquoteisinvalid
9
+ [reference 3]: http://this.com/is/a/url/refererence "is title in doublequote"
10
+ [reference 4]: http://this.com/is/a/url/refererence 'is title in singlequote'
11
+ [reference 5]: http://this.com/is/a/url/refererence 'is title in singlequote but has text after' with following text.
12
+ [relativepathref]: /a/relative/path
13
+
14
+ [pathref with whitespace ]: /a/path/ref/first/should/be/used
15
+
16
+ [pathref with whitespace]: /a/path/ref/second/should/not/be/used
17
+
18
+ [ pathref WITH Capitals AnD whitespace ]: /a/path/ref/second/should/not/be/used
19
+
20
+ [ onespacebefore twospace threespace fourspace WITH Capitals AnD whitespace ]: /a/path/ref/second/should/not/be/used
21
+
22
+ [ pathref with whitespace]: /a/path/ref/link/but/should/not/be/used
23
+
24
+ [reference indented two spaces]: /a/path/ref
25
+
26
+ [reference indented THREEE spaces]: /a/path/ref
27
+
28
+ [reference indented 4 spaces should be ignored]: /a/path/ref
29
+
30
+ [ref with trailing text not title is error]: /a/path/ref trailingtextnot_matched_as_title.
31
+
32
+ [ref with trailing text after title is error]: /a/path/ref 'ref title text' trailingtextnot_matched_as_title_after_title .
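To see how the definition regex in `processReferenceLinks()` treats lines like these, the sketch below runs it over a few inputs modelled on the fixture. `parseReferenceDefinition` is a hypothetical wrapper written for this example. The regex is copied from the diff, with redundant escapes inside the character classes removed, and the input strings add explicit indentation and repeated spaces that this diff view may not preserve.

```javascript
// Regex copied from processReferenceLinks() in the diff above.
const regexReference =
  /^\s{0,3}\[(?<capRefName>.+?)\]:\s*?(?<capRefUrl>.+?)(?:["'](?<capRefTitle>[^"']+)["'])?(\s*$)/;

// parseReferenceDefinition is a hypothetical wrapper used only for this sketch.
function parseReferenceDefinition(line) {
  const match = line.match(regexReference);
  if (!match) return null;
  const { capRefName, capRefUrl, capRefTitle } = match.groups;
  return {
    ref: capRefName.trim().toLowerCase().replace(/\s+/g, " "),
    url: capRefUrl.trim(),
    title: capRefTitle ?? "",
  };
}

const fixtures = [
  // Quoted title: captured separately from the URL.
  '[reference 3]: http://this.com/is/a/url/refererence "is title in doublequote"',
  // Unquoted trailing text: swallowed into the URL capture.
  "[reference 2]: http://this.com/is/a/url/refererence withtextafternoquoteisinvalid",
  // Four leading spaces: no match, so the line is ignored.
  "    [reference indented 4 spaces should be ignored]: /a/path/ref",
  // Mixed case and repeated spaces: name is normalized.
  "[ pathref WITH  Capitals AnD   whitespace ]: /a/path/ref",
];
const parsed = fixtures.map(parseReferenceDefinition);
console.log(parsed);
```

The second input shows the problem flagged by the `TODO` in the source: with no quoted title to stop at, the lazy URL group runs to the end of the line, so the trailing text becomes part of the URL instead of being reported as an error.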