extract-from-sitemap 0.0.12 → 0.0.13

Files changed (3)
  1. package/README.md +18 -0
  2. package/cli.js +4 -1
  3. package/package.json +1 -1
package/README.md ADDED
@@ -0,0 +1,18 @@
+ This repo allows you to create a static markdown bundle based on one or multiple sources. The sources must either have a functional and complete sitemap, or should specify custom urls to be extracted.
+
+ ## Step by Step Guide
+
+ 1. Create a `llmtext.json` file in the root of your project. This is where you define your sources to be extracted from. For an example combining multiple sources, see [this example](https://github.com/janwilmake/parallel-llmtext/blob/main/llmtext.json).
+ 2. Run `npx extract-from-sitemap` (or add it to your `package.json` scripts, [like this](https://github.com/janwilmake/parallel-llmtext/blob/main/package.json))
+ 3. Set up CI/CD in your repo to automatically update your extracted static files as often as needed. **Example coming soon**
+ 4. Use an agent-rewriter such as [next-agent-rewriter](../next-agent-rewriter) to rewrite agent requests to the appropriate static markdown files. In addition, it's best practice to add a link in your html to show the markdown variant is available, like this: `<link rel="alternate" type="text/markdown" href="{path}.md" title="Docs" />`
+
+ ## Known limitations
+
+ This library is in active development. Known limitations:
+
+ - Does not work for nested sitemaps
+ - Does not work on sitemaps that are too large
+ - Example to make it recurring is still missing
+
+ I am working on addressing these issues.
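The first known limitation concerns sitemap indexes. Under the sitemaps.org protocol, a site may publish a file whose root element is `<sitemapindex>` and whose `<loc>` entries point to further sitemaps, rather than a `<urlset>` whose `<loc>` entries are page URLs. Large sites commonly split their URLs this way because the protocol caps a single sitemap file at 50,000 URLs and 50 MB uncompressed. The sketch below is not part of extract-from-sitemap; it is a hypothetical JavaScript helper showing one way to detect an index and flatten it into page URLs. `parseSitemap`, `collectPageUrls`, and the `fetchText` callback are illustrative names, and the regex parsing is a simplification of real XML handling.

```javascript
// Hypothetical helper, NOT part of extract-from-sitemap. Follows the
// sitemaps.org element names: <sitemapindex>/<sitemap>/<loc> for indexes,
// <urlset>/<url>/<loc> for regular sitemaps.

// Decode the five XML entities that may appear escaped inside <loc> values.
function decodeXmlEntities(text) {
  const map = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };
  return text.replace(/&(amp|lt|gt|quot|apos);/g, (_, name) => map[name]);
}

// Classify a sitemap document and extract its <loc> values.
// Regex-based for brevity: ignores comments, CDATA, and namespace prefixes.
function parseSitemap(xml) {
  const type = /<sitemapindex[\s>]/.test(xml) ? "index" : "urlset";
  const locs = [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) =>
    decodeXmlEntities(m[1])
  );
  return { type, locs };
}

// Expand a sitemap (or an index of sitemaps) into a flat list of page URLs.
// `fetchText` is any async function returning the document body for a URL;
// `maxDepth` bounds recursion in case indexes reference each other.
async function collectPageUrls(url, fetchText, maxDepth = 3, depth = 0) {
  if (depth > maxDepth) return [];
  const { type, locs } = parseSitemap(await fetchText(url));
  if (type === "urlset") return locs;
  const nested = await Promise.all(
    locs.map((child) => collectPageUrls(child, fetchText, maxDepth, depth + 1))
  );
  return nested.flat();
}
```

With Node.js 18 or later, the built-in `fetch` could serve as the fetcher, for example `collectPageUrls(url, (u) => fetch(u).then((r) => r.text()))`. Gzip-compressed sitemaps (`.xml.gz`) would need decompressing before parsing.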
package/cli.js CHANGED
@@ -574,7 +574,10 @@ function generateCombinedLlmsTxt(allSources) {
  link = source.pathPrefix + (path.startsWith("/") ? path : "/" + path);
  }
 
- combinedTxt += `- [${title}](${link}) (${file.tokens} tokens)${description}\n`;
+ combinedTxt += `- [${title}](${link}): ${description.replaceAll(
+ "\n",
+ " "
+ )}\n`;
  }
  }
 
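The only code change in cli.js alters how each entry line in the combined llms.txt is built. The sketch below isolates the expressions from the hunk into standalone functions so the 0.0.12 and 0.0.13 formats can be compared. The function names and the sample values (`"Getting started"`, `812` tokens) are hypothetical; in cli.js this logic is inline inside `generateCombinedLlmsTxt`, and how `title`, `description`, and `file.tokens` are derived is not shown in the diff.

```javascript
// Hypothetical standalone versions of the inline expressions in
// generateCombinedLlmsTxt (cli.js). Variable roles mirror the diff.

// Link normalization from the context lines: prepend "/" to `path` when it
// does not already start with one, then prefix the source's pathPrefix.
function buildLink(pathPrefix, path) {
  return pathPrefix + (path.startsWith("/") ? path : "/" + path);
}

// 0.0.12 format: token count in parentheses, description appended verbatim.
function formatEntryV12(title, link, tokens, description) {
  return `- [${title}](${link}) (${tokens} tokens)${description}\n`;
}

// 0.0.13 format: ": " separator, no token count, and every "\n" in the
// description replaced by a space so the entry stays on one list line.
function formatEntryV13(title, link, description) {
  return `- [${title}](${link}): ${description.replaceAll("\n", " ")}\n`;
}

// Illustrative usage with made-up values.
const link = buildLink("/docs", "getting-started");
process.stdout.write(formatEntryV12("Getting started", link, 812, ""));
// - [Getting started](/docs/getting-started) (812 tokens)
process.stdout.write(formatEntryV13("Getting started", link, "Install the CLI.\nRun it."));
// - [Getting started](/docs/getting-started): Install the CLI. Run it.
```

Two consequences follow from the new template as written: an empty description yields a line ending in `: `, and only `\n` is replaced, so a `\r` left over from CRLF text would remain in the entry. `String.prototype.replaceAll` requires Node.js 15 or later.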
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "extract-from-sitemap",
  "bin": "cli.js",
- "version": "0.0.12",
+ "version": "0.0.13",
  "main": "mod.js",
  "description": "A module and CLI that allows extracting all pages from a sitemap into markdown and a llms.txt, using Parallel.ai APIs.",
  "files": [