npm - @conduction/docusaurus-preset - Versions diffs - 3.4.0 → 3.5.0 - Mend

@conduction/docusaurus-preset 3.4.0 → 3.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3) hide show

package/README.md +50 -0
package/bin/validate-ai-baseline.mjs +177 -0
package/package.json +5 -1

package/README.md CHANGED Viewed

@@ -21,6 +21,7 @@ A few non-negotiables encoded by the package CSS and worth knowing about:
 - **Brand-default navbar** — locale-dropdown + GitHub link. Sites override `items[]` for site-specific navigation.
 - **Brand-default footer** — three-column link grid + Conduction-tells (KvK, BTW, address). Per-property override: pass `footer: { links: [...] }` to swap columns and inherit the brand copyright unchanged. Spread `baseFooterLinks()` to keep one or two brand columns alongside site-specific ones.
 - **Sensible defaults** — `trailingSlash`, `onBrokenLinks: 'warn'`, `respectPrefersColorScheme`, dark-mode brand mapping.
+- **AI-crawler baseline** — Organization + WebSite JSON-LD on every page, `SoftwareApplication` JSON-LD from `<DetailHero>`, `FAQPage` JSON-LD from `<FAQ>`, default `og:image` + Twitter card meta, sitemap options, and a `postBuild` plugin that emits `robots.txt` when the site does not ship its own. See the AI baseline section below for the validator and content requirements.
 ## Usage
@@ -163,6 +164,55 @@ import '@conduction/docusaurus-preset/diagrams';
 This is how product sites such as `mydash.conduction.nl/docs/...` adopt the brand without copying CSS or theme code, and stay in sync as the design-system evolves.
+## AI-crawler baseline
+Every site that consumes this preset inherits a contract that AI crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google AI Overviews) expect. The schemas, meta tags, and `robots.txt` ship automatically; sites only have to opt in to the content that surfaces them.
+**What the preset ships**
+| Surface | Source | How a site uses it |
+| --- | --- | --- |
+| Organization + WebSite JSON-LD | `headTags` injected by `createConfig` | Automatic on every page |
+| `og:image`, `twitter:site`, `twitter:card`, `og:type` | `themeConfig.image` + `themeConfig.metadata` defaults | Override per site by passing `themeConfig.image: 'img/og-my-app.png'` |
+| Default `robots.txt` | `conduction-ai-crawling` postBuild plugin | Drop `static/robots.txt` to override |
+| `SoftwareApplication` JSON-LD | `<DetailHero appId="my-app" .../>` | Pages that should advertise the app must render `<DetailHero>` with an `appId` that resolves in `src/data/apps-registry.js`. No DetailHero means no schema. |
+| `FAQPage` JSON-LD | `<FAQ>` with `<FAQItem question=...>` children | Drop a `<FAQ>` block onto a page; the schema is auto-emitted from the children |
+| Sitemap options | Default `sitemap` config on the classic preset | Sites that override `presets` must include their own `sitemap` block |
+**Validating a site**
+The preset ships a generic 8-check validator as a `bin`. Wire it into the site's build:
+```jsonc
+// docs/package.json
+{
+  "scripts": {
+    "build": "docusaurus build",
+    "postbuild": "validate-ai-baseline",
+    "validate:ai-baseline": "validate-ai-baseline"
+  }
+}
+```
+`npm run build` now exits non-zero if any of these regress: `robots.txt` exists with a Sitemap line and an AI-bot allow line, `sitemap.xml` has at least one URL, the homepage emits Organization + WebSite JSON-LD plus `og:image` / `og:type` / `twitter:site` / `twitter:card`, and the `og:image` URL resolves to a real file. Sites can extend the validator with extra checks (per-app SoftwareApplication, FAQPage on specific pages, etc.) by adding their own `scripts/validate-ai-baseline-site.mjs` and chaining it.
+**Per-app docs site checklist**
+For a per-app docs site to satisfy the full schema contract, the landing page must render `<DetailHero appId="my-app" .../>` with an `appId` that exists in `src/data/apps-registry.js`. That single render emits the `SoftwareApplication` JSON-LD with category mapping (Data and Processes -> BusinessApplication, Connectors -> DeveloperApplication, etc.), `operatingSystem: 'Nextcloud'`, and the EUPL-1.2 license URL. Sites that build a custom landing without `<DetailHero>` get only Organization + WebSite, not the per-app schema.
+**Opting out**
+```js
+createConfig({
+  title: '...',
+  url: '...',
+  baseUrl: '/',
+  aiCrawling: { disable: true },          // skip the whole postBuild plugin
+  // or, finer-grained:
+  aiCrawling: { disable: { robotsTxt: true } },  // ship our own static/robots.txt
+});
+```
 ## Releasing
 Releases auto-publish on push to `main`, driven by [semantic-release](https://semantic-release.gitbook.io/) reading [conventional-commit](https://www.conventionalcommits.org/) messages. The [.github/workflows/publish-packages.yml](../.github/workflows/publish-packages.yml) workflow walks every commit since the last `@conduction/docusaurus-preset-v*` tag and decides what to ship:

package/bin/validate-ai-baseline.mjs ADDED Viewed

@@ -0,0 +1,177 @@
+#!/usr/bin/env node
+/**
+ * scripts/validate-ai-baseline.mjs
+ *
+ * Generic AI-crawler baseline validator. Runs as a postbuild step on
+ * every Conduction Docusaurus site that consumes
+ * @conduction/docusaurus-preset >= 3.4.0. Asserts the SSG output
+ * carries the contract AI crawlers (GPTBot, ClaudeBot, PerplexityBot,
+ * OAI-SearchBot, Claude-SearchBot, Google AI Overviews) expect.
+ *
+ * Universal checks only - no site-specific routes. Sites that want
+ * additional gates (per-app SoftwareApplication, FAQPage on specific
+ * pages, etc.) extend this script in place. See conduction-website's
+ * version for an example of additional checks.
+ *
+ * Exit codes:
+ *   0   all checks passed
+ *   1   one or more checks failed (CI should block)
+ *   2   build directory not found (script invoked before build)
+ */
+import {readFileSync, existsSync, statSync} from 'node:fs';
+import {join, resolve} from 'node:path';
+const buildDir = resolve(process.argv[2] || 'build');
+if (!existsSync(buildDir)) {
+  console.error(`✗ build directory not found: ${buildDir}`);
+  console.error(`  Run \`npx docusaurus build\` first.`);
+  process.exit(2);
+}
+const results = [];
+function check(name, fn) {
+  try {
+    const r = fn();
+    results.push({name, ok: r.ok, msg: r.msg});
+  } catch (e) {
+    results.push({name, ok: false, msg: `threw: ${e.message}`});
+  }
+}
+function readBuild(p) {
+  return readFileSync(join(buildDir, p), 'utf8');
+}
+/* robots.txt - shipped by the preset's ai-crawling plugin (or the
+   site's own static/robots.txt). Either way, the file must exist
+   and name at least one AI search bot so a `grep` audit can confirm
+   the posture at a glance. */
+check('robots.txt exists and is non-empty', () => {
+  const path = join(buildDir, 'robots.txt');
+  if (!existsSync(path)) return {ok: false, msg: 'missing'};
+  const size = statSync(path).size;
+  if (size < 50) return {ok: false, msg: `too small (${size} bytes)`};
+  return {ok: true, msg: `${size} bytes`};
+});
+check('robots.txt names at least one AI search bot', () => {
+  const body = readBuild('robots.txt');
+  const candidates = ['OAI-SearchBot', 'Claude-SearchBot', 'PerplexityBot', 'ChatGPT-User', 'Claude-User'];
+  const found = candidates.filter(ua => body.includes(`User-agent: ${ua}`));
+  if (found.length === 0) {
+    return {ok: false, msg: `none of [${candidates.join(', ')}] referenced`};
+  }
+  return {ok: true, msg: `${found.length} bot(s): ${found.join(', ')}`};
+});
+check('robots.txt has a Sitemap line', () => {
+  const body = readBuild('robots.txt');
+  const matches = body.match(/^Sitemap:\s+https?:\/\//gm) || [];
+  if (matches.length === 0) return {ok: false, msg: 'no Sitemap: line'};
+  return {ok: true, msg: `${matches.length} sitemap line(s)`};
+});
+/* sitemap.xml - emitted by @docusaurus/plugin-sitemap (loaded via
+   the classic preset). Locale-specific sitemaps (e.g. /nl/sitemap.xml)
+   are present for i18n builds; we only check the canonical one
+   because some sites are single-locale. */
+check('sitemap.xml exists and has at least 1 URL', () => {
+  const path = join(buildDir, 'sitemap.xml');
+  if (!existsSync(path)) return {ok: false, msg: 'missing'};
+  const body = readBuild('sitemap.xml');
+  const n = (body.match(/<loc>/g) || []).length;
+  if (n < 1) return {ok: false, msg: 'no <loc> entries'};
+  return {ok: true, msg: `${n} URLs`};
+});
+/* Helper for the JSON-LD checks below. Docusaurus emits ld+json
+   tags via two paths with different attribute ordering: top-level
+   headTags renders <script type="..."> first, while Helmet (used
+   by <Head> from inside React components like <DetailHero>, <FAQ>)
+   prefixes data-rh="true". The regex matches either ordering. */
+function extractJsonLdBlocks(html) {
+  const out = [];
+  const re = /<script\b[^>]*\btype="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/g;
+  let m;
+  while ((m = re.exec(html)) !== null) {
+    out.push(m[1]);
+  }
+  return out;
+}
+check('homepage emits >= 2 JSON-LD blocks, all valid JSON', () => {
+  if (!existsSync(join(buildDir, 'index.html'))) return {ok: false, msg: 'no index.html'};
+  const html = readBuild('index.html');
+  const blocks = extractJsonLdBlocks(html);
+  if (blocks.length < 2) return {ok: false, msg: `only ${blocks.length} block(s)`};
+  for (const [i, b] of blocks.entries()) {
+    try {JSON.parse(b);} catch (e) {
+      return {ok: false, msg: `block ${i} invalid JSON: ${e.message}`};
+    }
+  }
+  return {ok: true, msg: `${blocks.length} blocks, all valid`};
+});
+check('homepage JSON-LD includes Organization and WebSite', () => {
+  const html = readBuild('index.html');
+  const types = extractJsonLdBlocks(html).map(b => {
+    try {return JSON.parse(b)['@type'];} catch {return null;}
+  });
+  const want = ['Organization', 'WebSite'];
+  const missing = want.filter(t => !types.includes(t));
+  if (missing.length) return {ok: false, msg: `missing @type: ${missing.join(', ')}`};
+  return {ok: true, msg: types.filter(Boolean).join(' + ')};
+});
+/* Social-card meta. og:image is the one that breaks LinkedIn /
+   Slack / AI previews when it 404s, so we also resolve the URL to
+   a local file in the build output. */
+function metaTag(html, key) {
+  const re = new RegExp(`<meta[^>]+(?:name|property)="${key}"[^>]+content="([^"]+)"`, 'i');
+  const m = html.match(re);
+  return m ? m[1] : null;
+}
+check('homepage has og:image, og:type, twitter:site, twitter:card', () => {
+  const html = readBuild('index.html');
+  const checks = {
+    'og:image': metaTag(html, 'og:image'),
+    'og:type': metaTag(html, 'og:type'),
+    'twitter:site': metaTag(html, 'twitter:site'),
+    'twitter:card': metaTag(html, 'twitter:card'),
+  };
+  const missing = Object.entries(checks).filter(([, v]) => !v).map(([k]) => k);
+  if (missing.length) return {ok: false, msg: `missing: ${missing.join(', ')}`};
+  return {ok: true, msg: 'all four present'};
+});
+check('og:image URL resolves to a file in the build', () => {
+  const html = readBuild('index.html');
+  const url = metaTag(html, 'og:image');
+  if (!url) return {ok: false, msg: 'no og:image meta'};
+  const path = url.replace(/^https?:\/\/[^/]+\//, '');
+  const local = join(buildDir, path);
+  if (!existsSync(local)) return {ok: false, msg: `og:image refers to ${url}, not found at ${local}`};
+  const size = statSync(local).size;
+  if (size < 1024) return {ok: false, msg: `og:image file suspiciously small (${size} bytes)`};
+  return {ok: true, msg: `${path} (${size} bytes)`};
+});
+/* Report */
+let failed = 0;
+for (const {name, ok, msg} of results) {
+  const icon = ok ? '✓' : '✗';
+  console.log(`${icon} ${name} - ${msg}`);
+  if (!ok) failed++;
+}
+console.log('');
+if (failed) {
+  console.error(`${failed} of ${results.length} checks failed.`);
+  console.error('AI-crawler baseline regressed. Fix the failures above before merging.');
+  process.exit(1);
+} else {
+  console.log(`All ${results.length} AI-baseline checks passed.`);
+}

package/package.json CHANGED Viewed

@@ -1,9 +1,12 @@
 {
   "name": "@conduction/docusaurus-preset",
-  "version": "3.4.0",
+  "version": "3.5.0",
   "scripts": {
     "prepack": "node scripts/prepack-bundle-css.js"
   },
+  "bin": {
+    "validate-ai-baseline": "./bin/validate-ai-baseline.mjs"
+  },
   "description": "Conduction brand preset for Docusaurus 3. Tokens, theme, navbar, footer, i18n config for nl/en/de/fr, and the React component library that powers conduction.nl and the Conduction product sites.",
   "main": "src/index.js",
   "exports": {
@@ -28,6 +31,7 @@
   "files": [
     "src/",
     "static/",
+    "bin/",
     "README.md",
     "MISSING_COMPONENTS.md"
   ],