@conduction/docusaurus-preset 3.4.0 → 3.5.0

package/README.md CHANGED
@@ -21,6 +21,7 @@ A few non-negotiables encoded by the package CSS and worth knowing about:
  - **Brand-default navbar** — locale-dropdown + GitHub link. Sites override `items[]` for site-specific navigation.
  - **Brand-default footer** — three-column link grid + Conduction-tells (KvK, BTW, address). Per-property override: pass `footer: { links: [...] }` to swap columns and inherit the brand copyright unchanged. Spread `baseFooterLinks()` to keep one or two brand columns alongside site-specific ones.
  - **Sensible defaults** — `trailingSlash`, `onBrokenLinks: 'warn'`, `respectPrefersColorScheme`, dark-mode brand mapping.
+ - **AI-crawler baseline** — Organization + WebSite JSON-LD on every page, `SoftwareApplication` JSON-LD from `<DetailHero>`, `FAQPage` JSON-LD from `<FAQ>`, default `og:image` + Twitter card meta, sitemap options, and a `postBuild` plugin that emits `robots.txt` when the site does not ship its own. See the AI baseline section below for the validator and content requirements.
 
  ## Usage
 
@@ -163,6 +164,55 @@ import '@conduction/docusaurus-preset/diagrams';
 
  This is how product sites such as `mydash.conduction.nl/docs/...` adopt the brand without copying CSS or theme code, and stay in sync as the design-system evolves.
 
+ ## AI-crawler baseline
+
+ Every site that consumes this preset inherits a contract that AI crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google AI Overviews) expect. The schemas, meta tags, and `robots.txt` ship automatically; sites only have to opt in to the content that surfaces them.
+
+ **What the preset ships**
+
+ | Surface | Source | How a site uses it |
+ | --- | --- | --- |
+ | Organization + WebSite JSON-LD | `headTags` injected by `createConfig` | Automatic on every page |
+ | `og:image`, `twitter:site`, `twitter:card`, `og:type` | `themeConfig.image` + `themeConfig.metadata` defaults | Override per site by passing `themeConfig.image: 'img/og-my-app.png'` |
+ | Default `robots.txt` | `conduction-ai-crawling` postBuild plugin | Drop `static/robots.txt` to override |
+ | `SoftwareApplication` JSON-LD | `<DetailHero appId="my-app" .../>` | Pages that should advertise the app must render `<DetailHero>` with an `appId` that resolves in `src/data/apps-registry.js`. No DetailHero means no schema. |
+ | `FAQPage` JSON-LD | `<FAQ>` with `<FAQItem question=...>` children | Drop a `<FAQ>` block onto a page; the schema is auto-emitted from the children |
+ | Sitemap options | Default `sitemap` config on the classic preset | Sites that override `presets` must include their own `sitemap` block |
+
+ **Validating a site**
+
+ The preset ships a generic 8-check validator as a `bin`. Wire it into the site's build:
+
+ ```jsonc
+ // docs/package.json
+ {
+   "scripts": {
+     "build": "docusaurus build",
+     "postbuild": "validate-ai-baseline",
+     "validate:ai-baseline": "validate-ai-baseline"
+   }
+ }
+ ```
+
+ `npm run build` now exits non-zero if any of these regress: `robots.txt` exists with a Sitemap line and an AI-bot allow line, `sitemap.xml` has at least one URL, the homepage emits Organization + WebSite JSON-LD plus `og:image` / `og:type` / `twitter:site` / `twitter:card`, and the `og:image` URL resolves to a real file. Sites can extend the validator with extra checks (per-app SoftwareApplication, FAQPage on specific pages, etc.) by adding their own `scripts/validate-ai-baseline-site.mjs` and chaining it.
+
+ **Per-app docs site checklist**
+
+ For a per-app docs site to satisfy the full schema contract, the landing page must render `<DetailHero appId="my-app" .../>` with an `appId` that exists in `src/data/apps-registry.js`. That single render emits the `SoftwareApplication` JSON-LD with category mapping (Data and Processes -> BusinessApplication, Connectors -> DeveloperApplication, etc.), `operatingSystem: 'Nextcloud'`, and the EUPL-1.2 license URL. Sites that build a custom landing without `<DetailHero>` get only Organization + WebSite, not the per-app schema.
+
+ **Opting out**
+
+ ```js
+ createConfig({
+   title: '...',
+   url: '...',
+   baseUrl: '/',
+   aiCrawling: { disable: true }, // skip the whole postBuild plugin
+   // or, finer-grained:
+   aiCrawling: { disable: { robotsTxt: true } }, // ship our own static/robots.txt
+ });
+ ```
+
  ## Releasing
 
  Releases auto-publish on push to `main`, driven by [semantic-release](https://semantic-release.gitbook.io/) reading [conventional-commit](https://www.conventionalcommits.org/) messages. The [.github/workflows/publish-packages.yml](../.github/workflows/publish-packages.yml) workflow walks every commit since the last `@conduction/docusaurus-preset-v*` tag and decides what to ship:
@@ -0,0 +1,177 @@
+ #!/usr/bin/env node
+ /**
+  * scripts/validate-ai-baseline.mjs
+  *
+  * Generic AI-crawler baseline validator. Runs as a postbuild step on
+  * every Conduction Docusaurus site that consumes
+  * @conduction/docusaurus-preset >= 3.4.0. Asserts the SSG output
+  * carries the contract AI crawlers (GPTBot, ClaudeBot, PerplexityBot,
+  * OAI-SearchBot, Claude-SearchBot, Google AI Overviews) expect.
+  *
+  * Universal checks only - no site-specific routes. Sites that want
+  * additional gates (per-app SoftwareApplication, FAQPage on specific
+  * pages, etc.) extend this script in place. See conduction-website's
+  * version for an example of additional checks.
+  *
+  * Exit codes:
+  *   0  all checks passed
+  *   1  one or more checks failed (CI should block)
+  *   2  build directory not found (script invoked before build)
+  */
+
+ import {readFileSync, existsSync, statSync} from 'node:fs';
+ import {join, resolve} from 'node:path';
+
+ const buildDir = resolve(process.argv[2] || 'build');
+
+ if (!existsSync(buildDir)) {
+   console.error(`✗ build directory not found: ${buildDir}`);
+   console.error(` Run \`npx docusaurus build\` first.`);
+   process.exit(2);
+ }
+
+ const results = [];
+
+ function check(name, fn) {
+   try {
+     const r = fn();
+     results.push({name, ok: r.ok, msg: r.msg});
+   } catch (e) {
+     results.push({name, ok: false, msg: `threw: ${e.message}`});
+   }
+ }
+
+ function readBuild(p) {
+   return readFileSync(join(buildDir, p), 'utf8');
+ }
+
+ /* robots.txt - shipped by the preset's ai-crawling plugin (or the
+    site's own static/robots.txt). Either way, the file must exist
+    and name at least one AI search bot so a `grep` audit can confirm
+    the posture at a glance. */
+ check('robots.txt exists and is non-empty', () => {
+   const path = join(buildDir, 'robots.txt');
+   if (!existsSync(path)) return {ok: false, msg: 'missing'};
+   const size = statSync(path).size;
+   if (size < 50) return {ok: false, msg: `too small (${size} bytes)`};
+   return {ok: true, msg: `${size} bytes`};
+ });
+
+ check('robots.txt names at least one AI search bot', () => {
+   const body = readBuild('robots.txt');
+   const candidates = ['OAI-SearchBot', 'Claude-SearchBot', 'PerplexityBot', 'ChatGPT-User', 'Claude-User'];
+   const found = candidates.filter(ua => body.includes(`User-agent: ${ua}`));
+   if (found.length === 0) {
+     return {ok: false, msg: `none of [${candidates.join(', ')}] referenced`};
+   }
+   return {ok: true, msg: `${found.length} bot(s): ${found.join(', ')}`};
+ });
+
+ check('robots.txt has a Sitemap line', () => {
+   const body = readBuild('robots.txt');
+   const matches = body.match(/^Sitemap:\s+https?:\/\//gm) || [];
+   if (matches.length === 0) return {ok: false, msg: 'no Sitemap: line'};
+   return {ok: true, msg: `${matches.length} sitemap line(s)`};
+ });
+
+ /* sitemap.xml - emitted by @docusaurus/plugin-sitemap (loaded via
+    the classic preset). Locale-specific sitemaps (e.g. /nl/sitemap.xml)
+    are present for i18n builds; we only check the canonical one
+    because some sites are single-locale. */
+ check('sitemap.xml exists and has at least 1 URL', () => {
+   const path = join(buildDir, 'sitemap.xml');
+   if (!existsSync(path)) return {ok: false, msg: 'missing'};
+   const body = readBuild('sitemap.xml');
+   const n = (body.match(/<loc>/g) || []).length;
+   if (n < 1) return {ok: false, msg: 'no <loc> entries'};
+   return {ok: true, msg: `${n} URLs`};
+ });
+
+ /* Helper for the JSON-LD checks below. Docusaurus emits ld+json
+    tags via two paths with different attribute ordering: top-level
+    headTags renders <script type="..."> first, while Helmet (used
+    by <Head> from inside React components like <DetailHero>, <FAQ>)
+    prefixes data-rh="true". The regex matches either ordering. */
+ function extractJsonLdBlocks(html) {
+   const out = [];
+   const re = /<script\b[^>]*\btype="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/g;
+   let m;
+   while ((m = re.exec(html)) !== null) {
+     out.push(m[1]);
+   }
+   return out;
+ }
+
+ check('homepage emits >= 2 JSON-LD blocks, all valid JSON', () => {
+   if (!existsSync(join(buildDir, 'index.html'))) return {ok: false, msg: 'no index.html'};
+   const html = readBuild('index.html');
+   const blocks = extractJsonLdBlocks(html);
+   if (blocks.length < 2) return {ok: false, msg: `only ${blocks.length} block(s)`};
+   for (const [i, b] of blocks.entries()) {
+     try {JSON.parse(b);} catch (e) {
+       return {ok: false, msg: `block ${i} invalid JSON: ${e.message}`};
+     }
+   }
+   return {ok: true, msg: `${blocks.length} blocks, all valid`};
+ });
+
+ check('homepage JSON-LD includes Organization and WebSite', () => {
+   const html = readBuild('index.html');
+   const types = extractJsonLdBlocks(html).map(b => {
+     try {return JSON.parse(b)['@type'];} catch {return null;}
+   });
+   const want = ['Organization', 'WebSite'];
+   const missing = want.filter(t => !types.includes(t));
+   if (missing.length) return {ok: false, msg: `missing @type: ${missing.join(', ')}`};
+   return {ok: true, msg: types.filter(Boolean).join(' + ')};
+ });
+
+ /* Social-card meta. og:image is the one that breaks LinkedIn /
+    Slack / AI previews when it 404s, so we also resolve the URL to
+    a local file in the build output. */
+ function metaTag(html, key) {
+   const re = new RegExp(`<meta[^>]+(?:name|property)="${key}"[^>]+content="([^"]+)"`, 'i');
+   const m = html.match(re);
+   return m ? m[1] : null;
+ }
+
+ check('homepage has og:image, og:type, twitter:site, twitter:card', () => {
+   const html = readBuild('index.html');
+   const checks = {
+     'og:image': metaTag(html, 'og:image'),
+     'og:type': metaTag(html, 'og:type'),
+     'twitter:site': metaTag(html, 'twitter:site'),
+     'twitter:card': metaTag(html, 'twitter:card'),
+   };
+   const missing = Object.entries(checks).filter(([, v]) => !v).map(([k]) => k);
+   if (missing.length) return {ok: false, msg: `missing: ${missing.join(', ')}`};
+   return {ok: true, msg: 'all four present'};
+ });
+
+ check('og:image URL resolves to a file in the build', () => {
+   const html = readBuild('index.html');
+   const url = metaTag(html, 'og:image');
+   if (!url) return {ok: false, msg: 'no og:image meta'};
+   const path = url.replace(/^https?:\/\/[^/]+\//, '');
+   const local = join(buildDir, path);
+   if (!existsSync(local)) return {ok: false, msg: `og:image refers to ${url}, not found at ${local}`};
+   const size = statSync(local).size;
+   if (size < 1024) return {ok: false, msg: `og:image file suspiciously small (${size} bytes)`};
+   return {ok: true, msg: `${path} (${size} bytes)`};
+ });
+
+ /* Report */
+ let failed = 0;
+ for (const {name, ok, msg} of results) {
+   const icon = ok ? '✓' : '✗';
+   console.log(`${icon} ${name} - ${msg}`);
+   if (!ok) failed++;
+ }
+ console.log('');
+ if (failed) {
+   console.error(`${failed} of ${results.length} checks failed.`);
+   console.error('AI-crawler baseline regressed. Fix the failures above before merging.');
+   process.exit(1);
+ } else {
+   console.log(`All ${results.length} AI-baseline checks passed.`);
+ }
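
Both the README text and the script's header comment say that sites can add their own gates, such as a per-app `SoftwareApplication` or `FAQPage` check, on top of the generic validator. The following is a minimal sketch of what such a site-level check might look like. It is not part of the package: `jsonLdTypes` and `checkPageTypes` are hypothetical names, and handling of array-valued `@type` and `@graph` is an added assumption that the shipped validator does not make. The functions take HTML as a string, so the sample runs without a build directory.

```javascript
// Hypothetical helpers for a site-level scripts/validate-ai-baseline-site.mjs.
// Names and structure are illustrative; they are not part of the preset.

// Same matching approach as the shipped validator: tolerate any attribute order.
function extractJsonLdBlocks(html) {
  const re = /<script\b[^>]*\btype="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/g;
  return [...html.matchAll(re)].map(m => m[1]);
}

// Collect every @type found, including array-valued @type and @graph members.
function jsonLdTypes(html) {
  const types = new Set();
  const visit = node => {
    if (!node || typeof node !== 'object') return;
    if (Array.isArray(node)) { node.forEach(visit); return; }
    const t = node['@type'];
    for (const name of Array.isArray(t) ? t : [t]) {
      if (typeof name === 'string') types.add(name);
    }
    visit(node['@graph']);
  };
  for (const block of extractJsonLdBlocks(html)) {
    // Invalid JSON is skipped here; the generic validator already reports it.
    try { visit(JSON.parse(block)); } catch { /* ignore */ }
  }
  return [...types];
}

// Returns the same {ok, msg} shape that the generic validator's checks return.
function checkPageTypes(html, wanted) {
  const found = jsonLdTypes(html);
  const missing = wanted.filter(t => !found.includes(t));
  return missing.length
    ? {ok: false, msg: `missing @type: ${missing.join(', ')}`}
    : {ok: true, msg: found.join(' + ')};
}

// Usage with an inline sample page (made-up content).
const sample = `<head>
<script data-rh="true" type="application/ld+json">{"@type":"SoftwareApplication","name":"My App"}</script>
<script type="application/ld+json">{"@graph":[{"@type":"FAQPage"}]}</script>
</head>`;
console.log(checkPageTypes(sample, ['SoftwareApplication', 'FAQPage']));
console.log(checkPageTypes(sample, ['Organization']));
```

In a real site script, the HTML would presumably come from reading a built page such as the app's landing page out of the build directory, and a failing result would set a non-zero exit code in the same way the generic validator does.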
package/package.json CHANGED
@@ -1,9 +1,12 @@
  {
    "name": "@conduction/docusaurus-preset",
-   "version": "3.4.0",
+   "version": "3.5.0",
    "scripts": {
      "prepack": "node scripts/prepack-bundle-css.js"
    },
+   "bin": {
+     "validate-ai-baseline": "./bin/validate-ai-baseline.mjs"
+   },
    "description": "Conduction brand preset for Docusaurus 3. Tokens, theme, navbar, footer, i18n config for nl/en/de/fr, and the React component library that powers conduction.nl and the Conduction product sites.",
    "main": "src/index.js",
    "exports": {
@@ -28,6 +31,7 @@
    "files": [
      "src/",
      "static/",
+     "bin/",
      "README.md",
      "MISSING_COMPONENTS.md"
    ],
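
With the `bin` entry published, a site that ships its own `static/robots.txt` is held to the same three robots checks as the generated file. As a rough illustration of what those checks accept, the sketch below restates them as one pure function over the file body. `checkRobotsBody` is a hypothetical name, the sample body is made up, and `example.com` is a placeholder domain. The thresholds and patterns are copied from the validator diff above.

```javascript
// Illustrative only: mirrors the three robots.txt checks as a pure function,
// so a candidate file body can be tried without running a build.
const AI_SEARCH_BOTS = ['OAI-SearchBot', 'Claude-SearchBot', 'PerplexityBot', 'ChatGPT-User', 'Claude-User'];

function checkRobotsBody(body) {
  // Byte length, since the validator compares statSync(path).size against 50.
  const size = new TextEncoder().encode(body).length;
  const bots = AI_SEARCH_BOTS.filter(ua => body.includes(`User-agent: ${ua}`));
  const sitemaps = body.match(/^Sitemap:\s+https?:\/\//gm) || [];
  return {
    largeEnough: size >= 50,
    namesAiBot: bots.length > 0,
    hasSitemap: sitemaps.length > 0,
  };
}

// Hypothetical robots.txt body.
const sampleRobots = [
  'User-agent: OAI-SearchBot',
  'Allow: /',
  '',
  'User-agent: *',
  'Allow: /',
  '',
  'Sitemap: https://example.com/sitemap.xml',
].join('\n');

console.log(checkRobotsBody(sampleRobots));
console.log(checkRobotsBody('User-agent: *\nDisallow: /'));
```

Because the bot check uses a case-sensitive `includes` on the literal `User-agent: <name>`, a hand-written file that spells the directive as `User-Agent:` would not be matched by it.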