metanova 0.1.0

package/USAGE_GUIDE.md ADDED
@@ -0,0 +1,829 @@
1
+ # MetaNova Usage Guide
2
+
3
+ This guide explains how to use MetaNova from scratch: installation, quick starts, API details, security options, diagnostics, adapters, plugins, examples, validation, and troubleshooting.
4
+
5
+ ## Introduction
6
+
7
+ MetaNova is a JavaScript and TypeScript metadata extraction library. It turns web pages and public URLs into a predictable JSON object for link previews, bots, bookmark managers, AI agents, search systems, dashboards, CMS integrations, and social tooling.
8
+
9
+ MetaNova can extract from:
10
+
11
+ - Open Graph tags
12
+ - Twitter Cards
13
+ - JSON-LD and Schema.org
14
+ - oEmbed discovery and optional oEmbed JSON
15
+ - Standard HTML metadata
16
+ - Canonical URLs
17
+ - Favicons
18
+ - Images, lazy images, `srcset`, `picture`, video posters, and `noscript` fallback images
19
+ - Videos and audio
20
+ - Built-in site adapters for common social/media platforms
21
+ - Custom plugins and adapters
22
+
23
+ ## Installation
24
+
25
+ ```bash
26
+ npm install metanova
27
+ ```
28
+
29
+ For local development in this repository:
30
+
31
+ ```bash
32
+ npm install
33
+ npm run build
34
+ ```
35
+
36
+ ## Quick Start
37
+
38
+ ```ts
39
+ import { fetchMetadata, createPreviewCard } from "metanova";
40
+
41
+ const metadata = await fetchMetadata("https://example.com/article");
42
+ const card = createPreviewCard(metadata);
43
+
44
+ console.log(metadata.bestImage);
45
+ console.log(card);
46
+ ```
47
+
48
+ Expected shape of `card` (abbreviated):
49
+
50
+ ```json
51
+ {
52
+ "title": "Example article",
53
+ "description": "A short summary.",
54
+ "image": "https://example.com/cover.jpg",
55
+ "url": "https://example.com/article",
56
+ "siteName": "Example",
57
+ "type": "article",
58
+ "confidence": 92
59
+ }
60
+ ```
61
+
62
+ ## ESM Usage
63
+
64
+ ```js
65
+ import { fetchMetadata, parseMetadata, createPreviewCard } from "metanova";
66
+
67
+ const metadata = await fetchMetadata("https://example.com/post");
68
+ console.log(createPreviewCard(metadata));
69
+ ```
70
+
71
+ ## CommonJS Usage
72
+
73
+ ```js
74
+ const { parseMetadata } = require("metanova");
75
+
76
+ const metadata = parseMetadata("<title>Hello</title>", "https://example.com");
77
+ console.log(metadata.title);
78
+ ```
79
+
80
+ ## TypeScript Usage
81
+
82
+ ```ts
83
+ import {
84
+ fetchMetadata,
85
+ type FetchMetadataOptions,
86
+ type UnifiedMetadata
87
+ } from "metanova";
88
+
89
+ const options: FetchMetadataOptions = {
90
+ timeoutMs: 8000,
91
+ maxBytes: 2_000_000
92
+ };
93
+
94
+ const metadata: UnifiedMetadata = await fetchMetadata("https://example.com", options);
95
+ console.log(metadata.confidence);
96
+ ```
97
+
98
+ ## fetchMetadata
99
+
100
+ `fetchMetadata(url, options)` downloads a page, follows redirects safely, extracts metadata, normalizes it, scores images, calculates confidence, and returns a unified result.
101
+
102
+ ```ts
103
+ const metadata = await fetchMetadata("https://example.com/article", {
104
+ timeoutMs: 8000,
105
+ retries: 1,
106
+ retryDelayMs: 250,
107
+ maxRedirects: 5,
108
+ maxBytes: 2_000_000,
109
+ userAgent: "MyBot/1.0",
110
+ acceptLanguage: "en-US,en;q=0.9",
111
+ fetchOEmbed: true
112
+ });
113
+ ```
114
+
115
+ Common options:
116
+
117
+ - `timeoutMs`: request timeout in milliseconds.
118
+ - `retries`: retry count after failed requests.
119
+ - `retryDelayMs`: delay between retries.
120
+ - `maxRedirects`: redirect limit.
121
+ - `maxBytes`: maximum response size.
122
+ - `userAgent`: custom user agent.
123
+ - `accept`: custom Accept header.
124
+ - `acceptLanguage`: custom Accept-Language header.
125
+ - `acceptEncoding`: custom Accept-Encoding header.
126
+ - `headers`: custom headers.
127
+ - `fetch`: custom fetch implementation.
128
+ - `cache`: optional cache with `get(url)` and `set(url, entry)`.
129
+ - `fetchOEmbed`: fetch discovered JSON oEmbed endpoints.
130
+ - `allowLocalhost`: allow localhost and loopback URLs.
131
+ - `allowPrivateNetwork`: allow private/reserved network targets.
132
+
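The `cache` option is described above only as an object with `get(url)` and `set(url, entry)`. The following is a minimal in-memory sketch of such an object, not MetaNova's own cache. It assumes synchronous methods are acceptable and leaves the entry type generic because this guide does not define it. `TtlCache`, its time-to-live behavior, and the injectable clock are hypothetical.

```typescript
// Hypothetical cache with the `get(url)` / `set(url, entry)` shape from the options list.
// The entry type is generic because the guide does not specify what MetaNova stores.
class TtlCache<Entry> {
  private readonly store = new Map<string, { entry: Entry; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
  }

  get(url: string): Entry | undefined {
    const hit = this.store.get(url);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.store.delete(url); // expired entries are dropped on read
      return undefined;
    }
    return hit.entry;
  }

  set(url: string, entry: Entry): void {
    this.store.set(url, { entry, expiresAt: this.now() + this.ttlMs });
  }
}
```

If the shape matches what MetaNova expects, an instance could be passed as `cache: new TtlCache(60_000)`; whether the library accepts synchronous methods is an assumption to verify against the package types.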
133
+ ## parseMetadata
134
+
135
+ `parseMetadata(html, url, options)` extracts metadata from HTML you already have. It does not perform network requests.
136
+
137
+ ```ts
138
+ import { parseMetadata } from "metanova";
139
+
140
+ const metadata = parseMetadata(html, "https://example.com/article");
141
+ console.log(metadata.title);
142
+ ```
143
+
144
+ Use `parseMetadataAsync` when adapters or plugins may perform asynchronous work, or when you want to fetch oEmbed JSON:
145
+
146
+ ```ts
147
+ import { parseMetadataAsync } from "metanova";
148
+
149
+ const metadata = await parseMetadataAsync(html, "https://example.com/video", {
150
+ fetchOEmbed: true
151
+ });
152
+ ```
153
+
154
+ ## createPreviewCard
155
+
156
+ `createPreviewCard(metadata)` returns a compact object suitable for bots and application previews.
157
+
158
+ ```ts
159
+ import { createPreviewCard } from "metanova";
160
+
161
+ const card = createPreviewCard(metadata);
162
+ ```
163
+
164
+ Output:
165
+
166
+ ```json
167
+ {
168
+ "title": "...",
169
+ "description": "...",
170
+ "image": "...",
171
+ "url": "...",
172
+ "siteName": "...",
173
+ "domain": "example.com",
174
+ "author": "...",
175
+ "type": "article",
176
+ "confidence": 92
177
+ }
178
+ ```
179
+
180
+ ## Architecture
181
+
182
+ MetaNova runs a layered pipeline:
183
+
184
+ 1. URL validation, normalization, short-link detection, redirect handling, and SSRF checks.
185
+ 2. Browser-like networking with realistic default request headers.
186
+ 3. Extraction from Open Graph, Twitter Cards, JSON-LD, embedded app data, oEmbed, standard HTML, and media tags.
187
+ 4. Site adapters for platform-specific recovery.
188
+ 5. `MediaDiscoveryEngine` to merge images, videos, audio, embedded thumbnails, social media, and fallback assets.
189
+ 6. Image scoring and `bestImage` selection.
190
+ 7. `ConfidenceEngine` and completeness scoring.
191
+ 8. Diagnostics and extraction trace generation.
192
+
193
+ ## Embedded Data Extraction
194
+
195
+ Modern apps often store preview data outside OG tags. MetaNova extracts:
196
+
197
+ - `script#__NEXT_DATA__`
198
+ - Nuxt payloads
199
+ - `window.__INITIAL_STATE__`
200
+ - `window.__PRELOADED_STATE__`
201
+ - Apollo-style state payloads
202
+ - `script[type="application/json"]`
203
+ - JSON blobs inside script tags
204
+
205
+ These sources can provide title, description, author, publish date, images, video thumbnails, and canonical-like URLs.
206
+
207
+ ```ts
208
+ import { extractEmbeddedData } from "metanova";
209
+
210
+ const embedded = extractEmbeddedData(html);
211
+ console.log(embedded.items.map((item) => item.source));
212
+ ```
213
+
214
+ ## Media Discovery Engine
215
+
216
+ Use `discoverMedia(rawSources, finalUrl)` when you are building custom pipelines from raw extraction sources.
217
+
218
+ The engine searches:
219
+
220
+ - Open Graph images/videos/audio
221
+ - Twitter images/videos
222
+ - JSON-LD images/videos/audio
223
+ - embedded application data
224
+ - oEmbed thumbnails/photos
225
+ - `srcset`, `picture`, lazy image attributes
226
+ - video posters
227
+ - adapter/plugin media
228
+
229
+ It filters out data URLs, tracking pixels, sprites, placeholders, icons, avatars, and weak duplicates.
230
+
231
+ ## Confidence Engine
232
+
233
+ `confidence` is an integer from 0 to 100.
234
+
235
+ It considers title quality, description quality, image quality, canonical URL, structured data, adapter success, embedded data, and warning count.
236
+
237
+ ```ts
238
+ console.log(metadata.confidence); // 94
239
+ ```
240
+
241
+ ## Completeness
242
+
243
+ `completeness` is an integer from 0 to 100. It measures whether useful preview fields exist:
244
+
245
+ - title
246
+ - description
247
+ - best image
248
+ - canonical URL
249
+ - site name
250
+ - author
251
+ - publisher
252
+ - known type
253
+ - publish date
254
+ - extra media
255
+
256
+ ```ts
257
+ console.log(metadata.completeness); // 88
258
+ ```
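The guide lists the ten fields that feed `completeness` but not how they are weighted. As a rough sketch only, the function below assumes every field counts equally. `PreviewFields`, `completenessSketch`, and the property names `publishedAt` and `extraMediaCount` are hypothetical and do not come from the package.

```typescript
// Hypothetical input shape; the real UnifiedMetadata type may differ.
interface PreviewFields {
  title?: string;
  description?: string;
  bestImage?: string;
  canonicalUrl?: string;
  siteName?: string;
  author?: string;
  publisher?: string;
  type?: string;
  publishedAt?: string;
  extraMediaCount?: number;
}

// Assumption: each of the ten checks is worth the same share of the 0-100 score.
function completenessSketch(m: PreviewFields): number {
  const checks: boolean[] = [
    Boolean(m.title),
    Boolean(m.description),
    Boolean(m.bestImage),
    Boolean(m.canonicalUrl),
    Boolean(m.siteName),
    Boolean(m.author),
    Boolean(m.publisher),
    m.type !== undefined && m.type !== "unknown", // "known type"
    Boolean(m.publishedAt),
    (m.extraMediaCount ?? 0) > 0
  ];
  const present = checks.filter((ok) => ok).length;
  return Math.round((present / checks.length) * 100);
}
```

Under this equal-weight assumption, a page with only a title, description, best image, and known type scores 40. The real engine may weight fields differently.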
259
+
260
+ ## Reliability
261
+
262
+ `reliability` is an integer from 0 to 100. It combines confidence, completeness, adapter success, media quality, and warning count.
263
+
264
+ ```ts
265
+ console.log(metadata.reliability); // 91
266
+ ```
267
+
268
+ ## Source Attribution
269
+
270
+ MetaNova records where important fields came from:
271
+
272
+ ```ts
273
+ console.log(metadata.sources);
274
+ ```
275
+
276
+ Example:
277
+
278
+ ```json
279
+ {
280
+ "title": "jsonLd",
281
+ "description": "openGraph",
282
+ "author": "youtubeAdapter",
283
+ "image": "twitter"
284
+ }
285
+ ```
286
+
287
+ ## Adapter Diagnostics
288
+
289
+ When an adapter matches, diagnostics include the adapter name and a rough adapter confidence:
290
+
291
+ ```json
292
+ {
293
+ "adapter": {
294
+ "matched": true,
295
+ "name": "youtubeAdapter",
296
+ "confidence": 95
297
+ }
298
+ }
299
+ ```
300
+
301
+ ## Working With Images
302
+
303
+ MetaNova discovers images from:
304
+
305
+ - `og:image`
306
+ - Twitter Card images
307
+ - JSON-LD image fields
308
+ - oEmbed thumbnails and photos
309
+ - `link[rel=image_src]`
310
+ - `img[src]`
311
+ - `img[srcset]`
312
+ - `picture > source[srcset]`
313
+ - `data-src`
314
+ - `data-original`
315
+ - `data-lazy-src`
316
+ - `data-image`
317
+ - `data-thumbnail`
318
+ - video `poster`
319
+ - `noscript` fallback images
320
+
321
+ All relative URLs are resolved against the page URL.
322
+
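Resolving a candidate against the page URL can be sketched with the standard WHATWG `URL` constructor. The helper name `resolveCandidate` is hypothetical, and the choice to drop `data:` and non-HTTP(S) results mirrors the filter list in this section rather than MetaNova's actual code.

```typescript
// Hypothetical helper: resolve an image candidate against the page URL.
// Returns null for candidates this guide says are ignored (empty, data:, unsupported schemes).
function resolveCandidate(raw: string, pageUrl: string): string | null {
  const trimmed = raw.trim();
  if (trimmed === "" || trimmed.toLowerCase().startsWith("data:")) return null;
  try {
    const resolved = new URL(trimmed, pageUrl); // handles "/x", "x", "../x", and "//host/x"
    const isHttp = resolved.protocol === "http:" || resolved.protocol === "https:";
    return isHttp ? resolved.href : null;
  } catch {
    return null; // not parseable as a URL
  }
}
```

For example, `resolveCandidate("/cover.jpg", "https://example.com/post")` yields `https://example.com/cover.jpg`, matching the Open Graph example later in this guide.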
323
+ MetaNova ignores common bad candidates:
324
+
325
+ - base64 and `data:` images
326
+ - tracking pixels
327
+ - tiny icon images
328
+ - sprites
329
+ - transparent placeholders
330
+ - empty or unsupported URLs
331
+
332
+ `bestImage` is selected with image scoring. The score considers source reliability, dimensions, aspect ratio, format, URL hints such as `cover` or `preview`, and penalties for `logo`, `avatar`, `sprite`, `pixel`, and `placeholder`.
333
+
334
+ ```ts
335
+ console.log(metadata.bestImage);
336
+ console.log(metadata.images[0].score);
337
+ console.log(metadata.diagnostics.selectedImageReason);
338
+ ```
339
+
340
+ Expected reason example:
341
+
342
+ ```txt
343
+ Selected because it came from og:image, has 1200x630, and scored 100.
344
+ ```
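To make the scoring description concrete, here is a toy scorer built from the factors named above. It is a sketch under stated assumptions: every weight, threshold, and name (`ImageCandidate`, `scoreImageSketch`, `pickBestSketch`) is invented for illustration, and the package's real scoring is not documented in this guide.

```typescript
// Hypothetical candidate shape and weights; none of these numbers come from MetaNova.
interface ImageCandidate {
  url: string;
  source: "openGraph" | "twitter" | "jsonLd" | "html";
  width?: number;
  height?: number;
}

const SOURCE_BASE: Record<ImageCandidate["source"], number> = {
  openGraph: 60,
  twitter: 55,
  jsonLd: 50,
  html: 30
};
const BONUS_HINTS = ["cover", "preview"];
const PENALTY_HINTS = ["logo", "avatar", "sprite", "pixel", "placeholder"];

function scoreImageSketch(image: ImageCandidate): number {
  let score = SOURCE_BASE[image.source]; // source reliability
  const { width, height } = image;
  if (width !== undefined && height !== undefined && height > 0) {
    if (width >= 600 && height >= 315) score += 20; // large enough for a preview
    const ratio = width / height;
    if (ratio >= 1.3 && ratio <= 2.1) score += 10; // landscape, card-friendly
  }
  const lowerUrl = image.url.toLowerCase();
  if (BONUS_HINTS.some((hint) => lowerUrl.includes(hint))) score += 10;
  if (PENALTY_HINTS.some((hint) => lowerUrl.includes(hint))) score -= 40;
  return Math.max(0, Math.min(100, score)); // clamp to 0-100
}

// Highest score wins; returns undefined when there are no candidates.
function pickBestSketch(images: ImageCandidate[]): ImageCandidate | undefined {
  return [...images].sort((a, b) => scoreImageSketch(b) - scoreImageSketch(a))[0];
}
```

With these made-up weights, a 1200x630 `og:image` whose URL contains `cover` reaches the maximum, while a small `logo` image from plain HTML is clamped to zero.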
345
+
346
+ ## Working With Videos
347
+
348
+ MetaNova discovers videos from:
349
+
350
+ - `og:video`
351
+ - Twitter player metadata
352
+ - JSON-LD `VideoObject`
353
+ - oEmbed video data
354
+ - HTML `<video>` and `<source>`
355
+ - common iframe embeds such as YouTube, Vimeo, TikTok, Instagram, and Facebook
356
+
357
+ Video posters are also added as image candidates because they are often the best preview image.
358
+
359
+ ```ts
360
+ for (const video of metadata.videos) {
361
+ console.log(video.url, video.poster);
362
+ }
363
+ ```
364
+
365
+ ## Working With Diagnostics
366
+
367
+ Diagnostics explain how the result was produced.
368
+
369
+ ```ts
370
+ console.log(metadata.diagnostics);
371
+ ```
372
+
373
+ Important fields:
374
+
375
+ - `originalUrl`: input URL.
376
+ - `finalUrl`: final URL after redirects.
377
+ - `canonicalUrl`: canonical URL extracted from page metadata.
378
+ - `redirects`: redirect chain.
379
+ - `isShortUrl`: whether the original URL is a known short-link host.
380
+ - `shortUrlProvider`: short-link provider, such as Bitly, Reddit, Pinterest, X, TinyURL, or YouTube.
381
+ - `sourcesUsed`: sources that contributed metadata.
382
+ - `warnings`: non-fatal extraction issues.
383
+ - `errors`: fatal fetch errors when `ok` is false.
384
+ - `selectedImageReason`: why `bestImage` was selected.
385
+ - `trace`: ordered extraction steps.
386
+
387
+ Example:
388
+
389
+ ```json
390
+ {
391
+ "originalUrl": "https://youtu.be/abc",
392
+ "finalUrl": "https://www.youtube.com/watch?v=abc",
393
+ "canonicalUrl": "https://www.youtube.com/watch?v=abc",
394
+ "redirects": [],
395
+ "isShortUrl": true,
396
+ "shortUrlProvider": "YouTube",
397
+ "sourcesUsed": ["openGraph", "youtubeAdapter"],
398
+ "warnings": [],
399
+ "trace": ["downloaded page", "parsed Open Graph", "adapter matched: youtubeAdapter", "selected image from youtubeAdapter (openGraph)"],
400
+ "selectedImageReason": "Selected because it came from og:image..."
401
+ }
402
+ ```
403
+
404
+ ## Extraction Trace
405
+
406
+ `diagnostics.trace` is an ordered list of decisions and milestones. It is useful when a preview is weak because it shows whether the page was downloaded, which extractors produced data, which adapter matched, and where the final image came from.
407
+
408
+ ```ts
409
+ for (const step of metadata.diagnostics.trace) {
410
+ console.log(step);
411
+ }
412
+ ```
413
+
414
+ Typical steps include:
415
+
416
+ - `downloaded page`
417
+ - `parsed Open Graph`
418
+ - `parsed JSON-LD`
419
+ - `parsed embedded application data`
420
+ - `adapter matched: redditAdapter`
421
+ - `selected image from redditAdapter (openGraph)`
422
+
423
+ ## Security Options
424
+
425
+ MetaNova protects `fetchMetadata` against SSRF-style targets by default.
426
+
427
+ Blocked by default:
428
+
429
+ - `localhost`
430
+ - loopback addresses
431
+ - private network addresses
432
+ - link-local and reserved networks
433
+ - unsupported protocols
434
+ - malicious redirects to blocked targets
435
+ - oversized responses
436
+
437
+ Trusted local development example:
438
+
439
+ ```ts
440
+ await fetchMetadata("http://127.0.0.1:3000", {
441
+ allowLocalhost: true,
442
+ allowPrivateNetwork: true
443
+ });
444
+ ```
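The blocked-by-default list can be illustrated with a simplified pre-flight check. This is a sketch, not MetaNova's implementation: `classifyHostSketch` and `isAllowedSketch` are hypothetical names, the mapping of loopback to `allowLocalhost` and private ranges to `allowPrivateNetwork` is inferred from the option descriptions, and the check only inspects literal hostnames. A production guard must also resolve DNS and re-check every redirect hop.

```typescript
type HostVerdict = "public" | "localhost" | "private";

// Simplified classification of a URL hostname (assumption: literal inspection only, no DNS).
function classifyHostSketch(hostname: string): HostVerdict {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost") || host === "[::1]") return "localhost";
  if (host.startsWith("[")) return "private"; // conservative: treat other IPv6 literals as blocked
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (match === null) return "public"; // a named host; real code must check what it resolves to
  const a = Number(match[1]);
  const b = Number(match[2]);
  if (a === 127) return "localhost"; // 127.0.0.0/8 loopback
  if (a === 10) return "private"; // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return "private"; // 172.16.0.0/12
  if (a === 192 && b === 168) return "private"; // 192.168.0.0/16
  if (a === 169 && b === 254) return "private"; // 169.254.0.0/16 link-local
  if (a === 0) return "private"; // 0.0.0.0/8 reserved
  return "public";
}

function isAllowedSketch(
  url: string,
  opts: { allowLocalhost?: boolean; allowPrivateNetwork?: boolean } = {}
): boolean {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false; // unparseable URLs are rejected
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
  const verdict = classifyHostSketch(parsed.hostname);
  if (verdict === "localhost") return opts.allowLocalhost === true;
  if (verdict === "private") return opts.allowPrivateNetwork === true;
  return true;
}
```

The two flags are kept independent here so that enabling loopback access for local development does not also open private network ranges.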
445
+
446
+ ## Performance
447
+
448
+ MetaNova core does not run browser automation. It relies on browser-like networking, static HTML parsing, embedded JSON payload extraction, bounded response sizes, and adapter-specific heuristics. This keeps the default package fast and suitable for bots, serverless functions, queues, and indexing jobs.
449
+
450
+ Performance controls:
451
+
452
+ - `timeoutMs`
453
+ - `retries`
454
+ - `retryDelayMs`
455
+ - `maxRedirects`
456
+ - `maxBytes`
457
+ - `cache`
458
+ - `fetchOEmbed`
459
+
460
+ Size and timeout controls:
461
+
462
+ ```ts
463
+ await fetchMetadata("https://example.com", {
464
+ timeoutMs: 5000,
465
+ maxBytes: 1_000_000,
466
+ maxRedirects: 3,
467
+ acceptLanguage: "en-US,en;q=0.9"
468
+ });
469
+ ```
470
+
471
+ ## Adapters
472
+
473
+ Adapters add site-specific behavior. Built-in adapters currently cover:
474
+
475
+ - Reddit posts and `redd.it`
476
+ - Pinterest pins and `pin.it`
477
+ - Behance projects
478
+ - YouTube videos and `youtu.be`
479
+ - TikTok posts
480
+ - X/Twitter posts and `t.co`
481
+ - Facebook public posts
482
+ - Instagram public posts
483
+ - YouTube playlists and community posts
484
+
485
+ Adapter contract:
486
+
487
+ ```ts
488
+ const adapter = {
489
+ name: "docsAdapter",
490
+ detect(url) {
491
+ return url.hostname === "docs.example.com";
492
+ },
493
+ extract(context) {
494
+ return {
495
+ source: "docsAdapter",
496
+ title: context.raw.openGraph.title,
497
+ platform: "Docs"
498
+ };
499
+ },
500
+ normalize(rawData) {
501
+ return {
502
+ ...rawData,
503
+ source: "docsAdapter",
504
+ type: "article",
505
+ siteName: rawData.platform
506
+ };
507
+ }
508
+ };
509
+
510
+ const metadata = parseMetadata(html, "https://docs.example.com/guide", {
511
+ adapters: [adapter]
512
+ });
513
+ ```
514
+
515
+ `detect(url)` decides whether the adapter applies.
516
+
517
+ `extract(context)` reads HTML, parsed raw sources, URL, and options.
518
+
519
+ `normalize(rawData, context)` converts adapter-specific raw data into a standard MetaNova result contribution. The example above ignores the `context` argument.
520
+
521
+ ### YouTube Playlists
522
+
523
+ When a YouTube URL includes a playlist id, MetaNova returns:
524
+
525
+ ```json
526
+ {
527
+ "type": "playlist",
528
+ "playlist": {
529
+ "id": "...",
530
+ "title": "...",
531
+ "channel": {},
532
+ "videos": [
533
+ {
534
+ "id": "...",
535
+ "title": "...",
536
+ "url": "https://www.youtube.com/watch?v=..."
537
+ }
538
+ ]
539
+ }
540
+ }
541
+ ```
542
+
543
+ You can then fetch metadata for each video in the playlist:
544
+
545
+ ```ts
546
+ for (const video of metadata.playlist?.videos ?? []) {
547
+ await fetchMetadata(video.url);
548
+ }
549
+ ```
550
+
551
+ ## Plugins
552
+
553
+ Plugins can register custom extractors, adapters, and image scorers.
554
+
555
+ ```ts
556
+ import { MetaNova } from "metanova";
557
+
558
+ const plugin = {
559
+ name: "internal-docs",
560
+ setup(api) {
561
+ api.addExtractor("docs-meta", ({ $ }) => ({
562
+ source: "docs-meta",
563
+ title: $("meta[name='doc:title']").attr("content"),
564
+ siteName: "Internal Docs"
565
+ }));
566
+
567
+ api.addImageScorer((image) => (image.url.includes("/hero/") ? 12 : 0));
568
+ }
569
+ };
570
+
571
+ MetaNova.use(plugin);
572
+ ```
573
+
574
+ You can also pass plugins per call:
575
+
576
+ ```ts
577
+ parseMetadata(html, "https://example.com", {
578
+ plugins: [plugin]
579
+ });
580
+ ```
581
+
582
+ ## Error Handling
583
+
584
+ `fetchMetadata` returns `ok: false` for handled fetch failures.
585
+
586
+ ```ts
587
+ const metadata = await fetchMetadata("https://example.com");
588
+
589
+ if (!metadata.ok) {
590
+ console.error(metadata.diagnostics.errors);
591
+ }
592
+ ```
593
+
594
+ Network failure example:
595
+
596
+ ```json
597
+ {
598
+ "ok": false,
599
+ "type": "unknown",
600
+ "confidence": 0,
601
+ "diagnostics": {
602
+ "errors": ["Request timed out."],
603
+ "warnings": []
604
+ }
605
+ }
606
+ ```
607
+
608
+ Wrap direct calls to lower-level helpers such as `validateUrl` or `assertSafeRequestUrl` in `try/catch`, because they may throw instead of returning `ok: false`.
609
+
610
+ ## Real Examples
611
+
612
+ Build the package, then run the examples:
613
+
614
+ ```bash
615
+ npm install
616
+ npm run build
617
+ node examples/quick-start.mjs
618
+ node examples/commonjs.cjs
619
+ node examples/parse-html.mjs
620
+ node examples/preview-card.mjs
621
+ node examples/social-links.mjs
622
+ node examples/reddit.mjs
623
+ node examples/pinterest.mjs
624
+ node examples/behance.mjs
625
+ node examples/youtube.mjs
626
+ node examples/diagnostics.mjs
627
+ node examples/custom-plugin.mjs
628
+ node examples/custom-adapter.mjs
629
+ ```
630
+
631
+ The examples above are mock examples: they use local HTML fixtures, so they run deterministically without network access.
632
+
633
+ Live network examples take a URL argument and do not contain built-in validation URLs:
634
+
635
+ ```bash
636
+ node examples/live-fetch.mjs https://example.com
637
+ node examples/youtube-video.mjs https://example.com
638
+ node examples/youtube-playlist.mjs https://example.com
639
+ node examples/social-preview.mjs https://example.com
640
+ ```
641
+
642
+ ### Reddit Post
643
+
644
+ ```ts
645
+ const metadata = parseMetadata(html, "https://www.reddit.com/r/typescript/comments/abc123/title/");
646
+ console.log(metadata.type); // social_post
647
+ console.log(metadata.siteName); // Reddit
648
+ ```
649
+
650
+ ### Pinterest Pin Or Short Link
651
+
652
+ ```ts
653
+ parseMetadata(html, "https://www.pinterest.com/pin/123456789/");
654
+ detectShortUrl("https://pin.it/abc"); // { isShortUrl: true, provider: "Pinterest" }
655
+ ```
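The result shape shown for `detectShortUrl` suggests a host lookup. The sketch below is an illustration, not the library's code: `detectShortUrlSketch` is a hypothetical name, `redd.it`, `pin.it`, `youtu.be`, and `t.co` come from the adapter list in this guide, and `bit.ly` and `tinyurl.com` are assumed to be the hosts behind the Bitly and TinyURL providers mentioned under diagnostics.

```typescript
// Assumed host-to-provider table; the real list inside MetaNova may be longer or differ.
const SHORT_HOSTS = new Map<string, string>([
  ["bit.ly", "Bitly"],
  ["redd.it", "Reddit"],
  ["pin.it", "Pinterest"],
  ["t.co", "X"],
  ["tinyurl.com", "TinyURL"],
  ["youtu.be", "YouTube"]
]);

function detectShortUrlSketch(input: string): { isShortUrl: boolean; provider?: string } {
  let hostname: string;
  try {
    hostname = new URL(input).hostname.toLowerCase();
  } catch {
    return { isShortUrl: false }; // not a valid absolute URL
  }
  const host = hostname.startsWith("www.") ? hostname.slice(4) : hostname;
  const provider = SHORT_HOSTS.get(host);
  return provider === undefined ? { isShortUrl: false } : { isShortUrl: true, provider };
}
```

Matching on the exact hostname, rather than a substring, avoids treating a host such as `notpin.it.example.com` as a short link.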
656
+
657
+ ### Behance Project
658
+
659
+ ```ts
660
+ const metadata = parseMetadata(html, "https://www.behance.net/gallery/123456789/project");
661
+ console.log(metadata.type); // image
662
+ ```
663
+
664
+ ### YouTube Video
665
+
666
+ ```ts
667
+ const metadata = parseMetadata(html, "https://youtu.be/dQw4w9WgXcQ");
668
+ console.log(metadata.type); // video
669
+ console.log(metadata.canonicalUrl); // https://www.youtube.com/watch?v=dQw4w9WgXcQ
670
+ ```
671
+
672
+ ### Open Graph Page
673
+
674
+ ```ts
675
+ const metadata = parseMetadata(`
676
+ <meta property="og:title" content="Post">
677
+ <meta property="og:description" content="Summary">
678
+ <meta property="og:image" content="/cover.jpg">
679
+ `, "https://example.com/post");
680
+ ```
681
+
682
+ Expected:
683
+
684
+ ```json
685
+ {
686
+ "type": "website",
687
+ "title": "Post",
688
+ "bestImage": "https://example.com/cover.jpg"
689
+ }
690
+ ```
691
+
692
+ ### Fallback-Only Page
693
+
694
+ ```ts
695
+ const metadata = parseMetadata(`
696
+ <title>Fallback</title>
697
+ <meta name="description" content="No Open Graph here.">
698
+ <img data-lazy-src="/fallback-cover.jpg" width="1200" height="630">
699
+ `, "https://example.com/fallback");
700
+ ```
701
+
702
+ Expected:
703
+
704
+ ```json
705
+ {
706
+ "title": "Fallback",
707
+ "description": "No Open Graph here.",
708
+ "bestImage": "https://example.com/fallback-cover.jpg"
709
+ }
710
+ ```
711
+
712
+ ## Validation & Verification
713
+
714
+ Run the full quality suite:
715
+
716
+ ```bash
717
+ npm run typecheck
718
+ npm run lint
719
+ npm test
720
+ npm run build
721
+ npm pack --dry-run
722
+ ```
723
+
724
+ Verify examples:
725
+
726
+ ```bash
727
+ node examples/quick-start.mjs
728
+ node examples/commonjs.cjs
729
+ node examples/parse-html.mjs
730
+ node examples/preview-card.mjs
731
+ node examples/social-links.mjs
732
+ node examples/reddit.mjs
733
+ node examples/pinterest.mjs
734
+ node examples/behance.mjs
735
+ node examples/youtube.mjs
736
+ node examples/diagnostics.mjs
737
+ node examples/custom-plugin.mjs
738
+ node examples/custom-adapter.mjs
739
+ ```
740
+
741
+ What to inspect:
742
+
743
+ - JSON output has `ok`, `url`, `finalUrl`, `type`, `title`, `description`, `bestImage`, `confidence`, and media arrays.
744
+ - Diagnostics include `sourcesUsed`, `warnings`, `redirects`, and `selectedImageReason`.
745
+ - Image scoring puts large cover/social images ahead of logos, avatars, sprites, pixels, and placeholders.
746
+ - Adapters classify known social/media URLs correctly.
747
+ - Preview cards include `confidence`.
748
+
749
+ ## Troubleshooting
750
+
751
+ ### 403 Forbidden
752
+
753
+ Some sites block generic fetchers. Try a clear user agent:
754
+
755
+ ```ts
756
+ fetchMetadata(url, {
757
+ userAgent: "MyAppBot/1.0 (+https://example.com/bot)"
758
+ });
759
+ ```
760
+
761
+ ### Timeout
762
+
763
+ Increase `timeoutMs` and add retries as shown here, or reduce per-request work by lowering `maxBytes` or disabling `fetchOEmbed`:
764
+
765
+ ```ts
766
+ fetchMetadata(url, { timeoutMs: 15000, retries: 2 });
767
+ ```
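How `retries` and `retryDelayMs` combine is not spelled out in this guide. One plausible reading, sketched below, is that `retries` counts extra attempts after the first failure and that the delay between attempts is fixed. `withRetriesSketch` is a hypothetical helper and does not describe the library's internal retry loop.

```typescript
// Assumption: `retries` = number of additional attempts; fixed delay between attempts.
async function withRetriesSketch<T>(
  task: () => Promise<T>,
  retries: number,
  retryDelayMs: number
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Wait before the next attempt; no wait after the final failure.
        await new Promise<void>((resolve) => setTimeout(resolve, retryDelayMs));
      }
    }
  }
  throw lastError;
}
```

Under that reading, the worst-case wall time for one call is roughly `timeoutMs * (retries + 1) + retryDelayMs * retries`, which is worth keeping in mind when raising both values in a serverless function.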
768
+
769
+ ### No bestImage
770
+
771
+ Check:
772
+
773
+ - Does the page include image metadata?
774
+ - Are image URLs relative and resolvable?
775
+ - Were all candidates filtered as pixels/icons/placeholders?
776
+ - Inspect `metadata.images` and `metadata.diagnostics.warnings`.
777
+
778
+ ### Blocked By Website
779
+
780
+ Some sites require browser rendering or authentication. MetaNova core intentionally does not use browser automation. If you need rendered content, handle rendering outside the core package and pass the resulting HTML to `parseMetadata`.
781
+
782
+ ### Invalid URL
783
+
784
+ Use `validateUrl`:
785
+
786
+ ```ts
787
+ validateUrl("https://example.com");
788
+ ```
789
+
790
+ Only `http:` and `https:` are allowed by default.
791
+
792
+ ### Private IP Blocked
793
+
794
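+ This is expected SSRF protection. For trusted local development, `allowLocalhost` permits localhost and loopback targets; private or reserved network addresses additionally need `allowPrivateNetwork: true`: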
795
+
796
+ ```ts
797
+ fetchMetadata("http://127.0.0.1:3000", {
798
+ allowLocalhost: true
799
+ });
800
+ ```
801
+
802
+ ### Missing Metadata
803
+
804
+ Pass `includeRaw: true` and inspect the raw extraction sources to see what each extractor found:
805
+
806
+ ```ts
807
+ const metadata = parseMetadata(html, url, { includeRaw: true });
808
+ console.log(metadata.raw);
809
+ ```
810
+
811
+ ### CJS/ESM Import Issues
812
+
813
+ Build first:
814
+
815
+ ```bash
816
+ npm run build
817
+ ```
818
+
819
+ ESM:
820
+
821
+ ```js
822
+ import { parseMetadata } from "metanova";
823
+ ```
824
+
825
+ CommonJS:
826
+
827
+ ```js
828
+ const { parseMetadata } = require("metanova");
829
+ ```