npm - headless-youtube-captions - Versions diffs - 2.0.0 → 3.0.0 - Mend

headless-youtube-captions 2.0.0 → 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11) hide show

package/README.md CHANGED Viewed

@@ -4,398 +4,176 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Node.js Version](https://img.shields.io/node/v/headless-youtube-captions.svg)](https://nodejs.org)
-> Extract YouTube video transcripts, channel videos, and comments by interacting with YouTube's UI using Puppeteer
+> Extract YouTube transcripts, channel videos, comments, search results, and video metadata using yt-dlp
 ## Features
-- 🎯 Extract video transcripts/captions in multiple languages
-- 📺 Get channel videos with pagination support
-- 🔍 Search videos within a specific channel
-- 🌍 **NEW**: Search across all of YouTube globally
-- 💬 Extract video comments with sorting options
-- 🐳 Docker support with configurable Chrome executable path
-- 📦 Zero build dependencies - runs directly from source
-- 🚀 Modern ES modules with async/await
-- 🛡️ Handles cookie consent and ad skipping automatically
+- Transcripts/captions in multiple languages
+- Channel video listings with pagination
+- Channel-scoped video search
+- Global YouTube search
+- Video comments with sort options
+- Full video metadata (description, tags, categories, like/view counts)
+- Zero npm dependencies — uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) CLI
+- Modern ES modules with TypeScript definitions
+## Prerequisites
+[yt-dlp](https://github.com/yt-dlp/yt-dlp#installation) must be installed and available in your PATH.
+```bash
+# macOS
+brew install yt-dlp
+# pip
+pip install yt-dlp
+# Linux
+sudo curl -L https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /usr/local/bin/yt-dlp
+sudo chmod a+rx /usr/local/bin/yt-dlp
+```
 ## Installation
 ```bash
-npm install -S headless-youtube-captions
-# OR
-yarn add headless-youtube-captions
+npm install headless-youtube-captions
 ```
 ## Usage
 ### Extract Video Transcripts
 ```js
 import { getSubtitles } from 'headless-youtube-captions';
 const captions = await getSubtitles({
-  videoID: 'JueUvj6X3DA', // YouTube video ID
+  videoID: 'dQw4w9WgXcQ',
   lang: 'en' // Optional, default: 'en'
 });
-console.log(captions);
+// => [{ start: "0.0", dur: "3.0", text: "We're no strangers to love" }, ...]
 ```
 ### Get Channel Videos
 ```js
 import { getChannelVideos } from 'headless-youtube-captions';
 const result = await getChannelVideos({
-  channelURL: '@mkbhd',  // or full URL like 'https://youtube.com/@mkbhd'
-  limit: 30              // Optional, default: 30
+  channelURL: '@mkbhd', // or full URL, or channel ID
+  page: 1,              // Optional, default: 1
+  pageSize: 20          // Optional, default: 20
 });
 console.log(result.videos);
+// => [{ id, title, views, uploadTime, duration, thumbnail, url }, ...]
+console.log(result.hasMore); // true if more pages available
 ```
 ### Search Channel Videos
 ```js
 import { searchChannelVideos } from 'headless-youtube-captions';
 const result = await searchChannelVideos({
   channelURL: '@mkbhd',
   query: 'iphone review',
-  limit: 20              // Optional, default: 30
+  page: 1,     // Optional
+  pageSize: 20 // Optional
 });
 console.log(result.results);
 ```
 ### Search YouTube Globally
 ```js
 import { searchYouTubeGlobal } from 'headless-youtube-captions';
 const result = await searchYouTubeGlobal({
   query: 'javascript tutorial',
-  maxResults: 10,         // Optional, 1-20, default: 10
-  resultTypes: ['videos'] // Optional, ['videos', 'channels', 'all'], default: ['all']
+  page: 1,     // Optional, default: 1
+  pageSize: 10 // Optional, default: 10
 });
 console.log(result.results);
+// => [{ id, type: 'video', title, url, channel, views, duration, uploadTime, thumbnail }, ...]
 ```
 ### Get Video Comments
 ```js
 import { getVideoComments } from 'headless-youtube-captions';
 const result = await getVideoComments({
-  videoID: 'JueUvj6X3DA',
-  limit: 50,             // Optional, default: 50
-  sortBy: 'top'          // Optional, 'top' or 'newest', default: 'top'
+  videoID: 'dQw4w9WgXcQ',
+  sortBy: 'top', // Optional, 'top' or 'newest', default: 'top'
+  page: 1,       // Optional
+  pageSize: 20   // Optional
 });
 console.log(result.comments);
+// => [{ author, authorUrl, text, time, likes, isReply }, ...]
 ```
-## API
-### `getSubtitles(options)`
-Extracts captions/transcripts from a YouTube video by automating browser interactions.
-#### Parameters
-- `options` (Object):
-  - `videoID` (String, required): The YouTube video ID
-  - `lang` (String, optional): Language code for captions. Default: `'en'`. Supported: `'en'`, `'de'`, `'fr'`
-#### Returns
-A Promise that resolves to an array of caption objects.
-#### Caption Object Format
-Each caption object contains:
-```js
-{
-  "start": "0",     // Start time in seconds (as string)
-  "dur": "3.0",     // Duration in seconds (as string)
-  "text": "Caption text here"  // The actual caption text
-}
-```
-#### Example Response
-```js
-[
-  {
-    "start": "0",
-    "dur": "3.0",
-    "text": "- Creating passive income takes work,"
-  },
-  {
-    "start": "3",
-    "dur": "2.0",
-    "text": "but once you implement those processes,"
-  },
-  {
-    "start": "5",
-    "dur": "3.0",
-    "text": "it's one of the most fruitful income sources"
-  }
-  // ... more captions
-]
-```
-## How It Works
-This library uses Puppeteer to:
-1. Navigate to the YouTube video page
-2. Handle cookie consent and ads if present
-3. Click the "Show transcript" button in the video description
-4. Extract transcript segments from the opened transcript panel
-5. Parse timestamps and text content
-6. Calculate proper durations for each caption segment
-## Requirements
-- Node.js 18 or higher (ES modules support required)
-- Puppeteer (installed as a dependency)
-## Docker Usage
-When running in Docker containers, you may need to specify the Chrome executable path using the `PUPPETEER_EXECUTABLE_PATH` environment variable:
-```bash
-# Set the environment variable
-export PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable
-# Or run directly
-PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable node your-script.js
-```
-Example Dockerfile configuration:
-```dockerfile
-# Install Chrome dependencies
-RUN apt-get update && apt-get install -y \
-    wget \
-    gnupg \
-    ca-certificates \
-    fonts-liberation \
-    libasound2 \
-    libatk-bridge2.0-0 \
-    libatk1.0-0 \
-    libatspi2.0-0 \
-    libcups2 \
-    libdbus-1-3 \
-    libdrm2 \
-    libgbm1 \
-    libgtk-3-0 \
-    libnspr4 \
-    libnss3 \
-    libxcomposite1 \
-    libxdamage1 \
-    libxfixes3 \
-    libxkbcommon0 \
-    libxrandr2 \
-    xdg-utils
-# Install Chrome
-RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
-    && echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list \
-    && apt-get update \
-    && apt-get install -y google-chrome-stable \
-    && rm -rf /var/lib/apt/lists/*
-# Set the Chrome executable path
-ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable
-```
-## Error Handling
-The function will throw an error if:
-- The video ID is invalid or the video doesn't exist
-- The video has no available captions/transcripts
-- The "Show transcript" button cannot be found
-- Network issues prevent loading the page
-Example error handling:
+### Get Video Metadata
 ```js
-try {
-  const captions = await getSubtitles({ videoID: 'XXXXX' });
-  console.log(captions);
-} catch (error) {
-  console.error('Failed to extract captions:', error.message);
-}
-```
-## Notes
+import { getVideoMetadata } from 'headless-youtube-captions';
-- The library runs Puppeteer in headless mode by default
-- Extraction time depends on video page load time and transcript length
-- The library respects YouTube's UI structure as of the last update
-- Some videos may not have transcripts available
-### `getChannelVideos(options)`
-Extracts videos from a YouTube channel with pagination support.
-#### Parameters
-- `options` (Object):
-  - `channelURL` (String, required): Channel identifier (@handle, channel ID, or full URL)
-  - `limit` (Number, optional): Maximum videos to return. Default: `30`
-  - `pageToken` (String, optional): For pagination (future use)
-#### Returns
+const result = await getVideoMetadata({
+  videoID: 'dQw4w9WgXcQ'
+});
-```js
-{
-  channel: {
-    name: "Channel Name",
-    subscribers: "1.2M subscribers",
-    videoCount: "500 videos"
-  },
-  videos: [
-    {
-      id: "videoId123",
-      title: "Video Title",
-      views: "1.2M views",
-      uploadTime: "2 days ago",
-      duration: "10:45",
-      thumbnail: "https://...",
-      url: "https://youtube.com/watch?v=videoId123"
-    }
-    // ... more videos
-  ],
-  totalLoaded: 30,
-  hasMore: true
-}
+console.log(result.video);
+// => { id, title, description, uploadDate, viewCount, likeCount, duration, tags, categories }
+console.log(result.channel);
+// => { name, url, subscriberCount }
 ```
-### `searchChannelVideos(options)`
+## Pagination
-Search for videos within a specific YouTube channel.
+All list-returning functions use `page` and `pageSize` parameters:
-#### Parameters
+| Function | Default pageSize | Pagination type |
+|----------|-----------------|-----------------|
+| `getChannelVideos` | 20 | Server-side (only fetches requested page) |
+| `searchChannelVideos` | 20 | Server-side |
+| `getVideoComments` | 20 | Fetch up to page*pageSize, return slice |
+| `searchYouTubeGlobal` | 10 | Fetch up to page*pageSize, return slice |
-- `options` (Object):
-  - `channelURL` (String, required): Channel identifier (@handle, channel ID, or full URL)
-  - `query` (String, required): Search query
-  - `limit` (Number, optional): Maximum results. Default: `30`
+All results include a `hasMore` boolean to indicate if more pages are available.
-#### Returns
+## API Reference
-```js
-{
-  query: "iphone review",
-  results: [
-    {
-      id: "videoId123",
-      title: "iPhone 15 Review",
-      views: "2.5M views",
-      uploadTime: "1 week ago",
-      duration: "15:23",
-      thumbnail: "https://...",
-      url: "https://youtube.com/watch?v=videoId123"
-    }
-    // ... more results
-  ],
-  totalFound: 25
-}
-```
+### `getSubtitles({ videoID, lang? })`
-### `getVideoComments(options)`
+Returns `Promise<{ start: string, dur: string, text: string }[]>`
-Extract comments from a YouTube video with pagination support.
+### `getChannelVideos({ channelURL, page?, pageSize? })`
-#### Parameters
+Returns `Promise<{ channel, videos, page, pageSize, hasMore }>`
-- `options` (Object):
-  - `videoID` (String, required): YouTube video ID
-  - `limit` (Number, optional): Maximum comments to return. Default: `50`
-  - `sortBy` (String, optional): Sort order - `'top'` or `'newest'`. Default: `'top'`
-  - `pageToken` (String, optional): For pagination (future use)
+### `searchChannelVideos({ channelURL, query, page?, pageSize? })`
-#### Returns
+Returns `Promise<{ query, results, page, pageSize, totalFound, hasMore }>`
-```js
-{
-  video: {
-    id: "JueUvj6X3DA",
-    title: "Video Title",
-    channel: {
-      name: "Channel Name",
-      url: "https://youtube.com/@channel"
-    },
-    views: "1.5M views"
-  },
-  comments: [
-    {
-      author: "Username",
-      authorUrl: "https://youtube.com/@username",
-      authorAvatar: "https://...",
-      text: "Great video! Thanks for sharing...",
-      time: "2 days ago",
-      likes: "245",
-      replyCount: "12"
-    }
-    // ... more comments
-  ],
-  totalComments: 1566,
-  totalLoaded: 50,
-  hasMore: true,
-  sortBy: "top"
-}
-```
+### `getVideoComments({ videoID, sortBy?, page?, pageSize? })`
-### `searchYouTubeGlobal(options)`
+Returns `Promise<{ comments, page, pageSize, totalFetched, hasMore, sortBy }>`
-Search across all of YouTube for videos and channels.
+### `searchYouTubeGlobal({ query, page?, pageSize? })`
-#### Parameters
+Returns `Promise<{ query, results, page, pageSize, hasMore }>`
-- `options` (Object):
-  - `query` (String, required): Search term to find content
-  - `maxResults` (Number, optional): Maximum results to return (1-20). Default: `10`
-  - `resultTypes` (Array, optional): Types of results to include. Options: `['videos']`, `['channels']`, `['all']`. Default: `['all']`
+### `getVideoMetadata({ videoID })`
-#### Returns
+Returns `Promise<{ video, channel }>` — no pagination needed.
-```js
-{
-  query: "javascript tutorial",
-  resultTypes: ["videos"],
-  maxResults: 10,
-  totalFound: 8,
-  results: [
-    {
-      id: "videoId123",
-      type: "video",
-      title: "JavaScript Tutorial for Beginners",
-      url: "https://youtube.com/watch?v=videoId123",
-      channel: "Code Academy",
-      views: "2.1M views",
-      uploadTime: "1 year ago",
-      duration: "1:23:45",
-      thumbnail: "https://i.ytimg.com/vi/..."
-    },
-    {
-      id: "channelId456",
-      type: "channel",
-      title: "JavaScript Mastery",
-      url: "https://youtube.com/@javascriptmastery",
-      subscribers: "1.2M subscribers",
-      videoCount: "200 videos",
-      thumbnail: "https://yt3.ggpht.com/..."
-    }
-    // ... more results
-  ]
-}
-```
-#### Result Types
+## Requirements
-- **Video Results** include: `id`, `type`, `title`, `url`, `channel`, `views`, `uploadTime`, `duration`, `thumbnail`
-- **Channel Results** include: `id`, `type`, `title`, `url`, `subscribers`, `videoCount`, `thumbnail`
+- Node.js 18+
+- [yt-dlp](https://github.com/yt-dlp/yt-dlp) installed and in PATH
 ## License
-MIT
+MIT

package/package.json CHANGED Viewed

@@ -1,7 +1,7 @@
 {
   "name": "headless-youtube-captions",
-  "version": "2.0.0",
-  "description": "Extract YouTube video transcripts (via yt-dlp), channel videos, comments, and comprehensive video metadata",
+  "version": "3.0.0",
+  "description": "Extract YouTube video transcripts, channel videos, comments, search results, and video metadata via yt-dlp",
   "main": "src/index.js",
   "types": "src/index.d.ts",
   "type": "module",
@@ -23,12 +23,7 @@
   "scripts": {
     "test": "node --test test/*.test.js"
   },
-  "dependencies": {
-    "he": "^1.2.0",
-    "lodash": "^4.17.21",
-    "patchright": "^1.51.1",
-    "striptags": "^3.2.0"
-  },
+  "dependencies": {},
   "devDependencies": {},
   "keywords": [
     "youtube",
@@ -42,10 +37,7 @@
     "metadata",
     "description",
     "yt-dlp",
-    "patchright",
     "scraper",
-    "api",
-    "automation",
-    "global-search"
+    "api"
   ]
 }