npm - tokwise - Versions diffs - 0.1.0 - Mend

tokwise 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Sebastian Crossa
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,185 @@
+# Tokwise CLI
+Local-first command line tooling for turning saved short-form videos into a searchable, transcript-centered knowledge base.
+Tokwise syncs clips into local files, builds a search index, downloads media, transcribes audio, classifies themes, exports Markdown, compiles a wiki, answers questions against local evidence, and installs an agent skill.
+## Install
+```bash
+npm install -g tokwise
+```
+Or run without installing:
+```bash
+npx tokwise status
+```
+Main command:
+```bash
+tokwise status
+```
+Short alias:
+```bash
+tw status
+```
+Requires Node.js 20+. Media download requires `yt-dlp` on PATH. Transcription requires either OpenAI Whisper CLI, whisper.cpp, or a custom command.
+### Develop
+```bash
+npm install
+npm run build
+npm link
+```
+## Quick Start
+```bash
+# Optional: save a browser cookie for private collections or liked videos.
+# Easiest on macOS: pull it straight from a logged-in Chromium browser.
+tokwise auth from-browser            # auto-detects Chrome, Brave, Edge, Arc, or Chromium
+tokwise auth refresh                 # re-pull later when the session goes stale
+# Or paste a cookie manually (works everywhere).
+tokwise auth set --cookie "YOUR_COOKIE"
+# Tokwise tries to detect the @handle tied to your cookie. Set it manually if needed.
+tokwise auth set-username your-handle
+# Sync a collection, download audio, transcribe, classify, and index.
+# --collection accepts a full URL, an @user/collection/slug path, or a bare slug.
+tokwise sync --collection "name-123" \
+  --limit 200 \
+  --download --audio \
+  --transcribe --stt-engine whisper --stt-model base \
+  --classify
+# Search and explore.
+tokwise search "how to choose a career"
+tokwise similar <video-id>
+tokwise categories
+tokwise stats
+tokwise md
+tokwise wiki
+tokwise ask "What patterns show up in my saved advice videos?" --engine ollama --model llama3.1
+```
+Every `tokwise` command can also be run with `tw`.
+## Core Commands
+```bash
+tokwise sync                  Sync URLs, collections, playlists, liked videos, or imports
+tokwise fetch-media           Download video/audio with yt-dlp for existing records
+tokwise transcribe            Run Whisper, whisper.cpp, or a custom STT command
+tokwise index                 Rebuild the BM25 search index
+tokwise search <query>        Full-text search across descriptions and transcripts
+tokwise list                  Filter by author, date, category, domain, collection, transcript state
+tokwise show <id>             Show one video in detail
+tokwise similar <id>          Find transcript-similar videos
+tokwise stats                 Counts, date range, top authors, transcript coverage
+tokwise viz                   Terminal dashboard with simple bars
+tokwise categories            Category distribution
+tokwise domains               Domain distribution
+tokwise collections           Collection/source distribution
+tokwise classify              Regex or local Ollama classification
+tokwise model                 View or change default local model preferences
+tokwise md                    Export one Markdown file per video
+tokwise wiki                  Compile an interlinked local wiki
+tokwise ask <question>        Ask against top local matches, optionally via Ollama
+tokwise lint                  Check generated wiki links
+tokwise library ...           Manage local Markdown library pages
+tokwise commands ...          Manage reusable local command notes
+tokwise skill ...             Install/show/uninstall an agent skill
+tokwise paths/status/path     Show local data locations and health
+```
+## Data Layout
+By default data lives under `~/.tokwise/`.
+```text
+~/.tokwise/
+  videos/
+    videos.jsonl          # one normalized video record per line
+    search-index.json     # local BM25 index
+    auth.json             # optional browser cookie, chmod 600
+    media/                # yt-dlp video files
+    audio/                # yt-dlp extracted audio files
+    transcripts/          # .json and .txt STT outputs
+  library/
+    index.md              # generated wiki entry point
+    videos/*.md           # one markdown page per video
+    categories/*.md
+    domains/*.md
+  commands/
+    *.md                  # portable command notes for agents
+```
+Override locations with:
+```bash
+export TOKWISE_DATA_DIR=/path/to/data
+export TOKWISE_LIBRARY_DIR=/path/to/library
+export TOKWISE_COMMANDS_DIR=/path/to/commands
+```
+Legacy `TT_*` environment variables and `~/.tiktoktheory` are still read so existing local archives keep working after the rename.
+## Sources
+```bash
+tokwise sync --collection <url | @user/collection/slug | slug-or-id>
+tokwise sync --playlist <playlist-id-or-url>
+tokwise sync --liked <username>
+tokwise sync --user <username>
+tokwise sync --search-video "life advice"
+tokwise sync --url "https://www.tiktok.com/@user/video/123"
+tokwise sync --urls-file urls.txt
+tokwise sync --input export.jsonl
+```
+`--collection` accepts three forms, from most to least explicit:
+```bash
+tokwise sync --collection "https://www.tiktok.com/@user/collection/name-123"
+tokwise sync --collection "@user/collection/name-123"
+tokwise sync --collection "name-123"   # uses the @handle saved with your cookie
+```
+The bare-slug form needs the username tied to your cookie. Tokwise tries to detect it automatically when you run `tokwise auth set` or `tokwise auth from-browser`; you can also set it explicitly with `tokwise auth set-username <handle>` or `--username` on those commands. `tokwise auth show` reports the saved handle.
+Private collections usually require a fresh browser cookie from a logged-in session. On macOS, `tokwise auth from-browser` reads and decrypts it straight from a logged-in Chromium browser (Chrome, Brave, Edge, Arc, or Chromium) via the macOS Keychain, and `tokwise auth refresh` re-pulls it when the session goes stale. On other platforms or browsers, paste it manually with `tokwise auth set`. Cookies are stored locally only (`auth.json`, chmod 600).
+## Transcription
+Whisper CLI:
+```bash
+tokwise transcribe --engine whisper --model base --language en
+```
+whisper.cpp:
+```bash
+tokwise transcribe --engine whisper-cpp --command whisper-cli --model /path/to/ggml-base.en.bin
+```
+Custom command:
+```bash
+tokwise transcribe --engine custom \
+  --command 'my-stt --input "{input}" --output "{output}" --language "{language}"'
+```
+The custom command should write JSON, plain text, or print the transcript to stdout.
+## Attribution
+Tokwise was inspired by [afar1/fieldtheory-cli](https://github.com/afar1/fieldtheory-cli), especially its local-first approach to syncing personal saved content into searchable files, Markdown, and agent-readable workflows.
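
Review note on the README's Sources section: the three accepted `--collection` forms can be reduced to a single `{ username, slug }` pair. The sketch below is only an illustration of that idea. `parseCollectionRef` and its return shape are hypothetical names, and the package's real parser is not part of the files shown in this diff.

```javascript
// Hypothetical helper, not taken from the package: normalizes the three
// documented --collection forms into a { username, slug } pair.
function parseCollectionRef(input, savedHandle) {
  // Matches both the full URL and the "@user/collection/slug" path form.
  const match = input.match(/@([^/]+)\/collection\/([^/?#]+)/);
  if (match) return { username: match[1], slug: match[2] };

  // Bare slug: fall back to the handle saved alongside the cookie.
  if (!savedHandle) {
    throw new Error("A bare slug needs a saved @handle (see `tokwise auth set-username`).");
  }
  return { username: savedHandle.replace(/^@/, ""), slug: input };
}

const fromUrl = parseCollectionRef("https://www.tiktok.com/@user/collection/name-123");
const fromPath = parseCollectionRef("@user/collection/name-123");
const fromSlug = parseCollectionRef("name-123", "@your-handle");
console.log(fromUrl, fromPath, fromSlug);
```

The bare-slug branch fails without a saved handle, which matches the README's statement that this form needs the username tied to the cookie.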

package/dist/ask.js ADDED

@@ -0,0 +1,58 @@
+export async function answerQuestion(question, results, options) {
+    if (results.length === 0)
+        return "No local evidence matched that question.";
+    if (options.engine === "ollama")
+        return answerWithOllama(question, results, options);
+    return answerExtractively(question, results);
+}
+function answerExtractively(question, results) {
+    return [
+        `Question: ${question}`,
+        "",
+        "Top local evidence:",
+        "",
+        ...results.slice(0, 8).map((result, idx) => {
+            const video = result.video;
+            const author = video.author?.username ? `@${video.author.username}` : "unknown";
+            const summary = video.classification?.summary ?? result.highlights[0] ?? video.description ?? "No text.";
+            return `${idx + 1}. ${video.id} ${author}\n   ${summary}\n   ${video.canonicalUrl ?? video.url}`;
+        }),
+        "",
+        "Use --engine ollama --model <model> for a synthesized answer from a local model.",
+    ].join("\n");
+}
+async function answerWithOllama(question, results, options) {
+    const baseUrl = options.ollamaBaseUrl ?? "http://localhost:11434";
+    const model = options.model ?? "llama3.1";
+    const context = results
+        .slice(0, 10)
+        .map((result, idx) => {
+        const video = result.video;
+        return [
+            `[${idx + 1}] ${video.id} ${video.canonicalUrl ?? video.url}`,
+            `Author: ${video.author?.username ?? "unknown"}`,
+            `Category: ${video.classification?.category ?? "unknown"}`,
+            `Summary: ${video.classification?.summary ?? ""}`,
+            `Transcript: ${(video.transcript?.text ?? video.description ?? "").slice(0, 2500)}`,
+        ].join("\n");
+    })
+        .join("\n\n");
+    const prompt = [
+        "Answer the user's question using only the saved clip evidence below.",
+        "Cite video ids in brackets when making claims. If evidence is thin, say so.",
+        "",
+        `Question: ${question}`,
+        "",
+        "Evidence:",
+        context,
+    ].join("\n");
+    const response = await fetch(`${baseUrl.replace(/\/+$/, "")}/api/generate`, {
+        method: "POST",
+        headers: { "content-type": "application/json" },
+        body: JSON.stringify({ model, prompt, stream: false }),
+    });
+    if (!response.ok)
+        throw new Error(`Ollama ask failed: ${response.status} ${response.statusText}`);
+    const body = (await response.json());
+    return body.response?.trim() || "Ollama returned an empty answer.";
+}
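
Review note on `answerWithOllama`: the request it sends can be inspected without a running Ollama server. The sketch below rebuilds the same URL and JSON body in a standalone function. `buildGenerateRequest` is a hypothetical helper introduced for illustration, it does not exist in the package, and it uses CommonJS-free plain JavaScript so it runs as-is in Node.js.

```javascript
// Hypothetical helper (not in the package): assembles the request that
// answerWithOllama passes to fetch, without performing any network call.
function buildGenerateRequest(baseUrl, model, prompt) {
  // Trailing slashes are trimmed so the path is appended exactly once.
  const url = `${baseUrl.replace(/\/+$/, "")}/api/generate`;
  return {
    url,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      // stream: false asks for one JSON response instead of a chunked stream.
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

const request = buildGenerateRequest("http://localhost:11434///", "llama3.1", "Question: example");
console.log(request.url);
console.log(request.init.body);
```

Because the base URL is normalized first, a configured value with or without a trailing slash resolves to the same endpoint.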

package/dist/browser-cookies.js ADDED

@@ -0,0 +1,160 @@
+import crypto from "node:crypto";
+import fs from "node:fs/promises";
+import os from "node:os";
+import path from "node:path";
+import { runProcess } from "./process.js";
+export const SUPPORTED_BROWSERS = ["chrome", "brave", "edge", "arc", "chromium"];
+const CHANNELS = {
+    chrome: { dir: "Google/Chrome", service: "Chrome Safe Storage", account: "Chrome" },
+    brave: { dir: "BraveSoftware/Brave-Browser", service: "Brave Safe Storage", account: "Brave" },
+    edge: { dir: "Microsoft Edge", service: "Microsoft Edge Safe Storage", account: "Microsoft Edge" },
+    arc: { dir: "Arc/User Data", service: "Arc Safe Storage", account: "Arc" },
+    chromium: { dir: "Chromium", service: "Chromium Safe Storage", account: "Chromium" },
+};
+export function isChromiumBrowser(value) {
+    return SUPPORTED_BROWSERS.includes(value);
+}
+export function chromiumTargets(browser, profile) {
+    const channel = CHANNELS[browser];
+    return {
+        cookieDbPath: path.join(os.homedir(), "Library", "Application Support", channel.dir, profile, "Cookies"),
+        keychainService: channel.service,
+        keychainAccount: channel.account,
+    };
+}
+export function deriveKey(password) {
+    return crypto.pbkdf2Sync(password, "saltysalt", 1003, 16, "sha1");
+}
+export function decryptCookieValue(encrypted, key, hostKey) {
+    const prefix = encrypted.subarray(0, 3).toString("latin1");
+    const body = prefix === "v10" || prefix === "v11" ? encrypted.subarray(3) : encrypted;
+    const iv = Buffer.alloc(16, 0x20);
+    const decipher = crypto.createDecipheriv("aes-128-cbc", key, iv);
+    decipher.setAutoPadding(false);
+    const padded = Buffer.concat([decipher.update(body), decipher.final()]);
+    const unpadded = removePkcs7Padding(padded);
+    const stripped = stripDomainHashPrefix(unpadded, hostKey);
+    return stripped.toString("utf8");
+}
+function removePkcs7Padding(buffer) {
+    if (buffer.length === 0)
+        return buffer;
+    const padLength = buffer[buffer.length - 1] ?? 0;
+    if (padLength > 0 && padLength <= 16 && padLength <= buffer.length) {
+        return buffer.subarray(0, buffer.length - padLength);
+    }
+    return buffer;
+}
+// Recent Chrome builds prepend a 32-byte SHA-256 hash of the cookie's domain to
+// the decrypted plaintext. When the host key is known and matches that prefix,
+// strip it so the clean cookie value is recovered.
+function stripDomainHashPrefix(buffer, hostKey) {
+    if (!hostKey || buffer.length < 32)
+        return buffer;
+    const domainHash = crypto.createHash("sha256").update(hostKey).digest();
+    if (buffer.subarray(0, 32).equals(domainHash)) {
+        return buffer.subarray(32);
+    }
+    return buffer;
+}
+export function buildCookieHeader(rows, key) {
+    const pairs = [];
+    for (const row of rows) {
+        if (!row.name || !row.encrypted_hex)
+            continue;
+        try {
+            const value = decryptCookieValue(Buffer.from(row.encrypted_hex, "hex"), key, row.host_key);
+            if (value)
+                pairs.push(`${row.name}=${value}`);
+        }
+        catch {
+            continue;
+        }
+    }
+    return pairs.join("; ");
+}
+export async function readKeychainPassword(service, account) {
+    const result = await runProcess("security", ["find-generic-password", "-w", "-a", account, "-s", service]);
+    if (result.code !== 0) {
+        throw new Error(`Keychain access for "${service}" failed. Re-run and click Allow when macOS asks, or pass --cookie manually.`);
+    }
+    const password = result.stdout.trim();
+    if (!password) {
+        throw new Error(`Keychain returned an empty password for "${service}".`);
+    }
+    return password;
+}
+export async function readTikTokCookieRows(cookieDbPath) {
+    const tmpDir = await fs.mkdtemp(path.join(os.tmpdir(), "tokwise-cookies-"));
+    const tmpDb = path.join(tmpDir, "Cookies");
+    try {
+        await fs.copyFile(cookieDbPath, tmpDb);
+        for (const suffix of ["-wal", "-shm"]) {
+            try {
+                await fs.copyFile(`${cookieDbPath}${suffix}`, `${tmpDb}${suffix}`);
+            }
+            catch {
+                // Sidecar files are optional; ignore when absent.
+            }
+        }
+        const sql = "SELECT host_key, name, hex(encrypted_value) AS encrypted_hex FROM cookies WHERE host_key LIKE '%tiktok.com%';";
+        const result = await runProcess("/usr/bin/sqlite3", ["-json", tmpDb, sql]);
+        if (result.code !== 0) {
+            throw new Error(`Could not read cookies database: ${result.stderr || result.stdout || "unknown error"}`);
+        }
+        return parseSqliteJsonRows(result.stdout);
+    }
+    finally {
+        await fs.rm(tmpDir, { recursive: true, force: true });
+    }
+}
+export function parseSqliteJsonRows(stdout) {
+    const trimmed = stdout.trim();
+    if (!trimmed)
+        return [];
+    const parsed = JSON.parse(trimmed);
+    if (!Array.isArray(parsed))
+        return [];
+    return parsed.flatMap((entry) => {
+        if (typeof entry !== "object" || entry === null)
+            return [];
+        const record = entry;
+        const host_key = typeof record.host_key === "string" ? record.host_key : "";
+        const name = typeof record.name === "string" ? record.name : "";
+        const encrypted_hex = typeof record.encrypted_hex === "string" ? record.encrypted_hex : "";
+        return [{ host_key, name, encrypted_hex }];
+    });
+}
+async function fileExists(filePath) {
+    try {
+        await fs.access(filePath);
+        return true;
+    }
+    catch {
+        return false;
+    }
+}
+export async function extractTikTokCookie(options) {
+    const candidates = options.browser ? [options.browser] : SUPPORTED_BROWSERS;
+    const detected = [];
+    for (const browser of candidates) {
+        const target = chromiumTargets(browser, options.profile);
+        if (await fileExists(target.cookieDbPath))
+            detected.push(browser);
+    }
+    if (detected.length === 0) {
+        throw new Error(`No supported Chromium browser found on macOS for profile "${options.profile}" (looked for: ${candidates.join(", ")}). Use \`tw auth set --cookie\` instead.`);
+    }
+    for (const browser of detected) {
+        const target = chromiumTargets(browser, options.profile);
+        const rows = await readTikTokCookieRows(target.cookieDbPath);
+        if (rows.length === 0)
+            continue;
+        const password = await readKeychainPassword(target.keychainService, target.keychainAccount);
+        const cookie = buildCookieHeader(rows, deriveKey(password));
+        if (!cookie)
+            continue;
+        return { cookie, browser, profile: options.profile };
+    }
+    throw new Error(`Found ${detected.join(", ")} but no tiktok.com cookies in profile "${options.profile}". Open tiktok.com in your browser, log in, then retry.`);
+}
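
Review note on `decryptCookieValue`: the decryption steps in this file (PBKDF2-SHA1 key derivation, AES-128-CBC with an IV of 16 space bytes, manual PKCS#7 removal, optional 32-byte domain-hash strip) can be checked with a round trip on synthetic data. In the sketch below, `encryptLikeChromium` is a hypothetical test helper that produces a blob in the shape the diff's code expects. The password and cookie value are made up, and nothing here reads a real browser profile or Keychain. It uses `require` so that it runs as a plain Node.js script.

```javascript
const crypto = require("node:crypto");

// Same parameters as deriveKey in the diff: PBKDF2-SHA1, salt "saltysalt",
// 1003 iterations, 16-byte key.
function deriveKey(password) {
  return crypto.pbkdf2Sync(password, "saltysalt", 1003, 16, "sha1");
}

// Hypothetical test helper (not in the package): builds a "v10"-prefixed blob
// whose plaintext starts with the SHA-256 hash of the host key.
function encryptLikeChromium(value, key, hostKey) {
  const iv = Buffer.alloc(16, 0x20);
  const domainHash = crypto.createHash("sha256").update(hostKey).digest();
  const cipher = crypto.createCipheriv("aes-128-cbc", key, iv); // PKCS#7 padding is on by default
  const plaintext = Buffer.concat([domainHash, Buffer.from(value, "utf8")]);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([Buffer.from("v10", "latin1"), body]);
}

// Condensed restatement of the decryption path shown in the diff.
function decryptLikeTokwise(encrypted, key, hostKey) {
  const body = encrypted.subarray(3); // drop the "v10" prefix
  const decipher = crypto.createDecipheriv("aes-128-cbc", key, Buffer.alloc(16, 0x20));
  decipher.setAutoPadding(false);
  const padded = Buffer.concat([decipher.update(body), decipher.final()]);
  const padLength = padded[padded.length - 1];
  const unpadded = padded.subarray(0, padded.length - padLength);
  const domainHash = crypto.createHash("sha256").update(hostKey).digest();
  const stripped = unpadded.subarray(0, 32).equals(domainHash) ? unpadded.subarray(32) : unpadded;
  return stripped.toString("utf8");
}

const key = deriveKey("example-keychain-password");
const blob = encryptLikeChromium("abc123", key, ".tiktok.com");
const roundTrip = decryptLikeTokwise(blob, key, ".tiktok.com");
console.log(roundTrip); // prints "abc123"
```

If the host key passed to the decrypt step differs from the one used to build the blob, the hash comparison fails and the 32 prefix bytes stay in the output, which is the same fallback behavior as `stripDomainHashPrefix`.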

package/dist/classify.js ADDED

@@ -0,0 +1,118 @@
+import { searchableText, tokenize } from "./search.js";
+const CATEGORY_RULES = [
+    { label: "career", keywords: ["career", "job", "work", "interview", "manager", "promotion", "resume", "business"] },
+    { label: "relationships", keywords: ["relationship", "friend", "partner", "dating", "marriage", "family", "boundaries"] },
+    { label: "health", keywords: ["health", "sleep", "fitness", "diet", "body", "therapy", "mental", "anxiety", "stress"] },
+    { label: "money", keywords: ["money", "finance", "invest", "budget", "saving", "debt", "wealth", "income"] },
+    { label: "productivity", keywords: ["productivity", "habit", "routine", "focus", "discipline", "calendar", "system"] },
+    { label: "learning", keywords: ["learn", "study", "read", "book", "skill", "practice", "teach"] },
+    { label: "mindset", keywords: ["mindset", "confidence", "fear", "failure", "motivation", "identity", "belief"] },
+    { label: "creativity", keywords: ["create", "writing", "artist", "idea", "taste", "creative", "make"] },
+    { label: "spirituality", keywords: ["meaning", "purpose", "gratitude", "meditation", "spiritual", "presence"] },
+];
+const DOMAIN_RULES = [
+    { label: "decision-making", keywords: ["choice", "decision", "tradeoff", "choose", "option", "clarity"] },
+    { label: "self-regulation", keywords: ["emotion", "calm", "stress", "discipline", "impulse", "nervous"] },
+    { label: "social-dynamics", keywords: ["people", "friend", "relationship", "boundary", "conversation", "trust"] },
+    { label: "work-and-ambition", keywords: ["career", "job", "interview", "promotion", "work", "business", "goal", "ambition", "manager"] },
+    { label: "health-and-energy", keywords: ["sleep", "health", "body", "exercise", "food", "energy"] },
+    { label: "money-and-security", keywords: ["money", "budget", "wealth", "debt", "invest", "rent"] },
+    { label: "meaning-and-values", keywords: ["meaning", "purpose", "values", "life", "death", "legacy"] },
+];
+export function classifyRegex(video) {
+    const text = searchableText(video).toLowerCase();
+    const tokens = new Set(tokenize(text));
+    const category = bestRule(tokens, CATEGORY_RULES) ?? "life-advice";
+    const domain = bestRule(tokens, DOMAIN_RULES) ?? "general-life";
+    const topics = topTopics(video, tokens);
+    return {
+        category,
+        domain,
+        topics,
+        summary: summarize(video),
+        engine: "regex",
+        classifiedAt: new Date().toISOString(),
+    };
+}
+export async function classifyOllama(video, options) {
+    const baseUrl = options.ollamaBaseUrl ?? "http://localhost:11434";
+    const model = options.model ?? "llama3.1";
+    const prompt = [
+        "Classify this saved short-form life-advice video.",
+        "Return compact JSON with keys: category, domain, topics, summary.",
+        "Categories should be short lowercase labels. Topics should be 3 to 7 short phrases.",
+        "",
+        `Author: ${video.author?.username ?? "unknown"}`,
+        `Description: ${video.description ?? ""}`,
+        `Transcript: ${(video.transcript?.text ?? "").slice(0, 6000)}`,
+    ].join("\n");
+    const response = await fetch(`${baseUrl.replace(/\/+$/, "")}/api/generate`, {
+        method: "POST",
+        headers: { "content-type": "application/json" },
+        body: JSON.stringify({ model, prompt, stream: false, format: "json" }),
+    });
+    if (!response.ok)
+        throw new Error(`Ollama classify failed: ${response.status} ${response.statusText}`);
+    const body = (await response.json());
+    const parsed = parseJsonObject(body.response ?? "{}");
+    const fallback = classifyRegex(video);
+    return {
+        category: stringValue(parsed.category) ?? fallback.category,
+        domain: stringValue(parsed.domain) ?? fallback.domain,
+        topics: arrayOfStrings(parsed.topics) ?? fallback.topics,
+        summary: stringValue(parsed.summary) ?? fallback.summary,
+        engine: "ollama",
+        model,
+        classifiedAt: new Date().toISOString(),
+    };
+}
+export async function classifyOne(video, options) {
+    if (options.engine === "ollama")
+        return classifyOllama(video, options);
+    return classifyRegex(video);
+}
+function bestRule(tokens, rules) {
+    let best;
+    for (const rule of rules) {
+        const score = rule.keywords.reduce((sum, keyword) => sum + (tokens.has(keyword) ? 1 : 0), 0);
+        if (score > 0 && (!best || score > best.score))
+            best = { label: rule.label, score };
+    }
+    return best?.label;
+}
+function topTopics(video, tokens) {
+    const hashtags = video.hashtags.slice(0, 8).map((tag) => tag.toLowerCase());
+    const meaningful = [...tokens].filter((token) => token.length > 4).slice(0, 8);
+    return [...new Set([...hashtags, ...meaningful])].slice(0, 7);
+}
+function summarize(video) {
+    const text = video.transcript?.text || video.description || "";
+    const firstSentence = text.split(/(?<=[.!?])\s+/)[0]?.replace(/\s+/g, " ").trim();
+    return firstSentence?.slice(0, 260) || "No transcript summary available yet.";
+}
+function parseJsonObject(text) {
+    try {
+        const parsed = JSON.parse(text);
+        return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed) ? parsed : {};
+    }
+    catch {
+        const match = text.match(/\{[\s\S]*\}/);
+        if (!match)
+            return {};
+        try {
+            return JSON.parse(match[0]);
+        }
+        catch {
+            return {};
+        }
+    }
+}
+function stringValue(value) {
+    return typeof value === "string" && value.trim() ? value.trim() : undefined;
+}
+function arrayOfStrings(value) {
+    if (!Array.isArray(value))
+        return undefined;
+    const strings = value.filter((entry) => typeof entry === "string" && entry.trim().length > 0);
+    return strings.length > 0 ? strings : undefined;
+}
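
Review note on `bestRule`: the regex engine scores each rule by counting how many of its keywords appear as exact tokens, and it keeps the first rule that reaches the highest score. The sketch below replays that logic on a trimmed rule table. The `tokenize` function here is an assumption (lowercase, split on non-alphanumerics), since the real one lives in `search.js` and is not shown in this diff.

```javascript
// Trimmed copies of two CATEGORY_RULES entries from the diff.
const RULES = [
  { label: "career", keywords: ["career", "job", "work", "interview"] },
  { label: "money", keywords: ["money", "budget", "debt", "invest"] },
];

// Assumed tokenizer: the package's own tokenize() may behave differently.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Same selection rule as the diff: highest keyword count wins, and the
// strict ">" comparison means the earlier rule wins a tie.
function bestRule(tokens, rules) {
  let best;
  for (const rule of rules) {
    const score = rule.keywords.filter((keyword) => tokens.has(keyword)).length;
    if (score > 0 && (!best || score > best.score)) best = { label: rule.label, score };
  }
  return best?.label;
}

const budgetClip = new Set(tokenize("How to budget money and pay off debt before a job interview"));
const tieClip = new Set(tokenize("Career money"));
const noMatch = new Set(tokenize("Sunset timelapse"));

console.log(bestRule(budgetClip, RULES));              // prints "money" (3 hits vs 2)
console.log(bestRule(tieClip, RULES));                 // prints "career" (tie, first rule kept)
console.log(bestRule(noMatch, RULES) ?? "life-advice"); // prints "life-advice" (fallback label)
```

Rule order therefore matters when scores tie, so a clip that mentions one career keyword and one money keyword is filed under whichever rule is listed first.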