react-native-pageindex 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/CHANGELOG.md +22 -0
  2. package/README.md +337 -0
  3. package/package.json +1 -1
package/CHANGELOG.md CHANGED
@@ -6,6 +6,28 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and
 
  ---
 
+ ## [0.1.3] — 2026-03-08
+
+ ### Added
+ - **Chat mode** (`ChatPanel.tsx`) — conversational multi-turn Q&A over any indexed document, with collapsible cited-source cards per reply (node title, node ID, page range, relevance score)
+ - `docs/screenshots/demo-chat-mode.png` — screenshot of the Chat tab in action
+
+ ### Documentation
+ - Added `## Conversational Chat Mode` section to README with minimal code example and full description of the browser demo's Chat tab implementation
+ - Updated Demo section with Chat mode screenshot and walkthrough (section 8)
+ - Updated Features table to include **Conversational chat**
+
+ ---
+
+ ## [0.1.2] — 2026-03-08
+
+ ### Documentation
+ - Added full **Demo** section to README with screenshots and an 8-part walkthrough of the React demo app
+ - Covers: data sources, index modes, build pipeline code, PDF extraction with pdfjs-dist + Vite `?url`, LLM provider wiring, progress tracking, reverse-index search, and local setup instructions
+ - Added `docs/screenshots/demo-keyword-mode.png` and `docs/screenshots/demo-llm-mode.png`
+
+ ---
+
  ## [0.1.1] — 2026-03-08
 
  ### Fixed
package/README.md CHANGED
@@ -9,6 +9,265 @@ No vector database required. Instead of embeddings, the library uses the LLM to
 
  ---
 
+ ## Demo
+
+ A fully interactive React demo app is included in the [`demo/`](./demo) directory. It runs in the browser and showcases both index modes against two built-in datasets — no backend required.
+
+ ### Keyword mode — instant, no API key
+
+ > CSV dataset (100 farmers · 14 columns) indexed in **0.0 s** using TF-IDF scoring.
+
+ ![PageIndex Demo – Keyword mode](docs/screenshots/demo-keyword-mode.png)
+
+ ### LLM mode — semantic tree via any LLM
+
+ > 32-page farming PDF with a TOC parsed into **31 nodes** and **250 indexed terms** in ~214 s using `gpt-4o-mini`.
+
+ ![PageIndex Demo – LLM mode](docs/screenshots/demo-llm-mode.png)
+
+ ### Chat mode — conversational AI over any indexed document
+
+ > Ask natural-language questions and get cited answers backed by the reverse index. Multi-turn conversation with collapsible source references per reply.
+
+ ![PageIndex Demo – Chat mode](docs/screenshots/demo-chat-mode.png)
+
+ ---
+
+ ### How the demo is built
+
+ The demo is a **Vite + React + TypeScript** single-page app that wires `react-native-pageindex` directly into the browser. Below is a walkthrough of every layer.
+
+ #### 1. Data sources (`ConfigPanel.tsx`)
+
+ Three mutually exclusive source modes are offered:
+
+ | Mode | What it loads |
+ |---|---|
+ | **Sample CSV** | `farmer_dataset.csv` — 100 rows, 14 columns (crop, state, soil type, risk…) |
+ | **Sample PDF** | `crop_production_guide.pdf` — 32-page farming guide generated with `pdfkit`, complete with a dot-leader TOC and 7 chapters |
+ | **Upload** | Any `.pdf`, `.html`, `.md`, `.csv`, or `.txt` file drag-dropped or file-picked by the user |
+
+ #### 2. Index modes (`App.tsx`)
+
+ | Mode | Description | API key needed? |
+ |---|---|---|
+ | **Keyword** | Calls `extractCsvPages` → `buildReverseIndex({ mode: 'keyword' })`. Pure TF-IDF, zero LLM calls. | ❌ No |
+ | **Full LLM** | Full pipeline: extract → `pageIndex` / `pageIndexMd` → `buildReverseIndex`. LLM reasons about structure, generates summaries, and builds a semantic tree. | ✅ Yes |
+
+ #### 3. The build pipeline (`App.tsx` — `handleBuild`)
+
+ ```ts
+ // ── CSV / Keyword mode ────────────────────────────────────────────────────
+ import { extractCsvPages, buildReverseIndex } from 'react-native-pageindex';
+
+ const pages = await extractCsvPages(csvText, { rowsPerPage: 10 });
+
+ const result = { // flat PageIndexResult (no LLM)
+   doc_name: fileName,
+   structure: { children: pages.map((p, i) => ({ title: `Rows ${i*10+1}–${(i+1)*10}`, node_id: `g${i}`, start_index: i, end_index: i })) },
+ };
+
+ const index = await buildReverseIndex({ result, pages, options: { mode: 'keyword' } });
+
+
+ // ── PDF / LLM mode ────────────────────────────────────────────────────────
+ import { pageIndex, buildReverseIndex } from 'react-native-pageindex';
+ import { extractPdfPagesFromBuffer } from './demoExtractors'; // pdfjs-dist wrapper
+
+ const pages = await extractPdfPagesFromBuffer(arrayBuffer); // uses pdfjs-dist v5
+
+ const result = await pageIndex({
+   pages,
+   docName: 'Crop Production Guide',
+   llm, // passed from LLM config panel
+   options: {
+     onProgress: ({ step, percent, detail }) => setProgress({ step, percent, detail }),
+   },
+ });
+
+ const index = await buildReverseIndex({ result, pages, llm, options: { mode: 'keyword' } });
+
+
+ // ── HTML / Markdown / TXT / Upload mode ──────────────────────────────────
+ import { pageIndexMd, buildReverseIndex } from 'react-native-pageindex';
+ import { htmlToMarkdown } from './demoExtractors';
+
+ const markdown = fileType === 'html' ? htmlToMarkdown(rawText) : rawText;
+
+ const result = await pageIndexMd({
+   content: markdown,
+   docName: fileName,
+   llm,
+   options: { onProgress: setProgress },
+ });
+
+ const index = await buildReverseIndex({ result, llm, options: { mode: 'keyword' } });
+ ```
+
+ #### 4. PDF extraction in the browser (`demoExtractors.ts`)
+
+ `pdfjs-dist` requires a Web Worker. In a Vite app the worker URL is resolved at build time using the `?url` import suffix:
+
+ ```ts
+ // demoExtractors.ts
+ import pdfjsWorkerSrc from 'pdfjs-dist/build/pdf.worker.min.mjs?url';
+ import type { PageData } from 'react-native-pageindex';
+
+ export async function extractPdfPagesFromBuffer(
+   buffer: ArrayBuffer,
+ ): Promise<PageData[]> {
+   const pdfjsLib = await import('pdfjs-dist');
+   pdfjsLib.GlobalWorkerOptions.workerSrc = pdfjsWorkerSrc; // local, not CDN
+
+   const pdf = await pdfjsLib.getDocument({ data: new Uint8Array(buffer) }).promise;
+   const pages: PageData[] = [];
+
+   for (let i = 1; i <= pdf.numPages; i++) {
+     const pg = await pdf.getPage(i);
+     const content = await pg.getTextContent();
+     const text = content.items.map((item: any) => item.str).join(' ');
+     pages.push({ text, tokenCount: Math.ceil(text.length / 4) });
+   }
+   return pages;
+ }
+ ```
+
+ > **Why `?url` and not a CDN link?**
+ > Pointing pdfjs at an external CDN URL fails if the CDN is unreachable or CORS-blocked. The `?url` import makes Vite serve the worker file locally from the same dev-server / bundle.
+
+ #### 5. LLM provider wiring (`llm.ts`)
+
+ The demo supports OpenAI and Anthropic out of the box. It adapts each provider's chat API to the `LLMProvider` interface that `react-native-pageindex` expects:
+
+ ```ts
+ // llm.ts — OpenAI adapter (simplified)
+ import type { LLMProvider } from 'react-native-pageindex';
+
+ const OPENAI_BASE = '/llm-proxy/openai'; // Vite dev-server proxy route
+
+ export function makeOpenAIProvider(apiKey: string, model: string): LLMProvider {
+   return async (prompt, opts) => {
+     const res = await fetch(`${OPENAI_BASE}/v1/chat/completions`, {
+       method: 'POST',
+       headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
+       body: JSON.stringify({
+         model,
+         messages: [...(opts?.chatHistory ?? []), { role: 'user', content: prompt }],
+       }),
+     });
+     const data = await res.json();
+     if (!res.ok) throw new Error(`OpenAI ${res.status}: ${data.error?.message}`);
+     return { content: data.choices[0].message.content, finishReason: data.choices[0].finish_reason };
+   };
+ }
+ ```
+
+ OpenAI calls in the browser are routed through a **Vite dev-server proxy** (`/llm-proxy/openai → https://api.openai.com`) to avoid CORS errors. Anthropic supports the `anthropic-dangerous-direct-browser-access: true` header so no proxy is needed.
+
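A proxy route like the one described above can be sketched in `vite.config.ts` — this is an assumed shape for illustration, not the demo's exact configuration:

```typescript
// vite.config.ts — dev-server proxy sketch (assumed shape; the demo's real
// config may add headers or extra routes).
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    proxy: {
      // The browser fetches /llm-proxy/openai/v1/chat/completions;
      // Vite strips the prefix and forwards to https://api.openai.com.
      '/llm-proxy/openai': {
        target: 'https://api.openai.com',
        changeOrigin: true,
        rewrite: (path) => path.replace(/^\/llm-proxy\/openai/, ''),
      },
    },
  },
});
```

Note that a dev-server proxy only exists while `npm run dev` is running; a production deployment would need its own reverse proxy or backend.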
+ #### 6. Progress tracking (`ProgressDisplay.tsx`)
+
+ `pageIndex` and `pageIndexMd` both fire `onProgress` callbacks after every named step. The demo displays a live progress bar and step label:
+
+ ```ts
+ options: {
+   onProgress: ({ step, percent, detail }) => {
+     setProgress({ step, percent, detail }); // drives the progress bar in ProgressDisplay.tsx
+   },
+ }
+ ```
+
+ The PDF pipeline emits **13 named steps** (Initializing → Extracting PDF pages → Scanning for TOC → … → Done); the Markdown pipeline emits **8 steps**.
+
+ #### 7. Reverse index & search (`SearchPanel.tsx`)
+
+ After the index is built via `buildReverseIndex`, the demo passes the result to `searchReverseIndex` on every keystroke:
+
+ ```ts
+ import { searchReverseIndex } from 'react-native-pageindex';
+
+ const hits = searchReverseIndex(reverseIndex, query, 20);
+ // hits[0] = { nodeTitle, nodeId, score, matchedTerm, totalScore, pageRange, ... }
+ ```
+
+ Results are ranked by `totalScore`, and each card shows the matched term, score, confidence level (High / Medium / Low), and the page range covered by that tree node.
+
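The High / Medium / Low badge can be derived from the hit score with a small threshold helper — the cutoffs below are illustrative assumptions, not the demo's exact values:

```typescript
// Illustrative score → confidence mapping (thresholds are assumptions,
// not taken from SearchPanel.tsx).
type Confidence = 'High' | 'Medium' | 'Low';

function confidenceOf(totalScore: number): Confidence {
  if (totalScore >= 0.7) return 'High';
  if (totalScore >= 0.4) return 'Medium';
  return 'Low';
}
```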
+ #### 8. Chat mode (`ChatPanel.tsx`)
+
+ After the index is built, a **💬 Chat** tab appears alongside Tree View, Search, and Raw Pages. It implements a full multi-turn conversational loop over the indexed document:
+
+ 1. **Retrieve** — `searchReverseIndex(reverseIndex, question, 5)` fetches the top-5 relevant tree nodes.
+ 2. **Build context** — actual page text (or node summaries as fallback) from those nodes is injected into the prompt, capped at 1,200 chars per node.
+ 3. **Chat history** — the last 10 conversation turns are passed as `chatHistory` for multi-turn continuity.
+ 4. **Answer** — the configured `LLMProvider` returns the answer, which is displayed with collapsible **citations** (node title, node ID, page range, relevance score).
+
+ ```ts
+ import { searchReverseIndex } from 'react-native-pageindex';
+ import type { LLMProvider, ReverseIndex, PageIndexResult, PageData } from 'react-native-pageindex';
+
+ async function chat(
+   question: string,
+   reverseIndex: ReverseIndex,
+   result: PageIndexResult,
+   pages: PageData[],
+   llm: LLMProvider,
+   chatHistory: { role: 'user' | 'assistant'; content: string }[] = [],
+ ) {
+   // 1. Retrieve relevant nodes
+   const hits = searchReverseIndex(reverseIndex, question, 5);
+
+   // 2. Build grounded context from page text / node summaries
+   const contextParts = hits.map(hit => {
+     const pageRange = `${hit.startIndex ?? '?'}–${hit.endIndex ?? '?'}`;
+     const body = pages
+       .slice((hit.startIndex ?? 1) - 1, hit.endIndex ?? 1)
+       .map(p => p.text)
+       .join('\n')
+       .slice(0, 1200);
+     return `[${hit.nodeTitle} | pages ${pageRange}]\n${body || hit.summary}`;
+   });
+
+   const systemPrompt =
+     `You are a helpful assistant for "${result.doc_name}". ` +
+     `Use the provided sections as your primary source. ` +
+     `Always cite which section your answer comes from.`;
+
+   const userTurn =
+     `Relevant sections:\n\n${contextParts.join('\n\n---\n\n')}` +
+     `\n\nQuestion: ${question}`;
+
+   // 3. Call LLM with chat history
+   const response = await llm(userTurn, {
+     chatHistory: [
+       { role: 'system' as any, content: systemPrompt },
+       ...chatHistory,
+     ],
+   });
+
+   // 4. Return answer + citation metadata
+   return {
+     answer: response.content,
+     citations: hits.map(h => ({
+       title: h.nodeTitle,
+       nodeId: h.nodeId,
+       pages: `${h.startIndex}–${h.endIndex}`,
+       score: Math.round(h.totalScore * 100),
+     })),
+   };
+ }
+ ```
+
+ > **Chat requires an LLM provider.** In **Keyword mode** the Chat tab is visible but the `llm` ref is `null` — configure an API key in the sidebar and switch to **Full LLM** mode before building the index to enable chat.
+
+ #### 9. Running the demo locally
+
+ ```bash
+ git clone https://github.com/subham11/react-native-pageindex.git
+ cd react-native-pageindex/demo
+ npm install
+ npm run dev   # → http://localhost:5173
+ ```
+
+ Select **Sample CSV → Keyword** for an instant zero-API-key demo, or select **Sample PDF → Full LLM**, enter an OpenAI or Anthropic key, and click **Build LLM Index** to see the full semantic-tree pipeline in action. Once the index is built, switch to the **💬 Chat** tab to start a conversation with the document.
+
+ ---
+
  ## Features
 
  | Feature | Detail |
@@ -16,6 +275,7 @@ No vector database required. Instead of embeddings, the library uses the LLM to
  | **Multi-format** | PDF, Word (.docx), CSV, Spreadsheet (.xlsx/.xls), Markdown |
  | **Forward index** | Hierarchical tree: chapters → sections → subsections |
  | **Reverse index** | Inverted index: term → node locations for fast lookup |
+ | **Conversational chat** | Multi-turn Q&A with cited answers, backed by the reverse index |
  | **Provider-agnostic** | Pass any LLM (OpenAI, Anthropic, Ollama, Gemini…) |
  | **Progress tracking** | Fine-grained per-step callbacks (13 PDF steps, 8 MD steps) |
  | **Fully typed** | 100% TypeScript, `.d.ts` declarations included |
@@ -367,6 +627,83 @@ const llm: LLMProvider = async (prompt) => {
 
  ---
 
+ ## Conversational Chat Mode
+
+ Once you have a `PageIndexResult` and a `ReverseIndex` you can add a full multi-turn chat interface to your app. The pattern is:
+
+ ```
+ User question
+   → searchReverseIndex()      ← retrieves the most relevant tree nodes
+   → build grounded context    ← page text / node summaries (no embeddings)
+   → LLMProvider()             ← any provider, with chat history
+   → cited answer              ← response + source metadata
+ ```
+
+ ### Minimal example
+
+ ```ts
+ import {
+   pageIndex,
+   buildReverseIndex,
+   searchReverseIndex,
+ } from 'react-native-pageindex';
+
+ // 1. Build the forward index (once per document)
+ const result = await pageIndex({ pages, llm, docName: 'My Docs' });
+
+ // 2. Build the reverse index (once per document)
+ const reverseIndex = await buildReverseIndex({ result, pages, options: { mode: 'keyword' } });
+
+ // 3. Chat loop
+ const history: { role: 'user' | 'assistant'; content: string }[] = [];
+
+ async function ask(question: string) {
+   // Retrieve top-5 relevant nodes
+   const hits = searchReverseIndex(reverseIndex, question, 5);
+
+   // Build grounded context
+   const context = hits
+     .map(h => `[${h.nodeTitle}]\n${h.summary ?? ''}`)
+     .join('\n\n---\n\n');
+
+   const userTurn = `Context:\n${context}\n\nQuestion: ${question}`;
+
+   // Call LLM with running history
+   const { content } = await llm(userTurn, { chatHistory: history });
+
+   // Update history for next turn
+   history.push({ role: 'user', content: question });
+   history.push({ role: 'assistant', content });
+
+   return {
+     answer: content,
+     sources: hits.map(h => ({ title: h.nodeTitle, pages: `${h.startIndex}–${h.endIndex}` })),
+   };
+ }
+
+ // Usage
+ const { answer, sources } = await ask('Best season to grow paddy in Odisha?');
+ console.log(answer);
+ // → "According to the 'Rice (Paddy) Cultivation' section, rice is primarily
+ //    a kharif crop … the best season is during the kharif / monsoon period."
+ console.log(sources);
+ // → [{ title: 'Rice (Paddy) Cultivation', pages: '12–15' }, ...]
+ ```
+
+ ### Chat in the browser demo
+
+ The demo app's **💬 Chat** tab is a fully featured implementation built on top of the pattern above:
+
+ - **Multi-turn** — up to 10 previous messages are sent as `chatHistory`, preserving conversational context.
+ - **Cited answers** — each response includes expandable source cards with node title, node ID, page range, and relevance score (0–100).
+ - **Grounded context** — actual page text is preferred over summaries; each node's contribution is capped at 1,200 chars to stay within token budgets.
+ - **Keyboard shortcuts** — Enter to send, Shift+Enter for a newline.
+ - **LLM providers** — OpenAI (via Vite dev-server proxy), Anthropic (direct), or Ollama (local) — configured in the sidebar before building the index.
+
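The Enter / Shift+Enter behavior reduces to a small keydown guard — a hypothetical helper for illustration, not the demo's exact code:

```typescript
// Hypothetical helper: true when a keystroke should send the message
// instead of inserting a newline (Enter sends, Shift+Enter breaks the line).
function shouldSend(key: string, shiftKey: boolean): boolean {
  return key === 'Enter' && !shiftKey;
}

// Wired into a React <textarea>:
// onKeyDown={(e) => {
//   if (shouldSend(e.key, e.shiftKey)) { e.preventDefault(); send(); }
// }}
```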
+ > **Tip:** For best chat quality, build the index in **Full LLM** mode (not Keyword mode) so each node has a rich LLM-generated summary the chat can draw on when no page text is available.
+
+ ---
+
  ## React Native Usage
 
  ```ts
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "react-native-pageindex",
- "version": "0.1.1",
+ "version": "0.1.3",
  "description": "Vectorless, reasoning-based RAG — builds a hierarchical tree index from PDF, DOCX, CSV, XLSX or Markdown using any LLM. React Native compatible.",
  "main": "dist/index.js",
  "module": "dist/index.js",