react-native-pageindex 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/CHANGELOG.md +22 -0
  2. package/README.md +337 -0
  3. package/package.json +1 -1
package/CHANGELOG.md CHANGED
@@ -6,6 +6,28 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and
 
  ---
 
+ ## [0.1.3] — 2026-03-08
+
+ ### Added
+ - **Chat mode** (`ChatPanel.tsx`) — conversational multi-turn Q&A over any indexed document, with collapsible cited-source cards per reply (node title, node ID, page range, relevance score)
+ - `docs/screenshots/demo-chat-mode.png` — screenshot of the Chat tab in action
+
+ ### Documentation
+ - Added `## Conversational Chat Mode` section to README with minimal code example and full description of the browser demo's Chat tab implementation
+ - Updated Demo section with Chat mode screenshot and walkthrough (section 8)
+ - Updated Features table to include **Conversational chat**
+
+ ---
+
+ ## [0.1.2] — 2026-03-08
+
+ ### Documentation
+ - Added full **Demo** section to README with screenshots and an 8-part walkthrough of the React demo app
+ - Covers: data sources, index modes, build pipeline code, PDF extraction with pdfjs-dist + Vite `?url`, LLM provider wiring, progress tracking, reverse-index search, and local setup instructions
+ - Added `docs/screenshots/demo-keyword-mode.png` and `docs/screenshots/demo-llm-mode.png`
+
+ ---
+
  ## [0.1.1] — 2026-03-08
 
  ### Fixed
package/README.md CHANGED
@@ -9,6 +9,265 @@ No vector database required. Instead of embeddings, the library uses the LLM to
 
  ---
 
+ ## Demo
+
+ A fully interactive React demo app is included in the [`demo/`](./demo) directory. It runs in the browser and showcases both index modes against two built-in datasets — no backend required.
+
+ ### Keyword mode — instant, no API key
+
+ > CSV dataset (100 farmers · 14 columns) indexed in **0.0 s** using TF-IDF scoring.
+
+ ![PageIndex Demo – Keyword mode](docs/screenshots/demo-keyword-mode.png)
+
+ ### LLM mode — semantic tree via any LLM
+
+ > 32-page farming PDF with a TOC parsed into **31 nodes** and **250 indexed terms** in ~214 s using `gpt-4o-mini`.
+
+ ![PageIndex Demo – LLM mode](docs/screenshots/demo-llm-mode.png)
+
+ ### Chat mode — conversational AI over any indexed document
+
+ > Ask natural-language questions and get cited answers backed by the reverse index. Multi-turn conversation with collapsible source references per reply.
+
+ ![PageIndex Demo – Chat mode](docs/screenshots/demo-chat-mode.png)
+
+ ---
+
+ ### How the demo is built
+
+ The demo is a **Vite + React + TypeScript** single-page app that wires `react-native-pageindex` directly into the browser. Below is a walkthrough of every layer.
+
+ #### 1. Data sources (`ConfigPanel.tsx`)
+
+ Three mutually exclusive source modes are offered:
+
+ | Mode | What it loads |
+ |---|---|
+ | **Sample CSV** | `farmer_dataset.csv` — 100 rows, 14 columns (crop, state, soil type, risk…) |
+ | **Sample PDF** | `crop_production_guide.pdf` — 32-page farming guide generated with `pdfkit`, complete with a dot-leader TOC and 7 chapters |
+ | **Upload** | Any `.pdf`, `.html`, `.md`, `.csv`, or `.txt` file drag-dropped or file-picked by the user |
+
+ #### 2. Index modes (`App.tsx`)
+
+ | Mode | Description | API key needed? |
+ |---|---|---|
+ | **Keyword** | Calls `extractCsvPages` → `buildReverseIndex({ mode: 'keyword' })`. Pure TF-IDF, zero LLM calls. | ❌ No |
+ | **Full LLM** | Full pipeline: extract → `pageIndex` / `pageIndexMd` → `buildReverseIndex`. LLM reasons about structure, generates summaries, and builds a semantic tree. | ✅ Yes |
+
+ #### 3. The build pipeline (`App.tsx` — `handleBuild`)
+
+ ```ts
+ // ── CSV / Keyword mode ────────────────────────────────────────────────────
+ import { extractCsvPages, buildReverseIndex } from 'react-native-pageindex';
+
+ const pages = await extractCsvPages(csvText, { rowsPerPage: 10 });
+
+ const result = { // flat PageIndexResult (no LLM)
+   doc_name: fileName,
+   structure: { children: pages.map((p, i) => ({ title: `Rows ${i*10+1}–${(i+1)*10}`, node_id: `g${i}`, start_index: i, end_index: i })) },
+ };
+
+ const index = await buildReverseIndex({ result, pages, options: { mode: 'keyword' } });
+
+
+ // ── PDF / LLM mode ────────────────────────────────────────────────────────
+ import { pageIndex, buildReverseIndex } from 'react-native-pageindex';
+ import { extractPdfPagesFromBuffer } from './demoExtractors'; // pdfjs-dist wrapper
+
+ const pages = await extractPdfPagesFromBuffer(arrayBuffer); // uses pdfjs-dist v5
+
+ const result = await pageIndex({
+   pages,
+   docName: 'Crop Production Guide',
+   llm, // passed from LLM config panel
+   options: {
+     onProgress: ({ step, percent, detail }) => setProgress({ step, percent, detail }),
+   },
+ });
+
+ const index = await buildReverseIndex({ result, pages, llm, options: { mode: 'keyword' } });
+
+
+ // ── HTML / Markdown / TXT / Upload mode ──────────────────────────────────
+ import { pageIndexMd, buildReverseIndex } from 'react-native-pageindex';
+ import { htmlToMarkdown } from './demoExtractors';
+
+ const markdown = fileType === 'html' ? htmlToMarkdown(rawText) : rawText;
+
+ const result = await pageIndexMd({
+   content: markdown,
+   docName: fileName,
+   llm,
+   options: { onProgress: setProgress },
+ });
+
+ const index = await buildReverseIndex({ result, llm, options: { mode: 'keyword' } });
+ ```
+
+ #### 4. PDF extraction in the browser (`demoExtractors.ts`)
+
+ `pdfjs-dist` requires a Web Worker. In a Vite app the worker URL is resolved at build time using the `?url` import suffix:
+
+ ```ts
+ // demoExtractors.ts
+ import pdfjsWorkerSrc from 'pdfjs-dist/build/pdf.worker.min.mjs?url';
+ import type { PageData } from 'react-native-pageindex';
+
+ export async function extractPdfPagesFromBuffer(
+   buffer: ArrayBuffer,
+ ): Promise<PageData[]> {
+   const pdfjsLib = await import('pdfjs-dist');
+   pdfjsLib.GlobalWorkerOptions.workerSrc = pdfjsWorkerSrc; // local, not CDN
+
+   const pdf = await pdfjsLib.getDocument({ data: new Uint8Array(buffer) }).promise;
+   const pages: PageData[] = [];
+
+   for (let i = 1; i <= pdf.numPages; i++) {
+     const pg = await pdf.getPage(i);
+     const content = await pg.getTextContent();
+     const text = content.items.map((item: any) => item.str).join(' ');
+     pages.push({ text, tokenCount: Math.ceil(text.length / 4) });
+   }
+   return pages;
+ }
+ ```
+
+ > **Why `?url` and not a CDN link?**
+ > Pointing pdfjs at an external CDN URL fails if the CDN is unreachable or CORS-blocked. The `?url` import makes Vite serve the worker file locally from the same dev-server / bundle.
+
+ #### 5. LLM provider wiring (`llm.ts`)
+
+ The demo supports OpenAI and Anthropic out of the box. It adapts each provider's chat API to the `LLMProvider` interface that `react-native-pageindex` expects:
+
+ ```ts
+ // llm.ts — OpenAI adapter (simplified)
+ import type { LLMProvider } from 'react-native-pageindex';
+
+ const OPENAI_BASE = '/llm-proxy/openai'; // Vite dev-server proxy route
+
+ export function makeOpenAIProvider(apiKey: string, model: string): LLMProvider {
+   return async (prompt, opts) => {
+     const res = await fetch(`${OPENAI_BASE}/v1/chat/completions`, {
+       method: 'POST',
+       headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
+       body: JSON.stringify({
+         model,
+         messages: [...(opts?.chatHistory ?? []), { role: 'user', content: prompt }],
+       }),
+     });
+     const data = await res.json();
+     if (!res.ok) throw new Error(`OpenAI ${res.status}: ${data.error?.message}`);
+     return { content: data.choices[0].message.content, finishReason: data.choices[0].finish_reason };
+   };
+ }
+ ```
+
+ OpenAI calls in the browser are routed through a **Vite dev-server proxy** (`/llm-proxy/openai → https://api.openai.com`) to avoid CORS errors. Anthropic supports the `anthropic-dangerous-direct-browser-access: true` header so no proxy is needed.
+
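A proxy route like the one described above can be sketched in `vite.config.ts` — this is an assumed shape for illustration, not the demo's exact configuration:

```typescript
// vite.config.ts — dev-server proxy sketch (assumed shape; the demo's real
// config may add headers or extra routes).
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    proxy: {
      // The browser fetches /llm-proxy/openai/v1/chat/completions;
      // Vite strips the prefix and forwards to https://api.openai.com.
      '/llm-proxy/openai': {
        target: 'https://api.openai.com',
        changeOrigin: true,
        rewrite: (path) => path.replace(/^\/llm-proxy\/openai/, ''),
      },
    },
  },
});
```

Note that a dev-server proxy only exists while `npm run dev` is running; a production deployment would need its own reverse proxy or backend.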
+ #### 6. Progress tracking (`ProgressDisplay.tsx`)
+
+ `pageIndex` and `pageIndexMd` both fire `onProgress` callbacks after every named step. The demo displays a live progress bar and step label:
+
+ ```ts
+ options: {
+   onProgress: ({ step, percent, detail }) => {
+     setProgress({ step, percent, detail }); // drives the progress bar in ProgressDisplay.tsx
+   },
+ }
+ ```
+
+ The PDF pipeline emits **13 named steps** (Initializing → Extracting PDF pages → Scanning for TOC → … → Done); the Markdown pipeline emits **8 steps**.
+
+ #### 7. Reverse index & search (`SearchPanel.tsx`)
+
+ After the index is built via `buildReverseIndex`, the demo passes the result to `searchReverseIndex` on every keystroke:
+
+ ```ts
+ import { searchReverseIndex } from 'react-native-pageindex';
+
+ const hits = searchReverseIndex(reverseIndex, query, 20);
+ // hits[0] = { nodeTitle, nodeId, score, matchedTerm, totalScore, pageRange, ... }
+ ```
+
+ Results are ranked by `totalScore`, and each card shows the matched term, score, confidence level (High / Medium / Low), and the page range covered by that tree node.
+
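The High / Medium / Low badge can be derived from the hit score with a small threshold helper — the cutoffs below are illustrative assumptions, not the demo's exact values:

```typescript
// Illustrative score → confidence mapping (thresholds are assumptions,
// not taken from SearchPanel.tsx).
type Confidence = 'High' | 'Medium' | 'Low';

function confidenceOf(totalScore: number): Confidence {
  if (totalScore >= 0.7) return 'High';
  if (totalScore >= 0.4) return 'Medium';
  return 'Low';
}
```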
+ #### 8. Chat mode (`ChatPanel.tsx`)
+
+ After the index is built, a **💬 Chat** tab appears alongside Tree View, Search, and Raw Pages. It implements a full multi-turn conversational loop over the indexed document:
+
+ 1. **Retrieve** — `searchReverseIndex(reverseIndex, question, 5)` fetches the top-5 relevant tree nodes.
+ 2. **Build context** — actual page text (or node summaries as fallback) from those nodes is injected into the prompt, capped at 1,200 chars per node.
+ 3. **Chat history** — the last 10 conversation turns are passed as `chatHistory` for multi-turn continuity.
+ 4. **Answer** — the configured `LLMProvider` returns the answer, which is displayed with collapsible **citations** (node title, node ID, page range, relevance score).
+
+ ```ts
+ import { searchReverseIndex } from 'react-native-pageindex';
+ import type { LLMProvider, ReverseIndex, PageIndexResult, PageData } from 'react-native-pageindex';
+
+ async function chat(
+   question: string,
+   reverseIndex: ReverseIndex,
+   result: PageIndexResult,
+   pages: PageData[],
+   llm: LLMProvider,
+   chatHistory: { role: 'user' | 'assistant'; content: string }[] = [],
+ ) {
+   // 1. Retrieve relevant nodes
+   const hits = searchReverseIndex(reverseIndex, question, 5);
+
+   // 2. Build grounded context from page text / node summaries
+   const contextParts = hits.map(hit => {
+     const pageRange = `${hit.startIndex ?? '?'}–${hit.endIndex ?? '?'}`;
+     const body = pages
+       .slice((hit.startIndex ?? 1) - 1, hit.endIndex ?? 1)
+       .map(p => p.text)
+       .join('\n')
+       .slice(0, 1200);
+     return `[${hit.nodeTitle} | pages ${pageRange}]\n${body || hit.summary}`;
+   });
+
+   const systemPrompt =
+     `You are a helpful assistant for "${result.doc_name}". ` +
+     `Use the provided sections as your primary source. ` +
+     `Always cite which section your answer comes from.`;
+
+   const userTurn =
+     `Relevant sections:\n\n${contextParts.join('\n\n---\n\n')}` +
+     `\n\nQuestion: ${question}`;
+
+   // 3. Call LLM with chat history
+   const response = await llm(userTurn, {
+     chatHistory: [
+       { role: 'system' as any, content: systemPrompt },
+       ...chatHistory,
+     ],
+   });
+
+   // 4. Return answer + citation metadata
+   return {
+     answer: response.content,
+     citations: hits.map(h => ({
+       title: h.nodeTitle,
+       nodeId: h.nodeId,
+       pages: `${h.startIndex}–${h.endIndex}`,
+       score: Math.round(h.totalScore * 100),
+     })),
+   };
+ }
+ ```
+
+ > **Chat requires an LLM provider.** In **Keyword mode** the Chat tab is visible but the `llm` ref is `null` — configure an API key in the sidebar and switch to **Full LLM** mode before building the index to enable chat.
+
+ #### 9. Running the demo locally
+
+ ```bash
+ git clone https://github.com/subham11/react-native-pageindex.git
+ cd react-native-pageindex/demo
+ npm install
+ npm run dev   # → http://localhost:5173
+ ```
+
+ Select **Sample CSV → Keyword** for an instant zero-API-key demo, or select **Sample PDF → Full LLM**, enter an OpenAI or Anthropic key, and click **Build LLM Index** to see the full semantic-tree pipeline in action. Once the index is built, switch to the **💬 Chat** tab to start a conversation with the document.
+
+ ---
+
  ## Features
 
  | Feature | Detail |
@@ -16,6 +275,7 @@ No vector database required. Instead of embeddings, the library uses the LLM to
  | **Multi-format** | PDF, Word (.docx), CSV, Spreadsheet (.xlsx/.xls), Markdown |
  | **Forward index** | Hierarchical tree: chapters → sections → subsections |
  | **Reverse index** | Inverted index: term → node locations for fast lookup |
+ | **Conversational chat** | Multi-turn Q&A with cited answers, backed by the reverse index |
  | **Provider-agnostic** | Pass any LLM (OpenAI, Anthropic, Ollama, Gemini…) |
  | **Progress tracking** | Fine-grained per-step callbacks (13 PDF steps, 8 MD steps) |
  | **Fully typed** | 100% TypeScript, `.d.ts` declarations included |
@@ -367,6 +627,83 @@ const llm: LLMProvider = async (prompt) => {
 
  ---
 
+ ## Conversational Chat Mode
+
+ Once you have a `PageIndexResult` and a `ReverseIndex` you can add a full multi-turn chat interface to your app. The pattern is:
+
+ ```
+ User question
+   → searchReverseIndex()      ← retrieves the most relevant tree nodes
+   → build grounded context    ← page text / node summaries (no embeddings)
+   → LLMProvider()             ← any provider, with chat history
+   → cited answer              ← response + source metadata
+ ```
+
+ ### Minimal example
+
+ ```ts
+ import {
+   pageIndex,
+   buildReverseIndex,
+   searchReverseIndex,
+ } from 'react-native-pageindex';
+
+ // 1. Build the forward index (once per document)
+ const result = await pageIndex({ pages, llm, docName: 'My Docs' });
+
+ // 2. Build the reverse index (once per document)
+ const reverseIndex = await buildReverseIndex({ result, pages, options: { mode: 'keyword' } });
+
+ // 3. Chat loop
+ const history: { role: 'user' | 'assistant'; content: string }[] = [];
+
+ async function ask(question: string) {
+   // Retrieve top-5 relevant nodes
+   const hits = searchReverseIndex(reverseIndex, question, 5);
+
+   // Build grounded context
+   const context = hits
+     .map(h => `[${h.nodeTitle}]\n${h.summary ?? ''}`)
+     .join('\n\n---\n\n');
+
+   const userTurn = `Context:\n${context}\n\nQuestion: ${question}`;
+
+   // Call LLM with running history
+   const { content } = await llm(userTurn, { chatHistory: history });
+
+   // Update history for next turn
+   history.push({ role: 'user', content: question });
+   history.push({ role: 'assistant', content });
+
+   return {
+     answer: content,
+     sources: hits.map(h => ({ title: h.nodeTitle, pages: `${h.startIndex}–${h.endIndex}` })),
+   };
+ }
+
+ // Usage
+ const { answer, sources } = await ask('Best season to grow paddy in Odisha?');
+ console.log(answer);
+ // → "According to the 'Rice (Paddy) Cultivation' section, rice is primarily
+ //    a kharif crop … the best season is during the kharif / monsoon period."
+ console.log(sources);
+ // → [{ title: 'Rice (Paddy) Cultivation', pages: '12–15' }, ...]
+ ```
+
+ ### Chat in the browser demo
+
+ The demo app's **💬 Chat** tab is a fully featured implementation built on top of the pattern above:
+
+ - **Multi-turn** — up to 10 previous messages are sent as `chatHistory`, preserving conversational context.
+ - **Cited answers** — each response includes expandable source cards with node title, node ID, page range, and relevance score (0–100).
+ - **Grounded context** — actual page text is preferred over summaries; each node's contribution is capped at 1,200 chars to stay within token budgets.
+ - **Keyboard shortcuts** — Enter to send, Shift+Enter for a newline.
+ - **LLM providers** — OpenAI (via Vite dev-server proxy), Anthropic (direct), or Ollama (local) — configured in the sidebar before building the index.
+
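The Enter / Shift+Enter behavior reduces to a small keydown guard — a hypothetical helper for illustration, not the demo's exact code:

```typescript
// Hypothetical helper: true when a keystroke should send the message
// instead of inserting a newline (Enter sends, Shift+Enter breaks the line).
function shouldSend(key: string, shiftKey: boolean): boolean {
  return key === 'Enter' && !shiftKey;
}

// Wired into a React <textarea>:
// onKeyDown={(e) => {
//   if (shouldSend(e.key, e.shiftKey)) { e.preventDefault(); send(); }
// }}
```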
+ > **Tip:** For best chat quality, build the index in **Full LLM** mode (not Keyword mode) so each node has a rich LLM-generated summary the chat can draw on when no page text is available.
+
+ ---
+
  ## React Native Usage
 
  ```ts
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "react-native-pageindex",
- "version": "0.1.1",
+ "version": "0.1.3",
  "description": "Vectorless, reasoning-based RAG — builds a hierarchical tree index from PDF, DOCX, CSV, XLSX or Markdown using any LLM. React Native compatible.",
  "main": "dist/index.js",
  "module": "dist/index.js",