react-native-pageindex 0.1.1 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/CHANGELOG.md +9 -0
  2. package/README.md +186 -0
  3. package/package.json +1 -1
package/CHANGELOG.md CHANGED
@@ -6,6 +6,15 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and
6
6
 
7
7
  ---
8
8
 
9
+ ## [0.1.2] — 2026-03-08
10
+
11
+ ### Documentation
12
+ - Added full **Demo** section to README with screenshots and an 8-part walkthrough of the React demo app
13
+ - Covers: data sources, index modes, build pipeline code, PDF extraction with pdfjs-dist + Vite `?url`, LLM provider wiring, progress tracking, reverse-index search, and local setup instructions
14
+ - Added `docs/screenshots/demo-keyword-mode.png` and `docs/screenshots/demo-llm-mode.png`
15
+
16
+ ---
17
+
9
18
  ## [0.1.1] — 2026-03-08
10
19
 
11
20
  ### Fixed
package/README.md CHANGED
@@ -9,6 +9,192 @@ No vector database required. Instead of embeddings, the library uses the LLM to
9
9
 
10
10
  ---
11
11
 
12
+ ## Demo
13
+
14
+ A fully interactive React demo app is included in the [`demo/`](./demo) directory. It runs in the browser and showcases both index modes against two built-in datasets — no backend required.
15
+
16
+ ### Keyword mode — instant, no API key
17
+
18
+ > CSV dataset (100 farmers · 14 columns) indexed in **0.0 s** using TF-IDF scoring.
19
+
20
+ ![PageIndex Demo – Keyword mode](docs/screenshots/demo-keyword-mode.png)
21
+
22
+ ### LLM mode — semantic tree via any LLM
23
+
24
+ > 32-page farming PDF with a TOC parsed into **31 nodes** and **250 indexed terms** in ~214 s using `gpt-4o-mini`.
25
+
26
+ ![PageIndex Demo – LLM mode](docs/screenshots/demo-llm-mode.png)
27
+
28
+ ---
29
+
30
+ ### How the demo is built
31
+
32
+ The demo is a **Vite + React + TypeScript** single-page app that wires `react-native-pageindex` directly in the browser. Below is a walkthrough of every layer.
33
+
34
+ #### 1. Data sources (`ConfigPanel.tsx`)
35
+
36
+ Three mutually exclusive source modes are offered:
37
+
38
+ | Mode | What it loads |
39
+ |---|---|
40
+ | **Sample CSV** | `farmer_dataset.csv` — 100 rows, 14 columns (crop, state, soil type, risk…) |
41
+ | **Sample PDF** | `crop_production_guide.pdf` — 32-page farming guide generated with `pdfkit`, complete with a dot-leader TOC and 7 chapters |
42
+ | **Upload** | Any `.pdf`, `.html`, `.md`, `.csv`, or `.txt` file drag-dropped or file-picked by the user |
43
+
44
+ #### 2. Index modes (`App.tsx`)
45
+
46
+ | Mode | Description | API key needed? |
47
+ |---|---|---|
48
+ | **Keyword** | Calls `extractCsvPages` → `buildReverseIndex({ mode: 'keyword' })`. Pure TF-IDF, zero LLM calls. | ❌ No |
49
+ | **Full LLM** | Full pipeline: extract → `pageIndex` / `pageIndexMd` → `buildReverseIndex`. LLM reasons about structure, generates summaries, and builds a semantic tree. | ✅ Yes |
50
+
51
+ #### 3. The build pipeline (`App.tsx` — `handleBuild`)
52
+
53
+ ```ts
54
+ // ── CSV / Keyword mode ────────────────────────────────────────────────────
55
+ import { extractCsvPages, buildReverseIndex } from 'react-native-pageindex';
56
+
57
+ const pages = await extractCsvPages(csvText, { rowsPerPage: 10 });
58
+
59
+ const result = { // flat PageIndexResult (no LLM)
60
+ doc_name: fileName,
61
+ structure: { children: pages.map((p, i) => ({ title: `Rows ${i*10+1}–${(i+1)*10}`, node_id: `g${i}`, start_index: i, end_index: i })) },
62
+ };
63
+
64
+ const index = await buildReverseIndex({ result, pages, options: { mode: 'keyword' } });
65
+
66
+
67
+ // ── PDF / LLM mode ────────────────────────────────────────────────────────
68
+ import { pageIndex, buildReverseIndex } from 'react-native-pageindex';
69
+ import { extractPdfPagesFromBuffer } from './demoExtractors'; // pdfjs-dist wrapper
70
+
71
+ const pages = await extractPdfPagesFromBuffer(arrayBuffer); // uses pdfjs-dist v5
72
+
73
+ const result = await pageIndex({
74
+ pages,
75
+ docName: 'Crop Production Guide',
76
+ llm, // passed from LLM config panel
77
+ options: {
78
+ onProgress: ({ step, percent, detail }) => setProgress({ step, percent, detail }),
79
+ },
80
+ });
81
+
82
+ const index = await buildReverseIndex({ result, pages, llm, options: { mode: 'keyword' } });
83
+
84
+
85
+ // ── HTML / Markdown / TXT / Upload mode ──────────────────────────────────
86
+ import { pageIndexMd, buildReverseIndex } from 'react-native-pageindex';
87
+ import { htmlToMarkdown } from './demoExtractors';
88
+
89
+ const markdown = fileType === 'html' ? htmlToMarkdown(rawText) : rawText;
90
+
91
+ const result = await pageIndexMd({
92
+ content: markdown,
93
+ docName: fileName,
94
+ llm,
95
+ options: { onProgress: setProgress },
96
+ });
97
+
98
+ const index = await buildReverseIndex({ result, llm, options: { mode: 'keyword' } });
99
+ ```
100
+
101
+ #### 4. PDF extraction in the browser (`demoExtractors.ts`)
102
+
103
+ `pdfjs-dist` requires a Web Worker. In a Vite app the worker URL is resolved at build time using the `?url` import suffix:
104
+
105
+ ```ts
106
+ // demoExtractors.ts
107
+ import pdfjsWorkerSrc from 'pdfjs-dist/build/pdf.worker.min.mjs?url';
108
+
109
+ export async function extractPdfPagesFromBuffer(
110
+ buffer: ArrayBuffer,
111
+ ): Promise<PageData[]> {
112
+ const pdfjsLib = await import('pdfjs-dist');
113
+ pdfjsLib.GlobalWorkerOptions.workerSrc = pdfjsWorkerSrc; // local, not CDN
114
+
115
+ const pdf = await pdfjsLib.getDocument({ data: new Uint8Array(buffer) }).promise;
116
+ const pages: PageData[] = [];
117
+
118
+ for (let i = 1; i <= pdf.numPages; i++) {
119
+ const pg = await pdf.getPage(i);
120
+ const content = await pg.getTextContent();
121
+ const text = content.items.map((item: any) => item.str).join(' ');
122
+ pages.push({ text, tokenCount: Math.ceil(text.length / 4) });
123
+ }
124
+ return pages;
125
+ }
126
+ ```
127
+
128
+ > **Why `?url` and not a CDN link?**
129
+ > Pointing pdfjs at an external CDN URL fails if the CDN is unreachable or CORS-blocked. The `?url` import makes Vite serve the worker file locally from the same dev-server / bundle.
130
+
131
+ #### 5. LLM provider wiring (`llm.ts`)
132
+
133
+ The demo supports OpenAI and Anthropic out of the box. It bridges each provider's SDK into the `LLMProvider` interface that `react-native-pageindex` expects:
134
+
135
+ ```ts
136
+ // llm.ts — OpenAI adapter (simplified)
137
+ import type { LLMProvider } from 'react-native-pageindex';
138
+
139
+ export function makeOpenAIProvider(apiKey: string, model: string): LLMProvider {
140
+ return async (prompt, opts) => {
141
+ const res = await fetch(`${OPENAI_BASE}/v1/chat/completions`, {
142
+ method: 'POST',
143
+ headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
144
+ body: JSON.stringify({
145
+ model,
146
+ messages: [...(opts?.chatHistory ?? []), { role: 'user', content: prompt }],
147
+ }),
148
+ });
149
+ const data = await res.json();
150
+ if (!res.ok) throw new Error(`OpenAI ${res.status}: ${data.error?.message}`);
151
+ return { content: data.choices[0].message.content, finishReason: data.choices[0].finish_reason };
152
+ };
153
+ }
154
+ ```
155
+
156
+ OpenAI calls in the browser are routed through a **Vite dev-server proxy** (`/llm-proxy/openai → https://api.openai.com`) to avoid CORS errors. Anthropic supports the `anthropic-dangerous-direct-browser-access: true` header so no proxy is needed.
157
+
158
+ #### 6. Progress tracking (`ProgressDisplay.tsx`)
159
+
160
+ `pageIndex` and `pageIndexMd` both fire `onProgress` callbacks after every named step. The demo displays a live progress bar and step label:
161
+
162
+ ```ts
163
+ options: {
164
+ onProgress: ({ step, percent, detail }) => {
165
+ setProgress({ step, percent, detail }); // drives the progress bar in ProgressDisplay.tsx
166
+ },
167
+ }
168
+ ```
169
+
170
+ The PDF pipeline emits **13 named steps** (Initializing → Extracting PDF pages → Scanning for TOC → … → Done); the Markdown pipeline emits **8 steps**.
171
+
172
+ #### 7. Reverse index & search (`SearchPanel.tsx`)
173
+
174
+ Once `buildReverseIndex` has produced the index, the demo passes it to `searchReverseIndex` on every keystroke:
175
+
176
+ ```ts
177
+ import { searchReverseIndex } from 'react-native-pageindex';
178
+
179
+ const hits = searchReverseIndex(reverseIndex, query, 20);
180
+ // hits[0] = { nodeTitle, nodeId, score, matchedTerm, totalScore, pageRange, ... }
181
+ ```
182
+
183
+ Results are ranked by `totalScore` and each card shows the matched term, score, confidence level (High / Medium / Low), and the page range covered by that tree node.
184
+
185
+ #### 8. Running the demo locally
186
+
187
+ ```bash
188
+ git clone https://github.com/subham11/react-native-pageindex.git
189
+ cd react-native-pageindex/demo
190
+ npm install
191
+ npm run dev # → http://localhost:5173
192
+ ```
193
+
194
+ Select **Sample CSV → Keyword** for an instant zero-API-key demo, or select **Sample PDF → Full LLM**, enter an OpenAI or Anthropic key, and click **Build LLM Index** to see the full semantic-tree pipeline in action.
195
+
196
+ ---
197
+
12
198
  ## Features
13
199
 
14
200
  | Feature | Detail |
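
Reviewer note on the README hunk above: the Keyword row describes that mode as "Pure TF-IDF, zero LLM calls". To make that concrete, here is a minimal sketch of TF-IDF scoring over page texts. It is an illustration under stated assumptions, not the library's implementation: `Posting`, `tokenize`, and `buildTfIdfIndex` are hypothetical names, and the real `buildReverseIndex` may tokenize, weight, and store terms differently.

```ts
// Hypothetical sketch; names and weighting are assumptions, not the library's API.
type Posting = { page: number; score: number };

// Lowercase the text and split it into alphanumeric tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Build a term -> postings map where score = tf * idf.
function buildTfIdfIndex(pages: string[]): Map<string, Posting[]> {
  // Per-page term counts.
  const termCounts = pages.map((text) => {
    const counts = new Map<string, number>();
    tokenize(text).forEach((term) => counts.set(term, (counts.get(term) ?? 0) + 1));
    return counts;
  });

  // Document frequency: how many pages contain each term.
  const docFreq = new Map<string, number>();
  termCounts.forEach((counts) => {
    counts.forEach((_count, term) => docFreq.set(term, (docFreq.get(term) ?? 0) + 1));
  });

  const index = new Map<string, Posting[]>();
  termCounts.forEach((counts, page) => {
    let total = 0;
    counts.forEach((count) => { total += count; });
    counts.forEach((count, term) => {
      const tf = count / total;                                   // term frequency within the page
      const idf = Math.log(pages.length / (docFreq.get(term) ?? 1)); // rarer terms weigh more
      const postings = index.get(term) ?? [];
      postings.push({ page, score: tf * idf });
      index.set(term, postings);
    });
  });

  // Highest-scoring page first for each term.
  index.forEach((postings) => postings.sort((a, b) => b.score - a.score));
  return index;
}

const index = buildTfIdfIndex([
  'wheat grows in loamy soil',
  'rice needs clay soil and water',
  'wheat wheat harvest',
]);
console.log(index.get('wheat')); // page 2 ranks ahead of page 0
```

With this plain `log(N / df)` weighting, a term that appears on every page scores zero, so a real index would likely smooth the IDF term or filter such terms out.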
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "react-native-pageindex",
3
- "version": "0.1.1",
3
+ "version": "0.1.2",
4
4
  "description": "Vectorless, reasoning-based RAG — builds a hierarchical tree index from PDF, DOCX, CSV, XLSX or Markdown using any LLM. React Native compatible.",
5
5
  "main": "dist/index.js",
6
6
  "module": "dist/index.js",