react-native-pageindex 0.1.1 → 0.1.2
- package/CHANGELOG.md +9 -0
- package/README.md +186 -0
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -6,6 +6,15 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and
 
 ---
 
+## [0.1.2] — 2026-03-08
+
+### Documentation
+- Added full **Demo** section to README with screenshots and an 8-part walkthrough of the React demo app
+- Covers: data sources, index modes, build pipeline code, PDF extraction with pdfjs-dist + Vite `?url`, LLM provider wiring, progress tracking, reverse-index search, and local setup instructions
+- Added `docs/screenshots/demo-keyword-mode.png` and `docs/screenshots/demo-llm-mode.png`
+
+---
+
 ## [0.1.1] — 2026-03-08
 
 ### Fixed
package/README.md
CHANGED
@@ -9,6 +9,192 @@ No vector database required. Instead of embeddings, the library uses the LLM to
 
 ---
 
+## Demo
+
+A fully interactive React demo app is included in the [`demo/`](./demo) directory. It runs in the browser and showcases both index modes against two built-in datasets — no backend required.
+
+### Keyword mode — instant, no API key
+
+> CSV dataset (100 farmers · 14 columns) indexed in **0.0 s** using TF-IDF scoring.
+
+
+
+### LLM mode — semantic tree via any LLM
+
+> 32-page farming PDF with a TOC parsed into **31 nodes** and **250 indexed terms** in ~214 s using `gpt-4o-mini`.
+
+
+
+---
+
+### How the demo is built
+
+The demo is a **Vite + React + TypeScript** single-page app that wires `react-native-pageindex` directly in the browser. Below is a walkthrough of every layer.
+
+#### 1. Data sources (`ConfigPanel.tsx`)
+
+Three mutually exclusive source modes are offered:
+
+| Mode | What it loads |
+|---|---|
+| **Sample CSV** | `farmer_dataset.csv` — 100 rows, 14 columns (crop, state, soil type, risk…) |
+| **Sample PDF** | `crop_production_guide.pdf` — 32-page farming guide generated with `pdfkit`, complete with a dot-leader TOC and 7 chapters |
+| **Upload** | Any `.pdf`, `.html`, `.md`, `.csv`, or `.txt` file drag-dropped or file-picked by the user |
+
+#### 2. Index modes (`App.tsx`)
+
+| Mode | Description | API key needed? |
+|---|---|---|
+| **Keyword** | Calls `extractCsvPages` → `buildReverseIndex({ mode: 'keyword' })`. Pure TF-IDF, zero LLM calls. | ❌ No |
+| **Full LLM** | Full pipeline: extract → `pageIndex` / `pageIndexMd` → `buildReverseIndex`. LLM reasons about structure, generates summaries, and builds a semantic tree. | ✅ Yes |
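The Keyword row above describes the scoring as pure TF-IDF. As a rough illustration of that idea only, and not the library's actual implementation (its tokenizer and weighting are not shown in this diff), a minimal sketch might look like the following. The `tokenize` and `tfidf` helpers and the `Weights` type are hypothetical names introduced here.

```ts
// Hypothetical sketch of TF-IDF weighting over page texts.
// Not taken from react-native-pageindex; names and tokenizer are assumptions.
type Weights = Record<string, number>;

// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Returns one term -> weight map per page.
function tfidf(pages: string[]): Weights[] {
  const docs = pages.map(tokenize);

  // Document frequency: in how many pages does each term occur?
  const df: Weights = Object.create(null);
  docs.forEach((doc) => {
    const seen: Record<string, boolean> = Object.create(null);
    doc.forEach((term) => {
      if (!seen[term]) {
        seen[term] = true;
        df[term] = (df[term] || 0) + 1;
      }
    });
  });

  return docs.map((doc) => {
    // Raw term counts for this page.
    const counts: Weights = Object.create(null);
    doc.forEach((term) => {
      counts[term] = (counts[term] || 0) + 1;
    });

    // weight = (count / page length) * ln(total pages / pages containing term)
    const weights: Weights = Object.create(null);
    Object.keys(counts).forEach((term) => {
      const tf = counts[term] / doc.length;
      const idf = Math.log(pages.length / df[term]);
      weights[term] = tf * idf;
    });
    return weights;
  });
}

const exampleWeights = tfidf(['rice loam soil', 'wheat clay soil']);
```

With this unsmoothed formula, a term that occurs on every page (here `soil`) gets weight 0, while terms unique to one page score highest. Real implementations often smooth the IDF term to avoid that hard zero.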
+
+#### 3. The build pipeline (`App.tsx` — `handleBuild`)
+
+```ts
+// ── CSV / Keyword mode ────────────────────────────────────────────────────
+import { extractCsvPages, buildReverseIndex } from 'react-native-pageindex';
+
+const pages = await extractCsvPages(csvText, { rowsPerPage: 10 });
+
+const result = { // flat PageIndexResult (no LLM)
+  doc_name: fileName,
+  structure: { children: pages.map((p, i) => ({ title: `Rows ${i*10+1}–${(i+1)*10}`, node_id: `g${i}`, start_index: i, end_index: i })) },
+};
+
+const index = await buildReverseIndex({ result, pages, options: { mode: 'keyword' } });
+
+
+// ── PDF / LLM mode ────────────────────────────────────────────────────────
+import { pageIndex, buildReverseIndex } from 'react-native-pageindex';
+import { extractPdfPagesFromBuffer } from './demoExtractors'; // pdfjs-dist wrapper
+
+const pages = await extractPdfPagesFromBuffer(arrayBuffer); // uses pdfjs-dist v5
+
+const result = await pageIndex({
+  pages,
+  docName: 'Crop Production Guide',
+  llm, // passed from LLM config panel
+  options: {
+    onProgress: ({ step, percent, detail }) => setProgress({ step, percent, detail }),
+  },
+});
+
+const index = await buildReverseIndex({ result, pages, llm, options: { mode: 'keyword' } });
+
+
+// ── HTML / Markdown / TXT / Upload mode ──────────────────────────────────
+import { pageIndexMd, buildReverseIndex } from 'react-native-pageindex';
+import { htmlToMarkdown } from './demoExtractors';
+
+const markdown = fileType === 'html' ? htmlToMarkdown(rawText) : rawText;
+
+const result = await pageIndexMd({
+  content: markdown,
+  docName: fileName,
+  llm,
+  options: { onProgress: setProgress },
+});
+
+const index = await buildReverseIndex({ result, llm, options: { mode: 'keyword' } });
+```
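The three branches above are keyed by file type. One way an uploaded file might be routed to a branch is sketched below, assuming (as the three comment headers suggest, but the diff does not confirm) that `.csv` always goes to the keyword branch, `.pdf` to `pageIndex`, and `.html`, `.md`, and `.txt` to `pageIndexMd`. The `Pipeline` type and `pipelineFor` function are hypothetical names, not part of the demo or the library.

```ts
// Hypothetical routing helper; the branch labels are illustrative only.
type Pipeline = 'keyword-csv' | 'llm-pdf' | 'llm-markdown';

// Pick a pipeline from the file extension (case-insensitive).
function pipelineFor(fileName: string): Pipeline {
  const dot = fileName.lastIndexOf('.');
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : '';
  switch (ext) {
    case 'csv':
      return 'keyword-csv'; // extractCsvPages -> buildReverseIndex
    case 'pdf':
      return 'llm-pdf'; // extractPdfPagesFromBuffer -> pageIndex
    case 'html':
    case 'md':
    case 'txt':
      return 'llm-markdown'; // (htmlToMarkdown) -> pageIndexMd
    default:
      throw new Error('Unsupported file type: .' + ext);
  }
}
```

Throwing on unknown extensions keeps the accepted set identical to the upload list (`.pdf`, `.html`, `.md`, `.csv`, `.txt`) instead of silently falling through to the Markdown branch.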
+
+#### 4. PDF extraction in the browser (`demoExtractors.ts`)
+
+`pdfjs-dist` requires a Web Worker. In a Vite app the worker URL is resolved at build time using the `?url` import suffix:
+
+```ts
+// demoExtractors.ts
+import pdfjsWorkerSrc from 'pdfjs-dist/build/pdf.worker.min.mjs?url';
+
+export async function extractPdfPagesFromBuffer(
+  buffer: ArrayBuffer,
+): Promise<PageData[]> {
+  const pdfjsLib = await import('pdfjs-dist');
+  pdfjsLib.GlobalWorkerOptions.workerSrc = pdfjsWorkerSrc; // local, not CDN
+
+  const pdf = await pdfjsLib.getDocument({ data: new Uint8Array(buffer) }).promise;
+  const pages: PageData[] = [];
+
+  for (let i = 1; i <= pdf.numPages; i++) {
+    const pg = await pdf.getPage(i);
+    const content = await pg.getTextContent();
+    const text = content.items.map((item: any) => item.str).join(' ');
+    pages.push({ text, tokenCount: Math.ceil(text.length / 4) });
+  }
+  return pages;
+}
+```
+
+> **Why `?url` and not a CDN link?**
+> Pointing pdfjs at an external CDN URL fails if the CDN is unreachable or CORS-blocked. The `?url` import makes Vite serve the worker file locally from the same dev-server / bundle.
+
+#### 5. LLM provider wiring (`llm.ts`)
+
+The demo supports OpenAI and Anthropic out of the box. It bridges each provider's SDK into the `LLMProvider` interface that `react-native-pageindex` expects:
+
+```ts
+// llm.ts — OpenAI adapter (simplified)
+import type { LLMProvider } from 'react-native-pageindex';
+
+export function makeOpenAIProvider(apiKey: string, model: string): LLMProvider {
+  return async (prompt, opts) => {
+    const res = await fetch(`${OPENAI_BASE}/v1/chat/completions`, {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
+      body: JSON.stringify({
+        model,
+        messages: [...(opts?.chatHistory ?? []), { role: 'user', content: prompt }],
+      }),
+    });
+    const data = await res.json();
+    if (!res.ok) throw new Error(`OpenAI ${res.status}: ${data.error?.message}`);
+    return { content: data.choices[0].message.content, finishReason: data.choices[0].finish_reason };
+  };
+}
+```
+
+OpenAI calls in the browser are routed through a **Vite dev-server proxy** (`/llm-proxy/openai → https://api.openai.com`) to avoid CORS errors. Anthropic supports the `anthropic-dangerous-direct-browser-access: true` header so no proxy is needed.
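The demo's actual Vite configuration is not part of this diff. A proxy rule matching the mapping described above could plausibly be written with Vite's `server.proxy` option, as in the sketch below. The object is shown standalone so it can be read in isolation; in a real `vite.config.ts` it would be passed to `defineConfig` from `vite`. The `viteConfig` name is hypothetical, and the idea that `OPENAI_BASE` in the adapter resolves to the `/llm-proxy/openai` prefix is an assumption.

```ts
// Sketch of a dev-server proxy rule, assuming the prefix named in the README.
// In vite.config.ts this object would be wrapped in defineConfig(...).
const viteConfig = {
  server: {
    proxy: {
      '/llm-proxy/openai': {
        // Forward matching requests to the OpenAI API origin.
        target: 'https://api.openai.com',
        // Rewrite the Host header to the target origin.
        changeOrigin: true,
        // Strip the local prefix so /llm-proxy/openai/v1/... becomes /v1/...
        rewrite: (path: string) => path.replace(/^\/llm-proxy\/openai/, ''),
      },
    },
  },
};
```

Because the browser only ever talks to the dev server's own origin, the cross-origin request to `api.openai.com` is made by the dev server rather than by the page. This applies to `npm run dev` only; a production build would need its own proxy or backend.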
+
+#### 6. Progress tracking (`ProgressDisplay.tsx`)
+
+`pageIndex` and `pageIndexMd` both fire `onProgress` callbacks after every named step. The demo displays a live progress bar and step label:
+
+```ts
+options: {
+  onProgress: ({ step, percent, detail }) => {
+    setProgress({ step, percent, detail }); // drives the progress bar in ProgressDisplay.tsx
+  },
+}
+```
+
+The PDF pipeline emits **13 named steps** (Initializing → Extracting PDF pages → Scanning for TOC → … → Done); the Markdown pipeline emits **8 steps**.
+
+#### 7. Reverse index & search (`SearchPanel.tsx`)
+
+After the index is built, the demo calls `buildReverseIndex` then passes the result to `searchReverseIndex` on every keystroke:
+
+```ts
+import { searchReverseIndex } from 'react-native-pageindex';
+
+const hits = searchReverseIndex(reverseIndex, query, 20);
+// hits[0] = { nodeTitle, nodeId, score, matchedTerm, totalScore, pageRange, ... }
+```
+
+Results are ranked by `totalScore` and each card shows the matched term, score, confidence level (High / Medium / Low), and the page range covered by that tree node.
+
+#### 8. Running the demo locally
+
+```bash
+git clone https://github.com/subham11/react-native-pageindex.git
+cd react-native-pageindex/demo
+npm install
+npm run dev # → http://localhost:5173
+```
+
+Select **Sample CSV → Keyword** for an instant zero-API-key demo, or select **Sample PDF → Full LLM**, enter an OpenAI or Anthropic key, and click **Build LLM Index** to see the full semantic-tree pipeline in action.
+
+---
+
 ## Features
 
 | Feature | Detail |
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "react-native-pageindex",
-  "version": "0.1.1",
+  "version": "0.1.2",
   "description": "Vectorless, reasoning-based RAG — builds a hierarchical tree index from PDF, DOCX, CSV, XLSX or Markdown using any LLM. React Native compatible.",
   "main": "dist/index.js",
   "module": "dist/index.js",