india-reg-mcp 1.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Akhil Govind
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,292 @@
1
+ # india-reg-mcp
2
+
3
+ An MCP server that gives Claude (and any MCP client) searchable, cited access to **RBI and SEBI regulatory documents** — circulars, master directions, notifications, and regulations.
4
+
5
+ No API keys. No subscriptions. Built entirely on free public government data.
6
+
7
+ ---
8
+
9
+ ## Why This Exists
10
+
11
+ Regulatory rules in India are scattered across thousands of PDFs and HTML pages on rbi.org.in and sebi.gov.in. They reference and supersede each other constantly. There is no AI-accessible way to ask "what are the current rules on X" and get sourced answers.
12
+
13
+ This MCP solves that. It scrapes, indexes, and exposes the full text of RBI and SEBI documents in a local SQLite database — so Claude can search and retrieve primary-source regulatory text with official citations, instantly.
14
+
15
+ **This is not a legal advice tool.** It retrieves primary-source documents so you can reason over actual rules instead of hallucinated summaries.
16
+
17
+ ---
18
+
19
+ ## What It Does
20
+
21
+ - **Full-text search** across all indexed RBI and SEBI documents
22
+ - **Retrieve full document body** by ID, with official source link
23
+ - **List recent documents** — useful for "what changed this month" questions
24
+ - **List Master Directions and Master Circulars** — the consolidated, currently-in-force rules on each subject
25
+ - **Browse by department** — SEBI's Investment Management, Market Regulation, etc.
26
+ - **On-demand sync** — pull newly published documents from regulators' sites
27
+ - **Topic search** — returns both consolidated master rules AND recent circulars on a subject in one call
28
+
29
+ ---
30
+
31
+ ## Tools (8 total)
32
+
33
+ | Tool | Description |
34
+ |---|---|
35
+ | `search_regulations` | Full-text search with optional regulator/type filter |
36
+ | `get_document` | Retrieve full text of a document by ID |
37
+ | `get_recent` | Most recent documents, optionally filtered |
38
+ | `list_master_directions` | All RBI Master Directions + SEBI Master Circulars |
39
+ | `list_by_department` | SEBI documents by department name |
40
+ | `sync_latest` | Incremental scrape of new documents from RBI + SEBI |
41
+ | `sync_status` | Document count breakdown + last sync time |
42
+ | `search_by_topic` | Combined master rules + recent circulars on a topic |
43
+
44
+ ---
45
+
46
+ ## Architecture
47
+
48
+ ```
49
+ RBI / SEBI websites
50
+
51
+
52
+ Scrapers (TypeScript)
53
+ rbi.ts ──── ASP.NET POST with viewstate tokens
54
+ sebi.ts ──── AJAX pagination via JSP endpoint
55
+
56
+
57
+ SQLite DB (~/.india-reg-mcp/regdata.db)
58
+ ├── documents table (full text, metadata, source URLs)
59
+ ├── documents_fts (FTS5 full-text index, porter stemmer)
60
+ └── sync_meta (last sync timestamp)
61
+
62
+
63
+ MCP Server (stdio transport)
64
+ ├── 8 tools exposed to Claude
65
+ └── Every result includes official source URL + disclaimer
66
+ ```
67
+
68
+ **Key design decision:** Tools never scrape live. The scrapers populate a local SQLite database once, and tools query that instantly. Only `sync_latest` hits the regulators' sites. This makes every tool call fast, keeps you off government servers during normal use, and works offline once indexed.
69
+
70
+ ---
71
+
72
+ ## Installation
73
+
74
+ ### Prerequisites
75
+
76
+ - Node.js 22 LTS or higher
77
+ - Claude Desktop or Claude Code CLI
78
+
79
+ ### Setup
80
+
81
+ ```bash
82
+ git clone https://github.com/Akhilgovind02/india-regulatory-mcp.git
83
+ cd india-regulatory-mcp
84
+ npm install
85
+ npm run build
86
+ ```
87
+
88
+ ### First-run sync (populates the database)
89
+
90
+ ```bash
91
+ npm run sync
92
+ ```
93
+
94
+ This scrapes RBI notifications for the last 36 months and SEBI circulars (~1000 recent documents). Takes **5–15 minutes** depending on your connection. The database is stored at `~/.india-reg-mcp/regdata.db` and survives rebuilds.
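
The 36-month window amounts to a walk backwards from the current month. The sketch below is an illustration, not the package's code: `monthsToVisit` and `YearMonth` are hypothetical names, though the date arithmetic follows the loop in the `syncRbi` function that appears later in this diff.

```typescript
// Hypothetical helper: list the (year, month) pairs a sync would visit,
// newest first, with months numbered 1-12.
interface YearMonth {
  year: number;
  month: number;
}

function monthsToVisit(now: Date, monthsBack: number): YearMonth[] {
  const out: YearMonth[] = [];
  for (let i = 0; i < monthsBack; i++) {
    // Date normalises a negative month index into the previous year,
    // and pinning the day to 1 avoids end-of-month overflow.
    const d = new Date(now.getFullYear(), now.getMonth() - i, 1);
    out.push({ year: d.getFullYear(), month: d.getMonth() + 1 });
  }
  return out;
}

// Starting mid-February 2026 and looking back three months
// crosses the year boundary into December 2025.
console.log(monthsToVisit(new Date(2026, 1, 15), 3));
```

Passing `36` would give the full first-run window, while a small value such as `2` corresponds to the default `months_back` of the `sync_latest` tool.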
95
+
96
+ Progress is printed to stderr:
97
+ ```
98
+ [sync] RBI 2026-6: 12 docs found
99
+ [sync] RBI 2026-5: 18 docs found
100
+ ...
101
+ [sync] SEBI ssid=6 page 0: 25 docs
102
+ ...
103
+ [sync] Sync complete.
104
+ ```
105
+
106
+ ---
107
+
108
+ ## Configuration
109
+
110
+ ### Claude Code (CLI)
111
+
112
+ ```bash
113
+ claude mcp add -s user india-reg node /ABSOLUTE/PATH/TO/india-regulatory-mcp/dist/index.js
114
+ ```
115
+
116
+ Or add to `~/.claude.json` manually:
117
+ ```json
118
+ {
119
+ "mcpServers": {
120
+ "india-reg": {
121
+ "command": "node",
122
+ "args": ["/ABSOLUTE/PATH/TO/india-regulatory-mcp/dist/index.js"]
123
+ }
124
+ }
125
+ }
126
+ ```
127
+
128
+ ### Claude Desktop
129
+
130
+ Edit `%APPDATA%\Claude\claude_desktop_config.json` (Windows) or `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS):
131
+
132
+ ```json
133
+ {
134
+ "mcpServers": {
135
+ "india-reg": {
136
+ "command": "node",
137
+ "args": ["C:\\ABSOLUTE\\PATH\\TO\\india-regulatory-mcp\\dist\\index.js"]
138
+ }
139
+ }
140
+ }
141
+ ```
142
+
143
+ Restart Claude Desktop fully after editing the config.
144
+
145
+ ---
146
+
147
+ ## Usage Examples
148
+
149
+ Once connected, ask Claude naturally:
150
+
151
+ ```
152
+ What are the current RBI rules on digital lending?
153
+ ```
154
+ → Claude calls `search_by_topic("digital lending")` — returns the Master Direction on Digital Lending plus recent amending circulars, each with official source links.
155
+
156
+ ```
157
+ What new SEBI circulars came out this month?
158
+ ```
159
+ → Claude calls `get_recent(regulator="SEBI", limit=10)`.
160
+
161
+ ```
162
+ Show me all RBI Master Directions
163
+ ```
164
+ → Claude calls `list_master_directions(regulator="RBI")`.
165
+
166
+ ```
167
+ What are the KYC requirements for mutual funds?
168
+ ```
169
+ → Claude calls `search_regulations("KYC mutual fund", doc_type="master_circular")`.
170
+
171
+ ```
172
+ Pull in the latest regulatory updates
173
+ ```
174
+ → Claude calls `sync_latest()` — incremental, only fetches new documents.
175
+
176
+ ---
177
+
178
+ ## Data Sources
179
+
180
+ ### RBI (Reserve Bank of India)
181
+ - **Notifications + Circulars**: `rbi.org.in/Scripts/NotificationUser.aspx`
182
+ - **Master Directions**: `rbi.org.in/Scripts/BS_ViewMasterDirections.aspx`
183
+ - **Master Circulars**: `rbi.org.in/Scripts/BS_ViewMasterCirculardetails.aspx`
184
+ - Document bodies: HTML page text (with PDF fallback for PDF-only docs)
185
+
186
+ ### SEBI (Securities and Exchange Board of India)
187
+ - **Circulars** (`ssid=7`): ~2,775 documents as of June 2026
188
+ - **Master Circulars** (`ssid=6`): consolidated current rules by topic
189
+ - **Regulations** (`ssid=3`): statutory regulations
190
+ - Document bodies: primarily PDF-embedded content extracted via `pdf-parse`
191
+
192
+ ---
193
+
194
+ ## Document Types
195
+
196
+ | Type | Source | Description |
197
+ |---|---|---|
198
+ | `circular` | RBI + SEBI | Point-in-time regulatory guidance |
199
+ | `master_direction` | RBI | Consolidated current rules on a subject (supersedes earlier circulars) |
200
+ | `master_circular` | SEBI (also RBI) | SEBI's equivalent of a master direction; RBI Master Circulars are indexed under this type too |
201
+ | `notification` | RBI | Statutory notifications under various Acts |
202
+ | `regulation` | SEBI | Formal regulations (e.g. SEBI (FPI) Regulations 2019) |
203
+
204
+ **For compliance questions, start with `master_direction`/`master_circular`.** These represent the current state of rules, not a point-in-time snapshot.
205
+
206
+ ---
207
+
208
+ ## Keeping the Index Fresh
209
+
210
+ The database is a point-in-time snapshot. RBI and SEBI publish new documents frequently (several per week).
211
+
212
+ **Option 1 — Ask Claude:** "Pull in the latest regulatory updates" → Claude calls `sync_latest`.
213
+
214
+ **Option 2 — CLI:**
215
+ ```bash
216
+ npm run sync
217
+ ```
218
+
219
+ **Option 3 — Scheduled (example cron, runs every Sunday at 2am):**
220
+ ```
221
+ 0 2 * * 0 cd /path/to/india-regulatory-mcp && npm run sync >> ~/.india-reg-mcp/sync.log 2>&1
222
+ ```
223
+
224
+ `sync_latest` and `npm run sync` are both incremental — they skip documents already in the database.
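
A minimal sketch of what "incremental" means here, assuming a listing of scraped items and a set of ids already stored. In the package, `syncRbi` checks each id against the database with `docExists`; the in-memory `Set` below is a stand-in so the example runs on its own. `ListingItem`, `selectNew`, the id `rbi:13350`, and the titles are made up for illustration.

```typescript
// Hypothetical shape of one row from a regulator's listing page.
interface ListingItem {
  id: string;
  title: string;
}

// Keep only items whose id is not already indexed, so their bodies
// are the only ones that need to be fetched.
function selectNew(
  listing: ListingItem[],
  indexedIds: ReadonlySet<string>,
): ListingItem[] {
  return listing.filter((item) => !indexedIds.has(item.id));
}

const indexed = new Set(["rbi:13344", "sebi:101703"]);
const listing: ListingItem[] = [
  { id: "rbi:13344", title: "Already stored" },
  { id: "rbi:13350", title: "Newly published" },
];

const fresh = selectNew(listing, indexed);
console.log(fresh.map((d) => d.id));
```

One consequence of filtering by id is that a document already in the database is not fetched again, so a later revision published under the same id would not be picked up by this path.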
225
+
226
+ ---
227
+
228
+ ## Project Structure
229
+
230
+ ```
231
+ india-regulatory-mcp/
232
+ ├── src/
233
+ │ ├── index.ts ← MCP server, all 8 tools
234
+ │ ├── db/
235
+ │ │ ├── schema.ts ← SQLite schema + FTS5 setup
236
+ │ │ └── queries.ts ← DB read/write functions
237
+ │ ├── scrapers/
238
+ │ │ ├── rbi.ts ← RBI scraper (ASP.NET POST + viewstate)
239
+ │ │ ├── sebi.ts ← SEBI scraper (AJAX pagination)
240
+ │ │ ├── pdf.ts ← PDF download + text extraction
241
+ │ │ ├── run-sync.ts ← CLI full sync runner
242
+ │ │ └── repair-bodies.ts ← Utility: backfill missing body text
243
+ │ └── util/
244
+ │ ├── http.ts ← Polite fetch (UA + retry + delay)
245
+ │ └── format.ts ← MCP response helpers
246
+ ├── dist/ ← Compiled output (not in repo)
247
+ ├── package.json
248
+ └── tsconfig.json
249
+ ```
250
+
251
+ ---
252
+
253
+ ## Technical Notes
254
+
255
+ ### Scraping approach
256
+
257
+ **RBI** uses ASP.NET with viewstate tokens. The scraper fetches the viewstate tokens once at startup, then POSTs month by month to get document listings. Content is extracted from the HTML circular page; if the extracted text is shorter than 200 characters and a PDF link is available, the scraper falls back to PDF text extraction.
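
The viewstate round trip can be sketched as two pure steps: read the hidden form fields out of the listing page, then echo them back in a form body together with the year and month. The form field names (`__VIEWSTATE`, `__VIEWSTATEGENERATOR`, `__EVENTVALIDATION`, `hdnYear`, `hdnMonth`, `UsrFontCntr$btn`) and the regular expression shape come from `rbi.js` in this diff; the helper names and the sample markup are assumptions for illustration.

```typescript
interface ViewstateTokens {
  vs: string;  // __VIEWSTATE
  vsg: string; // __VIEWSTATEGENERATOR
  ev: string;  // __EVENTVALIDATION
}

// Read a hidden <input> whose id attribute is immediately followed by value.
function hiddenField(html: string, id: string): string {
  const match = html.match(new RegExp(`id="${id}"\\s+value="([^"]+)"`));
  return match?.[1] ?? "";
}

function extractTokens(html: string): ViewstateTokens {
  return {
    vs: hiddenField(html, "__VIEWSTATE"),
    vsg: hiddenField(html, "__VIEWSTATEGENERATOR"),
    ev: hiddenField(html, "__EVENTVALIDATION"),
  };
}

// Build the URL-encoded body for one month's listing (month is 1-12).
function buildMonthForm(t: ViewstateTokens, year: number, month: number): URLSearchParams {
  return new URLSearchParams({
    __VIEWSTATE: t.vs,
    __VIEWSTATEGENERATOR: t.vsg,
    __EVENTVALIDATION: t.ev,
    hdnYear: String(year),
    hdnMonth: String(month),
    "UsrFontCntr$btn": "",
  });
}

// Made-up sample markup, not captured from rbi.org.in.
const sampleHtml = `
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="GEN42" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="ev/XYZ=" />`;

const tokens = extractTokens(sampleHtml);
const form = buildMonthForm(tokens, 2026, 6);
console.log(form.toString());
```

The regular expression depends on `id` and `value` being adjacent attributes in that order. If the page's markup differed, an HTML parser such as cheerio (already a dependency) would likely be the sturdier choice.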
258
+
259
+ **SEBI** uses a JSP-based listing with AJAX pagination. Page 1 is a GET request (establishes JSESSIONID). Pages 2+ POST to `getnewslistinfo.jsp` with the session cookie. Circular content is typically a PDF embedded in an iframe — the scraper extracts the PDF URL and parses it with `pdf-parse`.
260
+
261
+ ### Polite scraping
262
+ - Concurrency capped at 2 parallel requests per host
263
+ - 300ms delay between document fetches
264
+ - 500ms delay between listing pages
265
+ - Exponential backoff on 429/5xx responses
266
+ - Real User-Agent string identifying the tool
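
The retry rule in the list above can be sketched as follows. This is an assumption-laden illustration: the package's `util/http.js` is not part of this excerpt, so the names (`isRetryable`, `backoffDelayMs`, `fetchWithBackoff`, `Fetcher`), the 500 ms base, and the 8 s cap are all hypothetical. The fetch function is passed in so the sketch does not depend on any particular HTTP client.

```typescript
// Minimal response shape the sketch needs; a real Response also satisfies it.
interface HttpResult {
  ok: boolean;
  status: number;
}
type Fetcher = (url: string) => Promise<HttpResult>;

// Only rate limiting (429) and server errors (5xx) are worth retrying.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}

// Delay doubles on each retry (attempt 0 is the first retry) up to a cap.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithBackoff(
  url: string,
  doFetch: Fetcher,
  maxRetries = 3,
): Promise<HttpResult> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch(url);
    if (res.ok) return res;
    // Give up immediately on client errors, or once retries are exhausted.
    if (!isRetryable(res.status) || attempt >= maxRetries) {
      throw new Error(`HTTP ${res.status} for ${url}`);
    }
    await sleep(backoffDelayMs(attempt));
  }
}
```

With these assumed defaults the waits would be 500 ms, 1 s, and 2 s before the final attempt. The cap only matters if `maxRetries` is raised.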
267
+
268
+ ### Database
269
+ - Location: `~/.india-reg-mcp/regdata.db`
270
+ - WAL mode for performance
271
+ - FTS5 with Porter stemmer: handles morphological variants ("lending" matches "lend" and "lends")
272
+ - Full-text search terms are quoted per-word to handle hyphens and special characters safely
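
The per-word quoting in the last bullet matters because FTS5 treats characters such as `-` and `:` as query syntax when they appear outside a quoted string. A sketch of the idea, modelled on the `escapeFts` helper in `queries.js` from this diff: the name `quoteFtsTerms` is hypothetical, and dropping words that become empty after quote removal is an addition of this sketch, not something the package does.

```typescript
// Turn free-form user input into an FTS5 query of quoted terms.
// Each whitespace-separated word becomes its own quoted string, so
// hyphens, colons, and similar characters are treated as text.
function quoteFtsTerms(query: string): string {
  const trimmed = query.trim();
  if (!trimmed) return "";
  return trimmed
    .split(/\s+/)
    .map((word) => word.replace(/"/g, "")) // embedded quotes would end the string early
    .filter((word) => word.length > 0)     // skip words that were only quotes
    .map((word) => `"${word}"`)
    .join(" ");
}

console.log(quoteFtsTerms("co-lending NBFC"));
```

Because the quoted terms are joined with spaces, FTS5 combines them with an implicit AND, so every term has to appear in a matching document.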
273
+
274
+ ---
275
+
276
+ ## Stack
277
+
278
+ | Package | Version | Purpose |
279
+ |---|---|---|
280
+ | `@modelcontextprotocol/sdk` | 1.29.0 | MCP server framework |
281
+ | `better-sqlite3` | 12.10.0 | SQLite with built-in FTS5 |
282
+ | `cheerio` | 1.2.0 | HTML parsing |
283
+ | `pdf-parse` | 2.4.5 | PDF text extraction |
284
+ | `turndown` | 7.2.4 | HTML → Markdown |
285
+ | `p-limit` | 7.3.0 | Concurrency control |
286
+ | `zod` | 4.4.3 | Schema validation |
287
+
288
+ ---
289
+
290
+ ## Disclaimer
291
+
292
+ This tool retrieves and surfaces primary-source regulatory documents from official RBI and SEBI publications. It does not provide legal advice. Always verify against the official linked document. The index is a point-in-time snapshot — use `sync_latest` to pull recent publications.
@@ -0,0 +1,91 @@
1
+ import { db } from "./schema.js";
2
+ // Lazy statement cache — prepared after initSchema() has created the tables
3
+ const stmts = {};
4
+ function stmt(key, sql) {
5
+ if (!stmts[key])
6
+ stmts[key] = db.prepare(sql);
7
+ return stmts[key];
8
+ }
9
+ export function upsertDoc(doc) {
10
+ stmt("upsert", `
11
+ INSERT INTO documents (id, regulator, doc_type, title, date, department, source_url, pdf_url, body, indexed_at)
12
+ VALUES (@id, @regulator, @doc_type, @title, @date, @department, @source_url, @pdf_url, @body, @indexed_at)
13
+ ON CONFLICT(id) DO UPDATE SET
14
+ title=@title, body=@body, department=@department, pdf_url=@pdf_url, indexed_at=@indexed_at
15
+ `).run(doc);
16
+ }
17
+ export function upsertMany(docs) {
18
+ const s = stmt("upsert", `
19
+ INSERT INTO documents (id, regulator, doc_type, title, date, department, source_url, pdf_url, body, indexed_at)
20
+ VALUES (@id, @regulator, @doc_type, @title, @date, @department, @source_url, @pdf_url, @body, @indexed_at)
21
+ ON CONFLICT(id) DO UPDATE SET
22
+ title=@title, body=@body, department=@department, pdf_url=@pdf_url, indexed_at=@indexed_at
23
+ `);
24
+ const tx = db.transaction((rows) => rows.forEach((r) => s.run(r)));
25
+ tx(docs);
26
+ }
27
+ export function docExists(id) {
28
+ return !!stmt("exists", "SELECT 1 FROM documents WHERE id = ?").get(id);
29
+ }
30
+ export function getDoc(id) {
31
+ return stmt("getDoc", "SELECT * FROM documents WHERE id = ?").get(id);
32
+ }
33
+ export function searchDocs(opts) {
34
+ if (!opts.query.trim())
35
+ return [];
36
+ const limit = opts.limit ?? 10;
37
+ let sql = `
38
+ SELECT d.*, snippet(documents_fts, 1, '<<', '>>', ' … ', 16) AS snippet
39
+ FROM documents_fts f
40
+ JOIN documents d ON d.rowid = f.rowid
41
+ WHERE documents_fts MATCH @q
42
+ `;
43
+ const params = { q: escapeFts(opts.query) };
44
+ if (opts.regulator) {
45
+ sql += " AND d.regulator = @regulator";
46
+ params.regulator = opts.regulator;
47
+ }
48
+ if (opts.docType) {
49
+ sql += " AND d.doc_type = @docType";
50
+ params.docType = opts.docType;
51
+ }
52
+ sql += " ORDER BY rank, d.date DESC LIMIT @limit";
53
+ params.limit = limit;
54
+ return db.prepare(sql).all(params);
55
+ }
56
+ export function recentDocs(opts) {
57
+ const limit = opts.limit ?? 15;
58
+ let sql = "SELECT * FROM documents WHERE 1=1";
59
+ const params = {};
60
+ if (opts.regulator) {
61
+ sql += " AND regulator = @regulator";
62
+ params.regulator = opts.regulator;
63
+ }
64
+ if (opts.docType) {
65
+ sql += " AND doc_type = @docType";
66
+ params.docType = opts.docType;
67
+ }
68
+ sql += " ORDER BY date DESC LIMIT @limit";
69
+ params.limit = limit;
70
+ return db.prepare(sql).all(params);
71
+ }
72
+ export function listByDepartment(dept, limit = 20) {
73
+ return db.prepare("SELECT * FROM documents WHERE department LIKE ? ORDER BY date DESC LIMIT ?").all(`%${dept}%`, limit);
74
+ }
75
+ export function docCount() {
76
+ return db.prepare("SELECT regulator, doc_type, COUNT(*) as n FROM documents GROUP BY regulator, doc_type").all();
77
+ }
78
+ export function getSyncMeta(key) {
79
+ const row = stmt("getMeta", "SELECT value FROM sync_meta WHERE key = ?").get(key);
80
+ return row?.value;
81
+ }
82
+ export function setSyncMeta(key, value) {
83
+ stmt("setMeta", "INSERT INTO sync_meta (key,value) VALUES (?,?) ON CONFLICT(key) DO UPDATE SET value=?")
84
+ .run(key, value, value);
85
+ }
86
+ function escapeFts(q) {
87
+ const trimmed = q.trim();
88
+ if (!trimmed)
89
+ return "";
90
+ return trimmed.split(/\s+/).map((w) => `"${w.replace(/"/g, '')}"`).join(" ");
91
+ }
@@ -0,0 +1,56 @@
1
+ import Database from "better-sqlite3";
2
+ import { homedir } from "node:os";
3
+ import { join } from "node:path";
4
+ import { mkdirSync, existsSync } from "node:fs";
5
+ const DB_DIR = join(homedir(), ".india-reg-mcp");
6
+ if (!existsSync(DB_DIR))
7
+ mkdirSync(DB_DIR, { recursive: true });
8
+ const DB_PATH = join(DB_DIR, "regdata.db");
9
+ export const db = new Database(DB_PATH);
10
+ db.pragma("journal_mode = WAL");
11
+ export function initSchema() {
12
+ db.exec(`
13
+ CREATE TABLE IF NOT EXISTS documents (
14
+ id TEXT PRIMARY KEY,
15
+ regulator TEXT NOT NULL,
16
+ doc_type TEXT NOT NULL,
17
+ title TEXT NOT NULL,
18
+ date TEXT NOT NULL,
19
+ department TEXT,
20
+ source_url TEXT NOT NULL,
21
+ pdf_url TEXT,
22
+ body TEXT,
23
+ indexed_at TEXT NOT NULL
24
+ );
25
+
26
+ CREATE INDEX IF NOT EXISTS idx_regulator ON documents(regulator);
27
+ CREATE INDEX IF NOT EXISTS idx_doctype ON documents(doc_type);
28
+ CREATE INDEX IF NOT EXISTS idx_date ON documents(date);
29
+
30
+ CREATE VIRTUAL TABLE IF NOT EXISTS documents_fts USING fts5(
31
+ title, body,
32
+ content='documents',
33
+ content_rowid='rowid',
34
+ tokenize='porter unicode61'
35
+ );
36
+
37
+ DROP TRIGGER IF EXISTS documents_ai;
38
+
39
+ CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
40
+ INSERT INTO documents_fts(documents_fts, rowid, title, body) VALUES ('delete', new.rowid, new.title, new.body);
41
+ INSERT INTO documents_fts(rowid, title, body) VALUES (new.rowid, new.title, new.body);
42
+ END;
43
+ CREATE TRIGGER IF NOT EXISTS documents_ad AFTER DELETE ON documents BEGIN
44
+ INSERT INTO documents_fts(documents_fts, rowid, title, body) VALUES ('delete', old.rowid, old.title, old.body);
45
+ END;
46
+ CREATE TRIGGER IF NOT EXISTS documents_au AFTER UPDATE ON documents BEGIN
47
+ INSERT INTO documents_fts(documents_fts, rowid, title, body) VALUES ('delete', old.rowid, old.title, old.body);
48
+ INSERT INTO documents_fts(rowid, title, body) VALUES (new.rowid, new.title, new.body);
49
+ END;
50
+
51
+ CREATE TABLE IF NOT EXISTS sync_meta (
52
+ key TEXT PRIMARY KEY,
53
+ value TEXT
54
+ );
55
+ `);
56
+ }
package/dist/index.js ADDED
@@ -0,0 +1,169 @@
1
+ #!/usr/bin/env node
2
+ import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
3
+ import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
4
+ import { z } from "zod";
5
+ import { initSchema } from "./db/schema.js";
6
+ import * as q from "./db/queries.js";
7
+ import { syncRbi } from "./scrapers/rbi.js";
8
+ import { syncSebi } from "./scrapers/sebi.js";
9
+ import { DISCLAIMER, ok, err, emptyDbMsg } from "./util/format.js";
10
+ initSchema();
11
+ const server = new McpServer({ name: "india-reg-mcp", version: "1.0.0" });
12
+ server.tool("search_regulations", "Full-text search across indexed RBI and SEBI regulatory documents (circulars, master directions, notifications, regulations). " +
13
+ "Returns matching documents with title, date, regulator, a highlighted snippet, and the official source link. " +
14
+ "Use this to answer 'what are the rules on X' style questions with cited primary sources.", {
15
+ query: z.string().describe("Search terms e.g. 'digital lending', 'KYC periodic updation', 'FPI registration', 'mutual fund nomination'"),
16
+ regulator: z.enum(["RBI", "SEBI"]).optional().describe("Optionally limit to one regulator"),
17
+ doc_type: z.enum(["circular", "master_direction", "master_circular", "notification", "regulation"]).optional()
18
+ .describe("Optionally limit to one document type. master_direction/master_circular are consolidated current rules."),
19
+ limit: z.number().default(10).describe("Max results (1-25)"),
20
+ }, async ({ query, regulator, doc_type, limit }) => {
21
+ try {
22
+ if (!q.docCount().length)
23
+ return emptyDbMsg();
24
+ const results = q.searchDocs({ query, regulator, docType: doc_type, limit: Math.max(1, Math.min(Math.trunc(limit), 25)) });
25
+ return ok({
26
+ query,
27
+ resultCount: results.length,
28
+ results: results.map((r) => ({
29
+ id: r.id, regulator: r.regulator, type: r.doc_type,
30
+ title: r.title, date: r.date, snippet: r.snippet,
31
+ source: r.source_url, pdf: r.pdf_url,
32
+ })),
33
+ note: "Use get_document with an id to retrieve the full text.",
34
+ disclaimer: DISCLAIMER,
35
+ });
36
+ }
37
+ catch (e) {
38
+ return err(e instanceof Error ? e.message : String(e));
39
+ }
40
+ });
41
+ server.tool("get_document", "Retrieve the full text of a specific regulatory document by its id (from search results). Returns the complete body plus metadata and official link.", { id: z.string().describe("Document id from search results e.g. 'rbi:13344' or 'sebi:101703'") }, async ({ id }) => {
42
+ try {
43
+ const doc = q.getDoc(id);
44
+ if (!doc)
45
+ return err(`No document found with id ${id}. Use search_regulations to find valid ids.`);
46
+ const body = doc.body && doc.body.length > 12000
47
+ ? doc.body.slice(0, 12000) + "\n\n[... truncated. See full document at source link ...]"
48
+ : doc.body;
49
+ return ok({
50
+ id: doc.id, regulator: doc.regulator, type: doc.doc_type,
51
+ title: doc.title, date: doc.date, department: doc.department,
52
+ source: doc.source_url, pdf: doc.pdf_url, body,
53
+ disclaimer: DISCLAIMER,
54
+ });
55
+ }
56
+ catch (e) {
57
+ return err(e instanceof Error ? e.message : String(e));
58
+ }
59
+ });
60
+ server.tool("get_recent", "Get the most recent regulatory documents, optionally filtered by regulator or type. Useful for 'what changed recently' questions.", {
61
+ regulator: z.enum(["RBI", "SEBI"]).optional(),
62
+ doc_type: z.enum(["circular", "master_direction", "master_circular", "notification", "regulation"]).optional(),
63
+ limit: z.number().default(15).describe("Max results (1-30)"),
64
+ }, async ({ regulator, doc_type, limit }) => {
65
+ try {
66
+ if (!q.docCount().length)
67
+ return emptyDbMsg();
68
+ const docs = q.recentDocs({ regulator, docType: doc_type, limit: Math.max(1, Math.min(Math.trunc(limit), 30)) });
69
+ return ok({
70
+ results: docs.map((d) => ({ id: d.id, regulator: d.regulator, type: d.doc_type, title: d.title, date: d.date, source: d.source_url })),
71
+ disclaimer: DISCLAIMER,
72
+ });
73
+ }
74
+ catch (e) {
75
+ return err(e instanceof Error ? e.message : String(e));
76
+ }
77
+ });
78
+ server.tool("list_master_directions", "List RBI Master Directions and SEBI Master Circulars — the consolidated, currently-in-force rules on each subject. " +
79
+ "Best starting point for understanding the current state of regulation on a topic.", { regulator: z.enum(["RBI", "SEBI"]).optional() }, async ({ regulator }) => {
80
+ try {
81
+ if (!q.docCount().length)
82
+ return emptyDbMsg();
83
+ const md = q.recentDocs({ regulator, docType: "master_direction", limit: 50 });
84
+ const mc = q.recentDocs({ regulator, docType: "master_circular", limit: 50 });
85
+ const all = [...md, ...mc].sort((a, b) => b.date.localeCompare(a.date));
86
+ return ok({
87
+ count: all.length,
88
+ documents: all.map((d) => ({ id: d.id, regulator: d.regulator, type: d.doc_type, title: d.title, date: d.date, source: d.source_url })),
89
+ disclaimer: DISCLAIMER,
90
+ });
91
+ }
92
+ catch (e) {
93
+ return err(e instanceof Error ? e.message : String(e));
94
+ }
95
+ });
96
+ server.tool("list_by_department", "List SEBI documents from a specific department e.g. 'Investment Management', 'Market Regulation', 'Corporation Finance'.", { department: z.string().describe("Department name or partial e.g. 'Investment Management', 'Foreign Portfolio'") }, async ({ department }) => {
97
+ try {
98
+ if (!q.docCount().length)
99
+ return emptyDbMsg();
100
+ const docs = q.listByDepartment(department, 25);
101
+ return ok({
102
+ department, count: docs.length,
103
+ documents: docs.map((d) => ({ id: d.id, title: d.title, date: d.date, source: d.source_url })),
104
+ disclaimer: DISCLAIMER,
105
+ });
106
+ }
107
+ catch (e) {
108
+ return err(e instanceof Error ? e.message : String(e));
109
+ }
110
+ });
111
+ server.tool("sync_latest", "Refresh the regulatory index by scraping the latest documents from RBI and SEBI. " +
112
+ "Run this to pull in newly published circulars, master circulars, and regulations. Incremental — only fetches documents not already indexed. " +
113
+ "Note: takes 2-5 minutes as it politely scrapes the regulators' sites.", {
114
+ months_back: z.number().default(2).describe("How many months of RBI history to check (default 2 for incremental refresh)"),
115
+ sebi_pages: z.number().default(3).describe("How many SEBI listing pages to check per section (default 3 = ~75 recent docs per section)"),
116
+ }, async ({ months_back, sebi_pages }) => {
117
+ try {
118
+ const log = [];
119
+ const rbiCount = await syncRbi(months_back, (m) => log.push(m));
120
+ const sebiCirc = await syncSebi(7, sebi_pages, (m) => log.push(m));
121
+ const sebiMaster = await syncSebi(6, sebi_pages, (m) => log.push(m));
122
+ const sebiReg = await syncSebi(3, sebi_pages, (m) => log.push(m));
123
+ const sebiCount = sebiCirc + sebiMaster + sebiReg;
124
+ q.setSyncMeta("last_sync", new Date().toISOString());
125
+ return ok({
126
+ message: "Sync complete.",
127
+ newRbiDocs: rbiCount, newSebiDocs: sebiCount,
128
+ log,
129
+ disclaimer: DISCLAIMER,
130
+ });
131
+ }
132
+ catch (e) {
133
+ return err(e instanceof Error ? e.message : String(e));
134
+ }
135
+ });
136
+ server.tool("sync_status", "Show how many documents are indexed, broken down by regulator and type, and when the index was last synced.", {}, async () => {
137
+ try {
138
+ const counts = q.docCount();
139
+ const lastSync = q.getSyncMeta("last_sync");
140
+ const total = counts.reduce((s, c) => s + c.n, 0);
141
+ return ok({ totalDocuments: total, breakdown: counts, lastSync: lastSync || "never" });
142
+ }
143
+ catch (e) {
144
+ return err(e instanceof Error ? e.message : String(e));
145
+ }
146
+ });
147
+ server.tool("search_by_topic", "Topic-focused search that returns BOTH the consolidated master rules AND recent circulars on a subject, " +
148
+ "so you get the current baseline plus any recent changes. Best tool for 'give me everything on X' questions.", { topic: z.string().describe("Regulatory topic e.g. 'digital lending', 'NBFC capital adequacy', 'FPI', 'algo trading', 'KYC'") }, async ({ topic }) => {
149
+ try {
150
+ if (!q.docCount().length)
151
+ return emptyDbMsg();
152
+ const masters = q.searchDocs({ query: topic, docType: "master_direction", limit: 3 })
153
+ .concat(q.searchDocs({ query: topic, docType: "master_circular", limit: 3 }));
154
+ const recent = q.searchDocs({ query: topic, limit: 10 });
155
+ return ok({
156
+ topic,
157
+ consolidatedRules: masters.map((r) => ({ id: r.id, regulator: r.regulator, title: r.title, date: r.date, source: r.source_url })),
158
+ relatedDocuments: recent.map((r) => ({ id: r.id, regulator: r.regulator, type: r.doc_type, title: r.title, date: r.date, snippet: r.snippet, source: r.source_url })),
159
+ guidance: "Start with consolidatedRules for the current baseline, then check relatedDocuments for recent amendments. Use get_document for full text.",
160
+ disclaimer: DISCLAIMER,
161
+ });
162
+ }
163
+ catch (e) {
164
+ return err(e instanceof Error ? e.message : String(e));
165
+ }
166
+ });
167
+ const transport = new StdioServerTransport();
168
+ await server.connect(transport);
169
+ console.error("india-reg-mcp running on stdio");
@@ -0,0 +1,19 @@
1
+ import { politeFetch } from "../util/http.js";
2
+ import { PDFParse } from "pdf-parse";
3
+ export async function extractPdfText(pdfUrl) {
4
+ try {
5
+ const res = await politeFetch(pdfUrl);
6
+ const buf = Buffer.from(await res.arrayBuffer());
7
+ const parser = new PDFParse({ data: buf });
8
+ const result = await parser.getText();
9
+ return cleanText(result.text);
10
+ }
11
+ catch (e) {
12
+ const msg = e instanceof Error ? e.message : String(e);
13
+ console.error(`PDF extract failed for ${pdfUrl}: ${msg}`);
14
+ return "";
15
+ }
16
+ }
17
+ function cleanText(t) {
18
+ return t.replace(/\r/g, "").replace(/\n{3,}/g, "\n\n").replace(/[ \t]{2,}/g, " ").trim();
19
+ }
@@ -0,0 +1,168 @@
1
+ import * as cheerio from "cheerio";
2
+ import TurndownService from "turndown";
3
+ import pLimit from "p-limit";
4
+ import { politeFetch, sleep } from "../util/http.js";
5
+ import { extractPdfText } from "./pdf.js";
6
+ import { upsertMany, docExists } from "../db/queries.js";
7
+ const td = new TurndownService();
8
+ const limit = pLimit(2);
9
+ const RBI_BASE = "https://rbi.org.in/Scripts/NotificationUser.aspx";
10
+ async function fetchViewstateTokens() {
11
+ const res = await politeFetch(RBI_BASE);
12
+ const html = await res.text();
13
+ return {
14
+ vs: html.match(/id="__VIEWSTATE"\s+value="([^"]+)"/)?.[1] ?? "",
15
+ vsg: html.match(/id="__VIEWSTATEGENERATOR"\s+value="([^"]+)"/)?.[1] ?? "",
16
+ ev: html.match(/id="__EVENTVALIDATION"\s+value="([^"]+)"/)?.[1] ?? "",
17
+ };
18
+ }
19
+ // Scrape one month via POST (month: 1-12, or 0 = all)
20
+ export async function scrapeRbiMonth(year, month, tokens) {
21
+ const t = tokens ?? await fetchViewstateTokens();
22
+ const body = new URLSearchParams({
23
+ __VIEWSTATE: t.vs,
24
+ __VIEWSTATEGENERATOR: t.vsg,
25
+ __EVENTVALIDATION: t.ev,
26
+ hdnYear: String(year),
27
+ hdnMonth: String(month),
28
+ "UsrFontCntr$btn": "",
29
+ });
30
+ let res;
31
+ for (let attempt = 0; attempt <= 2; attempt++) {
32
+ try {
33
+ res = await fetch(RBI_BASE, {
34
+ method: "POST",
35
+ headers: {
36
+ "User-Agent": "india-reg-mcp/1.0 (open-source regulatory indexer)",
37
+ "Content-Type": "application/x-www-form-urlencoded",
38
+ "Referer": RBI_BASE,
39
+ "Accept": "text/html,*/*",
40
+ },
41
+ body: body.toString(),
42
+ signal: AbortSignal.timeout(30_000),
43
+ });
44
+ if (res.ok)
45
+ break;
46
+ if (res.status === 429 || res.status >= 500) {
47
+ await sleep(2000 * (attempt + 1));
48
+ continue;
49
+ }
50
+ throw new Error(`RBI POST failed: HTTP ${res.status}`);
51
+ }
52
+ catch (e) {
53
+ if (attempt === 2)
54
+ throw e;
55
+ await sleep(2000 * (attempt + 1));
56
+ }
57
+ }
58
+ if (!res?.ok)
59
+ throw new Error(`RBI POST failed after retries`);
60
+ const html = await res.text();
61
+ if (html.includes("No Notification Found"))
62
+ return [];
63
+ const $ = cheerio.load(html);
64
+ const items = [];
65
+ let currentDate = "";
66
+ $("table tr").each((_, tr) => {
67
+ const $tr = $(tr);
68
+ const text = $tr.text().trim();
69
+ const dateMatch = text.match(/^([A-Z][a-z]{2}\s+\d{1,2},\s+\d{4})$/);
70
+ if (dateMatch) {
71
+ currentDate = toISO(dateMatch[1]);
72
+ return;
73
+ }
74
+ const titleLink = $tr.find('a[href*="NotificationUser.aspx?Id="]').first();
75
+ if (titleLink.length) {
76
+ const href = titleLink.attr("href") || "";
77
+ const idMatch = href.match(/Id=(\d+)/);
78
+ if (!idMatch)
79
+ return;
80
+ if (!currentDate)
81
+ return; // skip rows before the first date header
82
+ const pdfLink = $tr.find('a[href*=".PDF"], a[href*=".pdf"]').first();
83
+ items.push({
84
+ id: `rbi:${idMatch[1]}`,
85
+ title: titleLink.text().trim(),
86
+ date: currentDate,
87
+ htmlUrl: absolute(href, "https://rbi.org.in/Scripts/"),
88
+ pdfUrl: pdfLink.attr("href") ? absolute(pdfLink.attr("href"), "https://rbi.org.in/Scripts/") : null,
89
+ });
90
+ }
91
+ });
92
+ return items;
93
+ }
94
+ async function fetchRbiBody(item) {
95
+ try {
96
+ const res = await politeFetch(item.htmlUrl);
97
+ const $ = cheerio.load(await res.text());
98
+ // #pnlDetails is the main content container on RBI ASP.NET doc pages
99
+ const main = $("#pnlDetails, #example-min, table.tablebg").first();
100
+ const bodyHtml = main.length ? main.html() : $("body").html();
101
+ let markdown = bodyHtml ? td.turndown(bodyHtml) : "";
102
+ if (markdown.length < 200 && item.pdfUrl)
103
+ markdown = await extractPdfText(item.pdfUrl);
104
+ return markdown;
105
+ }
106
+ catch {
107
+ return item.pdfUrl ? await extractPdfText(item.pdfUrl) : "";
108
+ }
109
+ }
110
+ export async function syncRbi(monthsBack, onProgress) {
111
+ const tokens = await fetchViewstateTokens();
112
+ const now = new Date();
113
+ let total = 0;
114
+ for (let i = 0; i < monthsBack; i++) {
115
+ const d = new Date(now.getFullYear(), now.getMonth() - i, 1);
116
+ const year = d.getFullYear();
117
+ const month = d.getMonth() + 1; // 1-indexed to match GetYearMonth JS
118
+ let items;
119
+ try {
120
+ items = await scrapeRbiMonth(year, month, tokens);
121
+ }
122
+ catch (e) {
123
+ const msg = e instanceof Error ? e.message : String(e);
124
+ onProgress?.(`RBI ${year}-${month}: fetch failed (${msg}), skipping`);
125
+ await sleep(3000);
126
+ continue;
127
+ }
128
+ onProgress?.(`RBI ${year}-${month}: ${items.length} docs found`);
129
+ const newItems = items.filter((it) => !docExists(it.id));
130
+ const rows = await Promise.all(newItems.map((it) => limit(async () => {
131
+ const body = await fetchRbiBody(it);
132
+ await sleep(300);
133
+ return {
134
+ id: it.id, regulator: "RBI",
135
+ doc_type: classifyRbi(it.title),
136
+ title: it.title, date: it.date, department: null,
137
+ source_url: it.htmlUrl, pdf_url: it.pdfUrl, body,
138
+ indexed_at: new Date().toISOString(),
139
+ };
140
+ })));
141
+ if (rows.length)
142
+ upsertMany(rows);
143
+ total += rows.length;
144
+ await sleep(500);
145
+ }
146
+ return total;
147
+ }
148
+ function classifyRbi(title) {
149
+ const t = title.toLowerCase();
150
+ if (t.includes("master direction"))
151
+ return "master_direction";
152
+ if (t.includes("master circular"))
153
+ return "master_circular";
154
+ if (t.includes("regulations"))
155
+ return "regulation";
156
+ if (t.includes("circular"))
157
+ return "circular";
158
+ return "notification";
159
+ }
160
+ function toISO(s) {
161
+ const d = new Date(s);
162
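+ return isNaN(d.getTime()) ? "" : `${d.getFullYear()}-${String(d.getMonth() + 1).padStart(2, "0")}-${String(d.getDate()).padStart(2, "0")}`; // use local fields: the string parses as local midnight, so toISOString() would shift the day back in UTC+ zones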
163
+ }
164
+ function absolute(href, base) {
165
+ if (href.startsWith("http"))
166
+ return href;
167
+ return new URL(href, base).toString();
168
+ }
@@ -0,0 +1,92 @@
1
+ /**
2
+ * Backfill body text for docs that were indexed without body content.
3
+ * Run with: npx tsx src/scrapers/repair-bodies.ts
4
+ */
5
+ import { initSchema } from "../db/schema.js";
6
+ import { db } from "../db/schema.js";
7
+ import { sleep } from "../util/http.js";
8
+ import pLimit from "p-limit";
9
+ // The body fetchers are re-implemented below, so import their dependencies directly
10
+ import * as cheerio from "cheerio";
11
+ import TurndownService from "turndown";
12
+ import { politeFetch } from "../util/http.js";
13
+ import { PDFParse } from "pdf-parse";
14
+ initSchema();
15
+ const td = new TurndownService();
16
+ const limit = pLimit(2);
17
+ const log = (m) => console.error(`[repair] ${m}`);
18
+ async function fetchRbiBody(sourceUrl) {
19
+ try {
20
+ const res = await politeFetch(sourceUrl);
21
+ const $ = cheerio.load(await res.text());
22
+ const main = $("#pnlDetails, #example-min, table.tablebg").first();
23
+ const bodyHtml = main.length ? main.html() : $("body").html();
24
+ return bodyHtml ? td.turndown(bodyHtml) : "";
25
+ }
26
+ catch {
27
+ return "";
28
+ }
29
+ }
30
+ async function fetchSebiBody(sourceUrl) {
31
+ try {
32
+ const res = await politeFetch(sourceUrl);
33
+ const $ = cheerio.load(await res.text());
34
+ const iframeSrc = $("iframe[src*='sebi_data'], iframe[src*='?file=']").first().attr("src") || "";
35
+ const pdfUrlMatch = iframeSrc.match(/[?&]file=((?:https?:\/\/|\/)[^'"&\s]+\.pdf)/i);
36
+ const rawPdfPath = pdfUrlMatch ? pdfUrlMatch[1] : null;
37
+ const pdfUrl = rawPdfPath
38
+ ? rawPdfPath.startsWith("/") ? `https://www.sebi.gov.in${rawPdfPath}` : rawPdfPath
39
+ : null;
40
+ if (pdfUrl) {
41
+ const buf = Buffer.from(await (await politeFetch(pdfUrl)).arrayBuffer());
42
+ const parser = new PDFParse({ data: buf });
43
+ const result = await parser.getText();
44
+ const text = result.text.replace(/\r/g, "").replace(/\n{3,}/g, "\n\n").trim();
45
+ return { body: text, pdfUrl };
46
+ }
47
+ return { body: "", pdfUrl: null };
48
+ }
49
+ catch {
50
+ return { body: "", pdfUrl: null };
51
+ }
52
+ }
53
+ const updateStmt = db.prepare("UPDATE documents SET body=@body, pdf_url=@pdf_url, indexed_at=@indexed_at WHERE id=@id");
54
+ const updateBodyOnly = db.prepare("UPDATE documents SET body=@body, indexed_at=@indexed_at WHERE id=@id");
55
+ const docs = db.prepare("SELECT id, regulator, source_url FROM documents WHERE body IS NULL OR body = '' OR (regulator='SEBI' AND LENGTH(body) < 2000) ORDER BY date DESC").all();
56
+ log(`Found ${docs.length} docs needing body repair`);
57
+ let done = 0;
58
+ let failed = 0;
59
+ await Promise.all(docs.map((doc) => limit(async () => {
60
+ try {
61
+ let body = "";
62
+ let pdfUrl = null;
63
+ if (doc.regulator === "RBI") {
64
+ body = await fetchRbiBody(doc.source_url);
65
+ }
66
+ else {
67
+ const result = await fetchSebiBody(doc.source_url);
68
+ body = result.body;
69
+ pdfUrl = result.pdfUrl;
70
+ }
71
+ if (doc.regulator === "RBI") {
72
+ updateBodyOnly.run({ id: doc.id, body, indexed_at: new Date().toISOString() });
73
+ }
74
+ else {
75
+ updateStmt.run({ id: doc.id, body, pdf_url: pdfUrl, indexed_at: new Date().toISOString() });
76
+ }
77
+ done++;
78
+ if (done % 20 === 0)
79
+ log(`Progress: ${done}/${docs.length} done`);
80
+ }
81
+ catch (e) {
82
+ const msg = e instanceof Error ? e.message : String(e);
83
+ console.error(`Failed ${doc.id}: ${msg}`);
84
+ failed++;
85
+ }
86
+ await sleep(300);
87
+ })));
88
+ log(`Done. ${done} updated, ${failed} failed.`);
89
+ // Show updated stats
90
+ const stats = db.prepare("SELECT regulator, SUM(CASE WHEN body IS NULL OR body='' THEN 1 ELSE 0 END) as no_body, COUNT(*) as total FROM documents GROUP BY regulator").all();
91
+ stats.forEach(s => log(`${s.regulator}: ${s.total - s.no_body}/${s.total} have body`));
92
+ process.exit(0);
@@ -0,0 +1,31 @@
1
+ import { initSchema } from "../db/schema.js";
2
+ import { syncRbi } from "./rbi.js";
3
+ import { syncSebi } from "./sebi.js";
4
+ import { setSyncMeta } from "../db/queries.js";
5
+ async function main() {
6
+ initSchema();
7
+ const log = (m) => console.error(`[sync] ${m}`);
8
+ const args = process.argv.slice(2);
9
+ const quick = args.includes("--quick"); // quick mode: 6 months of RBI + 5 pages of SEBI circulars
10
+ if (quick) {
11
+ log("Quick sync mode (6 months RBI, 5 pages of SEBI circulars)...");
12
+ const rbiCount = await syncRbi(6, log);
13
+ log(`RBI: ${rbiCount} new documents`);
14
+ const sebiCirc = await syncSebi(7, 5, log);
15
+ log(`SEBI circulars: ${sebiCirc} new documents`);
16
+ }
17
+ else {
18
+ log("Starting RBI sync (last 36 months)...");
19
+ const rbiCount = await syncRbi(36, log);
20
+ log(`RBI: ${rbiCount} new documents`);
21
+ log("Starting SEBI sync...");
22
+ const sebiMaster = await syncSebi(6, 5, log);
23
+ const sebiCirc = await syncSebi(7, 40, log);
24
+ const sebiReg = await syncSebi(3, 10, log);
25
+ log(`SEBI: ${sebiMaster + sebiCirc + sebiReg} new documents`);
26
+ }
27
+ setSyncMeta("last_sync", new Date().toISOString());
28
+ log("Sync complete.");
29
+ process.exit(0);
30
+ }
31
+ main().catch((e) => { console.error(e); process.exit(1); });
@@ -0,0 +1,146 @@
1
+ import * as cheerio from "cheerio";
2
+ import TurndownService from "turndown";
3
+ import pLimit from "p-limit";
4
+ import { politeFetch, sleep } from "../util/http.js";
5
+ import { upsertMany, docExists } from "../db/queries.js";
6
+ const td = new TurndownService();
7
+ const limit = pLimit(2);
8
+ const UA = "india-reg-mcp/1.0 (open-source regulatory indexer)";
9
+ // ssid: 7=circulars, 6=master circulars, 3=regulations
10
+ const SEBI_LIST_BASE = "https://www.sebi.gov.in/sebiweb/home/HomeAction.do?doListing=yes&sid=1&ssid=";
11
+ const SEBI_AJAX = "https://www.sebi.gov.in/sebiweb/ajax/home/getnewslistinfo.jsp";
12
+ function parseListItems($) {
13
+ const items = [];
14
+ $("table tr").each((_, tr) => {
15
+ const $tr = $(tr);
16
+ const link = $tr.find('a[href*="/legal/"]').first();
17
+ if (!link.length)
18
+ return;
19
+ const href = link.attr("href") || "";
20
+ const idMatch = href.match(/_(\d+)\.html/);
21
+ if (!idMatch)
22
+ return;
23
+ const dateCell = $tr.find("td").first().text().trim();
24
+ items.push({
25
+ id: `sebi:${idMatch[1]}`,
26
+ title: link.text().trim(),
27
+ date: toISO(dateCell),
28
+ url: absolute(href, "https://www.sebi.gov.in/"),
29
+ });
30
+ });
31
+ return items;
32
+ }
33
+ // Pages 1+: AJAX POST to getnewslistinfo.jsp (page 0 handled in syncSebi)
34
+ async function getSebiPage(ssid, pageIndex, jsessionid) {
35
+ // Form payload for the listing endpoint; doDirect carries the requested page index
36
+ const body = new URLSearchParams({
37
+ nextValue: "1",
38
+ next: "n",
39
+ search: "", fromDate: "", toDate: "", fromYear: "", toYear: "",
40
+ deptId: "",
41
+ sid: "1", ssid: String(ssid), smid: "0", ssidhidden: String(ssid),
42
+ intmid: "-1",
43
+ sText: "Legal", ssText: ssid === 7 ? "Circulars" : ssid === 6 ? "Master Circulars" : "Regulations",
44
+ smText: "",
45
+ doDirect: String(pageIndex),
46
+ });
47
+ const res = await fetch(SEBI_AJAX, {
48
+ method: "POST",
49
+ headers: {
50
+ "User-Agent": UA,
51
+ "Content-Type": "application/x-www-form-urlencoded",
52
+ "Cookie": `JSESSIONID=${jsessionid}`,
53
+ "Referer": `${SEBI_LIST_BASE}${ssid}&smid=0&nextValue=0`,
54
+ "Accept": "*/*",
55
+ "X-Requested-With": "XMLHttpRequest",
56
+ },
57
+ body: body.toString(),
58
+ signal: AbortSignal.timeout(30_000),
59
+ });
60
+ if (!res.ok)
61
+ throw new Error(`SEBI AJAX failed: HTTP ${res.status}`);
62
+ return parseListItems(cheerio.load(await res.text()));
63
+ }
64
+ async function fetchSebiBody(item) {
65
+ try {
66
+ const res = await politeFetch(item.url);
67
+ const $ = cheerio.load(await res.text());
68
+ // SEBI pages embed content as PDF in an iframe — src may have absolute or relative PDF path
69
+ // e.g. ?file=https://www.sebi.gov.in/sebi_data/... or ?file=/sebi_data/...
70
+ const iframeSrc = $("iframe[src*='sebi_data'], iframe[src*='?file=']").first().attr("src") || "";
71
+ const pdfUrlMatch = iframeSrc.match(/[?&]file=((?:https?:\/\/|\/)[^'"&\s]+\.pdf)/i);
72
+ const rawPdfPath = pdfUrlMatch ? pdfUrlMatch[1] : null;
73
+ const pdfUrl = rawPdfPath
74
+ ? rawPdfPath.startsWith("/") ? `https://www.sebi.gov.in${rawPdfPath}` : rawPdfPath
75
+ : null;
76
+ if (pdfUrl) {
77
+ const { extractPdfText } = await import("./pdf.js");
78
+ const body = await extractPdfText(pdfUrl);
79
+ return { body, pdfUrl };
80
+ }
81
+ // Fallback: extract any visible text from the page
82
+ const main = $(".main_section, .news-detail-slider, #member-wrapper").first();
83
+ const bodyHtml = main.length ? main.html() : $("body").html();
84
+ return { body: bodyHtml ? td.turndown(bodyHtml) : "", pdfUrl: null };
85
+ }
86
+ catch {
87
+ return { body: "", pdfUrl: null };
88
+ }
89
+ }
90
+ const SSID_DOC_TYPE = { 6: "master_circular", 3: "regulation" };
91
+ const SSID_DEPARTMENT = { 6: "Master Circulars", 3: "Regulations", 7: "Circulars" };
92
+ export async function syncSebi(ssid, maxPages, onProgress) {
93
+ // Page 0: single GET that both establishes the session and returns first page listings
94
+ const url = `${SEBI_LIST_BASE}${ssid}&smid=0&nextValue=0`;
95
+ const page0Res = await fetch(url, {
96
+ headers: { "User-Agent": UA, "Accept": "text/html,*/*" },
97
+ signal: AbortSignal.timeout(30_000),
98
+ });
99
+ if (!page0Res.ok)
100
+ throw new Error(`SEBI GET failed: HTTP ${page0Res.status}`);
101
+ const cookie = page0Res.headers.get("set-cookie") || "";
102
+ const jsessionid = cookie.match(/JSESSIONID=([^;]+)/)?.[1] ?? "";
103
+ if (!jsessionid)
104
+ throw new Error("SEBI: failed to obtain JSESSIONID — session cookie absent from page-0 response");
105
+ const doc_type = SSID_DOC_TYPE[ssid] ?? "circular";
106
+ const department = SSID_DEPARTMENT[ssid] ?? "Circulars";
107
+ let total = 0;
108
+ const processPage = async (items) => {
109
+ if (!items.length)
110
+ return false;
111
+ const newItems = items.filter((it) => !docExists(it.id));
112
+ const rows = await Promise.all(newItems.map((it) => limit(async () => {
113
+ const { body, pdfUrl } = await fetchSebiBody(it);
114
+ await sleep(300);
115
+ return {
116
+ id: it.id, regulator: "SEBI",
117
+ doc_type, title: it.title, date: it.date, department,
118
+ source_url: it.url, pdf_url: pdfUrl, body,
119
+ indexed_at: new Date().toISOString(),
120
+ };
121
+ })));
122
+ if (rows.length)
123
+ upsertMany(rows);
124
+ total += rows.length;
125
+ await sleep(500);
126
+ return true;
127
+ };
128
+ const page0Items = parseListItems(cheerio.load(await page0Res.text()));
129
+ onProgress?.(`SEBI ssid=${ssid} page 0: ${page0Items.length} docs`);
130
+ await processPage(page0Items);
131
+ for (let page = 1; page < maxPages; page++) {
132
+ const items = await getSebiPage(ssid, page, jsessionid);
133
+ if (!items.length)
134
+ break;
135
+ onProgress?.(`SEBI ssid=${ssid} page ${page}: ${items.length} docs`);
136
+ await processPage(items);
137
+ }
138
+ return total;
139
+ }
140
+ function toISO(s) {
141
+ const d = new Date(s.replace(/(\w{3})\s+(\d{1,2}),\s+(\d{4})/, "$1 $2 $3"));
142
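+ return isNaN(d.getTime()) ? "" : `${d.getFullYear()}-${String(d.getMonth() + 1).padStart(2, "0")}-${String(d.getDate()).padStart(2, "0")}`; // use local fields: the string parses as local midnight, so toISOString() would shift the day back in UTC+ zones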
143
+ }
144
+ function absolute(href, base) {
145
+ return href.startsWith("http") ? href : new URL(href, base).toString();
146
+ }
@@ -0,0 +1,10 @@
1
+ export const DISCLAIMER = "Source: official RBI/SEBI publications. This is primary-source retrieval, not legal advice. Verify against the linked official document.";
2
+ export function ok(data) {
3
+ return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
4
+ }
5
+ export function err(m) {
6
+ return { content: [{ type: "text", text: `Error: ${m}` }], isError: true };
7
+ }
8
+ export function emptyDbMsg() {
9
+ return ok({ message: "The regulatory index is empty. Run 'npm run sync' first, or call the sync_latest tool to populate it.", disclaimer: DISCLAIMER });
10
+ }
@@ -0,0 +1,25 @@
1
+ const UA = "india-reg-mcp/1.0 (open-source regulatory indexer; +https://github.com/Akhilgovind02/india-regulatory-mcp)";
2
+ export async function politeFetch(url, retries = 2) {
3
+ for (let attempt = 0; attempt <= retries; attempt++) {
4
+ try {
5
+ const res = await fetch(url, {
6
+ headers: { "User-Agent": UA, "Accept": "text/html,application/pdf,*/*" },
7
+ signal: AbortSignal.timeout(30_000),
8
+ });
9
+ if (res.ok)
10
+ return res;
11
+ if (res.status === 429 || res.status >= 500) {
12
+ await sleep(1000 * (attempt + 1));
13
+ continue;
14
+ }
15
+ throw new Error(`HTTP ${res.status} for ${url}`);
16
+ }
17
+ catch (e) {
18
+ if (attempt === retries)
19
+ throw e;
20
+ await sleep(1000 * (attempt + 1));
21
+ }
22
+ }
23
+ throw new Error(`Failed after ${retries} retries: ${url}`);
24
+ }
25
+ export function sleep(ms) { return new Promise((r) => setTimeout(r, ms)); }
package/package.json ADDED
@@ -0,0 +1,61 @@
1
+ {
2
+ "name": "india-reg-mcp",
3
+ "version": "1.0.0",
4
+ "description": "MCP server for Indian financial regulations — RBI & SEBI circulars, master directions, notifications. Searchable, cited, no API keys.",
5
+ "type": "module",
6
+ "main": "dist/index.js",
7
+ "bin": {
8
+ "india-reg-mcp": "./dist/index.js"
9
+ },
10
+ "files": [
11
+ "dist/**/*",
12
+ "README.md"
13
+ ],
14
+ "keywords": [
15
+ "mcp",
16
+ "model-context-protocol",
17
+ "india",
18
+ "rbi",
19
+ "sebi",
20
+ "regulations",
21
+ "circulars",
22
+ "compliance",
23
+ "fintech",
24
+ "claude"
25
+ ],
26
+ "license": "MIT",
27
+ "repository": {
28
+ "type": "git",
29
+ "url": "https://github.com/Akhilgovind02/india-regulatory-mcp.git"
30
+ },
31
+ "homepage": "https://github.com/Akhilgovind02/india-regulatory-mcp",
32
+ "engines": {
33
+ "node": ">=18.0.0"
34
+ },
35
+ "scripts": {
36
+ "build": "tsc",
37
+ "postbuild": "node scripts/add-shebang.mjs",
38
+ "dev": "tsx src/index.ts",
39
+ "sync": "tsx src/scrapers/run-sync.ts",
40
+ "inspect": "npx @modelcontextprotocol/inspector tsx src/index.ts",
41
+ "start": "node dist/index.js",
42
+ "prepublishOnly": "npm run build"
43
+ },
44
+ "dependencies": {
45
+ "@modelcontextprotocol/sdk": "1.29.0",
46
+ "better-sqlite3": "12.10.0",
47
+ "cheerio": "1.2.0",
48
+ "p-limit": "7.3.0",
49
+ "pdf-parse": "2.4.5",
50
+ "turndown": "7.2.4",
51
+ "zod": "4.4.3"
52
+ },
53
+ "devDependencies": {
54
+ "@modelcontextprotocol/inspector": "0.22.0",
55
+ "@types/better-sqlite3": "^7.6.13",
56
+ "@types/node": "25.9.3",
57
+ "@types/turndown": "^5.0.6",
58
+ "tsx": "4.22.4",
59
+ "typescript": "6.0.3"
60
+ }
61
+ }