bidreader 0.2.0__tar.gz → 0.5.0__tar.gz

@@ -0,0 +1,151 @@
+ Metadata-Version: 2.4
+ Name: bidreader
+ Version: 0.5.0
+ Summary: Read messy construction sub-quotes, bid packages & spec PDFs into clean structured data — and catch the scope gaps/exclusions vendors bury. Every value cited to its page.
+ Author-email: Anmol <anmol@attentive.ai>
+ License: MIT
+ Project-URL: Homepage, https://github.com/anmolsam/bidreader
+ Project-URL: Issues, https://github.com/anmolsam/bidreader/issues
+ Keywords: construction,estimating,takeoff,subcontractor,bid,quote,scope,exclusions,spec,AEC,preconstruction,BOQ,LLM,MCP
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Topic :: Office/Business :: Financial
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: pymupdf>=1.24
+ Requires-Dist: certifi>=2024.0
+ Provides-Extra: tables
+ Requires-Dist: pdfplumber>=0.11; extra == "tables"
+ Provides-Extra: mcp
+ Requires-Dist: mcp>=1.2; extra == "mcp"
+ Provides-Extra: dev
+ Requires-Dist: pytest>=8; extra == "dev"
+ Dynamic: license-file
+
+ <div align="center">
+
+ # 📄 BidReader
+
+ ### Read messy construction sub-quotes, bid packages & spec PDFs into clean structured data — and catch the scope gaps and exclusions vendors bury in the fine print.
+
+ Every line item carries its **page**, the **exact source text** it came from, and an **arithmetic check** (`qty × unit_price == amount`) — verification on top of extraction, not just an LLM guess.
+
+ [![PyPI](https://img.shields.io/pypi/v/bidreader?color=2ea043&label=pip%20install%20bidreader)](https://pypi.org/project/bidreader/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+ [![Python](https://img.shields.io/badge/python-3.10%2B-blue)](pyproject.toml)
+ [![MCP](https://img.shields.io/badge/MCP-server-8b5cf6)](docs/MCP.md)
+ [![Runs on free models](https://img.shields.io/badge/runs%20on-free%20LLMs-success)](docs/FREE_MODELS.md)
+
+ </div>
+
+ ---
+
+ > *"Manually typing numbers from a PDF into Excel because the formatting is a crime scene… hunting for the one line where a sub quietly excluded 'trash removal' in size-8 font."*
+ > — r/Construction, **498 upvotes** ([source](https://www.reddit.com/r/Construction/comments/1pq34ur/))
+
+ The construction-AI gold rush is all chasing the same crowded, resisted thing — autonomous *takeoff*. The **loudest unmet pain** of estimators is upstream and downstream of it: wrangling crime-scene PDFs into clean data, and **catching what subcontractors quietly excluded** before it costs six figures on the job.
+
+ No permissively-licensed library did this. **BidReader is that primitive** — MIT, `pip install`, runs on free LLMs, and callable from any AI agent over MCP.
+
+ ## Quickstart (copy-paste, ~30 seconds)
+
+ ```bash
+ pip install bidreader
+
+ # Use any one — a FREE key works (see docs/FREE_MODELS.md):
+ export GEMINI_API_KEY=... # free at aistudio.google.com
+ # or export OPENROUTER_API_KEY=... (has :free models)
+ # or export REQUESTY_API_KEY=...
+
+ bidreader your_sub_quote.pdf
+ ```
+
+ ```python
+ from bidreader import read
+
+ doc = read("sub_quote.pdf")
+ doc.line_items # [{section, description, qty, unit, amount, page}, ...]
+ doc.exclusions # [{item, quote, page, risk}, ...] <- the buried stuff
+ doc.scope_gaps # trade-standard scope NOT in the doc — confirm before bidding
+ doc.to_json()
+ ```
+
+ ## Real output
+
+ On a real **$324,240.61 drywall estimate** (72 line items, scanned in seconds), BidReader's scope engine caught a genuinely expensive hole:
+
+ ```
+ !! SCOPE GAPS TO CONFIRM:
+ - Finishing (taping, mudding, sanding) -- the gypsum line items price the BOARD
+   only, not the finishing labor to reach a paint-ready surface.
+ - Door hardware -- "Door W/ Frame" lines don't include hinges/locks/closers.
+ - Firestopping at rated assemblies -- life-safety scope, commonly omitted.
+ ```
+
+ On a real **25-page multi-trade GC estimate**, it parsed **959 line items across 16 CSI divisions** (demolition → concrete → steel → finishes → plumbing → fire suppression), each page-cited. See [docs/RESULTS.md](docs/RESULTS.md) and a full worked example in [`examples/`](examples/).
+
+ ## Use it from an AI agent (MCP)
+
+ ```bash
+ pip install "bidreader[mcp]"
+ ```
+ ```json
+ { "mcpServers": { "bidreader": {
+   "command": "bidreader-mcp",
+   "env": { "GEMINI_API_KEY": "..." }
+ }}}
+ ```
+ Tools: `read_document`, `catch_exclusions`, `extract_line_items`. Now your agent can answer *"which subs excluded fire-stopping across this bid folder?"* Full guide: [docs/MCP.md](docs/MCP.md).
+
+ ## How it works
+
+ ```
+ PDF (sub-quote / bid package / spec / schedule)
+   → page-tagged text extraction (PyMuPDF)
+   → chunk by page (scales to 25+ page, 900+ line-item estimates)
+   → LLM structured extraction (line items · exclusions · assumptions · alternates · scope gaps)
+   → merge + page-cited output (JSON / CLI / MCP)
+ ```
+
+ Text-based, so it runs great on **free** models — see [docs/FREE_MODELS.md](docs/FREE_MODELS.md).
+
+ ## Benchmark
+
+ Reproducible ground-truth benchmark ([`benchmark/`](benchmark/)) — synthetic docs we author, so truth is exact and the PDFs ship in-repo:
+
+ | metric | score |
+ |---|---|
+ | Line-item recall | **100%** |
+ | Exclusion-catch recall (incl. prose-buried) | **100%** |
+ | No-hallucination rate (clean docs) | **100%** |
+ | Bid-total accuracy (±2%) | **100%** |
+ | Arithmetic errors caught | **2/2**, 0 false positives |
+
+ Honest caveat: synthetic docs are cleaner than real scans — these are an **upper bound** on well-structured input, not a claim about messy real bids. Uncontrolled real-document results are in [docs/RESULTS.md](docs/RESULTS.md). Reproduce: `python benchmark/generate.py && python benchmark/run.py`.
+
+ ## Why this, and why now — the evidence
+
+ A full write-up (problem, market data, prior-art gap, method, results) is in **[PAPER.md](PAPER.md)**. The short version:
+
+ - **Loudest, most-shared pain** in construction-estimating communities (the 498-upvote thread above; more cited in the paper).
+ - **It works *today*** — document extraction is LLM-native, unlike floor-plan symbol detection (academic SOTA tops out ~83% mAP).
+ - **Empty slot** — `bidreader`, `blueprint-parser`, `pytakeoff` were all unclaimed on PyPI; the only adjacent tools are AGPL/non-commercial or abandoned toys.
+ - **Broadest base** — every estimator *and* every construction-AI builder needs document extraction. The library is the dependency; the MCP server is the agent-era surface.
+
+ ## Roadmap
+
+ - [ ] Scanned-PDF vision OCR path
+ - [ ] Revision/addendum **diff** ("what changed between Addendum 3 and 4")
+ - [ ] Excel/CSV BOQ export + multi-quote **leveling** (compare subs side-by-side)
+ - [ ] Region/trade notation packs (AISC, BS/IS, AUS)
+
+ ## Contributing
+
+ PRs welcome — see [CONTRIBUTING.md](CONTRIBUTING.md). Good first issues: add a notation parser, a new export format, or a test fixture.
+
+ ## License
+
+ [MIT](LICENSE) © 2026. Cite via [CITATION.cff](CITATION.cff).
@@ -1,3 +1,3 @@
  from .extract import read, Doc
  __all__ = ["read", "Doc"]
- __version__ = "0.2.0"
+ __version__ = "0.5.0"
@@ -24,6 +24,12 @@ def main():
                f"{str(li.get('qty') or ''):>8s}{str(li.get('unit') or ''):>5s}{amt:>13s} p{li.get('page','?')}")
      if d.get('bid_total'):
          print(f" {'BID TOTAL':56s}{'$' + format(d['bid_total'], ',.2f'):>13s}")
+     mm = [li for li in d.line_items if li.get('math_check') == 'mismatch']
+     if mm:
+         print(f"\n!! ARITHMETIC MISMATCHES ({len(mm)}) — qty x unit_price != amount:")
+         for li in mm[:10]:
+             print(f" - p{li.get('page','?')} {str(li.get('description',''))[:46]}: "
+                   f"stated {li.get('amount')}, computed {li.get('math_expected')}")
      print(f"\n!! EXCLUSIONS CAUGHT ({len(d.exclusions)}):")
      for e in d.exclusions:
          print(f" - {e.get('item','?')} (page {e.get('page','?')})")
@@ -24,7 +24,7 @@ SCHEMA_PROMPT = """You are a construction estimating assistant reading a vendor/
  "vendor": "<company or null>", "project": "<project/title or null>",
  "trade": "<trade e.g. Drywall, Electrical or null>", "currency": "<e.g. USD or null>",
  "bid_total": <number or null>,
- "line_items": [{"section":"<csi/section or null>","description":"<text>","qty":<num|null>,"unit":"<EA/SF/LF/LS|null>","unit_price":<num|null>,"amount":<num|null>,"page":<int>}],
+ "line_items": [{"section":"<csi/section or null>","description":"<text>","qty":<num|null>,"unit":"<EA/SF/LF/LS|null>","unit_price":<num|null>,"amount":<num|null>,"page":<int>,"source_text":"<the EXACT line as printed in the document>"}],
  "exclusions": [{"item":"<short label>","quote":"<EXACT text as written>","page":<int>,"risk":"<one line: why this matters / who eats the cost>"}],
  "assumptions": [{"text":"<exact>","page":<int>}],
  "alternates": [{"text":"<exact>","amount":<num|null>,"page":<int>}],
@@ -36,13 +36,26 @@ Rules: exclusions are CRITICAL — hunt everywhere, including fine print, footno
  For scope_gaps, infer trade-standard scope a vendor commonly omits that is NOT mentioned in this doc."""


- def _page_text(doc):
-     parts = []
+ def _page_blocks(doc):
+     out = []
      for i, p in enumerate(doc):
          t = p.get_text().strip()
          if t:
-             parts.append(f"[PAGE {i+1}]\n{t}")
-     return "\n\n".join(parts)
+             out.append(f"[PAGE {i+1}]\n{t}")
+     return out
+
+
+ def _chunk(blocks, budget=42000):
+     """Group page-blocks into chunks under a char budget so the model's JSON
+     output never overflows on large multi-page estimates."""
+     chunks, cur, n = [], [], 0
+     for b in blocks:
+         if cur and n + len(b) > budget:
+             chunks.append("\n\n".join(cur)); cur, n = [], 0
+         cur.append(b); n += len(b)
+     if cur:
+         chunks.append("\n\n".join(cur))
+     return chunks


  def _llm(text):
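
Reviewer note: the greedy grouping that `_chunk` introduces can be sketched in isolation. The function below is a standalone rewrite for illustration, not the package's code; the name `chunk_pages` and the sample pages are assumptions. It follows the same rule as the hunk above: start a new chunk only when adding the next page would push the running character count past the budget.

```python
def chunk_pages(blocks: list[str], budget: int = 42000) -> list[str]:
    """Greedily pack page blocks into chunks of roughly `budget` characters.

    Hypothetical standalone sketch of the grouping rule shown in the diff.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for block in blocks:
        # Flush only if the chunk already holds a page and would overflow.
        if current and size + len(block) > budget:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(block)
        size += len(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Five synthetic pages of about 20,000 characters each (made-up data).
pages = [f"[PAGE {i}]\n" + "x" * 20000 for i in range(1, 6)]
chunks = chunk_pages(pages)
pages_per_chunk = [c.count("[PAGE") for c in chunks]
print(pages_per_chunk)

# A single page larger than the budget is never split; it becomes its own chunk.
oversized = chunk_pages(["y" * 50000, "z" * 10])
print(len(oversized), len(oversized[0]))
```

Two properties follow from this rule and may be worth keeping in mind when reading the hunk: the budget counts page text only, so the `"\n\n"` separators can push a chunk slightly past it, and a page longer than the budget is passed through whole rather than split.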
@@ -74,6 +87,26 @@ def _clean(s):
      return json.loads(s)


+ def validate(data):
+     """Attach an objective arithmetic check to each line item:
+     'ok' if qty*unit_price ≈ amount, 'mismatch' if not, 'n/a' if missing inputs.
+     This is non-LLM verification — the trust layer on top of extraction."""
+     flagged = 0
+     for li in data.get("line_items", []):
+         q, up, amt = li.get("qty"), li.get("unit_price"), li.get("amount")
+         if isinstance(q, (int, float)) and isinstance(up, (int, float)) and isinstance(amt, (int, float)):
+             expect = q * up
+             tol = max(1.0, abs(amt) * 0.02) # 2% or $1
+             if abs(expect - amt) <= tol:
+                 li["math_check"] = "ok"
+             else:
+                 li["math_check"] = "mismatch"; li["math_expected"] = round(expect, 2); flagged += 1
+         else:
+             li["math_check"] = "n/a"
+     data["math_flagged"] = flagged
+     return data
+
+
  class Doc(dict):
      """Result with convenience accessors."""
      @property
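
Reviewer note: the tolerance in `validate` is the larger of $1 and 2% of the stated amount, which is easy to misread from the inline comment. The sketch below reimplements that rule for a single line item so the boundary cases can be tried by hand. The helper name `math_check` and the sample figures are assumptions for illustration, not part of the package.

```python
from typing import Optional


def math_check(qty, unit_price, amount) -> tuple[str, Optional[float]]:
    """Classify one line item the way the diffed `validate` appears to.

    Returns (status, expected), where expected is set only on a mismatch.
    Hypothetical helper written for this note.
    """
    values = (qty, unit_price, amount)
    if not all(isinstance(v, (int, float)) for v in values):
        return "n/a", None  # a missing input means nothing can be checked
    expected = qty * unit_price
    tolerance = max(1.0, abs(amount) * 0.02)  # larger of $1 and 2% of amount
    if abs(expected - amount) <= tolerance:
        return "ok", None
    return "mismatch", round(expected, 2)


small_ok = math_check(10, 2.0, 20.0)            # exact match
small_bad = math_check(10, 2.0, 25.0)           # off by $5, tolerance is $1
missing = math_check(None, 2.0, 20.0)           # qty not extracted
large_within = math_check(1000, 35.0, 35600.0)  # off by $600, tolerance is $712

print(small_ok, small_bad, missing, large_within)
```

The last case shows the design trade-off: a relative tolerance absorbs rounding in unit prices, but on large line items it also lets a discrepancy of several hundred dollars pass as `ok`.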
@@ -85,13 +118,40 @@ class Doc(dict):
      def to_json(self, **kw): return json.dumps(self, indent=2, **kw)


+ def _merge(parts):
+     out = {"doc_type": None, "vendor": None, "project": None, "trade": None,
+            "currency": None, "bid_total": None, "line_items": [], "exclusions": [],
+            "assumptions": [], "alternates": [], "scope_gaps": [], "notes": None}
+     seen_gap = set()
+     totals = []
+     for d in parts:
+         for k in ("doc_type", "vendor", "project", "trade", "currency", "notes"):
+             if not out[k] and d.get(k):
+                 out[k] = d[k]
+         if isinstance(d.get("bid_total"), (int, float)):
+             totals.append(d["bid_total"])
+         for k in ("line_items", "exclusions", "assumptions", "alternates"):
+             out[k].extend(d.get(k) or [])
+         for g in d.get("scope_gaps") or []:
+             key = (g.get("missing") or "").strip().lower()
+             if key and key not in seen_gap:
+                 seen_gap.add(key); out["scope_gaps"].append(g)
+     out["bid_total"] = max(totals) if totals else None # grand total > subtotals
+     return out
+
+
  def read(path: str) -> Doc:
-     """Read a construction PDF into structured, page-cited data."""
+     """Read a construction PDF into structured, page-cited data.
+
+     Large multi-page estimates are chunked by page and merged, so the model's
+     JSON output never truncates."""
      doc = fitz.open(path)
-     text = _page_text(doc)
-     if len(text) < 40:
+     blocks = _page_blocks(doc)
+     if sum(len(b) for b in blocks) < 40:
          raise RuntimeError("No extractable text (scanned PDF) — vision OCR path TODO.")
-     data = _llm(text)
+     chunks = _chunk(blocks)
+     data = validate(_merge([_llm(c) for c in chunks]))
      data["_source"] = path.split("/")[-1]
      data["_pages"] = doc.page_count
+     data["_chunks"] = len(chunks)
      return Doc(data)
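
Reviewer note: the merge step combines per-chunk results with three rules: the first non-empty value wins for header fields, lists are concatenated, and the bid total is the maximum reported by any chunk. The sketch below reduces that to a few fields so the rules can be exercised without an LLM. The name `merge_parts` and the two sample chunk results are made up for illustration.

```python
def merge_parts(parts: list[dict]) -> dict:
    """Merge per-chunk extraction results (reduced, hypothetical version)."""
    merged = {"vendor": None, "bid_total": None, "line_items": [], "scope_gaps": []}
    seen_gaps: set[str] = set()
    totals: list[float] = []
    for part in parts:
        # Header fields: the first chunk that reports a value wins.
        if not merged["vendor"] and part.get("vendor"):
            merged["vendor"] = part["vendor"]
        if isinstance(part.get("bid_total"), (int, float)):
            totals.append(part["bid_total"])
        # Lists are concatenated in chunk order, without de-duplication.
        merged["line_items"].extend(part.get("line_items") or [])
        # Scope gaps are de-duplicated on a case-insensitive label.
        for gap in part.get("scope_gaps") or []:
            key = (gap.get("missing") or "").strip().lower()
            if key and key not in seen_gaps:
                seen_gaps.add(key)
                merged["scope_gaps"].append(gap)
    # Heuristic: the largest total seen is taken to be the grand total.
    merged["bid_total"] = max(totals) if totals else None
    return merged


chunk_a = {"vendor": "Acme", "bid_total": 100,
           "line_items": [{"description": "a"}],
           "scope_gaps": [{"missing": "Firestopping"}]}
chunk_b = {"bid_total": 250,
           "line_items": [{"description": "b"}],
           "scope_gaps": [{"missing": "firestopping"}, {"missing": "Door hardware"}]}

merged = merge_parts([chunk_a, chunk_b])
print(merged["vendor"], merged["bid_total"], len(merged["scope_gaps"]))
```

The maximum is a heuristic rather than a guarantee: it assumes no chunk reports a figure larger than the true grand total, which could fail if, for example, the model labels a budget or contract ceiling as `bid_total`.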
@@ -10,4 +10,5 @@ bidreader.egg-info/SOURCES.txt
  bidreader.egg-info/dependency_links.txt
  bidreader.egg-info/entry_points.txt
  bidreader.egg-info/requires.txt
- bidreader.egg-info/top_level.txt
+ bidreader.egg-info/top_level.txt
+ tests/test_offline.py
@@ -1,6 +1,9 @@
  pymupdf>=1.24
  certifi>=2024.0

+ [dev]
+ pytest>=8
+
  [mcp]
  mcp>=1.2

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "bidreader"
- version = "0.2.0"
+ version = "0.5.0"
  description = "Read messy construction sub-quotes, bid packages & spec PDFs into clean structured data — and catch the scope gaps/exclusions vendors bury. Every value cited to its page."
  readme = "README.md"
  requires-python = ">=3.10"
@@ -24,6 +24,7 @@ dependencies = ["pymupdf>=1.24", "certifi>=2024.0"]
  [project.optional-dependencies]
  tables = ["pdfplumber>=0.11"]
  mcp = ["mcp>=1.2"]
+ dev = ["pytest>=8"]

  [project.scripts]
  bidreader = "bidreader.cli:main"
@@ -0,0 +1,36 @@
+ """Offline regression tests (no network / no LLM). Run: pytest -q"""
+ from bidreader.extract import _chunk, _merge, validate
+
+
+ def test_chunk_respects_budget_and_keeps_all_pages():
+     blocks = [f"[PAGE {i}]\n" + "x" * 20000 for i in range(1, 6)]
+     chunks = _chunk(blocks, budget=42000)
+     assert len(chunks) >= 2
+     joined = "\n\n".join(chunks)
+     for i in range(1, 6):
+         assert f"[PAGE {i}]" in joined # no page dropped
+
+
+ def test_merge_concats_and_picks_grand_total():
+     a = {"line_items": [{"description": "a"}], "bid_total": 100,
+          "scope_gaps": [{"missing": "X", "why": "1"}], "vendor": "Acme"}
+     b = {"line_items": [{"description": "b"}], "bid_total": 250,
+          "scope_gaps": [{"missing": "x", "why": "dup"}, {"missing": "Y", "why": "2"}]}
+     m = _merge([a, b])
+     assert len(m["line_items"]) == 2
+     assert m["bid_total"] == 250 # grand total = max
+     assert len(m["scope_gaps"]) == 2 # 'X' and 'x' dedupe
+     assert m["vendor"] == "Acme"
+
+
+ def test_validate_flags_bad_arithmetic():
+     data = {"line_items": [
+         {"qty": 10, "unit_price": 2.0, "amount": 20.0}, # ok
+         {"qty": 10, "unit_price": 2.0, "amount": 25.0}, # mismatch
+         {"qty": None, "unit_price": 2.0, "amount": 20.0}, # n/a
+     ]}
+     out = validate(data)
+     checks = [li["math_check"] for li in out["line_items"]]
+     assert checks == ["ok", "mismatch", "n/a"]
+     assert out["math_flagged"] == 1
+     assert out["line_items"][1]["math_expected"] == 20.0
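
Reviewer note: the offline tests above cover chunking, merging and arithmetic, but nothing in this diff checks the new `source_text` field against the document. One possible check, sketched here as an assumption and not something the package ships, is to confirm that each item's `source_text` really occurs on the page it cites, after normalising whitespace and case. The helper names and the sample page text are hypothetical.

```python
def normalise(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so layout differences don't matter."""
    return " ".join(text.split()).lower()


def is_grounded(item: dict, pages: dict[int, str]) -> bool:
    """Return True if the item's source_text appears on its cited page.

    `pages` maps a 1-based page number to that page's extracted text.
    Hypothetical helper; not part of bidreader.
    """
    source = item.get("source_text")
    page_text = pages.get(item.get("page"))
    if not source or page_text is None:
        return False
    return normalise(source) in normalise(page_text)


# Made-up page text with the irregular spacing typical of PDF extraction.
pages = {1: "09 29 00   Tape & finish, Level 4    24800 SF   $23,560.00"}

grounded = is_grounded(
    {"page": 1, "source_text": "09 29 00 Tape & finish, Level 4 24800 SF $23,560.00"}, pages)
wrong_page = is_grounded(
    {"page": 2, "source_text": "09 29 00 Tape & finish, Level 4 24800 SF $23,560.00"}, pages)
invented = is_grounded(
    {"page": 1, "source_text": "09 29 00 Tape & finish, Level 5 24800 SF $99,999.00"}, pages)

print(grounded, wrong_page, invented)
```

A substring match is deliberately strict. It would reject a line the model paraphrased, which is arguably the point of asking for the exact printed text, but it would also reject lines that PDF extraction split across columns.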
bidreader-0.2.0/PKG-INFO DELETED
@@ -1,109 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: bidreader
3
- Version: 0.2.0
4
- Summary: Read messy construction sub-quotes, bid packages & spec PDFs into clean structured data — and catch the scope gaps/exclusions vendors bury. Every value cited to its page.
5
- Author-email: Anmol <anmol@attentive.ai>
6
- License: MIT
7
- Project-URL: Homepage, https://github.com/anmolsam/bidreader
8
- Project-URL: Issues, https://github.com/anmolsam/bidreader/issues
9
- Keywords: construction,estimating,takeoff,subcontractor,bid,quote,scope,exclusions,spec,AEC,preconstruction,BOQ,LLM,MCP
10
- Classifier: Development Status :: 4 - Beta
11
- Classifier: Intended Audience :: Developers
12
- Classifier: License :: OSI Approved :: MIT License
13
- Classifier: Programming Language :: Python :: 3
14
- Classifier: Topic :: Office/Business :: Financial
15
- Requires-Python: >=3.10
16
- Description-Content-Type: text/markdown
17
- License-File: LICENSE
18
- Requires-Dist: pymupdf>=1.24
19
- Requires-Dist: certifi>=2024.0
20
- Provides-Extra: tables
21
- Requires-Dist: pdfplumber>=0.11; extra == "tables"
22
- Provides-Extra: mcp
23
- Requires-Dist: mcp>=1.2; extra == "mcp"
24
- Dynamic: license-file
-
- # BidReader
-
- **Read messy construction sub-quotes, bid packages & spec PDFs into clean structured data — and catch the scope gaps and exclusions vendors bury in the fine print.** Every value is cited to its page and exact source text.
-
- MIT · `pip install bidreader` · works on the PDFs estimators actually get.
-
- > *"Manually typing numbers from a PDF into Excel because the formatting is a crime scene… hunting for the one line where a sub quietly excluded 'trash removal' in size-8 font."* — r/Construction (498 upvotes)
-
- BidReader is that junior who never sleeps: it doesn't write anything new — it finds what's already there and points to it.
-
- ---
-
- ## What it does
-
- ```bash
- pip install bidreader
- export REQUESTY_API_KEY=... # or OPENROUTER_API_KEY / GEMINI_API_KEY (free tier works)
- bidreader sub_quote.pdf
- ```
-
- ```python
- from bidreader import read
- doc = read("sub_quote.pdf")
- doc.line_items # [{section, description, qty, unit, amount, page}, ...]
- doc.exclusions # [{item, quote, page, risk}, ...] <- the stuff they buried
- doc.scope_gaps # trade-standard scope NOT mentioned (confirm before you bid)
- doc.to_json()
- ```
-
- ## Real output (drywall sub-quote, exclusion buried in size-7 font)
-
- ```
- LINE ITEMS (5):
- 09 22 16 Metal stud framing, 3-5/8" 25ga walls 12400 SF $35,340.00 p1
- 09 29 00 5/8" Type X gypsum board, both faces 24800 SF $40,920.00 p1
- 09 29 00 Tape & finish, Level 4 24800 SF $23,560.00 p1
- ... BID TOTAL $121,628.00
-
- !! EXCLUSIONS CAUGHT (4):
- - Fire-stopping/firecaulking (page 1)
- "this proposal EXCLUDES fire-stopping/firecaulking at rated assemblies"
- risk: life-safety scope; another trade or a change order eats this cost.
- - Debris removal/haul-off (page 1)
- "removal/haul-off of construction debris (by others)"
- ...
-
- SCOPE GAPS TO CONFIRM (5):
- - Acoustic ceiling tiles -- grid framing is included but the tiles within it are not.
- - Rough carpentry blocking/backing for fixtures -- not mentioned.
- ```
-
- ## Why
-
- The construction-AI gold rush is all building the same crowded, resisted thing — autonomous *takeoff*. The loudest, most-repeated, **unmet** estimator pain is upstream and downstream of it: turning crime-scene PDFs into clean data and **catching what subs quietly excluded**. No permissive library does this. BidReader is that primitive.
-
- - **MIT** — depend on it inside your commercial estimating/BIM product (no AGPL/NC contamination).
- - **Provider-agnostic** — Requesty, OpenRouter, or Google AI Studio (free tier).
- - **Cited** — every number traces to a page + the exact source text. Trust is the real adoption blocker; this is built for it.
-
- ## Use it from an AI agent (MCP)
-
- Any MCP client — Claude Desktop, Cursor, etc. — can call BidReader:
-
- ```bash
- pip install "bidreader[mcp]"
- ```
- ```json
- {
- "mcpServers": {
- "bidreader": {
- "command": "bidreader-mcp",
- "env": { "REQUESTY_API_KEY": "rqsty-..." }
- }
- }
- }
- ```
- Tools exposed: `read_document(path)`, `catch_exclusions(path)`, `extract_line_items(path)`.
- Now your agent can answer *"which subs excluded fire-stopping?"* across a bid folder.
-
- ## Roadmap
- - Scanned-PDF vision OCR path · revision/addendum **diff** ("what changed between Addendum 3 and 4") · `bidreader-mcp` server (any agent can call it) · Excel/CSV export · multi-quote leveling (compare subs side-by-side).
-
- ## License
- MIT.
bidreader-0.2.0/README.md DELETED
@@ -1,84 +0,0 @@
- # BidReader
-
- **Read messy construction sub-quotes, bid packages & spec PDFs into clean structured data — and catch the scope gaps and exclusions vendors bury in the fine print.** Every value is cited to its page and exact source text.
-
- MIT · `pip install bidreader` · works on the PDFs estimators actually get.
-
- > *"Manually typing numbers from a PDF into Excel because the formatting is a crime scene… hunting for the one line where a sub quietly excluded 'trash removal' in size-8 font."* — r/Construction (498 upvotes)
-
- BidReader is that junior who never sleeps: it doesn't write anything new — it finds what's already there and points to it.
-
- ---
-
- ## What it does
-
- ```bash
- pip install bidreader
- export REQUESTY_API_KEY=... # or OPENROUTER_API_KEY / GEMINI_API_KEY (free tier works)
- bidreader sub_quote.pdf
- ```
-
- ```python
- from bidreader import read
- doc = read("sub_quote.pdf")
- doc.line_items # [{section, description, qty, unit, amount, page}, ...]
- doc.exclusions # [{item, quote, page, risk}, ...] <- the stuff they buried
- doc.scope_gaps # trade-standard scope NOT mentioned (confirm before you bid)
- doc.to_json()
- ```
-
- ## Real output (drywall sub-quote, exclusion buried in size-7 font)
-
- ```
- LINE ITEMS (5):
- 09 22 16 Metal stud framing, 3-5/8" 25ga walls 12400 SF $35,340.00 p1
- 09 29 00 5/8" Type X gypsum board, both faces 24800 SF $40,920.00 p1
- 09 29 00 Tape & finish, Level 4 24800 SF $23,560.00 p1
- ... BID TOTAL $121,628.00
-
- !! EXCLUSIONS CAUGHT (4):
- - Fire-stopping/firecaulking (page 1)
- "this proposal EXCLUDES fire-stopping/firecaulking at rated assemblies"
- risk: life-safety scope; another trade or a change order eats this cost.
- - Debris removal/haul-off (page 1)
- "removal/haul-off of construction debris (by others)"
- ...
-
- SCOPE GAPS TO CONFIRM (5):
- - Acoustic ceiling tiles -- grid framing is included but the tiles within it are not.
- - Rough carpentry blocking/backing for fixtures -- not mentioned.
- ```
-
- ## Why
-
- The construction-AI gold rush is all building the same crowded, resisted thing — autonomous *takeoff*. The loudest, most-repeated, **unmet** estimator pain is upstream and downstream of it: turning crime-scene PDFs into clean data and **catching what subs quietly excluded**. No permissive library does this. BidReader is that primitive.
-
- - **MIT** — depend on it inside your commercial estimating/BIM product (no AGPL/NC contamination).
- - **Provider-agnostic** — Requesty, OpenRouter, or Google AI Studio (free tier).
- - **Cited** — every number traces to a page + the exact source text. Trust is the real adoption blocker; this is built for it.
-
- ## Use it from an AI agent (MCP)
-
- Any MCP client — Claude Desktop, Cursor, etc. — can call BidReader:
-
- ```bash
- pip install "bidreader[mcp]"
- ```
- ```json
- {
- "mcpServers": {
- "bidreader": {
- "command": "bidreader-mcp",
- "env": { "REQUESTY_API_KEY": "rqsty-..." }
- }
- }
- }
- ```
- Tools exposed: `read_document(path)`, `catch_exclusions(path)`, `extract_line_items(path)`.
- Now your agent can answer *"which subs excluded fire-stopping?"* across a bid folder.
-
- ## Roadmap
- - Scanned-PDF vision OCR path · revision/addendum **diff** ("what changed between Addendum 3 and 4") · `bidreader-mcp` server (any agent can call it) · Excel/CSV export · multi-quote leveling (compare subs side-by-side).
-
- ## License
- MIT.
File without changes
File without changes