open-civics 0.1.1

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Tim Simpson
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,172 @@
+ # open-civics
+
+ Structured contact data for US elected officials — state legislators, county councils, and city councils — scraped weekly and published as npm packages.
+
+ South Carolina is fully covered: 170 state legislators and 96 local jurisdictions (every county and incorporated municipality).
+
+ ## Packages
+
+ Install via npm:
+
+ ```bash
+ npm install open-civics # contact data
+ npm install open-civics-boundaries # district boundary GeoJSON
+ ```
+
+ | Package | What's in it |
+ |---------|-------------|
+ | `open-civics` | Names, titles, emails, phones, districts for state and local reps |
+ | `open-civics-boundaries` | GeoJSON district boundaries for client-side point-in-polygon matching |
+
+ ## What the data looks like
+
+ Each local jurisdiction file has a `meta` block and a `members` array:
+
+ ```json
+ {
+   "meta": {
+     "state": "SC",
+     "level": "local",
+     "jurisdiction": "county:greenville",
+     "label": "Greenville County Council",
+     "lastUpdated": "2026-03-14"
+   },
+   "members": [
+     {
+       "name": "Benton Blount",
+       "title": "Chairman, District 19",
+       "email": "BBlount@greenvillecounty.org",
+       "phone": "(864) 483-2474"
+     }
+   ]
+ }
+ ```
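As a rough illustration of working with a local jurisdiction file, the sketch below searches the `members` array for a given district. The helper `findMemberByDistrict` is hypothetical and not part of the package, and it assumes the district number appears in `title` as "District <n>", as in the sample above. Titles on other councils may be formatted differently. The data is inlined here so the snippet runs without the package installed.

```javascript
// Inline sample shaped like the Greenville County example in the README.
const greenvilleCounty = {
  meta: {
    state: "SC",
    level: "local",
    jurisdiction: "county:greenville",
    label: "Greenville County Council",
    lastUpdated: "2026-03-14",
  },
  members: [
    {
      name: "Benton Blount",
      title: "Chairman, District 19",
      email: "BBlount@greenvillecounty.org",
      phone: "(864) 483-2474",
    },
  ],
};

// Hypothetical helper (not part of open-civics): find the member whose title
// mentions the given district. Assumes titles contain "District <n>".
function findMemberByDistrict(file, district) {
  const pattern = new RegExp(`\\bDistrict\\s+${Number(district)}\\b`, "i");
  return file.members.find((m) => pattern.test(m.title ?? "")) ?? null;
}

const member = findMemberByDistrict(greenvilleCounty, 19);
console.log(member ? `${member.name} <${member.email}>` : "no match");
```

The word-boundary anchors keep a search for district 1 from matching "District 19".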
+
+ State legislator files group legislators by chamber (`senate` and `house`), with each chamber keyed by district number:
+
+ ```json
+ {
+   "meta": { "state": "SC", "level": "state" },
+   "senate": {
+     "1": { "name": "...", "district": "1", "party": "R", "email": "...", "phone": "..." }
+   },
+   "house": {
+     "1": { "name": "...", "district": "1", "party": "D", "email": "...", "phone": "..." }
+   }
+ }
+ ```
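Because district keys are strings, a lookup by chamber and district could look something like the sketch below. `getLegislator` is a hypothetical helper, not something the package exports, and the `"..."` values are the README's own placeholders rather than real data.

```javascript
// Inline sample with the same shape as the state file shown above.
// The "..." values are placeholders copied from the README, not real data.
const scState = {
  meta: { state: "SC", level: "state" },
  senate: {
    "1": { name: "...", district: "1", party: "R", email: "...", phone: "..." },
  },
  house: {
    "1": { name: "...", district: "1", party: "D", email: "...", phone: "..." },
  },
};

// Hypothetical helper: look up one legislator by chamber and district.
// District keys are strings in the JSON, so numeric input is converted first.
function getLegislator(stateFile, chamber, district) {
  if (chamber !== "senate" && chamber !== "house") {
    throw new RangeError(`Unknown chamber: ${chamber}`);
  }
  return stateFile[chamber]?.[String(district)] ?? null;
}

console.log(getLegislator(scState, "senate", 1));
console.log(getLegislator(scState, "house", "99")); // no such district in the sample
```

Returning `null` for a missing district (instead of `undefined`) makes the "not found" case explicit for callers.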
+
+ ## Usage
+
+ ```js
+ // State legislators
+ import scState from 'open-civics/sc/state.json';
+ const senator = scState.senate["1"];
+
+ // Local councils
+ import greenvilleCounty from 'open-civics/sc/local/county-greenville.json';
+ const members = greenvilleCounty.members;
+
+ // District boundaries (GeoJSON FeatureCollection)
+ import senateBoundaries from 'open-civics-boundaries/sc/boundaries/sldu.json';
+ import houseBoundaries from 'open-civics-boundaries/sc/boundaries/sldl.json';
+ import countyBoundaries from 'open-civics-boundaries/sc/boundaries/county-greenville.json';
+ ```
+
+ Boundary files are standard GeoJSON FeatureCollections. Each feature has a `properties.district` field matching the district keys in the contact data. Use any point-in-polygon library (Turf.js, Mapbox, etc.) to find which district a user's address falls in.
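To make the matching step concrete, here is a minimal dependency-free sketch of a point-in-polygon lookup using ray casting. It is only an illustration: the two square "districts" are made up, `findDistrict` is a hypothetical helper, and the input is assumed to be a `[longitude, latitude]` pair that has already been geocoded from the user's address. For real boundaries a maintained library such as Turf.js is likely the safer choice.

```javascript
// Ray casting for one linear ring of [lng, lat] positions: count how many
// ring edges a horizontal ray from the point crosses; odd means inside.
function pointInRing([x, y], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses =
      (yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// A GeoJSON polygon is an outer ring followed by zero or more holes.
function pointInPolygon(point, rings) {
  if (rings.length === 0 || !pointInRing(point, rings[0])) return false;
  return !rings.slice(1).some((hole) => pointInRing(point, hole));
}

// Hypothetical helper: return properties.district of the first feature
// containing the point, or null. Handles Polygon and MultiPolygon geometry.
function findDistrict(point, featureCollection) {
  for (const feature of featureCollection.features) {
    const { type, coordinates } = feature.geometry;
    const polygons =
      type === "Polygon" ? [coordinates] : type === "MultiPolygon" ? coordinates : [];
    if (polygons.some((rings) => pointInPolygon(point, rings))) {
      return feature.properties.district;
    }
  }
  return null;
}

// Made-up boundaries: two adjacent unit squares standing in for districts.
const boundaries = {
  type: "FeatureCollection",
  features: [
    {
      type: "Feature",
      properties: { district: "1" },
      geometry: { type: "Polygon", coordinates: [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]] },
    },
    {
      type: "Feature",
      properties: { district: "2" },
      geometry: { type: "Polygon", coordinates: [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]] },
    },
  ],
};

console.log(findDistrict([1.5, 0.5], boundaries));
```

The returned district string can then be used directly as a key into the contact data, for example `scState.senate[district]`.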
+
+ ## Data structure
+
+ ```
+ data/
+   sc/
+     state.json                  # State legislators (senate + house + governor)
+     local/
+       county-greenville.json    # Greenville County Council
+       place-greenville.json     # Greenville City Council
+       ...                       # 96 jurisdiction files total
+     boundaries/
+       sldu.json                 # State senate district boundaries
+       sldl.json                 # State house district boundaries
+       county-greenville.json    # County council district boundaries
+       place-greenville.json     # City council district boundaries (where available)
+       ...
+ ```
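The file names in the tree above appear to mirror the `meta.jurisdiction` identifier with the colon replaced by a hyphen (`county:greenville` becomes `county-greenville.json`). Under that assumption, a module path could be derived as in the sketch below. `localFilePath` is a hypothetical helper, and the `place:` prefix for cities is inferred from the `place-greenville.json` file name rather than documented anywhere in this README.

```javascript
// Hypothetical helper: map a state code and a jurisdiction id such as
// "county:greenville" to the package path of its contact file.
// Assumes ids have the form "<kind>:<slug>" with kind "county" or "place".
function localFilePath(state, jurisdiction) {
  const match = /^(county|place):([a-z0-9-]+)$/.exec(jurisdiction);
  if (!match) {
    throw new Error(`Unrecognized jurisdiction id: ${jurisdiction}`);
  }
  const [, kind, slug] = match;
  return `open-civics/${state.toLowerCase()}/local/${kind}-${slug}.json`;
}

console.log(localFilePath("SC", "county:greenville"));
```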
+
+ ## How scraping works
+
+ Data is scraped from government websites using Python adapters — one per site pattern. Five shared adapters handle the most common CMS platforms:
+
+ | Adapter | Sites | How it works |
+ |---------|-------|-------------|
+ | Revize | ~30 cities | Parses bold name / mailto / phone patterns |
+ | CivicPlus | ~14 counties | Parses staff directory tables with JS-obfuscated emails |
+ | TableAdapter | ~10 jurisdictions | Auto-detects HTML tables with name/email/phone columns |
+ | GenericMailto | ~15 cities | Finds mailto links in WordPress/Drupal content areas |
+ | DrupalViews | ~3 counties | Parses Drupal views-row and person-item patterns |
+
+ The remaining jurisdictions use bespoke adapters, MASC (Municipal Association of SC), or SCAC (SC Association of Counties) as data sources.
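To give a feel for the mailto-based approach in the table, the sketch below pulls names and emails out of `mailto:` links. It is not the project's GenericMailto adapter: the real adapters are written in Python, and this JavaScript version exists only to match the other examples here. The function name `extractMailtoContacts` and the sample HTML are invented, and a regular expression is a brittle stand-in for a proper HTML parser.

```javascript
// Illustration only: extract { name, email } pairs from mailto anchors.
// A real scraper would use an HTML parser; regexes miss many edge cases.
function extractMailtoContacts(html) {
  const anchor =
    /<a\b[^>]*\bhref\s*=\s*["']mailto:([^"'?]+)[^"']*["'][^>]*>([\s\S]*?)<\/a>/gi;
  const contacts = [];
  for (const [, email, inner] of html.matchAll(anchor)) {
    // Strip nested tags and collapse whitespace in the link text.
    const name = inner.replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    contacts.push({ name, email: email.trim() });
  }
  return contacts;
}

// Invented sample markup with one non-mailto link that should be ignored.
const sampleHtml = `
  <p><a href="mailto:jdoe@example.gov">Jane <b>Doe</b></a></p>
  <a href="/contact">Contact</a>
  <a class="x" href='mailto:rroe@example.gov?subject=Hi'>Richard Roe</a>
`;

console.log(extractMailtoContacts(sampleHtml));
```

Query strings such as `?subject=Hi` are dropped so that only the address itself is kept.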
+
+ Boundary data comes from the US Census TIGER/Line shapefiles (state districts) and ArcGIS REST services (local districts).
+
+ ## Running the scrapers
+
+ Requires Python 3.12+.
+
+ ```bash
+ pip install -r requirements.txt
+
+ # Scrape everything for a state
+ python -m scrapers --state SC
+
+ # Scrape only state legislators
+ python -m scrapers --state SC --state-only
+
+ # Scrape only local councils
+ python -m scrapers --state SC --local-only
+
+ # Scrape state + local, skip boundaries (faster)
+ python -m scrapers --state SC --skip-boundaries
+
+ # Preview what would run without scraping
+ python -m scrapers --dry-run
+
+ # Validate all data files
+ python validate.py
+ ```
+
+ ## Automation
+
+ Four GitHub Actions workflows keep data fresh:
+
+ - **Weekly scrape** (Mondays 10am ET): Runs state + local scrapers, opens a PR with changes
+ - **Monthly scrape** (1st of month 10am ET): Full scrape including boundary rebuilds
+ - **Validation**: Runs on every PR touching `data/` — auto-merges `data-update/*` branches if validation passes
+ - **Publish**: Weekly npm publish if data changed since last release
+
+ ## Running tests
+
+ ```bash
+ pip install -r requirements-dev.txt
+
+ # Unit tests (fast, no network)
+ pytest tests/unit/ -v
+
+ # All tests including integration
+ pytest -v
+
+ # Refresh integration test snapshots from live sites
+ python scripts/refresh_snapshots.py
+ ```
+
+ ## Adding a new state
+
+ 1. Add a state block to `registry.json` under `states.XX`
+ 2. Add email format rules to `scrapers/state_email_rules.py`
+ 3. Run `python -m scrapers --state XX --state-only` to pull legislators
+ 4. Add local jurisdiction adapters as needed (see `CLAUDE.md` for the adapter selection checklist)
+ 5. Add boundary sources to `registry.json` and run `python -m scrapers --state XX --boundaries-only`
+
+ ## License
+
+ See [LICENSE](LICENSE) for details.