nesstar-converter 1.0.1__tar.gz
- nesstar_converter-1.0.1/CITATION.cff +28 -0
- nesstar_converter-1.0.1/LICENSE +21 -0
- nesstar_converter-1.0.1/MANIFEST.in +7 -0
- nesstar_converter-1.0.1/PKG-INFO +264 -0
- nesstar_converter-1.0.1/README.md +236 -0
- nesstar_converter-1.0.1/docs/TECHNICAL.md +205 -0
- nesstar_converter-1.0.1/docs/global-coverage.md +155 -0
- nesstar_converter-1.0.1/examples/basic_usage.py +29 -0
- nesstar_converter-1.0.1/nesstar_converter.egg-info/PKG-INFO +264 -0
- nesstar_converter-1.0.1/nesstar_converter.egg-info/SOURCES.txt +18 -0
- nesstar_converter-1.0.1/nesstar_converter.egg-info/dependency_links.txt +1 -0
- nesstar_converter-1.0.1/nesstar_converter.egg-info/entry_points.txt +2 -0
- nesstar_converter-1.0.1/nesstar_converter.egg-info/requires.txt +8 -0
- nesstar_converter-1.0.1/nesstar_converter.egg-info/top_level.txt +1 -0
- nesstar_converter-1.0.1/nesstar_converter.py +1557 -0
- nesstar_converter-1.0.1/pyproject.toml +54 -0
- nesstar_converter-1.0.1/setup.cfg +4 -0
- nesstar_converter-1.0.1/tests/__init__.py +0 -0
- nesstar_converter-1.0.1/tests/conftest.py +7 -0
- nesstar_converter-1.0.1/tests/test_nesstar_converter.py +935 -0
@@ -0,0 +1,28 @@
cff-version: 1.2.0
message: "If you use nesstar-converter in research, please cite it using this metadata."
title: "nesstar-converter"
type: software
version: 1.0.1
date-released: 2026-04-13
license: MIT
repository-code: "https://github.com/abhinavjnu/nesstar-converter"
url: "https://github.com/abhinavjnu/nesstar-converter"
abstract: >-
  Pure-Python converter for legacy Nesstar survey binaries to open formats such
  as Parquet, CSV, Excel, Stata, JSON, and fixed-width text.
keywords:
  - Nesstar
  - survey data
  - microdata
  - DDI
  - Parquet
authors:
  - family-names: "abhinavjnu"
preferred-citation:
  type: software
  title: "nesstar-converter"
  version: 1.0.1
  date-released: 2026-04-13
  url: "https://github.com/abhinavjnu/nesstar-converter"
  authors:
    - family-names: "abhinavjnu"
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 abhinavjnu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,264 @@
Metadata-Version: 2.4
Name: nesstar-converter
Version: 1.0.1
Summary: Pure-Python converter for legacy Nesstar survey files
Author: abhinavjnu
License-Expression: MIT
Project-URL: Homepage, https://github.com/abhinavjnu/nesstar-converter
Project-URL: Repository, https://github.com/abhinavjnu/nesstar-converter
Project-URL: Issues, https://github.com/abhinavjnu/nesstar-converter/issues
Project-URL: Documentation, https://github.com/abhinavjnu/nesstar-converter/tree/main/docs
Keywords: nesstar,microdata,ddi,survey-data,parquet
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=2.0
Requires-Dist: pyarrow>=12.0
Requires-Dist: numpy>=1.24
Requires-Dist: tqdm>=4.60
Requires-Dist: openpyxl>=3.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Dynamic: license-file

# Nesstar Converter

[](https://www.python.org/downloads/)
[](LICENSE)
[](https://github.com/abhinavjnu/nesstar-converter/actions/workflows/ci.yml)

**Pure-Python conversion for legacy Nesstar survey files - no `NesstarExporter.exe`, no Windows-only GUI, no dependency on discontinued desktop tooling.**

Nesstar was once a common dissemination format across social-science archives, national data services, and statistical agencies worldwide. The legacy ecosystem persists, but the original tooling is fragmented: many servers are gone, much documentation is outdated, and the surviving migration tools still depend on a proprietary Windows executable.

`nesstar-converter` takes the opposite approach. It reverse-engineers the binary format directly in Python and writes open outputs such as Parquet, CSV, Excel, Stata, and JSON on Linux, macOS, and Windows.

This project started with India's MoSPI survey archives, but the underlying problem is global. The wider Nesstar ecosystem touched the UK Data Archive, the European Social Survey, Statistics Canada / ODESI, GESIS, SSJDA in Japan, and the IHSN / World Bank metadata workflow. See [`docs/global-coverage.md`](docs/global-coverage.md) for the evidence-backed map.

---

## Why this exists

- **Zero `.exe` dependency** - no `NesstarExporter.exe`, no batch wrappers, no Wine
- **Cross-platform** - works anywhere Python 3.10+ works
- **Reverse-engineered binary parser** - reads `.Nesstar` files directly
- **Open output formats** - Parquet, CSV, TSV, Excel, Stata, JSON, JSONL, fixed-width text
- **Validation-first** - compares converted output against official Nesstar Explorer exports
- **Non-technical friendly** - one CLI, clear commands, sensible defaults

---

## `nesstar-converter` vs `ihsn/nesstar-exporter`

The IHSN tool is useful if you already have the official Windows exporter binary and want to automate that workflow. It is **not** a replacement for the binary itself.

| Dimension | `ihsn/nesstar-exporter` | `nesstar-converter` |
|---|---|---|
| Core approach | Python wrapper around `NesstarExporter.exe` | Pure-Python binary parser |
| Requires `NesstarExporter.exe` | **Yes** | **No** |
| OS model | Windows-oriented workflow | Linux / macOS / Windows |
| Reads binary directly | No | Yes |
| Reverse-engineered format support | No | Yes |
| Parquet output | No | Yes |
| RDF / DDI export via official tool | Yes | No |
| Validation against text exports | No built-in validation layer | Yes |
| Install model | Repo scripts + external exe path | Standard Python package / console script |

**Evidence:** the IHSN repo's own README, `config.json`, `src/config.py`, and `src/exporter.py` all require a path to `NesstarExporter.exe` and shell out to it with `subprocess.run(...)`.

---

## Who uses Nesstar?

Nesstar was not just an India-specific format. It was part of a broader international archive ecosystem.

| Institution / repository | Country / region | What we verified |
|---|---|---|
| **NSD / Sikt** | Norway | Original Nesstar developer and ESS host |
| **UK Data Archive / UK Data Service** | United Kingdom | Co-developer and former Nesstar WebView operator |
| **European Social Survey** | Pan-European | Disseminated through Nesstar from 2004 |
| **Statistics Canada / ODESI** | Canada | Licensed the full Nesstar suite; former WebView instance |
| **GESIS ZACAT** | Germany | Former Nesstar WebView catalog |
| **Sciences Po / CDSP** | France | Publicly documented migration away from Nesstar |
| **SSJDA / CSRDA** | Japan | Publicly documented Nesstar deployment |
| **IHSN / World Bank ecosystem** | Global | Still distributes Nesstar Publisher and maintains migration tooling |
| **India MoSPI / NSO** | India | Active distributor of `.Nesstar` survey files |
| **DataFirst / Stats SA** | South Africa | Important related archive / testing target, but evidence is legacy or mixed |

For the full institution table, confidence levels, and source links, see [`docs/global-coverage.md`](docs/global-coverage.md).

---

## Supported formats

| Format | Extension | Best for |
|---|---|---|
| `parquet` | `.parquet` | Analytics, DuckDB, pandas, R, long-term storage |
| `csv` | `.csv` | Universal spreadsheet compatibility |
| `tsv` | `.tsv` | Tab-separated workflows and legacy survey tooling |
| `excel` | `.xlsx` | Non-technical users |
| `stata` | `.dta` | Stata users, with leading zeros preserved |
| `json` | `.json` | Web apps and structured interchange |
| `jsonl` | `.jsonl` | Streaming and line-oriented pipelines |
| `fwf` | `.txt` | Fixed-width text output |

---

## Quick start

### Install from source

```bash
git clone https://github.com/abhinavjnu/nesstar-converter.git
cd nesstar-converter
python -m pip install -e ".[dev]"
```

### Inspect a file

```bash
nesstar-converter info path/to/file.Nesstar path/to/ddi.xml
```

### Convert to open formats

```bash
nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv,parquet,stata
```

### Validate against official text exports

```bash
nesstar-converter validate ./output ./exported_text
```

If the companion `ddi.xml` sits beside the `.Nesstar` file, you can omit it and the tool will auto-detect it.

---

## Validation and coverage

This repository distinguishes between:

1. **Cell-level validation** - converted output matched official Nesstar Explorer exports row-for-row and value-for-value.
2. **Structure-level verification** - official export files matched published file counts and variable counts, but the raw package lacked the companion DDI XML required for full binary re-validation.

| Survey / corpus | Years / rounds | Verification level | Result |
|---|---|---|---|
| **EUS** | 38th Round (1983) | Cell-level | 9/9 blocks, 3.4M rows, zero mismatches against official exports |
| **HCES** | 38th (1983), 45th (1989-90), 66th (2009-10) | Cell-level | 27/28 blocks, 23.4M+ rows, zero mismatches for blocks present in DDI |
| **PLFS** | 2017-18 to 2022-23 | Structure-level | 24/24 official export files matched NADA data-dictionary row/column counts; one 2017-18 revisit export includes a trailing blank tab column |

**PLFS note:** the local PLFS raw ZIPs contain `.Nesstar` files, official text exports, and the legacy Nesstar Explorer installer, but not the companion DDI XML needed for full binary decoding in the current open parser. That means PLFS is confirmed as a real Nesstar distribution corpus, but its current evidence in this repo is structural rather than full cell-level re-validation.

---

## For non-technical users

If your goal is simply "turn this old survey file into something Excel can open", the shortest path is:

```bash
git clone https://github.com/abhinavjnu/nesstar-converter.git
cd nesstar-converter
python -m pip install -e .
nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv
```

Then open the generated `.csv` files in Excel, LibreOffice, Google Sheets, Stata, R, or Python.

If you are unsure which format to choose:

| You want to... | Use |
|---|---|
| Open the data in Excel | `csv` |
| Work in Stata | `stata` |
| Analyze in Python / R / DuckDB | `parquet` |
| Preserve a text-like interchange format | `tsv` or `fwf` |

---

## Python API

```python
from nesstar_converter import convert_nesstar, show_info

show_info("survey.Nesstar", "ddi.xml")

report = convert_nesstar(
    "survey.Nesstar",
    "ddi.xml",
    "./output",
    formats=["csv", "parquet"],
    year="2022-23",
)
```

Key functions:

| Function | Purpose |
|---|---|
| `convert_nesstar(...)` | Convert one `.Nesstar` file to one or more formats |
| `parse_ddi(...)` | Parse DDI XML block and variable metadata |
| `show_info(...)` | Inspect a file before conversion |
| `validate_against_export(...)` | Compare converted output to official text exports |
| `batch_convert(...)` | Convert a survey corpus in batch mode |

---

## Limitations

- **Full decoding currently expects DDI metadata.** If a distributor ships only the `.Nesstar` binary and omits the companion DDI XML, the current parser cannot yet do full open extraction on its own.
- **This is a data-conversion tool, not an RDF packager.** If your goal is specifically DDI / RDF export via the official legacy toolchain, the IHSN wrapper may still be useful - but it still requires `NesstarExporter.exe`.
- **Legacy ecosystems are inconsistent.** Different institutions used different Nesstar-era conventions, so community test cases from outside India are especially valuable.

---

## Documentation

- [`docs/TECHNICAL.md`](docs/TECHNICAL.md) - binary format notes and implementation details
- [`docs/global-coverage.md`](docs/global-coverage.md) - institutions, countries, archives, and source links

---

## Testing

```bash
python -m pip install -e ".[dev]"
pytest tests/ -v
```

CI runs unit tests on Python 3.10-3.13 and checks formatting with Ruff.

---

## Contributing

Good contributions for this project:

- Test the converter on non-MoSPI Nesstar files
- Report datasets that still circulate as `.Nesstar` / `.NSDstat`
- Share evidence of legacy Nesstar repositories or migrations
- Improve metadata recovery for archives that omit `ddi.xml`

Community testing requests are tracked in the issue tracker, including:

- Stats SA GHS
- UK Data Archive legacy Nesstar packages
- World Bank / IHSN LSMS-style Nesstar corpora

---

## Citation

If you use this tool in research, please cite it using [`CITATION.cff`](CITATION.cff).

---

## License

[MIT](LICENSE)
@@ -0,0 +1,236 @@
# Nesstar Converter

[](https://www.python.org/downloads/)
[](LICENSE)
[](https://github.com/abhinavjnu/nesstar-converter/actions/workflows/ci.yml)

**Pure-Python conversion for legacy Nesstar survey files - no `NesstarExporter.exe`, no Windows-only GUI, no dependency on discontinued desktop tooling.**

Nesstar was once a common dissemination format across social-science archives, national data services, and statistical agencies worldwide. The legacy ecosystem persists, but the original tooling is fragmented: many servers are gone, much documentation is outdated, and the surviving migration tools still depend on a proprietary Windows executable.

`nesstar-converter` takes the opposite approach. It reverse-engineers the binary format directly in Python and writes open outputs such as Parquet, CSV, Excel, Stata, and JSON on Linux, macOS, and Windows.

This project started with India's MoSPI survey archives, but the underlying problem is global. The wider Nesstar ecosystem touched the UK Data Archive, the European Social Survey, Statistics Canada / ODESI, GESIS, SSJDA in Japan, and the IHSN / World Bank metadata workflow. See [`docs/global-coverage.md`](docs/global-coverage.md) for the evidence-backed map.

---

## Why this exists

- **Zero `.exe` dependency** - no `NesstarExporter.exe`, no batch wrappers, no Wine
- **Cross-platform** - works anywhere Python 3.10+ works
- **Reverse-engineered binary parser** - reads `.Nesstar` files directly
- **Open output formats** - Parquet, CSV, TSV, Excel, Stata, JSON, JSONL, fixed-width text
- **Validation-first** - compares converted output against official Nesstar Explorer exports
- **Non-technical friendly** - one CLI, clear commands, sensible defaults

---

## `nesstar-converter` vs `ihsn/nesstar-exporter`

The IHSN tool is useful if you already have the official Windows exporter binary and want to automate that workflow. It is **not** a replacement for the binary itself.

| Dimension | `ihsn/nesstar-exporter` | `nesstar-converter` |
|---|---|---|
| Core approach | Python wrapper around `NesstarExporter.exe` | Pure-Python binary parser |
| Requires `NesstarExporter.exe` | **Yes** | **No** |
| OS model | Windows-oriented workflow | Linux / macOS / Windows |
| Reads binary directly | No | Yes |
| Reverse-engineered format support | No | Yes |
| Parquet output | No | Yes |
| RDF / DDI export via official tool | Yes | No |
| Validation against text exports | No built-in validation layer | Yes |
| Install model | Repo scripts + external exe path | Standard Python package / console script |

**Evidence:** the IHSN repo's own README, `config.json`, `src/config.py`, and `src/exporter.py` all require a path to `NesstarExporter.exe` and shell out to it with `subprocess.run(...)`.

---

## Who uses Nesstar?

Nesstar was not just an India-specific format. It was part of a broader international archive ecosystem.

| Institution / repository | Country / region | What we verified |
|---|---|---|
| **NSD / Sikt** | Norway | Original Nesstar developer and ESS host |
| **UK Data Archive / UK Data Service** | United Kingdom | Co-developer and former Nesstar WebView operator |
| **European Social Survey** | Pan-European | Disseminated through Nesstar from 2004 |
| **Statistics Canada / ODESI** | Canada | Licensed the full Nesstar suite; former WebView instance |
| **GESIS ZACAT** | Germany | Former Nesstar WebView catalog |
| **Sciences Po / CDSP** | France | Publicly documented migration away from Nesstar |
| **SSJDA / CSRDA** | Japan | Publicly documented Nesstar deployment |
| **IHSN / World Bank ecosystem** | Global | Still distributes Nesstar Publisher and maintains migration tooling |
| **India MoSPI / NSO** | India | Active distributor of `.Nesstar` survey files |
| **DataFirst / Stats SA** | South Africa | Important related archive / testing target, but evidence is legacy or mixed |

For the full institution table, confidence levels, and source links, see [`docs/global-coverage.md`](docs/global-coverage.md).

---

## Supported formats

| Format | Extension | Best for |
|---|---|---|
| `parquet` | `.parquet` | Analytics, DuckDB, pandas, R, long-term storage |
| `csv` | `.csv` | Universal spreadsheet compatibility |
| `tsv` | `.tsv` | Tab-separated workflows and legacy survey tooling |
| `excel` | `.xlsx` | Non-technical users |
| `stata` | `.dta` | Stata users, with leading zeros preserved |
| `json` | `.json` | Web apps and structured interchange |
| `jsonl` | `.jsonl` | Streaming and line-oriented pipelines |
| `fwf` | `.txt` | Fixed-width text output |
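
The `stata` row mentions preserved leading zeros. The sketch below is not taken from the package; it only illustrates, using the standard library, why zero-padded survey codes are safer kept as text: a numeric cast silently drops the padding. The column names and sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical sample: survey identifiers are often zero-padded codes.
SAMPLE = "state_code,household_id\n07,00123\n28,04510\n"


def read_codes_as_text(text):
    """Read CSV rows without numeric coercion, so '07' stays '07'."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_codes_as_text(SAMPLE)
as_text = [row["state_code"] for row in rows]  # padding intact
as_int = [int(code) for code in as_text]       # padding lost after the cast
```

Comparing `as_text` with `as_int` shows the difference: once a code has been cast to an integer, the original width can no longer be recovered from the value alone.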

---

## Quick start

### Install from source

```bash
git clone https://github.com/abhinavjnu/nesstar-converter.git
cd nesstar-converter
python -m pip install -e ".[dev]"
```

### Inspect a file

```bash
nesstar-converter info path/to/file.Nesstar path/to/ddi.xml
```

### Convert to open formats

```bash
nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv,parquet,stata
```

### Validate against official text exports

```bash
nesstar-converter validate ./output ./exported_text
```

If the companion `ddi.xml` sits beside the `.Nesstar` file, you can omit it and the tool will auto-detect it.
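
The exact detection rules are not spelled out here, so the following is only a minimal sketch of how sibling-file auto-detection could work, assuming the companion file is literally named `ddi.xml` and lives in the same directory. The helper name `find_companion_ddi` and its `candidates` parameter are hypothetical and not part of the package API.

```python
from pathlib import Path


def find_companion_ddi(nesstar_path, candidates=("ddi.xml",)):
    """Return the first candidate file found beside nesstar_path, else None.

    Hypothetical helper: the real tool may use different names or rules.
    """
    folder = Path(nesstar_path).resolve().parent
    for name in candidates:
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None
```

A caller could fall back to this lookup only when no explicit DDI path is given, and report a clear error when it returns `None`.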

---

## Validation and coverage

This repository distinguishes between:

1. **Cell-level validation** - converted output matched official Nesstar Explorer exports row-for-row and value-for-value.
2. **Structure-level verification** - official export files matched published file counts and variable counts, but the raw package lacked the companion DDI XML required for full binary re-validation.
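
To make "row-for-row and value-for-value" concrete, here is a rough sketch of a cell-level comparison between two tab-delimited text streams. It is an illustration under stated assumptions, not the package's `validate_against_export` implementation: the function name `count_cell_mismatches`, the tab delimiter, and the sample data are all hypothetical.

```python
import csv
import io
from itertools import zip_longest


def count_cell_mismatches(converted, official, delimiter="\t"):
    """Compare two delimited text streams row by row and cell by cell.

    Returns (rows_compared, mismatched_cells). An unmatched row counts at
    least once; a missing cell counts as one mismatch.
    """
    left = csv.reader(converted, delimiter=delimiter)
    right = csv.reader(official, delimiter=delimiter)
    rows = 0
    mismatches = 0
    for row_a, row_b in zip_longest(left, right, fillvalue=None):
        rows += 1
        if row_a is None or row_b is None:
            present = row_b if row_a is None else row_a
            mismatches += max(1, len(present))
            continue
        for cell_a, cell_b in zip_longest(row_a, row_b, fillvalue=None):
            if cell_a != cell_b:
                mismatches += 1
    return rows, mismatches


# Made-up data: the last cell differs ("51" vs "15").
converted = io.StringIO("id\tage\n001\t34\n002\t51\n")
official = io.StringIO("id\tage\n001\t34\n002\t15\n")
result = count_cell_mismatches(converted, official)
```

Values are compared as strings on purpose, so that `001` and `1` are treated as different cells.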

| Survey / corpus | Years / rounds | Verification level | Result |
|---|---|---|---|
| **EUS** | 38th Round (1983) | Cell-level | 9/9 blocks, 3.4M rows, zero mismatches against official exports |
| **HCES** | 38th (1983), 45th (1989-90), 66th (2009-10) | Cell-level | 27/28 blocks, 23.4M+ rows, zero mismatches for blocks present in DDI |
| **PLFS** | 2017-18 to 2022-23 | Structure-level | 24/24 official export files matched NADA data-dictionary row/column counts; one 2017-18 revisit export includes a trailing blank tab column |

**PLFS note:** the local PLFS raw ZIPs contain `.Nesstar` files, official text exports, and the legacy Nesstar Explorer installer, but not the companion DDI XML needed for full binary decoding in the current open parser. That means PLFS is confirmed as a real Nesstar distribution corpus, but its current evidence in this repo is structural rather than full cell-level re-validation.
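
A structure-level check of this kind can be approximated by counting rows and columns in a text export and comparing them with published counts. The sketch below assumes a tab-delimited export with a header row and treats a trailing tab as the "blank column" mentioned in the table; both points are assumptions, and `text_export_shape` is a hypothetical helper rather than part of the package.

```python
def text_export_shape(lines, delimiter="\t"):
    """Return (data_rows, columns) for a delimited export with a header row.

    A trailing delimiter on the header yields an empty final field; such
    fields are dropped so the count reflects named variables only.
    """
    rows = [
        line.rstrip("\r\n").split(delimiter)
        for line in lines
        if line.strip("\r\n")
    ]
    if not rows:
        return 0, 0
    header = rows[0]
    while header and header[-1] == "":
        header.pop()
    return len(rows) - 1, len(header)


# Made-up export in which every line ends with a stray tab.
export = ["hhid\tage\tsex\t\n", "001\t34\t1\t\n", "002\t51\t2\t\n"]
shape = text_export_shape(export)
published = (2, 3)  # hypothetical data-dictionary row/column counts
matches = shape == published
```

Passing an open file object instead of a list works the same way, since the function only iterates over lines.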

---

## For non-technical users

If your goal is simply "turn this old survey file into something Excel can open", the shortest path is:

```bash
git clone https://github.com/abhinavjnu/nesstar-converter.git
cd nesstar-converter
python -m pip install -e .
nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv
```

Then open the generated `.csv` files in Excel, LibreOffice, Google Sheets, Stata, R, or Python.

If you are unsure which format to choose:

| You want to... | Use |
|---|---|
| Open the data in Excel | `csv` |
| Work in Stata | `stata` |
| Analyze in Python / R / DuckDB | `parquet` |
| Preserve a text-like interchange format | `tsv` or `fwf` |

---

## Python API

```python
from nesstar_converter import convert_nesstar, show_info

show_info("survey.Nesstar", "ddi.xml")

report = convert_nesstar(
    "survey.Nesstar",
    "ddi.xml",
    "./output",
    formats=["csv", "parquet"],
    year="2022-23",
)
```

Key functions:

| Function | Purpose |
|---|---|
| `convert_nesstar(...)` | Convert one `.Nesstar` file to one or more formats |
| `parse_ddi(...)` | Parse DDI XML block and variable metadata |
| `show_info(...)` | Inspect a file before conversion |
| `validate_against_export(...)` | Compare converted output to official text exports |
| `batch_convert(...)` | Convert a survey corpus in batch mode |
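
For readers unfamiliar with DDI XML, the sketch below shows the kind of block and variable metadata such a file carries. It does not use or reproduce `parse_ddi`; it is a standalone illustration with the standard library, run against a made-up, heavily trimmed fragment modelled on DDI Codebook conventions (`var` elements with `name` and `files` attributes and a `labl` child). Real Nesstar DDI files may differ, and `variables_by_file` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed DDI Codebook fragment for illustration only.
SAMPLE_DDI = """<codeBook xmlns="http://www.icpsr.umich.edu/DDI">
  <dataDscr>
    <var ID="V1" name="hhid" files="F1"><labl>Household ID</labl></var>
    <var ID="V2" name="age" files="F1"><labl>Age in years</labl></var>
    <var ID="V3" name="item" files="F2"><labl>Item code</labl></var>
  </dataDscr>
</codeBook>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}var' down to 'var'."""
    return tag.rsplit("}", 1)[-1]


def variables_by_file(ddi_text):
    """Group (name, label) pairs by the file ID each <var> refers to."""
    grouped = {}
    for element in ET.fromstring(ddi_text).iter():
        if local_name(element.tag) != "var":
            continue
        label = next(
            (
                (child.text or "").strip()
                for child in element
                if local_name(child.tag) == "labl"
            ),
            "",
        )
        grouped.setdefault(element.get("files", ""), []).append(
            (element.get("name", ""), label)
        )
    return grouped


blocks = variables_by_file(SAMPLE_DDI)
```

Matching on local tag names keeps the sketch independent of the namespace URI, which varies between DDI versions.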

---

## Limitations

- **Full decoding currently expects DDI metadata.** If a distributor ships only the `.Nesstar` binary and omits the companion DDI XML, the current parser cannot yet do full open extraction on its own.
- **This is a data-conversion tool, not an RDF packager.** If your goal is specifically DDI / RDF export via the official legacy toolchain, the IHSN wrapper may still be useful - but it still requires `NesstarExporter.exe`.
- **Legacy ecosystems are inconsistent.** Different institutions used different Nesstar-era conventions, so community test cases from outside India are especially valuable.

---

## Documentation

- [`docs/TECHNICAL.md`](docs/TECHNICAL.md) - binary format notes and implementation details
- [`docs/global-coverage.md`](docs/global-coverage.md) - institutions, countries, archives, and source links

---

## Testing

```bash
python -m pip install -e ".[dev]"
pytest tests/ -v
```

CI runs unit tests on Python 3.10-3.13 and checks formatting with Ruff.

---

## Contributing

Good contributions for this project:

- Test the converter on non-MoSPI Nesstar files
- Report datasets that still circulate as `.Nesstar` / `.NSDstat`
- Share evidence of legacy Nesstar repositories or migrations
- Improve metadata recovery for archives that omit `ddi.xml`

Community testing requests are tracked in the issue tracker, including:

- Stats SA GHS
- UK Data Archive legacy Nesstar packages
- World Bank / IHSN LSMS-style Nesstar corpora

---

## Citation

If you use this tool in research, please cite it using [`CITATION.cff`](CITATION.cff).

---

## License

[MIT](LICENSE)