nesstar-converter 1.0.1__tar.gz

@@ -0,0 +1,28 @@
+ cff-version: 1.2.0
+ message: "If you use nesstar-converter in research, please cite it using this metadata."
+ title: "nesstar-converter"
+ type: software
+ version: 1.0.1
+ date-released: 2026-04-13
+ license: MIT
+ repository-code: "https://github.com/abhinavjnu/nesstar-converter"
+ url: "https://github.com/abhinavjnu/nesstar-converter"
+ abstract: >-
+   Pure-Python converter for legacy Nesstar survey binaries to open formats such
+   as Parquet, CSV, Excel, Stata, JSON, and fixed-width text.
+ keywords:
+   - Nesstar
+   - survey data
+   - microdata
+   - DDI
+   - Parquet
+ authors:
+   - family-names: "abhinavjnu"
+ preferred-citation:
+   type: software
+   title: "nesstar-converter"
+   version: 1.0.1
+   date-released: 2026-04-13
+   url: "https://github.com/abhinavjnu/nesstar-converter"
+   authors:
+     - family-names: "abhinavjnu"
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 abhinavjnu
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,7 @@
+ include LICENSE
+ include README.md
+ include CITATION.cff
+ include pyproject.toml
+ recursive-include tests *.py
+ recursive-include examples *.py
+ recursive-include docs *.md
@@ -0,0 +1,264 @@
+ Metadata-Version: 2.4
+ Name: nesstar-converter
+ Version: 1.0.1
+ Summary: Pure-Python converter for legacy Nesstar survey files
+ Author: abhinavjnu
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/abhinavjnu/nesstar-converter
+ Project-URL: Repository, https://github.com/abhinavjnu/nesstar-converter
+ Project-URL: Issues, https://github.com/abhinavjnu/nesstar-converter/issues
+ Project-URL: Documentation, https://github.com/abhinavjnu/nesstar-converter/tree/main/docs
+ Keywords: nesstar,microdata,ddi,survey-data,parquet
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Topic :: Scientific/Engineering
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: pandas>=2.0
+ Requires-Dist: pyarrow>=12.0
+ Requires-Dist: numpy>=1.24
+ Requires-Dist: tqdm>=4.60
+ Requires-Dist: openpyxl>=3.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0; extra == "dev"
+ Dynamic: license-file
+
+ # Nesstar Converter
+
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
+ [![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
+ [![CI](https://github.com/abhinavjnu/nesstar-converter/actions/workflows/ci.yml/badge.svg)](https://github.com/abhinavjnu/nesstar-converter/actions/workflows/ci.yml)
+
+ **Pure-Python conversion for legacy Nesstar survey files - no `NesstarExporter.exe`, no Windows-only GUI, no dependency on discontinued desktop tooling.**
+
+ Nesstar was once a common dissemination format across social-science archives, national data services, and statistical agencies worldwide. The legacy ecosystem persists, but the original tooling is fragmented: many servers are gone, much documentation is outdated, and the surviving migration tools still depend on a proprietary Windows executable.
+
+ `nesstar-converter` takes the opposite approach. It reverse-engineers the binary format directly in Python and writes open outputs such as Parquet, CSV, Excel, Stata, and JSON on Linux, macOS, and Windows.
+
+ This project started with India's MoSPI survey archives, but the underlying problem is global. The wider Nesstar ecosystem touched the UK Data Archive, the European Social Survey, Statistics Canada / ODESI, GESIS, SSJDA in Japan, and the IHSN / World Bank metadata workflow. See [`docs/global-coverage.md`](docs/global-coverage.md) for the evidence-backed map.
+
+ ---
+
+ ## Why this exists
+
+ - **Zero `.exe` dependency** - no `NesstarExporter.exe`, no batch wrappers, no Wine
+ - **Cross-platform** - works anywhere Python 3.10+ works
+ - **Reverse-engineered binary parser** - reads `.Nesstar` files directly
+ - **Open output formats** - Parquet, CSV, TSV, Excel, Stata, JSON, JSONL, fixed-width text
+ - **Validation-first** - compares converted output against official Nesstar Explorer exports
+ - **Non-technical friendly** - one CLI, clear commands, sensible defaults
+
+ ---
+
+ ## `nesstar-converter` vs `ihsn/nesstar-exporter`
+
+ The IHSN tool is useful if you already have the official Windows exporter binary and want to automate that workflow. It is **not** a replacement for the binary itself.
+
+ | Dimension | `ihsn/nesstar-exporter` | `nesstar-converter` |
+ |---|---|---|
+ | Core approach | Python wrapper around `NesstarExporter.exe` | Pure-Python binary parser |
+ | Requires `NesstarExporter.exe` | **Yes** | **No** |
+ | OS model | Windows-oriented workflow | Linux / macOS / Windows |
+ | Reads binary directly | No | Yes |
+ | Reverse-engineered format support | No | Yes |
+ | Parquet output | No | Yes |
+ | RDF / DDI export via official tool | Yes | No |
+ | Validation against text exports | No built-in validation layer | Yes |
+ | Install model | Repo scripts + external exe path | Standard Python package / console script |
+
+ **Evidence:** the IHSN repo's own README, `config.json`, `src/config.py`, and `src/exporter.py` all require a path to `NesstarExporter.exe` and shell out to it with `subprocess.run(...)`.
+
+ ---
+
+ ## Who uses Nesstar?
+
+ Nesstar was not just an India-specific format. It was part of a broader international archive ecosystem.
+
+ | Institution / repository | Country / region | What we verified |
+ |---|---|---|
+ | **NSD / Sikt** | Norway | Original Nesstar developer and ESS host |
+ | **UK Data Archive / UK Data Service** | United Kingdom | Co-developer and former Nesstar WebView operator |
+ | **European Social Survey** | Pan-European | Disseminated through Nesstar from 2004 |
+ | **Statistics Canada / ODESI** | Canada | Licensed the full Nesstar suite; former WebView instance |
+ | **GESIS ZACAT** | Germany | Former Nesstar WebView catalog |
+ | **Sciences Po / CDSP** | France | Publicly documented migration away from Nesstar |
+ | **SSJDA / CSRDA** | Japan | Publicly documented Nesstar deployment |
+ | **IHSN / World Bank ecosystem** | Global | Still distributes Nesstar Publisher and maintains migration tooling |
+ | **India MoSPI / NSO** | India | Active distributor of `.Nesstar` survey files |
+ | **DataFirst / Stats SA** | South Africa | Important related archive / testing target, but evidence is legacy or mixed |
+
+ For the full institution table, confidence levels, and source links, see [`docs/global-coverage.md`](docs/global-coverage.md).
+
+ ---
+
+ ## Supported formats
+
+ | Format | Extension | Best for |
+ |---|---|---|
+ | `parquet` | `.parquet` | Analytics, DuckDB, pandas, R, long-term storage |
+ | `csv` | `.csv` | Universal spreadsheet compatibility |
+ | `tsv` | `.tsv` | Tab-separated workflows and legacy survey tooling |
+ | `excel` | `.xlsx` | Non-technical users |
+ | `stata` | `.dta` | Stata users, with leading zeros preserved |
+ | `json` | `.json` | Web apps and structured interchange |
+ | `jsonl` | `.jsonl` | Streaming and line-oriented pipelines |
+ | `fwf` | `.txt` | Fixed-width text output |
+
+ ---
+
+ ## Quick start
+
+ ### Install from source
+
+ ```bash
+ git clone https://github.com/abhinavjnu/nesstar-converter.git
+ cd nesstar-converter
+ python -m pip install -e ".[dev]"
+ ```
+
+ ### Inspect a file
+
+ ```bash
+ nesstar-converter info path/to/file.Nesstar path/to/ddi.xml
+ ```
+
+ ### Convert to open formats
+
+ ```bash
+ nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv,parquet,stata
+ ```
+
+ ### Validate against official text exports
+
+ ```bash
+ nesstar-converter validate ./output ./exported_text
+ ```
+
+ If the companion `ddi.xml` sits beside the `.Nesstar` file, you can omit it and the tool will auto-detect it.
+
+ ---
+
+ ## Validation and coverage
+
+ This repository distinguishes between:
+
+ 1. **Cell-level validation** - converted output matched official Nesstar Explorer exports row-for-row and value-for-value.
+ 2. **Structure-level verification** - official export files matched published file counts and variable counts, but the raw package lacked the companion DDI XML required for full binary re-validation.
+
+ | Survey / corpus | Years / rounds | Verification level | Result |
+ |---|---|---|---|
+ | **EUS** | 38th Round (1983) | Cell-level | 9/9 blocks, 3.4M rows, zero mismatches against official exports |
+ | **HCES** | 38th (1983), 45th (1989-90), 66th (2009-10) | Cell-level | 27/28 blocks, 23.4M+ rows, zero mismatches for blocks present in DDI |
+ | **PLFS** | 2017-18 to 2022-23 | Structure-level | 24/24 official export files matched NADA data-dictionary row/column counts; one 2017-18 revisit export includes a trailing blank tab column |
+
+ **PLFS note:** the local PLFS raw ZIPs contain `.Nesstar` files, official text exports, and the legacy Nesstar Explorer installer, but not the companion DDI XML needed for full binary decoding in the current open parser. That means PLFS is confirmed as a real Nesstar distribution corpus, but its current evidence in this repo is structural rather than full cell-level re-validation.
+
+ ---
+
+ ## For non-technical users
+
+ If your goal is simply "turn this old survey file into something Excel can open", the shortest path is:
+
+ ```bash
+ git clone https://github.com/abhinavjnu/nesstar-converter.git
+ cd nesstar-converter
+ python -m pip install -e .
+ nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv
+ ```
+
+ Then open the generated `.csv` files in Excel, LibreOffice, Google Sheets, Stata, R, or Python.
+
+ If you are unsure which format to choose:
+
+ | You want to... | Use |
+ |---|---|
+ | Open the data in Excel | `csv` |
+ | Work in Stata | `stata` |
+ | Analyze in Python / R / DuckDB | `parquet` |
+ | Preserve a text-like interchange format | `tsv` or `fwf` |
+
+ ---
+
+ ## Python API
+
+ ```python
+ from nesstar_converter import convert_nesstar, show_info
+
+ show_info("survey.Nesstar", "ddi.xml")
+
+ report = convert_nesstar(
+     "survey.Nesstar",
+     "ddi.xml",
+     "./output",
+     formats=["csv", "parquet"],
+     year="2022-23",
+ )
+ ```
+
+ Key functions:
+
+ | Function | Purpose |
+ |---|---|
+ | `convert_nesstar(...)` | Convert one `.Nesstar` file to one or more formats |
+ | `parse_ddi(...)` | Parse DDI XML block and variable metadata |
+ | `show_info(...)` | Inspect a file before conversion |
+ | `validate_against_export(...)` | Compare converted output to official text exports |
+ | `batch_convert(...)` | Convert a survey corpus in batch mode |
+
+ ---
+
+ ## Limitations
+
+ - **Full decoding currently expects DDI metadata.** If a distributor ships only the `.Nesstar` binary and omits the companion DDI XML, the current parser cannot yet do full open extraction on its own.
+ - **This is a data-conversion tool, not an RDF packager.** If your goal is specifically DDI / RDF export via the official legacy toolchain, the IHSN wrapper may still be useful - but it still requires `NesstarExporter.exe`.
+ - **Legacy ecosystems are inconsistent.** Different institutions used different Nesstar-era conventions, so community test cases from outside India are especially valuable.
+
+ ---
+
+ ## Documentation
+
+ - [`docs/TECHNICAL.md`](docs/TECHNICAL.md) - binary format notes and implementation details
+ - [`docs/global-coverage.md`](docs/global-coverage.md) - institutions, countries, archives, and source links
+
+ ---
+
+ ## Testing
+
+ ```bash
+ python -m pip install -e ".[dev]"
+ pytest tests/ -v
+ ```
+
+ CI runs unit tests on Python 3.10-3.13 and checks formatting with Ruff.
+
+ ---
+
+ ## Contributing
+
+ Good contributions for this project:
+
+ - Test the converter on non-MoSPI Nesstar files
+ - Report datasets that still circulate as `.Nesstar` / `.NSDstat`
+ - Share evidence of legacy Nesstar repositories or migrations
+ - Improve metadata recovery for archives that omit `ddi.xml`
+
+ Community testing requests are tracked in the issue tracker, including:
+
+ - Stats SA GHS
+ - UK Data Archive legacy Nesstar packages
+ - World Bank / IHSN LSMS-style Nesstar corpora
+
+ ---
+
+ ## Citation
+
+ If you use this tool in research, please cite it using [`CITATION.cff`](CITATION.cff).
+
+ ---
+
+ ## License
+
+ [MIT](LICENSE)
@@ -0,0 +1,236 @@
+ # Nesstar Converter
+
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
+ [![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
+ [![CI](https://github.com/abhinavjnu/nesstar-converter/actions/workflows/ci.yml/badge.svg)](https://github.com/abhinavjnu/nesstar-converter/actions/workflows/ci.yml)
+
+ **Pure-Python conversion for legacy Nesstar survey files - no `NesstarExporter.exe`, no Windows-only GUI, no dependency on discontinued desktop tooling.**
+
+ Nesstar was once a common dissemination format across social-science archives, national data services, and statistical agencies worldwide. The legacy ecosystem persists, but the original tooling is fragmented: many servers are gone, much documentation is outdated, and the surviving migration tools still depend on a proprietary Windows executable.
+
+ `nesstar-converter` takes the opposite approach. It reverse-engineers the binary format directly in Python and writes open outputs such as Parquet, CSV, Excel, Stata, and JSON on Linux, macOS, and Windows.
+
+ This project started with India's MoSPI survey archives, but the underlying problem is global. The wider Nesstar ecosystem touched the UK Data Archive, the European Social Survey, Statistics Canada / ODESI, GESIS, SSJDA in Japan, and the IHSN / World Bank metadata workflow. See [`docs/global-coverage.md`](docs/global-coverage.md) for the evidence-backed map.
+
+ ---
+
+ ## Why this exists
+
+ - **Zero `.exe` dependency** - no `NesstarExporter.exe`, no batch wrappers, no Wine
+ - **Cross-platform** - works anywhere Python 3.10+ works
+ - **Reverse-engineered binary parser** - reads `.Nesstar` files directly
+ - **Open output formats** - Parquet, CSV, TSV, Excel, Stata, JSON, JSONL, fixed-width text
+ - **Validation-first** - compares converted output against official Nesstar Explorer exports
+ - **Non-technical friendly** - one CLI, clear commands, sensible defaults
+
+ ---
+
+ ## `nesstar-converter` vs `ihsn/nesstar-exporter`
+
+ The IHSN tool is useful if you already have the official Windows exporter binary and want to automate that workflow. It is **not** a replacement for the binary itself.
+
+ | Dimension | `ihsn/nesstar-exporter` | `nesstar-converter` |
+ |---|---|---|
+ | Core approach | Python wrapper around `NesstarExporter.exe` | Pure-Python binary parser |
+ | Requires `NesstarExporter.exe` | **Yes** | **No** |
+ | OS model | Windows-oriented workflow | Linux / macOS / Windows |
+ | Reads binary directly | No | Yes |
+ | Reverse-engineered format support | No | Yes |
+ | Parquet output | No | Yes |
+ | RDF / DDI export via official tool | Yes | No |
+ | Validation against text exports | No built-in validation layer | Yes |
+ | Install model | Repo scripts + external exe path | Standard Python package / console script |
+
+ **Evidence:** the IHSN repo's own README, `config.json`, `src/config.py`, and `src/exporter.py` all require a path to `NesstarExporter.exe` and shell out to it with `subprocess.run(...)`.
+
+ ---
+
+ ## Who uses Nesstar?
+
+ Nesstar was not just an India-specific format. It was part of a broader international archive ecosystem.
+
+ | Institution / repository | Country / region | What we verified |
+ |---|---|---|
+ | **NSD / Sikt** | Norway | Original Nesstar developer and ESS host |
+ | **UK Data Archive / UK Data Service** | United Kingdom | Co-developer and former Nesstar WebView operator |
+ | **European Social Survey** | Pan-European | Disseminated through Nesstar from 2004 |
+ | **Statistics Canada / ODESI** | Canada | Licensed the full Nesstar suite; former WebView instance |
+ | **GESIS ZACAT** | Germany | Former Nesstar WebView catalog |
+ | **Sciences Po / CDSP** | France | Publicly documented migration away from Nesstar |
+ | **SSJDA / CSRDA** | Japan | Publicly documented Nesstar deployment |
+ | **IHSN / World Bank ecosystem** | Global | Still distributes Nesstar Publisher and maintains migration tooling |
+ | **India MoSPI / NSO** | India | Active distributor of `.Nesstar` survey files |
+ | **DataFirst / Stats SA** | South Africa | Important related archive / testing target, but evidence is legacy or mixed |
+
+ For the full institution table, confidence levels, and source links, see [`docs/global-coverage.md`](docs/global-coverage.md).
+
+ ---
+
+ ## Supported formats
+
+ | Format | Extension | Best for |
+ |---|---|---|
+ | `parquet` | `.parquet` | Analytics, DuckDB, pandas, R, long-term storage |
+ | `csv` | `.csv` | Universal spreadsheet compatibility |
+ | `tsv` | `.tsv` | Tab-separated workflows and legacy survey tooling |
+ | `excel` | `.xlsx` | Non-technical users |
+ | `stata` | `.dta` | Stata users, with leading zeros preserved |
+ | `json` | `.json` | Web apps and structured interchange |
+ | `jsonl` | `.jsonl` | Streaming and line-oriented pipelines |
+ | `fwf` | `.txt` | Fixed-width text output |
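
The "leading zeros preserved" note in the table matters because survey codes are often zero-padded, and a numeric round trip silently drops the padding. A minimal illustration using only the standard library; the column names and values are made up for the example and are not taken from any real survey file or from this package's code:

```python
import csv
import io

# Hypothetical two-row extract; "state_code" is zero-padded in the source.
raw = "state_code,hh_size\n07,4\n12,3\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# csv yields strings, so the padding survives as long as codes stay text.
as_text = [row["state_code"] for row in rows]

# Casting to int is the step where the leading zero is lost.
as_int = [int(row["state_code"]) for row in rows]

print(as_text)  # ['07', '12']
print(as_int)   # [7, 12]
```

The same concern applies when loading converted output into other tools: keeping code-like columns as text is usually the safer default.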
+
+ ---
+
+ ## Quick start
+
+ ### Install from source
+
+ ```bash
+ git clone https://github.com/abhinavjnu/nesstar-converter.git
+ cd nesstar-converter
+ python -m pip install -e ".[dev]"
+ ```
+
+ ### Inspect a file
+
+ ```bash
+ nesstar-converter info path/to/file.Nesstar path/to/ddi.xml
+ ```
+
+ ### Convert to open formats
+
+ ```bash
+ nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv,parquet,stata
+ ```
+
+ ### Validate against official text exports
+
+ ```bash
+ nesstar-converter validate ./output ./exported_text
+ ```
+
+ If the companion `ddi.xml` sits beside the `.Nesstar` file, you can omit it and the tool will auto-detect it.
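
How the tool performs this lookup is not shown in this diff. As a rough sketch of one plausible approach, assuming the companion file is literally named `ddi.xml` and lives in the same directory as the binary, a sibling lookup could look like this. The helper name `find_companion_ddi` is hypothetical and is not part of the package API:

```python
import tempfile
from pathlib import Path


def find_companion_ddi(nesstar_path):
    """Return the sibling ddi.xml path if it exists, else None (illustrative only)."""
    candidate = Path(nesstar_path).parent / "ddi.xml"
    return candidate if candidate.is_file() else None


# Demonstrate both outcomes in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    survey = Path(tmp) / "survey.Nesstar"
    survey.write_bytes(b"")  # empty placeholder, not a real Nesstar binary

    missing = find_companion_ddi(survey)  # no ddi.xml yet

    (Path(tmp) / "ddi.xml").write_text("<codeBook/>")
    found = find_companion_ddi(survey)  # ddi.xml now sits beside the binary
```

If the real tool uses a different naming rule (for example, matching on the survey file's stem), the lookup would need to change accordingly.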
+
+ ---
+
+ ## Validation and coverage
+
+ This repository distinguishes between:
+
+ 1. **Cell-level validation** - converted output matched official Nesstar Explorer exports row-for-row and value-for-value.
+ 2. **Structure-level verification** - official export files matched published file counts and variable counts, but the raw package lacked the companion DDI XML required for full binary re-validation.
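
To make "row-for-row and value-for-value" concrete, here is a minimal sketch of a cell-level comparison between two tab-delimited tables. It is not the package's `validate` implementation; the function name `count_cell_mismatches` and the choice to compare whitespace-stripped strings are assumptions made for illustration:

```python
import csv
import io


def count_cell_mismatches(converted_text, export_text, delimiter="\t"):
    """Count differing cells between two delimited tables of identical shape."""
    conv = list(csv.reader(io.StringIO(converted_text), delimiter=delimiter))
    ref = list(csv.reader(io.StringIO(export_text), delimiter=delimiter))
    if len(conv) != len(ref):
        raise ValueError(f"row count differs: {len(conv)} vs {len(ref)}")
    mismatches = 0
    for conv_row, ref_row in zip(conv, ref):
        if len(conv_row) != len(ref_row):
            raise ValueError("column count differs")
        # Compare as text so zero-padded codes such as "07" are not equal to "7".
        mismatches += sum(a.strip() != b.strip() for a, b in zip(conv_row, ref_row))
    return mismatches


official = "1\t07\n2\t12\n"
same = count_cell_mismatches("1\t07\n2\t12\n", official)
lost_zero = count_cell_mismatches("1\t7\n2\t12\n", official)
print(same, lost_zero)
```

A result of zero for every block corresponds to the "zero mismatches" wording used in the results table; any non-zero count points at specific cells to inspect.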
+
+ | Survey / corpus | Years / rounds | Verification level | Result |
+ |---|---|---|---|
+ | **EUS** | 38th Round (1983) | Cell-level | 9/9 blocks, 3.4M rows, zero mismatches against official exports |
+ | **HCES** | 38th (1983), 45th (1989-90), 66th (2009-10) | Cell-level | 27/28 blocks, 23.4M+ rows, zero mismatches for blocks present in DDI |
+ | **PLFS** | 2017-18 to 2022-23 | Structure-level | 24/24 official export files matched NADA data-dictionary row/column counts; one 2017-18 revisit export includes a trailing blank tab column |
+
+ **PLFS note:** the local PLFS raw ZIPs contain `.Nesstar` files, official text exports, and the legacy Nesstar Explorer installer, but not the companion DDI XML needed for full binary decoding in the current open parser. That means PLFS is confirmed as a real Nesstar distribution corpus, but its current evidence in this repo is structural rather than full cell-level re-validation.
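
A structure-level check of the kind described for PLFS only needs row and column counts. The sketch below shows one way to compute them for a tab-delimited export while tolerating a trailing blank tab column like the one mentioned in the table. The function name `table_shape` and the trailing-field heuristic are assumptions for illustration, not the repository's actual method:

```python
def table_shape(text, delimiter="\t"):
    """Return (rows, columns) for delimited text, ignoring one trailing empty field."""
    rows = [line for line in text.splitlines() if line.strip()]
    widths = set()
    for line in rows:
        fields = line.split(delimiter)
        # A line ending in the delimiter yields a final empty field; drop it.
        # Caveat: this also drops a genuinely empty last value, in which case
        # rows become ragged and the check below raises instead of guessing.
        if len(fields) > 1 and fields[-1] == "":
            fields.pop()
        widths.add(len(fields))
    if len(widths) > 1:
        raise ValueError(f"ragged rows: column counts {sorted(widths)}")
    return len(rows), (widths.pop() if widths else 0)


clean = "1\t07\n2\t12\n"
padded = "1\t07\t\n2\t12\t\n"  # same data with a trailing tab on every line
print(table_shape(clean), table_shape(padded))
```

The resulting tuple can then be compared against the row and variable counts published in a data dictionary.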
+
+ ---
+
+ ## For non-technical users
+
+ If your goal is simply "turn this old survey file into something Excel can open", the shortest path is:
+
+ ```bash
+ git clone https://github.com/abhinavjnu/nesstar-converter.git
+ cd nesstar-converter
+ python -m pip install -e .
+ nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv
+ ```
+
+ Then open the generated `.csv` files in Excel, LibreOffice, Google Sheets, Stata, R, or Python.
+
+ If you are unsure which format to choose:
+
+ | You want to... | Use |
+ |---|---|
+ | Open the data in Excel | `csv` |
+ | Work in Stata | `stata` |
+ | Analyze in Python / R / DuckDB | `parquet` |
+ | Preserve a text-like interchange format | `tsv` or `fwf` |
+
+ ---
+
+ ## Python API
+
+ ```python
+ from nesstar_converter import convert_nesstar, show_info
+
+ show_info("survey.Nesstar", "ddi.xml")
+
+ report = convert_nesstar(
+     "survey.Nesstar",
+     "ddi.xml",
+     "./output",
+     formats=["csv", "parquet"],
+     year="2022-23",
+ )
+ ```
+
+ Key functions:
+
+ | Function | Purpose |
+ |---|---|
+ | `convert_nesstar(...)` | Convert one `.Nesstar` file to one or more formats |
+ | `parse_ddi(...)` | Parse DDI XML block and variable metadata |
+ | `show_info(...)` | Inspect a file before conversion |
+ | `validate_against_export(...)` | Compare converted output to official text exports |
+ | `batch_convert(...)` | Convert a survey corpus in batch mode |
+
+ ---
+
+ ## Limitations
+
+ - **Full decoding currently expects DDI metadata.** If a distributor ships only the `.Nesstar` binary and omits the companion DDI XML, the current parser cannot yet do full open extraction on its own.
+ - **This is a data-conversion tool, not an RDF packager.** If your goal is specifically DDI / RDF export via the official legacy toolchain, the IHSN wrapper may still be useful - but it still requires `NesstarExporter.exe`.
+ - **Legacy ecosystems are inconsistent.** Different institutions used different Nesstar-era conventions, so community test cases from outside India are especially valuable.
+
+ ---
+
+ ## Documentation
+
+ - [`docs/TECHNICAL.md`](docs/TECHNICAL.md) - binary format notes and implementation details
+ - [`docs/global-coverage.md`](docs/global-coverage.md) - institutions, countries, archives, and source links
+
+ ---
+
+ ## Testing
+
+ ```bash
+ python -m pip install -e ".[dev]"
+ pytest tests/ -v
+ ```
+
+ CI runs unit tests on Python 3.10-3.13 and checks formatting with Ruff.
+
+ ---
+
+ ## Contributing
+
+ Good contributions for this project:
+
+ - Test the converter on non-MoSPI Nesstar files
+ - Report datasets that still circulate as `.Nesstar` / `.NSDstat`
+ - Share evidence of legacy Nesstar repositories or migrations
+ - Improve metadata recovery for archives that omit `ddi.xml`
+
+ Community testing requests are tracked in the issue tracker, including:
+
+ - Stats SA GHS
+ - UK Data Archive legacy Nesstar packages
+ - World Bank / IHSN LSMS-style Nesstar corpora
+
+ ---
+
+ ## Citation
+
+ If you use this tool in research, please cite it using [`CITATION.cff`](CITATION.cff).
+
+ ---
+
+ ## License
+
+ [MIT](LICENSE)