inei-microdatos 0.1.0__tar.gz
- inei_microdatos-0.1.0/.github/workflows/ci.yml +26 -0
- inei_microdatos-0.1.0/.github/workflows/publish.yml +30 -0
- inei_microdatos-0.1.0/.gitignore +7 -0
- inei_microdatos-0.1.0/PKG-INFO +562 -0
- inei_microdatos-0.1.0/README.md +539 -0
- inei_microdatos-0.1.0/pyproject.toml +36 -0
- inei_microdatos-0.1.0/src/inei_microdatos/__init__.py +20 -0
- inei_microdatos-0.1.0/src/inei_microdatos/__main__.py +3 -0
- inei_microdatos-0.1.0/src/inei_microdatos/aliases.py +91 -0
- inei_microdatos-0.1.0/src/inei_microdatos/catalog.py +298 -0
- inei_microdatos-0.1.0/src/inei_microdatos/cli.py +227 -0
- inei_microdatos-0.1.0/src/inei_microdatos/client.py +268 -0
- inei_microdatos-0.1.0/src/inei_microdatos/data/catalog.json +153201 -0
- inei_microdatos-0.1.0/src/inei_microdatos/download.py +236 -0
- inei_microdatos-0.1.0/src/inei_microdatos/reader.py +250 -0
- inei_microdatos-0.1.0/tests/test_catalog.py +106 -0
- inei_microdatos-0.1.0/tests/test_client.py +104 -0

@@ -0,0 +1,26 @@ inei_microdatos-0.1.0/.github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install package
        run: pip install -e ".[test]"

      - name: Run tests
        run: pytest tests/ -v

@@ -0,0 +1,30 @@ inei_microdatos-0.1.0/.github/workflows/publish.yml
name: Publish to PyPI

on:
  release:
    types: [published]

permissions:
  id-token: write

jobs:
  publish:
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/inei-microdatos
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install build tools
        run: pip install build

      - name: Build package
        run: python -m build

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1

@@ -0,0 +1,562 @@ inei_microdatos-0.1.0/PKG-INFO
Metadata-Version: 2.4
Name: inei-microdatos
Version: 0.1.0
Summary: Programmatic access to INEI Peru's microdata portal
Project-URL: Homepage, https://github.com/fiorellarmartins/inei-microdatos
Author: Fiorella Ramirez
License-Expression: MIT
Keywords: enaho,endes,inei,microdata,peru,statistics
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.8
Requires-Dist: click>=8.0
Requires-Dist: pandas>=1.2
Requires-Dist: pyreadstat>=1.0
Requires-Dist: requests>=2.28
Requires-Dist: tqdm>=4.60
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == 'test'
Description-Content-Type: text/markdown

<a name="english"></a>

# inei-microdatos (English)

Programmatic access to Peru's [INEI microdata portal](https://proyectos.inei.gob.pe/microdatos/). Download survey microdata, census files, and documentation without clicking through the portal's dropdowns.

The portal hosts **67 surveys**, **5,900+ downloadable modules**, and **8,100+ documentation files** spanning 1994 to 2025. These cover household surveys (ENAHO), demographic and health surveys (ENDES), employment surveys (EPEN), agricultural censuses (CENAGRO), economic surveys (EEA), and dozens more.

## The problem

INEI's microdata portal is an old ASP application with cascading AJAX dropdowns and no API. Downloading a single module takes four clicks; downloading an entire survey across years takes hundreds. The portal also uses Windows-1252 encoding with JavaScript-style escape sequences that break standard HTTP clients.

This package handles all of that.

## Install

```bash
pip install inei-microdatos
```

Requires Python 3.8+. It installs pandas and pyreadstat as dependencies so that all three formats (CSV, STATA, SPSS) can be read out of the box.

## Quick start

```python
from inei_microdatos import load_catalog, download_modules, read_module
from inei_microdatos.catalog import filter_catalog

# Load the bundled catalog (ships with the package, no setup needed)
catalog = load_catalog()

# Filter to what you need
endes_2024 = filter_catalog(catalog, survey="endes", year_min=2024)

# Download
download_modules(endes_2024, dest="./data/", fmt="CSV", workers=4)

# Read into DataFrames
dfs = read_module("./data/ENDES/2024/Unico/968-Modulo1629.zip")
for name, df in dfs.items():
    print(f"{name}: {df.shape}")
# RECH0: (37390, 44)
# RECH1: (135045, 36)
# RECH4: (135045, 22)
# RECHM: (3002, 8)
```

## Survey aliases

Instead of typing full survey names, use short aliases:

```bash
inei-microdatos list --survey enaho    # instead of "Condiciones de Vida y Pobreza - ENAHO"
inei-microdatos list --survey endes    # instead of "Demográfica y de Salud Familiar - ENDES"
inei-microdatos list --survey cenagro  # instead of "CENSO NACIONAL AGROPECUARIO - CENAGRO"
```

Common aliases: `enaho`, `endes`, `epen`, `epe-lima`, `cenagro`, `eea`, `enapres`, `renamu`, `enaho-panel`, `enpove`, `enapref`, `enares`, `lgbti`, and [50+ more](src/inei_microdatos/aliases.py). Run `inei-microdatos aliases` to see them all.

Aliases work wherever a survey name is accepted: the CLI's `--survey` option and the `survey` argument of `filter_catalog()`.
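
Alias resolution can be pictured as a case-insensitive dictionary lookup that falls back to the name as given. Here is a minimal sketch under that assumption. `SURVEY_ALIASES` and `resolve_survey` are hypothetical names. The three entries come from the CLI examples above, not from the package's `aliases.py`.

```python
from typing import Dict

# Hypothetical subset of an alias table, taken from the three CLI examples above.
SURVEY_ALIASES: Dict[str, str] = {
    "enaho": "Condiciones de Vida y Pobreza - ENAHO",
    "endes": "Demográfica y de Salud Familiar - ENDES",
    "cenagro": "CENSO NACIONAL AGROPECUARIO - CENAGRO",
}


def resolve_survey(name: str) -> str:
    """Return the full survey name for a known alias, else the input unchanged."""
    return SURVEY_ALIASES.get(name.strip().lower(), name)


print(resolve_survey("ENDES"))
# Demográfica y de Salud Familiar - ENDES
print(resolve_survey("Condiciones de Vida y Pobreza - ENAHO"))
# Condiciones de Vida y Pobreza - ENAHO  (full names pass through)
```

Passing unknown strings through unchanged lets full names and aliases share one code path. The package may also match partial names against the catalog, which this sketch does not attempt.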

## CLI

The package includes a command-line interface for browsing and downloading without writing code.

### Browse available data

```bash
# Overview
inei-microdatos stats

# List all surveys
inei-microdatos list

# Filter
inei-microdatos list --survey enaho --year-min 2020
inei-microdatos list --survey endes
inei-microdatos list --survey cenagro
```

### Download

```bash
# Download ENDES from 2020 onward as CSV
inei-microdatos download --survey endes --year-min 2020 --format CSV --dest ./data/

# Download annual ENAHO data from 2018 onward as STATA
inei-microdatos download --survey enaho --period "Anual" --year-min 2018 --format STATA --dest ./data/

# Include documentation (questionnaires, data dictionaries, technical sheets)
inei-microdatos download --survey enaho --year-min 2024 --format CSV --dest ./data/ --include-docs

# Download documentation only
inei-microdatos docs --survey endes --year-min 2020 --dest ./docs/

# Preview what would be downloaded (no actual download)
inei-microdatos download --survey endes --year-min 2024 --format CSV --dest ./data/ --dry-run
```

### Read downloaded files

```bash
# List tables inside a ZIP
inei-microdatos read ./data/968-Modulo1629.zip --info

# Preview data
inei-microdatos read ./data/968-Modulo1629.zip -t RECH0
```

### Folder layouts

Control how files are organized on disk:

```bash
# Default: {survey}/{year}/{period}/{code}.zip
inei-microdatos download --survey endes --dest ./data/

# Flat by year (no period subfolders)
inei-microdatos download --survey endes --dest ./data/ --layout by-year

# Completely flat
inei-microdatos download --survey endes --dest ./data/ --layout flat

# Organized by format
inei-microdatos download --survey endes --dest ./data/ --layout by-format

# Custom template
inei-microdatos download --survey endes --dest ./data/ \
  --layout "{year}/{survey}/{module_name}.zip"
```

Available placeholders: `{survey}`, `{year}`, `{period}`, `{code}`, `{module_name}`, `{format}`.
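
The placeholders behave like Python `str.format` fields. The sketch below turns a preset name or a custom template into a destination path. Only the default template is documented above. The `by-year`, `flat`, and `by-format` strings are guesses, and `LAYOUT_PRESETS` and `resolve_path` are hypothetical names.

```python
from pathlib import Path
from typing import Dict

# Only "default" is documented; the other preset templates are assumptions.
LAYOUT_PRESETS: Dict[str, str] = {
    "default": "{survey}/{year}/{period}/{code}.zip",
    "by-year": "{survey}/{year}/{code}.zip",
    "flat": "{code}.zip",
    "by-format": "{format}/{survey}/{year}/{code}.zip",
}


def resolve_path(dest: str, layout: str, fields: Dict[str, str]) -> Path:
    """Expand a preset name or a custom template into a path under dest."""
    template = LAYOUT_PRESETS.get(layout, layout)  # unknown names are used as templates
    return Path(dest) / template.format(**fields)  # unused fields are ignored


fields = {
    "survey": "ENDES",
    "year": "2024",
    "period": "Unico",
    "code": "968-Modulo1629",
    "module_name": "Hogar",  # hypothetical module name
    "format": "CSV",
}
print(resolve_path("./data/", "default", fields).as_posix())
# data/ENDES/2024/Unico/968-Modulo1629.zip
print(resolve_path("./data/", "{year}/{survey}/{module_name}.zip", fields).as_posix())
# data/2024/ENDES/Hogar.zip
```

Treating any unrecognized `--layout` value as a template keeps one option for both presets and custom patterns. The sketch does not check for templates that map two modules to the same file.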

## Python API

### Catalog

```python
from inei_microdatos import load_catalog
from inei_microdatos.catalog import filter_catalog, catalog_stats, catalog_age

# Load bundled catalog (zero setup)
catalog = load_catalog()

# Check when it was crawled
print(catalog_age())  # 2026-03-31T16:00:31+00:00

# Stats
print(catalog_stats(catalog))
# {'surveys': 67, 'survey_years': 295, 'modules': 5932, ...}

# Filter by survey name (or alias), year range, period
enaho = filter_catalog(catalog, survey="enaho", year_min=2020, period="Anual")
```

### Download

```python
from inei_microdatos import download_modules, download_docs

# `catalog` can be the full catalog or a filtered subset such as `enaho` above;
# the full catalog covers 5,900+ modules, so passing a subset avoids fetching everything.

# Download with format fallback (CSV preferred, falls back to STATA/SPSS)
result = download_modules(catalog, dest="./data/", fmt="CSV", workers=4)
# {'ok': 13, 'skipped': 0, 'failed': 0, 'bad_zip': 0}

# Strict format (no fallback)
result = download_modules(catalog, dest="./data/", fmt="STATA", fallback=False)

# Dry run (preview without downloading)
result = download_modules(catalog, dest="./data/", fmt="CSV", dry_run=True)

# Documentation
result = download_docs(catalog, dest="./docs/", workers=4)
```
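
The result dictionary counts outcomes per module. One plausible way to fill the `bad_zip` bucket is to validate each finished file with the standard-library `zipfile` module. This sketch assumes that approach and maps a missing file to `failed`. `classify_download` and `summarize` are hypothetical helpers, not the package's code, and nothing here touches the network.

```python
import tempfile
import zipfile
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable


def classify_download(path: Path) -> str:
    """Label a finished download as 'ok', 'bad_zip', or 'failed' (file missing)."""
    if not path.exists():
        return "failed"
    if not zipfile.is_zipfile(path):
        return "bad_zip"  # e.g. an HTML error page saved under a .zip name
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first corrupt member, or None
            return "ok" if zf.testzip() is None else "bad_zip"
    except zipfile.BadZipFile:
        return "bad_zip"


def summarize(paths: Iterable[Path]) -> Dict[str, int]:
    """Tally outcomes in the same shape as the result dict above."""
    # 'skipped' would be decided before downloading (target already on disk),
    # so this post-download check never produces it.
    counts = Counter({"ok": 0, "skipped": 0, "failed": 0, "bad_zip": 0})
    for path in paths:
        counts[classify_download(path)] += 1
    return dict(counts)


# Demo with one valid archive, one fake "ZIP", and one missing file.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.zip"
    with zipfile.ZipFile(good, "w") as zf:
        zf.writestr("RECH0.csv", "a,b\n1,2\n")
    bad = Path(tmp) / "bad.zip"
    bad.write_text("<html>error</html>")
    result = summarize([good, bad, Path(tmp) / "missing.zip"])

print(result)  # {'ok': 1, 'skipped': 0, 'failed': 1, 'bad_zip': 1}
```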

### Read

```python
from inei_microdatos import read_module, read_catalog_entry, list_tables

# From a downloaded ZIP
dfs = read_module("./data/968-Modulo1629.zip")

# Read specific tables only
dfs = read_module("./data/968-Modulo1629.zip", tables=["RECH0", "RECH1"])

# From a download code (downloads to a temporary directory automatically)
dfs = read_module("968-Modulo1629")

# Directly from the catalog (downloads and reads in one step)
dfs = read_catalog_entry(catalog[0], year="2024", module="Hogar")

# Inspect without reading
tables = list_tables("./data/968-Modulo1629.zip")
# [{'name': 'RECH0', 'format': 'csv', 'size_bytes': 6598376, ...}, ...]
```
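
`list_tables` reports each table's name, format, and size without loading it. The ZIP central directory, read through the standard-library `zipfile` module, holds enough metadata for that. The rough equivalent below makes three assumptions:

- Tables are identified by extension (`.csv`, `.dta`, `.sav`).
- The `stata`/`spss` labels are guesses.
- `inspect_zip` is a hypothetical name, and the real function may return more keys.

The member names in the demo are made up.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import BinaryIO, Dict, List, Union

# File extensions treated as data tables; the format labels are assumptions.
TABLE_FORMATS = {".csv": "csv", ".dta": "stata", ".sav": "spss"}


def inspect_zip(source: Union[str, BinaryIO]) -> List[Dict[str, object]]:
    """List data tables in a ZIP using only its directory, without decompressing."""
    tables: List[Dict[str, object]] = []
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            member = PurePosixPath(info.filename)  # ZIP paths always use "/"
            fmt = TABLE_FORMATS.get(member.suffix.lower())
            if info.is_dir() or fmt is None:
                continue  # skip folders, PDFs, and other non-table files
            tables.append(
                {"name": member.stem, "format": fmt, "size_bytes": info.file_size}
            )
    return tables


# Demo: build a small archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("968-Modulo1629/RECH0.csv", "HHID,HV001\n1,10\n")
    zf.writestr("968-Modulo1629/Diccionario.pdf", b"%PDF-1.4")
tables = inspect_zip(buf)
print(tables)  # [{'name': 'RECH0', 'format': 'csv', 'size_bytes': 16}]
```

`file_size` is the uncompressed size recorded in the archive directory, so this inspection stays cheap even for large modules.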

### Client (low-level)

```python
from inei_microdatos import INEIClient

client = INEIClient()
surveys = client.get_surveys()
years = client.get_years(surveys[0])
periods = client.get_periods(surveys[0], years[0])
modules = client.get_modules(surveys[0], years[0], periods[0])

print(modules[0].download_url("STATA"))
# https://proyectos.inei.gob.pe/iinei/srienaho/descarga/STATA/966-Modulo01.zip
```

### Update the catalog

The bundled catalog is a snapshot taken when it was crawled (see `catalog_age()`). To get the latest data from INEI:

```python
from inei_microdatos import INEIClient
from inei_microdatos.catalog import build_catalog, save_catalog

client = INEIClient()
catalog = build_catalog(client)  # ~10 minutes
save_catalog(catalog, "~/.inei-microdatos/catalog.json")
```

Or via CLI:

```bash
inei-microdatos crawl                 # first time
inei-microdatos crawl --refresh       # re-crawl
inei-microdatos crawl --survey enaho  # crawl specific survey only
```
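
Because the bundled catalog is a snapshot, you may want to check its age before deciding whether to re-crawl. This sketch parses the ISO 8601 timestamp format shown for `catalog_age()` in the Catalog example, assuming it is returned as a string. The 90-day threshold and the `needs_refresh` helper are arbitrary, hypothetical choices.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_refresh(crawled_at: str, max_age_days: int = 90,
                  now: Optional[datetime] = None) -> bool:
    """Return True if an ISO 8601 crawl timestamp is older than max_age_days."""
    crawled = datetime.fromisoformat(crawled_at)  # parses the "+00:00" offset
    now = now or datetime.now(timezone.utc)
    return (now - crawled).days > max_age_days


stamp = "2026-03-31T16:00:31+00:00"  # value shown in the Catalog example
print(needs_refresh(stamp, now=datetime(2026, 5, 1, tzinfo=timezone.utc)))  # False (30 days)
print(needs_refresh(stamp, now=datetime(2026, 9, 1, tzinfo=timezone.utc)))  # True (153 days)
```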

## Available formats

| Format | Share of modules | Notes |
|--------|------------------|-------|
| **SPSS** (.sav) | ~98% | Best coverage |
| **STATA** (.dta) | ~42% | Value labels included |
| **CSV** | ~43% | UTF-8 with BOM |
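
Because the CSV files carry a byte-order mark, decoding them as plain `utf-8` leaves a `\ufeff` prefix on the first column name. The `utf-8-sig` codec strips it. The standard-library illustration below uses made-up bytes. With pandas, the same codec can be passed as `pandas.read_csv(path, encoding="utf-8-sig")`.

```python
import csv
import io

# Made-up file contents: UTF-8 BOM, a header row, and one data row.
raw = b"\xef\xbb\xbfUBIGEO,A\xc3\x91O\n150101,2024\n"

plain = next(csv.reader(io.StringIO(raw.decode("utf-8"))))
clean = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))

print(repr(plain[0]))  # '\ufeffUBIGEO'  (BOM glued to the first column name)
print(clean)           # ['UBIGEO', 'AÑO']
```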

Older surveys (pre-2008) are often available only in SPSS/STATA, not CSV. With `--format CSV`, downloads automatically fall back to STATA or SPSS when CSV isn't available. Use `--no-fallback` (or `fallback=False` in Python) to disable this.
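
The fallback amounts to trying formats in a fixed order of preference. This sketch assumes the order CSV, then STATA, then SPSS, as described above. It also assumes the set of formats offered for a module is already known, for example from the catalog. `pick_format` is a hypothetical helper, not the package's API.

```python
from typing import Iterable, Optional

FALLBACK_ORDER = ("CSV", "STATA", "SPSS")


def pick_format(available: Iterable[str], preferred: str = "CSV",
                fallback: bool = True) -> Optional[str]:
    """Return the format to download, or None if nothing acceptable is offered."""
    offered = {fmt.upper() for fmt in available}
    preferred = preferred.upper()
    if preferred in offered:
        return preferred
    if not fallback:
        return None  # strict mode, like --no-fallback / fallback=False
    for fmt in FALLBACK_ORDER:
        if fmt in offered:
            return fmt
    return None


print(pick_format({"STATA", "SPSS"}))         # STATA
print(pick_format({"SPSS"}))                  # SPSS
print(pick_format({"SPSS"}, fallback=False))  # None
```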

## ENAHO methodology split

ENAHO changed methodology in 2004. The INEI portal offers "ENAHO Metodología Anterior" and "ENAHO Metodología Actualizada" as separate dropdown entries, but both return identical data for the main "Condiciones de Vida y Pobreza" survey.

This package automatically splits them at the 2004 boundary:
- **ENAHO Anterior**: 1997–2003 (old methodology)
- **ENAHO Actualizada**: 2004–present (current methodology)

The thematic sub-surveys (Empleo, Educación, Victimización, etc.) and the PANEL variants are genuinely distinct datasets and are kept as-is.
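
Since both dropdowns return the same rows, the split reduces to deduplicating and applying a year cutoff. This sketch assumes catalog entries can be reduced to `(year, module_code)` pairs. `split_enaho` is a hypothetical name, and the module codes are placeholders.

```python
from typing import Dict, Iterable, List, Tuple

ENAHO_CUTOFF_YEAR = 2004  # first year of the updated methodology

Entry = Tuple[int, str]  # (year, module code)


def split_enaho(entries: Iterable[Entry]) -> Dict[str, List[Entry]]:
    """Drop duplicate (year, code) pairs and group them by methodology label."""
    groups: Dict[str, List[Entry]] = {"ENAHO Anterior": [], "ENAHO Actualizada": []}
    for year, code in sorted(set(entries)):
        label = "ENAHO Anterior" if year < ENAHO_CUTOFF_YEAR else "ENAHO Actualizada"
        groups[label].append((year, code))
    return groups


# Both dropdowns return the same rows; module codes here are placeholders.
from_anterior = [(2003, "code-2003"), (2004, "code-2004")]
from_actualizada = [(2003, "code-2003"), (2004, "code-2004")]
split = split_enaho(from_anterior + from_actualizada)
print(split)
# {'ENAHO Anterior': [(2003, 'code-2003')], 'ENAHO Actualizada': [(2004, 'code-2004')]}
```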

## How it works

The INEI portal uses three AJAX endpoints behind its cascading dropdowns:

1. `CambiaEnc.asp`: survey selection; returns the available years
2. `CambiaAnio.asp`: year selection; returns the available periods
3. `cambiaPeriodo.asp`: period selection; returns the module table with download links
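
Building a catalog means walking that cascade: years for each survey, periods for each year, modules for each period. In this sketch the three lookups are injected as plain functions, so it runs offline. In practice they would be HTTP calls, roughly what the `INEIClient` methods `get_years`, `get_periods`, and `get_modules` shown earlier do. The record layout and `walk_cascade` are assumptions, and the demo data is made up.

```python
from typing import Callable, Dict, List


def walk_cascade(
    surveys: List[str],
    get_years: Callable[[str], List[str]],
    get_periods: Callable[[str, str], List[str]],
    get_modules: Callable[[str, str, str], List[str]],
) -> List[Dict[str, str]]:
    """Flatten survey -> year -> period -> module into one record per module."""
    records: List[Dict[str, str]] = []
    for survey in surveys:
        for year in get_years(survey):  # what CambiaEnc.asp answers
            for period in get_periods(survey, year):  # what CambiaAnio.asp answers
                for code in get_modules(survey, year, period):  # cambiaPeriodo.asp
                    records.append(
                        {"survey": survey, "year": year, "period": period, "code": code}
                    )
    return records


# Offline stand-ins for the three endpoints, with made-up data.
demo = walk_cascade(
    ["ENDES"],
    get_years=lambda survey: ["2023", "2024"],
    get_periods=lambda survey, year: ["Unico"],
    get_modules=lambda survey, year, period: [f"{year}-demo"],
)
print(len(demo))  # 2
print(demo[-1])   # {'survey': 'ENDES', 'year': '2024', 'period': 'Unico', 'code': '2024-demo'}
```

The number of requests grows with every survey-year-period combination. That cost is a likely reason a full crawl takes minutes rather than seconds.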

Download URLs follow a predictable pattern: `https://proyectos.inei.gob.pe/iinei/srienaho/descarga/{FORMAT}/{CODE}.zip`
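
Because the pattern is fixed, a format and a module code are enough to build a download URL. A minimal sketch follows. `build_download_url` is hypothetical; the package exposes this through the `download_url()` method on module objects. The accepted format names are assumed to be the three listed under Available formats.

```python
BASE_URL = "https://proyectos.inei.gob.pe/iinei/srienaho/descarga"
FORMATS = {"CSV", "STATA", "SPSS"}  # assumed folder names, one per format


def build_download_url(fmt: str, code: str) -> str:
    """Fill in the {FORMAT}/{CODE}.zip pattern used by the portal."""
    fmt = fmt.upper()
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(FORMATS)}")
    return f"{BASE_URL}/{fmt}/{code}.zip"


print(build_download_url("STATA", "966-Modulo01"))
# https://proyectos.inei.gob.pe/iinei/srienaho/descarga/STATA/966-Modulo01.zip
```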

The critical implementation detail is encoding. Survey names containing Windows-1252 characters (such as the en dash `\x96` in EPEN survey names) must be percent-encoded using JavaScript's `escape()` convention (`%96`), not Python's default UTF-8 percent-encoding (`%C2%96`). Standard `requests` form encoding fails silently: the server returns empty HTML with no error.
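
The difference can be reproduced offline. This sketch mimics JavaScript's `escape()` in Python and compares it with `urllib.parse.quote`, which encodes to UTF-8 first. `escape()` applies these rules:

- `A-Z a-z 0-9 @*_+-./` pass through unchanged.
- Other characters below U+0100 become `%XX`.
- Higher code points become `%uXXXX`.

The sketch assumes the name holds the single character `\x96`, as a JavaScript `"\x96"` literal would. The survey name is made up.

```python
from urllib.parse import quote

# Characters that JavaScript's escape() leaves untouched.
_JS_SAFE = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./"
)


def js_escape(text: str) -> str:
    """Mimic JavaScript's global escape() function."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in _JS_SAFE:
            out.append(ch)
        elif code < 0x100:
            out.append(f"%{code:02X}")   # single byte, as the Windows-1252 server expects
        else:
            out.append(f"%u{code:04X}")  # code points outside Latin-1
    return "".join(out)


name = "Empleo \x96 EPEN"  # hypothetical survey name containing \x96
print(js_escape(name))  # Empleo%20%96%20EPEN
print(quote(name))      # Empleo%20%C2%96%20EPEN
```

To keep `requests` from re-encoding the value, you would send it as a pre-built string body. Pass `data=` as a `str` and set the `Content-Type: application/x-www-form-urlencoded` header explicitly; `requests` does not add that header for string bodies. The form field names the endpoints expect are not documented here.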

## License

MIT