@tricoteuses/assemblee 2.2.0 → 2.3.0
- package/README.md +44 -23
- package/lib/cleaners/scrutins.d.ts +1 -0
- package/lib/cleaners.js +333 -324
- package/lib/index.js +943 -548
- package/lib/{loaders-Csd5rgG_.js → loaders-COCQTD-d.js} +1449 -2084
- package/lib/loaders.js +5 -5
- package/lib/parsers.js +9302 -578
- package/lib/raw_types/questions.d.ts +1 -1
- package/lib/raw_types/scrutins.d.ts +1 -1
- package/lib/schemas/agendas.json +8 -0
- package/lib/schemas/debats.json +0 -6
- package/lib/schemas/dossiers_legislatifs.json +8 -0
- package/lib/schemas/questions.json +9 -2
- package/lib/scripts/retrieve_documents.d.ts +6 -5
- package/lib/scripts/retrieve_videos.d.ts +14 -0
- package/lib/scripts/shared/cli_helpers.d.ts +5 -0
- package/lib/scripts/shared/utils.d.ts +2 -0
- package/lib/types/agendas.d.ts +8 -0
- package/lib/types/debats.d.ts +0 -2
- package/lib/types/questions.d.ts +1 -1
- package/package.json +3 -2
- package/lib/index-COAP8XeF.js +0 -8733
package/README.md
CHANGED
@@ -42,53 +42,74 @@ npm install
 
 ### Basic usage
 
-Create a
-
+Create a directory to store the data, then run the following command to download, reorganize and clean the data.
 ```bash
 mkdir ../assemblee-data/
-
-# Download and clean open data
 npm run data:download ../assemblee-data
 ```
 
-
-
-```bash
-# Retrieval of députés' pictures from Assemblée nationale's website
-npm run data:retrieve_deputes_photos ../assemblee-data
+### Available Commands
 
-
-npm run data:
+- `npm run data:download <dir>`: Download, reorganize, and clean data
+- `npm run data:retrieve_open_data <dir>`: Download raw data files.
+- `npm run data:reorganize_data <dir>`: Reorganize raw files by entity.
+- `npm run data:clean_data <dir>`: Clean and validate reorganized files.
+- `npm run data:retrieve_deputes_photos <dir>`: Retrieval of députés' pictures from Assemblée nationale's website
+- `npm run data:retrieve_senateurs_photos <dir>`: Retrieval of sénateurs' pictures from Assemblée nationale's website
+- `npm run data:retrieve_documents <dir>`: Retrieval of legislative documents from Assemblée nationale's website
+- `npm run data:retrieve_pending_amendements <dir>`: Retrieval of pending amendments from Assemblée nationale's website (waiting to be processed by Assemblée services)
 
-# Retrieval of pending amendments from Assemblée nationale's website (waiting to be processed by Assemblée services)
-npm run data:retrieve_pending_amendements ../assemblee-data
-```
 
 _Notes_:
 
 - Reorganized files (generated by the _data:reorganize_data_ command) are also available in [Tricoteuses / Data / Données brutes de l'Assemblée](https://git.en-root.org/tricoteuses/data/assemblee-brut). They are updated on a regular basis.
 - Split & cleaned files (generated by the _data:clean_data_ command) are also available in [Tricoteuses / Data / Données nettoyées de l'Assemblée](https://git.en-root.org/tricoteuses/data/assemblee-nettoye) with the `_nettoye` suffix. They are updated on a regular basis.
 
-### Filtering
+### Filtering Options
 
 Downloading and cleaning all the data is long and takes up a lot of disk space. It is possible to choose the type of data that you want to retrieve to reduce the load.
 
-
+Examples:
 
 ```bash
-#
-npm run data:download ../assemblee-data --
-```
+# Only download amendments
+npm run data:download ../assemblee-data -- -k Amendements
 
-
-```bash
-# Available options : 14, 15, 16, 17
+# Only process 16th and 17th legislatures
 npm run data:download ../assemblee-data -- -l 16 -l 17
-
 ```
 
+### Common Options
+
+- `--categories` or `-k <name>`: Filter by dataset categories (Available options : `ActeursEtOrganes`, `Agendas`, `Amendements`, `DossiersLegislatifs`, `Photos`, `Scrutins`, `Questions`, `ComptesRendusSeances`)
+
+- `--legislature` or `-l <number>`: Specify one or more legislatures to process (e.g., `-l 15 -l 16`)
+- `--dataDir <path>` (Mandatory): Path to the working directory where all data is stored (required)
+- `--silent` or `-s`: Disable logging
+- `--verbose` or `-v`: Enable verbose logging
+- `--fetch` or `-f`: Force re-download of data even if already present
+- `--commit` or `-c`: Automatically commit cleaned data
+- `--pull` or `-p`: Pull repositories before starting
+- `--clone` or `-C <url>`: Clone Git repositories from a remote group or organization
+- `--remote` or `-r <name>`: Push commits to specified Git remote(s)
+
 If you use such options, use them in all subsequent commands too (_data:regorganize_data_ and _data:clean_data_).
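The note above implies that running the pipeline step by step means repeating the same filters on each command. A minimal sketch, assuming the three step commands listed under Available Commands accept the same `-k` and `-l` options as `data:download`. The helper `print_pipeline` is hypothetical and not part of the package; it only prints the commands for review and does not run them.

```shell
#!/bin/sh
# Hypothetical helper (not part of the package): prints the three pipeline
# steps with identical filter options so they can be checked before running.
print_pipeline() {
  data_dir="$1"
  shift
  for step in retrieve_open_data reorganize_data clean_data; do
    # "$*" joins the remaining arguments (the filters) with single spaces
    echo "npm run data:${step} ${data_dir} -- $*"
  done
}

# Example: same filters as the README's filtering examples
print_pipeline ../assemblee-data -k Amendements -l 16 -l 17
```

Printing rather than executing keeps the sketch safe to try; the printed lines can be pasted into a shell once they look right.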
 
+### Options for Cleaning Data
+
+- `--dataset` or `-d <name>`: Clean a specific dataset only
+- `--no-reset-after-commit`: Skip Git reset after committing (useful to preserve local changes)
+- `--no-validate` or `-V`: Skip schema validation during cleaning
+- `--fullCompteRenduCommissions`: Force reprocessing of commission reports
+- `--fetchDocuments` : Specify to retrieve documents like reports, videos metadata files
+- `--parseDocuments`: Specify to parse documents into cleaned json
+
+### Options for Retrieving Documents
+
+- `--full` or `-f`: Retrieve all documents, even those already downloaded
+- `--document-type` or `-T <type>`: Restrict to specific document types (e.g., `PION`)
+
+
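A sketch of how the document-retrieval options might be combined with the `data:retrieve_documents` command listed under Available Commands. It assumes options are passed after `--`, as in the filtering examples; this exact invocation does not appear in the README, so the command is printed for review rather than executed. `DATA_DIR` and `DOC_TYPE` are illustrative variable names.

```shell
#!/bin/sh
# Assumption: options are passed after "--", as in the filtering examples.
# DATA_DIR and DOC_TYPE are illustrative; PION is the type the README cites.
DATA_DIR="../assemblee-data"
DOC_TYPE="PION"

# Build the command as a string and print it instead of running it
cmd="npm run data:retrieve_documents ${DATA_DIR} -- -T ${DOC_TYPE}"
echo "$cmd"
```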
 ## Download using Docker
 
 A Docker image that downloads and cleans the data all at once is available. Build it locally or run it from the container registry.