npm - @tricoteuses/senat - Versions diffs - 2.9.1 → 2.9.5 - Mend

Files changed (10)

package/LICENSE.md +22 -22
package/README.md +116 -116
package/lib/loaders.d.ts +1 -1
package/lib/model/agenda.js +2 -0
package/lib/scripts/retrieve_videos.d.ts +1 -0
package/lib/scripts/retrieve_videos.js +420 -0
package/lib/types/agenda.d.ts +2 -0
package/lib/validators/senat.d.ts +0 -0
package/lib/validators/senat.js +24 -0
package/package.json +95 -94

package/LICENSE.md CHANGED

@@ -1,22 +1,22 @@
-# Tricoteuses-Senat
-## _Handle French Sénat's open data_
-By: Emmanuel Raviart <mailto:emmanuel@raviart.com>
-Copyright (C) 2019, 2020, 2021 Emmanuel Raviart
-https://git.tricoteuses.fr/logiciels/tricoteuses-senat
-> Tricoteuses-Senat is free software; you can redistribute it and/or modify
-> it under the terms of the GNU Affero General Public License as
-> published by the Free Software Foundation, either version 3 of the
-> License, or (at your option) any later version.
->
-> Tricoteuses-Senat is distributed in the hope that it will be useful,
-> but WITHOUT ANY WARRANTY; without even the implied warranty of
-> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-> GNU Affero General Public License for more details.
->
-> You should have received a copy of the GNU Affero General Public License
-> along with this program. If not, see <http://www.gnu.org/licenses/>.
+# Tricoteuses-Senat
+## _Handle French Sénat's open data_
+By: Emmanuel Raviart <mailto:emmanuel@raviart.com>
+Copyright (C) 2019, 2020, 2021 Emmanuel Raviart
+https://git.tricoteuses.fr/logiciels/tricoteuses-senat
+> Tricoteuses-Senat is free software; you can redistribute it and/or modify
+> it under the terms of the GNU Affero General Public License as
+> published by the Free Software Foundation, either version 3 of the
+> License, or (at your option) any later version.
+>
+> Tricoteuses-Senat is distributed in the hope that it will be useful,
+> but WITHOUT ANY WARRANTY; without even the implied warranty of
+> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+> GNU Affero General Public License for more details.
+>
+> You should have received a copy of the GNU Affero General Public License
+> along with this program. If not, see <http://www.gnu.org/licenses/>.

package/README.md CHANGED

@@ -1,116 +1,116 @@
-# Tricoteuses-Senat
-## _Retrieve, clean up & handle  French Sénat's open data_
-## Requirements
-- Node >= 22
-## Installation
-```bash
-git clone https://git.tricoteuses.fr/logiciels/tricoteuses-senat
-cd tricoteuses-senat/
-```
-Create a `.env` file to set PostgreSQL database informations and other configuration variables (you can use `example.env` as a template). Then
-```bash
-npm install
-```
-### Database creation (not needed if downloading with Docker image)
-#### Using Docker
-```bash
-docker run --name local-postgres -d -p 5432:5432 -e POSTGRES_PASSWORD=$YOUR_CUSTOM_DB_PASSWORD postgres
-# Default Postgres user is postgres
-# But scripts require an "opendata" role
-docker exec -it local-postgres psql -U postgres -c "CREATE ROLE opendata;"
-```
-## Download data
-Create a folder where the data will be downloaded and run the following command to download the data and convert it into JSON files.
-```bash
-mkdir ../senat-data/
-# Available options for optional `categories` parameter : All,  Ameli, Debats, DosLeg, Questions, Sens
-npm run data:download ../senat-data -- [--categories All]
-```
-Data from other sources is also available :
-```bash
-# Retrieval of textes and rapports from Sénat's website
-# Available options for optional `formats` parameter : xml, html, pdf
-# Available options for optional `types` parameter : textes, rapports
-npm run data:retrieve_documents ../senat-data -- --fromSession 2022 [--formats xml pdf] [--types textes]
-# Retrieval & parsing (textes in xml format only for now)
-npm run data:retrieve_documents ../senat-data -- --fromSession 2022 --parseDocuments
-# Parsing only
-npm run data:parse_textes_lois ../senat-data
-# Retrieval (& parsing) of agenda from Sénat's website
-npm run data:retrieve_agenda ../senat-data -- --fromSession 2022 [--parseAgenda]
-# Retrieval (& parsing) of comptes-rendus des débats from Sénat's website
-npm run data:retrieve_comptes_rendus ../senat-data -- [--parseDebats]
-# Retrieval of sénateurs' pictures from Sénat's website
-npm run data:retrieve_senateurs_photos ../senat-data
-```
-## Data download using Docker
-A Docker image that downloads and converts the data all at once is available. Build it locally or run it from the container registry.
-Use the environment variables `FROM_SESSION` and `CATEGORIES` if needed.
-```bash
-docker run --pull always --name tricoteuses-senat -v ../senat-data:/app/senat-data -d git.tricoteuses.fr/logiciels/tricoteuses-senat:latest
-```
-Use the environment variable `CATEGORIES` and `FROM_SESSION` if needed.
-## Using the data
-Once the data is downloaded, you can use loaders to retrieve it.
-To use loaders in your project, you can install the _@tricoteuses/senat_ package, and import the iterator functions that you need.
-```bash
-npm install @tricoteuses/senat
-```
-```js
-import { iterLoadSenatQuestions } from "@tricoteuses/senat/loaders"
-// Pass data directory and legislature as arguments
-for (const { item: question } of iterLoadSenatQuestions("../senat-data", 17)) {
-  console.log(question.id)
-}
-```
-## Generation of raw types from SQL schema (for contributors only)
-```bash
-npm run data:generate_schemas ../senat-data
-```
-## Publishing
-To publish a new version of this package onto npm, bump the package version and publish.
-```bash
-npm version x.y.z # Bumps version in package.json and creates a new tag x.y.z
-npx tsc
-npm publish
-```
-The Docker image will be automatically built during a CI Workflow if you push the tag to the remote repository.
-```bash
-git push --tags
-```
+# Tricoteuses-Senat
+## _Retrieve, clean up & handle  French Sénat's open data_
+## Requirements
+- Node >= 22
+## Installation
+```bash
+git clone https://git.tricoteuses.fr/logiciels/tricoteuses-senat
+cd tricoteuses-senat/
+```
+Create a `.env` file to set PostgreSQL database informations and other configuration variables (you can use `example.env` as a template). Then
+```bash
+npm install
+```
+### Database creation (not needed if downloading with Docker image)
+#### Using Docker
+```bash
+docker run --name local-postgres -d -p 5432:5432 -e POSTGRES_PASSWORD=$YOUR_CUSTOM_DB_PASSWORD postgres
+# Default Postgres user is postgres
+# But scripts require an "opendata" role
+docker exec -it local-postgres psql -U postgres -c "CREATE ROLE opendata;"
+```
+## Download data
+Create a folder where the data will be downloaded and run the following command to download the data and convert it into JSON files.
+```bash
+mkdir ../senat-data/
+# Available options for optional `categories` parameter : All,  Ameli, Debats, DosLeg, Questions, Sens
+npm run data:download ../senat-data -- [--categories All]
+```
+Data from other sources is also available :
+```bash
+# Retrieval of textes and rapports from Sénat's website
+# Available options for optional `formats` parameter : xml, html, pdf
+# Available options for optional `types` parameter : textes, rapports
+npm run data:retrieve_documents ../senat-data -- --fromSession 2022 [--formats xml pdf] [--types textes]
+# Retrieval & parsing (textes in xml format only for now)
+npm run data:retrieve_documents ../senat-data -- --fromSession 2022 --parseDocuments
+# Parsing only
+npm run data:parse_textes_lois ../senat-data
+# Retrieval (& parsing) of agenda from Sénat's website
+npm run data:retrieve_agenda ../senat-data -- --fromSession 2022 [--parseAgenda]
+# Retrieval (& parsing) of comptes-rendus des débats from Sénat's website
+npm run data:retrieve_comptes_rendus ../senat-data -- [--parseDebats]
+# Retrieval of sénateurs' pictures from Sénat's website
+npm run data:retrieve_senateurs_photos ../senat-data
+```
+## Data download using Docker
+A Docker image that downloads and converts the data all at once is available. Build it locally or run it from the container registry.
+Use the environment variables `FROM_SESSION` and `CATEGORIES` if needed.
+```bash
+docker run --pull always --name tricoteuses-senat -v ../senat-data:/app/senat-data -d git.tricoteuses.fr/logiciels/tricoteuses-senat:latest
+```
+Use the environment variable `CATEGORIES` and `FROM_SESSION` if needed.
+## Using the data
+Once the data is downloaded, you can use loaders to retrieve it.
+To use loaders in your project, you can install the _@tricoteuses/senat_ package, and import the iterator functions that you need.
+```bash
+npm install @tricoteuses/senat
+```
+```js
+import { iterLoadSenatQuestions } from "@tricoteuses/senat/loaders"
+// Pass data directory and legislature as arguments
+for (const { item: question } of iterLoadSenatQuestions("../senat-data", 17)) {
+  console.log(question.id)
+}
+```
+## Generation of raw types from SQL schema (for contributors only)
+```bash
+npm run data:generate_schemas ../senat-data
+```
+## Publishing
+To publish a new version of this package onto npm, bump the package version and publish.
+```bash
+npm version x.y.z # Bumps version in package.json and creates a new tag x.y.z
+npx tsc
+npm publish
+```
+The Docker image will be automatically built during a CI Workflow if you push the tag to the remote repository.
+```bash
+git push --tags
+```

package/lib/loaders.d.ts CHANGED

@@ -17,7 +17,7 @@ export declare const TEXTE_FOLDER = "leg";
 export declare const DATA_ORIGINAL_FOLDER = "original";
 export declare const DATA_TRANSFORMED_FOLDER = "transformed";
 export declare const DOCUMENT_METADATA_FILE = "metadata.json";
-type IterItem<T> = {
+export type IterItem<T> = {
     item: T;
     filePathFromDataset?: string;
     legislature?: number;
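
With `IterItem<T>` now exported, downstream code can type its own helpers against the wrapper objects that the loaders yield. The following is a minimal sketch in plain JavaScript with JSDoc, assuming only the three fields visible in this hunk (the real type may declare more below the hunk). `collectByLegislature` and the sample data are hypothetical and not part of the package.

```javascript
/**
 * Local mirror of the fields visible in the hunk above (assumption: the
 * real type may carry additional optional fields).
 * @template T
 * @typedef {{ item: T, filePathFromDataset?: string, legislature?: number }} IterItem
 */

/**
 * Hypothetical helper: unwrap items and group them by legislature.
 * @template T
 * @param {Iterable<IterItem<T>>} iter
 * @returns {Map<number | undefined, T[]>}
 */
function collectByLegislature(iter) {
  const groups = new Map();
  for (const { item, legislature } of iter) {
    const bucket = groups.get(legislature) ?? [];
    bucket.push(item);
    groups.set(legislature, bucket);
  }
  return groups;
}

// Stand-in data shaped like loader output (made up for illustration).
const sample = [
  { item: { id: "q1" }, legislature: 17 },
  { item: { id: "q2" }, legislature: 17 },
  { item: { id: "q3" } }, // legislature is optional
];
const grouped = collectByLegislature(sample);
console.log(grouped.get(17).length);
```

In a TypeScript project the local typedef would presumably be replaced by a type import from `@tricoteuses/senat/loaders`, which the package's `exports` map exposes.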

package/lib/model/agenda.js CHANGED

@@ -119,6 +119,8 @@ function transformAgenda(document, fileName) {
             captationVideo: videoElement !== null,
             urlDossierSenat: urlDossierSenat,
             quantieme: eventIsSeance(eventElement) ? getQuantieme(eventElement, seanceElements) : null,
+            urlVideo: null,
+            timecodeDebutVideo: null
         });
     }
     return agendaEvents;

package/lib/scripts/retrieve_videos.d.ts ADDED

@@ -0,0 +1 @@
+export {};

package/lib/scripts/retrieve_videos.js ADDED

@@ -0,0 +1,420 @@
+// scripts/retrieve_senat_videos_from_agendas.ts
+import assert from "assert";
+import commandLineArgs from "command-line-args";
+import fs from "fs-extra";
+import fsp from "fs/promises";
+import path from "path";
+import { AGENDA_FOLDER, DATA_TRANSFORMED_FOLDER, iterLoadSenatAgendas, } from "../loaders";
+import { getSessionsFromStart } from "../types/sessions";
+import { commonOptions } from "./shared/cli_helpers";
+// ===================== Constants =====================
+const MATCH_THRESHOLD = 0.60;
+const MAX_CANDIDATES = 15;
+const MAX_PAGES = 3;
+const STATS = { total: 0, accepted: 0 };
+const VIDEOS_ROOT_FOLDER = "videos";
+const SENAT_VIDEOS_SEARCH_AJAX = "https://videos.senat.fr/senat_videos_search.php";
+const SENAT_DATAS_ROOT = "https://videos.senat.fr/Datas/senat";
+const SENAT_VOD_HOST = "https://vodsenat.akamaized.net";
+// ===================== CLI =====================
+const optionsDefinitions = [
+    ...commonOptions,
+];
+const options = commandLineArgs(optionsDefinitions);
+// ===================== Utils =====================
+function normalize(s) {
+    return (s ?? "")
+        .toLowerCase()
+        .normalize("NFD")
+        .replace(/[\u0300-\u036f]/g, "")
+        .replace(/[^\p{L}\p{N}\s-]/gu, " ")
+        .replace(/\s+/g, " ")
+        .trim();
+}
+function tokens(s) { return normalize(s).split(" ").filter(Boolean); }
+function dice(a, b) {
+    const A = new Set(tokens(a)), B = new Set(tokens(b));
+    if (!A.size || !B.size)
+        return 0;
+    let inter = 0;
+    for (const t of A)
+        if (B.has(t))
+            inter++;
+    return (2 * inter) / (A.size + B.size);
+}
+// Heuristic for Europe/Paris DST: +02:00 ≈ April→October, +01:00 otherwise.
+function parisOffsetForDate(dateYYYYMMDD) {
+    const m = Number(dateYYYYMMDD.split("-")[1] || "1");
+    return (m >= 4 && m <= 10) ? "+02:00" : "+01:00";
+}
+function epochToParisDateTime(epochSec) {
+    if (!Number.isFinite(epochSec))
+        return null;
+    const dUtc = new Date(epochSec * 1000);
+    // Offset heuristic (same logic as parisOffsetForDate)
+    const m = dUtc.getUTCMonth() + 1; // 1..12
+    const offsetHours = (m >= 4 && m <= 10) ? 2 : 1;
+    const offsetStr = offsetHours === 2 ? "+02:00" : "+01:00";
+    // Apply the offset to get the local Paris date/time
+    const localMs = dUtc.getTime() + offsetHours * 3600 * 1000;
+    const dl = new Date(localMs);
+    const yyyy = String(dl.getUTCFullYear());
+    const mm = String(dl.getUTCMonth() + 1).padStart(2, "0");
+    const dd = String(dl.getUTCDate()).padStart(2, "0");
+    const hh = String(dl.getUTCHours()).padStart(2, "0");
+    const mi = String(dl.getUTCMinutes()).padStart(2, "0");
+    const ss = String(dl.getUTCSeconds()).padStart(2, "0");
+    const ms = String(dl.getUTCMilliseconds()).padStart(3, "0");
+    return {
+        date: `${yyyy}-${mm}-${dd}`,
+        startTime: `${hh}:${mi}:${ss}.${ms}${offsetStr}`,
+    };
+}
+function toTargetEpoch(date, time) {
+    if (!date)
+        return null;
+    let t = (time ?? "00:00").trim();
+    // If the time already carries a zone (Z or ±HH:MM), simply prefix it with the date.
+    const hasTz = /(?:Z|[+-]\d{2}:\d{2})$/i.test(t);
+    let iso;
+    if (hasTz) {
+        // Example: 2022-10-04T18:00:00.000+02:00
+        iso = `${date}T${t}`;
+    }
+    else {
+        // Normalize to at least HH:mm:ss
+        if (/^\d{1,2}$/.test(t)) {
+            t = `${t.padStart(2, "0")}:00:00`;
+        }
+        else if (/^\d{1,2}:\d{2}$/.test(t)) {
+            t = `${t}:00`;
+        } // otherwise keep as-is (covers HH:mm:ss and HH:mm:ss.SSS)
+        // Append the Paris offset (seasonal heuristic)
+        iso = `${date}T${t}${parisOffsetForDate(date)}`;
+    }
+    const ms = Date.parse(iso);
+    return Number.isNaN(ms) ? null : Math.floor(ms / 1000);
+}
+async function fetchText(url) {
+    const res = await fetch(url);
+    if (!res.ok)
+        return null;
+    return await res.text();
+}
+async function fetchBuffer(url) {
+    const res = await fetch(url);
+    if (!res.ok)
+        return null;
+    const ab = await res.arrayBuffer();
+    return Buffer.from(ab);
+}
+async function writeIfChanged(p, content) {
+    const exists = await fs.pathExists(p);
+    if (exists) {
+        const old = await fsp.readFile(p, "utf-8");
+        if (old === content)
+            return;
+    }
+    await fsp.writeFile(p, content, "utf-8");
+}
+function queryString(obj) {
+    return Object.entries(obj)
+        .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
+        .join("&");
+}
+function simplifyTitleForKeywords(input) {
+    return (input || "")
+        .replace(/\baudition\s+de\b/gi, " ")
+        .replace(/\breunion\b/gi, " ")
+        .replace(/\bsur\b/gi, " ")
+        .replace(/\b(la|le|les|des|de|du|d’|d')\b/gi, " ")
+        .replace(/[–—-]/g, " ")
+        .replace(/\s+/g, " ")
+        .trim();
+}
+function toFRDate(dateYYYYMMDD) {
+    const [y, m, d] = dateYYYYMMDD.split("-");
+    return `${d}/${m}/${y}`; // DD/MM/YYYY
+}
+function formatYYYYMMDD(dateYYYYMMDD) {
+    const [y, m, d] = dateYYYYMMDD.split("-");
+    return `${y}${m}${d}`;
+}
+function makeReunionUid(agenda) {
+    // agenda.date is expected as "YYYY-MM-DD"
+    const ymd = agenda.date ? formatYYYYMMDD(agenda.date) : "00000000";
+    return `${ymd}-${agenda.id}`;
+}
+function extractCandidatesFromSearchHtml(html) {
+    const out = [];
+    const re = /href="\/?video\.(\d+)_([a-z0-9]+)\.[^"]+"/gi;
+    let m;
+    while ((m = re.exec(html))) {
+        const id = m[1], hash = m[2];
+        const pageUrl = `https://videos.senat.fr/video.${id}_${hash}.html`;
+        const ctx = html.slice(Math.max(0, m.index - 240), Math.min(html.length, m.index + 240));
+        const t = ctx.match(/title="([^"]+)"/i) || ctx.match(/>([^<]{10,200})</);
+        out.push({ id, hash, pageUrl, title: t?.[1] });
+    }
+    const seen = new Set();
+    return out.filter(c => {
+        const k = `${c.id}_${c.hash}`;
+        if (seen.has(k))
+            return false;
+        seen.add(k);
+        return true;
+    });
+}
+function parseDataNvs(nvs) {
+    const epoch = nvs.match(/<metadata\s+name="date"\s+value="(\d+)"/i)?.[1];
+    const title = nvs.match(/<metadata\s+name="title"\s+value="([^"]+)"/i)?.[1];
+    return { epoch: epoch ? Number(epoch) : undefined, title };
+}
+function buildSenatVodMasterM3u8FromNvs(xml, host = SENAT_VOD_HOST) {
+    if (!xml)
+        return null;
+    // (a) Already a full VOD link in .smil/playlist.m3u8
+    const mVod = xml.match(/https?:\/\/[^"'<>]*vodsenat[^"'<>]*\.smil\/(?:playlist|master)\.m3u8/i);
+    if (mVod)
+        return mVod[0];
+    // (b) Path senat/YYYY/MM/<basename>.smil -> normalize to playlist.m3u8
+    const mSmilPath = xml.match(/senat\/(\d{4})\/(\d{2})\/([^"'<>\/]+?)\.smil/i);
+    if (mSmilPath) {
+        const [, y, m, base] = mSmilPath;
+        return `${host}/senat/${y}/${m}/${base}.smil/playlist.m3u8`;
+    }
+    // (c) Path senat/YYYY/MM/<basename>.mp4 -> convert to .smil/playlist.m3u8
+    const mMp4Path = xml.match(/senat\/(\d{4})\/(\d{2})\/([^"'<>\/]+?)\.mp4/i);
+    if (mMp4Path) {
+        const [, y, m, base] = mMp4Path;
+        return `${host}/senat/${y}/${m}/${base}.smil/playlist.m3u8`;
+    }
+    // (d) Failing that, any .m3u8 present (low priority; may be a live stream)
+    const mAny = xml.match(/https?:\/\/[^"'<>]+\.m3u8/i);
+    return mAny ? mAny[0] : null;
+}
+function score(agenda, agendaTs, videoTitle, videoEpoch) {
+    const titleScore = dice(agenda.titre || "", videoTitle || "");
+    let timeScore = 0;
+    if (agendaTs && videoEpoch) {
+        const deltaMin = Math.abs(videoEpoch - agendaTs) / 60;
+        timeScore = Math.max(0, 1 - (deltaMin / 180));
+    }
+    let orgBonus = 0;
+    if (agenda.organe && videoTitle) {
+        const o = normalize(agenda.organe);
+        const t = normalize(videoTitle);
+        if (o && t.includes(o.split(" ")[0]))
+            orgBonus = 0.15;
+    }
+    return 0.3 * titleScore + 0.7 * timeScore + orgBonus;
+}
+function buildSearchStrategies(agenda) {
+    const fr = agenda.date ? toFRDate(agenda.date) : undefined;
+    const kw = simplifyTitleForKeywords(agenda.titre || "");
+    const commission = agenda.organe || undefined;
+    // common base
+    const base = { search: "true", videotype: "Commission" };
+    if (fr)
+        Object.assign(base, { period: "custom", begin: fr, end: fr });
+    const strategies = [];
+    // 1) keywords + commission
+    if (kw && commission)
+        strategies.push({ ...base, motscles: kw, commission });
+    // 2) keywords without commission
+    if (kw)
+        strategies.push({ ...base, motscles: kw });
+    // 3) full-text (AND) + commission
+    if (kw && commission)
+        strategies.push({ ...base, text: `AND${kw}`, commission });
+    // 4) full-text (AND) without commission
+    if (kw)
+        strategies.push({ ...base, text: `AND${kw}` });
+    // 5) no keywords (just type + period)
+    strategies.push({ ...base });
+    return strategies;
+}
+async function fetchAllSearchPages(args, baseDir, strategyIndex, maxPages = MAX_PAGES) {
+    const pages = [];
+    for (let p = 1; p <= maxPages; p++) {
+        const url = `${SENAT_VIDEOS_SEARCH_AJAX}?${queryString({ ...args, page: String(p) })}`;
+        const html = await fetchText(url);
+        if (!html)
+            break;
+        pages.push(html);
+        if (!/href="\/?video\.\d+_[a-z0-9]+\./i.test(html))
+            break;
+    }
+    return pages;
+}
+async function processAgenda(agenda, session, dataDir) {
+    if (!agenda)
+        return;
+    if (!agenda.captationVideo) {
+        if (!options["silent"])
+            console.log(`[skip] ${agenda.id} captationVideo=false`);
+        return;
+    }
+    if (!agenda.date || !agenda.startTime) {
+        if (!options["silent"])
+            console.log(`[skip] ${agenda.id} date/hour missing`);
+        return;
+    }
+    STATS.total++;
+    const reunionUid = makeReunionUid(agenda);
+    const baseDir = path.join(dataDir, VIDEOS_ROOT_FOLDER, String(session), reunionUid);
+    await fs.ensureDir(baseDir);
+    const agendaTs = toTargetEpoch(agenda.date, agenda.startTime);
+    // ==== 1) Multi-strategy searches ====
+    const strategies = buildSearchStrategies(agenda);
+    let combinedHtml = "";
+    let usedStrategy = -1;
+    let candidates = [];
+    for (let i = 0; i < strategies.length; i++) {
+        const pages = await fetchAllSearchPages(strategies[i], baseDir, i + 1, MAX_PAGES);
+        if (pages.length === 0)
+            continue;
+        const combined = pages.join("\n<!-- PAGE SPLIT -->\n");
+        const cs = extractCandidatesFromSearchHtml(combined);
+        if (cs.length) {
+            combinedHtml = combined;
+            candidates = cs.slice(0, MAX_CANDIDATES);
+            usedStrategy = i + 1;
+            break;
+        }
+    }
+    if (usedStrategy === -1 || !candidates.length) {
+        if (!options["silent"])
+            console.log(`[miss] ${agenda.id} no candidates (triedStrategies=${strategies.length})`);
+        return;
+    }
+    // ==== 2) Enrich via data.nvs + scoring; pick best ====
+    let best = null;
+    for (const c of candidates) {
+        const dataUrl = `${SENAT_DATAS_ROOT}/${c.id}_${c.hash}/content/data.nvs`;
+        const buf = await fetchBuffer(dataUrl);
+        if (!buf)
+            continue;
+        const meta = parseDataNvs(buf.toString("utf-8"));
+        const s = score(agenda, agendaTs, c.title ?? meta.title, meta.epoch);
+        if (!best || s > best.score) {
+            best = { id: c.id, hash: c.hash, pageUrl: c.pageUrl, epoch: meta.epoch, vtitle: c.title ?? meta.title, score: s };
+        }
+    }
+    if (!best) {
+        if (!options["silent"])
+            console.log(`[miss] ${agenda.id} candidats without data.nvs`);
+        return;
+    }
+    const accepted = best.score >= MATCH_THRESHOLD;
+    if (accepted)
+        STATS.accepted++;
+    if (!options["silent"]) {
+        console.log(`[pick] ${agenda.id} best id=${best.id} hash=${best.hash} score=${best.score.toFixed(2)} accepted=${accepted} (strategy=${usedStrategy})`);
+    }
+    // ==== 3) Write metadata + NVS of the best candidate (always) ====
+    const bestDt = best?.epoch ? epochToParisDateTime(best.epoch) : null;
+    const metadata = {
+        reunionUid,
+        session,
+        accepted,
+        threshold: MATCH_THRESHOLD,
+        strategy: usedStrategy,
+        agenda: {
+            date: agenda.date,
+            startTime: agenda.startTime,
+            titre: agenda.titre,
+            organe: agenda.organe ?? undefined,
+            id: agenda.id,
+        },
+        best: {
+            id: best.id,
+            hash: best.hash,
+            pageUrl: best.pageUrl,
+            epoch: best.epoch ?? null,
+            date: bestDt?.date ?? null,
+            startTime: bestDt?.startTime ?? null,
+            title: best.vtitle ?? null,
+            score: best.score,
+        },
+    };
+    await writeIfChanged(path.join(baseDir, "metadata.json"), JSON.stringify(metadata, null, 2));
+    const dataUrl = `${SENAT_DATAS_ROOT}/${best.id}_${best.hash}/content/data.nvs`;
+    const finalUrl = `${SENAT_DATAS_ROOT}/${best.id}_${best.hash}/content/finalplayer.nvs`;
+    const dataTxt = await fetchText(dataUrl);
+    const finalTxt = await fetchText(finalUrl);
+    if (dataTxt)
+        await fsp.writeFile(path.join(baseDir, "data.nvs"), dataTxt, "utf-8");
+    if (finalTxt)
+        await fsp.writeFile(path.join(baseDir, "finalplayer.nvs"), finalTxt, "utf-8");
+    let master = null;
+    if (dataTxt)
+        master = buildSenatVodMasterM3u8FromNvs(dataTxt);
+    // ==== 4) Update agenda file (only if accepted + m3u8) ====
+    if (accepted && master) {
+        const agendaJsonPath = path.join(dataDir, AGENDA_FOLDER, DATA_TRANSFORMED_FOLDER, String(session), `${formatYYYYMMDD(agenda.date)}.json`);
+        if (await fs.pathExists(agendaJsonPath)) {
+            const raw = await fsp.readFile(agendaJsonPath, "utf-8");
+            let items;
+            try {
+                items = JSON.parse(raw);
+            }
+            catch (e) {
+                console.warn(`[warn] invalid JSON in ${agendaJsonPath}:`, e?.message);
+                items = null;
+            }
+            if (Array.isArray(items)) {
+                const idx = items.findIndex((e) => String(e?.id) === String(agenda.id));
+                if (idx === -1) {
+                    console.warn(`[warn] agenda id ${agenda.id} not found in ${agendaJsonPath}`);
+                }
+                else {
+                    // add/update urlVideo on the matching item
+                    items[idx] = { ...items[idx], urlVideo: master };
+                    await writeIfChanged(agendaJsonPath, JSON.stringify(items, null, 2));
+                    if (!options["silent"]) {
+                        console.log(`[write] ${agenda.id} urlVideo ← ${master}`);
+                    }
+                }
+            }
+            else {
+                console.warn(`[warn] expected an array in ${agendaJsonPath}, got ${typeof items}`);
+            }
+        }
+        else {
+            console.warn(`[warn] agenda file not found for update: ${agendaJsonPath}`);
+        }
+    }
+}
+async function processAll(dataDir, sessions) {
+    for (const session of sessions) {
+        for (const { item: agendas } of iterLoadSenatAgendas(dataDir, session, {})) {
+            for (const agenda of agendas) {
+                try {
+                    await processAgenda(agenda, session, dataDir);
+                }
+                catch (e) {
+                    console.error(`[error] ${agenda.id}:`, e?.message || e);
+                }
+            }
+        }
+    }
+}
+async function main() {
+    const dataDir = options["dataDir"];
+    assert(dataDir, "Missing argument: data directory");
+    const sessions = getSessionsFromStart(options["fromSession"]);
+    if (!options["silent"])
+        console.time("senat-agendas→videos processing time");
+    await processAll(dataDir, sessions);
+    if (!options["silent"])
+        console.timeEnd("senat-agendas→videos processing time");
+    if (!options["silent"]) {
+        const { total, accepted } = STATS;
+        const ratio = total ? (accepted / total * 100).toFixed(1) : "0.0";
+        console.log(`[summary] accepted=${accepted} / total=${total} (${ratio}%)`);
+    }
+}
+main()
+    .then(() => process.exit(0))
+    .catch((err) => { console.error(err); process.exit(1); });
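
The matching logic added above combines a token-set Dice coefficient on titles with a linear time-proximity score, weighted 0.3 and 0.7, plus an optional 0.15 organ bonus, and accepts a candidate at 0.60 or more. The sketch below restates those pieces in trimmed form to work through the arithmetic. The sample titles are invented, and `combined` is a hypothetical name for the weighted sum computed inside `score`.

```javascript
// Trimmed restatement of the diff's tokenizer: lowercase, strip accents,
// replace punctuation with spaces, split on whitespace.
function tokens(s) {
  return (s ?? "")
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/[^\p{L}\p{N}\s-]/gu, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// Dice coefficient on token sets: 2·|A∩B| / (|A| + |B|).
function dice(a, b) {
  const A = new Set(tokens(a)), B = new Set(tokens(b));
  if (!A.size || !B.size) return 0;
  let inter = 0;
  for (const t of A) if (B.has(t)) inter++;
  return (2 * inter) / (A.size + B.size);
}

// Decays linearly from 1 (same instant) to 0 (3 hours apart or more).
function timeScore(agendaTs, videoEpoch) {
  const deltaMin = Math.abs(videoEpoch - agendaTs) / 60;
  return Math.max(0, 1 - deltaMin / 180);
}

const MATCH_THRESHOLD = 0.6;
// Hypothetical name for the weighted sum used in the diff's score().
function combined(titleScore, tScore, orgBonus = 0) {
  return 0.3 * titleScore + 0.7 * tScore + orgBonus;
}

// Invented example: 4 shared tokens out of 4 and 5, videos 30 minutes apart.
const title = dice("Audition de M. Dupont", "Audition de M. Dupont, ministre");
const time = timeScore(0, 30 * 60);
console.log(title.toFixed(3), time.toFixed(3), combined(title, time).toFixed(2));

// With these weights, time alone can clear the threshold while a perfect
// title match (even with the organ bonus) cannot.
console.log(combined(0, timeScore(0, 25 * 60)) >= MATCH_THRESHOLD);
console.log(combined(1, 0, 0.15) >= MATCH_THRESHOLD);
```

Because the time term carries 0.7 of the weight, any candidate starting within roughly 25 minutes of the agenda slot is accepted regardless of its title, whereas a video more than three hours off is rejected however well the title matches.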

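The month-based offset in `parisOffsetForDate` is an approximation: European summer time runs from the last Sunday of March to the last Sunday of October, so the heuristic is an hour off for the final days of March after the switch and the final days of October after the switch back. That only matters when the agenda's `startTime` carries no zone suffix, but an hour of error costs a third of the time score. A possible exact variant, sketched here with the built-in `Intl` API, is shown for comparison; `parisOffsetAt` and `parisOffsetForDateExact` are hypothetical names, not part of the package.

```javascript
// Exact UTC offset of Europe/Paris at a given instant, read from the
// runtime's time zone database via Intl (no extra dependency).
function parisOffsetAt(instant) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "Europe/Paris",
    timeZoneName: "longOffset", // formats as e.g. "GMT+02:00"
  }).formatToParts(instant);
  const zone = parts.find((p) => p.type === "timeZoneName").value;
  return zone.replace("GMT", "") || "+00:00";
}

// Hypothetical drop-in for a date-only input. Assumption: probing at
// 12:00 UTC is good enough for daytime events, since the clock change
// happens at 01:00 UTC.
function parisOffsetForDateExact(dateYYYYMMDD) {
  return parisOffsetAt(new Date(`${dateYYYYMMDD}T12:00:00Z`));
}

// The month-based heuristic from the diff, for comparison.
function parisOffsetHeuristic(dateYYYYMMDD) {
  const m = Number(dateYYYYMMDD.split("-")[1] || "1");
  return m >= 4 && m <= 10 ? "+02:00" : "+01:00";
}

// 2024 switched on 31 March and 27 October.
for (const d of ["2024-01-15", "2024-03-31", "2024-07-14", "2024-10-30"]) {
  console.log(d, parisOffsetHeuristic(d), parisOffsetForDateExact(d));
}
```

The package already lists `luxon` among its dependencies, which could serve the same purpose; the `Intl` route is used here only to keep the sketch dependency-free.
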
package/lib/types/agenda.d.ts CHANGED

@@ -12,4 +12,6 @@ export interface AgendaEvent {
     captationVideo: boolean;
     urlDossierSenat: string | null;
     quantieme: string | null;
+    urlVideo: string | null;
+    timecodeDebutVideo: number | null;
 }
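
Both new fields are nullable: `transformAgenda` initialises them to `null`, and in this diff only `urlVideo` is later filled in by `retrieve_videos.js`, and only when a match is accepted. Consumers therefore need to handle the empty case. A minimal sketch of one way to do so follows; `splitByVideoStatus` and the sample events (including the `example.invalid` URL) are hypothetical.

```javascript
// Hypothetical events shaped like AgendaEvent (only the fields used here).
const events = [
  {
    id: "a1",
    captationVideo: true,
    urlVideo: "https://example.invalid/a1.smil/playlist.m3u8",
    timecodeDebutVideo: null,
  },
  { id: "a2", captationVideo: true, urlVideo: null, timecodeDebutVideo: null },
  { id: "a3", captationVideo: false }, // file written before the fields existed
];

// Partition events into: video matched, recording expected but not yet
// matched, and no recording at all.
function splitByVideoStatus(list) {
  const matched = [], pending = [], none = [];
  for (const e of list) {
    // `!= null` also covers `undefined`, i.e. JSON files produced by an
    // older version that lack the field entirely.
    if (e.urlVideo != null) matched.push(e);
    else if (e.captationVideo) pending.push(e);
    else none.push(e);
  }
  return { matched, pending, none };
}

const status = splitByVideoStatus(events);
console.log(status.matched.map((e) => e.id), status.pending.map((e) => e.id));
```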

package/lib/validators/senat.d.ts ADDED

File without changes

package/lib/validators/senat.js ADDED

@@ -0,0 +1,24 @@
+"use strict";
+// import { validateNonEmptyTrimmedString } from "@biryani/core"
+// const acteurUidRegExp = /^PA\d+$/
+// const organeUidRegExp = /^PO\d+$/
+// export function validateSenateurUid(input: any): [any, any] {
+//   const [value, error] = validateNonEmptyTrimmedString(input)
+//   if (error !== null) {
+//     return [value, error]
+//   }
+//   if (!acteurUidRegExp.test(value)) {
+//     return [value, 'Invalid "acteur" unique ID']
+//   }
+//   return [value, null]
+// }
+// export function validateOrganeUid(input: any): [any, any] {
+//   const [value, error] = validateNonEmptyTrimmedString(input)
+//   if (error !== null) {
+//     return [value, error]
+//   }
+//   if (!organeUidRegExp.test(value)) {
+//     return [value, 'Invalid "organe" unique ID']
+//   }
+//   return [value, null]
+// }

package/package.json CHANGED

@@ -1,94 +1,95 @@
-{
-  "name": "@tricoteuses/senat",
-  "version": "2.9.1",
-  "description": "Handle French Sénat's open data",
-  "keywords": [
-    "France",
-    "open data",
-    "Parliament",
-    "Sénat"
-  ],
-  "author": "Emmanuel Raviart <emmanuel@raviart.com>",
-  "bugs": {
-    "url": "https://git.tricoteuses.fr/logiciels/tricoteuses-senat/issues"
-  },
-  "homepage": "https://tricoteuses.fr/",
-  "license": "AGPL-3.0-or-later",
-  "repository": {
-    "type": "git",
-    "url": "https://git.tricoteuses.fr/logiciels/tricoteuses-senat.git"
-  },
-  "type": "module",
-  "engines": {
-    "node": ">=22"
-  },
-  "files": [
-    "lib"
-  ],
-  "exports": {
-    ".": {
-      "import": "./lib/index.js",
-      "types": "./lib/index.d.ts"
-    },
-    "./loaders": {
-      "import": "./lib/loaders.js",
-      "types": "./lib/loaders.d.ts"
-    },
-    "./package.json": "./package.json"
-  },
-  "publishConfig": {
-    "access": "public"
-  },
-  "scripts": {
-    "build": "tsc",
-    "build:types": "tsc --emitDeclarationOnly",
-    "data:convert_data": "tsx src/scripts/convert_data.ts",
-    "data:download": "tsx src/scripts/data-download.ts",
-    "data:generate_schemas": "tsx src/scripts/retrieve_open_data.ts --schema",
-    "data:retrieve_agenda": "cross-env TZ='Etc/UTC' tsx src/scripts/retrieve_agenda.ts",
-    "data:retrieve_comptes_rendus": "tsx src/scripts/retrieve_comptes_rendus.ts",
-    "data:retrieve_documents": "tsx src/scripts/retrieve_documents.ts",
-    "data:retrieve_open_data": "tsx src/scripts/retrieve_open_data.ts --all",
-    "data:retrieve_senateurs_photos": "tsx src/scripts/retrieve_senateurs_photos.ts --fetch",
-    "data:parse_textes_lois": "tsx src/scripts/parse_textes.ts",
-    "prepare": "npm run build",
-    "prepublishOnly": "npm run build",
-    "prettier": "prettier --write 'src/**/*.ts' 'tests/**/*.test.ts'",
-    "type-check": "tsc --noEmit",
-    "type-check:watch": "npm run type-check -- --watch"
-  },
-  "dependencies": {
-    "@biryani/core": "^0.2.1",
-    "command-line-args": "^5.1.1",
-    "dotenv": "^8.2.0",
-    "fs-extra": "^9.1.0",
-    "jsdom": "^26.0.0",
-    "kysely": "^0.27.4",
-    "luxon": "^3.5.0",
-    "node-stream-zip": "^1.8.2",
-    "pg": "^8.13.1",
-    "pg-cursor": "^2.12.1",
-    "slug": "^11.0.0",
-    "tsx": "^4.19.4",
-    "windows-1252": "^1.0.0"
-  },
-  "devDependencies": {
-    "@typed-code/schemats": "^5.0.1",
-    "@types/command-line-args": "^5.0.0",
-    "@types/fs-extra": "^9.0.7",
-    "@types/jsdom": "^21.1.7",
-    "@types/luxon": "^3.4.2",
-    "@types/node": "^20.17.6",
-    "@types/pg": "^8.11.10",
-    "@types/pg-cursor": "^2.7.2",
-    "@types/slug": "^5.0.9",
-    "@typescript-eslint/eslint-plugin": "^8.13.0",
-    "@typescript-eslint/parser": "^8.13.0",
-    "cross-env": "^10.0.0",
-    "eslint": "^8.57.1",
-    "kysely-codegen": "^0.18.0",
-    "prettier": "^3.5.3",
-    "tslib": "^2.1.0",
-    "typescript": "^5.8.3"
-  }
-}
+{
+  "name": "@tricoteuses/senat",
+  "version": "2.9.5",
+  "description": "Handle French Sénat's open data",
+  "keywords": [
+    "France",
+    "open data",
+    "Parliament",
+    "Sénat"
+  ],
+  "author": "Emmanuel Raviart <emmanuel@raviart.com>",
+  "bugs": {
+    "url": "https://git.tricoteuses.fr/logiciels/tricoteuses-senat/issues"
+  },
+  "homepage": "https://tricoteuses.fr/",
+  "license": "AGPL-3.0-or-later",
+  "repository": {
+    "type": "git",
+    "url": "https://git.tricoteuses.fr/logiciels/tricoteuses-senat.git"
+  },
+  "type": "module",
+  "engines": {
+    "node": ">=22"
+  },
+  "files": [
+    "lib"
+  ],
+  "exports": {
+    ".": {
+      "import": "./lib/index.js",
+      "types": "./lib/index.d.ts"
+    },
+    "./loaders": {
+      "import": "./lib/loaders.js",
+      "types": "./lib/loaders.d.ts"
+    },
+    "./package.json": "./package.json"
+  },
+  "publishConfig": {
+    "access": "public"
+  },
+  "scripts": {
+    "build": "tsc",
+    "build:types": "tsc --emitDeclarationOnly",
+    "data:convert_data": "tsx src/scripts/convert_data.ts",
+    "data:download": "tsx src/scripts/data-download.ts",
+    "data:generate_schemas": "tsx src/scripts/retrieve_open_data.ts --schema",
+    "data:retrieve_agenda": "cross-env TZ='Etc/UTC' tsx src/scripts/retrieve_agenda.ts",
+    "data:retrieve_comptes_rendus": "tsx src/scripts/retrieve_comptes_rendus.ts",
+    "data:retrieve_documents": "tsx src/scripts/retrieve_documents.ts",
+    "data:retrieve_open_data": "tsx src/scripts/retrieve_open_data.ts --all",
+    "data:retrieve_senateurs_photos": "tsx src/scripts/retrieve_senateurs_photos.ts --fetch",
+    "data:retrieve_videos": "tsx src/scripts/retrieve_videos.ts",
+    "data:parse_textes_lois": "tsx src/scripts/parse_textes.ts",
+    "prepare": "npm run build",
+    "prepublishOnly": "npm run build",
+    "prettier": "prettier --write 'src/**/*.ts' 'tests/**/*.test.ts'",
+    "type-check": "tsc --noEmit",
+    "type-check:watch": "npm run type-check -- --watch"
+  },
+  "dependencies": {
+    "@biryani/core": "^0.2.1",
+    "command-line-args": "^5.1.1",
+    "dotenv": "^8.2.0",
+    "fs-extra": "^9.1.0",
+    "jsdom": "^26.0.0",
+    "kysely": "^0.27.4",
+    "luxon": "^3.5.0",
+    "node-stream-zip": "^1.8.2",
+    "pg": "^8.13.1",
+    "pg-cursor": "^2.12.1",
+    "slug": "^11.0.0",
+    "tsx": "^4.19.4",
+    "windows-1252": "^1.0.0"
+  },
+  "devDependencies": {
+    "@typed-code/schemats": "^5.0.1",
+    "@types/command-line-args": "^5.0.0",
+    "@types/fs-extra": "^9.0.7",
+    "@types/jsdom": "^21.1.7",
+    "@types/luxon": "^3.4.2",
+    "@types/node": "^20.17.6",
+    "@types/pg": "^8.11.10",
+    "@types/pg-cursor": "^2.7.2",
+    "@types/slug": "^5.0.9",
+    "@typescript-eslint/eslint-plugin": "^8.13.0",
+    "@typescript-eslint/parser": "^8.13.0",
+    "cross-env": "^10.0.0",
+    "eslint": "^8.57.1",
+    "kysely-codegen": "^0.18.0",
+    "prettier": "^3.5.3",
+    "tslib": "^2.1.0",
+    "typescript": "^5.8.3"
+  }
+}