folio-data-import 0.2.8rc6__tar.gz → 0.2.8rc8__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release. This version of folio-data-import might be problematic.

@@ -1,6 +1,6 @@
  Metadata-Version: 2.3
  Name: folio_data_import
- Version: 0.2.8rc6
+ Version: 0.2.8rc8
  Summary: A python module to interact with the data importing capabilities of the open-source FOLIO ILS
  License: MIT
  Author: Brooks Travis
@@ -108,11 +108,11 @@ Unlike mod-user-import, this importer does not require `externalSystemId` as the

  #### Preferred Contact Type Mapping

- Another point of departure from the behavior of `mod-user-import` is the handling of `preferredContactTypeId`. This importer will accept either the `"001", "002", "003"...` values stored by the FOLIO, or the human-friendly strings used by `mod-user-import` (`"mail", "email", "text", "phone", "mobile"`). It will also __*set a customizable default for all users that do not otherwise have a valid value specified*__ (using `--default_preferred_contact_type`), unless a (valid) value is already present in the user record being updated.
+ Another point of departure from the behavior of `mod-user-import` is the handling of `preferredContactTypeId`. This importer will accept either the `"001", "002", "003"...` values stored by FOLIO, or the human-friendly strings used by `mod-user-import` (`"mail", "email", "text", "phone", "mobile"`). It will also __*set a customizable default for all users that do not otherwise have a valid value specified*__ (using `--default_preferred_contact_type`), unless a (valid) value is already present in the user record being updated.

  #### Field Protection (*experimental*)

- This script offers a rudimentary field protection implementation using custom fields. To enable this functionality, create a text custom field that has the field name `protectedFields`. In this field, you ca specify a comma-separated list of User schema field names, using dot-notation for nested fields. This protection should support all standard fields except addresses within `personal.addresses`. If you include `personal.addresses` in a user record, any existing addresses will be replaced by the new values.
+ This script offers a rudimentary field protection implementation using custom fields. To enable this functionality, create a text custom field that has the field name `protectedFields`. In this field, you can specify a comma-separated list of User schema field names, using dot-notation for nested fields. This protection should support all standard fields except addresses within `personal.addresses`. If you include `personal.addresses` in a user record, any existing addresses will be replaced by the new values.

  ##### Example

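For illustration (not from the package): a minimal Python sketch of the contact-type normalization the paragraph above describes. The numeric code mapping below is an assumption based on the conventional FOLIO ordering of contact types; the importer's actual mapping and defaulting logic live in the package itself.

# Hypothetical sketch: accept either a numeric code or a human-friendly name,
# falling back to a default, as the README text above describes.
# The code-to-name mapping is assumed, not taken from the package.
ASSUMED_CONTACT_TYPE_CODES = {
    "mail": "001",
    "email": "002",
    "text": "003",
    "phone": "004",
    "mobile": "005",
}

def normalize_preferred_contact_type(value, default="email"):
    """Return a numeric preferredContactTypeId, applying a default when needed."""
    if value in ASSUMED_CONTACT_TYPE_CODES.values():
        return value
    if value in ASSUMED_CONTACT_TYPE_CODES:
        return ASSUMED_CONTACT_TYPE_CODES[value]
    # Plays the role of --default_preferred_contact_type for records with no valid value.
    return ASSUMED_CONTACT_TYPE_CODES[default]

print(normalize_preferred_contact_type("phone"))  # "004" under the assumed mapping
print(normalize_preferred_contact_type(None))     # falls back to the default ("002" here)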
 
@@ -78,11 +78,11 @@ Unlike mod-user-import, this importer does not require `externalSystemId` as the

  #### Preferred Contact Type Mapping

- Another point of departure from the behavior of `mod-user-import` is the handling of `preferredContactTypeId`. This importer will accept either the `"001", "002", "003"...` values stored by the FOLIO, or the human-friendly strings used by `mod-user-import` (`"mail", "email", "text", "phone", "mobile"`). It will also __*set a customizable default for all users that do not otherwise have a valid value specified*__ (using `--default_preferred_contact_type`), unless a (valid) value is already present in the user record being updated.
+ Another point of departure from the behavior of `mod-user-import` is the handling of `preferredContactTypeId`. This importer will accept either the `"001", "002", "003"...` values stored by FOLIO, or the human-friendly strings used by `mod-user-import` (`"mail", "email", "text", "phone", "mobile"`). It will also __*set a customizable default for all users that do not otherwise have a valid value specified*__ (using `--default_preferred_contact_type`), unless a (valid) value is already present in the user record being updated.

  #### Field Protection (*experimental*)

- This script offers a rudimentary field protection implementation using custom fields. To enable this functionality, create a text custom field that has the field name `protectedFields`. In this field, you ca specify a comma-separated list of User schema field names, using dot-notation for nested fields. This protection should support all standard fields except addresses within `personal.addresses`. If you include `personal.addresses` in a user record, any existing addresses will be replaced by the new values.
+ This script offers a rudimentary field protection implementation using custom fields. To enable this functionality, create a text custom field that has the field name `protectedFields`. In this field, you can specify a comma-separated list of User schema field names, using dot-notation for nested fields. This protection should support all standard fields except addresses within `personal.addresses`. If you include `personal.addresses` in a user record, any existing addresses will be replaced by the new values.

  ##### Example

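For illustration (not from the package): a small Python sketch of the `protectedFields` convention described above. The field names in the example value are arbitrary dot-notation User schema paths, and, per the caveat above, `personal.addresses` is not protected this way.

# Hypothetical user record carrying the protectedFields text custom field.
user = {
    "username": "jdoe",
    "personal": {"email": "jdoe@example.edu"},
    "customFields": {"protectedFields": "personal.email,barcode"},
}

# Split the comma-separated, dot-notation list into individual field paths.
protected = [p.strip() for p in user["customFields"]["protectedFields"].split(",") if p.strip()]
print(protected)  # ['personal.email', 'barcode'] -> values kept from the existing FOLIO record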
 
@@ -1,6 +1,6 @@
  [tool.poetry]
  name = "folio_data_import"
- version = "0.2.8rc6"
+ version = "0.2.8rc8"
  description = "A python module to interact with the data importing capabilities of the open-source FOLIO ILS"
  authors = ["Brooks Travis <brooks.travis@gmail.com>"]
  license = "MIT"
@@ -1,18 +1,21 @@
  import argparse
  import asyncio
+ import datetime
+ from email import message
  import glob
  import importlib
  import io
+ import logging
  import os
  import sys
- from typing import List
  import uuid
  from contextlib import ExitStack
- import datetime
  from datetime import datetime as dt
+ from functools import cached_property
  from getpass import getpass
  from pathlib import Path
  from time import sleep
+ from typing import List

  import folioclient
  import httpx
@@ -22,7 +25,6 @@ import tabulate
  from humps import decamelize
  from tqdm import tqdm

-
  try:
  datetime_utc = datetime.UTC
  except AttributeError:
@@ -36,6 +38,18 @@ REPORT_SUMMARY_ORDERING = {"created": 0, "updated": 1, "discarded": 2, "error":
  RETRY_TIMEOUT_START = 1
  RETRY_TIMEOUT_RETRY_FACTOR = 2

+ # Custom log level for data issues, set to 26
+ DATA_ISSUE_LVL_NUM = 26
+ logging.addLevelName(DATA_ISSUE_LVL_NUM, "DATA_ISSUES")
+
+ def data_issues(self, msg, *args, **kws):
+ if self.isEnabledFor(DATA_ISSUE_LVL_NUM):
+ self._log(DATA_ISSUE_LVL_NUM, msg, args, **kws)
+
+ logging.Logger.data_issues = data_issues
+
+ logger = logging.getLogger(__name__)
+
  class MARCImportJob:
  """
  Class to manage importing MARC data (Bib, Authority) into FOLIO using the Change Manager
@@ -56,7 +70,6 @@ class MARCImportJob:
  bad_records_file: io.TextIOWrapper
  failed_batches_file: io.TextIOWrapper
  job_id: str
- job_import_profile: dict
  pbar_sent: tqdm
  pbar_imported: tqdm
  http_client: httpx.Client
@@ -77,9 +90,11 @@ class MARCImportJob:
  marc_record_preprocessor=None,
  consolidate=False,
  no_progress=False,
+ let_summary_fail=False,
  ) -> None:
  self.consolidate_files = consolidate
  self.no_progress = no_progress
+ self.let_summary_fail = let_summary_fail
  self.folio_client: folioclient.FolioClient = folio_client
  self.import_files = marc_files
  self.import_profile_name = import_profile_name
@@ -87,6 +102,8 @@ class MARCImportJob:
  self.batch_delay = batch_delay
  self.current_retry_timeout = None
  self.marc_record_preprocessor = marc_record_preprocessor
+ self.pbar_sent: tqdm
+ self.pbar_imported: tqdm

  async def do_work(self) -> None:
  """
@@ -100,21 +117,25 @@ class MARCImportJob:
  Returns:
  None
  """
- with httpx.Client() as http_client, open(
- self.import_files[0].parent.joinpath(
- f"bad_marc_records_{dt.now(tz=datetime_utc).strftime('%Y%m%d%H%M%S')}.mrc"
- ),
- "wb+",
- ) as bad_marc_file, open(
- self.import_files[0].parent.joinpath(
- f"failed_batches_{dt.now(tz=datetime_utc).strftime('%Y%m%d%H%M%S')}.mrc"
- ),
- "wb+",
- ) as failed_batches:
+ with (
+ httpx.Client() as http_client,
+ open(
+ self.import_files[0].parent.joinpath(
+ f"bad_marc_records_{dt.now(tz=datetime_utc).strftime('%Y%m%d%H%M%S')}.mrc"
+ ),
+ "wb+",
+ ) as bad_marc_file,
+ open(
+ self.import_files[0].parent.joinpath(
+ f"failed_batches_{dt.now(tz=datetime_utc).strftime('%Y%m%d%H%M%S')}.mrc"
+ ),
+ "wb+",
+ ) as failed_batches,
+ ):
  self.bad_records_file = bad_marc_file
- print(f"Writing bad records to {self.bad_records_file.name}")
+ logger.info(f"Writing bad records to {self.bad_records_file.name}")
  self.failed_batches_file = failed_batches
- print(f"Writing failed batches to {self.failed_batches_file.name}")
+ logger.info(f"Writing failed batches to {self.failed_batches_file.name}")
  self.http_client = http_client
  if self.consolidate_files:
  self.current_file = self.import_files
@@ -135,16 +156,16 @@ class MARCImportJob:
  Returns:
  None
  """
- self.bad_records_file.seek(0)
- if not self.bad_records_file.read(1):
- os.remove(self.bad_records_file.name)
- print("No bad records found. Removing bad records file.")
- self.failed_batches_file.seek(0)
- if not self.failed_batches_file.read(1):
- os.remove(self.failed_batches_file.name)
- print("No failed batches. Removing failed batches file.")
- print("Import complete.")
- print(f"Total records imported: {self.total_records_sent}")
+ with open(self.bad_records_file.name, "rb") as bad_records:
+ if not bad_records.read(1):
+ os.remove(bad_records.name)
+ logger.info("No bad records found. Removing bad records file.")
+ with open(self.failed_batches_file.name, "rb") as failed_batches:
+ if not failed_batches.read(1):
+ os.remove(failed_batches.name)
+ logger.info("No failed batches. Removing failed batches file.")
+ logger.info("Import complete.")
+ logger.info(f"Total records imported: {self.total_records_sent}")

  async def get_job_status(self) -> None:
  """
@@ -158,21 +179,28 @@
  """
  try:
  self.current_retry_timeout = (
- self.current_retry_timeout * RETRY_TIMEOUT_RETRY_FACTOR
- ) if self.current_retry_timeout else RETRY_TIMEOUT_START
+ (self.current_retry_timeout * RETRY_TIMEOUT_RETRY_FACTOR)
+ if self.current_retry_timeout
+ else RETRY_TIMEOUT_START
+ )
  job_status = self.folio_client.folio_get(
  "/metadata-provider/jobExecutions?statusNot=DISCARDED&uiStatusAny"
  "=PREPARING_FOR_PREVIEW&uiStatusAny=READY_FOR_PREVIEW&uiStatusAny=RUNNING&limit=50"
  )
  self.current_retry_timeout = None
- except (httpx.ConnectTimeout, httpx.ReadTimeout):
- sleep(.25)
- with httpx.Client(
- timeout=self.current_retry_timeout,
- verify=self.folio_client.ssl_verify
- ) as temp_client:
- self.folio_client.httpx_client = temp_client
- return await self.get_job_status()
+ except (httpx.ConnectTimeout, httpx.ReadTimeout, httpx.HTTPStatusError) as e:
+ if not hasattr(e, "response") or e.response.status_code in [502, 504]:
+ error_text = e.response.text if hasattr(e, "response") else str(e)
+ logger.warning(f"SERVER ERROR fetching job status: {error_text}. Retrying.")
+ sleep(0.25)
+ with httpx.Client(
+ timeout=self.current_retry_timeout,
+ verify=self.folio_client.ssl_verify,
+ ) as temp_client:
+ self.folio_client.httpx_client = temp_client
+ return await self.get_job_status()
+ else:
+ raise e
  try:
  status = [
  job for job in job_status["jobExecutions"] if job["id"] == self.job_id
@@ -180,16 +208,32 @@
  self.pbar_imported.update(status["progress"]["current"] - self.last_current)
  self.last_current = status["progress"]["current"]
  except IndexError:
- job_status = self.folio_client.folio_get(
- "/metadata-provider/jobExecutions?limit=100&sortBy=completed_date%2Cdesc&statusAny"
- "=COMMITTED&statusAny=ERROR&statusAny=CANCELLED"
- )
- status = [
- job for job in job_status["jobExecutions"] if job["id"] == self.job_id
- ][0]
- self.pbar_imported.update(status["progress"]["current"] - self.last_current)
- self.last_current = status["progress"]["current"]
- self.finished = True
+ try:
+ job_status = self.folio_client.folio_get(
+ "/metadata-provider/jobExecutions?limit=100&sortBy=completed_date%2Cdesc&statusAny"
+ "=COMMITTED&statusAny=ERROR&statusAny=CANCELLED"
+ )
+ status = [
+ job for job in job_status["jobExecutions"] if job["id"] == self.job_id
+ ][0]
+ self.pbar_imported.update(status["progress"]["current"] - self.last_current)
+ self.last_current = status["progress"]["current"]
+ self.finished = True
+ except (httpx.ConnectTimeout, httpx.ReadTimeout, httpx.HTTPStatusError) as e:
+ if not hasattr(e, "response") or e.response.status_code in [502, 504]:
+ error_text = e.response.text if hasattr(e, "response") else str(e)
+ logger.warning(
+ f"SERVER ERROR fetching job status: {error_text}. Retrying."
+ )
+ sleep(0.25)
+ with httpx.Client(
+ timeout=self.current_retry_timeout,
+ verify=self.folio_client.ssl_verify,
+ ) as temp_client:
+ self.folio_client.httpx_client = temp_client
+ return await self.get_job_status()
+ else:
+ raise e

  async def create_folio_import_job(self) -> None:
  """
@@ -209,7 +253,7 @@ class MARCImportJob:
  try:
  create_job.raise_for_status()
  except httpx.HTTPError as e:
- print(
+ logger.error(
  "Error creating job: "
  + str(e)
  + "\n"
@@ -217,10 +261,15 @@
  )
  raise e
  self.job_id = create_job.json()["parentJobExecutionId"]
+ logger.info("Created job: " + self.job_id)

- async def get_import_profile(self) -> None:
+ @cached_property
+ def import_profile(self) -> dict:
  """
- Retrieves the import profile with the specified name.
+ Returns the import profile for the current job execution.
+
+ Returns:
+ dict: The import profile for the current job execution.
  """
  import_profiles = self.folio_client.folio_get(
  "/data-import-profiles/jobProfiles",
@@ -232,7 +281,7 @@ class MARCImportJob:
  for profile in import_profiles
  if profile["name"] == self.import_profile_name
  ][0]
- self.job_import_profile = profile
+ return profile

  async def set_job_profile(self) -> None:
  """
@@ -248,15 +297,15 @@ class MARCImportJob:
  + "/jobProfile",
  headers=self.folio_client.okapi_headers,
  json={
- "id": self.job_import_profile["id"],
- "name": self.job_import_profile["name"],
+ "id": self.import_profile["id"],
+ "name": self.import_profile["name"],
  "dataType": "MARC",
  },
  )
  try:
  set_job_profile.raise_for_status()
  except httpx.HTTPError as e:
- print(
+ logger.error(
  "Error creating job: "
  + str(e)
  + "\n"
@@ -298,8 +347,13 @@ class MARCImportJob:
  headers=self.folio_client.okapi_headers,
  json=batch_payload,
  )
- except httpx.ReadTimeout:
- sleep(.25)
+ # if batch_payload["recordsMetadata"]["last"]:
+ # logger.log(
+ # 25,
+ # f"Sending last batch of {batch_payload['recordsMetadata']['total']} records.",
+ # )
+ except (httpx.ConnectTimeout, httpx.ReadTimeout):
+ sleep(0.25)
  return await self.process_record_batch(batch_payload)
  try:
  post_batch.raise_for_status()
@@ -307,12 +361,14 @@
  self.record_batch = []
  self.pbar_sent.update(len(batch_payload["initialRecords"]))
  except Exception as e:
- if hasattr(e, "response") and e.response.status_code in [500, 422]: # TODO: #26 Check for specific error code once https://folio-org.atlassian.net/browse/MODSOURMAN-1281 is resolved
+ if (
+ hasattr(e, "response") and e.response.status_code in [500, 422]
+ ): # TODO: #26 Check for specific error code once https://folio-org.atlassian.net/browse/MODSOURMAN-1281 is resolved
  self.total_records_sent += len(self.record_batch)
  self.record_batch = []
  self.pbar_sent.update(len(batch_payload["initialRecords"]))
  else:
- print("Error posting batch: " + str(e))
+ logger.error("Error posting batch: " + str(e))
  for record in self.record_batch:
  self.failed_batches_file.write(record)
  self.error_records += len(self.record_batch)
@@ -334,14 +390,20 @@
  """
  counter = 0
  for import_file in files:
+ file_path = Path(import_file.name)
  self.pbar_sent.set_description(
  f"Sent ({os.path.basename(import_file.name)}): "
  )
  reader = pymarc.MARCReader(import_file, hide_utf8_warnings=True)
- for record in reader:
+ for idx, record in enumerate(reader, start=1):
  if len(self.record_batch) == self.batch_size:
  await self.process_record_batch(
- await self.create_batch_payload(counter, total_records, False),
+ await self.create_batch_payload(
+ counter,
+ total_records,
+ (counter - self.error_records)
+ == (total_records - self.error_records),
+ ),
  )
  await self.get_job_status()
  sleep(0.25)
@@ -353,14 +415,35 @@
  self.record_batch.append(record.as_marc())
  counter += 1
  else:
+ logger.data_issues(
+ "RECORD FAILED\t%s\t%s\t%s",
+ f"{file_path.name}:{idx}",
+ f"Error reading {idx} record from {file_path}. Skipping. Writing current chunk to {self.bad_records_file.name}.",
+ "",
+ )
  self.bad_records_file.write(reader.current_chunk)
  if self.record_batch:
  await self.process_record_batch(
- await self.create_batch_payload(counter, total_records, True),
+ await self.create_batch_payload(
+ counter,
+ total_records,
+ (counter - self.error_records)
+ == (total_records - self.error_records),
+ ),
  )
+ import_complete_path = file_path.parent.joinpath("import_complete")
+ if import_complete_path.exists():
+ logger.debug(f"Creating import_complete directory: {import_complete_path.absolute()}")
+ import_complete_path.mkdir(exist_ok=True)
+ logger.debug(f"Moving {file_path} to {import_complete_path.absolute()}")
+ file_path.rename(
+ file_path.parent.joinpath("import_complete", file_path.name)
+ )

  @staticmethod
- async def apply_marc_record_preprocessing(record: pymarc.Record, func_or_path) -> pymarc.Record:
+ async def apply_marc_record_preprocessing(
+ record: pymarc.Record, func_or_path
+ ) -> pymarc.Record:
  """
  Apply preprocessing to the MARC record before sending it to FOLIO.

@@ -373,23 +456,29 @@
  """
  if isinstance(func_or_path, str):
  try:
- path_parts = func_or_path.rsplit('.')
+ path_parts = func_or_path.rsplit(".")
  module_path, func_name = ".".join(path_parts[:-1]), path_parts[-1]
  module = importlib.import_module(module_path)
  func = getattr(module, func_name)
  except (ImportError, AttributeError) as e:
- print(f"Error importing preprocessing function {func_or_path}: {e}. Skipping preprocessing.")
+ logger.error(
+ f"Error importing preprocessing function {func_or_path}: {e}. Skipping preprocessing."
+ )
  return record
  elif callable(func_or_path):
  func = func_or_path
  else:
- print(f"Invalid preprocessing function: {func_or_path}. Skipping preprocessing.")
+ logger.warning(
+ f"Invalid preprocessing function: {func_or_path}. Skipping preprocessing."
+ )
  return record

  try:
  return func(record)
  except Exception as e:
- print(f"Error applying preprocessing function: {e}. Skipping preprocessing.")
+ logger.error(
+ f"Error applying preprocessing function: {e}. Skipping preprocessing."
+ )
  return record

  async def create_batch_payload(self, counter, total_records, is_last) -> dict:
@@ -434,24 +523,26 @@
  None
  """
  await self.create_folio_import_job()
- await self.get_import_profile()
  await self.set_job_profile()
  with ExitStack() as stack:
  files = [
  stack.enter_context(open(file, "rb")) for file in self.current_file
  ]
  total_records = await self.read_total_records(files)
- with tqdm(
- desc="Imported: ",
- total=total_records,
- position=1,
- disable=self.no_progress,
- ) as pbar_imported, tqdm(
- desc="Sent: ()",
- total=total_records,
- position=0,
- disable=self.no_progress,
- ) as pbar_sent:
+ with (
+ tqdm(
+ desc="Imported: ",
+ total=total_records,
+ position=1,
+ disable=self.no_progress,
+ ) as pbar_imported,
+ tqdm(
+ desc="Sent: ()",
+ total=total_records,
+ position=0,
+ disable=self.no_progress,
+ ) as pbar_sent,
+ ):
  self.pbar_sent = pbar_sent
  self.pbar_imported = pbar_imported
  await self.process_records(files, total_records)
@@ -459,34 +550,39 @@
  await self.get_job_status()
  sleep(1)
  if self.finished:
- job_summary = await self.get_job_summary()
- job_summary.pop("jobExecutionId")
- job_summary.pop("totalErrors")
- columns = ["Summary"] + list(job_summary.keys())
- rows = set()
- for key in columns[1:]:
- rows.update(job_summary[key].keys())
-
- table_data = []
- for row in rows:
- metric_name = decamelize(row).split("_")[1]
- table_row = [metric_name]
- for col in columns[1:]:
- table_row.append(job_summary[col].get(row, "N/A"))
- table_data.append(table_row)
- table_data.sort(key=lambda x: REPORT_SUMMARY_ORDERING.get(x[0], 99))
- columns = columns[:1] + [
- " ".join(decamelize(x).split("_")[:-1]) for x in columns[1:]
- ]
- print(
- f"Results for {'file' if len(self.current_file) == 1 else 'files'}: "
- f"{', '.join([os.path.basename(x.name) for x in self.current_file])}"
- )
- print(
- tabulate.tabulate(
- table_data, headers=columns, tablefmt="fancy_grid"
- ),
- )
+ if job_summary := await self.get_job_summary():
+ job_id = job_summary.pop("jobExecutionId", None)
+ total_errors = job_summary.pop("totalErrors", 0)
+ columns = ["Summary"] + list(job_summary.keys())
+ rows = set()
+ for key in columns[1:]:
+ rows.update(job_summary[key].keys())
+
+ table_data = []
+ for row in rows:
+ metric_name = decamelize(row).split("_")[1]
+ table_row = [metric_name]
+ for col in columns[1:]:
+ table_row.append(job_summary[col].get(row, "N/A"))
+ table_data.append(table_row)
+ table_data.sort(key=lambda x: REPORT_SUMMARY_ORDERING.get(x[0], 99))
+ columns = columns[:1] + [
+ " ".join(decamelize(x).split("_")[:-1]) for x in columns[1:]
+ ]
+ logger.info(
+ f"Results for {'file' if len(self.current_file) == 1 else 'files'}: "
+ f"{', '.join([os.path.basename(x.name) for x in self.current_file])}"
+ )
+ logger.info(
+ "\n"
+ + tabulate.tabulate(
+ table_data, headers=columns, tablefmt="fancy_grid"
+ ),
+ )
+ if total_errors:
+ logger.info(f"Total errors: {total_errors}. Job ID: {job_id}.")
+ else:
+ logger.error(f"No job summary available for job {self.job_id}.")
  self.last_current = 0
  self.finished = False

@@ -499,26 +595,86 @@
  """
  try:
  self.current_retry_timeout = (
- self.current_retry_timeout * RETRY_TIMEOUT_RETRY_FACTOR
- ) if self.current_retry_timeout else RETRY_TIMEOUT_START
- job_summary = self.folio_client.folio_get(
- f"/metadata-provider/jobSummary/{self.job_id}"
+ (self.current_retry_timeout * RETRY_TIMEOUT_RETRY_FACTOR)
+ if self.current_retry_timeout
+ else RETRY_TIMEOUT_START
  )
+ with httpx.Client(
+ timeout=self.current_retry_timeout, verify=self.folio_client.ssl_verify
+ ) as temp_client:
+ self.folio_client.httpx_client = temp_client
+ job_summary = self.folio_client.folio_get(
+ f"/metadata-provider/jobSummary/{self.job_id}"
+ )
  self.current_retry_timeout = None
  except (httpx.ConnectTimeout, httpx.ReadTimeout, httpx.HTTPStatusError) as e:
- if not hasattr(e, "response") or e.response.status_code == 502:
- sleep(.25)
+ error_text = e.response.text if hasattr(e, "response") else str(e)
+ if not hasattr(e, "response") or (
+ e.response.status_code in [502, 504] and not self.let_summary_fail
+ ):
+ logger.warning(f"SERVER ERROR fetching job summary: {e}. Retrying.")
+ sleep(0.25)
  with httpx.Client(
  timeout=self.current_retry_timeout,
- verify=self.folio_client.ssl_verify
+ verify=self.folio_client.ssl_verify,
  ) as temp_client:
  self.folio_client.httpx_client = temp_client
- return await self.get_job_status()
+ return await self.get_job_summary()
+ elif hasattr(e, "response") and (
+ e.response.status_code in [502, 504] and self.let_summary_fail
+ ):
+ logger.warning(
+ f"SERVER ERROR fetching job summary: {error_text}. Skipping final summary check."
+ )
+ job_summary = {}
  else:
  raise e
  return job_summary


+ def set_up_cli_logging():
+ """
+ This function sets up logging for the CLI.
+ """
+ logger.setLevel(logging.INFO)
+ logger.propagate = False
+
+ # Set up file and stream handlers
+ file_handler = logging.FileHandler(
+ "folio_data_import_{}.log".format(dt.now().strftime("%Y%m%d%H%M%S"))
+ )
+ file_handler.setLevel(logging.INFO)
+ file_handler.addFilter(ExcludeLevelFilter(DATA_ISSUE_LVL_NUM))
+ # file_handler.addFilter(IncludeLevelFilter(25))
+ file_formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
+ file_handler.setFormatter(file_formatter)
+ logger.addHandler(file_handler)
+
+ if not any(
+ isinstance(h, logging.StreamHandler) and h.stream == sys.stderr
+ for h in logger.handlers
+ ):
+ stream_handler = logging.StreamHandler(sys.stdout)
+ stream_handler.setLevel(logging.INFO)
+ stream_handler.addFilter(ExcludeLevelFilter(DATA_ISSUE_LVL_NUM))
+ # stream_handler.addFilter(ExcludeLevelFilter(25))
+ stream_formatter = logging.Formatter("%(message)s")
+ stream_handler.setFormatter(stream_formatter)
+ logger.addHandler(stream_handler)
+
+ # Set up data issues logging
+ data_issues_handler = logging.FileHandler(
+ "marc_import_data_issues_{}.log".format(dt.now().strftime("%Y%m%d%H%M%S"))
+ )
+ data_issues_handler.setLevel(26)
+ data_issues_formatter = logging.Formatter("%(message)s")
+ data_issues_handler.setFormatter(data_issues_formatter)
+ logger.addHandler(data_issues_handler)
+
+ # Stop httpx from logging info messages to the console
+ logging.getLogger("httpx").setLevel(logging.WARNING)
+
+
  async def main() -> None:
  """
  Main function to run the MARC import job.
@@ -526,6 +682,7 @@ async def main() -> None:
  This function parses command line arguments, initializes the FolioClient,
  and runs the MARCImportJob.
  """
+ set_up_cli_logging()
  parser = argparse.ArgumentParser()
  parser.add_argument("--gateway_url", type=str, help="The FOLIO API Gateway URL")
  parser.add_argument("--tenant_id", type=str, help="The FOLIO tenant ID")
@@ -582,6 +739,11 @@ async def main() -> None:
  action="store_true",
  help="Disable progress bars (eg. for running in a CI environment)",
  )
+ parser.add_argument(
+ "--let-summary-fail",
+ action="store_true",
+ help="Do not retry fetching the final job summary if it fails",
+ )
  args = parser.parse_args()
  if not args.password:
  args.password = getpass("Enter FOLIO password: ")
@@ -601,10 +763,10 @@ async def main() -> None:
  marc_files.sort()

  if len(marc_files) == 0:
- print(f"No files found matching {args.marc_file_path}. Exiting.")
+ logger.critical(f"No files found matching {args.marc_file_path}. Exiting.")
  sys.exit(1)
  else:
- print(marc_files)
+ logger.info(marc_files)

  if not args.import_profile_name:
  import_profiles = folio_client.folio_get(
@@ -636,12 +798,31 @@ async def main() -> None:
  marc_record_preprocessor=args.preprocessor,
  consolidate=bool(args.consolidate),
  no_progress=bool(args.no_progress),
+ let_summary_fail=bool(args.let_summary_fail),
  ).do_work()
  except Exception as e:
- print("Error importing files: " + str(e))
+ logger.error("Error importing files: " + str(e))
  raise


+ class ExcludeLevelFilter(logging.Filter):
+ def __init__(self, level):
+ super().__init__()
+ self.level = level
+
+ def filter(self, record):
+ return record.levelno != self.level
+
+
+ class IncludeLevelFilter(logging.Filter):
+ def __init__(self, level):
+ super().__init__()
+ self.level = level
+
+ def filter(self, record):
+ return record.levelno == self.level
+
+
  def sync_main() -> None:
  """
  Synchronous main function to run the MARC import job.
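For illustration (not from the package): the additions above register a custom DATA_ISSUES log level (26), route it to a dedicated file handler, and keep it out of the ordinary handlers via an ExcludeLevelFilter. A self-contained Python sketch of that pattern, with placeholder logger and file names:

# Sketch of the custom-level routing pattern shown in the diff above.
import logging
import sys

DATA_ISSUE_LVL_NUM = 26
logging.addLevelName(DATA_ISSUE_LVL_NUM, "DATA_ISSUES")

class ExcludeLevelFilter(logging.Filter):
    def __init__(self, level):
        super().__init__()
        self.level = level

    def filter(self, record):
        # Drop records emitted at exactly the excluded level.
        return record.levelno != self.level

logger = logging.getLogger("example")
logger.setLevel(logging.INFO)

console = logging.StreamHandler(sys.stdout)
console.addFilter(ExcludeLevelFilter(DATA_ISSUE_LVL_NUM))  # keep data issues off the console
logger.addHandler(console)

issues = logging.FileHandler("data_issues_example.log")
issues.setLevel(DATA_ISSUE_LVL_NUM)  # only DATA_ISSUES and above reach this file
logger.addHandler(issues)

logger.info("goes to the console")
logger.log(DATA_ISSUE_LVL_NUM, "RECORD FAILED\texample.mrc:1\tbad leader\t")  # file only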
@@ -0,0 +1 @@
+ from ._preprocessors import *
@@ -0,0 +1,273 @@
+ import pymarc
+ import logging
+
+ logger = logging.getLogger("folio_data_import.MARCDataImport")
+
+
+ def prepend_prefix_001(record: pymarc.Record, prefix: str) -> pymarc.Record:
+ """
+ Prepend a prefix to the record's 001 field.
+
+ Args:
+ record (pymarc.Record): The MARC record to preprocess.
+ prefix (str): The prefix to prepend to the 001 field.
+
+ Returns:
+ pymarc.Record: The preprocessed MARC record.
+ """
+ record["001"].data = f"({prefix})" + record["001"].data
+ return record
+
+
+ def prepend_ppn_prefix_001(record: pymarc.Record) -> pymarc.Record:
+ """
+ Prepend the PPN prefix to the record's 001 field. Useful when
+ importing records from the ABES SUDOC catalog
+
+ Args:
+ record (pymarc.Record): The MARC record to preprocess.
+
+ Returns:
+ pymarc.Record: The preprocessed MARC record.
+ """
+ return prepend_prefix_001(record, "PPN")
+
+
+ def prepend_abes_prefix_001(record: pymarc.Record) -> pymarc.Record:
+ """
+ Prepend the ABES prefix to the record's 001 field. Useful when
+ importing records from the ABES SUDOC catalog
+
+ Args:
+ record (pymarc.Record): The MARC record to preprocess.
+
+ Returns:
+ pymarc.Record: The preprocessed MARC record.
+ """
+ return prepend_prefix_001(record, "ABES")
+
+
+ def strip_999_ff_fields(record: pymarc.Record) -> pymarc.Record:
+ """
+ Strip all 999 fields with ff indicators from the record.
+ Useful when importing records exported from another FOLIO system
+
+ Args:
+ record (pymarc.Record): The MARC record to preprocess.
+
+ Returns:
+ pymarc.Record: The preprocessed MARC record.
+ """
+ for field in record.get_fields("999"):
+ if field.indicators == pymarc.Indicators(*["f", "f"]):
+ record.remove_field(field)
+ return record
+
+
+ def sudoc_supercede_prep(record: pymarc.Record) -> pymarc.Record:
+ """
+ Preprocesses a record from the ABES SUDOC catalog to copy 035 fields
+ with a $9 subfield value of 'sudoc' to 935 fields with a $a subfield
+ prefixed with "(ABES)". This is useful when importing newly-merged records
+ from the SUDOC catalog when you want the new record to replace the old one
+ in FOLIO. This also applyes the prepend_ppn_prefix_001 function to the record.
+
+ Args:
+ record (pymarc.Record): The MARC record to preprocess.
+
+ Returns:
+ pymarc.Record: The preprocessed MARC record.
+ """
+ record = prepend_abes_prefix_001(record)
+ for field in record.get_fields("035"):
+ if "a" in field and "9" in field and field["9"] == "sudoc":
+ _935 = pymarc.Field(
+ tag="935",
+ indicators=["f", "f"],
+ subfields=[pymarc.field.Subfield("a", "(ABES)" + field["a"])],
+ )
+ record.add_ordered_field(_935)
+ return record
+
+
+ def clean_empty_fields(record: pymarc.Record) -> pymarc.Record:
+ """
+ Remove empty fields and subfields from the record. These can cause
+ data import mapping issues in FOLIO. Removals are logged at custom
+ log level 26, which is used by folio_migration_tools to populate the
+ data issues report.
+
+ Args:
+ record (pymarc.Record): The MARC record to preprocess.
+
+ Returns:
+ pymarc.Record: The preprocessed MARC record.
+ """
+ MAPPED_FIELDS = {
+ "010": ["a", "z"],
+ "020": ["a", "y", "z"],
+ "035": ["a", "z"],
+ "040": ["a", "b", "c", "d", "e", "f", "g", "h", "k", "m", "n", "p", "r", "s"],
+ "050": ["a", "b"],
+ "082": ["a", "b"],
+ "100": ["a", "b", "c", "d", "q"],
+ "110": ["a", "b", "c"],
+ "111": ["a", "c", "d"],
+ "130": [
+ "a",
+ "d",
+ "f",
+ "k",
+ "l",
+ "m",
+ "n",
+ "o",
+ "p",
+ "r",
+ "s",
+ "t",
+ "x",
+ "y",
+ "z",
+ ],
+ "180": ["x", "y", "z"],
+ "210": ["a", "c"],
+ "240": ["a", "f", "k", "l", "m", "n", "o", "p", "r", "s", "t", "x", "y", "z"],
+ "245": ["a", "b", "c", "f", "g", "h", "k", "n", "p", "s"],
+ "246": ["a", "f", "g", "n", "p", "s"],
+ "250": ["a", "b"],
+ "260": ["a", "b", "c", "e", "f", "g"],
+ "300": ["a", "b", "c", "e", "f", "g"],
+ "440": ["a", "n", "p", "v", "x", "y", "z"],
+ "490": ["a", "v", "x", "y", "z"],
+ "500": ["a", "c", "d", "n", "p", "v", "x", "y", "z"],
+ "505": ["a", "g", "r", "t", "u"],
+ "520": ["a", "b", "c", "u"],
+ "600": ["a", "b", "c", "d", "q", "t", "v", "x", "y", "z"],
+ "610": ["a", "b", "c", "d", "t", "v", "x", "y", "z"],
+ "611": ["a", "c", "d", "t", "v", "x", "y", "z"],
+ "630": [
+ "a",
+ "d",
+ "f",
+ "k",
+ "l",
+ "m",
+ "n",
+ "o",
+ "p",
+ "r",
+ "s",
+ "t",
+ "x",
+ "y",
+ "z",
+ ],
+ "650": ["a", "d", "v", "x", "y", "z"],
+ "651": ["a", "v", "x", "y", "z"],
+ "655": ["a", "v", "x", "y", "z"],
+ "700": ["a", "b", "c", "d", "q", "t", "v", "x", "y", "z"],
+ "710": ["a", "b", "c", "d", "t", "v", "x", "y", "z"],
+ "711": ["a", "c", "d", "t", "v", "x", "y", "z"],
+ "730": [
+ "a",
+ "d",
+ "f",
+ "k",
+ "l",
+ "m",
+ "n",
+ "o",
+ "p",
+ "r",
+ "s",
+ "t",
+ "x",
+ "y",
+ "z",
+ ],
+ "740": ["a", "n", "p", "v", "x", "y", "z"],
+ "800": ["a", "b", "c", "d", "q", "t", "v", "x", "y", "z"],
+ "810": ["a", "b", "c", "d", "t", "v", "x", "y", "z"],
+ "811": ["a", "c", "d", "t", "v", "x", "y", "z"],
+ "830": [
+ "a",
+ "d",
+ "f",
+ "k",
+ "l",
+ "m",
+ "n",
+ "o",
+ "p",
+ "r",
+ "s",
+ "t",
+ "x",
+ "y",
+ "z",
+ ],
+ "856": ["u", "y", "z"],
+ }
+
+ for field in list(record.get_fields()):
+ len_subs = len(field.subfields)
+ subfield_value = bool(field.subfields[0].value) if len_subs > 0 else False
+ if not int(field.tag) >= 900 and field.tag in MAPPED_FIELDS:
+ if int(field.tag) > 9 and len_subs == 0:
+ logger.log(
+ 26,
+ "DATA ISSUE\t%s\t%s\t%s",
+ record["001"].value(),
+ f"{field.tag} is empty",
+ field,
+ )
+ record.remove_field(field)
+ elif len_subs == 1 and not subfield_value:
+ logger.log(
+ 26,
+ "DATA ISSUE\t%s\t%s\t%s",
+ record["001"].value(),
+ f"{field.tag}${field.subfields[0].code} is empty, removing field",
+ field,
+ )
+ record.remove_field(field)
+ else:
+ if len_subs > 1 and "a" in field and not field["a"].strip():
+ logger.log(
+ 26,
+ "DATA ISSUE\t%s\t%s\t%s",
+ record["001"].value(),
+ f"{field.tag}$a is empty, removing field",
+ field,
+ )
+ field.delete_subfield("a")
+ for idx, subfield in enumerate(list(field.subfields), start=1):
+ if subfield.code in MAPPED_FIELDS.get(field.tag, []) and not subfield.value:
+ logger.log(
+ 26,
+ "DATA ISSUE\t%s\t%s\t%s",
+ record["001"].value(),
+ f"{field.tag}${subfield.code} ({ordinal(idx)} subfield) is empty, but other subfields have values, removing subfield",
+ field,
+ )
+ field.delete_subfield(subfield.code)
+ if len(field.subfields) == 0:
+ logger.log(
+ 26,
+ "DATA ISSUE\t%s\t%s\t%s",
+ record["001"].value(),
+ f"{field.tag} has no non-empty subfields after cleaning, removing field",
+ field,
+ )
+ record.remove_field(field)
+ return record
+
+
+ def ordinal(n):
+ s = ("th", "st", "nd", "rd") + ("th",) * 10
+ v = n % 100
+ if v > 13:
+ return f"{n}{s[v % 10]}"
+ else:
+ return f"{n}{s[v]}"
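For illustration (not from the package): preprocessors like those added above are plain functions that take and return a pymarc.Record, so the importer's apply_marc_record_preprocessing hook can receive them either as callables or as dotted-path strings. A hypothetical custom preprocessor in the same shape (the 9xx-stripping rule here is an arbitrary example, not something the package ships):

# Hypothetical preprocessor compatible with the record-by-record hook shown above.
import pymarc

def strip_local_9xx(record: pymarc.Record) -> pymarc.Record:
    """Remove locally defined 9xx fields (except 999) before import."""
    for field in list(record.get_fields()):
        if field.tag.startswith("9") and field.tag != "999":
            record.remove_field(field)
    return record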
@@ -1 +0,0 @@
- from ._preprocessors import prepend_ppn_prefix_001, strip_999_ff_fields
@@ -1,84 +0,0 @@
- import pymarc
-
- def prepend_prefix_001(record: pymarc.Record, prefix: str) -> pymarc.Record:
- """
- Prepend a prefix to the record's 001 field.
-
- Args:
- record (pymarc.Record): The MARC record to preprocess.
- prefix (str): The prefix to prepend to the 001 field.
-
- Returns:
- pymarc.Record: The preprocessed MARC record.
- """
- record['001'].data = f'({prefix})' + record['001'].data
- return record
-
- def prepend_ppn_prefix_001(record: pymarc.Record) -> pymarc.Record:
- """
- Prepend the PPN prefix to the record's 001 field. Useful when
- importing records from the ABES SUDOC catalog
-
- Args:
- record (pymarc.Record): The MARC record to preprocess.
-
- Returns:
- pymarc.Record: The preprocessed MARC record.
- """
- return prepend_prefix_001(record, 'PPN')
-
- def prepend_abes_prefix_001(record: pymarc.Record) -> pymarc.Record:
- """
- Prepend the ABES prefix to the record's 001 field. Useful when
- importing records from the ABES SUDOC catalog
-
- Args:
- record (pymarc.Record): The MARC record to preprocess.
-
- Returns:
- pymarc.Record: The preprocessed MARC record.
- """
- return prepend_prefix_001(record, 'ABES')
-
- def strip_999_ff_fields(record: pymarc.Record) -> pymarc.Record:
- """
- Strip all 999 fields with ff indicators from the record.
- Useful when importing records exported from another FOLIO system
-
- Args:
- record (pymarc.Record): The MARC record to preprocess.
-
- Returns:
- pymarc.Record: The preprocessed MARC record.
- """
- for field in record.get_fields('999'):
- if field.indicators == pymarc.Indicators(*['f', 'f']):
- record.remove_field(field)
- return record
-
- def sudoc_supercede_prep(record: pymarc.Record) -> pymarc.Record:
- """
- Preprocesses a record from the ABES SUDOC catalog to copy 035 fields
- with a $9 subfield value of 'sudoc' to 935 fields with a $a subfield
- prefixed with "(ABES)". This is useful when importing newly-merged records
- from the SUDOC catalog when you want the new record to replace the old one
- in FOLIO. This also applyes the prepend_ppn_prefix_001 function to the record.
-
- Args:
- record (pymarc.Record): The MARC record to preprocess.
-
- Returns:
- pymarc.Record: The preprocessed MARC record.
- """
- record = prepend_abes_prefix_001(record)
- for field in record.get_fields('035'):
- if "a" in field and "9" in field and field['9'] == 'sudoc':
- _935 = pymarc.Field(
- tag='935',
- indicators=['f', 'f'],
- subfields=[
- pymarc.field.Subfield('a', "(ABES)" + field['a'])
- ]
- )
- record.add_ordered_field(_935)
- return record