PyPI - seabirdfilehandler - Versions diffs - 0.5.2__tar.gz → 0.5.4__tar.gz - Mend

seabirdfilehandler 0.5.2.tar.gz → 0.5.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of seabirdfilehandler might be problematic.

Files changed (18)

seabirdfilehandler-0.5.4/PKG-INFO ADDED

@@ -0,0 +1,53 @@
+Metadata-Version: 2.3
+Name: seabirdfilehandler
+Version: 0.5.4
+Summary: Library of parsers to interact with SeaBird CTD files.
+Keywords: CTD,parser,seabird,data
+Author: Emil Michels
+Author-email: <emil.michels@io-warnemuende.de>
+Requires-Python: >=3.12
+Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
+Classifier: Development Status :: 3 - Alpha
+Classifier: Operating System :: OS Independent
+Classifier: Intended Audience :: Science/Research
+Classifier: Topic :: Scientific/Engineering :: Oceanography
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Requires-Dist: pandas (>=2.2.1)
+Requires-Dist: xmltodict (>=0.13.0)
+Project-URL: Documentation, https://ctd-software.pages.io-warnemuende.de/seabirdfilehandler
+Project-URL: Homepage, https://ctd-software.pages.io-warnemuende.de/seabirdfilehandler
+Project-URL: Repository, https://git.io-warnemuende.de/CTD-Software/SeabirdFileHandler
+Description-Content-Type: text/markdown
+# Intro
+This is a library for handling the different SeaBird file types. Each file is
+meant to be represented by one object that stores all of its information in a
+structured way. Through the grouping of different data types, more complex
+calculations, visualisations and output forms are possible inside of those
+objects.
+By being able to parse edited data and metadata back to the original file
+format, this package can be used to process data using custom ideas, while
+staying compatible to the original SeaBird software packages. This way, one can
+create new workflows that interchangeably use old and new processing modules.
+One implementation of this idea is the [ctd-processing python package](https://ctd-software.pages.io-warnemuende.de/processing/), also developed at the IOW.
+The structured metadata does provide the possibility to leverage the vast
+amounts of information stored inside the extensive metadata header. Sensor data
+and processing information are readily available in intuitive dictionaries.
+## Development roadmap
+### misc improvements
+- refactor processing module handling
+- extend individual parameter information
+- handle duplicate input columns
+### visualisation
+- write an intuitive visualisation module

seabirdfilehandler-0.5.4/README.md ADDED

@@ -0,0 +1,29 @@
+# Intro
+This is a library for handling the different SeaBird file types. Each file is
+meant to be represented by one object that stores all of its information in a
+structured way. Through the grouping of different data types, more complex
+calculations, visualisations and output forms are possible inside of those
+objects.
+By being able to parse edited data and metadata back to the original file
+format, this package can be used to process data using custom ideas, while
+staying compatible to the original SeaBird software packages. This way, one can
+create new workflows that interchangeably use old and new processing modules.
+One implementation of this idea is the [ctd-processing python package](https://ctd-software.pages.io-warnemuende.de/processing/), also developed at the IOW.
+The structured metadata does provide the possibility to leverage the vast
+amounts of information stored inside the extensive metadata header. Sensor data
+and processing information are readily available in intuitive dictionaries.
+## Development roadmap
+### misc improvements
+- refactor processing module handling
+- extend individual parameter information
+- handle duplicate input columns
+### visualisation
+- write an intuitive visualisation module

{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/pyproject.toml RENAMED

@@ -16,10 +16,11 @@ classifiers = [
     "Programming Language :: Python :: 3.12",
     "Programming Language :: Python :: 3.13",
 ]
-urls.homepage = "https://git.io-warnemuende.de/CTD-Software/SeabirdFileHandler"
+urls.homepage = "https://ctd-software.pages.io-warnemuende.de/seabirdfilehandler"
 urls.repository = "https://git.io-warnemuende.de/CTD-Software/SeabirdFileHandler"
+urls.documentation = "https://ctd-software.pages.io-warnemuende.de/seabirdfilehandler"
 dynamic = []
-version = "0.5.2"
+version = "0.5.4"
 [tool.poetry]
@@ -43,6 +44,7 @@ pyment = ">=0.3.3"
 pylint = ">=3.0.2"
 pre-commit = ">=3.6.2"
 tomlkit = ">=0.13.2"
+myst-parser = "^4.0.1"
 [tool.pytest.ini_options]
 pythonpath = [".", "src", "src/seabirdfilehandler"]

{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/src/seabirdfilehandler/bottlefile.py RENAMED

@@ -1,3 +1,4 @@
+from pathlib import Path
 from typing import Union
 from datetime import datetime, time
 import pandas as pd
@@ -22,12 +23,13 @@ class BottleFile(DataFile):
     """
-    def __init__(self, path_to_file):
-        super().__init__(path_to_file)
-        self.original_df = self.create_dataframe()
-        self.df = self.original_df
-        self.setting_dataframe_dtypes()
-        self.adding_timestamp_column()
+    def __init__(self, path_to_file: Path | str, only_header: bool = False):
+        super().__init__(path_to_file, only_header)
+        if not only_header:
+            self.original_df = self.create_dataframe()
+            self.df = self.original_df
+            self.setting_dataframe_dtypes()
+            self.adding_timestamp_column()
     def create_dataframe(self):
         """Creates a dataframe out of the btl file. Manages the double data

{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/src/seabirdfilehandler/cnvfile.py RENAMED

@@ -19,21 +19,26 @@ class CnvFile(DataFile):
     be able to use this representation for all applications concerning cnv
     files, like data processing, transformation or visualization.
-    To achieve that, the metadata header is organized by the grandparent-class,
-    SeaBirdFile, while the data table is extracted by this class. The data
-    representation of choice is a pandas Dataframe. Inside this class, there
-    are methods to parse cnv data into dataframes, do the reverse of writing a
-    dataframe into cnv compliant form and to manipulate the dataframe in
-    various ways.
+    To achieve that, the metadata header is organized by the parent-class,
+    DataFile, while the data table is extracted by this class. The data
+    representation can be a numpy array or pandas dataframe. The handling of
+    the data is mostly done inside parameters, a representation of the
+    individual measurement parameter data and metadata.
+    This class is also able to parse the edited data and metadata back to the
+    original .cnv file format, allowing for custom data processing using this
+    representation, while still being able to use Sea-Birds original software
+    on that output. It also allows to stay comparable with other parsers or
+    methods in general.
     Parameters
     ----------
     path_to_file: Path | str:
         the path to the file
-    full_data_header: bool:
-        whether to use the full data column descriptions for the dataframe
-    long_header_names: bool:
-        whether to use long header names in the dateframe
+    only_header: bool :
+        Whether to stop reading the file after the metadata header.
+    create_dataframe: bool :
+        Whether to create a pandas DataFrame from the data table.
     absolute_time_calculation: bool:
         whether to use a real timestamp instead of the second count
     event_log_column: bool:
@@ -55,9 +60,11 @@ class CnvFile(DataFile):
         super().__init__(path_to_file, only_header)
         self.validation_modules = self.obtaining_validation_modules()
         self.start_time = self.reading_start_time()
-        self.parameters = Parameters(self.data, self.data_table_description)
+        self.parameters = Parameters(
+            self.data, self.data_table_description, only_header
+        )
         if create_dataframe:
-            self.df = self.parameters.get_pandas_dataframe()
+            self.df = self.create_dataframe()
         if absolute_time_calculation:
             self.absolute_time_calculation()
         if event_log_column:
@@ -65,6 +72,13 @@ class CnvFile(DataFile):
         if coordinate_columns:
             self.add_position_columns()
+    def create_dataframe(self) -> pd.DataFrame:
+        """
+        Plain dataframe creator.
+        """
+        self.df = self.parameters.get_pandas_dataframe()
+        return self.df
     def reading_start_time(
         self,
         time_source: str = "System UTC",
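The new `CnvFile` docstring states that edited data and metadata can be written back to the original .cnv format. A minimal round-trip sketch of that idea: header lines are kept verbatim, a `*END*` line separates them from the table, and every value is written into a fixed-width field. The 11-character width and 4 decimals are illustrative assumptions, not taken from the package or from Sea-Bird's format documentation, and all function names are hypothetical:

```python
END_MARKER = "*END*"  # assumed end-of-header line


def format_data_table(
    rows: list[list[float]], width: int = 11, decimals: int = 4
) -> list[str]:
    """Right-align each value in a fixed-width field (illustrative layout).

    A value wider than `width` characters would touch its neighbour, so a
    real writer has to choose width/decimals per column to prevent that.
    """
    return ["".join(f"{value:{width}.{decimals}f}" for value in row) for row in rows]


def to_cnv_text(header: list[str], rows: list[list[float]]) -> str:
    """Keep the header lines verbatim, then append the reformatted table."""
    lines = list(header)
    if not lines or lines[-1] != END_MARKER:
        lines.append(END_MARKER)
    lines.extend(format_data_table(rows))
    return "\n".join(lines) + "\n"


def parse_data_table(lines: list[str]) -> list[list[float]]:
    """Whitespace-split parser for a table written by format_data_table."""
    return [[float(value) for value in line.split()] for line in lines]


header = ["* example header", "# name 0 = prDM: Pressure, Digiquartz [db]"]
text = to_cnv_text(header, [[1.5, 10.25], [2.0, -3.125]])
print(text, end="")

# header (2 lines) + "*END*" precede the table
table_lines = text.splitlines()[3:]
round_trip = parse_data_table(table_lines)
print(round_trip)
```

Reading the table back and comparing it with the input is a cheap check that the written file stays parseable.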

{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/src/seabirdfilehandler/datafiles.py RENAMED

@@ -18,15 +18,21 @@ logger = logging.getLogger(__name__)
 class DataFile:
-    """Collection of methods for the SeaBird files that feature some kind of
-    data table that is represented in a pandas dataframe.
+    """
+    The base class for all Sea-Bird data files, which are .cnv, .btl, and .bl .
+    One instance of this class, or its children, represents one data text file.
+    The different information bits of such a file are structured into individual
+    lists or dictionaries. The data table will be loaded as numpy array and
+    can be converted to a pandas DataFrame. Datatype-specific behavior is
+    implemented in the subclasses.
     Parameters
     ----------
-    Returns
-    -------
+    path_to_file: Path | str :
+        The file to the data file.
+    only_header: bool :
+        Whether to stop reading the file after the metadata header.
     """
     def __init__(
@@ -66,16 +72,10 @@ class DataFile:
         return self.file_data == other.file_data
     def read_file(self):
-        """Reads and structures all the different information present in the
+        """
+        Reads and structures all the different information present in the
         file. Lists and Dictionaries are the data structures of choice. Uses
         basic prefix checking to distinguish different header information.
-        Parameters
-        ----------
-        Returns
-        -------
         """
         past_sensors = False
         with self.path_to_file.open("r", encoding="latin-1") as file:
@@ -109,14 +109,18 @@ class DataFile:
     def sensor_xml_to_flattened_dict(
         self, sensor_data: str
     ) -> list[dict] | dict:
-        """Reads the pure xml sensor input and creates a multilevel dictionary,
+        """
+        Reads the pure xml sensor input and creates a multilevel dictionary,
         dropping the first two dictionaries, as they are single entry only
         Parameters
         ----------
+        sensor_data: str:
+            The raw xml sensor data.
         Returns
         -------
+        A list of sensor information, which is a structured dict.
         """
         full_sensor_dict = xmltodict.parse(sensor_data, process_comments=True)
@@ -153,8 +157,9 @@ class DataFile:
             return tidied_sensor_list
     def structure_metadata(self, metadata_list: list) -> dict:
-        """Creates a dictionary to store the metadata that is added by using
-        werums dship API.
+        """
+        Creates a dictionary to store custom metadata, of which Sea-Bird allows
+        12 lines in each file.
         Parameters
         ----------
@@ -181,7 +186,8 @@ class DataFile:
         file_name: str | None = None,
         file_type: str = ".csv",
     ) -> Path:
-        """Creates a Path object holding the desired output path.
+        """
+        Creates a Path object holding the desired output path.
         Parameters
         ----------
@@ -209,14 +215,13 @@ class DataFile:
         output_file_path: Path | str | None = None,
         output_file_name: str | None = None,
     ):
-        """Writes a csv from the current dataframe. Takes a list of columns to
-        use, a boolean for writing the header and the output file parameters.
+        """
+        Writes a csv from the given data.
         Parameters
         ----------
-        selected_columns : list :
-            a list of columns to include in the csv
-            (Default value = self.df.columns)
+        data: pd.DataFrame | np.ndarray :
+            The source data to use.
         with_header : boolean :
             indicating whether the header shall appear in the output
              (Default value = True)
@@ -246,7 +251,8 @@ class DataFile:
         list_of_columns: list | str,
         df: pd.DataFrame,
     ):
-        """Alters the dataframe to only hold the given columns.
+        """
+        Alters the dataframe to only hold the given columns.
         Parameters
         ----------
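`DataFile.read_file` is documented as using "basic prefix checking" to tell the header parts apart, and `structure_metadata` now stores the custom metadata lines, of which Sea-Bird allows 12 per file. A sketch of that prefix dispatch, assuming `*` for the instrument header, `**` for user-entered lines, `#` for processing and column descriptions, and `*END*` as the end of the header. `classify_lines`, the `"Key: value"` splitting and the sample lines are illustrative assumptions, not the package's implementation. Storing `#` lines without their prefix appears consistent with how `use_bad_flag_for_nan` in this release checks `line.startswith("bad_flag")`:

```python
def classify_lines(lines: list[str]) -> dict[str, list[str]]:
    """Sort the lines of a Sea-Bird-style text file by their prefix."""
    sections: dict[str, list[str]] = {
        "sbe_header": [],  # "* ..."  instrument/acquisition header
        "metadata": [],  # "** ..." custom, user-entered lines
        "data_table_description": [],  # "# ..." processing/column info
        "data": [],
    }
    past_header = False
    for raw_line in lines:
        line = raw_line.rstrip("\n")
        if past_header:
            sections["data"].append(line)
        elif line.startswith("*END*"):
            past_header = True
        elif line.startswith("**"):  # must be tested before the single "*"
            sections["metadata"].append(line.removeprefix("**").strip())
        elif line.startswith("*"):
            sections["sbe_header"].append(line.removeprefix("*").strip())
        elif line.startswith("#"):
            sections["data_table_description"].append(line.removeprefix("#").strip())
        # any other line before *END* is ignored in this sketch
    return sections


def structure_metadata(metadata_lines: list[str]) -> dict[str, str]:
    """Turn "Key: value" custom header lines into a dictionary."""
    metadata: dict[str, str] = {}
    for line in metadata_lines:
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip()
    return metadata


sample = [
    "* Sea-Bird SBE 9 Data File:",
    "** Station: 42",
    "** Operator: J. Doe",
    "# name 0 = prDM: Pressure, Digiquartz [db]",
    "# bad_flag = -9.990e-29",
    "*END*",
    "      1.000     10.123",
]
sections = classify_lines(sample)
print(structure_metadata(sections["metadata"]))  # {'Station': '42', 'Operator': 'J. Doe'}
```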

seabirdfilehandler-0.5.4/src/seabirdfilehandler/file_collection.py ADDED

@@ -0,0 +1,411 @@
+from __future__ import annotations
+from pathlib import Path
+import logging
+from collections import UserList
+from typing import Callable, Type
+import pandas as pd
+import numpy as np
+from seabirdfilehandler import (
+    CnvFile,
+    BottleFile,
+    BottleLogFile,
+)
+from seabirdfilehandler import DataFile
+from seabirdfilehandler.utils import get_unique_sensor_data
+logger = logging.getLogger(__name__)
+def get_collection(
+    path_to_files: Path | str,
+    file_suffix: str = "cnv",
+    only_metadata: bool = False,
+    sorting_key: Callable | None = None,
+) -> Type[FileCollection]:
+    """
+    Factory to create instances of FileCollection, depending on input type.
+    Parameters
+    ----------
+    path_to_files : Path | str :
+        The path to the directory to search for files.
+    file_suffix : str :
+        The suffix to search for. (Default value = "cnv")
+    only_metadata : bool :
+        Whether to read only metadata. (Default value = False)
+    sorting_key : Callable | None :
+        A callable that returns the filename-part to use to sort the collection. (Default value = None)
+    Returns
+    -------
+    An instance of FileCollection or one of its children.
+    """
+    mapping_suffix_to_type = {
+        "cnv": CnvCollection,
+        "btl": FileCollection,
+        "bl": FileCollection,
+    }
+    file_suffix = file_suffix.strip(".")
+    try:
+        collection = mapping_suffix_to_type[file_suffix](
+            path_to_files, file_suffix, only_metadata, sorting_key
+        )
+    except ValueError:
+        raise ValueError(f"Unknown input file type: {file_suffix}, aborting.")
+    else:
+        return collection
+class FileCollection(UserList):
+    """
+    A representation of multiple files of the same kind. These files share
+    the same suffix and are otherwise closely connected to each other. A common
+    use case would be the collection of CNVs to allow for easier processing or
+    integration of field calibration measurements.
+    Parameters
+    ----------
+    path_to_files : Path | str :
+        The path to the directory to search for files.
+    file_suffix : str :
+        The suffix to search for. (Default value = "cnv")
+    only_metadata : bool :
+        Whether to read only metadata. (Default value = False)
+    sorting_key : Callable | None :
+        A callable that returns the filename-part to use to sort the collection. (Default value = None)
+    """
+    def __init__(
+        self,
+        path_to_files: str | Path,
+        file_suffix: str,
+        only_metadata: bool = False,
+        sorting_key: Callable | None = None,
+    ):
+        super().__init__()
+        self.path_to_files = Path(path_to_files)
+        self.file_suffix = file_suffix.strip(".")
+        self.file_type = self.extract_file_type(self.file_suffix)
+        self.individual_file_paths = self.collect_files(
+            sorting_key=sorting_key
+        )
+        self.data = self.load_files(only_metadata)
+        if not only_metadata:
+            self.df_list = self.get_dataframes()
+            self.df = self.get_collection_dataframe(self.df_list)
+    def __str__(self):
+        return "/n".join(self.data)
+    def extract_file_type(self, suffix: str) -> Type[DataFile]:
+        """
+        Determines the file type using the input suffix.
+        Parameters
+        ----------
+        suffix : str :
+            The file suffix.
+        Returns
+        -------
+        An object corresponding to the given suffix.
+        """
+        mapping_suffix_to_type = {
+            "cnv": CnvFile,
+            "btl": BottleFile,
+            "bl": BottleLogFile,
+        }
+        file_type = DataFile
+        for key, value in mapping_suffix_to_type.items():
+            if key == suffix:
+                file_type = value
+                break
+        return file_type
+    def collect_files(
+        self,
+        sorting_key: Callable | None = lambda file: int(
+            file.stem.split("_")[3]
+        ),
+    ) -> list[Path]:
+        """
+        Creates a list of target files, recursively from the given directory.
+        These can be sorted with the help of the sorting_key parameter, which
+        is a Callable that identifies the part of the filename that shall be
+        used for sorting.
+        Parameters
+        ----------
+        sorting_key : Callable | None :
+            The part of the filename to use in sorting. (Default value = lambda file: int(file.stem.split("_")[3]))
+        Returns
+        -------
+        A list of all paths found.
+        """
+        return sorted(
+            self.path_to_files.rglob(f"*{self.file_suffix}"),
+            key=sorting_key,
+        )
+    def load_files(self, only_metadata: bool = False) -> list[DataFile]:
+        """
+        Creates python instances of each file.
+        Parameters
+        ----------
+        only_metadata : bool :
+            Whether to load only file metadata. (Default value = False)
+        Returns
+        -------
+        A list of all instances.
+        """
+        data = []
+        for file in self.individual_file_paths:
+            try:
+                data.append(self.file_type(file, only_metadata))
+            except TypeError:
+                logger.error(
+                    f"Could not open file {file} with the type "
+                    f"{self.file_type}."
+                )
+                continue
+        return data
+    def get_dataframes(
+        self,
+        event_log: bool = False,
+        coordinates: bool = False,
+        time_correction: bool = False,
+        cast_identifier: bool = False,
+    ) -> list[pd.DataFrame]:
+        """
+        Collects all individual dataframes and allows additional column
+        creation.
+        Parameters
+        ----------
+        event_log : bool :
+            (Default value = False)
+        coordinates : bool :
+            (Default value = False)
+        time_correction : bool :
+            (Default value = False)
+        cast_identifier : bool :
+            (Default value = False)
+        Returns
+        -------
+        A list of the individual pandas DataFrames.
+        """
+        for index, file in enumerate(self.data):
+            if event_log:
+                file.add_station_and_event_column()
+            if coordinates:
+                file.add_position_columns()
+            if time_correction:
+                file.absolute_time_calculation()
+                file.add_start_time()
+            if cast_identifier:
+                file.add_cast_number(index + 1)
+        return [file.df for file in self.data]
+    def get_collection_dataframe(
+        self, list_of_dfs: list[pd.DataFrame] | None = None
+    ) -> pd.DataFrame:
+        """
+        Creates one DataFrame from the individual ones, by concatenation.
+        Parameters
+        ----------
+        list_of_dfs : list[pd.DataFrame] | None :
+            A list of the individual DataFrames. (Default value = None)
+        Returns
+        -------
+        A pandas DataFrame representing the whole dataset.
+        """
+        if not list_of_dfs:
+            list_of_dfs = self.get_dataframes()
+        if not list_of_dfs:
+            raise ValueError("No dataframes to concatenate.")
+        df = pd.concat(list_of_dfs, ignore_index=True)
+        self.df = df
+        return df
+    def tidy_collection_dataframe(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Apply the different dataframe edits to the given dataframe.
+        Parameters
+        ----------
+        df : pd.DataFrame :
+            A DataFrame to edit.
+        Returns
+        -------
+        The tidied dataframe.
+        """
+        df = self.use_bad_flag_for_nan(df)
+        df = self.set_dtype_to_float(df)
+        return self.select_real_scan_data(df)
+    def use_bad_flag_for_nan(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Replace all Nan values by the bad flag value, defined inside the files.
+        Parameters
+        ----------
+        df : pd.DataFrame :
+            The dataframe to edit.
+        Returns
+        -------
+        The edited DataFrame.
+        """
+        bad_flags = set()
+        for file in self.data:
+            for line in file.data_table_description:
+                if line.startswith("bad_flag"):
+                    flag = line.split("=")[1].strip()
+                    bad_flags.add(flag)
+        for flag in bad_flags:
+            df.replace(to_replace=flag, value=np.nan, inplace=True)
+        return df
+    def set_dtype_to_float(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Use the float-dtype for all DataFrame columns.
+        Parameters
+        ----------
+        df : pd.DataFrame :
+            The dataframe to edit.
+        Returns
+        -------
+        The edited DataFrame.
+        """
+        for parameter in df.columns:
+            if parameter in ["datetime"]:
+                continue
+            try:
+                df[parameter] = df[parameter].astype("float")
+            finally:
+                continue
+        return df
+    def select_real_scan_data(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Drop data rows have no 'Scan' value, if that column exists.
+        Parameters
+        ----------
+        df : pd.DataFrame :
+            The dataframe to edit.
+        Returns
+        -------
+        The edited DataFrame.
+        """
+        try:
+            scan_column = [
+                c for c in df.columns if c.lower().startswith("scan")
+            ][0]
+        except IndexError:
+            return df
+        else:
+            df = df.loc[df[scan_column].notna()]
+        return df
+    def to_csv(self, file_name):
+        """
+        Writes a csv file with the given filename.
+        Parameters
+        ----------
+        file_name :
+            The new csv file name.
+        """
+        self.df.to_csv(file_name)
+class CnvCollection(FileCollection):
+    """
+    Specific methods to work with collections of .cnv files.
+    """
+    def __init__(
+        self,
+        *args,
+        **kwargs,
+    ):
+        super().__init__(*args, **kwargs)
+        self.data_meta_info = self.get_data_table_meta_info()
+        self.sensor_data = get_unique_sensor_data(
+            [file.sensors for file in self.data]
+        )
+        self.array = self.get_array()
+    def get_dataframes(
+        self,
+        event_log: bool = False,
+        coordinates: bool = False,
+        time_correction: bool = False,
+        cast_identifier: bool = False,
+    ) -> list[pd.DataFrame]:
+        """
+        Collects all individual dataframes and allows additional column
+        creation.
+        Parameters
+        ----------
+        event_log : bool :
+            (Default value = False)
+        coordinates : bool :
+            (Default value = False)
+        time_correction : bool :
+            (Default value = False)
+        cast_identifier : bool :
+            (Default value = False)
+        Returns
+        -------
+        A list of the individual pandas DataFrames.
+        """
+        for index, file in enumerate(self.data):
+            if event_log:
+                file.add_station_and_event_column()
+            if coordinates:
+                file.add_position_columns()
+            if time_correction:
+                file.absolute_time_calculation()
+                file.add_start_time()
+            if cast_identifier:
+                file.add_cast_number(index + 1)
+        return [file.create_dataframe() for file in self.data]
+    def get_data_table_meta_info(self) -> list[dict]:
+        """
+        Ensures the same data description in all input cnv files and returns
+        it.
+        Acts as an early alarm when working on different kinds of files, which
+        cannot be concatenated together.
+        Returns
+        -------
+        A list of dictionaries that represent the data column information.
+        """
+        all_column_descriptions = [
+            file.parameters.metadata for file in self.data
+        ]
+        for info in all_column_descriptions:
+            if all_column_descriptions[0] != info:
+                raise AssertionError(
+                    "Acting on differently formed data files, aborting"
+                )
+        return all_column_descriptions[0]
+    def get_array(self) -> np.ndarray:
+        """
+        Creates a collection array of all individual file arrays.
+        Returns
+        -------
+        A numpy array, representing the data of all input files.
+        """
+        return np.concatenate(
+            [file.parameters.create_full_ndarray() for file in self.data]
+        )

{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/src/seabirdfilehandler/parameter.py RENAMED

@@ -18,10 +18,10 @@ class Parameters(UserDict):
     Parameters
     ----------
-    data: list:
-        The raw data as extraced by SeaBirdFile
-    metadata: list,
-        The raw metadata as extraced by SeaBirdFile
+    data: list
+        The raw data as extraced by DataFile
+    metadata: list
+        The raw metadata as extraced by DataFile
     Returns
     -------
@@ -32,15 +32,20 @@ class Parameters(UserDict):
         self,
         data: list,
         metadata: list,
+        only_header: bool = False,
     ):
         self.raw_input_data = data
         self.raw_metadata = metadata
-        self.full_data_array = self.create_full_ndarray()
         self.differentiate_table_description()
         self.metadata, self.duplicate_columns = self.reading_data_header(
             metadata
         )
-        self.data = self.create_parameter_instances()
+        if not only_header:
+            self.full_data_array = self.create_full_ndarray()
+            self.data = self.create_parameter_instances()
+    def get_parameter_names(self) -> list[str]:
+        return [parameter["name"] for parameter in self.metadata.values()]
     def get_parameter_list(self) -> list[Parameter]:
         """ """
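`Parameters` now accepts `only_header` and gains `get_parameter_names()`, which reads a `"name"` entry from each value of `self.metadata`, the dictionary built by `reading_data_header` together with `duplicate_columns`. The following sketch shows how such metadata could be derived from column-description lines. It assumes the `name N = short: long [unit]` layout, and the dictionary keys (`index`, `shortname`, `name`, `unit`) and the keep-first rule for duplicates are illustrative choices, not the package's actual metadata layout. Duplicate handling is still listed on the README roadmap.

```python
import re

# "name <index> = <short name>: <long name> [<unit>]"; the unit is optional.
NAME_LINE = re.compile(r"^name (\d+) = ([^:]+):\s*(.*?)(?:\s*\[(.*)\])?$")


def reading_data_header(description: list[str]) -> tuple[dict[str, dict], list[str]]:
    """Build per-parameter metadata keyed by short name, noting duplicates."""
    metadata: dict[str, dict] = {}
    duplicate_columns: list[str] = []
    for line in description:
        match = NAME_LINE.match(line)
        if match is None:
            continue  # bad_flag, span, interval, ... lines are skipped here
        index, shortname, name, unit = match.groups()
        shortname = shortname.strip()
        if shortname in metadata:
            duplicate_columns.append(shortname)  # keep the first occurrence
            continue
        metadata[shortname] = {
            "index": int(index),
            "shortname": shortname,
            "name": name,
            "unit": unit or "",
        }
    return metadata, duplicate_columns


def get_parameter_names(metadata: dict[str, dict]) -> list[str]:
    """Same shape as the new Parameters.get_parameter_names()."""
    return [parameter["name"] for parameter in metadata.values()]


description = [
    "name 0 = prDM: Pressure, Digiquartz [db]",
    "name 1 = t090C: Temperature [ITS-90, deg C]",
    "name 2 = t090C: Temperature [ITS-90, deg C]",
    "bad_flag = -9.990e-29",
]
metadata, duplicates = reading_data_header(description)
print(get_parameter_names(metadata), duplicates)
```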

seabirdfilehandler-0.5.2/PKG-INFO DELETED

@@ -1,28 +0,0 @@
-Metadata-Version: 2.3
-Name: seabirdfilehandler
-Version: 0.5.2
-Summary: Library of parsers to interact with SeaBird CTD files.
-Keywords: CTD,parser,seabird,data
-Author: Emil Michels
-Author-email: <emil.michels@io-warnemuende.de>
-Requires-Python: >=3.12
-Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
-Classifier: Development Status :: 3 - Alpha
-Classifier: Operating System :: OS Independent
-Classifier: Intended Audience :: Science/Research
-Classifier: Topic :: Scientific/Engineering :: Oceanography
-Classifier: Programming Language :: Python :: 3 :: Only
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Programming Language :: Python :: 3.13
-Requires-Dist: pandas (>=2.2.1)
-Requires-Dist: xmltodict (>=0.13.0)
-Project-URL: Homepage, https://git.io-warnemuende.de/CTD-Software/SeabirdFileHandler
-Project-URL: Repository, https://git.io-warnemuende.de/CTD-Software/SeabirdFileHandler
-Description-Content-Type: text/markdown
-This is a library for handling the different SeaBird file types. Each file is
-meant to be represented by one object that stores all of its information in a
-structured way. Through the grouping of different data types, more complex
-calculations, visualisations and output forms will be possible inside of those
-objects.

seabirdfilehandler-0.5.2/README.md DELETED

@@ -1,5 +0,0 @@
-This is a library for handling the different SeaBird file types. Each file is
-meant to be represented by one object that stores all of its information in a
-structured way. Through the grouping of different data types, more complex
-calculations, visualisations and output forms will be possible inside of those
-objects.

seabirdfilehandler-0.5.2/src/seabirdfilehandler/file_collection.py DELETED

@@ -1,258 +0,0 @@
-from pathlib import Path
-import logging
-from collections import UserList
-from typing import Callable, Type
-import pandas as pd
-import numpy as np
-from seabirdfilehandler import (
-    CnvFile,
-    BottleFile,
-    BottleLogFile,
-)
-from seabirdfilehandler import DataFile
-from seabirdfilehandler.utils import get_unique_sensor_data
-logger = logging.getLogger(__name__)
-class FileCollection(UserList):
-    """A representation of multiple files of the same kind. These files share
-    the same suffix and are otherwise closely connected to each other. A common
-    use case would be the collection of CNVs to allow for easier processing or
-    integration of field calibration measurements.
-    Parameters
-    ----------
-    Returns
-    -------
-    """
-    def __init__(
-        self,
-        path_to_files: str | Path,
-        file_suffix: str,
-        only_metadata: bool = False,
-        sorting_key: Callable | None = None,
-    ):
-        super().__init__()
-        self.path_to_files = Path(path_to_files)
-        self.file_suffix = file_suffix.strip(".")
-        self.file_type: Type[DataFile]
-        self.extract_file_type()
-        self.individual_file_paths = []
-        self.collect_files(sorting_key=sorting_key)
-        self.load_files(only_metadata)
-        if not only_metadata:
-            if self.file_type == DataFile:
-                self.df_list = self.get_dataframes()
-                self.df = self.get_collection_dataframe(self.df_list)
-            if self.file_type == CnvFile:
-                self.data_meta_info = self.get_data_table_meta_info()
-            self.sensor_data = get_unique_sensor_data(
-                [file.sensors for file in self.data]
-            )
-    def __str__(self):
-        return "/n".join(self.data)
-    def extract_file_type(self):
-        """ """
-        mapping_suffix_to_type = {
-            "cnv": CnvFile,
-            "btl": BottleFile,
-            "bl": BottleLogFile,
-        }
-        for key, value in mapping_suffix_to_type.items():
-            if key == self.file_suffix:
-                self.file_type = value
-                break
-            else:
-                self.file_type = DataFile
-    def collect_files(
-        self,
-        sorting_key: Callable | None = lambda file: int(
-            file.stem.split("_")[3]
-        ),
-    ):
-        """ """
-        self.individual_file_paths = sorted(
-            self.path_to_files.rglob(f"*{self.file_suffix}"),
-            key=sorting_key,
-        )
-    def load_files(self, only_metadata: bool = False):
-        """ """
-        for file in self.individual_file_paths:
-            try:
-                self.data.append(self.file_type(file))
-            except TypeError:
-                logger.error(
-                    f"Could not open file {file} with the type "
-                    f"{self.file_type}."
-                )
-                continue
-    def get_dataframes(
-        self,
-        event_log: bool = False,
-        coordinates: bool = False,
-        time_correction: bool = False,
-        cast_identifier: bool = False,
-        long_header_names: bool = False,
-        full_data_header: bool = True,
-    ) -> list[pd.DataFrame]:
-        """
-        Parameters
-        ----------
-        event_log: bool :
-             (Default value = False)
-        coordinates: bool :
-             (Default value = False)
-        time_correction: bool :
-             (Default value = False)
-        cast_identifier: bool :
-             (Default value = False)
-        long_header_names: bool :
-             (Default value = False)
-        full_data_header: bool :
-             (Default value = True)
-        Returns
-        -------
-        """
-        for index, file in enumerate(self.data):
-            if full_data_header:
-                file.rename_dataframe_header(header_detail_level="longinfo")
-            elif long_header_names:
-                file.rename_dataframe_header(header_detail_level="name")
-            if event_log:
-                file.add_station_and_event_column()
-            if coordinates:
-                file.add_position_columns()
-            if time_correction:
-                file.absolute_time_calculation()
-                file.add_start_time()
-            if cast_identifier:
-                file.add_cast_number(index + 1)
-        return [file.df for file in self.data]
-    def get_collection_dataframe(
-        self, list_of_dfs: list[pd.DataFrame] | None = None
-    ) -> pd.DataFrame:
-        """
-        Parameters
-        ----------
-        list_of_dfs: list[pd.DataFrame] | None :
-             (Default value = None)
-        Returns
-        -------
-        """
-        if not list_of_dfs:
-            list_of_dfs = self.get_dataframes()
-        df = pd.concat(list_of_dfs, ignore_index=True)
-        # df.meta.metadata = list_of_dfs[0].meta.metadata
-        return df
-    def tidy_collection_dataframe(self, df: pd.DataFrame) -> pd.DataFrame:
-        """
-        Parameters
-        ----------
-        df: pd.DataFrame :
-        Returns
-        -------
-        """
-        df = self.use_bad_flag_for_nan(df)
-        df = self.set_dtype_to_float(df)
-        return self.select_real_scan_data(df)
-    def use_bad_flag_for_nan(self, df: pd.DataFrame) -> pd.DataFrame:
-        """
-        Parameters
-        ----------
-        df: pd.DataFrame :
-        Returns
-        -------
-        """
-        bad_flags = set()
-        for file in self.data:
-            for line in file.data_table_description:
-                if line.startswith("bad_flag"):
-                    flag = line.split("=")[1].strip()
-                    bad_flags.add(flag)
-        for flag in bad_flags:
-            df.replace(to_replace=flag, value=np.nan, inplace=True)
-        return df
-    def set_dtype_to_float(self, df: pd.DataFrame) -> pd.DataFrame:
-        """
-        Parameters
-        ----------
-        df: pd.DataFrame :
-        Returns
-        -------
-        """
-        for parameter in df.columns:
-            if parameter in ["datetime"]:
-                continue
-            try:
-                df[parameter] = df[parameter].astype("float")
-            finally:
-                continue
-        return df
-    def select_real_scan_data(self, df: pd.DataFrame) -> pd.DataFrame:
-        """
-        Parameters
-        ----------
-        df: pd.DataFrame :
-        Returns
-        -------
-        """
-        # TODO: fix this hardcoded name
-        try:
-            df = df.loc[df["Scan Count"].notna()]
-        finally:
-            pass
-        return df
-    def to_csv(self, file_name):
-        """
-        Parameters
-        ----------
-        file_name :
-        Returns
-        -------
-        """
-        self.get_collection_dataframe().to_csv(file_name)
-    def get_data_table_meta_info(self) -> list[list[dict]]:
-        """ """
-        return [file.parameters.metadata for file in self.data]

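The clean-up chain in `tidy_collection_dataframe` can be sketched with plain pandas as follows. That chain replaces bad flags with NaN, casts columns to float, and keeps only rows that have a scan count. The helper names `parse_bad_flags` and `tidy` are hypothetical. The `bad_flag = <value>` line format is assumed from the `startswith("bad_flag")` / `split("=")` parsing above, and the `"Scan Count"` default mirrors the hardcoded name marked TODO in the removed code. As a design choice, the sketch converts columns first and then replaces the parsed numeric flag. During conversion it catches only `ValueError` and `TypeError`, whereas a `continue` inside `finally`, as in the removed `set_dtype_to_float`, discards any exception. It also skips the scan-count filter when that column is absent.

```python
import numpy as np
import pandas as pd


def parse_bad_flags(header_lines: list[str]) -> set[float]:
    """Collect numeric bad-flag values from lines like "bad_flag = -9.990e-29"."""
    flags: set[float] = set()
    for line in header_lines:
        if line.startswith("bad_flag"):
            flags.add(float(line.split("=", 1)[1].strip()))
    return flags


def tidy(
    df: pd.DataFrame,
    bad_flags: set[float],
    scan_column: str = "Scan Count",
    skip: tuple[str, ...] = ("datetime",),
) -> pd.DataFrame:
    """Cast columns to float, turn bad flags into NaN, keep rows with a scan count."""
    df = df.copy()
    for column in df.columns:
        if column in skip:
            continue
        try:
            df[column] = df[column].astype(float)
        except (ValueError, TypeError):
            pass  # leave columns that are not numeric unchanged
    if bad_flags:
        # Exact float matching assumes the data and the header flag were
        # written with the same text representation.
        df = df.replace(list(bad_flags), np.nan)
    if scan_column in df.columns:
        df = df.loc[df[scan_column].notna()]
    return df
```

For example, `tidy(pd.concat(frames, ignore_index=True), parse_bad_flags(lines))`, with `frames` and `lines` as hypothetical inputs, would follow the concatenate-then-tidy order of the removed methods.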
{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/LICENSE RENAMED

File without changes

{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/src/seabirdfilehandler/__init__.py RENAMED

File without changes

{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/src/seabirdfilehandler/bottlelogfile.py RENAMED

File without changes

{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/src/seabirdfilehandler/geomar_ctd_file_parser.py RENAMED

File without changes

{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/src/seabirdfilehandler/utils.py RENAMED

File without changes

{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/src/seabirdfilehandler/validation_modules.py RENAMED

File without changes

{seabirdfilehandler-0.5.2 → seabirdfilehandler-0.5.4}/src/seabirdfilehandler/xmlfiles.py RENAMED

File without changes