pydartdiags 0.0.42__tar.gz → 0.0.43__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of pydartdiags might be problematic. Click here for more details.
- pydartdiags-0.0.43/PKG-INFO +45 -0
- pydartdiags-0.0.43/README.md +24 -0
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/pyproject.toml +1 -1
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/src/pydartdiags/obs_sequence/obs_sequence.py +127 -63
- pydartdiags-0.0.43/src/pydartdiags/plots/plots.py +339 -0
- pydartdiags-0.0.43/src/pydartdiags.egg-info/PKG-INFO +45 -0
- pydartdiags-0.0.43/tests/test_obs_sequence.py +225 -0
- pydartdiags-0.0.42/PKG-INFO +0 -404
- pydartdiags-0.0.42/README.md +0 -383
- pydartdiags-0.0.42/src/pydartdiags/plots/plots.py +0 -161
- pydartdiags-0.0.42/src/pydartdiags.egg-info/PKG-INFO +0 -404
- pydartdiags-0.0.42/tests/test_obs_sequence.py +0 -87
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/LICENSE +0 -0
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/setup.cfg +0 -0
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/setup.py +0 -0
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/src/pydartdiags/__init__.py +0 -0
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/src/pydartdiags/obs_sequence/__init__.py +0 -0
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/src/pydartdiags/plots/__init__.py +0 -0
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/src/pydartdiags.egg-info/SOURCES.txt +0 -0
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/src/pydartdiags.egg-info/dependency_links.txt +0 -0
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/src/pydartdiags.egg-info/requires.txt +0 -0
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/src/pydartdiags.egg-info/top_level.txt +0 -0
- {pydartdiags-0.0.42 → pydartdiags-0.0.43}/tests/test_plots.py +0 -0
|
@@ -0,0 +1,45 @@
|
|
|
1
|
+
Metadata-Version: 2.1
|
|
2
|
+
Name: pydartdiags
|
|
3
|
+
Version: 0.0.43
|
|
4
|
+
Summary: Observation Sequence Diagnostics for DART
|
|
5
|
+
Home-page: https://github.com/NCAR/pyDARTdiags.git
|
|
6
|
+
Author: Helen Kershaw
|
|
7
|
+
Author-email: Helen Kershaw <hkershaw@ucar.edu>
|
|
8
|
+
Project-URL: Homepage, https://github.com/NCAR/pyDARTdiags.git
|
|
9
|
+
Project-URL: Issues, https://github.com/NCAR/pyDARTdiags/issues
|
|
10
|
+
Project-URL: Documentation, https://ncar.github.io/pyDARTdiags
|
|
11
|
+
Classifier: Programming Language :: Python :: 3
|
|
12
|
+
Classifier: License :: OSI Approved :: Apache Software License
|
|
13
|
+
Classifier: Operating System :: OS Independent
|
|
14
|
+
Requires-Python: >=3.8
|
|
15
|
+
Description-Content-Type: text/markdown
|
|
16
|
+
License-File: LICENSE
|
|
17
|
+
Requires-Dist: pandas>=2.2.0
|
|
18
|
+
Requires-Dist: numpy>=1.26
|
|
19
|
+
Requires-Dist: plotly>=5.22.0
|
|
20
|
+
Requires-Dist: pyyaml>=6.0.2
|
|
21
|
+
|
|
22
|
+
[](https://opensource.org/licenses/Apache-2.0)
|
|
23
|
+
[](https://codecov.io/gh/NCAR/pyDARTdiags)
|
|
24
|
+
[](https://pypi.org/project/pydartdiags/)
|
|
25
|
+
|
|
26
|
+
|
|
27
|
+
# pyDARTdiags
|
|
28
|
+
|
|
29
|
+
pyDARTdiags is a Python library for observation space diagnostics for the Data Assimilation Research Testbed ([DART](https://github.com/NCAR/DART)).
|
|
30
|
+
|
|
31
|
+
pyDARTdiags is under initial development, so please use caution.
|
|
32
|
+
The MATLAB [observation space diagnostics](https://docs.dart.ucar.edu/en/latest/guide/matlab-observation-space.html) are available through [DART](https://github.com/NCAR/DART).
|
|
33
|
+
|
|
34
|
+
|
|
35
|
+
pyDARTdiags can be installed through pip: https://pypi.org/project/pydartdiags/
|
|
36
|
+
Documentation : https://ncar.github.io/pyDARTdiags/
|
|
37
|
+
|
|
38
|
+
## Contributing
|
|
39
|
+
Contributions are welcome! If you have a feature request, bug report, or a suggestion, please open an issue on our GitHub repository.
|
|
40
|
+
Please read our [Contributors Guide](https://github.com/NCAR/pyDARTdiags/blob/main/CONTRIBUTING.md) if you would like to contribute to
|
|
41
|
+
pyDARTdiags.
|
|
42
|
+
|
|
43
|
+
## License
|
|
44
|
+
|
|
45
|
+
pyDARTdiags is released under the Apache License 2.0. For more details, see the LICENSE file in the root directory of this source tree or visit [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
|
|
@@ -0,0 +1,24 @@
|
|
|
1
|
+
[](https://opensource.org/licenses/Apache-2.0)
|
|
2
|
+
[](https://codecov.io/gh/NCAR/pyDARTdiags)
|
|
3
|
+
[](https://pypi.org/project/pydartdiags/)
|
|
4
|
+
|
|
5
|
+
|
|
6
|
+
# pyDARTdiags
|
|
7
|
+
|
|
8
|
+
pyDARTdiags is a Python library for observation space diagnostics for the Data Assimilation Research Testbed ([DART](https://github.com/NCAR/DART)).
|
|
9
|
+
|
|
10
|
+
pyDARTdiags is under initial development, so please use caution.
|
|
11
|
+
The MATLAB [observation space diagnostics](https://docs.dart.ucar.edu/en/latest/guide/matlab-observation-space.html) are available through [DART](https://github.com/NCAR/DART).
|
|
12
|
+
|
|
13
|
+
|
|
14
|
+
pyDARTdiags can be installed through pip: https://pypi.org/project/pydartdiags/
|
|
15
|
+
Documentation : https://ncar.github.io/pyDARTdiags/
|
|
16
|
+
|
|
17
|
+
## Contributing
|
|
18
|
+
Contributions are welcome! If you have a feature request, bug report, or a suggestion, please open an issue on our GitHub repository.
|
|
19
|
+
Please read our [Contributors Guide](https://github.com/NCAR/pyDARTdiags/blob/main/CONTRIBUTING.md) if you would like to contribute to
|
|
20
|
+
pyDARTdiags.
|
|
21
|
+
|
|
22
|
+
## License
|
|
23
|
+
|
|
24
|
+
pyDARTdiags is released under the Apache License 2.0. For more details, see the LICENSE file in the root directory of this source tree or visit [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
|
|
@@ -5,6 +5,23 @@ import os
|
|
|
5
5
|
import yaml
|
|
6
6
|
import struct
|
|
7
7
|
|
|
8
|
+
def requires_assimilation_info(func):
|
|
9
|
+
def wrapper(self, *args, **kwargs):
|
|
10
|
+
if self.has_assimilation_info:
|
|
11
|
+
return func(self, *args, **kwargs)
|
|
12
|
+
else:
|
|
13
|
+
raise ValueError("Assimilation information is required to call this function.")
|
|
14
|
+
return wrapper
|
|
15
|
+
|
|
16
|
+
def requires_posterior_info(func):
|
|
17
|
+
def wrapper(self, *args, **kwargs):
|
|
18
|
+
if self.has_posterior_info:
|
|
19
|
+
return func(self, *args, **kwargs)
|
|
20
|
+
else:
|
|
21
|
+
raise ValueError("Posterior information is required to call this function.")
|
|
22
|
+
return wrapper
|
|
23
|
+
|
|
24
|
+
|
|
8
25
|
class obs_sequence:
|
|
9
26
|
"""Create an obs_sequence object from an ascii observation sequence file.
|
|
10
27
|
|
|
@@ -59,6 +76,8 @@ class obs_sequence:
|
|
|
59
76
|
|
|
60
77
|
def __init__(self, file, synonyms=None):
|
|
61
78
|
self.loc_mod = 'None'
|
|
79
|
+
self.has_assimilation_info = False
|
|
80
|
+
self.has_posterior = False
|
|
62
81
|
self.file = file
|
|
63
82
|
self.synonyms_for_obs = ['NCEP BUFR observation',
|
|
64
83
|
'AIRS observation',
|
|
@@ -72,6 +91,17 @@ class obs_sequence:
|
|
|
72
91
|
else:
|
|
73
92
|
self.synonyms_for_obs.append(synonyms)
|
|
74
93
|
|
|
94
|
+
if file is None:
|
|
95
|
+
# Early exit for testing purposes
|
|
96
|
+
self.df = pd.DataFrame()
|
|
97
|
+
self.types = {}
|
|
98
|
+
self.reverse_types = {}
|
|
99
|
+
self.copie_names = []
|
|
100
|
+
self.n_copies = 0
|
|
101
|
+
self.seq = []
|
|
102
|
+
self.all_obs = []
|
|
103
|
+
return
|
|
104
|
+
|
|
75
105
|
module_dir = os.path.dirname(__file__)
|
|
76
106
|
self.default_composite_types = os.path.join(module_dir,"composite_types.yaml")
|
|
77
107
|
|
|
@@ -103,11 +133,16 @@ class obs_sequence:
|
|
|
103
133
|
self.synonyms_for_obs = [synonym.replace(' ', '_') for synonym in self.synonyms_for_obs]
|
|
104
134
|
rename_dict = {old: 'observation' for old in self.synonyms_for_obs if old in self.df.columns}
|
|
105
135
|
self.df = self.df.rename(columns=rename_dict)
|
|
136
|
+
|
|
106
137
|
# calculate bias and sq_err is the obs_seq is an obs_seq.final
|
|
107
138
|
if 'prior_ensemble_mean'.casefold() in map(str.casefold, self.columns):
|
|
108
|
-
self.
|
|
109
|
-
self.df['
|
|
110
|
-
|
|
139
|
+
self.has_assimilation_info = True
|
|
140
|
+
self.df['prior_bias'] = (self.df['prior_ensemble_mean'] - self.df['observation'])
|
|
141
|
+
self.df['prior_sq_err'] = self.df['prior_bias']**2 # squared error
|
|
142
|
+
if 'posterior_ensemble_mean'.casefold() in map(str.casefold, self.columns):
|
|
143
|
+
self.has_posterior_info = True
|
|
144
|
+
self.df['posterior_bias'] = (self.df['posterior_ensemble_mean'] - self.df['observation'])
|
|
145
|
+
self.df['posterior_sq_err'] = self.df['posterior_bias']**2
|
|
111
146
|
|
|
112
147
|
def create_all_obs(self):
|
|
113
148
|
""" steps through the generator to create a
|
|
@@ -152,14 +187,38 @@ class obs_sequence:
|
|
|
152
187
|
data.append(self.types[type_value]) # observation type
|
|
153
188
|
|
|
154
189
|
# any observation specific obs def info is between here and the end of the list
|
|
190
|
+
# can be obs_def & external forward operator
|
|
191
|
+
metadata = obs[typeI+2:-2]
|
|
192
|
+
obs_def_metadata, external_metadata = self.split_metadata(metadata)
|
|
193
|
+
data.append(obs_def_metadata)
|
|
194
|
+
data.append(external_metadata)
|
|
195
|
+
|
|
155
196
|
time = obs[-2].split()
|
|
156
197
|
data.append(int(time[0])) # seconds
|
|
157
198
|
data.append(int(time[1])) # days
|
|
158
199
|
data.append(convert_dart_time(int(time[0]), int(time[1]))) # datetime # HK todo what is approprate for 1d models?
|
|
159
200
|
data.append(float(obs[-1])) # obs error variance ?convert to sd?
|
|
160
|
-
|
|
201
|
+
|
|
161
202
|
return data
|
|
162
203
|
|
|
204
|
+
@staticmethod
|
|
205
|
+
def split_metadata(metadata):
|
|
206
|
+
"""
|
|
207
|
+
Split the metadata list at the first occurrence of an element starting with 'external_FO'.
|
|
208
|
+
|
|
209
|
+
Args:
|
|
210
|
+
metadata (list of str): The metadata list to be split.
|
|
211
|
+
|
|
212
|
+
Returns:
|
|
213
|
+
tuple: Two sublists, the first containing elements before 'external_FO', and the second
|
|
214
|
+
containing 'external_FO' and all elements after it. If 'external_FO' is not found,
|
|
215
|
+
the first sublist contains the entire metadata list, and the second is empty.
|
|
216
|
+
"""
|
|
217
|
+
for i, item in enumerate(metadata):
|
|
218
|
+
if item.startswith('external_FO'):
|
|
219
|
+
return metadata[:i], metadata[i:]
|
|
220
|
+
return metadata, []
|
|
221
|
+
|
|
163
222
|
def list_to_obs(self, data):
|
|
164
223
|
obs = []
|
|
165
224
|
obs.append('OBS ' + str(data[0])) # obs_num lots of space
|
|
@@ -171,10 +230,16 @@ class obs_sequence:
|
|
|
171
230
|
obs.append(' '.join(map(str, data[self.n_copies+2:self.n_copies+5])) + ' ' + str(self.reversed_vert[data[self.n_copies+5]]) ) # location x, y, z, vert
|
|
172
231
|
obs.append('kind') # this is type of observation
|
|
173
232
|
obs.append(self.reverse_types[data[self.n_copies + 6]]) # observation type
|
|
233
|
+
# Convert metadata to a string and append
|
|
234
|
+
obs.extend(data[self.n_copies + 7]) # metadata
|
|
174
235
|
elif self.loc_mod == 'loc1d':
|
|
175
236
|
obs.append(data[self.n_copies+2]) # 1d location
|
|
176
237
|
obs.append('kind') # this is type of observation
|
|
177
238
|
obs.append(self.reverse_types[data[self.n_copies + 3]]) # observation type
|
|
239
|
+
# Convert metadata to a string and append
|
|
240
|
+
metadata = ' '.join(map(str, data[self.n_copies + 4:-4]))
|
|
241
|
+
if metadata:
|
|
242
|
+
obs.append(metadata) # metadata
|
|
178
243
|
obs.append(' '.join(map(str, data[-4:-2]))) # seconds, days
|
|
179
244
|
obs.append(data[-1]) # obs error variance
|
|
180
245
|
|
|
@@ -273,12 +338,70 @@ class obs_sequence:
|
|
|
273
338
|
elif self.loc_mod == 'loc1d':
|
|
274
339
|
heading.append('location')
|
|
275
340
|
heading.append('type')
|
|
341
|
+
heading.append('metadata')
|
|
342
|
+
heading.append('external_FO')
|
|
276
343
|
heading.append('seconds')
|
|
277
344
|
heading.append('days')
|
|
278
345
|
heading.append('time')
|
|
279
346
|
heading.append('obs_err_var')
|
|
280
347
|
return heading
|
|
281
348
|
|
|
349
|
+
@requires_assimilation_info
|
|
350
|
+
def select_by_dart_qc(self, dart_qc):
|
|
351
|
+
"""
|
|
352
|
+
Selects rows from a DataFrame based on the DART quality control flag.
|
|
353
|
+
|
|
354
|
+
Parameters:
|
|
355
|
+
df (DataFrame): A pandas DataFrame.
|
|
356
|
+
dart_qc (int): The DART quality control flag to select.
|
|
357
|
+
|
|
358
|
+
Returns:
|
|
359
|
+
DataFrame: A DataFrame containing only the rows with the specified DART quality control flag.
|
|
360
|
+
|
|
361
|
+
Raises:
|
|
362
|
+
ValueError: If the DART quality control flag is not present in the DataFrame.
|
|
363
|
+
"""
|
|
364
|
+
if dart_qc not in self.df['DART_quality_control'].unique():
|
|
365
|
+
raise ValueError(f"DART quality control flag '{dart_qc}' not found in DataFrame.")
|
|
366
|
+
else:
|
|
367
|
+
return self.df[self.df['DART_quality_control'] == dart_qc]
|
|
368
|
+
|
|
369
|
+
@requires_assimilation_info
|
|
370
|
+
def select_failed_qcs(self):
|
|
371
|
+
"""
|
|
372
|
+
Select rows from the DataFrame where the DART quality control flag is greater than 0.
|
|
373
|
+
|
|
374
|
+
Returns:
|
|
375
|
+
pandas.DataFrame: A DataFrame containing only the rows with a DART quality control flag greater than 0.
|
|
376
|
+
"""
|
|
377
|
+
return self.df[self.df['DART_quality_control'] > 0]
|
|
378
|
+
|
|
379
|
+
@requires_assimilation_info
|
|
380
|
+
def possible_vs_used(self):
|
|
381
|
+
"""
|
|
382
|
+
Calculates the count of possible vs. used observations by type.
|
|
383
|
+
|
|
384
|
+
This function takes a DataFrame containing observation data, including a 'type' column for the observation
|
|
385
|
+
type and an 'observation' column. The number of used observations ('used'), is the total number
|
|
386
|
+
minus the observations that failed quality control checks (as determined by the `select_failed_qcs` function).
|
|
387
|
+
The result is a DataFrame with each observation type, the count of possible observations, and the count of
|
|
388
|
+
used observations.
|
|
389
|
+
|
|
390
|
+
Returns:
|
|
391
|
+
pd.DataFrame: A DataFrame with three columns: 'type', 'possible', and 'used'. 'type' is the observation type,
|
|
392
|
+
'possible' is the count of all observations of that type, and 'used' is the count of observations of that type
|
|
393
|
+
that passed quality control checks.
|
|
394
|
+
"""
|
|
395
|
+
possible = self.df.groupby('type')['observation'].count()
|
|
396
|
+
possible.rename('possible', inplace=True)
|
|
397
|
+
|
|
398
|
+
failed_qcs = self.select_failed_qcs().groupby('type')['observation'].count()
|
|
399
|
+
used = possible - failed_qcs.reindex(possible.index, fill_value=0)
|
|
400
|
+
used.rename('used', inplace=True)
|
|
401
|
+
|
|
402
|
+
return pd.concat([possible, used], axis=1).reset_index()
|
|
403
|
+
|
|
404
|
+
|
|
282
405
|
@staticmethod
|
|
283
406
|
def is_binary(file):
|
|
284
407
|
"""Check if a file is binary file."""
|
|
@@ -659,65 +782,6 @@ def convert_dart_time(seconds, days):
|
|
|
659
782
|
"""
|
|
660
783
|
time = dt.datetime(1601,1,1) + dt.timedelta(days=days, seconds=seconds)
|
|
661
784
|
return time
|
|
662
|
-
|
|
663
|
-
def select_by_dart_qc(df, dart_qc):
|
|
664
|
-
"""
|
|
665
|
-
Selects rows from a DataFrame based on the DART quality control flag.
|
|
666
|
-
|
|
667
|
-
Parameters:
|
|
668
|
-
df (DataFrame): A pandas DataFrame.
|
|
669
|
-
dart_qc (int): The DART quality control flag to select.
|
|
670
|
-
|
|
671
|
-
Returns:
|
|
672
|
-
DataFrame: A DataFrame containing only the rows with the specified DART quality control flag.
|
|
673
|
-
|
|
674
|
-
Raises:
|
|
675
|
-
ValueError: If the DART quality control flag is not present in the DataFrame.
|
|
676
|
-
"""
|
|
677
|
-
if dart_qc not in df['DART_quality_control'].unique():
|
|
678
|
-
raise ValueError(f"DART quality control flag '{dart_qc}' not found in DataFrame.")
|
|
679
|
-
else:
|
|
680
|
-
return df[df['DART_quality_control'] == dart_qc]
|
|
681
|
-
|
|
682
|
-
def select_failed_qcs(df):
|
|
683
|
-
"""
|
|
684
|
-
Selects rows from a DataFrame where the DART quality control flag is greater than 0.
|
|
685
|
-
|
|
686
|
-
Parameters:
|
|
687
|
-
df (DataFrame): A pandas DataFrame.
|
|
688
|
-
|
|
689
|
-
Returns:
|
|
690
|
-
DataFrame: A DataFrame containing only the rows with a DART quality control flag greater than 0.
|
|
691
|
-
"""
|
|
692
|
-
return df[df['DART_quality_control'] > 0]
|
|
693
|
-
|
|
694
|
-
def possible_vs_used(df):
|
|
695
|
-
"""
|
|
696
|
-
Calculates the count of possible vs. used observations by type.
|
|
697
|
-
|
|
698
|
-
This function takes a DataFrame containing observation data, including a 'type' column for the observation
|
|
699
|
-
type and an 'observation' column. The number of used observations ('used'), is the total number
|
|
700
|
-
minus the observations that failed quality control checks (as determined by the `select_failed_qcs` function).
|
|
701
|
-
The result is a DataFrame with each observation type, the count of possible observations, and the count of
|
|
702
|
-
used observations.
|
|
703
|
-
|
|
704
|
-
Parameters:
|
|
705
|
-
df (pd.DataFrame): A DataFrame with at least two columns: 'type' for the observation type and 'observation'
|
|
706
|
-
for the observation data. It may also contain other columns required by the `select_failed_qcs` function
|
|
707
|
-
to determine failed quality control checks.
|
|
708
|
-
|
|
709
|
-
Returns:
|
|
710
|
-
pd.DataFrame: A DataFrame with three columns: 'type', 'possible', and 'used'. 'type' is the observation type,
|
|
711
|
-
'possible' is the count of all observations of that type, and 'used' is the count of observations of that type
|
|
712
|
-
that passed quality control checks.
|
|
713
|
-
|
|
714
|
-
"""
|
|
715
|
-
possible = df.groupby('type')['observation'].count()
|
|
716
|
-
possible.rename('possible', inplace=True)
|
|
717
|
-
used = df.groupby('type')['observation'].count() - select_failed_qcs(df).groupby('type')['observation'].count()
|
|
718
|
-
used.rename('used', inplace=True)
|
|
719
|
-
return pd.concat([possible, used], axis=1).reset_index()
|
|
720
|
-
|
|
721
785
|
|
|
722
786
|
def construct_composit(df_comp, composite, components):
|
|
723
787
|
"""
|
|
@@ -0,0 +1,339 @@
|
|
|
1
|
+
|
|
2
|
+
import numpy as np
|
|
3
|
+
import plotly.express as px
|
|
4
|
+
import plotly.graph_objects as go
|
|
5
|
+
import pandas as pd
|
|
6
|
+
|
|
7
|
+
def plot_rank_histogram(df):
|
|
8
|
+
"""
|
|
9
|
+
Plots a rank histogram colored by observation type.
|
|
10
|
+
|
|
11
|
+
All histogram bars are initialized to be hidden and can be toggled visible in the plot's legend
|
|
12
|
+
"""
|
|
13
|
+
_, _, df_hist = calculate_rank(df)
|
|
14
|
+
fig = px.histogram(df_hist, x='rank', color='obstype', title='Histogram Colored by obstype')
|
|
15
|
+
for trace in fig.data:
|
|
16
|
+
trace.visible = 'legendonly'
|
|
17
|
+
fig.show()
|
|
18
|
+
|
|
19
|
+
|
|
20
|
+
def calculate_rank(df):
|
|
21
|
+
"""
|
|
22
|
+
Calculate the rank of observations within an ensemble.
|
|
23
|
+
|
|
24
|
+
This function takes a DataFrame containing ensemble predictions and observed values,
|
|
25
|
+
adds sampling noise to the ensemble predictions, and calculates the rank of the observed
|
|
26
|
+
value within the perturbed ensemble for each observation. The rank indicates the position
|
|
27
|
+
of the observed value within the sorted ensemble values, with 1 being the lowest. If the
|
|
28
|
+
observed value is larger than the largest ensemble member, its rank is set to the ensemble
|
|
29
|
+
size plus one.
|
|
30
|
+
|
|
31
|
+
Parameters:
|
|
32
|
+
df (pd.DataFrame): A DataFrame with columns for mean, standard deviation, observed values,
|
|
33
|
+
ensemble size, and observation type. The DataFrame should have one row per observation.
|
|
34
|
+
|
|
35
|
+
Returns:
|
|
36
|
+
tuple: A tuple containing the rank array, ensemble size, and a result DataFrame. The result
|
|
37
|
+
DataFrame contains columns for 'rank' and 'obstype'.
|
|
38
|
+
"""
|
|
39
|
+
ensemble_values = df.filter(regex='prior_ensemble_member').to_numpy().copy()
|
|
40
|
+
std_dev = np.sqrt(df['obs_err_var']).to_numpy()
|
|
41
|
+
obsvalue = df['observation'].to_numpy()
|
|
42
|
+
obstype = df['type'].to_numpy()
|
|
43
|
+
ens_size = ensemble_values.shape[1]
|
|
44
|
+
mean = 0.0 # mean of the sampling noise
|
|
45
|
+
rank = np.zeros(obsvalue.shape[0], dtype=int)
|
|
46
|
+
|
|
47
|
+
for obs in range(ensemble_values.shape[0]):
|
|
48
|
+
sampling_noise = np.random.normal(mean, std_dev[obs], ens_size)
|
|
49
|
+
ensemble_values[obs] += sampling_noise
|
|
50
|
+
ensemble_values[obs].sort()
|
|
51
|
+
for i, ens in enumerate(ensemble_values[obs]):
|
|
52
|
+
if obsvalue[obs] <= ens:
|
|
53
|
+
rank[obs] = i + 1
|
|
54
|
+
break
|
|
55
|
+
|
|
56
|
+
if rank[obs] == 0: # observation is larger than largest ensemble member
|
|
57
|
+
rank[obs] = ens_size + 1
|
|
58
|
+
|
|
59
|
+
result_df = pd.DataFrame({
|
|
60
|
+
'rank': rank,
|
|
61
|
+
'obstype': obstype
|
|
62
|
+
})
|
|
63
|
+
|
|
64
|
+
return (rank, ens_size, result_df)
|
|
65
|
+
|
|
66
|
+
def plot_profile(df, levels, verticalUnit = "pressure (Pa)"):
|
|
67
|
+
"""
|
|
68
|
+
Plots RMSE, bias, and total spread profiles for different observation types across specified vertical levels.
|
|
69
|
+
|
|
70
|
+
This function takes a DataFrame containing observational data and model predictions, categorizes
|
|
71
|
+
the data into specified vertical levels, and calculates the RMSE, bias and total spread for each level and
|
|
72
|
+
observation type. It then plots three line charts: one for RMSE, one for bias, one for total spread, as functions
|
|
73
|
+
of vertical level. The vertical levels are plotted on the y-axis in reversed order to represent
|
|
74
|
+
the vertical profile in the atmosphere correctly if the vertical units are pressure.
|
|
75
|
+
|
|
76
|
+
Parameters:
|
|
77
|
+
df (pd.DataFrame): The input DataFrame containing at least the 'vertical' column for vertical levels,
|
|
78
|
+
the vert_unit column, and other columns required by the `rmse_bias` function for calculating RMSE and
|
|
79
|
+
Bias.
|
|
80
|
+
levels (array-like): The bin edges for categorizing the 'vertical' column values into the desired
|
|
81
|
+
vertical levels.
|
|
82
|
+
verticalUnit (string) (optional): The vertical unit to be used. Only observations in df which have this
|
|
83
|
+
string in the vert_unit column will be plotted. Defaults to 'pressure (Pa)'.
|
|
84
|
+
|
|
85
|
+
Returns:
|
|
86
|
+
tuple: A tuple containing the DataFrame with RMSE, bias and total spread calculations,
|
|
87
|
+
The DataFrame includes a 'vlevels' column representing the categorized vertical levels
|
|
88
|
+
and 'midpoint' column representing the midpoint of each vertical level bin. And the three figures.
|
|
89
|
+
|
|
90
|
+
Raises:
|
|
91
|
+
ValueError: If there are missing values in the 'vertical' column of the input DataFrame.
|
|
92
|
+
ValueError: If none of the input obs have 'verticalUnit' in the 'vert_unit' column of the input DataFrame.
|
|
93
|
+
|
|
94
|
+
Note:
|
|
95
|
+
- The function modifies the input DataFrame by adding 'vlevels' and 'midpoint' columns.
|
|
96
|
+
- The 'midpoint' values are calculated as half the midpoint of each vertical level bin, which may need
|
|
97
|
+
adjustment based on the specific requirements for vertical level representation.
|
|
98
|
+
- The plots are generated using Plotly Express and are displayed inline. The y-axis of the plots is
|
|
99
|
+
reversed to align with standard atmospheric pressure level representation if the vertical units
|
|
100
|
+
are atmospheric pressure.
|
|
101
|
+
"""
|
|
102
|
+
|
|
103
|
+
pd.options.mode.copy_on_write = True
|
|
104
|
+
if df['vertical'].isnull().values.any(): # what about horizontal observations?
|
|
105
|
+
raise ValueError("Missing values in 'vertical' column.")
|
|
106
|
+
elif verticalUnit not in df['vert_unit'].values:
|
|
107
|
+
raise ValueError("No obs with expected vertical unit '"+verticalUnit+"'.")
|
|
108
|
+
else:
|
|
109
|
+
df = df[df["vert_unit"].isin({verticalUnit})] # Subset to only rows with the correct vertical unit
|
|
110
|
+
df.loc[:,'vlevels'] = pd.cut(df['vertical'], levels)
|
|
111
|
+
if verticalUnit == "pressure (Pa)":
|
|
112
|
+
df.loc[:,'midpoint'] = df['vlevels'].apply(lambda x: x.mid / 100.) # HK todo units
|
|
113
|
+
else:
|
|
114
|
+
df.loc[:,'midpoint'] = df['vlevels'].apply(lambda x: x.mid)
|
|
115
|
+
|
|
116
|
+
# Calculations
|
|
117
|
+
df_profile_prior = rmse_bias_totalspread(df, phase='prior')
|
|
118
|
+
df_profile_posterior = None
|
|
119
|
+
if 'posterior_ensemble_mean' in df.columns:
|
|
120
|
+
df_profile_posterior = rmse_bias_totalspread(df, phase='posterior')
|
|
121
|
+
|
|
122
|
+
# Merge prior and posterior dataframes
|
|
123
|
+
if df_profile_posterior is not None:
|
|
124
|
+
df_profile = pd.merge(df_profile_prior, df_profile_posterior, on=['midpoint', 'type'], suffixes=('_prior', '_posterior'))
|
|
125
|
+
fig_rmse = plot_profile_prior_post(df_profile, 'rmse', verticalUnit)
|
|
126
|
+
fig_rmse.show()
|
|
127
|
+
fig_bias = plot_profile_prior_post(df_profile, 'bias', verticalUnit)
|
|
128
|
+
fig_bias.show()
|
|
129
|
+
fig_ts = plot_profile_prior_post(df_profile, 'totalspread', verticalUnit)
|
|
130
|
+
fig_ts.show()
|
|
131
|
+
else:
|
|
132
|
+
df_profile = df_profile_prior
|
|
133
|
+
fig_rmse = plot_profile_prior(df_profile, 'rmse', verticalUnit)
|
|
134
|
+
fig_rmse.show()
|
|
135
|
+
fig_bias = plot_profile_prior(df_profile, 'bias', verticalUnit)
|
|
136
|
+
fig_bias.show()
|
|
137
|
+
fig_ts = plot_profile_prior(df_profile, 'totalspread', verticalUnit)
|
|
138
|
+
fig_ts.show()
|
|
139
|
+
|
|
140
|
+
return df_profile, fig_rmse, fig_ts, fig_bias
|
|
141
|
+
|
|
142
|
+
def plot_profile_prior_post(df_profile, stat, verticalUnit):
|
|
143
|
+
"""
|
|
144
|
+
Plots prior and posterior statistics by vertical level for different observation types.
|
|
145
|
+
|
|
146
|
+
Parameters:
|
|
147
|
+
df_profile (pd.DataFrame): DataFrame containing the prior and posterior statistics.
|
|
148
|
+
stat (str): The statistic to plot (e.g., 'rmse', 'bias', 'totalspread').
|
|
149
|
+
verticalUnit (str): The unit of the vertical axis (e.g., 'pressure (Pa)').
|
|
150
|
+
|
|
151
|
+
Returns:
|
|
152
|
+
plotly.graph_objects.Figure: The generated Plotly figure.
|
|
153
|
+
"""
|
|
154
|
+
# Reshape DataFrame to long format for easier plotting
|
|
155
|
+
df_long = pd.melt(
|
|
156
|
+
df_profile,
|
|
157
|
+
id_vars=["midpoint", "type"],
|
|
158
|
+
value_vars=["prior_"+stat, "posterior_"+stat],
|
|
159
|
+
var_name=stat+"_type",
|
|
160
|
+
value_name=stat+"_value"
|
|
161
|
+
)
|
|
162
|
+
|
|
163
|
+
# Define a color mapping for observation each type
|
|
164
|
+
unique_types = df_long["type"].unique()
|
|
165
|
+
colors = px.colors.qualitative.Plotly
|
|
166
|
+
color_mapping = {type_: colors[i % len(colors)] for i, type_ in enumerate(unique_types)}
|
|
167
|
+
|
|
168
|
+
# Create a mapping for line styles based on stat
|
|
169
|
+
line_styles = {"prior_"+stat: "solid", "posterior_"+stat: "dash"}
|
|
170
|
+
|
|
171
|
+
# Create the figure
|
|
172
|
+
fig_stat = go.Figure()
|
|
173
|
+
|
|
174
|
+
# Loop through each type and type to add traces
|
|
175
|
+
for t in df_long["type"].unique():
|
|
176
|
+
for stat_type, dash_style in line_styles.items():
|
|
177
|
+
# Filter the DataFrame for this type and stat
|
|
178
|
+
df_filtered = df_long[(df_long[stat+"_type"] == stat_type) & (df_long["type"] == t)]
|
|
179
|
+
|
|
180
|
+
# Add a trace
|
|
181
|
+
fig_stat.add_trace(go.Scatter(
|
|
182
|
+
x=df_filtered[stat+"_value"],
|
|
183
|
+
y=df_filtered["midpoint"],
|
|
184
|
+
mode='lines+markers',
|
|
185
|
+
name='prior '+t if stat_type == "prior_"+stat else 'post ', # Show legend for "prior_stat OBS TYPE" only
|
|
186
|
+
line=dict(dash=dash_style, color=color_mapping[t]), # Same color for all traces in group
|
|
187
|
+
marker=dict(size=8, color=color_mapping[t]),
|
|
188
|
+
legendgroup=t # Group traces by type
|
|
189
|
+
))
|
|
190
|
+
|
|
191
|
+
# Update layout
|
|
192
|
+
fig_stat.update_layout(
|
|
193
|
+
title= stat+' by Level',
|
|
194
|
+
xaxis_title=stat,
|
|
195
|
+
yaxis_title=verticalUnit,
|
|
196
|
+
width=800,
|
|
197
|
+
height=800,
|
|
198
|
+
template="plotly_white"
|
|
199
|
+
)
|
|
200
|
+
|
|
201
|
+
if verticalUnit == "pressure (Pa)":
|
|
202
|
+
fig_stat.update_yaxes(autorange="reversed")
|
|
203
|
+
|
|
204
|
+
return fig_stat
|
|
205
|
+
|
|
206
|
+
|
|
207
|
+
def plot_profile_prior(df_profile, stat, verticalUnit):
|
|
208
|
+
"""
|
|
209
|
+
Plots prior statistics by vertical level for different observation types.
|
|
210
|
+
|
|
211
|
+
Parameters:
|
|
212
|
+
df_profile (pd.DataFrame): DataFrame containing the prior statistics.
|
|
213
|
+
stat (str): The statistic to plot (e.g., 'rmse', 'bias', 'totalspread').
|
|
214
|
+
verticalUnit (str): The unit of the vertical axis (e.g., 'pressure (Pa)').
|
|
215
|
+
|
|
216
|
+
Returns:
|
|
217
|
+
plotly.graph_objects.Figure: The generated Plotly figure.
|
|
218
|
+
"""
|
|
219
|
+
# Reshape DataFrame to long format for easier plotting - not needed for prior only, but
|
|
220
|
+
# leaving it in for consistency with the plot_profile_prior_post function for now
|
|
221
|
+
df_long = pd.melt(
|
|
222
|
+
df_profile,
|
|
223
|
+
id_vars=["midpoint", "type"],
|
|
224
|
+
value_vars=["prior_"+stat],
|
|
225
|
+
var_name=stat+"_type",
|
|
226
|
+
value_name=stat+"_value"
|
|
227
|
+
)
|
|
228
|
+
|
|
229
|
+
# Define a color mapping for observation each type
|
|
230
|
+
unique_types = df_long["type"].unique()
|
|
231
|
+
colors = px.colors.qualitative.Plotly
|
|
232
|
+
color_mapping = {type_: colors[i % len(colors)] for i, type_ in enumerate(unique_types)}
|
|
233
|
+
|
|
234
|
+
# Create the figure
|
|
235
|
+
fig_stat = go.Figure()
|
|
236
|
+
|
|
237
|
+
# Loop through each type to add traces
|
|
238
|
+
for t in df_long["type"].unique():
|
|
239
|
+
# Filter the DataFrame for this type and stat
|
|
240
|
+
df_filtered = df_long[(df_long["type"] == t)]
|
|
241
|
+
|
|
242
|
+
# Add a trace
|
|
243
|
+
fig_stat.add_trace(go.Scatter(
|
|
244
|
+
x=df_filtered[stat+"_value"],
|
|
245
|
+
y=df_filtered["midpoint"],
|
|
246
|
+
mode='lines+markers',
|
|
247
|
+
name='prior ' + t,
|
|
248
|
+
line=dict(color=color_mapping[t]), # Same color for all traces in group
|
|
249
|
+
marker=dict(size=8, color=color_mapping[t]),
|
|
250
|
+
legendgroup=t # Group traces by type
|
|
251
|
+
))
|
|
252
|
+
|
|
253
|
+
# Update layout
|
|
254
|
+
fig_stat.update_layout(
|
|
255
|
+
title=stat + ' by Level',
|
|
256
|
+
xaxis_title=stat,
|
|
257
|
+
yaxis_title=verticalUnit,
|
|
258
|
+
width=800,
|
|
259
|
+
height=800,
|
|
260
|
+
template="plotly_white"
|
|
261
|
+
)
|
|
262
|
+
|
|
263
|
+
if verticalUnit == "pressure (Pa)":
|
|
264
|
+
fig_stat.update_yaxes(autorange="reversed")
|
|
265
|
+
|
|
266
|
+
return fig_stat
|
|
267
|
+
|
|
268
|
+
|
|
269
|
+
def mean_then_sqrt(x):
|
|
270
|
+
"""
|
|
271
|
+
Calculates the mean of an array-like object and then takes the square root of the result.
|
|
272
|
+
|
|
273
|
+
Parameters:
|
|
274
|
+
arr (array-like): An array-like object (such as a list or a pandas Series).
|
|
275
|
+
The elements should be numeric.
|
|
276
|
+
|
|
277
|
+
Returns:
|
|
278
|
+
float: The square root of the mean of the input array.
|
|
279
|
+
|
|
280
|
+
Raises:
|
|
281
|
+
TypeError: If the input is not an array-like object containing numeric values.
|
|
282
|
+
ValueError: If the input array is empty.
|
|
283
|
+
"""
|
|
284
|
+
|
|
285
|
+
return np.sqrt(np.mean(x))
|
|
286
|
+
|
|
287
|
+
def rmse_bias_totalspread(df, phase='prior'):
|
|
288
|
+
if phase == 'prior':
|
|
289
|
+
sq_err_column = 'prior_sq_err'
|
|
290
|
+
bias_column = 'prior_bias'
|
|
291
|
+
rmse_column = 'prior_rmse'
|
|
292
|
+
spread_column = 'prior_ensemble_spread'
|
|
293
|
+
totalspread_column = 'prior_totalspread'
|
|
294
|
+
elif phase == 'posterior':
|
|
295
|
+
sq_err_column = 'posterior_sq_err'
|
|
296
|
+
bias_column = 'posterior_bias'
|
|
297
|
+
rmse_column = 'posterior_rmse'
|
|
298
|
+
spread_column = 'posterior_ensemble_spread'
|
|
299
|
+
totalspread_column = 'posterior_totalspread'
|
|
300
|
+
else:
|
|
301
|
+
raise ValueError("Invalid phase. Must be 'prior' or 'posterior'.")
|
|
302
|
+
|
|
303
|
+
rmse_bias_ts_df = df.groupby(['midpoint', 'type'], observed=False).agg({
|
|
304
|
+
sq_err_column: mean_then_sqrt,
|
|
305
|
+
bias_column: 'mean',
|
|
306
|
+
spread_column: mean_then_sqrt,
|
|
307
|
+
'obs_err_var': mean_then_sqrt
|
|
308
|
+
}).reset_index()
|
|
309
|
+
|
|
310
|
+
# Add column for totalspread
|
|
311
|
+
rmse_bias_ts_df[totalspread_column] = np.sqrt(rmse_bias_ts_df[spread_column] + rmse_bias_ts_df['obs_err_var'])
|
|
312
|
+
|
|
313
|
+
# Rename square error to root mean square error
|
|
314
|
+
rmse_bias_ts_df.rename(columns={sq_err_column: rmse_column}, inplace=True)
|
|
315
|
+
|
|
316
|
+
return rmse_bias_ts_df
|
|
317
|
+
|
|
318
|
+
def rmse_bias_by_obs_type(df, obs_type):
|
|
319
|
+
"""
|
|
320
|
+
Calculate the RMSE and bias for a given observation type.
|
|
321
|
+
|
|
322
|
+
Parameters:
|
|
323
|
+
df (DataFrame): A pandas DataFrame.
|
|
324
|
+
obs_type (str): The observation type for which to calculate the RMSE and bias.
|
|
325
|
+
|
|
326
|
+
Returns:
|
|
327
|
+
DataFrame: A DataFrame containing the RMSE and bias for the given observation type.
|
|
328
|
+
|
|
329
|
+
Raises:
|
|
330
|
+
ValueError: If the observation type is not present in the DataFrame.
|
|
331
|
+
"""
|
|
332
|
+
if obs_type not in df['type'].unique():
|
|
333
|
+
raise ValueError(f"Observation type '{obs_type}' not found in DataFrame.")
|
|
334
|
+
else:
|
|
335
|
+
obs_type_df = df[df['type'] == obs_type]
|
|
336
|
+
obs_type_agg = obs_type_df.groupby('vlevels', observed=False).agg({'sq_err':mean_then_sqrt, 'bias':'mean'}).reset_index()
|
|
337
|
+
obs_type_agg.rename(columns={'sq_err':'rmse'}, inplace=True)
|
|
338
|
+
return obs_type_agg
|
|
339
|
+
|