ibicus.evaluate module

The evaluate module provides a set of functionalities to assess the performance of your bias correction method.

Bias correction is prone to misuse and requires careful evaluation, as demonstrated and argued in Maraun et al. 2017. In particular, the bias correction methods implemented in this package operate on a marginal level: they correct the distribution of individual variables at individual locations. There is therefore only a subset of climate model biases that these debiasers will be able to correct. Biases in the temporal or spatial structure of climate models, or in the feedbacks to large-scale weather patterns, might not be well corrected.

The evaluate module attempts to provide the user with the functionality to make an informed decision on whether a chosen bias correction method is fit for purpose: whether it corrects marginal as well as spatial and temporal statistical properties in the desired manner, how it modifies the multivariate structure, if and how it modifies the climate change trend, and how it changes the bias in selected climate impact metrics.

There are three components to the evaluation module:

1. Testing assumptions of different debiasers

Different debiasers rely on different assumptions - some are parametric, others non-parametric; some bias correct each day or month of the year separately, others are applied to all days of the year in the same way.

This component is meant to check some of these assumptions and, for example, help the user choose an appropriate function to fit the data to, an appropriate application window (entire year vs. each day or month individually), and rule out debiasers that are not fit for purpose in a specific application.

The current version of this component can analyse the following two questions:

- Is the fit of the default distribution 'good enough', or should a different distribution be used?
- Is there any seasonality in the data that should be accounted for, for example by applying a 'running window mode' (meaning that the bias correction is fitted separately for different parts of the year, i.e. windows)?

The following functions are currently available:

assumptions.calculate_aic(variable, dataset, ...)

Calculates the Akaike Information Criterion (AIC) at each location for each of the distributions specified.

assumptions.plot_aic(variable, aic_values[, ...])

Creates a boxplot of AIC values across all locations.

assumptions.plot_fit_worst_aic(variable, ...)

Plots a histogram and overlaid fit at the location of worst AIC.

assumptions.plot_quantile_residuals(...[, ...])

Plots timeseries and autocorrelation function of quantile residuals, as well as a QQ-plot of normalized quantile residuals at one location.
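
For example, a minimal workflow sketch - dataset names such as tas_obs_validate are placeholders for a 3d [time, lat, long] array of observations, and the candidate distributions are illustrative:

>>> import scipy.stats
>>> aic_values = assumptions.calculate_aic('tas', tas_obs_validate, scipy.stats.norm, scipy.stats.gamma)
>>> assumptions.plot_aic('tas', aic_values)
>>> assumptions.plot_fit_worst_aic('tas', tas_obs_validate, data_type = 'observation data', distribution = scipy.stats.norm, aic_values = aic_values)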

2. Evaluating the bias corrected model on a validation period

In order to assess the performance of a bias correction method, the bias corrected model data has to be compared to observational / reanalysis data. The historical period for which observations exist is therefore split into two datasets in pre-processing - a reference period and a validation period.

There are two types of analysis that the evaluation module enables you to conduct:

  1. Statistical properties: this includes the marginal bias of descriptive statistics such as the mean or the 5th and 95th percentiles, as well as differences in the spatial and multivariate correlation structure.

  2. Threshold metrics: A threshold metric is an instance of the ThresholdMetric class and needs to be one of four types: exceedance of a specified threshold value ('higher'), underceedance of the threshold value ('lower'), falling within two specified bounds ('between'), or falling outside two specified bounds ('outside'). With the functionalities provided as part of the ThresholdMetric class, the marginal exceedance probability as well as the temporal spell length, the spatial extent and the spatiotemporal cluster size can be analysed. Some threshold metrics are pre-specified, and the user can add further metrics in the following way:

>>> frost_days = ThresholdMetric(name="Frost days (tasmin<0°C)", variable="tasmin", threshold_value=273.13, threshold_type="lower")
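
A metric defined by two bounds works analogously - a sketch with illustrative bounds:

>>> mild_days = ThresholdMetric(name="Mild days (10°C<tas<20°C)", variable="tas", threshold_value=[283.15, 293.15], threshold_type="between")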

The following table provides an overview of the different components that can be analysed in each of these two categories:

                    Statistical properties    Threshold metrics
Marginal            x                         x
Temporal                                      x (spell length)
Spatial             x (RMSE)                  x (spatial extent)
Spatiotemporal                                x (cluster size)
Multivariate        x (correlation)           x (joint exceedance)

Within the ThresholdMetric class, the following functions are available:

metrics.ThresholdMetric.calculate_instances_of_threshold_exceedance(dataset)

Returns an array of the same size as dataset containing 1 when the threshold condition is met and 0 when not.

metrics.ThresholdMetric.filter_threshold_exceedances(dataset)

Returns an array containing the values of dataset where the threshold condition is met and zero where not.

metrics.ThresholdMetric.calculate_exceedance_probability(dataset)

Returns the probability of metric occurrence (threshold exceedance/underceedance or inside/outside range) at each location (across the entire time period).

metrics.ThresholdMetric.calculate_number_annual_days_beyond_threshold(...)

Calculates number of days beyond threshold for each year in the dataset.

metrics.ThresholdMetric.calculate_spell_length(...)

Returns a pd.DataFrame of individual spell lengths of metric occurrences (threshold exceedance/underceedance or inside/outside range), counted across locations, for each climate dataset specified in **climate_data.

metrics.ThresholdMetric.calculate_spatial_extent(...)

Returns a pd.DataFrame of spatial extents of metric occurrences (threshold exceedance/underceedance or inside/outside range), for each climate dataset specified in **climate_data.

metrics.ThresholdMetric.calculate_spatiotemporal_clusters(...)

Returns a pd.DataFrame of sizes of individual spatiotemporal clusters of metric occurrences (threshold exceedance/underceedance or inside/outside range), for each climate dataset specified in **climate_data.

metrics.ThresholdMetric.violinplots_clusters(...)

Returns three violinplots with distributions of temporal, spatial and spatiotemporal extents of metric occurrences, comparing all climate datasets specified in **climate_data.

AccumulativeThresholdMetric is a child class of ThresholdMetric that adds functionality for variables and metrics where the total accumulated amount beyond a given threshold is of interest - this is the case for precipitation, for example, but not for temperature. The following functions are added:

metrics.AccumulativeThresholdMetric.calculate_percent_of_total_amount_beyond_threshold(dataset)

Calculates percentage of total amount beyond threshold for each location over all timesteps.

metrics.AccumulativeThresholdMetric.calculate_annual_value_beyond_threshold(...)

Calculates amount beyond threshold for each year in the dataset.

metrics.AccumulativeThresholdMetric.calculate_intensity_index(dataset)

Calculates the amount beyond a threshold divided by the number of instances the threshold is exceeded.
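
Applied to precipitation, this yields the simple precipitation intensity index; a usage sketch with the pre-defined wet_days metric and a placeholder dataset pr_obs_validate:

>>> sdii = wet_days.calculate_intensity_index(pr_obs_validate)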

Functions for the evaluation of marginal properties are provided in the ibicus.evaluate.marginal module.

The following functions are available to analyse the bias in spatial correlation structure:

correlation.rmse_spatial_correlation_distribution(...)

Calculates Root-Mean-Squared-Error between observed and modelled spatial correlation matrix at each location.

correlation.rmse_spatial_correlation_boxplot(...)

Boxplot of RMSE of spatial correlation across locations.
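
The two functions are designed to be chained; a sketch with placeholder dataset names:

>>> tas_rmsd_spatial = correlation.rmse_spatial_correlation_distribution(variable = 'tas', obs_data = tas_obs_validate, raw = tas_cm_validate, ISIMIP = tas_val_debiased_ISIMIP)
>>> correlation.rmse_spatial_correlation_boxplot(variable = 'tas', dataset = tas_rmsd_spatial)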

To analyse the multivariate correlation structure, as well as joint threshold exceedances:

multivariate.calculate_conditional_joint_threshold_exceedance(...)

Returns a pd.DataFrame containing location-wise conditional exceedance probability.

multivariate.plot_conditional_joint_threshold_exceedance(...)

Accepts output given by calculate_conditional_joint_threshold_exceedance() and creates an overview boxplot of the conditional exceedance probability across locations in the chosen datasets.

multivariate.calculate_and_spatialplot_multivariate_correlation(...)

Calculates correlation between the two variables specified in keyword arguments (such as tas and pr) at each location and outputs a spatial plot.

multivariate.plot_correlation_single_location(...)

Uses seaborn.regplot and output of create_multivariate_dataframes() to plot scatterplot and Pearson correlation estimate of the two specified variables.

multivariate.plot_bootstrap_correlation_replicates(...)

Plots histograms of correlation between variables in input dataframes estimated via bootstrap using _calculate_bootstrap_correlation_replicates().
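
For a visual comparison of the correlation structure at a single location, the helper and plot functions can be combined - a sketch with placeholder dataset names:

>>> tas_pr_obs, tas_pr_isimip = multivariate.create_multivariate_dataframes(variables = ['tas', 'pr'], datasets_obs = [tas_obs_validate, pr_obs_validate], datasets_bc = [tas_val_debiased_ISIMIP, pr_val_debiased_ISIMIP], gridpoint = (1, 1))
>>> multivariate.plot_correlation_single_location(variables = ['tas', 'pr'], obs_df = tas_pr_obs, bc_df = tas_pr_isimip)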

3. Investigating whether the climate change trend is preserved

Bias correction methods can significantly modify the trend projected in the climate model simulation (Switanek et al. 2017). If the user does not consider the simulated trend to be credible, then modifying it can be a good thing to do. However, any trend modification should always be a conscious and informed choice, and the belief that a bias correction method will improve the trend should be justified. Otherwise, the trend modification introduced by the application of a bias correction method should be considered an artifact.

This component helps the user assess whether a given method preserves the climate model trend or not. Some methods implemented in this package are explicitly trend-preserving; for more details see the methodologies and descriptions of the individual debiasers.

trend.calculate_future_trend_bias(variable, ...)

For each location, calculates the bias in the trend of the bias corrected model compared to the raw climate model for the following metrics: mean, 5% and 95% quantile (default) as well as metrics passed as arguments to the function.

trend.plot_future_trend_bias_boxplot(...[, ...])

Accepts output given by calculate_future_trend_bias() and creates an overview boxplot of the bias in the trend of metrics.

trend.plot_future_trend_bias_spatial(...[, ...])

Accepts output given by calculate_future_trend_bias() and creates a spatial plot of trend bias for one chosen metric.
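
A sketch of the intended workflow, with placeholder dataset names (calculate_future_trend_bias() is documented in full in the trend module section below):

>>> tas_trend_bias_data = trend.calculate_future_trend_bias(variable = 'tas', raw_validate = tas_cm_validate, raw_future = tas_cm_future, QDM = [tas_val_debiased_QDM, tas_fut_debiased_QDM])
>>> trend.plot_future_trend_bias_boxplot(variable = 'tas', bias_df = tas_trend_bias_data)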

ibicus.evaluate.metrics

Metrics module - Standard metric definitions

class ibicus.evaluate.metrics.AccumulativeThresholdMetric(threshold_value, threshold_type, name='unknown', variable='unknown')

Bases: ibicus.evaluate.metrics.ThresholdMetric

Class for climate metrics that are defined by thresholds (child class of ThresholdMetric), but are accumulative. This mainly concerns precipitation metrics.

An example of such a metric is total precipitation on very wet days (days with > 10 mm of precipitation).

Attributes:
name
threshold_type
threshold_value
variable

Methods

calculate_annual_value_beyond_threshold(...)

Calculates amount beyond threshold for each year in the dataset.

calculate_exceedance_probability(dataset)

Returns the probability of metric occurrence (threshold exceedance/underceedance or inside/outside range) at each location (across the entire time period).

calculate_instances_of_threshold_exceedance(dataset)

Returns an array of the same size as dataset containing 1 when the threshold condition is met and 0 when not.

calculate_intensity_index(dataset)

Calculates the amount beyond a threshold divided by the number of instances the threshold is exceeded.

calculate_number_annual_days_beyond_threshold(...)

Calculates number of days beyond threshold for each year in the dataset.

calculate_percent_of_total_amount_beyond_threshold(dataset)

Calculates percentage of total amount beyond threshold for each location over all timesteps.

calculate_spatial_extent(**climate_data)

Returns a pd.DataFrame of spatial extents of metric occurrences (threshold exceedance/underceedance or inside/outside range), for each climate dataset specified in **climate_data.

calculate_spatiotemporal_clusters(**climate_data)

Returns a pd.DataFrame of sizes of individual spatiotemporal clusters of metric occurrences (threshold exceedance/underceedance or inside/outside range), for each climate dataset specified in **climate_data.

calculate_spell_length(minimum_length, ...)

Returns a pd.DataFrame of individual spell lengths of metric occurrences (threshold exceedance/underceedance or inside/outside range), counted across locations, for each climate dataset specified in **climate_data.

filter_threshold_exceedances(dataset)

Returns an array containing the values of dataset where the threshold condition is met and zero where not.

violinplots_clusters(minimum_length, ...)

Returns three violinplots with distributions of temporal, spatial and spatiotemporal extents of metric occurrences, comparing all climate datasets specified in **climate_data.

calculate_annual_value_beyond_threshold(dataset, dates_array, time_func=<function year>)

Calculates amount beyond threshold for each year in the dataset.

Parameters:
dataset : np.ndarray

Input data, either observations or climate projections to be analysed, numeric entries expected.

dates_array : np.ndarray

Array of dates matching the time dimension of dataset. Has to be of the form time_dictionary[time_specification] - for example: tas_dates_validate['time_obs'].

time_func : function

Points to a utils function to extract either days or months.

Returns:
np.ndarray

3d array - [years, lat, long]

calculate_intensity_index(dataset)

Calculates the amount beyond a threshold divided by the number of instances the threshold is exceeded.

Designed to calculate the simple precipitation intensity index but can be used for other variables.

Parameters:
dataset : np.ndarray

Input data, either observations or climate projections, to be analysed; numeric entries expected.

calculate_percent_of_total_amount_beyond_threshold(dataset)

Calculates percentage of total amount beyond threshold for each location over all timesteps.

Parameters:
dataset : np.ndarray

Input data, either observations or climate projections, to be analysed; numeric entries expected.

Returns:
np.ndarray

2d array with percentage of total amount above threshold at each location.
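
Examples

A usage sketch: pr_obs_validate is a placeholder precipitation array and wet_days is the pre-defined AccumulativeThresholdMetric listed below.

>>> wet_days.calculate_percent_of_total_amount_beyond_threshold(pr_obs_validate)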

class ibicus.evaluate.metrics.ThresholdMetric(threshold_value, threshold_type, name='unknown', variable='unknown')

Bases: object

Generic climate metric defined by exceedance or underceedance of a threshold, or by values falling between or outside an upper and a lower threshold.

Organises the definition and functionalities of such metrics. Enables the implementation of a subset of the Climdex climate extreme indices (https://www.climdex.org/learn/indices/).

Examples

>>> warm_days = ThresholdMetric(name = 'Mean warm days (K)', variable = 'tas', threshold_value = 295, threshold_type = 'higher')
Attributes:
threshold_value : Union[float, list[float], tuple[float]]

Threshold value(s) for the variable (in the correct unit). If threshold_type = “higher” or threshold_type = “lower” this is just a single float value and the metric is defined as exceedance or underceedance of that value. If threshold_type = “between” or threshold_type = “outside” then this needs to be a list in the form: [lower_bound, upper_bound] and the metric is defined as falling in between, or falling outside these values.

threshold_type : str

One of [“higher”, “lower”, “between”, “outside”]. Indicates whether we are either interested in values above the threshold value (“higher”, strict >), values below the threshold value (“lower”, strict <), values between the threshold values (“between”, not strict including the bounds) or outside the threshold values (“outside”, strict not including the bounds).

name : str = "unknown"

Metric name. Will be used in dataframes, plots etc. Recommended to include threshold value and units. Example: 'Frost days (tasmin < 0°C)'. Default: "unknown".

variable : str = "unknown"

Unique variable that this threshold metric refers to. Example for frost days: tasmin. Default: “unknown”.

Methods

calculate_exceedance_probability(dataset)

Returns the probability of metric occurrence (threshold exceedance/underceedance or inside/outside range) at each location (across the entire time period).

calculate_instances_of_threshold_exceedance(dataset)

Returns an array of the same size as dataset containing 1 when the threshold condition is met and 0 when not.

calculate_number_annual_days_beyond_threshold(...)

Calculates number of days beyond threshold for each year in the dataset.

calculate_spatial_extent(**climate_data)

Returns a pd.DataFrame of spatial extents of metric occurrences (threshold exceedance/underceedance or inside/outside range), for each climate dataset specified in **climate_data.

calculate_spatiotemporal_clusters(**climate_data)

Returns a pd.DataFrame of sizes of individual spatiotemporal clusters of metric occurrences (threshold exceedance/underceedance or inside/outside range), for each climate dataset specified in **climate_data.

calculate_spell_length(minimum_length, ...)

Returns a pd.DataFrame of individual spell lengths of metric occurrences (threshold exceedance/underceedance or inside/outside range), counted across locations, for each climate dataset specified in **climate_data.

filter_threshold_exceedances(dataset)

Returns an array containing the values of dataset where the threshold condition is met and zero where not.

violinplots_clusters(minimum_length, ...)

Returns three violinplots with distributions of temporal, spatial and spatiotemporal extents of metric occurrences, comparing all climate datasets specified in **climate_data.

calculate_exceedance_probability(dataset)

Returns the probability of metric occurrence (threshold exceedance/underceedance or inside/outside range) at each location (across the entire time period).

Parameters:
dataset : np.ndarray

Input data, either observations or climate projections to be analysed, numeric entries expected.

Returns:
np.ndarray

Probability of metric occurrence at each location.
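
Examples

A usage sketch, with tas_obs_validate as a placeholder observational array and warm_days as the pre-defined ThresholdMetric listed below:

>>> warm_days.calculate_exceedance_probability(tas_obs_validate)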

calculate_instances_of_threshold_exceedance(dataset)

Returns an array of the same size as dataset containing 1 when the threshold condition is met and 0 when not.

Parameters:
dataset : np.ndarray

Input data, either observations or climate projections to be analysed, numeric entries expected.

calculate_number_annual_days_beyond_threshold(dataset, dates_array, time_func=<function year>)

Calculates number of days beyond threshold for each year in the dataset.

Parameters:
dataset : np.ndarray

Input data, either observations or climate projections to be analysed, numeric entries expected.

dates_array : np.ndarray

Array of dates matching the time dimension of dataset. Has to be of the form time_dictionary[time_specification] - for example: tas_dates_validate['time_obs'].

time_func : function

Points to a utils function to extract either days or months.

Returns:
np.ndarray

3d array - [years, lat, long]
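
Examples

A usage sketch with placeholder names, following the dates_array convention described above:

>>> warm_days.calculate_number_annual_days_beyond_threshold(tas_obs_validate, tas_dates_validate['time_obs'])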

calculate_spatial_extent(**climate_data)

Returns a pd.DataFrame of spatial extents of metric occurrences (threshold exceedance/underceedance or inside/outside range), for each climate dataset specified in **climate_data.

The spatial extent is defined as the percentage of the area where the threshold is exceeded/underceeded or where values are between or outside the bounds (depending on self.threshold_type), given that it occurs at at least one location. The output dataframe has three columns: 'Correction Method' (obs/raw or name of debiaser), 'Metric' (name of the threshold metric), and 'Spatial extent (% of area)'.

Parameters:
**climate_data

Keyword arguments, providing the input data to investigate.

Returns:
pd.DataFrame

Dataframe of spatial extents of metric occurrences.

Examples

>>> dry_days.calculate_spatial_extent(obs = tas_obs_validate, raw = tas_cm_validate, ISIMIP = tas_val_debiased_ISIMIP)
calculate_spatiotemporal_clusters(**climate_data)

Returns a pd.DataFrame of sizes of individual spatiotemporal clusters of metric occurrences (threshold exceedance/underceedance or inside/outside range), for each climate dataset specified in **climate_data.

A spatiotemporal cluster is defined as a connected set (in time and/or space) where the threshold is exceeded/underceeded or values are between or outside the bounds (depending on self.threshold_type). The output dataframe has three columns: 'Correction Method' (obs/raw or name of debiaser), 'Metric' (name of the threshold metric), and 'Spatiotemporal cluster size'.

Parameters:
**climate_data

Keyword arguments, providing the input data to investigate.

Returns:
pd.DataFrame

Dataframe of sizes of individual spatiotemporal clusters of metric occurrences.

Examples

>>> dry_days.calculate_spatiotemporal_clusters(obs = tas_obs_validate, raw = tas_cm_validate, ISIMIP = tas_val_debiased_ISIMIP)
calculate_spell_length(minimum_length, **climate_data)

Returns a pd.DataFrame of individual spell lengths of metric occurrences (threshold exceedance/underceedance or inside/outside range), counted across locations, for each climate dataset specified in **climate_data.

A spell length is defined as the number of days that a threshold is continuously exceeded or underceeded, or where values are continuously between or outside the threshold (depending on self.threshold_type). The output dataframe has three columns: 'Correction Method' (obs/raw or name of debiaser as specified in **climate_data), 'Metric' (name of the threshold metric), and 'Spell length - individual spell length counts'.

Parameters:
minimum_length : int

Minimum spell length (in days) investigated.

**climate_data

Keyword arguments, providing the input data to investigate.

Returns:
pd.DataFrame

Dataframe of spell lengths of metric occurrences.

Examples

>>> dry_days.calculate_spell_length(minimum_length = 4, obs = tas_obs_validate, raw = tas_cm_validate, ISIMIP = tas_val_debiased_ISIMIP)
filter_threshold_exceedances(dataset)

Returns an array containing the values of dataset where the threshold condition is met and zero where not.

Parameters:
dataset : np.ndarray

Input data, either observations or climate projections to be analysed, numeric entries expected.

violinplots_clusters(minimum_length, **climate_data)

Returns three violinplots with distributions of temporal, spatial and spatiotemporal extents of metric occurrences, comparing all climate datasets specified in **climate_data.

Parameters:
minimum_length : int

Minimum spell length (in days) investigated for temporal extents.
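
Examples

A usage sketch with placeholder dataset names, analogous to calculate_spell_length():

>>> frost_days.violinplots_clusters(minimum_length = 4, obs = tasmin_obs_validate, raw = tasmin_cm_validate, ISIMIP = tasmin_val_debiased_ISIMIP)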

ibicus.evaluate.metrics.R10mm = AccumulativeThresholdMetric(threshold_value=0.00011574074074074075, threshold_type='higher', name='Very wet days \n (> 10 mm/day)', variable='pr')

Very wet days (> 10 mm/day) for pr.

ibicus.evaluate.metrics.R20mm = AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', name='Extremely wet days \n (> 20 mm/day)', variable='pr')

Extremely wet days (> 20 mm/day) for pr.

ibicus.evaluate.metrics.cold_days = ThresholdMetric(threshold_value=275, threshold_type='lower', name='Mean cold days (K)', variable='tas')

Cold days (< 275 K) for tas.

ibicus.evaluate.metrics.dry_days = AccumulativeThresholdMetric(threshold_value=1.1574074074074073e-05, threshold_type='lower', name='Dry days \n (< 1 mm/day)', variable='pr')

Dry days (< 1 mm/day) for pr.

ibicus.evaluate.metrics.frost_days = ThresholdMetric(threshold_value=273.13, threshold_type='lower', name='Frost days \n  (tasmin<0°C)', variable='tasmin')

Frost days (<0°C) for tasmin.

ibicus.evaluate.metrics.icing_days = ThresholdMetric(threshold_value=273.13, threshold_type='lower', name='Icing days \n (tasmax<0°C)', variable='tasmax')

Icing days (<0°C) for tasmax.

ibicus.evaluate.metrics.summer_days = ThresholdMetric(threshold_value=298.15, threshold_type='higher', name='Summer days \n  (tasmax>25°C)', variable='tasmax')

Summer days (>25°C) for tasmax.

ibicus.evaluate.metrics.tropical_nights = ThresholdMetric(threshold_value=293.13, threshold_type='higher', name='Tropical Nights \n (tasmin>20°C)', variable='tasmin')

Tropical Nights (>20°C) for tasmin.

ibicus.evaluate.metrics.warm_days = ThresholdMetric(threshold_value=295, threshold_type='higher', name='Mean warm days (K)', variable='tas')

Warm days (>295K) for tas.

ibicus.evaluate.metrics.wet_days = AccumulativeThresholdMetric(threshold_value=1.1574074074074073e-05, threshold_type='higher', name='Wet days \n (> 1 mm/day)', variable='pr')

Wet days (> 1 mm/day) for pr.

ibicus.evaluate.marginal

ibicus.evaluate.multivariate

ibicus.evaluate.multivariate.calculate_and_spatialplot_multivariate_correlation(variables, manual_title=' ', **kwargs)

Calculates correlation between the two variables specified in keyword arguments (such as tas and pr) at each location and outputs a spatial plot.

Parameters:
variables : list

List of two variable names; have to be given in the standard form specified in the documentation.

manual_title : str

Optional argument present in all plot functions: manual_title will be used as title of the plot.

**kwargs

Keyword arguments specifying a list of two np.ndarrays containing the two variables of interest.

Examples

>>> multivariate.calculate_and_spatialplot_multivariate_correlation(variables = ['tas', 'pr'], obs = [tas_obs_validate, pr_obs_validate], raw = [tas_cm_validate, pr_cm_validate], ISIMIP = [tas_val_debiased_ISIMIP, pr_val_debiased_ISIMIP])
ibicus.evaluate.multivariate.calculate_conditional_joint_threshold_exceedance(metric1, metric2, **climate_data)

Returns a pd.DataFrame containing location-wise conditional exceedance probability.

Calculates:

\[p (\text{Metric1} | \text{Metric2}) = p (\text{Metric1} , \text{Metric2}) / p(\text{Metric2})\]

Output is a pd.DataFrame with 3 columns:

- Correction Method: type of climate data - obs, raw, or the name of the bias correction method; given through the keys of **climate_data
- Compound metric: str reading 'Metric1.name given Metric2.name'
- Conditional exceedance probability: 2d numpy array with the conditional exceedance probability at each location

Parameters:
metric1 : ThresholdMetric

Metric whose occurrence probability is evaluated, conditional on the occurrence of metric2.

metric2 : ThresholdMetric

Metric on whose occurrence the probability of metric1 is conditioned.

**climate_data

Keyword arguments of type key = debiased_dataset in validation period (example: ‘QM = tas_val_debiased_QM’, or ‘obs = tas_val_debiased_obs’).

Returns:
pd.DataFrame

DataFrame with conditional exceedance probability at all locations for the combination of metrics chosen.

Examples

>>> dry_frost_data = calculate_conditional_joint_threshold_exceedance(metric1 = dry_days, metric2 = frost_days, obs = [pr_obs_validate, tasmin_obs_validate], raw = [pr_cm_validate, tasmin_cm_validate], ISIMIP = [pr_val_debiased_ISIMIP, tasmin_val_debiased_ISIMIP])
ibicus.evaluate.multivariate.create_multivariate_dataframes(variables, datasets_obs, datasets_bc, gridpoint=(0, 0))

Helper function creating two joint pd.DataFrames of the two variables specified, for an observational dataset as well as one bias corrected dataset, at one gridpoint.

Parameters:
variables : list

List of two variable names; have to be given in standard form following the CMIP convention.

datasets_obs : list

List of two observational datasets during same period for the two variables.

datasets_bc : list

List of two bias corrected datasets during same period for the two variables.

gridpoint : tuple

Tuple specifying the location from which the data will be extracted.

Examples

>>> tas_pr_obs, tas_pr_isimip = create_multivariate_dataframes(variables = ['tas', 'pr'], datasets_obs = [tas_obs_validate, pr_obs_validate], datasets_bc = [tas_val_debiased_ISIMIP, pr_val_debiased_ISIMIP], gridpoint = (1,1))
ibicus.evaluate.multivariate.plot_bootstrap_correlation_replicates(obs_df, bc_df, bc_name, size)

Plots histograms of correlation between variables in input dataframes estimated via bootstrap using _calculate_bootstrap_correlation_replicates().

Parameters:
obs_df : pd.DataFrame

First argument of output of create_multivariate_dataframes()

bc_df : pd.DataFrame

Second argument of output of create_multivariate_dataframes()

bc_name: str

Name of bias correction method

size: int

Number of draws in bootstrapping procedure

Examples

>>> plot_bootstrap_correlation_replicates(obs_df = tas_pr_obs, bc_df = tas_pr_isimip, bc_name = 'ISIMIP', size=500)
ibicus.evaluate.multivariate.plot_conditional_joint_threshold_exceedance(conditional_exceedance_df)

Accepts output given by calculate_conditional_joint_threshold_exceedance() and creates an overview boxplot of the conditional exceedance probability across locations in the chosen datasets.

Parameters:
conditional_exceedance_df : pd.DataFrame

Output of calculate_conditional_joint_threshold_exceedance().

ibicus.evaluate.multivariate.plot_correlation_single_location(variables, obs_df, bc_df)

Uses seaborn.regplot and output of create_multivariate_dataframes() to plot scatterplot and Pearson correlation estimate of the two specified variables. Offers visual comparison of correlation at single location.

Parameters:
variables : list

List of two variable names; have to be given in standard form following the CMIP convention.

obs_df : pd.DataFrame

First argument of output of create_multivariate_dataframes()

bc_df : pd.DataFrame

Second argument of output of create_multivariate_dataframes()

Examples

>>> plot_correlation_single_location(variables = ['tas', 'pr'], obs_df = tas_pr_obs, bc_df = tas_pr_isimip)

ibicus.evaluate.correlation

ibicus.evaluate.correlation.rmse_spatial_correlation_boxplot(variable, dataset, manual_title=' ')

Boxplot of RMSE of spatial correlation across locations.

Parameters:
variable : str

Variable name, has to be given in standard form specified in documentation.

dataset : pd.DataFrame

Output of the function rmse_spatial_correlation_distribution().

manual_title : str

Optional argument present in all plot functions: manual_title will be used as title of the plot.

ibicus.evaluate.correlation.rmse_spatial_correlation_distribution(variable, obs_data, **cm_data)

Calculates Root-Mean-Squared-Error between observed and modelled spatial correlation matrix at each location.

The computation involves the following steps: at each location, calculate the correlation to each other location in the observed as well as the climate model dataset, then calculate the root mean squared error between these two matrices.

Parameters:
variable : str

Variable name, has to be given in standard form specified in documentation.

obs_data : np.ndarray

Observational dataset to compare the climate model data against.

**cm_data

Keyword arguments specifying climate model datasets, for example: QM = tas_debiased_QM

Examples

>>> tas_rmsd_spatial = rmse_spatial_correlation_distribution(variable = 'tas', obs_data = tas_obs_validate, raw = tas_cm_future, QDM = tas_val_debiased_QDM)

ibicus.evaluate.trend

ibicus.evaluate.trend.calculate_future_trend_bias(variable, raw_validate, raw_future, metrics=[], trend_type='additive', remove_outliers=True, **debiased_cms)

For each location, calculates the bias in the trend of the bias corrected model compared to the raw climate model for the following metrics: mean, 5% and 95% quantile (default) as well as metrics passed as arguments to the function.

Trend can be specified as either additive or multiplicative.

The function returns a pd.DataFrame with three columns: [Correction Method: str, Metric: str, Relative change bias (%): 2d np.ndarray containing the trend bias at each location].

Parameters:
variable : str

Variable name, has to be given in standard form following CMIP convention.

raw_validate : np.ndarray

Raw climate data set in validation period

raw_future: np.ndarray

Raw climate data set in future period

metrics : np.ndarray

1d numpy array of strings containing the keys of the metrics to be analysed. Example: metrics = [‘dry’, ‘wet’]

trend_type: str

Determines whether additive or multiplicative trend is analysed. Has to be one of [‘additive’, ‘multiplicative’]

debiased_cms : np.ndarray

Keyword arguments given in format debiaser_name = [debiased_dataset_validation_period, debiased_dataset_future_period] Example: QM = [tas_val_debiased_QM, tas_future_debiased_QM].

Examples

>>> tas_trend_bias_data = trend.calculate_future_trend_bias(variable = 'tas', raw_validate = tas_cm_validate, raw_future = tas_cm_future, metrics = ['warm_days', 'cold_days'], trend_type = 'additive', QDM = [tas_val_debiased_QDM, tas_fut_debiased_QDM], CDFT = [tas_val_debiased_CDFT, tas_fut_debiased_CDFT])
ibicus.evaluate.trend.plot_future_trend_bias_boxplot(variable, bias_df, manual_title=' ')

Accepts output given by calculate_future_trend_bias() and creates an overview boxplot of the bias in the trend of metrics.

Parameters:
variable: str

Variable name, has to be given in standard form specified in documentation.

bias_df : pd.DataFrame

DataFrame with three columns [Correction Method, Metric, Bias value at each location], as output by calculate_future_trend_bias().

manual_title : str

Optional argument present in all plot functions: manual_title will be used as title of the plot.

ibicus.evaluate.trend.plot_future_trend_bias_spatial(variable, metric, bias_df, manual_title=' ')

Accepts output given by calculate_future_trend_bias() and creates a spatial plot of trend bias for one chosen metric.

Parameters:
variable: str

Variable name, has to be given in standard form specified in documentation.

metric : str

Metric for which the spatial plot of trend bias is created; has to be one of the metrics contained in bias_df.

bias_df : pd.DataFrame

DataFrame with three columns [Correction Method, Metric, Bias value at each location], as output by calculate_future_trend_bias().

manual_title : str

Optional argument present in all plot functions: manual_title will be used as title of the plot.

ibicus.evaluate.assumptions

ibicus.evaluate.assumptions.calculate_aic(variable, dataset, *distributions)

Calculates the Akaike Information Criterion (AIC) at each location for each of the distributions specified.

Warning

*distributions can currently only contain scipy.stats.rv_continuous objects and not, as elsewhere in the package, also StatisticalModel.

Parameters:
variable: str

Variable name, has to be given in standard form specified in documentation.

dataset : np.ndarray

Input data, either observations or climate projections dataset to be analysed, numeric entries expected.

*distributions : list[scipy.stats.rv_continuous]

Distributions to be tested, elements are scipy.stats.rv_continuous

Returns:
pd.DataFrame

DataFrame with all locations, distributions and associated AIC values.

ibicus.evaluate.assumptions.plot_aic(variable, aic_values, manual_title=' ')

Creates a boxplot of AIC values across all locations.

Parameters:
variable : str

Variable name, has to be given in standard form specified in documentation.

aic_values : pd.DataFrame

Pandas dataframe of the type output by calculate_aic().

ibicus.evaluate.assumptions.plot_fit_worst_aic(variable, dataset, data_type, distribution, nr_bins='auto', aic_values=None, manual_title=' ')

Plots a histogram and overlaid fit at the location of worst AIC.

Warning

distribution can currently only be a scipy.stats.rv_continuous distribution and not, as elsewhere in the package, also StatisticalModel.

Parameters:
variable : str

Variable name, has to be given in standard CMIP convention

dataset : np.ndarray

3d-input data [time, lat, long], numeric entries expected. Either observations or climate projections dataset to be analysed.

data_type : str

Data type analysed - can be observational data or raw / debiased climate model data. Used to generate title only.

distribution : scipy.stats.rv_continuous

Distribution providing the fit to the data.

nr_bins : Union[int, str] = "auto"

Number of bins used for the histogram. Either int or "auto" (default).

aic_values : Optional[pd.DataFrame] = None

Pandas dataframe of the type output by calculate_aic(). If None, the AIC values are recalculated.

ibicus.evaluate.assumptions.plot_quantile_residuals(variable, dataset, distribution, data_type, manual_title=' ')

Plots timeseries and autocorrelation function of quantile residuals, as well as a QQ-plot of normalized quantile residuals at one location.

Parameters:
variable: str

Variable name, has to be given in standard form specified in documentation.

dataset : np.ndarray

1d numpy array. Input data, either observations or climate projections dataset at one location, numeric entries expected.

distribution: scipy.stats.rv_continuous

Name of the distribution analysed, used for title only.

data_type: str

Data type analysed - can be observational data or raw / debiased climate model data. Used to generate title only.

Examples

>>> tas_obs_plot_gof = assumptions.plot_quantile_residuals(variable = 'tas', dataset = tas_obs[:,0,0], distribution = scipy.stats.norm, data_type = 'observation data')