clophfit.testing.evaluation

Functions for evaluating fit quality.

This module provides metrics for evaluating fitting performance, including:

1. Bias (accuracy)
2. Coverage (uncertainty quantification)
3. Residual distribution (goodness of fit)
4. Parameter error analysis

Functions

`calculate_bias`(estimated, true_value)	Calculate the bias (mean error) of the estimates.
`calculate_rmse`(estimated, true_value)	Calculate the Root Mean Square Error (RMSE).
`calculate_coverage`(estimated, errors, true_value[, ...])	Calculate the coverage probability of the confidence intervals.
`evaluate_residuals`(residuals)	Evaluate the normality of residuals.
`extract_params`(fr[, param_name])	Extract parameter value and error from a FitResult.
`load_real_data_paths`()	Find available real data directories.
`compare_methods_statistical`(method1_errors, method2_errors)	Perform statistical comparison between two methods.

Module Contents

clophfit.testing.evaluation.calculate_bias(estimated, true_value)

Calculate the bias (mean error) of the estimates.

Parameters:

estimated (np.ndarray) – Array of estimated values.
true_value (float) – The true value.

Returns:

The bias (mean of estimated - true_value).

Return type:

float
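
The documented formula (mean of estimated - true_value) can be sketched with plain NumPy. This is an illustration of the computation, not the module's source, and the sample values are made up for the example.

```python
import numpy as np

# Hypothetical estimates of a parameter whose true value is 7.0
estimated = np.array([6.8, 7.1, 7.3, 6.9, 7.4])
true_value = 7.0

# Bias as documented: mean of (estimated - true_value)
bias = float(np.mean(estimated - true_value))
print(f"bias = {bias:.3f}")
```

A positive bias means the estimates overshoot the true value on average; a negative bias means they undershoot it.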

clophfit.testing.evaluation.calculate_rmse(estimated, true_value)

Calculate the Root Mean Square Error (RMSE).

Parameters:

estimated (ArrayF) – Array of estimated values.
true_value (float) – The true value.

Returns:

The RMSE.

Return type:

float
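
A minimal sketch of the RMSE, assuming the standard definition (square root of the mean squared deviation from true_value). The source does not show the implementation, so treat this as an illustration only. The sample values are made up.

```python
import numpy as np

# Hypothetical estimates of a parameter whose true value is 7.0
estimated = np.array([6.8, 7.1, 7.3, 6.9, 7.4])
true_value = 7.0

# Assumed definition: sqrt(mean((estimated - true_value)^2))
rmse = float(np.sqrt(np.mean((estimated - true_value) ** 2)))

# With this definition, RMSE^2 = bias^2 + population variance of the estimates
bias = float(np.mean(estimated - true_value))
spread = float(np.var(estimated))  # ddof=0 (population variance)
print(f"rmse = {rmse:.4f}, bias^2 + variance = {bias**2 + spread:.4f}")
```

Under this definition the RMSE combines systematic error (bias) and scatter, so a method can have zero bias and still a large RMSE.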

clophfit.testing.evaluation.calculate_coverage(estimated, errors, true_value, confidence=0.95)

Calculate the coverage probability of the confidence intervals.

Parameters:

estimated (ArrayF) – Array of estimated values.
errors (ArrayF) – Array of standard errors (1 sigma).
true_value (float) – The true value.
confidence (float) – The desired confidence level (default: 0.95).

Returns:

The fraction of intervals that contain the true value.

Return type:

float
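
The source does not state how the intervals are built from the 1-sigma errors. The sketch below assumes symmetric normal intervals, estimated ± z * errors, with z taken from the standard normal quantile for the requested confidence level. The data are invented for illustration.

```python
from statistics import NormalDist

import numpy as np

# Hypothetical estimates and their 1-sigma standard errors
estimated = np.array([6.8, 7.1, 7.3, 6.9, 7.4])
errors = np.array([0.15, 0.10, 0.10, 0.20, 0.15])
true_value = 7.0
confidence = 0.95

# Assumption: two-sided normal interval; z is about 1.96 for 95%
z = NormalDist().inv_cdf(0.5 + confidence / 2)
lower = estimated - z * errors
upper = estimated + z * errors

# Fraction of intervals that contain the true value
covered = (lower <= true_value) & (true_value <= upper)
coverage = float(np.mean(covered))
print(f"z = {z:.3f}, coverage = {coverage:.2f}")
```

If the reported errors are well calibrated, the coverage over many fits should approach the confidence level; a clearly lower value suggests the errors are underestimated.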

clophfit.testing.evaluation.evaluate_residuals(residuals)

Evaluate the normality of residuals.

Parameters:

residuals (np.ndarray) – Array of residuals.

Returns:

Dictionary containing:

- ‘shapiro_stat’: Shapiro-Wilk test statistic
- ‘shapiro_p’: Shapiro-Wilk p-value
- ‘mean’: Mean of residuals
- ‘std’: Standard deviation of residuals

Return type:

dict[str, float]
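
A dictionary with the same keys can be assembled from scipy.stats.shapiro and NumPy, as sketched below. This is not the module's code: the residuals are synthetic, and the choice of ddof=0 for the standard deviation is an assumption, since the source does not say which convention is used.

```python
import numpy as np
from scipy import stats

# Synthetic residuals drawn from a standard normal distribution
rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.0, scale=1.0, size=200)

# Shapiro-Wilk test: the null hypothesis is that the data are normal
stat, p_value = stats.shapiro(residuals)

summary = {
    "shapiro_stat": float(stat),
    "shapiro_p": float(p_value),
    "mean": float(np.mean(residuals)),
    "std": float(np.std(residuals)),  # assumption: ddof=0
}
print(summary)
```

A small p-value is evidence against normality; a mean far from zero points to a systematic misfit rather than random noise.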

clophfit.testing.evaluation.extract_params(fr, param_name='K')

Extract parameter value and error from a FitResult.

Parameters:

fr (FitResult[MiniT]) – The fit result object.
param_name (str) – The name of the parameter to extract (default: “K”).

Returns:

(value, error). Returns (np.nan, np.nan) if extraction fails.

Return type:

tuple[float, float]

clophfit.testing.evaluation.load_real_data_paths()

Find available real data directories.

Returns:

Mapping of dataset name to path.

Return type:

dict[str, Path]

clophfit.testing.evaluation.compare_methods_statistical(method1_errors, method2_errors, method1_name='Method 1', method2_name='Method 2', *, verbose=True)

Perform statistical comparison between two methods.

Uses Mann-Whitney U test (non-parametric) for comparing absolute errors.

Parameters:

method1_errors (Sequence[float]) – Errors from method 1.
method2_errors (Sequence[float]) – Errors from method 2.
method1_name (str) – Name of method 1.
method2_name (str) – Name of method 2.
verbose (bool, optional) – Whether to print detailed comparison info, defaults to True.

Returns:

Statistical comparison results. The dictionary includes:

- ‘test’: Name of the statistical test.
- ‘statistic’: Test statistic value.
- ‘p_value’: Computed p-value.
- ‘significant’: Whether the difference is significant at alpha=0.05.
- ‘better_method’: Method with lower MAE (or ‘Equivalent’).
- ‘mae1’: Mean absolute error for method1.
- ‘mae2’: Mean absolute error for method2.
- ‘error’: Only present when comparison cannot be performed.

Return type:

dict[str, float | bool | str]
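
The comparison described above can be approximated with scipy.stats.mannwhitneyu applied to absolute errors. The sketch is hedged: the error values are invented, the two-sided alternative is an assumption, and so is the rule that ‘Equivalent’ is reported when the difference is not significant. The source only states that the better method is the one with lower MAE.

```python
import numpy as np
from scipy import stats

# Hypothetical signed errors from two fitting methods
method1_errors = [0.10, -0.22, 0.05, 0.31, -0.12, 0.08, -0.15, 0.20]
method2_errors = [0.45, -0.60, 0.38, 0.72, -0.51, 0.40, -0.66, 0.55]

# The test compares absolute errors, as the docstring states
abs1 = np.abs(method1_errors)
abs2 = np.abs(method2_errors)

# Assumption: two-sided Mann-Whitney U test
statistic, p_value = stats.mannwhitneyu(abs1, abs2, alternative="two-sided")

mae1 = float(np.mean(abs1))
mae2 = float(np.mean(abs2))
significant = bool(p_value < 0.05)  # alpha=0.05, as documented

# Assumption: report 'Equivalent' unless the difference is significant
if not significant:
    better = "Equivalent"
else:
    better = "Method 1" if mae1 < mae2 else "Method 2"

result = {
    "test": "Mann-Whitney U",
    "statistic": float(statistic),
    "p_value": float(p_value),
    "significant": significant,
    "better_method": better,
    "mae1": mae1,
    "mae2": mae2,
}
print(result)
```

The Mann-Whitney U test makes no normality assumption, which suits absolute errors because they are non-negative and typically skewed.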