clophfit.testing.evaluation

Functions for evaluating fit quality.

This module provides metrics for evaluating fitting performance, including:

  1. Bias (accuracy)
  2. Coverage (uncertainty quantification)
  3. Residual distribution (goodness of fit)
  4. Parameter error analysis

Functions

calculate_bias(estimated, true_value)

Calculate the bias (mean error) of the estimates.

calculate_rmse(estimated, true_value)

Calculate the Root Mean Square Error (RMSE).

calculate_coverage(estimated, errors, true_value[, ...])

Calculate the coverage probability of the confidence intervals.

evaluate_residuals(residuals)

Evaluate the normality of residuals.

extract_params(fr[, param_name])

Extract parameter value and error from a FitResult.

load_real_data_paths()

Find available real data directories.

compare_methods_statistical(method1_errors, method2_errors)

Perform statistical comparison between two methods.

Module Contents

clophfit.testing.evaluation.calculate_bias(estimated, true_value)#

Calculate the bias (mean error) of the estimates.

Parameters:
  • estimated (np.ndarray) – Array of estimated values.

  • true_value (float) – The true value.

Returns:

The bias (mean of estimated - true_value).

Return type:

float
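As a rough illustration of the quantity described above, the following standard-library sketch computes the mean signed error. The helper name `bias_sketch` and the sample values are hypothetical; this is not the clophfit implementation, only a reading of "mean of estimated - true_value".

```python
from statistics import fmean


def bias_sketch(estimated, true_value):
    """Mean signed error, i.e. mean(estimated - true_value) (illustrative only)."""
    return fmean([e - true_value for e in estimated])


# Hypothetical fitted K values from repeated fits; the true value is 7.0.
estimates = [6.9, 7.1, 7.3, 7.1]
bias = bias_sketch(estimates, 7.0)
print(f"bias = {bias:.3f}")  # a positive bias means the fits overestimate on average
```

A bias near zero only says the errors cancel on average; it does not say the individual estimates are close to the true value.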

clophfit.testing.evaluation.calculate_rmse(estimated, true_value)

Calculate the Root Mean Square Error (RMSE).

Parameters:
  • estimated (ArrayF) – Array of estimated values.

  • true_value (float) – The true value.

Returns:

The RMSE.

Return type:

float
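The sketch below shows the standard RMSE definition, the square root of the mean squared deviation from the true value, using only the standard library. The name `rmse_sketch` and the data are hypothetical, and the library function may differ in details such as NaN handling.

```python
import math


def rmse_sketch(estimated, true_value):
    """Root mean square error against a single true value (illustrative only)."""
    squared = [(e - true_value) ** 2 for e in estimated]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical estimates scattered symmetrically around the true value 7.0.
estimates = [6.0, 8.0, 6.0, 8.0]
rmse = rmse_sketch(estimates, 7.0)
print(f"rmse = {rmse:.3f}")
```

In this example the signed errors cancel, so the bias is zero, while the RMSE still reports the typical error magnitude of 1.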

clophfit.testing.evaluation.calculate_coverage(estimated, errors, true_value, confidence=0.95)

Calculate the coverage probability of the confidence intervals.

Parameters:
  • estimated (ArrayF) – Array of estimated values.

  • errors (ArrayF) – Array of standard errors (1 sigma).

  • true_value (float) – The true value.

  • confidence (float) – The desired confidence level (default: 0.95).

Returns:

The fraction of intervals that contain the true value.

Return type:

float
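One plausible way to compute this fraction is sketched below. It assumes symmetric intervals of the form estimate ± z·error with a normal critical value z; whether clophfit uses a normal or a Student's t critical value is not stated here, so treat that choice, the name `coverage_sketch`, and the data as assumptions.

```python
from statistics import NormalDist


def coverage_sketch(estimated, errors, true_value, confidence=0.95):
    """Fraction of intervals est ± z*err that contain true_value (illustrative only)."""
    # Assumption: two-sided normal critical value (about 1.96 for confidence=0.95).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = [
        abs(est - true_value) <= z * err
        for est, err in zip(estimated, errors)
    ]
    return sum(hits) / len(hits)


# Hypothetical estimates with their 1-sigma standard errors; the true value is 7.0.
estimates = [7.0, 7.3, 6.5, 7.1]
errors = [0.1, 0.2, 0.2, 0.1]
coverage = coverage_sketch(estimates, errors, 7.0)
print(f"coverage = {coverage:.2f}")
```

With well-calibrated standard errors, the coverage over many fits should approach the requested confidence level. Values well below it suggest the reported errors are too small.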

clophfit.testing.evaluation.evaluate_residuals(residuals)

Evaluate the normality of residuals.

Parameters:

residuals (np.ndarray) – Array of residuals.

Returns:

Dictionary containing:

  • ‘shapiro_stat’: Shapiro-Wilk test statistic

  • ‘shapiro_p’: Shapiro-Wilk p-value

  • ‘mean’: Mean of residuals

  • ‘std’: Standard deviation of residuals

Return type:

dict[str, float]
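A dictionary with these keys could be assembled roughly as follows, using `scipy.stats.shapiro` for the Shapiro-Wilk test. The helper name `residuals_sketch`, the synthetic residuals, and the use of the population standard deviation (`ddof=0`) are assumptions; the library may compute these values differently.

```python
import numpy as np
from scipy import stats


def residuals_sketch(residuals):
    """Normality summary with the documented keys (illustrative only)."""
    residuals = np.asarray(residuals, dtype=float)
    stat, p_value = stats.shapiro(residuals)
    return {
        "shapiro_stat": float(stat),
        "shapiro_p": float(p_value),
        "mean": float(np.mean(residuals)),
        # Assumption: population standard deviation (ddof=0).
        "std": float(np.std(residuals)),
    }


# Synthetic residuals drawn from a standard normal distribution.
rng = np.random.default_rng(0)
summary = residuals_sketch(rng.normal(0.0, 1.0, size=200))
print(summary)
```

A small ‘shapiro_p’ (for example below 0.05) is evidence against normally distributed residuals, and a ‘mean’ far from zero points to a systematic misfit.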

clophfit.testing.evaluation.extract_params(fr, param_name='K')

Extract parameter value and error from a FitResult.

Parameters:
  • fr (FitResult[MiniT]) – The fit result object.

  • param_name (str) – The name of the parameter to extract (default: “K”).

Returns:

(value, error). Returns (np.nan, np.nan) if extraction fails.

Return type:

tuple[float, float]

clophfit.testing.evaluation.load_real_data_paths()

Find available real data directories.

Returns:

Mapping of dataset name to path.

Return type:

dict[str, Path]

clophfit.testing.evaluation.compare_methods_statistical(method1_errors, method2_errors, method1_name='Method 1', method2_name='Method 2', *, verbose=True)

Perform statistical comparison between two methods.

Uses the Mann-Whitney U test (non-parametric) to compare the absolute errors of the two methods.

Parameters:
  • method1_errors (Sequence[float]) – Errors from method 1

  • method2_errors (Sequence[float]) – Errors from method 2

  • method1_name (str) – Name of method 1

  • method2_name (str) – Name of method 2

  • verbose (bool, optional) – Whether to print detailed comparison info, defaults to True.

Returns:

Statistical comparison results. The dictionary includes:

  • ‘test’: Name of the statistical test.

  • ‘statistic’: Test statistic value.

  • ‘p_value’: Computed p-value.

  • ‘significant’: Whether the difference is significant at alpha=0.05.

  • ‘better_method’: Method with lower MAE (or ‘Equivalent’).

  • ‘mae1’: Mean absolute error for method 1.

  • ‘mae2’: Mean absolute error for method 2.

  • ‘error’: Only present when the comparison cannot be performed.

Return type:

dict[str, float | bool | str]
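The comparison described above can be approximated with `scipy.stats.mannwhitneyu` applied to the absolute errors. The sketch below is not the clophfit implementation: the helper name `compare_sketch`, the two-sided alternative, and the rule that ‘Equivalent’ is returned only when both MAEs are exactly equal are assumptions, and the error values are made up.

```python
import numpy as np
from scipy import stats


def compare_sketch(errors1, errors2, name1="Method 1", name2="Method 2", alpha=0.05):
    """Mann-Whitney U comparison of absolute errors (illustrative only)."""
    abs1 = np.abs(np.asarray(errors1, dtype=float))
    abs2 = np.abs(np.asarray(errors2, dtype=float))
    # Assumption: two-sided alternative.
    res = stats.mannwhitneyu(abs1, abs2, alternative="two-sided")
    mae1, mae2 = float(abs1.mean()), float(abs2.mean())
    # Assumption: "Equivalent" only when the two MAEs are exactly equal.
    if mae1 < mae2:
        better = name1
    elif mae2 < mae1:
        better = name2
    else:
        better = "Equivalent"
    return {
        "test": "Mann-Whitney U",
        "statistic": float(res.statistic),
        "p_value": float(res.pvalue),
        "significant": bool(res.pvalue < alpha),
        "better_method": better,
        "mae1": mae1,
        "mae2": mae2,
    }


# Made-up signed errors: method 1 is clearly more accurate than method 2.
errors_1 = [0.1, -0.2, 0.15, -0.05, 0.1, 0.12, -0.08, 0.2]
errors_2 = [1.0, -1.2, 0.9, 1.5, -1.1, 0.8, 1.3, -0.95]
result = compare_sketch(errors_1, errors_2)
print(result["better_method"], result["p_value"])
```

A rank-based test is a reasonable choice here because absolute errors are non-negative and usually skewed, so a t-test's normality assumption would be questionable.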