clophfit.testing.evaluation
===========================

.. py:module:: clophfit.testing.evaluation

.. autoapi-nested-parse::

   Functions for evaluating fit quality.

   This module provides metrics for evaluating fitting performance, including:

   1. Bias (accuracy)
   2. Coverage (uncertainty quantification)
   3. Residual distribution (goodness of fit)
   4. Parameter error analysis

Functions
---------

.. autoapisummary::

   clophfit.testing.evaluation.calculate_bias
   clophfit.testing.evaluation.calculate_rmse
   clophfit.testing.evaluation.calculate_coverage
   clophfit.testing.evaluation.evaluate_residuals
   clophfit.testing.evaluation.extract_params
   clophfit.testing.evaluation.load_real_data_paths
   clophfit.testing.evaluation.compare_methods_statistical

Module Contents
---------------

.. py:function:: calculate_bias(estimated, true_value)

   Calculate the bias (mean error) of the estimates.

   :param estimated: Array of estimated values.
   :type estimated: np.ndarray
   :param true_value: The true value.
   :type true_value: float
   :returns: The bias (mean of estimated - true_value).
   :rtype: float

.. py:function:: calculate_rmse(estimated, true_value)

   Calculate the Root Mean Square Error (RMSE).

   :param estimated: Array of estimated values.
   :type estimated: ArrayF
   :param true_value: The true value.
   :type true_value: float
   :returns: The RMSE.
   :rtype: float

.. py:function:: calculate_coverage(estimated, errors, true_value, confidence = 0.95)

   Calculate the coverage probability of the confidence intervals.

   :param estimated: Array of estimated values.
   :type estimated: ArrayF
   :param errors: Array of standard errors (1 sigma).
   :type errors: ArrayF
   :param true_value: The true value.
   :type true_value: float
   :param confidence: The desired confidence level (default: 0.95).
   :type confidence: float
   :returns: The fraction of intervals that contain the true value.
   :rtype: float

.. py:function:: evaluate_residuals(residuals)

   Evaluate the normality of residuals.

   :param residuals: Array of residuals.
   :type residuals: np.ndarray
   :returns: Dictionary containing:

             - 'shapiro_stat': Shapiro-Wilk test statistic.
             - 'shapiro_p': Shapiro-Wilk p-value.
             - 'mean': Mean of residuals.
             - 'std': Standard deviation of residuals.
   :rtype: dict[str, float]

.. py:function:: extract_params(fr, param_name = 'K')

   Extract parameter value and error from a FitResult.

   :param fr: The fit result object.
   :type fr: FitResult[MiniT]
   :param param_name: The name of the parameter to extract (default: "K").
   :type param_name: str
   :returns: (value, error). Returns (np.nan, np.nan) if extraction fails.
   :rtype: tuple[float, float]

.. py:function:: load_real_data_paths()

   Find available real data directories.

   :returns: Mapping of dataset name to path.
   :rtype: dict[str, Path]

.. py:function:: compare_methods_statistical(method1_errors, method2_errors, method1_name = 'Method 1', method2_name = 'Method 2', *, verbose = True)

   Perform statistical comparison between two methods.

   Uses the Mann-Whitney U test (non-parametric) for comparing absolute errors.

   :param method1_errors: Errors from method 1.
   :type method1_errors: Sequence[float]
   :param method2_errors: Errors from method 2.
   :type method2_errors: Sequence[float]
   :param method1_name: Name of method 1.
   :type method1_name: str
   :param method2_name: Name of method 2.
   :type method2_name: str
   :param verbose: Whether to print detailed comparison info, defaults to True.
   :type verbose: bool, optional
   :returns: Statistical comparison results. The dictionary includes:

             - 'test': Name of the statistical test.
             - 'statistic': Test statistic value.
             - 'p_value': Computed p-value.
             - 'significant': Whether the difference is significant at alpha=0.05.
             - 'better_method': Method with lower MAE (or 'Equivalent').
             - 'mae1': Mean absolute error for method1.
             - 'mae2': Mean absolute error for method2.
             - 'error': Only present when comparison cannot be performed.
   :rtype: dict[str, float | bool | str]
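The three core metrics above (bias, RMSE, and coverage) follow directly from their docstring definitions. As a minimal sketch of that semantics, the snippet below re-implements them standalone with NumPy and the standard library; it is an illustration of the documented behavior, not the actual ``clophfit.testing.evaluation`` source, which may differ in details (e.g. input validation and typing).

```python
import numpy as np
from statistics import NormalDist

def calculate_bias(estimated, true_value):
    # Bias = mean of (estimated - true_value).
    return float(np.mean(np.asarray(estimated) - true_value))

def calculate_rmse(estimated, true_value):
    # RMSE = sqrt(mean squared deviation from the true value).
    return float(np.sqrt(np.mean((np.asarray(estimated) - true_value) ** 2)))

def calculate_coverage(estimated, errors, true_value, confidence=0.95):
    # Fraction of symmetric z*sigma intervals that contain the true value.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95%
    estimated = np.asarray(estimated)
    errors = np.asarray(errors)
    lo, hi = estimated - z * errors, estimated + z * errors
    return float(np.mean((lo <= true_value) & (true_value <= hi)))

# Simulated estimates around a known true value, with honest 1-sigma errors:
# bias should be near 0, RMSE near 0.1, coverage near the nominal 95%.
rng = np.random.default_rng(0)
true_k = 7.0
est = rng.normal(true_k, 0.1, size=1000)
errs = np.full(est.size, 0.1)
print(calculate_bias(est, true_k))
print(calculate_rmse(est, true_k))
print(calculate_coverage(est, errs, true_k))
```

A coverage far below the nominal level signals underestimated standard errors; a coverage near 1.0 signals overestimated ones, which is why the module pairs it with bias and RMSE when benchmarking fitting methods.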