clophfit.fitting.residuals
==========================

.. py:module:: clophfit.fitting.residuals

.. autoapi-nested-parse::

   Residual extraction and analysis utilities for fit results.

   This module provides tools to extract, analyze, and validate residuals from
   fitting procedures. Useful for diagnostics, model validation, and comparing
   different fitting methods.


Classes
-------

.. autoapisummary::

   clophfit.fitting.residuals.ResidualPoint


Functions
---------

.. autoapisummary::

   clophfit.fitting.residuals.extract_residual_points
   clophfit.fitting.residuals.residual_dataframe
   clophfit.fitting.residuals.collect_multi_residuals
   clophfit.fitting.residuals.residual_statistics
   clophfit.fitting.residuals.validate_residuals
   clophfit.fitting.residuals.compute_residual_covariance
   clophfit.fitting.residuals.compute_correlation_matrices
   clophfit.fitting.residuals.analyze_label_bias
   clophfit.fitting.residuals.detect_adjacent_correlation
   clophfit.fitting.residuals.estimate_x_shift_statistics
   clophfit.fitting.residuals.plot_residual_vs_predicted
   clophfit.fitting.residuals.plot_residual_vs_yerr


Module Contents
---------------

.. py:class:: ResidualPoint

   Single residual data point with metadata.

   .. attribute:: label

      Dataset label (e.g., 'y1', 'y2' for multi-label fits)

      :type: str

   .. attribute:: x

      X-value (pH or ligand concentration)

      :type: float

   .. attribute:: resid_weighted

      Weighted residual: (y - model) / y_err

      :type: float

   .. attribute:: resid_raw

      Raw residual: (y - model)

      :type: float

   .. attribute:: raw_i

      Index into the original (unmasked) arrays for this label (``DataArray.xc/yc``).

      :type: int

   .. attribute:: y_err

      Measurement uncertainty used during fitting.

      :type: float

   .. attribute:: predicted

      Model-predicted signal value (y - resid_raw).

      :type: float


.. py:function:: extract_residual_points(fr)

   Extract residual points from a fit result.

   :param fr: Fit result containing residuals and dataset
   :type fr: FitResult[Any]

   :returns: List of residual points with metadata for each observation
   :rtype: list[ResidualPoint]

   :raises ValueError: If residual length doesn't match dataset sizes

   .. rubric:: Examples

   >>> from clophfit.fitting.core import fit_binding_glob
   >>> from clophfit.fitting.data_structures import Dataset, DataArray
   >>> import numpy as np
   >>> # Create test data
   >>> x = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
   >>> y = 500 + 500 * 10 ** (7.0 - x) / (1 + 10 ** (7.0 - x))
   >>> da = DataArray(xc=x, yc=y, y_errc=np.ones_like(y) * 10)
   >>> dataset = Dataset({"y1": da}, is_ph=True)
   >>> fr = fit_binding_glob(dataset)
   >>> residuals = extract_residual_points(fr)
   >>> len(residuals) > 0
   True
   >>> residuals[0].label
   'y1'


.. py:function:: residual_dataframe(fr)

   Convert fit result residuals to a DataFrame.

   :param fr: Fit result to extract residuals from
   :type fr: FitResult[Any]

   :returns: DataFrame with columns: label, x, resid_weighted, resid_raw, raw_i, y_err, predicted
   :rtype: pd.DataFrame

   .. rubric:: Examples

   >>> from clophfit.fitting.core import fit_binding_glob
   >>> from clophfit.fitting.data_structures import Dataset, DataArray
   >>> import numpy as np
   >>> x = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
   >>> y = 500 + 500 * 10 ** (7.0 - x) / (1 + 10 ** (7.0 - x))
   >>> da = DataArray(xc=x, yc=y, y_errc=np.ones_like(y) * 10)
   >>> dataset = Dataset({"y1": da}, is_ph=True)
   >>> fr = fit_binding_glob(dataset)
   >>> df = residual_dataframe(fr)
   >>> "label" in df.columns and "x" in df.columns
   True


.. py:function:: collect_multi_residuals(fit_results, round_x = 3)

   Collect residuals from multiple fit results into a single DataFrame.

   :param fit_results: Dictionary mapping well/key identifiers to fit results
   :type fit_results: dict[str, FitResult[Any]]
   :param round_x: Number of decimals to round x values (avoids float drift).
                   Set to None to disable rounding.
   :type round_x: int | None

   :returns: Combined DataFrame with columns: well, label, x, resid_weighted, resid_raw, raw_i
   :rtype: pd.DataFrame

   .. rubric:: Examples

   >>> from clophfit.fitting.core import fit_binding_glob
   >>> from clophfit.fitting.data_structures import Dataset, DataArray
   >>> import numpy as np
   >>> x = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
   >>> y = 500 + 500 * 10 ** (7.0 - x) / (1 + 10 ** (7.0 - x))
   >>> da = DataArray(xc=x, yc=y, y_errc=np.ones_like(y) * 10)
   >>> dataset = Dataset({"y1": da}, is_ph=True)
   >>> results = {"A01": fit_binding_glob(dataset), "A02": fit_binding_glob(dataset)}
   >>> all_res = collect_multi_residuals(results)
   >>> "well" in all_res.columns
   True
   >>> len(all_res) == 10  # 2 wells * 5 points
   True


.. py:function:: residual_statistics(df)

   Compute residual statistics by label.

   :param df: Residual DataFrame (from residual_dataframe or collect_multi_residuals)
   :type df: pd.DataFrame

   :returns: Statistics by label: mean, std, median, mad, outlier_count
   :rtype: pd.DataFrame

   .. rubric:: Examples

   >>> from clophfit.fitting.core import fit_binding_glob
   >>> from clophfit.fitting.data_structures import Dataset, DataArray
   >>> import numpy as np
   >>> x = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
   >>> y = 500 + 500 * 10 ** (7.0 - x) / (1 + 10 ** (7.0 - x))
   >>> da = DataArray(xc=x, yc=y, y_errc=np.ones_like(y) * 10)
   >>> dataset = Dataset({"y1": da}, is_ph=True)
   >>> results = {"A01": fit_binding_glob(dataset)}
   >>> all_res = collect_multi_residuals(results)
   >>> stats = residual_statistics(all_res)
   >>> "mean" in stats.columns
   True


.. py:function:: validate_residuals(fr, *, verbose = True)

   Validate residual quality for a fit result.

   Checks for common issues:

   - Systematic bias (mean significantly different from 0)
   - Outliers (more than 5% beyond ±3-sigma)
   - Serial correlation (adjacent residuals)

   :param fr: Fit result to validate
   :type fr: FitResult[Any]
   :param verbose: Print warnings for failed checks
   :type verbose: bool

   :returns: Dictionary of check results: {'bias_ok', 'outliers_ok', 'correlation_ok'}
   :rtype: dict[str, bool]

   .. rubric:: Examples

   >>> from clophfit.fitting.core import fit_binding_glob
   >>> from clophfit.fitting.data_structures import Dataset, DataArray
   >>> import numpy as np
   >>> x = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
   >>> y = 500 + 500 * 10 ** (7.0 - x) / (1 + 10 ** (7.0 - x))
   >>> da = DataArray(xc=x, yc=y, y_errc=np.ones_like(y) * 10)
   >>> dataset = Dataset({"y1": da}, is_ph=True)
   >>> fr = fit_binding_glob(dataset)
   >>> checks = validate_residuals(fr, verbose=False)
   >>> isinstance(checks, dict) and "bias_ok" in checks
   True


.. py:function:: compute_residual_covariance(all_res, value_col = 'resid_weighted')

   Compute covariance matrix of residuals for each label.


.. py:function:: compute_correlation_matrices(cov_by_label)

   Convert covariance matrices to correlation matrices.


.. py:function:: analyze_label_bias(all_res, n_bins = 5)

   Detect systematic bias by label and x-range.


.. py:function:: detect_adjacent_correlation(all_res)

   Detect correlation between adjacent residuals within wells.


.. py:function:: estimate_x_shift_statistics(all_res, fit_results)

   Estimate potential systematic x-shifts per well (heuristics).


.. py:function:: plot_residual_vs_predicted(all_res, title = '')

   Plot |standardized residual| vs predicted signal per label.

   A flat trend at ~0.80 (the expected value of |N(0,1)|, i.e. sqrt(2/pi))
   is consistent with a correctly calibrated error model. A rising trend
   indicates under-estimated errors at high signals (multiplicative noise).

   :param all_res: Residual DataFrame from ``collect_multi_residuals``.
                   Must contain columns ``label``, ``predicted``, and ``resid_weighted``.
   :type all_res: pd.DataFrame
   :param title: Figure suptitle suffix.
   :type title: str, optional

   :returns: Matplotlib figure (one panel per label).
   :rtype: Figure


.. py:function:: plot_residual_vs_yerr(all_res, title = '')

   Plot raw residual² vs y_err² per label (error calibration check).

   Points should scatter around the y=x line if the assigned uncertainties
   match the actual scatter. A slope < 1 means errors are over-estimated;
   slope > 1 means under-estimated.

   :param all_res: Residual DataFrame from ``collect_multi_residuals``.
                   Must contain columns ``label``, ``y_err``, and ``resid_raw``.
   :type all_res: pd.DataFrame
   :param title: Figure suptitle suffix.
   :type title: str, optional

   :returns: Matplotlib figure (one panel per label).
   :rtype: Figure
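
The two calibration checks behind ``plot_residual_vs_predicted`` and ``plot_residual_vs_yerr`` can be illustrated numerically. The following is a minimal sketch on synthetic data using only NumPy; it does not call clophfit and is not the library's implementation. The helper ``calibration_slope`` is hypothetical, and fitting the slope by least squares through the origin is an assumption made here for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic residuals with a known, point-dependent noise level.
n = 20_000
y_err = rng.uniform(5.0, 50.0, size=n)   # assigned uncertainties
resid_raw = rng.normal(0.0, y_err)       # raw residuals drawn with that sigma
resid_weighted = resid_raw / y_err       # standardized residuals

# Check 1: for well-calibrated errors the mean |standardized residual|
# approaches E|N(0,1)| = sqrt(2/pi), roughly 0.80.
expected_abs = np.sqrt(2.0 / np.pi)
mean_abs = np.abs(resid_weighted).mean()


def calibration_slope(resid, err):
    """Least-squares slope through the origin of resid**2 against err**2.

    Hypothetical helper for illustration only.
    """
    x = err**2
    y = resid**2
    return float(np.sum(x * y) / np.sum(x * x))


# Check 2: slope near 1 when the assigned errors match the scatter,
# above 1 when they are too small, below 1 when they are too large.
slope_ok = calibration_slope(resid_raw, y_err)
slope_under = calibration_slope(resid_raw, y_err / 2.0)  # errors under-estimated
slope_over = calibration_slope(resid_raw, y_err * 2.0)   # errors over-estimated

print(f"mean |z| = {mean_abs:.3f} (expected {expected_abs:.3f})")
print(f"slopes: ok={slope_ok:.2f}, under={slope_under:.2f}, over={slope_over:.2f}")
```

Halving the assigned errors shrinks ``y_err**2`` by a factor of four while leaving ``resid_raw**2`` unchanged, so the slope rises above 1, matching the "slope > 1 means under-estimated" rule stated for ``plot_residual_vs_yerr``.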