clophfit.fitting.residuals
==========================

.. py:module:: clophfit.fitting.residuals

.. autoapi-nested-parse::

   Residual extraction and analysis utilities for fit results.

   This module provides tools to extract, analyze, and validate residuals from
   fitting procedures. Useful for diagnostics, model validation, and comparing
   different fitting methods.


Classes
-------

.. autoapisummary::

   clophfit.fitting.residuals.ResidualPoint


Functions
---------

.. autoapisummary::

   clophfit.fitting.residuals.extract_residual_points
   clophfit.fitting.residuals.residual_dataframe
   clophfit.fitting.residuals.collect_multi_residuals
   clophfit.fitting.residuals.residual_statistics
   clophfit.fitting.residuals.validate_residuals
   clophfit.fitting.residuals.compute_residual_covariance
   clophfit.fitting.residuals.compute_correlation_matrices
   clophfit.fitting.residuals.analyze_label_bias
   clophfit.fitting.residuals.detect_adjacent_correlation
   clophfit.fitting.residuals.estimate_x_shift_statistics
   clophfit.fitting.residuals.plot_residual_vs_predicted
   clophfit.fitting.residuals.plot_residual_vs_yerr


Module Contents
---------------

.. py:class:: ResidualPoint

   Single residual data point with metadata.

   .. attribute:: label

      Dataset label (e.g., 'y1', 'y2' for multi-label fits)

      :type: str

   .. attribute:: x

      X-value (pH or ligand concentration)

      :type: float

   .. attribute:: resid_weighted

      Weighted residual: (y - model) / y_err

      :type: float

   .. attribute:: resid_raw

      Raw residual: (y - model)

      :type: float

   .. attribute:: raw_i

      Index into the original (unmasked) arrays for this label (``DataArray.xc``/``DataArray.yc``).

      :type: int

   .. attribute:: y_err

      Measurement uncertainty used during fitting.

      :type: float

   .. attribute:: predicted

      Model-predicted signal value (y - resid_raw).

      :type: float
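
   The attribute definitions above imply two identities:
   ``resid_weighted = resid_raw / y_err`` and ``predicted = y - resid_raw``.
   A minimal sketch with made-up numbers may help; the arrays ``y``, ``model``
   and ``y_err`` are illustrative assumptions and are not produced by clophfit:

   .. code-block:: python

      import numpy as np

      # Hypothetical observations, model values, and uncertainties.
      y = np.array([510.0, 545.0, 750.0, 955.0, 990.0])
      model = np.array([505.0, 545.5, 750.0, 954.5, 995.0])
      y_err = np.full_like(y, 10.0)

      resid_raw = y - model               # raw residual: (y - model)
      resid_weighted = resid_raw / y_err  # weighted residual: (y - model) / y_err
      predicted = y - resid_raw           # recovers the model value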


.. py:function:: extract_residual_points(fr)

   Extract residual points from a fit result.

   :param fr: Fit result containing residuals and dataset
   :type fr: FitResult[Any]

   :returns: List of residual points with metadata for each observation
   :rtype: list[ResidualPoint]

   :raises ValueError: If residual length doesn't match dataset sizes

   .. rubric:: Examples

   >>> from clophfit.fitting.core import fit_binding_glob
   >>> from clophfit.fitting.data_structures import Dataset, DataArray
   >>> import numpy as np
   >>> # Create test data
   >>> x = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
   >>> y = 500 + 500 * 10 ** (7.0 - x) / (1 + 10 ** (7.0 - x))
   >>> da = DataArray(xc=x, yc=y, y_errc=np.ones_like(y) * 10)
   >>> dataset = Dataset({"y1": da}, is_ph=True)
   >>> fr = fit_binding_glob(dataset)
   >>> residuals = extract_residual_points(fr)
   >>> len(residuals) > 0
   True
   >>> residuals[0].label
   'y1'


.. py:function:: residual_dataframe(fr)

   Convert fit result residuals to a DataFrame.

   :param fr: Fit result to extract residuals from
   :type fr: FitResult[Any]

   :returns: DataFrame with columns: label, x, resid_weighted, resid_raw, raw_i, y_err, predicted
   :rtype: pd.DataFrame

   .. rubric:: Examples

   >>> from clophfit.fitting.core import fit_binding_glob
   >>> from clophfit.fitting.data_structures import Dataset, DataArray
   >>> import numpy as np
   >>> x = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
   >>> y = 500 + 500 * 10 ** (7.0 - x) / (1 + 10 ** (7.0 - x))
   >>> da = DataArray(xc=x, yc=y, y_errc=np.ones_like(y) * 10)
   >>> dataset = Dataset({"y1": da}, is_ph=True)
   >>> fr = fit_binding_glob(dataset)
   >>> df = residual_dataframe(fr)
   >>> "label" in df.columns and "x" in df.columns
   True


.. py:function:: collect_multi_residuals(fit_results, round_x = 3)

   Collect residuals from multiple fit results into a single DataFrame.

   :param fit_results: Dictionary mapping well/key identifiers to fit results
   :type fit_results: dict[str, FitResult[Any]]
   :param round_x: Number of decimal places to which x values are rounded
                   (avoids float drift). Set to ``None`` to disable rounding.
   :type round_x: int | None

   :returns: Combined DataFrame with columns: well, label, x, resid_weighted, resid_raw, raw_i, y_err, predicted
   :rtype: pd.DataFrame

   .. rubric:: Examples

   >>> from clophfit.fitting.core import fit_binding_glob
   >>> from clophfit.fitting.data_structures import Dataset, DataArray
   >>> import numpy as np
   >>> x = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
   >>> y = 500 + 500 * 10 ** (7.0 - x) / (1 + 10 ** (7.0 - x))
   >>> da = DataArray(xc=x, yc=y, y_errc=np.ones_like(y) * 10)
   >>> dataset = Dataset({"y1": da}, is_ph=True)
   >>> results = {"A01": fit_binding_glob(dataset), "A02": fit_binding_glob(dataset)}
   >>> all_res = collect_multi_residuals(results)
   >>> "well" in all_res.columns
   True
   >>> len(all_res) == 10  # 2 wells * 5 points
   True


.. py:function:: residual_statistics(df)

   Compute residual statistics by label.

   :param df: Residual DataFrame (from residual_dataframe or collect_multi_residuals)
   :type df: pd.DataFrame

   :returns: Statistics by label: mean, std, median, mad, outlier_count
   :rtype: pd.DataFrame

   .. rubric:: Examples

   >>> from clophfit.fitting.core import fit_binding_glob
   >>> from clophfit.fitting.data_structures import Dataset, DataArray
   >>> import numpy as np
   >>> x = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
   >>> y = 500 + 500 * 10 ** (7.0 - x) / (1 + 10 ** (7.0 - x))
   >>> da = DataArray(xc=x, yc=y, y_errc=np.ones_like(y) * 10)
   >>> dataset = Dataset({"y1": da}, is_ph=True)
   >>> results = {"A01": fit_binding_glob(dataset)}
   >>> all_res = collect_multi_residuals(results)
   >>> stats = residual_statistics(all_res)
   >>> "mean" in stats.columns
   True


.. py:function:: validate_residuals(fr, *, verbose = True)

   Validate residual quality for a fit result.

   Checks for common issues:

   - Systematic bias (mean significantly different from 0)
   - Outliers (more than 5% beyond ±3-sigma)
   - Serial correlation (adjacent residuals)

   :param fr: Fit result to validate
   :type fr: FitResult[Any]
   :param verbose: Print warnings for failed checks
   :type verbose: bool

   :returns: Dictionary of check results: {'bias_ok', 'outliers_ok', 'correlation_ok'}
   :rtype: dict[str, bool]

   .. rubric:: Examples

   >>> from clophfit.fitting.core import fit_binding_glob
   >>> from clophfit.fitting.data_structures import Dataset, DataArray
   >>> import numpy as np
   >>> x = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
   >>> y = 500 + 500 * 10 ** (7.0 - x) / (1 + 10 ** (7.0 - x))
   >>> da = DataArray(xc=x, yc=y, y_errc=np.ones_like(y) * 10)
   >>> dataset = Dataset({"y1": da}, is_ph=True)
   >>> fr = fit_binding_glob(dataset)
   >>> checks = validate_residuals(fr, verbose=False)
   >>> isinstance(checks, dict) and "bias_ok" in checks
   True
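
   The three checks can be pictured with a small standalone sketch. The helper
   ``check_residuals_sketch`` and its thresholds (two standard errors for the
   bias test, 0.5 for the lag-1 correlation) are assumptions chosen for
   illustration; only the 5% / ±3-sigma outlier rule comes from the description
   above, and the actual criteria used by ``validate_residuals`` may differ:

   .. code-block:: python

      import numpy as np


      def check_residuals_sketch(resid_weighted):
          """Illustrative checks; thresholds are assumptions, not clophfit's."""
          r = np.asarray(resid_weighted, dtype=float)
          # Bias: mean within two standard errors of zero (assumed criterion).
          sem = r.std(ddof=1) / np.sqrt(r.size)
          bias_ok = abs(r.mean()) <= 2 * sem
          # Outliers: at most 5% of points beyond +/-3 sigma.
          outliers_ok = np.mean(np.abs(r) > 3) <= 0.05
          # Serial correlation: lag-1 Pearson correlation below 0.5 (assumed).
          lag1 = np.corrcoef(r[:-1], r[1:])[0, 1]
          correlation_ok = abs(lag1) < 0.5
          return {
              "bias_ok": bool(bias_ok),
              "outliers_ok": bool(outliers_ok),
              "correlation_ok": bool(correlation_ok),
          }


      # A steadily drifting, far-from-zero sequence fails all three checks.
      bad = check_residuals_sketch(np.linspace(4.0, 6.0, 50))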


.. py:function:: compute_residual_covariance(all_res, value_col = 'resid_weighted')

   Compute covariance matrix of residuals for each label.
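
   One plausible reading, offered as an assumption rather than a description of
   the actual implementation, is that residuals are pivoted per label into a
   wells-by-x table and the covariance is taken between x positions across
   wells. The DataFrame below is made up for illustration:

   .. code-block:: python

      import numpy as np
      import pandas as pd

      # Hypothetical long-format residual table: 3 wells, 1 label, 3 x values.
      all_res = pd.DataFrame({
          "well": ["A01"] * 3 + ["A02"] * 3 + ["A03"] * 3,
          "label": ["y1"] * 9,
          "x": [5.0, 6.0, 7.0] * 3,
          "resid_weighted": [0.1, -0.2, 0.3, 0.4, 0.0, -0.1, -0.3, 0.2, 0.1],
      })

      cov_by_label = {}
      for label, grp in all_res.groupby("label"):
          # Rows are wells, columns are x values.
          wide = grp.pivot(index="well", columns="x", values="resid_weighted")
          # Covariance between x positions, estimated across wells.
          cov_by_label[label] = wide.cov()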


.. py:function:: compute_correlation_matrices(cov_by_label)

   Convert covariance matrices to correlation matrices.
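
   The standard conversion divides each covariance by the product of the two
   standard deviations, ``corr[i, j] = cov[i, j] / (sd[i] * sd[j])``. A minimal
   NumPy sketch on a hand-written matrix (not the function's own code):

   .. code-block:: python

      import numpy as np

      cov = np.array([[4.0, 2.0],
                      [2.0, 9.0]])
      sd = np.sqrt(np.diag(cov))     # standard deviations: [2, 3]
      corr = cov / np.outer(sd, sd)  # unit diagonal, off-diagonal 2 / (2 * 3)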


.. py:function:: analyze_label_bias(all_res, n_bins = 5)

   Detect systematic bias by label and x-range.
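
   A rough sketch of the idea, assuming that "x-range" means equal-width bins
   of ``x`` (the binning strategy and the output layout of the real function
   are not documented here). The data are simulated:

   .. code-block:: python

      import numpy as np
      import pandas as pd

      rng = np.random.default_rng(1)
      all_res = pd.DataFrame({
          "label": np.repeat(["y1", "y2"], 50),
          "x": np.tile(np.linspace(5.0, 9.0, 50), 2),
          "resid_weighted": rng.standard_normal(100),
      })

      # Split the x-axis into 5 equal-width bins (mirrors the n_bins default).
      all_res["x_bin"] = pd.cut(all_res["x"], bins=5)
      # Mean residual per (label, bin); a mean far from 0 suggests local bias.
      bias = (
          all_res.groupby(["label", "x_bin"], observed=True)["resid_weighted"]
          .agg(["mean", "count"])
      )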


.. py:function:: detect_adjacent_correlation(all_res)

   Detect correlation between adjacent residuals within wells.
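
   As an illustration only, adjacent correlation can be estimated by pairing
   each residual with its neighbour at the next x value within the same well
   and label, then correlating the pairs. The table below is invented, and the
   statistic actually reported by the function may differ:

   .. code-block:: python

      import pandas as pd

      all_res = pd.DataFrame({
          "well": ["A01"] * 5 + ["A02"] * 5,
          "label": ["y1"] * 10,
          "x": [5.0, 6.0, 7.0, 8.0, 9.0] * 2,
          "resid_weighted": [1.0, 0.8, 0.5, -0.4, -0.9,
                             -0.2, 0.3, -0.1, 0.4, -0.3],
      })

      ordered = all_res.sort_values(["well", "label", "x"])
      # Neighbouring residual at the next x within the same well and label.
      ordered["next"] = ordered.groupby(["well", "label"])["resid_weighted"].shift(-1)
      pairs = ordered.dropna(subset=["next"])
      # Pooled lag-1 Pearson correlation over all adjacent pairs.
      lag1 = pairs["resid_weighted"].corr(pairs["next"])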


.. py:function:: estimate_x_shift_statistics(all_res, fit_results)

   Estimate potential systematic x-shifts per well (heuristic).


.. py:function:: plot_residual_vs_predicted(all_res, title = '')

   Plot absolute standardized residual vs predicted signal per label.

   A flat trend at ~0.80 (the expected absolute value of a standard normal
   variable, sqrt(2/pi)) indicates that the error model is correctly
   calibrated. A rising trend indicates under-estimated errors at high
   signals (multiplicative noise).

   :param all_res: Residual DataFrame from ``collect_multi_residuals``.  Must contain
                   columns ``label``, ``predicted``, and ``resid_weighted``.
   :type all_res: pd.DataFrame
   :param title: Figure suptitle suffix.
   :type title: str, optional

   :returns: Matplotlib figure (one panel per label).
   :rtype: Figure
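
   The ~0.80 reference level is the mean absolute value of a standard normal
   variable, ``sqrt(2 / pi)``. A quick numerical check with simulated draws
   (independent of clophfit):

   .. code-block:: python

      import numpy as np

      expected_abs = np.sqrt(2 / np.pi)  # analytic mean of the absolute value

      rng = np.random.default_rng(0)
      z = rng.standard_normal(100_000)   # well-calibrated standardized residuals
      empirical = np.abs(z).mean()       # should land close to expected_abs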


.. py:function:: plot_residual_vs_yerr(all_res, title = '')

   Plot raw residual² vs y_err² per label (error calibration check).

   Points should scatter around the y=x line if the assigned uncertainties
   match the actual scatter.  A slope < 1 means errors are over-estimated;
   slope > 1 means under-estimated.

   :param all_res: Residual DataFrame from ``collect_multi_residuals``.  Must contain
                   columns ``label``, ``y_err``, and ``resid_raw``.
   :type all_res: pd.DataFrame
   :param title: Figure suptitle suffix.
   :type title: str, optional

   :returns: Matplotlib figure (one panel per label).
   :rtype: Figure
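
   The slope interpretation can be illustrated with simulated data in which
   the scatter matches the assigned uncertainties. The through-origin
   least-squares slope used here is an assumption made for the sketch, not
   necessarily what the plot draws:

   .. code-block:: python

      import numpy as np

      rng = np.random.default_rng(0)
      y_err = rng.uniform(5.0, 50.0, size=20_000)
      resid_raw = rng.normal(0.0, y_err)  # scatter equals the assigned errors

      u, v = y_err**2, resid_raw**2
      slope = (u @ v) / (u @ u)           # least-squares slope through the origin
      # slope near 1: calibrated; below 1: over-estimated; above 1: under-estimated

   Rescaling ``y_err`` by a factor of 2 before computing ``u`` would pull the
   slope towards 0.25, the signature of over-estimated errors.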