clophfit.fitting.diagnostics
============================

.. py:module:: clophfit.fitting.diagnostics

.. autoapi-nested-parse::

   Well-quality diagnostics for plate-reader titration data.

   Two complementary entry points:

   - :func:`detect_bad_wells_from_dat` — reads raw ``.dat`` files (one per well,
     all labels together).  No fitting required; works before the fitting pipeline.
     Detects low-signal and flat-curve wells across all labels.

   - :func:`detect_bad_wells` — reads ``ffit*.csv`` fit results (one label per
     file).  Adds fit-quality criteria (K at bound, K outlier, poor fit,
     inverted curve, and optionally high residuals) on top of the
     signal-quality checks.

   Detection criteria
   ------------------
   - **K at bound** : K equals the optimizer bound (default 3 or 11 for pH).
     Fit converged to a limit, not a true optimum.
   - **K outlier** : ``|K - median_K| > k_mad_factor * MAD(K)`` across all
     wells on the plate.  Identifies wells with biologically implausible K.
   - **Poor fit** : sK / K > ``max_sk_ratio``.  Relative uncertainty so large
     that K is undetermined.
   - **Low signal** : ``max(|y|)`` < ``min_signal_fraction`` * plate median signal.
     Absolute amplitude so small that the fit is noise-dominated.  Applies to CTR wells too.
   - **Flat curve** : ``(max(y) - min(y)) / max(|y|)`` < ``min_dynamic_range``.
     Signal barely changes over the pH/Cl range.
   - **Inverted curve** : S0 > S1 for pH or S0 < S1 for Cl -- wrong polarity.
     Only checked in :func:`detect_bad_wells` (requires fitted plateaus).
   - **High residuals** : per-well residual MAD > ``residual_mad_factor`` times
     the plate median MAD.  Requires the optional ``residual_stats`` DataFrame.
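
   The K-outlier and flat-curve criteria above can be sketched with plain NumPy.
   This is only an illustration, not the module's implementation: the ``mad``
   helper and the sample arrays are hypothetical, and the MAD is assumed to be
   the unscaled median absolute deviation.

   .. code-block:: python

      import numpy as np


      def mad(values):
          """Unscaled median absolute deviation (an assumption about the definition)."""
          values = np.asarray(values, dtype=float)
          return float(np.median(np.abs(values - np.median(values))))


      # Hypothetical fitted K values for five wells; the last is far from the rest.
      k = np.array([7.0, 7.1, 6.9, 7.2, 9.5])
      k_mad_factor = 5.0
      k_outlier = np.abs(k - np.median(k)) > k_mad_factor * mad(k)

      # Hypothetical titration signal for one well that barely changes.
      y = np.array([100.0, 99.0, 98.5, 98.0])
      min_dynamic_range = 0.05
      dynamic_range = (y.max() - y.min()) / np.abs(y).max()
      flat_curve = dynamic_range < min_dynamic_range

   Here only the last well exceeds the K-outlier threshold, and the sample
   signal counts as a flat curve because its dynamic range is about 0.02.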


Functions
---------

.. autoapisummary::

   clophfit.fitting.diagnostics.detect_bad_wells_from_dat
   clophfit.fitting.diagnostics.detect_bad_wells


Module Contents
---------------

.. py:function:: detect_bad_wells_from_dat(data_dir, *, min_signal_fraction = 0.05, min_dynamic_range = 0.05, ctr_cols = None)

   Flag unreliable wells by reading raw ``.dat`` titration files.

   Reads every ``*.dat`` file in *data_dir* (one per well).  Each file must
   have an ``x`` column and one or more signal columns (e.g. ``y1``, ``y2``).
   All labels are checked together — no fitting is required.

   :param data_dir: Directory containing ``*.dat`` files (one per well, CSV format with
                    columns ``x, y1[, y2, ...]``).
   :type data_dir: str | Path
   :param min_signal_fraction: Flag a well when any label's ``max(|y|)`` is below this fraction of
                               the plate-wide median ``max(|y|)`` for that label (default 0.05).
   :type min_signal_fraction: float
   :param min_dynamic_range: Flag a well when any label's ``(max(y) - min(y)) / max(|y|)`` is
                             below this threshold (default 0.05).
   :type min_dynamic_range: float
   :param ctr_cols: 1-based column numbers for control wells (e.g. ``[1, 12]``).
                    Currently used only for logging; all flags apply equally to CTR wells
                    because low signal or flat curves in a CTR are genuinely informative.
   :type ctr_cols: list[int] | None

   :returns: One row per well with columns:

             - ``well``
             - ``flag_low_signal``
             - ``flag_flat_curve``
             - ``flag_any``
             - ``flag_count``

             Sorted by descending ``flag_count``.
   :rtype: pd.DataFrame

   :raises FileNotFoundError: If no ``*.dat`` files are found in *data_dir*.

   .. rubric:: Examples

   >>> import tempfile
   >>> from pathlib import Path
   >>> nl = chr(10)
   >>> with tempfile.TemporaryDirectory() as d:
   ...     rows_by_well = {
   ...         "A01": ["8,100,200", "7,90,190", "6,80,180"],
   ...         "A02": ["8,1,2", "7,0.9,1.9", "6,0.8,1.8"],
   ...     }
   ...     for well, rows in rows_by_well.items():
   ...         _ = (Path(d) / f"{well}.dat").write_text(
   ...             nl.join(["x,y1,y2", *rows, ""])
   ...         )
   ...     flags = detect_bad_wells_from_dat(d)
   ...     flags[["well", "flag_any"]].values.tolist()
   [['A02', True], ['A01', False]]
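
   The plate-relative low-signal rule described for ``min_signal_fraction`` can
   be sketched with pandas as follows.  This is a minimal illustration under
   stated assumptions, not the function's actual code: the ``wells`` dictionary
   stands in for the per-well ``.dat`` tables, and the aggregation shown is one
   plausible reading of the parameter description.

   .. code-block:: python

      import pandas as pd

      # Hypothetical in-memory stand-ins for three per-well ``.dat`` tables.
      wells = {
          "A01": pd.DataFrame({"x": [8, 7, 6], "y1": [100.0, 90.0, 80.0]}),
          "A02": pd.DataFrame({"x": [8, 7, 6], "y1": [110.0, 95.0, 70.0]}),
          "A03": pd.DataFrame({"x": [8, 7, 6], "y1": [1.0, 0.9, 0.8]}),
      }
      min_signal_fraction = 0.05

      # Peak absolute signal per well and label (rows: wells, columns: labels).
      peak = pd.DataFrame(
          {well: df.drop(columns="x").abs().max() for well, df in wells.items()}
      ).T

      # Flag a well when any label falls below the fraction of the plate median.
      flag_low_signal = (peak < min_signal_fraction * peak.median()).any(axis=1)

   With these values the plate median peak is 100, so only ``A03`` (peak 1.0)
   falls below the threshold of 5.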


.. py:function:: detect_bad_wells(ffit, *, k_min = 3.0, k_max = 11.0, k_mad_factor = 5.0, max_sk_ratio = 0.3, min_signal_fraction = 0.05, min_dynamic_range = 0.05, check_polarity = True, is_ph = True, ctr_cols = None, residual_stats = None, residual_mad_factor = 5.0)

   Flag unreliable wells from an ffit result DataFrame.

   :param ffit: Per-well fit results with at minimum columns ``well``, ``K``, ``sK``
                and at least one pair of ``S0_{lbl}`` / ``S1_{lbl}`` columns.
                Typically read from ``ffit*.csv`` produced by ``ppr``.
   :type ffit: pd.DataFrame
   :param k_min: Lower optimizer bound for K (default 3.0 for pH).
   :type k_min: float
   :param k_max: Upper optimizer bound for K (default 11.0 for pH).
   :type k_max: float
   :param k_mad_factor: Outlier threshold: flag if ``|K - median| > k_mad_factor * MAD``.
   :type k_mad_factor: float
   :param max_sk_ratio: Maximum tolerated relative uncertainty sK/K (default 0.30).
   :type max_sk_ratio: float
   :param min_signal_fraction: Minimum signal amplitude relative to the plate median: flag wells
                               where ``max(|S0|, |S1|) < min_signal_fraction * plate_median_signal``
                               (default 0.05).  Catches wells with very low absolute signal regardless
                               of relative dynamic range.  Applies to all wells including CTR.
   :type min_signal_fraction: float
   :param min_dynamic_range: Minimum required ``|S1 - S0| / max(|S0|, |S1|)`` per label (default 0.05).
   :type min_dynamic_range: float
   :param check_polarity: If True, flag wells where the signal direction is inverted relative
                          to the expected biological response.
   :type check_polarity: bool
   :param is_ph: If True (default), pH assay: expect S1 > S0 (signal rises with pH).
                 If False, Cl assay: expect S0 > S1.
   :type is_ph: bool
   :param ctr_cols: Column numbers (1-based, e.g. ``[1, 12]``) reserved for control wells.
                    CTR wells are excluded from the K-outlier population so their different
                    pKa does not bias the sample statistics.  ``flag_k_at_bound``,
                    ``flag_k_outlier``, and ``flag_inverted`` are suppressed for CTR wells —
                    their pKa may be outside the measurement range, causing these criteria
                    to fire spuriously.  All other flags (``flag_poor_fit`` when K is not
                    at bound, ``flag_low_signal``, ``flag_flat_curve``,
                    ``flag_high_residuals``) apply to CTR wells.
                    Default ``None`` means no CTR exclusion.
   :type ctr_cols: list[int] | None
   :param residual_stats: Optional DataFrame from ``residual_stats_*.csv`` with columns
                          ``label``, ``mad``, and ``well``.  When provided, enables per-well
                          residual-MAD outlier detection.
   :type residual_stats: pd.DataFrame | None
   :param residual_mad_factor: Flag if per-well residual MAD > ``residual_mad_factor`` times the
                               plate median MAD (default 5.0).
   :type residual_mad_factor: float

   :returns: One row per well with boolean flag columns:

             - ``flag_k_at_bound``
             - ``flag_k_outlier``
             - ``flag_poor_fit``
             - ``flag_low_signal``
             - ``flag_flat_curve``
             - ``flag_inverted``    (only when ``check_polarity=True``)
             - ``flag_high_residuals``  (only when ``residual_stats`` provided)
             - ``flag_any``         -- True if any flag is set

             Ordered by descending ``flag_count``.
   :rtype: pd.DataFrame

   .. rubric:: Examples

   >>> import pandas as pd
   >>> ffit = pd.DataFrame({
   ...     "well": ["A01", "B06", "E10"],
   ...     "K": [7.1, 3.0, 11.0],
   ...     "sK": [0.06, 400.0, 35.0],
   ...     "S0_1": [600.0, 45.0, 5890.0],
   ...     "S1_1": [1100.0, -7800.0, 475.0],
   ... })
   >>> flags = detect_bad_wells(ffit, k_min=3.0, k_max=11.0, ctr_cols=[1])
   >>> flags[["well", "flag_any"]].values.tolist()
   [['B06', True], ['E10', True], ['A01', False]]
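
   The polarity and dynamic-range checks on fitted plateaus can be sketched as
   below.  This is an illustration under stated assumptions rather than the
   function's implementation: it handles a single label (``S0_1`` / ``S1_1``)
   and reuses two rows of the example above.

   .. code-block:: python

      import pandas as pd

      ffit = pd.DataFrame({
          "well": ["A01", "E10"],
          "S0_1": [600.0, 5890.0],
          "S1_1": [1100.0, 475.0],
      })
      is_ph = True

      s0, s1 = ffit["S0_1"], ffit["S1_1"]

      # pH assays expect S1 > S0; Cl assays expect S0 > S1.
      inverted = (s0 > s1) if is_ph else (s0 < s1)

      # Relative plateau separation, |S1 - S0| / max(|S0|, |S1|).
      amplitude = pd.concat([s0.abs(), s1.abs()], axis=1).max(axis=1)
      dynamic_range = (s1 - s0).abs() / amplitude

   In this sketch ``E10`` is marked as inverted for a pH assay, while both
   wells have a dynamic range well above the default 0.05.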