clophfit.fitting.diagnostics
============================

.. py:module:: clophfit.fitting.diagnostics

.. autoapi-nested-parse::

   Well-quality diagnostics for plate-reader titration data.

   Two complementary entry points:

   - :func:`detect_bad_wells_from_dat` reads raw ``.dat`` files (one per
     well, all labels together). No fitting is required, so it works before
     the fitting pipeline. It detects low-signal and flat-curve wells across
     all labels.
   - :func:`detect_bad_wells` reads ``ffit*.csv`` fit results (one label per
     file). It adds fit-quality criteria (K at bound, K outlier, poor fit) on
     top of the signal-quality checks.

   Detection criteria
   ------------------

   - **K at bound**: K equals the lower or upper optimizer bound (by default
     3 and 11 for pH). The fit converged to a limit, not to a true optimum.
   - **K outlier**: ``|K - median_K| > k_mad_factor * MAD(K)`` across all
     wells on the plate. Identifies wells with biologically implausible K.
   - **Poor fit**: ``sK / K > max_sk_ratio``. The relative uncertainty is so
     large that K is undetermined.
   - **Low signal**: ``max(|y|) < min_signal_fraction * plate_median_signal``.
     The absolute amplitude is so small that the fit is noise-dominated.
     Also applies to CTR wells.
   - **Flat curve**: ``(max(y) - min(y)) / max(|y|) < min_dynamic_range``.
     The signal barely changes over the pH/Cl range.
   - **Inverted curve**: S0 > S1 for pH or S0 < S1 for Cl (wrong polarity).
     Only checked in :func:`detect_bad_wells`, because it requires fitted
     plateaus.
   - **High residuals**: the per-well residual MAD exceeds
     ``residual_mad_factor`` times the plate median MAD. Requires the
     optional ``residual_stats`` DataFrame.

Functions
---------

.. autoapisummary::

   clophfit.fitting.diagnostics.detect_bad_wells_from_dat
   clophfit.fitting.diagnostics.detect_bad_wells


Module Contents
---------------

.. py:function:: detect_bad_wells_from_dat(data_dir, *, min_signal_fraction = 0.05, min_dynamic_range = 0.05, ctr_cols = None)

   Flag unreliable wells by reading raw ``.dat`` titration files.

   Reads every ``*.dat`` file in *data_dir* (one per well).
   Each file must have an ``x`` column and one or more signal columns
   (e.g. ``y1``, ``y2``). All labels are checked together; no fitting is
   required.

   :param data_dir: Directory containing ``*.dat`` files (one per well, CSV
      format with columns ``x, y1[, y2, ...]``).
   :type data_dir: str | Path
   :param min_signal_fraction: Flag a well when any label's ``max(|y|)`` is
      below this fraction of the plate-wide median ``max(|y|)`` for that
      label (default 0.05).
   :type min_signal_fraction: float
   :param min_dynamic_range: Flag a well when any label's
      ``(max(y) - min(y)) / max(|y|)`` is below this threshold (default 0.05).
   :type min_dynamic_range: float
   :param ctr_cols: 1-based column numbers for control wells (e.g.
      ``[1, 12]``). Currently used only for logging; all flags apply equally
      to CTR wells, because low signal or flat curves in a CTR are genuinely
      informative.
   :type ctr_cols: list[int] | None
   :returns:
      One row per well with columns:

      - ``well``
      - ``flag_low_signal``
      - ``flag_flat_curve``
      - ``flag_any``
      - ``flag_count``

      Sorted by descending ``flag_count``.
   :rtype: pd.DataFrame
   :raises FileNotFoundError: If no ``*.dat`` files are found in *data_dir*.

   .. rubric:: Examples

   >>> import tempfile
   >>> from pathlib import Path
   >>> nl = chr(10)
   >>> with tempfile.TemporaryDirectory() as d:
   ...     rows_by_well = {
   ...         "A01": ["8,100,200", "7,90,190", "6,80,180"],
   ...         "A02": ["8,1,2", "7,0.9,1.9", "6,0.8,1.8"],
   ...     }
   ...     for well, rows in rows_by_well.items():
   ...         _ = (Path(d) / f"{well}.dat").write_text(
   ...             nl.join(["x,y1,y2", *rows, ""])
   ...         )
   ...     flags = detect_bad_wells_from_dat(d)
   ...     flags[["well", "flag_any"]].values.tolist()
   [['A02', True], ['A01', False]]


.. py:function:: detect_bad_wells(ffit, *, k_min = 3.0, k_max = 11.0, k_mad_factor = 5.0, max_sk_ratio = 0.3, min_signal_fraction = 0.05, min_dynamic_range = 0.05, check_polarity = True, is_ph = True, ctr_cols = None, residual_stats = None, residual_mad_factor = 5.0)

   Flag unreliable wells from an ``ffit`` result DataFrame.

   :param ffit: Per-well fit results with at least the columns ``well``,
      ``K``, ``sK`` and at least one pair of ``S0_{lbl}`` / ``S1_{lbl}``
      columns. Typically read from ``ffit*.csv`` produced by ``ppr``.
   :type ffit: pd.DataFrame
   :param k_min: Lower optimizer bound for K (default 3.0 for pH).
   :type k_min: float
   :param k_max: Upper optimizer bound for K (default 11.0 for pH).
   :type k_max: float
   :param k_mad_factor: Outlier threshold: flag if
      ``|K - median| > k_mad_factor * MAD``.
   :type k_mad_factor: float
   :param max_sk_ratio: Maximum tolerated relative uncertainty ``sK / K``
      (default 0.30).
   :type max_sk_ratio: float
   :param min_signal_fraction: Minimum signal amplitude relative to the
      plate median: flag wells where
      ``max(|S0|, |S1|) < min_signal_fraction * plate_median_signal``
      (default 0.05). Catches wells with very low absolute signal regardless
      of their relative dynamic range. Applies to all wells, including CTR.
   :type min_signal_fraction: float
   :param min_dynamic_range: Minimum required
      ``|S1 - S0| / max(|S0|, |S1|)`` per label (default 0.05).
   :type min_dynamic_range: float
   :param check_polarity: If True, flag wells where the signal direction is
      inverted relative to the expected biological response.
   :type check_polarity: bool
   :param is_ph: If True (default), pH assay: expect S1 > S0 (signal rises
      with pH). If False, Cl assay: expect S0 > S1.
   :type is_ph: bool
   :param ctr_cols: Column numbers (1-based, e.g. ``[1, 12]``) reserved for
      control wells. CTR wells are excluded from the K-outlier population so
      that their different pKa does not bias the sample statistics.
      ``flag_k_at_bound``, ``flag_k_outlier``, and ``flag_inverted`` are
      suppressed for CTR wells: their pKa may lie outside the measurement
      range, which would make these criteria fire spuriously. All other
      flags (``flag_poor_fit`` when K is not at bound, ``flag_low_signal``,
      ``flag_flat_curve``, ``flag_high_residuals``) apply to CTR wells.
      Default ``None`` means no CTR exclusion.
   :type ctr_cols: list[int] | None
   :param residual_stats: Optional DataFrame from ``residual_stats_*.csv``
      with columns ``label``, ``mad``, and ``well``. When provided, enables
      per-well residual-MAD outlier detection.
   :type residual_stats: pd.DataFrame | None
   :param residual_mad_factor: Flag if the per-well residual MAD exceeds
      ``residual_mad_factor`` times the plate median MAD (default 5.0).
   :type residual_mad_factor: float
   :returns:
      One row per well with a ``well`` column and boolean flag columns:

      - ``flag_k_at_bound``
      - ``flag_k_outlier``
      - ``flag_poor_fit``
      - ``flag_low_signal``
      - ``flag_flat_curve``
      - ``flag_inverted`` (only when ``check_polarity=True``)
      - ``flag_high_residuals`` (only when ``residual_stats`` is provided)
      - ``flag_any``: True if any flag is set

      Sorted by descending ``flag_count``.
   :rtype: pd.DataFrame

   .. rubric:: Examples

   >>> import pandas as pd
   >>> ffit = pd.DataFrame({
   ...     "well": ["A01", "B06", "E10"],
   ...     "K": [7.1, 3.0, 11.0],
   ...     "sK": [0.06, 400.0, 35.0],
   ...     "S0_1": [600.0, 45.0, 5890.0],
   ...     "S1_1": [1100.0, -7800.0, 475.0],
   ... })
   >>> flags = detect_bad_wells(ffit, k_min=3.0, k_max=11.0, ctr_cols=[1])
   >>> flags[["well", "flag_any"]].values.tolist()
   [['B06', True], ['E10', True], ['A01', False]]
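
The per-well criteria listed in the module description can be sketched in standard-library Python to show how each threshold is applied to one label. This is a minimal illustration, not the clophfit implementation. The ``WellFit`` record and the ``sketch_flags`` function are hypothetical names, and the sketch makes several assumptions: one label per well, MAD taken as the raw (unscaled) median absolute deviation, and the plate median signal taken as the median of the per-well ``max(|S0|, |S1|)``. Control-well handling (``ctr_cols``) and the residual check are left out.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class WellFit:
    """Hypothetical single-label fit result for one well (sketch only)."""

    well: str
    K: float
    sK: float
    S0: float  # fitted plateau, as in the ffit column S0_{lbl}
    S1: float  # fitted plateau, as in the ffit column S1_{lbl}


def sketch_flags(
    fits: list[WellFit],
    *,
    k_min: float = 3.0,
    k_max: float = 11.0,
    k_mad_factor: float = 5.0,
    max_sk_ratio: float = 0.3,
    min_signal_fraction: float = 0.05,
    min_dynamic_range: float = 0.05,
    is_ph: bool = True,
) -> dict[str, dict[str, bool]]:
    """Return {well: {flag_name: bool}}; illustrative, not clophfit's code."""
    ks = [f.K for f in fits]
    median_k = statistics.median(ks)
    # Assumption: raw MAD, without the 1.4826 normal-consistency factor.
    mad_k = statistics.median(abs(k - median_k) for k in ks)
    # Per-well amplitude is the larger absolute plateau.
    amplitudes = [max(abs(f.S0), abs(f.S1)) for f in fits]
    median_amp = statistics.median(amplitudes)

    result: dict[str, dict[str, bool]] = {}
    for f, amp in zip(fits, amplitudes):
        flags = {
            # The optimizer stopped on a bound instead of a true optimum.
            "flag_k_at_bound": math.isclose(f.K, k_min) or math.isclose(f.K, k_max),
            "flag_k_outlier": abs(f.K - median_k) > k_mad_factor * mad_k,
            "flag_poor_fit": f.sK / f.K > max_sk_ratio,
            "flag_low_signal": amp < min_signal_fraction * median_amp,
            # A zero-amplitude well counts as flat (short-circuit avoids 0/0).
            "flag_flat_curve": (
                amp == 0 or abs(f.S1 - f.S0) / amp < min_dynamic_range
            ),
            # pH assay expects S1 > S0; Cl assay expects S0 > S1.
            "flag_inverted": f.S0 > f.S1 if is_ph else f.S0 < f.S1,
        }
        flags["flag_any"] = any(flags.values())
        result[f.well] = flags
    return result


# Same three wells as the detect_bad_wells example.
fits = [
    WellFit("A01", K=7.1, sK=0.06, S0=600.0, S1=1100.0),
    WellFit("B06", K=3.0, sK=400.0, S0=45.0, S1=-7800.0),
    WellFit("E10", K=11.0, sK=35.0, S0=5890.0, S1=475.0),
]
flags = sketch_flags(fits)
print({well: f["flag_any"] for well, f in flags.items()})
# {'A01': False, 'B06': True, 'E10': True}
```

On these wells, B06 and E10 are flagged because K sits on an optimizer bound. Both also have ``sK / K`` far above 0.3 and inverted plateaus. A01 passes every check. The K-outlier test does not fire here: three wells give a wide MAD (3.9), so the threshold is 19.5 pH units.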