clophfit.fitting.diagnostics

Well-quality diagnostics for plate-reader titration data.

Two complementary entry points:

  • detect_bad_wells_from_dat(): reads raw .dat files (one per well, all labels together). It requires no fitting, so it can run before the fitting pipeline. Detects low-signal and flat-curve wells across all labels.

  • detect_bad_wells(): reads ffit*.csv fit results (one label per file). Adds fit-quality criteria (K at bound, K outlier, poor fit) on top of the signal-quality checks.

Detection criteria

  • K at bound : K equals one of the optimizer bounds (k_min or k_max; defaults 3 and 11 for pH). The fit converged to a limit, not a true optimum.

  • K outlier : |K - median_K| > k_mad_factor * MAD(K) across all wells on the plate. Identifies wells with biologically implausible K.

  • Poor fit : sK / K > max_sk_ratio. Relative uncertainty so large that K is undetermined.

  • Low signal : max(|y|) < min_signal_fraction * plate median signal. Absolute amplitude so small that the fit is noise-dominated. Applies to CTR wells too.

  • Flat curve : (max(y) - min(y)) / max(|y|) < min_dynamic_range. Signal barely changes over the pH/Cl range.

  • Inverted curve : S0 > S1 for pH or S0 < S1 for Cl – wrong polarity. Only checked in detect_bad_wells() (requires fitted plateaus).

  • High residuals : per-well residual MAD > residual_mad_factor times the plate median MAD. Requires the optional residual_stats DataFrame.
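
The three K-based criteria above can be illustrated with a minimal standard-library sketch. It is not the module's implementation: the helper names mad and k_flags are hypothetical, the MAD is assumed to be unscaled, the bound test is assumed to tolerate floating-point rounding, and every well is included in the outlier population (the real function excludes CTR wells when ctr_cols is given).

```python
import math
from statistics import median


def mad(values):
    """Unscaled median absolute deviation (assumption: no consistency factor)."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def k_flags(k, sk, k_min=3.0, k_max=11.0, k_mad_factor=5.0, max_sk_ratio=0.3):
    """Return (at_bound, outlier, poor_fit), each a list with one bool per well."""
    centre, spread = median(k), mad(k)
    # K at bound: the fitted K sits on k_min or k_max.
    at_bound = [math.isclose(v, k_min) or math.isclose(v, k_max) for v in k]
    # K outlier: |K - median_K| > k_mad_factor * MAD(K).
    outlier = [abs(v - centre) > k_mad_factor * spread for v in k]
    # Poor fit: relative uncertainty sK / K above max_sk_ratio.
    poor_fit = [s / v > max_sk_ratio for v, s in zip(k, sk)]
    return at_bound, outlier, poor_fit


# Made-up plate: four well-behaved wells and two that ran into the bounds.
k = [7.1, 7.0, 6.9, 7.2, 3.0, 11.0]
sk = [0.06, 0.05, 0.07, 0.08, 400.0, 35.0]
at_bound, outlier, poor_fit = k_flags(k, sk)
```

With these made-up values the last two wells trip all three criteria and the first four trip none.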

Functions

detect_bad_wells_from_dat(data_dir, *[, ...])

Flag unreliable wells by reading raw .dat titration files.

detect_bad_wells(ffit, *[, k_min, k_max, ...])

Flag unreliable wells from a ffit result DataFrame.

Module Contents

clophfit.fitting.diagnostics.detect_bad_wells_from_dat(data_dir, *, min_signal_fraction=0.05, min_dynamic_range=0.05, ctr_cols=None)

Flag unreliable wells by reading raw .dat titration files.

Reads every *.dat file in data_dir (one per well). Each file must have an x column and one or more signal columns (e.g. y1, y2). All labels are checked together — no fitting is required.

Parameters:
  • data_dir (str | Path) – Directory containing *.dat files (one per well, CSV format with columns x, y1[, y2, ...]).

  • min_signal_fraction (float) – Flag a well when any label’s max(|y|) is below this fraction of the plate-wide median max(|y|) for that label (default 0.05).

  • min_dynamic_range (float) – Flag a well when any label’s (max(y) - min(y)) / max(|y|) is below this threshold (default 0.05).

  • ctr_cols (list[int] | None) – 1-based column numbers for control wells (e.g. [1, 12]). Currently used only for logging; all flags apply equally to CTR wells because low signal or flat curves in a CTR are genuinely informative.

Returns:

One row per well with columns:

  • well

  • flag_low_signal

  • flag_flat_curve

  • flag_any

  • flag_count

Sorted by descending flag_count.

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If no *.dat files are found in data_dir.

Examples

>>> import tempfile
>>> from clophfit.fitting.diagnostics import detect_bad_wells_from_dat
>>> from pathlib import Path
>>> nl = chr(10)
>>> with tempfile.TemporaryDirectory() as d:
...     rows_by_well = {
...         "A01": ["8,100,200", "7,90,190", "6,80,180"],
...         "A02": ["8,1,2", "7,0.9,1.9", "6,0.8,1.8"],
...     }
...     for well, rows in rows_by_well.items():
...         _ = (Path(d) / f"{well}.dat").write_text(
...             nl.join(["x,y1,y2", *rows, ""])
...         )
...     flags = detect_bad_wells_from_dat(d)
...     flags[["well", "flag_any"]].values.tolist()
[['A02', True], ['A01', False]]
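
The two signal-quality formulas given under Parameters can be sketched for a single label with the standard library alone. This only illustrates the documented thresholds and is not the function's actual code: signal_flags is a hypothetical helper, the curves are made up, and treating an all-zero curve as flat is an assumption.

```python
from statistics import median


def signal_flags(curves, min_signal_fraction=0.05, min_dynamic_range=0.05):
    """curves maps well name -> list of y values for ONE label."""
    # Per-well amplitude max(|y|) and its plate-wide median.
    amplitude = {w: max(abs(v) for v in y) for w, y in curves.items()}
    plate_median = median(amplitude.values())
    # Low signal: amplitude below a fraction of the plate median.
    low_signal = {
        w: a < min_signal_fraction * plate_median for w, a in amplitude.items()
    }
    # Flat curve: (max(y) - min(y)) / max(|y|) below the threshold.
    # Assumption: an all-zero curve counts as flat (avoids division by zero).
    flat_curve = {
        w: amplitude[w] == 0
        or (max(y) - min(y)) / amplitude[w] < min_dynamic_range
        for w, y in curves.items()
    }
    return low_signal, flat_curve


curves = {
    "A01": [100.0, 90.0, 80.0],    # healthy amplitude and dynamic range
    "A02": [1.0, 0.9, 0.8],        # amplitude far below the plate median
    "A03": [120.0, 119.0, 118.0],  # strong signal that barely changes
}
low_signal, flat_curve = signal_flags(curves)
```

Here only A02 is flagged for low signal and only A03 for a flat curve. Per the description above, a well is flagged when any of its labels fails a check, so a multi-label version would combine these per-label results with a logical OR.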

clophfit.fitting.diagnostics.detect_bad_wells(ffit, *, k_min=3.0, k_max=11.0, k_mad_factor=5.0, max_sk_ratio=0.3, min_signal_fraction=0.05, min_dynamic_range=0.05, check_polarity=True, is_ph=True, ctr_cols=None, residual_stats=None, residual_mad_factor=5.0)

Flag unreliable wells from a ffit result DataFrame.

Parameters:
  • ffit (pd.DataFrame) – Per-well fit results with at minimum columns well, K, sK and at least one pair of S0_{lbl} / S1_{lbl} columns. Typically read from ffit*.csv produced by ppr.

  • k_min (float) – Lower optimizer bound for K (default 3.0 for pH).

  • k_max (float) – Upper optimizer bound for K (default 11.0 for pH).

  • k_mad_factor (float) – Outlier threshold: flag if |K - median| > k_mad_factor * MAD (default 5.0).

  • max_sk_ratio (float) – Maximum tolerated relative uncertainty sK/K (default 0.30).

  • min_signal_fraction (float) – Minimum signal amplitude relative to the plate median: flag wells where max(|S0|, |S1|) < min_signal_fraction * plate_median_signal (default 0.05). Catches wells with very low absolute signal regardless of relative dynamic range. Applies to all wells including CTR.

  • min_dynamic_range (float) – Minimum required |S1-S0|/max(|S0|,|S1|) per label (default 0.05).

  • check_polarity (bool) – If True (default), flag wells where the signal direction is inverted relative to the expected biological response.

  • is_ph (bool) – If True (default), pH assay: expect S1 > S0 (signal rises with pH). If False, Cl assay: expect S0 > S1.

  • ctr_cols (list[int] | None) – Column numbers (1-based, e.g. [1, 12]) reserved for control wells. CTR wells are excluded from the K-outlier population so their different pKa does not bias the sample statistics. flag_k_at_bound, flag_k_outlier, and flag_inverted are suppressed for CTR wells — their pKa may be outside the measurement range, causing these criteria to fire spuriously. All other flags (flag_poor_fit when K is not at bound, flag_low_signal, flag_flat_curve, flag_high_residuals) apply to CTR wells. Default None means no CTR exclusion.

  • residual_stats (pd.DataFrame | None) – Optional DataFrame from residual_stats_*.csv with columns label, mad, and well. When provided, enables per-well residual-MAD outlier detection.

  • residual_mad_factor (float) – Flag if per-well residual MAD > residual_mad_factor times the plate median MAD (default 5.0).

Returns:

One row per well with boolean flag columns:

  • flag_k_at_bound

  • flag_k_outlier

  • flag_poor_fit

  • flag_low_signal

  • flag_flat_curve

  • flag_inverted (only when check_polarity=True)

  • flag_high_residuals (only when residual_stats provided)

  • flag_any – True if any flag is set

Ordered by descending flag_count.

Return type:

pd.DataFrame

Examples

>>> import pandas as pd
>>> from clophfit.fitting.diagnostics import detect_bad_wells
>>> ffit = pd.DataFrame({
...     "well": ["A01", "B06", "E10"],
...     "K": [7.1, 3.0, 11.0],
...     "sK": [0.06, 400.0, 35.0],
...     "S0_1": [600.0, 45.0, 5890.0],
...     "S1_1": [1100.0, -7800.0, 475.0],
... })
>>> flags = detect_bad_wells(ffit, k_min=3.0, k_max=11.0, ctr_cols=[1])
>>> flags[["well", "flag_any"]].values.tolist()
[['B06', True], ['E10', True], ['A01', False]]
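
The plateau-based checks (dynamic range and polarity) and the CTR suppression of flag_inverted can be sketched as follows. This is an illustration under stated assumptions, not the library's code: is_ctr and plateau_flags are hypothetical helpers, well names are assumed to be one row letter followed by the column number (as in "A01"), and a zero-amplitude well is assumed to count as flat.

```python
def is_ctr(well, ctr_cols):
    """True when the well's 1-based column number is listed in ctr_cols."""
    # Assumption: names look like "A01", i.e. row letter + column number.
    return ctr_cols is not None and int(well[1:]) in ctr_cols


def plateau_flags(well, s0, s1, is_ph=True, min_dynamic_range=0.05, ctr_cols=None):
    """Return (flat, inverted) for one label of one well."""
    top = max(abs(s0), abs(s1))
    # Flat curve from fitted plateaus: |S1 - S0| / max(|S0|, |S1|).
    flat = top == 0 or abs(s1 - s0) / top < min_dynamic_range
    # Inverted: S0 > S1 for a pH assay, S0 < S1 for a Cl assay.
    wrong_way = s0 > s1 if is_ph else s0 < s1
    # The polarity flag is suppressed for CTR wells.
    inverted = wrong_way and not is_ctr(well, ctr_cols)
    return flat, inverted


# Plateaus taken from the example above, with column 1 reserved for controls.
fits = [("A01", 600.0, 1100.0), ("B06", 45.0, -7800.0), ("E10", 5890.0, 475.0)]
flags = {w: plateau_flags(w, s0, s1, ctr_cols=[1]) for w, s0, s1 in fits}

# Made-up inverted plateaus in a column-1 well: flagged only when not a CTR.
as_ctr = plateau_flags("A01", 1100.0, 600.0, ctr_cols=[1])
as_sample = plateau_flags("A01", 1100.0, 600.0)
```

B06 and E10 come out inverted while A01 passes both checks. The last two calls show that the same inverted plateaus are reported for an ordinary well but suppressed when the well's column is listed in ctr_cols.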