clophfit.fitting.diagnostics
Well-quality diagnostics for plate-reader titration data.
Two complementary entry points:
- detect_bad_wells_from_dat(): reads raw .dat files (one per well, all labels together). No fitting required; works before the fitting pipeline. Detects low-signal and flat-curve wells across all labels.
- detect_bad_wells(): reads ffit*.csv fit results (one label per file). Adds fit-quality criteria (K at bound, K outlier, poor fit) on top of the signal-quality checks.
Detection criteria
K at bound : K equals an optimizer bound (default 3 or 11 for pH). The fit converged to a limit, not a true optimum.
K outlier : |K - median_K| > k_mad_factor * MAD(K) across all wells on the plate. Identifies wells with biologically implausible K.
Poor fit : sK / K > max_sk_ratio. The relative uncertainty is so large that K is undetermined.
Low signal : max(|y|) < min_signal_fraction * plate median signal. The absolute amplitude is so small that the fit is noise-dominated. Applies to CTR too.
Flat curve : (max(y) - min(y)) / max(|y|) < min_dynamic_range. The signal barely changes over the pH/Cl range.
Inverted curve : S0 > S1 for pH or S0 < S1 for Cl (wrong polarity). Only checked in detect_bad_wells() (requires fitted plateaus).
High residuals : per-well residual MAD > residual_mad_factor times the plate median MAD. Requires the optional residual_stats DataFrame.
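The three fit-quality criteria (K at bound, K outlier, poor fit) can be sketched in plain Python. This is an illustration under stated assumptions, not the clophfit implementation: the helper names mad and fit_quality_flags are hypothetical, the MAD is left unscaled, and "equals the bound" is tested with a small tolerance.

```python
import math
from statistics import median


def mad(values):
    """Unscaled median absolute deviation (assumption: no 1.4826 factor)."""
    center = median(values)
    return median(abs(v - center) for v in values)


def fit_quality_flags(k, sk, *, k_min=3.0, k_max=11.0,
                      k_mad_factor=5.0, max_sk_ratio=0.3, tol=1e-6):
    """Return one dict of fit-quality flags per well (hypothetical helper)."""
    k_med = median(k)
    k_mad = mad(k)
    flags = []
    for k_i, sk_i in zip(k, sk):
        flags.append({
            # K sits on either optimizer bound (tolerance is an assumption).
            "k_at_bound": math.isclose(k_i, k_min, abs_tol=tol)
            or math.isclose(k_i, k_max, abs_tol=tol),
            # K lies far from the plate median, measured in MAD units.
            "k_outlier": abs(k_i - k_med) > k_mad_factor * k_mad,
            # Relative uncertainty sK / K exceeds the tolerated ratio.
            "poor_fit": sk_i / k_i > max_sk_ratio,
        })
    return flags


# Made-up plate of seven wells: two pinned at a bound, one with a large sK.
k = [7.0, 7.1, 6.9, 7.2, 3.0, 11.0, 7.05]
sk = [0.05, 0.06, 0.04, 0.07, 400.0, 35.0, 2.5]
flags = fit_quality_flags(k, sk)
```

With these made-up values the wells pinned at 3.0 and 11.0 trip all three criteria, while the well with K = 7.05 and sK = 2.5 trips only the poor-fit criterion.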
Functions
| detect_bad_wells_from_dat() | Flag unreliable wells by reading raw .dat titration files. |
| detect_bad_wells() | Flag unreliable wells from a ffit result DataFrame. |
Module Contents
- clophfit.fitting.diagnostics.detect_bad_wells_from_dat(data_dir, *, min_signal_fraction=0.05, min_dynamic_range=0.05, ctr_cols=None)
Flag unreliable wells by reading raw .dat titration files.
Reads every *.dat file in data_dir (one per well). Each file must have an x column and one or more signal columns (e.g. y1, y2). All labels are checked together; no fitting is required.
- Parameters:
  - data_dir (str | Path) – Directory containing *.dat files (one per well, CSV format with columns x, y1[, y2, ...]).
  - min_signal_fraction (float) – Flag a well when any label's max(|y|) is below this fraction of the plate-wide median max(|y|) for that label (default 0.05).
  - min_dynamic_range (float) – Flag a well when any label's (max(y) - min(y)) / max(|y|) is below this threshold (default 0.05).
  - ctr_cols (list[int] | None) – 1-based column numbers for control wells (e.g. [1, 12]). Currently used only for logging; all flags apply equally to CTR wells because low signal or flat curves in a CTR are genuinely informative.
- Returns:
  One row per well with columns:
  - well
  - flag_low_signal
  - flag_flat_curve
  - flag_any
  - flag_count
  Sorted by descending flag_count.
- Return type:
  pd.DataFrame
- Raises:
  FileNotFoundError – If no *.dat files are found in data_dir.
Examples
>>> import tempfile
>>> from pathlib import Path
>>> from clophfit.fitting.diagnostics import detect_bad_wells_from_dat
>>> nl = chr(10)
>>> with tempfile.TemporaryDirectory() as d:
...     rows_by_well = {
...         "A01": ["8,100,200", "7,90,190", "6,80,180"],
...         "A02": ["8,1,2", "7,0.9,1.9", "6,0.8,1.8"],
...     }
...     for well, rows in rows_by_well.items():
...         _ = (Path(d) / f"{well}.dat").write_text(
...             nl.join(["x,y1,y2", *rows, ""])
...         )
...     flags = detect_bad_wells_from_dat(d)
...     flags[["well", "flag_any"]].values.tolist()
[['A02', True], ['A01', False]]
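To show why A02 is flagged in the example, the two signal-quality criteria can be worked through by hand. The sketch below is not the library code: signal_flags is a hypothetical helper, it assumes every well carries the same labels, and it treats an all-zero curve as flat to avoid dividing by zero.

```python
from statistics import median


def signal_flags(plate, *, min_signal_fraction=0.05, min_dynamic_range=0.05):
    """Flag low-signal and flat-curve wells (hypothetical helper).

    plate maps well name -> {label: list of y values}.
    """
    labels = sorted({lbl for curves in plate.values() for lbl in curves})
    # Plate-wide median of max(|y|), computed separately for each label.
    plate_median = {
        lbl: median(max(abs(v) for v in curves[lbl]) for curves in plate.values())
        for lbl in labels
    }
    out = {}
    for well, curves in plate.items():
        low = flat = False
        for lbl, y in curves.items():
            peak = max(abs(v) for v in y)
            # Low signal: the peak is a small fraction of the plate median.
            low = low or peak < min_signal_fraction * plate_median[lbl]
            # Flat curve: relative span below threshold (peak == 0 counts
            # as flat, which is an assumption of this sketch).
            flat = flat or peak == 0 or (max(y) - min(y)) / peak < min_dynamic_range
        out[well] = {"flag_low_signal": low, "flag_flat_curve": flat}
    return out


# Same values as the .dat files written in the example.
plate = {
    "A01": {"y1": [100, 90, 80], "y2": [200, 190, 180]},
    "A02": {"y1": [1, 0.9, 0.8], "y2": [2, 1.9, 1.8]},
}
result = signal_flags(plate)
```

For label y1 the plate median of max(|y|) is 50.5, so the low-signal threshold is 2.525; A02 peaks at 1 and falls below it. Neither well is flat, because every curve spans at least 10% of its peak.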
- clophfit.fitting.diagnostics.detect_bad_wells(ffit, *, k_min=3.0, k_max=11.0, k_mad_factor=5.0, max_sk_ratio=0.3, min_signal_fraction=0.05, min_dynamic_range=0.05, check_polarity=True, is_ph=True, ctr_cols=None, residual_stats=None, residual_mad_factor=5.0)
Flag unreliable wells from a ffit result DataFrame.
- Parameters:
  - ffit (pd.DataFrame) – Per-well fit results with at minimum columns well, K, sK and at least one pair of S0_{lbl} / S1_{lbl} columns. Typically read from ffit*.csv produced by ppr.
  - k_min (float) – Lower optimizer bound for K (default 3.0 for pH).
  - k_max (float) – Upper optimizer bound for K (default 11.0 for pH).
  - k_mad_factor (float) – Outlier threshold: flag if |K - median| > k_mad_factor * MAD.
  - max_sk_ratio (float) – Maximum tolerated relative uncertainty sK/K (default 0.30).
  - min_signal_fraction (float) – Minimum signal amplitude relative to the plate median: flag wells where max(|S0|, |S1|) < min_signal_fraction * plate_median_signal (default 0.05). Catches wells with very low absolute signal regardless of relative dynamic range. Applies to all wells including CTR.
  - min_dynamic_range (float) – Minimum required |S1 - S0| / max(|S0|, |S1|) per label (default 0.05).
  - check_polarity (bool) – If True, flag wells where the signal direction is inverted relative to the expected biological response.
  - is_ph (bool) – If True (default), pH assay: expect S1 > S0 (signal rises with pH). If False, Cl assay: expect S0 > S1.
  - ctr_cols (list[int] | None) – Column numbers (1-based, e.g. [1, 12]) reserved for control wells. CTR wells are excluded from the K-outlier population so their different pKa does not bias the sample statistics. flag_k_at_bound, flag_k_outlier, and flag_inverted are suppressed for CTR wells, because their pKa may be outside the measurement range, causing these criteria to fire spuriously. All other flags (flag_poor_fit when K is not at bound, flag_low_signal, flag_flat_curve, flag_high_residuals) apply to CTR wells. Default None means no CTR exclusion.
  - residual_stats (pd.DataFrame | None) – Optional DataFrame from residual_stats_*.csv with columns label, mad, and well. When provided, enables per-well residual-MAD outlier detection.
  - residual_mad_factor (float) – Flag if per-well residual MAD > residual_mad_factor times the plate median MAD (default 5.0).
- Returns:
  One row per well with boolean flag columns:
  - flag_k_at_bound
  - flag_k_outlier
  - flag_poor_fit
  - flag_low_signal
  - flag_flat_curve
  - flag_inverted (only when check_polarity=True)
  - flag_high_residuals (only when residual_stats provided)
  - flag_any – True if any flag is set
  Ordered by descending flag_count.
- Return type:
  pd.DataFrame
Examples
>>> import pandas as pd
>>> from clophfit.fitting.diagnostics import detect_bad_wells
>>> ffit = pd.DataFrame({
...     "well": ["A01", "B06", "E10"],
...     "K": [7.1, 3.0, 11.0],
...     "sK": [0.06, 400.0, 35.0],
...     "S0_1": [600.0, 45.0, 5890.0],
...     "S1_1": [1100.0, -7800.0, 475.0],
... })
>>> flags = detect_bad_wells(ffit, k_min=3.0, k_max=11.0, ctr_cols=[1])
>>> flags[["well", "flag_any"]].values.tolist()
[['B06', True], ['E10', True], ['A01', False]]
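The polarity check and its suppression for CTR columns can be illustrated with a short stdlib-only sketch. The helpers well_column, is_inverted, and polarity_flags are hypothetical names introduced here; the sketch assumes well names of the form letter(s) plus column number (e.g. A01) and a single label, and it does not reproduce how clophfit parses wells internally.

```python
import re


def well_column(well):
    """Return the 1-based plate column from a name like 'A01' (hypothetical)."""
    match = re.fullmatch(r"[A-Za-z]+(\d+)", well)
    if match is None:
        raise ValueError(f"unrecognised well name: {well!r}")
    return int(match.group(1))


def is_inverted(s0, s1, *, is_ph=True):
    """pH expects S1 > S0, Cl expects S0 > S1; the opposite order is inverted."""
    return s0 > s1 if is_ph else s0 < s1


def polarity_flags(wells, *, is_ph=True, ctr_cols=None):
    """Map well -> inverted flag, suppressed for wells in a CTR column."""
    ctr = set(ctr_cols or [])
    return {
        well: well_column(well) not in ctr and is_inverted(s0, s1, is_ph=is_ph)
        for well, (s0, s1) in wells.items()
    }


# Plateaus (S0_1, S1_1) taken from the example DataFrame.
wells = {"A01": (600.0, 1100.0), "B06": (45.0, -7800.0), "E10": (5890.0, 475.0)}
ph_flags = polarity_flags(wells, is_ph=True, ctr_cols=[1])
cl_flags = polarity_flags(wells, is_ph=False)
```

Under the pH convention B06 and E10 are inverted (S0 > S1), while A01 rises with pH and also sits in CTR column 1. Under the Cl convention with no CTR columns, only A01 has the wrong direction.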