clophfit.fitting.bayes#

Bayesian (PyMC) fitting utilities and pipelines.

Classes#

NoisePriors

Prior scale parameters for the 3-component heteroscedastic noise model.

Functions#

create_x_true(xc, x_errc, n_xerr[, lower_nsd])

Create latent variables for x-values with uncertainty.

create_parameter_priors(params, n_sd[, key, ctr_name, ...])

Create PyMC parameter prior distributions from lmfit Parameters.

rename_keys(data)

Rename dictionary keys coming from multi-trace into base names.

process_trace(trace, p_names, ds, n_xerr)

Process the trace to extract parameter estimates and update datasets.

extract_fit(key, ctr, trace_df, ds[, well_key])

Compute individual dataset fit from a multi-well trace summary.

x_true_from_trace_df(trace_df)

Extract x_true from an ArviZ summary DataFrame.

fit_binding_pymc(ds_or_fr[, n_sd, n_xerr, ye_scaling, ...])

Analyze multi-label titration datasets using PyMC (single model).

fit_binding_pymc2(ds_or_fr[, n_sd, n_xerr, n_samples])

Analyze multi-label titration datasets using PyMC with separate ye_mag per label.

fit_binding_pymc_compare(fr, buffer_sd, *[, ...])

Fit a Bayesian binding model with one of two noise models, for comparison.

closest_point_on_curve(f, x_obs, y_obs)

Find the closest point on the model curve.

fit_binding_pymc_odr(fr[, n_sd, xe_scaling, ...])

Bayesian ODR-like modeling of x and y errors.

weighted_stats(values, stderr)

Weighted mean and stderr for control priors.

fit_binding_pymc_multi(results, scheme[, n_sd, ...])

Multi-well PyMC with shared K per control group and per-label noise.

fit_binding_pymc_multi2(results, scheme, bg_err[, ...])

Multi-well PyMC with heteroscedastic noise combining buffer and signal.

fit_binding_pymc_multi_noise(results, scheme, buffer_df)

Multi-well PyMC fit with shared learnable heteroscedastic noise model.

fit_binding_pymc_multi_noise_xrw(results, scheme, ...)

Multi-well PyMC fit with shared noise model and per-well pH random walk.

plot_ppc_well(trace, key[, labels, figsize])

Draw posterior predictive samples for a particular well (and all its labels).

compare_posteriors(trace, results)

Print the posterior mean ± 95% C.I. of K for each well.

fit_pymc_hierarchical(results, scheme, bg_err[, n_sd, ...])

Analyze multiple titrations with a hierarchical Bayesian model.

Module Contents#

clophfit.fitting.bayes.create_x_true(xc, x_errc, n_xerr, lower_nsd=2.5)#

Create latent variables for x-values with uncertainty.

Returns a PyMC Deterministic variable when in a Model context with uncertainty, or a numpy array when there’s no uncertainty or no active Model.

Parameters:
  • xc (clophfit.clophfit_types.ArrayF)

  • x_errc (clophfit.clophfit_types.ArrayF)

  • n_xerr (float)

  • lower_nsd (float)

Return type:

clophfit.clophfit_types.ArrayF | pymc.Deterministic

clophfit.fitting.bayes.create_parameter_priors(params, n_sd, key='', ctr_name='', default_sigma=0.001)#

Create PyMC parameter prior distributions from lmfit Parameters.

Parameters:
  • params (Parameters) – lmfit Parameters to convert to PyMC priors.

  • n_sd (float) – Scaling factor for parameter standard errors.

  • key (str) – Optional suffix to add to parameter names.

  • ctr_name (str) – If specified, skip creating K prior (shared from control group).

  • default_sigma (float) – Default sigma when stderr is not available (default: 1e-3).

Returns:

Dictionary of PyMC distribution objects.

Return type:

dict[str, pm.Distribution]
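As a rough illustration of how the arguments above might interact, the sketch below computes (mu, sigma) pairs without touching PyMC. It is not the library's implementation: the plain dict standing in for lmfit Parameters, the hypothetical name prior_specs, the underscore-joined suffix, the Normal(mu=value, sigma=n_sd * stderr) form, and the unscaled default_sigma fallback are all assumptions.

```python
def prior_specs(params, n_sd, key="", ctr_name="", default_sigma=1e-3):
    """Map parameter names to assumed Normal prior (mu, sigma) pairs.

    `params` is a plain dict {name: (value, stderr)} used here as a
    stand-in for lmfit Parameters (an assumption for this sketch).
    """
    specs = {}
    for name, (value, stderr) in params.items():
        if name == "K" and ctr_name:
            # K is assumed to be supplied by the control group instead.
            continue
        # Missing (None) or zero stderr falls back to default_sigma.
        sigma = n_sd * stderr if stderr else default_sigma
        full_name = f"{name}_{key}" if key else name
        specs[full_name] = (value, sigma)
    return specs


params = {"K": (7.0, 0.1), "S0_y1": (100.0, None), "S1_y1": (20.0, 2.0)}
specs = prior_specs(params, n_sd=10.0, key="A01", ctr_name="ctrA")
```

Here K is skipped because ctr_name is set, S0_y1 uses the fallback sigma, and S1_y1 gets a sigma of 10 times its stderr.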

clophfit.fitting.bayes.rename_keys(data)#

Rename dictionary keys coming from multi-trace into base names.

Parameters:

data (dict[str, Any])

Return type:

dict[str, Any]

clophfit.fitting.bayes.process_trace(trace, p_names, ds, n_xerr)#

Process the trace to extract parameter estimates and update datasets.

Parameters:
  • trace (az.InferenceData) – The posterior samples from PyMC sampling.

  • p_names (KeysView[str]) – Parameter names.

  • ds (Dataset) – The dataset containing titration data.

  • n_xerr (float) – Scaling factor for x_errc.

Returns:

The updated fit result with extracted parameter values and datasets. Residuals are WEIGHTED (weight * (obs - pred)) where weight = 1/y_err, computed using posterior mean parameter estimates.

Return type:

FitResult[az.InferenceData]

Raises:

TypeError – If az.summary does not return a DataFrame.
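The weighting of residuals described under Returns can be written out in a few lines. This is a minimal sketch using plain Python lists; the function name weighted_residuals and the sample numbers are illustrative and are not part of clophfit.

```python
def weighted_residuals(obs, pred, y_err):
    """Return weight * (obs - pred) with weight = 1 / y_err, element-wise."""
    if not (len(obs) == len(pred) == len(y_err)):
        raise ValueError("obs, pred and y_err must have the same length")
    return [(o - p) / e for o, p, e in zip(obs, pred, y_err)]


obs = [10.0, 8.0, 5.0]    # observed signal
pred = [9.0, 8.5, 5.0]    # model prediction at the posterior mean
y_err = [0.5, 0.25, 1.0]  # per-point y uncertainty
residuals = weighted_residuals(obs, pred, y_err)
```

With weights of this form, a residual of magnitude 1 corresponds to a deviation of one y_err from the model.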

clophfit.fitting.bayes.extract_fit(key, ctr, trace_df, ds, well_key='')#

Compute individual dataset fit from a multi-well trace summary.

Parameters:
  • key (str) – Well identifier used to filter per-well parameters in trace_df.

  • ctr (str) – Control group name used to filter shared K parameters.

  • trace_df (pd.DataFrame) – ArviZ summary DataFrame (fmt="wide") from the multi-well MCMC run.

  • ds (Dataset) – Per-well dataset whose x values are updated in-place from the trace.

  • well_key (str, optional) – When provided, per-well x posteriors (x_per_well[step, well_key]) are used instead of the global x_true. Pass the well identifier for xrw fits so each well’s .dat/.png uses its own inferred pH axis.

Returns:

Fit result with figure, parameters, and dataset using posterior x.

Return type:

FitResult[az.InferenceData]

clophfit.fitting.bayes.x_true_from_trace_df(trace_df)#

Extract x_true from an ArviZ summary DataFrame.

Parameters:

trace_df (pandas.DataFrame)

Return type:

clophfit.fitting.data_structures.DataArray

clophfit.fitting.bayes.fit_binding_pymc(ds_or_fr, n_sd=10.0, n_xerr=1.0, ye_scaling=1.0, n_samples=2000, nuts_sampler='default')#

Analyze multi-label titration datasets using PyMC (single model).

Parameters:
  • ds_or_fr (Dataset | FitResult[MiniT]) – Either a Dataset (will run initial LS fit) or a FitResult with initial params.

  • n_sd (float) – Number of standard deviations for parameter priors.

  • n_xerr (float) – Scaling factor for x-error.

  • ye_scaling (float) – Scaling factor for y-error magnitude prior.

  • n_samples (int) – Number of MCMC samples.

  • nuts_sampler (str) – NUTS sampler backend: "default" (PyMC C/pytensor), "blackjax", "numpyro", or "nutpie".

Returns:

Bayesian fitting results.

Return type:

FitResult[az.InferenceData]

clophfit.fitting.bayes.fit_binding_pymc2(ds_or_fr, n_sd=10.0, n_xerr=1.0, n_samples=2000)#

Analyze multi-label titration datasets using PyMC with separate ye_mag per label.

Parameters:
  • ds_or_fr (Dataset | FitResult[MiniT]) – Either a Dataset (will run initial LS fit) or a FitResult with initial params.

  • n_sd (float) – Number of standard deviations for parameter priors.

  • n_xerr (float) – Scaling factor for x-error.

  • n_samples (int) – Number of MCMC samples.

Returns:

Bayesian fitting results with per-label error scaling.

Return type:

FitResult[az.InferenceData]

clophfit.fitting.bayes.fit_binding_pymc_compare(fr, buffer_sd, *, learn_separate_y_mag=False, n_sd=10.0, n_xerr=1.0, n_samples=2000)#

Fit a Bayesian binding model with one of two noise models, for comparison.

Parameters:
  • fr (FitResult[MiniT]) – The fit result from a previous run, providing initial parameters and dataset.

  • buffer_sd (dict[str, float]) – Buffer (background) standard deviation for each label (bg_err).

  • learn_separate_y_mag (bool) – If True, learns a unique noise scaling factor for each dataset label. If False, learns a single scaling factor for all pre-weighted data.

  • n_sd (float) – Prior width for parameters in create_parameter_priors.

  • n_xerr (float) – Scaling factor for x_errc in create_x_true.

  • n_samples (int) – Number of MCMC samples to draw.

Returns:

The posterior samples from PyMC for the specified noise model.

Return type:

az.InferenceData

clophfit.fitting.bayes.closest_point_on_curve(f, x_obs, y_obs)#

Find the closest point on the model curve.

Parameters:
  • f (clophfit.clophfit_types.FloatFunc)

  • x_obs (float)

  • y_obs (float)

Return type:

float
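One way to find such a point is to minimise the squared Euclidean distance between the observation and the curve. The sketch below does this with a golden-section search; the search method, the bracket half_width, the name closest_point_sketch, and the choice to return the x coordinate are all assumptions, not a description of how clophfit implements it.

```python
import math


def closest_point_sketch(f, x_obs, y_obs, half_width=5.0, tol=1e-8):
    """Return the x on y = f(x) nearest to (x_obs, y_obs) inside a bracket.

    Assumes the squared distance has a single minimum within
    [x_obs - half_width, x_obs + half_width].
    """
    def dist2(x):
        return (x - x_obs) ** 2 + (f(x) - y_obs) ** 2

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = x_obs - half_width, x_obs + half_width
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if dist2(c) < dist2(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


# For the line y = x, the point nearest to (0, 1) is (0.5, 0.5).
x_star = closest_point_sketch(lambda x: x, x_obs=0.0, y_obs=1.0)
```

If x and y have very different scales, the distance would normally be computed after dividing each axis by its uncertainty; that scaling is omitted here.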

clophfit.fitting.bayes.fit_binding_pymc_odr(fr, n_sd=10.0, xe_scaling=1.0, ye_scaling=10.0, n_samples=2000)#

Bayesian ODR-like modeling of x and y errors.

Parameters:
  • fr

  • n_sd (float)

  • xe_scaling (float)

  • ye_scaling (float)

  • n_samples (int)

Return type:

arviz.InferenceData | pymc.backends.base.MultiTrace

clophfit.fitting.bayes.weighted_stats(values, stderr)#

Weighted mean and stderr for control priors.

Parameters:
  • values (collections.abc.Mapping[str, collections.abc.Sequence[float | None]])

  • stderr (collections.abc.Mapping[str, collections.abc.Sequence[float | None]])

Return type:

dict[str, tuple[float, float]]
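The entry above does not state the weighting scheme. A common choice, shown here purely as an assumption, is inverse-variance weighting with missing (None) entries skipped. The name weighted_stats_sketch and the sample values are illustrative.

```python
import math


def weighted_stats_sketch(values, stderr):
    """Inverse-variance weighted mean and its standard error per group."""
    out = {}
    for name, vals in values.items():
        # Keep only pairs where both the value and a positive stderr exist.
        pairs = [
            (v, e)
            for v, e in zip(vals, stderr[name])
            if v is not None and e is not None and e > 0
        ]
        if not pairs:
            continue  # nothing usable for this group
        weights = [1.0 / e**2 for _, e in pairs]
        total = sum(weights)
        mean = sum(w * v for w, (v, _) in zip(weights, pairs)) / total
        out[name] = (mean, math.sqrt(1.0 / total))
    return out


stats = weighted_stats_sketch(
    {"ctrA": [7.0, 7.2, None]},
    {"ctrA": [0.1, 0.2, 0.3]},
)
```

In this example the third replicate is dropped because its value is None, and the more precise first replicate dominates the mean.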

clophfit.fitting.bayes.fit_binding_pymc_multi(results, scheme, n_sd=5.0, n_xerr=1.0, ye_scaling=1.0, n_samples=2000, nuts_sampler='default', *, ctr_free_k=False)#

Multi-well PyMC with shared K per control group and per-label noise.

Parameters:
  • results (dict[str, FitResult[MiniT]]) – Per-well initial fit results.

  • scheme (PlateScheme) – Plate scheme defining control groups for shared-K priors.

  • n_sd (float) – Prior width multiplier for per-well S0/S1 parameters.

  • n_xerr (float) – Scaling factor applied to x-value uncertainties.

  • ye_scaling (float) – HalfNormal sigma for the per-label y-error scaling factor.

  • n_samples (int) – Number of MCMC posterior samples per chain.

  • nuts_sampler (str) – NUTS sampler backend ("default", "blackjax", "numpyro", "nutpie").

  • ctr_free_k (bool) – If True, each CTR replicate well gets its own independent (non-hierarchical) K prior, Normal(group_mean, 0.2), exactly as UNK wells do, with no hierarchical shrinkage. The spread of the K posteriors across replicates then quantifies between-replicate accuracy. If False (default), all replicates of the same CTR share a single K.

Returns:

The PyMC posterior trace.

Return type:

az.InferenceData

Raises:

ValueError – If no valid dataset is found in results.

clophfit.fitting.bayes.fit_binding_pymc_multi2(results, scheme, bg_err, n_sd=5.0, n_xerr=1.0, n_samples=2000)#

Multi-well PyMC with heteroscedastic noise combining buffer and signal.

Parameters:
  • results (dict[str, clophfit.fitting.data_structures.FitResult[clophfit.fitting.data_structures.MiniT]])

  • scheme (clophfit.prtecan.PlateScheme)

  • bg_err (dict[int, clophfit.clophfit_types.ArrayF])

  • n_sd (float)

  • n_xerr (float)

  • n_samples (int)

Return type:

arviz.InferenceData

class clophfit.fitting.bayes.NoisePriors#

Prior scale parameters for the 3-component heteroscedastic noise model.

All values are HalfNormal sigma parameters. The variance model is:

Var(y | mu) = sigma_read**2 + gain * max(0, mu) + alpha**2 * mu**2
Parameters:
  • sigma_read (float) – HalfNormal sigma for the readout-floor noise (RFU).

  • gain (float) – HalfNormal sigma for the Poisson-like gain term (RFU/RFU).

  • alpha (float) – HalfNormal sigma for the multiplicative CV term (dimensionless).
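The variance model above can be evaluated directly for given noise parameter values. The sketch below takes the noise parameters themselves (not the HalfNormal prior scales stored in NoisePriors); the function name noise_variance and the numbers are illustrative only.

```python
def noise_variance(mu, sigma_read, gain, alpha):
    """Var(y | mu) = sigma_read**2 + gain * max(0, mu) + alpha**2 * mu**2."""
    return sigma_read**2 + gain * max(0.0, mu) + alpha**2 * mu**2


# Readout floor 3 RFU, gain 0.5, 2 % multiplicative CV.
var_signal = noise_variance(100.0, sigma_read=3.0, gain=0.5, alpha=0.02)
# A negative predicted signal contributes nothing to the gain term.
var_negative = noise_variance(-10.0, sigma_read=3.0, gain=0.5, alpha=0.02)
```

At mu = 100 the three terms contribute 9, 50 and 4, so the gain term dominates; at mu = -10 only the readout floor and the multiplicative term remain.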

clophfit.fitting.bayes.fit_binding_pymc_multi_noise(results, scheme, buffer_df, n_sd=5.0, n_xerr=1.0, n_samples=2000, nuts_sampler='default', *, ctr_free_k=False)#

Multi-well PyMC fit with shared learnable heteroscedastic noise model.

Fits all wells simultaneously. Per-label noise parameters (sigma_read, gain, alpha) are shared across all wells and inferred from the data. The variance model is:

Var(y | mu) = sigma_read**2 + gain * max(0, mu) + alpha**2 * mu**2

where mu is the model-predicted (background-subtracted) signal. Priors for the noise parameters are derived empirically from the buffer replicate variance via _noise_priors_from_buffer().

Input data must be background-subtracted (i.e. the standard Tecan pipeline output where buffer mean has already been removed).

Parameters:
  • results (dict[str, FitResult[MiniT]]) – Per-well initial fit results, typically from fit_binding_glob.

  • scheme (PlateScheme) – Plate scheme defining control groups for shared-K priors.

  • buffer_df (dict[int, pd.DataFrame]) – Buffer DataFrames (integer label index -> DataFrame with well columns), used to derive noise priors from replicate variance.

  • n_sd (float) – Prior width multiplier for per-well S0/S1 parameters.

  • n_xerr (float) – Scaling factor applied to x-value uncertainties.

  • n_samples (int) – Number of MCMC posterior samples per chain.

  • nuts_sampler (str) – NUTS sampler backend: "default" (pytensor/CPU), "blackjax" (JAX/GPU), "numpyro" (JAX/GPU), or "nutpie" (Rust/CPU).

  • ctr_free_k (bool) – If True, each CTR replicate well gets its own independent (non-hierarchical) K prior, Normal(group_mean, 0.2), exactly as UNK wells do, with no hierarchical shrinkage. The spread of the K posteriors across replicates then quantifies between-replicate accuracy. If False (default), all replicates of the same CTR share a single K.

Returns:

Posterior trace. Noise parameters are accessible as trace.posterior["sigma_read_<lbl>"], trace.posterior["gain_<lbl>"], and trace.posterior["alpha_<lbl>"].

Return type:

az.InferenceData

Raises:

ValueError – If no valid dataset is found in results.

clophfit.fitting.bayes.fit_binding_pymc_multi_noise_xrw(results, scheme, buffer_df, n_sd=5.0, n_xerr=1.0, n_samples=2000, sigma_pip_prior=0.02, nuts_sampler='default', *, ctr_free_k=False)#

Multi-well PyMC fit with shared noise model and per-well pH random walk.

Extends fit_binding_pymc_multi_noise() with a hierarchical random-walk model for per-well pH deviations. The first titration step is common to all wells (same buffer). Each subsequent acid addition introduces independent Normal(0, sigma_pip²) deviations that accumulate, so the variance of the pH deviation at step t is t · sigma_pip².

Non-centred parameterisation is used for numerical efficiency:

z_pip[t, w] ~ Normal(0, 1)  (shape: n_steps-1 x n_wells)
x_dev[:, w] = concat([0, cumsum(sigma_pip * z_pip[:, w])])
x_per_well  = x_nominal[:, None] + x_dev   (shape: n_steps x n_wells)
Parameters:
  • results (dict[str, FitResult[MiniT]]) – Per-well initial fit results, typically from fit_binding_glob.

  • scheme (PlateScheme) – Plate scheme defining control groups for shared-K priors.

  • buffer_df (dict[int, pd.DataFrame]) – Buffer DataFrames (integer label index -> DataFrame with well columns), used to derive noise priors from replicate variance.

  • n_sd (float) – Prior width multiplier for per-well S0/S1 parameters.

  • n_xerr (float) – Scaling factor applied to x-value uncertainties.

  • n_samples (int) – Number of MCMC posterior samples per chain.

  • sigma_pip_prior (float) – Prior scale (HalfNormal sigma) for the per-step pipetting SD, in the same units as the x-axis (pH units by default).

  • nuts_sampler (str) – NUTS sampler backend: "default" (pytensor/CPU), "blackjax" (JAX/GPU), "numpyro" (JAX/GPU), or "nutpie" (Rust/CPU).

  • ctr_free_k (bool) – If True, each CTR replicate well gets its own independent (non-hierarchical) K prior, Normal(group_mean, 0.2), exactly as UNK wells do, with no hierarchical shrinkage. The spread of the K posteriors across replicates then quantifies between-replicate accuracy. If False (default), all replicates of the same CTR share a single K.

Returns:

Posterior trace. Per-well x is accessible as trace.posterior["x_per_well"] with dims ("chain", "draw", "step", "well"). Noise parameters are accessible as trace.posterior["sigma_read_<lbl>"] etc.

Return type:

az.InferenceData

Raises:

ValueError – If no valid dataset is found in results.
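The non-centred random walk used by fit_binding_pymc_multi_noise_xrw can be traced by hand for fixed values of z_pip and sigma_pip. The sketch below uses plain Python lists instead of PyMC tensors; the function name x_per_well_sketch and the sample numbers are illustrative, and in the real model both z_pip and sigma_pip are sampled rather than fixed.

```python
from itertools import accumulate


def x_per_well_sketch(x_nominal, z_pip, sigma_pip):
    """Build x_per_well (n_steps x n_wells) from standard-normal draws.

    z_pip has shape (n_steps - 1) x n_wells; the first step has zero
    deviation because it is common to all wells.
    """
    n_steps = len(x_nominal)
    if len(z_pip) != n_steps - 1:
        raise ValueError("z_pip must have n_steps - 1 rows")
    n_wells = len(z_pip[0]) if z_pip else 0
    x_dev = []
    for w in range(n_wells):
        increments = [sigma_pip * z_pip[t][w] for t in range(n_steps - 1)]
        x_dev.append([0.0, *accumulate(increments)])  # cumulative sum
    return [
        [x_nominal[t] + x_dev[w][t] for w in range(n_wells)]
        for t in range(n_steps)
    ]


x = x_per_well_sketch(
    x_nominal=[8.0, 7.5, 7.0],
    z_pip=[[1.0, -1.0], [1.0, 1.0]],
    sigma_pip=0.02,
)
```

Because the increments are independent, the deviation after t additions is a sum of t terms with variance sigma_pip² each, which gives the t · sigma_pip² variance stated above.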

clophfit.fitting.bayes.plot_ppc_well(trace, key, labels=None, figsize=(8, 4))#

Draw posterior predictive samples for a particular well (and all its labels).

The returned figure can be displayed with matplotlib.

Parameters:
  • trace (az.InferenceData) – Trace produced by fit_binding_pymc_advanced.

  • key (str) – Well identifier (e.g. ‘A01’).

  • labels (list[str] | None) – Names of the bands to show. If None the function will automatically look for all variables starting with 'y_' that contain this key.

  • figsize (tuple[float, float]) – Figure size as (width, height) in inches.

Returns:

The matplotlib figure containing the posterior predictive plot.

Return type:

figure.Figure

clophfit.fitting.bayes.compare_posteriors(trace, results)#

Print the posterior mean ± 95% C.I. of the K parameter for each well, juxtaposed with the deterministic K (from fit_binding_pymc).

Parameters:
  • trace (az.InferenceData) – Output of fit_binding_pymc_advanced.

  • results (dict[str, FitResult[MiniT]]) – Deterministic fits produced by the old pipeline.

Return type:

None
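The kind of summary printed by compare_posteriors can be computed from raw posterior draws as follows. This sketch assumes an equal-tailed percentile interval with linear interpolation; the actual function may use a different interval (for example an HDI), and the name mean_and_interval is illustrative.

```python
from statistics import fmean


def mean_and_interval(draws, prob=0.95):
    """Return (mean, lower, upper) for an equal-tailed interval."""
    ordered = sorted(draws)

    def quantile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    tail = (1.0 - prob) / 2.0
    return fmean(ordered), quantile(tail), quantile(1.0 - tail)


# Stand-in for posterior draws of K in one well.
k_mean, k_low, k_high = mean_and_interval([float(i) for i in range(1, 101)])
```

A deterministic K that falls well outside (k_low, k_high) would indicate a disagreement between the two fits for that well.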

clophfit.fitting.bayes.fit_pymc_hierarchical(results, scheme, bg_err, n_sd=5.0, n_xerr=1.0, n_samples=2000)#

Analyze multiple titrations with a hierarchical Bayesian model.

This model shares information about the dissociation constant ‘K’ among wells belonging to the same control group, leading to more robust estimates.

Parameters:
  • results (dict[str, FitResult[MiniT]]) – A dictionary mapping well IDs to their initial FitResult from a prior fit_lm run.

  • scheme (PlateScheme) – The plate scheme defining control groups.

  • bg_err (dict[int, ArrayF]) – Background error for each signal band.

  • n_sd (float) – The number of standard deviations for the prior width of S0/S1.

  • n_xerr (float) – Scaling factor for x-value uncertainties.

  • n_samples (int) – Number of MCMC samples.

Returns:

The PyMC trace containing the posterior distributions.

Return type:

az.InferenceData

Raises:

ValueError – If an invalid dataset is encountered.