clophfit.fitting.bayes

Bayesian (PyMC) fitting utilities and pipelines.

Classes

NoisePriors

Prior scale parameters for the 3-component heteroscedastic noise model.

Functions

`create_x_true`(xc, x_errc, n_xerr[, lower_nsd])	Create latent variables for x-values with uncertainty.
`create_parameter_priors`(params, n_sd[, key, ctr_name, ...])	Create PyMC parameter prior distributions from lmfit Parameters.
`rename_keys`(data)	Rename dictionary keys coming from multi-trace into base names.
`process_trace`(trace, p_names, ds, n_xerr)	Process the trace to extract parameter estimates and update datasets.
`extract_fit`(key, ctr, trace_df, ds[, well_key])	Compute individual dataset fit from a multi-well trace summary.
`x_true_from_trace_df`(trace_df)	Extract x_true from an ArviZ summary DataFrame.
`fit_binding_pymc`(ds_or_fr[, n_sd, n_xerr, ye_scaling, ...])	Analyze multi-label titration datasets using PyMC (single model).
`fit_binding_pymc2`(ds_or_fr[, n_sd, n_xerr, n_samples])	Analyze multi-label titration datasets using PyMC with separate ye_mag per label.
`fit_binding_pymc_compare`(fr, buffer_sd, *[, ...])	Fit a Bayesian binding model with two different noise models for comparison.
`closest_point_on_curve`(f, x_obs, y_obs)	Find the closest point on the model curve.
`fit_binding_pymc_odr`(fr[, n_sd, xe_scaling, ...])	Bayesian ODR-like modeling of x and y errors.
`weighted_stats`(values, stderr)	Compute weighted mean and stderr for control priors.
`fit_binding_pymc_multi`(results, scheme[, n_sd, ...])	Multi-well PyMC with shared K per control group and per-label noise.
`fit_binding_pymc_multi2`(results, scheme, bg_err[, ...])	Multi-well PyMC with heteroscedastic noise combining buffer and signal.
`fit_binding_pymc_multi_noise`(results, scheme, buffer_df)	Multi-well PyMC fit with shared learnable heteroscedastic noise model.
`fit_binding_pymc_multi_noise_xrw`(results, scheme, ...)	Multi-well PyMC fit with shared noise model and per-well pH random walk.
`plot_ppc_well`(trace, key[, labels, figsize])	Draw posterior predictive samples for a particular well (and all its labels).
`compare_posteriors`(trace, results)	Print posterior mean ± 95 % C.I. of K for each well alongside the deterministic K.
`fit_pymc_hierarchical`(results, scheme, bg_err[, n_sd, ...])	Analyze multiple titrations with a hierarchical Bayesian model.

Module Contents

clophfit.fitting.bayes.create_x_true(xc, x_errc, n_xerr, lower_nsd=2.5)

Create latent variables for x-values with uncertainty.

Returns a PyMC Deterministic variable when called inside an active Model context and the x-values carry uncertainty; returns a numpy array when there is no uncertainty or no active Model.

Parameters:

xc (clophfit.clophfit_types.ArrayF)
x_errc (clophfit.clophfit_types.ArrayF)
n_xerr (float)
lower_nsd (float)

Return type:

clophfit.clophfit_types.ArrayF | pymc.Deterministic

clophfit.fitting.bayes.create_parameter_priors(params, n_sd, key='', ctr_name='', default_sigma=0.001)

Create PyMC parameter prior distributions from lmfit Parameters.

Parameters:

params (Parameters) – lmfit Parameters to convert to PyMC priors.
n_sd (float) – Scaling factor for parameter standard errors.
key (str) – Optional suffix to add to parameter names.
ctr_name (str) – If non-empty, no K prior is created, because K is shared across the control group.
default_sigma (float) – Default sigma when stderr is not available (default: 1e-3).

Returns:

Dictionary of PyMC distribution objects.

Return type:

dict[str, pm.Distribution]
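The mapping from least-squares estimates to prior widths can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the module's implementation: it returns (mu, sigma) pairs for Normal priors instead of PyMC distributions, `Param` is a hypothetical stand-in for an lmfit Parameter, the "<name>_<key>" naming convention is hypothetical, and applying default_sigma unscaled (without n_sd) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Param:
    """Hypothetical stand-in for an lmfit Parameter (only value and stderr)."""

    value: float
    stderr: Optional[float] = None


def prior_specs(
    params: dict,
    n_sd: float,
    key: str = "",
    ctr_name: str = "",
    default_sigma: float = 1e-3,
) -> dict:
    """Return {prior_name: (mu, sigma)} for Normal priors centred on the LS values."""
    specs = {}
    for name, p in params.items():
        # With a control group, K comes from the shared group prior instead.
        if ctr_name and name == "K":
            continue
        # Widen the LS stderr by n_sd; fall back to default_sigma if missing or zero.
        sigma = n_sd * p.stderr if p.stderr else default_sigma
        # Hypothetical naming convention: "<name>_<key>" when a key is given.
        prior_name = f"{name}_{key}" if key else name
        specs[prior_name] = (p.value, sigma)
    return specs


specs = prior_specs(
    {"K": Param(7.0, 0.1), "S0_1": Param(100.0, 2.0), "S1_1": Param(50.0)},
    n_sd=10.0,
    key="A01",
    ctr_name="ctr_a",  # hypothetical control-group name
)
```

Multiplying the stderr by n_sd keeps the priors weakly informative: they are centred on the least-squares fit but wide enough for the data to move them.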

clophfit.fitting.bayes.rename_keys(data)

Rename dictionary keys coming from multi-trace into base names.

Parameters:

data (dict[str, Any])

Return type:

dict[str, Any]

clophfit.fitting.bayes.process_trace(trace, p_names, ds, n_xerr)

Process the trace to extract parameter estimates and update datasets.

Parameters:

trace (az.InferenceData) – The posterior samples from PyMC sampling.
p_names (KeysView[str]) – Parameter names.
ds (Dataset) – The dataset containing titration data.
n_xerr (float) – Scaling factor for x_errc.

Returns:

The updated fit result with extracted parameter values and datasets. Residuals are weighted, i.e. weight * (obs - pred) with weight = 1/y_err, and are computed from the posterior-mean parameter estimates.

Return type:

FitResult[az.InferenceData]

Raises:

TypeError – If az.summary does not return a DataFrame.
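The residual convention stated for process_trace can be checked with a small NumPy sketch. `weighted_residuals` is a hypothetical helper, and `y_pred` stands in for the model evaluated at the posterior-mean parameters.

```python
import numpy as np


def weighted_residuals(y_obs, y_pred, y_err):
    """Return weight * (obs - pred) with weight = 1 / y_err."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weight = 1.0 / np.asarray(y_err, dtype=float)
    return weight * (y_obs - y_pred)


# Toy data; y_pred plays the role of the posterior-mean model prediction.
res = weighted_residuals([10.0, 12.0, 9.0], [11.0, 11.0, 9.5], [0.5, 2.0, 1.0])
# res -> [-2.0, 0.5, -0.5]
```

Because each residual is expressed in units of its own y_err, points with small uncertainty dominate when the residuals are inspected or summed.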

clophfit.fitting.bayes.extract_fit(key, ctr, trace_df, ds, well_key='')

Compute individual dataset fit from a multi-well trace summary.

Parameters:

key (str) – Well identifier used to filter per-well parameters in trace_df.
ctr (str) – Control group name used to filter shared K parameters.
trace_df (pd.DataFrame) – ArviZ summary DataFrame (fmt="wide") from the multi-well MCMC run.
ds (Dataset) – Per-well dataset whose x values are updated in-place from the trace.
well_key (str, optional) – When provided, the per-well x posteriors (x_per_well[step, well_key]) are used instead of the global x_true. Pass the well identifier for fits from fit_binding_pymc_multi_noise_xrw() so that each well’s .dat/.png output uses its own inferred pH axis.

Returns:

Fit result with figure, parameters, and dataset using posterior x.

Return type:

FitResult[az.InferenceData]

clophfit.fitting.bayes.x_true_from_trace_df(trace_df)

Extract x_true from an ArviZ summary DataFrame.

Parameters:

trace_df (pandas.DataFrame)

Return type:

clophfit.fitting.data_structures.DataArray

clophfit.fitting.bayes.fit_binding_pymc(ds_or_fr, n_sd=10.0, n_xerr=1.0, ye_scaling=1.0, n_samples=2000, nuts_sampler='default')

Analyze multi-label titration datasets using PyMC (single model).

Parameters:

ds_or_fr (Dataset | FitResult[MiniT]) – Either a Dataset (will run initial LS fit) or a FitResult with initial params.
n_sd (float) – Number of standard deviations for parameter priors.
n_xerr (float) – Scaling factor for x-error.
ye_scaling (float) – Scaling factor for y-error magnitude prior.
n_samples (int) – Number of MCMC samples.
nuts_sampler (str) – NUTS sampler backend: "default" (PyMC C/pytensor), "blackjax", "numpyro", or "nutpie".

Returns:

Bayesian fitting results.

Return type:

FitResult[az.InferenceData]

clophfit.fitting.bayes.fit_binding_pymc2(ds_or_fr, n_sd=10.0, n_xerr=1.0, n_samples=2000)

Analyze multi-label titration datasets using PyMC with separate ye_mag per label.

Parameters:

ds_or_fr (Dataset | FitResult[MiniT]) – Either a Dataset (will run initial LS fit) or a FitResult with initial params.
n_sd (float) – Number of standard deviations for parameter priors.
n_xerr (float) – Scaling factor for x-error.
n_samples (int) – Number of MCMC samples.

Returns:

Bayesian fitting results with per-label error scaling.

Return type:

FitResult[az.InferenceData]

clophfit.fitting.bayes.fit_binding_pymc_compare(fr, buffer_sd, *, learn_separate_y_mag=False, n_sd=10.0, n_xerr=1.0, n_samples=2000)

Fit a Bayesian binding model with two different noise models for comparison.

Parameters:

fr (FitResult[MiniT]) – The fit result from a previous run, providing initial parameters and dataset.
buffer_sd (dict[str, float]) – Buffer (background) standard deviation for each dataset label, i.e. the bg_err values.
learn_separate_y_mag (bool) – If True, learns a unique noise scaling factor for each dataset label. If False, learns a single scaling factor for all pre-weighted data.
n_sd (float) – Prior width for parameters in create_parameter_priors.
n_xerr (float) – Scaling factor for x_errc in create_x_true.
n_samples (int) – Number of MCMC samples to draw.

Returns:

The posterior samples from PyMC for the specified noise model.

Return type:

az.InferenceData

clophfit.fitting.bayes.closest_point_on_curve(f, x_obs, y_obs)

Find the point on the model curve y = f(x) closest to the observation (x_obs, y_obs).

Parameters:

f (clophfit.clophfit_types.FloatFunc)
x_obs (float)
y_obs (float)

Return type:

float
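An orthogonal-distance search of this kind can be sketched with NumPy. This is a minimal sketch, assuming Euclidean distance and that the returned float is the x-coordinate of the closest curve point (the docstring does not say which). The brute-force grid, `closest_x_on_curve`, and its `half_width`/`n_grid` parameters are hypothetical choices made for clarity; the module may use a proper optimiser.

```python
import numpy as np


def closest_x_on_curve(f, x_obs, y_obs, half_width=3.0, n_grid=20001):
    """Return the x minimising the distance from (x_obs, y_obs) to (x, f(x)).

    f must accept a NumPy array; the search is limited to
    [x_obs - half_width, x_obs + half_width].
    """
    xs = np.linspace(x_obs - half_width, x_obs + half_width, n_grid)
    # Squared Euclidean distance from the observation to each curve point.
    d2 = (xs - x_obs) ** 2 + (f(xs) - y_obs) ** 2
    return float(xs[np.argmin(d2)])


# Foot of the perpendicular from (0, 2) to the line y = x is at x = 1.
x_star = closest_x_on_curve(lambda x: x, 0.0, 2.0)
```

Measuring the perpendicular distance, rather than the vertical one, is what lets an ODR-style model attribute part of each residual to error in x.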

clophfit.fitting.bayes.fit_binding_pymc_odr(fr, n_sd=10.0, xe_scaling=1.0, ye_scaling=10.0, n_samples=2000)

Bayesian ODR-like modeling of x and y errors.

Parameters:

fr (clophfit.fitting.data_structures.FitResult[clophfit.fitting.data_structures.MiniT])
n_sd (float)
xe_scaling (float)
ye_scaling (float)
n_samples (int)

Return type:

arviz.InferenceData | pymc.backends.base.MultiTrace

clophfit.fitting.bayes.weighted_stats(values, stderr)

Compute the weighted mean and standard error of each control's values, for use as control-group priors.

Parameters:

values (collections.abc.Mapping[str, collections.abc.Sequence[float | None]])
stderr (collections.abc.Mapping[str, collections.abc.Sequence[float | None]])

Return type:

dict[str, tuple[float, float]]
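A plausible reading of weighted_stats is an inverse-variance weighted mean, sketched below. The 1/stderr² weighting is the standard choice but an assumption here, since the docstring does not state the weights; skipping entries whose value or stderr is None is also an assumption, and `inverse_variance_stats` is a hypothetical name.

```python
import math
from collections.abc import Mapping, Sequence
from typing import Optional


def inverse_variance_stats(
    values: Mapping[str, Sequence[Optional[float]]],
    stderr: Mapping[str, Sequence[Optional[float]]],
) -> dict:
    """Return {name: (weighted_mean, weighted_stderr)} using 1 / stderr**2 weights."""
    out = {}
    for name, vals in values.items():
        # Keep only entries where the value and a positive stderr are both present.
        pairs = [
            (v, s)
            for v, s in zip(vals, stderr[name])
            if v is not None and s is not None and s > 0
        ]
        if not pairs:
            continue  # nothing usable for this control
        weights = [1.0 / s**2 for _, s in pairs]
        total = sum(weights)
        mean = sum(w * v for w, (v, _) in zip(weights, pairs)) / total
        out[name] = (mean, math.sqrt(1.0 / total))
    return out


stats = inverse_variance_stats(
    {"ctr_a": [7.0, 7.2, None], "ctr_b": [None]},  # hypothetical control names
    {"ctr_a": [0.1, 0.1, 0.2], "ctr_b": [0.3]},
)
# stats["ctr_a"] -> (7.1, ~0.0707); "ctr_b" has no usable entries and is omitted
```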

clophfit.fitting.bayes.fit_binding_pymc_multi(results, scheme, n_sd=5.0, n_xerr=1.0, ye_scaling=1.0, n_samples=2000, nuts_sampler='default', *, ctr_free_k=False)

Multi-well PyMC with shared K per control group and per-label noise.

Parameters:

results (dict[str, FitResult[MiniT]]) – Per-well initial fit results.
scheme (PlateScheme) – Plate scheme defining control groups for shared-K priors.
n_sd (float) – Prior width multiplier for per-well S0/S1 parameters.
n_xerr (float) – Scaling factor applied to x-value uncertainties.
ye_scaling (float) – HalfNormal sigma for the per-label y-error scaling factor.
n_samples (int) – Number of MCMC posterior samples per chain.
nuts_sampler (str) – NUTS sampler backend ("default", "blackjax", "numpyro", "nutpie").
ctr_free_k (bool) – If True, each CTR replicate well gets its own independent, non-hierarchical K prior Normal(group_mean, 0.2), treated exactly like an UNK well with no hierarchical shrinkage. The spread of K posteriors across replicates then quantifies between-replicate variability. If False (default), all replicates of the same CTR share a single K.

Returns:

The PyMC posterior trace.

Return type:

az.InferenceData

Raises:

ValueError – If no valid dataset is found in results.

clophfit.fitting.bayes.fit_binding_pymc_multi2(results, scheme, bg_err, n_sd=5.0, n_xerr=1.0, n_samples=2000)

Multi-well PyMC with heteroscedastic noise combining buffer and signal.

Parameters:

results (dict[str, clophfit.fitting.data_structures.FitResult[clophfit.fitting.data_structures.MiniT]])
scheme (clophfit.prtecan.PlateScheme)
bg_err (dict[int, clophfit.clophfit_types.ArrayF])
n_sd (float)
n_xerr (float)
n_samples (int)

Return type:

arviz.InferenceData

class clophfit.fitting.bayes.NoisePriors

Prior scale parameters for the 3-component heteroscedastic noise model.

All values are HalfNormal sigma parameters. The variance model is:

Var(y | mu) = sigma_read**2 + gain * max(0, mu) + alpha**2 * mu**2

Parameters:

sigma_read (float) – HalfNormal sigma for the readout-floor noise (RFU).
gain (float) – HalfNormal sigma for the Poisson-like gain term (RFU, i.e. RFU² of variance per RFU of signal).
alpha (float) – HalfNormal sigma for the multiplicative CV term (dimensionless).
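The variance model and the HalfNormal priors can be sketched with NumPy. `NoisePriorsSketch` is a hypothetical mirror of NoisePriors (same three fields), not the class itself, and the scales in the example are illustrative. HalfNormal(sigma) draws are generated as |Normal(0, sigma)|, which is the definition of that distribution.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class NoisePriorsSketch:
    """Hypothetical mirror of NoisePriors: HalfNormal sigmas for each noise term."""

    sigma_read: float
    gain: float
    alpha: float


def noise_variance(mu, sigma_read, gain, alpha):
    """Var(y | mu) = sigma_read**2 + gain * max(0, mu) + alpha**2 * mu**2."""
    mu = np.asarray(mu, dtype=float)
    return sigma_read**2 + gain * np.maximum(0.0, mu) + alpha**2 * mu**2


def draw_noise_params(priors, rng, size):
    """Draw each noise parameter from HalfNormal(scale) as |Normal(0, scale)|."""
    return {
        name: np.abs(rng.normal(0.0, scale, size))
        for name, scale in (
            ("sigma_read", priors.sigma_read),
            ("gain", priors.gain),
            ("alpha", priors.alpha),
        )
    }


priors = NoisePriorsSketch(sigma_read=5.0, gain=0.5, alpha=0.02)  # illustrative
draws = draw_noise_params(priors, np.random.default_rng(0), size=1000)
var = noise_variance([-100.0, 0.0, 100.0, 1000.0], 5.0, 0.5, 0.02)
# var -> [29.0, 25.0, 79.0, 925.0]
```

The readout floor dominates near zero signal, the gain term grows linearly, and the CV term takes over at high signal; the max(0, mu) clamp keeps the Poisson-like term from reducing the variance for negative background-subtracted predictions.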

clophfit.fitting.bayes.fit_binding_pymc_multi_noise(results, scheme, buffer_df, n_sd=5.0, n_xerr=1.0, n_samples=2000, nuts_sampler='default', *, ctr_free_k=False)

Multi-well PyMC fit with shared learnable heteroscedastic noise model.

Fits all wells simultaneously. Per-label noise parameters (sigma_read, gain, alpha) are shared across all wells and inferred from the data. The variance model is:

Var(y | mu) = sigma_read**2 + gain * max(0, mu) + alpha**2 * mu**2

where mu is the model-predicted (background-subtracted) signal. Priors for the noise parameters are derived empirically from the buffer replicate variance via _noise_priors_from_buffer().

Input data must be background-subtracted (i.e. the standard Tecan pipeline output where buffer mean has already been removed).

Parameters:

results (dict[str, FitResult[MiniT]]) – Per-well initial fit results, typically from fit_binding_glob.
scheme (PlateScheme) – Plate scheme defining control groups for shared-K priors.
buffer_df (dict[int, pd.DataFrame]) – Buffer DataFrames (integer label index -> DataFrame with well columns), used to derive noise priors from replicate variance.
n_sd (float) – Prior width multiplier for per-well S0/S1 parameters.
n_xerr (float) – Scaling factor applied to x-value uncertainties.
n_samples (int) – Number of MCMC posterior samples per chain.
nuts_sampler (str) – NUTS sampler backend: "default" (pytensor/CPU), "blackjax" (JAX/GPU), "numpyro" (JAX/GPU), or "nutpie" (Rust/CPU).
ctr_free_k (bool) – If True, each CTR replicate well gets its own independent, non-hierarchical K prior Normal(group_mean, 0.2), treated exactly like an UNK well with no hierarchical shrinkage. The spread of K posteriors across replicates quantifies between-replicate variability. If False (default), all replicates share a single K.

Returns:

Posterior trace. Noise parameters are accessible as trace.posterior["sigma_read_<lbl>"], trace.posterior["gain_<lbl>"], and trace.posterior["alpha_<lbl>"].

Return type:

az.InferenceData

Raises:

ValueError – If no valid dataset is found in results.
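Sharing one set of noise parameters across wells means every well contributes to the same likelihood term. The sketch below assumes a Normal likelihood whose variance follows the model above; the docstring gives the variance model but not the likelihood family, so that choice, the helper names, and the single-label scope are assumptions (the real fit has one noise set per label).

```python
import numpy as np


def noise_variance(mu, sigma_read, gain, alpha):
    # Same 3-component variance model as in the docstring.
    return sigma_read**2 + gain * np.maximum(0.0, mu) + alpha**2 * mu**2


def shared_noise_loglik(y_by_well, mu_by_well, sigma_read, gain, alpha):
    """Sum Normal log-densities over wells sharing one (sigma_read, gain, alpha)."""
    total = 0.0
    for well, y in y_by_well.items():
        y = np.asarray(y, dtype=float)
        mu = np.asarray(mu_by_well[well], dtype=float)
        var = noise_variance(mu, sigma_read, gain, alpha)
        # log N(y | mu, var) = -0.5 * (log(2 pi var) + (y - mu)^2 / var)
        total += float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + (y - mu) ** 2 / var)))
    return total


# Two hypothetical wells of one label, observed vs model-predicted signal (RFU).
y = {"A01": [105.0, 480.0, 910.0], "B01": [98.0, 510.0, 870.0]}
mu = {"A01": [100.0, 500.0, 900.0], "B01": [100.0, 500.0, 900.0]}
ll = shared_noise_loglik(y, mu, sigma_read=5.0, gain=0.5, alpha=0.02)
```

Pooling the noise parameters this way lets wells with few points borrow the noise information learned from the whole plate.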

clophfit.fitting.bayes.fit_binding_pymc_multi_noise_xrw(results, scheme, buffer_df, n_sd=5.0, n_xerr=1.0, n_samples=2000, sigma_pip_prior=0.02, nuts_sampler='default', *, ctr_free_k=False)

Multi-well PyMC fit with shared noise model and per-well pH random walk.

Extends fit_binding_pymc_multi_noise() with a hierarchical random-walk model for per-well pH deviations. The first titration step is common to all wells (same buffer). Each subsequent acid addition introduces independent Normal(0, sigma_pip²) deviations that accumulate, so the variance of the pH deviation at step t is t · sigma_pip².

Non-centred parameterisation is used for numerical efficiency:

z_pip[t, w] ~ Normal(0, 1)  (shape: n_steps-1 x n_wells)
x_dev[:, w] = concat([0, cumsum(sigma_pip * z_pip[:, w])])
x_per_well  = x_nominal[:, None] + x_dev   (shape: n_steps x n_wells)
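The non-centred random walk above can be reproduced with NumPy to check the claimed variance growth. In this sketch sigma_pip is fixed at its prior scale instead of being sampled, and `per_well_x` and the pH values are illustrative, not taken from the module.

```python
import numpy as np


def per_well_x(x_nominal, sigma_pip, z_pip):
    """x_per_well = x_nominal[:, None] + x_dev, with z_pip of shape (n_steps-1, n_wells)."""
    x_nominal = np.asarray(x_nominal, dtype=float)
    steps = sigma_pip * np.asarray(z_pip, dtype=float)
    # Step 0 is the shared buffer, so its deviation is exactly zero for every well.
    zeros = np.zeros((1, steps.shape[1]))
    x_dev = np.concatenate([zeros, np.cumsum(steps, axis=0)], axis=0)
    return x_nominal[:, None] + x_dev


rng = np.random.default_rng(0)
x_nominal = np.array([8.9, 8.3, 7.7, 7.1, 6.5, 5.9, 5.3])  # illustrative pH steps
n_wells = 20000
z = rng.standard_normal((len(x_nominal) - 1, n_wells))
x_pw = per_well_x(x_nominal, 0.02, z)
# Empirical variance of the deviation at step t should be close to t * 0.02**2.
var_dev = (x_pw - x_nominal[:, None]).var(axis=1)
```

Sampling standard-normal z_pip and scaling by sigma_pip decouples the geometry of the latent steps from the scale parameter, which is why NUTS explores this form more easily than a centred Normal(0, sigma_pip) parameterisation.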

Parameters:

results (dict[str, FitResult[MiniT]]) – Per-well initial fit results, typically from fit_binding_glob.
scheme (PlateScheme) – Plate scheme defining control groups for shared-K priors.
buffer_df (dict[int, pd.DataFrame]) – Buffer DataFrames (integer label index -> DataFrame with well columns), used to derive noise priors from replicate variance.
n_sd (float) – Prior width multiplier for per-well S0/S1 parameters.
n_xerr (float) – Scaling factor applied to x-value uncertainties.
n_samples (int) – Number of MCMC posterior samples per chain.
sigma_pip_prior (float) – Prior scale (HalfNormal sigma) for the per-step pipetting SD, in the same units as the x-axis (pH units by default).
nuts_sampler (str) – NUTS sampler backend: "default" (pytensor/CPU), "blackjax" (JAX/GPU), "numpyro" (JAX/GPU), or "nutpie" (Rust/CPU).
ctr_free_k (bool) – If True, each CTR replicate well gets its own independent, non-hierarchical K prior Normal(group_mean, 0.2), treated exactly like an UNK well with no hierarchical shrinkage. The spread of K posteriors across replicates quantifies between-replicate variability. If False (default), all replicates share a single K.

Returns:

Posterior trace. Per-well x is accessible as trace.posterior["x_per_well"] with dims ("chain", "draw", "step", "well"). Noise parameters are accessible as trace.posterior["sigma_read_<lbl>"] etc.

Return type:

az.InferenceData

Raises:

ValueError – If no valid dataset is found in results.

clophfit.fitting.bayes.plot_ppc_well(trace, key, labels=None, figsize=(8, 4))

Draw posterior predictive samples for a particular well (and all its labels).

The returned figure can be displayed with matplotlib.

Parameters:

trace (az.InferenceData) – Trace produced by fit_binding_pymc_advanced.
key (str) – Well identifier (e.g. ‘A01’).
labels (list[str] | None) – Names of the bands to show. If None, all variables whose names start with 'y_' and contain key are shown.
figsize (tuple[float, float]) – Figure size (width, height) in inches.

Returns:

Figure with the posterior predictive samples for the selected well.

Return type:

figure.Figure
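The automatic label selection described for labels=None amounts to a name filter. The sketch below is a minimal illustration: `auto_labels` and the variable names are hypothetical, and which trace group the function scans (for example the data variables of trace.posterior_predictive) is not stated in the docstring.

```python
def auto_labels(var_names, key):
    """Names that start with 'y_' and contain the well key, in their original order."""
    return [name for name in var_names if name.startswith("y_") and key in name]


names = ["K_A01", "y_1_A01", "y_2_A01", "y_1_A02", "x_true"]  # hypothetical names
labels = auto_labels(names, "A01")
# labels -> ["y_1_A01", "y_2_A01"]
```

A plain substring test is safe with zero-padded keys such as 'A01'; with unpadded keys, 'A1' would also match 'A10'.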

clophfit.fitting.bayes.compare_posteriors(trace, results)

Print the posterior mean ± 95 % C.I. of the K parameter for each well, juxtaposed with the deterministic K (from fit_binding_pymc).

Parameters:

trace (az.InferenceData) – Output of fit_binding_pymc_advanced.
results (dict[str, FitResult[MiniT]]) – Deterministic fits produced by the old pipeline.

Return type:

None
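The summary printed for each well can be sketched from raw posterior samples. Whether the module reports an equal-tailed interval or a highest-density interval is not stated; this sketch uses the equal-tailed 2.5th/97.5th percentile interval, and the toy K posterior is synthetic.

```python
import numpy as np


def mean_and_ci95(samples):
    """Posterior mean and equal-tailed 95 % interval (2.5th and 97.5th percentiles)."""
    s = np.asarray(samples, dtype=float).ravel()  # flatten chain x draw
    lo, hi = np.percentile(s, [2.5, 97.5])
    return float(s.mean()), float(lo), float(hi)


rng = np.random.default_rng(1)
k_samples = rng.normal(7.0, 0.05, size=(4, 2000))  # toy posterior: 4 chains x 2000 draws
mean, lo, hi = mean_and_ci95(k_samples)
print(f"K = {mean:.3f} [{lo:.3f}, {hi:.3f}]")
```

For a roughly Normal posterior the interval width is about 2 × 1.96 × sd, so the toy case gives a width near 0.196.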

clophfit.fitting.bayes.fit_pymc_hierarchical(results, scheme, bg_err, n_sd=5.0, n_xerr=1.0, n_samples=2000)

Analyze multiple titrations with a hierarchical Bayesian model.

This model shares information about the dissociation constant ‘K’ among wells belonging to the same control group, leading to more robust estimates.

Parameters:

results (dict[str, FitResult[MiniT]]) – A dictionary mapping well IDs to their initial FitResult from a prior fit_lm run.
scheme (PlateScheme) – The plate scheme defining control groups.
bg_err (dict[int, ArrayF]) – Background error for each signal band.
n_sd (float) – The number of standard deviations for the prior width of S0/S1.
n_xerr (float) – Scaling factor for x-value uncertainties.
n_samples (int) – Number of MCMC samples.

Returns:

The PyMC trace containing the posterior distributions.

Return type:

az.InferenceData

Raises:

ValueError – If an invalid dataset is found in results.
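Why sharing K within a control group gives more robust estimates can be illustrated with the conjugate Normal-Normal shrinkage formula. This is an intuition aid, not the module's computation: it treats the group mean and spread (tau) as known, whereas the MCMC model infers group-level and well-level parameters jointly. `shrink_towards_group` and the numbers are hypothetical.

```python
import numpy as np


def shrink_towards_group(k_hat, k_se, group_mean, tau):
    """Posterior mean of K_w given k_hat_w ~ N(K_w, k_se_w**2), K_w ~ N(group_mean, tau**2)."""
    k_hat = np.asarray(k_hat, dtype=float)
    prec_data = 1.0 / np.asarray(k_se, dtype=float) ** 2
    prec_prior = 1.0 / tau**2
    # Precision-weighted compromise between each well's estimate and the group mean.
    return (prec_data * k_hat + prec_prior * group_mean) / (prec_data + prec_prior)


k_post = shrink_towards_group([6.9, 7.3, 7.0], [0.05, 0.30, 0.10], group_mean=7.0, tau=0.1)
# k_post -> [6.92, 7.03, 7.0]; the noisy second well is pulled furthest toward 7.0
```

Wells with large standard errors borrow most from the group, while precise wells keep estimates close to their own data.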