clophfit.fitting.core

Clophfit: Fitting of Cl- binding and pH titration curves.

This module provides a comprehensive suite of tools for analyzing titration data, particularly for chloride binding and pH titration experiments common in biochemistry, such as those involving fluorescent probes.

Core Functionality:

Data Modeling: Implements a 1-site binding model suitable for both ligand concentration and pH titrations.
Spectral Data Processing:
- Processes raw spectral data (e.g., from fluorescence spectroscopy).
- Offers two methods for data reduction:
  - Singular Value Decomposition (SVD) to extract the most significant spectral component.
  - Band integration over a specified wavelength range.
Curve Fitting: Provides three distinct fitting backends to determine the dissociation constant (K) and other parameters:
- Least-Squares (LM): Utilizes the lmfit library for robust non-linear least-squares minimization. Supports iterative reweighting and outlier removal.
- Orthogonal Distance Regression (ODR): Employs odrpack to account for uncertainties in both x and y variables, which is crucial when x-values (e.g., pH measurements) have errors.
- Bayesian Modeling (PyMC): Implements a hierarchical Bayesian model using pymc. This approach is powerful for:
  - Quantifying parameter uncertainties as full posterior distributions.
  - Modeling errors in x-values as latent variables.
  - Sharing information between multiple experiments (hierarchical fitting) to obtain more robust parameter estimates.
Result Visualization: Includes extensive plotting functions to visualize:
- Raw and processed spectra.
- Fitted curves with confidence intervals.
- Diagnostic plots for SVD and Bayesian analyses (e.g., corner plots).
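A 1-site binding model of the kind described above can be sketched as follows. This is an illustrative sketch, not clophfit's own model function: the name `one_site` and its parameterization (plateau signals `S0` and `S1`, with `K` taken as the pKa for pH titrations or as Kd for ligand titrations) are assumptions.

```python
import numpy as np


def one_site(x, K, S0, S1, *, is_ph):
    """Hypothetical 1-site binding model (sketch, not clophfit's API).

    S0: signal of the free (unbound / deprotonated) form.
    S1: signal of the bound (ligand-bound / protonated) form.
    K:  pKa when is_ph is True, otherwise Kd in the same units as x.
    """
    x = np.asarray(x, dtype=float)
    if is_ph:
        # Protonated fraction 10**(K - x) / (1 + 10**(K - x)),
        # written in an equivalent form; it equals 0.5 at pH == pKa.
        frac = 1.0 / (1.0 + 10.0 ** (x - K))
    else:
        # Bound fraction x / (Kd + x); it equals 0.5 at x == Kd.
        frac = x / (K + x)
    return S0 + (S1 - S0) * frac
```

With this parameterization, the signal is halfway between the two plateaus when x equals K, for example `one_site([7.0], 7.0, 0.0, 2.0, is_ph=True)` gives `[1.0]`.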

Functions

`weight_da(da, *, is_ph)` – Estimate initial weights for a DataArray by fitting it individually.
`weight_multi_ds_titration(ds)` – Assign weights to all DataArrays within a Dataset.
`analyze_spectra(spectra, *, is_ph[, band])` – Analyze a spectral titration, fit the data, and plot the results.
`analyze_spectra_glob(titration, ds[, dbands])` – Analyze multi-label spectra and visualize the results.
`fit_binding_glob(ds, *[, method, reweight, ...])` – Analyze multi-label titration datasets and visualize the results.

Module Contents

clophfit.fitting.core.weight_da(da, *, is_ph)

Estimate initial weights for a DataArray by fitting it individually.

The standard error of the residuals from this initial fit is used as the uncertainty (y_err) for subsequent weighted fits.

Parameters:

da (DataArray) – The data array to be weighted.
is_ph (bool) – Whether the titration is pH-based.

Returns:

True if the weighting fit was successful, False otherwise.

Return type:

bool
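The weighting idea can be sketched as below, assuming the initial unweighted fit has already produced model values `y_fit` and that the 1-site model has three free parameters (K, S0, S1). The helper names are hypothetical and this is not clophfit's implementation; it only shows how a residual standard error, sqrt(SSR / (n − p)), becomes a uniform `y_err`.

```python
import numpy as np


def residual_std_error(y_obs, y_fit, n_params):
    """Standard error of the residuals: sqrt(SSR / (n - p)), or None if n <= p."""
    resid = np.asarray(y_obs, dtype=float) - np.asarray(y_fit, dtype=float)
    dof = resid.size - n_params
    if dof <= 0:
        # Too few points to estimate the scatter; report failure.
        return None
    return float(np.sqrt(np.sum(resid**2) / dof))


def estimate_y_err(y_obs, y_fit, n_params=3):
    """Hypothetical helper mirroring the idea behind weight_da.

    Returns (success, y_err), where y_err is a uniform array equal to the
    residual standard error of the initial fit, or (False, None) on failure.
    """
    sigma = residual_std_error(y_obs, y_fit, n_params)
    if sigma is None:
        return False, None
    return True, np.full(len(y_obs), sigma)
```

The boolean flag plays the same role as the documented return value: it tells the caller whether a fallback error is needed.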

clophfit.fitting.core.weight_multi_ds_titration(ds)

Assign weights to all DataArrays within a Dataset.

Iterates through each DataArray in the Dataset, calling weight_da to estimate y_err. For any DataArray where weighting fails (e.g., due to insufficient data), a fallback error is assigned based on the errors from successfully fitted arrays.

The implementation minimizes set operations and memory allocations.

Parameters:

ds (clophfit.fitting.data_structures.Dataset)

Return type:

None
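The fallback step can be sketched as follows. The exact rule clophfit uses to derive the fallback from the successful labels is not stated here, so this sketch assumes, for illustration only, that failed labels receive the largest error among the successfully weighted labels (a conservative choice). The helper name and the label-to-error mapping are hypothetical.

```python
def assign_fallback_errors(y_errs):
    """Fill in y_err for labels whose weighting fit failed (hypothetical helper).

    y_errs maps each label to a uniform error estimate (float), or to None
    if the initial fit failed. ASSUMPTION: failed labels get the largest
    error among the successful labels.
    """
    ok = [e for e in y_errs.values() if e is not None]
    if not ok:
        raise ValueError("No label could be weighted; cannot derive a fallback error.")
    fallback = max(ok)
    return {label: (fallback if e is None else e) for label, e in y_errs.items()}
```

Using the maximum avoids giving a poorly characterized label more weight than any label whose scatter was actually measured; a mean or median would be equally plausible alternatives.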

clophfit.fitting.core.analyze_spectra(spectra, *, is_ph, band=None)

Analyze a spectral titration, fit the data, and plot the results.

This function reduces the spectra either by Singular Value Decomposition (SVD) or by integrating them over a specified band.

Parameters:

spectra (pd.DataFrame) – The DataFrame containing spectra (one spectrum for each column).
is_ph (bool) – Whether the x-axis represents pH.
band (tuple[int, int] | None) – If provided, use the ‘band’ integration method. Otherwise, use ‘svd’.

Returns:

An object containing the fit results and the summary plot.

Return type:

FitResult[Minimizer]

Raises:

ValueError – If the band parameters are not in the spectra’s index when the band method is used.
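The band method and its documented `ValueError` can be sketched with plain NumPy arrays standing in for the DataFrame (wavelengths as the index, one spectrum per column). The function name is hypothetical, and the use of the trapezoidal rule is an assumption; clophfit may, for instance, simply sum the intensities over the band.

```python
import numpy as np


def integrate_band(wavelengths, spectra, band):
    """Integrate each spectrum (column) over band = (lo, hi); a sketch.

    wavelengths: 1-D array (the DataFrame index in analyze_spectra).
    spectra: 2-D array, shape (n_wavelengths, n_titration_points).
    ASSUMPTION: both limits must be exact wavelength values, mirroring the
    documented ValueError; integration uses the trapezoidal rule.
    """
    wl = np.asarray(wavelengths, dtype=float)
    lo, hi = band
    if lo not in wl or hi not in wl:
        raise ValueError(f"Band limits {band} are not in the spectra's index.")
    mask = (wl >= min(lo, hi)) & (wl <= max(lo, hi))
    x = wl[mask]
    y = np.asarray(spectra, dtype=float)[mask]
    # Sort by wavelength so the integral is positive for descending indexes too.
    order = np.argsort(x)
    x, y = x[order], y[order]
    # Trapezoidal rule column-wise: one integral per titration point.
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)[:, None], axis=0)
```

The result is one number per titration point, i.e. the curve that is then passed to the binding fit.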

Notes

For the SVD method, creates plots of the spectra, the principal-component vectors, the singular values, the fit of the first principal component, and the PCA. For the band method, plots only the spectra and the fit.
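The SVD reduction can be sketched as below: the first singular triplet captures the dominant spectral change, and its amplitude at each titration point traces the titration curve. This is a sketch under assumptions (a plain array with one spectrum per column, no mean-centering, a hypothetical function name), not clophfit's code.

```python
import numpy as np


def svd_first_component(spectra):
    """Reduce a titration to one curve via SVD (sketch).

    spectra: 2-D array, shape (n_wavelengths, n_titration_points),
    one spectrum per column as in analyze_spectra.
    Returns (scores, singular_values); scores is the first component's
    amplitude at each titration point.
    """
    a = np.asarray(spectra, dtype=float)
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # Singular vectors have an arbitrary sign; flip so the first spectral
    # component is mostly positive and the scores are easier to interpret.
    sign = 1.0 if u[:, 0].sum() >= 0 else -1.0
    scores = sign * s[0] * vt[0]
    return scores, s
```

The singular values returned alongside the scores are what a singular-value plot displays: a single dominant value indicates that one component explains most of the spectral change.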

clophfit.fitting.core.analyze_spectra_glob(titration, ds, dbands=None)

Analyze multi-label spectra and visualize the results.

Parameters:

titration (dict[str, pandas.DataFrame])
ds (clophfit.fitting.data_structures.Dataset)
dbands (dict[str, tuple[int, int]] | None)

Return type:

clophfit.fitting.data_structures.SpectraGlobResults

clophfit.fitting.core.fit_binding_glob(ds, *, method='lm', reweight=None, remove_outliers=None, max_iter=15, tol=0.01, scale_covar=True)

Analyze multi-label titration datasets and visualize the results.

Unified fitting function that supports standard least-squares and robust fitting with optional iterative reweighting and outlier detection.
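To illustrate why Huber-loss fitting reduces outlier influence, the loss and the weights it implies can be sketched as follows. The tuning constant `delta=1.345` is the conventional choice for residuals scaled to unit variance; the value and scaling that clophfit itself uses are not stated here and may differ (SciPy's `least_squares`, for example, implements its own Huber variant via `loss="huber"`).

```python
import numpy as np


def huber_rho(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond."""
    a = np.abs(np.asarray(r, dtype=float))
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))


def huber_weights(r, delta=1.345):
    """Weights implied by the Huber loss in IRLS form: w = min(1, delta / |r|)."""
    a = np.abs(np.asarray(r, dtype=float))
    # Guard against division by zero for exact-fit points (they get weight 1).
    return np.minimum(1.0, delta / np.maximum(a, np.finfo(float).tiny))
```

Points with small residuals keep full weight, while a point with a residual of 4·delta contributes with weight 0.25, so a single outlier cannot dominate the fit the way it does under a pure squared loss.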

Parameters:

ds (Dataset) – Input dataset with x, y, and y_err for each label.
method (str, optional) – Fitting method: "lm" (default) for standard least-squares or "huber" for Huber-loss robust fitting (reduces outlier influence).
reweight (str | None, optional) –
Reweighting strategy to apply after each residual evaluation:
- "irls" - iteratively reweighted least-squares (uniform scale per label from MA-residual).
Default is None (no reweighting).
remove_outliers (str | None, optional) – Outlier-removal specification of the form "zscore:threshold:min_keep" where threshold is the z-score cutoff and min_keep is the minimum number of points required per label. Default is None.
max_iter (int, optional) – Maximum number of iterations for iterative procedures (reweighting). Default is 15.
tol (float, optional) – Convergence tolerance on the reduced chi-squared. The loop stops when the improvement drops below this value. Default is 0.01.
scale_covar (bool, optional) – Whether to scale the covariance matrix, and hence the parameter uncertainties, by the reduced chi-squared. Default is True.
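The `remove_outliers` specification can be sketched as a parser plus a per-label z-score filter. Two details are assumptions made for illustration: z-scores are computed from each label's residuals using the population standard deviation, and if filtering would leave fewer than `min_keep` points, no point is removed. The helper names are hypothetical.

```python
import numpy as np


def parse_outlier_spec(spec):
    """Parse 'zscore:threshold:min_keep' into (threshold, min_keep)."""
    method, threshold, min_keep = spec.split(":")
    if method != "zscore":
        raise ValueError(f"Unsupported outlier method: {method!r}")
    return float(threshold), int(min_keep)


def zscore_keep_mask(residuals, threshold, min_keep):
    """Boolean mask of points to keep for one label (sketch).

    ASSUMPTION: if filtering would leave fewer than min_keep points,
    nothing is removed.
    """
    r = np.asarray(residuals, dtype=float)
    sd = r.std()
    if sd == 0:
        return np.ones(r.size, dtype=bool)
    z = np.abs(r - r.mean()) / sd
    keep = z <= threshold
    return keep if keep.sum() >= min_keep else np.ones(r.size, dtype=bool)
```

For short titrations, note that a z-score computed with the population standard deviation can never exceed sqrt(n − 1) (Samuelson's inequality). With six points that bound is about 2.24, so a threshold of 2.5 would never remove anything.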

Returns:

An object containing the fit results, plot figure, minimizer, and dataset copy.

Return type:

FitResult[Minimizer]

Raises:

InsufficientDataError – If there are not enough data points for the number of parameters.

Notes

Parameter uncertainties are scaled by \(\sqrt{\chi^2_\nu}\) via lmfit’s Minimizer(scale_covar=True), which improves coverage when errors are underestimated.

Residuals returned are WEIGHTED (weight * (observed - predicted)) where weight = 1/y_err. This is appropriate for heteroscedastic data where different observations have different uncertainties.
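The two notes above can be sketched numerically: weighted residuals are (observed − predicted) / y_err, and with covariance scaling each parameter's standard error is multiplied by \(\sqrt{\chi^2_\nu}\) computed from those residuals. The helper names are hypothetical, and `cov` stands for the unscaled covariance matrix produced by the fit; this conceptually mirrors `scale_covar=True` in lmfit rather than reproducing its code.

```python
import numpy as np


def weighted_residuals(y_obs, y_pred, y_err):
    """Residuals weighted by 1/y_err: (observed - predicted) / y_err."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return (y_obs - y_pred) / np.asarray(y_err, dtype=float)


def scaled_stderr(cov, resid_w, n_params):
    """Parameter standard errors scaled by sqrt(reduced chi-squared)."""
    resid_w = np.asarray(resid_w, dtype=float)
    chi2_nu = np.sum(resid_w**2) / (resid_w.size - n_params)
    return np.sqrt(np.diag(cov)) * np.sqrt(chi2_nu)
```

If the y_err values are correct, \(\chi^2_\nu \approx 1\) and the scaling leaves the uncertainties essentially unchanged; if they are too small, \(\chi^2_\nu > 1\) and the reported uncertainties grow accordingly.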