clophfit.testing.synthetic
==========================

.. py:module:: clophfit.testing.synthetic

.. autoapi-nested-parse::

   Synthetic data generation for testing and benchmarking.

   This module provides a unified API for generating synthetic pH titration
   datasets with characteristics matching real experimental data from Tecan
   plate readers.

   Primary functions:

   - make_dataset: Unified function for all synthetic data generation
   - make_simple_dataset: Simplified interface for unit tests

Classes
-------

.. autoapisummary::

   clophfit.testing.synthetic.TruthParams

Functions
---------

.. autoapisummary::

   clophfit.testing.synthetic.make_dataset
   clophfit.testing.synthetic.make_simple_dataset
   clophfit.testing.synthetic.make_benchmark_dataset

Module Contents
---------------

.. py:class:: TruthParams

   Ground truth parameters for synthetic data.

.. py:function:: make_dataset(k = None, s0 = None, s1 = None, *, is_ph = True, seed = None, rng = None, n_labels = 2, randomize_signals = False, error_model = 'realistic', noise = 0.02, y_err = None, rel_error = 0.035, min_error = 1.0, buffer_sd = 50.0, error_ratio = 1.0, low_ph_drop = False, low_ph_drop_magnitude = 0.4, low_ph_drop_label = 'y1', saturation_prob = 0.0, x_error_large = 0.0, x_systematic_offset = 0.0, rel_x_err = 0.01, n_points = None)

   Generate synthetic pH/Cl titration data with configurable complexity.

   This is the unified function for all synthetic data generation. It supports
   per-label error scaling, randomization of signal parameters based on real
   experimental data distributions, and realistic artifacts such as low-pH drops.

   :param k: Equilibrium constant (pKa for pH, Kd for Cl). If None and
             randomize_signals is True, sampled from the real data distribution.
   :type k: float | None
   :param s0: Signal at the unbound state. Use a dict for multiple labels:
              {"y1": 700, "y2": 1000}. If None and randomize_signals is True,
              sampled from the real data distribution.
   :type s0: dict[str, float] | float | None
   :param s1: Signal at the bound state.
              Use a dict for multiple labels: {"y1": 1200, "y2": 200}. If None
              and randomize_signals is True, sampled from the real data
              distribution.
   :type s1: dict[str, float] | float | None
   :param is_ph: True for a pH titration, False for a Cl titration.
   :type is_ph: bool
   :param seed: Random seed for reproducibility. If None (default), generates
                random data.
   :type seed: int | None
   :param rng: Pre-existing random generator (overrides seed if provided).
   :type rng: np.random.Generator | None
   :param n_labels: Number of labels (1 or 2). Only used when
                    randomize_signals=True.
   :type n_labels: int
   :param randomize_signals: If True, randomize K, S0, and S1 from real L4 data
                             distributions when not provided. Creates y1/y2
                             dual-channel data with realistic signal magnitudes
                             and ranges.
   :type randomize_signals: bool
   :param error_model: Error model to use:

                       - "simple": Constant noise as a fraction of the dynamic
                         range (uses `noise`).
                       - "uniform": Constant absolute error per label (uses
                         `y_err`).
                       - "realistic": Relative error with a floor (uses
                         `rel_error`, `min_error`).
                       - "physics": Shot noise + buffer noise (uses `buffer_sd`).
   :type error_model: str
   :param noise: For the "simple" model: relative noise as a fraction of the
                 dynamic range. Use a dict for per-label values:
                 {"y1": 0.05, "y2": 0.02}.
   :type noise: float | dict[str, float]
   :param y_err: For the "uniform" model: constant absolute error per label.
                 Use a dict for per-label values: {"y1": 10.0, "y2": 3.0}.
   :type y_err: float | dict[str, float] | None
   :param rel_error: For the "realistic" model: relative error as a fraction of
                     the signal. Use a dict for per-label values:
                     {"y1": 0.07, "y2": 0.025} for a 3x y1/y2 ratio.
   :type rel_error: float | dict[str, float]
   :param min_error: For the "realistic" model: minimum error floor
                     (instrument noise).
   :type min_error: float | dict[str, float]
   :param buffer_sd: For the "physics" model: base buffer SD, where
                     err = sqrt(signal + buffer_sd^2). For y2, scaled by
                     error_ratio if not a dict.
   :type buffer_sd: float | dict[str, float]
   :param error_ratio: For the two-label physics model: ratio of y2_buffer_sd
                       to y1_buffer_sd. 1.0 = equal errors; 0.2 = y2 has 1/5
                       the error of y1.
   :type error_ratio: float
   :param low_ph_drop: Simulate acidic-tail collapse at the lowest pH (a
                       realistic artifact).
   :type low_ph_drop: bool
   :param low_ph_drop_magnitude: Fraction of the signal to drop at the lowest
                                 pH (0-1).
   :type low_ph_drop_magnitude: float
   :param low_ph_drop_label: Which label to apply the pH drop to ("y1" or "y2").
   :type low_ph_drop_label: str
   :param saturation_prob: Probability of masking points (saturation).
   :type saturation_prob: float
   :param x_error_large: Additional random x-error (pH units).
   :type x_error_large: float
   :param x_systematic_offset: Systematic x-offset (pH units).
   :type x_systematic_offset: float
   :param rel_x_err: Relative x-error for Cl titrations (ignored for pH).
   :type rel_x_err: float
   :param n_points: Number of pH points. If None, uses L2_PH_VALUES (7 points).
                    If specified, generates evenly spaced pH values from 5.5
                    to 9.0.
   :type n_points: int | None
   :returns: * *Dataset* -- Generated dataset with the specified labels.
             * *TruthParams* -- Ground truth parameters (K, S0, S1).

   .. rubric:: Examples

   Simple single-channel data for unit tests:

   >>> ds, truth = make_dataset(7.0, 100, 1000, error_model="simple", noise=0.02)

   Randomized dual-channel data matching real data distributions:

   >>> ds, truth = make_dataset(randomize_signals=True, seed=42)

   Randomized single-channel data:

   >>> ds, truth = make_dataset(randomize_signals=True, n_labels=1, seed=42)

   Physics-based errors with differential noise (y2 is 5x more precise):

   >>> ds, truth = make_dataset(
   ...     k=7.0,
   ...     s0={"y1": 1000, "y2": 800},
   ...     s1={"y1": 200, "y2": 300},
   ...     error_model="physics",
   ...     buffer_sd=50.0,
   ...     error_ratio=0.2,
   ... )

   Simulate the low-pH drop artifact:

   >>> ds, truth = make_dataset(
   ...     randomize_signals=True,
   ...     low_ph_drop=True,
   ...     low_ph_drop_magnitude=0.4,
   ... )
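To make the parameter roles concrete, the sketch below generates data resembling what make_dataset describes: a single-site pH curve running from s0 (unbound) to s1 (bound) with the "realistic" error model (relative error with an instrument-noise floor). This is an illustrative approximation, not clophfit's internal code; the exact curve parameterization used by the library is an assumption here.

```python
import numpy as np

def titration_curve(ph, k, s0, s1):
    """Assumed single-site binding model: signal moves from s0 (unbound) to s1 (bound)."""
    frac_bound = 1.0 / (1.0 + 10.0 ** (ph - k))  # fraction protonated at this pH
    return s0 + (s1 - s0) * frac_bound

rng = np.random.default_rng(42)
ph = np.linspace(5.5, 9.0, 7)  # mirrors the evenly spaced 7-point grid described above
y_true = titration_curve(ph, k=7.0, s0=100.0, s1=1000.0)

# "realistic" error model: relative error with a minimum floor (defaults from the signature)
rel_error, min_error = 0.035, 1.0
y_err = np.maximum(rel_error * np.abs(y_true), min_error)
y = y_true + rng.normal(0.0, y_err)
```

With s0=100 and s1=1000 the curve spans the full dynamic range across pH 5.5-9.0, which is why the error floor (min_error) only matters where the signal is small.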
.. py:function:: make_simple_dataset(k, s0, s1, *, is_ph, noise = 0.02, seed = None, rel_x_err = 0.01)

   Create a simple synthetic Dataset for unit tests.

   Uses fixed x-values and a simple noise model for backward compatibility with
   existing tests. Does NOT set y_err when noise=0, so that fitters fall back
   to their default weighting.

.. py:function:: make_benchmark_dataset(k = 7.0, *, n_labels = 1, n_points = 7, error_ratio = 1.0, add_outlier = False, outlier_label = 'y1', outlier_sigma = 4.0, seed = None, rng = None)

   Generate synthetic data for fitter benchmarking.

   This is an alias for make_dataset with the physics error model and
   convenient defaults for benchmarking.

   :param k: True pKa value (default 7.0).
   :type k: float
   :param n_labels: Number of labels: 1 or 2 (default 1).
   :type n_labels: int
   :param n_points: Number of pH points (default 7).
   :type n_points: int
   :param error_ratio: Ratio of y2_buffer_sd to y1_buffer_sd. 1.0 = equal
                       errors; 0.2 = y2 has 1/5 the error of y1.
   :type error_ratio: float
   :param add_outlier: If True, add a low-pH drop in the specified label.
   :type add_outlier: bool
   :param outlier_label: Label to add the pH drop to ("y1" or "y2").
   :type outlier_label: str
   :param outlier_sigma: Magnitude of the low-pH drop (fraction of signal,
                         0-1). The default of 4.0 is converted to 0.4 (a 40%
                         drop).
   :type outlier_sigma: float
   :param seed: Random seed for reproducibility.
   :type seed: int | None
   :param rng: Pre-existing random generator (overrides seed if provided).
   :type rng: np.random.Generator | None
   :returns: * *Dataset* -- Generated dataset.
             * *TruthParams* -- Ground truth parameters.

   .. rubric:: Examples

   Single label, clean:

   >>> ds, truth = make_benchmark_dataset(k=7.0, n_labels=1)

   Two labels with a 1:5 error ratio:

   >>> ds, truth = make_benchmark_dataset(k=7.0, n_labels=2, error_ratio=0.2)

   Two labels with a low-pH drop in the noisy channel:

   >>> ds, truth = make_benchmark_dataset(
   ...     k=7.0, n_labels=2, error_ratio=0.2, add_outlier=True, outlier_label="y1"
   ... )
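The "physics" error model that make_benchmark_dataset relies on is specified in make_dataset as err = sqrt(signal + buffer_sd^2), with y2's buffer_sd scaled by error_ratio. A minimal numpy sketch of that formula (the helper name is illustrative, not part of the clophfit API):

```python
import numpy as np

def physics_error(signal, buffer_sd=50.0):
    """Shot noise (variance ~ signal) plus constant buffer noise, per the formula above."""
    return np.sqrt(np.asarray(signal, dtype=float) + buffer_sd**2)

signal = np.array([200.0, 600.0, 1000.0])
err_y1 = physics_error(signal, buffer_sd=50.0)        # baseline channel
err_y2 = physics_error(signal, buffer_sd=50.0 * 0.2)  # error_ratio=0.2: quieter y2 channel
```

Note that error_ratio scales only the buffer term, so at high signal the shot-noise term dominates and the effective y2/y1 error ratio approaches 1 rather than 0.2.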