bootstrapped_input_data¶

Bootstrap resampling functionality for tfbpmodeling input data.

tfbpmodeling.bootstrapped_input_data ¶

BootstrappedModelingInputData ¶

BootstrappedModelingInputData(
    response_df,
    model_df,
    n_bootstraps,
    normalize_sample_weights=True,
    random_state=None,
)

This class handles bootstrapped resampling of a response vector and model matrix.

This class supports both on-the-fly generation and externally provided bootstrap indices. For each bootstrap sample, it maintains sample weights derived from frequency counts of resampled instances.
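As an illustration of how frequency counts can serve as sample weights, the sketch below draws one bootstrap replicate with NumPy and counts how often each row was selected. The helper name `bootstrap_weights` and the normalization (dividing the counts by their sum so the weights add up to 1) are assumptions made for illustration; the library's internal `_generate_bootstrap_indices` is not shown on this page and may differ.

```python
import numpy as np


def bootstrap_weights(n_samples, seed=0, normalize=True):
    """Draw one bootstrap replicate and derive per-row weights from counts.

    Hypothetical helper for illustration; not part of tfbpmodeling.
    """
    rng = np.random.RandomState(seed)
    # Sample row positions with replacement, same size as the original data
    indices = rng.choice(n_samples, size=n_samples, replace=True)
    # Count how many times each original row was drawn (0 if never drawn)
    counts = np.bincount(indices, minlength=n_samples).astype(float)
    # Assumed normalization: scale the counts so they sum to 1
    weights = counts / counts.sum() if normalize else counts
    return indices, weights


indices, weights = bootstrap_weights(n_samples=10, seed=42)
_, raw_counts = bootstrap_weights(n_samples=10, seed=42, normalize=False)
```

In this sketch the unnormalized counts always sum to the number of draws, while the normalized weights sum to 1.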

Initialize bootstrapped modeling input.

n_bootstraps is required: the constructor generates the bootstrap indices itself and does not accept a bootstrap_indices argument.

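Parameters:
`response_df` (`DataFrame`) – Response variable.
`model_df` (`DataFrame`) – Predictor matrix. Must have the same index as `response_df`, in the same order.
`n_bootstraps` (`int`) – Number of bootstrap replicates to generate.
`normalize_sample_weights` (`bool`, default: `True` ) – Whether the sample weights are normalized.
`random_state` (`int | None`, default: `None` ) – Random state for reproducibility. Can be an integer, a numpy RandomState object, or None. If None (default), no fixed seed is used, so the bootstrap samples differ between runs.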

Raises:
`IndexError` – if response_df and model_df do not have the same index in the same order (this is the exception raised in the source below).
`ValueError` – if arguments are not the correct datatype.
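The index requirement is order-sensitive, because the constructor compares the two indexes with `Index.equals`. A minimal sketch of checking and repairing alignment before constructing the object follows; the gene labels and column names are made up for illustration.

```python
import pandas as pd

# Illustrative data: same labels in both frames, but in a different order
response_df = pd.DataFrame({"y": [1.0, 2.0, 3.0]}, index=["g1", "g2", "g3"])
model_df = pd.DataFrame({"x": [0.1, 0.2, 0.3]}, index=["g3", "g1", "g2"])

# Index.equals is True only when labels match position by position
same_order = response_df.index.equals(model_df.index)

# Reorder the predictor rows to follow the response index
aligned_model_df = model_df.reindex(response_df.index)
now_aligned = response_df.index.equals(aligned_model_df.index)
```

Note that `reindex` fills rows missing from `model_df` with NaN, so it is only a safe fix when both frames contain the same set of labels.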

Source code in tfbpmodeling/bootstrapped_input_data.py

def __init__(
    self,
    response_df: pd.DataFrame,
    model_df: pd.DataFrame,
    n_bootstraps: int,
    normalize_sample_weights: bool = True,
    random_state: int | None = None,
) -> None:
    """
    Initialize bootstrapped modeling input.

    Either `n_bootstraps` or `bootstrap_indices` must be provided.

    :param response_df: Response variable.
    :param model_df: Predictor matrix.
    :param n_bootstraps: Number of bootstrap replicates to generate.
    :param random_state: Random state for reproducibility. Can be an integer or a
        numpy RandomState object, or None. If None (default), then a random
        random state is chosen.

    :raises ValueError: if the response_df and model_df do not have the same index
        or if arguments are not correct datatype.

    """

    self.response_df: pd.DataFrame = response_df
    self.model_df: pd.DataFrame = model_df
    if not response_df.index.equals(model_df.index):
        raise IndexError("response_df and model_df must have the same index order.")
    self.normalize_sample_weights = normalize_sample_weights

    # If bootstrap_indices is provided, set n_bootstraps based on its length
    self.n_bootstraps = n_bootstraps

    # set the random number generator attribute
    self.random_state = random_state
    self._rng = check_random_state(self.random_state)
    logger.info(
        f"Using random state: {self.random_state}"
        if self.random_state is not None
        else "No explicit random state set."
    )

    # Initialize attributes
    self._bootstrap_indices: list[np.ndarray] = []
    self._sample_weights: dict[int, np.ndarray] = {}

    self._generate_bootstrap_indices()

bootstrap_indices `property` `writable` ¶

bootstrap_indices

A list of arrays representing bootstrap sample indices.

model_df `property` `writable` ¶

model_df

Get the model DataFrame.

Returns:	`DataFrame` – The model DataFrame.

n_bootstraps `property` `writable` ¶

n_bootstraps

Get the number of bootstrap samples.

Returns:	`int` – The number of bootstrap samples.

normalize_sample_weights `property` `writable` ¶

normalize_sample_weights

Get the normalization status for sample weights.

Returns:	`bool` – True if sample weights are normalized, False otherwise.

random_state `property` `writable` ¶

random_state

An integer used to set the random state when generating the bootstrap samples.

Set this explicitly for reproducibility.

response_df `property` `writable` ¶

response_df

Get the response DataFrame.

Returns:	`DataFrame` – The response DataFrame.

sample_weights `property` `writable` ¶

sample_weights

Normalized sample weights corresponding to bootstrap samples.

Returns:	`dict[int, ndarray]` – A dictionary mapping bootstrap index to sample weights.

__iter__ ¶

__iter__()

Resets the iterator and returns itself.

Source code in tfbpmodeling/bootstrapped_input_data.py

def __iter__(self):
    """Resets the iterator and returns itself."""
    self._current_index = 0
    return self

__next__ ¶

__next__()

Provides the next bootstrap sample for iteration.

Returns:	`tuple[ndarray, ndarray]` – Tuple of (sample_indices, sample_weights).

Raises:	`StopIteration` – When all bootstrap samples are exhausted.

Source code in tfbpmodeling/bootstrapped_input_data.py

def __next__(self) -> tuple[np.ndarray, np.ndarray]:
    """
    Provides the next bootstrap sample for iteration.

    :return: Tuple of (sample_indices, sample_weights).
    :raises StopIteration: When all bootstrap samples are exhausted.

    """
    if self._current_index >= self.n_bootstraps:
        raise StopIteration

    sample_indices, sample_weights = self.get_bootstrap_sample(self._current_index)

    self._current_index += 1
    return sample_indices, sample_weights
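The iteration pattern above (a cursor reset in `__iter__` and advanced in `__next__`) can be sketched with a small stand-in class. `TinyBootstrap` and its sample values are hypothetical and exist only to show the behavior; this is not the library class.

```python
class TinyBootstrap:
    """Hypothetical stand-in that mirrors the iteration pattern shown above."""

    def __init__(self, samples):
        self._samples = samples
        self._current_index = 0

    def __iter__(self):
        # Reset the cursor so every loop starts from the first sample
        self._current_index = 0
        return self

    def __next__(self):
        if self._current_index >= len(self._samples):
            raise StopIteration
        sample = self._samples[self._current_index]
        self._current_index += 1
        return sample


# Each sample is a (sample_indices, sample_weights) pair with made-up values
data = TinyBootstrap([([0, 0, 2], [2, 0, 1]), ([1, 2, 2], [0, 1, 2])])
first_pass = [indices for indices, weights in data]
second_pass = [indices for indices, weights in data]
```

Because `__iter__` resets the cursor, the object can be looped over repeatedly. The cursor lives on the object itself, so two nested loops over the same instance would share it.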

deserialize `classmethod` ¶

deserialize(filename)

Loads the object from a JSON file.

Parameters:	`filename` (`str`) – Path to the BootstrappedModelingInputData JSON file.

Source code in tfbpmodeling/bootstrapped_input_data.py

@classmethod
def deserialize(cls, filename: str):
    """
    Loads the object from a JSON file.

    :param filename: Path to the BootstrapModelingData JSON file.

    """
    with open(filename) as f:
        data = json.load(f)

    response_df = pd.DataFrame(**data["response_df"]).rename_axis(
        index=data["index_name"]
    )
    model_df = pd.DataFrame(**data["model_df"]).rename_axis(
        index=data["index_name"]
    )
    n_bootstraps = data["n_bootstraps"]

    normalize_sample_weights = data["normalize_sample_weights"]

    random_state = data["random_state"]

    instance = cls(
        response_df,
        model_df,
        n_bootstraps,
        normalize_sample_weights=normalize_sample_weights,
        random_state=random_state,
    )

    return instance
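The DataFrame reconstruction in `deserialize` relies on the fact that `to_dict(orient="split")` produces exactly the keys that the `DataFrame` constructor accepts as keyword arguments. A small round trip through JSON, using a made-up frame and index name, might look like this:

```python
import json

import pandas as pd

# Illustrative frame; the labels and the index name "target" are made up
df = pd.DataFrame(
    {"TF1": [0.5, 1.5], "TF2": [2.0, 3.0]},
    index=pd.Index(["geneA", "geneB"], name="target"),
)

# orient="split" yields {"index": [...], "columns": [...], "data": [...]}
payload = json.dumps(
    {"model_df": df.to_dict(orient="split"), "index_name": df.index.name}
)

loaded = json.loads(payload)
# The three keys map directly onto the DataFrame constructor's keyword arguments
restored = pd.DataFrame(**loaded["model_df"]).rename_axis(index=loaded["index_name"])
```

The index name is stored separately because the split layout does not carry it, which is why the source above calls `rename_axis` after rebuilding each frame.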

get_bootstrap_sample ¶

get_bootstrap_sample(i)

Retrieves a bootstrap sample by index.

Parameters:	`i` (`int`) – Bootstrap sample index.

Returns:	`tuple[ndarray, ndarray]` – Tuple of (sample_indices, sample_weights).

Raises:	`IndexError` – If the index is negative or is not less than the number of bootstraps.

Source code in tfbpmodeling/bootstrapped_input_data.py

def get_bootstrap_sample(self, i: int) -> tuple[np.ndarray, np.ndarray]:
    """
    Retrieves a bootstrap sample by index.

    :param i: Bootstrap sample index.
    :return: Tuple of (sample_indices, sample_weights).
    :raises IndexError: If the index exceeds the number of bootstraps.

    """
    if i >= self.n_bootstraps or i < 0:
        raise IndexError(
            f"Bootstrap index {i} out of range. Max: {self.n_bootstraps - 1}"
        )

    sampled_indices = self.bootstrap_indices[i]
    sample_weights = self.get_sample_weight(i)

    return (
        sampled_indices,
        sample_weights,
    )
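One way to see how the returned pair can be used: a statistic computed on the resampled rows equals the same statistic computed on the original rows with frequency weights. The sketch below assumes the weights hold one entry per original row equal to its draw count; the actual layout and scaling of the library's weights are not shown on this page.

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0])

# One bootstrap replicate: row positions drawn with replacement (made-up values)
sample_indices = np.array([0, 0, 2, 3])
# Assumed weight layout: one entry per original row, equal to its draw count
sample_weights = np.bincount(sample_indices, minlength=len(y)).astype(float)

# Estimate from the materialized resample ...
mean_resampled = y[sample_indices].mean()
# ... and the same estimate from the original rows plus weights
mean_weighted = np.average(y, weights=sample_weights)
```

Rescaling the weights by a constant, as a normalization step would, leaves the weighted mean unchanged.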

get_sample_weight ¶

get_sample_weight(i)

Retrieves sample weights for a bootstrap sample.

Parameters:	`i` (`int`) – Bootstrap sample index.

Returns:	`ndarray` – Array of sample weights.

Raises:	`IndexError` – If the index is negative or is not less than the number of bootstraps.

Source code in tfbpmodeling/bootstrapped_input_data.py

def get_sample_weight(self, i: int) -> np.ndarray:
    """
    Retrieves sample weights for a bootstrap sample.

    :param i: Bootstrap sample index.
    :return: Array of sample weights.

    """
    if i >= self.n_bootstraps or i < 0:
        raise IndexError(f"Sample weight index {i} out of range.")
    return self.sample_weights[i]

regenerate ¶

regenerate()

Randomly regenerate the bootstrap samples and sample weights.

This should be called if the response or predictors change.

Source code in tfbpmodeling/bootstrapped_input_data.py

def regenerate(self) -> None:
    """
    Re-generate, randomly, bootstrap samples and sample weights.

    This should be called if the response or predictors change.

    """
    self._generate_bootstrap_indices()

save_indices ¶

save_indices(filename)

Saves only the bootstrap indices to a JSON file.

The file stores n_bootstraps alongside the list of index arrays; the response and model DataFrames are not included. Persisting the indices lets later analyses reuse exactly the same resamples.

Parameters:	`filename` (`str`) – Path to the JSON file where the bootstrap indices will be saved. This will overwrite the file if it exists.

Source code in tfbpmodeling/bootstrapped_input_data.py

def save_indices(self, filename: str) -> None:
    """
    Saves only the bootstrap indices to a JSON file.

    Saves the bootstrap indices to a JSON file. This can be used to persist the
    bootstrap indices for later use, allowing for reproducibility in analyses.

    :param filename: Path to the JSON file where the bootstrap indices will be
        saved. This will overwrite the file if it exists.

    """
    data = {
        "n_bootstraps": self.n_bootstraps,
        "bootstrap_indices": [
            indices.tolist() for indices in self._bootstrap_indices
        ],
    }
    with open(filename, "w") as f:
        json.dump(data, f)
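The file layout written by `save_indices` can be reproduced with the standard library. The sketch below writes the same two keys to a temporary file and reads them back. The index values are made up, and the read-back step is an illustration only, since this page does not document a matching loader.

```python
import json
import os
import tempfile

import numpy as np

# Made-up index arrays standing in for the stored bootstrap indices
bootstrap_indices = [np.array([0, 2, 2]), np.array([1, 1, 0])]

# Same layout as the `data` dict in save_indices above
data = {
    "n_bootstraps": len(bootstrap_indices),
    "bootstrap_indices": [indices.tolist() for indices in bootstrap_indices],
}

path = os.path.join(tempfile.mkdtemp(), "bootstrap_indices.json")
with open(path, "w") as f:
    json.dump(data, f)

# Reading back: the JSON lists are converted to NumPy arrays again
with open(path) as f:
    loaded = json.load(f)
restored = [np.array(indices) for indices in loaded["bootstrap_indices"]]
```

The `tolist()` call matters because NumPy arrays are not JSON-serializable on their own.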

serialize ¶

serialize(filename)

Saves the object as a JSON file.

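Serializes the BootstrappedModelingInputData object to a JSON file containing the response and model DataFrames, the index name, the number of bootstraps, the normalize_sample_weights flag, and the random state. The source shown below does not write the bootstrap indices or sample weights; deserialize regenerates them when it reconstructs the object.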

Parameters:	`filename` (`str`) – Path to the JSON file where the object will be saved. This will overwrite the file if it exists.

Raises:	`ValueError` – If the filename is not a valid path or if the object cannot be serialized.

Source code in tfbpmodeling/bootstrapped_input_data.py

def serialize(self, filename: str) -> None:
    """
    Saves the object as a JSON file.

    Serializes the current state of the BootstrappedModelingInputData object to a
    JSON file, including the response and model DataFrames, number of bootstraps,
    bootstrap indices, and sample weights.

    :param filename: Path to the JSON file where the object will be saved.
    :raises ValueError: If the filename is not a valid path or if the object cannot
        be serialized. This method will overwrite the file if it exists.

    """
    data = {
        "response_df": self.response_df.to_dict(orient="split"),
        "index_name": self.response_df.index.name,
        "model_df": self.model_df.to_dict(orient="split"),
        "n_bootstraps": self.n_bootstraps,
        "normalize_sample_weights": self.normalize_sample_weights,
        "random_state": self.random_state,
    }

    with open(filename, "w") as f:
        json.dump(data, f)
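Since the serialized file holds `random_state` but no indices, a reloaded object has to draw its bootstrap samples again, and whether they match the originals depends on the seed. The sketch below uses a hypothetical `draw_indices` helper built on NumPy's `RandomState` to show that an integer seed reproduces the same draws; it assumes the library seeds its generator in a comparable way, which this page does not confirm.

```python
import numpy as np


def draw_indices(seed, n_samples=5, n_bootstraps=3):
    """Hypothetical stand-in for index generation; not the library routine."""
    rng = np.random.RandomState(seed)
    return [
        rng.choice(n_samples, size=n_samples, replace=True)
        for _ in range(n_bootstraps)
    ]


# Same integer seed before saving and after loading gives identical draws
before_save = draw_indices(seed=42)
after_load = draw_indices(seed=42)
same = all(np.array_equal(a, b) for a, b in zip(before_save, after_load))
```

With `seed=None` the draws generally differ between calls, so an explicit integer `random_state` is the safer choice when a serialized object must reproduce its samples.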

Overview¶

The bootstrapped_input_data module provides the BootstrappedModelingInputData class, which adds bootstrap resampling on top of a response DataFrame and a model DataFrame, such as those prepared with ModelingInputData. This resampling underpins the statistical inference approach used in tfbpmodeling.

Key Features¶

Bootstrap Sample Generation: Creates multiple resampled datasets from the original data
Stratified Sampling: Maintains data distribution characteristics across bootstrap samples
Reproducible Results: Supports random seed setting for consistent results
Memory Efficient: Optimized storage and access patterns for large bootstrap sets

Usage Examples¶

Basic Bootstrap Creation¶

from tfbpmodeling.modeling_input_data import ModelingInputData
from tfbpmodeling.bootstrapped_input_data import BootstrappedModelingInputData

# Create base data
base_data = ModelingInputData(
    response_file='expression.csv',
    predictors_file='binding.csv',
    perturbed_tf='YPD1'
)

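# Create bootstrap version. The constructor takes DataFrames, not the base
# object: response_df and model_df are the response and predictor DataFrames
# prepared from base_data, sharing the same index in the same order.
bootstrap_data = BootstrappedModelingInputData(
    response_df=response_df,
    model_df=model_df,
    n_bootstraps=1000,
    random_state=42
)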

Accessing Bootstrap Samples¶

# Get the list of bootstrap index arrays (a property, not a method)
indices = bootstrap_data.bootstrap_indices

# Get a specific bootstrap sample as (sample_indices, sample_weights)
sample_indices, sample_weights = bootstrap_data.get_bootstrap_sample(0)

# Iterate through all bootstrap samples
for i in range(bootstrap_data.n_bootstraps):
    sample_indices, sample_weights = bootstrap_data.get_bootstrap_sample(i)
    # Process sample...

Related Modules¶

modeling_input_data: Base data structures
bootstrap_model_results: Results aggregation
interface: Main workflow integration
