evaluate_interactor_significance_linear

Linear regression-based interactor significance testing for tfbpmodeling.

tfbpmodeling.evaluate_interactor_significance_linear

evaluate_interactor_significance_linear

evaluate_interactor_significance_linear(
    input_data,
    stratification_classes,
    model_variables,
    estimator=LinearRegression(fit_intercept=True),
)

Compare predictive performance of interaction terms vs. their main effects.

This function performs a stratified cross-validation comparison between:

  • The original model containing interaction terms (e.g., TF1:TF2)
  • A reduced model where each interactor is replaced by its corresponding main effect (e.g., TF2)

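R² scores are computed for both models using stratified CV. The delta in R² (reduced model minus original model) indicates whether the interaction term adds predictive value: a negative delta means the original model, which includes the interaction term, scored higher.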

Parameters:
  • input_data (ModelingInputData) –

    A ModelingInputData instance containing predictors and response.

  • stratification_classes (ndarray) –

    Array of stratification labels for CV.

  • model_variables (list[str]) –

    List of model terms, including interaction terms.

  • estimator (BaseEstimator, default: LinearRegression(fit_intercept=True) ) –

    A scikit-learn estimator to use for modeling. Default is LinearRegression(fit_intercept=True).

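Returns:
  • InteractorSignificanceResults –

    An InteractorSignificanceResults instance with evaluation results.

Raises: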
  • KeyError

    If a main effect is missing from the input data.

Source code in tfbpmodeling/evaluate_interactor_significance_linear.py
def evaluate_interactor_significance_linear(
    input_data: ModelingInputData,
    stratification_classes: np.ndarray,
    model_variables: list[str],
    estimator: BaseEstimator = LinearRegression(fit_intercept=True),
) -> "InteractorSignificanceResults":
    """
    Compare predictive performance of interaction terms vs. their main effects.

    This function performs a stratified cross-validation comparison between:
    - The original model containing interaction terms (e.g., TF1:TF2)
    - A reduced model where each interactor is replaced by its corresponding
      main effect (e.g., TF2)

    R² scores are computed for both models using stratified CV. The delta in R²
    informs whether the interaction term adds predictive value.

    :param input_data: A `ModelingInputData` instance containing predictors
        and response.
    :param stratification_classes: Array of stratification labels for CV.
    :param model_variables: List of model terms, including interaction terms.
    :param estimator: A scikit-learn estimator to use for modeling. Default is
        `LinearRegression(fit_intercept=True)`.

    :return: An `InteractorSignificanceResults` instance with evaluation results.

    :raises KeyError: If a main effect is missing from the input data.

    """
    logger.info("Interactor significance evaluation method: Linear")

    output = []

    response_df = input_data.response_df

    # Identify interaction terms (those with ":")
    interactors = [var for var in model_variables if ":" in var]

    logger.info(f"Testing the following interaction variables: {interactors}")

    # NOTE: add_row_max is True only if the formula includes row_max, so that the
    # column is present in the model matrix. If the formula does not include
    # row_max, the column will not be present.
    add_row_max = "row_max" in model_variables
    logger.info(
        "Using 'row_max' in model variables "
        "for evaluate_interactor_significance: %s",
        add_row_max,
    )
    # Get the average R² of the original model
    avg_r2_original_model = stratified_cv_r2(
        response_df,
        input_data.get_modeling_data(
            " + ".join(model_variables), add_row_max=add_row_max
        ),
        stratification_classes,
        estimator=estimator,
    )

    for interactor in interactors:
        # Extract main effect from interactor
        main_effect = interactor.split(":")[1]

        logger.debug(f"Testing interactor '{interactor}' with variant '{main_effect}'.")

        # Ensure main effect exists in predictors
        if main_effect not in input_data.predictors_df.columns:
            raise KeyError(f"Main effect '{main_effect}' not found in predictors.")

        # Define predictor sets for comparison
        predictors_with_main_effect = [
            var for var in model_variables if var != interactor
        ] + [
            main_effect
        ]  # Replace interactor with main effect

        # Get the average R² of the model with the main effect replacing one of the
        # interaction terms
        avg_r2_main_effect = stratified_cv_r2(
            response_df,
            input_data.get_modeling_data(
                " + ".join(predictors_with_main_effect), add_row_max=add_row_max
            ),
            stratification_classes,
            estimator=estimator,
        )

        # Store results
        output.append(
            {
                "interactor": interactor,
                "variant": main_effect,
                "avg_r2_interactor": avg_r2_original_model,
                "avg_r2_main_effect": avg_r2_main_effect,
                "delta_r2": avg_r2_main_effect - avg_r2_original_model,
            }
        )

    return InteractorSignificanceResults(output)
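
The term handling in the listing above can be isolated from the modeling code. The following is a minimal sketch, not part of tfbpmodeling: `reduced_term_sets` is a hypothetical helper that reproduces only how each reduced model's term list is derived from `model_variables`.

```python
def reduced_term_sets(model_variables):
    """Map each interaction term to the term list of its reduced model."""
    reduced = {}
    for term in model_variables:
        if ":" not in term:
            continue  # only interaction terms are tested
        main_effect = term.split(":")[1]  # "TF1:TF2" -> "TF2"
        # Drop the interaction term, then append its main effect
        reduced[term] = [v for v in model_variables if v != term] + [main_effect]
    return reduced


model_variables = ["TF1", "TF1:TF2", "TF1:TF3"]
for interactor, terms in reduced_term_sets(model_variables).items():
    print(interactor, "->", " + ".join(terms))
```

As in the listing, the main effect is appended without deduplication, so a main effect that is already in `model_variables` appears twice in the reduced term list.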

Overview

The evaluate_interactor_significance_linear module provides a function for evaluating whether interaction terms add predictive value under standard, unregularized linear regression. It compares the average stratified cross-validated R² of the full model against reduced models in which one interaction term at a time is replaced by its main effect.

Key Features

  • Standard Linear Regression: Uses LinearRegression(fit_intercept=True) by default
  • Cross-Validated Comparison: Compares average stratified-CV R² of the full and reduced models
  • Per-Interactor Results: Reports interactor, variant, avg_r2_interactor, avg_r2_main_effect, and delta_r2 for each interaction term
  • Flexible Estimator: Accepts any scikit-learn estimator through the estimator argument

Usage Examples

Basic Significance Testing

from tfbpmodeling.evaluate_interactor_significance_linear import (
    evaluate_interactor_significance_linear,
)

# input_data: a ModelingInputData instance (predictors and response)
# stratification_classes: NumPy array of stratification labels for CV
# model_variables: model terms, including interaction terms such as "TF1:TF2"
results = evaluate_interactor_significance_linear(
    input_data=input_data,
    stratification_classes=stratification_classes,
    model_variables=model_variables,
)

# results is an InteractorSignificanceResults instance. Each record it is
# built from holds: interactor, variant, avg_r2_interactor,
# avg_r2_main_effect, and delta_r2.

Custom Estimator

from sklearn.linear_model import Ridge

# Pass any scikit-learn regressor through the estimator argument
results = evaluate_interactor_significance_linear(
    input_data=input_data,
    stratification_classes=stratification_classes,
    model_variables=model_variables,
    estimator=Ridge(alpha=1.0),
)

Method Details

Statistical Approach

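  1. Full Model: Compute the average stratified-CV R² of the model containing every term in model_variables
  2. Reduced Models: For each interaction term (any term containing ":"), replace it with its main effect (the part after the ":")
  3. Cross-Validation: Compute the average stratified-CV R² of each reduced model with the same stratification classes and estimator
  4. Comparison: Record delta_r2 = avg_r2_main_effect - avg_r2_interactor for each interaction term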
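
The package's own `stratified_cv_r2` helper is not shown on this page, so the following is only a dependency-free sketch of the comparison that the source listing performs. All helper names (`fit_ols`, `r2`, `stratified_folds`, `cv_r2`) and the synthetic data are assumptions made for illustration. In practice the fit would be done by the scikit-learn estimator passed to the function.

```python
from statistics import mean


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented system [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r2(beta, X, y):
    """Coefficient of determination of the fitted model on (X, y)."""
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    y_bar = mean(y)
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def stratified_folds(classes, n_splits):
    """Assign fold ids round-robin within each class to keep class balance."""
    seen, fold_of = {}, []
    for c in classes:
        fold_of.append(seen.get(c, 0) % n_splits)
        seen[c] = seen.get(c, 0) + 1
    return fold_of


def cv_r2(X, y, classes, n_splits=4):
    """Average held-out R² across the stratified folds."""
    fold_of = stratified_folds(classes, n_splits)
    scores = []
    for f in range(n_splits):
        train = [i for i, g in enumerate(fold_of) if g != f]
        test = [i for i, g in enumerate(fold_of) if g == f]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        scores.append(r2(beta, [X[i] for i in test], [y[i] for i in test]))
    return mean(scores)


# Synthetic data: the response depends on TF1 and on the TF1*TF2 product
tf1 = [i // 6 + 1 for i in range(24)]
tf2 = [i % 6 for i in range(24)]
noise = [0.1 * ((i * 7) % 5 - 2) for i in range(24)]
y = [a + 2 * a * b + e for a, b, e in zip(tf1, tf2, noise)]
classes = [0 if a <= 2 else 1 for a in tf1]

full = [[a, a * b] for a, b in zip(tf1, tf2)]  # terms: TF1 + TF1:TF2
reduced = [[a, b] for a, b in zip(tf1, tf2)]   # terms: TF1 + TF2

avg_r2_full = cv_r2(full, y, classes)
avg_r2_reduced = cv_r2(reduced, y, classes)
delta_r2 = avg_r2_reduced - avg_r2_full  # same sign convention as the listing
print(f"full={avg_r2_full:.3f} reduced={avg_r2_reduced:.3f} delta={delta_r2:.3f}")
```

Because the synthetic response is built from the product term, the reduced model scores lower on the held-out folds and `delta_r2` comes out negative.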

Model Comparison

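  • One Term at a Time: Each reduced model replaces a single interaction term and keeps all other terms
  • Shared Baseline: Every reduced model is compared against the same full-model R²
  • Sign Convention: delta_r2 is reduced minus full, so a negative value means the interaction term improved cross-validated R²
  • No Hypothesis Tests: The function reports R² differences and does not compute F-statistics, t-statistics, or p-values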
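
In the source listing, `delta_r2` is computed as `avg_r2_main_effect - avg_r2_original_model`, so the sign is easy to misread. The sketch below uses plain dictionaries with hypothetical R² values in the shape the listing appends to `output`. It does not use the `InteractorSignificanceResults` API, which is not documented on this page.

```python
# Hypothetical records; the R² values are made up for illustration
records = [
    {"interactor": "TF1:TF2", "variant": "TF2",
     "avg_r2_interactor": 0.42, "avg_r2_main_effect": 0.37},
    {"interactor": "TF1:TF3", "variant": "TF3",
     "avg_r2_interactor": 0.42, "avg_r2_main_effect": 0.43},
]

for rec in records:
    # Same sign convention as the listing: reduced model minus original model
    rec["delta_r2"] = rec["avg_r2_main_effect"] - rec["avg_r2_interactor"]

# Negative delta: swapping in the main effect lowered cross-validated R²
supported = [r["interactor"] for r in records if r["delta_r2"] < 0]

# Most negative first, i.e. largest loss when the interaction is removed
ranked = sorted(records, key=lambda r: r["delta_r2"])

print(supported)
print([r["interactor"] for r in ranked])
```

Treating zero as the cutoff is itself an assumption. A stricter threshold may be appropriate when differences are within fold-to-fold variation.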

Advantages

  • Interpretable: Unregularized coefficients and a direct R² comparison
  • Established Theory: Ordinary least squares has well-understood statistical properties
  • Out-of-Sample: The comparison uses cross-validated rather than in-sample R²
  • Simple Output: One delta_r2 value per interaction term

Considerations

  • Overfitting Risk: May overfit in high-dimensional settings
  • Multicollinearity: Sensitive to correlated predictors
  • Assumptions: Requires standard linear regression assumptions
  • Many Interactions: When many interaction terms are evaluated, some R² gains can arise by chance