evaluate_interactor_significance_lassocv

LassoCV-based interactor significance testing for tfbpmodeling.


evaluate_interactor_significance_lassocv

evaluate_interactor_significance_lassocv(
    input_data,
    stratification_classes,
    model_variables,
    estimator=LassoCV(
        fit_intercept=True,
        max_iter=10000,
        selection="random",
        random_state=42,
        n_jobs=4,
    ),
)

Evaluate which interaction terms survive LassoCV when main effects are included.

Returns:
  • InteractorSignificanceResults: for each interaction term, a record with the full-model R², the LassoCV coefficient of the interaction, and the coefficient of its corresponding main effect
Source code in tfbpmodeling/evaluate_interactor_significance_lassocv.py
def evaluate_interactor_significance_lassocv(
    input_data: ModelingInputData,
    stratification_classes: np.ndarray,
    model_variables: list[str],
    estimator: BaseEstimator = LassoCV(
        fit_intercept=True,
        max_iter=10000,
        selection="random",
        random_state=42,
        n_jobs=4,
    ),
) -> "InteractorSignificanceResults":
    """
    Evaluate which interaction terms survive LassoCV when main effects are included.

    :return:
        - List of retained interaction terms
        - pd.Series of all model coefficients (indexed by term name)
        - Selected alpha value from LassoCV

    """
    logger.info("Interactor significance evaluation method: LassoCV")

    interactors = [v for v in model_variables if ":" in v]
    modifier_main_effects = {i.split(":")[1] for i in interactors}

    augmented_vars = list(set(model_variables + list(modifier_main_effects)))
    logger.info(
        f"Model includes interaction terms and their main effects: {augmented_vars}"
    )
    add_row_max = "row_max" in augmented_vars
    logger.info(
        "Using 'row_max' in model variables "
        "for evaluate_interactor_significance: %s",
        add_row_max,
    )

    X = input_data.get_modeling_data(
        " + ".join(augmented_vars),
        add_row_max=add_row_max,
        drop_intercept=True,
    )
    y = input_data.response_df

    skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

    model_i = stratified_cv_modeling(
        y,
        X,
        classes=stratification_classes,
        estimator=estimator,
        skf=skf,
    )

    coefs = pd.Series(model_i.coef_, index=X.columns)
    retained_vars = coefs[coefs != 0].index.tolist()
    retained_interactors = [v for v in retained_vars if ":" in v]

    logger.info(f"Retained interaction terms: {retained_interactors}")
    y_pred = model_i.predict(X)
    r2_full_model = r2_score(y, y_pred)

    output = []
    for interactor in interactors:
        main_effect = interactor.split(":")[1]
        output.append(
            {
                "interactor": interactor,
                "variant": main_effect,
                "r2_lasso_model": r2_full_model,
                "coef_interactor": coefs.get(interactor, 0.0),
                "coef_main_effect": coefs.get(main_effect, 0.0),
            }
        )

    return InteractorSignificanceResults(output)

Overview

The evaluate_interactor_significance_lassocv module evaluates the significance of interaction terms using LassoCV regularization. Rather than comparing separate models, it fits a single regularized model that includes each interaction term alongside the main effect of its modifier; interactions whose coefficients are shrunk to zero are considered non-significant. The L1 penalty makes this a conservative approach to interaction significance testing.

Key Features

  • Regularized Selection: Uses LassoCV to shrink uninformative interaction terms to exactly zero
  • Cross-Validation: Built-in CV for robust model comparison
  • Conservative Testing: Regularization reduces false positive interactions
  • Scalable Analysis: Handles high-dimensional feature spaces efficiently
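The selection behavior can be illustrated on synthetic data. This sketch is not part of the module; the column names and data are invented, but the retention logic mirrors the source above:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(42)
n = 200

# Three main effects plus two interaction columns; only "a" and "a:b"
# actually drive the response.
a = rng.normal(size=n)
b = rng.normal(size=n)
c = rng.normal(size=n)
X = pd.DataFrame({"a": a, "b": b, "a:b": a * b, "a:c": a * c})
y = 2.0 * a + 1.5 * (a * b) + rng.normal(scale=0.1, size=n)

model = LassoCV(fit_intercept=True, max_iter=10000, random_state=42).fit(X, y)
coefs = pd.Series(model.coef_, index=X.columns)

# Interaction terms with non-zero coefficients "survive" the penalty
retained = coefs[coefs != 0].index.tolist()
retained_interactors = [v for v in retained if ":" in v]
```

The informative interaction `a:b` survives, while `a:c` is typically shrunk to zero or near zero at the CV-selected alpha.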

Usage Examples

Basic Significance Testing

from tfbpmodeling.evaluate_interactor_significance_lassocv import (
    evaluate_interactor_significance_lassocv,
)

# Run LassoCV-based significance testing with the default estimator.
# `input_data` (a ModelingInputData) and `stratification_classes` are
# assumed to come from the upstream modeling workflow.
results = evaluate_interactor_significance_lassocv(
    input_data=input_data,
    stratification_classes=stratification_classes,
    model_variables=["tf1", "tf1:tf2", "tf1:tf3"],
)

Advanced Configuration

import numpy as np
from sklearn.linear_model import LassoCV

# Custom LassoCV estimator: a finer alpha grid and a tighter tolerance
results = evaluate_interactor_significance_lassocv(
    input_data=input_data,
    stratification_classes=stratification_classes,
    model_variables=model_variables,
    estimator=LassoCV(
        fit_intercept=True,
        alphas=np.logspace(-5, 2, 100),
        max_iter=10000,
        tol=1e-6,
        selection="random",
        random_state=42,
    ),
)

Method Details

Statistical Approach

  1. Augment Variables: For every interaction term, add the main effect of its modifier to the model variables
  2. Single Regularized Fit: Fit one LassoCV model on the augmented design matrix using stratified 4-fold cross-validation
  3. Coefficient Extraction: Terms whose coefficients are non-zero are retained; retained interaction terms are logged
  4. Significance Reporting: For each interaction, report the full-model R² together with the coefficients of the interaction and its main effect
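Step 3 reduces to indexing the coefficient vector by term name and filtering, exactly as in the source above. A minimal standalone sketch with made-up coefficient values:

```python
import pandas as pd

# Hypothetical fitted coefficients, indexed by term name
coefs = pd.Series(
    {"tf1": 0.8, "tf2": 0.0, "tf1:tf2": 0.3, "tf1:tf3": 0.0, "row_max": 0.1}
)

retained_vars = coefs[coefs != 0].index.tolist()
# Interaction terms are identified by the ":" in their name
retained_interactors = [v for v in retained_vars if ":" in v]
print(retained_interactors)  # ['tf1:tf2']
```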

Advantages

  • Regularization: Reduces overfitting in high-dimensional settings
  • Feature Selection: Automatically selects relevant interactions
  • Robust: Less sensitive to noise compared to standard linear regression
  • Scalable: Efficient for large feature sets

Considerations

  • Conservative: May miss weak but real interactions
  • Hyperparameter Sensitive: Alpha range affects results
  • Interpretation: Lasso shrinks coefficients toward zero, so their magnitudes understate the underlying effect sizes
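The shrinkage caveat can be seen directly by comparing a Lasso fit to an unpenalized fit on the same data (a synthetic sketch, not module code):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# The penalized coefficient is pulled toward zero relative to OLS,
# so its magnitude understates the true effect size (3.0 here).
print(ols.coef_[0], lasso.coef_[0])
```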