stratified_cv_r2¶
R² calculation with stratified cross-validation for tfbpmodeling.
tfbpmodeling.stratified_cv_r2 ¶
stratified_cv_r2 ¶
stratified_cv_r2(
y,
X,
classes,
estimator=LinearRegression(fit_intercept=True),
skf=StratifiedKFold(
n_splits=4, shuffle=True, random_state=42
),
**kwargs
)
Calculate the average R² of an estimator across stratified cross-validation folds. By default, the estimator is LinearRegression(fit_intercept=True) and the splitter is a 4-fold StratifiedKFold with shuffle=True and random_state=42. Because fit_intercept is explicitly True, the model estimates its own intercept. X therefore does not need to be centered, and it should not contain a constant (all-ones) column.
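Parameters:

| Name | Description | Default |
|---|---|---|
| `y` | Response values, one per sample. | required |
| `X` | Predictor matrix with one row per sample. | required |
| `classes` | Class label for each sample, used to stratify the folds. | required |
| `estimator` | Regression estimator fit on each training fold. | `LinearRegression(fit_intercept=True)` |
| `skf` | Stratified cross-validation splitter. | `StratifiedKFold(n_splits=4, shuffle=True, random_state=42)` |
| `**kwargs` | Additional keyword arguments. | |

Returns:

| Description |
|---|
| The average R² across the cross-validation folds. |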
Source code in tfbpmodeling/stratified_cv_r2.py (lines 14-67).
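To make the fit_intercept=True remark concrete, the sketch below uses only scikit-learn and synthetic data invented for illustration. It fits LinearRegression to uncentered predictors, then refits after appending a column of ones to show that such a column adds nothing.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, uncentered data: y = 5 + 2*x1 - 3*x2 + small noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=2.0, size=(200, 2))
y = 5.0 + X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=200)

# fit_intercept=True estimates the intercept itself, so X need not be centered
model = LinearRegression(fit_intercept=True).fit(X, y)
print("intercept:", round(float(model.intercept_), 2), "coef:", np.round(model.coef_, 2))

# A constant column is redundant: the intercept already plays that role,
# so the predictions do not change when one is appended
X_const = np.column_stack([X, np.ones(len(X))])
model_const = LinearRegression(fit_intercept=True).fit(X_const, y)
print("same predictions:", np.allclose(model.predict(X), model_const.predict(X_const)))
```

The intercept and slopes are recovered without centering, and the extra column only makes the design matrix rank-deficient.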
Overview¶
The stratified_cv_r2 module provides a function for calculating R² with stratified cross-validation. Stratifying the folds keeps each class's share of every test fold close to its share of the full dataset, so the R² estimate depends less on how the data happen to be split and better reflects how the model generalizes across strata.
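The stratification property can be checked directly with scikit-learn's StratifiedKFold, using the same settings as the default splitter above. The class sizes here are hypothetical; with 60, 20 and 20 samples split four ways, every test fold receives exactly 15, 5 and 5 samples of classes 0, 1 and 2.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels: 60 samples in class 0, 20 in class 1, 20 in class 2
classes = np.repeat([0, 1, 2], [60, 20, 20])
X = np.zeros((len(classes), 1))  # placeholder; only the labels drive the split

# Same settings as the stratified_cv_r2 default splitter
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
fold_counts = []
for train_idx, test_idx in skf.split(X, classes):
    # Count how many test samples each class contributes to this fold
    fold_counts.append(np.bincount(classes[test_idx], minlength=3))
    print(fold_counts[-1])
```

A plain KFold with shuffling would give class counts that vary from fold to fold.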
Key Features¶
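- Stratified R² Calculation: averages R² over cross-validation folds whose class proportions match the full dataset
- Cross-Validation Integration: the estimator and the stratified splitter are configurable through the estimator and skf arguments
- Bootstrap Compatibility: can be applied to each bootstrap resample of the data to build a distribution of R² values
- More Stable Performance Metrics: stratification reduces fold-to-fold variation caused by uneven class representation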
Usage Examples¶
Basic R² Calculation¶
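```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import StratifiedKFold

from tfbpmodeling.stratified_cv_r2 import stratified_cv_r2

# predictor_data (n_samples x n_features), response_data (n_samples,) and
# strata_values (n_samples,) are placeholders for your own NumPy arrays.
# Bin the stratification variable into integer class labels.
classes = np.digitize(strata_values, bins=[0, 8, 12, np.inf])

# stratified_cv_r2 returns one averaged score, not per-fold scores
mean_r2 = stratified_cv_r2(
    y=response_data,
    X=predictor_data,
    classes=classes,
    estimator=LassoCV(),
    skf=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
print(f"Mean stratified CV R²: {mean_r2:.3f}")
```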
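How the bin edges [0, 8, 12, np.inf] in the example above become class labels is not spelled out on this page. Assuming half-open bins [0, 8), [8, 12) and [12, inf), NumPy's np.digitize produces integer labels suitable for the classes argument. This is an illustration of the binning step, not necessarily what tfbpmodeling does internally.

```python
import numpy as np

# Bin edges as in the example above; np.inf closes the last bin
bins = [0, 8, 12, np.inf]
values = np.array([0.0, 5.0, 8.0, 11.9, 12.0, 30.0])

# np.digitize returns i such that bins[i-1] <= value < bins[i]
labels = np.digitize(values, bins)
print(labels)  # [1 1 2 2 3 3]

# Values below the first edge fall outside every bin and get label 0
below = np.digitize([-1.0], bins)
```

Each left edge is inclusive, so a value of exactly 8 lands in the second bin.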
Bootstrap Integration¶
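```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import StratifiedKFold

from tfbpmodeling.stratified_cv_r2 import stratified_cv_r2

# predictor_data, response_data and classes (stratum label per row) are
# placeholder NumPy arrays
rng = np.random.default_rng(42)
n = len(response_data)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
bootstrap_r2 = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)  # resample rows with replacement
    bootstrap_r2.append(
        stratified_cv_r2(
            y=response_data[idx],
            X=predictor_data[idx],
            classes=classes[idx],
            estimator=LassoCV(),
            skf=skf,
        )
    )
# Percentile 95% confidence interval for R²
r2_ci = np.percentile(bootstrap_r2, [2.5, 97.5])
print(f"R² 95% CI: [{r2_ci[0]:.3f}, {r2_ci[1]:.3f}]")
```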
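A plain bootstrap draws rows without regard to class, so a small stratum can end up with fewer members than n_splits in some resamples; StratifiedKFold then warns, or raises if every class is too small. One common alternative is to resample within each class. The helper below is hypothetical and not part of tfbpmodeling; it returns row indices that keep every class at its original size.

```python
import numpy as np

def stratified_bootstrap_indices(classes, rng):
    """Resample row indices with replacement separately within each class.

    Hypothetical helper (not part of tfbpmodeling): every class keeps its
    original size, so a stratified K-fold split stays possible on each resample.
    """
    classes = np.asarray(classes)
    idx = []
    for label in np.unique(classes):
        members = np.flatnonzero(classes == label)
        idx.append(rng.choice(members, size=members.size, replace=True))
    return np.concatenate(idx)

rng = np.random.default_rng(0)
classes = np.repeat([1, 2, 3], [50, 30, 20])
boot_idx = stratified_bootstrap_indices(classes, rng)
print(np.bincount(classes[boot_idx]))  # class sizes are unchanged
```

The returned indices can replace idx in a resampling loop like the one above.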
Performance Metrics¶
Stratified R²¶
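R² is computed on each held-out fold of the stratified split, and the fold scores are averaged into the single value that stratified_cv_r2 returns. A scikit-learn sketch of that computation with the default settings, using placeholder NumPy arrays X, y and classes:
```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import StratifiedKFold
# X, y and classes are placeholder NumPy arrays
estimator = LinearRegression(fit_intercept=True)
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
fold_r2 = []
for train_idx, test_idx in skf.split(X, classes):
    model = clone(estimator).fit(X[train_idx], y[train_idx])
    fold_r2.append(r2_score(y[test_idx], model.predict(X[test_idx])))
mean_r2 = np.mean(fold_r2)
```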
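For reference, the cross-validated score described for stratified_cv_r2 can be written out. Here T_k is the set of held-out samples in fold k, ŷ_i is the prediction from the model fit on the remaining folds, ȳ_{T_k} is the mean response within T_k (the convention used by scikit-learn's r2_score), and K = 4 by default. Treating the "average" as an unweighted mean over folds is an assumption based on the description above.

$$
R^2_k = 1 - \frac{\sum_{i \in T_k} \left(y_i - \hat{y}_i\right)^2}{\sum_{i \in T_k} \left(y_i - \bar{y}_{T_k}\right)^2},
\qquad
\overline{R^2} = \frac{1}{K} \sum_{k=1}^{K} R^2_k
$$

Because each denominator uses the fold's own mean, a fold score can be negative when the model predicts worse than that mean.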
Weighted Aggregation¶
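When R² scores computed on groups of different sizes need to be combined into one number, each score can be weighted by the number of samples behind it:
```python
import numpy as np

# r2_scores: one R² per group; sample_counts: number of samples behind each score
weighted_r2 = np.average(r2_scores, weights=sample_counts)
```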
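A small worked example with made-up numbers shows how size weighting shifts the result relative to a plain mean. Because StratifiedKFold produces folds of nearly equal size, weighting changes little for per-fold scores; it matters when the groups differ substantially in size.

```python
import numpy as np

# Made-up per-group R² scores and the number of samples behind each
r2_scores = np.array([0.50, 0.70])
sample_counts = np.array([30, 10])

unweighted = r2_scores.mean()  # (0.50 + 0.70) / 2 = 0.60
weighted = np.average(r2_scores, weights=sample_counts)  # (0.50*30 + 0.70*10) / 40 = 0.55
print(f"unweighted={unweighted:.2f}, weighted={weighted:.2f}")
```

The larger group pulls the weighted value toward its own score.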
Related Modules¶
- stratified_cv: Stratified cross-validation
- bootstrap_model_results: Results aggregation
- interface: Workflow integration