Hyperparameter Sweep
This notebook shows how to perform a hyperparameter sweep with the Optuna library to find the best hyperparameters for our model. Feel free to modify the objective function if you would like to test other hyperparameters or values.
# imports
import argparse
from argparse import Namespace
from pytorch_lightning import Trainer, LightningModule, seed_everything
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import CSVLogger, TensorBoardLogger
from torchsummary import summary
from yeastdnnexplorer.data_loaders.synthetic_data_loader import SyntheticDataLoader
from yeastdnnexplorer.ml_models.simple_model import SimpleModel
from yeastdnnexplorer.ml_models.customizable_model import CustomizableModel
import optuna
import matplotlib.pyplot as plt
import seaborn as sns
# set random seed for reproducibility
seed_everything(42)
Seed set to 42
42
Here we define loggers and checkpoints for our model. Checkpoints tell PyTorch Lightning when to save instances of the model (which can be loaded and inspected later), and loggers tell it where and how to record the metrics the model logs during training.
# Checkpoint to save the best version of the model (across the entire training process) based on the metric passed to "monitor"
best_model_checkpoint = ModelCheckpoint(
    monitor="val_mse",  # you can modify this to save the best model based on any other metric that the model you're testing tracks and reports
    mode="min",
    filename="best-model-{epoch:02d}-{val_mse:.2f}",  # use the monitored metric in the filename; Lightning appends the .ckpt extension
    save_top_k=1,  # can modify this to save the top k models
)
# Callback to save checkpoints every 2 epochs, regardless of performance
periodic_checkpoint = ModelCheckpoint(
    filename="periodic-{epoch:02d}",
    every_n_epochs=2,
    save_top_k=-1,  # setting -1 saves all checkpoints
)
# define loggers for the model
tb_logger = TensorBoardLogger("logs/tensorboard_logs")
csv_logger = CSVLogger("logs/csv_logs")
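If you later want to reload one of the saved models for inspection, Lightning's load_from_checkpoint can restore it. Below is a minimal sketch; the checkpoint path is hypothetical (the real path depends on where the Trainer writes its checkpoints), and it assumes CustomizableModel saves its hyperparameters.

# Hypothetical example: reload a checkpoint produced by one of the callbacks above.
# The path shown is illustrative only; check your Trainer's default_root_dir for the real one.
reloaded_model = CustomizableModel.load_from_checkpoint(
    "lightning_logs/version_0/checkpoints/best-model-epoch=05-val_mse=0.43.ckpt"
)
reloaded_model.eval()  # switch to eval mode before inspecting predictions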
Now we perform our hyperparameter sweep using the Optuna library. To do this, we need to define an objective function that returns a scalar value. This scalar value will be the value that our sweep is attempting to minimize. We train one instance of our model inside each call to the objective function (each model on each iteration will use a different selection of hyperparameters). In our objective function, we return the validation mse associated with the instance of the model. This is because we would like to find the combination of hyperparameters that leads to the lowest validation mse. We use validation mse instead of test mse since we do not want to risk fitting to the test data at all while tuning hyperparameters.
If you'd like to try different hyperparameters, you just need to modify the list of possible values corresponding to the hyperparameter in question.
If you'd like to run the hyperparameter sweep on real data instead of synthetic data, simply swap out the synthetic data loader for the real data loader.
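Optuna can also sample from continuous or integer ranges instead of fixed lists. As a minimal sketch (the ranges below are illustrative, not tuned for this model), you could swap some of the categorical suggestions inside the objective function for range-based ones:

# Illustrative only: range-based search spaces instead of fixed categorical lists
lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)  # sample the learning rate on a log scale
dropout_rate = trial.suggest_float("dropout_rate", 0.0, 0.5)
hidden_layer_num = trial.suggest_int("hidden_layer_num", 1, 5)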
# on each call to the objective function, it will choose a hyperparameter value from each of the suggest_categorical arrays and pass them into the model
# this allows us to test many different hyperparameter configurations during our sweep
def objective(trial):
    # model hyperparameters
    lr = trial.suggest_categorical("lr", [0.01])
    hidden_layer_num = trial.suggest_categorical("hidden_layer_num", [1, 2, 3, 5])
    activation = trial.suggest_categorical(
        "activation", ["ReLU", "Sigmoid", "Tanh", "LeakyReLU"]
    )
    optimizer = trial.suggest_categorical("optimizer", ["Adam", "SGD", "RMSprop"])
    L2_regularization_term = trial.suggest_categorical(
        "L2_regularization_term", [0.0, 0.1]
    )
    dropout_rate = trial.suggest_categorical("dropout_rate", [0.0, 0.5])

    # data module hyperparameters
    batch_size = trial.suggest_categorical("batch_size", [32])

    # training hyperparameters
    max_epochs = trial.suggest_categorical("max_epochs", [1])  # default is 10

    # define what to pass in for the hidden layer sizes list based on the number of hidden layers
    hidden_layer_sizes_configurations = {
        1: [[64], [256]],
        2: [[64, 32], [256, 64]],
        3: [[256, 128, 32], [512, 256, 64]],
        5: [[512, 256, 128, 64, 32]],
    }
    hidden_layer_sizes = trial.suggest_categorical(
        f"hidden_layer_sizes_{hidden_layer_num}_layers",
        hidden_layer_sizes_configurations[hidden_layer_num],
    )

    print("=" * 70)
    print("About to create model with the following hyperparameters:")
    print(f"lr: {lr}")
    print(f"hidden_layer_num: {hidden_layer_num}")
    print(f"hidden_layer_sizes: {hidden_layer_sizes}")
    print(f"activation: {activation}")
    print(f"optimizer: {optimizer}")
    print(f"L2_regularization_term: {L2_regularization_term}")
    print(f"dropout_rate: {dropout_rate}")
    print(f"batch_size: {batch_size}")
    print(f"max_epochs: {max_epochs}")
    print("")

    # create data module
    data_module = SyntheticDataLoader(
        batch_size=batch_size,
        num_genes=4000,
        bound_mean=3.0,
        bound=[0.5] * 10,
        n_sample=[1, 2, 2, 4, 4],
        val_size=0.1,
        test_size=0.1,
        random_state=42,
        max_mean_adjustment=3.0,
    )
    num_tfs = sum(data_module.n_sample)  # sum of all n_sample is the number of TFs

    # create model
    model = CustomizableModel(
        input_dim=num_tfs,
        output_dim=num_tfs,
        lr=lr,
        hidden_layer_num=hidden_layer_num,
        hidden_layer_sizes=hidden_layer_sizes,
        activation=activation,
        optimizer=optimizer,
        L2_regularization_term=L2_regularization_term,
        dropout_rate=dropout_rate,
    )

    # create trainer
    trainer = Trainer(
        max_epochs=max_epochs,
        deterministic=True,
        accelerator="cpu",
        # callbacks and loggers are commented out for now since running a large sweep
        # would generate an unnecessarily huge amount of checkpoints and logs
        # callbacks=[best_model_checkpoint, periodic_checkpoint],
        # logger=[tb_logger, csv_logger],
    )

    # train model
    trainer.fit(model, data_module)

    # return the validation mse from this trial (the value Optuna will minimize)
    return trainer.callback_metrics["val_mse"]
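One caveat visible in the output further below: passing Python lists as categorical choices (as we do for hidden_layer_sizes) triggers an Optuna UserWarning about persistent storage. A minimal sketch of one way around it, assuming the same configurations dict, is to suggest an integer index and look the sizes up yourself:

# Sketch: suggest an index into the size options instead of the list itself, so the
# stored choices are plain integers (safe for persistent storage)
size_options = hidden_layer_sizes_configurations[hidden_layer_num]
size_idx = trial.suggest_int(
    f"hidden_layer_sizes_idx_{hidden_layer_num}_layers", 0, len(size_options) - 1
)
hidden_layer_sizes = size_options[size_idx]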
Now we define an Optuna study, which represents our hyperparameter sweep. It will run the objective function n_trials times and keep track of the trial that produced the lowest val_mse across all of those hyperparameter combinations. Note that this cell produces a large amount of output, since training stats are shown for every model; this is why we print the best params and loss in a separate cell.
STUDY_NAME = "CustomizableModelHyperparameterSweep3"
NUM_TRIALS = 5  # you will need many more than 5 trials if you have many possible combinations of hyperparams

# Perform hyperparameter optimization using Optuna
study = optuna.create_study(
    direction="minimize",  # we want to minimize the val_mse
    study_name=STUDY_NAME,
    # storage="sqlite:///db.sqlite3",  # you can save the study results in a database if you'd like; this is needed if you want to use the optuna dashboard library to display results
)
study.optimize(objective, n_trials=NUM_TRIALS)

# Get the best hyperparameters and their corresponding values
best_params = study.best_params
best_loss = study.best_value
[I 2024-05-29 13:18:03,548] A new study created in memory with name: CustomizableModelHyperparameterSweep3 /Users/ericjia/Library/Caches/pypoetry/virtualenvs/yeastdnnexplorer-iu4_cpc2-py3.11/lib/python3.11/site-packages/optuna/distributions.py:524: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [64] which is of type list. warnings.warn(message) /Users/ericjia/Library/Caches/pypoetry/virtualenvs/yeastdnnexplorer-iu4_cpc2-py3.11/lib/python3.11/site-packages/optuna/distributions.py:524: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [256] which is of type list. warnings.warn(message)
====================================================================== About to create model with the following hyperparameters: lr: 0.01 hidden_layer_num: 1 hidden_layer_sizes: [256] activation: Tanh optimizer: RMSprop L2_regularization_term: 0.1 dropout_rate: 0.5 batch_size: 32 max_epochs: 1
GPU available: False, used: False TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs /Users/ericjia/yeastdnnexplorer/yeastdnnexplorer/data_loaders/synthetic_data_loader.py:260: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). X_train, Y_train = torch.tensor(X_train, dtype=torch.float32), torch.tensor( /Users/ericjia/yeastdnnexplorer/yeastdnnexplorer/data_loaders/synthetic_data_loader.py:263: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). X_val, Y_val = torch.tensor(X_val, dtype=torch.float32), torch.tensor( /Users/ericjia/yeastdnnexplorer/yeastdnnexplorer/data_loaders/synthetic_data_loader.py:266: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). X_test, Y_test = torch.tensor(X_test, dtype=torch.float32), torch.tensor( | Name | Type | Params ---------------------------------------------------- 0 | activation | Tanh | 0 1 | input_layer | Linear | 3.6 K 2 | hidden_layers | ModuleList | 0 3 | output_layer | Linear | 3.3 K 4 | dropout | Dropout | 0 5 | mae | MeanAbsoluteError | 0 6 | SMSE | SMSE | 0 ---------------------------------------------------- 6.9 K Trainable params 0 Non-trainable params 6.9 K Total params 0.028 Total estimated model params size (MB)
Sanity Checking: | …
/Users/ericjia/Library/Caches/pypoetry/virtualenvs/yeastdnnexplorer-iu4_cpc2-py3.11/lib/python3.11/site-packages/torch/utils/data/dataloader.py:558: UserWarning: This DataLoader will create 15 worker processes in total. Our suggested max number of worker in current system is 8 (`cpuset` is not taken into account), which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary. warnings.warn(_create_warning_msg(
Training: | …
Validation: | …
`Trainer.fit` stopped: `max_epochs=1` reached. [I 2024-05-29 13:18:26,417] Trial 0 finished with value: 4.489274501800537 and parameters: {'lr': 0.01, 'hidden_layer_num': 1, 'activation': 'Tanh', 'optimizer': 'RMSprop', 'L2_regularization_term': 0.1, 'dropout_rate': 0.5, 'batch_size': 32, 'max_epochs': 1, 'hidden_layer_sizes_1_layers': [256]}. Best is trial 0 with value: 4.489274501800537. /Users/ericjia/Library/Caches/pypoetry/virtualenvs/yeastdnnexplorer-iu4_cpc2-py3.11/lib/python3.11/site-packages/optuna/distributions.py:524: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [64] which is of type list. warnings.warn(message) /Users/ericjia/Library/Caches/pypoetry/virtualenvs/yeastdnnexplorer-iu4_cpc2-py3.11/lib/python3.11/site-packages/optuna/distributions.py:524: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [256] which is of type list. warnings.warn(message) GPU available: False, used: False TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs
====================================================================== About to create model with the following hyperparameters: lr: 0.01 hidden_layer_num: 1 hidden_layer_sizes: [256] activation: LeakyReLU optimizer: SGD L2_regularization_term: 0.1 dropout_rate: 0.5 batch_size: 32 max_epochs: 1
| Name | Type | Params ---------------------------------------------------- 0 | activation | LeakyReLU | 0 1 | input_layer | Linear | 3.6 K 2 | hidden_layers | ModuleList | 0 3 | output_layer | Linear | 3.3 K 4 | dropout | Dropout | 0 5 | mae | MeanAbsoluteError | 0 6 | SMSE | SMSE | 0 ---------------------------------------------------- 6.9 K Trainable params 0 Non-trainable params 6.9 K Total params 0.028 Total estimated model params size (MB)
Sanity Checking: | …
Training: | …
Validation: | …
`Trainer.fit` stopped: `max_epochs=1` reached. [I 2024-05-29 13:18:45,320] Trial 1 finished with value: 6.033911228179932 and parameters: {'lr': 0.01, 'hidden_layer_num': 1, 'activation': 'LeakyReLU', 'optimizer': 'SGD', 'L2_regularization_term': 0.1, 'dropout_rate': 0.5, 'batch_size': 32, 'max_epochs': 1, 'hidden_layer_sizes_1_layers': [256]}. Best is trial 0 with value: 4.489274501800537. /Users/ericjia/Library/Caches/pypoetry/virtualenvs/yeastdnnexplorer-iu4_cpc2-py3.11/lib/python3.11/site-packages/optuna/distributions.py:524: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [64, 32] which is of type list. warnings.warn(message) /Users/ericjia/Library/Caches/pypoetry/virtualenvs/yeastdnnexplorer-iu4_cpc2-py3.11/lib/python3.11/site-packages/optuna/distributions.py:524: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [256, 64] which is of type list. warnings.warn(message) GPU available: False, used: False TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs
====================================================================== About to create model with the following hyperparameters: lr: 0.01 hidden_layer_num: 2 hidden_layer_sizes: [256, 64] activation: ReLU optimizer: SGD L2_regularization_term: 0.0 dropout_rate: 0.5 batch_size: 32 max_epochs: 1
| Name | Type | Params ---------------------------------------------------- 0 | activation | ReLU | 0 1 | input_layer | Linear | 3.6 K 2 | hidden_layers | ModuleList | 16.4 K 3 | output_layer | Linear | 845 4 | dropout | Dropout | 0 5 | mae | MeanAbsoluteError | 0 6 | SMSE | SMSE | 0 ---------------------------------------------------- 20.9 K Trainable params 0 Non-trainable params 20.9 K Total params 0.084 Total estimated model params size (MB)
Sanity Checking: | …
Training: | …
Validation: | …
`Trainer.fit` stopped: `max_epochs=1` reached. [I 2024-05-29 13:19:02,993] Trial 2 finished with value: 6.900921821594238 and parameters: {'lr': 0.01, 'hidden_layer_num': 2, 'activation': 'ReLU', 'optimizer': 'SGD', 'L2_regularization_term': 0.0, 'dropout_rate': 0.5, 'batch_size': 32, 'max_epochs': 1, 'hidden_layer_sizes_2_layers': [256, 64]}. Best is trial 0 with value: 4.489274501800537. GPU available: False, used: False TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs
====================================================================== About to create model with the following hyperparameters: lr: 0.01 hidden_layer_num: 2 hidden_layer_sizes: [64, 32] activation: Tanh optimizer: Adam L2_regularization_term: 0.1 dropout_rate: 0.0 batch_size: 32 max_epochs: 1
| Name | Type | Params ---------------------------------------------------- 0 | activation | Tanh | 0 1 | input_layer | Linear | 896 2 | hidden_layers | ModuleList | 2.1 K 3 | output_layer | Linear | 429 4 | dropout | Dropout | 0 5 | mae | MeanAbsoluteError | 0 6 | SMSE | SMSE | 0 ---------------------------------------------------- 3.4 K Trainable params 0 Non-trainable params 3.4 K Total params 0.014 Total estimated model params size (MB)
Sanity Checking: | …
Training: | …
Validation: | …
`Trainer.fit` stopped: `max_epochs=1` reached. [I 2024-05-29 13:19:19,976] Trial 3 finished with value: 4.5260910987854 and parameters: {'lr': 0.01, 'hidden_layer_num': 2, 'activation': 'Tanh', 'optimizer': 'Adam', 'L2_regularization_term': 0.1, 'dropout_rate': 0.0, 'batch_size': 32, 'max_epochs': 1, 'hidden_layer_sizes_2_layers': [64, 32]}. Best is trial 0 with value: 4.489274501800537. /Users/ericjia/Library/Caches/pypoetry/virtualenvs/yeastdnnexplorer-iu4_cpc2-py3.11/lib/python3.11/site-packages/optuna/distributions.py:524: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [512, 256, 128, 64, 32] which is of type list. warnings.warn(message) GPU available: False, used: False TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs
====================================================================== About to create model with the following hyperparameters: lr: 0.01 hidden_layer_num: 5 hidden_layer_sizes: [512, 256, 128, 64, 32] activation: Tanh optimizer: RMSprop L2_regularization_term: 0.1 dropout_rate: 0.5 batch_size: 32 max_epochs: 1
| Name | Type | Params ---------------------------------------------------- 0 | activation | Tanh | 0 1 | input_layer | Linear | 7.2 K 2 | hidden_layers | ModuleList | 174 K 3 | output_layer | Linear | 429 4 | dropout | Dropout | 0 5 | mae | MeanAbsoluteError | 0 6 | SMSE | SMSE | 0 ---------------------------------------------------- 182 K Trainable params 0 Non-trainable params 182 K Total params 0.729 Total estimated model params size (MB)
Sanity Checking: | …
Training: | …
Validation: | …
`Trainer.fit` stopped: `max_epochs=1` reached. [I 2024-05-29 13:19:37,861] Trial 4 finished with value: 4.612905502319336 and parameters: {'lr': 0.01, 'hidden_layer_num': 5, 'activation': 'Tanh', 'optimizer': 'RMSprop', 'L2_regularization_term': 0.1, 'dropout_rate': 0.5, 'batch_size': 32, 'max_epochs': 1, 'hidden_layer_sizes_5_layers': [512, 256, 128, 64, 32]}. Best is trial 0 with value: 4.489274501800537.
Print out the best hyperparameters and the val_mse associated with the best model.
print("RESULTS" + ("=" * 70))
print(f"Best hyperparameters: {best_params}")
print(f"Best loss: {best_loss}")
RESULTS====================================================================== Best hyperparameters: {'lr': 0.01, 'hidden_layer_num': 1, 'activation': 'Tanh', 'optimizer': 'RMSprop', 'L2_regularization_term': 0.1, 'dropout_rate': 0.5, 'batch_size': 32, 'max_epochs': 1, 'hidden_layer_sizes_1_layers': [256]} Best loss: 4.489274501800537
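If you want a quick look at all of the trials rather than just the best one, study.trials_dataframe() returns a pandas DataFrame that you can plot with the seaborn and matplotlib imports from earlier. A minimal sketch (Optuna prefixes hyperparameter columns with params_, and the exact columns depend on which hyperparameters you suggested):

# Sketch: summarize every trial and plot val_mse (the "value" column) by optimizer
trials_df = study.trials_dataframe()
print(trials_df[["number", "value", "params_optimizer", "params_activation"]])

sns.stripplot(data=trials_df, x="params_optimizer", y="value")
plt.ylabel("val_mse")
plt.show()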
And that's it! Now you could take what you found to be the best hyperparameters and train a model with them for many more epochs. The Optuna documentation will be a helpful resource if you'd like to add more to this notebook or the hyperparameter sweep functions.
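As a final sketch (not part of the original notebook's cells), retraining with the best hyperparameters might look roughly like this; the 50-epoch count and the way hidden_layer_sizes is pulled back out of best_params are assumptions:

# Sketch: retrain a model with the best hyperparameters for more epochs
final_data_module = SyntheticDataLoader(
    batch_size=best_params["batch_size"],
    num_genes=4000,
    bound_mean=3.0,
    bound=[0.5] * 10,
    n_sample=[1, 2, 2, 4, 4],
    val_size=0.1,
    test_size=0.1,
    random_state=42,
    max_mean_adjustment=3.0,
)
num_tfs = sum(final_data_module.n_sample)

# the hidden layer sizes were stored under a per-layer-count key during the sweep above
hidden_key = f"hidden_layer_sizes_{best_params['hidden_layer_num']}_layers"

final_model = CustomizableModel(
    input_dim=num_tfs,
    output_dim=num_tfs,
    lr=best_params["lr"],
    hidden_layer_num=best_params["hidden_layer_num"],
    hidden_layer_sizes=best_params[hidden_key],
    activation=best_params["activation"],
    optimizer=best_params["optimizer"],
    L2_regularization_term=best_params["L2_regularization_term"],
    dropout_rate=best_params["dropout_rate"],
)

final_trainer = Trainer(
    max_epochs=50,  # arbitrary choice; adjust as needed
    deterministic=True,
    accelerator="cpu",
    callbacks=[best_model_checkpoint, periodic_checkpoint],
    logger=[tb_logger, csv_logger],
)
final_trainer.fit(final_model, final_data_module)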