In the previous post, we built a single-layer neural network that showed an impressive test accuracy of 98.2%. That said, we relied on a handful of default choices: we trained on a cross-entropy loss, we used 50 epochs, we used the 'Adam' optimizer, and we didn't set a batch size. If we got such impressive results out of the box, can we do better? And if so, how?
Enter hyperparameter tuning. Hyperparameter tuning is the process of searching through combinations of neural net hyperparameters to find the ones that perform the best. There are a number of strategies to do this efficiently; here, we use scikit-learn's GridSearchCV to search exhaustively over a grid of hyperparameters, scoring each combination with k-fold cross-validation (5 or 10 folds, depending on the search).
This post was interesting because it begins to probe the limits of ChatGPT4, at least insofar as my prompting is concerned. Read on to see where ChatGPT fails in this notebook.
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rc
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
rc('text', usetex=True)
rc('text.latex', preamble=r'\usepackage{cmbright}')
rc('font', **{'family': 'sans-serif', 'sans-serif': ['Helvetica']})
%matplotlib inline
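# Render inline figures as PNG and retina (high-DPI) images.
%config InlineBackend.figure_formats = {'png', 'retina'}
# Named rc_params so that it does not shadow matplotlib's rc() imported above.
rc_params = {'lines.linewidth': 2,
             'axes.labelsize': 18,
             'axes.titlesize': 18,
             'axes.facecolor': 'DFDFE5'}
sns.set_context('notebook', rc=rc_params)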
sns.set_style("dark")
mpl.rcParams['xtick.labelsize'] = 16
mpl.rcParams['ytick.labelsize'] = 16
mpl.rcParams['legend.fontsize'] = 14
Reload the data we used previously:
# chatgpt suggested:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data"
data = pd.read_csv(url, header=None)
# The first column is an ID (we can ignore this), the second column is the label (M = malignant, B = benign),
# and the rest are features
labels = data.iloc[:, 1]
features = data.iloc[:, 2:]
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)
le = LabelEncoder()
labels_encoded = le.fit_transform(labels)
features_train, features_test, labels_train, labels_test = train_test_split(
    features_scaled, labels_encoded, test_size=0.2, random_state=42)
I gave ChatGPT the minimal code from the previous post and asked it to help me write a hyperparameter tuning script. This is what it came up with:
# ChatGPT suggested the following import, but it is deprecated:
#from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
# the right import is now:
from scikeras.wrappers import KerasClassifier
# from here, all chatgpt except choice of hyperparams, which is mine:
from sklearn.model_selection import GridSearchCV
def create_model(optimizer='adam', loss='binary_crossentropy'):
    m = Sequential()
    m.add(Dense(1, activation='sigmoid', input_shape=(features_train.shape[1],)))
    m.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
    return m
# hyperparameters to tune over
param_grid = {
    'optimizer': ['SGD', 'RMSprop', 'Adam'],
    'epochs': [10, 20, 30],
    'batch_size': [10, 20, 30],
}
# do the grid search
model = KerasClassifier(model=create_model, verbose=0)
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=10)
grid_result = grid.fit(features_train, labels_train)
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
Best: 0.9757487922705315 using {'batch_size': 20, 'epochs': 30, 'optimizer': 'SGD'}
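To get a feel for how much work this search does, here is a minimal sketch in plain Python that enumerates the same grid. It assumes, as in scikit-learn's GridSearchCV, that every combination of values is trained and scored once per fold; the names `candidates` and `n_fits` are only illustrative.

```python
from itertools import product

# Same grid as in the search above.
param_grid = {
    'optimizer': ['SGD', 'RMSprop', 'Adam'],
    'epochs': [10, 20, 30],
    'batch_size': [10, 20, 30],
}
n_folds = 10  # cv=10 in the GridSearchCV call

# Every combination of values is one candidate configuration.
candidates = [dict(zip(param_grid, values))
              for values in product(*param_grid.values())]

# Each candidate is trained and scored once per fold.
n_fits = len(candidates) * n_folds

print(f"{len(candidates)} candidates x {n_folds} folds = {n_fits} fits")
# prints "27 candidates x 10 folds = 270 fits"
```

Adding a fourth hyperparameter with three values would triple this count, which is why grid search becomes expensive quickly.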
Next, I asked it to fit the best model to the data, given the hyperparameter tuning results:
# Extract the best parameters
best_params = grid_result.best_params_
# Train the model with the best parameters
model = create_model(optimizer=best_params['optimizer'])
history = model.fit(features_train, labels_train,
                    epochs=best_params['epochs'],
                    batch_size=best_params['batch_size'],
                    verbose=0)
# Plot training accuracy
plt.figure(figsize=(12, 6))
plt.plot(history.history['accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper left')
plt.show()
# Plot training loss
plt.figure(figsize=(12, 6))
plt.plot(history.history['loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper left')
plt.show()
loss, accuracy = model.evaluate(features_test, labels_test, verbose=0)
print(f"Test loss: {loss}")
print(f"Test accuracy: {accuracy}")
Test loss: 0.10274569690227509
Test accuracy: 0.9824561476707458
Wow! We went from 94% accuracy to 98.2% accuracy, just by tuning the model a tiny bit! That is seriously cool--and ChatGPT did most of the work!
One of the things I was wondering about was whether we could also tune the choice of loss function, and whether a multi-layer NN might outperform our very simple single-layer network. To study this, I modified the param_grid dictionary, and I rewrote the create_model function (with Chat's help) to have a variable number of ReLU layers. Then, I asked Chat to run the hyperparameter search on this new function.
def create_model(optimizer='adam', loss='binary_crossentropy', num_layers=1):
    model = Sequential()
    # add relu layers
    for _ in range(num_layers):
        model.add(Dense(10, activation='relu'))
    # final layer for classification:
    model.add(Dense(1, activation='sigmoid'))
    # compile
    model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
    return model
param_grid = {
    'optimizer': ['SGD'],
    'loss': ['binary_crossentropy', 'hinge'],
    'epochs': [50, 100, 150],
    'batch_size': [10, 20],
    'num_layers': [1, 2]
}
model = KerasClassifier(model=create_model, verbose=0)
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)
grid_result = grid.fit(features_train, labels_train)
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 20
     19 grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)
---> 20 grid_result = grid.fit(features_train, labels_train)
(...)
File ~/opt/anaconda3/envs/gene_expression_env/lib/python3.9/site-packages/scikeras/wrappers.py:1168, in BaseWrapper.set_params(self, **params)
-> 1168 raise ValueError(
(...)
ValueError: Invalid parameter num_layers for estimator KerasClassifier.
This issue can likely be resolved by setting this parameter in the KerasClassifier constructor:
`KerasClassifier(num_layers=1)`
Check the list of available parameters with `estimator.get_params().keys()`
And it failed! So I gave the error message to ChatGPT, and it was completely unable to fix the problem. To me, that is surprising: the error message itself suggests a solution that in fact works. Simply pasting the error message and asking ChatGPT to fix the code (prompt: Please fix the code that is giving this error message) led to increasingly worse solutions.
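For reference, the fix that the error message points to can be sketched as follows. This is a minimal sketch, not re-run against the exact package versions used in this notebook: `build_search` is a hypothetical helper name, and the grid is deliberately reduced to `epochs`, `batch_size`, and `num_layers` so that it only illustrates the `num_layers` fix.

```python
# Hypothetical grid for this sketch: only epochs, batch_size and num_layers are tuned.
param_grid = {
    'epochs': [50, 100, 150],
    'batch_size': [10, 20],
    'num_layers': [1, 2],
}


def build_search(create_model, cv=5):
    """Wrap `create_model` so that GridSearchCV can tune `num_layers`."""
    # Imported here so the grid above can be inspected without TensorFlow installed.
    from scikeras.wrappers import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    # Passing num_layers=1 to the constructor, as the error message suggests,
    # registers it as a parameter of the estimator, so that
    # set_params(num_layers=...) no longer raises during the search.
    model = KerasClassifier(model=create_model, num_layers=1, verbose=0)
    return GridSearchCV(estimator=model, param_grid=param_grid, cv=cv)


# Usage, with create_model, features_train and labels_train defined as above:
# grid_result = build_search(create_model).fit(features_train, labels_train)
```

The value given in the constructor only acts as a default; the grid search overrides it for each candidate.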
Eventually, I remembered that zero-shot prompts that ask these LLMs to reason about their logic and enumerate steps frequently perform better. So, I prompted Chat to "Please reason about the error message by breaking it into pieces. Then suggest a solution based on this analysis of the error message", and it output the following (very hacky) code:
from sklearn.model_selection import GridSearchCV
def create_model_func(num_layers=1):
    """Return a `create_model` function with the number of hidden layers fixed to `num_layers`."""
    def create_model(optimizer='adam', loss='binary_crossentropy'):
        model = Sequential()
        # here chat made a mistake: when adding layers, it's important to specify the
        # number of inputs into each layer
        # add layers. the architecture for this network goes from
        # M features --> 10 features with relu activation --> 10 .... --> 1 sigmoid node
        for i in range(num_layers):
            if i == 0:
                model.add(Dense(10, activation='relu', input_shape=(features_train.shape[1],)))
            else:
                model.add(Dense(10, activation='relu', input_shape=(10,)))
        # the output layer sees the raw features only when there are no hidden layers
        if num_layers == 0:
            model.add(Dense(1, activation='sigmoid', input_shape=(features_train.shape[1],)))
        else:
            model.add(Dense(1, activation='sigmoid', input_shape=(10,)))
        model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
        return model
    # return a function with the number of layers pre-specified
    return create_model
param_grid = {
    'optimizer': ['SGD', 'Adam'],
    'loss': ['binary_crossentropy', 'hinge'],
    'epochs': [20, 40, 60],
    'batch_size': [10, 20, 30],
}
models = []
for num_layers in [1, 2]:
    model = KerasClassifier(model=create_model_func(num_layers), verbose=0)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=10)
    grid_result = grid.fit(features_train, labels_train)
    print(f"Best for {num_layers} layers: {grid_result.best_score_} using {grid_result.best_params_}")
    models.append(grid_result)
FitFailedWarning: 180 fits failed out of a total of 360. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_score='raise'.
180 fits failed with the following error:
ValueError: loss=hinge but model compiled with binary_crossentropy. Data may not match loss function!
UserWarning: One or more of the test scores are non-finite: [0.97130435 0.96917874 nan nan 0.98024155 0.98246377 nan nan 0.97357488 0.96917874 nan nan 0.9626087 0.97806763 nan nan 0.97140097 0.97144928 nan nan 0.9757971 0.97362319 nan nan 0.96483092 0.94942029 nan nan 0.97362319 0.97135266 nan nan 0.97362319 0.9736715 nan nan]
Best for 1 layers: 0.9824637681159419 using {'batch_size': 10, 'epochs': 40, 'loss': 'binary_crossentropy', 'optimizer': 'Adam'}
FitFailedWarning: 180 fits failed out of a total of 360. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_score='raise'.
180 fits failed with the following error:
ValueError: loss=hinge but model compiled with binary_crossentropy. Data may not match loss function!
UserWarning: One or more of the test scores are non-finite: [0.97801932 0.9736715 nan nan 0.97352657 0.97797101 nan nan 0.97806763 0.97149758 nan nan 0.97144928 0.96913043 nan nan 0.97570048 0.96483092 nan nan 0.96700483 0.97352657 nan nan 0.9736715 0.96048309 nan nan 0.9647343 0.97801932 nan nan 0.97574879 0.96690821 nan nan]
Best for 2 layers: 0.9780676328502416 using {'batch_size': 10, 'epochs': 60, 'loss': 'binary_crossentropy', 'optimizer': 'SGD'}
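According to this search, it would be best to use a single hidden layer (a mean cross-validated accuracy of 0.982, versus 0.978 for two layers)! Note, however, that every fit with the hinge loss failed, so in practice only binary cross-entropy was compared.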
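To put the warnings and the two "Best" scores in perspective, here is a small back-of-the-envelope sketch. It assumes 455 training samples (80% of the 569 rows in wdbc.data, given test_size=0.2); the variable names are only illustrative.

```python
from math import prod

# Grid searched for each value of num_layers above.
param_grid = {
    'optimizer': ['SGD', 'Adam'],
    'loss': ['binary_crossentropy', 'hinge'],
    'epochs': [20, 40, 60],
    'batch_size': [10, 20, 30],
}
n_folds = 10  # cv=10

# Number of candidates is the product of the number of values per hyperparameter.
n_candidates = prod(len(values) for values in param_grid.values())
n_fits = n_candidates * n_folds

# One of the two loss values is 'hinge', so half of all fits use it.
n_hinge_fits = n_fits // len(param_grid['loss'])

# Best mean cross-validated accuracies reported above.
best_one_layer = 0.9824637681159419
best_two_layers = 0.9780676328502416

# Assumption: 455 training samples (569 rows, 20% held out for testing).
n_train = 455
gap = best_one_layer - best_two_layers
gap_in_samples = gap * n_train

print(f"{n_hinge_fits} of {n_fits} fits used the hinge loss")
print(f"gap: {gap:.4f} accuracy, roughly {gap_in_samples:.0f} training samples")
```

The first number matches the "180 fits failed out of a total of 360" warning, and the second suggests that the difference between one and two hidden layers amounts to about two training samples, so the preference for a single layer is a weak one.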