If you’re familiar with machine learning, you know that the training process allows the model to learn the optimal values for the parameters—or model coefficients—that characterize it. But machine learning models also have a set of hyperparameters whose values you should specify when training the model. So how do you find the optimal values for these hyperparameters?
You can use hyperparameter tuning to find the best values for the hyperparameters. By systematically adjusting hyperparameters, you can optimize your models to achieve the best possible results.
This tutorial provides practical tips for effective hyperparameter tuning—starting from building a baseline model to using advanced techniques like Bayesian optimization. Whether you’re new to hyperparameter tuning or looking to refine your approach, these tips will help you build better machine learning models. Let’s get started.
1. Start Simple: Train a Baseline Model Without Any Tuning
When beginning the process of hyperparameter tuning, it’s good to start simple by training a baseline model without any tuning. This initial model serves as a reference point to measure the impact of subsequent tuning efforts.
Here’s why this step is essential and how to execute it effectively:
- A baseline model provides a benchmark to compare tuned models against. This helps in quantifying the improvements achieved through hyperparameter tuning.
- Select a default model: Choose a model that fits the problem at hand. For example, a decision tree for a classification problem or a linear regression for a regression problem.
- Use default hyperparameters: Train the model using the default hyperparameters provided by the library. For instance, if using scikit-learn, instantiate the model without specifying any parameters.
Assess the performance of the baseline model using appropriate metrics. This step involves splitting the data into training and testing sets, training the model, making predictions, and evaluating the results:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=25)

# Initialize model with default parameters
model = DecisionTreeClassifier()

# Train model
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
baseline_accuracy = accuracy_score(y_test, y_pred)
print(f'Baseline Accuracy: {baseline_accuracy:.2f}')
Document the performance metrics of the baseline model. This will be useful for comparison as you proceed with hyperparameter tuning.
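One lightweight way to document the baseline is to store its metrics in a small dictionary that later experiments can append to. The sketch below is only an illustration: the `results` dictionary, the extra macro F1 metric, and the fixed `random_state=0` on the tree are assumptions added here, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data and split as the baseline example
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=25
)

# random_state=0 is an assumption added so repeated runs give the same tree
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Hypothetical results log: one entry per experiment
results = {
    "baseline": {
        "params": model.get_params(),
        "accuracy": accuracy_score(y_test, y_pred),
        "macro_f1": f1_score(y_test, y_pred, average="macro"),
    }
}

for name, entry in results.items():
    print(f"{name}: accuracy={entry['accuracy']:.2f}, macro_f1={entry['macro_f1']:.2f}")
```

Storing the full parameter dictionary alongside the scores makes it easy to see later exactly which defaults the baseline used.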
2. Use Hyperparameter Search with Cross-Validation
Once you have established a baseline model, the next step is to optimize the model’s performance through hyperparameter tuning. Utilizing hyperparameter search techniques with cross-validation is a robust approach to finding the best set of hyperparameters.
Why use hyperparameter search with cross-validation?
- Cross-validation provides a more reliable estimate of model performance by averaging results across multiple folds, reducing the risk of overfitting to a particular train-test split.
- Hyperparameter search methods like Grid Search and Random Search allow for systematic exploration of the hyperparameter space, ensuring a thorough evaluation of potential configurations.
- This method helps in selecting hyperparameters that generalize well to unseen data, leading to better model performance in production.
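To see why averaging over folds matters, you can look at the per-fold scores directly. This is a minimal sketch using scikit-learn's `cross_val_score` on the same iris split; the fixed `random_state=0` on the tree is an assumption added for repeatability.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=25
)

# random_state=0 is an assumption added so the tree is deterministic
model = DecisionTreeClassifier(random_state=0)

# One accuracy value per fold; the test set is not touched here
fold_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

print("Per-fold accuracy:", fold_scores.round(2))
print(f"Mean: {fold_scores.mean():.2f}, std: {fold_scores.std():.2f}")
```

The spread across folds gives a rough sense of how much a single train-test split could mislead you.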
Choose a search technique: Select a hyperparameter search method. The two most common strategies are:
- Grid search, which performs an exhaustive search over a parameter grid
- Randomized search, which randomly samples parameters from specified distributions
Define hyperparameter grid: Specify the hyperparameters and their respective ranges or distributions to search over.
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

# Load data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=25)

# Initialize model
model = DecisionTreeClassifier()

# Define hyperparameter grid for Grid Search
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}
Use cross-validation: Instead of running cross-validation separately (for example, with cross_val_score), pass the cv argument to the search object. GridSearchCV then scores every candidate configuration with the specified cross-validation scheme, here 5-fold.
# Grid Search
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
best_params_grid = grid_search.best_params_
best_score_grid = grid_search.best_score_

print(f'Best Parameters (Grid Search): {best_params_grid}')
print(f'Best Cross-Validation Score (Grid Search): {best_score_grid:.2f}')
Using hyperparameter tuning with cross-validation this way ensures more reliable performance estimates and improved model generalization.
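Before launching a grid search, it helps to know how many model fits it will cost. The sketch below counts the candidates in the grid above with scikit-learn's `ParameterGrid` (2 × 4 × 3 = 24 combinations, so 120 fits with 5 folds, plus one final refit), then scores the refitted best model on the held-out test set. The `cv_folds` variable and the fixed `random_state=0` are assumptions added for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=25
)

param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
}

# Count the work up front: one fit per candidate per fold
cv_folds = 5
n_candidates = len(ParameterGrid(param_grid))
print(f"Candidates: {n_candidates}, model fits: {n_candidates * cv_folds}")

# random_state=0 is an assumption added so results are repeatable
grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=cv_folds, scoring="accuracy"
)
grid_search.fit(X_train, y_train)

# With the default refit=True, score() uses the best model refit on all of X_train
test_accuracy = grid_search.score(X_test, y_test)
print(f"Test accuracy of best grid-search model: {test_accuracy:.2f}")
```

Because the number of fits grows multiplicatively with each added hyperparameter, this count is a quick check on whether an exhaustive search is affordable.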
3. Use Randomized Search for Initial Exploration
When starting hyperparameter tuning, it’s often beneficial to use randomized search for initial exploration. Randomized search provides a more efficient way to explore a wide range of hyperparameters compared to grid search, especially when dealing with high-dimensional hyperparameter spaces.
Define hyperparameter distribution: Specify the hyperparameters and their respective distributions from which to sample.
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
import numpy as np

# Load data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Initialize model
model = DecisionTreeClassifier()

# Define hyperparameter distribution for Random Search
param_dist = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None] + list(range(10, 31)),
    'min_samples_split': range(2, 11),
    'min_samples_leaf': range(1, 11)
}
Set up randomized search with cross-validation: Use randomized search with cross-validation to explore the hyperparameter space.
# Random Search
random_search = RandomizedSearchCV(model, param_dist, n_iter=100, cv=5, scoring='accuracy')
random_search.fit(X_train, y_train)
best_params_random = random_search.best_params_
best_score_random = random_search.best_score_

print(f'Best Parameters (Random Search): {best_params_random}')
print(f'Best Cross-Validation Score (Random Search): {best_score_random:.2f}')
Evaluate the model: Train the model using the best hyperparameters and evaluate its performance on the test set.
best_model = DecisionTreeClassifier(**best_params_random)
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_test)
final_accuracy = accuracy_score(y_test, y_pred)

print(f'Final Model Accuracy: {final_accuracy:.2f}')
Randomized search is, therefore, better suited for high-dimensional hyperparameter spaces and computationally expensive models.
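The example above samples from lists and ranges, but randomized search also accepts distribution objects that expose an `rvs` method. The sketch below swaps in `scipy.stats.randint` (an assumption, not used in the original example) and draws a few candidates with scikit-learn's `ParameterSampler` so you can inspect what a search would try. For simplicity, this variant drops the `None` option for `max_depth`.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# randint(low, high) samples integers in [low, high), so the upper bound is exclusive
param_dist = {
    "criterion": ["gini", "entropy"],
    "max_depth": randint(10, 31),         # 10..30
    "min_samples_split": randint(2, 11),  # 2..10
    "min_samples_leaf": randint(1, 11),   # 1..10
}

# Draw five candidate configurations; random_state makes the draw repeatable
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=0))

for candidate in samples:
    print(candidate)
```

The same `param_dist` dictionary can be passed to `RandomizedSearchCV` in place of the list-based one. Distribution objects become more useful for continuous hyperparameters, where listing every value is not possible.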
4. Monitor Overfitting with Validation Curves
Validation curves help visualize the effect of a hyperparameter on the training and validation performance, allowing you to identify overfitting or underfitting.
Here’s an example. This code snippet evaluates how the performance of a Random Forest classifier varies with different values of the n_estimators hyperparameter using validation curves. It does this by calculating training and cross-validation scores for a range of n_estimators values (10, 100, 200, 400, 800, 1000) across 5-fold cross-validation.
from sklearn.model_selection import validation_curve
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
import numpy as np

# Define hyperparameter range
param_range = [10, 100, 200, 400, 800, 1000]

# Calculate validation curve
train_scores, test_scores = validation_curve(
    RandomForestClassifier(), X_train, y_train,
    param_name="n_estimators", param_range=param_range,
    cv=5, scoring="accuracy")

# Calculate mean and standard deviation
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)
It then plots the mean accuracy scores along with their standard deviations for both training and cross-validation sets. The resulting plot helps to visualize whether the model is overfitting or underfitting at different values of n_estimators.
# Plot validation curve
plt.plot(param_range, train_mean, label="Training score", color="r")
plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color="r", alpha=0.3)
plt.plot(param_range, test_mean, label="Cross-validation score", color="g")
plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color="g", alpha=0.3)
plt.title("Validation Curve with Random Forest")
plt.xlabel("Number of Estimators")
plt.ylabel("Accuracy")
plt.legend(loc="best")
plt.show()
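You can also read a validation curve numerically rather than from a plot. The sketch below is an illustration under different assumptions from the example above: it varies `max_depth` of a decision tree (which runs faster than a large random forest) and prints the gap between the mean training and cross-validation scores. A gap that widens while the cross-validation score stalls is a typical sign of overfitting.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, validation_curve
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=25
)

# Hyperparameter values to compare (chosen for illustration)
param_range = [1, 2, 3, 5, 10]

train_scores, test_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X_train, y_train,
    param_name="max_depth", param_range=param_range,
    cv=5, scoring="accuracy",
)

# Average over the 5 folds: one value per max_depth setting
train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)
gap = train_mean - test_mean

for depth, tr, cv_score, g in zip(param_range, train_mean, test_mean, gap):
    print(f"max_depth={depth:>2}  train={tr:.3f}  cv={cv_score:.3f}  gap={g:.3f}")

# Pick the setting with the highest mean cross-validation score
best_depth = param_range[int(np.argmax(test_mean))]
print(f"Best max_depth by CV score: {best_depth}")
```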
5. Use Bayesian Optimization for Efficient Search
Bayesian optimization is often a more sample-efficient approach to hyperparameter tuning. It builds a probabilistic model of the objective from the trials run so far and uses that model to choose the next configuration to evaluate, so it typically needs fewer evaluations, and therefore less compute, than grid or randomized search.
You’ll need libraries like scikit-optimize or hyperopt to perform Bayesian optimization. Here, we’ll use scikit-optimize:
!pip install scikit-optimize
Define the hyperparameter space: Specify the hyperparameters and their respective ranges to search over.
from skopt import BayesSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=25)

# Initialize model
model = DecisionTreeClassifier()

# Define hyperparameter space for Bayesian Optimization
param_space = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None] + list(range(10, 31)),
    'min_samples_split': (2, 10),
    'min_samples_leaf': (1, 10)
}
Set up Bayesian optimization with cross-validation: Use Bayesian optimization with cross-validation to explore the hyperparameter space.
# Bayesian Optimization
opt = BayesSearchCV(model, param_space, n_iter=32, cv=5, scoring='accuracy')
opt.fit(X_train, y_train)
best_params_bayes = opt.best_params_
best_score_bayes = opt.best_score_

print(f'Best Parameters (Bayesian Optimization): {best_params_bayes}')
print(f'Best Cross-Validation Score (Bayesian Optimization): {best_score_bayes:.2f}')
Evaluate the model: Train a final model using the best hyperparameters found by Bayesian optimization and evaluate its performance on the test set.
best_model = DecisionTreeClassifier(**best_params_bayes)
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_test)
final_accuracy = accuracy_score(y_test, y_pred)

print(f'Final Model Accuracy: {final_accuracy:.2f}')
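If you're curious what a Bayesian optimizer does between evaluations, the loop below is a rough conceptual sketch. It is not how scikit-optimize is implemented. It tunes a single hyperparameter (`max_depth`), fits a Gaussian process surrogate from scikit-learn to the scores seen so far, and picks the next value by expected improvement. The `objective` function, the candidate range of 1 to 20, the three starting points, and the five iterations are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=25
)

def objective(max_depth):
    """Mean 5-fold CV accuracy of a decision tree with the given max_depth."""
    model = DecisionTreeClassifier(max_depth=int(max_depth), random_state=0)
    return cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()

# Search space: max_depth in 1..20 (illustrative choice)
candidates = np.arange(1, 21).reshape(-1, 1)

# Seed the surrogate with a few evaluated points
tried = [1, 10, 20]
scores = [objective(d) for d in tried]

for _ in range(5):
    # Surrogate model of score as a function of max_depth
    gp = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True, random_state=0
    )
    gp.fit(np.array(tried).reshape(-1, 1), np.array(scores))
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero

    # Expected improvement over the best score so far (maximization)
    best_so_far = max(scores)
    z = (mu - best_so_far) / sigma
    ei = (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)
    ei[np.isin(candidates.ravel(), tried)] = -np.inf  # skip values already tried

    # Evaluate the most promising untried candidate
    next_depth = int(candidates[np.argmax(ei), 0])
    tried.append(next_depth)
    scores.append(objective(next_depth))

best_depth = tried[int(np.argmax(scores))]
print(f"Evaluated max_depth values: {tried}")
print(f"Best max_depth: {best_depth} (CV accuracy {max(scores):.2f})")
```

Each iteration spends one expensive evaluation where the surrogate predicts either a high score or high uncertainty. That trade-off is what lets this family of methods get by with fewer trials.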
Summary
Effective hyperparameter tuning can make a substantial difference in the performance of your machine learning models.
By starting with a simple baseline model and progressively using search techniques, you can systematically explore and identify the best hyperparameters. From initial exploration with randomized search to efficient fine-tuning with Bayesian optimization, we went over practical tips to optimize your model’s hyperparameters.
So happy hyperparameter tuning!