Stacking generalization is a technique widely used by machine learning (ML) engineers, in which multiple models are combined to boost overall predictive performance. Hyperparameter optimization (HPO), in turn, is the systematic search for the set of hyperparameters that maximizes the performance of a given ML algorithm.
A common challenge when using both stacking and HPO is the significant computational demand. These methods often require training multiple models and iterating through numerous hyperparameter combinations for each one. This can quickly become resource- and time-intensive, especially for large datasets.
In this post, we demonstrate how to streamline a pipeline that combines stacking generalization with HPO. We showcase how this workflow can be executed in just 15 minutes using GPU-accelerated computing with the cuML library. Thanks to the cuML zero code change integration with scikit-learn, you can run your existing ML workflow with GPU acceleration, without any code modifications, and achieve the same model accuracy. Unlike CPU-based execution, where typically only one trial runs at a time, GPU acceleration enables parallel execution of multiple HPO trials, significantly reducing training time.
We first discuss the stacking approach we used, its implementation, and the improvement in accuracy. We then discuss how HPO improves overall accuracy by searching for the best hyperparameters.
Stacking generalization
Stacking generalization is a well-established technique that has been widely used in experiments, including many Kaggle competitions. It is an effective ensemble method but is often underused in practical applications due to its compute cost.
Figure 1 illustrates the stacking architecture we implemented. At the base level, we used three different models: Random Forest, K-Nearest Neighbors (KNN), and Logistic Regression. The predictions from these base models were then passed to a KNN metamodel, which made the final classification based on the combined outputs.
For our experiments, we used a classification dataset with 1M samples and nine features, extrapolated from the source dataset. This setup enabled us to leverage the strengths of each base model and improve overall prediction accuracy through stacking.
Figure 1. Stacking generalization technique using different algorithms
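Our exact dataset was extrapolated from a source dataset, but if you want to try the pipeline end to end, a synthetic stand-in of the same shape can be generated with scikit-learn. The following sketch is illustrative only (make_classification and the split and scaling choices are our assumptions here, not the post's actual data preparation); it produces the scaled pandas containers (X_train_scaled, y_train_df, and so on) that the later snippets reference:

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in: 1M samples and nine features (not the post's actual data)
X, y = make_classification(n_samples=1_000_000, n_features=9, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale the features and keep pandas containers, matching the later snippets
scaler = StandardScaler()
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train))
X_valid_scaled = pd.DataFrame(scaler.transform(X_valid))
y_train_df = pd.Series(y_train)
y_valid_df = pd.Series(y_valid)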
To enable GPU acceleration in your existing scikit-learn workflow, all you need is the cuML library. Once it is installed, simply add the magic command:

%load_ext cuml.accel

This command activates GPU acceleration under the hood. No changes are required to your model creation, training, or evaluation code: you keep the familiar scikit-learn syntax, while execution is accelerated on the GPU by the cuML library.

The following code snippet demonstrates how we set up our stacking pipeline using Random Forest, KNN, and Logistic Regression as base models, with a KNN model as the metalearner. To try it yourself, check out the accompanying Jupyter Notebook.
%load_ext cuml.accel

# Import the libraries; with cuml.accel loaded, the scikit-learn
# estimators below are GPU-accelerated by cuML under the hood
import cupy as cp
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold

# Define base models (level-0 models), using the best hyperparameters
# found by the Optuna studies shown in the HPO section below
base_models = [
    ("logistic_regression", LogisticRegression(**lr_study.best_params, max_iter=20000, tol=1e-3)),
    ("random_forest", RandomForestClassifier(**rf_study.best_params, random_state=42)),
    ("k_nearest_neighbors", KNeighborsClassifier(**knn_study.best_params)),
]

# Function to generate meta features for stacking
def generate_meta_features_for_stacking(base_models, X, y, X_meta):
    kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    meta_features = cp.zeros((X_meta.shape[0], len(base_models)))

    for i, (name, model) in enumerate(base_models):
        print("Model name:", name)

        if X_meta is X:
            # Out-of-fold predictions on the training data, to prevent leakage
            meta_predictions = cp.zeros((X.shape[0],))
            for train_idx, val_idx in kfold.split(X, y):
                model.fit(cp.array(X.iloc[train_idx]), cp.array(y.iloc[train_idx]))
                predictions = model.predict(cp.array(X.iloc[val_idx]))
                meta_predictions[val_idx] = cp.array(predictions).ravel()
            meta_features[:, i] = meta_predictions
        else:
            # Refit model on the full training data for final prediction on X_meta
            model.fit(cp.array(X), cp.array(y))
            predictions = model.predict(cp.array(X_meta))
            meta_features[:, i] = cp.array(predictions).ravel()

    return meta_features
# meta_train uses out-of-fold predictions to prevent leakage
meta_train = generate_meta_features_for_stacking(base_models, X_train_scaled, y_train_df, X_train_scaled)
# meta_valid uses predictions from base models trained on full training set
meta_valid = generate_meta_features_for_stacking(base_models, X_train_scaled, y_train_df, X_valid_scaled)
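The function above produces only the meta features. The final step is making the level-1 prediction with the KNN metamodel; here is a minimal sketch, assuming y_train_df and y_valid_df hold the training and validation labels (the metamodel's tuned hyperparameters come from the HPO section below):

# Fit the KNN metamodel (level-1 model) on the out-of-fold meta features
meta_model = KNeighborsClassifier()
meta_model.fit(cp.asnumpy(meta_train), y_train_df)

# Evaluate the stacked ensemble on the validation meta features
accuracy = meta_model.score(cp.asnumpy(meta_valid), y_valid_df)
print(f"Stacked ensemble validation accuracy: {accuracy:.4f}")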
As shown in Figure 2, stacking generalization led to an overall improvement of 0.28% in the prediction accuracy, measured using 5-fold stratified cross-validation.
Figure 2. Improvement in accuracy achieved using the KNN meta model in stacking generalization
Hyperparameter optimization
To further enhance the performance of our stacking ensemble, we applied HPO to each of the base models as well as the metamodel. For execution, we used the best-performing configuration of each base model to generate out-of-fold predictions, which were stacked to create a new meta-dataset. This dataset was then used to run HPO on the KNN metamodel, further refining its performance.
For HPO, we used the Optuna library, with classification accuracy as the optimization metric. Note that the entire HPO process was GPU-accelerated by enabling cuML with %load_ext cuml.accel, so the syntax remains the same as scikit-learn.
The following code snippet shows how HPO was performed for Logistic Regression. The same approach applies to Random Forest, KNN, and the KNN metamodel (a sketch of the metamodel tuning follows the snippet). To explore the full implementation, check out the accompanying Jupyter Notebook.
%load_ext cuml.accel

# Import the libraries; with cuml.accel loaded, the scikit-learn
# estimators below are GPU-accelerated by cuML under the hood
import cupy as cp
import optuna
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Define the model training and evaluation function
def train_and_eval(C=1, penalty='l2'):
    lr = LogisticRegression(C=C, penalty=penalty, max_iter=20000, tol=1e-3)
    lr.fit(X_train_scaled, y_train_df)
    # Compute accuracy score (in percent) on the validation set
    score = cp.round(lr.score(cp.asnumpy(X_valid_scaled), cp.asnumpy(y_valid_df)) * 100, 2)
    return score

# Define the Optuna objective function for hyperparameter tuning
def objective(trial):
    C = trial.suggest_float("C", 1e-2, 1e2, log=True)
    penalty = trial.suggest_categorical("penalty", ["l1", "l2"])
    return train_and_eval(C, penalty)

# Create an Optuna study to maximize accuracy score
lr_study = optuna.create_study(
    direction="maximize",
    study_name="optuna_logistic_acc_score",
    sampler=optuna.samplers.RandomSampler(seed=142),
)

# Launch hyperparameter optimization with the defined objective
lr_study.optimize(objective, n_trials=40)

# Print the best hyperparameter set and its corresponding evaluation score
print(f"Best params: {lr_study.best_params}")
print(f"Best accuracy score: {lr_study.best_value}")
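As mentioned above, the same pattern extends to the KNN metamodel, tuned on the stacked meta features from the previous section. The sketch below is illustrative (the search space and trial count are our assumptions, not necessarily the notebook's exact configuration):

# Sketch: tune the KNN metamodel (level-1 model) on the stacked meta features
def meta_objective(trial):
    n_neighbors = trial.suggest_int("n_neighbors", 3, 51)
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(cp.asnumpy(meta_train), y_train_df)
    # Validation accuracy (in percent) of the stacked ensemble
    return knn.score(cp.asnumpy(meta_valid), y_valid_df) * 100

knn_meta_study = optuna.create_study(direction="maximize")
knn_meta_study.optimize(meta_objective, n_trials=40)
print(f"Best metamodel params: {knn_meta_study.best_params}")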
As shown in Figure 3, applying HPO to both the base models and the meta model resulted in a 1.44% improvement in prediction accuracy compared to the model without HPO.
Figure 3. Improvement in accuracy using HPO
Advantages of using GPU acceleration with cuML
To enhance execution speed, particularly during HPO across four distinct models, the GPU-accelerated cuML library is highly beneficial. It enables the completion of multiple HPO trials in the timeframe a CPU-based execution would need for just one.
In our scenario, we ran approximately 40 trials for each model, with each trial taking around 5 seconds on the GPU, whereas a single trial would typically take approximately 5 minutes on a CPU. Moreover, activating GPU acceleration is straightforward: simply include the %load_ext cuml.accel command in your code, which provides seamless integration with scikit-learn.
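If you want to push concurrency further, Optuna itself can dispatch several trials at once. One illustrative option (an assumption about your setup, not something the post requires) is the n_jobs argument of study.optimize, which runs trials in threads while cuml.accel offloads each trial's model training to the GPU:

# Illustrative: run up to four Optuna trials concurrently in threads
lr_study.optimize(objective, n_trials=40, n_jobs=4)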
Get started
Integrating stacking generalization with HPO can enhance the accuracy of a system. Our proposed solution uses the GPU-accelerated cuML library, enabling data scientists to conduct deep HPO for each model in the stacking ensemble. Because cuML is compatible with scikit-learn syntax, developers can seamlessly incorporate this technique into production environments. This integration not only facilitates the development of better models but also accelerates the iteration process, empowering data scientists and developers to achieve faster execution and improved model performance in real-world applications.
To try this approach with your own application, download the latest version of NVIDIA cuML. You can always share your feedback on Slack at #RAPIDS-GoAi.
To learn more about zero code change cuML, see NVIDIA cuML Brings Zero Code Change Acceleration to scikit-learn. For more examples of this capability, see the Google Colab notebook. The latest version of cuML, with zero code change capabilities, is preinstalled in Google Colab. For self-paced and instructor-led courses, see the DLI Learning Path for Data Science.