
Automatic Relevance Detection (ARD) and Hyperparameter Tuning in the Training of Gaussian Process Regression-based Surrogate Models: A Machine Learning Approach

Author

Md Shahnawaz Ahmed

Keywords

ARD, model training, Gaussian process regression, surrogate modeling, hyperparameter optimization, kernel selection, log marginal likelihood, machine learning

Year of publication

2025

Citation

Ahmed, Md Shahnawaz. (2025). Automatic Relevance Detection (ARD) and Hyperparameter Tuning in the Training of Gaussian Process Regression‑based Surrogate Models: A Machine Learning Approach. Åbo Akademi University. https://cleanpropulsion.org/wp-content/uploads/2025/10/ahmed_md_shahnawaz-1.pdf

Language

English

Related to:

Flexible Clean Propulsion Technologies

Abstract

Gaussian Process Regression (GPR) is a Bayesian non-parametric regression technique that has emerged as a highly successful method for building surrogate models in Machine Learning (ML), particularly where real-world experiments are expensive and data are scarce. The effectiveness of GPR hinges on precise hyperparameter calibration and the model’s ability to discern and prioritize relevant input features. This thesis investigates GPR with a particular emphasis on hyperparameter tuning and the integration of Automatic Relevance Determination (ARD) to assess the significance of input features and improve surrogate modeling. A series of controlled experiments is conducted on synthetic datasets with diverse dimensionality, noise levels, signal variances, and feature length scales, enabling a systematic evaluation of model behavior under different training conditions. Implemented in Python with the scikit-learn library, the experimental framework supports practical assessment of GPR’s sensitivity to initialization, sample size, and kernel configuration. Model performance is compared across multiple optimization strategies under different settings; although these strategies employ different algorithms, they pursue a small set of common objectives, such as maximizing the log marginal likelihood or minimizing an error/loss function. The proposed alternative strategies address the drawbacks of the baseline model, improve generalization, reduce overfitting, and decrease computational overhead. These findings contribute to the development of robust and interpretable surrogate models, with implications for scientific computing, optimization tasks, and engineering applications.
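
As a rough illustration of the kind of setup the abstract describes, the sketch below fits a scikit-learn GPR with an anisotropic (ARD) RBF kernel on synthetic data and tunes its hyperparameters by maximizing the log marginal likelihood with random restarts. The data-generating function, kernel composition, noise level, and restart count are illustrative assumptions, not details taken from the thesis.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

    # Synthetic data (assumed for illustration): 3 input features,
    # of which only the first two actually influence the target.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(200, 3))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

    # ARD kernel: passing a vector of length scales to RBF gives one
    # length scale per input dimension (anisotropic / ARD behavior).
    kernel = (
        ConstantKernel(1.0)                       # signal variance
        * RBF(length_scale=[1.0, 1.0, 1.0])       # per-feature length scales
        + WhiteKernel(noise_level=0.1)            # observation noise
    )

    # Hyperparameters are fitted by maximizing the log marginal likelihood;
    # random restarts mitigate sensitivity to the initial hyperparameter guess.
    gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10,
                                   random_state=0)
    gpr.fit(X, y)

    # The fitted kernel exposes the learned length scales; a very large
    # length scale flags a feature as largely irrelevant (here, the third).
    print(gpr.kernel_)
    print("Log marginal likelihood:", gpr.log_marginal_likelihood_value_)

After fitting, inspecting the learned per-dimension length scales in gpr.kernel_ is how ARD conveys feature relevance: dimensions with very large length scales contribute little to the prediction, while small length scales mark influential inputs.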