MLKAPS: Machine Learning and Adaptive Sampling for HPC Kernel Auto-tuning

Apr 15, 2025

Mathys E. Jam

Eric Petit

P. Oliveira

David Defour

Greg Henry

William Jalby



Abstract

Many High-Performance Computing (HPC) libraries rely on decision trees to select the best kernel hyperparameters at runtime, depending on the input and environment. However, finding optimized configurations for each input and environment is challenging and requires significant manual effort and computational resources. This paper presents MLKAPS, a tool that automates this task using machine learning and adaptive sampling techniques. MLKAPS generates decision trees that tune HPC kernels' design parameters to achieve efficient performance for any user input. MLKAPS scales to large input and design spaces, outperforming similar state-of-the-art auto-tuning tools in tuning time and mean speedup. We demonstrate the benefits of MLKAPS on the highly optimized Intel MKL dgetrf LU kernel and show that MLKAPS finds blind spots in the manual tuning of HPC experts. It improves performance on over 85% of the inputs with a geomean speedup of x1.30. On the Intel MKL dgeqrf QR kernel, MLKAPS improves performance on 85% of the inputs with a geomean speedup of x1.18.
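The abstract rests on two ideas: a decision tree consulted at runtime that maps a kernel's input (here, matrix dimensions) to a configuration of design parameters, and an evaluation that reports the share of inputs improved together with the geometric-mean speedup. The sketch below illustrates both under loudly stated assumptions. `KernelConfig`, `select_config`, its `block_size`/`num_threads` fields and its thresholds are all hypothetical placeholders; they are not MKL's real tuning knobs, and this hand-written tree is not the output of MLKAPS, which learns its trees from adaptively sampled measurements.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelConfig:
    # Hypothetical design parameters, chosen only for illustration.
    block_size: int
    num_threads: int


def select_config(m: int, n: int) -> KernelConfig:
    """Hypothetical runtime decision tree: input size (m x n) -> configuration.

    The split thresholds are illustrative placeholders, not tuned values.
    """
    if m * n < 256 * 256:
        # Small matrices: small blocks, few threads.
        return KernelConfig(block_size=32, num_threads=4)
    if m >= n:
        # Large tall (or square) matrices.
        return KernelConfig(block_size=128, num_threads=16)
    # Large wide matrices.
    return KernelConfig(block_size=64, num_threads=16)


def summarize(baseline_times, tuned_times):
    """Return (fraction of inputs improved, geometric-mean speedup).

    Speedup per input is baseline_time / tuned_time; an input counts as
    improved when its speedup is strictly greater than 1.
    """
    speedups = [b / t for b, t in zip(baseline_times, tuned_times)]
    improved = sum(1 for s in speedups if s > 1.0) / len(speedups)
    geomean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
    return improved, geomean


if __name__ == "__main__":
    print(select_config(100, 100))
    print(summarize([2.0, 1.0, 4.0], [1.0, 1.0, 2.0]))
```

The geometric mean is the usual choice for averaging speedups because it treats a 2x gain and a 2x loss symmetrically, whereas an arithmetic mean of ratios would be biased upward.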

Type

Preprint

This article contains most of the work I accomplished during my first year as a PhD student, as well as results I obtained during my Master's degree internship.

This version was submitted to ACM TACO in December 2024 and is currently under review.

The preprint is available on HAL and arXiv.

Last updated on Apr 15, 2025

Auto-Tuning

Authors

Mathys E. Jam

PhD Student