Apply __launch_bounds__(1024) on CUDA, not just HIP

libeigen/eigen!2431

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>