SparseLU: add a specialized gemm kernel, and pad the supernodes so that all supernode columns are properly aligned