Bug fix in gemv: the solution is to always use a temporary when dst.innerStride != 1, even though this is not needed when packet_size == 1....
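
For illustration only, here is a minimal sketch of the "temporary for a strided destination" idea. It is not the library's actual code: the helper names `gemv_contiguous` and `gemv_strided_dst`, the column-major layout, and the `y += A * x` semantics are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: y += A * x, with A column-major (rows x cols) and
// y contiguous. A vectorized kernel would typically want this layout.
void gemv_contiguous(const double* A, std::size_t rows, std::size_t cols,
                     const double* x, double* y) {
    for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t i = 0; i < rows; ++i)
            y[i] += A[j * rows + i] * x[j];
}

// Hypothetical wrapper: the destination elements live at y[i * y_stride].
void gemv_strided_dst(const double* A, std::size_t rows, std::size_t cols,
                      const double* x, double* y, std::size_t y_stride) {
    assert(y_stride >= 1);
    if (y_stride == 1) {
        // Contiguous destination: no temporary needed.
        gemv_contiguous(A, rows, cols, x, y);
        return;
    }
    // Non-unit stride: gather into a contiguous temporary, run the kernel,
    // then scatter the result back to the strided destination.
    std::vector<double> tmp(rows);
    for (std::size_t i = 0; i < rows; ++i) tmp[i] = y[i * y_stride];
    gemv_contiguous(A, rows, cols, x, tmp.data());
    for (std::size_t i = 0; i < rows; ++i) y[i * y_stride] = tmp[i];
}
```

The sketch always takes the temporary path for a non-unit stride, which mirrors the conservative choice described above: a scalar kernel (packet size 1) could write to the strided destination directly, so the copy is avoidable in that case.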