Add AVX-512 optimizations for matrix multiply
17 files changed