commit | f85038b7f3e9a0bd7d2bfbed96cc966863aeea57 | |
---|---|---|
author | Antonio Sanchez <cantonios@google.com> | Wed Feb 03 08:18:28 2021 -0800 |
committer | Antonio Sanchez <cantonios@google.com> | Wed Feb 03 09:01:48 2021 -0800 |
tree | a890999030a9b7b22f0091ba5185b1a58d06d550 | |
parent | 56c8b14d875ae42a52d0da52916fac1e29305ca7 | |
Fix excessive GEBP register spilling for 32-bit NEON.

Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM, leading to excessive 16-byte register spills, slowing down basic f32 matrix multiplication by approx 50%.

By specializing `gebp_traits`, we can eliminate the register spills. Volatile inline ASM both acts as a barrier to prevent reordering and enforces strict register use. In a simple f32 matrix multiply example, this modification reduces 16-byte spills from 109 instances to zero, leading to a 1.5x speed increase (search for `16-byte Spill` in the assembly in https://godbolt.org/z/chsPbE).

This is a replacement of !379. See there for further discussion.

Also moved `gebp_traits` specializations for NEON to `Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside other NEON-specific code.

Fixes #2138.
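For illustration only, here is a minimal sketch of the general technique the message describes: wrapping a multiply-accumulate in volatile inline ASM so the compiler can neither reorder it nor move its operands out of NEON registers. This is not the code from this commit. The function name `scaled_accumulate` and the macro `SKETCH_USE_NEON_ASM` are hypothetical, and the inline ASM path is only taken on 32-bit ARM with NEON under GCC or Clang; everywhere else a plain scalar loop stands in for it.

```cpp
#include <cassert>

// Hypothetical switch: use the inline-ASM path only on 32-bit ARM with NEON
// when building with a GCC-compatible compiler.
#if defined(__arm__) && defined(__ARM_NEON) && defined(__GNUC__)
#include <arm_neon.h>
#define SKETCH_USE_NEON_ASM 1
#else
#define SKETCH_USE_NEON_ASM 0
#endif

// Computes r[i] += c[i] * alpha for four floats.
// (Hypothetical helper for this sketch, not an Eigen API.)
inline void scaled_accumulate(float* r, const float* c, float alpha) {
  assert(r != nullptr && c != nullptr);
#if SKETCH_USE_NEON_ASM
  float32x4_t vr = vld1q_f32(r);          // accumulator
  float32x4_t vc = vld1q_f32(c);          // values to scale
  float32x4_t va = vdupq_n_f32(alpha);    // alpha broadcast to all lanes
  // "volatile" keeps the compiler from removing this statement or reordering
  // it relative to other volatile ASM; the "w" constraints require each
  // operand to live in a NEON register, and %q prints the quad (128-bit) name.
  asm volatile("vmla.f32 %q[r], %q[c], %q[alpha]"
               : [r] "+w"(vr)
               : [c] "w"(vc), [alpha] "w"(va));
  vst1q_f32(r, vr);
#else
  // Portable fallback with the same arithmetic result.
  for (int i = 0; i < 4; ++i) {
    r[i] += c[i] * alpha;
  }
#endif
}
```

Calling `scaled_accumulate(r, c, 0.5f)` with `r = {1, 2, 3, 4}` and `c = {2, 4, 6, 8}` leaves `r` equal to `{2, 4, 6, 8}` on either path. Whether such a barrier actually removes spills depends on the compiler version and the surrounding kernel, so the generated assembly should be inspected, as the commit message suggests.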
Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
For more information go to http://eigen.tuxfamily.org/.
For pull requests, bug reports, and feature requests, go to https://gitlab.com/libeigen/eigen.