Fix excessive GEBP register spilling for 32-bit NEON.

Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM,
which leads to excessive 16-byte register spills and slows down basic
f32 matrix multiplication by approximately 50%.

By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline ASM both acts as a barrier to prevent reordering and
enforces strict register use. In a simple f32 matrix multiply example,
this modification reduces 16-byte spills from 109 instances to zero,
leading to a 1.5x speed increase (search for `16-byte Spill` in the
assembly in https://godbolt.org/z/chsPbE).
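
As a rough illustration of the technique (a minimal sketch, not the code
in this change): the helper name `madd4` is hypothetical, and the
non-NEON fallback exists only so the snippet builds on other targets.
On 32-bit ARM with NEON enabled, the `asm volatile` statement pins each
operand to a NEON quad register via the `w` constraint and cannot be
deleted by the compiler or reordered relative to other volatile asm
statements.

```cpp
#if defined(__ARM_NEON) && defined(__arm__)
#include <arm_neon.h>
#endif

// Hypothetical helper for illustration only: acc[i] += a[i] * b[i], i = 0..3.
inline void madd4(const float* a, const float* b, float* acc) {
#if defined(__ARM_NEON) && defined(__arm__)
  float32x4_t va = vld1q_f32(a);
  float32x4_t vb = vld1q_f32(b);
  float32x4_t vc = vld1q_f32(acc);
  // "+w" makes vc a read-write operand held in a NEON register;
  // "w" does the same for the read-only inputs. %q prints the
  // quad-register (q0..q15) name of each operand.
  asm volatile("vmla.f32 %q[c], %q[a], %q[b]"
               : [c] "+w"(vc)
               : [a] "w"(va), [b] "w"(vb));
  vst1q_f32(acc, vc);
#else
  // Portable fallback so the sketch compiles on non-NEON targets.
  for (int i = 0; i < 4; ++i) acc[i] += a[i] * b[i];
#endif
}
```

Whether a given kernel stops spilling depends on the compiler version
and the surrounding code, so the generated assembly should still be
inspected.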

This replaces !379; see that merge request for further discussion.

Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.

Fixes #2138.
3 files changed
tree: a890999030a9b7b22f0091ba5185b1a58d06d550
README.md

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

For more information go to http://eigen.tuxfamily.org/.

For pull requests, bug reports, and feature requests, go to https://gitlab.com/libeigen/eigen.