 /*
  Copyright (c) 2025, AMD Inc. All rights reserved.
  Redistribution and use in source and binary forms, with or without modification,
  are permitted provided that the following conditions are met:
  * Redistributions of source code must retain the above copyright notice, this
    list of conditions and the following disclaimer.
  * Redistributions in binary form must reproduce the above copyright notice,
    this list of conditions and the following disclaimer in the documentation
    and/or other materials provided with the distribution.
  * Neither the name of AMD nor the names of its contributors may
    be used to endorse or promote products derived from this software without
    specific prior written permission.
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
  ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
  LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
  ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  ********************************************************************************
  * Content : Documentation on the use of AMD AOCL through Eigen
  ********************************************************************************
 */

 namespace Eigen {

 /** \page TopicUsingAOCL Using AMD® AOCL from %Eigen

 Starting with %Eigen version 3.4, users can benefit from built-in AMD® Optimizing CPU Libraries (AOCL) optimizations, provided that AOCL 5.0 (or later) is installed.

 <a href="https://www.amd.com/en/developer/aocl.html"> AMD AOCL </a> provides highly optimized, multi-threaded mathematical routines for x86-64 processors, with a focus on AMD "Zen"-based architectures. AOCL is available on Linux and Windows.

 \note
 AMD® AOCL is freely available software, but users are responsible for downloading and installing it and for ensuring that their product's license allows linking to the AOCL libraries. AOCL is distributed under a permissive license that allows commercial use.

 Using AMD AOCL through %Eigen is straightforward:
 -# export \c AOCL_ROOT into your environment
 -# define one of the AOCL macros before including any %Eigen headers (see table below)
 -# link your program to AOCL libraries (BLIS, FLAME, LibM)
 -# ensure your system supports the target architecture optimizations

 When doing so, a number of %Eigen's algorithms are silently substituted with calls to AMD AOCL routines.
 These substitutions apply only for \b Dynamic \b or \b large \b enough objects with one of the following standard scalar types: \c float, \c double, \c complex<float>, and \c complex<double>.
 Operations on other scalar types or mixing reals and complexes will continue to use the built-in algorithms.
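
The eligibility rule above can be pictured as a compile-time predicate. The sketch below is only an illustration: \c is_external_scalar and \c can_substitute are hypothetical names, not part of %Eigen or AOCL, and %Eigen's internal dispatch is implemented differently.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical trait (not part of Eigen or AOCL): true only for the four
// scalar types that the external BLAS/LAPACK routines understand.
template <typename T>
struct is_external_scalar : std::false_type {};
template <> struct is_external_scalar<float> : std::true_type {};
template <> struct is_external_scalar<double> : std::true_type {};
template <> struct is_external_scalar<std::complex<float>> : std::true_type {};
template <> struct is_external_scalar<std::complex<double>> : std::true_type {};

// Both operands must share one eligible scalar type; mixing real and
// complex operands keeps the built-in code path.
template <typename Lhs, typename Rhs>
constexpr bool can_substitute() {
  return std::is_same<Lhs, Rhs>::value && is_external_scalar<Lhs>::value;
}

static_assert(can_substitute<double, double>(), "double is eligible");
static_assert(!can_substitute<double, std::complex<double>>(), "mixed real/complex is not");
static_assert(!can_substitute<long double, long double>(), "long double is not");
```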

 The AOCL integration targets three core components:
 - **BLIS**: High-performance BLAS implementation optimized for modern cache hierarchies
 - **FLAME**: Dense linear algebra algorithms providing LAPACK functionality
 - **LibM**: Optimized standard math routines with vectorized implementations

 \section TopicUsingAOCL_Macros Configuration Macros

 You can choose which parts will be substituted by defining one or more of the following macros:

 <table class="manual">
 <tr><td>\c EIGEN_USE_BLAS </td><td>Enables the use of external BLAS level 2 and 3 routines (AOCL-BLIS)</td></tr>
 <tr class="alt"><td>\c EIGEN_USE_LAPACKE </td><td>Enables the use of external LAPACK routines via the LAPACKE C interface (AOCL-FLAME)</td></tr>
 <tr><td>\c EIGEN_USE_LAPACKE_STRICT </td><td>Same as \c EIGEN_USE_LAPACKE but algorithms of lower robustness are disabled. \n This currently concerns only JacobiSVD which would be replaced by \c gesvd.</td></tr>
 <tr class="alt"><td>\c EIGEN_USE_AOCL_VML </td><td>Enables the use of AOCL LibM vector math operations for coefficient-wise functions</td></tr>
 <tr><td>\c EIGEN_USE_AOCL_ALL </td><td>Defines \c EIGEN_USE_BLAS, \c EIGEN_USE_LAPACKE, and \c EIGEN_USE_AOCL_VML</td></tr>
 <tr class="alt"><td>\c EIGEN_USE_AOCL_MT </td><td>Equivalent to \c EIGEN_USE_AOCL_ALL, but ensures multi-threaded BLIS (\c libblis-mt) is used. \n \b Recommended for most applications.</td></tr>
 </table>

 \note The AOCL integration automatically enables optimizations when the matrix/vector size is at least \c EIGEN_AOCL_VML_THRESHOLD (default: 128 elements). For smaller operations, %Eigen's built-in vectorization may be faster because it avoids function call overhead.
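
A minimal sketch of the size rule, assuming the documented default of 128 and the "at least" comparison used elsewhere on this page. \c kDynamic mirrors %Eigen's \c Dynamic constant (-1); \c kVmlThreshold and \c large_enough are hypothetical names chosen for illustration.

```cpp
// Mirrors Eigen's Dynamic constant, which marks a size known only at run time.
constexpr int kDynamic = -1;
// Documented default of EIGEN_AOCL_VML_THRESHOLD.
constexpr int kVmlThreshold = 128;

// Hypothetical predicate: an expression qualifies when its compile-time size
// is Dynamic or at least the threshold.
constexpr bool large_enough(int size_at_compile_time) {
  return size_at_compile_time == kDynamic || size_at_compile_time >= kVmlThreshold;
}

static_assert(large_enough(kDynamic), "VectorXd-like: size known at run time");
static_assert(large_enough(256), "fixed size 256 meets the threshold");
static_assert(!large_enough(4), "Vector4d-like: stays on the built-in path");
```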

 \section TopicUsingAOCL_Performance Performance Considerations

 The \c EIGEN_USE_BLAS and \c EIGEN_USE_LAPACKE macros can be combined with AOCL-specific optimizations:

 - **Multi-threading**: Use \c EIGEN_USE_AOCL_MT to automatically select the multi-threaded BLIS library
 - **Architecture targeting**: AOCL libraries are optimized for AMD Zen architectures (Zen, Zen2, Zen3, Zen4, Zen5)
 - **Vector Math Library**: AOCL LibM provides vectorized implementations that process an entire array in a single call
 - **Memory layout**: Eigen's column-major storage directly matches AOCL's expected data layout for zero-copy operation
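
To make the memory-layout point concrete, the sketch below shows how a column-major matrix maps onto the flat array and leading dimension that BLAS-style routines take. \c col_major_index and \c make_demo_matrix are hypothetical helpers written for this page, not %Eigen or AOCL functions.

```cpp
#include <cstddef>
#include <vector>

// Linear offset of element (row, col) in a column-major matrix whose
// consecutive columns are lda elements apart (lda >= number of rows).
inline std::size_t col_major_index(std::size_t row, std::size_t col, std::size_t lda) {
  return row + col * lda;
}

// Builds a rows x cols column-major buffer where entry (i, j) holds 10*i + j.
inline std::vector<double> make_demo_matrix(std::size_t rows, std::size_t cols) {
  std::vector<double> a(rows * cols);
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t i = 0; i < rows; ++i)
      a[col_major_index(i, j, rows)] = 10.0 * static_cast<double>(i) + static_cast<double>(j);
  return a;
}
```

For a plain \c MatrixXd, \c data() and \c outerStride() play the roles of the buffer and \c lda, which is why the data can be handed over without a copy.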

 \section TopicUsingAOCL_Types Supported Data Types and Sizes

 AOCL acceleration is applied to:
 - **Scalar types**: \c float, \c double, \c complex<float>, \c complex<double>
 - **Matrix/Vector sizes**: Dynamic size or compile-time size ≥ \c EIGEN_AOCL_VML_THRESHOLD
 - **Storage order**: Both column-major (default) and row-major layouts
 - **Memory alignment**: Eigen's data pointers are directly compatible with AOCL function signatures

 The current AOCL Vector Math Library integration is specialized for \c double precision, with automatic fallback to scalar implementations for \c float.

 \section TopicUsingAOCL_Functions Vector Math Functions

 The following table summarizes coefficient-wise operations accelerated by \c EIGEN_USE_AOCL_VML:

 <table class="manual">
 <tr><th>Code example</th><th>AOCL routines</th></tr>
 <tr><td>\code
 v2 = v1.array().exp();
 v2 = v1.array().sin();
 v2 = v1.array().cos();
 v2 = v1.array().tan();
 v2 = v1.array().log();
 v2 = v1.array().log10();
 v2 = v1.array().log2();
 v2 = v1.array().sqrt();
 v2 = v1.array().pow(1.5);
 v2 = v1.array() + v2.array();
 \endcode</td><td>\code
 amd_vrda_exp
 amd_vrda_sin
 amd_vrda_cos
 amd_vrda_tan
 amd_vrda_log
 amd_vrda_log10
 amd_vrda_log2
 amd_vrda_sqrt
 amd_vrda_pow
 amd_vrda_add
 \endcode</td></tr>
 </table>

 In the examples, v1 and v2 are dense vectors of type \c VectorXd with size ≥ \c EIGEN_AOCL_VML_THRESHOLD.
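
The routines in the right-hand column are array-wise: one call processes a whole contiguous buffer. The sketch below is a plain C++ stand-in with that shape, useful as a reference when checking results. \c ref_array_sin and \c max_abs_diff are hypothetical names, and the signature is an assumption, not the AOCL LibM prototype.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation: applies std::sin to n contiguous
// doubles, one element at a time.
inline void ref_array_sin(std::size_t n, const double* src, double* dst) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = std::sin(src[i]);
}

// Largest absolute difference between two equally sized buffers.
inline double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b) {
  double worst = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) worst = std::max(worst, std::fabs(a[i] - b[i]));
  return worst;
}
```

Comparing \c max_abs_diff of the accelerated and reference outputs against a small tolerance is a simple way to confirm that the substitution changed results only within rounding.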

 \section TopicUsingAOCL_Example Complete Example

 \code
 #define EIGEN_USE_AOCL_MT
 #include <iostream>
 #include <Eigen/Dense>

 int main() {
     const int n = 2048;

     // Large matrices automatically use AOCL-BLIS for multiplication
     Eigen::MatrixXd A = Eigen::MatrixXd::Random(n, n);
     Eigen::MatrixXd B = Eigen::MatrixXd::Random(n, n);
     Eigen::MatrixXd C = A * B;  // Dispatched to dgemm

     // Large vectors automatically use AOCL LibM for math functions
     Eigen::VectorXd v = Eigen::VectorXd::LinSpaced(10000, 0, 10);
     Eigen::VectorXd result = v.array().sin();  // Dispatched to amd_vrda_sin

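     // LAPACK decompositions use AOCL-FLAME. LLT needs a symmetric
     // positive-definite input, so build one from A first.
     Eigen::MatrixXd S = A.transpose() * A + Eigen::MatrixXd::Identity(n, n);
     Eigen::LLT<Eigen::MatrixXd> llt(S);  // Dispatched to dpotrf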

     std::cout << "Matrix norm: " << C.norm() << std::endl;
     std::cout << "Vector result norm: " << result.norm() << std::endl;

     return 0;
 }
 \endcode
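
When validating a build, it can help to compare the accelerated product against an independent reference on a small problem. The following is a minimal sketch of such a reference for column-major buffers. \c ref_gemm is a hypothetical helper, not part of %Eigen or AOCL, and it is far too slow for the sizes used above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference: C = A * B for column-major buffers, with A of size
// m x k, B of size k x n, and C of size m x n.
inline std::vector<double> ref_gemm(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    std::size_t m, std::size_t k, std::size_t n) {
  std::vector<double> c(m * n, 0.0);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t p = 0; p < k; ++p)
      for (std::size_t i = 0; i < m; ++i)
        c[i + j * m] += a[i + p * m] * b[p + j * k];
  return c;
}
```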

 \section TopicUsingAOCL_Building Building and Linking

 To compile with AOCL support, set the \c AOCL_ROOT environment variable and link against the required libraries:

 \code
 export AOCL_ROOT=/path/to/aocl
 clang++ -O3 -g -DEIGEN_USE_AOCL_ALL \
         -I./install/include -I${AOCL_ROOT}/include \
         -Wno-parentheses my_app.cpp \
         -L${AOCL_ROOT} -lamdlibm -lflame -lblis \
         -lpthread -lrt -lm -lomp \
         -o eigen_aocl_example
 \endcode

 For multi-threaded performance, use the multi-threaded BLIS library:
 \code
 clang++ -O3 -g -DEIGEN_USE_AOCL_MT \
         -I./install/include -I${AOCL_ROOT}/include \
         -Wno-parentheses my_app.cpp \
         -L${AOCL_ROOT} -lamdlibm -lflame -lblis-mt \
         -lpthread -lrt -lm -lomp \
         -o eigen_aocl_example
 \endcode

 Key compiler and linker flags:
 - \c -DEIGEN_USE_AOCL_ALL: Enable all AOCL accelerations (BLAS, LAPACK, VML)
 - \c -DEIGEN_USE_AOCL_MT: Enable multi-threaded version (uses \c -lblis-mt)
 - \c -lblis: Single-threaded BLIS library
 - \c -lblis-mt: Multi-threaded BLIS library (recommended for performance)
 - \c -lflame: FLAME LAPACK implementation
 - \c -lamdlibm: AMD LibM vector math library
 - \c -lomp: OpenMP runtime for multi-threading support
 - \c -lpthread -lrt: System threading and real-time libraries
 - \c -Wno-parentheses: Suppress common warnings when using AOCL headers

 \subsection TopicUsingAOCL_EigenBuild Building Eigen with AOCL Support

 To build and install %Eigen for use with AOCL, use the following CMake configuration:

 \code
 cmake .. -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_C_COMPILER=clang \
          -DCMAKE_CXX_COMPILER=clang++ \
          -DCMAKE_INSTALL_PREFIX=$PWD/install \
          -DINCLUDE_INSTALL_DIR=$PWD/install/include \
       && make install -j$(nproc)
 \endcode


 To build Eigen with AOCL integration and benchmarking capabilities, use the following CMake configuration:

 \code
 cmake .. -DEIGEN_BUILD_AOCL_BENCH=ON \
          -DEIGEN_AOCL_BENCH_FLAGS="-O3 -mavx512f -fveclib=AMDLIBM" \
          -DEIGEN_AOCL_BENCH_USE_MT=OFF \
          -DEIGEN_AOCL_BENCH_ARCH=znver5 \
          -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_C_COMPILER=clang \
          -DCMAKE_CXX_COMPILER=clang++ \
          -DCMAKE_INSTALL_PREFIX=$PWD/install \
          -DINCLUDE_INSTALL_DIR=$PWD/install/include \
       && make install -j$(nproc)
 \endcode

 **CMake Configuration Parameters:**

 <table class="manual">
 <tr><th>Parameter</th><th>Expected Values</th><th>Description</th></tr>
 <tr><td>\c EIGEN_BUILD_AOCL_BENCH</td><td>\c ON, \c OFF</td><td>Enable/disable AOCL benchmark compilation</td></tr>
 <tr class="alt"><td>\c EIGEN_AOCL_BENCH_FLAGS</td><td>Compiler flags string</td><td>Additional compiler flags for the benchmark, for example \c "-O3 -mavx512f -fveclib=AMDLIBM"</td></tr>
 <tr><td>\c EIGEN_AOCL_BENCH_USE_MT</td><td>\c ON, \c OFF</td><td>Use multi-threaded AOCL libraries (\c ON recommended for performance)</td></tr>
 <tr class="alt"><td>\c EIGEN_AOCL_BENCH_ARCH</td><td>\c znver3, \c znver4, \c znver5, \c native, \c generic</td><td>Target AMD architecture (match your CPU generation)</td></tr>
 <tr><td>\c CMAKE_BUILD_TYPE</td><td>\c Release, \c Debug, \c RelWithDebInfo</td><td>Build configuration (\c Release recommended for benchmarks)</td></tr>
 <tr class="alt"><td>\c CMAKE_C_COMPILER</td><td>\c clang, \c gcc</td><td>C compiler (clang recommended for AOCL)</td></tr>
 <tr><td>\c CMAKE_CXX_COMPILER</td><td>\c clang++, \c g++</td><td>C++ compiler (clang++ recommended for AOCL)</td></tr>
 <tr class="alt"><td>\c CMAKE_INSTALL_PREFIX</td><td>Installation path</td><td>Where to install Eigen headers</td></tr>
 <tr><td>\c INCLUDE_INSTALL_DIR</td><td>Header path</td><td>Specific path for Eigen headers</td></tr>
 </table>

 **Architecture Selection Guide:**
 - \c znver3: AMD Zen 3 (EPYC 7003, Ryzen 5000 series)
 - \c znver4: AMD Zen 4 (EPYC 9004, Ryzen 7000 series)
 - \c znver5: AMD Zen 5 (EPYC 9005, Ryzen 9000 series)
 - \c native: Auto-detect current CPU architecture
 - \c generic: Generic x86-64 without specific optimizations

 **Custom Compiler Flags Explanation:**
 - \c -O3: High optimization level
 - \c -mavx512f: Enable AVX-512 Foundation instructions (use only on CPUs that support AVX-512, such as Zen 4 and Zen 5)
 - \c -fveclib=AMDLIBM: Use AMD LibM for vectorized math functions

 \subsection TopicUsingAOCL_Benchmark Building the AOCL Benchmark

 After configuring Eigen, build the AOCL benchmark executable:

 \code
 cmake --build . --target benchmark_aocl -j$(nproc)
 \endcode

 This creates the \c benchmark_aocl executable that demonstrates AOCL acceleration with various matrix sizes and operations.

 **Running the Benchmark:**
 \code
 ./benchmark_aocl
 \endcode

 The benchmark will automatically compare:
 - Eigen's native performance vs AOCL-accelerated operations
 - Matrix multiplication performance (BLIS vs Eigen)
 - Vector math functions performance (LibM vs Eigen)
 - Memory bandwidth utilization and cache efficiency
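
Benchmark numbers for matrix multiplication are usually reported as a rate. A minimal sketch of the timing arithmetic follows, assuming the conventional count of 2n³ floating-point operations for an n×n product. \c time_seconds and \c gemm_gflops are hypothetical helpers and are not taken from \c benchmark_aocl.

```cpp
#include <chrono>

// Wall-clock time, in seconds, taken by one call to fn().
template <typename Fn>
double time_seconds(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// GFLOP/s for an n x n by n x n product, counting 2*n^3 operations.
inline double gemm_gflops(double n, double seconds) {
  return 2.0 * n * n * n / seconds / 1e9;
}
```

Timing the same product once with and once without the AOCL macros, and comparing the two rates, gives a quick sanity check that is independent of the bundled benchmark.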

 \section TopicUsingAOCL_CMake CMake Integration

 When using CMake, you can use a FindAOCL module:

 \code
 find_package(AOCL REQUIRED)
 target_compile_definitions(my_target PRIVATE EIGEN_USE_AOCL_MT)
 target_link_libraries(my_target PRIVATE AOCL::BLIS_MT AOCL::FLAME AOCL::LIBM)
 \endcode

 \section TopicUsingAOCL_Troubleshooting Troubleshooting

 Common issues and solutions:

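 - **Link errors**: Ensure \c AOCL_ROOT is set, that the \c -L path points to the directory containing the AOCL libraries, and that the same directory is in \c LD_LIBRARY_PATH at run time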
 - **Performance not improved**: Verify you're using matrices/vectors larger than the threshold
 - **Thread contention**: Set \c OMP_NUM_THREADS to match your CPU core count
 - **Architecture mismatch**: Use appropriate \c -march flag for your AMD processor
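
Some of these checks can be automated at program start. The sketch below only reads the environment and the standard library's hardware hint. \c env_or and \c describe_environment are hypothetical helpers. Note that \c std::thread::hardware_concurrency reports logical threads, which may be larger than the physical core count.

```cpp
#include <cstdlib>
#include <string>
#include <thread>

// Value of an environment variable, or fallback when it is unset.
inline std::string env_or(const char* name, const std::string& fallback) {
  const char* value = std::getenv(name);
  return value ? std::string(value) : fallback;
}

// One-line summary of the settings that most often explain link or
// threading problems.
inline std::string describe_environment() {
  return "AOCL_ROOT=" + env_or("AOCL_ROOT", "<unset>") +
         " OMP_NUM_THREADS=" + env_or("OMP_NUM_THREADS", "<unset>") +
         " hardware_threads=" + std::to_string(std::thread::hardware_concurrency());
}
```

Printing this summary once before the first large operation makes it easier to tell a configuration problem from a genuine performance issue.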

 \section TopicUsingAOCL_Links Links

 - AMD AOCL can be downloaded for free <a href="https://www.amd.com/en/developer/aocl.html">here</a>
 - The AOCL User Guide and documentation are available on the AMD Developer Portal
 - AOCL is also available through package managers and containerized environments

 */

 }