| Bench Template Library |
| |
| **************************************** |
| Introduction : |
| |
| The aim of this project is to compare the performance |
| of available numerical libraries. The code is designed |
| as generic and modular as possible. Thus, adding new |
| numerical libraries or new numerical tests should |
| require minimal effort. |
| |
| |
| ***************************************** |
| |
| Installation : |
| |
| BTL uses cmake / ctest: |
| |
| 1 - create a build directory: |
| |
| $ mkdir build |
| $ cd build |
| |
| 2 - configure: |
| |
| $ ccmake .. |
| |
| 3 - run the bench using ctest: |
| |
| $ ctest -V |
| |
| You can run the benchmarks only on libraries matching a given regular expression: |
| ctest -V -R <regexp> |
| For instance: |
| ctest -V -R eigen2 |
| |
| You can also select a given set of actions defining the environment variable BTL_CONFIG this way: |
| BTL_CONFIG="-a action1{:action2}*" ctest -V |
| An example: |
| BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata" ctest -V -R eigen2 |
| |
| Finally, if bench results already exist (the bench*.dat files) then they merges by keeping the best for each matrix size. If you want to overwrite the previous ones you can simply add the "--overwrite" option: |
| BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata --overwrite" ctest -V -R eigen2 |
| |
| 4 : Analyze the result. different data files (.dat) are produced in each libs directories. |
| If gnuplot is available, choose a directory name in the data directory to store the results and type: |
| $ cd data |
| $ mkdir my_directory |
| $ cp ../libs/*/*.dat my_directory |
| Build the data utilities in this (data) directory |
| make |
| Then you can look the raw data, |
| go_mean my_directory |
| or smooth the data first : |
| smooth_all.sh my_directory |
| go_mean my_directory_smooth |
| |
| |
| ************************************************* |
| |
| Files and directories : |
| |
| generic_bench : all the bench sources common to all libraries |
| |
| actions : sources for different action wrappers (axpy, matrix-matrix product) to be tested. |
| |
| libs/* : bench sources specific to each tested libraries. |
| |
| machine_dep : directory used to store machine specific Makefile.in |
| |
| data : directory used to store gnuplot scripts and data analysis utilities |
| |
| ************************************************** |
| |
| Principles : the code modularity is achieved by defining two concepts : |
| |
| ****** Action concept : This is a class defining which kind |
| of test must be performed (e.g. a matrix_vector_product). |
| An Action should define the following methods : |
| |
| *** Ctor using the size of the problem (matrix or vector size) as an argument |
| Action action(size); |
| *** initialize : this method initialize the calculation (e.g. initialize the matrices and vectors arguments) |
| action.initialize(); |
| *** calculate : this method actually launch the calculation to be benchmarked |
| action.calculate; |
| *** nb_op_base() : this method returns the complexity of the calculate method (allowing the mflops evaluation) |
| *** name() : this method returns the name of the action (std::string) |
| |
| ****** Interface concept : This is a class or namespace defining how to use a given library and |
| its specific containers (matrix and vector). Up to now an interface should following types |
| |
| *** real_type : kind of float to be used (float or double) |
| *** stl_vector : must correspond to std::vector<real_type> |
| *** stl_matrix : must correspond to std::vector<stl_vector> |
| *** gene_vector : the vector type for this interface --> e.g. (real_type *) for the C_interface |
| *** gene_matrix : the matrix type for this interface --> e.g. (gene_vector *) for the C_interface |
| |
| + the following common methods |
| |
| *** free_matrix(gene_matrix & A, int N) dealocation of a N sized gene_matrix A |
| *** free_vector(gene_vector & B) dealocation of a N sized gene_vector B |
| *** matrix_from_stl(gene_matrix & A, stl_matrix & A_stl) copy the content of an stl_matrix A_stl into a gene_matrix A. |
| The allocation of A is done in this function. |
| *** vector_to_stl(gene_vector & B, stl_vector & B_stl) copy the content of an stl_vector B_stl into a gene_vector B. |
| The allocation of B is done in this function. |
| *** matrix_to_stl(gene_matrix & A, stl_matrix & A_stl) copy the content of an gene_matrix A into an stl_matrix A_stl. |
| The size of A_STL must corresponds to the size of A. |
| *** vector_to_stl(gene_vector & A, stl_vector & A_stl) copy the content of an gene_vector A into an stl_vector A_stl. |
| The size of B_STL must corresponds to the size of B. |
| *** copy_matrix(gene_matrix & source, gene_matrix & cible, int N) : copy the content of source in cible. Both source |
| and cible must be sized NxN. |
| *** copy_vector(gene_vector & source, gene_vector & cible, int N) : copy the content of source in cible. Both source |
| and cible must be sized N. |
| |
| and the following method corresponding to the action one wants to be benchmarked : |
| |
| *** matrix_vector_product(const gene_matrix & A, const gene_vector & B, gene_vector & X, int N) |
| *** matrix_matrix_product(const gene_matrix & A, const gene_matrix & B, gene_matrix & X, int N) |
| *** ata_product(const gene_matrix & A, gene_matrix & X, int N) |
| *** aat_product(const gene_matrix & A, gene_matrix & X, int N) |
| *** axpy(real coef, const gene_vector & X, gene_vector & Y, int N) |
| |
| The bench algorithm (generic_bench/bench.hh) is templated with an action itself templated with |
| an interface. A typical main.cpp source stored in a given library directory libs/A_LIB |
| looks like : |
| |
| bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ; |
| |
| this function will produce XY data file containing measured mflops as a function of the size for 50 |
| sizes between 10 and 10000. |
| |
| This algorithm can be adapted by providing a given Perf_Analyzer object which determines how the time |
| measurements must be done. For example, the X86_Perf_Analyzer use the asm rdtsc function and provides |
| a very fast and accurate (but less portable) timing method. The default is the Portable_Perf_Analyzer |
| so |
| |
| bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ; |
| |
| is equivalent to |
| |
| bench< Portable_Perf_Analyzer,AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ; |
| |
| If your system supports it we suggest to use a mixed implementation (X86_perf_Analyzer+Portable_Perf_Analyzer). |
| replace |
| bench<Portable_Perf_Analyzer,Action>(size_min,size_max,nb_point); |
| with |
| bench<Mixed_Perf_Analyzer,Action>(size_min,size_max,nb_point); |
| in generic/bench.hh |
| |
| . |
| |
| |
| |