Made the blocking computation aware of the l3 cache
Also optimized the blocking parameters to take into account the number of threads used for a computation
13 files changed