* added optimized paths for matrix-vector and vector-matrix products (using either a cache friendly strategy or re-using dot-product vectorized implementation) * add LinearAccessBit to Transpose