Add reciprocal packet op and fast specializations for float with SSE, AVX, and AVX512.
14 files changed