Replace __float2half_rn with __float2half

The latter has a consistent definition across CUDA 8.0 and 9.0.
1 file changed