* New v_reverse HAL intrinsic for reversing the ordering of a vector
* Fix conflict.
* Try to resolve conflict again.
* Try one more time.
* Add _MM_SHUFFLE. Remove non-vectorize code in SSE2. Fix copy and paste issue with NEON.
* Change v_uint16x8 SSE2 version to use shuffles