Commit Graph

28 Commits

Author SHA1 Message Date
Vitaly Tuzov
3b015dfc7d Merge pull request #14210 from terfendail:wui_512
AVX512 wide universal intrinsics (#14210)

* Added implementation of 512-bit wide universal intrinsics(WIP)

* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks

* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16

* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros

* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines

* Added implementation of 512-bit wide universal intrinsics(WIP):Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.

* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.

* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()

* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build

* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
2019-06-03 18:05:35 +03:00
Vitaly Tuzov
99b39aa5bd Fixed out of bound reading in LINEAR_EXACT resize for 8UC3 2019-03-05 17:21:21 +03:00
Vitaly Tuzov
334c4d62b5 Merge pull request #13781 from terfendail:warp_wintr
Resize reworked using wide universal intrinsics (#13781)

* Added wide universal intrinsics optimized implementation for 3 channel bit-exact linear resize

* Reworked linear resize using new wide LUT intrinsics

* Fix for VSX intrinsics
2019-02-20 14:30:28 +03:00
Alexander Alekhin
2d5ccc7b3e imgproc(resize): update checks (static analyzers) 2018-12-03 13:13:48 +03:00
maver1
e397434cb6 Merge pull request #12877 from maver1:3.4
* Updated ICV packages and IPP integration

* core(test): minMaxIdx IPP regression test

* core(ipp): workaround minMaxIdx problem

* core(ipp): workaround meanStdDev() CV_32FC3 buffer overrun

* Returned semicolon after CV_INSTRUMENT_REGION_IPP()
2018-10-24 15:02:53 +03:00
Michał Janiszewski
c8e6ce304f Catch exceptions by const-reference
Exceptions caught by value incur needless cost in C++, most of them can
be caught by const-reference, especially as nearly none are actually
used. This could allow compiler generate a slightly more efficient code.
2018-10-16 22:43:54 +02:00
take1014
24af70c7e0 resolves 11283 2018-10-12 23:08:25 +09:00
Vitaly Tuzov
9d602f2752 Replaced SSE2 area resize implementation with wide universal intrinsic implementation 2018-10-08 16:27:52 +03:00
Alexander Alekhin
92ec971453 Merge pull request #12526 from terfendail:avx2_resize_fix 2018-09-14 15:57:47 +00:00
Hamdi Sahloul
5d54def264 Add semicolons after CV_INSTRUMENT macros 2018-09-14 06:45:31 +09:00
Vitaly Tuzov
29770e13e8 Fixed bit-exact resize SIMD implementation for AVX2 baseline 2018-09-13 18:20:27 +03:00
Hamdi Sahloul
a39e0daacf Utilize CV_UNUSED macro 2018-09-07 20:33:52 +09:00
Vitaly Tuzov
f9a5c4d181 Fixed bit-exact resize wide intrinsics implementation for 16U 2018-09-03 20:37:25 +03:00
Vitaly Tuzov
e345cb03d5 Bit-exact resize reworked to use wide intrinsics (#12038)
* Bit-exact resize reworked to use wide intrinsics

* Reworked bit-exact resize row data loading

* Added bit-exact resize row data loaders for SIMD256 and SIMD512

* Fixed type punned pointer dereferencing warning

* Reworked loading of source data for SIMD256 and SIMD512 bit-exact resize
2018-08-31 16:54:05 +03:00
Alexander Alekhin
b09a4a98d4 opencv: Use cv::AutoBuffer<>::data() 2018-07-04 19:11:29 +03:00
gnthibault
b46fef327e Fixed Assertin error due to Size.area() overflowing 2018-06-08 11:22:36 +02:00
Alexander Alekhin
5d36ee2fe7 imgproc: apply CV_OVERRIDE/CV_FINAL 2018-03-28 17:57:59 +03:00
Maksim Shabunin
8b87c4b96a Fixed several warnings produced by clang 6 and static analyzers 2018-01-16 15:26:28 +03:00
Alexander Alekhin
8acd05f12a Merge pull request #10421 from cezheng:patch-1 2018-01-05 09:24:33 +00:00
Vadim Pisarevsky
3f68d6d8a7 Merge pull request #10392 from terfendail:bitexact_fallback 2017-12-22 13:23:55 +00:00
Vitaly Tuzov
5fdb42a7c9 Added fallback to generic linear resize in case bit-exact resize of provided matrix isn't supported 2017-12-22 14:29:50 +03:00
Ce Zheng
602b08d9c7
Update resize inline comments
Reading through the implementation, I feel this line of comment is not consistent with the actually code, so this is for correcting it.
2017-12-22 16:03:12 +08:00
Vitaly Tuzov
019162486c Disabled universal intrinsic based implementation for bit-exact resize of 3-channel images 2017-12-22 10:08:30 +03:00
Vitaly Tuzov
1eb2fa9efb Added universal intrinsics based implementations for CV_8UC2, CV_8UC3, CV_8UC4 bit-exact resizes. 2017-12-20 17:17:10 +03:00
Vitaly Tuzov
51cb56ef2c Implementation of bit-exact resize. Internal calls to linear resize updated to use bit-exact version. (#9468) 2017-12-13 15:00:38 +03:00
Maksim Shabunin
184daa155f Fixed minor issues reported by GCC 7.2 2017-11-03 18:06:39 +03:00
Vitaly Tuzov
e8caa9b5c0 removed unused interpolateLinear 2017-08-31 15:34:27 +03:00
Vitaly Tuzov
b1f46b6d69 Move resize implementation to separate file 2017-08-31 14:36:19 +03:00