Alexander Alekhin
4733a19bab
Merge pull request #16194 from alalek:fix_16192
...
* imgproc(test): resize(LANCZOS4) reproducer 16192
* imgproc: fix resize LANCZOS4 coefficients generation
2019-12-19 13:20:42 +03:00
Vitaly Tuzov
f5a84f75c4
Fix for CV_8UC2 linear resize vectorization
2019-12-18 21:41:36 +00:00
Paul Murphy
1c4a64f0a1
Merge pull request #16138 from pmur:reg_16137
...
* imgproc: Prevent 1B overrun of 8C3 SIMD optimization
The fourth value read via v_load_q is essentially ignored,
but can cause trouble if it happens to cross page boundaries.
The final few iterations may attempt to read the most extreme
elements of S, which will read 1B beyond the array in most
aligment cases. Dynamically compute the stop. This could be
hoised from the loop, but will require a more extensive change.
Likewise, cleanup the iteration increment statements to make
it more obvious they do channel count (3) elements per pass.
This should resolve #16137
* imgproc(resize): extra check
2019-12-12 13:00:44 +03:00
Paul Murphy
a011035ed6
Merge pull request #15257 from pmur:resize
...
* resize: HResizeLinear reduce duplicate work
There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.
Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.
The performance is most noticable on a linear downscale. A set of
performance tests are added to characterize this. The performance
improvement is 10-50% depending on the scaling.
* imgproc: vectorize HResizeLinear
Performance is mostly gated by the gather operations
for x inputs.
Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.
While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.
For float types, this results in a more modest
1.2x improvement.
* Update U8 processing for non-bitexact linear resize
* core: hal: vsx: improve v_load_expand_q
With a little help, we can do this quickly without gprs on
all VSX enabled targets.
* resize: Fix cn == 3 step per feedback
Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
2019-12-09 14:54:06 +03:00
clunietp
2185bce4b7
Fix 13577
2019-11-18 07:41:34 -05:00
Vitaly Tuzov
3b015dfc7d
Merge pull request #14210 from terfendail:wui_512
...
AVX512 wide universal intrinsics (#14210 )
* Added implementation of 512-bit wide universal intrinsics(WIP)
* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks
* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros
* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines
* Added implementation of 512-bit wide universal intrinsics(WIP):Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.
* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build
* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
2019-06-03 18:05:35 +03:00
Vitaly Tuzov
99b39aa5bd
Fixed out of bound reading in LINEAR_EXACT resize for 8UC3
2019-03-05 17:21:21 +03:00
Vitaly Tuzov
334c4d62b5
Merge pull request #13781 from terfendail:warp_wintr
...
Resize reworked using wide universal intrinsics (#13781 )
* Added wide universal intrinsics optimized implementation for 3 channel bit-exact linear resize
* Reworked linear resize using new wide LUT intrinsics
* Fix for VSX intrinsics
2019-02-20 14:30:28 +03:00
Alexander Alekhin
2d5ccc7b3e
imgproc(resize): update checks (static analyzers)
2018-12-03 13:13:48 +03:00
maver1
e397434cb6
Merge pull request #12877 from maver1:3.4
...
* Updated ICV packages and IPP integration
* core(test): minMaxIdx IPP regression test
* core(ipp): workaround minMaxIdx problem
* core(ipp): workaround meanStdDev() CV_32FC3 buffer overrun
* Returned semicolon after CV_INSTRUMENT_REGION_IPP()
2018-10-24 15:02:53 +03:00
Michał Janiszewski
c8e6ce304f
Catch exceptions by const-reference
...
Exceptions caught by value incur needless cost in C++, most of them can
be caught by const-reference, especially as nearly none are actually
used. This could allow compiler generate a slightly more efficient code.
2018-10-16 22:43:54 +02:00
take1014
24af70c7e0
resolves 11283
2018-10-12 23:08:25 +09:00
Vitaly Tuzov
9d602f2752
Replaced SSE2 area resize implementation with wide universal intrinsic implementation
2018-10-08 16:27:52 +03:00
Alexander Alekhin
92ec971453
Merge pull request #12526 from terfendail:avx2_resize_fix
2018-09-14 15:57:47 +00:00
Hamdi Sahloul
5d54def264
Add semicolons after CV_INSTRUMENT macros
2018-09-14 06:45:31 +09:00
Vitaly Tuzov
29770e13e8
Fixed bit-exact resize SIMD implementation for AVX2 baseline
2018-09-13 18:20:27 +03:00
Hamdi Sahloul
a39e0daacf
Utilize CV_UNUSED macro
2018-09-07 20:33:52 +09:00
Vitaly Tuzov
f9a5c4d181
Fixed bit-exact resize wide intrinsics implementation for 16U
2018-09-03 20:37:25 +03:00
Vitaly Tuzov
e345cb03d5
Bit-exact resize reworked to use wide intrinsics ( #12038 )
...
* Bit-exact resize reworked to use wide intrinsics
* Reworked bit-exact resize row data loading
* Added bit-exact resize row data loaders for SIMD256 and SIMD512
* Fixed type punned pointer dereferencing warning
* Reworked loading of source data for SIMD256 and SIMD512 bit-exact resize
2018-08-31 16:54:05 +03:00
Alexander Alekhin
b09a4a98d4
opencv: Use cv::AutoBuffer<>::data()
2018-07-04 19:11:29 +03:00
gnthibault
b46fef327e
Fixed Assertin error due to Size.area() overflowing
2018-06-08 11:22:36 +02:00
Alexander Alekhin
5d36ee2fe7
imgproc: apply CV_OVERRIDE/CV_FINAL
2018-03-28 17:57:59 +03:00
Maksim Shabunin
8b87c4b96a
Fixed several warnings produced by clang 6 and static analyzers
2018-01-16 15:26:28 +03:00
Alexander Alekhin
8acd05f12a
Merge pull request #10421 from cezheng:patch-1
2018-01-05 09:24:33 +00:00
Vadim Pisarevsky
3f68d6d8a7
Merge pull request #10392 from terfendail:bitexact_fallback
2017-12-22 13:23:55 +00:00
Vitaly Tuzov
5fdb42a7c9
Added fallback to generic linear resize in case bit-exact resize of provided matrix isn't supported
2017-12-22 14:29:50 +03:00
Ce Zheng
602b08d9c7
Update resize inline comments
...
Reading through the implementation, I feel this line of comment is not consistent with the actually code, so this is for correcting it.
2017-12-22 16:03:12 +08:00
Vitaly Tuzov
019162486c
Disabled universal intrinsic based implementation for bit-exact resize of 3-channel images
2017-12-22 10:08:30 +03:00
Vitaly Tuzov
1eb2fa9efb
Added universal intrinsics based implementations for CV_8UC2, CV_8UC3, CV_8UC4 bit-exact resizes.
2017-12-20 17:17:10 +03:00
Vitaly Tuzov
51cb56ef2c
Implementation of bit-exact resize. Internal calls to linear resize updated to use bit-exact version. ( #9468 )
2017-12-13 15:00:38 +03:00
Maksim Shabunin
184daa155f
Fixed minor issues reported by GCC 7.2
2017-11-03 18:06:39 +03:00
Vitaly Tuzov
e8caa9b5c0
removed unused interpolateLinear
2017-08-31 15:34:27 +03:00
Vitaly Tuzov
b1f46b6d69
Move resize implementation to separate file
2017-08-31 14:36:19 +03:00