Commit Graph

4452 Commits

Author SHA1 Message Date
Alexander Alekhin
055ffc0425 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-24 18:21:19 +00:00
Chip Kerchner
5a6a49405d Merge pull request #15738 from ChipKerchner:bugInt64x2Comparison
Fixing bug with comparison of v_int64x2 or v_uint64x2

* Casting v_uint64x2 to v_float64x2 and comparing does NOT work in all cases.  Rewrite using epi64 instructions - faster too.

* Fix bad merge.

* Fix equal comparsion for non-SSE4.1. Add test cases for v_int64x2 comparisons.

* Try to fix merge conflict.

* Only test v_int64x2 comparisons if CV_SIMD_64F

* Fix compiler warning.
2019-10-22 16:37:20 +03:00
Alexander Alekhin
938d8dce06 Merge pull request #15685 from pmur:cnz64f-simd 2019-10-18 20:19:40 +00:00
Alexander Alekhin
24ebca5c59 core(simd): v_reverse() for MSA backend 2019-10-18 16:43:03 +03:00
Alexander Alekhin
a2b3cd9a2c Merge pull request #15709 from alalek:js_simd_reverse 2019-10-17 13:14:50 +00:00
Alexander Alekhin
d31da08d43 Merge pull request #15708 from alalek:js_simd_support_1.38.48 2019-10-17 13:14:34 +00:00
Alexander Alekhin
ad5d14ec0e Merge pull request #15701 from alalek:issue_15691 2019-10-16 11:13:07 +00:00
Alexander Alekhin
bce653117f Merge pull request #15700 from alalek:issue_12943 2019-10-16 11:12:49 +00:00
Alexander Alekhin
ad172726c0 js(simd): v_reverse implementation 2019-10-15 18:46:08 +03:00
Alexander Alekhin
b1a8de0901 js(simd): support Emscripten 1.38.48-upstream 2019-10-15 15:39:22 +03:00
Alexander Alekhin
823884b064 core(alloc): force initialization of memalign flag
- before main() launch
2019-10-15 13:07:11 +03:00
Alexander Alekhin
6a7d1c15d3 core(ipp): skip huge input in flip()
- IPP/SSE4.2 works well
2019-10-14 18:26:19 +03:00
Alexander Alekhin
e42560bed5 Merge pull request #15659 from malfet:use-atomic-in-getExpTab32f 2019-10-12 20:27:58 +00:00
Alexander Alekhin
d6630ab35b Merge pull request #15655 from malfet:use-atomic-in-parallel-for 2019-10-12 20:26:15 +00:00
Chip Kerchner
027769bf5d Merge pull request #15662 from ChipKerchner:addVReverseIntrinsic
* New v_reverse HAL intrinsic for reversing the ordering of a vector

* Fix conflict.

* Try to resolve conflict again.

* Try one more time.

* Add _MM_SHUFFLE. Remove non-vectorize code in SSE2. Fix copy and paste issue with NEON.

* Change v_uint16x8 SSE2 version to use shuffles
2019-10-11 18:34:17 +03:00
Paul E. Murphy
ec91a3d59d core: vectorize countNonZero64f
Improves performance a bit. 2.2x on P9 and 2 - 3x on coffee lake
x86-64.
2019-10-11 09:02:46 -05:00
Alexander Alekhin
46206b4814 Merge tag '4.1.2' 2019-10-09 23:01:31 +00:00
Alexander Alekhin
4c71dbf0af release: OpenCV 4.1.2
OpenCV 4.1.2
2019-10-09 22:53:14 +00:00
Alexander Alekhin
65573784c4 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-09 19:46:18 +00:00
Alexander Alekhin
dd4f591d54 Merge tag '3.4.8' 2019-10-09 18:33:35 +03:00
Alexander Alekhin
6bdb9ca725 OpenCV release (3.4.8)
OpenCV 3.4.8
2019-10-09 14:42:29 +03:00
Maksim Shabunin
1ca74c3c03 Merge pull request #15544 from mshabunin:disable_posix_memalign
* Disable posix_memalign by default

* core: fix memalign parameter handling
2019-10-09 14:06:12 +03:00
Alexander Smorkalov
c3a588037a Merge pull request #15666 from seanm:Wnewline 2019-10-09 11:04:44 +00:00
Marcin Tolysz
3fd36c1be1 Merge pull request #15658 from tolysz:patch-1
* Cuda + OpenGL on ARM

There might be multiple ways of getting OpenCV compile on Tegra (NVIDIA Jetson) platform, but mainly they modify CUDA(8,9,10...) source code, this one fixes it for all installations. 
( https://devtalk.nvidia.com/default/topic/1007290/jetson-tx2/building-opencv-with-opengl-support-/post/5141945/#5141945 et al.).
This way is exactly the same as the one proposed but the code change happens in OpenCV.

* Updated,
The link provided mentions: cuda8 + 9, I have cuda 10 + 10.1 (and can confirm it is still defined this way).
NVIDIA is probably using some other "secret" backend with Jetson.
2019-10-09 11:38:10 +03:00
Sean McBride
24effe8cd6 Fixed clang -Wnewline-eof warning by adding newline to end of file 2019-10-09 10:12:09 +03:00
Nikita Shulga
ec37364762 Use std::atomic in getExpTab32f and getLogTab32f
Reads and writes to volatile bool are not guaranteed to be atomic.
2019-10-07 16:35:07 -07:00
Nikita Shulga
23288b7cb5 Use atomic operations to modify flagNestedParallelFor
This ensures uniform behavior on any C++11 compliant compiler
2019-10-07 16:26:30 -07:00
Sayed Adel
f2fe6f40c2 Merge pull request #15510 from seiko2plus:issue15506
* core: rework and optimize SIMD implementation of dotProd

  - add new universal intrinsics v_dotprod[int32], v_dotprod_expand[u&int8, u&int16, int32], v_cvt_f64(int64)
  - add a boolean param for all v_dotprod&_expand intrinsics that change the behavior of addition order between
    pairs in some platforms in order to reach the maximum optimization when the sum among all lanes is what only matters
  - fix clang build on ppc64le
  - support wide universal intrinsics for dotProd_32s
  - remove raw SIMD and activate universal intrinsics for dotProd_8
  - implement SIMD optimization for dotProd_s16&u16
  - extend performance test data types of dotprod
  - fix GCC VSX workaround of vec_mule and vec_mulo (in little-endian it must be swapped)
  - optimize v_mul_expand(int32) on VSX

* core: remove boolean param from v_dotprod&_expand and implement v_dotprod_fast&v_dotprod_expand_fast

  this changes made depend on "terfendail" review
2019-10-07 22:01:35 +03:00
Alexander Alekhin
7837ae0e19 Merge pull request #15654 from sturkmen72:patch-3 2019-10-07 16:15:04 +00:00
Marcin Tolysz
53400d86e2 Fix compiler warnings for latest cuda npp which defines this itself as:
```
#define NPP_VER_MAJOR 10
#define NPP_VER_MINOR 2
#define NPP_VER_PATCH 0
#define NPP_VER_BUILD 243

#define NPP_VERSION (NPP_VER_MAJOR * 1000 +     \
                     NPP_VER_MINOR *  100 +     \
                     NPP_VER_PATCH)
2019-10-07 11:45:26 +01:00
Suleyman TURKMEN
c0489963bb
Update copy.cpp 2019-10-07 11:59:52 +03:00
Alexander Alekhin
626bfbf309 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-05 15:45:31 +00:00
Alexander Alekhin
98fc098216 Merge pull request #15646 from alalek:fix_avx512_detection 2019-10-05 15:30:09 +00:00
Alexander Alekhin
22d0c57a1c Merge pull request #15602 from alalek:core_softfloat_ubsan_shift 2019-10-05 15:27:35 +00:00
Alexander Alekhin
bdc097495a fix avx512 detection
- renamed Cascade Lake AVX512_CEL => AVX512_CLX (align with Intel SDE tool)
- fixed CLX instruction sets (no IFMA/VBMI)
- added flag to bypass CPU baseline check: OPENCV_SKIP_CPU_BASELINE_CHECK
2019-10-05 11:03:57 +00:00
Alexander Alekhin
3fb6617d62 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-02 17:49:19 +03:00
Alexander Alekhin
77346d7286 core: workaround transform() inplace calls 2019-10-01 16:52:14 +03:00
Alexander Alekhin
ed9bca969c core: fix UBSAN in softfloat 2019-09-27 16:29:50 +03:00
Alexander Alekhin
bc927f9788 Merge pull request #15591 from alalek:core_persistence_fix 2019-09-26 12:59:37 +00:00
Alexander Alekhin
e2a5a6a05c Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-25 18:32:44 +00:00
Alexander Alekhin
677b94c92e Merge pull request #15579 from alalek:ocl_use_host_mem_ptr_flag 2019-09-25 15:12:59 +00:00
Alexander Alekhin
eacadf0e73 core(ocl): add flag OPENCV_OPENCL_ENABLE_MEM_USE_HOST_PTR
to control CL_MEM_USE_HOST_PTR usage
2019-09-25 15:12:36 +03:00
Alexander Alekhin
6e246ee58c core(persistence): fix reserveNodeSpace() implementation
- avoid data copying after buffer block shrink
- resize current block in case of single FileNode
2019-09-25 15:02:20 +03:00
Alexander Alekhin
d2cacac07a Merge pull request #15573 from alalek:build_cxx11_warnings 2019-09-24 22:08:55 +00:00
Wenzhao Xiang
c2096771cb Merge pull request #15371 from Wenzhao-Xiang:gsoc_2019
[GSoC 2019] Improve the performance of JavaScript version of OpenCV (OpenCV.js)

* [GSoC 2019]

Improve the performance of JavaScript version of OpenCV (OpenCV.js):
1. Create the base of OpenCV.js performance test:
     This perf test is based on benchmark.js(https://benchmarkjs.com). And first add `cvtColor`, `Resize`, `Threshold` into it.
2. Optimize the OpenCV.js performance by WASM threads:
     This optimization is based on Web Worker API and SharedArrayBuffer, so it can be only used in browser.
3. Optimize the OpenCV.js performance by WASM SIMD:
     Add WASM SIMD backend for OpenCV Universal Intrinsics. It's experimental as WASM SIMD is still in development.

* [GSoC2019] 

1. use short license header
2. fix documentation node issue
3. remove the unused `hasSIMD128()` api

* [GSoC2019]

1. fix emscripten define
2. use fallback function for f16

* [GSoC2019]

Fix rebase issue
2019-09-24 16:30:42 +03:00
Alexander Alekhin
3cf9185159 Merge pull request #15538 from terfendail:wui_checkany 2019-09-23 15:52:24 +00:00
Maksim Shabunin
c8abf2ad14 backport: fixed warnings produced by clang-9.0.0
ea3dc78986
83fc27cb99
2019-09-23 18:36:18 +03:00
Alexander Alekhin
fcc69d5a60 core(test): fix check conditions 2019-09-22 11:28:41 +00:00
Alexander Alekhin
a74fe2ec01 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-20 21:11:49 +00:00
mipsopen-fwu
b1ea91d8bd Merge pull request #15422 from mipsopen-fwu:msa-dev
* Added MSA implementations for mips platforms. Intrinsics for MSA and build scripts for MIPS platforms are added.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed some unused code in mips.toolchain.cmake.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Added comments for mips toolchain configuration and disabled compiling warnings for libpng.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Fixed the build error of unsupported opcode 'pause' when mips isa_rev is less than 2.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed FP16 related item in MSA option defines in OpenCVCompilerOptimizations.cmake.
2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h.
3. Removed hasSIMD128() in intrin_msa.hpp.
4. Define CPU_MSA as 150.
Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed unnecessary CV_SIMD128_64F guarding in intrin_msa.hpp.
2. Removed unnecessary CV_MSA related code block in dotProd_8u().

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Defined CPU_MSA_FLAGS_ON as "-mmsa".
2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed unused msa_mlal_u16() and msa_mlal_s16 from msa_macros.h.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
2019-09-20 19:52:48 +03:00