
2600 Commits

Author SHA1 Message Date
Alexander Alekhin
e42560bed5 Merge pull request #15659 from malfet:use-atomic-in-getExpTab32f 2019-10-12 20:27:58 +00:00
Alexander Alekhin
d6630ab35b Merge pull request #15655 from malfet:use-atomic-in-parallel-for 2019-10-12 20:26:15 +00:00
Alexander Alekhin
65573784c4 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-09 19:46:18 +00:00
Maksim Shabunin
1ca74c3c03 Merge pull request #15544 from mshabunin:disable_posix_memalign
* Disable posix_memalign by default

* core: fix memalign parameter handling
2019-10-09 14:06:12 +03:00
Marcin Tolysz
3fd36c1be1 Merge pull request #15658 from tolysz:patch-1
* Cuda + OpenGL on ARM

There might be multiple ways of getting OpenCV to compile on the Tegra (NVIDIA Jetson) platform, but most of them modify the CUDA (8, 9, 10, ...) source code; this one fixes it for all installations.
( https://devtalk.nvidia.com/default/topic/1007290/jetson-tx2/building-opencv-with-opengl-support-/post/5141945/#5141945 et al.).
This approach is exactly the same as the one proposed there, but the code change happens in OpenCV.

* Updated.
The linked post mentions CUDA 8 and 9; I have CUDA 10 and 10.1 and can confirm it is still defined this way.
NVIDIA is probably using some other "secret" backend with Jetson.
2019-10-09 11:38:10 +03:00
Nikita Shulga
ec37364762 Use std::atomic in getExpTab32f and getLogTab32f
Reads and writes to volatile bool are not guaranteed to be atomic.
2019-10-07 16:35:07 -07:00
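The fix described above can be illustrated with a lazily built lookup table. This is a minimal sketch, not OpenCV's code: `getTable` and `kTableSize` are hypothetical names, and the table contents are arbitrary. It shows `std::atomic<bool>` with acquire/release ordering (plus a mutex for the one-time build) in place of a `volatile bool`, which provides neither atomicity nor inter-thread ordering.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cmath>
#include <mutex>

// Hypothetical table size; the real OpenCV tables differ.
constexpr int kTableSize = 64;

// Returns a lazily initialized table of exp(i / kTableSize).
// Double-checked locking: a thread that reads `true` with acquire
// ordering is guaranteed to also see the fully written table.
const std::array<float, kTableSize>& getTable()
{
    static std::array<float, kTableSize> table;
    static std::atomic<bool> initialized{false};
    static std::mutex initMutex;

    if (!initialized.load(std::memory_order_acquire))
    {
        std::lock_guard<std::mutex> lock(initMutex);
        // Re-check under the lock: another thread may have finished first.
        if (!initialized.load(std::memory_order_relaxed))
        {
            for (int i = 0; i < kTableSize; ++i)
                table[i] = static_cast<float>(std::exp(static_cast<double>(i) / kTableSize));
            // Release store pairs with the acquire load above.
            initialized.store(true, std::memory_order_release);
        }
    }
    return table;
}
```

Since C++11, a function-local `static` initialized from a function call is already initialized thread-safely, so `static const auto table = buildTable();` is often a simpler alternative; the explicit flag is kept here only because it matches the flag-based structure the commit message refers to.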
Nikita Shulga
23288b7cb5 Use atomic operations to modify flagNestedParallelFor
This ensures uniform behavior on any C++11 compliant compiler
2019-10-07 16:26:30 -07:00
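A sketch of an atomic nesting flag, assuming its purpose is to make an inner parallel loop fall back to serial execution while an outer one is running. `nestedParallelFlag` and `runParallelFor` are hypothetical names, and the loop bodies are serial stand-ins for real parallel dispatch. `std::atomic<bool>::exchange` sets the flag and returns its previous value in one indivisible step, so two callers cannot both conclude they are the outermost loop.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical flag in the spirit of flagNestedParallelFor:
// true while an outer parallel loop is running.
static std::atomic<bool> nestedParallelFlag{false};

// Clears the flag when the outermost loop finishes, even on exceptions.
struct NestedFlagReset
{
    ~NestedFlagReset() { nestedParallelFlag.store(false); }
};

// Returns true if this call took the "parallel" path, false if it ran
// serially because another parallel loop was already active.
bool runParallelFor(int begin, int end, const std::function<void(int)>& body)
{
    // exchange() sets the flag and returns the previous value atomically,
    // so only one caller can observe `false` and become the outer loop.
    if (nestedParallelFlag.exchange(true))
    {
        for (int i = begin; i < end; ++i)
            body(i);  // nested call: run serially, leave the flag set
        return false;
    }
    NestedFlagReset reset;
    for (int i = begin; i < end; ++i)
        body(i);      // serial stand-in for dispatching to worker threads
    return true;
}
```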
Sayed Adel
f2fe6f40c2 Merge pull request #15510 from seiko2plus:issue15506
* core: rework and optimize SIMD implementation of dotProd

  - add new universal intrinsics v_dotprod[int32], v_dotprod_expand[u&int8, u&int16, int32], v_cvt_f64(int64)
  - add a boolean param to all v_dotprod & v_dotprod_expand intrinsics that changes the order of addition between
    pairs on some platforms, to reach maximum optimization when only the sum across all lanes matters
  - fix clang build on ppc64le
  - support wide universal intrinsics for dotProd_32s
  - remove raw SIMD and activate universal intrinsics for dotProd_8
  - implement SIMD optimization for dotProd_s16&u16
  - extend performance test data types of dotprod
  - fix GCC VSX workaround of vec_mule and vec_mulo (in little-endian it must be swapped)
  - optimize v_mul_expand(int32) on VSX

* core: remove boolean param from v_dotprod&_expand and implement v_dotprod_fast&v_dotprod_expand_fast

  these changes were made based on "terfendail"'s review
2019-10-07 22:01:35 +03:00
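To make the "only the sum across all lanes matters" distinction concrete, the following scalar model mirrors the documented behavior of `v_dotprod` on 16-bit lanes (multiply lane by lane, then add adjacent pairs into 32-bit lanes). The pairing used in `dotprodFastModel` is an assumption chosen for illustration; the actual lane order of `v_dotprod_fast` is platform-specific and is not specified here. Both models produce the same horizontal sum.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Lanes16 = std::array<int16_t, 8>;
using Lanes32 = std::array<int32_t, 4>;

// Scalar model of a pairwise dot product: out[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1].
// Assumes each pair sum fits in int32 (it overflows only when both
// products are (-32768)*(-32768)).
Lanes32 dotprodModel(const Lanes16& a, const Lanes16& b)
{
    Lanes32 out{};
    for (int i = 0; i < 4; ++i)
        out[i] = int32_t(a[2 * i]) * b[2 * i] + int32_t(a[2 * i + 1]) * b[2 * i + 1];
    return out;
}

// Hypothetical "fast" model: same products, paired as lane i with lane i+4.
// Individual lanes differ from dotprodModel; the total does not.
Lanes32 dotprodFastModel(const Lanes16& a, const Lanes16& b)
{
    Lanes32 out{};
    for (int i = 0; i < 4; ++i)
        out[i] = int32_t(a[i]) * b[i] + int32_t(a[i + 4]) * b[i + 4];
    return out;
}

// Horizontal sum across all lanes (what v_reduce_sum computes).
int32_t reduceSum(const Lanes32& v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```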
Suleyman TURKMEN
c0489963bb Update copy.cpp 2019-10-07 11:59:52 +03:00
Alexander Alekhin
626bfbf309 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-05 15:45:31 +00:00
Alexander Alekhin
98fc098216 Merge pull request #15646 from alalek:fix_avx512_detection 2019-10-05 15:30:09 +00:00
Alexander Alekhin
22d0c57a1c Merge pull request #15602 from alalek:core_softfloat_ubsan_shift 2019-10-05 15:27:35 +00:00
Alexander Alekhin
bdc097495a fix avx512 detection
- renamed Cascade Lake AVX512_CEL => AVX512_CLX (align with Intel SDE tool)
- fixed CLX instruction sets (no IFMA/VBMI)
- added flag to bypass CPU baseline check: OPENCV_SKIP_CPU_BASELINE_CHECK
2019-10-05 11:03:57 +00:00
Alexander Alekhin
3fb6617d62 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-02 17:49:19 +03:00
Alexander Alekhin
77346d7286 core: workaround transform() inplace calls 2019-10-01 16:52:14 +03:00
Alexander Alekhin
ed9bca969c core: fix UBSAN in softfloat 2019-09-27 16:29:50 +03:00
Alexander Alekhin
bc927f9788 Merge pull request #15591 from alalek:core_persistence_fix 2019-09-26 12:59:37 +00:00
Alexander Alekhin
e2a5a6a05c Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-25 18:32:44 +00:00
Alexander Alekhin
677b94c92e Merge pull request #15579 from alalek:ocl_use_host_mem_ptr_flag 2019-09-25 15:12:59 +00:00
Alexander Alekhin
eacadf0e73 core(ocl): add flag OPENCV_OPENCL_ENABLE_MEM_USE_HOST_PTR
to control CL_MEM_USE_HOST_PTR usage
2019-09-25 15:12:36 +03:00
Alexander Alekhin
6e246ee58c core(persistence): fix reserveNodeSpace() implementation
- avoid data copying after buffer block shrink
- resize current block in case of single FileNode
2019-09-25 15:02:20 +03:00
Wenzhao Xiang
c2096771cb Merge pull request #15371 from Wenzhao-Xiang:gsoc_2019
[GSoC 2019] Improve the performance of JavaScript version of OpenCV (OpenCV.js)

* [GSoC 2019]

Improve the performance of JavaScript version of OpenCV (OpenCV.js):
1. Create the base of OpenCV.js performance test:
     This perf test is based on benchmark.js (https://benchmarkjs.com). `cvtColor`, `Resize` and `Threshold` are the first tests added to it.
2. Optimize the OpenCV.js performance by WASM threads:
     This optimization is based on the Web Worker API and SharedArrayBuffer, so it can only be used in a browser.
3. Optimize the OpenCV.js performance by WASM SIMD:
     Add WASM SIMD backend for OpenCV Universal Intrinsics. It's experimental as WASM SIMD is still in development.

* [GSoC2019] 

1. use short license header
2. fix documentation node issue
3. remove the unused `hasSIMD128()` api

* [GSoC2019]

1. fix emscripten define
2. use fallback function for f16

* [GSoC2019]

Fix rebase issue
2019-09-24 16:30:42 +03:00
Alexander Alekhin
a74fe2ec01 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-20 21:11:49 +00:00
mipsopen-fwu
b1ea91d8bd Merge pull request #15422 from mipsopen-fwu:msa-dev
* Added MSA implementations for mips platforms. Intrinsics for MSA and build scripts for MIPS platforms are added.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed some unused code in mips.toolchain.cmake.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Added comments for mips toolchain configuration and disabled compiler warnings for libpng.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Fixed the build error of unsupported opcode 'pause' when mips isa_rev is less than 2.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed FP16 related item in MSA option defines in OpenCVCompilerOptimizations.cmake.
2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h.
3. Removed hasSIMD128() in intrin_msa.hpp.
4. Define CPU_MSA as 150.
Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed unnecessary CV_SIMD128_64F guarding in intrin_msa.hpp.
2. Removed unnecessary CV_MSA related code block in dotProd_8u().

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Defined CPU_MSA_FLAGS_ON as "-mmsa".
2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed unused msa_mlal_u16() and msa_mlal_s16() from msa_macros.h.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
2019-09-20 19:52:48 +03:00
Alexander Alekhin
bea2c75452 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-05 14:29:22 +03:00
Alexander Alekhin
0a13633411 Merge pull request #15444 from alalek:ocl_fix_fft_kernel 2019-09-04 16:25:34 +00:00
Alexander Alekhin
8bd2720c28 core(ocl): fix fft kernel compilation
- error: variables in the local address space can only be declared in the outermost scope of a kernel function
2019-09-03 15:46:53 +03:00
David Carlier
6769ee3748 OpenCL: FreeBSD build fix 2019-09-02 18:30:53 +01:00
Alexander Alekhin
048ddbf9ee Merge pull request #15339 from pmur:dotprod-32s-vsx 2019-08-31 11:16:04 +00:00
Alexander Alekhin
2a6527e751 Merge pull request #15402 from ChipKerchner:normUnroll 2019-08-31 11:10:05 +00:00
ChipKerchner
288e6f9c07 Improve vectorization in the 'norm' functions 2019-08-27 12:15:19 -05:00
Alexander Alekhin
a7b954f655 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-08-23 19:24:37 +03:00
Kazuma Furuhashi
ccecd3405a Merge pull request #15007 from 284km:fixatypo
s/last_occurence/last_occurrence/
2019-08-22 17:32:25 +03:00
Alexander Alekhin
8b1fe8f6e0 core: fix stat SIMD code 2019-08-22 16:37:26 +03:00
Alexander Alekhin
4700722444 Merge pull request #15359 from mgehre:fix_dangling_pointer 2019-08-21 11:38:36 +00:00
Matthias Gehre
0e92ac2af7 modules/core/src/ocl.cpp: Fix dangling pointer
Detected by clang trunk:
```
opencv/modules/core/src/ocl.cpp:4337:37: warning: object backing the pointer will be destroyed at the end of the full-expression [-Wdangling]
        CV_OCL_CHECK_RESULT(retval, cv::format("clCreateBuffer(capacity=%lld) => %p", (long long int)entry.capacity_, (void*)entry.clBuffer_).c_str());
                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
opencv/modules/core/src/ocl.cpp:193:42: note: expanded from macro 'CV_OCL_CHECK_RESULT'
            if (0) { const char* msg_ = (msg); CV_UNUSED(msg_); /* ensure const char* type (cv::String without c_str()) */ } \
```
because `cv::format` yields a temporary std::string, and thus `msg_` points to a destroyed buffer.
2019-08-20 23:30:34 +02:00
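The lifetime rule behind this warning can be shown in isolation. This is a minimal sketch, assuming a hypothetical `formatCapacity()` that, like `cv::format`, returns a `std::string` by value; it is not the patch applied to `ocl.cpp`, only two common ways to avoid a dangling `c_str()` pointer.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical stand-in for cv::format(): returns a std::string by value,
// so every call produces a temporary.
std::string formatCapacity(long long capacity)
{
    char buf[64];
    std::snprintf(buf, sizeof(buf), "capacity=%lld", capacity);
    return std::string(buf);
}

// BROKEN pattern (kept as a comment): the temporary dies at the end of the
// full-expression, so `msg` would point to freed memory on the next line.
//   const char* msg = formatCapacity(42).c_str();

// Fix 1: bind the result to a named std::string so the buffer outlives its uses.
std::string describeCapacity(long long capacity)
{
    const std::string msg = formatCapacity(capacity);
    const char* p = msg.c_str();  // valid for as long as `msg` is alive
    return std::string("clCreateBuffer(") + p + ")";
}

// Fix 2: consume c_str() inside the same full-expression; the temporary
// is destroyed only at the terminating ';'.
std::size_t capacityMessageLength(long long capacity)
{
    return std::strlen(formatCapacity(capacity).c_str());
}
```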
Paul E. Murphy
33fb253a66 core: vectorize dotProd_32s
Use 4x FMA chains to sum on SIMD 128 FP64 targets. On
x86 this showed about a 1.4x improvement.

For PPC, do a full multiply (32x32->64b), convert to DP,
then accumulate. This may be slightly less precise for
some inputs, but it is 1.5x faster than the above, which
is about 1.5x faster than the FMA above, for a ~2.5x speedup.
2019-08-20 15:28:36 -05:00
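Both strategies can be modeled in scalar C++ to show where the precision difference comes from. This is a sketch, not the vectorized OpenCV code: the function names are hypothetical, and plain `std::fma` and `int64_t` arithmetic stand in for the SIMD intrinsics. With FMA, each multiply-add step is rounded once; in the 64-bit-multiply variant the exact product is rounded when converted to `double` and again when added, which is one way results can differ slightly for large inputs. Using several independent accumulators is a common way to keep successive FMAs from waiting on each other.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Dot product of int32 arrays accumulated in double using four
// independent FMA chains, combined once at the end.
double dotProd32sFma4(const int32_t* a, const int32_t* b, std::size_t len)
{
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= len; i += 4)
    {
        s0 = std::fma(double(a[i]),     double(b[i]),     s0);
        s1 = std::fma(double(a[i + 1]), double(b[i + 1]), s1);
        s2 = std::fma(double(a[i + 2]), double(b[i + 2]), s2);
        s3 = std::fma(double(a[i + 3]), double(b[i + 3]), s3);
    }
    // Tail elements that do not fill a group of four.
    for (; i < len; ++i)
        s0 = std::fma(double(a[i]), double(b[i]), s0);
    return (s0 + s1) + (s2 + s3);
}

// PPC-style variant: exact 32x32->64-bit product, then convert to
// double and accumulate.
double dotProd32sMul64(const int32_t* a, const int32_t* b, std::size_t len)
{
    double s = 0;
    for (std::size_t i = 0; i < len; ++i)
    {
        int64_t p = int64_t(a[i]) * b[i];  // exact: |p| <= 2^62
        s += double(p);                    // may round if |p| > 2^53
    }
    return s;
}
```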
luz.paz
fcc7d8dd4e Fix modules/ typos
Found using `codespell -q 3 -S ./3rdparty -L activ,amin,ang,atleast,childs,dof,endwhile,halfs,hist,iff,nd,od,uint`

backport of commit: ec43292e1e
2019-08-16 17:34:29 +03:00
luz.paz
ec43292e1e Fix modules/ typos
Found using `codespell -q 3 -S ./3rdparty -L activ,amin,ang,atleast,childs,dof,endwhile,halfs,hist,iff,nd,od,uint`
2019-08-15 18:02:09 -04:00
Alexander Alekhin
2ad0487cec Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-08-13 18:32:29 +00:00
Hugo Lindström
935067ee05 Merge pull request #15265 from hugolm84:wince-armv7-supports-neon
* WINCE 8.0 requires ARMv7 Thumb2 and thus has NEON instructions

* Only add NEON if on _ARM_
2019-08-09 18:01:37 +03:00
Alexander Alekhin
174b4ce29d Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-08-05 18:11:43 +00:00
Victor Romero
987bb2ca61 Fix build for UWP
backport of commit: f18cbd036a
2019-08-05 17:19:36 +03:00
Thang Tran
d659eb9327 core: fixed error message to avoid confusion 2019-08-04 17:17:03 +02:00
Victor Romero
f18cbd036a Merge pull request #15207 from vicroms:fix-uwp-build
Fix build for UWP (#15207)

* Guard non-WinRT calls to fix UWP build

* Remove unnecessary guard for WinRT
2019-08-03 22:53:38 +03:00
Alexander Alekhin
ba934ff1ce Merge pull request #15202 from hugolm84:support_build_shared_for_wince 2019-08-02 15:34:02 +00:00
Hugo Lindström
03fe1cb7fc Support building shared libraries on WINCE. 2019-08-01 15:28:04 +02:00
Maksim Shabunin
6d5ac67681 Restored IPP call reduction 2019-07-31 15:41:22 +03:00
Alexander Alekhin
0cf479dd5c Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-07-25 19:21:47 +00:00
Chip Kerchner
0db4fb1835 Merge pull request #15136 from ChipKerchner:dotProd_unroll
* Unroll multiply and add instructions in dotProd_32f - 35% faster.

* Eliminate unnecessary v_reduce_sum instructions.
2019-07-25 21:21:32 +03:00