opencv

Author	SHA1	Message	Date
Alexander Alekhin	0d2857a242	Merge pull request #21152 from rogday:fix_defaults	2021-11-29 22:39:27 +00:00
Alexander Alekhin	17d99e6266	Merge pull request #21142 from alalek:dnn_two_inputs_ocl_fp16_3.4	2021-11-29 21:44:59 +00:00
Andrew Ryrie	ea7d4be3f8	Merge pull request #20658 from smbz:lstm_optimisation * dnn: LSTM optimisation This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm. fastGEMM1T is already used by the fully-connected layer. This commit involves two minor modifications: - Use unaligned access. I don't believe this involves any performance hit on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned. - Allow for weight matrices where the number of columns is not a multiple of 8. I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on. * Fix warning about initialisation order * Remove C++11 syntax * Fix build when AVX(2) is not available In this case the CV_TRY_X macros are defined to 0, rather than being undefined. * Minor changes as requested: - Don't check hardware support for AVX(2) when dispatch is disabled for these - Add braces * Fix out-of-bounds access in fully connected layer The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this. The new tail handling does not round the vecsize upwards like this but it does require that the vecsize is at least 8. To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding (which makes more sense anyway). This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems. * Improve tail mask handling - Use static array for generating tail masks (as requested) - Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs * Revert whitespace change * Improve readability of conditions for using AVX * dnn(lstm): minor coding style changes, replaced left aligned load	2021-11-29 21:43:00 +00:00
Smirnov Egor	05db8784ae	fix Clip, LeakyReLU, LRN, Split defaults	2021-11-29 20:20:34 +03:00
Alexander Alekhin	58b06222ff	dnn(DataLayer): fix CPU/OpenCL code paths for FP16 handling	2021-11-28 07:44:05 +00:00
Alexander Alekhin	58dc397930	dnn(test): add two_inputs test with FP32/U8 data types - remove similar test from IE scope under HAVE_INF_ENGINE	2021-11-28 07:44:04 +00:00
yuki takehara	a6277370ca	Merge pull request #21107 from take1014:remove_assert_21038 resolves #21038 * remove C assert * revert C header * fix several points in review * fix test_ds.cpp	2021-11-27 18:34:52 +00:00
Alexander Alekhin	985aa0423d	dnn(test): update InferenceEngine tests	2021-11-26 18:46:26 +00:00
Alexander Alekhin	8041ab8a61	Merge pull request #21025 from alalek:issue_21004 * dnn(ocl4dnn): fix LRN layer accuracy problems - FP16 intermediate computation is not accurate and may provide NaN values * dnn(test): update tolerance for FP16	2021-11-12 01:54:07 +03:00
ZaKiiiiiiiii	98b6ce353c	Merge pull request #20904 from Crayon-new:fix_bug_in_maxLayer fix bug: wrong output dimension when "keep_dims" is false in pooling layer. * fix bug in max layer * code align * delete permute layer and add test case * add name assert * check other cases * remove c++11 features * style:add "const" remove assert * style:sanitize file names	2021-11-09 19:24:04 +03:00
Alexander Alekhin	edf533c83e	Merge pull request #21007 from alalek:cmake_dnn_fix_wrong_tengine_order	2021-11-04 12:28:27 +00:00
Alexander Alekhin	c1d61c88e9	dnn(cmake): don't hijack OpenCL options with Tengine	2021-11-04 09:59:19 +00:00
Alexander Alekhin	d484939c02	Merge pull request #20999 from alalek:dnn_replace_deprecated_calls dnn(protobuf): replace deprecated calls * dnn: replace deprecated ByteSize() => ByteSizeLong() * dnn: replace deprecated calls, use GetRepeatedFieldRef	2021-11-03 15:59:36 +00:00
rogday	b3f966e2ca	Merge pull request #20883 from rogday:eltwise_refactoring * backport elementwise_layers refactor * keep NULL	2021-10-19 13:29:22 +00:00
Alexander Alekhin	53d6c9b9c0	Merge pull request #20860 from rogday:sum_fix	2021-10-12 15:36:32 +00:00
Smirnov Egor	238dbffb48	change asserts for Sum	2021-10-11 20:59:44 +03:00
Smirnov Egor	a9d7b6eab7	fix const - input and remove unimplemented function	2021-10-11 18:58:10 +03:00
Alexander Alekhin	81e7988eb9	Merge pull request #20840 from alalek:dnn_ocl_cleanup_code	2021-10-08 05:07:51 +00:00
Alexander Alekhin	8c2dd5fb9a	dnn(ocl4dnn): cleanup dead code, improve logging	2021-10-08 00:39:40 +00:00
Alexander Alekhin	724e04e979	dnn(ocl4dnn): add extra checks to convolution layer - prevent running code over unsupported/non-tested configurations - prevent integer div by zero	2021-10-07 23:18:32 +00:00
Oliver Kuckertz	a3d7811f24	Merge pull request #20725 from mologie:fix-dnn-tf-on-arm * dnn: fix unaligned memory access crash on armv7 The getTensorContent function would return a Mat pointing to some member of a Protobuf-encoded message. Protobuf does not make any alignment guarantees, which results in a crash on armv7 when loading models while bit 2 is set in /proc/cpu/alignment (or the relevant kernel feature for alignment compatibility is disabled). Any read attempt from the previously unaligned data member would send SIGBUS. As a workaround, this commit makes an aligned copy via existing clone functionality in getTensorContent. The unsafe copy=false option is removed. Unfortunately, a rather crude hack in PReLUSubgraph in fact writes(!) to the Protobuf message. We limit ourselves to fixing the alignment issues in this commit, and add getTensorContentRefUnaligned to cover the write case with a safe memcpy. A FIXME marks the issue. * dnn: reduce amount of .clone() calls * dnn: update FIXME comment Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>	2021-10-06 16:41:05 +00:00
Alexander Alekhin	646924fce8	dnn(pytest/test_input_3d): reload model between switching targets	2021-10-05 23:23:08 +00:00
Alexander Alekhin	ebef84e9ea	pre: OpenCV 3.4.16 (version++)	2021-10-04 20:47:07 +00:00
Alexander Alekhin	f977d10a19	dnn(ocl): fix conv DWCONV workgroup	2021-10-01 18:52:07 +00:00
Alexander Alekhin	846317ef37	dnn(ocl): fix conv BASIC workgroup	2021-09-29 14:55:46 +00:00
SamFC10	9c5d7716e2	fix for unsqueeze opset version 13	2021-09-17 17:40:57 +05:30
Alexander Alekhin	46fd26e366	Merge pull request #20699 from alalek:dnn_perf_update_convolution_tests	2021-09-16 17:11:32 +00:00
rogday	c410d7a97d	Merge pull request #20671 from rogday:yolov4x-mish Add support for YOLOv4x-mish * backport to 3.4 for supporting yolov4x-mish * add YOLOv4x-mish test * address review comments Co-authored-by: Guo Xu <guoxu@1school.com.cn>	2021-09-14 17:49:49 +00:00
Alexander Alekhin	6e66a9222a	dnn(onnx): fix format specifier	2021-09-11 22:26:52 +00:00
Zihao Mu	51b03b87e6	BiasAdd could load Const from second place.	2021-09-11 15:34:41 +00:00
Alexander Alekhin	1aacb9bb15	dnn(perf): update convolution tests	2021-09-10 13:11:02 +00:00
Alexander Alekhin	6ace801418	Merge pull request #20661 from alalek:dnn_ocl_fix_gemm_like_kernel	2021-09-10 11:58:52 +00:00
rogday	d31b93b513	Merge pull request #20674 from rogday:prelu_slope Fix PReLU negative slope access pattern * fix prelu negative slope access pattern * change begin() to ptr()	2021-09-10 11:07:16 +00:00
rogday	4807cd8a6e	Merge pull request #20605 from rogday:split_slice_shenanigans Add Normalize subgraph, fix Slice, Mul and Expand * Add Normalize subgraph, support for starts<0 and axis<0 in Slice, Mul broadcasting in the middle and fix Expand's unsqueeze * remove todos * remove range-based for loop * address review comments * change >> to > > in template * fix indexation * fix expand that does nothing	2021-09-09 14:41:40 +03:00
Alexander Alekhin	35e824c287	dnn(ocl): fix out of bound access in GEMM-like kernels - dropped usage of CreateSubBuffer() - buffers lifetime management issue - fixed elementwise offset - avoid out of bounds read access	2021-09-06 18:17:21 +00:00
Alexander Alekhin	5578ad5e14	dnn(ocl): fix automatic globalsize adjusting - if kernel code doesn't support that	2021-09-06 03:11:29 +00:00
Alexander Alekhin	0a43b23275	Merge pull request #20651 from alalek:issue_18361	2021-09-04 18:22:12 +00:00
Alexander Alekhin	7967683296	Merge pull request #20648 from alalek:issue_20615	2021-09-04 18:21:58 +00:00
Alexander Alekhin	5b2c016834	dnn(ocl): avoid out of buffer access in copyWeightsSwizzled	2021-09-04 15:45:59 +00:00
Alexander Alekhin	407adc7061	dnn(ocl): fix buffer offsets in IDLF kernel - drop CreateSubBuffer - fix FUSED_CONV_ELTWISE mode	2021-09-04 15:28:35 +00:00
rogday	d0e612dc36	Merge pull request #20647 from rogday:resize_concat_optimization Fix resize+concat optimization * fix resize+concat optimization * add comment and fix indentation	2021-09-03 12:32:29 +00:00
Alexander Alekhin	060a76dc3e	Merge pull request #20573 from rogday:onnx_scale_fix	2021-09-01 14:09:17 +00:00
WJJ1995	edc442afdb	Merge pull request #20511 from wjj19950828:add_humanseg_support_0806 * support PPSeg model for dnn module * fixed README for CI * add test case * fixed bug * deal with comments * rm dnn_model_runner * update test case * fixed bug for testcase * update testcase	2021-09-01 10:10:05 +00:00
Alexander Alekhin	ae6fabc6fe	dnn(ocl): drop CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE check - it is a hint and it should not block kernel execution	2021-08-30 20:40:14 +00:00
Vincent Rabaud	38d0063c36	Do not use deprecated ReleaseCleared in protobuf library. This is to make code work with protobuf arenas for memory management (ReleaseCleared is incompatible). The cleaning of the memory is also simpler.	2021-08-26 15:36:22 +02:00
Alexander Alekhin	77a5c43d50	Merge pull request #20586 from alalek:issue_20585	2021-08-21 17:22:58 +00:00
Alexander Alekhin	f28e4b86fb	dnn(ocl): fix top initialization in verifyResult	2021-08-21 16:04:13 +00:00
Alexander Alekhin	a9817e9127	Merge pull request #20556 from rogday:onnx_split_sum_fix	2021-08-20 08:10:18 +00:00
Vincent Rabaud	9cfa84313c	Use the one-argument version of SetTotalBytesLimit. The two-argument version has been deprecated, cf. https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream	2021-08-19 14:31:29 +02:00
Smirnov Egor	fe625a558e	fix hasDynamicShapes for batch_size and fix axis selection in Scale layer	2021-08-18 19:22:24 +03:00
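
Note on a3d7811f24: the commit message describes replacing direct reads of possibly unaligned Protobuf data with an aligned copy, and using a safe memcpy where a copy must be made byte by byte. The following is a minimal sketch of that general idea in plain C++, not the code from the patch. `copyFloatsAligned` is a hypothetical helper name chosen for illustration; it is not an OpenCV or Protobuf function, and the actual commit relies on OpenCV's existing clone functionality.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (assumption, not part of OpenCV): copy `count` floats
// out of a byte buffer that may start at any address, for example a field
// inside a serialized message. std::memcpy places no alignment requirement
// on its source, so this avoids dereferencing a misaligned float pointer.
std::vector<float> copyFloatsAligned(const unsigned char* bytes, std::size_t count)
{
    std::vector<float> out(count);  // storage is suitably aligned for float
    if (count > 0)
        std::memcpy(out.data(), bytes, count * sizeof(float));
    return out;
}
```

Casting `bytes` to `const float*` and reading through it would be the pattern that can raise SIGBUS on targets with strict alignment checking; the copy trades a small amount of memory traffic for portability.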

1328 Commits