Commit Graph

30 Commits

Author SHA1 Message Date
Smirnov Egor
9c84749e2c backport YOLOv4x-mish new_coords CUDA implementation 2021-10-08 14:14:49 +03:00
Alexander Alekhin
24fcb7f813 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-09-25 17:50:00 +00:00
rogday
c410d7a97d
Merge pull request #20671 from rogday:yolov4x-mish
Add support for YOLOv4x-mish

* backport to 3.4 for supporting yolov4x-mish

* add YOLOv4x-mish test

* address review comments

Co-authored-by: Guo Xu <guoxu@1school.com.cn>
2021-09-14 17:49:49 +00:00
Alexander Alekhin
ca8c3dd9b5 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-03-22 12:05:23 +00:00
Liubov Batanina
c0dd82fb53
Merge pull request #19632 from l-bat:lb/ie_arm_target
Added OpenVINO ARM target

* Added IE ARM target

* Added OpenVINO ARM target

* Delete ARM target

* Detect ARM platform

* Changed device name in ArmPlugin

* Change ARM detection
2021-03-20 11:20:02 +00:00
Alexander Alekhin
e3d502310f Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2020-06-19 07:33:50 +00:00
Liubov Batanina
d93b6be3cc Changed StridedSlice to VariadicSplit in Region layer 2020-06-17 10:02:53 +03:00
Liubov Batanina
d3aaf2d3a3
Merge pull request #17371 from l-bat:nms_model
* Fix NMS bug in DetectionModel

* Fixed comments

* Refactoring
2020-05-28 22:54:19 +00:00
YashasSamaga
3c35b563d7 add scale_x_y parameter to region 2020-05-10 16:53:28 +05:30
Alexander Alekhin
09799402f9 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2020-05-06 19:53:51 +00:00
Liubov Batanina
a5696da9ec
Merge pull request #17185 from l-bat:yolo_v4
* Support Yolov4

* Skip Mish on OpenVINO 2020.2

* Revert Mish

* Refactoring
2020-04-30 16:53:44 +03:00
Alexander Alekhin
bfcc136dc7 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2020-04-21 21:32:51 +00:00
Liubov Batanina
8badf7f354
Merge pull request #17112 from l-bat:ie_region
* Support nGraph Region

* Support region since OpenVINO 2020.2

* Skip myriad
2020-04-21 09:26:58 +00:00
Yashas Samaga B L
613c12e590 Merge pull request #14827 from YashasSamaga:cuda4dnn-csl-low
CUDA backend for the DNN module

* stub cuda4dnn design

* minor fixes for tests and doxygen

* add csl public api directory to module headers

* add low-level CSL components

* add high-level CSL components

* integrate csl::Tensor into backbone code

* switch to CPU iff unsupported; otherwise, fail on error

* add fully connected layer

* add softmax layer

* add activation layers

* support arbitary rank TensorDescriptor

* pass input wrappers to `initCUDA()`

* add 1d/2d/3d-convolution

* add pooling layer

* reorganize and refactor code

* fixes for gcc, clang and doxygen; remove cxx14/17 code

* add blank_layer

* add LRN layer

* add rounding modes for pooling layer

* split tensor.hpp into tensor.hpp and tensor_ops.hpp

* add concat layer

* add scale layer

* add batch normalization layer

* split math.cu into activations.cu and math.hpp

* add eltwise layer

* add flatten layer

* add tensor transform api

* add asymmetric padding support for convolution layer

* add reshape layer

* fix rebase issues

* add permute layer

* add padding support for concat layer

* refactor and reorganize code

* add normalize layer

* optimize bias addition in scale layer

* add prior box layer

* fix and optimize normalize layer

* add asymmetric padding support for pooling layer

* add event API

* improve pooling performance for some padding scenarios

* avoid over-allocation of compute resources to kernels

* improve prior box performance

* enable layer fusion

* add const layer

* add resize layer

* add slice layer

* add padding layer

* add deconvolution layer

* fix channelwise  ReLU initialization

* add vector traits

* add vectorized versions of relu, clipped_relu, power

* add vectorized concat kernels

* improve concat_with_offsets performance

* vectorize scale and bias kernels

* add support for multi-billion element tensors

* vectorize prior box kernels

* fix address alignment check

* improve bias addition performance of conv/deconv/fc layers

* restructure code for supporting multiple targets

* add DNN_TARGET_CUDA_FP64

* add DNN_TARGET_FP16

* improve vectorization

* add region layer

* improve tensor API, add dynamic ranks

1. use ManagedPtr instead of a Tensor in backend wrapper
2. add new methods to tensor classes
  - size_range: computes the combined size of for a given axis range
  - tensor span/view can be constructed from a raw pointer and shape
3. the tensor classes can change their rank at runtime (previously rank was fixed at compile-time)
4. remove device code from tensor classes (as they are unused)
5. enforce strict conditions on tensor class APIs to improve debugging ability

* fix parametric relu activation

* add squeeze/unsqueeze tensor API

* add reorg layer

* optimize permute and enable 2d permute

* enable 1d and 2d slice

* add split layer

* add shuffle channel layer

* allow tensors of different ranks in reshape primitive

* patch SliceOp to allow Crop Layer

* allow extra shape inputs in reshape layer

* use `std::move_backward` instead of `std::move` for insert in resizable_static_array

* improve workspace management

* add spatial LRN

* add nms (cpu) to region layer

* add max pooling with argmax ( and a fix to limits.hpp)

* add max unpooling layer

* rename DNN_TARGET_CUDA_FP32 to DNN_TARGET_CUDA

* update supportBackend to be more rigorous

* remove stray include from preventing non-cuda build

* include op_cuda.hpp outside condition #if

* refactoring, fixes and many optimizations

* drop DNN_TARGET_CUDA_FP64

* fix gcc errors

* increase max. tensor rank limit to six

* add Interp layer

* drop custom layers; use BackendNode

* vectorize activation kernels

* fixes for gcc

* remove wrong assertion

* fix broken assertion in unpooling primitive

* fix build errors in non-CUDA build

* completely remove workspace from public API

* fix permute layer

* enable accuracy and perf. tests for DNN_TARGET_CUDA

* add asynchronous forward

* vectorize eltwise ops

* vectorize fill kernel

* fixes for gcc

* remove CSL headers from public API

* remove csl header source group from cmake

* update min. cudnn version in cmake

* add numerically stable FP32 log1pexp

* refactor code

* add FP16 specialization to cudnn based tensor addition

* vectorize scale1 and bias1 + minor refactoring

* fix doxygen build

* fix invalid alignment assertion

* clear backend wrappers before allocateLayers

* ignore memory lock failures

* do not allocate internal blobs

* integrate NVTX

* add numerically stable half precision log1pexp

* fix indentation, following coding style,  improve docs

* remove accidental modification of IE code

* Revert "add asynchronous forward"

This reverts commit 1154b9da9da07e9b52f8a81bdcea48cf31c56f70.

* [cmake] throw error for unsupported CC versions

* fix rebase issues

* add more docs, refactor code, fix bugs

* minor refactoring and fixes

* resolve warnings/errors from clang

* remove haveCUDA() checks from supportBackend()

* remove NVTX integration

* changes based on review comments

* avoid exception when no CUDA device is present

* add color code for CUDA in Net::dump
2019-10-21 14:28:00 +03:00
zuoshaobo
8b52eabd48 fix the region layer's output anchor size normalization error 2019-03-15 08:41:11 -04:00
Alexander Alekhin
9d02d42afe dnn(ocl4dnn): don't use getUMat()
especially in CPU only processing
2018-10-05 15:24:51 +03:00
Dmitry Kurtaev
24ab751547 Merge pull request #12565 from dkurt:dnn_non_intel_gpu
* Remove isIntel check from deep learning layers

* Remove fp16->fp32 fallbacks where it's not necessary

* Fix Kernel::run to prevent localsize > globalsize
2018-09-26 16:27:00 +03:00
Marat K
38f8fc6c82 Merge pull request #12249 from kopytjuk:feature/region-layer-batch-mode
Feature/region layer batch mode (#12249)

* Add batch mode for Darknet networks.

Swap variables in test_darknet.

Adapt reorg layer to batch mode.

Adapt region layer.

Add OpenCL implementation.

Remove trailing whitespace.

Bugifx reorg opencl implementation.

Fix bug in OpenCL reorg.

Fix modulo bug.

Fix bug.

Reorg openCL.

Restore reorg layer opencl code.

OpenCl fix.

Work on openCL reorg.

Remove whitespace.

Fix openCL region layer implementation.

Fix bug.

Fix softmax region opencl bug.

Fix opencl bug.

Fix openCL bug.

Update aff_trans.cpp

When the fullAffine parameter is set to false, the estimateRigidTransform function maybe return empty, then the _localAffineEstimate function will be called, but the bug in it will result in incorrect results.

core(libva): support YV12 too

Added to CPU path only.
OpenCL code path still expects NV12 only (according to Intel OpenCL extension)

cmake: allow to specify own libva paths

via CMake:
- `-DVA_LIBRARIES=/opt/intel/mediasdk/lib64/libva.so.2\;/opt/intel/mediasdk/lib64/libva-drm.so.2`

android: NDK17 support

tested with NDK 17b (17.1.4828580)

Enable more deep learning tests using Intel's Inference Engine backend

ts: don't pass NULL for std::string() constructor

openvino: use 2018R3 defines

experimental version++

OpenCV version++

OpenCV 3.4.3

OpenCV version '-openvino'

openvino: use 2018R3 defines

Fixed windows build with InferenceEngine

dnn: fix variance setting bug for PriorBoxLayer

- The size of second channel should be size[2] of output tensor,
- The Scalar should be {variance[0], variance[0], variance[0], variance[0]}
  for _variance.size() == 1 case.

Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>

Fix lifetime of networks which are loaded from Model Optimizer IRs

Adds a small note describing BUILD_opencv_world (#12332)

* Added a mall note describing BUILD_opencv_world cmake option to the Installation in Windows tutorial.

* Made slight changes in BUILD_opencv_world documentation.

* Update windows_install.markdown

improved grammar

Update opengl_interop.cpp

resolves #12307

java: fix LIST_GET macro

fix typo

Added option to fail on missing testdata

Fixed that object_detection.py does not work in python3.

cleanup: IPP Async (IPP_A)

except header file with conversion routines (will be removed in OpenCV 4.0)

imgcodecs: add null pointer check

Include preprocessing nodes to object detection TensorFlow networks (#12211)

* Include preprocessing nodes to object detection TensorFlow networks

* Enable more fusion

* faster_rcnn_resnet50_coco_2018_01_28 test

countNonZero function reworked to use wide universal intrinsics instead of SSE2 intrinsics

resolve #5788

imgcodecs(webp): multiple fixes

- don't reallocate passed 'img' (test fixed - must use IMREAD_UNCHANGED / IMREAD_ANYCOLOR)
- avoid memory DDOS
- avoid reading of whole file during header processing
- avoid data access after allocated buffer during header processing (missing checks)
- use WebPFree() to free allocated buffers (libwebp >= 0.5.0)
- drop unused & undefined `.close()` method
- added checks for channels >= 5 in encoder

ml: fix adjusting K in KNearest (#12358)

dnn(perf): fix and merge Convolution tests

- OpenCL tests didn't run any OpenCL kernels
- use real configuration from existed models (the first 100 cases)
- batch size = 1

dnn(test): use dnnBackendsAndTargets() param generator

Bit-exact resize reworked to use wide intrinsics (#12038)

* Bit-exact resize reworked to use wide intrinsics

* Reworked bit-exact resize row data loading

* Added bit-exact resize row data loaders for SIMD256 and SIMD512

* Fixed type punned pointer dereferencing warning

* Reworked loading of source data for SIMD256 and SIMD512 bit-exact resize

Bit-exact GaussianBlur reworked to use wide intrinsics (#12073)

* Bit-exact GaussianBlur reworked to use wide intrinsics

* Added v_mul_hi universal intrinsic

* Removed custom SSE2 branch from bit-exact GaussianBlur

* Removed loop unrolling for gaussianBlur horizontal smoothing

doc: fix English gramma in tutorial out-of-focus-deblur filter (#12214)

* doc: fix English gramma in tutorial out-of-focus-deblur filter

* Update out_of_focus_deblur_filter.markdown

slightly modified one sentence

doc: add new tutorial motion deblur filter (#12215)

* doc: add new tutorial motion deblur filter

* Update motion_deblur_filter.markdown

a few minor changes

Replace Slice layer to Crop in Faster-RCNN networks from Caffe

js: use generated list of OpenCV headers

- replaces hand-written list

imgcodecs(webp): use safe cast to size_t on Win32

* Put Version status back to -dev.

follow the common codestyle

Exclude some target engines.

Refactor formulas.

Refactor code.

* Remove unused variable.

* Remove inference engine check for yolov2.

* Alter darknet batch tests to test with two different images.

* Add yolov3 second image GT.

* Fix bug.

* Fix bug.

* Add second test.

* Remove comment.

* Add NMS on network level.

* Add helper files to dev.

* syntax fix.

* Fix OD sample.

Fix sample dnn object detection.

Fix NMS boxes bug.

remove trailing whitespace.

Remove debug function.

Change thresholds for opencl tests.

* Adapt score diff and iou diff.

* Alter iouDiffs.

* Add debug messages.

* Adapt iouDiff.

* Fix tests
2018-09-12 13:29:43 +03:00
Hamdi Sahloul
a39e0daacf Utilize CV_UNUSED macro 2018-09-07 20:33:52 +09:00
Dmitry Kurtaev
d486204a0d Merge pull request #12264 from dkurt:dnn_remove_forward_method
* Remove a forward method in dnn::Layer

* Add a test

* Fix tests

* Mark multiple dnn::Layer::finalize methods as deprecated

* Replace back dnn's inputBlobs to vector of pointers

* Remove Layer::forward_fallback from CV_OCL_RUN scopes
2018-09-06 13:26:47 +03:00
Dmitry Kurtaev
f96f934426 Update Intel's Inference Engine deep learning backend (#11587)
* Update Intel's Inference Engine deep learning backend

* Remove cpu_extension dependency

* Update Darknet accuracy tests
2018-05-31 14:05:21 +03:00
Li Peng
ba5e8befa9 fp16 ocl support for more layers
Signed-off-by: Li Peng <peng.li@intel.com>
2018-05-16 22:45:04 +08:00
Dmitry Kurtaev
97fec07d96 Support YOLOv3 model from Darknet 2018-04-16 18:44:12 +03:00
Alexander Alekhin
1060c0f439 dnn: apply CV_OVERRIDE/CV_FINAL 2018-03-28 18:43:27 +03:00
Alexander Alekhin
6c051a55e5 cmake: don't add include <module>/src directory to avoid conflicts
during opencv_world builds
2018-03-19 11:14:15 +03:00
Alexander Alekhin
1b83bc48a1 dnn: make OpenCL DNN code optional 2018-03-01 12:12:40 +03:00
Dmitry Kurtaev
c67e75b68f Refactor NMS procedure at RegionLayer 2017-12-21 12:21:45 +03:00
Li Peng
66feea6cac region layer ocl implementation
Signed-off-by: Li Peng <peng.li@intel.com>
2017-12-07 02:26:46 +08:00
Li Peng
8f99083726 Add new layer forward interface
Add layer forward interface with InputArrayOfArrays and
OutputArrayOfArrays parameters, it allows UMat buffer to be
processed and transferred in the layers.

Signed-off-by: Li Peng <peng.li@intel.com>
2017-11-09 15:59:39 +08:00
AlexeyAB
ecc34dc521 Added DNN Darknet Yolo v2 for object detection 2017-10-09 21:08:44 +03:00