When global memory is accessed as DWORD4, memory bandwidth can be fully utilized on Intel platforms. This patch lets more image formats (e.g. 8UC4) be processed as DWORD4 per work-item. After applying this patch, three subcases of `./opencv_perf_core --gtest_filter=OCL_RepeatFixture_Repeat.Repeat/*` run faster on an HD4000 graphics card with Beignet:

- OCL_RepeatFixture_Repeat.Repeat/2: 64% improvement
- OCL_RepeatFixture_Repeat.Repeat/6: 50% improvement
- OCL_RepeatFixture_Repeat.Repeat/8: 56% improvement

Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
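
The DWORD4 idea in the commit message can be illustrated on the CPU. The following is a minimal sketch, not the kernel from the patch: it tiles an 8UC4 image while moving 16 bytes (four 32-bit words, i.e. four 8UC4 pixels) per loop iteration, where each iteration stands in for one work-item. The names `Dword4` and `repeat8UC4` are hypothetical, and the sketch assumes tightly packed rows whose width is a multiple of four pixels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a 16-byte vector type: four 32-bit words,
// which hold exactly four 8UC4 pixels.
struct Dword4 { std::uint32_t w[4]; };

// Tile a srcRows x srcCols 8UC4 image ny times vertically and nx times
// horizontally. Assumes tightly packed rows and srcCols % 4 == 0.
std::vector<std::uint8_t> repeat8UC4(const std::vector<std::uint8_t>& src,
                                     std::size_t srcRows, std::size_t srcCols,
                                     std::size_t ny, std::size_t nx)
{
    const std::size_t pixelBytes = 4;                    // 8UC4: 4 channels x 1 byte
    const std::size_t srcStep = srcCols * pixelBytes;    // bytes per source row
    const std::size_t dstStep = srcStep * nx;            // bytes per destination row
    assert(src.size() == srcRows * srcStep);
    assert(srcStep % sizeof(Dword4) == 0);
    const std::size_t chunksPerRow = srcStep / sizeof(Dword4);

    std::vector<std::uint8_t> dst(dstStep * srcRows * ny);

    for (std::size_t y = 0; y < srcRows; ++y) {
        for (std::size_t c = 0; c < chunksPerRow; ++c) {
            // One "work-item": load a single 16-byte chunk once...
            Dword4 v;
            std::memcpy(&v, src.data() + y * srcStep + c * sizeof(Dword4), sizeof v);
            // ...and store it into every tile of the destination.
            for (std::size_t ty = 0; ty < ny; ++ty) {
                for (std::size_t tx = 0; tx < nx; ++tx) {
                    std::uint8_t* out = dst.data()
                        + (ty * srcRows + y) * dstStep
                        + tx * srcStep
                        + c * sizeof(Dword4);
                    std::memcpy(out, &v, sizeof v);
                }
            }
        }
    }
    return dst;
}
```

On a GPU the two outer loops would map to the work-item index space, and the 16-byte load and store would be a single vector access. Whether that saturates bandwidth depends on the hardware, as the commit message notes for Intel platforms.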

| Name |
|---|
| 3rdparty |
| apps |
| cmake |
| data |
| doc |
| include |
| modules |
| platforms |
| samples |
| .gitattributes |
| .gitignore |
| .tgitconfig |
| CMakeLists.txt |
| index.rst |
| LICENSE |
| README.md |

OpenCV: Open Source Computer Vision Library
Resources
- Homepage: http://opencv.org
- Docs: http://docs.opencv.org
- Q&A forum: http://answers.opencv.org
- Issue tracking: http://code.opencv.org
Contributing
Please read before starting work on a pull request: http://code.opencv.org/projects/opencv/wiki/How_to_contribute
Summary of guidelines:
- One pull request per issue;
- Choose the right base branch;
- Include tests and documentation;
- Clean up "oops" commits before submitting;
- Follow the coding style guide.
