VisionWorks OpenVX
heterogeneous computation framework
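Besides the official reference implementation, the implementations below come from different vendors; some are open source, while others are shipped as prebuilt dynamic libraries.
The vendors above have passed the conformance test. ARM also offers a similar SDK (Compute Library), whose architecture was modeled on OpenVX in its early development.
Although OpenVX was originally designed as a software framework for computer vision, the programming model and popularity of neural networks led the Khronos OpenVX working group to define the Neural Network Extension, which brought OpenVX into the deep learning arena.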
The NVIDIA VisionWorks toolkit is a software development package for computer vision (CV) and image processing. VisionWorks™ implements and extends the Khronos OpenVX standard, and it is optimized for CUDA-capable GPUs and SoCs, enabling developers to realize CV applications on a scalable and flexible platform.
IMAGE ARITHMETIC
FLOW & DEPTH
GEOMETRIC TRANSFORMS
FILTERS
FEATURES
ANALYSIS
Yes. For a user node, base it on the Advanced Tiling Extensions (see the "Intel's Extensions to the OpenVX* API: Advanced Tiling" chapter).
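As I understand the Intel extension, the point of tiling is that a user node only has to compute one rectangular tile per call, so the runtime can split the image and schedule tiles in parallel. A plain C++ sketch of that contract, not Intel's API (`Image`, `invertTile` and `runTiled` are hypothetical names), which checks that a tiled run gives the same result as a whole-image run:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit grayscale image, stored row-major.
struct Image {
    int width, height;
    std::vector<std::uint8_t> pixels;
    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
    std::uint8_t& at(int x, int y) { return pixels[static_cast<std::size_t>(y) * width + x]; }
    std::uint8_t at(int x, int y) const { return pixels[static_cast<std::size_t>(y) * width + x]; }
};

// Per-tile kernel: only touches pixels in [x0, x1) x [y0, y1).
// A point operation (invert) needs no neighborhood, so tiles are independent.
void invertTile(const Image& in, Image& out, int x0, int y0, int x1, int y1) {
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            out.at(x, y) = static_cast<std::uint8_t>(255 - in.at(x, y));
}

// Split the image into tileW x tileH tiles (edge tiles may be smaller) and call
// the tile kernel once per tile. A real runtime could issue these calls in parallel.
template <typename TileFn>
void runTiled(const Image& in, Image& out, int tileW, int tileH, TileFn fn) {
    for (int y0 = 0; y0 < in.height; y0 += tileH)
        for (int x0 = 0; x0 < in.width; x0 += tileW)
            fn(in, out, x0, y0, std::min(x0 + tileW, in.width), std::min(y0 + tileH, in.height));
}
```

A filter kernel would additionally need to read a border around each tile, which this sketch does not handle.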
ref:
The vxScheduleGraph API call runs a graph asynchronously; vxWaitGraph waits for its completion.
When images are created from externally allocated memory (vxCreateImageFromHandle): to enable zero-copy with the GPU, the externally allocated memory should be aligned. For more details, refer to https://software.intel.com/en-us/node/540453.
Beware of the vxVerifyGraph latency costs. For example, construct the graph so that it does not require re-verification when parameters are updated. Notice that unlike Map/Unmap for the input images (see the Map/Unmap for OpenVX* Images section), setting new images with different meta-data (size, type, etc.) almost certainly triggers the verification, potentially adding significant overhead.
A Windows build environment needs these components:
CPU: SSE4.1 or above CPU, 64-bit.
GPU: Radeon Professional Graphics Cards or Vega Family of Products (16GB required for vx_loomsl and vx_nn libraries)
OpenCV 3 (optional): download it for RunVX
Build this project to generate the AMD OpenVX library and the RunVX executable.
Download to C:\Users\aeejshe\Downloads
Build the software according to the guidelines, especially
Demo
C:\Users\aeejshe\Downloads\amdovx-core-0.9-beta2\amdovx-core-0.9-beta2>runvx examples\gdf\canny.gdf
***** VIDEOINPUT LIBRARY - 0.1995 - TFW07 *****
runvx.exe 0.9.7
OK: using AMD OpenVX 0.9.7
OK: enabled graph scheduling in separate threads
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
OK: capturing 480x360 image(s) into 480x360 RGB image buffer
csv,OVERALL, PASS, 1, , 8.60, 8.60, 0.00, 0.00, 0.00, 0.00 (median 8.598)
> total elapsed time: 0.11 sec
Abort: Press any key to exit...
canny.gdf
# create input and output images
data input = image:480,360,RGB2
data output = image:480,360,U008
# specify input source for input image and request for displaying input and output images
read input examples/images/face1.jpg
view input inputWindow
view output edgesWindow
# compute luma image channel from input RGB image
data yuv = image-virtual:0,0,IYUV
data luma = image-virtual:0,0,U008
node org.khronos.openvx.color_convert input yuv
node org.khronos.openvx.channel_extract yuv !CHANNEL_Y luma
# compute edges in luma image using Canny edge detector
data hyst = threshold:RANGE,UINT8:INIT,80,100
data gradient_size = scalar:INT32,3
node org.khronos.openvx.canny_edge_detector luma hyst gradient_size !NORM_L1 output
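The first two nodes only produce the Y (luma) plane that Canny needs, and both intermediates are `image-virtual`, so the runtime is free not to materialize the full IYUV image. A per-pixel sketch of that step, assuming full-range BT.709 luma weights (which, as far as I know, is the OpenVX default color space) and round-to-nearest; AMD's optimized kernel may use different fixed-point rounding, and `lumaBT709` is a hypothetical name:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Luma from 8-bit RGB with BT.709 weights (assumed, see above).
// The weights sum to 1.0, so the result stays within [0, 255].
std::uint8_t lumaBT709(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    double y = 0.2126 * r + 0.7152 * g + 0.0722 * b;
    return static_cast<std::uint8_t>(std::lround(y));
}
```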
usage
C:\Users\aeejshe\Downloads\amdovx-core-0.9-beta2\amdovx-core-0.9-beta2>runvx
***** VIDEOINPUT LIBRARY - 0.1995 - TFW07 *****
runvx.exe 0.9.7
Usage:
runvx.exe [options] [file] <file.gdf> [argument(s)]
runvx.exe [options] node <kernelName> [argument(s)]
runvx.exe [options] shell [argument(s)]
The argument(s) are data objects created using <data-description> syntax.
These arguments can be accessed from inside GDF as $1, $2, etc.
The available command-line options are:
-h
Show full help.
-v
Turn on verbose logs.
-root:<directory>
Replace ~ in filenames with <directory> in the command-line and
GDF file. The default value of '~' is current working directory.
-frames:[<start>:]<end>|eof|live
Run the graph/node for specified frames or until eof or just as live.
Use live to indicate that input is live until aborted by user.
-affinity:CPU|GPU[<device-index>]
Set context affinity to CPU or GPU.
-dump-profile
Print performance profiling information after graph launch.
-enable-profile
use directive VX_DIRECTIVE_AMD_ENABLE_PROFILE_CAPTURE when graph is created
-discard-compare-errors
Continue graph processing even if compare mismatches occur.
-disable-virtual
Replace all virtual data types in GDF with non-virtual data types.
Use of this flag (i.e. for debugging) can make a graph run slower.
dump profile
C:\Users\aeejshe\Downloads\amdovx-core-0.9-beta2\amdovx-core-0.9-beta2>runvx -dump-profile examples\gdf\canny.gdf
***** VIDEOINPUT LIBRARY - 0.1995 - TFW07 *****
runvx.exe 0.9.7
OK: using AMD OpenVX 0.9.7
OK: enabled graph scheduling in separate threads
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
OK: capturing 480x360 image(s) into 480x360 RGB image buffer
csv,OVERALL, PASS, 1, , 8.62, 8.62, 0.00, 0.00, 0.00, 0.00 (median 8.621)
> total elapsed time: 0.07 sec
> graph profile:
COUNT,tmp(ms),avg(ms),min(ms),max(ms),DEV,KERNEL
1, 8.621, 8.621, 8.621, 8.621,CPU,GRAPH
1, 1.196, 1.196, 1.196, 1.196,CPU,com.amd.openvx.ColorConvert_Y_RGB
1, 4.905, 4.905, 4.905, 4.905,CPU,com.amd.openvx.CannySobel_U16_U8_3x3_L1NORM
1, 2.305, 2.305, 2.305, 2.305,CPU,com.amd.openvx.CannySuppThreshold_U8XY_U16_3x3
1, 0.208, 0.208, 0.208, 0.208,CPU,com.amd.openvx.CannyEdgeTrace_U8_U8XY
Abort: Press any key to exit...
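Reading the profile: the kernel rows add up to almost exactly the GRAPH row, consistent with the nodes running back to back on the CPU. A small sketch that parses rows in the `COUNT,tmp(ms),avg(ms),min(ms),max(ms),DEV,KERNEL` layout shown above and does that check; `ProfileRow`, `parseRow` and `sumNodeTimes` are hypothetical helpers that handle only this exact layout:

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>
#include <vector>

struct ProfileRow {
    int count = 0;
    double avgMs = 0.0;
    std::string device, kernel;
};

// Split one "COUNT,tmp,avg,min,max,DEV,KERNEL" row on commas.
// std::stoi / std::stod skip the leading spaces in the numeric fields.
ProfileRow parseRow(const std::string& line) {
    std::vector<std::string> f;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) f.push_back(field);
    ProfileRow row;
    row.count = std::stoi(f.at(0));
    row.avgMs = std::stod(f.at(2));
    row.device = f.at(5);
    row.kernel = f.at(6);
    return row;
}

// Sum avg(ms) over every node row, i.e. all rows except the GRAPH total.
double sumNodeTimes(const std::vector<std::string>& lines) {
    double sum = 0.0;
    for (const auto& l : lines) {
        ProfileRow r = parseRow(l);
        if (r.kernel != "GRAPH") sum += r.avgMs;
    }
    return sum;
}
```

For this run the node rows sum to 8.614 ms against 8.621 ms for GRAPH, leaving about 0.007 ms outside the individual kernels. The rows also show how the graph was rewritten: color_convert and channel_extract appear as a single ColorConvert_Y_RGB kernel, while canny_edge_detector appears as three Canny* kernels.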
Test whether CSE (common subexpression elimination) works
Input
# create input and output images
data input = image:480,360,RGB2
data output = image:480,360,U008
data output2 = image:480,360,U008
# specify input source for input image and request for displaying input and output images
read input examples/images/face1.jpg
view input inputWindow
view output edgesWindow
# compute luma image channel from input RGB image
data yuv = image-virtual:0,0,IYUV
data yuv2 = image-virtual:0,0,IYUV
data luma = image-virtual:0,0,U008
data luma2 = image-virtual:0,0,U008
node org.khronos.openvx.color_convert input yuv
node org.khronos.openvx.color_convert input yuv2
node org.khronos.openvx.channel_extract yuv !CHANNEL_Y luma
node org.khronos.openvx.channel_extract yuv2 !CHANNEL_Y luma2
# compute edges in luma image using Canny edge detector
data hyst = threshold:RANGE,UINT8:INIT,80,100
data gradient_size = scalar:INT32,3
node org.khronos.openvx.canny_edge_detector luma hyst gradient_size !NORM_L1 output
node org.khronos.openvx.canny_edge_detector luma2 hyst gradient_size !NORM_L1 output2
Output
C:\Users\aeejshe\Downloads\amdovx-core-0.9-beta2\amdovx-core-0.9-beta2>runvx -dump-profile examples\gdf\canny.gdf
***** VIDEOINPUT LIBRARY - 0.1995 - TFW07 *****
runvx.exe 0.9.7
OK: using AMD OpenVX 0.9.7
OK: enabled graph scheduling in separate threads
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
OK: capturing 480x360 image(s) into 480x360 RGB image buffer
csv,OVERALL, PASS, 1, , 17.13, 17.13, 0.00, 0.00, 0.00, 0.00 (median 17.127)
> total elapsed time: 0.07 sec
> graph profile:
COUNT,tmp(ms),avg(ms),min(ms),max(ms),DEV,KERNEL
1, 17.127, 17.127, 17.127, 17.127,CPU,GRAPH
1, 1.202, 1.202, 1.202, 1.202,CPU,com.amd.openvx.ColorConvert_Y_RGB
1, 1.192, 1.192, 1.192, 1.192,CPU,com.amd.openvx.ColorConvert_Y_RGB
1, 4.857, 4.857, 4.857, 4.857,CPU,com.amd.openvx.CannySobel_U16_U8_3x3_L1NORM
1, 4.838, 4.838, 4.838, 4.838,CPU,com.amd.openvx.CannySobel_U16_U8_3x3_L1NORM
1, 2.312, 2.312, 2.312, 2.312,CPU,com.amd.openvx.CannySuppThreshold_U8XY_U16_3x3
1, 2.302, 2.302, 2.302, 2.302,CPU,com.amd.openvx.CannySuppThreshold_U8XY_U16_3x3
1, 0.209, 0.209, 0.209, 0.209,CPU,com.amd.openvx.CannyEdgeTrace_U8_U8XY
1, 0.207, 0.207, 0.207, 0.207,CPU,com.amd.openvx.CannyEdgeTrace_U8_U8XY
Abort: Press any key to exit...
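Q: Why doesn't CSE work? The profile lists every kernel twice and the graph time roughly doubles (17.13 ms vs. 8.62 ms for the single pipeline), so the duplicated color_convert / channel_extract / canny_edge_detector chain was not merged.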
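For comparison, this is what CSE would be expected to do with the test graph: two nodes are redundant when they run the same kernel on the same inputs and parameters, so the later one can be dropped and consumers of its outputs redirected to the earlier node's outputs. A minimal sketch over a toy node list modeled on the GDF above (`Node` and `eliminateCommonSubexpressions` are hypothetical, not AMD OpenVX internals):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy dataflow node: kernel name, inputs (data objects and constants in
// parameter order) and outputs.
struct Node {
    std::string kernel;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

// Nodes must be in topological order. A node whose (kernel, resolved inputs)
// key was already seen is dropped and its outputs are aliased to the surviving
// node's outputs, so downstream duplicates collapse in turn.
std::vector<Node> eliminateCommonSubexpressions(const std::vector<Node>& nodes) {
    std::map<std::string, std::string> alias;  // output of a dropped node -> surviving output
    std::map<std::pair<std::string, std::vector<std::string>>, std::size_t> seen;  // key -> index in kept
    std::vector<Node> kept;
    for (const Node& n : nodes) {
        Node r = n;
        for (auto& in : r.inputs) {
            auto a = alias.find(in);
            if (a != alias.end()) in = a->second;  // read from the surviving copy instead
        }
        auto key = std::make_pair(r.kernel, r.inputs);
        auto hit = seen.find(key);
        if (hit != seen.end()) {
            // Same kernel on the same inputs: assumed to have the same number of outputs.
            const Node& survivor = kept[hit->second];
            for (std::size_t i = 0; i < r.outputs.size(); ++i)
                alias[r.outputs[i]] = survivor.outputs[i];
            continue;  // drop the duplicate node
        }
        seen.emplace(key, kept.size());
        kept.push_back(r);
    }
    return kept;
}
```

Applied to the six nodes of the test graph, the sketch keeps three. It simply aliases `output2` to `output`; a real optimizer would still have to fill the user-visible `output2`, for example with a copy.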
TODO:
//vx_api.h
VX_API_ENTRY vx_graph VX_API_CALL vxCreateGraph(vx_context context);
VX_API_ENTRY vx_status VX_API_CALL vxVerifyGraph(vx_graph graph);
VX_API_ENTRY vx_status VX_API_CALL vxProcessGraph(vx_graph graph);
VX_API_ENTRY vx_image VX_API_CALL vxCreateVirtualImage(vx_graph graph, vx_uint32 width, vx_uint32 height, vx_df_image color);
//vx_node.h
VX_API_ENTRY vx_node VX_API_CALL vxColorConvertNode(vx_graph graph, vx_image input, vx_image output);
Features
//core.hpp
GAPI_EXPORTS GMat resize(const GMat& src, const Size& dsize, double fx = 0, double fy = 0, int interpolation = INTER_LINEAR);
//GComputation.hpp
class GComputation{
...
GComputation(GProtoInputArgs &&ins,
GProtoOutputArgs &&outs); // Arg-to-arg overload
void apply(GRunArgs &&ins, GRunArgsP &&outs, GCompileArgs &&args = {});
...
};
Sequence diagram of the G-API apply function:
GComputation -> GComputation2: apply
GComputation2 -> GCompiler: compile
GCompiler -> Graph: build graph
Graph --> GComputation2: return ade::Graph
GComputation2 -> Graph: exec the graph
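The idea behind this flow is that calls such as `resize(src, ...)` on a `GMat` do not touch pixels; they only record a node, and `GComputation::apply` later compiles the recorded graph and executes it. A toy model of that build-then-apply pattern on integers; `ExprNode`, `Expr`, `addConst`, `mulConst` and `Computation` are hypothetical and much simpler than G-API's `GMat`/`GComputation` (single-input chains only, compiled once on the first apply):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// One recorded operation; the graph input node has no parent and no op.
struct ExprNode {
    std::shared_ptr<ExprNode> parent;
    std::function<int(int)> op;
};

// Symbolic value, similar in spirit to a GMat: it names a node, not a result.
struct Expr {
    std::shared_ptr<ExprNode> node = std::make_shared<ExprNode>();
};

// Building an expression only records a node; nothing is computed here.
Expr addConst(const Expr& src, int c) {
    Expr e;
    e.node->parent = src.node;
    e.node->op = [c](int v) { return v + c; };
    return e;
}

Expr mulConst(const Expr& src, int c) {
    Expr e;
    e.node->parent = src.node;
    e.node->op = [c](int v) { return v * c; };
    return e;
}

// Toy counterpart of GComputation for a chain from `in` to `out`.
class Computation {
public:
    Computation(Expr in, Expr out) : in_(std::move(in)), out_(std::move(out)) {}

    // The first call "compiles" the recorded chain into an ordered plan; every call executes it.
    int apply(int input) {
        if (!compiled_) compile();
        int v = input;
        for (const auto& op : plan_) v = op(v);
        return v;
    }

    int compileCount() const { return compiles_; }

private:
    void compile() {
        // Walk from the output back to the input, then reverse into execution order.
        for (auto n = out_.node; n && n != in_.node; n = n->parent) plan_.push_back(n->op);
        std::reverse(plan_.begin(), plan_.end());
        compiled_ = true;
        ++compiles_;
    }

    Expr in_, out_;
    std::vector<std::function<int(int)>> plan_;
    bool compiled_ = false;
    int compiles_ = 0;
};
```

Usage: `Expr in; Computation c(in, mulConst(addConst(in, 1), 3));` records (x + 1) * 3, and `c.apply(2)` returns 9.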
ref:
Study whether OpenVINO or OpenCV supports the following (v = supported, x = not supported):
Lib | CSE | Partial inputs |
---|---|---|
OpenVINO | x | x |
AMDOVX | x | x |
OpenCV G-API | x | x |
Intel TBB | x | v (behavior: the nodes whose inputs are ready are called, then execution exits; code: C:\jshe\codes\lua\src\tbbtest\test_tbb_behavior.cpp) |
TensorFlow | v | |
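The TBB behavior noted in the table is plain dataflow firing: a node runs once all of its inputs exist, so when only some graph inputs are supplied, the downstream nodes that need the missing ones never run and execution simply stops. A single-threaded sketch of that rule with hypothetical names (`FlowNode`, `runReady`), no TBB involved:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct FlowNode {
    std::string name;
    std::vector<std::string> inputs;  // data items the node waits for
    std::string output;               // data item it produces
};

// Repeatedly fire any node whose inputs are all available; stop when no node
// can fire. Returns the names of the nodes that ran, in firing order.
std::vector<std::string> runReady(const std::vector<FlowNode>& nodes, std::set<std::string> available) {
    std::vector<std::string> fired;
    std::set<std::string> done;
    bool progress = true;
    while (progress) {
        progress = false;
        for (const auto& n : nodes) {
            if (done.count(n.name)) continue;
            bool ready = true;
            for (const auto& in : n.inputs) ready = ready && available.count(in) > 0;
            if (ready) {
                done.insert(n.name);
                fired.push_back(n.name);
                available.insert(n.output);
                progress = true;
            }
        }
    }
    return fired;
}
```

With nodes A(x), B(y), C(a, b) and D(a) and only `x` supplied, A and D run while B and C never do.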
TODO
Test whether the graph can be processed multiple times, as follows:
while (true) {
// modify the input
vxProcessGraph(graph);
}
ref: http://projects.eees.dei.unibo.it/adrenaline/tutorial-02-execute-openvx-examples/
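A verified OpenVX graph is meant to be processed repeatedly; the Intel performance notes earlier in this post warn that setting new images with different meta-data (size, type, etc.) almost certainly triggers re-verification, while changing only pixel data does not have to. A toy model of that policy, in which verification is redone only when the input meta-data changes (`ImageMeta`, `ToyImage` and `ToyGraph` are hypothetical, not OpenVX API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ImageMeta {
    int width, height;
    int format;  // stand-in for a vx_df_image code
    bool operator==(const ImageMeta& o) const {
        return width == o.width && height == o.height && format == o.format;
    }
};

struct ToyImage {
    ImageMeta meta;
    std::vector<std::uint8_t> data;
};

class ToyGraph {
public:
    // Swap in a new input image; only a meta-data change invalidates verification.
    void setInput(const ToyImage& img) {
        if (!(img.meta == input_.meta)) verified_ = false;
        input_ = img;
    }
    // Analogue of vxProcessGraph: verify first if needed, then "process".
    void process() {
        if (!verified_) verify();
        ++processCount_;
    }
    int verifyCount() const { return verifyCount_; }
    int processCount() const { return processCount_; }

private:
    void verify() {  // the expensive step in a real runtime
        ++verifyCount_;
        verified_ = true;
    }
    ToyImage input_{};  // value-initialized: 0x0, format 0
    bool verified_ = false;
    int verifyCount_ = 0;
    int processCount_ = 0;
};
```

In the loop from the TODO, modifying only the pixels keeps the verify count at one, while switching to a 640x480 input adds a second verification.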
OpenVX reading notes
Lib | High level | Low level |
---|---|---|
ovx | strongly typed, e.g. VX_API_ENTRY vx_node VX_API_CALL vxColorConvertNode(vx_graph graph, vx_image input, vx_image output); | weakly typed, e.g. OpenVX.dll!agoCreateNode(_vx_graph * graph, int kernel_id) |
tbb | strongly typed, e.g. make_edge(tbb::flow::output_port<1>(gpu_slm_split_n), tbb::flow::input_port<1>(gpu_slm_mat_mult_n)) tbb::flow::function_node< validation_args_type > mat_validation_n(g, tbb::flow::unlimited, [](const validation_args_type& result) { // Get references to matrixes const tbb::flow::gfx_buffer const tbb::flow::gfx_buffer const tbb::flow::gfx_buffer // Verify results // Check that slm algorithm produces correct results on CPU: validate_mat("matrix multiply: 'SLM' CPU vs. CPU", SIZE_Y, SIZE_X, CPU_SLM_MAT.data(), CPU_NAIVE_MAT.data()); // Verify Gen results: validate_mat("matrix multiply: SLM Gen vs. CPU", SIZE_Y, SIZE_X, GPU_SLM_MAT.data(), CPU_NAIVE_MAT.data()); }); | not sure |
G-API | strongly typed | TODO |
// ovx: \vis_bep_12\C\Users\aeejshe\Downloads\amdovx-core-0.9-beta2\amdovx-core-0.9-beta2
// tbb: C:\Users\aeejshe\Downloads\tbb2017_20170604oss_win\tbb2017_20170604oss
Define an enum:
VX_KERNEL_COLOR_CONVERT = VX_KERNEL_BASE(VX_ID_KHRONOS, VX_LIBRARY_KHR_BASE) + 0x1,
Registration:
OVX_KERNEL_ENTRY( VX_KERNEL_COLOR_CONVERT , ColorConvert, "color_convert", AIN_AOUT, ATYPE_II , false ),
Meaning of the parameters:
#define OVX_KERNEL_ENTRY(kernel_id,name,kname,argCfg,argType,validRectReset)
#define ATYPE_II { VX_TYPE_IMAGE, VX_TYPE_IMAGE }
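The two macros boil down to a kernel ID computed from a vendor/library base plus an offset, and a static table mapping that ID to a name, argument configuration and argument types. A sketch of both ideas: the bit layout in `kernelBase` follows the OpenVX `VX_KERNEL_BASE` macro as I understand it (vendor in bits 20 and up, library in bits 12 to 19), the reading of `AIN_AOUT` as "input, output" is an assumption, and `KernelEntry`/`findKernel` are hypothetical, not AMD's actual table types:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Assumed kernel ID layout: vendor << 20 | library << 12 | kernel index.
constexpr std::int32_t kernelBase(std::int32_t vendor, std::int32_t library) {
    return (vendor << 20) | (library << 12);
}

constexpr std::int32_t kVendorKhronos = 0x000;  // stand-in for VX_ID_KHRONOS
constexpr std::int32_t kLibKhrBase = 0x0;       // stand-in for VX_LIBRARY_KHR_BASE
constexpr std::int32_t kKernelColorConvert = kernelBase(kVendorKhronos, kLibKhrBase) + 0x1;

enum class ArgDir { In, Out };
enum class ArgType { Image, Scalar, Threshold };

// Hypothetical registration row, loosely modeled on
// OVX_KERNEL_ENTRY(kernel_id, name, kname, argCfg, argType, validRectReset).
struct KernelEntry {
    std::int32_t id;
    std::string name;
    std::vector<ArgDir> argDirs;    // assumed meaning of argCfg, e.g. AIN_AOUT
    std::vector<ArgType> argTypes;  // e.g. ATYPE_II = { image, image }
    bool validRectReset;
};

const std::vector<KernelEntry>& kernelTable() {
    static const std::vector<KernelEntry> table = {
        {kKernelColorConvert, "color_convert", {ArgDir::In, ArgDir::Out},
         {ArgType::Image, ArgType::Image}, false},
    };
    return table;
}

// Linear lookup by kernel name; a larger table might be indexed by ID instead.
const KernelEntry* findKernel(const std::string& name) {
    for (const auto& e : kernelTable())
        if (e.name == name) return &e;
    return nullptr;
}
```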
Implement the "DramaDivideNode" operation, which selects the kernel best suited to this PC's architecture:
int agoDramaDivideNode(AgoNodeList * nodeList, AgoNode * anode)
{
// save parameter list
vx_uint32 paramCount = anode->paramCount;
AgoData * paramList[AGO_MAX_PARAMS]; memcpy(paramList, anode->paramList, sizeof(paramList));
// divide the node depending on the type
int status = -1;
switch (anode->akernel->id)
{
case VX_KERNEL_COLOR_CONVERT:
status = agoDramaDivideColorConvertNode(nodeList, anode);
break;
// ... cases for the other kernels ...
}
// ... (rest of the function not shown)
}
The function is called by the graph optimizer during vxVerifyGraph; the call stack is:
> OpenVX.dll!agoCreateNode(_vx_graph * graph, int kernel_id) Line 2699 C++
OpenVX.dll!agoDramaDivideAppend(AgoNodeList * nodeList, _vx_node * anode, int new_kernel_id, _vx_reference * * paramList, unsigned int paramCount) Line 37 C++
OpenVX.dll!agoDramaDivideAppend(AgoNodeList * nodeList, _vx_node * anode, int new_kernel_id) Line 56 C++
OpenVX.dll!agoDramaDivideColorConvertNode(AgoNodeList * nodeList, _vx_node * anode) Line 244 C++
OpenVX.dll!agoDramaDivideNode(AgoNodeList * nodeList, _vx_node * anode) Line 1818 C++
OpenVX.dll!agoOptimizeDramaDivide(_vx_graph * agraph) Line 1962 C++
OpenVX.dll!agoOptimizeDrama(_vx_graph * agraph) Line 522 C++
OpenVX.dll!agoOptimizeGraph(_vx_graph * agraph) Line 209 C++
OpenVX.dll!vxVerifyGraph(_vx_graph * graph) Line 2450 C++
runvx.exe!CVxEngine::ProcessGraph(std::vector<char const *,std::allocator<char const *> > * graphNameList, unsigned __int64 beginIndex) Line 285 C++
Choose the best
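A minimal sketch of such a "choose the best" step: keep only the candidate implementations this machine can run and pick the one with the lowest estimated cost. The variant names, feature flags and costs are made up (`Variant`, `Machine` and `chooseBest` are hypothetical), and AMD's agoDramaDivide* functions select kernels by their own rules, which this does not reproduce:

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Device { CPU, GPU };

// One candidate implementation of a kernel (hypothetical).
struct Variant {
    std::string name;
    Device device;
    bool needsSse41;       // requires SSE4.1 on the CPU
    bool needsOpenCL;      // requires an OpenCL-capable GPU
    double estimatedCost;  // lower is better, e.g. a measured or modeled time
};

struct Machine {
    bool hasSse41;
    bool hasOpenCLGpu;
};

// Pick the cheapest variant this machine supports; nullptr if none qualifies.
const Variant* chooseBest(const std::vector<Variant>& variants, const Machine& m) {
    const Variant* best = nullptr;
    for (const auto& v : variants) {
        if (v.needsSse41 && !m.hasSse41) continue;
        if (v.needsOpenCL && !m.hasOpenCLGpu) continue;
        if (!best || v.estimatedCost < best->estimatedCost) best = &v;
    }
    return best;
}
```

Keeping a plain scalar fallback with no requirements guarantees that some variant is always available.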
Original post: https://www.cnblogs.com/cutepig/p/12041564.html