Platform: AMD Accelerated Parallel Processing
  Device: gfx1103
    Driver version  : 3614.0 (HSA1.1,LC) (Linux x64)
    Compute units   : 6
    Clock frequency : 2700 MHz

    Global memory bandwidth (GBPS)
      float   : 71.93
      float2  : 76.21
      float4  : 79.33
      float8  : 79.91
      float16 : 82.53

    Single-precision compute (GFLOPS)
      float   : 2301.48
      float2  : 2522.03
      float4  : 2519.30
      float8  : 2503.28
      float16 : 2505.05

    Half-precision compute (GFLOPS)
      half   : 2513.05
      half2  : 4574.63
      half4  : 4561.89
      half8  : 4452.38
      half16 : 4418.84

    Double-precision compute (GFLOPS)
      double   : 85.92
      double2  : 85.84
      double4  : 85.59
      double8  : 85.34
      double16 : 84.68

    Integer compute (GIOPS)
      int   : 535.31
      int2  : 535.65
      int4  : 536.31
      int8  : 532.17
      int16 : 531.11

    Integer compute Fast 24bit (GIOPS)
      int   : 2283.13
      int2  : 2319.12
      int4  : 2299.29
      int8  : 2273.81
      int16 : 1905.24

    Integer char (8bit) compute (GIOPS)
      char   : 2296.83
      char2  : 1331.43
      char4  : 1301.48
      char8  : 1167.89
      char16 : 1162.60

    Integer short (16bit) compute (GIOPS)
      short   : 2288.83
      short2  : 2298.11
      short4  : 2290.89
      short8  : 2229.92
      short16 : 2216.77

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 16.81
      enqueueReadBuffer               : 5.23
      enqueueWriteBuffer non-blocking : 16.84
      enqueueReadBuffer non-blocking  : 5.23
      enqueueMapBuffer(for read)      : 49334.05
        memcpy from mapped ptr        : 5.17
      enqueueUnmap(after write)       : 285212.47
        memcpy to mapped ptr          : 16.86

    Kernel launch latency : 8.57 us