Platform: Portable Computing Language
  Device: NVIDIA Tegra X2
    Driver version  : 3.0-rc2 (Linux ARM64)
    Compute units   : 2
    Clock frequency : 1020 MHz

    Global memory bandwidth (GBPS)
      float   : 37.31
      float2  : 46.28
      float4  : 45.69
      float8  : 34.27
      float16 : 28.97

    Single-precision compute (GFLOPS)
      float   : 592.11
      float2  : 649.11
      float4  : 647.81
      float8  : 651.77
      float16 : 649.53

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 20.79
      double2  : 20.74
      double4  : 20.71
      double8  : 20.64
      double16 : 20.48

    Integer compute (GIOPS)
      int   : 217.46
      int2  : 219.72
      int4  : 218.27
      int8  : 216.66
      int16 : 215.70

    Integer compute Fast 24bit (GIOPS)
      int   : 217.46
      int2  : 219.69
      int4  : 218.28
      int8  : 216.60
      int16 : 215.16

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 5.79
      enqueueReadBuffer               : 0.31
      enqueueWriteBuffer non-blocking : 5.55
      enqueueReadBuffer non-blocking  : 0.31
      enqueueMapBuffer(for read)      : 14717.57
        memcpy from mapped ptr        : 0.31
      enqueueUnmap(after write)       : 0.53
        memcpy to mapped ptr          : 2.86

    Kernel launch latency : -947.51 us