Platform: Portable Computing Language
  Device: NVIDIA Tegra X1
    Driver version  : 3.0-rc2 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 921 MHz

    Global memory bandwidth (GBPS)
      float   : 17.84
      float2  : 20.57
      float4  : 21.38
      float8  : 19.87
      float16 : 17.72

    Single-precision compute (GFLOPS)
      float   : 220.80
      float2  : 228.89
      float4  : 230.41
      float8  : 229.79
      float16 : 229.13

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7.32
      double2  : 7.32
      double4  : 7.30
      double8  : 7.28
      double16 : 7.22

    Integer compute (GIOPS)
      int   : 76.86
      int2  : 77.52
      int4  : 77.30
      int8  : 76.07
      int16 : 74.52

    Integer compute Fast 24bit (GIOPS)
      int   : 76.87
      int2  : 77.51
      int4  : 77.29
      int8  : 76.04
      int16 : 74.89

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 2.35
      enqueueReadBuffer               : 2.36
      enqueueWriteBuffer non-blocking : 2.37
      enqueueReadBuffer non-blocking  : 2.27
      enqueueMapBuffer(for read)      : 7635.37
        memcpy from mapped ptr        : 0.71
      enqueueUnmap(after write)       : 9.07
        memcpy to mapped ptr          : 3.56

    Kernel launch latency : -14.24 us