Platform: Portable Computing Language
  Device: Orin
    Driver version  : 5.0 (Linux ARM64)
    Compute units   : 8
    Clock frequency : 624 MHz

    Global memory bandwidth (GBPS)
      float   : 62.87
      float2  : 63.71
      float4  : 63.71
      float8  : 63.68
      float16 : 40.08

    Single-precision compute (GFLOPS)
      float   : 1245.91
      float2  : 1237.44
      float4  : 1217.39
      float8  : 1254.84
      float16 : 1245.56

    Half-precision compute (GFLOPS)
      half   : 635.04
      half2  : 2397.03
      half4  : 1801.24
      half8  : 1961.29
      half16 : 1727.49

    Double-precision compute (GFLOPS)
      double   : 19.97
      double2  : 19.95
      double4  : 19.91
      double8  : 19.81
      double16 : 19.61

    Integer compute (GIOPS)
      int   : 425.48
      int2  : 427.67
      int4  : 437.13
      int8  : 426.58
      int16 : 428.96

    Integer compute Fast 24bit (GIOPS)
      int   : 425.48
      int2  : 427.67
      int4  : 437.13
      int8  : 426.58
      int16 : 429.16

    Integer char (8bit) compute (GIOPS)
      char   : 210.67
      char2  : 280.69
      char4  : 323.71
      char8  : 359.63
      char16 : 337.26

    Integer short (16bit) compute (GIOPS)
      short   : 210.61
      short2  : 273.66
      short4  : 322.43
      short8  : 351.30
      short16 : 336.00

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 6.67
      enqueueReadBuffer               : 6.65
      enqueueWriteBuffer non-blocking : 6.65
      enqueueReadBuffer non-blocking  : 6.65
      enqueueMapBuffer(for read)      : 47989.21
        memcpy from mapped ptr        : 6.70
      enqueueUnmap(after write)       : 8.04
        memcpy to mapped ptr          : 6.70

    Kernel launch latency : -530.93 us