Platform: Portable Computing Language
  Device: Orin
    Driver version  : 5.0 (Linux ARM64)
    Compute units   : 8
    Clock frequency : 918 MHz

    Global memory bandwidth (GBPS)
      float   : 86.53
      float2  : 93.67
      float4  : 93.68
      float8  : 93.64
      float16 : 58.93

    Single-precision compute (GFLOPS)
      float   : 1830.74
      float2  : 1818.19
      float4  : 1788.92
      float8  : 1843.85
      float16 : 1830.18

    Half-precision compute (GFLOPS)
      half   : 933.07
      half2  : 3520.95
      half4  : 2646.06
      half8  : 2881.55
      half16 : 2537.46

    Double-precision compute (GFLOPS)
      double   : 29.35
      double2  : 29.32
      double4  : 29.25
      double8  : 29.11
      double16 : 28.82

    Integer compute (GIOPS)
      int   : 625.10
      int2  : 628.33
      int4  : 642.09
      int8  : 626.74
      int16 : 630.25

    Integer compute Fast 24bit (GIOPS)
      int   : 625.10
      int2  : 628.37
      int4  : 642.10
      int8  : 626.74
      int16 : 630.53

    Integer char (8bit) compute (GIOPS)
      char   : 309.53
      char2  : 412.41
      char4  : 475.61
      char8  : 528.38
      char16 : 495.53

    Integer short (16bit) compute (GIOPS)
      short   : 309.46
      short2  : 402.09
      short4  : 473.74
      short8  : 516.14
      short16 : 493.67

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 7.03
      enqueueReadBuffer               : 7.02
      enqueueWriteBuffer non-blocking : 7.02
      enqueueReadBuffer non-blocking  : 7.02
      enqueueMapBuffer(for read)      : 117609.66
        memcpy from mapped ptr        : 7.04
      enqueueUnmap(after write)       : 8.41
        memcpy to mapped ptr          : 7.05

    Kernel launch latency : -408.96 us