CUDA Optimization

CUDA Parallel Programming - This article is part of a series.
Part 1: GPU Architecture
Part 2: CUDA Programming
Part 3: CUDA Conv
Part 4: This Article
Part 5: CUDA Memory and Optimization

The necessary (but not sufficient) conditions for a CUDA program to achieve high performance are:

- Data transfer between host and device makes up only a small share of the program's work.
- The kernel has high arithmetic intensity (the ratio of computation to memory access).
- The kernel launches a large number of threads.

When writing and optimizing a CUDA program, design the algorithm so that it does the following wherever possible:

- Reduce data transfer between host and device.
- Increase the kernel's arithmetic intensity (the ratio of computation to memory access).
- Increase the kernel's degree of parallelism.
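The three guidelines can be illustrated with a small sketch. It is not taken from the series: the kernel name `fused_axpy_sqrt`, the `CHECK` macro, the array size, and the block size are all hypothetical choices made for illustration. The example copies the inputs to the device once, performs several arithmetic operations per element in a single kernel so the intermediate value never goes back to global memory, and launches roughly one thread per element.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",           \
                         cudaGetErrorString(err_), __FILE__, __LINE__);\
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Fused kernel: two loads and one store per element, with a multiply-add,
// a square, an add, and a square root in between. The intermediate value t
// stays in a register instead of being written to global memory by one
// kernel and read back by another.
__global__ void fused_axpy_sqrt(const float* x, const float* y,
                                float* out, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the grid may contain more threads than elements
        float t = a * x[i] + y[i];
        out[i] = sqrtf(t * t + 1.0f);
    }
}

int main()
{
    const int n = 1 << 20;                 // about one million elements
    const size_t bytes = n * sizeof(float);
    const float a = 2.0f;

    float* h_x = static_cast<float*>(std::malloc(bytes));
    float* h_y = static_cast<float*>(std::malloc(bytes));
    float* h_out = static_cast<float*>(std::malloc(bytes));
    if (!h_x || !h_y || !h_out) return EXIT_FAILURE;
    for (int i = 0; i < n; ++i) { h_x[i] = 1.0f; h_y[i] = 2.0f; }

    float *d_x, *d_y, *d_out;
    CHECK(cudaMalloc(&d_x, bytes));
    CHECK(cudaMalloc(&d_y, bytes));
    CHECK(cudaMalloc(&d_out, bytes));

    // Guideline 1: copy each input to the device exactly once.
    CHECK(cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice));

    // Guideline 3: one thread per element, so the launch has ~1M threads.
    const int block = 256;
    const int grid = (n + block - 1) / block;
    fused_axpy_sqrt<<<grid, block>>>(d_x, d_y, d_out, a, n);
    CHECK(cudaGetLastError());

    // Guideline 1 again: copy only the final result back, once.
    CHECK(cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost));

    // Compare against the same formula evaluated on the host.
    const float t = a * 1.0f + 2.0f;
    const float expected = std::sqrt(t * t + 1.0f);
    float max_err = 0.0f;
    for (int i = 0; i < n; ++i)
        max_err = std::fmax(max_err, std::fabs(h_out[i] - expected));
    std::printf("max abs error: %g\n", max_err);

    CHECK(cudaFree(d_x)); CHECK(cudaFree(d_y)); CHECK(cudaFree(d_out));
    std::free(h_x); std::free(h_y); std::free(h_out);
    return 0;
}
```

Assuming the file is saved as `fused.cu`, it should build with `nvcc fused.cu -o fused`. Splitting the same computation into two kernels (one producing `t`, one applying the square root) would add an extra global-memory store and load per element without adding any arithmetic, which lowers the arithmetic intensity.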