Opengl 4.7
10/14/2023

Demonstrates tf32 (e8m10) GEMM computation using the WMMA API for tf32, employing the Tensor Cores.

Demonstrates __nv_bfloat16 (e8m7) GEMM computation using the WMMA API for __nv_bfloat16, employing the Tensor Cores.

Makes use of asynchronous copy from global to shared memory using the CUDA pipeline, which leads to a further performance gain.

Demonstrates double precision GEMM computation using the WMMA API for double precision, employing the Tensor Cores.

Demonstrates the stream attributes that affect L2 locality.

Demonstrates asynchronous copy of data from global to shared memory using the CUDA pipeline.
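The "e8m10" and "e8m7" labels above describe bit layouts: 8 exponent bits plus 10 explicit mantissa bits for tf32, and 8 exponent bits plus 7 explicit mantissa bits for __nv_bfloat16, against the 23 mantissa bits of a regular float32. The snippet below is only a rough host-side sketch of what that precision loss looks like, written in plain Python rather than CUDA. The helper names (`truncate_mantissa`, `to_tf32`, `to_bf16`) are made up for illustration, and it simply chops off the low mantissa bits, whereas real conversions generally apply a rounding mode.

```python
import struct


def truncate_mantissa(x: float, mantissa_bits: int) -> float:
    """Keep only `mantissa_bits` explicit mantissa bits of a float32 value.

    Hypothetical helper for illustration: it truncates (rounds toward zero),
    which is a simplification of what actual hardware conversions do.
    """
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # float32 has 23 explicit mantissa bits; clear the ones we do not keep.
    drop = 23 - mantissa_bits
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def to_tf32(x: float) -> float:
    """Emulate tf32 precision (e8m10): 8 exponent bits, 10 mantissa bits."""
    return truncate_mantissa(x, 10)


def to_bf16(x: float) -> float:
    """Emulate bfloat16 precision (e8m7): 8 exponent bits, 7 mantissa bits."""
    return truncate_mantissa(x, 7)


if __name__ == "__main__":
    x = 1.0 + 2.0 ** -10  # needs exactly 10 mantissa bits
    print("float32:", x)
    print("tf32   :", to_tf32(x))  # the 10th mantissa bit survives
    print("bf16   :", to_bf16(x))  # only 7 mantissa bits, so it collapses to 1.0
```

Both formats keep the full 8-bit exponent of float32, so the range stays the same and only the precision shrinks. That is why `1 + 2**-10` survives the tf32 emulation but collapses to `1.0` in the bfloat16 one.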