https://github.com/flame/how-to-optimize-gemm/wiki
Copyright by Prof. Robert van de Geijn ([email protected]).
Adapted to GitHub Markdown Wiki by Jianyu Huang ([email protected]).
- The GotoBLAS/BLIS Approach to Optimizing Matrix-Matrix Multiplication - Step-by-Step
  - NOTICE ON ACADEMIC HONESTY
  - References
  - Set Up
  - Step-by-step optimizations
  - Computing four elements of C at a time
  - Computing a 4 x 4 block of C at a time
  - Acknowledgement
 
This material was partially sponsored by grants from the National Science Foundation (Awards ACI-1148125/1340293 and ACI-1550493).
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).