Since #1417, we need to assemble the matrix (parallel comm) after regular assembly and before setting the local diagonal, then set the diagonal, and then assemble (parallel comm) again. Previously, only one parallel comm step was required.
We should figure out how to set the local diagonal without requiring a parallel communication step afterwards.
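
To make the cost concrete, here is a toy sketch of the current call sequence. `ToyMatrix`, `set_local_diagonal`, and `current_workflow` are hypothetical names invented for illustration and are not the project's API. The sketch assumes that the diagonal cannot be set while additive contributions are still pending, which is one plausible reason for the extra step.

```python
class ToyMatrix:
    """Hypothetical stand-in for a distributed matrix (not a real library API)."""

    def __init__(self):
        self.values = {}
        self.comm_steps = 0   # number of (simulated) parallel communication steps
        self.pending = False  # True while un-communicated changes exist

    def add(self, i, j, v):
        # Additive contribution from regular assembly.
        self.values[(i, j)] = self.values.get((i, j), 0.0) + v
        self.pending = True

    def assemble(self):
        # Stands in for the parallel communication step.
        self.comm_steps += 1
        self.pending = False

    def set_local_diagonal(self, rows, v=1.0):
        # Assumed restriction: pending additive values must be flushed first.
        if self.pending:
            raise RuntimeError("assemble() must be called before setting the diagonal")
        for i in rows:
            self.values[(i, i)] = v
        self.pending = True


def current_workflow(diag_rows):
    A = ToyMatrix()
    # Regular assembly (toy entries).
    for i, j, v in [(0, 0, 2.0), (0, 1, -1.0), (1, 1, 2.0), (2, 2, 2.0)]:
        A.add(i, j, v)
    A.assemble()                     # first parallel comm step
    A.set_local_diagonal(diag_rows)  # purely local write
    A.assemble()                     # second parallel comm step
    return A


A = current_workflow([0, 2])
print(A.comm_steps)
```

In this model the goal of this issue corresponds to dropping the second `assemble()` call, since the diagonal write itself touches only locally owned entries.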