Closed
Description
I'm trying to run DLRM training with the mixed-dimension trick. The paper says AMSGrad is used, so I tried replacing the SGD optimizer with Adam and setting amsgrad=True, but Adam doesn't support sparse gradients. The code runs with SGD, but I don't get the accuracies reported in the paper. How do you suggest getting AMSGrad to work so the results from the paper can be reproduced? Thanks in advance.
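
For context, this is a minimal sketch of the kind of workaround I was considering: split the parameters so the dense side gets Adam with amsgrad=True and the sparse embedding gradients go to SparseAdam (which has no amsgrad option). ToyDLRM and the parameter-splitting helpers below are placeholders, not the actual model from the repo.

```python
import torch
import torch.nn as nn

# Toy stand-in for DLRM: one sparse embedding table plus a small dense MLP.
class ToyDLRM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.EmbeddingBag(1000, 16, mode="sum", sparse=True)  # sparse grads
        self.mlp = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))

    def forward(self, idx, offsets):
        return self.mlp(self.emb(idx, offsets))

model = ToyDLRM()

# Split parameters: the embedding table produces sparse gradients, the rest are dense.
sparse_params = list(model.emb.parameters())
dense_params = [p for n, p in model.named_parameters() if not n.startswith("emb")]

# AMSGrad (via Adam) on the dense parameters only; SparseAdam handles the
# sparse embedding gradients but does not expose an amsgrad flag.
opt_dense = torch.optim.Adam(dense_params, lr=0.01, amsgrad=True)
opt_sparse = torch.optim.SparseAdam(sparse_params, lr=0.01)

# One training step on random data, just to show both optimizers stepping together.
idx = torch.randint(0, 1000, (32,))
offsets = torch.arange(0, 32, 2)       # 16 bags of 2 indices each
target = torch.rand(16, 1)

loss = nn.functional.binary_cross_entropy_with_logits(model(idx, offsets), target)
opt_dense.zero_grad()
opt_sparse.zero_grad()
loss.backward()
opt_dense.step()
opt_sparse.step()
```

Would this be a reasonable way to match the paper's setup, or is the intended approach to turn off sparse=True on the embedding tables so plain Adam with amsgrad works end to end?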
Metadata
Assignees
Labels
No labels