Closed
Description
I'm trying to run DLRM training with the mixed-dimension trick. The paper says AMSGrad is used, so I tried replacing the SGD optimizer with Adam and setting amsgrad=True, but Adam doesn't support sparse gradients. The code runs with SGD, but I don't get the accuracies reported in the paper. How do you suggest getting AMSGrad to work so the results from the paper can be reproduced? Thanks in advance.
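
For context, this is a minimal sketch of the kind of workaround I was considering: split the parameters so the dense side gets Adam with amsgrad=True and the sparse embedding gradients go to SparseAdam (which has no amsgrad option). ToyDLRM and the parameter-splitting helpers below are placeholders, not the actual model from the repo.

```python
import torch
import torch.nn as nn

# Toy stand-in for DLRM: one sparse embedding table plus a small dense MLP.
class ToyDLRM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.EmbeddingBag(1000, 16, mode="sum", sparse=True)  # sparse grads
        self.mlp = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))

    def forward(self, idx, offsets):
        return self.mlp(self.emb(idx, offsets))

model = ToyDLRM()

# Split parameters: the embedding table produces sparse gradients, the rest are dense.
sparse_params = list(model.emb.parameters())
dense_params = [p for n, p in model.named_parameters() if not n.startswith("emb")]

# AMSGrad (via Adam) on the dense parameters only; SparseAdam handles the
# sparse embedding gradients but does not expose an amsgrad flag.
opt_dense = torch.optim.Adam(dense_params, lr=0.01, amsgrad=True)
opt_sparse = torch.optim.SparseAdam(sparse_params, lr=0.01)

# One training step on random data, just to show both optimizers stepping together.
idx = torch.randint(0, 1000, (32,))
offsets = torch.arange(0, 32, 2)       # 16 bags of 2 indices each
target = torch.rand(16, 1)

loss = nn.functional.binary_cross_entropy_with_logits(model(idx, offsets), target)
opt_dense.zero_grad()
opt_sparse.zero_grad()
loss.backward()
opt_dense.step()
opt_sparse.step()
```

Would this be a reasonable way to match the paper's setup, or is the intended approach to turn off sparse=True on the embedding tables so plain Adam with amsgrad works end to end?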
Metadata
Assignees
Labels
No labels