
Is Gradient Clipping in the code as it is in the paper? #29

Open
@sangmain

Is the gradient clipping g_r = g_q · 1_{|r|≤1} from the paper still used in the code?
The only clipping I can find is `p.org.copy_(p.data.clamp_(-1, 1))` in `def train()`:

```python
optimizer.zero_grad()
loss.backward()
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.data.copy_(p.org)
optimizer.step()
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.org.copy_(p.data.clamp_(-1, 1))
```
If this is gradient clipping, shouldn't it be applied before `optimizer.step()`?
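
For reference, this is roughly what I would have expected, as a sketch only (reusing the `model` and `.org` convention from the snippet above):

```python
# Sketch of the paper's rule g_r = g_q · 1_{|r|≤1}: zero the gradient
# wherever the latent real-valued weight has left [-1, 1], applied
# after loss.backward() and before optimizer.step().
for p in model.parameters():
    if hasattr(p, 'org') and p.grad is not None:
        p.grad.data[p.org.abs() > 1] = 0
```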
And I also don't understand the purpose of `p.org.copy_(p.data.clamp_(-1, 1))`, since `p.org` is binarized again later anyway (the result would be the same even if `p.data` were not clamped).
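
To illustrate that point, a tiny check (assuming the binarization is `sign()`-based, as in the paper):

```python
import torch

# sign() yields the same binary weights whether or not the latent
# weights were clamped to [-1, 1] first.
w = torch.tensor([-2.3, -0.4, 0.7, 1.8])
print(torch.sign(w))               # tensor([-1., -1.,  1.,  1.])
print(torch.sign(w.clamp(-1, 1)))  # tensor([-1., -1.,  1.,  1.])
```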
Thank you
