This repository was archived by the owner on Dec 9, 2024. It is now read-only.
VariableMgrDistributedReplicated decreases the speed of convergence #115
Open
Description
Hi, I am running into trouble with the following code.
""'
for i, (g, v) in enumerate(grads):
apply_gradient_op = opt.apply_gradients([(g, v)])
barrier = self.benchmark_cnn.add_sync_queues_and_barrier(
'replicate_variable_%s' % i, [apply_gradient_op])
"""
Here, each worker runs apply_gradient_op with its own gradients, one after another, instead of first averaging the gradients across workers. When the optimizer is Momentum, the resulting update differs from the update the averaged gradient would produce. In my application, this slows convergence and increases training time.
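To make the difference concrete, here is my own two-worker sketch of the Momentum update (accumulator a, momentum μ, learning rate η); it is an illustration, not code from the benchmark:

```latex
% Momentum: a_t = \mu a_{t-1} + g_t, \quad \Delta\theta = -\eta\, a_t
% Two per-worker gradients g_1, g_2 applied sequentially (current behaviour):
a_1 = \mu a_0 + g_1, \qquad a_2 = \mu a_1 + g_2
\Delta\theta_{\mathrm{seq}} = -\eta (a_1 + a_2)
                            = -\eta \big( (\mu + \mu^2)\, a_0 + (1 + \mu)\, g_1 + g_2 \big)
% The averaged gradient applied once (what I would like):
a_1' = \mu a_0 + \tfrac{1}{2}(g_1 + g_2), \qquad
\Delta\theta_{\mathrm{avg}} = -\eta \big( \mu a_0 + \tfrac{1}{2}(g_1 + g_2) \big)
```

The two parameter updates are not equal, and the momentum accumulator also ends up in a different state for the following step.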
Is there a way to build an op that adds up the gradients from all workers and returns their sum, so that the averaged gradient can be applied once? Thanks a lot!
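For reference, here is a minimal sketch of the kind of aggregation I mean, assuming all per-worker (or per-tower) gradient lists are visible in one graph; the names average_gradients and all_worker_grads are illustrative, not part of the benchmark code. For between-graph replication, something like tf.train.SyncReplicasOptimizer would be needed to aggregate across workers instead.

```python
import tensorflow as tf

def average_gradients(tower_grads):
  """Average a list of per-worker (grad, var) lists into one (grad, var) list.

  tower_grads looks like:
    [[(g0_v0, v0), (g0_v1, v1), ...],   # worker 0
     [(g1_v0, v0), (g1_v1, v1), ...],   # worker 1
     ...]
  """
  averaged = []
  for grads_and_var in zip(*tower_grads):
    grads = [g for g, _ in grads_and_var]
    # Sum this variable's gradients across all workers, then divide by the count.
    summed = tf.add_n(grads)
    avg = summed / float(len(grads))
    _, var = grads_and_var[0]
    averaged.append((avg, var))
  return averaged

# Illustrative usage: apply the averaged gradients once instead of once per worker.
# avg_grads = average_gradients(all_worker_grads)
# apply_gradient_op = opt.apply_gradients(avg_grads)
```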