
The variable "end_training" in Bert_Large training is used incorrectly. #170


Description

@taotod

In the code below, the variable "end_training" is defined as a boolean flag that decides when training should stop.

https://github.com/IntelAI/models/blob/cdd842a33eb9d402ff18bfb79bd106ae132a8e99/models/language_modeling/pytorch/bert_large/training/gpu/run_pretrain_mlperf.py#L838

In the code below, which measures the time of one training iteration, the variable "end_training" is incorrectly reused to record the iteration's end timestamp.
https://github.com/IntelAI/models/blob/cdd842a33eb9d402ff18bfb79bd106ae132a8e99/models/language_modeling/pytorch/bert_large/training/gpu/run_pretrain_mlperf.py#L1006

"end_training" is set with a non-zero value in the code line 1006. As a result, after one data file is used for training, the training exits here and will never go to next data file.
https://github.com/IntelAI/models/blob/cdd842a33eb9d402ff18bfb79bd106ae132a8e99/models/language_modeling/pytorch/bert_large/training/gpu/run_pretrain_mlperf.py#L1079
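Here is a minimal sketch of the pattern, with simplified, hypothetical names (`train_step`, `data_files`, `start_time`) standing in for the script's actual logic, not a copy of run_pretrain_mlperf.py. It shows why overwriting the boolean flag with a timestamp makes the outer-loop check fire after the first data file:

```python
import time

def train_step(batch):
    """Stand-in for one training iteration (hypothetical)."""
    time.sleep(0.01)

data_files = [range(3), range(3)]   # two hypothetical pre-training data files

end_training = False                 # flag: should only become True when training must stop

for file_idx, data_file in enumerate(data_files):
    for batch in data_file:
        start_time = time.time()
        train_step(batch)

        # Bug: the flag is reused to record the end-of-iteration time.
        # Any non-zero float is truthy, so the flag is now effectively True.
        end_training = time.time()
        print(f"iteration time: {end_training - start_time:.4f}s")

    if end_training:                 # truthy timestamp -> exits after the first file
        print(f"exited after data file {file_idx}; remaining files are skipped")
        break
```

A straightforward fix is to store the iteration end time in a separate variable (for example `end_time = time.time()`) and keep "end_training" as a pure boolean flag that is only set when the stop condition is actually met.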
