Hi @da03,
Thanks for raising this issue. I just wanted to follow up on this explicitly.
Yes, this bug was introduced during refactoring. It arose because I originally named the datasets differently from what they're called in the paper. When I refactored the encoder and decoder repositories, I thought it would be good to align the names with what they're called in the paper...
The original repository does not have this bug as I previously called the dataset restaurant, and not tm2. Below are screenshots from the original dataset where I had the dataset named as restaurant.
As I mentioned in #9, I plan to rectify these issues soon! Thanks again for raising them and apologies for the inconvenience.
It seems to me that this line (language_modeling_via_stochastic_processes/language_modeling_via_stochastic_processes/transformers/src/transformers/data/datasets/language_modeling.py, line 1201 in 5cbc3ee) should be changed to
if 'tm' in self.name
so that self.start_conversation and self.end_conversation are used to split the training and test sets (see https://github.com/rosewang2008/language_modeling_via_stochastic_processes/blob/main/language_modeling_via_stochastic_processes/transformers/src/transformers/data/datasets/language_modeling.py#L1182) for the tm2 dataset. With the current code, it seems that the training and test sets would be the same for tm2.
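To make the failure mode concrete, here is a minimal sketch of how a name check that no longer matches the dataset name could leave the train and test splits identical. This is not the repository's code: the function select_conversations, the split_at parameter, and the fallback of using every conversation are hypothetical, and the old check being on 'restaurant' is an assumption based on the earlier dataset name mentioned in this thread.

```python
def select_conversations(name, conversations, train, split_at, marker):
    """Hypothetical stand-in for the dataset's split logic.

    `marker` is the substring tested against the dataset name; the
    start/end indices mirror self.start_conversation and
    self.end_conversation from the issue.
    """
    if marker in name:
        # Name matches: train and test get disjoint index ranges.
        start_conversation = 0 if train else split_at
        end_conversation = split_at if train else len(conversations)
    else:
        # Assumed fallback: no split is applied, so both modes see everything.
        start_conversation, end_conversation = 0, len(conversations)
    return conversations[start_conversation:end_conversation]


conversations = list(range(10))  # toy stand-in for ten conversations

# Stale check (assumed to be 'restaurant'): 'tm2' does not match,
# so train and test come out identical.
stale_train = select_conversations("tm2", conversations, True, 8, "restaurant")
stale_test = select_conversations("tm2", conversations, False, 8, "restaurant")

# Proposed check ('tm' in name): the two splits are disjoint.
fixed_train = select_conversations("tm2", conversations, True, 8, "tm")
fixed_test = select_conversations("tm2", conversations, False, 8, "tm")
```

With the stale marker both calls return the same ten items, while with 'tm' the first eight go to training and the last two to testing.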