Issue in data loading #10

Open
da03 opened this issue Aug 8, 2022 · 1 comment

Comments

@da03
da03 commented Aug 8, 2022

It seems to me that this line should be changed to if 'tm' in self.name, since you were using self.start_conversation and self.end_conversation to split the training and test sets for the tm2 dataset (see https://github.com/rosewang2008/language_modeling_via_stochastic_processes/blob/main/language_modeling_via_stochastic_processes/transformers/src/transformers/data/datasets/language_modeling.py#L1182). With the current code, it seems that the training and test sets would be the same for tm2.
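
To make the failure mode concrete, here is a minimal, self-contained sketch of how a name-gated split can collapse after a rename. It is not the repository's code: the class `ToyDialogueDataset`, the helper `splits_overlap`, the `name_check` parameter, the 80/20 cut, and the guess that the stale condition tests for 'restaurant' are all assumptions made for illustration. Only `self.name`, `self.start_conversation`, `self.end_conversation`, and the proposed `'tm' in self.name` check come from the discussion above.

```python
class ToyDialogueDataset:
    """Hypothetical stand-in for the real dataset class (illustration only)."""

    def __init__(self, name, split, num_conversations=100, name_check="restaurant"):
        self.name = name
        # Default bounds: every conversation is used, whatever the split.
        self.start_conversation = 0
        self.end_conversation = num_conversations
        # Assumed shape of the gate: bounds are narrowed only when the
        # dataset name matches the check.
        if name_check in self.name:
            cut = num_conversations * 4 // 5  # illustrative 80/20 cut
            if split == "train":
                self.end_conversation = cut
            else:
                self.start_conversation = cut

    def conversation_ids(self):
        return range(self.start_conversation, self.end_conversation)


def splits_overlap(name, name_check):
    """Return True if the train and test splits share any conversation."""
    train = ToyDialogueDataset(name, "train", name_check=name_check)
    test = ToyDialogueDataset(name, "test", name_check=name_check)
    return bool(set(train.conversation_ids()) & set(test.conversation_ids()))


# A check written for the old name never fires for "tm2", so both
# splits fall back to the full range.
print(splits_overlap("tm2", "restaurant"))  # True
# The proposed substring check matches "tm2", so the splits are disjoint.
print(splits_overlap("tm2", "tm"))          # False
```

A substring check such as 'tm' in self.name also matches any other dataset whose name contains "tm", which may or may not be intended; an exact comparison would be stricter.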

@rosewang2008
Owner

Hi @da03,
Thanks for raising this issue. I just wanted to follow up on it explicitly.

Yes, this bug was introduced during refactoring. It happened because I originally gave the datasets different names from the ones used in the paper. When I refactored the encoder and decoder repositories, I thought it would be good to align the names with the paper...

The original repository does not have this bug, because there I called the dataset restaurant rather than tm2. Below are screenshots from the original dataset code, where the dataset is named restaurant.

[Image: Screen Shot 2022-08-29 at 11 59 08 AM]

[Image: Screen Shot 2022-08-29 at 12 00 03 PM]

As I mentioned in #9, I plan to fix these issues soon! Thanks again for raising them, and apologies for the inconvenience.

Rose
