Hi Kevin,
Thank you for the impressive work!
Section 3.3 says that "a vanilla bidirectional transformer architecture" is adopted, with a citation to the original Transformer paper. Also, Appendix C2 says that "an auto-regressive transformer" is used as a baseline.
I am quite confused, because the implementation appears to use a BERT architecture for both the main model and the autoregressive baseline. Has the implementation or the preprint been updated since?
Best,