imomayiz
Fixes tools/preprocess_data when the tokenizer does not have an eod_token. Without this fix, the error raised by preprocess_data.py is unclear and hard to debug.
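To illustrate the kind of check this implies, here is a minimal sketch of an early guard that fails with a readable message. It is not the change made in this PR. It assumes the tokenizer exposes its end-of-document id as an `eod` attribute that is `None` when no such token exists, and that an end-of-document token is appended only when requested. `check_eod_token`, `encode_document`, `append_eod` and `WhitespaceTokenizer` are hypothetical names used only for this example.

```python
class WhitespaceTokenizer:
    """Hypothetical toy tokenizer used only for illustration."""

    def __init__(self, eod=None):
        # Assumption: the end-of-document id lives in `eod` (None if undefined).
        self.eod = eod
        self.vocab = {}

    def tokenize(self, text):
        # Assign each new word the next free id; reuse ids for repeated words.
        return [self.vocab.setdefault(w, len(self.vocab)) for w in text.split()]


def check_eod_token(tokenizer, append_eod):
    """Hypothetical guard: fail early and clearly if eod is needed but missing."""
    eod = getattr(tokenizer, "eod", None)
    if append_eod and eod is None:
        raise ValueError(
            "append_eod was requested, but the tokenizer does not define an "
            "end-of-document (eod) token; use a tokenizer that has one or "
            "disable appending eod."
        )
    return eod


def encode_document(tokenizer, text, append_eod):
    """Hypothetical helper: tokenize `text`, optionally appending eod."""
    # Validate before doing any work so the failure points at the real cause.
    eod = check_eod_token(tokenizer, append_eod)
    ids = tokenizer.tokenize(text)
    if append_eod:
        ids.append(eod)
    return ids


if __name__ == "__main__":
    print(encode_document(WhitespaceTokenizer(eod=99), "a b a", append_eod=True))
    # [0, 1, 0, 99]
```

Running the check once, before any document is encoded, means a tokenizer without an end-of-document token produces a single descriptive `ValueError` instead of a less obvious failure later in the pipeline.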

copy-pr-bot (bot) commented Aug 27, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


sbhavani added the labels "bug" (Something isn't working) and "module: data pipeline" on Sep 9, 2025.