Add sequence truncation support for long sequences #718

@HuiyingLi

Description

Is your feature request related to a problem? Please describe.
Currently, the dataset formatting utilities (ChatDataset, ColumnMappedTextInstructionDataset) do not truncate sequences that exceed a given length. seq_length is used only for padding, not for truncation.

Describe the solution you'd like
Truncation support, so that users can specify a maximum sequence length.
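
To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is not the existing code of ChatDataset or ColumnMappedTextInstructionDataset; the function name `truncate_and_pad`, the `truncate` flag, and the `pad_values` mapping are all hypothetical and only illustrate clipping each per-token field to seq_length before padding.

```python
from typing import Dict, List


def truncate_and_pad(
    example: Dict[str, List[int]],
    seq_length: int,
    pad_values: Dict[str, int],
    truncate: bool = True,
) -> Dict[str, List[int]]:
    """Clip every per-token field to seq_length, then right-pad up to it.

    Hypothetical helper: names and signature are assumptions for illustration.
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")

    out: Dict[str, List[int]] = {}
    for key, values in example.items():
        values = list(values)  # copy so the caller's lists are not mutated
        if truncate:
            # Keep only the first seq_length tokens (right-side truncation).
            values = values[:seq_length]
        missing = seq_length - len(values)
        if missing > 0:
            # Existing behavior as described above: pad short sequences.
            values = values + [pad_values[key]] * missing
        out[key] = values
    return out


example = {
    "input_ids": [1, 2, 3, 4, 5, 6],
    "labels": [-100, -100, 3, 4, 5, 6],
}
pads = {"input_ids": 0, "labels": -100}

clipped = truncate_and_pad(example, seq_length=4, pad_values=pads)
padded = truncate_and_pad(example, seq_length=8, pad_values=pads)
```

Applying the same slice to every field keeps input_ids and labels aligned. Whether truncation should be opt-in, and whether it should drop tokens from the left or the right, is open for discussion.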

Labels: enhancement (New feature or request)
