EvalSubtitle is a tool for reference-based evaluation of subtitle segmentation.
The repository contains the Subtitle Segmentation Score (Sigma), specifically tailored for evaluating segmentation of system outputs whose text is not identical to the reference (imperfect text). EvalSub also contains a collection of standard segmentation metrics (F1, WindowDiff, etc.) as well as subtitling evaluation metrics: BLEU on segmented text (BLEU_br), BLEU on non-segmented text (BLEU_nb), and TER_br.
More details can be found in the paper.
Requires python>=3.6.0.
The script evalsub_main.py allows the computation of the following metrics:
Standard segmentation metrics:
Precision, Recall, F1, Pk, WinDiff (WindowDiff), SegSim (Segmentation Similarity), BoundSim (Boundary Similarity)
Subtitling evaluation metrics:
BLEU_br, BLEU_nb, TER_br, CPL_conf, Sigma
System and reference files should be in one of two formats: tagged text or SRT.
Tagged text files contain text interspersed with segmentation symbols: <eol>, which indicates a change of line within the same screen, and <eob>, which indicates the end of a subtitle block and a subsequent change of screen.
A line of the file corresponds to a full sentence.
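For illustration, a tagged text file with two sentences (the sentences themselves are invented) could look as follows:

We are going to talk about subtitle segmentation, <eol> and about how it can be evaluated. <eob>
This second sentence fits on a single subtitle block. <eob>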
System and reference files should contain the same number of sentences for the computation of certain metrics (BLEU_br, BLEU_nb, Sigma, TER_br).
If not, the auto_segmentation option can be used to automatically segment the system output according to the reference sentences (based on the Levenshtein alignment implementation from the SubER tool), as shown in the example below.
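For instance, assuming a system file named sys_subs.txt and a reference file named ref_subs.txt (both file names are hypothetical), automatic segmentation can be enabled like this:

python evalsub_main.py -res results.csv -sys sys_subs.txt -ref ref_subs.txt -e2e -as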
--all, -a: Compute all metrics.
--standard, -std: Compute all metrics that require identical/perfect text.
--end2end, -e2e: Compute all metrics that do not require identical/perfect text.
--include, -i: Compute only the specified metrics.
--exclude, -e: Compute all but the specified metrics.
--text, -t: Whether the text of the system subtitles is identical to the text of the reference subtitles ("perfect") or not ("imperfect"). Can be used as a safeguard to prevent computing standard metrics with imperfect text.
--system_files, -sys: Segmented subtitle files to evaluate (by default, the system files in data).
--reference_file, -ref: Reference segmented subtitle file (by default, the reference file in data).
--results_file, -res: CSV file to which the results are written.
--srt, -srt: Whether the subtitle files are in SRT format.
--auto_segmentation, -as: Whether to use automatic segmentation for the system sequences.
--window_size, -k: Window size for the window-based segmentation metrics (Pk, WinDiff) (by default, computed as half of the mean reference segmentation length).
--max_transpo, -n: Maximum distance that can be counted as a boundary transposition error (by default, 2). Specific to SegSim and BoundSim.
--max_cpl, -cpl: Maximum allowed length for subtitle lines (by default, 42).
--confidence_interval, -ci: If set, compute (and print) the confidence interval (CI) for BLEU and Sigma. The CI is computed using bootstrap resampling (with 95% confidence).
Note: metric names must be written exactly as in the list above.
Compute all end-to-end metrics but TER_br, for the automatic subtitles in data:
python evalsub_main.py -res results.csv -e2e -e TER_br
Compute only Sigma, BLEU_br and BLEU_nb, for the automatic subtitles in data:
python evalsub_main.py -res results.csv -i Sigma BLEU_br BLEU_nb
Compute all metrics that are compatible with imperfect text:
python evalsub_main.py -res results.csv -a -t imperfect
This is equivalent to:
python evalsub_main.py -res results.csv -e2e
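For instance, to also report 95% confidence intervals for BLEU and Sigma (an illustrative command combining options documented above):

python evalsub_main.py -res results.csv -i Sigma BLEU_br BLEU_nb -ci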
If you use EvalSubtitle in your research, please cite the following paper:
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
