Two-Stage Movie Scripts Summarization: An Efficient Method For Low-resource Long Documents Summarization @ ASCW 2022 by UdS

tony-hong/script-2-story

Two-Stage Movie Script Summarization (CreativeSumm 2022)

Unofficial reference implementation and reproducible pipeline for
“Two-Stage Movie Script Summarization: An Efficient Method for Low-Resource Long Document Summarization” (CreativeSumm @ COLING 2022).


✨ Overview

This repository implements a two-stage pipeline for summarizing movie scripts → movie plots:

  1. Stage A - Script Condensation (Heuristic Extraction):
    Extract actions and salient dialogues from screenplay format to drastically shorten the input while keeping core narrative content.

  2. Stage B - Abstractive Summarization (LED + Efficient Finetuning):
    Use Longformer-Encoder-Decoder (LED) with parameter-efficient strategies (BitFit, NoisyTune) to generate coherent plot summaries from the condensed script.

Why this design?
Movie scripts are long (tens of thousands of tokens) and structurally idiosyncratic. The heuristic pass reduces length and noise; LED then focuses computation on what matters.
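
To make the Stage A idea concrete, here is a minimal sketch of a heuristic condensation pass. It is not the extraction code from the paper: the regular expressions, the `condense_script` name, the `min_dialogue_words` threshold, and the sample scene are all illustrative assumptions. It treats blank-line-separated blocks as units, drops scene headings and transitions, keeps action paragraphs, and keeps dialogue only when it is long enough to be plausibly salient.

```python
import re

# Assumed patterns for common screenplay conventions (hypothetical, not from the paper).
SCENE_HEADING = re.compile(r"^(INT|EXT)[./\s]")
TRANSITION = re.compile(r"^(FADE IN|FADE OUT|CUT TO|DISSOLVE TO)\b")
PARENTHETICAL = re.compile(r"\(.*?\)")


def is_character_cue(line: str) -> bool:
    """A short all-caps line such as 'MARA' or 'JONAS (O.S.)'."""
    name = PARENTHETICAL.sub("", line).strip()
    return bool(name) and name.isupper() and len(name.split()) <= 4


def condense_script(script: str, min_dialogue_words: int = 6) -> str:
    """Keep action paragraphs and longer dialogue turns; drop everything else."""
    kept = []
    for block in re.split(r"\n\s*\n", script.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        first = lines[0]
        if SCENE_HEADING.match(first) or TRANSITION.match(first):
            continue  # structural markup, no narrative content
        if is_character_cue(first):
            speaker = PARENTHETICAL.sub("", first).strip().title()
            # Skip stage directions like "(quietly)" inside the dialogue block.
            spoken = " ".join(
                ln for ln in lines[1:]
                if not (ln.startswith("(") and ln.endswith(")"))
            )
            if len(spoken.split()) >= min_dialogue_words:
                kept.append(f"{speaker}: {spoken}")
        else:
            kept.append(" ".join(lines))  # action / description paragraph
    return "\n".join(kept)


# Made-up sample scene, used only to exercise the function.
SAMPLE_SCRIPT = """
INT. KITCHEN - NIGHT

Mara sets two cups on the table and checks her phone.

JONAS (O.S.)
(quietly)
I sold the house this morning, and I did not tell you.

MARA
What?

CUT TO:

Mara walks out into the rain.
"""

if __name__ == "__main__":
    print(condense_script(SAMPLE_SCRIPT))
```

Word count is a crude proxy for salience; a real pass would likely need format-specific rules for each script source.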

Notes

We release a simplified version of our code, intended as a quick way to try out the method. It is not meant to reproduce the results reported in our paper.


🚀 Setup

Python: 3.9–3.11

git clone https://github.com/<you>/two-stage-script-sum.git
cd two-stage-script-sum

# Create env (conda or venv)
python -m venv .venv && source .venv/bin/activate

# Install deps
pip install -r requirements.txt
# or, minimal:
pip install torch transformers datasets accelerate evaluate rouge-score nltk sentencepiece

python -c "import nltk; nltk.download('punkt')"

🧪 Repro Tips

  • Use gradient accumulation for long contexts.
  • Enable mixed precision (--fp16) if your hardware supports it.
  • For especially long scripts, chunk condensed text into overlapping windows and merge beams.
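
The overlapping-window tip can be sketched as follows. This is an illustrative helper, not code from this repository: the function name `overlapping_windows` and the default `window` and `overlap` values are assumptions, and merging the per-window outputs is left out.

```python
from typing import List, Sequence


def overlapping_windows(
    tokens: Sequence[int], window: int = 4096, overlap: int = 256
) -> List[Sequence[int]]:
    """Split a token sequence into windows that share `overlap` tokens.

    `window` and `overlap` are placeholder values; pick them to match the
    maximum input length of the model you actually use.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if not tokens:
        return []

    stride = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the sequence
        start += stride
    return chunks


if __name__ == "__main__":
    # Toy token ids; in practice these could come from a tokenizer,
    # e.g. tokenizer(text)["input_ids"] in Hugging Face Transformers.
    for chunk in overlapping_windows(list(range(10)), window=4, overlap=1):
        print(chunk)
```

Each window can then be summarized separately; how the partial summaries are combined is a separate design decision.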

📊 Expected Behavior (High-Level)

  • The two-stage pipeline substantially reduces token length before generation.
  • With LED + BitFit/NoisyTune, you should see improvements over zero-shot LED baselines on standard automatic metrics.

Exact numbers depend on dataset split, preprocessing choices, and compute budget.
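
For readers unfamiliar with BitFit, the core idea is to update only the bias terms and freeze everything else. The sketch below shows that idea on any object exposing a PyTorch-style `named_parameters()`; it is not the training code behind the paper. The helper name `apply_bitfit` and the `extra_trainable` option are assumptions added for illustration.

```python
from typing import Iterable, Tuple


def apply_bitfit(model, extra_trainable: Iterable[str] = ()) -> Tuple[int, int]:
    """Freeze every parameter except bias terms.

    `model` is anything with a PyTorch-style `named_parameters()`, for example
    a Hugging Face `LEDForConditionalGeneration`. `extra_trainable` is a
    hypothetical convenience: parameters whose name contains one of these
    substrings stay trainable as well.

    Returns (trainable_parameter_count, total_parameter_count).
    """
    extra = tuple(extra_trainable)
    trainable = 0
    total = 0
    for name, param in model.named_parameters():
        is_bias = name.split(".")[-1] == "bias"
        keep = is_bias or any(key in name for key in extra)
        param.requires_grad = keep  # frozen parameters receive no gradient updates
        count = param.numel()
        total += count
        if keep:
            trainable += count
    return trainable, total


# Possible usage (assumes torch and transformers are installed; the checkpoint
# name is only an example and may not be the one used in the paper):
#
#   from transformers import LEDForConditionalGeneration
#   model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")
#   trainable, total = apply_bitfit(model)
#   print(f"training {trainable} of {total} parameters")
```

Printing the two counts before training is a quick sanity check that the freeze actually took effect.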


🧾 Citation

If you use this repo, please cite the original paper:

Liu, Dongqi; Hong, Xudong; Lin, Pin-Jie; Chang, Ernie; Demberg, Vera (2022).
Two-Stage Movie Script Summarization: An Efficient Method For Low-Resource Long Document Summarization.
In Proceedings of the Workshop on Automatic Summarization for Creative Writing (CreativeSumm @ COLING 2022), pp. 57–66.

BibTeX

@inproceedings{pu-etal-2022-two,
  title     = {Two-Stage Movie Script Summarization: An Efficient Method For Low-Resource Long Document Summarization},
  author    = {Liu, Dongqi and Hong, Xudong and Lin, Pin-Jie and Chang, Ernie and Demberg, Vera},
  editor    = {Mckeown, Kathleen},
  booktitle = {Proceedings of the Workshop on Automatic Summarization for Creative Writing},
  month     = {oct},
  year      = {2022},
  address   = {Gyeongju, Republic of Korea},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2022.creativesumm-1.9/},
  pages     = {57--66}
}

🙏 Acknowledgements

We thank the CreativeSumm organizers and the ACL community.
This repo builds on the open-source NLP ecosystem (PyTorch, Transformers, Datasets, Evaluate).

