Unofficial reference implementation and reproducible pipeline for
"Two-Stage Movie Script Summarization: An Efficient Method for Low-Resource Long Document Summarization" (CreativeSumm @ COLING 2022).
- Paper: https://aclanthology.org/2022.creativesumm-1.9/
- PDF: https://aclanthology.org/2022.creativesumm-1.9.pdf
- BibTeX: in the Citation section below
This repository implements a two-stage pipeline for summarizing movie scripts → movie plots:

- Stage A – Script Condensation (Heuristic Extraction): extract actions and salient dialogues from the screenplay format to drastically shorten the input while keeping the core narrative content (a minimal sketch follows this list).
- Stage B – Abstractive Summarization (LED + Efficient Fine-tuning): use the Longformer Encoder-Decoder (LED) with parameter-efficient strategies (BitFit, NoisyTune) to generate coherent plot summaries from the condensed script (see the setup sketch after the design note below).
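As a concrete example of Stage A, the sketch below keeps scene headings, character cues, and short action/dialogue lines while dropping transitions. The regexes, length threshold, and keep/drop rules are illustrative assumptions, not the exact heuristics from the paper.

```python
import re

SCENE = re.compile(r"^(INT\.|EXT\.)", re.IGNORECASE)     # scene headings
TRANSITION = re.compile(r"^(CUT TO:|FADE IN|FADE OUT)")  # editing directions
CUE = re.compile(r"^[A-Z][A-Z .'\-]+$")                  # character cue, e.g. "RIPLEY"

def condense(script: str, max_len: int = 220) -> str:
    """Keep scene headings, character cues, and short action/dialogue lines."""
    kept = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line or TRANSITION.match(line):
            continue  # drop blank lines and editing directions
        if SCENE.match(line) or CUE.match(line) or len(line) <= max_len:
            kept.append(line)
    return "\n".join(kept)
```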
Why this design?
Movie scripts are long (tens of thousands of tokens) and structurally idiosyncratic. The heuristic pass reduces length and noise; LED then focuses computation on what matters.
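To make Stage B concrete, here is a minimal sketch of the two parameter-efficient strategies on top of Hugging Face LED. The allenai/led-base-16384 checkpoint and the noise intensity are assumptions, not the paper's exact configuration.

```python
import torch
from transformers import LEDForConditionalGeneration

model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

# NoisyTune: perturb each pretrained weight with uniform noise scaled by
# that parameter's own standard deviation (lambda is an assumed intensity).
noise_lambda = 0.15
with torch.no_grad():
    for param in model.parameters():
        if param.numel() > 1:  # skip degenerate single-element parameters
            param.add_((torch.rand_like(param) - 0.5) * noise_lambda * param.std())

# BitFit: train only the bias terms; everything else stays frozen.
for name, param in model.named_parameters():
    param.requires_grad = "bias" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} of {total:,} parameters")
```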
We release a simplified version of our code intended for quick experimentation with the approach; it is not meant to reproduce the exact results reported in the paper.
Python: 3.9–3.11
git clone https://github.com/<you>/two-stage-script-sum.git
cd two-stage-script-sum
# Create env (conda or venv)
python -m venv .venv && source .venv/bin/activate
# Install deps
pip install -r requirements.txt
# or, minimal:
pip install torch transformers datasets accelerate evaluate rouge-score nltk sentencepiece
python -c "import nltk; nltk.download('punkt')"- Use gradient accumulation for long contexts.
- Enable mixed precision (
--fp16) if your hardware supports it. - For especially long scripts, chunk condensed text into overlapping windows and merge beams.
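A minimal sketch of the chunking tip, assuming the public allenai/led-base-16384 checkpoint. The window and stride sizes are illustrative, and the per-window summaries are simply concatenated rather than merged with any beam-level heuristic.

```python
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

name = "allenai/led-base-16384"  # assumed public checkpoint
tok = LEDTokenizer.from_pretrained(name)
model = LEDForConditionalGeneration.from_pretrained(name).eval()

def summarize_long(text: str, window: int = 8192, stride: int = 6144) -> str:
    """Generate a summary per overlapping token window, then join the pieces."""
    ids = tok(text, return_tensors="pt").input_ids[0]
    pieces = []
    for start in range(0, max(len(ids) - window + stride, 1), stride):
        chunk = ids[start:start + window].unsqueeze(0)
        global_mask = torch.zeros_like(chunk)
        global_mask[:, 0] = 1  # LED needs global attention on at least one token
        with torch.no_grad():
            out = model.generate(chunk, global_attention_mask=global_mask,
                                 num_beams=4, max_new_tokens=256)
        pieces.append(tok.decode(out[0], skip_special_tokens=True))
    return " ".join(pieces)
```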
- The two-stage pipeline substantially reduces token length before generation.
- With LED + BitFit/NoisyTune, you should see improvements over zero-shot LED baselines on standard automatic metrics; exact numbers depend on dataset split, preprocessing choices, and compute budget.
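For the automatic metrics above, ROUGE can be computed with the evaluate package from the install list; the prediction/reference strings here are placeholders.

```python
import evaluate  # backed by the rouge-score package installed above

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["generated plot summary ..."],  # model outputs (placeholders)
    references=["reference plot summary ..."],   # gold summaries
)
print(scores)  # rouge1 / rouge2 / rougeL / rougeLsum F-measures
```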
If you use this repo, please cite the original paper:
Pu, Dongqi; Hong, Xudong; Lin, Pin-Jie; Chang, Ernie; Demberg, Vera (2022).
Two-Stage Movie Script Summarization: An Efficient Method For Low-Resource Long Document Summarization.
In Proceedings of the Workshop on Automatic Summarization for Creative Writing (CreativeSumm @ COLING 2022), pp. 57–66.
BibTeX
@inproceedings{pu-etal-2022-two,
title = {Two-Stage Movie Script Summarization: An Efficient Method For Low-Resource Long Document Summarization},
author = {Pu, Dongqi and Hong, Xudong and Lin, Pin-Jie and Chang, Ernie and Demberg, Vera},
editor = {Mckeown, Kathleen},
booktitle = {Proceedings of the Workshop on Automatic Summarization for Creative Writing},
month = {oct},
year = {2022},
address = {Gyeongju, Republic of Korea},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2022.creativesumm-1.9/},
pages = {57--66}
}

- CreativeSumm 2022 Shared Task (overview & data pointers): https://creativesumm.github.io/sharedtask
- Longformer Encoder-Decoder (LED) model: available via Hugging Face Transformers
We thank the CreativeSumm organizers and the ACL community.
This repo builds on the open-source NLP ecosystem (PyTorch, Transformers, Datasets, Evaluate).
- Xudong Hong ([email protected]): open issues/PRs welcome!