Clone the necessary repositories and install dependencies:
```bash
# Clone repositories
git clone https://anonymous.4open.science/r/spotlight        # Training and evaluation
git clone https://anonymous.4open.science/r/lm-corpus-FAB7   # Training corpus
git clone https://anonymous.4open.science/r/lm-profiler-A550 # Latency testing tool

# Install dependencies
cd spotlight
pip install -r requirements.txt
pip install -e .
cd ../lm-corpus-FAB7
pip install -e .
cd ../lm-profiler-A550
pip install -e .
```

For enhanced performance, install the CUDA kernel:

```bash
cd spotlight/spotlight/kernel
bash install.sh
```

Upon successful compilation, two `.so` files will be added to the kernel directory.
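As a quick sanity check, you can list the compiled extensions from that directory:

```bash
# Run from spotlight/spotlight/kernel after install.sh completes;
# the two compiled .so files should be listed.
ls *.so
```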
Pre-trained model checkpoints are available for download:
| Model | Checkpoint |
|---|---|
| LLaMA3-8B | llama3-8b-spotlight.pth |
| LLaMA3-8B (C4) | llama3-8b-spotlight-c4.pth |
| LLaMA3-8B (Code) | llama3-8b-spotlight-code.pth |
| Qwen2.5-1.5B | qwen2.5-1.5b-spotlight.pth |
| Qwen2.5-3B | qwen2.5-3b-spotlight.pth |
| Qwen2.5-7B | qwen2.5-7b-spotlight.pth |
| Qwen2.5-14B | qwen2.5-14b-spotlight.pth |
Note: if you are unable to download these files from anonymous GitHub, you can clone the repository locally and use Git LFS to fetch them. Each file is small; the largest is only a few dozen megabytes.
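For example, a minimal sketch assuming the checkpoints are tracked with Git LFS in the `spotlight` repository and that Git LFS is installed locally:

```bash
# Clone the repository and pull the LFS-tracked checkpoint files.
git clone https://anonymous.4open.science/r/spotlight
cd spotlight
git lfs install
git lfs pull
```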
Evaluate the Intersection over Union (IoU) metric:

- Run the test script:

  ```bash
  bash scripts/test_iou.sh
  ```

- By default, the script evaluates the training-free linear hashing version. To evaluate a trained model, update the `load_ckp` key in the relevant JSON configuration file (e.g., `test_iou/llama2-7b-linearhashing.json`) to point to the desired checkpoint from the Model Weights section, as in the sketch below.
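A minimal sketch of that config edit, assuming `load_ckp` is a top-level key in the JSON file (adjust the paths to your setup):

```bash
# Hypothetical helper: point an IoU config at a trained checkpoint.
python - <<'EOF'
import json

cfg_path = "test_iou/llama2-7b-linearhashing.json"
with open(cfg_path) as f:
    cfg = json.load(f)
# e.g., a file from the Model Weights table above
cfg["load_ckp"] = "/path/to/your-checkpoint.pth"
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
EOF
```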
Evaluate perplexity:

- Prepare Datasets:
  - Download the required datasets (the proof-pile and CodeParrot JSON files).
  - Set the environment variables:

    ```bash
    export SPOTLIGHT_PROOFPILE_PATH=/path/to/proof-pile.json
    export SPOTLIGHT_CODEPARROT_PATH=/path/to/codeparrot.json
    ```

    Replace `/path/to/` with the actual file paths.

- Run Evaluation:

  ```bash
  bash scripts/test_ppl.sh
  ```
Evaluate downstream tasks with lm-eval-harness:

All required datasets are automatically downloaded during evaluation. Ensure lm-eval-harness version 0.3.0 is installed, then run:

```bash
bash scripts/test_lmeval.sh
```
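If the harness is not already installed, one option (assuming the PyPI package name `lm-eval` for lm-eval-harness) is:

```bash
# Pin the harness to the version the test script expects.
pip install lm-eval==0.3.0
```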
Evaluate on LongBench:

- Prepare Datasets: Download `data.zip` and place it in the `LongBench/` directory as `LongBench/data.zip`.
- Run Evaluation:

  ```bash
  bash scripts/test_longbench.sh
  ```

- Evaluation Results: Below are the evaluation logs for various models and configurations.
LLaMA2-7B
| Method | Config | Eval Log |
|---|---|---|
| Original | N/A | llama2-7b |
| +Quest | 1024 Budget | llama2-7b-quest-1024 |
| +Quest | 128 Budget | llama2-7b-quest-128 |
| +MagicPIG | Default | llama2-7b-magicpig |
| +Spotlight | 90% Pruned | llama2-7b-spotlight-90 |
| +Spotlight | 98% Pruned | llama2-7b-spotlight-98 |

LLaMA2-7B-Chat

| Method | Config | Eval Log |
|---|---|---|
| Original | N/A | llama2-7b-chat |
| +Quest | 1024 Budget | llama2-7b-chat-quest-1024 |
| +Quest | 128 Budget | llama2-7b-chat-quest-128 |
| +MagicPIG | Default | llama2-7b-chat-magicpig |
| +Spotlight | 90% Pruned | llama2-7b-chat-spotlight-90 |
| +Spotlight | 98% Pruned | llama2-7b-chat-spotlight-98 |

LLaMA3-8B

| Method | Config | Eval Log |
|---|---|---|
| Original | N/A | llama3-8b |
| +Quest | 1024 Budget | llama3-8b-quest-1024 |
| +Quest | 256 Budget | llama3-8b-quest-256 |
| +MagicPIG | Default | llama3-8b-magicpig |
| +Spotlight | 90% Pruned | llama3-8b-spotlight-90 |
| +Spotlight | 98% Pruned | llama3-8b-spotlight-98 |

Note: the links above point to summaries of the results (if a file does not open properly, click the "view raw" button in the top-right corner). The detailed responses generated by each model can be found under `test_longbench/log`, where hundreds of JSON files record each model's output on each sub-dataset.
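To inspect one of those per-model logs directly (the file name is the one used in the similarity example below; the exact JSON layout may vary):

```bash
# Pretty-print the beginning of a LongBench output log.
python -m json.tool test_longbench/log/llama3-8b.json | head -n 40
```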
To evaluate response fidelity, obtain LongBench output files by either:
- Running the LongBench test scripts to generate output files.
- Using provided output files from the LongBench section.
For example, to compare output similarity between LLaMA3-8B with and without Spotlight Attention:

```bash
python test_longbench/test_sim.py test_longbench/log/llama3-8b.json test_longbench/log/llama3-8b-spotlight.json
```

To train your own checkpoint, first create the working directories:

```bash
cd spotlight
mkdir -p data/slimpajama ckp
```

Download and place the following datasets in the `data/slimpajama` directory:
Choose a training method based on available disk space:

- Sufficient Disk Space:
  1. Edit `train.sh` to include the `--prepare_data` argument and run the script.
  2. After completion, replace `--prepare_data` with `--use_prepared_data` and rerun (see the sketch after this list).
- Limited Disk Space (Default): Generate activations on-the-fly by running the `train.sh` script without modifications.
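A sketch of the two-pass flow for the sufficient-disk-space path, assuming `train.sh` is invoked from the repository root (the flags are the ones named above; the exact command inside the script may differ):

```bash
# Pass 1: with --prepare_data added inside train.sh, dump activations to disk.
bash train.sh
# Pass 2: swap --prepare_data for --use_prepared_data inside train.sh,
# then train from the prepared activations.
bash train.sh
```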
The trained checkpoint will be saved in the `ckp` directory and can be referenced in test scripts using the `load_ckp` keyword.
Training involves computing a ranking loss, which can be memory-intensive due to the large tensor $Z$ of shape

$$
n_{\text{heads}} \times n_{\text{query}} \times n_{\text{top}} \times (n_{\text{query}} - n_{\text{top}}).
$$
To mitigate memory issues, adjust the following parameters (default: 1024 each):

- `--max_que`: maximum number of query tokens.
- `--max_top`: maximum number of top-ranked tokens.
- `--max_oth`: maximum number of other tokens.

If out-of-memory errors occur, reduce `--max_que` and `--max_oth` first.
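As a rough illustration of how these caps bound the size of $Z$, the worst-case element count is simply their product times the number of heads (the head count below is a placeholder, not a value from this repository):

```bash
# Worst-case number of elements in Z under the chosen caps;
# multiply by bytes per element (e.g., 2 for bf16) for a memory estimate.
n_heads=32; max_que=1024; max_top=1024; max_oth=1024
echo $(( n_heads * max_que * max_top * max_oth ))
```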
This README covers installing, evaluating, and training models with Spotlight Attention. For additional details or support, refer to the linked repositories and datasets.
