# Large Language Models as Curriculum Data Generators for Unsupervised Sentence Representation
For each existing model you want to improve, you can use the ChatGPT-generated data in ./data or the LLaMA-generated data in ./data_llama.
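To see what is available, you can list both directories; nothing is assumed here beyond the two paths named above:

```bash
# The ChatGPT-generated corpus lives in ./data and the
# LLaMA-generated corpus in ./data_llama; list both to inspect
# the training files they contain.
ls ./data ./data_llama
```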
We provide the training script ./run_unsup_example.sh, which calls ./train.py to do the actual training. Run it with:

```bash
bash run_unsup_example.sh
```
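The real arguments live in the script itself. As a rough sketch only, the kind of train.py invocation such a script typically wraps is shown below; every flag name and value here is an assumption in the style of Hugging Face training scripts, not taken from this repository, so consult run_unsup_example.sh for the actual configuration:

```bash
# Hypothetical sketch of the train.py call that run_unsup_example.sh
# might wrap. All flags, values, and file names below are assumptions;
# read the script itself for the real arguments.
# Swap ./data for ./data_llama to train on the LLaMA-generated corpus.
python train.py \
    --model_name_or_path bert-base-uncased \
    --train_file ./data/train.txt \
    --output_dir result/my-unsup-model \
    --num_train_epochs 1 \
    --per_device_train_batch_size 64 \
    --learning_rate 3e-5 \
    --do_train
```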
Before evaluation, please download the evaluation datasets by running:
```bash
cd SentEval/data/downstream/
bash download_dataset.sh
```

Our evaluation code for sentence embeddings is based on a modified version of SentEval. It evaluates sentence embeddings on semantic textual similarity (STS) tasks and on downstream transfer tasks. For STS tasks, our evaluation takes the "all" setting and reports Spearman's correlation. You can evaluate any transformers-based pre-trained model with our evaluation code. For example:
```bash
python evaluation.py \
    --model_name_or_path <your model name or path> \
    --pooler cls_before_pooler \
    --task_set sts \
    --mode test
```
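For instance, to sanity-check the pipeline with an off-the-shelf checkpoint from the Hugging Face Hub (any transformers-compatible model id plugs into the same command):

```bash
# Evaluate plain BERT on the STS test sets; only the model id
# changes relative to the documented command above.
python evaluation.py \
    --model_name_or_path bert-base-uncased \
    --pooler cls_before_pooler \
    --task_set sts \
    --mode test
```

In SimCSE-style evaluation code, the cls_before_pooler option selects the [CLS] hidden state without the extra MLP pooler head, which is the usual choice for contrastively trained sentence encoders.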