# AstroCLIP
Official PyTorch implementation and pre-trained models for the paper **AstroCLIP: A Cross-Modal Foundation Model for Galaxies**.
AstroCLIP is a novel, cross-modal, self-supervised foundation model that creates a shared embedding space for multi-band imaging and optical spectra of galaxies. These embeddings encode meaningful physical information shared between both modalities, and can be used as the basis for competitive zero- and few-shot learning on a variety of downstream tasks, including similarity search, redshift estimation, galaxy property prediction, and morphology classification.
## Installation
The training and evaluation code requires PyTorch 2.0. Additionally, an up-to-date eventlet is required for wandb. Note that the code has only been tested with the specified versions and expects a Linux environment. To install the AstroCLIP package and its dependencies, follow the code below.
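For example, a standard from-source install looks like the following (a sketch; the exact steps may differ from the repository's instructions):

```
# Clone the repository and install the package together with its dependencies.
git clone https://github.com/PolymathicAI/AstroCLIP_v2.git
cd AstroCLIP_v2
pip install -e .
```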
It is possible to override the default storage path by changing the flag in `astroclip/env.py`.
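For instance, the relevant flag might look like the following (a hypothetical sketch; the actual variable name in `astroclip/env.py` may differ):

```
# astroclip/env.py -- hypothetical flag name; check the file for the real one.
ASTROCLIP_ROOT = "/scratch/<user>/astroclip"  # root directory for datasets and checkpoints
```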
## Training
AstroCLIP is trained using a two-step process. First, we pre-train a single-modal galaxy image encoder and a single-modal galaxy spectrum encoder separately. Then, we CLIP-align these two encoders on a paired image-spectrum dataset.
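Schematically, the alignment step optimizes a symmetric CLIP-style (InfoNCE) objective between the two encoders. The sketch below is illustrative rather than the exact implementation used in this repository:

```
import torch
import torch.nn.functional as F

def clip_loss(image_emb, spectrum_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/spectrum embeddings."""
    # Normalize so that dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    spectrum_emb = F.normalize(spectrum_emb, dim=-1)

    # Pairwise similarities; true pairs sit on the diagonal.
    logits = image_emb @ spectrum_emb.T / temperature
    targets = torch.arange(len(logits), device=logits.device)

    # Contrast in both directions: image -> spectrum and spectrum -> image.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))
```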
### DINOv2 ViT Image Pretraining:
AstroCLIP uses a Vision Transformer (ViT) to encode galaxy images. Pretraining is performed using the [DINOv2](https://github.com/facebookresearch/dinov2/tree/2302b6bf46953431b969155307b9bed152754069) package, which combines self-distillation, masked-modeling, and contrastive objectives. Overall, we use largely the same training regime; however, we modify some of the contrastive augmentations to suit an astrophysics context.
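As a concrete illustration of the kind of change involved (the actual augmentation set is defined in the DINOv2 config, and the sizes and probabilities below are placeholders): galaxies have no preferred orientation, so flips are label-preserving, while the hue/saturation jitter used for natural images would corrupt multi-band photometry and can be replaced with, e.g., additive noise.

```
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Additive Gaussian noise as a photometry-safe stand-in for color jitter."""
    def __init__(self, sigma=0.01):
        self.sigma = sigma

    def __call__(self, img):
        return img + self.sigma * torch.randn_like(img)

# Illustrative augmentation stack for multi-band galaxy cutouts (C x H x W tensors).
galaxy_augment = transforms.Compose([
    transforms.RandomResizedCrop(96, scale=(0.6, 1.0), antialias=True),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    AddGaussianNoise(sigma=0.01),
])
```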
Model training can be launched with the following command:
```
image_trainer -c astroclip/astrodino/config.yaml
```
Ultimately, we run training on 20 A100 GPUs (across 5 nodes) for 250k steps with the config provided [here](https://github.com/PolymathicAI/AstroCLIP_v2/blob/master/astroclip/astrodino/config.yaml), which takes roughly 46 hours.
### Spectrum encoder:
As with the image encoder, model training can be launched with `spectrum_trainer fit -c config/specformer.yaml`.