For reproducibility, we include the scripts to generate the cross-matched datasets [here]().
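For context, cross-matching here amounts to pairing each imaged galaxy with its observed spectrum. A minimal sketch of a positional match with `astropy` is shown below; the toy catalogs, coordinates, and 1-arcsecond matching radius are purely illustrative and are not the values or method used by our scripts:

```python
import astropy.units as u
from astropy.coordinates import SkyCoord

# Toy catalogs (illustrative only): sky positions of imaged galaxies and of observed spectra.
image_cat = SkyCoord(ra=[150.10, 150.40, 151.20] * u.deg, dec=[2.20, 2.50, 1.90] * u.deg)
spec_cat = SkyCoord(ra=[150.1001, 151.1999, 152.70] * u.deg, dec=[2.2001, 1.9002, 3.10] * u.deg)

# For every image, find the nearest spectrum on the sky and keep matches within 1 arcsec.
idx, sep2d, _ = image_cat.match_to_catalog_sky(spec_cat)
good = sep2d < 1.0 * u.arcsec
pairs = [(i, int(j)) for i, (j, ok) in enumerate(zip(idx, good)) if ok]
print(pairs)  # [(0, 0), (2, 1)]
```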
### Image Pretraining Dataset

While the AstroCLIP and Spectrum Encoder models are trained on the image-spectrum dataset, we pretrain the galaxy image model separately on the full Stein, et al. (2022) image dataset, which consists of 76M galaxy images. This dataset can be accessed via a Globus endpoint. The directory is organized into south and north surveys.
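For orientation, a minimal sketch of how one might read a batch of galaxy cutouts from this dataset with `h5py` is shown below; the chunked HDF5 layout, the file name, and the `images` key are assumptions based on the Stein, et al. (2022) release, not guarantees about the exact files on the endpoint:

```python
import h5py

# Hypothetical chunk file from the south survey; actual file names on the endpoint may differ.
path = "south/images_npix152_000000000_001000000.h5"

with h5py.File(path, "r") as f:
    print(list(f.keys()))          # inspect the datasets actually present in the file
    cutouts = f["images"][:64]     # assumed key; (N, 3, 152, 152) g/r/z-band cutouts

print(cutouts.shape, cutouts.dtype)
```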
## Pretraining
AstroCLIP is trained using a two-step process. First, we pre-train a single-modal galaxy image encoder and a single-modal galaxy spectrum encoder separately. Then, we CLIP-align these two encoders on a paired image-spectrum dataset.
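To make the CLIP-alignment step concrete, below is a rough sketch of a symmetric contrastive (InfoNCE-style) objective between paired image and spectrum embeddings; the function name, temperature value, and exact formulation are illustrative assumptions rather than the repository's implementation:

```python
import torch
import torch.nn.functional as F

def clip_alignment_loss(image_emb: torch.Tensor,
                        spectrum_emb: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss over a batch of paired image/spectrum embeddings."""
    # Normalize so the dot products below are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    spectrum_emb = F.normalize(spectrum_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the true image-spectrum pairs.
    logits = image_emb @ spectrum_emb.T / temperature
    targets = torch.arange(logits.shape[0], device=logits.device)

    # Cross-entropy in both directions: image -> spectrum and spectrum -> image.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

# Example: a batch of 8 paired 512-dimensional embeddings.
loss = clip_alignment_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```

The loss pulls each galaxy's image and spectrum embeddings together while pushing apart embeddings from mismatched galaxies in the batch.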
### Image Pretraining - DINOv2 ViT:
AstroCLIP uses a Vision Transformer (ViT) to encode galaxy images. Pretraining is performed using the [DINOv2](https://github.com/facebookresearch/dinov2/) package, which combines self-distillation, masked-modeling, and contrastive objectives. Overall, we use largely the same training regime; however, we modify some of the contrastive augmentations to suit an astrophysics context.
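As an illustration of what astrophysics-appropriate contrastive augmentations can look like, the sketch below swaps natural-image color jitter for rotations, flips, and mild Gaussian noise; the specific transforms and parameters are assumptions for illustration, not the exact ones used in training:

```python
import torch
from torchvision import transforms as T

# Illustrative contrastive view for 3-band (g, r, z) galaxy cutouts.
# Galaxies have no preferred orientation, so rotations and flips are label-preserving,
# while natural-image color jitter is replaced by mild per-pixel Gaussian noise.
galaxy_view = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.RandomRotation(degrees=180),
    T.RandomResizedCrop(size=144, scale=(0.8, 1.0)),
    T.Lambda(lambda x: x + 0.01 * torch.randn_like(x)),
])

view = galaxy_view(torch.rand(3, 152, 152))  # one augmented view of a random cutout
print(view.shape)  # torch.Size([3, 144, 144])
```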
### Spectrum Pretraining:

The galaxy spectrum encoder is pretrained separately on the galaxy spectra in the image-spectrum dataset. Model training can be launched with the following command:

```
spectrum_trainer fit -c config/specformer.yaml
```

We train the model using 4 A100 GPUs (on 1 node) for 30k steps, which takes roughly 12 hours.
### CLIP Alignment:
Once pretrained, we align the image and spectrum encoders using cross-attention projection heads. We train the model using 4 A100 GPUs (on 1 node) for 15k steps.
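A rough sketch of what a cross-attention projection head can look like is given below; the module name, dimensions, and pooling-by-learnable-query design are assumptions for illustration, not the repository's exact architecture:

```python
import torch
import torch.nn as nn

class CrossAttentionHead(nn.Module):
    """Pools a sequence of encoder tokens into a single embedding via cross-attention."""

    def __init__(self, token_dim: int, embed_dim: int = 512, num_heads: int = 8):
        super().__init__()
        # A single learnable query attends over the pretrained encoder's output tokens.
        self.query = nn.Parameter(torch.randn(1, 1, token_dim))
        self.attn = nn.MultiheadAttention(token_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(token_dim, embed_dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, token_dim) from the image or spectrum encoder.
        query = self.query.expand(tokens.shape[0], -1, -1)
        pooled, _ = self.attn(query, tokens, tokens)   # (batch, 1, token_dim)
        return self.proj(pooled.squeeze(1))            # (batch, embed_dim)

# Example: project a batch of 4 sequences of 196 ViT tokens (dim 768) to 512-d embeddings.
head = CrossAttentionHead(token_dim=768)
emb = head(torch.randn(4, 196, 768))
print(emb.shape)  # torch.Size([4, 512])
```

In this sketch, one such head would sit on top of each pretrained encoder, and the resulting embeddings would then be fed to the contrastive alignment objective.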
## Downstream Tasks
TODO
## Acknowledgements
This repository uses datasets and contrastive augmentations from [Stein, et al. (2022)](https://github.com/georgestein/ssl-legacysurvey/tree/main). The image pretraining is built on top of the [DINOv2 pretraining framework](https://github.com/facebookresearch/dinov2/).
## License
AstroCLIP code and model weights are released under the MIT license. See [LICENSE](https://github.com/PolymathicAI/AstroCLIP/blob/main/LICENSE) for additional details.