Commit 228500e

Merge commit with 2 parents: eb2b9fe + b270d05

5 files changed (+22 -7 lines)


README.md (+6 -5)

@@ -1,10 +1,10 @@
 # AstroCLIP
 
-Official PyTorch implementation and pre-trained models for paper **AstroCLIP: A Cross-Modal Foundation Model for Galaxies**.
+Official PyTorch implementation and pre-trained models for the paper **AstroCLIP: A Cross-Modal Foundation Model for Galaxies**.
 
 ![image](assets/im_embedding.png)
 
-AstroCLIP is a novel, cross-modal, self-supervised foundation model that creates a shared embedding space for multi-band imaging and optical spectra of galaxies. These embeddings encode meaningful physical information shared between both modalities, and can be used as the basis for competitive zero- and few-shot learning on a variety of downstream tasks, including similarity search, redshift estimation, galaxy property prediction, and morphology classification.
+AstroCLIP is a novel, cross-modal, self-supervised foundation model that creates a shared embedding space for multi-band imaging and optical spectra of galaxies. These embeddings encode meaningful physical information shared between both modalities, and can be used as the basis for competitive zero- and few-shot learning on a variety of downstream tasks, including similarity search, redshift estimation, galaxy property prediction, and morphology classification.
 
 ## Installation
 The training and evaluation code requires PyTorch 2.0. Additionally, an up-to-date eventlet is required for wandb. Note that the code has only been tested with the specified versions and also expects a Linux environment. To install the AstroCLIP package and its corresponding dependencies, please follow the code below.
@@ -14,6 +14,7 @@ pip install --upgrade pip
 pip install --upgrade eventlet torch lightning[extra]
 pip install -e .
 ```
+It is possible to override the default storage path by changing the flag in `astroclip/env.py`.
 
 ## Pretrained Models
 
@@ -77,10 +78,10 @@ The directory is organized into south and north surveys, where each survey is sp
 
 ## Training
 
-AstroCLIP is trained using a two-step process. First, we pre-train a single-modal galaxy image encoder and a single-modal galaxy spectrum encoder separately. Then, we CLIP align these two encoders on a paired image-spectrum dataset.
+AstroCLIP is trained using a two-step process. First, we pre-train a single-modal galaxy image encoder and a single-modal galaxy spectrum encoder separately. Then, we CLIP align these two encoders on a paired image-spectrum dataset.
 
-### Image Pretraining - ViT with DINOv2:
-AstroCLIP uses a Vision Transformer (ViT) to encode galaxy images. Pretraining is performed using the [DINOv2](https://github.com/facebookresearch/dinov2/tree/2302b6bf46953431b969155307b9bed152754069) package, which combines self-distillation, masked-modeling, and contrastive objectives. Overall, we use largely the same training regime, however we modify some of the contrastive augmentations to better suit the astrophysical context.
+### DINOv2 ViT Image Pretraining:
+AstroCLIP uses a Vision Transformer (ViT) to encode galaxy images. Pretraining is performed using the [DINOv2](https://github.com/facebookresearch/dinov2/tree/2302b6bf46953431b969155307b9bed152754069) package, which combines self-distillation, masked-modeling, and contrastive objectives. Overall, we use largely the same training regime; however, we modify some of the contrastive augmentations to suit an astrophysics context.
 
 Model training can be launched with the following command:
 ```
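The storage-path flag mentioned in the README addition above lives in `astroclip/env.py` (next file in this commit). Below is a minimal sketch, not part of the commit, of overriding it through a local `.env` file rather than editing the source; it assumes the project reads configuration with python-dotenv, as the `default_dotenv_values` helper suggests, and the path and entity name are placeholders.

```python
# Sketch only: read an overriding .env instead of the defaults in astroclip/env.py.
# Assumes python-dotenv is installed; values below are placeholders.
from dotenv import dotenv_values

# Example .env a user might place at the project root:
#   ASTROCLIP_ROOT="/data/astroclip"
#   WANDB_ENTITY_NAME="my-wandb-entity"
config = dotenv_values(".env")
print(config.get("ASTROCLIP_ROOT", "ASTROCLIP_ROOT not set"))
```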

astroclip/env.py (+7 -2)

@@ -9,6 +9,11 @@
 WARN_ONCE = True
 
 
+# TODO: change the defaults here
+ASTROCLIP_ROOT = "/mnt/ceph/users/polymathic/astroclip"
+WANDB_ENTITY_NAME = "flatiron-scipt"
+
+
 def default_dotenv_values():
     """Use a default .env but tell the user how to create their own."""
 
@@ -22,8 +27,8 @@ def default_dotenv_values():
     global WARN_ONCE
 
     # TODO: these should be replaced with a folder in the project's root
-    f.write('ASTROCLIP_ROOT="/mnt/ceph/users/polymathic/astroclip"\n')
-    f.write('WANDB_ENTITY_NAME="flatiron-scipt"\n')
+    f.write(f'ASTROCLIP_ROOT="{ASTROCLIP_ROOT}"\n')
+    f.write(f'WANDB_ENTITY_NAME="{WANDB_ENTITY_NAME}"\n')
     f.flush()
 
     if WARN_ONCE:
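For readers skimming the diff, here is a self-contained sketch, under the assumption that the writes above happen on an open file handle `f`, of the pattern this change introduces: module-level defaults interpolated into a generated `.env`. The `f` string prefix matters; without it the braces would be written literally.

```python
# Hypothetical standalone illustration of the default-.env pattern; not project code.
ASTROCLIP_ROOT = "/mnt/ceph/users/polymathic/astroclip"
WANDB_ENTITY_NAME = "flatiron-scipt"

with open(".env", "w") as f:
    # f-strings interpolate the module-level defaults into the written file.
    f.write(f'ASTROCLIP_ROOT="{ASTROCLIP_ROOT}"\n')
    f.write(f'WANDB_ENTITY_NAME="{WANDB_ENTITY_NAME}"\n')
```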

astroclip/models/__init__.py (+1)

@@ -1,4 +1,5 @@
 from . import astroclip
 from .astroclip import AstroClipModel
+from .loader import load_model
 from .moco_v2 import Moco_v2
 from .specformer import SpecFormer

astroclip/models/loader.py (new file, +7)

@@ -0,0 +1,7 @@
+import joblib
+from huggingface_hub import hf_hub_download
+
+
+def load_model(repo_id, filename):
+    model = joblib.load(hf_hub_download(repo_id=repo_id, filename=filename))
+    return model
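A possible usage of the new helper, which this commit re-exports from `astroclip.models`; the `repo_id` and `filename` below are placeholders, not artifacts published by the project.

```python
# Hypothetical call to load_model; both arguments are placeholders.
from astroclip.models import load_model

# hf_hub_download fetches (and caches) the file from the Hugging Face Hub,
# and joblib.load deserializes it into a model object.
model = load_model(repo_id="some-org/astroclip-weights", filename="model.joblib")
```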

requirements.txt (+1)

@@ -1,6 +1,7 @@
 astropy
 datasets
 dinov2 @ git+https://github.com/facebookresearch/dinov2.git@2302b6bf46953431b969155307b9bed152754069
+huggingface_hub
 jaxtyping
 lightning[extra]
 plotly
