3 changes: 2 additions & 1 deletion nemo/utils/import_utils.py
@@ -12,7 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# This file is taken from https://github.com/NVIDIA/NeMo-Curator, which is adapted from cuML's safe_imports module:
+# This file is taken from https://github.com/NVIDIA-NeMo/Curator/blob/dask/nemo_curator/utils/import_utils.py,
+# which is adapted from cuML's safe_imports module:
 # https://github.com/rapidsai/cuml/blob/e93166ea0dddfa8ef2f68c6335012af4420bc8ac/python/cuml/internals/safe_imports.py
2 changes: 1 addition & 1 deletion tutorials/llm/llama/README.rst
@@ -16,7 +16,7 @@ This repository contains Jupyter Notebook tutorials using the NeMo Framework for
 - Perform LoRA PEFT on Llama 3 8B Instruct using a dataset for bio-medical domain question answering. Deploy multiple LoRA adapters with NVIDIA NIM.
 * - `Llama 3.1 Law-Domain LoRA Fine-Tuning and Deployment with NeMo Framework and NVIDIA NIM <./sdg-law-title-generation>`_
 - `Law StackExchange <https://huggingface.co/datasets/ymoslem/Law-StackExchange>`_
-- Perform LoRA PEFT on Llama 3.1 8B Instruct using a synthetically augmented version of Law StackExchange with NeMo Framework, followed by deployment with NVIDIA NIM. As a prerequisite, follow the tutorial for `data curation using NeMo Curator <https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/peft-curation-with-sdg>`_.
+- Perform LoRA PEFT on Llama 3.1 8B Instruct using a synthetically augmented version of Law StackExchange with NeMo Framework, followed by deployment with NVIDIA NIM. As a prerequisite, follow the tutorial for `data curation using NeMo Curator <https://github.com/NVIDIA-NeMo/Curator/tree/dask/tutorials/peft-curation-with-sdg>`_.
 * - `Llama 3.1 Pruning and Distillation with NeMo Framework <./pruning-distillation>`_
 - `WikiText-103-v1 <https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-103-v1>`_
 - Perform pruning and distillation on Llama 3.1 8B using the WikiText-103-v1 dataset with NeMo Framework.
4 changes: 2 additions & 2 deletions tutorials/llm/llama/domain-adaptive-pretraining/README.md
@@ -17,9 +17,9 @@ Here, we share a tutorial with best practices on custom tokenization and DAPT (D
 
 * In this tutorial, we will leverage chip domain/hardware datasets from open-source GitHub repositories, wiki URLs, and academic papers. Therefore, as a prerequisite, the user should curate the domain-specific and general-purpose data using NeMo Curator and place them in the directories mentioned below.
 
-* `./code/data` should contain curated data from the chip domain after processing with NeMo Curator. The playbook for DAPT data curation can be found [here](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/dapt-curation)
+* `./code/data` should contain curated data from the chip domain after processing with NeMo Curator. The playbook for DAPT data curation can be found [here](https://github.com/NVIDIA-NeMo/Curator/tree/dask/tutorials/dapt-curation). Please note that this tutorial uses NeMo Curator version 0.9.0 or lower.
 
-* `./code/general_data` should contain open-source general-purpose data that llama-2 was trained on. This data will help identify token/vocabulary differences between general-purpose and domain-specific datasets. Data can be downloaded from [Wikipedia](https://huggingface.co/datasets/legacy-datasets/wikipedia), [CommonCrawl](https://data.commoncrawl.org/), etc., and curated with [NeMo Curator](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/single_node_tutorial)
+* `./code/general_data` should contain open-source general-purpose data that llama-2 was trained on. This data will help identify token/vocabulary differences between general-purpose and domain-specific datasets. Data can be downloaded from [Wikipedia](https://huggingface.co/datasets/legacy-datasets/wikipedia), [CommonCrawl](https://data.commoncrawl.org/), etc., and curated with [NeMo Curator](https://github.com/NVIDIA-NeMo/Curator/tree/dask/tutorials/single_node_tutorial). Please note that this tutorial uses NeMo Curator version 0.9.0 or lower.
 
 
 ## Custom Tokenization for DAPT
@@ -74,7 +74,7 @@
 "- Step 6: Merge the new embeddings with the original embedding table (in llama2-2-70b) to get the final <b>Domain Adapted Tokenizer</b>.\n",
 "## Data\n",
 "\n",
-"In this playbook, we will leverage chip domain/hardware datasets from open-source GitHub repositories, wiki URLs, and academic papers. Data has been processed and curated using [NeMo Curator](https://github.com/NVIDIA/NeMo-Curator/tree/main) as shown in this [playbook](https://github.com/jvamaraju/ndc_dapt_playbook/tree/dapt_jv)"
+"In this playbook, we will leverage chip domain/hardware datasets from open-source GitHub repositories, wiki URLs, and academic papers. Data has been processed and curated using [NeMo Curator](https://github.com/NVIDIA-NeMo/Curator/tree/dask) as shown in this [playbook](https://github.com/jvamaraju/ndc_dapt_playbook/tree/dapt_jv). Please note that this tutorial uses NeMo Curator version 0.9.0 or lower."
 ]
 },
 {
@@ -69,7 +69,7 @@
 "source": [
 "# Data\n",
 "\n",
-"* In this playbook, we will leverage chip domain/hardware datasets from open-source GitHub repositories, wiki URLs, and academic papers. Data has been processed and curated using [NeMo Curator](https://github.com/NVIDIA/NeMo-Curator/tree/main) as shown in this [playbook](https://github.com/jvamaraju/ndc_dapt_playbook/tree/dapt_jv)"
+"* In this playbook, we will leverage chip domain/hardware datasets from open-source GitHub repositories, wiki URLs, and academic papers. Data has been processed and curated using [NeMo Curator](https://github.com/NVIDIA-NeMo/Curator/tree/dask) as shown in this [playbook](https://github.com/jvamaraju/ndc_dapt_playbook/tree/dapt_jv). Please note that this tutorial uses NeMo Curator version 0.9.0 or lower."
 ]
 },
 {
2 changes: 1 addition & 1 deletion tutorials/llm/reasoning/README.md
@@ -8,7 +8,7 @@ This recipe is inspired by the [Llama Nemotron family of models](https://www.nvi
 
 Check out the following resources that are used in this tutorial.
 * [Llama-Nemotron-Post-Training-Data](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset), an open source dataset for instilling reasoning behavior in large language models.
-* The tutorial on [curating the Llama Nemotron Reasoning Dataset with NVIDIA NeMo Curator](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/llama-nemotron-data-curation).
+* The tutorial on [curating the Llama Nemotron Reasoning Dataset with NVIDIA NeMo Curator](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/text/llama-nemotron-data-curation).
 You will need the output from that tutorial for training a reasoning model.
 
 ## Hardware Requirements
4 changes: 2 additions & 2 deletions tutorials/llm/reasoning/Reasoning-SFT.ipynb
@@ -21,7 +21,7 @@
 "### 🧰 Tools and Resources\n",
 "* [NeMo Framework](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html)\n",
 "* [Llama-Nemotron-Post-Training-Data](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset), an open source dataset for instilling reasoning behavior in large language models.\n",
-"* [NeMo Curator](https://github.com/NVIDIA/NeMo-Curator) for data curation\n",
+"* [NeMo Curator](https://github.com/NVIDIA-NeMo/Curator) for data curation\n",
 "\n",
 "## 📌 Requirements\n",
 "\n",
@@ -32,7 +32,7 @@
 "* A valid Hugging Face API token with access to the [Meta LLaMa 3.1-8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model (since this is a gated model).\n",
 "\n",
 "### Dataset\n",
-"To follow along, you would need an appropriate reasoning dataset. Check out the tutorial on [curating the Llama Nemotron Reasoning Dataset with NVIDIA NeMo Curator](https://github.com/NVIDIA-NeMo/Curator/tree/dask/tutorials/llama-nemotron-data-curation).\n",
+"To follow along, you would need an appropriate reasoning dataset. Check out the tutorial on [curating the Llama Nemotron Reasoning Dataset with NVIDIA NeMo Curator](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/text/llama-nemotron-data-curation).\n",
 "You will need the output from that tutorial as the training set input to this playbook!\n",
 "\n",
 "### Hardware Requirements\n",