1 change: 1 addition & 0 deletions build_env.yml
@@ -40,3 +40,4 @@ dependencies:
- linkchecker==10.4.0
- pre-commit==3.7.1
- python-dotenv[cli]==1.0.1
- mkdocs-ezglossary-plugin==1.7.1
6 changes: 3 additions & 3 deletions docs/account_management/cheaha_account.md
@@ -8,11 +8,11 @@ These instructions are intended to guide researchers on creating new accounts an

## Creating a New Account

Creating a new account is a simple, automated, self-service process. To start, navigate to <https://rc.uab.edu>, our [Open OnDemand](../cheaha/open_ondemand/index.md) web portal, and authenticate. The authentication process differs depending on your affiliation. Accounts are available to researchers in the following situations.
Creating a new <section:account> is a simple, automated, self-service process. To start, navigate to <https://rc.uab.edu>, our [Open OnDemand](../cheaha/open_ondemand/index.md) web portal, and authenticate. The authentication process differs depending on your affiliation. Accounts are available to researchers in the following situations.

- If you are affiliated with UAB and have a BlazerID, please authenticate using Single Sign-On (SSO).
- If you are affiliated with UAB and have a <section:BlazerID>, please authenticate using Single Sign-On (SSO).
- If you are affiliated with UAB Medicine, you will need to use your BlazerID to authenticate via Single Sign-On (SSO) instead of your UABMC authentication process.
- If you are an external collaborator and have a XIAS account with access to Cheaha, please authenticate using your XIAS email address as the ID, not the automatically generated `xias-XXXXXX-1` ID.
- If you are an external collaborator and have a XIAS account with access to <section:Cheaha>, please authenticate using your XIAS email address as the ID, not the automatically generated `xias-XXXXXX-1` ID.
- If you are an external collaborator and do not have a XIAS account, you will need a UAB-affiliated sponsor and will need to follow our [XIAS Guest Account Instructions](xias/guest_instructions.md). Your sponsor will need to follow our [XIAS Site Management](xias/pi_site_management.md) and [XIAS Guest Management](xias/pi_guest_management.md) documentation pages.

Once you have authenticated, you should see a page that looks like the following.
12 changes: 6 additions & 6 deletions docs/cheaha/getting_started.md
@@ -42,13 +42,13 @@ A full list of the available hardware can be found on our [hardware page](./hard

All researchers are granted 5 TB of individual storage when they [create their Research Computing account](../account_management/cheaha_account.md).

Shared storage is available to all Lab Groups and Core Facilities on campus. Shared storage is also available to UAB Administration groups.
<section:Shared storage> is available to all Lab Groups and Core Facilities on campus. Shared storage is also available to UAB Administration groups.

Please visit our [Storage page](../data_management/storage.md) for detailed information about our individual and shared storage options.

### Partitions

Compute nodes are divided into groups called partitions, each with specific qualities suitable for different kinds of workflows or software. In order to submit a compute job, a partition must be chosen in the Slurm options. The partitions can be roughly grouped as follows:
<section:Compute nodes> are divided into groups called <section:partitions>, each with specific qualities suitable for different kinds of workflows or software. In order to submit a compute job, a partition must be chosen in the Slurm options. The partitions can be roughly grouped as follows:

| Use | Partition Names | Notes |
|---|---|---|
@@ -70,7 +70,7 @@ To effectively manage and provide high-performance computing (HPC) resources to

##### Login vs. Compute Nodes

As with most HPC clusters, Cheaha nodes are divided into two types: the login node and compute nodes. The login node acts as the gateway for users to access the cluster, submit jobs, and manage files. Compute nodes, on the other hand, are like the engines of the cluster, designed to perform the heavy lifting of data processing and computation.
As with most HPC clusters, Cheaha nodes are divided into two types: the <section:login node> and <section:compute nodes>. The login node acts as the gateway for users to access the cluster, <section:submit jobs>, and manage files. Compute nodes, on the other hand, are like the engines of the cluster, designed to perform the heavy lifting of data processing and computation.

The login node can be accessed from the Cheaha landing page or through the `$HOME` directory. The images below show how to identify whether you are on a login node or a compute node.
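
A quick, hedged way to check from a terminal is shown below. The exact hostnames differ between node types, and the `SLURM_JOB_ID` variable is only set inside a running Slurm job.

```shell
# Print the name of the machine you are currently on;
# login and compute nodes have different hostnames.
hostname

# SLURM_JOB_ID is set only inside a Slurm job, so an empty value
# means you are still on the login node.
echo "${SLURM_JOB_ID:-not inside a Slurm job}"
```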

@@ -140,11 +140,11 @@ Ideally, only non-intensive tasks like editing files, or managing job submission

##### How to start SLURM Jobs?

There are two straightforward ways to start SLURM jobs on Cheaha, and they are detailed below.
There are two straightforward ways to start <section:SLURM> jobs on Cheaha, and they are detailed below.

###### Open OnDemand (OOD)

UAB uses the OOD platform, a web-based interface for providing access to cluster resources without the need for command-line tools. Users can easily submit jobs, manage files, and even use interactive applications directly from their browsers. One of the standout features of OOD is the ability to launch interactive applications, such as a virtual desktop environment. This feature allows users to work within the cluster as if they were on a local desktop, providing a user-friendly interface for managing tasks and running applications. For an overview of how the page works, and to read more details see our docs on [Navigating Open OnDemand](../cheaha/open_ondemand/index.md). After logging into OOD, users can access various applications designed for job management, file editing, and more.
UAB uses the <section:OOD> platform, a web-based interface for providing access to cluster resources without the need for command-line tools. Users can easily submit jobs, manage files, and even use interactive applications directly from their browsers. One of the standout features of OOD is the ability to launch interactive applications, such as a virtual desktop environment. This feature allows users to work within the cluster as if they were on a local desktop, providing a user-friendly interface for managing tasks and running applications. For an overview of how the page works, and to read more details see our docs on [Navigating Open OnDemand](../cheaha/open_ondemand/index.md). After logging into OOD, users can access various applications designed for job management, file editing, and more.

###### Terminal (sbatch Jobs)
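
As a minimal sketch (the resource values, partition, module, and script names below are illustrative placeholders, not recommendations), a batch script submitted with `sbatch` might look like this:

```shell
#!/bin/bash
#SBATCH --job-name=example        # illustrative job name
#SBATCH --partition=express       # choose a partition suited to your workload
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00
#SBATCH --output=%x-%j.out        # %x = job name, %j = job ID

module load Python                # hypothetical module name; find real names with `module spider`
python my_script.py               # hypothetical analysis script
```

Submit the script with `sbatch jobscript.sh` and check its status with `squeue -u $USER`.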

@@ -162,7 +162,7 @@ Slurm is our job queueing software used for submitting any number of job scripts

## Software

A large variety of software is available on Cheaha as modules. To view and use these modules see [the following documentation](./software/modules.md).
A large variety of software is available on Cheaha as <section:modules>. To view and use these modules see [the following documentation](./software/modules.md).

For new software installation, please try searching [Anaconda](../workflow_solutions/using_anaconda.md) for packages first. If you still need help, please [send a support ticket](../help/support.md).
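
As a brief illustration (the module name and version below are hypothetical; copy exact names from the `module spider` output), a typical module workflow looks like:

```shell
module spider samtools        # search for a module; the name here is hypothetical
module load SAMtools/1.18     # load a specific version, using the exact name reported above
module list                   # confirm which modules are currently loaded
```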

6 changes: 3 additions & 3 deletions docs/cheaha/hardware.md
@@ -12,13 +12,13 @@ The following hardware summaries may be useful for selecting partitions for work

### Summary

The table below contains a summary of the computational resources available on Cheaha and relevant Quality of Service (QoS) Limits. QoS limits allow us to balance usage and ensure fairness for all researchers using the cluster. QoS limits are not a guarantee of resource availability.
The table below contains a summary of the computational resources available on Cheaha and relevant <section:Quality of Service (QoS) Limits>. QoS limits allow us to balance usage and ensure fairness for all researchers using the cluster. QoS limits are not a guarantee of resource availability.

In the table, [Slurm](./slurm/introduction.md) partitions are grouped by shared QoS limits on cores, memory, and GPUs. Node limits are applied to partitions independently. All limits are applied to researchers independently.
In the table, [Slurm](./slurm/introduction.md) partitions are grouped by shared QoS limits on cores, memory, and <section:GPUs>. Node limits are applied to partitions independently. All limits are applied to researchers independently.

Examples of how to make use of the table:

- Suppose you submit 30 jobs to the "express" partition, and suppose each job needs 10 cores. Hypothetically, in order for all of the jobs to start at once, 300 cores would be required. The QoS limit on cores is 264 on the "express" partition, so at most 26 jobs (260 cores) can start at once. The remaining 4 jobs will be held in queue, because starting one more would go beyond the QoS limit (270 > 264).
- Suppose you submit 30 <section:jobs> to the "express" partition, and suppose each job needs 10 cores. Hypothetically, in order for all of the jobs to start at once, 300 cores would be required. The QoS limit on cores is 264 on the "express" partition, so at most 26 jobs (260 cores) can start at once. The remaining 4 jobs will be held in queue, because starting one more would go beyond the QoS limit (270 > 264).
- Suppose you submit 5 jobs to the "medium" partition and 5 to the "long" partition, each requiring 1 node. Then, 10 total nodes would be needed. In this case, it is possible for all 10 jobs to start at once, because node limits are applied to each partition separately: the 5 "medium" jobs count only against the "medium" limit, and the 5 "long" jobs count only against the "long" limit.
- Suppose you submit 5 jobs to the "amperenodes" partition and 5 to "amperenodes-medium", for a total of 10 A100 GPUs. Additionally, you also submit 4 jobs to the "pascalnodes" partition totaling 8 P100 GPUs. Then 4 of the "gpu: ampere" group jobs can start at once, because the QoS limit is 4 GPUs there. Additionally, all 4 of the "gpu: pascal" group jobs can start at once, because the QoS limit is 8 GPUs there. In this case, the QoS limits for the two groups are separate.

2 changes: 1 addition & 1 deletion docs/cheaha/job_efficiency.md
@@ -24,7 +24,7 @@ Questions to ask yourself before requesting resources:

1. How large is the data I'm working with?

- Start by requesting memory equal to double the size of one file, no less than 2 GB per core.
- Start by requesting <section:memory> equal to double the size of one file, no less than 2 GB per core.
- If that isn't enough, increase the request by 50% until there are no more memory errors.
- Example: If your data file is 4 GB, try starting out by requesting 8 GB of memory, then 12 GB, 16 GB, etc. (a corresponding `sbatch` sketch follows this list).
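
A hedged sketch of how that rule translates into a Slurm request (the file size and script name are illustrative):

```shell
# Data file is ~4 GB, so start by requesting double that amount.
sbatch --mem=8G jobscript.sh

# If the job fails with an out-of-memory error, raise the request by about 50% and resubmit.
sbatch --mem=12G jobscript.sh
```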

2 changes: 1 addition & 1 deletion docs/cheaha/open_ondemand/index.md
@@ -1,6 +1,6 @@
# Open OnDemand

Open OnDemand (OOD) is a web portal for accessing Cheaha. On it, you can submit interactive jobs in easy-to-use forms. These jobs include a generic desktop as well as specific apps such as RStudio or MATLAB. There is also access to a basic file manager for viewing and moving files.
Open OnDemand (OOD) is a web portal for accessing Cheaha. On it, you can submit interactive jobs in easy-to-use forms. These jobs include a generic desktop as well as specific apps such as <section:RStudio> or <section:MATLAB>. There is also access to a basic file manager for viewing and moving files.

The web portal can be accessed at [https://rc.uab.edu](https://rc.uab.edu) and is available both on and off campus.

2 changes: 1 addition & 1 deletion docs/cheaha/open_ondemand/ood_jupyter.md
@@ -14,7 +14,7 @@ To modify the environment that Anaconda and Jupyter will run in, please use the

### CUDA

For GPU applications, you'll need to load a `CUDA/*` module to have the CUDA toolkit available. If working with deep learning workflows, you may also need to load the `cuDNN/*-CUDA-*` module corresponding to your choice of `CUDA/*` module version. These are required for popular ML/DL/AI libraries like TensorFlow, Keras, and PyTorch. Use `module spider cuda/` and `module spider cudnn` to view the list of appropriate modules. An example of what to put in the Environment Setup field when using a version of TensorFlow compatible with CUDA version 12.2.0 is shown below.
For GPU applications, you'll need to load a `CUDA/*` module to have the CUDA toolkit available. If working with deep learning workflows, you may also need to load the `cuDNN/*-CUDA-*` module corresponding to your choice of `CUDA/*` module version. These are required for popular ML/DL/AI libraries like <section:TensorFlow>, <section:Keras>, and <section:PyTorch>. Use `module spider cuda/` and `module spider cudnn` to view the list of appropriate modules. An example of what to put in the Environment Setup field when using a version of TensorFlow compatible with CUDA version 12.2.0 is shown below.

```shell
# ENVIRONMENT SETUP
```
2 changes: 1 addition & 1 deletion docs/cheaha/open_ondemand/ood_layout.md
@@ -134,7 +134,7 @@ For each job running via Open OnDemand, there will be a card listed on this page
1. **Host**: The node on which the job is currently running.
1. **Time Remaining**: The amount of time remaining from the total requested time.
1. **Session ID**: This is the unique ID for the OOD session for this job, which can be clicked to access the OOD log directory for troubleshooting.
1. **Node, Cores and State**: Information about the assigned node and cores, and the state of the job.
1. **Node, Cores and State**: Information about the assigned <section:node> and <section:cores>, and the <section:state> of the job.
1. **Launch Desktop in new tab**: Click this button to open your interactive VNC session.
1. **Delete**: Click this button if you want to cancel/stop a running job, and/or delete the session if the job has already ended.
1. **View Only (Share-able Link)**: Click this button to share the URL of your job with someone. It allows them to watch as you interact with the program and assist you. However, they can only view and cannot control or enter any data.
2 changes: 1 addition & 1 deletion docs/cheaha/slurm/gpu.md
@@ -12,7 +12,7 @@ For more information on these nodes, see `Detailed Hardware Information`.

To submit a job with one or more GPUs, you will need to set the partition to the `pascalnodes` family of partitions for P100 GPUs or the `amperenodes` family for A100 GPUs.

When requesting a job using `sbatch`, you will need to include the Slurm flag `--gres=gpu:#`. Replace `#` with the number of GPUs you need. Quotas and constraints are available on our [Hardware Summary](../hardware.md#summary).
When requesting a job using `sbatch`, you will need to include the Slurm flag `--gres=gpu:#`. Replace `#` with the number of GPUs you need. <section:Quotas> and constraints are available on our [Hardware Summary](../hardware.md#summary).
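
For instance, a job requesting two A100 GPUs might include the following directives (the GPU count and time limit are illustrative):

```shell
#SBATCH --partition=amperenodes   # A100 GPUs; use a pascalnodes partition for P100s
#SBATCH --gres=gpu:2              # number of GPUs requested for this job
#SBATCH --time=02:00:00
```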

<!-- markdownlint-disable MD046 -->
!!! note
2 changes: 1 addition & 1 deletion docs/cheaha/slurm/slurm_tutorial.md
@@ -36,7 +36,7 @@ This user guide provides comprehensive insight into different types of batch job

1. [Parallel Jobs](#example-3-parallel-jobs) is suitable for executing multiple independent tasks/jobs simultaneously and efficiently distributing them across resources. This approach is particularly beneficial for small-scale tasks that cannot be split into parallel processes within the code itself. For example, consider a Python script that operates on different data sets; in such a scenario, you can utilize `srun` to execute multiple instances of the script concurrently, each operating on a different dataset and on different resources.

1. [Array Job](#example-4-array-job) is used for submitting and running a large number of identical tasks in parallel. They share the same code and execute with similar resource requirements. Instead of submitting multiple [sequential jobs](#example-2-sequential-job), you can submit a single array job, which helps to manage and schedule a large number of similar tasks efficiently. This improves efficiency, resource utilization, scalability, and ease of debugging. For instance, array jobs can be designed to execute multiple instances of the same task with slight variations in inputs or parameters, such as performing [FastQC](https://home.cc.umanitoba.ca/~psgendb/doc/fastqc.help) processing on 10 different samples (a minimal array-job sketch follows this list).
1. [Array Job](#example-4-array-job) is used for submitting and running a large number of identical tasks in parallel. They share the same code and execute with similar resource requirements. Instead of submitting multiple [sequential jobs](#example-2-sequential-job), you can submit a single array job, which helps to manage and schedule a large number of similar tasks efficiently. This improves efficiency, resource utilization, scalability, and ease of debugging. For instance, an <section:array job> can be designed to execute multiple instances of the same task with slight variations in inputs or parameters, such as performing [FastQC](https://home.cc.umanitoba.ca/~psgendb/doc/fastqc.help) processing on 10 different samples (a minimal array-job sketch follows this list).

1. [Multithreaded or Multicore Job](#example-5-multithreaded-or-multicore-job) is used when software inherently supports multithreaded parallelism, i.e. runs independent tasks simultaneously on multicore processors. For instance, numerous software packages such as [MATLAB](https://www.mathworks.com/help/matlab/ref/parfor.html), [FEBio](https://help.febio.org/FebioUser/FEBio_um_3-4-Section-2.6.html), and [Xplor-NIH](https://nmr.cit.nih.gov/xplor-nih/doc/current/helperPrograms/options.html) support running multiple tasks at the same time on multicore processors. Users or programmers do not need to modify the code; you can simply enable multithreaded parallelism by configuring the appropriate options.
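
A minimal array-job sketch for the FastQC scenario above is shown below; the module name, input file layout, and resource values are hypothetical placeholders.

```shell
#!/bin/bash
#SBATCH --job-name=fastqc_array
#SBATCH --array=1-10              # one array task per sample; range is illustrative
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00
#SBATCH --output=%x-%A_%a.out     # %A = array job ID, %a = array task index

module load FastQC                # hypothetical module name; check `module spider fastqc`

# Hypothetical input layout: sample_1.fastq ... sample_10.fastq
fastqc "sample_${SLURM_ARRAY_TASK_ID}.fastq"
```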

2 changes: 1 addition & 1 deletion docs/data_management/lts/lts_cores.md
@@ -2,7 +2,7 @@

[UAB Core Facilities](https://www.uab.edu/cores/ircp/uab-ircp-core-facilities) provide access to research instruments and services for scientific and clinical investigators. Cores can generate large amounts of data very quickly for many labs and so have some unique data management concerns. These concerns can be summarized as follows:

1. Data transfer off local machines
1. <section:Data transfer> off local machines
1. Data organization
1. Data distribution

6 changes: 3 additions & 3 deletions docs/data_management/storage.md
@@ -1,6 +1,6 @@
# Storage

Research Computing offers several data storage options to meet individual or shared needs of UAB researchers, depending on their requirements and use cases. The types of storage available, procedures for requesting access, responsibilities, and usage guidelines are detailed in the following sections.
Research Computing offers several data storage options to meet individual or shared needs of UAB researchers, depending on their requirements and use cases. The types of <section:storage> available, procedures for requesting access, responsibilities, and usage guidelines are detailed in the following sections.

## What Type of Storage Do I Need?

@@ -14,7 +14,7 @@ Every Cheaha user has personal directories found at `/home/$USER` (or `$HOME`) a

### How Do I Request Individual Long-Term Storage?

To request individual Long-Term Storage, please first read and understand how [Long-Term Storage](./lts/index.md) differs from traditional file systems, like GPFS on Cheaha. Decide if it is suitable for your needs. Then please feel free to contact [Support](../help/support.md).
To request individual <section:Long-Term Storage>, please first read and understand how [Long-Term Storage](./lts/index.md) differs from traditional file systems, like GPFS on Cheaha. Decide if it is suitable for your needs. Then please feel free to contact [Support](../help/support.md).

## What Shared Storage Solutions are Available?

@@ -87,7 +87,7 @@ To request changes in Shared Storage membership, please contact [Support](../hel

### How Can I Get A Larger `/data/project/` (GPFS) Allocation?

At this time, due to constraints on total GPFS storage, we are not able to increase `/data/project/` allocations. Please consider batching your analyses by leveraging a combination of [LTS](./lts/index.md) to store raw and/or input data, and [User Scratch](#user-scratch) for temporary storage of up to 100 TB of data for use during analysis.
At this time, due to constraints on total <section:GPFS storage>, we are not able to increase `/data/project/` allocations. Please consider batching your analyses by leveraging a combination of [LTS](./lts/index.md) to store raw and/or input data, and [User Scratch](#user-scratch) for temporary storage of up to 100 TB of data for use during analysis.

If you wish to have further discussion of options for expanding your GPFS allocation and other workarounds tailored to your workflow, please [Contact Support](../help/support.md). Please also note that project storage is not limited to a single project; it is meant as storage for multiple projects.
