From 7109ddf25a8a90e4eef700a5690fd069a81450c9 Mon Sep 17 00:00:00 2001 From: bdu-birhanu Date: Mon, 30 Dec 2024 16:31:05 -0600 Subject: [PATCH 1/7] intial draft for glossary --- docs/account_management/cheaha_account.md | 6 +- docs/cheaha/getting_started.md | 12 +- docs/cheaha/hardware.md | 6 +- docs/cheaha/job_efficiency.md | 2 +- docs/cheaha/open_ondemand/index.md | 2 +- docs/cheaha/open_ondemand/ood_jupyter.md | 2 +- docs/cheaha/open_ondemand/ood_layout.md | 2 +- docs/cheaha/slurm/gpu.md | 2 +- docs/cheaha/slurm/slurm_tutorial.md | 2 +- docs/data_management/lts/lts_cores.md | 2 +- docs/data_management/lts/policies.md | 2 +- docs/data_management/storage.md | 6 +- docs/data_management/transfer/rclone.md | 2 +- docs/glossary.md | 172 ++++++++++++++++++++++ docs/uab_cloud/index.md | 4 +- docs/uab_cloud/remote_access.md | 6 +- mkdocs.yml | 13 +- 17 files changed, 213 insertions(+), 30 deletions(-) create mode 100644 docs/glossary.md diff --git a/docs/account_management/cheaha_account.md b/docs/account_management/cheaha_account.md index 5408b1727..653784671 100644 --- a/docs/account_management/cheaha_account.md +++ b/docs/account_management/cheaha_account.md @@ -4,11 +4,11 @@ These instructions are intended to guide researchers on creating new accounts an ## Creating a New Account -Creating a new account is a simple, automated, self-service process. To start, navigate to , our [Open OnDemand](../cheaha/open_ondemand/index.md) web portal, and authenticate. The authentication process differs depending on your affiliation. Accounts are available to researchers with the following situations. +Creating a new is a simple, automated, self-service process. To start, navigate to , our [Open OnDemand](../cheaha/open_ondemand/index.md) web portal, and authenticate. The authentication process differs depending on your affiliation. Accounts are available to researchers with the following situations. -- If you are affiliated with UAB and have a BlazerID, please authenticate using Single Sign-On (SSO). +- If you are affiliated with UAB and have a , please authenticate using Single Sign-On (SSO). - If you are affiliated with UAB Medicine, you will need to use your BlazerID to authenticate via Single Sign-On (SSO) instead of your UABMC authentication process. -- If you are an external collaborator and have a XIAS account with access to Cheaha, please authenticate using your XIAS email address as the ID, not automatically generated `xias-XXXXXX-1` ID. +- If you are an external collaborator and have a XIAS account with access to , please authenticate using your XIAS email address as the ID, not automatically generated `xias-XXXXXX-1` ID. - If you are an external collaborator and do not have a XIAS account, you will need a UAB-affiliated sponsor and will need to follow our [XIAS Guest Account Instructions](xias/guest_instructions.md). Your sponsor will need to follow our [XIAS Site Management](xias/pi_site_management.md) and [XIAS Guest Management](xias/pi_guest_management.md) documentation pages. Once you have authenticated, you should see a page that looks like the following. diff --git a/docs/cheaha/getting_started.md b/docs/cheaha/getting_started.md index 084282dc5..5d1fd3f05 100644 --- a/docs/cheaha/getting_started.md +++ b/docs/cheaha/getting_started.md @@ -42,13 +42,13 @@ A full list of the available hardware can be found on our [hardware page](./hard All researchers are granted 5 TB of individual storage when they [create their Research Computing account](../account_management/cheaha_account.md). 
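As a rough illustration of how a researcher might keep an eye on that 5 TB individual allocation, the sketch below uses only standard Linux tools (nothing Cheaha-specific is assumed beyond the documented `$HOME` location):

```shell
# Minimal sketch: check how much of your personal storage you are using.
du -sh "$HOME"              # total size of your home directory
du -sh "$HOME"/* | sort -h  # per-directory breakdown, largest last
```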
-Shared storage is available to all Lab Groups and Core Facilities on campus. Shared storage is also available to UAB Administration groups. + is available to all Lab Groups and Core Facilities on campus. Shared storage is also available to UAB Administration groups. Please visit our [Storage page](../data_management/storage.md) for detailed information about our individual and shared storage options. ### Partitions -Compute nodes are divided into groups called partitions each with specific qualities suitable for different kinds of workflows or software. In order to submit a compute job, a partition must be chosen in the Slurm options. The partitions can be roughly grouped as such: + are divided into groups called each with specific qualities suitable for different kinds of workflows or software. In order to submit a compute job, a partition must be chosen in the Slurm options. The partitions can be roughly grouped as such: | Use | Partition Names | Notes | |---|---|---| @@ -70,7 +70,7 @@ To effectively manage and provide high-performance computing (HPC) resources to ##### Login vs. Compute Nodes -Like with most HPC clusters, cheaha nodes are divided into two, the login node and compute nodes. The login node acts as the gateway for users to access the cluster, submit jobs, and manage files. Compute nodes, on the other hand, are like the engines of the cluster, designed to perform the heavy lifting of data processing and computation. +Like with most HPC clusters, cheaha nodes are divided into two, the and . The login node acts as the gateway for users to access the cluster, , and manage files. Compute nodes, on the other hand, are like the engines of the cluster, designed to perform the heavy lifting of data processing and computation. The Login node can be accessed from the Cheaha landing page or through the `$HOME` directory. You can see in the images below, how to identify if you’re within a login node or compute node. @@ -140,11 +140,11 @@ Ideally, only non-intensive tasks like editing files, or managing job submission ##### How to start SLURM Jobs? -There are two straightforward ways to start SLURM jobs on cheaha, and they are detailed below. +There are two straightforward ways to start jobs on cheaha, and they are detailed below. ###### Open OnDemand (OOD) -UAB uses the OOD platform, a web-based interface for providing access to cluster resources without the need for command-line tools. Users can easily submit jobs, manage files, and even use interactive applications directly from their browsers. One of the standout features of OOD is the ability to launch interactive applications, such as a virtual desktop environment. This feature allows users to work within the cluster as if they were on a local desktop, providing a user-friendly interface for managing tasks and running applications. For an overview of how the page works, and to read more details see our docs on [Navigating Open OnDemand](../cheaha/open_ondemand/index.md). After logging into OOD, users can access various applications designed for job management, file editing, and more. +UAB uses the platform, a web-based interface for providing access to cluster resources without the need for command-line tools. Users can easily submit jobs, manage files, and even use interactive applications directly from their browsers. One of the standout features of OOD is the ability to launch interactive applications, such as a virtual desktop environment. 
This feature allows users to work within the cluster as if they were on a local desktop, providing a user-friendly interface for managing tasks and running applications. For an overview of how the page works, and to read more details see our docs on [Navigating Open OnDemand](../cheaha/open_ondemand/index.md). After logging into OOD, users can access various applications designed for job management, file editing, and more. ###### Terminal (sbatch Jobs) @@ -162,7 +162,7 @@ Slurm is our job queueing software used for submitting any number of job scripts ## Software -A large variety of software is available on Cheaha as modules. To view and use these modules see [the following documentation](./software/modules.md). +A large variety of software is available on Cheaha as . To view and use these modules see [the following documentation](./software/modules.md). For new software installation, please try searching [Anaconda](../workflow_solutions/using_anaconda.md) for packages first. If you still need help, please [send a support ticket](../help/support.md) diff --git a/docs/cheaha/hardware.md b/docs/cheaha/hardware.md index 3df21f8b5..196e97d89 100644 --- a/docs/cheaha/hardware.md +++ b/docs/cheaha/hardware.md @@ -12,13 +12,13 @@ The following hardware summaries may be useful for selecting partitions for work ### Summary -The table below contains a summary of the computational resources available on Cheaha and relevant Quality of Service (QoS) Limits. QoS limits allow us to balance usage and ensure fairness for all researchers using the cluster. QoS limits are not a guarantee of resource availability. +The table below contains a summary of the computational resources available on Cheaha and relevant . QoS limits allow us to balance usage and ensure fairness for all researchers using the cluster. QoS limits are not a guarantee of resource availability. -In the table, [Slurm](./slurm/introduction.md) partitions are grouped by shared QoS limits on cores, memory, and GPUs. Node limits are applied to partitions independently. All limits are applied to researchers independently. +In the table, [Slurm](./slurm/introduction.md) partitions are grouped by shared QoS limits on cores, memory, and . Node limits are applied to partitions independently. All limits are applied to researchers independently. Examples of how to make use of the table: -- Suppose you submit 30 jobs to the "express" partition, and suppose each job needs 10 cores each. Hypothetically, in order for all of the jobs to start at once, 300 cores would be required. The QoS limit on cores is 264 on the "express" partition, so at most 26 jobs (260 cores) can start at once. The remaining 4 jobs will be held in queue, because starting one more would go beyond the QoS limit (270 > 264). +- Suppose you submit 30 to the "express" partition, and suppose each job needs 10 cores each. Hypothetically, in order for all of the jobs to start at once, 300 cores would be required. The QoS limit on cores is 264 on the "express" partition, so at most 26 jobs (260 cores) can start at once. The remaining 4 jobs will be held in queue, because starting one more would go beyond the QoS limit (270 > 264). - Suppose you submit 5 jobs to the "medium" partition and 5 to the "long" partition, each requiring 1 node. Then, 10 total nodes would be needed. In this case, it is possible that all 10 jobs can start at once because partition node limits are separate. If all 5 jobs start, jobs on the "medium" partition. 
- Suppose you submit 5 jobs to the "amperenodes" partition and 5 to "amperenodes-medium", for a total of 10 A100 GPUs. Additionally, you also submit 4 jobs to the "pascalnodes" partition totaling 8 P100 GPUs. Then 4 of the "gpu: ampere" group jobs can start at once, because the QoS limit is 4 GPUs there. Additionally, all 4 of the "gpu: pascal" group jobs, because the QoS limit is 8 GPUs there. In this case, the QoS for each group is separate. diff --git a/docs/cheaha/job_efficiency.md b/docs/cheaha/job_efficiency.md index 9f22a3c8c..514843d58 100644 --- a/docs/cheaha/job_efficiency.md +++ b/docs/cheaha/job_efficiency.md @@ -24,7 +24,7 @@ Questions to ask yourself before requesting resources: 1. How large is the data I'm working with? - - Start by requesting memory equal to double the size of one file, no less than 2 GB per core. + - Start by requesting equal to double the size of one file, no less than 2 GB per core. - If that isn't enough, increase the request by 50% until there are no more memory errors. - Example: If your data file is 4 GB, try starting out by requesting 8 GB of memory, then 12 GB, 16 GB, etc. diff --git a/docs/cheaha/open_ondemand/index.md b/docs/cheaha/open_ondemand/index.md index a0b5c7206..be48f7526 100644 --- a/docs/cheaha/open_ondemand/index.md +++ b/docs/cheaha/open_ondemand/index.md @@ -1,6 +1,6 @@ # Open OnDemand -Open OnDemand (OOD) is web portal to access Cheaha. On it, you can submit interactive jobs in easy to use forms. These jobs include a generic desktop as well as specific apps such as RStudio or MATLAB. There is also access to a basic file manager for viewing and moving files. +Open OnDemand (OOD) is web portal to access Cheaha. On it, you can submit interactive jobs in easy to use forms. These jobs include a generic desktop as well as specific apps such as or . There is also access to a basic file manager for viewing and moving files. The web portal can be accessed at [https://rc.uab.edu](https://rc.uab.edu) and is available both on and off campus. diff --git a/docs/cheaha/open_ondemand/ood_jupyter.md b/docs/cheaha/open_ondemand/ood_jupyter.md index 987dc3ee7..3f14a7e80 100644 --- a/docs/cheaha/open_ondemand/ood_jupyter.md +++ b/docs/cheaha/open_ondemand/ood_jupyter.md @@ -14,7 +14,7 @@ To modify the environment that Anaconda and Jupyter will run in, please use the ### CUDA -For GPU applications you'll need to load a `CUDA/*` module to have the CUDA toolkit available. If working with deep learning workflows, you may also need to load the `cuDNN/*-CUDA-*` module corresponding to your choice of `CUDA/*` module version. These are required for popular ML/DL/AI libraries like TensorFlow, Keras, and PyTorch. Use `module spider cuda/` and `module spider cudnn` to view the list of appropriate modules. An example of what to put in the Environment Setup field when using a version of Tensorflow compatible with CUDA version 12.2.0 is shown below. +For GPU applications you'll need to load a `CUDA/*` module to have the CUDA toolkit available. If working with deep learning workflows, you may also need to load the `cuDNN/*-CUDA-*` module corresponding to your choice of `CUDA/*` module version. These are required for popular ML/DL/AI libraries like , , and . Use `module spider cuda/` and `module spider cudnn` to view the list of appropriate modules. An example of what to put in the Environment Setup field when using a version of Tensorflow compatible with CUDA version 12.2.0 is shown below. 
```shell # ENVIRONMENT SETUP diff --git a/docs/cheaha/open_ondemand/ood_layout.md b/docs/cheaha/open_ondemand/ood_layout.md index 6b1985c4b..8fc9407ee 100644 --- a/docs/cheaha/open_ondemand/ood_layout.md +++ b/docs/cheaha/open_ondemand/ood_layout.md @@ -134,7 +134,7 @@ For each job running via Open OnDemand, there will be a card listed on this page 1. **Host**: The node on which the job is currently running. 1. **Time Remaining**: The amount of time remaining from the total requested time. 1. **Session ID**: This is the unique ID for the OOD session for this job, which can be clicked to access the OOD log directory for troubleshooting. -1. **Node, Cores and State**: Information about the number of node, cores assignment, and state of the job. +1. **Node, Cores and State**: Information about the number of , assignment, and of the job. 1. **Launch Desktop in new tab**: Click this button to open your interactive VNC session. 1. **Delete**: Click this button if you want to cancel/stop a running job, and/or delete the session if the job has already ended. 1. **View Only (Share-able Link)**: Click this button to share the URL of your job with someone. It allows them to watch as you interact with the program and assist you. However, they can only view and cannot control or enter any data. diff --git a/docs/cheaha/slurm/gpu.md b/docs/cheaha/slurm/gpu.md index 316c96459..0ca2901c3 100644 --- a/docs/cheaha/slurm/gpu.md +++ b/docs/cheaha/slurm/gpu.md @@ -12,7 +12,7 @@ For more information on these nodes, see `Detailed Hardware Information`. To submit a job with one or more GPUs, you will need to set the partition to `pascalnodes` or `amperenodes` family of partitions for P100 GPUs or `amperenodes` family for A100 GPUs. -When requesting a job using `sbatch`, you will need to include the Slurm flag `--gres=gpu:#`. Replace `#` with the number of GPUs you need. Quotas and constraints are available on our [Hardware Summary](../hardware.md#summary) +When requesting a job using `sbatch`, you will need to include the Slurm flag `--gres=gpu:#`. Replace `#` with the number of GPUs you need. and constraints are available on our [Hardware Summary](../hardware.md#summary) !!! note diff --git a/docs/cheaha/slurm/slurm_tutorial.md b/docs/cheaha/slurm/slurm_tutorial.md index df27c93c8..0b7bd8667 100644 --- a/docs/cheaha/slurm/slurm_tutorial.md +++ b/docs/cheaha/slurm/slurm_tutorial.md @@ -36,7 +36,7 @@ This user guide provides comprehensive insight into different types of batch job 1. [Parallel Jobs](#example-3-parallel-jobs) is suitable for executing multiple independent tasks/jobs simultaneously and efficiently distributing them across resources. This approach is particularly beneficial for small-scale tasks that cannot be split into parallel processes within the code itself. For example, consider a Python script that operates on different data set, in such a scenario, you can utilize `srun` to execute multiple instances of the script concurrently, each operating on a different dataset and on different resources. -1. [Array Job](#example-4-array-job) is used for submitting and running multiple large number of identical tasks in parallel. They share the same code and execute with similar resource requirements. Instead of submitting multiple [sequential job](#example-2-sequential-job), you can submit a single array job, which helps to manage and schedule a large number of similar tasks efficiently. This improves efficiency, resource utilization, scalability, and ease of debugging. 
For instance, array jobs can be designed for executing multiple instances of the same task with slight variations in inputs or parameters such as perform [FastQC](https://home.cc.umanitoba.ca/~psgendb/doc/fastqc.help) processing on 10 different samples. +1. [Array Job](#example-4-array-job) is used for submitting and running multiple large number of identical tasks in parallel. They share the same code and execute with similar resource requirements. Instead of submitting multiple [sequential job](#example-2-sequential-job), you can submit a single array job, which helps to manage and schedule a large number of similar tasks efficiently. This improves efficiency, resource utilization, scalability, and ease of debugging. For instance, can be designed for executing multiple instances of the same task with slight variations in inputs or parameters such as perform [FastQC](https://home.cc.umanitoba.ca/~psgendb/doc/fastqc.help) processing on 10 different samples. 1. [Mutlithreaded or Multicore Job](#example-5-multithreaded-or-multicore-job) is used when software inherently support multithreaded parallelism i.e run independent tasks simultaneously on multicore processors. For instance, there are numerous software such as [MATLAB](https://www.mathworks.com/help/matlab/ref/parfor.html), [FEBio](https://help.febio.org/FebioUser/FEBio_um_3-4-Section-2.6.html), [Xplor-NIH](https://nmr.cit.nih.gov/xplor-nih/doc/current/helperPrograms/options.html) support running multiple tasks at the same time on multicore processors. Users or programmers do not need to modify the code; you can simply enable multithreaded parallelism by configuring the appropriate options. diff --git a/docs/data_management/lts/lts_cores.md b/docs/data_management/lts/lts_cores.md index 7392247ae..90e45cb2e 100644 --- a/docs/data_management/lts/lts_cores.md +++ b/docs/data_management/lts/lts_cores.md @@ -2,7 +2,7 @@ [UAB Core Facilities](https://www.uab.edu/cores/ircp/uab-ircp-core-facilities) provide access to research instruments and services for scientific and clinical investigators. Cores can generate large amounts of data very quickly for many labs and so have some unique data management concerns. These concerns can be summarized as follows: -1. Data transfer off local machines +1. off local machines 1. Data organization 1. Data distribution diff --git a/docs/data_management/lts/policies.md b/docs/data_management/lts/policies.md index 52b32dd75..d5fa321ff 100644 --- a/docs/data_management/lts/policies.md +++ b/docs/data_management/lts/policies.md @@ -2,7 +2,7 @@ A major use for LTS is storage of data that should be accessible to multiple users from a lab or research group. By default, buckets are only visible and accessible to the owner of the bucket, and no mechanism exists to search for buckets other users have created. -Instead, sharing buckets must be done through the command line using [bucket policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-policies.html). A bucket policy is a JSON formatted file that assigns user read and write permissions to the bucket and to objects within the bucket. If you have not worked with JSON files before, a brief explantion can be found [here](https://docs.fileformat.com/web/json/). It's important to note that the bucket owner will always retain the ability to perform all actions on a bucket and its contents and so do not need to be explicitly granted permissions. 
+Instead, sharing buckets must be done through the command line using [bucket policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-policies.html). A bucket policy is a JSON formatted file that assigns user read and write permissions to the bucket and to objects within the bucket. If you have not worked with JSON files before, a brief explanation can be found [here](https://docs.fileformat.com/web/json/). It's important to note that the bucket owner will always retain the ability to perform all actions on a bucket and its contents and so do not need to be explicitly granted permissions. !!! important diff --git a/docs/data_management/storage.md b/docs/data_management/storage.md index 18ad81562..a13263a83 100644 --- a/docs/data_management/storage.md +++ b/docs/data_management/storage.md @@ -1,6 +1,6 @@ # Storage -Research Computing offers several data storage options to meet individual or shared needs of UAB researchers, depending on their requirement and use-cases. The types of storage available, procedures for requesting access, responsibilities, and usage guidelines are detailed in the following sections. +Research Computing offers several data storage options to meet individual or shared needs of UAB researchers, depending on their requirement and use-cases. The types of available, procedures for requesting access, responsibilities, and usage guidelines are detailed in the following sections. ## What Type of Storage Do I Need? @@ -14,7 +14,7 @@ Every Cheaha user has personal directories found at `/home/$USER` (or `$HOME`) a ### How Do I Request Individual Long-Term Storage? -To request individual Long-Term Storage, please first read and understand how [Long-Term Storage](./lts/index.md) differs from traditional file systems, like GPFS on Cheaha. Decide if it is suitable for your needs. Then please feel free to contact [Support](../help/support.md). +To request individual , please first read and understand how [Long-Term Storage](./lts/index.md) differs from traditional file systems, like GPFS on Cheaha. Decide if it is suitable for your needs. Then please feel free to contact [Support](../help/support.md). ## What Shared Storage Solutions are Available? @@ -87,7 +87,7 @@ To request changes in Shared Storage membership, please contact [Support](../hel ### How Can I Get A Larger `/data/project/` (GPFS) Allocation? -At this time, due to constraints on total GPFS storage, we are not able to increase `/data/project/` allocations. Please consider batching your analyses by leveraging a combination of [LTS](./lts/index.md) to store raw and/or input data, and [User Scratch](#user-scratch) for temporary storage of up to 100 TB of data for use during analysis. +At this time, due to constraints on total , we are not able to increase `/data/project/` allocations. Please consider batching your analyses by leveraging a combination of [LTS](./lts/index.md) to store raw and/or input data, and [User Scratch](#user-scratch) for temporary storage of up to 100 TB of data for use during analysis. If you wish to have further discussion of options for expanding your GPFS allocation and other workarounds tailored to your workflow, please [Contact Support](../help/support.md). Please also note that project storage is not just for a single project only, it is meant as a storage for multiple projects. 
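As a concrete illustration of the batching pattern suggested above (raw inputs in LTS, temporary working data in User Scratch), here is a minimal, hypothetical job-script sketch. It assumes an `rclone` remote named `lts` has already been configured for Long-Term Storage, and the bucket and project names are placeholders; `$USER_SCRATCH` and the `express` partition are the documented locations and partition names.

```shell
#!/bin/bash
#SBATCH --job-name=batch-analysis
#SBATCH --partition=express      # partition named in the hardware summary
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00

# Stage raw input from LTS into user scratch for the duration of the job.
# "lts:" is a hypothetical rclone remote name; replace with your configured remote.
WORKDIR="$USER_SCRATCH/myproject"
mkdir -p "$WORKDIR"
rclone copy lts:my-bucket/raw-data "$WORKDIR/input"

# ... run the analysis against $WORKDIR/input, writing to $WORKDIR/output ...

# Push results back to LTS so scratch can be cleaned up afterwards.
rclone copy "$WORKDIR/output" lts:my-bucket/results
```

Working this way keeps the limited `/data/project/` space free while still preserving raw data and results in LTS.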
diff --git a/docs/data_management/transfer/rclone.md b/docs/data_management/transfer/rclone.md index 7b9ff1353..6b188b3fc 100644 --- a/docs/data_management/transfer/rclone.md +++ b/docs/data_management/transfer/rclone.md @@ -1,6 +1,6 @@ # RClone -[RClone](https://rclone.org/) is a powerful command line tool for transferring and synchronizing files over the internet between various machines, servers and cloud storage services. It is highly recommended for small to moderate amounts of data. For very large amounts of data consider using [Globus](globus.md) for increased robustness against failure. Where Globus is not available, `rclone` is still suitable. +[RClone](https://rclone.org/) is a powerful command line tool for transferring and synchronizing files over the internet between various machines, servers and cloud storage services. It is highly recommended for small to moderate amounts of data. For very large amounts of data consider using [Globus](globus.md) for increased robustness against failure. Where is not available, `rclone` is still suitable. RClone requires a modest amount of setup time on local machines, but once setup can be used fairly easily. RClone uses the concepts of "remotes", which is an abstract term for any storage service or device that is not physically part of the local machine. Many remotes are offered, including [SFTP](../../uab_cloud/remote_access.md#sftp) and various [UAB Cloud Storage Solutions](https://www.uab.edu/it/home/tech-solutions/file-storage/storage-options). SFTP may be used to access Cheaha, cloud.rc and other laptop and desktop computers. diff --git a/docs/glossary.md b/docs/glossary.md new file mode 100644 index 000000000..282e38433 --- /dev/null +++ b/docs/glossary.md @@ -0,0 +1,172 @@ +--- +toc_depth: 1 +--- + +# Glossary + +## Introduction + +This glossary defines key terms related to the Research Computing system to help researchers, administrators, help desks, and users effectively use our resources, and also serving as a helpful reference for understanding the system's terminology, listed alphabetically below. + + [A](#a) [**B**](#b) [**C**](#c) [**D**](#d) [**E**](#e) [**F**](#f) [**G**](#g) [**H**](#h) [**I**](#i) [**J**](#j) [**K**](#k) [**L**](#l) [**M**](#m) [**N**](#n) [**O**](#o) [**P**](#p) [**Q**](#q) [**R**](#r) [**S**](#s) [**T**](#t) [**U**](#u) [**V**](#v) [**W**](#w) [**X**](#x) [**Y**](#y) [**Z**](#z) + +### A + +**section:Account** +: Refers to the user credentials that you use to log into the Research Computing systems. + +**section:ACLs** +: Access Control Lists (ACLs) is a mechanism used to define permissions for files and directories in an HPC system. + +**section:Array Job** +: A job type in HPC used to submit and execute a large number of identical or similar tasks in parallel, managed efficiently under a single job submission. + +### B + +**section:BlazerID** +: The user name that will bed used to connect to any UAB system. + +### C + +**section:cores** +: Individual processing units within a CPU that can execute tasks. + +**section:Cheaha** +: A shared cluster computing environment for UAB researchers + +**section:Compute node** +: A dedicated server in an Cheaha cluster designed to perform computational tasks. + +### D + +**section:Data transfer** +: The process of moving files between local systems and Cheaha, or between storage locations on the Research computing system. 
+ +### E + +### F + +### G + +**section:Globus** +: A data transfer tool that enables fast and secure research data transfers between our local systems and Cheaha system, or between storage locations on the Cheaha system. + +**section: GPFS storage** +: General Parallel File System (GPFS) storage provides scalable and distributed storage to manage large amounts of data efficiently. For example, Cheaha project directory is a GPFS storage. + +**section:GPUs** +: Graphics Processing Units (GPUs), specialized hardware for parallel processing, often used in machine learning, deep learning, and other computationally intensive tasks. + +### H + +### I + +**section:Instance** +: A virtualized compute resource in the cloud, often running as a virtual machine, with allocated resources (such as CPU, memory, and storage resources). + +### J + +**section:Jobs** +: Tasks submitted to the Cheaha system for execution, typically managed by a SLURM scheduler. + +### K + +**section:Key pairs** +: These can be keys used to securely access virtual machines or cloud instances via SSH (a public key and private key). They can also be used as a username (access key) and password (secret key) to access LTS (Long-Term Storage). + +**section:Keras** +: A high level neural network API that runs on top of TensorFlow for training machine learning and deep learning models. + +### L + +**section:Login node** +: An entry point to a Cheaha cluster for users. + +**section:Long-Term Storage** +: An S3 object-storage platform hosted at UAB which is designed to hold data that is not currently being used in analysis but should be kept for data sharing or reused for further analysis in the future. + +### M + +**section:MATLAB** +: Application software for numerical computation, data analysis, and visualization. + +**section:Memory** +: The system's storage, typically called RAM, allocated to a job for temporary data storage during its execution. + +**section:Modules** +: A software package on HPC systems, allowing user to easily load/access specific versions of software applications and libraries. + +### N + +**section:Node** +: A single computational unit in an HPC cluster, containing processors, and memory. + +### O + +**section:OOD** +: Open OnDemand (OOD) is a web based portal that provides users with easy access to compute resources, file systems, and job management tools through a graphical interface, in HPC environment. + +### P + +**section:Partitions** +: A logical group of nodes that are organized based on their hardware, usage type, or priority. + +**section:PyTorch** +: An open source framework, developed by Meta, that provides tools for developing and training deep learning models. + +### Q + +**section:Quality of Service (QoS) Limits** +: QoS limits allow us to balance usage and ensure fairness for all researchers using the cluster. + +**section:Quotas** +: Limits on resources such as storage or computational time allocated to a user. + +### R + +**section:RStudio** +: An integrated development environment (IDE) for R programming, used for statistical computing and data analysis. + +### S + +**section:Shared Storage** +: A centralized storage space (e.g Cheaha project directory, or Shared LTS allocation) used for collaborative work by multiple users, managed by the PIs. + +**section:SLURM** +: Simple Linux Utility for Resource Management (SLURM), a popular workload manager in HPC systems. 
It schedules jobs based using resource requests such as number of CPUs, maximum memory (RAM) required per CPU, maximum run time, and more. + +**section:State** +: The current status of a job in the scheduler, such as pending, running, completed, or failed. + +**section:Storage** +: Resources allocated for storing data that includes home/user, project, scratch directories on Cheaha and LTS. + +**section:Submit Jobs** +: The process of queuing computational tasks to run on the HPC system using a job scheduler such as SLURM. + +### T + +**section:TensorFlow** +: An open-source machine learning framework, developed by Google, commonly used for building and training deep learning models. + +### U + +### V + +**section:Virtual Machine (VM)** +: A software-based computer that provides a virtualized environment, that functions like a physical computer, for running applications or operating systems. + +**section:Volumes** +: Virtual storage devices used to persist data, often associated with Virtual Machines (VM) or cloud environments. + +### W + +### X + +**section:XIAS account** +: A guest user credential for non-UAB individuals who need access to our HPC system. It uses the guest's email as the username. + +### Y + +### Z + diff --git a/docs/uab_cloud/index.md b/docs/uab_cloud/index.md index b2394fbd7..6b896a8f9 100644 --- a/docs/uab_cloud/index.md +++ b/docs/uab_cloud/index.md @@ -26,7 +26,7 @@ Once logged in, you will see the OpenStack dashboard. An example is shown below. To get the most out of cloud.rc, you'll want to make sure you have a working familiarity with the [Linux terminal](../workflow_solutions/shell.md). -Cloud.rc runs on Openstack. If you are new to Openstack or to cloud.rc, it is highly recommended to follow our [Tutorial](tutorial/index.md) to learn how to set up all of the necessary components of a virtual machine (VM) setup. The tutorial is intended to be followed in order. Doing it out of order may result in errors and issues. If you encounter any unexpected issues, unclear instructions or have questions or comments, please contact [Support](../help/support.md). +Cloud.rc runs on Openstack. If you are new to Openstack or to cloud.rc, it is highly recommended to follow our [Tutorial](tutorial/index.md) to learn how to set up all of the necessary components of a setup. The tutorial is intended to be followed in order. Doing it out of order may result in errors and issues. If you encounter any unexpected issues, unclear instructions or have questions or comments, please contact [Support](../help/support.md). ## Cloud Usage Philosophy @@ -42,7 +42,7 @@ The downside to disposable machines is losing configuration specifics. Software ## Naming Conventions -Entities on cloud.rc must be named a certain way or difficult-to-diagnose errors may occur. Entities includes instances, volumes, networks, routers, and anything else that you are allowed to give a name to. +Entities on cloud.rc must be named a certain way or difficult-to-diagnose errors may occur. Entities includes instances, , networks, routers, and anything else that you are allowed to give a name to. Please use the following rules when naming entities: diff --git a/docs/uab_cloud/remote_access.md b/docs/uab_cloud/remote_access.md index 3c2792b6c..8ea7b369f 100644 --- a/docs/uab_cloud/remote_access.md +++ b/docs/uab_cloud/remote_access.md @@ -113,7 +113,7 @@ fi ### Generating Key Pairs -The instructions for generating key pairs are identical for all operating systems. 
[GitHub](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) maintains excellent documentation on generating key pairs. The gist of those instructions follows. +The instructions for generating are identical for all operating systems. [GitHub](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) maintains excellent documentation on generating key pairs. The gist of those instructions follows. 1. Open a terminal window. 1. Use the command `ssh-keygen -t ed25519 -C "your_email@example.com"` @@ -230,11 +230,11 @@ Where `user` is the remote username, `remote_ip` is the IP address of the remote ## Make Instances Publically Accessible From the Internet -It is possible to make [instances](./tutorial/instances.md) publically accessible from the external internet. [Floating IPs](./tutorial/networks.md#floating-ips) are pulled from a limited and fixed pool of public IP addresses assigned from the overall UAB IP pool. By default, these IP addresses are unable to communicate beyond the UAB Internet Border firewall, for security reasons. To make your instance publically accessible, a Firewall Security Exception must be filed. The result of the security exception is to create a firewall rule to allow traffic between the internet and an application on your instance. This section will go over how to make your instance publically accessible. +It is possible to make [instances](./tutorial/instances.md) publicly accessible from the external internet. [Floating IPs](./tutorial/networks.md#floating-ips) are pulled from a limited and fixed pool of public IP addresses assigned from the overall UAB IP pool. By default, these IP addresses are unable to communicate beyond the UAB Internet Border firewall, for security reasons. To make your publicly accessible, a Firewall Security Exception must be filed. The result of the security exception is to create a firewall rule to allow traffic between the internet and an application on your instance. This section will go over how to make your instance publicly accessible. ### Expectations -The expectation of making an instance publically accessible is to advance UAB's mission, so be sure you've configured and thoroughly tested your instance in the UAB Network before proceeding. The following list is intended as a helpful reminder. +The expectation of making an instance publicly accessible is to advance UAB's mission, so be sure you've configured and thoroughly tested your instance in the UAB Network before proceeding. The following list is intended as a helpful reminder. - Have an instance with some research application or server that advances UAB's mission. - The instance is configured with a floating IP address. diff --git a/mkdocs.yml b/mkdocs.yml index 2f22456c3..6bbeaeb7a 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -26,7 +26,11 @@ copyright: Copyright © 2021-2024 The University of Alabama at Birmingham. Date: Thu, 9 Jan 2025 09:16:46 -0600 Subject: [PATCH 2/7] fix typo --- docs/glossary.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/glossary.md b/docs/glossary.md index 282e38433..4fb835f30 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -28,15 +28,15 @@ This glossary defines key terms related to the Research Computing system to help ### C -**section:cores** -: Individual processing units within a CPU that can execute tasks. 
- **section:Cheaha** : A shared cluster computing environment for UAB researchers **section:Compute node** : A dedicated server in an Cheaha cluster designed to perform computational tasks. +**section:Core** +: Individual processing unit within a CPU that can execute tasks. + ### D **section:Data transfer** @@ -54,7 +54,7 @@ This glossary defines key terms related to the Research Computing system to help **section: GPFS storage** : General Parallel File System (GPFS) storage provides scalable and distributed storage to manage large amounts of data efficiently. For example, Cheaha project directory is a GPFS storage. -**section:GPUs** +**section:GPU** : Graphics Processing Units (GPUs), specialized hardware for parallel processing, often used in machine learning, deep learning, and other computationally intensive tasks. ### H @@ -108,7 +108,7 @@ This glossary defines key terms related to the Research Computing system to help ### P -**section:Partitions** +**section:Partition** : A logical group of nodes that are organized based on their hardware, usage type, or priority. **section:PyTorch** @@ -119,7 +119,7 @@ This glossary defines key terms related to the Research Computing system to help **section:Quality of Service (QoS) Limits** : QoS limits allow us to balance usage and ensure fairness for all researchers using the cluster. -**section:Quotas** +**section:Quota** : Limits on resources such as storage or computational time allocated to a user. ### R From 71daa95236b758fc765e1068361136b123b4ece3 Mon Sep 17 00:00:00 2001 From: bdu-birhanu Date: Mon, 20 Jan 2025 13:53:11 -0600 Subject: [PATCH 3/7] resolve conflic in mkdocs.yml --- mkdocs.yml | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/mkdocs.yml b/mkdocs.yml index 99ed920bf..781453d23 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -201,13 +201,8 @@ nav: - Contributing Content: contributing/contributor_guide.md - Help: - Support: help/support.md -<<<<<<< HEAD - - FAQ: help/faq.md - - Glossary: glossary.md -======= - FAQ - Frequently Asked Questions: help/faq.md - ->>>>>>> main + - Glossary: glossary.md validation: nav: omitted_files: warn From fc91952e8f800257337071655bf5334a43321765 Mon Sep 17 00:00:00 2001 From: bdu-birhanu Date: Mon, 20 Jan 2025 14:05:29 -0600 Subject: [PATCH 4/7] resolve conflict in iam_and_policies --- docs/data_management/lts/iam_and_policies.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/docs/data_management/lts/iam_and_policies.md b/docs/data_management/lts/iam_and_policies.md index fdc7fa536..dccb8f246 100644 --- a/docs/data_management/lts/iam_and_policies.md +++ b/docs/data_management/lts/iam_and_policies.md @@ -4,8 +4,6 @@ toc_depth: 3 # LTS Identity and Access Management -<<<<<<< HEAD:docs/data_management/lts/policies.md -======= LTS Identity and Access Management (IAM) is a framework for managing identities, roles, and permissions in Long-Term Storage (LTS) solutions, ensuring secure and efficient access to storage spaces like [buckets](./index.md#terminology) or [objects](./index.md#terminology). Understanding access rights and permissions for LTS spaces is essential for effective data management and security. This section aims to clarify common misconceptions regarding ownership, steward roles, access control, and how bucket policies help manage permissions in LTS spaces. 
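As a rough sketch of the command-line workflow this section refers to, the example below applies a previously written policy file to a bucket with `s3cmd` and then inspects the bucket. The policy filename and bucket name are placeholders, and it assumes `s3cmd` is already configured with your LTS access and secret keys.

```shell
# Apply a JSON bucket policy to an LTS bucket (names are placeholders).
s3cmd setpolicy my-policy.json s3://example-lab-bucket

# Confirm the policy took effect by inspecting the bucket metadata.
s3cmd info s3://example-lab-bucket
```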
## Terminology @@ -84,7 +82,6 @@ If you, as a Lab/Core PI, do not wish to manage the LTS space yourself, we recom A major use for LTS is storage of data that should be accessible to multiple users from a Lab or research group. By default, buckets are only visible and accessible to the owner of the bucket, and no mechanism exists to search for buckets other users have created. ->>>>>>> main:docs/data_management/lts/iam_and_policies.md Instead, sharing buckets must be done through the command line using [bucket policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-policies.html). A bucket policy is a JSON formatted file that assigns user read and write permissions to the bucket and to objects within the bucket. If you have not worked with JSON files before, a brief explanation can be found [here](https://docs.fileformat.com/web/json/). It's important to note that the bucket owner will always retain the ability to perform all actions on a bucket and its contents and so do not need to be explicitly granted permissions. From 77bb630f41f66cea56fc081d5ba48e430d30bdb0 Mon Sep 17 00:00:00 2001 From: bdu-birhanu Date: Mon, 20 Jan 2025 14:19:19 -0600 Subject: [PATCH 5/7] add mkdocs-ezglossary-plugin in buuld_env --- build_env.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/build_env.yml b/build_env.yml index 2ad41e57a..32687cdb5 100644 --- a/build_env.yml +++ b/build_env.yml @@ -40,3 +40,4 @@ dependencies: - linkchecker==10.4.0 - pre-commit==3.7.1 - python-dotenv[cli]==1.0.1 + - mkdocs-ezglossary-plugin==1.7.1 From da5d4e298182f00923b1004f4ea052dd0324f392 Mon Sep 17 00:00:00 2001 From: bdu-birhanu Date: Mon, 20 Jan 2025 19:57:11 -0600 Subject: [PATCH 6/7] added terms --- docs/glossary.md | 320 +++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 281 insertions(+), 39 deletions(-) diff --git a/docs/glossary.md b/docs/glossary.md index 4fb835f30..c4c59ad21 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -12,159 +12,401 @@ This glossary defines key terms related to the Research Computing system to help ### A +**A100 GPU** +: High-performance GPU from NVIDIA, commonly used in AI/ML workloads. + **section:Account** -: Refers to the user credentials that you use to log into the Research Computing systems. +: Refers to the user credentials that you use to log into the Research Computing systems. **section:ACLs** -: Access Control Lists (ACLs) is a mechanism used to define permissions for files and directories in an HPC system. +: Access Control Lists (ACLs) is a mechanism used to define permissions for files and directories in an HPC system. + +**section:Allocation** +: Refers to the designated LTS storage quota assigned to a individual user or their Lab or Core Facility. + +**section:amperenodes** +: Nodes that are equipped with Ampere architecture GPUs, used for high-performance computing. + +**section:Anaconda** +: A distribution of Python for scientific computing and machine learning, including package management and environment management tools. **section:Array Job** -: A job type in HPC used to submit and execute a large number of identical or similar tasks in parallel, managed efficiently under a single job submission. +: A job type in HPC used to submit and execute a large number of identical or similar tasks in parallel, managed efficiently under a single job submission. ### B +**section:Binary** +: A compiled program, also called executable, software or file that the operating system can execute directly. 
+ **section:BlazerID** -: The user name that will bed used to connect to any UAB system. +: The user name that will bed used to connect to any UAB system. + +**section:Bucket** +: A bucket is the root which objects are stored in Long Term Storage (LTS) systems, it is similar to a root directory or folder in traditional file system. + +**section:Bucket Policy** +: A set of permissions defining access control for a and LTS storage bucket/object. ### C +**section:Ceph** +: A distributed object storage system designed for scalability and fault tolerance. Please refer [Ceph](https://ceph.io/en/) for details. + **section:Cheaha** -: A shared cluster computing environment for UAB researchers +: A shared cluster computing environment for UAB researchers. + +**section:CICD** +: Continuous Integration (CI) and Continuous Deployment(CD) are practices for automating software testing and deployment. + +**section:Cloud** +: A network of remote servers that provide computing resources such as storage, processing power, and applications on demand over the internet. + +**section:Commercial Software** +: Licensed software (such as Amber, Ansys, Gurobi, LS-Dyna, Mathematica, SAS, Stata) used for specialized computing tasks in various scientific and engineering fields. + +**section:Community Container** +: A prebuilt, shared containerized environment designed for use by multiple researchers. **section:Compute node** -: A dedicated server in an Cheaha cluster designed to perform computational tasks. +: A dedicated server in Cheaha cluster designed to perform computational tasks. + +**section:Conda** +: Package and environment management system for Python and other languages. **section:Core** -: Individual processing unit within a CPU that can execute tasks. +: Individual processing unit within a CPU that can execute tasks. + +**section:CUDA** +: NVIDIA's parallel computing platform and API that enables GPU acceleration for computing tasks. ### D **section:Data transfer** -: The process of moving files between local systems and Cheaha, or between storage locations on the Research computing system. +: The process of moving files between local systems and Cheaha, or between storage locations on the Research computing system. + +**section:Docker Image** +: A snapshot of the libraries and dependencies required inside a container for an application to run. + +**section: Duo 2FA** +: A two-factor authentication system used for securing user access. ### E +**section:Exosphere** +: please refer [Exosphere](https://ieeexplore.ieee.org/document/9308090) + ### F +**section:FLOPS** +: Floating Point Operations Per Second (FLOPS), a measure of computational performance. + ### G +**section:GitLFS** +: A Git extension for versioning large files, used to store binary files outside the repository. Please refer [GitLFS](https://git-lfs.com/) for details. + **section:Globus** -: A data transfer tool that enables fast and secure research data transfers between our local systems and Cheaha system, or between storage locations on the Cheaha system. +: A data transfer tool that enables fast and secure research data transfers between our local systems and Cheaha system, or between storage locations on the Cheaha system. + +**section:Globus CLI** +: A command line tool that allows users to manage file transfers, automate workflows, and interact with Globus services without using the web interface. +**section:Globus Collection** +: A logical representation of data storage that is accessible via the Globus service. 
-**section: GPFS storage** -: General Parallel File System (GPFS) storage provides scalable and distributed storage to manage large amounts of data efficiently. For example, Cheaha project directory is a GPFS storage. +**section:Globus Connect Personal** +: A software application that allows individual users to transfer files between their personal devices and other Globus-enabled storage systems. + +**section:Globus Connect Server** +: A server-side software that enables institutions or organizations to connect their storage systems to Globus. + +**section:Globus Group** +: A collection of users managed through Globus for access control and collaboration. + +**section:GPFS storage** +: General Parallel File System (GPFS) storage provides scalable and distributed storage to manage large amounts of data efficiently. For example, Cheaha project directory is a GPFS storage. **section:GPU** -: Graphics Processing Units (GPUs), specialized hardware for parallel processing, often used in machine learning, deep learning, and other computationally intensive tasks. +: Graphics Processing Units (GPUs), specialized hardware for parallel processing, often used in machine learning, deep learning, and other computationally intensive tasks. + +**section:GRES** +: A Generic Resource Scheduling (GRES), refers to resources like GPUs that are shared across jobs. ### H +**section:High Performance Computing (HPC)** +: A computing system that enables the processing of large datasets and complex computations at high speeds using parallel processing techniques (it focuses on solving a problem as quickly as possible.). + +**section:High Throughput Computing (HTC)** +: A computing model focused on maximizing the number of tasks completed in a given time frame, often used for handling large-scale batch processing (it focuses on completing a large number of jobs over a long period of time). + +**section:Horizon** +: A web-based Dashboard with graphical interface to manage and interacting with OpenStack services. Please refer [Horizon](https://docs.openstack.org/horizon/latest/?utm_source=chatgpt.com) for details. + +**section:Host** +: A computer or system that provides services or resources to other systems in a network. + ### I +**section:Identity and Access Management (IAM)** +: A framework of policies that help you securely control access to system resources. + **section:Instance** -: A virtualized compute resource in the cloud, often running as a virtual machine, with allocated resources (such as CPU, memory, and storage resources). +: A virtualized compute resource in the cloud, often running as a virtual machine, with allocated resources (such as CPU, memory, and storage resources). + +**section:Interactive Apps** +: Applications used to interact with applications in openOn Demand(OOD) portal that during job execution. + +**section:Ipykernel** +: The IPython kernel (Ipykernel) is the backend component that executes Python code in Jupyter notebooks. ### J +**section:Jetstream** +: A cloud-based research computing platform offering on-demand computing resources, primarily for scientific research. Please refer [Jetstream](https://dl.acm.org/doi/10.1145/3437359.3465565) + **section:Jobs** -: Tasks submitted to the Cheaha system for execution, typically managed by a SLURM scheduler. +: Tasks submitted to the Cheaha system for execution, typically managed by a SLURM scheduler. + +**section:Job Composer** +: A feature in OpenOnDemand (OOD) portal used to define and manage job specifications for execution. 
+ +**section:JobID** +: A unique identifier assigned to a job when it is submitted for execution in the system. + +**section: Job submission script** +: A script containing resource requests and commands to execute a job on an HPC system, usually used with a scheduler like SLURM. ### K **section:Key pairs** -: These can be keys used to securely access virtual machines or cloud instances via SSH (a public key and private key). They can also be used as a username (access key) and password (secret key) to access LTS (Long-Term Storage). +: These can be keys used to securely access virtual machines or cloud instances via SSH (a public key and private key). They can also be used as a username (access key) and password (secret key) to access LTS (Long-Term Storage). **section:Keras** -: A high level neural network API that runs on top of TensorFlow for training machine learning and deep learning models. +: A high level neural network API that runs on top of TensorFlow for training machine learning and deep learning models. ### L +**section:largemem** +: A class of nodes or partition with large memory resources, used for running memory intensive tasks. + +**section:Local Scratch** +: A local storage directly attached to compute nodes, offering fast read/write speeds, typically used for temporary storage during job execution on that node. + **section:Login node** -: An entry point to a Cheaha cluster for users. +: An entry point to a Cheaha cluster for users. **section:Long-Term Storage** -: An S3 object-storage platform hosted at UAB which is designed to hold data that is not currently being used in analysis but should be kept for data sharing or reused for further analysis in the future. +: An S3 object-storage platform hosted at UAB which is designed to hold data that is not currently being used in analysis but should be kept for data sharing or reused for further analysis in the future. ### M +**section:Mamba** +: A replacement/reimplementation of the conda package manager that improves package dependency resolution and installation speed. + +**section:Mambaforge** +: A preconfigured distribution of Miniconda that includes Mamba as the default package manager. + **section:MATLAB** -: Application software for numerical computation, data analysis, and visualization. +: Application software for numerical computation, data analysis, and visualization. **section:Memory** -: The system's storage, typically called RAM, allocated to a job for temporary data storage during its execution. +: The system's storage, typically called RAM, allocated to a job for temporary data storage during its execution. + +**section:Message Passing Interface (MPI)** +: A standardized communication protocol used for parallel computing, enabling processes to exchange data efficiently across multiple nodes. -**section:Modules** -: A software package on HPC systems, allowing user to easily load/access specific versions of software applications and libraries. +**section:Miniforge** +: A minimal Conda installer that provides a flexible way to manage Python environments. + +**section:Miniconda** +: A minimal version of Anaconda that includes only Conda, Python, and essential dependencies. + +**section:Module** +: A system that manages environment settings for software packages on HPC, allowing users to load, unload, and switch between different software versions. + +**section:module load** +: A command used to load a specific software module into the environment, making it available for use. 
+ +**section:module reset** +: A command that unloads all currently loaded modules and restores the default environment settings. + +**section:My Interactive Sessions** +: User-specific sessions in the OpenOnDemand (OOD) portal that allow us to view the list of jobs or sessions we have requested. ### N +**section:Network scratch** +: Network scratch can be a user scratch which is available under directory `/scratch/$USER` or `$USER_SCRATCH` on the Cheaha login node and each compute node. + **section:Node** -: A single computational unit in an HPC cluster, containing processors, and memory. +: A single computational unit in an HPC cluster, containing processors, and memory. + +**section:noVNC** +: A browser-based Virtual Network Computing (VNC) client that allows users to access remote graphical desktops over the web. + +**section:Nvidia-smi** +: Command for monitoring and managing NVIDIA GPU devices. ### O **section:OOD** -: Open OnDemand (OOD) is a web based portal that provides users with easy access to compute resources, file systems, and job management tools through a graphical interface, in HPC environment. +: Open OnDemand (OOD) is a web based portal that provides users with easy access to compute resources, file systems, and job management tools through a graphical interface, in HPC environment. + +**section:Open Science Grid (OSG)** +: A distributed computing infrastructure that provides high-throughput computing resources for scientific research. + +**section:OpenStack** +: An open-source cloud computing platform that controls large pools of computing, storage, and networking resources. Please refer [OpenStack](https://www.openstack.org/) for details. ### P +**section:Package** +: A collection of software components bundled together to provide specific functionality, commonly managed via Conda, Pip. + +**section:pascalnodes** +: Nodes equipped with Pascal architecture GPUs, typically used for machine learning or simulations. + +**section:P100 GPU** +: NVIDIA's Pascal-based GPU, widely used in scientific computing and deep learning tasks. + +**section:Parabricks** +: NVIDIA-accelerated toolkit for genomic data analysis, providing fast and efficient workflows. + **section:Partition** -: A logical group of nodes that are organized based on their hardware, usage type, or priority. +: A logical group of nodes that are organized based on their hardware, usage type, or priority. **section:PyTorch** -: An open source framework, developed by Meta, that provides tools for developing and training deep learning models. +: An open source framework, developed by Meta, that provides tools for developing and training deep learning models. ### Q -**section:Quality of Service (QoS) Limits** -: QoS limits allow us to balance usage and ensure fairness for all researchers using the cluster. +**section: Quality of Service (QoS) Limits** +: A set of parameters in an HPC scheduler defining job priorities, resource limits, and execution policies. It allow us to balance usage and ensure fairness for all researchers using the cluster. **section:Quota** -: Limits on resources such as storage or computational time allocated to a user. +: Limits on resources such as storage or computational time allocated to a user. ### R +**section:rclone** +: A command-line program for managing and syncing files between local storage and cloud services, often used for data transfers in HPC environments. 
+
+**section:Remote Tunnels**
+: Secure SSH-based connections that allow forwarding of ports from an HPC system to a local machine, enabling remote access to services.
+
 **section:RStudio**
-: An integrated development environment (IDE) for R programming, used for statistical computing and data analysis.
+: An integrated development environment (IDE) for R programming, used for statistical computing and data analysis.
 
 ### S
 
+**section:sacct**
+: A command used to retrieve job accounting data in SLURM.
+
+**section:sbatch**
+: A SLURM command used to submit batch jobs to the scheduler.
+
+**section:scancel**
+: A command used to cancel or halt a job in SLURM.
+
+**section:ScienceDMZ**
+: A network architecture designed for secure, high-performance transfer of scientific data. Please refer to [ScienceDMZ](https://fasterdata.es.net/science-dmz/) for details.
+
+**section:Scheduler**
+: A workload manager in an HPC system, such as SLURM, that allocates resources and schedules jobs based on priority, policies, and availability.
+
+**section:Scratch Retention Policy**
+: The set of rules governing how long data can be stored in an HPC scratch storage space before automatic deletion.
+
+**section:Secure Shell (SSH)**
+: A protocol used for secure remote access to servers or HPC systems.
+
+**section:Security Policy**
+: A set of rules and configurations that define access controls, permissions, and security settings to protect systems and data.
+
+**section:seff**
+: A command used to display the resource usage efficiency of a SLURM job.
+
+**section:Shared Collection**
+: A Globus collection that allows multiple users to access and manage shared data resources securely.
+
 **section:Shared Storage**
-: A centralized storage space (e.g Cheaha project directory, or Shared LTS allocation) used for collaborative work by multiple users, managed by the PIs.
+: A centralized storage space (e.g., a Cheaha project directory or a shared LTS allocation) used for collaborative work by multiple users, managed by the PIs.
+
+**section:Single Sign-On (SSO)**
+: An authentication mechanism that allows users to log in once and access multiple HPC and institutional services without re-entering credentials.
+
+**section:Singularity**
+: A containerization platform used for creating and running portable, reproducible environments in high-performance computing.
 
 **section:SLURM**
-: Simple Linux Utility for Resource Management (SLURM), a popular workload manager in HPC systems. It schedules jobs based using resource requests such as number of CPUs, maximum memory (RAM) required per CPU, maximum run time, and more.
+: Simple Linux Utility for Resource Management (SLURM), a popular workload manager in HPC systems. It schedules jobs using resource requests such as the number of CPUs, maximum memory (RAM) required per CPU, maximum run time, and more.
+
+**section:SLURM Flags**
+: Options or parameters used in SLURM job scripts to control resource allocation, scheduling, and execution behavior.
+
+**section:Snapshot**
+: A point-in-time copy of a file system, disk, or virtual machine, used for backup or cloning purposes.
+
+**section:srun**
+: A command in SLURM used to launch parallel jobs directly on allocated compute resources.
 
 **section:State**
-: The current status of a job in the scheduler, such as pending, running, completed, or failed.
+: The current status of a job in the scheduler, such as pending, running, completed, or failed.
 
 **section:Storage**
-: Resources allocated for storing data that includes home/user, project, scratch directories on Cheaha and LTS.
+: Resources allocated for storing data, including the home/user, project, and scratch directories on Cheaha, as well as LTS.
 
 **section:Submit Jobs**
-: The process of queuing computational tasks to run on the HPC system using a job scheduler such as SLURM.
+: The process of queuing computational tasks to run on the HPC system using a job scheduler such as SLURM.
+
+**section:S3cmd**
+: A command-line tool for managing data in Amazon S3 and S3-compatible storage (for example, UAB LTS).
+
+**section:S5cmd**
+: A command-line tool for interacting with S3 storage, optimized for speed and parallelism (for example, UAB LTS).
+
+**section:squeue**
+: A command used to display the status of jobs in the SLURM queue.
 
 ### T
 
 **section:TensorFlow**
-: An open-source machine learning framework, developed by Google, commonly used for building and training deep learning models.
+: An open-source machine learning framework, developed by Google, commonly used for building and training deep learning models.
+
+**section:TRES**
+: A Trackable RESource (TRES) is a resource whose usage can be tracked or limited by the scheduler. Please refer to [TRES](https://slurm.schedmd.com/tres.html) for details.
 
 ### U
 
+**section:UAB Cloud**
+: A cloud service provided by UAB Research Computing, also called "cloud.rc". It is a portal based on OpenStack cloud software, intended for more permanent research applications such as web page and database hosting, and for developing applications for high performance computing.
+
+**section:UAB Box**
+: An alternative storage solution provided and maintained by UAB IT that allows users to store, access, and share documents, research data, and other files.
+
 ### V
 
 **section:Virtual Machine (VM)**
-: A software-based computer that provides a virtualized environment, that functions like a physical computer, for running applications or operating systems.
+: A software-based computer that provides a virtualized environment, functioning like a physical computer, for running applications or operating systems.
+
+**section:VSCode**
+: A lightweight and extensible code editor, developed by Microsoft, that supports multiple programming languages.
 
 **section:Volumes**
-: Virtual storage devices used to persist data, often associated with Virtual Machines (VM) or cloud environments.
+: Virtual storage devices used to persist data, often associated with Virtual Machines (VM) or cloud environments.
 
 ### W
 
+**section:Workflow Manager**
+: Software that automates the execution and management of tasks in a predefined workflow or pipeline, for example [Snakemake](https://snakemake.readthedocs.io/en/stable/) or [Nextflow](https://www.nextflow.io/).
+
 ### X
 
 **section:XIAS account**
-: A guest user credential for non-UAB individuals who need access to our HPC system. It uses the guest's email as the username.
+: A guest user credential for non-UAB individuals who need access to our HPC system. It uses the guest's email address, also known as the XIAS email, as the username.
+
+**section:XNAT**
+: An imaging data management platform commonly used in medical and neuroscience research. Please refer to [XNAT](https://www.xnat.org/) for details.
 ### Y

From 0f3a9eb56e45bb058adc1cb6d34d7e02cd85015c Mon Sep 17 00:00:00 2001
From: bdu-birhanu
Date: Mon, 20 Jan 2025 20:06:40 -0600
Subject: [PATCH 7/7] fix typos in terms

---
 docs/glossary.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/glossary.md b/docs/glossary.md
index c4c59ad21..8d15792ff 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -87,7 +87,7 @@ This glossary defines key terms related to the Research Computing system to help
 **section:Docker Image**
 : A snapshot of the libraries and dependencies required inside a container for an application to run.
 
-**section: Duo 2FA**
+**section:Duo 2FA**
 : A two-factor authentication system used for securing user access.
 
 ### E
@@ -110,6 +110,7 @@ This glossary defines key terms related to the Research Computing system to help
 **section:Globus CLI**
 : A command line tool that allows users to manage file transfers, automate workflows, and interact with Globus services without using the web interface.
+
 **section:Globus Collection**
 : A logical representation of data storage that is accessible via the Globus service.