109 changes: 109 additions & 0 deletions docs/Batch_Computing/Globus_Compute/.md
@@ -0,0 +1,109 @@
# Globus Compute

!!! warning
    Our Globus Compute offering is in early-access mode and may change. Please let us know if you have any feedback or suggestions for improvements.

## Overview

[Globus Compute](https://www.globus.org/compute) provides a secure, scalable way to run Python functions remotely on compute resources.
On Mahuika, we host Globus Compute multi-user endpoints, allowing users to submit and manage compute tasks through
Globus or compatible clients without needing to connect to the HPC via SSH or OnDemand.
Globus Compute has [comprehensive documentation](https://globus-compute.readthedocs.io/en/latest/quickstart.html), so here we
highlight only the specifics of working with the Mahuika endpoints.

## Key concepts

`Endpoint`

: A Globus Compute service running on the HPC system that executes user functions

`Single-user mode`

: Each user runs and manages their own endpoint on a login node, which can execute their functions (this approach will not be discussed
here but is documented on the Globus Compute site)

`Multi-user mode`

: A centrally managed endpoint that all REANNZ HPC users can send tasks to; a user identity mapping process
maps the user's Globus identity to their account on Mahuika

`Executors`

: Manage job submissions to Slurm or execute jobs directly on login nodes

`Authentication`

: Handled via Globus Auth - users must have a Globus account, which they can sign into with institutional or Globus IDs, and they
must add a linked identity with the "NeSI Keycloak" to their Globus account

## Requirements

You must have a REANNZ HPC account and a Globus account, and you must have linked an identity from your Globus account to the NeSI Keycloak. You can do this by
navigating to [NeSI HPC Storage](https://app.globus.org/file-manager?origin_id=763d50ee-e814-4080-878b-6a8be5cf7570) in the Globus
web app and confirming that you can see your REANNZ HPC files.

## Endpoints

| Name | Endpoint ID | Purpose |
|--------------|----------------------------------------|--------------------------------------------------------------|
| reannz-login | `63c0b682-43d1-4b97-bf23-6a676dfdd8bd` | Lightweight tasks that are suitable to be run on a login node, such as submitting Slurm jobs, checking job status, etc. |
| reannz-slurm | `abf152c8-ad9b-453f-bcc8-3424284344f3` | Resource-intensive tasks; work sent to this endpoint will run in a Slurm job |

## `reannz-slurm` endpoint

This endpoint submits work in Slurm jobs. The following configuration options are available via [`user_endpoint_config`](https://globus-compute.readthedocs.io/en/v2.20.1/reference/executor.html#globus_compute_sdk.Executor.user_endpoint_config):

- `ACCOUNT_ID` (required): your REANNZ HPC project code
- `WALL_TIME` (optional, defaults to `00:05:00`): the wall time for the Slurm job that gets submitted (must be enough time for your function to complete)
- `MEM_PER_CPU` (optional, defaults to `2G`): amount of memory to be requested in the Slurm job
- `GPUS_PER_NODE` (optional, defaults to no GPU): request GPUs for the Slurm job
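
As an illustration, the options above could be assembled with a small helper like the one below. This is a minimal sketch: `build_slurm_config` is a hypothetical name, not part of the Globus Compute SDK, and the example values are placeholders. Options left unset are omitted so that the endpoint defaults listed above apply.

```python
def build_slurm_config(account_id, wall_time=None, mem_per_cpu=None, gpus_per_node=None):
    """Build a dictionary of options for the reannz-slurm endpoint.

    Hypothetical helper for illustration only. Values are passed through
    unchanged; the endpoint decides whether they are acceptable.
    """
    if not account_id:
        raise ValueError("ACCOUNT_ID is required (your REANNZ HPC project code)")

    config = {"ACCOUNT_ID": account_id}

    # Only include optional keys that were set, so endpoint defaults still apply
    optional = {
        "WALL_TIME": wall_time,
        "MEM_PER_CPU": mem_per_cpu,
        "GPUS_PER_NODE": gpus_per_node,
    }
    for key, value in optional.items():
        if value is not None:
            config[key] = value

    return config


# Placeholder values: request 30 minutes and 4G of memory per CPU
config = build_slurm_config("<your_project_code>", wall_time="00:30:00", mem_per_cpu="4G")
print(config)
```

The resulting dictionary can then be assigned to `user_endpoint_config` on an `Executor`.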

## Simple example

1. Install Python 3.11 and create a virtual environment
```
python -m venv venv
source venv/bin/activate
```
2. Install Globus Compute
```
pip install "globus_compute_sdk>=3,<4"
```
3. Create a simple Python script (replacing `<your_project_code>` with your project code)
```
# test.py
from globus_compute_sdk import Executor


def hello_from_node():
    # Imports are inside the function so they are available when it runs remotely
    import getpass
    import os

    return f"Hello, this function ran as {getpass.getuser()} on {os.uname().nodename}"


# ID of the reannz-slurm endpoint
mep_id = "abf152c8-ad9b-453f-bcc8-3424284344f3"

with Executor() as ex:
    ex.endpoint_id = mep_id
    ex.user_endpoint_config = {"ACCOUNT_ID": "<your_project_code>"}
    f = ex.submit(hello_from_node)
    print(f.result())
```
4. Run the test
```
python test.py
```
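
The same pattern should also work with the `reannz-login` endpoint, for example to check on your Slurm jobs. The following is a sketch only: `run_on_login_node` and `check_my_jobs` are hypothetical names, and it assumes that the login endpoint needs no `user_endpoint_config` and that `squeue` is available on the default path there.

```python
# ID of the reannz-login endpoint, taken from the table of endpoints
LOGIN_ENDPOINT_ID = "63c0b682-43d1-4b97-bf23-6a676dfdd8bd"


def run_on_login_node(argv):
    """Run a short command and return its standard output."""
    # Imported inside the function so it is available when run remotely
    import subprocess

    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout


def check_my_jobs():
    """Submit `squeue --me` to the login endpoint and return its output."""
    # Imported here so the rest of this module can be used without the SDK
    from globus_compute_sdk import Executor

    with Executor(endpoint_id=LOGIN_ENDPOINT_ID) as ex:
        future = ex.submit(run_on_login_node, ["squeue", "--me"])
        return future.result()


if __name__ == "__main__":
    print(check_my_jobs())
```

Keep work sent to this endpoint lightweight, in line with its purpose.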

## Limitations and known problems

Limitations and known problems related to our current implementation are listed here.
If these are impacting your ability to use this service, please [let us know](mailto:[email protected]).

- Functions are currently limited to a single CPU
- You must use Python 3.11 (we are exploring options to execute functions in containers, which will enable use of different Python versions)
- You can only import Python packages that are available in the `Python/3.11.6-foss-2023a` environment module (containerisation will help here too)
- There can be a lag of around 1 minute to run a function if you have not used the endpoint recently
- Only Globus Compute SDK version 3.x is supported
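
Because of the start-up lag noted above, it can help to wait for results with an explicit timeout rather than blocking indefinitely. The futures returned by `Executor.submit` follow the standard `concurrent.futures.Future` interface, so a sketch might look like this (`result_or_none` is a hypothetical helper name, and the 300 second default is an arbitrary assumption):

```python
import concurrent.futures


def result_or_none(future, timeout_s=300.0):
    """Return the future's result, or None if it is not ready within timeout_s seconds."""
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The task may still be queued or running; the caller can retry later
        return None


if __name__ == "__main__":
    # Local demonstration with a thread pool standing in for a remote endpoint
    with concurrent.futures.ThreadPoolExecutor() as pool:
        future = pool.submit(sum, [1, 2, 3])
        print(result_or_none(future, timeout_s=5.0))
```

Note that this helper cannot distinguish a timeout from a function that legitimately returns `None`; if that matters, let the timeout exception propagate instead.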

## Other notes

- Globus Compute uses token-based authentication after the initial setup (combined with guest collections, workflows can be fully automated)
- Standard access and usage policies, quotas and accounting rules apply (active project, no compute-intensive work on the login endpoint, etc.)