Commit e219e4f (parent 577b52c)

Authored by natnesi, chrisdjscott and CallumWalley

Globus compute (#899)

For visibility

Signed-off-by: Chris Scott <[email protected]>
Signed-off-by: Cal <[email protected]>
Co-authored-by: Chris Scott <[email protected]>
Co-authored-by: Chris Scott <[email protected]>
Co-authored-by: Cal <[email protected]>

1 file changed: docs/Batch_Computing/Globus_Compute (+109 additions, 0 deletions)

# Globus Compute

!!! warning

    Our Globus Compute offering is in early-access mode and may change. Please let us know if you have any feedback or suggestions for improvements.

## Overview

[Globus Compute](https://www.globus.org/compute) provides a secure, scalable way to run Python functions remotely on compute resources.
On Mahuika, we host a Globus Compute multi-user endpoint, allowing users to submit and manage compute tasks through Globus or compatible clients without needing to connect to the HPC via SSH or OnDemand.
Globus Compute has [comprehensive documentation](https://globus-compute.readthedocs.io/en/latest/quickstart.html); here we only highlight the specifics of working with the Mahuika endpoints.

## Key concepts

`Endpoint`

: A Globus Compute service running on the HPC system that executes user functions

`Single-user mode`

: Each user runs and manages their own endpoint on a login node, which can execute their functions (this approach will not be discussed here but is documented on the Globus Compute site)

`Multi-user mode`

: A centrally managed endpoint that all REANNZ HPC users can send tasks to; a user identity mapping process maps the user's Globus identity to their account on Mahuika

`Executors`

: Manage job submissions to Slurm or execute jobs directly on login nodes

`Authentication`

: Handled via Globus Auth - users must have a Globus account, which they can sign into with institutional or Globus IDs, and they must add a linked identity with the "NeSI Keycloak" to their Globus account

## Requirements

You must have a REANNZ HPC account and a Globus account, and you must have linked an identity from your Globus account to the NeSI Keycloak. You can check this by navigating to the [NeSI HPC Storage](https://app.globus.org/file-manager?origin_id=763d50ee-e814-4080-878b-6a8be5cf7570) collection in the Globus web app and confirming that you can see your REANNZ HPC files.

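If you would like a quick way to confirm from Python that the Globus Compute SDK can authenticate you (note that this does not by itself prove the NeSI Keycloak identity link is in place), one option is to instantiate the SDK client. This is an optional sanity check of our own, not part of the official setup:

```
# Optional sanity check (not an official setup step).
from globus_compute_sdk import Client

# The SDK prompts for a Globus login in your browser the first time it needs
# tokens (which may happen here or on first use); cached tokens are reused
# afterwards.
gcc = Client()
print("globus_compute_sdk Client created:", gcc)
```
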
## Endpoints

| Name         | Endpoint ID                            | Purpose                                                                                                               |
|--------------|----------------------------------------|-----------------------------------------------------------------------------------------------------------------------|
| reannz-login | `63c0b682-43d1-4b97-bf23-6a676dfdd8bd` | Lightweight tasks that are suitable to run on a login node, such as submitting Slurm jobs, checking job status, etc. |
| reannz-slurm | `abf152c8-ad9b-453f-bcc8-3424284344f3` | Resource-intensive tasks; work sent to this endpoint will run in a Slurm job                                          |

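As an illustration of how these endpoints are used, the sketch below targets `reannz-login` for a lightweight task: listing your queued Slurm jobs. The function body (calling `squeue` via `subprocess`) is our own example, and we assume the login endpoint does not require any `user_endpoint_config`; a full, documented example for `reannz-slurm` follows later on this page.

```
# Illustrative only: a lightweight task for the reannz-login endpoint.
from globus_compute_sdk import Executor

def my_queue():
    import subprocess
    # --me limits the output to the calling user's jobs
    return subprocess.run(["squeue", "--me"], capture_output=True, text=True).stdout

login_endpoint = "63c0b682-43d1-4b97-bf23-6a676dfdd8bd"  # reannz-login

with Executor(endpoint_id=login_endpoint) as ex:
    # If the login endpoint requires configuration, set ex.user_endpoint_config
    # here, as shown for reannz-slurm below.
    future = ex.submit(my_queue)
    print(future.result())
```
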
## `reannz-slurm` endpoint

This endpoint submits work in Slurm jobs. The following configuration options are available via [`user_endpoint_config`](https://globus-compute.readthedocs.io/en/v2.20.1/reference/executor.html#globus_compute_sdk.Executor.user_endpoint_config); a sketch showing them used together follows the list:

- `ACCOUNT_ID` (required): your REANNZ HPC project code
- `WALL_TIME` (optional, defaults to `00:05:00`): the wall time for the Slurm job that gets submitted (must be long enough for your function to complete)
- `MEM_PER_CPU` (optional, defaults to `2G`): the amount of memory to request in the Slurm job
- `GPUS_PER_NODE` (optional, defaults to no GPU): request GPUs for the Slurm job

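As an illustration of how these options fit together, here is a hedged sketch of a `user_endpoint_config` that requests a longer wall time and more memory. The option names come from the list above; the values, the probe function and the commented-out GPU request are only examples and may need adjusting for your project:

```
# Illustrative only: configuring the reannz-slurm endpoint.
from globus_compute_sdk import Executor

def report_resources():
    # Return the Slurm-related environment variables of the job that ran us.
    import os
    return {k: v for k, v in os.environ.items() if k.startswith("SLURM_")}

slurm_endpoint = "abf152c8-ad9b-453f-bcc8-3424284344f3"  # reannz-slurm

with Executor(endpoint_id=slurm_endpoint) as ex:
    ex.user_endpoint_config = {
        "ACCOUNT_ID": "<your_project_code>",  # required: your project code
        "WALL_TIME": "00:15:00",              # default 00:05:00; must cover the whole function run
        "MEM_PER_CPU": "4G",                  # default 2G
        # "GPUS_PER_NODE": 1,                 # uncomment to request a GPU (default is none)
    }
    future = ex.submit(report_resources)
    print(future.result())
```
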
## Simple example

1. Install Python 3.11 and create a virtual environment
    ```
    python -m venv venv
    source venv/bin/activate
    ```
2. Install Globus Compute
    ```
    pip install "globus_compute_sdk>=3,<4"
    ```
3. Create a simple Python script (replacing `<your_project_code>` with your project code)
    ```
    # test.py
    from globus_compute_sdk import Executor

    def hello_from_node():
        import os
        import getpass
        return f"Hello, this function ran as {getpass.getuser()} on {os.uname().nodename}"

    mep_id = "abf152c8-ad9b-453f-bcc8-3424284344f3"
    with Executor() as ex:
        ex.endpoint_id = mep_id
        ex.user_endpoint_config = {"ACCOUNT_ID": "<your_project_code>"}
        f = ex.submit(hello_from_node)
        print(f.result())
    ```
4. Run the test
    ```
    python test.py
    ```

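Functions can also take arguments, and you can submit several tasks at once and collect the futures, exactly as with a standard `concurrent.futures` executor. The sketch below is an illustrative extension of the example above; remember that the requested `WALL_TIME` must cover the time needed for your functions to complete.

```
# many_tasks.py - illustrative extension of the example above
from globus_compute_sdk import Executor

def square(x):
    return x * x

mep_id = "abf152c8-ad9b-453f-bcc8-3424284344f3"  # reannz-slurm

with Executor(endpoint_id=mep_id) as ex:
    ex.user_endpoint_config = {"ACCOUNT_ID": "<your_project_code>"}
    # submit() accepts positional and keyword arguments for the function
    futures = [ex.submit(square, n) for n in range(5)]
    # result() blocks until the corresponding task has finished
    print([f.result() for f in futures])
```
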
## Limitations and known problems

Limitations and known problems related to our current implementation are listed here.
If these are impacting your ability to use this service, please [let us know](mailto:[email protected]).

- Tasks are currently limited to a single CPU
- You must use Python 3.11 (we are exploring options to execute functions in containers, which will enable the use of different Python versions)
- You can only import Python packages that are available in the `Python/3.11.6-foss-2023a` environment module (containerisation will help here too); see the probe sketch after this list for one way to check what is available
- There can be a lag of around 1 minute before a function runs if you have not used the endpoint recently
- You must use version 3.x of the Globus Compute SDK (as reflected in the `pip install` constraint above)
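One way to see what the remote environment actually provides is to submit a small probe function and inspect the result. This is a sketch of our own rather than an officially supported workflow, and the package names are just examples:

```
# probe.py - check the remote Python environment (illustrative)
from globus_compute_sdk import Executor

def probe():
    import importlib
    import sys
    available = {}
    for name in ("numpy", "pandas", "scipy"):  # packages you care about
        try:
            available[name] = importlib.import_module(name).__version__
        except ImportError:
            available[name] = None
    return {"python": sys.version, "packages": available}

mep_id = "abf152c8-ad9b-453f-bcc8-3424284344f3"  # reannz-slurm

with Executor(endpoint_id=mep_id) as ex:
    ex.user_endpoint_config = {"ACCOUNT_ID": "<your_project_code>"}
    print(ex.submit(probe).result())
```
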
## Other notes

- Globus Compute uses token-based authentication after the initial setup, so, together with Globus guest collections, workflows can be fully automated
- Standard access and usage policies, quotas and accounting rules apply (you need an active project, no compute-intensive work on the login endpoint, etc.)
