Getting access to BioHPC Linux nodes


You may be asked to compute on BioHPC's ECCO Linux nodes (for instance, for very long-running jobs or jobs that need very large memory).

Request an account

Go to the BioHPC account request page and specifically request to join the ECCO group and lv39 (Lars') "lab".

Reserve a node

  • Go to BioHPC Reservations page, choose "Restricted", and reserve a node:
    • cbsuecco02: up to 7 days
    • all others: up to 3 days
    • in both cases, renewable

Access a node

See the Getting Started Guide and Remote Access pages. SSH is the best option if you don't need graphical applications.

Note that, for off-campus access, you will need to use the Cornell VPN. Instructions can be found here.
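
For example, once you are on the campus network or VPN, connecting from your own terminal might look like the line below; the full hostname is an assumption based on the node name, so double-check it against your reservation confirmation:

    # replace NETID with your Cornell NetID; hostname assumed, verify against your reservation
    ssh NETID@cbsuecco02.biohpc.cornell.edu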

Notes

Shared directory

Your default home directory (/home/NETID) is not shared with other group users (the same is true on CISER). Use the shared directory /home/ecco_lv39 instead.
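
For example, to start a project where other group members (including Lars) can see it, create it under the shared directory rather than your home; the Workspace/aearep-NNNN layout here mirrors the convention used in the Docker section below, and the number is just an example:

    # work on the shared group drive, not in /home/NETID
    mkdir -p /home/ecco_lv39/Workspace/aearep-12345
    cd /home/ecco_lv39/Workspace/aearep-12345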

Transfer of Data to BioHPC

  • The BioHPC instructions for using FileZilla are great for moving data from your personal workspace to BioHPC.
  • If you need to transfer data from CISER to BioHPC (CISER does not have FileZilla installed), follow these steps (a consolidated sketch follows the list):
  1. Open a bash shell in the directory that holds the folder you want to transfer to BioHPC.
  2. SFTP into BioHPC: sftp [email protected]. Your password is the same one you use to log in to the cbsuecco02 node (or the login node).
  3. cd into the desired directory on the BioHPC node.
  4. Create the target directory on BioHPC first: mkdir data.
  5. Use the put command to place the desired folder (e.g., "data") on BioHPC: put -r data/.
  6. If you run into an error along the lines of Can't find request for ID 31425, try zipping the files and transferring a single zip file instead. Once transferred, you can unzip it on BioHPC (if the unzip command gives you trouble, try 7z instead: /programs/bin/util/7z x ZIPFILE).
  7. Give Lars access to your /workdir/NETID directory with chmod -R a+rwX /workdir/NETID (the effect is not permanent, so run this command again after any changes to the directory).
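
A consolidated sketch of the transfer steps above, with placeholder names (NETID, the obscured SFTP address from step 2, and the example folder data):

    # run on CISER, from the directory that contains the folder to upload (here: "data")
    sftp NETID@BIOHPC_ADDRESS      # placeholder: use the exact address from step 2
    # inside the interactive sftp session:
    #   cd /workdir/NETID          # change to the destination directory on BioHPC
    #   mkdir data                 # create the target folder
    #   put -r data/               # upload the folder recursively
    #   bye                        # close the session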

Using Stata

  • To use Stata version 16, either call it by its full path or add it to your PATH before running stata-mp:

    /usr/local/stata16/stata-mp
    export PATH=/usr/local/stata16:$PATH

  • Ensure that Stata's temporary directory is located on the BioHPC /workdir space by running the following commands before executing your program(s):

    export STATATMP=/workdir/NETID/tmp
    mkdir -p $STATATMP
    
  • Don't run Stata interactively via SSH. Instead, run your do-file in batch mode (see the combined sketch below): stata-mp -b do master.do
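
A minimal batch sketch combining the three points above, assuming your do-file is master.do and your working copy sits in /workdir/NETID/aearep-12345 (an illustrative path):

    # put Stata 16 on the PATH and point its temp files at node-local scratch space
    export PATH=/usr/local/stata16:$PATH
    export STATATMP=/workdir/NETID/tmp
    mkdir -p $STATATMP
    # run the do-file in batch mode; Stata writes the log to master.log in this directory
    cd /workdir/NETID/aearep-12345
    stata-mp -b do master.do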

Using Matlab

To execute a Matlab program (for example, a script called main.m) from the command line:

matlab -nodisplay -r "addpath(genpath('.')); main"
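
If you also want Matlab to quit when the script finishes and to keep a log of the output (handy inside tmux), one variant is the following; the log filename is just an example:

    # run main.m non-interactively, exit when done, and capture all output in a log file
    matlab -nodisplay -r "addpath(genpath('.')); main; exit" > main.log 2>&1

Note that if main throws an error, Matlab will stop at its prompt instead of exiting.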

Utilize tmux

Cheatsheet: https://gist.github.com/MohamedAlaa/2961058

  1. Login via SSH
  2. Launch tmux with a session name that makes sense, e.g. tmux new -s AEAREP-xxxx
  3. Launch your Matlab, Stata, etc job
  4. Detach from tmux with Ctrl-b d (you don't need to press both at the same time: press Ctrl+b, release, then press d)
  5. Log out of SSH

Next time:

  1. Login via SSH
  2. Reconnect to your tmux session: tmux a -t AEAREP-xxxx
  3. If you forgot the session name, list your sessions with tmux ls (see the sketch below)

Note: When logged into the compute node, you can call ps ux to see all your running jobs.
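
Putting the steps together, a typical first session might look like the sketch below; the session name and the Stata command are just examples:

    # on the compute node, after logging in via SSH
    tmux new -s AEAREP-12345           # start a named session
    stata-mp -b do master.do           # launch the long-running job inside tmux
    # detach with Ctrl-b then d, and log out; the job keeps running

    # later, after logging back in via SSH
    tmux a -t AEAREP-12345             # reattach to check on the job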

Using Docker

The BioHPC docker command is docker1, see here for more details. All files that are shared via the -v option must reside on /workdir/NETID and cannot be shared across nodes. To get the files to /workdir/NETID, the following commands can be used, assuming that your files are in /home/ecco_lv39/Workspace/aearep-$AEAREP:

  • Sync to workdir:
AEAREP=12345
[[ -d /workdir/$(id -nu) ]] || mkdir /workdir/$(id -nu)
rsync -auv /home/ecco_lv39/Workspace/aearep-$AEAREP/ /workdir/$(id -nu)/aearep-$AEAREP/
  • Sync back to the shared drive (once computations are done, or at any time):
AEAREP=12345
rsync -auv /workdir/$(id -nu)/aearep-$AEAREP/ /home/ecco_lv39/Workspace/aearep-$AEAREP/ 
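
Once the files are on /workdir/NETID, a docker1 call might look like the sketch below. docker1 largely mirrors the regular docker CLI, but the image name, container path, and script name here are purely illustrative; check the BioHPC Docker documentation for the exact options available:

    # mount the node-local copy of the project into the container (image and script names are placeholders)
    docker1 run -v /workdir/$(id -nu)/aearep-12345:/project some/image:tag \
        bash -c "cd /project && ./run.sh"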