
Using the farm cluster

Lisa Johnson edited this page Sep 7, 2017 · 24 revisions

We do our computation on the Farm cluster at the UC Davis College of Agricultural and Environmental Sciences.

For a general overview of using this cluster see the Ross-Ibarra wiki page.

For a technical overview (hardware, etc.), look here.

Setting up your account on the farm cluster for the first time

  • Follow RIBLAB instructions: https://github.com/RILAB/lab-docs/wiki/Farm-Account-Set-Up
  • Go to the terminal and type ssh-keygen
  • Just press Enter when prompted for 'file in which to save the key' or 'passphrase'. You do not need to change any of this information.
  • When finished, the id_rsa.pub file should be located in the ~/.ssh directory. Confirm this by typing:
ls -lah ~/.ssh

and making sure the file is listed.

  • Copy the id_rsa.pub file to a location where you can easily upload it (Mac Finder will not be able to view the .ssh directory):
cp ~/.ssh/id_rsa.pub ~/Documents

Backing up data

We have access to a server on campus: Spot. This backup server has its own backup as well. At a minimum, all raw sequence files should be backed up here.

For now, directories are named by sequence type (RAD, RNA-seq), with sub-folders for specific projects. Put your data in the appropriate place.

  • Group
    • Whitehead
      • RAD_seq
        • Fundulus
        • Red_Abalone
      • RNA_seq
        • Fundulus
        • Red_Abalone
      • Genome_seq

Here is the relevant information for the server:

  • spot.lawr.ucdavis.edu Port 22
  • kerberos login
  • kerberos passphrase

For the most basic backup, do the following:

  • log into farm
  • srun into a node with srun -p high -t 24:00:00 --pty bash
  • run scp, using -P 2022 to specify the port:

scp -P 2022 FILENAME [email protected]:/Whitehead/Group/COMPLETEPATH/

example:

scp -P 2022 CB-1_S2_L001_R1_001.fastq.gz [email protected]:/Whitehead/Group/RAD_seq/Fundulus/AdmixtureMapping_2015/
  • you will be asked for your kerberos password

You can also do this in a loop, batch job, etc. A wildcard also works if you're transferring all files of a type (e.g., *.fastq.gz).
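The loop version can be sketched like this (backup_fastqs is a made-up helper name, and the destination path in the usage note is just the example from above; substitute your own kerberos login and target directory):

```shell
# Sketch: back up every gzipped fastq file in the current directory to Spot.
# backup_fastqs is an illustrative name, not an existing command.
backup_fastqs() {
  local dest=$1
  for f in *.fastq.gz; do
    echo "copying $f"
    scp -P 2022 "$f" "$dest"
  done
}

# usage, from a compute node on farm:
# backup_fastqs [email protected]:/Whitehead/Group/RAD_seq/Fundulus/AdmixtureMapping_2015/
```

You will be prompted for your kerberos password on each transfer unless you have set up key-based login on Spot.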

General tips and tricks:

To access a node directly with 20 GB of memory:

  • srun -p high -t 24:00:00 --mem=20000 --pty bash

To run R:

  • srun -p high -t 24:00:00 --mem=20000 --pty R

Use Cyberduck to interact graphically with files on farm and download files to your desktop.

Download: https://cyberduck.io/?l=en

Set up an SFTP transfer:

This will point to your home directory. Then, you can browse and click into directories where your files are located.

Writing scripts and submitting jobs to the SLURM workload management system

Submit scripts as SLURM jobs so that they keep running while you walk away from the computer. To get coffee! Or whatever. The scrolling output you would normally see on the screen is automatically saved to SLURM output files for you to review later.

With nano (or your editor of choice), make a new file count_reads.sh and use this as a template script.

#!/bin/bash
#SBATCH -D /home/ljcohen/adv-unix-workshop/data/MiSeq/farm/
#SBATCH -J insert_descriptive_job_name_here
#SBATCH -t 2:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -p high
#SBATCH -c 4
#SBATCH --mem=16000

# load modules 
module load bio/1.0

# change directory to where the files are
cd ../

# counts the number of reads in each file
for i in *.fq
do
  echo "$i"
  grep -c "^@M00967" "$i"
  fastqc "$i"
done

This script asks SLURM for 16 GB of RAM on 1 node with 4 processors for 2 hours at high priority.

To run this script, submit it to slurm with this command:

sbatch count_reads.sh

A confirmation message like this should appear, telling you that the job was submitted:

Submitted batch job 10670199
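Since the script above has no #SBATCH -o line, SLURM writes the job's output to its default filename, slurm-&lt;jobid&gt;.out, in the submission directory. A tiny helper (slurm_logfile is a hypothetical name, not a real command) to build that filename:

```shell
# Default SLURM stdout filename for a given job id (pattern slurm-%j.out).
# slurm_logfile is just an illustrative helper name.
slurm_logfile() {
  echo "slurm-$1.out"
}

# e.g. after "Submitted batch job 10670199":
#   less "$(slurm_logfile 10670199)"
```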

More info on farm: https://github.com/ngs-docs/2016-adv-begin-shell-genomics/blob/master/ucd-farm-intro.md

Advanced shell lesson: https://github.com/ngs-docs/2016-adv-begin-shell-genomics

Pipe R graphics back to your screen

Use ssh -X to log in.

  • This should pull up an xterm. You can also use -Y
  • To test if your X11 connection is working, type xeyes. A set of eyeballs should pop up on your screen. They will follow your mouse. If these don't appear, look here for help.
  • srunx high -t 24:00:00 --mem=###
    • this will open an x term through a node
  • Then run R from whatever directory you want. Graphics should be piped back to your machine.

Creating a pipeline from your scripts

  • A useful resource from NIH
    • https://hpc.nih.gov/docs/job_dependencies.html
    • explains how to have jobs wait on the completion of other jobs, etc. This is convenient: you can submit a bunch of jobs at once that will run sequentially. Ideally, you'd assemble your scripts into a pipeline like this for publication.
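A minimal sketch of chaining two jobs with sbatch's --dependency=afterok option (--parsable makes sbatch print just the job id; the function and script names here are placeholders):

```shell
# Submit the second script only after the first finishes successfully.
# submit_chain, trim_reads.sh, and assemble.sh are illustrative names.
submit_chain() {
  local jid
  jid=$(sbatch --parsable "$1")           # capture the first job's id
  sbatch --dependency=afterok:"$jid" "$2" # runs only if the first exits 0
}

# usage:
# submit_chain trim_reads.sh assemble.sh
```

With afterok, the second job is cancelled if the first one fails, so a broken step doesn't silently cascade down the pipeline.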