
Using the farm cluster

Lisa Johnson edited this page Sep 7, 2017 · 24 revisions

We do our computation on the Farm cluster at the UC Davis College of Agricultural and Environmental Sciences.

For a general overview of using this cluster see the Ross-Ibarra wiki page.

For a technical overview (hardware, etc.), look here.

Setting up your account on the farm cluster for the first time

  • Follow RIBLAB instructions: https://github.com/RILAB/lab-docs/wiki/Farm-Account-Set-Up
  • Go to the terminal and type ssh-keygen
  • Just press Enter when prompted for 'file in which to save the key' or 'passphrase'. You do not need to change any of this information.
  • When finished, the id_rsa.pub file should be located in the ~/.ssh directory. Confirm this by typing:
ls -lah ~/.ssh

and making sure the file is listed.

  • Copy the id_rsa.pub file to a location where you can easily upload it (Mac Finder will not be able to view the .ssh directory):
cp ~/.ssh/id_rsa.pub ~/Documents

Backing up data

We have access to a server on campus: Spot. This backup server has its own backup as well. At a minimum, all raw sequence files should be backed up here.

For now, directories are named by sequence type (RAD, RNA-seq), with sub-folders for specific projects. Put your data in the appropriate place.

  • Group
    • Whitehead
      • RAD_seq
        • Fundulus
        • Red_Abalone
      • RNA_seq
        • Fundulus
        • Red_Abalone
      • Genome_seq

Here is the relevant information for the server:

  • spot.lawr.ucdavis.edu Port 22
  • kerberos login
  • kerberos passphrase

For the most basic backup, do the following:

  • log into farm
  • srun into a node with srun -p high -t 24:00:00 --pty bash
  • run scp, using -P 2022 to specify the port:

scp -P 2022 FILENAME [email protected]:/Whitehead/Group/COMPLETEPATH/

example:

scp -P 2022 CB-1_S2_L001_R1_001.fastq.gz [email protected]:/Whitehead/Group/RAD_seq/Fundulus/AdmixtureMapping_2015/
  • you will be asked for your kerberos password

You can also do this in a loop, batch job, etc. A wildcard also works if you're transferring all files of a type (e.g., *.fastq.gz).
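The loop version can be sketched like this (backup_fastqs is a made-up helper name, and the destination path in the usage note is just the example from above; substitute your own kerberos login and target directory):

```shell
# Sketch: back up every gzipped fastq file in the current directory to Spot.
# backup_fastqs is an illustrative name, not an existing command.
backup_fastqs() {
  local dest=$1
  for f in *.fastq.gz; do
    echo "copying $f"
    scp -P 2022 "$f" "$dest"
  done
}

# usage, from a compute node on farm:
# backup_fastqs [email protected]:/Whitehead/Group/RAD_seq/Fundulus/AdmixtureMapping_2015/
```

You will be prompted for your kerberos password on each transfer unless you have set up key-based login on Spot.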

General tips and tricks:

To access a node directly with 20 GB of memory:

  • srun -p high -t 24:00:00 --mem=20000 --pty bash

To run R:

  • srun -p high -t 24:00:00 --mem=20000 --pty R

Use Cyberduck to interact graphically with files on farm and download files to your desktop.

Download: https://cyberduck.io/?l=en

Set up an SFTP transfer:

This will point to your home directory. Then, you can browse and click into directories where your files are located.

Writing scripts and submitting jobs to the SLURM workload management system

Submit scripts as SLURM jobs so that they keep running while you walk away from the computer. To get coffee! Or whatever. The scrolling output you would normally see on the screen is automatically saved to SLURM output files for you to review later.

With nano (or your editor of choice), make a new file count_reads.sh and use this as a template script.

#!/bin/bash
#SBATCH -D /home/ljcohen/adv-unix-workshop/data/MiSeq/farm/
#SBATCH -J insert_descriptive_job_name_here
#SBATCH -t 2:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -p high
#SBATCH -c 4
#SBATCH --mem=16000

# load modules 
module load bio/1.0

# change directory to where the files are
cd ../

# counts the number of reads in each file
for i in *.fq
do
  echo "$i"
  grep -c "^@M00967" "$i"
  fastqc "$i"
done

This script asks SLURM for 16 GB of RAM on 1 node with 4 processors for 2 hours at high priority.

To run this script, submit it to slurm with this command:

sbatch count_reads.sh

A confirmation message like this should appear, telling you that the job was submitted:

Submitted batch job 10670199
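Since the script above has no #SBATCH -o line, SLURM writes the job's output to its default filename, slurm-&lt;jobid&gt;.out, in the submission directory. A tiny helper (slurm_logfile is a hypothetical name, not a real command) to build that filename:

```shell
# Default SLURM stdout filename for a given job id (pattern slurm-%j.out).
# slurm_logfile is just an illustrative helper name.
slurm_logfile() {
  echo "slurm-$1.out"
}

# e.g. after "Submitted batch job 10670199":
#   less "$(slurm_logfile 10670199)"
```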

More info on farm: https://github.com/ngs-docs/2016-adv-begin-shell-genomics/blob/master/ucd-farm-intro.md

Advanced shell lesson: https://github.com/ngs-docs/2016-adv-begin-shell-genomics

Pipe R graphics back to your screen

Use ssh -X to log in.

  • This should pull up an xterm. You can also use -Y
  • To test if your X11 connection is working, type xeyes. A set of eyeballs should pop up on your screen. They will follow your mouse. If these don't appear, look here for help.
  • srunx high -t 24:00:00 --mem=###
    • this will open an x term through a node
  • Then run R from whatever directory you want. Graphics should be piped back to your machine.

Creating a pipeline from your scripts

  • A useful resource from NIH
    • https://hpc.nih.gov/docs/job_dependencies.html
    • explains how to have jobs wait on the completion of other jobs, etc. This is convenient: you can submit a bunch of jobs at once that will run sequentially. Ideally, you'd assemble your scripts into a pipeline like this for publication.
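A minimal sketch of chaining two jobs with sbatch's --dependency=afterok option (--parsable makes sbatch print just the job id; the function and script names here are placeholders):

```shell
# Submit the second script only after the first finishes successfully.
# submit_chain, trim_reads.sh, and assemble.sh are illustrative names.
submit_chain() {
  local jid
  jid=$(sbatch --parsable "$1")           # capture the first job's id
  sbatch --dependency=afterok:"$jid" "$2" # runs only if the first exits 0
}

# usage:
# submit_chain trim_reads.sh assemble.sh
```

With afterok, the second job is cancelled if the first one fails, so a broken step doesn't silently cascade down the pipeline.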