Using the farm cluster
We do our computation on the Farm cluster at the UC Davis College of Agricultural and Environmental Sciences.
For a general overview of using this cluster see the Ross-Ibarra wiki page.
For a technical overview (hardware, etc.), look here.
- Follow the RILAB setup instructions: https://github.com/RILAB/lab-docs/wiki/Farm-Account-Set-Up
- Open a terminal and type:
ssh-keygen
- Just press Enter when prompted for the 'file in which to save the key' and the 'passphrase'. You do not need to change any of this information.
- When finished, the id_rsa.pub file should be located in the ~/.ssh directory. Confirm this by typing:
ls -lah ~/.ssh
and making sure the file is listed.
- Copy the id_rsa.pub file to a location where you can easily upload it (Mac Finder will not be able to view the .ssh directory):
cp ~/.ssh/id_rsa.pub ~/Documents
- Then go to the account request form at https://wiki.cse.ucdavis.edu/cgi-bin/index2.pl, fill out the required information, and upload the id_rsa.pub file.
We have access to a backup server on campus: Spot. Spot itself has its own backup as well. At a minimum, all raw sequence files should be backed up here.
For now, directories are named by sequence type (RAD, RNA-seq), with sub-folders for specific projects. Put your data in the appropriate place.
- Group
  - Whitehead
    - RAD_seq
      - Fundulus
      - Red_Abalone
    - RNA_seq
      - Fundulus
      - Red_Abalone
    - Genome_seq
Here is the relevant information for the server:
- Host: spot.lawr.ucdavis.edu, port 22
- Username: your kerberos login
- Password: your kerberos passphrase
For the most basic backup, do the following:
- Log into farm.
- srun into a node with:
srun -p high -t 24:00:00 --pty bash
- Specify port 2022 with -P 2022 and run:
scp -P 2022 FILENAME username@spot.lawr.ucdavis.edu:/Whitehead/Group/COMPLETEPATH/
For example:
scp -P 2022 CB-1_S2_L001_R1_001.fastq.gz username@spot.lawr.ucdavis.edu:/Whitehead/Group/RAD_seq/Fundulus/AdmixtureMapping_2015/
- You will be asked for your kerberos password.
You can also do this in a loop, batch job, etc. A wildcard also works if you're trying to copy all files of a type (e.g., *.fastq.gz).
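The loop idea can be sketched as a small shell function. The username and destination path below are placeholders (adjust them to your own project directory on Spot); setting DRYRUN=echo prints each command instead of running it, so you can preview what would be transferred.

```shell
# Batch backup sketch: copy every *.fastq.gz in the current directory
# to Spot over port 2022. Username and destination path are placeholders.
backup_fastq() {
    local dest=$1
    for f in *.fastq.gz; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        ${DRYRUN:-} scp -P 2022 "$f" "$dest"
    done
}

# Preview the commands without actually transferring anything:
# DRYRUN=echo backup_fastq username@spot.lawr.ucdavis.edu:/Whitehead/Group/RAD_seq/Fundulus/
```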
To get a node with 20 GB of memory:
srun -p high -t 24:00:00 --mem=20000 --pty bash
To run R interactively:
srun -p high -t 24:00:00 --mem=20000 --pty R
Download Cyberduck: https://cyberduck.io/?l=en
Set up an ftp transfer. This will point to your home directory; from there, you can browse and click into the directories where your files are located.
Writing scripts and submitting jobs to the SLURM workload management system
Submit scripts as slurm jobs so they keep running while you walk away from the computer (to get coffee, or whatever). The scrolling output you would normally see on the screen will be automatically saved to slurm output files for you to review later.
With nano (or your editor of choice), make a new file count_reads.sh and use this as a template script.
#!/bin/bash
#SBATCH -D /home/ljcohen/adv-unix-workshop/data/MiSeq/farm/
#SBATCH -J insert_descriptive_job_name_here
#SBATCH -t 2:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -p high
#SBATCH -c 4
#SBATCH --mem=16000
# load modules
module load bio/1.0
# change directory to where the files are
cd ../
# counts the number of reads in each file
for i in *.fq
do
echo $i
grep -c "^@M00967" $i
fastqc $i
done
This script tells the slurm management system to request 16 GB RAM on 1 node with 4 processors for 2 hrs at high priority.
To run this script, submit it to slurm with this command:
sbatch count_reads.sh
A confirmation message like this should appear, telling you that the job was submitted:
Submitted batch job 10670199
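By default, slurm saves the job's console output to a file named slurm-&lt;jobid&gt;.out in the directory you submitted from, so the example job above would write the file shown below (the job id is the number from the confirmation message).

```shell
# slurm writes stdout/stderr to slurm-<jobid>.out by default;
# the job id comes from the "Submitted batch job" message
jobid=10670199
logfile="slurm-${jobid}.out"
echo "$logfile"    # slurm-10670199.out
```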
More info on farm: https://github.com/ngs-docs/2016-adv-begin-shell-genomics/blob/master/ucd-farm-intro.md
Advanced shell lesson: https://github.com/ngs-docs/2016-adv-begin-shell-genomics
- Use ssh -X to log in (you can also use -Y).
- To test whether your X11 connection is working, type xeyes. A set of eyeballs should pop up on your screen and follow your mouse. If they don't appear, look here for help.
- Open an xterm through a node with:
srunx high -t 24:00:00 --mem=###
- Then run R from whatever directory you want. Graphics should be piped back to your machine.
- A useful resource from NIH: https://hpc.nih.gov/docs/job_dependencies.html
- It explains how to make jobs wait on the completion of other jobs. This is convenient: you can submit a bunch of jobs at once that will run sequentially. Ideally, you'd chain your scripts together with something like this when compiling analyses for publication.
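A minimal sketch of such a chain (step1.sh and step2.sh are hypothetical script names): sbatch prints "Submitted batch job <id>", so you can capture the id and hand it to --dependency so the second job only starts after the first succeeds.

```shell
# Submit step2.sh so it starts only after step1.sh finishes successfully.
# step1.sh and step2.sh are hypothetical; afterok means "after a zero exit".
submit_chain() {
    local jid
    jid=$(sbatch step1.sh | awk '{print $4}')   # grab the numeric job id
    sbatch --dependency=afterok:"$jid" step2.sh
}
```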