Guide and scripts for UF cluster programs

By Jesse Breinholt, Kawahara Lab

ExaML Guide

The ExaML module must be run on the new HiPerGator cluster.

Step 1: Generate a starting tree using RAxML. I generally do this interactively on the test nodes, since it takes only a few minutes, as follows.

ssh dev01
cd /scratch/lfs/username/dirwhereyouplacedyourfiles
module load raxml
raxmlHPC-PTHREADS-SSE3 -y -m model -s infile.phy -p $RANDOM -n outfile -T 2

Step 2: Generate a binary file from the input file for ExaML to use. I also do this interactively. Note that for DNA data the -m option should be -m DNA. Any time you want to change the partition scheme you must generate a new binary file. If you do not want to partition the data, delete the -q option.

module load gcc/4.7.2 openmpi/1.6.4 examl/1.0.3
parser -s infile.phy -m PROT -q raxmlformatpartitionfile -n outfile
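
For reference, a partition file in RAxML format has one line per partition: a model name, a partition name, and a site range. The sketch below writes a small two-partition protein example. The function name, the gene names, the models (WAG, JTT) and the site ranges are all made up for illustration and must be replaced with values that match your alignment. For DNA data each line would start with DNA instead of a protein model name.

```shell
# Write a hypothetical two-partition protein file in RAxML partition format.
# Gene names, models and site ranges are placeholders, not real data.
write_example_partitions() {
    local out=$1
    cat > "$out" <<'EOF'
WAG, gene1 = 1-500
JTT, gene2 = 501-1200
EOF
}

# Example: create the file, then pass its name to the parser with -q
# write_example_partitions raxmlformatpartitionfile
```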

Step 3: Run ExaML using a qsub script (example below). This must be executed from the HiPerGator cluster, which also means you cannot execute it from any of the interactive nodes on the old HPC, such as test01-test06. First estimate the amount of memory needed using these formulas (from: ), where n = the number of taxa and m = the number of variable sites. The number of variable sites can be taken from the info file produced by the RAxML run that generated the starting tree.

• MEM(AA+GAMMA)= (n-2) * m * (80 * 8) bytes
• MEM(AA+CAT)= (n-2) * m * (20 * 8) bytes
• MEM(DNA+GAMMA)= (n-2) * m * (16 * 8) bytes
• MEM(DNA+CAT)= (n-2) * m * (4 * 8) bytes
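
As a rough illustration, the formulas above can be wrapped in a small bash function, together with a helper that turns the estimate into a minimum node count. This is only a sketch: the function names are made up here, and the node helper assumes that the memory available on one node equals the number of processors per node times the per-process memory request in MB. The example values (34 taxa, 1,938,570 for m) come from our nucleotide data set; using its number of distinct alignment patterns as m is an assumption.

```shell
# Estimated ExaML memory in bytes: (n - 2) * m * (factor * 8)
# n = number of taxa, m = number of variable sites, model as in the list above
examl_mem_bytes() {
    local n=$1 m=$2 model=$3 factor
    case "$model" in
        AA+GAMMA)  factor=80 ;;
        AA+CAT)    factor=20 ;;
        DNA+GAMMA) factor=16 ;;
        DNA+CAT)   factor=4 ;;
        *) echo "unknown model: $model" >&2; return 1 ;;
    esac
    echo $(( (n - 2) * m * factor * 8 ))
}

# Minimum number of nodes, assuming each node provides ppn * pmem_mb megabytes
min_nodes() {
    local bytes=$1 ppn=$2 pmem_mb=$3
    local per_node=$(( ppn * pmem_mb * 1024 * 1024 ))
    # Integer ceiling division
    echo $(( (bytes + per_node - 1) / per_node ))
}

# Example: 34 taxa, m = 1938570, DNA+GAMMA, 8 processors x 2048 MB per node
mem=$(examl_mem_bytes 34 1938570 DNA+GAMMA)
echo "estimated memory: $mem bytes"
echo "minimum nodes: $(min_nodes "$mem" 8 2048)"
```

For these values the estimate is 7,940,382,720 bytes (about 7.9 GB), which fits on a single node under the assumption above.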

This will help you decide the minimum number of nodes the program needs to have enough memory. The script below is for 32 GB on 4 nodes with 8 processors per node. The -m option sets the model; the choices are -m GAMMA or -m PSR (the per-site rate category model, which used to be called CAT in RAxML). Other options you may want to consider (text directly from the ExaML manual at

-D ML search convergence criterion. This will break off ML searches if the relative Robinson-Foulds distance between the trees obtained from two consecutive lazy SPR cycles is smaller or equal to 1%. Usage recommended for very large datasets in terms of taxa. On trees with more than 500 taxa this will yield execution time improvements of approximately 50% while yielding only slightly worse trees.

-i Initial rearrangement setting for the subsequent application of topological changes phase. Recommendation: on datasets with more than 10,000 taxa you should set this value to -i 25 such that RAxML does not spend much time trying to estimate the best rearrangement radius. This is done when you don’t specify -i. My empirical observation is that the estimate will always be 25 on very large datasets (RAxML-light manual at

-Q Enable alternative data/load distribution algorithm for datasets with many partitions. In particular under PSR this can lead to parallel performance improvements of up to factor 10!

-R read in a binary checkpoint file called ExaML_binaryCheckpoint.RUN_ID_number

-S turn on memory saving option for gappy multi-gene alignments. For large and gappy datasets specify -S to save memory. This will produce slightly different likelihood values, may be a bit slower but can reduce memory consumption from 70GB to 19GB on very large and gappy datasets.

Note: If you do not give your run enough time, you can restart it from the last checkpoint by adding -R ExaML_binaryCheckpoint.outfile_# to the command line. Remember to also change the output name given to -n (-n outfile), because ExaML will not overwrite existing output files and reusing the name will cause the program to crash.
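
A restart along these lines could be scripted as below. This is a sketch under assumptions: the helper name latest_checkpoint is made up, it assumes the checkpoint files sit in the directory you pass in (the current directory by default) and end in _number as described for -R above, and the new run name outfile_restart is just a placeholder.

```shell
# Print the highest-numbered checkpoint file for a given run name.
# Usage: latest_checkpoint RUN_NAME [DIRECTORY]
latest_checkpoint() {
    local run=$1 dir=${2:-.} best="" best_n=-1 f n
    for f in "$dir"/ExaML_binaryCheckpoint."$run"_*; do
        [ -e "$f" ] || continue
        n=${f##*_}                                   # text after the last underscore
        case "$n" in ''|*[!0-9]*) continue ;; esac   # keep numeric suffixes only
        # Skip files that belong to a different run name with the same prefix
        [ "${f##*/}" = "ExaML_binaryCheckpoint.${run}_$n" ] || continue
        if [ "$n" -gt "$best_n" ]; then
            best_n=$n
            best=$f
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# Example restart inside the qsub script (note the changed -n name):
# ckpt=$(latest_checkpoint outfile)
# mpiexec examl -s infile.binary -t startingtreefile -m GAMMA -R "$ckpt" -n outfile_restart
```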

Previous runs by our lab:
Nucleotide data set with 34 taxa and 3,248,454 bp, with 1,938,570 distinct alignment patterns and 8 partitions:
1 node 8 processors: time 5:43:21 memory 7.8 GB
2 nodes 8 processors: time 5:47:36 memory 7.88 GB
4 nodes 8 processors: time 1:32:25 memory 8.5 GB
2 nodes 16 processors: time 2:56:15 memory 8 GB

Here is the qsub script:

#PBS -N runname
#PBS -m bea
#PBS -M emailaddress
#PBS -o runname.$PBS_JOBID.log
#PBS -e runname.$PBS_JOBID.err
#PBS -l nodes=4:ppn=8
#PBS -l pmem=2048mb
#PBS -l walltime=30:00:00

# Report this job's working directory and move into it,
# so the relative file names below resolve correctly
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR

module load gcc/4.7.2 openmpi/1.6.4 examl/1.0.3
mpiexec examl -s infile.binary -t startingtreefile -m GAMMA -n outfile
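
If you run many analyses with different resource requests, a script like the one above can be generated from a few parameters. A minimal sketch, assuming the same modules and file names as above. The function name make_examl_pbs and the output file name are made up for illustration, the email and log directives are left out for brevity, and the generated script assumes the job should run in the directory it was submitted from.

```shell
# Print a PBS script for an ExaML run to stdout.
# Arguments: run name, nodes, processors per node, per-process memory in MB, walltime
make_examl_pbs() {
    local name=$1 nodes=$2 ppn=$3 pmem_mb=$4 walltime=$5
    cat <<EOF
#PBS -N $name
#PBS -l nodes=$nodes:ppn=$ppn
#PBS -l pmem=${pmem_mb}mb
#PBS -l walltime=$walltime

cd \$PBS_O_WORKDIR
module load gcc/4.7.2 openmpi/1.6.4 examl/1.0.3
mpiexec examl -s infile.binary -t startingtreefile -m GAMMA -n $name
EOF
}

# Example: write the script to a file, then submit it
# make_examl_pbs runname 4 8 2048 30:00:00 > runname.pbs
# qsub runname.pbs
```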