How to use CAFE from Orthofinder Results
2
1
Entering edit mode
22 months ago
ahmadjoyyia ▴ 20

I asked this question a few days ago but did not get any answer. I tried finding it on my but could not really get it. I have orthofinder results 'Gene_count' which gives gene count in each orthogroup in each genome. Its format is like;

OG0000032 10 11 1 13 0 35

I also have SpeciesTree_rooted that is generated by orthofinder. I need to know if someone here has used orthofinder results and some modifications (if they did) to get successful gene families expansion/contraction results from CAFE or any other software please? I am really looking for an answer!

expansion orthofinder cafe gene • 3.8k views
ADD COMMENT
8
Entering edit mode
22 months ago

I have been looking into something similar (maybe, not sure myself) and I didn't know about cafe. It looks good, thanks for bringing it up. The following R code should adapt orthofinder 2.x output to cafe5.

Lots of disclaimers: This code works for me in that it makes cafe complete without errors. Whether it makes something sensible I'm not sure as I'm still getting to grips with the whole idea of expansion/contraction of gene families.

First, make orthofinder's species tree ultrametric. It should be already binary and rooted as required by cafe.

library(ape)
library(data.table)

tre <- read.tree('Species_Tree/SpeciesTree_rooted.txt')
stopifnot(is.binary(tre))
stopifnot(is.rooted(tre))

if(is.ultrametric(tre)) {
    utre <- tre
} else{
    utre <- chronos(tre)
}
write.tree(utre, 'cafe/SpeciesTree_rooted_ultra.txt')

Then prepare the table of counts. I use the N0 output since the original orthogroups are deprecated in favour of the phylogenetic hierarchical orthogroups:

hog <- fread('Phylogenetic_Hierarchical_Orthogroups/N0.tsv')
hog[, OG := NULL]
hog[, `Gene Tree Parent Clade` := NULL]
hog <- melt(hog, id.vars='HOG', variable.name='species', value.name='pid')
hog <- hog[pid != '']
hog$n <- sapply(hog$pid, function(x) length(strsplit(x, ', ')[[1]]))

# Exclude HOGs with lots of genes in a one or more species. 
# See also cafe tutorial about filtering gene families
keep <- hog[, list(n_max=max(n)), HOG][n_max < 100]$HOG
hog <- hog[HOG %in% keep]

# Exclude HOGs present in only 1 species
keep <- hog[, .N, HOG][N > 1]$HOG
hog <- hog[HOG %in% keep]

counts <- dcast(hog, HOG ~ species, value.var='n', fill=0)
counts[, Desc := 'n/a']
setcolorder(counts, 'Desc')
fwrite(counts, 'hog_gene_counts.tsv', sep='\t')

Run cafe5 with your favorite settings:

nohup cafe5 -i cafe/hog_gene_counts.tsv -t cafe/SpeciesTree_rooted_ultra.txt -o cafe/results --cores 30 &
nohup cafe5 -i cafe/hog_gene_counts.tsv -t cafe/SpeciesTree_rooted_ultra.txt -o cafe/results_k3 --cores 30 -k 3 &
nohup cafe5 -i cafe/hog_gene_counts.tsv -t cafe/SpeciesTree_rooted_ultra.txt -o cafe/results_p_k3 --cores 30 -p -k 3 &
ADD COMMENT
0
Entering edit mode

Hi Dariober! It worked!! Thank you so much!! It would be really helpful for others as well!

ADD REPLY
1
Entering edit mode
21 months ago
ahmadjoyyia ▴ 20

Edit: One can also gene Gene_count file by making some amendments like adding column of Desc with values n/a before Orthogroups columns, and remove Total column, then use dos2unix. It would work.

ADD COMMENT
0
Entering edit mode

Note that the orthogroups directory is deprecated.

Once I'm here, orthofinder has also a script for making the species tree ultrametetric, make_ultrametric.py. I don't know how it compares to the method I suggest above though.

ADD REPLY

Login before adding your answer.

Traffic: 1675 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6