I have been looking into something similar (maybe, not sure myself) and I didn't know about cafe. It looks good, thanks for bringing it up. The following R code should adapt orthofinder 2.x output to cafe5.
Lots of disclaimers: This code works for me in that it makes cafe complete without errors. Whether it makes something sensible I'm not sure as I'm still getting to grips with the whole idea of expansion/contraction of gene families.
First, make orthofinder's species tree ultrametric. It should be already binary and rooted as required by cafe.
library(ape)
library(data.table)
tre <- read.tree('Species_Tree/SpeciesTree_rooted.txt')
stopifnot(is.binary(tre))
stopifnot(is.rooted(tre))
if(is.ultrametric(tre)) {
utre <- tre
} else{
utre <- chronos(tre)
}
write.tree(utre, 'cafe/SpeciesTree_rooted_ultra.txt')
Then prepare the table of counts. I use the N0 output since the original orthogroups are deprecated in favour of the phylogenetic hierarchical orthogroups:
hog <- fread('Phylogenetic_Hierarchical_Orthogroups/N0.tsv')
hog[, OG := NULL]
hog[, `Gene Tree Parent Clade` := NULL]
hog <- melt(hog, id.vars='HOG', variable.name='species', value.name='pid')
hog <- hog[pid != '']
hog$n <- sapply(hog$pid, function(x) length(strsplit(x, ', ')[[1]]))
# Exclude HOGs with lots of genes in a one or more species.
# See also cafe tutorial about filtering gene families
keep <- hog[, list(n_max=max(n)), HOG][n_max < 100]$HOG
hog <- hog[HOG %in% keep]
# Exclude HOGs present in only 1 species
keep <- hog[, .N, HOG][N > 1]$HOG
hog <- hog[HOG %in% keep]
counts <- dcast(hog, HOG ~ species, value.var='n', fill=0)
counts[, Desc := 'n/a']
setcolorder(counts, 'Desc')
fwrite(counts, 'hog_gene_counts.tsv', sep='\t')
Run cafe5 with your favorite settings:
nohup cafe5 -i cafe/hog_gene_counts.tsv -t cafe/SpeciesTree_rooted_ultra.txt -o cafe/results --cores 30 &
nohup cafe5 -i cafe/hog_gene_counts.tsv -t cafe/SpeciesTree_rooted_ultra.txt -o cafe/results_k3 --cores 30 -k 3 &
nohup cafe5 -i cafe/hog_gene_counts.tsv -t cafe/SpeciesTree_rooted_ultra.txt -o cafe/results_p_k3 --cores 30 -p -k 3 &
Hi Dariober! It worked!! Thank you so much!! It would be really helpful for others as well!