Hi,
I am using OrthoMCL to compare 10 different strains of a bacterial species. I installed MySQL, OrthoMCL and MCL locally and went through the whole pipeline.
The problem is that all the entries end up in inParalogs.txt, while orthologs.txt is empty.
The input for the pipeline was the amino acid FASTA files (obtained from the RAST output for the assembled contigs).
I processed each file through the pipeline individually.
I ran orthomcladjustfasta, orthomclfilterfasta, makeblastdb, blastall, orthomclblastparser, orthomclloadblast, orthomclpairs etc. They all ran to completion without any errors.
My adjusted FASTA header looks like this:
>sampleID|Protein_id
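A quick way to confirm that every header in a FASTA file really follows this pattern, and to see which sample prefixes it contains, might look like the sketch below. The function name `taxa_in_fasta` is hypothetical, and the check only assumes that a valid header is exactly two non-empty parts separated by a single `|`.

```shell
# taxa_in_fasta FILE
# Hypothetical helper: prints one line per sample prefix with its sequence
# count, plus a MALFORMED line counting headers that are not >prefix|id.
taxa_in_fasta() {
  awk '
    /^>/ {
      # Drop the leading ">" and split the rest on a literal "|".
      n = split(substr($0, 2), parts, "[|]")
      if (n == 2 && parts[1] != "" && parts[2] != "") count[parts[1]]++
      else malformed++
    }
    END {
      for (t in count) printf "%s\t%d\n", t, count[t]
      printf "MALFORMED\t%d\n", malformed
    }
  ' "$1" | LC_ALL=C sort
}
```

Running `taxa_in_fasta mySample.goodProteins.fasta` on a single-sample file should list only one prefix.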
The makeblastdb and blastall commands I used for each sample are:
makeblastdb -in mySample.goodProteins.fasta -dbtype prot -out mySample_blastDB
blastall -p blastp -i mySample.goodProteins.fasta -d mySample_blastDB -o mySample_blast.csv -e 1 -m 8 -a 2 -v 1000 -b 1000
Then I ran orthomclblastparser to produce similarSequences.txt for each file.
I would appreciate any help in understanding why my orthologs.txt is empty while inParalogs.txt has about 21000 rows of data.
Kind regards,
Brindha
Typically, you want to include an outgroup in the clustering, so consider how closely related the strains are. Also, I'm not sure mixing blast+ database and legacy blast tools is a good idea, though it may work.
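Before adding an outgroup, it may also be worth checking whether similarSequences.txt contains any hits between different samples at all. As far as I understand, OrthoMCL only calls orthologs between proteins from different taxa, while hits within one taxon can only become inparalogs. The following is a minimal sketch, not part of OrthoMCL: the function name `count_pair_types` is hypothetical, and it assumes the file is tab-separated with the query and subject IDs (in the `sampleID|Protein_id` form) in the first two columns.

```shell
# count_pair_types FILE
# Hypothetical helper: counts hit pairs whose two IDs share the same sample
# prefix ("within") versus pairs that span two different samples ("between").
# Assumes tab-separated input with query ID in column 1, subject ID in column 2.
count_pair_types() {
  awk -F'\t' '
    {
      split($1, q, "[|]")   # q[1] is the sample prefix of the query
      split($2, s, "[|]")   # s[1] is the sample prefix of the subject
      if (q[1] == s[1]) within++
      else between++
    }
    END { printf "within=%d between=%d\n", within, between }
  ' "$1"
}
```

If `count_pair_types similarSequences.txt` reports `between=0`, the BLAST step never compared proteins from different samples, which would be consistent with an empty orthologs.txt.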
Thanks for your quick response @SES.
Regarding the outgroup, should it be a related species, or can it be a totally random bacterial species?
Also, could you expand on what you mean by "I'm not sure mixing blast+ database and legacy blast tools is a good idea, though it may work"? I was following one of the online tutorials, which used makeblastdb and blastall. I tried earlier with formatdb, as another tutorial suggested, but it didn't work with blastall. So I used makeblastdb.
Look forward to hearing your suggestions.
The outgroup should be closely related, not random. By mixing the tools I mean that formatdb and blastall work together (both are from legacy BLAST), and makeblastdb and blastp (BLAST+) work together. By mixing them you are using programs from different toolkits, and they are not designed to work together (they may, but it is not advisable).
Yes. I used another strain from the same phylum, but a different family, as the outgroup. The orthologs file is still empty while all the info is in the inparalogs file. Not sure why this is the case. (These strains, by the way, are from a published paper on their comparative genomics, and the authors managed to identify orthologs using OrthoMCL and Synergy2. Not sure why I am not able to replicate it.)
I shall try blastp with makeblastdb, and see if it makes a difference.
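A sketch of what the BLAST+ version of my earlier commands might look like is below. The wrapper names `run` and `blast_plus_search` and the `DRY_RUN` variable are hypothetical, added only so the commands can be printed without running them. The flag mapping is my assumption of the closest equivalents: `-outfmt 6` for `-m 8`, `-num_threads 2` for `-a 2`, and `-max_target_seqs 1000` standing in for `-v 1000 -b 1000`.

```shell
# run CMD [ARGS...]
# Prints the command instead of executing it when DRY_RUN is set (hypothetical).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# blast_plus_search FASTA DB_NAME OUT_FILE
# Builds a protein database and searches FASTA against it, using only BLAST+ tools.
blast_plus_search() {
  fasta=$1
  db=$2
  out=$3
  run makeblastdb -in "$fasta" -dbtype prot -out "$db"
  # Tabular output (-outfmt 6), same e-value cutoff and thread count as before.
  run blastp -query "$fasta" -db "$db" -out "$out" \
    -evalue 1 -outfmt 6 -num_threads 2 -max_target_seqs 1000
}
```

For example, `DRY_RUN=1 blast_plus_search mySample.goodProteins.fasta mySample_blastDB mySample_blast.tsv` prints the two commands so they can be reviewed first.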
(Separately, I also ran into trouble with Amphora2 that Synergy2 uses, and I have posted another question in this forum regarding that.)
Thanks much