Problems with the orthomclBlastParser script
12.7 years ago
GR ▴ 400

Hi All,

I was trying to find true orthologs for a set of sequences using the OrthoMCL program. I made it up to step 8, orthomclBlastParser. I provided my BLAST output in -m 8 format. When I ran orthomclBlastParser, it asked for the taxon ID of the subject sequences. I modified my BLAST output file by giving the subject sequences IDs of the form 'xxx|YYYYYY', but I am still getting the same error.

Can someone help me with this?

Thanks, R.

Below are a few lines from my BLAST output file and the error given by orthomclBlastParser.

BlastParser:

BLASTP 2.2.26+

Query: ppp|scf8123

Database: aPD3R_pep

Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score

100 hits found

ppp|scf8123 xxx|Pd3R61150.1|PACid:197844 56.00 800 239 15 9 804 13 703 0.0 824

ppp|scf8123 xxx|Pd3R61150.1|PACid:197366 95.88 826 32 2 1 824 1 826 0.0 1557

Error:

acquiring genes from ppp.fasta couldn't find taxon for gene 'xxx|Pd3R61150.1|PACid:197844' at /Downloads/orthomclSoftware-v2.0.2/bin/orthomclBlastParser line 103, line 1.

Please note that I removed the first 5 lines of the output file; otherwise it gives me this error: couldn't find taxon for gene 'BLASTP' at /Downloads/orthomclSoftware-v2.0.2/bin/orthomclBlastParser line 103, line 1.

12.7 years ago
Vitis ★ 2.6k

I think you should keep only 'xxx|Pd3R61150.1' of 'xxx|Pd3R61150.1|PACid:197844' for the query proteins. If the manual says the naming convention should be in this form, I think they mean it strictly. At least, I followed it and it worked.
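To illustrate the trimming suggested above, here is a minimal Python sketch. It assumes the BLAST file is tab-separated with 12 columns (the -m 8 layout) and that the first two '|'-separated fields of each ID are the taxon code and the gene ID. The function names `trim_id` and `clean_blast_lines` are made up for this example and are not part of OrthoMCL. The trimmed IDs only help if they match the IDs in the FASTA files exactly, so it is usually cleaner to fix the FASTA headers first (the OrthoMCL distribution ships orthomclAdjustFasta for that) and rerun BLAST.

```python
def trim_id(seq_id: str) -> str:
    """Keep only the first two '|'-separated fields (taxon code and gene ID)."""
    return "|".join(seq_id.split("|")[:2])


def clean_blast_lines(lines):
    """Yield 12-column hit lines with trimmed IDs; skip everything else.

    Header lines such as 'BLASTP 2.2.26+' or 'Fields: ...' do not split
    into 12 tab-separated columns, so they are dropped by the length check.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            continue  # not a tabular hit line
        fields[0] = trim_id(fields[0])  # query ID
        fields[1] = trim_id(fields[1])  # subject ID
        yield "\t".join(fields)
```

For example, passing an open file handle of the raw BLAST output to `clean_blast_lines` and writing each yielded line to a new file gives a header-free file in which 'xxx|Pd3R61150.1|PACid:197844' has become 'xxx|Pd3R61150.1'.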

12.7 years ago
SES 8.6k

Are you also giving orthomclBlastParser the path to the directory of "adjusted" FASTA files (as defined by OrthoMCL) as the second argument? That directory should contain "ppp.fasta" and "xxx.fasta", plus a file for any additional taxa that you are analyzing. I agree with Vitis that you should probably follow the directions explicitly, because modifying the headers will likely break the parser or possibly introduce some other unintended effect downstream.
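A quick way to check this before running the parser is to compare the IDs in the BLAST file against the IDs in the FASTA directory. The sketch below is only an illustration under a few assumptions: the FASTA files end in ".fasta", a sequence ID is the first whitespace-delimited token after '>', and the BLAST file has 12 tab-separated columns. The helper names `fasta_ids` and `find_unknown_ids` are hypothetical and not part of OrthoMCL.

```python
import os


def fasta_ids(path):
    """Collect sequence IDs (first token after '>') from one FASTA file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def find_unknown_ids(blast_path, fasta_dir):
    """Return query/subject IDs in the BLAST file that no FASTA file defines."""
    known = set()
    for name in os.listdir(fasta_dir):
        if name.endswith(".fasta"):
            known |= fasta_ids(os.path.join(fasta_dir, name))

    unknown = set()
    with open(blast_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 12:
                continue  # skip header or malformed lines
            for seq_id in fields[:2]:
                if seq_id not in known:
                    unknown.add(seq_id)
    return unknown
```

If `find_unknown_ids` returns a non-empty set, those are the IDs most likely to trigger a "couldn't find taxon for gene" message, for instance a subject ID that still carries a trailing '|PACid:...' field.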


Thanks a lot, Vitis and SES. This is working now. I managed to get as far as step 9, orthomclBlastParser, where the BLAST results are loaded into the database. When I run the next step, which generates the potential ortholog, in-paralog and co-ortholog pairs, I get empty tables in the database :( . When I then run the next script, orthomclDumpPairsFiles, the pairs directory has three empty files and the mclInput file is also empty. Any ideas?


I have the same problem: I have data in the intermediate tables but not in the Ortholog, InParalog or CoOrtholog tables.


I managed to run this program and complete all the steps. In my case I had messed up my database, and there was also some memory problem on my system (it is hard to remember the exact problem right now). I took my time and went through all the steps again and again, trying to follow every minor point given in the manual. It took me two days, but I finally got it working myself. Try to follow the manual in every detail; if you still can't get it to work, I will try to help you.


I have run into the same problem: I only get an empty mclInput file. How did you resolve it? I did not run the all-vs-all BLAST; I only ran BLAST of my sequences against the reference proteome. Does that matter? Thanks.

12.7 years ago
GR ▴ 400

Just to add a little more information: I have 30 sequences for which I want to find the orthologs in one particular species, so I did not run the all-vs-all BLAST. I only ran BLAST of my sequences against the proteome of the other species and got the results in the same -m 8 format. This time I carefully followed all the naming conventions given in the manual. Does this have something to do with the all-vs-all BLAST?

There is no chance that my sequences do not have orthologs in the other species. Please help.

Many thanks once again.
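On the all-vs-all question, a toy Python sketch may help show why one-directional BLAST results can leave the ortholog tables empty. It assumes, as a simplification, that an ortholog pair must be a reciprocal best hit between two taxa, so hits are needed in both directions. It ranks hits by bit score and ignores the e-value and percent-match thresholds that OrthoMCL applies, so it is not OrthoMCL's actual procedure. The function names and the second score below are invented for illustration.

```python
def best_hits(hits):
    """Map each query to its best-scoring subject from a different taxon.

    `hits` is an iterable of (query_id, subject_id, bit_score) tuples,
    with IDs of the form 'taxon|gene'.
    """
    best = {}
    for query, subject, score in hits:
        if query.split("|")[0] == subject.split("|")[0]:
            continue  # same taxon: not an ortholog candidate
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    best = best_hits(hits)
    return sorted(
        (query, subject)
        for query, subject in best.items()
        if query < subject and best.get(subject) == query
    )


# One-directional search: only ppp queries against the xxx proteome.
one_way = [("ppp|scf8123", "xxx|Pd3R61150.1", 1557.0)]

# Both directions present, as in an all-vs-all search (second score is made up).
both_ways = one_way + [("xxx|Pd3R61150.1", "ppp|scf8123", 1550.0)]
```

With `one_way`, `reciprocal_best_hits` returns an empty list because there is no hit from the xxx side back to ppp; with `both_ways`, it returns the single pair.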

12.6 years ago
jollymrt ▴ 10

Thanks for the help, ritu, but now I am getting a populated InParalog table and empty Ortholog tables. Any idea why that is happening?

