Hi,
I have a new genome assembly and I want to align the protein sequences of the original assembly against it. What is the best tool for this job?
Hi, several suggestions:
This publication reviews the performance of 7 tools that perform spliced alignment of proteins to a genome (the authors also look at 12 tools for DNA alignment):
Hiroaki Iwata and Osamu Gotoh Nucleic Acids Res. 2012 Nov; 40(20): e161. doi: 10.1093/nar/gks708
The second way is more time-consuming if you use these tools directly. Often the two steps are coupled: the first step is used to define chunks of the genome that are then sent to the second-step tools (e.g. within the Maker and Ensembl annotation pipelines).
Cheers
I would use blat or exonerate. Blat is better for more closely related species, and the nice thing is that both can produce a BLAST-style table for easy parsing (though with exonerate you have to use the 'roll-your-own' output format with a custom string, which I could share). Exonerate is used by Maker for protein alignments, and it has a lot more options that allow you to control splicing and intron modeling, codon alignment, etc. Blat is a lot faster, so that is a trade-off to consider.
If you wish to align those proteins to a reference assembly you could use the exonerate (http://www.ebi.ac.uk/~guy/exonerate/) protein2genome model which models introns. I used this when I wanted to align proteins from the TAIR10 database to our reference genome. You would also probably want to split the file into considerably smaller chunks so that many faster individual alignments can be carried out before the results are merged - this way the alignment as a whole will be much quicker.
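To illustrate the chunking idea, here is a minimal Python sketch that splits a protein FASTA file into smaller files, each of which could then be aligned separately before merging the results. The function names (`read_fasta`, `split_fasta`), the `chunk_NNNN.fa` naming scheme and the default of 100 records per chunk are my own assumptions for illustration, not something exonerate requires.

```python
from itertools import islice
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif line and header is not None:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path, out_dir, records_per_chunk=100):
    """Write consecutive groups of records to chunk files; return their paths.

    The chunk size and file naming are arbitrary choices for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = read_fasta(path)
    chunk_paths = []
    while True:
        chunk = list(islice(records, records_per_chunk))
        if not chunk:
            break
        chunk_path = out_dir / f"chunk_{len(chunk_paths):04d}.fa"
        with open(chunk_path, "w") as out:
            for header, seq in chunk:
                out.write(f">{header}\n{seq}\n")
        chunk_paths.append(chunk_path)
    return chunk_paths
```

Each resulting chunk can then be given to a separate alignment job (for example one job per file on a cluster), and the per-chunk outputs concatenated afterwards. A suitable chunk size depends on how many jobs you can run in parallel.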
I think you meant tblastn and not blastx.
Right, I will update the post!
May I also ask here whether Promer (MUMmer) has an option for aligning a proteome to a genome? From what I can see it uses only nucleotide sequences. Am I missing something?
In the MUMmer4 publication they state "It is not restricted to DNA and can also align protein sequences". It is not clearly stated in the manual, but it looks like you can use a proteome as input to Promer. This approach does not provide splice-aware alignment.
Another really fast tool would be PSimScan.
Otherwise, if you are looking for splice-aware alignment, you could have a look at this publication, which shows the performance of 7 different tools for protein alignment:
Hiroaki Iwata and Osamu Gotoh Nucleic Acids Res. 2012 Nov; 40(20): e161. doi: 10.1093/nar/gks708
I tried to align a proteome to a genome using promer, but it treats the proteins as IUPAC nucleotide codes and turns every letter it does not recognize into N... Anyway, I'll try to look more into their publication, thanks. PSimScan looks like a good tool too! I just need an "approximate" alignment at this stage, but I will look into the publication that you mentioned for future reference. Many thanks!
Here is what they say on the MUMmer4.x README:
I think it can only deal with nucleotides.
That's a pity they don't check whether the input is AA or DNA and skip the six-frame translation if it is already protein. You should create an issue and ask if it could be implemented in a future version.
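As a rough idea of what such a check could look like, here is a small Python sketch that guesses whether a sequence is protein or nucleotide from its letter composition. This is not how MUMmer works, and the function name `guess_sequence_type` and the 0.9 threshold are assumptions of mine for illustration only.

```python
# Letters counted as "clearly nucleotide" in this sketch.
CORE_NUCLEOTIDES = set("ACGTUN")


def guess_sequence_type(seq, threshold=0.9):
    """Return "nucleotide" or "protein" based on letter composition.

    Heuristic: if at least `threshold` of the letters are A, C, G, T, U
    or N, call the sequence nucleotide; otherwise call it protein.
    Gap and stop characters ('-', '*') are ignored.
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    core = sum(1 for c in letters if c in CORE_NUCLEOTIDES)
    if core / len(letters) >= threshold:
        return "nucleotide"
    return "protein"


print(guess_sequence_type("ACGTACGTNNACGT"))   # nucleotide
print(guess_sequence_type("MKVLAAGIVGLLLAQ"))  # protein
```

A composition heuristic like this can be fooled by very short or unusual sequences, so a real implementation would probably want an explicit command-line switch as well.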
You're right. I will.