blastn of reference genome CDS to subject genome gives range of hits, which do not translate into protein
4.7 years ago
VBer ▴ 200

I have a consensus genome created by incorporating only biallelic SNPs into the reference genome. I want to get the protein sequence of a particular gene from my consensus genome.

I tried using the reference gene CDS taken from NCBI through blastn to do this. I got a single hit, spanning multiple ranges. I concatenated all the aligned nucleotides from the consensus and tried to translate them but the reference protein is not found in its entirety in any frame. The reference protein is split across several frames, something I did not expect, because there are only SNPs present in the consensus.

Any idea why this is happening, and how I can get the protein sequence?

BLAST • 1.3k views
4.7 years ago

BLAST could work, but I'm afraid it might not be accurate enough.

Perhaps give a real spliced-alignment tool a try, something like gmap or EST2genome (they take correct splicing into account, which BLAST does not).

In a genomic context, yes, the protein (or more accurately, the CDS) is in most cases 'split' over different frames. However, when you concatenate the pieces in order, they should revert to a single reading frame (unless some of the SNPs you introduced cause frameshifts and/or premature stop codons).
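To illustrate the point about frames, here is a minimal Python sketch on a made-up toy gene. The exon sequences and the `translate` helper are hypothetical and only there for illustration; nothing here comes from your data. Each exon starts at a different codon position (its phase), yet the exons joined in transcript order translate as one open reading frame.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a nucleotide string in frame 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


# Hypothetical toy exons, listed in transcript order.
exons = ["ATGG", "CTAAAGG", "TTGA"]

# Phase of each exon: number of leading bases that complete the codon
# left unfinished by the preceding exons (same convention as GFF3 phase).
phases = []
offset = 0
for exon in exons:
    phases.append((3 - offset % 3) % 3)
    offset += len(exon)

cds = "".join(exons)
protein = translate(cds)

print(phases)   # the exons do not all start on a codon boundary
print(protein)  # the joined CDS reads as a single frame
```

If the concatenated BLAST ranges do not behave like this, the likely culprit is the range boundaries (a base too many or too few at an exon edge), not the SNPs.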


Thanks, gmap worked like a charm!


Frameshift mutations are not possible, as I did not include any insertions or deletions, just SNPs. So perhaps I have premature stop codons. I will try the aligners you suggested, perhaps look at the number of mutations too, and report back. Thanks!
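A quick way to do that check, once a CDS has been pulled out of the consensus, could look like the sketch below. It assumes both CDS strings have the same length (which should hold for a SNP-only consensus); `compare_cds`, `translate` and the two toy sequences are hypothetical names and data made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a nucleotide string in frame 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def compare_cds(ref_cds, alt_cds):
    """Return (SNP count, amino-acid changes, 1-based codon of first premature stop or None)."""
    if len(ref_cds) != len(alt_cds):
        raise ValueError("CDS lengths differ, so this is not a SNP-only comparison")
    snps = sum(r != a for r, a in zip(ref_cds.upper(), alt_cds.upper()))
    ref_prot, alt_prot = translate(ref_cds), translate(alt_cds)
    # Each change is (1-based codon position, reference residue, consensus residue).
    changes = [
        (i + 1, r, a) for i, (r, a) in enumerate(zip(ref_prot, alt_prot)) if r != a
    ]
    stop = alt_prot.find("*")
    # A stop codon anywhere before the last codon counts as premature.
    premature = stop + 1 if 0 <= stop < len(alt_prot) - 1 else None
    return snps, changes, premature


# Hypothetical toy data: one SNP turns the lysine codon AAA into the stop codon TAA.
ref = "ATGGCTAAAGGTTGA"
alt = "ATGGCTTAAGGTTGA"
result = compare_cds(ref, alt)
print(result)
```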


Frameshift mutations are not possible as I did not include single nucleotide insertions, just SNPs.

You are correct.

Then it's likely because BLAST does not provide you with a correct gene structure (not a surprise either; that's not its goal). Yes, give the gene mappers a try and see what that gives.

An alternative could be to transfer the annotation of your reference (given it has one) to the consensus and then extract your protein sequence based on that.
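One note on that alternative: a consensus built by substituting SNPs only should have exactly the same length and coordinates as the reference, so in this particular case the reference annotation may be usable on the consensus as-is. Below is a minimal Python sketch under that assumption. The toy genome, the CDS intervals and the helper names (`extract_cds`, `translate`) are hypothetical; intervals are 1-based and inclusive, as in GFF.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def translate(seq):
    """Translate a nucleotide string in frame 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def extract_cds(genome_seq, cds_intervals, strand):
    """Join CDS intervals (1-based, inclusive) into one coding sequence."""
    parts = [genome_seq[start - 1:end] for start, end in sorted(cds_intervals)]
    cds = "".join(parts)
    if strand == "-":
        # Minus-strand genes: reverse-complement the joined plus-strand pieces.
        cds = cds.translate(COMPLEMENT)[::-1]
    return cds


# Hypothetical toy contig: flank, exon, intron, exon, intron, exon, flank.
genome = "CC" + "ATGG" + "GTCCAG" + "CTAAAGG" + "GTAG" + "TTGA" + "CC"
cds_intervals = [(3, 6), (13, 19), (24, 27)]  # as they would appear in a GFF

cds = extract_cds(genome, cds_intervals, "+")
protein = translate(cds)
print(cds)
print(protein)
```

With real data the same intervals would be read from the reference GFF and applied to the consensus FASTA; if the consensus had contained indels, the coordinates would shift and a proper annotation-transfer tool would be needed instead.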


For the latter, I only know RATT. Do you know any other tools?


I was also thinking of that one, indeed.

There is also 'liftOver' from the ALLmaps package, if I remember correctly.
