How to find a specific gene in an assembled genome that is not annotated?
2
0
6.5 years ago

I have an assembled genome (contigs) of an insect that is not yet annotated. There are some genes of interest (such as actin), and I want to know whether they are present in that genome. What can I do?

assembly gene sequence • 3.6k views
3
6.5 years ago
GenoMax 147k

You can take the sequence of genes of interest from the closest related species that you can find in GenBank and then you can use blat (or blast) to search against your contigs.
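A minimal sketch of that search step, assuming NCBI BLAST+ is installed and on the PATH. The file names (`contigs.fasta`, `actin_relative.fasta`, `hits.tsv`), the database name, and the E-value cutoff are placeholders chosen for illustration, not values from this thread. The function only builds the two command lines; pass each one to `subprocess.run(cmd, check=True)` to actually execute it.

```python
import shlex


def blast_commands(contigs_fasta, query_fasta, program="blastn",
                   db_name="contigs_db", evalue="1e-5", out="hits.tsv"):
    """Build BLAST+ command lines for searching a query against contigs.

    program: "blastn" for a nucleotide query, "tblastn" for a protein query.
    All file names and the E-value cutoff are illustrative assumptions.
    """
    if program not in ("blastn", "tblastn"):
        raise ValueError("program must be 'blastn' or 'tblastn'")
    # Index the contigs as a nucleotide database.
    make_db = ["makeblastdb", "-in", contigs_fasta,
               "-dbtype", "nucl", "-out", db_name]
    # Search the query against that database, tabular output (format 6).
    search = [program, "-query", query_fasta, "-db", db_name,
              "-evalue", evalue, "-outfmt", "6", "-out", out]
    return make_db, search


# Print the commands so they can be checked before running them.
for cmd in blast_commands("contigs.fasta", "actin_relative.fasta"):
    print(shlex.join(cmd))
```

An empty output file means no hit passed the E-value cutoff; loosening the cutoff or switching to a protein query are the usual next things to try.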

0

Thanks.

However, I already tried that with a sequence from the closest relative I could find (they are from the same subfamily), and I did not get any hit. I do not know whether their sequences are that different or whether I am doing something wrong.

0

What kind of sequences are you using as input: nucleotide or protein? And, related to that, what kind of BLAST are you running?

0

christian_jpg2 : As has been suggested in this thread you could try tblastn if plain blastn did not work.

You also have to consider the possibility that if you did the blast search right and did not find a hit for something that should be present then your assembly could be of poor quality. You could take your assembly and try blasting it against nr to see if you get reasonable/contiguous hits.

2
6.5 years ago
toheitka ▴ 230

It looks to me as if you should follow the thread "Building Hidden Markov Model (HMM) for proteins" closely, as both of you want similar things.

As for HMMs of your proteins, you could retrieve them using PFAM. The actin page, for example, is here: https://pfam.xfam.org/family/PF00022

Then, you can follow this EBI tutorial to retrieve the HMM from PFAM.

Actually, what I would do is very crude and a bit dirty: I would translate my genome in all six reading frames and then use HMMER to check whether the downloaded HMM gets a hit. It would be great if HMMER could use protein HMMs to search DNA directly, but this functionality does not yet exist (I think).
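The six-frame translation step could be sketched roughly like this in plain Python. It assumes the standard genetic code (NCBI translation table 1); the function names are made up for this example. Stop codons come out as `*` and codons containing ambiguous bases as `X`. In practice each frame would probably be split at the stops and written to a protein FASTA file before searching it with the HMM.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


# Toy input only; a real contig would be read from the assembly FASTA.
for frame, protein in six_frame_translations("ATGGCC").items():
    print(frame, protein)
```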

3

Running a simple tblastn with the protein against the genome will be more than sufficient here.

For more specific searches, or for really distantly related species (i.e. very low sequence conservation), your approach will indeed be the appropriate one.
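Once tblastn has been run with tabular output (`-outfmt 6`), the hits could be filtered along these lines. This is a sketch that assumes the default 12-column layout of that format; the E-value cutoff and the sample lines are invented for illustration and are not real results. For tblastn, a subject start greater than the subject end typically means the hit lies on the reverse strand of the contig.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(lines, max_evalue=1e-5):
    """Parse tabular BLAST lines and return hits passing the E-value cutoff,
    sorted by bit score (best first). The cutoff is an arbitrary assumption."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, line.split("\t")))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hit["sstart"] = int(hit["sstart"])
        hit["send"] = int(hit["send"])
        if hit["evalue"] <= max_evalue:
            # Reversed subject coordinates indicate a minus-strand hit.
            hit["strand"] = "+" if hit["sstart"] <= hit["send"] else "-"
            hits.append(hit)
    return sorted(hits, key=lambda h: h["bitscore"], reverse=True)


# Made-up example lines, not output from a real search.
sample = [
    "actin\tcontig_12\t95.2\t375\t18\t0\t1\t375\t2200\t1076\t2e-80\t650",
    "actin\tcontig_7\t31.0\t60\t40\t1\t10\t69\t100\t279\t0.5\t25.1",
]
for hit in parse_hits(sample):
    print(hit["sseqid"], hit["strand"], hit["evalue"], hit["bitscore"])
```

With real data, `lines` would simply be the open output file, for example `parse_hits(open("hits.tsv"))`, where `hits.tsv` is a placeholder name.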

1

Yes, lieven.sterck is right, obviously. I had assumed genomax already did that. Upon re-reading, I saw that BLASTn had probably been used, so I agree, tBLASTn would be the keyword.

1

(and I am biased because I am in love with HMMs...)
