Question

Blast unigenes with set of protein sequences

9.1 years ago

Kurban ▴ 230

Hello guys, I have more than 10,000 de novo assembled unigenes from RNA-seq. I BLASTed them against 95 protein sequences from another insect species and got 454 BLAST hits. Their similarity ranges from about 20% to 100%, and their E-values range from 8.00E-06 to 0. To select the probable homologs from these BLAST results, what cut-offs should I use for similarity and E-value?
Thanks in advance.




This question comes up often, and there is no defined cutoff that designates a homolog. Homology is all or nothing: two genes either share a common ancestor or they do not. Similarity, on the other hand, is a quantitative measure expressed as a percentage. A sequence can still be homologous at a low % similarity if the two species are evolutionarily distant. If your insect species are closely related, 20% similarity may be low; if they are not, 20% could still be an important data point.
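Once you have decided on thresholds for your species pair, applying them is mechanical. Below is a minimal sketch, assuming the hits were saved in BLAST+ tabular format (`-outfmt 6` with its default 12 columns). Note that the `pident` column is percent *identity*, not similarity; similarity (`ppos`) only appears if you request it in a custom `-outfmt "6 ..."` column list. The helper names, the threshold defaults and the example rows are all hypothetical placeholders, not recommended cutoffs.

```python
# Filter BLAST+ tabular hits (-outfmt 6, default 12 columns) by identity, E-value and length.
# Threshold defaults are illustrative placeholders only, not recommended cutoffs.
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str      # qseqid (unigene ID)
    subject: str    # sseqid (protein ID)
    pident: float   # percent identity (not similarity)
    length: int     # alignment length
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse default outfmt 6 lines: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11])))
    return hits


def filter_hits(hits: Iterable[Hit], min_pident: float = 30.0,
                max_evalue: float = 1e-10, min_length: int = 50) -> List[Hit]:
    """Keep only hits that pass all three (hypothetical) thresholds."""
    return [h for h in hits
            if h.pident >= min_pident and h.evalue <= max_evalue and h.length >= min_length]


if __name__ == "__main__":
    # Two made-up rows: one strong hit, one weak hit at the low end of the reported ranges.
    rows = [
        "unigene_1\tprotA\t86.0\t300\t42\t0\t1\t900\t1\t300\t0.0\t520",
        "unigene_2\tprotB\t22.5\t120\t93\t0\t10\t369\t5\t124\t8.00E-06\t45.1",
    ]
    kept = filter_hits(parse_outfmt6(rows))
    print([h.query for h in kept])  # ['unigene_1']
```

Keeping the thresholds as parameters makes it easy to rerun the filter with looser values for distantly related species, in line with the point that no single cutoff fits every case.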

As you are well aware, BLAST E-values depend on the size of the database, which in this case is very small. Was there a reason to select only those 95 proteins? The 454 genes that have a BLAST hit are similar to your target gene set to some extent, and you would need to examine the entire set to see whether you can remove some redundancy.
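To make the database-size point concrete, here is a rough sketch using the bit-score form of the E-value, E ≈ m · n · 2^(−S′), where m is the query length, n is the total number of residues in the database and S′ is the bit score. BLAST+ applies effective-length corrections, so real values differ somewhat. The 95 × ~500 aa database size and the 100-million-residue comparison database are hypothetical numbers. The second helper, `unigenes_per_protein` (a name made up for this sketch), is one simple way to start the redundancy check: list which unigenes hit the same protein.

```python
# Approximate how BLAST E-values scale with database size, and group hits by protein.
# Uses the bit-score form of the Karlin-Altschul statistic, E ~ m * n * 2**(-S'),
# ignoring BLAST's effective-length corrections, so the numbers are rough.
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def approx_evalue(bitscore: float, query_len: int, db_len: int) -> float:
    """E ~ m * n * 2^(-S'): m = query length, n = total database residues."""
    return query_len * db_len * 2.0 ** (-bitscore)


def unigenes_per_protein(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each subject protein to the sorted unigenes that hit it.
    Proteins hit by several unigenes point to possible redundancy
    (e.g. fragments or isoforms of one transcript)."""
    groups = defaultdict(set)
    for query, subject in pairs:
        groups[subject].add(query)
    return {s: sorted(qs) for s, qs in groups.items()}


if __name__ == "__main__":
    # Hypothetical sizes: 95 proteins of ~500 aa vs. a 100-million-residue database.
    small_db = 95 * 500
    large_db = 100_000_000
    e_small = approx_evalue(40.0, 300, small_db)
    e_large = approx_evalue(40.0, 300, large_db)
    # Same alignment score, but E grows in proportion to database size.
    print(f"{e_small:.2e} vs {e_large:.2e}")
    print(math.isclose(e_large / e_small, large_db / small_db))  # True
    print(unigenes_per_protein([("u1", "protA"), ("u2", "protA"), ("u3", "protB")]))
```

Because E is proportional to n in this approximation, the same alignment looks roughly 2,100 times more significant against the 95-protein set than against the larger database, which is why E-value cutoffs borrowed from searches of large databases do not transfer directly.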

9.1 years ago by GenoMax 152k