Blastx false positives
9.5 years ago
biobio ▴ 50

Hi,

We are trying to identify novel viruses by sequencing siRNAs from plants and using BLAST to find virus-associated sequences. We run blastx against the viral RefSeq database. The problem is that even with a stringent E-value cutoff (10E-20), we get a lot of false positives.

By false positive, we mean that the BLAST result shows a virus hit, but when we BLAST that contig again, we get matches from a plant genome. How can we filter our results so that the hits reported as viral are actually from viruses?

blast • 3.9k views

When you blast it again, you blast it against the same database of viruses, correct? How can you get plant results then?


No, sorry. We take the interesting results and BLAST them against NR using the web interface.


That's why you're seeing plant results: when the search isn't filtered by organism, the plant sequences in NR are a better fit for these contigs than the viral sequences.


But if the sequences are actually from viruses, shouldn't viruses be the best hit?


They're not sequences from viruses. They're small plant (host) molecules that target complementary nucleotide sequences. What they complement could be either host or foreign (viral).

When you get hits against plants for a given siRNA, I can think of two reasons:

1. You're just finding that siRNA in the plant's genome

- or -

2. You're finding that siRNA's target in the plant's genome


Ah okay, that makes sense. So when doing blast against the viral database, is it possible to remove the plant hits without doing a blast against NR?


Well, for one, I'm not sure why you're using BLASTX. siRNAs are sequence-specific and target mRNA molecules, not proteins, so a translated search like BLASTX doesn't make sense here.

This is the tricky part: on one hand, you should still search against the host; on the other hand, even if an siRNA matches a host gene, it may still have antiviral activity in vivo (either by silencing a host gene that the virus needs or by silencing the virus directly).
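One way to act on the "search against the host too" idea, as a rough sketch rather than a definitive method: run the same contigs against both the viral database and a host database, save both results in BLAST tabular format (`-outfmt 6`, where column 1 is the query ID and column 12 is the bit score), and compare the best score per contig. The function names, labels, the `margin` parameter, and the toy IDs below are all made up for illustration. Bit scores are only roughly comparable when both searches use the same program and similar settings, and, per the caveat above, a "host_better" label flags a contig for a closer look rather than proving it is irrelevant.

```python
import csv
import io

def best_bitscores(handle):
    """Return {query_id: highest bit score} from BLAST tabular output (-outfmt 6)."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, bitscore = row[0], float(row[11])  # column 12 is the bit score
        if bitscore > best.get(query, float("-inf")):
            best[query] = bitscore
    return best

def classify(viral, host, margin=0.0):
    """Label every query that has a viral hit (labels are illustrative only)."""
    labels = {}
    for query, v_score in viral.items():
        h_score = host.get(query)
        if h_score is None:
            labels[query] = "viral_only"      # no host hit at all
        elif v_score > h_score + margin:
            labels[query] = "viral_better"    # viral hit outscores host hit
        else:
            labels[query] = "host_better"     # likely host-derived, inspect by hand
    return labels

# Toy tables with made-up IDs and scores; in practice, open the real result files.
viral_tab = io.StringIO(
    "contig1\tvirus_prot_A\t85.0\t120\t18\t0\t1\t360\t10\t129\t1e-60\t210\n"
    "contig1\tvirus_prot_B\t60.0\t110\t44\t0\t1\t330\t10\t119\t1e-30\t120\n"
    "contig2\tvirus_prot_C\t40.0\t100\t60\t0\t1\t300\t5\t104\t1e-21\t95.5\n"
    "contig3\tvirus_prot_D\t70.0\t90\t27\t0\t1\t270\t1\t90\t1e-40\t150\n"
)
host_tab = io.StringIO(
    "contig1\thost_seq_X\t35.0\t80\t52\t0\t1\t240\t3\t82\t1e-10\t60.2\n"
    "contig2\thost_seq_Y\t98.0\t100\t2\t0\t1\t300\t1\t100\t1e-50\t180\n"
)

labels = classify(best_bitscores(viral_tab), best_bitscores(host_tab))
for query in sorted(labels):
    print(query, labels[query])
```

Raising `margin` makes the filter stricter by requiring the viral hit to beat the host hit by a clear gap before a contig is kept as viral.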

I would also narrow the search down to plant viruses. There's no sense in searching animal viruses.
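If rebuilding the database from plant virus sequences only isn't convenient, the restriction can also be applied after the search. The following is a minimal sketch that assumes you have your own list of plant virus subject IDs, one per line, and hits in BLAST tabular format (`-outfmt 6`, subject ID in column 2). The helper names and all IDs are hypothetical, and where the ID list comes from is up to you.

```python
import io

def load_ids(handle):
    """Read one subject ID per line, ignoring blank lines and '#' comments."""
    return {
        line.strip()
        for line in handle
        if line.strip() and not line.startswith("#")
    }

def filter_hits(hits_handle, allowed_ids):
    """Yield tabular BLAST rows (as lists) whose subject ID is in allowed_ids."""
    for line in hits_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] in allowed_ids:
            yield fields

# Made-up ID list and hits; in practice, open the real files instead.
plant_virus_ids = load_ids(io.StringIO(
    "# hypothetical plant virus subject IDs\n"
    "plant_virus_prot_1\n"
    "plant_virus_prot_2\n"
))
hits = io.StringIO(
    "contig1\tplant_virus_prot_1\t85.0\t120\t18\t0\t1\t360\t10\t129\t1e-60\t210\n"
    "contig2\tanimal_virus_prot_9\t45.0\t100\t55\t0\t1\t300\t5\t104\t1e-21\t95.5\n"
    "contig3\tplant_virus_prot_2\t70.0\t90\t27\t0\t1\t270\t1\t90\t1e-40\t150\n"
)

kept = list(filter_hits(hits, plant_virus_ids))
for row in kept:
    print("\t".join(row))
```

Filtering afterwards keeps the original search results intact, so the ID list can be revised without rerunning BLAST.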
