Hello guys!
I ran BLASTX with a query file (more than 140,000 nucleotide sequences) against a database file (more than 1,400 polypeptide sequences). With an e-value of 3 I got around 20,000 query sequences with hits; after changing the e-value to 5 I still got more than 15,000 aligned sequences. When I checked their percent identity, more than 2,000 of them had an identity below 30%. Which factors should I consider when analyzing the BLASTX results?
Should I use 3 or 5 for the e-value?
What percent identity would be a reasonable threshold for BLASTX results?
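To make the question concrete, the kind of filtering being asked about could be sketched roughly as below. This assumes the results were saved in BLAST's tabular format (-outfmt 6 with its default 12 columns). The function name filter_hits and the cutoff values are placeholders for illustration only, not recommended thresholds.

```python
# Minimal sketch: keep BLASTX hits that pass an e-value and a percent-identity cutoff.
# Assumes default tabular output (-outfmt 6): qseqid sseqid pident length mismatch
# gapopen qstart qend sstart send evalue bitscore.
# filter_hits and the cutoffs used below are hypothetical, chosen only for illustration.

def filter_hits(lines, max_evalue, min_identity):
    """Return (qseqid, sseqid, pident, evalue) for every hit passing both cutoffs."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a default 12-column tabular row
        pident = float(fields[2])   # column 3: percent identity
        evalue = float(fields[10])  # column 11: e-value
        if evalue <= max_evalue and pident >= min_identity:
            kept.append((fields[0], fields[1], pident, evalue))
    return kept


# Made-up example rows, not real results
sample = [
    "q1\tp1\t85.0\t120\t18\t0\t1\t360\t1\t120\t1e-50\t200",
    "q2\tp2\t25.0\t80\t60\t2\t1\t240\t5\t84\t0.5\t30.1",
    "q3\tp3\t45.5\t100\t54\t1\t1\t300\t1\t100\t2e-8\t55.0",
]

for hit in filter_hits(sample, max_evalue=1e-5, min_identity=30.0):
    print(hit)
```

With a real results file, the same function could be called on the open file handle instead of the sample list, and the two cutoffs varied to see how many hits survive each combination.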
Thank you in advance.