Filtering tblastn results by percent identity (sequence identity: pident, %)
5.5 years ago
endretoth

Dear Bioinformaticians,

I would like to ask how to choose a sequence identity (pident, %) threshold for filtering tblastn results.

I have a table of tblastn results in Galaxy containing about 800,000 sequences. I would like to filter them by sequence identity, but if I use a 98% cutoff I lose almost all of them. What is the accepted cutoff, considering that this is protein data? I think it should not be as strict as blastn filtering (commonly 98 or 99%). Please give me advice and point me to any publication that recommends an appropriate percentage.
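A minimal Python sketch of filtering tabular tblastn output by pident. It assumes the results were exported as BLAST+ tabular output (`-outfmt 6`) with the default 12 columns, in which pident is the third column. The function name `filter_by_pident` and the example rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6) when no custom
# column list is given; pident is the third column.
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_by_pident(handle, min_pident):
    """Yield hit rows (as dicts) whose percent identity is >= min_pident."""
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        if float(hit["pident"]) >= min_pident:
            yield hit


if __name__ == "__main__":
    # Two made-up example hits; in practice use open("your_results.tsv").
    example = io.StringIO(
        "q1\ts1\t99.10\t120\t1\t0\t1\t120\t301\t660\t1e-60\t230\n"
        "q2\ts2\t72.50\t80\t22\t1\t5\t84\t10\t249\t1e-20\t90.1\n"
    )
    kept = list(filter_by_pident(example, 90.0))
    print(len(kept))  # 1
```

Streaming the rows with a generator avoids loading all ~800,000 hits into memory at once.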

All answers are greatly appreciated. :)

Thend

It is impossible to say without knowing any details about the project. Why do you need to filter the sequences?

Even when knowing the details, there is probably no perfect threshold, it is often a trade-off between removing artifacts (I guess this is what you want to do) and not losing too much information.
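One way to make that trade-off explicit is to check what fraction of hits survives at several candidate thresholds before committing to one. A minimal sketch, assuming the pident values have already been read into a list; the function name `retention_by_threshold` and the values are hypothetical, not real data.

```python
def retention_by_threshold(pidents, thresholds):
    """Return {threshold: fraction of hits with pident >= threshold}."""
    total = len(pidents)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {t: sum(p >= t for p in pidents) / total for t in thresholds}


if __name__ == "__main__":
    # Illustrative pident values only, not real results.
    values = [99.5, 98.2, 97.0, 95.1, 90.0, 85.3, 70.0, 62.4]
    for t, frac in retention_by_threshold(values, [98, 96, 90, 80, 70]).items():
        print(f">= {t}%: {frac:.1%} of hits kept")
```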

Hey endretoth, I am doing similar work. Can you tell me how you filtered the sequences? Did you do it manually, or did you use a programming language?

It depends on what kind of sequences you are using in the search. You should also consider the BLOSUM or PAM matrix you used in BLAST, which should be chosen according to the divergence between the sequences you are looking for and the query sequence.
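As a sketch of how the matrix choice could be wired into a search, the snippet below builds a tblastn argument list using the BLAST+ `-matrix` and `-outfmt` options. The divergence-to-matrix mapping is a common rule of thumb (higher-numbered BLOSUM matrices for closely related sequences, lower-numbered ones for divergent sequences), not a fixed standard; the function name and file names are hypothetical. Gap costs are left at BLAST+ defaults, so check that they are valid for the chosen matrix.

```python
import shlex

# Rule-of-thumb mapping (an assumption, not a fixed standard).
MATRIX_BY_DIVERGENCE = {
    "close": "BLOSUM80",
    "moderate": "BLOSUM62",  # BLAST+ default for protein-based searches
    "distant": "BLOSUM45",
}


def build_tblastn_cmd(query, db, divergence, out="hits.tsv"):
    """Return a tblastn argument list with a divergence-appropriate matrix."""
    matrix = MATRIX_BY_DIVERGENCE[divergence]
    return [
        "tblastn",
        "-query", query,
        "-db", db,
        "-matrix", matrix,
        "-outfmt", "6",  # tabular output; pident is column 3
        "-out", out,
    ]


if __name__ == "__main__":
    # Hypothetical file and database names; pass the list to subprocess.run().
    cmd = build_tblastn_cmd("proteins.faa", "genome_db", "distant")
    print(shlex.join(cmd))
```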

4.2 years ago
endretoth

Hi priyankamowlali,

We used a minimum percent identity of 96%, and later defined a second threshold of 98%. Although we lost the majority of our sequences, this made sure the remaining ones were of good/high quality. I hope this helps.
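A minimal sketch of applying the two thresholds mentioned in this answer (96% minimum, 98% for a stricter set) by labelling each hit. The function name `classify_hit` and the example values are hypothetical; this is not necessarily how the thresholds were applied in the original work.

```python
from collections import Counter


def classify_hit(pident, min_pident=96.0, high_pident=98.0):
    """Label a hit 'high' (>= high_pident), 'pass' (>= min_pident) or 'drop'."""
    if pident >= high_pident:
        return "high"
    if pident >= min_pident:
        return "pass"
    return "drop"


if __name__ == "__main__":
    # Illustrative pident values only.
    values = [99.0, 98.0, 97.2, 96.0, 95.9, 80.0]
    print(Counter(classify_hit(p) for p in values))
```

Keeping the labels instead of discarding rows makes it easy to compare results at both thresholds later.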

Best, Thend
