Question

Filtering for Percentage of identity (sequence identity: pident, %) from tblastn results

5.5 years ago

endretoth ▴ 40

Dear Bioinformaticians,

I would like to ask about defining the level of filtering by sequence identity (pident, %) from tblastn results.

I have a table of tblastn results in Galaxy including about 800,000 sequences. I would like to filter them by sequence identity, but if I filter at 98% I lose almost all of them. What is the accepted filtering threshold, considering that this is protein data? I think it should not be as strict as for blastn filtering (commonly 98 or 99%). Please advise, and point me to any publication that recommends a suitable percentage.

All answers are greatly appreciated. :)

Thend

blast filtering sequence pident tblastn • 2.5k views

ADD COMMENT • link 4.2 years ago by endretoth ▴ 40

It is impossible to say without knowing any details about the project. Why do you need to filter the sequences?

Even knowing the details, there is probably no perfect threshold; it is often a trade-off between removing artifacts (I guess this is what you want to do) and not losing too much information.
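One way to look at that trade-off before committing to a cutoff is to count how many hits survive at several candidate thresholds. The following is only a minimal sketch: it assumes the table is BLAST tabular output (`-outfmt 6`) with the default 12 columns, so that pident is the third column, and the example hits are made up for illustration.

```python
# Assumption: BLAST tabular output (-outfmt 6) with the default 12 columns,
# where percent identity (pident) is the third column.
PIDENT_COLUMN = 2  # zero-based index


def read_pident(lines):
    """Return the pident value of every hit line as a float."""
    values = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        values.append(float(line.split("\t")[PIDENT_COLUMN]))
    return values


def retained_counts(pidents, thresholds):
    """Map each threshold to the number of hits with pident >= threshold."""
    return {t: sum(p >= t for p in pidents) for t in thresholds}


# Made-up hits, for illustration only.
example_hits = [
    "q1\ts1\t99.1\t120\t1\t0\t1\t120\t301\t660\t1e-80\t250",
    "q1\ts2\t97.5\t118\t3\t0\t1\t118\t10\t363\t1e-75\t240",
    "q2\ts3\t85.0\t100\t15\t0\t5\t104\t50\t349\t1e-50\t180",
    "q3\ts4\t62.3\t90\t34\t1\t2\t91\t7\t276\t1e-20\t95",
    "q4\ts5\t45.0\t80\t44\t2\t1\t80\t100\t339\t1e-05\t50",
]

counts = retained_counts(read_pident(example_hits), [50, 90, 98])
for threshold, kept in counts.items():
    print(f"pident >= {threshold}%: {kept} of {len(example_hits)} hits kept")
```

A table like this does not say which threshold is right, but it shows how much data each choice would cost.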

ADD REPLY • link 5.5 years ago by Corentin ▴ 610

Hey endretoth, I am doing similar work. Can you tell me how you filtered the sequences? Manually, or did you use a programming language?

ADD REPLY • link 4.2 years ago by priyankamowlali • 0

It depends on what kind of sequences you are using in the search. You should also check which BLOSUM or PAM matrix you used in BLAST, since the appropriate choice depends on the divergence between the sequences you are looking for and the query sequence.

ADD REPLY • link 4.2 years ago by cpad0112 21k

score 0 · Answer 1 · 2020-10-15

4.2 years ago

endretoth ▴ 40

Hi priyankamowlali,

We used a minimum percent identity of 96%; later we defined another threshold of 98%. Although we lost the majority of our sequences, we made sure that the remaining ones are of good/high quality. I hope this helps.
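For anyone who wants to do this step in code rather than by hand, here is a minimal sketch of a two-stage pident filter. It is not the exact procedure used above: it assumes tab-separated BLAST output with pident in the third column, and the function name `filter_by_pident` and the sample rows are hypothetical.

```python
def filter_by_pident(lines, min_pident, pident_column=2):
    """Keep only hit lines whose pident is at least min_pident.

    Assumes tab-separated BLAST output; pident_column is a zero-based
    index (2 for the default -outfmt 6 column order).
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[pident_column]) >= min_pident:
            kept.append(line)
    return kept


# Made-up rows, for illustration only.
sample = [
    "q1\ts1\t99.4\t150\t1\t0\t1\t150\t1\t450\t1e-90\t300",
    "q2\ts2\t98.0\t140\t3\t0\t1\t140\t1\t420\t1e-85\t280",
    "q3\ts3\t96.5\t130\t5\t0\t1\t130\t1\t390\t1e-70\t250",
    "q4\ts4\t95.9\t125\t5\t0\t1\t125\t1\t375\t1e-65\t240",
    "q5\ts5\t70.2\t110\t33\t0\t1\t110\t1\t330\t1e-30\t120",
]

pass_96 = filter_by_pident(sample, 96.0)   # first, looser threshold
pass_98 = filter_by_pident(pass_96, 98.0)  # second, stricter threshold
print(len(sample), len(pass_96), len(pass_98))
```

The function accepts any iterable of lines, so an open file handle for the exported table can be passed in place of `sample`.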

Best, Thend

ADD COMMENT • link 4.2 years ago by endretoth ▴ 40