Question

Search predicted peptides from proteomes

3.3 years ago

tomas4482 ▴ 430

I created a FASTA file containing my targets, around 1,000 predicted peptide sequences. It is concatenated with the UniProt reference FASTA to create a target-decoy database. The search engine then searches the proteomics data against this database to look for hits.

I have some questions.

I would like to set a 5% FDR threshold for filtering peptide identifications. Someone suggested that I should apply this step exclusively to my predicted peptides, but I found little literature describing in detail how to do this. Some papers applied FDR filtering to the whole database (UniProt reference + predicted peptides). Which approach is preferred?
If I want to apply FDR filtering only to my predicted sequences, how can I do this?

peptide proteome search • 804 views


If you want to apply FDR control to your predicted sequences only, remove the PSMs that map to reference sequences from your results (regardless of whether they are target or decoy). You are then left with only the target and decoy PSMs that map to the predicted sequences. Apply FDR control to these PSMs. This practice is generally called separated FDR.

However, be careful if you have only a small number of PSMs to apply FDR control to, because there may be too few of them to represent the null distribution.
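The separated FDR procedure described above can be sketched roughly as follows. This is only a minimal illustration, not the output format of any particular search engine: the `PSM` record, its field names, and the `separated_fdr` function are hypothetical, it assumes that a higher score is better, and it estimates FDR as decoys/targets, which is the usual estimate for a concatenated target-decoy search.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    # Hypothetical record; adapt the fields to your search engine's output.
    peptide: str
    score: float      # assumption: higher score means a better match
    is_decoy: bool
    source: str       # "ref" or "predicted"


def separated_fdr(psms, source="predicted", fdr=0.05):
    """Return the target PSMs of one class that pass the FDR threshold."""
    # Step 1: keep only PSMs of the chosen class, targets and decoys alike.
    subset = sorted(
        (p for p in psms if p.source == source),
        key=lambda p: p.score,
        reverse=True,
    )
    # Step 2: walk down the ranked list and remember the deepest position
    # at which the estimated FDR (decoys / targets) is still acceptable.
    targets = decoys = cutoff = 0
    for rank, psm in enumerate(subset, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            cutoff = rank
    # Step 3: report only the target PSMs above the cutoff.
    return [p for p in subset[:cutoff] if not p.is_decoy]
```

With about 1,000 predicted sequences, the predicted class may contain very few decoy PSMs, so it is worth checking how many decoys actually fall near the cutoff before trusting the estimate.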

Actually, in practice there is no significant difference between separated FDR and global FDR (using the whole database), so in many cases it is a matter of choice.

If you are a beginner in proteomics, I recommend plotting the score distributions for 1) the whole database and 2) the predicted and reference PSMs separately. Then decide which one shows the better null distribution (before FDR control, of course) and the better identification rate (after FDR control, of course).
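As a rough starting point for that comparison, the scores can be binned before plotting. The sketch below uses only the standard library; `score_histogram` and the `bin_width` parameter are hypothetical names chosen for illustration, and the resulting counts can be passed to whatever plotting tool you prefer.

```python
from collections import Counter


def score_histogram(scores, bin_width=1.0):
    """Count scores per bin; each key is the lower edge of its bin."""
    counts = Counter((s // bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))
```

Calling it once on the decoy scores of the whole database and once on the decoy scores of the predicted class alone gives two null distributions to compare side by side.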

P.S. I am a little concerned about how the ref-mapped PSMs get filtered, so here is an example for your convenience:

PEPTIDE1 maps to both target-REF-protein1 and target-PREDICTED-protein1 => PEPTIDE1 is assigned to target-REF-protein1. => This is because a reference protein is prioritized over a predicted protein.

PEPTIDE2 maps to both decoy-REF-protein1 and target-PREDICTED-protein1 => PEPTIDE2 is assigned to target-PREDICTED-protein1. => This is because a target sequence is prioritized over a decoy sequence.

PEPTIDE3 maps to both decoy-REF-protein1 and decoy-PREDICTED-protein1 => PEPTIDE3 is assigned to both of them. => This is because there is no priority between decoys.
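The three cases above can be written as a small assignment rule. This is just one possible sketch: the function name `assign_classes` and the `(is_decoy, source)` tuple representation are assumptions made for illustration, and the rule for a peptide that matches both a target-REF and a decoy-PREDICTED protein is inferred from the "target over decoy" priority rather than stated explicitly.

```python
def assign_classes(matches):
    """Resolve which class(es) a peptide belongs to.

    `matches` is an iterable of (is_decoy, source) tuples, one per protein
    the peptide maps to, where source is "ref" or "predicted".
    """
    matches = set(matches)
    targets = {m for m in matches if not m[0]}
    if targets:
        # Targets win over decoys; among targets, reference wins over predicted.
        if (False, "ref") in targets:
            return {(False, "ref")}
        return {(False, "predicted")}
    # Only decoys matched: there is no priority, so keep every decoy class.
    return matches
```

For example, `assign_classes([(True, "ref"), (False, "predicted")])` corresponds to PEPTIDE2 and keeps only the target-PREDICTED assignment.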

Good luck to you!

3.2 years ago by progistar ▴ 40