I created a FASTA file containing around 1,000 predicted peptide sequences, which are my targets. I concatenated it with the UniProt reference FASTA to create a target-decoy database. The search engine then searches the proteomics data against this database for hits.
I have some questions.
I would like to set a 5% FDR threshold for peptide identification filtering. Someone suggested that I should apply this step exclusively to my predicted peptides, but I found few papers that describe in detail how to do this. Some of them performed FDR filtering on the whole database (UniProt reference + predicted peptides). Which approach is preferred?
If I would like to do the FDR filtering only for my target sequences, how could I do this?
If you want to apply FDR control to your predicted sequences only, remove the ref-mapped PSMs from your results (regardless of whether they are target or decoy). You are then left with only the target and decoy PSMs that map to the predicted sequences. Apply FDR control to those PSMs. This practice is generally called separated FDR.
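As a rough illustration of the steps above, here is a minimal Python sketch. The `PSM` record, its field names, and the function `separated_fdr_filter` are all hypothetical and not taken from any search engine's output format. It also assumes that a higher score is better and that FDR is estimated as decoys divided by targets, so adapt both to your own tool.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float        # assumption: higher score means a better match
    is_decoy: bool
    is_reference: bool  # True if the PSM is assigned to a reference protein


def separated_fdr_filter(psms, fdr_threshold=0.05):
    """Return predicted-sequence target PSMs passing the FDR threshold."""
    # Step 1: drop ref-mapped PSMs, whether target or decoy.
    subset = [p for p in psms if not p.is_reference]
    # Step 2: rank the remaining PSMs from best to worst score.
    subset.sort(key=lambda p: p.score, reverse=True)
    # Step 3: running FDR estimate (decoys / targets) at each rank.
    targets = decoys = 0
    fdrs = []
    for p in subset:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Step 4: convert to q-values (lowest FDR at this rank or any worse rank).
    qvalues = fdrs[:]
    for i in range(len(qvalues) - 2, -1, -1):
        qvalues[i] = min(qvalues[i], qvalues[i + 1])
    # Step 5: keep target PSMs whose q-value is within the threshold.
    return [p for p, q in zip(subset, qvalues)
            if not p.is_decoy and q <= fdr_threshold]
```

For global FDR, the same function could be used with step 1 skipped, so that every PSM enters the ranking.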
However, be careful if only a small number of PSMs remain for FDR control. With too few PSMs, the decoy hits may not represent the null distribution well.
In practice, there is often no significant difference between separated FDR and global FDR (using the whole database), so in many cases it is a matter of choice.
If you are a beginner in proteomics, I recommend plotting the score distributions for 1) the whole database and 2) the predicted and ref PSMs separately. Then decide which one shows the better null distribution (before FDR control, of course) and the better identification rate (after FDR control, of course).
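One simple way to look at the distributions without a plotting library is to bin the scores and count targets and decoys per bin. The sketch below is only an illustration: `histogram_rows` and `print_histogram` are hypothetical helpers, and the input is assumed to be a list of `(score, is_decoy)` pairs that you extract yourself from the search results.

```python
from collections import Counter


def histogram_rows(records, bin_width=1.0):
    """Return (bin_start, n_targets, n_decoys) for each non-empty bin."""
    targets, decoys = Counter(), Counter()
    for score, is_decoy in records:
        bin_index = int(score // bin_width)
        if is_decoy:
            decoys[bin_index] += 1
        else:
            targets[bin_index] += 1
    bins = sorted(set(targets) | set(decoys))
    return [(b * bin_width, targets[b], decoys[b]) for b in bins]


def print_histogram(records, bin_width=1.0, label=""):
    """Print a text histogram: 'T' marks targets, 'D' marks decoys."""
    print(label)
    for start, n_targets, n_decoys in histogram_rows(records, bin_width):
        print(f"{start:8.2f} | {'T' * n_targets}{'D' * n_decoys}")
```

Calling `print_histogram` once on all PSMs and once each on the predicted and ref subsets gives the comparisons described above. With real data you would probably pass the same binned counts to a plotting library instead.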
P.S. Filtering out the ref-mapped PSMs is the step I am concerned about, so here is an example for your convenience:
PEPTIDE1 maps to both target-REF-protein1 and target-PREDICTED-protein1 => PEPTIDE1 is assigned to target-REF-protein1. => This is because a reference protein is prioritized over a predicted protein.
PEPTIDE2 maps to both decoy-REF-protein1 and target-PREDICTED-protein1 => PEPTIDE2 is assigned to target-PREDICTED-protein1. => This is because a target sequence is prioritized over a decoy sequence.
PEPTIDE3 maps to both decoy-REF-protein1 and decoy-PREDICTED-protein1 => PEPTIDE3 is assigned to both of them. => This is because there is no priority between decoys.
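The three cases above can be written as a small priority rule. The following Python sketch is only one possible encoding. The tuple layout `(protein_id, is_decoy, is_reference)` and the function names are hypothetical, and the sketch does not decide how a decoy shared between ref and predicted should be counted in the later filtering step.

```python
def priority(match):
    """Lower value wins: target-ref (0), target-predicted (1), any decoy (2)."""
    _, is_decoy, is_reference = match
    if is_decoy:
        return 2  # no priority between decoys
    return 0 if is_reference else 1


def assign_peptide(matches):
    """Keep only the protein matches with the best priority for one peptide."""
    best = min(priority(m) for m in matches)
    return [m for m in matches if priority(m) == best]


# PEPTIDE2 from the example: the target wins over the decoy.
peptide2 = [("decoy-REF-protein1", True, True),
            ("target-PREDICTED-protein1", False, False)]
print(assign_peptide(peptide2))
```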
Good luck to you!