Hi! I've recently analysed some RNA-seq data from highly degraded human FFPE samples whose libraries were prepared from different RNA inputs (range 1-40 ng) using the NuGEN Ovation RNA-Seq System v2 kit. Running Picard's MarkDuplicates on the BAMs, 84-93% of mapped reads were marked as duplicates, with higher-input samples having a slightly lower duplication rate.
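For reference, the percentage quoted above corresponds to the PERCENT_DUPLICATION column that MarkDuplicates writes to its metrics file. Despite the name, Picard reports it as a fraction (0-1). Below is a minimal sketch that recomputes it from the raw counts. It assumes the standard Picard metrics layout: a `## METRICS CLASS` line, then a tab-separated header, then one row per library, optionally followed by a histogram section. The toy numbers are invented for illustration.

```python
def read_metrics_rows(metrics_text):
    """Parse the DuplicationMetrics table from a Picard metrics file.

    Assumes the usual layout: '## METRICS CLASS' line, a header row, then one
    tab-separated row per library, ending at a blank line or the histogram.
    """
    lines = metrics_text.splitlines()
    start = next((i for i, ln in enumerate(lines)
                  if ln.startswith("## METRICS CLASS")), None)
    if start is None:
        raise ValueError("no '## METRICS CLASS' section found")
    header = lines[start + 1].split("\t")
    rows = []
    for ln in lines[start + 2:]:
        if not ln.strip() or ln.startswith("#"):
            break  # end of the metrics table (histogram may follow)
        rows.append(dict(zip(header, ln.split("\t"))))
    return rows


def percent_duplication(rows):
    """Recompute PERCENT_DUPLICATION summed over all libraries.

    Each read pair counts as two reads, so pair counts are weighted by 2.
    """
    unpaired_examined = sum(int(r["UNPAIRED_READS_EXAMINED"]) for r in rows)
    pairs_examined = sum(int(r["READ_PAIRS_EXAMINED"]) for r in rows)
    unpaired_dups = sum(int(r["UNPAIRED_READ_DUPLICATES"]) for r in rows)
    pair_dups = sum(int(r["READ_PAIR_DUPLICATES"]) for r in rows)
    total_reads = unpaired_examined + 2 * pairs_examined
    if total_reads == 0:
        return 0.0
    return (unpaired_dups + 2 * pair_dups) / total_reads


# Toy metrics file with invented counts (only a subset of Picard's columns).
TOY_METRICS = (
    "## htsjdk.samtools.metrics.StringHeader\n"
    "# MarkDuplicates INPUT=sample.bam\n"
    "\n"
    "## METRICS CLASS\tpicard.sam.DuplicationMetrics\n"
    "LIBRARY\tUNPAIRED_READS_EXAMINED\tREAD_PAIRS_EXAMINED\t"
    "UNPAIRED_READ_DUPLICATES\tREAD_PAIR_DUPLICATES\tPERCENT_DUPLICATION\n"
    "lib1\t200\t900\t150\t790\t0.865\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Double\n"
    "BIN\tCoverageMult\n"
    "1.0\t1.01\n"
)

if __name__ == "__main__":
    rows = read_metrics_rows(TOY_METRICS)
    print(f"{percent_duplication(rows):.1%} of reads marked as duplicates")
```

With a real file you would pass the file's text, e.g. `read_metrics_rows(open(path).read())`, where `path` is whatever you gave MarkDuplicates as `METRICS_FILE`.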
I know that 'high' duplication rates are expected in RNA-seq, but my question is: what duplication rates are expected in a typical experiment with FFPE (or, more generally, degraded) samples? Also, can such a high duplication rate affect downstream analyses such as differential gene expression? From what I have read so far, I understand that removing duplicates is not recommended for RNA-seq experiments anyway.
Thanks for your help!
Does it matter? The data are what they are. Of course quality is lower in FFPE than in freshly lysed tissue. Make sure you have enough biological replicates and interpret the results with care. Stringent cutoffs, e.g. only trusting DEGs with a fold change beyond 2, might be a good idea here. Beyond that, there is no way to make the data "better" anyway, so analyse as usual and try to validate important findings.