Hi everybody,
What is the best practice for dealing with the unpaired reads produced by trimming paired-end RNA-Seq data, i.e. when only one mate of a pair survives trimming?
I have seen people recommend using only the remaining paired data (and ignoring the often small unpaired files), but I am afraid of losing crucial data. I could easily process the paired set and the two unpaired sets of each sample separately.
My analysis pipeline is:
fastqc - trimmomatic - fastqc - STAR - featureCounts - voom/limma
If I try to use all the data, at what point would you recommend merging everything (and how)?
Many thanks!
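One way to use all the data is to run STAR and featureCounts separately on the paired set and on each unpaired set, then add the per-gene counts before voom/limma. Below is a minimal Python sketch of that last step. It assumes the standard featureCounts table layout (a `#` comment line, a `Geneid` header, tab-separated rows with the count in the last column) and assumes the paired run was counted per fragment, so that a surviving singleton and a surviving pair each contribute one fragment. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def read_featurecounts(lines: Iterable[str]) -> Counter:
    """Parse a single-sample featureCounts table into {gene_id: count}.

    Assumes the usual layout: '#' comment line(s), a header row starting
    with 'Geneid', then tab-separated rows whose last column is the count.
    """
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or line.startswith("Geneid"):
            continue  # skip the program comment and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        counts[fields[0]] += int(fields[-1])  # keeps genes with 0 counts too
    return counts


def combine_runs(*tables: Counter) -> Counter:
    """Sum per-gene counts from the paired run and the unpaired runs."""
    total: Counter = Counter()
    for table in tables:
        total.update(table)  # Counter.update adds counts rather than replacing
    return total
```

With three per-sample tables (hypothetically `s1_paired.txt`, `s1_fwd_unpaired.txt`, `s1_rev_unpaired.txt`), you would parse each one and pass the results to `combine_runs`. Summing at the count level avoids merging BAM files with different read layouts, at the cost of treating a singleton as exactly equivalent to a full pair.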
Hi guys,
Thanks for the quick replies. The unpaired reverse reads are next to nothing (around 0.2%), while the unpaired forward reads are usually more like 2-5%. Does this sound normal to you?
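To check these numbers consistently across samples, the sketch below counts FASTQ records and expresses each unpaired file as a percentage of the input read pairs. It assumes plain 4-line FASTQ records (optionally gzipped) and Trimmomatic-style PE output, where the forward-unpaired file holds reads whose reverse mate was dropped and vice versa. The function names are made up for illustration.

```python
import gzip
from typing import Iterable, Tuple


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count FASTQ records, assuming the standard 4-lines-per-record layout."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4:
        raise ValueError("line count is not a multiple of 4; not a 4-line FASTQ?")
    return n_lines // 4


def count_fastq_file(path: str) -> int:
    """Open a (possibly gzipped) FASTQ file and count its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_records(handle)


def unpaired_percentages(
    n_input_pairs: int, n_fwd_unpaired: int, n_rev_unpaired: int
) -> Tuple[float, float]:
    """Percent of input pairs that survived only as a forward / reverse read."""
    fwd = 100.0 * n_fwd_unpaired / n_input_pairs
    rev = 100.0 * n_rev_unpaired / n_input_pairs
    return fwd, rev
```

For example, `unpaired_percentages(count_fastq_file("raw_R1.fastq.gz"), count_fastq_file("fwd_unpaired.fastq.gz"), count_fastq_file("rev_unpaired.fastq.gz"))`, with hypothetical file names, gives the two percentages for one sample.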
There is no "normal". Ideally you should not have any. But this is biology and you live with what you have :-)
If you use BBDuk to trim paired reads, you will not end up with any singletons, which can make downstream processing easier: reads are either retained as pairs or discarded as pairs. If a minimum length restriction is used and one read is trimmed below it, the whole pair is discarded. If no minimum length is set, a read is trimmed down to no less than 1bp, so it is still present and the fastq file remains valid and correctly paired; that read will typically be ignored downstream and only its mate will be used (since 1bp reads don't map).
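The pair-level behaviour described above can be illustrated with a toy Python sketch. This is not BBDuk's implementation: it swaps in a simple, hypothetical 3' quality trim (`trim_trailing`, Phred+33 qualities assumed) only to show the decision logic of keeping or dropping both mates together, and the 1bp floor when no minimum length is given.

```python
from typing import Optional, Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def trim_trailing(seq: str, qual: str, min_q: int, offset: int = 33) -> Read:
    """Hypothetical 3' quality trim: drop trailing bases with Phred < min_q.

    Always keeps at least 1 base, mirroring the 1bp floor described above.
    """
    end = len(seq)
    while end > 1 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_pair(
    r1: Read, r2: Read, min_q: int, min_len: Optional[int] = None
) -> Optional[Tuple[Read, Read]]:
    """Trim both mates; return both or None, never a singleton."""
    t1 = trim_trailing(*r1, min_q)
    t2 = trim_trailing(*r2, min_q)
    if min_len is not None and (len(t1[0]) < min_len or len(t2[0]) < min_len):
        return None  # one mate too short: the whole pair is discarded
    return t1, t2  # without min_len, a 1bp mate stays so the files remain paired
```

With a good forward read and an all-low-quality reverse read, `trim_pair(..., min_len=5)` drops the pair, while `trim_pair(...)` without `min_len` keeps the full forward read and a 1bp reverse read.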