Question

When is it recommended to remove or keep unpaired ("orphan") reads in downstream analysis?

8.6 years ago

fernardo ▴ 180

Hi all,

In NGS, after alignment, SAM and BAM files contain both paired and unpaired reads. I wonder when it is better to keep the unpaired (orphaned) reads for downstream analysis and when they should be removed.

For example: RNA-seq, exome-seq, variant calling, structural variation, methylation, ChIP-seq... Any thoughts that come to mind are highly appreciated.

It would be great if your answer is supported by a paper or by experience.

Thanks in advance

RNA-Seq SNP alignment exome-seq samtools

Updated 3.5 years ago by Devon Ryan; written 8.6 years ago by fernardo

score 0 · Answer 1 · 2016-04-03

8.6 years ago

Devon Ryan 104k

If you already went through the hassle of including the orphaned reads in the alignment, then just keep everything. Having them in there isn't going to hurt anything.

8.6 years ago by Devon Ryan

Thanks. What if I can show that it does hurt? I'd be happy to be corrected if I am wrong. For example, orphan reads change the DP value for SNPs and INDELs in VCF files; please have a look at my other post, "A: Why GATK and bcftools SNP calling different?". Then, when we filter the SNPs on DP, a number of SNPs are eliminated. I hope that is clear.
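A toy Python sketch of the effect described above: it counts the reads covering one position with and without orphan reads in the sense used in this post (mapped paired-end reads whose mate did not align, i.e. SAM FLAG bits 0x1 and 0x8 set), then applies a DP cut-off. The `Alignment` record, the read coordinates and the threshold of 4 are hypothetical. Real callers such as bcftools and GATK apply their own read filters before computing DP, so their numbers will not match this simple count.

```python
from typing import NamedTuple

# SAM FLAG bits, as defined in the SAM specification
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record with a 1-based, inclusive span."""
    flag: int
    start: int
    end: int


def is_singleton(aln: Alignment) -> bool:
    """True for a mapped read from a pair whose mate did not map."""
    return (
        bool(aln.flag & PAIRED)
        and bool(aln.flag & MATE_UNMAPPED)
        and not (aln.flag & UNMAPPED)
    )


def depth_at(alignments, pos: int, include_singletons: bool = True) -> int:
    """Count mapped alignments covering pos (a toy stand-in for a caller's DP)."""
    dp = 0
    for aln in alignments:
        if aln.flag & UNMAPPED:
            continue  # unmapped reads never add coverage
        if not include_singletons and is_singleton(aln):
            continue
        if aln.start <= pos <= aln.end:
            dp += 1
    return dp


if __name__ == "__main__":
    reads = [
        Alignment(flag=0x1 | 0x2 | 0x20 | 0x40, start=100, end=199),  # proper pair, read 1
        Alignment(flag=0x1 | 0x2 | 0x10 | 0x80, start=150, end=249),  # proper pair, read 2
        Alignment(flag=0x1 | 0x8 | 0x40, start=120, end=219),         # singleton
        Alignment(flag=0x1 | 0x8 | 0x80, start=130, end=229),         # singleton
    ]
    min_dp = 4  # hypothetical DP filter threshold
    for keep in (True, False):
        dp = depth_at(reads, 160, include_singletons=keep)
        print(f"include_singletons={keep}: DP={dp}, passes DP>={min_dp}: {dp >= min_dp}")
```

With these four example reads, dropping the two singletons halves the depth at position 160 (4 to 2), so the same site fails the hypothetical DP filter.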

From the first part of your answer, it sounds like there is a way to get rid of orphan reads during or before the alignment. Is there?

8.6 years ago by fernardo

If you have evidence that, in a particular use case, including them produces lower-quality results, then certainly leave them out. That two different tools happen to treat them differently is neither a surprise nor a problem. It just means that the DP threshold should be different if you use samtools versus GATK.

BTW, we were using different definitions of orphan reads. I assumed you meant reads whose mates were removed during trimming. You meant what are commonly called "singletons" (i.e., paired-end reads whose mates don't align). The former can either be excluded from or included in the alignment. The latter would need to be filtered out afterward if they cause a problem (this is unusual, though).
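Filtering singletons out after alignment only needs the SAM FLAG column. The Python sketch below copies SAM text, passes header lines through unchanged, and drops records that have both 0x1 (paired) and 0x8 (mate unmapped) set. That is roughly what `samtools view -F 0x8` does on the command line. The function name and the sample records are illustrative assumptions. The unmapped mate records (flag 0x4) are left in place, since they add no coverage.

```python
import io

# SAM FLAG bits, as defined in the SAM specification
PAIRED = 0x1
MATE_UNMAPPED = 0x8


def drop_singletons(sam_in, sam_out) -> int:
    """Copy SAM text from sam_in to sam_out, skipping singleton records.

    Header lines (starting with '@') are copied unchanged.
    Returns the number of records dropped.
    """
    dropped = 0
    for line in sam_in:
        if line.startswith("@"):
            sam_out.write(line)
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd mandatory column
        if flag & PAIRED and flag & MATE_UNMAPPED:
            dropped += 1
            continue
        sam_out.write(line)
    return dropped


if __name__ == "__main__":
    sample = (
        "@HD\tVN:1.6\tSO:unsorted\n"
        "r1\t99\tchr1\t100\t60\t50M\t=\t150\t100\t*\t*\n"   # proper pair
        "r1\t147\tchr1\t150\t60\t50M\t=\t100\t-100\t*\t*\n"
        "r2\t73\tchr1\t300\t60\t50M\t=\t300\t0\t*\t*\n"     # singleton (mate unmapped)
        "r2\t133\tchr1\t300\t0\t*\t=\t300\t0\t*\t*\n"       # its unmapped mate
        "r3\t0\tchr1\t400\t60\t50M\t*\t0\t0\t*\t*\n"        # read aligned as single-end
    )
    out = io.StringIO()
    n = drop_singletons(io.StringIO(sample), out)
    print(f"dropped {n} singleton record(s)")
    print(out.getvalue(), end="")
```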

8.6 years ago by Devon Ryan

OK, thanks. So reads that lose their mates through trimming are better kept, right? And reads whose mates do not align are better removed?

By the way, how can one separate singletons from orphaned reads? Perhaps orphaned reads can be separated during trimming (Trimmomatic); then, after alignment, any unpaired reads found would be singletons. Is that correct?
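Suppose the trimming orphans (e.g. Trimmomatic's unpaired output files) are aligned as single-end reads and only merged with the paired alignments afterwards. In that case the two groups can be told apart from the SAM FLAG alone: orphans lack the 0x1 (paired) bit, while singletons have 0x1 and 0x8 (mate unmapped) set. Below is a minimal Python sketch under that assumption. The category labels are this sketch's own, and the BAM itself does not record whether a single-end read came from trimming.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM specification
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def classify(flag: int) -> str:
    """Label one SAM record by its FLAG (labels are this sketch's own)."""
    if flag & UNMAPPED:
        return "unmapped"
    if not (flag & PAIRED):
        # aligned as single-end, e.g. a trimming orphan mapped on its own
        return "orphan"
    if flag & MATE_UNMAPPED:
        # paired-end read that mapped while its mate did not
        return "singleton"
    return "paired"


def tally(flags) -> Counter:
    """Count records per category for an iterable of FLAG values."""
    return Counter(classify(f) for f in flags)


if __name__ == "__main__":
    # 99/147: proper pair; 73: singleton; 133: its unmapped mate; 0/16: single-end
    print(tally([99, 147, 73, 133, 0, 16]))
```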

8.6 years ago by fernardo

Devon Ryan Hi, sorry to reply to such an old post, but I have a similar question. The kneaddata output has unmatched_1.fq and unmatched_2.fq, which contain reads whose mates were lost but which themselves passed both the Trimmomatic and bowtie2 steps. In this case, at what step would reads without mates become an issue in downstream processing? Thanks in advance.