Does single-end short-read sequencing affect the mapping of mitochondrial and ribosomal RNA?
4 months ago
txema.heredia ▴ 190

Hi,

I am reanalyzing some old RNA-seq data using single-end 50bp reads. I aligned it with STAR, removed duplicates with Picard MarkDuplicates, and used htseq-count to determine the reads per gene.

I noticed that I am getting lower mitochondrial (0.15%-0.25%) and ribosomal (0.6%-0.9%) fractions than in another analysis of an almost identical cell line that used paired-end 150 bp reads (0.5%-1.5% mitochondrial; 3%-8% ribosomal). Both analyses used the same reference genome and annotation.
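For reference, here is a minimal sketch of how such fractions could be computed from an htseq-count output table. It assumes the usual two-column, tab-separated layout, in which summary rows such as `__no_feature` start with a double underscore. The gene identifiers (`GENE_A`, `MT_GENE_1`, `RRNA_GENE_1`) and the helper names are hypothetical; the real sets of mitochondrial and ribosomal genes depend on the annotation. The denominator here is the number of reads assigned to genes, which is also an assumption, since the post does not say which denominator was used.

```python
import io


def read_htseq_counts(handle):
    """Parse htseq-count output into {gene_id: count}, skipping summary rows."""
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        gene_id, value = line.split("\t")
        # Summary rows (e.g. __no_feature, __ambiguous) start with "__".
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(value)
    return counts


def fraction(counts, gene_ids):
    """Fraction of gene-assigned reads that fall in the given gene set."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts.get(g, 0) for g in gene_ids) / total


# Hypothetical toy table standing in for a real htseq-count file.
example = io.StringIO(
    "GENE_A\t900\n"
    "MT_GENE_1\t60\n"
    "RRNA_GENE_1\t40\n"
    "__no_feature\t500\n"
    "__ambiguous\t20\n"
)

counts = read_htseq_counts(example)
mt_fraction = fraction(counts, {"MT_GENE_1"})
rrna_fraction = fraction(counts, {"RRNA_GENE_1"})
print(f"mt: {mt_fraction:.1%}, rRNA: {rrna_fraction:.1%}")
```

Running the same calculation on count tables produced with and without duplicate removal makes it easy to see how much of the difference comes from that step alone.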

I was wondering if these differences I am seeing are a known effect of the sequencing technology used, or if they reflect some biological differences between cell lines.

Thanks,

pair-end single-end RNA-seq • 505 views

I would guess it reflects more of a difference in library prep rather than something biological or due to sequencing method.

Longer reads can improve mapping in repetitive regions, and paired-end reads can reduce the number of reads that get flagged and removed as duplicates. Both factors could increase the percentage of mt and rRNA reads.

In general, I believe common practice is to not remove duplicates from RNA-seq data unless there is a specific reason to.


"In general, I believe common practice is to not remove duplicates for RNA seq data unless there's a reason for it."

I think you are very much right. I autopiloted into removing duplicates because I am used to paired-end sequencing with UMIs, which let you identify and remove true PCR duplicates.

Without removing duplicates from this data, I get 2.5%-7% mitochondrial and 2%-6.5% ribosomal fractions.

4 months ago
kalavattam ▴ 280

In comparison to paired-end sequencing, single-end sequencing often results in more multi-mapping alignments and duplicate/redundant alignments, especially when dealing with repetitive DNA sequences like ribosomal DNA. Paired-end sequencing (and especially paired-end sequencing with longer reads) can help disambiguate the origin of multi-mapping alignments and provide additional information to differentiate alignments that would otherwise be considered duplicates in the alignment of single-end sequenced data.

For more details, you can refer to this publication. (There are others, but this one immediately comes to mind.)

So, when you mark and remove duplicates, that is in part why the percentages are smaller for the single-end data than for the paired-end data: more reads from highly expressed or repetitive loci are flagged as duplicates and discarded. And when you leave the duplicates in, that is in part why the percentages are larger for the single-end data than for the paired-end data. I say "in part" because other differences, such as differences in mitochondrial fractions, can arise from variations in library preparation methods and other factors.
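To illustrate the point about duplicates, here is a toy sketch of position-based duplicate detection. It is a simplified model, not Picard's actual implementation (it ignores clipping, mate orientation, and how the retained read is chosen), and the fragment coordinates are made up. The idea is that a single-end read only reveals one end of the fragment, so distinct fragments sharing a start position collapse into one, whereas a read pair reveals both ends.

```python
# Hypothetical fragments: (chrom, start, end, strand), 0-based half-open.
fragments = [
    ("chrM", 100, 300, "+"),
    ("chrM", 100, 350, "+"),  # same start, different end: distinct fragment
    ("chrM", 100, 420, "+"),  # same start, different end: distinct fragment
    ("chrM", 100, 300, "+"),  # identical to the first: a true duplicate
    ("chrM", 180, 400, "+"),
]


def single_end_key(frag):
    """Only the 5' end of the sequenced read and its strand are observable."""
    chrom, start, end, strand = frag
    five_prime = start if strand == "+" else end - 1
    return (chrom, five_prime, strand)


def paired_end_key(frag):
    """Both fragment ends are observable from the two mates."""
    chrom, start, end, strand = frag
    return (chrom, start, end - 1, strand)


def count_after_dedup(frags, key):
    """Number of fragments kept when one is retained per distinct key."""
    return len({key(f) for f in frags})


se_kept = count_after_dedup(fragments, single_end_key)
pe_kept = count_after_dedup(fragments, paired_end_key)
print(f"kept as single-end: {se_kept} of {len(fragments)}")
print(f"kept as paired-end: {pe_kept} of {len(fragments)}")
```

In this toy case the single-end key keeps 2 of 5 fragments, while the paired-end key keeps 4 and discards only the true duplicate. Short, highly expressed transcripts such as mitochondrial and ribosomal RNAs have few possible start positions relative to their read depth, so they are hit hardest by the single-end collapse.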
