Question

96% of reads aligned to the genome, but only 40% to the transcriptome?


2.7 years ago

demoraesdiogo2017 ▴ 110

Hello

My question is about single-cell RNA-seq, but I believe people with experience in bulk RNA-seq might also be able to answer it.

I aligned a few single-cell datasets with cellranger, but when I checked the results, it seems that although most reads aligned to the genome (only about half with high confidence), only about 40% of the reads aligned to the transcriptome. Here is an example of one of the outputs:

Reads Mapped to Genome                           96.6%
Reads Mapped Confidently to Genome               54.6%
Reads Mapped Confidently to Intergenic Regions    8.5%
Reads Mapped Confidently to Intronic Regions      0.2%
Reads Mapped Confidently to Exonic Regions       46.0%
Reads Mapped Confidently to Transcriptome        45.3%
Reads Mapped Antisense to Gene                    0.4%

I am not sure what to think about this. Could this be a sign of low read integrity? My hypothesis is that if the sample was degraded, the reads might fail to align to a transcript but should still align to the genome without any problem. Another hypothesis is that the sample was contaminated with genomic DNA. I am, however, not even sure whether these results are normal.

rnaseq single-cell • 2.1k views

updated 2.6 years ago by ATpoint 87k • written 2.7 years ago by demoraesdiogo2017 ▴ 110


Is that so bad? I have had data of (what I think was) good quality with similar mapping rates, though I used Alevin and not CellRanger. Others may comment as well, but I would not worry too much here; just continue and check whether the data are OK and can be analysed. Check the usual QC metrics and whether you get cluster separation matching biological expectations. See whether you get the usual number of genes detected per cell (that obviously depends on the cell type, but very generally around 1000 per cell or more).
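
The genes-per-cell check can be sketched in a few lines. This is only a minimal illustration on a made-up toy matrix: the `counts` values, the `genes_detected_per_cell` helper, and the `MIN_GENES` cutoff are all hypothetical, and a real analysis would read the filtered count matrix produced by the aligner and use a cutoff suited to the cell type.

```python
# Hypothetical toy count matrix: rows are genes, columns are cells.
counts = [
    [3, 0, 1],
    [0, 0, 2],
    [5, 1, 0],
    [0, 0, 0],
]

# Illustrative cutoff for this toy example only; for real data the thread
# mentions roughly 1000 genes per cell as a very general expectation.
MIN_GENES = 2


def genes_detected_per_cell(matrix):
    """For each cell (column), count the genes with at least one count."""
    n_cells = len(matrix[0])
    return [sum(1 for row in matrix if row[cell] > 0) for cell in range(n_cells)]


detected = genes_detected_per_cell(counts)
low_complexity = [i for i, n in enumerate(detected) if n < MIN_GENES]

print("genes detected per cell:", detected)
print("cells below cutoff:", low_complexity)
```

Cells flagged this way are candidates for closer inspection rather than automatic removal, since the expected number of detected genes varies by cell type.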

2.6 years ago by ATpoint 87k

score 0 · Answer 1 · 2022-07-20

There is a significant gap between Reads Mapped to Genome (96.6%) and Reads Mapped Confidently to Genome (54.6%). Of the reads that map confidently, almost 83% map confidently to the transcriptome (45.3% out of 54.6%). The issue is that about 42% of all reads are non-confident mappings, which suggests that a significant fraction of these reads may have been trimmed, leaving short reads that do not map uniquely.
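
The arithmetic behind these figures can be checked directly from the summary quoted in the question. The snippet below is just a sketch: the dictionary keys are made-up labels, and the values are the percentages copied from that output.

```python
# Percentages of total reads, copied from the Cell Ranger summary in the question.
# The key names are illustrative labels, not Cell Ranger field names.
metrics = {
    "mapped_to_genome": 96.6,
    "confident_genome": 54.6,
    "confident_transcriptome": 45.3,
}

# Reads that map to the genome, but not confidently (% of all reads).
non_confident = metrics["mapped_to_genome"] - metrics["confident_genome"]

# Share of the confidently mapped reads that are assigned to the transcriptome.
transcriptome_share = (
    100 * metrics["confident_transcriptome"] / metrics["confident_genome"]
)

print(f"non-confident mappings: {non_confident:.1f}% of all reads")
print(f"transcriptome share of confident reads: {transcriptome_share:.1f}%")
```

So the low transcriptome rate follows mostly from the low confident-mapping rate, not from confidently mapped reads falling outside annotated transcripts.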