Question

low mapping rate when using SRA RNA-seq data

9.6 years ago

Pei ▴ 240

Hi:

I am interested in using human tissue data from the SRA dataset SRP007412.

However, after running fastq-dump and TopHat,

I found that the mapping rate is rather poor: most samples were < 70%, and some were as low as 45%.

What would you typically do when you encounter public data with such a low mapping rate?

Thanks in advance!

Best wishes

RNA-Seq • 3.0k views

Updated 3.1 years ago by Ram 45k • written 9.6 years ago by Pei ▴ 240

Ram · Answer 1 · 2016-01-25

9.6 years ago

Sukhi Singh 11k

How did you measure the mappability? For RNA-Seq it works a bit differently, because the reads do not come from genomic DNA; they are sequenced exons and splice-junction boundaries. That's why Tophat/Cufflinks remap the initially unmapped reads to special exon-exon junction libraries (the transcriptome). The mappability is therefore the reads mapped the first time plus the ones that get mapped on the second attempt to those special libraries.

Updated 5.6 years ago by Ram 45k • written 9.6 years ago by Sukhi Singh 11k

Hi Sukhdeep:

I used the mapping rate reported in the align_summary.txt file, which is produced by TopHat.

I think this is the same as what you suggested.
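For anyone comparing many samples, here is a minimal sketch of pulling that number out of each align_summary.txt. It assumes the file contains a line of the form "NN.N% overall read mapping rate." (check this against your own TopHat output); the function names and the 70% cutoff are only illustrative.

```python
import re

# Assumption: the summary contains a line such as
# "45.0% overall read mapping rate." Adjust the pattern if yours differs.
OVERALL_RE = re.compile(r"^\s*([0-9.]+)% overall read mapping rate", re.MULTILINE)


def overall_mapping_rate(summary_text):
    """Return the overall read mapping rate in percent, or None if absent."""
    match = OVERALL_RE.search(summary_text)
    return float(match.group(1)) if match else None


def low_mapping_samples(summaries, threshold=70.0):
    """Given {sample name: summary text}, list (sample, rate) below threshold.

    Samples whose summary has no recognisable rate line are skipped.
    The result is sorted from lowest to highest rate.
    """
    flagged = []
    for sample, text in summaries.items():
        rate = overall_mapping_rate(text)
        if rate is not None and rate < threshold:
            flagged.append((sample, rate))
    return sorted(flagged, key=lambda item: item[1])
```

To use it, read each sample's align_summary.txt into a string, build the dictionary keyed by sample name, and pass it to low_mapping_samples.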

Thanks.

9.6 years ago by Pei ▴ 240

Hey, that's right then. I don't think we can generalise that public data has lower mappability. You could try pulling some other recent datasets just to test that. It could also be that the library is over-sequenced and thus producing a lot of duplicates, or that some samples are contaminated. Run the downstream processing and see if you are happy with the results; if the saturation limit has been reached, you might not care, or you might not be able to do anything about it.
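As a rough way to check the duplicate idea, the sketch below estimates the fraction of reads whose sequence exactly repeats an earlier read in a FASTQ file. It assumes plain 4-line FASTQ records, and the function names and the read cap are illustrative. Keep in mind that highly expressed genes produce identical reads naturally in RNA-Seq, so treat the number as a hint rather than proof of over-sequencing.

```python
from collections import Counter
from itertools import islice


def read_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:  # second line of every record is the sequence
            yield line.strip()


def duplicate_fraction(sequences, max_reads=1_000_000):
    """Fraction of reads that repeat an already-seen sequence exactly.

    Only the first max_reads sequences are examined to bound memory use.
    """
    counts = Counter(islice(sequences, max_reads))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total
```

For example, open the file produced by fastq-dump (use gzip.open in text mode if it is compressed) and call duplicate_fraction(read_sequences(handle)).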

Also, this might help:

why low mapping rates for RNAseq?

9.6 years ago by Sukhi Singh 11k