Very low mapping rate of mRNA-seq data
Seq225, 6.3 years ago

Hi,

I have ~100 paired-end mRNA-seq datasets (150-nt reads) that were sequenced on an Illumina MiSeq. I clipped the adapters with cutadapt, but I am getting a low mapping percentage. I tried BWA, Bowtie2, and STAR: STAR gives 3-30% mapping, Bowtie2 ~40%, and BWA 40-45%. I am mapping against an assembled transcriptome of my organism.

At first I thought I had messed something up in the adapter trimming, but with paired-end reads that should not be a big issue, I guess. What else could I be doing wrong?
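Since the three aligners report their statistics differently, it can help to count the mapping rate the same way for all of them. Below is a minimal sketch, assuming plain-text SAM input (for example from `samtools view -h`). It uses the SAM FLAG field: bit 0x4 marks an unmapped record, and secondary (0x100) and supplementary (0x800) records are skipped so each mate is counted once. The function name `mapping_rate` is hypothetical.

```python
from typing import Iterable, Tuple

# SAM FLAG bits (from the SAM specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_rate(sam_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (mapped, total, percent) over primary SAM records only.

    For paired-end data each mate is its own record, so mates are
    counted separately.
    """
    mapped = total = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t", 2)[1])  # column 2 is FLAG
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignments of a read already counted
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    percent = 100.0 * mapped / total if total else 0.0
    return mapped, total, percent
```

For example, `mapping_rate(open("sample.sam"))` (hypothetical file name) returns a `(mapped, total, percent)` tuple that can be compared across BWA, Bowtie2, and STAR output.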

Thanks.


Have you considered that the data might be of poor quality? The library could be contaminated with genomic DNA, or with material from other species. Is there a reference genome for the species? If so, align against it rather than the transcriptome to rule out gDNA contamination. Also, BLAST a good number of the unmapped reads to see where they might come from.
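To BLAST "a good number" of unmapped reads, you first need a random subset of them. A minimal sketch, assuming SAM input as before: it keeps primary records with the unmapped bit (0x4) set and uses reservoir sampling so a large file is read only once. Mates are tagged `/1` or `/2` from FLAG bits 0x40/0x80. The function name, sample size, and seed are hypothetical choices.

```python
import random
from typing import Iterable, List, Tuple

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
FIRST_MATE = 0x40
SECOND_MATE = 0x80


def sample_unmapped(sam_lines: Iterable[str], k: int, seed: int = 1) -> List[str]:
    """Reservoir-sample up to k unmapped primary reads; return FASTA lines."""
    rng = random.Random(seed)
    reservoir: List[Tuple[str, str]] = []  # (name, sequence)
    seen = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY) or not flag & UNMAPPED:
            continue  # keep only primary, unmapped records
        # Tag mates so paired reads sharing a QNAME stay distinct.
        mate = "/1" if flag & FIRST_MATE else "/2" if flag & SECOND_MATE else ""
        record = (fields[0] + mate, fields[9])  # QNAME, SEQ
        seen += 1
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            j = rng.randrange(seen)  # uniform in [0, seen)
            if j < k:
                reservoir[j] = record
    fasta: List[str] = []
    for name, seq in reservoir:
        fasta.append(f">{name}")
        fasta.append(seq)
    return fasta
```

Writing the returned lines to a file gives a FASTA query that can be submitted to BLAST.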


Thank you.

The reads are of good quality (almost all of them). I'm not sure about any sort of contamination. Unfortunately, I do not have a reference genome.


In addition to ATpoint's suggestions, you can check for several kinds of contamination:

  1. Bacteria or viruses, using the Kraken tool.

  2. rRNA contamination. You can run BLAST remotely to search the unmapped reads against your species' data in NCBI.

  3. If you isolated your species from an environmental sample, you can map its mRNA reads to other species that share the same environment.
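For point 1, a rough way to summarize Kraken's per-read output is sketched below. It assumes the default tab-separated Kraken 2 output, where column 1 is `C` (classified) or `U` (unclassified) and column 3 is the taxonomy ID; with `--use-names` that column holds a name instead. Kraken 2 can also write a summary itself via its `--report` option, so this sketch (with the hypothetical name `summarize_kraken`) only illustrates what is being counted.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_kraken(lines: Iterable[str], top: int = 5) -> Tuple[float, List[Tuple[str, int]]]:
    """Return (percent of reads classified, most common taxids among them).

    Assumes Kraken 2 standard output: C/U, read ID, taxid, length, LCA list.
    """
    total = 0
    taxa: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        if fields[0] == "C":  # "C" = classified, "U" = unclassified
            taxa[fields[2]] += 1
    classified = sum(taxa.values())
    percent = 100.0 * classified / total if total else 0.0
    return percent, taxa.most_common(top)
```

A high classified fraction pointing at bacterial or viral taxids would support the contamination explanation.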

Agreed. By low-quality data I meant the quality of the library, i.e. its gDNA and rRNA content; the sequencing data itself is typically robust on Illumina machines. Please give some details about the species and how the library was made. Does the species have poly-A RNA? If so, was the library made with an mRNA enrichment kit or an rRNA depletion kit, or is it just total-RNA seq?


ATpoint and Mehemt, thank you very much. I talked with the sequencing core, and they never depleted the rRNA. That could be the actual problem.

Do you know any way to find out what percentage of my reads map to rRNA? As I said, I do not have a genome sequence for this organism. Is there an rRNA sequence database?

Thanks again!


Sorry for the late reply. I would recommend the following:

  1. Search the unmapped reads with BLAST against an rRNA database of a very closely related species, which you can obtain from NCBI (check the BLAST manual for how to set this up). For instance, download all rRNA sequences of the closely related species and search against them, or use BLAST with the remote option.

  2. Map the mRNA reads of your species to the genome of a species that is very closely related to yours, at the genus level.
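To turn the BLAST search in point 1 into an rRNA percentage, one option is to count the reads with a significant hit. A minimal sketch, assuming tabular output (`-outfmt 6`, where column 1 is the query ID and column 11 the e-value), e.g. from something like `blastn -query sample.fa -db rrna_db -outfmt 6 -out hits.tsv` after building `rrna_db` with `makeblastdb -dbtype nucl` (file and database names are hypothetical). Reads without hits do not appear in this format at all, so the number of reads searched must be supplied separately. The e-value cutoff of 1e-10 is an arbitrary illustrative choice.

```python
from typing import Iterable, Set


def rrna_hit_reads(blast_lines: Iterable[str], max_evalue: float = 1e-10) -> Set[str]:
    """Query IDs with at least one hit at or below max_evalue (-outfmt 6)."""
    hits: Set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:  # column 11 = evalue
            hits.add(fields[0])  # column 1 = qseqid; dedupes multiple hits
    return hits


def rrna_percent(blast_lines: Iterable[str], n_searched: int, max_evalue: float = 1e-10) -> float:
    """Percent of searched reads that hit the rRNA database."""
    if n_searched <= 0:
        raise ValueError("n_searched must be positive")
    return 100.0 * len(rrna_hit_reads(blast_lines, max_evalue)) / n_searched
```

If the searched reads are a random sample of all reads, this percentage is a rough estimate of the rRNA fraction of the whole library.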
