6.2 years ago
Seq225
Hi,
I have ~100 paired-end mRNA-seq datasets (150 nt reads) that were sequenced on an Illumina MiSeq. I trimmed the adapters with cutadapt, but I am getting a low mapping percentage. I tried BWA, Bowtie2, and STAR: STAR maps 3-30% of reads, Bowtie2 ~40%, and BWA 40-45%. As the reference I am using an assembled transcriptome of my organism.
I thought I was probably messing something up with the adapter trimming, but for paired-end data that should not be a big issue, I guess. What else could I be messing up?
Thanks.
Have you considered that the data might be of poor quality? The library could be contaminated with genomic DNA or with material from other species. Is there a reference genome for the species? If so, align against that rather than the transcriptome to rule out gDNA contamination. Also, BLAST a good number of the unmapped reads to see where they might come from.
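One way to get "a good number of unmapped reads" into a form BLAST accepts is sketched below. It is only an illustration, not something the posters ran: it assumes the aligner output is available as uncompressed SAM text, and the function name `sample_unmapped_as_fasta` and its parameters are hypothetical. It relies on the standard SAM FLAG bits (0x4 = read unmapped, 0x40/0x80 = first/second mate, 0x100/0x800 = secondary/supplementary).

```python
import random


def sample_unmapped_as_fasta(sam_lines, n=100, seed=0):
    """Return up to n unmapped reads from SAM text as a FASTA string.

    sam_lines: iterable of SAM lines (e.g. an open file handle).
    n, seed:   hypothetical parameters controlling the random subsample.
    """
    unmapped = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        # keep reads flagged as unmapped; ignore secondary/supplementary records
        if flag & 0x4 and not flag & 0x900:
            seq = fields[9]
            if seq == "*":  # sequence not stored in this record
                continue
            # tag mates so both reads of a pair get distinct FASTA names
            mate = "/1" if flag & 0x40 else "/2" if flag & 0x80 else ""
            unmapped.append((fields[0] + mate, seq))
    if len(unmapped) > n:
        unmapped = random.Random(seed).sample(unmapped, n)
    return "".join(f">{name}\n{seq}\n" for name, seq in unmapped)
```

The returned string can be written to a file and used as the BLAST query. A few hundred reads is usually enough to see whether the hits pile up on rRNA, on another species, or on nothing recognizable.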
Thank you.
The reads are good quality (almost all of them). Not sure about any sort of contamination. Unfortunately, I do not have a ref genome.
In addition to the suggestions by ATpoint,
you can check for several kinds of contamination:
Agreed. By low-quality data I meant the quality of the library, i.e. its gDNA and rRNA content. The sequencing data themselves are typically robust from Illumina machines. Please give some details about the species and how the library was made. Does the species have poly-A RNA? If so, was an mRNA enrichment or rRNA depletion kit used, or was it just total RNA-seq?
ATpoint and Mehemt, thank you very much. I have talked with the sequencing core. They never depleted the rRNA. That could be the actual problem.
Do you know of any way to find out what percentage of my reads map to rRNA? As I said, I do not have a genome sequence for this organism. Is there an rRNA sequence database?
Thanks again!
Sorry for the late reply. I would recommend the following:
You can search the unmapped reads with BLAST against an rRNA database from a very closely related species, which you can obtain from NCBI (please check the BLAST manual on how to use it for this purpose). For instance, download all rRNA sequences of the closely related species and search against them, or use BLAST with the remote option.
You can map the mRNA reads of your species to the genome of a closely related species, ideally one from the same genus.
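Once the BLAST search against the rRNA sequences has been run, the rRNA percentage asked about above can be estimated from the results. The sketch below assumes the search was run with tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name `rrna_hit_percentage` is hypothetical, and the e-value and identity cut-offs are arbitrary placeholders that would need tuning for a cross-species search.

```python
def rrna_hit_percentage(blast_tab_lines, n_queries,
                        max_evalue=1e-10, min_identity=90.0):
    """Percent of query reads with at least one accepted rRNA hit.

    blast_tab_lines: iterable of lines in BLAST tabular format (-outfmt 6).
    n_queries:       total number of reads that were used as queries.
    max_evalue, min_identity: illustrative thresholds (assumptions).
    """
    if n_queries <= 0:
        raise ValueError("n_queries must be positive")
    reads_with_hit = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):  # blank or comment line
            continue
        cols = line.rstrip("\n").split("\t")
        qseqid = cols[0]
        pident = float(cols[2])   # percent identity of the alignment
        evalue = float(cols[10])  # expectation value of the alignment
        if evalue <= max_evalue and pident >= min_identity:
            reads_with_hit.add(qseqid)  # count each read once, however many hits
    return 100.0 * len(reads_with_hit) / n_queries
```

Reads are counted once even if they hit several rRNA sequences, and reads with no hit never appear in the tabular output, which is why the total number of queries has to be passed in separately.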