Question

Moderate Mapping percentage

0

9 months ago

Researcher ▴ 30

Hi all, I received my sequenced transcriptome and genomic data from my service provider and started working with it. Both the DNA and RNA data passed quality metrics after trimming. But the mapping percentage comes out to 90% for the DNA using bowtie and 85% for the RNA using Hisat2. I tried both the hg38 and hg19 reference genomes, and the same issue persists. I'll attach the QC metrics here. Kindly let me know where I am making an error. It's paired-end data, 150 bp.

[Image: attached QC metrics]

trimmomatic NGS RNA-Seq • 1.1k views

ADD COMMENT • link 9 months ago by Researcher ▴ 30

1

An 85-90% mapping rate seems quite reasonable to me. I've certainly seen much worse!

ADD REPLY • link 9 months ago by Dave Carlson ★ 2.0k

0

Is it? Because every time I have worked with such data, it's been between 97-98%. So can we proceed further with the analysis at this mapping percentage?

ADD REPLY • link 9 months ago by Researcher ▴ 30

1

Yes, I would say those numbers are reasonable. You could also try mapping your RNA-Seq data with STAR. In my anecdotal experience, its mapping rates tend to be higher than those of Hisat2.

ADD REPLY • link 9 months ago by Dave Carlson ★ 2.0k

0

Using STAR, it turned out to be the same percentage.

ADD REPLY • link 9 months ago by Researcher ▴ 30

score 1 · Accepted Answer · 2024-02-09

1

9 months ago

dthorbur ★ 2.5k

Have you tried looking at the unmapped reads? BLASTing them to see if they are even considered human?
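As a rough sketch of that first step, the snippet below pulls unmapped reads out of SAM-format text and formats them as FASTA so that a handful can be pasted into BLAST. It relies only on the SAM flag bit 0x4 (read unmapped) and the standard column order. The function name `unmapped_to_fasta`, the `limit` parameter, and the example records are made up for illustration and are not from the original data.

```python
UNMAPPED = 0x4      # SAM flag bit: this read is unmapped
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def unmapped_to_fasta(sam_lines, limit=100):
    """Yield FASTA records for up to `limit` unmapped reads in SAM text."""
    emitted = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # skip malformed records
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if not (flag & UNMAPPED) or seq == "*":
            continue
        # Tag mates so both reads of a pair get distinct FASTA IDs.
        if flag & FIRST_IN_PAIR:
            name += "/1"
        elif flag & SECOND_IN_PAIR:
            name += "/2"
        yield f">{name}\n{seq}"
        emitted += 1
        if emitted >= limit:
            return


# Hypothetical SAM records: one mapped read and one unmapped pair.
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted\n",
    "read1\t99\tchr1\t100\t60\t10M\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII\n",
    "read2\t77\t*\t0\t0\t*\t*\t0\t0\tGGGTTTCCCA\tIIIIIIIIII\n",
    "read2\t141\t*\t0\t0\t*\t*\t0\t0\tTTTGGGAAAC\tIIIIIIIIII\n",
]
fasta_records = list(unmapped_to_fasta(example_sam))
print("\n".join(fasta_records))
```

On real BAM files the same filter is usually applied with `samtools view -f 4`; the Python version is only meant to show what is being selected.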

You could use something like Kraken2 to check for contamination. It does require downloading and setting up large databases, but it is a good QC step if you have dealt with contamination issues in the past.
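Once a Kraken2 report exists, a few lines of Python can list which species account for the most reads. This is a minimal sketch that assumes the standard six-column, tab-separated report (percent of reads in the clade, reads in the clade, reads assigned directly, rank code, NCBI taxonomy ID, indented name). The function name `top_taxa` and the report rows are hypothetical, with invented counts.

```python
def top_taxa(report_lines, rank="S", n=5):
    """Return (name, percent, taxid) for the n most abundant taxa at `rank`."""
    rows = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:               # skip blank or malformed lines
            continue
        percent, _clade, _direct, rank_code, taxid, name = fields[:6]
        if rank_code == rank:
            # Names are indented by taxonomic depth, so strip the padding.
            rows.append((name.strip(), float(percent), int(taxid)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


# Hypothetical report rows; the percentages and counts are invented.
example_report = [
    " 12.00\t1200\t1200\tU\t0\tunclassified",
    " 88.00\t8800\t10\tR\t1\troot",
    " 80.00\t8000\t8000\tS\t9606\t                Homo sapiens",
    "  1.50\t150\t150\tS\t10090\t                Mus musculus",
    "  6.50\t650\t650\tS\t562\t              Escherichia coli",
]
for name, percent, taxid in top_taxa(example_report):
    print(f"{percent:6.2f}%  {name} (taxid {taxid})")
```

A large non-human fraction among the unmapped reads would point towards contamination rather than a problem with the aligner or the reference.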

Also, what is the taxonomic relationship between the human genome assemblies you are using and the individual/population you have sequenced? The differences could be real, and manifesting as lower than expected mapping rates due to fixed differences between your samples and the reference genome.

ADD COMMENT • link 9 months ago by dthorbur ★ 2.5k

0

Hi, thanks for responding. I haven't tried looking at the unmapped sequences. Will do that. The data that I am working with is RNA and DNA from cancer patients that we isolated and sequenced.

ADD REPLY • link 9 months ago by Researcher ▴ 30

1

My knowledge of cancer genomics and transcriptomics is pretty limited, so I don't think there is much I can offer. But I am curious whether you took tumour/cancer samples as well as normal cells. If so, is there a noticeable difference in mapping between these tissue types? If there is, then your difference in mapping may be partly explained by unmapped cancer-specific reads that have nowhere to map to in your genome/transcriptome. If not, then it's likely something like contamination or population/individual-specific differences between your samples and the reference.

If the former, you can do a quick de novo transcriptome assembly step with the unmapped RNA-seq reads using something like StringTie. I'm unsure what to do about the unmapped genomic reads if it is likely cancer-specific differences.
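The tumour-versus-normal comparison suggested above could be sketched as follows, assuming mapped and total read counts are available per sample (for example from each aligner's summary). The sample IDs, tissue labels, and counts are placeholders, not real results, and the helper names `mapping_rate` and `rate_by_group` are made up for this example.

```python
from statistics import mean


def mapping_rate(mapped_reads, total_reads):
    """Mapping rate as a percentage of all reads."""
    return 100.0 * mapped_reads / total_reads


def rate_by_group(samples):
    """Average the per-sample mapping rate within each tissue type."""
    groups = {}
    for sample in samples:
        rate = mapping_rate(sample["mapped"], sample["total"])
        groups.setdefault(sample["tissue"], []).append(rate)
    return {tissue: mean(rates) for tissue, rates in groups.items()}


# Placeholder counts for illustration only.
samples = [
    {"id": "T1", "tissue": "tumour", "mapped": 8_500_000, "total": 10_000_000},
    {"id": "T2", "tissue": "tumour", "mapped": 8_700_000, "total": 10_000_000},
    {"id": "N1", "tissue": "normal", "mapped": 9_700_000, "total": 10_000_000},
    {"id": "N2", "tissue": "normal", "mapped": 9_800_000, "total": 10_000_000},
]
group_rates = rate_by_group(samples)
for tissue, rate in group_rates.items():
    print(f"{tissue}: {rate:.1f}% mapped on average")
```

A consistent gap between the two groups would support the cancer-specific explanation, while similar rates in both would point back to contamination or sample processing.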

ADD REPLY • link 9 months ago by dthorbur ★ 2.5k

0

I am not sure whether this issue is cancer-specific or population-specific, because I have worked with similar cancer tissue data previously and had a perfect 98% mapping rate to the reference genome. Maybe there was an issue with the sample processing.

ADD REPLY • link 9 months ago by Researcher ▴ 30