Low mapping rate for human NGS PE reads to hs37d5 genome
5.2 years ago
Ginsea Chen ▴ 140

Dear all.

I sequenced DNA samples from a human using NGS and mapped the reads (90 bp long) to the human genome (version: hs37d5). The mapping rate I got is low: a normal sample is above 99%, while mine is 88%. I collected all unmapped reads (243118 reads, flag of bam is 0) and tried to find their origin, but I can't find any hits in the NCBI nr database; only 2430 reads contained index sequences and only 210 reads contained adapter sequences.
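
For readers who want to reproduce the numbers, here is a minimal sketch of how a mapping rate can be computed from SAM FLAG values. In the SAM format a read is marked unmapped by bit 0x4 of the FLAG field, so a FLAG of exactly 0 would normally describe a mapped, unpaired, forward-strand read; it may be worth confirming how the unmapped set was extracted. The function name `mapping_rate` and the toy records are illustrative assumptions, not taken from this thread.

```python
def mapping_rate(sam_lines):
    """Return (mapped, unmapped, rate) over primary records in SAM text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        # Skip secondary (0x100) and supplementary (0x800) records so that
        # each read is counted once.
        if flag & 0x900:
            continue
        if flag & 0x4:  # bit 0x4 set: segment unmapped
            unmapped += 1
        else:
            mapped += 1
    total = mapped + unmapped
    rate = mapped / total if total else 0.0
    return mapped, unmapped, rate


# Toy input (hypothetical): one mapped pair, one unmapped pair,
# and one secondary alignment that is ignored.
example = [
    "@HD\tVN:1.6",
    "r1\t99\t1",
    "r1\t147\t1",
    "r2\t77\t*",
    "r2\t141\t*",
    "r1\t355\t1",
]
print(mapping_rate(example))  # (2, 2, 0.5)
```

On a real BAM file the same counts are usually obtained from the aligner's or toolkit's own statistics; the sketch only shows which FLAG bits are involved.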

So, my question is: what should I do to find the cause of this low mapping rate? If you have any suggestions, please tell me.

Thanks.

NGS human reads hs37d5 Low mapping rate • 2.9k views

Did you run fastqc to check if you might have carryover of adapters or other overrepresented sequences?
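
As a rough illustration of what "overrepresented sequences" means, the sketch below counts exact duplicate reads and reports those above a chosen fraction of the total. This is not FastQC's implementation; the function name `overrepresented` and the default threshold are assumptions made for the example.

```python
from collections import Counter


def overrepresented(reads, min_fraction=0.001, top=10):
    """Return (sequence, count, fraction) for the most frequent exact reads.

    `reads` is any iterable of sequence strings. `min_fraction` is an
    arbitrary illustrative cutoff, not a value taken from the thread.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common(top)
        if n / total >= min_fraction
    ]


# Hypothetical toy reads: one sequence makes up 3 of 4 reads.
reads = ["ACGT", "ACGT", "ACGT", "TTTT"]
print(overrepresented(reads, min_fraction=0.5))  # [('ACGT', 3, 0.75)]
```

Sequences that show up here can then be compared against the adapter and index sequences used in library preparation.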

I have used cutadapt to trim adapter sequences and our in-house script to filter low-quality reads. I have never used fastqc. Thanks for your suggestion, I will try it.

I have run fastqc on all unmapped reads. The per base sequence quality plot and the per base sequence content plot were attached here as images (images not included).
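
For anyone who wants the same kind of per-position quality summary without the plot, here is a minimal sketch that computes the median Phred score at each read position. It assumes Phred+33 encoded quality strings (an assumption about the data, not stated in the thread), and the function name is hypothetical.

```python
from statistics import median


def per_position_median_quality(quality_strings, offset=33):
    """Median Phred score at each read position.

    `quality_strings` are the fourth lines of FASTQ records.
    `offset=33` assumes Phred+33 encoding.
    Reads of different lengths are allowed.
    """
    columns = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(columns):
                columns.append([])  # first read reaching this position
            columns[i].append(ord(ch) - offset)
    return [median(col) for col in columns]


# 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
print(per_position_median_quality(["II5", "I55", "5I"]))  # [40, 40, 20.0]
```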

This unmapped data does not appear to be of great quality (median values around Q24). As others have said, 88% is not a bad alignment rate by any means. You may want to take some of the unmapped reads and BLAST them to see if they are contaminants.
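
One way to "take some of the unmapped reads" is to draw a random subset and write it as FASTA for a BLAST search. The sketch below uses reservoir sampling over four-line FASTQ records; the function name, the fixed seed and the four-line layout are assumptions for illustration.

```python
import random


def sample_fastq_to_fasta(fastq_lines, k, seed=0):
    """Pick up to k random reads from 4-line FASTQ records, return FASTA text.

    Uses reservoir sampling, so the input can be streamed once without
    knowing the number of reads in advance.
    """
    rng = random.Random(seed)
    it = iter(fastq_lines)
    sample = []
    # zip over the same iterator groups the lines into records of four.
    for n, (header, seq, _plus, _qual) in enumerate(zip(it, it, it, it)):
        record = (header.strip()[1:], seq.strip())  # drop the leading '@'
        if n < k:
            sample.append(record)
        else:
            j = rng.randint(0, n)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return "".join(f">{name}\n{seq}\n" for name, seq in sample)


# Hypothetical two-read FASTQ; asking for more reads than exist returns all.
fastq = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "GGCC\n", "+\n", "IIII\n"]
print(sample_fastq_to_fasta(fastq, 5), end="")
```

A few hundred reads are usually enough to see whether one contaminant dominates, and a small sample keeps the search quick.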

I have searched all unmapped reads against the NCBI nr database and did not find any matching records.

There is not much you can do in that case. These could simply be sequencing artifacts.

Note: Did you do a translated blast search since you mention nr? How about a blastn search with nt?

Agreed. Given that base quality, the results appear to be fine; I've seen worse mapping rates. I suggest you proceed with downstream analysis and see if it goes through without issues. If so, don't worry about the mapping rate.

Thanks to @ATpoint and @genomax. The samples with low mapping rates have been analyzed, and we observed that samples with a mapping rate lower than 95% always contained some abnormal SNP/indel variants surrounded by many soft-clipped bases (the screenshot attached here is not included).

I am not sure whether there is a direct relationship between low mapping rates and many soft-clipped reads around SNP/indel variants, but I always observe many SNP/indel variants surrounded by many soft-clipped bases in samples whose mapping rate is lower than 95%.
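
To put a number on "many soft-clipped bases", the soft-clipped fraction of a read can be derived from its CIGAR string. This is a minimal sketch; the function name `soft_clip_fraction` is hypothetical, and the operations counted as read-consuming (M, I, S, =, X) follow the SAM format.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_fraction(cigar):
    """Fraction of a read's bases that are soft-clipped (operation S)."""
    clipped = total = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "MIS=X":  # operations that consume read bases
            total += n
        if op == "S":
            clipped += n
    return clipped / total if total else 0.0


print(soft_clip_fraction("90M"))     # 0.0
print(soft_clip_fraction("45S45M"))  # 0.5
```

Comparing the distribution of this value between samples above and below the 95% mapping rate would be one way to test whether the two observations really go together.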

I would say 88% is not actually very low but within acceptable limits; we usually get 85-95%. Anyway, you might also try running FastQC on your raw data to see if you have any adapters or overrepresented sequences.

For samples with a mapping rate lower than 95%, we observe many soft-clipped reads around SNP or indel variants, as in the supplementary figure (https://photos.app.goo.gl/FTxpyvn2qZJDGhnC8). So we think the unknown cause of the low mapping rate may affect the accuracy of variant detection in the target samples.

The link is not functional. Please upload the image to a public image host such as ImgBB and then paste the full link, including the file extension (e.g. .png), into the image field.

Please try again; I have fixed it.
