Dear all.
I sequenced DNA samples from a human using NGS technology and mapped the reads (length 90 bp) to the human genome (version: hs37d5). The mapping rate I get is low: a normal sample is higher than 99%, while my value is 88%. I collected all unmapped reads (243118 reads, flag of bam is 0) and tried to find their origin, but I can't find any hits in the NCBI nr database. Only 2430 reads contain index sequences and only 210 reads contain adapter sequences.
So my question is: what should I do to find the reason for this low mapping rate? If you have any suggestions, please tell me.
Thanks.
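For anyone wanting to double-check the numbers in the question, here is a minimal sketch of how a mapping rate can be recomputed from SAM text. It relies on the SAM flag bits (0x4 = read unmapped, 0x100 = secondary, 0x800 = supplementary); the function name `mapping_rate` and the toy records are made up for illustration, and in practice the lines could come from something like `samtools view -h`.

```python
def mapping_rate(sam_lines):
    """Return the fraction of primary records whose 'unmapped' bit (0x4) is not set."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.rstrip("\n").split("\t")[1])
        # Skip secondary (0x100) and supplementary (0x800) records
        # so that each read is counted only once.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


if __name__ == "__main__":
    # Toy records, purely illustrative.
    example = [
        "@HD\tVN:1.6",
        "r1\t0\t1\t100\t60\t90M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "r3\t16\t1\t300\t60\t90M\t*\t0\t0\tACGT\tIIII",
    ]
    print(f"mapping rate: {mapping_rate(example):.2%}")
```

Note that a record with flag 0 does not have the unmapped bit set, so it is worth confirming how the unmapped reads were selected.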
Did you run fastqc to check if you might have carryover of adapters or other overrepresented sequences?
I have used cutadapt to cut adapter sequences and used our in-house script to filter low-quality reads. I have never used fastqc. Thanks for your suggestion, I will try it.
I have run fastqc on all unmapped reads. The per base sequence quality looks like: [image] and the per base sequence content looks like: [image]
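As background for reading a per-base quality plot: the values are Phred scores, where Q = -10 * log10(P_error). Below is a minimal sketch for decoding FASTQ quality strings, assuming the Phred+33 encoding (the usual one for recent Illumina data, but an assumption here). The function names are illustrative and not part of FastQC.

```python
import statistics


def phred_scores(quality_string, offset=33):
    """Decode one FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def error_probability(q):
    """Convert a Phred score into the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def median_quality(quality_strings, offset=33):
    """Median Phred score over all bases of all given quality strings."""
    scores = [q for s in quality_strings for q in phred_scores(s, offset)]
    return statistics.median(scores)


if __name__ == "__main__":
    toy_qualities = ["IIII9999", "99995555"]  # toy data, purely illustrative
    q = median_quality(toy_qualities)
    print(f"median Q = {q}, error probability = {error_probability(q):.4f}")
```

Under this scale, Q20 corresponds to a 1% error probability and Q30 to 0.1%.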
This unmapped data does not appear to be of great quality (median values around Q24). As others have said, 88% is not a bad alignment rate by any means. You may want to take some of the unmapped reads and BLAST them to see if they are contaminants.
I have searched all unmapped reads against the NCBI nr database and did not find any matching record.
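Taking "some of the unmapped reads" could be sketched as follows: draw a random subset of FASTQ records and write them as FASTA, which can then be pasted into a BLAST search. This assumes plain four-line FASTQ records; the function name `sample_fastq_to_fasta` and the toy reads are hypothetical.

```python
import random


def sample_fastq_to_fasta(fastq_lines, n, seed=0):
    """Pick up to n reads at random from 4-line FASTQ records and return FASTA text."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    records = []
    # Each record is assumed to span exactly four lines: header, sequence, '+', quality.
    for i in range(0, len(lines) - 3, 4):
        header, sequence = lines[i], lines[i + 1]
        if header.startswith("@"):
            name = header[1:].split(" ")[0]
            records.append((name, sequence))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = rng.sample(records, min(n, len(records)))
    return "".join(f">{name}\n{sequence}\n" for name, sequence in chosen)


if __name__ == "__main__":
    toy_fastq = [
        "@read1 extra\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
        "@read2\n", "TTTTCCCC\n", "+\n", "IIIIIIII\n",
        "@read3\n", "GGGGAAAA\n", "+\n", "IIIIIIII\n",
    ]
    print(sample_fastq_to_fasta(toy_fastq, n=2), end="")
```

A few hundred reads are usually enough to see whether one contaminant dominates.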
There is not much you can do in that case. These could simply be sequencing artifacts.
Note: Did you do a translated blast search, since you mention nr? How about a blastn search with nt?
Agreed. Given that base quality, the results appear to be fine; I've seen worse mapping rates. I suggest you proceed with downstream analysis and see if this goes without issues. If so, don't bother yourself with the mapping rate.
Thanks to @ATpoint and @genomax. The samples with low mapping rates have been analyzed, and we observed that samples with a mapping rate lower than 95% always contained some abnormal SNP/indel variants surrounded by many soft-clipped bases, as follows.
Now, I am not sure whether there is a direct relationship between low mapping rates and many soft-clipped reads around SNP/indel variants, but I always observe lots of SNP/indel variants near many soft-clipped bases in samples whose mapping rate is lower than 95%.
I would say 88% is not actually very low but within acceptable limits; we usually get 85-95%. Anyway, you might also try running FastQC on your raw data to see if you have any adapters or overrepresented sequences.
For samples with a mapping rate lower than 95%, we observe many soft-clipped reads around SNP or indel variants, as in the supplementary figure (https://photos.app.goo.gl/FTxpyvn2qZJDGhnC8). So we think the unknown cause of the low mapping rate may affect the accuracy of variant detection in the target samples.
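One way to put a number on "many soft-clipped reads" is to compute, per alignment, the fraction of read bases that are soft-clipped according to the CIGAR string (operation S in the SAM specification). This is only a rough sketch; the function name and the example CIGAR strings are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation letter from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_fraction(cigar):
    """Fraction of the read's bases that are soft-clipped (CIGAR operation 'S')."""
    if cigar == "*":  # CIGAR unavailable, e.g. for an unmapped read
        return 0.0
    ops = CIGAR_OP.findall(cigar)
    # Operations M, I, S, = and X consume bases of the read itself.
    read_length = sum(int(length) for length, op in ops if op in "MIS=X")
    clipped = sum(int(length) for length, op in ops if op == "S")
    return clipped / read_length if read_length else 0.0


if __name__ == "__main__":
    for cigar in ["90M", "20S70M", "10S60M2D10M10S"]:  # toy examples
        print(cigar, round(soft_clip_fraction(cigar), 3))
```

Comparing the distribution of this fraction between a sample above 99% and one below 95% would show whether the soft clipping is concentrated near the suspicious variants or spread across the whole genome.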
The link is not functional. Please upload the image to a public image hoster such as ImgBB and then paste the full link including the suffix (e.g. .png) into the image field.
Please try again, I have fixed it.