Question

Low mapping rate, how can I handle it?

3.4 years ago

ssko ▴ 20

Hello,

I am new to this field. I am doing metagenome analysis with shotgun reads. All reads are single-end. The DNA was obtained from human airways. I just want to find the taxon abundances in the samples. Then I will estimate diversity and identify the core microbes.

My mapping results are terrible. How can I handle poor mapping? Or should I change the tools I used for the analysis? Which tools are more accurate or sensitive for microbiome analysis? Any suggestions would be appreciated!

I followed this pipeline:

Assembly was done using Megahit
Short contigs (<200 bp) were removed using prinseq
Read mapping against contigs was performed using BWA
Similarity searches for GenBank, KEGG, eggNOG were done using Diamond
Binning was done using MaxBin2

These are my mapping results:

  # Sample     Total reads  Mapped reads  Mapped (%)  Total bases
    samples13     21380728      17881628       83.63   1618006383
    samples14       109599         22051       20.12      7606328
    samples15       258752        119090       46.02     18803788
    samples16       340586        147490       43.30     24935657
    samples12      7342679       6205921       84.52    524794709
    samples11      7741157       6283578       81.17    554721680
    samples17     17108901      15213361       88.92   1294292384
    samples18      4012626       2850684       71.04    302834087

BWA Mapping Microbiome Assembly • 3.2k views

ADD COMMENT • link 3.4 years ago by ssko ▴ 20

My mapping results are terrible.

What makes you say that? I mostly see samples with >70% alignment rates, which is fine.

ADD REPLY • link 3.4 years ago by Friederike 9.0k

Friederike Each sample belongs to a different person. The mapping percentages of samples 14, 15 and 16 are under 50%. That is really bad for me, especially sample 14. I don't get any useful information from those samples.

ADD REPLY • link 3.4 years ago by ssko ▴ 20

Have you actually looked at the results? What type of information are you missing from them? What exactly is throwing you off? (I'm not saying that these samples didn't fail, but to get a sense of why they may have failed, we need more information.)

ADD REPLY • link 3.4 years ago by Friederike 9.0k

Friederike In a nutshell, I need to compare the samples in pairs: (11-12), (13-14), (15-16), (17-18). [image]
For example, in the image (assume the samples are ordered 11, 12, 13, ...), we expect equivalent levels for samples 13 and 14, but sample 14 has very low abundance compared to sample 13. I'm not sure, but this depends on the sample amount, right? If so, is it possible to normalize these results and then calculate diversity and core microbes? I need relative abundance for this, right?

ADD REPLY • link 3.4 years ago by ssko ▴ 20

I need relative abundance for this, right?

yes
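
To illustrate why relative abundance removes the depth effect, here is a minimal sketch. The taxon names and counts are made up for illustration (they are not from the samples in this thread), and `relative_abundance` and `shannon_index` are hypothetical helper names. The Shannon index is used here only as one common diversity measure.

```python
import math


def relative_abundance(counts):
    """Convert raw per-taxon read counts to fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(fractions):
    """Shannon diversity H = -sum(p * ln p), skipping absent taxa."""
    return -sum(p * math.log(p) for p in fractions.values() if p > 0)


# Made-up counts: same community composition, 100x difference in depth.
deep = {"taxon_A": 8000, "taxon_B": 1500, "taxon_C": 500}
shallow = {"taxon_A": 80, "taxon_B": 15, "taxon_C": 5}

deep_rel = relative_abundance(deep)
shallow_rel = relative_abundance(shallow)

# After normalization the two profiles are directly comparable.
for taxon in deep:
    print(taxon, round(deep_rel[taxon], 3), round(shallow_rel[taxon], 3))
print("Shannon:", shannon_index(deep_rel), shannon_index(shallow_rel))
```

Normalization makes the profiles comparable, but it cannot recover taxa that were never sampled in a very shallow library, so low-depth samples may still need to be excluded.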

ADD REPLY • link 3.4 years ago by Friederike 9.0k

score 3 · Accepted Answer · 2021-11-30

I think your results are mostly fine. As already mentioned, most samples have a high mapping rate. The samples with a low mapping rate (14, 15 and 16) are also the ones with few reads: a few hundred thousand at most, compared with millions for the others. That is most likely a consequence of low sequencing depth or low abundance, which leads to fragmented assemblies. I suggest you look at your assembly statistics and exclude the samples where most of the contigs are short, and probably also those with low sequencing depth. That should leave you with good samples.
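
As a rough sketch of that filtering step, the snippet below flags samples by read count and mapping rate, using the numbers from the table in the question. The thresholds (`min_reads`, `min_rate`) and the function name `flag_samples` are assumptions chosen for illustration, not established cutoffs; pick values that suit your own data, and check the assembly statistics as well.

```python
# (sample, total reads, mapped reads), copied from the table in the question
SAMPLES = [
    ("samples13", 21380728, 17881628),
    ("samples14", 109599, 22051),
    ("samples15", 258752, 119090),
    ("samples16", 340586, 147490),
    ("samples12", 7342679, 6205921),
    ("samples11", 7741157, 6283578),
    ("samples17", 17108901, 15213361),
    ("samples18", 4012626, 2850684),
]


def flag_samples(samples, min_reads=1_000_000, min_rate=50.0):
    """Return samples below the depth or mapping-rate threshold.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    flagged = []
    for name, total, mapped in samples:
        rate = 100.0 * mapped / total  # mapping rate in percent
        if total < min_reads or rate < min_rate:
            flagged.append((name, total, round(rate, 2)))
    return flagged


flagged = flag_samples(SAMPLES)
for name, total, rate in flagged:
    print(f"{name}: {total} reads, {rate}% mapped")
```

With these assumed thresholds, only samples 14, 15 and 16 are flagged, which matches the pattern described above.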

If I can make another suggestion: try to use more informative titles such as "Low mapping rate", as they will attract more people to read and potentially answer. The idea is to say something specific about the problem so that people interested in that subject will comment.