How does mapping software deal with reads mapped against a polyploid genome? Do the reads map randomly to any of the allelic locations? If so, couldn't that affect structural variation studies?
The reference genome used for mapping is haploid, so mapping works no differently than for haploid organisms. Alternative haplotypes complicate matters, but in the spirit of this question it's fine to ignore them.
Same question here! When the reference is tetraploid or hexaploid, there are many duplications across the genome, so lots of reads will be multi-mapped. How should these be handled?
How to deal with it?
The first key to success is to open a new question instead of adding an answer to a 2.5-year-old question.
Besides this, I see two options:
Many thanks for your reply!
I don't understand what you mean by the first option.
Maybe I didn't explain the question well.
I mapped resequencing reads (100 bp paired-end) to the hexaploid wheat reference with BWA-MEM (default parameters).
However, many of the mapped reads have very low mapping quality (in fact 0 for most of them), so the number of uniquely mapped reads is very low.
Is there a method that can improve the accuracy, perhaps by adjusting some parameters or switching to different mapping software?
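Before tuning anything, it can help to quantify how many primary alignments actually have MAPQ 0. Below is a minimal sketch in plain Python, assuming the input is SAM text (for example the output of `samtools view aln.bam`). It skips header lines and, following the standard SAM flag bits, drops unmapped (0x4), secondary (0x100) and supplementary (0x800) records so each read end is counted once. The function name `mapq_summary` and the toy records are hypothetical.

```python
from collections import Counter

# Standard SAM FLAG bits
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def mapq_summary(sam_lines):
    """Count MAPQ values of primary, mapped records and the fraction with MAPQ 0."""
    counts = Counter()
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # keep only primary, mapped alignments
        counts[int(fields[4])] += 1  # column 5 is MAPQ
    total = sum(counts.values())
    frac_zero = counts[0] / total if total else 0.0
    return counts, frac_zero


# Toy records (hypothetical): r1 primary MAPQ 0, r2 primary MAPQ 60,
# r3 unmapped, plus a secondary alignment of r1.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1A\t100\t0\t100M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1B\t200\t60\t100M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r1\t256\tchr1D\t100\t0\t100M\t*\t0\t0\t*\t*",
]
counts, frac = mapq_summary(example)
print(dict(counts), frac)  # {0: 1, 60: 1} 0.5
```

With paired-end data each mate is a separate record, so the counts are per read end, not per fragment.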
See this thread for some details on how BWA calculates the MAPQ field. In my opinion the field is often close to meaningless, and I've had my share of difficulties with big plant genomes, within that group specifically the wheat (Triticum aestivum) genome.
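One way to check whether MAPQ 0 reflects genuine ambiguity is to compare BWA-MEM's optional tags: `AS` is the alignment score of the reported hit and `XS` the score of the best suboptimal hit. As far as I understand BWA-MEM's behaviour, it reports MAPQ 0 when the suboptimal score reaches the best score, which is exactly what reads from near-identical homeologous copies produce. The sketch below assumes plain SAM lines from BWA-MEM; the helper names `parse_tags` and `is_tied` are hypothetical, and the check simply returns False when either tag is absent.

```python
def parse_tags(fields):
    """Return optional SAM tags (columns 12 onwards) as a dict TAG -> raw value string."""
    tags = {}
    for tag in fields[11:]:
        # TAG:TYPE:VALUE; maxsplit=2 keeps colons inside the value (e.g. XA:Z:...) intact
        name, _type, value = tag.split(":", 2)
        tags[name] = value
    return tags


def is_tied(sam_line):
    """True if the best (AS) and suboptimal (XS) alignment scores are equal."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = parse_tags(fields)
    if "AS" not in tags or "XS" not in tags:
        return False  # no suboptimal hit reported, so no tie can be detected
    return int(tags["AS"]) == int(tags["XS"])


# Hypothetical records: r1 scores equally on two loci, r2 has a clearly worse second hit.
line_tied = "r1\t0\tchr3A\t5000\t0\t100M\t*\t0\t0\t*\t*\tNM:i:0\tAS:i:100\tXS:i:100"
line_unique = "r2\t0\tchr3B\t7000\t60\t100M\t*\t0\t0\t*\t*\tNM:i:0\tAS:i:100\tXS:i:62"
print(is_tied(line_tied), is_tied(line_unique))  # True False
```

If most MAPQ 0 reads turn out to be ties, changing aligner parameters is unlikely to help much, because the reads really are equally compatible with several loci.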
You may need to experiment with the parameters and/or try other aligners such as bowtie2, though I'm really not enough of a Triticum specialist to judge whether the duplication rate you observe is expected or abnormal.
I suspect the real answer to these questions is that our standard procedure, i.e. mapping to a standard haploid reference and using well-mapped reads to call good SNPs, is not going to work well.
If we are looking at true diploid, triploid or polyploid reference genomes, mapping quality will very frequently be low (0 or close to 0) because many near-identical sequences are present.
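The arithmetic behind this follows from the SAM definition of MAPQ as the Phred-scaled probability that the reported position is wrong, MAPQ = -10 log10 P(wrong). If a read has n equally good candidate loci and one is picked at random, P(wrong) = 1 - 1/n. The sketch below assumes exactly that idealised situation and caps unique hits at 60 (the maximum BWA-MEM reports); the function name `naive_mapq` is hypothetical, and real aligners use their own heuristics rather than this formula.

```python
import math


def naive_mapq(n_equal_hits, cap=60):
    """Phred-scaled MAPQ if one of n equally good loci is chosen uniformly at random."""
    if n_equal_hits < 1:
        raise ValueError("need at least one hit")
    p_wrong = 1.0 - 1.0 / n_equal_hits
    if p_wrong == 0.0:
        return cap  # unique hit: report the assumed cap
    return min(cap, -10.0 * math.log10(p_wrong))


for n in (1, 2, 3, 6):
    print(n, round(naive_mapq(n), 2))
# 1 60
# 2 3.01
# 3 1.76
# 6 0.79
```

So even in this idealised model, a read from a region that is identical across the three wheat subgenomes (n = 3) can never score above about 2, and after integer rounding and aligner-specific rules such reads are typically reported with MAPQ 0.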
I suspect this is a perfect use case for pangenome software like PGGB -> ODGI, rather than the BWA vs haploid reference we are all more familiar with.
What if the reference is tetraploid?
The reference is never tetraploid. The genome may be tetraploid or hexaploid or whateverploid, but the reference is always haploid. Multiple alleles are then mapped to the same location and it's up to the variant caller to take ploidy into account.
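To illustrate what "taking ploidy into account" means at a single biallelic site: with ploidy k the caller must choose an ALT dosage g between 0 and k, and under a simple binomial model a read shows the ALT base with probability (g/k)(1 - e) + (1 - g/k)e, where e is a per-base error rate. The sketch below is a simplified textbook model, not the method of any particular variant caller; it ignores mapping errors, allelic bias and priors, and the names `genotype_likelihoods` and `best_dosage` and the default error rate of 0.01 are assumptions.

```python
from math import comb


def genotype_likelihoods(ref_reads, alt_reads, ploidy, error=0.01):
    """Binomial likelihood of the observed read counts for each ALT dosage 0..ploidy."""
    depth = ref_reads + alt_reads
    likelihoods = []
    for dosage in range(ploidy + 1):
        frac_alt = dosage / ploidy
        # A read shows ALT if it comes from an ALT copy and is read correctly,
        # or comes from a REF copy and is misread.
        p_alt = frac_alt * (1 - error) + (1 - frac_alt) * error
        likelihoods.append(
            comb(depth, alt_reads) * p_alt**alt_reads * (1 - p_alt) ** ref_reads
        )
    return likelihoods


def best_dosage(ref_reads, alt_reads, ploidy, error=0.01):
    """ALT dosage with the highest likelihood (no prior)."""
    lik = genotype_likelihoods(ref_reads, alt_reads, ploidy, error)
    return max(range(ploidy + 1), key=lambda g: lik[g])


# 30 reads at one position, 8 of them ALT.
print(best_dosage(22, 8, ploidy=2))  # 1 (i.e. 0/1)
print(best_dosage(22, 8, ploidy=6))  # 2 (two of six copies carry ALT)
```

The same reads piled up at one reference coordinate lead to different genotype calls depending on the ploidy the caller is told to assume, which is why that setting, not the mapping step, is where ploidy matters in this workflow.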