Question

Using GRCh38 (human genome) primary assembly for mapping long reads


3.8 years ago

prasundutta87 ▴ 670

Hi,

Mapping short reads to the human genome primary assembly (GRCh38) can become messy due to the presence of alternate contigs. Hence, genomes without alternate contigs are used for mapping with BWA, as mentioned in https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use.

However, when long reads are used, is it okay to use the primary or toplevel assembly of the human genome (GRCh38)? From my perspective, many of the mapping errors usually associated with short reads are overcome by long reads. But is there any other issue I am missing here?

Regards, Prasun

assembly alignment • 1.5k views

updated 3.8 years ago by colindaven 7.0k • written 3.8 years ago by prasundutta87 ▴ 670

score 1 · Answer 1 · 2021-02-08

Heng Li gives a pretty clear answer on (not) using ALT contigs here:

https://github.com/lh3/minimap2/issues/72

I completely agree. ALT contigs are a nightmare for most genomes (not all bioinformatics is human-based): they are only relevant for WGS and variant calling, they cripple aligners and read mappers that are not ALT-aware, and they mess up any functional genomics analysis, etc.
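
If your reference FASTA does include ALT contigs, one option is to strip them before building the index. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes UCSC-style naming where ALT contigs end in `_alt` (for example `chr1_KI270762v1_alt`); Ensembl toplevel files name these sequences differently, so the suffix would need adjusting. The function name `drop_alt_contigs` and the `alt_suffix` parameter are made up for illustration.

```python
import io


def drop_alt_contigs(fasta_in, fasta_out, alt_suffix="_alt"):
    """Copy FASTA records from fasta_in to fasta_out, skipping ALT contigs.

    A record is treated as ALT when its name (the first whitespace-delimited
    token of the header) ends with alt_suffix. This naming rule is an
    assumption based on UCSC-style hg38 headers.
    Returns the list of contig names that were kept.
    """
    kept = []
    keep = False
    for line in fasta_in:
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            keep = not name.endswith(alt_suffix)
            if keep:
                kept.append(name)
        # Sequence lines follow the decision made at their header line.
        if keep:
            fasta_out.write(line)
    return kept


# Tiny in-memory example; for a real genome, pass open file handles instead.
demo = io.StringIO(">chr1\nACGT\n>chr1_KI270762v1_alt\nGGCC\n>chrM\nTTAA\n")
out = io.StringIO()
kept_names = drop_alt_contigs(demo, out)
print(kept_names)
```

The filtered FASTA can then be indexed and used with whichever long-read mapper you prefer. Downloading an assembly that already excludes ALT contigs is usually simpler than filtering one yourself.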