Question

What exactly are unmappable regions?

1

2.3 years ago

DS ▴ 70

What exactly are "unmappable regions"? My understanding from some Google searches is that they are short regions on the gene that are difficult to map. Is this correct? If so, why are there short regions and long regions? Aren't they split randomly?

Thank you.

region genome mappable genetics gene • 1.3k views

ADD COMMENT • link updated 2.3 years ago by benformatics 4.1k • written 2.3 years ago by DS ▴ 70

score 3 · Accepted Answer · 2023-01-10

3

2.3 years ago

mark.ziemann ★ 2.0k

In the genome, there is a lot of what is called "repetitive DNA": sequences that appear many times throughout the genome. For example, LINE1 and Alu are two types of repetitive sequence that make up a large fraction of the human genome. Repetitive DNA gets sequenced like everything else in assays such as WGS and ChIP-seq, but aligners have a hard time figuring out where such a read came from, because the sequence could have originated from many different places. The same thing happens with paralogous genes that have very similar sequences: the aligner can't tell exactly where the read originated. Regions made up of this kind of sequence are the "unmappable" ones. This is why, in short-read sequencing, a lot of reads are discarded from the analysis: we don't know their true genomic origin. Long-read sequencing mostly avoids this problem, because a long read usually extends past the repeat into unique flanking sequence.
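To make the idea concrete, here is a toy Python sketch (my own illustration, not how any particular mappability tool works). It marks each start position in a made-up genome as "unique" if the k-mer starting there occurs exactly once. The genome string, the function name `kmer_mappability`, and the choice of k are all assumptions for the example, and it only looks at the forward strand.

```python
from collections import Counter


def kmer_mappability(genome: str, k: int) -> list[bool]:
    """Return one flag per start position: True if the k-mer there occurs exactly once."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [counts[kmer] == 1 for kmer in kmers]


# Hypothetical toy genome: the same 8-base repeat is inserted at two places.
REPEAT = "ACGTACGT"
toy_genome = "TTGCA" + REPEAT + "GGATCCA" + REPEAT + "CATTG"

# Short "reads" (k=6) that fall entirely inside the repeat are ambiguous.
unique_short = kmer_mappability(toy_genome, k=6)
print(toy_genome)
print("".join("U" if u else "-" for u in unique_short))

# Longer "reads" (k=12) always reach into unique flanking sequence here.
unique_long = kmer_mappability(toy_genome, k=12)
print("all 12-mers unique:", all(unique_long))
```

Positions printed as `-` are where a read of that length could not be placed unambiguously. How long such a stretch is depends on the length of the repeat and of the read, which is why unmappable regions come in different sizes rather than being split at random.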

ADD COMMENT • link 2.3 years ago by mark.ziemann ★ 2.0k

0

So instead of randomly assigning them to one of the "predicted" genomic origins, we just discard all of them?

ADD REPLY • link 2.3 years ago by DS ▴ 70

1

It depends on the alignment parameters you define, but I'd say in most cases a read would be aligned to multiple locations and assigned a lower "mapping quality" score.
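As a rough sketch of what that looks like downstream: in SAM output the mapping quality (MAPQ) is the fifth tab-separated column, so a filter can be as simple as the Python below. The two records, the helper name `split_by_mapq`, and the cutoff of 10 are made up for illustration. The MAPQ scale is aligner-specific, so check your aligner's documentation before picking a threshold.

```python
def split_by_mapq(sam_lines, min_mapq=10):
    """Split read names into (kept, dropped) using the MAPQ column of SAM records."""
    kept, dropped = [], []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])  # SAM column 5 is MAPQ
        (kept if mapq >= min_mapq else dropped).append(fields[0])
    return kept, dropped


# Hypothetical records: one confidently placed read, one that fell in a repeat.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read_unique\t0\tchr1\t101\t60\t8M\t*\t0\t0\tGGATCCAA\tIIIIIIII",
    "read_repeat\t0\tchr1\t301\t0\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
]

kept, dropped = split_by_mapq(toy_sam)
print("kept:", kept)
print("dropped:", dropped)
```

So the reads are not necessarily thrown away by the aligner itself. Often it is a later filtering step like this one that removes the low-MAPQ reads.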

ADD REPLY • link 2.3 years ago by benformatics 4.1k