Question

Generally, how are non-canonical chromosomes treated in human variation studies?

11 weeks ago

eebloom ▴ 90

I can't seem to find a cohesive answer to this question. When calling variants in human data aligned to GRCh38, some variants fall on contigs such as:

chr17_GL000205v2_random

chr1_KI270706v1_random

chrUn_KN707874v1_decoy

chrEBV

chrM

I understand these to be various alternate contigs, unlocalised sequences (the _random contigs), unplaced sequences (chrUn), decoy sequences, the Epstein-Barr virus genome (chrEBV), the mitochondrial chromosome (chrM), etc. (see descriptions here)

If your biological question concerns human variation in the nuclear (autosomal and sex) chromosomes, it seems reasonable to remove variants mapping to chrEBV, chrM and unknown chromosomes.
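To make concrete what I mean by removing those variants, here is a minimal sketch in Python. It assumes UCSC-style names (chr1 to chr22, chrX, chrY) as in my data, and the function name keep_nuclear is just something I made up for illustration. It is not taken from any published pipeline.

```python
# Primary nuclear chromosomes under UCSC-style naming (an assumption;
# Ensembl-style references use "1".."22", "X", "Y" instead).
NUCLEAR = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}

def keep_nuclear(vcf_lines):
    """Yield header lines and records whose CHROM is a primary nuclear chromosome."""
    for line in vcf_lines:
        if line.startswith("#"):
            # Header and meta lines are passed through untouched.
            yield line
            continue
        fields = line.split(None, 1)  # CHROM is the first whitespace-separated field
        if fields and fields[0] in NUCLEAR:
            yield line
```

This drops chrM, chrEBV, decoys and every _random or chrUn contig in one go, because it works from a whitelist rather than trying to list everything to exclude.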

However, what is the general consensus on unlocalised sequences on canonical chromosomes, or on reads with multiple alignments? For instance, some variants have one breakpoint on a canonical chromosome and the other on an unknown chromosome, e.g.

chr16   76295587    r_246_0 A   [chrUn_KI270518v1:835[G .   SVTYPE=BND;MATEID=r_246_1   TR:VR   213:5   175:0
chrUn_KI270518v1    835 r_246_1 G   [chr16:76295587[G   .   SVTYPE=BND;MATEID=r_246_0   TR:VR   213:5   175:0
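For breakend records like the pair above, filtering on CHROM alone is not enough, because the chr16 record would survive while its mate is dropped. One option (a sketch only, and I am not claiming it reflects any consensus) is to also parse the mate contig out of the ALT field and keep a record only if both ends are on primary nuclear chromosomes. The helper names below are hypothetical, and it assumes a single ALT allele per record.

```python
import re

# Primary nuclear chromosomes under UCSC-style naming (an assumption).
NUCLEAR = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}

# VCF breakend ALT forms: t[p[  t]p]  ]p]t  [p[t  where p is contig:position.
# The greedy (.+) splits at the last colon, so contig names containing ":" still parse.
BND_MATE = re.compile(r"[\[\]](.+):(\d+)[\[\]]")

def bnd_mate_contig(alt):
    """Return the mate contig named in a breakend ALT, or None if ALT is not a breakend."""
    match = BND_MATE.search(alt)
    return match.group(1) if match else None

def both_ends_nuclear(record_line):
    """True if CHROM and, for breakends, the mate contig are both in NUCLEAR."""
    fields = record_line.split()
    chrom, alt = fields[0], fields[4]
    mate = bnd_mate_contig(alt)
    return chrom in NUCLEAR and (mate is None or mate in NUCLEAR)
```

With this check both r_246_0 and r_246_1 are removed together, so no orphaned mate is left in the output. Whether dropping the chr16 end is the right call is exactly what I am unsure about.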

This does not seem to be documented in the methods sections of many structural variant publications. I have seen one example where they discuss "filter[ing] SVs found in the sex and unknown chromosomes".

variants SV assembly WGS • 243 views

updated 11 weeks ago by GenoMax 148k • written 11 weeks ago by eebloom ▴ 90

This blog post may partly help: https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use

11 weeks ago by GenoMax 148k