How to deal with unplaced chromosomes (chrUn)
6.7 years ago
vivekruhela ▴ 20

Hi,

I have generated a variant call format (.vcf) file with GATK. When I checked the VCF file, I found some strange entries such as chrUn, which are unplaced sequences (see the definition of unplaced sequence). I have never seen these in a VCF file before. Besides these, I also got "random" contigs for chr1 and other chromosomes. The entries are as follows:

##contig=<ID=chr1_gl000191_random
##contig=<ID=chr1_gl000192_random
##contig=<ID=chr4_ctg9_hap1
##contig=<ID=chr4_gl000193_random
##contig=<ID=chr4_gl000194_random
##contig=<ID=chr6_apd_hap1
##contig=<ID=chr6_cox_hap2
##contig=<ID=chr6_dbb_hap3
##contig=<ID=chr6_mann_hap4
##contig=<ID=chr6_mcf_hap5
##contig=<ID=chr6_qbl_hap6
##contig=<ID=chr6_ssto_hap7
##contig=<ID=chr7_gl000195_random
##contig=<ID=chr8_gl000196_random
##contig=<ID=chr8_gl000197_random
##contig=<ID=chr9_gl000198_random
##contig=<ID=chr9_gl000199_random
##contig=<ID=chr9_gl000200_random
##contig=<ID=chr9_gl000201_random
##contig=<ID=chr11_gl000202_random
##contig=<ID=chr17_ctg5_hap1
##contig=<ID=chr17_gl000203_random
##contig=<ID=chr17_gl000204_random
##contig=<ID=chr17_gl000205_random
##contig=<ID=chr17_gl000206_random
##contig=<ID=chr18_gl000207_random
##contig=<ID=chr19_gl000208_random
##contig=<ID=chr19_gl000209_random
##contig=<ID=chr21_gl000210_random
##contig=<ID=chrUn_gl000211
##contig=<ID=chrUn_gl000212
##contig=<ID=chrUn_gl000213
##contig=<ID=chrUn_gl000214
##contig=<ID=chrUn_gl000215
##contig=<ID=chrUn_gl000216
##contig=<ID=chrUn_gl000217
##contig=<ID=chrUn_gl000218

What does "unplaced chromosome" actually mean, and why are these sequences still unplaced? Is it valid for a variant to exist on an unplaced chromosome, and if so, how should I handle it? What does it mean to have a variant on an unplaced chromosome whose exact position we don't know yet? And what is the difference between chr1 and chr1_gl000191_random? Are variants on chr1 and variants on chr1_gl000191_random different?

Thanks. I really appreciate your help.

R next-gen genome SNP • 4.6k views
6.7 years ago

Normally, to sort contigs into pseudomolecules (chr1, chr4, etc.) you use a genetic map. You take the markers in the genetic map, compare them with the contigs you have from the assembler, and try to place as many contigs into the genetic map as possible. From that you then draw up the pseudomolecules.

The unplaced contigs don't carry markers that are also in the genetic map. Often, but not always, the unplaced contigs are repetitive regions; sometimes they are just poor in markers.

You do have to keep the variants on those contigs, since they could be real. I don't understand what you mean by asking whether variants on chr1 and chr1_gl000191_random are different. It could be that a genetic map indicated that the gl000191 region belongs on chr1 but its exact position is not clear, so it was left as an extra piece. Whoever made those pseudomolecules can tell you about that.
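If it helps to keep those variants but look at them separately, here is a minimal sketch that sorts contig names into groups by their naming pattern and splits VCF records accordingly. It assumes the UCSC hg19-style names shown in the header in the question (`_random` for sequences assigned to a chromosome but without a known position, `chrUn_` for sequences with no chromosome assignment, `_hap` for alternate haplotypes). The function names and group labels are my own and are not part of GATK or any library.

```python
import re

# Patterns are assumptions based on the hg19-style names in the VCF header above.
_PRIMARY = re.compile(r"^chr([0-9]+|X|Y|M)$")
_UNLOCALIZED = re.compile(r"^chr[0-9XYM]+_gl\d+_random$")
_UNPLACED = re.compile(r"^chrUn_gl\d+$")
_ALT_HAP = re.compile(r"^chr[0-9XYM]+_\w+_hap\d+$")


def classify_contig(name):
    """Return a coarse label for a contig name (labels are illustrative)."""
    if _PRIMARY.match(name):
        return "primary"
    if _UNLOCALIZED.match(name):
        return "unlocalized"    # chromosome known, position unknown
    if _UNPLACED.match(name):
        return "unplaced"       # chromosome unknown
    if _ALT_HAP.match(name):
        return "alt_haplotype"  # alternate version of a placed region
    return "other"


def split_vcf_records(lines):
    """Group VCF data lines by contig class; header and blank lines are skipped."""
    groups = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        groups.setdefault(classify_contig(chrom), []).append(line)
    return groups
```

For example, `split_vcf_records(open("calls.vcf"))` (the file name is hypothetical) would return a dictionary whose `"unplaced"` and `"unlocalized"` entries can be reviewed separately without discarding them.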


Thanks for your explanation. I don't understand "whoever made those pseudomolecules can tell you about that." As far as I know, the pseudomolecules here are the human genome. So where can I get information about the random contigs on chr1? One thing I would like to share is that I checked the coverage depth for these unplaced chromosomes, and for the others too, from the BAM file obtained after alignment. I found that the unplaced chromosomes have very poor coverage, around 5 or 6, while others such as chr1 or chr2 have coverage on the order of 2000. Does that mean that variants on unplaced chromosomes are error-prone because of the high uncertainty?


Oh yes, of course, that's the human genome; I never work with it! This older Biostars post may have some answers for you: "Additional Data In Human Genome (Hg18 / Hg19) Assembly?". Yes, these are all part of the human genome, including the contigs named 'random'.

I wouldn't worry too much about the coverage. If these contigs are, for example, alternative haplotypes, then the coverage is bound to be low.
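If you want a quick per-contig comparison like the one described above, here is a rough sketch that estimates mean depth from the tab-separated output of `samtools idxstats` (columns: contig name, contig length, mapped reads, unmapped reads). The function name and the `read_length` parameter are assumptions for illustration. The estimate is a whole-contig average that ignores clipping and duplicates, so it will not match depth measured over targeted regions only.

```python
def approx_depth(idxstats_text, read_length):
    """Return {contig: approximate mean depth} from `samtools idxstats` text.

    Depth is estimated as mapped_reads * read_length / contig_length.
    """
    depth = {}
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        # Skip malformed lines and the "*" row, which counts unplaced reads.
        if len(fields) < 3 or fields[0] == "*":
            continue
        name, length, mapped = fields[0], int(fields[1]), int(fields[2])
        if length > 0:
            depth[name] = mapped * read_length / length
    return depth
```

You could feed it the captured output of `samtools idxstats your.bam` together with your read length, then compare the values for the primary chromosomes against the `_random`, `_hap` and `chrUn_` contigs.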
