Hi,
I have generated a Variant Call Format (.vcf) file with GATK. When I checked the VCF file, I found some strange entries such as chrUn, which are apparently unplaced chromosomes (see the definition of unplaced sequence). I have never come across these in a VCF file before. Besides these, I also got "random" entries for chr1 and other chromosomes. The entries are as follows:
##contig=<ID=chr1_gl000191_random
##contig=<ID=chr1_gl000192_random
##contig=<ID=chr4_ctg9_hap1
##contig=<ID=chr4_gl000193_random
##contig=<ID=chr4_gl000194_random
##contig=<ID=chr6_apd_hap1
##contig=<ID=chr6_cox_hap2
##contig=<ID=chr6_dbb_hap3
##contig=<ID=chr6_mann_hap4
##contig=<ID=chr6_mcf_hap5
##contig=<ID=chr6_qbl_hap6
##contig=<ID=chr6_ssto_hap7
##contig=<ID=chr7_gl000195_random
##contig=<ID=chr8_gl000196_random
##contig=<ID=chr8_gl000197_random
##contig=<ID=chr9_gl000198_random
##contig=<ID=chr9_gl000199_random
##contig=<ID=chr9_gl000200_random
##contig=<ID=chr9_gl000201_random
##contig=<ID=chr11_gl000202_random
##contig=<ID=chr17_ctg5_hap1
##contig=<ID=chr17_gl000203_random
##contig=<ID=chr17_gl000204_random
##contig=<ID=chr17_gl000205_random
##contig=<ID=chr17_gl000206_random
##contig=<ID=chr18_gl000207_random
##contig=<ID=chr19_gl000208_random
##contig=<ID=chr19_gl000209_random
##contig=<ID=chr21_gl000210_random
##contig=<ID=chrUn_gl000211
##contig=<ID=chrUn_gl000212
##contig=<ID=chrUn_gl000213
##contig=<ID=chrUn_gl000214
##contig=<ID=chrUn_gl000215
##contig=<ID=chrUn_gl000216
##contig=<ID=chrUn_gl000217
##contig=<ID=chrUn_gl000218
What does an unplaced chromosome actually mean, and why are these sequences still unplaced? Is it valid for a variant to exist on an unplaced chromosome, and if so, how should I handle it? What does it mean to have a variant on an unplaced chromosome whose exact position is not yet known? And what is the difference between chr1 and chr1_gl000191_random? Are variants on chr1 and variants on chr1_gl000191_random different?
Thanks. I really appreciate your help.
Thanks for your explanation. I don't understand the part 'whoever made those pseudomolecules can tell you about that.' As far as I know, the pseudomolecules are the human genome, so where can I get information about the "random" entries for chr1? One thing I would like to share: I checked the coverage depth for these unplaced chromosomes and for the others, using the BAM file obtained after alignment. I found that the unplaced chromosomes have very poor coverage, around 5 or 6, while others such as chr1 or chr2 have coverage on the order of 2000. Does that mean that calls on unplaced chromosomes are prone to error because of the high uncertainty?
Oh yes, of course, that's the human genome; I never work with it! This older Biostars post may have some answers for you: Additional Data In Human Genome (Hg18 / Hg19) Assembly ? Yes, these are all part of the human genome, including the contigs named 'random'.
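To make the naming a bit more concrete, here is a minimal sketch that sorts contig names like the ones in the question into categories. It assumes the UCSC hg19 naming style seen in the header: chrN_gl…_random for unlocalized sequence (chromosome known, exact position not), chrUn_gl… for unplaced sequence (chromosome not known), and chrN_…_hapN for alternate haplotypes. The function names `classify_contig` and `contig_ids_from_header` are made up for illustration; they are not part of GATK or any library.

```python
import re

# Patterns follow the hg19-style names listed in the question (an assumption:
# other assemblies, e.g. hg38, name these sequences differently).
_PATTERNS = [
    ("unplaced", re.compile(r"^chrUn_gl\d+$")),
    ("unlocalized", re.compile(r"^chr(?:\d+|X|Y)_gl\d+_random$")),
    ("alt_haplotype", re.compile(r"^chr(?:\d+|X|Y)_\w+_hap\d+$")),
    ("primary", re.compile(r"^chr(?:\d+|X|Y|M)$")),
]


def classify_contig(name):
    """Return a category label for a contig name, or 'other' if nothing matches."""
    for label, pattern in _PATTERNS:
        if pattern.match(name):
            return label
    return "other"


def contig_ids_from_header(lines):
    """Pull the ID out of each ##contig=<ID=...> header line."""
    ids = []
    for line in lines:
        match = re.match(r"^##contig=<ID=([^,>]+)", line.strip())
        if match:
            ids.append(match.group(1))
    return ids


if __name__ == "__main__":
    header = [
        "##contig=<ID=chr1_gl000191_random",
        "##contig=<ID=chr6_cox_hap2",
        "##contig=<ID=chrUn_gl000211",
    ]
    for contig in contig_ids_from_header(header):
        print(contig, classify_contig(contig))
```

With labels like these you could, for instance, keep only records on "primary" contigs when summarising variants, and look at the rest separately.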
I wouldn't worry too much about the coverage; if these contigs are, for example, alternative haplotypes, then the coverage is bound to be low.
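If you want to compare coverage across contigs quickly, a rough sketch like the following may help. It assumes you have saved the tab-separated output of `samtools idxstats` (contig name, contig length, mapped reads, unmapped reads) as text, and it assumes a fixed read length of 100 bp, which you would need to replace with your own. The function name `approx_depth` is hypothetical. The result is only an average over the whole contig, so it will not match depth measured over targeted regions.

```python
def approx_depth(idxstats_text, read_length=100):
    """Estimate mean depth per contig as mapped_reads * read_length / contig_length.

    idxstats_text: tab-separated lines of name, length, mapped, unmapped.
    read_length:   assumed average read length in bases (an assumption, not measured).
    """
    depths = {}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, _unmapped = line.split("\t")
        length = int(length)
        # The final "*" line counts reads with no contig; skip it and zero-length entries.
        if name == "*" or length == 0:
            continue
        depths[name] = int(mapped) * read_length / length
    return depths


if __name__ == "__main__":
    # Made-up numbers purely to show the input format.
    example = "chr1\t1000\t20000\t0\nchrUn_gl000211\t500\t25\t0\n*\t0\t0\t7\n"
    for contig, depth in approx_depth(example).items():
        print(contig, depth)
```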