Hi, When I run GATK with Error and showed that the variant file with VCF fomat file has not the same order with reference file or not compatible. I used VCFsorter.pl to sort my vcf file according to the genome file using
$ perl vcfsorter.pl genome.dict New.vcf > New1.vcf 2>STDERR
It generated a new file called New1.vcf. However, New1.vcf produced a file with only header in the file but removing all the data line information ( original VCF or New.vcf file include both header and data line information), besides, ##contig+<id=number is="" totally="" the="" same="" as="" the="" new.vcf="" with="" no="" any="" changes.<="" p="">
The chromosome order in my reference file (genome.dict or genome.fa) is as 10,11,12,13,14,15,16,17,18,19,1,2,3,4,5,6,7,8,9,X,Y from header of New.vcf file, it has chromosome order as 1,10,11,12,13,14,15,16,17,18,19,2,3,4,5,6,7,8,9,X,Y New1.vcf file (sorted vcf file using vcfsorter.pl program above) has the same chromosome order as New.vcf or 1,10,11,12,13,14,15,16,17,18,19,2,3,4,5,6,7,8,9,X,Y
I am pretty confused about follows:
1) in-compatiblity between VCF file and Reference file means JUST chromosome order like above? since my reference is .fa format, VCF file header requires the same ##contig=number order with the Reference files?
2) why VCFsorter.pl generate the VCF file with just leaving header but ignore the data line information?
3) when i compare the reference .dict file (genome.dict) with sorted VCF file (New1.vcf), they seems quite different in the header content, is this normal?
[rzeng@prism reference]$ more genome.dict
@HD VN:1.4 SO:unsorted @SQ SN:chr10 LN:130694993 UR:file:/raid1/rzeng/reference/genome.fa M5:7831ecda5dd6bcf838e2452ea0139eac @SQ SN:chr11 LN:122082543 UR:file:/raid1/rzeng/reference/genome.fa M5:e168c7a3194813f597181f26bb1bd13f @SQ SN:chr12 LN:120129022 UR:file:/raid1/rzeng/reference/genome.fa M5:671f85bb54a6e097d631e2e2dd813ad4 @SQ SN:chr13 LN:120421639 UR:file:/raid1/rzeng/reference/genome.fa M5:7f9b9fa3fbd9a38634107dfdc7fd8dc8 @SQ SN:chr14 LN:124902244 UR:file:/raid1/rzeng/reference/genome.fa M5:bf4e1efa25a8fd23b41c91f9bcb86388 @SQ SN:chr15 LN:104043685 UR:file:/raid1/rzeng/reference/genome.fa M5:106358dace00e5825ae337c1f1821b5e @SQ SN:chr16 LN:98207768 UR:file:/raid1/rzeng/reference/genome.fa M5:5482110a6cedd3558272325eee4c5a17 @SQ SN:chr17 LN:94987271 UR:file:/raid1/rzeng/reference/genome.fa M5:0d21e8edbfcd8410523b2b94e6dae892 @SQ SN:chr18 LN:90702639 UR:file:/raid1/rzeng/reference/genome.fa M5:46fda2f7e6dbf91bff91d6703e004afb @SQ SN:chr19 LN:61431566 UR:file:/raid1/rzeng/reference/genome.fa M5:7d363594531514ce41dfacfd97bc995d @SQ SN:chr1 LN:195471971 UR:file:/raid1/rzeng/reference/genome.fa M5:c4ec915e7348d42648eefc1534b71c99 @SQ SN:chr2 LN:182113224 UR:file:/raid1/rzeng/reference/genome.fa M5:fe020a692e23f8468b376e14e304a10f @SQ SN:chr3 LN:160039680 UR:file:/raid1/rzeng/reference/genome.fa M5:50f9385167e70825931231ddf1181b80 @SQ SN:chr4 LN:156508116 UR:file:/raid1/rzeng/reference/genome.fa M5:e7bdfb3ce7f54d2720b0718ed2ea063c @SQ SN:chr5 LN:151834684 UR:file:/raid1/rzeng/reference/genome.fa M5:095f3d4ebe1f0bafff057cc9b130186d @SQ SN:chr6 LN:149736546 UR:file:/raid1/rzeng/reference/genome.fa M5:62628d042ea5e01adff5b481d23def67 @SQ SN:chr7 LN:145441459 UR:file:/raid1/rzeng/reference/genome.fa M5:65da9ab01a76dcbcaef6f32a753585c1 @SQ SN:chr8 LN:129401213 UR:file:/raid1/rzeng/reference/genome.fa M5:dd2d079a37c02e8a3f95abff9e37ac69 @SQ SN:chr9 LN:124595110 UR:file:/raid1/rzeng/reference/genome.fa M5:ef8a85e56b750c10568656361fac7990 @SQ SN:chrM LN:16299 UR:file:/raid1/rzeng/reference/genome.fa M5:11c8af2a2528b25f2c080ab7da42edda @SQ SN:chrX LN:171031299 UR:file:/raid1/rzeng/reference/genome.fa M5:b3db6d6da78d5268688ee395c2c8cb4a @SQ SN:chrY LN:91744698 UR:file:/raid1/rzeng/reference/genome.fa M5:837a35bcca18643d030d4eec5e5b9c64
[rzeng@prism reference]$ more New1.vcf
##fileformat=VCFv4.1 ##samtoolsVersion=0.1.18-r572 ##reference=ftp://ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa ##source_20130026.2=vcf-annotate(r813) -f +/D=200/d=5/q=20/w=2/a=5 (AJ,AKR,CASTEiJ,CBAJ,DBA2J,FVBNJ,LPJ,PWKPhJ,WSBEiJ) ##source_20130026.2=vcf-annotate(r813) -f +/D=250/d=5/q=20/w=2/a=5 (129S1,BALBcJ,C3HHeJ,C57BL6NJ,NODShiLtJ,NZO,Spretus) ##source_20130305.2=vcf-annotate(r818) -f +/D=155/d=5/q=20/w=2/a=5 (129P2) ##source_20130304.2=vcf-annotate(r818) -f +/D=100/d=5/q=20/w=2/a=5 (129S5) ##contig=<id=1,length=195471971> ##contig=<id=10,length=130694993> ##contig=<id=11,length=122082543> ##contig=<id=12,length=120129022> ##contig=<id=13,length=120421639> ##contig=<id=14,length=124902244> ##contig=<id=15,length=104043685> ##contig=<id=16,length=98207768> ##contig=<id=17,length=94987271> ##contig=<id=18,length=90702639> ##contig=<id=19,length=61431566> ##contig=<id=2,length=182113224> ##contig=<id=3,length=160039680> ##contig=<id=4,length=156508116> ##contig=<id=5,length=151834684> ##contig=<id=6,length=149736546> ##contig=<id=7,length=145441459> ##contig=<id=8,length=129401213> ##contig=<id=9,length=124595110> ##contig=<id=x,length=171031299> ##FILTER=<id=basequalbias,description="min p-value="" for="" baseq="" bias="" (info="" pv4)="" [0]"=""> ##FILTER=<id=enddistbias,description="min p-value="" for="" end="" distance="" bias="" (info="" pv4)="" [0.0001]"=""> ##FILTER=<id=gapwin,description="window size="" for="" filtering="" adjacent="" gaps="" [3]"=""> ##FILTER=<id=het,description="genotype call="" is="" heterozygous="" (low="" quality)="" []"=""> ##FILTER=<id=mapqualbias,description="min p-value="" for="" mapq="" bias="" (info="" pv4)="" [0]"=""> ##FILTER=<id=maxdp,description="maximum read="" depth="" (info="" dp="" or="" info="" dp4)="" [200]"=""> ##FILTER=<id=minab,description="minimum number="" of="" alternate="" bases="" (info="" dp4)="" [5]"=""> ##FILTER=<id=mindp,description="minimum read="" depth="" (info="" dp="" or="" info="" dp4)="" [5]"=""> ##FILTER=<id=minmq,description="minimum rms="" mapping="" quality="" for="" snps="" (info="" mq)="" [20]"=""> ##FILTER=<id=qual,description="minimum value="" of="" the="" qual="" field="" [10]"=""> ##FILTER=<id=refn,description="reference base="" is="" n="" []"=""> ##FILTER=<id=snpgap,description="snp within="" int="" bp="" around="" a="" gap="" to="" be="" filtered="" [2]"=""> ##FILTER=<id=strandbias,description="min p-value="" for="" strand="" bias="" (info="" pv4)="" [0.0001]"=""> ##FILTER=<id=vdb,description="minimum variant="" distance="" bias="" (info="" vdb)="" [0]"=""> ##FORMAT=<id=dp,number=1,type=integer,description="# high-quality="" bases"=""> ##FORMAT=<id=gl,number=3,type=float,description="likelihoods for="" rr,ra,aa="" genotypes="" (r="ref,A=alt)""> ##FORMAT=<id=gq,number=1,type=integer,description="genotype quality"=""> ##FORMAT=<id=gt,number=1,type=string,description="genotype"> ##FORMAT=<id=pl,number=g,type=integer,description="list of="" phred-scaled="" genotype="" likelihoods"=""> ##FORMAT=<id=sp,number=1,type=integer,description="phred-scaled strand="" bias="" p-value"=""> ##FORMAT=<id=fi,number=1,type=integer,description="pass(1) or="" fail="" (0)="" filter"=""> ##INFO=<id=ac1,number=1,type=float,description="max-likelihood estimate="" of="" the="" first="" alt="" allele="" count="" (no="" hwe="" assumption)"=""> ##INFO=<id=af1,number=1,type=float,description="max-likelihood estimate="" of="" the="" first="" alt="" allele="" frequency="" (assuming="" hwe)"=""> ##INFO=<id=dp,number=1,type=integer,description="raw read="" depth"=""> ##INFO=<id=dp4,number=4,type=integer,description="# high-quality="" ref-forward="" bases,="" ref-reverse,="" alt-forward="" and="" alt-reverse="" bases"=""> ##INFO=<id=indel,number=0,type=flag,description="indicates that="" the="" variant="" is="" an="" indel."=""> ##INFO=<id=mdv,number=1,type=integer,description="maximum number="" of="" high-quality="" non-reference="" bases"=""> ##INFO=<id=mq,number=1,type=float,description="rms mapping="" quality"=""> ##INFO=<id=msd,number=1,type=float,description="maximum depth="" across="" non-ref="" genotypes"=""> ##INFO=<id=pv0,number=1,type=float,description="p-value for="" strand="" bias"=""> ##INFO=<id=pv1,number=1,type=float,description="p-value for="" baseq="" bias"=""> ##INFO=<id=pv2,number=1,type=float,description="p-value for="" mapq="" bias"=""> ##INFO=<id=pv3,number=1,type=float,description="p-value for="" tail="" distance="" bias"=""> ##INFO=<id=pv4,number=4,type=float,description="p-values for="" strand="" bias,="" baseq="" bias,="" mapq="" bias="" and="" tail="" distance="" bias"=""> ##INFO=<id=qd,number=1,type=float,description="quality by="" depth"=""> ##INFO=<id=sb,number=1,type=float,description="strand bias"=""> ##INFO=<id=vdb,number=1,type=float,description="variant distance="" bias"=""> ##INFO=<id=ac,number=.,type=integer,description="allele count="" in="" genotypes"=""> ##INFO=<id=an,number=1,type=integer,description="total number="" of="" alleles="" in="" called="" genotypes"=""> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129P2
please, edit and format your question.