How To Adjust The Chromosome Contigs Order
2
1
Entering edit mode
11.2 years ago
Tonyzeng ▴ 310

When I run BaseRecalibrator, my input reference file is genome.fa, my mapped file is K1_bam.marked.realigned.fixed.bam and variants/dbSNP file is New.vcf. the commands are showed as follows:

java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -I K1_bam.marked.realigned.fixed.bam -R genome.fa -knownSites New.vcf -o recal_dataK1.grp 

Here is the result:

#####ERROR MESSAGE: Input files /raid1/rzeng/reference/New.vcf and reference have incompatible contigs: No overlapping contigs found.
#####ERROR   /raid1/rzeng/reference/New.vcf contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, X]
#####ERROR   reference contigs = [chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]

Here my reference contig order is obviously different from variants files. How can I adjust the order of reference file and make it the same as New.vcf file? Also, my .bam file is a mapped file by aligning sample sequencing reads with reference file, so .bam should has the same chromosome order as reference file, do I need to to adjust the chromosome order of .bam file too, So confused here!

Thank you

gatk reference dbsnp chromosome • 7.6k views
ADD COMMENT
0
Entering edit mode

Hi,

I am having a vcf file :

> seqlevels(D784G)

[1] "chrI"    "chrII"   "chrIII"  "chrIV"   "chrIX"   "chrM"    "chrV"    "chrVI"   "chrVII"  "chrVIII"
[11] "chrX"    "chrXI"   "chrXII"  "chrXIII" "chrXIV"  "chrXV"   "chrXVI" 

> seqlevels(TxDb.Scerevisiae.UCSC.sacCer3.sgdGene)

[1] "chrI"    "chrII"   "chrIII"  "chrIV"   "chrV"    "chrVI"   "chrVII"  "chrVIII" "chrIX"   "chrX"   
[11] "chrXI"   "chrXII"  "chrXIII" "chrXIV"  "chrXV"   "chrXVI"  "chrM"   

> loc <- locateVariants(D784G, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, AllVariants())

Error in mergeNamedAtomicVectors(genome(x), genome(y), what = c("sequence",  : 
  sequences chrI, chrII, chrIII, chrIV, chrIX, chrM, chrV, chrVI, chrVII, chrVIII, chrX, chrXI, chrXII, chrXIII, chrXIV, chrXV, chrXVI have incompatible genomes:
  - in 'x': TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene
  - in 'y': sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, sacCer3, 

How to handle it?

Do I need to re arrange the seqlevels of my vcf file? If yes, then how?

Sincerely,
Deepak

ADD REPLY
3
Entering edit mode
11.2 years ago

Easier is to sort your VCF instead of the BAM. As your BAM seems to be sorted on a simple lexical order you can just

sed 's/^\([^#]\)/chr\1/' < in.vcf  | LC_ALL=C sort -t '\t' -k1,1 -k2,2n > out.vcf

beware if your VCF header contains some "##contig=" lines: you might have to change it too with sed .

update: added the sed

update 2021: you can do this with bcftools only now.

ADD COMMENT
0
Entering edit mode

Hi, Pierr, Thank you for the reply. My VCF does has the header like these

##contig=<ID=1,length=195471971>
##contig=<ID=10,length=130694993>

So how to reorder this by SED? Not so familiar with this though, thanks

ADD REPLY
0
Entering edit mode

Pierre, when I grep the header using grep '^#' input.vcf > output.vcf my VCF heder contains some ##contig= lines as you mentioned. However, the actual order for the ##contig= is as followings 1,10,11,12,13,14,15,16,17,18,19,2,3,4,5,6,7,8,9,X. It is different from the Error information generated by GATK as ERROR /raid1/rzeng/reference/New.vcf contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, X] above.

How can I do this now? Seems like the actually order is not as complicated as before, I just need to sort chr1 from very beginning to the position between Chr19 and Chr2, right?

ADD REPLY
0
Entering edit mode

Hi, Pierre,

I have done running Basecalibration of GATK without any modification of the order ##contig=, it has done with out any problem. So I do not need to sort the header anyway. Thanks

ADD REPLY
0
Entering edit mode

Thanks, it works this time

ADD REPLY
0
Entering edit mode

Hi tony,

Could you please tell me how it works. I am also facing similar error.

ADD REPLY

Login before adding your answer.

Traffic: 908 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6