Problem replacing contigs in a dbsnp vcf using Picard SortVcf
9.7 years ago
idedios ▴ 30

Right now I'm trying to rename the contigs in the dbSNP VCF to match those of my reference genome (from 1, 2, ..., Y, MT to chrM, chr1, chr2, ...).

I'm currently using JDK 1.7 u79 for compatibility with MuTect 1.7

I ran SortVcf as follows:

java -jar picard.jar SortVcf \
  INPUT=dbsnp.vcf \
  OUTPUT=dbsnp.fixed.vcf \
  SEQUENCE_DICTIONARY=hg19.dict

Here's SortVcf's output:

Exception in thread "main" java.lang.NullPointerException
    at htsjdk.variant.variantcontext.VariantContextComparator.compare(VariantContextComparator.java:84)
    at htsjdk.variant.variantcontext.VariantContextComparator.compare(VariantContextComparator.java:21)
    at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
    at java.util.TimSort.sort(TimSort.java:203)
    at java.util.Arrays.sort(Arrays.java:727)
    at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:218)
    at htsjdk.samtools.util.SortingCollection.add(SortingCollection.java:165)
    at picard.vcf.SortVcf.sortInputs(SortVcf.java:154)
    at picard.vcf.SortVcf.doWork(SortVcf.java:87)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
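
A plausible reading of this trace, not confirmed anywhere in the thread, is that the comparator looks up each record's contig in the supplied dictionary and fails on names such as 1 or MT that hg19.dict does not contain. Below is a minimal Python sketch for listing such names before running SortVcf. It assumes the .dict file is SAM-header text with `@SQ` lines carrying `SN:` fields; the function names are made up for illustration.

```python
def dict_contigs(dict_lines):
    """Collect the SN: names from the @SQ lines of a sequence dictionary."""
    names = set()
    for line in dict_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def vcf_contigs(vcf_lines):
    """Collect distinct CHROM values from VCF data lines, in first-seen order."""
    seen = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.append(chrom)
    return seen


def missing_contigs(vcf_lines, dict_lines):
    """Return VCF contig names that the dictionary does not list."""
    known = dict_contigs(dict_lines)
    return [c for c in vcf_contigs(vcf_lines) if c not in known]
```

Both arguments can be open file handles, for example `missing_contigs(open("dbsnp.vcf"), open("hg19.dict"))`. An empty result means every contig in the VCF is present in the dictionary.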
Picard contig vcf • 4.5k views

Have you already added the prefix "chr" to your dbSNP file and used the new file as input? I am not sure whether this is the problem, but try the following command on your dbSNP file first:

awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' dbSNP_old.vcf > dbSNP_new.vcf

You can now use the dbSNP_new.vcf file as input for Picard. I also have a Python script that sorts a VCF file by a given order of chromosomes; you can download it from here and modify it accordingly.
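
The linked script is not reproduced in this thread, so the following is only a minimal sketch of the same idea, not that script. It assumes the whole file fits in memory (a full dbSNP file may not) and that every contig in the data lines appears in the order you pass in; `sort_vcf_lines` is a hypothetical name.

```python
def sort_vcf_lines(lines, contig_order):
    """Return header lines followed by records sorted by contig order, then POS.

    Raises KeyError if a record uses a contig missing from contig_order.
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    header, records = [], []
    for line in lines:
        if line.startswith("#"):
            header.append(line)      # keep header lines in their original order
        elif line.strip():
            records.append(line)     # data line; blank lines are dropped

    def sort_key(line):
        chrom, pos = line.split("\t", 2)[:2]
        return (rank[chrom], int(pos))

    return header + sorted(records, key=sort_key)
```

For example, `sort_vcf_lines(open("dbSNP_new.vcf"), ["chrM", "chr1", "chr2"])` would place chrM records first. Failing loudly on an unknown contig is a deliberate choice here, so that a leftover name like chrMT is noticed rather than silently sorted somewhere arbitrary.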


On top of adding the prefix, I have to rename the mitochondrial contig from MT to chrM. Then maybe SortVcf can reorder it.

Thanks!

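sed 's/^chrMT/chrM/' dbSNP_new.vcf > dbSNP_extranew.vcf

:-) OR

sed -i 's/^chrMT/chrM/' dbSNP_new.vcf

The second form does not produce a new file; -i edits the file in place. The pattern is anchored on ^chrMT because the awk step has already turned MT into chrMT, and an unanchored s/MT/chrM/g would also rewrite any other "MT" that happens to occur on a line.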
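
If you would rather do the prefixing and the MT fix in one pass, here is a minimal Python sketch. The function names are hypothetical, and it only touches the CHROM column of data lines; `##contig` header lines are left as they are, on the assumption that SortVcf takes the contig list from SEQUENCE_DICTIONARY.

```python
def rename_contig(name):
    """Map a dbSNP-style contig name (1, 2, ..., Y, MT) to the chr-prefixed style."""
    if name == "MT":
        return "chrM"
    if name.startswith("chr"):
        return name              # already prefixed, leave untouched
    return "chr" + name


def rename_vcf_line(line):
    """Rename the contig in the first column of a data line; pass headers through."""
    if line.startswith("#"):
        return line
    chrom, sep, rest = line.partition("\t")
    return rename_contig(chrom) + sep + rest
```

Applied line by line, for example `out.writelines(rename_vcf_line(l) for l in open("dbsnp.vcf"))` with `out` an open output file, this streams through the file without loading it all at once.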


Thanks, I really need to learn to use awk and sed.


Yes. They are really handy and pretty fast for text manipulation. I strongly suggest learning some basic awk one-liners.
