Reorder the ##contig=<ID= lines in the header of a VCF file, how?
11.1 years ago
Tonyzeng ▴ 310

Hi, I have a VCF file whose header lists the contig IDs in the following order, which I need to change:

##contig=<ID=1,length=195471971>
##contig=<ID=10,length=130694993>
##contig=<ID=11,length=122082543>
##contig=<ID=12,length=120129022>
##contig=<ID=13,length=120421639>
##contig=<ID=14,length=124902244>
##contig=<ID=15,length=104043685>
##contig=<ID=16,length=98207768>
##contig=<ID=17,length=94987271>
##contig=<ID=18,length=90702639>
##contig=<ID=19,length=61431566>
##contig=<ID=2,length=182113224>
##contig=<ID=3,length=160039680>
##contig=<ID=4,length=156508116>
##contig=<ID=5,length=151834684>
##contig=<ID=6,length=149736546>
##contig=<ID=7,length=145441459>
##contig=<ID=8,length=129401213>
##contig=<ID=9,length=124595110>
##contig=<ID=X,length=171031299>

How can I change it to

##contig=<ID=10,length=130694993>
##contig=<ID=11,length=122082543>
##contig=<ID=12,length=120129022>
##contig=<ID=13,length=120421639>
##contig=<ID=14,length=124902244>
##contig=<ID=15,length=104043685>
##contig=<ID=16,length=98207768>
##contig=<ID=17,length=94987271>
##contig=<ID=18,length=90702639>
##contig=<ID=19,length=61431566>
##contig=<ID=1,length=195471971>
##contig=<ID=2,length=182113224>
##contig=<ID=3,length=160039680>
##contig=<ID=4,length=156508116>
##contig=<ID=5,length=151834684>
##contig=<ID=6,length=149736546>
##contig=<ID=7,length=145441459>
##contig=<ID=8,length=129401213>
##contig=<ID=9,length=124595110>
##contig=<ID=X,length=171031299>

That's not a BAM header. Do you mean VCF?

Thank you for the reminder, Dpryan. I corrected it.

Do you need to reorder the whole file, or just the header lines? It's unclear from your question.

I just need to reorder the header lines, because the order of the read lines has already been fixed. Thank you!

Huh! I just wrote code for you to order the read lines. Anyway, it's high time for you to learn vi commands (http://www.cs.colostate.edu/helpdocs/vi.html). Use Unix tools to edit the file if it is too big for a Windows application like Notepad++.

Thanks, Ashutoshmits. I am sorry I did not make it clear: I already generated a VCF file with the READ LINES in the correct chromosome order, but not the header lines. For the header of the VCF file, I still need to reorder the ##contig=<ID=number lines. I assumed that the code you posted works for ordering the read lines but not the header lines.

Ashutoshmits, I have finished running base recalibration in GATK without modifying the order of the ##contig= lines, and it completed without any problem. So I do not need to sort the header anymore.

Cool. That suggests GATK doesn't care about the contig order in the header of a VCF file, at least for this step.

Oh yeah! Thank you so much for your help anyway, Ashutoshmits

11.1 years ago

Here is code that should work. You will have to change the contig order in the header manually, but the script takes care of the remaining lines. Make sure your computer has enough RAM if your VCF file is big, since all records are held in memory.


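import sys

# Expect two arguments: the input VCF and the output file.
args = sys.argv[1:]
if len(args) < 2:
    print("Usage: Input_vcf Outputfile")
    sys.exit(1)

# Chromosome order used when writing the records; edit this list to match
# the order you need. Chromosomes missing from the file are skipped.
chromosomes = ["10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
               "X", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

# records[chrom][pos] holds every line seen at that position, so several
# records sharing one position are all kept.
records = {}
with open(args[0]) as vcf_in, open(args[1], "w") as vcf_out:
    for line in vcf_in:
        # Header lines are copied through unchanged, in their original order.
        if line.startswith("#"):
            vcf_out.write(line)
            continue
        fields = line.rstrip("\n").split("\t")
        records.setdefault(fields[0], {}).setdefault(int(fields[1]), []).append(line)

    # Write the records chromosome by chromosome, sorted numerically by position.
    for chrom in chromosomes:
        for pos in sorted(records.get(chrom, {})):
            vcf_out.writelines(records[chrom][pos])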
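
The script above sorts the record lines and copies the header through untouched, while the question asks for the ##contig= header lines themselves to be reordered. A minimal sketch of that part follows, assuming the ID field comes first inside the angle brackets, as in the header shown in the question. The names reorder_contig_lines and reorder_vcf_header are hypothetical helpers written for this illustration and do not come from any library.

```python
import re

# Assumes the ID field is the first field, e.g. "##contig=<ID=1,length=...>".
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def reorder_contig_lines(lines, order):
    """Return lines with the ##contig header lines rearranged to follow order.

    Contigs that are not listed in order go last and keep their relative
    order. Every other line stays where it was.
    """
    lines = list(lines)
    rank = {contig_id: i for i, contig_id in enumerate(order)}
    slots = []    # indices at which contig lines occur in the input
    contigs = []  # (sort key, line) pairs for those lines
    for idx, line in enumerate(lines):
        match = CONTIG_ID.match(line)
        if match:
            slots.append(idx)
            contigs.append((rank.get(match.group(1), len(order)), line))
    # list.sort is stable, so unlisted contigs keep their input order.
    contigs.sort(key=lambda pair: pair[0])
    for idx, (_, line) in zip(slots, contigs):
        lines[idx] = line
    return lines


def reorder_vcf_header(in_path, out_path, order):
    """Rewrite a VCF with reordered contig lines; records are streamed as-is."""
    with open(in_path) as vcf_in, open(out_path, "w") as vcf_out:
        header = []
        first_record = None
        for line in vcf_in:
            if not line.startswith("#"):
                first_record = line
                break
            header.append(line)
        vcf_out.writelines(reorder_contig_lines(header, order))
        if first_record is not None:
            vcf_out.write(first_record)
        # Only the header is held in memory; the records are copied line by line.
        for line in vcf_in:
            vcf_out.write(line)
```

With the order requested in the question, a call could look like reorder_vcf_header("input.vcf", "output.vcf", ["10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "1", "2", "3", "4", "5", "6", "7", "8", "9", "X"]), where both file names are placeholders. Because only the header is buffered, this variant should not need much RAM even for a large VCF file.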