I am trying to use Python to merge a set of VCF files that cannot be handled by vcftools or similar software because of their non-standard format.
The files are split by chromosome, and I want them in a single file (without the headers ending up in the middle of it).
OutFile = open('AltaiNeanderthal.vcf', 'w')
filelist = ['AltaiNea.hg19_1000g.1.mod_filtered.vcf', 'AltaiNea.hg19_1000g.2.mod_filtered.vcf', 'AltaiNea.hg19_1000g.3.mod_filtered.vcf', 'AltaiNea.hg19_1000g.4.mod_filtered.vcf', 'AltaiNea.hg19_1000g.5.mod_filtered.vcf', 'AltaiNea.hg19_1000g.6.mod_filtered.vcf', 'AltaiNea.hg19_1000g.7.mod_filtered.vcf', 'AltaiNea.hg19_1000g.8.mod_filtered.vcf', 'AltaiNea.hg19_1000g.9.mod_filtered.vcf', 'AltaiNea.hg19_1000g.10.mod_filtered.vcf', 'AltaiNea.hg19_1000g.11.mod_filtered.vcf', 'AltaiNea.hg19_1000g.12.mod_filtered.vcf', 'AltaiNea.hg19_1000g.13.mod_filtered.vcf', 'AltaiNea.hg19_1000g.14.mod_filtered.vcf', 'AltaiNea.hg19_1000g.15.mod_filtered.vcf', 'AltaiNea.hg19_1000g.16.mod_filtered.vcf', 'AltaiNea.hg19_1000g.17.mod_filtered.vcf', 'AltaiNea.hg19_1000g.18.mod_filtered.vcf', 'AltaiNea.hg19_1000g.19.mod_filtered.vcf', 'AltaiNea.hg19_1000g.20.mod_filtered.vcf', 'AltaiNea.hg19_1000g.21.mod_filtered.vcf', 'AltaiNea.hg19_1000g.22.mod_filtered.vcf']
for infile in filelist:
    openfile = open(infile, 'r')
    for Line in infile:
        Line=Line.strip('\n')
        if Line[0] != '#':
            OutFile.write(Line + '\n')
    openfile.close()
OutFile.close()
My output is basically every character of the file names, one per line:
A
l
t
a
i
N
e
a
.
h
Does anyone know why it is doing this? Thanks!
(I know that my code, even if it worked, would also omit the first header; I plan on adding that back manually.)
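The symptom described above is consistent with the inner loop iterating over `infile`, which is the file name (a string), rather than over `openfile`, the open file handle: looping over a string yields one character at a time. Here is a minimal sketch of the difference. It uses an in-memory `io.StringIO` object as a stand-in for a real file, and the sample VCF lines are made up for illustration:

```python
import io

name = 'AltaiNea.hg19_1000g.1.mod_filtered.vcf'

# Iterating over a string yields single characters.
chars = [c for c in name][:5]
print(chars)  # ['A', 'l', 't', 'a', 'i']

# Iterating over a file-like object yields whole lines.
# io.StringIO stands in for open(name); the content is hypothetical.
handle = io.StringIO('##fileformat=VCFv4.1\n1\t100\t.\tA\tG\n')
records = [line.rstrip('\n') for line in handle if not line.startswith('#')]
print(records)  # ['1\t100\t.\tA\tG']
```

Using `line.startswith('#')` instead of `Line[0] != '#'` also avoids an `IndexError` if a file happens to contain an empty line.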
Now get the header from any VCF file and add it to the top of the Clean.vcf file.
Yeah, that, or this should also work, I think (these are both bash shell commands for Linux, by the way):
And if you want only one of the headers, just pick that specific file instead of iterating over the folder.
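If you would rather stay in Python, the same idea (keep the header from one file only, drop it from the rest) might look roughly like the sketch below. The function name `merge_vcfs` is made up for illustration, and it assumes that every header line starts with `#` and that the files are plain, uncompressed text:

```python
def merge_vcfs(paths, out_path):
    """Concatenate VCF-like text files, keeping header lines from the first file only."""
    with open(out_path, 'w') as out:
        for index, path in enumerate(paths):
            with open(path) as handle:
                for line in handle:
                    # Header lines start with '#'; skip them in every file but the first.
                    if line.startswith('#') and index > 0:
                        continue
                    # Guard against a last line that lacks a trailing newline,
                    # so records from adjacent files never run together.
                    out.write(line if line.endswith('\n') else line + '\n')
```

With the file names from the question, this could be called as `merge_vcfs(['AltaiNea.hg19_1000g.%d.mod_filtered.vcf' % i for i in range(1, 23)], 'AltaiNeanderthal.vcf')`. The `with` blocks close each file automatically, so no explicit `close()` calls are needed.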
Just a quick question: do you have access to a Linux system?