How to remove lines from a vcf file
22 months ago
Khaleesi95 ▴ 40

Hi guys, I'm really new to bash. I have a vcf file, and these are the first 37 rows of the file:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20230109
##source=PLINKv1.90
##contig=<ID=1,length=248916898>
##contig=<ID=2,length=242078573>
##contig=<ID=3,length=198053357>
##contig=<ID=4,length=189951624>
##contig=<ID=5,length=181263937>
##contig=<ID=6,length=170599915>
##contig=<ID=7,length=159278281>
##contig=<ID=8,length=145067296>
##contig=<ID=9,length=138123972>
##contig=<ID=10,length=133620800>
##contig=<ID=11,length=135031155>
##contig=<ID=12,length=133185624>
##contig=<ID=13,length=114325599>
##contig=<ID=14,length=106879457>
##contig=<ID=15,length=101857170>
##contig=<ID=16,length=90103688>
##contig=<ID=17,length=83089846>
##contig=<ID=18,length=80257298>
##contig=<ID=19,length=58572940>
##contig=<ID=20,length=64281111>
##contig=<ID=21,length=46664836>
##contig=<ID=22,length=50772965>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##bcftools_viewVersion=1.15.1+htslib-1.15.1
##bcftools_viewCommand=view -S /data/....
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       843942  1_843942        A       G       .       .       PR;AC=4832;AN=36398
1       864627  1_864627        C       T       .       .       PR;AC=407;AN=36418
1       874496  1_874496        A       G       .       .       PR;AC=4169;AN=36388
1       900119  1_900119        A       G       .       .       PR;AC=8481;AN=36354
1       903352  1_903352        G       A       .       .       PR;AC=1110;AN=36380 

Is it possible to remove the first 31 rows of the file, so that I get only this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       843942  1_843942        A       G       .       .       PR;AC=4832;AN=36398
1       864627  1_864627        C       T       .       .       PR;AC=407;AN=36418
1       874496  1_874496        A       G       .       .       PR;AC=4169;AN=36388
1       900119  1_900119        A       G       .       .       PR;AC=8481;AN=36354
1       903352  1_903352        G       A       .       .       PR;AC=1110;AN=36380

Perhaps awk would work, but I'm really new to bash and I'm not sure how to build the command correctly. Thank you!

22 months ago
Sej Modha 5.3k

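If you want to use command-line tools, you could use a simple grep. `-v` prints only the lines that do not match the pattern, so every line starting with `##` is dropped while the `#CHROM` header line (a single `#`) is kept: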

grep -v '^##' my_file.vcf
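Since the question mentions awk, here is a minimal sketch of the same filter in awk, plus a fixed-count alternative with `tail`. The file names (`my_file.vcf`, `my_file_body.txt`, `my_file_tail.txt`) and the short demo header are made up so the snippet runs on its own; on real data, drop the `printf` part and point the commands at your own file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo input: two "##" meta lines, the #CHROM header line, two records.
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##source=PLINKv1.90' \
  $'#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO' \
  $'1\t843942\t1_843942\tA\tG\t.\t.\tPR;AC=4832;AN=36398' \
  $'1\t864627\t1_864627\tC\tT\t.\t.\tPR;AC=407;AN=36418' \
  > my_file.vcf

# awk equivalent of `grep -v '^##'`: print only lines that do NOT start with "##".
awk '!/^##/' my_file.vcf > my_file_body.txt

# Fixed-count alternative: print from line 3 onward, i.e. skip the first 2 lines.
# For the file in the question this would be `tail -n +32` (skip 31 lines),
# but the count has to be adjusted for every file.
tail -n +3 my_file.vcf > my_file_tail.txt
```

Both outputs are identical here, but the `^##` pattern keeps working when the number of meta-information lines changes (for example, another `bcftools view` run appends further `##bcftools_view...` lines), whereas `tail -n +32` only fits the exact file shown in the question.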
The resulting file is no longer in VCF format (take note, OP).
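One way to keep the option of getting a valid VCF back (a sketch, not something proposed in the thread) is to save the `##` meta-information lines to a separate file before stripping them, so the original can be rebuilt with `cat`. The file names and the demo input are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo VCF (same layout as in the question, shortened).
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##source=PLINKv1.90' \
  $'#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO' \
  $'1\t843942\t1_843942\tA\tG\t.\t.\tPR;AC=4832;AN=36398' \
  > my_file.vcf

# Keep the "##" meta-information lines in their own file...
grep '^##' my_file.vcf > my_file_meta.txt
# ...and the #CHROM line plus records (not a valid VCF on its own) in another.
grep -v '^##' my_file.vcf > my_file_body.txt

# Rebuild a VCF by putting the meta lines back in front of the body.
# This relies on all "##" lines preceding the #CHROM line, as the VCF layout requires.
cat my_file_meta.txt my_file_body.txt > my_file_restored.vcf
```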
