3.9 years ago
MAPK
I have a very large VCF.vcf.gz file. I want to remove chr from column 1 and column 3.
I tried:
zcat ${VCF} | awk '{if($0 !~ /^#/) gsub(/chr/,""); print}' | bgzip -c > ${VCF%*..*}-with_no_chr.vcf.gz && tabix -s1 -b2 -e2 ${VCF%*..*}-with_no_chr.vcf.gz
Is there a better way to do it?
Example input:
##contig=<ID=HLA-DRB1*>
##reference=file:////Homo_sapiens_assembly38.fasta
##source=ApplyVQSR
##source=SelectVariants
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 69511 chr1:69511:A:G A G 11157600 PASS
chr1 69536 chr1:69536:C:A C A 581.98 PASS
chr1 69536 chr1:69536:C:T C T 581.98 PASS
Result I want:
##contig=<ID=HLA-DRB1*>
##reference=file:////Homo_sapiens_assembly38.fasta
##source=ApplyVQSR
##source=SelectVariants
#CHROM POS ID REF ALT QUAL FILTER INFO
1 69511 1:69511:A:G A G 11157600 PASS
1 69536 1:69536:C:A C A 581.98 PASS
1 69536 1:69536:C:T C T 581.98 PASS
Did you try changing the chromosome notation?
It only renamed the chr in the first column, not the chr:position in the ID column.
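One possible way to handle both columns in a single pass is to restrict the substitution to fields 1 and 3 instead of running gsub over the whole line, so that any "chr" text in INFO or other columns is left alone. The sketch below assumes the VCF is tab-delimited (as the format requires), that each ID starts with the chr prefix as in the example above, and that any chr-prefixed ##contig header lines should be renamed to match. The function name strip_chr is made up for illustration.

```shell
set -euo pipefail

# Hypothetical helper: strip a leading "chr" from CHROM (column 1) and ID (column 3) only.
strip_chr() {
  awk 'BEGIN { FS = OFS = "\t" }
       # Keep chr-prefixed contig names in the header consistent with the renamed records.
       /^##contig=<ID=chr/ { sub(/ID=chr/, "ID=") }
       # Print all header lines and skip the record-level edits for them.
       /^#/ { print; next }
       # Data lines: edit only the first and third fields.
       { sub(/^chr/, "", $1); sub(/^chr/, "", $3); print }'
}

# Usage sketch (assumes bgzip and tabix from htslib are on PATH and VCF holds the input path):
#   zcat "$VCF" | strip_chr | bgzip -c > "${VCF%.vcf.gz}-with_no_chr.vcf.gz"
#   tabix -p vcf "${VCF%.vcf.gz}-with_no_chr.vcf.gz"
```

If an ID field holds several semicolon-separated IDs, only the first one would lose its prefix with this sketch, so that case would need a different pattern.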