As mentioned above, bcftools is the best way to do this. If you want a scripted version, here is mine.
Assuming you have a tab-separated matching_table.txt file like this:
gi|996703411|ref|NW_015379183.1| chr1
gi|996703411|ref|NW_015379184.1| chr2
gi|996703411|ref|NW_015379185.1| chr3
gi|996703411|ref|NW_015379186.1| chr4
gi|996703411|ref|NW_015379187.1| chr5
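Since bcftools is the recommended route, here is a minimal sketch of driving it from Python with the same table. It assumes bcftools is installed and on your PATH, and relies on `bcftools annotate --rename-chrs`, which takes a file of "old_name new_name" pairs, one per line. The function names and the default table filename are my own placeholders, not part of bcftools.

```python
import subprocess


def build_rename_command(vcf_in, vcf_out, table="matching_table.txt"):
    """Build the bcftools command line as an argument list (hypothetical helper)."""
    return ["bcftools", "annotate", "--rename-chrs", table, "-o", vcf_out, vcf_in]


def rename_with_bcftools(vcf_in, vcf_out, table="matching_table.txt"):
    """Run bcftools; raises CalledProcessError if bcftools exits with an error."""
    subprocess.run(build_rename_command(vcf_in, vcf_out, table), check=True)
```

Calling `rename_with_bcftools("your_vcf_file.vcf", "your_new_vcf_file.vcf")` should then write the renamed VCF, provided the table has no malformed lines.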
Pure Python version:
### Build a dictionary from matching_table.txt
match_dict = {}
### Open the matching table
with open("matching_table.txt", 'r') as match_f:
    ### Each line becomes one key/value item in the dictionary
    for line in match_f:
        ### Skip blank lines
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        gi_notation = fields[0]
        chr_notation = fields[1]
        ### Only add the key if it is not already in the dictionary
        if gi_notation not in match_dict:
            match_dict[gi_notation] = chr_notation
        else:
            print("Warning, duplicate in matching_table.txt: " + str(gi_notation))
### Open the input VCF and the output VCF ("w" so that re-running does not append to an old result)
with open("your_vcf_file.vcf", 'r') as vcf_f, open("your_new_vcf_file.vcf", "w") as new_vcf_file:
    ### Read it line by line
    for line in vcf_f:
        ### Rename contigs in the VCF header lines
        if line.startswith('##contig'):
            ### Get the chromosome name from ##contig=<ID=name,...>
            headers_chromosome = line.split("=")[2].split(",")[0].rstrip(">\n")
            ### If the chromosome is in the dictionary
            if headers_chromosome in match_dict:
                ### Replace the chromosome name in the line
                line = line.replace("ID=" + headers_chromosome, "ID=" + match_dict[headers_chromosome], 1)
        ### Data lines (everything that is not metadata)
        elif line[0] != '#':
            ### Retrieve the chromosome of the line (first tab-separated column)
            fields = line.split("\t", 1)
            current_chromosome = fields[0]
            ### If the chromosome is in the dictionary
            if current_chromosome in match_dict:
                ### Change the value of the chromosome
                fields[0] = match_dict[current_chromosome]
                line = "\t".join(fields)
            else:
                ### The chromosome is not in the dictionary (I write it as it is, but you can do something else...)
                print("This chromosome is not in my matching_table.txt : " + str(current_chromosome))
        ### Write the line (renamed where applicable, other metadata unchanged)
        new_vcf_file.write(line)
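
If you would rather test the renaming logic without touching files, the per-line work can be pulled out into a function. This is only a sketch of the same idea, not a replacement for bcftools: `rename_vcf_line` is a name I made up, and it assumes contig headers look like `##contig=<ID=name,...>` or `##contig=<ID=name>`.

```python
def rename_vcf_line(line, match_dict):
    """Return line with its contig name translated through match_dict.

    Names missing from match_dict are left unchanged.
    """
    if line.startswith("##contig"):
        # Header line: the name sits between "ID=" and the next "," or ">"
        prefix, sep, rest = line.partition("ID=")
        if not sep:
            return line  # no ID field, nothing to rename
        ends = [i for i in (rest.find(","), rest.find(">")) if i != -1]
        end = min(ends) if ends else len(rest)
        name = rest[:end]
        return prefix + sep + match_dict.get(name, name) + rest[end:]
    if line.startswith("#"):
        # Any other metadata or the #CHROM column header: keep as is
        return line
    # Data line: the chromosome is the first tab-separated column
    chrom, sep, rest = line.partition("\t")
    return match_dict.get(chrom, chrom) + sep + rest
```

You would then loop over the input file and write `rename_vcf_line(line, match_dict)` for each line. Unlike the script above, this version stays silent about chromosomes that are missing from the table, so add a print if you want the warning.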
