I am trying to update an older .vcf file with some new information so that it reflects the changes that have been implemented in the . The sample IDs have changed, as has some of the formatting of the GT, DS, and GP fields (e.g. forward slashes changed to pipe symbols). I have been researching how best to do this with the Python package PyVCF, but it's not entirely clear from the docs (https://pyvcf.readthedocs.io/en/latest/INTRO.html) how to go about it. I tried to create a PyVCF Writer object that copies the template of the new .vcf file (i.e. its metadata and format). I then wanted to write a for loop that iterates over each record in the old .vcf, changes each sample name (based on a pre-existing dict), and modifies the content of the INFO, FORMAT, and sample result sections.
However, PyVCF does not seem to have any tools that make this easy. I found another library called VCFPy (https://vcfpy.readthedocs.io/en/stable), but it does not seem to have any clear-cut tools for this either.
With both packages, I wanted to iterate over the old .vcf file (as a reader object), copy each sample and variant, and modify each accordingly. My code would look roughly like this:
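import vcf  # PyVCF

old_vcf_reader = vcf.Reader(filename='vcf/test/tb.vcf.gz')
# The writer takes its header (metadata, sample names) from the template reader
new_vcf_writer = vcf.Writer(open('/dev/null', 'w'), old_vcf_reader)
for record in old_vcf_reader:
    # update sample names and modify GT formats here
    new_vcf_writer.write_record(record)
new_vcf_writer.close()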
Does anybody know how I can update or modify the content of each record inside the for loop above?
Have you looked at the cyvcf2 documentation? That's my preferred module for working with VCFs.
I have not tried it yet. I will look into it now.
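If the only changes needed are the sample IDs and the GT separator, a plain-text pass over the file may be enough without any VCF library. The sketch below is only an illustration under some assumptions: the file is tab-delimited with the standard nine fixed columns before the samples, `rename_and_phase` and `sample_map` are hypothetical names, and replacing "/" with "|" in GT is taken at face value from the question (note that in the VCF spec "|" means the genotype is phased, so this changes the meaning and not just the formatting). DS and GP values are left untouched.

```python
def rename_and_phase(lines, sample_map):
    """Yield VCF lines with sample IDs renamed and '/' replaced by '|' in GT.

    lines      -- iterable of VCF text lines (e.g. an open file handle)
    sample_map -- dict mapping old sample IDs to new ones (hypothetical name)
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines pass through unchanged
            yield line
        elif line.startswith("#CHROM"):
            cols = line.split("\t")
            # Columns 0-8 are fixed (CHROM..FORMAT); sample IDs start at index 9
            cols[9:] = [sample_map.get(s, s) for s in cols[9:]]
            yield "\t".join(cols)
        else:
            cols = line.split("\t")
            if len(cols) > 9:
                keys = cols[8].split(":")  # FORMAT column, e.g. GT:DS:GP
                if "GT" in keys:
                    gt = keys.index("GT")
                    for i in range(9, len(cols)):
                        fields = cols[i].split(":")
                        if gt < len(fields):
                            fields[gt] = fields[gt].replace("/", "|")
                        cols[i] = ":".join(fields)
            yield "\t".join(cols)


# Small in-memory example; the sample names here are made up
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\told_A\told_B\n",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DS\t0/1:1.0\t1/1:2.0\n",
]
out = list(rename_and_phase(demo, {"old_A": "new_A"}))
print("\n".join(out))
```

For a real file you would pass an open handle (via `gzip.open(path, "rt")` if it is compressed) and write each yielded line to the output followed by a newline. Samples missing from the dict keep their old ID.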