I have run into a problem that seems simple but has been surprisingly hard. I have a group of .g.vcf.gz files that I want to joint genotype with GATK. They were generated with incorrect and sloppy read set names, and I would like to change the name in the header before grouping them together. Unfortunately, I do not have access to a Unix-based machine. I have tried to install Cygwin for samtools or tabix, but my lack of programming experience has prevented me from getting those programs to work. The other option I explored was decompressing the files, editing the VCF file and then converting back to vcf.gz. The processing power of my computer is not great, and the large file size has stumped me.
Can anyone recommend an easy way to change or alter a header?
Hello there.
Well, it seems you have a set of problems instead of just one. I am in the same situation as you, but I have solved a few of them, so here you go:
If you don't have access to a Unix or Linux platform, only Windows, the way to go is to install a "virtual machine" on your computer, which lets you use Unix/Linux as desired on your own Windows machine. Here is a way to start: https://www.storagecraft.com/blog/the-dead-simple-guide-to-installing-a-linux-virtual-machine-on-windows/
It took me a day or two to install it and understand the basics, but once you get it working, it is like having a whole new computer inside the only computer you had before.
After doing that and having your Unix/Linux machine running, you will need to install samtools and bcftools through the terminal. That is not a big deal; you should be able to do it by following the tutorials on the source page here: http://www.htslib.org/
You know the path: download the packages, unpack them, install them, go through the tutorial, etc.
Once you have these programs installed, I have a script that solved the same problem for me on a .bam file.
I am having the same problem as you, trying to change a .vcf.gz header at the moment, and that's how I found your post. I hope my comment is of some help, and I hope we will both be able to solve this tip of the iceberg of our bioinformatics problem, because I think that correcting the headers manually would be dead boring, plus extremely time-consuming.
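For the .vcf.gz case, here is a rough sketch of one way it might be done in plain Python, which also runs on Windows without a virtual machine. It streams the file, so the whole VCF never has to fit in memory, and it only touches the sample names on the #CHROM line. The function names (rename_samples, reheader_stream, reheader_file), the file names and the sample names are made up for illustration. One important assumption to check: Python's gzip module writes ordinary gzip, not the block-compressed (bgzip) format that tabix indexing expects, so the output would most likely still need to be recompressed with bgzip and reindexed before GATK will accept it.

```python
import gzip
import shutil


def rename_samples(chrom_line, mapping):
    """Return the #CHROM header line with sample names swapped per `mapping`."""
    fields = chrom_line.rstrip("\r\n").split("\t")
    # Columns 0-8 are fixed (CHROM .. FORMAT); sample names start at column 9.
    fields[9:] = [mapping.get(name, name) for name in fields[9:]]
    return "\t".join(fields) + "\n"


def reheader_stream(src, dst, mapping):
    """Copy VCF bytes from src to dst, rewriting only the #CHROM line.

    src and dst are binary file-like objects.
    """
    for raw in src:
        if raw.startswith(b"#CHROM"):
            fixed = rename_samples(raw.decode("utf-8"), mapping)
            dst.write(fixed.encode("utf-8"))
            break
        dst.write(raw)  # "##" meta lines are passed through unchanged
    # Everything after the header is copied in chunks, untouched.
    shutil.copyfileobj(src, dst)


def reheader_file(in_path, out_path, mapping):
    """Read a gzip/bgzip VCF and write a plain-gzip copy with renamed samples."""
    # Assumption: the output is plain gzip, NOT bgzip, so it still needs
    # recompressing with bgzip and reindexing before tools that need an index.
    with gzip.open(in_path, "rb") as src, gzip.open(out_path, "wb") as dst:
        reheader_stream(src, dst, mapping)
```

Usage would look something like reheader_file("old.g.vcf.gz", "renamed.g.vcf.gz", {"sloppy_name": "sample_01"}), with your own file and sample names. If you do get bcftools installed, its reheader command with the --samples option does the same job and keeps the compression format intact, so that is probably the safer route.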
Cheers!