I have generated VCF files containing SNPs from mutant genome sequences. Since the mutagen only caused transitions in these genomes, I want to make VCF files that contain only transition mutations/SNPs. I know awk could easily do it. I have started with the following command:
cat file.vcf | awk '$4=="G" && $5=="C"' > transitions.vcf
It generates a file with G>C changes only. However, I need to include 3 other transitions (A>G, C>T and T>C) from the same file in transitions.vcf. I am not sure how to combine all the awk conditions to get one single VCF file. Any help is highly appreciated.
All this would do is extract positions where the REF allele is G and the ALT allele is C, irrespective of your subject's genotype.
It will NOT give you positions where your subject has G > C mutations.
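As for combining the conditions, here is a minimal sketch, assuming a tab-delimited VCF with REF in column 4 and ALT in column 5. Note that G>C is a transversion; the four transitions are A>G, G>A, C>T and T>C, so the sketch uses G>A rather than G>C. The function name filter_transitions is made up for illustration. It only looks at the REF and ALT columns, so the caveat above still applies: it does not check the genotype (GT) of any sample, and multi-allelic ALT values such as "A,T" will not match.

```shell
# Hypothetical helper: reads a VCF on stdin, writes header lines plus
# transition records (REF>ALT in A>G, G>A, C>T, T>C) to stdout.
filter_transitions() {
  awk -F'\t' '
    /^#/ { print; next }   # pass VCF header lines through unchanged
    ($4=="A" && $5=="G") || ($4=="G" && $5=="A") ||
    ($4=="C" && $5=="T") || ($4=="T" && $5=="C")
  '
}
```

It could then be used as: filter_transitions < file.vcf > transitions.vcf. Keeping the header lines matters because most downstream tools reject a VCF without them.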