I am generating a gene sequence for a sample in a VCF file, using a standard reference genome. The command I found on this site for generating the sequence works well:
samtools faidx ref.fasta chrom:start-stop | bcftools consensus -s sample my.vcf
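Indels need more care than SNPs when building a consensus. The minimal Python sketch below shows what applying variants to a reference segment involves. It is not how bcftools consensus is implemented. It assumes simplified records of (POS, REF, ALT) with 1-based POS as in VCF, a single ALT per record, a reference string that starts at position 1, and no overlapping records. `apply_variants` is a hypothetical helper name.

```python
from typing import List, Tuple

# Simplified variant: (pos, ref, alt), pos 1-based as in VCF.
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Apply sorted, non-overlapping variants to a reference string.

    Hypothetical helper for illustration only: assumes the reference
    begins at position 1 and that each REF matches the reference.
    """
    out = []
    cursor = 0  # 0-based index of the next reference base not yet copied
    for pos, ref, alt in sorted(variants):
        start = pos - 1  # convert 1-based VCF POS to a 0-based index
        if start < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous one")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF mismatch at {pos}")
        out.append(reference[cursor:start])  # unchanged bases before the variant
        out.append(alt)                      # SNP, insertion or deletion bases
        cursor = start + len(ref)            # skip the REF bases just replaced
    out.append(reference[cursor:])           # remaining reference tail
    return "".join(out)


# A SNP at 2, a 2-bp deletion at 5 and a 2-bp insertion at 9.
print(apply_variants("ACGTACGTAC", [(2, "C", "T"), (5, "ACG", "A"), (9, "A", "AGG")]))
# → ATGTATAGGC
```

An indel changes the length of the sequence, so everything downstream of it shifts. A SNP that falls inside an indel's REF bases can no longer be applied independently. The sketch simply refuses such overlaps, while a real tool has to handle them somehow.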
However, I have separate SNP and indel VCF files generated with GATK UnifiedGenotyper. I would like to merge them so that I can generate a consensus sequence from the reference that includes the indels, but I am having trouble finding information on what common tools do when joining such VCF files.
For example, GATK CombineVariants.
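Conceptually, joining a SNP file and an indel file from the same samples is a position-sorted merge of two record streams. The following Python sketch assumes both inputs are already sorted and uses simplified (chrom, pos, ref, alt, source) tuples. It takes a hypothetical `contig_order` mapping so that chromosomes sort in reference order rather than alphabetically. It only illustrates the idea and does not reproduce what CombineVariants or any other tool does with the records.

```python
import heapq
from typing import Dict, Iterable, Iterator, Tuple

# Simplified record: (chrom, pos, ref, alt, source), pos 1-based as in VCF.
Record = Tuple[str, int, str, str, str]


def merge_sorted(snps: Iterable[Record], indels: Iterable[Record],
                 contig_order: Dict[str, int]) -> Iterator[Record]:
    """Lazily merge two position-sorted record streams into one sorted stream."""
    def key(rec: Record) -> Tuple[int, int]:
        # Order by contig rank first, then by position within the contig.
        return (contig_order[rec[0]], rec[1])

    return heapq.merge(snps, indels, key=key)


snps = [("chr1", 100, "A", "G", "snp"), ("chr1", 250, "C", "T", "snp"),
        ("chr2", 10, "G", "A", "snp")]
indels = [("chr1", 180, "TA", "T", "indel"), ("chr2", 5, "C", "CAT", "indel")]
for rec in merge_sorted(snps, indels, {"chr1": 0, "chr2": 1}):
    print(rec)
```

A plain merge like this keeps every record. A SNP and an indel touching the same bases therefore both end up in the output.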
Any ideas would be appreciated.
Thanks
And what exactly do you want to know?
The PI I work for would like to generate strain-specific gene sequences that include both SNPs and indels, for designing PCR primers and similar uses.
What happens when I combine the SNP and indel VCFs and some sites overlap?
1) Try it and see. 2) If the SNP and the indel share the same REF allele, I would say GATK produces only one variant; otherwise, two separate variants will be created.
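You may want to know in advance whether your files contain overlapping sites. A small check like the following can list them. It is a Python sketch on simplified (chrom, pos, ref, alt) tuples and does not reproduce GATK's logic. The labels are hypothetical and only mirror the distinction in the reply: 'same_ref' marks a SNP and an indel with the same position and REF allele. 'overlap' marks an indel whose REF bases cover the SNP position without matching it.

```python
from typing import List, Tuple

Record = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), pos 1-based


def ref_span(rec: Record) -> Tuple[int, int]:
    """Return the inclusive 1-based reference interval covered by REF."""
    _, pos, ref, _ = rec
    return pos, pos + len(ref) - 1


def find_overlaps(snps: List[Record],
                  indels: List[Record]) -> List[Tuple[Record, Record, str]]:
    """Pair each SNP with every indel whose REF interval intersects it."""
    hits = []
    for snp in snps:
        s_start, s_end = ref_span(snp)
        for indel in indels:
            if snp[0] != indel[0]:  # different chromosome, cannot overlap
                continue
            i_start, i_end = ref_span(indel)
            if s_start <= i_end and i_start <= s_end:  # intervals intersect
                # Same (pos, ref) means both could be written as one record.
                label = "same_ref" if snp[1:3] == indel[1:3] else "overlap"
                hits.append((snp, indel, label))
    return hits


snps = [("chr1", 100, "A", "G"), ("chr1", 300, "T", "C")]
indels = [("chr1", 100, "A", "AT"), ("chr1", 299, "GTT", "G")]
for snp, indel, label in find_overlaps(snps, indels):
    print(label, snp, indel)
```

The nested loop is quadratic. That is fine for a single gene region, but for whole-genome files you would want to sort both lists and sweep through them together instead.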
Thank you for the help.
Furthermore, GATK CombineVariants has a parameter for prioritizing which input file's genotypes are used when the inputs conflict.
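The prioritization idea can be sketched in Python as follows. This is a simplified analogue under stated assumptions, not CombineVariants' actual behaviour. Records are (chrom, pos, ref, alt) tuples, and a 'site' is (chrom, pos, ref). When two sources report the same site, the record from the source listed first in the hypothetical `priority` list wins outright, instead of having its genotypes combined.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), pos 1-based
Site = Tuple[str, int, str]         # (chrom, pos, ref)


def merge_with_priority(sources: Dict[str, Iterable[Record]],
                        priority: List[str]) -> List[Record]:
    """Keep one record per (chrom, pos, ref) site.

    When several sources report the same site, the record from the source
    that appears earliest in `priority` is kept (hypothetical helper).
    """
    rank = {name: i for i, name in enumerate(priority)}
    chosen: Dict[Site, Tuple[int, Record]] = {}
    for name, records in sources.items():
        for rec in records:
            site = (rec[0], rec[1], rec[2])
            current = chosen.get(site)
            # Replace the stored record only if this source ranks higher.
            if current is None or rank[name] < current[0]:
                chosen[site] = (rank[name], rec)
    # Sort by chromosome name, then position (lexical contig order, for simplicity).
    kept = [rec for _, rec in chosen.values()]
    return sorted(kept, key=lambda r: (r[0], r[1]))


sources = {
    "snps": [("chr1", 100, "A", "G"), ("chr1", 150, "C", "T")],
    "indels": [("chr1", 100, "A", "AT"), ("chr1", 120, "GA", "G")],
}
for rec in merge_with_priority(sources, priority=["indels", "snps"]):
    print(rec)
```

Keeping whole records from the preferred source is the simplest policy. Listing the indel file first, as in the example, keeps the indel allele at a shared site.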