I have numerous VCF files covering an HIV genomic region, and I want to retrieve the consensus sequence for just a ~50 nt region of the alignment so I can feed it to a TF binding prediction program. Normally I'd use vcf-consensus to output the whole region and then slice out the part I want. However, this region contains numerous indels (of varying sizes) that are important. They shift the consensus coordinates relative to the reference, so I don't want to rely on slicing at fixed positions.
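One way to avoid the coordinate-shift problem is to cut the reference to the region first and only then apply the variants, so indels inside the region change its length naturally. A minimal Python sketch of that idea follows. It is not vcf-consensus itself, and it assumes non-overlapping variants with a single ALT allele each. Variants whose REF span crosses a region boundary are simply skipped, which is a simplification you may need to revisit. The function name `region_consensus` and the toy sequence are hypothetical.

```python
from typing import List, Tuple

# A variant as (pos, ref, alt): pos is the 1-based reference position of the
# first REF base, as in the VCF POS column.
Variant = Tuple[int, str, str]


def region_consensus(reference: str, variants: List[Variant],
                     start: int, end: int) -> str:
    """Consensus for reference positions start..end (1-based, inclusive).

    Hypothetical helper. Assumes non-overlapping, single-ALT variants and
    skips any variant whose REF span is not entirely inside the region.
    """
    # Work on the region only; offsets below are relative to `start`.
    seq = reference[start - 1:end]
    # Apply from right to left so indels do not shift offsets still to come.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        ref_end = pos + len(ref) - 1
        if pos < start or ref_end > end:
            continue  # outside the region or straddling a boundary
        offset = pos - start
        if seq[offset:offset + len(ref)] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        seq = seq[:offset] + alt + seq[offset + len(ref):]
    return seq


if __name__ == "__main__":
    reference = "ACGTACGTACGTACGT"  # toy sequence, not HIV
    variants = [(3, "G", "GTTT"),   # 3 nt insertion after position 3
                (8, "TA", "T"),     # 1 nt deletion of position 9
                (14, "C", "A")]     # SNP outside the region below
    print(region_consensus(reference, variants, 2, 12))
```

With the toy data, the 11 nt reference region 2..12 becomes a 13 nt consensus (+3 from the insertion, -1 from the deletion), and the SNP at position 14 is ignored because it lies outside the region.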
I can't seem to find any flag in the vcf-consensus tool that lets me limit the output to a region.
Any suggestions?
I can easily filter the VCF to get my desired region. But if there are insertions, wouldn't bedtools intersect just give me the 50 nt region and not the 50 nt plus the inserted bases?
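On the insertion worry: in a VCF, an insertion is a single record anchored at one reference position, with the inserted bases carried in ALT. Filtering by position therefore selects whole records rather than bases, so a kept insertion still carries its full ALT sequence. The region only grows when those records are applied to the sequence. The sketch below filters uncompressed VCF text in plain Python. It does not describe how bedtools handles VCF input. It assumes one ALT allele per record and keeps only records whose REF span lies fully inside the region. The contig name `HXB2` and the function name `variants_in_region` are hypothetical.

```python
from typing import Iterable, List, Tuple

# (pos, ref, alt) with 1-based POS, as in the VCF POS column.
Variant = Tuple[int, str, str]


def variants_in_region(vcf_lines: Iterable[str], chrom: str,
                       start: int, end: int) -> List[Variant]:
    """Keep records whose REF span lies within chrom:start-end (1-based, inclusive).

    Hypothetical helper for uncompressed VCF text with one ALT per record.
    """
    kept: List[Variant] = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        rec_chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if rec_chrom != chrom:
            continue
        # The whole record is kept, so an insertion's ALT bases come along intact.
        if start <= pos and pos + len(ref) - 1 <= end:
            kept.append((pos, ref, alt))
    return kept


if __name__ == "__main__":
    lines = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "HXB2\t3\t.\tG\tGTTT\t.\tPASS\t.\n",
        "HXB2\t8\t.\tTA\tT\t.\tPASS\t.\n",
        "HXB2\t14\t.\tC\tA\t.\tPASS\t.\n",
    ]
    print(variants_in_region(lines, "HXB2", 2, 12))
```

Here the insertion at position 3 is kept with its full ALT `GTTT`, even though it adds bases beyond the nominal region length. Whether the final sequence includes them depends on how the kept records are applied, not on the filter.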
Will, how did you solve this? How did you filter the desired region from a .vcf? Thanks!