I have been trying to generate FASTA sequences for a region from a multi-sample VCF file and a reference genome. The VCF contains 70 diploid individuals, and what I ultimately want is 140 sequences, two for each sample. An output format such as a multiple sequence alignment would also work for me. Are there any scripts or tools that can do this?
I have tried FastaAlternateReferenceMaker (from GATK), but it only gives me a single consensus sequence for all the samples.
Any help will be greatly appreciated!
Extract the VCF for each sample with a loop and
bcftools view -s ${SAMPLENAME}
Thanks for the reply. I understand that I could iterate the process over single-sample VCFs, but do you have any suggestions for generating two FASTA sequences per sample (one per haplotype) from the VCF?