23 months ago
William
How to convert a multi-sample VCF (possibly with predicted gene effects) to protein FASTA/MSA
Input:
- VCF (possibly with gene/protein effects already predicted, e.g. via SnpEff)
- GFF3 (for the reference protein sequences, and possibly to predict effects)
Output:
- protein FASTA: 1 or 2 sequences per sample in the VCF (2 for heterozygous samples)
Is there any tool that can do this, either on the command line or in Python/R code?
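A minimal sketch of the requested transformation, assuming a much simpler setting than a real VCF + GFF3 pipeline: a single forward-strand CDS is already extracted, variants are given as hypothetical `(pos, ref, alt)` tuples with 0-based positions relative to that CDS, variants do not overlap, and genotypes are phased biallelic strings like `0|1`. Coordinate mapping from the GFF3, reverse-strand genes, and splicing are deliberately not handled. The helper names (`apply_variants`, `sample_proteins`, `to_fasta`) and the toy data are illustrative only.

```python
import itertools

# Standard genetic code; codons enumerated with bases ordered T, C, A, G
# (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a CDS up to the first stop codon (stop not included)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # 'X' for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_variants(ref_cds, variants):
    """Apply hypothetical (pos, ref, alt) variants (0-based CDS positions)."""
    seq = ref_cds
    # Right-to-left so positions of not-yet-applied variants stay valid
    # even when an indel changes the sequence length.
    for pos, ref, alt in sorted(variants, reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF {ref!r} does not match CDS at position {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq


def sample_proteins(ref_cds, variants, genotypes):
    """Return {sample: [protein]} if both haplotypes agree, else [hap1, hap2]."""
    result = {}
    for sample, gts in genotypes.items():
        haps = []
        for h in (0, 1):
            # Phased GT such as '0|1': allele h == '1' means this haplotype carries ALT.
            carried = [v for v, gt in zip(variants, gts) if gt.split("|")[h] == "1"]
            haps.append(translate(apply_variants(ref_cds, carried)))
        result[sample] = haps if haps[0] != haps[1] else haps[:1]
    return result


def to_fasta(proteins):
    """Format {sample: [seq, ...]} as protein FASTA, one record per haplotype."""
    lines = []
    for sample, seqs in proteins.items():
        for i, seq in enumerate(seqs, start=1):
            lines += [f">{sample}_hap{i}", seq]
    return "\n".join(lines) + "\n"


# Hypothetical toy data: ATG GCT TGG TAA translates to "MAW".
REF_CDS = "ATGGCTTGGTAA"
VARIANTS = [(3, "G", "A"), (8, "G", "A")]  # GCT->ACT (A->T), TGG->TGA (W->stop)
GENOTYPES = {"s1": ["0|0", "0|0"], "s2": ["0|1", "1|1"], "s3": ["1|1", "0|0"]}

proteins = sample_proteins(REF_CDS, VARIANTS, GENOTYPES)
print(to_fasta(proteins), end="")
```

Here `s1` and `s3` each get one record because both haplotypes translate identically, while `s2` gets two (`MA` and `MT`). Note that this collapses to one sequence whenever the two proteins are identical, so a sample heterozygous only at a synonymous site also yields a single record; the records can then be fed to any MSA tool.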
This only seems to create the protein sequence per variant, not per sample (i.e. with multiple variant effects combined according to the sample's genotype(s)).
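To illustrate why per-variant predictions are not enough, here is a small sketch with two hypothetical SNVs falling in the same codon. Each one predicted on its own gives a different amino acid than the two together on one haplotype. The codon and edits are made-up example values.

```python
import itertools

# Standard genetic code; bases ordered T, C, A, G (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def codon_aa(codon, edits):
    """Amino acid after applying {offset: base} edits (offsets 0..2) to a codon."""
    bases = list(codon)
    for offset, base in edits.items():
        bases[offset] = base
    return CODON_TABLE["".join(bases)]


REF_CODON = "GCT"  # Ala
VAR_A = {0: "A"}   # hypothetical SNV at codon position 1: GCT -> ACT
VAR_B = {1: "A"}   # hypothetical SNV at codon position 2: GCT -> GAT

per_variant = [codon_aa(REF_CODON, VAR_A), codon_aa(REF_CODON, VAR_B)]
combined = codon_aa(REF_CODON, {**VAR_A, **VAR_B})
print("per-variant:", per_variant)  # per-variant: ['T', 'D']
print("same haplotype:", combined)  # same haplotype: N
```

Annotating each variant separately reports Ala>Thr and Ala>Asp, but a haplotype carrying both actually encodes Asn (AAT).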
If you've got phased genotypes, then you want Haplosaurus.
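Phasing matters because, without it, heterozygous sites cannot be assigned to haplotypes: with k unphased heterozygous sites there are 2^(k-1) possible haplotype pairs. The sketch below (a hypothetical helper, not part of any tool mentioned here) enumerates them for biallelic `0/1`-style genotypes, deliberately ignoring any phase information.

```python
import itertools


def possible_haplotype_pairs(genotypes):
    """All distinct (hap1, hap2) allele tuples consistent with unphased GTs like '0/1'."""
    # Phase is ignored on purpose: '|' is treated the same as '/'.
    site_alleles = [tuple(gt.replace("|", "/").split("/")) for gt in genotypes]
    pairs = set()
    # For each site, choose which allele goes on haplotype 1; the other goes on haplotype 2.
    for flips in itertools.product((0, 1), repeat=len(site_alleles)):
        hap1 = tuple(a[f] for a, f in zip(site_alleles, flips))
        hap2 = tuple(a[1 - f] for a, f in zip(site_alleles, flips))
        # (hap1, hap2) and (hap2, hap1) describe the same diploid sample.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


print(possible_haplotype_pairs(["0/1", "0/1"]))
print(len(possible_haplotype_pairs(["0/1"] * 3)))  # 4 == 2 ** (3 - 1)
```

Homozygous sites add no ambiguity, so only the heterozygous ones count towards k; with phased `0|1` calls the pair is fixed and a single protein per haplotype can be produced.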