I have multiple VCFs from single-cell RNA-seq data, and I want to get peptide sequences from these files. From searching, I found that you can annotate the VCF with Ensembl's VEP to get amino acid/protein information. However, I need a FASTA file as output for downstream analysis. I also found that GATK's FastaAlternateReferenceMaker can generate a FASTA file from a VCF.
Can I use the VCF output by VEP as the input to FastaAlternateReferenceMaker to get the FASTA I need? I am not sure whether I should pass the entire VCF or only the protein information. Please help me better understand this procedure.
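When VEP is run with VCF output, its annotations are stored in the INFO column as a CSQ field. One entry per allele/transcript is separated by commas, and the sub-fields are separated by "|" in the order given in the CSQ header line. A minimal sketch, assuming the VCF was annotated this way, that pulls the protein-level changes straight out of the annotated VCF. It uses the VEP sub-field names Feature, Protein_position and Amino_acids. The function names are hypothetical, and the test header below is a shortened illustration; a real VEP header lists many more sub-fields.

```python
import re
from typing import Iterator, List, Tuple

def csq_fields(header_lines: List[str]) -> List[str]:
    """Return the ordered CSQ sub-field names from the VEP ##INFO=<ID=CSQ,...> header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            # VEP writes the sub-field order after "Format: " inside the Description string
            m = re.search(r'Format: ([^"]+)"', line)
            if m:
                return m.group(1).split("|")
    raise ValueError("no CSQ header found; was the VCF annotated by VEP with VCF output?")

def protein_changes(vcf_lines: List[str]) -> Iterator[Tuple[str, str, str, str, str]]:
    """Yield (chrom, pos, transcript, protein_position, amino_acids) for coding changes.

    vcf_lines must be a list (it is scanned twice: once for headers, once for records).
    """
    fields = csq_fields([l for l in vcf_lines if l.startswith("##")])
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and the #CHROM column header
        cols = line.rstrip("\n").split("\t")
        for item in cols[7].split(";"):  # column 8 is INFO
            if not item.startswith("CSQ="):
                continue
            for entry in item[len("CSQ="):].split(","):
                rec = dict(zip(fields, entry.split("|")))
                # Amino_acids is empty for non-coding consequences (e.g. intron_variant)
                if rec.get("Amino_acids"):
                    yield (cols[0], cols[1], rec.get("Feature", ""),
                           rec.get("Protein_position", ""), rec["Amino_acids"])
```

For a bgzipped VCF, the lines could be read with `gzip.open(path, "rt")` and passed as a list. This sketch only reports what VEP already computed; it does not build full sequences.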
With ENSP IDs you will get the reference protein sequence, though, correct?
Yes, correct.
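Since the ENSP sequence is the reference protein, a variant peptide would have to be derived by applying the change to it. A minimal sketch, assuming a single-residue VEP Amino_acids value such as "R/W" and an integer 1-based Protein_position. Indels, frameshifts and position ranges like "12-13" are deliberately not handled. The function names and the example sequence are hypothetical.

```python
def apply_missense(ref_protein: str, protein_position: int, amino_acids: str) -> str:
    """Apply a single-residue substitution (e.g. "R/W") at a 1-based protein position."""
    if "/" not in amino_acids:
        # VEP reports synonymous changes as a single residue (e.g. "L"): nothing to change
        return ref_protein
    ref_aa, alt_aa = amino_acids.split("/")
    if len(ref_aa) != 1 or len(alt_aa) != 1:
        raise ValueError("only single-residue substitutions are handled in this sketch")
    idx = protein_position - 1
    if ref_protein[idx] != ref_aa:
        # Guards against pairing the variant with the wrong ENSP sequence/version
        raise ValueError(f"reference mismatch at {protein_position}: {ref_protein[idx]} != {ref_aa}")
    return ref_protein[:idx] + alt_aa + ref_protein[idx + 1:]

def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record, wrapped at `width` residues per line."""
    lines = [f">{header}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"

print(to_fasta("hypothetical_ENSP R3W", apply_missense("MKRLV", 3, "R/W")), end="")
```

The reference check matters because Protein_position refers to the specific transcript/protein VEP annotated. Using a different ENSP version can silently shift positions.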