Dear Biostars,
I have a question about creating VCF (Variant Call Format) files.
Does anyone know of a tool that would allow me to turn a multiple sequence alignment (containing a reference and several variants) into a VCF file?
thanks!
EDIT:
I have a multiple sequence alignment of several cloned papillomaviruses. We know that the sequence of each individual genome is correct, i.e. all variations between the reference and these additional sequences represent naturally occurring SNPs (and not sequencing errors). I would like to extract the SNPs (and indels) from this alignment and create a VCF file.
I hope this clarifies the problem!
thanks again
duplicate of Getting A Vcf File From A Fasta Alignment
This was asked 2.6 years ago. :p
search the website for "SNP calling"
SNP calling is a little bit different from what I am looking for. Calling implies a certain threshold before something is considered a SNP, and it returns a level of confidence for each identified SNP. The sequences I am using are confirmed variants, i.e. I know that each variation is real. I would like to "simply" create a VCF file containing all differences between the sequences.
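To make the idea concrete, here is a minimal sketch of what such a "simple" conversion could look like in Python. It is not an existing tool: the function name `msa_to_vcf_lines`, the `chrom` label and the input layout (a dict of already-aligned, equal-length sequences) are all assumptions made for illustration. It only reports SNPs. Columns where the reference has a gap are skipped, and a gap or ambiguity code in a sample is written as a missing genotype, so indels would still need separate handling.

```python
def msa_to_vcf_lines(aligned, ref_name, chrom="ref"):
    """Return VCF lines (header + SNP records) for an aligned set of sequences.

    aligned  : dict mapping sequence name -> aligned sequence (hypothetical layout)
    ref_name : key of the reference sequence in `aligned`
    chrom    : value written to the CHROM column (assumed label)
    """
    ref = aligned[ref_name].upper()
    samples = [name for name in aligned if name != ref_name]
    if any(len(seq) != len(ref) for seq in aligned.values()):
        raise ValueError("all aligned sequences must have the same length")

    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + "\t".join(samples),
    ]

    pos = 0  # 1-based coordinate in the ungapped reference
    for col, ref_base in enumerate(ref):
        if ref_base == "-":
            continue  # insertion relative to the reference: not handled here
        pos += 1
        if ref_base not in "ACGT":
            continue  # ambiguous reference base: skip the column

        bases = [aligned[name][col].upper() for name in samples]
        alts = []
        for base in bases:
            if base in "ACGT" and base != ref_base and base not in alts:
                alts.append(base)
        if not alts:
            continue  # no sample differs from the reference at this column

        genotypes = []
        for base in bases:
            if base == ref_base:
                genotypes.append("0")
            elif base in alts:
                genotypes.append(str(alts.index(base) + 1))
            else:
                genotypes.append(".")  # gap or ambiguity code -> missing
        fields = [chrom, str(pos), ".", ref_base, ",".join(alts), ".", ".", ".", "GT"]
        lines.append("\t".join(fields + genotypes))
    return lines


if __name__ == "__main__":
    toy = {"ref": "ACG-TA", "s1": "ACGATA", "s2": "ATG-T-"}
    print("\n".join(msa_to_vcf_lines(toy, "ref")))
```

Since no confidence model is involved, QUAL, FILTER and INFO are simply left as `.`, and each sequence gets a haploid genotype column. Note that the result is only as good as the alignment itself, so misaligned regions would show up as false SNPs.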
Did you figure out a tool that does this? Also, do you mean that multiple sequence alignments of assembled sequences (assuming the assembly is correct) do not have to go through a "variant calling" approach? What about alignment errors?
What format is your data in? We need more information to understand what you are trying to do. What and Why = Best answer.