Which VCF file to select for generating a multiple sequence alignment of a gene?
2.2 years ago

Hello, I have a conceptual question. I want to create FASTA files for a genomic region from a multisample VCF file, and I use FastaAlternateReferenceMaker from GATK to do this. I am dealing with a highly heterozygous species, so I have quite a few heterozygous SNPs in the VCF file. As a result, when I generate the FASTA file for each sample from the VCF file, I get a few letters like K, Y, etc. instead of nucleotide bases.

This means that the SNP is heterozygous at those positions, is that right?

Also, when making FASTA files from a multisample VCF, should I use the VCF file filtered by MAF, genotyping call rate, and other filtering criteria? Or should I use the unfiltered VCF file for this purpose?

Thank you.

VCF GATK sequence alignment • 1.3k views
2.2 years ago
cmdcolin ★ 4.0k

I think you are correct in your assessment. The tool's documentation says:

"--use-iupac-sample null If specified, heterozygous SNP sites will be output using IUPAC ambiguity codes given the genotypes for this sample"

https://gatk.broadinstitute.org/hc/en-us/articles/360037594571-FastaAlternateReferenceMaker
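To check which positions in a generated sequence carry such codes, something like the following could work. This is only a minimal sketch: the `IUPAC` table is the standard nucleotide ambiguity code set, while `heterozygous_sites` and the example sequence are hypothetical names and data made up for illustration, not part of GATK.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def heterozygous_sites(seq):
    """Return (0-based position, code, alleles) for every two-base ambiguity code.

    Hypothetical helper: a two-base code such as K (G/T) or Y (C/T) is what a
    heterozygous biallelic SNP looks like in an IUPAC-encoded sequence.
    """
    return [
        (i, code, IUPAC[code])
        for i, code in enumerate(seq.upper())
        if len(IUPAC.get(code, "")) == 2
    ]


# Made-up example sequence containing one K and one Y.
print(heterozygous_sites("ACGKTTYA"))
# [(3, 'K', 'GT'), (6, 'Y', 'CT')]
```

So K at a position would mean the sample carries G and T there, and Y would mean C and T.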


Note that bcftools consensus has similar features too.


OK, that makes sense. But then how do you deal with such heterozygous SNP sites if you want to translate the sequences into protein or run some kind of sequence entropy analysis? Do you have any suggestions?

Thank you.


This is a good question. I don't really have a particular answer other than to keep working with the consensus tools you are trying, or to code your own tools :) To me, getting accurate non-reference gene structure predictions still needs work! Most workflows just use "variant effect prediction".
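If you do go the "code your own" route, one possible starting point is to expand each ambiguous codon into every codon it could represent and see whether the amino acid changes. The sketch below assumes the standard genetic code and an in-frame coding sequence; `possible_amino_acids` and `translate_ambiguous` are hypothetical helper names, not functions from any existing tool. It also ignores phasing, so two heterozygous sites in the same codon get expanded into combinations that may not exist on either haplotype.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def possible_amino_acids(codon):
    """Return the set of amino acids an IUPAC-encoded codon could encode."""
    options = [IUPAC[base] for base in codon.upper()]
    return {CODON_TABLE["".join(combo)] for combo in product(*options)}


def translate_ambiguous(seq):
    """Translate in frame, writing X where the ambiguity changes the residue."""
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        residues = possible_amino_acids(seq[i:i + 3])
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# Made-up example: GAY is Asp either way (synonymous), AAK is Lys or Asn.
print(translate_ambiguous("ATGGAYAAK"))  # MDX
```

Synonymous heterozygous sites translate cleanly this way, and only the non-synonymous ones end up as X. For anything beyond that you would probably want phased genotypes and one sequence per haplotype.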


Thank you very much for your feedback. That makes sense. I will try to find something.
