FastaAlternateReferenceMaker is not outputting fasta sequence for multiple samples from a VCF
2.4 years ago

Hello, I have the coordinates for a gene of interest and I am trying to obtain a consensus sequence from a multi-sample VCF file, so that I end up with one FASTA sequence per sample. I will then perform a multiple sequence alignment. I am using the command below:

gatk FastaAlternateReferenceMaker -R Ref.fasta -O heatshock.fasta -L Scaffold_9__1_contigs__length_70652527:6009958-6011175 -V Multi_sample.vcf 

But the command outputs only one FASTA sequence, and it does not even print the sample name. I have 99 samples in my VCF file.

Can you please tell me what I am doing wrong here? Do I need to provide a separate VCF for each individual to obtain a sequence for each individual?

Thank you.

GATK SNP fasta sequence alignment • 1.5k views
2.4 years ago

Use --use-iupac-sample and run the tool once per sample: https://gatk.broadinstitute.org/hc/en-us/articles/360037594571-FastaAlternateReferenceMaker#--use-iupac-sample

something like:

# List the sample names in the VCF, then write one FASTA per sample
bcftools query -l Multi_sample.vcf | while read -r S
do
    gatk FastaAlternateReferenceMaker -R Ref.fasta -O "${S}.fasta" --use-iupac-sample "${S}" -L Scaffold_9__1_contigs__length_70652527:6009958-6011175 -V Multi_sample.vcf
done
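
Since the goal is a multiple sequence alignment, the per-sample files written by the loop above will probably need to be combined into one multi-FASTA with the sample name as the header. A minimal sketch, assuming each file is named <sample>.fasta and holds a single record for the requested interval; merge_sample_fastas is a hypothetical helper name, not part of GATK or bcftools:

```shell
# Hypothetical helper: merge per-sample FASTA files into one multi-FASTA.
# The first argument is the output file; the rest are per-sample inputs.
# Each header line is replaced by the sample name taken from the file name.
merge_sample_fastas() {
    local out="$1"
    shift
    : > "$out"                      # create or truncate the output file
    local f sample
    for f in "$@"; do
        sample="$(basename "$f" .fasta)"
        # Rewrite lines starting with '>' as '>sample'; pass sequence lines through.
        awk -v name="$sample" '/^>/ { print ">" name; next } { print }' "$f" >> "$out"
    done
}

# Example call (file names are placeholders):
# merge_sample_fastas all_samples.fa sampleA.fasta sampleB.fasta
```

If a file contained more than one record, every record would get the same name, so this only suits the single-interval case shown here.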

Hi Pierre, thank you very much. That worked very well, but I have another question. In some of the samples, the FASTA output contains letters such as R, M, and Y instead of nucleotide bases, while in other files the sequences look fine.

Can you please tell me why this is happening and how I can correct it? Thanks.
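
For context, R, M, and Y are IUPAC ambiguity codes (R = A or G, M = A or C, Y = C or T), and the documentation linked above describes --use-iupac-sample as writing heterozygous SNP sites with such codes for the named sample. Samples with no heterozygous site in the interval would then show only plain bases. A small sketch for checking how many such codes each file contains; count_ambiguity_codes is a hypothetical helper name, and it assumes anything other than A, C, G, T, or N on a sequence line is an ambiguity code:

```shell
# Hypothetical helper: count characters other than A, C, G, T, N
# (upper or lower case) on the sequence lines of a FASTA file.
count_ambiguity_codes() {
    grep -v '^>' "$1" |        # drop header lines
        tr -d '\n' |           # join sequence lines
        tr -d 'ACGTNacgtn' |   # remove unambiguous bases and N
        wc -c |                # count what is left
        tr -d ' '              # strip padding some wc versions add
}

# Example call (file names are placeholders):
# for f in *.fasta; do echo "$f $(count_ambiguity_codes "$f")"; done
```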
