Entering edit mode
4.1 years ago
storm1907
▴
30
Hello, I am in need to convert vcf to fasta. Previous topics did not seem to resolve this type of problem
I have this code, but when trying to do bgzip and tabix commands, i afterwards get error like "the input file is not BGZF" and "Broken VCF header, no column names?" Any help will be appreciated
#!bin/bash
module load bio/vcftools/0.1.17
module load bio/samtools/1.10
export inpath=/dir
export outpath=/dir
mkdir -p $outpath
chmod u+rwx $outpath
export reference=/ref
$reference
for file in $inpath/* ;
do
bname=$(basename $file)
echo "base name is $bname"
bfile=$outpath/$bname".bgz"
tabfile=$outpath/$bname".tabix"
outfile=$outpath/$bname".fasta"
bgzip $file $bfile
cd $outpath
tabix -f $bfile > $tabfile
cat $reference | vcf-consensus $tabfile > $outfile
done
You're using the
bgzip
command wrong. You're also not giving tabix the preset to use, plus using the command itself wrong. Fix those and you should be fine. Read the bgzip/tabix manuals.Thank, you, fixed
but the next problem is: I get output fasta file with correct sample name in file's title, and reference sequence name inside the file:
Any suggestions, why is that? :o
Is that output not how you expect it to be? What do you expect and what are you getting?
I would like to see sample's name in the very begginning of file. Not reference sequence's name
That seems to be a custom requirement, which may not be addressed by vcf-consensus. You should be able to use
awk
orbioawk
to add a line containing theFILENAME
.Thank you! I also had an idea, that vcf consensus might be supposed for generating multi-fasta. Because at the moment I am obtaining only single fasta files, which should be concatenated
Try switching to
bcftools consensus
. vcftools is obsolete, I think.