VCF Tools converting
0
0
Entering edit mode
4.1 years ago
storm1907 ▴ 30

Hello, I am in need to convert vcf to fasta. Previous topics did not seem to resolve this type of problem

I have this code, but when trying to do bgzip and tabix commands, i afterwards get error like "the input file is not BGZF" and "Broken VCF header, no column names?" Any help will be appreciated

#!bin/bash
module load bio/vcftools/0.1.17
module load bio/samtools/1.10

export inpath=/dir

export outpath=/dir
mkdir -p $outpath
chmod u+rwx $outpath
export reference=/ref
$reference
for file in $inpath/* ;
do
        bname=$(basename $file)
        echo "base name is $bname"
        bfile=$outpath/$bname".bgz"
        tabfile=$outpath/$bname".tabix"
        outfile=$outpath/$bname".fasta"
        bgzip $file $bfile
        cd $outpath
        tabix -f $bfile > $tabfile
        cat $reference | vcf-consensus $tabfile > $outfile
done
software error • 1.2k views
ADD COMMENT
0
Entering edit mode

You're using the bgzip command wrong. You're also not giving tabix the preset to use, plus using the command itself wrong. Fix those and you should be fine. Read the bgzip/tabix manuals.

ADD REPLY
0
Entering edit mode

Thank, you, fixed

for file in $inpath/*.filtered ;
do
        echo $file
        bname=$(basename $file)
        echo "base name is $bname"
        bfile=$inpath/$bname".gz"
        outfile=$outpath/$bname".fasta"
        bgzip -c $file > $bfile
        tabix -p vcf $bfile
        cat $reference | vcf-consensus $bfile > $outfile
done

but the next problem is: I get output fasta file with correct sample name in file's title, and reference sequence name inside the file:

<nc_xxx> <sequence>

Any suggestions, why is that? :o

ADD REPLY
0
Entering edit mode

Is that output not how you expect it to be? What do you expect and what are you getting?

ADD REPLY
0
Entering edit mode

I would like to see sample's name in the very begginning of file. Not reference sequence's name

ADD REPLY
0
Entering edit mode

That seems to be a custom requirement, which may not be addressed by vcf-consensus. You should be able to use awk or bioawk to add a line containing the FILENAME.

ADD REPLY
0
Entering edit mode

Thank you! I also had an idea, that vcf consensus might be supposed for generating multi-fasta. Because at the moment I am obtaining only single fasta files, which should be concatenated

ADD REPLY
0
Entering edit mode

Try switching to bcftools consensus. vcftools is obsolete, I think.

ADD REPLY

Login before adding your answer.

Traffic: 1309 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6