VCF samtools
0
0
Entering edit mode
3.0 years ago
Lucía • 0

Hello, I am having trouble when doing variant calling with samtools. I am getting only the header an no variants. If I would instead use Freebayes, I do get a lot of variables, and with Gatk, I get just a few. What can the problem be? Do I have to use different formats of bam input files, or different reference genome formats?

This is the command I am running:

samtools mpileup -go output/"$r1"/BWA_samtools_"$r1".bcf -f "$Re1" output/"$r1"/"$r1"_RG_bwa.bam
bcftools call -vmO v -o output/"$r1"/BWA_samtools_"$r1".vcf output/"$r1"/BWA_samtools_"$r1".bcf

and this is the empty header I am getting:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SRRsample_R1

Thank you very much in advance!

vcf samtools variant-calling • 1.7k views
ADD COMMENT
0
Entering edit mode

use

bcftools mpileup 

instead

bcftools mpileup -O v -f reference.fa  alignment.bam | bcftools call --ploidy 1 -m -v -O z -o variants.vcf.gz 
ADD REPLY
0
Entering edit mode

I am still getting no variants :( only the header

ADD REPLY
0
Entering edit mode

most likely it is a default filtering issue. freebayes will pass less reliable variants, the default p-value threshold for bcftools is 0.5

I would visualize the BAM file, then use the variants from all three callers and verify what exactly are being called (or not).

there are also many other possible reasons for some variants not making it through, all having to do with the statistical model and expectations of the data. For example, if all variants come from the same strand some variant callers ignore that, others filter it out because of strand bias. etc

ADD REPLY
0
Entering edit mode

Great, so if the problem should not be reference genome format or the bam format being different for samtools? If it is a filtering issue, can I change it for samtools being more permissing? Because I am not getting any variable at all. Thank you!

ADD REPLY
0
Entering edit mode

the mpileup command generates a genotype for every base. Basically, it is a VCF file with every single base getting a probability of being a variant. When you use "bcftools call" it selects from these variants based on certain criteria. Perhaps you could try --pval-threshold 1

ADD REPLY
0
Entering edit mode

Thank you for your response :) I am still getting no variants though

ADD REPLY
0
Entering edit mode

ah yes, the world of bioinformatics can mysterious ... sometimes you run a tool and it does not seem to work at all ... but WHY????

if you are certain you are running it correctly then almost always it has to to with the data - there could be something about your data that makes bcftools think variants are non-reliable - whereas the other tools use different criteria and ignore that signal. But who is right? Deciding that is what bioinformatics is about.

like I said, visualize the variants in IGV, next to the alignment file, then you can figure out which one of the tools is right.

ADD REPLY
0
Entering edit mode

The thing is that I am not getting any variants with samtools, and there are indeed some variants in my file. That leads me to think that maybe the command or any format of the pipeline is wrong for that tool.

ADD REPLY

Login before adding your answer.

Traffic: 1593 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6