Entering edit mode
3.0 years ago
Lucía
•
0
Hello, I am having trouble when doing variant calling with samtools. I am getting only the header an no variants. If I would instead use Freebayes, I do get a lot of variables, and with Gatk, I get just a few. What can the problem be? Do I have to use different formats of bam input files, or different reference genome formats?
This is the command I am running:
samtools mpileup -go output/"$r1"/BWA_samtools_"$r1".bcf -f "$Re1" output/"$r1"/"$r1"_RG_bwa.bam
bcftools call -vmO v -o output/"$r1"/BWA_samtools_"$r1".vcf output/"$r1"/BWA_samtools_"$r1".bcf
and this is the empty header I am getting:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SRRsample_R1
Thank you very much in advance!
use
instead
I am still getting no variants :( only the header
most likely it is a default filtering issue. freebayes will pass less reliable variants, the default p-value threshold for bcftools is 0.5
I would visualize the BAM file, then use the variants from all three callers and verify what exactly are being called (or not).
there are also many other possible reasons for some variants not making it through, all having to do with the statistical model and expectations of the data. For example, if all variants come from the same strand some variant callers ignore that, others filter it out because of strand bias. etc
Great, so if the problem should not be reference genome format or the bam format being different for samtools? If it is a filtering issue, can I change it for samtools being more permissing? Because I am not getting any variable at all. Thank you!
the mpileup command generates a genotype for every base. Basically, it is a VCF file with every single base getting a probability of being a variant. When you use "bcftools call" it selects from these variants based on certain criteria. Perhaps you could try
--pval-threshold 1
Thank you for your response :) I am still getting no variants though
ah yes, the world of bioinformatics can mysterious ... sometimes you run a tool and it does not seem to work at all ... but WHY????
if you are certain you are running it correctly then almost always it has to to with the data - there could be something about your data that makes bcftools think variants are non-reliable - whereas the other tools use different criteria and ignore that signal. But who is right? Deciding that is what bioinformatics is about.
like I said, visualize the variants in IGV, next to the alignment file, then you can figure out which one of the tools is right.
The thing is that I am not getting any variants with samtools, and there are indeed some variants in my file. That leads me to think that maybe the command or any format of the pipeline is wrong for that tool.