I have a BAM file that I want to do some variant calling on. We had a company generate a VCF for us from this particular BAM file, but I'm not entirely sure it was done correctly. I want to see if I can "double check" their work using freebayes. I have freebayes working on my computer, but I was wondering if anyone has a script that they use to generate their VCFs? I've been playing with the commands but I keep getting random calls where there shouldn't be any. It's partly because there is low coverage in some regions, but I also think freebayes may be too sensitive when it comes to calling variants.
I guess what I'm asking is whether anyone has a script that I can mirror to try and generate a VCF with correctly called variants?
Thanks!!
Did you check whether the freebayes command line is present in the VCF header?
I'm not entirely sure how to do that?
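For anyone else wondering how to do this: the lines at the top of a VCF that start with "##" are the header, and most callers record their name, reference and command line there. Below is a minimal sketch that prints those lines. The function name vcf_provenance and the file name calls.vcf are made up for illustration, and the exact header keys differ between tools, so the pattern is deliberately loose.

```shell
# Print VCF header lines that describe how the file was produced.
# Works on plain-text and gzip/bgzip-compressed VCFs.
# Assumption: the caller wrote keys containing "command", "source" or "reference".
vcf_provenance() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | grep -i -E '^##[^=]*(command|source|reference)'
}
```

Usage would be something like: vcf_provenance calls.vcf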
Hello,
there is nothing special about using freebayes.
This should be fine. Changing the defaults should be avoided unless you know what you are doing.
Removing variants that could be artefacts is a separate step after the variant calling. For this you could use bcftools, for example.
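As a rough sketch of that two-step approach (call with defaults, filter afterwards), something like the following could work. The function name, the file names and the thresholds (QUAL above 20, depth of at least 5) are placeholders chosen for illustration, not recommended values; the DP tag is assumed to be present in the INFO column, which is the case for freebayes output.

```shell
# Sketch: call variants with freebayes defaults, then filter in a separate step.
# $1: reference FASTA, $2: BAM file, $3: output prefix (all hypothetical names).
call_and_filter() {
    ref="$1"; bam="$2"; prefix="$3"
    # Step 1: variant calling with default settings.
    freebayes -f "$ref" "$bam" > "$prefix.raw.vcf" || return 1
    # Step 2: keep only records that pass the quality and depth thresholds.
    bcftools filter -i 'QUAL>20 && INFO/DP>=5' -o "$prefix.filtered.vcf" "$prefix.raw.vcf"
}
```

Usage would be something like: call_and_filter ref.fa aln.bam sample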
fin swimmer
When I do that, pretty much just about every spot is marked as a variant. When I do something like --min-coverage 5, a lot of the random variants go away. It doesn't seem to be looking at the reads in the bam file to see if there is a variant?
Are you sure you used the same reference sequence for variant calling as the one you used for alignment?
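One way to test this yourself is to compare the contig names and lengths in the BAM header with the index of the reference. The sketch below assumes you have saved the BAM header to a text file (for example with samtools view -H aln.bam > header.txt) and indexed the reference with samtools faidx; the function name check_contigs and the file names are hypothetical. Matching names and lengths do not prove the sequences are identical, but a mismatch is a clear warning sign.

```shell
# Compare @SQ lines of a SAM/BAM header dump against a FASTA index (.fai).
# $1: text file with the BAM header, $2: .fai file of the reference.
# Prints contigs that are missing or differ in length; prints nothing if all match.
check_contigs() {
    awk -F '\t' '
        NR == FNR { fai[$1] = $2; next }      # first file: .fai (name, length)
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len  = substr($i, 4)
            }
            if (!(name in fai))        print name "\tnot in reference index"
            else if (fai[name] != len) print name "\tlength " len " in BAM, " fai[name] " in reference"
        }
    ' "$2" "$1"
}
```

Usage would be something like: check_contigs header.txt ref.fa.fai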
Please post the output of
samtools view -H aln.bam
and the first lines of the results in your VCF file, including the header.
fin swimmer
I was actually using the wrong reference, but the two weren't that different. I was making test VCFs covering about 1000 bp, so it really wouldn't have made a difference. I changed the reference to the correct one and still have the same issue. I would say a good 200-300 of the 1000 bp are miscalled.
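To put a number on this, it may help to count the records in the test VCF and see how many of them have a low QUAL value. The sketch below does that for an uncompressed VCF; the function name vcf_qual_summary and the default cutoff of 20 are arbitrary choices for illustration.

```shell
# Summarise a plain-text VCF: number of records and how many fall below a QUAL cutoff.
# $1: VCF file, $2: QUAL cutoff (optional, defaults to 20 as an illustrative value).
vcf_qual_summary() {
    awk -F '\t' -v cutoff="${2:-20}" '
        /^#/ { next }                              # skip header lines
        { total++ }
        $6 != "." && $6 + 0 < cutoff { low++ }     # column 6 is QUAL
        END { printf "records\t%d\nbelow_QUAL_%s\t%d\n", total, cutoff, low }
    ' "$1"
}
```

Usage would be something like: vcf_qual_summary test.vcf 20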
Sorry if I wasn't clear enough. We need the whole header and some example of variants.
fin swimmer
Hello,
there are still no example variants from your VCF file. Without them it is quite difficult to help. And what does this
......
mean? What have you truncated there? Have you already looked at those positions in IGV?
As you can see in your post there is this line
The VCF file from your company should contain something similar. There you can see which program they used and with what parameters. This is what Pierre asked for.
Why do you use --use-duplicate-reads?
fin swimmer
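By default freebayes skips alignments that are marked as duplicates, and --use-duplicate-reads includes them. Whether the option changes anything therefore depends on how many reads in the BAM carry the duplicate flag. A small sketch to check this with samtools is shown below; the function name duplicate_report is made up, and reads are only flagged if a duplicate-marking tool was run on the BAM beforehand.

```shell
# Report how many reads carry the duplicate flag (0x400 = 1024) and the total read count.
# $1: BAM file (hypothetical name, e.g. aln.bam)
duplicate_report() {
    dups=$(samtools view -c -f 1024 "$1") || return 1
    total=$(samtools view -c "$1") || return 1
    printf 'duplicates\t%s\ntotal\t%s\n' "$dups" "$total"
}
```

Usage would be something like: duplicate_report aln.bam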
I have the same problem 4 years later. The first time I run it, it apparently works fine, but then it stops working for the same data forever, even if I delete all generated files (including indexes).