I used FreeBayes for variant calling after running into some issues with GATK. The BAM file that FreeBayes requires as input is 53 GB in size.
The command used is:
/data/programs/freebayes/bin/freebayes -f Reference.fasta Realign-AB.bam > Bayes-AB.vcf &
When I check the different types of variants it called, I get this:
1833122 snp
80312 complex
32512 mnp
26186 ins
22394 del
21993 complex snp
15185 snp snp
9023 mnp snp
2562 complex complex
1345 del snp
604 ins snp
524 complex mnp
500 mnp mnp
424 ins ins
360 del del
227 complex del
141 complex ins
74 del mnp
71 del ins
3 ins mnp
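For anyone who wants to reproduce a tally like this, here is a minimal sketch. It assumes the counts come from the TYPE tag that FreeBayes writes into the INFO column (snp, mnp, ins, del or complex, one value per alternate allele), so two-word entries such as "complex snp" would probably correspond to sites with two alternate alleles. The function name and the example records are made up for illustration; they are not taken from the actual VCF.

```python
from collections import Counter


def count_variant_types(vcf_lines):
    """Count the INFO/TYPE value of each VCF record (header lines are skipped)."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for entry in fields[7].split(";"):
            if entry.startswith("TYPE="):
                # Multi-allelic sites carry a comma-separated value, e.g. "mnp,snp"
                counts[entry[len("TYPE="):]] += 1
                break
    return counts


# Made-up records, only to show the shape of the input
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "scaffold1\t100\t.\tA\tG\t45.2\t.\tDP=70;TYPE=snp\n",
    "scaffold1\t250\t.\tAT\tGC,GT\t60.0\t.\tDP=81;TYPE=mnp,snp\n",
    "scaffold2\t17\t.\tC\tT\t33.1\t.\tDP=64;TYPE=snp\n",
]

type_counts = count_variant_types(example)
for variant_type, n in type_counts.most_common():
    print(n, variant_type)
```

On a real file the same function can be fed an open file handle, for example `count_variant_types(open("Bayes-AB.vcf"))`, since it only iterates over lines.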
To begin with, that is an awful lot of variants. These are the results after filtering on quality and read depth (30 and 60, respectively).
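The filter described above could be sketched roughly as follows. This is an assumption about how it was applied: it treats 30 as a minimum on the QUAL column and 60 as a minimum on the DP tag in the INFO column, which for a multi-sample VCF is the depth summed over samples. The function name and example records are hypothetical.

```python
def passes_filter(line, min_qual=30.0, min_depth=60):
    """Return True if a VCF record has QUAL >= min_qual and INFO/DP >= min_depth."""
    fields = line.rstrip("\n").split("\t")
    qual_text = fields[5]
    if qual_text == ".":
        return False  # missing quality: treat as failing
    # Parse key=value pairs of the INFO column; flag-only entries are ignored
    info = dict(
        entry.split("=", 1) for entry in fields[7].split(";") if "=" in entry
    )
    try:
        depth = int(info["DP"])
    except (KeyError, ValueError):
        return False  # no usable depth annotation
    return float(qual_text) >= min_qual and depth >= min_depth


# Made-up records for illustration only
records = [
    "scaffold1\t100\t.\tA\tG\t45.2\t.\tDP=70;TYPE=snp",
    "scaffold1\t180\t.\tC\tT\t12.0\t.\tDP=90;TYPE=snp",
    "scaffold2\t17\t.\tG\tA\t80.5\t.\tDP=25;TYPE=snp",
]
kept = [r for r in records if passes_filter(r)]
print(len(kept), "of", len(records), "records kept")
```

Note that a minimum depth alone does not exclude regions with unusually high coverage, which is where collapsed repeats tend to show up.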
Also, there is not much explanation of how FreeBayes captures these variants. The reference is that of Aedes albopictus; it is 2 GB in size.
My questions are:
How can I understand what these variants are? How are they being captured? What might be the reason for so many variants?
Hello,
are you sure you used the same reference sequence for alignment and variant calling? Can you please show the first lines of the resulting VCF (header and some variants)?
fin swimmer
Yes I am sure that the same reference sequence was used in alignment and variant calling.
Can you describe the sample(s) you sequenced? Was it a single mosquito? If I remember correctly, mosquito genomes are full of transposable elements (TEs), aren't they? That could explain it a bit, I think.