I have a cancer dataset containing 10 tumor and 10 control pairs. Each tumor or control dataset is 100 GB in size. I also have a reference sequence (mm9.fa). I need to do variant calling with these data. This is what I run:
samtools mpileup -g -f mm9.fa *.bam | bcftools view -bvcg - > var.raw.bcf
(mpileup takes all the tumor and control pairs)
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf
But the entire process is agonizingly slow (more than 10 hours and still no output). What should I do? Can I call the variants individually and then merge them into one large BCF file?

P.S.: I am very new to this field, so please pardon any terms I have misused above. Thanks in advance.
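On the idea of calling in pieces and merging afterwards: one possible way to do it is to split by chromosome rather than by sample, so that every piece still sees all the BAM files together. The sketch below only prints one command per chromosome; it does not run anything. It assumes things that are not stated in the post: that a FASTA index mm9.fa.fai exists (created with `samtools faidx mm9.fa`), that the BAM files are coordinate-sorted and indexed, and that the same old samtools/bcftools 0.1.x syntax as above is in use. The function name `region_commands` and the output names `var.<chrom>.raw.bcf` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: print one mpileup/bcftools command per chromosome.
# Assumption: the first column of the .fai file holds the chromosome names.
# Assumption: BAM files are sorted and indexed, so that -r (region) works.

region_commands() {
    local fai="$1"   # FASTA index, e.g. mm9.fa.fai
    local ref="$2"   # reference FASTA, e.g. mm9.fa
    shift 2          # everything left is the list of BAM files
    local chrom
    while IFS=$'\t' read -r chrom _; do
        # skip empty lines in the index
        [ -n "$chrom" ] || continue
        # same pipeline as in the post, restricted to one region with -r
        printf 'samtools mpileup -g -f %s -r %s %s | bcftools view -bvcg - > var.%s.raw.bcf\n' \
            "$ref" "$chrom" "$*" "$chrom"
    done < "$fai"
}
```

A possible usage would be `region_commands mm9.fa.fai mm9.fa *.bam > jobs.txt`, followed by running the lines of jobs.txt side by side (for example with GNU parallel, if it is installed: `parallel -j 8 < jobs.txt`). The per-chromosome BCF files would then need to be concatenated in chromosome order; in the 0.1.x bcftools this should be possible with `bcftools cat`, but please check that against the version you have installed.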
Hey, what about MutScan ("detect and visualize target mutations by just scanning FastQ, 50X faster than normal pipelines")? They say it is 50x faster than a classic pipeline :)
Best