Question

Variant calling using samtools

0

Entering edit mode

8.3 years ago

Gene_MMP8 ▴ 240

I have cancer dataset containing 10 tumor and 10 control pairs. Each tumor or control dataset is 100 GB in size. I have a refrence sequence too. (mm9.fa). SO I need to do some variant calling using these available data. What I do is the following:-

samtools mpileup -g -f mm9.fa *.bam | bcftools view -bvcg - > var.raw.bcf (mpileup takes all the tumor and control pairs) bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

But the entire process is agonizingly slow (>10 hrs and still nothing). What should I do? Can I find the variants individually and then merge them into one large bcf file? P.S: I am very new to this field and pardon my ignorance in some words written above. Thanks in advance.

alignment • 3.4k views

ADD COMMENT • link updated 8.3 years ago by Brian Bushnell 20k • written 8.3 years ago by Gene_MMP8 ▴ 240

0

Entering edit mode

Hey, What about this MutScan: detect and visualize target mutations by just scanning FastQ, 50X faster than normal pipelines ? They said 50x faster than classic pipeline :)

Best

ADD REPLY • link 8.3 years ago by Titus ▴ 910

score 0 · Answer 1 · 2017-05-12

0

Entering edit mode

8.3 years ago

vmicrobio ▴ 290

you have to try it with submission to a computer cluster. It will save you some time and computer access.

ADD COMMENT • link 8.3 years ago by vmicrobio ▴ 290

score 0 · Answer 2 · 2017-05-12

0

Entering edit mode

8.3 years ago

Brian Bushnell 20k

BBMap's variant-caller is roughly 80x faster than the pipeline you are using. Samtools has to be in your path (and I recommend samtools 1.4, which is much faster than older versions). The command for multiple samples would be:

callvariants.sh in=sample1.bam,sample2.bam,sample3.bam ref=mm9.fa ploidy=2 out=variants.vcf multisample

ADD COMMENT • link 8.3 years ago by Brian Bushnell 20k

0

Entering edit mode

Thanks for the suggestion. I have 20 bam files. Can I write *.bam here? Its giving an error here. Or do I need to write each and every file name?

ADD REPLY • link 8.3 years ago by Gene_MMP8 ▴ 240

0

Entering edit mode

Sorry :) I'll modify it to allow *.bam but right now you have to list all of them. Also, please note I modified the command to add the term "multisample".