Entering edit mode
6.7 years ago
Siavash Salek Ardestani
▴
20
Hi I have 50 DNA whole genome seq(bam file) that these samples produced separately (until re calibration step) and Then for variant calling using Unified Genotyper ,I produced one vcf file for all of these samples. I want to know is it a true method ? or I should merge all samples together in a bam file from realignment step until end of variant calling step?
Go through the specific commands that you used, please. Thank you!
Thank you so much. so my method is true.
No. Can you, please, paste (here, in a text box, via the
ADD REPLY
button) the commands that you executed when you were processing this data? :)You have not provided enough information for anyone to give an answer.
indel realignment
realignment
make grp
recalibration
variant calling
Hello,
there is no one and only "true" way for processing your data. The way can differ due to the sequencing platform, the source of your material, the goal of you experiment and so on.
What you have shown is ok I think, but there is a lot space for improvements. So for example using GATKs HaplotypeCaller or freebayes instead of UnifiedGenotyper. Doing this you can avoid realignment completly.
fin swimmer
thank you so much. you mean when I using HaplotypeCaller or freebayes, I can avoid realignment ?
Yes,
both discard the alignment information within a window around a suspected ins/del and doing a de novo assembly with the reads within this window. This is much more powerful.
Even BQR is questionable. If the vast majority of your sequenced bases have (very) good Q Scores the benefit of it will be negligible.
fin swimmer
Sorry, I got caught up in London rush hour.
As per finswimmer, definitely switch to using
HaplotypeCaller
.I also agree with finswimmer's mild skepticism about the
BaseQualityScoreRecalibration
- I don't believe that it affects much, possibly just a few bases that are on the fringe of poor/good quality. As you have already included it, you do not have to exclude it.The re-alignment for indels, I believe that you should keep this. I have noticed that it can definitely improve calling of indels when compared to, for example,
samtools mpileup
.The other question is surrounding your use of
UnifiedGenotyper
/HaplotypeCaller
and in supplying multiple BAMs. The other option is to call variants separately on each BAM and to then merge the VCFs together by performing joint genotyping.The GATK's variant calling pipelines are, in my opinion, overly complex, and make something difficult when it should be easy. You should learn 'backup' methods that are different from the GATK's. Obviously, however, you should be generally aware of the 'best practices': https://gatkforums.broadinstitute.org/gatk/discussion/3238/best-practices-for-variant-discovery-in-dnaseq
In any case, here is a quick summary of what to do:
bwa mem
; if shorter, use the originalbwa
algorithm orbowtie
The final step is definitely not necessary and may even result in you throwing out good calls.
Thank you but my main question is: 1.can I use bam files separately in Local re-alignment and BQSR steps or I should merge these bam files after Mark duplicates? 2.when I do Variant calling separately then I do not have any 0/0 genotype but when I do Variant calling all samples or together (even two samples) then I have 0/0.
Do not merge them for the purposes of making duplicates or re-aligning indels. Treat them as separate samples.
You can merge them during variant calling, in accordance with how the GATK's tools are developed.
thank you so much for useful guidance.
I know gatk's best practice guides still recommend it, but I cannot understand why this should improve my variant calling, as the HaplotypeCaller doesn't rely on the alignment. Can you explain it?
Before starting with the Variant calling or doing anything else with the bam file I would recommend to sort the file by coordinates, as a lot of tools expect it that way. It is annoying to realize that first if you want to test a tool you didn't think of before.
fin swimmer
I would recommend to sort the file by coordinates: which software can do it?
samtools sort can do this.
I have seen how the use / non-use of the IndelRealigner does modify calls on indels, and this is when using HaplotypeCaller.
it is my method. is it true?