Hi,
I am working on variant calling in fungal genomes using Illumina HiSeq reads. I am new to this and am following this workflow:
Step 1: QC of raw reads performed using FastQC tool
Step 2: Preprocessing of raw reads performed using Trimmomatic-v0.36 tool
Step 3: QC of clean reads performed using FastQC tool
Step 4: Alignment to reference genome using bowtie2-v2.2.6 tool
Step 5: SAM to BAM (alignment files) conversion using samtools-v1.3.1
Step 6: Remove duplicates using sambamba-0.6.6 tool
Step 7: Coordinate sorting of BAM files with samtools
Step 8: Variant calling performed using samtools/bcftools
Step 9: Variant filtering with bcftools
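For reference, Steps 8 and 9 could look roughly like the sketch below. It only builds and prints the command lines and runs nothing. The file names and the `QUAL<20` cutoff are placeholders I made up, and the `samtools mpileup -u -g -f` form is the one used with samtools 1.3.x (newer releases moved this to `bcftools mpileup`), so please check the flags against your installed versions.

```python
import shlex


def calling_commands(sorted_bam, ref_fasta, prefix, min_qual=20):
    """Build shell command strings for Steps 8-9; nothing is executed here."""
    raw_vcf = f"{prefix}.raw.vcf"
    filtered_vcf = f"{prefix}.filtered.vcf"

    # Step 8: pileup against the reference (uncompressed BCF on stdout),
    # piped into the multiallelic caller, keeping variant sites only.
    mpileup = ["samtools", "mpileup", "-u", "-g", "-f", ref_fasta, sorted_bam]
    call = ["bcftools", "call", "-m", "-v", "-o", raw_vcf, "-"]

    # Step 9: exclude calls whose QUAL is below the (assumed) threshold.
    filt = ["bcftools", "filter", "-e", f"QUAL<{min_qual}",
            "-o", filtered_vcf, raw_vcf]

    return [
        shlex.join(mpileup) + " | " + shlex.join(call),
        shlex.join(filt),
    ]


# Hypothetical file names, for illustration only.
for cmd in calling_commands("sample1.sorted.bam", "reference.fa", "sample1"):
    print(cmd)
```

The printed lines can be pasted into a shell once the paths match your data.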
In the post "Workflow Or Tutorial For Snp Calling?" and other variant-calling resources, I found two additional steps before variant calling: local realignment and base quality recalibration.
- Are these steps essential?
- I found that these options are not available in samtools but are available in GATK. How can I perform these steps on my data?
Hi Santosh,
I have already completed up to Step 7 of the workflow above using the tools mentioned, and no, I don't have GATK installed. What would you suggest in this case? I am expected to use this workflow, probably because it is the one that has been used here.
You may proceed with your current pipeline, partly because doing everything the GATK way will take some time to learn. However, it is better to learn GATK in the long term, as it is more or less the standard now. You can start with the GATK best practices here: https://software.broadinstitute.org/gatk/best-practices/
I think I can add these two steps to my workflow (using GATK) before Step 8, i.e. the actual variant calling. I read that someone did this in a post here on Biostar. What would you suggest?
In my opinion it will not do any harm, so you should try it if you can.
Thanks, I appreciate it.
@Santosh Hi, GATK requires the read group tag @RG. I aligned my reads using bowtie2 (without the RG tag, as I never needed it before) and completed the 7 steps of my workflow. Now, I have read that we can add RG using Picard, but the big question is: what values should I give in the command? I have no clue. My BAM files look like this,
and so on...
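One possible way to pick the values, as a rough sketch rather than a rule: use the sample name for RGSM, `illumina` for RGPL, and derive RGID/RGPU from the flowcell and lane if your read names follow the 7-field Illumina layout (instrument:run:flowcell:lane:tile:x:y). The read name, sample name, library name and `picard.jar` path below are all made up for illustration, and the fallback `unit1` is just a placeholder for read names that do not carry a flowcell. The `I=`/`O=`/`RGxx=` arguments are the classic Picard `AddOrReplaceReadGroups` syntax.

```python
def read_group_fields(sample, read_name, library=None):
    """Suggest @RG values from a sample name and one read name (QNAME)."""
    parts = read_name.split(":")
    if len(parts) >= 7:
        # Assumed layout: instrument:run:flowcell:lane:tile:x:y
        unit = f"{parts[2]}.{parts[3]}"
    else:
        # Older or non-Illumina read names: fall back to a placeholder unit.
        unit = "unit1"
    return {
        "RGID": f"{sample}.{unit}",         # unique per sample and lane
        "RGSM": sample,                     # sample name used in the VCF
        "RGLB": library or f"{sample}_lib1",  # library; placeholder if unknown
        "RGPL": "illumina",                 # sequencing platform
        "RGPU": f"{unit}.{sample}",         # platform unit (flowcell.lane.sample)
    }


def picard_command(in_bam, out_bam, fields, picard_jar="picard.jar"):
    """Build (not run) a Picard AddOrReplaceReadGroups argument list."""
    cmd = ["java", "-jar", picard_jar, "AddOrReplaceReadGroups",
           f"I={in_bam}", f"O={out_bam}"]
    cmd += [f"{key}={value}" for key, value in fields.items()]
    return cmd


# Hypothetical read name and file names, for illustration only.
fields = read_group_fields("strainA", "HWI-ST1234:100:C0ABCACXX:3:1101:1500:2000")
print(" ".join(picard_command("strainA.sorted.bam", "strainA.rg.bam", fields)))
```

If re-running the alignment is ever an option, bowtie2 can also write the read group directly through its `--rg-id` and `--rg` options, which avoids the extra Picard step.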