6.4 years ago
shuksi1984
Following are the steps of my DNA-seq pipeline, designed with the GATK Best Practices in mind:
Step 1 - Quality check of raw data (fastqc)
Step 2 - Alignment to the reference genome (bwa)
Step 3 - SAM to BAM conversion and sorting (SortSam)
Step 4 - Generate alignment summary (CollectAlignmentSummaryMetrics, samtools depth, etc.)
Step 5 - Duplicate marking (MarkDuplicates)
Step 6 - BAM indexing (BuildBamIndex)
Step 7 - Local realignment around indels (RealignerTargetCreator, IndelRealigner)
Step 8 - Base quality score recalibration (BaseRecalibrator)
Step 9 - Visualization of recalibration quality (AnalyzeCovariates)
Step 10 - Call variants (HaplotypeCaller)
Step 11 - Retain SNPs and indels (SelectVariants)
Step 12 - Filter SNPs and indels (VariantFiltration)
Step 13 - Filter reads based on various read properties (PrintReads)
Step 14 - Detect additional variants (HaplotypeCaller)
Step 15 - Retain SNPs and indels, second pass (SelectVariants)
Step 16 - Filter SNPs and indels, second pass (VariantFiltration)
Step 17 - Annotate SNPs (SnpEff)
Step 18 - Generate genome coverage (bedtools)
I have listed the tool used at each step. Please let me know whether I am on the right track.
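A minimal sketch of how the core alignment and calling steps above might be written out as command lines. It only builds and prints the commands; it does not run anything. All file names (`ref.fasta`, `dbsnp.vcf`, `picard.jar`, `GenomeAnalysisTK.jar`, the FASTQ names) and the read-group values are hypothetical placeholders. It assumes GATK3-style syntax (`-T <Tool>`), because RealignerTargetCreator and IndelRealigner are not available in GATK4. It also assumes PrintReads is used with `-BQSR` to write the recalibrated BAM, so that step is placed before HaplotypeCaller and variant calling sees the recalibrated base qualities.

```python
import shlex
from typing import List, Tuple

# Hypothetical paths; adjust to your own installation and data.
REF = "ref.fasta"
PICARD = ["java", "-jar", "picard.jar"]
GATK3 = ["java", "-jar", "GenomeAnalysisTK.jar"]


def gatk3(tool: str, *args: str) -> List[str]:
    """Build a GATK3-style command: java -jar GenomeAnalysisTK.jar -T <tool> -R <ref> ..."""
    return GATK3 + ["-T", tool, "-R", REF, *args]


def build_pipeline(sample: str, r1: str, r2: str, known_sites: str) -> List[Tuple[str, List[str]]]:
    """Return (step name, argv) pairs in run order. Commands are built, not executed."""
    # Literal "\t" escapes: bwa expands them into tabs in the @RG header line.
    rg = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # bwa mem writes SAM to stdout; redirect it to f"{sample}.sam" when running.
        ("bwa mem", ["bwa", "mem", "-R", rg, REF, r1, r2]),
        ("SortSam", PICARD + ["SortSam", f"I={sample}.sam", f"O={sample}.sorted.bam",
                              "SORT_ORDER=coordinate"]),
        ("MarkDuplicates", PICARD + ["MarkDuplicates", f"I={sample}.sorted.bam",
                                     f"O={sample}.dedup.bam", f"M={sample}.dup_metrics.txt"]),
        ("BuildBamIndex", PICARD + ["BuildBamIndex", f"I={sample}.dedup.bam"]),
        ("RealignerTargetCreator", gatk3("RealignerTargetCreator", "-I", f"{sample}.dedup.bam",
                                         "-o", f"{sample}.intervals")),
        ("IndelRealigner", gatk3("IndelRealigner", "-I", f"{sample}.dedup.bam",
                                 "-targetIntervals", f"{sample}.intervals",
                                 "-o", f"{sample}.realigned.bam")),
        ("BaseRecalibrator", gatk3("BaseRecalibrator", "-I", f"{sample}.realigned.bam",
                                   "-knownSites", known_sites, "-o", f"{sample}.recal.table")),
        # PrintReads with -BQSR applies the recalibration table and writes a new BAM.
        ("PrintReads", gatk3("PrintReads", "-I", f"{sample}.realigned.bam",
                             "-BQSR", f"{sample}.recal.table", "-o", f"{sample}.recal.bam")),
        ("HaplotypeCaller", gatk3("HaplotypeCaller", "-I", f"{sample}.recal.bam",
                                  "-o", f"{sample}.raw.vcf")),
        ("SelectVariants SNP", gatk3("SelectVariants", "-V", f"{sample}.raw.vcf",
                                     "-selectType", "SNP", "-o", f"{sample}.snps.vcf")),
        ("SelectVariants INDEL", gatk3("SelectVariants", "-V", f"{sample}.raw.vcf",
                                       "-selectType", "INDEL", "-o", f"{sample}.indels.vcf")),
    ]


if __name__ == "__main__":
    for name, argv in build_pipeline("sample1", "R1.fastq.gz", "R2.fastq.gz", "dbsnp.vcf"):
        print(f"# {name}\n{shlex.join(argv)}")
```

Keeping each step as an argv list (rather than one shell string) makes it easy to pass the commands to `subprocess.run` later, and the shared file-name scheme makes the hand-off between steps explicit.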
You should perhaps mention what you aim to achieve.
I want to identify SNPs and indels.
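To make the SNP/indel split concrete: in a VCF, the type of a variant can be read from its REF and ALT columns. The sketch below is a simplified, hypothetical classifier that roughly mirrors what `SelectVariants -selectType SNP` / `-selectType INDEL` does. GATK's own typing rules (for example, how it handles mixed or symbolic alleles) may differ in details, so treat this as an illustration only.

```python
from typing import Iterable, List, Tuple

DNA = set("ACGTN")


def allele_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as 'SNP', 'INDEL' or 'OTHER' (simplified rule)."""
    ref, alt = ref.upper(), alt.upper()
    if not (set(ref) <= DNA and set(alt) <= DNA):
        return "OTHER"  # symbolic (<DEL>), breakend, spanning-deletion (*) or '.' alleles
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # e.g. multi-nucleotide substitutions (MNPs)


def record_type(ref: str, alts: List[str]) -> str:
    """Combine per-allele types; a site whose ALT alleles differ in type is 'MIXED'."""
    kinds = {allele_type(ref, alt) for alt in alts}
    return kinds.pop() if len(kinds) == 1 else "MIXED"


def split_vcf(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split VCF data lines into SNP and INDEL records, skipping header lines."""
    snps, indels = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        kind = record_type(fields[3], fields[4].split(","))  # REF, ALT columns
        if kind == "SNP":
            snps.append(line)
        elif kind == "INDEL":
            indels.append(line)
    return snps, indels
```

For example, `record_type("A", ["G"])` is a SNP, `record_type("AT", ["A"])` is a one-base deletion (INDEL), and `record_type("A", ["G", "AT"])` is MIXED, so this sketch leaves that site out of both output lists.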
If you want to follow the GATK Best Practices, then simply follow the most up-to-date pipeline published on their website. Have you done NGS data processing before?
No, I haven't. This is my first time.
Can you share the link?
https://software.broadinstitute.org/gatk/best-practices/workflow?id=11145
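One part of those recommendations that bears directly on the VariantFiltration steps (Step 12 and Step 16) is hard filtering, which GATK suggests when VQSR cannot be used (for example, with a single sample). The thresholds below are the commonly cited GATK hard-filter values for SNPs and indels. Please verify them against the current documentation before relying on them. In GATK they are passed to VariantFiltration as filter expressions; this Python sketch only reproduces the threshold logic on a VCF INFO string. Skipping missing annotations, rather than failing them, is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

Filters = Dict[str, Tuple[str, float]]

# Commonly cited GATK hard-filter thresholds (check the current docs before use).
SNP_FILTERS: Filters = {
    "QD": ("<", 2.0),
    "FS": (">", 60.0),
    "MQ": ("<", 40.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
    "SOR": (">", 3.0),
}
INDEL_FILTERS: Filters = {
    "QD": ("<", 2.0),
    "FS": (">", 200.0),
    "ReadPosRankSum": ("<", -20.0),
    "SOR": (">", 10.0),
}


def parse_info(info: str) -> Dict[str, str]:
    """Parse a VCF INFO column ('KEY=VALUE;FLAG;...') into a dict; flags map to ''."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def failed_filters(info: str, filters: Filters) -> List[str]:
    """Return the annotation names whose values fail their thresholds."""
    values = parse_info(info)
    failed = []
    for key, (op, threshold) in filters.items():
        raw = values.get(key)
        if raw is None or raw in ("", "."):
            continue  # missing annotation: skipped in this sketch
        x = float(raw)  # assumes a single per-site value
        if (op == "<" and x < threshold) or (op == ">" and x > threshold):
            failed.append(key)
    return failed


def filter_column(info: str, filters: Filters) -> str:
    """Value for the VCF FILTER column: 'PASS' or the failed annotation names."""
    failed = failed_filters(info, filters)
    return "PASS" if not failed else ";".join(failed)
```

The same record can pass the indel filters and fail the SNP filters (FS > 60 vs. FS > 200). This is why SNPs and indels are first separated with SelectVariants and then filtered with different expressions.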
I am very new to WGS and WES analysis. Did the workflow you mentioned work for you?