I am just starting to learn to use bioinformatics tools. My university has a limited and expensive bioinformatics team, so I'm mostly on my own except for big questions.
I am planning to run Illumina exome sequencing data from 58 cancer/normal (control) pairs through the GATK pipeline, starting from FASTQ or BAM files, with VCF and MAF output for analysis.
The current GATK pipeline here is set up for non-cancer disease studies, so I was wondering whether any changes should be made for cancer. Here's the current pipeline, starting with BAM files:
- (Non-GATK) Picard MarkDuplicates or Samtools rmdup
- Indel Realignment (RealignerTargetCreator + IndelRealigner)
- Base Quality Score Recalibration (BaseRecalibrator + PrintReads)
- HaplotypeCaller <- I've been told this is for germline variants; what can I use for somatic variants?
- VQSR (VariantRecalibrator and ApplyRecalibration in SNP and INDEL mode)
- Annotation using Oncotator (?)
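To make the pre-processing part of the list concrete, here is a minimal sketch in Python that only builds the command lines for one sample (MarkDuplicates through BQSR) without running anything. It assumes a GATK 3.x-style `-T` invocation and Picard's `KEY=value` argument style; the jar paths, file names, and the helper `preprocessing_commands` are hypothetical placeholders, and variant calling, VQSR, and annotation are deliberately left out.

```python
from typing import List

# Assumed jar locations; adjust to your installation (hypothetical paths).
GATK_JAR = "GenomeAnalysisTK.jar"
PICARD_JAR = "picard.jar"


def preprocessing_commands(sample: str, bam: str, ref: str,
                           known_sites: str) -> List[List[str]]:
    """Return the per-sample pre-processing commands as argument lists.

    Nothing is executed here; each inner list could later be passed to
    subprocess.run(cmd, check=True) once the paths are real.
    """
    dedup = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    intervals = f"{sample}.intervals"
    realigned = f"{sample}.realigned.bam"
    table = f"{sample}.recal.table"
    recal = f"{sample}.recal.bam"

    gatk = ["java", "-jar", GATK_JAR]
    return [
        # 1. Mark duplicates (Picard); CREATE_INDEX writes the BAM index GATK needs.
        ["java", "-jar", PICARD_JAR, "MarkDuplicates",
         f"I={bam}", f"O={dedup}", f"M={metrics}", "CREATE_INDEX=true"],
        # 2a. Find intervals that need realignment.
        gatk + ["-T", "RealignerTargetCreator", "-R", ref,
                "-I", dedup, "-o", intervals],
        # 2b. Realign reads around indels in those intervals.
        gatk + ["-T", "IndelRealigner", "-R", ref, "-I", dedup,
                "-targetIntervals", intervals, "-o", realigned],
        # 3a. Build the base quality recalibration table.
        gatk + ["-T", "BaseRecalibrator", "-R", ref, "-I", realigned,
                "-knownSites", known_sites, "-o", table],
        # 3b. Write a new BAM with recalibrated base qualities.
        gatk + ["-T", "PrintReads", "-R", ref, "-I", realigned,
                "-BQSR", table, "-o", recal],
    ]


if __name__ == "__main__":
    # Hypothetical tumor/normal pair; every file name here is a placeholder.
    for name in ("pair01_tumor", "pair01_normal"):
        for cmd in preprocessing_commands(name, f"{name}.bam",
                                          "reference.fasta", "dbsnp.vcf"):
            print(" ".join(cmd))
```

Building the commands as lists first makes it easy to print and inspect the whole plan for all 58 pairs before anything is run.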
I'd like some verification that this pipeline will output what I need to run my samples on MuTect, MutSig, or some other analysis program. I appreciate any advice.
Crossposted on Stack Exchange Biology.
This analysis can be done reasonably well by following this link: Link