2.9 years ago
Dan
Hello,
I am new to whole-genome sequencing data analysis, and I am not sure whether I am using Picard and GATK correctly in my pipeline. Could someone please check it? Thanks a lot!
TrimGalore-0.6.6/trim_galore --paired --cores 4 --retain_unpaired H_5_S12_L001_R1_001.fastq.gz H_5_S12_L001_R2_001.fastq.gz -o ./out
bwa mem -t 18 bwa_index/GRCh38_Broad/GRCh38_Broad H_5/out/*val_1.fq.gz H_5/out/*val_2.fq.gz > H_5_mem_val.sam
samtools view -Sb -T hg20/Broad_Homo_sapiens_assembly38.fasta H_5_mem_val.sam > H_5_mem_val.bam
samtools sort -n H_5_mem_val.bam -o H_5_namesorted.bam
samtools fixmate -m H_5_namesorted.bam H_5_fixed.bam
samtools sort H_5_fixed.bam -o H_5_sorted.bam
samtools markdup -r H_5_sorted.bam H_5_dedup.bam
samtools view H_5_dedup.bam | head -1 | awk '{print $1}'
# A01494:44:H53Y7DMXY:1:2301:27579:4883
java -jar ~/picard.jar AddOrReplaceReadGroups I=H_5_dedup.bam O=H_5_dedup.RG.bam RGID=A01494.44 RGLB=lib RGPL=illumina RGSM=H_5 RGPU=A01494.44.H53Y7DMXY.1
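For reference, a minimal sketch of how the RGID and RGPU values above can be derived from the read name, assuming the usual Illumina layout instrument:run:flowcell:lane:tile:x:y. The variable names are only illustrative, and the read name is hard-coded here rather than read from the BAM.

```shell
#!/usr/bin/env bash
set -euo pipefail

# First read name from the deduplicated BAM (hard-coded for this sketch).
# Assumed layout: instrument:run:flowcell:lane:tile:x:y
read_name="A01494:44:H53Y7DMXY:1:2301:27579:4883"

# Split on ':' and keep the first four fields; the remainder goes to _rest.
IFS=':' read -r instrument run flowcell lane _rest <<< "$read_name"

# RGID = instrument.run, RGPU = instrument.run.flowcell.lane,
# matching the values passed to AddOrReplaceReadGroups.
rgid="${instrument}.${run}"
rgpu="${instrument}.${run}.${flowcell}.${lane}"

echo "RGID=${rgid} RGPU=${rgpu}"
```

The two values can then be passed as RGID and RGPU instead of typing them by hand.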
java -jar ~/gatk-4.2.3.0/gatk-package-4.2.3.0-local.jar BaseRecalibrator -I H_5_dedup.RG.bam -O H_5/recal.txt -R hg20/Broad_Homo_sapiens_assembly38.fasta --known-sites vcf/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites vcf/Homo_sapiens_assembly38.dbsnp138.vcf --known-sites vcf/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf
java -jar ~/gatk-4.2.3.0/gatk-package-4.2.3.0-local.jar ApplyBQSR -I H_5_dedup.RG.bam -O H_5.final.bam -R hg20/Broad_Homo_sapiens_assembly38.fasta --bqsr-recal-file H_5/recal.txt
samtools index H_5.final.bam
java -jar ~/gatk-4.2.3.0/gatk-package-4.2.3.0-local.jar Mutect2 \
-R hg20/Broad_Homo_sapiens_assembly38.fasta \
--germline-resource vcf/af-only-gnomad.hg38.vcf.gz \
--panel-of-normals vcf/1000g_pon.hg38.vcf.gz \
-I H_5.final.bam \
-O vcf/H_5.vcf.gz \
--f1r2-tar-gz tmp/H_5.f1.tar.gz \
--af-of-alleles-not-in-resource -1.0
java -jar ~/gatk-4.2.3.0/gatk-package-4.2.3.0-local.jar GetPileupSummaries \
-I H_5.final.bam \
-V vcf/af-only-gnomad.hg38.vcf.gz \
-O H_5.pileup.txt \
--intervals Bed/Agilent.71M.Covered.hg38.bed
java -jar ~/gatk-4.2.3.0/gatk-package-4.2.3.0-local.jar CalculateContamination \
-I H_5.pileup.txt \
-O tmp/H_5.contamination.table \
--tumor-segmentation tmp/H_5.segments.table
java -jar ~/gatk-4.2.3.0/gatk-package-4.2.3.0-local.jar LearnReadOrientationModel \
-I tmp/H_5.f1.tar.gz \
-O tmp/H_5.prior.tar.gz
java -jar ~/gatk-4.2.3.0/gatk-package-4.2.3.0-local.jar FilterMutectCalls \
-V vcf/H_5.vcf.gz \
-R hg20/Broad_Homo_sapiens_assembly38.fasta \
--contamination-table tmp/H_5.contamination.table \
-O vcf/H_5.filtered.vcf.gz \
--ob-priors tmp/H_5.prior.tar.gz \
--tumor-segmentation tmp/H_5.segments.table
java -jar ~/gatk-4.2.3.0/gatk-package-4.2.3.0-local.jar Funcotator \
--variant vcf/H_5.filtered.vcf.gz \
--reference hg20/Broad_Homo_sapiens_assembly38.fasta \
--ref-version hg38 \
--data-sources-path gatk/funcotator_dataSources.v1.7.20200521s \
--output vcf/H_5.filtered.func.vcf \
--output-file-format VCF