Dear all,
I am using Mutect2 (GATK4 with Java 8, OpenJDK) on RNA-Seq data from a tumor. I would like to call SNVs (as part of a RNA editing pipeline), but Mutect2 outputs always an empty SNV file. I went through the GATK forum since this problem occurred quite often for the GATK3 version of the tool. Since it takes at least two working days after registration until I can post my problem in the GATK forum, I hoped I could get help here a bit faster. I processed my data as follows.
1. STAR two pass-alignment:
STAR \
--genomeDir $star_index_dir \
--readFilesIn $output_dir/star-pass2/read_1.fq $output_dir/star-pass2/read_2.fq \
--outFileNamePrefix $output_dir/${sample_id}. \
--outSAMattributes NH HI AS nM NM MD \
--outSAMtype BAM Unsorted \
--twopassMode Basic \
--outFilterType BySJout \
--outFilterMultimapNmax 20 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverLmax 0.04 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignMatesGapMax 1000000 \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--sjdbScore 1 \
--outTmpDir $PBS_SCRATCH_DIR/$PBS_JOBID/star \
--runThreadN $threads &> $output_dir/star-pass2/log.txt
2. Picard AddOrReplaceReadGroups
java -d64 -Xmx24g -Djava.io.tmpdir=$PBS_SCRATCH_DIR/$PBS_JOBID -jar $EBROOTPICARD/picard.jar AddOrReplaceReadGroups \
INPUT=$output_dir/${sample_id}.Aligned.out.bam \
OUTPUT=$output_dir/${sample_id}.Aligned.out_read_group_added.bam \
RGID=$sample_id \
RGLB=stranded \
RGPL=illumina \
RGPU=barcode \
RGSM=$sample_id \
VALIDATION_STRINGENCY=SILENT &> $output_dir/log_AddOrReplaceReadGroups.txt
3. Sort
java -d64 -Xmx24g -Djava.io.tmpdir=$PBS_SCRATCH_DIR/$PBS_JOBID -jar $EBROOTPICARD/picard.jar SortSam
I=$output_dir/${sample_id}.Aligned.out_read_group_added.bam \
O=$output_dir/${sample_id}.sorted.bam \
SORT_ORDER=coordinate &> $output_dir/log_samsort.txt
4. Mutect2
$GATK --java-options "-d64 -XX:CompileThreshold=1000 -XX:ReservedCodeCacheSize=128m -Xmx25g -Xms20g -Djava.io.tmpdir=$PBS_SCRATCH_DIR/$PBS_JOBID" \
Mutect2 \
-R $ref_genome_fasta \
-I $output_dir/${sample_id}.sorted.mdup.bam \
-O $output_dir/${sample_id}.mutect.raw.vcf \
-tumor ${sample_id} \
-L 19 \ # <- Just to make testing faster, use chr19
--callable-depth 10 \
--base-quality-score-threshold 25 \
--dont-use-soft-clipped-bases \
--native-pair-hmm-threads $threads > $output_dir/log_MuTect2.txt
If used ValidateSAMFile
to check if my BAM file is corrupt, but no error was detected. I tryed it for several BAM files and I never got a non-empty VCF file.
Any suggestion?
Thanks for your help and have a nice weekend!
Have you tried HaplotypeCaller? That would help to determine if GATK is able to properly parse the BAM file.
I found the solution to my problem....you have to use
-outSAMmapqUnique 60
in STAR to make GATK software work with your output. @igor: Thanks for the reply!!You can also use GATK to adjust the MAPQ scores. From Calling variants in RNAseq: