Mutect2 (GATK4, Java8) outputs empty VCF file for tumor only mode
0
0
Entering edit mode
5.4 years ago
ResearchR ▴ 120

Dear all,

I am using Mutect2 (GATK4 with Java 8, OpenJDK) on RNA-Seq data from a tumor. I would like to call SNVs (as part of a RNA editing pipeline), but Mutect2 outputs always an empty SNV file. I went through the GATK forum since this problem occurred quite often for the GATK3 version of the tool. Since it takes at least two working days after registration until I can post my problem in the GATK forum, I hoped I could get help here a bit faster. I processed my data as follows.

1. STAR two pass-alignment:

STAR \
--genomeDir $star_index_dir \
--readFilesIn $output_dir/star-pass2/read_1.fq $output_dir/star-pass2/read_2.fq \
--outFileNamePrefix $output_dir/${sample_id}. \
--outSAMattributes NH HI AS nM NM MD \
--outSAMtype BAM Unsorted \
--twopassMode Basic \
--outFilterType BySJout \
--outFilterMultimapNmax 20 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverLmax 0.04 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignMatesGapMax 1000000 \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--sjdbScore 1 \
--outTmpDir $PBS_SCRATCH_DIR/$PBS_JOBID/star \
--runThreadN $threads &> $output_dir/star-pass2/log.txt

2. Picard AddOrReplaceReadGroups

java -d64 -Xmx24g -Djava.io.tmpdir=$PBS_SCRATCH_DIR/$PBS_JOBID -jar $EBROOTPICARD/picard.jar AddOrReplaceReadGroups \
INPUT=$output_dir/${sample_id}.Aligned.out.bam \
OUTPUT=$output_dir/${sample_id}.Aligned.out_read_group_added.bam \
RGID=$sample_id \
RGLB=stranded \
RGPL=illumina \
RGPU=barcode \
RGSM=$sample_id \
VALIDATION_STRINGENCY=SILENT &> $output_dir/log_AddOrReplaceReadGroups.txt

3. Sort

java -d64 -Xmx24g -Djava.io.tmpdir=$PBS_SCRATCH_DIR/$PBS_JOBID -jar $EBROOTPICARD/picard.jar SortSam 
I=$output_dir/${sample_id}.Aligned.out_read_group_added.bam \
O=$output_dir/${sample_id}.sorted.bam \
SORT_ORDER=coordinate &> $output_dir/log_samsort.txt

4. Mutect2

$GATK --java-options "-d64 -XX:CompileThreshold=1000 -XX:ReservedCodeCacheSize=128m -Xmx25g -Xms20g -Djava.io.tmpdir=$PBS_SCRATCH_DIR/$PBS_JOBID" \
Mutect2 \
-R $ref_genome_fasta \
-I $output_dir/${sample_id}.sorted.mdup.bam \
-O $output_dir/${sample_id}.mutect.raw.vcf \
-tumor ${sample_id} \
-L 19 \  # <- Just to make testing faster, use chr19
--callable-depth 10 \
--base-quality-score-threshold 25 \
--dont-use-soft-clipped-bases \
--native-pair-hmm-threads $threads > $output_dir/log_MuTect2.txt

If used ValidateSAMFile to check if my BAM file is corrupt, but no error was detected. I tryed it for several BAM files and I never got a non-empty VCF file.

Any suggestion?

Thanks for your help and have a nice weekend!

RNA-Seq SNV Mutect2 • 2.8k views
ADD COMMENT
1
Entering edit mode

Have you tried HaplotypeCaller? That would help to determine if GATK is able to properly parse the BAM file.

ADD REPLY
0
Entering edit mode

I found the solution to my problem....you have to use -outSAMmapqUnique 60 in STAR to make GATK software work with your output. @igor: Thanks for the reply!!

ADD REPLY
0
Entering edit mode

You can also use GATK to adjust the MAPQ scores. From Calling variants in RNAseq:

STAR assigns good alignments a MAPQ of 255 (which technically means “unknown” and is therefore meaningless to GATK). So we use the GATK’s ReassignOneMappingQuality read filter to reassign all good alignments to the default value of 60. This is not ideal, and we hope that in the future RNAseq mappers will emit meaningful quality scores, but in the meantime this is the best we can do.

ADD REPLY

Login before adding your answer.

Traffic: 1677 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6