Discrepancy with Mutect output version 1.1.4 using the --interval option for normal/tumor pair cancer samples
0
0
Entering edit mode
10.3 years ago
ivivek_ngs ★ 5.2k

I used Mutect for my normal/tumor pair samples(exome) with the bam files which are preprocessed by GATK for BQSR and those recalibrated bam files are used for Mutect downstream for somatic mutation identification. To note during processing of bam in GATK I have already used target bed files provided by the company which are used for target enrichment during the library preparation process. Still I wanted to see if I use those target bed files again with Mutect do I get mutations with better annotation on exomes or not. When I run mutect without the --intervals option i.e not specifying the Target enrichment bed files I get variants which around 1800 which are having the flag KEEP but when use the option --intervals with MuTect 1.1.4 with I do not get any variants with flag KEEP and COVERED. Is it not a right way to use the target bed file of the company while running the MuTect on GATK processed bam files? The command line I used is below

java -Xmx14g -jar /scratch/GT/softwares/mutect/muTect-1.1.4.jar \
  --analysis_type MuTect \
  --reference_sequence /scratch/GT/vdas/test_exome/exome/hg19.fa \
  --cosmic /data/PGP/exome/mutect/hg19/hg19_cosmic_v54_120711.vcf \
  --dbsnp /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf \
  --input_file:normal /scratch/GT/vdas/pietro/exome_seq/results/N_S8981/N_S8981.realigned.recal.bam \
  --input_file:tumor /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.bam \
  --out /scratch/GT/vdas/pietro/exome_seq/results/mutect/param_test/mutect_S_333soma_t.txt \
  --coverage_file /scratch/GT/vdas/pietro/exome_seq/results/mutect/param_test/LG.coverage.wig.txt \
  --vcf /scratch/GT/vdas/pietro/exome_seq/results/mutect/param_test/mutect_S_333soma_t.vcf \
  --intervals /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed \
  --fraction_contamination 0.5

I would like to know if anyone has tried this or not and the feedback. Is it advisable to use the target bed files with --interval options of GATK recalibrated bam files while running MuTect and fishing out somatic mutations for the pair?

MuTect GATK SNP • 5.4k views
ADD COMMENT
0
Entering edit mode

It seems that using the interval file is not the problem, I had re run the analysis again without using the interval list of bed file that comes with the target kit, but I kept the tumor contamination fraction as 50% or 0.5 . I get only 2 somatic variants. I have also checked the scenario with VarScan at both 25% and 50% purity and I found very few variants which were high confident. I retrieved around 180 odd variants with VarScan out of which only 8-9 were on the exons. This might be an indication that the variants might not be ideally true. But I would like to have some input from you guys who already tried exome data analysis for somatic variants with less than or equal to 50% pure line for normal/tumor pairs for fishing out mutations(somatic). Please give your views.

ADD REPLY
0
Entering edit mode

Hi vchris_ngs,

Your input file

--input_file:normal [Did you realign bam files and how did you do that.] Is it that muTect needs BAM file be ordered in different way.
ADD REPLY
0
Entering edit mode

realignment of the bam file is the normal exome sequence pipeline procedure, it is the realignment around the indels, I am using the realigned and recalibrated bam files and calling variants on them with Mutect, VarScan and GATK, so am keeping the source bam file same from which I am making the variant calls.

ADD REPLY
0
Entering edit mode

If you could share the links where the procedure is defined to get realignment around indels, that would be great.

I normally used sorted BAM for varscan2/somaticSnipper and works fine. I wanted to run muTect on the same BAM and it is throwing error.

You can see the error message which I posted here.

ADD REPLY
1
Entering edit mode

@Chirag Nepal

I used the below command with Mutect version 1.1.4

java -Xmx14g \
  -jar /path/muTect-1.1.4.jar \
  --analysis_type MuTect \
  --reference_sequence /path/hg19.fa \
  --cosmic /path/mutect/hg19/hg19_cosmic_v54_120711.vcf \
  --dbsnp /path/databases/dbsnp_137.hg19.vcf \
  --input_file:normal /path/normal/normal.realigned.recal.bam \
  --input_file:tumor /tumor/tumor.realigned.recal.bam \
  --out /path/mutect_out.txt \
  --coverage_file /path/mutect.coverage.wig.txt \
  --vcf /path/mutect_out.vcf \
  --intervals /path/Exon_V4_clean.list \
  --fraction_contamination 0.50 \
  -ip 50

Using local realignment on the sorted bam files which are cleaned after marking or removing duplicates with Picard tool (please read the documentation of Picard)

Local realignment around indels

Step1:

java -Xmx4g -jar GenomeAnalysisTK.jar \
-T RealignerTargetCreator \
-R hg19.fa \
-o input.bam.list \
-I input.marked.bam

supported extensions (.bed, .list, .picard, .interval_list, or .intervals)

This step puts the table in the file in input.bam.list. When this is finished we can start the realigning step using the statements below:(medium time run)

java -Xmx4g -Djava.io.tmpdir=/tmp \
-jar GenomeAnalysisTK.jar \
-I input.marked.bam \
-R hg19.fa \
-T IndelRealigner \
-targetIntervals input.bam.list \
-o input.marked.realigned.bam

Step2: Quality score recalibration

That's still not all. Quality data generated from the sequencer isn't always very accurate and for obtaining good SNP calls (which rely on base quality scores), recalibration of these scores is necessary. Again this is done in two steps: the CountCovariates step and the PrintReads steps. Both can be run from the GATK package:

1) Count covariates:

java -Xmx4g -jar GenomeAnalysisTK.jar \
-l INFO \
-R hg19.fa \
--DBSNP dbsnp137.txt \
-I input.marked.realigned.fixed.bam \
-T CountCovariates \
-cov ReadGroupCovariate \
-cov QualityScoreCovariate \
-cov CycleCovariate \
-cov ContextCovariate \
-recalFile input.recal_data.csv

This step creates a .csv file which is needed for the next step and requires a dbSNP file, which can be downloaded at the UCSC Genome browser homepage'(http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/). DbSNP137 is the most novel one which can be downloaded from the UCSC browser, but dbSNP is updated regularly, so newer versions will be available in the future.

2) Table recalibration:

java -Xmx4g -jar GenomeAnalysisTK.jar \
-l INFO \
-R hg19.fa \
-I input.marked.realigned.fixed.bam \
-T PrintReads \
--out input.marked.realigned.fixed.recal.bam \
-recalFile input.recal_data.csv

For my analysis I created this bam files individually both for the normal and the tumor samples, this bam file can be then used to run with Mutect (in my case I used the version 1.1.4), the GATK which I have used is 2.3.4, but I would recommend you to use the latest version of the suite.

ADD REPLY
0
Entering edit mode

hi @vchris_ngs,

Thanks for your detail answer.

I got into this error:

ERROR MESSAGE: SAM/BAM file bam/P1T_SRR900123_trimmed_bwa.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups

I used bwa to map reads, and converted/sort sam to bam.

Then used this command:

java -Xmx4g -jar ~/unixTools/GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /steno-internal/chirag/data/indexGenome/hg19/bwa/hg19.fa -o TEST_LIST -I bam/P1T_SRR900123_trimmed_bwa.bam

Can you help me with this?

ADD REPLY
0
Entering edit mode

Could it be that you forgot to set the read groups at bwa alignment?

You can use Picard AddOrReplaceReadGroups take a look here

ADD REPLY
0
Entering edit mode

I did forget to set read groups at bwa alignments.

ReadGroup infor are not present in fastq files. As I am working on downloaded reads from SRA, I have no information needed to put for Read group in Picardtools. I am trying to bypass this by dummy values, but not yet successful.

ADD REPLY

Login before adding your answer.

Traffic: 1972 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6