I am trying to calculate the depth of coverage of an exome using GATK and am having trouble with two parts.
This is the command I am trying to emulate:
java -Xmx3072m -jar ./Sting/dist/GenomeAnalysisTK.jar \
-T DepthOfCoverage -I group1.READS.bam.list -L EXOME.interval_list \
-R ./human_g1k_v37.fasta \
-dt BY_SAMPLE -dcov 5000 -l INFO --omitDepthOutputAtEachBase --omitLocusTable \
--minBaseQuality 0 --minMappingQuality 20 --start 1 --stop 5000 --nBins 200 \
--includeRefNSites \
-o group1.DATA
My first question is about group1.READS.bam.list: I am confused about what is being asked for. Do they simply want the paths to a handful of my BAM files, separated by newlines?
My second question: when I run the command, it errors out saying my contigs are incompatible.
Input files reads and reference have incompatible contigs: No overlapping contigs found.
ERROR reads contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247...
ERROR reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5...
Is this an issue with the reference genome I used or with my BAM files? I used hg19 because human_g1k_v37 was erroring on the reference while hg19 did not.
Thank you enormously.
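On the first question: as far as I know, GATK 3.x accepts a plain text file with a .list extension for -I, containing one BAM path per line. Below is a minimal Python sketch for writing such a file. The function name write_bam_list and the idea that all BAMs sit in one directory are assumptions made for illustration, not part of GATK.

```python
from pathlib import Path


def write_bam_list(bam_dir, list_path):
    """Write the absolute path of every *.bam file in bam_dir, one per line.

    Hypothetical helper: the result is meant to be passed to GATK via -I.
    """
    bam_paths = sorted(Path(bam_dir).resolve().glob("*.bam"))
    Path(list_path).write_text("".join(f"{p}\n" for p in bam_paths))
    return [str(p) for p in bam_paths]
```

For example, write_bam_list("bams", "group1.READS.bam.list") would list every BAM found in a (hypothetical) bams directory.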
Thank you for your response! I aligned against hg19 initially and this error was still present when attempting to run the function using hg19 as a reference.
Sounds weird. If you use the exact same reference in both steps, this shouldn't happen. Check that the contig names in the hg19 version you have are the same as the ones in your BAM file. If those are the same, the only things I can think of off the top of my head are that EXOME.interval_list might be malformed, or that you could try regenerating the index/dictionary files.
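The contig-name check can be sketched in Python. This is only an illustration under assumptions: fasta_contigs and diagnose are hypothetical helper names, and the "chr" prefix test covers only the naming difference visible in the error message above, not every way two references can differ.

```python
def fasta_contigs(fasta_lines):
    """Return contig names from FASTA header lines, in file order."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                # The contig name is the first word after '>'.
                names.append(fields[0])
    return names


def diagnose(read_contigs, ref_contigs):
    """Describe how the BAM contig names relate to the reference names."""
    reads, ref = set(read_contigs), set(ref_contigs)
    if reads & ref:
        return "overlap"
    if {"chr" + c for c in reads} & ref:
        return "reference has chr prefix, reads do not"
    if {"chr" + c for c in ref} & reads:
        return "reads have chr prefix, reference does not"
    return "no overlap"
```

With the names from the error message, diagnose(["1", "2", "X"], ["chr1", "chr2", "chrX"]) reports that the reference carries the chr prefix while the reads do not, which would mean the BAM and the hg19 FASTA use different naming schemes.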
I was told that I needed to use the human_g1k_v37 reference, but I got this error:
Badly formed genome loc: Contig 'chr start stop name' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?
Does this mean that I don't have the reference dict in gatk?
Try removing the .dict file; if GATK does not find it, it should try to recreate it. You should also check that your EXOME.interval_list has the same contig names and is derived from the same reference that you are using.
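One possible reading of the "Contig 'chr start stop name'" error is that the interval file starts with a column-header line that is being parsed as a locus, though that is a guess from the message alone. A rough Python sketch for spotting such lines follows. The helper name bad_interval_lines is hypothetical, and it assumes each interval line is either "contig:start-stop" or whitespace-separated columns with the contig first.

```python
def bad_interval_lines(interval_lines, ref_contigs):
    """Return (line_number, text) for lines whose contig is not in ref_contigs.

    Lines starting with '@' are treated as header lines and skipped.
    """
    known = set(ref_contigs)
    bad = []
    for number, raw in enumerate(interval_lines, start=1):
        line = raw.strip()
        if not line or line.startswith("@"):
            continue
        # Take the first column, then drop any ":start-stop" suffix.
        # Assumes contig names themselves contain no ':' character.
        contig = line.split()[0].split(":")[0]
        if contig not in known:
            bad.append((number, line))
    return bad
```

Any line it returns, such as a leading "chr start stop name" row or intervals named chr1 when the reference uses 1, would need to be removed or renamed before GATK can match it against the sequence dictionary.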
Is the .dict in the GATK jar file? I am unsure how to delete what is within it.
Johan is right. The chromosome names in the headers of the FASTA sequences that were used as the reference genome for alignment should match those in the reference file you are providing for your current analysis (coverage). GATK will complain even if the first reference file had chromosome names 1, 2, 3, ..., MT and the current reference has chr1, chr2, ..., chrMT. The order of the chromosomes matters too: the BAM file should be sorted in the same order as the chromosomes appear in the new reference file, or vice versa.
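The ordering point can be checked with a few lines of Python. This is a sketch of the idea only, not GATK's actual validation logic: same_relative_order is a hypothetical helper that compares the order of the contigs the two lists share.

```python
def same_relative_order(bam_contigs, ref_contigs):
    """True if contigs present in both lists appear in the same relative order."""
    ref_set = set(ref_contigs)
    bam_set = set(bam_contigs)
    shared_in_bam = [c for c in bam_contigs if c in ref_set]
    shared_in_ref = [c for c in ref_contigs if c in bam_set]
    return shared_in_bam == shared_in_ref
```

In the error message from the original post, the reads list X after 22 while the hg19 reference lists chrX right after chr7, so even with the prefixes reconciled the two orders would still disagree.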
1) Can you run the command grep ">" Reference_file.fasta (the reference file you are giving as input to GATK) and paste the output here? It should be the header line for each chromosome. 2) Also, run samtools view -H Input.bam (the BAM file going into GATK as input) and paste the header information of your BAM file here.
Also, paste the GATK command you are using.
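If the BAM header is long, only the @SQ lines matter for the contig comparison. Here is a small Python sketch that pulls the contig names and lengths out of header text, assuming the text was saved from samtools view -H with its tab separators intact. The function name sq_contigs is made up for this example.

```python
def sq_contigs(header_text):
    """Return (name, length) pairs from the @SQ lines of a SAM header."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are TAG:VALUE pairs separated by tabs.
        fields = line.split("\t")[1:]
        tags = dict(f.split(":", 1) for f in fields if ":" in f)
        if "SN" in tags:
            length = int(tags["LN"]) if "LN" in tags else None
            contigs.append((tags["SN"], length))
    return contigs
```

The resulting names can then be compared directly with the grep ">" output from the reference.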
Header lines for each chromosome:
The samtools view -H input.bam command gives me too much output to paste in here.
It has a section of:
@HD VN:1.0 GO:none SO:coordinate
@SQ SN:1 LN:249250621 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:1b22b98cdeb4a9304cb5d48026a85128 SP:Homo Sapiens
@SQ SN:2 LN:243199373 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:a0d9851da00400dec1098a9255ac712e SP:Homo Sapiens
@SQ SN:3 LN:198022430 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:fdfd811849cc2fadebc929bb925902e5 SP:Homo Sapiens
@SQ SN:4 LN:191154276 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:23dccd106897542ad87d2765d28a19a1 SP:Homo Sapiens
@SQ SN:5 LN:180915260 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:0740173db9ffd264d728f32784845cd7 SP:Homo Sapiens
@SQ SN:6 LN:171115067 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:1d3a93a248d92a729ee764823acbbc6b SP:Homo Sapiens
@SQ SN:7 LN:159138663 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:618366e953d6aaad97dbe4777c29375e SP:Homo Sapiens
@SQ SN:8 LN:146364022 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:96f514a9929e410c6651697bded59aec SP:Homo Sapiens
a section of:
@RG ID:C08VM.1 PL:illumina PU:C08VMACXX111121.1.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:C08VM.2 PL:illumina PU:C08VMACXX111121.2.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:C08VM.3 PL:illumina PU:C08VMACXX111121.3.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:C08VM.4 PL:illumina PU:C08VMACXX111121.4.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:C08VM.5 PL:illumina PU:C08VMACXX111121.5.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:C08VM.6 PL:illumina PU:C08VMACXX111121.6.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:C08VM.7 PL:illumina PU:C08VMACXX111121.7.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:C08VM.8 PL:illumina PU:C08VMACXX111121.8.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:D0CGV.4 PL:illumina PU:D0CGVACXX111118.4.AGGTTATC LB:Catch-103331 DT:2011-11-18T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:D0D16.1 PL:illumina PU:D0D16ACXX111121.1.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:D0D16.2 PL:illumina PU:D0D16ACXX111121.2.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:D0D16.3 PL:illumina PU:D0D16ACXX111121.3.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:D0D16.4 PL:illumina PU:D0D16ACXX111121.4.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:D0D16.5 PL:illumina PU:D0D16ACXX111121.5.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:D0D16.6 PL:illumina PU:D0D16ACXX111121.6.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
@RG ID:D0D16.7 PL:illumina PU:D0D16ACXX111121.7.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI
and a section of:
@PG ID:GATK TableRecalibration VN:1.2-64-g2ddbb7b CL:defaultreadgroup=null defaultplatform=null forcereadgroup=null forceplatform=null windowsizenqs=5 homopolymernback=7 exceptionifnotile=false solidrecalmode=SETQZERO solidnocallstrategy=THROWEXCEPTION recalfile=/seq/picard/D0D6UACXX/C1-1602011-11-182011-11-28/6/Catch-103331/D0D6UACXX.6.recaldata.csv preserveqscoreslessthan=5 smoothing=1 maxqualityscore=50 doNotWriteOriginalQuals=false nopgtag=false failwithnoeofmarker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG ID:GATK TableRecalibration.1 VN:1.2-64-g2ddbb7b CL:defaultreadgroup=null defaultplatform=null forcereadgroup=null forceplatform=null windowsizenqs=5 homopolymernback=7 exceptionifnotile=false solidrecalmode=SETQZERO solidnocallstrategy=THROWEXCEPTION recalfile=/seq/picard/C08VMACXX/C1-1602011-11-212011-12-01/8/Catch-103331/C08VMACXX.8.recaldata.csv preserveqscoreslessthan=5 smoothing=1 maxqualityscore=50 doNotWriteOriginalQuals=false nopgtag=false failwithnoeofmarker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG ID:GATK TableRecalibration.10 VN:1.2-64-g2ddbb7b CL:defaultreadgroup=null defaultplatform=null forcereadgroup=null forceplatform=null windowsizenqs=5 homopolymernback=7 exceptionifnotile=false solidrecalmode=SETQZERO solidnocallstrategy=THROWEXCEPTION recalfile=/seq/picard/C08VMACXX/C1-1602011-11-212011-12-01/5/Catch-103331/C08VMACXX.5.recaldata.csv preserveqscoreslessthan=5 smoothing=1 maxqualityscore=50 doNotWriteOriginalQuals=false nopgtag=false failwithnoeofmarker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
@PG ID:GATK TableRecalibration.11 VN:1.2-64-g2ddbb7b CL:defaultreadgroup=null defaultplatform=null forcereadgroup=null forceplatform=null windowsizenqs=5 homopolymernback=7 exceptionifnotile=false solidrecalmode=SETQZERO solidnocallstrategy=THROWEXCEPTION recalfile=/seq/picard/C08VMACXX/C1-1602011-11-212011-12-01/7/Catch-103331/C08VMACXX.7.recaldata.csv preserveqscoreslessthan=5 smoothing=1 maxqualityscore=50 doNotWriteOriginalQuals=false nopgtag=false failwithnoeofmarker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]
Is this proper?
The approximate command I ran is in the original post.