How to reduce annotation errors using SnpEff on BBMap and Pilon VCFs?
7.8 years ago

I get a "chromosome not found" error when annotating variant calls from Mycobacterium tuberculosis genome data, and the console output is red and full of errors.

I have tried:

# annotation without upstream/downstream effects:

  java -Xmx10G -jar snpEff.jar -c snpEff.config -s SNPEffBBmapOutputStats.html -v -no-downstream -no-upstream m_tuberculosis_H37Rv BBMap_variant_call.vcf > SNPEffBBmapGenome_merge.var.ann.vcf

# default parameters of SnpEff:

  java -Xmx10G -jar snpEff.jar -c snpEff.config -s SNPEffBBmapOutputStats.html m_tuberculosis_H37Rv BBMap_variant_call.vcf > SNPEffBBmapGenome_merge.var.ann.vcf

Should I change SnpEff's settings, or preprocess the VCFs somehow before passing them to the annotation engine?

Thanks.

snpeff vcf bbmap pilon • 2.5k views

It's possible that the problem is the chromosome names having spaces in them. Can you post the VCF header?
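
To check the names without posting the whole file, something like the following could list the contig names the data lines actually use. This is only a minimal sketch: `vcf_chroms` is a hypothetical helper name, and it assumes an uncompressed, tab-delimited VCF on standard input.

```shell
# Print the distinct CHROM values (column 1) found in the data lines of a VCF.
# Header lines (starting with '#') are skipped. Reads the VCF from stdin.
vcf_chroms() {
    awk -F'\t' '!/^#/ { print $1 }' | sort -u
}

# Usage:
#   vcf_chroms < BBMap_variant_call.vcf
```

Whatever this prints has to match, character for character, the chromosome names stored in the SnpEff database you are annotating against.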

##fileformat=VCFv4.1
##fileDate=20170202
##source="Pilon version 1.21 Fri Dec 9 16:44:44 2016 -0500"
##PILON="--genome H37Rv_reference.fa --frags file.sorted.bam --output pilon_output.pilon --vcf"
##reference=file:/home/mat29/Desktop/Ready_Genomics_software/H37Rv_reference.fa
##contig=<ID=Mycobacterium,length=4411532>
##FILTER=<ID=LowCov,Description="Low Coverage of good reads at location">
##FILTER=<ID=Amb,Description="Ambiguous evidence in haploid genome">
##FILTER=<ID=Del,Description="This base is in a deletion or change event from another record">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Valid read depth; some reads may have been filtered">
##INFO=<ID=TD,Number=1,Type=Integer,Description="Total read depth including bad pairs">
##INFO=<ID=PC,Number=1,Type=Integer,Description="Physical coverage of valid inserts across locus">
##INFO=<ID=BQ,Number=1,Type=Integer,Description="Mean base quality at locus">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Mean read mapping quality at locus">
##INFO=<ID=QD,Number=1,Type=Integer,Description="Variant confidence/quality by depth">
##INFO=<ID=BC,Number=4,Type=Integer,Description="Count of As, Cs, Gs, Ts at locus">
##INFO=<ID=QP,Number=4,Type=Integer,Description="Percentage of As, Cs, Gs, Ts weighted by Q & MQ at locus">
##INFO=<ID=IC,Number=1,Type=Integer,Description="Number of reads with insertion here">
##INFO=<ID=DC,Number=1,Type=Integer,Description="Number of reads with deletion here">
##INFO=<ID=XC,Number=1,Type=Integer,Description="Number of reads clipped here">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Fraction of evidence in support of alternate allele(s)">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=.,Type=String,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise change from local reassembly (ALT contains Ns)">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=String,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=String,Description="Approximate read depth; some reads may have been filtered">
##ALT=<ID=DUP,Description="Possible segmental duplication">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE
Mycobacterium    1    .    T    .    777    PASS    DP=22;TD=42;BQ=35;MQ=40;QD=35;BC=0,0,0,22;QP=0,0,0,100;PC=35;IC=0;DC=0;XC=0;AC=0;AF=0.00    GT    0/0

Thank you!!!


This looks like a mismatch of chromosome identifiers to me.
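
If that is the cause, one possible preprocessing step is to rename the contig in the VCF so it matches the name used in the SnpEff database. The sketch below is an illustration under assumptions, not a tested recipe for this dataset: `rename_vcf_contig` is a hypothetical helper, the VCF is assumed to be uncompressed and tab-delimited, and `Chromosome` is a placeholder for whatever name your `m_tuberculosis_H37Rv` database really uses (look that up first). The Pilon header above uses `Mycobacterium` as the contig ID.

```shell
# Rename one contig in a VCF read from stdin.
# Rewrites the matching "##contig=<ID=...>" header line and column 1 of data lines;
# every other line is passed through unchanged.
#   $1 = contig name currently in the VCF
#   $2 = contig name expected by the annotation database (placeholder, check yours)
rename_vcf_contig() {
    awk -v old="$1" -v new="$2" 'BEGIN { FS = OFS = "\t" }
        /^##contig=<ID=/ {
            prefix = "##contig=<ID=" old
            rest = substr($0, length(prefix) + 1)
            # Only rewrite when the ID matches exactly (followed by "," or ">").
            if (index($0, prefix) == 1 && (rest ~ /^[,>]/)) {
                print "##contig=<ID=" new rest
                next
            }
        }
        /^#/ { print; next }          # other header lines: unchanged
        $1 == old { $1 = new }        # data lines: rename CHROM if it matches
        { print }'
}

# Usage (file names are placeholders):
#   rename_vcf_contig Mycobacterium Chromosome < pilon.vcf > pilon.renamed.vcf
```

The alternative is to leave the VCF alone and build or choose a SnpEff database whose chromosome name already matches the reference FASTA used for mapping.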


Here is the other one, from BBMap. It looks unhealthy:

##fileformat=VCFv4.2
##BBMapVersion=36.86
##ploidy=1
##rarity=1.00000
##minallelefraction=0.10000
##reads=789372
##pairedReads=789372
##properlyPairedReads=562846
##readLengthAvg=90.982
##properPairRate=0.71303
##totalQualityAvg=34.373
##mapqAvg=16.857
##reference=H37Rv_reference.fa
##contig=<ID=NC_000962.3,length=4411532>
##FORMAT=<ID=PASS,Number=1,Type=String,Description="Pass">
##FORMAT=<ID=FAIL,Number=1,Type=String,Description="Fail">
##INFO=<ID=SN,Number=1,Type=Integer,Description="Scaffold Number">
##INFO=<ID=STA,Number=1,Type=Integer,Description="Start">
##INFO=<ID=STO,Number=1,Type=Integer,Description="Stop">
##INFO=<ID=TYP,Number=1,Type=Integer,Description="Type">
##INFO=<ID=R1P,Number=1,Type=Integer,Description="Read1 Plus Count">
##INFO=<ID=R1M,Number=1,Type=Integer,Description="Read1 Minus Count">
##INFO=<ID=R2P,Number=1,Type=Integer,Description="Read2 Plus Count">
##INFO=<ID=R2M,Number=1,Type=Integer,Description="Read2 Minus Count">
##INFO=<ID=PPC,Number=1,Type=Integer,Description="Paired Count">
##INFO=<ID=LS,Number=1,Type=Integer,Description="Length Sum">
##INFO=<ID=MQS,Number=1,Type=Integer,Description="MAPQ Sum">
##INFO=<ID=MQM,Number=1,Type=Integer,Description="MAPQ Max">
##INFO=<ID=BQS,Number=1,Type=Integer,Description="Base Quality Sum">
##INFO=<ID=BQM,Number=1,Type=Integer,Description="Base Quality Max">
##INFO=<ID=EDS,Number=1,Type=Integer,Description="End Distance Sum">
##INFO=<ID=EDM,Number=1,Type=Integer,Description="End Distance Max">
##INFO=<ID=IDS,Number=1,Type=Integer,Description="Identity Sum">
##INFO=<ID=IDM,Number=1,Type=Integer,Description="Identity Max">
##INFO=<ID=COV,Number=1,Type=Integer,Description="Coverage">
##INFO=<ID=MCOV,Number=1,Type=Integer,Description="Minus Coverage">
##INFO=<ID=CED,Number=1,Type=Integer,Description="Contig End Distance">
##INFO=<ID=HMP,Number=1,Type=Integer,Description="Homopolymer Count">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Fraction">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Ref+, Ref-, Alt+, Alt-">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allele Depth">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Allele Fraction">
##FORMAT=<ID=SC,Number=1,Type=Float,Description="Score">
##FORMAT=<ID=PF,Number=1,Type=String,Description="Pass Filter">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    bbmap_mapped
NC_000962.3    204    .    C    T    33.97    PASS    SN=0;STA=203;STO=204;TYP=SUB;R1P=6;R1M=3;R2P=5;R2M=2;PPC=2;LS=1456;MQS=226;MQM=18;BQS=495;BQM=38;EDS=440;EDM=49;IDS=13378;IDM=857;COV=16;MCOV=-1;CED=203;HMP=3;DP=16;AF=1.0000;DP4=-3,3,11,5    GT:DP:AD:AF:SC:PF    1:16:16:1.0000:33.97:PASS
NC_000962.3    207    .    T    C    36.53    PASS    SN=0;STA=206;STO=207;TYP=SUB;R1P=6;R1M=3;R2P=5;R2M=2;PPC=2;LS=1456;MQS=226;MQM=18;BQS=577;BQM=40;EDS=446;EDM=49;IDS=13378;IDM=857;COV=16;MCOV=-1;CED=206;HMP=0;DP=16;AF=1.0000;DP4=-3,3,11,5    GT:DP:AD:AF:SC:PF    1:16:16:1.0000:36.53:PASS
NC_000962.3    210    .    C    G    36.26    PASS    SN=0;STA=209;STO=210;TYP=SUB;R1P=6;R1M=3;R2P=4;