Hello All,
Can anyone please help me understand why the VCF file obtained from the TASSEL 5 GBS v2 pipeline (DiscoverySNPCallerPluginV2) has 'N' in the reference as well as the alternate allele? When I check those positions in the reference genome, there is no "N" at any of them. Bowtie2 was used to map the tags to the reference genome.
The commands used for SNP calling are as follows:
#$tasselPath -Xms64G -Xmx100G -fork1 -DiscoverySNPCallerPluginV2 -db GBSV2.db -sC "chr1" -eC "chr478" -mnLCov 0.1 -deleteOldData true -endPlugin -runfork1
#$tasselPath -Xms64G -Xmx100G -fork1 -ProductionSNPCallerPluginV2 -db GBSV2.db -i seqDir -k keyFile -o Tassel_out.vcf -kmerLength 85 -mnQS 20 -e ApekI -endPlugin -runfork1
The output file generated looks like this:
##Tassel=<ID=GenotypeTable,Version=5,Description="Reference allele is not known. The major allele was used as reference allele">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the reference and alternate alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=3,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not appl
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 110 111 112 113 114 115 116 117 118
1 16032 S1_16032 NC N . PASS QualityScore=0.0;DP=375 GT:AD:DP:GQ:PL ./.:0,0:0 0/1:3,1:4:99:24,0,96
1 16033 S1_16033 NA N . PASS QualityScore=0.0;DP=375 GT:AD:DP:GQ:PL ./.:0,0:0 0/1:3,1:4:99:24,0,96
1 16034 S1_16034 NA N . PASS QualityScore=0.0;DP=375 GT:AD:DP:GQ:PL ./.:0,0:0 0/1:3,1:4:99:24,0,96
1 16035 S1_16035 NT N . PASS QualityScore=0.0;DP=375 GT:AD:DP:GQ:PL ./.:0,0:0 0/1:3,1:4:99:24,0,96
1 16145 S1_16145 T C . PASS QualityScore=0.0;DP=147 GT:AD:DP:GQ:PL ./.:0,0:0 ./.:0,0:0 ./.:
1 16177 S1_16177 NG N . PASS QualityScore=0.0;DP=147 GT:AD:DP:GQ:PL ./.:0,0:0 ./.:0,0:0 ./.:
1 16178 S1_16178 NG N . PASS QualityScore=0.0;DP=147 GT:AD:DP:GQ:PL ./.:0,0:0 ./.:0,0:0 ./.:
1 16179 S1_16179 NT N . PASS QualityScore=0.0;DP=147 GT:AD:DP:GQ:PL ./.:0,0:0 ./.:0,0:0 ./.:
1 16180 S1_16180 NC N . PASS QualityScore=0.0;DP=147 GT:AD:DP:GQ:PL ./.:0,0:0 ./.:0,0:0 ./.:
1 16295 S1_16295 T C . PASS QualityScore=0.0;DP=422 GT:AD:DP:GQ:PL 0/0:1,0:1:66:0,3,36 1/0:2,3:5:99
1 16319 S1_16319 A T . PASS QualityScore=0.0;DP=146 GT:AD:DP:GQ:PL ./.:0,0:0 0/0:2,0:2:79:0,6,72
Am I doing something wrong? Any help is appreciated.
Hi toralmanvar
I don't know much about TASSEL, but could this be related to how rare alleles are represented? Have a look at the guide here, on page 5.
Thanks a ton, Vijay, for the reply. However, this issue arises because the plugin is not able to use the reference sequence to identify the base adjacent to the INDELs it finds, in spite of the reference genome being provided in one of the parameters of DiscoverySNPCallerPluginV2. I got some help from the TASSEL Google group in understanding the reason behind this problem, but I have yet to receive a proper solution.
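In the meantime, one way to see how many records are affected is to scan the VCF for alleles that contain N. Below is a minimal Python sketch of that check. It is not part of TASSEL: the function name `n_allele_records` is made up for illustration, and it assumes an uncompressed VCF whose first five columns are CHROM, POS, ID, REF and ALT. It only lists the affected records and does not try to repair them.

```python
def n_allele_records(lines):
    """Yield (chrom, pos, ref, alt) for data lines whose REF or ALT contains 'N'.

    Hypothetical helper for illustration only, not a TASSEL function.
    """
    for line in lines:
        # Skip blank lines and header lines ("##..." metadata and "#CHROM").
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace so both tab- and space-separated copies work;
        # only the first five columns are needed.
        chrom, pos, _id, ref, alt = line.split(None, 5)[:5]
        if "N" in ref.upper() or "N" in alt.upper():
            yield chrom, int(pos), ref, alt


if __name__ == "__main__":
    # Two records shortened from the output shown in the question.
    sample = [
        "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 110 111",
        "1 16032 S1_16032 NC N . PASS QualityScore=0.0;DP=375 GT:AD:DP:GQ:PL ./.:0,0:0",
        "1 16145 S1_16145 T C . PASS QualityScore=0.0;DP=147 GT:AD:DP:GQ:PL ./.:0,0:0",
    ]
    for record in n_allele_records(sample):
        print(record)
```

For a real file, the same function can be fed an open file handle, for example `n_allele_records(open("Tassel_out.vcf"))`, since it only iterates over lines.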