Tassel5v2 vcf file with 'N' in reference and alternate allele
5.1 years ago
Tm

Hello All,

Can anyone please help me understand why the VCF file produced by the Tassel5v2 pipeline (DiscoverySNPCallerPluginV2) contains 'N' in both the reference and the alternate allele? When I check those positions in the reference genome, there is no N there. Bowtie2 was used to map the tags to the reference genome.

The commands used for SNP calling were:

$tasselPath -Xms64G -Xmx100G -fork1 -DiscoverySNPCallerPluginV2 -db GBSV2.db -sC "chr1" -eC "chr478" -mnLCov 0.1 -deleteOldData true -endPlugin -runfork1

$tasselPath -Xms64G -Xmx100G -fork1 -ProductionSNPCallerPluginV2 -db GBSV2.db -i seqDir -k keyFile -o Tassel_out.vcf -kmerLength 85 -mnQS 20 -e ApekI -endPlugin -runfork1
  
The generated output file looks like this:

##Tassel=<ID=GenotypeTable,Version=5,Description="Reference allele is not known. The major allele was used as reference allele">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the reference and alternate alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=3,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not appl
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  110     111     112     113     114     115     116     117     118
1       16032   S1_16032        NC      N       .       PASS    QualityScore=0.0;DP=375 GT:AD:DP:GQ:PL  ./.:0,0:0       0/1:3,1:4:99:24,0,96
1       16033   S1_16033        NA      N       .       PASS    QualityScore=0.0;DP=375 GT:AD:DP:GQ:PL  ./.:0,0:0       0/1:3,1:4:99:24,0,96
1       16034   S1_16034        NA      N       .       PASS    QualityScore=0.0;DP=375 GT:AD:DP:GQ:PL  ./.:0,0:0       0/1:3,1:4:99:24,0,96
1       16035   S1_16035        NT      N       .       PASS    QualityScore=0.0;DP=375 GT:AD:DP:GQ:PL  ./.:0,0:0       0/1:3,1:4:99:24,0,96
1       16145   S1_16145        T       C       .       PASS    QualityScore=0.0;DP=147 GT:AD:DP:GQ:PL  ./.:0,0:0       ./.:0,0:0       ./.:
1       16177   S1_16177        NG      N       .       PASS    QualityScore=0.0;DP=147 GT:AD:DP:GQ:PL  ./.:0,0:0       ./.:0,0:0       ./.:
1       16178   S1_16178        NG      N       .       PASS    QualityScore=0.0;DP=147 GT:AD:DP:GQ:PL  ./.:0,0:0       ./.:0,0:0       ./.:
1       16179   S1_16179        NT      N       .       PASS    QualityScore=0.0;DP=147 GT:AD:DP:GQ:PL  ./.:0,0:0       ./.:0,0:0       ./.:
1       16180   S1_16180        NC      N       .       PASS    QualityScore=0.0;DP=147 GT:AD:DP:GQ:PL  ./.:0,0:0       ./.:0,0:0       ./.:
1       16295   S1_16295        T       C       .       PASS    QualityScore=0.0;DP=422 GT:AD:DP:GQ:PL  0/0:1,0:1:66:0,3,36     1/0:2,3:5:99
1       16319   S1_16319        A       T       .       PASS    QualityScore=0.0;DP=146 GT:AD:DP:GQ:PL  ./.:0,0:0       0/0:2,0:2:79:0,6,72

Am I doing something wrong? Any help is appreciated.

Hi toralmanvar

I don't know much about TASSEL, but do you think it could be related to how rare alleles are represented? Have a look at page 5 of the TASSEL guide.

Thanks a lot for the reply, Vijay. However, the issue is that the plugin does not use the reference sequence to identify the base adjacent to the called indels, even though I provided the reference genome as one of the parameters of DiscoverySNPCallerPluginV2. The TASSEL Google group helped me understand the cause of this problem, but I have not yet found a proper solution.
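Until the plugin fills in that base itself, one possible post-processing workaround is to look the padding base up in the reference FASTA and substitute it for the N. Below is a minimal Python sketch of that idea; it is not a TASSEL feature, and `read_fasta` and `fill_padding_base` are hypothetical helper names. It assumes POS is 1-based and points at the padding base, as the VCF specification requires for indels, and that the FASTA contig names match the CHROM column (the commands above use `chr1` to `chr478` while the VCF shows `1`, so a name mapping may be needed).

```python
from typing import Dict, List, Optional, Tuple


def read_fasta(path: str) -> Dict[str, str]:
    """Minimal FASTA reader: maps the first word of each header to its sequence."""
    chunks: Dict[str, List[str]] = {}
    name: Optional[str] = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def fill_padding_base(
    chrom: str, pos: int, ref: str, alt: str, genome: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Replace a leading 'N' padding base in REF/ALT with the genome base.

    ASSUMPTION: POS is 1-based and points at the padding base, as the VCF
    spec requires for indels. Returns None when the record is not N-padded,
    the contig is missing, or the REF bases after the padding base do not
    match the genome (i.e. the assumption does not hold for this record).
    """
    alts = alt.split(",")
    if not ref.startswith("N") or not all(a.startswith("N") for a in alts):
        return None  # not an N-padded record; leave it alone
    seq = genome.get(chrom)
    if seq is None or not 1 <= pos <= len(seq):
        return None
    base = seq[pos - 1].upper()  # 1-based POS -> 0-based index
    tail = ref[1:]
    # Sanity check: the deleted bases in REF should follow POS in the genome.
    if seq[pos:pos + len(tail)].upper() != tail.upper():
        return None
    return base + tail, ",".join(base + a[1:] for a in alts)
```

If the sanity check fails for most records, TASSEL may be reporting the position of the deleted base rather than the padding base, in which case the offset (and POS itself) would need adjusting before this substitution makes sense.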

3.6 years ago
Lindsay

I believe eight of the loci in your example above are deletions. The VCF specification requires an insertion or deletion to include the base immediately before the event, with POS giving the position of that base. TASSEL does this in an unusual way: it writes an N in place of that preceding nucleotide, so that the preceding base can be variable.
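To check which records this applies to, here is a small Python sketch (the helper names `classify` and `n_padded_records` are hypothetical) that scans VCF data lines and reports records whose REF and ALT both start with N, classifying each by comparing allele lengths. It assumes biallelic records and whitespace-separated columns, as in the pasted excerpt.

```python
from typing import Iterable, Iterator, Tuple


def classify(ref: str, alt: str) -> str:
    """Classify a biallelic record by comparing REF and ALT lengths."""
    if len(ref) == len(alt) == 1:
        return "SNP"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "other"  # same length > 1, e.g. an MNP


def n_padded_records(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str, str]]:
    """Yield (CHROM, POS, REF, ALT, kind) for records whose REF and ALT start with N."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header line
        fields = line.split()  # whitespace split, as in the pasted excerpt
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if ref.startswith("N") and alt.startswith("N"):
            yield chrom, pos, ref, alt, classify(ref, alt)
```

Iterating over an open handle of `Tassel_out.vcf` with `n_padded_records` would, on the excerpt in the question, flag the records at 16032 to 16035 and 16177 to 16180, all as deletions (a two-base REF against a lone N in ALT), and skip the ordinary T/C and A/T SNPs.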
