Question

Understanding DiscoSNP++ output VCF file

7.1 years ago

achyR ▴ 10

Hello

I have a small question about the output of discoSNP++. While analyzing the VCF file generated by vcfcreator, I found several different genotypes:

.|. ./. 0|0 0/0 0|1 0/1 1|1 1/1

I was wondering if someone could help me understand what "./.", ".|.", "0/0" and "0|0" mean.

Thank you for your help.

SNP VCF discoSNP++ Genotype • 1.6k views


score 3 · Answer 1 · 2017-12-08

Hi Achal, thanks for your question.

Here is an explanation (not limited to discoSnp, and assuming a diploid species).

For each variant and each sample, the genotype tells whether the sample carries the reference allele, the alternative allele, or both.

With a / (unphased genotype):
- with a reference genome: the value 0 denotes the allele found in the reference genome.
- without a reference genome (discoSnp only): the choice of which allele is called reference and which is called alternative is random.
With a | (phased genotype): the variant is phased with the previous one. The first value refers to the same haplotype as the first allele of the previous genotype. This explains why the 1|0 genotype exists: the order of the two values carries information.
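When post-processing the GT field programmatically, the separator is the only thing that distinguishes the phased and unphased forms. Here is a minimal Python sketch, assuming diploid genotypes; `parse_gt` is a hypothetical helper, not part of discoSNP++ or vcfcreator:

```python
def parse_gt(gt: str) -> tuple[list[str], bool]:
    """Split a diploid VCF GT value such as '0/1' or '1|0' into its two
    allele values and report whether it is phased ('|') or unphased ('/')."""
    phased = "|" in gt
    alleles = gt.split("|" if phased else "/")
    return alleles, phased


print(parse_gt("0/1"))  # (['0', '1'], False)
print(parse_gt("1|0"))  # (['1', '0'], True)
print(parse_gt(".|."))  # (['.', '.'], True)
```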

About the values:

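./.: the variant is not seen (missing data)
0/0: homozygous, only the reference allele is present
1/1: homozygous, only the alternative allele is present
0/1: heterozygous, both alleles are present.
The phased forms (.|., 0|0, 0|1, 1|1) have the same meaning, with phasing information added.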
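These categories can be applied mechanically to every GT value in the file. Here is a minimal sketch, assuming diploid GT strings. `classify_gt` is a hypothetical helper, and treating a partly missing value such as `./1` as missing is this sketch's own assumption, not something stated in the answer:

```python
def classify_gt(gt: str) -> str:
    """Map a diploid GT value to the categories listed above.
    Phased ('|') and unphased ('/') forms get the same category."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        # Assumption of this sketch: any '.' (even './1') counts as missing.
        return "missing"
    if all(a == "0" for a in alleles):
        return "homozygous reference"
    if len(set(alleles)) == 1:
        return "homozygous alternative"
    return "heterozygous"


for gt in [".|.", "./.", "0|0", "0/0", "0|1", "0/1", "1|1", "1/1"]:
    print(gt, classify_gt(gt))
```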

Hope this helps, Pierre

score 0 · Answer 2 · 2017-12-08

Hello Pierre

Thank you for your reply. It was helpful. However, I am still confused about how to interpret "./.".

I have 50 samples listed in the .fof file. Once it finishes, discoSNP++ (followed by vcfcreator) outputs a contig FASTA file and a VCF file. The VCF file contains numerous rows, each corresponding to a single variant, and 9 + 50 columns: the 9 standard VCF columns plus one column per sample, holding the information about that variant in each of the 50 samples. Now take an example row from the output VCF file:

SNP_higher_path_9480770 56 9480770 C T . . Ty=SNP;Rk=1;UL=6;UR=20;CL=.;CR=.;Genome=.;Sd=. GT:DP:PL:AD:HQ ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 0/0:11:5,37,224:11,0:66,0 ./.:1:.,.,.:1,0:68,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:5:.,.,.:0,5:0,63 ./.:0:.,.,.:0,0:0,0 1/1:1259:25184,3794,59:0,1259:0,66 ./.:0:.,.,.:0,0:0,0 1/1:43:864,134,6:0,43:0,65 1/1:38:764,119,6:0,38:0,64 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 1/1:34:684,107,6:0,34:0,66 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0

Here you can see that most columns have "./.", a few have "1/1" and one has "0/0".
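For a file with 50 samples it is easier to tally these values than to read them by eye. Here is a minimal Python sketch that splits one data line into the 9 fixed columns and one dict per sample (keyed by the FORMAT names GT:DP:PL:AD:HQ), then counts the GT values. `split_vcf_line` and `genotype_counts` are hypothetical helpers, and the example uses only five of the 50 sample columns copied from the row above:

```python
from collections import Counter

FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def split_vcf_line(line: str) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split one VCF data line into its 9 fixed columns and one dict per
    sample column, keyed by the names listed in the FORMAT column."""
    fields = line.split()  # real VCF lines are tab-separated; split() handles tabs and spaces
    fixed = dict(zip(FIXED_COLUMNS, fields[:9]))
    keys = fixed["FORMAT"].split(":")
    samples = [dict(zip(keys, column.split(":"))) for column in fields[9:]]
    return fixed, samples


def genotype_counts(line: str) -> Counter:
    """Count how often each GT value occurs across the sample columns."""
    _, samples = split_vcf_line(line)
    return Counter(sample.get("GT", ".") for sample in samples)


# Fixed columns plus five of the 50 sample columns, copied from the row above.
row = (
    "SNP_higher_path_9480770 56 9480770 C T . . "
    "Ty=SNP;Rk=1;UL=6;UR=20;CL=.;CR=.;Genome=.;Sd=. GT:DP:PL:AD:HQ "
    "./.:0:.,.,.:0,0:0,0 0/0:11:5,37,224:11,0:66,0 ./.:5:.,.,.:0,5:0,63 "
    "1/1:1259:25184,3794,59:0,1259:0,66 1/1:43:864,134,6:0,43:0,65"
)

fixed, samples = split_vcf_line(row)
print(fixed["ID"], len(samples))  # 9480770 5
print(genotype_counts(row))       # Counter({'./.': 2, '1/1': 2, '0/0': 1})
```

On the full row, `len(samples)` should be 50, matching the 9 + 50 column layout.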

Now my question is: how should I interpret samples with genotype "./."? Should I interpret it as meaning that the contig "SNP_higher_path_9480770" is absent from this particular sample, OR that the contig is present but carries no variation?
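The other FORMAT fields in the row help separate these cases. In the standard VCF definition, DP is the read depth and AD gives the read counts for the REF and ALT alleles, and the called samples in this row are consistent with that (0/0 has AD 11,0; 1/1 has AD 0,1259). Most "./." columns have DP 0 and AD 0,0, but some have DP 1 or DP 5, i.e. a few reads and still no genotype. The sketch below lists such columns. `missing_but_covered` is a hypothetical helper that assumes the GT:DP:PL:AD:HQ layout shown. It only reports what the columns contain, not why discoSNP++/vcfcreator left them uncalled:

```python
def missing_but_covered(line: str) -> list[tuple[int, int, str]]:
    """Return (sample_number, DP, AD) for each sample column whose GT is
    missing ('./.' or '.|.') even though its DP is greater than zero."""
    fields = line.split()            # whitespace- or tab-separated VCF line
    keys = fields[8].split(":")      # FORMAT column, e.g. GT:DP:PL:AD:HQ
    hits = []
    for number, column in enumerate(fields[9:], start=1):
        values = dict(zip(keys, column.split(":")))
        gt = values.get("GT", ".")
        dp = values.get("DP", ".")
        if gt in ("./.", ".|.") and dp not in (".", "0"):
            hits.append((number, int(dp), values.get("AD", ".")))
    return hits


# Fixed columns plus four sample columns copied from the row above.
row = (
    "SNP_higher_path_9480770 56 9480770 C T . . "
    "Ty=SNP;Rk=1;UL=6;UR=20;CL=.;CR=.;Genome=.;Sd=. GT:DP:PL:AD:HQ "
    "./.:0:.,.,.:0,0:0,0 ./.:1:.,.,.:1,0:68,0 "
    "./.:5:.,.,.:0,5:0,63 1/1:43:864,134,6:0,43:0,65"
)
print(missing_but_covered(row))  # [(2, 1, '1,0'), (3, 5, '0,5')]
```

Sample numbers in the output are 1-based positions within the sample columns of the line passed in.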

I hope my question is clear. Thanks.