Haploid Genotypes in discoSNP++ VCFs
6.2 years ago

I am trialing discoSNP++ as part of a bacterial GWAS pipeline and would like some clarification on the genotypes in the multisample VCFs. As in the Pseudomonas example in the VCF_creator user guide (pages 4-5), we see the full range of diploid genotypes (0/0, 0/1, 1/1), including a large number of heterozygous calls, even though all the samples come from haploid organisms. How should we interpret these heterozygous calls in our bacterial sequence data? (The paired-end reads were provided in the fof_reads1.txt and fof_reads2.txt structure described in Case 4 of the discoSNP user guide.) Any guidance appreciated!

6.2 years ago

Hi

Genotyping is optional and can be switched off (-n option) when working on non-diploid species. Computing diploid genotypes for a haploid species is meaningless.

However, 0/1 results should be treated as a warning (depending on the effective coverage of each allele), as they may reflect approximate repeats in the genome, e.g. -----A------//------T-------, which can be reported as SNP variants although they are not.
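
To make that check concrete, here is a minimal sketch for listing the 0/1 calls on a VCF line together with the read support of each allele. It assumes the FORMAT column contains a GT key and an AD key holding comma-separated per-allele read counts (the usual VCF convention; check the header of your own file). The function name `het_calls`, the sample names, and the example line are made up for illustration and are not discoSNP++ output.

```python
def het_calls(vcf_line, sample_names):
    """Return (sample, ref_reads, alt_reads) for every 0/1 genotype on one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")   # FORMAT column, e.g. GT:DP:AD
    gt_i = keys.index("GT")
    ad_i = keys.index("AD")       # assumption: AD = "ref_count,alt_count"
    flagged = []
    for name, sample in zip(sample_names, fields[9:]):
        values = sample.split(":")
        alleles = set(values[gt_i].replace("|", "/").split("/"))
        if alleles == {"0", "1"}:
            ref_reads, alt_reads = (int(x) for x in values[ad_i].split(",")[:2])
            flagged.append((name, ref_reads, alt_reads))
    return flagged


# Invented example line with three samples (not real data).
line = "SNP_1\t31\t1\tA\tT\t.\t.\t.\tGT:DP:AD\t0/0:40:40,0\t0/1:38:20,18\t1/1:35:0,35"
print(het_calls(line, ["s1", "s2", "s3"]))
```

A 0/1 call where both alleles are well covered in a haploid sample is the pattern that would point to an approximate repeat; a 0/1 call with only a handful of reads on one allele is more likely noise.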

1/1 results are expected: with no reference genome, the "reference" allele is chosen randomly, so a sample that is homozygous for the other allele is reported as 1/1.
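
If you keep the genotypes and want a haploid-style matrix for GWAS, one possible downstream recoding is sketched below. This is an assumed convention, not something discoSNP++ does: 0/0 becomes 0, 1/1 becomes 1, and anything else (0/1 or missing) becomes "." so it can be reviewed or excluded. The function name `haploid_call` is hypothetical.

```python
def haploid_call(gt):
    """Collapse a diploid GT string to a haploid allele, or "." when ambiguous or missing."""
    alleles = set(gt.replace("|", "/").split("/"))
    if len(alleles) == 1:
        return alleles.pop()   # "0", "1", or "." for a missing call
    return "."                 # mixed alleles: not interpretable for a haploid sample


for gt in ["0/0", "0/1", "1/1", "./."]:
    print(gt, "->", haploid_call(gt))
```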

Hope this helps; Pierre


Thanks for the great (and quick!) explanation.
