Question

number of heterozygous and homozygous SNPs across lineages

0

Entering edit mode

4.1 years ago

romain.derelle • 0

Hello,

I have mapped several Illumina paired-end libraries on the reference genome of a protist (unicellular eukaryote; 100Mb genome with few repeats but very low AT, high ploidie) using Bwa-mem. All libraries correspond to different lineages of the species of the reference genome.

After calling heterozygous and homozygous SNPs using samtools (only primary alignments: -F 260) and bcftools (only good quality alignments: -q 30), I obtained the following results:

lineages nearly identical to the ref genome: 1K homo SNPs + 1K hetero SNPs
lineages a bit more distant from the ref genome: 0.5M homo SNPs + 10K hetero SNPs
lineages very distant from the ref genome: 1M homo SNPs + 50K hetero SNPs

nb: a SNP was called if the number of alternate reads (DP4) was at least 5 reads, and was called 'heterozygous' when the number of reference reads was below 5.

The fact that I observe more homozygous SNPs in distant lineages than in lineages close to the reference genome makes perfectly sense to me. But why the number of heterozygous SNPs also increases with the phylogenetic/evolutionary distance ? I would expect it to be constant across lineages. Does it suggest that there is a problem with my analyses ?

many thanks if you could help me making sense of these results.

SNP bwa samtools bcftools • 704 views

ADD COMMENT • link 4.1 years ago by romain.derelle • 0

0

Entering edit mode

I am not sure if your system to detect homozygous and heterozygous SNPs is common (using a five read threshold) but perhaps you want to try using GATK to call variants instead?

ADD REPLY • link 4.1 years ago by samuel.a.odonnell ▴ 580

0

Entering edit mode

thanks Samuel. I'll give GATK or Freibades a try. But my problem will likely persist with these programs.

ADD REPLY • link 4.1 years ago by romain.derelle • 0