Question

SNPs detection in transcriptomics to define genetic variation between individuals


7.6 years ago

roncalli • 0

Hi,

I am new to SNP calling analysis, and my question is about "expectations". I am trying to compare three de novo assemblies of a calanoid copepod (a non-model organism), and I would like to use SNP calling as a way to assess the genetic variation between individuals. I know from other papers that high genetic diversity is expected in my species, and I am trying to link the SNP calls to this. I have seen that in humans a SNP diversity of 0.1% is enough to claim that two individuals come from different populations, but I am not sure whether I can use this "expected" value for copepods too.

So my question is: is there a way, based on the percentage of shared and unique SNPs, to support my hypothesis of high genetic diversity? Is there any "magic number" that would indicate a difference between individuals?

Thanks for the help,

Vittoria

RNA-Seq SNP genetic variation de novo assembly

score 0 · Answer 1 · 2017-09-05

Interesting. I don't know of any magic numbers.

Instead of comparing the assemblies directly, which would always miss heterozygous variation (assemblies are likely to be haploid), I would suggest another approach:

a) Merge the assemblies (software: cd-hit, SuperTranscripts).
b) Rename the contigs in the merged transcriptome.
c) Map the reads to the merged transcriptome with bwa.
d) Call variants, e.g. with freebayes or BBMap's callvariants.sh.
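
The four steps could be scripted roughly as follows. This is a minimal Python sketch that only builds the shell commands (it does not run them). The file names (`pooled.fa`, `merged.fa`, `merged.renamed.fa`, `calls.vcf`), the `contig` prefix and the 95% clustering identity are hypothetical choices, and SuperTranscripts could replace cd-hit-est in step a. Each sample gets its own read group (`SM` tag) so that freebayes can call all samples jointly.

```python
import shlex
from typing import Dict, Iterable, List, Tuple


def rename_contigs(fasta_lines: Iterable[str], prefix: str = "contig") -> List[str]:
    """Step b: replace every FASTA header with a short, unique ID.

    Sequence lines pass through unchanged. The prefix is a hypothetical
    naming scheme; write the result to merged.renamed.fa.
    """
    renamed: List[str] = []
    count = 0
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            renamed.append(f">{prefix}_{count:06d}")
        else:
            renamed.append(line)
    return renamed


def pipeline_commands(
    assemblies: List[str],
    reads: Dict[str, Tuple[str, str]],
    identity: float = 0.95,
) -> List[str]:
    """Steps a, c and d as shell command strings (built, not executed)."""
    q = shlex.quote
    cmds = [
        # a) pool the assemblies, then cluster redundant contigs
        "cat " + " ".join(q(a) for a in assemblies) + " > pooled.fa",
        f"cd-hit-est -i pooled.fa -o merged.fa -c {identity}",
        # b) happens here, e.g. with rename_contigs() -> merged.renamed.fa
        # c) index the renamed reference for bwa and for freebayes
        "bwa index merged.renamed.fa",
        "samtools faidx merged.renamed.fa",
    ]
    bams: List[str] = []
    for sample, (r1, r2) in reads.items():
        # the SM tag is what lets the variant caller tell samples apart
        rg = q(f"@RG\\tID:{sample}\\tSM:{sample}")
        bam = q(f"{sample}.bam")
        cmds.append(
            f"bwa mem -R {rg} merged.renamed.fa {q(r1)} {q(r2)}"
            f" | samtools sort -o {bam} -"
        )
        cmds.append(f"samtools index {bam}")
        bams.append(bam)
    # d) joint (multisample) variant calling over all BAM files
    cmds.append("freebayes -f merged.renamed.fa " + " ".join(bams) + " > calls.vcf")
    return cmds
```

Printing the returned list and running the commands by hand keeps each step easy to inspect before committing to the full run.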

This has the advantage of following a standard workflow.

You could also annotate your merged assembly, for example with InterProScan, to look at gene function.

score 0 · Answer 2 · 2017-09-05

roncalli • 0

Hi,

Thanks for the answer. Interestingly, I had already followed your suggestion; this is what I did:

1) Generated a merged assembly to use as the reference.
2) Mapped the raw reads of each sample (each from a single individual) back to it.
3) Identified SNPs by calling variants with samtools.

Now, how do I interpret the results? I am planning to generate a Venn diagram to see which SNPs are shared and which are unique. My question now is: what percentage of unique SNPs would support the claim that individuals come from different populations?

I am very confused by the human number (0.1%).
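
Assuming one VCF per individual, the shared and unique percentages for such a Venn diagram could be computed as in the minimal Python sketch below. Sites are keyed by (CHROM, POS, REF, ALT), multiallelic ALT fields are split, and percentages are relative to the union of all sites; these are illustrative choices, not a standard definition. Note that a SNP missing from one individual may simply reflect low or no expression of that transcript rather than a reference genotype, so the numbers are not a population-level threshold.

```python
from typing import Dict, Iterable, Set, Tuple

# (CHROM, POS, REF, ALT) identifies one variant allele
Variant = Tuple[str, int, str, str]


def parse_vcf_sites(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant sites from the lines of a single-sample VCF."""
    sites: Set[Variant] = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # split multiallelic records
            sites.add((chrom, int(pos), ref, allele))
    return sites


def shared_and_unique(calls: Dict[str, Set[Variant]]) -> Dict[str, float]:
    """Percent of all sites shared by every sample, or unique to each one."""
    union: Set[Variant] = set().union(*calls.values())
    if not union:
        raise ValueError("no variant sites in any sample")
    shared = set.intersection(*calls.values())
    summary = {
        "total_sites": float(len(union)),
        "shared_by_all_pct": 100 * len(shared) / len(union),
    }
    for name, sites in calls.items():
        others = set().union(*(s for n, s in calls.items() if n != name))
        summary[f"unique_{name}_pct"] = 100 * len(sites - others) / len(union)
    return summary
```

Each individual's set can be built with `parse_vcf_sites(open_file_handle)`; the file names are up to you.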

Thanks for the help,

Vittoria

This is a very specific biological question. I would guess that the ability to detect SNPs is strongly affected by the number of transcripts expressed in each individual. I don't think anyone here can give you a definitive answer; your best bet would be to look at studies of other non-model organisms to find more reference values. Molecular breeding research might also help.
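
One way to make the comparison less sensitive to expression differences, offered here as an assumption rather than an established method, is to count SNPs only at positions that are well covered in every individual and to express them per kilobase. The minimal Python sketch below reads per-base depths in the three-column `samtools depth -a` layout (CHROM, POS, DEPTH for one BAM); the minimum depth of 10 is a hypothetical cut-off.

```python
from typing import Dict, Iterable, Set, Tuple

# (CHROM, POS) of one reference position
Site = Tuple[str, int]


def callable_sites(depth_lines: Iterable[str], min_depth: int = 10) -> Set[Site]:
    """Positions with depth >= min_depth from CHROM<TAB>POS<TAB>DEPTH lines."""
    sites: Set[Site] = set()
    for line in depth_lines:
        chrom, pos, depth = line.split("\t")[:3]
        if int(depth) >= min_depth:
            sites.add((chrom, int(pos)))
    return sites


def snps_per_kb_common(
    snp_sites: Dict[str, Set[Site]],
    callable_by_sample: Dict[str, Set[Site]],
) -> Dict[str, float]:
    """SNPs per 1,000 positions that are callable in every sample."""
    common = set.intersection(*callable_by_sample.values())
    if not common:
        raise ValueError("no position passes the depth filter in all samples")
    return {
        name: 1000 * len(sites & common) / len(common)
        for name, sites in snp_sites.items()
    }
```

Restricting every individual to the same callable positions means a lower SNP count can no longer be explained by a transcript simply not being expressed in that individual.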

I would also show haplotypes (e.g. through visualization) of conserved, well-known genes carrying SNPs in the three samples.

Multisample SNP calling (with freebayes etc.) would at least analyze the three samples together.

Lastly, this leads you quite definitely toward molecular phylogenetics: trees and various relationship matrices.
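
As a first step toward such a relationship matrix, a pairwise genetic distance can be computed from the genotypes in a multisample VCF. This minimal Python sketch uses a simple allele-sharing (IBS) distance, assumes diploid, biallelic sites, and skips sites where either sample is missing; it is one possible distance among many, not a method given in this thread. The resulting matrix could then be passed to a tree-building method such as neighbor joining.

```python
from typing import Dict, List, Optional

# Alternate-allele count per site (0, 1 or 2); None marks a missing call
Genotype = Optional[int]


def dosage(gt: str) -> Genotype:
    """Convert a diploid VCF GT string such as '0/1' or '1|1' to an allele count."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def ibs_distance_matrix(
    genotypes: Dict[str, List[Genotype]],
) -> Dict[str, Dict[str, float]]:
    """Allele-sharing distance: mean |ga - gb| / 2 over sites called in both."""
    names = list(genotypes)
    matrix: Dict[str, Dict[str, float]] = {name: {} for name in names}
    for a in names:
        for b in names:
            diff = compared = 0
            for ga, gb in zip(genotypes[a], genotypes[b]):
                if ga is None or gb is None:
                    continue  # ignore sites missing in either sample
                diff += abs(ga - gb)
                compared += 1
            # 0 = identical genotypes everywhere, 1 = opposite homozygotes everywhere
            matrix[a][b] = diff / (2 * compared) if compared else float("nan")
    return matrix
```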

7.6 years ago by colindaven 7.4k