As the title suggests, I was wondering why it is a good idea to exclude monomorphic loci from SNP analysis. How would including them affect a PCA plot, for example?
In my understanding, monomorphic means something that appears in just one state (or form), in contrast to polymorphic, which means something that appears in more than one form. SNPs are by definition polymorphic. A monomorphic site is one at which all individuals have the same form (genotype). It is a good idea to exclude it from analysis because it gives no information. Please note that you implicitly always exclude from analysis the majority of the 3 billion positions of the human genome, the ones where you find no variation.
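A minimal sketch of that filtering step, assuming genotypes are stored as two-letter strings such as "AA" or "AG" in a plain dict of locus name to per-individual genotypes (the layout and the function names `is_monomorphic` and `drop_monomorphic` are hypothetical, not from any particular tool). Here "monomorphic" is taken to mean only one allele is observed across all individuals:

```python
def is_monomorphic(genotypes):
    """True if only one allele is observed across all genotype strings (e.g. all 'AA')."""
    alleles = set()
    for g in genotypes:
        alleles.update(g)  # add both alleles of this individual's genotype
    return len(alleles) <= 1


def drop_monomorphic(loci):
    """Keep only loci (name -> list of genotypes) where more than one allele is seen."""
    return {name: gts for name, gts in loci.items() if not is_monomorphic(gts)}


loci = {
    "locusA": ["AA", "AA", "AA"],  # every individual identical: no information
    "locusB": ["AA", "AG", "GG"],  # variable: kept
}
print(drop_monomorphic(loci))  # {'locusB': ['AA', 'AG', 'GG']}
```

Note that under this allele-based definition a site where every individual is "AG" counts as polymorphic, even though it still does not distinguish any individuals from one another.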
You would inflate your SNP numbers and misrepresent your data.
How would you differentiate between a sequencing error, a one-off single mutation or transcription error, and a bona fide SNP? SNPs are found across individuals in a population, whereas monomorphic loci represent one individual's nucleotide state and may be the result of errors at numerous levels. When you see a SNP in multiple individuals, you can infer it is not from a sequencing error or a mutation found in a single individual.
In your example, locus A would not be informative, and it would be pointless to leave that nucleotide alignment position in the analysis: it would provide you with no information and would also waste compute time (meh, probably negligible). You would want to remove uninformative characters, which include non-variable sites as well as monomorphic sites (one "mutation" and not a SNP) and highly variable sites.
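The idea of keeping only variants seen in more than one individual can be sketched as a minor-allele carrier filter. This is a minimal illustration, assuming biallelic sites with two-letter genotype strings; the threshold `min_carriers=2` and the function names are hypothetical choices, not a standard from any specific tool:

```python
from collections import Counter


def minor_allele_carriers(genotypes):
    """Number of individuals carrying at least one copy of the less common allele."""
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) < 2:
        return 0  # only one allele observed: invariant site
    minor = min(counts, key=counts.get)  # assumes a biallelic site
    return sum(1 for g in genotypes if minor in g)


def filter_loci(loci, min_carriers=2):
    """Drop invariant sites and sites whose minor allele appears in too few individuals."""
    return {
        name: gts
        for name, gts in loci.items()
        if minor_allele_carriers(gts) >= min_carriers
    }


loci = {
    "invariant": ["AA", "AA", "AA", "AA"],  # no variation at all
    "singleton": ["AA", "AA", "AG", "AA"],  # G seen in one individual only
    "shared":    ["AA", "AG", "AG", "AA"],  # G seen in two individuals
}
print(filter_loci(loci))  # {'shared': ['AA', 'AG', 'AG', 'AA']}
```

Raising `min_carriers` trades sensitivity to genuinely rare variants for more protection against errors confined to a single sample.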
Hi,
How should I handle the heterozygous allelic state of parents in a polymorphism analysis? For example:
    SNP1  SNP2  SNP3
p1  AA    AT    AA
p2  AA    AA    TT
Here I want to see the polymorphism between p1 and p2.
This is my expected result:
          SNP1  SNP2  SNP3
p1 vs p2  mono  ?     poly
Thanks in advance
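One way to make the comparison explicit is to give heterozygous cases their own class rather than forcing them into mono/poly. A minimal sketch, assuming genotypes are two-letter strings; the "het" label and the function name `compare_parents` are hypothetical, and whether "het" should later be counted as polymorphic depends on the downstream analysis:

```python
def compare_parents(g1, g2):
    """Classify one SNP between two parents from genotype strings like 'AA' or 'AT'."""
    het1 = g1[0] != g1[1]
    het2 = g2[0] != g2[1]
    if het1 or het2:
        return "het"  # hypothetical label: at least one parent is heterozygous
    return "mono" if g1 == g2 else "poly"  # both homozygous: same or different allele


p1 = {"SNP1": "AA", "SNP2": "AT", "SNP3": "AA"}
p2 = {"SNP1": "AA", "SNP2": "AA", "SNP3": "TT"}
print({snp: compare_parents(p1[snp], p2[snp]) for snp in p1})
# {'SNP1': 'mono', 'SNP2': 'het', 'SNP3': 'poly'}
```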
Would there be any harm in keeping monomorphic loci in the dataset given that they do not seem to contribute to any of the variation that we might see?
As Josh already said, it does no harm in terms of results (they are uninformative), but it wastes computer time.
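To see why a monomorphic locus leaves a PCA unchanged, here is a small sketch in plain Python, assuming genotypes are coded as alternate-allele counts (0/1/2) with rows as individuals and columns as loci. After mean-centering, a monomorphic column becomes all zeros, so it adds nothing to the individual-by-individual cross-product matrix that PCA decomposes (up to scaling):

```python
def center_columns(matrix):
    """Subtract each column's (locus's) mean from its entries."""
    n = len(matrix)
    means = [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in matrix]


def cross_product(matrix):
    """Individual-by-individual matrix X X^T built from the centered genotypes."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


# Rows = individuals, columns = loci.
variable_only = [[0, 2], [1, 1], [2, 0]]
with_mono = [[0, 2, 2], [1, 1, 2], [2, 0, 2]]  # third locus is monomorphic

print(cross_product(center_columns(variable_only)) == cross_product(center_columns(with_mono)))  # True
```

Two caveats: if a tool divides by the number of loci, extra monomorphic sites only rescale the matrix and do not change the principal axes; and if a tool standardizes each locus by its standard deviation, a monomorphic locus would mean dividing by zero, which is one practical reason such sites are usually filtered out first.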