Why Are Monomorphic Loci Excluded From Analysis?
3
5
11.3 years ago
714 ▴ 110

As the title suggests, I was wondering why it is a good idea to exclude monomorphic loci from SNP analysis. How would including them affect a PCA plot, for example?

snp pca • 21k views
13
11.3 years ago
Fabio Marroni ★ 3.0k

In my understanding, monomorphic means something that appears in just one state (or form), in contrast to polymorphic, which means something that appears in more than one form. SNPs are by definition polymorphic. A monomorphic site is a site at which all individuals have the same form (genotype). It is a good idea to exclude such sites from analysis because they give no information. Please note that you always implicitly exclude from analysis the majority of the 3 billion positions of the human genome, at which you find no variation.
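To make "gives no information" concrete for the PCA case, here is a minimal sketch on made-up data. The genotype matrix, the 0/1/2 alternate-allele coding, and the helper names are all assumptions for illustration, not something from a real dataset. It shows that a monomorphic locus has zero variance and zero covariance with every other locus, so after centering it contributes nothing to the covariance matrix that PCA decomposes.

```python
from statistics import pvariance

# Hypothetical genotype matrix: rows are individuals, columns are loci,
# values are counts of the alternate allele (0, 1, or 2).
# Locus 0 is monomorphic: every individual has the same genotype.
genotypes = [
    [0, 0, 2],
    [0, 1, 2],
    [0, 2, 1],
    [0, 1, 0],
]

def column(matrix, j):
    """Return locus j as a list of genotype values across individuals."""
    return [row[j] for row in matrix]

def centered(values):
    """Subtract the mean, as PCA does before computing covariances."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def covariance(xs, ys):
    """Population covariance between two loci."""
    cx, cy = centered(xs), centered(ys)
    return sum(a * b for a, b in zip(cx, cy)) / len(xs)

n_loci = len(genotypes[0])
variances = [pvariance(column(genotypes, j)) for j in range(n_loci)]
cov_with_locus0 = [
    covariance(column(genotypes, 0), column(genotypes, j)) for j in range(n_loci)
]

print("per-locus variance:", variances)
print("covariance of locus 0 with each locus:", cov_with_locus0)
```

One caveat: if each locus is also scaled to unit variance before PCA, a zero-variance column leads to a division by zero, so in that setting the monomorphic locus has to be dropped first.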

1

Would there be any harm in keeping monomorphic loci in the dataset given that they do not seem to contribute to any of the variation that we might see?

6

As Josh already said, it does no harm in terms of results (they are uninformative), but it wastes computer time.

5
11.3 years ago
Josh Herr 5.8k

You would inflate your SNP numbers and misrepresent your data.

How would you differentiate between a sequencing error, one-off single mutation or transcription error, and a bona-fide SNP? SNPs are found across individuals in a population -- monomorphic loci represent one individual's nucleotide state and may be the result of errors across numerous levels. When you see a SNP in multiple individuals you can infer it is not from sequencing error or a mutation found in a single individual.

0

That makes some sense, however I'm afraid I don't quite understand all of it. For example, if 100 individuals were genotyped at loci A-D and all were homozygous C/C at locus A, then why would one exclude locus A from the dataset and, subsequently, from the analysis?

1

In your example, locus A would not be informative and it would be pointless to leave that nucleotide alignment position in the analysis -- it would provide you with no information and would also waste compute time (meh, probably negligible). You would want to remove uninformative characters -- this would include non-variable sites as well as monomorphic sites (one "mutation" and not a SNP) or highly variable sites.
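As a rough illustration of the filtering step for the 100-individual example, here is a small sketch. The genotype calls for loci B-D are invented (only locus A being all C/C comes from the question), and the function names are hypothetical. It drops loci where only one allele is observed and also reports the minor allele count, which is one way to spot a variant seen in a single individual.

```python
# Hypothetical calls: 100 individuals at loci A-D, stored as two-letter strings.
# Only locus A (all C/C) comes from the question; B-D are made up.
individuals = 100
genotypes = {
    "A": ["CC"] * individuals,                      # monomorphic
    "B": ["AA"] * 60 + ["AG"] * 30 + ["GG"] * 10,   # polymorphic
    "C": ["TT"] * 99 + ["TC"] * 1,                  # variant seen in one individual
    "D": ["GG"] * 50 + ["GT"] * 50,                 # polymorphic
}

def allele_counts(calls):
    """Count how many times each allele appears across all genotype calls."""
    counts = {}
    for call in calls:
        for allele in call:
            counts[allele] = counts.get(allele, 0) + 1
    return counts

def is_monomorphic(calls):
    """A locus is monomorphic here if only one allele is observed."""
    return len(allele_counts(calls)) == 1

def minor_allele_count(calls):
    """Count of the least frequent observed allele (0 if monomorphic)."""
    counts = allele_counts(calls)
    return 0 if len(counts) == 1 else min(counts.values())

kept = [locus for locus, calls in genotypes.items() if not is_monomorphic(calls)]
dropped = [locus for locus, calls in genotypes.items() if is_monomorphic(calls)]

print("kept:", kept)
print("dropped:", dropped)
print("minor allele counts:", {l: minor_allele_count(c) for l, c in genotypes.items()})
```

Whether a locus like C should also be removed depends on the minor allele count or frequency cutoff you choose for your analysis; the sketch only reports the count and does not apply a cutoff.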

0
11.3 years ago

Hi, how should I treat a heterozygous allelic state in the parents in a polymorphism analysis? For example:

      SNP1  SNP2  SNP3
p1    AA    AT    AA
p2    AA    AA    TT

Here I want to see the polymorphism between p1 and p2. This is my expected result:

      SNP1  SNP2  SNP3
p1    mono  ?     poly
p2

Thanks in advance

0

I don't quite understand your question. (This isn't an answer, by the way, so it should be posted as a new question in a new thread.) Are you asking how to differentiate between heterozygosity and SNP polymorphisms?
