What causes some individuals to form a separate cluster from the rest of the population in a PCA?
5.1 years ago
Hann ▴ 110

Hello,

I am doing population genetic structure analyses on a cereal crop (fonio millet). I have 157 samples/accessions from different locations in West Africa.

I've run a principal component analysis (PCA).

The results show that 8 individuals from Togo separate and cluster apart from the rest of the individuals (see the image in the link):

Screen-Shot-2019-10-29-at-9-12-26-AM

I want to investigate this separation further. What causes these 8 samples to form such a clear cluster, and how can I test the hypotheses?

From a genetic perspective, many processes could be involved in shaping the genetic structure of a species:

- Genetic drift, which can affect the structure?

- Natural selection?

- Deleterious mutations?

I ran the PCA on the two subgenomes separately, and one of the subgenomes separated the 8 accessions from southern Togo from the rest of Togo's samples and from the other accessions:

Screen-Shot-2019-10-29-at-8-59-11-AM

I've also conducted chromosome-wise PCA, and the separation appeared only for chromosomes 3A and 9A:

Screen-Shot-2019-10-29-at-9-02-38-AM

I didn't find any particular pattern for chromosomes 3A and 9A when looking at FST or allele frequencies.

There must be something, but how can I reveal it?

Thanks

SNP population genetics PCA • 1.4k views
5.1 years ago

Please first explain how you processed and pre-filtered the data. Do those Togolese Republic individuals comprise a family, by any chance?

Note that the percent explained variation is quite low on both axes, so the differences we are referring to here are small. A PCA bi-plot will always 'expand out' the samples in the plot space, so small differences can frequently appear magnified.

For example, if I do a PCA of variants / mutations in tumour and normal samples, then the percent explained variation would be upward of 80%.
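
To make the point about explained variation concrete, here is a minimal sketch of how the per-axis percentages are obtained from a genotype matrix. It assumes a samples x SNPs array of 0/1/2 alternative-allele counts with no missing values; the function name `pca_explained_variance` and the simulated data are hypothetical and are not taken from the fonio dataset.

```python
import numpy as np

def pca_explained_variance(genotypes, n_components=2):
    """PCA by SVD on a samples x SNPs matrix of 0/1/2 allele counts.

    Returns the sample coordinates on the first components and the
    fraction of total variance that each of those components explains.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                      # centre each SNP
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    variance = S ** 2                           # proportional to eigenvalues
    ratio = variance / variance.sum()
    coords = U[:, :n_components] * S[:n_components]
    return coords, ratio[:n_components]

# Simulated toy data (an assumption, for illustration only):
# 40 samples, 500 SNPs, and 8 samples that differ at only 25 SNPs.
rng = np.random.default_rng(0)
freq = rng.uniform(0.1, 0.9, size=500)
geno = rng.binomial(2, freq, size=(40, 500))
geno[:8, :25] = rng.binomial(2, 0.95, size=(8, 25))

coords, ratio = pca_explained_variance(geno)
print("fraction of variance on PC1, PC2:", ratio)
```

Plotting `coords` always fills the plot area regardless of how small `ratio` is, which is why the axis percentages should be read alongside the scatter.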


Mm, what you said makes sense actually.

So these are 157 individuals from the same species, fonio millet (Digitaria exilis), and they come from different locations with different bioclimatic variables.

The VCF file was filtered according to the following:

  • allow no more than three SNPs in a 10-bp window, and remove indels

  • tolerate 10% missing data per SNP

  • remove SNPs with low or high mean depth (keep 14 ≤ DP ≤ 42)

  • extract only biallelic SNPs

  • remove individuals that have more than 33% missing data, and remove SNPs on the unanchored chromosomes

The PCA analysis was done using all SNPs.
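
As a rough illustration of the missingness and depth filters listed above, the sketch below applies them to in-memory matrices. It assumes genotypes are coded 0/1/2 with -1 for missing and that per-genotype depths are available; the function name `filter_matrix`, the order of the steps (SNP filters first, then individuals) and the toy values are assumptions, and the 10-bp window, indel, biallelic and unanchored-chromosome filters are not reproduced here.

```python
import numpy as np

def filter_matrix(gt, dp, max_snp_missing=0.10, min_dp=14, max_dp=42,
                  max_ind_missing=0.33):
    """Return boolean masks (keep_inds, keep_snps) for a samples x SNPs matrix.

    gt: genotypes coded 0/1/2, with -1 meaning missing (assumed coding).
    dp: read depth per genotype, same shape as gt.
    """
    gt = np.asarray(gt)
    dp = np.asarray(dp, dtype=float)
    missing = gt < 0

    # SNP-level filters: missing rate and mean depth across samples.
    snp_missing = missing.mean(axis=0)
    mean_dp = dp.mean(axis=0)
    keep_snps = ((snp_missing <= max_snp_missing)
                 & (mean_dp >= min_dp) & (mean_dp <= max_dp))

    # Individual-level filter, computed on the retained SNPs only.
    ind_missing = missing[:, keep_snps].mean(axis=1)
    keep_inds = ind_missing <= max_ind_missing
    return keep_inds, keep_snps

# Toy example: 4 samples x 4 SNPs.
gt = np.array([[0, 1, -1, 2],
               [1, 0, -1, 2],
               [2, 1,  0, 1],
               [0, 0,  1, 1]])
dp = np.array([[20, 50, 20, 30]] * 4)   # SNP 2 has mean depth 50

keep_inds, keep_snps = filter_matrix(gt, dp)
filtered = gt[keep_inds][:, keep_snps]
print(keep_snps, keep_inds, filtered.shape)
```

Whether the individual filter runs before or after the SNP filters changes which samples are dropped, so it is worth stating the order when describing the pipeline.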

I have already found a significant effect of climatic, geographic and social (i.e., ethnic and linguistic group) factors on the genetic structure. But I want to look at the data from a genetic perspective. What processes caused the 8 samples from Togo to separate? Is it gene flow, genetic drift, etc.? What I want is a way to test these hypotheses.

I tried different approaches to get a hint, such as Fst (checking the differentiation between the 8 samples from Togo and a random number of samples from the rest of the individuals).

I looked at the allele frequencies and the alternative allele frequencies and didn't find a pattern specific to Togo's samples.

I also checked the SNP density of all alleles and of alternative alleles only, and no specific patterns were observed.
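
One way to turn the Fst comparison into a test is a permutation scheme: compute Fst between the focal samples and everyone else, then compare it with Fst values obtained when the same number of samples is drawn at random. The sketch below is only an illustration under stated assumptions: the thread does not say which Fst estimator was used, so Hudson's estimator (ratio of sums across SNPs) is swapped in here, genotypes are assumed to be diploid 0/1/2 counts without missing data, and the function names and simulated data are hypothetical.

```python
import numpy as np

def hudson_fst(g1, g2):
    """Hudson's Fst between two groups (samples x SNPs, diploid 0/1/2 counts)."""
    n1, n2 = 2 * g1.shape[0], 2 * g2.shape[0]      # number of allele copies
    p1, p2 = g1.sum(axis=0) / n1, g2.sum(axis=0) / n2
    num = ((p1 - p2) ** 2
           - p1 * (1 - p1) / (n1 - 1)
           - p2 * (1 - p2) / (n2 - 1))
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()                   # ratio of sums over SNPs

def permutation_test(geno, focal_idx, n_perm=200, seed=0):
    """Observed Fst (focal vs rest) and a one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    n = geno.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[focal_idx] = True
    observed = hudson_fst(geno[mask], geno[~mask])

    null = np.empty(n_perm)
    for i in range(n_perm):
        m = np.zeros(n, dtype=bool)
        m[rng.permutation(n)[:mask.sum()]] = True  # random group, same size
        null[i] = hudson_fst(geno[m], geno[~m])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value

# Simulated toy data (assumption): 40 samples, 300 SNPs,
# with the first 8 samples shifted at 50 SNPs.
rng = np.random.default_rng(1)
freq = rng.uniform(0.1, 0.9, size=300)
geno = rng.binomial(2, freq, size=(40, 300))
geno[:8, :50] = rng.binomial(2, 0.95, size=(8, 50))

observed, p_value = permutation_test(geno, focal_idx=np.arange(8))
print("observed Fst:", observed, "permutation p:", p_value)
```

Running the same function on the SNPs of a single chromosome (for instance only 3A or only 9A) would show whether the differentiation is concentrated there. A significant result says the group is more differentiated than a random draw, not which process (drift, selection, gene flow) produced it.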


Interesting, but perhaps outside the scope of my experience. If you can trace back the origin of those particular samples, then that may point to the cause of their differences. Could it be that they are from some 'laboratory' lineage that has been cultivated / bred repeatedly over time in vitro? Or perhaps it is the 'social' aspect that explains it, i.e., the way that they have cultivated this crop over the centuries has been quite specific.
