Principal component analysis on pool-seq SNP data


Posted 3.1 years ago by stephen.johnson.online

I would like to perform principal component analysis (PCA) on a pool-seq SNP dataset. I've been looking into methods for doing this, but have had trouble finding examples that apply to pooled data rather than individual genotypes. For example, I'm not sure whether PLINK can be used to run PCA on pooled datasets. Does anyone know whether PLINK can do PCA on pooled SNP data and, if not, which toolkit or approach would be best suited to it?

Thanks in advance!

Tags: sequencing, analysis, component, pooled, principal

Updated 2.2 years ago by Benjamin


Have you looked at this tutorial from Kevin Blighe?

Produce PCA bi-plot for 1000 Genomes Phase III - Version 2

Reply by ATpoint, 3.1 years ago


Thanks, this tutorial is really in-depth and may be useful! Do you know whether PLINK can be used for pooled SNP datasets? The tutorial appears to work on a file with individual genotypes.

Reply by stephen.johnson.online, 3.1 years ago


What do you mean by "pooled"? Do you mean merging different datasets? In that case, you will have to deal with potential batch and/or technical artefacts.

Reply by Kevin Blighe, 3.1 years ago


Individuals were pooled prior to sequencing, so each library contains DNA from multiple individuals. I'm still not sure about PLINK, but I did come across someone else who used the prcomp function in base R to run PCA on pool-seq allele frequencies.
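For readers who want to see what "PCA on pool-seq allele frequencies" amounts to, here is a minimal sketch in Python/NumPy that mirrors R's `prcomp(x, center = TRUE, scale. = FALSE)` (prcomp's defaults). It assumes the input is a pools × SNPs matrix of alternate-allele frequencies, for example alternate read count divided by depth for each pool at each SNP. The function name `pca_allele_freqs` and the toy frequencies are hypothetical and for illustration only; this is not necessarily what the person mentioned above did. It also ignores the sampling noise that comes from pool size and read depth.

```python
import numpy as np

def pca_allele_freqs(freqs, n_components=2):
    """PCA on a pools x SNPs matrix of allele frequencies (hypothetical helper).

    Mirrors R's prcomp(x, center = TRUE, scale. = FALSE): each SNP column is
    mean-centred across pools, then the centred matrix is decomposed by SVD.
    """
    x = np.asarray(freqs, dtype=float)
    # Drop SNPs that are monomorphic across pools: after centring they are all zeros
    x = x[:, x.std(axis=0) > 0]
    # Centre each SNP on its mean frequency across pools
    centred = x - x.mean(axis=0)
    # Thin SVD: centred = U @ diag(S) @ Vt
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    # Pool coordinates on the PCs (equivalent to prcomp()$x)
    scores = u[:, :n_components] * s[:n_components]
    # Per-PC variances (equivalent to prcomp()$sdev^2) and proportion explained
    var = s ** 2 / (x.shape[0] - 1)
    explained = var / var.sum()
    return scores, explained[:n_components]

# Hypothetical toy data: 4 pools x 5 SNPs; the third SNP is monomorphic
freqs = np.array([
    [0.10, 0.80, 0.50, 0.30, 0.20],
    [0.15, 0.75, 0.50, 0.35, 0.25],
    [0.60, 0.20, 0.50, 0.70, 0.65],
    [0.55, 0.25, 0.50, 0.75, 0.60],
])

scores, explained = pca_allele_freqs(freqs)
print(scores)     # one row per pool, one column per PC
print(explained)  # fraction of variance captured by PC1 and PC2
```

Because each pool is treated as one "sample" with continuous frequencies rather than 0/1/2 genotype calls, this approach sidesteps the individual-genotype input format that genotype-based tools expect. On the toy data, the first two pools and the last two pools fall on opposite sides of PC1.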