Hey there,
I have a set of ~13K SNPs across 16 individuals, 8 from each of two different species. Individuals were sequenced in two different sequencing runs. I am using the R package SNPRelate to calculate relatedness between pairs and to visualize the individuals in PCA space. The problem is that PCA is separating not only by species but also by sequencing run. This pattern persists even if I remove SNPs that are missing in many individuals.
I would like to find the particular SNPs that are separating the different sequencing runs and remove them from analyses. Any suggestions on finding the SNPs that are contributing to this pattern?
-S
I'm sorry. This is my first time doing this kind of analysis, and I don't understand what the structure of your PCA object looks like. I'm using the SNPRelate package to do my PCAs, and the loading object looks like this:
How can I extract the loadings for the PC that I want, together with the associated snp.id? And how is this different from the SNP correlation analysis from this tutorial? Any help would be welcome!
Hi rturba07, could you solve it? I'm trying to do the same, and I'm having difficulties. Any advice will help!
Well, this project got put on the back burner for a while, and I'll go back to it soon. But looking at what I posted and at the dataset, I think what I would try is to extract both $snploading and $snp.id, combine them in a data.frame, order the rows by loading value, and then take the top 10 or so. Does that make sense?
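A minimal sketch of that idea, in case it helps. It does not call SNPRelate: `SNPLoad` here is a made-up stand-in for the loading object, assuming (as the `snploading[1,]` indexing used in this thread suggests) that `snploading` has one row per PC and one column per SNP. The helper name `top_loading_snps` is hypothetical, and ranking by absolute value is my own choice, so that strongly negative loadings are picked up too.

```r
# Hypothetical stand-in for the SNP loading object:
# snp.id is a vector of SNP IDs; snploading is a matrix with
# one row per PC and one column per SNP (an assumption).
set.seed(1)
n_snp <- 50
SNPLoad <- list(
  snp.id = seq_len(n_snp) + 1000L,
  snploading = matrix(rnorm(2 * n_snp, sd = 0.01), nrow = 2)
)

# Pair each SNP ID with its loading on one PC and return the n SNPs
# with the largest absolute loading.
top_loading_snps <- function(loadings, pc = 1, n = 10) {
  df <- data.frame(
    snp.id = loadings$snp.id,
    loading = loadings$snploading[pc, ]
  )
  df <- df[order(abs(df$loading), decreasing = TRUE), ]
  head(df, n)
}

top10 <- top_loading_snps(SNPLoad, pc = 1, n = 10)
print(top10)
```

With the real object you would pass it in place of the simulated list and set `pc` to whichever component separates the sequencing runs.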
Well, that's exactly what I did. As I'm using only PC1 and PC2 to separate my populations I used the code below to create a dataframe with the loadings for PC1 and PC2:
write.table(SNPLoad$snploading[1,], file="snploading.txt", quote = FALSE, col.names = TRUE, row.names = FALSE, sep = ",")
write.table(SNPLoad$snploading[2,], file="snploading2.txt", quote = FALSE, col.names = TRUE, row.names = FALSE, sep = ",")
write.table(SNPLoad$snp.id, file="snpid.txt", quote = FALSE, col.names = TRUE, row.names = FALSE, sep = ",")
So I have now a table that looks like:
SNPid       PC1           PC2
63492565    0,094750109   -0,001474008
49041906    0,082402028   -0,000652574
I think this could be what I needed, but maybe I should test a subset of these SNPs in a new PCA to see if they are useful markers.
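For anyone following along, here is a rough sketch of how the three files could be replaced by a single table, and how a list of SNP IDs for a follow-up PCA could be derived from it. Everything here is an assumption for illustration: `SNPLoad` is simulated rather than produced by SNPRelate, `n_top` and the choice of PC2 as the component of interest are arbitrary, and the tab separator is only there so the file stays unambiguous if it is later opened in software that uses decimal commas.

```r
# Hypothetical stand-in for the SNPLoad object (rows = PCs, columns = SNPs).
set.seed(42)
n_snp <- 100
SNPLoad <- list(
  snp.id = seq_len(n_snp),
  snploading = matrix(rnorm(2 * n_snp, sd = 0.02), nrow = 2)
)

# One table with each SNP ID next to its PC1 and PC2 loadings.
loading_table <- data.frame(
  SNPid = SNPLoad$snp.id,
  PC1 = SNPLoad$snploading[1, ],
  PC2 = SNPLoad$snploading[2, ]
)

# Write a single tab-separated file instead of three separate ones.
out_file <- tempfile(fileext = ".txt")
write.table(loading_table, file = out_file, quote = FALSE,
            row.names = FALSE, sep = "\t")

# Flag the n_top SNPs with the largest absolute loading on the PC of
# interest (PC2 here, chosen arbitrarily), and keep all the others.
n_top <- 10
ranked <- loading_table[order(abs(loading_table$PC2), decreasing = TRUE), ]
flagged_ids <- head(ranked$SNPid, n_top)
keep_ids <- setdiff(loading_table$SNPid, flagged_ids)
```

The resulting `keep_ids` (or `flagged_ids`, to look at the flagged markers on their own) could then be passed to the `snp.id` argument of `snpgdsPCA()` to see how the new PCA behaves with that subset.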