Hello everyone! I am trying to apply a robust PCA method to SNP data (HapMap data) to deal with population stratification. I read a paper published in 2020 that says it is the first time robust statistics have been applied to RNA-seq data analysis. The link is here.
I understand that their input is a matrix of m samples by n genes, but I cannot figure out what the input matrix should look like for SNP data. For most PCA methods, SNPs are given additive coding, which yields a numeric matrix of 0, 1 and 2 values that is used as the input. I am also unsure whether the PCA plot will show clustering of samples or of SNPs. Can anyone help me understand the input data format and give a brief outline of how to proceed from my raw data: preprocessing first, then robust PCA in R (e.g. rrcov, PcaHubert)?
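To make the matrix layout concrete, here is a minimal, language-agnostic sketch in Python/NumPy. It assumes genotype calls as two-letter strings (e.g. "AG"), "NN" for a missing call, and a chosen reference allele per SNP; the toy calls, the `ref` list and the helper names `additive_code` and `standardize` are all hypothetical. Rows are individuals and columns are SNPs, so each SNP becomes a 0/1/2 count of non-reference alleles. The per-SNP scaling by sqrt(2p(1-p)) follows the EIGENSTRAT-style convention; ordinary (non-robust) PCA via SVD is used here only to show which way the results point.

```python
import numpy as np

# Hypothetical toy genotype calls: rows = samples (individuals), columns = SNPs.
# "NN" marks a missing call (an assumption about how missing data is encoded).
calls = [
    ["AA", "CT", "GG", "AG"],
    ["AG", "CC", "GG", "AA"],
    ["GG", "TT", "GA", "AG"],
    ["AA", "CT", "NN", "GG"],
    ["AG", "CC", "AA", "AA"],
]
# Hypothetical reference allele for each SNP column.
ref = ["A", "C", "G", "A"]


def additive_code(calls, ref):
    """Return a samples x SNPs matrix of non-reference allele counts (0, 1, 2); NaN if missing."""
    X = np.full((len(calls), len(ref)), np.nan)
    for i, row in enumerate(calls):
        for j, g in enumerate(row):
            if "N" not in g:
                X[i, j] = sum(allele != ref[j] for allele in g)
    return X


def standardize(X):
    """Mean-impute missing calls, drop monomorphic SNPs, scale each SNP by sqrt(2p(1-p))."""
    p = np.nanmean(X, axis=0) / 2.0          # frequency of the counted allele per SNP
    X = np.where(np.isnan(X), 2.0 * p, X)    # replace missing calls with the column mean 2p
    keep = (p > 0) & (p < 1)                 # monomorphic SNPs have zero variance
    p, X = p[keep], X[:, keep]
    return (X - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))


X = additive_code(calls, ref)
Z = standardize(X)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * S       # one row per sample: these are the points on a PCA plot
loadings = Vt.T      # one row per SNP: weight of each SNP on each PC
```

With this orientation the PCA plot shows samples (one point per individual), while the loadings describe SNPs. In R, the same individuals-by-SNPs numeric matrix would be the `x` argument of `rrcov::PcaHubert`, and `getScores()` on the fitted object should return the per-sample scores to plot.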