Plink MDS plot based on IBS to exclude samples
8.4 years ago
Jimbo ▴ 120

I have used PLINK to produce an MDS plot from human GWAS data with the following commands (pruning SNPs in LD first, as recommended):

plink --bfile pFiles/geno --indep-pairwise 50 10 0.2 --out prune1

plink --bfile pFiles/geno --extract prune1.prune.in --genome --out ibs1

plink --bfile pFiles/geno --read-genome ibs1.genome --cluster cc --ppc 1e-3 --mds-plot 2 --out strat1

I then opened the mds file (strat1.mds) in R and plotted C1 vs. C2 - it seems clear that there are some outlier samples. Plot looks something like this:

[Figure: PLINK MDS plot, C1 vs. C2]

Am I justified in removing these outliers from further analysis purely by looking at this plot, essentially just saying "we took the clump in the middle for further analysis"?

Or should I use something more objective (e.g. PPC) to exclude the samples that look a bit out there before downstream analysis?

plink gwas ibs mds qc • 6.1k views
8.4 years ago
Floris Brenk ★ 1.0k

Usually people remove samples that lie more than 2 or 3 SD from the mean. That is the more scientific way of saying "we took the clump in the middle for further analysis".
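As a rough illustration of that rule, here is a minimal Python sketch that flags samples lying more than a chosen number of SDs from the mean on C1 or C2. It assumes the usual whitespace-delimited .mds layout with a header of FID IID SOL C1 C2; the function names `parse_mds` and `flag_outliers` and the default threshold of 3 are hypothetical choices for this example, not anything PLINK provides.

```python
import statistics


def parse_mds(text):
    """Parse the text of a PLINK .mds file into a list of dicts keyed by header name."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def flag_outliers(samples, components=("C1", "C2"), n_sd=3.0):
    """Return sorted (FID, IID) pairs more than n_sd SDs from the mean on any component."""
    flagged = set()
    for comp in components:
        values = [float(s[comp]) for s in samples]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD; needs at least two samples
        if sd == 0:
            continue  # no spread on this component, nothing to flag
        for sample, value in zip(samples, values):
            if abs(value - mean) > n_sd * sd:
                flagged.add((sample["FID"], sample["IID"]))
    return sorted(flagged)
```

The flagged pairs could be written one per line as "FID IID" to a text file and passed to plink --remove. Note that the mean and SD are themselves pulled towards extreme samples, so it may be worth re-plotting after removal to check that the threshold behaved as intended.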


So this would be a valid approach to removing outliers from a GWAS, i.e. calculate the mean and SD for C1, remove the 'outliers', then repeat for C2? Is there a reference? I ask because I can't find anyone explicitly using such statistics. I've only seen people eyeball the figure and remove the samples that look funny, e.g. http://onlinelibrary.wiley.com/doi/10.1002/ejp.560/full


"All subjects with an identity-by-state (IBS) genetic distance from the sample mean of more than 3 standard deviations were considered outliers with respect to genetic ancestry and were pruned from the sample. This was also confirmed through visual inspection of Multidimensional scaling (MDS) plots."

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3213201/


Just checking: is this human data?


Yes, it is human data and it passed all basic QC (call rates are high in all samples, etc.), so I don't think it's some dodgy batch effect. The outliers certainly aren't all on a specific plate or tied to a specific phenotype, gender, etc.

The article looks good, thanks. It appears they remove outliers based on IBS and use visual inspection of the MDS plot only as confirmation, rather than doing any specific calculation on the MDS values. I'm not 100% sure how they calculated a per-sample mean IBS, though, given that IBS is a pairwise quantity.
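One possible reading, sketched below in Python, is to average each sample's IBS value over every pair it takes part in and then apply the 3 SD rule to those per-sample means. This is only an assumption about what the paper did, since the quoted text does not spell out the calculation. The sketch reads the DST column of a PLINK .genome file (located by its header name); the function names `mean_ibs_per_sample` and `flag_ibs_outliers` are hypothetical.

```python
import statistics
from collections import defaultdict


def mean_ibs_per_sample(genome_text):
    """Average the DST column over every pair each sample appears in."""
    lines = [ln.split() for ln in genome_text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    col = {name: i for i, name in enumerate(header)}
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        dst = float(row[col["DST"]])
        # Each pairwise value contributes to both members of the pair.
        for fid, iid in (("FID1", "IID1"), ("FID2", "IID2")):
            key = (row[col[fid]], row[col[iid]])
            totals[key] += dst
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def flag_ibs_outliers(mean_ibs, n_sd=3.0):
    """Return sorted (FID, IID) keys whose mean DST is more than n_sd SDs from the overall mean."""
    values = list(mean_ibs.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; needs at least two samples
    if sd == 0:
        return []
    return sorted(k for k, v in mean_ibs.items() if abs(v - mean) > n_sd * sd)
```

With the files from the question, the input would be the contents of ibs1.genome. A sample from a different ancestry group shares less IBS with everyone else, so its mean stands out even though each underlying value is pairwise.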
