What can be causing my very odd data?
9.0 years ago
HumeMarx ▴ 40

Hi all

I am having a lot of difficulty with a set of case/control exome data (using Plink2 as my main analysis tool).

I have a lot of heterozygous haploid genotypes and nonmale, nonmissing Y chromosome markers. From the PCA, a large proportion of the samples appear not to be Caucasian, even though the curators assure us they all are. A very large number of samples also seem to be very closely related to each other (pi-hat estimates way above 0.125).

On top of this, a few samples failed the sex check (they were removed from the analysis before the population stratification and relatedness checks).

Is it likely that all these issues are caused by missing SNPs? Over 2/3 of the available SNPs had to be removed from analysis as they had missingness values above 15%.

My personal feeling is that I can't really trust this data set, given everything that is going wrong! There has to be a fundamental reason why every step in this analysis produces so many spurious results!

Any help is immensely appreciated!

missing-genotypes Plink Exome-sequencing pca • 2.0k views

It's horrible!

Let's start from the easiest thing: "I have a lot of heterozygous haploid genotypes and nonmale nonmissing Y chromosome markers". I imagine nonmale non-missing Y markers means you have females with Y markers, right? So maybe your samples are not what you are thinking. Try to calculate how many males and how many females you have in your sample and see if the numbers make sense.
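A rough sketch of that tally in Python, in case it helps. The sample records here are made up for illustration; the only thing borrowed from PLINK is the .fam sex coding (1 = male, 2 = female, 0 = unknown). The per-sample count of non-missing Y calls is assumed to have been extracted beforehand, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical records: (sample_id, reported_sex, number_of_nonmissing_Y_calls).
# Sex codes follow the PLINK .fam convention: 1 = male, 2 = female, 0 = unknown.
samples = [
    ("S1", 1, 40),
    ("S2", 2, 0),
    ("S3", 2, 12),
    ("S4", 0, 35),
    ("S5", 1, 38),
]


def tally_sex(records):
    """Count samples per reported sex code."""
    return Counter(sex for _, sex, _ in records)


def nonmales_with_y_calls(records, min_calls=1):
    """Return IDs of samples not coded male that carry at least min_calls Y genotypes."""
    return [sid for sid, sex, n_y in records if sex != 1 and n_y >= min_calls]


counts = tally_sex(samples)
flagged = nonmales_with_y_calls(samples)

print("samples per sex code:", dict(counts))
print("nonmale samples with Y calls:", flagged)
```

If the flagged samples carry about as many Y calls as the reported males do, a sample swap or a wrong sex entry looks more plausible than scattered genotyping errors, though that is only a heuristic.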

However, I wouldn't be happy with the missingness levels you are talking about. What happens if you slightly raise the threshold for removing a SNP, e.g. to 25% missingness? Does this rescue a lot of SNPs? Maybe some sequencing run was very bad?
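To make the threshold question concrete, here is a minimal sketch on a toy genotype matrix (invented data, with `None` standing for a missing call). It counts how many SNPs survive at several cut-offs and also computes missingness per sample, since a bad run would show up as a few samples carrying most of the missing calls. In PLINK itself the analogous steps would be the `--missing` report and the `--geno` filter; the Python helpers below are hypothetical names, not part of any library.

```python
# Toy genotype matrix: rows are SNPs, columns are samples; None marks a missing call.
genotypes = [
    [0, 1, 2, 0, 1, 0, 0, 1, 2, 0],                    # 0% missing
    [0, None, 2, 0, 1, 0, 0, 1, 2, 0],                 # 10% missing
    [None, None, 2, 0, 1, 0, 0, 1, 2, 0],              # 20% missing
    [None, None, None, 0, 1, 0, 0, 1, 2, 0],           # 30% missing
    [None, None, None, None, None, None, 0, 1, 2, 0],  # 60% missing
]


def snp_missingness(row):
    """Fraction of samples with a missing call at one SNP."""
    return sum(call is None for call in row) / len(row)


def snps_kept(matrix, max_missing):
    """Number of SNPs whose missingness does not exceed max_missing."""
    return sum(snp_missingness(row) <= max_missing for row in matrix)


def sample_missingness(matrix):
    """Fraction of SNPs with a missing call, for each sample (column)."""
    n_snps = len(matrix)
    n_samples = len(matrix[0])
    return [sum(row[j] is None for row in matrix) / n_snps for j in range(n_samples)]


for threshold in (0.10, 0.15, 0.20, 0.25):
    print(f"max missingness {threshold:.2f}: {snps_kept(genotypes, threshold)} SNPs kept")

print("per-sample missingness:", sample_missingness(genotypes))
```

In this toy matrix the first two samples account for most of the missing calls, so dropping them first would rescue SNPs without touching the SNP threshold. Whether your data behave that way is something only the per-sample report can tell you.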


Hi

Yes, the data is awful. Exactly, I have females with Y markers. That suggests a genotyping issue to me, am I right?!

Overall, there are two males in the pedigree file who actually appear to be female based on the SNP data, and quite a few others who are female but have Y markers present!

By increasing the threshold to 20% I save a lot of SNPs. But the problem is I can't find a single person who would condone this increase. All the papers I have read specify ideally 10% but no one is keen to go above 15%!

What I am also struggling to find out is whether the heterozygous haploids have a REAL biological/genetic meaning?
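For reference, as far as I understand the PLINK warning, a "heterozygous haploid" genotype is a heterozygous call at a position where the sample should carry only one copy: a male's X chromosome outside the pseudo-autosomal region, or the Y chromosome. A single copy cannot truly be heterozygous, so such calls more likely reflect genotyping error, a wrong sex assignment, or pseudo-autosomal variants coded as X than real biology. The sketch below only illustrates that definition: the call records and the `in_par` flag are hypothetical, and treating Y as haploid for every sample is a simplifying assumption.

```python
MALE = 1  # PLINK .fam sex code for male


def is_het(genotype):
    """True if the two alleles of a diploid-style call differ."""
    allele1, allele2 = genotype
    return allele1 != allele2


def het_haploid_calls(calls, sex_by_sample):
    """Return (sample_id, chrom) for heterozygous calls at haploid positions.

    calls: iterable of (sample_id, chrom, in_par, genotype), where genotype is
    a pair of alleles or None if missing, and in_par is a hypothetical flag
    marking pseudo-autosomal positions.
    """
    flagged = []
    for sample_id, chrom, in_par, genotype in calls:
        if genotype is None:
            continue  # missing calls cannot be heterozygous
        male_x = chrom == "X" and not in_par and sex_by_sample[sample_id] == MALE
        haploid = chrom == "Y" or male_x  # assumption: Y is haploid for everyone
        if haploid and is_het(genotype):
            flagged.append((sample_id, chrom))
    return flagged


# Invented example data.
sex_by_sample = {"S1": 1, "S2": 2}
calls = [
    ("S1", "X", False, ("A", "G")),  # male, non-PAR X, het: flagged
    ("S1", "X", True, ("A", "G")),   # male, PAR: diploid there, not flagged
    ("S2", "X", False, ("A", "G")),  # female X het: normal
    ("S1", "Y", False, ("C", "C")),  # male Y, homozygous-style call: fine
    ("S1", "Y", False, ("C", "T")),  # male Y het: flagged
    ("S1", "1", False, ("A", "G")),  # autosome: normal
    ("S1", "X", False, None),        # missing call: skipped
]

print(het_haploid_calls(calls, sex_by_sample))
```

If most flagged calls come from the two mis-sexed samples or cluster at the ends of the X chromosome, that would fit the sex-assignment or pseudo-autosomal explanations. PLINK 1.9 has `--split-x` and `--set-hh-missing` for handling those cases.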
