7.3 years ago
popayekid55
Dear all,
I have 10 human samples, aligned with HISAT to the Ensembl GRCh38 human reference. I want to determine the sex of each sample, so I counted the reads aligned to chromosomes X and Y. In every sample there are more X reads than Y reads. I referred to this link, but some of our samples are male, so how is this possible? I don't have access to GATK DepthOfCoverage; what alternative could I use to compute coverage? Or is there another way to confirm the sex?
Thank you all.
The X chromosome is much larger and more important than the Y chromosome, so of course it will tend to yield more reads. I would guess that 1, 6, 8, and 9 are female, but you can't really say that with any confidence without calibration. And the columns are unlabeled and no methodology is given, so it's not even certain what the numbers mean.
The first column is the sample ID; the next two columns are chrX and chrY. The numbers are raw counts of reads aligned to each chromosome. As I said, HISAT was used for the alignment. The libraries are polyA capture, sequenced on a HiSeq 2000.
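Raw per-chromosome read counts like these do not need GATK DepthOfCoverage. One option is `samtools idxstats` on a sorted BAM indexed with `samtools index`; it prints one tab-separated line per reference: name, length, mapped read segments, unmapped read segments. Below is a minimal sketch that parses that text and returns the X and Y counts. The function name is hypothetical, the example counts are made up, and the counts come from the BAM index, so they may include secondary alignments of multi-mapping reads.

```python
def xy_counts_from_idxstats(text):
    """Return (x_mapped, y_mapped) parsed from `samtools idxstats` output.

    Each line is tab-separated: reference name, reference length,
    mapped read segments, unmapped read segments. Ensembl-style names
    ("X", "Y") and UCSC-style names ("chrX", "chrY") are both accepted.
    """
    counts = {"X": 0, "Y": 0}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _length, mapped, _unmapped = line.split("\t")
        key = name[3:] if name.startswith("chr") else name
        if key in counts:
            counts[key] += int(mapped)
    return counts["X"], counts["Y"]


# Toy idxstats output (made-up counts; "*" holds unplaced unmapped reads).
example = "1\t248956422\t500\t0\nX\t156040895\t1200\t0\nY\t57227415\t30\t0\n*\t0\t0\t12\n"
x_reads, y_reads = xy_counts_from_idxstats(example)
print(x_reads, y_reads)  # 1200 30
```

Run it once per sample, e.g. on the saved output of `samtools idxstats sample.bam`, to rebuild the X/Y table.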
OK, thanks. Then I suggest you follow Devon's advice, because there is a clear separation between males and females: the male with the least Y expression still has an X:Y ratio of 170, while the apparent female with the most Y expression has a ratio of 3964. That gives you a margin of more than a factor of 20 (3964 / 170 ≈ 23).
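The margin quoted above is the smallest apparent-female X:Y ratio divided by the largest male X:Y ratio, assuming (as the numbers suggest) that the ratios are X reads per Y read. A minimal sketch of that arithmetic; the function name is hypothetical and only the two boundary values come from the thread:

```python
def separation_margin(male_xy_ratios, female_xy_ratios):
    """Factor separating the two groups on the X:Y read ratio.

    Males express Y-linked genes, so their X:Y ratio is low; females are high.
    Margin = smallest female ratio / largest male ratio (> 1 means no overlap).
    """
    return min(female_xy_ratios) / max(male_xy_ratios)


# Boundary values from the thread: 170 (most extreme male) and
# 3964 (most extreme apparent female).
margin = separation_margin([170], [3964])
print(round(margin, 1))  # 23.3
```

With the full per-sample ratios you would pass every male and every female value; a margin close to 1 would mean the groups nearly overlap.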
In the absence of X-chromosome abnormalities, Xist expression should help identify the female samples.
These are normal samples with no abnormalities. Could I determine the sex from Xist expression?
In my experience, yes: if you look at Xist expression, you'll see large differences between the males and females in your RNA-seq dataset.
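A minimal sketch of such a check, assuming you already have a normalized gene-level count table keyed by gene symbol (Ensembl annotation uses stable gene IDs, so you may need to map IDs to symbols first). It sets XIST, the human gene symbol for Xist, against a small panel of Y-linked genes. That panel is a common choice but an assumption, not something from this thread, and the function names and toy counts are hypothetical. Comparing the two numbers directly is crude; in practice, look at both distributions across all samples.

```python
# Assumption: a commonly used panel of Y-linked genes (not from this thread).
Y_MARKERS = ("RPS4Y1", "DDX3Y", "UTY", "KDM5D")


def marker_summary(expr):
    """Map {sample: {gene_symbol: normalized_count}} to {sample: (xist, y_sum)}."""
    summary = {}
    for sample, genes in expr.items():
        xist = genes.get("XIST", 0.0)
        y_sum = sum(genes.get(g, 0.0) for g in Y_MARKERS)
        summary[sample] = (xist, y_sum)
    return summary


def call_sex(xist, y_sum):
    """Crude heuristic: whichever signal dominates decides the call."""
    if xist > y_sum:
        return "female"
    if y_sum > xist:
        return "male"
    return "ambiguous"


# Toy normalized counts (made up for illustration).
expr = {
    "s1": {"XIST": 150.0, "RPS4Y1": 0.2},
    "s2": {"XIST": 0.5, "RPS4Y1": 80.0, "DDX3Y": 20.0},
}
for sample, (xist, y_sum) in marker_summary(expr).items():
    print(sample, call_sex(xist, y_sum))  # s1 female, then s2 male
```

A sample that looks female by XIST but male by the Y genes (or vice versa) is worth inspecting rather than trusting either call.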
Please provide a precise metric. What kind of samples are these?
These are normal human samples, both male and female.
Take the ratio of Y to X reads, and I think you'll see that your samples fall into two groups.
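The Y-to-X ratio suggested here can be computed per sample and sorted so that any grouping is easy to see. A minimal sketch, assuming the raw counts are held as `(x_reads, y_reads)` pairs per sample; the function name and the toy counts are hypothetical:

```python
def y_to_x_ratios(counts):
    """Map {sample: (x_reads, y_reads)} to a list of (sample, y/x) sorted by ratio.

    Samples with zero X reads are skipped to avoid dividing by zero.
    """
    ratios = {s: y / x for s, (x, y) in counts.items() if x > 0}
    return sorted(ratios.items(), key=lambda item: item[1])


# Toy raw counts (made up), in the same X/Y layout as the table in the thread.
counts = {"A": (10000, 2), "B": (9000, 60), "C": (12000, 3), "D": (8000, 50)}
for sample, ratio in y_to_x_ratios(counts):
    print(f"{sample}\t{ratio:.5f}")
```

Sorting puts the low-Y (female-like) samples first, so a jump between consecutive values marks the boundary between the groups.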
I have added the ratio. Is there a strict cutoff, or a range, for separating the groups? Different threads give different cutoffs.
Any strict cutoff you get from someone else will have been derived from their datasets. Just look at the data, you'll be able to eye-ball two groups.
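If you want to make the eyeballing reproducible without borrowing someone else's cutoff, one option (a swapped-in technique, not something proposed in this thread) is to sort the ratios and split at the largest gap on a log scale. A minimal sketch under that assumption; the function name and toy ratios are hypothetical. It always returns a split, even if all samples share one sex, so check that the reported fold gap is large before trusting it.

```python
import math


def split_at_largest_gap(ratios):
    """Split {sample: ratio} at the largest gap between sorted log10 ratios.

    Returns (low_group, high_group, fold_gap). All ratios must be > 0, so add
    a pseudocount to zero Y counts before computing them.
    """
    items = sorted(ratios.items(), key=lambda kv: kv[1])
    if len(items) < 2:
        raise ValueError("need at least two samples")
    logs = [math.log10(value) for _, value in items]
    gaps = [logs[i + 1] - logs[i] for i in range(len(logs) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    low = [sample for sample, _ in items[: cut + 1]]
    high = [sample for sample, _ in items[cut + 1 :]]
    return low, high, 10 ** gaps[cut]


# Toy Y:X ratios (made up): low group looks female, high group looks male.
ratios = {"A": 0.0002, "C": 0.00025, "D": 0.00625, "B": 0.00667}
low, high, fold = split_at_largest_gap(ratios)
print(low, high, round(fold, 1))  # ['A', 'C'] ['D', 'B'] 25.0
```

Working in log space means a jump from 0.00025 to 0.00625 counts as a 25-fold gap, which matches how the groups separate by orders of magnitude rather than by absolute difference.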