Hi,
Please excuse me if this is a really dumb question,
I am using sequencing.com with the SNP VCF file from Dante Labs whole genome testing.
I am female, but there is variant data on the Y chromosome; of course, I expected it to be blank. There are also insertions/deletions on Y in the indel file. Is there someone who could enlighten me, please? I am assuming this data is correct. I haven't explored deeply, but these all seem to be protein coding, and there are thousands of variants with high numbers under the modifier heading, both hom and het.
Any insights really appreciated. Sequencing.com are getting back to me, and I will update the post if it's anything other than me being biometric ignorant. Thank you.
Please be assured that this is for my own entertainment; I am not looking for health advice or counseling. I apologize if my post came across that way. Who knows, I might learn something about genetics, and at the very least, it's really making my brain work.
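It is possible that these genes are from the PAR (pseudoautosomal region), which is shared between X and Y. Depending on how the alignments were done, the reference may have contained both the X and Y chromosomes, so reads from X can end up placed on Y.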
Also see: Only females but still few reads map to Y chromosome
Thank you for responding. I have looked at the data viewer provided by Dante, and there is definitely some (very limited) data on the Y chromosome. I suppose what I really need to know (assuming the sample is true and not contaminated) is this: since there is data showing on the Y (it comes up with 78,000 variants), must I have a Y chromosome?
Is there a simple way of determining karyotype?
I strongly recommend consulting an MD or genetic counselor who is an expert in human genetics rather than trying to interpret your results with the help of a bioinformatics community. If you want clinical-grade verification of your karyotype, then go for professional tests, not simple approaches. It is nice that everyone can now get a personalized genome sequence, but it typically leaves more questions than answers.
Thank you, I do agree completely. At 59 with 3 children it is hardly a concern, just interesting, as I have many variants in the sex-determining genes. It's just fascinating. I expected to see that this was a normal result, with Y chromosome genes being transferred from the mother somehow, or a male child's DNA getting mixed up with mine. I'm just interested in finding out how it works, rather than looking for some kind of diagnosis, although a question has popped into my head: if I have Y DNA from a pregnancy, could that have an impact on me? (Not looking for an answer, it's just intriguing, especially as my health deteriorated badly after the birth of my first son.)
Just to make a couple points raised by the others a bit clearer:
The most likely explanation is that the Y-related data you're looking at is an artifact of how the DNA sequencing is done. In order to sequence quickly and cheaply, the DNA molecule is chopped up into tiny fragments before sequencing. This means that the result of the sequencing is not one very long stretch of ACTGs (the individual nucleotides of the DNA); it is instead a box of millions and millions of tiny DNA pieces. Think of it like a jigsaw puzzle where the assembled picture would be the complete DNA. In order to solve the randomly generated jigsaw puzzle, we rely on a manual, i.e. a reference genome, that we align the puzzle pieces to. This alignment is nothing fancier than finding the position in the reference genome where the order of the ACTGs matches the order of the ACTGs in a given tiny piece of sequence information.
Now, if there are certain regions on the X chromosome that are extremely similar to regions that are normally on the Y chromosome (these are the pseudoautosomal regions, or PARs, that the others are referring to), then it can happen that we misalign some pieces: instead of placing them on the X chromosome, where they belong, the computer algorithm mistakenly assumes that they originated from a Y chromosome. This can happen because most of the individual pieces tend to have small errors in their sequence, a missed "C" here, an additional "A" there, which are common technical artifacts of the sequencing process. These errors may make some pieces look more similar to the (false) Y-chromosome positions than to their actual X origins.
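To make the jigsaw analogy concrete, here is a minimal toy sketch in Python. The two short "chromosomes", the reads, and the function names are all made up for illustration, and the brute-force search is only a stand-in for what real alignment software does; it simply shows how a single sequencing error can make a read from X fit Y better.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_placement(read, reference):
    """Try every position on every chromosome and return the placement
    with the fewest mismatches as (chromosome, position, mismatches).
    Ties are resolved in favour of the first placement found."""
    best = None
    for chrom, seq in reference.items():
        for pos in range(len(seq) - len(read) + 1):
            mismatches = count_mismatches(read, seq[pos:pos + len(read)])
            if best is None or mismatches < best[2]:
                best = (chrom, pos, mismatches)
    return best


# Hypothetical toy reference: the "Y" stretch differs from "X" at one base.
reference = {
    "X": "GATTACAGGCTTACGTAGCA",
    "Y": "GATTACAGGCTAACGTAGCA",
}

true_read = reference["X"][6:16]   # a perfect read taken from X
noisy_read = "AGGCTAACGT"          # the same read with one sequencing error

print(best_placement(true_read, reference))   # placed on X
print(best_placement(noisy_read, reference))  # placed on Y, although it came from X
```

Real aligners are far more sophisticated and attach a mapping-quality score to ambiguous placements, but the underlying ambiguity is the same.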
When someone brought up contamination as a possible source, they most likely did not imply that you actually had cells with foreign DNA in your body but that sometime during the sequencing process (including the collection of the sample) someone else's cells may have inadvertently been collected, too.
"If I have y dna from a pregnancy could that have an impact on me": It is extremely unlikely that (a) you have your male child's DNA somewhere in your body and (b) the sequencing picked up on it. It is far more likely that the sample was contaminated from external sources (e.g. spit, hairs, or skin cells from other people who handled your DNA).
Thank you so much for the reply, I've learned a lot today. I think that maybe I will dig deeper, not for diagnosis but for my own curiosity. I have BAM and FASTQ files that I haven't even managed to download yet; I may get a more complete picture there. Yes, most variants are in the PAR region, except I have variants in the PCDH11Y gene, which I think isn't possible without a Y chromosome. So I'm edging toward contamination. If this proves to be the case, I will ask them to re-sequence. I must say that I haven't been this absorbed for ages. It's really good to find out things, like why I have such loose joints and why my skin looks like fish scales if I don't oil it every day. Who knew these have names... My GP didn't, lol.
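For anyone wanting to check the "inside or outside the PAR" split directly from the variant file, a rough Python sketch follows. It assumes the file is a plain-text VCF with tab-separated CHROM and POS columns and that the coordinates follow the GRCh37/hg19 build; both are assumptions that need checking against the file header, since the PAR boundaries differ between builds. The function names and the example lines are made up for illustration.

```python
from collections import Counter

# Assumed GRCh37/hg19 boundaries of the pseudoautosomal regions on Y
# (1-based, inclusive). Other genome builds use different coordinates.
PAR_Y_GRCH37 = {
    "PAR1": (10_001, 2_649_520),
    "PAR2": (59_034_050, 59_363_566),
}


def classify_y_position(pos, par=PAR_Y_GRCH37):
    """Return the PAR name containing a Y position, or 'non-PAR'."""
    for name, (start, end) in par.items():
        if start <= pos <= end:
            return name
    return "non-PAR"


def tally_y_variants(vcf_lines, par=PAR_Y_GRCH37):
    """Count Y-chromosome variant records per region, skipping headers."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if chrom in ("Y", "chrY"):
            counts[classify_y_position(int(pos), par)] += 1
    return counts


# Made-up example records, not real data.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chrY\t1500000\t.\tA\tG",
    "chrY\t5000000\t.\tC\tT",
    "chrX\t1500000\t.\tG\tA",
    "Y\t59100000\t.\tT\tC",
]
print(tally_y_variants(example_lines))

# With a real file (hypothetical filename):
# with open("my_genome.snp.vcf") as handle:
#     print(tally_y_variants(handle))
```

A variant outside the PARs is not by itself proof of a Y chromosome: some other stretches of Y, such as the X-transposed region, are also very similar to X and can attract misplaced reads.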
Unless you feel comfortable working on the Linux command line, you will probably gain little from downloading the very big FASTQ and BAM files; these are fairly unwieldy files that cannot be opened with standard software (the data viewer you mentioned earlier is most likely loading and visualizing the content of the BAM file for your convenience).
I can absolutely relate to how fascinating this can be (after all, I do this for a living), but I would caution you against drawing too many conclusions on your own; I'd definitely reach out to the sequencing company as well as a human geneticist, who should hopefully be able to put the most exciting findings in context.