Exome variants limited only to the first half of a gene? Technical error?
Asked 4.3 years ago by knsvar

Hi,

We have performed exome sequencing in a case of a genetically heterogeneous disorder.

There is a certain gene of interest which spans more than 150 kb and has >100 exons.

Looking at the variants in the gene (including synonymous and intronic ones), there are approximately 30 SNVs within the first half (5' part) of the gene but none at all in the rest: 30 variants within the first 50 exons and introns, and no variant thereafter.

There are reads within the exons that do not carry variants, and coverage of the entire coding region appears to be >97% at 30x. Coverage appears insufficient for one or two exons at most.
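
A gene-level figure such as ">97% at 30x" is pooled over all targeted bases, so it says little about where the missing bases fall or whether depth differs systematically between the 5' and 3' halves of the gene. A minimal Python sketch of a per-exon breakdown, assuming per-base depths in the tab-separated chrom/pos/depth layout that `samtools depth -a` prints, and a hypothetical list of exon coordinates (1-based, inclusive). The function names and the toy data are made up for illustration:

```python
def parse_depth(lines):
    """Parse 'chrom<TAB>pos<TAB>depth' lines (the layout printed by
    `samtools depth -a`) into a {(chrom, pos): depth} dict."""
    depth = {}
    for line in lines:
        chrom, pos, dp = line.rstrip("\n").split("\t")[:3]
        depth[(chrom, int(pos))] = int(dp)
    return depth


def exon_stats(depth, chrom, start, end, min_depth=30):
    """Return (bases, bases_at_min_depth, mean_depth) for one exon.
    Coordinates are 1-based and inclusive; missing positions count as 0x."""
    values = [depth.get((chrom, pos), 0) for pos in range(start, end + 1)]
    covered = sum(1 for v in values if v >= min_depth)
    return len(values), covered, sum(values) / len(values)


def coverage_report(depth, exons, min_depth=30):
    """Per-exon (fraction >= min_depth, mean depth) plus the pooled fraction.
    `exons` is a list of (name, chrom, start, end) tuples (hypothetical input,
    e.g. derived from the capture kit's target intervals)."""
    per_exon = {}
    total_bases = total_covered = 0
    for name, chrom, start, end in exons:
        bases, covered, mean_dp = exon_stats(depth, chrom, start, end, min_depth)
        per_exon[name] = (covered / bases, mean_dp)
        total_bases += bases
        total_covered += covered
    return per_exon, total_covered / total_bases


# Toy example: two 5-bp "exons" with made-up depths.
lines = [f"chr19\t{100 + i}\t{d}" for i, d in enumerate([35, 40, 31, 30, 50])]
lines += [f"chr19\t{200 + i}\t{d}" for i, d in enumerate([10, 12, 45, 50, 29])]
exons = [("exon_A", "chr19", 100, 104), ("exon_B", "chr19", 200, 204)]
per_exon, pooled = coverage_report(parse_depth(lines), exons)
for name, (frac, mean_dp) in per_exon.items():
    print(f"{name}: {frac:.0%} at >=30x, mean depth {mean_dp:.1f}")
print(f"pooled: {pooled:.0%} at >=30x")
```

Reporting mean depth alongside the 30x fraction matters because two regions can both pass 30x while one sits near 35x and the other well above 100x, which can interact with depth cut-offs in the variant caller.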

Would you have any idea why this may be happening?

Thank you in advance.

Tags: next-gen, exome

Reply:

I'm assuming here you are talking about Titin. Are you sure that your coverage is consistent across all the exons? What exactly makes you think that there should be variants at both ends of the gene and/or across the whole thing? Depending on the tissue you sequenced and the disease context, maybe this makes sense.

The main thing is: if you go in and look at those downstream exons (in IGV, for example), do you see any evidence of missed variants?

Reply:

No, actually it's RYR1. :) The DNA was extracted from a blood sample.

Going back to the browser, I see a few rare artefacts, but nothing suggestive of heterozygous (or homozygous) variants. Apologies if my description is imprecise; I am a clinician.

Looking briefly with a colleague from the lab, this region was also poor in variants in 3-4 other exomes (e.g. 2-3 SNVs, including synonymous and splicing/intronic ones). I believe we had somewhat more variants (I suppose around 10-15) with an older capture kit.

Still, the gene always appeared to have ~98% coverage at 30x, with only 1-2 exons poorly covered.

I was wondering if there could be a bioinformatic or other cause for this.

Reply:

If coverage is poor over a certain portion of the gene, the variant caller may not even attempt to call variants there, and some parameter configuration may be needed. That said, you seem to indicate that the gene is adequately covered (?). You could check the default settings of the bioinformatics program you are using; some thresholds may be set too high.
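
The point about thresholds can be illustrated with a toy hard filter. Everything here is hypothetical (the `Call` record, `hard_filter`, and the cut-off values); real callers have their own option names and defaults, so check your tool's documentation. The sketch only shows how a minimum-depth cut-off set above the typical depth of part of a gene silently removes calls there, even though that part still passes a 30x coverage metric:

```python
from dataclasses import dataclass


@dataclass
class Call:
    pos: int      # genomic position of the candidate variant
    dp: int       # read depth at the site
    qual: float   # site quality reported by the caller


def hard_filter(calls, min_dp=10, min_qual=30.0):
    """Split calls into (kept, dropped) with simple depth/quality cut-offs.
    The parameter names and defaults are illustrative only; they are not
    the options of any particular variant caller."""
    kept, dropped = [], []
    for call in calls:
        if call.dp >= min_dp and call.qual >= min_qual:
            kept.append(call)
        else:
            dropped.append(call)
    return kept, dropped


# Made-up calls: one in a deeply covered region, two in a region that
# still passes a 30x coverage metric but has lower depth.
calls = [Call(1_000, 120, 250.0), Call(90_000, 38, 95.0), Call(95_000, 34, 80.0)]
kept_default, _ = hard_filter(calls)
kept_strict, dropped_strict = hard_filter(calls, min_dp=50)
print(len(kept_default), len(kept_strict), [c.pos for c in dropped_strict])
```

With the lenient defaults all three calls survive; raising the depth cut-off to 50 drops both calls from the lower-depth region without any warning, which is the kind of setting worth checking.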

Having worked in more than one clinical lab, I know that these issues are almost daily occurrences.

Reply:

If the kit used for this exome was identical to the one used for the other 3-4 exomes, and the bioinformatic pipeline was also identical, you wouldn't really expect any novel artifacts from either the library preparation or the alignment of the reads to the genome (as such artifacts should be reproducible between experiments).
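
The reproducibility argument suggests a simple cross-sample check: sites called in many samples processed with the same kit and pipeline are candidates for systematic artifacts. Common real polymorphisms will also recur, so this only flags sites for review. A minimal sketch; `recurrent_sites`, the input layout, and the data are hypothetical:

```python
from collections import Counter


def recurrent_sites(calls_by_sample, min_samples=3):
    """Return {site: n_samples} for sites called in at least min_samples
    samples. `calls_by_sample` maps a sample ID to an iterable of
    (chrom, pos, alt) tuples; duplicates within a sample count once."""
    counts = Counter()
    for sites in calls_by_sample.values():
        counts.update(set(sites))
    return {site: n for site, n in counts.items() if n >= min_samples}


# Made-up calls from four exomes run with the same kit and pipeline.
calls_by_sample = {
    "S1": [("chr19", 100, "T"), ("chr19", 200, "G")],
    "S2": [("chr19", 100, "T")],
    "S3": [("chr19", 100, "T"), ("chr19", 300, "A")],
    "S4": [("chr19", 300, "A")],
}
print(recurrent_sites(calls_by_sample, min_samples=3))
```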

When you say they are not suggestive of het/hom variants, do you mean they have extremely low allele frequencies? Maybe some kind of contamination... but it is still very strange that they are isolated to part of the gene. Is there any evidence of some kind of crazy recombination event, where part of the gene was duplicated or transposed or something?
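
To make the het/hom question concrete: in germline DNA from blood, a heterozygous variant is expected in roughly half of the reads and a homozygous one in nearly all of them, whereas scattered sequencing errors or low-level contamination tend to appear at small allele fractions. A rough sketch; `classify_site` and its cut-offs are hypothetical and chosen only for illustration, not validated clinical thresholds:

```python
def allele_fraction(ref_reads, alt_reads):
    """Fraction of reads supporting the alternate allele (0.0 if no reads)."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def classify_site(ref_reads, alt_reads, low_max=0.15, het=(0.3, 0.7), hom_min=0.9):
    """Rough germline-style label from the allele fraction.
    The cut-offs are illustrative assumptions only."""
    vaf = allele_fraction(ref_reads, alt_reads)
    if vaf >= hom_min:
        return "hom-like"
    if het[0] <= vaf <= het[1]:
        return "het-like"
    if vaf < low_max:
        return "low-fraction"   # e.g. sequencing noise or contamination
    return "ambiguous"


# Made-up (ref_reads, alt_reads) counts for four sites.
for ref, alt in [(50, 2), (20, 22), (1, 40), (30, 10)]:
    print(ref, alt, classify_site(ref, alt))
```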

If the variants are spread across different reads and well separated (not all clumped together in 20-nt windows), it could possibly be biological? But honestly, I have no idea... Somebody else here might know more, though. Good luck!

Reply:

The screenshots should give you an idea of the issue (whole-gene view, a known SNV, an exon without variants). Apologies, I am posting from a mobile.

https://ibb.co/Vw0qWCf https://ibb.co/WW6SSrj https://ibb.co/Jyc50kP

Reply:

Please edit the original post and add the images as described in "How to add images to a Biostars post".

Reply:

Hi, those look like standard errors seen in Illumina sequencing, due either to the base caller making incorrect calls or to the consistent incorporation of a wrong base at a given position during the sequencing run.

What really confuses me is that you only see this on part of one gene. Did you only target that specific gene? I would be very surprised if you didn't have at least a low baseline level of errors elsewhere in the genome.

You can see that the one variant from dbSNP is probably real (qualitatively, based on my experience); the rest look like artifacts. When I do exome variant calling, those other variants (which seem very randomly dispersed) would usually be filtered out.
