WES and GBAP1
1 day ago

Hello,

Sequencing the GBA1 gene is important in hereditary Parkinson's disease research. Just downstream of GBA1 lies the pseudogene GBAP1, which shares more than 90% sequence identity with it. As a result, WES capture kits also pull in the pseudogene unless they are specifically designed to exclude it (not the case for us, unfortunately).

Illumina offers Gauchian to call variants in this region: https://github.com/Illumina/Gauchian

But it only works on WGS data.

Is anyone here aware of a tool or approach that helps with the analysis of this problematic region when working with WES data?

Thanks in advance!

wes homology gba ngs • 232 views

Hello Odin, variants have already been called. This post is about the trouble with interpreting them: does a given read or variant come from the pseudogene or from the real gene?


Is this a single-target capture, or are there multiple targets? Have you tried aligning your data to the region including the pseudogene (if single target) and/or to the entire genome (if multiple targets)?


Hello GenoMax!

We have WES data, so all canonical human exons are meant to be captured and sequenced (by using flanking primers, so homologous regions like the pseudogene GBAP1 get included by accident). I'm not sure what you're suggesting; could you rephrase, please?


I was trying to see if you had done some analysis, which it looks like you have. It would be hard to distinguish between alignments to the pseudogene and to the real gene unless a particular read aligns over a region where the two sequences differ. I assume you have short reads, so you can't capture the complete genomic context.
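
To make the point about sequence differences concrete, here is a minimal Python sketch. It assumes the gene and pseudogene sequences have already been aligned to the same length and that a read's offset in those shared coordinates is known. The function names and the toy sequences are made up for illustration; they are not real GBA1/GBAP1 sequence and not part of any existing tool.

```python
def differing_positions(gene: str, pseudogene: str) -> list[int]:
    """Return 0-based indices where two pre-aligned, equal-length sequences differ."""
    if len(gene) != len(pseudogene):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (g, p) in enumerate(zip(gene, pseudogene)) if g != p]


def assign_read(read: str, offset: int, gene: str, pseudogene: str) -> str:
    """Classify a read placed at `offset` (shared alignment coordinates).

    Only positions where gene and pseudogene differ are informative;
    a read covering none of them stays ambiguous.
    """
    gene_votes = 0
    pseudo_votes = 0
    for pos in differing_positions(gene, pseudogene):
        i = pos - offset  # index of this position within the read
        if 0 <= i < len(read):
            if read[i] == gene[pos]:
                gene_votes += 1
            elif read[i] == pseudogene[pos]:
                pseudo_votes += 1
    if gene_votes > pseudo_votes:
        return "gene"
    if pseudo_votes > gene_votes:
        return "pseudogene"
    return "ambiguous"


# Toy sequences (hypothetical): they differ at indices 6 and 15 only.
GENE = "ACGTACGTACGTACGT"
PSEUDO = "ACGTACCTACGTACGA"

print(differing_positions(GENE, PSEUDO))
print(assign_read("ACGT", 4, GENE, PSEUDO))  # covers index 6 with the gene base
print(assign_read("TACG", 7, GENE, PSEUDO))  # covers no differing position
```

With real data the read placement would come from the aligner, and sequencing errors or true variants at the differing positions would complicate the vote, so this is only meant to show why reads outside those positions cannot be assigned.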


Yes, sadly, we have 75 bp and 150 bp paired-end reads... I was hoping there'd be some specific tool to "resolve" this particular problematic region. I saw a paper where they compared the read depth between the two regions and basically said "if the read depths are sufficiently different at the locus of some detected variant, we can't say whether it's the real gene or the pseudogene". I suppose their reasoning is that a depth discrepancy implies reads were mapped to the wrong copy, and therefore variants were called incorrectly.
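
As I understand that depth comparison, it could be sketched roughly like this in Python. The function name and the 1.5-fold cutoff are my own placeholders, not values from the paper, and the depths would have to come from your own pileup at the variant position and at the homologous pseudogene position.

```python
def depth_flag(gene_depth: int, pseudo_depth: int, max_fold: float = 1.5) -> str:
    """Flag a variant locus when depth differs too much between the two copies.

    `max_fold` is a hypothetical cutoff chosen for illustration only.
    """
    if gene_depth <= 0 or pseudo_depth <= 0:
        # No coverage on one copy: nothing to compare, treat as unreliable.
        return "unreliable"
    fold = max(gene_depth, pseudo_depth) / min(gene_depth, pseudo_depth)
    return "unreliable" if fold > max_fold else "ok"


print(depth_flag(100, 95))  # similar depth on both copies
print(depth_flag(100, 40))  # 2.5-fold difference, above the placeholder cutoff
```

A sensible cutoff would need to be calibrated on samples with known genotypes, since capture efficiency can differ between the gene and the pseudogene even without mismapping.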

Long-read sequencing would solve everything of course, but we're talking about ~30,000 people, and the cost of long-read sequencing at that scale is far beyond our budget...
