Question

WES and GBAP1

Asked by Joel Wallenius, 4 weeks ago

Hello,

Sequencing the GBA1 gene is important in hereditary Parkinson's disease research. About 16 kb downstream of GBA1 lies a pseudogene, GBAP1, with more than 90% sequence identity. As a result, WES capture kits also pull down the pseudogene unless they are specifically designed to exclude it (unfortunately, ours is not).
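To make the consequence of such high identity concrete, here is a minimal Python sketch on invented sequences (not the real GBA1/GBAP1 sequence). It assumes the two copies are already aligned without gaps and that reads are error-free, and it counts the fraction of read-length windows that contain no position where the copies differ. A read drawn from such a window matches gene and pseudogene equally well, so an aligner cannot place it uniquely.

```python
# Toy illustration only: the sequences below are invented, not GBA1/GBAP1.
# Assumes the two copies are pre-aligned (equal length, no gaps) and reads are error-free.


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions where the two copies carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / len(a)


def ambiguous_window_fraction(a: str, b: str, read_len: int) -> float:
    """Fraction of read-length windows that contain no differing position.

    Such a window is identical in both copies, so a read sampled from it
    matches gene and pseudogene equally well.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    n_windows = len(a) - read_len + 1
    if n_windows <= 0:
        raise ValueError("read_len is longer than the sequence")
    ambiguous = sum(
        1
        for start in range(n_windows)
        if not any(start <= d < start + read_len for d in diffs)
    )
    return ambiguous / n_windows


# Invented 300 bp "gene" and a "pseudogene" differing only at positions 100 and 200.
gene = "A" * 300
pseudo = gene[:100] + "C" + gene[101:200] + "G" + gene[201:]

print(f"identity: {percent_identity(gene, pseudo):.1f}%")
for read_len in (75, 150):
    frac = ambiguous_window_fraction(gene, pseudo, read_len)
    print(f"{read_len} bp reads: {frac:.2f} of windows ambiguous")
```

In this toy case every 150 bp window spans at least one differing site, while about a third of the 75 bp windows span none. Real differences between GBA1 and GBAP1 are spaced unevenly, so actual numbers will differ.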

Illumina offers Gauchian to call variants in this region: https://github.com/Illumina/Gauchian

But it only works on WGS data.

I wonder if anyone here knows of a tool or approach that helps with analysing this problematic region when working with WES data.

Thanks in advance!

Tags: wes homology gba ngs

If I understand correctly, you want to call variants?

In that case, you can use Mutect2: https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2

For CNVs, there is Control-FREEC (https://boevalab.inf.ethz.ch/FREEC/) or PureCN (https://www.bioconductor.org/packages/release/bioc/html/PureCN.html).

Reply by Hyper_Odin, 4 weeks ago

Hello Odin, the variants have already been called. This post is about the trouble with interpreting them: does a given read/variant come from the pseudogene or from the real gene?

Reply by Joel Wallenius, 4 weeks ago

Is this a single-target capture, or are there multiple targets? Have you tried aligning your data only to this region, including the pseudogene (if it is a single target), and/or to the entire genome (if there are multiple targets)?

Reply by GenoMax, 4 weeks ago

Hello GenoMax!

We have WES data, so all canonical human exons are meant to be captured and sequenced. The capture uses probes designed against those exons, so homologous regions such as the pseudogene GBAP1 get included by accident. I'm not sure what you're suggesting; could you rephrase, please?

Reply by Joel Wallenius, 4 weeks ago

I was trying to find out whether you had already done some analysis, which it looks like you have. It would be hard to tell alignments to the pseudogene apart from alignments to the real gene unless a particular read overlaps a position where the two sequences differ. I assume you have short reads, so you can't capture the full genomic context.
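The idea of using reads that overlap a position of sequence difference can be sketched as follows. This is a minimal Python illustration, not an existing tool: it assumes each read has already been aligned to the GBA1 reference and reduced to a mapping from reference position to observed base, and the paralog-specific sites in `PSVS` (positions and bases) are hypothetical placeholders that would in practice come from an alignment of GBA1 against GBAP1. A read showing GBA1 bases at some sites and GBAP1 bases at others is reported as "conflicting"; such reads may reflect sequencing errors or a genuine GBA1/GBAP1 recombinant allele, so they are worth inspecting rather than discarding.

```python
from collections import Counter

# Hypothetical paralog-specific variant (PSV) sites, in GBA1 reference
# coordinates: position -> (base in GBA1, base in GBAP1). Placeholder values.
PSVS = {1010: ("A", "G"), 1020: ("C", "T"), 1150: ("G", "A")}


def assign_read(read_bases: dict, psvs: dict) -> str:
    """Classify one aligned read by the bases it shows at PSV sites.

    read_bases maps reference position -> base observed in the read
    (only positions the read actually covers).
    """
    gene_hits = pseudo_hits = 0
    for pos, (gene_base, pseudo_base) in psvs.items():
        base = read_bases.get(pos)
        if base is None:
            continue  # read does not cover this PSV
        if base == gene_base:
            gene_hits += 1
        elif base == pseudo_base:
            pseudo_hits += 1
        # any other base (sequencing error or novel variant) is ignored here
    if gene_hits == 0 and pseudo_hits == 0:
        return "uninformative"  # covers no PSV: cannot tell the copies apart
    if gene_hits and pseudo_hits:
        return "conflicting"
    return "GBA1" if gene_hits else "GBAP1"


def summarize(reads: list, psvs: dict) -> Counter:
    """Count reads per assignment class."""
    return Counter(assign_read(r, psvs) for r in reads)


reads = [
    {1010: "A", 1020: "C"},  # GBA1 bases at both PSVs
    {1150: "A"},             # GBAP1 base
    {1200: "T"},             # covers no PSV
    {1010: "A", 1020: "T"},  # mixed
]
print(summarize(reads, PSVS))
```

A variant seen only on reads classified as GBAP1 would point to the pseudogene. How useful this is depends on how densely the differing sites are spaced relative to the read length; reads that cover none of them stay "uninformative".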

Reply by GenoMax, 4 weeks ago

Yes, sadly: we have 75 bp and 150 bp paired-end reads... I was hoping there would be a specific tool to "resolve" this particular region. I saw a paper where they compared the read depth between the two regions and basically said: "if the read depths are sufficiently different at the locus of a detected variant, we can't say whether it comes from the real gene or the pseudogene". I suppose their reasoning is that a depth discrepancy implies that reads are being mismapped, i.e. that variants are being called incorrectly.

Long-read sequencing would of course solve everything, but we're talking about ~30,000 people, and the cost of long-read sequencing is simply far beyond our budget...
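The read-depth comparison from that paper could be sketched roughly like this. It is a minimal Python illustration of the reasoning as paraphrased above, not the paper's actual method: `homolog` (a mapping from a GBA1 position to its homologous GBAP1 position), the depth dictionaries, and the `max_abs_log2` cut-off of 1.0 (a two-fold difference) are hypothetical placeholders. In practice the depths could come from something like `samtools depth`, and the cut-off would need calibrating on samples where the expected ratio is known.

```python
import math


def log2_depth_ratio(depth_gene: int, depth_pseudo: int, pseudocount: float = 1.0) -> float:
    """log2 of GBA1 depth over GBAP1 depth; the pseudocount avoids division by zero."""
    return math.log2((depth_gene + pseudocount) / (depth_pseudo + pseudocount))


def flag_variants(variants, depth_gene, depth_pseudo, homolog, max_abs_log2=1.0):
    """Return position -> status for each called GBA1 variant.

    "ok":         depths at the two homologous positions are similar
    "flag":       depths differ by more than the cut-off, so the variant's
                  origin (gene vs pseudogene) is treated as unresolved
    "no_homolog": no homologous GBAP1 position is known for this site
    """
    status = {}
    for pos in variants:
        pseudo_pos = homolog.get(pos)
        if pseudo_pos is None:
            status[pos] = "no_homolog"
            continue
        ratio = log2_depth_ratio(depth_gene.get(pos, 0), depth_pseudo.get(pseudo_pos, 0))
        status[pos] = "flag" if abs(ratio) > max_abs_log2 else "ok"
    return status


# Invented example data: GBA1 position -> GBAP1 position, and per-position depths.
homolog = {100: 1100, 200: 1200}
depth_gene = {100: 60, 200: 60, 300: 45}
depth_pseudo = {1100: 58, 1200: 10}
print(flag_variants([100, 200, 300], depth_gene, depth_pseudo, homolog))
```

The log2 ratio treats a two-fold excess in either direction symmetrically, which is why the cut-off is applied to its absolute value.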

Reply by Joel Wallenius, 4 weeks ago