Question

Large number of overlapped genes/transcripts reported by VEP

0

6.4 years ago

newbio17 ▴ 360

I'm currently analyzing whole-exome and RNA sequencing data from a cancer cell line and trying to determine how many genes carry deleterious mutations.

I have performed quality control, alignment/mapping (BWA for WES and STAR for RNA-Seq), and variant calling (VarScan).

The resulting VCF file was given as input to Ensembl's Variant Effect Predictor (VEP), and I plan to filter the output so that it contains only SNPs annotated as deleterious.
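As a rough illustration of that filtering step, here is a minimal sketch. It assumes VEP's default tab-delimited output and that SIFT predictions were requested, so that the `Extra` column carries entries such as `SIFT=deleterious(0.01)`. The toy rows, gene names, and transcript names are hypothetical placeholders, and the header is shortened to the columns used here.

```python
import io

def parse_extra(extra):
    """Turn the 'KEY=VAL;KEY=VAL' text of the Extra column into a dict."""
    if extra in ("", "-"):
        return {}
    return dict(item.split("=", 1) for item in extra.split(";") if "=" in item)

def deleterious_rows(handle):
    """Yield rows whose SIFT prediction starts with 'deleterious'.

    This also keeps 'deleterious_low_confidence' calls; tighten the test
    if those should be excluded.
    """
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and '##' metadata lines
        if line.startswith("#"):
            header = line.lstrip("#").split("\t")  # column names
            continue
        row = dict(zip(header, line.split("\t")))
        sift = parse_extra(row.get("Extra", "")).get("SIFT", "")
        if sift.startswith("deleterious"):
            yield row

# Hypothetical toy input, not real VEP output.
EXAMPLE = "\n".join([
    "## toy example",
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence\tExtra",
    "var1\t1:1000\tA\tGENE_A\tTX_A1\tTranscript\tmissense_variant\tIMPACT=MODERATE;SIFT=deleterious(0.01)",
    "var2\t1:2000\tT\tGENE_B\tTX_B1\tTranscript\tmissense_variant\tIMPACT=MODERATE;SIFT=tolerated(0.4)",
    "var3\t2:3000\tG\tGENE_C\tTX_C1\tTranscript\tsynonymous_variant\tIMPACT=LOW",
])

hits = list(deleterious_rows(io.StringIO(EXAMPLE)))
print([row["Uploaded_variation"] for row in hits])
```

Because the lookup goes through the header names, the same function should work on the full set of default columns as long as an `Extra` column is present.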

I quickly examined the HTML statistics file (a default VEP output) and noticed that the tool reported a large number of overlapped genes/transcripts.

Should I be concerned with such large numbers? Is there something I am missing or should be looking out for? Any input would be greatly appreciated.

Thank you.

RNA-Seq SNP VEP • 1.9k views

ADD COMMENT • link 6.4 years ago by newbio17 ▴ 360

2

Hello newbio17,

What do you mean by "large numbers", and why does this worry you? With WES and RNA sequencing, I would expect (nearly) all of my variants to overlap a transcript of a gene.

Furthermore, AFAIK VEP reports a consequence for every transcript that overlaps the variant. One gene can have multiple transcripts.
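To see this per-transcript reporting in your own output, a small tally like the sketch below may help. It assumes the default tab-delimited column order (`Uploaded_variation`, `Location`, `Allele`, `Gene`, `Feature`, `Feature_type`, ...), and the rows shown are hypothetical placeholders rather than real annotations.

```python
from collections import defaultdict

def summarize(lines):
    """Count distinct genes and transcripts, plus output rows per variant."""
    genes, transcripts = set(), set()
    rows_per_variant = defaultdict(int)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header, metadata, and blank lines
        fields = line.rstrip("\n").split("\t")
        variant, gene, feature, feature_type = (
            fields[0], fields[3], fields[4], fields[5]
        )
        rows_per_variant[variant] += 1
        if gene != "-":
            genes.add(gene)
        if feature_type == "Transcript":
            transcripts.add(feature)
    return genes, transcripts, dict(rows_per_variant)

# Hypothetical toy rows: var1 hits two transcripts of the same gene.
ROWS = [
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence",
    "var1\t1:1000\tA\tGENE_A\tTX_A1\tTranscript\tmissense_variant",
    "var1\t1:1000\tA\tGENE_A\tTX_A2\tTranscript\tintron_variant",
    "var2\t1:5000\tT\tGENE_B\tTX_B1\tTranscript\tsynonymous_variant",
]

genes, transcripts, rows_per_variant = summarize(ROWS)
print(len(genes), "genes;", len(transcripts), "transcripts")
print(rows_per_variant)
```

In this toy case a single variant produces two output rows, one per overlapping transcript, so the row count exceeds the variant count even though only two genes are involved.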

fin swimmer

ADD REPLY • link 6.4 years ago by finswimmer 16k

0

Hi finswimmer,

Thank you for your input.

It's my first time working with WES and RNA-Seq data, so everything is new to me. For reference, below are the statistics VEP reported for the run. To clarify, the number of overlapped genes seemed a little high to me relative to the number of variants processed.

General statistics

Lines of input read: 27714
Variants processed: 26392
Variants filtered out: 0
Novel / existing variants: 0 (0.0) / 26392 (100.0)
Overlapped genes: 9564
Overlapped transcripts: 9603
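To put these numbers in proportion, the ratios can be worked out directly from the summary above. This is only simple arithmetic on the reported counts; treating the first ratio as an average per gene assumes every processed variant falls within a gene, which the summary itself does not confirm.

```python
# Counts copied from the VEP summary above.
variants_processed = 26392
overlapped_genes = 9564
overlapped_transcripts = 9603

# Rough average number of variants per overlapped gene.
variants_per_gene = variants_processed / overlapped_genes

# How many overlapped transcripts were reported per overlapped gene.
transcripts_per_gene = overlapped_transcripts / overlapped_genes

print(f"{variants_per_gene:.2f} variants per overlapped gene")
print(f"{transcripts_per_gene:.3f} transcripts per overlapped gene")
```

This works out to roughly 2.76 variants per overlapped gene and about 1.004 transcripts per overlapped gene.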

ADD REPLY • link 6.4 years ago by newbio17 ▴ 360

0

Honestly, I'm surprised that you sequenced a whole exome and only identified variants in 9564 genes. Given the frequency of variants in any individual, I would have thought you'd have variants in every gene.

ADD REPLY • link 6.4 years ago by Emily 24k