How to transform a whole-genome callset into a whole-exome callset?
3.2 years ago

Hi all,

I have a callset from whole-genome data, and I want to turn it into an exome callset by extracting the variants that fall within an exome target interval list. I obtained two exome target lists, one from the 1KG project (Phase 3) and the Twist Exome target (https://www.twistbioscience.com/resources/bed-file/ngs-human-core-exome-panel-bed-files). I have some questions:

  • Are these exome intervals appropriate for extracting the exonic variants? If so, which one should I use? I subset my VCF file with both lists: with the 1KG list I got 68463 variants, while with the Twist Exome target I got 21082.
  • Looking at the annotations for the subset (regardless of whether it came from 1KG or Twist), I get variants annotated as intronic even though they are tagged as belonging to protein-coding transcripts. Does this make sense for an exome target list?

Thank you very much for your help!

Kind regards,

Alejandro

calling wes variant wgs • 1.5k views
3.2 years ago
h.mon 35k

There is no universal, consensus whole-exome annotation. There are, however, various platforms and versions of whole-exome library kits, each with its accompanying annotation. These kit-specific annotations are built against a particular version of the human genome and its gene annotation. You need to subset your whole-genome calls against the annotation of the kit in question, which means your calls and the target intervals must be on the same genome build: either map against the same genome version, or convert the coordinates between builds (e.g., with liftOver).
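As a rough illustration of the subsetting step, here is a minimal pure-Python sketch. It assumes the VCF and the BED file use the same genome build and the same contig naming, and it only tests each record's POS (not the full span of a deletion). The function names `load_targets`, `in_targets`, and `subset_vcf` are made up for this example; in practice a dedicated tool would normally do this job.

```python
from bisect import bisect_right
from collections import defaultdict


def load_targets(bed_lines):
    """Parse BED lines into {chrom: (starts, ends)}, merging overlapping intervals."""
    raw = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and the optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    targets = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        targets[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return targets


def in_targets(targets, chrom, pos):
    """True if the 1-based VCF position lies in a 0-based, half-open BED interval."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    zero_based = pos - 1
    i = bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


def subset_vcf(vcf_lines, targets):
    """Yield all header lines plus the records whose POS falls inside a target."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_targets(targets, chrom, int(pos)):
            yield line
```

The off-by-one handling matters here: BED coordinates are 0-based and half-open, while VCF positions are 1-based, so a BED interval `100 200` covers VCF positions 101 through 200.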

Sometimes the upstream genome annotation is updated while kit manufacturers lag behind and keep an outdated annotation; this could explain the intron / coding discrepancies you observed.

Thank you very much for your answer!

For the 1KG target list, I converted the coordinates to hg38, while the Twist interval list is already in hg38 coordinates. My VCF file was annotated with dbSNP build 138. Is it possible that the presence of intronic variants is due to the dbSNP version?

It is possible, but you would have to check to confirm whether this is the case. It could also be an error in coordinate conversion.
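A couple of cheap sanity checks on a converted target list can be sketched as follows. This assumes a plain-text VCF and a BED file with at least three columns; the helper names `vcf_contigs` and `check_targets` are hypothetical. It only flags contig names that the VCF does not know about (for example "1" versus "chr1") and malformed intervals, so it cannot prove that the conversion itself was correct.

```python
def vcf_contigs(vcf_lines):
    """Collect contig names from ##contig header lines and from the records."""
    prefix = "##contig=<"
    names = set()
    for line in vcf_lines:
        if line.startswith(prefix):
            # e.g. ##contig=<ID=chr1,length=248956422>
            for field in line.strip()[len(prefix):-1].split(","):
                if field.startswith("ID="):
                    names.add(field[3:])
        elif not line.startswith("#") and line.strip():
            names.add(line.split("\t", 1)[0])
    return names


def check_targets(bed_lines, contigs):
    """Return (unknown_contigs, malformed_intervals, total_bases) for BED lines.

    total_bases is a plain sum over well-formed intervals, so overlapping
    intervals are counted more than once.
    """
    unknown, malformed, total = set(), [], 0
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        if chrom not in contigs:
            unknown.add(chrom)
        if start < 0 or end <= start:
            malformed.append((chrom, start, end))
        else:
            total += end - start
    return unknown, malformed, total
```

Comparing `total_bases` between the two target lists also gives a first hint as to why one list retains far more variants than the other.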
