Question

1000 Genomes coding SNPs not in exonic regions?

6.2 years ago by spiral01

Hi, I am trying to identify which exon each of the synonymous or missense SNPs in the 1000 Genomes data falls in. I am using the GENCODE GTF files from https://www.gencodegenes.org/human/ and extracting all exons.

I then use bedtools to find the exon that each SNP falls in, but many of the SNPs' coordinates are not within any exon. Can synonymous or missense SNPs fall in intronic regions, and if so, how?
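For reference, the exon-extraction step described above might look like the minimal Python sketch below. It assumes a standard tab-separated, 9-column GTF and converts exon coordinates to the 0-based, half-open convention used by BED files; a VCF SNP at 1-based POS then corresponds to the BED interval [POS-1, POS). The Exon record and the parse_exons / snp_to_bed names are illustrative only, not part of any library, and the attribute parsing only reads the quoted keys used here.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Tuple


class Exon(NamedTuple):
    chrom: str
    start: int      # 0-based, inclusive (BED convention)
    end: int        # 0-based, exclusive (BED convention)
    exon_id: str
    gene_name: str


# Matches quoted GTF attributes such as: exon_id "ENSE00002234944.1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_exons(gtf_lines: Iterable[str]) -> Iterator[Exon]:
    """Yield one Exon per 'exon' feature row of a GTF."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep exon features only
        attrs = dict(ATTR_RE.findall(fields[8]))
        # GTF is 1-based and closed at both ends; BED is 0-based and
        # half-open, so only the start coordinate shifts by one.
        yield Exon(fields[0], int(fields[3]) - 1, int(fields[4]),
                   attrs.get("exon_id", "."), attrs.get("gene_name", "."))


def snp_to_bed(pos: int) -> Tuple[int, int]:
    """A 1-based VCF POS covers the half-open BED interval [pos - 1, pos)."""
    return (pos - 1, pos)
```

Writing each Exon out as a tab-separated line gives a BED-style file that can be passed to bedtools as the -b input.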

Why are you comparing to the GTF? There are tools designed to do exactly what you need.

6.2 years ago by Emily

I need the exon that each SNP lies in, together with that exon's start and end coordinates, because my ultimate goal is to get the length of the specific exon each SNP falls in. The GENCODE annotation available for the 1000 Genomes variants gives the exon number within the gene, but as far as I can tell not the exon ID or its start and end coordinates.

6.2 years ago by spiral01
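Once the matching exon's GTF start and end are known, the length follows directly: GTF coordinates are 1-based and inclusive at both ends, so the length is end - start + 1. A minimal sketch, assuming (exon_id, start, end) tuples have already been pulled from the exon rows of the GTF; exon_lengths is a hypothetical helper name.

```python
from typing import Dict, Iterable, Tuple


def exon_lengths(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, int]:
    """Map exon_id -> length in bp from (exon_id, gtf_start, gtf_end) tuples.

    GTF coordinates are 1-based and inclusive at both ends, hence the +1.
    An exon shared by several transcripts is listed once per transcript in
    the GTF; keying the dict on exon_id keeps a single entry for it.
    """
    lengths: Dict[str, int] = {}
    for exon_id, start, end in rows:
        lengths[exon_id] = end - start + 1
    return lengths
```

If the coordinates come from a BED-style (0-based, half-open) file instead, the length is simply end - start with no +1.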

Simply get the GENCODE annotation for hg19, extract the exons, and run bedtools intersect with the SNPs as -a and the exon GTF as -b. The -wb option writes out the full entry of the matching exon alongside each SNP; from there you can cut or awk out what you need.

6.2 years ago by ATpoint
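For a quick check on small inputs, the bedtools intersect -wb behaviour suggested above can be mimicked in pure Python. This is an illustrative re-implementation, not bedtools itself: it assumes both inputs are already 0-based, half-open (BED-style) tuples, and the record layout and the intersect_wb name are made up for this sketch. Like -wb, it reports every overlapping exon, so a SNP in an exon shared by several transcripts or genes can appear more than once.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative record layout (BED-style, 0-based, half-open):
#   SNP:  (chrom, start, end, snp_id)
#   exon: (chrom, start, end, exon_id)
Rec = Tuple[str, int, int, str]


def intersect_wb(snps: List[Rec], exons: List[Rec]) -> List[Tuple[Rec, Rec]]:
    """Return (snp, exon) pairs for every overlap, similar in spirit to
    `bedtools intersect -a snps -b exons -wb` (A record + full B record)."""
    by_chrom: Dict[str, List[Rec]] = defaultdict(list)
    for exon in exons:
        by_chrom[exon[0]].append(exon)

    # Per chromosome: exons sorted by start, their starts, and the longest exon.
    index: Dict[str, Tuple[List[Rec], List[int], int]] = {}
    for chrom, recs in by_chrom.items():
        recs.sort(key=lambda r: r[1])
        index[chrom] = (recs, [r[1] for r in recs], max(r[2] - r[1] for r in recs))

    hits: List[Tuple[Rec, Rec]] = []
    for snp in snps:
        chrom, s_start, s_end, _ = snp
        if chrom not in index:
            continue  # e.g. "1" vs "chr1" naming: silently no hits
        recs, starts, max_len = index[chrom]
        # Walk back from the last exon starting before the SNP ends; no exon
        # starting at or before s_start - max_len can still reach the SNP.
        i = bisect.bisect_left(starts, s_end) - 1
        while i >= 0 and starts[i] > s_start - max_len:
            exon = recs[i]
            if exon[1] < s_end and exon[2] > s_start:  # half-open overlap
                hits.append((snp, exon))
            i -= 1
    return hits
```

The equivalent of "cutting out" the fields of interest is then a comprehension such as [(s[3], e[3], e[2] - e[1]) for s, e in hits], giving SNP ID, exon ID and exon length (BED-style, so no +1).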

Can you confirm that both files use the same reference genome, i.e. hg19 with hg19 or hg38 with hg38?

6.2 years ago by ATpoint

Hi, thanks for the response. Yes, I can confirm that the reference genomes are the same: hg38.

6.2 years ago by spiral01

Where are you getting your 1000 Genomes SNPs from?

6.2 years ago by i.sudbery

I am using the GENCODE-annotated data from here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/functional_annotation/

6.2 years ago by spiral01

Could these be artifacts from a liftOver operation, perhaps?

6.2 years ago by Ram

score 1 · Answer 1 · 2019-01-22

6.2 years ago by i.sudbery

How are you filtering the synonymous and non-synonymous SNPs out of all the 1000 Genomes SNPs?

The 1000 Genomes data is available with consequence annotations here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/functional_annotation/. I then simply parse the variants and keep those annotated with a missense or synonymous consequence.

6.2 years ago by spiral01
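The parsing step described above could be sketched as follows. This is a minimal sketch under loudly stated assumptions: it assumes the consequence is recorded as a Sequence Ontology term such as missense_variant or synonymous_variant somewhere in the value of a single INFO key, and the key name is passed in because the actual key and term spelling in those files should be checked against their ##INFO header lines. coding_snvs is a hypothetical helper.

```python
import re
from typing import AbstractSet, Iterable, Iterator, List, Tuple

# Assumption: each record's consequence terms are Sequence Ontology names
# (e.g. "missense_variant") inside the value of a single INFO key. Check the
# file's ##INFO header lines for the actual key and term spelling.
WANTED = frozenset({"missense_variant", "synonymous_variant"})


def coding_snvs(vcf_lines: Iterable[str], info_key: str,
                wanted: AbstractSet[str] = WANTED
                ) -> Iterator[Tuple[str, int, str, List[str]]]:
    """Yield (chrom, pos, id, matched_terms) for records whose INFO[info_key]
    mentions any wanted term. POS is left 1-based, as in the VCF."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and column-header lines
        chrom, pos, vid, _ref, _alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key != info_key:
                continue
            # Split on common sub-field delimiters so "G|missense_variant|..."
            # or "a&b,c" is broken into individual terms.
            matched = set(re.split(r"[|,&]", value)) & wanted
            if matched:
                yield chrom, int(pos), vid, sorted(matched)
```

Usage would be along the lines of coding_snvs(lines, "CSQ"), where "CSQ" is only a placeholder for whatever key the annotated VCF actually uses. Note that the chromosome names and positions this yields are whatever the VCF uses, so they are only comparable with a GTF on the same assembly.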

Those files are all on GRCh37. That's why it's not matching the GRCh38 GTF.

6.2 years ago by Emily
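One way to catch a build mismatch like this before intersecting is to check the VCF REF alleles against the reference sequence of the assembly you believe the data uses: on the matching build the REF alleles should essentially all agree, while on the wrong build many will not. A minimal sketch, assuming the reference has already been loaded into a dict of chromosome name to sequence (loading the FASTA is left out); ref_match_rate is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple


def ref_match_rate(variants: Iterable[Tuple[str, int, str]],
                   genome: Dict[str, str]) -> float:
    """Fraction of (chrom, 1-based pos, REF) records whose REF matches `genome`.

    `genome` maps chromosome name -> full sequence for the assembly being
    tested. Variants on chromosomes missing from `genome` (including "1" vs
    "chr1" naming differences) count as mismatches.
    """
    total = matched = 0
    for chrom, pos, ref in variants:
        total += 1
        seq = genome.get(chrom)
        # VCF POS is 1-based; Python slicing is 0-based.
        if seq is not None and seq[pos - 1:pos - 1 + len(ref)].upper() == ref.upper():
            matched += 1
    return matched / total if total else 0.0
```

A rate well below 1 points to a build or chromosome-naming mismatch; normalising names (e.g. "1" to "chr1") first keeps the two causes apart.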

Gah, such a rookie error. Thank you!

6.2 years ago by spiral01