1000 genomes coding SNPs not in exonic regions?
5.8 years ago
spiral01 ▴ 110

Hi, I am attempting to identify the exon that each of my synonymous or missense SNPs in the 1000 genomes data belongs to. I am using the GENCODE GTF files found here: https://www.gencodegenes.org/human/ and extracting all exons.

I then use bedtools to identify which exon each of my SNPs falls in. However, many of my SNPs' co-ordinates do not lie within any exon. Can synonymous or missense SNPs fall in intronic regions, and if so, how?

SNP • 1.9k views

Why are you comparing to the GTF? There are tools designed to do exactly what you need.


I need to obtain the exon that each SNP lies in, as well as its start and end co-ordinates (my ultimate goal is to find the length of the specific exon that each SNP lies in). The available GENCODE annotation of 1000 genomes variants provides the exon number within the gene, but not the exon ID or its start and end co-ordinates.


Simply get the GENCODE annotation for hg19, extract the exons, and run bedtools intersect with the SNPs as -a and the exon GTF as -b. Use the -wb option to return the entire interval of each matching exon. From there you can cut or awk out what you need.
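
For anyone who wants to see what that lookup amounts to, here is a minimal pure-Python sketch of the same idea. It is a stand-in for illustration, not a replacement for bedtools: it assumes the GTF has already been read as lines, that exon records carry an `exon_id "..."` attribute, and that SNP positions are 1-based like the GTF co-ordinates. The function names (`load_exons`, `exons_for_snp`) and the toy records are made up for this example.

```python
import re
from collections import defaultdict

# Matches the exon_id attribute in the 9th GTF column, e.g. exon_id "E1";
EXON_ID = re.compile(r'exon_id "([^"]+)"')

def load_exons(gtf_lines):
    """Collect (start, end, exon_id) tuples per chromosome from GTF lines."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep exon features only
        match = EXON_ID.search(fields[8])
        exon_id = match.group(1) if match else "NA"
        exons[fields[0]].append((int(fields[3]), int(fields[4]), exon_id))
    return exons

def exons_for_snp(exons, chrom, pos):
    """Return (exon_id, start, end, length) for every exon containing pos.

    GTF intervals are 1-based and inclusive, so length = end - start + 1.
    """
    return [
        (exon_id, start, end, end - start + 1)
        for start, end, exon_id in exons.get(chrom, [])
        if start <= pos <= end
    ]

# Toy GTF records, invented for illustration only.
gtf = [
    '##description: toy example\n',
    'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; exon_id "E1";\n',
    'chr1\tHAVANA\texon\t300\t350\t.\t+\t.\tgene_id "G1"; exon_id "E2";\n',
]
exons = load_exons(gtf)
print(exons_for_snp(exons, "chr1", 150))  # [('E1', 100, 200, 101)]
print(exons_for_snp(exons, "chr1", 250))  # [] (falls between the two exons)
```

The linear scan is only sensible for small inputs. For whole-genome SNP sets, bedtools intersect is the better choice.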


Can you confirm that the reference genomes are the same, so hg19 vs hg19 or hg38 vs hg38?


Hi, thanks for the response. Yes I can confirm that the ref genomes are the same, hg38.


Where are you getting your 1K genome SNPs from?


Could these be artifacts from a liftOver operation, perhaps?

5.8 years ago

How are you filtering the synonymous and non-synonymous SNPs from the full set of 1000 Genomes SNPs?


The 1000 genomes data is available with consequence annotations here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/functional_annotation/. I then simply parse the variants for those that have a missense or synonymous consequence annotation.


Those files are all on GRCh37. That's why it's not matching the GRCh38 GTF.
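
A quick way to catch this kind of mismatch early is to look for build names in the header lines of both files before intersecting them. The sketch below is a rough heuristic under stated assumptions: it only checks `#`-prefixed header lines for common build labels, the pattern list is my own guess at typical spellings, and the sample header lines are illustrative rather than copied from the real files. The name `guess_build` is hypothetical.

```python
import re

# Assumed spellings of each build; extend as needed for your own files.
BUILD_PATTERNS = {
    "GRCh37": re.compile(r"GRCh37|hg19|hs37|b37", re.IGNORECASE),
    "GRCh38": re.compile(r"GRCh38|hg38|hs38", re.IGNORECASE),
}

def guess_build(header_lines):
    """Return the set of build names mentioned in leading '#' header lines."""
    found = set()
    for line in header_lines:
        if not line.startswith("#"):
            break  # headers end at the first data line
        for build, pattern in BUILD_PATTERNS.items():
            if pattern.search(line):
                found.add(build)
    return found

# Illustrative header lines (not taken verbatim from any real file).
vcf_header = [
    "##fileformat=VCFv4.1\n",
    "##reference=hs37d5.fa\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
]
gtf_header = [
    "##description: evidence-based annotation of the human genome (GRCh38)\n",
    "##provider: GENCODE\n",
]

vcf_build = guess_build(vcf_header)
gtf_build = guess_build(gtf_header)
if vcf_build != gtf_build:
    print("Possible build mismatch:", sorted(vcf_build), "vs", sorted(gtf_build))
```

An empty set means the header gave no hint, so treat that as "unknown" and not as a match.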


Gah such a rookie error. Thank you!
