Extracting ENST ids from coordinates
3.0 years ago
graeme.thorn

I have a set of results from Whippet that lists deltaPsi values for gene segments (which may or may not correspond to annotated exons). I have the relevant Ensembl gene IDs, but I need to identify which specific isoforms contain the segments flagged as significantly different between conditions.

Is there a quick programmatic way of extracting all transcripts of a given gene that contain a given segment, specified by its genomic coordinates?

Tags: biomart, whippet

You tagged this with biomart; does that mean you tried BioMart but could not find what you were looking for? I ask because the problem you describe is the kind of task BioMart can help with.


I tagged it biomart because I expect there is probably a solution using BioMart, but I'm not au fait enough with it (or with biomaRt, the R package) to extract what I need.


I'm afraid BioMart is not the best tool for this, as it is gene-oriented. If you ask it for Ensembl transcript stable IDs (ENSTs) in a given genomic region, BioMart will find the genes overlapping that region and return all of their transcripts, whether or not each transcript actually covers the region. You could, however, do it with the REST API and its overlap endpoint, described here: https://rest.ensembl.org/documentation/info/overlap_region Here's an example: https://rest.ensembl.org/overlap/region/human/17:27630005-27630969?feature=transcript;content-type=application/json
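A minimal sketch of that REST approach in Python, using only the standard library. It assumes the overlap endpoint returns a JSON list whose records carry feature_type, id, start, end and Parent fields, where Parent is the gene ID for transcript records and the transcript ID for exon records; overlap_url, fetch_json and transcripts_containing are hypothetical helper names. Requesting exons as well as transcripts means an isoform is only reported when one of its exons overlaps the segment, not merely when its span (introns included) covers it.

```python
import json
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def overlap_url(species, chrom, start, end):
    """Build an overlap/region query asking for transcripts and exons at once."""
    # Repeating the feature parameter requests several feature types.
    return (f"{REST_SERVER}/overlap/region/{species}/{chrom}:{start}-{end}"
            "?feature=transcript;feature=exon;content-type=application/json")


def fetch_json(url):
    """Download and decode the JSON list returned by the REST server."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def transcripts_containing(records, gene_id, seg_start, seg_end):
    """Return sorted ENST IDs of gene_id having an exon that overlaps the segment.

    ASSUMPTION: transcript records have Parent = gene ID, exon records have
    Parent = transcript ID, and start/end are 1-based inclusive coordinates.
    """
    # Map each transcript to its gene so exons can be filtered by gene.
    tx_to_gene = {r["id"]: r.get("Parent") for r in records
                  if r.get("feature_type") == "transcript"}
    hits = set()
    for r in records:
        if r.get("feature_type") != "exon":
            continue
        tx = r.get("Parent")
        # Keep exons of the requested gene that share at least one base with the segment.
        if tx_to_gene.get(tx) == gene_id and r["start"] <= seg_end and r["end"] >= seg_start:
            hits.add(tx)
    return sorted(hits)
```

For example, records = fetch_json(overlap_url("human", "17", 27630005, 27630969)) retrieves the region from the example link, and transcripts_containing(records, gene_id, 27630005, 27630969) then keeps one gene's isoforms. Ensembl names chromosomes without a chr prefix and reports stable IDs without their version suffix, so those may need stripping from the Whippet coordinates and gene IDs first. To require that an exon contains the whole segment rather than just overlapping it, the condition becomes r["start"] <= seg_start and r["end"] >= seg_end.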

3.0 years ago
graeme.thorn

This may not be the most efficient approach, but I extracted the annotated exons from the GTF annotation file and wrote their locations and ENST IDs to a BED4 file using grep, sed and awk. I converted the Whippet segments to a BED4 file in the same way, so the files were of the form

chr1 158831351 158831557 ENSG00000163563.1
...

(the .1 suffix identifies the Whippet segment within the gene) and

chr1 158831351 158831557 ENST00000368141
chr1 158831351 158831557 ENST00000491210
...

I then ran bedtools intersect -wa -wb -a Whippet.bed -b ENST.bed, which pairs each Whippet segment with every exon it overlaps:

chr1 158831351 158831557 ENSG00000163563.1 chr1 158831351 158831557 ENST00000368141
chr1 158831351 158831557 ENSG00000163563.1 chr1 158831351 158831557 ENST00000491210
...
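For anyone who would rather skip the grep/sed/awk step, here is a minimal pure-Python sketch of the same two steps: pulling exon rows out of a GTF as BED4, and pairing overlapping intervals in the -wa -wb layout. The function names are hypothetical, and the overlap loop compares every pair on a chromosome, so it is only meant for small inputs; bedtools remains the better tool at genome scale. GTF is 1-based inclusive while BED is 0-based half-open, so the sketch shifts exon starts down by one; the Whippet segments need to be converted with the same convention for the two files to line up.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in column 9 of a GTF line.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_exons_to_bed4(gtf_lines):
    """Yield BED4 rows (chrom, start, end, ENST) for every exon line of a GTF."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        # Drop any version suffix, e.g. ENST00000368141.7 -> ENST00000368141
        transcript_id = match.group(1).split(".")[0]
        # GTF is 1-based inclusive; BED is 0-based half-open.
        yield fields[0], int(fields[3]) - 1, int(fields[4]), transcript_id


def intersect_wa_wb(a_rows, b_rows, min_fraction=None):
    """Pair each A interval with every B interval it overlaps.

    Each result is the full A row followed by the full B row, the same layout
    as `bedtools intersect -wa -wb`. If min_fraction is set, the overlap must
    cover at least that fraction of the A interval (in the spirit of -f).
    """
    by_chrom = defaultdict(list)
    for b in b_rows:
        by_chrom[b[0]].append(b)
    pairs = []
    for a in a_rows:
        a_start, a_end = int(a[1]), int(a[2])
        for b in by_chrom.get(a[0], []):
            overlap = min(a_end, int(b[2])) - max(a_start, int(b[1]))
            if overlap <= 0:
                continue  # half-open intervals that merely touch do not overlap
            if min_fraction is not None and overlap < min_fraction * (a_end - a_start):
                continue
            pairs.append(tuple(a) + tuple(b))
    return pairs
```

By default any overlap of at least 1 bp counts, as with bedtools. If only transcripts with an exon containing the entire segment should be reported, bedtools intersect offers -f 1.0 (minimum overlap as a fraction of A), which min_fraction=1.0 imitates here.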

This is now in the right format for me to look up the affected transcripts for each segment in the Whippet PSI output.
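To carry the intersect result through to the Whippet table, the pairs can be collapsed into a segment-to-transcripts lookup and attached to each Whippet row. This is a sketch under assumptions: the column names Gene and Node, and the "<gene>.<node>" segment ID rebuilt from them, are guesses about the Whippet output that should be checked against the actual file header; transcripts_per_segment and annotate_diff are hypothetical names.

```python
import csv
import gzip
from collections import defaultdict


def transcripts_per_segment(pairs):
    """Collapse intersect rows (4 A columns, then 4 B columns) into segment -> ENSTs."""
    mapping = defaultdict(set)
    for row in pairs:
        segment_id, transcript_id = row[3], row[7]
        mapping[segment_id].add(transcript_id)
    return {segment: sorted(txs) for segment, txs in mapping.items()}


def annotate_diff(diff_path, mapping, gene_col="Gene", node_col="Node"):
    """Yield each row of a tab-separated Whippet table with its transcripts added.

    ASSUMPTION: the table has gene and node columns named as above, and the
    segment IDs in Whippet.bed were built as "<gene>.<node>".
    """
    # Whippet tables are often gzipped; open either form as text.
    opener = gzip.open if diff_path.endswith(".gz") else open
    with opener(diff_path, "rt", newline="") as handle:
        for record in csv.DictReader(handle, delimiter="\t"):
            segment_id = f"{record[gene_col]}.{record[node_col]}"
            record["transcripts"] = mapping.get(segment_id, [])
            yield record
```

The pairs can be read straight from the bedtools output with [line.rstrip("\n").split("\t") for line in open("intersect.bed")], where intersect.bed is a placeholder name; segments that overlap no annotated exon come back with an empty transcript list.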
