Extracting exon coordinates on hg38
7.1 years ago
Bogdan ★ 1.4k

Dear all,

Could you please advise: how can we obtain the exon coordinates of the RefSeq or UCSC genes (canonical isoforms) on hg38, where each coordinate record (chr, start, end) is also annotated with its gene name?

Thanks a lot, and have a happy weekend,

-- bogdan


Hello Bogdan!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/101333/

This is typically not recommended as it runs the risk of annoying people in both communities.


Dear gentlemen, thank you for your replies; I very much appreciate your help ;). Yes, I knew about the previous posts on extracting hg19 exon coordinates before I emailed, although the approach applies a bit differently to hg38 and to RefSeq genes.

I thought we might find two solutions to the same question, one in Bioconductor (using GenomicFeatures) and one outside Bioconductor, which I could then compare.
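For the non-Bioconductor route, one option is to parse the GENCODE annotation GTF for GRCh38 directly and keep only the exon records. Below is a minimal Python sketch, assuming the standard nine-column GTF layout with a quoted `gene_name` attribute (as GENCODE files use); the function name `gtf_exons_to_bed` and the toy records are illustrative only. It emits the exons of every transcript, so restricting to canonical isoforms still has to be done separately.

```python
import io
import re

# Matches quoted GTF attributes such as: gene_name "TP53";
# (unquoted attributes like `level 2;` are ignored)
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_exons_to_bed(lines):
    """Yield (chrom, start, end, gene_name) for every exon record.

    GTF coordinates are 1-based and end-inclusive, while BED is 0-based and
    half-open, so only the start is shifted by one.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep exon features only
        attrs = dict(ATTR_RE.findall(fields[8]))
        yield fields[0], int(fields[3]) - 1, int(fields[4]), attrs.get("gene_name", ".")


if __name__ == "__main__":
    # Hypothetical toy records, not real annotation
    toy_gtf = io.StringIO(
        "##description: toy example\n"
        'chr1\tTOY\tgene\t1001\t2000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n'
        'chr1\tTOY\texon\t1001\t1200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "GENE_A";\n'
    )
    for row in gtf_exons_to_bed(toy_gtf):
        print("\t".join(map(str, row)))
```

For the real, gzipped file you would pass `gzip.open(path, "rt")` instead of the toy buffer and redirect the printed rows to a `.bed` file.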

Thanks again for your help, and happy weekend ;)!


Hi Bogdan

I don't know whether you completed this in the end, but for anyone else trying to use hg38, I would recommend following the blog post linked above for hg19, but using the GENCODE v29 track instead of UCSC Genes to download the canonical transcripts and exons file.

Follow the instructions as written, but switch over to the GENCODE tracks. The code given in that blog post should work too; I had no errors.
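If the Table Browser output is an exon-level BED file, the fourth column carries a transcript-based exon name rather than the gene name, so a join against a transcript-to-gene table is still needed. A minimal Python sketch, assuming exon names of the form `<transcript>_exon_<n>_...` (the pattern the Table Browser typically uses for exon BED output) and a transcript-to-gene dictionary built from a second table exported alongside it; `annotate_exon_bed` and all IDs below are hypothetical:

```python
def annotate_exon_bed(bed_lines, tx_to_gene):
    """Yield (chrom, start, end, gene) for each exon row of a BED file.

    Assumes column 4 looks like '<transcript>_exon_<n>_...'; everything before
    the first '_exon_' is treated as the transcript ID. Transcripts missing
    from tx_to_gene keep their transcript ID as the label.
    """
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        tx_id = fields[3].split("_exon_", 1)[0]
        yield fields[0], int(fields[1]), int(fields[2]), tx_to_gene.get(tx_id, tx_id)


if __name__ == "__main__":
    # Hypothetical IDs and coordinates, for illustration only
    tx_to_gene = {"ENST00000000001.1": "GENE_A"}
    bed = ["chr1\t1000\t1200\tENST00000000001.1_exon_0_0_chr1_1001_f\t0\t+\n"]
    for row in annotate_exon_bed(bed, tx_to_gene):
        print("\t".join(map(str, row)))
```

Splitting on `_exon_` rather than on the first underscore keeps RefSeq accessions such as `NM_000546` intact.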

Hope that helps

Lloyd

7.1 years ago
chen ★ 2.5k

Try OpenGene.jl, a library written in Julia (https://github.com/OpenGene/OpenGene.jl).

using OpenGene, OpenGene.Reference

# Load the GENCODE dataset. On first use it downloads the annotation file from the
# GENCODE website; after that it is cached, so later loads are fast.
index = gencode_load("GRCh38")

# Take the first transcript of the first gene matching "TP53"
genes = gencode_genes(index, "TP53")
tp53 = genes[1]
exons = tp53.transcripts[1].exons

# Print each exon's number, start and end, tab-separated
for exon in exons
    println(exon.number, "\t", exon.start_pos, "\t", exon.end_pos)
end
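Note that `tp53.transcripts[1]` is simply the first transcript listed, which is not guaranteed to be the canonical isoform. When no canonical flag is available, one common stand-in (a heuristic, not the UCSC or Ensembl definition of "canonical") is to keep, per gene, the transcript with the largest total exon length. A minimal Python sketch, assuming exon rows of `(gene, transcript, chrom, start, end)` with BED-style half-open coordinates; `longest_transcript_exons` is a hypothetical helper:

```python
from collections import defaultdict


def longest_transcript_exons(exon_rows):
    """Keep only the exons of the longest transcript per gene.

    exon_rows: iterable of (gene, transcript, chrom, start, end), 0-based
    half-open. Ties go to the transcript seen first. Returns a list of
    (chrom, start, end, gene) sorted by chromosome name (lexicographic)
    and start.
    """
    by_tx = defaultdict(list)  # (gene, transcript) -> list of exons
    for gene, tx, chrom, start, end in exon_rows:
        by_tx[(gene, tx)].append((chrom, start, end))

    best = {}  # gene -> (total exon length, transcript)
    for (gene, tx), exons in by_tx.items():
        length = sum(end - start for _, start, end in exons)
        if gene not in best or length > best[gene][0]:
            best[gene] = (length, tx)

    out = [
        (chrom, start, end, gene)
        for gene, (_, tx) in best.items()
        for chrom, start, end in by_tx[(gene, tx)]
    ]
    return sorted(out, key=lambda r: (r[0], r[1]))


if __name__ == "__main__":
    # Hypothetical toy exons
    rows = [
        ("GENE_A", "TX1", "chr1", 100, 200),
        ("GENE_A", "TX1", "chr1", 300, 350),
        ("GENE_A", "TX2", "chr1", 100, 400),
    ]
    for row in longest_transcript_exons(rows):
        print("\t".join(map(str, row)))
```

If your annotation does carry a canonical tag, filtering on that tag is preferable to this length heuristic.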