Is there a way to query Ensembl to get all 3'UTRs from all species?
12 months ago
SJP • 0

I am trying to obtain stats on how many 3'UTRs are annotated in Ensembl. I would like to download as many annotated 3'UTRs as possible from as many species as possible and compute some basic stats on them (length and so on). For these purposes, it's not even really necessary to download FASTA sequences if I can avoid it.

I cannot figure out how to do this in a non-manual way. So far I have gone into BioMart and queried each species database individually, which took me forever for just a handful of them.

I don't code in Perl, so although I am aware of the Perl API, I'm really lost on how to formulate my query in that language (I usually use Python).

I currently use an HPC and my laptop doesn't have a lot of memory, so I would really like to do this through the command line if that's possible.

Also, I'm not sure if it might be possible to do this using the UCSC Genome Browser? I tried but, again, didn't know how to apply my query to multiple species.

python ensembl UTR • 406 views
12 months ago

You could download all the GTF files in UCSC GenArk:

wget -r -nd -N --no-parent -nH --cut-dirs=100 -P OUTDIR   -A '*.gtf.gz'  'https://hgdownload.soe.ucsc.edu/hubs/GCF/'

and then extract the UTRs from the GTF files. See "Extracting 5'UTR and 3'UTR bed files from gtf file".
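Since you mention Python and only need lengths, here is a minimal sketch of the extraction step that skips BED files and sequences entirely. It assumes the 3'UTR rows in your GTFs are labelled `three_prime_utr` (the name Ensembl GTFs use) or `3UTR`; check the third column of your own files and adjust `UTR_FEATURES` if they differ. The function names are just illustrative. Lengths are summed per `transcript_id` because a 3'UTR can be split across several exons.

```python
import gzip
import re
from collections import defaultdict

# Assumption: 3'UTR rows are labelled "three_prime_utr" (Ensembl GTFs) or
# "3UTR". Check column 3 of your own files and adjust if needed.
UTR_FEATURES = frozenset({"three_prime_utr", "3UTR"})
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def utr3_lengths(lines, features=UTR_FEATURES):
    """Return {transcript_id: total 3'UTR length} from an iterable of GTF lines."""
    lengths = defaultdict(int)
    for line in lines:
        if line.startswith("#"):
            continue  # header / comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in features:
            continue  # malformed row, or not a 3'UTR feature
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue  # no transcript_id attribute on this row
        # GTF coordinates are 1-based and inclusive.
        lengths[match.group(1)] += int(fields[4]) - int(fields[3]) + 1
    return dict(lengths)


def utr3_lengths_from_file(path):
    """Same as utr3_lengths, reading a plain or gzip-compressed GTF file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return utr3_lengths(handle)
```

Calling `utr3_lengths_from_file` on each downloaded `.gtf.gz` in `OUTDIR` gives one dictionary per assembly, which is enough for counts and length distributions. It reads line by line, so memory use stays small on a laptop or an HPC login node.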
