Question

Is there a way to query Ensembl to get all 3'UTRs from all species?


20 months ago

SJP • 0

I am trying to obtain stats on how many 3'UTRs are annotated in Ensembl. I would like to download as many annotated 3'UTRs as possible from as many species as possible and compute some basic stats on them (length and so on). For these purposes, it's not even really necessary to download FASTA sequences if I can avoid it.

I cannot figure out how to do this in a non-manual way, i.e. without going into BioMart and querying each species database individually, which took me forever to do for just a handful of them.

I don't code in Perl. I am aware of the Perl API, but I'm really lost on how to formulate my query in that language (I usually use Python).

I currently use an HPC and my laptop doesn't have a lot of memory, so I would really like to do this through the command line if that's possible.

Also, I'm not sure if it might be possible to do this using the UCSC Genome Browser? I tried but, again, didn't know how to apply my query to multiple species.

python ensembl UTR • 659 views

updated 20 months ago by Pierre Lindenbaum 166k • written 20 months ago by SJP • 0

score 0 · Answer 1 · 2023-11-21


20 months ago

Pierre Lindenbaum 166k

You could download all the GTF files from UCSC GenArk:

wget -r -nd -N --no-parent -nH --cut-dirs=100 -P OUTDIR   -A '*.gtf.gz'  'https://hgdownload.soe.ucsc.edu/hubs/GCF/'

and then extract the UTRs from the GTF files; see the post "Extracting 5'UTR and 3'UTR bed files from gtf file".
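Since you mention Python, here is a minimal sketch of the extraction step. It assumes that 3'UTR segments appear in column 3 of the GTF as either "3UTR" (the name used by UCSC genePredToGtf-style files) or "three_prime_utr" (the name used in Ensembl GTFs), and that each such line carries a transcript_id attribute. The function names `utr3_lengths` and `summarize` are made up for this example. Files that do not annotate UTRs as separate features will simply report nothing.

```python
import gzip
import re
import statistics
from collections import defaultdict

# Assumption: 3'UTR feature names differ between sources.
# "3UTR" is used by UCSC genePredToGtf output, "three_prime_utr" by Ensembl GTFs.
UTR3_FEATURES = {"3UTR", "three_prime_utr"}
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def utr3_lengths(lines):
    """Return {transcript_id: total 3'UTR length} from an iterable of GTF lines."""
    lengths = defaultdict(int)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in UTR3_FEATURES:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        # GTF coordinates are 1-based and inclusive; a 3'UTR split across
        # several exons shows up as several lines, so the segments are summed.
        lengths[match.group(1)] += int(fields[4]) - int(fields[3]) + 1
    return dict(lengths)


def summarize(path):
    """Print count, mean and median 3'UTR length for one gzipped GTF file."""
    with gzip.open(path, "rt") as handle:
        lengths = list(utr3_lengths(handle).values())
    if not lengths:
        print(f"{path}\tno 3'UTR features found")
        return
    mean = statistics.mean(lengths)
    median = statistics.median(lengths)
    print(f"{path}\tn={len(lengths)}\tmean={mean:.1f}\tmedian={median}")
```

To run it over everything downloaded by the wget command above, loop over `glob.glob("OUTDIR/*.gtf.gz")` and call `summarize` on each path. The files are read line by line, so memory use stays low, which should suit an HPC job or a small laptop.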
