I am looking for a way to get the sequences of 3'UTRs for entire transcriptomes, for a few dozen species.
I have tried doing this via BioMart, but I could not find this data on the FTP site, and downloading it manually for so many species is infeasible.
I have also tried to get the data via the R biomaRt package, using the following code:
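library(biomaRt)
# Connect to the Ensembl mart for the fugu (Takifugu rubripes) dataset
ensembl <- useMart("ensembl", dataset = "trubripes_gene_ensembl")
# Retrieve all Ensembl gene IDs, then request the 3'UTR sequence for each
genes <- getBM(mart = ensembl, attributes = "ensembl_gene_id")
s <- getSequence(seqType = "3utr", mart = ensembl, type = "ensembl_gene_id", id = genes[, 1])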
But the output indicated "Sequence unavailable" for about 95% of the genes. On the other hand, when I tried the same with a subset of 100 mouse genes, I received more than 100 matches, with multiple non-duplicated sequences matching a single Ensembl gene.
What would be the right approach to accomplish this task?
Thanks in advance
Dolev Rahat
You're already using the most obvious method. For the species with many instances of "Sequence unavailable", have you looked at their annotations to see whether they have much in the way of annotated UTRs?
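One way to check this per species is to query by transcript rather than by gene and count how many transcripts actually come back with a sequence. The following is only a rough sketch under some assumptions: the helper names `drop_unavailable` and `fetch_3utr` are made up for illustration, and it assumes that `getSequence` returns a data frame with a column named after the `seqType` (here `3utr`). Querying by `ensembl_transcript_id` would probably also explain the multiple sequences per mouse gene, since a gene can have several transcripts, each with its own 3'UTR.

```r
# Hypothetical helper: drop rows where BioMart returned its
# "Sequence unavailable" placeholder (or NA) instead of a sequence.
drop_unavailable <- function(seqs, seq_col = "3utr") {
  keep <- !is.na(seqs[[seq_col]]) & seqs[[seq_col]] != "Sequence unavailable"
  seqs[keep, , drop = FALSE]
}

# Hypothetical helper: fetch 3'UTR sequences for one Ensembl dataset,
# keyed by transcript ID, and report how many transcripts have one.
# Requires the biomaRt package and network access.
fetch_3utr <- function(dataset) {
  mart <- biomaRt::useMart("ensembl", dataset = dataset)
  tx <- biomaRt::getBM(attributes = "ensembl_transcript_id", mart = mart)
  raw <- biomaRt::getSequence(id = tx[, 1],
                              type = "ensembl_transcript_id",
                              seqType = "3utr",
                              mart = mart)
  seqs <- drop_unavailable(raw)
  message(sprintf("%s: %d of %d returned rows have a 3'UTR sequence",
                  dataset, nrow(seqs), nrow(raw)))
  seqs
}

# Example usage (commented out because it contacts the Ensembl server):
# datasets <- c("trubripes_gene_ensembl", "mmusculus_gene_ensembl")
# utrs <- lapply(datasets, fetch_3utr)
```

If the reported fraction is very low for a species, that would suggest its annotation simply contains few 3'UTRs, rather than that the query is wrong.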