Hello,
I'm working with the biomaRt package in R. I'm trying to retrieve all Entrez gene IDs from the hsapiens_gene_ensembl dataset, filtering by gene type (protein coding) and using the Entrez gene ID as the attribute.
So far I have done the following:
library(biomaRt)
human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
I'm not sure how to call the getBM function so that it is not restricted to a list of values but returns all values in the human dataset.
Thanks for your help,
Tom :)
Hi,
I have not used biomaRt for the last 2-3 months, but here is something I was using to play around with:
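A minimal sketch of the idea, assuming the current Ensembl release. getBM() does not need a list of IDs: if the only filter is the gene biotype, every protein-coding gene comes back. The filter name "biotype" and the attribute name "entrezgene_id" are assumptions (older releases called the attribute "entrezgene"), so check them with listFilters() and listAttributes() first.

```r
library(biomaRt)

# Same connection as in the question.
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Check the exact filter/attribute names for this release
# (assumed below to be "biotype" and "entrezgene_id").
head(listFilters(human))
head(listAttributes(human))

# No list of IDs: the only restriction is the gene biotype,
# so all protein-coding genes in the dataset are returned.
pc_entrez <- getBM(
  attributes = c("ensembl_gene_id", "entrezgene_id"),
  filters    = "biotype",
  values     = "protein_coding",
  mart       = human
)

# Genes without an Entrez mapping come back as NA; keep unique IDs.
entrez_ids <- unique(pc_entrez$entrezgene_id[!is.na(pc_entrez$entrezgene_id)])
length(entrez_ids)
```

Dropping filters and values altogether (both default to "") returns the attributes for every gene in the dataset, whatever its biotype.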
Thanks!
Let me be more specific: my goal is to download all FASTA sequences matching the following conditions:
Dataset: hsapiens_gene_ensembl
Filter: gene type = protein coding
Attributes: Ensembl gene ID, Ensembl transcript ID, associated gene name, chromosome name, strand, transcript start
Sequences: 5' UTR, with a 3000 bp upstream flank
In the Ensembl BioMart web interface I got 21976/57945 matches and downloaded them as a gzipped FASTA file.
I want to do the same with the Bioconductor package biomaRt in R.
I tried the getSequence function, but I don't know how to retrieve all sequences for hsapiens.
Thanks a lot,
tom
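Roughly, the web query above could translate to the following in biomaRt. This is a sketch under assumptions: "biotype" as the gene-type filter, "external_gene_name" for the associated gene name, and that upstream = 3000 together with seqType = "5utr" behaves like the web interface's "5' UTR" plus "Upstream flank" options. The "|"-separated header and the output file name are my own choices.

```r
library(biomaRt)

human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Header fields from the web query, one row per protein-coding transcript.
annot <- getBM(
  attributes = c("ensembl_gene_id", "ensembl_transcript_id",
                 "external_gene_name", "chromosome_name",
                 "strand", "transcript_start"),
  filters = "biotype", values = "protein_coding", mart = human
)

# 5' UTR plus 3000 bp upstream flank, keyed by transcript ID.
seqs <- getSequence(id = unique(annot$ensembl_transcript_id),
                    type = "ensembl_transcript_id",
                    seqType = "5utr", upstream = 3000, mart = human)

# Column 1 holds the sequence; give it a plain name and drop empty entries.
names(seqs)[1] <- "sequence"
seqs <- seqs[seqs$sequence != "Sequence unavailable", ]

# Attach the other attributes and build a "|"-separated FASTA header.
seqs <- merge(seqs, annot, by = "ensembl_transcript_id")
header <- with(seqs, paste(ensembl_gene_id, ensembl_transcript_id,
                           external_gene_name, chromosome_name,
                           strand, transcript_start, sep = "|"))

# exportFASTA() writes column 2 as the header and column 1 as the sequence.
exportFASTA(data.frame(sequence = seqs$sequence, id = header),
            file = "pc_5utr_up3000.fasta")
```

The count of returned sequences may not match the web interface exactly, so comparing it with the 21976 matches there is a useful sanity check.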
You just have to play around with the parameters for a while:
Get all genes for the current release (GRCh38 as of June 16, 2019):
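A sketch of this step, assuming useEnsembl() (which points at the current release unless a version argument is given) and the attribute names shown, which should be checked with listAttributes(). Here getBM() gets no filters at all, so it returns every gene, and the protein-coding subset is taken afterwards in R.

```r
library(biomaRt)

# Current Ensembl release (GRCh38); useEnsembl(..., version = ...) would pin a release.
mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

# No filters/values, so getBM() returns these attributes for every gene.
all_genes <- getBM(
  attributes = c("ensembl_gene_id", "hgnc_symbol", "entrezgene_id",
                 "gene_biotype", "chromosome_name"),
  mart = mart
)

# Keep protein-coding genes that have an HGNC symbol.
pc_genes <- subset(all_genes,
                   gene_biotype == "protein_coding" & hgnc_symbol != "")
nrow(pc_genes)
```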
Then, obtain the 5' UTR sequences for the genes based on their HGNC symbols:
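A sketch of the 5' UTR step, assuming getSequence() with type = "hgnc_symbol" and seqType = "5utr", and "biotype" as the gene-type filter (check with listFilters()). The output file name is made up.

```r
library(biomaRt)

mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

# HGNC symbols of all protein-coding genes ("biotype" filter assumed).
pc <- getBM(attributes = "hgnc_symbol",
            filters = "biotype", values = "protein_coding", mart = mart)
symbols <- unique(pc$hgnc_symbol[pc$hgnc_symbol != ""])

# 5' UTR sequences: first column is the sequence, second the HGNC symbol.
utr5 <- getSequence(id = symbols, type = "hgnc_symbol",
                    seqType = "5utr", mart = mart)

# Transcripts without an annotated 5' UTR come back as "Sequence unavailable".
utr5 <- utr5[utr5[[1]] != "Sequence unavailable", ]

# Write FASTA: exportFASTA() uses column 2 as header, column 1 as sequence.
exportFASTA(utr5, file = "hsapiens_5utr.fasta")
```

With about 20,000 symbols in one call the query can be slow; splitting the symbols into chunks is an option if the server times out.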
If you want bases up- or downstream of the UTR, you can either try the functionality within getSequence() (see its upstream and downstream parameters), OR you can obtain the 5' UTR coordinates from the original getBM() call (above), add 3000 bp to these, and then use getSequence() without id, supplying the chromosome, start and end arguments instead.
Use a star in the values field (I need only the Entrez ID, but you can add more attributes here).
Did not work for me; see my answer.
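For the coordinate-based route described earlier in the thread (add 3000 bp to the 5' UTR coordinates from getBM(), then query getSequence() by position), the only subtle part is the strand: upstream means lower coordinates on the + strand and higher coordinates on the - strand. A minimal base-R sketch; extend_upstream() is a hypothetical helper, and the column names mirror the biomaRt attributes 5_utr_start, 5_utr_end and strand (1 / -1), which are assumptions to check with listAttributes(). The toy coordinates are made up.

```r
# Hypothetical helper: extend 5' UTR intervals by `flank` bp upstream.
# `utr` is a data.frame with columns chromosome_name, strand (1 or -1),
# `5_utr_start` and `5_utr_end` (1-based, inclusive, as from getBM()).
extend_upstream <- function(utr, flank = 3000) {
  plus <- utr$strand == 1
  new_start <- utr$`5_utr_start`
  new_end   <- utr$`5_utr_end`
  # + strand: upstream lies before the start coordinate (clamped at 1).
  new_start[plus] <- pmax(1, new_start[plus] - flank)
  # - strand: upstream lies after the end coordinate.
  new_end[!plus] <- new_end[!plus] + flank
  data.frame(chromosome_name = utr$chromosome_name,
             strand = utr$strand,
             start = new_start, end = new_end)
}

# Toy input with made-up coordinates, one UTR on each strand.
toy <- data.frame(chromosome_name = c("1", "2"),
                  strand = c(1, -1),
                  `5_utr_start` = c(10000, 50000),
                  `5_utr_end`   = c(10200, 50300),
                  check.names = FALSE)
extended <- extend_upstream(toy)
extended
```

On real data, as far as I can tell the 5_utr_start/5_utr_end attributes come back at exon level, so a 5' UTR spanning several exons gives several rows; collapsing them to one interval per transcript (and dropping NA rows) before extending seems safer. The extended chromosome_name, start and end can then go into getSequence()'s chromosome, start and end arguments. As I read the biomaRt vignette, a coordinate query returns the sequences of the requested seqType for features in that region rather than raw genomic DNA, so it is worth checking the output.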